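I. Overview:
An MP3 file is made up of frames (FRAME); the frame is the smallest unit of an MP3 file. The full
name of MP3 is MPEG-1 Audio Layer 3. MPEG (Moving Picture Experts Group) names both the working
group and its family of standards for compressing moving pictures and audio. MPEG audio is the
sound part of the MPEG-1 standard, also called the MPEG audio layers. It is divided by compression
quality and encoder complexity into three layers, Layer 1, Layer 2 and Layer 3, which correspond
to the MP1, MP2 and MP3 audio file types; different layers are used for different purposes. The
higher the layer, the more complex the encoder and the higher the compression ratio: MP1 and MP2
reach about 4:1 and 6:1 to 8:1, while MP3 reaches 10:1 to 12:1. In other words, one minute of
CD-quality music needs about 10 MB uncompressed but only about 1 MB after MP3 encoding. MP3
compression is lossy. To keep audible distortion low it uses perceptual coding: the encoder first
performs a spectral analysis of the audio, discards the components the ear is unlikely to hear,
then quantizes and packs what remains into an MP3 file with a high compression ratio that still
sounds close to the original source on playback.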
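II. Overall MP3 file structure:
An MP3 file has three main parts: TAG_V2 (ID3V2), FRAME, and TAG_V1 (ID3V1).
ID3V2   Holds artist, composer, album and similar information. Its length is not fixed, and it
        extends the amount of information ID3V1 can store.
FRAME   A sequence of frames; the count is determined by the file size and the frame length.
        . The length of each FRAME may or may not be constant, depending on the bitrate
        . Each FRAME consists of a frame header and a data body
        . The frame header records the bitrate, sampling rate, version and so on; every frame
          carries its own header
ID3V1   Holds artist, composer, album and similar information. Its length is 128 bytes.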
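III. MP3 FRAME format:
Every FRAME has a frame header (HEADER) that is 4 bytes (32 bits) long. The header may be
followed by a two-byte CRC. Whether the CRC is present is decided by bit 16 of the HEADER (the
protection bit): 0 means a 2-byte CRC follows immediately after the HEADER, 1 means there is no
CRC. The frame's data body comes next. The layout is:
HEADER      CRC (optional)    MAIN_DATA
4 BYTE      0 OR 2 BYTE       length computed from the frame header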
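As a rough illustration of the layout above, the sketch below scans a byte buffer for the first plausible frame header and reports where MAIN_DATA starts inside a frame. The function names are made up for this example, and the plausibility checks simply reject the reserved or bad field values listed in the header table of this section; a real parser would normally also confirm that a second header follows at the computed frame length.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the four bytes at p look like a plausible frame header. */
int looks_like_frame_header(const uint8_t *p)
{
    if (p[0] != 0xFF || (p[1] & 0xE0) != 0xE0) return 0; /* 11 sync bits */
    if (((p[1] >> 3) & 0x03) == 0x01) return 0;           /* reserved version */
    if (((p[1] >> 1) & 0x03) == 0x00) return 0;           /* reserved layer */
    if (((p[2] >> 4) & 0x0F) == 0x0F) return 0;           /* bad bitrate index */
    if (((p[2] >> 2) & 0x03) == 0x03) return 0;           /* reserved sampling rate */
    return 1;
}

/* Scan buf for the first plausible frame header; return its offset or -1. */
long find_first_frame(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (looks_like_frame_header(buf + i))
            return (long)i;
    }
    return -1;
}

/* Offset of MAIN_DATA from the start of the frame: the 4-byte header,
   plus 2 CRC bytes when the protection bit (bit 16) is 0. */
size_t main_data_offset(const uint8_t *hdr)
{
    return (hdr[1] & 0x01) ? 4 : 6;
}
```

Typical use would be to read the start of the file (after any ID3V2 tag) into a buffer, call find_first_frame, and then add main_data_offset to reach the audio payload of that frame.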
1. The frame header (HEADER) format is:
AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM
The 13 header fields (letters A to M) have the following meanings:
Sign  Length (bits)  Position (bits)  Description
A 11 (31-21) Frame sync (all bits set)
B 2 (20,19) MPEG Audio version
    00 - MPEG Version 2.5
    01 - reserved
    10 - MPEG Version 2
    11 - MPEG Version 1
C 2 (18,17) Layer description
    00 - reserved
    01 - Layer III
    10 - Layer II
    11 - Layer I
D 1 (16) Protection bit
    0 - Protected by CRC (16bit crc follows header)
    1 - Not protected
E 4 (15-12) Bitrate index
    bits   V1,L1   V1,L2   V1,L3   V2,L1   V2,L2 & L3
    0000   free    free    free    free    free
    0001   32      32      32      32      8
    0010   64      48      40      48      16
    0011   96      56      48      56      24
    0100   128     64      56      64      32
    0101   160     80      64      80      40
    0110   192     96      80      96      48
    0111   224     112     96      112     56
    1000   256     128     112     128     64
    1001   288     160     128     144     80
    1010   320     192     160     160     96
    1011   352     224     192     176     112
    1100   384     256     224     192     128
    1101   416     320     256     224     144
    1110   448     384     320     256     160
    1111   bad     bad     bad     bad     bad
    NOTES: All values are in kbps
    V1 - MPEG Version 1
    V2 - MPEG Version 2 and Version 2.5
    L1 - Layer I
    L2 - Layer II
    L3 - Layer III
    "free" means free format: a constant bitrate that is not in the table and is chosen by
    the application. It does not mean variable bitrate.
    "bad" means that this is not an allowed value.
    For V2, Layer II and Layer III share one column of bitrates.
F 2 (11,10) Sampling rate frequency index (values are in Hz)
    bits   MPEG1   MPEG2   MPEG2.5
    00     44100   22050   11025
    01     48000   24000   12000
    10     32000   16000   8000
    11     reserv. reserv. reserv.
G 1 (9) Padding bit
    0 - frame is not padded
    1 - frame is padded with one extra slot (1 byte for Layer II and III, 4 bytes for Layer I)
H 1 (8) Private bit (free for application-specific use)
I 2 (7,6) Channel Mode
    00 - Stereo
    01 - Joint stereo (Stereo)
    10 - Dual channel (Stereo)
    11 - Single channel (Mono)
J 2 (5,4) Mode extension (Only if Joint stereo)
    bits   Intensity stereo   MS stereo
    00     off                off
    01     on                 off
    10     off                on
    11     on                 on
K 1 (3) Copyright
    0 - Audio is not copyrighted
    1 - Audio is copyrighted
L 1 (2) Original
    0 - Copy of original media
    1 - Original media
M 2 (1,0) Emphasis
    00 - none
    01 - 50/15 µs
    10 - reserved
    11 - CCITT J.17
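The field table above can be turned into a small decoder. The following is a minimal sketch in C, not a reference implementation: the Mp3Header struct and the function name are invented for this example, and the MPEG Version 2/2.5 bitrates in kBitrate are the standard values (Layer II and Layer III share one row), which is an assumption layered on top of this document.

```c
#include <stdint.h>

typedef struct {
    int version;      /* 1 = MPEG1, 2 = MPEG2, 25 = MPEG2.5 */
    int layer;        /* 1, 2 or 3 */
    int has_crc;      /* 1 if a 16-bit CRC follows the header */
    int bitrate_kbps; /* 0 = free format, -1 = bad index */
    int sample_rate;  /* Hz, 0 = reserved index */
    int padding;      /* padding bit */
    int channel_mode; /* 0 stereo, 1 joint stereo, 2 dual channel, 3 mono */
} Mp3Header;

/* Bitrates in kbps, indexed [MPEG1 ? 0 : 1][layer - 1][bitrate index]. */
static const int kBitrate[2][3][16] = {
    {{0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, -1},
     {0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384, -1},
     {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, -1}},
    {{0, 32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256, -1},
     {0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, -1},
     {0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, -1}}
};

/* Sampling rates in Hz, indexed [MPEG1, MPEG2, MPEG2.5][sampling index]. */
static const int kSampleRate[3][4] = {
    {44100, 48000, 32000, 0},
    {22050, 24000, 16000, 0},
    {11025, 12000, 8000, 0}
};

/* Decode a 4-byte frame header. Returns 0 on success, -1 if the sync bits
   are missing or the version/layer fields hold a reserved value. */
int parse_frame_header(const uint8_t h[4], Mp3Header *out)
{
    static const int version_of[4] = {25, 0, 2, 1}; /* field B */
    static const int layer_of[4] = {0, 3, 2, 1};    /* field C */

    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0)
        return -1;                                  /* field A: sync */

    out->version = version_of[(h[1] >> 3) & 0x03];
    out->layer = layer_of[(h[1] >> 1) & 0x03];
    if (out->version == 0 || out->layer == 0)
        return -1;

    out->has_crc = (h[1] & 0x01) == 0;              /* field D: 0 = CRC present */

    int vrow = (out->version == 1) ? 0 : 1;
    int srow = (out->version == 1) ? 0 : (out->version == 2 ? 1 : 2);
    out->bitrate_kbps = kBitrate[vrow][out->layer - 1][(h[2] >> 4) & 0x0F];
    out->sample_rate = kSampleRate[srow][(h[2] >> 2) & 0x03];
    out->padding = (h[2] >> 1) & 0x01;              /* field G */
    out->channel_mode = (h[3] >> 6) & 0x03;         /* field I */
    return 0;
}
```

For the common header bytes FF FB 90 64 this yields MPEG1, Layer III, no CRC, 128 kbps, 44100 Hz, joint stereo. The remaining fields (private, mode extension, copyright, original, emphasis) can be extracted from h[2] and h[3] in the same way.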
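1) Playing time of each frame: it depends on the number of samples per frame and the sampling
rate, not on the frame length in bytes. An MPEG1 Layer III frame holds 1152 samples, so at
44100 Hz every frame plays for 1152 / 44100 s, about 26 ms.
2) Frame size (Layer III, in bytes; Bitrate is in bits per second and the division is an
integer division):
Size = (((MpegVersion == MPEG1 ? 144 : 72) * Bitrate) / SamplingRate) + PaddingBit
Example: Bitrate = 128000, SamplingRate = 44100, and PaddingBit = 1
Size = (144 * 128000) / 44100 + 1 = 417 + 1 = 418 bytes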
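The two calculations above can be sketched in C as follows. The function names are hypothetical, the code covers Layer III only, and the 576 samples per frame used for MPEG2/2.5 is an assumption taken from the standard rather than from this document.

```c
/* Frame length in bytes for a Layer III frame.
   bitrate_bps is in bits per second; the division truncates. */
int layer3_frame_size(int is_mpeg1, int bitrate_bps, int sample_rate, int padding)
{
    int coeff = is_mpeg1 ? 144 : 72;
    return coeff * bitrate_bps / sample_rate + padding;
}

/* Playing time of one Layer III frame in milliseconds:
   samples per frame divided by the sampling rate. */
double layer3_frame_duration_ms(int is_mpeg1, int sample_rate)
{
    int samples = is_mpeg1 ? 1152 : 576;
    return 1000.0 * samples / sample_rate;
}
```

With the example values, layer3_frame_size(1, 128000, 44100, 1) gives 418 and layer3_frame_size(1, 128000, 44100, 0) gives 417; layer3_frame_duration_ms(1, 44100) is roughly 26.12.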
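2. MAIN_DATA:
Whether the length of MAIN_DATA varies depends on whether the bitrate in the HEADER varies. A
song may, for example, exist in three versions: 96 kbps (96 kilobits per second), 128 kbps and
192 kbps. The kbps figure (the bitrate) is the amount of audio data per second; the higher it
is, the better the sound quality and the larger the file. An MP3 file whose bitrate never
changes is called CBR, and many MP3 files are CBR. A file whose bitrate changes is called VBR,
and the length of each FRAME may differ. The differences between CBR and VBR are:
1) CBR: with a fixed bitrate the FRAME size is fixed too (by the formula above, apart from the
padding byte). Knowing the total file length and the frame length, the total playing time
follows from the per-frame playing time of about 26 ms, and fast forward, rewind and slow
playback can be implemented by counting frames.
2) VBR: the VBR header described here was introduced by Xing, so the file contains the keyword
"Xing" (many small tools can now produce VBR files, and whether they all follow this convention
is not guaranteed). The keyword is stored in the first valid FRAME of the MP3 file and marks the
file as VBR. The first FRAME also stores the total number of FRAMEs in the file, which makes it
easy to obtain the total playing time, plus 100 bytes that hold a FRAME INDEX for 100 time
segments of the total playing time. For a 4 minute song (240 s) split into 100 segments, two
adjacent INDEX entries are 2.4 s apart, so with this INDEX only a few FRAMEs around the target
need to be examined to find the FRAME header to seek to. See the notes below: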
This system was created to minimize file length while preserving sound quality.
Higher frequencies generally need more space for encoding (that is why many codecs cut all
frequencies above about 16 kHz) and lower tones require less. So if some part of a song does not
contain higher tones, encoding it at e.g. 192 kbps wastes space. It should be enough to use only
e.g. 96 kbps.
And that is the principle of VBR. The codec looks over each frame and then chooses a bitrate
suitable for its sound content.
It sounds perfect but it brings some problems:
If you want to jump 2 minutes ahead in a song, it is not a problem with CBR because you can
simply compute the number of bytes to skip. But that is impossible with VBR. Frame lengths are
arbitrary, so you have to either go frame by frame and count (time consuming and very
impractical) or use another mechanism for an approximate count.
If you want to cut 5 minutes from the middle of a VBR file (we all know CDs where the last song
takes 10 minutes but 5 of them are pure silence) the problems are the same.
Result? VBR files are more difficult to control and adjust. And I do not like the feeling that
the sound quality changes at every moment. And AFAIK many codecs have problems creating VBR in
good quality.
Personally I cannot see any reason to use VBR; I do not care whether one CD in MP3 takes 55 MB
with CBR or 51 MB with VBR. But everybody has a different taste, and some people prefer VBR.
The VBR file structure is the same as for CBR, but the first frame does not contain audio data;
it is used for special information about the VBR file.
Structure of the first frame (see the table that follows):
Byte       Content
0-3        Standard audio frame header (as described above). Mostly it contains the values
           FF FB 30 4C, from which you can compute FrameLen = 156 bytes. That is exactly enough
           space for storing the VBR info.
           This header contains some important information valid for the whole file:
           -MPEG (MPEG1 or MPEG2)
           -SAMPLING rate frequency index
           -CHANNEL (JointStereo etc.)
4-x        Not used until the string "Xing" (58 69 6E 67). This string is the main VBR file
           identifier. If it is not found, the file is assumed to be CBR. The string is placed
           at different locations depending on the values of MPEG and CHANNEL (the ones listed
           a few lines above):
           36-39   "Xing" for MPEG1 and CHANNEL != mono (mostly used)
           21-24   "Xing" for MPEG1 and CHANNEL == mono
           21-24   "Xing" for MPEG2 and CHANNEL != mono
           13-16   "Xing" for MPEG2 and CHANNEL == mono
           After the "Xing" string come the flags, the number of frames in the file and the
           size of the file in bytes. Each of these items is 4 bytes long and is stored as an
           ''int'' in big-endian order: the first byte is the most significant and the last is
           the least.
           The following schema is for MPEG1 and CHANNEL != mono, with all four flags set:
40-43      Flags
           Value         Name        Description
           00 00 00 01   Frames      Flag set if the value for the number of frames in file is stored
           00 00 00 02   Bytes       Flag set if the value for the file size in bytes is stored
           00 00 00 04   TOC         Flag set if the values for the TOC (see below) are stored
           00 00 00 08   VBR Scale   Flag set if the value for the VBR scale is stored
           All these flags can be set simultaneously.
44-47      Frames
           Number of frames in the file (including the first info frame)
48-51      Bytes
           File length in bytes
52-151     TOC (Table of Contents)
           Consists of 100 indexes (one byte each) for easier lookup in the file. It
           approximately solves the problem of moving inside the file.
           Each byte maps to a file position according to this formula:
           (TOC[i] / 256) * fileLenInBytes
           So if a song lasts e.g. 240 s and you want to jump to second 60 (and the file is
           5 000 000 bytes long) you can use:
           TOC[(60/240)*100] = TOC[25]
           and the corresponding byte in the file is then approximately at:
           (TOC[25]/256) * 5000000
           If you trim a VBR file you should also rebuild Frames, Bytes and TOC properly.
152-155    VBR Scale
           An encoder quality indicator (commonly 0 to 100); it is not needed for playback or
           seeking.
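The table above translates fairly directly into code. The sketch below is one possible reading of it, with hypothetical names (XingInfo, parse_xing, xing_seek_offset). It assumes that a field is physically present only when its flag is set, so the fixed offsets 44, 48 and 52 hold only when the preceding flags are all set; that assumption goes slightly beyond what the table states.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t flags;    /* bit 0 Frames, bit 1 Bytes, bit 2 TOC, bit 3 VBR Scale */
    uint32_t frames;   /* valid if flags & 1 */
    uint32_t bytes;    /* valid if flags & 2 */
    uint8_t  toc[100]; /* valid if flags & 4 */
} XingInfo;

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Offset of the "Xing" string from the start of the first frame. */
size_t xing_offset(int is_mpeg1, int is_mono)
{
    if (is_mpeg1)
        return is_mono ? 21 : 36;
    return is_mono ? 13 : 21;
}

/* Parse the VBR info in the first frame. Returns 0 on success,
   -1 if "Xing" is not found (treat the file as CBR) or the frame is too short. */
int parse_xing(const uint8_t *frame, size_t len, int is_mpeg1, int is_mono,
               XingInfo *out)
{
    size_t pos = xing_offset(is_mpeg1, is_mono);
    if (len < pos + 8 || memcmp(frame + pos, "Xing", 4) != 0)
        return -1;

    memset(out, 0, sizeof *out);
    out->flags = read_be32(frame + pos + 4);
    pos += 8;
    if (out->flags & 0x1) {
        if (len < pos + 4) return -1;
        out->frames = read_be32(frame + pos);
        pos += 4;
    }
    if (out->flags & 0x2) {
        if (len < pos + 4) return -1;
        out->bytes = read_be32(frame + pos);
        pos += 4;
    }
    if (out->flags & 0x4) {
        if (len < pos + 100) return -1;
        memcpy(out->toc, frame + pos, 100);
    }
    return 0;
}

/* Approximate byte position for a seek: (TOC[i] / 256) * fileLenInBytes. */
uint32_t xing_seek_offset(const XingInfo *x, double seconds, double total_seconds)
{
    int i = (int)(seconds / total_seconds * 100.0);
    if (i < 0) i = 0;
    if (i > 99) i = 99;
    return (uint32_t)((uint64_t)x->toc[i] * x->bytes / 256);
}
```

When the Frames flag is set, the total playing time of an MPEG1 Layer III file can be estimated as frames * 1152 / sampling rate, which is what makes the frame count useful for VBR files.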
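IV. ID3V1
ID3V1 is fairly simple. It is stored at the end of the MP3 file: open an MP3 file in a hex
editor and look at its last 128 bytes, which are stored sequentially. The data structure is
defined as follows: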
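typedef struct tagID3V1
{
    char Header[3];   /* tag header; must be "TAG", otherwise no tag is present */
    char Title[30];   /* title */
    char Artist[30];  /* artist */
    char Album[30];   /* album */
    char Year[4];     /* year of release */
    char Comment[28]; /* comment */
    char reserve;     /* reserved; 0 when the next byte holds a track number */
    char track;       /* track number */
    char Genre;       /* genre index, see the table below */
}ID3V1,*pID3V1;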
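A minimal sketch of reading these 128 bytes, assuming the caller has already loaded the last 128 bytes of the file into a buffer. Id3v1Info, copy_field and parse_id3v1 are names invented for this example; the fixed-width fields are copied into NUL-terminated strings, with trailing padding (NULs or spaces) trimmed.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char title[31];
    char artist[31];
    char album[31];
    char year[5];
    char comment[29];
    unsigned char track; /* 0 if no track number is stored */
    unsigned char genre; /* index into the genre table */
} Id3v1Info;

/* Copy an n-byte fixed-width field into dst (n + 1 bytes) and trim padding. */
static void copy_field(char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
    dst[n] = '\0';
    while (n > 0 && (dst[n - 1] == '\0' || dst[n - 1] == ' '))
        dst[--n] = '\0';
}

/* Parse the last 128 bytes of a file. Returns 0 on success,
   -1 if the block does not start with "TAG". */
int parse_id3v1(const unsigned char tail[128], Id3v1Info *out)
{
    if (memcmp(tail, "TAG", 3) != 0)
        return -1;
    copy_field(out->title, tail + 3, 30);
    copy_field(out->artist, tail + 33, 30);
    copy_field(out->album, tail + 63, 30);
    copy_field(out->year, tail + 93, 4);
    copy_field(out->comment, tail + 97, 28);
    /* The track byte is only meaningful when the reserved byte is 0. */
    out->track = (tail[125] == 0) ? tail[126] : 0;
    out->genre = tail[127];
    return 0;
}
```

Copying field by field avoids relying on the in-memory layout of the struct and on the fields being NUL-terminated, which they are not when a value fills its whole field.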
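The ID3V1 fields are stored one after another with nothing to separate them. A field that is
shorter than its slot, for example a title of fewer than 30 bytes, must be padded with ''\0'' to
its full length, otherwise the fields after it will be misread. Genre is a one-byte index into
the following table: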
/* Standard genres */
0="Blues";
1="ClassicRock";
2="Country";
3="Dance";
4="Disco";
5="Funk";
6="Grunge";
7="Hip-Hop";
8="Jazz";
9="Metal";
10="NewAge";
11="Oldies";
12="Other";
13="Pop";
14="R&B";
15="Rap";
16="Reggae";
17="Rock";
18="Techno";
19="Industrial";
20="Alternative";
21="Ska";
22="DeathMetal";
23="Pranks";
24="Soundtrack";
25="Euro-Techno";
26="Ambient";
27="Trip-Hop";
28="Vocal";
29="Jazz+Funk";
30="Fusion";
31="Trance";
32="Classical";
33="Instrumental";
34="Acid";
35="House";
36="Game";
37="SoundClip";
38="Gospel";
39="Noise";
40="AlternRock";
41="Bass";
42="Soul";
43="Punk";
44="Space";
45="Meditative";
46="InstrumentalPop";
47="InstrumentalRock";
48="Ethnic";
49="Gothic";
50="Darkwave";
51="Techno-Industrial";
52="Electronic";
53="Pop-Folk";
54="Eurodance";
55="Dream";
56="SouthernRock";
57="Comedy";
58="Cult";
59="Gangsta";
60="Top40";
61="ChristianRap";
62="Pop/Funk";
63="Jungle";
64="NativeAmerican";
65="Cabaret";
66="NewWave";
67="Psychadelic";
68="Rave";
69="Showtunes";
70="Trailer";
71="Lo-Fi";
72="Tribal";
73="AcidPunk";
74="AcidJazz";
75="Polka";
76="Retro";
77="Musical";
78="Rock&Roll";
79="HardRock";
/* Extended genres */
80="Folk";
81="Folk-Rock";
82="NationalFolk";
83="Swing";
84="FastFusion";
85="Bebob";
86="Latin";
87="Revival";
88="Celtic";
89="Bluegrass";
90="Avantgarde";
91="GothicRock";
92="ProgressiveRock";
93="PsychedelicRock";
94="SymphonicRock";
95="SlowRock";
96="BigBand";
97="Chorus";
98="EasyListening";
99="Acoustic";
100="Humour";
101="Speech";
102="Chanson";
103="Opera";
104="ChamberMusic";
105="Sonata";
106="Symphony";
107="BootyBass";
108="Primus";
109="PornGroove";
110="Satire";
111="SlowJam";
112="Club";
113="Tango";
114="Samba";
115="Folklore";
116="Ballad";
117="PowerBallad";
118="RhythmicSoul";
119="Freestyle";
120="Duet";
121="PunkRock";
122="DrumSolo";
123="Acapella";
124="Euro-House";
125="DanceHall";
126="Goa";
127="Drum&Bass";
128="Club-House";
129="Hardcore";
130="Terror";
131="Indie";
132="BritPop";
133="Negerpunk";
134="PolskPunk";
135="Beat";
136="ChristianGangstaRap";
137="HeavyMetal";
138="BlackMetal";
139="Crossover";
140="ContemporaryChristian";
141="ChristianRock";
142="Merengue";
143="Salsa";
144="ThrashMetal";
145="Anime";
146="JPop";
147="Synthpop";
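V. ID3V2
ID3V2 has had four versions so far, but popular players generally support the third, that is,
ID3v2.3. Because ID3V1 is stored at the end of the MP3 file, ID3V2 is stored at the beginning.
For the same reason, editing an ID3V2 tag is slower than editing ID3V1: growing a tag at the
head of the file can mean rewriting the whole file. The ID3V2 structure is also much more
complex than ID3V1, but it is more complete and can grow and be extended.
ID3V2.3 is described below.
Every ID3V2.3 tag consists of a tag header, an optional extended header, and one or more tag
frames. Information about the track, such as title and artist, is stored in separate tag
frames. The extended header is not required, but every tag must contain at least one tag frame.
The tag header and the tag frames are stored sequentially at the beginning of the MP3 file.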
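1. Tag header
The first 10 bytes of the file hold the ID3V2.3 header. The data structure is:
char Header[3]; /* must be "ID3", otherwise the tag is considered absent */
char Ver; /* version number; 3 for ID3V2.3 */
char Revision; /* revision number; 0 for this version */
char Flag; /* flags byte; this version defines only three bits, explained below */
char Size[4]; /* tag size: everything after this 10-byte header (extended header, frames, padding) */
1). Flags byte
The flags byte is usually 0. It is defined as:
abc00000
a -- Unsynchronisation: when set, a 0x00 byte has been inserted after 0xFF bytes in the tag so that tag data cannot be mistaken for a frame sync; usually not set
b -- Extended header present; usually absent (Winamp, at least, does not write one), so usually not set
c -- Experimental indicator: marks the tag as experimental; usually not set
2). Tag size
Four bytes in total, but only 7 bits of each byte are used; the highest bit is always 0. The
format is therefore:
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx
To compute the size, drop the 0 bits to obtain a 28-bit binary number, which is the tag size.
Keeping the high bit clear guarantees that the size field never contains a 0xFF byte that could
be mistaken for a frame sync. The formula is:
int total_size;
total_size = (Size[0]&0x7F)*0x200000
+(Size[1]&0x7F)*0x4000
+(Size[2]&0x7F)*0x80
+(Size[3]&0x7F);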
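The size calculation can be written with shifts instead of multiplications. The sketch below uses hypothetical function names, and the second function relies on the ID3v2.3 rule that the stored size does not count the 10-byte tag header, so the audio data is expected to start 10 bytes after the stored size.

```c
#include <stdint.h>
#include <string.h>

/* Decode a 4-byte size in which only the low 7 bits of each byte are used. */
uint32_t synchsafe_to_u32(const uint8_t s[4])
{
    return ((uint32_t)(s[0] & 0x7F) << 21) |
           ((uint32_t)(s[1] & 0x7F) << 14) |
           ((uint32_t)(s[2] & 0x7F) << 7) |
            (uint32_t)(s[3] & 0x7F);
}

/* Number of bytes occupied by the tag at the start of the file, including
   the 10-byte header, or 0 if the buffer does not start with "ID3". */
uint32_t id3v2_total_length(const uint8_t hdr[10])
{
    if (memcmp(hdr, "ID3", 3) != 0)
        return 0;
    return 10 + synchsafe_to_u32(hdr + 6);
}
```

For example, the size bytes 00 00 02 01 decode to 2 * 128 + 1 = 257, so a tag with that size field occupies 267 bytes in the file.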
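2. Tag frames
Every tag frame consists of a 10-byte frame header followed by at least one byte of
variable-length content. Tag frames are also stored sequentially in the file, with no special
separator between them or between the tag header and the first frame. The content of a frame can
only be read once its size has been taken from the frame header; take care with the size so that
the content or header of the next frame is not read by mistake.
The frame header is defined as follows:
char ID[4]; /* four characters that identify the frame and describe its content; a table of common IDs follows */
char Size[4]; /* size of the frame content, excluding the frame header; must not be less than 1 */
char Flags[2]; /* flags; only 6 bits are defined, explained below */
1). Frame ID
Four characters identify a frame and tell what its content means. Common IDs are:
TIT2=Title    the content is the title of the song; likewise for the entries below
TPE1=Artist
TALB=Album
TRCK=Track    format: N/M, where N is the position in the album and M is the number of tracks in the album; N and M are digits in ASCII
TYER=Year     digits in ASCII
TCON=Genre    stored directly as a string
COMM=Comment  format: text encoding byte, "eng" (3-byte language code), a short description terminated by ''\0'', then the comment text
2). Size
This is simpler than the tag header size: all 8 bits of every byte are used. The format is:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
It is computed as follows (the bytes must be treated as unsigned):
int FSize;
FSize = Size[0]*0x1000000
+Size[1]*0x10000
+Size[2]*0x100
+Size[3];
3). Flags
Only 6 bits are defined and the other 10 bits are 0; in most cases all 16 bits are simply 0. The
format is:
abc00000 ijk00000
a -- Tag alter preservation: when set, the frame should be discarded if it is unknown to the software and the tag is altered
b -- File alter preservation: when set, the frame should be discarded if it is unknown to the software and the file (excluding the tag) is altered
c -- Read only: when set, the frame should not be modified (many programs ignore this flag)
i -- Compression: when set, the frame content is zlib-compressed and a 4-byte decompressed size is added after the frame header
j -- Encryption (rarely seen in MP3 tags)
k -- Grouping identity: when set, the frame belongs to a group with other frames and a group identifier byte is added after the frame header
Note that in text frames the first content byte is the text encoding (0x00 for ISO-8859-1, 0x01
for Unicode). This is the ''\0'' that Winamp writes in front of the text, and it is counted in
the size of the frame content.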
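Putting the frame header rules together, the following sketch looks up one frame by its ID inside the frame area of an ID3V2.3 tag (the bytes after the 10-byte tag header) and copies a text frame into a C string. The function names are hypothetical, the text helper only handles encoding byte 0x00, and frames with compression, encryption or unsynchronisation are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk the frames and return a pointer to the content of the first frame
   whose ID matches id (4 characters), storing its size in *size_out.
   Returns NULL if the frame is not found. */
const uint8_t *find_id3v23_frame(const uint8_t *frames, size_t len,
                                 const char *id, uint32_t *size_out)
{
    size_t pos = 0;
    while (len - pos >= 10) {
        const uint8_t *h = frames + pos;
        if (h[0] == 0)
            break;                          /* padding reached */
        uint32_t size = read_be32(h + 4);   /* all 8 bits of each byte are used */
        if (size > len - pos - 10)
            break;                          /* frame would run past the tag */
        if (memcmp(h, id, 4) == 0) {
            *size_out = size;
            return h + 10;
        }
        pos += 10 + (size_t)size;
    }
    return NULL;
}

/* Copy the text of a frame whose first content byte is 0x00 (ISO-8859-1)
   into dst, truncating if needed. Returns 0 on success, -1 otherwise. */
int copy_latin1_text(const uint8_t *body, uint32_t size, char *dst, size_t dst_len)
{
    if (size < 1 || body[0] != 0x00 || dst_len == 0)
        return -1;
    size_t n = size - 1;
    if (n > dst_len - 1)
        n = dst_len - 1;
    memcpy(dst, body + 1, n);
    dst[n] = '\0';
    return 0;
}
```

Reading the title would then be a call to find_id3v23_frame with "TIT2" followed by copy_latin1_text on the returned content. Stopping at a zero byte or an oversized frame keeps the walk inside the tag even when the data is damaged.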
Appendix: meaning of the frame IDs
4). Declared ID3v2 frames
The following frames are declared in this draft.
AENC Audio encryption
APIC Attached picture
COMM Comments
COMR Commercial
ENCR Encryption method registration
EQUA Equalization
ETCO Event timing codes
GEOB General encapsulated object
GRID Group identification registration
IPLS Involved people list
LINK Linked information
MCDI Music CD identifier
MLLT MPEG location lookup table
OWNE Ownership
PRIV Private
PCNT Play counter
POPM Popularimeter
POSS Position synchronisation
RBUF Recommended buffer size
RVAD Relative volume adjustment
RVRB Reverb
SYLT Synchronized lyric/text
SYTC Synchronized tempo codes
TALB Album/Movie/Show title
TBPM BPM (beats per minute)
TCOM Composer
TCON Content type
TCOP Copyright message
TDAT Date
TDLY Playlist delay
TENC Encoded by
TEXT Lyricist/Text writer
TFLT File type
TIME Time
TIT1 Content group description
TIT2 Title/songname/content description
TIT3 Subtitle/Description refinement
TKEY Initial key
TLAN Language(s)
TLEN Length
TMED Media type
TOAL Original album/movie/show title
TOFN Original filename
TOLY Original lyricist(s)/text writer(s)
TOPE Original artist(s)/performer(s)
TORY Original release year
TOWN File owner/licensee
TPE1 Lead performer(s)/Soloist(s)
TPE2 Band/orchestra/accompaniment
TPE3 Conductor/performer refinement
TPE4 Interpreted, remixed, or otherwise modified by
TPOS Part of a set
TPUB Publisher
TRCK Track number/Position in set
TRDA Recording dates
TRSN Internet radio station name
TRSO Internet radio station owner
TSIZ Size
TSRC ISRC (international standard recording code)
TSSE Software/Hardware and settings used for encoding
TYER Year
TXXX User defined text information
UFID Unique file identifier
USER Terms of use
USLT Unsychronized lyric/text tranion
WCOM Commercial information
WCOP Copyright/Legal information
WOAF Official audio file webpage
WOAR Official artist/performer webpage
WOAS Official audio source webpage
WORS Official internet radio station homepage
WPAY Payment
WPUB Publishers official webpage
WXXX User defined URL link