The high-level structure of the MP3 bitstream
Today's blog entry describes the MP3 bitstream structure. It won't describe the actual MP3 decoding, nor will it describe where and how each variable is precisely stored …
First, the MP3 bitstream is made of packets. These packets vary in size; yes, they can vary even for “CBR”, though only by ±1 byte.
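To see where that one byte comes from, here is a minimal sketch. It assumes MPEG-1 Layer III, where the nominal packet size is 144 * bitrate / samplerate bytes, which is usually not an integer. The function name and the remainder-accumulator strategy are illustrative assumptions, not taken from any particular encoder.

```python
def cbr_frame_sizes(bitrate, sample_rate, n_frames):
    """Yield packet sizes in bytes for a CBR MPEG-1 Layer III stream.

    Assumption: the encoder pads a packet by one byte whenever the
    accumulated fractional remainder reaches a whole byte.
    """
    base, remainder = divmod(144 * bitrate, sample_rate)
    acc = 0
    for _ in range(n_frames):
        acc += remainder
        pad = 0
        if acc >= sample_rate:
            acc -= sample_rate
            pad = 1
        yield base + pad


sizes = list(cbr_frame_sizes(128000, 44100, 49))
print(sorted(set(sizes)), sum(sizes))
```

For 128 kbit/s at 44100 Hz this prints `[417, 418] 20480`: the packets alternate between two sizes so that the average stays at the nominal 417.96 bytes.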
Each such packet begins with an 11-bit “startcode” followed by 21 header bits, which encode the most important info like sample rate, bitrate, number of channels, layer, padding, …. The packet size can be calculated from these 21 bits.
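As a rough illustration of how the packet size falls out of the header, here is a sketch that handles Layer III only. The field positions and tables follow the commonly documented MPEG audio header layout; the function name, the returned dictionary and its keys are made up for this example, and free-format streams (bitrate index 0) are simply rejected.

```python
# Layer III lookup tables (kbit/s and Hz), indexed by the header fields.
BITRATES_KBPS = {
    "1": [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    "2": [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}
SAMPLE_RATES = {
    "1": [44100, 48000, 32000],
    "2": [22050, 24000, 16000],
    "2.5": [11025, 12000, 8000],
}
VERSIONS = {0b00: "2.5", 0b10: "2", 0b11: "1"}


def parse_header(header):
    """Parse a 32-bit header given as an int and return a dict of fields."""
    if header >> 21 != 0x7FF:
        raise ValueError("missing 11-bit startcode")
    version = VERSIONS.get((header >> 19) & 3)
    if version is None or (header >> 17) & 3 != 0b01:
        raise ValueError("not a Layer III header")
    bitrate_idx = (header >> 12) & 15
    rate_idx = (header >> 10) & 3
    if bitrate_idx in (0, 15) or rate_idx == 3:
        raise ValueError("free-format or invalid bitrate/samplerate index")
    table = "1" if version == "1" else "2"  # MPEG-2 and 2.5 share a table
    bitrate = BITRATES_KBPS[table][bitrate_idx] * 1000
    sample_rate = SAMPLE_RATES[version][rate_idx]
    padding = (header >> 9) & 1
    # Layer III packets carry 1152 samples in MPEG-1, 576 in MPEG-2/2.5.
    coeff = 144 if version == "1" else 72
    return {
        "version": version,
        "bitrate": bitrate,
        "sample_rate": sample_rate,
        "channels": 1 if (header >> 6) & 3 == 3 else 2,
        "has_crc": (header >> 16) & 1 == 0,
        "size": coeff * bitrate // sample_rate + padding,
    }


print(parse_header(0xFFFB9000)["size"])  # padding bit clear
print(parse_header(0xFFFB9200)["size"])  # padding bit set
```

The two calls print 417 and 418, which is the same one-byte CBR variation mentioned earlier.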
The remainder of the packet is split into 2 parts. The first contains everything except the scale factors and coefficients; the second contains the scale factors and coefficients, which makes it the larger part. The first is just encoded normally, from position 4 (0-3 is the header) to the end of that part.
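A small sketch of where the two parts sit inside a packet, in bytes. The sizes of the first part (17 or 32 bytes for MPEG-1, 9 or 17 bytes for MPEG-2/2.5, for mono or stereo) are the usual Layer III values rather than something stated above, and the sketch also assumes that an optional 2-byte CRC, when the header signals one, sits between the header and the first part. The function name and signature are hypothetical.

```python
def split_packet(packet_size, mpeg1, channels, has_crc):
    """Return (first_part_start, second_part_start, second_part_size)."""
    # The header occupies bytes 0-3; an optional CRC occupies the next two.
    first_start = 4 + (2 if has_crc else 0)
    if mpeg1:
        first_size = 17 if channels == 1 else 32
    else:
        first_size = 9 if channels == 1 else 17
    second_start = first_start + first_size
    return first_start, second_start, packet_size - second_start


print(split_packet(417, mpeg1=True, channels=2, has_crc=False))
```

For a 417-byte MPEG-1 stereo packet without CRC this prints `(4, 36, 381)`, so the second part is indeed by far the larger one.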
The second, though, is encoded much more stupidly. First we have (from the first part of the packet) a backstep value, which is a number of bytes from an internal buffer that we must prepend to the second part of the packet before we can begin to decode it; we will describe what's in that internal buffer in a moment. The second part is then split into channels and granules, and for each we have a length in bits. We decode them normally until we have either decoded all scale factors and coefficients, or hit or run over the number of bits assigned to that channel-granule. If we don't hit it precisely but run over it, because the end of the assigned bits doesn't coincide with the end of a VLC, then we must step back by one VLC, undo its effects and skip the half VLC. What's left of the second part of our packet (which can include part of the internal buffer) is put into the internal buffer to be used by future packets.
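The bookkeeping around the internal buffer can be sketched as follows. This only covers the prepend and keep-the-leftover steps, not the VLC decoding or the step-back on overrun. The class and method names are invented for this example, and it assumes that `used_bits` is the sum of the per-channel-granule bit lengths and that the next packet's data starts on the following byte boundary.

```python
class BitReservoir:
    """Toy model of the internal buffer carried from packet to packet."""

    def __init__(self):
        self.buf = b""

    def assemble(self, backstep, second_part):
        """Prepend the last `backstep` buffered bytes to the second part."""
        if backstep > len(self.buf):
            raise ValueError("backstep reaches before the buffered data")
        start = len(self.buf) - backstep
        return self.buf[start:] + second_part

    def keep_leftover(self, assembled, used_bits):
        """Store whatever this packet's channels and granules did not use."""
        used_bytes = (used_bits + 7) // 8  # round up to whole bytes
        self.buf = assembled[used_bytes:]


res = BitReservoir()
data = res.assemble(0, bytes([1, 2, 3, 4]))  # first packet, no backstep
res.keep_leftover(data, used_bits=12)        # 2 bytes used, 2 left over
print(res.assemble(1, bytes([5])))           # next packet steps back 1 byte
```

The last line prints `b'\x04\x05'`: one byte left over from the previous packet followed by the new packet's own data. A real decoder would also have to cope with a backstep that points at data it never received, for example after a seek, which this sketch simply rejects.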