Universal vlc codes
Its quite well known that to optimally store some variable x with value v, we should use -log2(p) bits where p is the probability that x has value v, obviously the decoder has to know the probability distribution the encoder used or the bitstrings generated from that to be able decode it…
now if we limit ourselfs to an interger count of bits then huffman codes are optimal, building an huffman code is trivial, just recursively merge the 2 least probable symbols, but if there are many different variables with different probablility distributions or the maximum value is large or unbounded or you dont know the probabilities or you want low complexity then huffman codes might be a bad choice
Universal vlc codes might be a better choice, they all assume small values are more likely and often have very good compression rates
heres a table of common ones
v | Unary | Unary2 | Exp Golomb | Exp Golomb2 | Fibonacci | rice k=2 | rice k=3 |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 11 | 00 | 000 |
1 | 10 | 11 | 100 | 101 | 011 | 01 | 001 |
2 | 110 | 101 | 101 | 111 | 0011 | 100 | 010 |
3 | 1110 | 1001 | 11000 | 10001 | 1011 | 101 | 011 |
4 | 111110 | 10001 | 11001 | 10011 | 00011 | 1100 | 1000 |
5 | 1111110 | 100001 | 11010 | 11001 | 10011 | 1101 | 1001 |
6 | 11111110 | 1000001 | 11011 | 11011 | 01011 | 11100 | 1010 |
7 | 111111110 | 10000001 | 1110000 | 1000001 | 000011 | 11101 | 1011 |
8 | 1111111110 | 100000001 | 1110001 | 1000011 | 100011 | 111100 | 11000 |
… | … | … |
Note1: Exp Golomb2 is identical to Exp Golomb except that its a reversible vlc, it can be decoded from right to left too, same for Unary2 and Unary
The smart reader will probably have noticed that Exp Golomb coding is simply unary coding for the number of bits + the bits, and rice coding is unary coding with k bits, more generally you can take any vlc code to encode some “class” and then give every such class its own number of additional bits
The ideal choice of uvlc code depends upon the probability distribution of the values which will be encoded, there is no single perfect one
Well, hmm there was something else …ahh yes negative numbers, we where just dealing with positive ones above, negative ones are trivial, just map them to odd positive ones and map the actual positive ones to even ones or the other way around
in C the following will do the trick quickly:
v = -2*i-1; v ^= v>>31;
and the following will reverse it:
(v>>1) ^ -(v&1)