> can specify are the lengths; the standard decides how to assign the actual codewords based on

> the lengths. Ditto for deflate. So you should really use that ffmpeg code you talk about

I meant that you can decide it in the general case; JPEG of course puts some restrictions on it.

I've updated the post to make this clearer.

Also, it's not true that the standard always decides the 0 vs. 1. It only does so for bits which separate codewords of differing lengths. Bits which separate codewords of equal length can be selected at will, I think.

]]>About step 4, you can’t decide which one gets a 0 and which one gets a 1 as the only thing you can specify are the lengths; the standard decides how to assign the actual codewords based on the lengths. Ditto for deflate. So you should really use that ffmpeg code you talk about :)
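To make the "lengths only" point concrete, here is a minimal sketch of the canonical assignment rule that deflate describes (RFC 1951, section 3.2.2). The function name `canonical_codes` is my own, and JPEG builds its codewords from the lengths in a similar spirit but with its own table layout, so treat this as an illustration of the idea rather than a JPEG encoder.

```python
def canonical_codes(lengths):
    """Assign canonical codewords from code lengths (0 means the symbol is unused)."""
    max_len = max(lengths)
    # Count how many symbols use each code length.
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # The first codeword of each length continues from the previous length,
    # shifted left by one bit.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Within one length, codewords are handed out consecutively in symbol order.
    codes = []
    for n in lengths:
        if n == 0:
            codes.append(None)
        else:
            codes.append(format(next_code[n], "0{}b".format(n)))
            next_code[n] += 1
    return codes


# Lengths for the symbols A..H, as in the example from RFC 1951.
print(canonical_codes([3, 3, 3, 3, 3, 2, 4, 4]))
```

The encoder only transmits the lengths; both sides then run this same rule, which is why there is no freedom left in picking the 0 and 1 branches.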

]]>probabilities: 1, 2, 4, 8, 16, … will do ]]>

* Max code length is 16 bits

* No code can be all 1’s (1’s are reserved for byte-padding at the end; dumb, I know)

The first constraint is present in pretty much every dynamic Huffman implementation in use (including zlib [*]) and requires some thought. Contrary to popular belief, just 4180 symbols (*) are enough to get 17-bit code lengths (4179 in JPEG due to the second constraint).

(*) With Fibonacci-number frequencies, that is, 1, 1, 2, 3, 5, 8, 13…

[*] deflate has a length limit of 15 bits, so with 2583 input symbols you can already have trouble, and that includes the block terminator and the match length codes, which share the dictionary with the literal codes.
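If you want to check counts like these yourself, here is a rough sketch that builds an unrestricted Huffman code and reports the longest code length for Fibonacci frequencies. The helper names `huffman_code_lengths` and `fibonacci` are mine, not from any library, and the sketch ignores both the length limit and JPEG's all-1's restriction, so the exact threshold you read off depends on how you count.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return the code length of each symbol for an unrestricted Huffman code."""
    if len(freqs) == 1:
        return [1]
    tie = count()  # tie-breaker so the heap never has to compare the symbol lists
    heap = [(f, next(tie), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # every symbol under the merged node moves one bit deeper
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths


def fibonacci(n):
    """First n Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    seq, a, b = [], 1, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


# Columns: distinct symbols, total input symbols, longest code length.
for n in (5, 10, 18):
    freqs = fibonacci(n)
    print(n, sum(freqs), max(huffman_code_lengths(freqs)))
```

Fibonacci frequencies force a completely lopsided tree, which is why the total number of input symbols needed grows so slowly compared with the powers-of-two frequencies suggested above.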