July 2007

libnoe

libnoe is a library to encode and decode reed solomon codes which i wrote between 2002 and 2006

noe

noe is an application which uses libnoe to generate an error correction file for some data file(s) and use that then to correct a wide varity of possible errors incuding having the data randomly chopped up and reordered. “noe” btw stands for “no error” in case you are wondering, sadly ive never finished the noe application.

The basic idea of how noe would work is that, first the data itself is unchanged, changing it would be inconvenient in many situations. The error correction file is made of many not too large packets, this ensures that any reordering which happens to the error correction file can be corrected by simply searching for the packet headers and looking at some sequence number in the header. The error correction packets now would contain some fingerprints of the data in the datafile(s) that is for example every 100th or 1000th bit of the data file would be stored in some error correction packet in the error correction file. With these fingerprints its possible to detect and correct reorderings which might have happenend to the data file even if just a random subset of the error correction packets are intact. The fingerprints as well as the headers of the error correction packets would contain some small checksums to avoid confusing the code by many wrong values. At last the main content of the error correction packets would simply be interleaved RS codes or more precissely the parity part of them. Btw in case anyone is wondering how data can get randomly choped up and reordered, think of a broken hard disk and fragmented files

Patches to finish noe are of course welcome! :)

mina

mina is the MINimal Alternative which my lazy self did finish. It simlpy takes a file and produces an error correction file which is just a bunch of interleaved RS codes (parity part of them actually) with no header or anything. It also happily eats corrupted files and corrects them

An example of minas correction capability is below, note images have been converted to jpeg to reduce their size and make them vissible in normal browsers. Raw damaged files as tar.gz are available too (mina dz lena.pnm.mina can be used to correct them)

damaged recovered

Source code under GPL and GIT repositoryis available too, its also quite clean and does compile :). History though is sadly quite incomplete like with the other forgotten code, this time though it was IBMs fault as my private CVS server with the whole history of noe was on a IBM deathstar disk and it seems i had no backup of the RCS files (this is also one of the reasons why i make all that stuff public now, to avoid it being lost due to some other hd failure or stupidity …)

patches are welcome !!! :)

Asymptotic complexity of best known (to me ;) ) decoding algorithm

O(n log n + t log2 t) for a (n,k) RS code over GF(n+1) and t=n-k

The proof for this is quite easy, syndrom calculation is just evaluating a polynomial at n-k points, and evaluating a polynomial (in GF(n)) at all points can be done with the GFFT actually evaluation at all points is the GFFT of the polynomial. Multiplying 2 polynomials is just 2xGFFT + componentwise multiplication + IGFFT. Finding the roots of a polynomial can as well be done by just evaluating it at all points. The only non trivial operation left for normal RS decoding is solving the key equation which is equivalent to euclids GCD algorithm as well as schönhages GCD algorithm, later has O(t log2 t) complexity (log2 t == (log t)2 in case thats unclear).

An alternative to GF(2x)

Normally RS codes are build over GF(2x) that way the bits of the elements of an RS codeword have a nice 1:1 mapping to x bits which can then be stored or transmitted, but it has a big disadvantage and that is that the GFFT for GF(y) needed for fast RS decoding is done with y-1 points and so it cannot use the well known power of 2 style FFT algorithms as 2x – 1 is not a multiple of 2. The solution is to use GF(2x+1), though note GF(2x+1) does not exist for all integer values of x, it only exists if 2x+1 is a power of a prime that is pj, 2 obvious choices using fermat primes are GF(28+1) and GF(216+1)

How do you store 2x+1 values in 2x values

Trivial ;)

The data part of our RS code is specified by the user and so it simply doesnt use the 2x+1 th symbol, actually it would be messy to use it. So the only problem left are the n-k parity symbols, which can trivially be transformed to not contain the annoying 2x+1 th symbol while at the same time maintaining the property of being an RS code

Let us assume that we have a symbol (at position y with value yv) in our k input symbols which is guranteed to have a value yv < 2x – n + k that is in practice less than one unused bit. Let p be the RS codeword with all k-1 data symbols 0 and the symbol at position y 1. The next step is to find all the values of the y element in our original codeword which would cause no parity symbol to have that annoying 2x+1 th value, for encoding we simply select the yv th element of this list as new yv element. For decoding we choose the number of elements in the list which are smaller than yv as our new element. As last step we just need to add a scaled version of p so as to actually have the wanted yv element and avoiding the nasty too large elements while also still having an RS code

What is a reed solomon code

Lets assume we have k values out of which we want to build a reed solomon code, to do this we imagine that our k values specify the height (=y) of k points with x from 0 to k-1. Now we really have just made a silly graph of our data. Next we find a order k-1 polynomial which goes exactly through these points, this polynomial is unique, no other polynomial of order k-1 will go through these points. And last we just evaluate this polynomial on the points 0 … n-1, these n values are a reed solomon code, simple isnt it? Note the first k values are just our input data values which we already know.

Correcting erasures with RS codes

We now can make RS codes, but how can we correct erasures? Lets assume there are t erasures (erasures are errors where the location of the error is known). That means we know n-t values of our polynomial, and if t≤n-k then we can just find the remaining values by finding the (unique) polynomial which goes through the n-t values. Its also easy to show (just think that you have k-1 of your k data values) that if t>n-k then no code can correct the erasures, so RS codes are optimal in that sense

Correcting errors with RS codes

But what if we dont know where the errors are? Well just try all possible error locations of 0, 1,…,t errors, yes this is not practical but its nice to proof the error correcting capability. Now if we have t actual errors and we guess their locations correctly then we will find our correct polynomial and can correct the errors if we have at least k values left. The only thing now we need to find out is how large t can be so that we cant find a wrong polynomial before we find the correct one. The awnser is trivial actually, a polynomial of order k-1 is uniquely defined by k points so if we have t errors and guess all t error locations wrong then we effectively kill 2t points, and if there are less than k left then we could end up with a wrong polynomial. So we can correct (n-k)/2 errors. More generally reed solomon codes can correct 2*errors + erasures as long as thats ≤ n-k

Hamming distance

n-k+1 proof is trivial (smaller would contradict error correcting capability)

Practice

The above is true if our data and code values are real, rational or integer numbers (and others) but these are quite difficult to handle in reality as they arent bounded. Luckily all the above also works with finite fields so we can just work with polynomials over GF(256) or similar, which has the nice property that you can store such values in bytes while integers and reals can be quite hard to store in finite storage space

Low resolution graphics modes like 320×200 are displayed with doublescan so that each logical line is drawn twice, that way 320×200 is drawn with 400 lines which looks better than 200 on a modern CRT monitor. The reason is that modern CRT monitors are designed to be able to display 1000 or more lines and if they are feeded with just 200 there would be huge black bars between these thin 200 lines, with 400 lines its better but still not good.

VGA600 is a little DOS TSR which i wrote in 1996/1997 to solve this problem, VGA600 does that by monitoring the state of various vga registers and if it detects that the current graphic or text mode can be improved then it improves it by increasing the number of lines and dot clock. So for example 320×200 would be displayed with 600 or 800 lines (which it is depends on the configuration)

Source code under GPL, README and binary are available too

Ive extensively tested VGA600 with doom and quake with a ET4000 PCI card a long time ago ;)

Patches welcome, especially ones porting this to linux and modern vga cards :)