Lair Of The Multimedia Guru » Error Correcting Codes

2010-07-21

Building a polynomial out of its roots

Building a polynomial out of its roots is not particularly hard: one just multiplies the corresponding linear (first degree) polynomials. But this isn't fast. Even doing it recursively with FFT-based multiplication isn't all that great, it's still O(n log² n). Under some circumstances we can get rid of one of these log(n) factors, as I'll describe below. But first I wonder whether what follows is the best that can be done, or if I am being silly and there's a faster or simpler way?

The idea of this method is not to build up the polynomial coefficients but to build up a vector of polynomial evaluations at evenly spaced points. Doing this for a linear factor “prototype” like x-1 on m points costs us O(m). Next we build a sparse vector that is 1 where its index matches a root (2 for double roots, and so on) and 0 otherwise. Now we can almost build the evaluation of the final polynomial by convolution, except that convolution adds its terms while we need the factors multiplied. The solution is simply to apply an elementwise log() to the evaluation of the “prototype” linear factor before the convolution and exp() afterwards. The value chosen for log(0) does not matter except for numerical stability, since after exp() we have to reset all roots to 0 anyway. With log(0)=0 one gets first order derivatives at the roots though. One can implement this either with the classic log/exp and a complex valued FFT, or with finite field log/exp and a real or finite field FFT. The last step, turning the evaluation vector of our polynomial into coefficients, can be done with a finite field FFT. This makes the whole thing run in O(m log m) time for a field size of m.
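A minimal sketch of the log / convolve / exp idea, under loud assumptions: it uses the tiny field GF(17) with generator 3, takes x-0 as the prototype linear factor, and does the cyclic convolution naively in O(m²) instead of with an FFT, so it only illustrates the data flow and not the speed. The names eval_from_roots and eval_direct are made up for this example.

```c
enum { P = 17, GEN = 3 };   /* assumed toy field GF(17); 3 generates its multiplicative group */

/* out[j] = value of prod_r (x - r)^mult[r] at x = j, for every j in GF(P).
 * mult[r] is the sparse root vector: 0 for no root, 1 for a root, 2 for a double root, ... */
static void eval_from_roots(const int mult[P], int out[P])
{
    int gexp[P - 1], glog[P];
    int v = 1;

    /* discrete log/exp tables; glog[] of the prototype x-0 evaluated at t is just glog[t] */
    for (int e = 0; e < P - 1; e++) {
        gexp[e] = v;
        glog[v] = e;
        v = v * GEN % P;
    }
    glog[0] = 0;            /* arbitrary stand-in for log(0), fixed up below */

    for (int j = 0; j < P; j++) {
        int sum = 0;
        /* naive cyclic convolution of the root vector with the log of the prototype */
        for (int r = 0; r < P; r++)
            sum += mult[r] * glog[(j - r + P) % P];
        /* exp() afterwards, and reset the roots themselves to 0 */
        out[j] = mult[j] ? 0 : gexp[sum % (P - 1)];
    }
}

/* reference: multiply the linear factors directly at one point */
static int eval_direct(const int mult[P], int x)
{
    int acc = 1;
    for (int r = 0; r < P; r++)
        for (int m = 0; m < mult[r]; m++)
            acc = acc * ((x - r + P) % P) % P;
    return acc;
}
```

For roots 2, 5 (double) and 11, both functions should agree at all 17 points; turning out[] into coefficients would be the separate finite field FFT step mentioned above.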

What's annoying about it is that the first part works with evenly spaced samples (aka an additive subgroup) while the second part, turning the evaluation into coefficients, works on a multiplicative subgroup of a finite field. In practice that means that while my roots lie along a multiplicative subgroup of GF(2¹⁶+1), I have to apply the rdft over the whole field, which kinda feels like a waste of CPU cycles.

Suggestions to improve this are welcome. Alternatively, if anyone knows of a linear time method to zero pad a block of size 2ⁿ to twice its size in the frequency domain, then the recursive multiplication variant should also run in O(n log n) time.

Filed under: Error Correcting Codes,Optimization — Michael @ 02:40

Comments (0)

2007-07-10

Forgotten code, noe and mina

libnoe

libnoe is a library to encode and decode Reed Solomon codes, which I wrote between 2002 and 2006.

noe

noe is an application which uses libnoe to generate an error correction file for some data file(s) and then uses that to correct a wide variety of possible errors, including having the data randomly chopped up and reordered. “noe” btw stands for “no error” in case you are wondering. Sadly I've never finished the noe application.

The basic idea of how noe would work is this. First, the data itself is left unchanged, as changing it would be inconvenient in many situations. The error correction file is made of many not too large packets; this ensures that any reordering which happens to the error correction file can be corrected by simply searching for the packet headers and looking at a sequence number in the header. The error correction packets would also contain some fingerprints of the data in the data file(s), for example every 100th or 1000th bit of the data file would be stored in some error correction packet. With these fingerprints it's possible to detect and correct reorderings which might have happened to the data file, even if just a random subset of the error correction packets is intact. The fingerprints as well as the headers of the error correction packets would contain some small checksums, to avoid confusing the code with many wrong values. Finally, the main content of the error correction packets would simply be interleaved RS codes, or more precisely the parity part of them. Btw, in case anyone is wondering how data can get randomly chopped up and reordered, think of a broken hard disk and fragmented files.

Patches to finish noe are of course welcome! :)

mina

mina is the MINimal Alternative, which my lazy self did finish. It simply takes a file and produces an error correction file which is just a bunch of interleaved RS codes (the parity part of them actually) with no header or anything. It also happily eats corrupted files and corrects them.

An example of mina's correction capability is below. Note the images have been converted to jpeg to reduce their size and make them visible in normal browsers. Raw damaged files as tar.gz are available too (mina dz lena.pnm.mina can be used to correct them).

damaged	recovered

Source code under GPL and a GIT repository are available too; it's also quite clean and does compile :). The history though is sadly quite incomplete, like with the other forgotten code. This time though it was IBM's fault, as my private CVS server with the whole history of noe was on an IBM deathstar disk and it seems I had no backup of the RCS files (this is also one of the reasons why I make all that stuff public now, to avoid it being lost due to some other hd failure or stupidity …)

patches are welcome !!! :)

Filed under: Error Correcting Codes — Michael @ 02:14

Comments (0)

2007-07-08

Reed Solomon codes part 2

Asymptotic complexity of best known (to me ;) ) decoding algorithm

O(n log n + t log² t) for a (n,k) RS code over GF(n+1) and t=n-k

The proof for this is quite easy. Syndrome calculation is just evaluating a polynomial at n-k points, and evaluating a polynomial over GF(n+1) at all points can be done with the GFFT; actually, evaluation at all points is the GFFT of the polynomial. Multiplying 2 polynomials is just 2x GFFT + componentwise multiplication + IGFFT. Finding the roots of a polynomial can likewise be done by just evaluating it at all points. The only nontrivial operation left for normal RS decoding is solving the key equation, which is equivalent to Euclid's GCD algorithm as well as Schönhage's GCD algorithm; the latter has O(t log² t) complexity (log² t == (log t)² in case that's unclear).

An alternative to GF(2^x)

Normally RS codes are built over GF(2^x). That way the elements of an RS codeword have a nice 1:1 mapping to x bits, which can then be stored or transmitted. But it has a big disadvantage: the GFFT for GF(y) needed for fast RS decoding is done with y-1 points, so it cannot use the well known power-of-2 style FFT algorithms, as 2^x – 1 is not a multiple of 2. The solution is to use GF(2^x+1). Note though that GF(2^x+1) does not exist for all integer values of x; it only exists if 2^x+1 is a power of a prime, that is p^j. 2 obvious choices using Fermat primes are GF(2⁸+1) and GF(2¹⁶+1).
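To make the point about power-of-2 transforms concrete, here is a rough sketch of a radix-2 transform over GF(2⁸+1) = GF(257). It assumes 3 as a generator of the multiplicative group (true for this Fermat prime) and a length n that is a power of two dividing 256; pow_mod, ntt and poly_eval are names invented for this example, and poly_eval is only a slow reference for checking the result.

```c
#include <stdint.h>

#define P 257u   /* 2^8 + 1, a Fermat prime, so P - 1 = 256 is a power of two */

static uint32_t pow_mod(uint32_t b, uint32_t e)
{
    uint32_t r = 1;
    b %= P;
    while (e) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

/* In-place radix-2 transform: afterwards a[k] is the input polynomial
 * evaluated at w^k, with w = 3^(256/n). n must be a power of two <= 256. */
static void ntt(uint32_t *a, unsigned n)
{
    uint32_t root = pow_mod(3, (P - 1) / n);

    /* bit reversal permutation */
    for (unsigned i = 1, j = 0; i < n; i++) {
        unsigned bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) { uint32_t t = a[i]; a[i] = a[j]; a[j] = t; }
    }
    /* butterflies */
    for (unsigned len = 2; len <= n; len <<= 1) {
        uint32_t wlen = pow_mod(root, n / len);
        for (unsigned i = 0; i < n; i += len) {
            uint32_t w = 1;
            for (unsigned j = 0; j < len / 2; j++) {
                uint32_t u = a[i + j];
                uint32_t v = a[i + j + len / 2] * w % P;
                a[i + j]           = (u + v) % P;
                a[i + j + len / 2] = (u + P - v) % P;
                w = w * wlen % P;
            }
        }
    }
}

/* naive reference: Horner evaluation of a[0] + a[1] x + ... at x */
static uint32_t poly_eval(const uint32_t *a, unsigned n, uint32_t x)
{
    uint32_t r = 0;
    for (unsigned i = n; i-- > 0; )
        r = (r * x + a[i]) % P;
    return r;
}
```

With n = 256 this evaluates a polynomial at all nonzero elements of the field, which is exactly what a field of size 2^x cannot offer with a plain power-of-2 transform.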

How do you store 2^x+1 values in 2^x values

Trivial ;)

The data part of our RS code is specified by the user, so it simply doesn't use the 2^x+1 th symbol; actually it would be messy to use it. So the only problem left are the n-k parity symbols, which can trivially be transformed to not contain the annoying 2^x+1 th symbol while at the same time maintaining the property of being an RS code.

Let us assume that we have a symbol (at position y with value yv) among our k input symbols which is guaranteed to have a value yv < 2^x – n + k, that is, in practice less than one unused bit. Let p be the RS codeword with the other k-1 data symbols 0 and the symbol at position y set to 1. The next step is to find all the values of the y element in our original codeword which would cause no parity symbol to have that annoying 2^x+1 th value. For encoding we simply select the yv th element of this list as the new yv element. For decoding we choose the number of elements in the list which are smaller than yv as our new element. As a last step we just need to add a scaled version of p, so as to actually have the wanted yv element and avoid the nasty too large elements while still having an RS code.

Filed under: Error Correcting Codes — Michael @ 20:31

Comments (2)

Reed Solomon codes

What is a reed solomon code

Let's assume we have k values out of which we want to build a Reed Solomon code. To do this we imagine that our k values specify the height (=y) of k points with x from 0 to k-1. So far we have really just made a silly graph of our data. Next we find a polynomial of order at most k-1 which goes exactly through these points; this polynomial is unique, no other polynomial of order at most k-1 will go through these points. And last we just evaluate this polynomial on the points 0 … n-1. These n values are a Reed Solomon code, simple isn't it? Note the first k values are just our input data values, which we already know.
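The construction can be written down almost literally. The sketch below is only an illustration under assumptions: it works over the prime field GF(257) so that ordinary modular arithmetic suffices, it uses slow Lagrange interpolation, and the names lagrange_eval and rs_encode are invented here.

```c
#include <stdint.h>

#define P 257u   /* assumed prime field, values 0..256 */

static uint32_t pow_mod(uint32_t b, uint32_t e)
{
    uint32_t r = 1;
    b %= P;
    while (e) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

/* value at x of the unique polynomial of degree < m through (xs[i], ys[i]) */
static uint32_t lagrange_eval(const uint32_t *xs, const uint32_t *ys, int m, uint32_t x)
{
    uint32_t sum = 0;
    for (int i = 0; i < m; i++) {
        uint32_t num = 1, den = 1;
        for (int j = 0; j < m; j++) {
            if (j == i) continue;
            num = num * ((x + P - xs[j]) % P) % P;
            den = den * ((xs[i] + P - xs[j]) % P) % P;
        }
        /* divide by den via Fermat's little theorem: den^(P-2) is its inverse */
        sum = (sum + ys[i] * num % P * pow_mod(den, P - 2)) % P;
    }
    return sum;
}

/* data[0..k-1] are the heights at x = 0..k-1; code[0..n-1] receives the codeword */
static int rs_encode(const uint32_t *data, int k, uint32_t *code, int n)
{
    uint32_t xs[257];
    if (k < 1 || k > n || n > 257)
        return -1;
    for (int i = 0; i < k; i++)
        xs[i] = (uint32_t)i;
    for (int x = 0; x < n; x++)
        code[x] = lagrange_eval(xs, data, k, (uint32_t)x);
    return 0;
}
```

For example the data 1, 4, 9 lies on (x+1)², so encoding it with n = 7 continues the sequence with 16, 25, 36, 49.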

Correcting erasures with RS codes

We now can make RS codes, but how can we correct erasures? Let's assume there are t erasures (erasures are errors where the location of the error is known). That means we know n-t values of our polynomial, and if t≤n-k then we can just find the remaining values by finding the (unique) polynomial which goes through the n-t values. It's also easy to show (just think of having only k-1 of your k data values) that if t>n-k then no code can correct the erasures, so RS codes are optimal in that sense.
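Erasure correction is then the same interpolation, just through whichever k points survived. Again a sketch under assumptions (GF(257), Lagrange interpolation, made-up names lagrange_eval and rs_fill_erasures), not a practical decoder:

```c
#include <stdint.h>

#define P 257u   /* assumed prime field, values 0..256 */

static uint32_t pow_mod(uint32_t b, uint32_t e)
{
    uint32_t r = 1;
    b %= P;
    while (e) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

/* value at x of the unique polynomial of degree < m through (xs[i], ys[i]) */
static uint32_t lagrange_eval(const uint32_t *xs, const uint32_t *ys, int m, uint32_t x)
{
    uint32_t sum = 0;
    for (int i = 0; i < m; i++) {
        uint32_t num = 1, den = 1;
        for (int j = 0; j < m; j++) {
            if (j == i) continue;
            num = num * ((x + P - xs[j]) % P) % P;
            den = den * ((xs[i] + P - xs[j]) % P) % P;
        }
        sum = (sum + ys[i] * num % P * pow_mod(den, P - 2)) % P;
    }
    return sum;
}

/* erased[x] != 0 marks position x as lost. Returns 0 after refilling the
 * erased positions, or -1 if fewer than k values are known (t > n-k). */
static int rs_fill_erasures(uint32_t *code, const int *erased, int n, int k)
{
    uint32_t xs[257], ys[257];
    int m = 0;

    if (k < 1 || k > n || n > 257)
        return -1;
    /* any k known points define the polynomial, take the first k */
    for (int x = 0; x < n && m < k; x++) {
        if (!erased[x]) {
            xs[m] = (uint32_t)x;
            ys[m] = code[x];
            m++;
        }
    }
    if (m < k)
        return -1;
    for (int x = 0; x < n; x++)
        if (erased[x])
            code[x] = lagrange_eval(xs, ys, k, (uint32_t)x);
    return 0;
}
```

With n = 7 and k = 3 this refills up to 4 erased positions; erase a fifth and it gives up, as it must.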

Correcting errors with RS codes

But what if we don't know where the errors are? Well, just try all possible error locations for 0, 1, …, t errors. Yes, this is not practical, but it's nice for proving the error correcting capability. Now if we have t actual errors and we guess their locations correctly, then we will find our correct polynomial and can correct the errors if we have at least k values left. The only thing we still need to find out is how large t can be so that we can't find a wrong polynomial before we find the correct one. The answer is trivial actually: a polynomial of order k-1 is uniquely defined by k points, so if we have t errors and guess all t error locations wrong then we effectively kill 2t points, and if fewer than k are left then we could end up with a wrong polynomial. So we can correct (n-k)/2 errors. More generally, Reed Solomon codes can correct any combination with 2*errors + erasures ≤ n-k.
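As a tiny worked check of that last inequality (the function name rs_can_correct is made up for this example):

```c
/* returns nonzero if an (n,k) RS code can handle this mix of errors and erasures */
static int rs_can_correct(int n, int k, int errors, int erasures)
{
    return errors >= 0 && erasures >= 0 && 2 * errors + erasures <= n - k;
}
```

For a (255,223) code there are n-k = 32 parity symbols, so 16 errors, or 32 erasures, or for example 10 errors plus 12 erasures are all fine, while 17 errors are not.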

Hamming distance

n-k+1. The proof is trivial (a smaller distance would contradict the error correcting capability).

Practice

The above is true if our data and code values are real, rational or integer numbers (and others), but these are quite difficult to handle in reality as they aren't bounded. Luckily all the above also works with finite fields, so we can just work with polynomials over GF(256) or similar, which has the nice property that you can store such values in bytes, while integers and reals can be quite hard to store in finite storage space.

Filed under: Error Correcting Codes — Michael @ 00:28

Comments (3)

2006-02-06

CRC32 vs. Adler32

You need to choose a checksum and dunno which? Well, the 2 most popular ones are the CRC32 and Adler32 checksums.

Speed / innermost loop

CRC32:

while(buffer<end -3){
    crc ^= le2me_32(*(uint32_t*)buffer); buffer+=4;
    crc =  tab[3][ crc     &0xFF]
          ^tab[2][(crc>>8 )&0xFF]
          ^tab[1][(crc>>16)&0xFF]
          ^tab[0][(crc>>24)     ];
}

Adler32:

while(buffer<end -3){
    s1 += *buffer++; s2+=s1;
    s1 += *buffer++; s2+=s1;
    s1 += *buffer++; s2+=s1;
    s1 += *buffer++; s2+=s1;
}

As we can see, Adler needs 12 adds and 4 1-byte reads, while CRC needs 1 4-byte read, 4 xors, 4 table lookups, 1 add, 3 ands and 3 shifts. Without benchmarking I would say Adler should be faster.
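The tab[4][256] tables used by the CRC32 loop are not shown above. Here is a minimal sketch of how they could be built and used, assuming the common reflected CRC-32 (0xEDB88320 is the bit-reversed form of 0x04C11DB7, with initial value and final XOR of 0xFFFFFFFF). The names crc32_init and crc32_slice4 are invented here, and the explicit little-endian byte assembly stands in for le2me_32.

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t tab[4][256];

static void crc32_init(void)
{
    /* tab[0] is the ordinary byte-at-a-time table */
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int j = 0; j < 8; j++)
            c = (c >> 1) ^ (0xEDB88320u & -(c & 1));
        tab[0][i] = c;
    }
    /* tab[t][i] is tab[t-1][i] pushed through one more zero byte */
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 4; t++)
            tab[t][i] = (tab[t - 1][i] >> 8) ^ tab[0][tab[t - 1][i] & 0xFF];
}

static uint32_t crc32_slice4(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    /* 4 bytes per iteration, same structure as the inner loop above */
    while (len >= 4) {
        crc ^= (uint32_t)buf[0]       | (uint32_t)buf[1] << 8 |
               (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
        crc = tab[3][ crc        & 0xFF]
            ^ tab[2][(crc >>  8) & 0xFF]
            ^ tab[1][(crc >> 16) & 0xFF]
            ^ tab[0][ crc >> 24        ];
        buf += 4;
        len -= 4;
    }
    /* leftover bytes one at a time */
    while (len--)
        crc = (crc >> 8) ^ tab[0][(crc ^ *buf++) & 0xFF];
    return ~crc;
}
```

After calling crc32_init() once, crc32_slice4() on the 9 ASCII bytes "123456789" should give the usual check value 0xCBF43926.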

Burst error detecting capabilities

CRC-32 can detect every burst error of 32 or less bits
Adler-32 will fail with at least one 17bit burst error (0 00000010 00000000 vs. 1 00000000 00000001)
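The 17 bit burst above is easy to reproduce. The sketch below is a plain, unoptimized Adler-32 (with the modulo 65521 that the inner loop above leaves out); the function name adler32 and the 3 byte test messages are just chosen for this example.

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t adler32(const uint8_t *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;   /* sum of the bytes, plus 1 */
        s2 = (s2 + s1) % 65521u;       /* sum of the running s1 values */
    }
    return s2 << 16 | s1;
}
```

The messages {0x00, 0x02, 0x00} and {0x01, 0x00, 0x01} differ exactly by the 17 bit burst 1 00000010 00000001, yet both end with s1=3 and s2=7, so both get the checksum 0x00070003.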

Bit error detecting capabilities

CRC-32 differs depending upon the generator polynomial, see misusing-crcs-for-error-correction
Adler-32 can detect every 2bit error within 65521 byte or so but will fail with some 3 bit errors

Random error detecting capabilities

CRC-32 will produce an evenly distributed checksum for messages >4 bytes
Adler-32 will produce a quite unevenly distributed checksum for small messages, so that the effective number of bits in the checksum is significantly reduced for short messages

adler32
len=     1, collisions=488296949(0.390637%), effective bits=7.999957
len=     2, collisions=  1903788(0.001523%), effective bits=16.002699
len=     4, collisions=   372789(0.000298%), effective bits=18.355140
len=     8, collisions=    94799(0.000076%), effective bits=20.330556
len=    16, collisions=    23902(0.000019%), effective bits=22.318296
len=    32, collisions=     6068(0.000005%), effective bits=24.296135
len=    64, collisions=     1573(0.000001%), effective bits=26.243837
len=   128, collisions=      669(0.000001%), effective bits=27.477278
len=   256, collisions=      437(0.000000%), effective bits=28.091651
len=   512, collisions=      331(0.000000%), effective bits=28.492453
len=  1024, collisions=      219(0.000000%), effective bits=29.088353
----
crc32
len=     1, collisions=488296915(0.390637%), effective bits=7.999957
len=     2, collisions=  1903782(0.001523%), effective bits=16.002703
len=     4, collisions=       22(0.000000%), effective bits=32.403708
len=     8, collisions=       29(0.000000%), effective bits=32.005159
len=    16, collisions=       30(0.000000%), effective bits=31.956249
len=    32, collisions=       28(0.000000%), effective bits=32.055785
len=    64, collisions=       39(0.000000%), effective bits=31.577738
len=   128, collisions=       39(0.000000%), effective bits=31.577738
len=   256, collisions=       32(0.000000%), effective bits=31.863140
len=   512, collisions=       32(0.000000%), effective bits=31.863140
len=  1024, collisions=       42(0.000000%), effective bits=31.470823

Note, generated with: check_collision.c

Filed under: Error Correcting Codes — Michael @ 19:27

Comments (0)

2006-01-12

Better CRC-32 polynom for correcting byte errors

In correcting-byte-and-burst-errors-with-crcs we have seen that the CRC-32 polynomial 0x04C11DB7 can correct 1 byte error in ~1 Mbit. Not that correcting single byte errors in such huge blocks is good for anything, but I've found a much better one, specifically 0x0D438219, which can correct one byte error in 9747877 bits.

Filed under: Error Correcting Codes — Michael @ 23:35

Comments (0)

2005-12-07

Correcting byte and burst errors with CRCs

Can CRCs be used to fix a whole messed up byte? Yes, as long as the codeword (data bits + crc bits) is shorter than the value in the b≤8 column of this table:

	generator polynomial	b≤2	b≤3	b≤4	b≤5	b≤6	b≤7	b≤8	b≤9	b≤10	b≤11	b≤12	b≤13	b≤14
CRC-4	0xF0000000	8	6	6	6	6	6	6	6	6	6	6	6	6
CRC-5	0x28000000	16	9	7	7	7	7	7	7	7	7	7	7	7
CRC-7	0xA2000000	18	18	9	9	9	9	9	9	9	9	9	9	9
CRC-7	0x12000000	34	28	12	9	9	9	9	9	9	9	9	9	9
CRC-7	0x6E000000	66	12	9	9	9	9	9	9	9	9	9	9	9
CRC-8	0x07000000	130	12	12	10	10	10	10	10	10	10	10	10	10
CRC-8	0x39000000	20	20	11	10	10	10	10	10	10	10	10	10	10
CRC-8	0xD5000000	96	24	12	10	10	10	10	10	10	10	10	10	10
CRC-12	0x80F00000	2050	555	16	16	16	14	14	14	14	14	14	14	14
CRC-12	0x80B00000	513	259	16	16	16	14	14	14	14	14	14	14	14
CRC-12	0x1F100000	68	68	21	15	15	14	14	14	14	14	14	14	14
CRC-15	0x8B320000	130	130	130	130	58	27	17	17	17	17	17	17	17
CRC-16	0x10210000	32770	7144	4181	175	19	19	19	18	18	18	18	18	18
CRC-16	0x80050000	32770	19	19	19	19	19	19	18	18	18	18	18	18
CRC-16	0xA0970000	16386	931	862	76	76	65	22	18	18	18	18	18	18
CRC-16	0xC5030000	7164	7164	188	188	164	145	28	18	18	18	18	18	18
CRC-16	0x90D90000	154	154	154	154	154	77	21	18	18	18	18	18	18
CRC-24	0x80510100	7164	7164	1028	1028	1028	1028	1028	1028	348	30	30	26	26
CRC-32	0x04C11DB7	376820511	376820511	30435040	14373578	14373578	3932619	1077949	49616	11995	5682	1731	732	40
CRC-32	0x404098E2	1026	1026	1026	1026	1026	1026	1026	1026	1026	1026	241	229	114
CRC-32	0x1EDC6F41	2147483650	258958121	193439312	62023781	3040389	1847228	603132	98658	4913	3356	1104	86	86
CRC-32	0x741B8CD7	114698	114698	16390	16390	6361	3955	1601	120	120	120	120	77	49
CRC-32	0xF4ACFB13	32770	32770	32770	32770	32770	32770	32770	32770	6508	3052	1696	152	152
CRC-32	0x32583499	32772	32772	11340	11340	6230	5348	324	324	324	324	156	44	34
CRC-32	0x20044009	32772	32772	3792	3792	3792	3792	620	360	302	302	52	52	52
CRC-32	0xA833982B	65540	65540	1928	1928	1928	1928	1928	1593	203	203	203	66	66
CRC-32	0x00210801	65540	65540	15207	15207	3211	3211	959	83	83	83	39	39	39

table was created with crc_test_v2.c

To actually correct byte errors the code below could be used, a complete example (crc_test_byte.c) is available too.

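/* Searches for an error pattern that fits into 8 bits.
 * crc is expected to be the calculated crc XOR the received crc,
 * len the codeword length in bits and G the generator polynomial.
 * Returns the distance in bits from the end of the codeword and stores
 * the error pattern in *error, or returns -1 if nothing was found. */
static int get_error(unsigned int crc, unsigned int len, int *error){
    unsigned int i;

    for(i=0; i<len; i++){
        if(!(crc & ~255)){ // what is left fits into 8 bits
            *error= crc;
            return i;
        }
        crc= (crc>>1) ^ (((G>>1)|0x80000000) & (-(crc&1))); // divide by x modulo G
    }
    return -1;
}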
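To show how such a search could be wired into an actual fix, here is a self-contained sketch. Everything around the search loop is an assumption made for this example: a bitwise MSB-first CRC with zero initial value and no final XOR, the 4 crc bytes appended big-endian after the data, and the invented names crc_msb, find_byte_error and fix_byte_error. The complete crc_test_byte.c may well do this differently.

```c
#include <stdint.h>
#include <stddef.h>

#define G 0x04C11DB7u   /* generator polynomial, x^32 term implicit */

/* bitwise MSB-first CRC, zero initial value, no final XOR (assumption) */
static uint32_t crc_msb(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;
        for (int b = 0; b < 8; b++)
            crc = (crc << 1) ^ (G & -(crc >> 31));
    }
    return crc;
}

/* same search as get_error(): divide the syndrome by x until it fits in 8 bits */
static int find_byte_error(uint32_t syn, unsigned len_bits, uint32_t *pattern)
{
    for (unsigned i = 0; i < len_bits; i++) {
        if (!(syn & ~255u)) {
            *pattern = syn;
            return (int)i;
        }
        syn = (syn >> 1) ^ (((G >> 1) | 0x80000000u) & -(syn & 1));
    }
    return -1;
}

/* cw holds the data followed by its 4 crc bytes (big-endian), len bytes in total.
 * Returns 0 if the codeword is clean, 1 if an error was fixed, -1 otherwise. */
static int fix_byte_error(uint8_t *cw, size_t len)
{
    size_t n = len - 4;
    uint32_t stored = (uint32_t)cw[n] << 24 | (uint32_t)cw[n + 1] << 16 |
                      (uint32_t)cw[n + 2] << 8 | cw[n + 3];
    uint32_t syn = crc_msb(cw, n) ^ stored;   /* calculated XOR received */
    uint32_t pattern;
    int pos;

    if (!syn)
        return 0;
    pos = find_byte_error(syn, (unsigned)(8 * len), &pattern);
    if (pos < 0)
        return -1;
    /* flip the damaged bits; pos counts bits from the end of the codeword */
    for (int b = 0; b < 8; b++) {
        if (pattern >> b & 1) {
            size_t bit = (size_t)pos + b;
            if (bit >= 8 * len)
                return -1;
            cw[len - 1 - bit / 8] ^= (uint8_t)(1u << (bit % 8));
        }
    }
    return 1;
}
```

Note that the returned position is not byte aligned: the pattern is reported as soon as it fits into the low 8 bits, so it may come back shifted by a few bits relative to the byte boundary, which is why the fix flips individual bits.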

Filed under: Error Correcting Codes — Michael @ 23:08

Comments (0)

2005-12-06

Correcting 2 bit errors with CRCs

1 and 2 bit errors can easily be corrected by using the following code snippet, a complete example is available too.

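    unsigned int i,v=1;
    for(i=0; i<BLOCK_SIZE ; i++){
        crctab[i][0]= i ? (v^1) : v; // syndrome of 1 bit error (i=0) or of 2 bit errors i bits apart
        crctab[i][1]= i;             // distance between the 2 errors
        v= (v<<1) ^ (G & (((int)v)>>31)); // multiply by x modulo G
    }

    qsort(crctab, BLOCK_SIZE, 2*sizeof(int), cmp);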

static int get_error(unsigned int crc, unsigned int len, int error[2]){
    int i;

    for(i=0; i<len ; i++){
        int *result= bsearch(&crc, crctab, BLOCK_SIZE, 2*sizeof(int), cmp);
        if(result){
            error[0]= i;
            error[1]= result[1] + i;
            return 1 + (result != crctab[0]);
        }
        crc= (crc>>1) ^ (((G>>1)|0x80000000) & (-(crc&1)));
    }
    return -1;
}

Note, if you want to use this in anything where speed matters, then you should replace the qsort/bsearch with some hash table. I didn't, as the C standard doesn't contain any usable hash table implementation and I was lazy.
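The cmp function passed to qsort/bsearch is not shown in the snippet. A plausible version, and this is only a guess at what the complete example does, compares the first element of each row (the syndrome) as an unsigned value. Unsigned matters here: the single bit entry has syndrome 1 and must end up as row 0 for the `result != crctab[0]` test to work, which a signed comparison would break as soon as some syndrome has its top bit set.

```c
#include <stdlib.h>

/* hypothetical cmp: orders table rows by their first element, read as unsigned.
 * Also works for bsearch, where the key is a pointer to the bare syndrome. */
static int cmp(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}
```

The subtraction-free return avoids the overflow that `return x - y;` would run into for values far apart.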

Filed under: Error Correcting Codes — Michael @ 00:18

Comments (0)

2005-12-05

(Mis)using CRCs for error correction

Cyclic Redundancy Check codes are commonly used for detecting errors, but they can also be used to correct single and multibit errors. To be able to correct e errors per code word and detect code words with d errors, the minimum Hamming distance of the code needs to be > 2e+d.
So what is the relation between the code word length and the minimum hamming distances for crc codes?

	generator polynomial	hd≥3	hd≥4	hd≥5	hd≥6	hd≥7	hd≥8
CRC- 4	0xF0000000	5	5	5	5	5	5
CRC- 5	0x28000000	31	4	4	4	4	4
CRC- 7	0xA2000000	15	15	7	7	7	7
CRC- 7	0x12000000	127	6	6	6	6	6
CRC- 7	0x6E000000	63	63	9	9	7	7
CRC- 8	0x07000000	127	127	8	8	8	8
CRC- 8	0x39000000	17	17	17	7	7	7
CRC- 8	0xD5000000	93	93	10	10	8	8
CRC-12	0x80F00000	2047	2047	13	13	12	12
CRC-12	0x80B00000	510	171	36	13	13	13
CRC-12	0x1F100000	65	65	65	21	13	11
CRC-15	0x8B320000	127	127	127	127	22	22
CRC-16	0x10210000	32767	32767	16	16	16	16
CRC-16	0x80050000	32767	32767	16	16	16	16
CRC-16	0xA0970000	32766	32766	83	83	24	24
CRC-16	0xC5030000	7161	496	85	36	24	22
CRC-16	0x90D90000	151	151	151	151	21	21
CRC-24	0x80510100	7161	7161	1030	1030	24	24
CRC-32	0x04C11DB7	4294967295	91638	3006	299	203	122
CRC-32	0x404098E2	1024	1023	1023	1023	1023	1023
CRC-32	0x1EDC6F41	2147483647	2147483647	5275	5275	209	209
CRC-32	0x741B8CD7	114695	114695	16392	16392	184	184
CRC-32	0xF4ACFB13	65534	65534	32768	32768	306	306
CRC-32	0x32583499	65538	65538	32770	32770	166	166
CRC-32	0x20044009	65538	65538	32770	32770	32	32
CRC-32	0xA833982B	65537	65537	65537	1091	113	89
CRC-32	0x00210801	65537	65537	65537	31	31	31

These were found using crc_test.c
Actually correcting errors is pretty trivial: for a single bit error, feeding the code below with the calculated crc XOR the received crc will give you the distance from the end where the error occurred. And yeah, if they match (crc=0) then there's no single bit error.

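/* crc is the calculated crc XOR the received crc, len the codeword length in bits.
 * Assumes a 32 bit unsigned int. */
int get_single_error(unsigned int crc, int len){
    int i;
    unsigned int v=1; // unsigned, so that the shifts below are well defined

    for(i=0; i<len ; i++){
        if(v == crc)
            return i;
        v= (v<<1) ^ (generator_polynom & -(v>>31)); // multiply by x modulo the generator
    }
    return -1;
}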
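A self-contained way to try this out, under assumptions of my choosing: a bitwise MSB-first CRC with generator 0x04C11DB7, zero initial value and no final XOR, and the crc stored big-endian after the data. The names crc_msb, syndrome and locate_single_bit are invented for this sketch.

```c
#include <stdint.h>
#include <stddef.h>

#define G 0x04C11DB7u   /* generator polynomial, x^32 term implicit */

/* bitwise MSB-first CRC, zero initial value, no final XOR (assumption) */
static uint32_t crc_msb(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;
        for (int b = 0; b < 8; b++)
            crc = (crc << 1) ^ (G & -(crc >> 31));
    }
    return crc;
}

/* calculated crc XOR received crc; cw is the data followed by 4 crc bytes */
static uint32_t syndrome(const uint8_t *cw, size_t len)
{
    size_t n = len - 4;
    uint32_t stored = (uint32_t)cw[n] << 24 | (uint32_t)cw[n + 1] << 16 |
                      (uint32_t)cw[n + 2] << 8 | cw[n + 3];
    return crc_msb(cw, n) ^ stored;
}

/* walks v = x^i mod G until it matches the syndrome; i is the distance
 * of the flipped bit from the end of the codeword, -1 if nothing matches */
static int locate_single_bit(uint32_t syn, int len_bits)
{
    uint32_t v = 1;
    for (int i = 0; i < len_bits; i++) {
        if (v == syn)
            return i;
        v = (v << 1) ^ (G & -(v >> 31));
    }
    return -1;
}
```

Flipping the bit that sits 37 bits from the end of a 96 bit codeword should make locate_single_bit(syndrome(cw, 12), 96) return 37, after which `cw[12 - 1 - 37/8] ^= 1 << (37 % 8)` undoes the damage.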

I'll post some code examples to fix multibit errors later …

Filed under: Error Correcting Codes — Michael @ 20:23

Comments (6)
