Lair Of The Multimedia Guru

2005-12-06

Correcting 2 bit errors with CRCs

1 and 2 bit errors can easily be corrected by using the following code snippet, a complete example is available too.

    unsigned int i,v=1;
    for(i=0; i<block_size ; i++){
        crctab[i][0]= i ? (v^1) : v;
        crctab[i][1]= i;
        v= (v<<1) ^ (G & (((int)v)>>31));
    }

    qsort(crctab, BLOCK_SIZE, 2*sizeof(int), cmp);

static int get_error(unsigned int crc, unsigned int len, int error[2]){
    int i;

    for(i=0; i<len ; i++){
        int *result= bsearch(&crc, crctab, BLOCK_SIZE, 2*sizeof(int), cmp);
        if(result){
            error[0]= i;
            error[1]= result[1] + i;
            return 1 + (result != crctab[0]);
        }
        crc= (crc>>1) ^ (((G>>1)|0x80000000) & (-(crc&1)));
    }
    return -1;
}

Note, if you want to use this in anything where speed matters, then you should replace the qsort/bsearch with some hash table, i didnt as the c stadard doesnt contain any useable hashtable implementation and i was lazy

Filed under: Error Correcting Codes — Michael @ 00:18

2005-12-05

(Mis)using CRCs for error correction

Cyclic Redundancy Check codes are commonly used for detecting errors, but they can also be used to correct single and multibit errors. To be able to correct e errors per code word and detect code words with d errors, its needed for the minimum hamming distance of a code to be > 2e+d
So what is the relation between the code word length and the minimum hamming distances for crc codes?

generator polyom hd≥3 hd≥4 hd≥5 hd≥6 hd≥7 hd≥8
CRC- 4 0xF0000000 5 5 5 5 5 5
CRC- 5 0x28000000 31 4 4 4 4 4
CRC- 7 0xA2000000 15 15 7 7 7 7
CRC- 7 0x12000000 127 6 6 6 6 6
CRC- 7 0x6E000000 63 63 9 9 7 7
CRC- 8 0x07000000 127 127 8 8 8 8
CRC- 8 0x39000000 17 17 17 7 7 7
CRC- 8 0xD5000000 93 93 10 10 8 8
CRC-12 0x80F00000 2047 2047 13 13 12 12
CRC-12 0x80B00000 510 171 36 13 13 13
CRC-12 0x1F100000 65 65 65 21 13 11
CRC-15 0x8B320000 127 127 127 127 22 22
CRC-16 0x10210000 32767 32767 16 16 16 16
CRC-16 0x80050000 32767 32767 16 16 16 16
CRC-16 0xA0970000 32766 32766 83 83 24 24
CRC-16 0xC5030000 7161 496 85 36 24 22
CRC-16 0x90D90000 151 151 151 151 21 21
CRC-24 0x80510100 7161 7161 1030 1030 24 24
CRC-32 0x04C11DB7 4294967295 91638 3006 299 203 122
CRC-32 0x404098E2 1024 1023 1023 1023 1023 1023
CRC-32 0x1EDC6F41 2147483647 2147483647 5275 5275 209 209
CRC-32 0x741B8CD7 114695 114695 16392 16392 184 184
CRC-32 0xF4ACFB13 65534 65534 32768 32768 306 306
CRC-32 0x32583499 65538 65538 32770 32770 166 166
CRC-32 0x20044009 65538 65538 32770 32770 32 32
CRC-32 0xA833982B 65537 65537 65537 1091 113 89
CRC-32 0x00210801 65537 65537 65537 31 31 31

These where found using crc_test.c
Actually correcting errors is pretty trivial, for a single bit error, feeding the code below with the calculated crc XOR the received crc, will give you the distance from the end where the error ocured, and yeah if they match (crc=0) thn theres no single bit error

int get_single_error(int crc, int len){
    int i, v=1;

    for(i=0; i<len ; i++){
        if(v == crc)
            return i;
        v= (v<<1) ^ (generator_polynom & (v>>31));
    }
    return -1;
}

ill post some code examples to fix multibit errors later …

Filed under: Error Correcting Codes — Michael @ 20:23

2005-12-03

PP filter comparission update

Ive added some -vf uspp (PP by (mis)using the snow codec) based images to pp vs. spp filters, the reason for uspp=6 instead of higher values is that higher ones wake the OOM-killer :-) the reason for the lack of pp=ac/ha/va based samples is that they look identical to the pp=de/hb/vb

Filed under: Pictures,PostProcessing,Wavelets — Michael @ 14:21

2005-11-28

15 reasons why MPEG4 sucks

  • no loop filter to reduce blocking artefacts, H.263 upon which MPEG4 is based had one, and the thing people most often complain about when looking at MPEG4 videos are blocking artefacts, why cant postprocessing filters cover this up? well its much harder to remove multiple overlapping and randomliy shifted bocking artefacts instead of ones which are exactly at a 8×8 grid and only around macroblocks with specific parameters …
  • block boundary mirroring for qpel, this is AFAIK needed because there is no deblocking loop filter, if mirroring wouldnt be done then the filter would cross many more edges created by blocking artefacts and create a random mess, mirroring also reduces the number of bytes which need to be fetched for each block, ok but why am i complaining? well its a mess to implement and it forces the encoder to do redundant calculations as each block needs things mirrored differently
  • all the startcode avoidance and redundant marker bits, there are really just 2 possibilities, either the stream is undamaged in which case we dont need to search for startcodes unless we try to seek in raw mpeg4 (.m4v NOT .mp4!) or it is damaged in which case the data can contain anything anyway, now if all this mess was for seeking in raw mpeg4 why are there no reliably timestamps? additionally all non-MPEG/ITU codecs work perfectly fine without this startcode emulation prevention thing
  • OBMC is in the spec, theres a bit in the header to enable it but oops its not allowed in ANY profile
  • IDCT see prevous blog entry
  • GMC is very computationally intensive and according to some documents from the JVT the only reason why there is any quality/compression gain at all is that the motion vector for skiped macroblocks is the GMC vector where without GMC it would be (0,0) now simply changing that to the predicted median vector should make the tiny gain GMC has dissapear
  • a single set of fixed vlc tables, why? its just silly, several fixed sets or variable ones stored in a global header would mean very little additional complexity IMHO at least
  • short header format (=every MPEG4 decoder must support simple H.263)
  • different motion vector prediction in bframes, why? using the same as in P frames would be simpler and very likely higher quality
  • no intra macroblocks in b frames
  • macroblocks in B frames must be skiped if they where skiped in the last P frame, thanks to this idiotic idea encoders must check all b frames before they can mark a MB in a P frame as skiped
  • no macroblocks with 4mv and dquant at the same time, requires special handling in the encoder and no advantage at all, h.263 demonstrates how to do it correctly
  • all the “flash” animation multi object stuff in MPEG4
  • dquant is limited to +-2 so encoders must somehow convert the ideal QP values from lumimasking/… to ones which dont change too much between MBs
  • QP is limited to either even or odd values in b frames, yeah poor encoder must decide which it wants, cant have both, hmm and according to the comments in ffmpeg more code is needed for working around all these arbitarray mpeg4 QP limitations about then the actual QP selection needs
Filed under: VideoCoding — Michael @ 21:31

The MPEG1/2/4 and H.261/2/3 IDCT

The 8×8 Inverse discrete cosine transform used in most video codecs converts the very sparse and thus easily compressible dct coefficient matrixes into the 8×8 blocks vissible or the 8×8 difference relative to some area from the previous frame
The IDCT in these codecs is not exactly specified, instead the MPEG and ITU commitees only require the used IDCT to be approximately equal to a idealized and very slow reference IDCT due to that, decoders and encoders use various different IDCTs, if the actually used encoder and decoder happen to use 2 different enough IDCTs and the video material, encoding parameters, moon phase and so on match then the tiny +-1 differences between the IDCTs will accumulate and turn the pale face of a corpse pink or green or add stripes or worse …
some examples of such artifacts:

LLM/IJG int simpleidct
libmpeg2 idct xvids idct
LLM/IJG int simpleidct
libmpeg2 idct xvids idct

Its also very important to keep in mind that these artifacts are caused by the difference between the encoder and decoder IDCT and not the difference to some idealized IDCT only on the decoder side, so using the reference IDCT on the decoder side will often not help at all, and sometimes might make the artifacts worse
now maybe you are curious which IDCT was used in the encoders for the 2 examples above, well iam curious too about that as theres no 100% certain way to find out, allthough i would guess the first used the idct from the msmpeg4v3 codec and the second the one from xvid
Which prameters can be tuned on the encoder side to reduce these artifacts? First use a smaller keyframe interval, then avoid qp=1, avoid qpel and use lots of b frames or use H.264 which doesnt suffer from this mess as it has a exactly specified idct approximation

Filed under: DCT,VideoCoding — Michael @ 16:37

pp vs. spp filters

Comparing the pp and spp filters from MPlayer/ffmpeg with the standard lena 256×256 image, the unfiltered one is a simple 10% quality jpeg saved by gimp

Original unfiltered
pp=hb/vb/dr/fq=8 pp=hb/vb/dr/fq=16 pp=hb/vb/dr/fq=32
spp=6:8 spp=6:16 spp=6:32
uspp=6:8 uspp=6:16 uspp=6:32
Filed under: Pictures,PostProcessing — Michael @ 03:33

2005-11-27

The snow wavelet

The default wavelet used by the snow codec in ffmpeg is a symmetric biorthogonal compact 9/7 wavelet with rational coefficients with properties very similar to the famous biorthoginal daubechies 9/7 wavelet, it was found by bruteforce search with the goal of having a simple lifting implemenattion and having many vanishing moments or almost vanishing moments, the daubechies wavelet has (4,4) our wavelet has (4,2) vanishing moments
The lifting coefficients are:

dau97 snow
-0.443506852 -15/32
-0.882911075 -4/5
0.052980118 1/16
1.586134342 3/2

The resulting wavelet and scaling functions look like:

One thing which is still annoying is the 4/5, while this can be approximated pretty well by a multiplication and a shift right but this tends to overflow so something like

x += x+x;
x += x>>4;
x += x>>8;
x >>= 2;

is needed, this one only differs from 4/5 by a factor of ~0.00001, alternatively the 4/5 can be merged into the unquantization step so that

\ | /|\ | /|\ | /|\ | /|\
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   -15/32
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
  |  +  |  +  |  +  |  +   -4/5
\ | /|\ | /|\ | /|\ | /|\
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   +1/16
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
  |  +  |  +  |  +  |  +   +3/2
is changed to
\ | /|\ | /|\ | /|\ | /|\
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   -3/8
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
 (|  + (|  + (|  + (|  +   -1
\ + /|\ + /|\ + /|\ + /|\  +1/4
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   +1/16
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
  |  +  |  +  |  +  |  +   +3/2

Now lets compare quality between the snow and daubechies 9/7 wavelets, this comparission has been done with JasPer with this change and tinypsnr (from ffmpeg) The used test image was the 256×256 lena grayscale image

Daubechies 9/7 wavelet Snows 9/7 wavelet
size PSNR
618 24.02
1283 27.32
3194 31.80
6503 36.82
13079 42.93
22945 47.14
size PSNR
622 24.03
1295 27.33
3212 31.82
6519 36.83
12981 42.87
22941 47.18
Filed under: Wavelets — Michael @ 22:09

2005-11-25

Fast CRC

Ok kids, today we will look at how to quickly calculate a crc checksum, the naive way is to xor the crc polynomial with various shifts into the data to remove all 1 bits except the last 32 (or whatever length our crc polynom has) which from a mathemathical point of view is the data_polynom modulo the crc_polynom over GF(2)
now noone does that except maybe a few very smart or very stupid people :-) everyone else uses a 8bit->32bit look up table which simply contains the 32bit crc for each 8bit data polynom, more precissely:
LUT[i]= (i<<|CRC_POLY|) GF2_MOD CRC_POLY where GF2_MOD is the modulo operation between 2 binary polynoms over GF(2)
the actual code which uses this LUT looks approximately like:

for (i = 0; i < length; i++)
    crc = LUT[(crc&0xFF) ^ *buffer++] ^ (crc >>8);

a much faster method is to use 4 LUTs so that they gives the crc checksum 0,1,2,3 bytes ahead:
LUT[j][i]= (i<<(|CRC_POLY| + j*8)) GF2_MOD CRC_POLY
with that we can calculate the crc then by:

while(buffer<end-3){
    crc ^= le2me_32(*(uint32_t*)buffer); buffer+=4;
    crc =  tab[3][ crc     &0xFF]
          ^tab[2][(crc>>8 )&0xFF]
          ^tab[1][(crc>>16)&0xFF]
          ^tab[0][(crc>>24)     ];
}
while(buffer<end)
    crc = tab[0][(crc&0xFF) ^ *buffer++] ^ (crc >>8);

actually working code should be in ffmpeg/libavutil soon, if not ill post my messy proof of concept code

Filed under: Optimization — Michael @ 02:43

2005-11-23

Testing Upload

Below you will hopefully find images of various amphibians i stumbled across with my camera about 6 month ago.
Frog1Toad1Frog2Frog3Frog4

Filed under: Nature,Pictures — Michael @ 20:59

Fast approximate max of 2 integers

Maybe you are lucky and your cpu instructon set has a fast maximum_of_2_integers instruction, but more likely you wont be so lucky, in that case a simple bitwise OR might be worth a thought.
Note: max(a,b) ≤ a|b < 2 * max(a,b)
This approximation is surprissingly more usefull then it seems at first

Filed under: Optimization — Michael @ 07:27
« Previous PageNext Page »

Powered by WordPress