Lair Of The Multimedia Guru

2006-01-20

Rounding

When working with integers or fixed-point math, or when converting floats to integers, we need to round somehow. But which way is best?
In school they probably told you to round to nearest and to round halfway cases away from zero: ±x.0000 … ±x.4999 to ±x and ±x.5 … ±x.9999 to ±(x+1). That's actually not optimal.

First decision: round to nearest or not? Rounding to nearest is pretty clear already: it means to round to the nearest value whenever there is a single nearest one, and that's obviously more accurate than not doing so. Not rounding to nearest has one big advantage: it's often, though not always (the PAVGB MMX instruction, for example), faster (x>>2 vs. (x+2)>>2).

Second decision: what to do with x.5. It seems not to matter much, but actually it often does: rounding halfway cases toward +inf or toward -inf is easy to implement ((x+2)>>2 and (x+1)>>2, for example, for a 2 bit shift), but it adds an ugly systematic bias, which can be problematic if values are used in long calculations and get rounded many times.
The solution, rounding halfway cases to even (or odd), doesn't add such a bias and is thus better. Alternatively you could repeatedly flip the direction in which halfway cases are rounded; MPEG-4 for example does that in its motion compensation code …
To round to even you could do something like (x + ((x>>1)&1))>>1 or (x+1+((x>>2)&1))>>2.
To round to odd you could do something like (x>>1)|(x&1) or (x+2-((x>>2)&1))>>2.

You might also be stuck with instructions which do something like (a*constant)>>16; to reduce their rounding error you can simply add (1<<15)/constant to a.

Filed under: Optimization — Michael @ 22:56

2006-01-15

Universal vlc codes

It's quite well known that to optimally store some variable x with value v, we should use -log2(p) bits, where p is the probability that x has value v. Obviously the decoder has to know the probability distribution the encoder used, or the bitstrings generated from it, to be able to decode anything …
Now if we limit ourselves to an integer count of bits, then Huffman codes are optimal, and building a Huffman code is trivial: just recursively merge the 2 least probable symbols. But if there are many different variables with different probability distributions, or the maximum value is large or unbounded, or you don't know the probabilities, or you want low complexity, then Huffman codes might be a bad choice.

Universal vlc codes might be a better choice: they all assume that small values are more likely, and they often achieve very good compression rates.
Here's a table of common ones:

v  Unary      Unary2     Exp Golomb  Exp Golomb2  Fibonacci  Rice k=1  Rice k=2
0  0          0          0           0            11         00        000
1  10         11         100         101          011        01        001
2  110        101        101         111          0011       100       010
3  1110       1001       11000       10001        1011       101       011
4  11110      10001      11001       10011        00011      1100      1000
5  111110     100001     11010       11001        10011      1101      1001
6  1111110    1000001    11011       11011        01011      11100     1010
7  11111110   10000001   1110000     1000001      000011     11101     1011
8  111111110  100000001  1110001     1000011      100011     111100    11000

Note: Exp Golomb2 is identical to Exp Golomb except that it is a reversible vlc, i.e. it can be decoded from right to left too; the same applies to Unary2 and Unary.

The smart reader will probably have noticed that Exp Golomb coding is simply unary coding of the number of bits, followed by the bits themselves, and that Rice coding is unary coding followed by k fixed bits. More generally, you can take any vlc code to encode some “class” and then give every such class its own number of additional bits.

The ideal choice of uvlc code depends upon the probability distribution of the values which will be encoded; there is no single perfect one.

Well, hmm, there was something else … ah yes, negative numbers. We were only dealing with positive ones above; negative ones are trivial: just map them to odd positive ones and map the actual positive ones to even ones, or the other way around.
In C the following will do the trick quickly (assuming 32-bit int and an arithmetic right shift):

v = -2*i-1;
v ^= v>>31;

and the following will reverse it:

(v>>1) ^ -(v&1)
Filed under: Entropy Coding — Michael @ 22:18

2006-01-14

DHL Europack

You order some Japanese kitchen knives from dick.biz, which are supposed to become a present for your mother, yeah maybe you shouldn't order from a company with such a name … The stuff gets sent with DHL Europack on 8.12.2005. A few days later you get an invoice but nothing else; on it is the tracking number for the package, so you check via tracknet.deutschepost.de where the package went. It says:
Letztes Sendungsverfolgungsereignis:
Erfolgloser Zustellversuch, Firma erloschen / unbekannt(Zustellbasis Information): Guntramsdorf (tof) / AT, 2005-12-12
or in English: the recipient doesn't exist :)
I must note that I was at home most of the time, they had my phone number and email address, and there is no trace that they were ever here or tried to contact me.
I sent the company I ordered the stuff from an email on the 16th, which was ignored entirely.
On the 22nd I tried again and threatened them a little; I got a phone call telling me the package was being sorted and that the info from the online tracking was wrong, yeah sure …
On the 30th I got a letter informing me that they will transfer my money back, which obviously indicates that the package went back too. I wrote them another email the next day, which again was ignored, so I tried DHL: the first mail was ignored, for the second, sent via some online form, I got the answer:
aus datenschutzrechtlichen Gründen dürfen wir Angaben zum Sendungsstatus nur
an den Absender weitergeben. Bitte wenden Sie sich an Ihre Postverwaltung
oder den Absender um Ihre Informationen zu bekommen. Vielen Dank.
or in English: no, for data protection reasons we may only give information about the status of a shipment to the sender; please contact your postal administration or the sender to get your information. Thank you.

Suggestions, anyone?
Try to order again from there and hope it reaches me before next Xmas?
Order from somewhere else? (They seemed to be the cheapest, though.)

Filed under: Off Topic — Michael @ 00:14

2006-01-12

Better CRC-32 polynom for correcting byte errors

In correcting-byte-and-burst-errors-with-crcs we saw that the CRC-32 polynomial 0x04C11DB7 can correct 1 byte error in a ~1 mbit codeword. Not that correcting single byte errors in such huge blocks is good for anything, but I've found a much better polynomial: 0x0D438219 can correct one byte error in 9747877 bits.

Filed under: Error Correcting Codes — Michael @ 23:35

2005-12-30

Principal components analysis / Karhunen-Loève transform

The PCA/KLT is often said to be the optimal linear transform for video coding, while the DCT is a similar, faster and simpler transform.
This isn't really true: the KLT is not that optimal at all for video coding. What the KLT is, is the optimal orthogonal transform for compacting the energy of a vector into its first n components. What might be better is a transform, preferably a (near) orthogonal one, which compacts the energy into few components, not necessarily the first few!
For example, consider 1D 8-component vectors and assume our data set is entirely made of piecewise constants. Then the following basis functions would allow us to store these exactly with 1+n basis functions, where n is the number of discontinuities. Note that this is just an example where the KLT isn't that optimal; 1D piecewise constants are probably not an ideal model for images …

1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1

OK, this is quite a non-orthogonal set of basis functions, which has its problems too. A less compact but orthogonal (if the basis functions are normalized, which for the sake of readability hasn't been done) and still better-than-KLT transform would be based upon the following basis functions:

1 1 1 1 1 1 1 1
1 1 1 1-1-1-1-1
1 1-1-1 0 0 0 0
0 0 0 0 1 1-1-1
1-1 0 0 0 0 0 0
0 0 1-1 0 0 0 0
0 0 0 0 1-1 0 0 
0 0 0 0 0 0 1-1

Here a piecewise constant with 1 discontinuity can be represented by 2-4 basis functions, whereas the KLT case below will need all 8 for an exact representation:

 0.290596  0.341669  0.377247  0.395479  0.395472  0.377227  0.341853  0.290749  
-0.490579 -0.416017 -0.277788 -0.097204  0.097447  0.277903  0.415566  0.490129      
 0.473927  0.255266 -0.138718 -0.437652 -0.436948 -0.137245  0.255345  0.473804       
-0.415204  0.097424  0.490934  0.276661 -0.277947 -0.490521 -0.097391  0.416181       
-0.382979  0.320320  0.364628 -0.343399 -0.342868  0.364783  0.320177 -0.383024       
-0.278541  0.491575 -0.099199 -0.414873  0.416580  0.096333 -0.489054  0.277157        
-0.210508  0.464503 -0.455644  0.190764  0.181391 -0.451288  0.463825 -0.210293        
-0.096505  0.274820 -0.412134  0.489524 -0.491862  0.418348 -0.280951  0.098971

This has been generated by (the hopefully not buggy) pca.c/pca.h.

Another way to see the sub-optimality of the KLT is to consider the last (high-frequency) basis functions: in many images and videos they are simply not used. If they were replaced by commonly occurring patterns, which without them would need many basis functions to be represented accurately, then the compression rate could be improved; the resulting transform would be non-orthogonal, though …

Filed under: DCT,VideoCoding — Michael @ 01:36

2005-12-28

Extracting scantables from binary video codecs

What's a scantable? A thing that tells you where to put the coefficients you get from vlc/rle decoding. For example, take the famous zigzag table:

    0,   1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63

the first coefficient would be written at position 0, the second at 1, the third at 8, and so on. As the destination array is an 8×8 matrix here, this produces a nice zigzag from the top-left corner to the bottom-right corner, ordering the DCT coefficients approximately from low frequency to high; and as the high ones are almost always zero, the vlc-rle coding before ends up being quite effective. But back to the topic: what if we, for whatever odd reason, want to know whether a binary contains such a table?
It's quite easy to find such tables by brute force: their size is something like 16 or 64 entries with no duplicates, the first entry is most often 0, and none of the entries should be larger than the number of entries in the table. Here is some (old) code which I wrote quite some time ago to implement this and a few other things.

And now the interesting part: what do we find with this?

 0  1  4  8 
 5  2  3  6 
 9 12 13 10 
 7 11 14 15

o-->o   o-->o
  /   /   /
o   o   o   o
| /   /   / |
o   o   o   o
  /   /   /
o-->o   o-->o


and

 0  1  2  6 
10  3  7 11 
 4  8  5  9 
12 13 14 15  

 o-->o-->o   o
         |  /|
 o   o   o / o
 | / |   |/  |
 o   o   o   o
   /
 o-->o-->o-->o
(dual scan table ascii art stolen from libavcodec/svq3.c)

in drv3.so.6.0
Note: these tables are the 4×4 zigzag and dual scan tables from an old H.264 draft, so RV30 seems to be an H.264 variant like SVQ3.

Filed under: Reverse Engineering,VideoCoding — Michael @ 00:43

2005-12-20

Profiling / Benchmarking code

You have a piece of code, it's too slow, so you want to optimize it. But to do that you first need to be able to measure its speed; otherwise you won't know whether a change improved things or not.

Benchmarking a piece of code sounds easy, but it isn't. You can try to do it with various standard “get the current time” functions, but these tend to be very inaccurate. Canned tools like gprof are an option too, but in my experience they are also inaccurate and can produce highly misleading results due to function inlining. A more accurate method on the x86 architecture is to use rdtsc.

So now we have an accurate method to get the current “time”, but what now? Simply running the code once and getting the time before and afterwards? This would be a very bad choice: the first time a piece of code is executed it's not in the code cache, the data isn't in the data cache, and branch prediction hasn't seen the code yet. So we could ignore the first few iterations, but a single measurement still isn't very trustworthy. The next improvement would be to average many iterations, but that is still very sensitive: just think about what happens when the kernel decides to stop your program and give the CPU to your mp3 player; it's like having one iteration which needs 50 times longer than the average (assuming the code between the rdtsc instructions doesn't take too long). Solving this is easy, though: just discard such outliers which need many times the average (see the START/STOP_TIMER macros in ffmpeg/libavutil/common.h).

Filed under: Optimization — Michael @ 13:27

2005-12-07

Correcting byte and burst errors with CRCs

Can CRCs be used to fix a whole messed-up byte? Yes, as long as the codeword (data bits + crc bits) is shorter than what is in this table under b≤8:

generator polynomial b≤2 b≤3 b≤4 b≤5 b≤6 b≤7 b≤8 b≤9 b≤10 b≤11 b≤12 b≤13 b≤14
CRC-4 0xF0000000 8 6 6 6 6 6 6 6 6 6 6 6 6
CRC-5 0x28000000 16 9 7 7 7 7 7 7 7 7 7 7 7
CRC-7 0xA2000000 18 18 9 9 9 9 9 9 9 9 9 9 9
CRC-7 0x12000000 34 28 12 9 9 9 9 9 9 9 9 9 9
CRC-7 0x6E000000 66 12 9 9 9 9 9 9 9 9 9 9 9
CRC-8 0x07000000 130 12 12 10 10 10 10 10 10 10 10 10 10
CRC-8 0x39000000 20 20 11 10 10 10 10 10 10 10 10 10 10
CRC-8 0xD5000000 96 24 12 10 10 10 10 10 10 10 10 10 10
CRC-12 0x80F00000 2050 555 16 16 16 14 14 14 14 14 14 14 14
CRC-12 0x80B00000 513 259 16 16 16 14 14 14 14 14 14 14 14
CRC-12 0x1F100000 68 68 21 15 15 14 14 14 14 14 14 14 14
CRC-15 0x8B320000 130 130 130 130 58 27 17 17 17 17 17 17 17
CRC-16 0x10210000 32770 7144 4181 175 19 19 19 18 18 18 18 18 18
CRC-16 0x80050000 32770 19 19 19 19 19 19 18 18 18 18 18 18
CRC-16 0xA0970000 16386 931 862 76 76 65 22 18 18 18 18 18 18
CRC-16 0xC5030000 7164 7164 188 188 164 145 28 18 18 18 18 18 18
CRC-16 0x90D90000 154 154 154 154 154 77 21 18 18 18 18 18 18
CRC-24 0x80510100 7164 7164 1028 1028 1028 1028 1028 1028 348 30 30 26 26
CRC-32 0x04C11DB7 376820511 376820511 30435040 14373578 14373578 3932619 1077949 49616 11995 5682 1731 732 40
CRC-32 0x404098E2 1026 1026 1026 1026 1026 1026 1026 1026 1026 1026 241 229 114
CRC-32 0x1EDC6F41 2147483650 258958121 193439312 62023781 3040389 1847228 603132 98658 4913 3356 1104 86 86
CRC-32 0x741B8CD7 114698 114698 16390 16390 6361 3955 1601 120 120 120 120 77 49
CRC-32 0xF4ACFB13 32770 32770 32770 32770 32770 32770 32770 32770 6508 3052 1696 152 152
CRC-32 0x32583499 32772 32772 11340 11340 6230 5348 324 324 324 324 156 44 34
CRC-32 0x20044009 32772 32772 3792 3792 3792 3792 620 360 302 302 52 52 52
CRC-32 0xA833982B 65540 65540 1928 1928 1928 1928 1928 1593 203 203 203 66 66
CRC-32 0x00210801 65540 65540 15207 15207 3211 3211 959 83 83 83 39 39 39

The table was created with crc_test_v2.c.

To actually correct byte errors, the code below can be used; a complete example (crc_test_byte.c) is available too.

static int get_error(unsigned int crc, unsigned int len, int *error){
    int i;

    for(i=0; i<len; i++){
        if(!(crc & ~255)){  /* syndrome fits in one byte -> error pattern found */
            *error= crc;
            return i;
        }
        /* step the syndrome back by one bit (multiply by x^-1 mod generator) */
        crc= (crc>>1) ^ (((G>>1)|0x80000000) & (-(crc&1)));
    }
    return -1;
}
Filed under: Error Correcting Codes — Michael @ 23:08

2005-12-06

Correcting 2 bit errors with CRCs

1 and 2 bit errors can easily be corrected by using the following code snippet; a complete example is available too.

    unsigned int i, v=1;
    for(i=0; i<BLOCK_SIZE; i++){
        crctab[i][0]= i ? (v^1) : v;  /* syndrome of 2 bit errors i apart (i=0: single bit) */
        crctab[i][1]= i;
        v= (v<<1) ^ (G & (((int)v)>>31));
    }

    qsort(crctab, BLOCK_SIZE, 2*sizeof(int), cmp);  /* sort so bsearch can find syndromes */

static int get_error(unsigned int crc, unsigned int len, int error[2]){
    int i;

    for(i=0; i<len ; i++){
        int *result= bsearch(&crc, crctab, BLOCK_SIZE, 2*sizeof(int), cmp);
        if(result){
            error[0]= i;
            error[1]= result[1] + i;
            return 1 + (result != crctab[0]);
        }
        crc= (crc>>1) ^ (((G>>1)|0x80000000) & (-(crc&1)));
    }
    return -1;
}

Note: if you want to use this in anything where speed matters, then you should replace the qsort/bsearch with some hash table. I didn't, as the C standard doesn't contain any usable hashtable implementation and I was lazy.

Filed under: Error Correcting Codes — Michael @ 00:18

2005-12-05

(Mis)using CRCs for error correction

Cyclic redundancy check codes are commonly used for detecting errors, but they can also be used to correct single and multi-bit errors. To be able to correct e errors per codeword and to detect codewords with d errors, the minimum hamming distance of the code must be > 2e+d.
So what is the relation between the codeword length and the minimum hamming distance for crc codes?

generator polynomial hd≥3 hd≥4 hd≥5 hd≥6 hd≥7 hd≥8
CRC- 4 0xF0000000 5 5 5 5 5 5
CRC- 5 0x28000000 31 4 4 4 4 4
CRC- 7 0xA2000000 15 15 7 7 7 7
CRC- 7 0x12000000 127 6 6 6 6 6
CRC- 7 0x6E000000 63 63 9 9 7 7
CRC- 8 0x07000000 127 127 8 8 8 8
CRC- 8 0x39000000 17 17 17 7 7 7
CRC- 8 0xD5000000 93 93 10 10 8 8
CRC-12 0x80F00000 2047 2047 13 13 12 12
CRC-12 0x80B00000 510 171 36 13 13 13
CRC-12 0x1F100000 65 65 65 21 13 11
CRC-15 0x8B320000 127 127 127 127 22 22
CRC-16 0x10210000 32767 32767 16 16 16 16
CRC-16 0x80050000 32767 32767 16 16 16 16
CRC-16 0xA0970000 32766 32766 83 83 24 24
CRC-16 0xC5030000 7161 496 85 36 24 22
CRC-16 0x90D90000 151 151 151 151 21 21
CRC-24 0x80510100 7161 7161 1030 1030 24 24
CRC-32 0x04C11DB7 4294967295 91638 3006 299 203 122
CRC-32 0x404098E2 1024 1023 1023 1023 1023 1023
CRC-32 0x1EDC6F41 2147483647 2147483647 5275 5275 209 209
CRC-32 0x741B8CD7 114695 114695 16392 16392 184 184
CRC-32 0xF4ACFB13 65534 65534 32768 32768 306 306
CRC-32 0x32583499 65538 65538 32770 32770 166 166
CRC-32 0x20044009 65538 65538 32770 32770 32 32
CRC-32 0xA833982B 65537 65537 65537 1091 113 89
CRC-32 0x00210801 65537 65537 65537 31 31 31

These were found using crc_test.c.
Actually correcting errors is pretty trivial: for a single bit error, feeding the code below with the calculated crc XORed with the received crc will give you the distance from the end of the codeword at which the error occurred. And yes, if the two crcs match (crc=0), then there is no single bit error.

int get_single_error(int crc, int len){
    int i, v=1;

    for(i=0; i<len ; i++){
        if(v == crc)    /* syndrome matches a single bit error i bits from the end */
            return i;
        v= (v<<1) ^ (generator_polynom & (v>>31));  /* v = x^(i+1) mod generator */
    }
    return -1;
}

I'll post some code examples to fix multi-bit errors later …

Filed under: Error Correcting Codes — Michael @ 20:23