Lair Of The Multimedia Guru

January 28, 2006

Avoiding branches/if/conditionals

Why? well because misspredictable branches are very slow on modern CPUs, doing 10 or more simple arithemetic operations can be faster, but also keep in mind that:

  • compilers sometimes compile conditional code to very efficient unconditional code, while they might not recognize your branchless code es equivalent
  • if the branch is well predictable it will probably be faster then the branchless code

Building blocks of branchless code

Turning comparissions into masks or 0/1
a<0 a>>31
a!=0 (a|-a)>>31
a<b (a-b)>>31
a>b (b-a)>>31
a≤b (a-b-1)>>31
a≥b (b-a-1)>>31

Note, you might want to replace 31 by the number of bits in whatever type you use and cast to a signed type if you want a bit mask (-1/0) or unsigned type if you want (1/0)
allso keep in mind that (a<b) and such will give you (1/0) too

converting bitmasks to 0/1

well, thats trivial, there are many ways, some even flip the 0/1 in the process
mask&1, -mask, mask+1

converting 0/1 to masks

-x, x-1

flipping the 0/1 value

x^1, 1-x

flipping the mask value

x^(-1), ~x

switch between 2 variables depening upon a mask

b+((a-b)&mask), b^((a^b)&mask)

convert mask to 0/x


convert mask to -1/x

Note: usefull for (-1/0) -> (-1/1)

add/sub x depening upon a mask

a += x&mask
Note, dont forget a+=mask / a-=mask for x=1/-1

negate depening upon a mask

a = (a^mask) – mask

minimum of 2 variables


maximum of 2 variables


shift by s depending upon a mask

a>>= s&mask, a< <=s&mask, a += ((a>>s)-a)&mask
Note, that shifting by a constant might be faster

exchanging 2 values depending upon a mask

Note, the mask doesnt need to be 0/-1 it could have just some bits set

logical operations between masks

well, that really should be obvious AND a&b, OR a|b, XOR a^b, NOT ~a

Filed under: Optimization — Michael @ 2:52

January 20, 2006


When working with integers, fixedpoint math or when converting floats to integers we need to round somehow, but which way is best …
In school they probably told you to round to nearest and round halfway cases away from zero +-x.0000 … +-x.4999 to +-x and +-x.5 … +-x.9999 to +-x+-1 but thats actually not optimal

First decission, round to nearest or not, rounding to nearest is pretty clear already, it means to well round to the nearest if there is a single nearest value and thats obviously more accurate then not doing so, not rounding to nearest has one big advantage, its often but not always (PAVGB MMX instruction for example) faster (x>>2 vs. (x+2)>>2)

Second decission, what to do with x.5, it seems to not matter much but actually it often does matter, the round toward +inf and round to -inf cases are easy to implement (x+2)>>2 and (x+1)>>2 for example but they add some ugly systematic bias which can be problematic if values are used in long calculations and get rounded many times
the solution, round to even (or odd) doesnt add such a bias and is thus better, alternatively you could also repeatedly flip the direction in which rounding is done, mpeg4 for example does that in the motion compensation code …
to round to even you could do something like (x + ((x>>1)&1))>>1, (x+1+((x>>2)&1))>>2
to round to odd you could do something like (x>>1)|(x&1), (x+2-((x>>2)&1))>>2

you might also be stuck with instructions which do something like (a*constant)>>16 to reduce the rounding error of these you could simple add (1<<15) / constant to a

Filed under: Optimization — Michael @ 22:56

January 15, 2006

Universal vlc codes

Its quite well known that to optimally store some variable x with value v, we should use -log2(p) bits where p is the probability that x has value v, obviously the decoder has to know the probability distribution the encoder used or the bitstrings generated from that to be able decode it…
now if we limit ourselfs to an interger count of bits then huffman codes are optimal, building an huffman code is trivial, just recursively merge the 2 least probable symbols, but if there are many different variables with different probablility distributions or the maximum value is large or unbounded or you dont know the probabilities or you want low complexity then huffman codes might be a bad choice

Universal vlc codes might be a better choice, they all assume small values are more likely and often have very good compression rates
heres a table of common ones

v Unary Unary2 Exp Golomb Exp Golomb2 Fibonacci rice k=2 rice k=3
0 0 0 0 0 11 00 000
1 10 11 100 101 011 01 001
2 110 101 101 111 0011 100 010
3 1110 1001 11000 10001 1011 101 011
4 111110 10001 11001 10011 00011 1100 1000
5 1111110 100001 11010 11001 10011 1101 1001
6 11111110 1000001 11011 11011 01011 11100 1010
7 111111110 10000001 1110000 1000001 000011 11101 1011
8 1111111110 100000001 1110001 1000011 100011 111100 11000

Note1: Exp Golomb2 is identical to Exp Golomb except that its a reversible vlc, it can be decoded from right to left too, same for Unary2 and Unary

The smart reader will probably have noticed that Exp Golomb coding is simply unary coding for the number of bits + the bits, and rice coding is unary coding with k bits, more generally you can take any vlc code to encode some “class” and then give every such class its own number of additional bits

The ideal choice of uvlc code depends upon the probability distribution of the values which will be encoded, there is no single perfect one

Well, hmm there was something else …ahh yes negative numbers, we where just dealing with positive ones above, negative ones are trivial, just map them to odd positive ones and map the actual positive ones to even ones or the other way around
in C the following will do the trick quickly:

v = -2*i-1;
v ^= v>>31;

and the following will reverse it:

(v>>1) ^ -(v&1)
Filed under: Entropy Coding — Michael @ 22:18

January 14, 2006

DHL Europack

You order some japanese kitchen knifes from which are supposed to become a present for you mother, yeah maybe you shouldnt order from a company with such a name … , the stuff gets send with DHL Europack on 8.12.2005 a few days later you get an invoice but nothing else, on that is the number for the packet, so you check via
where the package went, it says:
Letztes Sendungsverfolgungsereignis:
Erfolgloser Zustellversuch, Firma erloschen / unbekannt(Zustellbasis Information): Guntramsdorf (tof) / AT, 2005-12-12
or in engish: recipient doesnt exist :)
i must note that i was at home most of the time, they had my phone and email address and there was no trace that they ever where here or tried to contact me
i send the company from where i ordered the stuff an email on the 16th, which is ignored entirely
on the 22th i try again and threaten them a little, i get a phone call telling me the package is being sorted, and that the info from the online tracking stuff where wrong, yeah sure …
on the 30th i get a letter informing me that they will transfer my money back which obviously indicates that the package went back too, i write them another email the next day which again is ignored, so i try DHL, first mail ignored second via some online form get the awnser:
aus datenschutzrechtlichen Gründen dürfen wir Angaben zum Sendungsstatus nur
an den Absender weitergeben. Bitte wenden Sie sich an Ihre Postverwaltung
oder den Absender um Ihre Informationen zu bekommen. Vielen Dank.
or in english: no we can only tell the sender due to privacy/data blah blah laws why your package didnt reach you

suggestions anyone?
try to order again from there and hope it reaches me before next xmas?
from somewhere else? (they seemed to be cheapest though)

Filed under: Off Topic — Michael @ 0:14

January 12, 2006

Better CRC-32 polynom for correcting byte errors

In correcting-byte-and-burst-errors-with-crcs we have seen that the CRC-32 polynom 0x04C11DB7 can correct 1 byte error in ~1mbit not that correcting single byte errors in such huge blocks is good for anything but ive found a much better one specifically 0x0D438219 which can correct one byte error in 9747877 bits

Filed under: Error Correcting Codes — Michael @ 23:35

Powered by WordPress