# Lair Of The Multimedia Guru

## 2006-01-28

### Avoiding branches/if/conditionals

Why? well because misspredictable branches are very slow on modern CPUs, doing 10 or more simple arithemetic operations can be faster, but also keep in mind that:

• compilers sometimes compile conditional code to very efficient unconditional code, while they might not recognize your branchless code es equivalent
• if the branch is well predictable it will probably be faster then the branchless code

#### Building blocks of branchless code

##### Turning comparissions into masks or 0/1
 a<0 a>>31 a!=0 (a|-a)>>31 a>31 a>b (b-a)>>31 a≤b (a-b-1)>>31 a≥b (b-a-1)>>31

Note, you might want to replace 31 by the number of bits in whatever type you use and cast to a signed type if you want a bit mask (-1/0) or unsigned type if you want (1/0)
allso keep in mind that (a<b) and such will give you (1/0) too

##### converting bitmasks to 0/1

well, thats trivial, there are many ways, some even flip the 0/1 in the process

-x, x-1

x^1, 1-x

x^(-1), ~x

##### convert mask to -1/x

Note: usefull for (-1/0) -> (-1/1)

##### add/sub x depening upon a mask

Note, dont forget a+=mask / a-=mask for x=1/-1

##### minimum of 2 variables

b+((a-b)&((a-b)>>31))

##### maximum of 2 variables

a-((a-b)&((a-b)>>31))

##### shift by s depending upon a mask

Note, that shifting by a constant might be faster

##### exchanging 2 values depending upon a mask

a^=b
a^=b
Note, the mask doesnt need to be 0/-1 it could have just some bits set

##### logical operations between masks

well, that really should be obvious AND a&b, OR a|b, XOR a^b, NOT ~a

Filed under: Optimization — Michael @ 02:52

## 5 Comments »

1. I remember seeing a bunch of stuff like this in the assembly code that TuxNES generates. (Hi Mike)

It also had this optimization:

For unsigned values, (x>a)&&(x

Comment by None — 2006-01-30 @ 15:28

2. err.. (x>a)&&(x<b) is equivalent to (x-a)<(b-a)

this time with &lt;… :)

Comment by None — 2006-01-30 @ 15:35

3. yes, and for a=0 this becomes x<b
and big surprisse it works for signed values too, for example:
(x>-10)&&(x<10) == (unsigened)(x+10)<20

PS: mike (melanson) != michael (niedermayer)

Comment by Michael — 2006-01-30 @ 17:09

4. Yes, but most of this stuff breaks if you’re not careful about signedness and range. For example
(0x80000000<0x7FFFFFFF)==1 but (0x80000000-0x7FFFFFFF)>>31==0 !

This can be a source of bugs if you’re not careful.

Note that if you compile on x86, some of these optimizations don’t make much difference. For example, the compiler should emit a<b as a setl instruction, which isn’t a conditional branch and doesn’t stall the pipeline. So you need to look at the instructions that actually get generated…

Comment by None — 2006-01-31 @ 04:27

5. You’re a better hacker than I, Mr. None– I just branched the Nestra source into TuxNES. I never claimed to understand how the dynamic recompiler worked! :)

Comment by Multimedia Mike — 2006-01-31 @ 08:04