Lair Of The Multimedia Guru

February 17, 2006

SIMD without SIMD

You want to put several values in a single int or uint64_t, and either you dont have a cpu which supports SIMD instructions like MMX or the instructions plain dont match the datatype you have (like 2 5:6:5 rgb values in an int)

shifting some arbitrary bits left by 1

note1, this is very usefull for converting betweem rgb15 and rgb16
note2, yes the bit into which you shift should be 0

unsiged shift left & right

this is simply a normal shift and then masking out the bits which where shifted out of each component
a>>/<<= shift

sign extension

(a+C)^C

00011|00010|00001|00000
10001|10000|01111|01110    + 1110011100111001110
11111|11110|00001|00000    ^ 1110011100111001110
signed shift right (signed left shift == unsigned)

unsigned shift + sign extension

sum all elements

there are several ways,
if you know nothing will overflow then a simple multiply will do
AAAABBBBCCCCDDDD * 1000100010001 = SSSSXXXXXXXXXXXX
or
AAAABBBBCCCCDDDD + (AAAABBBBCCCCDDDD>>4) = X,a+b,X,c+d
(X,a+b,X,c+d) + ((X,a+b,X,c+d)>>8) = X,X,X,a+b+c+d
if overflows might happen / the width of the types is too small then
a= (x & 1111000011110000)>>4
b= x & 0000111100001111
a+=b
and then either use one of the methods above or recursively continue with this

1. remove msb, and add the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x&l) + (y&l)) ^ ((x^y)&m)

subtract

1. remove msb, and sub the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x|m) – (y&l)) ^ ((x^y^m)&m)

negate

x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
x= ~x
((x&l) + 0001000100010001) ^ (x&m)

average round down

x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x&y) + (((x^y)&1110111011101110)>>1)

average round up

x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x|y) – (((x^y)&1110111011101110)>>1)

test if any element is zero

x=AAAABBBBCCCCDDDD is our input
(x – 0001000100010001) & (~x) & 1000100010001000
Note, this can trivially be used to test for equality by choosing x=a^b

create a bitmask based on zeroness

x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
(((((x&l) + l) | x) & m) >>3) + l ^ l
Note, this can trivially be used to test for equality by choosing x=a^b

Filed under: Optimization — Michael @ 23:46

1. In “create a bitmask based on zeroness”:
(((((x&l) + l) | x) & m) >>4) + l ^ l

… why are you xor-ing a variable with itself?…

– ods15

Comment by ods15 — February 18, 2006 @ 11:59

2. i dont, + is done before ^
note, the idea behind this line is that
1. set the msb if the part is non zero
2. mask out lsbs and shift msb to lsb
3. sign extend lsb to have a complete mask

in case anyone knows a faster/simpler way dont hesitate to speak up, this one is just what i came up with after a few minutes of thinking …

Comment by Michael — February 18, 2006 @ 15:28

3. The shift is supposed to be 3 I think, not 4.

Also, I can’t figure out what “test if any element is zero” is supposed to do, it doesn’t seem to work in the cases I’ve tried.

Comment by ods15 — February 18, 2006 @ 17:00

4. yes, the shift should be 3, fixed

maybe the following code will help understanding what “test if any element is zero” does:
for(i=0; i<(1<<16); i++){
int z1= (i&0x000F) == 0 || (i&0x00F0) == 0 || (i&0x0F00) == 0 || (i&0xF000) == 0;
int z2= (i-0x1111) & (~i) & 0x8888;
if(!z1 != !z2)
printf(“%X\n”, i);
}

Comment by Michael — February 18, 2006 @ 21:49

5. Michael,

it seems to me the example for “sign extension” is
incorrect (although the principle of (x+C)^C is ok,
since: 00…00 + C = C and 00…01 + C = 00…00
(with overflow))

bye, Skal.

Comment by skal — February 19, 2006 @ 19:02