Lair Of The Multimedia Guru

2006-02-17

SIMD without SIMD

You want to put several values in a single int or uint64_t, and either you dont have a cpu which supports SIMD instructions like MMX or the instructions plain dont match the datatype you have (like 2 5:6:5 rgb values in an int)

shifting some arbitrary bits left by 1

a += a&mask
note1, this is very usefull for converting betweem rgb15 and rgb16
note2, yes the bit into which you shift should be 0

unsiged shift left & right

this is simply a normal shift and then masking out the bits which where shifted out of each component
a>>/<<= shift
a&=mask

sign extension

(a+C)^C

00011|00010|00001|00000
10001|10000|01111|01110    + 1110011100111001110
11111|11110|00001|00000    ^ 1110011100111001110

signed shift right (signed left shift == unsigned)

unsigned shift + sign extension

sum all elements

there are several ways,
if you know nothing will overflow then a simple multiply will do
AAAABBBBCCCCDDDD * 1000100010001 = SSSSXXXXXXXXXXXX
or
AAAABBBBCCCCDDDD + (AAAABBBBCCCCDDDD>>4) = X,a+b,X,c+d
(X,a+b,X,c+d) + ((X,a+b,X,c+d)>>8) = X,X,X,a+b+c+d
if overflows might happen / the width of the types is too small then
a= (x & 1111000011110000)>>4
b= x & 0000111100001111
a+=b
and then either use one of the methods above or recursively continue with this

add

1. remove msb, and add the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x&l) + (y&l)) ^ ((x^y)&m)

subtract

1. remove msb, and sub the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x|m) – (y&l)) ^ ((x^y^m)&m)

negate

x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
x= ~x
((x&l) + 0001000100010001) ^ (x&m)

average round down

x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x&y) + (((x^y)&1110111011101110)>>1)

average round up

x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x|y) – (((x^y)&1110111011101110)>>1)

test if any element is zero

x=AAAABBBBCCCCDDDD is our input
(x – 0001000100010001) & (~x) & 1000100010001000
Note, this can trivially be used to test for equality by choosing x=a^b

create a bitmask based on zeroness

x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
(((((x&l) + l) | x) & m) >>3) + l ^ l
Note, this can trivially be used to test for equality by choosing x=a^b

Filed under: Optimization — Michael @ 23:46

Comments (5)

5 Comments »

In “create a bitmask based on zeroness”:
(((((x&l) + l) | x) & m) >>4) + l ^ l

… why are you xor-ing a variable with itself?…

– ods15

Comment by ods15 — 2006-02-18 @ 11:59
i dont, + is done before ^
note, the idea behind this line is that
1. set the msb if the part is non zero
2. mask out lsbs and shift msb to lsb
3. sign extend lsb to have a complete mask

in case anyone knows a faster/simpler way dont hesitate to speak up, this one is just what i came up with after a few minutes of thinking …

Comment by Michael — 2006-02-18 @ 15:28
The shift is supposed to be 3 I think, not 4.

Also, I can’t figure out what “test if any element is zero” is supposed to do, it doesn’t seem to work in the cases I’ve tried.

Comment by ods15 — 2006-02-18 @ 17:00
yes, the shift should be 3, fixed

maybe the following code will help understanding what “test if any element is zero” does:
for(i=0; i<(1<<16); i++){
int z1= (i&0x000F) == 0 || (i&0x00F0) == 0 || (i&0x0F00) == 0 || (i&0xF000) == 0;
int z2= (i-0x1111) & (~i) & 0x8888;
if(!z1 != !z2)
printf(“%X\n”, i);
}

Comment by Michael — 2006-02-18 @ 21:49
Michael,

it seems to me the example for “sign extension” is
incorrect (although the principle of (x+C)^C is ok,
since: 00…00 + C = C and 00…01 + C = 00…00
(with overflow))

bye, Skal.

Comment by skal — 2006-02-19 @ 19:02

RSS feed for comments on this post. TrackBack URI