SIMD without SIMD
You want to put several values in a single int or uint64_t, and either you dont have a cpu which supports SIMD instructions like MMX or the instructions plain dont match the datatype you have (like 2 5:6:5 rgb values in an int)
shifting some arbitrary bits left by 1
a += a&mask
note1, this is very usefull for converting betweem rgb15 and rgb16
note2, yes the bit into which you shift should be 0
unsiged shift left & right
this is simply a normal shift and then masking out the bits which where shifted out of each component
a>>/<<= shift
a&=mask
sign extension
(a+C)^C
00011|00010|00001|00000 10001|10000|01111|01110 + 1110011100111001110 11111|11110|00001|00000 ^ 1110011100111001110
signed shift right (signed left shift == unsigned)
unsigned shift + sign extension
sum all elements
there are several ways,
if you know nothing will overflow then a simple multiply will do
AAAABBBBCCCCDDDD * 1000100010001 = SSSSXXXXXXXXXXXX
or
AAAABBBBCCCCDDDD + (AAAABBBBCCCCDDDD>>4) = X,a+b,X,c+d
(X,a+b,X,c+d) + ((X,a+b,X,c+d)>>8) = X,X,X,a+b+c+d
if overflows might happen / the width of the types is too small then
a= (x & 1111000011110000)>>4
b= x & 0000111100001111
a+=b
and then either use one of the methods above or recursively continue with this
add
1. remove msb, and add the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x&l) + (y&l)) ^ ((x^y)&m)
subtract
1. remove msb, and sub the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x|m) – (y&l)) ^ ((x^y^m)&m)
negate
x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
x= ~x
((x&l) + 0001000100010001) ^ (x&m)
average round down
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x&y) + (((x^y)&1110111011101110)>>1)
average round up
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x|y) – (((x^y)&1110111011101110)>>1)
test if any element is zero
x=AAAABBBBCCCCDDDD is our input
(x – 0001000100010001) & (~x) & 1000100010001000
Note, this can trivially be used to test for equality by choosing x=a^b
create a bitmask based on zeroness
x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
(((((x&l) + l) | x) & m) >>3) + l ^ l
Note, this can trivially be used to test for equality by choosing x=a^b
In “create a bitmask based on zeroness”:
(((((x&l) + l) | x) & m) >>4) + l ^ l
… why are you xor-ing a variable with itself?…
– ods15
Comment by ods15 — 2006-02-18 @ 11:59
i dont, + is done before ^
note, the idea behind this line is that
1. set the msb if the part is non zero
2. mask out lsbs and shift msb to lsb
3. sign extend lsb to have a complete mask
in case anyone knows a faster/simpler way dont hesitate to speak up, this one is just what i came up with after a few minutes of thinking …
Comment by Michael — 2006-02-18 @ 15:28
The shift is supposed to be 3 I think, not 4.
Also, I can’t figure out what “test if any element is zero” is supposed to do, it doesn’t seem to work in the cases I’ve tried.
Comment by ods15 — 2006-02-18 @ 17:00
yes, the shift should be 3, fixed
maybe the following code will help understanding what “test if any element is zero” does:
for(i=0; i<(1<<16); i++){
int z1= (i&0x000F) == 0 || (i&0x00F0) == 0 || (i&0x0F00) == 0 || (i&0xF000) == 0;
int z2= (i-0x1111) & (~i) & 0x8888;
if(!z1 != !z2)
printf(“%X\n”, i);
}
Comment by Michael — 2006-02-18 @ 21:49
Michael,
it seems to me the example for “sign extension” is
incorrect (although the principle of (x+C)^C is ok,
since: 00…00 + C = C and 00…01 + C = 00…00
(with overflow))
bye, Skal.
Comment by skal — 2006-02-19 @ 19:02