Lair Of The Multimedia Guru

September 27, 2006

X86 assembly instructions you always wanted but Intel didn't give you

Everyone who has written assembly code has probably cursed the lack of some instructions in the instruction set of their target architecture, be it due to the work needed to work around it or the complexity and consequent speed loss of the resulting code.

Here's a list of what I think is lacking in the x86 instruction set …

Btw, I had to use shl and shr instead of 2 less-than and 2 greater-than symbols, as WordPress randomizes the HTML code if & lt ; or & gt ; occur in it (I dunno why, but that did work in the past).

cadd/csub (conditional add / sub)
While there are conditional move instructions, there are no conditional add/subtract instructions, even though code like if(...) x+=... is pretty common.
max/min
While there are mmx and sse maximum and minimum instructions, there are sadly no normal integer equivalents.
negative index addressing
Addressing stuff like base[constant+index] is possible, base[constant-index] is not. Not only would that be useful on its own, it would also allow subtracting a left shifted register from another, like: lea eax, [eax-8*ebx]
imulfix
For fixed point calculations the mul and imul instructions are very hard or slow to use, as using the low 32 bits of the result would often overflow and using the high 32 bits limits the coefficient to -0.5 … 0.5. An instruction which does (a*b + (1 shl 23)) shr 24 would solve this.
sarr (shift arithmetic right with rounding)
Shift right is nice, but often you need to do it with rounding; an (a + (0.5 shl s)) shr s would come in handy in these cases.
random
A pseudo random number generator and a true random number generator would be nice.
abs
While there is pabs for mmx, there's no equivalent for normal integers.
select
An a&c | b&(~c) would come in handy for combining bits from 2 registers.
readbits
A hardware based bitstream reader which takes a base pointer, an index in bits and the number of bits it should read would be very useful for many audio and video formats; bitstream & vlc decoding generally takes 1/3 of the time spent decoding.
nand/nor
(a&~b) and (a|~b) and maybe others
pavgb_nornd (packed byte average without rounding)
While there's an average with rounding, there's none without; sadly mpeg needs the latter too.
integer instructions like sarr and select for mmx/sse too
pcmpneq / pcmplt / pcmple / pcmpge
Just the missing compare instructions for mmx
punpckh/l0
Unpack instructions which unpack the source with an implicit 0; this would avoid 1 move per 2 unpacks, and unpacking is very common.
packsrr (pack with shift right and rounding)
like pack* but with rounding and shift right integrated
plea
An mmx/sse instruction similar to lea which does the extremely common operation of adding a left shifted value to another, and if this isn't able to do a=2*a+-b then an additional instruction for that, as it would reduce the very common a-b, a+b butterfly operation to 2 instructions instead of 3.
p(add/sub/mul)unpackh/l
Having to unpack 2 source operands into 4 to perform 1 operation with them sucks (= needs 6-8 instructions), so a combined foobar+unpack high/low would be pretty nice, of course only if it's fast though.
ps(r/l)lb (packed right/left shift unsigned byte)
the missing packed byte shifts
pdabs (packed absolute difference)
absolute value of the difference of 2 values
paddsusb
Add a signed to an unsigned (byte) value with unsigned saturation; useful for brightness correction and loop/postprocessing filters.
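
To pin down what a few of the wished-for instructions would compute, here is a minimal C sketch of some of them, one scalar value (or one byte lane) at a time. The function names follow the list above, but the signatures, the branch-free form of cadd, the name select_bits (chosen to avoid clashing with the POSIX select function), and the reading of "0.5 shl s" as 1 shl (s-1) are assumptions made for illustration only. It also assumes that right shifting a negative signed value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <stdint.h>

/* cadd: x += v only if cond is non-zero, written without a branch.
 * (Hypothetical signature; a csub would simply negate v.) */
static inline int32_t cadd(int32_t x, int32_t v, int cond)
{
    uint32_t mask = -(uint32_t)(cond != 0);   /* all ones or all zeros */
    return (int32_t)((uint32_t)x + ((uint32_t)v & mask));
}

/* imulfix: (a*b + (1 shl 23)) shr 24, using a 64-bit intermediate.
 * Assumes >> on a negative int64_t is an arithmetic shift. */
static inline int32_t imulfix(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b + (1 << 23);
    return (int32_t)(p >> 24);
}

/* sarr: (a + (0.5 shl s)) shr s, with 0.5 shl s read as 1 shl (s-1).
 * Intended for s in 0..31; s == 0 returns a unchanged. */
static inline int32_t sarr(int32_t a, unsigned s)
{
    if (s == 0)
        return a;
    return (int32_t)(((int64_t)a + ((int64_t)1 << (s - 1))) >> s);
}

/* select: a&c | b&(~c), taking bits from a where c is set, else from b. */
static inline uint32_t select_bits(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & c) | (b & ~c);
}

/* One byte lane of pavgb_nornd: average without the +1 rounding term. */
static inline uint8_t pavgb_nornd_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b) >> 1);
}

/* One byte lane of paddsusb: unsigned + signed with unsigned saturation. */
static inline uint8_t paddsusb_lane(uint8_t u, int8_t s)
{
    int t = (int)u + (int)s;
    if (t < 0)
        t = 0;
    if (t > 255)
        t = 255;
    return (uint8_t)t;
}
```

Each of these takes several instructions on plain x86 (a multiply plus add and double-width shift for imulfix, an add plus shift for sarr, and so on), which is the overhead a single instruction would remove.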
Filed under: Optimization — Michael @ 13:16
