X86 assembly instructions you always wanted but Intel didn't give you
Everyone who has written assembly code has probably cursed the lack of some instructions in the instruction set of their target architecture, be it due to the work needed to work around it or the complexity and consequent speed loss of the resulting code.
Here's a list of what I think is lacking in the x86 instruction set …
Btw, I had to write shl and shr instead of the doubled less-than and greater-than symbols, as WordPress scrambles the HTML code if & lt ; or & gt ; occur in it (I don't know why, it did work in the past).
- cadd/csub (conditional add / sub)
- While there are conditional move instructions there are no conditional add/subtract instructions even though code like
if(...) x+=...
is pretty common
- max/min
- While there are mmx and sse maximum and minimum instructions, there are sadly no normal integer equivalents
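As a rough illustration of what the conditional add/sub and integer min/max items above would compute, here is a minimal C sketch. The names cadd, csub, imin and imax are made up for this example and are not real instructions or library functions; the mask trick is just one common way to stay branch-free.

```c
#include <stdint.h>

/* Hypothetical cadd: returns x + v if cond is nonzero, else x.
 * The mask is all ones when cond holds and zero otherwise. */
static inline uint32_t cadd(uint32_t x, uint32_t v, int cond)
{
    uint32_t mask = -(uint32_t)(cond != 0);
    return x + (v & mask);
}

/* Hypothetical csub: returns x - v if cond is nonzero, else x. */
static inline uint32_t csub(uint32_t x, uint32_t v, int cond)
{
    uint32_t mask = -(uint32_t)(cond != 0);
    return x - (v & mask);
}

/* Plain integer min/max; compilers typically emit cmp + cmov for these. */
static inline int32_t imin(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t imax(int32_t a, int32_t b) { return a > b ? a : b; }
```

Each of these costs several instructions today, which is the gap a dedicated instruction would close.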
- negative index addressing
- Addressing like base[constant+index] is possible, base[constant-index] is not. Not only would that be useful on its own, it would also allow subtracting a left shifted register from another, like:
lea eax, [eax-8*ebx]
- imulfix
- For fixed point calculations the mul and imul instructions are very hard or slow to use, as the low 32 bits of the result often overflow and using the high 32 bits limits the coefficient to -0.5 … 0.5. An instruction which does
(a*b + (1 shl 23)) shr 24
would solve this
- sarr (shift arithmetic right with rounding)
- Shift right is nice but often you need to do it with rounding, a
(a + (0.5 shl s)) shr s
would come in handy in these cases
- random
- a pseudo random number generator and a true random number generator would be nice
- abs
- While there is pabs for mmx there's no equivalent for normal integers
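The imulfix, sarr and abs wishes above can be written down as a small C reference, shown here only to pin down the arithmetic. The function names are hypothetical, 0.5 shl s is read as 1 shl (s-1), and the code assumes that right-shifting a negative signed value is an arithmetic shift (true for the usual x86 compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical imulfix: (a*b + (1 shl 23)) shr 24 with a 64-bit
 * intermediate, so the product cannot overflow before the shift.
 * Assumes >> on a negative value is an arithmetic shift. */
static inline int32_t imulfix(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b + (1 << 23);
    return (int32_t)(p >> 24);
}

/* Hypothetical sarr: shift right by s with rounding, for 1 <= s <= 31.
 * The rounding constant 1 << (s-1) is "0.5 shl s". */
static inline int32_t sarr(int32_t a, unsigned s)
{
    assert(s >= 1 && s <= 31);
    return (int32_t)(((int64_t)a + ((int64_t)1 << (s - 1))) >> s);
}

/* Branch-free abs; returns unsigned so that INT32_MIN is representable. */
static inline uint32_t iabs(int32_t a)
{
    uint32_t mask = (uint32_t)(a >> 31);   /* all ones if a is negative */
    return ((uint32_t)a ^ mask) - mask;
}
```

For example, with 8.24 fixed point values, imulfix(3 << 23, 1 << 23) multiplies 1.5 by 0.5 and gives 3 << 22, which is 0.75.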
- select
- A
a&c | b&(~c)
would come in handy for combining bits from 2 registers
- readbits
- A hardware based bitstream reader which takes a base pointer, an index in bits and the number of bits it should read would be very useful for many audio and video formats; bitstream and vlc decoding generally takes 1/3 of the time spent decoding
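To make the select and readbits items above concrete, here is a software sketch in C. Everything about it is an assumption made for illustration: the names bitselect and readbits, the MSB-first bit order (as used by MPEG-style bitstreams), the 32-bit length limit, and the requirement that the caller keeps the read inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical select: take bits from a where c is 1, from b where c is 0. */
static inline uint32_t bitselect(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & c) | (b & ~c);
}

/* Hypothetical readbits: read len bits (1..32) starting at bit position
 * index, counting bits MSB-first from base. Touches at most 5 bytes. */
static inline uint32_t readbits(const uint8_t *base, size_t index, unsigned len)
{
    assert(len >= 1 && len <= 32);
    size_t   byte   = index >> 3;               /* first byte touched       */
    unsigned skip   = (unsigned)(index & 7);    /* bits to skip in it       */
    unsigned nbytes = (skip + len + 7) >> 3;    /* bytes covering the field */
    uint64_t acc    = 0;

    for (unsigned i = 0; i < nbytes; i++)
        acc = (acc << 8) | base[byte + i];

    acc >>= nbytes * 8 - skip - len;            /* drop the trailing bits   */
    return (uint32_t)(acc & (((uint64_t)1 << len) - 1));
}
```

The loop, shift and mask are what a single hardware instruction would replace on every symbol read.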
- nand/nor
(a&~b)
and
(a|~b)
and maybe others
- pavgb_nornd (packed byte average without rounding)
- While there's an average with rounding there's none without; sadly mpeg needs the latter too
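The difference between the two byte averages can be shown with plain 32-bit integers treated as 4 packed bytes. This is only a sketch of the arithmetic, not mmx code; the function names are invented for the example, and it relies on the identity a + b = 2*(a & b) + (a ^ b), applied per byte.

```c
#include <stdint.h>

/* Per-byte average rounding down, floor((a+b)/2), on 4 packed bytes.
 * Masking with 0xFE keeps the shift from leaking bits between bytes. */
static inline uint32_t avg4_nornd(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Per-byte average rounding up, (a+b+1)/2, which is what pavgb computes. */
static inline uint32_t avg4_rnd(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}
```

The two results differ by 1 in exactly those bytes where a + b is odd.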
- integer instructions like sarr and select for mmx/sse too
- …
- pcmpneq / pcmplt / pcmple / pcmpge
- Just the missing compare instructions for mmx
- punpckh/l0
- Unpack instructions which unpack the source with an implicit 0. This would avoid 1 move per 2 unpacks, and unpacking is very common
- packsrr (pack with shift right and rounding)
- like pack* but with rounding and shift right integrated
- plea
- An mmx/sse instruction similar to lea which does the extremely common operation of adding a left shifted value to another, and if this isn't able to do
a=2*a+-b
then an additional instruction for that, as it would reduce the very common
a-b, a+b
butterfly operation to 2 instructions instead of 3
- p(add/sub/mul)unpackh/l
- Having to unpack 2 source operands into 4 to perform 1 operation with them sucks (=needs 6-8 instructions), so combined foobar+unpack high/low would be pretty nice, of course only if it's fast though
- ps(r/l)lb (packed right/left shift unsigned byte)
- the missing packed byte shifts
- pdabs (packed absolute difference)
- absolute value of the difference of 2 values
- paddsusb
- Add a signed to an unsigned (byte) value with unsigned saturation, useful for brightness correction and loop/postprocessing filters
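For the last two items, a scalar C reference of what one byte lane would compute may help. The names absdiff_u8 and addsus_u8 are hypothetical, and treating pdabs as operating on unsigned bytes is an assumption, since the list does not say which element type is meant.

```c
#include <stdint.h>

/* One lane of the wished-for pdabs, assuming unsigned bytes: |a - b|. */
static inline uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* One lane of the wished-for paddsusb: unsigned byte plus signed byte,
 * clamped to the unsigned range 0..255. */
static inline uint8_t addsus_u8(uint8_t u, int8_t s)
{
    int v = (int)u + (int)s;
    if (v < 0)   v = 0;
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```

A brightness adjustment would call addsus_u8 on every pixel with the same signed offset, for instance addsus_u8(250, 10) clamps to 255 and addsus_u8(5, -10) clamps to 0.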