Lair Of The Multimedia Guru

2006-01-20

Rounding

When working with integers or fixed-point math, or when converting floats to integers, we need to round somehow. But which way is best?
In school they probably told you to round to nearest and to round halfway cases away from zero: ±x.0000 … ±x.4999 to ±x and ±x.5 … ±x.9999 to ±(x+1). That's actually not optimal.

First decision: round to nearest or not? Rounding to nearest is pretty clear already: it means to round to the nearest value whenever there is a single nearest one, and that's obviously more accurate than not doing so. Not rounding to nearest has one big advantage: it's often, though not always (the PAVGB MMX instruction, for example), faster (x>>2 vs. (x+2)>>2).

Second decision: what to do with x.5. It seems not to matter much, but actually it often does: rounding halfway cases toward +inf or toward -inf is easy to implement ((x+2)>>2 and (x+1)>>2, for example, for a 2 bit shift), but it adds an ugly systematic bias, which can be problematic if values are used in long calculations and get rounded many times.
The solution, rounding halfway cases to even (or odd), doesn't add such a bias and is thus better. Alternatively you could repeatedly flip the direction in which halfway cases are rounded; MPEG-4 for example does that in its motion compensation code …
To round to even you could do something like (x + ((x>>1)&1))>>1 or (x+1+((x>>2)&1))>>2.
To round to odd you could do something like (x>>1)|(x&1) or (x+2-((x>>2)&1))>>2.

You might also be stuck with instructions which do something like (a*constant)>>16; to reduce their rounding error you can simply add (1<<15)/constant to a.

Filed under: Optimization — Michael @ 22:56

2006-01-15

Universal vlc codes

It's quite well known that to optimally store some variable x with value v, we should use -log2(p) bits, where p is the probability that x has value v. Obviously the decoder has to know the probability distribution the encoder used, or the bitstrings generated from it, to be able to decode anything …
Now if we limit ourselves to an integer count of bits, then Huffman codes are optimal, and building a Huffman code is trivial: just recursively merge the 2 least probable symbols. But if there are many different variables with different probability distributions, or the maximum value is large or unbounded, or you don't know the probabilities, or you want low complexity, then Huffman codes might be a bad choice.

Universal vlc codes might be a better choice: they all assume that small values are more likely, and they often achieve very good compression rates.
Here's a table of common ones:

v  Unary      Unary2     Exp Golomb  Exp Golomb2  Fibonacci  Rice k=1  Rice k=2
0  0          0          0           0            11         00        000
1  10         11         100         101          011        01        001
2  110        101        101         111          0011       100       010
3  1110       1001       11000       10001        1011       101       011
4  11110      10001      11001       10011        00011      1100      1000
5  111110     100001     11010       11001        10011      1101      1001
6  1111110    1000001    11011       11011        01011      11100     1010
7  11111110   10000001   1110000     1000001      000011     11101     1011
8  111111110  100000001  1110001     1000011      100011     111100    11000

Note: Exp Golomb2 is identical to Exp Golomb except that it is a reversible vlc, i.e. it can be decoded from right to left too; the same applies to Unary2 and Unary.

The smart reader will probably have noticed that Exp Golomb coding is simply unary coding of the number of bits, followed by the bits themselves, and that Rice coding is unary coding followed by k fixed bits. More generally, you can take any vlc code to encode some “class” and then give every such class its own number of additional bits.

The ideal choice of uvlc code depends upon the probability distribution of the values which will be encoded; there is no single perfect one.

Well, hmm, there was something else … ah yes, negative numbers. We were only dealing with positive ones above; negative ones are trivial: just map them to odd positive ones and map the actual positive ones to even ones, or the other way around.
In C the following will do the trick quickly (assuming 32-bit int and an arithmetic right shift):

v = -2*i-1;
v ^= v>>31;

and the following will reverse it:

(v>>1) ^ -(v&1)
Filed under: Entropy Coding — Michael @ 22:18

2006-01-14

DHL Europack

You order some Japanese kitchen knives from dick.biz, which are supposed to become a present for your mother, yeah maybe you shouldn't order from a company with such a name … The stuff gets sent with DHL Europack on 8.12.2005. A few days later you get an invoice but nothing else; on it is the tracking number for the package, so you check via tracknet.deutschepost.de where the package went. It says:
Letztes Sendungsverfolgungsereignis:
Erfolgloser Zustellversuch, Firma erloschen / unbekannt(Zustellbasis Information): Guntramsdorf (tof) / AT, 2005-12-12
or in English: the recipient doesn't exist :)
I must note that I was at home most of the time, they had my phone number and email address, and there is no trace that they were ever here or tried to contact me.
I sent the company I ordered the stuff from an email on the 16th, which was ignored entirely.
On the 22nd I tried again and threatened them a little; I got a phone call telling me the package was being sorted and that the info from the online tracking was wrong, yeah sure …
On the 30th I got a letter informing me that they will transfer my money back, which obviously indicates that the package went back too. I wrote them another email the next day, which again was ignored, so I tried DHL: the first mail was ignored, for the second, sent via some online form, I got the answer:
aus datenschutzrechtlichen Gründen dürfen wir Angaben zum Sendungsstatus nur
an den Absender weitergeben. Bitte wenden Sie sich an Ihre Postverwaltung
oder den Absender um Ihre Informationen zu bekommen. Vielen Dank.
or in English: no, for data protection reasons we may only give information about the status of a shipment to the sender; please contact your postal administration or the sender to get your information. Thank you.

Suggestions, anyone?
Try to order again from there and hope it reaches me before next Xmas?
Order from somewhere else? (They seemed to be the cheapest, though.)

Filed under: Off Topic — Michael @ 00:14

2006-01-12

Better CRC-32 polynom for correcting byte errors

In correcting-byte-and-burst-errors-with-crcs we saw that the CRC-32 polynomial 0x04C11DB7 can correct 1 byte error in a ~1 mbit codeword. Not that correcting single byte errors in such huge blocks is good for anything, but I've found a much better polynomial: 0x0D438219 can correct one byte error in 9747877 bits.

Filed under: Error Correcting Codes — Michael @ 23:35

2005-12-30

Principal components analysis / Karhunen-Loève transform

The PCA/KLT is often said to be the optimal linear transform for video coding, while the DCT is a similar, faster and simpler transform.
This isn't really true: the KLT is not that optimal at all for video coding. What the KLT is, is the optimal orthogonal transform for compacting the energy of a vector into its first n components. What might be better is a transform, preferably a (near) orthogonal one, which compacts the energy into few components, not necessarily the first few!
For example, consider 1D 8-component vectors and assume our data set is entirely made of piecewise constants. Then the following basis functions would allow us to store these exactly with 1+n basis functions, where n is the number of discontinuities. Note that this is just an example where the KLT isn't that optimal; 1D piecewise constants are probably not an ideal model for images …

1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1

OK, this is quite a non-orthogonal set of basis functions, which has its problems too. A less compact but orthogonal (if the basis functions are normalized, which for the sake of readability hasn't been done) and still better-than-KLT transform would be based upon the following basis functions:

1 1 1 1 1 1 1 1
1 1 1 1-1-1-1-1
1 1-1-1 0 0 0 0
0 0 0 0 1 1-1-1
1-1 0 0 0 0 0 0
0 0 1-1 0 0 0 0
0 0 0 0 1-1 0 0 
0 0 0 0 0 0 1-1

Here a piecewise constant with 1 discontinuity can be represented by 2-4 basis functions, whereas the KLT case below will need all 8 for an exact representation:

 0.290596  0.341669  0.377247  0.395479  0.395472  0.377227  0.341853  0.290749  
-0.490579 -0.416017 -0.277788 -0.097204  0.097447  0.277903  0.415566  0.490129      
 0.473927  0.255266 -0.138718 -0.437652 -0.436948 -0.137245  0.255345  0.473804       
-0.415204  0.097424  0.490934  0.276661 -0.277947 -0.490521 -0.097391  0.416181       
-0.382979  0.320320  0.364628 -0.343399 -0.342868  0.364783  0.320177 -0.383024       
-0.278541  0.491575 -0.099199 -0.414873  0.416580  0.096333 -0.489054  0.277157        
-0.210508  0.464503 -0.455644  0.190764  0.181391 -0.451288  0.463825 -0.210293        
-0.096505  0.274820 -0.412134  0.489524 -0.491862  0.418348 -0.280951  0.098971

This has been generated by (the hopefully not buggy) pca.c/pca.h.

Another way to see the sub-optimality of the KLT is to consider the last (high-frequency) basis functions: in many images and videos they are simply not used. If they were replaced by commonly occurring patterns, which without them would need many basis functions to be represented accurately, then the compression rate could be improved; the resulting transform would be non-orthogonal, though …

Filed under: DCT,VideoCoding — Michael @ 01:36

2005-12-28

Extracting scantables from binary video codecs

What's a scantable? A thing that tells you where to put the coefficients you get from vlc/rle decoding. For example, take the famous zigzag table:

    0,   1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63

the first coefficient would be written at position 0, the second at 1, the third at 8, and so on. As the destination array is an 8×8 matrix here, this produces a nice zigzag from the top-left corner to the bottom-right corner, ordering the DCT coefficients approximately from low frequency to high; and as the high ones are almost always zero, the vlc-rle coding before ends up being quite effective. But back to the topic: what if we, for whatever odd reason, want to know whether a binary contains such a table?
It's quite easy to find such tables by brute force: their size is something like 16 or 64 entries with no duplicates, the first entry is most often 0, and none of the entries should be larger than the number of entries in the table. Here is some (old) code which I wrote quite some time ago to implement this and a few other things.

And now the interesting part: what do we find with this?

 0  1  4  8 
 5  2  3  6 
 9 12 13 10 
 7 11 14 15

o-->o   o-->o
  /   /   /
o   o   o   o
| /   /   / |
o   o   o   o
  /   /   /
o-->o   o-->o


and

 0  1  2  6 
10  3  7 11 
 4  8  5  9 
12 13 14 15  

 o-->o-->o   o
         |  /|
 o   o   o / o
 | / |   |/  |
 o   o   o   o
   /
 o-->o-->o-->o
(dual scan table ascii art stolen from libavcodec/svq3.c)

in drv3.so.6.0
Note: these tables are the 4×4 zigzag and dual scan tables from an old H.264 draft, so RV30 seems to be an H.264 variant like SVQ3.

Filed under: Reverse Engineering,VideoCoding — Michael @ 00:43

2005-12-20

Profiling / Benchmarking code

You have a piece of code, it's too slow, so you want to optimize it. But to do that you first need to be able to measure its speed; otherwise you won't know whether a change improved things or not.

Benchmarking a piece of code sounds easy, but it isn't. You can try to do it with various standard “get the current time” functions, but these tend to be very inaccurate. Canned tools like gprof are an option too, but in my experience they are also inaccurate and can produce highly misleading results due to function inlining. A more accurate method on the x86 architecture is to use rdtsc.

So now we have an accurate method to get the current “time”, but what now? Simply running the code once and getting the time before and afterwards? This would be a very bad choice: the first time a piece of code is executed it's not in the code cache, the data isn't in the data cache, and branch prediction hasn't seen the code yet. So we could ignore the first few iterations, but a single measurement still isn't very trustworthy. The next improvement would be to average many iterations, but that is still very sensitive: just think about what happens when the kernel decides to stop your program and give the CPU to your mp3 player; it's like having one iteration which needs 50 times longer than the average (assuming the code between the rdtsc instructions doesn't take too long). Solving this is easy, though: just discard such outliers which need many times the average (see the START/STOP_TIMER macros in ffmpeg/libavutil/common.h).

Filed under: Optimization — Michael @ 13:27

2005-12-07

Correcting byte and burst errors with CRCs

Can CRCs be used to fix a whole messed-up byte? Yes, as long as the codeword (data bits + crc bits) is shorter than what is in this table under b≤8:

generator polynomial b≤2 b≤3 b≤4 b≤5 b≤6 b≤7 b≤8 b≤9 b≤10 b≤11 b≤12 b≤13 b≤14
CRC-4 0xF0000000 8 6 6 6 6 6 6 6 6 6 6 6 6
CRC-5 0x28000000 16 9 7 7 7 7 7 7 7 7 7 7 7
CRC-7 0xA2000000 18 18 9 9 9 9 9 9 9 9 9 9 9
CRC-7 0x12000000 34 28 12 9 9 9 9 9 9 9 9 9 9
CRC-7 0x6E000000 66 12 9 9 9 9 9 9 9 9 9 9 9
CRC-8 0x07000000 130 12 12 10 10 10 10 10 10 10 10 10 10
CRC-8 0x39000000 20 20 11 10 10 10 10 10 10 10 10 10 10
CRC-8 0xD5000000 96 24 12 10 10 10 10 10 10 10 10 10 10
CRC-12 0x80F00000 2050 555 16 16 16 14 14 14 14 14 14 14 14
CRC-12 0x80B00000 513 259 16 16 16 14 14 14 14 14 14 14 14
CRC-12 0x1F100000 68 68 21 15 15 14 14 14 14 14 14 14 14
CRC-15 0x8B320000 130 130 130 130 58 27 17 17 17 17 17 17 17
CRC-16 0x10210000 32770 7144 4181 175 19 19 19 18 18 18 18 18 18
CRC-16 0x80050000 32770 19 19 19 19 19 19 18 18 18 18 18 18
CRC-16 0xA0970000 16386 931 862 76 76 65 22 18 18 18 18 18 18
CRC-16 0xC5030000 7164 7164 188 188 164 145 28 18 18 18 18 18 18
CRC-16 0x90D90000 154 154 154 154 154 77 21 18 18 18 18 18 18
CRC-24 0x80510100 7164 7164 1028 1028 1028 1028 1028 1028 348 30 30 26 26
CRC-32 0x04C11DB7 376820511 376820511 30435040 14373578 14373578 3932619 1077949 49616 11995 5682 1731 732 40
CRC-32 0x404098E2 1026 1026 1026 1026 1026 1026 1026 1026 1026 1026 241 229 114
CRC-32 0x1EDC6F41 2147483650 258958121 193439312 62023781 3040389 1847228 603132 98658 4913 3356 1104 86 86
CRC-32 0x741B8CD7 114698 114698 16390 16390 6361 3955 1601 120 120 120 120 77 49
CRC-32 0xF4ACFB13 32770 32770 32770 32770 32770 32770 32770 32770 6508 3052 1696 152 152
CRC-32 0x32583499 32772 32772 11340 11340 6230 5348 324 324 324 324 156 44 34
CRC-32 0x20044009 32772 32772 3792 3792 3792 3792 620 360 302 302 52 52 52
CRC-32 0xA833982B 65540 65540 1928 1928 1928 1928 1928 1593 203 203 203 66 66
CRC-32 0x00210801 65540 65540 15207 15207 3211 3211 959 83 83 83 39 39 39

The table was created with crc_test_v2.c.

To actually correct byte errors, the code below can be used; a complete example (crc_test_byte.c) is available too.

static int get_error(unsigned int crc, unsigned int len, int *error){
    int i;

    for(i=0; i<len; i++){
        if(!(crc & ~255)){  /* syndrome fits in one byte -> error pattern found */
            *error= crc;
            return i;
        }
        /* step the syndrome back by one bit (multiply by x^-1 mod generator) */
        crc= (crc>>1) ^ (((G>>1)|0x80000000) & (-(crc&1)));
    }
    return -1;
}
Filed under: Error Correcting Codes — Michael @ 23:08

2005-12-06

Correcting 2 bit errors with CRCs

1 and 2 bit errors can easily be corrected by using the following code snippet; a complete example is available too.

    unsigned int i, v=1;
    for(i=0; i<BLOCK_SIZE; i++){
        crctab[i][0]= i ? (v^1) : v;  /* syndrome of 2 bit errors i apart (i=0: single bit) */
        crctab[i][1]= i;
        v= (v<<1) ^ (G & (((int)v)>>31));
    }

    qsort(crctab, BLOCK_SIZE, 2*sizeof(int), cmp);  /* sort so bsearch can find syndromes */

static int get_error(unsigned int crc, unsigned int len, int error[2]){
    int i;

    for(i=0; i<len ; i++){
        int *result= bsearch(&crc, crctab, BLOCK_SIZE, 2*sizeof(int), cmp);
        if(result){
            error[0]= i;
            error[1]= result[1] + i;
            return 1 + (result != crctab[0]);
        }
        crc= (crc>>1) ^ (((G>>1)|0x80000000) & (-(crc&1)));
    }
    return -1;
}

Note: if you want to use this in anything where speed matters, then you should replace the qsort/bsearch with some hash table. I didn't, as the C standard doesn't contain any usable hashtable implementation and I was lazy.

Filed under: Error Correcting Codes — Michael @ 00:18

2005-12-05

(Mis)using CRCs for error correction

Cyclic redundancy check codes are commonly used for detecting errors, but they can also be used to correct single and multi-bit errors. To be able to correct e errors per codeword and to detect codewords with d errors, the minimum hamming distance of the code must be > 2e+d.
So what is the relation between the codeword length and the minimum hamming distance for crc codes?

generator polynomial hd≥3 hd≥4 hd≥5 hd≥6 hd≥7 hd≥8
CRC- 4 0xF0000000 5 5 5 5 5 5
CRC- 5 0x28000000 31 4 4 4 4 4
CRC- 7 0xA2000000 15 15 7 7 7 7
CRC- 7 0x12000000 127 6 6 6 6 6
CRC- 7 0x6E000000 63 63 9 9 7 7
CRC- 8 0x07000000 127 127 8 8 8 8
CRC- 8 0x39000000 17 17 17 7 7 7
CRC- 8 0xD5000000 93 93 10 10 8 8
CRC-12 0x80F00000 2047 2047 13 13 12 12
CRC-12 0x80B00000 510 171 36 13 13 13
CRC-12 0x1F100000 65 65 65 21 13 11
CRC-15 0x8B320000 127 127 127 127 22 22
CRC-16 0x10210000 32767 32767 16 16 16 16
CRC-16 0x80050000 32767 32767 16 16 16 16
CRC-16 0xA0970000 32766 32766 83 83 24 24
CRC-16 0xC5030000 7161 496 85 36 24 22
CRC-16 0x90D90000 151 151 151 151 21 21
CRC-24 0x80510100 7161 7161 1030 1030 24 24
CRC-32 0x04C11DB7 4294967295 91638 3006 299 203 122
CRC-32 0x404098E2 1024 1023 1023 1023 1023 1023
CRC-32 0x1EDC6F41 2147483647 2147483647 5275 5275 209 209
CRC-32 0x741B8CD7 114695 114695 16392 16392 184 184
CRC-32 0xF4ACFB13 65534 65534 32768 32768 306 306
CRC-32 0x32583499 65538 65538 32770 32770 166 166
CRC-32 0x20044009 65538 65538 32770 32770 32 32
CRC-32 0xA833982B 65537 65537 65537 1091 113 89
CRC-32 0x00210801 65537 65537 65537 31 31 31

These were found using crc_test.c.
Actually correcting errors is pretty trivial: for a single bit error, feeding the code below with the calculated crc XORed with the received crc will give you the distance from the end of the codeword at which the error occurred. And yes, if the two crcs match (crc=0), then there is no single bit error.

int get_single_error(int crc, int len){
    int i, v=1;

    for(i=0; i<len ; i++){
        if(v == crc)    /* syndrome matches a single bit error i bits from the end */
            return i;
        v= (v<<1) ^ (generator_polynom & (v>>31));  /* v = x^(i+1) mod generator */
    }
    return -1;
}

I'll post some code examples to fix multi-bit errors later …

Filed under: Error Correcting Codes — Michael @ 20:23