Lair Of The Multimedia Guru

2006-06-02

Optimal (A)DPCM encoding

ADPCM in general encodes differences between samples using a small number of possible difference values and adapts which values are allowed depending upon past difference values
( … we all know that)

ADPCM encoders normally work on the principle that they select the best possible difference between the last sample and the current sample, while ignoring both future and past; that's not optimal at all

a viterbi based encoder would select the optimal sequence of differences, the one which minimizes some distortion measure like the sum of squares. it achieves this by successively finding the optimal encodings up to a specific state, or in other words:
instead of keeping a single sequence of encoded differences for the past samples and then encoding the next difference,
we keep track of the optimal sequence of differences / encoded bitstream up to the current sample for every state (0..88 step + a few sample values around the ideal one). next we just calculate the optimal bitstreams up to the next sample using the current sample and the optimal ones up to the last sample

maybe a concrete example would help; we use non-adaptive DPCM here, but the principle is the same …

let's say we have the input samples 0,2,6,4,-4
and our example encoder can encode +4,+1,-1,-4 differences
a conventional encoder would output 0,1,5,4, 0 distortion=18
optimal would be 1,2,6,2,-2 distortion=9
if we forced the first output to 1 and then used a conventional encoder: 1,2,6,5, 1 distortion=27

viterbi encoder:

1. iteration
-2              distortion 4
-1              distortion 1
 0              distortion 0
 1              distortion 1
 2              distortion 4
2. iteration
 1, 0           distortion 5
 0, 1           distortion 1
 1, 2           distortion 1
-1, 3           distortion 2
 0, 4           distortion 4
3. iteration
-1, 3, 4        distortion 6
 0, 1, 5        distortion 2
 1, 2, 6        distortion 1
-1, 3, 7        distortion 3
 0, 4, 8        distortion 8
4. iteration
 1, 2, 6, 2     distortion 5
-1, 3, 7, 3     distortion 4
 0, 1, 5, 4     distortion 2
 1, 2, 6, 5     distortion 2
 0, 1, 5, 6     distortion 6
5. iteration
not possible,-6
not possible,-5
not possible,-4
not possible,-3
 1, 2, 6, 2,-2  distortion 9
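
to make this concrete, here is a minimal runnable C sketch of exactly this toy search (non-adaptive, the +4,+1,-1,-4 difference set and the 5 states around each input sample from above); it's an illustration, not a real (A)DPCM encoder:

#include <stdio.h>
#include <limits.h>

#define N       5
#define NSTATES 5 /* candidate reconstructed values: input[i]-2 .. input[i]+2 */
#define NDIFF   4

int main(void){
    const int input[N]    = {0, 2, 6, 4, -4};
    const int diff[NDIFF] = {4, 1, -1, -4};
    int dist[N][NSTATES]; /* best total distortion up to sample i, ending in state s */
    int prev[N][NSTATES]; /* predecessor state, for backtracking */
    int out[N];

    for(int s=0; s<NSTATES; s++)
        dist[0][s]= (s-2)*(s-2);  /* 1. iteration: free choice of the start value */

    for(int i=1; i<N; i++)
        for(int s=0; s<NSTATES; s++){
            int v= input[i] + s - 2;          /* reconstructed value of this state */
            dist[i][s]= INT_MAX;
            for(int t=0; t<NSTATES; t++){
                int u= input[i-1] + t - 2;    /* previous reconstructed value */
                if(dist[i-1][t] == INT_MAX)
                    continue;                 /* unreachable predecessor */
                for(int d=0; d<NDIFF; d++)
                    if(u + diff[d] == v && dist[i-1][t] + (s-2)*(s-2) < dist[i][s]){
                        dist[i][s]= dist[i-1][t] + (s-2)*(s-2);
                        prev[i][s]= t;
                    }
            }
        }

    int best= 0;  /* pick the best final state and backtrack */
    for(int s=1; s<NSTATES; s++)
        if(dist[N-1][s] < dist[N-1][best]) best= s;
    printf("distortion %d:", dist[N-1][best]);

    for(int i=N-1; i>0; i--){ out[i]= input[i] + best - 2; best= prev[i][best]; }
    out[0]= input[0] + best - 2;
    for(int i=0; i<N; i++) printf(" %d", out[i]);
    printf("\n"); /* prints: distortion 9: 1 2 6 2 -2 */
    return 0;
}
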
Filed under: Uncategorized — Michael @ 20:18

2006-05-23

PT_GNU_STACK

You upgrade your linux distro and suddenly things are failing with error messages like "error while loading shared libraries: libfoobar.so.123: cannot enable executable stack as shared object requires: Permission denied". what happened?

Well, you just have become yet another witness of the decay of the GNU toolchain. besides gcc and glibc becoming more and more bloated every day, gcc bug reports being randomly closed without fixing them, and so on, we have for some time now had the beautiful PT_GNU_STACK feature in every GNU tool, which tries very hard to make the stack of applications executable. without grsec your system will be more vulnerable; with grsec stuff will fail hard, as grsec will not give apps an executable stack if they don't need one, and the redhead, ehm i mean gnu, libc/loader/… will then simply refuse to continue if the unnecessary executability isn't provided

what exactly is PT_GNU_STACK?

PT_GNU_STACK is an entry in the ELF file format which contains the access rights (read, write, execute) of the stack. so far so good, nothing bad here; the problem is how these access rights are chosen by the tools …

The idea behind PT_GNU_STACK / choosing the access rights

The stack is always readable and writeable, obviously that's needed, so we are left with the question of whether it should be executable. when linking objects together into a library, or libraries into an application, a single object requiring an executable stack will cause the final application and all libraries which contain that object to need an executable stack; if no object needs an executable stack, the final application won't need one either

The design and implementation of PT_GNU_STACK

Every tool in the gnu toolchain, from source over assembly, objects and libraries to the final application, will merge the PT_GNU_STACK access rights so that a single object needing an executable stack causes the final application to need one; in case of doubt (no PT_GNU_STACK entry) the default is to assume that the object needs an executable stack
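
to see what the toolchain decided for a given binary, and to opt a hand written assembly file out of the executable-stack default, something like the following works (readelf and the section name are the standard tools/markers, the file name is made up):

readelf -lW ./myapp | grep GNU_STACK    # RWE = executable stack, RW = not

# at the end of a hand written .s file, to mark it as not needing
# an executable stack:
.section .note.GNU-stack,"",@progbits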

The problem with this / the total failure of the PT_GNU_STACK idea

Undecidability at the compilation and assembly level

Deciding whether a piece of code needs an executable stack is provably not possible, it's indeed equivalent to solving the halting problem
so gcc, gas and so on guess if a piece of code needs an executable stack, and if they are unsure, they claim that it does

Undecidability at the global level

For example, when calling a function through a global pointer, the compiler compiling that file can't decide on the stack access rights, as it doesn't know all the code which changes the pointer; the linker would later have to disassemble the object files again just to have the info needed to decide it …

several definitions of executability

If you execute code on the stack, does it need to be executable? with redhat's execshield AFAIK yes; with grsec it depends upon the code, as grsec can detect and emulate some common and harmless sequences of instructions on the stack (trampolines)

Even if an application needs an executable stack for some feature

There's no need to have an executable stack if that feature isn't used; applications might even explicitly try to enable / disable executability of a specific part of memory depending upon their needs. gcc will in that case almost certainly claim that the app always needs an executable stack …

All these things cause applications to claim that they require an executable stack when they simply do not, which makes your system much more vulnerable to attacks

The alternative

Mark final applications which need an executable stack as such, and make the default to not require an executable stack

what's the difference

Well, in practice more than 99% of all applications don't need an executable stack, and if an application needs an executable stack but doesn't have one, then that causes a clear failure which will be noticed immediately and can be fixed by marking that application as needing an executable stack.
OTOH the current PT_GNU_STACK system, with its pass-the-flags-from-source-to-final-app design, ideally needs corrections for each source file which the gnu tools can't deal with automatically, and there are many more source files than applications, not to mention that the error rate at the application level seems to be far in excess of 1%
furthermore, marking an app as requiring an executable stack while it actually doesn't won't produce any warning, error or other hint that something is wrong, while security is greatly impaired

Filed under: Off Topic — Michael @ 14:29

2006-05-19

Coding Style / Code Conventions

There are many coding style guides, IMO significantly too many … their existence is often justified only by the single hypothesis that having everything in the same style is more readable than having one style per file and letting the original author / maintainer decide. now maybe my memory sucks, but i don't remember a single case where a difference in coding style between different parts of a project caused me any difficulty in understanding. it was rather the lack of any indentation, the exclusive use of single letter variables, or simply the complexity of the code, the lack of comments, and my stupidity of course, which caused difficulty understanding something …

hmm, mphq cvs is still dead :( so let's try to write the ultimate coding style guide and hope mphq works when i'm done, so i can finally commit a few minor bugfixes

The ultimate coding (style) guide

Rule 1

Every rule must be justifiable, and the justification must not apply to a contradicting rule too
yes, i'm trying to avoid the “for consistency” justification, which can be applied to literally anything

Rule 2a

Write consistent code / changes should be approximately in the same style as the surrounding code
Justification: having the coding and indentation style change every few lines makes code somewhat harder to read, while an overly strict style restriction will cause some volunteers (or employees) to leave

Rule 2b

Write indented code (code within a block should be at the same indentation level, code in parent and child blocks should be at different levels); the placement and indentation of stuff like {} is not restricted here, just the actual code
Justification: see mplayer.c for what happens without this rule :)

Rule 2c

Avoid constructs which IDEs/editors have difficulty with; common examples are non-ASCII letters, tab / space mixes, differing tab length in different files, dos line endings, …
Justification: obvious

Rule 2d

avoid non-english (in identifiers and comments)
Justification: not everyone speaks every language, and english is understood by most people. of course, if every programmer you care about understands hungarian, then that's a perfect choice too

Rule 3

maximize useful information at minimum size and complexity (simple code preferred)

so a variable name like szMyCompleteName is normally worse than simply name, as the type is probably obvious and “complete” adds no new info … of course a function which converts zero terminated strings to length + chars style would benefit from having the type in the names; it all depends upon how and where a variable is used
comments should not say what's already obvious from the code, like int i=sqrt(v); //i is the square root of v, but instead provide useful info like //i is the largest factor v can contain, so we only need to test integers less than i, but then in that case maybe renaming i to largest_factor would have provided more info at less size …
justification: the idea is to provide the reader with the info she is interested in, not to duplicate it and spread the important stuff out between noise
also, less code means fewer places where a bug could be
Note: this rule also forbids code duplication and other bloatedness

Rule 4

Write efficient code
Note: efficient here means little memory, disk, cpu, …
Justification: well this one should be obvious

Rule 5

Write portable code
Justification: well this one should be obvious too

Rule 6

Write localized / internationalized applications (user visible stuff in her native language)
Note: maybe error messages should be dealt with differently, as the developers need to understand them in bug reports …
Justification: well this one should be obvious too

Rule 7

Write compatible code (don't break compatibility of file formats, your last library release, …)
Justification: well this one should be obvious too

Rule 8

Write secure code (don't write over the end of an array, check untrusted input carefully, be careful with malloc(w*h) due to overflows of the *, …)
Justification: well this one should be obvious too

Rule 9

Write modular code / split code into independent parts
Justification: many dependencies make code very hard to understand and maintain

Rule 10

If some rules contradict, use common sense (which goal is more important in a specific case)

Filed under: Off Topic — Michael @ 19:49

2006-05-15

TV sets

It's been a month since my last blog entry, i'm not sure if that's bad or not. anyway, here's some more halfway offtopic crap: i want(ed) to buy a new TV set, as the one i have here is _old_ and doesn't really work very well …

First question CRT or LCD?

CRTs are old, many shops don't have any CRT based TV sets anymore, and everyone seems to know they are old and inferior
LCDs, shiny new technology, flatter and better. well wait, why exactly does every LCD TV below 1000€ have a _vastly_ inferior picture quality compared to my mother's el-cheapo noname CRT TV which probably cost < 200€ … and i'm not saying the ones above 1000€ were better, they were just out of my acceptable price range, and many were showing high resolution HDTV, which made fair comparison hard …

Size

for CRTs size seems strongly correlated with quality and features, no 55cm 100hz TV sets for example; it also seemed like larger isn't really more expensive. for LCDs of course larger is much more expensive …

100hz CRT vs 50hz

50hz flickers, and actually i somehow can't get rid of the feeling that it flickers more strongly than 50hz TV sets did a few years ago
100hz doesn't flicker. wait, actually i found just one 100hz TV set <300€ which didn't have at least part of the image flickering, all others had parts of the TV channel's logo flickering. also with 100hz you can choose between some residue interlacing artifacts or “staircase” artifacts on moving objects; i've not found any 100hz tv set which didn't show some artifacts, although in the end i prefer 100hz over 50hz, the artifacts are less annoying than the flickering IMHO

Comparing TV sets

Well, you walk into a shop with TV sets, many TV sets … all showing different stuff, and if you are lucky and they show the same thing, then it's a crappy dark low quality music video from MTV, completely unsuitable for comparing. but wait, one shop had remote controls “attached” to each tv set so customers could change settings and channels as they liked; sadly, less than half the remotes were working …

Summary, conclusion and personal opinion (yeah, as if the whole blog was anything but subjective personal opinion)

LCDs have improved, more in the written numbers than in reality; CRTs seem to have become worse, but still LCDs are behind CRTs both in quality and quality per price
many TV sets, be it CRT or LCD, show terrible artifacts
50hz CRTs: flickering
100hz CRTs: staircase or interlacing artifacts on moving objects
CRTs in general: some cases of brightness-size dependence; most have oversharpened and too strongly saturated images (yeah, the last 2 are purely an adjustment thing, probably done as most customers are idiots and prefer oversharpened and too strongly saturated images)
LCDs: washed out blurry images, colors more or less wrong, artifacts caused by slow reaction times, not to mention 2-4 times as expensive as CRTs

Filed under: Off Topic — Michael @ 11:56

2006-04-15

Decrypting Nagravision

Now, after breaking videocrypt, let's take a look at (old analog) nagravision

Nagravision

the PRNG seed

Same as videocrypt (binary data from the vertical blanking interval is fed into a smartcard which then, among other things, produces a PRNG seed for the nagravision decoder, but none of this matters to us here)

the PRNG

The PRNG (pseudo random number generator) is fed with a 15bit seed and a table of 256 5bit entries, and produces 287 5bit values; the following code does exactly that:

#include <stdint.h>
#include <string.h>

extern const uint8_t keyTable[256]; /* the 256 entry 5bit table, provided elsewhere */

void prng(int seed, int *cle2enc){
  uint8_t keyNdx= seed >> 7;              /* high 8 bits of the seed, wraps mod 256 */
  int keyInc= ((seed & 0x7F) << 1) + 1;   /* low 7 bits, always odd */
  int buffer[32];

  for(int i=0; i<32; i++) buffer[i]= i;

  for(int i=0; i<255; i++){
    cle2enc[i]= buffer[ keyTable[keyNdx] ]; /* output one of the 32 buffered lines */
    buffer[ keyTable[keyNdx] ]= i + 32;     /* and replace it with the next one */
    keyNdx+= keyInc;
  }

  memcpy(cle2enc+255, buffer, 32*sizeof(int)); /* flush the remaining 32 entries */
}
the actual decryption

nagravision has a 32 line buffer, each entry of which can contain a line of video. for each of the 287 lines of a field there are 5 bits from the PRNG which are used as an index into the buffer to decide into which entry to store the next received line and from where to output one; in other words, nagravision reorders lines using a 32 line temporary buffer
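
since the prng output is just a permutation of the 287 field lines, decoding with a known seed is (as a sketch, assuming the prng() above; the direction of the permutation is from memory and might be reversed):

#include <stdint.h>

/* sketch: undo the line shuffle of one field given the seed;
   assumes cle2enc[i] is the position the i-th transmitted line
   has in the clear image (it might be the other way round) */
void descramble_field(uint8_t *dst[287], uint8_t *const src[287], int seed){
    int cle2enc[287];

    prng(seed, cle2enc);
    for(int i=0; i<287; i++)
        dst[ cle2enc[i] ]= src[i];
}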

Breaking Nagravision without knowledge of the PRNG

The naive and ideal solution would be to try all 32^287 possibilities and compare the resulting decoded images; obviously that's not realistic
if OTOH we just use a compare function which compares 2 lines, let's assume simple correlation, and wish to find the global minimum, then we actually didn't improve by that much, as that's equivalent to solving the TSP
old versions of 2010 used the following algorithm to find a good permutation of the lines, which resulted in watchable movies

Step 1 (downsampling)

to speed up the line compare later, the lines can be horizontally downsampled

Step 2 (comparing lines)

all lines which are not near black and which have not yet been merged with 2 lines are compared to all other such lines; the compares are done blockwise so that the number of cache misses is minimized. for each line a fixed number of best matches is kept

Step 3 (sorting)

all the line pairs are sorted by their similarity

Step 4 (merging)

from the best matching line pair to the worst, the following steps are done

Step 4A (out of pairs)

if we hit the last pair for any line (we just kept a fixed number of best matches for each line) then we go back to Step 2

Step 4B (impossible merges 1)

if either of the lines is in the middle of a block, or if they are the ends of the same block, we continue with the next pair at Step 4

Step 4C (impossible merges 2)

we maintain 2 position limits for each block, one for each possible orientation; if a merge would force the block outside the image, we continue with the next pair at Step 4
the limits are calculated based on the fact that no line can be output before it has been received

Step 4D (impossible merges 3)

we maintain the fullness state of the 32 line reorder buffer; if a merge would require overflowing it, then we again continue with the next pair at Step 4

Step 4E (merge)

we merge the 2 blocks and update the buffer fullness and limits

Step 5 (sort blocks)

all blocks are sorted by their less restrictive position limit (= the orientation is chosen so that the limit is the less restrictive one)

Step 6 (reorder lines)

we reorder the actual lines

Breaking Nagravision with knowledge of the PRNG

as there are just 2^15 possible permutations, the naive bruteforce becomes a lot more realistic, but it's still somewhat expensive for realtime decoding, so 2010 used a faster method
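
for illustration, the naive 2^15 search would look roughly like this sketch (line_score() is a hypothetical similarity function, descramble_field() is the sketch from above):

#include <stdint.h>

/* hypothetical similarity measure, plug a real correlation in here */
static double line_score(const uint8_t *a, const uint8_t *b){
    (void)a; (void)b;
    return 0;
}

/* sketch: try all 2^15 seeds, keep the one whose descrambled
   field looks most like a real image */
int find_seed(uint8_t *const src[287]){
    int best_seed= 0;
    double best_score= -1e300;

    for(int seed=0; seed < (1<<15); seed++){
        uint8_t *tmp[287];
        double s= 0;

        descramble_field(tmp, src, seed);
        for(int i=1; i<287; i++)
            s += line_score(tmp[i-1], tmp[i]);
        if(s > best_score){
            best_score= s;
            best_seed= seed;
        }
    }
    return best_seed;
}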

Step 1 (downsampling)

to speed up the line compare later, the lines can be horizontally downsampled

Step 2 (compare)

every keyline is compared against all lines, and the best 2 matches are kept

Step 3 (find seedlists)

for every line triplet (keyline and the 2 best matches) we assume that they will be consecutive in the decoded image, and look up a list of seeds for which this is true from a precalculated table or tree structure

Step 4 (find seed)

we find the most often occurring seed in the seedlists, and if its number of occurrences is above some threshold we decode the image, if not we drop it; this ensures practically zero wrongly decoded images

Step 5 (reorder)

after we know the seed we just need to reorder the lines and fix the chroma

Filed under: Cryptanalysis — Michael @ 11:54

2006-04-10

Decrypting VideoCrypt

To complement my last blog entry, here's a description of how VideoCrypt works and how it can be broken with reasonable quality

PAL

First, before we can talk about VideoCrypt, we need to understand how PAL works
Video information in PAL (and NTSC and SECAM) is stored linewise top to bottom, and interlaced, so all the even lines and then, in the next field, all the odd lines are stored (313 lines in even fields, 312 in odd ones, of which ~288 lines contain a visible picture for PAL)
each line has a sync pulse to mark the end of a line, a horizontal blanking interval during which the electron beam in CRTs “moves” back to the left after the end of the previous line at the right, and obviously the visible part, in which brightness = amplitude
In case of color, the U and V components of the YUV color value are stored quadrature amplitude modulated at the color subcarrier frequency, the V component has its sign inverted in every 2nd line of a field, and there's also a short reference color burst added in the HBI

VideoCrypt

the PRNG seed

Binary data from the vertical blanking interval is fed into a smartcard which then, among other things, produces a PRNG seed for the videocrypt decoder, but none of this matters to us here

the actual decryption

The PRNG (pseudo random number generator) outputs a sequence of 8bit values (cutpoints), one for each line of video. each of these lines is then split into 4 parts: parts 0 and 3 are outside the visible area and aren't touched by videocrypt, parts 1 and 2 are the visible area and are exchanged; the location which separates 1 and 2 is the cutpoint. videocrypt does nothing special with the color information, it just exchanges raw digitally sampled data
because this exchange operation cannot be done with zero delay, the videocrypt decoder will also output parts 1 and 2 one line later than they were transmitted

Breaking VideoCrypt

Lacking information about the PRNG, we can only guess the cutpoints, based on the assumption that color/luminance values of pixels which are close will be strongly correlated, while distant pixels will differ more
still, naive bruteforce would require 256^288 images of w*288 pixels to be compared
depending on how we test an image for how good it is, we can significantly reduce the complexity. if we use a score function which is the sum of some compare function which uses just consecutive line pairs and their cutpoints, then the dynamic programming / viterbi algorithm can be used to reduce the complexity to 256*256*288 such 2 line checks. if we now further limit ourselves to the common cross correlation + some edge detector, and only consider a few cutpoint differences close to the best result from the cross correlation, then the complexity would be w*log(w)*288 + 9*w*288, which is what markus kuhn's antisky.c did. though antisky was far from realtime decoding, mainly because it was inefficiently written (floating point code everywhere, at a time when no cpu had a reasonably fast fpu, …)

Note, if you think comparing 256^288 images would be stupid, it's not: think about choosing the image which is compressed best by jpeg or another codec, this might (or might not) be better than the linewise method

Step 1 (downsampling)

to speed up the cross correlation and other things, the whole image can optionally be downsampled

Step 2 (luminance cross correlation)

antisky.c used a FFT based cross correlation, which while asymptotically quite fast is really not fast at all for finding the best matching cutpoint difference in practice. 2010 uses an adaptive comparison which first tries a few coarsely spaced cutpoint differences and then recursively tries more around the better ones; it's simpler and faster

Step 3 (mismatched lines detection)

lines which simply can't be matched by cross correlation (mean under some threshold, or variance under some threshold, or best matching score from the cross correlation not much better than the average score) must be marked, otherwise the (random) cross correlation result will trash the surrounding lines

Step 4 (PAL phase detection)

as already described somewhere above, PAL flips the sign of the V component every second line, and the first line may or may not be flipped. if the capture hw doesn't provide us with this info, then we need to find it ourselves if we want to decode color. one way to solve this is to let the user set the flag and then just flip the flag after every frame; a much more robust method is to compare the color of the first and last pixel of the encrypted line parts which will be exchanged: these are guaranteed to always be adjacent in the decrypted image, so they will have almost the same color there, while in the encrypted image their difference depends upon the PAL phase thingy

Step 5 (finding the Chroma phase difference)

we know the approximate cutpoint difference from the luma cross correlation, so we can simply find the chroma phase/hue difference by calculating the complex correlation coefficient (U and V are the real and imag values) (note, a little care needs to be taken here due to various things i don't remember exactly …)

Step 6 (edge detection)

simply run your favorite edge detector over the 256 possible cutpoints to detect the picture edge, for example log(abs(l[x-2] + l[x-1] - l[x+0] - l[x+1]))

Step 7 (dynamic programming search)

Well, here we simply combine all the previous stuff to find the best cutpoint sequence up to every cutpoint of line X, assuming we already know the best sequences up to all cutpoints of line X-1
lines which couldn't be matched to their previous line are a special case which is handled like the first line (every cutpoint gets the same score). the same is done with yet another special case, namely the one where no cutpoint is possible anymore. that can happen as we restrict ourselves to a few relative cutpoints around what the cross correlation guessed is correct, and there's a small deadzone in which no cutpoints lie, so not every sequence of relative cutpoints is possible, depending upon how you define relative
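
a sketch of the full 256*256*288 variant (score() is a placeholder for the combined cross correlation / edge detector result from the previous steps; the real thing additionally restricts the search to a few cutpoints around the cross correlation guess):

#include <stdint.h>

#define LINES 288
#define CUTS  256

/* placeholder: how well line y-1 cut at c_prev joins line y cut
   at c_cur, plug the real cross correlation + edge score in here */
static double score(int y, int c_prev, int c_cur){
    (void)y; (void)c_prev; (void)c_cur;
    return 0;
}

void best_cutpoints(int out[LINES]){
    static double d   [LINES][CUTS]; /* best total score up to line y, ending in cut c */
    static int    prev[LINES][CUTS]; /* backtracking info */

    for(int c=0; c<CUTS; c++)
        d[0][c]= 0;                  /* first line: every cutpoint scores equal */

    for(int y=1; y<LINES; y++)
        for(int c=0; c<CUTS; c++){
            d[y][c]= -1e300;
            for(int p=0; p<CUTS; p++){
                double v= d[y-1][p] + score(y, p, c);
                if(v > d[y][c]){ d[y][c]= v; prev[y][c]= p; }
            }
        }

    int best= 0;                     /* pick the best last cut and backtrack */
    for(int c=1; c<CUTS; c++)
        if(d[LINES-1][c] > d[LINES-1][best]) best= c;
    for(int y=LINES-1; y>0; y--){ out[y]= best; best= prev[y][best]; }
    out[0]= best;
}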

Step 8 (cutpoint sequence cache)

over some time period, cutpoint sequences were repeating after an hour or so, so i added a cache to 2010 which led to perfect decoding, until the repetitions disappeared …

Step 8a (finding the current sequence)

we select several consecutive relative cutpoint pairs and look them up in a hashtable; we also try adding +-1 to both, to compensate for errors in our cutpoint guess
for each pair we will get a list of pointers to cutpoint sequences
we count the number of occurrences of each pointer and choose the most often occurring one as the correct sequence if its number of occurrences is above some threshold, or return no match if not

Step 8b (merging the current cutpoint estimates)

we add the current relative cutpoints to the sequence, so that each line has a list of relative cutpoints; if that gets too long, the least often occurring one is discarded
for each line the most often occurring relative cutpoint is chosen and used in later steps, and the hashtable is updated with it

Step 9 (cut and exchange)

this is simple, just 2-3 memcpys and rotating the UV values so as to compensate for the effects of the shift and the 1 line delay on the color

Filed under: Cryptanalysis — Michael @ 14:58

2006-04-07

Decrypting (old analog) PayTv

A really long time ago, 1997 or maybe even a little earlier, i wrote a program to decrypt the then-used analog pay tv systems videocrypt and nagravision. sadly i was too afraid of legal issues at that time, so i never released either source or binary. well, both systems are AFAIK dead nowadays, and the old code hasn't been touched since 2000 and doesn't compile with any compiler i have, so i thought it's the perfect time to finally release that trash

When was the first working version born?

well, i don't know; the oldest file i could find was a 2do list from 1997 which contains notes about improving videocrypt and nagravision decoding, so these must have been working already …

Does it contain any cryptographically new knowledge about the systems?

no. before the nagra PRNG was reverse engineered i believed that the PRNGs both systems used would be very secure and that there would be no point in “looking” at their output. afterwards, besides feeling silly, as even a blind man would have noticed how the nagra PRNG works (it's just permutating 256 5bit values in trivial 32768 ways, 127 only if we ignore new[x]= old[x+C]),
i tried to look at the videocrypt PRNG output but failed to find anything. 2 cutpoint sequences were either different, with their difference random, or they were exactly equal, which happened either commonly (repeating after an hour or so) or never, depending upon when you recorded them. i also failed to find any other relations in the sequences …

Is this clean ANSI C code?

it's ultra messy DJGPP-GCC-DOS-C code. i started to clean it but gave up; a patch with my unfinished cleanup work is included for the insane. with the patch, a few files compile under current gcc-linux

what else does it do? could it make coffee?

There's some teletext decoding code, some half working sync suppression decoder (for unmodified hardware), and probably a few other things i don't remember. after all, it doesn't compile, and i don't have my matrox meteor card here (which was the only card it supported, as that was what i had)

did you port it to java?

yes, i ported part of it to java a really long time ago. why i did that i'm not sure anymore, was i seriously believing the sun propaganda that java was a usable general purpose programming language? i guess my excuse is that my previous java code at that time was just tiny applets & applications, where the java-effect isn't so noticeable
the java version was, even with hand coded mmx over jni, about 3 times slower; there were ~1sec garbage collector pauses every 30secs or so, and after 15min the computer needed a hard reset (yeah, that was even without any mmx/jni)

download this crap\h\h\h\hmasterpiece

2010

license?

GPL but what do you want to do with it?!

Filed under: Cryptanalysis — Michael @ 14:37

2006-03-24

AVI and B frames

AVI and b frames, are they allowed or not?

AVI certainly wasn't designed to support b frames, but on the other hand there's nothing which would make b frames in AVI illegal. the biggest problem isn't AVI but the various applications and APIs like vfw which are designed with zero delay codecs in mind

PTS (presentation timestamps)

well, wtf are they and why would we need them? every (video) frame has a decode timestamp (DTS) and a PTS. the DTS is the time at which a frame is fed into the decoder, the PTS is the time at which the decoded frame will be presented to the user. codecs which have no b frames / zero delay always have PTS = DTS, and AVI, as it wasn't designed with b frames in mind, has no concept of PTS; there are just DTS, which are simply the frame number divided by the frame rate
So one could now argue AVI doesn't support b frames, as it doesn't store PTS, and an application which needs to know the PTS (simpler players don't need it …) would have to calculate it based upon frame type and DTS. but that argument against AVI + b frames has a critical flaw: MPEG-PS and MPEG-TS don't store a PTS for every frame either, they only require it to be stored every 0.5 seconds or so. which means that the same complicated calculate-the-PTS-from-DTS-and-frame-types code is needed for the official MPEG formats too
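
as an illustration of such code, a toy PTS reconstruction for the common case of 1 frame reorder delay and non-referenced b frames (a sketch, not a full demuxer): a b frame is presented at its own DTS, an I/P frame is presented when the next I/P frame is decoded

#include <stdio.h>

int main(void){
    const char type[]= {'I','P','B','B','P','B','B'}; /* decode order */
    int pts[7];
    int last_ref= -1;                /* last not yet displayed I/P frame */

    for(int dts=0; dts<7; dts++){
        if(type[dts] == 'B')
            pts[dts]= dts;           /* b frames need no reordering */
        else{
            if(last_ref >= 0)
                pts[last_ref]= dts;  /* previous reference is shown now */
            last_ref= dts;
        }
    }
    pts[last_ref]= 7;                /* flush the last reference at EOF */

    for(int i=0; i<7; i++)
        printf("%c dts=%d pts=%d\n", type[i], i, pts[i]);
    return 0;                        /* pts: 1 4 2 3 7 5 6 */
}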

Packed bitstream

Packed bitstream is a very ugly hack which puts several frames into a single avi chunk. this reduces the b frame delay by 1, which avoids some problems with APIs which are not aware of b frames, but causes various problems with APIs which are aware of b frames. the resulting bitstream is also not valid according to the MPEG standards, so lossless transcoding to .mp4 is much more complicated

Filed under: Container Formats — Michael @ 19:37

2006-03-02

The AVI container file format

Some people love it, some hate it, but no doubt it is probably the most common container format for videos, though its usage is declining as superior formats like matroska, nut and others gain popularity

Why is/was AVI so successful

well, there's no way to answer this with certainty, we can only guess

  • allows storing of audio and video encoded with almost any codec without complicated codec specific hacks (ogg is probably the most common example of a format which fundamentally failed here, but mpeg-ps/ts has the same issue, and .mp4 is, thanks to the standards committees, also full of codec specific hacks, which ironically wouldn't be needed at all)
  • simple muxing and demuxing (quicktime/.mov/.mp4/mpeg-ps fail here)
  • no known software patents (asf/wma/wmv fail here)

AVI the format vs. reality, or why some developers hate AVI

The biggest problem with AVI, at least in my humble opinion, is not some technical limitation of AVI but the fact that many encoders (muxers actually, to be precise) generate incorrect AVI files which violate the AVI spec from MS. that again means that players need to be much more complex to deal with all these variations …
some examples of such variations:

  • putting all the audio in a single chunk, yeah try to implement seeking …
  • having an index which doesn't match the chunks, and just to be sure, not setting the header flags which indicate that the index must be used
  • putting several variable sized frames into a single chunk
  • or the very common way of storing mp3 in AVI: simply cut the mp3 frames completely randomly and then scream “avi doesn't support vbr mp3” (hi virtualdub and clones)

Common misconceptions about AVI

AVI doesn't support variable bitrate audio
This is simple nonsense. there are 2 ways to store frames (audio/video/…) in AVI: the first is to set sample_size to the size in bytes of a packet and store an integer number of such packets per chunk; the second is to store 1 frame per chunk and set sample_size=0, and that's what has to be done for variable size packets, it works with audio as well as video. now what some encoders actually do is set sample_size=1 and then chop the audio stream up at random; this surprisingly works with cbr mp3 but not vbr mp3, but it's totally wrong even for cbr mp3
AVI doesn't support variable framerate
again, this isn't true. storing variable framerate in AVI is not efficient but it is possible, and it's not a hack, it doesn't stretch the AVI spec, it's fully supported: you just set the framerate to the least common multiple of the rates you want to use, and use 0-byte chunks for “skipped” frames
storing variable samples per packet audio in AVI (vorbis)
while this is still possible, it stretches the spec somewhat. sample_size must be 0, and then there are 2 ways to store things: 1. the one with large overhead, which sets rate/scale to a common timebase (gcd of samples per packet / samplerate) and then adds 0 sized packets to keep the timestamps correct; 2. the one with small overhead, which sets rate/scale to (lcm of samples per packet / samplerate) and packs several packets into each chunk, which isn't really allowed but well …
AVI has 24byte per chunk overhead, ODML-AVI has 32byte per chunk overhead
well, ODML-AVI without the old redundant index has just 16 bytes per chunk overhead, and if you stretch the spec like some encoders successfully do, by having chunks and index mismatch, then you can get away with 8 bytes per chunk overhead, and yes, you can still seek in this
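
to see where these numbers come from, a sketch of the involved on-disk structures (field names simplified, see the AVI/ODML specs for the real ones):

#include <stdint.h>

typedef struct {        /* 8 bytes in front of every chunk's payload */
    uint32_t fourcc;    /* e.g. '00dc' / '01wb' */
    uint32_t size;
} ChunkHeader;

typedef struct {        /* 16 bytes per chunk in the old 'idx1' index */
    uint32_t fourcc;
    uint32_t flags;     /* keyframe flag, … */
    uint32_t offset;
    uint32_t size;
} OldIdx1Entry;

typedef struct {        /* 8 bytes per chunk in an ODML 'indx' index */
    uint32_t offset;    /* relative to the index's 64bit base offset */
    uint32_t size;
} OdmlIndexEntry;

/* 8+16 = 24 bytes overhead per chunk with the old index,
   8+8  = 16 bytes with only the ODML index, and pointing the index
   into one big chunk (spec stretching) leaves just the 8 index bytes */
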
Filed under: Container Formats — Michael @ 23:53

2006-02-17

SIMD without SIMD

You want to put several values in a single int or uint64_t, but either you don't have a cpu which supports SIMD instructions like MMX, or the instructions simply don't match the datatype you have (like 2 5:6:5 rgb values in an int)

shifting some arbitrary bits left by 1

a += a&mask
note1: this is very useful for converting between rgb15 and rgb16
note2: yes, the bit into which you shift should be 0
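
for example (a sketch, assuming the usual 0RRRRRGGGGGBBBBB rgb15 and RRRRRGGGGGGBBBBB rgb16 layouts):

#include <stdint.h>

/* rgb15 -> rgb16: shift red and green left by one (mask covers the
   red+green bits), the new low green bit stays 0; the unused top
   bit of rgb15 is the 0 we shift into */
static uint16_t rgb15to16(uint16_t c){
    return c + (c & 0x7FE0);
}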

unsigned shift left & right

this is simply a normal shift, and then masking out the bits which were shifted out of each component
a>>/<<= shift
a&=mask

sign extension

(a+C)^C

00011|00010|00001|00000
10001|10000|01111|01110    + 01110|01110|01110|01110
11111|11110|00001|00000    ^ 01110|01110|01110|01110
(here: 2bit signed values stored in 5bit lanes, C = 01110 per lane)
signed shift right (signed left shift == unsigned)

unsigned shift + sign extension

sum all elements

there are several ways,
if you know nothing will overflow then a simple multiply will do
AAAABBBBCCCCDDDD * 0001000100010001 = SSSSXXXXXXXXXXXX
or
AAAABBBBCCCCDDDD + (AAAABBBBCCCCDDDD>>4) = X,a+b,X,c+d
(X,a+b,X,c+d) + ((X,a+b,X,c+d)>>8) = X,X,X,a+b+c+d
if overflows might happen / the width of the types is too small then
a= (x & 1111000011110000)>>4
b= x & 0000111100001111
a+=b
and then either use one of the methods above or recursively continue with this
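
as a concrete sketch, the multiply variant for four 4bit lanes in a uint16_t (only valid as long as the true sum fits into 4 bits):

#include <stdint.h>

/* sum of the four 4bit lanes of x with one multiply; the partial
   sums accumulate in the top lane: x*0x1111 = x + x<<4 + x<<8 + x<<12 */
static unsigned sum4x4(uint16_t x){
    return (uint16_t)(x * 0x1111) >> 12;
}
/* sum4x4(0x1234) == 1+2+3+4 == 10 */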

add

1. remove msb, and add the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x&l) + (y&l)) ^ ((x^y)&m)
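
a runnable sketch of this for four 4bit lanes:

#include <stdint.h>
#include <stdio.h>

static uint16_t swar_add(uint16_t x, uint16_t y){
    const uint16_t m= 0x8888, l= 0x7777;
    /* add the low 3 bits of each lane normally, then xor the msbs
       back in so carries cannot leak into the next lane */
    return ((x & l) + (y & l)) ^ ((x ^ y) & m);
}

int main(void){
    printf("%04x\n", swar_add(0x1234, 0xF123)); /* prints 0357: lanes wrap mod 16 */
    return 0;
}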

subtract

1. remove msb, and sub the lsbs
2. xor the xored msb back in
x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
m=1000100010001000 and l=0111011101110111
((x|m) - (y&l)) ^ ((x^y^m)&m)

negate

x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
x= ~x
((x&l) + 0001000100010001) ^ (x&m)

average round down

x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x&y) + (((x^y)&1110111011101110)>>1)

average round up

x=AAAABBBBCCCCDDDD and y=EEEEFFFFGGGGHHHH are our inputs
(x|y) - (((x^y)&1110111011101110)>>1)

test if any element is zero

x=AAAABBBBCCCCDDDD is our input
(x - 0001000100010001) & (~x) & 1000100010001000
Note, this can trivially be used to test for equality by choosing x=a^b
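
and a runnable sketch of the test for four 4bit lanes:

#include <stdint.h>
#include <stdio.h>

/* nonzero result iff some 4bit lane of x is zero: subtracting
   0x1111 only borrows through a lane's msb if the lane was 0 */
static uint16_t has_zero_lane(uint16_t x){
    return (x - 0x1111) & ~x & 0x8888;
}

int main(void){
    printf("%04x\n", has_zero_lane(0x1204)); /* 0080: the zero lane is flagged */
    printf("%04x\n", has_zero_lane(0x1234)); /* 0000: no zero lane */
    return 0;
}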

create a bitmask based on zeroness

x=AAAABBBBCCCCDDDD is our input
m=1000100010001000 and l=0111011101110111
((((((x&l) + l) | x) & m) >>3) + l) ^ l
(lanes where x is nonzero become 1111, zero lanes become 0000)
Note, this can trivially be used to test for equality by choosing x=a^b

Filed under: Optimization — Michael @ 23:46
