Lair Of The Multimedia Guru

2007-05-19

Reviewing Patches

What's a patch?

A patch can be many things if you look at Wikipedia, but the one I mean here is of course what's generated by the diff program. What diff outputs is simply the difference between 2 text files in a terse form which is both human and machine readable. The machine-readable part means that the diff output, together with one of the 2 input text files, can be used by the patch program to generate the other text file. This makes patches a very convenient method of communicating changes between people, as people can read and understand them and apply them automatically to a file.

The following is an example patch which corrects a serious mistake of Shakespeare. Sadly Shakespeare did not know about the patch program, so he apparently failed to correct it …

--- mytext	2007-05-19 00:46:37.000000000 +0200
+++ mytext2	2007-05-19 00:47:12.000000000 +0200
@@ -1,3 +1,3 @@
 Shakespear said or wrote or whatever ...
-"To be or not to be,
+"To pee or not to pee,
 that is the question"

What are patches used for?

Well, they are used for communicating changes between software developers, at least in the open source world. So for example if you change the FFmpeg source code and fix a bug or add a feature, you could make a patch and submit it to us.

Why a patch and not the whole text?

Well, just think about the following: 2 people send you a patch, you look at the 2 small patches and if you are OK with them, you just apply both and have both changes in your text (which may be the source code of a program, or it might be just a love letter, the patch program doesn't care). On the other hand, if the 2 people had sent you the complete files, not only would it be tricky to see what they have changed, integrating both changes would be even less fun. Note that there are of course cases where 2 patches from 2 people conflict with each other so that they cannot both be applied automatically, but in practice this is rare if the patches are clean.

Reviewing

Now, after this little introduction, let's look at the actual reviewing of patches. People send us patches with their modifications of the FFmpeg source code and we review and comment on them. … Well, it's sadly more I than we, but anyway.

The primary goal of reviewing is to prevent bad, broken, buggy, inefficient, stupid, unneeded, … changes from being applied to the official FFmpeg source, so as to keep the source code small, efficient, simple, fast and working. This might sound like an easy task but it is absolutely not; if it were easy, people would not submit crappy changes in the first place but would rather realize the problems and correct them before submitting.

Fixing problems in patches ourselves vs. forcing the submitter to fix them

Fixing all issues which are found during review is always the job of the person who submitted the patch. The first reason for that is that it works very well in practice, as the submitter wants the change to be accepted. The second reason is simply that we don't have the time to fix the issues ourselves, considering the number of patches we receive, and applying patches without all issues fixed would cause the quality, stability and maintainability of FFmpeg to decline steeply, so that isn't an option either.

Sanity check

This is the first, simple review level, which checks whether the patch could be applied at all if we wanted to apply it and whether it works:

  • Is it a patch at all?
  • Is it undamaged?
  • Does it apply to the current latest FFmpeg source?
  • Do the regression tests pass? (We have a series of automated tests which check whether various things like encoding, decoding and seeking still work as they did before.)
  • Does it avoid introducing any trailing whitespace or tabs? (We have agreed that these don't do any good and so the FFmpeg source doesn't contain any; this is arguably a philosophical issue but it's easy to conform to.)
  • Does it avoid mixing whitespace/cosmetic changes with functional changes? (Patches which do are very hard to review, so we always require them to be split.)
  • Does it do what the author claims it does?

Low level review

Low level here means reviewing the code line by line / function by function without thinking too much about how the lines or functions are used:

  • Are there any obvious bugs in the code?
  • Is the code minimal, that is, can the same thing be done with less code? Less code generally means fewer bugs, faster code and less to read and understand …
  • Is the code fast? (Does speed even matter where it's used?)
  • Is the indentation consistent?
  • Is the code portable, or would it fail on a big-endian architecture?
  • Does the change avoid breaking anything?
  • Are all function/struct/… comments doxygen compatible so that nice HTML source code documentation can be built automatically?
  • Is the code free of duplication? Duplicated code is hard to maintain and wastes space in the source, the object file and in memory at runtime.
  • Do non-static functions have proper prefixes on their names to avoid namespace issues?
  • Does the code avoid breaking the API/ABI, and if it does break it, does it update the version numbers properly and is the change really needed?
  • Are all input values properly checked to prevent crashes and security issues with invalid input files?
  • Are the types of large arrays minimal so as to save space?

High level review
  • Is the chosen algorithm a wise choice, or is there a faster/simpler one? Does speed or size matter more in the case where it's used? …
  • Can the patch be split into separate or incremental changes? Split patches are easier to review, easier to understand and easier to debug, as they can be separately reverted and applied.
  • Does the implementation make sense, or are there serious conceptual mistakes in it, like messing up the codec/container separation or doing things that aren't really needed?
  • Are things properly split into files? (For example, decoder and encoder should be in separate files.)
  • Can big and separate features be disabled at compile time?
  • Do the advantages of applying the patch outweigh the disadvantages? Some rare patches add features nobody needs but add a lot of complexity to do so; often such things can be made optional at compile time.

There are probably more things I forgot, but I am getting tired and this blog entry has become much longer than I wanted it to …

Filed under: FFmpeg — Michael @ 01:48

2007-05-11

Tic Tac Toe

Java sucks, all remotely sane humans know that nowadays, but long, long ago I did the unimaginable and voluntarily wrote some small Java programs (until I realized how badly Java sucks). Today's, or actually tonight's, blog entry is dedicated to one of them: a little Tic-Tac-Toe game, which you should be able to see below in the unlikely case that you have Java enabled in your browser.


The 8(?) year old GPL source is available too, and in case you are wondering, the images were created with POV-Ray. And yes, patches to improve it are welcome ;)

PS: if you win, send me a bug report ;)

Filed under: Off Topic,Uncategorized — Michael @ 01:38

2007-05-03

Small tasks #2 (Iterative motion estimation)

FFmpeg supports iterative motion estimation, but only for Snow. It would be interesting to implement/port this code to the generic MPEG framework (motion_est.c/mpegvideo.c/h) in FFmpeg.

This task requires:

  1. Reading and understanding the generic motion estimation code
  2. Reading and understanding the iterative motion estimation code in snow.c
  3. Porting/Implementing iterative ME for the generic code

The advantage of iterative ME is higher quality per bitrate (but at higher computational cost).

Normal motion estimation tries to optimize each motion vector so as to minimize the weighted sum of the distortion (difference to the input picture) and the rate (bits for the motion vector). And after doing that for vector i, the same is done for i+1, then i+2 and so on, where all past vectors are held constant and all future ones are ignored. This is of course not correct; iterative motion estimation, on the other hand, does not ignore any vectors but rather considers the bits of the future vectors too, and performs several passes over all blocks until no further improvement is achieved (= the search found a local minimum).
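
To make this concrete, here is a small self-contained C sketch (a toy, not the actual motion_est.c code: the block layout, the sad()/mv_rate() placeholders and the left-neighbor predictor are made up for illustration). The point is that the cost of block b also contains the rate of the vector predicted from it, so the outer loop keeps re-searching every block until a full pass brings no further improvement:

#include <stdio.h>
#include <stdlib.h>

#define NB_BLOCKS 16
#define RANGE      4               /* search -RANGE..RANGE in x and y */

typedef struct { int x, y; } MV;

/* placeholder distortion: pretend this is the SAD of block b displaced by mv */
static int sad(int b, MV mv)       { return abs(mv.x - b % 3) + abs(mv.y - b % 5); }
/* placeholder rate: bits to code mv as a difference to its predictor */
static int mv_rate(MV mv, MV pred) { return abs(mv.x - pred.x) + abs(mv.y - pred.y); }

/* cost of the whole vector field; note that mv[b] also influences the rate
 * of mv[b+1], which is predicted from it */
static int total_cost(const MV *mv, int lambda)
{
    int cost = 0;
    for (int b = 0; b < NB_BLOCKS; b++) {
        MV pred = b ? mv[b - 1] : (MV){ 0, 0 };
        cost += sad(b, mv[b]) + lambda * mv_rate(mv[b], pred);
    }
    return cost;
}

/* iterative ME: re-search every block against the current state of all other
 * vectors until a full pass finds no improvement (= a local minimum) */
static void iterative_me(MV *mv, int lambda)
{
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int b = 0; b < NB_BLOCKS; b++) {
            MV  best      = mv[b];
            int best_cost = total_cost(mv, lambda);
            for (int y = -RANGE; y <= RANGE; y++) {
                for (int x = -RANGE; x <= RANGE; x++) {
                    mv[b] = (MV){ x, y };
                    int c = total_cost(mv, lambda);
                    if (c < best_cost) { best_cost = c; best = mv[b]; improved = 1; }
                }
            }
            mv[b] = best;
        }
    }
}

int main(void)
{
    MV mv[NB_BLOCKS] = { { 0, 0 } };
    iterative_me(mv, 1);
    for (int b = 0; b < NB_BLOCKS; b++)
        printf("block %2d: (%d,%d)\n", b, mv[b].x, mv[b].y);
    return 0;
}

A normal (non-iterative) estimator would instead do a single pass and score each candidate with just sad(b, mv) plus the rate relative to the already fixed past vectors.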

Filed under: FFmpeg — Michael @ 01:07

2007-04-25

Small tasks for FFmpeg

The FFmpeg todo list is getting longer and longer, and melanson suggested that I try to talk about some small tasks here, so as to maybe inspire some readers to contribute to FFmpeg :)

Optimal Huffman tables for (M)JPEG encoding

This is pretty trivial: instead of encoding the stuff with the default Huffman tables, simply store the data in a temporary buffer, build optimal Huffman tables for the data and encode the data with them.

Building optimal Huffman tables

In the unlikely case that the reader doesn't know how to build an optimal Huffman table, and to save Google from the millions of people looking that up after this post ;) here's how:

  1. You know the probability of each symbol (simply because you have “encoded” the image and know how often each symbol occurred).
  2. Merge the 2 least likely symbols into one which has the sum of their probabilities as its probability; after that you have 2 symbols less and 1 more, or simply one symbol less to deal with.
  3. If you have more than 1 symbol left, go to 2.
  4. Your Huffman table/tree is finished; you only have to decide which side of each branch gets 1 and which 0. Assigning these in any random or non-random way will do and has no effect on the decodability or optimality of the result. Note that (M)JPEG puts some restrictions on the 1/0 assignment, so you cannot do it entirely randomly.
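
As a small self-contained C sketch of the above (a toy with 5 hard-coded symbol counts, not FFmpeg's mjpegenc code; the O(n²) search for the 2 least likely nodes is plenty fast for the 256 possible (M)JPEG symbols):

#include <stdio.h>

#define NB_SYMBOLS 5

int main(void)
{
    long long count[NB_SYMBOLS] = { 1, 2, 5, 10, 21 }; /* how often each symbol occurred */
    int length[NB_SYMBOLS] = { 0 };                    /* resulting code lengths */
    long long score[NB_SYMBOLS];                       /* score of the node a symbol is in */
    int group[NB_SYMBOLS];                             /* which live node a symbol belongs to */
    int nodes = NB_SYMBOLS;

    for (int i = 0; i < NB_SYMBOLS; i++) { score[i] = count[i]; group[i] = i; }

    while (nodes > 1) {
        /* find the 2 least likely live nodes a and b */
        int a = -1, b = -1;
        for (int i = 0; i < NB_SYMBOLS; i++) {
            if (group[i] != i) continue;                  /* i is not a live node */
            if      (a < 0 || score[i] < score[a]) { b = a; a = i; }
            else if (b < 0 || score[i] < score[b])   b = i;
        }
        /* merge b into a: every symbol in either node gets one bit longer */
        score[a] += score[b];
        for (int i = 0; i < NB_SYMBOLS; i++)
            if (group[i] == a || group[i] == b) { group[i] = a; length[i]++; }
        nodes--;
    }

    for (int i = 0; i < NB_SYMBOLS; i++)
        printf("symbol %d: %d bits\n", i, length[i]);     /* 4 4 3 2 1 for these counts */
    return 0;
}
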
Update 2007-04-26

As a reader has noticed, (M)JPEG has some additional constraints which cannot be ignored when building Huffman tables. So here's an update to deal with these constraints.

Additional constraints for (m)jpeg

No code may consist of all 1 bits

This is trivial to avoid: simply add an additional dummy symbol with a probability lower than that of all other symbols. Its code will be the longest one after building the Huffman table and can then simply be given all 1 bits when assigning bits to each branch.

The maximum code length is 16

To avoid exceeding this limit we need to change the symbol probabilities before building the Huffman tree. The optimal solution is the package merge algorithm; a small C sketch of it follows the worked example below.

The package merge algorithm
  1. Start with an empty list, let's call it list(0), and set i=0.
  2. Add 1 entry to list(i) for each symbol we have, and give each a score equal to the probability of the respective symbol.
  3. Merge the 2 elements of least score and put the package into list(i+1); list(i) will have 2 elements less, list(i+1) will have 1 element more, and the new score will be the sum of the 2 scores.
  4. If there is more than 1 element left in the current list(i), go to 3.
  5. i++
  6. If i < 16, go to 2.
  7. Select the n-1 elements in the last list with the lowest score (n = the number of symbols).
  8. The length of the Huffman code for symbol s will be equal to the number of times the symbol occurs in the selected elements.
Example:
symbol  probability
A       1
B       2
C       5
D       10
E       21

optimal Huffman tree
A       1111
B       1110
C       110
D       10
E       0

assume we would rather want a maximum length of 3

list(0)= A1 B2 C5 D10 E21
list(1)= A1 B2 AB3 C5 D10 CD15 E21
list(2)= A1 B2 AB3 C5 ABC8 D10 E21 CDD25
list(3)= AB3 ABC8 ABCD18 CDDE46

A       111
B       110
C       101
D       100
E       0

Note: FFmpeg already contains code to build Huffman tables from just their lengths.
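
And a small self-contained C sketch of the package-merge steps above (again a toy with the example's 5 symbols hard-coded, not FFmpeg's actual code; for (M)JPEG you would set MAX_LEN to 16 and feed it the real symbol counts):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NB_SYMBOLS 5
#define MAX_LEN    3                      /* 16 for (M)JPEG */

typedef struct {
    long long score;
    int count[NB_SYMBOLS];                /* how often each symbol is contained */
} Item;

static int cmp_item(const void *a, const void *b)
{
    long long d = ((const Item *)a)->score - ((const Item *)b)->score;
    return d < 0 ? -1 : d > 0;
}

int main(void)
{
    const long long prob[NB_SYMBOLS] = { 1, 2, 5, 10, 21 };
    Item list[2 * NB_SYMBOLS];            /* generous upper bound on the list size */
    int n = 0, len[NB_SYMBOLS] = { 0 };

    for (int level = 0; level < MAX_LEN; level++) {
        /* step 2: add one entry per symbol to the current list */
        for (int s = 0; s < NB_SYMBOLS; s++) {
            memset(&list[n], 0, sizeof(list[n]));
            list[n].score    = prob[s];
            list[n].count[s] = 1;
            n++;
        }
        /* steps 3-4: repeatedly package the 2 items of least score, which is
         * the same as pairing up adjacent items in sorted order */
        qsort(list, n, sizeof(*list), cmp_item);
        int m = 0;
        for (int i = 0; i + 1 < n; i += 2) {
            Item pkg = list[i];
            pkg.score += list[i + 1].score;
            for (int s = 0; s < NB_SYMBOLS; s++)
                pkg.count[s] += list[i + 1].count[s];
            list[m++] = pkg;              /* collect the packages for the next list */
        }
        n = m;                            /* an odd leftover item is dropped */
    }

    /* steps 7-8: take the NB_SYMBOLS-1 cheapest packages of the final list and
     * count how often each symbol occurs in them */
    qsort(list, n, sizeof(*list), cmp_item);
    for (int i = 0; i < NB_SYMBOLS - 1 && i < n; i++)
        for (int s = 0; s < NB_SYMBOLS; s++)
            len[s] += list[i].count[s];

    for (int s = 0; s < NB_SYMBOLS; s++)
        printf("symbol %d: %d bits\n", s, len[s]);   /* 3 3 3 3 1, matching the example */
    return 0;
}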

Filed under: FFmpeg — Michael @ 02:33

2007-04-12

animal-rights activists

You get one of these begging letters from a bunch of animal-rights activists who need some money to save a few poor animals. Well, you don't really wonder where they got your name and address from, after all you live in a corrupto-cracy which protects such personal data …
But the real question is what do you do, donate a few euro to them? I mean, the case I am talking about is not about some corrupt spammers but rather the following, a quick summary (based on unverified info from the net):

An Austrian zoo (Safaripark Gänserndorf) went bankrupt sometime in 2004 and over 800 animals needed to find a new home. Many of them were bought by animal-rights activist groups or zoos; the politicians and other Austrian zoos who should have taken care of the animals until a new home was found rather fought against each other and tried to squeeze money out of the whole thing while pretending in public that they were working pro bono. In the end there were a few lions and HIV-infected monkeys left. The monkeys' needs are being paid for by the pharma industry, yeah, anyone surprised they care more about the welfare of the animals than our politicians and other zoos? The lions are owned by the animal-rights activist group “Vier Pfoten”, from whom the begging letter was; they also bought some area in South Africa for the lions but not surprisingly need more money …

Now why did I write this off-topic blog entry? Well, I am unsure if I should donate them a few euro or not. I want to help the animals, but I am not so sure if what the animal-rights activists do really helps anyone. I've no doubt they have only the best intentions, but the question is not dead lion vs. live lion in a nice natural environment, but rather dead lion vs. a few hundred dead cows, yes, lions aren't vegetarian. Also the lions cannot be released into true freedom as they have become used to being close to humans. So being part of a working ecosystem where they hunt and kill their prey and by that positively contribute to the evolution of their prey (weakest dies, strong survives and reproduces …) isn't possible.

And just in case anyone is curious, I believe it's a detestable thing to lock animals into cages for the public's entertainment. Or in other words, I hate zoos, those running them and those who support them (by visiting and thus paying for them).

Filed under: Nature,Off Topic — Michael @ 03:26

2007-03-25

Resampling

Resampling is the problem of converting a signal which is sampled at a given samplerate to one which is sampled at another samplerate. This is the same as converting a discretely sampled signal to a continuous one and then converting it back to a discretely sampled one.
There are many ways to look at this problem, and they don't all lead to the same solutions.

The optimization point of view

The problem can be seen as one of optimization: choose the function which converts the discrete samples to a continuous signal so that the error to the original correct signal is minimized, and then select the function which converts the continuous signal to discrete samples so that, together with an optimal back conversion, the error is minimized.
While this is a nice generic definition of the problem, it doesn't point to an obvious solution. Also, the optimal solution will almost certainly not be remotely close to a common sense definition of what samples represent; it might be more similar to a vector quantizer or some other odd nonlinear thingy …

The linear optimization point of view

Like in the previous point, we again try to find continuous->discrete->continuous conversion functions which minimize the error to the original, but we restrict ourselves to linear convolutional filters. Note, a continuous signal here means one which is either given analytically (like a polynomial) or discretely sampled at higher resolution.
Here it's a matter of linear algebra and a training set of signals to find the
optimal discrete->continuous filter for a given continuous->discrete filter, and the optimal continuous->discrete filter for a given discrete->continuous filter.
In this framework it is also possible to consider output and input devices. For example, a monitor converts the discretely sampled data into very tiny fuzzy colorful dots, which when they are all added together give a continuous signal. The monitor here is a discrete->continuous filter, for which we can find the optimal linear least squares filter which converts a given signal into a discretely sampled one.

The bandlimited point of view

It's possible to look at an amplitude-per-time signal also as an amplitude-per-frequency signal. If our signal now happens not to contain any non-zero frequency component above a specific frequency X, then this signal is said to be band limited. If we now sample such a signal with a sampling rate >2X, then we can exactly restore the original continuous signal from these samples by simple convolution with the sinc filter (sin(x)/x) (the discrete samples here would be at x = …, -pi, 0, pi, 2pi, …).
And to remove all frequencies above a given frequency X while retaining all below, again simple convolution with sinc is the answer. So resampling becomes a simple matter of using the sinc filter with the correct scaling to find the new samples. Sadly there are a few problems and misunderstandings with this approach.

The “summing an infinite number of samples” fallacy

Some people say sinc cannot be used because it would require summing an infinite number of samples; this is of course not true, as images and audio on computers tend not to contain infinitely many samples to begin with. Or more precisely, a signal of N samples can simply be converted into the frequency domain with an FFT, the too-high frequency coefficients can then be set to 0, and an inverse FFT can be applied. This is exactly equivalent to infinite periodic extension and application of the sinc filter. Direct application of the sinc filter in the time domain is trickier.

sinc's latency

As the sinc filter has infinite support (there's no X after which all values are 0), it cannot be applied in realtime; it needs all input samples before it can output anything.
That also means all applications and libraries which claim to use sinc for resampling in realtime lie.

Causality and sinc

Most output samples depend on most (practically all) future input samples

Shooting sinc and the whole bandlimited theory down

Imagine absolute silence, open space, no obstacles to reflect sound, and a gun. There's no doubt that the correct continuous sound amplitude will be exactly 0 (the air pressure will not change). Now the gun is fired, which causes a very loud “bang”, pretty much a very sharp spike in the amplitude and air pressure, and then perfect silence again. Now this signal is clearly not bandlimited, so let's make it bandlimited by convolving it with sinc; the resulting signal will of course look like the sinc filter itself (convolution of a single spike is simply an identity operation). At that point we have a bandlimited signal of a gunshot. Let's assume our signal is sampled at 8kHz.
In that case the amplitude 1 second before the gunshot will be 1/(pi*8000) of the loudness of the gunshot; while this seems very little, given a loud enough gunshot it's audible, and this simply isn't correct. Also for shorter distances, like for example 5 samples before the gunshot, it's 1/(pi*5) ~ 1/16, which for example when resampling an image with a sharp edge leads to terrible ringing artifacts.

Windowed sinc

An alternative to the infinite-support sinc filter are windowed sinc filters, which are simply sinc componentwise multiplied by a window function. These filters have compact support (they are zero everywhere except a few samples). For them, the size of the transition band, where frequencies are neither left at their amplitude nor attenuated to near 0, cannot be 0. Its size, the length of the filter, the stopband attenuation and all the other parameters depend on each other, and a tradeoff must be made …
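
To make this concrete, here is a minimal windowed-sinc resampler sketch in C (a Hann window and a fixed number of taps, renormalized to unity DC gain; the function and parameter names are mine, this is not FFmpeg's actual resampling code, just an illustration of the tradeoffs above):

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define TAPS 16                         /* filter half-length, in input samples */

static double sinc(double x)
{
    return fabs(x) < 1e-9 ? 1.0 : sin(M_PI * x) / (M_PI * x);
}

/* resample in[0..in_len-1] from in_rate to out_rate into out[0..out_len-1] */
static void resample(const double *in, int in_len, int in_rate,
                     double *out, int out_len, int out_rate)
{
    /* when downsampling, the cutoff has to be lowered to the new Nyquist */
    double cutoff = out_rate < in_rate ? (double)out_rate / in_rate : 1.0;

    for (int m = 0; m < out_len; m++) {
        double t = (double)m * in_rate / out_rate;   /* position in the input */
        int    center = (int)floor(t);
        double acc = 0, norm = 0;

        for (int k = center - TAPS + 1; k <= center + TAPS; k++) {
            double u = t - k;                                 /* distance in input samples */
            double w = 0.5 + 0.5 * cos(M_PI * u / TAPS);      /* Hann window */
            double c = cutoff * sinc(cutoff * u) * w;         /* windowed sinc tap */
            double s = (k < 0 || k >= in_len) ? 0.0 : in[k];  /* zero outside the input */
            acc  += c * s;
            norm += c;
        }
        out[m] = norm != 0 ? acc / norm : 0;  /* renormalize for unity DC gain */
    }
}

int main(void)
{
    double in[64], out[128];
    for (int i = 0; i < 64; i++)
        in[i] = sin(2 * M_PI * 440.0 * i / 8000.0);  /* 440 Hz tone at 8 kHz */
    resample(in, 64, 8000, out, 128, 16000);         /* upsample to 16 kHz */
    for (int i = 0; i < 8; i++)
        printf("%f\n", out[i]);
    return 0;
}

Note how the tradeoffs show up directly: a larger TAPS narrows the transition band but raises the cost per output sample and extends the ringing further in time, while the choice of window trades transition width against stopband attenuation.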

Filed under: Uncategorized — Michael @ 03:50

2006-12-29

A few days ago chkrootkit told me “Enye LKM found”

My first thought was: shit … But there was something odd, it wasn't reproducible, subsequent runs of chkrootkit found nothing. Was it just a false positive?
Also, mutt had died with an odd error message, “Real-time signal 24” to be precise; was there a relation? I decided to look at the chkrootkit source to clarify all that, and I found the following:

   /* Check for Enye LKM */
   if (kill (12345, 58) >= 0)
   {
      printf("Enye LKM found\n");
      retdir+= errno;
   }

chkrootkit simply sends a “random” signal to a “random” pid, with no check whether anything normal is running at that pid, not to mention how unlikely it is that an intruder would leave the pid/signal id at the default.

Filed under: Uncategorized — Michael @ 23:25

2006-12-12

Hadamard transform

In the last blog entry I mentioned the Hadamard transform, so it's high time to discuss it.
The Hadamard transform is a linear transform, and all linear transforms can be written as a vector-matrix multiplication; here the matrix is orthogonal and symmetric and all its elements are either a or -a. Actually this is commonly written as a matrix of 1 and -1 which is multiplied by a, where a is 1/sqrt(size of the transform) so that the matrix stays orthogonal. You also might want to take a look at Wikipedia/Hadamard_transform.
The Hadamard transform is also somewhat similar to the DCT but simpler; that's also why it's used as part of the SATD motion estimation comparison function. The DCT would be slightly better quality-wise but significantly slower …

How to calculate it

Well, there are many ways.
A slow and simple one is:

for(i=0; i<n; i++){
    dst[i]=0;
    for(j=0; j<n; j++){
        // parity(x) = number of 1 bits in x modulo 2, e.g. GCC's __builtin_parity()
        int sign= parity(i&j) ? -1 : 1;
        dst[i] += src[j] * sign;
    }
}

A faster one is:

void transform(int *data, int step, int n){
    int i, j;
    for(i=0; i<n; i+=2*step){
        for(j=i; j<i+step; j++){
            // butterfly between the 2 samples which are "step" apart
            int a= data[j];
            data[j]      = a + data[j+step];
            data[j+step] = a - data[j+step];
        }
    }
    if(2*step<n)
        transform(data, 2*step, n); // double the distance, log2(n) levels in total
}

You should of course not implement this recursively, but instead remove the recursion, unroll the loops where cleanly possible and replace step and n by constants if you can.
I've just written it like the above as it's IMHO easier to understand that way than 2 pages of unrolled loops.
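
As a quick sanity check of the fast version (assuming the transform() function above; note that it is unnormalized, so for an orthonormal result each output would additionally have to be scaled by 1/sqrt(n)):

#include <stdio.h>

void transform(int *data, int step, int n);  /* the function above */

int main(void)
{
    int data[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    transform(data, 1, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", data[i]);   /* unnormalized Walsh-Hadamard coefficients, data[0] is the sum = 36 */
    printf("\n");
    return 0;
}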

Filed under: Uncategorized — Michael @ 01:41

Motion Estimation (1. comparison functions)

This month's ;) today's blog entry is about motion estimation for block-based video encoders, more precisely integer-pel style, no half-pel or qpel, to keep this reasonably simple. In practice the sub-pel motion estimation (hpel/qpel …) is normally done after the normal integer-pel search anyway, so we can nicely ignore sub-pel today.

Now how do we estimate the motion of a square or rectangular block of pixels?
Nothing easier than that, you will likely say: just find the best matching block in the reference frame for each block we need to encode. But what is “best” and how do we find it?

Compare functions

To find the best matching block we first need to give each pair of blocks a score. There are several popular ways to do that:

SAD Sum of absolute differences
      for(x=0; x<SIZE; x++){
          for(y=0; y<SIZE; y++){
              score += abs(reference[y][x] - current[y][x]);
          }          
      }
    
SSD Sum of squared differences
      for(x=0; x<SIZE; x++){
          for(y=0; y<SIZE; y++){
              error= reference[y][x] - current[y][x];
              score += error*error;
          }          
      }
    
SATD Sum of absolute Hadamard transformed differences
      for(x=0; x<SIZE; x++)
          for(y=0; y<SIZE; y++)
              difference[y][x]= reference[y][x] - current[y][x];
      vertical_hadamard_transform(difference);
      horizontal_hadamard_transform(difference); // vertical and horizontal can be exchanged
      for(x=0; x<SIZE; x++)
          for(y=0; y<SIZE; y++)
              score += abs(difference[y][x]);
    

And the theoretically ideal comparison function is to completely encode the block and then use as score the weighted sum of the number of bits and the distortion, be it SSD or a psychovisual one.
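
In code that ideal score is just one line (lambda here is the usual Lagrange multiplier weighting bits against distortion; the variable names are mine, not a fixed API):

      // rate-distortion score of one fully encoded candidate block
      score = distortion + lambda * bits_used;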

And how to find the best block with such a comparison function will be the subject of another blog entry. Don't fear, I am just trying to keep the blog entries small, self-contained and easily digestible.

Filed under: Uncategorized — Michael @ 00:17

2006-10-29

Integer precision conversion

You have an 8-bit grayscale value (or an audio sample) and want to convert it to 16 bit, so without thinking about it you write x<<8. But that generally isn't correct, because the new least significant bits are all zero, so the largest 8-bit value will become a value 255 below the largest 16-bit value.

So what is correct?

This depends on your definition of what all the values mean. If 0 is black, then simply multiplying by the new value for white, dividing by the old value for white and rounding to nearest will do the trick.

For the specific case where the largest representable value is white, increasing precision can be done by setting the new least significant bits to the most significant ones. For example, for 8->16 bit: x+(x<<8) = x*65535/255 = x*257; for 8->10 bit: (x<<2)+(x>>6), or if you are pedantic (x*1023+127)/255 = (x*341+42)/85; and for 2->8 bit: x+(x<<2)+(x<<4)+(x<<6) = x*85.

For the specific case where largest+1 (256 in the case of 8 bit) is white, just setting the low bits to 0 is indeed correct. But at least for low precision this is not how grayscale values are defined; just think of 1-bit monochrome: converting that to 8 bit by setting the low bits to 0 will give you 128 for 1, while white is 255 (or 256).

What about decreasing precision?

Well, that's simply the inverse, so for x*257 it's (x+128)/257, which can be approximated by (x*255+(1<<15))>>16.
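
A quick C check of that round trip (plain demo code, nothing FFmpeg-specific):

#include <stdio.h>

int main(void)
{
    for (int x = 0; x < 256; x++) {
        int x16 = x * 257;                          /* 8 -> 16 bit: x + (x<<8)    */
        int x8  = (x16 * 255 + (1 << 15)) >> 16;    /* 16 -> 8 bit, with rounding */
        if (x8 != x)
            printf("roundtrip failed for %d\n", x);
    }
    /* 0 stays black and 255 becomes 65535 (white), not the 65280 that x<<8 would give */
    printf("255 -> %d\n", 255 * 257);
    return 0;
}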

Filed under: Uncategorized — Michael @ 01:48