Lair Of The Multimedia Guru

September 26, 2007

Spam Karma 2

As you certainly have (not) noticed, I've enabled Spam Karma 2. Mike Melanson worked hard on installing a few anti-spam plugins for me, Spam Karma 2 being one of them. Thanks Mike!

Until now spam was stopped by a simple word list, and anything containing a link was blocked as well. On my side, procmail sorted the large amount of moderated comments into certain spam and uncertain as well as non-spam comments; from time to time I then deleted the spam which was missed and approved what was blocked incorrectly. That worked quite well actually … but you know, if it ain't broken then break it ;)

Spam Karma 2 should hopefully catch all spam and not block legitimate comments ;) It even has the ability to display a captcha if it's not certain. Just an hour ago it did that for a spammer who tried to sell some “herbs” for long “legs”. It didn't help though; I had to delete that one by hand after the spammer or his script passed the captcha check.

If you write a comment and it gets blocked without a captcha or other test, then please send me an email!

Comments welcome!

Filed under: Uncategorized — Michael @ 21:10

September 2, 2007

Google's Summer of Code 2007

The company which tries so hard not to be evil has this year again paid students, 900 according to Wikipedia, to work on free software. 8 of them worked on FFmpeg this year. Luckily the results are less disastrous than last year, where half of the 5 students who were supposed to work on FFmpeg plain disappeared, and just reappeared before the deadline to convince us that they were almost finished with everything, so as to get paid …

But let's rather talk about this year's SoC. First, there were 19 students submitting 37 applications about FFmpeg. To rate these 37 applications and to prevent the high failure rate of last year, we required students to do a qualification task, that is, they had to make a not too insignificant contribution to FFmpeg to be accepted. What exactly they did was pretty much their decision, though there was a list of suggestions. As a beautiful side effect, the qualification tasks led to some nice new features for FFmpeg :).

9 of the 19 students submitted a qualification task, and all those who submitted one passed; in addition, 1 student was qualified through extensive past work on FFmpeg. Of these 10 students, 2 sadly couldn't be accepted as they wanted to work on the same tasks as other students; well, in principle it would have been possible to let 2 students work on the same task, but it seemed silly. 1 wasn't accepted as his project appeared rather uninteresting and somewhat unrelated to video/audio. The 8th slot Google provided was thus given to a student who didn't submit a qualification task. Also, the actual decision of who would be accepted and who not was that of the mentors rating the applications, not of any single person …

So what's the current status?

  • David's Matroska muxer has passed review and should be committed to FFmpeg SVN soon
  • Kostya's RV40 decoder looks pretty good and could probably be committed to FFmpeg SVN soon; actually, if Kostya wanted, he could commit immediately and continue the work in FFmpeg SVN
  • Marco's Dirac decoder is also in good shape and there's not much keeping it from being committed; the encoder, though, needs more work
  • Kamil's JPEG 2000 encoder and decoder aren't in good shape yet (only 2 out of 50 encoded images can be decoded by JasPer, and only 2 of 23 reference JPEG 2000 files can be decoded by Kamil's decoder), but then please don't forget that writing an encoder and a decoder at the same time is harder than writing just one of the two
  • Reynaldo's QCELP decoder is missing a working postfilter and 1/4 and 1/8 rate decoding, so it needs more work as well
  • Bobby's avfilter is in quite good shape, though it's missing well-working colorspace negotiation; it's also missing actual integration in ffmpeg.c, only ffplay.c is there
  • Bartlomiej's E-AC-3 decoder hasn't been reviewed by me yet …
  • Xiaohui's MPEG-TS muxer itself also hasn't been reviewed by me yet; his patches, though, which were submitted and which I've reviewed, need more work

I hope of course that all 8 students continue to work on their projects, on free software in general and on FFmpeg specifically!

Filed under: Uncategorized — Michael @ 22:55

June 7, 2007

Quantization of a bunch of scalar variables

You have a few uniformly distributed, independent scalar variables. How do you optimally quantize them? Simply each one independently, by uniform scalar quantization? Wrong! This is, likely to the surprise of the reader, not optimal.

One way to see the suboptimality is to look at what independent uniform scalar quantization does if we have 2 variables: it splits the plane into squares and represents all points within a square by its center. How can we do better? Just split the plane into hexagons of the same area instead; both the worst-case and the average quantization error decrease. For cells of unit area, the worst case for the square is 2^0.5/2 ≈ 0.7071, while the worst case for the hexagon is 12^0.25/3 ≈ 0.6204.

Another way to see the suboptimality is to look at the worst-case point, which in the case of the square lies equidistant to the centers of 4 squares, while with hexagons it lies equidistant to the centers of just 3 hexagons. So there are just 3 redundant ways to quantize it instead of 4.

If we consider the (uniform scalar) quantization of the first of our 2 variables, then we can simply improve the worst case, which lies exactly between 2 values of the first variable, by making use of the useless least significant bit: more exactly, add half a quantization step to the second variable if the first was quantized to an odd value. Interestingly, this turns out to be equivalent to using hexagons for quantization, though the hexagons are a little squished here compared to the optimal case with regular hexagons.
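Here is a minimal sketch of that offset trick in C (the names are mine, this is not the test program linked below): quantize the first variable normally, and shift the second variable's grid by half a step whenever the first index is odd.

#include <math.h>

typedef struct { int qx, qy; } QPair;

/* quantize (x,y) on the squished-hexagon lattice described above */
QPair quantize_pair(double x, double y, double step){
    QPair q;
    q.qx = (int)lround(x / step);
    double offset = (q.qx & 1) ? 0.5 : 0.0;  /* odd first index -> shifted grid */
    q.qy = (int)lround(y / step - offset);
    return q;
}

/* reconstruct the representative point of a cell */
void dequantize_pair(QPair q, double step, double *x, double *y){
    double offset = (q.qx & 1) ? 0.5 : 0.0;
    *x = q.qx * step;
    *y = (q.qy + offset) * step;
}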

I've also written a little test program in case anyone wants to play with this quantization stuff.

Filed under: Uncategorized — Michael @ 2:28

May 11, 2007

Tic Tac Toe

Java sucks, all remotely sane humans know that nowadays, but long, long ago I did the unimaginable and voluntarily wrote some small Java programs (until I realized how badly Java sucks). Today's, or actually tonight's, blog entry is dedicated to one of these: a little Tic-Tac-Toe game, which you should be able to see below in the unlikely case that you have Java enabled in your browser.


The 8(?)-year-old GPL source is available too, and in case you are wondering, the images were created with POV-Ray. And yes, patches to improve it are welcome ;)

PS: if you win, send me a bug report ;)

Filed under: Off Topic,Uncategorized — Michael @ 1:38

March 25, 2007

Resampling

Resampling is the problem of converting a signal which is sampled at a given sample rate to one which is sampled at another sample rate. This is the same as converting a discretely sampled signal to a continuous one and then converting it back to a discretely sampled one.
There are many ways to look at this problem, and they don't all lead to the same solutions.

The optimization point of view

The problem can be seen as one of optimization: choose the function which converts the discrete samples to a continuous signal so that the error relative to the original correct signal is minimized, and then select the function which converts the continuous signal to discrete samples so that, together with an optimal back conversion, the error is minimized.
While this is a nice generic definition of the problem, it doesn't point to an obvious solution. Also, the optimal solution will almost certainly not be remotely close to a common-sense notion of what samples represent; it might be more similar to a vector quantizer or some other odd non-linear thingy …

The linear optimization point of view

Like in the previous point, we again try to find continuous->discrete->continuous conversion functions which minimize the error relative to the original, but we restrict ourselves to linear convolutional filters. Note that a continuous signal here means one which is either given analytically (like a polynomial) or discretely sampled at a higher resolution.
Here it's a matter of linear algebra and a training set of signals to find the optimal discrete->continuous filter for a given continuous->discrete filter, and vice versa the optimal continuous->discrete filter for a given discrete->continuous filter.
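As a sketch of that least-squares step (my notation, not from any specific text): write the fixed continuous->discrete filter as a matrix D, the sought discrete->continuous filter as a matrix U, and collect training signals s_k. Then we minimize

E = \sum_k \lVert s_k - U D s_k \rVert^2

and setting the derivative with respect to U to zero yields the normal equations

U \sum_k (D s_k)(D s_k)^T = \sum_k s_k (D s_k)^T,

an ordinary linear system in the entries of U; the roles of U and D swap for the other direction.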
In this framework it is also possible to consider output and input devices. For example, a monitor converts the discretely sampled data into very tiny, fuzzy, colorful dots which, when they are all added together, give a continuous signal. The monitor here is a discrete->continuous filter, and for it we can find the optimal linear least-squares filter which converts a given continuous signal into a discretely sampled one.

The bandlimited point of view

It's possible to look at an amplitude-over-time signal also as an amplitude-over-frequency signal. If our signal happens to contain no non-zero frequency component above a specific frequency X, then this signal is said to be band limited. If we now sample such a signal with a sampling rate >2X, then we can exactly restore the original continuous signal from these samples by simple convolution with the sinc filter sin(x)/x (the discrete samples here would be at x = …, -pi, 0, pi, 2pi, …).
And to remove all frequencies above a given frequency X while retaining all below it, convolution with sinc is again the answer. So resampling becomes a simple matter of applying the sinc filter with the correct scaling to find the new samples. Sadly there are a few problems and misunderstandings with this approach.

Summing a infinite number of samples fallacy

Some people say sinc cannot be used because it would require summing an infinite number of samples. This is of course not true, as images and audio on computers tend not to contain infinitely many samples to begin with. Or more precisely: a signal of N samples can simply be converted into the frequency domain with an FFT, the too-high frequency coefficients can then be set to 0, and an inverse FFT can be applied. This is exactly equivalent to infinite periodic extension and application of the sinc filter. Direct application of the sinc filter in the time domain is trickier.
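A minimal sketch of that FFT variant, with hypothetical in-place fft()/ifft() helpers (any FFT library will do; cutoff is the first frequency bin to be zeroed):

#include <complex.h>

void bandlimit(double complex *x, int n, int cutoff){
    fft(x, n);                      /* hypothetical: time -> frequency, in place */
    for(int k=cutoff; k<=n-cutoff && k<n; k++)
        x[k] = 0;                   /* positive bins >= cutoff and their mirrored negative bins */
    ifft(x, n);                     /* hypothetical: frequency -> time, in place */
}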

sincs latency

As the sinc filter has infinite support (there's no X after which all values are 0), it cannot be applied in realtime; it needs all input samples before it can output anything.
That also means all applications and libraries which claim to use sinc for resampling in realtime lie.

Causality and sinc

Most output samples depend on most (practically all) future input samples.

Shooting sinc and the whole bandlimited theory down

Imagine absolute silence, open space, no obstacles to reflect sound, and a gun. There's no doubt that the correct continuous sound amplitude will be exactly 0 (the air pressure will not change). Now the gun is fired, which causes a very loud “bang”, pretty much a very sharp spike in the amplitude and air pressure, and then perfect silence again. Now this signal is clearly not band limited, so let's make it band limited by convolving it with sinc; the resulting signal will of course look like the sinc filter itself (convolving a single spike with a filter simply reproduces the filter). At that point we have a band-limited signal of a gunshot. Let's assume our signal is sampled at 8kHz:
in that case the amplitude 1 second before the gunshot will be 1/(pi*8000) of the loudness of the gunshot. While this seems very little, given a loud enough gunshot it is audible, and this simply isn't correct. Also, for shorter distances, for example 5 samples before the gunshot, it is 1/(pi*5) ≈ 1/16, which, for example when resampling an image with a sharp edge, leads to terrible ringing artifacts.

Windowed sinc

An alternative to the infinite-support sinc filter are windowed sinc filters, which are simply sinc componentwise multiplied by a window function. These filters have compact support (they are zero everywhere except for a few samples). For them, the size of the transition band, in which frequencies are neither left at their amplitude nor attenuated to near 0, cannot be 0. Its size, the length of the filter, the stopband attenuation and all the other parameters depend on each other, and a tradeoff must be made …
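A minimal sketch of interpolating one output sample with a Hann-windowed sinc (parameter names are mine; this covers interpolation/upsampling only, for downsampling the sinc would additionally have to be stretched to the lower Nyquist frequency):

#include <math.h>

#define TAPS 16   /* half length; the filter spans 2*TAPS+1 input samples */

/* evaluate the signal in[0..len-1] at the fractional position pos */
double interpolate(const double *in, int len, double pos){
    int center = (int)floor(pos);
    double sum = 0, norm = 0;
    for(int i = center-TAPS; i <= center+TAPS; i++){
        if(i < 0 || i >= len)
            continue;                                   /* crude edge handling */
        double t    = pos - i;
        double sinc = t ? sin(M_PI*t)/(M_PI*t) : 1.0;
        double win  = 0.5 + 0.5*cos(M_PI*t/(TAPS+1));   /* Hann window */
        sum  += in[i] * sinc * win;
        norm += sinc * win;
    }
    return norm ? sum/norm : 0;   /* normalize so that DC passes unchanged */
}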

Filed under: Uncategorized — Michael @ 3:50

December 29, 2006

A few days ago chkrootkit told me “Enye LKM found”

My first thought was “shit …”, but there was something odd: it wasn't reproducible, subsequent runs of chkrootkit found nothing. Was it just a false positive?
Also, mutt had died with an odd error message, “Real-time signal 24” to be precise. Was there a relation? I decided to look at the chkrootkit source to clarify all that, and I found the following:

   /* Check for Enye LKM */
   if (kill (12345, 58) >= 0)
   {
      printf("Enye LKM found\n");
      retdir+= errno;
   }

chkrootkit simply sends a “random” signal (58) to a “random” pid (12345), without checking whether some normal process is running with that pid, not to mention how unlikely it is that an intruder would leave the pid/signal id at the default. That also answers the mutt question: on Linux with glibc, signal 58 is typically SIGRTMIN+24, which is exactly the “Real-time signal 24” that killed mutt.

Filed under: Uncategorized — Michael @ 23:25

December 12, 2006

Hadamard transform

In the last blog entry I've mentioned the Hadamard transform, so it's high time to discuss it.
The Hadamard transform is a linear transform, and all linear transforms can be written as a vector-matrix multiplication; here the matrix is orthogonal and symmetric, and all its elements are either a or -a. Actually, this is commonly written as a matrix of 1 and -1 which is multiplied by a, where a is 1/sqrt(size of the transform); you also might want to take a look at Wikipedia/Hadamard_transform.
The Hadamard transform is also somewhat similar to the DCT but simpler; that's also why it's used as part of a SATD motion estimation comparison function. The DCT would be slightly better quality-wise but significantly slower …

How to calculate it

Well, there are many ways.
A slow and simple one is:

for(i=0; i<n; i++){
    dst[i]=0;
    for(j=0; j<n; j++){
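        /* parity(x): 1 if x has an odd number of set bits, e.g. GCC's __builtin_parity */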
        int sign= parity(i&j) ? -1 : 1;
        dst[i] += src[j] * sign;
    }
}

a faster one is:

void transform(int *data, int step, int n){
    int i, j;
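    /* one stage of butterflies on pairs of elements lying `step` apart */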
    for(i=0; i<n; i+=2*step){
        for(j=i; j<i+step; j++){
            int a= data[j];
            data[j]      = a + data[j+step];
            data[j+step] = a - data[j+step];
        }
    }
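    /* recurse with doubled butterfly distance until all log2(n) stages are done */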
    if(2*step<n)
        transform(data, 2*step, n);
}

You should of course not implement this recursively, but instead remove the recursion, unroll the loops where that is cleanly possible, and replace step and n by constants if you can.
I've just written it like above as it's IMHO easier to understand that way than 2 pages of unrolled loops.

Filed under: Uncategorized — Michael @ 1:41

Motion Estimation (1. comparison functions)

This month's ;) today's blog entry is about motion estimation for block-based video encoders, more precisely integer-pel style, no half-pel or qpel, to keep this reasonably simple. In practice the sub-pel motion estimation (hpel/qpel/…) is normally done after the normal integer-pel search anyway, so we can nicely ignore sub-pel today.

Now how do we estimate the motion of a square or rectangular block of pixels?
Nothing easier than that, you will likely say: just find the best matching block in the reference frame for each block we need to encode. But what is “best”, and how do we find it?

Comparison functions

To find the best matching block, we first need to give each pair of blocks a score. There are several popular ways to do that:

SAD Sum of absolute differences
      score= 0;
      for(x=0; x<SIZE; x++){
          for(y=0; y<SIZE; y++){
              score += abs(reference[y][x] - current[y][x]);
          }
      }
    
SSD Sum of squared differences
      score= 0;
      for(x=0; x<SIZE; x++){
          for(y=0; y<SIZE; y++){
              error= reference[y][x] - current[y][x];
              score += error*error;
          }
      }
    
SATD Sum of absolute Hadamard-transformed differences (a sketch of the two transforms follows after this list)
      score= 0;
      for(x=0; x<SIZE; x++)
          for(y=0; y<SIZE; y++)
              difference[y][x]= reference[y][x] - current[y][x];
      vertical_hadamard_transform(difference);
      horizontal_hadamard_transform(difference); // vertical and horizontal can be exchanged
      for(x=0; x<SIZE; x++)
          for(y=0; y<SIZE; y++)
              score += abs(difference[y][x]);
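The two Hadamard transforms used above can be built from the transform() function of the Hadamard transform entry above; a minimal sketch (assuming SIZE is a power of 2):

/* apply the 1D transform to every row */
void horizontal_hadamard_transform(int difference[SIZE][SIZE]){
    for(int y=0; y<SIZE; y++)
        transform(difference[y], 1, SIZE);
}

/* the same for every column, via a temporary */
void vertical_hadamard_transform(int difference[SIZE][SIZE]){
    int col[SIZE];
    for(int x=0; x<SIZE; x++){
        for(int y=0; y<SIZE; y++) col[y]= difference[y][x];
        transform(col, 1, SIZE);
        for(int y=0; y<SIZE; y++) difference[y][x]= col[y];
    }
}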
    

And the theoretically ideal comparison function is to completely encode the block and then use as score the weighted sum of the number of bits and the distortion, be it SSD or a psychovisual one.

How to find the best block with such a comparison function will be the subject of another blog entry. Don't fear, I am just trying to keep the blog entries small, self-contained and easily digestible.

Filed under: Uncategorized — Michael @ 0:17

October 29, 2006

Integer precision conversion

You have an 8bit grayscale value (or an audio sample) and want to convert it to 16bit, so without thinking about it you write x<<8. But that generally isn't correct: the new least significant bits are all zero, so the largest 8bit value will become a value 255 below the largest 16bit value.

So what is correct?

This depends on your definition of what all the values mean. If 0 is black, then simply multiplying by the new value for white and dividing by the old value for white, with rounding to nearest, will do the trick.

For the specific case where the largest representable value is white, increasing precision can be done by setting the new least significant bits to the most significant ones. For example, for 8->16bit: x+(x<<8) = x*65535/255 = x*257; for 8->10bit: (x<<2)+(x>>6), or if you are pedantic (x*1023+127)/255 = (x*341+42)/85; and for 2->8bit: x+(x<<2)+(x<<4)+(x<<6) = x*85.

For the specific case where largest+1 (256 in the case of 8bit) is white, just setting the low bits to 0 is indeed correct, but at least for low precision this is not how grayscale values are defined. Just think about 1bit monochrome: converting that to 8bit by setting the low bits to 0 gives you 128 for 1, while white is 255 (or 256).

What about decreasing precision?

Well, that's simply the inverse: for x*257 it's (x+128)/257, which can be approximated by (x*255+(1<<15))>>16.
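A minimal sketch of the round trip (function names are mine):

#include <stdint.h>
#include <stdio.h>

static uint16_t scale_8to16(uint8_t x) { return x*257; }                 /* == (x<<8)+x */
static uint8_t  scale_16to8(uint16_t x){ return (x*255 + (1<<15))>>16; } /* ~ (x+128)/257 */

int main(void){
    for(int i=0; i<256; i++)    /* the round trip is lossless */
        if(scale_16to8(scale_8to16(i)) != i)
            printf("mismatch at %d\n", i);
    printf("%u %u\n", scale_8to16(0), scale_8to16(255));  /* prints: 0 65535 */
    return 0;
}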

Filed under: Uncategorized — Michael @ 1:48

August 28, 2006

The high-level structure of the mp3 bitstream

Today's blog entry describes the mp3 bitstream structure; it won't describe the actual mp3 decoding, and neither will it describe where and how each variable is precisely stored …
First, the mp3 bitstream is made of packets. These packets vary in size; yes, they can even vary for “CBR”, though only by +-1 byte.

Each such packet begins with an 11bit “startcode”, followed by 21 header bits which encode all of the most important info like sample rate, bitrate, number of channels, layer, padding, …. From these 21 bits the packet size can be calculated.
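A sketch of that calculation, restricted to MPEG-1 Layer III to keep it short (the tables are from the spec, the function name is mine; version/layer bits are not checked here):

#include <stdint.h>

static const int bitrate_kbps[16] = {0,32,40,48,56,64,80,96,112,128,160,192,224,256,320,0};
static const int samplerate_hz[4] = {44100,48000,32000,0};

/* header: the packet's first 32 bits; returns the packet size in bytes, 0 if invalid */
int mp3_packet_size(uint32_t header){
    if((header>>21) != 0x7FF) return 0;        /* the 11bit “startcode” */
    int bitrate_index    = (header>>12) & 15;
    int samplerate_index = (header>>10) &  3;
    int padding          = (header>> 9) &  1;  /* the +-1 byte mentioned above */
    int br = bitrate_kbps[bitrate_index] * 1000;
    int sr = samplerate_hz[samplerate_index];
    if(!br || !sr) return 0;
    return 144*br/sr + padding;
}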

The remainder of the packet is split into 2 parts: the first contains everything except the scale factors and coefficients; the second contains the scale factors and coefficients, which makes it the larger part. The first part is simply encoded normally, from position 4 (bytes 0-3 are the header) to its end.

The second part, though, is encoded much more stupidly. First we have (from the first part of the packet) a backstep value, which is the number of bytes from an internal buffer that we must prepend to the second part of the packet before we can begin to decode it; we will describe what's in that internal buffer in a moment. The second part now is split into channels and granules, and for each we have a length in bits. We decode them normally until we either have decoded all scale factors and coefficients, or we hit, or run over, the amount of bits assigned for that channel-granule. If we don't hit it precisely but run over it, because the end of the bit budget doesn't coincide with the end of a VLC, then we must step back by one VLC, undo its effects and skip the partial VLC. What's left of the second part of our packet (which can include part of the internal buffer) is put into the internal buffer to be used by future packets.

Filed under: Uncategorized — Michael @ 21:50
