Lair Of The Multimedia Guru



Its really high time to update by blog before it starts rotting. So here are a few words about the modified discrete cosine transform. The reason why iam writing this, is that most docs about the MDCT ive seen are quite obfuscated so i thought maybe i could do better (or worse ;)). Note ive made no attempt to write formal proofs for sake of readability, but they are all very trivial.

Basis functions of the normal DCT

Basis functions of the MDCT

Both of the transforms have orthogonal basis functions, that is the forward transform is simply the series of dot products of the input and the basis functions. While the inverse transform is the sum of the basis functions each scaled by the dot product from the forward transform. This can also be said in a dozen different ways …

What makes the MDCT special, is that it has half as many basis functions as it has inputs. Thus performing the MDCT and then IMDCT on a single block will generally not result in the original. The magic with the MDCT is that it can be overlapped by 50% and then suddenly doing the MDCTs and IMDCTs leads to perfect reconstruction of the original. One can see such lapped transform as a single non lapped transform of infinite length if one wants.

To proof that simply applying the MDCT and then the IMDCT on each block i, which is 50% lapped over block i-1 leads to the original, one really only has to proof that all basis functions are orthogonal (have a mutual dot product = 0). We already assumed that the basis functions of a single MDCT are orthogonal (that can easily be proofen by trigonometric identities if someone is bored). We also know that non adjacent blocks can only have mutually orthogonal basis functions as they do not overlap. Whats left are the adjacent blocks. The proof for their orthogonality is very trivial, if one looks at the graph above, it is immedeatly obvious that there are 2 symmetries one at 0.25 and one at 0.75 in all basis functions. So they behave as (…, c, b, a, -a, -b, -c, …, x, y, z, z, y, x, …). Thus the dot product of 2 basis functions of 2 adjacent blocks is …
… c*x + b*y + a*z – a*z – b*y – c*x … which is obviously 0.

I also should mention that because of these 2 symmetries mentioned above one really just needs to calculate 50% of the IMDCT as the rest is identical up to the sign

The above is still missing one detail, that is that normally the (I)MDCT is used with a window to smoothly get the basis functions down to 0 at their ends. An example with sine window is below

The obvious question is, are the basis functions still orthogonal? The blocks which are not adjacent of course still have to be because their basis functions dont overlap. The basis functions from adjacent blocks assuming a symmetric window
(…c*w-2, b*w-1, a*w0, -a*w1, -b*w2, -c*w3, …, x*w3, y*w2, z*w1, z*w0, y*w-1, x*w-2, …) now have a dot product of …c*w-2*x*w3 + b*w-1*y*w2 + a*w0*z*w1 – a*w1*z*w0 – b*w2*y*w-1 – c*w3*x*w-2 … which is still 0 thus to our “big surprise” any symmetric window maintains orthogonality between 2 adjacent blocks. Whats left are the basis functions within a block. For them the dot product looks like
… c*C*w-22 + b*B*w-12 + a*A*w02 + a*A*w12 + b*B*w22 + c*C*w32 … + x*X*w32 + y*Y*w22 + z*Z*w12 + z*Z*w02 + y*Y*w-12 + x*X*w-22
or reordered and common stuff factored out:
… c*C*(w-22+w32) + b*B*(w-12+w22) + a*A*(w02+ w12) +
… x*X*(w32+w-22) + y*Y*(w22+w-12) + z*Z*(w12 + w02)
If we now choose a window for which wi2 + w1-i2 = 2 then the dot product equals what it is without the window, thus our windowed MDCT is still orthogonal and thus easy invertible

Filed under: DCT — Michael @ 03:34


Principal components analysis / Karhunen-Loève transform

The PCA/KLT is often said to be the optimal linear transform for video coding while the DCT is a similar, faster and simpler transform
This isnt really true, the KLT is not that optimal at all for video coding, what the KLT is, is that it is the optimal orthogonal transform for compacting the energy of a vector into its first n components, what might be better is a transform, preferably (near) orthogonal which compacts the energy into few components, not neccessarily the first few!
For example, if we consider 1D 8 component vectors, lets assume our data set is entirely made of piecewice constants then the following basis functions would allow us to exactly store these with 1+n basis functions where n is the number of discontinuities, and please note this here is just an example where the KLT isnt that optimal, 1-D piecewise constants are probably not a ideal model for images …

1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1

ok this is quite a non orthogonal set of basis functions which has its problems too, a less compact and orthogonal (if the basis functions are normalized, which for sake of readability hasnt been done) but still better then KLT transform would be based upon the following basis functions:

1 1 1 1 1 1 1 1
1 1 1 1-1-1-1-1
1 1-1-1 0 0 0 0
0 0 0 0 1 1-1-1
1-1 0 0 0 0 0 0
0 0 1-1 0 0 0 0
0 0 0 0 1-1 0 0 
0 0 0 0 0 0 1-1

here a piecewise constant with 1 discontinuity can be represented by 2-4 basis functions where the KLT case below will need all 8 for an exact representation

 0.290596  0.341669  0.377247  0.395479  0.395472  0.377227  0.341853  0.290749  
-0.490579 -0.416017 -0.277788 -0.097204  0.097447  0.277903  0.415566  0.490129      
 0.473927  0.255266 -0.138718 -0.437652 -0.436948 -0.137245  0.255345  0.473804       
-0.415204  0.097424  0.490934  0.276661 -0.277947 -0.490521 -0.097391  0.416181       
-0.382979  0.320320  0.364628 -0.343399 -0.342868  0.364783  0.320177 -0.383024       
-0.278541  0.491575 -0.099199 -0.414873  0.416580  0.096333 -0.489054  0.277157        
-0.210508  0.464503 -0.455644  0.190764  0.181391 -0.451288  0.463825 -0.210293        
-0.096505  0.274820 -0.412134  0.489524 -0.491862  0.418348 -0.280951  0.098971

this has been generated by (the hopefully not buggy) pca.c/pca.h

Another way to see the sub-optimality of the KLT is to consider the last (high frquency) basis functions, in many images and videos they are simply not used, if they would be replaced by commonly occuring patterns which without them would need many basis functions to be accurately represented then the compression-rate could be improved, the resultig transform would be non orthogonal though …

Filed under: DCT,VideoCoding — Michael @ 01:36


The MPEG1/2/4 and H.261/2/3 IDCT

The 8×8 Inverse discrete cosine transform used in most video codecs converts the very sparse and thus easily compressible dct coefficient matrixes into the 8×8 blocks vissible or the 8×8 difference relative to some area from the previous frame
The IDCT in these codecs is not exactly specified, instead the MPEG and ITU commitees only require the used IDCT to be approximately equal to a idealized and very slow reference IDCT due to that, decoders and encoders use various different IDCTs, if the actually used encoder and decoder happen to use 2 different enough IDCTs and the video material, encoding parameters, moon phase and so on match then the tiny +-1 differences between the IDCTs will accumulate and turn the pale face of a corpse pink or green or add stripes or worse …
some examples of such artifacts:

LLM/IJG int simpleidct
libmpeg2 idct xvids idct
LLM/IJG int simpleidct
libmpeg2 idct xvids idct

Its also very important to keep in mind that these artifacts are caused by the difference between the encoder and decoder IDCT and not the difference to some idealized IDCT only on the decoder side, so using the reference IDCT on the decoder side will often not help at all, and sometimes might make the artifacts worse
now maybe you are curious which IDCT was used in the encoders for the 2 examples above, well iam curious too about that as theres no 100% certain way to find out, allthough i would guess the first used the idct from the msmpeg4v3 codec and the second the one from xvid
Which prameters can be tuned on the encoder side to reduce these artifacts? First use a smaller keyframe interval, then avoid qp=1, avoid qpel and use lots of b frames or use H.264 which doesnt suffer from this mess as it has a exactly specified idct approximation

Filed under: DCT,VideoCoding — Michael @ 16:37

Powered by WordPress