Lair Of The Multimedia Guru

2006-12-12

Motion Estimation (1. comparison functions)

This month's ;) today's blog entry is about motion estimation for block-based video encoders, more precisely integer-pel style, no half-pel or qpel, to keep this reasonably simple. In practice the sub-pel motion estimation (hpel/qpel, …) is normally done after the integer-pel search anyway, so we can nicely ignore sub-pel today

Now how do we estimate the motion of a square or rectangular block of pixels?
Nothing easier than that, you will likely say: just find the best matching block in the reference frame for each block we need to encode. But what is “best” and how do we find it?

Compare functions

To find the best matching block we first need to give each pair of blocks a score; there are several popular ways to do that:

SAD Sum of absolute differences
      for(x=0; x<SIZE; x++){
          for(y=0; y<SIZE; y++){
              score += abs(reference[y][x] - current[y][x]);
          }          
      }
    
SSD Sum of squared differences
      for(x=0; x<SIZE; x++){
          for(y=0; y<SIZE; y++){
              error= reference[y][x] - current[y][x];
              score += error*error;
          }          
      }
    
SATD Sum of absolute Hadamard transformed differences (a sketch of the transform follows after this list)
      for(x=0; x<SIZE; x++)
          for(y=0; y<SIZE; y++)
              difference[y][x]= reference[y][x] - current[y][x];
      vertical_hadamard_transform(difference);
      horizontal_hadamard_transform(difference); // vertical and horizontal can be exchanged
      for(x=0; x<SIZE; x++)
          for(y=0; y<SIZE; y++)
              score += abs(difference[y][x]);
    

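The vertical_hadamard_transform() and horizontal_hadamard_transform() above are left undefined; a minimal sketch of the horizontal (row-wise) pass could look like the following, assuming SIZE is a power of two and using unnormalized butterflies (fine for SATD, where only the sum of absolute values matters). The vertical pass is the same butterfly applied down each column.

      void horizontal_hadamard_transform(int difference[SIZE][SIZE]){
          for(int y=0; y<SIZE; y++){
              for(int step=1; step<SIZE; step*=2){ // fast Walsh-Hadamard butterflies
                  for(int x=0; x<SIZE; x+=2*step){
                      for(int i=0; i<step; i++){
                          int a= difference[y][x+i     ];
                          int b= difference[y][x+i+step];
                          difference[y][x+i     ]= a+b;
                          difference[y][x+i+step]= a-b;
                      }
                  }
              }
          }
      }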
The theoretically ideal comparison function is to completely encode the block and then use as the score the weighted sum of the number of bits and the distortion, be it SSD or a psychovisual one
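In rate-distortion terms that is the usual Lagrangian cost; as a sketch, with lambda being the rate/distortion trade-off factor (the name is mine, not from the post):

      score = distortion + lambda * bits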

How to find the best block with such a comparison function will be the subject of another blog entry; don't fear, I'm just trying to keep the blog entries small, self-contained and easily digestible

Filed under: Uncategorized — Michael @ 00:17

2006-10-29

Integer precision conversion

You have an 8bit grayscale value (or an audio sample) and want to convert it to 16bit, so without thinking about it you write x<<8, but that generally isn't correct, because the new least significant bits are all zero, so the largest 8bit value will become a value 255 below the largest 16bit value

So what is correct?

This depends on your definition of what all the values mean; if 0 is black then simply multiplying by the new value for white and dividing by the old value for white, with rounding to nearest, will do the trick

For the specific case where the largest representable value is white, increasing precision can be done by setting the new least significant bits to the most significant ones, for example for 8->16bit: x+(x<<8) = x*65535/255 = x*257; for 8->10bit: (x<<2)+(x>>6), or if you are pedantic (x*1023+127)/255 = (x*341+42)/85; and for 2->8bit: x+(x<<2)+(x<<4)+(x<<6) = x*85
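For illustration, a tiny sketch of the 8->16 and 8->10 bit cases above (the function names are mine):

int expand_8to16(int x){            /* x is 0..255, largest value == white */
    return x*257;                   /* == x + (x<<8) == x*65535/255 */
}

int expand_8to10(int x){            /* x is 0..255 */
    return (x<<2) + (x>>6);         /* replicate the top bits into the new low bits */
}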

For the specific case where largest+1 (256 in the case of 8bit) is white, indeed just setting the low bits to 0 is correct, but at least for low precision this is not how grayscale values are defined; just think about 1bit monochrome, converting that to 8bit by setting the low bits to 0 will give you 128 for 1 while white is 255 (or 256)

What about decreasing precision?

Well, that's simply the inverse, so for x*257 it's (x+128)/257, which can be approximated by (x*255+(1<<15))>>16
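A matching sketch of the decreasing-precision direction (again, the names are mine):

int reduce_16to8(int x){            /* x is 0..65535 */
    return (x + 128) / 257;         /* rounding to nearest */
}

int reduce_16to8_approx(int x){
    return (x*255 + (1<<15)) >> 16; /* approximation that avoids the division */
}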

Filed under: Uncategorized — Michael @ 01:48

2006-08-28

The high-level structure of the mp3 bitstream

Today's blog entry describes the mp3 bitstream structure; it won't describe the actual mp3 decoding and neither will it describe where and how each variable is precisely stored …
First, the mp3 bitstream is made of packets; these packets vary in size, yes they can even vary for “CBR”, though only by +-1 byte

Each such packet begins with an 11bit “startcode” followed by 21 header bits which encode all of the most important info like samplerate, bitrate, number of channels, layer, padding, …. From these 21 bits the packet size can be calculated, as sketched below
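For the common MPEG-1 Layer III case the calculation boils down to the following (a sketch only, the other layers/versions use slightly different constants):

int mp3_packet_size(int bitrate, int samplerate, int padding){
    /* bitrate in bits per second, samplerate in Hz,
       padding is the 1-bit padding flag from the header */
    return 144*bitrate/samplerate + padding;
}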

The remainder of the packet is split into 2 parts: the first contains everything except the scale factors and coefficients, the second contains the scale factors and coefficients, which makes it the larger part. The first part is just encoded normally from position 4 (bytes 0-3 are the header) to its end

The second part, though, is encoded much more stupidly. First we have (from the first part of the packet) a backstep value, which is an amount of bytes from an internal buffer which we must prepend to the second part of the packet before we can begin to decode it; we will describe what's in that internal buffer in a moment. The second part is now split into channels and granules, and for each we have a length in bits. We decode them normally until we either have decoded all scale factors and coefficients or we hit or run over the amount of bits assigned for that channel-granule; if we don't hit it precisely but run over it, because the end of the bit budget doesn't coincide with the end of a vlc, then we must step back by one vlc, undo its effects and skip the half vlc. What's left of the second part of our packet (that can include part of the internal buffer) will be put into the internal buffer to be used by future packets

Filed under: Uncategorized — Michael @ 21:50

2006-08-17

ddrescue

ddrescue is a nice and simple tool to rescue data from a half-broken disk or cd. It does that by simply repeatedly trying to copy chunks: in the first pass it will try big chunks, which is fairly fast; in the second pass it will try to split the damaged chunks into damaged and undamaged areas; and in the following passes it will just repeatedly try to read the still damaged chunks

This takes quite some time if the disk/cd contains many damaged areas, and recovery may or may not be successful in the end, but one thing which had a significantly positive effect on recovery of an old cdr (no, there was nothing important on it, I was just playing around with ddrescue …) was changing the cdrom speed with hdparm every 10 secs. Yes, that's probably not healthy for the drive, so kids, try it with someone else's drive, not your own :)

While playing with ddrescue I added some ascii visualization of the recovery process

Filed under: Uncategorized — Michael @ 00:57

2006-07-21

Division by zero

Have you ever been curious why you can't divide by zero?

The intuitive explanation with division of positive integers

To divide a by b means to remove b as often as possible from a, assign the number of times you could do that to a/b, and keep what's left as the remainder of the division
if b is 0 you can remove it as often or as rarely as you want, nothing will ever change

The intuitive explanation with division as the inverse of multiplication

If you have a number a and multiply that by b, then divide
by b, so a*b/b, then you should get a again if division reverses what multiplication did. But if b is 0 then a*b is a*0, which is 0 no matter what a was, so there's no information left about the value a had and consequently no way to reverse it

The Analytical explanation

if a is a non-zero constant, then a/b diverges to infinity as b approaches 0, and it converges to x if a=xb
note it's important to remember here that the limit of a/b as b approaches 0 is not the same as a/0
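In symbols (LaTeX notation; splitting it into the two cases is my doing):

\lim_{b \to 0^{+}} \frac{a}{b} = \infty \quad (a > 0), \qquad \lim_{b \to 0} \frac{xb}{b} = x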

The Algebraic explanation

All the above really just deals with the common numbers like integers, reals, complex numbers, …
so you might ask yourself if it's not maybe possible to invent some other algebra (a set of elements with some operations like addition and multiplication) in which we can divide by zero. Well, the answer is of course you can, the problem is just that you have to give up some of the well-known properties of common numbers
Let's see which set of properties is incompatible with division by zero
to speak about division by zero we need at least

  1. a set of elements S for our algebra A
  2. an additive identity element (x + 0 = 0 + x = x for all x)
    we use “0” here for simplicity even though the element doesn't need to be related to our well-known zero
  3. of course a + and * operation, which again of course don't have to be the same as common addition and multiplication
  4. 0 to have an inverse ((x*0)*0^-1 = x for all x)

with that set of rules one can easily find many algebras in which you can divide by zero, for example just define the * operation to be identical to +, and that identical to common addition; division by zero is then just subtraction of zero, totally useless but it works :)
another property we likely want is a multiplicative identity element (x * 1 = 1 * x = x for all x) and the distributive law (a+b)*c = a*c + b*c and c*(a+b) = c*a + c*b, sadly that already starts causing problems:

0*0 + 0= 0*0 definition of the identity element of +
0*0 + 1*0 = 0*0 definition of the identity element of *
(0 + 1)*0 = 0*0 distributive law
1*0 = 0*0 definition of the identity element of +
1 = 0 inverting the multiplication by 0

so our algebra would need the identity element of + and the identity element of * to be identical, hmm …

and another odd result:

x*1 = x definition of the identity element of *
x*(1+1) = x definition of the identity element of + and 1=0
x+x = x distributive law

now the question is, is there actually a non-trivial (more than 1 element) algebra left at all?
it seems there is: assume our algebra has 2 elements: “0” and “#”
and we define addition and multiplication like:

+/*  0  #
 0   0  #
 #   #  #

it's immediately obvious that the rules for the 2 identity elements hold
does the distributive law hold too? It does, which can be proven by simply looking at the 8 possible cases (see the little brute-force check below), and an inverse element for multiplication by 0, well, 0 is its own inverse obviously
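Here is that brute-force check written out (my own sketch, not from the post): mapping 0 -> 0 and # -> 1, both + and * from the table above are simply a logical OR, and as derived earlier 0 is at the same time the additive identity, the multiplicative identity and its own inverse.

#include <stdio.h>

static int op(int a, int b){ return a | b; }   /* the shared table for + and * */

int main(void){
    int ok = 1;
    for(int a=0; a<2; a++)
        for(int b=0; b<2; b++)
            for(int c=0; c<2; c++){
                if(op(op(a,b),c) != op(op(a,c), op(b,c))) ok = 0;  /* (a+b)*c = a*c + b*c */
                if(op(c,op(a,b)) != op(op(c,a), op(c,b))) ok = 0;  /* c*(a+b) = c*a + c*b */
            }
    printf("distributive law %s\n", ok ? "holds" : "fails");
    return 0;
}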

next, we drop the multiplicative identity element again and try to add a unique multiplicative inverse element x for every element instead of just for zero (a*x=b for all a,b); without that we would either just turn the division-by-zero problem into a division-by-foobar problem or we wouldn't be able to reach some elements. Sadly only the trivial 1-element algebra is left then:

a*b = a*(b+0)      definition of the identity element of +
a*b = a*b + a*0    distributive law
0 = 0 + a*0        choose b so that a*b=0
0 = a*0            definition of the identity element of +
0*0^-1 = a         invert multiplication by zero

the last line means: every algebra with the properties above will have exactly one element and be useless
alternatively, if we replace the property that every element must have a unique multiplicative inverse by requiring addition to be invertible ((a+b) + (-b) = a), then we too end up with the useless 1-element algebra:

a = (a*0)*0^-1               definition of the inverse of 0
a = (a*(0+0))*0^-1           definition of the identity element of +
a = (a*0 + a*0)*0^-1         distributive law
a = (a*0)*0^-1 + (a*0)*0^-1  distributive law
a = a + a                    substituting from row 1
0 = a                        inverting the addition of a

I guess that's enough math for today, and enough chances to embarrass myself with a trivial typo somewhere …

Filed under: Uncategorized — Michael @ 11:25

2006-07-08

UTF-8

UTF-8 is the most common encoding used to store unicode; the following table summarizes how it works

number of bytes  byte1      byte2      byte3      byte4      range of unicode symbols
1                0XXX XXXX                                    0-127
2                110X XXXX  10XX XXXX                         128-2047
3                1110 XXXX  10XX XXXX  10XX XXXX              2048-65535
4                1111 0XXX  10XX XXXX  10XX XXXX  10XX XXXX   65536-2097151

Features

  • 0-127 ASCII is stored as is in UTF-8
  • bytewise substring matching works as is; in other words one UTF-8 symbol will never match anything but one and the same UTF-8 symbol, never just, say, the trailing bytes of another symbol
  • UTF-8 is easy to distinguish from a random byte sequence
  • when starting at a random byte, resynchronization is trivial and always happens at the next symbol

S1D-7

The biggest problem of UTF-8 is that it's not very compact, it needs more bytes than other (non-unicode) encodings. Can we do better without losing its advantages? Well, apparently yes; it's sad because you'd expect the standard committees to do better …
Why “S1D-7”? Well, 1 “stop” bit to determine if the byte is the last one and 7 data bits for each byte

number of bytes  byte1      byte2      byte3      range of unicode symbols
1                0XXX XXXX                        0-127
2                1XXX XXXX  0XXX XXXX             128-16383
3                1XXX XXXX  1XXX XXXX  0XXX XXXX  16384-2097151

Compactness differences

  • In all cases S1D-7 is able to store as many or more symbols per number of bytes
  • where UTF-8 needs up to 4 bytes to store all symbols, S1D-7 never needs more than 3
  • Many languages which need 3 bytes per symbol in UTF-8 can be encoded with 2 bytes per symbol, just look at the Basic Multilingual Plane at wikipedia, all the pages which are in the range of 0x0800 to 0x3FFF need 3 bytes per symbol in UTF-8 and 2 bytes per symbol in S1D-7

Features

Well, the argumentation at the wikipedia page to justify UTF-8's bloatedness is that its “advantages outweigh this concern”, so let's see if our simpler encoding, which requires fewer bytes per symbol and is much more compact, loses any features

0-127 ASCII is stored as is in UTF-8

same, no change here

Substring matching

in UTF-8 we can simply use bytewise substring matching, while in S1D-7 we need to write a slightly different substring matching function, though both are very simple, see below

int match_utf8_or_ascii(uint8_t *heap, uint8_t *needle){
    int i,j;

    for(i=0; heap[i]; i++){
        for(j=0; needle[j]; j++){
            if(heap[i+j] != needle[j])
                break;
        }
        if(!needle[j])
            return i;
    }
    return -1;
}

int match_S1D8(uint8_t *heap, uint8_t *needle){
    int i,j;

    for(i=0; heap[i]; i++){
        for(j=0; needle[j]; j++){
            if(heap[i+j] != needle[j])
                break;
        }
        if(!needle[j] && !(i && heap[i-1]>127))
            return i;
    }
    return -1;
}
Distinguishing S1D-7 from random bytes

Due to the lower overhead there is less to check, so it needs more text to detect reliably, but the check is very simple: just check if there are any 3 consecutive bytes which are each larger than 127 and you know it's not S1D-7; to distinguish it from UTF-8 just run your UTF-8 detection over it, it's very unlikely that S1D-7 will be parsed correctly by a UTF-8 parser
not to mention, the encoding should be specified somehow anyway and not guessed by such encoding analysis
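A sketch of that check (the function name is my own invention):

int could_be_s1d7(uint8_t *buf, int len){
    int i, run=0;

    for(i=0; i<len; i++){
        run= buf[i]>127 ? run+1 : 0;
        if(run >= 3)
            return 0;   /* 3 continuation-style bytes in a row: cannot be S1D-7 */
    }
    return 1;
}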

resynchronization

is even more trivial than in UTF-8: in UTF-8 you search for a byte < 128 or a byte >= 192; when you find it you know it's the first byte of a symbol
in S1D-7 you search for a byte < 128; when you find it you know that the next byte is the first byte of a symbol

parsing

Parsing S1D-7 is much easier than UTF-8, see:

int read_utf8_symbol(uint8_t **b){
    int val= *((*b)++);
    int ones=0;

    while(val&(128>>ones))
        ones++;

    if(ones==1 || ones>4)
        return -1;
    val&= 127>>ones;
    while(--ones > 0){
        int tmp= *((*b)++) - 128;
        if(tmp>>6)
            return -1;
        val= (val<<6) + tmp;
    }
    return val;
}

int read_s1d7_symbol(uint8_t **b){
    int i, val=0;

    for(i=0; i<3; i++){
        int tmp= *((*b)++) - 128;
        val= (val<<7) + tmp;
        if(tmp < 0)
            return val + 128;
    }
    return -1;
}
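For completeness, here is a sketch of the matching S1D-7 encoder (my own code, the post only shows the readers); it writes the 7-bit groups most significant first, with the stop bit clear only on the last byte, so for example 256 becomes 0x82 0x00:

int write_s1d7_symbol(uint8_t *out, int val){
    int i, n;

    n= val>0x3FFF ? 3 : val>0x7F ? 2 : 1;
    for(i=0; i<n; i++){
        out[i]= (val >> 7*(n-1-i)) & 127;
        if(i < n-1)
            out[i]|= 128;       /* "more bytes follow" bit */
    }
    return n;                   /* number of bytes written */
}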
zero byte occurrences (new 2006-07-21)

This issue has been raised by a comment …
While UTF-8 guarantees that a zero byte in the encoding will occur only in a zero symbol, S1D-7 does not guarantee that, for example the value 256 is encoded as 0x82 0x00; UTF-16 also has this problem
zero bytes are problematic if the encoded strings are passed through code which has been written for one-byte-per-symbol zero-terminated strings. Of course you should never pass something through code which is designed for a different encoding, but if for whatever odd reason you still do that, then one possible solution is to simply change the value before encoding and after decoding to avoid zero bytes, this is actually very easy:

int avoid_zero(int x){
    x+= x<<7;
    return (x + (x>>14))>>7;
}
int reverse_avoid_zero(int x){
    return x - (x>>7);
}

these transforms have almost no effect on the space efficiency of S1D-7

occurrence of specific bytes (new 2006-07-21)

This issue has been raised by a comment …
While UTF-8 guarantees that no ascii bytes below 128 will occur in the encoding of symbols larger than 128, S1D-7 does not guarantee that; UTF-16 also has this “problem”
again this “problem” is limited to the case of passing encoded strings through code which expects some other encoding, and that's something you should never ever do; not only will the code not be able to make sense of at least some symbols, it's likely going to lead to bugs and exploits. If strings have to be passed through code which expects another encoding then you should convert your strings to that encoding and escape the problematic symbols somehow
S1D-7 can also easily be changed to avoid specific low (<128) bytes in its encoded representation, although that's just for demonstration and shouldn't be done in practice as it's very bad to pass strings through code which expects another encoding; if your filesystem for example expects filenames in ascii without specific letters then that's what the filename should be encoded with, and letters which cannot be represented should be encoded with some sort of escape sequence

int avoid_some_bytes(int x){
    if(x>127) x= (x<<1) ^ table1[x&63];
    return x;
}
int reverse_avoid_some_bytes(int x){
    if(x>127) x= (x ^ table2[x&127])>>1;
    return x;
}

with these transforms any 64 low ascii bytes can be avoided at the cost of 1 bit per symbol, which makes S1D-7 somewhat worse than before but still much better than UTF-8 in space efficiency

Filed under: Uncategorized — Michael @ 12:40

2006-06-09

Deinterlacing filters

Below you will find a comparison of various deinterlacing filters; the interlaced source was created with mplayer, using the phase=t and tinterlace=1 filters, from the well-known foreman video

all pictures were exported with mplayer -vo pnm and ppmtojpg --optimize --quality 90. Why not -vo jpeg? Well, I didn't have -vo jpeg compiled in, and -vo jpeg only accepts rgb input, so there wouldn't have been any advantage in using it …

Most deinterlacing filters only output one frame for every 2 fields; that, combined with the fact that some always choose the even field, some always the odd and some always the first no matter if that's even or odd, made this comparison less fun than I hoped originally. Due to this little issue several filters (pp=ci and kerndeint IIRC) had to be fed with vertically flipped video, and tomsmocomp even needed -vf phase=b to output a frame based on the correct field, and yes, I of course did specify the top vs. bottom field first flag correctly as long as the filter had such an option …

Original non interlaced image
interlaced image mplayer -vf phase=t,tinterlace=1
mplayer -vf pp=lb (linear blend)
mplayer -vf pp=l5 (5tap lowpass filter)
transcode -J smartyuv
mplayer -vf pp=fd
mplayer -vf pp=md (median deinterlacer)
mplayer -vf pp=li (linear interpolate)
mplayer -vf pp=ci (cubic interpolate)
transcode -J dilyuvmmx
mplayer -vf kerndeint (Donald Graft’s adaptive kernel deinterlacer)
transcode -J smartdeinter (VirtualDub’s smart deinterlacer)
transcode -J tomsmocomp (Tom’s Motion Compensation deinterlacing filter)
mplayer -vf yadif=1:1
mplayer -vf yadif=3:1
mplayer -vf yadif=1:1,mcdeint=2:1:10
mplayer -vf yadif=3:1,mcdeint=2:1:10
Filed under: Uncategorized — Michael @ 16:48

2006-06-02

Optimal (A)DPCM encoding

ADPCM in general encodes differences between samples using a small number of possible difference values and adapts which values are allowed depending upon past difference values
( … we all know that)

ADPCM encoders normally work based on the principle that they select the best possible difference between the last sample and the current sample while ignoring both future and past; that's not optimal at all

a viterbi based encoder would select the optimal sequence of differences which minimizes some distortion measure like the sum of squares; it achieves this by successively finding the optimal encodings up to a specific state, or in other words:
instead of keeping a single sequence of encoded differences of the past samples and then encoding the next difference,
we keep track of the optimal sequence of differences/encoded bitstream up to the current sample for every state (0..88 step + a few sample values around the ideal one); next we just calculate the optimal bitstreams up to the next sample using the current sample and the optimal ones up to the last sample

maybe a concrete example would help; we use non-adaptive DPCM here but the principle is the same …

let's say we have some input samples 0,2,6,4,-4
and our example encoder can encode +4,+1,-1,-4 differences
a conventional encoder would output 0,1,5,4,0 distortion=18
optimal would be 1,2,6,2,-2 distortion=9
if we forced the first output to 1 and then used a conventional encoder: 1,2,6,5,1 distortion=27

viterbi encoder:

1. iteration
-2              distortion 4
-1              distortion 1
 0              distortion 0
 1              distortion 1
 2              distortion 4
2. iteration
 1, 0           distortion 5
 0, 1           distortion 1
 1, 2           distortion 1
-1, 3           distortion 2
 0, 4           distortion 4
3. iteration
-1, 3, 4        distortion 6
 0, 1, 5        distortion 2
 1, 2, 6        distortion 1
-1, 3, 7        distortion 3
 0, 4, 8        distortion 8
4. iteration
 1, 2, 6, 2     distortion 5
-1, 3, 7, 3     distortion 4
 0, 1, 5, 4     distortion 2
 1, 2, 6, 5     distortion 2
 0, 1, 5, 6     distortion 6
5. iteration
not possible,-6
not possible,-5
not possible,-4
not possible,-3
 1, 2, 6, 2,-2  distortion 9
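
Below is a minimal brute-force sketch of such a viterbi search for this toy encoder (the code and all names are mine, not a real ADPCM encoder). Unlike the example above it fixes the reconstruction before the first sample to 0, so the first output is constrained to the allowed differences too; it still finds the optimal 1,2,6,2,-2 with distortion 9:

#include <stdio.h>
#include <string.h>
#include <limits.h>

enum { NSTATES = 64, OFFSET = 32, MAXN = 16 };  /* states = reconstructed values -32..31 */

static const int diffs[4] = {+4, +1, -1, -4};

int main(void){
    const int input[] = {0, 2, 6, 4, -4};
    const int n = sizeof(input)/sizeof(input[0]);
    int cost[NSTATES], new_cost[NSTATES];
    int path[MAXN][NSTATES];                    /* back pointer: best previous state */
    int out[MAXN];
    int i, s, d, best;

    for(s=0; s<NSTATES; s++) cost[s] = INT_MAX/2;
    cost[0+OFFSET] = 0;                         /* reconstruction before the 1st sample is 0 */

    for(i=0; i<n; i++){
        for(s=0; s<NSTATES; s++) new_cost[s] = INT_MAX/2;
        for(s=0; s<NSTATES; s++){
            if(cost[s] >= INT_MAX/2) continue;  /* state not reachable */
            for(d=0; d<4; d++){
                int rec = s - OFFSET + diffs[d];
                int err, c;
                if(rec < -OFFSET || rec >= NSTATES-OFFSET) continue;
                err = rec - input[i];
                c   = cost[s] + err*err;
                if(c < new_cost[rec+OFFSET]){   /* keep only the best path per state */
                    new_cost[rec+OFFSET] = c;
                    path[i][rec+OFFSET]  = s;
                }
            }
        }
        memcpy(cost, new_cost, sizeof(cost));
    }

    /* pick the cheapest final state and walk the back pointers */
    best = 0;
    for(s=1; s<NSTATES; s++) if(cost[s] < cost[best]) best = s;
    for(i=n-1, s=best; i>=0; i--){ out[i] = s-OFFSET; s = path[i][s]; }

    printf("distortion %d:", cost[best]);
    for(i=0; i<n; i++) printf(" %d", out[i]);
    printf("\n");                               /* prints: distortion 9: 1 2 6 2 -2 */
    return 0;
}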
Filed under: Uncategorized — Michael @ 20:18