Lair Of The Multimedia Guru

November 30, 2008

Pseudo random number generators 2

After the little bit of analysing PRNGs in the last blog post, which PRNGs exactly seem good?

Fastest PRNGs

If one either needs a very fast PRNG or one has some means to easily test that the output is good enough (like looking to see if a noise filter produces no vissible aritifacts) then folllowing could be considered.
Linear congruential generators, Additative lagged fibbonacci generators or even just an array of random numbers that gets used repeatedly, …
Each of these has its serious flaws, LCGs have very poorly performing LSBs, that is the lowest n bits never have a period larger than 2n. ALFGs contain number triplets (the ones spaced like the taps used in the PRNG) that when used as coordinates in a cube would all fall in a single plane. Still these generators are often good enough, and have been used for a long time and are widespread, that is all C libs ive seen use either of them for rand() for examle.

Good and fast PRNGs

If one wants a generator free of big flaws and that is still reasonably fast then the number of choices goes down, first there is
4 tap additative LFG (xi = xi-55 + xi-119 + xi-179 + xi-256 mod 2n) this one passed all of testu01 but has a obvious linear dependancy between its values.
Combined generators, these generators combine the output of several simple generators and generally pass all tests, many of them though are slow like CLCG4 which combines 4 LCGs, combined generators like fi = gi + hi also have the flaw that fi – fi+p = hi – hi+p when p is the period of g. This also means that fi – fi+p – (fi+q – fi+p+q) = 0 when p and q are the periods of h and g. Thus all combined generators which are made of constituent generators with “bruteforceable periods” are cryptographically likely very weak. A combined generator that is reasonable fast is KISS99.
Multiplicative lagged fibonacci generators, like for example (xi = xi-55 * xi-24 mod 2n) these generators also tend to pass all tests as long as only the most significant bits are used, that is for example the top 32 bit of a 64 bit generator. The least significant bit of such a generator is always 1, its next bit behaves like a LFSR identically to the least significant of an additative or xor based LFG. One trick to get rid of the least significant bit without ending up with just 31 or 63 bits is to use 2*a*b+a+b instead of a*b, this works because a and b are (have to be) odd in a normal MLFG and (2*A+1)(2*B+1) = 4AB + 2A + 2B + 1

Best but slow PRNGs

“Anything” based on AES, SHA-1, MD5, …

Bad PRNGs

These are PRNGs that are both slower and worse in terms of their output than others, examples are
Mersene twister, becuase its output is purely based on the XOR of previous bits thus being not nearly as good as some pretend it to be while it also isnt nearly as fast compared to other properly implemented PRNGs.
Combined generators with lots of modulo by non power of 2, these are slow …
Weak generators from which many output values are thrown away, again these are slow …

Filed under: Cryptanalysis,Pseudo random number generators — Michael @ 2:08

November 22, 2008

Pseudo random number generators

I guess most developers have once or more than once run into the question of, “which PRNG should one use”. The awnser, or more precissely my awnser to this, would be it really depends on for what one plans to use it.
There is no perfect generator, there are fast ones, and there are good ones, but the best arent the fastest. One has to choose depending on ones needs

  • If one wants to generate white noise for a human listener then the best generator simply is the fastest that produces perfectly sounding white noise, it really doesnt matter if it has some statistical defects or not.
  • If one wants to use the random numbers for cryptographic purposes, a perfect generator that has been extensively tested and no flaws found is needed.
  • If one wants to test a scientific theory by simulation, one needs a generator that doesnt cause a wrong result from the simulation, but as the correct result isnt known, one cant easily pick based on this criteria. Thus one has little alternative to picking a generator that has few statistical defects and is fast enough for the amount of numbers needed. Or better even run the simulation 2-3 times with very different generators

So which generators are there and what defects do they have? This actually is rather easy to awnser, or then maybe not ;) well, there is George Marsaglias Diehard and Pierre L’Ecuyers
TestU01 both contain code to test PRNGs, later also contains a paper describing the results of these tests for most recent and popular PRNGs.

In an ideal world one would just have to look at the TestU01 paper And pick a generator that has the amount of defects and speed one wants. At least thats what i thought before testing a few generators myself, more precissely i took a few of the generators that passed all tests in TestU01 and run 2 (to me obvious) tests against them

  1. Use a gaussian like algorithm to find out if any bits in the output are just a linear (mod 2) combination of previous output bits
  2. Use a gaussian like algorithm to find out if any scalars in the output are just a linear (mod maxoutput+1) combination of previous output scalars
  3. Use a gaussian like algorithm to find out if any bits (only considering the least significant of each scalar) in the output are just a linear (mod 2) combination of previous such output bits

The first 2 of these tests in the way i implemented them only consider outputs surrounding a scalar with 20 zero LSBs, though as far as i tested it this 20bit trick only made a difference for the additative LFG with only the 32MSB used, but i did not test all without this 20bit check

Failing either of these tests makes the PRNG at least unsuitable for linear algebra under the same modulo, so its certainly a statistical defect. All linear congruential generators have to fail at least the test that works in their own modulo, after all they are linear, but actually all i tested failed both tests. Lagged fibonacci generators based on addition or xor similarly must fail and do. All multiplicative Lagged fibonacci generators fail as well due to linear dependancies in their lower bits. All linear feedback shift register based generators like the mersene twister very obviously have to fail too, i did not test any of them though as none passed TestU01.

The actual generators that passed TestU01 and failed mine where:
superduper64, Marsa-lfib4, DX-47-3, MRGk5_93, Lfib(2^64,55,24,*), brent-xor4096s

The actual generators that passed TestU01 and passed mine where:
brent-xor4096s, MRG31k3p, CombMRG96, ran2, CLCG4, KISS99

Note1, there are more generators that passed TestU01 but i did not test them

Note2, The first 2 tests where only searching for linear dependancies within 512 consecutive scalars, that is 32kbit for a 64bit PRNG, the 3rd test was considering 32kbit of LSBs

In addition to that, limiting the output to just 32MSB of 64 made the multiplicative LFG 2^64,55,24,* pass mine while the same trick with a additative one was not helping, KISS99 also passed when SHR3 or CONG was droped but not when both or MWC alone was droped

No source code this time though because i randomly copied PRNGs from various places, but if there is interrest i could throw all but my own code out and post that

To be continued when iam less tired … ;)

Filed under: Cryptanalysis,Pseudo random number generators — Michael @ 1:56

April 15, 2006

Decrypting Nagravision

Now after breaking videocrypt, lets take a look at (old analog) nagravision

Nagravision

the PRNG seed

Same as videocrypt (Binary data from the vertical blanking interval is feeded into a smartcard which then among other things produces a PRNG seed for the nagravision decoder, but none of this matters to us here)

the PRNG

The PRNG (pseudo random number generator) is feeded with a 15bit seed and a 256 5bit entry table, and produces 287 5 bit values, the following code will do exactly that:

void prng(int seed, int *cle2enc){
  uint8_t keyNdx= seed >> 7;
  int keyInc= ((seed & 0x7F) << 1) + 1;
  int buffer[32];

  for(int i=0; i<32; i++) buffer[i]= i;

  for(int i=0; i<255; i++){
    cle2enc[i]= buffer[ keyTable[keyNdx] ];
    buffer[ keyTable[keyNdx] ]= i + 32;
    keyNdx+= keyInc;
  }

  memcpy(cle2enc+255, buffer, 32*sizeof(int));
}
the actual decryption

nagravision has a 32 line buffer, each line can contain a line of video, for each of the 287 lines of a field there are 5bits from the PRNG which are used as index into the buffer to decide into which entry to store the next received line and from where to output one, or in other words nagravision reorders lines using a 32line temporary buffer

Breaking Nagravision without knowledeg of the PRNG

The naive and ideal solution would be to try all 32^287 possibilities and compare the resulting decoded images, obviously thats not realistic
if OTOH we just use a compare function which compares 2 lines and lets assume its just simple correlation then and wish to find the global minimum then we actually didnt improve by that much as thats equivalent of solving the TSP
old versions of 2010 used the following algorithm to find a good permutation of the lines, which resulted in watchable movies

Step 1 (downsampling)

to speedup the linecompare later, the lines can be horizintally downsampled

Step 2 (comparing lines)

all lines which are not near black and which have not yet been merged with 2 lines are compared to all other such lines, the compares are done blockwise so that the number of cache misses is minimzed, for each line a fixed number of best matches is kept

Step 3 (sorting)

all the line pairs are sorted depending upon their similarity

Step 4 (merging)

from best matching line pair to worst the following steps are done

Step 4A (out of pairs)

if we hit the last pair for any line (we just keept a fixed number of best matches for each line) then we goto Step 2

Step 4B (impossible merges 1)

if either of the lines is in the middle of a block or they are the ends of the same block we continue with the next pair at Step 4

Step 4C (impossible merges 2)

we maintain 2 position limits for each block one for each possible orientation, if a merge would force the block outside the image we continue with the next pair at Step 4
the limits are calculated based on the fact that no line can be output before it has been received

Step 4D (impossible merges 3)

we maintain the fullness state of the 32 line reorder buffer if a merge would require overflowing it then we again continue with the next pair at Step 4

Step 4E (merge)

we merge the 2 blocks and update the buffer fullness and limits

Step 5 (sort blocks)

all blocks are sorted by their less restrictive position limit (=orientation is so that the limit is less restrictive)

Step 6 (reorder lines)

we reorder the actual lines

Breaking Nagravision with knowledge of the PRNG

as there are just 2^15 possible permutations so the naive bruteforce becomes alot more realistic but its still somewhat expensive for realtime decoding, so 2010 used a faster method

Step 1 (downsampling)

to speedup the linecompare later, the lines can be horizintally downsampled

Step 2 (compare)

every keyline is compared against all lines, the best 2 are keept

Step 3 (find seedlists)

for every line triplet (keyline and the 2 best matches) we assume that they will be consecutive
in the decoded image, we look up a list of seeds for which this is true from a precalculated table
or tree structure

Step 4 (find seed)

we find the most often occuring seed in the seedlists and if its number of occurances is above some threshold we decode the image, if not we drop it, this ensure practcally zero wrong decoded images

Step 5 (reorder)

after we know the seed we just need to reorder the lines and fix the chroma

Filed under: Cryptanalysis — Michael @ 11:54

April 10, 2006

Decrypting VideoCrypt

To complement my last blog entry, heres a description of how VideoCrypt works and how it can be broken with reasonable quality

PAL

First before we can talk about VideoCrypt, we need to understand how PAL works
Video information in PAL (and NTSC and SECAM) is stored linewise top to bottom, and interlaced so all the even and then in the next field all the odd lines are stored (313 lines in even fields, 312 in odd from which ~288 lines contain a vissible picture for PAL)
each line has a sync pulse to mark the end of a line, a horizontal blanking interval during which the electron beam in crts “moves” back to the left after the end of the previous line at the right and obviously the vissible part in which brightness = amplitude
In case of Color the U and V components of the YUV color value are stored quadrature amplitude modulated at the color subcarier frequency and the V component has its sign inverted in every 2nd line of a field, theres also a short reference color burst added in the HBI

VideoCrypt

the PRNG seed

Binary data from the vertical blanking interval is feeded into a smartcard which then among other things produces a PRNG seed for the videocrypt decoder, but none of this matters to us here

the actual decryption

The PRNG (pseudo random number generator) outputs a sequence of 8bit values (cutpoints), one for each line of video, each of these lines is then split into 4 parts, parts 0 and 3 are outside the vissible area and arent touched by videocrypt, parts 1 and 2 are the vissible area and are exchanged, the location which seperates 1 and 2 is the cutpoint, videocrypt does nothing special with the color information it just exchanges raw digitally sampled data
because this exchange operation cannot be done with zero-delay the videocrypt decoder will also output parts 1 and 2 one line later then when they where transmitted

Breaking VideoCrypt

Lacking informaion about the PRNG we can only guess the cutpoints based on the assumtation that color/luminance values for pixels which are close will be strongly correlated while distant pixels will have more different color/lumi
still, naive bruteforce would require 256^288 images of w*288 pixels to be compared
depening on how we test a image for how good it is, we are able to significantly reduce the complexity, if we use a score function which is the sum of some compare function which uses just consecutive line pairs and their cutpoints then the dynamic programing algorithm / viterbi algorithm can be used to reduce the complexity to 256*256*288 such 2 line checks, if we now further limit ourselfs to the common cross correlation + some edge detector and only consider a few cutpoint differences close to the best result from cross correlation then the complexity would be w*log(w)*288 + 9*w*288 which is what markus kuhns antisky.c did, though antisky was far from realtime decoding, mainly because it was inefficiently written (floating point code everwhere at a time where no cpu had a reasonable fast fpu, …)

Note, if you think comparing 256^288 images would be stupid, its not, think about choosing the image which is compressed best with jpeg or another codec, this might (or might not) be better then the linewise method

Step 1 (downsampling)

to speedup the cross correlation and other things the whole image can be optionally downsampled

Step 2 (luminance cross correlation)

antisky.c used a FFT based cross correlation, this while asymptotically quite fast is really not fast at all for the case of finding the best matching cutpoint difference in reality, 2010 uses a adaptive comparission which first tries a few coarsly spaced cutpoint differences and then tries recursively more around the better ones, its simpler and faster

Step 3 (missmatched lines detection)

lines which simply cant be matched by cross correlation (mean under some threshold or variance under some threshold or best matching score from cross correlation not much better then average score)
must be marked otherwise the (random) cross correlation result will trash the other surrounding lines

Step 4 (PAL phase detection)

as already described somewhere above, PAL flips the sign of the V component every second line, the first line may or may not be fliped, and if the capture hw doesnt provide us with this info then we need to find it if we want to decode color, one way to solve this is to let the user set the flag and then just flip the flag after every frame, a much more robust method is to compare the color of the first and last pixel of the encrypted line which will be exchanged, these are guranteed to be always adjacent in the decrypted image so they will have almost the same color in the decrypted image, while in the encrypted image their difference depends upon the PAL phase thingy

Step 5 (finding the Chroma phase difference)

we know the approximate cutpoint difference from the luma cross correlation, so we can simply find the chroma phase/hue difference by calculating the complex correlation coefficient (U and V are real and imag values) (note a little care needs to be taken here due to various things i dont remember exactly …)

Step 6 (edge detection)

simply run your favorite edge detector over the 256 possible cutpoints to detect the picture edge, for example log(abs(l[x-2] + l[x-1] – l[x+0] – l[x+1]))

Step 7 (dynamic programming search)

Well here we simply combine all the previous stuff to find the best cutpoint sequence to reach every cutpoint of line X assuming we know the best sequence to all cutpoints of line X-1 already
lines which couldnt be matched to their previous line are a special case which is handled like the first line (every cutpoint has the same score) the same is done with yet another special case, namely the one where no cutpoint is possible anymore, that can happen as we restrict ourselfs to a few relative cutpoints around what the cross correlation guessed is correct and theres a small deadzone in which no cutpoints lie, so not every sequence of relative cutpoints is possible depending upon how you define relative

Step 8 (cutpoint sequence cache)

over some timeperiod cutpoint sequences where repeating after an hour or so, so i added a cache to 2010 which lead to perfect decoding until the repeations dissapeared …

Step 8a (finding the current sequence)

we select several consecutive relative cutpoint pairs and look them up in a hashtable, we also try to add +-1 to both as to compensate for errors in our cutpoint guess
for each pair we will get a list of pointers to cutpoint sequences
we count the number of occurances of each pointer and choose the most often occuring as the correct sequence if its number of occurances is above some threshold or return no match if not

Step 8b (merging the current cutpoint estimates)

we add the current relative cutpoints to the sequence so that each line has a list of relative cutpoints, if that gets too long the least often occuring is discarded
for each line the most often occuring relative cutpoint is choosen and used in later steps, the hashtable is also updated with it

Step 9 (cut and exchange)

this is simple, just 2-3 memcpys and rotating the UV values so as to compensate for the effects of the shift and 1 line delay on the color

Filed under: Cryptanalysis — Michael @ 14:58

April 7, 2006

Decrypting (old analog) PayTv

A really long time ago, 1997 or maybe it was even a little earlier i wrote a program to decrypt the at that time used analog pay tv systems videocrypt and nagravision sadly i was too afraid of legal issues at that time so i never released either source or binary, well both systems are AFAIK dead nowadays, the old code hasnt been touched since 2000, doesnt compile with any compiler i have, so i thought its the perfect time for finally releasing that trash

When was the first working version born?

well i dont know, the oldest file i could find was a 2do list from 1997 which contains notes about improving videcrypt and nagravision decoding so these must have been working already …

Does it contain any cryptografically new knowlegde about the systems

no, before the nagra PRNG was reverse engeneered i belived that the PRNGs both systems used would be very secure and there would be no point in “looking” at their output, afterwards besides feeling silly as even a blind man would have noticed how the nagra PRNG works (its just permutating 256 5bit values in trivial 32768 ways, 127 only if we ignore new[x]= old[x+C])
i tried to look at the videocrypt PRNG output but failed to find anything, 2 cutpoint sequences where either different and their difference random or there where exactly equal, which happened either commonly (repeating after an hour or so) or never, depending upon when you recorded them, i also failed to find any other relations in the sequences …

Is this clean ANSI C code?

its ultra messy DJGPP-GCC-DOS-C code, i started to clean it but gave up, a patch with my unfinished cleanup work is included for the insane. with the patch a few files compile under current gcc-linux

what else does it do? could it make coffee?

Theres some teletext decoding code, some half working sync suppression decoder (with unmodified hardware) and probably a few other things i dont remember, after all it doesnt compile and i dont have my matrox meteor card here (which was the only card it supported, as that was what i had)

did you port it to java?

yes, i ported part of it to java a really long time ago, why i did that iam not sure anymore, was i seriously beliving the sun propaganda that java was a useable general propose programming language?i guess my excuse is that my previous java code at that time where just tiny applets & applications where the java-effect isnt so noticeable
the java version was even with hand coded mmx over jni about 3 times slower, there where ~1sec garbage collector pauses every 30secs or so and after 15min the computer needed a hard reset (yeah that was even without any mmx/jni)

download this crap\h\h\h\hmasterpiece

2010

license?

GPL but what do you want to do with it?!

Filed under: Cryptanalysis — Michael @ 14:37

Powered by WordPress