lkcl
hi folks, the last couple of times i've been on here have been extraordinarily useful, and i have a new challenge that i'm stuck on, so i thought i'd ask. this one takes a bit of explaining, apologies.
i've encountered an issue with a standard cryptographic algorithm. take a block of data (say 1024 bits) and encrypt it: the probability that any one bit of the output is a one or a zero *must* be *exactly* 0.5, across all permutations of keys and all permutations of data. that's the basic definition of a "perfect" encryption algorithm.
reality is slightly different, and the reality is also that you cannot possibly hope to run through all 2^128 keys against all 2^128 data permutations: there's not enough time in the universe.
sooo, you do a series of runs instead. say, 100,000 runs of 1024 bits each. *one possible* test is to count the number of zeros: since each bit is independent (in a perfect algorithm) with probability 0.5, the count of zeros follows a binomial distribution, and in a run of 1024 bits the expected count is 512 zeros.
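for concreteness, here's one way the per-run p-value could be computed: a sketch using the exact two-sided binomial tail (the NIST monobit test uses an erfc approximation of the same quantity; `zeros_pvalue` is just an illustrative name):

```python
from math import comb

def zeros_pvalue(k: int, n: int) -> float:
    """Exact two-sided p-value for seeing k zeros in n fair bits:
    P(|X - n/2| >= |k - n/2|) where X ~ Binomial(n, 0.5)."""
    dev = abs(k - n / 2)
    weight = 0.5 ** n                 # every n-bit pattern has probability 2^-n
    total = sum(comb(n, i) * weight
                for i in range(n + 1) if abs(i - n / 2) >= dev)
    return min(total, 1.0)
```

a run with exactly n/2 zeros gets a p-value of 1.0 (nothing is more balanced than that), while large deviations drive the p-value toward 0.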
now, here's the thing: if you get *exactly* 512 zeros in every single one of the 100,000 runs, you know intuitively that there's a problem. buuut, how do you actually test that? you compute a p-value for each run and then test the *p-value distribution* for uniformity.
in NIST's statistical test suite (SP 800-22), what they do is take a histogram with 10 buckets, assume a uniform distribution, and for each of the 100,000 runs drop the p-value into a bucket: values between 0.0 and 0.1 into the 1st bucket, between 0.1 and 0.2 into the 2nd, and so on.
once that's done, you can run a chi-squared test on the histogram (the buckets should be roughly level.. but again, *not perfectly* level!).
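as a sketch, that NIST-style uniformity check might look like this (16.919 is the 95th percentile of chi-squared with 9 degrees of freedom; NIST itself converts the statistic to a p-value with the incomplete gamma function instead of using a fixed critical value):

```python
def uniformity_chi2(pvalues):
    """Bin p-values into 10 equal buckets and return the chi-squared
    statistic against the uniform expectation of len(pvalues)/10 per bucket."""
    buckets = [0] * 10
    for p in pvalues:
        buckets[min(int(p * 10), 9)] += 1   # p == 1.0 goes in the last bucket
    expected = len(pvalues) / 10
    return sum((b - expected) ** 2 / expected for b in buckets)

# perfectly level buckets give a statistic of 0.0; compare against 16.919
level = [(b + 0.5) / 10 for b in range(10) for _ in range(1000)]
print(uniformity_chi2(level))   # 0.0
```

in practice genuinely uniform p-values land somewhere above 0 but (95% of the time) below the 16.919 threshold.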
the problem comes when the p-value distribution is *NOT* uniform but discrete, as with the "count the number of zeros in a block of 1024 bits" test. a binomial statistic can only ever produce a limited number of *discrete* p-values, so you end up with what one could call "sampling noise" between the p-values and the histogram granularity: end result, the histogram gets skewed (certain buckets collect more p-values than others), it looks non-uniform, and that's game over.
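the discreteness is easy to see by enumeration. for 256-bit blocks, a two-sided zero-count statistic can only produce one p-value per deviation from the mean of 128 (a sketch using the exact binomial tail rather than NIST's erfc approximation):

```python
from math import comb

n = 256
weight = 0.5 ** n
pmf = [comb(n, k) * weight for k in range(n + 1)]   # P(exactly k zeros)

# two-sided p-value for deviation d from the mean: P(|X - 128| >= d)
distinct = {sum(pmf[k] for k in range(n + 1) if abs(k - n // 2) >= d)
            for d in range(n // 2 + 1)}
print(len(distinct))   # 129 possible p-values -- nowhere near a continuum
```

with only 129 achievable values, any fixed 10-bucket binning is bound to show granularity artefacts.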
so, that's the background; apologies for having to explain it at such length. you can also probably guess what i need to do :)
what i need is a way to work out, based on a histogram of the counts of each *distinct* p-value, whether it is "good" or not. i know for a fact that the data below is *not* good; it just looks like it is. the table below was produced from a million runs of 256 bits each. the tail end (p-values less than 1e-6) is where i am suspicious,
but i need to *prove* that it's suspicious rather than just guess.
i therefore need two things (which i must implement in c or python):
1) a way to compute the "perfect" histogram (parameters: the number of runs and the number of bits, each bit having probability 0.5)
2) a way to then compare the *actual* histogram against the "perfect" one.
although i can describe both of these, i have nooo idea where to start :) some guidance walking through this would be greatly appreciated.
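a possible starting point for both parts (a sketch, not a definitive implementation: it assumes the per-run p-value is the NIST-monobit-style erfc of the normalised excess of zeros, which yields one p-value per deviation; swap in the actual per-run p-value formula if it differs):

```python
from math import comb, erfc, sqrt

def perfect_histogram(runs: int, nbits: int) -> dict:
    """Expected count for each distinct p-value of the count-the-zeros
    test over `runs` blocks of `nbits` fair bits."""
    weight = 0.5 ** nbits
    expected = {}
    for k in range(nbits + 1):
        s_obs = abs(2 * k - nbits) / sqrt(nbits)   # normalised excess of zeros
        p = erfc(s_obs / sqrt(2))                  # NIST monobit-style p-value
        expected[p] = expected.get(p, 0.0) + comb(nbits, k) * weight * runs
    return dict(sorted(expected.items()))

def chi_squared(observed: dict, expected: dict) -> float:
    """Pearson chi-squared between observed counts (keyed by p-value) and
    the perfect histogram.  Bins with tiny expectations should be merged
    into neighbours first, or the statistic blows up on pure noise."""
    return sum((observed.get(p, 0) - e) ** 2 / e for p, e in expected.items())
```

`perfect_histogram(10**6, 256)` yields 129 entries whose expected counts sum to one million; the tail entries have expectations well below 1, which is exactly why the raw chi-squared needs those bins merged before comparing.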
p-value     count
0.000000004 1
0.000000077 1
0.000000152 1
0.000000298 1
0.000000573 3
0.000001088 4
0.000002034 13
0.000003746 14
0.000006795 30
0.000012143 60
0.000021377 113
0.000037073 177
0.000063342 355
0.000106625 540
0.000176835 827
0.000288961 1300
0.000465258 2112
0.000738157 3321
0.001154050 5168
0.001778051 7472
0.002699796 11086
0.004040275 15609
0.005959526 22666
0.008664897 31501
0.012419331 44040
0.017548950 59411
0.024448945 79218
0.033586613 104171
0.045500264 133922
0.060792724 171872
0.080118314 216864
0.104162559 266307
0.133614403 324626
0.169131445 386622
0.211299547 457682
0.260589034 530572
0.317310508 604553
0.381573906 680090
0.453254705 754112
0.531971058 820481
0.617075077 880667
0.707660467 929469
0.802587349 965874
0.900523550 988681
1.000000000 498391
- - - Updated - - -
p.s. as i only need to test the tail end, i am guessing that a cumulative distribution function would do? but i do not know how to apply it.
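one hedged way to apply that idea: under the null hypothesis, the number of runs whose p-value falls below a threshold t is itself Binomial(N, q), where q is the *exact* probability of a single run landing below t, computed from the binomial distribution of the zero-count rather than from t itself, precisely because of the discreteness. a sketch, again assuming the erfc-style per-run p-value:

```python
from math import comb, erfc, sqrt

def tail_expectation(runs: int, nbits: int, threshold: float):
    """Expected number of runs with p-value < threshold, plus its standard
    deviation, so an observed tail count can be turned into a z-score."""
    weight = 0.5 ** nbits
    # q: exact chance a single run's p-value lands below the threshold
    q = sum(comb(nbits, k) * weight
            for k in range(nbits + 1)
            if erfc(abs(2 * k - nbits) / sqrt(nbits) / sqrt(2)) < threshold)
    mean = runs * q
    sd = sqrt(runs * q * (1 - q))
    return mean, sd

mean, sd = tail_expectation(10**6, 256, 1e-6)
# if the observed tail count sits many sd above mean, the generator fails
```

an observed tail count many standard deviations above `mean` is the *proof* of suspicion being sought, rather than a guess; the exact numbers depend on the chosen threshold and on the actual p-value formula used.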