ktoz · #1
Hi
I'm writing a compression utility based on an improvement to Huffman coding, and just for giggles I thought I'd try it out on true random data from http://www.random.org/ to see if it did anything.
What I found was that the data from random.org (which is derived from atmospheric noise) definitely displays biases that can be exploited to get significant compression rates (I'm getting upwards of 55% savings).
I'm well aware of the pigeonhole principle (http://en.wikipedia.org/wiki/Pigeonhole_principle) and know that random data shouldn't be compressible, so that brings up the question: is this data somehow not random? Is randomness really just a function of scale?
For example: if you take 1,000,000 randomly generated bits and slice them up into 8-bit samples, there are bound to be certain biases (i.e., more instances of the number 15 than 16). If you slice those same 1,000,000 bits into 25-bit samples, completely different biases result. It seems that by choosing a sample size you automatically introduce biases into random data, which sort of makes it not random.
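For what it's worth, here's a minimal Python sketch of that counting experiment (using os.urandom as a stand-in for the random.org download): it slices 1,000,000 bits into 8-bit samples, tallies how often each value occurs, and computes the empirical Shannon entropy, which is a lower bound on how few bits per symbol a Huffman-style coder can reach from those frequencies alone.

Code:
import math
import os
from collections import Counter

data = os.urandom(125_000)              # 1,000,000 bits, taken as 8-bit samples
counts = Counter(data)                  # occurrences of each byte value 0..255
total = len(data)

# Empirical Shannon entropy in bits per 8-bit sample; Huffman coding on
# these same symbol frequencies can't average fewer bits than this.
entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())

print("three most common values :", counts.most_common(3))
print("three least common values:", counts.most_common()[-3:])
print("entropy: %.4f bits per 8-bit sample" % entropy)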
The only way I can think of to create a random set without biases would be to start with equal numbers of all values for a particular sample size and mix them. That way, there would be no frequency differences to exploit. But even this fails if you choose a different sample size.
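Here's the same kind of sketch for that construction (random.shuffle standing in for the mixing step): the 8-bit counts come out exactly flat by design, but re-slicing the very same bit stream at a different width, say 5 bits to keep the counts small, brings frequency differences right back.

Code:
import random
from collections import Counter

# Equal numbers of every 8-bit value, then mixed: no 8-bit frequency bias at all.
samples = [v for v in range(256) for _ in range(16)]    # 4096 bytes, 16 of each
random.shuffle(samples)
bits = "".join(format(b, "08b") for b in samples)       # 32,768 bits

# Re-slice the same bits at a different sample size and the balance is gone.
width = 5
resliced = [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]
counts = Counter(resliced)

print("most common 5-bit samples :", counts.most_common(3))
print("least common 5-bit samples:", counts.most_common()[-3:])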
So I'm wondering: is randomness just an illusion caused by sampling scale?