How to check if a list of numbers is random?

In summary, the conversation discusses the use of a random number generator to create a list of uniformly random numbers between 0 and 1. The usual check involves sorting the list and histograming the difference between consecutive numbers, with the expected shape being a negative exponential. However, for larger list sizes, empty bins appear in a repetitive pattern. One possible solution is to use the ARIMA time series analysis of Box-Jenkins to check for any autocorrelation between the numbers. Additionally, converting the sequence to a 2D plot and visually inspecting it may also reveal any non-random patterns. However, it should be noted that if the numbers were generated using a computer's random number generator, they are not truly random and may have a repeat period
  • #1
ORF
170
18
Hello

I have used a random number generator to create a list of uniformly random numbers, between 0 and 1.

The usual check that I do is sorting the list, and histograming the difference between the following and the previous one. The shape of the histogram should follow an negative exponential, and the slope will be the size of the list. If the size of the list is around 1e5, no strange effect is observed. If the size of the list is around 1e8, empty bins appear, following a certain repetitive pattern (empty bin - filled bin - empty bin - filled bin - empty bin - filled bin - 3 empty bins - filled bin ... )

I'm not sure about this test, because if the size of the list is huge, the exponential slope will be also huge, and I don't know how to face precision issues. For a list of 1e8 numbers, 1% are repeated (a std::map is used for storing the numbers, so sorting and controlling repetitions is easy).

Question: how to check (in a better way) if a list of numbers is uniformly random?

Thank you for your time.

Regards.
 
Technology news on Phys.org
  • #2
A good way is to apply the ARIMA time series analysis of Box-Jenkins. It looks for any autocorrelation between the series and any number of lagged values of the same series. So it does what you are trying to do for one lag, but it does it for any number of lags. (You might have every number heavily influenced by the number 10 places earlier, even if it is completely independent of the immediately preceding number.) And it can give you statistics of how statistically significant the autocorrelations are.

PS. It only tests the autocorrelations. You need to use another test to verify that the distribution is close enough to uniform. The Chi-squared goodness of fit test would work well for that.
 
Last edited:
  • Like
Likes berkeman, ORF and jim mcnamara
  • #3
Convert the number sequence ##(x_1 , x_2 , x_3 ,\dots )## to a 2d point sequence ##(x_1 , x_2 ),(x_3 ,x_4 ), (x_5 ,x_6),\dots## and plot the points on an xy-plane. If they're not random or uniformly distributed, you will probably see some obviously "non-random" patterns in the image.

See this blog post I've written: https://physicscomputingblog.com/2017/01/15/random-numbers-in-computation/
 
  • Like
Likes ORF and Borg
  • #4
ORF said:
Hello

I have used a random number generator to create a list of uniformly random numbers, between 0 and 1.

The usual check that I do is sorting the list, and histograming the difference between the following and the previous one. The shape of the histogram should follow an negative exponential, and the slope will be the size of the list. If the size of the list is around 1e5, no strange effect is observed. If the size of the list is around 1e8, empty bins appear, following a certain repetitive pattern (empty bin - filled bin - empty bin - filled bin - empty bin - filled bin - 3 empty bins - filled bin ... )

I'm not sure about this test, because if the size of the list is huge, the exponential slope will be also huge, and I don't know how to face precision issues. For a list of 1e8 numbers, 1% are repeated (a std::map is used for storing the numbers, so sorting and controlling repetitions is easy).

Question: how to check (in a better way) if a list of numbers is uniformly random?

Thank you for your time.

Regards.

If you generated the numbers using the random number generator on a computer, they are almost certainly not random, but have been generated using a pseudo-random sequence generator. These numbers will repeat after a certain (usually very large) number of draws. If you require truly random numbers, this is not an easy problem. Most approaches that I know use dedicated hardware based on some sort of stochastic physical process to generate the numbers. Having said that, pseudo-random numbers are usually adequate for almost all applications. You may be able to increase the repeat period of the numbers by using a different seed.
 
  • Like
Likes ORF
  • #5
Hello

@FactChecker : thank you for the tool! I will try to implement it.
@hilbert2 : thank you for the advice. Sadly, there is nothing strange in that 2D plot.
@phyzguy : you mean a different seed after, let's say, 1e6 random numbers?

Thank you all for your time :)

Regards
 
Last edited:
  • #6
Well, I didn't mean to change the seed part way through the computation, but that's not a bad idea. I meant that the repeat period can be dependent on the seed, and some seeds have longer repeat periods than others.
 
  • Like
Likes ORF
  • #7
ORF said:
Hello

@FactChecker : thank you for the tool! I will try to implement it.
The free software R has an ARIMA program. https://stat.ethz.ch/R-manual/R-devel/library/stats/html/arima.html
@hilbert2 : thank you for the advice. Sadly, there is nothing strange in that 2D plot.
I am not surprised. If you are using a modern uniform random number generator that is part of a computer language standard library, they probably pass all but the most sophisticated tests.
@phyzguy : you mean a different seed after, let's say, 1e6 random numbers?
I don't think this is good idea. It is very unlikely that your choice of the next seed will be statistically better than the continuation of the pseudo-random number sequence from the first seed. And it will throw another variable in that you will need to keep track of. One place where you want different seeds is for different random number sequences where you want to guarantee that two or more simulated options get the same "luck of the draw". For instance, if you were simulating a battle comparing force A versus B fighting the bad guys, there may be some very critical random numbers (e.g. how close a nuclear bomb comes to a target or whether a critical bridge gets captured intact) where you want A and B to have the same luck. This could be in the middle of a large simulation where A is on the 10,000th random number and B is on the 50,000th random number. Then you need a separate random number sequence, nukeAccuracy, that you can use the same luck (i.e. random number) just for those major events in both A and B.
 
Last edited:
  • Like
Likes ORF
  • #8
PRNG's are cyclic, they will repeat after (usually) a large number iterations. The cyclic nature of these algorithms is used in GPS, for example. So simply because a random number algorithm is cyclic does not preclude it's use, or mean that it is defective. You! get to pick the one that meets your needs.

Some references:
Pseudo Random Number Generators for Statistical Applications, L. E. Cannon (1971)
Generation of Pseudo-Random Numbers L. E. Howell 1982

Note - outside of cryptography PNRG's are pretty much a done deal. So do not reinvent 40 year old tech. Unless of course this is just part of homework or general messing around. Cryptographically sound PNRG's are periodically revisited.
Read Schneier: https://www.schneier.com/blog/archives/2006/06/random_number_g.html
 
  • Like
Likes ORF and FactChecker
  • #9
jim mcnamara said:
PRNG's are cyclic, they will repeat after (usually) a large number iterations.
This is true for any decent PRNG you are likely to actually use. A PRNG with finite bounded state must repeat. However, a PRNG with a state that is allowed to grow without bound need not repeat. As a simple example, consider a "PRNG" that generates the hexadecimal digits of pi. (Or one which simply outputs Champernown's constant).
 
  • Like
Likes ORF and jim mcnamara
  • #10
You are asking how to determine whether the sequence is "uniformly" random. There are several things that a random sequence can provide. In some cases you may need a sequence that is random in the sense that it cannot be predicted. Since you are asking about "uniformity", I will assume that that is what you need and that is what I will address.

By uniformity, you seem to mean something like "all values from 0 to 1 have an equal chance to be selected". But I am guessing that the value you are receiving from your random number generator is a floating point number. If that is the case, what your random number generator is probably doing is this:

1) Generate a random integer value from 0 to N (or perhaps 1 to N) where N is very large - perhaps on the order of 2^64.
2) Convert the value to floating point.
3) Scale it to 0 to 1.

Note that there are a lot more floating point value from 0 to 0.1 than there are from 0.1 to 1.0. So "uniformly" does mean an equal chance for all numbers, it means that the chance of the random value occurring within a sub-interval of the interval 0 to 1 must be the width of that sub-interval.

Let's say that the core random value is in the range 1 to (2^64)-1 (and, if your seed is 64 bits long, this is actually likely). That means that the range of 0 to 2^-30 (about a billionth) will only be receiving about 2^34 codes. Since there are more than 34 bits in the mantissa of a "double" floating point value, you are assured that certain values will never be returned by the random number generator.

In contrast, the higher range values (0.1 to 1.0) will have fewer bits than the core random value - and so they will be hit multiple times before the core random value is repeated.

But, of course, if it is on the order of 2^64, it will not repeat. You don't have enough time, processors, and electric power to call the random number generator 2^64 (about 2*10^19) times.

Detecting that lack of uniformity near zero will be hard to detect. The best way to make sure that it is generating the right random value is to look at the code for the generator. If that is not possible, it is not difficult to write a random number generator to your own spec.

The fact is, uniformity is the easiest thing for the random number generator to do. In most cases, the problem is that it is too uniform. For example, you can use a shift and taps register to generate a full length 33 bit code - and then reject all code that start with a zero. This will give you a precise uniformity over the range 0 to (2^32)-1. But "truly" random numbers are not "precisely" uniform.
 
  • Like
Likes ORF
  • #11

FAQ: How to check if a list of numbers is random?

How do I know if my list of numbers is truly random?

The best way to check if a list of numbers is random is to perform a statistical test, such as the Chi-square test or the Kolmogorov-Smirnov test. These tests can determine if the numbers in your list follow a specific pattern or distribution, which would indicate that they are not truly random.

Can I visually check if a list of numbers is random?

While it is possible to visually inspect a list of numbers for patterns, this method is not reliable for determining randomness. Our brains are wired to look for patterns, so we may see patterns even in a truly random list of numbers. It is best to use a statistical test for a more accurate evaluation.

What is the significance level when checking for randomness?

The significance level is the probability of rejecting the null hypothesis (the numbers are random) when it is actually true. A common significance level used in statistical tests is 0.05, meaning there is a 5% chance of rejecting the null hypothesis when it is true. A lower significance level indicates a higher degree of confidence in the results.

Can I use a pseudorandom number generator to create a list of random numbers?

Pseudorandom number generators (PRNGs) use algorithms to generate a sequence of numbers that appear to be random. While they may be sufficient for some applications, PRNGs are not truly random and may not pass statistical tests. It is best to use a hardware-based random number generator for more accurate and secure results.

Is it possible to have a list of numbers that is partially random?

Yes, it is possible to have a list of numbers that is partially random. This can occur if certain numbers are intentionally chosen or if there is a pattern within the list. In these cases, a statistical test may still detect the non-randomness, but it may not be as obvious as a completely random or completely non-random list of numbers.

Similar threads

Back
Top