Number of Die Throws until you get a repeated number

  • Thread starter Master1022
  • Start date
In summary, the question is asking how many throws it takes to get a repeat number on a 6-sided die. The answer is that it takes between 1 and 7 throws.
  • #1
Master1022
611
117
Homework Statement
If we have a 6-sided die (it is a fair die), how many throws would we expect to make before we see a repeated number?
Relevant Equations
Probability
Hi,

I found a similar problem online, but there wasn't any solution. The question actually had a larger number of sides (e.g. 50 or 100, etc.), but I thought the underlying solution method should be the same for any arbitrary ##n##-sided die.

Question:
If we have a 6-sided die (it is a fair die), how many throws would we expect to make before we see a repeated number?

Attempt:
I have been thinking about this problem for a while, and am very stuck on how to proceed. I did write a Python script to simulate this game for different ##n##, and the graph aligned with intuition (larger ##n## required more throws until repetition). I remember reading that this problem was similar to the birthday paradox conceptually, however I haven't been able to use that to solve the problem.

If I follow that birthday paradox line of reasoning, I suppose I could do:
[tex] P(\text{repetition on 5th throw}) = 1 \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{4}{6} = \frac{(6)! \cdot 4}{(6-4)! \cdot 6^5} [/tex]

Thus this could be generalized to the probability of a repetition on the ##k##-th throw:
[tex] P(\text{repetition on k-th throw for n sided die}) = \frac{(n)! \times (k-1)}{(n-(k-1))! \cdot n^k} [/tex]

Is the above correct? I am not sure I am fully convinced by the abiThis question was an interview question, so how could one find the expected number without a calculator - are there tricks or approximations?

Any help would be greatly appreciated.
 
Physics news on Phys.org
  • #2
You can't go more than ##7## throws without repetition. Using your formula for ##k =2 \dots 7## should be easy?
 
  • Like
Likes WWGD, FactChecker and Master1022
  • #3
The chances are 0 in 6 = 0.0 ... that you hit on the first roll.
The chances are 1 in 6 = 0.1666 ... that you hit on the second roll.
The chances are 2 in 6 = 0.3333 ... that you hit on the third roll.
The chances are 3 in 6 = 0.5 that you hit on the fourth roll.
The chances are 4 in 6 = 0.6666 ... that you hit on the fifth roll.
The chances are 5 in 6 = 0.8333 ... that you hit on the sixth roll.
The chances are 6 in 6 =1.0 that you hit on the seventh roll.

I hope you find this helpful.

I corrected this during the time @PeroK posted his comment.
 
  • #4
Buzz Bloom said:
The chances are 1 in 6 = 0.1666 ... that you hit on the first roll.
The chances are 2 in 6 = 0.3333 ... that you hit on the second roll.
The chances are 3 in 6 = 0.5 that you hit on the third roll.
The chances are 4 in 6 = 0.6666 ... that you hit on the fourth roll.
The chances are 5 in 6 = 0.8333 ... that you hit on the fifth roll.
The chances are 6 in 6 =1.0 that you hit on the sixth roll.

I hope you find this helpful.
Or, wrong!
 
  • #5
As @PeroK pointed out, you can't go more than 7 throws without repetition ##-##
PeroK said:
You can't go more than ##7## throws without repetition. Using your formula for ##k =2 \dots 7## should be easy?
More precisely, 7 throws is enough to guarantee repetition ##-## this is an example of the pigeonhole principle.
 
  • Like
Likes Delta2
  • #6
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

EDIT: this question was asked in an interview with no access to calculators
 
  • #7
Master1022 said:
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

EDIT: this question was asked in an interview with no access to calculators
What do you mean by repetition? Is that two in a row the same?
 
  • Like
Likes sysprog
  • #8
Master1022 said:
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

EDIT: this question was asked in an interview with no access to calculators
By the pigeonhole principle, if you throw a 100-sided die 100 times and get a different number on each throw, there is no number for throw 101 that is not a repeated number. You of course could get a repeat on throw 2; however, you have only a 1% chance of doing that. Throw 1 sets the first number that is a candidate for repetition. If throw 2 is a different number, then you have 2 candidate numbers for throw 3, meaning you have a 2% chance on throw 3 ##\dots##
 
Last edited:
  • #9
One thing to realize is that the original problem with "100-sided die" might be expecting an entirely different approach to the solution. A large number of sides makes me suspect that there is a continuous approximation that they expect you to apply. I'm sorry that I am too rusty to recognize what the approximation might be (Poisson? It doesn't seem right.). There might be something appropriate in the section where the problem appears.
 
Last edited:
  • Like
Likes WWGD, sysprog and Master1022
  • #10
PeroK said:
What do you mean by repetition? Is that two in a row the same?
Sorry. Just mean't the same number appears again, but doesn't have to be consecutive. E.g. 1, 4, 5, 1 would count as 1 has been repeated now
 
  • Like
Likes sysprog
  • #11
Master1022 said:
EDIT: this question was asked in an interview with no access to calculators
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" :-p]."
 
Last edited:
  • Like
Likes Master1022, jim mcnamara, DaveC426913 and 3 others
  • #12
Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!
 
  • Like
Likes sysprog
  • #13
PeroK said:
Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!
I think that's partly because there are multiple ways to do it ##-## here's one such that for ##n=100## I'd need a calculator to get an exact answer: to get a repetition in ##k## rolls of an ##n##-sided die, ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}##. (ref)
 
Last edited:
  • #14
FactChecker said:
One thing to realize is that the original problem with "100-sided die" might be expecting an entirely different approach to the solution. A large number of sides makes me suspect that there is a continuous approximation that they expect you to apply. I'm sorry that I am too rusty to recognize what the approximation might be (Poisson? It doesn't seem right.). There might be something appropriate in the section where the problem appears.
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
 
  • Like
  • Informative
Likes Master1022, FactChecker and PeroK
  • #15
sysprog said:
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
Nice! I guess I was thinking of an asymptotic approximation to a known common probability distribution, like Poisson. But the question was asked in an interview, so it may be just to see how the person responds and analysis the problem, as @pbuk pointed out. There might not be any reasonably simple solution.
 
  • Like
Likes Master1022 and sysprog
  • #16
FactChecker said:
Nice! I guess I was thinking of an asymptotic approximation to a known common probability distribution, like Poisson. But the question was asked in an interview, so it may be just to see how the person responds and analysis the problem, as @pbuk pointed out. There might not be any reasonably simple solution.
Handling the (similar) 'birthday problem' as Poisson processes is discussed here.

Also, the expected number of rolls until the first ##k##-repetition with an ##n##-sided die is given by the integral: ##\mathbb{E}(T) = \int_0^\infty \left(\sum_{j=0}^{k-1} {1\over j!}\left({x\over n}\right)^j\right)^n \,e^{-x}\,dx##. (ref)

I would use my TI-83 calculator for this if I wanted a reasonably quick answer.
 
  • Like
Likes uart, FactChecker and PeroK
  • #17
sysprog said:
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
For ##n =6## I got ##3.847##, and your formula gives ##3.737##. It appears to converge quickly.
 
  • Like
Likes FactChecker and sysprog
  • #18
Also, quite amusing is the case of a coin, I.e. ##n =2##. The formula gives ##2.44 ##, which is pretty close to the precise ##2.5##.
 
  • Like
Likes sysprog
  • #19
Doing the recursion manually for ##n=6## yields ##{1223\over324}\approx 3.77##. (ref)
 
  • #20
pbuk said:
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" :-p]."
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?
 
  • Like
Likes sysprog
  • #21
WWGD said:
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?
A formula that provides a good approximation was provided here:

sysprog said:
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
 
  • Like
Likes Master1022, WWGD and sysprog
  • #22
WWGD said:
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?
That seems like a good guess, but it is not close to the approximation, ##\sqrt{n\pi/2}+2/3##, that @sysprog gave. That gives about 13 when n=100.
Compare this problem to the well-known problem of the odds of two students in a class of 30 having the same birthday. The answer to that is that it is very likely. But half of 365 would make it very unlikely.
 
  • Like
Likes Master1022, sysprog and WWGD
  • #23
FactChecker said:
That seems like a good guess, but it is not close to the approximation, ##\sqrt{n\pi/2}+2/3##, that @sysprog gave. That gives about 13 when n=100.
Compare this problem to the well-known problem of the odds of two students in a class of 30 having the same birthday. The answer to that is that it is very likely. But half of 365 would make it very unlikely.
Thanks. So much for my intuition ;).
 
  • #24
WWGD said:
Thanks. So much for my intuition ;).
Mine too. When I read your post I thought you had to be right. But now I guess not.
 
  • Like
Likes WWGD
  • #25
FactChecker said:
Mine too. When I read your post I thought you had to be right. But now I guess not.
I beat you both to it:

PeroK said:
Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!
 
  • Like
Likes FactChecker
  • #26
I think that there must be multiple interpretations of what the question means. As I understand the question, a trial consists of throwing the die and counting the number of throws which takes place such that the last throw is a number that has appeared previously. The question is: what is the average number of throws per trial? For a six sided die, the number of throws for one trial can be 1, 2, 3, 4, 5, 6, or 7.

The probability of success for roll n is as follows

n=1 p=0
n=2 p=(1/6)=0.166666666666667
n=3 p = (5/6)x(2/6)=0.277777777777778
n=4 p=(5/6)x(4/6)x(3/6)=0.277777777777778
n=5 p=(5/6)x(4/6)x(3/6)x(4/6)=0.185185185185185
n=6 p=(5/6)x(4/6)x(3/6)x(2/6)x(5/6)=0.0771604938271605
n=7 p=(5/6)x(4/6)x(3/6)x(2/6)x(1/6)x1=0.0154320987654321

The average number of rolls over a large number of trials is 3.77469135802469.

The above was calculated on a spreadsheet. I have started to work on a spreadsheet to calculate the case for a die with 100 sides, but it is likely to take me a few days.

OOPS - Bad calculatation for n=7. I will fix it soon.
FIXED IT!
 
Last edited:
  • Like
Likes PeroK and sysprog
  • #27
Buzz Bloom said:
The average number of rolls over a large number of trials is 3.77469135802469.
That's remarkably close to the manually calculable expected value.
sysprog said:
Doing the recursion manually for ##n=6## yields ##1223 \over {324}#### \approx 3.77##. (ref)
 
  • #28
sysprog said:
That's remarkably close to the manually calculable expected value.
I'm pretty sure that *is* the manually calculated expected value.
 
  • Like
Likes sysprog
  • #29
uart said:
I'm pretty sure that *is* the manually calculated expected value.
I agree ##-## 1223/324 is the manually calculable expected value, and its decimal expansion for the number of digits reported by @Buzz Bloom is exactly the same as on a calculator ##-## the reason for my characterizing it as 'remarkably close' is that it was reported as an average over a large number of trials; not as simply the result of doing a decimal division to expand the expected value fraction . . .
 
Last edited:
  • #30
pbuk said:
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" :-p]."
Sorry for late reply - finally have some time to get back to this problem. Sure, this seems reasonable. Also, thanks for the insight into the use cases; didn't know about those before and will definitely read up on them!
 
  • Like
Likes sysprog
  • #31
sysprog said:
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
Thanks @sysprog ! I actually made a Python simulation of this game for an arbitrary ##n##-sided dice (unfortunately was made on a machine which I cannot access now). Nonetheless, your approximation gives a very similar result to some of the values I was getting from repeated experimentation (e.g. I also got a similar value for n = 100 to the one quoted by @FactChecker )

EDIT: what type of things should I read to learn about how one would derive that approximation? (or approximations to binomials in general)? I have never done any full probability & statistics courses so my knowledge only extends to reading random lecture notes on the internet.
 
  • #32
Master1022 said:
Thanks @sysprog ! I actually made a Python simulation of this game for an arbitrary ##n##-sided dice (unfortunately was made on a machine which I cannot access now). Nonetheless, your approximation gives a very similar result to some of the values I was getting from repeated experimentation (e.g. I also got a similar value for n = 100 to the one quoted by @FactChecker )

EDIT: what type of things should I read to learn about how one would derive that approximation? (or approximations to binomials in general)? I have never done any full probability & statistics courses so my knowledge only extends to reading random lecture notes on the internet.
Try the original link and let us know if you have a question:

https://math.stackexchange.com/ques...ded-die-until-repeat-is-found/1908500#1908500

Here, basically, you have 6* options for the first throw, 5 for the second, etc. These can be rearranged in Cn,k (" n Choose k ) ways. All of it in n^k throws of the dice.

* Or, n, in the general case.
 
  • Like
Likes sysprog and Master1022
  • #33
WWGD said:
Try the original link and let us know if you have a question:

https://math.stackexchange.com/ques...ded-die-until-repeat-is-found/1908500#1908500

Here, basically, you have 6* options for the first throw, 5 for the second, etc. These can be rearranged in Cn,k (" n Choose k ) ways. All of it in n^k throws of the dice.

* Or, n, in the general case.
Okay thanks will do. I will read the answer again, but I thought they just skipped from the summation expression to the form with ## \sqrt{\frac{n \pi}{2}} + ... ##
 
  • Like
Likes sysprog
  • #34
Master1022 said:
Okay thanks will do. I will read the answer again, but I thought they just skipped from the summation expression to the form with ## \sqrt{\frac{n \pi}{2}} + ... ##
Yes, this is a limit form as the number of sides grows to ##\infty ##. I think it uses Stirling's approximation.
 
  • Like
Likes Master1022 and sysprog
  • #35
Master1022 said:
Thanks @sysprog ! I actually made a Python simulation of this game for an arbitrary ##n##-sided dice (unfortunately was made on a machine which I cannot access now). Nonetheless, your approximation gives a very similar result to some of the values I was getting from repeated experimentation (e.g. I also got a similar value for n = 100 to the one quoted by @FactChecker )

EDIT: what type of things should I read to learn about how one would derive that approximation? (or approximations to binomials in general)? I have never done any full probability & statistics courses so my knowledge only extends to reading random lecture notes on the internet.
Links to free textbooks on combinatorics:
https://www.whitman.edu/mathematics/cgt_online/cgt.pdf
https://users.math.msu.edu/users/bsagan/Books/Aoc/final.pdf
http://www-math.mit.edu/~rstan/ec/ec1.pdf
https://people.math.gatech.edu/~trotter/book.pdf
http://www.math.utk.edu/~wagner/papers/comb.pdf

There are many more if you do a search on: combinatorics textbook pdf
I included only a few that are from .edu sources; I can't be sure of the copyright authorizations on the others ##-## at least not sure enough to post them on PF.

In general, combinatorics may be thought of as a subdiscipline of discrete mathematics, although in asymptotic approximations and other aspects it abuts on or overlaps some continuous mathematics.
 
Last edited:
  • Like
  • Informative
Likes Master1022, PeroK and WWGD

FAQ: Number of Die Throws until you get a repeated number

How is the number of die throws until you get a repeated number calculated?

The number of die throws until you get a repeated number is calculated by simulating multiple trials of rolling a die until a repeated number appears. The average number of throws needed for a repeated number to occur is then calculated and used as the result.

Is there a specific formula for calculating the number of throws until a repeated number?

No, there is not a specific formula for calculating the number of throws until a repeated number. It is determined through simulation and statistical analysis.

Does the type or number of dice used affect the number of throws until a repeated number?

Yes, the type and number of dice used can affect the number of throws until a repeated number. For example, using two dice instead of one will increase the likelihood of a repeated number occurring sooner.

How many trials are typically used to calculate the average number of throws until a repeated number?

The number of trials used can vary depending on the desired level of accuracy. However, a larger number of trials (e.g. 1000 or more) will generally result in a more accurate average.

Can the number of throws until a repeated number be predicted?

No, the number of throws until a repeated number cannot be predicted with certainty. It is a random process and can vary each time it is simulated.

Back
Top