Coupon Collection Problem Variation

  • Thread starter Master1022
  • Start date
  • Tags
    Variation
In summary, the problem reminded me of the coupon problem, but the probabilities of the 'coupons' are different.
  • #1
Master1022
611
117
Homework Statement
Call a “consecutive difference” the absolute value of the difference between two consecutive rolls of a d6. For example, the sequence of rolls 143511 has the corresponding sequence of consecutive differences 31240. What is the expected number of rolls until all 6 consecutive differences have appeared?
Relevant Equations
Probability
Hi,

I was attempting the following problem but didn't quite understand the steps in the solution. The problem reminded me of the coupon problem, but the probabilities of the 'coupons' are different.

Question: Call a “consecutive difference” the absolute value of the difference between two consecutive rolls of a d6. For example, the sequence of rolls 143511 has the corresponding sequence of consecutive differences 31240. What is the expected number of rolls until all 6 consecutive differences have appeared?

Attempt:
The first thing I did was draw out a table of the differences between two d6 rolls and found the probability distribution of the difference ##D##: ##P(D = 0) = \frac{6}{36}##, ##P(D = 1) = \frac{10}{36}##, ##P(D = 2) = \frac{8}{36} ##, ## P(D = 3) = \frac{6}{36} ##, ## P(D = 4) = \frac{4}{36} ##, and ## P(D = 5) = \frac{2}{36} ##.

The first difference will be shown after 1 roll. After that, I thought I would do the sum of ## \frac{1}{p} ##, but the solution says:

''' The most likely first consecutive dif- ference to get is the one with probability 10/36, then the expected number of rolls until getting a new consecutive difference would be ## \frac{36}{(36 − 10)} = \frac{36}{26} ##. We can proceed in this way, getting an estimate of ##1 + \frac{36}{36} + \frac{36}{26} + \frac{36}{18} + \frac{36}{12} + \frac{36}{6} + \frac{36}{2} \approx 32.38 ## '''

However, I don't understand the intuition behind terms such as ## \frac{36}{36 - 10} ##. Would anyone be able to help out and explain these concepts?

Thanks in advance
 
Physics news on Phys.org
  • #2
After you hit the 10/36 case, there are only 26 rolls you can get which give you a new case you haven't seen. So you need to be do 36/26 such rolls. The most likely outcome there is that you get the one with 8/36 chance. Then 10+8=18 out of the 36 cases you wrote down have been seen, so there are only 36-18=18 cases left that give you a new consecutive difference. So you need to roll 36/18 times on average to see a new case.

I think a nice follow up question to this solution is: is it overestimating or underestimating the true correct answer?
 
  • Like
Likes Master1022
  • #3
Office_Shredder said:
After you hit the 10/36 case, there are only 26 rolls you can get which give you a new case you haven't seen. So you need to be do 36/26 such rolls. The most likely outcome there is that you get the one with 8/36 chance. Then 10+8=18 out of the 36 cases you wrote down have been seen, so there are only 36-18=18 cases left that give you a new consecutive difference. So you need to roll 36/18 times on average to see a new case.

I think a nice follow up question to this solution is: is it overestimating or underestimating the true correct answer?
Thanks @Office_Shredder ! This does make sense now.

I think it is overestimating the true answer. I feel like the above method is considering getting a new difference one at a time (almost assuming some kind of 'independence')
 
  • #4
A crude lower bound can be had by taking the most improbable (i.e. largest) difference and seeing how long on average before that difference arises. For an N-sided die, about ##N^2/2##.
Better, we can note that the probability of difference i is
##i=0: p_i=\frac 1N##
##i>0: p_i=\frac{2N-2i}{N^2}##
Treating consecutive differences as independent, the probability that difference i occurs at least once in r trials is ##1-(1-p_i)^r##.
So for all to have occurred, the probability is ##\Pi_i(1-(1-p_i)^r)##.
For this to occur first at the rth roll: ##\Pi_i(1-(1-p_i)^r)-\Pi_i(1-(1-p_i)^{r-1})##.
Average number of rolls = ##\Sigma_{r=1}r[\Pi_i(1-(1-p_i)^r)-\Pi_i(1-(1-p_i)^{r-1})]##
##=\Sigma_{r=0} (1-\Pi_i(1-(1-p_i)^r))##.
This gives about 22.47 for N=6, 5.73 for N=3.

Treating consecutive differences as independent is likely to give a low answer. In practice, small differences are more likely to be followed by more small differences than by large differences.

The exact solution for N=3 is 7. I got this by defining 16 states, denoted as ##\alpha n##, where ##\alpha## is A if the last roll was a 1 or 3, B if the last roll was a 2, and ##n## is a number from 0 to 7 indicating which differences have been encountered: 0 initially, then the sum of 1 if a 0 has been seen, 2 if a 1 has been seen, 4 if a 2 has been seen. We stop when the state number reaches 7.

So the sequence of throws 13112 would go through the states A0 A4 A4 A5 B7.
It is not hard to figure out how many rolls more would be expected on average in each state, working back from the final states.
E.g. from A6 the next roll takes us:
1-> A7
2-> B6
3-> A6
whereas from B6:
1-> A6
2-> B7
3-> A6
whence we compute the average further throws as 3 for both A6 and B6.
Note that states B4 and B5 cannot be reached.

For N=6, I simulated the exact behaviour in a spreadsheet using random numbers. It seems to average about 27.
 
  • #5
haruspex said:
For N=6, I simulated the exact behaviour in a spreadsheet using random numbers. It seems to average about 27.
Using this CodePen the expected value appears to converge to exactly 25. Interesting. approximately 25.85.

 
Last edited:
  • Informative
Likes Master1022
  • #6
haruspex said:
The exact solution for N=3 is 7.
I haven't checked your workings but the Monte Carlo solution above appears to converge to ~7.81.
 
  • #7
pbuk said:
I haven't checked your workings but the Monte Carlo solution above appears to converge to ~7.81.
You're right, I found an error.
These are my equations:
A7=B7=0 (we're done, no more throws needed)
B6=1+(A6+B7+A6)/3=1+2A6/3
(i.e., last throw was a 2 (so 'B'), we throw a 1, 2 or 3 next;
throwing a 1 means diff of 1, which ORs 2 into the state number and the new 'last throw' is 'A'; etc.
Note that we don't need to distinguish between last throw being a 1 or 3 because of the symmetry.)
A6=1+(A7+B6+A6)
Solving: A6=B6=3.
A5=1+(A5+B7+A5)/3= 1+2A5/3
Hence A5=3
A3=1+(A3+B3+A7)/3=1+(A3+B3)/3
Hence 2A3=3+B3
B3=1+(A3+B3+A3)/3
Hence B3=6, A3=9/2
A1=1+(A1+B3+A5)/3=1+(A1+6+3)/3=A1/3+4
Hence A1=6
B1=1+(A3+B1+A3)/3=1+(B1+9)/3
Hence B1=6
A2=1+(A3+B2+A6)/3=1+(9/2+B2+3)/3=B2/3+7/2
B2=1+(A2+B3+A2)/3=3+2A2/3
Hence A2=(3+2A2/3)/3+7/2
7A2/9=9/2
A2=81/14
B2=48/7
A4=1+(A5+B6+A4)/3=3+A4/3
Hence A4=9/2
A0=1+(A1+B2+A4)/3=1+(6+48/7+9/2)/3=1+2+16/7+3/2=95/14
B0=1+(A2+B1+A2)/3=1+(81/7+6)/3=48/7
Expected number of throws=1+(A0+B0+A0)/3=164/21=7.8095...

Applying the same approach to N=6 gets hard because at the point where about half of the differences have arisen we get 20 different numeric elements to the state (3 of the six possible differences) and 3 alphabetic elements, leading to a system of nearly 60 simultaneous equations to solve. (A few combinations are impossible.)
That could be automated, though.
 
  • Like
Likes pbuk
  • #8
A next step would be to find an upper bound.
If the last throw was an extreme value, 1 or N, then all differences are possible and equally likely for the next throw. If there are r differences yet to be achieved, the probability of hitting one is r/N.
If not at an extreme, it will be on average N/2 throws to get back to being at an extreme.
If we ignore the possibility of scoring a new difference from other than an extreme position we have an average of ##(N/r)(N/2)=N^2/(2r)## throws to achieve a new difference and return to an extreme.
Summing, that gives an upper bound of ##\frac 12N^2\ln(N)##.
 
  • Informative
Likes Master1022
  • #9
Many thanks for your responses - these are very insightful!
 

FAQ: Coupon Collection Problem Variation

What is the Coupon Collection Problem Variation?

The Coupon Collection Problem Variation is a mathematical problem that deals with collecting a set number of different items, or "coupons", from a larger pool of items with different probabilities of being collected. It is a variation of the classic Coupon Collection Problem, which assumes equal probabilities for each item.

What is the significance of the Coupon Collection Problem Variation?

The Coupon Collection Problem Variation has applications in various fields such as statistics, computer science, and economics. It can be used to model real-world scenarios such as collecting data or information, and can help in decision making and resource allocation.

What are some strategies for solving the Coupon Collection Problem Variation?

Some strategies for solving the Coupon Collection Problem Variation include the "greedy algorithm", which involves collecting the most probable items first, and the "randomized algorithm", which involves selecting items randomly. Other strategies include the "principle of deferred decisions" and the "coupon collector's inequality".

What are some limitations of the Coupon Collection Problem Variation?

One limitation of the Coupon Collection Problem Variation is that it assumes independent probabilities for each item, which may not always be the case in real-world scenarios. It also does not take into account the cost or effort required to collect each item. Additionally, the problem becomes increasingly complex as the number of items and their probabilities increase.

What are some real-world examples of the Coupon Collection Problem Variation?

Some real-world examples of the Coupon Collection Problem Variation include collecting data from different sources, such as surveys or market research, where each source has a different probability of providing a specific piece of information. It can also be applied to collecting different types of resources, such as minerals or species, from a larger pool with varying probabilities of being collected.

Back
Top