- #1
Ryuzaki
I’m working on a chemistry problem that essentially reduces to a related probability problem. However, my knowledge of probability is very limited, and I'd be grateful if someone could help me out with it. The problem is as follows:
Suppose I have a bag containing [itex]70[/itex] red balls and [itex]30[/itex] blue balls. For the purpose of illustration, let’s call them [itex]R[/itex]s (red balls) and [itex]B[/itex]s (blue balls). Now, I am going to pick one ball at a time from this bag, without replacement. I define a run to be a sequence of consecutive [itex]R[/itex]s (or alternatively, [itex]B[/itex]s) picked, together with the first [itex]B[/itex] (or [itex]R[/itex]) that ends it. And I define a red (or blue) run length to be the number of consecutive [itex]R[/itex]s (or [itex]B[/itex]s) I pick in a run before I encounter a [itex]B[/itex] (or [itex]R[/itex]) or until the balls run out. The ball that ends a run belongs to that run but is not counted in its length.
As examples, [itex]RRRRRRB[/itex] is a run (for simplicity, let me denote it by [itex]R_6[/itex] in shorthand) with red run length [itex]6[/itex], [itex]RB[/itex] is a run (denoted by [itex]R_1[/itex]) with red run length [itex]1[/itex], [itex]BBBR[/itex] is a run (denoted by [itex]B_3[/itex]) with blue run length [itex]3[/itex].
In each simulation, I keep doing runs until all the [itex]100[/itex] balls are picked out (since the balls are picked without replacement, the number of runs and the red/blue run lengths are both finite).
Let’s look at a typical simulation of ball-picking: [itex]R_{50}R_{10}B_{28}R_9[/itex]. In this simulation, there are [itex]4[/itex] runs. The first run consists of [itex]50[/itex] consecutive red balls, until a blue ball is encountered. The second run consists of [itex]10[/itex] consecutive red balls, until a blue ball is encountered. The third run consists of [itex]28[/itex] consecutive blue balls, until a red ball is encountered. And the last run consists of [itex]9[/itex] consecutive red balls, and the simulation ends as there are no more balls to be picked.
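To make the definition concrete, here is a minimal Python sketch of how a picked sequence could be split into runs. The function name `split_runs` and the string encoding of the sequence (`'R'` and `'B'` characters) are my own choices for illustration, not part of the chemistry problem.

```python
def split_runs(seq):
    """Split a sequence of 'R'/'B' picks into (colour, run_length) pairs.

    A run is a block of consecutive balls of one colour plus the first
    ball of the other colour that ends it. The ending ball is consumed
    by the run but is not counted in the run length.
    """
    runs = []
    i, n = 0, len(seq)
    while i < n:
        colour = seq[i]
        j = i
        # Advance over the block of same-coloured balls.
        while j < n and seq[j] == colour:
            j += 1
        runs.append((colour, j - i))
        # Skip the ball of the other colour that ended this run (if any).
        i = j + 1
    return runs


# The example simulation R_50 R_10 B_28 R_9 written out ball by ball.
example = "R" * 50 + "B" + "R" * 10 + "B" + "B" * 28 + "R" + "R" * 9
print(split_runs(example))
# [('R', 50), ('R', 10), ('B', 28), ('R', 9)]
```

The written-out example uses 70 red and 30 blue balls in total, so it is a valid draw from the bag.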
It is easy to see that the minimum possible number of runs is [itex]2[/itex] (attained by [itex]R_{70}[/itex] followed by [itex]B_{29}[/itex], or [itex]B_{30}[/itex] followed by [itex]R_{69}[/itex]) and the maximum possible number of runs is [itex]31[/itex] (attained by [itex]R_1[/itex] [itex]30[/itex] times followed by [itex]R_{40}[/itex], or [itex]B_1[/itex] [itex]30[/itex] times followed by [itex]R_{40}[/itex], since each [itex]B_1[/itex] run also uses up one red ball).
Also, the maximum possible value of red run length is [itex]70[/itex] and that of blue run length is [itex]30[/itex].
Now, I’m interested in knowing the probability distribution of the red and blue run lengths. For this, I believe that I must first find the expected value of the number of runs in a simulation. But I’m not sure how to proceed from here. So to sum up, the following are my questions:
1. How do I find the expected value of the number of runs in a simulation?
2. For that expected value, how do I calculate the probability distribution of red and blue run lengths?
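For reference, here is a rough Monte Carlo sketch of the two quantities I am asking about. It only gives empirical estimates, not the closed-form answers I am hoping for. The names `split_runs` and `simulate`, the number of trials and the seed are all arbitrary choices of mine.

```python
import random
from collections import Counter


def split_runs(seq):
    """Split a sequence of 'R'/'B' picks into (colour, run_length) pairs.

    The ball of the other colour that ends a run is consumed by that run
    but is not counted in its length.
    """
    runs = []
    i, n = 0, len(seq)
    while i < n:
        colour = seq[i]
        j = i
        while j < n and seq[j] == colour:
            j += 1
        runs.append((colour, j - i))
        i = j + 1  # skip the ball that ended the run
    return runs


def simulate(n_red=70, n_blue=30, trials=2000, seed=0):
    """Estimate the mean number of runs and tally run lengths by colour."""
    rng = random.Random(seed)
    balls = ["R"] * n_red + ["B"] * n_blue
    total_runs = 0
    lengths = {"R": Counter(), "B": Counter()}
    for _ in range(trials):
        # A uniform shuffle is equivalent to drawing without replacement.
        rng.shuffle(balls)
        runs = split_runs(balls)
        total_runs += len(runs)
        for colour, length in runs:
            lengths[colour][length] += 1
    return total_runs / trials, lengths


mean_runs, lengths = simulate()
print("estimated mean number of runs:", mean_runs)
for colour in ("R", "B"):
    total = sum(lengths[colour].values())
    # Empirical probability of each run length, pooled over all runs.
    dist = {k: v / total for k, v in sorted(lengths[colour].items())}
    print(colour, "run-length frequencies (first few):", list(dist.items())[:5])
```

Note that the frequencies here are pooled over all runs in all trials, which is only one possible reading of "the distribution of run lengths".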