- #1
Bata
- 14
- 10
- TL;DR Summary
- If n balls are distributed among n boxes (with equal chance of any ball being in any box) then what is the chance that exactly k boxes are left empty?
I made this question up so I have no guarantee that there is a clean answer. It seems like there should be a simple approach though, I’m just not seeing it.
First attempt:
Find the chance that only the first k boxes are left empty. Then we can multiply by ##{n \choose k}## to get the total chance, since all the choices of which k boxes are empty are exclusive (and equally likely) possibilities.
(The word “only” is the key problem. If we just wanted at least the first k to be empty, then the chance is simply ##(1-k/m)^n## but this does not guarantee the rest are non-empty.)
Let’s just look at k=1. Let’s label the propositions ##i_0=## “the i’th box is empty.” Let’s use the shorthand notations ##\bar A = \text{Not } A## and ##AB = (A\text{ and } B)##
Then the probability we are looking for (for k=1) is ##P(1_0\bar 2_0 \dots \overline m_0)## which we can build up from the product rule if we know all the conditional probabilities:
##P(1_0) = (1-1/m)^n##
##P(\bar 2_0|1_0) = 1 - (1-\frac{1}{m-1})^n##
##P(\bar 3|\bar 2_01_0) = ## ??
My only idea to find this third probability is to break the proposition ##\bar 2_0## into ##(2_1\text{ or }2_2\text{ or } \dots \text{ or } 2_n)## where the subscript denotes the number of balls in the (second) box. We would have to do this for each of the ##\bar i_0## which would get progressively complicated (the number of balls available to the third depends on the number in the second, etc.) and this is just for k=1… Thus I give up on my first attempt.
Second attempt:
Define the propositions ##E_k=## “exactly k boxes are left empty” and ##L_k=## “at least k boxes are left empty.”
Then logically we have ##L_k= E_k \text{ or }L_{k+1}## and since ##E_k## and ##L_{k+1}## are exclusive possibilities, we have
##P(L_k) = P(E_k \text{ or }L_{k+1}) = P(E_k) + P(L_{k+1})##
Thus we have reduced the problem of finding ##P(E_k)## to that of finding ##P(L_k)## via:
##P(E_k) = P(L_k) - P(L_{k+1})##
As noted above, the chance that at least the first k boxes are empty is ##(1-k/m)^n##. However it would be a logical mistake to multiply this by ##{n \choose k}## to get ##P(L_k)## since the possibilities are not exclusive (we would overcount some cases if we do this).
I can find ##P(L_1)## via the “inclusion-exclusion principle.” The chance that any particular set of j boxes is empty is ##(1-j/m)^n## and there are ##{m \choose j}## sets of j boxes, so we have:
$$P(L_1) = P(1_0 \text{ or } 2_0 \text{ or } \dots \text{ or } m_0) = \sum_{j=1}^{m}(-1)^{j+1} {m \choose j}(1-\frac{j}{m})^n$$
(The sum could also run up to m-1 since the last term, the chance that all are empty, is zero.) Even if I could simplify this, the method doesn’t seem to extend easily to the rest of the ##P(L_k)## thus I give up on my second attempt.
Third attempt
Clearly there is only one way for all but the first box to be empty. Thus maybe we can work recursively backwards. Define ##w_j## to be the number of ways that make the first j boxes non-empty, with the rest being empty.
##w_1 = 1##
##w_2 = 2^n - 2 = 2^n - {2 \choose 1} w_1##
##\vdots##
$$w_j = j^n - \sum_{i=1}^{j-1}{j \choose i}w_i$$
The idea is that we take all the ##j^n## ways of distributing n balls in j boxes and then subtract off all the ways which have empty boxes. Then the probability we want will be:
$$P(E_k) = {n \choose k} \frac{w_{m-k}}{m^n}$$
(The factor ##{n \choose k}## comes from the reasoning in the first attempt. ##m^n## is the total number of unrestricted arrangements, which are all equally likely.)
To be honest I haven’t really tried simplifying this since it looks complicated, but at least it is solved in principle.
Fourth attempt
This was a desperate idea but I’ll mention it anyway. I am re-trying my first line of thought where we make only the first k boxes empty. We could choose m-k balls to go in the last m-k boxes (to ensure they’re not empty) then distribute the remaining n-(m-k) balls arbitrarily among those m-k boxes. This gives ##{n \choose m-k}(m-k)^{n-m+k}## ways which then needs to be divided by the total number ##m^n## of (equally likely) ways of distributing n balls in m boxes.
However this reasoning is clearly flawed because there is nothing special about the first ball we put in each box. Thus we are overcounting by a lot. Subtracting off the correction terms seems hopelessly complicated, thus I give up on this attempt.
Final thoughts
How can such a simple sounding problem be so troublesome?! It seems like there should be a clever approach, but I have thought about it all day long with no good ideas. Originally I was trying to solve the more general problem, what is the chance that exactly k boxes have exactly ##\ell## balls in each of them, but then I found I can’t even solve the simplest ##\ell = 0## case!
I think the third line of reasoning is the most promising since (if I haven’t made a mistake) it is essentially solved (in recursive form) but it’s unclear to me how to make the solution explicit.
It’s late, I’m tired, I hope I have not made any major mistakes in this post. Please help :’(
First attempt:
Find the chance that only the first k boxes are left empty. Then we can multiply by ##{n \choose k}## to get the total chance, since all the choices of which k boxes are empty are exclusive (and equally likely) possibilities.
(The word “only” is the key problem. If we just wanted at least the first k to be empty, then the chance is simply ##(1-k/m)^n## but this does not guarantee the rest are non-empty.)
Let’s just look at k=1. Let’s label the propositions ##i_0=## “the i’th box is empty.” Let’s use the shorthand notations ##\bar A = \text{Not } A## and ##AB = (A\text{ and } B)##
Then the probability we are looking for (for k=1) is ##P(1_0\bar 2_0 \dots \overline m_0)## which we can build up from the product rule if we know all the conditional probabilities:
##P(1_0) = (1-1/m)^n##
##P(\bar 2_0|1_0) = 1 - (1-\frac{1}{m-1})^n##
##P(\bar 3|\bar 2_01_0) = ## ??
My only idea to find this third probability is to break the proposition ##\bar 2_0## into ##(2_1\text{ or }2_2\text{ or } \dots \text{ or } 2_n)## where the subscript denotes the number of balls in the (second) box. We would have to do this for each of the ##\bar i_0## which would get progressively complicated (the number of balls available to the third depends on the number in the second, etc.) and this is just for k=1… Thus I give up on my first attempt.
Second attempt:
Define the propositions ##E_k=## “exactly k boxes are left empty” and ##L_k=## “at least k boxes are left empty.”
Then logically we have ##L_k= E_k \text{ or }L_{k+1}## and since ##E_k## and ##L_{k+1}## are exclusive possibilities, we have
##P(L_k) = P(E_k \text{ or }L_{k+1}) = P(E_k) + P(L_{k+1})##
Thus we have reduced the problem of finding ##P(E_k)## to that of finding ##P(L_k)## via:
##P(E_k) = P(L_k) - P(L_{k+1})##
As noted above, the chance that at least the first k boxes are empty is ##(1-k/m)^n##. However it would be a logical mistake to multiply this by ##{n \choose k}## to get ##P(L_k)## since the possibilities are not exclusive (we would overcount some cases if we do this).
I can find ##P(L_1)## via the “inclusion-exclusion principle.” The chance that any particular set of j boxes is empty is ##(1-j/m)^n## and there are ##{m \choose j}## sets of j boxes, so we have:
$$P(L_1) = P(1_0 \text{ or } 2_0 \text{ or } \dots \text{ or } m_0) = \sum_{j=1}^{m}(-1)^{j+1} {m \choose j}(1-\frac{j}{m})^n$$
(The sum could also run up to m-1 since the last term, the chance that all are empty, is zero.) Even if I could simplify this, the method doesn’t seem to extend easily to the rest of the ##P(L_k)## thus I give up on my second attempt.
Third attempt
Clearly there is only one way for all but the first box to be empty. Thus maybe we can work recursively backwards. Define ##w_j## to be the number of ways that make the first j boxes non-empty, with the rest being empty.
##w_1 = 1##
##w_2 = 2^n - 2 = 2^n - {2 \choose 1} w_1##
##\vdots##
$$w_j = j^n - \sum_{i=1}^{j-1}{j \choose i}w_i$$
The idea is that we take all the ##j^n## ways of distributing n balls in j boxes and then subtract off all the ways which have empty boxes. Then the probability we want will be:
$$P(E_k) = {n \choose k} \frac{w_{m-k}}{m^n}$$
(The factor ##{n \choose k}## comes from the reasoning in the first attempt. ##m^n## is the total number of unrestricted arrangements, which are all equally likely.)
To be honest I haven’t really tried simplifying this since it looks complicated, but at least it is solved in principle.
Fourth attempt
This was a desperate idea but I’ll mention it anyway. I am re-trying my first line of thought where we make only the first k boxes empty. We could choose m-k balls to go in the last m-k boxes (to ensure they’re not empty) then distribute the remaining n-(m-k) balls arbitrarily among those m-k boxes. This gives ##{n \choose m-k}(m-k)^{n-m+k}## ways which then needs to be divided by the total number ##m^n## of (equally likely) ways of distributing n balls in m boxes.
However this reasoning is clearly flawed because there is nothing special about the first ball we put in each box. Thus we are overcounting by a lot. Subtracting off the correction terms seems hopelessly complicated, thus I give up on this attempt.
Final thoughts
How can such a simple sounding problem be so troublesome?! It seems like there should be a clever approach, but I have thought about it all day long with no good ideas. Originally I was trying to solve the more general problem, what is the chance that exactly k boxes have exactly ##\ell## balls in each of them, but then I found I can’t even solve the simplest ##\ell = 0## case!
I think the third line of reasoning is the most promising since (if I haven’t made a mistake) it is essentially solved (in recursive form) but it’s unclear to me how to make the solution explicit.
It’s late, I’m tired, I hope I have not made any major mistakes in this post. Please help :’(
Last edited: