Ways to put n balls in m boxes such that exactly k boxes are empty?

  • #1
Bata
14
10
TL;DR Summary
If n balls are distributed among n boxes (with equal chance of any ball being in any box) then what is the chance that exactly k boxes are left empty?
I made this question up so I have no guarantee that there is a clean answer. It seems like there should be a simple approach though, I’m just not seeing it.

First attempt:

Find the chance that only the first k boxes are left empty. Then we can multiply by ##{n \choose k}## to get the total chance, since all the choices of which k boxes are empty are exclusive (and equally likely) possibilities.
(The word “only” is the key problem. If we just wanted at least the first k to be empty, then the chance is simply ##(1-k/m)^n## but this does not guarantee the rest are non-empty.)

Let’s just look at k=1. Let’s label the propositions ##i_0=## “the i’th box is empty.” Let’s use the shorthand notations ##\bar A = \text{Not } A## and ##AB = (A\text{ and } B)##

Then the probability we are looking for (for k=1) is ##P(1_0\bar 2_0 \dots \overline m_0)## which we can build up from the product rule if we know all the conditional probabilities:

##P(1_0) = (1-1/m)^n##
##P(\bar 2_0|1_0) = 1 - (1-\frac{1}{m-1})^n##
##P(\bar 3|\bar 2_01_0) = ## ??

My only idea to find this third probability is to break the proposition ##\bar 2_0## into ##(2_1\text{ or }2_2\text{ or } \dots \text{ or } 2_n)## where the subscript denotes the number of balls in the (second) box. We would have to do this for each of the ##\bar i_0## which would get progressively complicated (the number of balls available to the third depends on the number in the second, etc.) and this is just for k=1… Thus I give up on my first attempt.

Second attempt:

Define the propositions ##E_k=## “exactly k boxes are left empty” and ##L_k=## “at least k boxes are left empty.”

Then logically we have ##L_k= E_k \text{ or }L_{k+1}## and since ##E_k## and ##L_{k+1}## are exclusive possibilities, we have

##P(L_k) = P(E_k \text{ or }L_{k+1}) = P(E_k) + P(L_{k+1})##

Thus we have reduced the problem of finding ##P(E_k)## to that of finding ##P(L_k)## via:

##P(E_k) = P(L_k) - P(L_{k+1})##

As noted above, the chance that at least the first k boxes are empty is ##(1-k/m)^n##. However it would be a logical mistake to multiply this by ##{n \choose k}## to get ##P(L_k)## since the possibilities are not exclusive (we would overcount some cases if we do this).

I can find ##P(L_1)## via the “inclusion-exclusion principle.” The chance that any particular set of j boxes is empty is ##(1-j/m)^n## and there are ##{m \choose j}## sets of j boxes, so we have:

$$P(L_1) = P(1_0 \text{ or } 2_0 \text{ or } \dots \text{ or } m_0) = \sum_{j=1}^{m}(-1)^{j+1} {m \choose j}(1-\frac{j}{m})^n$$

(The sum could also run up to m-1 since the last term, the chance that all are empty, is zero.) Even if I could simplify this, the method doesn’t seem to extend easily to the rest of the ##P(L_k)## thus I give up on my second attempt.

Third attempt

Clearly there is only one way for all but the first box to be empty. Thus maybe we can work recursively backwards. Define ##w_j## to be the number of ways that make the first j boxes non-empty, with the rest being empty.

##w_1 = 1##
##w_2 = 2^n - 2 = 2^n - {2 \choose 1} w_1##
##\vdots##
$$w_j = j^n - \sum_{i=1}^{j-1}{j \choose i}w_i$$

The idea is that we take all the ##j^n## ways of distributing n balls in j boxes and then subtract off all the ways which have empty boxes. Then the probability we want will be:

$$P(E_k) = {n \choose k} \frac{w_{m-k}}{m^n}$$

(The factor ##{n \choose k}## comes from the reasoning in the first attempt. ##m^n## is the total number of unrestricted arrangements, which are all equally likely.)

To be honest I haven’t really tried simplifying this since it looks complicated, but at least it is solved in principle.

Fourth attempt

This was a desperate idea but I’ll mention it anyway. I am re-trying my first line of thought where we make only the first k boxes empty. We could choose m-k balls to go in the last m-k boxes (to ensure they’re not empty) then distribute the remaining n-(m-k) balls arbitrarily among those m-k boxes. This gives ##{n \choose m-k}(m-k)^{n-m+k}## ways which then needs to be divided by the total number ##m^n## of (equally likely) ways of distributing n balls in m boxes.

However this reasoning is clearly flawed because there is nothing special about the first ball we put in each box. Thus we are overcounting by a lot. Subtracting off the correction terms seems hopelessly complicated, thus I give up on this attempt.

Final thoughts

How can such a simple sounding problem be so troublesome?! It seems like there should be a clever approach, but I have thought about it all day long with no good ideas. Originally I was trying to solve the more general problem, what is the chance that exactly k boxes have exactly ##\ell## balls in each of them, but then I found I can’t even solve the simplest ##\ell = 0## case!

I think the third line of reasoning is the most promising since (if I haven’t made a mistake) it is essentially solved (in recursive form) but it’s unclear to me how to make the solution explicit.

It’s late, I’m tired, I hope I have not made any major mistakes in this post. Please help :’(
 
Last edited:
Physics news on Phys.org
  • #2
Secondary question:

Are we guaranteed to have the expected number of empty boxes be the same as m times the probability that any particular box is empty?

That is, will we find the following to be true?
$$ \sum _0^m kP(E_k) = m(1-1/m)^n $$
It seems reasonable, but how can we properly justify it?
 
  • #3
Bata said:
Secondary question:

Are we guaranteed to have the expected number of empty boxes be the same as m times the probability that any particular box is empty?

That is, will we find the following to be true?
$$ \sum _0^m kP(E_k) = m(1-1/m)^n $$
It seems reasonable, but how can we properly justify it?
No.
1) There is a lot of conditional dependency among the probabilities. The probability of one box empty is affected by how many of the other boxes are empty.
2) Consider the case of n=k. Clearly it is impossible for all boxes to be empty, but your equation does not imply that.
 
  • Like
Likes Bata
  • #4
Bata said:
TL;DR Summary: If n balls are distributed among m boxes (with equal chance of any ball being in any box) then what is the chance that exactly k boxes are left empty?
You can split the problem up. First, there's the probability that precisely the first ##k## boxes are empty. Then we can use the trick that it's equally likely that any set of ##k## boxes are the empty ones. Given that, we need the probability that ##n## balls are distributed among ##n-k## boxes with no box being empty. We can assume that the balls and boxes are numbered (distinguishable) as that makes the calculations easier.

The number of ways of distributing that ##n## balls among ##n-k## boxes with no box being empty is calculated here (theorem 5.2):

https://physics.byu.edu/faculty/berrondo/docs/physics-731/ballsinboxes.pdf

You may see that each of these ways is equally likely and that the total number of ways is ##(n-k)^n##.

I'm sorry that I didn't have time to look through all your calculations, but the correct calculation is in that link if you need it.
 
  • Informative
Likes FactChecker
  • #5
Just to add a bit to this. If ##p(n, m, k)## is the probability that precisely ##k## boxes are empty when we place ##n## balls in ##m## boxes, then:
$$p(n, m, k) = \binom m k q(n, m, k)$$Where ##q(n, m, k)## is the probability that precisely the first ##k## boxes are empty. Moreover, the probability that the first ##k## boxes are empty is:
$$q(n, m, k) = \big (\frac{m-k}{m}\big)^np(n, m-k, 0)$$And:
$$p(n, m-k, 0) = \frac{1}{(m-k)^n}N(n, m-k, 0)$$Where ##N(n, m-k, 0)## is the number of ways of placing ##n## numbered balls in ##m-k## numbered boxes such that no box is empty.

This gives us:
$$p(n, m, k) = \binom m k \frac 1{m^n}N(n, m-k, 0)$$Here's an example with ##n = 5, m = 3, k = 1##:
$$p(5, 3, 1) = 3\big(\frac 1 {3^5} \big)30 = \frac{10}{27}$$Which you could check by hand or by computer simulation.
 
  • Like
Likes Bata
  • #6
FactChecker said:
No.
1) There is a lot of conditional dependency among the probabilities. The probability of one box empty is affected by how many of the other boxes are empty.
Good! That’s exactly what I first thought. It was a (pretty famous) author who made that assumption in a problem which led me down this rabbit hole of trying to solve for ##P(E_k)## to see if their assumption was valid and the dependencies somehow canceled out. They did not even mention the assumption let alone justify it, but they did assume that m is large so I’ll give them the benefit of the doubt that it should at least be approximately true in that case.

FactChecker said:
2) Consider the case of n=k. Clearly it is impossible for all boxes to be empty, but your equation does not imply that.
I’m not understanding your point here though. In that equation k is a dummy variable so perhaps you made a typo, but I don’t see how the equation is inconsistent with the fact that ##P(E_m)=0## (= chance that all are empty).
 
  • #7
PeroK said:
The number of ways of distributing that ##n## balls among ##n-k## boxes with no box being empty is calculated here (theorem 5.2):
It’s no worries you didn’t have time to read my post but I did essentially calculate that theorem in my OP:
Bata said:
Second attempt:Define the propositions ##E_k=## “exactly k boxes are left empty” and ##L_k=## “at least k boxes are left empty.” $$\vdots$$ I can find ##P(L_1)## via the “inclusion-exclusion principle.” The chance that any particular set of j boxes is empty is ##(1-j/m)^n## and there are ##{m \choose j}## sets of j boxes, so we have:
$$P(L_1) = P(1_0 \text{ or } 2_0 \text{ or } \dots \text{ or } m_0) $$
$$= \sum_{j=1}^{m}(-1)^{j+1} {m \choose j}(1-\frac{j}{m})^n$$
I neglected to mention the fact that ##P(E_0)= 1-P(L_1)## which gives the theorem 5.2 divided by (# boxes)^(# of balls)

PeroK said:
This gives us:
$$p(n, m, k) = \binom m k \frac 1{m^n}N(n, m-k, 0)$$
I see. This is basically my third attempt (it is very brief if you want to look at it) except that I failed to realize I could replace the ##w_j## with the previous calculation if I just adjusted it a bit.

That makes perfect sense, thank you!
 
Last edited:
  • Like
Likes PeroK
  • #8
@PeroK What about the more general question, “how many ways to put n balls in m boxes so that exactly k boxes contain exactly ##\ell## balls?”

Define ##N_{\ell}(n, m, 0)## as the number of ways to distribute n balls in m boxes so that none of the boxes have exactly ##\ell## balls.

Just as in the previous solution, the key step is to find ##N_{\ell}(n, m, 0)## because we can then say:

$$p_{\ell}(n,m,k) = {m \choose k} \frac{1}{m^n}\Big[\frac{n!}{(n-k\ell)!\ell !^k}N_{\ell}(n-k\ell, m-k, 0)\Big]$$
The term in brackets is the number of ways to distribute the balls so that the first k boxes have exactly ##\ell## and the rest don’t.

Would you agree?

The problem of finding ##N_{\ell}(n, m, 0)## is trickier than when ##\ell=0##. I might just give up and move on but I’m wondering if you have any ideas?
 
Last edited:
  • #9
I suppose we can try to solve it the same way I solved it in the ##\ell=0## case, that is:
$$N_{\ell}(n,m,0) = m^n(1-P(1_{\ell} \text{ or } 2_{\ell} \text{ or } \dots \text{ or } m_{\ell}))$$
Where ##i_{\ell}## is the proposition that the ##i ^{th}## box has exactly ##\ell## balls.

Of course, for a single proposition we just get the binomial distribution:
$$P(i_{\ell}) = {n \choose \ell}(1/m)^{\ell} (1-1/m)^{n-\ell}$$
But the multiple-propositions are no longer as trivial as in the ##\ell = 0## case.

Just thinking aloud here since I have to go, but it seems 90% solved at this point. Of course the last 10% might be the trickiest, I’ll think about it more later. Let me know if you agree so far @PeroK (especially with my previous post).
 
  • #10
I might look at this tomorrow if I get the chance.
 
  • #11
I am content that the general problem is effectively solved but that the solution is messy. I explained in more detail but accidentally deleted it. The point is that it doesn’t simplify cleanly unless ##\ell = 0##. Thank you @PeroK for your time and insight.
 
  • Like
Likes FactChecker
  • #12
Bata said:
The point is that it doesn’t simplify cleanly unless ##\ell = 0##.
That's what I suspect. The next case to try would be ##l = 1##, but it doesn't look promising to me.
 
  • Like
Likes Bata
  • #13
Bata said:
I am content that the general problem is effectively solved but that the solution is messy.
That is not surprising. Analytical solutions often get very messy even for simple-looking problems. That is one reason that there are so many special functions. A special functions can be defined to give the solution and, if it is useful, its properties can be studied.
 
  • Like
Likes Bata
  • #14
I haven't read all of this, but "balls in boxes" problems are usually best tackled using a "stars and bars" method.
 
  • #15
pbuk said:
I haven't read all of this, but "balls in boxes" problems are usually best tackled using a "stars and bars" method.
If one wants to know how many different ways identical balls can be distributed in some boxes, then “stars and bars” is an elegant counting trick. A few similar (but contrived) problems can be reduced to parts involving that trick. For the most part though, there are many questions one could ask about balls/boxes where the stars and bars trick is just not at all useful. For example, this problem. (See my next post for the final solution.)
 
  • #16
I randomly realized the solution is actually not as messy as I originally thought. (This is because I was trying to simplify it via probabilities instead of using direct counting.) I will now explain the solution.

Let me re-state the generalized problem:

Find the number of ways that n balls can be distributed among m boxes such that exactly k boxes each contain exactly ##\ell## balls.

Define ##N_{\ell}(n, m)## to be the number of ways to distribute n balls in m boxes such that NONE of them contain exactly ##\ell##. We can explicitly count these ways with the following formula:
$$N_{\ell}(n, m) = m^n - m{n \choose \ell}(m-1)^{n-\ell} + {m \choose 2}\frac{n!}{(n-2\ell)!(\ell !)^2}(m-2)^{n-2\ell}- \dots$$
$$N_{\ell}(m, n) = \sum_{j=0}^m (-1)^j{m \choose j}\frac{n!}{(n-j\ell)!(\ell !)^j}(m-j)^{n-j\ell}$$
The first term is the total (unrestricted) ways to distribute n balls in m boxes. The second term subtracts off the ways that at least a particular box contains ##\ell## (multipled by m, since there are m identical terms of that form). The next term is a correction for the cases that were subtracted off twice. That is, it’s the number of ways for at least two particular boxes to contain exactly ##\ell## (multipled by m choose 2 since that’s how many terms there are of that form). We continue in this manner with the ‘alternating corrections’ formula, which I won’t justify as I assume the pattern is well known.

It’s important now to comment on the upperbound of m (which can be reduced to m-1 as the last term is zero). This is only valid if we define factorials via the gamma function, ##s! = \Gamma(s+1)## since then we have ##1/(-1)!=1/(-2!)=\dots=0## by virtue of the fact that the factorial of negative integers is then divergent (“infinity”) which makes the reciprocal zero. Thus whenever ##n-jl < 0## we automatically find those terms to be zero. If we instead use the ordinary (restricted) definition of factorials, we would have to replace the upper bound with ##\text{min}(m, \lfloor n/\ell \rfloor)## in terms of the min and floor functions. Using the gamma definition just makes the formula cleaner.

Now that we have found ##N_{\ell}(n, m)## we can get the full solution for the number of ways exactly k boxes each contain exactly ##\ell## balls. Call this number ##W_{\ell}(n,m,k)##. The final answer is thus:
$$W_{\ell}(n,m,k) = {m \choose k} \Big[\frac{n!}{(n-k\ell)!(\ell !)^k}N_{\ell}(n-k\ell, m-k)\Big]$$
The term in brackets is the number of ways that a particular set of k boxes each contain exactly ##\ell## while the rest don’t. (Each choice of k boxes is then a mutually exclusive possibility, so there are no correction terms.)

If anyone is wondering about where the multinomial coefficient ##\frac{n!}{(n-k\ell)!(\ell !)^k}## comes from, it can be explained in two ways. One way is to imagine ordering ##n-k\ell## balls. There is ##\frac{n!}{(n-k\ell)!}## ways to do this. Then imagine the first ##\ell## is the first box, the next ##\ell## is the next box, and so on. We must then divide out the redundancy ##\ell !## for each of the k boxes. Alternatively, we can just imagine successively choosing ##\ell## balls for each box, which gives:
$${n \choose \ell} {n - \ell \choose \ell} {n -2\ell \choose \ell} \dots {n - (k-1)\ell \choose \ell} = \frac{n!}{(n-k\ell)!(\ell !)^k}$$
since many of the factorial terms cancel out.

Anyway, the final answer can easily be verified to reduce to the previous solution when ##\ell = 0## and it’s not terribly more difficult or messy than that case.

Ok, NOW I’m content and done with this silly problem. I promise this time.
 
Last edited:
  • Like
Likes PeroK
  • #17
Bata said:
A few similar (but contrived) problems can be reduced to parts involving that trick.
How do you distinguish between a "balls in boxes" problem that is contrived from one that is not?

Bata said:
For the most part though, there are many questions one could ask about balls/boxes where the stars and bars trick is just not at all useful.
There are of course many such problems, however to state that these are "for the most part" is a bold assertion: do you have a reference for it?

Bata said:
For example, this problem.
On the contrary, the problem in the OP is quickly solved using counting methods based on stars and bars:

  1. There are ## \binom{2n - 1}{n-1} ## equally probable ways of distributing ## n ## balls among ## n ## possibly empty boxes (by the method of stars and bars or otherwise).
  2. If exactly ## k ## boxes are empty we have ## \binom{n - 1}{n-k-1} ## ways of distributing the ## n ## balls among the ## n - k ## non-empty boxes (again by stars and bars or otherwise).
  3. There are ## \binom{n}{k} ## ways of distributing the ## k ## empty boxes among the ## n ## boxes and so we have ## \binom{n - 1}{n-k-1} \binom{n}{k} ## ways of distributing ## n ## balls among ## n ## boxes so that exactly ## k ## are empty, giving us a probability that exactly ## k ## boxes are empty of
    $$ \frac{\binom{n - 1}{n-k-1} \binom{n}{k}}{ \binom{2n - 1}{n-1}} $$
 
  • #18
Bata said:
@PeroK What about the more general question, “how many ways to put n balls in m boxes so that exactly k boxes contain exactly ##\ell## balls?”
If you start by putting ## \ell ## balls into each box then this problem reduces to "how many ways to put ## n - \ell m ## balls into ## m ## boxes so that exactly ## k ## boxes are empty", for which a solution has already been found.
 
  • #19
As recommended by @pbuk in the post #14, the problem of putting n indistinguishable balls into m distinguishable boxes, where ## n\ge m ## and ## m\gt 0 ## can be solved by using “stars and bars” method.

The total number of outcomes
$$ \binom{n+m-1}{n} $$
The number of outcomes when only the first k boxes are left empty and where ## 0\le k\lt m ##
$$ \binom{n-1}{m-k-1} $$
The number of outcomes when only k boxes are left empty
$$ \binom{m}{k}\binom{n-1}{m-k-1} $$
The probability that exactly k boxes are left empty

## \begin{align}
P(K=k)&=\frac{ m!}{(m-k)!k! }\frac{(n-1)!}{(m-k-1)!(n-m+k)!}\frac{(m-1)!n!}{(n+m-1)!}\nonumber\\
&=\frac{nm}{(m-k)}\left(\frac{(n-1)!(m-1)!}{(m-k-1)!} \right)^2\frac{1}{k!(n-m+k)!(n+m-1)!}\nonumber
\end{align} ##
 
  • Skeptical
Likes PeroK
  • #20
pbuk said:
How do you distinguish between a "balls in boxes" problem that is contrived from one that is not?


There are of course many such problems, however to state that these are "for the most part" is a bold assertion: do you have a reference for it?
Lol sure let me cite some references for my opinions in the future. Look, I’m sorry if I seem to have offended you. I am an undergraduate student giving my obviously limited perspective. How you feel about me saying “for the most part” is exactly how I felt about your use of the word “usually” in post #14.

pbuk said:
On the contrary, the problem in the OP is quickly solved using counting methods based on stars and bars:

  1. There are ## \binom{2n - 1}{n-1} ## equally probable ways of distributing ## n ## balls among ## n ## possibly empty boxes (by the method of stars and bars or otherwise)
Sorry I disagree with you still. I maintain my opinion that “stars and bars” is not useful here and your quick solution is wrong. Why are you stating that each (bar-star) configuration is equally probable with no justification?

In the problem, each ball is equally likely to go in any box. This implies each “micro-state” (where the balls are labeled) is equally likely. The (relative) probability of a “macro-state” (defined by ignoring the labels and only counting the number in each box) is then given by the “multiplicity” of the macro-state (that is, the number of micro-states consistent with that macro-state).

[Sorry for making up or misusing the words in quotes. I still don’t have any references.]

To be more specific, suppose that there are ##n_1, n_2, \cdots ,n_m## balls in each respective box. Then the probability of that macro-state will be:
$$\frac{1}{m^n}\frac{n!}{n_1 ! \cdots n_m !}$$
For example, all the balls being in one box is the least likely outcome. So no, all of the ##{n+m-1 \choose m-1}## configurations are not equally likely. This was one of my first lines of thought for approaching this problem, which is why I concluded that “stars and bars” are not helpful here. (It did also give me a decent exercise; to confirm the above expression, summed over every possible set of values (which must sum to n) does indeed sum to 1. This gave me confidence that the above expression is correct.)

@Gavran
You are making the same logical mistake as @pbuk which is simply forgetting that not all “macro-states” (“bar-star states”) are equally likely. Counting states and taking their ratio does not give the probability when the states are not equally likely… The convenience of dealing with indistinguishable balls is lost by the need to deal with the “multiplicity” of each such state.

Forgive me if I’m overlooking something, as I said I’m just a lowly undergraduate student. It seems you gentleman’s logic is flawed though. I have given a solution in post #16 which seems valid to me.
 
Last edited:
  • Like
Likes Gavran, PeroK and pbuk
  • #21
Bata said:
For example, all the balls being in one box is the least likely outcome. So no, all of the ##{n+m-1 \choose m-1}## configurations are not equally likely.
Yes you are right...

Bata said:
Why are you stating that each (bar-star) configuration is equally probable
... and I was wrong.

Bata said:
In the problem, each ball is equally likely to go in any box. This implies each “micro-state” (where the balls are labeled) is equally likely. The (relative) probability of a “macro-state” (defined by ignoring the labels and only counting the number in each box) is then given by the “multiplicity” of the macro-state (that is, the number of micro-states consistent with that macro-state).

[Sorry for making up or misusing the words in quotes. I still don’t have any references.]
Your "micro-states" are permutations, and "macro-states" are combinations. Permutations (of which there are ## m^n ##) are equally likely, combinations are not.

This gives us two ways to approach the problem: either use stars and bars to calculate the number of combinations that satisfy the criteria we are interested in and then calculate the number of permutations of balls in each combination, or enumerate equally likely permutations directly and then calculate the number that satisfy the criteria (which is what I think you have done).

If I were you I would check my answer with a numerical method (iterating over all the permutations for a given ## (n, m) ## and counting those that satisfy the ## (k, \ell) ## conditions).

Congratulations on your clear thinking on this!
 
  • Like
Likes Bata
  • #22
Bata said:
I randomly realized the solution is actually not as messy as I originally thought. (This is because I was trying to simplify it via probabilities instead of using direct counting.) I will now explain the solution.

Let me re-state the generalized problem:

Find the number of ways that n balls can be distributed among m boxes such that exactly k boxes each contain exactly ##\ell## balls.

Define ##N_{\ell}(n, m)## to be the number of ways to distribute n balls in m boxes such that NONE of them contain exactly ##\ell##. We can explicitly count these ways with the following formula:
$$N_{\ell}(n, m) = m^n - m{n \choose \ell}(m-1)^{n-\ell} + {m \choose 2}\frac{n!}{(n-2\ell)!(\ell !)^2}(m-2)^{n-2\ell}- \dots$$
$$N_{\ell}(m, n) = \sum_{j=0}^m (-1)^j{m \choose j}\frac{n!}{(n-j\ell)!(\ell !)^j}(m-j)^{n-j\ell}$$
The first term is the total (unrestricted) ways to distribute n balls in m boxes. The second term subtracts off the ways that at least a particular box contains ##\ell## (multipled by m, since there are m identical terms of that form). The next term is a correction for the cases that were subtracted off twice. That is, it’s the number of ways for at least two particular boxes to contain exactly ##\ell## (multipled by m choose 2 since that’s how many terms there are of that form). We continue in this manner with the ‘alternating corrections’ formula, which I won’t justify as I assume the pattern is well known.

It’s important now to comment on the upperbound of m (which can be reduced to m-1 as the last term is zero). This is only valid if we define factorials via the gamma function, ##s! = \Gamma(s+1)## since then we have ##1/(-1)!=1/(-2!)=\dots=0## by virtue of the fact that the factorial of negative integers is then divergent (“infinity”) which makes the reciprocal zero. Thus whenever ##n-jl < 0## we automatically find those terms to be zero. If we instead use the ordinary (restricted) definition of factorials, we would have to replace the upper bound with ##\text{min}(m, \lfloor n/\ell \rfloor)## in terms of the min and floor functions. Using the gamma definition just makes the formula cleaner.

Now that we have found ##N_{\ell}(n, m)## we can get the full solution for the number of ways exactly k boxes each contain exactly ##\ell## balls. Call this number ##W_{\ell}(n,m,k)##. The final answer is thus:
$$W_{\ell}(n,m,k) = {m \choose k} \Big[\frac{n!}{(n-k\ell)!(\ell !)^k}N_{\ell}(n-k\ell, m-k)\Big]$$
The term in brackets is the number of ways that a particular set of k boxes each contain exactly ##\ell## while the rest don’t. (Each choice of k boxes is then a mutually exclusive possibility, so there are no correction terms.)

If anyone is wondering about where the multinomial coefficient ##\frac{n!}{(n-k\ell)!(\ell !)^k}## comes from, it can be explained in two ways. One way is to imagine ordering ##n-k\ell## balls. There is ##\frac{n!}{(n-k\ell)!}## ways to do this. Then imagine the first ##\ell## is the first box, the next ##\ell## is the next box, and so on. We must then divide out the redundancy ##\ell !## for each of the k boxes. Alternatively, we can just imagine successively choosing ##\ell## balls for each box, which gives:
$${n \choose \ell} {n - \ell \choose \ell} {n -2\ell \choose \ell} \dots {n - (k-1)\ell \choose \ell} = \frac{n!}{(n-k\ell)!(\ell !)^k}$$
since many of the factorial terms cancel out.

Anyway, the final answer can easily be verified to reduce to the previous solution when ##\ell = 0## and it’s not terribly more difficult or messy than that case.

Ok, NOW I’m content and done with this silly problem. I promise this time.
Have you checked that formula using a simulation?
 
  • #23
pbuk said:
Yes you are right...


... and I was wrong.
It may not mean much, but you have earned my respect. It unfortunately seems far too rare that people willingly admit to a mistake. I value integrity more than clear thinking. Much respect.

PeroK said:
Have you checked that formula using a simulation?
I have not, but I have just added that to my to do list; thanks. I’m taking my first class on programming (and my first on combinatorics, ironically) in January, so I will probably do this after then.

I appreciate you PeroK, you gave me the key insight I was overlooking which allowed me to solve this problem. Take care of yourself friend.
 
  • Like
Likes Gavran and PeroK
  • #24
Bata said:
It may not mean much, but you have earned my respect. It unfortunately seems far too rare that people willingly admit to a mistake. I value integrity more than clear thinking. Much respect.


I have not, but I have just added that to my to do list; thanks. I’m taking my first class on programming (and my first on combinatorics, ironically) in January, so I will probably do this after then.

I appreciate you PeroK, you gave me the key insight I was overlooking which allowed me to solve this problem. Take care of yourself friend.
I've been away for a couple of weeks. I might knock up a Python simulation when I get the chance.
 
  • #25
Bata said:
To be more specific, suppose that there are ##n_1, n_2, \cdots ,n_m## balls in each respective box. Then the probability of that macro-state will be:
$$\frac{1}{m^n}\frac{n!}{n_1 ! \cdots n_m !}$$
Correct.

Bata said:
@Gavran
You are making the same logical mistake as @pbuk which is simply forgetting that not all “macro-states” (“bar-star states”) are equally likely. Counting states and taking their ratio does not give the probability when the states are not equally likely…
Thank you, @Bata.
In the post #19 I excluded the order of choosing boxes while putting the balls inside them and that was my mistake.
 
  • #26
Bata said:
I’m taking my first class on programming (and my first on combinatorics, ironically) in January, so I will probably do this after then.
Coding this is probably quite a challenge for a first class, and is probably best done using some more advanced tools of an expressive language such as Python which you probably won't encounter until later on.

The following code illustrates the use of functional programming (functools.reduce), iterators (the cartesian product iterator itertools.product) and lambda functions to implement the computations in 15 SLOC. I have deliberately left the code uncommented to enhance its use for learning.

Python:
from functools import reduce
from itertools import product

def countBinsWithNBalls(bins, n):
    return reduce(lambda cum, bin: cum + 1 if bin == n else cum, bins, 0)

def countBall(bins, ball):
    bins[ball] += 1
    return bins

def countBins(balls):
    return reduce(countBall, balls, [0] * n)

def countFrequencyOfKBinsWithLBalls(m, n, k, l):
    freq = 0
    
    for balls in product(range(n), repeat = m):
        bins = countBins(balls)
        binsWithLBalls = countBinsWithNBalls(bins, l)
        if binsWithLBalls == k:
            freq += 1
        # debug print(balls, bins, binsWithLBalls, freq)
    return freq

# Print results for a range of parameters.
for m in range(1, 5):
    for n in range(1, 5):
        print()
        print('m =', m, 'balls in n =', n, 'bins:', n ** m, 'arrangements') 

        for l in range(m + 1):
            print('Frequency of k bins with', l, '(ℓ) balls for k = (', end = '')
            for k in range(n + 1):
                if k < n:
                    print(k, ', ', sep='', end = '')
                else:
                    print(n, '): (', sep='', end = '')
                
            for k in range(n + 1):
                freq = countFrequencyOfKBinsWithLBalls(m, n, k, l)
                if k < n:
                    print(freq, ', ', sep = '', end = '')
                else:
                    print(freq, ')', sep = '')

Output:
m = 1 balls in n = 1 bins: 1 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1): (0, 1)

m = 1 balls in n = 2 bins: 2 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2): (0, 2, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2): (0, 2, 0)

m = 1 balls in n = 3 bins: 3 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3): (0, 0, 3, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3): (0, 3, 0, 0)

m = 1 balls in n = 4 bins: 4 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3, 4): (0, 0, 0, 4, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3, 4): (0, 4, 0, 0, 0)

m = 2 balls in n = 1 bins: 1 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1): (0, 1)

m = 2 balls in n = 2 bins: 4 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2): (2, 2, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2): (2, 0, 2)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2): (2, 2, 0)

m = 2 balls in n = 3 bins: 9 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3): (0, 6, 3, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3): (3, 0, 6, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2, 3): (6, 3, 0, 0)

m = 2 balls in n = 4 bins: 16 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3, 4): (0, 0, 12, 4, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3, 4): (4, 0, 12, 0, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2, 3, 4): (12, 4, 0, 0, 0)

m = 3 balls in n = 1 bins: 1 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1): (0, 1)

m = 3 balls in n = 2 bins: 8 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2): (6, 2, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2): (2, 6, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2): (2, 6, 0)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1, 2): (6, 2, 0)

m = 3 balls in n = 3 bins: 27 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3): (6, 18, 3, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3): (3, 18, 0, 6)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2, 3): (9, 18, 0, 0)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1, 2, 3): (24, 3, 0, 0)

m = 3 balls in n = 4 bins: 64 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3, 4): (0, 24, 36, 4, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3, 4): (4, 36, 0, 24, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2, 3, 4): (28, 36, 0, 0, 0)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1, 2, 3, 4): (60, 4, 0, 0, 0)

m = 4 balls in n = 1 bins: 1 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1): (1, 0)
Frequency of k bins with 4 (ℓ) balls for k = (0, 1): (0, 1)

m = 4 balls in n = 2 bins: 16 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2): (14, 2, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2): (8, 8, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2): (10, 0, 6)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1, 2): (8, 8, 0)
Frequency of k bins with 4 (ℓ) balls for k = (0, 1, 2): (14, 2, 0)

m = 4 balls in n = 3 bins: 81 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3): (36, 42, 3, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3): (21, 24, 36, 0)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2, 3): (27, 36, 18, 0)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1, 2, 3): (57, 24, 0, 0)
Frequency of k bins with 4 (ℓ) balls for k = (0, 1, 2, 3): (78, 3, 0, 0)

m = 4 balls in n = 4 bins: 256 arrangements
Frequency of k bins with 0 (ℓ) balls for k = (0, 1, 2, 3, 4): (24, 144, 84, 4, 0)
Frequency of k bins with 1 (ℓ) balls for k = (0, 1, 2, 3, 4): (40, 48, 144, 0, 24)
Frequency of k bins with 2 (ℓ) balls for k = (0, 1, 2, 3, 4): (76, 144, 36, 0, 0)
Frequency of k bins with 3 (ℓ) balls for k = (0, 1, 2, 3, 4): (208, 48, 0, 0, 0)
Frequency of k bins with 4 (ℓ) balls for k = (0, 1, 2, 3, 4): (252, 4, 0, 0, 0)

Edit: assuming we are interested in more than one value for ## k ##, there is a simple change to the above code with a huge improvement in run time. Don't post it please, leave it as an exercise for future readers.
 
Last edited:
  • Like
Likes Bata and PeroK
  • #27
By the way,
Welcome to Physics Forums, @Bata.
 
  • Like
Likes Bata and PeroK
  • #28
pbuk said:
Coding this is probably quite a challenge for a first class, and is probably best done using some more advanced tools of an expressive language such as Python which you probably won't encounter until later on.

The following code illustrates the use of functional programming (functools.reduce), iterators (the cartesian product iterator itertools.product) and lambda functions to implement the computations in 15 SLOC. I have deliberately left the code uncommented to enhance its use for learning.

Python:
from functools import reduce
from itertools import product

def countBinsWithNBalls(bins, n):
    return reduce(lambda cum, bin: cum + 1 if bin == n else cum, bins, 0)

def countBall(bins, ball):
    bins[ball] += 1
    return bins

def countBins(balls):
    return reduce(countBall, balls, [0] * n)

def countFrequencyOfKBinsWithLBalls(m, n, k, l):
    freq = 0
 
    for balls in product(range(n), repeat = m):
        bins = countBins(balls)
        binsWithLBalls = countBinsWithNBalls(bins, l)
        if binsWithLBalls == k:
            freq += 1
        # debug print(balls, bins, binsWithLBalls, freq)
    return freq

# Print results for a range of parameters.
for m in range(1, 5):
    for n in range(1, 5):
        print()
        print('m =', m, 'balls in n =', n, 'bins:', n ** m, 'arrangements')

        for l in range(m + 1):
            print('Frequency of k bins with', l, '(ℓ) balls for k = (', end = '')
            for k in range(n + 1):
                if k < n:
                    print(k, ', ', sep='', end = '')
                else:
                    print(n, '): (', sep='', end = '')
            
            for k in range(n + 1):
                freq = countFrequencyOfKBinsWithLBalls(m, n, k, l)
                if k < n:
                    print(freq, ', ', sep = '', end = '')
                else:
                    print(freq, ')', sep = '')
Perhaps it is because I'm new to programming (I had to research the language and functions) but tracing the logic of your code took some mental effort. (It didn't help that I didn't at first notice that you swapped the meaning of n and m.) I do understand the meaning/flow of each piece now, but wouldn't it be more read-able to have defined the first few functions as follows:
Python:
def countBinsWithLBalls(bins, L):
    return sum(1 for bin_ in bins if bin_ == L)

def countBins(balls):
    bins = [0] * n
    for i in range(n):
        bins[i] = balls.count(i)
    return bins
I suppose it's a matter of opinion, but the effect of these functions is much more easily understandable (to me) when written this way.

[Edit: I suppose we could actually just replace the entire first function with the built in count function. So lines 19 and 20 would be replaced with:
“if bins.count(L) == k”
but I don’t have a computer to verify that this works right now.]

I have two questions though. I've labeled the variable "bin_" because I noticed there is a built in binary function called "bin" and so my first question is: is there not a problem with re-using the name bin when it's already defined? I suppose Python just locally overwrites it? Or uses context clues as to which is meant?

My second question is about the use of n (which, again, was originally called m) in the function countBins. Why would this not cause an error of n being undefined? Shouldn't it have two parameters, "countBins(balls, n)"? I assume that Python just looks in the local context for the meaning of n? So even though n is undefined where the function is written, it is (locally) defined where the function is called? It still feels like we should be using two parameters in the function, but I guess programming is new to me.
 
Last edited:
  • Like
Likes PeroK
  • #29
Anyway, I have concluded that my formula is indeed correct, as it matches with all the cases listed out by @pbuk.

Here is my code:

Python:
from math import factorial, comb

def NoneHaveL(m, n, L):
    upperbound = min(m, n//L) if L != 0 else m
    def expression(i):
        if n-i*L < 0: # I don't understand why I need this... the upperbound should prevent it.
            return 0
        return (-1)**i * comb(m, i) * (m-i)**(n-i*L) * factorial(n)/(factorial(n-i*L) * factorial(L)**i)
    return sum(expression(i) for i in range(upperbound + 1))
   
def calculateFrequencyOfKBinsWithLBalls(m, n, k, L):
    if k*L > n:
        return 0;
    return comb(m, k) * factorial(n)/(factorial(n-k*L) * factorial(L)**k) * NoneHaveL(n-k*L, m-k, L)

I was going to try to figure out why I need that line which I commented on. I only added it to avoid the "factorial cannot be negative" error. But the upperbound is designed to avoid that error... so I'm not sure what's going on. Anyway the 'calculate' function gives the same answers as the 'count' function, so it seems my post #16 is indeed the correct solution, as I suspected.

I only have limited access to a computer (all my previous posts were from a phone) so I can't play around with it anymore to find out why I'm getting that error when I take out the commented line, as I have to get off the computer now.
 
  • Like
Likes PeroK
  • #30
pbuk said:
If you start by putting ## \ell ## balls into each box then this problem reduces to "how many ways to put ## n - \ell m ## balls into ## m ## boxes so that exactly ## k ## boxes are empty", for which a solution has already been found.
I know it's not important, but for completeness I'd like to point out the flaw in this. (I skipped this post the first time since I stopped reading once I noticed the mistaken assumption of equal probabilities.)

Not only do you not want to put any more balls in the k boxes, but you want to ensure that the m-k boxes each don't contain ##\ell## as well. The solution "already found" does not ensure this. That is what makes it tricky.

Not sure if I explained the point clearly, but anyway, it does not matter. The problem is correctly solved in post #16, and I think the logic of the solution is rather simple, even if the expressions are a bit intimidating.
 

Similar threads

Back
Top