DaanV
Homework Statement
I have run into a situation that my gut tells me is impossible (all right, extremely unlikely) under a Poisson distribution. I want to make this gut feeling formal by testing the data against a Poisson distribution. Sadly, I'm not a trained statistician.
Generalised form
I have ##N## objects that I am dividing over ##C## compartments.
The objects come in two distinguishable types, A and B, with counts ##N_A## and ##N_B## such that ##N_A + N_B = N##
(say green and blue tennis balls in buckets)
Sadly, I cannot count the objects in a bucket, but I can tell which types are present. I.e., I can see whether a bucket contains no tennis balls, only green, only blue, or both.
Usually, ##C \gg N##
If my null hypothesis is that the distribution of tennis balls over the buckets is random, what calculations do I have to perform to reject that hypothesis with e.g. 99% certainty?
Real example
This example should (IMO) quite clearly lead to rejecting the hypothesis, but I have other examples where the distinction may be less obvious, so I'm looking for a general way to calculate this.
I have 1026059 "buckets". 29755 of these contain only type A (at least one ##N_A## object and no ##N_B##), and 14 contain both types (at least one each of ##N_A## and ##N_B##). 0 "buckets" contain only type B. The remaining 1026059 - 29755 - 14 - 0 = 996290 buckets are empty.
It seems incredibly unlikely to me that every bucket containing ##N_B## (all 14 of them) would also contain ##N_A##, given how many empty buckets are left. But how do I show this formally?
Homework Equations
##H_0## = Poisson distribution
##H_1## = a greater than random probability of B being linked to A
Something along those lines?
The Attempt at a Solution
I want to determine ##N_A## and ##N_B## from the bucket counts, assuming a Poisson distribution. Then from there I want to determine the expected distribution of ##N_A## and ##N_B## over the buckets, again assuming a Poisson distribution. That should show that we expect very few "dual positive" buckets.
The probability of finding exactly ##b## balls in a given bucket, given an average 'loading' ##L## of balls per bucket (##L = \frac{N}{C}##), is:
##P(b;L)=\frac{L^b e^{-L}}{b!}##
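As a minimal sketch of this formula in Python (the function name `poisson_pmf` and the example loading of 0.03 are illustrative choices of mine, not part of the data):

```python
import math


def poisson_pmf(b: int, loading: float) -> float:
    """P(b; L): probability of exactly b balls in a bucket at mean loading L."""
    return loading ** b * math.exp(-loading) / math.factorial(b)


# Illustrative loading only; the real loadings are estimated from bucket counts.
example_loading = 0.03
for b in range(4):
    print(b, poisson_pmf(b, example_loading))
```

At such a low loading nearly all the probability mass sits at ##b = 0## and ##b = 1##, which is why counting empty versus non-empty buckets carries most of the information.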
The fraction of empty buckets over total buckets (##\frac{C^-}{C}##) is then given by
##\frac{C^-}{C}=P(0;L)=\frac{L^0 e^{-L}}{0!} = e^{-L}##, where ##C^-## is the number of empty buckets, so that we can compute ##L=-\ln\left(\frac{C^-}{C}\right)##
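A small sketch of this inversion, assuming the bucket counts from the real example (the helper name `loading_from_empty` is hypothetical):

```python
import math


def loading_from_empty(empty_buckets: int, total_buckets: int) -> float:
    """Estimate the mean loading L from the fraction of empty buckets: L = -ln(C^- / C)."""
    if not 0 < empty_buckets <= total_buckets:
        raise ValueError("need 0 < empty_buckets <= total_buckets")
    return -math.log(empty_buckets / total_buckets)


total_buckets = 1_026_059
empty_buckets = 996_290  # buckets containing no objects of either type

overall_loading = loading_from_empty(empty_buckets, total_buckets)
print(f"overall loading L = {overall_loading:.6f}")
```

The estimate breaks down if no bucket is empty, since the logarithm then diverges; that is not a concern here because ##C \gg N##.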
Applying the same formula per type, with ##C^-## replaced by the number of buckets that contain no A (respectively no B), gives ##L_A## and ##L_B##. Using the example above, I get ##L_A \approx 0.0294## and ##L_B \approx 1.364 \times 10^{-5}##.
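The per-type loadings can be sketched as follows, assuming that "empty" for type A means any bucket without A (truly empty plus B-only), and likewise for B. The variable names are my own:

```python
import math

total = 1_026_059
only_a = 29_755  # at least one A, no B
both = 14        # at least one A and at least one B
only_b = 0       # at least one B, no A
empty = total - only_a - both - only_b

# Buckets lacking a given type count as "empty" for that type.
no_a = empty + only_b
no_b = empty + only_a

l_a = -math.log(no_a / total)
l_b = -math.log(no_b / total)

print(f"L_A = {l_a:.4f}")
print(f"L_B = {l_b:.4e}")
```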
From there I can calculate what distribution of objects we would expect, given those loadings. Doing so yields an expected average of about 0.4 buckets containing both ##N_A## and ##N_B##, rather than the 14 that I found.
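For reference, here is a sketch of that expected-count calculation. It assumes that under the null hypothesis A and B land in buckets independently, so the probability of a dual-positive bucket is the product of the two "at least one" probabilities:

```python
import math

total = 1_026_059
only_a, both, only_b = 29_755, 14, 0
empty = total - only_a - both - only_b

# Per-type loadings from the fraction of buckets lacking that type.
l_a = -math.log((empty + only_b) / total)
l_b = -math.log((empty + only_a) / total)

# P(at least one object of a type in a bucket) = 1 - P(0; L)
p_a = 1.0 - math.exp(-l_a)
p_b = 1.0 - math.exp(-l_b)

# Independence of A and B is the null-hypothesis assumption here.
expected_both = total * p_a * p_b
print(f"expected dual-positive buckets: {expected_both:.3f}")
print(f"observed dual-positive buckets: {both}")
```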
So, how do I show that the observed distribution does not fit the expected distribution?
Thanks in advance for any help provided!