How do I express that a 100% occurrence in a small sample is low "confidence"?

In summary, a 100% occurrence in a small sample indicates that while the result appears definitive, the confidence in its reliability is low due to the limited size of the sample. This suggests that the findings may not accurately represent the larger population, and variability could lead to different outcomes if a larger or more diverse sample were analyzed.
  • #1
Archosaur
TL;DR Summary
How do I express that a 100% frequency occurrence in a small sample is low "confidence", when, strictly speaking, its 95% confidence interval is (1,1)?
In experiment A: I observe an event 2 times in 2 trials.
In experiment B: I observe an event 100 times in 100 trials.

In both cases, I calculate a frequency of 100%.
In both cases, I calculate a 95% confidence interval of (1, 1).

But intuitively the result of experiment B is "stronger" than that of A. How can I express this as a number?
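To make the degenerate interval concrete, here is a minimal sketch that assumes the (1, 1) result comes from the normal-approximation (Wald) formula ##\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}##. The post does not say which method was used, so the function name `wald_interval` and the choice of ##z = 1.96## are assumptions for illustration only.

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation (Wald) interval for a binomial proportion.

    Assumed method: the original post does not state how its interval was computed.
    """
    p_hat = successes / trials
    # When p_hat is exactly 0 or 1 the variance term vanishes,
    # so the half-width is zero regardless of the sample size.
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return (p_hat - half_width, p_hat + half_width)

print(wald_interval(2, 2))      # (1.0, 1.0)
print(wald_interval(100, 100))  # (1.0, 1.0)
```

Under this assumption the interval width carries no information about sample size whenever every trial succeeds, which is why both experiments look identical.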
 
  • #2
Assume a null hypothesis of whatever frequency you think is appropriate, 50% maybe. Then calculate the probability of getting a result at least this extreme if the null hypothesis were true (the p-value). Note that this is not the probability that the null hypothesis is true. It will usually be very close to zero in the second case.
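As a rough illustration with a null frequency of 50%: the probability that every one of ##n## independent trials shows the event, if the true frequency were ##p_0##, is ##p_0^n##. The helper name `p_value_all_successes` is made up for this sketch, and ##p_0 = 0.5## is only the example value suggested above.

```python
def p_value_all_successes(trials, p0=0.5):
    """One-sided p-value for observing the event in every trial,
    assuming independent trials with true frequency p0 (the null)."""
    return p0 ** trials

print(p_value_all_successes(2))    # 0.25
print(p_value_all_successes(100))  # roughly 7.9e-31
```

So experiment A is quite compatible with a 50% null, while experiment B is not.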
 
  • #3
Or you could do a Bayesian analysis. The 95% credible interval would not be (1,1) in either case: it would be quite broad in the low-data case and quite narrow in the high-data case.
 
  • #4
In the Bayesian case a beta distribution is the conjugate prior for a binomial random variable. Starting from a uniform prior ##\beta(1,1)##, the posterior is ##\beta(a+1,b+1)## where ##a## is the number of successes observed and ##b## is the number of failures observed.

From that you can calculate the credible interval. For ##(a=2,b=0)## we find that the 95% highest-density credible interval for ##\beta(3,1)## is 0.368 to 1.000. In contrast, for ##(a=100,b=0)## we find that the 95% highest-density credible interval for ##\beta(101,1)## is 0.971 to 1.000.
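These two numbers can be checked by hand. With no failures the posterior is ##\beta(a+1,1)##, whose CDF is ##x^{a+1}##, so the interval that holds 95% of the mass and ends at 1 starts at ##0.05^{1/(a+1)}##. A minimal sketch, where the function name `lower_bound_all_successes` is just illustrative:

```python
def lower_bound_all_successes(a, mass=0.95):
    """Lower end L of the credible interval [L, 1] for a Beta(a + 1, 1) posterior.

    The CDF of Beta(a + 1, 1) is x**(a + 1), so L solves L**(a + 1) = 1 - mass.
    Only valid for the zero-failure case (b = 0).
    """
    return (1 - mass) ** (1 / (a + 1))

print(round(lower_bound_all_successes(2), 3))    # 0.368
print(round(lower_bound_all_successes(100), 3))  # 0.971
```

This closed form only covers the case of zero failures; with failures present the quantiles need a numerical beta inverse CDF.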
 
  • #5
This is awesome. Thanks very much for pointing me to the Beta distribution - this is exactly what I was looking for. I made a Python function that calculates frequency and "credibility" (1 - width of 95% credible interval) for O observations in N trials up to 100, because I was curious what a credibility heatmap would look like in this space.
credibility.png
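The function itself was not posted, so the following is only a sketch of what such a calculation might look like, not the code behind the heatmap. It assumes a uniform prior and an equal-tailed 95% interval (2.5% cut from each side), which gives somewhat different numbers from a one-sided interval ending at 1. All names (`beta_cdf`, `beta_quantile`, `frequency_and_credibility`) are hypothetical, and the beta CDF is computed with the binomial-sum identity for integer parameters to avoid any third-party dependency.

```python
from math import comb

def beta_cdf(x, alpha, beta):
    """CDF of Beta(alpha, beta) for positive integer alpha and beta,
    using the identity I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j)."""
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(alpha, n + 1))

def beta_quantile(q, alpha, beta, tol=1e-10):
    """Invert the CDF by bisection on [0, 1] (the CDF is monotone increasing)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, alpha, beta) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def frequency_and_credibility(observed, trials, mass=0.95):
    """Return (frequency, credibility), with credibility defined as
    1 - width of the equal-tailed credible interval (an assumed interval type)."""
    alpha, beta = observed + 1, trials - observed + 1  # uniform-prior posterior
    tail = (1 - mass) / 2
    lower = beta_quantile(tail, alpha, beta)
    upper = beta_quantile(1 - tail, alpha, beta)
    return observed / trials, 1 - (upper - lower)

for observed, trials in [(2, 2), (100, 100), (50, 100)]:
    freq, cred = frequency_and_credibility(observed, trials)
    print(observed, trials, round(freq, 3), round(cred, 3))
```

Looping this over every pair with ##0 \le O \le N \le 100## would give the grid of values that a heatmap like the one described could be drawn from.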
 
  • #6
Excellent! Personally, I think that this behaves as a reasonable person would expect. With just 2 observations it seems reasonable to say “I am pretty sure the probability is greater than 30%”. And with 100 observations it also seems reasonable to say “I am pretty sure the probability is greater than 96%”.
 
  • #7
Dale said:
In the Bayesian case a beta distribution is the conjugate prior for a binomial random variable. The posterior is ##\beta(a+1,b+1)## where ##a## is the number of successes observed and ##b## is the number of failures observed.
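This looks like Laplace's rule of succession, which he devised to answer the question "what is the probability that the sun will rise tomorrow?" If out of ##b## observations the sun rose ##a## of those times, ##\text{P(sun will rise tomorrow)} = \frac{a + 1}{b + 2}##. Note that ##b## here counts all observations, not failures as in the quoted post. In the quoted notation the same quantity is the mean of ##\beta(a+1,b+1)##, namely ##\frac{a+1}{a+b+2}##.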
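For reference, the rule of succession in its usual form estimates the probability of a success on the next trial as ##\frac{s+1}{n+2}## for ##s## successes in ##n## trials, which is the mean of the uniform-prior posterior ##\beta(s+1, n-s+1)##. A small sketch applied to the two experiments in this thread (the function name is illustrative):

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule of succession: (s + 1) / (n + 2),
    i.e. the posterior mean under a uniform prior."""
    return Fraction(successes + 1, trials + 2)

print(rule_of_succession(2, 2))      # 3/4
print(rule_of_succession(100, 100))  # 101/102
```

Unlike the raw frequency, this point estimate already distinguishes the two experiments, although it says nothing about the spread that the credible interval captures.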
 
