Statistics - Confidence interval

In summary: The "model" in this case is basically: 50 balls are drawn simultaneously, 9 of them turned out to be red, and the expected number of red balls in 50 draws is 7.5. Is it necessary to get into the binomial distribution in order to get a confidence interval for σ? Yes, because the binomial distribution describes the number of red balls you can get when you draw 50 balls. You can say that 9/50 is the sample proportion of red balls, but you cannot use it to make a confidence interval for the number of red balls.
  • #1
nossren

Homework Statement


Suppose you have a bucket containing a lot of balls with different colors. You randomly pick 50 balls, 9 of which are red (X = 9, where X ~ N(μ, σ²)). The probability of picking a red ball is 15%. From this you want to construct a 95% confidence interval for the standard deviation σ and do a hypothesis test.
$$
\begin{align}
X &= 9 \\
\mu &= 7.5 \\
\sigma^* & \approx 0.581 \\
\alpha &= 0.05 \\
H_0: \sigma &= \sigma^* \\
H_1: \sigma &\neq \sigma^*
\end{align}
$$

Homework Equations


$$
\begin{align}
V(X) &= E[(X-\mu)^2] \\
D(X) &= \sqrt{V(X)} \\
\end{align}
$$

The Attempt at a Solution


The expected number of red balls per 50 balls, μ, ought to be 0.15*50 = 7.5. I estimated σ as σ* (above) to obtain a null hypothesis to test. Then I tried using a reference variable [itex]R = \frac{X-\mu}{\sigma} \sim N(0,1)[/itex] and putting
$$
1-\alpha = P(-\lambda_{\alpha/2} < R < \lambda_{\alpha/2}) = P(-1.96 < \frac{X-\mu}{\sigma} < 1.96) \Rightarrow I = \left(\frac{X-\mu}{\sigma} \pm 1.96\right)
$$
but this doesn't seem to make any sense. Is there another reference variable/distribution I can use? I tried t-distribution, but it leads to division by 0 due to the N-1 denominator in the sample standard deviation.
 
  • #2
Where do you get the value ##\sigma^* \doteq 0.581?## This is wrong.
 
  • #3
I redid the calculation using the definition
$$
\sqrt{V(X)} = \sqrt{\sum_k (k-\mu)^2p(k)} = \sqrt{(9-7.5)^2\cdot 0.149} \approx 0.579
$$
 
  • #4
If you use the binomial distribution for ##X## there is a standard formula for the variance---look it up. It gives results much different from yours.
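
To see the gap between the one-term calculation and the standard formula, the definition of the variance can be summed over every k from 0 to 50. This is a minimal sketch, assuming X ~ Bin(50, 0.15) as discussed in this thread; the helper name `binom_pmf` is only illustrative.

```python
from math import comb, sqrt

n, p = 50, 0.15  # sample size and red-ball probability from the problem statement

def binom_pmf(k: int) -> float:
    """P(X = k) for X ~ Bin(n, p) (assumed model)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Mean and variance from the definitions, using all 51 possible outcomes.
mu = sum(k * binom_pmf(k) for k in range(n + 1))
var_full = sum((k - mu) ** 2 * binom_pmf(k) for k in range(n + 1))

# Keeping only the k = 9 term, as in the attempt above, drops most of the sum.
var_one_term = (9 - mu) ** 2 * binom_pmf(9)

print("mean:", mu, "vs n*p =", n * p)
print("variance:", var_full, "vs n*p*(1-p) =", n * p * (1 - p))
print("standard deviation:", sqrt(var_full))
print("single-term value:", var_one_term)
```

The full sum should agree with n·p·(1−p) up to rounding, while the single term is only one of the 51 contributions.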
 
  • #5
The variance for [itex]X[/itex] is then, according to my book, [itex]V(X) = nqp = 50\cdot(1-0.15)\cdot0.15[/itex]. How can I justify going from N to Bin?

edit: p was supposed to be 0.15, mixed it up with another exercise
 
  • #6
Justification depends on the "model". When the problem states that the probability of drawing a red is 15% (without giving other details) you more-or-less have to assume that the same 15% applies to the first, second, third,..., 50th balls. Then, if the drawings are independent, you get the Binomial distribution for sure.

However, if the 15% figure really means that 15% of the balls are red, then whether or not a binomial is good depends on the size of the ball population. For example, if there are only slightly more than 50 balls altogether, then the initial drawing of some red balls changes the red percentage in later draws, and so you do not get the binomial---instead, you get the so-called hypergeometric distribution. The variance formula is a bit more complicated, and depends explicitly on the total ball population size, N. However, if N is much larger than 50 the binomial distribution is a good approximation---becoming exact in the limit ##N \to \infty##. Exactly how large N should be and how good the approximation is can be studied numerically, by comparing the binomial and hypergeometric results.
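
The numerical comparison suggested here could look roughly like the sketch below. It assumes exactly 15% of the N balls are red (so N is chosen to make 0.15·N an integer) and uses the total variation distance between the two distributions as one possible measure of closeness; both the measure and the function names are choices made for this illustration.

```python
from math import comb

n, p = 50, 0.15  # draws and red fraction from the problem statement

def binom_pmf(k: int) -> float:
    """P(X = k) when every draw is red with probability p, independently."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k: int, N: int) -> float:
    """P(X = k) when n balls are drawn without replacement from N balls."""
    R = round(p * N)  # number of red balls; N is picked so that p*N is an integer
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes automatically get probability 0.
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def total_variation(N: int) -> float:
    """Half the summed absolute difference between the two pmfs."""
    return 0.5 * sum(abs(binom_pmf(k) - hypergeom_pmf(k, N)) for k in range(n + 1))

for N in (60, 100, 1000, 10_000, 100_000):
    print(N, total_variation(N))
```

The printed distances should shrink as N grows, which is the sense in which the binomial becomes exact in the limit.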
 
  • #7
Ray Vickson said:
Justification depends on the "model". When the problem states that the probability of drawing a red is 15% (without giving other details) you more-or-less have to assume that the same 15% applies to the first, second, third,..., 50th balls. Then, if the drawings are independent, you get the Binomial distribution for sure.

However, if the 15% figure really means that 15% of the balls are red, then whether or not a binomial is good depends on the size of the ball population. For example, if there are only slightly more than 50 balls altogether, then the initial drawing of some red balls changes the red percentage in later draws, and so you do not get the binomial---instead, you get the so-called hypergeometric distribution. The variance formula is a bit more complicated, and depends explicitly on the total ball population size, N. However, if N is much larger than 50 the binomial distribution is a good approximation---becoming exact in the limit ##N \to \infty##. Exactly how large N should be and how good the approximation is can be studied numerically, by comparing the binomial and hypergeometric results.

Yes, the number of balls in the "bucket" can be assumed to tend towards infinity, so the probability is constant. However, what I have learned is that when you have a sample with distribution N(μ, σ²) you want to construct a reference variable with some distribution ##N(0,1),\ t(n-1),\ \chi^2(n-1)## (depending on what is given), in order to construct a confidence interval using the quantiles.

The "model" in this case is basically: 50 balls are drawn simultaneously, 9 of them turned out to be red and the red ball mean for 50 balls is 7.5 (expected value). Is it necessary to get into binomial distribution in order to get a confidence interval for σ?
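
For what it is worth, here is a sketch of where the reference-variable recipe leads if the normal model from the problem statement is taken at face value. It assumes a single observation X ~ N(μ, σ²) with μ = 7.5 known, so that (X − μ)²/σ² follows a χ² distribution with 1 degree of freedom. This reading is an assumption made for illustration, not something established in the thread.

```python
from statistics import NormalDist

x, mu, alpha = 9, 7.5, 0.05  # observation, known mean, significance level

def chi2_1_quantile(q: float) -> float:
    """q-quantile of chi-square with 1 degree of freedom.

    A chi2(1) variable is the square of a standard normal Z, and
    P(Z^2 <= c) = 2*Phi(sqrt(c)) - 1, so sqrt(c) is the (1 + q)/2 normal quantile.
    """
    return NormalDist().inv_cdf((1 + q) / 2) ** 2

a = chi2_1_quantile(alpha / 2)      # lower chi-square quantile
b = chi2_1_quantile(1 - alpha / 2)  # upper chi-square quantile

# a < (x - mu)^2 / sigma^2 < b  rearranges to  |x - mu|/sqrt(b) < sigma < |x - mu|/sqrt(a)
sigma_low = abs(x - mu) / b**0.5
sigma_high = abs(x - mu) / a**0.5

print(sigma_low, sigma_high)
```

With only one observation the resulting interval is extremely wide, which hints at why a single count carries very little information about σ under this model.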
 
  • #8
If the distribution is binomial you do not need a "confidence interval" for ##\sigma##; you just compute it from the formula. After all, if you are entitled to say ##\mu = 0.15 \times 50 = 7.5## you are also entitled to say ##\sigma^2 = 0.15 \times 0.85 \times 50 = 6.375##. In fact, for the binomial it makes no sense at all to even speak of a confidence interval for ##\sigma##.

It is difficult to see how to make any sense of the question, but one possibility might be to take the hypergeometric case; that is, the bucket contains ##N## balls (where ##N \geq 50## is unknown). Somehow you know that the number ##R## of red balls in the bucket is ##R = 0.15 N##; the other ##N-R## balls are not red. You draw ##n = 50## balls (without replacement) from the bucket and observe that ##k = 9## are red. If ##X## = number of reds in the sample, ##X## has a hypergeometric distribution. While the expected value of ##X## is still given by ##EX = 0.15 \times 50 = 7.5##, the variance does, in fact, depend on the unknown ball population ##N##:
[tex] \text{Var}(X) = n p (1-p)\, \frac{N-n}{N-1} [/tex]
where ##p = R/N = 0.15##. (See, e.g., http://en.wikipedia.org/wiki/Hypergeometric_distribution .)
Presumably, you can use the observation ##X = 9## to cook up a maximum-likelihood estimation of ##N## and find some type of probable interval for ##N##. Then you could translate that ##N##-interval into a ##\sigma##-interval. However, that all seems far-fetched to me, and so I continue to be baffled by what on Earth the question could possibly mean.
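
As a rough illustration of how an interval for ##N## would translate into an interval for ##\sigma##, the sketch below evaluates the hypergeometric standard deviation from the variance formula above for a few population sizes. The values of N are arbitrary examples, and `hypergeom_sd` is a name made up for this sketch.

```python
from math import sqrt

n, p = 50, 0.15  # sample size and red fraction from the problem statement

def hypergeom_sd(N: int) -> float:
    """Standard deviation of the red count when n balls are drawn from N without replacement."""
    return sqrt(n * p * (1 - p) * (N - n) / (N - 1))

binomial_sd = sqrt(n * p * (1 - p))  # limiting value as N grows without bound

for N in (50, 60, 100, 1000, 10**6):
    print(N, hypergeom_sd(N))
print("binomial limit:", binomial_sd)
```

Because the standard deviation increases with N, the endpoints of any interval for N map directly onto the endpoints of an interval for σ, from 0 at N = 50 up to the binomial value.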
 

FAQ: Statistics - Confidence interval

What is a confidence interval?

A confidence interval is a range of values that is likely to include the true population parameter with a certain level of confidence. It is often used in statistics to estimate the true value of a population parameter based on a sample.

How is a confidence interval calculated?

For a population mean, a confidence interval is calculated from the sample mean and its standard error, giving a range of values that is likely to include the true population mean. The formula is: sample mean ± (critical value × standard error). Intervals for other parameters, such as a proportion or a variance, are built in the same spirit but may use a different reference distribution.
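
As a small worked example of the "estimate ± critical value × standard error" pattern, the sketch below computes a normal-approximation (Wald) interval for the proportion of red balls, using the 9 reds out of 50 from the thread. The choice of the Wald interval is an assumption made for illustration; the thread itself does not use it.

```python
from math import sqrt
from statistics import NormalDist

k, n, conf = 9, 50, 0.95  # observed reds, sample size, confidence level

p_hat = k / n                                  # sample proportion
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # two-sided critical value
se = sqrt(p_hat * (1 - p_hat) / n)             # estimated standard error of p_hat

lower, upper = p_hat - z * se, p_hat + z * se
print(f"{conf:.0%} interval for the red proportion: ({lower:.4f}, {upper:.4f})")
```

The normal approximation can be poor for small samples or proportions near 0 or 1, so this should be read as a sketch of the formula rather than a recommended method.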

What is the significance of the confidence level in a confidence interval?

The confidence level describes the long-run behavior of the procedure, not the probability that one particular computed interval contains the parameter. For example, a 95% confidence level means that if we were to take many samples and calculate a confidence interval from each, about 95% of those intervals would contain the true population parameter.
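
The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under assumed conditions: normally distributed data with a known standard deviation, and arbitrary illustrative values for the mean, standard deviation, sample size, and random seed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=7.5, sigma=2.5, n=50, conf=0.95, trials=4000, seed=0):
    """Fraction of simulated intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # The interval sample_mean ± half_width contains mu exactly when
        # the sample mean lies within half_width of mu.
        hits += abs(sample_mean - mu) <= half_width
    return hits / trials

print(coverage())
```

The printed fraction should land close to the nominal 0.95, with some simulation noise.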

How does sample size affect the width of a confidence interval?

A larger sample size will result in a narrower confidence interval, as there is more data available to estimate the true population parameter. This means that a larger sample size will increase the precision of the estimate and decrease the margin of error.
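
For an interval for a mean with known standard deviation, the effect of sample size is easy to quantify: the half-width scales as 1/√n. A minimal sketch, with the standard deviation of 2.5 chosen purely as an illustrative assumption:

```python
from math import sqrt
from statistics import NormalDist

def half_width(n: int, sigma: float = 2.5, conf: float = 0.95) -> float:
    """Half-width of a known-sigma confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / sqrt(n)

for n in (50, 200, 800):
    print(n, half_width(n))
```

Quadrupling the sample size halves the width, so each further gain in precision costs proportionally more data.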

What are the limitations of a confidence interval?

A confidence interval is based on a sample from a population and is subject to sampling error. This means that there is a chance that the calculated interval does not include the true population parameter. Additionally, a confidence interval only provides information about the population parameter and does not indicate the distribution of the data.
