Correct understanding of confidence intervals....

  • #1
fog37
Hello,
I am attempting to correctly interpret what a confidence interval means.

This is what I know: a confidence interval is a continuous interval of values, with a lower bound and an upper bound, centered around a sample mean. For example, given a certain population, we are interested in the true population mean and the 95% confidence interval (CI):
  • We can extract from the population N equal samples (all samples having the same size ##n##). Assume we pick N = 100 samples.
  • Each sample will have its own sample mean and its own sample standard deviation ##s##.
  • Each sample will also generate its own confidence interval, centered at its own sample mean. The CI limits of each sample depend on its standard deviation ##s## and the ##z## score we choose (the ##z## score determines whether we are talking about a 95% or a 99% confidence interval). We pick ##z = 1.96##.
  • We end up with 100 samples and 100 confidence intervals. 95 among those 100 confidence intervals will contain the true population mean and 5 confidence intervals will surely not.
  • The best estimate of the population mean is the average of the sample means. And as for the confidence interval: which one do we pick among the 95 CIs that we are sure all contain the true population mean?
Thanks!
 
  • Like
Likes FactChecker
  • #2
Why not use the standard deviation of the 100n samples?
 
  • #3
Apparently that is not how the CI limits are calculated. See https://www.mathsisfun.com/data/confidence-interval.html

They show that the interval is centered at the sample mean and the limits are calculated using the sample size ##n##, the ##Z## score, and the sample standard deviation ##s##: $$\pm Z \frac{s}{\sqrt{n}}$$
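For concreteness, here is a minimal Python sketch of that calculation (the sample values and sample size below are made up for illustration):

```python
import math

# Hypothetical sample of n = 25 height measurements in meters (made-up numbers).
sample = [1.78, 1.82, 1.75, 1.90, 1.66, 1.84, 1.79, 1.73,
          1.88, 1.81, 1.77, 1.69, 1.85, 1.80, 1.74, 1.83,
          1.76, 1.87, 1.72, 1.79, 1.86, 1.70, 1.82, 1.78, 1.81]

n = len(sample)
mean = sum(sample) / n
# Sample standard deviation (divide by n - 1).
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

z = 1.96  # z-score for a 95% confidence level
half_width = z * s / math.sqrt(n)
print(f"95% CI: ({mean - half_width:.3f}, {mean + half_width:.3f})")
```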
 
  • #4
fog37 said:
The best estimate of the population mean is the average of the sample means. And as far as confidence interval...which confidence interval do we pick among the 95 CI that we are sure all contain the true population mean?
I am not sure what you are asking here. There is no such thing as a population confidence interval. So what are you trying to estimate here?
 
  • #5
If there are 100 samples, each sample generates its own 95% CI with its own limits. 95 of these CIs contain the true population value for sure and 5 do not.
Which, among the 95 intervals, should we consider? For a point estimate, we take the average of the sample averages. But among these many (95) intervals, which one do we choose to represent the interval estimate?
 
  • #6
Overall, an x% confidence interval means, or is interpreted as saying, that x% of the intervals constructed from samples collected in the same way will contain the true population statistic (mean, variance, etc.).
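This repeated-sampling interpretation is easy to check numerically. Below is a minimal simulation sketch (the population parameters and sample size are arbitrary choices for the demo):

```python
import random
import statistics

random.seed(0)
TRUE_MEAN, TRUE_SD = 1.80, 0.20       # the (normally unknown) population
n, z, trials = 30, 1.96, 10_000

covered = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    m = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample standard deviation
    half = z * s / n ** 0.5
    if m - half <= TRUE_MEAN <= m + half:   # does this CI cover the truth?
        covered += 1

print(f"coverage: {covered / trials:.1%}")  # close to 95%
```

Note that the coverage only fluctuates around 95%: in any particular batch of 100 samples, the number of intervals that cover the true mean is itself random, not exactly 95.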
 
  • #7
fog37 said:
If there are 100 samples, each sample generates its own 95% CI with its own limits. 95 of these CIs contain the true population value for sure and 5 do not.
Which, among the 95 intervals, should we consider? For a point estimate, we take the average of the sample averages. But among these many (95) intervals, which one do we choose to represent the interval estimate?
None of the individual CIs will be as good as a single CI formed from all 100n observations. Calculating the mean of the means is the same as calculating the mean of the overall data set of 100n observations (provided each sample is the same size). The CI is not so simple; you need to actually calculate it on the whole data set.
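Here is a sketch of that pooled calculation on simulated data (population parameters arbitrary); the pooled CI comes out roughly ##\sqrt{100} = 10## times narrower than any single-sample CI:

```python
import random
import statistics

random.seed(1)
samples = [[random.gauss(1.80, 0.20) for _ in range(30)] for _ in range(100)]

mean_of_means = statistics.mean(statistics.mean(s) for s in samples)
pooled = [x for s in samples for x in s]   # all 100*n observations
pooled_mean = statistics.mean(pooled)
pooled_s = statistics.stdev(pooled)

half = 1.96 * pooled_s / len(pooled) ** 0.5
print(mean_of_means, pooled_mean)          # equal, since samples are equal-sized
print(f"pooled 95% CI: ({pooled_mean - half:.4f}, {pooled_mean + half:.4f})")
```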
 
  • #8
Ok, if we knew the entire population, as one huge sample containing all the items, we could calculate its true mean, variance, and standard deviation exactly (no confidence interval would even be needed).

But we often need to do sampling statistics and work with a finite number of samples of finite size. And we get a confidence interval from each sample. I guess any confidence interval is as good as another, as long as it contains the true population parameter...
 
  • #9
I just want to point out something that is well known, but is often glossed over. A confidence interval of 95% doesn't really mean that there is a 5% chance that the true number (the true mean, for instance) is outside the interval. If you don't know the actual distribution, then you don't have any idea whether your sample accurately reflects that distribution.

Suppose that you're measuring the heights of American males. For your particular sample, you find the mean, say ##1.8## meters, and the standard deviation, say ##0.2## meters. What you would like to know is: What's the probability that the true mean (if we checked every American male) is between ##1.75## and ##1.85##? You don't know. You have no way of knowing in a non-subjective way.

Let's define some variables:
  • ##M## = the actual (unknown) mean among all American males.
  • ##\sigma## = the actual standard deviation
  • ##M_s## = the mean for our sample
  • ##\sigma_s## = the standard deviation for our sample
What you would like to be able to compute is:

##P(M, \sigma | M_s, \sigma_s)##

the probability that the actual mean is ##M## and the actual standard deviation is ##\sigma## given that our sample mean is ##M_s## and the sample standard deviation is ##\sigma_s##. You'd like to be able to say:

Claim 1: The probability that ##M < 1.75## is less than 5%.

But you can't compute that. What you can compute is the reverse: ##P(M_s, \sigma_s | M, \sigma)##

This is the probability of getting a sample mean ##M_s## and sample standard deviation ##\sigma_s## under the assumption that the true mean and standard deviation are ##M, \sigma##. So you can say:

Claim 2: If the true mean were 1.75 or less, and the true standard deviation were 0.2, then the probability that our sample mean would be as large as 1.8 is less than 5%.

That's a different statement. People very often sloppily act as if confidence intervals tell you something like Claim 1, when they actually tell you something like Claim 2.
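Claim 2 is the kind of statement that actually can be computed. A minimal sketch, assuming for illustration a known ##\sigma = 0.2## and a hypothetical sample size of ##n = 100##:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

M, sigma, n = 1.75, 0.2, 100   # hypothesized truth; n is an assumed sample size
se = sigma / sqrt(n)           # standard error of the sample mean

# Claim 2 direction: P(sample mean >= 1.8 | true mean M = 1.75).
p = 1 - normal_cdf(1.8, M, se)
print(f"P(M_s >= 1.8 | M = 1.75) = {p:.4f}")   # about 0.006, well under 5%

# The Claim 1 direction, P(M < 1.75 | M_s = 1.8), is NOT computable
# from this alone: it requires a prior distribution over M.
```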
 
  • Like
Likes haushofer
  • #10
stevendaryl said:
But you can't compute that
Well, you can with Bayesian statistics, but I assume you know that and were just making a point about misunderstanding of frequentist statistics.
 
  • #11
fog37 said:
  • Each sample will also generate its own confidence interval center at its own sample mean.

That's false unless we use an ambiguous definition of "confidence interval". For example, if our sampling scheme has a 95% confidence interval of ##\pm 5.3## for the population mean then there is a .95 probability that the randomly selected sample mean will lie within ##\pm 5.3## of the unknown population mean. But if a particular sample has a sample mean of 47, this does not imply that there is a .95 probability that the unknown population mean is within ##\pm 5.3## of 47.

Calling ##( 47 - 5.3, 47 + 5.3)## a "confidence interval" is technically incorrect, although people unfamiliar with mathematical statistics call it such. The "frequentist" analysis of data regards the population mean as having a fixed but unknown value. This is different from modeling the population mean as a random variable. From the point of view that the population mean has a fixed but unknown value, it is logically inconsistent to assign a probability for the population mean to be in an interval with specific numerical endpoints.

If you want to assign a probability that the population mean is in a specific interval, you need to formulate the problem in a Bayesian way and model the population mean as a random variable. Then specific Bayesian "credible intervals" can be computed from sample data.

  • The best estimate of the population mean is the average of the sample means.

If you want to get the concepts of statistics straight in your mind, you must be clear about what you mean by "best". Study the different concepts of "unbiased estimators", "minimum variance estimators", and "maximum likelihood estimators".

For example, suppose we know a random variable has equal probability of taking on each of the values ##x, x+1, x+3##, and we take 3 independent samples. If the samples we observe are {8, 8, 10}, is it "best" to estimate the population mean as (8+8+10)/3, or is it "best" to estimate it as (7+8+10)/3? (Observing both 8 and 10 forces ##x = 7##: if ##x = 8##, the possible values would be 8, 9, 11 and a 10 could never occur. The true mean is then ##x + 4/3 = 7 + 4/3##, which is exactly (7+8+10)/3.)

(An estimator need not be defined by a simple algebraic expression. It can be defined by a complicated algorithm that employs various if-then branches.)
 
  • #12
Stephen Tashi said:
That's false unless we use an ambiguous definition of "confidence interval".
The definition does leave us with a difficult interpretation. https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation
For example, if our sampling scheme has a 95% confidence interval of ##\pm 5.3## for the population mean then there is a .95 probability that the randomly selected sample mean will lie within ##\pm 5.3## of the unknown population mean. But if a particular sample has a sample mean of 47, this does not imply that there is a .95 probability that the unknown population mean is within ##\pm 5.3## of 47.

Calling ##( 47 - 5.3, 47 + 5.3)## a "confidence interval" is technically incorrect, although people unfamiliar with mathematical statistics call it such.
This is how "confidence interval" is usually defined. That leaves us with the common misconception that we can interpret it in a simple probability manner.
 
  • #13
FactChecker said:
The definition does leave us with a difficult interpretation. https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation
This is how "confidence interval" is usually defined. That leaves us with the common misconception that we can interpret it in a simple probability manner.

As I have said about it, what you really want to know is "There is a 95% chance that the true mean is within ##\Delta x## of the sample mean". But there is no way to know that without making subjective assumptions (Bayesian reasoning does that, as @Dale points out). So what people instead calculate is something that is kind of complicated and whose significance is questionable, the confidence interval.

The alternative quantity described in the Wikipedia article would allow you to say something backwards like "There is a 95% chance that a randomly collected sample mean is within ##\Delta x## of the true mean". That sounds similar, but isn't exactly the same. Of course, you can't actually calculate that, either, unless you make assumptions about the true standard deviation. (The Wikipedia article says that using the Student t-distribution, you can avoid making assumptions about the true standard deviation, but I don't understand how that works.)
 
  • #14
stevendaryl said:
As I have said about it, what you really want to know is "There is a 95% chance that the true mean is within ##\Delta x## of the sample mean".
Yes, that is what people want. But that is not what the confidence interval really does. The originator of the method, Neyman, knew that and warned about it. A proper interpretation is: "If the true parameter were outside of this interval, the odds of getting a sample like this would be less than xxx." The OP asks about the confidence interval and the interpretation of it. He seems to recognize that there are some issues, and he is correct. It is good to address this issue because it comes up in hypothesis testing in general.
But there is no way to know that without making subjective assumptions (Bayesian reasoning does that, as @Dale points out).
At least the Bayesian approach identifies the issue in concrete terms, but it opens up an entire set of questions and issues.
 
  • #15
FactChecker said:
At least the Bayesian approach identifies the issue in concrete terms, but it opens up an entire set of questions and issues.

Sure. It would be very difficult and confusing to try to rewrite statistical results in a way that uses Bayesian analysis. For one thing, you couldn't just publish experimentally derived numbers, you would have to also publish your prior probabilities, and it would be difficult to establish what are sensible priors, and it would be difficult for researchers to use the results of researchers that used different priors. It would be a mess.

On the other hand, "confidence intervals" don't actually give any confidence at all, if you understand what they mean, unless you supplement them with subjective judgments. If someone proves that "The probability of gathering this sample data by chance if the tooth fairy doesn't exist is less than 5%", that doesn't actually tell us anything about the likelihood that the tooth fairy exists.
 
  • Like
Likes FactChecker
  • #16
stevendaryl said:
Sure. It would be very difficult and confusing to try to rewrite statistical results in a way that uses Bayesian analysis. For one thing, you couldn't just publish experimentally derived numbers, you would have to also publish your prior probabilities, and it would be difficult to establish what are sensible priors, and it would be difficult for researchers to use the results of researchers that used different priors. It would be a mess.

https://arxiv.org/abs/astro-ph/9812133
This is an example of a paper that uses Bayesian analysis (in addition to Frequentist statistics). They describe their priors.
 
  • Like
Likes Dale
  • #17
stevendaryl said:
On the other hand, "confidence intervals" don't actually give any confidence at all, if you understand what they mean, unless you supplement them with subjective judgments.
I tend to disagree, although the "confidence" does not have an authoritative numerical value. If the sample results make one very skeptical of a certain parameter range, that is confidence. IMHO, it is better to leave it there than to think that the Bayesian results are any more authoritative. They formalize the issue, but do not really put the results on any more firm ground -- so it can be deceptive.
 
  • #18
stevendaryl said:
It would be very difficult and confusing to try to rewrite statistical results in a way that uses Bayesian analysis.
That is true, but only because we have been doing it a different way for so long. If we had more institutional experience with Bayesian methods then it would not be any more difficult and confusing. Particularly since the Bayesian quantities are typically those that are actually of interest and more aligned with how people think about their data. E.g. the Bayesian “95% credible interval” for a parameter is an interval that has a 95% probability that it contains the parameter. That is directly what people wish a confidence interval told them.
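For concreteness, here is a minimal sketch of a Bayesian credible interval for a normal mean with known standard deviation and a conjugate normal prior (all numbers are made-up inputs):

```python
from math import sqrt

# Made-up inputs: a prior belief and an observed data summary.
mu0, tau0 = 1.75, 0.10            # prior: M ~ Normal(mu0, tau0^2)
m_s, sigma, n = 1.80, 0.20, 100   # sample mean, known sd, sample size

# Conjugate normal-normal update for the mean.
prec = 1 / tau0**2 + n / sigma**2
mu_post = (mu0 / tau0**2 + n * m_s / sigma**2) / prec
sd_post = sqrt(1 / prec)

lo, hi = mu_post - 1.96 * sd_post, mu_post + 1.96 * sd_post
print(f"95% credible interval: ({lo:.4f}, {hi:.4f})")
# Unlike a confidence interval, this carries the direct statement:
# P(lo < M < hi | data, prior) = 0.95.
```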

stevendaryl said:
you would have to also publish your prior probabilities, and it would be difficult to establish what are sensible priors, and it would be difficult for researchers to use the results of researchers that used different priors.
On the contrary, with Bayesian methods you can directly use another researcher's data in your own analysis, with your own priors, together with your new data. You cannot do that using standard methods because of the multiple comparisons issue.

In my mind, the reuse of data and naturalness of the results are the two main reasons to switch to Bayesian analyses. In particular, I would think that the data reuse is something that scientists using public funds should feel ethically obligated to do.

If your fellow-citizens have paid millions of dollars to purchase your data and if you can make it “use once and dispose” or “permanent” simply by choice of analysis methods, then how can anyone justify the “disposable data” approach? Unfamiliarity with the alternative seems a poor excuse for not making the best use of the public stewardship.

FactChecker said:
IMHO, it is better to leave it there than to think that the Bayesian results are any more authoritative. They formalize the issue, but do not really put the results on any more firm ground -- so it can be deceptive.
I don’t think it is about “authoritative” or not. I think it is about naturalness. You have to tie yourself in mental knots to understand what a frequentist result means and how to interpret it in the context of your study. Yes, the Bayesian tools are a little more complicated, but in the end they get you where you wanted to go.

What researcher actually cares about the probability of their data given the null hypothesis? They are interested in their hypothesis, and yet have to test the null hypothesis simply because of the statistical tools.

atyy said:
https://arxiv.org/abs/astro-ph/9812133
This is an example of a paper that uses Bayesian analysis (in addition to Frequentist statistics). They describe their priors.
That is my preferred approach. Use both sets of tools. Swap them as appropriate for your use case.
 
  • #19
Dale said:
I don’t think it is about “authoritative” or not. I think it is about naturalness. You have to tie yourself in mental knots to understand what a frequentist result means and how to interpret it in the context of your study. Yes, the Bayesian tools are a little more complicated, but in the end they get you where you wanted to go.
I think it just hides the subjective prior beneath a thin veneer of another layer of math. And it creates a great many questions, such as: how much data does it take to overcome an erroneous prior to any given accuracy?
 
  • Like
Likes Dale
  • #20
Dale said:
On the contrary, with Bayesian methods you can directly use another researcher's data in your own analysis, with your own priors, together with your new data.

Maybe you could (in an Insight article, or a regular article) explain how to combine Bayesian results from different researchers using different priors?

The basic formula used by Bayesian statistics is the following (where ##\lambda## is the parameter that you're interested in, and ##D## is the data that is supposed to shed light on the value of ##\lambda##):

##P(\lambda | D) = \frac{P(D | \lambda) P(\lambda)}{P(D)} = \frac{P(D|\lambda) P(\lambda)}{\sum_{\lambda'} P(D|\lambda') P(\lambda')}##

where ##P(\lambda)## is the prior probability for ##\lambda##.

If someone wants to use a different prior, then it seems to me that that means throwing out the actual value for ##P(\lambda|D)##, because that is sensitive to the choice of priors. You could still reuse the value of ##P(D | \lambda)##, because that is theory-dependent but not dependent on the prior.
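Here is a sketch of that formula evaluated on a discrete grid, using toy coin-flip data assumed purely for illustration; it also shows how the same likelihood values ##P(D|\lambda)## can be reused under two different priors:

```python
# Toy model: lambda = probability of heads; D = 7 heads in 10 flips.
heads, flips = 7, 10
grid = [i / 100 for i in range(1, 100)]   # candidate lambda values

def likelihood(lam):
    """P(D | lambda), up to the constant binomial coefficient."""
    return lam**heads * (1 - lam)**(flips - heads)

def posterior(prior):
    unnorm = [likelihood(l) * prior(l) for l in grid]
    total = sum(unnorm)   # plays the role of P(D)
    return [u / total for u in unnorm]

flat = posterior(lambda l: 1.0)              # researcher A: flat prior
skeptic = posterior(lambda l: (1 - l) ** 3)  # researcher B: skeptical prior

# The same likelihood feeds both analyses; only P(lambda) changed.
print(max(zip(flat, grid))[1])      # posterior mode: 0.70 under the flat prior
print(max(zip(skeptic, grid))[1])   # pulled lower by the skeptical prior
```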
 
  • Like
Likes FactChecker
  • #21
stevendaryl said:
If someone wants to use a different prior, then it seems to me that that means throwing out the actual value for ##P(\lambda|D)##, because that is sensitive to the choice of priors.
You are exactly correct: you could discard their analysis, including both the prior and their posterior. The point I was making is that you can reuse their data in your own analysis, with your own priors and your own hypotheses.

You cannot do the same with frequentist methods. If you do one test and then later do another test on the same data then you have actually done a more complicated experiment which effectively increases the p value of the original test. This is the root of the multiple comparisons issue. In frequentist methods data should only be used once, and if you use it multiple times then you need to account for it by an adjustment for multiple comparisons.

That is not an issue for Bayesian methods. This guy is a little “partisan” for Bayesian methods, but he highlights the statistical issue well http://www.indiana.edu/~kruschke/articles/Kruschke2010WIRES.pdf
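As an illustration of the adjustment being described, here is a sketch of the simplest such correction, Bonferroni (the p-values are made up):

```python
# Hypothetical p-values from k tests run on the same data set.
p_values = [0.03, 0.04, 0.20]
k = len(p_values)
alpha = 0.05

# Bonferroni: each test must clear alpha / k to hold the overall
# (familywise) error rate at alpha.
for p in p_values:
    verdict = "reject" if p < alpha / k else "fail to reject"
    print(f"p = {p:.2f}: {verdict} at familywise alpha = {alpha}")

# Tests that looked "significant" in isolation (p = 0.03, 0.04) no
# longer are once the repeated use of the data is accounted for.
```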
 
  • #22
Dale said:
You are exactly correct, you could discard their analysis including both the prior and their posterior. The point I was making is that you can reuse their data in your own analysis with your own priors and your own hypotheses.

Yes, that's true.

Dale said:
You cannot do the same with frequentist methods. If you do one test and then later do another test on the same data then you have actually done a more complicated experiment which effectively increases the p value of the original test. This is the root of the multiple comparisons issue. In frequentist methods data should only be used once, and if you use it multiple times then you need to account for it by an adjustment for multiple comparisons.

That is not an issue for Bayesian methods. This guy is a little “partisan” for Bayesian methods, but he highlights the statistical issue well http://www.indiana.edu/~kruschke/articles/Kruschke2010WIRES.pdf

I would be all in favor of switching to Bayesian methods, because I actually think it's the correct way to think about probabilities, but there is a lot of inertia to overcome.
 
  • Like
Likes Dale
  • #23
stevendaryl said:
there is a lot of inertia to overcome
That is the number one problem. I see no quick cure for that.
 

FAQ: Correct understanding of confidence intervals....

What is a confidence interval?

A confidence interval is a range of values, computed from sample data, produced by a procedure that captures the true value of a population parameter a specified fraction of the time under repeated sampling. It is used to estimate the true value of a population parameter based on a sample of data.

How is the confidence level determined for a confidence interval?

The confidence level for a confidence interval is typically chosen by the researcher based on the level of certainty they want in their estimate. The most commonly used confidence level is 95%, meaning that 95% of intervals constructed this way, across repeated samples, will contain the true population parameter. As discussed in the thread above, it does not mean that any single computed interval has a 95% probability of containing the parameter.

What is the relationship between sample size and the width of a confidence interval?

The larger the sample size, the narrower the confidence interval. A larger sample provides a more precise estimate of the population parameter, so the margin of error ##Z s/\sqrt{n}## shrinks in proportion to ##1/\sqrt{n}##.
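A quick numerical check of that ##1/\sqrt{n}## scaling (illustrative values):

```python
from math import sqrt

s, z = 0.20, 1.96   # assumed sample standard deviation and 95% z-score
for n in (25, 100, 400):
    print(n, round(2 * z * s / sqrt(n), 4))   # full interval width
# Quadrupling n halves the width: 0.1568, 0.0784, 0.0392
```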

How do confidence intervals help with interpreting experimental results?

Confidence intervals provide a range of values that are likely to include the true population parameter. This allows researchers to determine the precision of their estimate and the level of certainty in their results. It also helps to identify any potential biases or errors in the data.

Can confidence intervals be used to compare two or more groups?

Yes, confidence intervals can be used to compare two or more groups. By calculating the confidence interval for each group, researchers can gauge whether there is a significant difference between them: if the 95% confidence intervals do not overlap, the difference is statistically significant. (The converse does not hold; overlapping intervals do not by themselves rule out a significant difference.)
