What does a 95% confidence level mean in a poll?

  • Thread starter Agent Smith
In summary, a 95% confidence level in a poll indicates that if the same survey were repeated many times, about 95% of the resulting intervals (each poll's estimate plus or minus its margin of error) would contain the true population parameter. This reflects the reliability of the polling method, but it does not guarantee that any single poll's interval contains the true value.
  • #1
Agent Smith
When I read a statistical report that states a ##95\%## confidence interval that support for candidate A is ##56\% \pm 5\%##, does it mean that there's a ##95\%## probability that the true proportion of supporters p is such that ##51\% < p < 61\%##?
 
  • #2
Agent Smith said:
When I read a statistical report that states a ##95\%## confidence interval that support for candidate A is ##56\% \pm 5\%##, does it mean that there's a ##95\%## probability that the true proportion of supporters p is such that ##51\% < p < 61\%##?
IIRC, using a similar method to determine the proportion of those who will vote for candidate A, 95% of those intervals will contain the proportion of 56%.
 
  • #3
WWGD said:
IIRC, using a similar method to determine the proportion of those who will vote for candidate A, 95% of those intervals will contain the proportion of 56%.
How does that help us in drawing a conclusion about the true population proportion?
 
  • #4
Agent Smith said:
How does that help us in drawing a conclusion about the true population proportion?
https://en.m.wikipedia.org/wiki/Confidence_interval
I'll give a better, clearer answer ASAP.
 
  • #5
In the Wikipedia article linked above, it's stated that, for a population proportion ##p##, a sample proportion ##\hat p## and a ##95\%## CI given as ##(a, b)##:

1. Resampling (repeating the "procedure"), say, ##20## times would result in around ##19## of those repeats with ##95\%## CIs that contain ##p##. That is to say ##\hat p - 2 \sigma_{\hat p} < p < \hat p + 2 \sigma_{\hat p}## will occur roughly ##95\%## of the time.
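Statement 1 can be checked by simulation. The sketch below is a minimal illustration under stated assumptions, not the Wikipedia article's own procedure: simple random sampling, a made-up true proportion ##p = 0.56## and poll size ##n = 400##, and the normal-approximation (Wald) interval ##\hat p \pm 2\sigma_{\hat p}## with ##\sigma_{\hat p}## estimated as ##\sqrt{\hat p(1-\hat p)/n}##. The function names `wald_interval` and `coverage` are hypothetical.

```python
import math
import random


def wald_interval(successes: int, n: int, z: float = 2.0) -> tuple[float, float]:
    """Approximate CI p_hat +/- z * sigma_hat, with sigma_hat = sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    sigma_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * sigma_hat, p_hat + z * sigma_hat


def coverage(p: float, n: int, repeats: int, seed: int = 0) -> float:
    """Fraction of simulated polls whose interval contains the fixed true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # One poll: n respondents, each a supporter with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        lo, hi = wald_interval(successes, n)
        hits += lo < p < hi  # True counts as 1, False as 0
    return hits / repeats


if __name__ == "__main__":
    # Made-up values: true support 56%, 2000 repeated polls of 400 people each.
    print(coverage(p=0.56, n=400, repeats=2000))
```

With these settings the printed fraction should land near 0.95; the exact value depends on the seed and on how well the normal approximation holds for the chosen ##p## and ##n##.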

Ok, so what does this mean for the specific interval ##(a, b)## that I just calculated? Per the stats course I took, we're to make inferences about the population (parameters, here ##p##) from the sample (statistics, here ##\hat p##).
What was said in the first line of 1 (see above) can be rephrased as "I am ##95\%## confident that ##p## lies between ##a## and ##b##", i.e. ##a < p < b##.

This doesn't mean that there's a ##95\%## probability that ##p## lies between ##a## and ##b##; that is, writing ##P(a < p < b) = 95\%## is wrong. That would mean confidence is a concept different to probability, though both are expressed as (it seems) relative frequency.
 
  • #6
Agent Smith said:
When I read a statistical report that states a ##95\%## confidence interval that support for candidate A is ##56\% \pm 5\%##,
Can you give us a reference? You have two different margins of error in that one question.
 
  • #7
Vanadium 50 said:
Can you give us a reference? You have two different margins of error in that one question.
I just made them up, with a little help from past questions on statistics.

I'm trying to understand the meaning of a ##95\%## confidence interval, correctly interpreted as confidence as opposed to incorrectly interpreted as probability.

If a ##95\%## confidence interval for a proportion is ##25 \pm (2 \times 3)##, does that mean the true proportion p is such that ##19 < p < 31##? (The standard deviation is ##3##.)
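The arithmetic behind ##25 \pm (2 \times 3)## is estimate ± multiplier × standard error. A minimal sketch, assuming the "standard deviation of 3" means the standard error of the sample proportion (in percentage points) and that the interval uses the normal approximation; 2 is the usual rounded multiplier and 1.96 the more precise one for 95%. `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(estimate: float, std_error: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval: estimate -/+ z * standard error."""
    margin = z * std_error
    return estimate - margin, estimate + margin


print(confidence_interval(25, 3, z=2))  # prints (19, 31)
print(confidence_interval(25, 3))       # roughly (19.12, 30.88)
```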
 
  • #8
Agent Smith said:
I just made them up
Don't do that. Making up nonsense and asking us to explain it is not going to help you learn and some might call it antisocial.
 
  • #9
@Vanadium 50

apologies ...

What about 👇
Agent Smith said:
If a ##95\%## confidence interval for a proportion is ##25 \pm (2 \times 3)##, does that mean the true proportion p is such that ##19 < p < 31##? (The standard deviation is ##3## and the proportion in the sample is ##25##.)
 
  • #10
Where did you get that? It also has multiple margins of error.

If you made that up too, please stop making stuff up!
 
  • #11
Agent Smith said:
What was said in the first line of 1 (see above) can be rephrased as "I am ##95\%## confident that ##p## lies between ##a## and ##b##", i.e. ##a < p < b##.
I would not recommend rephrasing it that way. I would just say “the 95% confidence interval is …”. Although you are careful to not misinterpret, the statement itself sounds more like a Bayesian inference.

Agent Smith said:
That would mean confidence is a concept different to probability, though both are expressed as (it seems) relative frequency.
In Bayesian statistics population parameters are themselves random variables. They don’t have “true” values that are unknown, but rather probability distributions. These Bayesian probabilities are more like “confidence” than frequencies, but in the limit of a lot of data those two concepts converge.
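A small numerical illustration of that convergence, under stated assumptions: a uniform Beta(1, 1) prior on the proportion (so the posterior after k supporters out of n is Beta(1 + k, 1 + n - k)), an equal-tailed credible interval estimated from Monte Carlo draws with Python's standard `random.betavariate` to avoid a SciPy dependency, and made-up 56% support in the sample. The helper names are hypothetical.

```python
import math
import random


def wald_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Frequentist normal-approximation confidence interval."""
    p_hat = k / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def credible_interval(k: int, n: int, level: float = 0.95,
                      draws: int = 50_000, seed: int = 1) -> tuple[float, float]:
    """Equal-tailed Bayesian credible interval from the Beta(1 + k, 1 + n - k)
    posterior (uniform prior), estimated from sorted Monte Carlo draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(1 + k, 1 + n - k) for _ in range(draws))
    tail = (1 - level) / 2
    return samples[int(tail * draws)], samples[int((1 - tail) * draws) - 1]


for n in (20, 2000):
    k = round(0.56 * n)  # made-up 56% support in the sample
    print(n, wald_interval(k, n), credible_interval(k, n))
```

For n = 2000 the two intervals should agree to about three decimal places, while for n = 20 they differ noticeably more. The numbers converge, but the interpretations do not: the credible interval is a probability statement about p given the data, while the confidence interval is a statement about the procedure.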
 
  • #12
Dale said:
I would not recommend rephrasing it that way. I would just say “the 95% confidence interval is …”. Although you are careful to not misinterpret, the statement itself sounds more like a Bayesian inference.

In Bayesian statistics population parameters are themselves random variables. They don’t have “true” values that are unknown, but rather probability distributions. These Bayesian probabilities are more like “confidence” than frequencies, but in the limit of a lot of data those two concepts converge.
I have 0 experience with Bayesian statistics. I've used Bayes' theorem for conditional probabilities (updating the prior probability based on new evidence), but that's about it.

I'm just a beginner; I took a short statistics course over the past 2 years or so (on and off). The chapter on confidence intervals is difficult. I like this 👇 answer
WWGD said:
IIRC, using a similar method to determine the proportion of those who will vote for candidate A, 95% of those intervals will contain the proportion of 56%.
but it feels a bit off. From what I know, it should be ... 95% of those intervals will capture the true proportion. My issue is that I have no idea why this is important. What relevance does this conclusion have to what I've just painstakingly computed, a 95% confidence interval for the population proportion, say (19, 31)? Aren't we supposed to use the sample statistic to draw conclusions about the population? Here's my statistic: (19, 31), a 95% confidence interval. What can I say now about the true proportion? That if I repeat the procedure and compute 95% confidence intervals 100 times, 95 of those intervals will contain the true proportion, seems tangential. I'd like to know what the significance of what I just computed with regard to the true proportion is.
 
  • #13
Agent Smith said:
What can I say now about the true proportion? That if I repeat the procedure and compute 95% confidence intervals 100 times, 95 of those intervals will contain the true proportion
That is all you can say.

Agent Smith said:
I'd like to know what the significance of what I just computed with regard to the true proportion is?
That is the kind of information you get with a Bayesian “credible interval”. You can make statements like “there is a 95% probability that the proportion is between ##a## and ##b##”. You can also make statements like “there is a 95% probability that candidate 1’s proportion is greater than candidate 2’s”. Or “there is a 95% probability that the two candidates’ proportions differ by less than ##a##”.

Scientifically, it is much more natural than the corresponding standard frequentist quantities.
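Statements of that kind can be computed directly from posterior samples. A minimal sketch, assuming a single poll with made-up counts (210 for candidate 1, 170 for candidate 2, 20 undecided), a uniform Dirichlet prior over the three proportions (so the posterior is Dirichlet(1 + counts), sampled here via normalized gamma draws from Python's `random.gammavariate`), and 0.05 standing in for ##a##. `posterior_draws` is a hypothetical helper.

```python
import random


def posterior_draws(counts: list[int], draws: int = 20_000, seed: int = 2) -> list[list[float]]:
    """Draw category proportions from a Dirichlet(1 + counts) posterior
    (uniform prior) using normalized gamma variates."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        g = [rng.gammavariate(1 + c, 1.0) for c in counts]
        total = sum(g)
        out.append([x / total for x in g])
    return out


# Made-up poll: 210 for candidate 1, 170 for candidate 2, 20 undecided.
samples = posterior_draws([210, 170, 20])
p1 = sorted(s[0] for s in samples)
lo, hi = p1[int(0.025 * len(p1))], p1[int(0.975 * len(p1)) - 1]
prob_1_ahead = sum(s[0] > s[1] for s in samples) / len(samples)
prob_close = sum(abs(s[0] - s[1]) < 0.05 for s in samples) / len(samples)  # a = 0.05

print(f"95% credible interval for p1: ({lo:.3f}, {hi:.3f})")
print(f"P(p1 > p2 | data) = {prob_1_ahead:.3f}")
print(f"P(|p1 - p2| < 0.05 | data) = {prob_close:.3f}")
```

Each printed quantity is a probability about the parameters conditional on this one data set, which is exactly the kind of statement a frequentist confidence interval does not provide.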
 
  • #14
Dale said:
That is all you can say.
Then what about the 95% CI I worked so hard to compute? Say it's (19, 31), with proportion = 25 and standard deviation = 3. What meanings, statistically, do the numbers 19 and 31, 25 and 3 have? They were computed from my sample data. The answer that seems to fit my intuition is that we are 95% confident that the true proportion lies between 19 and 31. Yet we're cautioned not to interpret that as a 95% probability that the true proportion lies between 19 and 31. There's a difference between confidence and probability and I don't know what that is.
 
  • #15
Agent Smith said:
The answer that seems to fit my intuition is that we are 95% confident that the true proportion lies between 19 and 31.
In frequentist statistics there is a “confidence interval” which has the property already discussed. But there is no assignment of a number to a human’s level of confidence.

Agent Smith said:
There's a difference between confidence and probability and I don't know what that is.
As far as I know, other than a “confidence interval” the unqualified word “confidence” isn’t part of statistics at all. So it is more what you mean by the word.

In frequentist statistics probability is the long-run frequency of some occurrence. It exists in nature, at least in some hypothetical sense.

In Bayesian statistics probability is a human’s subjective assessment of their state of knowledge under uncertainty. It exists in the mind. This may be what you have in mind by “confidence”.
 
  • #16
Agent Smith said:
There's a difference between confidence and probability and I don't know what that is.
Say you've got a jar with 1,000,000 red and blue balls in it and you want to estimate how many red balls there are (analogous to a naive random-sampled poll). You draw 100 balls, of which ten are red. You can generate an estimate of the number of red balls and 95% confidence limits. I repeat the experiment, but in my case ninety are red. Again I can generate an estimate of the number of red balls and 95% confidence limits. Assuming we're both playing fairly, at least one of us has got really (un)lucky in our drawing, but (assuming there are at least ninety red and ninety blue balls) neither result is impossible, and the freakishness of (one of) the results won't become obvious until we compare notes. Your set of limits and mine won't overlap, so if we were to interpret them as "95% probability that the true number of red balls is in these limits" we'd have 190% probability accounted for. So they can't be probabilities, at least not in the naive way I'm stating it.
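The non-overlap is easy to check numerically. A minimal sketch, assuming simple random draws, the normal-approximation interval ##\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}## (no finite-population correction, which is negligible for 100 draws from 1,000,000), and the two outcomes from the example: ten red for you, ninety red for me. `wald_interval` is a hypothetical helper.

```python
import math


def wald_interval(red: int, drawn: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for the proportion of red balls."""
    p_hat = red / drawn
    se = math.sqrt(p_hat * (1 - p_hat) / drawn)
    return p_hat - z * se, p_hat + z * se


JAR_SIZE = 1_000_000
yours = wald_interval(10, 100)  # your draw: 10 red out of 100
mine = wald_interval(90, 100)   # my draw: 90 red out of 100

for label, (lo, hi) in (("yours", yours), ("mine", mine)):
    print(f"{label}: {lo * JAR_SIZE:,.0f} to {hi * JAR_SIZE:,.0f} red balls")
print("overlap:", yours[1] > mine[0])  # prints overlap: False
```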

As @Dale says, in the frequentist interpretation of confidence intervals all it's saying is that if we repeated the experiment many times, 95% of the computed confidence intervals would contain the actual number of red balls. In the Bayesian interpretation, I'm saying that there is a 95% chance that the real value lies within my calculated range given what my data is saying about it, and you're saying the same given what your data says. We can't just add two conditional probabilities, so we can't get to 190% with the italicised caveats properly stated.

We can, of course, pool our data and come up with a combined answer, or do some post hoc meta-analysis of our separate results. I believe the second is what a lot of political commentators do to combine polls from different sources to make election predictions, with all sorts of weightings and offsets to allow for their beliefs about biases or inadequacies in the input polls.
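Both routes can be sketched for the jar example. This is a minimal illustration under assumptions: option 1 pools the 200 draws into one sample; option 2 is a fixed-effect inverse-variance average, which is one simple meta-analysis choice and not necessarily what any polling aggregator does. The helper name is hypothetical.

```python
import math


def wald_interval(red: int, drawn: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for the proportion of red balls."""
    p_hat = red / drawn
    se = math.sqrt(p_hat * (1 - p_hat) / drawn)
    return p_hat - z * se, p_hat + z * se


samples = [(10, 100), (90, 100)]  # (red, drawn) for your draw and mine

# Option 1: pool the raw data as if it were one sample of 200 balls.
red = sum(r for r, _ in samples)
drawn = sum(n for _, n in samples)
pooled = wald_interval(red, drawn)

# Option 2: fixed-effect inverse-variance average of the two separate estimates.
estimates = [r / n for r, n in samples]
weights = [n / (p * (1 - p)) for p, (_, n) in zip(estimates, samples)]  # 1 / variance
est = sum(w * p for w, p in zip(weights, estimates)) / sum(weights)
se_combined = 1 / math.sqrt(sum(weights))

print("pooled:", pooled)
print("inverse-variance:", (est - 1.96 * se_combined, est + 1.96 * se_combined))
```

Both land at 0.5, but the inverse-variance interval comes out narrower because each sample's standard error was estimated at its own extreme ##\hat p## (0.1 or 0.9), and a fixed-effect average does not account for how strongly the two samples disagree.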
 
  • #17
@Dale and @Ibix thank you. I'm reading my statistics notes. Will post and respond once I'm clearer about the concepts involved. Until then ... have an awesome day.
 
  • #18
Agent Smith said:
@Dale and @Ibix thank you. I'm reading my statistics notes. Will post and respond once I'm clearer about the concepts involved. Until then ... have an awesome day.
One final comment about why you shouldn't say "the probability is 95% that p falls [into the given interval]" when you're dealing with frequentist intervals.

Probability in that setting applies to random quantities: p itself is not a random variable; it is assumed to be a fixed number, so it can't "fall into" anything. The randomness comes in at the endpoints of the interval, since those are calculated from sample quantities and a probability distribution. Think of the process of calculating a confidence interval for p as analogous to this:

You're at a county fair, and you've paid $10 (US) to play a game. In front of you is a large number of empty Coke bottles. One is painted red. Your $10 gives you 3 chances to throw a circular ring at the bottles: if your toss lands around the red bottle, you win a $2 stuffed animal. The flight of the ring through the air might be well aimed, but wind patterns and the nature of the universe [not to mention the distraction of your SO pleading with you to win so you can leave with the cute little stuffed blue bear] contribute randomness to the process.

The painted bottle is the parameter, the ring is the confidence interval, and the bottle never falls anywhere. 95% confidence means you can stand at the game and win with approximately 95% of your throws.
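The ring-toss picture can be simulated directly. A minimal sketch, assuming each "throw" (poll) lands the ring's center at a normally distributed distance from the fixed bottle (the true parameter) with a known spread, and that the ring's half-width is 1.96 times that spread; `BOTTLE`, `SIGMA` and `RADIUS` are made-up values. The bottle never moves; only the ring does.

```python
import random

BOTTLE = 0.56          # made-up true parameter: fixed, it never moves
SIGMA = 0.025          # made-up spread of where the ring's center lands
RADIUS = 1.96 * SIGMA  # ring half-width, playing the role of z * standard error


def throw(rng: random.Random) -> bool:
    """One throw = one poll: the ring's center is random, the bottle is not."""
    center = rng.gauss(BOTTLE, SIGMA)
    return center - RADIUS < BOTTLE < center + RADIUS


rng = random.Random(3)
results = [throw(rng) for _ in range(10_000)]
print(results[:5])                  # each single throw is simply a hit (True) or a miss (False)
print(sum(results) / len(results))  # long-run hit rate, close to 0.95
```

Each individual throw either rings the bottle or it doesn't; the 95% describes the long-run hit rate of the throwing procedure, not any single throw.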
 
  • #20
Still not clear, I'm afraid. I have a method for computing 95% confidence intervals. When I say that the 95% confidence interval for the mean m is (a, b), i.e. a < m < b, do I mean ...
1. In 95% of the cases in which I "apply the same method", the interval contains the mean
2. There's a 95% chance that the interval (a, b) contains the mean
3. There's a 95% chance that the mean is in the interval (a, b)

@statdad ☝️
 
  • #21
Agent Smith said:
Then what about the 95% CI I worked so hard to compute? Say it's (19, 31), with proportion = 25 and standard deviation = 3. What meanings, statistically, do the numbers 19 and 31, 25 and 3 have? They were computed from my sample data. The answer that seems to fit my intuition is that we are 95% confident that the true proportion lies between 19 and 31. Yet we're cautioned not to interpret that as a 95% probability that the true proportion lies between 19 and 31. There's a difference between confidence and probability and I don't know what that is.
Either the population mean is inside that interval or it isn't. (I'm not a Bayesian, so I say this value is a constant.) The probability that it lies inside that interval is either 1 or 0.

It is a rather subtle distinction. In the real world the word "probably" is used loosely. You are correct in thinking that if the poll is taken repeatedly and a confidence interval is generated each time, then on average the population mean is expected to lie within the interval in 95% of the cases. See the difference? In this case the interval isn't fixed. It's a variable, a random variable.

This sort of thing arises so routinely that it is more or less assumed that others know it, so IMO the loose use of terminology is harmless. These distinctions are however educational and worth your time to learn.
 
  • #22
statdad said:
The painted bottle is the parameter, the ring is the confidence interval, and the bottle never falls anywhere. 95% confidence means you can stand at the game and win with approximately 95% of your throws.
I once actually did this. On my first toss I won a liter of Bintang beer.
 
  • #23
Agent Smith said:
2. There's a 95% chance that the interval (a, b) contains the mean
3. There's a 95% chance that the mean is in the interval (a, b)
What is the distinction you are making between these two statements?

To me “the cookie is in the jar” is completely equivalent to “the jar contains the cookie”. So I would never write those two things as different cases. So can you clarify the distinction you are making?
 
  • #24
Dale said:
What is the distinction you are making between these two statements?

To me “the cookie is in the jar” is completely equivalent to “the jar contains the cookie”. So I would never write those two things as different cases. So can you clarify the distinction you are making?
I suppose we have to be careful about which one, the jar or the cookie, can vary. 🤔
 
