Polling Margin of Error

  • #1
Vanadium 50
What exactly is a "margin of error" intended to be for a poll?

Is it a one sigma number? A 90 or 95% CL? An educated guess?

As I understand it, this number is reported on each result - i.e. if the poll says Smith and Jones each have 50% support with a 5% MOE, the "true" result can be anywhere between 45-55 and 55-45. So when the pundits say "the difference is less than the MOE" they really mean "less than twice the MOE."

Also as I understand it, polls need to be corrected for over- and undersampling of various subpopulations (e.g. people with cell phones and no landline tend to be undersampled). This correction should form part of the MOE, but it surely is not distributed as a Gaussian. We could even argue about whether it is distributed at all!

When aggregators combine polls, they surely look at (and hopefully weight appropriately by) the MOE. Do they also consider how accurate the MOE has been in the past? If a pollster systematically underestimates their MOE, that certainly does not make it a better poll. And vice versa.
 
  • #2
My understanding is that it is supposed to be a 95% confidence interval. Frequentist analyses would put it at about twice the standard error of the mean.

I think that Bayesian analyses are becoming more common. So I don’t know exactly what Bayesian quantity is reported as the margin of error.
 
  • #3
The 95% confidence is customary. Here is a good discussion.
 
  • #4
FactChecker said:
The 95% confidence is customary
Thanks.

Then what am I to make of two polls that differ by 2x or 3x the margin of error?
 
  • #5
Dale said:
Bayesian analyses
This would be among the last analyses I would treat this way. You are maximally sensitive to your prior. "Nobody I know could possibly vote for Smith!" And as you know, the "flat prior" is a myth.
 
  • #6
At 95% confidence you expect 1 in 20 to differ that much even if everything is done correctly.

Vanadium 50 said:
This would be among the last analyses I would treat this way. You are maximally sensitive to your prior. "Nobody I know could possibly vote for Smith!" And as you know, the "flat prior" is a myth.
Yes, priors are a part of Bayesian statistics. But I doubt polling is particularly more sensitive to the priors than many other applications.
 
  • #7
Dale said:
At 95% confidence you expect 1 in 20 to differ that much even if everything is done correctly
Sure, but if this is a Gaussian and you have a probability of 5% of lying anywhere outside the interval (and I realize this is imprecise language), the probability of lying 2x out is less than 1/10,000, and 3x out less than one in a million.
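For concreteness, here is a quick check of those tail probabilities, under the assumption that the MOE is a 2σ (~95%) Gaussian interval, so 2x and 3x the MOE correspond to 4σ and 6σ:

```python
from math import erfc, sqrt

def two_sided_tail(z):
    """P(|Z| > z) for a standard normal variable."""
    return erfc(z / sqrt(2))

# If the MOE is a 2-sigma interval, k times the MOE is 2k sigma.
for k in (1, 2, 3):
    print(f"{k} x MOE ({2 * k} sigma): P = {two_sided_tail(2 * k):.2e}")
# 1 x MOE (2 sigma): P = 4.55e-02  (about 1 in 20)
# 2 x MOE (4 sigma): P = 6.33e-05  (under 1 in 10,000)
# 3 x MOE (6 sigma): P = 1.97e-09  (under 1 in a million)
```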

Dale said:
But I doubt polling is particularly more sensitive to the priors than many other applications.
Election of 2016 anyone?

But my concern is more fundamental. If I measure a physical quantity N different ways, I am happy to use previous measurements as a prior. If one technique has a systematic shift, it will be automatically deweighted by the priors. If every measurement uses the same technique - i.e. robocalls to land lines - they will all have the same systematic shift, and there's no way to correct for this.
 
  • #8
I think a MoE is supposed to be ##2 \times \text{Standard Error}##, where the standard error in this case would be ##\frac{\sigma}{\sqrt n}## with ##\sigma = \sqrt{p(1-p)}## estimated from the sample. I believe the best-case scenario is when we actually know ##\sigma## (the population standard deviation).
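As a minimal sketch of that calculation for a single poll percentage, assuming simple random sampling (the numbers below are just for illustration):

```python
from math import sqrt

def margin_of_error(p_hat, n, z=2.0):
    """Approximate MOE for a sample proportion: z * sqrt(p(1-p)/n).
    z = 2 (or 1.96) corresponds to the customary ~95% confidence level."""
    return z * sqrt(p_hat * (1.0 - p_hat) / n)

# A 50/50 race with n = 1000 respondents:
print(round(100 * margin_of_error(0.5, 1000), 1))  # about 3.2 (percent)
```

Note that this is only the pure sampling piece; none of the weighting or non-response corrections discussed below are in it.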
 
  • #9
Vanadium 50 said:
Election of 2016 anyone?
The most accurate poll aggregator I know of, Nate Silver, used Bayesian analysis. So, I don't think that actually supports an anti-Bayes stance. He gave a 1 in 3 probability of a Trump victory, based on a full-fledged Bayesian analysis with priors and all of the usual Bayesian machinery. The occurrence of a 1 in 3 event is not evidence of a model failure. In a well calibrated presidential election model it should happen pretty often, once every decade or so.

Apart from the accurate Bayesian poll aggregation, do you have any hint that Bayesian analyzed polls themselves were off by more than non-Bayesian polls?
 
  • #10
Dale, I know you love Bayesian analyses. I do not consider them a "one size fits all" tool as some do. I gave my reasons for not considering them superior, and I don't think writing them again will convince someone who wasn't convinced the first time.

I will say that Bayes lived in the 18th century, and there is still discussion of pros and cons.

Was 2016 an outlier? Sure. Was it a statistical fluctuation that hit poll after poll? We'll never know, but it sure looks more systematic. The fact that pollsters and aggregators have tweaked their methodology in response suggests they think so too.

Dale said:
do you have any hint that Bayesian analyzed polls themselves were off by more than non-Bayesian polls?
Of course not, because "off" means two different things - credibility level vs. confidence level. Is purple louder than sour?

But I don't want to get into a fight on the pros and cons of Bayesian statistics. I am trying to better understand what is intended by "margin of error", not how to improve polling and aggregation.
 
  • #11
As an aside, we look at how good our own measurements were once better ones and better averages come out. Our one-sigma bands tend to be too wide (i.e. we overestimate the error), but our two-sigma bands are too narrow.
 
  • #12
Vanadium 50 said:
Was 2016 an outlier? Sure. Was it a statistical fluctuation that hit poll after poll? We'll never know, but it sure looks more systematic.
Sure, but were Bayesian polls or Bayesian aggregators worse or better? The premier Bayesian aggregator was the best that year.

Vanadium 50 said:
I gave my reasons for not considering them superior, and I don't think writing them again will convince someone who wasn't convinced the first time
Yes. No need to repeat the fact that you don’t like priors. It is a pretty unconvincing argument against Bayesian statistics, and you are right that it won’t become more convincing a second time.

Vanadium 50 said:
But I don't want to get into a fight on the pros and cons of Bayesian statistics. I am trying to better understand what is intended by "margin of error", not how to improve polling and aggregation
Fair enough. I think that your immediate dismissal of Bayesian analysis in this context is unfounded. But I agree that it is not particularly germane to the question of a margin of error.

I have seen detailed statements about polls from their publishers before. These are the more scientific descriptions of the methodology, compared to what gets reported. I will see if I can find one of those; maybe it is clear about the meaning of the margin of error.
 
  • #13
Public opinion polls are not based on random samples, which is impractical in this case. They use stratified sampling, where pollees are selected to attempt to model those people who will actually vote. I suspect the poor 2016 predictions were due to unanticipated greater than usual participation by certain groups.

Other factors include alteration of election rules to boost participation by favored groups. Voter turnout in 2020 increased by a phenomenal 22 million voters.
 
  • #14
Vanadium 50 said:
Thanks.

Then what am I to make of two polls that differ by 2x or 3x the margin of error?
That would happen if the sample size differs. Bigger samples cost more.

Bigtime candidates don't trust public polls. They pay for their own.
 
  • #15
It's not that I don't like priors. It's that I don't like sensitivity to priors. If you try multiple priors and your result barely moves, I am much happier than if small changes in the prior make a large change in the outcome. I've seen both.

On to the topic at hand, the thing I am really wrestling with is using these values in computations. (Which I suppose could include calculating priors for subsequent Bayesian analyses.) That requires more understanding than "maybe 50% is really 45%". Correlations and corrections make a big difference here: if the correction is much larger than the margin of error, how much is this a measurement and how much an estimate (albeit by professionals)? If the polls move, is the opinion changing or is the sample changing? Are outliers such because of the data, or because of their corrections? And so on.
 
  • #16
Vanadium 50 said:
Thanks.

Then what am I to make of two polls that differ by 2x or 3x the margin of error?
That is a good question. It's a complicated situation. IMO, it is dangerous to compare two polls that probably use different methods. Any one poll should treat all the alternatives carefully so that the comparison among them is valid. I would be less confident that two different polls can be compared. For instance, suppose one poll includes an "Undecided" category and the other does not. Or, in political polls of likely voters, one poll might be more likely to classify certain people as unlikely to vote. I think trying to compare two polls opens a can of worms.
 
  • #17
Vanadium 50 said:
It's not that I don't like priors. It's that I don't like sensitivity to priors. If you try multiple priors and your result barely moves, I am much happier than if small changes in the prior make a large change in the outcome. I've seen both
Agreed. I simply haven’t seen any indication that polling is an application that is unusually sensitive to the priors.

Vanadium 50 said:
Correlations and corrections make a big difference here
The correlations especially. A typical assumption is that the responses are independent and identically distributed (or rather that the residuals are). That assumption is demonstrably false, and accounting for it is really challenging. Both Bayesian and frequentist methods are affected by this.
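One standard way this kind of thing gets folded in (not necessarily what any particular pollster does) is a "design effect" that inflates the simple-random-sampling variance to account for clustering or correlated responses:
$$\operatorname{Var}(\hat p) \approx \text{DEFF} \times \frac{p(1-p)}{n}, \qquad \text{DEFF} = 1 + (m-1)\rho,$$
where ##m## is the average cluster size and ##\rho## the intra-cluster correlation. A positive ##\rho## widens the honest margin of error relative to the naive ##\sqrt{p(1-p)/n}##.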
 
  • #18
Hornbein said:
Public opinion polls are not based on random samples, which is impractical in this case. They use stratified sampling, where pollees are selected to attempt to model those people who will actually vote.
Correct, and part of this question is to better understand how this is incorporated into the margin of error.
Hornbein said:
suspect the poor 2016 predictions were due to unanticipated greater than usual participation by certain groups.
Not everyone agrees with that.
Hornbein said:
Bigtime candidates don't trust public polls. They pay for their own.
True, but a) I don't care what a poll I never see says, and b) the same understanding of what the numbers mean should apply whether I see them or not.
 
  • #19
Maybe being less abstract will help. There are at least three different components to the polling uncertainty. There is a pure statistical uncertainty, which I will call x. There is a common systematic uncertainty from the common corrections due to sampling (robocalls), which I will call z. And finally there is an uncertainty on z, which I will call y, from deviations between pollsters' corrections: e.g. one poll calls people in the morning and one at night.

You would like x to dominate, because then your margin of error is simple to calculate, and more importantly, statistics tells us how this variable behaves in combination and calculation.

If you sample 1000 people in a close race, the 2σ margin of error is 3.2%. That's not much smaller than the polls' stated margins of error, so they are implicitly telling us x is large compared to y + z.

z is likely a large correction, but what matters is not the size of the shift, but rather its uncertainty. And since every pollster does pretty much the same thing, this is surely well understood. So I am prepared to believe it is small, subject to the proviso that if the pollsters get this wrong, they get it wrong for all the polls together.

Fixing this is not easy. If instead of robocalls we put surveys in packages of baby food, polling would be slower, more expensive, and not unbiased, only differently biased.

That leaves y, which is tough, in part because it depends on factors you can't or didn't control. One hopes it is small. (There is a famous pediatrics result that was just refuted because they missed a correlation between experienced doctors and sicker patients.)

Now, what facts do we have to refute the idea that x dominates?

(1) We have polls that are outliers, at p-values we should never see.
(2) Sometimes changes in the race impact different polls substantially differently: if Smith promises free ice cream and one poll has her up 1x the margin of error (2σ) and another similar poll 4x (8σ), we can all agree that this is popular, but maybe we're not so sure about how many people like Smith. I think it would lend confidence (or credibility...that was a joke, Dale!) if the polls moved in lockstep as the race changed.

I am not saying "all polls are bunk", as some do. But I am saying that it is quite difficult to assess how seriously to take them, especially with outliers, and I am hoping to learn to do this better.
 
  • #20
Vanadium 50 said:
Correct, and part of this question is to better understand how [stratified sampling] is incorporated into the margin of error.
Stratified sampling is such a large subject. The classic text is "Sampling Techniques" by Cochran.
This link is to a short excerpt pdf that discusses the estimated variance and confidence limits.
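For readers without the book, the basic stratified estimator it builds up to (in the usual notation, with stratum weights ##W_h = N_h/N##, stratum sample sizes ##n_h##, and stratum sample variances ##s_h^2##) is roughly
$$\bar y_{st} = \sum_h W_h \bar y_h, \qquad \widehat{\operatorname{Var}}(\bar y_{st}) = \sum_h W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h},$$
with confidence limits of the form ##\bar y_{st} \pm t \sqrt{\widehat{\operatorname{Var}}(\bar y_{st})}##.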

PS. If I knew how valuable that book would become, I would not have given it away when I retired. ;-)
 
  • #21
You can see from that text you will have problems when you chop the data up too finely, even before considering biases. You end up with factors of ##1/n_{\text{stratum}}## in places where you had ##1/n_{\text{total}}##.

Considering subsample biases will make the variance go up, and not down.

The latest CNN poll has N=2074 and a stated margin of error of 3.0%. It's already hard to reconcile those two numbers, especially at 2σ. It's certainly not the binomial error. But even assuming this is 1σ and the error I call x is ##\sqrt{Np}##, this says that the systematic terms have negligible (and indeed, slightly negative!) impact on the total uncertainty. This sounds implausible.
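A quick check of those numbers, assuming an even split (this is only the back-of-the-envelope statistical piece, not the pollster's actual weighted calculation; the ##\sqrt{Np}## term is read as the relative error on one candidate's count):

```python
from math import sqrt

N = 2074          # stated sample size
p = 0.5           # assume a roughly even split
stated_moe = 3.0  # stated margin of error, in percent

binomial_se = 100 * sqrt(p * (1 - p) / N)  # simple binomial error on the proportion
count_error = 100 / sqrt(N * p)            # sqrt(Np)/(Np): relative error on one candidate's count

print(f"binomial:    1 sigma = {binomial_se:.2f}%, 2 sigma = {2 * binomial_se:.2f}%")
print(f"sqrt(Np):    1 sigma = {count_error:.2f}%")
print(f"stated MOE = {stated_moe}%")
# binomial:    1 sigma = 1.10%, 2 sigma = 2.20%
# sqrt(Np):    1 sigma = 3.11%
# stated MOE = 3.0%
```

Even the generous reading, treating the stated 3.0% as 1σ and x as the full ##1/\sqrt{Np}## counting error, already comes out at about 3.1%, which is the "slightly negative" systematic contribution referred to above.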
 
  • #22
Vanadium 50 said:
Maybe being less abstract will help. There are at least three different components to the polling uncertainty. There is a pure statistical uncertainty, which I will call x. There is a common systematic uncertainty from the common corrections due to sampling (robocalls), which I will call z. And finally there is an uncertainty on z, which I will call y, from deviations between pollsters' corrections: e.g. one poll calls people in the morning and one at night.
If I understand correctly, z and y are non-random biases due to methodological features of the poll, where z is one that is common to many or most polls (or pollsters) and y is one that is specific to a given poll (or pollster).

Vanadium 50 said:
z is likely a large correction, but what matters is not the size of the shift, but rather its uncertainty. And since every pollster does pretty much the same thing, this is surely well understood. So I am prepared to believe it is small, subject to the proviso that if the pollsters get this wrong, they get it wrong for all the polls together.

Fixing this is not easy. If instead of robocalls we put surveys in packages of baby food, polling would be slower, more expensive, and not unbiased, only differently biased.
This is a big issue. One thing that can be done is to focus on the changes in polling results. Even if there is some bias, as long as that bias is consistent from yesterday to today, changes can be meaningful.

There is one other possibility, but it is problematic. Polls are an attempt to measure opinions. They are not an attempt to predict behavior. But in election years that is what they are (mis) used for. However, insofar as you are willing to (mis) use polls as predictions of behavior, you can get a bit of feedback on the magnitude of the bias. It is a small amount of feedback.

Vanadium 50 said:
Now, what facts do we have to refute the idea that x dominates?
We should be careful. That idea is not an idea claimed by the pollsters themselves as far as I can tell. I think that this idea is more of a vague impression by the public. It does need to be refuted, but in the sense that the GR bowling ball on a rubber sheet needs to be refuted.

One well respected and prolific pollster is SurveyUSA. Their methodology is described here:

https://www.surveyusa.net/methodology/

They say:

Though commonly cited in the presentation of research results, “sampling error” is only one of many types of error that may influence the outcome of an opinion research study. More practical concerns include the way in which questions are worded and ordered, the inability to contact some, the refusal of others to be interviewed, and the difficulty of translating each questionnaire into all possible languages and dialects. Non-sampling errors cannot be quantified

So the pollsters themselves (at least the high quality ones) recognize that there are other sources of error besides your x.

Vanadium 50 said:
I think it would lend confidence (or credibility...that was a joke, Dale!)
Excellent! I am Dale and I approve this joke.
 
  • #23
Vanadium 50 said:
You can see from that text you will have problems when you chop the data up too finely, even before considering biases.
If you are not grouping the subsample categories wisely, there is no reason to use stratified sampling.
Vanadium 50 said:
You end up with 1/N(sample size) in places where you had 1/N(total).
That is true. The subsamples need to be clustered around the subsample mean to reduce the subsample variance, even with the smaller subsample size.
Vanadium 50 said:
Considering subsample biases will make the variance go up, and not down.
Where stratified sampling is used wisely, that is not the case.
 
  • #24
Suppose you have a sample from two groups of equal sizes, one clustered closely around 100 and the other clustered closely around -100. By grouping the subsamples, you have two small subsample variances. The end result will be smaller than if you ignored the groups and had a lot of large ##(x_i-0)^2 \approx 100^2## terms to sum.
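A small simulation of that picture, with made-up numbers (two equally sized groups centred at ±100, a within-group spread of 5, and known 50/50 stratum weights), just to show the effect on the spread of the estimated mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 10_000

def one_estimate(stratified):
    if stratified:
        # Sample n/2 from each known group and combine with the known 50/50 weights.
        a = rng.normal(100, 5, n // 2).mean()
        b = rng.normal(-100, 5, n // 2).mean()
        return 0.5 * a + 0.5 * b
    # Simple random sample: each respondent comes from either group with probability 1/2.
    group = rng.integers(0, 2, n)
    x = np.where(group == 1, rng.normal(100, 5, n), rng.normal(-100, 5, n))
    return x.mean()

srs_spread = np.std([one_estimate(False) for _ in range(reps)])
strat_spread = np.std([one_estimate(True) for _ in range(reps)])
print(f"spread of the SRS mean:        ~{srs_spread:.2f}")    # roughly 10, dominated by the +/-100 split
print(f"spread of the stratified mean: ~{strat_spread:.2f}")  # roughly 0.5, only the within-group spread
```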
 
  • #25
After some thinking, I concluded that a poll can beat √N. Sort of.

Suppose East Springfield is known to vote 100% for Smith. Now you don't have to poll them - you know the answer. The margin of error is driven not by the total, but by the sample from West Springfield.

The problem is that this is only as good as the assumptions, and if you put enough in, it becomes more "poll-influenced modeling" than polling. That may be a good thing, but it is not the same good thing as "polling".

Dale said:
They are not an attempt to predict behavior. But in election years that is what they are (mis) used for.
Fundamentally, they are all built on a lie. "If the election were held today". But yes, they are used as predictors. Despite some spectacular failures like "Dewey Defeats Truman". They are the worst tools except for all the others.

I find betting odds to be interesting. They involve real money, so the incentives are different. They ask the question "what do you think will happen?", which is a different question than "what do you want to happen?". They respond to events much faster than polls. Finally, they are illegal in the US for US elections, so you are getting an interestingly selected sample. I would not say these are more useful than polls, but they do provide different information.

More interestingly, they don't always agree with each other. This violates the Law of One Price, which opens up the possibility of arbitrage.
 
  • #26
Vanadium 50 said:
Fundamentally, they are all built on a lie. "If the election were held today". But yes, they are used as predictors.
It isn’t a lie. It is a counterfactual. Unlike electrons, humans can have definite opinions on counterfactuals.
 
  • #27
Sure...but the answer you get from that question when you are done processing is "If the election were held last week...." :smile:
 
  • #28
FactChecker said:
Suppose you have a sample from two groups of equal sizes,
But the variance of the total sample does not change by dividing it into subsamples, and so the uncertainty on the mean does not go down by dividing it. The exception is when you know a priori that the samples are exactly equal size (actually, you only need to know the relative mix better than you can count it, but I am sure you understand what is meant).
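To put the same point in formulas: with known stratum weights ##W_h##, the population variance decomposes as
$$\sigma^2 = \underbrace{\sum_h W_h \sigma_h^2}_{\text{within strata}} + \underbrace{\sum_h W_h (\mu_h - \mu)^2}_{\text{between strata}},$$
and stratified sampling with known ##W_h## removes the between-strata piece from the variance of the estimated mean. If the ##W_h## themselves have to be estimated from the sample, that advantage largely goes away.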

I don't think anyone disagrees with the idea that you can reduce the uncertainty by incorporating information apart from the poll itself. I think the question is at what point you are no longer doing polling. There's a famous (mis)quote from the election of 1972: "Nobody I know voted for Nixon". And that's true; from Manhattan you had to go quite a way to find somewhere Nixon won. All the way to Queens or Staten Island.

Factoring in "what everybody knows" is a two-edged sword. Maybe three.
 
  • #29
Vanadium 50 said:
Factoring in "what everybody knows" is a two-edged sword. Maybe three.
Ignoring what is known is also problematic.

The problem you are getting at is distinguishing between what is known and what is erroneously believed to be known.
 
  • #30
Vanadium 50 said:
But the variance of the total sample does not change by dividing it into subsamples, and so the uncertainty on the mean does not go down by dividing it.
It does reduce the uncertainty of the mean if you know what proportion of the distribution is in each stratum.
Vanadium 50 said:
The exception is when you know a priori that the samples are exactly equal size (actually, you only need to know the relative mix better than you can count it, but I am sure you understand what is meant)
Equal size is not the point. You would prefer a sample that most closely reflects the population proportions. You can make appropriate adjustments if the proportions in the sample are different from the population proportions. That is beneficial if one category is more difficult or expensive to get a sample from, which is a very common problem.

Stratified sampling is a well-established statistical approach. Any reputable polling organization uses it extensively.
 
  • #31
I think we're saying similar things - we can beat √N by replacing counting with pre-existing knowledge. When we buy a dozen eggs, there is no √12 uncertainty.

The problem - or a problem - comes up, as Dale (and Mark Twain before him) pointed out, when the things we "know" turn out to be incorrect. "I don't have to poll North Springfield because Jones has it all in the bag". Well, what if she doesn't? And how would you know?

At some point, you are crossing the line between corrected polling and poll-inspired modeling. Which means at some point you are no longer quoting a statistical estimate of uncertainty but an expert's estimate.

Farther along that path and we're into the realm of fortunetelling. I don't think we are there yet, but it would be a pity if we got there someday.
 
  • #32
Vanadium 50 said:
"I don't have to poll North Springfield because Jones has it all in the bag". Well, what if she doesn't? And how would you know?
I don't think that the modeling errors are of this type. It is a lot more subtle. It is more like: in the US census, 20% of the population of Somewhereville has less than 4 years of college. Only 15% of my Somewhereville sample has less than 4 years of college, so I need to correct for my undersampling of the less-than-4-years-college population. But what I don't realize is that it is now actually 25%, so my correction isn't large enough.
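A toy version of that, with made-up support numbers (say the undersampled group backs Smith at 60% and everyone else at 40%; both figures are purely hypothetical), just to show how the reweighting works and why a stale census share leaves a residual bias:

```python
def overall_support(group_share, support_group=0.60, support_rest=0.40):
    """Overall Smith support for a given share of the low-college group.
    The 60%/40% support levels are invented purely for illustration."""
    return group_share * support_group + (1 - group_share) * support_rest

raw       = overall_support(0.15)  # what the unweighted sample says (15% of respondents)
corrected = overall_support(0.20)  # reweighted to the (stale) 20% census figure
truth     = overall_support(0.25)  # what you would get with the actual 25% share

print(round(raw, 3), round(corrected, 3), round(truth, 3))
# 0.43 0.44 0.45 -- the correction moves in the right direction but undershoots
```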
 
  • #33
Yes, but the issues are exposed by looking at the limiting cases. I think we can all agree that, for example, in the US Presidential Election of 2024, polling Pennsylvania heavily tells you more than polling Hawaii or Utah.

Further, if your goal is to beat √N, you have to spend your counting on what you need to count rather than on what you (believe you) don't. Otherwise your error doesn't go down.
 
  • #34
Vanadium 50 said:
Yes, but the issues are exposed by looking at the limiting cases.
I don't think that "I don't have to poll North Springfield because Jones has it all in the bag" is a limiting case of anything that high quality pollsters actually do. There is a difference between a limiting case and a strawman.
 
  • #35
Vanadium 50 said:
Yes, but the issues are exposed by looking at the limiting cases.
I don't believe that they make naive mistakes. Their analysis is far more sophisticated than I will ever understand.
Vanadium 50 said:
I think we can all agree that, for example, in the US Presidential Election of 2024, polling Pennsylvania heavily tells you more than polling Hawaii or Utah.

Further, if your goal is to beat √N, you have to spend your counting on what you need to count rather than on what you (believe you) don't. Otherwise your error doesn't go down.
Some things can be determined with good accuracy: the percentages of people in a state in certain categories of age, education, wealth, internet connection, smartphone ownership, rural versus urban residence, income level, etc. Those have a strong influence on voting trends. They should not be ignored. Stratified sampling can take that into account (within reason and limits). It's good to know the characteristics of your sample. Even then, there is a lot of uncertainty.
 
