Public Opinion Polling for the USA's Election

  • #1
Hornbein
Suppose your polling methods were so good that the only thing you had to worry about was random sampling error. How many sample subjects would you need to be 95% confident that your sample mean was within 1% of the true population mean?

Answer: about ten thousand sample subjects. That, though, would only give you the popular vote, which in the USA doesn't decide the election. What you really want to know is the result of the vote in the seven swing states. Though one could fiddle about to reduce the number, the straightforward method is to have ten thousand subjects in each swing state. That means a total of seventy thousand pollees. Even that might result in no definite conclusion. And in reality your polling methods aren't that close to such a theoretical ideal. So it seems to me that in a close election practical polling can tell you only that the election will be close.
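
For anyone who wants to check the figure, here is a minimal sketch of the usual sample-size calculation. It assumes simple random sampling, a two-way race, the worst-case vote share p = 0.5, and the normal approximation; the function name `required_sample_size` is just an illustrative choice.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n for which the sample share is within `margin` of the
    true share with the given confidence (normal approximation)."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z * z * p * (1 - p) / margin ** 2)

print(required_sample_size(0.01))  # 9604
print(required_sample_size(0.02))  # 2401
```

The margin shrinks like 1/sqrt(n), so halving the margin from 2% to 1% costs about four times as many subjects.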
 
  • #2
Suppose you did have that super poll in the swing states. How accurate would its prediction of the 2020 result have been? The super poll would have predicted a Biden victory 99% of the time, much better than I would have thought. Biden won three states by less than 1%, and Trump needed to win all three of them to be elected. The super poll is often wrong in one or two states, but this doesn't matter. The chance that the super poll is wrong in all three states, thus predicting a Trump victory, is about 1%.

What if the sample size is reduced to 2500? We then have per-state results that are usually within 2% of the true value. Then the super poll in 2020 is correct 92% of the time. Still rather good. There is a peculiar effect: as the sample size decreases, the chance grows that a state that voted for Trump [North Carolina] would wrongly be called for Biden, which increases the probability of an accurate prediction by 1%.
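
A rough sketch of how this kind of number can be computed, assuming a two-way race, the normal approximation, and independent polls in each state (a strong assumption). The vote shares below are made-up illustrative values, not the actual 2020 results, so the output is not meant to reproduce the 99% and 92% figures; the function names are hypothetical.

```python
from math import prod, sqrt
from statistics import NormalDist

def miscall_probability(true_share, n):
    """Chance that a simple random sample of size n puts the true
    winner of a two-way race below 50% (normal approximation)."""
    se = sqrt(true_share * (1 - true_share) / n)
    return NormalDist().cdf(-abs(true_share - 0.5) / se)

def all_miscalled(true_shares, n):
    """Chance that every listed state is miscalled, assuming the
    polls in different states are independent."""
    return prod(miscall_probability(s, n) for s in true_shares)

# Hypothetical winner's two-way shares in three close states (illustrative only).
close_states = [0.501, 0.502, 0.503]
for n in (10_000, 2_500):
    print(n, round(all_miscalled(close_states, n), 3))
```

The smaller sample makes each individual miscall more likely, so the chance of getting all three wrong goes up as well.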
 
  • #3
So while I wrongly thought the electoral college system made prediction more difficult it actually facilitates it. The safe states like Utah don't need to be polled at all, a big savings. In the other states poll until you are fairly sure you have the result, which in most cases isn't that high a cost. Concentrate your resources on the swing states.
 
  • #4
Let's say you have two big states. Anyone who wins both states wins the election. If the states are split then they cancel each other out and have no net effect on who wins. But in both states the vote is close, close enough that your inaccurate polling is no better than a coin flip. Now if you can get a clear prediction of the winner for the other 48 states, then your prediction of the overall winner will be correct 5/8 of the time. Not only that, you can change your strategy so as to be correct 3/4 of the time.
---
The problem is analogous to this. You flip two fair coins. You predict the total heads. If you are correct you win. If not correct you flip one coin. If you are then correct you win, otherwise lose. If you always choose the total heads as 1 then you win 3/4 of the time.

The analogy doesn't seem to me all that obvious, so here's the full story.
--
The boring case : If one candidate wins both big states then they win the election. Your random prediction of the overall winner will be correct half the time.
--
The interesting case: one candidate wins one state and the other candidate bags the other. Let's call the candidates B and C. B wins the first state and C the second. We'll call this result BC. There are four possible predictions : BB, BC, CB, and CC. Only one is correct, but nevertheless in case of result BC your chance of predicting the overall winner is 3/4. Wild, eh? Let's look at the four cases.

Let's say your prediction was BB. Your prediction for these two states is wrong but half the time B wins in the other 48 states anyway so your overall prediction is still correct. 1/2

Suppose your prediction is BC. You win. 1

Let's say your prediction is CB, totally wrong. But you were correct that the two states cancelled one another out so your overall prediction is still correct. 1

Let's say your prediction was CC. Your prediction for these two states is wrong but half the time C wins in the other 48 states anyway so your overall prediction is still correct. 1/2

Each of these four cases has probability 1/4. So in case of result BC your prediction of the overall winner is correct 1/8+1/4+1/4+1/8 = 3/4 of the time!
--
Averaging over the four possible voting results (BB, BC, CB, CC), we get a correct prediction 5/8 of the time: (1/2 + 3/4 + 3/4 + 1/2)/4 = 5/8, even though our predictions were pure chance. This is a lower bound: if the statistics are better than nothing then things only improve.
----
But how about this? You know your polling isn't giving you an edge, so just give up on it. Assume the two states will split and go with the solid polling result from the other 48 states. Then your prediction will be correct 3/4 of the time. If the two states split then you are always correct. If they don't you are right half the time. (1+1/2)/2 = 3/4.
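
The two figures can be checked by brute-force enumeration. This is a minimal sketch assuming each big state is an independent fair coin and the other 48 states are equally likely to favor either candidate, with their result known to the predictor; the names `overall_winner` and `p_correct` are illustrative.

```python
from fractions import Fraction
from itertools import product

CANDIDATES = "BC"

def overall_winner(state1, state2, rest):
    """Sweeping both big states decides it; otherwise the other 48 do."""
    return state1 if state1 == state2 else rest

def p_correct(predictions):
    """Exact chance of naming the overall winner when the two-state
    prediction is drawn uniformly from `predictions` and the result
    of the other 48 states is known."""
    hits = total = 0
    for actual1, actual2, rest in product(CANDIDATES, repeat=3):
        for pred1, pred2 in predictions:
            total += 1
            hits += (overall_winner(pred1, pred2, rest)
                     == overall_winner(actual1, actual2, rest))
    return Fraction(hits, total)

coin_flip = list(product(CANDIDATES, repeat=2))  # BB, BC, CB, CC
always_split = [("B", "C")]

print(p_correct(coin_flip))     # 5/8
print(p_correct(always_split))  # 3/4
```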
 
  • #5
You are assuming that the results for different states are uncorrelated.
 
  • #6
Hornbein said:
So while I wrongly thought the electoral college system made prediction more difficult it actually facilitates it. The safe states like Utah don't need to be polled at all, a big savings. In the other states poll until you are fairly sure you have the result, which in most cases isn't that high a cost. Concentrate your resources on the swing states.
Good point. That is correct if you are interested in the electoral college winner rather than the total vote. If you want to predict the total vote, you can't do that. That may become a significant disadvantage in the near future. Many states have passed an agreement (the National Popular Vote Interstate Compact) under which, once a certain condition is met, they will all appoint electors who represent the winner of the total national vote. The condition is that enough states join the agreement to decide the election outright. A few more states still need to join. When/if that happens, the total national vote will become the deciding factor.

Another way to greatly increase the accuracy of the polling estimates is called the "ratio method". You already know the total Republican versus Democratic votes for each voting location of the last election. By sampling within each voting location, you can estimate the ratio of results this time versus last time. Because of the (assumed) strong correlation between the prior voting and the current one, there is much less variation in those ratio numbers. They can be used to calculate a total vote estimate that is far more accurate.
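
A minimal sketch of the textbook ratio estimator, as one way to read the idea above. All numbers are invented, the function name is hypothetical, and for simplicity it samples a few voting locations rather than voters within every location.

```python
def ratio_estimate(prior_sampled, current_sampled, prior_total):
    """Scale the known prior total by the current-to-prior ratio
    observed in the sampled locations."""
    ratio = sum(current_sampled) / sum(prior_sampled)
    return ratio * prior_total

# Invented numbers: one party's prior votes in every location (known exactly).
prior_votes = [1000, 2000, 3000, 4000]
# Estimated current votes for that party in the two locations we sampled.
sampled = {0: 1100, 2: 3300}

estimate = ratio_estimate(
    [prior_votes[i] for i in sampled],
    list(sampled.values()),
    sum(prior_votes),
)
print(round(estimate))  # 11000
```

When the current vote tracks the prior vote closely, the ratio varies little from location to location, which is why the estimate of the total can be much tighter than one built from the raw sample alone.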
 
  • #7
FactChecker said:
Another way to greatly increase the accuracy of the polling estimates is called the "ratio method". You already know the total Republican versus Democratic votes for each voting location of the last election. By sampling within each voting location, you can estimate the ratio of results this time versus last time. Because of the (assumed) strong correlation between the prior voting and the current one, there is much less variation in those ratio numbers. They can be used to calculate a total vote estimate that is far more accurate.
This would take care of consistent bias in one's samples. We're always wrong in X direction, so subtract that out.

I'm not impressed with "this county has always voted for the winner." Such things work until they don't. There are so many counties that there is bound to be an infallible one or two or three.
 
  • #8
Hornbein said:
I'm not impressed with "this county has always voted for the winner." Such things work until they don't. There are so many counties that there is bound to be an infallible one or two or three.
Good point, but "voting for the winner" is not the point. Voting for Republican or for Democrat is the point. There is no reason to ignore knowledge of the prior vote in voting places. The correlation from one vote to the next is usually strong. It doesn't have to be a rule that is always true for every voting place. A strong correlation in general from one election to the next will significantly decrease the overall variation when the ratio method is used. In the rare case that the correlation is not strong, the result is still valid (considering the associated variation), and those cases are probably so extreme that the race is not close at all.
 
  • #9
I'm getting ##2401## for sample size. 🤔
 
  • #10
A little known swing "state" is NE-2. There are ~600K people, which is, I dunno, 200K voters. 10K is a big chunk of that.

It is a "must win" for one of the candidates.
 
  • #11
Looks like the polls, while tightly clustered, ended up quite a ways off.
 
  • #12
Vanadium 50 said:
Looks like the polls, while tightly clustered, ended up quite a ways off.
Yeah, weird.
 
  • #13
Vanadium 50 said:
Looks like the polls, while tightly clustered, ended up quite a ways off.
I suspect that if you look at the details of any legitimate poll, there will be a significant number of "undecided" and "refused to answer" responses in the samples.
 
  • #14
The BBC poll predicted 47% of the vote for Trump. He got 51%. With such poor accuracy why should anyone pay any attention to such polls?
 
  • #15
Hornbein said:
The BBC poll predicted 47% of the vote for Trump. He got 51%. With such poor accuracy why should anyone pay any attention to such polls?
What poll are you referring to?
Are you referring to the BBC reference to the 538/ABC results? That is a weighted average of a multitude of polls. I am not familiar with how much rigorous analysis goes into these weighted averages of multiple polls. Here is a brief description of their methodology during the Trump/Biden campaign.
In any case, nobody is forcing you to believe the polls.
 
  • #16
Hornbein said:
The BBC poll predicted 47% of the vote for Trump. He got 51%. With such poor accuracy why should anyone pay any attention to such polls?
CA is notoriously slow in counting their results so Trump’s percentage will fall.
 
