Why do polls keep failing everywhere?

Polls have failed us again. Once upon a time, we could rely on the latest published numbers to understand how an election or a referendum could affect our lives. Those days seem to be gone. 2016 in particular appears to have been catastrophic for pollsters around the world, with the obvious examples of Brexit and Colombia still in our minds, along with the latest results from November 8th. Well, those last ones weren’t really that bad at all: the aggregate polls were fairly accurate at the national level, forecasting a nationwide advantage for Hillary Clinton of around 2 points [1]. Clinton will probably end up with an edge in the popular vote of about 1 to 1.5 points nationally, which is in fact a very accurate result for polls. Still, there were some missteps in the estimates, particularly in certain regions of the United States. Polls understated Trump’s margin by around 4 points in Midwestern states [2]. In many of these states, including Pennsylvania, Michigan and Wisconsin, Clinton was expected to win, yet it was the now president-elect who took them.


Now, after the third major polling fiasco in the same year, I would expect to see endless pages being written about improvements to polling methodology. So far, as usual, the discussion lacks the essentials and has centred mostly on how we, poor pollsters, have been fooled by the voters. The team behind the USC Dornsife/Los Angeles Times poll explained to FiveThirtyEight that: “Women who said they backed Trump were particularly less likely to say they would be comfortable talking to a pollster about their vote” [2]. Other explanations involve Trump’s voters being mistrustful of institutions and government, with polls apparently being part of the system; or simply a ‘late shift’ in preferences that changed the whole election in a matter of 36 hours.

I am a pollster myself, and after a few hundred surveys I have come to understand some of the factors that affect our estimates. For that reason, I cringe every time I see debates about polling errors derailing into endless psychological discussions of ‘shy voters’, ‘lying majorities’, ‘late voting surges’ or whatever name appears to explain what went wrong with the polls. In my experience, the real culprit behind polling errors is not within voters’ minds, but within pollsters’ methodologies: I have seen many, many times how unrepresentative samples are selected for polls, using outdated techniques and data collection strategies that simply do not work. When pollsters fail, we end up discussing how to frame questions or how to peer into the secrecy of voters’ minds, yet I hardly ever see a discussion on how to properly use statistical sampling techniques. And I am neither the first, nor the most qualified, person to raise that point.

In 2015 the polling industry witnessed one of its most prominent disasters to date: the U.K. general election. For those who missed the event, the polling average in the country had the Labour party and the Conservatives on an even vote share on the night before the election. This happened to be not just the average of the polls but the consensus: nearly every pollster’s final poll placed the two parties within one percentage point of each other. One night later, when the final results rolled in, the margin was above 6 points in favour of the Conservatives [5]. The errors were big, systematic and affected almost every pollster in the country: a catastrophe on every level.

Just a few days later, the British Polling Council, in collaboration with the Market Research Society, announced a wholly independent inquiry under the chairmanship of Professor Patrick Sturgis of the University of Southampton [3]. The final report was presented in March of this year, perhaps a bit too late to prevent Brexit and the other recent polling catastrophes. Still, the BPC report is, in my view, one of the most important reviews of polling methodology, and I would encourage everyone interested in political surveys to take a look at the text [4]. Of course, I would also hope all my colleagues review in depth the 120 pages the council presented, as the inquiry reached quite a staggering conclusion: statistics are not being well used by pollsters. The committee summarized it this way [4]:

“Our conclusion is that the primary cause of the polling miss in 2015 was unrepresentative samples. The methods the pollsters used to collect samples of voters systematically over-represented Labour supporters and under-represented Conservative supporters. The statistical adjustment procedures applied to the raw data did not mitigate this basic problem to any notable degree. The other putative causes can have made, at most, only a small contribution to the total error.”

Samples weren’t the only concern studied by the report. The inquiry reviewed over half a dozen factors that could have affected the results of the different polls conducted during the election. No mishandling of the data was detected, a ‘late swing’ in preferences was deemed modest at most, and deliberate misreporting was also ruled out as a contributory factor for the polling errors [4]. So the favourite excuses pollsters reach for the day after an election were all discarded.

For instance, question wording and framing, one of the favourite discussions among pollsters, proved to be an insignificant source of error [4]. After the U.K. polling miss, it was suggested that pollsters could have achieved better estimates simply by changing the wording or the order of the voting preference question. This argument is closely related to the assumption that there was a ‘shy Tories’ problem in British polling: Conservative voters were less willing to admit to intending to vote Conservative. One and a half years later, we are hearing exactly the same thing again, except that now the ‘shy Tories’ have become ‘shy Trumpies’.

One of the main instruments the British Polling Council inquiry used was data from the 2015 British Election Study (BES) and the British Social Attitudes survey (BSA). Although these surveys were conducted after the election, they were designed to give direct information on vote choice by respondents who are known to have voted. More importantly, both surveys use probability sampling rather than the quota sampling preferred by most pollsters. The reason these studies became a good benchmark for comparison is that both produced good post-election estimates of the difference in vote shares between the Conservatives and Labour, far better estimates than the ones presented in the media.

These probability surveys provided the best evidence against question framing, wording and ‘shy whatevers’ affecting the outcome of the polls. The BES, in particular, included an experiment which manipulated the placement of the vote intention question within the survey: it was placed either at the beginning, after a ‘most important issue’ question, or towards the end. In the end, the proportion of Conservative vote was independent of the ordering, providing strong evidence against the idea that the order of the questions was to blame for the underestimation of the Conservatives’ share of the vote.
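
As a rough illustration of how such an experiment is analysed (the counts below are hypothetical, not the actual BES figures), one could tabulate the Conservative share under each placement of the question and test for independence with a chi-square test:

```python
# Minimal sketch (not BES data): testing whether the Conservative share is
# independent of where the vote-intention question was placed.
# The counts below are purely illustrative.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: question placement (start, after 'most important issue', end)
# Columns: Conservative vs. other/none
observed = np.array([
    [340, 660],   # placement 1 (hypothetical counts)
    [355, 645],   # placement 2
    [348, 652],   # placement 3
])

chi2, p_value, dof, expected = chi2_contingency(observed)
shares = observed[:, 0] / observed.sum(axis=1)

print("Conservative share by placement:", np.round(shares, 3))
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3f}")
# A large p-value is consistent with the BES finding: the estimated share
# does not depend on where the question appears in the survey.
```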

At some point, it was also argued that the vote intention distribution might be better estimated using a question which emphasized the respondent’s local constituency rather than the national race. This was in fact argued at one point by the FiveThirtyEight team, who failed to predict the outcome of that election [5]. The basic idea was that question a), “If there was a general election tomorrow, which party would you vote for?”, was far worse than question b), “Thinking specifically about your own parliamentary constituency at the next General Election and the candidates who are likely to stand for election to Westminster there, which party’s candidate do you think you will vote for in your own constituency?” I may need to improve my English, as I find question b) incredibly hard to understand, yet some pollsters agreed that this was the best way to ask. Thankfully, the BES panel also randomized respondents to receive either the standard vote intention question or the constituency-specific question. The British Polling Council reports that the effect was in fact the opposite: the standard vote intention question, question a), exhibited a higher proportion of Conservative vote than the constituency-specific question, i.e. in the end it had the higher predictive power. As such, the wording, framing and ordering of the questions were not notable contributors to polling error this time [4]. In my view, the explanation is rather simple: although ordering and wording do have an effect on survey estimates, pollsters deal with them all the time. The ordering and framing of the voting intention questionnaire is pretty much standard by now and it shouldn’t be a concern for most pollsters.
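
For the two-question comparison, a minimal sketch of the same kind of check might look like this; the counts and sample sizes are purely illustrative and the helper function is my own, not anything taken from the report:

```python
# Minimal sketch (hypothetical numbers, not BES data): comparing the
# Conservative share elicited by the standard question a) versus the
# constituency-specific question b) in a randomized split.
import math

def diff_in_proportions(x_a, n_a, x_b, n_b, z=1.96):
    """Estimate p_a - p_b with a normal-approximation 95% CI."""
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)

# Illustrative counts of respondents choosing the Conservatives in each arm
diff, ci = diff_in_proportions(x_a=370, n_a=1000, x_b=340, n_b=1000)
print(f"share(a) - share(b) = {diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
# A positive difference mirrors the report's finding that the standard
# question a) produced the higher (and more accurate) Conservative share.
```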

What about ‘shy Tories’? The report states it is unlikely that deliberate misreporting was a contributing factor in the polling miss. We pollsters tend to be prepared for this, as most voting intention questions include “Don’t know”, “Undecided” and “Don’t want to tell” as response options. Why would someone deceitfully declare a vote for Hillary when they wished to vote for Trump, if all they had to do was say “I don’t want to tell you” or “I’m not sure”? Interviewers are also trained to avoid any form of conflict or pressure that could affect this outcome. Moreover, re-contact probability polls showed no changes in vote intention. And the strongest evidence against this theory also came from the BES and BSA surveys, which showed no signs of deliberate lying, even after the election. If voters were so embarrassed about supporting the Conservatives before the election, then it would be expected that they would also lie to pollsters to conceal the vote they cast after the election, if not to the same extent, at least to some degree. Still, no evidence of this was found [4].

And after thorough consideration of the many different factors that could have affected the election outcome, the report concludes that sampling and weighting, the two aspects of the methodology that depend most heavily on statistical expertise, turned out to be the ones that contributed most to the polling errors.
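
To make the distinction concrete, here is a minimal sketch of the kind of statistical adjustment the report refers to: post-stratification weighting, where each respondent is weighted by the ratio of their cell’s population share to its sample share. The numbers are invented for illustration; the point is that such weights can only correct imbalance on the variables used to build them, not selection bias operating inside each cell.

```python
# Minimal sketch of post-stratification weighting (illustrative numbers only):
# each respondent gets weight = population share of their cell / sample share.
# This corrects imbalance on the weighting variables (here, age group) but
# cannot fix selection bias operating *within* those cells.
import pandas as pd

# Hypothetical sample: age group and stated vote intention (1 = Conservative)
sample = pd.DataFrame({
    "age_group": ["18-34"] * 400 + ["35-54"] * 350 + ["55+"] * 250,
    "con_vote":  [0] * 280 + [1] * 120      # 30% Con among 18-34
               + [0] * 210 + [1] * 140      # 40% Con among 35-54
               + [0] * 125 + [1] * 125,     # 50% Con among 55+
})

# Known population age distribution (e.g. from the census)
population_share = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}

sample_share = sample["age_group"].value_counts(normalize=True)
sample["weight"] = sample["age_group"].map(
    lambda g: population_share[g] / sample_share[g]
)

raw = sample["con_vote"].mean()
weighted = (sample["con_vote"] * sample["weight"]).sum() / sample["weight"].sum()
print(f"raw estimate = {raw:.3f}, weighted estimate = {weighted:.3f}")
```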

The polls that performed best before and after the election were those that relied on probability sampling [4]. These surveys used multi-stage sampling, and stratification was also standard. Multiple contact attempts were made to reach the sampled individuals, and sample substitutions were not permitted. Nate Silver, if you are reading this, sampling methodology might be a better indicator of how to weight polls, according to these results. Even when compared with other studies performed after the election took place, surveys conducted with probability sampling provided more accurate estimates [4].
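
For readers less familiar with these designs, the sketch below shows the general shape of a stratified, two-stage probability sample. The sampling frame, region names and stage sizes are all hypothetical; it is only meant to illustrate selecting areas at random within strata, then individuals at random within areas, with no substitutions.

```python
# Minimal sketch of a stratified, two-stage probability sample (hypothetical
# frame): stratify by region, randomly select primary sampling units (areas)
# within each stratum, then randomly select individuals within each selected
# area. No substitutions: non-respondents are re-contacted, not replaced.
import random

random.seed(42)

# Hypothetical sampling frame: region -> area -> list of individual IDs
frame = {
    region: {
        f"{region}-area{a}": [f"{region}-area{a}-person{i}" for i in range(500)]
        for a in range(20)
    }
    for region in ["North", "Midlands", "South"]
}

AREAS_PER_REGION = 4      # first stage: areas (PSUs) per stratum
PEOPLE_PER_AREA = 25      # second stage: individuals per selected area

sampled = []
for region, areas in frame.items():
    psus = random.sample(sorted(areas), AREAS_PER_REGION)        # stage 1
    for psu in psus:
        sampled += random.sample(areas[psu], PEOPLE_PER_AREA)    # stage 2

print(f"Sample size: {len(sampled)}")  # 3 regions x 4 areas x 25 people = 300
```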

I have never attempted to predict an election without a probability sample, yet that seems to be happening a lot. ‘Quota sampling’, which basically means gathering interviews until pre-set quotas are filled, turned out to be the most prominent data collection strategy employed by pollsters in the U.K. Quotas imply non-probability samples which, as the report concluded, tend to produce biased estimates. Even some polls that described their sampling designs as ‘random digit dialling’ actually used non-probability methods, merely including some element of random number generation at one stage. The poll designers simply didn’t know the difference between a random sample and a non-random one!
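
A small simulation makes the danger plain. Under the assumption (mine, purely for illustration) that willingness to answer a survey is correlated with vote choice, a quota sample that fills its demographic quotas with the easiest-to-reach people is biased even though every quota is met, while an equal-probability sample is not:

```python
# Small simulation (assumed numbers, not real data) of why quota samples can
# be biased even when every quota is met: if willingness to respond is
# correlated with vote choice, filling quotas with the easiest-to-reach
# people skews the estimate, while a simple random sample does not.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

age_group = rng.choice(["young", "old"], size=N, p=[0.5, 0.5])
reachable = rng.random(N) < 0.3                      # easy to contact / willing
# Assumed mechanism: reachable people lean toward party A more than others
p_vote_a = np.where(reachable, 0.55, 0.40)
vote_a = rng.random(N) < p_vote_a
true_share = vote_a.mean()

def quota_sample(n_per_group=500):
    """Fill age quotas using only the easily reached part of the population."""
    idx = []
    for g in ["young", "old"]:
        pool = np.flatnonzero((age_group == g) & reachable)
        idx.append(rng.choice(pool, size=n_per_group, replace=False))
    return np.concatenate(idx)

def random_sample(n=1000):
    """Equal-probability sample from the whole population."""
    return rng.choice(N, size=n, replace=False)

print(f"true share of party A:   {true_share:.3f}")
print(f"quota sample estimate:   {vote_a[quota_sample()].mean():.3f}")
print(f"random sample estimate:  {vote_a[random_sample()].mean():.3f}")
```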

Sadly, pollsters, forecasters and modellers all seem to be unaware of the perils of the sampling designs most polls rely on. And that is why, every single year, the ‘shy voters’ appear again. That is why the questions get reframed every year and why the polling errors keep stacking up: because no one seems to recognize that the mathematical methodology is what can and should be improved. We have spent so much time discussing the psychology of voters, the turnout algorithms and the merits of online polls that we have forgotten to meet the basic requirements of statistics. The British Polling Council report is an invaluable aid for the world of polls, simply because it clearly and unequivocally identifies a lack of statistical sampling expertise as a major setback in political surveys. Polls need to rely more on statistics than on political science or psychology. And maybe, by reinforcing the basics, we can avoid polling misses in the years to come.

References

[1] Campos, Roy. EU: Encuestas y Pronósticos [US: Polls and Forecasts]. El Economista (Nov, 2016)
http://eleconomista.com.mx/columnas/columna-especial-politica/2016/11/11/eu-encuestas-pronosticos
[2] Bialik, Carl et al. The Polls Missed Trump. We Asked Pollsters Why. FiveThirtyEight (Nov, 2016)
http://fivethirtyeight.com/features/the-polls-missed-trump-we-asked-pollsters-why/
[3] Farmer, Ben. Independent inquiry announced into what went wrong with election polls. The Telegraph (May, 2015)
http://www.telegraph.co.uk/news/general-election-2015/11592840/Independent-inquiry-announced-into-what-went-wrong-with-election-polls.html
[4] Sturgis, P., Baker, N., Callegaro, M., Fisher, S., Green, J., Jennings, W., Kuha, J., Lauderdale, B. and Smith, P. Report of the Inquiry into the 2015 British general election opinion polls. London: Market Research Society and British Polling Council (March, 2016)
http://eprints.ncrm.ac.uk/3789/1/Report_final_revised.pdf
[5] Grajales, Carlos. So what can we learn from Nate Silver’s mistakes? StatisticsViews (May, 2015)
http://www.statisticsviews.com/details/news/7945151/So-what-can-we-learn-from-Nate-Silvers-mistakes.html
