Chris Hanretty’s election forecasting analysis is being used by BBC Newsnight in the run-up to the general election. He appeared on the programme on 5th January 2015 to explain the principles behind it (http://www.bbc.co.uk/iplayer/episode/b04xtkzl/newsnight-05012015). In this post, he explains how his analysis is built on the work of others.
One characteristic which academia shares with rap music (and occasionally with house music) is the care it takes to give proper credit. The forecasting site that I’ve built with Ben Lauderdale and Nick Vivyan, and which is featured in tonight’s edition of Newsnight, wouldn’t have been possible without lots of previous research. I’ve put some links below for those who want to follow up some of the academic research on polling and election forecasting.
(1) “Past elections tell us that as the election nears, parties which are polling well above the last general election… tend to drop back slightly”.
In the language of statistics, we find a form of regression toward the mean. We’re far from the first people to find this pattern. In the British context, the best statement of this tendency is by Steve Fisher, who has his own forecasting site. Steve’s working paper is useful for more technically minded readers.
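To make regression toward the mean concrete, here is a minimal sketch rather than the model used on any forecasting site. It blends a party's current poll share with its share at the last general election. The function name, the figures, and the weight of 0.75 are all invented for illustration; in a real forecast the weight would be estimated from past elections and would change as polling day approaches.

```python
def shrink_toward_last_election(poll_share, last_election_share, weight_on_polls):
    """Blend the current poll share with the last election share.

    weight_on_polls is a hypothetical tuning value between 0 and 1:
    1.0 trusts the polls completely, 0.0 ignores them.
    """
    if not 0.0 <= weight_on_polls <= 1.0:
        raise ValueError("weight_on_polls must be between 0 and 1")
    return (weight_on_polls * poll_share
            + (1.0 - weight_on_polls) * last_election_share)


# Invented example: a party polling at 40% that won 30% last time.
forecast = shrink_toward_last_election(40.0, 30.0, 0.75)
print(f"Forecast share: {forecast:.1f}%")
```

A party polling above its previous result gets a forecast pulled back down, and a party polling below it gets pulled back up, which is the pattern described in the quotation.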
(2) “…use all the polling data that’s out there…”
As Twitter constantly reminds us, one poll does not make a trend — we need to aggregate polls.
Most political scientists who aggregate polls are following in the footsteps of Simon Jackman, who published some very helpful code for combining polls fielded at different times with different sample sizes. We’ve had to make a fair few adjustments for the multiparty system, but there’s enough of a link to make it worth a shout out.
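The simplest possible version of poll aggregation is a pooled average that weights each poll by its sample size. The sketch below shows only that. It is much cruder than the published code mentioned above: it ignores when each poll was fielded, assumes simple random sampling, and treats a single party in isolation. The function name and the poll figures are made up for the example.

```python
from math import sqrt


def pool_polls(polls):
    """Pool polls given as (share, sample_size) pairs.

    share is a proportion between 0 and 1. Returns the pooled share and
    its standard error under simple random sampling (an assumption that
    real polls do not strictly satisfy).
    """
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    std_error = sqrt(pooled * (1.0 - pooled) / total_n)
    return pooled, std_error


# Three invented polls for one party.
polls = [(0.32, 1000), (0.35, 2000), (0.33, 1000)]
pooled_share, pooled_se = pool_polls(polls)
print(f"Pooled share: {pooled_share:.4f} (s.e. {pooled_se:.4f})")
```

The pooled standard error is smaller than that of any single poll, which is why one poll on its own tells us so much less than the whole set.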
(3) “By matching… [subsamples of national polls] with what we know about each local area we can start to identify patterns”
Again, to give this insight its proper statistical name, this is a form of small area estimation. In political science, a lot of small area estimation is done using something called multilevel regression and post-stratification, which can be quite slow and fiddly (these are non-technical terms). Although we’ve used MRP in the past (for example, to generate estimates of how Eurosceptic each constituency is), we’ve found that you get similar results using simpler regression models. See our technical report for the gory details.
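As a rough illustration of the "simpler regression models" idea, and not the specification in our technical report, the sketch below fits a straight line relating party support in a few areas to one local characteristic, then uses the line to estimate support in an area with no usable poll data. The characteristic (here imagined as the percentage of residents with a degree), the function name, and all of the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented data: local characteristic (%) and party support (%) in four areas.
characteristic = [10.0, 20.0, 30.0, 40.0]
support = [25.0, 30.0, 35.0, 40.0]

intercept, slope = fit_line(characteristic, support)

# Estimate support in a fifth area where the characteristic is 50%.
estimate = intercept + slope * 50.0
print(f"Estimated support: {estimate:.1f}%")
```

A real small area model would use many characteristics at once and would account for the small, noisy subsamples in each area, but the logic of borrowing strength from what is known about similar places is the same.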