26 November 2018

Reviewing our 2018 Mayoral race predictions

Our predictions for the 2018 mayoral race in Toronto were generated by our new agent-based model, which used demographic characteristics and the results of previous elections.

Now that the final results are available, we can see how our predictions performed at the census tract level.

For this analysis, we restrict the comparison to just Tory and Keesmaat, as they were the only two major candidates and the only two for whom we estimated vote share. Given this, we start by plotting the difference between the actual and predicted proportion of votes for Keesmaat. The distribution for Tory is simply the mirror image, since their combined share of the two-candidate vote always equals 100%.
Distribution of the difference between the predicted and actual proportion of votes for Keesmaat

The mean difference from the actual results for Keesmaat is -6 percentage points, which means that, on average, we slightly overestimated support for Keesmaat. However, as the histogram shows, there is significant variation in this difference across census tracts, with the differences slightly skewed towards overestimating Keesmaat’s support.
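A minimal sketch of the tract-level comparison, assuming each tract is reduced to Keesmaat’s share of the two-candidate vote (in percent). The function names and the four tract values are made up for illustration; they are not our actual predictions or results.

```python
from statistics import mean

def keesmaat_differences(actual, predicted):
    """Actual minus predicted Keesmaat share (percentage points) for each tract.

    Shares are Keesmaat's portion of the two-candidate (Tory + Keesmaat) vote,
    so Tory's share in each tract is simply 100 minus Keesmaat's.
    """
    return [a - p for a, p in zip(actual, predicted)]

def tory_differences(actual, predicted):
    # Tory's share is 100 - Keesmaat's share, so his difference is the mirror image.
    return [(100 - a) - (100 - p) for a, p in zip(actual, predicted)]

# Hypothetical tract-level Keesmaat shares (percent), for illustration only.
actual_k = [22.0, 35.5, 41.0, 18.5]
predicted_k = [30.0, 38.0, 44.5, 29.0]

diff_k = keesmaat_differences(actual_k, predicted_k)
diff_t = tory_differences(actual_k, predicted_k)
print(diff_k)        # negative values mean we overestimated Keesmaat in that tract
print(mean(diff_k))  # the mean difference across tracts
```

A negative mean, as in our real results, indicates overestimation on average; plotting `diff_k` as a histogram gives the distribution discussed above.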

To better understand this variation, we can look at a plot of the geographical distribution of the differences. In this figure, we show both Keesmaat and Tory. Although the plots are just inverted versions of each other (since the proportion of votes always sums to 100%), seeing them side by side helps illuminate the geographical structure of the differences.
The distribution of the difference between the predicted and actual proportion of votes by census tract

The overall distribution of differences doesn’t show a clear geographical bias. In some sense, this is good, as it suggests our agent-based model isn’t systematically biased towards any particular part of the city. Rather, refinements to the model should improve accuracy across all census tracts.

We’ll write details about our new agent-based approach soon. In the meantime, these results show that the approach has promise, given that it used only a few demographic characteristics and no polls. Now we’re particularly motivated to gather up much more data to enrich our agents’ behaviour and make better predictions.

21 October 2018

A new approach to predicting elections: Agent-based modeling

It’s been a while since we last posted, largely for personal reasons, but also because we wanted to take some time to completely retool our approach to modeling elections.

In the past, we’ve tried a number of statistical approaches. Because every election is quite different from its predecessors, this proved unsatisfactory: there are simply too many changing factors that can’t be measured effectively from a top-down view. Top-down approaches ultimately treat people as averages. But candidates and voters do not behave like averages; they have different desires and expectations.

We know there are diverse behaviours that need to be modeled at the person level. We also recognize that an election is a system of diverse agents, whose behaviours affect each other. For example, a candidate can gain or lose support by doing nothing, depending only on what other candidates do. Similarly, a candidate or voter will behave differently simply based on which candidates are in the race, even without changing any beliefs. In the academic world, the aggregated results of such behaviours are called “emergent properties”, and predicting such outcomes is extremely difficult when looking at the system from the top down.

So we needed to move to a bottom-up approach that would allow us to model agents heterogeneously, and that led us to what is known as agent-based modeling.

Agent-based modeling and elections

Agent-based models employ individual heterogeneous “agents” that are interconnected and follow behavioural rules defined by the modeler. Due to their non-linear approach, agent-based models have been used extensively in military games, biology, transportation planning, operational research, ecology, and, more recently, in economics (where huge investments are being made).

While we’ll write more on this in the coming weeks, we define voters’ and candidates’ behaviour using parameters, and “train” them (i.e., set those parameters) based on how they behaved in previous elections. For our first proof-of-concept model, we have candidate and voter agents with two-issue sets (call the issues “economic” and “social”), each with a positional score from 0 to 100. Voters have political engagement scores (used to determine whether they cast a ballot), demographic characteristics based on census data, likability scores assigned to each candidate (which capture anything that isn’t based on issues, from name recognition to racial or sexual bias), and a weight for how important likability is to that voter. Voters also track, via polls, the likelihood that each candidate can win. This is important for their “utility function”, that is, the calculation that defines which candidate a voter will choose, if they cast a ballot at all. For example, a voter may really like a candidate who has no chance of winning, and so may not give that candidate their vote. Instead, the voter may vote strategically.
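To make the voter side more concrete, here is a minimal sketch of one possible utility function. The functional form, the names (`Voter`, `Candidate`, `utility`, `choose`), and the way issue distance, likability, and win probability are combined are all illustrative assumptions, not the exact specification of our model.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    economic: float  # positional score, 0-100
    social: float    # positional score, 0-100

@dataclass
class Voter:
    economic: float           # positional score, 0-100
    social: float             # positional score, 0-100
    engagement: float         # probability (0-1) of casting a ballot
    likability: dict          # candidate name -> likability score (0-1)
    likability_weight: float  # 0 = issues only, 1 = likability only

MAX_DISTANCE = math.hypot(100, 100)  # largest possible distance on the 0-100 issue plane

def utility(voter, candidate, win_prob):
    """Hypothetical utility: blend issue proximity with likability, then
    discount by the candidate's perceived chance of winning (strategic voting)."""
    distance = math.hypot(voter.economic - candidate.economic,
                          voter.social - candidate.social)
    issue_score = 1 - distance / MAX_DISTANCE  # 1 means identical positions
    w = voter.likability_weight
    base = (1 - w) * issue_score + w * voter.likability[candidate.name]
    return base * win_prob

def choose(voter, candidates, polls, rng=random):
    """Return the chosen candidate's name, or None if the voter stays home."""
    if rng.random() > voter.engagement:
        return None
    return max(candidates, key=lambda c: utility(voter, c, polls[c.name])).name

# Illustrative agents: the voter sits closer to Keesmaat on both issues.
tory = Candidate("Tory", economic=60, social=50)
keesmaat = Candidate("Keesmaat", economic=30, social=30)
voter = Voter(economic=40, social=35, engagement=1.0,
              likability={"Tory": 0.5, "Keesmaat": 0.5}, likability_weight=0.3)
print(choose(voter, [tory, keesmaat], polls={"Tory": 0.8, "Keesmaat": 0.2}))
```

With even polls this voter picks Keesmaat, but when the polls give Tory a much better chance of winning, the discount for viability tips the vote to Tory, which is the strategic behaviour described above.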

On the other hand, candidates simply seek votes. Each candidate looks at the polls and asks: 1) am I a viable candidate? and 2) how do I change my positions to attract more voters? (For now, we don’t provide them a way to change their likability.) Candidates that have a chance of winning move in a random direction from their current position, by an amount that depends on how “flexible” they are on their positions. If that move works (i.e., moves them up in the polls), they keep moving randomly in the same general direction. If the move hurts their standing in the polls, they turn around and move randomly in the opposite general direction. At some point, the election is held (that is, the ultimate poll) and we see who wins.
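The candidate behaviour described above amounts to a noisy hill climb in the two-issue space. A minimal sketch follows; the function name `move_candidate`, using `flexibility` as the step length, the ±45° turning spread, and the toy `poll_share` stand-in for the polls are all illustrative assumptions rather than the model’s actual settings. The viability check is left out: only viable candidates would take these steps.

```python
import math
import random

def clamp(x, lo=0.0, hi=100.0):
    return max(lo, min(hi, x))

def move_candidate(position, heading, improved, flexibility, rng=random, spread=math.pi / 4):
    """Return a new (position, heading) after one campaign step.

    position: (economic, social) scores on a 0-100 scale
    heading: current direction of travel, in radians
    improved: whether the previous move raised the candidate in the polls
    flexibility: step length, i.e. how far the candidate is willing to move
    """
    if not improved:
        heading += math.pi  # the last move hurt: turn around
    heading += rng.uniform(-spread, spread)  # jitter the heading within +/- spread
    x, y = position
    new_position = (clamp(x + flexibility * math.cos(heading)),
                    clamp(y + flexibility * math.sin(heading)))
    return new_position, heading

def poll_share(position, voters_centre=(45.0, 40.0)):
    # Toy stand-in for the polls: being closer to a (hypothetical) voter centre is better.
    return -math.dist(position, voters_centre)

rng = random.Random(42)
position, heading = (70.0, 70.0), rng.uniform(0, 2 * math.pi)
standing = poll_share(position)
improved = True
for _ in range(200):
    position, heading = move_candidate(position, heading, improved, 2.0, rng)
    new_standing = poll_share(position)
    improved = new_standing > standing
    standing = new_standing
print(position)
```

Because every move is kept and only the direction reacts to the polls, the candidate drifts towards wherever support is highest and then wanders around it, rather than jumping straight to an optimum.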

This approach allows us to run elections with different candidates, change a candidate’s likability, introduce shocks (e.g., candidates changing positions on an issue) and, eventually, see how different voting systems might affect who gets elected (foreshadowing future work).

We’re by no means the first to apply agent-based modeling to psephology (many academics use it to explain observed behaviours), but we haven’t found anyone using it to predict actual elections.

Applying this to the Toronto 2018 Mayoral Race

First, over the last few elections, Toronto voters have voted somewhat more right-wing than might have been predicted. The average positions across the city for the 2003, 2006, 2010, and 2014 elections look like this:

Average voters' mayoral choices on economic and social issues (2003, 2006, 2010, 2014)

This doesn’t mean that Toronto voters are themselves more right-wing than might be expected, just that they voted this way. This is, in fact, the first interesting outcome of our new approach. We find that about 30% of Toronto voters’ decisions have been based on candidate likability, and that for the more right-wing candidates, likability has been a major reason for choosing them. For example, in 2010, Rob Ford’s likability score was significantly higher than those of his major competitors (George Smitherman and Joe Pantalone). This isn’t to say that everyone liked Rob Ford, but those who did vote for him cared more about something other than issues, at least relative to those who voted for his opponents.

For 2018, likability is less of a differentiating factor, with both major candidates (John Tory and Jennifer Keesmaat) scoring about the same on it. Nor are the issues: Ms. Keesmaat’s positions don’t seem to be hurting her standing in the polls, although she has staked out a strong position left of centre on both issues. What appears to be the bigger factor this time around is the low early probability voters assigned to Ms. Keesmaat’s chance of victory, a point that seems to have been part of the actual Tory campaign’s strategy. Because much of the city did not see her as a major threat to John Tory, that narrative became self-reinforcing. Further, John Tory’s positions are relatively more centrist in 2018 than they were in 2014, when he had a clearly viable right-wing opponent in Doug Ford. (To prove the value of this approach, we could simply introduce a right-wing candidate and see what happens…)

Thus, our predictions don’t appear to be wildly different from current polls (with Tory winning nearly 2-to-1), and map as follows:

PsephoAnalytics' 2018 Mayoral Race Predictions

There will be much more to say on this, and much more we can do going forward, but for a proof of concept, we think this approach has enormous promise.

20 October 2015

How we did: High-level results

The day after an historic landslide electoral victory for the Liberal Party of Canada, we’ve compared our predictions (and those of other organizations who provide riding-level predictions) to the actual results in Toronto. 

Before getting to the details, we thought it important to highlight that while the methodologies of the other organizations differ, they are all based on tracking sentiment as the campaign unfolds. So most columns in the table below will differ slightly from those in our previous post, as such sentiment changes day to day.

This is fundamentally different from our modelling approach, which utilizes voter and candidate characteristics, and therefore could be applied to predict the results of any campaign before it even begins. (The primary assumption here is that individual voters behave in a consistent way but vote differently from election to election as they are presented with different inputs to their decision-making calculus.) We hope the value of this is obvious.

Now, on to the results! The final predictions of all organizations and the actual results were as follows:

Prediction sources: ThreeHundredEight.com, Vox Pop Labs, Election Atlas, Too Close to Call, Election Prediction Project (as of October 19)

To start with, our predictions included many more close races than the others: while we predicted average margins of victory of about 10 points, the others were predicting averages well above that (ranging from around 25 to 30 points). The actual results fell in between at around 20 points.
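For clarity, we take the average margin of victory to be the mean, across ridings, of the gap in vote share between the first- and second-place candidates. A short sketch of that calculation, with made-up riding names and shares purely for illustration:

```python
from statistics import mean

def margin_of_victory(shares):
    """Gap in percentage points between the top two candidates in one riding."""
    first, second = sorted(shares.values(), reverse=True)[:2]
    return first - second

# Hypothetical vote shares (percent) for three ridings, for illustration only.
ridings = {
    "Riding A": {"LPC": 45.0, "CPC": 35.0, "NDP": 20.0},
    "Riding B": {"LPC": 38.0, "CPC": 22.0, "NDP": 40.0},
    "Riding C": {"LPC": 50.0, "CPC": 26.0, "NDP": 24.0},
}

margins = {name: margin_of_victory(s) for name, s in ridings.items()}
average_margin = mean(margins.values())
print(margins)
print(average_margin)
```

A forecast with many close races (like Riding B here) pulls the average margin down, which is why our average sat well below the poll-tracking sites’.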

Looking at specific races, we did better than the others at predicting close races in York Centre and Parkdale-High Park, where the majority predicted strong Liberal wins. Further, while everyone was wrong in Toronto-Danforth (which went Liberal by only around 1,000 votes), we predicted the smallest margin of victory for the NDP. On top of that, we were as good as the others in six ridings, meaning that we were at least as good as poll tracking in 9 out of 25 ridings (and would have been 79 days ago, before the campaign started, despite the polls changing up until the day before the election).

But that means we did worse in the other ridings, particularly Toronto Centre (where our model was way off) and a handful of races that the model said would be close but that ended up being strong Liberal wins. While we need to undertake much more detailed analysis (once Elections Canada releases such details), the “surprise” in many of these cases was the extent to which voters who might normally vote NDP chose to vote Liberal this time around (likely a coalescence of “anti-Harper” sentiment).

Overall, we are pleased with how the model stood up, and know that we have more work to do to improve our accuracy. This will include more data and more variables that influence voters’ decisions. Thankfully, we now have a few years before the next election…

16 October 2015

Final riding-level predictions

Well, it is now only days until the 42nd Canadian election, and we have come a long way since this long campaign started. Based on our analyses to date of voter and candidate characteristics, we can now provide riding-level predictions. As we keep saying, we have avoided the use of polls, so these represent more of an experiment than anything else. Nonetheless, we’ve put them beside the predictions of five other organizations (as of the afternoon of 15 October 2015), specifically:

(We’ll note that the last doesn’t provide the likelihood of a win, so isn’t colour-coded below, but does provide additional information for our purposes here.)

You’ll see that we’re predicting more close races than all the others combined, and more “leaning” races. In fact, the average margins of victory from 308, Vox Pop, and Too Close to Call are 23, 26, and 23 points, respectively, which sounds high. Nonetheless, the two truly notable differences we’re predicting are in Eglinton-Lawrence, where the consensus is that finance minister Joe Oliver will lose badly (we predict he might win), and Toronto Centre, where Bill Morneau is predicted to easily beat Linda McQuaig (we predict the opposite).

Anyway, we’re excited to see how these predictions look come Monday, and we’ll come back after the election with an analysis of our performance.

Now, get out and vote!

15 October 2015

A natural cycle in Canadian federal elections?

We’ve started looking into what might be a natural cycle between governing parties, which may account for some of the differences we’ve seen between our predictions and the polls. The phrase often heard is “time for a change”, and this sentiment, while very difficult to include in voter characteristics, can be modeled as a high-level risk to governing parties.

To start, we reran our predictions with an incumbent-year interaction, to see if the incumbency bonus changed over time. It turns out it does: the incumbency effect declines over time. But with only a few years of data, it is difficult to determine whether we’re simply seeing a reversion to the mean. So we need more data, and likely at a higher level.
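As a rough illustration of what an incumbent-year interaction looks like, here is a sketch that fits ordinary least squares with NumPy on synthetic data. The election years, the simulated effect sizes (an 8-point bonus shrinking by 0.5 points per year), and the simple linear specification are all assumptions for illustration, not our actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic candidate-level data (hypothetical): election year and incumbency flag.
year = rng.choice([2004, 2006, 2008, 2011], size=n)
incumbent = rng.integers(0, 2, size=n)
t = year - 2004  # years since the first election in the sample

# Simulated vote share: the incumbency bonus starts at 8 points and shrinks by 0.5 per year.
share = 30 + 8 * incumbent - 0.5 * incumbent * t + rng.normal(0, 3, size=n)

# Design matrix: intercept, incumbent, time, and the incumbent x time interaction.
X = np.column_stack([np.ones(n), incumbent, t, incumbent * t])
coef, *_ = np.linalg.lstsq(X, share, rcond=None)
intercept, incumbency_bonus, time_trend, interaction = coef
print(incumbency_bonus, interaction)
```

A negative `interaction` coefficient is what “the incumbency effect declines over time” means in regression terms; with only a handful of election years, though, such an estimate can also reflect reversion to the mean.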

Let’s start with the proportion of votes received by each of today's three major parties (or their predecessors – whose names we’ll simply substitute with modern party names), with trend lines, in every federal election since Confederation:

This chart shows that the Liberal and Conservative trend lines are essentially the same, and that the two parties effectively alternate as the governing party around this line.

Prior to a noticeable third party (i.e., the NDP, starting in the 1962 election, and its predecessor, the Co-operative Commonwealth Federation, starting in the 1935 election), the Liberals and Conservatives effectively flipped back and forth in terms of governing (6 times over 68 years), averaging around 48% of the vote each. Since then, the flip has continued (10 more times over the following 80 years), and the median proportions of votes for the Liberals, Conservatives, and NDP have been 41%, 35%, and 16%, respectively.

Further, since 1962, the Liberals have been very slowly losing support (about 0.25 points per election), while the other two parties have been very slowly gaining it (about 0.05 points per election), though there has been considerable variation across elections, making this slightly harder to use in predictions. (We’ll look into including this in our risk modeling.)
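A per-election trend like this comes from fitting a straight line to a party’s vote share against election number. A minimal sketch with `numpy.polyfit`, using a synthetic series (not the real results) built to lose 0.25 points per election plus noise; the 17 elections match 1962 through 2011, while the starting share and noise level are assumptions:

```python
import numpy as np

def trend_per_election(shares):
    """Slope of a least-squares line through vote shares, in points per election."""
    x = np.arange(len(shares))
    slope, _intercept = np.polyfit(x, shares, deg=1)
    return slope

rng = np.random.default_rng(1)
n_elections = 17  # federal elections from 1962 to 2011
# Synthetic Liberal series: 42% falling 0.25 points per election, plus election-to-election noise.
synthetic = 42 - 0.25 * np.arange(n_elections) + rng.normal(0, 4, size=n_elections)
print(f"estimated trend: {trend_per_election(synthetic):+.2f} points per election")
```

With this much variation between elections, the fitted slope can land noticeably away from the true -0.25, which is exactly why a trend this small is hard to lean on in predictions.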

Next, we looked at some stats about governing:
  • In the 148.4 years since Sir John A. Macdonald was first sworn in, there have been 27 PM-ships (though only 22 PMs), for an average length of 5.5 years (though 4.3 years for Conservatives and 6.9 years for Liberals).
  • Parties often string a couple of PMs together, so the PM-ship has switched parties only 16 times, with an average length of 8.7 years (7.2 for Conservatives vs. 10.4 for Liberals).
  • Only two PMs have won four consecutive elections (Macdonald and Laurier), with four more having won three (Mackenzie King, Diefenbaker, Trudeau, and Chrétien) prior to Harper.
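The average lengths above follow directly from the counts; a quick check in Python (16 party switches split the period into 17 uninterrupted stretches of one party holding the PM-ship):

```python
years = 148.4
pm_ships = 27
party_switches = 16
party_tenures = party_switches + 1  # 16 switches split the period into 17 stretches

print(round(years / pm_ships, 1))       # average PM-ship length: 5.5
print(round(years / party_tenures, 1))  # average length of a party's hold on the office: 8.7
```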

All of these stats suggest that Harper is due for a loss: he has been his party’s only PM for the past 9.7 years, which is over twice his party’s average length for a PM-ship. He’s also second all-time behind Macdonald for consecutive time as a Conservative PM (having passed Mulroney and Borden last year). From a risk-model perspective, Harper is likely about to be hit hard by the “time for a change” narrative.

But how much will this actually affect Conservative results? And how much will their opponents benefit? These are critical questions to our predictions.

In elections where the governing party lost (on average once every 9 years: every 7 years for Conservatives and every 11 years for Liberals), that party saw a median drop of 6.1 points from the preceding election (an average of 8.1 points). Since 1962 (the first election with the NDP), that loss has been 5.5 points. But do any of those votes go to the NDP? It turns out, not really: those 5.5 points appear (at least on average) to switch back to the new governing party.

Given the risk to the current governing party, we would forecast a shift of 5.5 to 6.1 points from the Conservatives to the Liberals, on top of all our other estimates (which do not overlap with any of this analysis), assuming that Toronto feels the same about change as the rest of the country has historically.
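Mechanically, the adjustment simply moves a fixed number of points from the Conservative share to the Liberal share in each riding estimate. A sketch, where the function name `apply_change_shift` and the riding numbers are hypothetical:

```python
def apply_change_shift(shares, shift, from_party="CPC", to_party="LPC"):
    """Move `shift` percentage points from one party to another, capped at what
    the losing party actually has, and return a new dict of shares."""
    moved = min(shift, shares[from_party])
    adjusted = dict(shares)  # leave the original estimate untouched
    adjusted[from_party] -= moved
    adjusted[to_party] += moved
    return adjusted

# Hypothetical riding-level estimate (percent), before the "time for a change" adjustment.
riding = {"LPC": 38.0, "CPC": 40.0, "NDP": 22.0}

low = apply_change_shift(riding, 5.5)   # post-1962 figure
high = apply_change_shift(riding, 6.1)  # median drop across all elections
print(low)
print(high)
```

In this made-up riding, even the lower shift is enough to flip a narrow Conservative lead into a Liberal one, while the NDP share is left unchanged, consistent with those votes switching directly to the new governing party.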

That would mean our comparisons to recent Toronto-specific polls would look like this:

Remember – our analysis has avoided the use of polls, so these results (assuming the polls are right) are quite impressive.

Next up (and last before the election on Monday) will be our riding-level predictions.