09 June 2020

Comparing predicted to actual votes for the 2019 Federal election

Our predictions for the 2019 Federal race in Toronto were generated by our agent-based model that uses demographic characteristics and results from previous elections. Now that the final results are available, we can see how our predictions performed at the Electoral District level.

For this analysis, we restrict the comparison to just the major parties, as they were the only parties for which we estimated vote share. We also only compare the actual results to the predictions of our base scenario. In the future, our work will focus much more on scenario planning to explain political campaigns.

We start by plotting the difference between the predicted votes and the actual votes at the party and district level.
Distribution of the difference between the predicted and actual proportion of votes for all parties
The mean absolute difference from the actual results is 5.3 percentage points, and the median difference is +1.28 points, meaning that, on average, we slightly overestimated party support. However, as the histogram shows, there is significant variation across districts: the differences range from an 18.5-point underestimate to a 15.6-point overestimate.
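
For readers who want to reproduce these summary statistics, here is a minimal sketch. It assumes a data frame with one row per district and party holding predicted and actual vote shares in percentage points; the column names and sample values are illustrative, not our actual data.

```python
import pandas as pd

# Hypothetical input: one row per (district, party) with predicted and actual
# vote shares in percentage points. Column names and values are illustrative.
results = pd.DataFrame({
    "district": ["Davenport", "Davenport", "York Centre", "York Centre"],
    "party": ["LPC", "NDP", "LPC", "CPC"],
    "predicted": [45.0, 30.0, 40.0, 38.0],
    "actual": [48.2, 26.5, 42.1, 36.0],
})

# Difference convention: predicted minus actual, so positive = overestimate.
diff = results["predicted"] - results["actual"]

print(f"Mean absolute difference: {diff.abs().mean():.2f} points")
print(f"Median difference:        {diff.median():.2f} points")
print(f"Largest overestimate:     {diff.max():.2f} points")
print(f"Largest underestimate:    {diff.min():.2f} points")
```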

To better understand this variation, we can look at a plot of the geographical distribution of the differences. In this figure, we show each party separately to illuminate the geographical structure of the differences.
Geographical distribution of the difference between the predicted and actual proportion of votes by Electoral District and party

The overall distribution of differences doesn’t have a clear geographical bias. In some sense, this is good, as it shows our agent-based model isn’t systematically biased to any particular Electoral District.

However, our model does appear to generally overestimate NDP support while underestimating Liberal support. These slight biases are important indicators for us in recalibrating the model.

Overall, we’re very happy with an average absolute error of around 5 percentage points. As described earlier, our primary objective is to explain political campaigns. Having accurate predictions is useful to this objective, but isn’t the primary concern. Rather, we’re much more interested in using the model that we’ve built for exploring different scenarios and helping to design political campaigns.

21 October 2019

Using our Agent-Based Model to scenario test the Canadian federal election


As outlined in our last two posts, our algorithm has “learned” how to simulate the behavioural traits of over 2 million voters in Toronto. This allows us to turn their behavioural “dials” and see what happens.

To demonstrate, we’ll simulate three scenarios (a rough sketch of how these “dials” might be set follows the list):
  1. The “likeability” of the Liberal Party falls by 10% from the baseline (i.e., continues to fall);
  2. The Conservative Party announces a policy stance regarding climate change much more aligned with the other parties; and
  3. People don’t vote strategically and no longer consider the probability of each candidate winning in their riding (i.e., they are free to vote for whomever they align with and like the most, somewhat as if proportional representation were a part of our voting system).
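
To make the dial-turning concrete, here is a minimal sketch of how these three scenarios might be expressed as parameter changes. The parameter structure and names (likeability, positions, strategic_weight, and the party codes) are illustrative assumptions rather than our production code.

```python
import copy

def apply_scenario(baseline, scenario):
    """Return a modified copy of the baseline model parameters.

    `baseline` is assumed (illustratively) to be a dict with keys:
      - likeability[party][riding]: non-policy appeal score
      - positions[party][issue]:    platform position on each issue
      - strategic_weight:           how much win probability matters to voters
    """
    params = copy.deepcopy(baseline)

    if scenario == "liberal_likeability_falls":
        # Scenario 1: Liberal likeability drops 10% from baseline in every riding.
        for riding in params["likeability"]["LPC"]:
            params["likeability"]["LPC"][riding] *= 0.90

    elif scenario == "cpc_climate_convergence":
        # Scenario 2: Conservatives adopt a climate position close to the other parties'.
        others = [p for p in params["positions"] if p != "CPC"]
        avg = sum(params["positions"][p]["climate"] for p in others) / len(others)
        params["positions"]["CPC"]["climate"] = avg

    elif scenario == "no_strategic_voting":
        # Scenario 3: voters ignore win probabilities entirely.
        params["strategic_weight"] = 0.0

    return params
```

Each modified parameter set is then run through the full simulation many times, which is where statements like “in at least 80% of our simulations” come from.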

Let’s examine each scenario separately:

1 – If Liberal “likeability” fell

In this scenario, the “likeability” scores for the Liberals in each riding fall by 10% (because baseline scores differ, the absolute change varies by riding). This could come from a new scandal (or from increased salience and impact of previous ones).

What we see in this scenario is a nearly seven-point drop in Liberal support across Toronto, about half of which would be picked up by the NDP. The drop would be felt particularly in ridings that are already less aligned with the party on policy, where changes in “likeability” have a greater impact. The Libs would only safely hold 13/25 seats, instead of 23/25.

From a seat perspective, the NDP would pick up another seat (for a total of three) in at least 80% of our simulations – namely York South-Weston. (It would also put four – Beaches-East York, Davenport, Spadina-Fort York, and University-Rosedale – into serious play.) Similarly, the Conservatives would pick up two seats in at least 80% of our simulations – namely Eglinton-Lawrence and York Centre (and put Don Valley North, Etobicoke Centre, and Willowdale into serious play).

This is a great example of how changes to a non-linear system can produce outcomes that are not proportional to the change (meaning they cannot be easily predicted by polls or regressions).

2 – If the Conservatives stopped differentiating themselves on climate change

In this scenario, the Conservatives announce a change to their policy position on a major issue, specifically climate change. The salience of this change would be immediate (this can also be changed, but for simplicity we won’t do so here). It may seem counterintuitive, but it appears that the Conservatives, by giving up a differentiating factor, would actually lose voters. Specifically, in this scenario, no seats change hands, but the Conservatives actually give up about three points to the Greens.

To work this through, imagine a voter who may like another party more, but chooses to vote Conservative specifically because their positions on climate change align. But if the party moved to align its climate change policy with other parties, that voter may decide that there is no longer a compelling enough reason to vote Conservative. If there are more of these voters than voters the party would pick up by changing this one policy (e.g., because there are enough other policies that still dissuade voters from shifting to the Conservatives), then the Conservatives become worse off.

The intuition may be for the defecting Conservative voters discussed above to go Liberal instead (and some do), but in fact, once policies look more alike, “likeability” can take over, and the Greens do better there than the Liberals.
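
As a stylized illustration of that logic (the numbers are invented for exposition, not model output), consider a voter who weights policy alignment and “likeability” equally when scoring each party on a 0-1 scale:

```python
# Stylized example with invented numbers: a voter who weights policy alignment
# and likeability equally when scoring each party on a 0-1 scale.
def score(alignment, likeability, policy_weight=0.5):
    return policy_weight * alignment + (1 - policy_weight) * likeability

# Baseline: the voter agrees with the Conservatives' distinct climate stance,
# but personally "likes" the Greens more.
cpc_before = score(alignment=0.9, likeability=0.3)   # 0.60 -> votes CPC
grn_before = score(alignment=0.5, likeability=0.6)   # 0.55

# After the Conservatives converge with the other parties on climate, this
# voter's alignment with them falls, and likeability takes over.
cpc_after = score(alignment=0.6, likeability=0.3)    # 0.45
grn_after = grn_before                               # 0.55 -> defects to the Greens

print(cpc_before, grn_before, cpc_after, grn_after)
```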

This is a great example of how the emergent properties of a changing system cannot be seen by other types of models.

3 – Proportional Representation

Recent analysis by P.J. Fournier (of 338Canada) for Maclean’s used 338Canada’s existing poll aggregations to estimate how many seats each party would win across Canada if (at least one form of) proportional representation were in place for the current federal election. It is an interesting thought experiment and allows for a discussion of the value of changing our electoral system.

As supportive as we are of such analysis, this is an area perfectly suited to agent-based modeling. That’s because Fournier’s analysis assumes no change in voting behaviour (as far as we can tell), whereas an ABM can relax that assumption and see how behaviour evolves.

To do so, we have our voters ignore the winning probabilities of each candidate and simply pick whomever they align with and “like” the most.

Perhaps surprisingly, the simulations show that the Liberals would lose significant support in Toronto (and likely elsewhere). They would drop to third place, behind the Conservatives (first place) and the Greens (second place).

Toronto would transform into a four-party city: depending on the form of proportional representation chosen, the city would have 9-12 Conservative seats, 4-7 Green seats, 2-5 Liberal seats, and 2-3 NDP seats.

This suggests that most Liberal voters in Toronto support the party mainly to prevent their third or fourth choice from winning. This ties in with the finding that the Liberals are not well “liked” (i.e., beyond their policies), and might also suggest why the Liberals back-tracked on electoral reform – though such conjecture is outside our analytical scope. Nonetheless, it does support the idea that the Greens are not taken seriously because voters sense that the Greens are not taken seriously by other voters.

More demonstrations are possible

Overall, these three scenarios showcase how agent-based modeling can be used to see the emergent outcomes of various electoral landscapes. Many more simulations could be run, and we welcome ideas for things that would be interesting to the #cdnpoli community.


20 October 2019

Dumbing down voters

In our last post, our analysis assumed that voters had a very good sense of the winning probabilities for each candidate in their riding. That was probably an unfair assumption to make: voters have a sense of which two parties might be fighting for the seat, but they are unlikely to know the z-scores from well-sampled polls.

So, we've loosened that statistical knowledge considerably: voters now only have a rough sense of who is really in the running in their riding. While that doesn't change the importance of "likeability" (still averaging around 50% of each vote), it does change which parties' votes are driven more by "likeability" than by policy.
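
One simple way to implement that coarsening is sketched below; our actual implementation may differ, and the function name, party codes, and threshold are illustrative assumptions.

```python
def coarsen_win_probabilities(poll_probs, threshold=0.15):
    """Collapse precise win probabilities into a rough sense of viability.

    `poll_probs` maps party -> estimated probability of winning the riding.
    Parties within `threshold` of the front-runner are treated as "in the
    running" and share the perceived probability equally; the rest get ~0.
    """
    leader = max(poll_probs.values())
    viable = [p for p, prob in poll_probs.items() if leader - prob <= threshold]
    return {p: (1.0 / len(viable) if p in viable else 0.0) for p in poll_probs}

# Example: a precise aggregate becomes a coarse two-way race in the voter's mind.
precise = {"LPC": 0.55, "NDP": 0.41, "CPC": 0.03, "GPC": 0.01}
print(coarsen_win_probabilities(precise))
# {'LPC': 0.5, 'NDP': 0.5, 'CPC': 0.0, 'GPC': 0.0}
```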

Now, it is in fact the Liberals who fall to last in "likeability" - and by a fairly large margin - coming last or second last in every riding. This suggests that a lot of people are willing to hold their nose and vote for the Libs.

On average, the other three parties have roughly equal "likeability", but the spread varies considerably by party. For example, the Greens appear to be either very well "liked" or not "liked" at all: they are the most "liked" in 13/25 ridings, the least "liked" in 9/25 ridings, and have some fairly extreme "likeability" values. This suggests that some Green supporters are driven entirely by policy while others are driven by something else.

The NDP and Conservatives are more consistent, but the NDP are most "liked" in 10/25 ridings whereas the Conservatives are most "liked" in the remaining 2/25 ridings.

As mentioned in the last post, we'll be posting some scenarios soon.

17 October 2019

Using Agent-based modeling to explain polls


Modeling to explain, not forecast

The goal of PsephoAnalytics is to model voting behaviour in order to accurately explain political campaigns. That is, we are not looking to forecast ongoing campaigns – there are plenty of good poll aggregators online that provide such estimates. But if we can quantitatively explain why an ongoing campaign is producing the polls that it is, then we have something unique.

That is why agent-based modeling is so useful to us. Our model – as a proof of concept – can replicate the behaviour of millions of individual voters in Toronto in a parameterized way. Once we match their voting patterns to those suggested by the polls (specifically those from CalculatedPolitics, which provides riding-level estimates), we can compare the various parameters that make up our agents’ behaviour and say something about them.

We can also, therefore, turn those various behavioural dials and see what happens. For example, what if a party changed its positions on a major policy issue, or if a party leader became more likeable? That allows us to estimate the outcomes of such hypothetical changes without having to invest in conducting a poll.

Investigating the 2019 Federal Election

As in previous elections, we only consider Toronto voters, and specifically (this time) how they are behaving with respect to the 2019 federal election. We have matched the likely voting outcomes of over 2 million individual voters with riding-level estimates of support for four parties: Liberals, Conservatives, NDP, and Greens. This also means that we can estimate the response of voters to individual candidates, not just the parties themselves.

First, let’s start with the basics – here are the likely voter outcomes by riding for each party, as estimated by CalculatedPolitics on October 16.


As these maps show, the Liberals are expected to win 23 of Toronto’s 25 ridings. The two exceptions are Parkdale-High Park and Toronto-Danforth, which are leaning NDP. Four ridings, namely Eglinton-Lawrence, Etobicoke Centre, Willowdale, and York Centre, see the Liberals slightly edging out the Conservatives. Another four ridings, namely Beaches-East York, Davenport, University-Rosedale, and York South-Weston, see the Liberals slightly edging out the NDP. The Greens do no better than 15% (in Toronto-Danforth), average about 9% across the city, and their support is highly correlated with the NDP’s.

What is driving these results? First, a reminder about some of the parameters we employ in our model. All “agents” (e.g., voters, candidates) take policy positions. For voters, these are estimated using numerous historical elections to derive “natural” positions. For candidates, we assign values based on campaign commitments (e.g., from CBC’s coverage, though we could also simply use a VoteCompass). Some voters care about policy more than others, meaning they care less about non-policy factors (we use the term “likeability” to capture all of these non-policy factors); candidates therefore also have a “likeability” score. Voters also have an “engagement” score that indicates how likely they are to pay attention to the campaign and, more importantly, to vote at all. Finally, voters can see polls and judge how likely it is that each party will win in their riding.

Each voter then determines, for each candidate: a) how closely the platform aligns with the voter’s issue preferences; b) how much they “like” the candidate (for non-policy reasons); and c) how likely it is that the candidate can win in their riding. The voter uses that information to score each candidate, and votes for the candidate with the highest score, if they choose to vote at all. (There are other parameters, but these few provide much of the differentiation we see.)
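
That decision rule can be summarized in a short sketch. This is a simplified reconstruction of the description above, with illustrative names, scales, and weights rather than our actual implementation:

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    party: str
    positions: dict         # issue -> position, e.g. {"economic": 40, "social": 60}
    likeability: float      # 0-1 non-policy appeal, as perceived by this voter
    win_probability: float  # 0-1 perceived chance of winning the riding

@dataclass
class Voter:
    positions: dict          # voter's own issue positions, same 0-100 scale
    policy_weight: float     # how much policy matters vs. likeability (0-1)
    strategic_weight: float  # how much perceived winnability matters (0-1)
    engagement: float        # probability of turning out at all (0-1)

    def alignment(self, candidate):
        # 1 minus the average normalized distance across issues (0-100 scale).
        gaps = [abs(self.positions[i] - candidate.positions[i]) / 100
                for i in self.positions]
        return 1 - sum(gaps) / len(gaps)

    def score(self, candidate):
        base = (self.policy_weight * self.alignment(candidate)
                + (1 - self.policy_weight) * candidate.likeability)
        # Strategic consideration: discount candidates seen as unlikely to win.
        return ((1 - self.strategic_weight) * base
                + self.strategic_weight * base * candidate.win_probability)

    def vote(self, candidates):
        if random.random() > self.engagement:
            return None  # stays home
        return max(candidates, key=self.score).party

# Example (invented numbers): one voter choosing among two candidates.
v = Voter(positions={"economic": 40, "social": 70}, policy_weight=0.5,
          strategic_weight=0.3, engagement=0.9)
c1 = Candidate("LPC", {"economic": 45, "social": 65}, likeability=0.4, win_probability=0.8)
c2 = Candidate("NDP", {"economic": 35, "social": 80}, likeability=0.7, win_probability=0.2)
print(v.vote([c1, c2]))  # prints the chosen party, or None if the voter stays home
```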

Based on this, there are a couple of key take-aways from the 2019 federal election:
  • “Likeability” is important, with about 50% of each vote, on average, being determined by how much the voter likes the party. The importance of “likeability” ranges from voter to voter (extremes of 11% and 89%), but half of voters use “likeability” to determine somewhere between 42% and 58% of their vote.
  • Given that, some candidates are simply not likeable enough to overcome a) their party platforms; or b) their perceived unlikelihood of victory (over which they have almost no control). For example, the NDP have the highest average “likeability” scores, ranking first in 18 out of 25 ridings. By contrast, the Greens have the lowest average. This means that policy issues (e.g., climate change) are disproportionately driving Green Party support, whereas something else (e.g., Jagmeet Singh’s popularity) is driving NDP support.

In our next post, we’ll look at some scenarios where we change some of these parameters (or perhaps more drastic things).



26 November 2018

Reviewing our 2018 Mayoral race predictions

Our predictions for the 2018 mayoral race in Toronto were generated by our new agent-based model that used demographic characteristics and results of previous elections.

Now that the final results are available, we can see how our predictions performed at the census tract level.

For this analysis, we restrict the comparison to just Tory and Keesmaat, as they were the only two major candidates and the only two for which we estimated vote share. Given this, we start by just plotting the difference between the actual votes and the predicted votes for Keesmaat. The distribution for Tory is simply the mirror image, since their combined share of votes always equals 100%.
Distribution of the difference between the predicted and actual proportion of votes for Keesmaat

The mean difference from the actual results for Keesmaat is -6%, which means that, on average, we slightly overestimated support for Keesmaat. However, as the histogram shows, there is significant variation in this difference across census tracts with the differences slightly skewed towards overestimating Keesmaat’s support.

To better understand this variation, we can look at a plot of the geographical distribution of the differences. In this figure, we show both Keesmaat and Tory. Although the plots are just inverted versions of each other (since the proportion of votes always sums to 100%), seeing them side by side helps illuminate the geographical structure of the differences.
The distribution of the difference between the predicted and actual proportion of votes by census tract

The overall distribution of differences doesn’t have a clear geographical bias. In some sense, this is good, as it shows our agent-based model isn’t systematically biased to any particular census tract. Rather, refinements to the model will improve accuracy across all census tracts.

We’ll write details about our new agent-based approach soon. In the meantime, these results show that the approach has promise, given that it used only a few demographic characteristics and no polls. Now we’re particularly motivated to gather up much more data to enrich our agents’ behaviour and make better predictions.

21 October 2018

A new approach to predicting elections: Agent-based modeling

It’s been a while since we last posted – largely for personal reasons, but also because we wanted to take some time to completely retool our approach to modeling elections.

In the past, we’ve tried a number of statistical approaches. Because every election is quite different from its predecessors, this proved unsatisfactory – there are simply too many things that change which can’t be effectively measured in a top-down view. Top-down approaches ultimately treat people as averages. But candidates and voters do not behave like averages; they have different desires and expectations.

We know there are diverse behaviours that need to be modeled at the person level. We also recognize that an election is a system of diverse agents, whose behaviours affect each other. For example, a candidate can gain or lose support by doing nothing, depending only on what other candidates do. Similarly, a candidate or voter will behave differently simply based on which candidates are in the race, even without changing any beliefs. In the academic world, the aggregated results of such behaviours are called “emergent properties”, and predicting such outcomes is extremely difficult when looking at the system from the top down.

So we needed to move to a bottom-up approach that would allow us to model agents heterogeneously, and that led us to what is known as agent-based modeling.

Agent-based modeling and elections

Agent-based models employ individual heterogeneous “agents” that are interconnected and follow behavioural rules defined by the modeler. Because they capture non-linear dynamics, agent-based models have been used extensively in military gaming, biology, transportation planning, operational research, ecology and, more recently, economics (where huge investments are being made).

While we’ll write more on this in the coming weeks, we define voters’ and candidates’ behaviour using parameters, and “train” them (i.e., set those parameters) based on how they behaved in previous elections. For our first proof-of-concept model, we have candidate and voter agents with two-variable issue sets (call the issues “economic” and “social”), each with a positional score of 0 to 100. Voters have political engagement scores (used to determine whether they cast a ballot), demographic characteristics based on census data, likability scores assigned to each candidate (which include anything that isn’t based on issues, from name recognition to racial or sexual bias), and a weight for how important likability is to that voter. Voters also track, via polls, the likelihood that a candidate can win. This is important for their “utility function” – that is, the calculation that defines which candidate a voter will choose, if they cast a ballot at all. For example, a candidate that a voter really likes, but who has no chance of winning, may not get the voter’s ultimate vote. Instead, the voter may vote strategically.

On the other hand, candidates simply seek votes. Each candidate looks at the polls and asks 1) am I a viable candidate?; and 2) how do I change my positions to attract more voters? (For now, we don’t provide them a way to change their likability.) Candidates that have a chance of winning move a random angle from their current position, based on how “flexible” they are on their positions. If that move works (i.e., moves them up in the polls), they continue randomly in the same general direction. If the move hurts their standing in the polls, they turn around and go randomly in the opposite general direction. At some point, the election is held – that is, the ultimate poll – and we see who wins.
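
That movement rule is essentially a stochastic hill-climb through the two-dimensional issue space. The sketch below is a rough reconstruction of it; the step size, angle jitter, and class names are our own assumptions rather than our actual implementation.

```python
import math
import random

class CandidateAgent:
    """A candidate moving in a 2-D issue space (economic, social), each 0-100."""

    def __init__(self, economic, social, flexibility=5.0):
        self.position = [economic, social]
        self.flexibility = flexibility          # step size, i.e. how "flexible" they are
        self.heading = random.uniform(0, 2 * math.pi)
        self.last_support = None

    def step(self, poll_support):
        """Move through issue space based on whether the last move helped in the polls."""
        if self.last_support is not None and poll_support < self.last_support:
            # The last move hurt: turn around and head in the opposite general direction.
            self.heading += math.pi
        # Otherwise keep going in the same general direction, with random wobble.
        self.heading += random.uniform(-math.pi / 4, math.pi / 4)
        self.position[0] += self.flexibility * math.cos(self.heading)
        self.position[1] += self.flexibility * math.sin(self.heading)
        # Keep positions on the 0-100 scale.
        self.position = [min(100, max(0, p)) for p in self.position]
        self.last_support = poll_support
```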

This approach allows us to run elections with different candidates, change a candidate’s likability, introduce shocks (e.g., candidates changing positions on an issue) and, eventually, see how different voting systems might impact who gets elected (foreshadowing future work).

We’re not the first to apply agent-based modeling in psephology by any stretch (there are many in the academic world using it to explain observed behaviours), but we haven’t found any attempting to do so to predict actual elections.

Applying this to the Toronto 2018 Mayoral Race

First, Toronto voters have, over the last few elections, voted somewhat more right-wing than might have been predicted. The average positions of voters’ mayoral choices across the city for the 2003, 2006, 2010, and 2014 elections look like the following:

Average voters' mayoral choices on economic and social issues (2003, 2006, 2010, 2014)

This doesn’t mean that Toronto voters are themselves more right-wing than might be expected, just that they voted this way. This is in fact the first interesting outcome of our new approach. We find that about 30% of Toronto voters’ choices have been driven by candidate likability, and that for the more right-wing candidates, likability has been a major reason for choosing them. For example, in 2010, Rob Ford’s likability score was significantly higher than those of his major competitors (George Smitherman and Joe Pantalone). This isn’t to say that everyone liked Rob Ford – but those who did vote for him cared more about something other than issues, at least relative to those who voted for his opponents.

For 2018, likability is less of a differentiating factor, with both major candidates (John Tory and Jennifer Keesmaat) scoring about the same. Nor are the issues – Ms. Keesmaat’s positions don’t seem to be hurting her standing in the polls, even though she’s staked out a strong position left of centre on both issues. What appears to be the bigger factor this time around is the early probabilities voters assigned to Ms. Keesmaat’s chances of victory, a point that seems to have been part of the actual Tory campaign’s strategy. Because much of the city never saw her as a major threat to John Tory, that narrative became self-reinforcing. Further, John Tory’s positions are relatively more centrist in 2018 than they were in 2014, when he had a clearly viable right-wing opponent in Doug Ford. (To prove the value of this approach, we could simply introduce a right-wing candidate and see what happens…)

Thus, our predictions don’t appear to be wildly different from current polls (with Tory winning nearly 2-to-1), and map as follows:

PsephoAnalytics' 2018 Mayoral Race Predictions

There will be much more to say on this, and much more we can do going forward, but for a proof of concept, we think this approach has enormous promise.

20 October 2015

How we did: High-level results

The day after an historic landslide electoral victory for the Liberal Party of Canada, we’ve compared our predictions (and those of other organizations who provide riding-level predictions) to the actual results in Toronto. 

Before getting to the details, we thought it important to highlight that while the methodologies of the other organizations differ, they are all based on tracking sentiments as the campaign unfolds. So, most columns in the table below will differ slightly from the one in our previous post as such sentiments change day to day.

This is fundamentally different from our modelling approach, which utilizes voter and candidate characteristics, and therefore could be applied to predict the results of any campaign before it even begins. (The primary assumption here is that individual voters behave in a consistent way but vote differently from election to election as they are presented with different inputs to their decision-making calculus.) We hope the value of this is obvious.

Now, on to the results! The final predictions of all organizations and the actual results were as follows:

Prediction sources: ThreeHundredEight.com, Vox Pop Labs, Election Atlas, Too Close to Call, Election Prediction Project  (as of October 19)

To start with, our predictions included many more close races than the others: while we predicted average margins of victory of about 10 points, the others were predicting averages well above that (ranging from around 25 to 30 points). The actual results fell in between at around 20 points.

Looking at specific races, we did better than the others at predicting close races in York Centre and Parkdale-High Park, where the majority predicted strong Liberal wins. Further, while everyone was wrong in Toronto-Danforth (which went Liberal by only around 1,000 votes), we predicted the smallest margin of victory for the NDP. On top of that, we were as good as the others in six ridings, meaning that we were at least as good as poll tracking in 9 out of 25 ridings (and our predictions could have been made 79 days ago, before the campaign even started, while the polls kept shifting up until the day before the election).

But that means we did worse in the other ridings, particularly Toronto Centre (where our model was way off), and in a handful of races that the model said would be close but ended up being strong Liberal wins. While we need to undertake much more detailed analysis (once Elections Canada releases such details), the “surprise” in many of these cases was the extent to which voters who might normally vote NDP chose to vote Liberal this time around (likely a coalescence of “anti-Harper” sentiment).

Overall, we are pleased with how the model stood up, and know that we have more work to do to improve our accuracy. This will include more data and more variables that influence voters’ decisions. Thankfully, we now have a few years before the next election…