Now that the final results are available, we can see how our predictions performed at the census tract level.
For this analysis, we restrict the comparison to Tory and Keesmaat, as they were the only two major candidates and the only two for whom we estimated vote share. We start by plotting the difference between the actual and predicted vote shares for Keesmaat. The distribution for Tory is simply the mirror image, since the two candidates’ shares always sum to 100% in this comparison.
Distribution of the difference between the predicted and actual proportion of votes for Keesmaat
The mean difference (actual minus predicted) for Keesmaat is -6 percentage points, which means that, on average, we slightly overestimated her support. As the histogram shows, however, the difference varies substantially across census tracts, and the distribution is slightly skewed towards overestimating Keesmaat’s support.
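To make the sign convention concrete, here is a minimal sketch of the calculation, assuming the difference is taken as actual minus predicted share for each census tract. The tract names and vote shares are made up for illustration and are not our results; the `differences` helper is likewise hypothetical.

```python
from statistics import mean

# Hypothetical per-tract vote shares for Keesmaat (illustrative numbers only,
# not taken from our predictions or the official results).
keesmaat = {
    "tract_a": {"predicted": 0.40, "actual": 0.31},
    "tract_b": {"predicted": 0.35, "actual": 0.33},
    "tract_c": {"predicted": 0.28, "actual": 0.30},
}


def differences(shares):
    """Return actual minus predicted share for each tract.

    A negative value means the prediction overestimated the candidate.
    """
    return {tract: s["actual"] - s["predicted"] for tract, s in shares.items()}


# In a two-candidate comparison, Tory's share is the complement of Keesmaat's.
tory = {
    tract: {"predicted": 1 - s["predicted"], "actual": 1 - s["actual"]}
    for tract, s in keesmaat.items()
}

keesmaat_diff = differences(keesmaat)
tory_diff = differences(tory)

# The two means have equal magnitude and opposite sign.
print(f"Keesmaat mean difference: {mean(keesmaat_diff.values()):+.3f}")
print(f"Tory mean difference:     {mean(tory_diff.values()):+.3f}")
```

Because each tract's two shares sum to one, every Tory difference is the negative of the corresponding Keesmaat difference, which is why the two distributions mirror each other.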
To better understand this variation, we can look at a plot of the geographical distribution of the differences. In this figure, we show both Keesmaat and Tory. Although the plots are just inverted versions of each other (since the proportion of votes always sums to 100%), seeing them side by side helps illuminate the geographical structure of the differences.
The distribution of the difference between the predicted and actual proportion of votes by census tract
The differences show no clear geographical pattern. In some sense, this is good, as it suggests our agent-based model isn’t systematically biased towards any particular part of the city. Rather, refinements to the model should improve accuracy across all census tracts.
We’ll write up the details of our new agent-based approach soon. In the meantime, these results show that the approach has promise, given that it used only a few demographic characteristics and no polls. We’re now particularly motivated to gather much more data to enrich our agents’ behaviour and make better predictions.