18 September 2015

Data for federal elections

Analyzing the upcoming federal election requires collecting and integrating new data. This is often the most challenging part of any analysis, and we've put significant effort into obtaining good data for federal elections in Toronto's electoral districts.

The obvious place to start was Elections Canada and the results of previous general elections. These are available for download as collections of Excel files, which aren't the most convenient format. So we've updated our toVotes package to include results from the 2006, 2008, and 2011 federal elections for Toronto's electoral districts. The toFederalVotes data frame records each candidate's name, party, whether they were an incumbent, and the number of votes they received, by electoral district and poll number. Across the three elections, this amounts to 82,314 observations.
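To make the structure concrete, here is a minimal Python sketch of working with results in this long format: one row per candidate, per poll, per election. It rolls poll-level rows up to party totals and vote shares for each district. The field names and sample rows are hypothetical placeholders, not the actual toFederalVotes columns or real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PollResult:
    # Hypothetical field names; the real toFederalVotes columns may differ.
    year: int
    district: str
    poll: str
    candidate: str
    party: str
    incumbent: bool
    votes: int


def party_totals(rows):
    """Sum votes for each (year, district, party) across all polls."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r.year, r.district, r.party)] += r.votes
    return dict(totals)


def party_shares(rows):
    """Each party's fraction of all votes cast in its (year, district)."""
    totals = party_totals(rows)
    district_totals = defaultdict(int)
    for (year, district, _party), votes in totals.items():
        district_totals[(year, district)] += votes
    return {
        key: votes / district_totals[key[:2]]
        for key, votes in totals.items()
    }


# Made-up rows for illustration only: two candidates in two polls.
rows = [
    PollResult(2011, "District A", "1", "Candidate X", "Party 1", True, 120),
    PollResult(2011, "District A", "1", "Candidate Y", "Party 2", False, 80),
    PollResult(2011, "District A", "2", "Candidate X", "Party 1", True, 60),
    PollResult(2011, "District A", "2", "Candidate Y", "Party 2", False, 140),
]

print(party_totals(rows))
# {(2011, 'District A', 'Party 1'): 180, (2011, 'District A', 'Party 2'): 220}
print(party_shares(rows))
```

Keeping the data at the poll level and aggregating on demand means the same rows can later be regrouped by any other geography, such as census tracts.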

Connecting these voting data with other characteristics requires knowing where each electoral district and poll is located in Toronto. We therefore used spatial joins to integrate the datasets (e.g., to combine demographics from census data with the vote results). Shapefiles for each of the three federal elections are available for download, but the location identifiers in the Excel files don't cleanly match those in the shapefiles. With some help from Elections Canada, we were able to translate the location identifiers and join the voting data to the election shapefiles. This gives us close to 4,000 poll locations across 23 electoral districts in each year. We then used the census shapefiles to aggregate these voting data into 579 census tracts. These tracts are relatively stable over time and give us a common geographical classification for all of our data.
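The post doesn't spell out exactly how poll results were assigned to census tracts. One common approach, assumed here purely for illustration, is area-weighted apportioning. Each poll's votes are split among the tracts it overlaps, in proportion to the share of the poll's area that falls in each tract. The sketch assumes those overlap fractions have already been computed by intersecting the poll and tract shapefiles in a GIS tool. The function name, identifiers, and numbers are all hypothetical.

```python
from collections import defaultdict


def apportion_votes(poll_votes, overlaps):
    """Apportion poll-level votes to census tracts by area overlap.

    poll_votes: {poll_id: {party: votes}}
    overlaps:   {poll_id: {tract_id: fraction of the poll's area in that tract}}
    Returns {tract_id: {party: estimated votes}}.
    """
    tracts = defaultdict(lambda: defaultdict(float))
    for poll, party_votes in poll_votes.items():
        shares = overlaps.get(poll)
        if not shares:
            # A poll with no overlap would silently lose its votes; fail loudly.
            raise KeyError(f"no tract overlap recorded for poll {poll!r}")
        total = sum(shares.values())
        for tract, fraction in shares.items():
            # Renormalise so each poll's weights sum to 1,
            # guarding against small rounding errors in the GIS output.
            weight = fraction / total
            for party, votes in party_votes.items():
                tracts[tract][party] += votes * weight
    return {tract: dict(parties) for tract, parties in tracts.items()}


# Hypothetical example: poll 001 straddles tracts T1 and T2,
# while poll 002 lies entirely within T2.
poll_votes = {
    "001": {"Party 1": 100, "Party 2": 50},
    "002": {"Party 1": 40, "Party 2": 60},
}
overlaps = {
    "001": {"T1": 0.5, "T2": 0.5},
    "002": {"T2": 1.0},
}

by_tract = apportion_votes(poll_votes, overlaps)
print(by_tract)
# {'T1': {'Party 1': 50.0, 'Party 2': 25.0}, 'T2': {'Party 1': 90.0, 'Party 2': 85.0}}
```

A useful property of this approach is that total votes are conserved, which gives a simple sanity check after aggregation. A simpler alternative is to assign each poll wholly to the tract containing its centroid. However, that can misallocate votes for polls that straddle tract boundaries.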

This work is currently in the experimental fed-geo branch of the toVotes package and will be pulled into the main branch soon. Now, with votes aggregated into census tracts, we can use the census data for Toronto in our toCensus package to explore how demographics affect voting outcomes.

Getting the data to this point was more work than we expected, but well worth the effort. We're excited to see what we can learn from these data and look forward to sharing the results with you.
