Candidate Predictions – Split Voting vs GLM I: Theory

The most challenging and, to my mind, the most interesting part of the Herald election prediction model was the candidate vote projections. Unsurprisingly, these were also the least accurate part of my model. For more about that, read my previous post here. To produce these predictions, I used a GLM with party vote, previous candidate vote, incumbency, and party – along with some interaction terms – as explanatory variables. While the predictions did well at picking the winner, they did poorly in percentage terms. There is another way to predict candidate vote: using split voting data. In fact, this was the method used by another person who tried to predict the 2017 candidate vote.

Before continuing further, I will briefly explain what splitting your vote means and how the New Zealand electoral system works more generally. I expect most of my readers to be from New Zealand or familiar with the electoral system – if you are, feel free to skip to the next paragraph – but just in case there is anyone who isn’t, here is a quick rundown. New Zealand uses Mixed Member Proportional voting, or MMP. This means everyone gets two votes: one for the party of their choosing, and one for a candidate in their electorate, who is usually affiliated with a party. To get seats in parliament, a party needs either to break the 5% threshold of party vote nationwide or to win an electorate seat. Electorate seats are won by whoever gets the most votes, as under FPP (first past the post). Since many parties don’t have enough supporters for their candidate to win an electorate, many people will vote for a different, but closely aligned, party’s candidate rather than the candidate of the party they voted for. The most common example is Green Party voters voting for Labour Party candidates. This is called splitting your vote. The Electoral Commission records and publishes split voting reports for each electorate. For more information on the New Zealand electoral system follow this link and for access to election statistics click here.

How Each of the Methods Would Work:

I have briefly described above the variables used in the GLM I fitted. To predict, I would then take a random draw from a multivariate normal distribution with a covariance matrix built as follows:
SE = sqrt(Standard Error of fit^2 + Residual Scale of fit^2) – This formula for the standard error was used because it is the one I found for gaussian log-link responses here.
Covariance matrix = SE * t(SE) * past matrix
Where past matrix is the covariance matrix of candidate vote between the different parties across all general electorates in past elections (the same was done for Maori electorates, except that the covariance was taken from Maori electorates). Some electorates, such as Wigram with Jim Anderton and Ohariu with Peter Dunne of United Future, were excluded.
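To make the steps above concrete, here is a minimal Python sketch of one way to read them. It is not the code used in the model. It assumes that SE * t(SE) means the outer product of the SE vector with itself, multiplied element-wise by the past matrix, and it uses a hand-written Cholesky factorisation to draw from the multivariate normal. All function names and all numbers are made up for illustration.

```python
import math
import random

def prediction_se(se_fit, residual_scale):
    """Combine each candidate's standard error of fit with the residual scale."""
    return [math.sqrt(s ** 2 + residual_scale ** 2) for s in se_fit]

def build_covariance(se, past_matrix):
    """Assumed reading: outer product of se with itself, times past_matrix element-wise."""
    n = len(se)
    return [[se[i] * se[j] * past_matrix[i][j] for j in range(n)] for i in range(n)]

def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def draw_candidate_votes(mean, covariance, rng):
    """One multivariate normal draw: mean + L z, where z is standard normal."""
    lower = cholesky(covariance)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

# Made-up values for three candidates in one electorate (illustration only).
fitted = [0.45, 0.40, 0.08]
se = prediction_se(se_fit=[0.02, 0.02, 0.01], residual_scale=0.03)
past = [[1.0, -0.6, -0.2],
        [-0.6, 1.0, -0.1],
        [-0.2, -0.1, 1.0]]
cov = build_covariance(se, past)
draw = draw_candidate_votes(fitted, cov, random.Random(1))
```

In a full simulation this draw would be repeated many times per electorate, on whatever scale the GLM predictions are reported on.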

The split voting analysis would operate slightly differently. After calculating party votes for all parties in the electorate, you would multiply the split vote matrix by the vector of party votes. The split vote matrix would be calculated from past elections’ split vote data, both nationwide and with individual electorate adjustments. It would also be varied based on how each party’s split voting has varied in the past. You would also have to adjust for which parties were running candidates – in this case you would likely have only the four major parties and an ‘Other’ column for general electorates.
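The core multiplication can be sketched in a few lines of Python. This is only an illustration of the idea, not a fitted model: the party list, the split vote shares and the party votes below are all invented, and the function name is hypothetical. Each column of the matrix describes how one party's voters distribute their electorate votes, so each column sums to 1.

```python
def candidate_votes_from_split(split_matrix, party_votes):
    """Multiply the split vote matrix by the vector of party votes.

    split_matrix[candidate_party][voter_party] is the share of voter_party's
    party voters who gave their electorate vote to candidate_party's candidate.
    """
    return {
        cand: sum(row[party] * party_votes[party] for party in party_votes)
        for cand, row in split_matrix.items()
    }

# Invented party vote shares for one electorate (they sum to 1).
party_votes = {"National": 0.40, "Labour": 0.38, "Green": 0.08,
               "NZ First": 0.07, "Other": 0.07}

# Invented split vote shares; rows are candidates, columns are voters' parties.
split_matrix = {
    "National": {"National": 0.85, "Labour": 0.05, "Green": 0.05,
                 "NZ First": 0.25, "Other": 0.30},
    "Labour":   {"National": 0.05, "Labour": 0.82, "Green": 0.50,
                 "NZ First": 0.25, "Other": 0.20},
    "Green":    {"National": 0.02, "Labour": 0.06, "Green": 0.35,
                 "NZ First": 0.02, "Other": 0.05},
    "NZ First": {"National": 0.03, "Labour": 0.03, "Green": 0.02,
                 "NZ First": 0.40, "Other": 0.05},
    "Other":    {"National": 0.05, "Labour": 0.04, "Green": 0.08,
                 "NZ First": 0.08, "Other": 0.40},
}

candidate_votes = candidate_votes_from_split(split_matrix, party_votes)
```

Because every column sums to 1, the predicted candidate vote shares sum to the same total as the party votes. The electorate adjustments and random variation described above would be applied to the matrix before this step.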

Strengths and Weaknesses of the GLM:

The primary strength of the GLM method is that it can be applied to nearly every electorate without change – Epsom is the only exception – and it is relatively parameter efficient. This is particularly useful as the simulation code is quite time-consuming to run and so having a model which does not need to re-estimate parameters for every single electorate every simulation cuts down run time.

The GLM does have some weaknesses though, as outlined in the last post. To recap:

Overconfidence for individual predictions
Lack of reasonable prediction intervals
Does not deal with tactical voting particularly well
Breaks when candidates have ‘name recognition’

Strengths and Weaknesses of Split Voting:

The strength of the split voting method is that it inherently deals with tactical voting, which is why it was used to predict the Epsom electorate in the model this past election. It can also deal with the ‘name recognition’ problem if that candidate has run before in that electorate: we assume that if they are well known, they likely drew more votes from other parties’ supporters in the past.

This highlights one of the problems of the split voting method, though: it can’t predict whether split voting patterns will continue. Take, for example, Chlöe Swarbrick in Maungakiekie, who won 13% of Labour voters and 49% of Green Party voters, compared to 3% and 35% for the Green Party candidate in 2014 and 3% and 30% in 2011. Or what would happen if a Green Party realignment occurred and Green Party voters decided to vote for National candidates? Obviously that is unlikely, but that sort of change in tactical voting preference would ruin the split voting model if it occurred.

We also have the issue of having to adjust each individual electorate’s split vote matrix based on past elections and which parties are running. We would likely only predict the four major parties to avoid this. However, that reduces the coverage of the model, and we might be interested in knowing the percentage won by minor parties.

My final concern, which I plan on exploring in a future blog post, is the relationship between party vote and split voting. I would hypothesise that the higher the Green Party’s vote goes, the larger the percentage of its voters that would vote for a Labour candidate. This is partly because more ‘Labour’ voters would have swung to the Green Party, and therefore the proportion of Green votes for Labour candidates would increase. I suspect a similar occurrence for ACT/National, although it is not as significant anymore.

Conclusions:

My current thinking is that the best way to predict candidate vote would be either a combination of a split vote model and a GLM – although this would be computationally time-consuming – or a GLM which has a ‘split’ party vote as an explanatory variable. This would be something along the lines of:

Labour Split Party Vote = 0.8*Labour Party Vote + 0.5*Green Party vote etc.

Where the split vote constants are taken as the national average with variation. This is not perfect, but if we were to make it more accurate we would end up with a split vote model calculating the party vote, so we might as well return to the prior suggestion. I plan on building a simple split vote model in the coming weeks and I’ll compare it to the GLM in a future post.
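As a rough illustration of the ‘split’ party vote idea, the Python sketch below computes the Labour value from the two constants in the formula above. The function name and the party vote numbers are invented, the terms hidden behind the "etc." are simply left out, and the variation around the national average is not modelled here.

```python
def split_party_vote(party_votes, weights):
    """Weighted sum of party votes; parties without a weight contribute nothing."""
    return sum(weights.get(party, 0.0) * vote for party, vote in party_votes.items())

# The two constants given in the post's example; any further terms are omitted.
labour_weights = {"Labour": 0.8, "Green": 0.5}

# Invented party vote shares for one electorate.
party_votes = {"National": 0.40, "Labour": 0.38, "Green": 0.08,
               "NZ First": 0.07, "Other": 0.07}

labour_split_party_vote = split_party_vote(party_votes, labour_weights)
```

The resulting number would then replace, or sit alongside, the raw Labour party vote as an explanatory variable when fitting the GLM.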


Published by areaundythecurve
