## Voting sympathies in double round elections: the Finnish 2012 presidential election case

Update on 20120917: added W-S normality tests on residual and changed the plotting code to be compliant with the latest releases of ggplot

Update on 20130828: Added a graph stacking the estimated results of the second round attribution given first round preferences.

Certain interesting public opinion traits can be extracted from analyzing double round election results, and the mathematical tools required to do so are freely available. In particular I am going to analyze the results of the 2012 Finnish presidential election with the R numerical analysis software.

The Finnish 2012 presidential election (like all recent Finnish presidential elections) is made by two rounds of voting. In the first one a number of more than two candidates are electable. If the most voted candidate in the first round get less than 50% of the votes a second round of voting takes place where only the two most voted candidates in the first round compete for the win. For completion sake: just two candidates on the first round would make it redundant; 50% or more of the votes for a first round candidate make the second round unnecesary too.

My objective is to find what are the second round results made of. That is, I intend to build a model that explains why the second round results happened by using only the voting tally of the first round.

For the mathematically inclined: I intend to find a vector of weights $$w$$ such that $$w'x=y$$ where each $$x_{i}$$ are the results for the $$i$$-th candidate on the first round and $$y$$ are the results of a candidate on the second round. Depending on how $$w$$ is formulated second round voting preferences can be uncovered.

Let's start by saying that I have (and so can you) the results from the 2012 election:

First we load some the R packages we are going to take into use.

library(reshape2, warn.conflicts=FALSE, quietly=TRUE, verbose=FALSE)
library(plyr, warn.conflicts=FALSE, quietly=TRUE, verbose=FALSE)
library(ggplot2, warn.conflicts=FALSE, quietly=TRUE, verbose=FALSE)
library(limSolve, warn.conflicts=FALSE, quietly=TRUE, verbose=FALSE)


The election data is contained in two variables:

load("fin_prez_2012.RData")
ls() 
## [1] "area.results" "round.totals" 

The variable area.results hold the voting results for each polling area (an aggregation entity smaller or equal than a small city: small villages are likely to form a polling area, large cities contain many polling areas) and for both rounds and for each candidate:

head(area.results) 
##   split_level                          election.round variable  value
## 1 Äänekoski-Akanniemi - Karhunlähde    first          Arhinmäki 154
## 2 Äänekoski-Alkula-Keskusta (Suolahti) first          Arhinmäki 193
## 3 Äänekoski-Honkola                    first          Arhinmäki 82
## 4 Äänekoski-Keskusta                   first          Arhinmäki 137
## 5 Äänekoski-Kirkonmäki - Kuhnamo       first          Arhinmäki 188
## 6 Äänekoski-Konginkangas               first          Arhinmäki 40 

Eight candidates ran on the first round:

levels(area.results$variable)  ## [1] "Arhinmäki" "Haavisto" "Lipponen" "Essayah" "Soini" "Niinistö" ## [7] "Biaudet" "Väyrynen"  The following color scale should make plotting easier: candidate.colormap <- c("Arhinmäki"='#cd0009ff', "Haavisto"='#61bf1aff', "Lipponen"='#ed1b24ff', "Essayah"='#f7931dff', "Soini"='#edd866ff', "Niinistö"='#00577dff', "Biaudet"='#007ac9ff', "Väyrynen"='#1b9345ff')  Their totals for the first round is as follows: totals.1st.round <- ddply( subset(area.results, election.round=='first'), .(variable), function(x) c(agg=sum(x$value)))
colnames(totals.1st.round) <- c('candidate', 'vote.count')
totals.1st.round 
##   candidate   vote.count
## 1 Arhinmäki   167663
## 2 Haavisto    574275
## 3 Lipponen    205111
## 4 Essayah      75744
## 5 Soini       287571
## 6 Niinistö   1131254
## 7 Biaudet      82598
## 8 Väyrynen    536555

But a graph is worth a thousand words:

p <- ggplot(totals.1st.round, aes(x=candidate, y=vote.count/1000, fill=candidate))
p <- p + labs(title='Total votes (1st round)')
p <- p + geom_bar() + theme(legend.position='none')
p <- p + scale_fill_manual(values=candidate.colormap[totals.1st.round$candidate]) p <- p + scale_x_discrete(name="Candidate") p <- p + scale_y_continuous(name="Count") p  The other variable that we loaded, round.totals, holds the number of votes casted for each round and the difference in votes casted between the second round and the first. head(round.totals)  ## split_level first second second.round.diff ## 1 Äänekoski-Akanniemi - Karhunlähde 1480 1321 -159 ## 2 Äänekoski-Alkula-Keskusta (Suolahti) 1533 1383 -150 ## 3 Äänekoski-Honkola 1075 910 -165 ## 4 Äänekoski-Keskusta 1400 1263 -137 ## 5 Äänekoski-Kirkonmäki - Kuhnamo 1578 1385 -193 ## 6 Äänekoski-Konginkangas 773 676 -97  Summarizing the difference in votes casted shows something worth a remark… summary(round.totals$second.round.diff) 
##    Min.   1st Qu.  Median    Mean  3rd Qu.   Max.
## -683.0    -104.0    -63.0   -77.8   -35.0  241.0 

…which is that typically for each polling area electorate turnover is lower on the second round. That is, typically less voters cast their votes on the second round. We will delve into this promptly.

### Candidate drop and its influence in turnover

The drop in the turnover that we just show triggers the question of whether it is a generalized phenomena or something affecting to some particular supporters. That is whether the turnover drop is more severe on voters that on the first round voted for some particular candidate.

Hence we will estimate the second round turnover difference by using the vote count for each candidate in the first round. We will do so by assuming that the second round results are a linear combination of the results in the first round while allowing that each voter of a dropped-off candidate may abstain.

The model looks as follows:

\begin{aligned} & Xw \approx -\Delta \\ \text{where} & 0 \preceq w \preceq 1 \end{aligned}

where $$X$$ contains the results of the first voting round (one row for each polling area, one column for each candidate), $$w$$ are my mixing coefficients and $$\Delta$$ is the turnover difference with negative sign.

That is, it tries to find a linear combination of votes in the first round that explains the missing votes on the second round, and such linear combination is constrained to real numbers larger than 0 (the model assumes that only first round voters may take part in the second round) and smaller than 1 (the percentage of voters showing up for a given candidate on the second round cannot be larger than their number on the first round).

The model is the constructed by reducing the problem to an $$\ell^{2}_{2}$$-norm minimization problem while adding the natural mixing limits as constraints thus obtaining a modified Least Squares problem, where the modification is in the form of constraints defining a boxed feasible region:

\begin{aligned} \min_{w} & & \parallel Xw - \Delta \parallel ^2 \\ \text{s.t} & & 0 \preceq w \preceq 1 \end{aligned}

That is, find the vector $$w$$ that minimizes the difference between $$Xw$$ and the second round voting difference (notice the sign reversal since I am considering the difference also with reversed sign).

The function lsei() in the R package LimSolve is suitable for solving such minimization problem if we augment the parameter constraints as to suit out problem definition. That is, we have to convert our constraints specification $$0 \preceq w \preceq 1$$ into the canonical form expected by lsei() which is $$Gx \geq h$$ which for our case is

\begin{aligned} I w \succeq 0 \\ -I w \succeq -1 \end{aligned}

where $$I$$ is the identity matrix comformant with our weight vector $$w$$.

This is how it looks like in R. First we construct the matrix $$X$$

x <- dcast(subset(area.results, election.round == "first"), split_level ~ variable)
candidate.names <- colnames(x)[!colnames(x) == "split_level"]
X <- as.matrix(x[, candidate.names])
rownames(X) <- x$split_level  Building $$\Delta$$ is trivial: D <- matrix(with(round.totals, first - second), nrow = nrow(round.totals), ncol = 1) rownames(D) <- round.totals$split_level
colnames(D) <- "turnover.diff" 

Last but not least the augmented constraints construction:

n <- ncol(X)
I <- diag(rep(1, n))
G <- rbind(I, -I)
h <- rbind(matrix(0, n, 1), matrix(-1, n, 1)) 

So now we can move on to solve the minimization problem and show the values for the vector $$w$$:

sol <- lsei(A = X, B = D, G = G, H = h, type = 2, verbose = FALSE)
show(w <- sol$X)  ## Arhinmäki Haavisto Lipponen Essayah Soini Niinistö Biaudet Väyrynen ## 0.03245 0.00000 0.10747 0.00000 0.06378 0.00000 0.00433 0.24868  The most remarkable result is that almost 1 out of 4 people who voted for Väyrynen and 1 out of 10 for Lipponen in the first round did not vote for any of the two candidates on the second round. All resulting values can be seen in the graph below: x <- data.frame(Candidate=names(w), drop.out.rate=w) p <- ggplot(x, aes(x=Candidate, y=drop.out.rate*100, fill=Candidate)) p <- p + labs(title='Voter Second Round Drop-out Rate by Candidate') p <- p + geom_bar() + theme(legend.position='none') p <- p + scale_fill_manual(values=candidate.colormap[x$Candidate])
p <- p + scale_y_continuous(name="Drop-out%")
p 

Notice that Väyrynen ended third in the first voting round.

At this point it would be good to measure how good is the model fit. A analysis of the model residuals is in place:

r <- (X %*% (1 - w) - round.totals$second)/round.totals$second ## normalized by district size
shapiro.test(r) 
## ## Shapiro-Wilk normality test
## ## data: r
## W = 0.8568, p-value < 2.2e-16 
summary(r) 
##              V1
## Min. :  -0.1274
## 1st Qu.:-0.0155
## Median : 0.0012
## Mean :   0.0080
## 3rd Qu.: 0.0222
## Max. :   0.4112 

… which indicates residuals following a normal distribution.

### Final vote mix by first round preference

Last we will try to estimate the composition of the second round results from the first round voting.

First we start by compensating the number of votes lost in the second round, which is trivial now that we have our estimate in $$w$$

Z <- round(X %*% diag(1 - w))
colnames(Z) <- colnames(X) 

In the final round there are two candidates only. Let me introduce you their names:

as.character(unique(subset(area.results, election.round == "second")variable))  ## [1] "Haavisto" "Niinistö"  The variables $$Y_{H}$$ and $$Y_{N}$$ represent the number of votes each of them got in the second round (the index denotes the first name of their respective last names). YH <- subset(area.results, election.round == "second" & variable == "Haavisto") YN <- subset(area.results, election.round == "second" & variable == "Niinistö")  I am going to define $$w_{H},w_{N}$$ as the mix variables, that is $$w_{H}$$ is the vector that holds where first round votes and $$w_{N}$$ for Niinistö both compensated by second round drop-outs in the election process, that is: \begin{aligned} Zw_{H} \approx Y_{H} \\ Zw_{N} \approx Y_{N} \end{aligned} Since our model assumes that second round voters choose either of the two candidates the following identity holds: $w^{(i)}_{N} + w^{(i)}_{H} = 1$ Let's put is all together. The model we are trying to build now is slighly different than the previous one and it looks as follows (using only $$w_{H}$$): \begin{aligned} \min_{w_{H}} & & \parallel \begin{bmatrix} Z & 0 \\ 0 & Z \end{bmatrix} \begin{bmatrix} w_{H} \\ 1-w_{H} \end{bmatrix} - \begin{bmatrix}Y_{H}\\ Y_{N}\end{bmatrix} \parallel ^2 \\ \text{s.t} & & 0 \preceq w_{H} \preceq 1 \end{aligned} Quick reminder: $$Z$$ is the first round results compensated with the second round drop-out rate by candidate, $$Y_{H}$$ and $$Y_{N}$$ are the second round vote count for each of the two candidates in the second round, and $$w_{H}$$ are the mixing coefficient for the second round such that $$Zw_{H} \approx Y_{H}$$ and $$Z(1-w_{H}) \approx Y_{N}$$. An alternative way of expressing it is: \begin{aligned} \min_{w_{H,N}} & & \parallel \begin{bmatrix} Z & 0 \\ 0 & Z \end{bmatrix} \begin{bmatrix} w_{H} \\ w_{N} \end{bmatrix} - \begin{bmatrix}Y_{H}\\ Y_{N}\end{bmatrix} \parallel ^2 \\ \text{s.t} & & w_{H},w_{N} \succeq 0 \\ & & \begin{bmatrix} I; I \end{bmatrix} \begin{bmatrix} w_{H} \\ w_{N} \end{bmatrix} = 1 \end{aligned} in which the inequality constraints force the mix variable to be positive and the equality constraints force the exclusive voter preference. Let's see how we solve this. Let me do a sanity check first in order to make manipulations easier later on: stopifnot(Reduce(&, Map(function(a, b) Reduce(&, a == b), c(rownames(Z), rownames(Z)), c(YHsplit_level, YN$split_level))))  Now we construct the variables that the solver expects: ZZ <- matrix(0, nrow(Z), ncol(Z)) A <- rbind(cbind(Z, ZZ), cbind(ZZ, Z)) colnames(A) <- rep(colnames(Z), 2) b <- matrix(c(YH$value, YN$value), nrow(A), 1) E <- diag(1, ncol(Z), ncol(Z)) E <- cbind(E, E) f <- matrix(1, ncol(Z), 1) G <- diag(1, ncol(A), ncol(A)) h <- matrix(0, ncol(A), 1)  sol <- lsei(A, b, E, f, G, h, type = 1, verbose = FALSE) show(wh <- sol$X[seq(1, ncol(Z))]) ## Mix variables for Haavisto 
## Arhinmäki   Haavisto   Lipponen    Essayah      Soini   Niinistö    Biaudet  Väyrynen
## 1.0000000  1.0000000  0.7799422  0.0009319  0.3448406  0.0067961  0.4775150  0.1603960 
show(wn <- sol$X[seq(ncol(Z) + 1, 2 * ncol(Z))]) ## Mix variables for Niinisto  ## Arhinmäki Haavisto Lipponen Essayah Soini Niinistö Biaudet Väyrynen ## 0.0000 0.0000 0.2201 0.9991 0.6552 0.9932 0.5225 0.8396  Which seems to suggest that the vast majority of Essayah voters supported Niinistö in the second round, as did Väyrynen's supporters mostly and Soini's. Haavisto however got most of his support from Arhimäki's first round voters and most of Lipponen's. Biaudet's supporters look divided according to the model. Notice that the objective is to explain voting sympathies from first round choices and not final results though know the former allows you easity to infer the later. Last but not least a graph showing second round preferences by first round voting choice x <- data.frame(second.round.choice=c('Haavisto', 'Niinistö'), rbind(wh,wn)) x <- melt(x, c("second.round.choice"), candidate.names, variable.name='first.round.choice', value.name='Sympathy') p <- ggplot(x, aes(x=first.round.choice, y=Sympathy, fill=first.round.choice)) p <- p + geom_bar() + labs(title='Second round sympathy') + theme(legend.position='none') p <- p + scale_fill_manual(values=candidate.colormap[x$first.round.choice])
p <- p + scale_x_discrete(name="First round choice")
p <- p + scale_y_continuous(name="Sympathy %")
p <- p + facet_grid(second.round.choice ~ .) p 

Again, an analysys of the obtained residuals is in place:

## normalizing by individual area result
r <- 100 * (Z %*% matrix(wh) - YH$value)/YH$value
shapiro.test(r) 
## ## Shapiro-Wilk normality test
## ## data: r
## W = 0.6557, p-value < 2.2e-16 
summary(r) 
##              V1
## Min. :   -30.82
## 1st Qu.:  -3.07
## Median :   1.52
## Mean :     6.08
## 3rd Qu.:   8.49
## Max. :   221.79 

… which indicates residuals following a normal distribution.

#### Update

The following graph shows how the first round voting sympathies contributed to the second round final result according to the model estimates:

model.results <- merge(merge(second.round.sympathy,
drop.out.rate, by.x = c("first.round.choice"),
by.y = c("Candidate")), totals.1st.round,
by.x = c("first.round.choice"),
by.y = c("candidate"))
model.results$first.round.choice <- as.character(model.results$first.round.choice)
model.results$second.round.choice <- as.character(model.results$second.round.choice)

model.results$contributions = as.integer(round(with(model.results, vote.count * (1 - drop.out.rate) * Sympathy))) p1 <- ggplot() p1 <- p1 + geom_bar(data = model.results, stat = "identity", aes(x = second.round.choice, y = contributions/1000, fill = first.round.choice)) p1 <- p1 + scale_fill_manual(name = "First round votes for:", values = candidate.colormap[model.results$first.round.choice])
p1 <- p1 + scale_x_discrete(name = "\nSecond round candidates")
p1 <- p1 + scale_y_continuous(name = "Total seconds round votes (in 000s)\n")
p1 <- p1 + theme(axis.title.x = element_text(size = 24),
axis.title.y = element_text(size = 24))
p1 <- p1 + theme(axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14))
p1 <- p1 + theme(legend.title = element_text(size = 16),
legend.text = element_text(size = 14))
p1