A Bayesian Prediction Model For The U.S. Presidential Election
Volume 37 Number 4
July 2009 700-724
© 2009 SAGE Publications
Steven E. Rigdon
Southern Illinois University, Edwardsville
Sheldon H. Jacobson
Wendy K. Tam Cho
University of Illinois, Urbana-Champaign
Edward C. Sewell
Southern Illinois University, Edwardsville
Christopher J. Rigdon
Arizona State University, Tempe
It has become a popular pastime for political pundits and scholars alike to
predict the winner of the U.S. presidential election. Although forecasting
now has quite a history, we argue that the closeness of recent presidential
elections and the wide accessibility of data should change how presidential
election forecasting is conducted. We present a Bayesian forecasting model
that concentrates on the Electoral College outcome and considers finer
details such as third-party candidates and self-proclaimed undecided voters.
We incorporate our estimators into a dynamic programming algorithm to
determine the probability that a candidate will win an election.
Bayesian Estimators
Table 1
Voting Behavior of Alabama and New York
Columns: Alabama, New York
Bayesian Formulation
More formally, define $p_i$ to be the true proportion of voters in a state
who intend to vote for candidate $i$ in the election. For simplicity, let
$i = 1$ correspond to the Republican candidate, $i = 2$ to the Democratic
candidate, $i = 3$ collectively to all third-party candidates, and $i = 4$
to no candidate, that is, voters who have declared that they are still
undecided. These proportions are assumed to be continuous (between 0 and 1)
and to sum to 1. The joint prior distribution for $p = (p_1, p_2, p_3, p_4)$
is assumed to be a conjugate prior distribution (i.e., when combined with a
multinomial likelihood, it yields a posterior distribution of the same type).
To satisfy this requirement, assume that $p$ follows a Dirichlet
distribution, $p \sim \mathrm{DIRICHLET}(b_1, b_2, b_3, b_4)$. The Dirichlet
is a multivariate generalization of the beta distribution, which is often
used as a prior for the probability of a success in Bernoulli trials.
Therefore, the joint probability density function of $p$ can be written as
\[
f(p_1, p_2, p_3, p_4) = c\, p_1^{b_1 - 1} p_2^{b_2 - 1} p_3^{b_3 - 1} p_4^{b_4 - 1},
\quad p_i \ge 0 \text{ for } i = 1, 2, 3, 4, \quad \sum_{i=1}^{4} p_i = 1,
\]
where $c = \Gamma\left(\sum_{i=1}^{4} b_i\right) / \prod_{i=1}^{4} \Gamma(b_i)$.
The probability that a candidate wins a given state can be computed
using the marginal probability densities. To obtain these marginals, we
sequentially integrate the remaining variables out of the joint Dirichlet
probability density function. We now illustrate this process by first
rewriting the joint Dirichlet probability density function as
\[
f_{1,2,3}(p_1, p_2, p_3) = c\, p_1^{b_1 - 1} p_2^{b_2 - 1} p_3^{b_3 - 1}
(1 - p_1 - p_2 - p_3)^{b_4 - 1},
\quad p_1, p_2, p_3 \ge 0, \quad \sum_{i=1}^{3} p_i \le 1. \tag{1}
\]
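\[
E(p_i) = \frac{b_i}{\sum_{k=1}^{4} b_k} \tag{9}
\]
and
\[
\mathrm{Var}(p_i) = \frac{b_i \left(\sum_{k=1}^{4} b_k - b_i\right)}
{\left(\sum_{k=1}^{4} b_k + 1\right)\left(\sum_{k=1}^{4} b_k\right)^{2}} \tag{10}
\]
\[
= \frac{\dfrac{\rho}{\rho + \tau}\left(1 - \dfrac{\rho}{\rho + \tau}\right)}{\rho + \tau + 1}, \tag{11}
\]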
where $\rho = b_i$ and $\tau = \sum_{k=1}^{4} b_k - b_i$. Then, if
$b_1 = b_2 = 4$ and $b_3 = b_4 = 1$, or $b_1 = b_2 = 40$ and
$b_3 = b_4 = 10$, the prior means are the same, namely,
\[
E(p_1) = E(p_2) = \frac{4}{10} = 0.4, \qquad
E(p_3) = E(p_4) = \frac{1}{10} = 0.1.
\]
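The point that two priors can share their means yet differ in spread can be
checked numerically. The following is a minimal Python sketch, not part of the
original analysis; the function name `dirichlet_mean_var` is ours and is
introduced only for illustration. It evaluates the mean and variance formulas
in Equations 9 and 10 for the two parameter choices above.

```python
def dirichlet_mean_var(b):
    """Return (means, variances) of the marginals of a Dirichlet(b) prior."""
    total = sum(b)
    # Equation (9): E(p_i) = b_i / S, where S is the sum of the parameters.
    means = [bi / total for bi in b]
    # Equation (10): Var(p_i) = b_i (S - b_i) / ((S + 1) S^2).
    variances = [bi * (total - bi) / ((total + 1) * total ** 2) for bi in b]
    return means, variances


weak_means, weak_vars = dirichlet_mean_var([4, 4, 1, 1])
strong_means, strong_vars = dirichlet_mean_var([40, 40, 10, 10])

print("means (sum 10): ", weak_means)
print("means (sum 100):", strong_means)
print("Var(p1), sum 10: ", weak_vars[0])
print("Var(p1), sum 100:", strong_vars[0])
```

Both parameter sets give the same means, but the variance of $p_1$ is roughly
nine times smaller when the parameters sum to 100 rather than 10, which is why
the sum of the parameters acts as a measure of prior certainty.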
The choice of the shape parameters is essentially arbitrary if we do not
impose any constraints or use any substantive guidance. Fortunately,
in presidential forecasting, we have a great deal of substantive knowledge
that can be integrated. One way to constrain our choices is to choose the
$b_i$ so that (a) $b_i / \sum_{k=1}^{4} b_k$ equals what $p_i$ is expected to
be, prior to observing the polling data, and (b) the spread of the prior
distribution for $p_i$ (determined by $\sum_{k=1}^{4} b_k$, with larger
values indicating less uncertainty) reflects the perceived uncertainty in
$p_i$.
To illustrate this process, consider the marginal probability density of
$p_1$ in the candidates' home states: Massachusetts and Texas. In 2000, the
most recent presidential election before 2004, Bush received 33% of the
popular vote in Massachusetts and 59% of the popular vote in Texas. If we let
$\sum_{k=1}^{4} b_k = 4$ and set $b_1$ such that $E(p_1)$ equals the
percentage of the popular vote that the Republican candidate won in the 2000
election in the given state (after adjusting for undecided voters; see
below), the two marginal densities for $p_1$ are depicted in Figure 1. Under
these choices for the $b_i$, Senator Kerry had a 0.374 probability of winning
Texas, while President Bush had a 0.208 probability of winning Massachusetts.
These probabilities seem too high given both the historical voting patterns
of these states and that these states are the home states of the candidates
(Lewis-Beck & Rice, 1983).

Figure 1
Marginal Probability Density Function for p1 in Massachusetts
and Texas When Sum of Prior Parameters Is 4

Another choice would be to let $\sum_{k=1}^{4} b_k = 400$, in which case
Senator Kerry has a 0.002 probability of winning Texas, while President Bush
has a $10^{-15}$ probability of winning Massachusetts (see Figure 2). These
probabilities seem too low given historical trends. Perhaps a value in
between these two choices would be preferable. Toward this end, setting
$\sum_{k=1}^{4} b_k = 40$ gives Senator Kerry a 0.177 probability of winning
Texas and gives President Bush a 0.010 probability of winning Massachusetts
(see Figure 3; see also Note 1). Of these values (4, 400, and 40), 40 appears
to be the most substantively grounded, so we move forward with
$\sum_{k=1}^{4} b_k$ being set to 40.
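The effect of the prior sum on a trailing candidate's chances can be sketched
by simulation. The Python fragment below is an illustration under stated
assumptions, not a reproduction of the probabilities reported above: the prior
means are hypothetical round numbers (not the adjusted values used in the
article), "winning" is taken to mean $p_2 > p_1$, and the function name
`prob_second_beats_first` is ours. It draws from the Dirichlet prior by
normalizing independent gamma variates.

```python
import random


def prob_second_beats_first(means, strength, draws=20000, seed=0):
    """Monte Carlo estimate of P(p2 > p1) under a Dirichlet prior.

    means    : hypothetical prior means (p1, p2, p3, p4), summing to 1
    strength : the sum of the four Dirichlet parameters
    """
    rng = random.Random(seed)
    b = [m * strength for m in means]
    wins = 0
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent Gamma(b_i, 1) draws
        # divided by their sum; the division does not change whether
        # the second component exceeds the first, so it is skipped here.
        g = [rng.gammavariate(bi, 1.0) for bi in b]
        if g[1] > g[0]:
            wins += 1
    return wins / draws


# Hypothetical prior means for a state that leans toward candidate 1.
means = (0.57, 0.37, 0.03, 0.03)
estimates = {s: prob_second_beats_first(means, s) for s in (4, 40, 400)}
for strength, estimate in estimates.items():
    print(f"sum of parameters = {strength:3d}: P(p2 > p1) ~ {estimate:.3f}")
```

With the means held fixed, the estimated probability of an upset shrinks as
the sum of the parameters grows, which is the trade-off weighed in the choice
among 4, 40, and 400.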
Our priors for the Republican and Democratic candidates involve the normal
vote for all states except Alaska and Hawaii (and Washington, D.C.). For
the combined third-party candidates, we chose the mean of the prior to be
equal to the combined third-party vote in 2000. The prior mean for each of
the Republican and Democratic candidates for president was taken to be the
normal vote reduced by half of the third-party vote for that state.
To incorporate the effect from voters who declared that they were
undecided, let us assume that the prior mean for undecided voters is 3%.
This is an assumption in the purest sense, but it also seems reasonable as a
prior given the percentage of undecided voters in our various polls. Of
course, our method is not wedded to this value, and users are free to
incorporate different values as they see fit. We could base this figure on older
polls or even an intuition of current trends since the last election.
Figure 2
Marginal Probability Density Function for p1 in Massachusetts
and Texas When Sum of Prior Parameters Is 400
Figure 3
Marginal Probability Density Function for p1 in Massachusetts and
Texas When Sum of Prior Parameters Is 40
Figure 4
Marginal Density of p1 for Florida
where $NV_i$ is the normal vote for candidate $i$ and $C_3$ is the proportion
of the 2000 vote for third-party candidates. Using these parameters, the prior
distribution for a given state is
\[
p = (p_1, p_2, p_3, p_4) \sim \mathrm{DIRICHLET}\left(
38.8\left(NV_1 - \tfrac{1}{2} C_3\right),\;
38.8\left(NV_2 - \tfrac{1}{2} C_3\right),\;
38.8\, C_3,\; 1.2\right).
\]
Illustrations of priors. The backdrop of the 2000 election may be used
to illustrate how this prior works with real data. In 2000, 49% of
Floridians voted for Governor Bush, 49% voted for Vice President Gore, and
2% voted for some other candidate. Therefore, the prior distribution for $p$
in Florida is
\[
p = (p_1, p_2, p_3, p_4) \sim \mathrm{DIRICHLET}(19.012, 19.012, 0.776, 1.2).
\]
From Figure 4, we can see that the marginal prior probability densities of
$p_1$ and $p_2$ are identical in this case.
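The construction of the prior parameters can be written out as a short Python
sketch. The function name `dirichlet_prior` and its argument names are ours.
We also assume, as an inference from the Florida figures rather than a
statement in the text, that the Florida inputs correspond to a normal vote of
0.5 for each major party and a third-party share of 0.02, since those inputs
reproduce the parameters shown above.

```python
def dirichlet_prior(nv1, nv2, c3, major_weight=38.8, undecided=1.2):
    """Build the Dirichlet prior parameters (b1, b2, b3, b4) for one state.

    nv1, nv2 : normal vote for the Republican and Democratic candidates
    c3       : combined third-party share of the 2000 vote
    The default weights 38.8 and 1.2 are the constants in the prior above,
    so the parameters sum to 40 when nv1 + nv2 = 1.
    """
    half_third = c3 / 2.0
    return (
        major_weight * (nv1 - half_third),
        major_weight * (nv2 - half_third),
        major_weight * c3,
        undecided,
    )


# Assumed Florida inputs: normal vote 0.5 each, third-party share 0.02.
florida = dirichlet_prior(0.5, 0.5, 0.02)
print([round(b, 3) for b in florida])
print("sum of parameters:", round(sum(florida), 3))
print("prior mean for undecided voters:", round(florida[3] / sum(florida), 3))
```

Because the four parameters sum to 40 and the fourth is fixed at 1.2, the
prior mean for undecided voters is 1.2/40 = 3%, matching the assumption stated
earlier.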
To obtain an expression for the likelihood function, let X = (X1 , X2 ,
X3 , X4 ) denote the random vector of sample proportions in a state poll for
p. Therefore, for a survey with n respondents,
\[
g(X \mid p) = \frac{742!}{364!\, 356!\, 0!\, 22!}\,
p_1^{364}\, p_2^{356}\, p_3^{0}\, p_4^{22}, \qquad \sum_{k=1}^{4} p_k = 1.
\]
where
Therefore,
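\[
h(p \mid X) = C_B\, c\, p_1^{b_1 - 1} p_2^{b_2 - 1} p_3^{b_3 - 1} p_4^{b_4 - 1}\,
\frac{n!}{x_1!\, x_2!\, x_3!\, x_4!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3} p_4^{x_4}
\]
\[
= C\, p_1^{x_1 + b_1 - 1} p_2^{x_2 + b_2 - 1} p_3^{x_3 + b_3 - 1} p_4^{x_4 + b_4 - 1},
\quad p_i \ge 0\ \forall i, \quad \sum_{i=1}^{4} p_i = 1,
\]
where
\[
C = \frac{\Gamma\left(\sum_{k=1}^{4} (b_k + x_k)\right)}{\prod_{k=1}^{4} \Gamma(b_k + x_k)}.
\]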
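Because the posterior is again Dirichlet with parameters $b_k + x_k$, the
update amounts to adding the poll counts to the prior parameters. The Python
sketch below illustrates this; the function name `dirichlet_posterior` is
ours, and pairing the Florida prior with the 742-respondent poll is an
assumption made only for the sake of the example.

```python
def dirichlet_posterior(prior, counts):
    """Conjugate update: Dirichlet(b) prior + multinomial counts x
    gives a Dirichlet(b + x) posterior."""
    return tuple(b + x for b, x in zip(prior, counts))


prior = (19.012, 19.012, 0.776, 1.2)   # Florida prior parameters
counts = (364, 356, 0, 22)             # poll counts x1, x2, x3, x4 (n = 742)

posterior = dirichlet_posterior(prior, counts)
total = sum(posterior)
posterior_means = tuple(a / total for a in posterior)

print("posterior parameters:", [round(a, 3) for a in posterior])
print("posterior means:     ", [round(m, 4) for m in posterior_means])
```

The posterior parameters sum to 40 + 742 = 782, so in this example the poll
carries far more weight than the prior.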
Figure 5
Marginal Density for p1 (solid line) and p2 (dotted line)
the user feels that it is far flung from any conceivable reality. After
perusing exit poll results for our election of interest, we felt comfortable
with this choice.
The posterior probability that a particular candidate wins a state can be
computed from the joint posterior distribution of the $p_i$. For the no swing
scenario, we need only the posterior of $(p_1, p_2)$. For example, the
probability that the Republican candidate wins a state is
$P(p_1 > p_2 \mid x)$.
For the Republican swing scenario, the posterior probability must be
computed as a triple integral over the appropriate region. Figure 6 shows the
region in the $p_1 p_2$ plane for a fixed value of $p_4$; $p_4$ then goes
from 0 to 1. As this figure suggests, the triple integral must be broken into
two parts, giving
\[
\cdots + \int_{0}^{1} \int_{0.5 - 0.54 p_4}^{1 - p_4} \int_{0}^{1 - p_4 - p_1}
f_{124}(p_1, p_2, p_4 \mid x)\, dp_2\, dp_1\, dp_4.
\]
and
p1 + ðthird-party swingÞ · p4 > p3 + ðthird-party swingÞ · p4
Figure 6
Region of Integration in the p1 p2 Plane
for the Republican Swing Scenario
(from the Gamma function) and very small numbers (from the fractions
raised to large powers), the numerical evaluations are prone to
inaccuracies unless a high degree of precision is maintained throughout the
computation. We used WinBUGS to obtain the probabilities that each candidate
would win a state. Kaplan and Barnett (2003) cleverly designed a dynamic
programming algorithm to compute the probability distribution for the
number of Electoral College votes that a candidate will receive. This
dynamic programming algorithm, which uses the state-by-state probabilities,
was implemented in MATLAB. Implementation of our model, however, can be
completed in other software packages as well. For example,
Mathematica could be used to evaluate the integrals.
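One common way to sidestep the overflow and underflow described above is to
work on the logarithmic scale. The Python sketch below is an illustration of
that general technique, not the procedure used in this article; the function
names and the parameter values are ours and purely illustrative. It evaluates
a Dirichlet log-density with `math.lgamma`, and shows that the direct Gamma
function overflows for parameter sums of the size produced by a typical poll.

```python
import math


def log_dirichlet_constant(a):
    """log of Gamma(sum a_k) / prod Gamma(a_k), computed with lgamma."""
    return math.lgamma(sum(a)) - sum(math.lgamma(ak) for ak in a)


def log_dirichlet_density(p, a):
    """log of the Dirichlet(a) density at a point p with all p_k > 0."""
    return log_dirichlet_constant(a) + sum(
        (ak - 1.0) * math.log(pk) for ak, pk in zip(a, p)
    )


# Illustrative posterior parameters and evaluation point (both hypothetical).
params = (383.012, 375.012, 0.776, 23.2)
point = (0.49, 0.48, 0.001, 0.029)

log_density = log_dirichlet_density(point, params)
print("log density:", log_density)

# The direct Gamma function cannot represent Gamma(782) in double precision.
try:
    math.gamma(sum(params))
    overflowed = False
except OverflowError:
    overflowed = True
print("direct Gamma overflowed:", overflowed)
```

Keeping every factor on the log scale and exponentiating only at the end is
one way to maintain precision; arbitrary-precision arithmetic is another.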
We now illustrate the use of our estimators for forecasting the 2004
U.S. presidential election. Our state-level surveys for this election were
reported by Real Clear Politics, an independent company that gathers
information from numerous publishing companies and organizes it for
public consumption on its Web site. The polls were gathered from a
variety of companies including Zogby, Rasmussen Reports, NBC, The
Wall Street Journal, the American Research Group, Fox News, and
Survey USA, among others.
For each state, the posterior distribution for the proportion of voters
who will vote for a candidate, the prior distribution for p, and the
likelihood distribution for X are all explicitly given in the previous section. The
posterior probabilities, shown in Table 2, can be used to compute the
probability that a candidate wins a state. A candidate who wins a state is
awarded the number of Electoral College votes associated with that state
(with the exception of Maine and Nebraska). To estimate the probability
that President Bush wins a state, the posterior probability that p1 > p2 was
computed, which is equivalent to assuming that all third-party candidates
have a zero probability of winning a state. In 2004, no third-party
candidate was in a position to win a state, though third-party candidates were
influential in close states.
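The posterior probability that $p_1 > p_2$ can also be approximated by direct
Monte Carlo sampling from the Dirichlet posterior. The Python sketch below
uses plain sampling in place of the WinBUGS computation and should be read as
an illustration only. The posterior parameters are hypothetical, the function
name `prob_republican_wins` is ours, and the `undecided_share` argument is a
hypothetical way to express a swing (the fraction of undecided voters who
break for the Republican); it is not the article's definition of the swing
scenarios.

```python
import random


def prob_republican_wins(posterior, undecided_share=0.5, draws=20000, seed=0):
    """Estimate P(p1 + s*p4 > p2 + (1 - s)*p4) from Dirichlet draws.

    undecided_share (s) is the assumed fraction of undecided voters who
    break for the Republican; s = 0.5 reduces to the event p1 > p2.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Normalized independent gamma variates form a Dirichlet draw.
        g = [rng.gammavariate(a, 1.0) for a in posterior]
        total = sum(g)
        p1, p2, _, p4 = (x / total for x in g)
        if p1 + undecided_share * p4 > p2 + (1.0 - undecided_share) * p4:
            wins += 1
    return wins / draws


posterior = (383.012, 375.012, 0.776, 23.2)  # hypothetical posterior
no_swing = prob_republican_wins(posterior)
rep_swing = prob_republican_wins(posterior, undecided_share=0.6)
print(f"no swing:         {no_swing:.3f}")
print(f"Republican swing: {rep_swing:.3f}")
```

Because both calls use the same seed, they evaluate the same draws, so the
difference between the two estimates reflects only the change in how the
undecided voters are allocated.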
Table 2
Posterior Probabilities That President Bush Wins Each State in 2004
Columns: State, Dem. Swing, No Swing, Rep. Swing
Note: Normal vote data were unavailable for Alaska, Hawaii, and Washington, D.C. We used
the 2000 presidential vote outcome as the prior for these states and Washington, D.C.

Once the probabilities of winning each individual state have been
computed, we use the Kaplan and Barnett (2003) dynamic programming
algorithm to compute the probability distribution for the Electoral College
votes. More formally, in this stage, we number the states (including
Washington, D.C.) from 1 to 51, and let $p(k)$ be the probability that the
candidate wins state $k$ and $v_k$ be the number of Electoral College votes
for state $k$. Let $P(i, k)$ be the probability that the candidate wins
exactly $i$ Electoral College votes in States 1, 2, . . . , $k$. Then,
$P(i, k)$ can be computed for $k \ge 2$ via the following recurrence relation:
\[
P(i, k) =
\begin{cases}
(1 - p(k))\, P(i, k - 1) + p(k)\, P(i - v_k, k - 1) & \text{if } i \ge v_k, \\
(1 - p(k))\, P(i, k - 1) & \text{if } i < v_k.
\end{cases}
\]
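The recurrence can be sketched in a few lines of Python. This is our own
minimal illustration rather than the MATLAB implementation mentioned earlier;
the function name and the three-state example are hypothetical. We also assume
the natural starting point, namely that before any state is processed the
candidate holds 0 votes with probability 1, and, as the recurrence itself
does, that state outcomes are independent.

```python
def electoral_vote_distribution(win_probs, votes):
    """Return dist, where dist[i] is the probability of exactly i votes.

    win_probs[k] is the probability of winning state k and votes[k] is
    that state's number of Electoral College votes.
    """
    dist = [1.0]  # assumed base case: zero states processed, zero votes
    for p, v in zip(win_probs, votes):
        new = [0.0] * (len(dist) + v)
        for i, mass in enumerate(dist):
            new[i] += (1.0 - p) * mass   # state lost: total stays at i
            new[i + v] += p * mass       # state won: total moves to i + v
        dist = new
    return dist


# Hypothetical three-state example with 12 votes in total.
win_probs = [0.9, 0.5, 0.2]
votes = [3, 4, 5]
dist = electoral_vote_distribution(win_probs, votes)

majority = sum(votes) // 2 + 1
print("P(exactly i votes):", [round(x, 3) for x in dist])
print("P(at least", majority, "votes):", round(sum(dist[majority:]), 3))
```

The loop pushes probability mass forward from each total $i$ to $i$ and
$i + v_k$, which is equivalent to the recurrence written above. With 51
jurisdictions and 538 votes, the full distribution requires only a few tens of
thousands of operations.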
Figure 7
Estimate of Distribution of Electoral Votes
in 2004 for President Bush Under the No Swing Scenario
efforts are commonplace and occur after the polls on which our estimates are
based. Accordingly, supplying the three different outcomes gives us a way to
account for relevant information that transpires outside of our data window.
Figure 8
Estimate of Distribution of Electoral Votes in 2004 for
President Bush Under the Republican Swing Scenario

Unlike non-Bayesian forecasting models, our model supplies a posterior
distribution for the resulting Electoral College tally. The distribution
allows us to make a point estimate for the Electoral College vote, if such
precision is requested, while also allowing one to examine the uncertainty
attached to our estimates. Note as well that the final Electoral College tally
need not match the results obtained by predicting each state separately
and then simply summing the state tallies. If there are several states where
the outcome is close, but leaning to Bush, we would not necessarily expect
Bush to win each of these states even though a single point estimate for each
state would fall in his favor. In our polls, in 12 of the states Bush and
Kerry were within 5 percentage points of each other. In addition, there were
5 more states where the candidates were 6 points apart. In this election,
every state predicted to be in Bush's camp, except New Hampshire, was
computed to have at least an 80% probability of voting for Bush. In contrast,
there were 208 electoral votes in total from states that had at least an 80%
probability of voting for Senator Kerry. That is, according to our analysis,
Senator Kerry had to win more of the battleground states than President Bush
in order to win the election.
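The gap between a state-by-state tally and the distribution of the total can
be seen in a small numerical example. The Python sketch below uses five
hypothetical states, each worth 10 votes and each leaning to one candidate
with probability 0.6, and assumes the state outcomes are independent; the
function names are ours.

```python
def naive_tally(win_probs, votes):
    """Award each state outright to the candidate if it leans their way."""
    return sum(v for p, v in zip(win_probs, votes) if p > 0.5)


def expected_votes(win_probs, votes):
    """Expected number of votes, weighting each state by its probability."""
    return sum(p * v for p, v in zip(win_probs, votes))


win_probs = [0.6] * 5   # five hypothetical close states, all leaning one way
votes = [10] * 5

naive = naive_tally(win_probs, votes)
expected = expected_votes(win_probs, votes)

# Probability of actually sweeping all five states (independence assumed).
sweep = 1.0
for p in win_probs:
    sweep *= p

print("state-by-state tally:", naive)
print("expected votes:      ", expected)
print("P(winning all five): ", round(sweep, 5))
```

Calling every state for the leading candidate awards all 50 votes, yet the
expected total is 30 and the chance of a clean sweep is under 8%.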
Figure 9
Estimate of Distribution of Electoral Votes in 2004 for President
Bush Under the Democratic Swing Scenario

Discussion
In the end, what matters is the Electoral College vote. We argue, then,
that election forecasting efforts should be directed at the Electoral College
vote rather than the popular vote. Attention to the Electoral College
changes forecasting in significant ways. Any prediction involves many
predictions, one for every state (and Washington, D.C.) rather than a
single prediction for the national popular vote. More data are involved as
well as more analysis. Our method is focused not on single state
predictions or even a single national prediction, but on finding an accurate
estimate of the distribution of electoral votes across the nation. From this
distribution, one may obtain a point estimate and measures of uncertainty
for the final election outcome.
We have presented a methodology for predicting the outcome of the
U.S. presidential election that uses a Bayesian estimation approach that
incorporates polling data. Our model includes the effect of third-party
candidates and declared undecided voters as input to the Kaplan and Barnett
Note
1. We did examine other values for $\sum_{k=1}^{4} b_k$, ranging from 10 to 70.
Values around 40 yield similar results, whereas values significantly larger or
smaller than 40 may not be appropriate.
References
Abramowitz, A. I. (2004). When good forecasts go bad: The time-for-change model and the
2004 presidential election. PS: Political Science and Politics, 37, 745-746.
Althaus, S. L., Nardulli, P. F., & Shaw, D. R. (2002). Candidate appearances in presidential
elections. Political Communication, 19, 49-72.
Belenky, A. S. (2008). The good, the bad, and the ugly: Three proposals to introduce the
nationwide popular vote in U.S. presidential elections. Michigan Law Review First
Impressions, 106, 110-116.
Bennett, R. W. (2006). Taming the Electoral College. Stanford, CA: Stanford University
Press.
Campbell, J. E. (2004). Forecasting the presidential vote in 2004: Placing preference polls in
context. PS: Political Science and Politics, 37, 763-767.
Campbell, J. E. (2005). Introduction—Assessments of the 2004 presidential vote forecasts.
PS: Political Science and Politics, 38, 23.
Converse, P. (1966). The concept of a normal vote. In A. Campbell, P. Converse, W. Miller,
& D. Stokes (Eds.), Elections and the political order (pp. 9-39). New York: John Wiley.
Doherty, B. J. (2007). Elections: The politics of the permanent campaign: Presidential travel
and the Electoral College, 1977-2004. Presidential Studies Quarterly, 37, 749-773.
Edwards, G. C. (2004). Why the Electoral College is bad for America. New Haven, CT: Yale
University Press.
Gaines, B. J. (2001). Popular myths about popular vote—Electoral College splits. PS: Politi-
cal Science and Politics, 34, 70-75.
Gelman, A., Katz, J. N., & Tuerlinckx, F. (2002). The mathematics and statistics of voting
power. Statistical Science, 17, 420-435.
Goux, D. J., & Hopkins, D. A. (2008). Empirical implications of Electoral College reform.
American Politics Research, 36, 824-843.
Grofman, B., Brunell, T., & Campagna, J. (1997). Distinguishing the difference between
swing ratio and bias: The U.S. Electoral College. Electoral Studies, 16, 471-487.
Grofman, B., & Feld, S. L. (2005). Thinking about the political impacts of the Electoral
College. Public Choice, 123, 1-18.
Hansen, J. M. (2008). Equal voice by half measures. Michigan Law Review First
Impressions, 106, 100-105.
Hiltachk, T. W. (2008). Reforming the Electoral College one state at a time. Michigan Law
Review First Impressions, 106, 90-95.
Hirsch, S. (2008). Awarding presidential electors by congressional district: Wrong for
California, wrong for the nation. Michigan Law Review First Impressions, 106, 95-100.
Holbrook, T. M. (2004). Good news for Bush? Economic news, personal finances, and the
2004 presidential election. PS: Political Science and Politics, 37, 759-761.
Jackman, S., & Rivers, D. (2001). State-level election forecasting during election 2000 via
dynamic Bayesian hierarchical modeling (Working Paper). Stanford, CA: Stanford
University, Department of Political Science.
Kaplan, E., & Barnett, A. (2003). A new approach to estimating the probability of winning
the presidency. Operations Research, 51(1), 32-40.
Leib, E. J., & Mark, E. J. (2008). Democratic principle and Electoral College reform.
Michigan Law Review First Impressions, 106, 105-110.
Lewis-Beck, M. S., & Rice, T. W. (1983). Localism in presidential elections: The home state
advantage. American Journal of Political Science, 27, 548-556.
Lewis-Beck, M. S., & Tien, C. (2004). Jobs and the job of president: A forecast for 2004. PS:
Political Science and Politics, 37, 753-758.
Lockerbie, B. (2004). A look to the future: Forecasting the 2004 presidential election. PS:
Political Science and Politics, 37, 741-743.
Longley, L., & Dana, J. (1992). The biases of the Electoral College in the 1990s. Polity, 25,
123-145.
Nardulli, P. F. (2005). Popular efficacy in the democratic era: A re-examination of electoral
accountability in the United States, 1828-2000. Princeton, NJ: Princeton University
Press.
Norpoth, H. (2004). From primary to general election: A forecast of the presidential vote. PS:
Political Science and Politics, 37, 737-740.
Rae, D. W. (1972). Political consequences of electoral laws. New Haven, CT: Yale
University Press.
Rathbun, D. P. (2008). Ideological endowment: The staying power of the Electoral College
and the weaknesses of the national popular vote interstate compact. Michigan Law
Review First Impressions, 106, 117-122.
Shaw, D. R. (2006). The race to 270. Chicago: University of Chicago Press.
Stokes, D. E. (1962). Party loyalty and the likelihood of deviating elections. Journal of
Politics, 24, 689-702.
Tokaji, D. P. (2008). An unsafe harbor: Recounts, contests, and the Electoral College.
Michigan Law Review First Impressions, 106, 84-88.
Whitaker, L. P., & Neale, T. H. (2004). The Electoral College: An overview and analysis of
reform proposals (Tech. Rep. RL30804). Washington, DC: Congressional Research
Service.
Wlezien, C., & Erikson, R. S. (2004). The fundamentals, the polls, and the presidential vote.
PS: Political Science and Politics, 37, 747-751.
Steven E. Rigdon is Professor of Statistics and Graduate Program Director at Southern
Illinois University, Edwardsville. His research interests include reliability, quality control,
statistical computing, and election modeling.
Wendy K. Tam Cho is associate professor in the Departments of Political Science and
Statistics and senior research scientist at the National Center for Supercomputing Applications
at the University of Illinois at Urbana-Champaign.