Optimal Play of the Dice Game Pig
Introduction to Pig
The object of the jeopardy dice game Pig is to be the first player to reach 100
points. Each player’s turn consists of repeatedly rolling a die. After each roll,
the player is faced with two choices: roll again, or hold (decline to roll again).
• If the player rolls a 1, the player scores nothing and it becomes the opponent’s
turn.
• If the player rolls a number other than 1, the number is added to the player’s
turn total and the player’s turn continues.
• If the player holds, the turn total, the sum of the rolls during the turn, is
added to the player’s score, and it becomes the opponent’s turn.
For such a simple dice game, one might expect a simple optimal strategy,
such as in Blackjack (e.g., “stand on 17” under certain circumstances, etc.). As
we shall see, this simple dice game yields a much more complex and intriguing
optimal policy, described here for the first time. The reader should be familiar
with basic concepts and notation of probability and linear algebra.
Simple Tactics
The game of Pig is simple to describe, but is it simple to play well? More
specifically, how can we play the game optimally? Knizia [1999] describes
simple tactics where each roll is viewed as a bet that a 1 will not be rolled:
. . . we know that the true odds of such a bet are 1 to 5. If you ask
yourself how much you should risk, you need to know how much there
is to gain. A successful throw produces one of the numbers 2, 3, 4, 5, and
6. On average, you will gain four points. If you put 20 points at stake
this brings the odds to 4 to 20, that is 1 to 5, and makes a fair game. . . .
Whenever your accumulated points are less than 20, you should continue
throwing, because the odds are in your favor.
Knizia [1999, 129]
However, Knizia also notes that there are many circumstances in which
one should deviate from this “hold at 20” policy. Why does this reasoning not
dictate an optimal policy for all play? The reason is that it maximizes the expected
points gained in a single turn rather than the probability of winning the game.
Put another way, playing to maximize expected score for a single turn is differ-
ent from playing to win. For a clear illustration, consider the following extreme
example. Your opponent has a score of 99 and will likely win in the next turn.
You have a score of 78 and a turn total of 20. Do you follow the “hold at 20”
policy and end your turn with a score of 98? No: the probability of winning
if you roll once more is higher than the probability of winning if the
other player is allowed to roll.
The “hold at 20” policy may be a good rule of thumb, but how good is it?
Under what circumstances should we deviate from it and by how much?
Let $P_{i,j,k}$ be the player's probability of winning if the player's score is $i$, the
opponent's score is $j$, and the player's turn total is $k$. When $i + k \geq 100$, we have
$P_{i,j,k} = 1$ because the player can simply hold and win. In the general case, the
probability of winning is
$$P_{i,j,k} = \max\left(P_{i,j,k,\mathrm{roll}},\; P_{i,j,k,\mathrm{hold}}\right),$$
where $P_{i,j,k,\mathrm{roll}}$ and $P_{i,j,k,\mathrm{hold}}$ are the probabilities of winning for rolling or
holding, respectively. These probabilities are
$$P_{i,j,k,\mathrm{roll}} = \frac{1}{6}\left[(1 - P_{j,i,0}) + P_{i,j,k+2} + P_{i,j,k+3} + P_{i,j,k+4} + P_{i,j,k+5} + P_{i,j,k+6}\right],$$
$$P_{i,j,k,\mathrm{hold}} = 1 - P_{j,i+k,0}.$$
At this point, we can see how to compute the optimal policy for play. If we
can solve for all probabilities of winning in all possible game states, we need
only compare Pi,j,k,roll with Pi,j,k,hold for our current state and either roll or hold
depending on which has a higher probability of resulting in a win.
Solving for the probability of a win in all states is not trivial, as dependencies
between variables are cyclic. For example, Pi,j,0 depends on Pj,i,0 which in turn
depends on Pi,j,0 . This feature is easily illustrated when both players roll a 1 in
subsequent turns. Put another way, game states can repeat, so we cannot simply
evaluate probabilities from the end of the game backwards to the beginning,
as in dynamic programming (as in Campbell [2002] and other articles in this
Journal) or its game-theoretic form, known as the minimax process (introduced
in von Neumann and Morgenstern [1944]; for a modern introduction to that
subject, we recommend Russell and Norvig [2003, Ch. 6]).
Let $\mathbf{x}$ be the vector of all possible unknown $P_{i,j,k}$. Because of our equation
$P_{i,j,k} = \max(P_{i,j,k,\mathrm{roll}}, P_{i,j,k,\mathrm{hold}})$, our system of equations takes on the
interesting form
$$\mathbf{x} = \max\left(\mathbf{A}_1 \mathbf{x} + \mathbf{b}_1,\; \mathbf{A}_2 \mathbf{x} + \mathbf{b}_2\right).$$
However, our system has additional constraints: We are solving for prob-
abilities, which take on values only in [0, 1]. Therefore, we are seeking the
intersection of folded hyperplanes within a unit hypercube of possible proba-
bility values.
Piglet
Piglet is very much like Pig except that it is played with a coin rather than
a die. The object of Piglet is to be the first player to reach 10 points. Each turn,
a player repeatedly flips a coin until either a tail is flipped or else the player
holds and scores the number of consecutive heads flipped.
The number of equations necessary to express the probability of winning in
each state is still too many for a pencil-and-paper exercise, so we simplify this
game further: The winner is the first to reach 2 points.
As before, let Pi,j,k be the player’s probability of winning if the player’s
score is i, the opponent’s score is j, and the player’s turn total is k. In the case
where i + k = 2, we have Pi,j,k = 1 because the player can simply hold and
win. In the general case where 0 ≤ i, j < 2 and k < 2 − i, the probability of a
player winning is
Pi,j,k = max (Pi,j,k,flip , Pi,j,k,hold ),
where Pi,j,k,flip and Pi,j,k,hold are the probabilities of winning if one flips or
holds, respectively. The probability of winning if one flips is
$$P_{i,j,k,\mathrm{flip}} = \frac{1}{2}\left[(1 - P_{j,i,0}) + P_{i,j,k+1}\right].$$
The probability Pi,j,k,hold is just as before. Then the equations for the probabil-
ities of winning in each state are given as follows:
\begin{align}
P_{0,0,0} &= \max\left(\tfrac{1}{2}\left[(1 - P_{0,0,0}) + P_{0,0,1}\right],\; 1 - P_{0,0,0}\right), \nonumber\\
P_{0,0,1} &= \max\left(\tfrac{1}{2}\left[(1 - P_{0,0,0}) + 1\right],\; 1 - P_{0,1,0}\right), \nonumber\\
P_{0,1,0} &= \max\left(\tfrac{1}{2}\left[(1 - P_{1,0,0}) + P_{0,1,1}\right],\; 1 - P_{1,0,0}\right), \tag{1}\\
P_{0,1,1} &= \max\left(\tfrac{1}{2}\left[(1 - P_{1,0,0}) + 1\right],\; 1 - P_{1,1,0}\right), \nonumber\\
P_{1,0,0} &= \max\left(\tfrac{1}{2}\left[(1 - P_{0,1,0}) + 1\right],\; 1 - P_{0,1,0}\right), \nonumber\\
P_{1,1,0} &= \max\left(\tfrac{1}{2}\left[(1 - P_{1,1,0}) + 1\right],\; 1 - P_{1,1,0}\right). \nonumber
\end{align}
Once these equations are solved, the optimal policy is obtained by observing
which action attains the maximum in $\max(P_{i,j,k,\mathrm{flip}}, P_{i,j,k,\mathrm{hold}})$ for each state.
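One simple way to solve system (1) numerically is successive approximation: start
every probability at 0 and repeatedly re-evaluate the right-hand sides until the
estimates stop changing. (This is, in fact, the value iteration scheme described in
the next section.) The following Python sketch does exactly that; the variable names
and the convergence tolerance are our own choices, not part of the original presentation.

    # Solve the six Piglet (goal 2) equations of system (1) by successive
    # approximation: re-evaluate the right-hand sides until the largest
    # change in any estimate falls below a small tolerance.
    eps = 1e-12
    p000 = p001 = p010 = p011 = p100 = p110 = 0.0
    delta = 1.0
    while delta > eps:
        new = (
            max(0.5 * ((1 - p000) + p001), 1 - p000),  # P(0,0,0)
            max(0.5 * ((1 - p000) + 1.0), 1 - p010),   # P(0,0,1)
            max(0.5 * ((1 - p100) + p011), 1 - p100),  # P(0,1,0)
            max(0.5 * ((1 - p100) + 1.0), 1 - p110),   # P(0,1,1)
            max(0.5 * ((1 - p010) + 1.0), 1 - p010),   # P(1,0,0)
            max(0.5 * ((1 - p110) + 1.0), 1 - p110),   # P(1,1,0)
        )
        old = (p000, p001, p010, p011, p100, p110)
        delta = max(abs(a - b) for a, b in zip(new, old))
        p000, p001, p010, p011, p100, p110 = new
    print(p000, p001, p010, p011, p100, p110)

Each converged value answers the flip-or-hold question for its state: the action
attaining the maximum in the corresponding equation is the optimal one.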
Value Iteration
Value iteration is an algorithm that iteratively improves estimates of the
value of being in each state. In describing value iteration, we follow Sutton
and Barto [1998], which we also recommend for further reading. We assume
that the world consists of states, actions, and rewards. The goal is to compute
which action to take in each state so as to maximize future rewards. At any
time, we are in a known state s of a finite set of states S. There is a finite set
of actions $A$ that can be taken in any state. For any two states $s, s' \in S$ and
any action $a \in A$, there is a probability $P^{a}_{ss'}$ (possibly zero) that taking action $a$
in state $s$ will cause a transition to state $s'$. For each such transition, there is an
expected immediate reward $R^{a}_{ss'}$.
We are not interested in just the immediate rewards; we are also interested to
some extent in future rewards. More specifically, the value of an action’s result
is the sum of the immediate reward plus some fraction of the future reward.
The discount factor 0 ≤ γ ≤ 1 determines how much we care about expected
future reward when selecting an action.
Let V (s) denote the estimated value of being in state s, based on the expected
immediate rewards of actions and the estimated values of being in subsequent
states. The estimated value of an action $a$ in state $s$ is given by
$$\sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V(s') \right].$$
The optimal choice is the action that maximizes this estimated value:
$$\max_{a} \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V(s') \right].$$
This expression serves as an estimate of the value of being in state s, that is, of
V (s). In a nutshell, value iteration consists of revising the estimated values of
states until they converge, i.e., until no single estimate is changed significantly.
The algorithm is given as Algorithm 1.
Algorithm 1 repeatedly updates estimates of $V(s)$ for each $s$. The variable
$\Delta$ keeps track of the largest change in any estimate during an iteration, and
$\epsilon$ is a small constant. When the largest estimate change $\Delta$ is smaller than
$\epsilon$, we stop revising our estimates.
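Algorithm 1 is not reproduced here, but the scheme it describes can be sketched
generically in Python as follows. The data-structure names are illustrative assumptions,
not from the paper: P[s][a] is taken to be a list of (next state, probability) pairs and
R[s][a][s2] the expected immediate reward for that transition.

    def value_iteration(states, actions, P, R, gamma=1.0, eps=1e-9):
        # Estimated values start at 0; terminal states are assumed to have a
        # single self-loop action with zero reward so the max below is never
        # taken over an empty set of actions.
        V = {s: 0.0 for s in states}
        delta = float("inf")
        while delta > eps:  # stop once no estimate changes by more than eps
            delta = 0.0
            for s in states:
                best = max(
                    sum(prob * (R[s][a][s2] + gamma * V[s2]) for s2, prob in P[s][a])
                    for a in actions[s]
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
        return V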
Convergence is guaranteed when γ < 1 and rewards are bounded [Mitchell
1997, §13.4], but convergence is not guaranteed in general when γ = 1. In the
case of Piglet and Pig, value iteration happens to converge for γ = 1.
[Figure: Win probability estimates for the Piglet states $P_{1,0,0}$, $P_{0,0,1}$, $P_{1,1,0}$, $P_{0,1,1}$, $P_{0,0,0}$, and $P_{0,1,0}$ plotted against the value iteration number, showing convergence of all six estimates.]
For the full game of Pig, states can be partitioned by the sum of the two players'
scores. Scores never decrease, so the probabilities for a given score sum depend only
on states with the same or a greater score sum, and estimates for earlier game states
effectively wait for later states to converge. Performing value iteration in stages,
from the greatest score sum downward, therefore has the advantage of iterating values of
earlier game states only after those of later game states have converged.
This partitioning and ordering of states can be taken one step further. Within
the states of a given score sum, equations are dependent on the value of states
with either
• a greater score sum (which would already be computed), or
• the value of states with the players’ scores switched (e.g., in the case of a roll
of 1).
This means that within states of a given score sum, we can perform value
iteration on subpartitions of states as follows: For player scores i and j, value
iterate together all Pi,j,k for all 0 ≤ k < 100−i and all Pj,i,k for all 0 ≤ k < 100−j.
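The following Python sketch puts these pieces together: it solves the Pig win-probability
equations by value iteration on partitions ordered by score sum, with each partition
holding the coupled states for player scores i and j. The goal score, the tolerance, and
all names are choices made for this sketch; with GOAL set to 100 the run is slow in plain
Python but should reproduce the probabilities discussed in the next section if it is
faithful to the equations above.

    GOAL = 100   # goal score; reduce it to make the sketch finish quickly
    P = {}       # P[(i, j, k)] = win probability for the player about to act

    def p(i, j, k):
        # A player who can reach the goal by holding wins with probability 1.
        if i + k >= GOAL:
            return 1.0
        return P.get((i, j, k), 0.0)

    def p_roll(i, j, k):
        # Rolling a 1 passes the turn; any other roll raises the turn total.
        return ((1.0 - p(j, i, 0)) + sum(p(i, j, k + r) for r in range(2, 7))) / 6.0

    def p_hold(i, j, k):
        return 1.0 - p(j, i + k, 0)

    # Solve partitions by score sum i + j, from the end of the game backward.
    for total in range(2 * (GOAL - 1), -1, -1):
        for i in range(max(0, total - (GOAL - 1)), min(GOAL - 1, total) + 1):
            j = total - i
            if i > j:
                continue   # the states for (i, j) and (j, i) are iterated together
            block = [(i, j, k) for k in range(GOAL - i)]
            if i != j:
                block += [(j, i, k) for k in range(GOAL - j)]
            delta = 1.0
            while delta > 1e-9:
                delta = 0.0
                for (a, b, k) in block:
                    new = max(p_roll(a, b, k), p_hold(a, b, k))
                    delta = max(delta, abs(new - P.get((a, b, k), 0.0)))
                    P[(a, b, k)] = new

    print(P[(0, 0, 0)])   # first-player win probability under optimal play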
We solve this game using value iteration. Further investigation might seek
a more efficient solution technique or identify a special structure in these equa-
tions that yields a particularly simple and elegant solution.
The Solution
The solution to Pig is visualized in Figure 3. The axes are i (player 1 score),
j (player 2 score), and k (the turn total). The surface shown is the boundary
between states where player 1 should roll (below the surface) and states where
player 1 should hold (above the surface). We assume for this and following
figures that player 1 plays optimally. Player 1 assumes that player 2 will also
play optimally, although player 2 is free to use any policy.
Overall, we see that the “hold at 20” policy only serves as a good approxi-
mation to optimal play when both players have low scores. When either player
has a high score, it is advisable on each turn to try to win. In between these ex-
tremes, play is unintuitive, deviating significantly from the “hold at 20” policy
and being highly discontinuous from one score to the next.
Let us look more closely at the cross-section of this surface when we hold
the opponent’s score at 30 (Figure 4). The dotted line is for comparison with the
“hold at 20” policy. When the optimal player’s score is low and the opponent
has a significant lead, the optimal player must deviate from the “hold at 20”
policy, taking greater risks to catch up and maximize the expected probability of
a win. When the optimal player has a significant advantage over the opponent,
the optimal player maximizes the expected probability of a win by holding at
turn totals significantly below 20.
It is also interesting to consider that not all states are reachable with opti-
mal play. The states that an optimal player can reach are shown in Figure 5.
These states are reachable regardless of what policy the opponent follows. The
reachable regions of cross-sectional Figure 4 are shaded.
Figure 3. Two views of the roll/hold boundary for optimal Pig play policy.

To see why many states are not reachable, consider that a player starts a turn
at a given (i, j, 0) and travels upward in k until the player holds or rolls a 1. An
optimal player following this policy will not travel more than 6 points above
the boundary. For example, an optimal player will never reach the upper-left
tip of the large “wave” of Figure 4. Only suboptimal risk-seeking play will
lead to most states on this wave, but once reached, the optimal decision is to
continue rolling towards victory.
Also, consider that an optimal player with a score of 0 will never
hold with a turn total less than 21, regardless of the opponent's score. This
means that an optimal player will never have a score strictly between 0 and 21. We can
see these and other such gaps in Figure 5.
Combining the optimal play policy with state reachability, we can visualize
the relevant part of the solution as in Figure 6. Note the wave tips that are not
reachable.
The win probabilities that are the basis for these optimal decisions are vi-
sualized in Figure 7. Probability contours for this space are shown for 3%, 9%,
27%, and 81%. For instance, the small lower-leftmost surface separates states
having more or less than a 3% win probability.
If both players are playing optimally, the starting player wins 53.06% of the
time; that is, P0,0,0 ≈ 0.5306. We have also used the same technique to analyze
the advantage of the optimal policy versus a “hold at 20” policy, where the
“hold at 20” player is assumed to hold at less than 20 when the turn total is
sufficient to reach the goal. When the optimal player goes first, the optimal
player wins 58.74% of the time. When the “hold at 20” player goes first, the
“hold at 20” player wins 47.76% of the time. Thus, if the starting player is
chosen using a fair coin, the optimal player wins 55.49% of the time.
Conclusions
The simple game of Pig gives rise to a complex optimal policy. A first look
at the problem from a betting perspective yields a simple “hold at 20” policy,
but this policy maximizes expected points per turn rather than the probability
of winning. The optimal policy is instead derived by solving for the probability
of winning for every possible game state. This amounts to finding the intersec-
tion of folded hyperplanes within a hypercube; the method of value iteration
converges and provides a solution. The interested reader may play an optimal
computer opponent, view visualizations of the optimal policy, and learn more
about Pig at http://cs.gettysburg.edu/projects/pig .
Surprising in its topographical beauty, this optimal policy is approximated
well by the “hold at 20” policy only when both players have low scores. In the
race to 100 points, optimal play deviates significantly from this policy and is far
from intuitive in its details. Seeing the “landscape” of this policy is like seeing
the surface of a distant planet sharply for the first time having previously seen
only fuzzy images. If intuition is like seeing a distant planet with the naked eye,
and a simplistic, approximate analysis is like seeing it with a telescope, then
applying the tools of mathematics is like landing on the planet and sending
pictures home. We will forever be surprised by what we see!
the game requires specialized dice: One die has a pig head replacing the 1;
the other has a pig tail replacing the 6. Such rolls are called Heads and Tails,
respectively. The goal score is 100; yet after a player has met or exceeded 100,
all other players have one more turn to achieve the highest score.
As in Pig, players may hold or roll, risking accumulated turn totals. How-
ever:
• There is no undesirable single die value; rather, rolling dice that total 7 ends
the turn without scoring.
• Rolling a Head and a Tail doubles the current turn total.
• Rolling just a Head causes the value of the other die to be doubled.
• Rolling just a Tail causes the value of the other die to be negated.
• The turn total can never be negative; if a negated die would cause a negative
turn total, the turn total is set to 0.
• All other non-7 roll totals are added to the turn total.
• If two 1s are rolled, the player adds 25 to the turn total and it becomes
the opponent’s turn. (Knizia [1999] calls this a variant of Big Pig, which is
identical to Frey’s game except that the player’s turn continues after rolling
double 1s.)
• If other doubles are rolled, the player adds twice the value of the dice to the
turn total, and the player’s turn continues.
• Players are permitted the same number of turns. So if the first player scores
100 or more points, the second player must be allowed the opportunity to
exceed the first player’s score and win.
turn ends with loss of the turn total), and a roll with pigs touching has the same
consequences as rolling double 1s (the turn ends with loss of the turn total and
of the entire score). PigMania is similar to Frey’s variant in that two pigs in the
same non-side configuration score double what they would individually.
Let $E_{r,s,t}$ be the player's expected future gain when it is the player's turn $r$, the
player's score is $s$, and the turn total is $t$. The maximal expected gain is
$$E_{r,s,t} = \max\left(E_{r,s,t,\mathrm{roll}},\; E_{r,s,t,\mathrm{hold}}\right),$$
where $E_{r,s,t,\mathrm{roll}}$ and $E_{r,s,t,\mathrm{hold}}$ are the expected future gains if one rolls or holds,
respectively. These expectations are given by
$$E_{r,s,t,\mathrm{roll}} = \frac{1}{36}\bigl[\, 1(4 + E_{r,s,t+4}) + 2(5 + E_{r,s,t+5}) + 3(6 + E_{r,s,t+6}) + 4(7 + E_{r,s,t+7}) + 5(8 + E_{r,s,t+8}) + 4(9 + E_{r,s,t+9}) + 3(10 + E_{r,s,t+10}) + 2(11 + E_{r,s,t+11}) + 1(12 + E_{r,s,t+12}) + 10(-t + E_{r+1,s,0}) + 1(-s - t + E_{r+1,0,0}) \,\bigr],$$
$$E_{r,s,t,\mathrm{hold}} = E_{r+1,s+t,0}.$$
Since the state space has no cycles, value iteration is unnecessary. Computing
the optimal policy $\pi^{*}$ through dynamic programming, we calculate the
expected gain $E^{\pi^{*}}_{1,0,0} = 36.29153313960543$. If we instead apply the policy $\pi^{\leq}$
of the single-turn odds-based analysis, rolling when $s + 11t \leq 200$, we calculate
the expected gain $E^{\pi^{\leq}}_{1,0,0} = 36.29151996719233$. These numbers are so close
that simulating more than $10^{9}$ games with each policy could not demonstrate
a significant statistical difference in the average gain.
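For concreteness, the dynamic-programming computation can be sketched in Python as below.
The sketch assumes that, as in SKUNK, a game of THINK lasts five turns, that r indexes the
turn, s the banked score, and t the turn total, and it caps the turn total far above any
sensible hold point so that the recursion terminates; these names and constants are our
assumptions, not taken from the computation above.

    from functools import lru_cache

    TURNS = 5    # assumed number of turns in THINK, as in SKUNK
    T_CAP = 120  # practical cap on the turn total; raising it should not change the result

    # ways to roll each two-dice total when neither die shows a 1
    SUM_COUNTS = {4: 1, 5: 2, 6: 3, 7: 4, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

    @lru_cache(maxsize=None)
    def E(r, s, t):
        """Expected future gain at turn r with banked score s and turn total t."""
        if r > TURNS:
            return 0.0
        hold = E(r + 1, s + t, 0)
        if t >= T_CAP:                         # force a hold once the cap is reached
            return hold
        roll = sum(n * (tot + E(r, s, t + tot)) for tot, n in SUM_COUNTS.items())
        roll += 10 * (-t + E(r + 1, s, 0))     # exactly one 1: lose the turn total
        roll += 1 * (-s - t + E(r + 1, 0, 0))  # double 1s: lose turn total and score
        return max(roll / 36.0, hold)

    print(E(1, 0, 0))  # should be near 36.29 if the assumptions above match the game analyzed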
However, two factors give support to the correctness of this computation.
First, we observe that the policy $\pi^{\leq}$ is risk-averse with respect to the computed
optimal policy $\pi^{*}$. According to the odds-based analysis, it does not
matter what one does if $s + 11t = 200$, and the authors state “you may flip
a coin to decide”. The above computation assumed one always rolls when
$s + 11t = 200$. If this analysis is correct, there should be no difference in expected
gain if we hold in such situations. However, if we instead apply the
policy $\pi^{<}$ of this odds-based analysis, rolling when $s + 11t < 200$, we compute
$E^{\pi^{<}}_{1,0,0} = 36.29132666694349$, which is different and even farther from the
optimal expected gain.
Second, we can more easily observe the difference between optimal and
odds-based policies if we extend the number of turns in the game to 20. Then
$E^{\pi^{*}}_{1,0,0} = 104.78360865008132$ and $E^{\pi^{\leq}}_{1,0,0} = 104.72378302093477$. After $2 \times 10^{8}$
simulated games with each policy, the average gains were 104.784846975 and
104.72618221, respectively.
Of special note is the good quality of the approximation such an odds-based
analysis gives us for the optimal THINK score gain, given such simple, local
considerations. For THINK reduced to four turns, we compute that policies $\pi^{*}$
and $\pi^{\leq}$ reach the same game states and dictate the same decisions in those states.
Similarly examining Knizia's Pig analysis for maximizing expected score, we
find the same deviation of optimal versus odds-based policies for Pig games
longer than eight turns.
Miscellaneous Variants
Yixun Shi [2000] describes a variant of Pig that is the same as Brutlag’s
SKUNK except:
• There are six turns.
• A roll that increases the turn total does so by the product of the dice values
rather than by the sum.
• Double 1s have the same consequences as a single 1 in 2-Dice Pig (loss of
turn total, end of turn).
• Shi calls turns “games” and games “matches”. We adhere to our terminology.
Shi’s goal is not so much to analyze this Pig variant as to describe how to
form heuristics for good play. In particular, he identifies features of the game
(e.g., turn total, score differences, and distance from expected score per turn),
combining them into a function to guide decision making. He parametrizes
the heuristics and evaluates parameters empirically through actual play.
Ivars Peterson describes Piggy [2000], which varies from 2-Dice Pig in that
there is no bad dice value. However, doubles have the same consequences as a
single 1 in 2-Dice Pig. Peterson suggests comparing Piggy play with standard
dice versus nonstandard Sicherman dice, for which one die is labeled 1, 2, 2, 3,
3, and 4 and the other is labeled 1, 3, 4, 5, 6, and 8. Although the distribution of
roll sums is the same for Sicherman and standard dice, doubles are rarer with
Sicherman dice.
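Both claims about the dice sets are easy to check directly; the following Python lines
(the dice labelings are taken from the text) compare the distribution of roll sums and
count the doubles for each pair of dice.

    from collections import Counter

    standard = [1, 2, 3, 4, 5, 6]
    sicherman_a = [1, 2, 2, 3, 3, 4]
    sicherman_b = [1, 3, 4, 5, 6, 8]

    def sum_distribution(d1, d2):
        return Counter(a + b for a in d1 for b in d2)

    def doubles(d1, d2):
        return sum(1 for a in d1 for b in d2 if a == b)

    # same distribution of roll sums, but 6 doubles in 36 rolls versus 4 in 36
    print(sum_distribution(standard, standard) == sum_distribution(sicherman_a, sicherman_b))
    print(doubles(standard, standard), doubles(sicherman_a, sicherman_b))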
total. We suggest that jeopardy dice games can be further subdivided into two
main subclasses: jeopardy race games and jeopardy approach games.
Blackjack. The playing-card version of Macao was very popular in the 17th
and 18th centuries [Scarne 1980]; the card game Vingt-et-Un gained popularity
in the mid-18th century as a favorite game of Madame du Barry and Napoleon
[Parlett 1991]. Parlett writes, “That banking games are little more than dice
games adapted to the medium of cards is suggested by the fact that they are
fast, defensive rather than offensive, and essentially numerical, suits being
often irrelevant” [1999, 76].
Computational Challenges
Optimal play for Macao and Sixteen has been computed by Neller and his
students at Gettysburg College through a similar application of value iteration.
Other non-jeopardy dice games have been solved with dynamic programming,
e.g., Campbell [2002]. However, many dice games are not yet solvable because
of the great number of reachable game states and the memory limitations of
modern computers.
Memory requirements for computing a solution may be reduced through
various means. For instance, the partitioning technique that we described can
be used to hold only those states in memory that are necessary for the solution
of a given partition. Also, one can make intensive use of vast but slower secondary
memory; that is, one can trade off computational speed for greater memory.
One interesting area for future work is the development of techniques to
compute approximately optimal policies. We have shown that many possible Pig
game states are not reachable through optimal play, but it is also the case that
many reachable states are improbable. Simulation-based techniques such as
Monte Carlo and temporal difference learning algorithms [Sutton and Barto
1998] do not require probability models for state transitions and can converge
quickly for frequently occurring states. Approximately optimal play for more
difficult games, such as Backgammon, can be achieved through simulation-
based reinforcement learning techniques combined with feature-based state-
abstractions [Tesauro 2002; Boyan 2002].
References
Beardon, Toni, and Elizabeth Ayer. 2001a. Game of PIG—Sixes. NRICH (June
2001 and May 2004). http://www.nrich.maths.org/public/viewer.
php?obj_id=1258 .
———. 2001b. Game of PIG—Ones: Piggy Ones and Piggy Sixes: Should you
change your strategy? NRICH (July 2001). http://www.nrich.maths.
org/public/viewer.php?obj_id=1260 .
Bell, Robert Charles. 1979. Board and Table Games from Many Civilizations. Re-
vised ed. New York: Dover Publications, Inc.
Campbell, Paul J. 2002. Farmer Klaus and the mouse. The UMAP Journal 23 (2):
121–134. 2004. Errata. 24 (4): 484.
Diagram Visual Information Ltd. 1979. The Official World Encyclopedia of Sports
and Games. London: Paddington Press.
———. 1998. Ten Thousand games to play with dice. WGR 13: 22–23, 37.
Published by Michael Keller, 1227 Lorene Drive, Pasadena, MD 21222;
[email protected] .
Kincaid, David R., and E. Ward Cheney. 1996. Numerical Analysis: Mathematics
of Scientific Computing. 2nd ed. Pacific Grove, CA: Brooks/Cole Publishing
Co.
Knizia, Reiner. 1999. Dice Games Properly Explained. Brighton Road, Lower
Kingswood, Tadworth, Surrey, KT20 6TD, U.K.: Elliot Right-Way Books.
Parlett, David. 1991. A History of Card Games. New York: Oxford University
Press.
———. 1999. The Oxford History of Board Games. New York: Oxford University
Press.
Scarne, John. 1945. Scarne on Dice. Harrisburg, PA: Military Service Publishing
Co. 1980. 2nd ed. New York: Crown Publishers, Inc.
Shi, Yixun. 2000. The game PIG: Making decisions based on mathematical
thinking. Teaching Mathematics and Its Applications 19 (1): 30–34.
Sutton, Richard S., and Andrew G. Barto. 1998. Reinforcement Learning: An
Introduction. Cambridge, MA: MIT Press.
Tesauro, Gerald J. 2002. Programming backgammon using self-teaching neural
nets. Artificial Intelligence 134: 181–199.
Wong, Freddie. n.d. Pass the Pigs — Probabilities. http://members.tripod.
com/~passpigs/prob.html .