UNIT I (e)
In an adversarial search problem the environment is multiagent and competitive, wherein the agents' goals are in conflict. Adversarial search problems are commonly known as games.
Mathematical game theory, a branch of economics, views any multiagent environment as a game, provided that the impact of each agent on the others is "significant", regardless of whether the agents are co-operative or competitive.
What is a Game?
AU: May-03
- The term game means a sort of conflict in which n individuals or groups (known as players)
participate.
- John Von Neumann is acknowledged as the father of game theory. Neumann defined game theory in 1928 and established the mathematical framework for all subsequent theoretical developments.
- Game theory allows decision-makers (players) to cope with other decision-makers (players) who have different purposes in mind. In other words, players determine their own strategies in terms of the strategies and goals of their opponents.
- Games are an integral attribute of human beings. Games engage the intellectual faculties of humans.
- Game playing has close relation to intelligence and it has well-defined states and rules.
AU May-03
Applications of game theory are wide-ranging. Von Neumann and Morgenstern indicated the utility of game theory by linking it with economic behavior.
1. Economic models
Game theory is applied to markets of various commodities with differing numbers of buyers and sellers, fluctuating values of supply and demand, and seasonal and cyclical variations, and to the analysis of conflicts of interest in maximizing profits and promoting the widest distribution of goods and services.
2. Social sciences
The n-person game theory has interesting uses in studying the distribution of power in legislative
procedures, problems of majority rule, individual and group decision making.
3. Epidemiologists
Make use of game theory, with respect to immunization procedures and methods of testing a vaccine
or other medication.
4. Military strategists
Turn to game theory to study conflicts of interest resolved through "battles", where the outcome or payoff of a war game is either victory or defeat.
Definition of Game
1. A game has at least two players. Solitaire is not considered a game by game theory.
2. An instance of a game begins with a player choosing from a set of specified alternatives (allowed by the game rules). This choice is called a move.
3. After the first move, the new situation determines which player makes the next move and the alternatives available to that player. For example, in many multi-player card games, the player making the next move depends on who dealt, who took the last trick, who won the last hand, etc.
4. The moves made by a player may or may not be known to the other players. Games in which all moves of all players are known to everyone are called games of perfect information. For example, chess, checkers and tic-tac-toe are games of perfect information.
6. When an instance of a game ends, each player receives a payoff. A payoff is a value associated with each player's final situation. A zero-sum game is one in which the elements of the payoff matrix sum to zero.
In a typical zero-sum game:
i) Win = +1 point, ii) Loss = -1 point, iii) Draw = 0 points.
Game Theory
Game theory does not prescribe a way to play a game. Game theory is a set of ideas and techniques for analyzing conflict situations between two or more parties, whose outcomes are determined by their decisions.
In every two-player, zero-sum, non-random, perfect-knowledge game there exists a perfect strategy guaranteed to at least result in a tie.
- The term "game" means a sort of conflict in which n individuals or groups (known as players)
participate.
- A list of "rules" stipulates the conditions under which the game begins.
- A game is said to have "perfect information" if all moves are known to each of the players involved.
- A "strategy" is a list of the optimal choices for each player at every stage of a given game.
- A "move" is the way in which game progresses from one stage to another, beginning with an initial
state of the game to the final state.
• The important and basic game theory theorem is the mini-max theorem. This theorem says,
"If a minimax of one player corresponds to a maximin of the other player, then that outcome is the
best that both players can hope for."
There were two reasons that games appeared to be a good domain in which to explore machine
intelligence -
i) They provide a structured task in which it is very easy to measure success or failure.
ii) They did not obviously require large amounts of knowledge. They were thought to be solvable by
straightforward search from the starting state to a winning position.
The first of these reasons remains valid and accounts for the continued interest in the area of game playing by machines. Unfortunately, the second is not true for any but the simplest games. For example, consider chess: playing it on a computer faces a combinatorial explosion of positions due to the following reasons -
• The average branching factor is about 35, and games often run to about 50 moves by each player (100 ply).
• So in order to examine the complete game tree, one would need to examine about 35^100 positions.
In addition to the above two reasons, there are some more reasons why game-playing occupies a pivotal role in AI -
i) The rules of games are very limited. Hence, extensive amounts of domain-specific knowledge are
seldom needed.
ii) Many human experts exist to assist in the development of the programs. Hence, the problem of a shortage of human experts does not arise.
iii) For the human expert, it is easy to explain the rationale for a move unlike other domains.
iv) Games depict real-life situations in a constricted fashion. The logical reasoning ability of a human under normal conditions and under stress is clearly exhibited in game-playing. Moreover, game-playing permits one to simulate real-life situations.
Game Playing
2. Time limit
Games are always played in a time-constrained environment. Therefore the program needs to handle time efficiently.
Types of Games
1. Based on chance
For example - Backgammon, Monopoly.
2. Based on information
i) Perfect information - All moves of all players are known to everyone.
For example - Chess, Checkers, Tic-tac-toe.
ii) Imperfect information - Players must choose their strategies simultaneously, neither knowing what the other player is going to do.
3. Zero-sum game
Here the payoffs of the players always sum to zero.
For example - If you play a single game of chess with someone, one person will lose and one person will win. The win (+1) added to the loss (-1) equals zero.
4. Constant-sum game
Here the algebraic sum of the outcomes is always constant, though not necessarily zero.
5. Non-zero-sum game
Here the algebraic sum of the outcomes is not constant. In these games the payoffs are not the same for all outcomes.
They are not always completely solvable but provide insights into important areas of inter-dependent
choice.
In these games, one player's losses do not always equal another player's gains.
Here all players may share one common goal towards which they contribute together.
6. N-person game
Here more than two players participate in the game.
A game can be formally defined as a kind of search problem with the following components:
1. The initial state, which includes the board position and identifies the player to move.
2. A successor function, which returns a list of (move, state) pairs, each indicating a legal move and
the resulting state.
3. A terminal test, which determines when the game is over. States where the game has ended are called terminal states.
4. A utility function (also called an objective function or payoff function), which gives a numeric value for the terminal states. In chess, the outcome is a win, loss or draw, with values +1, -1, or 0. Some games have a wider variety of possible outcomes; the payoffs in backgammon range from +192 to -192.
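The four components above can be captured in a small interface. The following is a minimal Python sketch; the class and method names are illustrative assumptions, not taken from the text.

class Game:
    def initial_state(self):
        # Board position plus the identity of the player to move.
        raise NotImplementedError

    def successors(self, state):
        # Return a list of (move, resulting_state) pairs for the legal moves.
        raise NotImplementedError

    def is_terminal(self, state):
        # True when the game has ended in this state.
        raise NotImplementedError

    def utility(self, state, player):
        # Numeric payoff of a terminal state, e.g. +1 win, 0 draw, -1 loss.
        raise NotImplementedError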
A player's strategy in a game is a complete plan of action for whatever situation might arise. It is a
complete algorithm for playing the game, telling a player what to do for every possible situation
throughout the game.
A pure strategy provides a complete definition of how a player will play a game. In particular, it determines the move a player will make for any situation they could face. A player's strategy set is the set of pure strategies available to that player.
A mixed strategy is an assignment of a probability to each pure strategy. This allows a player to randomly select a pure strategy. Since probabilities are continuous, there are infinitely many mixed strategies available to a player, even if their strategy set is finite.
A mixed strategy for a player is thus a probability distribution on the set of his pure strategies.
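As a small illustration, a mixed strategy can be represented as a table of probabilities from which a pure strategy is sampled. The strategy names below (rock, paper, scissors) are illustrative assumptions, not from the text.

import random

# A mixed strategy: a probability distribution over a finite set of pure strategies.
mixed_strategy = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}

def sample_pure_strategy(mixed):
    # Randomly select one pure strategy according to the given probabilities.
    strategies = list(mixed.keys())
    weights = list(mixed.values())
    return random.choices(strategies, weights=weights, k=1)[0]

print(sample_pure_strategy(mixed_strategy))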
• For example, Rubik's cube and 8-tile puzzle are single person games. For solving such problems
strategies like best-first or A* algorithm can be used. These strategies help in identifying paths in a
clear fashion.
• Problems in which two persons play a game such as chess or checkers, however, cannot be solved by best-first search or A* algorithms, as here each player tries to outsmart the opponent. Each has their own way of evaluating the situation.
• The basic characteristic of the strategy must be look-ahead in nature, i.e., explore the tree for two or more levels downwards and choose the optimal move. The basic methods available for game playing are -
i) Minimax strategy
ii) Minimax with alpha-beta pruning
• Roughly speaking, an optimal strategy leads to outcomes that are at least as good as any other
strategy when one is playing an infallible opponent.
Mini-Max Value
The minimax value of a given game tree is determined by the optimal strategy, which is obtained by evaluating the minimax value of each node.
Mini-Max Theorem
Players adopt those strategies which will maximize their gains, while minimizing their losses.
Therefore the solution is, the best each player can do for him/herself in the face of opposition of the
other player.
1. Given a game tree the optimal strategy can be determined by examining the minimax value of
each node.
2. The minimax value of a node is the utility (for the player called MAX) of being in the corresponding state, assuming that both players play optimally from this stage to the end of the game.
3. The minimax value of a terminal state is just its utility. The minimax value of a MAX node is the maximum of its successors' minimax values, and that of a MIN node is the minimum.
4. Given a choice, MAX will prefer to move to a state of maximum value, whereas MIN prefers a state of minimum value.
Game Tree
The initial state and the legal moves for each side, define the game tree for the game.
1. Root node -
Represents the board configuration and the decision required as to what is the best single next move.
If it is my turn to move, then the root is labeled a MAX node indicating it is my turn; otherwise it is labeled a MIN node.
2. Arcs -
Represent the possible legal moves for the player at the node that the arcs emanate from.
3. At each level, the tree has nodes that are all MAX or all MIN.
4. Since moves alternate, the nodes at level i are of the opposite kind from those at level i + 1.
2. The top node (root) is the initial state and MAX (player 1) moves first, placing X in an empty
square.
3. The rest of the search tree shows alternate moves for MIN (player 2) and MAX.
• For every move a player makes in the game of chess, the average branching factor is 35, i.e., the opponent can make 35 different moves.
• If one employs a simple move generator then it might not be possible to examine all the states. Hence, it is essential that only a few selected moves or paths be examined.
• For this purpose, one uses a plausible-move generator that expands or generates only selected moves.
b) The amount of computational power available at one's disposal for examining various states is also limited.
• This is one of the most important components of a game playing program. It generates the static evaluation function value for each move that is made, based on heuristics.
• The static evaluation function gives a snapshot of a particular move. The higher the static evaluation function value, the higher the probability of a victory.
• The static evaluation function generator occupies a crucial role in game playing programs because of the following factors -
a) It utilizes heuristic knowledge for evaluating the static evaluation function value.
b) The static evaluation function generator acts as a pointer indicating where the plausible move generator has to generate future paths.
Mini-Max Algorithm
The minimax algorithm computes the minimax decision from the current state. It is used as a
searching technique in game problems. The minimax algorithm performs a complete depth-first
exploration of the game-tree.
1. The start node is a MAX (player 1) node with the current board configuration.
2. Expand the game tree to the chosen search depth, generating the successors of each node.
3. Apply the static evaluation function to the leaf nodes.
4. "Back up" values for each non-leaf node until a value is computed for the root node.
5. At MIN (player 2) nodes, the backed-up value is the minimum of the values associated with its children.
6. At MAX nodes, the backed-up value is the maximum of the values associated with its children.
Note : The process of "backing up" values gives the optimal strategy; that is, each player assumes that the opponent is using the same static evaluation function as they are.
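A compact, runnable Python sketch of these steps is given below. The game tree is supplied directly as nested lists (an internal node is a list of children, a leaf is its static evaluation value); this toy encoding is an assumption for illustration only.

def minimax(node, maximizing):
    # Back up values: MAX nodes take the maximum, MIN nodes the minimum.
    if not isinstance(node, list):        # leaf: static evaluation value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two-ply example: MAX to move at the root, MIN at the next level.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))   # -> 3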
Properties of Mini-Max
1. The algorithm is complete if the game tree is finite.
2. It is optimal against an optimal opponent.
3. The time complexity is O(b^m), where b is the branching factor and m is the maximum depth of the tree.
4. The space complexity is O(bm) for a depth-first exploration where the algorithm generates all successors at once, or O(m) for an algorithm that generates successors one at a time.
This algorithm explores the whole search space. For a game with a huge search space it will take a very long time.
1. Assume that two players named MIN and MAX are playing the game.
4. Play alternates between MAX and MIN until we reach leaf nodes corresponding to terminal states
such that one player has 3 in a row or all the squares are filled.
5. The number on each leaf node indicates the utility value of the terminal state from the point of
view of MAX.
6. High values are assumed to be good for MAX and bad for MIN.
(Steps 1 to 6, illustrated in the accompanying figures, build the tree and then back the values up one level at a time on the basis of whose turn it is.)
Example
1. Assume that two players named MIN and MAX are playing the game.
3. The possible moves for MAX at the root node are labeled a1, a2 and a3.
4. The possible replies to a1 for MIN are b1, b2, b3 and so on.
5. This particular game ends after one move each by MAX and MIN.
6. In game parlance, we say that this tree is one move deep, consisting of two half-moves, each of which is called a ply.
Example
3. Here the single value for each node is replaced with a vector of values. For example, in a three-player game with players A, B and C, a vector (VA, VB, VC) is associated with each node.
4. For terminal states, this vector gives the utility of the state from each player's viewpoint. (In two-player, zero-sum games, the two-element vector can be reduced to a single value because the values are always opposite.)
6. For non-terminal states we can calculate the utility vector as explained below. Consider the following diagram.
7. Consider the node marked X in the game tree shown in the above diagram. In this state, player C chooses what to do. The two choices lead to terminal states with utility vectors (VA = 1, VB = 2, VC = 6) and (VA = 4, VB = 2, VC = 3). Since 6 is bigger than 3, C should choose the first move. This means that if state X is reached, subsequent play will lead to a terminal state with utilities (VA = 1, VB = 2, VC = 6). Hence the backed-up value of X is this vector.
8. In general, the backed-up value of a node 'n' is the utility vector of whichever successor has the
highest value for the player choosing at 'n'.
9. Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances are made and broken as the game proceeds.
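The choice rule in points 7 and 8 can be sketched in a few lines of Python. The tuple-based tree encoding (a leaf is a utility vector, an internal node is a pair of the moving player's index and a list of children) and the name maxn are illustrative assumptions.

def maxn(node):
    # Backed-up value of a multi-player game-tree node: a utility vector.
    if isinstance(node, tuple) and len(node) == 2 and isinstance(node[1], list):
        player, children = node
        vectors = [maxn(child) for child in children]
        return max(vectors, key=lambda v: v[player])   # best for the player who moves here
    return node                                        # leaf: utility vector

# Player C (index 2) chooses between the two terminal vectors from the example above.
x = (2, [(1, 2, 6), (4, 2, 3)])
print(maxn(x))   # -> (1, 2, 6), since 6 > 3 for player C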
Alpha-Beta Pruning
1. The problem with minimax algorithm search is that the number of game states it has to examine is
exponential in the number of moves.
2. α-β pruning proposes to compute the correct minimax decision without looking at every node in the game tree.
1. MAX player cuts off search when he knows MIN-player can force a provably bad outcome.
2. MIN player cuts off search when he knows MAX-player can force a provably good (for MAX) outcome.
3. Applying an alpha-cutoff means we stop search of a particular branch because we see that we
already have a better opportunity elsewhere.
4. Applying beta-cutoff means we stop search of a particular branch because we see that the
opponent already has a better opportunity elsewhere.
Alpha Cutoff
It may be found that, in the current branch, the opponent can achieve a state with a lower value for us than one achievable in another branch. So the current branch is one into which we will certainly not move the game. Search of this branch can be safely terminated.
For example
Beta-Cutoff
It may also be found that, in the current branch, we would be able to achieve a state which has a higher value for us than one the opponent can hold us to in another branch. The current branch can be identified as one into which the opponent will certainly not let the game move. So search in this branch can be safely terminated.
For example -
/*
alpha is the best score for MAX along the path to state.
beta is the best score for MIN along the path to state.
*/
If the level is the top level, let alpha = - infinity, beta = + infinity.
If depth has reached the search limit, apply the static evaluation function to state and return the result.
If player is MAX :
Until all of state's children are examined with ALPHA-BETA, or until alpha is equal to or greater than beta :
Examine the next child with ALPHA-BETA and note the result.
Compare the value reported with alpha; if the reported value is larger, reset alpha to the new value.
Report alpha.
If player is MIN :
Until all of state's children are examined with ALPHA-BETA, or until alpha is equal to or greater than beta :
Examine the next child with ALPHA-BETA and note the result.
Compare the value reported with beta; if the reported value is smaller, reset beta to the new value.
Report beta.
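The pseudocode above can also be written as a short runnable Python sketch, again using the nested-list tree encoding from the earlier minimax sketch (an assumption for illustration):

import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    # Alpha-beta search over a nested-list tree (leaf = static evaluation value).
    if not isinstance(node, list):                 # search limit reached / leaf
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                      # beta cutoff: MIN will never allow this
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:                      # alpha cutoff: MAX already has a better option
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))   # -> 3, without ever examining the leaves 4 and 6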
Example 1:
1. In a game tree, each node represents a board position at which one of the players gets to choose a move.
2. For example, look at node C in Fig. 5.12.6, as well as look at its left child.
3. We realize that if the players reach node C, the minimizer can limit the utility to 2. But the maximizer can get utility 6 by going to node B instead, so he would never let the game reach C. Therefore we don't even have to look at C's other children.
4. Initially, at the root of the tree, there is no guarantee about what values the maximizer and minimizer can achieve. So beta is set to +∞ and alpha to -∞.
5. Then as we move down the tree, each node starts with beta and alpha values passed down from
its parent.
6. If it's a maximizer node, then alpha is increased if a child value is greater than the current alpha value. Similarly, at a minimizer node, beta may be decreased. This procedure is shown in Fig. 5.12.6.
7. At each node, the alpha and beta values may be updated as we iterate over the node's children. At
node E, when alpha is updated to a value of 8, it ends up exceeding beta.
8. This is a point where alpha-beta pruning applies: we know the minimizer would never let the game reach this node, so we don't have to look at its remaining children.
9. In fact, pruning happens exactly when alpha becomes greater than or equal to beta - that is, when the alpha and beta intervals meet at the node's value.
1) Alpha-beta cut-off is a modified version of the minimax algorithm, wherein two threshold values are maintained for future expansion.
2) One represents a lower bound on the value that a maximizing node may ultimately be assigned (we call this alpha), and the other represents an upper bound on the value that a minimizing node may be assigned (this we call beta).
3) To explain how alpha-beta values help, consider the tree structure in Fig. 5.12.7.
3) To explain how alpha-beta values help, consider the tree structure in Fig. 5.12.7.
• Here, the maximizer has to play followed by the minimizer. As is done in minimax, look ahead
search is done.
• A value of 9 is assigned at B (the minimum of the values at D and E). This value is passed back to A. So the maximizer is assured of a value of at least 9 when it moves to B.
• So, by simply pruning the tree whose root is G, one saves a lot of time in searching.
4) Now the effects of alpha and beta can be understood from Fig. 5.12.8. Here, a look-ahead search is done up to level 3 as shown in Fig. 5.12.8. The static evaluation function generator has assigned the values given at the leaf nodes. Since D, E, F and G are maximizers, the maximum of their leaf nodes is assigned to them. Thus D, E, F and G have the values 8, 10, 5 and 11 respectively.
5) Here B and C are minimizers, so they are assigned the minimum of their successors' values, i.e., B is assigned a value 8 and C is assigned a value 5. And A, being a maximizer, will obviously opt for the value 8.
• ROLE OF ALPHA - For A, the maximizer, a value of 8 is assured by moving to node B. This value is
compared with that of the value at C.
• A, being a maximizer would follow a path whose value is greater than 8. Hence, this value of 8,
being the least that a maximizing node can obtain is set as the value of alpha.
• This value of alpha is now used as a reference point. Any node whose value is greater than alpha is acceptable and all nodes whose values are less than the alpha value are rejected. From Fig. 5.12.8, we come to know that the value at C is 5. Hence, the entire tree under C is totally rejected. That saves a lot of time and cost.
• ROLE OF BETA - Now, to know the role of BETA consider Fig. 5.12.9.
Here, B is the root; B is a minimizer and the paths for expansion are chosen from the values at the leaf nodes.
Since D and E are maximizers, the maximum values of their children are backed up as their static evaluation function values. Node B, being a minimizer, will always move to D rather than E. The value at D (8) is now used by B as a reference. This value is called beta, the maximum value a minimizer can be assigned. Any node whose value is less than this beta value (8) is acceptable, and values more than beta are seldom referred to. The value of beta is passed to node E and compared with the static evaluation function values there.
The minimizer will benefit only by moving to D rather than E. Hence, the entire tree under node E is pruned.
• α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.
• β is the value of the best (i.e., lowest-value) choice found so far at any choice point along the path for MIN.
1. Evaluation Functions
1. An evaluation function returns an estimate of the expected utility of the game from a given
position, just as the heuristic functions return an estimate of the distance to the goal.
2. It should be clear that the performance of a game-playing program is dependent on the quality of its evaluation function. An inaccurate evaluation function will guide an agent toward positions that turn out to be lost.
a) The evaluation function should order the terminal states in the same way as the true utility function; otherwise, an agent using it might select suboptimal moves even if it can see ahead all the way to the end of the game.
b) The computation of the evaluation function must not take too long.
c) For nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.
The minimax algorithm generates the entire game search space, whereas the alpha-beta algorithm allows us to prune large parts of it. However, alpha-beta still has to search all the way to terminal states for at least a portion of the search space. Searching to that depth is usually not practical, because moves must be made in a reasonable amount of time - typically a few minutes at most.
To make the search faster we can use a heuristic evaluation function that is applied to states in the search, effectively turning nonterminal nodes into terminal leaves. In other words, the suggestion is to alter minimax or alpha-beta in two ways: the utility function is replaced by a heuristic evaluation function, which gives an estimate of the position's utility, and the terminal test is replaced by a cutoff test that decides when to apply the heuristic function.
If Cutoff-Test(s, depth) then return Eval(s), where Eval is the evaluation function.
There are, however, some problems in cutting off search, discussed below.
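A small Python sketch of these two modifications is given below. The piece values, the dictionary-based state and the depth limit are illustrative assumptions, not taken from the text.

PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(state):
    # Weighted linear evaluation: material balance from MAX's point of view.
    return sum(PIECE_VALUES[p] for p in state["max_pieces"]) - \
           sum(PIECE_VALUES[p] for p in state["min_pieces"])

def cutoff_test(state, depth, limit=4):
    # Stop searching at terminal states or once the depth limit is reached.
    return state.get("terminal", False) or depth >= limit

# Example: MAX is a knight up, so the estimate favours MAX.
state = {"max_pieces": ["queen", "rook", "knight"], "min_pieces": ["queen", "rook"]}
print(evaluate(state))   # -> 3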
6. Quiescent position -
b) Instead, do a small secondary search until things calm down. For example, after capturing a piece, things look good, but this would be misleading if the opponent was about to capture right back.
d) A quiescent position is a position which is unlikely to exhibit wild swings (huge changes) in value in
near future.
e) Consider following example –
Here [Refer Fig. 5.12.11], since the tree is further explored, the value which is passed to A is 6. Thus the situation calms down. This is called waiting for quiescence. It helps in avoiding the horizon effect of a drastic change of values.
A potential problem that arises in game tree search (of a fixed depth) is the horizon effect, which
occurs when there is a drastic change in value, immediately beyond the place where the algorithm
stops searching. Consider the tree shown in Fig. 5.12.12 (a).
It has nodes A, B, C and D. At this level, since it is a maximizing ply, the value which will be passed up to A is 5.
Suppose node B is examined one more level as shown in Fig. 5.12.12 (b), then we see that because of
a minimizing ply, value at B is 2 and hence the value passed to A is 2. This results in a drastic change
in the situation. There are two proposed solutions to this problem
a) Singular extension -
This expands a move that is clearly better than all other moves. Since only the single best move is extended, the branching factor of the extension is 1 and its added cost is small.
b) Secondary search-
One proposed solution is to examine the search space beyond the apparently best move, to see if something is looming just over the horizon. In that case we can revert to the second-best move. Obviously the second-best move then has the same problem, and there isn't time to search beyond all possible acceptable moves.
8. Additional refinements - Other than alpha-beta pruning, there are a variety of other modifications
to the minimax procedure that can also enhance its performance.
During searching, maintain two values alpha and beta such that alpha is the current lower bound on the possible returned value and beta is the current upper bound on the possible returned value. If during searching we know for sure that alpha > beta, then there is no need to search any more in this branch; the returned value cannot lie in this branch. Backtrack until it is again the case that alpha ≤ beta. The two values alpha and beta are called the range of the current search window. These values are dynamic. Initially, alpha is -∞ and beta is +∞.
The quiescence and secondary search technique - If a node represents a state in the middle of an
exchange of pieces, the evaluation function may not give a reliable estimate of board quality. For
example, after a knight is captured, the evaluation may be good, but this is misleading as the
opponent is about to capture one's queen.
The solution to this problem is: if a node's evaluation is not "quiescent", continue the alpha-beta search below that node, but limit the moves to those that significantly change the evaluation function value (for example, capture moves or promotions). The crucial point is that the branching factor for such moves is small.
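A minimal Python sketch of this quiescence idea, in negamax form, is shown below. The method capture_successors() and the convention that evaluate() scores a position from the side to move's point of view are illustrative assumptions.

def quiescence(state, alpha, beta, evaluate):
    # Below the horizon, search only "noisy" moves (captures) until the position calms down.
    stand_pat = evaluate(state)                 # value if we stop exchanging right now
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    for move, child in state.capture_successors():           # only moves that change material
        score = -quiescence(child, -beta, -alpha, evaluate)   # negamax: flip sign and window
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha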
For complicated games it is not feasible to select a move by simply looking up the current game configuration in a catalogue (the book move) and extracting the corresponding move. The catalogue would be huge and very complex to construct.
4. Forward Pruning
2. In this approach, some moves at a given node are pruned immediately without further consideration. Clearly, most humans playing chess consider only a few moves from each position (at least consciously).
3. Unfortunately, the approach is rather dangerous because there is no guarantee that the best move
will not be pruned away.
4. This can be disastrous if applied near the root, because it may then happen that the program misses some "obvious" moves.
5. Forward pruning can be used safely in special situations. For example, when two moves are
symmetric or otherwise equivalent, only one of them need be considered or for nodes that are deep
in the search tree.
Games with Chance
Games with a certain element of chance are often more interesting than those without chance, and
many games involve rolling dice, tossing a coin or something similar.
- In the real world, many situations in front of us are unpredictable. The same is observed in many games.
- In games with chance, we can introduce probabilities into our search diagrams and calculate minimax solutions as in normal games.
- We add one more level to the game tree, i.e., the level of chance nodes.
4) If N is a MAX node : utility(N) = Σ (i = 1 to n) P(di) · max over s ∈ S(N, di) of utility(s)
5) If N is a MIN node : utility(N) = Σ (i = 1 to n) P(di) · min over s ∈ S(N, di) of utility(s)
- With chance nodes, the utility is not determined by just the ordering of the terminal values. Therefore the actual values assigned to win, loss and draw affect the choice of moves.
- The time complexity increases (with n outcomes from the chance nodes) to O(b^d n^d).
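The formulas above can be sketched in a few lines of Python. The toy tree encoding below ("max"/"min" nodes with a list of children, "chance" nodes with (probability, child) pairs, plain numbers as leaves) is an assumption for illustration.

def expectiminimax(node):
    if not isinstance(node, tuple):                  # leaf: utility value
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average over its outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between two dice-roll chance nodes.
tree = ("max", [("chance", [(0.5, 3), (0.5, -1)]),
                ("chance", [(0.9, 1), (0.1, 4)])])
print(expectiminimax(tree))   # -> 1.3, the second move has the higher expected utility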
While developing game playing applications, an incremental approach can be taken. In the first stage, a human plays against a human; the program serves only as a representation of the game and checks for legal moves. In the second stage, a human plays against the program, in which legal and good moves are made by the program. The program acts as a coach in this stage. In the third stage, the program plays against the program. The learning component of the program improves by playing games.
In game playing, a good evaluation function is a crucial factor. An evaluation function should consider all the factors, like the number of pieces and the values associated with each square on the board. Searching, looking ahead and exploring alternatives should be tried using the evaluation function.
• Checkers (Samuel, Chinook) : Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997.
Deep Blue examined 200 million positions per second, used very sophisticated evaluation and
undisclosed methods for extending some lines of search up to 40 ply. Current programs are even
better, if less historic.
• Othello (Logistello): In 1997, Logistello defeated human champion by six games to none. Human
champions refuse to compete against computers, which are too good.
• Go (Goemate, Go4++): Human champions are beginning to be challenged by machines, though the
best humans still beat the best machines. In Go, b> 300, so most programs use pattern knowledge
bases to suggest plausible moves, along with aggressive pruning.
4. Have an evaluation function taking other factors into account (for example, the number of pieces).
5. Search / look ahead / explore alternatives using the evaluation function, looking one or several moves ahead with minimax or alpha-beta.
Stochastic Games
Stochastic games represent dynamic interactions in which the environment changes in response to players' behavior. Scientist Shapley says, "In a stochastic game the play proceeds by steps from position to position, according to transition probabilities controlled jointly by the two players."
A stochastic game is played by a set of players. In each stage of the game, the play is in a given state (or position, in Shapley's language), taken from a set of states, and every player chooses an action from a set of available actions. The collection of actions that the players choose, together with the current state, determines the stage payoff that each player receives, as well as a probability distribution according to which the new state that the players will visit is chosen.
Stochastic games extend the model of strategic-form games, which is attributable to Von Neumann, to dynamic situations in which the environment changes in response to the players' choices.
The complexity of stochastic games arises from the fact that the choices made by the players have
two contradictory effects. First, together with the current state, the players' actions determine the
immediate payoff that each player receives. Second, the current state and the players' actions
influence the choice of the new state, which determines the potential of future payoffs. In particular,
when choosing his actions, each player has to balance these forces, a decision that may often be difficult. This dichotomy is also present in one-player sequential decision problems.
A stochastic game is a collection of normal-form games that the agents play repeatedly. The
particular game played at any time depends probabilistically on the previous game played and the
actions of the agents in that game. It is like a probabilistic finite state machine in which the states are the games and the transition labels are joint action-payoff pairs.
• A set of strategies Si(x) for each player i, for each state x ∈ X.
• A set of rewards dependent on the state and the actions of the other players : ui(x, S1, S2).
In current discussion for game solving purpose following assumptions are made with respect to
stochastic games,
A history of length t in a stochastic game is the sequence of states that the game visited in the first t
stages, as well as the actions that the players played in the first t-1 stages. A strategy of a player is a prescription for how to play the game; that is, a function that assigns to every finite history an action to
play should that history occur. A behavior strategy of a player is a function that assigns to every finite
history a lottery over the set of available actions.
But now there are action profiles rather than individual actions, and each profile has several possible outcomes. Thus a history is a sequence ht = (q0, a0, q1, a1, ..., at-1, qt), where t is the number of stages.
As before, the two most common methods to aggregate payoffs into an overall payoff are average
reward and future discounted reward.
Note that Stochastic games generalize both Markov Decision Processes (MDPs) and repeated games.
An MDP is a stochastic game with only 1 player. A repeated game is a stochastic game with only 1
state. For example, Iterated Prisoner's Dilemma, Roshambo, Iterated Battle of the Sexes.
• For agent i, a deterministic strategy specifies a choice of action for i at every stage of every possible
history
• As in extensive-form games, a behavioral strategy is a mixed strategy in which the mixing takes place at each history independently.
• A Markov strategy is a behavioral strategy such that for each time t, the distribution over actions depends only on the current state. However, the distribution may be different at time t than at time t'.
• A stationary strategy is a Markov strategy in which the distribution over actions depends only on
the current state (not on the time t)
The game is said to have a value v if:
(i) player 1 has a strategy (which is then said to be "optimal"), which ensures that his expected overall payoff over time does not fall below v, no matter what strategy is followed by player 2, and
(ii) the symmetric property holds when exchanging the roles of the two players.
Shapley proved the existence of a value.
Because the parameters that define the game are independent of time, the situation that the players face today, if the play is in a certain state, is the same situation they would face tomorrow if tomorrow the play is in that state.
In particular, one expects to have optimal strategies that are stationary Markov, that is, they depend
only on the current state of the game. Shapley proved that indeed such optimal strategies exist, and
characterized the value as the unique fixed point of a nonlinear functional operator-a two-player
version of the dynamic programming principle.
The existence of stationary Markov optimal strategies implies that, to play well, a player needs to
know only the current state. In particular, the value of the game does not change if players receive
partial information on each other's actions, and/or if they forget previously visited states.
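A simplified, illustrative Python sketch of this fixed-point (dynamic programming) idea is given below, under the extra assumption that each state's one-stage matrix game has a pure-strategy saddle point, so that a max-min over pure actions equals its value; in general, Shapley's operator requires solving a mixed-strategy matrix game at every state. The reward and transition tables are invented toy data.

DISCOUNT = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]          # same action set for both players, for brevity

# reward[x][a][b] = stage payoff to player 1 in state x under actions (a, b)
reward = [[[1, 0], [0, 2]],
          [[0, 1], [3, 0]]]
# trans[x][a][b] = probability distribution over next states
trans = [[[(0.5, 0.5), (1.0, 0.0)], [(0.0, 1.0), (0.5, 0.5)]],
         [[(1.0, 0.0), (0.5, 0.5)], [(0.5, 0.5), (0.0, 1.0)]]]

def shapley_iteration(iterations=200):
    v = [0.0 for _ in STATES]
    for _ in range(iterations):
        new_v = []
        for x in STATES:
            # one-stage game: immediate payoff plus discounted value of the next state
            q = [[reward[x][a][b] + DISCOUNT * sum(p * v[y] for y, p in enumerate(trans[x][a][b]))
                  for b in ACTIONS] for a in ACTIONS]
            new_v.append(max(min(row) for row in q))    # max-min over pure actions
        v = new_v
    return v

print(shapley_iteration())   # approximate state values for player 1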
Nash equilibrium is a concept within game theory in which the optimal outcome of a game is one where no player has an incentive to deviate from their initial strategy. More specifically, the Nash equilibrium is a concept of game theory where the optimal outcome of a game is one where no player has an incentive to deviate from his chosen strategy after considering an opponent's choice. Overall, an individual can receive no incremental benefit from changing actions, assuming other players remain constant in their strategies. A game may have multiple Nash equilibria or none at all.
The Nash equilibrium is the solution to a game in which two or more players have a strategy, and
with each participant considering an opponent's choice, he has no incentive, nothing to gain, by
switching his strategy. In the Nash equilibrium, each player's strategy is optimal when considering the
decisions of other players. Every player wins because everyone gets the outcome they desire. To
quickly test if the Nash equilibrium exists, reveal each player's strategy to the other players. If no one
changes his strategy, then the Nash equilibrium is proven.
For example, imagine a game between Anil and Sunil. In this simple game, both players can choose
strategy A, to receive 1, or strategy B, to lose 1. Logically, both players choose strategy A and receive
a payoff of 1. If one revealed Sunil's strategy to Anil and vice versa, one can see that no player
deviates from the original choice. Knowing the other player's move means little and doesn't change
either player's behavior. The outcome A, A represents a Nash equilibrium.
Prisoner's Dilemma
The prisoner's dilemma is a common situation analyzed in game theory that can employ the Nash
equilibrium. In this game, two criminals are arrested and each is held in solitary confinement with no
means of communicating with the other. The prosecutors do not have the evidence to convict the
pair, so they offer each prisoner the opportunity to either betray the other by testifying that the
other committed the crime or cooperate by remaining silent. If both prisoners betray each other,
each serves five years in prison. If A betrays B but B remains silent, prisoner A is set free and prisoner
B serves 10 years in prison, or vice versa. If each remains silent, then each serves just one year in
prison. The Nash equilibrium in this example is for both players to betray each other. Even though
mutual cooperation leads to a better outcome, if one prisoner chooses mutual cooperation and the
other does not, one prisoner's outcome is worse.
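The equilibrium claim above can be checked mechanically. The small Python sketch below enumerates the pure-strategy profiles of the prisoner's dilemma and keeps those from which neither player can profitably deviate; encoding the years in prison as negative utilities is an illustrative assumption.

from itertools import product

ACTIONS = ["betray", "silent"]
# (utility of prisoner A, utility of prisoner B): years in prison, as negative utilities
PAYOFF = {
    ("betray", "betray"): (-5, -5),
    ("betray", "silent"): (0, -10),
    ("silent", "betray"): (-10, 0),
    ("silent", "silent"): (-1, -1),
}

def is_nash(a, b):
    ua, ub = PAYOFF[(a, b)]
    best_a = all(ua >= PAYOFF[(a2, b)][0] for a2 in ACTIONS)   # A cannot do better alone
    best_b = all(ub >= PAYOFF[(a, b2)][1] for b2 in ACTIONS)   # B cannot do better alone
    return best_a and best_b

print([profile for profile in product(ACTIONS, ACTIONS) if is_nash(*profile)])
# -> [('betray', 'betray')] : mutual betrayal is the only Nash equilibrium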
Irreducible stochastic game
A stochastic game is said to be irreducible if every game can be reached with positive probability
regardless of the strategy adopted.
Theorem: Every 2-player, general-sum, average reward, irreducible stochastic game has a Nash
equilibrium. A payoff profile is feasible if it is a convex combination of the outcomes in a game,
where the coefficients are rational numbers.
If (p1, p2) is a feasible pair of payoffs such that each pi is at least as big as agent i's minimax value,
then (p1, p2) can be achieved in equilibrium through the use of enforcement.
• For two-player zero-sum stochastic games, the folk theorem still applies, but it becomes vacuous
(empty).
• The situation is similar to what happened in repeated games. The only feasible pair of payoffs is the
minimax payoffs.
• Two agents take turns. Before his/her move, an agent must roll the dice.
• The set of available moves depends on the results of the dice roll.
• Mapping Backgammon into a Markov game is straightforward, but slightly awkward. The basic idea is to give each move a stochastic outcome, by combining it with the dice roll that comes after it.
• Initial set of states = {initial board} x {all possible results of agent 1's first dice roll}
• Set of possible states after agent 1's move = {the board produced by agent 1's move} x {all possible results of agent 2's dice roll}
• And vice versa for agent 2's move. One can extend the minimax algorithm to deal with this.
• But it is easier if one does not try to combine the moves and the dice rolls; these two events should be kept separate.
• This algorithm is used to solve a two-player zero-sum game in which each agent's move has a deterministic outcome, while the dice rolls are modeled as separate chance nodes.
• The algorithm gives optimal play (highest expected utility). It is essentially the expectiminimax procedure whose formulas were given above.
• While finding a solution for a two-player stochastic game, the following points are to be noted.
1. Dice rolls increase the branching factor; there are 21 possible rolls with 2 dice.
Given the dice roll, there are about 20 legal moves on average. For some dice rolls, it can be much higher.
2. As depth increases, the probability of reaching a given node shrinks and the value of lookahead is diminished. In such situations α-β pruning is less effective.
3. There is a program, TD-Gammon, that uses depth-2 search and a very good evaluation function. It achieves world-champion level play.
4. The evaluation function was created automatically using a machine learning technique called Temporal Difference learning. Hence, in the name 'TD-Gammon', 'TD' stands for Temporal Difference learning.
Shapley's model is equivalent to one in which players discount their future payoffs according to a
discount factor that depends on the current state and on the players' actions. The game is called
"discounted" if all stopping probabilities equal the same constant, and one minus this constant is
called the "discount factor." Models of discounted stochastic games are prevalent in economics,
where the discount factor has a clear economic interpretation.
In non-zero-sum games, a collection of strategies, one for each player, is a "(Nash) equilibrium" if no
player can profit by deviating from his strategy, assuming all other players follow their prescribed
strategies.
Stochastic games give a model for a large variety of dynamic interactions and are therefore useful in
modeling real-life situations that arise in, e.g., economics, political science, and operations research.
For the analysis of a game to provide decisive predictions and recommendations, the data that define it must have special features. Because applications are usually motivated by the search for clear-cut conclusions, only highly structured models of stochastic games have been studied.
2. Another application of stochastic games is that of market games with money. The study of the
origin of inflation in a market game with continuum of agents and a central bank has been carried
out. At every stage, each player receives a random endowment of a perishable commodity, decides
how much to lend to or to borrow from the central bank, and consumes the amount that he has
after this transaction. The conclusion of the analysis is that the mere presence of uncertainty in the
endowments leads to inflation.
Solved Example
or
Solution: Game playing - Game playing is one area in which substantial progress has been made in
scaling up toy problems. An IBM program named DEEP BLUE beat the reigning world chess
champion, Garry Kasparov, by 3.5 to 2.5 in a six game match. Championship performance has been
achieved by sophisticated search algorithms, high-speed computers and chess-specific hardware.
• In the minimax game playing strategy, one can use a recursive procedure that needs to return not one but two results -
i) The backed-up value of the path it chooses.
ii) The path itself.
• Assuming that MINIMAX returns a structure containing both results, we have two functions VALUE
and PATH, that extract the separate components.
• Initially the MINIMAX procedure is called by specifying three parameters, as MINIMAX (Position, Depth, Player), where Position is the current board position, Depth is the current depth of the recursion and Player is the player to move.
• So the initial call to compute the best move from the position CURRENT should be -
MINIMAX (CURRENT, 0, PLAYER-ONE) if player-one is to move, or MINIMAX (CURRENT, 0, PLAYER-TWO) if player-two is to move.
• A critical issue in the design of the MINIMAX procedure is when to stop the recursion and simply call the static evaluation function.
• There are a variety of factors that may influence this decision, such as the depth to which the search has already proceeded and the amount of time remaining.
• Thus a function DEEP-ENOUGH is assumed to evaluate all of these factors and to return TRUE if
the search should be stopped at the current level and FALSE otherwise.
• One simple implementation of DEEP-ENOUGH takes two parameters, Position and Depth. It ignores its Position parameter and simply returns TRUE if its Depth parameter exceeds a constant cut-off value.
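A Python sketch of the procedure described above is given below: MINIMAX returns both a backed-up value and the path achieving it, and cuts off via DEEP-ENOUGH. The helpers static_eval, move_gen and opponent are assumptions standing in for a concrete game's evaluation function and move generator.

CUTOFF_DEPTH = 3

def deep_enough(position, depth):
    # Ignore the position and stop once the depth limit is exceeded.
    return depth >= CUTOFF_DEPTH

def minimax(position, depth, player, static_eval, move_gen, opponent):
    if deep_enough(position, depth):
        return static_eval(position, player), []          # (VALUE, PATH)
    successors = move_gen(position, player)
    if not successors:                                    # no legal moves: treat as a leaf
        return static_eval(position, player), []
    best_value, best_path = None, None
    for move, succ in successors:
        value, path = minimax(succ, depth + 1, opponent(player),
                              static_eval, move_gen, opponent)
        value = -value                                    # the opponent's gain is our loss
        if best_value is None or value > best_value:
            best_value, best_path = value, [move] + path
    return best_value, best_path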
Review Questions
1. Explain minimax procedure. Is this procedure a breadth first or depth first search? (Refer section
5.11)
2. Explain various strategies of game playing? (Refer sections 5.9 and 5.10)
3. Explain the minimax strategy of game playing techniques with example. (Refer section 5.11)
Ans. : In a game of chance we can add an extra level of chance nodes in the game search tree. These nodes have successors which are the outcomes of the random element.
The minimax algorithm uses the probability P(di) attached to each chance node di and backs up values based on these probabilities. The successor function S(N, di) gives the moves from position N for outcome di.
May 2003
Q.1 Explain perfect decisions in game playing? Give example. (Refer sections 5.1 and 5.2) [16]
Dec. 2003
Q.2 Explain minimax algorithms and how it works for game of tic-tac-toe. (Refer section 5.11 )[8]
Dec. 2004
Q.3 Explain minimax search procedure with example upto 3rd ply. (Refer section 5.11 ) [8]
Q.4 Describe alpha-beta pruning using example. Show game tree upto
May 2009
Q.5 Explain minimax procedure for game playing. Is this DFS or BFS? How can it be modified to be used by a program playing a three- or four-player game? (Refer section 5.11) [8]
May 2010
Q.7 Describe minimax procedure and alpha-beta pruning. (Refer sections 5.11 and 5.12) [16]
Dec. 2010
Q.8 Explain alpha-beta pruning algorithm with example. (Refer section 5.12)
[10]
May 2017
May 2019
Q.11 Explain the Min Max game playing algorithm with an example. (Refer section 5.11) [6]