
UNIT I

Chapter 5: Adversarial Search


Adversarial Search

In an adversarial search problem the environment is multiagent and competitive: the agents' goals are in conflict. Adversarial search problems are commonly known as games.

Mathematical game theory, a branch of economics, views any multiagent environment as a game, provided that the impact of each agent on the others is "significant", regardless of whether the agents are cooperative or competitive.

What is a Game?

AU: May-03

- The term game means a sort of conflict in which n individuals or groups (known as players)
participate.

- Game theory denotes games of strategy.

- John Von Neumann is acknowledged as the father of game theory. Neumann defined game theory in 1928 and established the mathematical framework for all subsequent theoretical developments.

- Game theory allows decision-makers (players) to cope with other decision-makers (players) who have different purposes in mind. In other words, players determine their own strategies in terms of the strategies and goals of their opponents.

- Games are an integral attribute of human beings. Games engage the intellectual faculties of humans.

- If computers are to mimic people, they should be able to play games.

- Game playing has a close relation to intelligence, and it has well-defined states and rules.

Applications of Game Theory

AU May-03

Applications of game theory are wide-ranging. Von Neumann and Morgenstern indicated the utility of game theory by linking it with economic behavior.

1. Economic models

Game theory models markets of various commodities with differing numbers of buyers and sellers, fluctuating values of supply and demand, and seasonal and cyclical variations, supporting the analysis of conflicts of interest in maximizing profits and promoting the widest distribution of goods and services.

2. Social sciences

The n-person game theory has interesting uses in studying the distribution of power in legislative procedures, problems of majority rule, and individual and group decision making.

3. Epidemiologists

Epidemiologists make use of game theory with respect to immunization procedures and methods of testing a vaccine or other medication.

4. Military strategists

Military strategists turn to game theory to study conflicts of interest resolved through "battles", where the outcome or payoff of a war game is either victory or defeat.

Definition of Game

1. A game has at least two players. Solitaire is not considered a game by game theory.

The term "solitaire" is used for single-player games of concentration.

2. An instance of a game begins with a player choosing from a set of alternatives specified by the game's rules. This choice is called a move.

3. After the first move, the new situation determines which player makes the next move and the alternatives available to that player.

i) In many board games, the next move is made by the other player.

ii) In many multi-player card games, the player making the next move depends on who dealt, who took the last trick, who won the last hand, etc.

4. The moves made by a player may or may not be known to the other players. Games in which all moves of all players are known to everyone are called games of perfect information. For example,

i) Most board games are games of perfect information.

ii) Most card games are not games of perfect information.

5. Every instance of the game must end.

6. When an instance of a game ends, each player receives a payoff. A payoff is a value associated with each player's final situation. A zero-sum game is one in which the elements of the payoff matrix sum to zero. In a typical zero-sum game:

i) Win = 1 point,

ii) Draw = 0 points,

iii) Loss = -1 point.

Game Theory

Game theory does not prescribe a way to play a game. Rather, it is a set of ideas and techniques for analyzing conflict situations between two or more parties, whose outcomes are determined by their decisions.

General Game Theorem

In every two-player, zero-sum, non-random, perfect-knowledge game, there exists a perfect strategy guaranteed to at least result in a tie.

Frequently used terms in game theory:

- The term "game" means a sort of conflict in which n individuals or groups (known as players)
participate.

- A list of "rules" stipulates the conditions under which the game begins.
- A game is said to have "perfect information" if all moves are known to each of the players involved.

- A "strategy" is a list of the optimal choices for each player at every stage of a given game.

- A "move" is the way in which game progresses from one stage to another, beginning with an initial
state of the game to the final state.

- The total number of moves constitute the entirety of the game.

- The payoff or outcome, referes to what happens at the end of a game.

- Minimax: The least good of all good outcomes.

- Maximin: The least bad of all bad outcomes.

• The important and basic theorem of game theory is the minimax theorem, which says:

"If a minimax of one player corresponds to a maximin of the other player, then that outcome is the best that both players can hope for."

Role of Game Playing in AI

There were two reasons why games appeared to be a good domain in which to explore machine intelligence:

i) They provide a structured task in which it is very easy to measure success or failure.

ii) They did not obviously require large amounts of knowledge. They were thought to be solvable by
straightforward search from the starting state to a winning position.

The first of these reasons remains valid and accounts for the continued interest in the area of game playing by machine. Unfortunately, the second is not true for any but the simplest games. For example, consider chess: a computer playing it faces a combinatorial explosion of positions, for the following reasons.

• The average branching factor is around 35.

• In an average game, each player might make 50 moves.

• So in order to examine the complete game tree, one would need to examine 35^100 positions.

In addition to the above two reasons, there are some more reasons why game playing occupies a pivotal role in AI:

i) The rules of games are very limited. Hence, extensive amounts of domain-specific knowledge are
seldom needed.

ii) Many human experts exist to assist in developing the programs. Hence, the problem of a shortage of human experts does not arise.

iii) For the human expert, it is easy to explain the rationale for a move, unlike in other domains.

iv) Games depict real-life situations in a constricted fashion. The logical reasoning ability of a human, in normal conditions and under stress, is clearly exhibited in game playing. Moreover, game playing permits one to simulate real-life situations.

Relevance of Game Theory and Game Playing


How relevant game theory is to mathematics, computer science and economics is shown in the figure below:

Game Playing

Characteristics of Game Playing: -

1. There is always an "unpredictable" opponent

- Opponent introduces uncertainty.

- Opponent also wants to win.

2. Time limit

Games are always played in a time-constrained environment. Therefore a program needs to handle time efficiently.

Types of Games

1. Based on chance

i) Deterministic (not involving chance)

For example -

Chess, Checkers, Tic-tac-toe

ii) Non-deterministic (can involve chance)

For example

Backgammon, Monopoly.

2. Based on information

i) Perfect information -

Here all moves of all players are known to everyone.

For example -
Chess, Checkers, Tic-tac-toe.

ii) Imperfect information -

Here all moves are not known to everyone.

For example:

Bridge, Poker, Scrabble.

3. General zero-sum games

Players must choose their strategies simultaneously, neither knowing what the other player is going
to do.

For example-

If you play a single game of chess with someone, one person will lose and one person will win. The win (+1) added to the loss (-1) equals zero.

4. Constant-sum game

Here the algebraic sum of the outcomes is always constant, though not necessarily zero.

It is strategically equivalent to zero-sum games.

5. Non-zero-sum game

Here the algebraic sum of the outcomes is not constant, and the payoffs are not the same for all outcomes.

They are not always completely solvable but provide insights into important areas of inter-dependent
choice.

In these games, one player's losses do not always equal another player's gains.

Non-zero-sum games are of two types:

i) Negative sum games (Competitive) -

Here nobody really wins, rather everybody loses.

Example - A war or a strike.

ii) Positive sum games (Co-operative) -

Here all players work toward one common goal, to which they contribute together.

Example - An educational game, building blocks, or a science exhibit.

6. N-person game

It involves more than two players.

Analysis of such games is more complex than that of two-player zero-sum games.

Conflicts of interest are less obvious.

Formal Representation of a Game as a Problem


A Game is Essentially a Kind of a Search Problem!

A game is formally defined by the following four components:

1. The initial state, which includes the board position and identifies the player to move.

2. A successor function, which returns a list of (move, state) pairs, each indicating a legal move and
the resulting state.

3. A terminal test, which determines when the game is over. States where the game has ended are called terminal states.

4. A utility function (also called an objective function or payoff function), which gives a numeric value for the terminal states. In chess, the outcome is a win, loss or draw, with values +1, -1, or 0. Some games have a wider variety of possible outcomes; the payoffs in backgammon range from +192 to -192.
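As a minimal sketch of these four components (the class and method names below are illustrative assumptions, not from the text), the formal definition maps naturally onto an abstract interface in Python:

class Game:
    # 1. Initial state: the board position plus the player to move.
    def initial_state(self):
        raise NotImplementedError

    # 2. Successor function: a list of (move, resulting state) pairs.
    def successors(self, state):
        raise NotImplementedError

    # 3. Terminal test: has the game ended in this state?
    def is_terminal(self, state):
        raise NotImplementedError

    # 4. Utility function: numeric value of a terminal state,
    #    e.g. +1 (win), 0 (draw), -1 (loss) for chess.
    def utility(self, state):
        raise NotImplementedError

A concrete game (tic-tac-toe, chess, and so on) would subclass this interface and fill in the four methods.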

Game Playing Strategies

A player's strategy in a game is a complete plan of action for whatever situation might arise. It is a complete algorithm for playing the game, telling a player what to do in every possible situation throughout the game.

A pure strategy provides a complete definition of how a player will play a game. In particular, it determines the move a player will make in any situation they could face. A player's strategy set is the set of pure strategies available to that player.

A mixed strategy is an assignment of a probability to each pure strategy. This allows a player to randomly select a pure strategy. Since probabilities are continuous, there are infinitely many mixed strategies available to a player, even if their strategy set is finite.

A mixed strategy for a player is thus a probability distribution on the set of his pure strategies.
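As a small illustration (the strategy set and probabilities below are made-up, not from the text), sampling a pure strategy from a mixed strategy takes only a few lines of Python:

import random

# A mixed strategy: a probability distribution over the pure strategies.
mixed_strategy = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}

def sample_pure_strategy(mixed):
    # Randomly select one pure strategy according to the distribution.
    pure = list(mixed)
    weights = list(mixed.values())
    return random.choices(pure, weights=weights, k=1)[0]

print(sample_pure_strategy(mixed_strategy))  # e.g. "paper"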

• Games can be classified as either single-person or multi-person games.

• For example, Rubik's cube and the 8-tile puzzle are single-person games. For solving such problems, strategies like best-first search or the A* algorithm can be used. These strategies help in identifying paths in a clear fashion.

• Problems in which two persons play a game, like chess or checkers, cannot be solved by best-first search or A* algorithms, as here each player tries to outsmart the opponent, and each has their own way of evaluating the situation.

• The basic characteristic of the strategy must be look-ahead in nature, i.e., explore the tree two or more levels downward and choose the optimal move. The basic methods available for game playing are:

i) Minimax strategy

ii) Minimax strategy with alpha-beta cutoffs.

• A Two Player Strategy Table.


• An optimal strategy is a technique which always leads to a solution superior to any other strategy when the opponent is playing in a perfect manner.

• Roughly speaking, an optimal strategy leads to outcomes that are at least as good as any other
strategy when one is playing an infallible opponent.

Mini-Max Value

For a given game tree, the optimal strategy is determined by evaluating the minimax value of each node.

Mini-Max Theorem

Players adopt those strategies which will maximize their gains while minimizing their losses. The solution is therefore the best each player can do for himself or herself in the face of the opposition of the other player.

Determining Optimal Strategy

1. Given a game tree the optimal strategy can be determined by examining the minimax value of
each node.

2. The minimax value of a node is the utility (for player called MAX) of being in the corresponding
state, assuming that both players play optimally from this stage to the end of the game.

3. The minimax value of a terminal state is just its utility.

4. Given a choice, MAX will prefer to move to a state of maximum value, whereas MIN prefers a state of minimum value.

Game Tree

The initial state and the legal moves for each side define the game tree for the game.

• Description of the game tree [Refer Fig. 5.9.1]

1. Root node -

Represents the board configuration and the decision required as to the best single next move.

If it is my turn to move, then the root is labeled a MAX node, indicating it is my turn; otherwise it is labeled a MIN node, to indicate it is my opponent's turn.

2. Arcs -

Represent the possible legal moves for the player from whose nodes the arcs emanate.

3. At each level, the tree has nodes that are all MAX or all MIN.

4. Since moves alternate, the nodes at level i are of the opposite kind from those at level i + 1.

1. Above is a partial game tree for the game of tic-tac-toe.

2. The top node (root) is the initial state and MAX (player 1) moves first, placing X in an empty
square.

3. The rest of the search tree shows alternate moves for MIN (player 2) and MAX.

4. Terminal states are assigned utilities according to the rules of the game.

Major Components of Game Playing Program

There are two major components of a game playing program, viz.


i) A plausible move generator:

• For every move a player makes in the game of chess, the average branching factor is 35, i.e., the opponent can make 35 different moves.

• If one employs a simple move generator, then it might not be possible to examine all the states. Hence, it is essential that only selected moves or paths be examined.

• For this purpose one has a plausible move generator, which expands or generates only selected moves.

• It is not possible for all moves to be examined because

a) The amount of time given for a move is limited.

b) The amount of computational power available for examining various states is also limited.

ii) A static evaluation function generator:

• This is one of the most important components of a game playing program. It generates the static evaluation function value, based on heuristics, for each move that is made.

• The static evaluation function gives a snapshot of a particular move. The higher the static evaluation function value, the higher the probability of a victory.

• The static evaluation function generator plays a crucial role in game playing programs because of the following factors:

a) It utilizes heuristic knowledge to evaluate the static evaluation function value.

b) The static evaluation function generator acts like a pointer, indicating where the plausible move generator has to generate future paths.

Mini-Max Algorithm

AU: Dec.-03, 04, May-09, 10, 17, 19

The minimax algorithm computes the minimax decision from the current state. It is used as a
searching technique in game problems. The minimax algorithm performs a complete depth-first
exploration of the game-tree.

Concept of Mini-Max Algorithm

1. The start node is MAX (player 1) node with current board configuration.

2. Expand nodes down (play) to some depth of look-ahead in the game.

3. Apply evaluation function at each of the leaf nodes.

4. "Back up" values for each non-leaf nodes until computed for the root node.

5. At MIN (player 2) nodes, the backed up value is the minimum of the values associated with its
children.

6. At MAX nodes, the backed up value is the maximum of the values associated with its children.
Note: The process of "backing up" values gives the optimal strategy; that is, both players assume that the opponent is using the same static evaluation function as they are.
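The steps above condense into a short recursive routine. The following Python sketch assumes helper functions is_terminal, evaluate and successors that are not defined in the text:

def minimax(state, depth, is_max):
    # Steps 2-3: expand down to the look-ahead depth, then apply the
    # static evaluation function at the leaf nodes.
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    values = [minimax(child, depth - 1, not is_max)
              for move, child in successors(state)]
    # Steps 4-6: back up the maximum at MAX nodes, the minimum at MIN nodes.
    return max(values) if is_max else min(values)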

Properties of Mini-Max

1. Minimax provides a complete solution for a finite tree.

2. Minimax provides an optimal strategy against an optimal opponent.

3. The time complexity is O(b^m), where b is the branching factor and m is the maximum depth of the tree.

4. The space complexity is O(bm) for depth-first exploration when the algorithm generates all successors at once, or O(m) for an algorithm that generates successors one at a time.

Problem Associated with Mini-Max

This algorithm explores the whole search space. If we have a game with a huge search space, it will take a long time. In such cases an exact solution is completely infeasible.

Game Playing with Mini-Max-Tic-Tac-Toe [Noughts and crosses] Example

1. Assume that two players named MIN and MAX are playing the game.

2. MAX plays first.

3. As you can see, initially MAX has nine possible moves.

4. Play alternates between MAX and MIN until we reach leaf nodes corresponding to terminal states, such that one player has 3 in a row or all the squares are filled.

5. The number on each leaf node indicates the utility value of the terminal state from the point of view of MAX.

6. High values are assumed to be good for MAX and bad for MIN.

7. It is MAX's job to use the search tree to determine the best move.

8. Static evaluation criteria: '+1' for a win, '0' for a draw.

Example to show how Mini-Max Algorithm Works

Consider the game of tic-tac-toe with the initial state -

Step 1:

Start: MAX (player 1) moves (MAX is making X)


Step 2: Next: MIN (player 2) moves.

Step 3: Again: MAX's moves


Step 4: Apply the criteria '+1' for a win, '0' for a draw.

Step 5: Back up values level by level, on the basis of the opponent's turn. UP: one level.
Step 6: UP: two levels.

Step 7: Choose the best move, which is the maximum.

The 'made-up' Games and Concept of 'ply'


For study purposes we can form a game tree which is constructed only up to a certain level (i.e., up to a certain depth). This is called a made-up game. The term ply refers to the depth of the tree.

For example, ply 4 is the level at depth 4 below the root node.

Example

A two-ply game tree for the game of tic-tac-toe.

1. Assume that two players named MIN and MAX are playing the game.

2. MAX plays first.

3. The possible moves for MAX at the root node are labeled a1, a2 and a3.

4. The possible replies to a1 for MIN are b1, b2, b3 and so on.

5. This particular game ends after one move each by MAX and MIN.

6. In game parlance, we say that this tree is one move deep, consisting of two half-moves, each of which is called a ply.

Example

A three-ply game tree for the game of tic-tac-toe


Mini-Max Algorithm for Playing Multiplayer Games

1. There are many multiplayer games like cricket, football, etc.

2. Multiplayer games can be played using the minimax concept.

3. Here the single value for each node is replaced with a vector of values. For example, in a three-player game with players A, B and C, a vector (VA, VB, VC) is associated with each node.

4. For terminal states, this vector gives the utility of the state from each player's viewpoint. (In two-player, zero-sum games, the two-element vector can be reduced to a single value because the values are always opposite.)

5. In this implementation the UTILITY function returns a vector of utilities.

6. For non-terminal states, the vector values are calculated as explained below. Consider the following diagram.

7. Consider the node marked X in the game tree shown in the diagram above. In this state, player C chooses what to do. The two choices lead to terminal states with utility vectors (VA = 1, VB = 2, VC = 6) and (VA = 4, VB = 2, VC = 3). Since 6 is bigger than 3, C should choose the first move. This means that if state X is reached, subsequent play will lead to a terminal state with utilities (VA = 1, VB = 2, VC = 6). Hence the backed-up value of X is this vector.

8. In general, the backed-up value of a node n is the utility vector of whichever successor has the highest value for the player choosing at n (a minimal sketch follows this list).

9. Multiplayer games usually involve alliance, whether formal or informal, among the players.
Alliances are made and broken as the game proceeds.
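A minimal sketch of this vector backup (our own illustration: is_terminal, utility_vector, children and next_player are assumed helpers, and each player is identified by an index into the vector):

def backed_up_value(node, player):
    # At a terminal node, the backup is the utility vector itself.
    if is_terminal(node):
        return utility_vector(node)   # e.g. (vA, vB, vC)
    vectors = [backed_up_value(child, next_player(player))
               for child in children(node)]
    # The player choosing at this node picks the successor vector that
    # maximizes his own component.
    return max(vectors, key=lambda v: v[player])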

Alpha-Beta Pruning

AU: Dec.-04, 10, May-10, 17

Motivation for α – β Pruning

1. The problem with minimax search is that the number of game states it has to examine is exponential in the number of moves.

2. α-β pruning proposes to compute the correct minimax decision without looking at every node in the game tree.

α-β Pruning Example :

Step 1:

Step 2:

Step 3:

Steps in Alpha-Beta Pruning

1. The MAX player cuts off search when he knows the MIN player can force a provably bad outcome.

2. The MIN player cuts off search when he knows the MAX player can force a provably good (for MAX) outcome.

3. Applying an alpha cutoff means we stop searching a particular branch because we see that we already have a better opportunity elsewhere.

4. Applying a beta cutoff means we stop searching a particular branch because we see that the opponent already has a better opportunity elsewhere.

5. Applying both forms is alpha-beta pruning.

Alpha Cutoff

It may be found that, in the current branch, the opponent can achieve a state with a lower value for us than one achievable in another branch. The current branch is therefore one to which we will certainly not move the game, and search of this branch can be safely terminated.

For example

Beta-Cutoff

It is just the reverse of the alpha cutoff.

It may also be found that, in the current branch, we would be able to achieve a state which has a higher value for us than one the opponent can hold us to in another branch. The current branch can be identified as one to which the opponent will certainly not move the game, so search in this branch can be safely terminated.

For example -

Algorithm of Alpha-Beta Pruning

/*

alpha is the best score for max along the path to state.
beta is the best score for min along the path to state.

*/

If the level is the top level, let alpha = -infinity, beta = +infinity.

If depth has reached the search limit, apply the static evaluation function to state and return the result.

If player is max:

Until all of state's children are examined with ALPHA-BETA, or until alpha is equal to or greater than beta:

Call ALPHA-BETA (child, min, depth + 1, alpha, beta); note the result.

Compare the value reported with alpha; if the reported value is larger, reset alpha to the new value.

Report alpha.

If player is min:

Until all of state's children are examined with ALPHA-BETA, or until alpha is equal to or greater than beta:

Call ALPHA-BETA (child, max, depth + 1, alpha, beta); note the result.

Compare the value reported with beta; if the reported value is smaller, reset beta to the new value.

Report beta.
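A compact Python rendering of the same procedure may make the control flow clearer. This is a sketch: is_terminal, evaluate and successors are assumed helpers, not part of the original pseudocode.

import math

def alpha_beta(state, depth, alpha, beta, is_max):
    # At the search limit, apply the static evaluation function.
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if is_max:
        for move, child in successors(state):
            alpha = max(alpha, alpha_beta(child, depth - 1, alpha, beta, False))
            if alpha >= beta:   # cutoff: MIN will never allow this branch
                break
        return alpha
    else:
        for move, child in successors(state):
            beta = min(beta, alpha_beta(child, depth - 1, alpha, beta, True))
            if alpha >= beta:   # cutoff: MAX has a better option elsewhere
                break
        return beta

# Top-level call, mirroring alpha = -infinity, beta = +infinity:
# best = alpha_beta(root, search_limit, -math.inf, math.inf, True)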

Example of Alpha-Beta Pruning (Up to the 3rd Ply)

Example 1:

1. In a game tree, each node represents a board position where one of the players gets to choose a move.

2. For example, look at node C in Fig. 5.12.6, as well as at its left child.

3. We realize that if the players reach node C, the minimizer can limit the utility to 2. But the maximizer can get utility 6 by going to node B instead, so he would never let the game reach C. Therefore, we don't even have to look at C's other children.

4. Initially, at the root of the tree, there is no guarantee about what values the maximizer and minimizer can achieve. So beta is set to +∞ and alpha to -∞.

5. Then, as we move down the tree, each node starts with the beta and alpha values passed down from its parent.

6. If it's a maximizer node, then alpha is increased if a child value is greater than the current alpha value. Similarly, at a minimizer node, beta may be decreased. This procedure is shown in Fig. 5.12.6.

7. At each node, the alpha and beta values may be updated as we iterate over the node's children. At node E, when alpha is updated to a value of 8, it ends up exceeding beta.

8. This is the point where alpha-beta pruning applies: we know the minimizer would never let the game reach this node, so we don't have to look at its remaining children.

9. In fact, pruning happens exactly when alpha becomes greater than or equal to beta, that is, when the alpha and beta values cross at the node.

Example 2: Explain Alpha-beta cut-offs in game playing with example.

1) Alpha-beta cutoff is a modified version of the minimax algorithm wherein two threshold values are maintained for future expansion:

2) one representing a lower bound on the value that a maximizing node may ultimately be assigned (we call this alpha), and another representing an upper bound on the value that a minimizing node may be assigned (this we call beta).

3) To explain how alpha-beta values help, consider the tree structure in Fig. 5.12.7.

• Here, the maximizer plays first, followed by the minimizer. As is done in minimax, look-ahead search is performed.

• The maximizer assigns a value 9 at B (the minimum of the values at D and E). This value is passed back to A, so the maximizer is assured of a value of 9 or greater when it moves to B.

• Now, consider the case of node C. The value at node F is 5, and that at G is unknown.

• Since the move is a minimizing one, by moving to C, A can get a value of -2 or less. This is totally unacceptable to A, because by moving to B he is assured of a value equal to 6 or greater.

• So, by simply pruning the tree whose root is G, one saves a lot of time in searching.

4) The effects of alpha and beta can now be understood from Fig. 5.12.8. Here, a look-ahead search is done up to level 3, as shown in Fig. 5.12.8. The static evaluation function generator has assigned the values given for the leaf nodes. Since D, E, F and G are maximizers, the maxima of the leaf nodes are assigned to them. Thus D, E, F and G have values 8, 10, 5 and 11.

5) Here B and C are minimizers, so they are assigned the minimum of their successors' values, i.e., B is assigned the value 8 and C is assigned the value 5. A is a maximizer, so A will obviously opt for the value 8.

• ROLE OF ALPHA - For A, the maximizer, a value of 8 is assured by moving to node B. This value is compared with the value at C.

• A, being a maximizer, would follow a path whose value is greater than 8. Hence this value of 8, being the least that the maximizing node can obtain, is set as the value of alpha.

• This value of alpha is now used as a reference point. Any node whose value is greater than alpha is acceptable, and all nodes whose values are less than alpha are rejected. From Fig. 5.12.8 we come to know that the value at C is 5. Hence, the entire tree under C is totally rejected. That saves a lot of time and cost.
• ROLE OF BETA - Now, to see the role of beta, consider Fig. 5.12.9.

Here B is the root; B is a minimizer, and the paths for expansion are chosen from the values at the leaf nodes.

Since D and E are maximizers, the maximum values of their children are backed up as their static evaluation function values. Node B, being a minimizer, will always move to D rather than E. The value at D (8) is now used by B as a reference. This value is called beta: the maximum value a minimizer can be assigned. Any node whose value is less than this beta value (8) is acceptable, and values more than beta are rejected. The value of beta is passed to node E and compared with the static evaluation function value there.

The minimizer is benefited only by moving to D rather than E. Hence, the entire tree under node E is pruned.

Why is it Called α-β?

• α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.

• If a value γ is worse than α (as shown in the diagram), MAX will avoid it.

• Similarly, β can be defined for MIN.

Heuristic Function that can be used in Cutting Off Search

1. Evaluation Functions
1. An evaluation function returns an estimate of the expected utility of the game from a given
position, just as the heuristic functions return an estimate of the distance to the goal.

2. It should be clear that the performance of a game-playing program is dependent on the quality of its evaluation function. An inaccurate evaluation function will guide an agent toward positions that turn out to be lost.

3. Designing the evaluation function:

a) The evaluation function should order the terminal states in the same way as the true utility function; otherwise, an agent using it might select suboptimal moves even if it can see ahead all the way to the end of the game.

b) The computation must not take too long!

c) For nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

2. Imperfect and Real-time Decisions

The minimax algorithm generates the entire game search space, whereas the alpha-beta algorithm allows us to prune large parts of it. However, alpha-beta still has to search all the way to terminal states for at least a portion of the search space. This depth of search is usually not practical, because moves must be made in a reasonable amount of time, typically a few minutes at most.

To make search faster, we can use a heuristic evaluation function that can be applied to states in the search, effectively turning nonterminal nodes into terminal leaves. In other words, the suggestion is to alter minimax or alpha-beta in two ways: the utility function is replaced by a heuristic evaluation function, which gives an estimate of the position's utility, and the terminal test is replaced by a cutoff test that decides when to apply the heuristic function.
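A minimal sketch of these two substitutions (CUTOFF_DEPTH, eval_fn, is_terminal and successors are assumed names, not from the text):

CUTOFF_DEPTH = 4   # assumed fixed depth limit

def cutoff_test(state, depth):
    # Replaces the terminal test: stop at the depth limit or at a real terminal.
    return depth >= CUTOFF_DEPTH or is_terminal(state)

def h_minimax(state, depth, is_max):
    if cutoff_test(state, depth):
        return eval_fn(state)   # heuristic estimate replaces the true utility
    values = [h_minimax(child, depth + 1, not is_max)
              for move, child in successors(state)]
    return max(values) if is_max else min(values)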

3. Cutting Off Search

1. Cutting off search is a simple approach to searching faster.

2. Cutting off search is applied to limit the depth: if Cutoff-Test(s, depth) then return E(s), where E is the evaluation function.

Problems in cutting off search:

3. The cutoff test might be applied in an adverse condition.

4. It may stop the search before the allowable time.

5. Iterative deepening can be used to keep searching until the allotted time has elapsed.

6. Quiescent positions -

a) If a position looks "dynamic", don't even bother to evaluate it.

b) Instead, do a small secondary search until things calm down. For example, after capturing a piece things look good, but this would be misleading if the opponent were about to capture right back.

c) In general, such factors are called continuation heuristics.

d) A quiescent position is a position which is unlikely to exhibit wild swings (huge changes) in value in the near future.

e) Consider the following example -

Here [Refer Fig. 5.12.11], as the tree is explored further, the value passed to A is 6. Thus the situation calms down. This is called waiting for quiescence. It helps in avoiding the horizon effect of a drastic change of values.

7. The horizon effect -

A potential problem that arises in game tree search (of a fixed depth) is the horizon effect, which occurs when there is a drastic change in value immediately beyond the place where the algorithm stops searching. Consider the tree shown in Fig. 5.12.12 (a).

It has nodes A, B, C and D. At this level, since it is a maximizing ply, the value passed up to A is 5.

Suppose node B is examined one more level, as shown in Fig. 5.12.12 (b). Then we see that, because of a minimizing ply, the value at B is 2 and hence the value passed to A is 2. This results in a drastic change in the situation. There are two proposed solutions to this problem:

a) Singular extension -

This expands a move that is clearly better than all other moves. Its cost is low, since the extension has a branching factor of 1.

b) Secondary search -

One proposed solution is to examine the search space beyond the apparently best move, to see whether something is looming just over the horizon; in that case we can revert to the second-best move. Obviously the second-best move then has the same problem, and there isn't time to search beyond all possible acceptable moves.

8. Additional refinements - Other than alpha-beta pruning, there are a variety of other modifications to the minimax procedure that can also enhance its performance.

During the search, maintain two values alpha and beta such that alpha is the current lower bound on the possible returned value and beta is the current upper bound. If during the search we know for sure that alpha > beta, then there is no need to search any further in this branch: the returned value cannot lie in this branch. Backtrack until alpha ≤ beta holds again. The two values alpha and beta are called the range of the current search window. These values are dynamic. Initially, alpha is -∞ and beta is +∞.

The quiescence and secondary search techniques - If a node represents a state in the middle of an exchange of pieces, the evaluation function may not give a reliable estimate of board quality. For example, after a knight is captured, the evaluation may look good, but this is misleading if the opponent is about to capture one's queen.

The solution to this problem is that, if a node's evaluation is not "quiescent", alpha-beta search can be continued below that node, but with moves limited to those that significantly change the evaluation function (for example, capture moves or promotions). The crucial complexity point is that the branching factor for such moves is small.
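As an illustration, a quiescence extension is often written in the following style; this sketch is our own, with eval_fn, capture_moves and result as assumed helpers, and values taken from the side to move (a negamax convention):

def quiescence(state, alpha, beta):
    # "Stand pat": the static value if we stop searching here.
    stand_pat = eval_fn(state)
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    # Search only moves that change the evaluation sharply (captures,
    # promotions), so the branching factor stays small.
    for move in capture_moves(state):
        score = -quiescence(result(state, move), -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha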

For complicated games it is not feasible to select a move by simply looking up the current game configuration in a catalogue (the book move) and extracting the move. The catalogue would be huge and very complex to construct.

4. Forward Pruning

1. It is another method for searching faster.

2. In this, some moves at a given node are pruned immediately without further consideration.
Clearly, most humans playing chess only consider a few moves from each position (at least
consciously).

3. Unfortunately, the approach is rather dangerous because there is no guarantee that the best move
will not be pruned away.

4. This can be disastrous if applied near the root, because it may happen that the program will often miss some "obvious" moves.

5. Forward pruning can be used safely in special situations. For example, when two moves are
symmetric or otherwise equivalent, only one of them need be considered or for nodes that are deep
in the search tree.
Games with Chance

Games with a certain element of chance are often more interesting than those without chance, and
many games involve rolling dice, tossing a coin or something similar.

- In the real world, many situations in front of us are unpredictable. The same is observed in many games.

Example: Dice rolling, backgammon.

- Sometimes in a game only imperfect information is available.

Example: Cards, dominoes, etc.

- In games with chance, we can introduce probabilities into our search diagrams and calculate minimax solutions as in normal games.

- We add one more level in the game tree, i.e., the level of chance nodes.

- Chance nodes have as many successors as outcomes of the random element.

- Minimax with an element of chance:

1) d_i (i = 1, ..., n) - outcomes of the chance nodes.

2) P(d_i) - probability of d_i.

3) S(N, d_i) - moves from position N for outcome d_i.

4) If N is MAX: Utility(N) = Σ_{i=1}^{n} P(d_i) · max_{s ∈ S(N, d_i)} Utility(s)

5) If N is MIN: Utility(N) = Σ_{i=1}^{n} P(d_i) · min_{s ∈ S(N, d_i)} Utility(s)

- The utility is no longer computed using just the terminal values. Therefore, the actual values assigned to win, loss and draw affect the choice of moves.

- The time complexity increases (with n outcomes from the chance nodes) to O(b^d n^d).

- Alpha-beta pruning is more complicated in such game trees.

State of the Art Game Programs

While developing game-playing applications, an incremental approach can be taken. In the first stage, human plays against human, and the program simply checks for legal moves. In the second stage, human plays against the program, in which legal and good moves are made by the program; the program acts as a coach in this stage. In the third stage, program plays against program; the learning component of the program is enhanced by playing games.

In game playing, a good evaluation function is a crucial factor. An evaluation function should consider all the factors, like the number of pieces and the values associated with each square on the board. Searching, looking ahead and exploring alternatives should be tried using the evaluation function.

Game Playing Case Studies

• Checkers (Samuel, Chinook): Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!

• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation, and used undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

• Othello (Logistello): In 1997, Logistello defeated the human champion by six games to none. Human champions refuse to compete against computers, which are too good.

• Go (Goemate, Go4++): Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.

• Backgammon (Tesauro's TD-Gammon): The neural-net learning program TD-Gammon is one of the world's top 3 players.

Human against program - Incremental Addition to the "Smartness" of the program:

1. Play randomly (but legal, may involve a non-trivial amount of knowledge/computation).

2. Have a static value associated with each square on the board.

3. Have a dynamic value associated with each square on the board.

4. Have an evaluation function taking other factors into account (for example, no. of pieces).
5. Search, look ahead and explore alternatives (using the evaluation function), first looking one move ahead, then looking several moves ahead using minimax and alpha-beta.

Stochastic Games

Stochastic games represent dynamic interactions in which the environment changes in response to the players' behavior. Shapley says, "In a stochastic game the play proceeds by steps from position to position, according to transition probabilities controlled jointly by the two players."

A stochastic game is played by a set of players. In each stage of the game, the play is in a given state (or position, in Shapley's language), taken from a set of states, and every player chooses an action from a set of available actions. The collection of actions that the players choose, together with the current state, determines the stage payoff that each player receives, as well as a probability distribution according to which the new state that the players will visit is chosen.

Stochastic games extend the model of strategic-form games, which is attributable to von Neumann, to dynamic situations in which the environment changes in response to the players' choices.

Stochastic games - Complexity

The complexity of stochastic games arises from the fact that the choices made by the players have two contradictory effects. First, together with the current state, the players' actions determine the immediate payoff that each player receives. Second, the current state and the players' actions influence the choice of the new state, which determines the potential for future payoffs. In particular, when choosing his actions, each player has to balance these forces, a decision that may often be difficult. This dichotomy is also present in one-player sequential decision problems.

Stochastic game - Definition

A stochastic game is a collection of normal-form games that the agents play repeatedly. The particular game played at any time depends probabilistically on the previous game played and on the actions of the agents in that game. It is like a probabilistic finite state machine in which the states are the games and the transition labels are joint action-payoff pairs.

• A stochastic game, also called a Markov game, is defined by:

• a finite set Q of states (games);

• a set N = {1, ..., n} of agents;

• for each agent i, a finite set A_i of possible actions;

• a transition probability function P : Q × A_1 × ... × A_n × Q → [0, 1], where P(q, a_1, ..., a_n, q') is the probability of transitioning to state q' if the action profile (a_1, ..., a_n) is used in state q;

• for each agent i, a real-valued payoff (reward) function r_i : Q × A_1 × ... × A_n → R, which depends on the state and on the actions of all the players.

• Each stage game is played at a set of discrete times t.
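The tuple above maps directly onto a small container type. The following Python sketch uses illustrative field names (a real implementation would also check that the transition probabilities sum to one):

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class StochasticGame:
    states: List[str]                  # Q: the set of states (games)
    agents: List[int]                  # N = {1, ..., n}
    actions: Dict[int, List[str]]      # A_i: actions available to agent i
    transition: Callable[..., float]   # P(q, a_1, ..., a_n, q') in [0, 1]
    rewards: Dict[int, Callable[..., float]]   # r_i(q, a_1, ..., a_n)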

In the current discussion, for game-solving purposes, the following assumptions are made with respect to stochastic games:

1. The length of the game is not known (an infinite horizon is used);

2. The rewards and transition probabilities do not depend on time;

3. For solving the game, only Markov strategies are considered.

A history of length t in a stochastic game is the sequence of states that the game visited in the first t stages, as well as the actions that the players played in the first t - 1 stages. A strategy of a player is a prescription for how to play the game; that is, a function that assigns to every finite history an action to play should that history occur. A behavior strategy of a player is a function that assigns to every finite history a lottery over the set of available actions.

Earlier, a history was just a sequence of actions.

But now there are action profiles rather than individual actions, and each profile has several possible outcomes. Thus a history is a sequence h_t = (q_0, a_0, q_1, a_1, ..., a_{t-1}, q_t), where t is the number of stages. As before, the two most common methods to aggregate payoffs into an overall payoff are average reward and future discounted reward.

Stochastic games and MDPs

Note that stochastic games generalize both Markov decision processes (MDPs) and repeated games. An MDP is a stochastic game with only one player. A repeated game is a stochastic game with only one state. Examples: Iterated Prisoner's Dilemma, Roshambo, Iterated Battle of the Sexes.

Strategies for Solving Stochastic Games

• For agent i, a deterministic strategy specifies a choice of action for i at every stage of every possible history.

• A mixed strategy is a probability distribution over deterministic strategies.

• There are several restricted classes of strategies:

• As in extensive-form games, a behavioral strategy is a mixed strategy in which the mixing takes place at each history independently.

• A Markov strategy is a behavioral strategy such that, for each time t, the distribution over actions depends only on the current state; however, the distribution may be different at time t than at time t'.

• A stationary strategy is a Markov strategy in which the distribution over actions does not depend on the time t, only on the current state.

Zero-sum game

A zero-sum game is defined to have a "value" v if

(i) player 1 has a strategy (which is then said to be "optimal") which ensures that his expected overall payoff over time does not fall below v, no matter what strategy player 2 follows, and

(ii) the symmetric property holds when exchanging the roles of the two players. Shapley proved the existence of a value.

Because the parameters that define the game are independent of time, the situation that the players face today, if the play is in a certain state, is the same situation they would face tomorrow if tomorrow the play is in that state.

In particular, one expects to have optimal strategies that are stationary Markov, that is, strategies that depend only on the current state of the game. Shapley proved that such optimal strategies indeed exist, and characterized the value as the unique fixed point of a nonlinear functional operator, a two-player version of the dynamic programming principle.

The existence of stationary Markov optimal strategies implies that, to play well, a player needs to
know only the current state. In particular, the value of the game does not change if players receive
partial information on each other's actions, and/or if they forget previously visited states.

Solving stochastic games

Nash equilibrium in game theory

Nash equilibrium is a concept within game theory whereby the optimal outcome of a game is one in which no player has an incentive to deviate from their initial strategy. More specifically, the Nash equilibrium is a concept of game theory where the optimal outcome of a game is one where no player has an incentive to deviate from his chosen strategy after considering an opponent's choice. Overall, an individual can receive no incremental benefit from changing actions, assuming the other players remain constant in their strategies. A game may have multiple Nash equilibria or none at all.

The Nash equilibrium is the solution to a game in which two or more players have a strategy, and
with each participant considering an opponent's choice, he has no incentive, nothing to gain, by
switching his strategy. In the Nash equilibrium, each player's strategy is optimal when considering the
decisions of other players. Every player wins because everyone gets the outcome they desire. To
quickly test if the Nash equilibrium exists, reveal each player's strategy to the other players. If no one
changes his strategy, then the Nash equilibrium is proven.

For example, imagine a game between Anil and Sunil. In this simple game, both players can choose strategy A, to receive 1, or strategy B, to lose 1. Logically, both players choose strategy A and receive a payoff of 1. If one revealed Sunil's strategy to Anil and vice versa, one would see that no player deviates from the original choice. Knowing the other player's move means little and doesn't change either player's behavior. The outcome (A, A) represents a Nash equilibrium.

Prisoner's Dilemma

The prisoner's dilemma is a common situation analyzed in game theory that can employ the Nash
equilibrium. In this game, two criminals are arrested and each is held in solitary confinement with no
means of communicating with the other. The prosecutors do not have the evidence to convict the
pair, so they offer each prisoner the opportunity to either betray the other by testifying that the
other committed the crime or cooperate by remaining silent. If both prisoners betray each other,
each serves five years in prison. If A betrays B but B remains silent, prisoner A is set free and prisoner
B serves 10 years in prison, or vice versa. If each remains silent, then each serves just one year in
prison. The Nash equilibrium in this example is for both players to betray each other. Even though
mutual cooperation leads to a better outcome, if one prisoner chooses mutual cooperation and the
other does not, one prisoner's outcome is worse.
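The dilemma can be checked mechanically. In the sketch below (our own encoding, with payoffs written as negative years in prison so that larger is better), the only strategy profile that survives the unilateral-deviation test is mutual betrayal:

import itertools

payoff = {
    ("betray", "betray"): (-5, -5),
    ("betray", "silent"): (0, -10),
    ("silent", "betray"): (-10, 0),
    ("silent", "silent"): (-1, -1),
}

def is_nash(a, b):
    # No player can gain by unilaterally switching his strategy.
    pa, pb = payoff[(a, b)]
    a_best = all(payoff[(x, b)][0] <= pa for x in ("betray", "silent"))
    b_best = all(payoff[(a, y)][1] <= pb for y in ("betray", "silent"))
    return a_best and b_best

for a, b in itertools.product(("betray", "silent"), repeat=2):
    if is_nash(a, b):
        print(a, b)   # prints: betray betray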
Irreducible stochastic game

A stochastic game is said to be irreducible if every game (state) can be reached with positive probability, regardless of the strategy adopted.

Theorem: Every 2-player, general-sum, average reward, irreducible stochastic game has a Nash
equilibrium. A payoff profile is feasible if it is a convex combination of the outcomes in a game,
where the coefficients are rational numbers.

There's a folk theorem similar to the one for repeated games:

If (p1, p2) is a feasible pair of payoffs such that each pi is at least as big as agent i's minimax value,
then (p1, p2) can be achieved in equilibrium through the use of enforcement.

Backgammon - an Example Two-Player Zero-Sum Stochastic Game

• For two-player zero-sum stochastic games, the folk theorem still applies, but it becomes vacuous
(empty).

• The situation is similar to what happens in repeated games: the only feasible pair of payoffs is the pair of minimax payoffs.

• One example of a two-player zero-sum stochastic game is Backgammon.

• Two agents take turns. Before his/her move, an agent must roll the dice.

• The set of available moves depends on the results of the dice roll.

• Mapping backgammon onto a Markov game is straightforward, but slightly awkward. The basic idea is to give each move a stochastic outcome, by combining it with the dice roll that comes after it.

• Every state is a pair:

(current board, current dice configuration)

• Initial set of states = {initial board} × {all possible results of agent 1's first dice roll}

• Set of possible states after agent 1's move = {the board produced by agent 1's move} × {all possible results of agent 2's dice roll}

• And vice versa for agent 2's move. One can extend the minimax algorithm to deal with this.

• But it is easier if one doesn't try to combine the moves and the dice rolls; these two events are kept separate.

Using the expectiminimax algorithm for solving stochastic games

• This algorithm is used to solve two-player zero-sum games in which each agent's move has a deterministic outcome.

• In addition to the two agents' moves, there are chance moves.

• The algorithm gives optimal play (highest expected utility). The algorithm is as follows.

function EXPECTIMINIMAX(s) returns an expected utility

if s is a terminal state then return MAX's payoff at s

if s is a "chance" node then return Σ_{s'} P(s' | s) · EXPECTIMINIMAX(s')

else if it is MAX's move at s then return max{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}

else return min{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}
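A direct Python transcription of this pseudocode is given below as a sketch; terminal, payoff, chance_node, outcomes, applicable and result are assumed helper functions:

def expectiminimax(s, max_to_move):
    if terminal(s):
        return payoff(s)   # MAX's payoff at s
    if chance_node(s):
        # Expected value over the outcomes s2 of the chance move,
        # weighted by their probabilities p.
        return sum(p * expectiminimax(s2, max_to_move)
                   for p, s2 in outcomes(s))
    values = [expectiminimax(result(a, s), not max_to_move)
              for a in applicable(s)]
    return max(values) if max_to_move else min(values)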

• While finding a solution for a two-player stochastic game, the following points are to be noted.

1. Dice rolls increase the branching factor: there are 21 possible rolls with 2 dice.

Given the dice roll, there are about 20 legal moves on average (for some dice rolls it can be much higher), so:

depth 4: 20 × (21 × 20)^3 ≈ 1.2 × 10^9

2. As depth increases, the probability of reaching a given node shrinks and the value of lookahead is diminished. In such a situation, α-β pruning is less effective.

3. There is an algorithm, TD-Gammon, that uses depth-2 search with a very good evaluation function. It can achieve world-champion-level play.

4. The evaluation function was created automatically using a machine learning technique called temporal-difference learning; hence the 'TD' in the name 'TD-Gammon'.

Discounted Stochastic Games

Shapley's model is equivalent to one in which players discount their future payoffs according to a
discount factor that depends on the current state and on the players' actions. The game is called
"discounted" if all stopping probabilities equal the same constant, and one minus this constant is
called the "discount factor." Models of discounted stochastic games are prevalent in economics,
where the discount factor has a clear economic interpretation.

In non-zero-sum games, a collection of strategies, one for each player, is a "(Nash) equilibrium" if no
player can profit by deviating from his strategy, assuming all other players follow their prescribed
strategies.

Stochastic games - Applications

Stochastic games give a model for a large variety of dynamic interactions and are therefore useful in modeling real-life situations that arise in, e.g., economics, political science, and operations research. For the analysis of a game to provide decisive predictions and recommendations, the data that define it must have special features. Because applications are usually motivated by the search for clear-cut conclusions, only highly structured models of stochastic games have been studied.

The significance of stochastic games is threefold. First, by modeling a dynamic situation as a stochastic game, researchers must understand the structure of the problem they face. Second, to simplify the model, they have to realize which aspects of the model do not affect the outcome and can be dispensed with. Third, the qualitative predictions of the model sometimes provide useful conclusions. We provide here two applications of stochastic games.
1. One area that has been extensively studied as a stochastic game is the over-exploitation of a common resource. For example, the scientists Lloyd, Levhari and Mirman studied a fishery war between two countries. The state variable is the quantity of fish in a given area, which grows exponentially in the absence of unnatural intervention. Each of the two countries has to determine the quantity of fish it allows its fishermen to catch, so as to maximize its long-run utility. The authors concluded that, in equilibrium, the fish population will be smaller than the population that would have resulted if the two countries had cooperated and maximized their joint utility. The phenomenon of over-exploitation of a common resource is known in economics as the "tragedy of the commons".

2. Another application of stochastic games is that of market games with money. The study of the origin of inflation in a market game with a continuum of agents and a central bank has been carried out. At every stage, each player receives a random endowment of a perishable commodity, decides how much to lend to or borrow from the central bank, and consumes the amount that he has after this transaction. The conclusion of the analysis is that the mere presence of uncertainty in the endowments leads to inflation.

Solved Example

Example 5.1 Explain game playing with examples.

or

Explain the following terms in game playing with examples.

i) Minimax (Position, Depth, Player)

ii) Deep enough (Position, Depth)

Solution: Game playing - Game playing is one area in which substantial progress has been made in scaling up toy problems. An IBM program named DEEP BLUE beat the reigning world chess champion, Garry Kasparov, by 3.5 to 2.5 in a six-game match. Championship performance has been achieved through sophisticated search algorithms, high-speed computers and chess-specific hardware.

i) Minimax (Position, Depth, Player)

• In the minimax game-playing strategy, one can use a recursive procedure that needs to return not one but two results:

a) The backed up value of the path it chooses

b) The path itself.

• Assuming that MINIMAX returns a structure containing both results, we have two functions, VALUE and PATH, that extract the separate components.

• Initially the MINIMAX procedure is called by specifying three parameters: MINIMAX (Position, Depth, Player), where

• Position indicates a board position.

• Depth indicates the current depth of search.

• Player indicates the player to move.

• So the initial call to compute the best move from the position CURRENT should be

MINIMAX (CURRENT, 0, PLAYER-ONE) if player one is to move, or

MINIMAX (CURRENT, 0, PLAYER-TWO) if player two is to move.

ii) Deep Enough (Position, Depth)

• A critical issue in the design of the MINIMAX procedure is when to stop the recursion and simply call the static evaluation function.

• There are a variety of factors that may influence this decision. They include –

• Has one side won?

• How many ply have we already explored?

• How promising is this path?

• How much time is left?

• How stable is the configuration?

• Thus a function DEEP-ENOUGH is assumed to evaluate all of these factors and to return TRUE if the search should be stopped at the current level, and FALSE otherwise.

• One simple implementation of DEEP-ENOUGH takes two parameters, Position and Depth. It ignores its Position parameter and simply returns TRUE if its Depth parameter exceeds a constant cutoff value.

Review Questions

1. Explain minimax procedure. Is this procedure a breadth first or depth first search? (Refer section
5.11)

2. Explain various strategies of game playing? (Refer sections 5.9 and 5.10)

3. Explain the minimax strategy of game playing techniques with example. (Refer section 5.11)

Two Marks Question with Answer


AU: Dec.-10

Q.1 How can minimax also be extended for game of chance?

Ans.: In a game of chance we can add an extra level of chance nodes to the game search tree. These nodes have successors which are the outcomes of the random element.

The minimax algorithm uses the probability P(d_i) attached to a chance node d_i. The successor function S(N, d_i) gives the moves from position N for outcome d_i.

University Questions with Answers

May 2003

Q.1 Explain perfect decisions in game playing? Give example. (Refer sections 5.1 and 5.2) [16]

Dec. 2003
Q.2 Explain minimax algorithms and how it works for game of tic-tac-toe. (Refer section 5.11 )[8]

Dec. 2004

Q.3 Explain minimax search procedure with example upto 3rd ply. (Refer section 5.11 ) [8]

Q.4 Describe alpha-beta pruning using example. Show game tree upto

3rdply.(Refer sections 5.11 and 5.12)[16]

May 2009

Q.5 Explain minimax procedure for game playing. Is this DFS or BFS? How can it be modified to be used by a program playing a three- or four-player game? (Refer section 5.11) [8]

Q.6 How can the minimax procedure be modified to play a multiplayer game?

(Refer section 5.11) [8]

May 2010

Q.7 Describe minimax procedure and alpha-beta pruning. (Refer sections 5.11 and 5.12) [16]

Dec. 2010

Q.8 Explain alpha-beta pruning algorithm with example. (Refer section 5.12)

[10]

May 2017

Q.9 Explain Minimax algorithm in detail. (Refer section 5.11)[16]

Q.10 Explain Alpha-Beta pruning and Alpha-Beta algorithm.

(Refer section 5.12)[16]

May 2019

Q.11 Explain the Min Max game playing algorithm with an example. (Refer section 5.11) [6]
