Adversarial search techniques: Game playing
CCS 3101
ARTIFICIAL INTELLIGENCE
Games
Multi-agent environments: any given agent must consider the actions of
other agents and how those actions affect its own welfare.
The unpredictability of these other agents can introduce many possible
contingencies.
Environments may be competitive or cooperative.
Competitive environments, in which the agents' goals are in conflict, require
adversarial search – these problems are known as games
Why study games
Fun
Clear criteria for success
Interesting, hard problems which require minimal “initial structure”
Games often define very large search spaces
chess: ~10^120 nodes
Historical reasons
Offer an opportunity to study problems involving {hostile, adversarial,
competing} agents.
Why do we use AI to play games
Games are an intelligent activity.
They provide a structured task in which it is very easy to measure success or
failure.
They do not require large amounts of knowledge.
They were thought to be solvable by straightforward search from the starting
state to a winning position
What kind of games?
Abstraction: To describe a game we must capture every relevant aspect of the game.
Such as:
Chess
Tic-tac-toe
…
Accessible environments: Such games are characterized by perfect information
Search: game-playing then consists of a search through possible game positions
Unpredictable opponent: introduces uncertainty; thus game-playing must deal
with contingency problems
Types of Games
Perfect information: A game in which agents can see the complete board. Agents
have all the information about the game and can see each other's moves. Examples
are Chess, Checkers, Go, etc.
Imperfect information: A game in which agents do not have all the information about the
game and are not fully aware of what is going on, such as Battleship, blind tic-tac-toe, Bridge, etc.
Deterministic games: Games that follow a strict set of rules, with no randomness
involved. Examples are Chess, Checkers, Go, tic-tac-toe, etc.
Non-deterministic games: Games which have various unpredictable events with a factor
of chance or luck. This factor of chance or luck is introduced by either dice or cards. These are
random, and each action's outcome is not fixed. Such games are also called stochastic games.
Examples: Backgammon, Monopoly, Poker, etc.
Types of Games
AI games are a specialized kind: deterministic, turn-taking, two-player, zero-sum games of
perfect information
A zero-sum game is a mathematical representation of a situation in which a
participant's gain (or loss) of utility is exactly balanced by the losses (or gains) of
utility of the other participant(s)
Typical assumptions
Two agents whose actions alternate
Utility values for each agent are the opposite of the other
This creates the adversarial situation
Fully observable environments
In game theory terms:
“Deterministic, turn-taking, zero-sum games of perfect information”
Game Trees
A game tree is a tree whose nodes are game states and whose edges are moves by the
players.
Root node represents the “board” configuration at which a decision must be made as
to what is the best single move to make next. (not necessarily the initial
configuration)
Evaluator function rates a board position. f(board) (a real number).
Arcs represent the possible legal moves for a player (no costs associated with arcs).
Terminal nodes represent end-game configurations (the result must be one of “win”,
“lose”, and “draw”, possibly with numerical payoff)
Game Trees
If it is my turn to move, then the root is labeled a "MAX" node; otherwise it is labeled
a "MIN" node indicating my opponent's turn.
Each level of the tree has nodes that are all MAX or all MIN; nodes at level i are of the
opposite kind from those at level i+1
Complete game tree: includes all configurations that can be generated from the root
by legal moves (all leaves are terminal nodes)
Incomplete game tree: includes all configurations that can be generated from the
root by legal moves to a given depth (looking ahead to given steps)
Deterministic Single-Player
Deterministic, single player, perfect information:
Know the rules
Know what actions do
Know when you win
E.g. Freecell, 8-Puzzle… it’s just search!
Slight reinterpretation:
Each node stores a value: the best outcome it can reach
This is the maximal outcome of its children (the max value)
After search, can pick move that leads to best node
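A minimal Python sketch of this reinterpretation for a generic deterministic single-player game; the `successors`, `is_terminal`, and `value` hooks are illustrative placeholders that a specific puzzle would supply, not from any particular library:

```python
def best_value(state, successors, is_terminal, value):
    """Best outcome reachable from `state` in a single-player game.

    `successors(state)` yields child states, `is_terminal(state)` tests
    for a leaf, and `value(state)` scores a leaf; all three are
    hypothetical hooks a specific puzzle (Freecell, 8-Puzzle) would supply.
    """
    if is_terminal(state):
        return value(state)            # outcome stored at a leaf
    # A node's value is the maximal outcome among its children.
    return max(best_value(s, successors, is_terminal, value)
               for s in successors(state))
```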
Deterministic Two-Player
E.g. tic-tac-toe, chess, checkers
Zero-sum games
One player maximizes result
The other minimizes result
Minimax search
A state-space search tree
Players alternate
Each layer of the tree is a ply: one move by one player
Choose move to position with highest minimax value = best
achievable utility against best play
(Partial) game tree for Tic-Tac-Toe
• f(n) = +1 if the position is a win for X.
• f(n) = -1 if the position is a win for O.
• f(n) = 0 if the position is a draw.
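A minimal sketch of this payoff function in Python, assuming the board is represented as a flat 9-tuple of 'X', 'O', or None (the representation is an assumption, not from the slides):

```python
# The eight winning lines of a 3x3 board stored as a flat 9-tuple.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def f(board):
    """+1 if the position is a win for X, -1 if a win for O, else 0."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return +1 if board[a] == 'X' else -1
    return 0  # draw (or game still in progress)
```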
How do we search this tree to find the optimal
move?
Search versus Games
Search – no adversary
Solution is (heuristic) method for finding goal
Heuristics techniques can find optimal solution
Evaluation function: estimate of cost from start to goal through given node
Examples: path planning, scheduling activities
Games – adversary
Solution is strategy
the strategy specifies a move for every possible opponent reply.
Time limits force an approximate solution
Evaluation function: evaluate “goodness” of game position
Examples: chess, checkers, Othello, backgammon
Games as Search
Two players: MAX and MIN
MAX moves first and they take turns until the game is over
Winner gets reward, loser gets penalty.
Formal definition as a search problem:
Initial state: It specifies how the game is set up at the start.
Player(s): Defines which player has the move in a state.
Actions(s): Returns the set of legal moves in a state.
Result(s, a): Transition model defines the result of a move.
Terminal-Test(s): Is the game finished? True if finished, false otherwise.
Utility(s, p): Gives the numerical value of terminal state s for player p.
E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe.
MAX uses search tree to determine next move.
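This formal definition maps directly onto a small programming interface. A minimal Python sketch; the class and method names are illustrative, not from a particular library:

```python
class Game:
    """Abstract two-player game, mirroring the formal definition above."""

    def initial_state(self):
        raise NotImplementedError   # how the game is set up at the start

    def player(self, s):
        raise NotImplementedError   # which player has the move in state s

    def actions(self, s):
        raise NotImplementedError   # the set of legal moves in state s

    def result(self, s, a):
        raise NotImplementedError   # transition model: state after move a

    def terminal_test(self, s):
        raise NotImplementedError   # True iff the game is finished

    def utility(self, s, p):
        raise NotImplementedError   # value of terminal state s for player p
```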
An optimal procedure: The Min-Max method
Designed to find the optimal strategy for Max and find best move:
1. Generate the whole game tree, down to the leaves.
2. Apply utility (payoff) function to each leaf.
3. Back-up values from leaves through branch nodes:
A Max node computes the Max of its child values
A Min node computes the Min of its child values
4. At root: choose the move leading to the child of highest value.
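A minimal Python sketch of these four steps, written against the hypothetical `Game` interface sketched earlier (a depth-first formulation that generates the tree implicitly rather than materializing it):

```python
def minimax_decision(game, state):
    """Choose the move leading to the child of highest backed-up value."""
    player = game.player(state)    # the MAX player at the root
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), player))

def max_value(game, state, player):
    if game.terminal_test(state):              # step 2: utility at a leaf
        return game.utility(state, player)
    # step 3: a MAX node backs up the max of its children's values
    return max(min_value(game, game.result(state, a), player)
               for a in game.actions(state))

def min_value(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    # step 3: a MIN node backs up the min of its children's values
    return min(max_value(game, game.result(state, a), player)
               for a in game.actions(state))
```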
Adversarial Search for the minimax procedure
It aims to find the optimal strategy for MAX to win the game.
It follows the approach of Depth-first search.
In the game tree, the optimal leaf node could appear at any depth of the tree.
The algorithm proceeds down to the terminal nodes, then propagates the minimax values back up the tree.
In a given game tree, the optimal strategy can be determined from the minimax
value of each node, which can be written as MINIMAX(n).
MAX prefers to move to a state of maximum value and MIN prefers to move to a
state of minimum value.
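Written out, MINIMAX(n) has the standard recursive definition, using the Utility, Player, Actions, and Result functions from the formal definition above:

$$
\mathrm{MINIMAX}(s)=
\begin{cases}
\mathrm{Utility}(s,\mathrm{MAX}) & \text{if } \mathrm{Terminal\text{-}Test}(s)\\
\max_{a \in \mathrm{Actions}(s)} \mathrm{MINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s)=\mathrm{MAX}\\
\min_{a \in \mathrm{Actions}(s)} \mathrm{MINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s)=\mathrm{MIN}
\end{cases}
$$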
Two-Ply Game Tree
• The minimax algorithm uses recursion
to search through the game tree.
• It performs a depth-first search to
explore the complete game tree.
• It proceeds all the way down to the
terminal nodes of the tree, then
backtracks up the tree as the
recursion unwinds.
Two-Ply Game Tree
Minimax maximizes the worst-case outcome for MAX.
The move chosen at the root is the minimax decision.
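As a concrete check, consider a hypothetical two-ply tree (not necessarily the one in the figure) whose three MIN nodes sit above leaf utilities (3, 12, 8), (2, 4, 6), and (14, 5, 2):

```python
# A hypothetical two-ply tree: the MAX root has three MIN children,
# each above three leaf utilities.
tree = [(3, 12, 8), (2, 4, 6), (14, 5, 2)]

backed_up = [min(leaves) for leaves in tree]  # MIN ply backs up [3, 2, 2]
root_value = max(backed_up)                   # MAX ply backs up 3
best_move = backed_up.index(root_value)       # the minimax decision: move 0
```

MAX's best guaranteed utility is 3, achieved by the first move, even though the tree contains leaves as large as 14.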
Properties of minimax
Complete?
Yes (if tree is finite). It will definitely find a solution (if one exists).
Optimal?
Yes (against an optimal opponent).
Can it be beaten by an opponent playing sub-optimally?
No.
Time complexity?
O(b^m), where b is the branching factor of the game tree, and m is the maximum depth
of the tree.
Space complexity?
O(bm) (depth-first search, generate all actions at once)
O(m) (backtracking search, generate actions one at a time)
Alpha-beta(α-β) pruning
We can improve on the performance of the minimax algorithm through alpha-beta
pruning.
Basic idea: “If you have an idea that is surely bad, don't take the time to see how truly awful it is.” --
Pat Winston
[Figure: the root MAX node is known to be ≥ 2 after its left MIN child evaluates to 2 (leaves 2 and 7); its right MIN child is ≤ 1 after seeing the leaf 1.]
• We don't need to compute the value of the remaining leaf (?).
• No matter what it is, it can't affect the value of the root node, so the branch is pruned.
Alpha-beta (α-β) pruning
Traverse the search tree in depth-first order; only consider nodes along a single path at any
time.
At each MAX node n, α(n) = maximum value found so far.
Starts at −∞ and only increases.
Increases if a child of n returns a value greater than the current α.
At each MIN node n, β(n) = minimum value found so far.
Starts at +∞ and only decreases.
Decreases if a child of n returns a value less than the current β.
Update the values of α and β during the search, and prune the remaining branches as soon as
the value is known to be worse than the current α or β value for MAX or MIN.
Alpha-beta(α-β) pruning
Condition for alpha-beta pruning:
The main condition required for alpha-beta pruning is: α ≥ β.
Key points about alpha-beta pruning:
The MAX player will only update the value of α.
The MIN player will only update the value of β.
While backtracking the tree, node values are passed up to parent nodes, not the
values of α and β.
α and β values are only passed down to child nodes.
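Putting these rules together, a minimal alpha-beta sketch in Python, again written against the hypothetical `Game` interface used earlier:

```python
import math

def alphabeta_decision(game, state):
    """Same result as minimax, but prunes branches that cannot matter."""
    player = game.player(state)
    best_a, best_v = None, -math.inf
    for a in game.actions(state):
        v = ab_min(game, game.result(state, a), player, best_v, math.inf)
        if v > best_v:
            best_a, best_v = a, v
    return best_a

def ab_max(game, state, player, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, player)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, ab_min(game, game.result(state, a), player, alpha, beta))
        alpha = max(alpha, v)        # MAX only updates alpha
        if alpha >= beta:            # the pruning condition
            break                    # remaining children cannot matter
    return v

def ab_min(game, state, player, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, player)
    v = math.inf
    for a in game.actions(state):
        v = min(v, ab_max(game, game.result(state, a), player, alpha, beta))
        beta = min(beta, v)          # MIN only updates beta
        if alpha >= beta:
            break
    return v
```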
Alpha-Beta (α-β) Example
Step 1:
The MAX player makes the first move
from node A, where
α = −∞ and β = +∞.
These values of α and β are
passed down to node B, where again
α = −∞ and β = +∞,
and node B passes the same values to
its child D.
Alpha-Beta (α-β) Example
Step 2:
At node D, the value of α is
calculated, as it is MAX's turn.
The value of α is compared
first with 2 and then with 3;
max(2, 3) = 3 becomes the
value of α at node D, and the
node value is 3.
Alpha-Beta (α-β) Example
Step 3:
The algorithm now backtracks to
node B, where the value of β
changes, as it is MIN's turn.
β = +∞ is compared with the
value of the available successor
node: min(∞, 3) = 3,
hence at node B we now have
α = −∞ and β = 3.
In the next step, the algorithm
traverses the next successor of
node B, which is node E, and the
values α = −∞ and β = 3 are
passed down as well.
Alpha-Beta (α-β) Example
Step 4:
At node E, MAX takes its turn,
and the value of α changes.
The current value of α is
compared with 5:
max(−∞, 5) = 5,
hence at node E, α = 5 and β = 3,
where α ≥ β,
so the right successor of E is
pruned; the algorithm does not
traverse it, and the value at node E
is 5.
Alpha-Beta (α-β) Example
Step 5:
Next, the algorithm again
backtracks the tree, from node B to
node A.
At node A, the value of α changes:
the maximum available
value is 3, as max(−∞, 3) = 3, and
β = +∞. These two values are now
passed to the right successor of A,
which is node C.
At node C, α = 3 and β = +∞, and
the same values are passed on to
node F.
Alpha-Beta (α-β) Example
Step 6:
At node F, the value of α is again
compared with the left child, which is
0: max(3, 0) = 3;
and then with the right child,
which is 1: max(3, 1) = 3.
α remains 3, but the node value
of F becomes 1 (the maximum of its
children, 0 and 1).
Alpha-Beta (α-β) Example
Step 7:
Node F returns the node value 1 to
node C. At C, α = 3 and β = +∞;
the value of β changes as it is
compared with 1:
min(∞, 1) = 1.
Now at C, α = 3 and β = 1, which again
satisfies the condition α ≥ β, so the
next child of C, which is G, is
pruned, and the algorithm does not
compute the entire subtree of G.
Alpha-Beta (α-β) Example
Step 8:
C now returns the value 1 to A,
where the best value for A is
max(3, 1) = 3.
The final game tree shows the nodes
that were computed and the nodes
that were never computed.
Hence the optimal value for the
maximizer is 3 in this example.
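To check the walk-through, the same tree can be encoded as nested lists and run through a small alpha-beta routine. The values 99 below are arbitrary placeholders for the leaves the walk-through never evaluates (the right child of E and the children of G); any values there leave the answer unchanged, which is the point of pruning:

```python
import math

def ab(node, is_max, alpha=-math.inf, beta=math.inf):
    """Alpha-beta over a tree of nested lists; numbers are leaf utilities."""
    if isinstance(node, (int, float)):       # leaf
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        cv = ab(child, not is_max, alpha, beta)
        if is_max:
            v = max(v, cv); alpha = max(alpha, v)
        else:
            v = min(v, cv); beta = min(beta, v)
        if alpha >= beta:                    # prune remaining children
            break
    return v

# A (MAX) -> B, C (MIN); each MIN node -> two MAX nodes over two leaves.
A = [[[2, 3], [5, 99]],     # B: D = (2, 3), E = (5, pruned)
     [[0, 1], [99, 99]]]    # C: F = (0, 1), G = (pruned entirely)
print(ab(A, True))          # -> 3, the optimal value for the maximizer
```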
Effectiveness of Alpha-beta pruning
Alpha-Beta is guaranteed to compute the same value for the root node as computed by
Minimax.
Worst case: NO pruning; the branches are ordered so that no pruning takes place. In this
case alpha-beta gives no improvement over exhaustive search: the best
move always occurs on the right side of the tree, and the time complexity for such an order is
O(b^m).
Best case: each player's best move is the leftmost alternative, i.e. at MAX
nodes the child with the largest value is generated first, and at MIN nodes the child
with the smallest value is generated first. In this case only O(b^(m/2)) nodes are
examined, effectively doubling the depth that can be searched in the same time.
Games : State-of-the-Art
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in
1994. It used an endgame database defining perfect play for all positions involving 8 or fewer
pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in
1997. Deep Blue examined 200 million positions per second and used very sophisticated
evaluation and undisclosed methods for extending some lines of search up to 40 ply.
Current programs are even better, if less historic.
Othello: In 1997, Logistello defeated the human champion by six games to none. Human
champions refuse to compete against computers, which are too good.
Go: Human champions are beginning to be challenged by machines, though the best
humans still beat the best machines. In Go, b > 300, so most programs use pattern
knowledge bases to suggest plausible moves, along with aggressive pruning.
Backgammon: The neural-net learning program TD-Gammon is among the world's top three
players.