
CIS 530: ARTIFICIAL INTELLIGENCE

Games and Adversarial Search
Games: Outline of Unit
o Part I: Games as Search
Motivation
Game-playing AI successes
Game Trees
Evaluation Functions
o Part II: Adversarial Search
The Minimax Rule
Alpha-Beta Pruning
[Figure: Ratings of human and computer chess champions over time; the computer line crosses the human champion's on May 11, 1997, when Deep Blue defeated Kasparov.]

https://srconstantin.wordpress.com/2017/01/28/performance-trends-in-ai/
The Simplest Game Environment
o Multiagent
o Static: No change while an agent is deliberating.
o Discrete: A finite set of percepts and actions.
o Fully observable: An agent's sensors give it the complete state of the environment.
o Strategic: The next state is determined by the current state, the action executed by the agent, and the actions of one other agent.
Key properties of our games
1. Two players alternate moves
2. Zero-sum: one player’s loss is another’s gain
3. Clear set of legal moves
4. Well-defined outcomes (e.g. win, lose, draw)

o Examples:
Chess, Checkers, Go,
Mancala, Tic-Tac-Toe, Othello …
More complicated games
o Most card games (e.g. Hearts, Bridge, etc.) and Scrabble
Stochastic, not deterministic
Not fully observable: lacking in perfect information
o Real-time strategy games, e.g. Warcraft
Continuous rather than discrete
No pause between actions, don’t take turns
o Cooperative games
Pac-Man

https://youtu.be/-CbyAk3Sn9I
Formalizing the Game Setup
1. Two players: MAX and MIN; MAX moves first.
2. MAX and MIN take turns until the game is over.
3. Winner gets an award, loser gets a penalty.

o Games as search:
Initial state: e.g. the board configuration of chess
Successor function: list of (move, state) pairs specifying legal moves.
Terminal test: Is the game finished?
Utility function: Gives a numerical value for terminal states,
e.g. win (+∞), lose (-∞) and draw (0)
MAX uses a search tree to determine its next move (a minimal interface sketch follows).
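A minimal sketch of this formalization in Python; the class and method names here are illustrative assumptions, not from the slides:

    # Abstract interface for a two-player, zero-sum game as a search problem.
    class Game:
        def initial_state(self):
            """Return the starting board configuration."""
            raise NotImplementedError

        def successors(self, state):
            """Return (move, state) pairs for every legal move."""
            raise NotImplementedError

        def is_terminal(self, state):
            """Terminal test: is the game finished?"""
            raise NotImplementedError

        def utility(self, state):
            """Numerical value of a terminal state from MAX's point of
            view, e.g. +inf for a win, -inf for a loss, 0 for a draw."""
            raise NotImplementedError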
How to Play a Game by Searching
o General Scheme
1. Consider all legal successors to the current state ('board position')
2. Evaluate each successor board position
3. Pick the move which leads to the best board position (sketched in code below).
4. After your opponent moves, repeat.

o Design issues
1. Representing the ‘board’
2. Representing legal next boards
3. Evaluating positions
4. Looking ahead
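A hypothetical sketch of the general scheme as one-ply lookahead, assuming the Game interface sketched earlier plus a caller-supplied evaluate function (both names are illustrative):

    import math

    # One-ply version of the general scheme: evaluate every legal
    # successor position and pick the move leading to the best one.
    def choose_move(game, state, evaluate):
        best_move, best_value = None, -math.inf
        for move, next_state in game.successors(state):
            value = evaluate(next_state)   # static evaluation of the position
            if value > best_value:
                best_move, best_value = move, value
        return best_move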
Hexapawn: A very simple Game
o Hexapawn is played on a 3x3 chessboard

o Only standard pawn moves:


1. A pawn moves forward one square onto an empty square
2. A pawn “captures” an opponent pawn by moving diagonally forward one
square, if that square contains an opposing pawn. The opposing pawn is
removed from the board.
Hexapawn: A Very Simple Game
o Hexapawn is played on a 3x3 chessboard

o Player P1 wins the game against P2 when:

One of P1's pawns reaches the far side of the board,
P2 cannot move because no legal move is possible, or
P2 has no pawns left.
(Invented by Martin Gardner in 1962, with a learning "program" using matchboxes. Reprinted in "The Unexpected Hanging".) A sketch of the move rules in code follows.
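A sketch of Hexapawn's move rules in Python, under an assumed board encoding (a 3x3 list of lists holding 'W', 'B', or '.'); none of these names come from the slides:

    # White starts on the bottom row and moves up (decreasing row index).
    def pawn_moves(board, player):
        """Generate (move, new_board) pairs for `player` ('W' or 'B')."""
        other = 'B' if player == 'W' else 'W'
        forward = -1 if player == 'W' else 1
        for r in range(3):
            for c in range(3):
                if board[r][c] != player:
                    continue
                nr = r + forward
                if not 0 <= nr < 3:
                    continue
                # 1. Move forward one square onto an empty square.
                if board[nr][c] == '.':
                    yield ((r, c), (nr, c)), _apply(board, (r, c), (nr, c), player)
                # 2. Capture diagonally forward onto an opposing pawn.
                for nc in (c - 1, c + 1):
                    if 0 <= nc < 3 and board[nr][nc] == other:
                        yield ((r, c), (nr, nc)), _apply(board, (r, c), (nr, nc), player)

    def _apply(board, src, dst, player):
        """Return a copy of the board with the pawn moved from src to dst."""
        new = [row[:] for row in board]
        new[src[0]][src[1]] = '.'
        new[dst[0]][dst[1]] = player
        return new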
Hexapawn: Three Possible First Moves

[Diagram: from the initial position (three black pawns on the top row, three white pawns on the bottom row), White's three possible first moves, one for each pawn advancing one square.]
Game Trees
o Represent the game problem space by a tree:
Nodes represent ‘board positions’; edges represent legal moves.
Root node is the first position in which a decision must be made.
Hexapawn: Simplified Game Tree for 2 Moves

[Diagram: the root position with White to move, its successor positions with Black to move, and their successors with White to move again.]
Adversarial Search
Battle of Wits

https://www.youtube.com/watch?v=rMz7JBRbmNo
MAX & MIN Nodes: An egocentric view
o Two players: MAX, MAX’s opponent MIN
o All play is computed from MAX’s vantage point.
o When MAX moves, MAX attempts to MAXimize MAX’s outcome.
o When MAX’s opponent moves, they attempt to MINimize MAX’s outcome.
o WE TYPICALLY ASSUME MAX MOVES FIRST:

o Label the root (level 0) MAX


o Alternate MAX/MIN labels at each successive tree level (ply).
o Even levels represent turns for MAX
o Odd levels represent turns for MIN
Game Trees
o Represent the game problem space by a tree:
Nodes represent 'board positions'; edges represent legal moves.
Root node is the first position in which a decision must be made.

o Evaluation function f assigns real-number scores to 'board positions' without reference to path.
o Terminal nodes represent ways the game could end, labeled with the desirability of that ending (e.g. win/lose/draw or a numerical score)
Evaluation functions: f(n)
o Evaluates how good a 'board position' is
o Based on static features of that board alone
o Zero-sum assumption lets us use one function to describe goodness for both players.
f(n)>0 if MAX is winning in position n
f(n)=0 if position n is tied
f(n)<0 if MIN is winning in position n
o Build using expert knowledge,
Tic-tac-toe: f(n) = (# of 3-lengths open for MAX) - (# open for MIN); a sketch follows.
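A sketch of that tic-tac-toe evaluation function, assuming a board encoded as a list of 9 cells holding 'X' (MAX), 'O' (MIN), or None (an illustrative encoding, not from the slides):

    # The 8 winning lines of tic-tac-toe, by cell index.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def f(board):
        """f(n) = (# lines still open for X) - (# lines still open for O)."""
        open_for_x = sum(1 for line in LINES
                         if all(board[i] != 'O' for i in line))
        open_for_o = sum(1 for line in LINES
                         if all(board[i] != 'X' for i in line))
        return open_for_x - open_for_o

For example, with a lone X in a corner, all 8 lines are open for X but only 5 remain open for O (the 3 through that corner are blocked), giving f(n) = 8-5 = 3, matching the tree below.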
A Partial Game Tree for Tic-Tac-Toe
o f(n) = # of potential three-lines for X - # of potential three-lines for O
o f(n) = 0 if n is a terminal tie
o f(n) = +∞ if n is a terminal win
o f(n) = -∞ if n is a terminal loss

[Figure: a partial game tree from the opening position, with evaluations at the frontier such as f(n) = 8-5 = 3, f(n) = 6-5 = 1, f(n) = 6-3 = 3, f(n) = 6-4 = 2, f(n) = 6-2 = 4, and terminal values -∞, 0, +∞.]
Chess Evaluation Functions
o Claude Shannon argued for a chess evaluation function in a 1950 paper
o Alan Turing defined a function in 1948:
f(n) = (sum of A's piece values) - (sum of B's piece values)

Piece values for a simple Turing-style evaluation function often taught to novice chess players:
Pawn 1.0, Knight 3.0, Bishop 3.25, Rook 5.0, Queen 9.0

o More complex: weighted sum of positional features:
f(n) = Σi wi · featurei(n)

o Deep Blue had >8000 features. Examples of more complex features:
Positive: rooks on open files, knights in closed positions, control of the center, developed pieces
Negative: doubled pawns, wrong-colored bishops in closed positions, isolated pawns, pinned pieces
A material-count sketch follows.
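A sketch of the Turing-style material count with the piece values above; the board encoding (a dict from squares to piece letters, uppercase for White) is an assumption for illustration:

    # Turing-style piece values from the table above (kings score 0 here).
    PIECE_VALUES = {'P': 1.0, 'N': 3.0, 'B': 3.25, 'R': 5.0, 'Q': 9.0}

    def material(board, white_is_max=True):
        """f(n) = (sum of MAX's piece values) - (sum of MIN's piece values)."""
        score = 0.0
        for piece in board.values():        # e.g. {'e4': 'P', 'd8': 'q', ...}
            value = PIECE_VALUES.get(piece.upper(), 0.0)
            score += value if piece.isupper() else -value
        return score if white_is_max else -score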
Some Chess Positions and their Evaluations

[Figure: three positions. White to move: f(n) = (9+3)-(5+5+3.25) = -1.25. After ...Nxg5??: f(n) = (9+3)-(5+5) = 2. Uh-oh, after Rxg4+: f(n) = (3)-(5+5) = -7, and Black may force checkmate.]

So, considering our opponent's possible responses would be wise.
The Minimax Rule (AIMA 5.2)
The Minimax Rule: “Don’t play hope chess”
o Idea: Make the best move for MAX assuming that MIN always replies with the best
move for MIN

o Easily computed by a recursive process


• The backed-up value of each node in the tree is determined by the values of its
children:

• For a MAX node, the backed-up value is the maximum of the values of its
children (i.e. the best for MAX)

• For a MIN node, the backed-up value is the minimum of the values of its
children (i.e. the best for MIN)
The Minimax Procedure
o Until game is over:

1. Start with the current position as a MAX node.

2. Expand the game tree a fixed number of ply.

3. Apply the evaluation function to the leaf positions.

4. Calculate back-up values bottom-up.

5. Pick the move assigned to MAX at the root

6. Wait for MIN to respond


2-ply Example: Backing Up Values

[Figure: a 2-ply tree. MAX at the root; two MIN nodes below it; evaluation-function values 2, 7 at the leaves under the first MIN node and 1, 8 under the second. The MIN nodes back up 2 and 1 respectively; the root backs up 2, and the move to the left MIN node (value 2) is the move selected by minimax.]
Adversarial Search (Minimax)
o Minimax search:
A state-space search tree
Players alternate turns
Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: minimax values computed recursively. Terminal values (part of the game) 8, 2 and 5, 6 back up to MIN values 2 and 5; the MAX root takes value 5.]
Minimax Implementation

def max-value(state):
    if the state is a terminal state:
        return the state's utility
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    if the state is a terminal state:
        return the state's utility
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v
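A runnable Python version of this pseudocode, assuming the Game interface sketched earlier (an assumption, not an API from the slides):

    import math

    def max_value(game, state):
        if game.is_terminal(state):
            return game.utility(state)
        v = -math.inf
        for _move, successor in game.successors(state):
            v = max(v, min_value(game, successor))  # MIN replies optimally
        return v

    def min_value(game, state):
        if game.is_terminal(state):
            return game.utility(state)
        v = math.inf
        for _move, successor in game.successors(state):
            v = min(v, max_value(game, successor))  # MAX replies optimally
        return v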
Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
Minimax Example

[Figure: applying max-value and min-value to a depth-2 tree with leaf values 3, 12, 8 | 2, 4, 6 | 14, 5, 2. The three MIN nodes back up 3, 2, and 2; the MAX root backs up 3.]
What if MIN does not play optimally?

o Definition of optimal play for MAX assumes MIN plays optimally:


Maximizes worst-case outcome for MAX.
(Classic game theoretic strategy)

o But if MIN does not play optimally, MAX will do even better.
This theorem is not hard to prove
Comments on Minimax Search
o Depth-first search with fixed number of ply m as the limit.
O(b^m) time complexity - as usual!
O(bm) space complexity

o Performance will depend on
the quality of the static evaluation function (expert knowledge)
the depth of search (computing power and search algorithm)

o Differences from normal state-space search
Looking to make one move only, despite deeper search
No cost on arcs - costs come from backed-up static evaluation
MAX can't be sure how MIN will respond to his moves

o Minimax forms the basis for other game-tree search algorithms.
Alpha-Beta Pruning (AIMA 5.3)
Alpha-Beta Pruning
o A way to improve the performance of the Minimax Procedure
o Basic idea: "If you have an idea which is surely bad, don't take the time to see how truly awful it is" ~ Pat Winston

[Figure: a MAX root known to be >= 2; its left MIN child = 2 (from leaves 2 and 7); its right MIN child is <= 1 after seeing leaf 1. We don't need to compute the value at the remaining leaf (?): no matter what it is, it won't change the value of the root node.]
Alpha-Beta Pruning
o During Minimax, keep track of two additional values:
α: MAX's current lower bound on MAX's outcome
β: MIN's current upper bound on MIN's outcome

o MAX will never allow a move that could lead to a worse score (for MAX) than α
o MIN will never allow a move that could lead to a better score (for MAX) than β

o Therefore, stop evaluating a branch whenever:
When evaluating a MAX node: a value v ≥ β is backed up
• MIN will never select that MAX node
When evaluating a MIN node: a value v ≤ α is found
• MAX will never select that MIN node
Alpha-Beta Implementation
α: MAX's best option on path to root
β: MIN's best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v
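A runnable Python version of the alpha-beta pseudocode, again assuming the Game interface sketched earlier; the top-level call passes the full window (-∞, +∞):

    import math

    def ab_max_value(game, state, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state)
        v = -math.inf
        for _move, successor in game.successors(state):
            v = max(v, ab_min_value(game, successor, alpha, beta))
            if v >= beta:              # MIN above will never allow this node
                return v
            alpha = max(alpha, v)
        return v

    def ab_min_value(game, state, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state)
        v = math.inf
        for _move, successor in game.successors(state):
            v = min(v, ab_max_value(game, successor, alpha, beta))
            if v <= alpha:             # MAX above will never allow this node
                return v
            beta = min(beta, v)
        return v

    # Top-level call: ab_max_value(game, start, -math.inf, math.inf)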
Alpha-Beta Pruning Properties
o This pruning has no effect on the minimax value computed for the root!

o Values of intermediate nodes might be wrong
Important: children of the root may have the wrong value
So the most naïve version won't let you do action selection

o Good child ordering improves effectiveness of pruning

o With "perfect ordering":
Time complexity drops to O(b^(m/2))
Doubles solvable depth!
Full search of, e.g. chess, is still hopeless...

o This is a simple example of metareasoning (computing about what to compute)
Alpha-Beta Pruning
o Based on the observation that for all viable paths, the utility value f(n) will be α ≤ f(n) ≤ β

o Initially, α = -∞, β = +∞

o As the search tree is traversed, the possible-utility-value window shrinks as α increases and β decreases
Alpha-Beta Pruning
o Whenever the current ranges of alpha and beta no longer overlap, it is clear that the current node is a dead end
Games and Adversarial Search II
Alpha-Beta Pruning (AIMA 5.3)

Some slides adapted from Richard Lathrop, USC/ISI, CS 271


Review: The Minimax Rule
o Idea: Make the best move for MAX assuming that MIN always replies with the best
move for MIN

1. Start with the current position as a MAX node.


2. Expand the game tree a fixed number of ply.
3. Apply the evaluation function to all leaf positions.
4. Calculate back-up values bottom-up:
• For a MAX node, return the maximum of the values of its children (i.e. the best
for MAX)
• For a MIN node, return the minimum of the values of its children (i.e. the best for MIN)
5. Pick the move assigned to MAX at the root
6. Wait for MIN to respond and REPEAT FROM 1
2-ply Example: Backing Up Values

[Figure: the same 2-ply tree as before. Evaluation-function values 2, 7 at the leaves under the first MIN node and 1, 8 under the second; the MIN nodes back up 2 and 1; the root backs up 2, which is the move selected by minimax.]

New point: the backed-up values are actually calculated by DFS!
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state)
    return an action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s))
    return v
Alpha-Beta Pruning
o A way to improve the performance of the Minimax Procedure
o Basic idea: "If you have an idea which is surely bad, don't take the time to see how truly awful it is" ~ Pat Winston

[Figure: the same tree as before, assuming left-to-right tree traversal. The MAX root is >= 2; the left MIN child = 2 (leaves 2 and 7); the right MIN child is <= 1 after seeing leaf 1. We don't need to compute the value at the remaining leaf (?): no matter what it is, it won't change the value of the root node.]
Alpha-Beta Pruning II
o During Minimax, keep track of two additional values:
α: current lower bound on MAX's outcome
β: current upper bound on MIN's outcome

o MAX will never choose a move that could lead to a worse score (for MAX) than α
o MIN will never choose a move that could lead to a better score (for MAX) than β

o Therefore, stop evaluating a branch whenever:
When evaluating a MAX node: a value v ≥ β is backed up
• MIN will never select that MAX node
When evaluating a MIN node: a value v ≤ α is found
• MAX will never select that MIN node
Alpha-Beta Pruning IIIa
o Based on the observation that for all viable paths, the utility value f(n) will be α ≤ f(n) ≤ β

o Initially, α = -∞, β = +∞

o As the search tree is traversed, the possible-utility-value window shrinks as α increases and β decreases
Alpha-Beta Pruning IIIb
o Whenever the current ranges of alpha and beta no longer overlap (α ≥ β), it is clear that the current node is a dead end, so it can be pruned
Alpha-beta Algorithm: In detail
o Depth-first search (usually bounded, with static evaluation)
Only considers nodes along a single path from the root at any time

o α = current lower bound on MAX's outcome (initially, α = -∞)
o β = current upper bound on MIN's outcome (initially, β = +∞)

o Pass current values of α and β down to child nodes during search.
o Update values of α and β during search:
MAX updates α at MAX nodes
MIN updates β at MIN nodes
o Prune remaining branches at a node whenever α ≥ β
When to Prune
o Prune whenever α ≥ β.
Prune below a MAX node when its α value becomes ≥ the β value of its ancestors.
• MAX nodes update α based on children's returned values.
• Idea: Player MIN at the node above won't pick that value anyway, since MIN can force a worse value.

Prune below a MIN node when its β value becomes ≤ the α value of its ancestors.
• MIN nodes update β based on children's returned values.
• Idea: Player MAX at the node above won't pick that value anyway; she can do better.
Pseudocode for Alpha-Beta Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, -∞, +∞)
    return an action in ACTIONS(state) with value v
An Alpha-Beta Example
o Do DF-search until the first leaf. Initial values: α = -∞, β = +∞ at the MAX root; α, β are passed to kids, so the first MIN node also starts with α = -∞, β = +∞.

[Figure: the example tree, traversed left to right: a MAX root with three MIN children, whose leaves are (3, 12, 8), (2, 4, 6), and (14, 5, 2).]

o First leaf = 3: MIN updates β based on kids; now α = -∞, β = 3, node value ≤ 3.
o Leaf = 12: MIN updates β based on kids; no change.
o Leaf = 8: no change; 3 is returned as the node value. MAX updates α based on kids: now α = 3, β = +∞ at the root.
o α = 3, β = +∞ are passed to kids of the second MIN node.
o Leaf = 2: MIN updates β based on kids: now α = 3, β = 2.
o α ≥ β, so prune the remaining leaves (4 and 6); 2 is returned as the node value (node value ≤ 2).
o MAX updates α based on kids; no change, since 2 < 3.
o α = 3, β = +∞ are passed to kids of the third MIN node.
o Leaf = 14: MIN updates β based on kids: β = 14.
o Leaf = 5: MIN updates β based on kids: β = 5.
o Leaf = 2: MIN updates β based on kids: β = 2; 2 is returned as the node value.
o MAX now makes its best move, as computed by Alpha-Beta: the leftmost move, with value 3.
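A self-contained sketch that reproduces this walkthrough; the tree encoding as nested lists is an assumption for illustration (a leaf is a number, an internal node a list of children):

    import math

    def ab(node, alpha, beta, maximizing):
        if not isinstance(node, list):      # leaf: static evaluation value
            return node
        v = -math.inf if maximizing else math.inf
        for child in node:
            cv = ab(child, alpha, beta, not maximizing)
            if maximizing:
                v = max(v, cv)
                alpha = max(alpha, v)
            else:
                v = min(v, cv)
                beta = min(beta, v)
            if alpha >= beta:               # windows no longer overlap: prune
                break
        return v

    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # leaves from the example
    print(ab(tree, -math.inf, math.inf, True))  # prints 3

As in the walkthrough, the middle subtree's leaves 4 and 6 are never examined.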



Alpha-Beta Algorithm Pseudocode
function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, -∞, +∞)
    return an action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v
Alpha-Beta Algorithm II

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
Effectiveness of Alpha-Beta Pruning
o Guaranteed to compute the same root value as Minimax
o Worst case: no pruning, same as Minimax (O(b^d))
o Best case: when each player's best move is the first option examined, examines only O(b^(d/2)) nodes, allowing search twice as deep!
o So: run Iterative Deepening search, and sort moves by the value returned on the last iteration.
o So: expand captures first, then threats, then forward moves, etc. (a move-ordering sketch follows this slide).

o O(b^(d/2)) is the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2)
e.g., in chess, go from b ≈ 35 to b ≈ 6

o For Deep Blue, alpha-beta pruning reduced the average branching factor from 35-40 to 6, as expected, doubling search depth
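A sketch of such move ordering (all names illustrative, assuming the Game interface from earlier): sort successors by a cheap heuristic so likely-best moves are examined first, pushing alpha-beta toward its O(b^(d/2)) best case.

    def ordered_successors(game, state, heuristic, maximizing):
        """Return (move, state) pairs, best-looking first for this player."""
        succs = list(game.successors(state))
        # e.g. heuristic = value backed up on the previous iterative-deepening
        # pass, or a rule of thumb such as scoring captures above quiet moves.
        succs.sort(key=lambda ms: heuristic(ms[1]), reverse=maximizing)
        return succs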
Real systems use a few more tricks
o Expand the proposed solution a little farther
Just to make sure there are no surprises
o Learn better board evaluation functions
E.g., for backgammon
o Learn a model of your opponent
E.g., for poker
o Do stochastic search
E.g., for Go