
Artificial Intelligence

ADVERSARIAL SEARCH

Nguyễn Ngọc Thảo – Nguyễn Hải Minh


{nnthao, nhminh}@fit.hcmus.edu.vn
Outline
• The concept of games in AI
• Optimal decisions in games
• α-β Pruning
• Imperfect, real-time decisions
• Stochastic games

The concept of games in AI

Search in multiagent environments
• Each agent needs to consider the actions of other agents
and how they affect its own welfare.
• The unpredictability of other agents introduces contingencies
into the agent's problem-solving process.

Game theory
• Game theory views any multiagent environment as a game.
• The impact of each agent on the others is “significant,” regardless of
whether the agents are cooperative or competitive.
• Types of games

                        Deterministic                  Chance
Perfect information     Chess, Checkers, Go, Othello   Backgammon, Monopoly
Imperfect information                                  Bridge, poker, scrabble,
                                                       nuclear war
Types of Games

Adversarial search
• Adversarial search (often just called "games") covers competitive
environments in which the agents' goals are in conflict.
• Zero-sum games of perfect information
• Deterministic, fully observable environments, turn-taking, two players
• The utility values at the end are always equal and opposite.

Games vs. Search problems
• Complexity: games are too hard to be solved exactly
• Chess: b ≈ 35, d ≈ 100 (50 moves/player) → graph of 10^40 nodes,
search tree of 35^100 ≈ 10^154 nodes
• Go: b ≈ 1000 (!)
• Time limits: make some decision even when calculating the
optimal decision is infeasible
• Efficiency: inefficiency is penalized severely
• Several interesting ideas on how to make the best possible use of
time have been spawned by game-playing research.
Primary assumptions
• Two players only, called MAX and MIN.
• MAX moves first, and then they take turns moving until the game ends
• Winner gets reward, loser gets penalty.
• Both players have complete knowledge of the game’s state
• E.g., chess, checkers, Go, etc. Counterexample: poker
• No element of chance
• No dice thrown, no cards drawn, etc.
• Zero-sum games
• The total payoff to all players is the same for every game instance.
• Rational players
• Each player always tries to maximize his/her utility
Games as search
• 𝑆0 – Initial state: How the game is set up at the start
• E.g., board configuration of chess
• 𝑃𝐿𝐴𝑌𝐸𝑅(𝑠): Which player has the move in a state, MAX/MIN?
• 𝐴𝐶𝑇𝐼𝑂𝑁𝑆(𝑠) – Successor function: A list of (move, state) pairs
specifying legal moves.
• 𝑅𝐸𝑆𝑈𝐿𝑇(𝑠, 𝑎) – Transition model: Result of move 𝑎 on state 𝑠
• 𝑇𝐸𝑅𝑀𝐼𝑁𝐴𝐿 − 𝑇𝐸𝑆𝑇(𝑠): Is the game finished?
• States where the game has ended are called terminal states
• 𝑈𝑇𝐼𝐿𝐼𝑇𝑌 (𝑠, 𝑝) – Utility function: A numerical value of a terminal
state 𝑠 for a player 𝑝
• E.g., chess: win (+1), lose (-1) and draw (0), backgammon: [0, 192]
The game tree of Tic-Tac-Toe

(Figure: the game tree, drawn from the point of view of MAX, who uses
the search tree to determine the next move.)


Example games: Checkers

• Complexity
• ~10^18 nodes, which may require 100k years at 10^6 positions/sec

• Chinook (1989-2007)
• The first computer program that won a world champion title in a
competition against humans
• 1990: won 2 games in competition with world champion Tinsley (final
score: 2-4, 33 draws). 1994: 6 draws

• Chinook's search
• Ran on regular PCs, played perfectly by using alpha-beta search
combined with a database of 39 trillion endgame positions
Example games: Chess
• Complexity
• b ≈ 35, d ≈ 100, 10^154 nodes (!!)
• Completely impractical to search this
• Deep Blue (May 11, 1997)
• Kasparov lost a 6-game match against IBM's Deep Blue
(Kasparov 1 win, Deep Blue 2 wins, 3 draws).
• In the future, the focus will be on allowing computers to LEARN to
play chess rather than being TOLD how to play
Deep Blue
• Ran on a parallel computer with 30 IBM RS/6000
processors doing alpha-beta search
• Searched up to 30 billion positions per move, average depth 14
(able to reach 40 plies)
• Evaluation function: 8000 features
• highly specific patterns of pieces (~4000 positions)
• 700,000 grandmaster games in database
• Even working at 200 million positions/sec, Deep Blue
would require 10^100 years to evaluate all possible games.
• (The universe is only about 10^10 years old.)

• Now: algorithmic improvements have allowed programs running on
standard PCs to win World Computer Chess Championships.
• Pruning heuristics reduce the effective branching factor to less than 3
GO
A million trillion trillion trillion trillion more configurations
than chess!
• Complexity
• Board of 19x19, b ≈ 361, average depth ≈ 200
• 10^174 possible board configurations
• Control of territory is unpredictable until the endgame
• AlphaGo (2016) by Google
• Beat the 9-dan professional Lee Sedol (4-1)
• Machine learning + Monte Carlo search guided by a "value network"
and a "policy network" (implemented using deep neural network
technology)
• Learned from humans + learned by itself (self-play games)
An overview of AlphaGo

Optimal decisions in games

• Minimax algorithm
• Optimal decisions in multiplayer games
Optimal decisions in games
• Normal search problem
• The optimal solution is a sequence of actions leading to a goal state.
• Games
• The optimal strategy specifies the best achievable outcome for a
player, assuming that both players play optimally from there to the
end of the game.
• This can be determined from the minimax value (defined for MAX) of
each node:

MINIMAX(s) =
  UTILITY(s)                                      if TERMINAL-TEST(s)
  max over a ∈ ACTIONS(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
  min over a ∈ ACTIONS(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN
An example of a two-ply game tree

(Figure: leaf values are utilities for MAX; the minimax values determine
MIN's best move at each MIN node and MAX's best move at the root.)
Minimax algorithm
• Make a minimax decision from the current state, using a
recursive computation of minimax values at each successor
• The recursion proceeds all the way down to the leaves; the minimax
values are then backed up through the tree as the recursion unwinds.

Minimax algorithm
function MINIMAX-DECISION(state) returns an action
  return arg max_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v
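As a concrete sketch, the pseudocode above can be rendered in Python. The nested-tuple game tree used here is a hypothetical stand-in for a real game API (ACTIONS, RESULT, TERMINAL-TEST), not part of the slides:

```python
def minimax_value(node, is_max):
    """MAX-VALUE / MIN-VALUE folded into one function.

    A node is either a number (the utility of a terminal state, from
    MAX's point of view) or a tuple of child nodes; MAX and MIN
    alternate by level.
    """
    if not isinstance(node, tuple):                 # TERMINAL-TEST
        return node                                 # UTILITY
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

def minimax_decision(children):
    """MINIMAX-DECISION: index of MAX's best move at the root."""
    return max(range(len(children)),
               key=lambda i: minimax_value(children[i], False))

# Two-ply example: three MIN nodes with minimax values 3, 2 and 2,
# so MAX's best move is the first one.
tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
```

For this tree, minimax_value(tree, True) returns 3 and minimax_decision(tree) returns 0.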
Properties of Minimax algorithm
• A complete depth-first exploration of the game tree
• Completeness
• Yes (if the tree is finite)
• Optimality
• Yes (against an optimal opponent)
• Time complexity
• O(b^m)
• Space complexity
• O(bm) (depth-first exploration)

Note: m is the maximum depth of the tree; b is the number of legal
moves at each point.

For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible
Quiz 01: Minimax algorithm
• Calculate the utility value for the remaining nodes
• Which node should MAX and MIN choose?

Optimality in multiplayer games
• A single value is replaced with a vector of values.
→ the UTILITY function returns a vector of utilities
• For terminal states, this vector gives the utility of the state
from each player’s viewpoint.

Optimality in multiplayer games
• Multiplayer games usually involve alliances, which are made
and broken as the game proceeds.

A and B are weak while C is strong → A forms an alliance with B.
Later, C becomes weak → A or B could violate the agreement.

• If the game is not zero-sum, then collaboration can also
occur with just two players.
Alpha-beta pruning
Problem with minimax search
• The number of game states is exponential in the tree’s depth
→ Do not examine every node
• Alpha-beta pruning: Prune away branches that cannot
possibly influence the final decision
• Bounded lookahead
• Limit depth for each search
• This is what chess players do: look ahead a few moves and see
what looks best
Alpha-beta pruning: An example

Another way to look at this is as a simplification of the formula for MINIMAX.
Let the two unevaluated successors of node 𝐶 have values 𝑥 and 𝑦.
Then the value of the root node is given by

MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
              = max(3, min(2, x, y), 2)
              = max(3, z, 2)   where z = min(2, x, y) ≤ 2
              = 3.

In other words, the value of the root, and hence the minimax decision,
are independent of the values of the pruned leaves x and y.
Alpha-beta pruning
• If a move 𝑛 is determined to be worse than a move 𝑚 that has
already been examined, then examining 𝑛 any further is pointless:
it can be pruned.

𝜶 = the value of the best (i.e., highest-value) choice we have found so far
at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far
at any choice point along the path for MIN.
Alpha-beta search algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
Alpha-beta search algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
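As a sketch, the functions above can be rendered in Python over a toy game tree (a hypothetical nested-tuple encoding: numbers are terminal utilities for MAX, tuples are internal nodes, MAX and MIN alternate by level):

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), is_max=True):
    """Alpha-beta value of a node in a nested-tuple game tree."""
    if not isinstance(node, tuple):      # terminal state: return utility
        return node
    if is_max:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:                # MIN above would never allow v
                return v                 # → prune the remaining children
            alpha = max(alpha, v)
        return v
    v = float("inf")
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:                   # MAX above already has better
            return v
        beta = min(beta, v)
    return v
```

On the tree ((3, 12, 8), (2, 4, 6), (14, 5, 2)) this returns 3, and in the second MIN node the children after 2 are never visited, because 2 is already below α = 3.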
Properties of alpha-beta pruning
• Pruning does not affect the final result
• Its worst case is the same as the minimax algorithm
• Good move ordering improves the effectiveness of pruning
• With "perfect ordering": time complexity O(b^(m/2)) → doubles the
reachable search depth
• The effective branching factor becomes √b instead of b
• E.g., for chess, about 6 instead of 35

• Killer move heuristic
• First, run an iterative-deepening search 1 ply deep and record the
best path.
• Then search 1 ply deeper, using the recorded path to inform move
ordering
• A transposition table avoids re-evaluating a state
Quiz 02: Alpha-beta pruning
• Calculate the utility value for the remaining nodes.
• Which nodes should be pruned?

Imperfect real-time decisions

• Evaluation functions
• Cutting off search
• Forward pruning
• Search versus Lookup
Heuristic minimax
• Both minimax and alpha-beta pruning search all the way to
terminal states.
• This depth is usually impractical because moves must be made in a
reasonable amount of time (~ minutes).
• Cut off the search earlier with some depth limit
• Use an evaluation function
• An estimate of the desirability of a position (win, lose, tie?)
Evaluation functions
• The evaluation function should order the terminal states in
the same way as the true utility function does
• States that are wins must evaluate better than draws, which in
turn must evaluate better than losses.
• The computation must not take too long!
• For nonterminal states, their orders should be strongly
correlated with the actual chances of winning.

Evaluation functions
• For chess, typically a linear weighted sum of features:
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• where fi could be the number of each kind of piece on the board,
and wi could be the values of the pieces
• E.g., Eval(s) = 9q + 5r + 3b + 3n + p
• Implicit strong assumption: the contribution of each feature is
independent of the values of the other features.
• E.g., assign the value 3 to a bishop ignores the fact that bishops are
more powerful in the endgame → Nonlinear combination
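As an illustrative sketch, the material-count evaluation above can be written directly in Python; the feature encoding (net piece counts, own pieces minus opponent's) is an assumption of this example, not from the slides:

```python
# Weights from the slide: Eval(s) = 9q + 5r + 3b + 3n + p
WEIGHTS = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}

def eval_material(features):
    """Linear weighted sum Eval(s) = w1*f1(s) + ... + wn*fn(s), where
    each feature is a net piece count (own pieces minus opponent's)."""
    return sum(WEIGHTS[piece] * count for piece, count in features.items())

# Hypothetical position: up a pawn, with a bishop traded for a knight.
position = {"q": 0, "r": 0, "b": 1, "n": -1, "p": 1}
```

For this position, Eval = 3 - 3 + 1 = 1: the weights add independently, which is exactly the strong independence assumption the slide points out.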

Cutting off search
• Minimax Cutoff is identical to Minimax Value except
1. 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙? is replaced by 𝐶𝑢𝑡𝑜𝑓𝑓?
2. 𝑈𝑡𝑖𝑙𝑖𝑡𝑦 is replaced by 𝐸𝑣𝑎𝑙

if CUTOFF-TEST(state, depth) then return EVAL(state)

• Does it work in practice?
• b^m = 10^6, b = 35 → m ≈ 4
• 4-ply lookahead makes a hopeless chess player!
• 4-ply ≈ human novice, 8-ply ≈ typical PC or human master, 12-ply ≈
Deep Blue, Kasparov
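The substitution on this slide can be sketched in Python over a toy tree (a hypothetical nested-tuple encoding, numbers as terminal utilities, with a caller-supplied Eval):

```python
def h_minimax(node, depth, is_max, limit, eval_fn):
    """Heuristic minimax: CUTOFF-TEST replaces TERMINAL-TEST and
    EVAL replaces UTILITY, as described on the slide."""
    if not isinstance(node, tuple):       # a true terminal state
        return node
    if depth >= limit:                    # CUTOFF-TEST(state, depth)
        return eval_fn(node)              # EVAL(state)
    values = [h_minimax(child, depth + 1, not is_max, limit, eval_fn)
              for child in node]
    return max(values) if is_max else min(values)
```

With a generous depth limit this reduces to plain minimax; with limit 0 it simply evaluates the current state with Eval.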
A more sophisticated cutoff test
• Quiescent positions are those unlikely to exhibit wild swings
in value in the near future.
• E.g., in chess, positions in which favorable captures can be made
are not quiescent for an evaluation function counting material only
• Quiescence search: expand nonquiescent positions until
quiescent positions are reached.

Quiescent positions: An example

Two chess positions that differ only in the position of the rook at lower right.
In (a), Black has an advantage of a knight and two pawns, which should be
enough to win the game. In (b), White will capture the queen, giving it an
advantage that should be strong enough to win.
A more sophisticated cutoff test
• Horizon effect: the program is facing an unavoidable serious
loss but temporarily avoids it with delaying tactics.

With Black to move, the black bishop is


surely doomed. But Black can forestall
that event by checking the white king
with its pawns, forcing the king to
capture the pawns.

A more sophisticated cutoff test
• Singular extension: a move that is "clearly better" than all
other moves in a given position.
• The algorithm allows further consideration of a legal singular
extension → a deeper search tree, yet only a few singular extensions.
• Beam search
• Forward pruning: consider only a "beam" of the n best moves
• Most humans consider only a few moves from each position
• PROBCUT, or probabilistic cut, algorithm (Buro, 1995)
• Search vs. Lookup
• Use table lookup rather than search for the opening and endgame
Stochastic games
Stochastic behaviors
• Uncertain outcomes controlled by chance, not an adversary!
• Why wouldn’t we know what the result of an action will be?
• Explicit randomness: rolling dice
• Unpredictable opponents: the ghosts respond randomly
• Actions can fail: when a robot is moving, wheels might slip

Expectimax search
• Values reflect the average-case (expectimax) outcomes, not
worst-case (minimax) outcomes

• Expectimax search: compute the average score under optimal play
• Max nodes behave as in minimax search
• Chance nodes are like min nodes, but the outcome is uncertain
• Calculate expected utilities, i.e., take the weighted average of the
children's values

• For minimax, the scale of terminal values doesn't matter
• Any monotonic transformation preserves the decision: better states
just need higher evaluations
• For expectimax, the magnitudes need to be meaningful
Expectimax search: Pseudo code

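The pseudocode figure on this slide did not survive extraction. A minimal Python sketch of expectimax, under the simplifying assumptions that the tree is a nested tuple and chance outcomes are uniformly likely, might look like:

```python
def expectimax(node, is_max=True):
    """Expectimax value: MAX nodes take the maximum over children;
    chance nodes take the expected value, here with uniform
    outcome probabilities (an assumption of this sketch)."""
    if not isinstance(node, tuple):       # terminal state: return utility
        return node
    values = [expectimax(child, not is_max) for child in node]
    if is_max:
        return max(values)
    return sum(values) / len(values)      # weighted average of children

# MAX chooses between two chance nodes with averages 7.5 and 5.0.
tree = ((3, 12), (4, 6))
```

Note that, unlike minimax, rescaling the leaf values non-linearly can change which move looks best, which is why magnitudes must be meaningful.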
Expectimax pruning

Is it possible to perform pruning in expectimax search?

Expectimax pruning
• Pruning is only possible with knowledge of a fixed range of values.

How to prune this tree?


• Each child has an equal
probability of being chosen
• The values can only be in
the range 0-9 (inclusive).
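Under these assumptions (uniform probabilities, values bounded above by 9), a bound on a partly evaluated chance node can be computed explicitly; a hypothetical helper:

```python
def chance_upper_bound(seen, remaining, vmax=9):
    """Upper bound on a uniform chance node's expected value after
    `seen` children have been evaluated: assume every one of the
    `remaining` unseen children takes the maximum possible value."""
    n = len(seen) + remaining
    return (sum(seen) + remaining * vmax) / n

# Three equally likely children, two evaluated to 2 and 3: the node's
# expected value is at most (2 + 3 + 9) / 3, so if MAX already has a
# move worth 5 or more elsewhere, the last child can be pruned.
bound = chance_upper_bound([2, 3], remaining=1)
```

Without the known range, the unseen children could be arbitrarily large and no such bound, and hence no pruning, would be possible.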

Depth-limited expectimax

THE END
