Lecture 05: Adversarial Search
ADVERSARIAL SEARCH
The concept of games in AI
Search in multiagent environments
• Each agent needs to consider the actions of other agents
and how they affect its own welfare.
• The unpredictability of other agents introduces contingencies
into the agent’s problem-solving process.
Game theory
• Game theory views any multiagent environment as a game.
• The impact of each agent on the others is “significant,” regardless of
whether the agents are cooperative or competitive.
• Types of games:

                          Deterministic                   Chance
  Perfect information     Chess, Checkers, Go, Othello    Backgammon, Monopoly
  Imperfect information                                   Bridge, poker, Scrabble,
                                                          nuclear war
Adversarial search
• Adversarial search (commonly known as games) covers competitive
environments in which the agents’ goals are in conflict.
• Zero-sum games of perfect information
• Deterministic, fully observable environments, turn-taking, two-player
• The utility values at the end are always equal and opposite.
Games vs. Search problems
• Complexity: games are usually too hard to solve exactly
• Chess: b ≈ 35, d ≈ 100 (50 moves per player) → a game graph of ~10^40
distinct nodes, but a search tree of 35^100, or ~10^154, nodes
• Go: b up to 361 (!)
• Time limits: some decision must be made even when computing the
optimal decision is infeasible
• Efficiency: games penalize inefficiency severely
• Several interesting ideas on how to make the best possible use of
time have been spawned by game-playing research.
Primary assumptions
• Two players only, called MAX and MIN.
• MAX moves first, and then they take turns moving until the game ends
• Winner gets reward, loser gets penalty.
• Both players have complete knowledge of the game’s state
• E.g., chess, checkers, Go, etc. Counterexample: poker
• No element of chance
• No dice thrown, no cards drawn, etc.
• Zero-sum games
• The total payoff to all players is the same for every game instance.
• Rational players
• Each player always tries to maximize his/her utility
Games as search
• 𝑆0 – Initial state: How the game is set up at the start
• E.g., board configuration of chess
• 𝑃𝐿𝐴𝑌𝐸𝑅(𝑠): Which player has the move in a state, MAX/MIN?
• 𝐴𝐶𝑇𝐼𝑂𝑁𝑆(𝑠): The set of legal moves available in state 𝑠
• 𝑅𝐸𝑆𝑈𝐿𝑇(𝑠, 𝑎) – Transition model: Result of move 𝑎 on state 𝑠
• 𝑇𝐸𝑅𝑀𝐼𝑁𝐴𝐿 − 𝑇𝐸𝑆𝑇(𝑠): Is the game finished?
• States where the game has ended are called terminal states
• 𝑈𝑇𝐼𝐿𝐼𝑇𝑌 (𝑠, 𝑝) – Utility function: A numerical value of a terminal
state 𝑠 for a player 𝑝
• E.g., chess: win (+1), lose (-1) and draw (0), backgammon: [0, 192]
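
A minimal sketch of this formalization as a Python interface (the method
names are illustrative, not from any particular library):

  import abc

  class Game(abc.ABC):
      # S0: how the game is set up at the start
      @abc.abstractmethod
      def initial_state(self): ...

      # PLAYER(s): which player has the move in state s (MAX or MIN)
      @abc.abstractmethod
      def player(self, s): ...

      # ACTIONS(s): the set of legal moves in state s
      @abc.abstractmethod
      def actions(self, s): ...

      # RESULT(s, a): transition model, the state reached by move a in s
      @abc.abstractmethod
      def result(self, s, a): ...

      # TERMINAL-TEST(s): true when the game has ended in state s
      @abc.abstractmethod
      def terminal_test(self, s): ...

      # UTILITY(s, p): numeric value of terminal state s for player p
      @abc.abstractmethod
      def utility(self, s, p): ...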
The game tree of Tic-Tac-Toe

Examples of game: Checkers
• Complexity
• ~10^18 nodes, which may require 100k years at 10^6 positions/sec
• Chinook (1989-2007)
• The first computer program that won the world champion title in a
competition against humans
• 1992: won 2 games in a match against world champion Marion Tinsley (final
score: 2-4, with 33 draws). 1994: 6 draws
• Chinook’s search
• Ran on regular PCs and played perfectly by using alpha-beta search
combined with a database of 39 trillion endgame positions
Examples of game: Chess
• Complexity
• b ≈ 35, d ≈ 100 → ~10^154 nodes (!!)
• Completely impractical to search this
• Deep Blue (May 11, 1997)
• Kasparov lost a 6-game match against IBM’s Deep Blue (1 win for
Kasparov, 2 wins for Deep Blue, and 3 draws).
• In the future, the focus will be on allowing computers to LEARN to
play chess rather than being TOLD how to play
Deep Blue
• Ran on a parallel computer with 30 IBM RS/6000
processors doing alpha–beta search
• Searched up to 30 billion positions per move, average depth 14 plies
(able to reach up to 40 plies)
• Evaluation function: 8000 features, many describing highly specific
patterns of pieces
• Opening book of ~4000 positions, plus a database of 700,000
grandmaster games
• Working at 200 million positions/sec, even Deep Blue
would require 10^100 years to evaluate all possible games.
• (The universe is only ~10^10 years old.)
Go
1 million trillion trillion trillion trillion more configurations than chess!
• Complexity
• Board of 19x19, b ≈ 361, average game depth d ≈ 200
• ~10^174 possible board configurations.
• Control of territory is unpredictable until the endgame
• AlphaGo (2016) by Google
• Beat 9-dan professional Lee Sedol (4-1)
• Machine learning + Monte Carlo tree search guided by a “value network”
and a “policy network” (implemented using deep neural networks)
• Learned from human games + learned by itself (self-play games)
An overview of AlphaGo
Optimal decisions in games
• Minimax algorithm
• Optimal decisions in multiplayer games
Optimal decisions in games
• Normal search problem
• The optimal solution is a sequence of actions leading to a goal state.
• Games
• The optimal strategy is a contingent plan that guarantees the best
achievable outcome for the player.
• It can be determined from the minimax value of each node:

MINIMAX(s) =
  UTILITY(s)                                   if TERMINAL-TEST(s)
  max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
  min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN

(assuming that both players play optimally from there to the end of the game)
An example of a two-ply game tree
Minimax algorithm
• Make a minimax decision from the current state, using a
recursive computation of minimax values at each successor
• The recursion proceeds all the way down to the leaves, and the minimax
values are then backed up through the tree as the recursion unwinds.
Minimax algorithm
function MINIMAX-DECISION(state) returns an action
  return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v
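
The same algorithm as a minimal Python sketch, assuming a game object with
the interface sketched earlier and, for simplicity, a utility(s) that
returns the terminal value from MAX’s viewpoint:

  import math

  def minimax_decision(state, game):
      # Choose the action whose successor has the highest MIN-VALUE
      return max(game.actions(state),
                 key=lambda a: min_value(game.result(state, a), game))

  def max_value(state, game):
      if game.terminal_test(state):
          return game.utility(state)   # terminal value from MAX's viewpoint
      v = -math.inf
      for a in game.actions(state):
          v = max(v, min_value(game.result(state, a), game))
      return v

  def min_value(state, game):
      if game.terminal_test(state):
          return game.utility(state)
      v = math.inf
      for a in game.actions(state):
          v = min(v, max_value(game.result(state, a), game))
      return v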
Properties of Minimax algorithm
• A complete depth-first exploration of the game tree
• Completeness
• Yes (if the tree is finite)
• Optimality
• Yes (against an optimal opponent)
• Time complexity
• O(b^m)
• Space complexity
• O(bm) (depth-first exploration)
Note: b = the number of legal moves at each point; m = the maximum depth of the tree
Optimality in multiplayer games
• A single value is replaced with a vector of values.
→ the UTILITY function returns a vector of utilities
• For terminal states, this vector gives the utility of the state
from each player’s viewpoint.
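
As a sketch, the backup rule becomes: whichever player moves in a state
picks the successor whose utility vector is best in that player’s own
component (utility_vector and the player-as-index convention are
illustrative assumptions):

  def multi_value(state, game):
      # Returns the backed-up utility VECTOR, e.g. (v_A, v_B, v_C)
      if game.terminal_test(state):
          return game.utility_vector(state)
      p = game.player(state)               # index of the player to move
      return max((multi_value(game.result(state, a), game)
                  for a in game.actions(state)),
                 key=lambda vec: vec[p])   # maximize own component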
Optimality in multiplayer games
• Multiplayer games usually involve alliances, which are made
and broken as the game proceeds.
Alpha-beta pruning
Problem with minimax search
• The number of game states is exponential in the tree’s depth
→ Do not examine every node
• Alpha-beta pruning: Prune away branches that cannot
possibly influence the final decision
• Bounded lookahead
• Limit the search depth
• This is what human chess players do: look ahead a few moves and pick
what looks best
Alpha-beta pruning: An example
Another way to look at this is as a simplification of the formula for MINIMAX.
Let the two unevaluated successors of node 𝐶 have values 𝑥 and 𝑦.
Then the value of the root node is given by

MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
              = max(3, min(2, x, y), 2)
              = max(3, z, 2)   where z = min(2, x, y) ≤ 2
              = 3

i.e., the value of the root is independent of the pruned leaves x and y.
Alpha-beta pruning
• If a move 𝑛 is determined to be worse than a move 𝑚 that has already
been examined, then examining 𝑛 any further is pointless: it can be pruned.
𝜶 = the value of the best (i.e., highest-value) choice we have found so far
at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far
at any choice point along the path for MIN.
Alpha-beta search algorithm
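
A Python sketch of the search, under the same assumptions as the minimax
sketch; α and β are passed down the tree and tightened as better choices
are found:

  import math

  def alpha_beta_decision(state, game):
      best_action, best_value = None, -math.inf
      for a in game.actions(state):
          v = min_value(game.result(state, a), game, -math.inf, math.inf)
          if v > best_value:
              best_action, best_value = a, v
      return best_action

  def max_value(state, game, alpha, beta):
      if game.terminal_test(state):
          return game.utility(state)
      v = -math.inf
      for a in game.actions(state):
          v = max(v, min_value(game.result(state, a), game, alpha, beta))
          if v >= beta:            # MIN above would never allow this line
              return v             # beta cutoff: prune the remaining moves
          alpha = max(alpha, v)
      return v

  def min_value(state, game, alpha, beta):
      if game.terminal_test(state):
          return game.utility(state)
      v = math.inf
      for a in game.actions(state):
          v = min(v, max_value(game.result(state, a), game, alpha, beta))
          if v <= alpha:           # MAX above already has a better choice
              return v             # alpha cutoff
          beta = min(beta, v)
      return v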
Properties of alpha-beta pruning
• Pruning does not affect the final result
• In the worst case it is no worse than plain minimax (no pruning occurs)
• Good move ordering improves effectiveness of pruning
• With "perfect ordering“: time complexity 𝑂(𝑏 𝑚/2 ) → x2 search depth
• The effective branching factor becomes 𝑏 instead of 𝑏.
• E.g., for chess, about 6 instead of 35.
Imperfect real-time decisions
• Evaluation functions
• Cutting off search
• Forward pruning
• Search versus Lookup
Heuristic minimax
• Both minimax and alpha-beta pruning search all the way to
terminal states.
• Searching that deep is usually impractical, because moves must be made
in a reasonable amount of time (~minutes).
• Cut off the search earlier with some depth limit
• Use an evaluation function
• An estimate of the desirability of a position (win, lose, or draw?)
Evaluation functions
• The evaluation function should order the terminal states in
the same way as the true utility function does
• States that are wins must evaluate better than draws, which in turn
must be better than losses.
• The computation must not take too long!
• For nonterminal states, the ordering should be strongly
correlated with the actual chances of winning.
Evaluation functions
• For chess, typically linear weighted sum of features
Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
• where 𝑓𝑖 could be the numbers of each kind of piece on the board,
and 𝑤𝑖 could be the values of the pieces
• E.g., 𝐸𝑣𝑎𝑙(𝑠) = 9𝑞 + 5𝑟 + 3𝑏 + 3𝑛 + 𝑝
• Implicit strong assumption: the contribution of each feature is
independent of the values of the other features.
• E.g., assigning the value 3 to a bishop ignores the fact that bishops are
more powerful in the endgame → use a nonlinear combination
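
A sketch of the linear, material-only evaluation above in Python;
count(s, piece) is a hypothetical helper returning (number of MAX’s pieces
minus number of MIN’s pieces) of the given kind:

  # Piece weights w_i: queen 9, rook 5, bishop 3, knight 3, pawn 1
  WEIGHTS = {'q': 9, 'r': 5, 'b': 3, 'n': 3, 'p': 1}

  def eval_material(s, count):
      # Eval(s) = w1*f1(s) + ... + wn*fn(s), with f_i a material difference
      return sum(w * count(s, piece) for piece, w in WEIGHTS.items())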
Cutting off search
• MINIMAX-CUTOFF is identical to MINIMAX-VALUE except that
1. 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙? is replaced by 𝐶𝑢𝑡𝑜𝑓𝑓?
2. 𝑈𝑡𝑖𝑙𝑖𝑡𝑦 is replaced by 𝐸𝑣𝑎𝑙
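
A sketch of the resulting heuristic minimax in Python (same assumptions as
before, plus a hypothetical game.eval heuristic):

  def h_minimax(state, game, depth, limit, maximizing):
      # CUTOFF? replaces TERMINAL-TEST: stop at terminals or at the limit
      if game.terminal_test(state) or depth >= limit:
          return game.eval(state)      # EVAL replaces UTILITY
      values = [h_minimax(game.result(state, a), game,
                          depth + 1, limit, not maximizing)
                for a in game.actions(state)]
      return max(values) if maximizing else min(values)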
A more sophisticated cutoff test
• Quiescent positions are those unlikely to exhibit wild swings
in value in the near future.
• E.g., in chess, positions in which favorable captures can be made
are not quiescent for an evaluation function counting material only
• Quiescence search: expand nonquiescent positions until
quiescent positions are reached.
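
A sketch of how the cutoff test can defer to quiescence search;
has_pending_capture and game.capture_moves are hypothetical, chess-flavored
names, and real implementations also bound this extra search:

  def cutoff_value(state, game, maximizing):
      if not has_pending_capture(state):    # quiescent: safe to apply EVAL
          return game.eval(state)
      # Nonquiescent: expand only the "noisy" moves (captures) and back up
      values = [cutoff_value(game.result(state, a), game, not maximizing)
                for a in game.capture_moves(state)]
      if not values:
          return game.eval(state)
      return max(values) if maximizing else min(values)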
Quiescent positions: An example
Two chess positions that differ only in the position of the rook at lower right.
In (a), Black has an advantage of a knight and two pawns, which should be
enough to win the game. In (b), White will capture the queen, giving it an
advantage that should be strong enough to win.
A more sophisticated cutoff test
• Horizon effect: The program is facing an inevitable serious loss and
temporarily avoids it by delaying tactics, pushing it beyond the search horizon.
A more sophisticated cutoff test
• Singular extension: a move that is “clearly better” than all
other moves in a given position.
• The search is extended along a legal singular-extension move → a deeper
search tree, yet still cheap, since only a few moves qualify as singular.
• Beam search
• Forward pruning: consider only a “beam” of the 𝑛 best moves
• Most humans consider only a few moves from each position
• PROBCUT, or probabilistic cut, algorithm (Buro, 1995)
• Search vs. Lookup
• Use table lookup rather than search for the opening and the endgame
Stochastic games
Stochastic behaviors
• Uncertain outcomes controlled by chance, not an adversary!
• Why wouldn’t we know what the result of an action will be?
• Explicit randomness: rolling dice
• Unpredictable opponents: the ghosts respond randomly
• Actions can fail: when a robot is moving, wheels might slip
Expectimax search
• Values reflect the average-case (expectimax) outcomes, not
worst-case (minimax) outcomes
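
A minimal expectimax sketch in Python with alternating MAX and chance
nodes; game.chance_outcomes(s), yielding (probability, state) pairs, is an
assumed helper:

  def emax_value(state, game):
      if game.terminal_test(state):
          return game.utility(state)
      return max(chance_value(game.result(state, a), game)
                 for a in game.actions(state))

  def chance_value(state, game):
      if game.terminal_test(state):
          return game.utility(state)
      # Probability-weighted average instead of a worst-case min
      return sum(p * emax_value(s2, game)
                 for p, s2 in game.chance_outcomes(state))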
Expectimax pruning
• Pruning is only possible with knowledge of a fixed, bounded range of leaf values.
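
A sketch of why such bounds help: if every leaf value lies in [lo, hi], a
chance node can be abandoned as soon as even the most optimistic completion
cannot beat MAX’s best alternative α (names here are illustrative):

  def chance_value_with_pruning(outcomes, value_of, alpha, hi):
      # outcomes: (probability, child) pairs; all values known to be <= hi
      total, remaining = 0.0, 1.0
      for p, child in outcomes:
          total += p * value_of(child)
          remaining -= p
          optimistic = total + remaining * hi   # best this node could still be
          if optimistic <= alpha:
              return optimistic                 # prune the remaining children
      return total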
Depth-limited expectimax
THE END