AI Notes-UNIT-3
UNIT III
ADVERSARIAL SEARCH AND GAMES
3.1 Game theory:
• Competitive environments, in which two or more agents have conflicting goals, give rise to adversarial
search problems.
• There are at least three stances we can take towards multi-agent environments.
• The first stance, appropriate when there are a very large number of agents, is to consider them in the
aggregate as an economy, allowing us to do things like predict that increasing demand will cause prices
to rise, without having to predict the action of any individual agent.
• Second, we could consider adversarial agents as just a part of the environment—a part that makes the
environment nondeterministic.
• The third stance is to explicitly model the adversarial agents with the techniques of adversarial game-tree
search, chief among them minimax search, a generalization of AND–OR search. Pruning makes the search more
efficient by ignoring portions of the search tree that make no difference to the optimal move.
• For nontrivial games, we will usually not have enough time to be sure of finding the optimal move; we
will have to cut off the search at some point.
Two-player zero-sum games:
• The games most commonly studied within AI are what game theorists call deterministic, two-player,
turn-taking, perfect information, zero-sum games.
• “Perfect information” is a synonym for “fully observable,” and “zero-sum” means that what is good for
one player is just as bad for the other: there is no “win-win” outcome.
• For games we often use the term move as a synonym for “action” and position as a synonym for
“state.”
• There are two players, MAX and MIN. MAX moves first, and then the players take turns moving until the
game is over. At the end of the game, points are awarded to the winning player and penalties are given to
the loser.
• A game can be formally defined with the following elements:
– S0: the initial state, which specifies how the game is set up at the start.
– TO-MOVE(s): the player whose turn it is to move in state s.
– ACTIONS(s): the set of legal moves in state s.
– RESULT(s, a): the transition model, which defines the state resulting from taking action a in state s.
– IS-TERMINAL(s): a terminal test, which is true when the game is over.
– UTILITY(s, p): a utility function, which defines the final numeric value to player p when the game ends in terminal state s.
• The initial state, ACTIONS function, and RESULT function define the state space graph—a graph where
the vertices are states, the edges are moves, and a state might be reached by multiple paths.
• We superimpose a search tree over part of that graph to determine what move to make. We define the
complete game tree as a search tree that follows every sequence of moves all the way to a terminal state.
• The game tree may be infinite if the state space itself is unbounded or if the rules of the game allow for
infinitely repeating positions.
• In tic-tac-toe, for example, MAX has nine possible moves from the initial state. Play alternates between
MAX’s placing an X and MIN’s placing an O until we reach leaf nodes corresponding to terminal states such
that one player has three squares in a row or all the squares are filled. The number on each leaf node
indicates the utility value of the terminal state from the point of view of MAX; high values are good for
MAX and bad for MIN.
• For tic-tac-toe the game tree is relatively small—fewer than 9! = 362,880 terminal nodes (with only 5,478
distinct states). But for chess there are over 10^40 nodes, so the game tree is best thought of as a
theoretical construct that we cannot realize in the physical world.
3.2 Optimal Decisions in Games
• MAX wants to find a sequence of actions leading to a win, but MIN has something to say about it. This
means that MAX’s strategy must be a conditional plan—a contingent strategy specifying a response to
each of MIN’s possible moves.
• In games that have a binary outcome (win or lose), we could use AND–OR search to generate the
conditional plan.
• In fact, for such games, the definition of a winning strategy for the game is identical to the definition of a
solution for a nondeterministic planning problem: in both cases the desirable outcome must be
guaranteed no matter what the “other side” does.
• For games with multiple outcome scores, we need a slightly more general algorithm called minimax
search.
• The possible moves for MAX at the root node are labeled a1, a2, and a3. The possible replies to a1 for
MIN are b1, b2, b3, and so on. This particular game ends after one move each by MAX and MIN. The
utilities of the terminal states in this game range from 2 to 14.
• Given a game tree, the optimal strategy can be determined by working out the minimax value of each
state in the tree, which we write as MINIMAX(s).
• The minimax value is the utility (for MAX) of being in that state, assuming that both players play
optimally from there to the end of the game.
• The minimax value of a terminal state is just its utility. In a nonterminal state, MAX prefers to move to a
state of maximum value when it is MAX’s turn to move, and MIN prefers a state of minimum value.
• The terminal nodes on the bottom level get their utility values from the game’s UTILITY function. The
first MIN node, labeled B, has three successor states with values 3, 12, and 8, so its minimax value is 3.
• Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node; its successor
states have minimax values 3, 2, and 2; so it has a minimax value of 3.
• We can also identify the minimax decision at the root: action a1 is the optimal choice for MAX because it
leads to the state with the highest minimax value.
The minimax search algorithm
• Now that we can compute MINIMAX(s), we can turn that into a search algorithm that finds the best
move for MAX by trying all actions and choosing the one whose resulting state has the highest MINIMAX
value.
• MINIMAX is a recursive algorithm that proceeds all the way down to the leaves of the tree and then backs up
the minimax values through the tree as the recursion unwinds.
• For example, in Figure 5.2, the algorithm first recurses down to the three bottom-left nodes and uses the
UTILITY function on them to discover that their values are 3, 12, and 8, respectively.
• Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B. A similar
process gives the backed-up values of 2 for C and 2 for D.
• Finally, we take the maximum of 3, 2, and 2 to get the backed-up value of 3 for the root node.
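• As an illustration, here is a minimal Python sketch of this procedure over the one-move game above. The list-based tree encoding (leaves as utilities, internal nodes as lists of children) is an assumption for illustration, not part of the notes:

```python
# Minimal minimax sketch over the one-move game: MAX picks among a1..a3,
# MIN replies, and leaves carry utilities for MAX.

def minimax(node, is_max):
    """Return the minimax value of `node` (a leaf utility or a list of children)."""
    if isinstance(node, int):          # terminal state: utility for MAX
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# The game from the notes: B, C, D are MIN nodes under a MAX root.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
values = [minimax(child, is_max=False) for child in tree]
print(values)        # [3, 2, 2]
print(max(values))   # 3 -> MAX's optimal move is a1
```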
• The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum
depth of the tree is m and there are b legal moves at each point, then the time complexity of the minimax
algorithm is O(b^m).
• The space complexity is O(bm) for an algorithm that generates all actions at once, or O(m) for an
algorithm that generates actions one at a time.
• The exponential complexity makes MINIMAX impractical for complex games; for example, chess has a
branching factor of about 35 and the average game has a depth of about 80 ply, and it is not feasible to
search 35^80 ≈ 10^123 states.
• MINIMAX does, however, serve as a basis for the mathematical analysis of games. By approximating the
minimax analysis in various ways, we can derive more practical algorithms.
Optimal decisions in multiplayer games
• First, we need to replace the single value for each node with a vector of values. For example, in a three-
player game with players A, B, and C, a vector (vA,vB,vC) is associated with each node.
• For terminal states, this vector gives the utility of the state from each player’s viewpoint.
• The simplest way to implement this is to have the UTILITY function return a vector of utilities.
• Now we have to consider nonterminal states. Consider the node marked X in the game tree. In that state,
player C chooses what to do.
• The two choices lead to terminal states with utility vectors (vA = 1, vB = 2, vC = 6) and (vA = 4, vB = 2, vC = 3).
Since 6 is bigger than 3, C should choose the first move.
• This means that if state X is reached, subsequent play will lead to a terminal state with utilities
(vA = 1, vB = 2, vC = 6). Hence, the backed-up value of X is this vector.
• In general, the backed-up value of a node n is the utility vector of the successor state with the highest
value for the player choosing at n.
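• A hedged sketch of this vector-valued backup, using an illustrative encoding (terminal states as utility tuples, internal nodes as lists of children):

```python
# Vector-valued backup for a three-player game (players 0, 1, 2). Leaves hold
# utility vectors (vA, vB, vC); an internal node takes the child vector that is
# best for the player to move. The node encoding is an assumption.

def backup(node, player, num_players=3):
    if isinstance(node, tuple):                     # terminal: utility vector
        return node
    children = [backup(child, (player + 1) % num_players) for child in node]
    return max(children, key=lambda v: v[player])   # best for mover at this node

# Node X from the notes: player C (index 2) chooses between two terminals.
x = [(1, 2, 6), (4, 2, 3)]
print(backup(x, player=2))   # (1, 2, 6) -> C prefers vC = 6
```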
• Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances
are made and broken as the game proceeds.
• For example, suppose A and B are in weak positions and C is in a stronger position. Then it is often
optimal for both A and B to attack C rather than each other, lest C destroy each of them individually.
• In this way, collaboration emerges from purely selfish behavior. Of course, as soon as C weakens under
the joint onslaught, the alliance loses its value, and either A or B could violate the agreement.
• In some cases, explicit alliances merely make concrete what would have happened anyway. In other
cases, a social stigma attaches to breaking an alliance, so players must balance the immediate advantage
of breaking an alliance against the long-term disadvantage of being perceived as untrustworthy.
3.3 Alpha–Beta Pruning
• The number of game states is exponential in the depth of the tree. No algorithm can completely
eliminate the exponent, but we can sometimes cut it in half, computing the correct minimax decision
without examining every state by pruning large parts of the tree that make no difference to the
outcome.
• The particular technique we examine is called alpha–beta pruning.
• Let’s go through the calculation of the optimal decision once more, this time paying careful attention to
what we know at each point in the process. The outcome is that we can identify the minimax decision
without ever evaluating two of the leaf nodes.
• Another way to look at this is as a simplification of the formula for MINIMAX. Let the two unevaluated
successors of node C have values x and y. Then the value of the root node is given by
MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
= max(3, min(2, x, y), 2)
= max(3, z, 2) where z = min(2, x, y) ≤ 2
= 3.
• In other words, the value of the root and hence the minimax decision are independent of the values of
the leaves x and y, and therefore they can be pruned.
• Alpha–beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees
rather than just leaves.
• The general principle is this: consider a node n somewhere in the tree such that Player has a choice of
moving to n.
• If Player has a better choice either at the same level or at any point higher up in the tree, then Player will
never move to n. So once we have found out enough about n to reach this conclusion, we can prune it.
• Remember that minimax search is depth-first, so at any one time we just have to consider the nodes
along a single path in the tree.
• Alpha–beta pruning gets its name from the two extra parameters in MAX-VALUE(state,α,β) that describe
bounds on the backed-up values that appear anywhere along the path:
• α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the
path for MAX. Think: α = “at least.”
• β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the
path for MIN. Think: β = “at most.”
• Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches at a
node (i.e., terminates the recursive call) as soon as the value of the current node is known to be worse
than the current α or β value for MAX or MIN, respectively.
• (a) The first leaf below B has the value 3. Hence, B, which is a MIN node, has a value of at most 3.
• (b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at
most 3.
• (c) The third leaf below B has a value of 8; we have seen all B’s successor states, so the value of B is
exactly 3. Now we can infer that the value of the root is at least 3, because MAX has a choice worth 3 at
the root.
• (d) The first leaf below C has the value 2. Hence, C, which is a MIN node, has a value of at most 2. But we
know that B is worth 3, so MAX would never choose C. Therefore, there is no point in looking at the
other successor states of C. This is an example of alpha–beta pruning.
• (e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX’s best
alternative (i.e., 3), so we need to keep exploring D’s successor states. Notice also that we now have
bounds on all of the successors of the root, so the root’s value is also at most 14.
• (f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is
worth 2, so now D is worth exactly 2. MAX’s decision at the root is to move to B, giving a value of 3.
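• The following Python sketch of alpha–beta search reproduces this example; the leaf-list tree encoding is again an illustrative assumption. Note how the test against α cuts off C’s remaining successors:

```python
# Compact alpha-beta sketch. alpha = best value found so far for MAX along the
# current path; beta = best (lowest) value found so far for MIN.

def alpha_beta(node, is_max, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):                 # terminal utility
        return node
    if is_max:
        value = float("-inf")
        for child in node:
            value = max(value, alpha_beta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if value >= beta:                 # MIN already has a better option
                break                         # prune remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alpha_beta(child, True, alpha, beta))
            beta = min(beta, value)
            if value <= alpha:                # MAX already has a better option
                break                         # prune remaining children
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]    # the leaves 4 and 6 under C get pruned
print(alpha_beta(tree, is_max=True))          # 3
```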
Move ordering
• The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are
examined.
• Adding dynamic move-ordering schemes, such as trying first the moves that were found to be best in the
past, brings us quite close to the theoretical limit. The past could be the previous move—often the same
threats remain—or it could come from previous exploration of the current move through a process of
iterative deepening. First, search one ply deep and record the ranking of moves based on their
evaluations. Then search one ply deeper, using the previous ranking to inform move ordering; and so on.
The increased search time from iterative deepening can be more than made up for by better move
ordering. The best moves are known as killer moves, and trying them first is called the killer move
heuristic.
• In game tree search, repeated states can occur because of transpositions—different permutations of
the move sequence that end up in the same position. The problem can be addressed with a
transposition table that caches the heuristic value of states.
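• A minimal sketch of a transposition table as a memoization cache, assuming hypothetical game-specific successors and utility functions supplied by the caller:

```python
# Transposition table sketch: cache the values of positions already searched so
# transposed move orders reuse the work. `successors(state, is_max)` and
# `utility(state)` are hypothetical game-specific functions, assumed supplied.

transposition_table = {}

def cached_minimax(state, is_max, successors, utility):
    key = (state, is_max)                    # state must be hashable
    if key in transposition_table:
        return transposition_table[key]      # position seen before: reuse value
    children = successors(state, is_max)
    if not children:                         # terminal position
        value = utility(state)
    else:
        values = [cached_minimax(s, not is_max, successors, utility)
                  for s in children]
        value = max(values) if is_max else min(values)
    transposition_table[key] = value
    return value
```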
3.4 Monte Carlo Tree Search
• Monte Carlo tree search (MCTS) estimates the value of a state as the average utility over a number of
simulations (playouts) of complete games starting from that state. Each iteration of the search consists of
selection, expansion, simulation, and back-propagation; the last two steps are described below.
• Simulation: We perform a playout from the newly generated child node, choosing moves for both players
according to the playout policy. These moves are not recorded in the search tree. In the figure, the
simulation results in a win for black.
• Back-propagation: We now use the result of the simulation to update all the search tree nodes going up
to the root. Since black won the playout, black nodes are incremented in both the number of wins and
the number of playouts, so 27/35 becomes 28/36 and 60/79 becomes 61/80. Since white lost, the white
nodes are incremented in the number of playouts only, so 16/53 becomes 16/54 and the root 37/100
becomes 37/101.
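• A minimal back-propagation sketch matching this description; the Node class is an assumption for illustration:

```python
# Back-propagation: every node on the path to the root gets one more playout;
# nodes belonging to the winning player also get one more win.

class Node:
    def __init__(self, player, parent=None):
        self.player = player        # 'black' or 'white': whose node this is
        self.parent = parent
        self.wins = 0
        self.playouts = 0

def back_propagate(leaf, winner):
    node = leaf
    while node is not None:
        node.playouts += 1
        if node.player == winner:   # e.g. 27/35 -> 28/36 for black after a black win
            node.wins += 1
        node = node.parent
```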
3.5 Stochastic Games
• Stochastic games, such as backgammon, include a random element such as the throwing of dice; the game
tree must therefore include chance nodes in addition to MAX and MIN nodes.
• The branches leading from each chance node denote the possible dice rolls; each branch is labeled with
the roll and its probability. There are 36 ways to roll two dice, each equally likely; but because a 6–5 is
the same as a 5–6, there are only 21 distinct rolls.
• The six doubles (1–1 through 6–6) each have a probability of 1/36, so we say P(1–1) = 1/36. The other
15 distinct rolls each have a 1/18 probability.
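• A quick Python check of the arithmetic above, assuming nothing beyond the standard library; it confirms that 36 ordered rolls collapse to 21 unordered ones:

```python
# Doubles occur one way each (probability 1/36); every other distinct roll
# occurs two ways (2/36 = 1/18).

from fractions import Fraction
from itertools import product

rolls = {}
for a, b in product(range(1, 7), repeat=2):
    key = tuple(sorted((a, b)))
    rolls[key] = rolls.get(key, 0) + 1

print(len(rolls))                    # 21 distinct rolls
print(Fraction(rolls[(1, 1)], 36))   # 1/36 for a double
print(Fraction(rolls[(5, 6)], 36))   # 1/18 for 6-5 (same as 5-6)
```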
• The next step is to understand how to make correct decisions. Obviously, we still want to pick the move
that leads to the best position. However, positions do not have definite minimax values.
• Instead, we can only calculate the expected value of a position: the average over all possible outcomes of
the chance nodes.
• This leads us to the expectiminimax value for games with chance nodes, a generalization of the
minimax value for deterministic games.
• Terminal nodes and MAX and MIN nodes work exactly the same way as before. For chance nodes we
compute the expected value, which is the sum of the value over all outcomes, weighted by the
probability of each chance action:
EXPECTIMINIMAX(s) = Σ_r P(r) · EXPECTIMINIMAX(RESULT(s, r)) when it is chance’s turn to move,
• where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same state as s,
with the additional fact that the result of the dice roll is r.
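• A hedged expectiminimax sketch; by assumption a node is encoded as a number (terminal utility), ("max", children), ("min", children), or ("chance", [(probability, child), ...]):

```python
# Expectiminimax: MAX and MIN nodes back up max/min as before; a chance node
# backs up the probability-weighted average of its outcomes.

def expectiminimax(node):
    if isinstance(node, (int, float)):        # terminal utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: expected value over all outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# A toy game: MAX chooses between two chance nodes over MIN replies.
game = ("max", [
    ("chance", [(0.5, ("min", [3, 12])), (0.5, ("min", [2, 4]))]),
    ("chance", [(1.0, ("min", [5, 8]))]),
])
print(expectiminimax(game))   # max(0.5*3 + 0.5*2, 5) = 5.0
```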
3.6 Partially Observable Games
• Video games such as StarCraft are particularly challenging, being partially observable, multi-agent,
nondeterministic, dynamic, and unknown.
• In deterministic partially observable games, uncertainty about the state of the board arises entirely from
lack of access to the choices made by the opponent.
• This class includes children’s games such as Battleship (where each player’s ships are placed in
locations hidden from the opponent) and Stratego.
• Other games also have partially observable versions: Phantom Go, Phantom tic-tac-toe, and Screen
Shogi.
Kriegspiel: Partially observable chess
• The rules of Kriegspiel are as follows:
• White and Black each see a board containing only their own pieces. A referee, who can see all the pieces,
adjudicates the game and periodically makes announcements that are heard by both players.
• First, White proposes to the referee a move that would be legal if there were no black pieces.
• If the black pieces prevent the move, the referee announces “illegal,” and White keeps proposing moves
until a legal one is found—learning more about the location of Black’s pieces in the process.
• Once a legal move is proposed, the referee announces one or more of the following: “Capture on square
X” if there is a capture, and “Check by D” if the black king is in check, where D is the direction of the
check, and can be one of “Knight,” “Rank,” “File,” “Long diagonal,” or “Short diagonal.”
• If Black is checkmated or stalemated, the referee says so; otherwise, it is Black’s turn to move.
• Kriegspiel may seem terrifyingly impossible, but humans manage it quite well and computer programs
are beginning to catch up. It helps to recall the notion of a belief state—the set of all logically possible
board states given the complete history of percepts to date.
• Initially, White’s belief state is a singleton because Black’s pieces haven’t moved yet. After White makes a
move and Black responds, White’s belief state contains 20 positions, because Black has 20 replies to any
opening move.
• Keeping track of the belief state as the game progresses is exactly the problem of state estimation.
• We can map Kriegspiel state estimation directly onto the partially observable, nondeterministic
framework if we consider the opponent as the source of nondeterminism; that is, the RESULTS of
White’s move are composed from the (predictable) outcome of White’s own move and the unpredictable
outcome given by Black’s reply.
• For a partially observable game, the notion of a strategy is altered; instead of specifying a move to make
for each possible move the opponent might make, we need a move for every possible percept sequence
that might be received.
• For Kriegspiel, a winning strategy, or guaranteed checkmate, is one that, for each possible percept
sequence, leads to an actual checkmate for every possible board state in the current belief state,
regardless of how the opponent moves.
• With this definition, the opponent’s belief state is irrelevant—the strategy has to work even if the
opponent can see all the pieces. This greatly simplifies the computation.
• The general AND-OR search algorithm can be applied to the belief-state space to find guaranteed
checkmates. The incremental belief-state algorithm often finds midgame checkmates up to depth 9—
well beyond the abilities of most human players.
• In addition to guaranteed checkmates, Kriegspiel admits an entirely new concept that makes no sense in
fully observable games: probabilistic checkmate.
• Such checkmates are still required to work in every board state in the belief state; they are
probabilistic with respect to randomization of the winning player’s moves.
• To get the basic idea, consider the problem of finding a lone black king using just the white king. Simply
by moving randomly, the white king will eventually bump into the black king even if the latter tries to
avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology
of probability theory, detection occurs with probability 1.
• The KBNK endgame—king, bishop and knight versus king—is won in this sense; White presents Black
with an infinite random sequence of choices, for one of which Black will guess incorrectly and reveal his
position, leading to checkmate.
• On the other hand, the KBBK endgame is won with probability 1 − ε. White can force a win only by
leaving one of his bishops unprotected for one move. If Black happens to be in the right place and
captures the bishop, the game is drawn.
• White can choose to make the risky move at some randomly chosen point in the middle of a very long
sequence, thus reducing ε to an arbitrarily small constant, but cannot reduce ε to zero.
• Sometimes a checkmate strategy works for some of the board states in the current belief state but not
others. Trying such a strategy may succeed, leading to an accidental checkmate—accidental in the sense
that White could not know that it would be checkmate—if Black’s pieces happen to be in the right places.
• This idea leads naturally to the question of how likely it is that a given strategy will win, which leads in
turn to the question of how likely it is that each board state in the current belief state is the true board
state.
• Each player’s goal is not just to move pieces to the right squares but also to minimize the information
that the opponent has about their location. Playing any predictable “optimal” strategy provides the
opponent with information. Hence, optimal play in partially observable games requires a willingness to
play somewhat randomly.
3.7 Constraint Satisfaction Problems
• A constraint satisfaction problem (CSP) consists of a set of variables, a domain of values for each
variable, and a set of constraints specifying allowable combinations of values. In the Australia
map-coloring problem, each region is a variable with domain {red, green, blue}, and adjacent regions are
constrained to have different colors; the allowed value pairs for two adjacent regions are
{(red,green), (red,blue), (green,red), (green,blue), (blue,red), (blue,green)}.
• There are many possible solutions to this problem, such as {WA = red, NT = green, Q = red, NSW = green,
V = red, SA = blue, T = red}.
• It can be helpful to visualize a CSP as a constraint graph.
• The nodes of the graph correspond to variables of the problem, and an edge connects any two variables
that participate in a constraint.
• The best-known category of continuous-domain CSPs is that of linear programming problems, where
constraints must be linear equalities or inequalities. Linear programming problems can be solved in
time polynomial in the number of variables.
• Besides the types of variables that can appear in CSPs, it is useful to look at the types of constraints. The simplest
type is the unary constraint, which restricts the value of a single variable.
• A binary constraint relates two variables. For example, SA ≠ NSW is a binary constraint. A binary CSP is
one with only unary and binary constraints; it can be represented as a constraint graph.
• The ternary constraint Between(X, Y, Z), for example, can be defined as ⟨(X, Y, Z), X < Y < Z or X > Y > Z⟩.
• A constraint involving an arbitrary number of variables is called a global constraint.
Cryptarithmetic puzzles
• Each letter in a cryptarithmetic puzzle represents a different digit.
• For the puzzle TWO + TWO = FOUR, the global constraint is represented as Alldiff(F, T, U, W, R, O).
• The addition constraints on the four columns of the puzzle can be written as the following n-ary
constraints:
• O + O = R + 10·C1
• C1 + W + W = U + 10·C2
• C2 + T + T = O + 10·C3
• C3 = F,
• where C1, C2, and C3 are auxiliary variables representing the digit carried over into the tens, hundreds,
or thousands column. These constraints can be represented in a constraint hypergraph.
• A hypergraph consists of ordinary nodes and hypernodes (the squares), which represent n-ary
constraints— constraints involving n variables.
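• These column constraints are easy to verify by brute force. Below is a small Python sketch that searches for an assignment satisfying the Alldiff and column constraints; it is written for clarity rather than speed, and the nonzero-leading-digit rule is a standard assumption:

```python
# Brute-force solver for TWO + TWO = FOUR using the column constraints above.
# Alldiff over F, T, U, W, R, O comes for free from permutations().

from itertools import permutations

for F, T, U, W, R, O in permutations(range(10), 6):
    if F == 0 or T == 0:                       # leading digits must be nonzero
        continue
    C1, r0 = divmod(O + O, 10)                 # ones column: O + O = R + 10*C1
    C2, r1 = divmod(C1 + W + W, 10)            # tens column
    C3, r2 = divmod(C2 + T + T, 10)            # hundreds column
    if (r0, r1, r2) == (R, U, O) and C3 == F:
        print(f"{T}{W}{O} + {T}{W}{O} = {F}{O}{U}{R}")
        break                                  # e.g. 734 + 734 = 1468
```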
• Another way to convert an n-ary CSP to a binary one is the dual graph transformation: create a new
graph in which there will be one variable for each constraint in the original graph, and one binary
constraint for each pair of constraints in the original graph that share variables.
3.8 Constraint Propagation: Inference in CSPs
• An atomic state-space search algorithm makes progress in only one way: by expanding a node to visit
the successors.
• A CSP algorithm has choices. It can generate successors by choosing a new variable assignment, or it can
do a specific type of inference called constraint propagation: using the constraints to reduce the number
of legal values for a variable, which in turn can reduce the legal values for another variable, and so on.
• The idea is that this will leave fewer choices to consider when we make the next choice of a variable
assignment.
• Constraint propagation may be intertwined with search, or it may be done as a preprocessing step,
before search starts. Sometimes this preprocessing can solve the whole problem, so no search is
required at all.
• The key idea is local consistency. If we treat each variable as a node in a graph and each binary
constraint as an edge, then the process of enforcing local consistency in each part of the graph causes
inconsistent values to be eliminated throughout the graph.
• There are different types of local consistency.
Arc consistency
• A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s binary
constraints.
• More formally, Xi is arc-consistent with respect to another variable Xj if for every value in the current
domain Di there is some value in the domain Dj that satisfies the binary constraint on the arc (Xi, Xj). A
graph is arc-consistent if every variable is arc-consistent with every other variable.
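• The standard AC-3 algorithm enforces arc consistency. A sketch under stated assumptions: domains maps variables to sets of values, neighbors maps each variable to the set of variables it shares a binary constraint with, and constraint(Xi, x, Xj, y) returns True when the value pair is allowed:

```python
# AC-3: keep a queue of arcs; whenever revising an arc shrinks a domain,
# requeue the arcs pointing into the revised variable.

from collections import deque

def revise(domains, Xi, Xj, constraint):
    """Remove values from Di that have no supporting value in Dj."""
    removed = {x for x in domains[Xi]
               if not any(constraint(Xi, x, Xj, y) for y in domains[Xj])}
    domains[Xi] -= removed
    return bool(removed)

def ac3(domains, neighbors, constraint):
    queue = deque((Xi, Xj) for Xi in domains for Xj in neighbors[Xi])
    while queue:
        Xi, Xj = queue.popleft()
        if revise(domains, Xi, Xj, constraint):
            if not domains[Xi]:
                return False                   # empty domain: no solution
            for Xk in neighbors[Xi] - {Xj}:
                queue.append((Xk, Xi))         # recheck arcs into Xi
    return True
```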
Node consistency
• A single variable (corresponding to a node in the CSP graph) is node-consistent if all the values in the
variable’s domain satisfy the variable’s unary constraints.
• A graph is node-consistent if every variable in the graph is node-consistent.
• It is easy to eliminate all the unary constraints in a CSP by reducing the domain of variables with unary
constraints at the start of the solving process.
Path consistency
• Arc consistency tightens down the domains (unary constraints) using the arcs (binary constraints). To
make progress on problems like map coloring, we need a stronger notion of consistency.
• Path consistency tightens the binary constraints by using implicit constraints that are inferred by
looking at triples of variables.
• A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for every assignment
{Xi = a, Xj = b} consistent with the constraints (if any) on {Xi, Xj}, there is an assignment to Xm that
satisfies the constraints on {Xi, Xm} and {Xm, Xj}. The name refers to the overall consistency of the path
from Xi to Xj with Xm in the middle.
K-consistency
• Stronger forms of propagation can be defined with the notion of k-consistency. A CSP is k-consistent if,
for any set of k−1 variables and for any consistent assignment to those variables, a consistent value can
always be assigned to any kth variable.
• 1-consistency says that, given the empty set, we can make any set of one variable consistent: this is what
we called node consistency.
• 2-consistency is the same as arc consistency.
• For binary constraint graphs, 3- consistency is the same as path consistency.
Global constraints
• A global constraint is one involving an arbitrary number of variables (but not necessarily all variables).
Global constraints occur frequently in real problems and can be handled by special-purpose algorithms
that are more efficient than the general-purpose methods described so far.
• One simple form of inconsistency detection for Alldiff constraints works as follows: if m variables are
involved in the constraint, and if they have n possible distinct values altogether, and m > n, then
the constraint cannot be satisfied.
• This leads to the following simple algorithm: First, remove any variable in the constraint that has a
singleton domain, and delete that variable’s value from the domains of the remaining variables.
• Repeat as long as there are singleton variables. If at any point an empty domain is produced or there are
more variables than domain values left, then an inconsistency has been detected.
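• A sketch of this simple Alldiff propagation, assuming domains is a dict mapping variables to sets of values:

```python
# Repeatedly strip singleton variables, deleting their value from the other
# domains; report inconsistency if a domain empties or the remaining variables
# outnumber the remaining distinct values (m > n).

def alldiff_consistent(domains):
    domains = {v: set(d) for v, d in domains.items()}   # work on a copy
    singletons = [v for v, d in domains.items() if len(d) == 1]
    while singletons:
        v = singletons.pop()
        (value,) = domains.pop(v)              # v's single remaining value
        for other, d in domains.items():
            if value in d:
                d.discard(value)
                if not d:
                    return False               # empty domain
                if len(d) == 1:
                    singletons.append(other)   # a new singleton to process
    values = set().union(*domains.values()) if domains else set()
    return len(domains) <= len(values)         # m > n means unsatisfiable

print(alldiff_consistent({"A": {1}, "B": {1, 2}, "C": {1, 2}}))  # False
```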
3.9 Backtracking Search for CSPs
• The constraint propagation process can finish and still leave variables with multiple possible
values. In that case we have to search for a solution.
• Backtracking search algorithms work on partial assignments.
• Consider how a standard depth-limited search could solve CSPs.
• A state would be a partial assignment, and an action would extend the assignment. For a CSP with n
variables of domain size d we would end up with a search tree where all the complete assignments are
leaf nodes at depth n. But notice that the branching factor at the top level would be nd because any of d
values can be assigned to any of n variables.
• At the next level, the branching factor is (n−1)d, and so on for n levels. So the tree has n!·d^n leaves,
even though there are only d^n possible complete assignments.
• Backtracking search repeatedly chooses an unassigned variable, and then tries all values in the domain of that variable in
turn, trying to extend each one into a solution via a recursive call.
• If the call succeeds, the solution is returned, and if it fails, the assignment is restored to the previous
state, and we try the next value. If no value works then we return failure.
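• A minimal backtracking sketch along these lines; the consistent check is an assumed problem-specific test of the constraints touching the variable just assigned:

```python
# One variable is assigned per recursion level (avoiding the n!*d^n blow-up);
# each value is tried in turn and undone on failure.

def backtrack(assignment, variables, domains, consistent):
    if len(assignment) == len(variables):
        return assignment                      # complete assignment found
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(var, assignment) and \
           backtrack(assignment, variables, domains, consistent):
            return assignment
        del assignment[var]                    # restore state, try next value
    return None                                # no value works: failure

# Usage on the Australia map-coloring CSP (adjacency encodes the constraints):
adj = {"WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"},
       "SA": {"WA", "NT", "Q", "NSW", "V"}, "Q": {"NT", "SA", "NSW"},
       "NSW": {"Q", "SA", "V"}, "V": {"SA", "NSW"}, "T": set()}
ok = lambda var, a: all(a.get(n) != a[var] for n in adj[var])
doms = {v: ["red", "green", "blue"] for v in adj}
print(backtrack({}, list(adj), doms, ok))
```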
Intelligent backtracking: looking backward
• A more intelligent approach is to backtrack to a variable that might fix the problem—a variable that was
responsible for making one of the possible values of SA impossible.
• To do this, we will keep track of a set of assignments that are in conflict with some value for SA. The set
(in this case {Q = red, NSW = green, V = blue}) is called the conflict set for SA.
• The backjumping method backtracks to the most recent assignment in the conflict set; in
this case, backjumping would jump over Tasmania and try a new value for V.
• This method is easily implemented by a modification to BACKTRACK such that it accumulates the
conflict set while checking for a legal value to assign.
• If no legal value is found, the algorithm should return the most recent element of the conflict set along
with the failure indicator.
Constraint learning
• Constraint learning is the idea of finding a minimum set of variables from the conflict set that causes the
problem. This set of variables, along with their corresponding values, is called a no-good. We then
record the no-good, either by adding a new constraint to the CSP to forbid this combination of
assignments or by keeping a separate cache of no-goods.
• No-goods can be effectively used by forward checking or by backjumping. Constraint learning is one of
the most important techniques used by modern CSP solvers to achieve efficiency on complex problems.
3.10 Local Search for CSPs
• Local search algorithms turn out to be very effective in solving many CSPs. They use a complete-state
formulation where each state assigns a value to every variable, and the search changes the value of one
variable at a time. As an example, we’ll use the 8-queens problem, defined as a CSP. In Figure 6.8 we
start on the left with a complete assignment to the 8 variables; typically this will violate several
constraints.
• We then randomly choose a conflicted variable, which turns out to be Q8, the rightmost column. We’d
like to change the value to something that brings us closer to a solution; the most obvious approach is to
select the value that results in the minimum number of conflicts with other variables—the min-conflicts
heuristic.
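• A min-conflicts sketch for 8-queens, representing the assignment as queens[c] = row of the queen in column c (an encoding chosen for illustration):

```python
# Each step repairs one conflicted column by moving its queen to the row that
# minimizes the number of conflicts with the other queens.

import random

def conflicts(queens, col, row):
    return sum(1 for c, r in enumerate(queens)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n=8, max_steps=10_000):
    queens = [random.randrange(n) for _ in range(n)]   # complete assignment
    for _ in range(max_steps):
        bad = [c for c in range(n) if conflicts(queens, c, queens[c]) > 0]
        if not bad:
            return queens                              # no conflicts: solved
        col = random.choice(bad)                       # a conflicted variable
        queens[col] = min(range(n), key=lambda r: conflicts(queens, col, r))
    return None

print(min_conflicts())   # e.g. [4, 2, 0, 6, 1, 7, 5, 3]
```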
• All the local search techniques are candidates for application to CSPs, and some of them have proved
especially effective. The landscape of a CSP under the min-conflicts heuristic usually has a series of
plateaus. There may be millions of variable assignments that are only one conflict away from a solution.
• Plateau search—allowing sideways moves to another state with the same score—can help local search
find its way off this plateau. This wandering on the plateau can be directed with a technique called tabu
search: keeping a small list of recently visited states and forbidding the algorithm to return to those
states. Simulated annealing can also be used to escape from plateaus.
• Another technique called constraint weighting aims to concentrate the search on the important
constraints. Each constraint is given a numeric weight, initially all 1.
• At each step of the search, the algorithm chooses a variable/value pair to change that will result in the
lowest total weight of all violated constraints. The weights are then adjusted by incrementing the weight
of each constraint that is violated by the current assignment.
• This has two benefits: it adds topography to plateaus, making sure that it is possible to improve from the
current state, and it also adds learning: over time the difficult constraints are assigned higher weights.
Cutset conditioning
• The first way to reduce a constraint graph to a tree involves assigning values to some variables so that
the remaining variables form a tree.
• The general algorithm is as follows:
1. Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after removal of
S. S is called a cycle cutset.
2. For each possible assignment to the variables in S that satisfies all constraints on S, (a) remove from
the domains of the remaining variables any values that are inconsistent with the assignment for S, and (b) if the
remaining CSP has a solution, return it together with the assignment for S.
• If the cycle cutset has size c, then the total run time is O(d^c · (n−c)d^2): we have to try each of the d^c
combinations of values for the variables in S, and for each combination we must solve a tree problem of
size n−c.
• Finding the smallest cycle cutset is NP-hard, but several efficient approximation algorithms are known.
The overall algorithmic approach is called cutset conditioning.
Tree decomposition
• The second way to reduce a constraint graph to a tree is based on constructing a tree decomposition of
the constraint graph: a transformation of the original graph into a tree where each node in the tree
consists of a set of variables.
• A tree decomposition must satisfy these three requirements:
• Every variable in the original problem appears in at least one of the tree nodes.
• If two variables are connected by a constraint in the original problem, they must appear together in at
least one of the tree nodes.
• If a variable appears in two nodes in the tree, it must appear in every node along the path connecting
those nodes.
• The first two conditions ensure that all the variables and constraints are represented in the tree
decomposition. The third condition seems rather technical, but allows us to say that any variable from
the original problem must have the same value wherever it appears: the constraints in the tree say that a
variable in one node of the tree must have the same value as the corresponding variable in the adjacent
node in the tree.
• The tree width of a tree decomposition of a graph is one less than the size of the largest node; the tree
width of the graph itself is defined to be the minimum tree width among all its tree decompositions. If a
graph has tree width w, then the problem can be solved in O(nd^(w+1)) time given the corresponding tree
decomposition. Hence, CSPs with constraint graphs of bounded tree width are solvable in polynomial
time.
Value symmetry
• Consider the map-coloring problem with d colors. For every consistent solution, there is actually a set of
d! solutions formed by permuting the color names. For example, on the Australia map we know that WA,
NT, and SA must all have different colors, but there are 3! = 6 ways to assign three colors to three
regions. This is called value symmetry. We would like to reduce the search space by a factor of d! by
breaking the symmetry in assignments. We do this by introducing a symmetry-breaking constraint.
• For map coloring, it was easy to find a constraint that eliminates the symmetry. In general it is NP-hard
to eliminate all symmetry, but breaking value symmetry has proved to be important and effective on a
wide range of problems.