
Department of Artificial Intelligence and Data Science
SubCode: AD2405    Subject Name: Artificial Intelligence

UNIT III
ADVERSARIAL SEARCH AND GAMES
3.1 Game Theory
• Competitive environments, in which two or more agents have conflicting goals, giving rise to adversarial
search problems.
• There are at least three stances we can take towards multi-agent environments.
• The first stance, appropriate when there are a very large number of agents, is to consider them in the
aggregate as an economy, allowing us to do things like predict that increasing demand will cause prices
to rise, without having to predict the action of any individual agent.
• Second, we could consider adversarial agents as just a part of the environment—a part that makes the
environment nondeterministic.
• The third stance is to explicitly model the adversarial agents with the techniques of adversarial game-tree search. The central technique is minimax search, a generalization of AND–OR search. Pruning makes the search more efficient by ignoring portions of the search tree that make no difference to the optimal move.
• For nontrivial games, we will usually not have enough time to be sure of finding the optimal move; we will have to cut off the search at some point.
Two-player zero-sum games
• The games most commonly studied within AI are what game theorists call deterministic, two-player,
turn-taking, perfect information, zero-sum games.
• “Perfect information” is a synonym for “fully observable,” and “zero-sum” means that what is good for
one player is just as bad for the other: there is no “win-win” outcome.
• For games we often use the term move as a synonym for “action” and position as a synonym for
“state.”
• There are two players, MAX and MIN. MAX moves first, and then the players take turns moving until the game is over. At the end of the game, points are awarded to the winning player and penalties are given to the loser.
• A game can be formally defined with the following elements:
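• S0: The initial state, which specifies how the game is set up at the start.
• TO-MOVE(s): The player whose turn it is to move in state s.
• ACTIONS(s): The set of legal moves in state s.
• RESULT(s, a): The transition model, which defines the state resulting from taking action a in state s.
• IS-TERMINAL(s): A terminal test, which is true when the game is over; states where the game has ended are called terminal states.
• UTILITY(s, p): A utility function, which defines the final numeric value to player p when the game ends in terminal state s (in chess, for example, a win scores 1, a loss 0, and a draw 1/2).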

• The initial state, ACTIONS function, and RESULT function define the state space graph—a graph where
the vertices are states, the edges are moves and a state might be reached by multiple paths.
• We superimpose a search tree over part of that graph to determine what move to make. We define the complete game tree as a search tree that follows every sequence of moves all the way to a terminal state.
• The game tree may be infinite if the state space itself is unbounded or if the rules of the game allow for
infinitely repeating positions.

• From the initial state, MAX has nine possible moves. Play alternates between MAX’s placing an X and
MIN’s placing an O until we reach leaf nodes corresponding to terminal states such that one player has
three squares in a row or all the squares are filled. The number on each leaf node indicates the utility
value of the terminal state from the point of view of MAX; high values are good for MAX and bad for MIN.
• For tic-tac-toe the game tree is relatively small—fewer than 9! = 362,880 terminal nodes (with only 5,478 distinct states). But for chess there are over 10^40 nodes, so the game tree is best thought of as a theoretical construct that we cannot realize in the physical world.
3.2 Optimal Decisions in Games
• MAX wants to find a sequence of actions leading to a win, but MIN has something to say about it. This
means that MAX’s strategy must be a conditional plan—a contingent strategy specifying a response to
each of MIN’s possible moves.
• In games that have a binary outcome (win or lose), we could use AND–OR search to generate the
conditional plan.
• In fact, for such games, the definition of a winning strategy for the game is identical to the definition of a
solution for a nondeterministic planning problem: in both cases the desirable outcome must be
guaranteed no matter what the “other side” does.

• For games with multiple outcome scores, we need a slightly more general algorithm called minimax search.

• The possible moves for MAX at the root node are labeled a1, a2, and a3. The possible replies to a1 for
MIN are b1, b2, b3, and so on. This particular game ends after one move each by MAX and MIN. The
utilities of the terminal states in this game range from 2 to 14.
• Given a game tree, the optimal strategy can be determined by working out the minimax value of each
state in the tree, which we write as MINIMAX(s).
• The minimax value is the utility (for MAX) of being in that state, assuming that both players play
optimally from there to the end of the game.
• The minimax value of a terminal state is just its utility. In a nonterminal state, MAX prefers to move to a
state of maximum value when it is MAX’s turn to move, and MIN prefers a state of minimum value.

• The terminal nodes on the bottom level get their utility values from the game’s UTILITY function. The
first MIN node, labeled B, has three successor states with values 3, 12, and 8, so its minimax value is 3.
• Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node; its successor
states have minimax values 3, 2, and 2; so it has a minimax value of 3.

• We can also identify the minimax decision at the root: action a1 is the optimal choice for MAX because it
leads to the state with the highest minimax value.
The minimax search algorithm
• Now that we can compute MINIMAX(s), we can turn that into a search algorithm that finds the best
move for MAX by trying all actions and choosing the one whose resulting state has the highest MINIMAX
value.
• It is a recursive algorithm that proceeds all the way down to the leaves of the tree and then backs up
the minimax values through the tree as the recursion unwinds.
• For example, in Figure 5.2, the algorithm first recurses down to the three bottom-left nodes and uses the
UTILITY function on them to discover that their values are 3, 12, and 8, respectively.
• Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B. A similar
process gives the backed-up values of 2 for C and 2 for D.
• Finally, we take the maximum of 3, 2, and 2 to get the backed-up value of 3 for the root node.
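• The recursion can be written out directly. The sketch below assumes a game object exposing the functions from the formal definition above (to_move, actions, result, is_terminal, utility); that interface is an assumption for illustration, not code from these notes.

```python
def minimax_search(game, state):
    """Return the move for MAX whose resulting state has the highest minimax value."""
    value, move = max_value(game, state, game.to_move(state))
    return move

def max_value(game, state, player):
    # MAX prefers the successor of maximum minimax value.
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float('-inf'), None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player)
        if v2 > v:
            v, move = v2, a
    return v, move

def min_value(game, state, player):
    # MIN prefers the successor of minimum minimax value.
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float('inf'), None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player)
        if v2 < v:
            v, move = v2, a
    return v, move
```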

• The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is m and there are b legal moves at each point, then the time complexity of the minimax algorithm is O(b^m).
• The space complexity is O(bm) for an algorithm that generates all actions at once, or O(m) for an
algorithm that generates actions one at a time.

• The exponential complexity makes MINIMAX impractical for complex games; for example, chess has a branching factor of about 35 and the average game has a depth of about 80 ply, and it is not feasible to search 35^80 ≈ 10^123 states.
• MINIMAX does, however, serve as a basis for the mathematical analysis of games. By approximating the
minimax analysis in various ways, we can derive more practical algorithms.
Optimal decisions in multiplayer games
• First, we need to replace the single value for each node with a vector of values. For example, in a three-
player game with players A, B, and C, a vector (vA,vB,vC) is associated with each node.
• For terminal states, this vector gives the utility of the state from each player’s viewpoint.
• The simplest way to implement this is to have the UTILITY function return a vector of utilities.
• Now we have to consider nonterminal states. Consider the node marked X in the game tree. In that state,
player C chooses what to do.
• The two choices lead to terminal states with utility vectors (vA = 1, vB = 2, vC = 6) and (vA = 4, vB = 2, vC = 3). Since 6 is bigger than 3, C should choose the first move.
• This means that if state X is reached, subsequent play will lead to a terminal state with utilities (vA = 1, vB = 2, vC = 6). Hence, the backed-up value of X is this vector.
• In general, the backed-up value of a node n is the utility vector of the successor state with the highest
value for the player choosing at n.
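• As a tiny illustration of this backup rule (the tuples are the utility vectors from the example above; the tuple representation is just one convenient choice):

```python
# Node X: it is player C's turn; C is index 2 in each (vA, vB, vC) vector.
successor_utilities = [(1, 2, 6), (4, 2, 3)]
backed_up = max(successor_utilities, key=lambda v: v[2])  # C maximizes its own component
print(backed_up)  # (1, 2, 6): the backed-up utility vector of X
```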
• Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances
are made and broken as the game proceeds.
• For example, suppose A and B are in weak positions and C is in a stronger position. Then it is often
optimal for both A and B to attack C rather than each other, lest C destroy each of them individually.
• In this way, collaboration emerges from purely selfish behavior. Of course, as soon as C weakens under
the joint onslaught, the alliance loses its value, and either A or B could violate the agreement.
• In some cases, explicit alliances merely make concrete what would have happened anyway. In other
cases, a social stigma attaches to breaking an alliance, so players must balance the immediate advantage
of breaking an alliance against the long-term disadvantage of being perceived as untrustworthy.
3.3 Alpha–Beta Pruning
• The number of game states is exponential in the depth of the tree. No algorithm can completely
eliminate the exponent, but we can sometimes cut it in half, computing the correct minimax decision
without examining every state by pruning large parts of the tree that make no difference to the
outcome.
• The particular technique we examine is called alpha–beta pruning.
• Let’s go through the calculation of the optimal decision once more, this time paying careful attention to
what we know at each point in the process. The outcome is that we can identify the minimax decision
without ever evaluating two of the leaf nodes.
• Another way to look at this is as a simplification of the formula for MINIMAX. Let the two unevaluated
successors of node C have values x and y. Then the value of the root node is given by
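MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
             = max(3, min(2, x, y), 2)
             = max(3, z, 2)   where z = min(2, x, y) ≤ 2
             = 3.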

• In other words, the value of the root and hence the minimax decision are independent of the values of
the leaves x and y, and therefore they can be pruned.
• Alpha–beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees
rather than just leaves.
• The general principle is this: consider a node n somewhere in the tree such that Player has a choice of
moving to n.
• If Player has a better choice either at the same level or at any point higher up in the tree, then Player will
never move to n. So once we have found out enough about n to reach this conclusion, we can prune it.
• Remember that minimax search is depth-first, so at any one time we just have to consider the nodes
along a single path in the tree.
• Alpha–beta pruning gets its name from the two extra parameters in MAX-VALUE(state,α,β) that describe
bounds on the backed-up values that appear anywhere along the path:
• α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the
path for MAX. Think: α = “at least.”
• β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the
path for MIN. Think: β = “at most.”
• Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches at a
node (i.e., terminates the recursive call) as soon as the value of the current node is known to be worse
than the current α or β value for MAX or MIN, respectively.
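• A sketch of minimax with alpha–beta pruning, using the same hypothetical game interface as the minimax sketch in Section 3.2:

```python
def alpha_beta_search(game, state):
    """Like minimax_search, but prune branches that cannot affect the decision."""
    value, move = ab_max(game, state, game.to_move(state),
                         float('-inf'), float('inf'))
    return move

def ab_max(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float('-inf'), None
    for a in game.actions(state):
        v2, _ = ab_min(game, game.result(state, a), player, alpha, beta)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)   # alpha = "at least": best found so far for MAX
        if v >= beta:               # MIN higher up already has a better option
            return v, move          # prune the remaining successors
    return v, move

def ab_min(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = float('inf'), None
    for a in game.actions(state):
        v2, _ = ab_max(game, game.result(state, a), player, alpha, beta)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)     # beta = "at most": best found so far for MIN
        if v <= alpha:              # MAX higher up already has a better option
            return v, move          # prune
    return v, move
```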

• (a) The first leaf below B has the value 3. Hence, B, which is a MIN node, has a value of at most 3.
• (b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at
most 3.

• (c) The third leaf below B has a value of 8; we have seen all B’s successor states, so the value of B is
exactly 3. Now we can infer that the value of the root is at least 3, because MAX has a choice worth 3 at
the root.
• (d) The first leaf below C has the value 2. Hence, C, which is a MIN node, has a value of at most 2. But we
know that B is worth 3, so MAX would never choose C. Therefore, there is no point in looking at the
other successor states of C. This is an example of alpha–beta pruning.
• (e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX’s best
alternative (i.e., 3), so we need to keep exploring D’s successor states. Notice also that we now have
bounds on all of the successors of the root, so the root’s value is also at most 14.
• (f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is
worth 2, so now D is worth exactly 2. MAX’s decision at the root is to move to B, giving a value of 3.
Move ordering
• The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are
examined.

• Adding dynamic move-ordering schemes, such as trying first the moves that were found to be best in the
past, brings us quite close to the theoretical limit. The past could be the previous move—often the same
threats remain—or it could come from previous exploration of the current move through a process of
iterative deepening . First, search one ply deep and record the ranking of moves based on their
evaluations. Then search one ply deeper, using the previous ranking to inform move ordering; and so on.
The increased search time from iterative deepening can be more than made up for by better move ordering. The best moves are known as killer moves, and trying them first is called the killer move heuristic.
• In game tree search, repeated states can occur because of transpositions—different permutations of the move sequence that end up in the same position. The problem can be addressed with a transposition table that caches the heuristic value of states.
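• A minimal transposition-table sketch: memoize minimax values so a position reached by different move orders is evaluated only once. It assumes states are hashable and reuses the hypothetical game interface from earlier; the names are illustrative.

```python
transposition_table = {}

def minimax_value(game, state, player):
    key = (state, player)
    if key in transposition_table:
        return transposition_table[key]      # transposition: reuse the cached value
    if game.is_terminal(state):
        v = game.utility(state, player)
    elif game.to_move(state) == player:
        v = max(minimax_value(game, game.result(state, a), player)
                for a in game.actions(state))
    else:
        v = min(minimax_value(game, game.result(state, a), player)
                for a in game.actions(state))
    transposition_table[key] = v
    return v
```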

3.4 Monte Carlo Tree Search


• Go exposes two major weaknesses of heuristic alpha–beta tree search: first, Go has a branching factor that starts at 361, which means alpha–beta search would be limited to only 4 or 5 ply.
• Second, it is difficult to define a good evaluation function for Go because material value is not a strong
indicator and most positions are in flux until the endgame.
• In response to these two challenges, modern Go programs have abandoned alpha–beta search and
instead use a strategy called Monte Carlo tree search (MCTS).
• The basic MCTS strategy does not use a heuristic evaluation function. Instead, the value of a state is
estimated as the average utility over a number of simulations of complete games starting from the state.
• A simulation (also called a playout or rollout) chooses moves first for one player, then for the other, repeating until a terminal position is reached.
• At that point the rules of the game determine who has won or lost, and by what score. For games in
which the only outcomes are a win or a loss, “average utility” is the same as “win percentage.”
• To get useful information from the playout we need a playout policy that biases the moves towards
good ones. For Go and other games, playout policies have been successfully learned from self-play by
using neural networks. Sometimes game-specific heuristics are used, such as “consider capture
moves” in chess.
• Given a playout policy, we next need to decide two things: from what positions do we start the playouts,
and how many playouts do we allocate to each position?
• The simplest answer, called pure Monte Carlo search, is to do N simulations starting from the current
state of the game, and track which of the possible moves from the current position has the highest win
percentage.
• For some stochastic games this converges to optimal play as N increases, but for most games it is not
sufficient—we need a selection policy that selectively focuses the computational resources on the
important parts of the game tree.
• It balances two factors: exploration of states that have had few playouts, and exploitation of states
that have done well in past playouts, to get a more accurate estimate of their value.
• Monte Carlo tree search does that by maintaining a search tree and growing it on each iteration of the following four steps:
• Selection: Starting at the root of the search tree, we choose a move, leading to a successor node, and repeat that process, moving down the tree to a leaf.
• A search tree with the root representing a state where white has just moved, and white has won 37 out
of the 100 playouts done so far. The thick arrow shows the selection of a move by black that leads to a
node where black has won 60/79 playouts. This is the best win percentage among the three moves, so
selecting it is an example of exploitation. But it would also have been reasonable to select the 2/11 node
for the sake of exploration—with only 11 playouts, the node still has high uncertainty in its valuation,
and might end up being best if we gain more information about it. Selection continues on to the leaf
node marked 27/35.
• Expansion: We grow the search tree by generating a new child of the selected node; the new node is marked 0/0.

• Simulation: We perform a playout from the newly generated child node, choosing moves for both players
according to the playout policy. These moves are not recorded in the search tree. In the figure, the
simulation results in a win for black.
• Back-propagation: We now use the result of the simulation to update all the search tree nodes going up to the root. Since black won the playout, black nodes are incremented in both the number of wins and the number of playouts, so 27/35 becomes 28/36 and 60/79 becomes 61/80. Since white lost, the white nodes are incremented in the number of playouts only, so 16/53 becomes 16/54 and the root 37/100 becomes 37/101.
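• One possible skeleton of the four steps, again assuming the hypothetical game interface used earlier, a uniformly random playout policy, and UCB1 as the selection policy (the constant C ≈ 1.4 is a conventional choice, not fixed by these notes); utility(s, p) is assumed to return 1 for a win by p and 0 for a loss.

```python
import math, random

class Node:
    """wins/playouts are counted for the player who moved INTO this node."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.wins, self.playouts = [], 0, 0

def ucb1(n, C=1.4):
    # Exploitation (win rate) plus exploration (uncertainty) terms.
    return (n.wins / n.playouts +
            C * math.sqrt(math.log(n.parent.playouts) / n.playouts))

def mcts(game, state, iterations=1000):
    root = Node(state)
    for _ in range(iterations):
        # 1. Selection: descend by UCB1 while the node is fully expanded.
        node = root
        while node.children and len(node.children) == len(game.actions(node.state)):
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one untried successor, unless the node is terminal.
        if not game.is_terminal(node.state):
            tried = {c.action for c in node.children}
            a = random.choice([a for a in game.actions(node.state) if a not in tried])
            node = Node(game.result(node.state, a), node, a)
            node.parent.children.append(node)
        # 3. Simulation: random playout (the playout policy) to a terminal state.
        sim = node.state
        while not game.is_terminal(sim):
            sim = game.result(sim, random.choice(game.actions(sim)))
        # 4. Back-propagation: update counts on the path back to the root.
        n = node
        while n is not None:
            n.playouts += 1
            if n.parent is not None:
                mover = game.to_move(n.parent.state)  # player who moved into n
                n.wins += game.utility(sim, mover)
            n = n.parent
    # Play the most-simulated root move, a common and robust choice.
    return max(root.children, key=lambda c: c.playouts).action
```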

3.5 Stochastic Games


• Stochastic games bring us a little closer to the unpredictability of real life by including a random
element, such as the throwing of dice. Backgammon is a typical stochastic game that combines luck and
skill.
• At this point Black knows what moves can be made, but does not know what White is going to roll and
thus does not know what White’s legal moves will be. A game tree in backgammon must include chance
nodes in addition to MAX and MIN nodes. Chance nodes are shown as circles.

• The branches leading from each chance node denote the possible dice rolls; each branch is labeled with
the roll and its probability. There are 36 ways to roll two dice, each equally likely; but because a 6–5 is
the same as a 5–6, there are only 21 distinct rolls.
• The six doubles (1–1 through 6–6) each have a probability of 1/36, so we say P(1–1) = 1/36. The other
15 distinct rolls each have a 1/18 probability.
• The next step is to understand how to make correct decisions. Obviously, we still want to pick the move
that leads to the best position. However, positions do not have definite minimax values.
• Instead, we can only calculate the expected value of a position: the average over all possible outcomes of
the chance nodes.

• This leads us to the expectiminimax value for games with chance nodes, a generalization of the minimax value for deterministic games.
• Terminal nodes and MAX and MIN nodes work exactly the same way as before. For chance nodes we
compute the expected value, which is the sum of the value over all outcomes, weighted by the
probability of each chance action:
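EXPECTIMINIMAX(s) =
  UTILITY(s, MAX)                              if IS-TERMINAL(s)
  max_a EXPECTIMINIMAX(RESULT(s, a))           if TO-MOVE(s) = MAX
  min_a EXPECTIMINIMAX(RESULT(s, a))           if TO-MOVE(s) = MIN
  Σ_r P(r) · EXPECTIMINIMAX(RESULT(s, r))      if TO-MOVE(s) = CHANCE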

• where r represents a possible dice roll (or other chance event) and RESULT(s,r) is the same state as s,
with the additional fact that the result of the dice roll is r.
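• A direct sketch of that recursion; it assumes the hypothetical game interface used earlier, extended with chance_events(s) returning (r, P(r)) pairs when it is chance's turn.

```python
def expectiminimax(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    mover = game.to_move(state)
    if mover == 'CHANCE':
        # Expected value: outcome values weighted by their probabilities.
        return sum(p * expectiminimax(game, game.result(state, r), player)
                   for r, p in game.chance_events(state))
    values = [expectiminimax(game, game.result(state, a), player)
              for a in game.actions(state)]
    return max(values) if mover == player else min(values)
```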
3.6 Partially Observable Games
• Video games such as StarCraft are particularly challenging, being partially observable, multi-agent,
nondeterministic, dynamic, and unknown.
• In deterministic partially observable games, uncertainty about the state of the board arises entirely from
lack of access to the choices made by the opponent.
• This class includes children’s games such as Battleship (where each player’s ships are placed in
locations hidden from the opponent) and Stratego.
• Other games also have partially observable versions: Phantom Go, Phantom tic-tac-toe, and Screen
Shogi.
Kriegspiel: Partially observable chess
• The rules of Kriegspiel are as follows:
• White and Black each see a board containing only their own pieces. A referee, who can see all the pieces,
adjudicates the game and periodically makes announcements that are heard by both players.
• First, White proposes to the referee a move that would be legal if there were no black pieces.
• If the black pieces prevent the move, the referee announces “illegal,” and White keeps proposing moves
until a legal one is found—learning more about the location of Black’s pieces in the process.
• Once a legal move is proposed, the referee announces one or more of the following: “Capture on square
X” if there is a capture, and “Check by D” if the black king is in check, where D is the direction of the
check, and can be one of “Knight,” “Rank,” “File,” “Long diagonal,” or “Short diagonal.”
• If Black is checkmated or stalemated, the referee says so; otherwise, it is Black’s turn to move.
• Kriegspiel may seem terrifyingly impossible, but humans manage it quite well and computer programs
are beginning to catch up. It helps to recall the notion of a belief state—the set of all logically possible
board states given the complete history of percepts to date.

• Initially, White’s belief state is a singleton because Black’s pieces haven’t moved yet. After White makes a
move and Black responds, White’s belief state contains 20 positions, because Black has 20 replies to any
opening move.
• Keeping track of the belief state as the game progresses is exactly the problem of state estimation.
• We can map Kriegspiel state estimation directly onto the partially observable, nondeterministic
framework if we consider the opponent as the source of nondeterminism; that is, the RESULTS of
White’s move are composed from the (predictable) outcome of White’s own move and the unpredictable
outcome given by Black’s reply.
• For a partially observable game, the notion of a strategy is altered; instead of specifying a move to make
for each possible move the opponent might make, we need a move for every possible percept sequence
that might be received.
• For Kriegspiel, a winning strategy, or guaranteed checkmate, is one that, for each possible percept
sequence, leads to an actual checkmate for every possible board state in the current belief state,
regardless of how the opponent moves.
• With this definition, the opponent’s belief state is irrelevant—the strategy has to work even if the
opponent can see all the pieces. This greatly simplifies the computation.
• The general AND-OR search algorithm can be applied to the belief-state space to find guaranteed
checkmates. The incremental belief-state algorithm often finds midgame checkmates up to depth 9—
well beyond the abilities of most human players.
• In addition to guaranteed checkmates, Kriegspiel admits an entirely new concept that makes no sense in
fully observable games: probabilistic checkmate.
• Such checkmates are still required to work in every board state in the belief state; they are probabilistic with respect to randomization of the winning player's moves.
• To get the basic idea, consider the problem of finding a lone black king using just the white king. Simply
by moving randomly, the white king will eventually bump into the black king even if the latter tries to
avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology
of probability theory, detection occurs with probability 1.
• The KBNK endgame—king, bishop and knight versus king—is won in this sense; White presents Black
with an infinite random sequence of choices, for one of which Black will guess incorrectly and reveal his
position, leading to checkmate.
• On the other hand, the KBBK endgame is won with probability 1 − ε. White can force a win only by leaving one of his bishops unprotected for one move. If Black happens to be in the right place and captures the bishop, the game is drawn.
• White can choose to make the risky move at some randomly chosen point in the middle of a very long sequence, thus reducing ε to an arbitrarily small constant, but cannot reduce ε to zero.
• Sometimes a checkmate strategy works for some of the board states in the current belief state but not others. Trying such a strategy may succeed, leading to an accidental checkmate—accidental in the sense that White could not know that it would be checkmate—if Black's pieces happen to be in the right places.
• This idea leads naturally to the question of how likely it is that a given strategy will win, which leads in
turn to the question of how likely it is that each board state in the current belief state is the true board
state.

• Each player’s goal is not just to move pieces to the right squares but also to minimize the information
that the opponent has about their location. Playing any predictable “optimal” strategy provides the
opponent with information. Hence, optimal play in partially observable games requires a willingness to
play somewhat randomly.

3.7 Constraint Satisfaction Problems


• Consider a set of variables, each of which has a value. A problem is solved when each variable has a value that satisfies all the constraints on the variable. A problem described this way is called a constraint satisfaction problem, or CSP.
• CSP search algorithms take advantage of the structure of states and use general rather than domain-
specific heuristics to enable the solution of complex problems.
• The main idea is to eliminate large portions of the search space all at once by identifying variable/value
combinations that violate the constraints.
• CSPs have the additional advantage that the actions and transition model can be deduced from the
problem description.
Defining Constraint Satisfaction Problems

• A constraint satisfaction problem consists of three components, X, D, and C:


• X is a set of variables, {X1,...,Xn}.
• D is a set of domains, {D1,...,Dn}, one for each variable.
• C is a set of constraints that specify allowable combinations of values.
• A domain, Di, consists of a set of allowable values, {v1,...,vk}, for variable Xi. For example, a Boolean variable would have the domain {true, false}.
• Different variables can have different domains of different sizes. Each constraint Cj consists of a pair ⟨scope, rel⟩, where scope is a tuple of variables that participate in the constraint and rel is a relation that defines the values that those variables can take on.
• A relation can be represented as an explicit set of all tuples of values that satisfy the constraint, or as a
function that can compute whether a tuple is a member of the relation.
• For example, if X1 and X2 both have the domain {1,2,3}, then the constraint saying that X1 must be greater than X2 can be written as ⟨(X1,X2), {(3,1),(3,2),(2,1)}⟩ or as ⟨(X1,X2), X1 > X2⟩.
• CSPs deal with assignments of values to variables, {Xi = vi, Xj = vj, ...}. An assignment that does not violate any constraints is called a consistent or legal assignment.
• A complete assignment is one in which every variable is assigned a value, and a solution to a CSP is a
consistent, complete assignment.
• A partial assignment is one that leaves some variables unassigned, and a partial solution is a partial
assignment that is consistent.
• Solving a CSP is an NP-complete problem in general, although there are important subclasses of CSPs that can be solved very efficiently.
• Example problem: Map coloring
• Suppose that, having tired of Romania, we are looking at a map of Australia showing each of its states
and territories. We are given the task of coloring each region either red, green, or blue in such a way that
no two neighboring regions have the same color.
• To formulate this as a CSP, we define the variables to be the regions: X = {WA,NT,Q,NSW,V,SA,T}.
• The domain of every variable is the set Di = {red, green, blue}. The constraints require neighboring regions to have distinct colors. Since there are nine places where regions border, there are nine constraints:

C = {SA≠WA, SA≠NT, SA≠Q, SA≠NSW, SA≠V, WA≠NT, NT≠Q, Q≠NSW, NSW≠V}.

• Here SA≠WA is shorthand for ⟨(SA,WA), SA≠WA⟩, where the relation SA≠WA can be enumerated explicitly as
{(red,green),(red,blue),(green,red),(green,blue),(blue,red),(blue,green)}.
• There are many possible solutions to this problem, such as {WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = red}.
• It can be helpful to visualize a CSP as a constraint graph.
• The nodes of the graph correspond to variables of the problem, and an edge connects any two variables
that participate in a constraint.
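• The Australia problem written out as plain data (the dict-of-sets representation is just one convenient choice); the consistent helper checks a candidate assignment against the binary ≠ constraints:

```python
variables = ['WA', 'NT', 'SA', 'Q', 'NSW', 'V', 'T']
domains = {v: {'red', 'green', 'blue'} for v in variables}
neighbors = {                        # nine borders => nine binary constraints
    'SA': ['WA', 'NT', 'Q', 'NSW', 'V'],
    'WA': ['NT', 'SA'],  'NT': ['WA', 'SA', 'Q'],
    'Q':  ['NT', 'SA', 'NSW'],  'NSW': ['Q', 'SA', 'V'],
    'V':  ['SA', 'NSW'], 'T': [],    # Tasmania borders no other region
}

def consistent(var, value, assignment):
    """var=value is consistent if no already-assigned neighbor has that value."""
    return all(assignment.get(n) != value for n in neighbors[var])
```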

Why formulate a problem as a CSP?


• One reason is that the CSPs yield a natural representation for a wide variety of problems; it is often easy
to formulate a problem as a CSP.
• Another is that years of development work have gone into making CSP solvers fast and efficient.
• A third is that a CSP solver can quickly prune large swathes of the search space that an atomic state-
space searcher cannot.
Variations on the CSP formalism
• The simplest kind of CSP involves variables that have discrete, finite domains. Map-coloring problems
and scheduling with time limits are both of this kind.
• The 8-queens problem can also be viewed as a finite-domain CSP, where the variables Q1,...,Q8
correspond to the queens in columns 1 to 8, and the domain of each variable specifies the possible row
numbers for the queen in that column, Di = {1,2,3,4,5,6,7,8}.
• The constraints say that no two queens can be in the same row or diagonal.
• A discrete domain can be infinite, such as the set of integers or strings.
• With infinite domains, we must use implicit constraints like T1+d1 ≤ T2 rather than explicit tuples of
values. Special solution algorithms exist for linear constraints on integer variables—that is, constraints,
such as the one just given, in which each variable appears only in linear form.
• It can be shown that no algorithm exists for solving general nonlinear constraints on integer variables—
the problem is undecidable.
• Constraint satisfaction problems with continuous domains are common in the real world and are widely
studied in the field of operations research.

• The best-known category of continuous-domain CSPs is that of linear programming problems, where
constraints must be linear equalities or inequalities. Linear programming problems can be solved in
time polynomial in the number of variables.
• Having looked at the types of variables that can appear in CSPs, it is useful to look at the types of constraints. The simplest type is the unary constraint, which restricts the value of a single variable.
• A binary constraint relates two variables. For example, SA ≠ NSW is a binary constraint. A binary CSP is one with only unary and binary constraints; it can be represented as a constraint graph.
• The ternary constraint Between(X,Y,Z), for example, can be defined as ⟨(X,Y,Z), X < Y < Z or X > Y > Z⟩.
• A constraint involving an arbitrary number of variables is called a global constraint.
Cryptarithmetic puzzles
• Each letter in a cryptarithmetic puzzle represents a different digit.
• For the puzzle TWO + TWO = FOUR, this is captured by the global constraint Alldiff(F, T, U, W, R, O).
• The addition constraints on the four columns of the puzzle can be written as the following n-ary
constraints:
• O + O = R + 10 · C1
• C1 + W + W = U + 10 · C2
• C2 + T + T = O + 10 · C3
• C3 = F,
• where C1, C2, and C3 are auxiliary variables representing the digit carried over into the tens, hundreds,
or thousands column. These constraints can be represented in a constraint hypergraph.
• A hypergraph consists of ordinary nodes and hypernodes, which represent n-ary constraints—constraints involving n variables.
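• For illustration, the column constraints can be checked by brute force over digit assignments; this sketch enumerates permutations, which is fine for six letters but far too slow in general:

```python
from itertools import permutations

def solve_two_two_four():
    letters = 'TWOFUR'
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))       # Alldiff holds by construction
        if a['T'] == 0 or a['F'] == 0:       # no leading zeros
            continue
        two = 100 * a['T'] + 10 * a['W'] + a['O']
        four = 1000 * a['F'] + 100 * a['O'] + 10 * a['U'] + a['R']
        if two + two == four:                # equivalent to the column constraints
            return a

print(solve_two_two_four())  # one valid solution is 734 + 734 = 1468
```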
• Another way to convert an n-ary CSP to a binary one is the dual graph transformation: create a new
graph in which there will be one variable for each constraint in the original graph, and one binary
constraint for each pair of constraints in the original graph that share variables.
3.8 Constraint Propagation: Inference in CSPs
• An atomic state-space search algorithm makes progress in only one way: by expanding a node to visit
the successors.
• A CSP algorithm has choices. It can generate successors by choosing a new variable assignment, or it can
do a specific type of inference called constraint propagation: using the constraints to reduce the number
of legal values for a variable, which in turn can reduce the legal values for another variable, and so on.
• The idea is that this will leave fewer choices to consider when we make the next choice of a variable
assignment.
• Constraint propagation may be intertwined with search, or it may be done as a preprocessing step,
before search starts. Sometimes this preprocessing can solve the whole problem, so no search is
required at all.

• The key idea is local consistency. If we treat each variable as a node in a graph and each binary
constraint as an edge, then the process of enforcing local consistency in each part of the graph causes
inconsistent values to be eliminated throughout the graph.
• There are different types of local consistency.
Arc consistency
• A variable in a CSP is arc-consistent if every value in its domain satisfies the variable's binary constraints.
• More formally, Xi is arc-consistent with respect to another variable Xj if for every value in the current domain Di there is some value in the domain Dj that satisfies the binary constraint on the arc (Xi, Xj). A graph is arc-consistent if every variable is arc-consistent with every other variable.
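• The standard algorithm for enforcing arc consistency is AC-3 (referenced again below under MAC). A sketch, where constraint(Xi, x, Xj, y) returns True when Xi = x, Xj = y is allowed (for map coloring, simply x != y), and domains maps each variable to a set:

```python
from collections import deque

def ac3(variables, domains, neighbors, constraint):
    queue = deque((Xi, Xj) for Xi in variables for Xj in neighbors[Xi])
    while queue:
        Xi, Xj = queue.popleft()
        if revise(domains, Xi, Xj, constraint):
            if not domains[Xi]:
                return False                  # empty domain: inconsistency detected
            for Xk in neighbors[Xi]:
                if Xk != Xj:
                    queue.append((Xk, Xi))    # Xk's values may have lost support
    return True

def revise(domains, Xi, Xj, constraint):
    """Delete values of Xi that have no supporting value in Xj's domain."""
    revised = False
    for x in set(domains[Xi]):
        if not any(constraint(Xi, x, Xj, y) for y in domains[Xj]):
            domains[Xi].discard(x)
            revised = True
    return revised
```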
Node consistency
• A single variable (corresponding to a node in the CSP graph) is node-consistent if all the values in the
variable’s domain satisfy the variable’s unary constraints.
• A graph is node-consistent if every variable in the graph is node-consistent.
• It is easy to eliminate all the unary constraints in a CSP by reducing the domain of variables with unary
constraints at the start of the solving process.
Path consistency
• Arc consistency tightens down the domains (unary constraints) using the arcs (binary constraints). To
make progress on problems like map coloring, we need a stronger notion of consistency.
• Path consistency tightens the binary constraints by using implicit constraints that are inferred by
looking at triples of variables.
• A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for every assignment {Xi = a, Xj = b} consistent with the constraints (if any) on {Xi, Xj}, there is an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}. The name refers to the overall consistency of the path from Xi to Xj with Xm in the middle.
K-consistency
• Stronger forms of propagation can be defined with the notion of k-consistency. A CSP is k-consistent if,
for any set of k−1 variables and for any consistent assignment to those variables, a consistent value can
always be assigned to any kth variable.
• 1-consistency says that, given the empty set, we can make any set of one variable consistent: this is what
we called node consistency.
• 2-consistency is the same as arc consistency.
• For binary constraint graphs, 3-consistency is the same as path consistency.
Global constraints
• A global constraint is one involving an arbitrary number of variables (but not necessarily all variables). Global constraints occur frequently in real problems and can be handled by special-purpose algorithms that are more efficient than the general-purpose methods described so far.

• One simple form of inconsistency detection for Alldiff constraints works as follows: if m variables are involved in the constraint, and if they have n possible distinct values altogether, and m > n, then the constraint cannot be satisfied.
• This leads to the following simple algorithm: First, remove any variable in the constraint that has a
singleton domain, and delete that variable’s value from the domains of the remaining variables.
• Repeat as long as there are singleton variables. If at any point an empty domain is produced or there are
more variables than domain values left, then an inconsistency has been detected.
3.9 Backtracking Search for CSPs
• The constraint propagation process can be finished and still have variables with multiple possible
values. In that case we have to search for a solution.
• Backtracking search algorithms work on partial assignments.
• Consider how a standard depth-limited search could solve CSPs.
• A state would be a partial assignment, and an action would extend the assignment. For a CSP with n
variables of domain size d we would end up with a search tree where all the complete assignments are
leaf nodes at depth n. But notice that the branching factor at the top level would be nd because any of d
values can be assigned to any of n variables.
• At the next level, the branching factor is (n − 1)d, and so on for n levels. So the tree has n! · d^n leaves, even though there are only d^n possible complete assignments.

• The BACKTRACKING-SEARCH algorithm repeatedly chooses an unassigned variable, and then tries all values in the domain of that variable in turn, trying to extend each one into a solution via a recursive call.
• If the call succeeds, the solution is returned, and if it fails, the assignment is restored to the previous
state, and we try the next value. If no value works then we return failure.
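• A sketch of that procedure, reusing the map-coloring variables, domains, and consistent helper from Section 3.7 (static variable ordering here; the heuristics below improve on it):

```python
def backtracking_search(variables, domains):
    return backtrack({}, variables, domains)

def backtrack(assignment, variables, domains):
    if len(assignment) == len(variables):
        return assignment                    # complete and consistent: a solution
    var = next(v for v in variables if v not in assignment)  # static ordering
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment, variables, domains)
            if result is not None:
                return result
            del assignment[var]              # restore state, try the next value
    return None                              # no value works: report failure
```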
Variable and value ordering

• The backtracking algorithm contains the line


var ← SELECT-UNASSIGNED-VARIABLE(csp, assignment).
• The simplest strategy for SELECT-UNASSIGNED-VARIABLE is static ordering: choose the variables in
order, {X1,X2,...}. The next simplest is to choose randomly. Neither strategy is optimal.
• A better strategy—choosing the variable with the fewest “legal” values—is called the minimum-remaining-values (MRV) heuristic. It has also been called the “most constrained variable” or “fail-first” heuristic, the latter because it picks a variable that is most likely to cause a failure soon, thereby pruning the search tree.
• If some variable X has no legal values left, the MRV heuristic will select X and failure will be detected
immediately—avoiding pointless searches through other variables.
• The MRV heuristic usually performs better than a random or static ordering, sometimes by orders of
magnitude, although the results vary depending on the problem.
• The degree heuristic attempts to reduce the branching factor on future choices by selecting the
variable that is involved in the largest number of constraints on other unassigned variables.
• Once a variable has been selected, the algorithm must decide on the order in which to examine its
values. The least-constraining-value heuristic is effective for this. It prefers the value that rules out the
fewest choices for the neighboring variables in the constraint graph.
Interleaving search and inference
• One of the simplest forms of inference is called forward checking. Whenever a variable X is assigned, the
forward-checking process establishes arc consistency for it: for each unassigned variable Y that is
connected to X by a constraint, delete from Y’s domain any value that is inconsistent with the value
chosen for X.
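• A sketch of one forward-checking step, using the map-coloring convention that neighbors may not share a value (a real solver must also restore the pruned domains when it backtracks):

```python
def forward_check(X, value, domains, neighbors, assignment):
    """After assigning X=value, prune inconsistent values from unassigned neighbors."""
    for Y in neighbors[X]:
        if Y not in assignment:
            domains[Y].discard(value)        # value conflicts with X=value
            if not domains[Y]:
                return False                 # empty domain: this assignment fails
    return True
```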
• Although forward checking detects many inconsistencies, it does not detect all of them. The problem is
that it doesn’t look ahead far enough.
• The algorithm called MAC (for Maintaining Arc Consistency) detects inconsistencies like this.
• After a variable Xi is assigned a value, the INFERENCE procedure calls AC-3, but instead of a queue of all arcs in the CSP, we start with only the arcs (Xj, Xi) for all Xj that are unassigned variables and neighbors of Xi.
• From there, AC-3 does constraint propagation in the usual way, and if any variable has its domain
reduced to the empty set, the call to AC-3 fails and we know to backtrack immediately.
Intelligent backtracking: Looking backward
• The BACKTRACKING-SEARCH algorithm has a very simple policy for what to do when a branch of the
search fails: back up to the preceding variable and try a different value for it. This is called chronological
backtracking.
• Consider what happens when we apply simple backtracking with a fixed variable ordering Q, NSW, V, T, SA, WA, NT. Suppose we have generated the partial assignment {Q = red, NSW = green, V = blue, T = red}.
• When we try the next variable, SA, we see that every value violates a constraint. We back up to T and try
a new color for Tasmania! Obviously this is silly—recoloring Tasmania cannot possibly help in resolving
the problem with South Australia.

• A more intelligent approach is to backtrack to a variable that might fix the problem—a variable that was
responsible for making one of the possible values of SA impossible.
• To do this, we will keep track of a set of assignments that are in conflict with some value for SA. The set (in this case {Q = red, NSW = green, V = blue}) is called the conflict set for SA.
• The backjumping method backtracks to the most recent assignment in the conflict set; in this case, backjumping would jump over Tasmania and try a new value for V.
• This method is easily implemented by a modification to BACKTRACK such that it accumulates the
conflict set while checking for a legal value to assign.
• If no legal value is found, the algorithm should return the most recent element of the conflict set along
with the failure indicator.
Constraint learning
• Constraint learning is the idea of finding a minimum set of variables from the conflict set that causes the
problem. This set of variables, along with their corresponding values, is called a no-good. We then
record the no-good, either by adding a new constraint to the CSP to forbid this combination of
assignments or by keeping a separate cache of no-goods.
• No-goods can be effectively used by forward checking or by backjumping. Constraint learning is one of
the most important techniques used by modern CSP solvers to achieve efficiency on complex problems.
3.10 Local Search for CSPs
• Local search algorithms turn out to be very effective in solving many CSPs. They use a complete-state formulation where each state assigns a value to every variable, and the search changes the value of one variable at a time. As an example, we'll use the 8-queens problem, formulated as a CSP. We start with a complete assignment to the 8 variables; typically this will violate several constraints.
• We then randomly choose a conflicted variable, which turns out to be Q8, the rightmost column. We’d
like to change the value to something that brings us closer to a solution; the most obvious approach is to
select the value that results in the minimum number of conflicts with other variables—the min-conflicts
heuristic.
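• A sketch of min-conflicts for n-queens (variable i is the queen in column i, its value is a row; the step limit is an arbitrary choice):

```python
import random

def min_conflicts_queens(n=8, max_steps=10000):
    rows = [random.randrange(n) for _ in range(n)]   # complete random assignment
    def conflicts(col, row):
        # Queens attack along rows and diagonals (columns are distinct by design).
        return sum(1 for c in range(n) if c != col and
                   (rows[c] == row or abs(rows[c] - row) == abs(c - col)))
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(c, rows[c]) > 0]
        if not conflicted:
            return rows                              # no conflicts: a solution
        col = random.choice(conflicted)              # pick a conflicted variable
        rows[col] = min(range(n), key=lambda r: conflicts(col, r))  # min-conflicts value
    return None                                      # step limit reached
```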
• All the local search techniques are candidates for application to CSPs, and some of them have proved especially effective. The landscape of a CSP under the min-conflicts heuristic usually has a series of plateaus. There may be millions of variable assignments that are only one conflict away from a solution.
• Plateau search—allowing sideways moves to another state with the same score—can help local search
find its way off this plateau. This wandering on the plateau can be directed with a technique called tabu
search: keeping a small list of recently visited states and forbidding the algorithm to return to those
states. Simulated annealing can also be used to escape from plateaus.
• Another technique called constraint weighting aims to concentrate the search on the important
constraints. Each constraint is given a numeric weight, initially all 1.
• At each step of the search, the algorithm chooses a variable/value pair to change that will result in the
lowest total weight of all violated constraints. The weights are then adjusted by incrementing the weight
of each constraint that is violated by the current assignment.
• This has two benefits: it adds topography to plateaus, making sure that it is possible to improve from the
current state, and it also adds learning: over time the difficult constraints are assigned higher weights.

3.11 The Structure of Problems


• Independence can be ascertained simply by finding connected components of the constraint graph. Each component corresponds to a subproblem CSPi. If assignment Si is a solution of CSPi, then ∪i Si is a solution of ∪i CSPi.
• Suppose each CSPi has c variables from the total of n variables, where c is a constant. Then there are n/c subproblems, each of which takes at most d^c work to solve, where d is the size of the domain.
• Hence, the total work is O(d^c · n/c), which is linear in n; without the decomposition, the total work is O(d^n), which is exponential in n.
• Completely independent subproblems are delicious, then, but rare. Fortunately, some other graph structures are also easy to solve. For example, a constraint graph is a tree when any two variables are connected by only one path. We will show that any tree-structured CSP can be solved in time linear in the number of variables.
• The key is a new notion of consistency, called directional arc consistency or DAC. A CSP is defined to be directional arc-consistent under an ordering of variables X1, X2,..., Xn if and only if every Xi is arc-consistent with each Xj for j > i.
• To solve a tree-structured CSP, first pick any variable to be the root of the tree, and choose an ordering of
the variables such that each variable appears after its parent in the tree. Such an ordering is called a
topological sort.
• Any tree with n nodes has n − 1 edges, so we can make this graph directional arc-consistent in O(n) steps, each of which must compare up to d possible domain values for two variables, for a total time of O(nd^2).
• Once we have a directional arc-consistent graph, we can just march down the list of variables and choose any remaining value. Since each edge from a parent to its child is arc-consistent, we know that for any value we choose for the parent, there will be a valid value left to choose for the child.
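• A sketch of the whole procedure: variables are assumed to be given in topological order with variables[0] as the root, parent[X] names each variable's parent (None for the root), and constraint(x, y) allows parent value x with child value y:

```python
def tree_csp_solver(variables, domains, parent, constraint):
    # Backward pass: make every (parent, child) edge directional arc-consistent
    # by pruning parent values that no child value supports.
    for X in reversed(variables[1:]):
        P = parent[X]
        domains[P] = {x for x in domains[P]
                      if any(constraint(x, y) for y in domains[X])}
        if not domains[P]:
            return None                      # no consistent assignment exists
    # Forward pass: assign the root, then any supported value for each child.
    assignment = {}
    for X in variables:
        if parent[X] is None:
            assignment[X] = next(iter(domains[X]))
        else:
            assignment[X] = next(y for y in domains[X]
                                 if constraint(assignment[parent[X]], y))
    return assignment
```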

Cutset conditioning
• The first way to reduce a constraint graph to a tree involves assigning values to some variables so that
the remaining variables form a tree.
• The general algorithm is as follows:
1. Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after removal of
S. S is called a cycle cutset.
2. For each possible assignment to the variables in S that satisfies all constraints on S, (a) remove from
the domains of the remaining variables any values that are inconsistent with the assignment for S, and (b) if the
remaining CSP has a solution, return it together with the assignment for S.
• If the cycle cutset has size c, then the total run time is O(d^c · (n − c)d^2): we have to try each of the d^c combinations of values for the variables in S, and for each combination we must solve a tree problem of size n − c.
• Finding the smallest cycle cutset is NP-hard, but several efficient approximation algorithms are known. The overall algorithmic approach is called cutset conditioning.

Tree decomposition
• The second way to reduce a constraint graph to a tree is based on constructing a tree decomposition of
the constraint graph: a transformation of the original graph into a tree where each node in the tree
consists of a set of variables.
• A tree decomposition must satisfy these three requirements:
• Every variable in the original problem appears in at least one of the tree nodes.
• If two variables are connected by a constraint in the original problem, they must appear together in at
least one of the tree nodes.
• If a variable appears in two nodes in the tree, it must appear in every node along the path connecting
those nodes.
• The first two conditions ensure that all the variables and constraints are represented in the tree
decomposition. The third condition seems rather technical, but allows us to say that any variable from
the original problem must have the same value wherever it appears: the constraints in the tree say that a
variable in one node of the tree must have the same value as the corresponding variable in the adjacent
node in the tree.

• The tree width of a tree decomposition of a graph is one less than the size of the largest node; the tree width of the graph itself is defined to be the minimum tree width among all its tree decompositions. If a graph has tree width w, then the problem can be solved in O(nd^(w+1)) time given the corresponding tree decomposition. Hence, CSPs with constraint graphs of bounded tree width are solvable in polynomial time.

Value symmetry
• Consider the map-coloring problem with d colors. For every consistent solution, there is actually a set of
d! solutions formed by permuting the color names. For example, on the Australia map we know that WA,
NT, and SA must all have different colors, but there are 3! = 6 ways to assign three colors to three
regions. This is called value symmetry. We would like to reduce the search space by a factor of d! by
breaking the symmetry in assignments. We do this by introducing a symmetry-breaking constraint.
• For map coloring, it was easy to find a constraint that eliminates the symmetry. In general it is NP-hard
to eliminate all symmetry, but breaking value symmetry has proved to be important and effective on a
wide range of problems.
