UNIT 2
🞆 Multi-agent environments, in which each agent needs to
  consider the actions of other agents and how they affect its
  own welfare.
🞆 The unpredictability of these other agents can introduce
  contingencies into the agent's problem-solving process.
🞆 This unit covers competitive environments, in which the
  agents’ goals are in conflict, giving rise to adversarial
  search problems—often known as games.
🞆 ADVERSARIAL SEARCH
🞆 In which we examine the problems that arise
  when we try to plan ahead in a world where
  other agents are planning against us.
    ADVERSARIAL SEARCH
🞆 Mathematical game theory, a branch of economics, views any multiagent
  environment as a game, provided that the impact of each agent on the others is
  “significant,” regardless of whether the agents are cooperative or competitive.
•    In previous topics, we studied search strategies involving only a single
    agent that aims to find a solution, often expressed as a sequence of actions.
• An environment with more than one agent is termed a multi-agent environment,
  where each agent may be an opponent of the others, playing against them while
  considering their actions and the effect of those actions on its own performance.
• So, searches in which two or more players with conflicting goals explore the
    same search space for a solution are called adversarial searches, often
    known as Games.
• Games are modeled as a search problem combined with a heuristic evaluation
  function; these are the two main factors that help model and solve games in AI.
 TYPES OF GAMES IN AI:

                          Deterministic                     Chance moves (stochastic)
  Perfect information     Chess, Checkers                   Backgammon, Monopoly
  Imperfect information   Battleships, blind tic-tac-toe    Bridge, poker, scrabble, nuclear war
•Perfect information: A game with perfect information is one in which agents can
see the complete board, so each agent can observe the other's moves. Examples are
Chess, Checkers, Go, etc.
•Imperfect information: A game with imperfect information is one in which agents
do not have all the information about the game and are not aware of everything
that is going on. Examples are blind tic-tac-toe and Battleship.
•Deterministic games: Deterministic games follow a strict pattern and set of
rules, and there is no randomness associated with them.
Examples are Chess, Checkers, Go, tic-tac-toe, etc.
•Non-deterministic games (stochastic games): Non-deterministic games have
various unpredictable events and a factor of chance, introduced by dice or cards.
Examples are Backgammon, Monopoly, Poker, etc.
ZERO-SUM GAME
• Zero-sum games are adversarial searches that involve pure competition,
   meaning that in a zero-sum game each agent's gain or loss of utility is exactly
   balanced by the losses or gains of utility of the other agent.
• One player of the game tries to maximize one single value, while the other
   player tries to minimize it.
• Each move by one player in the game is called a ply.
• Chess and tic-tac-toe are examples of a zero-sum game.
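In symbols, the zero-sum property can be stated minimally as follows (the notation is ours, using the tic-tac-toe payoffs given later in this unit):

$$U_{\mathrm{MAX}}(s) + U_{\mathrm{MIN}}(s) = 0 \quad \text{for every terminal state } s$$

E.g., the tic-tac-toe terminal payoff pairs are (+1, -1), (-1, +1), and (0, 0).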
Zero-sum game: Embedded thinking
The zero-sum game involves embedded thinking, in which one agent or player is
trying to figure out:
       what to do;
       how to decide the move;
       and what the opponent will do.
• The opponent likewise thinks about what to do.
🞆 Each player is trying to anticipate the opponent's response to their actions.
   This requires embedded thinking, or backward reasoning, to solve game
   problems in AI.
   FORMALIZATION OF THE PROBLEM
🞆 A game can be defined as a type of search in AI that can be formalized with the
following elements:
  • Initial state: It specifies how the game is set up at the start.
  • Player(s): It specifies which player has the move in a state.
  • Actions(s): It returns the set of legal moves in a state.
  • Result(s, a): It is the transition model, which specifies the result of a move in the
    state space.
  • Terminal-Test(s): The terminal test is true if the game is over, and false otherwise.
    States where the game ends are called terminal states.
  • Utility(s, p): A utility function gives the final numeric value for a game that ends
    in terminal state s for player p. It is also called a payoff function.
   E.g.: For Chess, the outcomes are a win, loss, or draw, with
payoff values +1, 0, or ½.
For tic-tac-toe, the utility values are +1, -1, and 0.
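As a concrete illustration, these six elements can be sketched for tic-tac-toe in Python (a minimal sketch; the class and method names are ours, chosen to mirror the formalization above):

```python
class TicTacToe:
    """Board: a tuple of 9 cells, each 'X', 'O', or None. 'X' (MAX) moves first."""

    def initial_state(self):
        return (None,) * 9                          # empty 3x3 board

    def player(self, s):
        # 'X' has the move whenever both players have placed equally many marks
        return 'X' if s.count('X') == s.count('O') else 'O'

    def actions(self, s):
        return [i for i, c in enumerate(s) if c is None]   # legal moves

    def result(self, s, a):
        board = list(s)
        board[a] = self.player(s)                   # transition model
        return tuple(board)

    def _winner(self, s):
        lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
        for i, j, k in lines:
            if s[i] is not None and s[i] == s[j] == s[k]:
                return s[i]
        return None

    def terminal_test(self, s):
        return self._winner(s) is not None or None not in s

    def utility(self, s, p):
        w = self._winner(s)                         # +1 win, -1 loss, 0 draw
        return 0 if w is None else (+1 if w == p else -1)
```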
GAME TREE:
🞆 A game tree is a tree where nodes of the tree are the game states and Edges
  of the tree are the moves by players.
🞆 A game tree involves the initial state, the Actions function, and the Result function.
🞆 Example: Tic-Tac-Toe game tree:
🞆 The following figure shows part of the game tree for tic-tac-toe.
Following are some key points of the game:
• There are two players, MAX and MIN.
• The players take alternate turns, starting with MAX.
• MAX maximizes the result of the game tree.
• MIN minimizes the result.
🞆 Games, like the real world, therefore require the ability to make some decision
  even when calculating the optimal decision is infeasible.
🞆 Game-playing research has therefore spawned a number of interesting ideas on
  how to make the best possible use of time.
MIN-MAX GAME
🞆 We first consider games with two players,
  whom we call MAX and MIN for reasons that
  will soon become obvious.
🞆 MAX moves first, and then they take turns
  moving until the game is over.
🞆 At the end of the game, points are awarded
  to the winning player and penalties are given
  to the loser.
🞆 A game can be formally defined as a kind of
  search problem with the following elements:
🞆 S0: The initial state, which specifies how the
  game is set up at the start.
🞆 PLAYER(s): Defines which player has the move in
  a state.
🞆 ACTIONS(s): Returns the set of legal moves in a
  state.
🞆 RESULT(s, a): The transition model, which
  defines the result of a move.
🞆 TERMINAL-TEST(s): A terminal test, which is true
  when the game is over and false otherwise.
  States where the game has ended are called
  terminal states.
🞆 UTILITY(s, p): A utility (payoff) function, which
  defines the final numeric value for a game that
  ends in terminal state s for a player p.
🞆 TIC-TAC-TOE Explanation:
• From the initial state, MAX has 9 possible moves, as he starts first. MAX
  places x and MIN places o, and both players play alternately until we reach a
  leaf node where one player has three in a row or all squares are filled.
• For each node, both players compute the minimax value, which is the best
  achievable utility against an optimal adversary.
• Suppose both players know tic-tac-toe well and play their best game. Each
  player does his best to prevent the other one from winning. MIN acts
  against MAX in the game.
• So in the game tree, we have a layer of MAX and a layer of MIN, and each layer
  is called a ply. MAX places x, then MIN puts o to prevent MAX from winning,
  and this game continues until a terminal node.
• In the end, either MIN wins, MAX wins, or it is a draw. This game tree is the
  whole search space of possibilities when MIN and MAX play tic-tac-toe,
  taking turns alternately.
 Hence adversarial search with the minimax procedure works as follows:
       • It aims to find the optimal strategy for MAX to win the game.
       • In the game tree, the optimal leaf node could appear at any depth of the
         tree. It follows the approach of depth-first search.
       • It propagates the minimax values up the tree from the terminal nodes
         that are discovered.
🞆 In a given game tree, the optimal strategy can be determined from the minimax
  value of each node, which can be written as MINIMAX(n). MAX prefers to move
  to a state of maximum value and MIN prefers to move to a state of minimum
  value, so:

MINIMAX(s) =
⎧ UTILITY(s)                                   if TERMINAL-TEST(s)
⎨ max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
⎩ min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN
🞆 For tic-tac-toe the game tree is relatively
  small—fewer than 9! = 362,880 terminal
  nodes. But for chess there are over 10^40
  nodes, so the game tree is best thought of
  as a theoretical construct that we cannot
  realize in the physical world.
🞆 But regardless of the size of the game
  tree, it is MAX's job to search for a
  good move.
🞆 We use the term search tree for a tree that
  is superimposed on the full game tree,
  and examines enough nodes to allow a
  player to determine what move to make.
OPTIMAL DECISIONS IN GAMES
🞆 In a normal search problem, the optimal
  solution would be a sequence of actions
  leading to a goal state—a terminal state that
  is a win.
🞆 In adversarial search, MIN has something to
  say about it.
🞆 MAX therefore must find a contingent
  strategy, which specifies MAX's move in the
  initial state, then MAX's moves in the states
  resulting from every possible response by
  MIN to those moves, and so on.
2-PLY GAME
🞆 Even a simple game like tic-tac-toe is too complex for us to draw
  the entire game tree on one page, so we will switch to the trivial
  game in Figure 5.2.
🞆 The possible moves for MAX at the root node are labeled a1, a2, and
  a3.
🞆 The possible replies to a1 for MIN are b1, b2, b3, and so on.
🞆 This particular game ends after one move each by MAX and MIN.
  (In game parlance, we say that this tree is one move deep, consisting
  of two half-moves, each of which is called a ply.)
🞆 The utilities of the terminal states in this game range from 2 to 14.
🞆 Given a game tree, the optimal strategy can be determined
  from the minimax value of each node, which we write as
  MINIMAX(n).
🞆 The minimax value of a node is the utility (for MAX) of being
  in the corresponding state, assuming that both players play
  optimally from there to the end of the game.
🞆 Obviously, the minimax value of a terminal state is just its
  utility.
🞆 Furthermore, given a choice, MAX prefers to move to a state
  of maximum value, whereas MIN prefers a state of minimum
  value.
🞆 So we have the following:

MINIMAX(s) =
⎧ UTILITY(s)                                   if TERMINAL-TEST(s)
⎨ max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
⎩ min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN
🞆 Let us apply these definitions to the game tree in Figure 5.2.
🞆 The terminal nodes on the bottom level get their utility values from the
    game’s UTILITY function.
🞆   The first MIN node, labeled B, has three successor states with values 3,
    12, and 8, so its minimax value is 3.
🞆 Similarly, the other two MIN nodes have minimax value 2.
🞆 The root node is a MAX node; its successor states have minimax values 3,
    2, and 2; so it has a minimax value of 3.
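These backed-up values can be checked with a few lines of Python (a sketch; the leaf values 3, 12, 8 / 2, 4, 6 / 14, 5, 2 follow Figure 5.2):

```python
# MIN values at B, C, D, then the MAX value at the root.
B, C, D = [3, 12, 8], [2, 4, 6], [14, 5, 2]
print([min(n) for n in (B, C, D)])      # [3, 2, 2]
print(max(min(n) for n in (B, C, D)))   # 3, the root's minimax value
```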
🞆 We can also identify the minimax decision
  at the root: action a1 is the optimal choice for
  MAX because it leads to the state with the
  highest minimax value.
🞆 This definition of optimal play for MAX
  assumes that MIN also plays optimally—it
  maximizes the worst-case outcome for MAX.
  What if MIN does not play optimally? Then it
  is easy to show that MAX will do even better.
THE MINIMAX ALGORITHM
🞆 The minimax algorithm (Figure 5.3) computes the minimax decision from
  the current state. It uses a simple recursive computation of the minimax
  values of each successor state, directly implementing the defining
  equations. The recursion proceeds all the way down to the leaves of the tree,
  and then the minimax values are backed up through the tree as the
  recursion unwinds. For example, in Figure 5.2, the algorithm first recurses
  down to the three bottom-left nodes and uses the UTILITY function on
  them to discover that their values are 3, 12, and 8, respectively. Then it
  takes the minimum of these values, 3, and returns it as the backed-up value
  of node B. A similar process gives the backed-up values of 2 for C and 2 for
  D. Finally, we take the maximum of 3, 2, and 2 to get the backed-up value
  of 3 for the root node. The minimax algorithm performs a complete depth-
  first exploration of the game tree.
🞆 If the maximum depth of the tree is m and there are b legal moves at each
  point, then the time complexity of the minimax algorithm is O(b^m). The
  space complexity is O(bm) for an algorithm that generates all actions at
  once, or O(m) for an algorithm that generates actions one at a time (see
  page 87). For real games, of course, the time cost is totally impractical, but
  this algorithm serves as the basis for the mathematical analysis of games
  and for more practical algorithms.
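This recursion translates directly into Python (a minimal sketch, assuming the TicTacToe-style interface from the earlier example and that MAX has the move in `state`):

```python
def minimax_decision(game, state):
    """Return the legal move with the highest backed-up minimax value."""
    root = game.player(state)                   # MAX's identity at the root
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), root))

def max_value(game, state, root):
    if game.terminal_test(state):
        return game.utility(state, root)        # leaf: use UTILITY directly
    return max(min_value(game, game.result(state, a), root)
               for a in game.actions(state))

def min_value(game, state, root):
    if game.terminal_test(state):
        return game.utility(state, root)
    return min(max_value(game, game.result(state, a), root)
               for a in game.actions(state))

# Usage: a complete depth-first search of the tic-tac-toe tree.
# game = TicTacToe(); print(minimax_decision(game, game.initial_state()))
```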
   EXAMPLE PROBLEM
🞆 Step 1: In the first step, the algorithm generates the entire game tree and
  applies the utility function to get the utility values for the terminal states. In
  the tree diagram below, let A be the initial state of the tree. Suppose the
  maximizer takes the first turn, with worst-case initial value -∞, and the
  minimizer takes the next turn, with worst-case initial value +∞.
Step 2: Now, we first find the utility values for the Maximizer. Its initial
value is -∞, so we compare each terminal value with the Maximizer's initial
value and determine the higher node values. It finds the maximum among them all:
•For node D:    max(-1, -∞) = -1, then max(-1, 4) = 4
•For node E:    max(2, -∞) = 2, then max(2, 6) = 6
•For node F:    max(-3, -∞) = -3, then max(-3, -5) = -3
•For node G:    max(0, -∞) = 0, then max(0, 7) = 7
Step 3: In the next step it is the minimizer's turn, so it compares each
node's value with +∞ and finds the third-layer node values:
•For node B: min(4, 6) = 4
•For node C: min(-3, 7) = -3
Step 4: Now it is the Maximizer's turn again, and it chooses the maximum
of all node values to find the value of the root node. In this game tree
there are only 4 layers, so we reach the root node immediately, but in real
games there will be more than 4 layers:
•For node A: max(4, -3) = 4
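The whole example can be replayed in a few lines of Python (a sketch; the nested-list encoding of the tree is ours):

```python
# Levels alternate MAX / MIN / MAX from the root A downward; leaves are utilities.
tree = [[[-1, 4], [2, 6]],       # B = min(D, E); D = max(-1, 4), E = max(2, 6)
        [[-3, -5], [0, 7]]]      # C = min(F, G); F = max(-3, -5), G = max(0, 7)

def minimax_value(node, maximizing):
    if not isinstance(node, list):       # leaf: terminal utility
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

print(minimax_value(tree, maximizing=True))   # 4, the value of node A
```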
🞆 Properties of the Mini-Max algorithm:
• Complete: the Min-Max algorithm is complete. It will definitely find a solution
  (if one exists) in a finite search tree.
• Optimal: the Min-Max algorithm is optimal if both opponents are playing
  optimally.
• Time complexity: as it performs DFS on the game tree, the time complexity
  of the Min-Max algorithm is O(b^m), where b is the branching factor of the
  game tree and m is the maximum depth of the tree.
• Space complexity: the space complexity of the Mini-Max algorithm is also
  similar to DFS, which is O(bm).
🞆 Limitation of the minimax algorithm:
🞆 The main drawback of the minimax algorithm is that it gets really slow for
  complex games such as Chess, Go, etc. Such games have a huge branching
  factor, and the player has lots of choices to consider. This limitation of
  the minimax algorithm can be overcome with alpha-beta pruning, which is
  discussed in the next topic.
  ALPHA-BETA PRUNING
• Alpha-beta pruning is a modified version of the minimax algorithm. It is an
  optimization technique for the minimax algorithm.
• As we have seen, the number of game states the minimax search algorithm
  has to examine is exponential in the depth of the tree. We cannot eliminate
  the exponent, but we can effectively cut it in half.
• Hence there is a technique by which we can compute the correct minimax decision
  without checking each node of the game tree, and this technique is called pruning.
• It involves two threshold parameters, alpha and beta, for future expansion, so
  it is called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
• Alpha-beta pruning can be applied at any depth of a tree, and sometimes it
  prunes not only the tree leaves but entire subtrees.
• The two parameters can be defined as:
   • Alpha: the best (highest-value) choice we have found so far at any point along
       the path of the Maximizer. The initial value of alpha is -∞.
    • Beta: the best (lowest-value) choice we have found so far at any point along
      the path of the Minimizer. The initial value of beta is +∞.
🞆 Alpha-beta pruning applied to a standard
  minimax algorithm returns the same move
  as the standard algorithm does, but it
  removes all the nodes that do not really
  affect the final decision and only make the
  algorithm slow. Hence, by pruning these
  nodes, it makes the algorithm fast.
🞆 Alpha–beta pruning can be applied to
  trees of any depth, and it is often possible
  to prune entire subtrees rather than just
  leaves.
CONDITIONS:
🞆 Condition for alpha-beta pruning: α ≥ β
🞆 Key points about alpha-beta pruning:
• The Max player will only update the value of alpha.
• The Min player will only update the value of beta.
• While backtracking the tree, the node values are passed to the upper
  nodes instead of the values of alpha and beta.
• We only pass the alpha and beta values down to the child nodes.
Step 1: In the first step, the Max player makes the first move
from node A, where α = -∞ and β = +∞. These values of alpha and beta are
passed down to node B, where again α = -∞ and β = +∞, and node B
passes the same values to its child D.
Step 2: At node D, the value of α is calculated, as it is Max's turn.
The value of α is compared first with 2 and then with 3, and max(2, 3) = 3
becomes the value of α at node D; the node value will also be 3.
Step 3: Now the algorithm backtracks to node B, where the value of β
changes, as it is Min's turn. Now β = +∞ is compared with the
available successor node values, i.e. min(∞, 3) = 3; hence at node
B now α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, which
is node E, and the values α = -∞ and β = 3 are passed down as well.
Step 4: At node E, Max takes its turn, and the value of alpha changes.
The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence
at node E, α = 5 and β = 3, where α ≥ β, so the right
successor of E is pruned, and the algorithm does not traverse it.
The value at node E is 5.
Step 5: In the next step, the algorithm again backtracks the tree, from
node B to node A. At node A, the value of alpha is changed; the
maximum available value is 3, as max(-∞, 3) = 3, and β = +∞. These
two values are now passed to the right successor of A, which is node C.
At node C, α = 3 and β = +∞, and the same values are passed on to
node F.
Step 6: At node F, the value of α is again compared with the left child,
which is 0, and max(3, 0) = 3, and then with the right child, which
is 1, and max(3, 1) = 3. α remains 3, but the node value of F becomes 1.
🞆 Step 7: Node F returns the node value 1 to node C. At C, α = 3
  and β = +∞; here the value of beta changes, as it is
  compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1,
  and again the condition α ≥ β is satisfied, so the next child
  of C, which is G, is pruned, and the algorithm does not
  compute the entire subtree G.
🞆 Step 8: C now returns the value 1 to A. Here the best value
  for A is max(3, 1) = 3. The final game tree shows which
  nodes were computed and which were never computed.
  Hence the optimal value for the maximizer is 3 for this example.
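The eight steps above can be verified with a short alpha-beta sketch in Python (the nested-list tree and the None placeholders for pruned leaves are ours, chosen to match the walkthrough):

```python
import math

tree = [[[2, 3], [5, None]],        # B = min(D, E); D = [2, 3], E's right leaf is pruned
        [[0, 1], [None, None]]]     # C = min(F, G); F = [0, 1], G is pruned entirely

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):
        assert node is not None, "a pruned leaf was evaluated"
        return node                              # leaf: terminal utility
    v = -math.inf if maximizing else math.inf
    for child in node:
        cv = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            v = max(v, cv)
            alpha = max(alpha, v)                # Max updates only alpha
        else:
            v = min(v, cv)
            beta = min(beta, v)                  # Min updates only beta
        if alpha >= beta:                        # the pruning condition
            break                                # remaining children are skipped
    return v

print(alphabeta(tree, maximizing=True))          # 3, as found in Step 8
```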
2-PLY WITH ALPHA BETA PRUNING
🞆 Consider again the two-ply game tree from
  Figure 5.2. Let’s go through the calculation of
  the optimal decision once more, this time
  paying careful attention to what we know at
  each point in the process. The steps are
  explained in Figure 5.5. The outcome is that
  we can identify the minimax decision without
  ever evaluating two of the leaf nodes.
🞆 The general principle is this: consider a node
  n somewhere in the tree (see Figure 5.6),
  such that Player has a choice of moving to
  that node. If Player has a better choice m
  either at the parent node of n or at any
  choice point further up, then n will never be
  reached in actual play. So once we have
  found out enough about n (by examining
  some of its descendants) to reach this
  conclusion, we can prune it.
🞆 Alpha–beta search updates the values of α
  and β as it goes along and prunes the
  remaining branches at a node (i.e.,
  terminates the recursive call) as soon as the
  value of the current node is known to be
  worse than the current α or β value for MAX
  or MIN, respectively. The complete algorithm
  is given in Figure 5.7.
SUMMARY
🞆 A game can be defined by the initial state, the
  legal actions in each state, the result of each action, a
  terminal test (which says when the game is
  over), and a utility function that applies to
  terminal states.
🞆 In two-player zero-sum games with perfect
  information, the minimax algorithm can
  select optimal moves by a depth-first
  enumeration of the game tree.
🞆 The alpha–beta search algorithm computes
  the same optimal move as minimax, but
  achieves much greater efficiency by
  eliminating subtrees that are provably
  irrelevant.
OPTIMAL DECISIONS IN MULTIPLAYER GAMES
🞆 Many popular games allow more than two players. Let us examine
  how to extend the minimax idea to multiplayer games. This is
  straightforward from the technical viewpoint, but raises some
  interesting new conceptual issues.
🞆 First, we need to replace the single value for each node with a
  vector of values. For example, in a three-player game with players
  A, B, and C, a vector (vA, vB, vC) is associated with each node. For
  terminal states, this vector gives the utility of the state from each
  player's viewpoint. (In two-player, zero-sum games, the two-element vector
  can be reduced to a single value because the values are always
  opposite.) The simplest way to implement this is to have the UTILITY
  function return a vector of utilities.
🞆 Now we have to consider nonterminal states.
  Consider the node marked X in the game tree
  shown in Figure 5.4. In that state, player C
  chooses what to do. The two choices lead to
  terminal states with utility vectors (vA = 1,
  vB = 2, vC = 6) and (vA = 4, vB = 2, vC = 3).
  Since 6 is bigger than 3, C should choose the
  first move.
🞆 This means that if state X is reached,
  subsequent play will lead to a terminal state
  with utilities (vA = 1, vB = 2, vC = 6). Hence,
  the backed-up value of X is this vector. The
  backed-up value of a node n is always the
  utility vector of the successor state with the
  highest value for the player choosing at n, as
  the sketch below illustrates.
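A small Python sketch of this vector-valued backup (the node encoding and the name maxn_value are ours, not from the text; players are assumed to move in fixed rotation):

```python
def maxn_value(node, to_move, num_players):
    """node: a utility tuple (terminal) or a list of successor nodes."""
    if isinstance(node, tuple):                  # terminal: utility vector
        return node
    nxt = (to_move + 1) % num_players            # next player in rotation
    vectors = [maxn_value(child, nxt, num_players) for child in node]
    return max(vectors, key=lambda v: v[to_move])   # best for the mover

# Node X from the text: C (player index 2) picks (1, 2, 6) over (4, 2, 3)
# because vC = 6 > 3, so the backed-up value of X is (1, 2, 6).
print(maxn_value([(1, 2, 6), (4, 2, 3)], to_move=2, num_players=3))
```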
🞆 Anyone who plays multiplayer games, such as
  Diplomacy, quickly becomes aware that
  much more is going on than in two-player
  games. Multiplayer games usually involve
  alliances, whether formal or informal,
  among the players. Alliances are made and
  broken as the game proceeds. How are we to
  understand such behavior? Are alliances a
  natural consequence of optimal strategies for
  each player in a multiplayer game? It turns
  out that they can be.
🞆 For example, suppose A and B are in weak positions and C is in a
  stronger position. Then it is often optimal for both A and B to
  attack C rather than each other, lest C destroy each of them
  individually. In this way, collaboration emerges from purely
  selfish behavior. Of course, as soon as C weakens under
  the joint onslaught, the alliance loses its value, and either
  A or B could violate the agreement. In some cases, explicit
  alliances merely make concrete what would have happened
  anyway. In other cases, a social stigma attaches to breaking
  an alliance, so players must balance the immediate
  advantage of breaking an alliance against the long-term
  disadvantage of being perceived as untrustworthy.
🞆 If the game is not zero-sum, then
  collaboration can also occur with just two
  players. Suppose, for example, that there is a
  terminal state with utilities (vA = 1000, vB =
  1000) and that 1000 is the highest possible
  utility for each player. Then the optimal
  strategy is for both players to do everything
  possible to reach this state; that is, the players
  will automatically cooperate to achieve a
  mutually desirable goal.