
Artificial Intelligence in Games for Move Prediction and Optimization
Module 5

1
Content
• Games for Artificial Intelligence; Game AI Panorama
• AI Methods: Tree Search, Evolutionary Computation, Supervised Learning & Reinforcement Learning

2
Games for Artificial Intelligence

• There are a number of reasons why games offer an ideal domain for the study of artificial intelligence.
• It is the complexity and interestingness of games as problems that make them desirable for AI.
• From a computational complexity perspective, many games are NP-hard (NP refers to "nondeterministic polynomial time"), meaning that the worst-case complexity of "solving" them is very high.
• In other words, in the general case an algorithm for solving a particular game could run for a very long time.
• Depending on a game's properties, complexity can vary substantially.
• Examples of NP-hard games: Mastermind, the arcade game Lemmings (Psygnosis, 1991), and Microsoft's Minesweeper.
3
Why Artificial Intelligence for Games?

• AI can improve games in several ways by merely playing them.


• AI plays games with two core objectives in mind: play well and/or play believably
(or human-like, or interestingly).
• AI can control either the player character or the non-player character of the game.
• AI that plays well as a player character focuses on optimizing the performance of play, where performance is measured solely as the degree to which the player meets the objectives of the game. Such AI can be of tremendous importance for automatic game testing and for the evaluation of the game design as a whole.
• AI that plays well as a non-player character, instead, can empower dynamic difficulty adjustment and automatic game balancing mechanisms that will in turn personalize and enhance the experience for the player.
4
Panoramic views of Game AI
• To facilitate and foster synergies across active research areas we place
all key studies into a taxonomy with the hope of developing a common
understanding and vocabulary within the field of AI and games.
• In this section we view game AI research from three high-level
perspectives
1. Methods (Computer) Perspective
2. End User (Human) Perspective
3. Player-Game Interaction Perspective

5
1. Methods (Computer) Perspective

• The first panoramic view of game AI we present is centered around the AI methods used in the field.
• Evolutionary computation is a dominant method for playing to win, for generating content (in an
assisted/mixed-initiative fashion or autonomously), and for modeling players.
• It has also been considered for the design of believable play (play for experience) research.
Supervised learning is of substantial use across the game AI areas and appears to be dominant in
player experience and behavioral modeling, as well as in the area of AI that plays for experience.
• Behavior authoring, on the other hand, is useful solely for game-playing.
• Reinforcement learning and unsupervised learning find limited use across the game AI areas, being dominant only in AI that plays to win and in player behavior modeling, respectively.
• Finally, tree search finds use primarily in playing to win, and it is also considered, as a form of planning, for controlling play for experience and in computational narrative.

6
2. End User (Human) Perspective
• The second panoramic view of the game AI field puts an emphasis on the end user of the AI technology or its general outcome (product or solution).
• Towards that aim we investigate three core dimensions of the
game AI field and classify all game AI areas with respect to the
process AI follows, the game context under which algorithms
operate and, finally, the end user that benefits most from the
resulting outcome.

7
Three core dimensions of the game AI
• In general, what can AI do within games? AI can model or generate.
For instance, an artificial neural network can model a playing pattern,
or a genetic algorithm can generate game assets.
• What can AI methods model or generate in a game? The two possible
classes here are content and behavior. For example, AI can model a
player's affective state, or generate a level.
• Finally, the third dimension is the end user: AI can model, or generate,
either content or behavior; but, for whom? The classes under the third
dimension are the designer, the player, the AI researcher, and the
producer/publisher.
8
3. Player-Game Interaction Perspective
• Putting an emphasis on player experience and behavior, player modeling directly
focuses on the interaction between a player and the game context.
• Game content is influenced primarily by research on autonomous procedural
content generation. In addition to other types of content, most games feature
NPCs, the behavior of which is controlled by some form of AI.
• NPC behavior is informed by research in NPCs that play the game to win or any
other playing-experience purpose such as believability.

9
What is a “Game AI”?
• The term "Game AI" is used to refer to a broad set of algorithms that also include techniques from control theory, robotics, computer graphics and computer science in general.
• Most video games include various non-player characters (NPCs). These are controlled by the game software in some way, and that controlling code is known as Game AI.
• In fact, for many game developers "Game AI" refers to the program code that controls the NPCs, regardless of how simple or sophisticated that code is.

10
Game AI
• The first applications of Game AI date to 1951, when a machine was demonstrated that played Nim (a mathematical game of strategy in which two players take turns removing objects from distinct heaps; on each turn, a player must remove at least one object, and may remove any number of objects provided they all come from the same heap; the goal is to be the player who removes the last object). In the same year, Christopher Strachey wrote a draughts (checkers) program.
• The second classic example is chess, for which Dietrich Prinz wrote a program in 1951.
11
Nim and Chess games

12
Examples of games with very good AI

Publisher: Blizzard Entertainment


13
Examples of games with very good AI (contd)

Far Cry 2 (Developer: Ubisoft Montreal)


14
Examples of games with very good AI (contd)

Tom Clancy's Splinter Cell: Blacklist


Developer: Ubisoft Montreal
15
Why Use AI to Play Games?

• AI plays a very important role in the gaming industry. It is most frequently associated with playing as the opponent.

Uses of AI in Gaming:
1. Playing to win in a player role
2. Playing to win in a non-player role
3. Playing for experience in the player role
4. Playing for experience in a non-player role

16
1. Playing To Win In A Player Role

The most common use of AI together with games in academic settings is to play to win while taking the role of a human player. This is especially common when using games as an AI testbed. AI programs have been used to give tough competition to human players.

Games are excellent testbeds for artificial intelligence for a number of reasons. Games are made to test human intelligence, and as a result they offer the kind of gradual skill progression that allows for testing of AI at different capability levels.

AI is used for testing games too! When designing a new game, or a new game level, you can use a game-playing agent to test whether the game or level is playable, so-called simulation-based testing. Human-like playing is possible too!

There are some games where you need strong AI to provide a challenge to players, including many strategic games such as the classic board games Chess, Checkers and Go. However, for games with hidden information, it is often easier to provide challenge by simply "cheating", for example by giving the AI player access to the hidden state of the game, or even by modifying the hidden state so as to make it harder for the human to play.
17
2. Playing To Win In A Non-Player Role
• The role most commonly associated with AI in the gaming industry is controlling non-playable characters (NPCs).
• Non-player characters are very often designed not to offer maximum challenge or otherwise be as effective as possible, but instead to be entertaining or human-like.
• Strategy games such as Civilization and XCOM: Enemy Unknown need NPC play-to-win AI for performing roles that humans do not play.
• NPC playing to win is an essential component of AI in many racing games, where the NPC cars adapt to the speed of the human players so that they are never too far behind or too far ahead.
18
Playing To Win In A Non-Player Role (contd)

19
3. Playing For Experience In The Player Role

• The most important reason for this category of AI application is that we often want a human-like agent that takes the player role but does not focus on winning, for the purpose of simulation-based testing.
• The quality of game content is often evaluated automatically with the help of an agent playing the game.
• Agents need to play in a human-like manner in order to evaluate whether humans can overcome the challenges in a particular scenario.
• Another situation where human-like play is necessary is when you want to demonstrate to a human player how to play a level, i.e., a demo mode.
• One particular difficulty is that there are many ways in which a typical AI agent plays differently from a typical human player, depending on the AI's algorithm, the nature of the game, etc.
20
3. Playing For Experience In The Player Role (contd)

21
4. Playing For Experience In A Non- Player Role

• The most common goal for game-playing AI in the game industry is to make non-player characters act, almost always in ways which are not primarily meant to beat the player or otherwise "win" the game.
• NPCs may exist in games for many, sometimes overlapping, purposes: to act as adversaries, to provide assistance and guidance, to form part of a puzzle, to tell a story, to provide a backdrop to the action of the game, to be emotively expressive, and so on.
• NPC roles vary widely, ranging from side characters to the game's main boss. It all depends on the code and the algorithm behind the AI.
• A large part of the challenge posed by AI is for the player to memorize the challenge and use counters against it.
• The AI's algorithm should be developed in a way that is not boring for the player.
22
Which Games Can AI Play?

23
Which Games Can AI Play? (contd)

Board Games
• Easy to implement AI algorithms
• Chess is commonly used for AI research
• Adversarial planning
• Skill demands are narrow
• Most board games have very simple discrete state representations and deterministic forward models

Card Games
• Card games are games centered on one or several decks of cards
• Almost all card games feature a large degree of hidden information
• Poker is a good example
• Prediction, action, reaction

24
Which Games Can AI Play? (contd)

Classic Arcade Games


• Classic arcade games (from the 1970s and 1980s) have been commonly used as AI benchmarks
• Most such games require fast reactions and precise timing

Strategy Games
• Strategy games are games where the player controls multiple characters or units, and the objective of the game is to prevail in some sort of conquest or conflict.
• They may be turn-based or real-time. They are difficult domains for AI because of the large number of possible outcomes.
25
Which Games Can AI Play? (contd)
Racing Games
• Racing games are games where the player is tasked with controlling some kind of vehicle or character so as to reach a goal in the shortest possible time, or to traverse as far as possible along a track in a given time.
• Most racing games actually require multiple simultaneous tasks to be executed and have significant skill depth.
• Agents are trained for the various tracks and conditions.
• Forza is a prominent example of AI trained on human driving on the game's tracks.

26
Which Games Can AI Play? (contd)
First Person Shooter
• Shooters are often seen as fast-paced games where speed of perception and reaction is crucial, and this is true to an extent, although the speed of gameplay varies between different shooters.
• AI has the advantage of quicker reaction and movement over human players.
• But there are other cognitive challenges as well, including visual input, orientation and movement in a complex three-dimensional environment, predicting the actions and locations of multiple adversaries, and in some game modes team-based collaboration.

27
AI Algorithms used in Games
• Ad-hoc authoring
• Tree search
• Evolutionary computation
• Supervised learning
• Reinforcement learning and
• Unsupervised learning

28
Path Finding Algorithms

29
Pac Man 1980

30
Path Finding Algorithms
• Breadth First Search
• Depth First Search
• Dijkstra's Algorithm
• Greedy Search
• A*
• D*

33
Problem Statement

• Shizuka has invited Nobita over for cake.
• But because Gian forced Nobita to play baseball, he is already late.
• Shizuka will get upset if he is any later.
• There is a chance she will invite Dekisugi instead if Nobita doesn't show up.
• As Nobita's friend, you (Doraemon) have to find the optimal path to Shizuka's home.

34
BREADTH FIRST SEARCH
Treat the neighbourhood as layers:

o Explore the neighbourhood layer by layer
o Proceed to the next layer only once all nodes in the current layer are complete
o If the destination is found, there is no need to search further: that is the nearest distance (a runnable sketch follows below)

35
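To make the layer-by-layer idea concrete, here is a minimal Python sketch of BFS pathfinding; the adjacency-dict representation and the toy "neighbourhood" data are illustrative, not taken from the slides.

from collections import deque

def bfs_shortest_path(graph, start, goal):
    # graph: dict mapping each node to a list of neighbouring nodes.
    # Nodes are explored layer by layer (FIFO queue); the first time the
    # goal is dequeued, no shorter path can exist, so we stop.
    parent = {start: None}              # doubles as the "visited" set
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:                # destination found: rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:     # tentative: queued, not yet expanded
                parent[neighbour] = node
                frontier.append(neighbour)
    return None                         # goal unreachable

# Toy neighbourhood (illustrative):
town = {"Nobita": ["Park", "School"], "Park": ["Nobita", "Shizuka"],
        "School": ["Nobita"], "Shizuka": ["Park"]}
print(bfs_shortest_path(town, "Nobita", "Shizuka"))   # ['Nobita', 'Park', 'Shizuka']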
[Slides 36-46: step-by-step BFS animation on the neighbourhood map; each frame marks nodes as Path, Visited, or Tentative while the search expands one layer at a time.]
DFS

• Depth-first Search (DFS) is an algorithm for searching a graph or


tree data structure.
• The algorithm starts at the root (top) node of a tree and goes as
far as it can down a given branch (path), and then backtracks until
it finds an unexplored path, and then explores it.
• The algorithm does this until the entire graph has been explored.
• Depth-first searches are often used as subroutines in other more
complex algorithms.

47
Depth first search in Trees

A tree is an undirected graph in which any two vertices are connected by exactly one path. In other words, any acyclic connected graph is a tree. For a tree, we have the following traversal methods:

• Preorder: visit each node before its children.
• Postorder: visit each node after its children.
• Inorder (for binary trees only): visit the left subtree, then the node, then the right subtree.

48
Depth first search in Graph
Recursive Approach:

Depth-first search is a way of traversing graphs which is closely related to preorder traversal of a tree. Below is a recursive implementation of preorder traversal:

procedure preorder(treeNode v)
{
    visit(v);
    for each child u of v
        preorder(u);
}

To turn this into a graph traversal algorithm, we basically replace "child" by "neighbor". But to prevent infinite loops, we keep track of the vertices already discovered and do not visit them again:

procedure dfs(vertex v)
{
    visit(v);
    for each neighbor u of v
        if u is undiscovered
            call dfs(u);
}

49
Depth first search in Graph

Iterative Approach:

The non-recursive implementation of DFS is similar to the non-recursive implementation of BFS, but differs from it in a few ways:

• It uses a stack instead of a queue.
• The DFS should mark a vertex discovered only after popping it, not before pushing it.
• It uses a reverse iterator instead of an iterator to produce the same results as recursive DFS.

A runnable sketch follows below.

50
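A minimal Python version of the iterative DFS just described; the adjacency-dict graph representation is assumed, as in the BFS sketch earlier.

def dfs_iterative(graph, start):
    # Explicit stack instead of recursion; a vertex is marked discovered
    # only when popped, not when pushed, and neighbours are pushed in
    # reverse so the visit order matches the recursive DFS.
    stack = [start]
    discovered = set()
    order = []
    while stack:
        v = stack.pop()
        if v in discovered:             # may have been pushed more than once
            continue
        discovered.add(v)               # mark only after popping
        order.append(v)
        for u in reversed(graph[v]):
            if u not in discovered:
                stack.append(u)
    return order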
Tic Tac Toe Using DFS

Using the recursive method:

1. Check if the game is over, i.e., if either player won or if the board is fully filled.
   a. If so, return the result of the game.
2. Iterate through all board squares:
   a. If the square is occupied, continue to the next one.
   b. Set the square to either an 'X' or an 'O' depending on the current player's color.
   c. Get the outcome of the game by recursing: call the same method with the updated board and with the turn boolean swapped.
   d. Update the best result of this branch, trying to maximize the result for the current player.

A sketch of this recursion follows below.

51
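The numbered steps above translate almost line for line into Python. The sketch below is one hedged way to write them; the board encoding and function names are my own, not from the slides.

def winner(board):
    # board: list of 9 cells, each 'X', 'O' or ' '
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def best_result(board, player):
    # Step 1: check if the game is over and return the result.
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    if ' ' not in board:
        return 0                           # board full: draw
    opponent = 'O' if player == 'X' else 'X'
    best = -2
    # Step 2: iterate through all board squares.
    for i in range(9):
        if board[i] != ' ':                # 2a: occupied, skip
            continue
        board[i] = player                  # 2b: place the current mark
        # 2c: recurse with the turn swapped; their best is our worst.
        score = -best_result(board, opponent)
        board[i] = ' '                     # undo the move (backtrack)
        best = max(best, score)            # 2d: keep the best outcome
    return best

print(best_result([' '] * 9, 'X'))         # 0: perfect play ends in a draw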
[Slides 52-56: step-by-step DFS animation on the neighbourhood map; each frame marks nodes as Path, Visited, or Tentative while the search follows one branch as deep as possible and then backtracks.]
Dijkstra’s Algorithm

“Finding shortest path from source by building a set of nodes


that have minimum distance from the source.”

57
Dijkstra’s Algorithm

• Create a set that keeps track of vertices included in the shortest path tree, i.e., whose minimum distance from the source has been calculated and finalized. Initially, this set is empty.

• Assign a distance value to all vertices in the input graph. Initialize all distance values as ∞. Assign a distance value of 0 to the source vertex so that it is picked first.

58
Continue…

• While the set doesn't include all vertices:

• Pick a vertex u which is not in the set and has the minimum distance value.

• Include u in the set.

• Update the distance value of all adjacent vertices of u: for every adjacent vertex v, if the sum of the distance value of u (from the source) and the weight of edge u-v is less than the distance value of v, then update the distance value of v.

A runnable sketch follows below.

59
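The two slides above map directly onto a standard priority-queue implementation. A hedged Python sketch follows, using a heap instead of a linear scan to pick the minimum-distance vertex.

import heapq

def dijkstra(graph, source):
    # graph: dict node -> list of (neighbour, edge_weight) pairs.
    dist = {v: float('inf') for v in graph}    # all distances start at infinity
    dist[source] = 0                           # source is picked first
    finalized = set()                          # the shortest-path-tree set
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)             # unfinalized vertex with minimum distance
        if u in finalized:
            continue
        finalized.add(u)                       # include u in the set
        for v, w in graph[u]:                  # update all adjacent vertices
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist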
[Slides 60-69: worked Dijkstra example on a small graph with vertices A, B, C and D. Each frame updates the tentative-distance table (source distance 0, all others ∞) as the closest unfinalized vertex is picked and the distances of its neighbours are relaxed.]
Application

Google Maps!
It uses more complex and efficient algorithms, but Dijkstra is the basis.

Network routing protocols!
Dijkstra's algorithm is widely used in the routing protocols that routers use to update their forwarding tables. The algorithm provides the shortest-cost path from a source router to the other routers in the network.

70
Greedy

"Focus on choosing the next best choice, whether or not it gives the best overall solution."

71
Greedy

• Greedy algorithms are fast

• A greedy algorithm is an algorithmic paradigm that follows the problem solving heuristic of making the
locally optimal choice at each stage with the intent of finding a global optimum.

• In many problems, a greedy strategy does not usually produce an optimal solution, but nonetheless a greedy
heuristic may yield locally optimal solutions that approximate a globally optimal solution in a reasonable
amount of time.

72
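As an illustration of "always take the locally best next choice", here is a hedged sketch of greedy best-first search: it expands whichever frontier node has the lowest heuristic value, ignoring the cost already paid, so it is fast but not guaranteed to find the optimal path. The dict-based interface is an assumption for the example.

import heapq

def greedy_best_first(graph, h, start, goal):
    # graph: dict node -> list of neighbours; h: dict node -> heuristic value.
    heap = [(h[start], start, [start])]
    seen = {start}
    while heap:
        _, node, path = heapq.heappop(heap)    # locally best-looking node
        if node == goal:
            return path
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                heapq.heappush(heap, (h[nbr], nbr, path + [nbr]))
    return None                                # goal unreachable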
Crystal Quest

• Crystal Quest is an action game originally for the Macintosh. It was written by Patrick Buckland for Casady &
Greene in 1987, and was ported to the Apple IIgs in 1989 by Bill Heineman. Ports were also made to
the Amiga, Game Boy, iOS, and Palm. It was notable for being the first game to support the color displays on
the Macintosh II.
• In the Macintosh computer game Crystal Quest the objective is to collect crystals, in a fashion similar to
the travelling salesman problem. The game has a demo mode, where the game uses a greedy algorithm to
go to every crystal. The artificial intelligence does not account for obstacles, so the demo mode often ends
quickly.

73
[Slides 74-77: greedy search example on nodes A and B with heuristic values H=2 and H=6; from the current node the search always steps toward the neighbour with the lowest H, regardless of the cost already incurred.]
A* Algorithm

“Cost and Heuristics both matter”

78
A* Algorithm

• Informally speaking, the A* search algorithm, unlike other traversal techniques, has "brains".
• What this means is that it is a genuinely smart algorithm, which separates it from other conventional algorithms.

79
A* Algorithm - What it does?

What the A* search algorithm does is that at each step it picks the node according to a value 'f', a parameter equal to the sum of two other parameters, 'g' and 'h'. At each step it picks the node/cell having the lowest 'f' and processes that node/cell.

g = the movement cost to move from the starting point to a given square on the grid, following the path generated to get there.

h = the estimated movement cost to move from that given square on the grid to the final destination. This is often referred to as the heuristic (a smart guess).

80
A* Algorithm
1. Initialize the open list
2. Initialize the closed list; put the starting node on the open list (you can leave its f at zero)
3. While the open list is not empty:
   a) find the node with the least f on the open list, call it "q"
   b) pop q off the open list
   c) generate q's 8 successors and set their parents to q
   d) for each successor:
      i) if the successor is the goal, stop the search; otherwise compute
         successor.g = q.g + distance between successor and q
         successor.h = distance from goal to successor
         successor.f = successor.g + successor.h
      ii) if a node with the same position as the successor is in the OPEN list with a lower f, skip this successor
      iii) if a node with the same position as the successor is in the CLOSED list with a lower f, skip this successor; otherwise, add the node to the open list
   e) push q on the closed list
81
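The pseudocode above, condensed into a runnable Python sketch over an explicit weighted graph (a dict of (neighbour, cost) lists rather than an 8-connected grid; a simplification for illustration, not the slides' exact setup):

import heapq

def a_star(graph, h, start, goal):
    # graph: dict node -> list of (neighbour, step_cost); h: heuristic to goal.
    g = {start: 0}
    parent = {start: None}
    open_list = [(h[start], start)]            # priority = f = g + h
    closed = set()
    while open_list:
        _, q = heapq.heappop(open_list)        # node with the least f
        if q == goal:                          # goal reached: rebuild the path
            path = []
            while q is not None:
                path.append(q)
                q = parent[q]
            return path[::-1]
        if q in closed:
            continue
        closed.add(q)
        for succ, cost in graph[q]:            # generate q's successors
            tentative_g = g[q] + cost
            if succ not in g or tentative_g < g[succ]:   # better route found
                g[succ] = tentative_g
                parent[succ] = q
                heapq.heappush(open_list, (tentative_g + h[succ], succ))
    return None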
A* Algorithm

Games don't need a lot of accuracy, but speed is crucial. So most games use the A* algorithm, which uses heuristics to speed up the calculations at a trade-off in accuracy. In most cases A* is more than sufficient.

82
[Slides 83-87: worked A* example on the A-B-C-D graph; each frame shows the H (heuristic) and G (path cost) values of the frontier nodes, and the node with the lowest F = G + H is expanded next.]
D* Algorithm

behaves like A* except that the arc costs can change as the algorithm runs

88
D* Algorithm

• This is the algorithm most games use when the terrain is not known/visible to the player.
• There are three variants of this algorithm: the original D* (an informed incremental search algorithm), Focussed D* (an informed incremental heuristic search algorithm), and D* Lite (built on Lifelong Planning A*).

89
D* Algorithm – An Example

Imagine exploring an unknown planet using a robotic vehicle. The robot moves along the rugged terrain while using a range scanner to make precise measurements of the ground in its vicinity. As the robot moves, it may discover that some parts were easier to traverse than it originally thought. In other cases, it might realize that some direction it was intending to go is impassable due to a large boulder or a ravine. If the goal is to arrive at some specified coordinates, this problem can be viewed as a navigation problem in an unknown environment. Sounds like pure artificial intelligence, doesn't it?

90
The Automated Cross-Country Unmanned Vehicle (XUV) is
equipped with laser radar and other sensors, and uses Stentz's
algorithm (D*) to navigate (courtesy of General Dynamics
Robotic Systems).

91
D* Algorithm

Games don't need a lot of accuracy, but speed is crucial. So most games use the A* algorithm, which uses heuristics to speed up the calculations at a trade-off in accuracy. In most cases A* is more than sufficient.

Very well, but what if the heuristic of a path keeps changing? Or there is an enemy roaming around that needs to be avoided? Or the position of the goal state itself changes without you having any knowledge of the situation? This is a problem of dynamic heuristics.

92
Bye-bye, Nobita! I am not going to help you anymore; you have to do it yourself now!

[Slides 93-105: worked example on a grid of numbered cost/heuristic cells; as blocked cells are discovered along the way, the affected values are updated and the path is repaired, illustrating the dynamic replanning that D* performs.]
D* Algorithm

D* and its variants have been widely used for mobile robot and autonomous vehicle
navigation. Such navigation systems include a prototype system tested on the Mars
rovers Opportunity and Spirit and the navigation system of the winning entry in the
DARPA Urban Challenge, both developed at Carnegie Mellon University.

106
Minimax Algorithm

107
Tic-Tac-Toe
 Initial State: board position, a 3x3 matrix of O's and X's.
 Operators: putting O's or X's in vacant positions, alternately.
 Terminal test: determines when the game is over.
 Utility function:
e(p) = (number of complete rows, columns or diagonals still open for the player) - (number of complete rows, columns or diagonals still open for the opponent)
108
Minimax Algorithm
• Generate the game tree
• Apply the utility function to each terminal state to get its value
• Use these values to determine the utility of the nodes one level
higher up in the search tree
– From bottom to top
– For a max level, select the maximum value of its successors
– For a min level, select the minimum value of its successors
• From root node select the move which leads to highest value

109
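A compact Python sketch of exactly these steps, using nested lists as the game tree; the leaves carry utility values, and the example tree is illustrative, not from the slides.

def minimax(node, maximizing):
    # Leaf: a number is the utility of a terminal state.
    if isinstance(node, (int, float)):
        return node
    # Work bottom-up: max levels take the maximum of their successors,
    # min levels take the minimum.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Depth-2 tree: MAX at the root, MIN below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))   # 3: the root picks the branch whose minimum is largest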
Alpha Beta pruning

• Alpha-beta pruning is a modified version of the minimax algorithm: an optimization technique for minimax.
• As we have seen, the number of game states the minimax search algorithm has to examine is exponential in the depth of the tree. We cannot eliminate the exponent, but we can effectively cut it in half.
• Hence there is a technique by which we can compute the correct minimax decision without checking every node of the game tree, and this technique is called pruning.
• Let's define the parameters alpha and beta:

• Alpha is the best value that the maximizer can currently guarantee at that level or above.

• Beta is the best value that the minimizer can currently guarantee at that level or above.
111
Alpha Beta pruning (contd)

• The initial call starts from A. The value of alpha here is -INFINITY and the value of beta is +INFINITY. These values are passed down to subsequent nodes in the tree. At A the maximizer must choose the max of B and C, so A calls B first.
• At B the minimizer must choose the min of D and E, and hence calls D first.
• At D, it looks at its left child, which is a leaf node. This node returns a value of 3. Now the value of alpha at D is max(-INF, 3), which is 3.

112
Alpha Beta pruning (contd)

• D now looks at its right child, which returns a value of 5. At D, alpha = max(3, 5), which is 5. Now the value of node D is 5.
• D returns a value of 5 to B. At B, beta = min(+INF, 5), which is 5. The minimizer is now guaranteed a value of 5 or less. B now calls E to see if it can get a lower value than 5.
• At E the values of alpha and beta are not -INF and +INF but instead -INF and 5 respectively, because the value of beta was changed at B and that is what B passed down to E.
• Now E looks at its left child, which is 6. At E, alpha = max(-INF, 6), which is 6. Here the pruning condition becomes true: beta is 5 and alpha is 6, so beta <= alpha holds. Hence the search breaks at E, and E returns 6 to B.
• Note how it did not matter what the value of E's right child is. It could have been +INF or -INF; it still wouldn't matter, and we never even had to look at it, because the minimizer was guaranteed a value of 5 or less. As soon as the maximizer saw the 6, it knew the minimizer would never come this way, because the minimizer can already get a 5 on the left side of B. This way we didn't have to look at E's right subtree at all.
113
After pruning and repeating the steps for all the nodes

114
Use of Alpha Beta Pruning
• It reduces the computation time by a huge factor.
• This allows us to search much faster and even go into deeper
levels in the game tree.
• It cuts off branches in the game tree which need not be
searched because there already exists a better move available.

115
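A hedged Python sketch of minimax with alpha-beta pruning; the small tree at the bottom reproduces the B/D/E walkthrough above, where E's right child is never examined.

def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    if isinstance(node, (int, float)):         # leaf: utility value
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)          # best the maximizer can guarantee
            if beta <= alpha:                  # the minimizer will avoid this branch
                break                          # prune the remaining children
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)                # best the minimizer can guarantee
        if beta <= alpha:                      # the maximizer already has better
            break
    return value

# B's children are D=[3, 5] and E=[6, 9]: at E, alpha=6 >= beta=5, so the 9 is pruned.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
print(alphabeta(tree, True))   # 5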
Reinforcement Learning

Many faces of
Reinforcement Learning

116
Branches of Machine Learning

117
Characteristics of Reinforcement Learning
What makes reinforcement learning different from other
machine learning paradigms?
• There is no supervisor, only a reward signal
• Feedback is delayed, not instantaneous
• Time really matters (sequential, non i.i.d data)
• Agent's actions affect the subsequent data it receives

118
Examples of Reinforcement Learning
• Fly stunt manoeuvres in a helicopter
• Defeat the world champion at Backgammon
• Manage an investment portfolio
• Control a power station
• Make a humanoid robot walk
• Play many different Atari games better than humans

119
Rewards
• Reinforcement learning is based on the reward hypothesis:
• All goals can be described by the maximisation of expected cumulative reward.

120
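A standard way to state this hypothesis formally, using the usual textbook definition of the discounted return (a supplement to the slide, not taken from it), with discount factor gamma:

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad \gamma \in [0, 1]

The agent's objective is then to select actions that maximise the expected return E[G_t].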
Examples of Rewards

Fly stunt manoeuvres in a helicopter


• +ve reward for following desired trajectory
• -ve reward for crashing
Defeat the world champion at Backgammon
• +/-ve reward for winning/losing a game
Manage an investment portfolio
• +ve reward for each $ in bank
Control a power station
• +ve reward for producing power
• -ve reward for exceeding safety thresholds
Make a humanoid robot walk
• +ve reward for forward motion
• -ve reward for falling over
Play many different Atari games better than humans
• +/-ve reward for increasing/decreasing score
121
Sequential Decision Making
• Goal: select actions to maximise total future reward
• Actions may have long-term consequences
• Reward may be delayed
• It may be better to sacrifice immediate reward to gain more long-term reward
• Examples:
  • A financial investment (may take months to mature)
  • Refuelling a helicopter (might prevent a crash in several hours)
  • Blocking opponent moves (might help winning chances many moves from now)
122
Agent and Environment

123
Agent and Environment

At each step t the agent:
• Executes action At
• Receives observation Ot
• Receives scalar reward Rt

The environment:
• Receives action At
• Emits observation Ot+1
• Emits scalar reward Rt+1

The time step t increments at each environment step.
124
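The interface above is easy to express as a loop. Below is a minimal Python sketch; the env.reset()/env.step() interface is an assumption modelled on common RL toolkits, not an API defined in these slides, and the toy environment is purely illustrative.

import random

def run_episode(env, policy, max_steps=100):
    observation = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(observation)                  # agent executes A_t
        observation, reward, done = env.step(action)  # env emits O_{t+1} and R_{t+1}
        total_reward += reward
        if done:
            break
    return total_reward

class CoinFlipEnv:
    # Toy environment (illustrative): reward +1 for guessing a coin flip.
    def reset(self):
        self.flips_left = 10
        return 0
    def step(self, action):
        self.flips_left -= 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        return 0, reward, self.flips_left == 0

print(run_episode(CoinFlipEnv(), lambda obs: random.randint(0, 1)))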
A* Search algorithm

125
A* Algorithm

126
Graph example
[Figure: a graph example annotated with a heuristic function h(n) and a cost function g(n).]
127
A Real Time
application of A*

134
D* Algorithm
• The algorithm is named D* because it resembles A*, except that it is dynamic in the sense that arc costs can change during the traverse of the solution path.
• Provided that the traverse is properly coupled to the replanning process, it is guaranteed to be optimal.

140
D* Lite
• Consider a goal-directed robot-navigation task in unknown terrain,
where the robot always observes which of its eight adjacent cells are
traversable and then moves with cost one to one of them.
• The robot starts at the start cell and has to move to the goal cell. It
always computes a shortest path from its current cell to the goal cell
under the assumption that cells with unknown blockage status are
traversable.
• It then follows this path until it reaches the goal cell, in which case it
stops successfully, or it observes an untraversable cell, in which case
it recomputes a shortest path from its current cell to the goal cell.
141
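For contrast, here is a hedged sketch of the naive baseline that D* Lite improves upon: navigate under the freespace assumption and replan from scratch whenever a blocked cell is discovered. The sketch uses BFS on a 4-connected grid (a simplification of the slides' 8-connected setting); D* Lite would produce the same routes but repairs its previous search instead of redoing it.

from collections import deque

def grid_bfs(known_blocked, start, goal, size):
    # Shortest path assuming every cell not yet known to be blocked is free.
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in known_blocked and nxt not in parent):
                parent[nxt] = cell
                frontier.append(nxt)
    return None

def navigate(true_blocked, start, goal, size=8):
    # Move along the planned path; observe adjacent cells; replan on surprises.
    known_blocked, pos, replans = set(), start, 0
    while pos != goal:
        path = grid_bfs(known_blocked, pos, goal, size)
        if path is None:
            return None                    # goal genuinely unreachable
        for step in path[1:]:
            x, y = pos
            newly_seen = {c for c in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
                          if c in true_blocked} - known_blocked
            if newly_seen:                 # discovered a blockage: replan
                known_blocked |= newly_seen
                replans += 1
                break
            pos = step                     # observation clear: take the step
    return replans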
D* Lite (Contd)
• Figure 1 shows the goal distances of all traversable cells and the shortest paths from the robot's current cell to the goal cell, both before and after the robot has moved along the path and discovered the first blocked cell it did not know about.
142
D* Lite (Contd)

• Cells whose goal distances have changed are shaded gray. The goal distances are important because one can easily determine a shortest path from the robot's current cell to the goal cell by greedily decreasing the goal distances once they have been computed.
• Notice that the number of cells with changed goal distances is small, and most of the changed goal distances are irrelevant for recalculating a shortest path from the current cell to the goal cell.
• Thus, one can efficiently recalculate a shortest path by recalculating only those goal distances that have changed (or have not been calculated before) and are relevant to the shortest path.
• This is what D* Lite does. The challenge is to identify these cells efficiently.
• Reference: https://www.youtube.com/watch?v=q77-uxsDZow
143
Navigation Mesh

• A navigation mesh, or navmesh, is an abstract data structure used in artificial


intelligence applications to aid agents in pathfinding through complicated
spaces.
• This approach has been known since at least the mid-1980s in robotics, where
it has been called a meadow map, and was popularized in video game AI in
2000.
• A navigation mesh is a collection of two-dimensional convex
polygons (a polygon mesh) that define which areas of an environment are
traversable by agents.
• In other words, a character in a game could freely walk around within these
areas unobstructed by trees, lava, or other barriers that are part of the
environment. Adjacent polygons are connected to each other in a graph.
144
Navigation Mesh (contd..)

• Pathfinding within one of these polygons can be done trivially in a straight line because the polygon is
convex and traversable. Pathfinding between polygons in the mesh can be done with one of the large
number of graph search algorithms, such as A*.
• Agents on a navmesh can thus avoid computationally expensive collision detection checks with
obstacles that are part of the environment.
• Navigation meshes can be created manually, automatically, or by some combination of the two. In
video games, a level designer might manually define the polygons of the navmesh in a level editor.
• This approach can be quite labor intensive. Alternatively, an application could be created that takes the
level geometry as input and automatically outputs a navmesh.
• It is commonly assumed that the environment represented by a navmesh is static (it does not change over time), and thus the navmesh can be created offline and be immutable. However, there has been some investigation of online updating of navmeshes for dynamic environments.
• Reference: https://www.youtube.com/watch?v=SMWxCpLvrcc
145
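To tie this back to the search algorithms earlier in the module, here is a hedged sketch of pathfinding over a navmesh's polygon-adjacency graph with A*, using polygon centre points both for step costs and as the straight-line heuristic. The polygon data at the bottom is illustrative, not from the slides.

import heapq, math

def navmesh_path(centers, adjacency, start_poly, goal_poly):
    # centers: dict polygon_id -> (x, y) centre point.
    # adjacency: dict polygon_id -> list of neighbouring polygon ids.
    def dist(a, b):
        return math.dist(centers[a], centers[b])
    g = {start_poly: 0.0}
    parent = {start_poly: None}
    open_heap = [(dist(start_poly, goal_poly), start_poly)]
    closed = set()
    while open_heap:
        _, poly = heapq.heappop(open_heap)
        if poly == goal_poly:                  # rebuild the polygon sequence
            path = []
            while poly is not None:
                path.append(poly)
                poly = parent[poly]
            return path[::-1]
        if poly in closed:
            continue
        closed.add(poly)
        for nbr in adjacency[poly]:
            tentative = g[poly] + dist(poly, nbr)
            if nbr not in g or tentative < g[nbr]:
                g[nbr] = tentative
                parent[nbr] = poly
                heapq.heappush(open_heap, (tentative + dist(nbr, goal_poly), nbr))
    return None

# Four convex floor polygons in a row (illustrative): the agent can walk in a
# straight line inside each one, so this polygon sequence is enough to steer.
centers = {0: (1, 1), 1: (3, 1), 2: (3, 3), 3: (5, 3)}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(navmesh_path(centers, adjacency, 0, 3))   # [0, 1, 2, 3]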
