Understanding Markov Decision Processes

 Living near an airport for a year and getting used to the sound of airplanes passing overhead ─ Habituation
 Hearing loud thunder when at home alone at night and then becoming easily startled by bright flashes of light ─ Sensitization

5.6.3 Markov Decision Process

Reinforcement Learning is a type of Machine Learning. It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance. Only simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.

There are many different algorithms that tackle this issue. In fact, Reinforcement Learning is defined by a specific type of problem, and all its solutions are classed as Reinforcement Learning algorithms. In this problem, an agent must decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov Decision Process.
A Markov Decision Process (MDP) model contains:

 A set of possible world states S.
 A set of models.
 A set of possible actions A.
 A real-valued reward function R(s, a).
 A policy, the solution of the Markov Decision Process.
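These components can be sketched as plain Python data structures. This is only a minimal illustration; the state names, action names, and numbers below are made up for this example.

```python
# Minimal sketch of the MDP components above (illustrative names and values).
S = ["s0", "s1"]                       # set of possible world states
A = ["stay", "go"]                     # set of possible actions

# Transition model: P(S' | S, a), one distribution per (state, action) pair
P = {
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s0", "stay"): {"s0": 1.0},
    ("s1", "go"):   {"s0": 0.1, "s1": 0.9},
    ("s1", "stay"): {"s1": 1.0},
}

# Real-valued reward function R(s, a)
R = {("s0", "go"): -1.0, ("s0", "stay"): 0.0,
     ("s1", "go"): -1.0, ("s1", "stay"): 1.0}
```

Each distribution in the transition model sums to 1, since the agent must land in some next state.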

IT DEPT-R20-MACHINE LEARNING Page 124


State:

A State is a set of tokens that represent every state that the agent can be in.

Model:
A Model (sometimes called a Transition Model) gives an action's effect in a state. In particular, T(S, a, S') defines a transition T where being in state S and taking action 'a' takes us to state S' (S and S' may be the same). For stochastic actions (noisy, non-deterministic) we also define a probability P(S'|S, a), which represents the probability of reaching state S' if action 'a' is taken in state S. Note that the Markov property states that the effects of an action taken in a state depend only on that state and not on the prior history.

Actions
A is the set of all possible actions. A(s) defines the set of actions that can be taken in state S.

Reward
A Reward is a real-valued reward function. R(s) indicates the reward for simply being in
the state S. R(S,a) indicates the reward for being in a state S and taking an action ‘a’.
R(S,a,S’) indicates the reward for being in a state S, taking an action ‘a’ and ending up in a
state S’.
Policy
A Policy is a solution to the Markov Decision Process. A policy is a mapping from S to A; it indicates the action 'a' to be taken while in state S.
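A policy can be represented as a simple lookup from states to actions. The state and action names here are the made-up ones from the earlier sketch, not anything prescribed by the text.

```python
# A policy as a plain mapping from states to actions (illustrative names).
policy = {"s0": "go", "s1": "stay"}

def act(state):
    """Return the action the policy prescribes for `state`."""
    return policy[state]
```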

Let us take the example of a grid world:



An agent lives in the grid. The above example is a 3×4 grid. The grid has a START state (grid no 1,1). The purpose of the agent is to wander around the grid and finally reach the Blue Diamond (grid no 4,3). Under all circumstances, the agent should avoid the Fire grid (orange color, grid no 4,2). Grid no 2,2 is a blocked grid; it acts as a wall, so the agent cannot enter it.

The agent can take any one of these actions: UP, DOWN, LEFT, RIGHT.
Walls block the agent's path, i.e., if there is a wall in the direction the agent would have taken, the agent stays in the same place. So, for example, if the agent chooses LEFT in the START grid, it stays put in the START grid.
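The grid and its wall rule can be sketched as follows. Coordinates are (column, row) with (1, 1) at the bottom-left, matching the grid numbers in the text; the function and constant names are my own.

```python
# 3x4 grid world: (1,1) is START, (4,3) the diamond, (4,2) the fire,
# and (2,2) a blocked (wall) cell.
ACTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
WALLS = {(2, 2)}
COLS, ROWS = 4, 3

def step(state, action):
    """Deterministic move; bumping a wall or the border leaves the agent put."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    if nxt in WALLS or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        return state                    # blocked: stay in place
    return nxt
```

For example, `step((1, 1), "LEFT")` returns `(1, 1)`: the agent stays put, exactly as described above.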

First aim: to find the shortest sequence getting from START to the Diamond. Two such sequences can be found:
 RIGHT RIGHT UP UP RIGHT
 UP UP RIGHT RIGHT RIGHT
Let us take the second one (UP UP RIGHT RIGHT RIGHT) for the subsequent discussion. The move is now noisy: 80% of the time the intended action works correctly, and 20% of the time the action the agent takes moves it at right angles to the intended direction. For example, if the agent chooses UP, the probability of going UP is 0.8, whereas the probability of going LEFT is 0.1 and the probability of going RIGHT is 0.1 (since LEFT and RIGHT are at right angles to UP).
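The 80/10/10 noise model can be written down directly. This is a small sketch; the table of perpendicular directions is just one way of encoding it.

```python
# Noisy moves: the intended direction succeeds with probability 0.8;
# each of the two perpendicular directions occurs with probability 0.1.
PERPENDICULAR = {
    "UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
    "LEFT": ("UP", "DOWN"),  "RIGHT": ("UP", "DOWN"),
}

def transition_probs(intended):
    """Return {actual_direction: probability} for a noisy move."""
    side_a, side_b = PERPENDICULAR[intended]
    return {intended: 0.8, side_a: 0.1, side_b: 0.1}
```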

The agent receives rewards at each time step:

 A small reward each step (it can be negative, in which case it can also be termed a punishment; in the above example, entering the Fire grid yields a reward of -1).
 Big rewards come at the end (good or bad).
 The goal is to maximize the sum of rewards.



5.6.4 Q-learning
Q-learning is a model-free, value-based, off-policy algorithm that will find the best series of
actions based on the agent's current state. The “Q” stands for quality. Quality represents how
valuable the action is in maximizing future rewards.

Model-based algorithms use transition and reward functions to estimate the optimal policy and create a model. In contrast, model-free algorithms learn the consequences of their actions through experience, without using transition and reward functions.

Value-based methods train a value function to learn which states are more valuable, and act accordingly. Policy-based methods, on the other hand, train the policy directly to learn which action to take in a given state.

An off-policy algorithm evaluates and updates a policy that differs from the policy used to take actions. Conversely, an on-policy algorithm evaluates and improves the same policy that is used to take actions.

Before we jump into how Q-learning works, we need to learn a few useful terms to understand Q-learning's fundamentals.

 State (s): the current position of the agent in the environment.
 Action (a): a step taken by the agent in a particular state.
 Rewards: for every action, the agent receives a reward or a penalty.
 Episodes: an episode ends when the agent reaches a terminal state, either because it has achieved the goal or because it has failed; it can then take no new action.
 Q(St+1, a): the expected optimal Q-value of taking the action in a particular state.
 Q(St, At): the current estimate of Q(St+1, a).
 Q-Table: a table the agent maintains with one Q-value per pair of state and action.
 Temporal Difference (TD): used to estimate the expected value of Q(St+1, a) using the current state and action and the previous state and action.



We will learn in detail how Q-learning works by using the example of a frozen lake. In this environment, the agent must cross the frozen lake from the start to the goal without falling into the holes. The best strategy is to reach the goal by taking the shortest path.

Q-Table

The agent will use a Q-table to take the best possible action based on the expected reward for
each state in the environment. In simple words, a Q-table is a data structure of sets of actions
and states, and we use the Q-learning algorithm to update the values in the table.
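One common way to hold such a table is a 2-D array with one row per state and one column per action. The sizes below are assumptions for a 4×4 frozen lake with four moves, not values given in the text.

```python
import numpy as np

# Q-table sketch: rows are states, columns are actions, all zeros initially.
n_states, n_actions = 16, 4            # assumed 4x4 lake, 4 moves
q_table = np.zeros((n_states, n_actions))

def best_action(state):
    """Greedy lookup: the action with the highest Q-value for `state`."""
    return int(np.argmax(q_table[state]))
```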

Q-Function

The Q-function takes the state (s) and action (a) as input and uses the Bellman equation to compute the state-action value.
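The text does not reproduce the equation itself; the standard Q-learning form of the Bellman update is Q(s, a) ← Q(s, a) + α[r + γ·max Q(s', a') − Q(s, a)], which can be sketched as follows. The values of α and γ and the four-action assumption are illustrative.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99               # learning rate and discount (assumed)
N_ACTIONS = 4

def q_update(q, s, a, r, s_next):
    """One Q-learning update: Q(s,a) += alpha * (TD target - Q(s,a))."""
    td_target = r + GAMMA * max(q[(s_next, b)] for b in range(N_ACTIONS))
    q[(s, a)] += ALPHA * (td_target - q[(s, a)])
    return q[(s, a)]

q = defaultdict(float)                 # unseen (state, action) pairs default to 0
```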



Q-learning algorithm

Initialize Q-Table

We will first initialize the Q-table, building it with columns based on the number of actions and rows based on the number of states.

In our example, the character can move up, down, left, and right. We have four possible actions and four states (start, idle, wrong path, and end). You can also treat "wrong path" as falling into the hole. We will initialize the Q-table with all values set to 0.
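With the simplification above (four states, four actions), initializing the table looks like this; the labels follow the text.

```python
import numpy as np

# Rows = states, columns = actions, every entry starts at zero.
states = ["start", "idle", "wrong path", "end"]
actions = ["up", "down", "left", "right"]
Q = np.zeros((len(states), len(actions)))
```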



Choose an Action

The second step is quite simple. At the start, the agent chooses a random action (down or right); on subsequent runs, it uses the updated Q-table to select the action.

Perform an Action

Choosing and performing an action are repeated multiple times until the training loop stops. The first action and state are selected using the Q-table; in our case, all its values are zero.

Then, the agent will move down and update the Q-Table using the Bellman equation. With
every move, we will be updating values in the Q-Table and also using it for determining the
best course of action.

Initially, the agent is in exploration mode and chooses a random action to explore the environment. The Epsilon-Greedy Strategy is a simple method to balance exploration and exploitation: epsilon is the probability of choosing to explore, and the agent exploits otherwise.

At the start, the epsilon rate is high, meaning the agent is in exploration mode. As it explores the environment, epsilon decreases, and the agent starts to exploit it. With every iteration of exploration, the agent becomes more confident in its estimates of the Q-values.
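A minimal epsilon-greedy selector with decay might look like this; the decay rate and floor are assumptions, not values from the text.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Explore with probability epsilon; otherwise exploit the best Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))               # explore: random action
    return max(range(len(q_row)), key=lambda a: q_row[a]) # exploit: greedy action

def decay(epsilon, rate=0.995, floor=0.05):
    """Shrink epsilon each episode, but never below a small floor."""
    return max(floor, epsilon * rate)
```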



In the frozen lake example, the agent is unaware of the environment, so it takes a random action (move down) to start. As we can see in the above image, the Q-table is then updated using the Bellman equation.

Measuring the Rewards

After taking the action, we will measure the outcome and the reward.

 The reward for reaching the goal is +1

 The reward for taking the wrong path (falling into the hole) is 0

 The reward for staying idle or moving on the frozen surface is also 0.

Update Q-Table

We will update the function Q(St, At) using the update equation. It uses the previous estimate of the Q-value, the learning rate, and the Temporal Difference error. The Temporal Difference error is calculated from the immediate reward, the discounted maximum expected future reward, and the former Q-value estimate.
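Written out, the two pieces described above might look like this (the function names are mine, and α and γ are assumed values):

```python
def td_error(reward, max_q_next, q_current, gamma=0.99):
    """TD error = immediate reward + discounted max future Q - current estimate."""
    return reward + gamma * max_q_next - q_current

def update_estimate(q_current, td, alpha=0.1):
    """New estimate = old estimate + learning rate * TD error."""
    return q_current + alpha * td
```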

The process is repeated multiple times until the Q-table stops changing significantly and the Q-value estimates converge.



At the start, the agent explores the environment to update the Q-table. When the Q-table is ready, the agent starts exploiting it and making better decisions.

In the case of a frozen lake, the agent will learn to take the shortest path to reach the goal and
avoid jumping into the holes.
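Putting the pieces together, a complete tabular Q-learning loop for a small hand-rolled frozen lake might look like this. The 4×4 layout, the deterministic moves, and all hyperparameters are assumptions for illustration, not details from the text.

```python
import random

random.seed(0)

# 4x4 frozen lake: S = start, F = frozen, H = hole, G = goal (assumed layout).
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

def step(state, action):
    """Deterministic move clipped to the grid; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    dr, dc = MOVES[action]
    row = min(max(row + dr, 0), N - 1)
    col = min(max(col + dc, 0), N - 1)
    nxt = row * N + col
    cell = LAKE[row][col]
    return nxt, (1.0 if cell == "G" else 0.0), cell in "GH"

def train(episodes=3000, alpha=0.1, gamma=0.95, eps=1.0):
    q = [[0.0] * 4 for _ in range(N * N)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):                     # cap episode length
            if random.random() < eps:
                a = random.randrange(4)          # explore
            else:
                a = max(range(4), key=lambda x: q[s][x])  # exploit
            s2, r, done = step(s, a)
            # Bellman / TD update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if done:
                break
        eps = max(0.05, eps * 0.999)             # decay exploration
    return q

q = train()
```

After training, the start state's Q-values point toward the goal while avoiding the holes. The ready-made version of this environment is Gymnasium's FrozenLake-v1, which additionally supports slippery (stochastic) moves.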

