NPTEL
Video Course on Machine Learning
Professor Carl Gustaf Jansson, KTH
Week 5: Machine Learning enabled by prior Theories
Video 5.4 Reinforcement Learning – Part 3 Q-learning
Q Learning
Q-learning is a model-free off-policy TD reinforcement learning algorithm.
The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it
maximizes the expected value of the total reward over any and all successive steps, starting from the current
state.
Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time
and a partly-random policy.
"Q" names the function Q(s,a) that can be said to stand for the "quality" of an action a taken in a given state s.
Suppose we have the optimal Q-function Q*(s, a); then the optimal policy in state s is argmax_a Q*(s, a).
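For illustration, a minimal sketch of this policy extraction in Python (assuming, hypothetically, that the Q-function is stored as a dictionary keyed by (state, action) pairs):

def greedy_action(Q, s, actions):
    # the policy derived from Q picks, in state s, the action with the largest Q(s, a)
    return max(actions, key=lambda a: Q[(s, a)])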
Q-learning Algorithm
Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of the episode):
        Choose a for s using a policy derived from Q (e.g. ε-greedy)
        Take action a, observe r, s'
        Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]
        s ← s'
    until s is terminal
With α = 1 the updating formula simplifies to
Q(s, a) ← r + γ max_a' Q(s', a')
and with α = 1 and γ = 1 it simplifies further to
Q(s, a) ← r + max_a' Q(s', a')
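A minimal Python sketch of the algorithm above (not the lecture's own code; the environment interface with reset() and step(s, a) returning (r, s', done) is an assumption made for illustration):

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=1.0, gamma=0.5, epsilon=0.1):
    Q = defaultdict(float)                       # Q(s, a), initialized to 0

    def greedy(s):
        # action with the largest Q-value in state s
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()                          # initialize s
        done = False
        while not done:                          # for each step of the episode
            # epsilon-greedy choice: explore with probability epsilon, otherwise exploit
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            r, s_next, done = env.step(s, a)     # take action a, observe r, s'
            # off-policy TD target: value of the best action in the next state
            target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q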
Example

[Figure: a 4 x 5 grid world in which individual moves are rewarded with r = 8, r = 0 or r = -8]
States and Actions

States s, arranged in a 4 x 5 grid:
 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20

Actions a: N (north), S (south), E (east), W (west)
Assume that α=1 and γ = 0.5
Initializing the Q(s, a) function

Action\State    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     N          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     S          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     W          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     E          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
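In code this initialization is simply a zero-filled table; a one-line sketch using the dictionary representation assumed earlier:

# every Q(s, a) starts at 0, exactly as in the table above
Q = {(s, a): 0.0 for s in range(1, 21) for a in ["N", "S", "E", "W"]}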
An Episode

[Figure: the path taken by the agent through the 4 x 5 grid during the first episode]
Calculating new Q(s, a) values
1st step:
2nd step:
3rd step:
4th step:
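The step-by-step calculations all use the simplified update Q(s, a) ← r + 0.5 max_a' Q(s', a'). Because every entry of the table is still 0, steps with r = 0 leave it unchanged (0 + 0.5 · 0 = 0); judging from the table below, only the final step of the episode, moving north from state 7 into a penalty cell, has any effect: Q(7, N) ← -8 + 0.5 · 0 = -8.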
The Q(s, a) function after the first episode

Action\State    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     N          0    0    0    0    0    0   -8    0    0    0    0    0    0    0    0    0    0    0    0    0
     S          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     W          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     E          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
A second episode

[Figure: the path taken by the agent through the 4 x 5 grid during the second episode]
Calculating new Q(s, a) values
1st step:
2nd step:
3rd step:
4th step:
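As in the first episode, intermediate steps with r = 0 leave the table unchanged, since no successor state has a positive Q-value yet. Judging from the resulting table, the second episode ended by moving east from state 9 into the goal cell, giving Q(9, E) ← 8 + 0.5 · 0 = 8.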
The Q(s, a) function after the second episode

Action\State    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     N          0    0    0    0    0    0   -8    0    0    0    0    0    0    0    0    0    0    0    0    0
     S          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     W          0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     E          0    0    0    0    0    0    0    0    8    0    0    0    0    0    0    0    0    0    0    0
The Q(s, a) function after a few episodes

Action\State    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     N          0    0    0    0    0    0   -8   -8   -8    0    0    1    2    4    0    0    0    0    0    0
     S          0    0    0    0    0    0  0.5    1    2    0    0   -8   -8   -8    0    0    0    0    0    0
     W          0    0    0    0    0    0   -8    1    2    0    0   -8  0.5    1    0    0    0    0    0    0
     E          0    0    0    0    0    0    2    4    8    0    0    1    2   -8    0    0    0    0    0    0
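Note how the goal reward has propagated backwards with γ = 0.5: Q(9, E) = 8 next to the goal, then Q(8, E) = 0 + 0.5 · 8 = 4 and Q(7, E) = 0 + 0.5 · 4 = 2, and likewise Q(14, N) = 4, Q(13, N) = Q(13, E) = 2 and Q(12, N) = Q(12, E) = 1, while every action that steps into a penalty cell keeps the value -8.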
One of the optimal policies

Action\State    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     N          0    0    0    0    0    0   -8   -8   -8    0    0    1    2    4    0    0    0    0    0    0
     S          0    0    0    0    0    0  0.5    1    2    0    0   -8   -8   -8    0    0    0    0    0    0
     W          0    0    0    0    0    0   -8    1    2    0    0   -8  0.5    1    0    0    0    0    0    0
     E          0    0    0    0    0    0    2    4    8    0    0    1    2   -8    0    0    0    0    0    0
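The policy is read off the table by taking, in each state, the action with the largest Q-value: for example E in states 7, 8 and 9 (values 2, 4 and 8) and N in state 14 (value 4), which leads the agent towards the goal.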
An optimal policy graphically

[Figure: the first optimal policy drawn as arrows on the 4 x 5 grid]
Another of the optimal policies

Action\State    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     N          0    0    0    0    0    0   -8   -8   -8    0    0    1    2    4    0    0    0    0    0    0
     S          0    0    0    0    0    0  0.5    1    2    0    0   -8   -8   -8    0    0    0    0    0    0
     W          0    0    0    0    0    0   -8    1    2    0    0   -8  0.5    1    0    0    0    0    0    0
     E          0    0    0    0    0    0    2    4    8    0    0    1    2   -8    0    0    0    0    0    0
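This second policy exists because some best actions tie: in state 12 both N and E have value 1, and in state 13 both N and E have value 2, so either choice gives the same expected return.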
Another optimal policy graphically

[Figure: the second optimal policy drawn as arrows on the 4 x 5 grid]
Thanks for your attention!
The next lecture 5.5 will be on the topic:
Case Based Reasoning