Q-Learning:
Let’s say that a robot has to cross a maze and reach the end point. There are mines, and the
robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The
robot has to reach the end point in the shortest time possible.
The scoring/reward system is as below (see the code sketch after this list):
1. The robot loses 1 point at each step. This is done so that the robot takes the shortest path and
reaches the goal as fast as possible.
2. If the robot steps on a mine, the point loss is 100 and the game ends.
3. If the robot gets power ⚡️, it gains 1 point.
4. If the robot reaches the end goal, the robot gets 100 points.
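As a minimal sketch, these rewards could be encoded as a simple mapping. The dictionary name and keys below are my own choices for illustration, not from the text:

    # Reward for each event: "step" applies on every move, the others
    # depend on the tile the robot lands on.
    REWARDS = {
        "step": -1,    # the robot loses 1 point at each step
        "mine": -100,  # stepping on a mine ends the game
        "power": +1,   # picking up power
        "goal": +100,  # reaching the end point
    }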
Now, the obvious question is: How do we train a robot to reach the end goal along the
shortest path without stepping on a mine?
Introducing the Q-Table
A Q-Table is just a fancy name for a simple lookup table where we calculate the maximum
expected future reward for each action at each state. Basically, this table will guide us to the best
action at each state.
There will be four possible actions at each non-edge tile: when the robot is at a state, it can
move up, down, left, or right.
So, let’s model this environment in our Q-Table.
In the Q-Table, the columns are the actions and the rows are the states.
Each Q-table score will be the maximum expected future reward that the robot will get if it
takes that action at that state. This is an iterative process, as we need to improve the Q-Table
at each iteration.
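For example, once the table has values in it, reading off the best action at a state is just a row lookup. The sketch below assumes the table is stored as a NumPy array indexed as Q[state, action]; the numbers are made up purely for illustration:

    import numpy as np

    actions = ["up", "down", "left", "right"]

    # Made-up 5x4 table: 5 states (rows) x 4 actions (columns).
    Q = np.array([[ 0.0,  1.2, -0.5,  3.1],
                  [ 2.0, -0.3,  0.8,  0.1],
                  [-1.0,  4.2,  0.0,  0.7],
                  [ 0.5,  0.5,  5.0, -2.0],
                  [ 0.0,  0.0,  0.0,  9.0]])

    state = 0
    best_action = actions[int(np.argmax(Q[state]))]  # -> "right" for this made-up row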
But the questions are:
How do we calculate the values of the Q-table?
Are the values available or predefined?
To learn each value of the Q-table, we use the Q-Learning algorithm.
Mathematics: the Q-Learning algorithm
Q-function
The Q-function uses the Bellman equation and takes two inputs: state (s) and action
(a).
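Written out, the Q-function gives the expected discounted sum of future rewards for taking action a in state s (γ is the discount factor that weights future rewards):

    Q(s, a) = E[ R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... | s_t = s, a_t = a ]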
Using the above function, we get the values of Q for the cells in the table.
When we start, all the values in the Q-table are zeros.
There is an iterative process of updating the values. As we start to explore the
environment, the Q-function gives us better and better approximations by continuously
updating the Q-values in the table.
Now, let’s understand how the updating takes place.
Introducing the Q-learning algorithm process
The algorithm is a loop over a handful of simple steps. Let's understand each of these steps in detail.
Step 1: initialize the Q-Table
We will first build a Q-table. There are n columns, where n = the number of actions, and m
rows, where m = the number of states. We will initialize the values at 0.
In our robot example, we have four actions (a=4) and five states (s=5). So we will build a
table with four columns and five rows.
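A minimal sketch of this step, assuming we use NumPy (the article doesn't prescribe a library):

    import numpy as np

    n_states, n_actions = 5, 4            # m = 5 rows (states), n = 4 columns (actions)
    Q = np.zeros((n_states, n_actions))   # Step 1: every Q-value starts at 0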
Steps 2 and 3: choose and perform an action
This combination of steps is repeated for an undefined amount of time: it runs until we stop
the training, or until the training loop terminates as defined in the code.
We will choose an action (a) in the state (s) based on the Q-Table. But, as mentioned earlier,
when the episode initially starts, every Q-value is 0.
So now the concept of exploration and exploitation trade-off comes into play.
For the robot example, there are four actions to choose from: up, down, left, and
right. We are starting the training now — our robot knows nothing about the environment. So
the robot chooses a random action, say right.
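A common way to implement this trade-off is epsilon-greedy action selection. The sketch below picks a random action with a small probability and otherwise exploits the best known Q-value; the exploration rate of 0.1 is an assumption for illustration:

    import numpy as np

    def choose_action(Q, state, epsilon=0.1):
        n_actions = Q.shape[1]
        if np.random.rand() < epsilon:        # explore: try a random action
            return np.random.randint(n_actions)
        return int(np.argmax(Q[state]))       # exploit: best action found so far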
We can now update the Q-values for being at the start and moving right using the Bellman
equation.
Steps 4 and 5: evaluate
Now we have taken an action and observed an outcome and a reward. We need to update the
function Q(s, a).
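In its standard form, the Q-learning update used for this step is:

    Q_new(s, a) = Q(s, a) + α * [ R(s, a) + γ * max_a' Q(s', a') - Q(s, a) ]

where α is the learning rate, γ is the discount factor, s' is the next state, and
max_a' Q(s', a') is the best Q-value achievable from that next state.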
In the case of the robot game, to reiterate, the scoring/reward structure is:
power = +1
mine = -100
end = +100
We will repeat this again and again until the learning is stopped. In this way the Q-Table will
be updated.
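Putting the steps together, here is a self-contained sketch of the whole training loop. The maze layout, grid size, and hyperparameters below are my own illustration (the text doesn't specify them); only the reward values come from the scoring system above:

    import numpy as np

    # A made-up 3x3 maze for illustration: start, one mine, one power tile, one goal.
    ROWS, COLS = 3, 3
    START, MINE, POWER, GOAL = (0, 0), (1, 1), (0, 2), (2, 2)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    n_states, n_actions = ROWS * COLS, len(ACTIONS)
    Q = np.zeros((n_states, n_actions))           # Step 1: initialize the Q-Table to zeros

    alpha, gamma, epsilon = 0.1, 0.9, 0.1         # learning rate, discount, exploration rate

    def to_index(cell):
        return cell[0] * COLS + cell[1]

    def step(cell, action):
        """Apply an action and return (next_cell, reward, done)."""
        r, c = cell[0] + ACTIONS[action][0], cell[1] + ACTIONS[action][1]
        nxt = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else cell  # walls block moves
        reward, done = -1, False                  # each step costs 1 point
        if nxt == MINE:
            reward, done = -100, True             # mine: -100 and the game ends
        elif nxt == GOAL:
            reward, done = 100, True              # end goal: +100
        elif nxt == POWER:
            reward += 1                           # power tile: gain 1 point
        return nxt, reward, done

    for episode in range(1000):                   # repeat until learning is stopped
        cell, done = START, False
        for _ in range(100):                      # safety cap on episode length
            s = to_index(cell)
            # Steps 2-3: choose an action (epsilon-greedy) and perform it.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            nxt, reward, done = step(cell, a)
            # Steps 4-5: evaluate the outcome and update Q(s, a).
            s2 = to_index(nxt)
            Q[s, a] += alpha * (reward + gamma * np.max(Q[s2]) - Q[s, a])
            cell = nxt
            if done:
                break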