L13 Reinforcement Learning
Khoat Than
School of Information and Communication Technology
Hanoi University of Science and Technology
2023
Contents
¡ Introduction
¡ Supervised learning
¡ Unsupervised learning
¡ Reinforcement learning
¡ Practical advice
Reinforcement Learning problem
¡ t increments at each environment step
History and State
Maze example
¡ Rewards: -1 per time-step
¡ Actions: N, E, S, W
¡ States: Agent's location
(https://www.davidsilver.uk/wp-content/uploads/2020/03/intro_RL.pdf)
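The maze setup above (reward of -1 per time-step, actions N, E, S, W, state = agent's location) can be sketched as a small environment. This is a minimal sketch, not the maze drawn on the slide: the grid layout `GRID`, the wall and goal symbols, and the `step` function are all assumptions made for illustration.

```python
# Hypothetical 3x4 grid: '#' is a wall, 'G' is the goal, '.' is free space.
# The layout is an assumption for illustration only.
GRID = [
    "..#G",
    ".#..",
    "....",
]

# Actions N, E, S, W expressed as (row, column) offsets.
MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(GRID) and 0 <= new_col < len(GRID[0])
    if inside and GRID[new_row][new_col] != "#":
        row, col = new_row, new_col  # legal move; otherwise stay in place
    done = GRID[row][col] == "G"
    return (row, col), -1, done  # reward is -1 on every time-step
```

Because every step costs -1, the agent maximizes its total reward by reaching the goal in as few steps as possible.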
Maze example: Policy
¡ Arrows represent policy π(𝑠) for each state 𝑠
(https://www.davidsilver.uk/wp-content/uploads/2020/03/intro_RL.pdf)
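A deterministic policy of this kind can be stored as a table with one arrow per state. The sketch below assumes a hypothetical 2x3 open grid with the goal at `(0, 2)`; the grid, the `policy` table, and the `rollout` helper are illustrative names, not taken from the slide.

```python
# Hypothetical 2x3 open grid with the goal at (0, 2); the layout is assumed.
GOAL = (0, 2)
MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}

# Deterministic policy pi(s): one arrow (action) per non-goal state.
policy = {
    (0, 0): "E", (0, 1): "E",
    (1, 0): "E", (1, 1): "E", (1, 2): "N",
}


def rollout(start, max_steps=20):
    """Follow the policy's arrows from start and return the visited states."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == GOAL:
            break
        d_row, d_col = MOVES[policy[state]]
        state = (state[0] + d_row, state[1] + d_col)
        path.append(state)
    return path
```

Calling `rollout((1, 0))` follows the arrows east twice and then north to the goal.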
Maze example: Value function
¡ Numbers represent value 𝑣π(𝑠) of each state 𝑠
(https://www.davidsilver.uk/wp-content/uploads/2020/03/intro_RL.pdf)
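With a reward of -1 per time-step and no discounting, the value of a state under a fixed policy is minus the number of steps that policy needs to reach the goal. A minimal sketch of iterative policy evaluation follows, assuming a hypothetical 2x3 open grid, a hand-written `policy` table, and a discount factor of 1; none of these come from the slide.

```python
# Hypothetical 2x3 open grid with the goal at (0, 2); the layout is assumed.
GOAL = (0, 2)
MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}

# Assumed deterministic policy: one action per non-goal state.
policy = {
    (0, 0): "E", (0, 1): "E",
    (1, 0): "E", (1, 1): "E", (1, 2): "N",
}


def evaluate(policy, sweeps=10):
    """Iterative policy evaluation with gamma = 1 and reward -1 per step."""
    value = {state: 0.0 for state in policy}
    value[GOAL] = 0.0  # the goal is terminal, so its value stays 0
    for _ in range(sweeps):
        for state, action in policy.items():
            d_row, d_col = MOVES[action]
            next_state = (state[0] + d_row, state[1] + d_col)
            value[state] = -1.0 + value[next_state]
    return value


values = evaluate(policy)
```

Here `values[(1, 0)]` comes out as -3.0, since the policy takes three steps from that cell to the goal.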
Maze example: Model
Categorizing RL agents (1)
¡ Value-based
v No policy
v Value function
¡ Policy-based
v Policy
v No value function
¡ Actor critic
v Policy
v Value function
Categorizing RL agents (2)
¡ Model-free
v Policy and/or Value function
v No model
¡ Model-based
v Policy and/or Value function
v Model
Exploration and Exploitation (1)
¡ Restaurant selection
v Exploitation: Go to your favorite restaurant
v Exploration: Try a new restaurant
¡ Game playing
v Exploitation: Play the move you believe is best
v Exploration: Play an experimental move
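One common way to trade off the two behaviours is epsilon-greedy selection, which the slide does not name; it is used here only as an illustration. The sketch assumes a dictionary of estimated action values; the function name `epsilon_greedy` and the example values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict {action: estimated value}.

    With probability epsilon choose a random action (exploration);
    otherwise choose the action with the highest estimate (exploitation).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


# Hypothetical estimates for the restaurant example.
estimates = {"favorite restaurant": 1.0, "new restaurant": 0.2}
choice = epsilon_greedy(estimates, epsilon=0.1)
```

Setting `epsilon=0` always exploits, and `epsilon=1` always explores; values in between mix the two.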
Q-Learning: What to learn
v 𝑠 ← 𝑠′
Updating Q*
¡ $Q^*(s_1, a_{\text{right}}) \leftarrow r + \gamma \max_{a'} Q^*(s_2, a')$
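The update above can be sketched for a tabular Q function. This is a minimal sketch of the deterministic form, without a learning rate; the table `Q`, the state names, the action list, and all numeric values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict

ACTIONS = ["up", "right", "down", "left"]  # assumed action set


def q_update(Q, state, action, reward, next_state, gamma=0.9):
    """Set Q(s, a) to r + gamma * max over a' of Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] = reward + gamma * best_next
    return Q[(state, action)]


# Hypothetical current estimates for the successor state s2.
Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
Q[("s2", "up")] = 5.0
Q[("s2", "right")] = 10.0

# Taking a_right in s1 yields reward 0 and leads to s2.
new_value = q_update(Q, "s1", "right", reward=0.0, next_state="s2")
```

With these assumed numbers the new estimate is 0 + 0.9 × 10 = 9. After the update the agent moves on with 𝑠 ← 𝑠′ and repeats.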
¡ $\ldots = \sum_{s', r} P(s', r \mid s, a)\,\bigl[\,r + \gamma v_\pi(s')\,\bigr]$
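The sum over next states and rewards, weighting r + γ v(s′) by P(s′, r | s, a), is an expected one-step return. It can be sketched as follows, assuming the transition model for a single state-action pair is given as a list of (next state, reward, probability) triples; the function name, the data layout, and the numbers are assumptions for illustration.

```python
def expected_backup(transitions, value, gamma=0.9):
    """Sum over (s', r) of P(s', r | s, a) * (r + gamma * v(s'))."""
    return sum(
        prob * (reward + gamma * value[next_state])
        for next_state, reward, prob in transitions
    )


# Hypothetical model for one (s, a) pair: two equally likely outcomes.
transitions = [("A", -1.0, 0.5), ("B", -1.0, 0.5)]
value = {"A": 0.0, "B": -2.0}  # assumed current value estimates

backup = expected_backup(transitions, value, gamma=1.0)
```

With these assumed numbers the result is 0.5 × (-1 + 0) + 0.5 × (-1 - 2) = -2.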