dqn-atari
reinforcement learning
Jiang Guo
2016.04.19
Towards General Artificial Intelligence
• Playing Atari with Deep Reinforcement Learning. ArXiv (2013)
• 7 Atari games
• The first step towards “General Artificial Intelligence”
• Suppose you want to teach an agent (e.g. a neural network) to play this game
• Supervised training (have expert players play a million games)? That’s not how we learn!
• Reinforcement learning
Reinforcement Learning
• Unlike Supervised Learning, there is no target label for each training example; the agent only receives reward signals
RL is like Life!
Markov Decision Process
$s_0, a_0, r_1, s_1, a_1, r_2, \dots, s_{n-1}, a_{n-1}, r_n, s_n$
(diagram: the agent–environment loop of states, actions, and rewards, ending in a terminal state)
State Representation
Think about the Breakout game
• How to define a state?
• Location of the paddle
• Location/direction of the ball
• Presence/absence of each individual brick
• Or simply use the raw screen pixels
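Below is a minimal sketch (not from the slides) contrasting the two options: a hand-crafted feature vector versus a stack of recent preprocessed screens, which is what the DQN paper uses; the shapes and variable names are illustrative.

```python
import numpy as np

# Hand-crafted state: paddle position, ball position/velocity, brick flags (illustrative sizes)
paddle_x, ball_x, ball_y, ball_vx, ball_vy = 0.5, 0.3, 0.7, 0.01, -0.02
bricks = np.ones(6 * 18)                          # presence/absence of each brick
handcrafted_state = np.concatenate(([paddle_x, ball_x, ball_y, ball_vx, ball_vy], bricks))

# Pixel state: a stack of the last 4 preprocessed grayscale screens (DQN uses 84x84x4)
frame = np.zeros((84, 84), dtype=np.float32)      # one preprocessed screen
pixel_state = np.stack([frame] * 4, axis=-1)      # shape (84, 84, 4)
```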
MDP
• Future reward
$R = r_1 + r_2 + r_3 + \dots + r_n$
$R_t = r_t + r_{t+1} + r_{t+2} + \dots + r_n$
• Discounted future reward (environment is stochastic)
$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{n-t} r_n$
$\quad\;\; = r_t + \gamma (r_{t+1} + \gamma (r_{t+2} + \dots))$
$\quad\;\; = r_t + \gamma R_{t+1}$
• A good strategy for an agent would be to always choose an action that maximizes
the (discounted) future reward
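As a quick illustration (not from the slides), the recursion $R_t = r_t + \gamma R_{t+1}$ can be computed backwards over an episode; the reward values and $\gamma$ below are made up.

```python
def discounted_return(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1} for every step t, working backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

print(discounted_return([1.0, 0.0, 0.0, 5.0], gamma=0.9))
# [4.645, 4.05, 4.5, 5.0]
```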
Action-Value Function (Q-Function)
• We define a function $Q(s, a)$ representing the maximum discounted future reward when we perform action $a$ in state $s$ (and continue optimally thereafter):
$Q(s_t, a_t) = \max R_{t+1}$
• $\pi$ is the policy: acting greedily, $\pi(s) = \arg\max_a Q(s, a)$
Q-Learning
• How do we get the Q-function?
• Bellman Equation
$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$
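A minimal tabular sketch of how the Bellman equation is used as an iterative update (the dictionary-based Q-table, learning rate, and names are illustrative, not from the slides):

```python
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] -> estimated Q-value
alpha, gamma = 0.1, 0.99          # learning rate and discount factor

def q_update(s, a, r, s_next, actions):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```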
Value Iteration
Q-Learning
• In practice, Value Iteration is impractical
• It only works for very limited state/action spaces
• It cannot generalize to unobserved states
(diagram: a Q-network that takes the state and outputs one Q-value per action)
Deep Q-Network
$\frac{\partial L(w)}{\partial w} = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a', w) - Q(s, a, w)\right) \frac{\partial Q(s, a, w)}{\partial w}\right]$
• Optimize objective end-to-end by SGD
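A sketch of one such SGD step, assuming a small PyTorch Q-network that outputs one Q-value per action (the network size, batched tensors, and hyper-parameters are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # toy Q-network
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_sgd_step(s, a, r, s_next, done):
    """One SGD step on L = 1/2 * (r + gamma * max_a' Q(s', a') - Q(s, a))^2."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q(s, a) for the taken actions
    with torch.no_grad():                                       # Bellman target is held fixed
        target = r + gamma * q_net(s_next).max(dim=1).values * (1 - done)
    loss = 0.5 * (target - q_sa).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()                                             # computes dL/dw as in the formula above
    optimizer.step()
    return loss.item()
```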
Learning Stability
• Non-linear function approximator (Q-Network) is not very stable
• 𝜖-greedy policy
• With probability 𝜖 select a random action (Exploration)
• Otherwise select $a = \arg\max_{a'} Q(s, a')$ (Exploitation)
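A minimal sketch of $\epsilon$-greedy action selection (function and argument names are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # Exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])      # Exploitation
```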
Experience Replay
• To remove correlations, build data-set from agent’s own experience
$L = \frac{1}{2}\, \mathbb{E}_{s,a,r,s' \sim D}\left[\left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right)^2\right]$
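A sketch of such an experience memory $D$ (the capacity and batch size are illustrative choices):

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)      # oldest transitions are dropped first

    def push(self, s, a, r, s_next, done):
        """Store one transition from the agent's own experience."""
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        """Draw uncorrelated transitions (s, a, r, s') ~ D for the loss above."""
        return random.sample(self.buffer, batch_size)
```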
• $\epsilon$-greedy policy
• Experience memory (replay buffer)
• Target network (see the sketch below)
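A sketch of the target-network idea, reusing the q_net from the earlier sketch: a periodically synced copy supplies the Bellman target so it does not chase the network being trained (the sync interval is an illustrative choice):

```python
import copy

target_net = copy.deepcopy(q_net)       # frozen copy used for r + gamma * max_a' Q_target(s', a')
SYNC_EVERY = 10_000                     # gradient steps between parameter syncs

def maybe_sync(step):
    """Copy the online network's weights into the target network every SYNC_EVERY steps."""
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```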
Effect of Experience Replay and Target Q-Network
A short review
• Reinforcement Learning
• Function approximators for end-to-end Q-learning
• Deep Learning
• Extract high-level feature representations from high-dimensional raw sensory
data