Model Free Prediction
Rajib Paul (PhD)
Department of Software and Computer Engineering
Contents
Introduction
Monte-Carlo Learning
Temporal Difference Learning
TD(λ)
Model Free Reinforcement Learning
Last lecture:
Planning by dynamic programming
Solve a known MDP
This lecture:
Model-free prediction
Estimate the value function of an unknown MDP
Next lecture:
Model-free control
Optimize the value function of an unknown MDP
Monte-Carlo Reinforcement Learning
MC methods learn directly from episodes of experience
MC is model-free: no knowledge of MDP transitions / rewards
MC learns from complete episodes: no bootstrapping
MC uses the simplest possible idea: value = mean return
Caveat: can only apply MC to episodic MDPs
All episodes must terminate
Monte-Carlo Policy Evaluation
Goal: learn vπ from episodes of experience under policy π
S1, A1, R2, …, Sk ∼ π
Recall that the return is the total discounted reward:
Gt = Rt+1 + γRt+2 + … + γ^(T−t−1) RT
Recall that the value function is the expected return:
vπ(s) = Eπ[Gt | St = s]
Monte-Carlo policy evaluation uses empirical mean return instead of
expected return
First Visit Monte Carlo Policy Evaluation
To evaluate state s:
The first time-step t that state s is visited in an episode,
Increment counter N(s) ← N(s) + 1
Increment total return S(s) ← S(s) + Gt
Value is estimated by mean return V(s) = S(s)/N(s)
By the law of large numbers, V(s) → vπ(s) as N(s) → ∞
Every Visit Monte Carlo Policy Evaluation
To evaluate state s:
Every time-step t that state s is visited in an episode,
Increment counter N(s) ← N(s) + 1
Increment total return S(s) ← S(s) + Gt
Value is estimated by mean return V(s) = S(s)/N(s)
By the law of large numbers, V(s) → vπ(s) as N(s) → ∞
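The two procedures differ only in whether repeated visits to a state within one episode are counted. A minimal sketch, assuming episodes are stored as lists of (state, reward) pairs, where the reward is the one received after leaving that state:

```python
from collections import defaultdict

def mc_evaluate(episodes, gamma=1.0, first_visit=True):
    """First-visit (or every-visit) Monte-Carlo policy evaluation.
    Each episode is a list of (state, reward) pairs."""
    N = defaultdict(int)      # visit counter N(s)
    S = defaultdict(float)    # total return  S(s)
    for episode in episodes:
        # Backward sweep computes the return G_t at every time-step.
        G, returns = 0.0, []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()     # restore forward (time) order
        seen = set()
        for state, G in returns:
            if first_visit and state in seen:
                continue      # first-visit: count s only once per episode
            seen.add(state)
            N[state] += 1
            S[state] += G
    return {s: S[s] / N[s] for s in N}  # V(s) = S(s) / N(s)
```

With a single visit per episode the two variants agree; they differ only when a state recurs within an episode.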
Black-Jack Example
States (200 of them):
Current sum (12-21)
Dealer's showing card (ace-10)
Do I have a "usable" ace? (yes-no)
Action stick: Stop receiving cards (and terminate)
Action twist: Take another card (no replacement)
Reward for stick:
+1 if sum of cards > sum of dealer cards
0 if sum of cards = sum of dealer cards
-1 if sum of cards < sum of dealer cards
Reward for twist:
-1 if sum of cards > 21 (and terminate)
0 otherwise
Transitions: automatically twist if sum of cards < 12
Blackjack Value Function after MC
Policy: stick if sum of cards ≥ 20, otherwise twist
Incremental Mean
The mean μ1, μ2, … of a sequence x1, x2, … can be computed incrementally:
μk = (1/k) Σ_{j=1}^{k} xj
   = (1/k) (xk + (k−1) μ_{k−1})
   = μ_{k−1} + (1/k)(xk − μ_{k−1})
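A sketch of this running-mean recursion, with the mean updated one sample at a time:

```python
def incremental_mean(xs):
    """Running mean via mu_k = mu_{k-1} + (1/k)(x_k - mu_{k-1})."""
    mu = 0.0
    for k, x in enumerate(xs, start=1):
        mu += (x - mu) / k   # correct toward the new sample by 1/k
    return mu
```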
Incremental Monte-Carlo Updates
Update V(s) incrementally after episode S1, A1, R2, …, ST
For each state St with return Gt:
N(St) ← N(St) + 1
V(St) ← V(St) + (1/N(St)) (Gt − V(St))
In non-stationary problems, it can be useful to track a running mean, i.e.
forget old episodes:
V(St) ← V(St) + α (Gt − V(St))
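A minimal sketch of this update, assuming V and N are plain dictionaries; alpha=None selects the exact 1/N(s) running mean, while a fixed alpha gives the forgetting, non-stationary variant:

```python
def incremental_mc_update(V, N, state, G, alpha=None):
    """One incremental Monte-Carlo update for a (state, return) pair.
    alpha=None uses the 1/N(s) running mean (exact empirical mean);
    a fixed alpha forgets old episodes (non-stationary case)."""
    N[state] = N.get(state, 0) + 1
    step = alpha if alpha is not None else 1.0 / N[state]
    v = V.get(state, 0.0)
    V[state] = v + step * (G - v)
```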
Temporal Difference Learning
TD methods learn directly from episodes of experience
TD is model-free: no knowledge of MDP transitions / rewards
TD learns from incomplete episodes, by bootstrapping
TD updates a guess towards a guess
MC and TD
Goal: learn vπ online from experience under policy π
Incremental every-visit Monte-Carlo
Update value V(St) toward actual return Gt
V(St) ← V(St) + α (Gt − V(St))
Simplest temporal-difference learning algorithm: TD(0)
Update value V(St) toward estimated return Rt+1 + γ V(St+1):
V(St) ← V(St) + α (Rt+1 + γ V(St+1) − V(St))
Rt+1 + γ V(St+1) is called the TD target
δt = Rt+1 + γ V(St+1) − V(St) is called the TD error
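A sketch of one TD(0) step under these definitions (V is a dictionary; the done flag marks a terminal transition, where the target is the reward alone):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, done=False):
    """One TD(0) update: move V(s) toward the TD target
    R + gamma * V(s'), which is just R at a terminal transition."""
    target = r if done else r + gamma * V.get(s_next, 0.0)
    td_error = target - V.get(s, 0.0)          # delta_t
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error
```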
Driving Home Example
Driving Home Example: MC vs TD
Advantages and Disadvantages of MC vs. TD
TD can learn before knowing the final outcome
TD can learn online after every step
MC must wait until end of episode before return is known
TD can learn without the final outcome
TD can learn from incomplete sequences
MC can only learn from complete sequences
TD works in continuing (non-terminating) environments
MC only works for episodic (terminating) environments
Bias/Variance Trade-Off
Return Gt = Rt+1 + γRt+2 + … + γ^(T−t−1) RT is an unbiased estimate of vπ(St)
True TD target Rt+1 + γ vπ(St+1) is an unbiased estimate of vπ(St)
TD target Rt+1 + γ V(St+1) is a biased estimate of vπ(St)
TD target is much lower variance than the return:
Return depends on many random actions, transitions, rewards
TD target depends on one random action, transition, reward
Advantages and Disadvantages of MC vs. TD
MC has high variance, zero bias
Good convergence properties
(even with function approximation)
Not very sensitive to initial value
Very simple to understand and use
TD has low variance, some bias
Usually more efficient than MC
TD(0) converges to vπ(s)
(but not always with function approximation)
More sensitive to initial value
Random Walk Example
Random Walk: MC vs. TD
Batch MC and TD
MC and TD converge: V(s) → vπ(s) as experience → ∞
But what about batch solution for finite experience?
e.g. repeatedly sample episode k ∈ [1, K]
Apply MC or TD(0) to episode k
AB Example
Two states A, B; no discounting; 8 episodes of experience
What is V(A), V(B)?
Certainty Equivalence
MC converges to solution with minimum mean-squared error
Best fit to the observed returns
In the AB example, V(A) = 0
TD(0) converges to solution of max likelihood Markov model
Solution to the MDP that best fits the data
In the AB example, V(A) = 0.75
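The two answers can be reproduced directly. The episode data below is an assumption: it is the standard version of this example (one episode A(0)→B(0), six episodes B(1), one episode B(0)), which the original slides show only in a figure:

```python
# Assumed episodes: one A(0) -> B(0), six B(1), one B(0).
episodes = [[('A', 0), ('B', 0)]] + [[('B', 1)]] * 6 + [[('B', 0)]]

# MC: empirical mean return observed from each state (minimum MSE fit).
returns = {'A': [], 'B': []}
for ep in episodes:
    G = 0
    for state, reward in reversed(ep):
        G += reward                      # no discounting
        returns[state].append(G)
V_mc = {s: sum(gs) / len(gs) for s, gs in returns.items()}

# TD(0) / certainty equivalence: fit the maximum-likelihood Markov model
# (A always moves to B with reward 0) and solve it exactly.
V_td = {'B': V_mc['B']}
V_td['A'] = 0 + V_td['B']                # V(A) = E[R] + V(B) in fitted model
```

MC sees a single return of 0 from A, so V(A) = 0; TD credits A with B's estimated value, giving V(A) = 0.75.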
Advantages and Disadvantages of MC vs. TD
TD exploits Markov property
Usually more efficient in Markov environments
MC does not exploit Markov property
Usually more effective in non-Markov environments
Monte-Carlo Backup
Temporal Difference Backup
Dynamic Programming Backup
Bootstrapping and Sampling
Bootstrapping: update involves an estimate
MC does not bootstrap
DP bootstraps
TD bootstraps
Sampling: update samples an expectation
MC samples
DP does not sample
TD samples
Unified View of Reinforcement Learning
n-Step Prediction
Let TD target look n steps into the future
n-step Return
Consider the following n-step returns for n = 1, 2, …, ∞:
n = 1 (TD): Gt^(1) = Rt+1 + γ V(St+1)
n = 2: Gt^(2) = Rt+1 + γ Rt+2 + γ² V(St+2)
…
n = ∞ (MC): Gt^(∞) = Rt+1 + γ Rt+2 + … + γ^(T−t−1) RT
Define the n-step return
Gt^(n) = Rt+1 + γ Rt+2 + … + γ^(n−1) Rt+n + γ^n V(St+n)
n-step temporal-difference learning:
V(St) ← V(St) + α (Gt^(n) − V(St))
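A minimal sketch of the n-step return, assuming the episode is stored as a list of states S_0 … S_T and a list rewards where rewards[k] holds R_{k+1}:

```python
def n_step_return(rewards, V, states, t, n, gamma=1.0):
    """G_t^(n) = R_{t+1} + gamma R_{t+2} + ... + gamma^(n-1) R_{t+n}
                 + gamma^n V(S_{t+n}).
    If t+n reaches the end of the episode, the bootstrap term is
    dropped and the result is the full (MC) return."""
    T = len(rewards)                     # episode length; states has T+1 entries
    G = sum(gamma ** k * rewards[t + k] for k in range(min(n, T - t)))
    if t + n < T:                        # S_{t+n} is non-terminal: bootstrap
        G += gamma ** n * V.get(states[t + n], 0.0)
    return G
```

With n = 1 this is the TD(0) target; as n grows past the episode end it becomes the Monte-Carlo return.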
Long Random Walk Example
Averaging n-Step Returns
We can average n-step returns over different n
e.g. average the 2-step and 4-step returns: (1/2) Gt^(2) + (1/2) Gt^(4)
Combines information from two different time-steps
Can we efficiently combine information from all
time-steps?
λ-return
The λ-return Gt^λ combines all n-step returns Gt^(n)
Using weight (1−λ) λ^(n−1):
Gt^λ = (1−λ) Σ_{n=1}^{∞} λ^(n−1) Gt^(n)
Forward-view TD(λ):
V(St) ← V(St) + α (Gt^λ − V(St))
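A sketch of the λ-return for an episodic task, using the same (states, rewards) layout as before. Since every n-step return with n ≥ T−t equals the full MC return, their infinite tail of weights collapses to λ^(T−t−1):

```python
def lambda_return(rewards, V, states, t, lam, gamma=1.0):
    """Forward-view lambda-return:
    G_t^lambda = (1-lam) * sum_n lam^(n-1) G_t^(n)."""
    T = len(rewards)                     # rewards[k] holds R_{k+1}

    def n_step(n):
        # G_t^(n): n discounted rewards, bootstrap if S_{t+n} non-terminal.
        G = sum(gamma ** k * rewards[t + k] for k in range(min(n, T - t)))
        if t + n < T:
            G += gamma ** n * V.get(states[t + n], 0.0)
        return G

    G_lam = sum((1 - lam) * lam ** (n - 1) * n_step(n)
                for n in range(1, T - t))
    return G_lam + lam ** (T - t - 1) * n_step(T - t)  # residual weight -> MC
```

With lam = 0 this reduces to the TD(0) target; with lam = 1 it is the Monte-Carlo return.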
Forward View TD(λ)
Update value function towards the λ-return Gt^λ
Forward view looks into the future to compute Gt^λ
Like MC, Gt^λ can only be computed from complete episodes
Backward View TD(λ)
Forward view provides theory
Backward view provides mechanism
Update online, every step, from incomplete sequences
Backward View TD(λ)
Keep an eligibility trace for every state s:
E0(s) = 0
Et(s) = γλ E_{t−1}(s) + 1(St = s)
Update value V(s) for every state s,
in proportion to TD-error δt and eligibility trace Et(s):
V(s) ← V(s) + α δt Et(s)
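A minimal sketch of backward-view TD(λ) over one episode, assuming transitions are stored as (s, r, s_next, done) tuples and traces are kept in a dictionary:

```python
def td_lambda_episode(episode, V, alpha=0.1, gamma=1.0, lam=0.9):
    """Backward-view TD(lambda) over one episode of
    (s, r, s_next, done) transitions, updating V in place."""
    E = {}                                    # eligibility traces, E_0(s) = 0
    for s, r, s_next, done in episode:
        # TD error: delta_t = R_{t+1} + gamma V(S_{t+1}) - V(S_t)
        target = r if done else r + gamma * V.get(s_next, 0.0)
        delta = target - V.get(s, 0.0)
        # Decay all traces, then bump the trace of the visited state.
        for state in E:
            E[state] *= gamma * lam
        E[s] = E.get(s, 0.0) + 1.0
        # Every state is updated in proportion to delta and its trace.
        for state, e in E.items():
            V[state] = V.get(state, 0.0) + alpha * delta * e
    return V
```

Unlike the forward view, this processes the episode one transition at a time, so it also works online and with incomplete sequences.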
Thank You