RL in Continuous Spaces
(with slides by Ali Nouri)
Assumptions for Continuous Spaces
- Transitions can be written in the following form (illustrated below):
  s_{t+1} = f(s_t, a_t) + ω
  where ω is drawn from a known distribution. [Note: a recent work shows that learning the noise is not very detrimental to algorithms [BLLLR08].]
- Transition and reward functions are Lipschitz continuous [CT91]:
  ||f(s_1, a) − f(s_2, a)|| ≤ C_T ||s_1 − s_2||
  ||R(s_1) − R(s_2)|| ≤ C_R ||s_1 − s_2||
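A minimal sketch of this assumption: a hypothetical Lipschitz dynamics function f plus noise ω drawn from a known (here, Gaussian) distribution. All names and constants are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(s, a):
    # Hypothetical deterministic dynamics; Lipschitz in s because the
    # gradient with respect to s is bounded (C_T = 0.9 here).
    return 0.9 * s + 0.1 * a

def step(s, a, noise_std=0.05):
    # Next state = deterministic part + noise from a known distribution.
    w = rng.normal(0.0, noise_std, size=s.shape)
    return f(s, a) + w

s_next = step(np.zeros(2), np.ones(2))
```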
MDP Properties (cont'd)
- The optimal value function satisfies the following Bellman equation [P94]:
- Finite state space:
  V*(s) = max_a [ R(s) + γ Σ_{s'} T(s'|s, a) V*(s') ]
- Continuous state space:
  V*(s) = max_a [ R(s) + γ ∫ T(s'|s, a) V*(s') ds' ]
Solving MDPs
- Finite state space [P94]:
  - Value iteration (see the sketch below)
  - Policy iteration
  - Linear programming
- Continuous state space:
  - Convert to a finite MDP by discretization and solve accordingly (relatively very expensive)
  - Fitted value iteration [G95]
  - Forward search: sparse sampling, UCT [KMN02, KS06]
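The finite-state solvers above are standard; here is a minimal value-iteration sketch for a tabular MDP, assuming hypothetical arrays T[s, a, s'] (transition probabilities) and R[s] (rewards).

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    # T: (S, A, S) transition tensor; R: (S,) reward vector.
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s] + gamma * sum_{s'} T[s, a, s'] * V[s']
        Q = R[:, None] + gamma * T @ V
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # values and greedy policy
        V = V_new
```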
RL in Continuous Spaces
- All the algorithms we have discussed so far use tables to store the problem's parameters.
- That is impossible in continuous spaces, with infinitely many states and/or actions.
- Some generalization method must be used.
- We can replace the tables with a function approximator, although this may bring problems.
Value-Function Approximation
- Several researchers tried to apply function approximation to state values a long time ago [T95, BM95].
- Boyan and Moore reported that some function approximators are not stable and result in divergence [BM95].
- Gordon showed that a very restricted set of function approximators is stable [G95].
Model Approximation
- Metric E3 [KKL03] provides an algorithm schema for how generalization can be done in a model-based setting without losing convergence guarantees.
- A few realizations of metric E3 exist for a subset of environments [SL07, JS06].
An Experimental Domain
- Factored state spaces:
  S = S_1 × S_2 × … × S_m
  S_1 = (s_11, s_12, …, s_1m1)
  S_2 = (s_21, s_22, …, s_2m2)
- State variables: dog x, dog y, dog angle, ball x, ball y (illustrated below)
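As a small illustration, a factored state is just a named vector of its component variables. The class below is ours; only the variable names come from the slide.

```python
from dataclasses import dataclass

@dataclass
class DogBallState:
    # One field per state variable of the factored space.
    dog_x: float
    dog_y: float
    dog_angle: float
    ball_x: float
    ball_y: float

s = DogBallState(dog_x=0.0, dog_y=0.0, dog_angle=90.0, ball_x=1.5, ball_y=2.0)
```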
Model Parameter Approximation (MPA)
- A factored learner for continuous spaces
Factored Learner in Continuous Domains
- Estimate the environment with a delta model [JS06]: instead of learning a mapping from S to S, learn a mapping from S to R^m such that s_{t+1} = s_t + f(s_t, a_t) + ω.
- Input the dependency graph of the variables in the form of a DBN.
- Construct a function approximator for each state variable.
- Train each function approximator with samples in the delta transition model.
- Allow function approximators to return an "I don't know" value if there is not enough support (see the sketch below).
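A minimal sketch of one state variable's delta learner with an "I don't know" answer. The nearest-neighbor regressor is a stand-in for whatever approximator is actually used (e.g., LWPR); the radius and support threshold are hypothetical parameters.

```python
import numpy as np

IDK = None  # sentinel for "I don't know"

class DeltaApproximator:
    """Nearest-neighbor stand-in for one state variable's delta model."""
    def __init__(self, radius=0.5, min_support=5):
        self.X, self.y = [], []
        self.radius, self.min_support = radius, min_support

    def train(self, x, delta):
        # x: the variable's parents under the DBN; delta: s_{t+1}(i) - s_t(i)
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(delta))

    def predict(self, x):
        if len(self.X) < self.min_support:
            return IDK
        dists = np.linalg.norm(np.asarray(self.X) - np.asarray(x, dtype=float), axis=1)
        near = dists < self.radius
        if near.sum() < self.min_support:
            return IDK  # not enough support in the neighborhood
        return float(np.asarray(self.y)[near].mean())
```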
[Diagram: per-action DBN dependency graphs over the state variables (dog x, dog y, dog angle, ball x, ball y), showing which current variables each next-state variable depends on.]
MPA
- Input dependency graphs d_{i,j} for each action i and target state variable j.
- Create a transition function approximator Φ_{i,j} for each action i and target state variable j.
- Create a reward function approximator ρ.
- While learning continues (sketched below):
  - Observe an interaction in the form (s_t, a_t, s_{t+1}, r_{t+1}).
  - Train Φ_{a_t,i} with ( d_{a_t,i}(s_t), s_{t+1}(i) − s_t(i) ).
  - Train ρ with (s_{t+1}, r_{t+1}).
  - If t ≡ 0 (mod intervalTime), let π = plan(Φ, ρ, Rmax).
  - Return π(s_{t+1}).
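A sketch of the learning loop above, assuming per-(action, variable) approximators like the DeltaApproximator sketched earlier, a reward approximator rho, and a plan() routine as on the next slide. The environment interface (reset/step/sample_action) and the dependency selectors d are placeholders, not the authors' code.

```python
def mpa_learn(env, d, phi, rho, plan, Rmax, interval_time):
    # d[a][i]: selects the DBN parents of variable i under action a
    # phi[a][i]: delta-model approximator for variable i under action a
    s, t = env.reset(), 0
    pi = lambda state: env.sample_action()  # initial policy: act randomly
    while True:
        a = pi(s)
        s_next, r = env.step(a)
        for i in range(len(s)):  # train each target variable's model
            phi[a][i].train(d[a][i](s), s_next[i] - s[i])
        rho.train(s_next, r)
        if t % interval_time == 0:  # replan periodically
            pi = plan(phi, rho, Rmax)
        s, t = s_next, t + 1
```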
MPA Planning
- Input Φ, ρ, and Rmax.
- Partition the state space using non-overlapping cells with resolution δ.
- Create a fictitious state s_f with reward Rmax and self-looping actions.
- For each cell in the partition, generate k uniformly distributed samples (O_1, …, O_k).
- Let R(O_i) = ρ(O_i).
- For each action a_i, let P_ij = Φ(O_j, a_i) and let cell(s) be the cell containing s.
  - Using the P_ij's, construct a maximum-likelihood transition function T(·, a_i).
  - If P_ij = IDK, then P_ij = s_f.
- Construct the finite MDP M = ⟨O, A, T, R, γ⟩.
- Solve M using the conventional value iteration algorithm (sketched below).
- Return the best policy π.
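A sketch of the planning step in one dimension, under the reconstruction above: k uniform samples per cell, IDK predictions routed to the optimistic fictitious state s_f, and value iteration on the induced finite MDP. The 1-D grid and the deterministic nearest-sample transition are our simplifications of the maximum-likelihood construction.

```python
import numpy as np

def mpa_plan_1d(phi, rho, Rmax, actions, lo, hi, n_cells, k, gamma=0.95):
    rng = np.random.default_rng(0)
    edges = np.linspace(lo, hi, n_cells + 1)
    # k uniform samples per cell; each sample O_j is a state of the finite MDP.
    O = np.concatenate([rng.uniform(edges[c], edges[c + 1], k)
                        for c in range(n_cells)])
    sf = len(O)                              # index of fictitious state s_f
    n = len(O) + 1
    R = np.array([rho(o) for o in O] + [Rmax])
    T = np.zeros((n, len(actions), n))
    T[sf, :, sf] = 1.0                       # s_f self-loops under every action
    for j, o in enumerate(O):
        for ai, a in enumerate(actions):
            pred = phi(o, a)                 # predicted next state, or None (IDK)
            if pred is None:
                T[j, ai, sf] = 1.0           # unknown -> optimistic s_f
            else:
                T[j, ai, np.abs(O - pred).argmin()] = 1.0
    V = np.zeros(n)
    for _ in range(500):                     # conventional value iteration
        V = (R[:, None] + gamma * T @ V).max(axis=1)
    Q = R[:, None] + gamma * T @ V
    return lambda s: actions[int(Q[np.abs(O - s).argmin()].argmax())]
```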
Results for Robot Navigation
- Results when the BumbleBall is NOT in the environment.
- LWPR used as the function approximator [VSS00].
- Results averaged over 10 runs.
Multi-resolution Exploration (MRE)
A Discretization-based Exploration
- Discretize the state space and use the samples in each cell to tag that region as known or unknown (see the sketch below).
- We can use it for implementing metric E3.
- It is computationally faster than maintaining hyper-spheres.
- Trade-off: finer cells are more accurate but generalize less; coarser cells generalize more but are less accurate.
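A minimal sketch of the cell bookkeeping: count the samples falling in each fixed-resolution cell and tag a cell as known once it holds enough of them. The resolution h and threshold m are hypothetical parameters.

```python
import numpy as np
from collections import defaultdict

class KnownnessGrid:
    def __init__(self, h=0.25, m=10):
        self.h, self.m = h, m
        self.counts = defaultdict(int)

    def cell(self, s):
        # Map a continuous state to the integer index of its grid cell.
        return tuple(np.floor(np.asarray(s) / self.h).astype(int))

    def add(self, s):
        self.counts[self.cell(s)] += 1

    def known(self, s):
        # A region is "known" once it holds at least m samples.
        return self.counts[self.cell(s)] >= self.m
```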
Multi-resolution Discretization
- Allow different levels of generalization depending on how many samples exist in the neighborhood.
- Get more accurate estimates in the parts of the state space we care about most.
- Allow more generalization in places where we do not need much accuracy.
Kd-tree Structure
- It partitions the state space into variable-sized regions.
- The root covers the whole space; each node selects one of the dimensions and splits the space into two half-spaces, producing two children.
- We split at the median of the points along that direction.
- We stop once the number of points inside a region is less than a threshold (see the sketch below).
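A sketch of this construction, assuming the common rule of cycling through dimensions by depth (the slide leaves the dimension-selection rule open).

```python
import numpy as np

class KDNode:
    def __init__(self, points, depth=0, threshold=10):
        self.points, self.left, self.right = points, None, None
        if len(points) < threshold:
            return                              # leaf: region has few points
        dim = depth % points.shape[1]           # cycle through dimensions
        median = np.median(points[:, dim])      # split at the median
        mask = points[:, dim] <= median
        if mask.all() or not mask.any():
            return                              # degenerate split: stay a leaf
        self.dim, self.split = dim, median
        self.left = KDNode(points[mask], depth + 1, threshold)
        self.right = KDNode(points[~mask], depth + 1, threshold)

tree = KDNode(np.random.rand(200, 2), threshold=10)
```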
Knownness for MRE
- Make knownness a function of the size of, and the number of points in, the cell:
- Define a target resolution, σ, and a number of desired points per cell, ν, based on ε, δ, and the smoothness parameters.
- Define the smallness of a cell χ to be a function that goes from 0 to 1 as the size of the cell decreases from ||S|| to σ.
- Define knownness to be knownness(χ) = (|O| / ν) · smallness(χ), as sketched below.
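A sketch of these definitions. The linear shape of smallness and the cap at 1 are our assumptions; the slide only fixes the endpoints (0 at ||S||, 1 at σ) and the product form.

```python
def smallness(cell_size, space_size, sigma):
    # 0 when the cell spans the whole space ||S||, 1 once it shrinks to sigma.
    if cell_size <= sigma:
        return 1.0
    return (space_size - cell_size) / (space_size - sigma)

def knownness(n_samples, nu, cell_size, space_size, sigma):
    # knownness(cell) = (|O| / nu) * smallness(cell), capped at 1.
    return min(1.0, n_samples / nu) * smallness(cell_size, space_size, sigma)
```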
Model-based MRE
- Let O_{sa} be the set of samples for the (s, a) pair.
- Let s_f be a fictitious state with value = Vmax.
- Build the transition function as follows (sketched below):
  T(s'|s, a) = k(s, a) · T̂(s'|s, a)
  T(s_f|s, a) = 1 − k(s, a)
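A sketch of the mixed transition row: the learned estimate T̂ is scaled by the knownness k(s, a), and the remaining probability mass is sent to the optimistic fictitious state s_f.

```python
import numpy as np

def mre_transition(T_hat_row, k_sa):
    # T_hat_row: estimated next-state distribution over the real states;
    # the returned row appends s_f as its last entry.
    row = k_sa * np.asarray(T_hat_row, dtype=float)
    return np.append(row, 1.0 - k_sa)  # remaining mass goes to s_f
```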
Properties of Model-based MRE
- It is independent of the function approximator used.
- It is independent of the planning algorithm.
- Result: it is PAC.
Results for Mountain Car
- MountainCar is a 2D environment.
- Results averaged over 20 runs.