MS&E 221: Stochastic Modeling

Session 7: Nonlinear Optimization, Markov Decision Processes

Lin Fan

February 20, 2019

1 / 18
Nonlinear Optimization: Finding Maximum Likelihood Estimates

Example 2 (from Estimation Slide Deck): Queueing model

Xn+1 = [Xn + Zn+1 − 2]+

where (Zn : n ≥ 1) are i.i.d. Geometric(p*)

From the data, estimate p∗

2 / 18
Nonlinear Optimization: Finding Maximum Likelihood Estimates

Example 2 (from Estimation Slide Deck):

Observed data:
Z1 = 0, Z2 = 1, Z3 = 3, Z4 = 1, Z5 = 2
L(p) = (1 − p)^0 p · (1 − p)^1 p · (1 − p)^3 p · (1 − p)^1 p · (1 − p)^2 p = (1 − p)^7 p^5

The maximizing value of p is p̂ = 5/12
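(To check: log L(p) = 7 log(1 − p) + 5 log p, and setting the derivative −7/(1 − p) + 5/p to zero gives p̂ = 5/12.)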

Find p̂ using Matlab:

fun=@(p)(-(1-p)^7*p^5);    % negative likelihood (fminsearch minimizes)
p0=0.1;                    % initial guess
p_hat=fminsearch(fun,p0)   % returns approximately 5/12

3 / 18
Nonlinear Optimization: Finding Maximum Likelihood Estimates

Example 2 Variant (from Estimation Slide Deck): Queueing model

Xn+1 = [Xn + Zn+1 − 1]+

where (Zn : n ≥ 1) are i.i.d. Poisson(λ*)

From the data, estimate λ∗

4 / 18
Nonlinear Optimization: Finding Maximum Likelihood Estimates
Example 2 Variant (from Estimation Slide Deck): Xn+1 = [Xn + Zn+1 − 1]+

Observed data:

X1 = 0, X2 = 0, X3 = 3, X4 = 2, X5 = 1, X6 = 0, X7 = 0
L(λ) = (e^{−λ} + e^{−λ} λ/1!) · (e^{−λ} λ^4/4!) · (e^{−λ} λ^0/0!)^3 · (e^{−λ} + e^{−λ} λ/1!)

(Each factor is the Poisson probability of the Zn values consistent with a transition of the observed Xn: the first and last transitions only require Zn ∈ {0, 1}, the jump from 0 to 3 requires Zn = 4, and the three unit decreases require Zn = 0.)
The maximizing value of λ is λ̂ = 0.8165
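(To check: up to constants, log L(λ) = −6λ + 4 log λ + 2 log(1 + λ), and setting the derivative −6 + 4/λ + 2/(1 + λ) to zero gives 6λ^2 = 4, i.e. λ̂ = sqrt(2/3) ≈ 0.8165.)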

Find λ̂ using Matlab:

% negative likelihood (the constant 1/4! is dropped; it does not change the maximizer)
fun=@(lambda)(-(exp(-lambda)+exp(-lambda)*lambda)*exp(-lambda)*lambda^4...
    *exp(-3*lambda)*(exp(-lambda)+exp(-lambda)*lambda));
lambda0=0.1;
lambda_hat=fminsearch(fun,lambda0)

5 / 18
Sequential Decision Making

Goal: Maximize reward sequentially over time

Reward is a mathematical expression of a desirable state


Decisions made in stages
Current decision affects future outcomes, and therefore future decisions
Balance high present reward vs. potentially low future rewards

6 / 18
Markov Decision Processes

S: set of states, A(x) ⊆ A: set of actions permissible at state x ∈ S


r : S × A → R+ : reward function
Xn : state of system at time n
An : S → A: action mapping at time n, with An (x) ∈ A(x)

P(Xn+1 = xn+1 | X0 = x0, A0 = a0, . . . , Xn = xn, An = an) = P(Xn+1 = xn+1 | Xn = xn, An = an) =: Pan(xn, xn+1)

Goal: Denoting the policy Π = (An)n≥0, solve

maximize over Π:   EΠ[ Σ_{n=0}^∞ e^{−αn} r(Xn, An(Xn)) ]

where α > 0 is the discount rate.

7 / 18
Applications

Robotics, Control
Rockets
Autonomous Robots
Business Decisions
Inventory management
Scheduling, controlling queues
Personalized marketing
Finance
Portfolio management (e.g. pension funds)
Option pricing
Education (edtech services)
And many others...

8 / 18
Example: Perpetual Option

Consider an option on a stock that you can exercise at any time


Let Xn be the stock price at time n
If you exercise the option (action a0) at time n, you get r(Xn, a0); otherwise (action a1) you get 0
Once you exercise option (state E), no reward from then on
S = {E, 0, 1, 2, . . .}, A = {a0, a1}
Some transition matrix Pa with Pa0 (x, E) = 1
For a ∈ A, Pa (E, E) = 1 and r(E, a) = 0

9 / 18
Bellman’s Equation

Recall An : S → A, and Π = (An )n≥0


Define optimal value

V*(x) = max_Π  EΠ[ Σ_{n=0}^∞ e^{−αn} r(Xn, An(Xn)) | X0 = x ]

Theorem 1 (Bellman’s Equation)


Suppose |S| < ∞ and |A| < ∞. Then V* satisfies, for all x ∈ S,

V*(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V*(y) }.

Further, V* is the unique finite solution to the above fixed point equation.

10 / 18
Sketch of Proof

Goal: Show V* satisfies V*(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V*(y) }

We postulate that the optimal policy is given by An = A* for some A*: S → A (with A*(x) ∈ A(x)). Then V*(x) = EA*[ Σ_{n=0}^∞ e^{−αn} r(Xn, A*(Xn)) ].

From first transition analysis, we have

V*(x) = r(x, A*(x)) + e^{−α} Σ_{y∈S} P_{A*(x)}(x, y) V*(y).

Similarly, we can show for all a ∈ A(x)


V*(x) ≥ r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V*(y)

Intuition: Playing optimally is better than playing action a at time 0, and then
optimally onwards

11 / 18
Solution Methods

Can we compute V*(x) for all x ∈ S?

Use Bellman’s equation

V*(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V*(y) }.

Once we have V*, the optimal strategy A*: S → A is given by

A*(x) = argmax_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V*(y) }

12 / 18
Approach 1: Linear Programming

Since V* is the unique solution to

V*(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V*(y) },

it is given by the linear program


minimize over V:   Σ_{x∈S} V(x)

s.t.   V(x) ≥ r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V(y)   for all x ∈ S, a ∈ A(x)

Drawback: Computationally expensive when |S| and |A| are large!
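A minimal Matlab sketch of this LP, using the perpetual-option example from the later slides (states ordered 1, 2, 3, E, with e^{−α} = 1/2) and assuming the Optimization Toolbox's linprog is available; the variable names are ad hoc:

beta = 1/2;                                                   % e^{-alpha}
Pa1  = [1/2 1/2 0 0; 1/3 0 2/3 0; 1/3 1/3 1/3 0; 0 0 0 1];    % hold (a1) dynamics
Pa0  = [0 0 0 1; 0 0 0 1; 0 0 0 1; 0 0 0 1];                  % exercise (a0): go to E
r1   = [0; 0; 0; 0];                                          % r(x, a1)
r0   = [0; 1; 2; 0];                                          % r(x, a0)
% V >= r_a + beta*Pa*V for each action, rewritten as (beta*Pa - I)*V <= -r_a
A = [beta*Pa1 - eye(4); beta*Pa0 - eye(4)];
b = [-r1; -r0];
f = ones(4,1);                                                % minimize sum_x V(x)
V_star = linprog(f, A, b)

Each pair (x, a) contributes one constraint, which is why the LP grows quickly with |S| and |A|.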

13 / 18
Approach 2: Value Iteration

Let T: R^|S| → R^|S| be the Bellman operator

(T V)(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V(y) }.

The value function V* is a fixed point of T: V* = T V*

Theorem 2 (Value Iteration)


For any vector V0, we have lim_{k→∞} (T^k V0)(x) = V*(x) for all x ∈ S.

Starting with some |S|-dimensional vector V0 , iteratively apply Vk+1 = T Vk !
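A minimal Matlab sketch of this iteration (a hypothetical helper, say value_iter.m; it assumes every action is allowed in every state, and takes a cell array P with one transition matrix per action, a reward matrix r with one column per action, and the discount factor beta = e^{−α}):

function V = value_iter(P, r, beta, num_iters)
% Value iteration: repeatedly apply the Bellman operator T, starting from V0 = 0.
% P: cell array of |S|-by-|S| transition matrices, one per action
% r: |S|-by-|A| reward matrix; beta: discount factor e^{-alpha}
    V = zeros(size(r, 1), 1);
    for k = 1:num_iters
        Q = zeros(size(r));
        for a = 1:numel(P)
            Q(:, a) = r(:, a) + beta * (P{a} * V);   % value of playing a now, then V
        end
        V = max(Q, [], 2);                           % (T V)(x): best action at each x
    end
end

By Theorem 2 the iterates converge to V* for any starting vector; in practice one stops once successive iterates change very little.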

14 / 18
Example: Perpetual Option

Let Pa0 (x, E) = 1 for x ∈ S and

Pa1 =   (rows and columns ordered 1, 2, 3, E)

            1     2     3     E
      1    1/2   1/2    0     0
      2    1/3    0    2/3    0
      3    1/3   1/3   1/3    0
      E     0     0     0     1

Let r(1, a0) = 0, r(2, a0) = 1, r(3, a0) = 2, r(x, a1) = 0 for all x ∈ S,
and r(E, a0) = r(E, a1) = 0 (so V(E) = 0).

15 / 18
Example: Perpetual Option
Compute (T V)(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V(y) }

Suppose that e^{−α} = 1/2. Let Pa0(x, E) = 1 for x ∈ S and

Pa1 =   (rows and columns ordered 1, 2, 3, E)

            1     2     3     E
      1    1/2   1/2    0     0
      2    1/3    0    2/3    0
      3    1/3   1/3   1/3    0
      E     0     0     0     1
Let r(1, a0) = 0, r(2, a0) = 1, r(3, a0) = 2, r(x, a1) = 0 for all x ∈ S,
and r(E, a0) = r(E, a1) = 0 (so V(E) = 0).

(T V)(1) = max{ 0, (1/2)[(1/2)V(1) + (1/2)V(2)] }
         = max{ 0, (1/4)V(1) + (1/4)V(2) }

16 / 18
Example: Perpetual Option
Compute (T V)(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V(y) }

Suppose that e^{−α} = 1/2. Let Pa0(x, E) = 1 for x ∈ S and

Pa1 =   (rows and columns ordered 1, 2, 3, E)

            1     2     3     E
      1    1/2   1/2    0     0
      2    1/3    0    2/3    0
      3    1/3   1/3   1/3    0
      E     0     0     0     1
Let r(1, a0) = 0, r(2, a0) = 1, r(3, a0) = 2, r(x, a1) = 0 for all x ∈ S,
and r(E, a0) = r(E, a1) = 0 (so V(E) = 0).

(T V)(2) = max{ 1, (1/2)[(1/3)V(1) + (2/3)V(3)] }
         = max{ 1, (1/6)V(1) + (1/3)V(3) }

17 / 18
Example: Perpetual Option
Compute (T V)(x) = max_{a∈A(x)} { r(x, a) + e^{−α} Σ_{y∈S} Pa(x, y) V(y) }

Suppose that e^{−α} = 1/2. Let Pa0(x, E) = 1 for x ∈ S and

Pa1 =   (rows and columns ordered 1, 2, 3, E)

            1     2     3     E
      1    1/2   1/2    0     0
      2    1/3    0    2/3    0
      3    1/3   1/3   1/3    0
      E     0     0     0     1

Let r(1, a0) = 0, r(2, a0) = 1, r(3, a0) = 2, r(x, a1) = 0 for all x ∈ S,
and r(E, a0) = r(E, a1) = 0 (so V(E) = 0).

(T V)(3) = max{ 2, (1/2)[(1/3)V(1) + (1/3)V(2) + (1/3)V(3)] }
         = max{ 2, (1/6)V(1) + (1/6)V(2) + (1/6)V(3) }
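Putting the three updates together, a minimal Matlab sketch of value iteration for this example (states ordered 1, 2, 3, E; starting from V0 = 0, so V(E) stays 0; variable names are ad hoc):

beta = 1/2;                                                   % e^{-alpha}
Pa1  = [1/2 1/2 0 0; 1/3 0 2/3 0; 1/3 1/3 1/3 0; 0 0 0 1];    % hold (a1) dynamics
r0   = [0; 1; 2; 0];                                          % r(x, a0); r(x, a1) = 0
V    = zeros(4, 1);                                           % V0 = 0
for k = 1:100
    exer     = r0 + beta * V(4);       % exercise: r(x, a0) + e^{-alpha} V(E)
    hold_val = beta * (Pa1 * V);       % hold: 0 + e^{-alpha} sum_y Pa1(x, y) V(y)
    V = max(exer, hold_val);           % V_{k+1} = T V_k  (elementwise max)
end
V

The iterates converge (up to numerical precision) to V(1) = 1/3, V(2) = 1, V(3) = 2, V(E) = 0, so it is optimal to exercise the option in states 2 and 3 and to hold in state 1.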

18 / 18
