MS&E 221: Stochastic Modeling: Session 7: Nonlinear Optimization, Markov Decision Processes
Lin Fan
Nonlinear Optimization: Finding Maximum Likelihood Estimates
Observed data (i.i.d. geometric, $P(Z = k) = (1-p)^k p$ for $k = 0, 1, 2, \dots$):
$Z_1 = 0,\ Z_2 = 1,\ Z_3 = 3,\ Z_4 = 1,\ Z_5 = 2$
$$L(p) = (1-p)^0 p \cdot (1-p)^1 p \cdot (1-p)^3 p \cdot (1-p)^1 p \cdot (1-p)^2 p = (1-p)^7 p^5$$
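% fminsearch minimizes, so pass the negative of the likelihood
fun=@(p)(-(1-p)^7*p^5);
p0=0.1;                    % initial guess
p_hat=fminsearch(fun,p0)   % no semicolon, so the estimate is displayed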
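Because every factor is a geometric probability, this maximization also has a closed form: setting the derivative of $\log L(p) = 7\log(1-p) + 5\log p$ to zero gives $5/p = 7/(1-p)$, so $\hat p = 5/12 \approx 0.4167$. The Python sketch below is an illustrative cross-check, not the course's code. Instead of the Nelder-Mead search behind `fminsearch`, it swaps in golden-section search on $(0, 1)$, which assumes the log-likelihood is unimodal there (true here, since $\log L$ is concave). The helper names `log_likelihood` and `golden_section_max` are hypothetical.

```python
import math


def log_likelihood(p: float) -> float:
    # log of L(p) = (1 - p)^7 * p^5 for the observed Z = 0, 1, 3, 1, 2
    return 7 * math.log(1 - p) + 5 * math.log(p)


def golden_section_max(f, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # interior points with a < c < d < b
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


# Stay strictly inside (0, 1) so both logarithms are defined
p_hat = golden_section_max(log_likelihood, 1e-9, 1 - 1e-9)
print(f"numerical p_hat = {p_hat:.6f}, closed form 5/12 = {5 / 12:.6f}")
```

Maximizing $\log L$ rather than $L$ gives the same $\hat p$ and avoids multiplying many small numbers when there are more observations.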
Example 2 Variant (from Estimation Slide Deck): $X_{n+1} = [X_n + Z_{n+1} - 1]^+$, where $Z_1, Z_2, \dots$ are i.i.d. Poisson($\lambda$)
Observed data:
$X_1 = 0,\ X_2 = 0,\ X_3 = 3,\ X_4 = 2,\ X_5 = 1,\ X_6 = 0,\ X_7 = 0$
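$$L(\lambda) = \left(e^{-\lambda} + e^{-\lambda}\frac{\lambda}{1!}\right) \cdot e^{-\lambda}\frac{\lambda^4}{4!} \cdot \left(e^{-\lambda}\frac{\lambda^0}{0!}\right)^3 \cdot \left(e^{-\lambda} + e^{-\lambda}\frac{\lambda}{1!}\right)$$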
The maximizing value of λ is λ̂ = 0.8165
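% Negative likelihood (the constant 1/4! is dropped; it does not move the maximizer)
fun=@(lambda)(-(exp(-lambda)+exp(-lambda)*lambda)*exp(-lambda)*lambda^4...
    *exp(-3*lambda)*(exp(-lambda)+exp(-lambda)*lambda));
lambda0=0.1;                          % initial guess
lambda_hat=fminsearch(fun,lambda0)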
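Every factor above is a Poisson probability, so up to the constant $1/4!$ the likelihood simplifies to $L(\lambda) \propto e^{-6\lambda}\lambda^4(1+\lambda)^2$. Setting the derivative of its logarithm to zero, $-6 + 4/\lambda + 2/(1+\lambda) = 0$, and multiplying through by $\lambda(1+\lambda)$ leaves $4 - 6\lambda^2 = 0$, so $\hat\lambda = \sqrt{2/3} \approx 0.8165$, in agreement with the value above. The Python sketch below (hypothetical helper names, not the course's code) builds the same likelihood directly from the transition probabilities of $X_{n+1} = [X_n + Z_{n+1} - 1]^+$, assuming $Z_{n+1} \sim$ Poisson($\lambda$), and maximizes it by a crude grid search.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Z = k) for Z ~ Poisson(lam); zero for negative k."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)


def transition_prob(i: int, j: int, lam: float) -> float:
    """P(X_{n+1} = j | X_n = i) when X_{n+1} = [X_n + Z_{n+1} - 1]^+."""
    if j == 0:
        # The positive part clips to 0 whenever Z <= 1 - i
        return sum(poisson_pmf(k, lam) for k in range(0, 2 - i))
    # Otherwise Z must equal j - i + 1 exactly
    return poisson_pmf(j - i + 1, lam)


def log_likelihood(lam: float, xs: list[int]) -> float:
    """Sum of log transition probabilities along the observed path."""
    return sum(math.log(transition_prob(i, j, lam)) for i, j in zip(xs, xs[1:]))


xs = [0, 0, 3, 2, 1, 0, 0]  # observed X_1, ..., X_7
# Crude grid search over lambda in (0, 3] with step 1e-4
grid = [k / 10_000 for k in range(1, 30_001)]
lam_hat = max(grid, key=lambda lam: log_likelihood(lam, xs))
print(f"grid-search lambda_hat = {lam_hat:.4f}; sqrt(2/3) = {math.sqrt(2 / 3):.4f}")
```

Building the likelihood from a generic `transition_prob` lets the same code handle any observed path instead of hand-deriving each factor.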
Sequential Decision Making
Markov Decision Processes
Applications
Robotics, Control
Rockets
Autonomous Robots
Business Decisions
Inventory management
Scheduling, controlling queues
Personalized marketing
Finance
Portfolio management (e.g. pension funds)
Option pricing
Education (edtech services)
And many others...
Example: Perpetual Option
Bellman’s Equation
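$$V^*(x) = \max_{a \in A(x)} \Big\{ r(x,a) + e^{-\alpha} \sum_{y \in S} P_a(x,y)\, V^*(y) \Big\}, \qquad x \in S.$$
Further, $V^*$ is the unique finite solution to the above fixed-point equation.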
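Sketch of Proof
Goal: Show $V^*$ satisfies $V^*(x) = \max_{a \in A(x)} \big\{ r(x,a) + e^{-\alpha} \sum_{y \in S} P_a(x,y) V^*(y) \big\}$.
We postulate that the optimal policy is given by $A^*_n = A^*(X_n)$ for some $A^* : S \to \mathcal{A}$ (with $A^*(x) \in A(x)$). Then, $V^*(x) = E_{A^*}\big[\sum_{n=0}^{\infty} e^{-\alpha n} r(X_n, A^*(X_n)) \,\big|\, X_0 = x\big]$.
From first transition analysis, we have
$$V^*(x) = r(x, A^*(x)) + e^{-\alpha} \sum_{y \in S} P_{A^*(x)}(x,y) V^*(y).$$
Intuition: Playing optimally is at least as good as playing action $a$ at time 0 and then optimally onwards, so $V^*(x) \ge r(x,a) + e^{-\alpha} \sum_{y \in S} P_a(x,y) V^*(y)$ for every $a \in A(x)$. The first-transition identity shows this bound is attained at $a = A^*(x)$, which gives the maximum.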
Solution Methods
Approach 1: Linear Programming
Approach 2: Value Iteration
Example: Perpetual Option
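$$
P_{a_1} =
\begin{array}{c|cccc}
  & 1 & 2 & 3 & E \\ \hline
1 & 1/2 & 1/2 & 0 & 0 \\
2 & 1/3 & 0 & 2/3 & 0 \\
3 & 1/3 & 1/3 & 1/3 & 0 \\
E & 0 & 0 & 0 & 1
\end{array}
$$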
Compute $(TV)(x) = \max_{a \in A(x)} \Big\{ r(x,a) + e^{-\alpha} \sum_{y \in S} P_a(x,y) V(y) \Big\}$
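Value iteration applies this operator $T$ repeatedly, starting from $V_0 = 0$, until successive iterates stop changing. The Python sketch below illustrates this on the perpetual-option chain using the slide's $P_{a_1}$; everything else is a hypothetical placeholder, since the rewards and discount rate are not recoverable here. We assume action $a_1$ ("hold") earns 0 and moves according to $P_{a_1}$, action $a_2$ ("exercise") pays a made-up amount `EXERCISE_PAYOFF[x]` and moves to the absorbing state $E$, state $E$ earns nothing, and $\alpha = 0.1$.

```python
import math

STATES = ["1", "2", "3", "E"]
E = 3  # index of the absorbing state E

# P_{a1} from the slide (rows = current state, columns = next state; order 1, 2, 3, E)
P_HOLD = [
    [1 / 2, 1 / 2, 0.0, 0.0],
    [1 / 3, 0.0, 2 / 3, 0.0],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# HYPOTHETICAL placeholders (not from the slides): payoff for exercising in
# states 1, 2, 3 (exercising moves the chain to E), and the discount rate alpha
EXERCISE_PAYOFF = [1.0, 2.0, 4.0]
ALPHA = 0.1
BETA = math.exp(-ALPHA)  # one-step discount factor e^{-alpha}


def hold_value(v, x):
    """r(x, a1) + e^{-alpha} * sum_y P_{a1}(x, y) V(y), with r(x, a1) = 0 assumed."""
    return BETA * sum(P_HOLD[x][y] * v[y] for y in range(len(STATES)))


def bellman(v):
    """One application of T: maximize over the actions allowed in each state."""
    new_v = [max(hold_value(v, x), EXERCISE_PAYOFF[x] + BETA * v[E]) for x in range(E)]
    new_v.append(hold_value(v, E))  # in E only "hold" is allowed, and it earns 0
    return new_v


def value_iteration(tol=1e-12, max_iter=10_000):
    """Iterate V_{k+1} = T V_k from V_0 = 0 until the sup-norm change is below tol."""
    v = [0.0] * len(STATES)
    for _ in range(max_iter):
        new_v = bellman(v)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def greedy_policy(v):
    """Pick the maximizing action in each non-absorbing state."""
    return {
        STATES[x]: "exercise" if EXERCISE_PAYOFF[x] + BETA * v[E] >= hold_value(v, x) else "hold"
        for x in range(E)
    }


V = value_iteration()
print({s: round(val, 4) for s, val in zip(STATES, V)})
print(greedy_policy(V))
```

The stopping rule relies on $T$ being a contraction with modulus $e^{-\alpha} < 1$ in the sup norm, so the iterates converge to the fixed point $V^*$ from any bounded starting guess; the greedy policy with respect to the converged $V$ then gives the exercise region.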