
Stochastic Dynamic Programming


Stochastic dynamic programming differs from deterministic dynamic
programming in that the state at the next stage is not completely
determined by the state and policy decision at the current stage.

Rather, there is a probability distribution for what the next state will
be. However, this probability distribution is still completely determined
by the state and policy decision at the current stage.


Let $A$, a finite set, be the set of all possible actions, and let
$f_n(s_n, x_n)$ denote the minimum expected sum from stage $n$ onward,
given that the state and policy decision at stage $n$ are $s_n$ and
$x_n$, respectively.

Optimality equation

$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^{*}(i) \right],$$

where $S$ is the number of possible states at stage $n+1$, and $p_i$ and
$C_i$ are the probability of moving to state $i$ and the cost incurred on
that transition, both determined by $s_n$ and $x_n$.

The optimal value

$$f_{n+1}^{*}(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1}),$$

where this minimization is taken over the feasible values of $x_{n+1}$.
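As a concrete rendering of these two equations, here is a minimal
backward-induction sketch in Python. The data layout (nested dictionaries
of transition probabilities and costs) and every name in it are
illustrative assumptions, not part of the original formulation.

def solve_stochastic_dp(stages, states, actions, prob, cost, terminal):
    """Backward induction for a finite-horizon stochastic DP.

    prob[n][s][x] -- dict mapping next state i to its probability p_i,
                     given state s and decision x at stage n
    cost[n][s][x] -- dict mapping next state i to the cost C_i
    terminal      -- dict mapping final states to terminal costs
    actions(n, s) -- iterable of feasible decisions at stage n, state s
    """
    f_star = {stages: dict(terminal)}   # boundary values at the horizon
    policy = {}
    for n in reversed(range(stages)):
        f_star[n], policy[n] = {}, {}
        for s in states:
            best_x, best_val = None, float("inf")
            for x in actions(n, s):
                # expected cost: sum_i p_i * (C_i + f*_{n+1}(i))
                val = sum(p * (cost[n][s][x][i] + f_star[n + 1][i])
                          for i, p in prob[n][s][x].items())
                if val < best_val:
                    best_x, best_val = x, val
            f_star[n][s], policy[n][s] = best_val, best_x
    return f_star, policy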


EXAMPLE: DETERMINING REJECT ALLOWANCES

A company has received an order to supply one item of a particular
type. However, the customer has specified such stringent quality
requirements that the manufacturer may have to produce more than
one item to obtain an item that is acceptable.

The number of extra items produced in a production run is called the
reject allowance. Including a reject allowance is common practice
when producing for a custom order, and it seems advisable in this
case.


The manufacturer estimates that each item of this type that is
produced will be acceptable with probability 1/2 and defective with
probability 1/2. Thus, the number of acceptable items produced in a
lot of size $L$ will have a binomial distribution; i.e., the probability
of producing no acceptable items in such a lot is $(1/2)^L$.

Marginal production costs for this product are estimated to be $100
per item (even if defective), and excess items are worthless. In
addition, a setup cost of $300 must be incurred whenever the
production process is set up for this product, and a completely new
setup at this same cost is required for each subsequent production
run if a lengthy inspection procedure reveals that a completed lot has
not yielded an acceptable item.


The manufacturer has time to make no more than three production
runs. If an acceptable item has not been obtained by the end of the
third production run, the cost to the manufacturer in lost sales income
and penalty costs will be $1,600.

The objective is to determine the policy regarding the lot size for the
required production run(s) that minimizes total expected cost for the
manufacturer.
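A short numeric sketch of the backward recursion for this example,
written in Python. The only assumption beyond the problem data is that
lot sizes of at most 5 are worth considering at each run.

SETUP, UNIT, PENALTY, P_DEFECT, RUNS = 300, 100, 1600, 0.5, 3

def solve_reject_allowance(max_lot=5):
    f = PENALTY        # cost-to-go if no acceptable item after the last run
    policy = []
    for n in reversed(range(RUNS)):
        # minimize setup + production cost + P(all defective) * future cost
        best = min(
            ((SETUP * (x > 0) + UNIT * x + P_DEFECT ** x * f, x)
             for x in range(max_lot + 1)),
            key=lambda t: t[0],
        )
        f = best[0]
        policy.append((n + 1, best[1]))
    return f, list(reversed(policy))

cost, policy = solve_reject_allowance()
print(cost, policy)    # 675.0, lot sizes 2, 2, 3

This sketch reports a minimal expected cost of $675, producing 2 items in
the first run, then 2 (or 3), then 3 (or 4) if an acceptable item is still
needed; ties are broken toward smaller lots here.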


EXAMPLE: WINNING IN LAS VEGAS

An enterprising young statistician believes that she has developed a
system for winning a popular Las Vegas game. Her colleagues do not
believe that her system works, so they have made a large bet with her
that if she starts with three chips, she will not have at least five chips
after three plays of the game.

Each play of the game involves betting any desired number of
available chips and then either winning or losing this number of chips.
The statistician believes that her system will give her a probability of
2/3 of winning a given play of the game.


Assuming the statistician is correct, we now use dynamic
programming to determine her optimal policy regarding how many
chips to bet at each of the three plays of the game.

The decision at each play should take into account the results of
earlier plays. The objective is to maximize the probability of winning
her bet with her colleagues.
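A brute-force Python check of this example: a memoized recursion over
(chips, plays remaining), with integer bets assumed.

from functools import lru_cache

P_WIN, TARGET, PLAYS, START = 2 / 3, 5, 3, 3

@lru_cache(maxsize=None)
def win_prob(chips, plays_left):
    """Maximal probability of finishing with at least TARGET chips."""
    if plays_left == 0:
        return 1.0 if chips >= TARGET else 0.0
    return max(
        P_WIN * win_prob(chips + bet, plays_left - 1)
        + (1 - P_WIN) * win_prob(chips - bet, plays_left - 1)
        for bet in range(chips + 1)
    )

print(win_prob(START, PLAYS))   # 0.7407... = 20/27

The recursion gives a win probability of 20/27, attained by betting one
chip on the first play and then adjusting to the outcomes.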


EXAMPLE: A GAMBLING MODEL

At each play of the game a gambler can bet any nonnegative
amount up to his present fortune and will either win or lose that
amount with probabilities $p$ and $q = 1 - p$, respectively. The
gambler is allowed to make $n$ bets, and his objective is to
maximize the expectation of the logarithm of his final fortune.
What strategy achieves this end?

What's the state space, action space and value function?



Let $V_n(x)$ denote the maximal expected return if the gambler has a
present fortune of $x$ and is allowed $n$ more gambles.

Let $x$ denote the present fortune and $\alpha$ the fraction of the
gambler's fortune that is bet. Then

$$V_n(x) = \max_{0 \le \alpha \le 1} \left\{ p V_{n-1}(x + \alpha x) + q V_{n-1}(x - \alpha x) \right\},$$

with the boundary condition $V_0(x) = \log x$.
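For this model the recursion admits a closed form: induction on $n$ gives
$V_n(x) = \log x + nC$, so every stage reduces to maximizing
$g(\alpha) = p \log(1+\alpha) + q \log(1-\alpha)$, whose maximizer is the
fixed fraction $\alpha^* = p - q$ when $p > 1/2$ (and $\alpha^* = 0$
otherwise), the Kelly fraction. The Python grid search below is an
illustrative numeric check of that single-stage maximization.

import math

def best_fraction(p, grid=10_001):
    """Maximize p*log(1 + a) + (1 - p)*log(1 - a) over a in [0, 1)."""
    best_a, best_val = 0.0, -math.inf
    for k in range(grid - 1):          # stop short of a = 1 (log 0)
        a = k / (grid - 1)
        val = p * math.log1p(a) + (1 - p) * math.log1p(-a)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

print(best_fraction(0.7))   # 0.4 = 2p - 1, the Kelly fraction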




EXAMPLE: INVENTORY CONTROL

A store has, at time $n$, $s_n$ items in stock. It then orders (and
receives) $a_n$ items, and sells $d_n$ items, where $d_n$ follows a given
probability distribution. Assume that backorders are not allowed.
Thus,

$$s_{n+1} = (s_n + a_n - d_n)^+.$$

Suppose there is a unit holding cost $h$ for any inventory on hand
($s_{n+1} > 0$), as well as a unit shortage penalty $p$ for any demand
that is not fulfilled (i.e., when $d_n > s_n + a_n$). Then the (random)
cost incurred in period $n$ is

$$r(s_n, a_n) = h \cdot (s_n + a_n - d_n)^+ + p \cdot (d_n - s_n - a_n)^+.$$


Objective for an $N$-stage problem:

$$V_0(s_0) = \min_{a_0, \ldots, a_{N-1}} \mathbb{E} \left[ \sum_{n=0}^{N-1} r(s_n, a_n) \right],$$

where $V_0(s_0)$ is the expected total cost over the $N$-period planning
horizon.

Optimality equation

$$V_n(s_n) = \min_{a_n \in A} \left\{ R(s_n, a_n) + \mathbb{E}_F \left[ V_{n+1}(s_{n+1}) \right] \right\},$$

where $R(s_n, a_n) = \mathbb{E}[r(s_n, a_n)]$, the expectation $\mathbb{E}_F$
is taken over the demand distribution $F$, and the recursion starts from
the terminal condition $V_N(s_N) = 0$.
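A compact Python sketch of this backward recursion. The demand
distribution, cost parameters, capacity, and horizon below are all
assumptions chosen for illustration.

H, P = 1.0, 4.0                       # unit holding cost h, shortage penalty p
DEMAND = {0: 0.25, 1: 0.5, 2: 0.25}   # assumed pmf of d_n
MAX_INV, N = 5, 4                     # storage capacity and horizon

def solve_inventory():
    V = [0.0] * (MAX_INV + 1)         # terminal condition V_N(s) = 0
    policy = []
    for n in reversed(range(N)):
        V_new, acts = [], []
        for s in range(MAX_INV + 1):
            # minimize R(s, a) + E[V_{n+1}(s_{n+1})] over feasible orders a
            best = min(
                (sum(pr * (H * max(s + a - d, 0) + P * max(d - s - a, 0)
                           + V[max(s + a - d, 0)])
                     for d, pr in DEMAND.items()), a)
                for a in range(MAX_INV - s + 1)   # keep s + a within capacity
            )
            V_new.append(best[0])
            acts.append(best[1])
        V, policy = V_new, [acts] + policy
    return V, policy

V0, policy = solve_inventory()
print(V0[0], policy[0])   # cost-to-go from empty stock; stage-0 order rule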


EXAMPLE: PRICING TO MAXIMIZE REVENUE

Assume that you have $s_0$ items to sell over $N$ periods, and that
you cannot order more items during that time. However, you can
influence demand by varying the price from period to period. The
problem is to select a price at the beginning of each time period
to maximize the total expected revenue.

What's the state space, action space, reward function and state
transition?



Let (1) $s_n$ be the inventory at the beginning of period $n$, (2) $a_n$
be the price per item during period $n$, and (3) $d_n$ be the amount
demanded during period $n$.

Since backorders are not allowed, the state is updated by

$$s_{n+1} = (s_n - d_n)^+.$$

The (random) revenue during period $n$ is

$$r(s_n, a_n) = a_n \cdot \min(s_n, d_n).$$


Objective for an $N$-stage problem:

$$V_0(s_0) = \max_{a_0, \ldots, a_{N-1}} \mathbb{E} \left[ \sum_{n=0}^{N-1} r(s_n, a_n) \right],$$

where $V_0(s_0)$ is the expected total revenue over the $N$-period
planning horizon.

Optimality equation

$$V_n(s_n) = \max_{a_n \in A} \left\{ R(s_n, a_n) + \mathbb{E}_F \left[ V_{n+1}(s_{n+1}) \right] \right\},$$

where $R(s_n, a_n) = \mathbb{E}[r(s_n, a_n)]$.
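The same recursion in Python, with an invented price menu and a
price-sensitive demand model standing in for the unspecified
distribution; every number here is an assumption.

PRICES = [4.0, 6.0, 8.0]   # hypothetical price menu (the action set A)
MAX_D = 3                  # demand per period capped at 3 for illustration

def demand_pmf(price):
    """Hypothetical pmf of d_n: higher prices shift mass toward 0."""
    mean = max(0.2, 2.0 - 0.2 * price)
    weights = [mean ** d for d in range(MAX_D + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def solve_pricing(s0, N):
    V = [0.0] * (s0 + 1)   # terminal condition: leftover items earn nothing
    acts = []
    for n in reversed(range(N)):
        V_new, acts = [], []
        for s in range(s0 + 1):
            # maximize R(s, a) + E[V_{n+1}(s_{n+1})] over the price menu
            best = max(
                (sum(pr * (a * min(s, d) + V[max(s - d, 0)])
                     for d, pr in enumerate(demand_pmf(a))), a)
                for a in PRICES
            )
            V_new.append(best[0])
            acts.append(best[1])
        V = V_new
    return V, acts          # stage-0 values and stage-0 pricing rule

V, prices = solve_pricing(s0=4, N=3)
print(V[4], prices[4])      # expected revenue and opening price, 4 items left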
