
MONTE CARLO METHODS AND APPLICATIONS
Carnegie Mellon University | 21-387 / 15-327 / 15-627 / 15-860 | Fall 2023

LECTURE 11: STOCHASTIC DIFFERENTIAL EQUATIONS

DETERMINISTIC MOTION: depends on history
BROWNIAN MOTION: independent of history

Overview—SDEs & PDEs
• Ordinary & Stochastic Differential Equations (this lecture)
– how do we describe systems evolving over time? (ODEs)
– how do we incorporate randomness? (SDEs)
– how do we simulate motion numerically?
– analogy: trajectory of rock (+wind)
• Partial Differential Equations (next lecture)
– how do we describe systems evolving over time & space? (PDEs)
– how do we simulate these systems numerically?
– analogy: ripples on pond
• SDE ⟷ PDE connection
– somewhat surprising perspective: can use stochastic ODEs to understand—and simulate—deterministic PDEs
– …and vice versa!
Goal: Connect “microscopic” & “macroscopic”
Understand the statements of two major concepts and see how they can be used for computation:

Feynman-Kac formula: use random walks to solve PDEs
Fokker-Planck equation: solve PDEs to model random walks

(Proofs that they’re true will come later.)


History of Brownian Motion
• Brown’s “life force” (Robert Brown, botanist)
– “spontaneous” motion of organic particles
– …but also inorganic particles
• Einstein’s mystery: how does random motion arise? (Albert Einstein, physicist)
– random “kicks” from water molecules are both too small and too frequent
– but occasionally random events “conspire” to give a big kick in the same direction
– foundation of statistical physics
• Wiener process (Norbert Wiener, mathematician & computer scientist)
– formalizes Brownian motion as a “non-differentiable curve” (Wiener process)
Ordinary Differential Equations
Ordinary Differential Equations—Overview
• Differential equations are the “lingua franca” for phenomena appearing throughout nature, technology, & society
• They give an implicit description of quantities in terms of relative rates of change
– “if I change quantity A by a little bit, how much does quantity B change?”
• Very different from an explicit description
– “what are the actual values of A or B?”
• A basic task in mathematics & computation is therefore to solve for explicit values, given an implicit description
(You’ve probably already done this in your intro physics class! Solve “F = ma”.)
Ordinary Differential Equation
An ordinary differential equation is any equation of the form

F(x(t), dx/dt, …, dⁿx/dtⁿ) = 0

where F is any function of the (unknown) function x(t) and its first n derivatives in time.

We say this ODE is:
– nth order in time (or simply nth order)
– linear (or nonlinear) if F is a linear (or nonlinear) function of its inputs
Example — 1st-order Linear ODEs
Simple but important example (“the function is proportional to its derivative”):

dx/dt = a x(t),   initial value x(0) = c (some constant)

Q: Solution?

x(t) = c e^{at}

Check: dx/dt = a c e^{at} = a x(t). ✓

– a < 0: exponential decay (e.g., caffeine in blood)
– a > 0: exponential growth (e.g., bacteria on food)
1st-order Linear ODEs—General Solution
More generally, a 1st-order linear ODE has the form

dx/dt = a x(t) + b

(written so the highest-order derivative appears alone on the left).

Solution (with x(0) = c):

x(t) = (c + b/a) e^{at} − b/a

Behavior is still dominated by exponential growth (for a > 0).
Trivial Example—0th-Order ODEs
Q: By the way, why didn’t we start with 0th-order ODEs? :-)

A: Because 0th-order “differential” equations are just equations!
(No relationship between different moments in time…)
Example—Projectile Motion
Quite famous ODE: Newton’s 2nd law of motion (“F = ma”), assuming force and mass are constant:

d²x/dt² = F/m   (a 2nd-order linear ODE)

Q: Solution?

x(t) = x₀ + v₀ t + (F/2m) t²   (x₀: initial position, v₀: initial velocity)

In reality: a lot more complicated (aerodynamic drag, spin of ball, wind, …)
Systems of ODEs
One way to solve Newton’s 2nd law: split it into a system of 1st-order equations, thinking of velocity v as an independent quantity:

original ODE (2nd-order): d²x/dt² = a
system (1st-order): dv/dt = a,   dx/dt = v

This “couples” position x and velocity v into a system of ODEs. Now solve each linear equation in sequence:

v(t) = c + a t   (determined by initial velocity: v(0) = c)
x(t) = d + c t + a t²/2   (determined by initial position: x(0) = d)
ODEs—Vector Field Perspective
In general, an ODE in several variables x(t) = (x1(t), …, xn(t)) can be viewed as “flow” along a vector field ω⃗:

dx/dt = ω⃗(x(t))

(change in position = vector field evaluated along the trajectory)

A solution corresponds to a streamline of the vector field, starting from the initial conditions.

We’ll use this visualization later to develop an understanding of SDEs…

[figure: streamlines of a vector field in the (x1, x2)-plane]
Example—Projectile Motion
[figure: phase-space vector field and streamlines for projectile motion, plotted over position x (horizontal) and velocity v (vertical)]
Solving a System of Linear ODEs
Consider a system of linear 1st-order ODEs, written in matrix form as

dx/dt = A x(t)

Q: What do you think the solution should be?

For a single ODE dx/dt = a x we had x(t) = e^{at} x(0). So, perhaps unsurprisingly,

x(t) = e^{At} x(0)

where e^{At} is the matrix exponential. (Helpful for understanding the infinitesimal generator of a stochastic process…)
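To make the matrix-exponential solution concrete, here is a minimal Python sketch; the 2×2 system and initial condition are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential e^{At}

# Example system (assumed): x' = v, v' = -x (simple harmonic oscillator)
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
x0 = np.array([1.0, 0.0])  # initial condition x(0)

for t in [0.0, 0.5, 1.0]:
    x_t = expm(A * t) @ x0  # closed-form solution x(t) = e^{At} x(0)
    print(t, x_t)
```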
Numerical Integration of ODEs
Numerical Integration
• As usual, we can’t integrate most equations in closed form
• Instead, use numerical “time stepping” to approximate the solution
• General strategy:
– replace derivatives with differences
– solve for the unknowns!
• (This will also be the basis of the finite difference method for PDEs…)
Running Example—Frictionless Pendulum
Phase space: angle θ ∈ [−π, +π] on the horizontal axis, angular velocity θ′ on the vertical axis.

ω⃗(θ, θ′) = (θ′, −sin(θ))
Forward Euler
Consider any ODE of the form dx/dt = ω⃗(x(t)), where x(t) is an ℝⁿ-valued function of time t, and the velocity ω⃗ is a vector field on ℝⁿ.

We can approximate the time derivative dx/dt by a difference:

dx/dt ≈ (x(t + ε) − x(t)) / ε

Question: at which of the two points should we evaluate the velocity?

Forward Euler: assuming the current point x(t) is known, and the next point x(t + ε) is unknown, it is probably easiest to evaluate at the known point:

(x(t + ε) − x(t)) / ε = ω⃗(x(t))
Forward Euler (continued)
Suppose we have initial conditions x(0) = x0.

Then we can repeatedly apply this approximation to get a sequence:

xk+1 = xk + ε ω⃗(xk)   (forward Euler)

Intuition: to get the next state, just step a little along the direction of velocity… (see the sketch below)
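A minimal Python sketch of forward Euler for the pendulum; the step size and initial conditions are assumptions for illustration.

```python
import numpy as np

def omega(state):
    """Pendulum phase-space velocity: ω(θ, θ′) = (θ′, −sin θ)."""
    theta, theta_dot = state
    return np.array([theta_dot, -np.sin(theta)])

eps = 0.05                      # time step ε (assumed)
x = np.array([np.pi / 3, 0.0])  # initial angle & angular velocity (assumed)
trajectory = [x.copy()]
for k in range(1000):
    x = x + eps * omega(x)      # x_{k+1} = x_k + ε ω(x_k)
    trajectory.append(x.copy())
```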
Pendulum — Forward Euler
[figure: forward Euler trajectory in the (θ, θ′) phase plane, gradually spiraling away from the true orbit]
Why does this happen?
Forward Euler—Stability Analysis
Consider a simpler (linear) problem: exponential decay

dx/dt = a x(t),   a < 0,   initial value x(0) = c

Forward Euler gives xk+1 = (1 + εa) xk.

Q: will we always get decay?
A: No—must have |1 + εa| < 1. To also stay monotonic: ε < 1/|a|.

[figure: forward Euler iterates xk for a = −1: ε = 0.5 decays monotonically; ε = 1.75 oscillates while decaying; ε = 2.25 oscillates and blows up]

For a general (nonlinear) ODE: bound ε in terms of the eigenvalues of the Jacobian at every point.
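A tiny numerical check of this stability bound (pure Python, illustrative):

```python
# Forward Euler on dx/dt = a x with a = -1, x(0) = c = 1:
# each step multiplies x by (1 + εa), so decay requires |1 + εa| < 1.
a, c = -1.0, 1.0
for eps in [0.5, 1.75, 2.25]:
    x = c
    for k in range(10):
        x = (1 + eps * a) * x
    print(eps, x)  # ε = 0.5 and 1.75 shrink; ε = 2.25 blows up
```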
Backward Euler
Consider again any ODE dx/dt = ω⃗(x(t)), where x(t) is an ℝⁿ-valued function of time t, and the velocity ω⃗ is a vector field on ℝⁿ.

The approximation of the time derivative involves two points:

(x(t + ε) − x(t)) / ε

Question: what if we evaluate the velocity at x(t + ε) instead of x(t)?

Backward Euler: even though the next point x(t + ε) is not known, we can still evaluate the velocity “implicitly,” i.e., solve for a point x(t + ε) such that the finite difference in time equals the velocity at x(t + ε):

(x(t + ε) − x(t)) / ε = ω⃗(x(t + ε))
Backward Euler (continued)
Suppose we have initial conditions x(0) = x0.

Then we can repeatedly apply this approximation to get a sequence:

xk+1 = xk + ε ω⃗(xk+1)   (backward Euler)

Summary: solve a (possibly nonlinear) equation for the next state.
Pendulum — Backward Euler
Solve for xk+1 via, e.g., Newton’s method.
[figure: backward Euler trajectory in the (θ, θ′) phase plane, spiraling inward toward the rest state]
Why does this happen?
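A minimal sketch of a backward Euler step for the pendulum. The slides suggest Newton’s method; here the nonlinear solve is delegated to scipy.optimize.fsolve instead, and the step size and initial state are assumptions.

```python
import numpy as np
from scipy.optimize import fsolve  # nonlinear equation solver

def omega(state):
    theta, theta_dot = state
    return np.array([theta_dot, -np.sin(theta)])

def backward_euler_step(x_k, eps):
    # Solve x_{k+1} - x_k - ε ω(x_{k+1}) = 0 for the unknown next state.
    residual = lambda x_next: x_next - x_k - eps * omega(x_next)
    return fsolve(residual, x_k + eps * omega(x_k))  # forward step as initial guess

eps = 0.05
x = np.array([np.pi / 3, 0.0])
for k in range(1000):
    x = backward_euler_step(x, eps)
```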
Backward Euler—Stability Analysis
Consider the same simpler (linear) problem: exponential decay

dx/dt = a x(t),   a < 0,   initial value x(0) = c

Backward Euler gives xk+1 = xk / (1 − εa).

Q: will we always get decay?
A: Yes—since a < 0 and ε > 0, the factor 1/(1 − εa) is always less than 1 (“unconditionally stable”). But the solution may be “over-damped!”

[figure: backward Euler iterates xk for a = −1: ε = 0.5, 1.75, and 2.25 all decay monotonically]
Symplectic Euler
For ODEs arising from dynamical systems (e.g., Newton’s 2nd law), another option:
– first, update the velocity from the old position: vk+1 = vk + ε f(xk)
– then, update the position from the new velocity: xk+1 = xk + ε vk+1
• For conservative systems (no friction, etc.) energy, momentum, etc., will not “drift” significantly up or down, even over very long time scales
– exactly preserves the symplectic form (sum of 2D phase-space areas in each dimension)
[figure: phase-plane trajectories for forward Euler, backward Euler, and symplectic Euler]
Symplectic Euler
Use the new velocity to update the old position. This will (provably) continue forever.
[figure: symplectic Euler trajectory in the (θ, θ′) phase plane, remaining on a closed orbit]
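A minimal sketch of symplectic Euler for the pendulum (parameters assumed):

```python
import numpy as np

eps = 0.05
theta, theta_dot = np.pi / 3, 0.0  # initial state (assumed)
for k in range(100_000):
    theta_dot = theta_dot + eps * (-np.sin(theta))  # velocity from old position
    theta = theta + eps * theta_dot                 # position from new velocity
# unlike forward/backward Euler, the orbit neither spirals out nor damps in
```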
ODE Integration—Beyond the Basics
• A lot to say about numerical integrators beyond forward/backward Euler
• E.g., can we get the “right” behavior for systems more complex than the pendulum?
– Yes! Can use geometric integrators like symplectic Euler to get good long-term behavior for many systems (dissipative, non-conservative forces, …)
• More generally, can improve integrator accuracy
– Adams-Bashforth, Runge-Kutta, …
– less error per step, but error can still accumulate over long times
Numerical solution may not reflect reality!
[figure: scipy.integrate documentation]

Can often just invoke library functions (but please understand what they do!)
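For instance, a minimal scipy.integrate.solve_ivp call for the pendulum; the time span and initial state are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp  # general-purpose ODE integrator

def omega(t, state):  # solve_ivp expects a function of (t, y)
    theta, theta_dot = state
    return [theta_dot, -np.sin(theta)]

sol = solve_ivp(omega, t_span=(0.0, 10.0), y0=[np.pi / 3, 0.0], method="RK45")
print(sol.y[:, -1])  # approximate state at t = 10
```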
Stochastic Differential Equations
Stochastic Differential Equations—Overview
• Now that we understand how to describe functions in terms of their derivatives, we can add randomness to the picture
• A few key pieces:
– Brownian motion — basic notion of randomness for continuous functions
– Diffusion process — more general class of “random functions” that connect to broader applications & algorithms
– Itô calculus
° Itô’s lemma — basic notion of “stochastic differentiation”
° Itô integral — basic notion of “stochastic integration”
– Numerical integrators for SDEs
Stochastic Differential Equations—Motivation
• Consider particles jiggling in water. What would it take to simulate this system using an ODE integrator?
• The issue is not merely that there are a lot of particles: to capture the “jiggling” motion, we’d also have to integrate ODEs for trajectories of a huge number of water molecules (~10²³).
• If the mass of the particles is large—or the fluid is very cold—motion due to thermal fluctuation is negligible, and we can just simulate projectile motion plus a linear drag force (a linear ODE!)
• Otherwise, we have to actually model & simulate the forces that induce jiggling (the “Langevin force”)
Brownian motion — Motivation
• Processes found in nature, finance, etc., have very different physical/dynamical origins
• Each one “jiggles around” according to a very different distribution P(xk+1 | xk)
• For fun, let’s simulate random walks using a few distributions p (centered at 0): normal, square, circle, points (see the sketch below)

ALGORITHM: xk+1 ← xk + ξk,   ξk ∼ p (i.i.d.)
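A minimal Python sketch of this random-walk algorithm; the concrete step distributions below are assumptions standing in for the lecture’s normal/square/circle/points panels.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(kind):
    if kind == "normal":   # standard 2D Gaussian step
        return rng.normal(size=2)
    if kind == "square":   # uniform on the square [-1, 1]^2
        return rng.uniform(-1.0, 1.0, size=2)
    if kind == "circle":   # uniform on the unit circle
        phi = rng.uniform(0.0, 2.0 * np.pi)
        return np.array([np.cos(phi), np.sin(phi)])
    if kind == "points":   # uniform on four axis-aligned points
        pts = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)], dtype=float)
        return pts[rng.integers(len(pts))]

def walk(kind, n_steps):
    x, path = np.zeros(2), [np.zeros(2)]
    for k in range(n_steps):
        x = x + step(kind)   # x_{k+1} = x_k + ξ_k, ξ_k ~ p (i.i.d.)
        path.append(x.copy())
    return np.array(path)

paths = {kind: walk(kind, 10_000) for kind in ["normal", "square", "circle", "points"]}
```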


Random Walks—A Few Steps
Suppose we take 10 steps. Can you tell which walk comes from which distribution?
Random Walks—Many Steps
Suppose we now take 10,000 steps. Can you still tell which walk is which?
Random Walks—Zooming Out
Let’s watch what happens as we gradually zoom out:
Q: Why do these walks all look so similar “from a distance”—even though they look very different “up close”?
Brownian motion — Big Picture
• A: Because of the central limit theorem!
• The distribution describing the location of the nth step is the distribution of the sum ξ1 + ⋯ + ξn of n copies of the single-step distribution p
• The central limit theorem tells us that this distribution approaches a normal distribution as n → ∞, no matter what p looks like
– when we zoom out, we can’t see individual steps—only the aggregate effect of n steps, for fairly large n
[figure: distribution of ξ1 + ⋯ + ξn for n = 1, 2, 5, 1000, approaching a Gaussian]
Universality of Brownian Motion
Takeaway: Even though random processes found in nature, science,
technology, etc., all have very different origins, their aggregate behavior
is in many* cases extremely well-predicted by one universal model.

*Though other stochastic processes do arise in nature!


Brownian Motion / Wiener Process
Brownian motion, or the Wiener process, assigns a random variable Wt to each time t, with independent Gaussian increments.

[figure: sample path of Wt; Wt varies continuously with respect to t]
Wiener Process—Definition
More formally, a Wiener process is a time-parameterized family of random variables Wt (i.e., one random variable for each t ≥ 0) such that:

– (continuity) Wt is continuous in t almost surely, i.e., with probability 1
– (independent increments) the “random increment” Wt2 − Wt1 is independent of any past state Wt0, for all 0 ≤ t0 < t1 ≤ t2 (so Brownian motion exhibits the Markov property!)
– (Gaussian increments) each increment Wt2 − Wt1 follows a normal distribution 𝒩(0, t2 − t1)

Often, the “Gaussian increments” condition is given without any motivation
– e.g., why not consider other kinds of increments?
– hopefully you now understand why! ;-)
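A minimal sketch of sampling a discretized Wiener path directly from this definition (step size and horizon assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
eps, n_steps = 1e-3, 1000
# independent Gaussian increments: W_{t+ε} − W_t ~ N(0, ε)
increments = rng.normal(loc=0.0, scale=np.sqrt(eps), size=n_steps)
W = np.concatenate([[0.0], np.cumsum(increments)])  # path with W_0 = 0
```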
Donsker’s Theorem
• Consider a sequence of i.i.d. random variables X1, …, Xn (zero mean, unit variance)
• Can associate these discrete steps with a time-continuous function, e.g., the rescaled partial sums Ŵn(t) := (1/√n) Σ_{i ≤ ⌊nt⌋} Xi
• Donsker’s theorem. As n → ∞, Ŵn(t) converges* to a standard Brownian motion Wt over t ∈ [0, 1]
– *in an appropriate sense (Skorokhod topology)
Stopping Time
• Although many random processes could continue indefinitely, there is often a natural stopping time
– for a process Xt, we often denote the stopping time by a capital T
– e.g., stock options: we purchase the option to purchase an asset at an alternative price at a fixed payoff date T
– e.g., control theory: need to “steer” a noisy process toward a goal over a fixed time T
• The stopping time can itself be a random variable
– e.g., gamble until you run out of money!
[figure: asset price over time, stopped at the payoff date T; video: Marc Miskin]
Deterministic Process
ordinary differential equation (ODE)

dXt = ω⃗(Xt) dt

(change in position = velocity × change in time)

Note: if we “divide by dt”, we get the usual ODE dx/dt = ω⃗(x(t)).

[figure: smooth trajectory following the drift direction]
Brownian Process
stochastic differential equation (SDE)

dXt = dWt

(change in position = “noise”)

[figure: jagged random trajectory]
Brownian Process with Drift
deterministic motion + “noise” — or — random motion + “drift”:

dXt = ω⃗(Xt) dt + dWt

(change in position = velocity × change in time + Brownian motion)

[figure: noisy trajectory following the drift direction]
Brownian Process with Variable Diffusion

dXt = σ(Xt) dWt

(σ: rate of diffusion)

[figure: trajectory whose jitter scales with the local diffusivity σ]
Brownian Process in Absorbing Medium
In general, may need to talk about the random walk getting “killed” or “absorbed”—even though absorption does not appear in the SDE itself.

Roughly: integrating the absorption over time determines the (random) stopping time.

[figure: trajectory terminating inside a region of nonzero absorption]
Diffusion Process

dXt = ω⃗(Xt) dt + σ(Xt) dWt

(change in position = velocity × change in time + rate of diffusion × “noise”)

[figure: sample trajectory, with drift direction, diffusivity, and absorption indicated]
Anisotropic Diffusion
Q: Do you think our random walk will look the same (as n → ∞) if we sample our step direction from these two distributions?
[figure: an isotropic and an anisotropic step distribution]
A: No! If our distribution is anisotropic (i.e., lacks rotational symmetry), our random walk will likewise be anisotropic.
Anisotropic Diffusion & Central Limit Theorem
• In multiple dimensions, the central limit theorem says that a sum of i.i.d. samples Xi from any distribution converges to a normal distribution with the same mean μ and covariance matrix Σ (see the numerical illustration below)
– in general, Σ can look very different from a constant multiple of the identity!
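A quick numerical illustration (the covariance Σ is an assumption): sums of anisotropic i.i.d. steps remain anisotropic, with empirical covariance ≈ nΣ rather than a multiple of the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 0.0],
                  [0.0, 0.25]])  # anisotropic step covariance (assumed)
L = np.linalg.cholesky(Sigma)

n, trials = 200, 5000
steps = rng.normal(size=(trials, n, 2)) @ L.T  # i.i.d. steps with covariance Σ
sums = steps.sum(axis=1)                       # one n-step walk per trial
print(np.cov(sums.T) / n)                      # ≈ Σ, not a multiple of I
```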
Function of a Stochastic Process
• Recall: a function of a random variable is a random variable
• Likewise, a function Yt = f(Xt) of a stochastic process is a stochastic process

Natural question: what is the SDE governing Yt?
Itô’s Formula
Itô’s lemma provides the “chain rule” for stochastic processes. (Kiyoshi Itô)

Rough intuition.
Deterministic derivatives ask: how much does the output change if we vary the input along a given direction?
Stochastic derivatives ask: what distribution of change do we get by varying the input along a distribution of directions?
Itô’s Formula—Ordinary Differential Equation
Itô’s lemma provides the “chain rule” for stochastic processes.

Example. For a deterministic ODE dx/dt = ω⃗(x(t)) and a time-varying function f(t, x), the derived differential equation is just the usual chain rule:

df = ∂f/∂t dt + ∇f · ω⃗ dt

(change in derived value over time = temporal change in f itself + spatial change in f due to motion along the trajectory)
Itô’s Formula — Brownian Motion
Itô’s lemma provides the “chain rule” for stochastic processes.

Example. Most essential question: what about pure Brownian motion dXt = dWt? For a time-independent function f, the derived stochastic process satisfies

df = ½ Δf dt + ∇f · dWt

(change in derived value over time = Laplacian of the function f + spatial change in f along the random trajectory)

Really strange: we only took one derivative (d). How did we end up with 2nd derivatives?
Itô’s Lemma & Laplacian
Intuition. (A quick numerical check follows below.)
– Over a small time t, Brownian motion Wt explores a small neighborhood of the starting point X0.
– At any point x, the Laplacian ∆f(x) gives the difference between the value of f at x and its average value in a small neighborhood (of radius h).
– Hence, the 1st-order change in the observed value over time involves a 2nd-order derivative in space.
– (Formal treatment: Øksendal §4.2)
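A quick Monte Carlo sanity check (illustrative): for f(x) = x² we have Δf = 2 and ∇f(W)·dW has zero mean, so Itô’s formula predicts 𝔼[f(Wt)] − f(0) = t.

```python
import numpy as np

rng = np.random.default_rng(0)
t, trials = 0.7, 1_000_000
W_t = rng.normal(scale=np.sqrt(t), size=trials)  # W_t ~ N(0, t)
print(np.mean(W_t**2))  # ≈ t = 0.7, matching ∫ ½ Δf dt = t
```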
Itô’s Formula — Diffusion Process
Itô’s lemma provides the “chain rule” for stochastic processes.

Example. Overall we get a formula for a general diffusion process dXt = ω⃗(Xt) dt + σ(Xt) dWt and a time-varying function f(t, x), yielding the “derived” process

df = ( ∂f/∂t + ∇f · ω⃗ + ½ σ² Δf ) dt + σ ∇f · dWt

(change in derived value over time = temporal change in f itself + spatial change in f due to deterministic motion + spatial change in f due to “exploration” of the local neighborhood, plus the random part along the path)

Note: since all directions are equally likely & f is locally linear, 𝔼[∇f · dWt] = 0.

Even more general form: Øksendal [2013, §4.2]
Itô Integration
Deterministic integral: “start at an initial point and add total change due to a deterministic function / vector field” — the result is a point in space.

Stochastic (Itô) integral: “start at an initial point and add total change due to a stochastic function / random walk” — the result is a random variable.
Itô Integration (continued)
Perhaps easiest to understand in terms of numerical integration (Euler-Maruyama):

Xk+1 = Xk + ε ω⃗(Xk) + √ε σ(Xk) ξk,   ξk ∼ 𝒩(0, 1) (i.i.d.)

(next state = last state + time step × velocity at last state + diffusivity × noise)

We get a better & better approximation of one trajectory by taking more steps n. As n → ∞, the distribution of points essentially describes the result of the Itô integral.

More formal treatment: Øksendal [2013, Ch. 3]
Numerical Integration of SDEs
• Numerically integrating SDEs is not much different from ODEs
– roughly speaking: take a step and “add noise”
– the noise has variance proportional to the time step ε (i.e., standard deviation √ε)
• For a diffusion process dXt = ω⃗(Xt) dt + σ(Xt) dWt [Kloeden & Platen] (see the sketch below):

Euler-Maruyama (forward, explicit): Xk+1 = Xk + ε ω⃗(Xk) + √ε σ(Xk) ξk
Euler-Maruyama (backward, implicit): Xk+1 = Xk + ε ω⃗(Xk+1) + √ε σ(Xk) ξk   (drift treated implicitly)
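A minimal sketch of forward Euler-Maruyama for the “pendulum in the wind”; the choice to apply noise only to the angular velocity, and all parameters, are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, mu = 0.01, 3 / 5           # time step and diffusion coefficient (μ = 3/5)
x = np.array([np.pi / 3, 0.0])  # state (θ, θ′), initial values assumed
for k in range(10_000):
    drift = np.array([x[1], -np.sin(x[0])])  # ω(θ, θ′) = (θ′, −sin θ)
    noise = np.array([0.0, rng.normal()])    # noise on velocity only (assumed)
    x = x + eps * drift + np.sqrt(eps) * mu * noise
```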
Pendulum in the Wind—Forward Euler-Maruyama
[figure: forward Euler-Maruyama trajectory in the (θ, θ′) phase plane, drift ω⃗(θ, θ′), diffusion μ = 3/5]
Pendulum in the Wind—Backward Euler-Maruyama
[figure: backward Euler-Maruyama trajectory in the (θ, θ′) phase plane, drift ω⃗(θ, θ′), diffusion μ = 3/5]
Pendulum in the Wind—Symplectic Euler-Maruyama
[figure: symplectic Euler-Maruyama trajectory in the (θ, θ′) phase plane, drift ω⃗(θ, θ′), diffusion μ = 3/5]
Pendulum as Random Variable
But wait a minute—didn’t we say the result of Itô integration is a random variable? (Not just one “noisy” trajectory.)

We can think of our SDE integrator as a tool for approximating a distribution, rather than finding just one trajectory.
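A minimal sketch of this distributional view (parameters assumed): run many independent noisy-pendulum trials and summarize the final-angle distribution instead of plotting one path.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, mu, n_steps, trials = 0.01, 3 / 5, 1000, 5000
theta = np.full(trials, np.pi / 3)
theta_dot = np.zeros(trials)
for k in range(n_steps):
    xi = rng.normal(size=trials)  # independent noise per trial
    theta_dot = theta_dot + eps * (-np.sin(theta)) + np.sqrt(eps) * mu * xi
    theta = theta + eps * theta_dot
print(theta.mean(), theta.std())  # statistics of the random variable θ_T
```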
Application: Molecular Dynamics
• In fact, this is often the goal in molecular dynamics
– use an SDE integrator to simulate trajectories of molecules in a “noisy” environment
– perform many trials to understand typical/average behavior of a large ensemble of molecules
– use this information to predict the behavior of diseases, response to drugs, build new materials, …
– alternative perspective: simulation is a strategy for sampling states of a system according to their probability
– later: Langevin dynamics ↔ Langevin Monte Carlo
[figure: COVID-19 spike protein; video credits: Bohemian chemists, Max Planck society]
Beyond Brownian Motion—Martingales
• In general, a martingale is a stochastic process where:
– the average value doesn’t change
– the average value is independent of history
• A discrete sequence of random variables X1, …, Xk is a martingale if 𝔼[Xk+1 | X1, …, Xk] = Xk
– (the Xi need not be independent)
• Brownian motion is a model example in the continuous case
• Basic regularity conditions for stochastic processes make it possible to generalize Brownian motion (and still say useful things…)
[figure: sample martingale path Xt; photo: a nightingale]
Summary
Overview—Stochastic Differential Equations
• ODEs implicitly describe systems evolving over time
• SDEs add randomness to this picture (analogy: trajectory of rock + wind)
• use numerical integration to recover an explicit function from an implicit description
– forward Euler — simple/cheap but unstable
– backward Euler — trickier/more expensive but stable
– Euler-Maruyama — “just add noise” to simulate SDEs
• Itô calculus lets us analyze SDEs
– Itô’s lemma — basic analogue of differentiation
– Itô integration — basic analogue of integration
– unlike ordinary calculus, we get distributions (not definite values)
Thanks!

MONTE CARLO METHODS AND APPLICATIONS
Carnegie Mellon University | 21-387 / 15-327 / 15-627 / 15-860 | Fall 2023
