
Continuous martingales and stochastic calculus

Alison Etheridge

March 11, 2018

Contents
1 Introduction 3

2 An overview of Gaussian variables and processes 5


2.1 Gaussian variables . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Gaussian vectors and spaces . . . . . . . . . . . . . . . . . . . . 7
2.3 Gaussian spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Gaussian processes . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Constructing distributions on (R[0,∞), B(R[0,∞))) . . . . . . . . 11

3 Brownian Motion and First Properties 12


3.1 Definition of Brownian motion . . . . . . . . . . . . . . . . . . . 12
3.2 Wiener Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Extensions and first properties . . . . . . . . . . . . . . . . . . . 18
3.4 Basic properties of Brownian sample paths . . . . . . . . . . . . . 19

4 Filtrations and stopping times 23


4.1 Stopping times . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5 (Sub/super-)Martingales in continuous time 27


5.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.2 Doob’s maximal inequalities . . . . . . . . . . . . . . . . . . . . 28
5.3 Convergence and regularisation theorems . . . . . . . . . . . . . 30
5.4 Martingale convergence and optional stopping . . . . . . . . . . . 33

6 Continuous semimartingales 36
6.1 Functions of finite variation . . . . . . . . . . . . . . . . . . . . . 36
6.2 Processes of finite variation . . . . . . . . . . . . . . . . . . . . . 38
6.3 Continuous local martingales . . . . . . . . . . . . . . . . . . . . 40
6.4 Quadratic variation of a continuous local martingale . . . . . . . . 41
6.5 Continuous semimartingales . . . . . . . . . . . . . . . . . . . . 47

7 Stochastic Integration 48
7.1 Stochastic integral w.r.t. L2 -bounded martingales . . . . . . . . . 48
7.2 Intrinsic characterisation of stochastic integrals using the quadratic
co-variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.3 Extensions: stochastic integration with respect to continuous semi-
martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.4 Itô’s formula and its applications . . . . . . . . . . . . . . . . . . 57

A The Monotone Class Lemma / Dynkin's π–λ Theorem 62

B Review of some basic probability 63


B.1 Convergence of random variables. . . . . . . . . . . . . . . . . . 63
B.2 Convergence in probability. . . . . . . . . . . . . . . . . . . . . . 63
B.3 Uniform Integrability . . . . . . . . . . . . . . . . . . . . . . . . 64
B.4 The Convergence Theorems . . . . . . . . . . . . . . . . . . . . 64
B.5 Information and independence. . . . . . . . . . . . . . . . . . . . 66
B.6 Conditional expectation. . . . . . . . . . . . . . . . . . . . . . . 66

C Convergence of backwards discrete time supermartingales 67

D A very short primer in functional analysis 67

E Normed vector spaces 67

F Banach spaces 69

These notes are based heavily on notes by Jan Obłój from last year’s course,
and the book by Jean-François Le Gall, Brownian motion, martingales, and stochas-
tic calculus, Springer 2016. The first five chapters of that book cover everything in
the course (and more). Other useful references (in no particular order) include:
1. D. Revuz and M. Yor, Continuous martingales and Brownian motion, Springer
(Revised 3rd ed.), 2001, Chapters 0-4.
2. I. Karatzas and S. Shreve, Brownian motion and stochastic calculus, Springer
(2nd ed.), 1991, Chapters 1-3.
3. R. Durrett, Stochastic Calculus: A practical introduction, CRC Press, 1996.
Sections 1.1 - 2.10.
4. F. Klebaner, Introduction to Stochastic Calculus with Applications, 3rd edition,
Imperial College Press, 2012. Chapters 1, 2, 3.1–3.11, 4.1-4.5, 7.1-7.8, 8.1-
8.7.

5. J. M. Steele, Stochastic Calculus and Financial Applications, Springer, 2010.
Chapters 3 - 8.
6. B. Oksendal, Stochastic Differential Equations: An introduction with applica-
tions, 6th edition, Springer (Universitext), 2007. Chapters 1 - 3.
7. S. Shreve, Stochastic calculus for finance, Vol 2: Continuous-time models,
Springer Finance, Springer-Verlag, New York, 2004. Chapters 3 - 4.
The appendices gather together some useful results that we take as known.

1 Introduction
Our topic is part of the huge field devoted to the study of stochastic processes.
Since first year, you’ve had the notion of a random variable, X say, on a probability
space (Ω, F, P) and taking values in a state space (E, E). X : Ω → E is just a
measurable mapping, so that for each A ∈ E, X⁻¹(A) ∈ F and so, in particular, we
can assign a probability to the event {X ∈ A}. Often (E, E) is just (R, B(R))
(where B(R) denotes the Borel sets on R) and this just says that for each x ∈ R we
can assign a probability to the event {X ≤ x}.
Definition 1.1 A stochastic process, indexed by some set T , is a collection of
random variables {Xt }t∈T , defined on a common probability space (Ω, F , P) and
taking values in a common state space (E, E ).
For us, T will generally be either [0, ∞) or [0, T ] and we think of Xt as a random
quantity that evolves with time.
When we model deterministic quantities that evolve with (continuous) time,
we often appeal to ordinary differential equations as models. In this course we
develop the ‘calculus’ necessary to develop an analogous theory of stochastic (or-
dinary) differential equations.
An ordinary differential equation might take the form
dX(t) = a(t, X(t))dt,
for a suitably nice function a. A stochastic equation is often formally written as
dX(t) = a(t, X(t))dt + b(t, X(t))dBt ,
where the second term on the right models ‘noise’ or fluctuations. Here (Bt )t≥0
is an object that we call Brownian motion. We shall consider what appear to be
more general driving noises, but the punchline of the course is that under rather
general conditions they can all be built from Brownian motion. Indeed if we added
possible (random) ‘jumps’ in X(t) at times given by a Poisson process, we’d cap-
ture essentially the most general theory. We are not going to allow jumps, so we’ll
be thinking of settings in which our stochastic equation has a continuous solution
t ↦ Xt, and Brownian motion will be a fundamental object.
It requires some measure-theoretic niceties to make sense of all this.

Definition 1.2 The mapping t ↦ Xt(ω), for a fixed ω ∈ Ω, represents a realisation
of our stochastic process, called a sample path or trajectory. We shall assume that

(t, ω) ↦ Xt(ω) : ([0, ∞) × Ω, B([0, ∞)) ⊗ F) → (R, B(R))

is measurable (i.e. ∀A ∈ B(R), {(t, ω) : Xt(ω) ∈ A} is in the product σ-algebra
B([0, ∞)) ⊗ F). Our stochastic process is then said to be measurable.
Definition 1.3 Let X,Y be two stochastic processes defined on a common proba-
bility space (Ω, F , P) .
i. We say that X is a modification of Y if, for all t ≥ 0, we have Xt = Yt a.s.;
ii. We say that X and Y are indistinguishable if P[Xt = Yt , for all 0 ≤ t < ∞] = 1.
If X and Y are modifications of one another then, in particular, they have the same
finite dimensional distributions,
P [(Xt1 , . . . , Xtn ) ∈ A] = P [(Yt1 , . . . ,Ytn ) ∈ A]
for all measurable sets A, but indistinguishability is a much stronger property.
Indistinguishability takes the sample path as the basic object of study, so that
we could think of (Xt (ω),t ≥ 0) (the path) as a random variable taking values in the
space E [0,∞) (of all possible paths). We are abusing notation a little by identifying
the sample space Ω with the state space E [0,∞) . This state space then has to be
endowed with a σ -algebra of measurable sets. For definiteness, we take real-valued
processes, so E = R.
Definition 1.4 An n-dimensional cylinder set in R[0,∞) is a set of the form

C = {ω ∈ R[0,∞) : (ω(t1), . . . , ω(tn)) ∈ A}

for some 0 ≤ t1 < t2 < . . . < tn and A ∈ B(Rn).


Let C be the family of all finite-dimensional cylinder sets and B(R[0,∞) ) the
σ -algebra it generates. This is small enough to be able to build probability mea-
sures on B(R[0,∞) ) using Carathéodory’s Theorem (see B8.1). On the other hand
B(R[0,∞) ) only contains events which can be defined using at most countably many
coordinates. In particular, the event
{ω ∈ R[0,∞) : t ↦ ω(t) is continuous}

is not B(R[0,∞))-measurable.


We will have to do some work to show that many processes can be assumed to
be continuous, or right continuous. The sample paths are then fully described by
their values at times t ∈ Q, which will greatly simplify the study of quantities of
interest such as sup0≤s≤t |Xs | or τ0 (ω) = inf{t ≥ 0 : Xt (ω) > 0}.
A monotone class argument (see Appendix A) will tell us that a probability
measure on B(R[0,∞) ) is characterised by its finite-dimensional distributions - so
if we can take continuous paths, then we only need to find the probabilities of
cylinder sets to characterise the distribution of the process.

2 An overview of Gaussian variables and processes
Brownian motion is a special example of a Gaussian process - or at least a version
of one that is assumed to have continuous sample paths. In this section we give an
overview of Gaussian variables and processes, but in the next we shall give a direct
construction of Brownian motion, due to Lévy, from which continuity of sample
paths is an immediate consequence.

2.1 Gaussian variables


Definition 2.1 A random variable X is called a centred standard Gaussian, or stan-
dard normal, if its distribution has density
pX(x) = (1/√(2π)) e^(−x²/2),   x ∈ R,

with respect to Lebesgue measure. We write X ∼ N (0, 1).

It is elementary to calculate its Laplace transform:


E[e^(λX)] = e^(λ²/2),   λ ∈ R,

or, extending to complex values, the characteristic function

E[e^(iξX)] = e^(−ξ²/2),   ξ ∈ R.

We say Y has Gaussian (or normal) distribution with mean m and variance σ 2 ,
written Y ∼ N (m, σ 2 ), if Y = σ X + m where X ∼ N (0, 1). Then
E[e^(iξY)] = exp(imξ − σ²ξ²/2),   ξ ∈ R,

and if σ > 0, the density on R is

pY(x) = (1/(σ√(2π))) e^(−(x−m)²/(2σ²)),   x ∈ R.
We think of a constant ‘random’ variable as being a degenerate Gaussian. Then
the space of Gaussian variables (resp. distributions) is closed under convergence in
probability (resp. distribution).
Proposition 2.2 Let (Xn ) be a sequence of Gaussian random variables with Xn ∼
N (mn , σn2 ), which converges in distribution to a random variable X. Then
(i) X is also Gaussian, X ∼ N (m, σ 2 ) with m = limn→∞ mn , σ 2 = limn→∞ σn2 ;
and

(ii) if (Xn )n≥1 converges to X in probability, then the convergence is also in L p


for all 1 ≤ p < ∞.

Proof. Convergence in distribution is equivalent to saying that the characteristic
functions converge:
E[e^(iξXn)] = exp(imnξ − σn²ξ²/2) −→ E[e^(iξX)],   ξ ∈ R.   (1)

Taking absolute values we see that the sequence exp(−σn2 ξ 2 /2) converges, which
in turn implies that σn2 → σ 2 ∈ [0, ∞) (where we ruled out the case σn → ∞ since the
limit has to be the absolute value of a characteristic function and so, in particular,
has to be continuous). We deduce that
e^(imnξ) −→ e^(σ²ξ²/2) E[e^(iξX)],   ξ ∈ R.

We now argue that this implies that the sequence mn converges to some finite m.
Suppose first that the sequence {mn}n≥1 is bounded, and consider any two conver-
gent subsequences, converging to m and m′ say. Then rearranging (1) yields m = m′
and so the sequence converges.
Now suppose that lim supn→∞ mn = ∞ (the case lim infn→∞ mn = −∞ is similar).
There exists a subsequence {mnk }k≥1 which tends to infinity. Given M, for k large
enough that mnk > M,
P[Xnk ≥ M] ≥ P[Xnk ≥ mnk] = 1/2,

and so, using convergence in distribution,

P[X ≥ M] ≥ lim sup_{k→∞} P[Xnk ≥ M] ≥ 1/2,   for any M > 0.
This is clearly impossible for any fixed random variable X and gives us the desired
contradiction. This completes the proof of (i).
To show (ii), observe that the convergence of σn and mn implies, in particular, that
sup_n E[e^(θXn)] = sup_n e^(θmn + θ²σn²/2) < ∞,   for any θ ∈ R.

Since exp(|x|) ≤ exp(x) + exp(−x), this remains finite if we take |Xn | instead of Xn .
This implies that supn E[|Xn | p ] < ∞ for any p ≥ 1 and hence also

sup_n E[|Xn − X|^p] < ∞,   ∀p ≥ 1.   (2)

Fix p ≥ 1. Then the sequence |Xn − X| p converges to zero in probability (by as-
sumption) and is uniformly integrable, since, for any q > p,
E[|Xn − X|^p ; |Xn − X|^p > r] ≤ (1/r^((q−p)/p)) E[|Xn − X|^q] → 0   as r → ∞
(by equation (2) above). It follows that we also have convergence of Xn to X in L p .


2.2 Gaussian vectors and spaces
So far we’ve considered only real-valued Gaussian variables.

Definition 2.3 A random vector X taking values in Rd is called Gaussian if and only
if

⟨u, X⟩ := u^T X = ∑_{i=1}^d ui Xi   is Gaussian for all u ∈ Rd.

It follows immediately that the image of a Gaussian vector under a linear transfor-
mation is also Gaussian: if X ∈ Rd is Gaussian and A is an m × d matrix, then AX
is Gaussian in Rm .

Lemma 2.4 Let X be a Gaussian vector and define mX := (E[X1], . . . , E[Xd]) and
ΓX := (cov(Xi, Xj))_{1≤i,j≤d}, the mean vector and the covariance matrix respec-
tively. Then qX(u) := u^T ΓX u is a non-negative quadratic form and

u^T X ∼ N(u^T mX, qX(u)),   u ∈ Rd.   (3)

Proof. Clearly E[u^T X] = u^T mX and

var(u^T X) = E[(∑_{i=1}^d ui(Xi − E[Xi]))²] = ∑_{1≤i,j≤d} ui uj cov(Xi, Xj) = u^T ΓX u = qX(u),

which also shows that qX(u) ≥ 0. □


The identification in (3) is equivalent to
E[e^(i⟨u,X⟩)] = e^(i⟨u,mX⟩ − qX(u)/2),   u ∈ Rd.

From this we derive easily the following important fact:

Proposition 2.5 Let X be a Gaussian vector and ΓX its covariance matrix. Then
X1 , . . . , Xd are independent if and only if ΓX is a diagonal matrix (i.e. the variables
are pairwise uncorrelated).

Warning: It is crucial to assume that the vector X is Gaussian and not just that
X1 , . . . , Xd are Gaussian. For example, consider X1 ∼ N (0, 1) and ε an independent
random variable with P[ε = 1] = 1/2 = P[ε = −1]. Let X2 := εX1 . Then X2 ∼
N (0, 1) and cov(X1 , X2 ) = 0, while clearly X1 , X2 are not independent.
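This example is easy to check by simulation; the following is a minimal sketch (all variable names are ours, not the notes'):

```python
import numpy as np

# X2 = eps * X1 with eps = +/-1 independent of X1 ~ N(0, 1): the pair is
# uncorrelated but far from independent, since |X2| = |X1| always.
rng = np.random.default_rng(0)
n = 200_000
x1 = rng.standard_normal(n)
eps = rng.choice([-1.0, 1.0], size=n)
x2 = eps * x1

sample_cov = float(np.mean(x1 * x2))                       # near 0: uncorrelated
always_same_abs = bool(np.all(np.abs(x1) == np.abs(x2)))   # True: fully dependent
```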
By definition, a Gaussian vector X remains Gaussian if we add to it a determin-
istic vector m ∈ Rd . Hence, without loss of generality, by considering X − mX , it
suffices to consider centred Gaussian vectors. The variance-covariance matrix ΓX
is symmetric and non-negative definite (as observed above). Conversely, for any
such matrix Γ, there exists a Gaussian vector X with ΓX = Γ, and, indeed, we can
construct it as a linear transformation of a Gaussian vector with i.i.d. coordinates.

Theorem 2.6 Let Γ be a symmetric non-negative definite d ×d matrix. Let (ε1 , . . . , εd )
be an orthonormal basis in Rd which diagonalises Γ, i.e. Γεi = λi εi for some
λ1 ≥ λ2 ≥ . . . ≥ λr > 0 = λr+1 = . . . = λd , where 1 ≤ r ≤ d is the rank of Γ.

(i) A centred Gaussian vector X with covariance matrix ΓX = Γ exists.

(ii) Further, any such vector can be represented as


X = ∑_{i=1}^r Yi εi,   (4)

where Y1 , . . . ,Yr are independent Gaussian random variables with Yi ∼ N (0, λi ).

(iii) If r = d, then X admits a density given by


pX(x) = (1/((2π)^(d/2) √(det Γ))) exp(−½ x^T Γ⁻¹ x),   x ∈ Rd.

Proof. Let A be the matrix whose columns are the εi, so that Γ = AΛA^T, where Λ is
the diagonal matrix with entries λi on the diagonal. Let Z1, . . . , Zd be i.i.d. standard
centred Gaussian variables and set Yi = √λi Zi. Let X be given by (4), i.e. X = AY.
Then

⟨u, X⟩ = ⟨u, AY⟩ = ⟨A^T u, Y⟩ = ∑_{i=1}^d √λi ⟨u, εi⟩ Zi

is Gaussian and centred. Its variance is given by

var(⟨u, X⟩) = ∑_{i=1}^d λi (u^T εi)² = ∑_{i=1}^d u^T εi λi εi^T u = (A^T u)^T Λ (A^T u) = u^T AΛA^T u = u^T Γ u,

and (i) is proved.


Conversely, suppose X is a centred Gaussian vector with covariance matrix Γ and
let Y = AT X. For u ∈ Rd , < u,Y >=< Au, X > is centred Gaussian with variance

(Au)^T Γ(Au) = u^T A^T Γ A u = u^T Λ u,

and we conclude that Y is also a centred Gaussian vector with covariance matrix
Λ. Independence between Y1 , . . . ,Yd then follows from Proposition 2.5. It follows
that when r = d, Y admits a density on Rd given by
pY(y) = (1/((2π)^(d/2) √(det Λ))) exp(−½ y^T Λ⁻¹ y),   y ∈ Rd.

Change of variables, together with det(Λ) = det(Γ) and | det(A)| = 1 gives the
desired density for x. 
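The construction in part (i) translates directly into a sampler; a hedged sketch of Theorem 2.6(i) in code (our implementation; numpy's `eigh` supplies the orthonormal eigenbasis):

```python
import numpy as np

# Diagonalise Gamma = A Lambda A^T, draw independent Y_i ~ N(0, lambda_i)
# and set X = A Y, exactly as in (4).
def sample_centred_gaussian(gamma, n_samples, rng):
    lam, a = np.linalg.eigh(gamma)     # columns of a are the eigenvectors eps_i
    lam = np.clip(lam, 0.0, None)      # guard against tiny negative round-off
    z = rng.standard_normal((n_samples, len(lam)))
    y = z * np.sqrt(lam)               # Y_i = sqrt(lambda_i) Z_i, independent
    return y @ a.T                     # each row is a sample of X = A Y

rng = np.random.default_rng(1)
gamma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])         # our illustrative covariance matrix
xs = sample_centred_gaussian(gamma, 500_000, rng)
emp_cov = xs.T @ xs / len(xs)          # empirical covariance, ~ gamma
```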
Once we know that we can write X = ∑_{i=1}^r Yi εi in this way, we have an easy
way to compute conditional expectations within the family of random variables

which are linear transformations of a Gaussian vector X. To see how it works,
suppose that X is a Gaussian vector in Rd and define Z := X1 − ∑_{i=2}^d ai Xi, with the
coefficients ai chosen in such a way that Z and Xi are uncorrelated for i = 2, . . . , d;
that is,

cov(X1, Xi) − ∑_{j=2}^d aj cov(Xj, Xi) = 0,   2 ≤ i ≤ d.

Evidently Z is Gaussian (it is a linear combination of Gaussians) and, since it is
uncorrelated with X2, . . . , Xd, by Proposition 2.5 it is independent of them. Then

E[X1 | σ(X2, . . . , Xd)] = E[Z + ∑_{i=2}^d ai Xi | σ(X2, . . . , Xd)]
  = E[Z | σ(X2, . . . , Xd)] + E[∑_{i=2}^d ai Xi | σ(X2, . . . , Xd)] = ∑_{i=2}^d ai Xi,

where we have used independence to see that E[Z | σ(X2, . . . , Xd)] = E[Z] = 0.


The most striking feature is that the conditional expectation is an element of the
linear span of X2, . . . , Xd and not a general σ(X2, . . . , Xd)-measurable random
variable. In particular, if (X1, X2, X3) is a Gaussian vector then the best (in the L²
sense, see Appendix B.6) approximation of X1 in terms of X2, X3 is in fact a linear
function of X2 and X3. This extends to the more general setting of Gaussian spaces,
to which we now turn.
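The coefficients ai above are just the solution of a linear system; a minimal numerical sketch (the covariance matrix is our own illustrative choice):

```python
import numpy as np

# The a_i solve the normal equations cov(X1, Xi) = sum_j a_j cov(Xj, Xi),
# i = 2,...,d; then Z = X1 - sum_i a_i X_i is uncorrelated with X2,...,Xd.
gamma = np.array([[3.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])    # covariance of (X1, X2, X3), hypothetical
a = np.linalg.solve(gamma[1:, 1:], gamma[1:, 0])

# cov(Z, Xi) = gamma[0, i] - sum_j a_j gamma[j, i]: vanishes identically
resid_cov = gamma[0, 1:] - a @ gamma[1:, 1:]
```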

2.3 Gaussian spaces


Note that to a Gaussian vector X in Rd , we can associate the vector space spanned
by its coordinates:

{∑_{i=1}^d ui Xi : ui ∈ R},

and by definition all elements of this space are Gaussian random variables. This is
a simple example of a Gaussian space and it is useful to think of such spaces in
much greater generality.

Definition 2.7 A closed linear subspace H ⊂ L2 (Ω, F , P) is called a Gaussian


space if all of its elements are centred Gaussian random variables.

In analogy to Proposition 2.5, two elements of a Gaussian space are independent if


and only if they are uncorrelated, which in turn is equivalent to being orthogonal
in L2 . More generally we have the following result.

Theorem 2.8 Let H1 , H2 be two Gaussian subspaces of a Gaussian space H. Then


H1 , H2 are orthogonal if and only if σ (H1 ) and σ (H2 ) are independent.

The theorem follows from monotone class arguments, which (see Appendix A) re-
duce it to checking that it holds true for any finite subcollection of random variables
- which is Proposition 2.5.

Corollary 2.9 Let H be a Gaussian space and K a closed subspace. Let pK denote
the orthogonal projection onto K. Then for X ∈ H

E[X|σ (K)] = pK (X). (5)

Proof. Let Y = X − pK (X) which, by Theorem 2.8 is independent of σ (K). Hence

E[X|σ (K)] = E[pK (X)|σ (K)] + E[Y |σ (K)] = pK (X) + E[Y ] = pK (X),

where we have used that Y is a centred Gaussian and so has zero mean. 
Warning: For an arbitrary X ∈ L2 we would have

E[X|σ(K)] = p_{L²(Ω, σ(K), P)}(X).

It is a special property of Gaussian random variables X that it is enough to consider


the projection onto the much smaller space K.

2.4 Gaussian processes


Definition 2.10 A stochastic process (Xt : t ≥ 0) is called a (centred) Gaussian
process if any finite linear combination of its coordinates is a (centred) Gaussian
variable.

Equivalently, X is a centred Gaussian process if for any n ∈ N and 0 ≤ t1 < t2 <
. . . < tn, (Xt1, Xt2, . . . , Xtn) is a (centred) Gaussian vector. It follows that the distribu-
tion of a centred Gaussian process on B(R[0,∞)) is characterised by the covariance
function Γ : [0, ∞)² → R, i.e.

Γ(s, t) := cov(Xt, Xs).

For any fixed n-tuple (Xt1 , . . . , Xtn ) the covariance matrix (Γ(ti ,t j )) has to be sym-
metric and positive semi-definite. As the following result shows, the converse also
holds - for any such function, Γ, we may construct an associated Gaussian process.

Theorem 2.11 Let Γ : [0, ∞)² → R be symmetric and such that for any n ∈ N and
0 ≤ t1 < t2 < . . . < tn,

∑_{1≤i,j≤n} ui uj Γ(ti, tj) ≥ 0,   u ∈ Rn.

Then there exists a centred Gaussian process with covariance function Γ.

This result will follow from the (more general) Daniell-Kolmogorov Theorem 2.13
below.
Recalling from Proposition 2.2 that an L2 -limit of Gaussian variables is also
Gaussian, we observe that the closed linear subspace of L2 spanned by the variables
(Xt : t ≥ 0) is a Gaussian space.

2.5 Constructing distributions on (R[0,∞), B(R[0,∞)))
In this section, we’re going to provide a very general result about constructing
continuous time stochastic processes and a criterion due to Kolmogorov which
gives conditions under which there will be a version of the process with continuous
paths.
Let T be the set of finite increasing sequences of non-negative numbers, i.e.
t ∈ T if and only if t = (t1, t2, . . . , tn) for some n and 0 ≤ t1 < t2 < . . . < tn.
Suppose that for each t ∈ T of length n we have a probability measure Qt on
(Rn , B(Rn )). The collection (Qt : t ∈ T) is called a family of finite-dimensional
(marginal) distributions.

Definition 2.12 A family of finite-dimensional distributions is called consistent if
for any t = (t1, t2, . . . , tn) ∈ T and 1 ≤ j ≤ n,

Qt(A1 × A2 × . . . × Aj−1 × R × Aj+1 × . . . × An) = Qs(A1 × A2 × . . . × Aj−1 × Aj+1 × . . . × An),

where Ai ∈ B(R) and s := (t1, t2, . . . , tj−1, tj+1, . . . , tn).

(In other words, if we integrate out over the distribution at the jth time point then
we recover the corresponding marginal for the remaining lower dimensional vec-
tor.)
If we have a probability measure Q on (R[0,∞), B(R[0,∞))) then it defines a
consistent family of marginals via

Qt(A) = Q({ω ∈ R[0,∞) : (ω(t1), . . . , ω(tn)) ∈ A}),

where t = (t1, t2, . . . , tn), A ∈ B(Rn), and we note that the set in question is in
B(R[0,∞)) as it depends on finitely many coordinates. But we'd like a converse: if
I give you the Qt, when does there exist a corresponding measure Q?

Theorem 2.13 (Daniell-Kolmogorov) Let {Qt : t ∈ T} be a consistent family
of finite-dimensional distributions. Then there exists a probability measure P on
(R[0,∞), B(R[0,∞))) such that for any n, t = (t1, . . . , tn) ∈ T and A ∈ B(Rn),

Qt(A) = P[{ω ∈ R[0,∞) : (ω(t1), . . . , ω(tn)) ∈ A}].   (6)

We won’t prove this, but notice that (6) defines P on the cylinder sets and so if we
have countable additivity then the proof reduces to an application of Carathéodory’s
extension theorem. Uniqueness is a consequence of the Monotone Class Lemma.
This is a remarkably general result, but it doesn’t allow us to say anything
meaningful about the paths of the process. For that we appeal to Kolmogorov’s
criterion.

Theorem 2.14 (Kolmogorov's continuity criterion) Suppose that a stochastic pro-
cess (Xt : t ≤ T) defined on (Ω, F, P) satisfies

E[|Xt − Xs|^α] ≤ C|t − s|^(1+β),   0 ≤ s, t ≤ T,   (7)

for some strictly positive constants α, β and C.

Then there exists X̃, a modification of X, whose paths are a.s. locally γ-Hölder
continuous for every γ ∈ (0, β/α); i.e. there are constants δ > 0 and ε > 0 such that

sup_{0≤t−s≤δ, s,t∈[0,T]} |X̃t − X̃s| / |t − s|^γ ≤ ε   a.s.   (8)

In particular, the sample paths of X̃ are a.s. continuous.
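As a concrete illustration of how (7) is applied (our sketch, anticipating the next section): a Brownian increment Bt − Bs is N(0, t − s), whose fourth moment is 3(t − s)², so (7) holds with α = 4, β = 1, C = 3 and the theorem yields Hölder continuity of every order γ < 1/4; taking higher moments pushes the exponent toward, but never up to, 1/2. A quick Monte Carlo check of the moment identity:

```python
import numpy as np

# Estimate E|B_t - B_s|^4 for an increment of length dt and compare with
# the exact value 3 * dt^2 (fourth moment of N(0, dt)).
rng = np.random.default_rng(2)
dt = 0.3
incr = np.sqrt(dt) * rng.standard_normal(1_000_000)
fourth_moment = float(np.mean(incr ** 4))
exact = 3.0 * dt ** 2
```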

3 Brownian Motion and First Properties


3.1 Definition of Brownian motion
Our fundamental building block will be Brownian motion. It is a centred Gaussian
process, so we can characterise it in terms of its covariance structure. It is often
described as an ‘infinitesimal random walk’, so to motivate the definition, we take
a quick look at simple random walk.

Definition 3.1 The discrete-time stochastic process {Sn}n≥0 is a symmetric simple
random walk under the measure P if Sn = ∑_{i=1}^n ξi, where the ξi can take only the
values ±1 and are i.i.d. under P with P[ξi = −1] = 1/2 = P[ξi = 1].

Lemma 3.2 {Sn }n≥0 is a P-martingale (with respect to the natural filtration) and

cov(Sn , Sm ) = n ∧ m.

To obtain a ‘continuous’ version of simple random walk, we appeal to the Central


Limit Theorem. Since E[ξi ] = 0 and var(ξi ) = 1, we have
P[Sn/√n ≤ x] → ∫_{−∞}^x (1/√(2π)) e^(−y²/2) dy   as n → ∞.

More generally,

P[S[nt]/√n ≤ x] → ∫_{−∞}^x (1/√(2πt)) e^(−y²/(2t)) dy   as n → ∞,

where [nt] denotes the integer part of nt.
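A quick simulation sketch of this scaling (parameters are ours):

```python
import numpy as np

# S_[nt]/sqrt(n) for a symmetric simple random walk is approximately N(0, t)
# for large n; here we check its variance, which is exactly [nt]/n.
rng = np.random.default_rng(3)
n, t, n_paths = 2_500, 0.7, 20_000
m = int(n * t)                                         # [nt]
steps = 2 * rng.integers(0, 2, size=(n_paths, m), dtype=np.int8) - 1
s_nt = steps.sum(axis=1) / np.sqrt(n)
emp_var = float(s_nt.var())                            # ~ [nt]/n = 0.7
```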
Heuristically at least, passage to the limit from simple random walk suggests
the following definition of Brownian motion.
Definition 3.3 (Brownian motion) A real-valued stochastic process {Bt }t≥0 is a
P-Brownian motion (or a P-Wiener process) if for some real constant σ , under P,

i. for each s ≥ 0 and t > 0 the random variable Bt+s − Bs has the normal
distribution with mean zero and variance σ²t,

ii. for each n ≥ 1 and any times 0 ≤ t0 ≤ t1 ≤ · · · ≤ tn , the random variables


{Btr − Btr−1 } are independent,

iii. B0 = 0,

iv. Bt is continuous in t ≥ 0.

When σ 2 = 1, we say that we have a standard Brownian motion.

Notice in particular that, for a standard Brownian motion and s < t,

cov(Bs, Bt) = E[Bs Bt] = E[Bs² + Bs(Bt − Bs)] = E[Bs²] = s   (= s ∧ t).

We can write down the finite-dimensional distributions using the independence of
increments. They admit a density with respect to Lebesgue measure. We write
p(t, x, y) for the transition density

p(t, x, y) = (1/√(2πt)) exp(−(x − y)²/(2t)).

For 0 = t0 ≤ t1 ≤ t2 ≤ . . . ≤ tn, writing x0 = 0, the joint probability density function
of Bt1, . . . , Btn is

f(x1, . . . , xn) = ∏_{j=1}^n p(tj − tj−1, xj−1, xj).

Although the sample paths of Brownian motion are continuous, it does not
mean that they are nice in any other sense. In fact the behaviour of Brownian
motion is distinctly odd. Here are just a few of its strange behavioural traits.

i. Although {Bt}t≥0 is continuous everywhere, it is (with probability one) dif-
ferentiable nowhere.

ii. Brownian motion will eventually hit any and every real value no matter how
large, or how negative. No matter how far above the axis, it will (with prob-
ability one) be back down to zero at some later time.

iii. Once Brownian motion hits a value, it immediately hits it again infinitely
often, and then again from time to time in the future.

iv. It doesn’t matter what scale you examine Brownian motion on, it looks just
the same. Brownian motion is a fractal.

The last property is really a consequence of the construction of the process. We’ll
formulate the second and third more carefully later.

We could recover the existence of Brownian motion from the general principles
outlined so far (Daniell-Kolmogorov Theorem and the Kolmogorov continuity cri-
terion plus what we know about Gaussian processes), but we are now going to take
a short digression to describe a beautiful (and useful) construction due to Lévy.
The idea is that we can simply produce a path of Brownian motion by direct
polygonal interpolation. We require just one calculation.

Lemma 3.4 Suppose that {Bt}t≥0 is standard Brownian motion. Conditional on
Bt1 = x1, the probability density function of Bt1/2 is

p_{t1/2}(x) = √(2/(πt1)) exp(−½ (x − x1/2)² / (t1/4)).

In other words, the conditional distribution is that of a normally distributed random
variable with mean x1/2 and variance t1/4. The proof is an exercise.

The construction: Without loss of generality we take the range of t to be [0, 1].
Lévy’s construction builds (inductively) a polygonal approximation to the Brown-
ian motion from a countable collection of independent normally distributed random
variables with mean zero and variance one. We index them by the dyadic points of
[0, 1], a generic variable being denoted ξ (k2−n ) where n ∈ N and k ∈ {0, 1, . . . , 2n }.
The induction begins with

X1 (t) = tξ (1).

Thus X1 is a linear function on [0, 1].


The nth process, Xn, is linear on each interval [(k − 1)2^(−n), k2^(−n)], continuous
in t, and satisfies Xn(0) = 0. It is thus determined by the values {Xn(k2^(−n)), k =
1, . . . , 2^n}.

The inductive step: We take

X_{n+1}(2k·2^(−(n+1))) = X_n(2k·2^(−(n+1))) = X_n(k·2^(−n)).

We now determine the appropriate value for X_{n+1}((2k − 1)2^(−(n+1))). Conditional
on the increment X_{n+1}(2k·2^(−(n+1))) − X_{n+1}(2(k − 1)·2^(−(n+1))), Lemma 3.4 tells us that

X_{n+1}((2k − 1)2^(−(n+1))) − X_{n+1}(2(k − 1)·2^(−(n+1)))

should be normally distributed with mean

½ [X_{n+1}(2k·2^(−(n+1))) − X_{n+1}(2(k − 1)·2^(−(n+1)))]

and variance 2^(−(n+2)).

Now if X ∼ N(0, 1), then aX + b ∼ N(b, a²), and so we take

X_{n+1}((2k − 1)2^(−(n+1))) − X_{n+1}(2(k − 1)·2^(−(n+1)))
  = 2^(−(n/2+1)) ξ((2k − 1)2^(−(n+1))) + ½ [X_{n+1}(2k·2^(−(n+1))) − X_{n+1}(2(k − 1)·2^(−(n+1)))].

In other words,

X_{n+1}((2k − 1)2^(−(n+1))) = ½ X_n((k − 1)2^(−n)) + ½ X_n(k·2^(−n)) + 2^(−(n/2+1)) ξ((2k − 1)2^(−(n+1)))
  = X_n((2k − 1)2^(−(n+1))) + 2^(−(n/2+1)) ξ((2k − 1)2^(−(n+1))),   (9)

where the last equality follows by linearity of Xn on [(k − 1)2^(−n), k·2^(−n)].


The construction is illustrated in Figure 1. Brownian motion will be the process
obtained by letting n increase to infinity.
To check that it exists we need some technical lemmas. The proofs are adapted
from Knight (1981).
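The recursion (9) translates directly into code; the following is our own compact implementation sketch (array layout and function names are ours, not the notes'):

```python
import numpy as np

# Level m holds the values of X at the dyadic grid k 2^{-m} in [0, 1]; each
# refinement keeps the old vertices and sets the new midpoint to the average
# of its neighbours plus an independent 2^{-(m/2+1)} * N(0,1) correction,
# exactly as in (9).
def levy_path(levels, rng):
    x = np.array([0.0, rng.standard_normal()])     # X at t = 0 and t = 1
    for m in range(levels):                        # refine grid 2^{-m} -> 2^{-(m+1)}
        noise = 2.0 ** (-(m / 2 + 1)) * rng.standard_normal(len(x) - 1)
        mid = 0.5 * (x[:-1] + x[1:]) + noise
        new = np.empty(2 * len(x) - 1)
        new[0::2], new[1::2] = x, mid              # old vertices survive
        x = new
    return x                                       # values at k 2^{-levels}

rng = np.random.default_rng(4)
path = levy_path(10, rng)                          # 2^10 + 1 = 1025 values

# sanity check: X(1/2) should be N(0, 1/2); estimate its variance over many runs
mid_var = float(np.var([levy_path(1, rng)[1] for _ in range(40_000)]))
```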

Lemma 3.5

P[ lim_{n→∞} Xn(t) exists for 0 ≤ t ≤ 1, uniformly in t ] = 1.

Proof: Notice that max_t |X_{n+1}(t) − X_n(t)| will be attained at a vertex, that is, at
some t ∈ {(2k − 1)2^(−(n+1)), k = 1, 2, . . . , 2^n}, and using (9),

P[ max_t |X_{n+1}(t) − X_n(t)| ≥ 2^(−n/4) ]
  = P[ max_{1≤k≤2^n} |ξ((2k − 1)2^(−(n+1)))| ≥ 2^(n/4+1) ]
  ≤ 2^n P[ |ξ(1)| ≥ 2^(n/4+1) ].

Now

P[ξ(1) ≥ x] ≤ (1/(x√(2π))) e^(−x²/2)

(exercise), and, combining this with the fact that

exp(−2^(n/2+1)) < 2^(−2n+2),

we obtain that for n ≥ 4

2^n P[ ξ(1) ≥ 2^(n/4+1) ] ≤ (2^n/(2^(n/4+1) √(2π))) exp(−2^(n/2+1)) ≤ (2^n/2^(n/4+1)) 2^(−2n+2) < 2^(−n).

Figure 1: Lévy's sequence of polygonal approximations to Brownian motion (the
piecewise-linear processes X1, X2, X3 over the grid 0, 1/4, 1/2, 3/4, 1).

Consider now, for k > n,

P[ max_t |Xk(t) − Xn(t)| ≥ 2^(−n/4+3) ] = 1 − P[ max_t |Xk(t) − Xn(t)| ≤ 2^(−n/4+3) ]

and

P[ max_t |Xk(t) − Xn(t)| ≤ 2^(−n/4+3) ]
  ≥ P[ ∑_{j=n}^{k−1} max_t |X_{j+1}(t) − X_j(t)| ≤ 2^(−n/4+3) ]
  ≥ P[ max_t |X_{j+1}(t) − X_j(t)| ≤ 2^(−j/4),  j = n, . . . , k − 1 ]
  ≥ 1 − ∑_{j=n}^{k−1} 2^(−j) ≥ 1 − 2^(−n+1).

Finally we have that

P[ max_t |Xk(t) − Xn(t)| ≥ 2^(−n/4+3) ] ≤ 2^(−n+1),

for all k ≥ n. The events on the left are increasing in k (since the maximum can only
increase by the addition of a new vertex), so

P[ max_t |Xk(t) − Xn(t)| ≥ 2^(−n/4+3) for some k > n ] ≤ 2^(−n+1).

In particular, for ε > 0,

lim_{n→∞} P[ for some k > n and t ≤ 1, |Xk(t) − Xn(t)| ≥ ε ] = 0,

which proves the lemma. □


To complete the proof of existence of the Brownian motion, we must check the
following.

Lemma 3.6 Let X(t) = lim_{n→∞} X_n(t) if the limit exists uniformly, and zero otherwise. Then X(t) satisfies the conditions of Definition 3.3 (for t restricted to [0, 1]).

Proof: By construction, properties 1–3 of Definition 3.3 hold for the approximation X_n(t) restricted to T_n = {k2^{−n}, k = 0, 1, …, 2^n}. Since we don't change X_k on T_n for k > n, the same must be true for X on ∪_{n=1}^∞ T_n. A uniform limit of continuous functions is continuous, so condition 4 holds, and now, by approximating any 0 ≤ t_1 ≤ t_2 ≤ … ≤ t_n ≤ 1 from within the dense set ∪_{n=1}^∞ T_n, we see that in fact all four properties hold without restriction for t ∈ [0, 1]. □

3.2 Wiener Measure
Let C(R+ , R) be the space of continuous functions from [0, ∞) to R. Given a
Brownian motion (Bt : t ≥ 0) on (Ω, F , P), consider the map

Ω → C(R+, R), given by ω ↦ (B_t(ω) : t ≥ 0), (10)

which is measurable with respect to B(C(R+, R)), the smallest σ-algebra such that the coordinate mappings ω ↦ ω(t₀), for each fixed t₀, are measurable. (In fact B(C(R+, R)) is also the Borel σ-algebra generated by the topology of uniform convergence on compacts.)

Definition 3.7 The Wiener measure W is the image of P under the mapping in (10);
it is the probability measure on the space of continuous functions such that the
canonical process, i.e. (Bt (ω) = ω(t),t ≥ 0), is a Brownian motion.

In other words, W is the unique probability measure on (C(R+ , R), B(C(R+ , R)))
such that

i. W({ω ∈ C(R+, R) : ω(0) = 0}) = 1;

ii. for any n ≥ 1, all 0 = t_0 < t_1 < … < t_n, and A ∈ B(R^n),
\[
W\big(\{\omega \in C(\mathbb{R}_+, \mathbb{R}) : (\omega(t_1),\dots,\omega(t_n)) \in A\}\big)
= \int_A \frac{1}{(2\pi)^{n/2}\sqrt{t_1(t_2-t_1)\cdots(t_n-t_{n-1})}}
\exp\bigg( -\sum_{i=1}^{n} \frac{(y_i - y_{i-1})^2}{2(t_i - t_{i-1})} \bigg)\, dy_1\cdots dy_n,
\]
where y_0 := 0.

(Uniqueness follows from the Monotone Class Lemma, since B(C(R+, R)) is generated by the finite-dimensional projections.)
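The finite-dimensional formula above is also a recipe for sampling: the increments ω(t_i) − ω(t_{i−1}) are independent N(0, t_i − t_{i−1}) variables (with y_0 := 0). A small illustrative sketch, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_fdd(times, n_samples):
    """Draw (omega(t_1), ..., omega(t_n)) under Wiener measure by
    summing independent N(0, t_i - t_{i-1}) increments (y_0 := 0)."""
    t = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate(([0.0], t)))      # interval lengths
    increments = rng.standard_normal((n_samples, len(t))) * np.sqrt(dt)
    return np.cumsum(increments, axis=1)

samples = sample_fdd([0.5, 1.0, 2.0], 200_000)
# Marginal variance of omega(t_i) should be about t_i,
# and Cov(omega(s), omega(t)) about min(s, t).
print(samples.var(axis=0))                        # approx [0.5, 1.0, 2.0]
print(np.cov(samples[:, 0], samples[:, 2])[0, 1]) # approx min(0.5, 2.0) = 0.5
```

The covariance check reflects the Gaussian-process characterisation of Brownian motion from Section 2.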

3.3 Extensions and first properties


Definition 3.8 Let µ be a probability measure on Rd . A d-dimensional stochastic
process (Bt : t ≥ 0) on (Ω, F , P) is called a d-dimensional Brownian motion with
initial distribution µ if

i. P[B0 ∈ A] = µ(A), A ∈ B(Rd );

ii. ∀ 0 ≤ s ≤ t the increment (Bt − Bs ) is independent of σ (Bu : u ≤ s) and is


normally distributed with mean 0 and covariance matrix (t − s) × Id ;

iii. B has a.s. continuous paths.


Writing the d-dimensional Brownian motion as B_t = (B_t^{(1)}, …, B_t^{(d)}), the coordinate processes (B_t^{(i)}), 1 ≤ i ≤ d, are independent one-dimensional Brownian motions. If µ({x}) = 1 for some x ∈ R^d, we say that B starts at x.

Proposition 3.9 Let B be a standard real-valued Brownian motion. Then

i. (−B_t)_{t≥0} is also a Brownian motion (symmetry);

ii. for every c > 0, (cB_{t/c²})_{t≥0} is a Brownian motion (scaling);

iii. the process X defined by X_0 = 0 and X_t := tB_{1/t} for t > 0 is a Brownian motion (time inversion);

iv. for every s ≥ 0, B̃_t = B_{t+s} − B_s is a Brownian motion independent of σ(B_u : u ≤ s) (simple Markov property).

The proof is an exercise.

3.4 Basic properties of Brownian sample paths


From now on, when we say "Brownian motion", we mean a standard real-valued Brownian motion. We know that t ↦ B_t(ω) is almost surely continuous.
Exercise: Use the Kolmogorov continuity criterion to show that Brownian motion
admits a modification which is locally Hölder continuous of order γ for any 0 <
γ < 1/2.
On the other hand, as we have already remarked, the path is actually rather
‘rough’. We’d like to have a way to quantify this roughness.

Definition 3.10 Let π be a partition of [0, T], N(π) the number of intervals that make up π, and δ(π) the mesh of π (that is, the length of the longest interval in the partition). Write 0 = t_0 < t_1 < … < t_{N(π)} = T for the endpoints of the intervals of the partition. Then the variation of a function f : [0, T] → R is
\[
\lim_{\delta\to 0}\ \sup_{\pi : \delta(\pi)=\delta}\ \bigg\{ \sum_{j=1}^{N(\pi)} \big| f(t_j) - f(t_{j-1}) \big| \bigg\}.
\]

If the function is ‘nice’, for example differentiable, then it has bounded variation.
Our ‘rough’ paths will have unbounded variation. To quantify roughness we can
extend the idea of variation to that of p-variation.
Definition 3.11 In the notation of Definition 3.10, the p-variation of a function f : [0, T] → R is defined as
\[
\lim_{\delta\to 0}\ \sup_{\pi : \delta(\pi)=\delta}\ \bigg\{ \sum_{j=1}^{N(\pi)} \big| f(t_j) - f(t_{j-1}) \big|^p \bigg\}.
\]

Notice that for p > 1, the p-variation will be finite for functions that are much rougher than those for which the variation is bounded. For example, roughly speaking, finite 2-variation will follow if the fluctuation of the function over an interval of length of order δ is of order √δ.
For a typical Brownian path, the 2-variation will be infinite. However, a slightly
weaker analogue of the 2-variation does exist.
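One can see this dichotomy numerically. The sketch below (our own illustration, with a fine grid standing in for the continuous path) computes the 1-variation and 2-variation sums along refining dyadic partitions of a simulated Brownian path: the first blows up as the mesh shrinks, while the second stays near T.

```python
import numpy as np

rng = np.random.default_rng(2)

T, N = 1.0, 2 ** 18                          # fine grid with mesh T / N
dB = rng.standard_normal(N) * np.sqrt(T / N)
B = np.concatenate(([0.0], np.cumsum(dB)))   # Brownian path on the grid

for k in [6, 10, 14, 18]:                    # dyadic partitions, mesh 2^{-k}
    incr = np.diff(B[:: N // 2 ** k])        # increments over the partition
    var1 = np.abs(incr).sum()                # 1-variation sum: grows like 2^{k/2}
    var2 = (incr ** 2).sum()                 # 2-variation sum: stays near T
    print(k, round(var1, 2), round(var2, 3))
```

The growth of the first column as k increases is the numerical shadow of Corollary 3.14 below; the stability of the second column is Theorem 3.12.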

Theorem 3.12 Let B_t denote Brownian motion under P, and for a partition π of [0, T] define
\[
S(\pi) = \sum_{j=1}^{N(\pi)} \big( B_{t_j} - B_{t_{j-1}} \big)^2 .
\]
Let π_n be a sequence of partitions with δ(π_n) → 0. Then
\[
\mathbb{E}\big[ |S(\pi_n) - T|^2 \big] \to 0 \quad \text{as } n \to \infty. \tag{11}
\]

We say that the quadratic variation process of Brownian motion, which we denote by {⟨B⟩_t}_{t≥0}, is ⟨B⟩_t = t. More generally, we can define the quadratic variation process associated with any bounded continuous martingale.

Definition 3.13 Suppose that {M_t}_{t≥0} is a bounded continuous P-martingale. The quadratic variation process associated with {M_t}_{t≥0} is the process {⟨M⟩_t}_{t≥0} such that for any sequence of partitions π_n of [0, T] with δ(π_n) → 0,
\[
\mathbb{E}\bigg[ \Big| \sum_{j=1}^{N(\pi_n)} \big( M_{t_j} - M_{t_{j-1}} \big)^2 - \langle M \rangle_T \Big|^2 \bigg] \to 0 \quad \text{as } n \to \infty. \tag{12}
\]

Remark: We don’t prove it here, but the limit in (12) will be independent of the
sequence of partitions. 
Proof of Theorem 3.12: We expand the expression inside the expectation in (11) and make use of our knowledge of the normal distribution. Let {t_{n,j}}_{j=0}^{N(π_n)} denote the endpoints of the intervals that make up the partition π_n. First observe that
\[
|S(\pi_n) - T|^2 = \bigg( \sum_{j=1}^{N(\pi_n)} \Big\{ \big( B_{t_{n,j}} - B_{t_{n,j-1}} \big)^2 - (t_{n,j} - t_{n,j-1}) \Big\} \bigg)^2 .
\]
It is convenient to write δ_{n,j} for (B_{t_{n,j}} − B_{t_{n,j−1}})² − (t_{n,j} − t_{n,j−1}). Then
\[
|S(\pi_n) - T|^2 = \sum_{j=1}^{N(\pi_n)} \delta_{n,j}^2 + 2 \sum_{j<k} \delta_{n,j}\delta_{n,k}.
\]
Note that since Brownian motion has independent increments,
\[
\mathbb{E}[\delta_{n,j}\delta_{n,k}] = \mathbb{E}[\delta_{n,j}]\,\mathbb{E}[\delta_{n,k}] = 0 \quad \text{if } j \ne k.
\]
Also
\[
\mathbb{E}\big[\delta_{n,j}^2\big] = \mathbb{E}\Big[ \big( B_{t_{n,j}} - B_{t_{n,j-1}} \big)^4 - 2 \big( B_{t_{n,j}} - B_{t_{n,j-1}} \big)^2 (t_{n,j} - t_{n,j-1}) + (t_{n,j} - t_{n,j-1})^2 \Big].
\]
For a normally distributed random variable X with mean zero and variance λ, E[|X|⁴] = 3λ², so we have
\[
\mathbb{E}\big[\delta_{n,j}^2\big] = 3 (t_{n,j} - t_{n,j-1})^2 - 2 (t_{n,j} - t_{n,j-1})^2 + (t_{n,j} - t_{n,j-1})^2
= 2 (t_{n,j} - t_{n,j-1})^2 \le 2\,\delta(\pi_n)\,(t_{n,j} - t_{n,j-1}).
\]
Summing over j,
\[
\mathbb{E}\big[ |S(\pi_n) - T|^2 \big] \le 2 \sum_{j=1}^{N(\pi_n)} \delta(\pi_n)\,(t_{n,j} - t_{n,j-1}) = 2\,\delta(\pi_n)\,T \to 0 \quad \text{as } n \to \infty. \qquad \square
\]
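For a uniform partition of [0, T] into m intervals, the computation above gives E|S(π_n) − T|² = ∑_j 2(Δt)² = 2T²/m exactly, which is easy to confirm by Monte Carlo. A sketch (our own, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_sq_error(T, m, n_paths):
    """Monte Carlo estimate of E|S(pi) - T|^2 for the uniform
    partition of [0, T] into m intervals."""
    dt = T / m
    dB = rng.standard_normal((n_paths, m)) * np.sqrt(dt)
    S = (dB ** 2).sum(axis=1)           # quadratic-variation sums S(pi)
    return ((S - T) ** 2).mean()

T = 1.0
for m in [10, 100, 1000]:
    # The proof gives exactly 2 * T^2 / m here, i.e. 2 * mesh * T.
    print(m, mean_sq_error(T, m, 20_000), 2 * T ** 2 / m)
```

The estimate halves each time the mesh halves, matching the bound 2δ(π_n)T.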

Corollary 3.14 Brownian sample paths are of infinite variation on any interval
almost surely.

Corollary 3.15 Brownian sample paths are almost surely nowhere locally Hölder continuous of order γ > 1/2.

(The proofs are exercises.)


To study the very small time behaviour of Brownian motion, it is useful to establish the following 0–1 law.
Fix a Brownian motion (Bt )t≥0 on (Ω, F , P). For every t ≥ 0 we set Ft :=
σ (Bu : u ≤ t). Note that Fs ⊂ Ft if s ≤ t. We also set F0+ := ∩s>0 Fs .

Theorem 3.16 (Blumenthal’s 0-1 law)


The σ -field F0+ is trivial in the sense that P[A] = 0 or 1 for every A ∈ F0+ .

Proof. Let 0 < t_1 < t_2 < … < t_k and let g : R^k → R be a bounded continuous function. Also, fix A ∈ F_{0+}. Then by a continuity argument,
\[
\mathbb{E}[\mathbf{1}_A\, g(B_{t_1},\dots,B_{t_k})] = \lim_{\varepsilon\downarrow 0} \mathbb{E}[\mathbf{1}_A\, g(B_{t_1} - B_\varepsilon,\dots,B_{t_k} - B_\varepsilon)].
\]
If 0 < ε < t_1, the variables B_{t_1} − B_ε, …, B_{t_k} − B_ε are independent of F_ε (by the Markov property) and thus also of F_{0+}. It follows that
\[
\mathbb{E}[\mathbf{1}_A\, g(B_{t_1},\dots,B_{t_k})] = \lim_{\varepsilon\downarrow 0} \mathbb{E}[\mathbf{1}_A\, g(B_{t_1} - B_\varepsilon,\dots,B_{t_k} - B_\varepsilon)]
= \mathbb{P}[A]\,\mathbb{E}[g(B_{t_1},\dots,B_{t_k})].
\]

We have thus obtained that F0+ is independent of σ (Bt1 , . . . , Btk ). Since this holds
for any finite collection {t1 , . . . ,tk } of (strictly) positive reals, F0+ is independent
of σ (Bt ,t > 0). However, σ (Bt ,t > 0) = σ (Bt ,t ≥ 0), since B0 is the pointwise
limit of Bt when t → 0. Since F0+ ⊂ σ (Bt ,t ≥ 0), we conclude that F0+ is inde-
pendent of itself and so must be trivial. 

Proposition 3.17 Let B be a standard real-valued Brownian motion, as above.

i. Then, a.s., for every ε > 0,
\[
\sup_{0\le s\le\varepsilon} B_s > 0 \quad \text{and} \quad \inf_{0\le s\le\varepsilon} B_s < 0.
\]
In particular, inf{t > 0 : B_t = 0} = 0 a.s.

ii. For every a ∈ R, let T_a := inf{t ≥ 0 : B_t = a} (with the convention that inf ∅ = ∞). Then a.s. for each a ∈ R, T_a < ∞. Consequently, we have a.s.
\[
\limsup_{t\to\infty} B_t = +\infty, \qquad \liminf_{t\to\infty} B_t = -\infty.
\]

Remark 3.18 It is not a priori obvious that sup0≤s≤ε Bs is even measurable, since
this is an uncountable supremum of random variables, but since sample paths are
continuous, we can restrict to rational values of s ∈ [0, ε] so that we are taking the
supremum over a countable set. We implicitly use this observation in what follows.

Proof. (i) Let (ε_p) be a sequence of strictly positive reals decreasing to zero and set A := ∩_{p≥0} {sup_{0≤s≤ε_p} B_s > 0}. Since this is a monotone decreasing intersection, A ∈ F_{0+}. On the other hand, by monotonicity, P[A] is the decreasing limit
\[
\mathbb{P}[A] = \lim_{p\to\infty} \downarrow\, \mathbb{P}\Big[ \sup_{0\le s\le\varepsilon_p} B_s > 0 \Big]
\]
and
\[
\mathbb{P}\Big[ \sup_{0\le s\le\varepsilon_p} B_s > 0 \Big] \ge \mathbb{P}[B_{\varepsilon_p} > 0] = \frac{1}{2}.
\]
So P[A] ≥ 1/2 and by Blumenthal's 0–1 law P[A] = 1. Hence a.s. for all ε > 0, sup_{0≤s≤ε} B_s > 0. Replacing B by −B we obtain P[inf_{0≤s≤ε} B_s < 0] = 1.
(ii) Write
\[
1 = \mathbb{P}\Big[ \sup_{0\le s\le 1} B_s > 0 \Big] = \lim_{\delta\downarrow 0} \uparrow\, \mathbb{P}\Big[ \sup_{0\le s\le 1} B_s > \delta \Big],
\]
where ↑ indicates an increasing limit. Now use the scale invariance property, B^λ_t = B_{λ²t}/λ is a Brownian motion, with λ = 1/δ to see that for any δ > 0,
\[
\mathbb{P}\Big[ \sup_{0\le s\le 1} B_s > \delta \Big]
= \mathbb{P}\Big[ \sup_{0\le s\le 1/\delta^2} B^{\delta}_s > 1 \Big]
= \mathbb{P}\Big[ \sup_{0\le s\le 1/\delta^2} B_s > 1 \Big]. \tag{13}
\]
If we let δ ↓ 0, we find
\[
\mathbb{P}\Big[ \sup_{s\ge 0} B_s > 1 \Big] = \lim_{\delta\downarrow 0} \uparrow\, \mathbb{P}\Big[ \sup_{0\le s\le 1/\delta^2} B_s > 1 \Big] = 1
\]
(since lim_{δ↓0} ↑ P[sup_{0≤s≤1} B_s > δ] = 1).


Another scaling argument shows that, for every M > 0,
\[
\mathbb{P}\Big[ \sup_{s\ge 0} B_s > M \Big] = 1,
\]
and replacing B with −B,
\[
\mathbb{P}\Big[ \inf_{s\ge 0} B_s < -M \Big] = 1.
\]
Continuity of sample paths completes the proof of (ii). □

Corollary 3.19 Almost surely, t ↦ B_t is not monotone on any non-trivial interval.

4 Filtrations and stopping times


These are concepts that you already know about in the context of discrete parameter martingales. Our definitions here mirror what you already know, but in the continuous setting one has to be slightly more careful. In the end, we'll make enough assumptions to guarantee that everything goes through nicely.

Definition 4.1 A collection {F_t, t ∈ [0, ∞)} of σ-algebras of sets in F is a filtration if F_t ⊆ F_{t+s} for t, s ∈ [0, ∞). (Intuitively, F_t corresponds to the information known to an observer at time t.)
In particular, for a process X we define FtX = σ ({X(s) : s ≤ t}) (that is FtX
is the information obtained by observing X up to time t) to be the natural filtration
associated with the process X.
We say that Ft is right continuous if for each t ≥ 0,

Ft = Ft+ ≡ ∩ε>0 Ft+ε .

We say that {Ft } is complete if (Ω, F , P) is complete (contains all subsets of


the P-null sets) and {A ∈ F : P[A] = 0} ⊂ F0 (and hence ⊂ Ft for all t).
A process X is adapted to a filtration {Ft } if X(t) is Ft -measurable for each
t ≥ 0 (if and only if FtX ⊆ Ft for all t).
A process X is F_t-progressive if for each t ≥ 0, the mapping (s, ω) ↦ X_s(ω) is measurable on ([0,t] × Ω, B([0,t]) ⊗ F_t).

If X is Ft -progressive, then it is Ft -adapted, but the converse is not necessarily


true. However, every right continuous Ft -adapted process is Ft -progressive and
since we shall only be interested in continuous processes, we won’t need to dwell
on these details.

Proposition 4.2 An adapted process (Xt ) whose paths are all right-continuous (or
are all left-continuous) is progressively measurable.

Proof. We present the argument for a right-continuous X. For t > 0, n ≥ 1 and k = 0, 1, 2, …, 2^n − 1 let X^{(n)}_0(ω) = X_0(ω) and X^{(n)}_s(ω) := X_{\frac{k+1}{2^n}t}(ω) for \frac{k}{2^n}t < s ≤ \frac{k+1}{2^n}t. Clearly (X^{(n)}_s : s ≤ t) takes finitely many values and is B([0,t]) ⊗ F_t-measurable. Further, by right continuity, X_s(ω) = lim_{n→∞} X^{(n)}_s(ω), and hence (X_s : s ≤ t) is also measurable (as a limit of measurable mappings). □

Definition 4.3 A filtration (Ft )t≥0 (or the filtered space (Ω, F , (Ft )t≥0 , P)) is said
to satisfy the usual conditions if it is right-continuous and complete.

Given a filtered probability space, we can always consider a natural augmentation,


replacing the filtration with σ(F_{t+}, N), where N = N(P) := {A ⊆ Ω : ∃B ∈ F such that A ⊆ B and P[B] = 0}. The augmented filtration satisfies the usual
conditions. In Section 5.3 we’ll see that if we have a martingale with respect to a
filtration that satisfies the usual conditions, then it has a right continuous version.
Usually we shall consider the natural filtration associated with a process and
don’t specify it explicitly. Sometimes we suppose that we are given a filtration Ft
and then we say that a process is an Ft -Brownian motion if it is adapted to Ft .

4.1 Stopping times


Again the definition mirrors what you know from the discrete setting.

Definition 4.4 Let (Ω, F, (F_t), P) be a filtered space. A random variable τ : Ω → [0, +∞] is called a stopping time (relative to (F_t)) if {τ ≤ t} ∈ F_t for all t ≥ 0.

The 'first time a certain phenomenon occurs' will be a stopping time. Our fundamental examples will be first hitting times of sets. If (X_t) is a stochastic process and Γ ∈ B(R) we set
\[
H_\Gamma(\omega) = H_\Gamma(X(\omega)) := \inf\{ t \ge 0 : X_t(\omega) \in \Gamma \}. \tag{14}
\]

Exercise 4.5 Show that

i. if (Xt ) is adapted to (Ft ) and has right-continuous paths then HΓ , for Γ an


open set, is a stopping time relative to (Ft+ ).

ii. if (Xt ) has continuous paths, then HΓ , for Γ a closed set, is a stopping time
relative to (Ft ).

With a stopping time we can associate ‘the information known at time τ’:

Definition 4.6 Given a stopping time τ relative to (F_t) we define
\[
\begin{aligned}
\mathcal{F}_\tau &:= \{ A \in \mathcal{F} : A \cap \{\tau \le t\} \in \mathcal{F}_t \ \forall t \ge 0 \},\\
\mathcal{F}_{\tau+} &:= \{ A \in \mathcal{F} : A \cap \{\tau < t\} \in \mathcal{F}_t \ \forall t \ge 0 \},\\
\mathcal{F}_{\tau-} &:= \sigma\big( \{ A \cap \{\tau > t\} : t \ge 0,\ A \in \mathcal{F}_t \} \big)
\end{aligned}
\]
(which satisfy all the natural properties).

Proposition 4.7 Let τ be a stopping time. Then

(i) Fτ is a σ -algebra and τ is Fτ -measurable.

(ii) Fτ− ⊆ Fτ ⊆ Fτ+ and Fτ = Fτ+ if (Ft ) is right-continuous.

(iii) If τ = t then Fτ = Ft , Fτ+ = Ft+ .

(iv) If τ and ρ are stopping times then so are τ ∧ ρ, τ ∨ ρ and τ + ρ and {τ ≤


ρ} ∈ Fτ∧ρ . Further if τ ≤ ρ then Fτ ⊆ Fρ .

(v) If τ is a stopping time and ρ is a [0, ∞]-valued random variable which is F_τ-measurable and ρ ≥ τ, then ρ is a stopping time. In particular,
\[
\tau_n := \sum_{k=0}^{\infty} \frac{k+1}{2^n}\, \mathbf{1}_{\{ \frac{k}{2^n} < \tau \le \frac{k+1}{2^n} \}} + \infty\, \mathbf{1}_{\{\tau = \infty\}} \tag{15}
\]
is a sequence of stopping times with τ_n ↓ τ as n → ∞.

Proof. We give a proof of (v). Note that {ρ ≤ t} = {ρ ≤ t} ∩ {τ ≤ t} (because ρ ≥ τ), and this event lies in F_t since ρ is F_τ-measurable. Hence ρ is a stopping time. We have τ_n ↓ τ by definition, and clearly τ_n is F_τ-measurable since τ is F_τ-measurable. □
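The approximation (15) just rounds τ up to the next dyadic point of level n, so it is one line of code. A hypothetical sketch (the function name is ours):

```python
import math

def tau_n(tau, n):
    """Dyadic approximation (15): round tau up to the grid {k 2^{-n}};
    tau = infinity is left unchanged, and grid points are fixed."""
    if math.isinf(tau):
        return math.inf
    return math.ceil(tau * 2 ** n) / 2 ** n

tau = 0.3
approx = [tau_n(tau, n) for n in range(1, 6)]
print(approx)   # 0.5, 0.5, 0.375, 0.3125, 0.3125: decreasing to tau, always >= tau
```

Because each τ_n takes only the countably many values k2^{−n}, results for discrete-time martingales apply to the stopped process at τ_n; letting n → ∞ then recovers statements at τ itself, which is exactly how the strong Markov property is proved below.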
It is often useful to be able to ‘stop’ a process at a stopping time and know
that the result still has nice measurability properties. If (Xt )t≥0 is progressively
measurable and τ is a stopping time, then X τ := (Xt∧τ : t ≥ 0) is progressively
measurable.
We’re going to use the sequence τn of stopping times in (15) to prove an impor-
tant generalisation of the Markov property for Brownian motion called the strong
Markov property. Recall that the Markov property says that Brownian motion has
‘no memory’ - we can start it again from Bs and Bt+s − Bs is just a Brownian
motion, independent of the path followed by B up to time s. The strong Markov
property says that the same is true if we replace s by a stopping time.

Theorem 4.8 Let B = (B_t : t ≥ 0) be a standard Brownian motion on the filtered probability space (Ω, F, (F_t)_{t≥0}, P) and let τ be a stopping time with respect to (F_t)_{t≥0}. Then, conditional on {τ < ∞}, the process
\[
B^{(\tau)}_t := B_{\tau+t} - B_\tau \tag{16}
\]
is a standard Brownian motion independent of F_τ.

Proof. Assume that τ < ∞ a.s. We will show that for all A ∈ F_τ, 0 ≤ t_1 < … < t_p and continuous bounded functions F on R^p we have
\[
\mathbb{E}\big[ \mathbf{1}_A\, F(B^{(\tau)}_{t_1},\dots,B^{(\tau)}_{t_p}) \big] = \mathbb{P}(A)\, \mathbb{E}\big[ F(B_{t_1},\dots,B_{t_p}) \big]. \tag{17}
\]
Granted (17), taking A = Ω, we find that B and B^{(τ)} have the same finite-dimensional distributions, and since B^{(τ)} has continuous paths, it must be a Brownian motion. On the other hand (as usual using a monotone class argument), (17) says that (B^{(τ)}_{t_1}, …, B^{(τ)}_{t_p}) is independent of F_τ, and so B^{(τ)} is independent of F_τ.
To establish (17), first observe that by continuity of B and F,
\[
F(B^{(\tau)}_{t_1},\dots,B^{(\tau)}_{t_p}) = \lim_{n\to\infty} \sum_{k=0}^{\infty} \mathbf{1}_{\{ \frac{k-1}{2^n} < \tau \le \frac{k}{2^n} \}}\, F\big( B_{\frac{k}{2^n}+t_1} - B_{\frac{k}{2^n}},\, \dots,\, B_{\frac{k}{2^n}+t_p} - B_{\frac{k}{2^n}} \big) \quad \text{a.s.},
\]
and by the Dominated Convergence Theorem
\[
\mathbb{E}\big[ \mathbf{1}_A\, F(B^{(\tau)}_{t_1},\dots,B^{(\tau)}_{t_p}) \big]
= \lim_{n\to\infty} \sum_{k=0}^{\infty} \mathbb{E}\Big[ \mathbf{1}_{A \cap \{ \frac{k-1}{2^n} < \tau \le \frac{k}{2^n} \}}\, F\big( B_{\frac{k}{2^n}+t_1} - B_{\frac{k}{2^n}},\, \dots,\, B_{\frac{k}{2^n}+t_p} - B_{\frac{k}{2^n}} \big) \Big].
\]

For A ∈ F_τ, the event A ∩ {\frac{k-1}{2^n} < τ ≤ \frac{k}{2^n}} ∈ F_{k/2^n}, so using the simple Markov property at k/2^n,
\[
\mathbb{E}\Big[ \mathbf{1}_{A \cap \{ \frac{k-1}{2^n} < \tau \le \frac{k}{2^n} \}}\, F\big( B_{\frac{k}{2^n}+t_1} - B_{\frac{k}{2^n}},\, \dots,\, B_{\frac{k}{2^n}+t_p} - B_{\frac{k}{2^n}} \big) \Big]
= \mathbb{P}\Big[ A \cap \Big\{ \frac{k-1}{2^n} < \tau \le \frac{k}{2^n} \Big\} \Big]\, \mathbb{E}\big[ F(B_{t_1},\dots,B_{t_p}) \big].
\]
Sum over k to recover the desired result.


If P(τ = ∞) > 0, the same argument gives instead
\[
\mathbb{E}\big[ \mathbf{1}_{A \cap \{\tau<\infty\}}\, F(B^{(\tau)}_{t_1},\dots,B^{(\tau)}_{t_p}) \big] = \mathbb{P}[A \cap \{\tau < \infty\}]\, \mathbb{E}\big[ F(B_{t_1},\dots,B_{t_p}) \big]. \qquad \square
\]

It was not until the 1940s that Doob properly formulated the strong Markov property, and it was 1956 before Hunt proved it for Brownian motion.
The following result, known as the reflection principle, was known at the end
of the 19th Century for random walk and appears in the famous 1900 thesis of
Bachelier, which introduced the idea of modelling stock prices using Brownian
motion (although since he had no formulation of the strong Markov property, his
proof is not rigorous).

Theorem 4.9 (The reflection principle) Let S_t := sup_{u≤t} B_u. For a ≥ 0 and b ≤ a we have
\[
\mathbb{P}[S_t \ge a,\ B_t \le b] = \mathbb{P}[B_t \ge 2a - b] \quad \text{for all } t \ge 0.
\]
In particular, S_t and |B_t| have the same distribution.

Proof. We apply the strong Markov property to the stopping time T_a = inf{t > 0 : B_t = a}. We have already seen that T_a < ∞ a.s., and so, in the notation of Theorem 4.8,
\[
\mathbb{P}[S_t \ge a,\ B_t \le b] = \mathbb{P}[T_a \le t,\ B_t \le b] = \mathbb{P}\big[ T_a \le t,\ B^{(T_a)}_{t-T_a} \le b - a \big] \tag{18}
\]
(since B^{(T_a)}_{t−T_a} = B_t − B_{T_a} = B_t − a).

Now B^{(T_a)} is a Brownian motion, independent of F_{T_a} and hence of T_a. Since B^{(T_a)} has the same distribution as −B^{(T_a)}, (T_a, B^{(T_a)}) has the same distribution as (T_a, −B^{(T_a)}). So
\[
\mathbb{P}\big[ T_a \le t,\ B^{(T_a)}_{t-T_a} \le b - a \big]
= \mathbb{P}\big[ T_a \le t,\ -B^{(T_a)}_{t-T_a} \le b - a \big]
= \mathbb{P}[T_a \le t,\ B_t \ge 2a - b] = \mathbb{P}[B_t \ge 2a - b],
\]
since 2a − b ≥ a and so {B_t ≥ 2a − b} ⊆ {T_a ≤ t}.


We have proved that P[St ≥ a, Bt ≤ b] = P[Bt ≥ 2a − b]. For the last assertion
of the theorem, taking a = b in (18), observe that

P[St ≥ a] = P[St ≥ a, Bt ≥ a] + P[St ≥ a, Bt ≤ a]


= 2P[Bt ≥ a] = P[Bt ≥ a] + P[Bt ≤ −a] (symmetry)
= P[|Bt | ≥ a].
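The identity P[S_t ≥ a] = P[|B_t| ≥ a] can also be probed by simulation. In the sketch below (ours, not from the notes; the grid maximum slightly undershoots the true supremum, so only approximate agreement should be expected) we compare the two tail probabilities:

```python
import numpy as np

rng = np.random.default_rng(4)

t, steps, n_paths = 1.0, 2000, 10_000
dB = rng.standard_normal((n_paths, steps)) * np.sqrt(t / steps)
paths = np.cumsum(dB, axis=1)

# Running maximum on the grid (clamped at 0 since B_0 = 0 is included).
S_t = np.maximum(paths.max(axis=1), 0.0)
B_t = paths[:, -1]

for a in [0.5, 1.0, 1.5]:
    # Reflection principle: P[S_t >= a] = P[|B_t| >= a] = 2 P[B_t >= a]
    print(a, (S_t >= a).mean(), (np.abs(B_t) >= a).mean())
```

The small systematic gap between the two columns shrinks as `steps` grows, since the discrete maximum misses excursions between grid points.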

5 (Sub/super-)Martingales in continuous time


The results in this section will to a large extent mirror what you proved last term for
discrete parameter martingales (and we use those results repeatedly in our proofs).
We assume throughout that a filtered probability space (Ω, F , (Ft )t≥0 , P) is given.

5.1 Definitions
Definition 5.1 An adapted stochastic process (Xt )t≥0 such that Xt ∈ L1 (P) (i.e.
E[|Xt |] < ∞) for any t ≥ 0, is called

i. a martingale if E[Xt |Fs ] = Xs for all 0 ≤ s ≤ t,

ii. a super-martingale if E[Xt |Fs ] ≤ Xs for all 0 ≤ s ≤ t,

iii. a sub-martingale if E[Xt |Fs ] ≥ Xs for all 0 ≤ s ≤ t.

Exercises: Suppose (Z_t : t ≥ 0) is an adapted process with independent increments, i.e. for all 0 ≤ s < t, Z_t − Z_s is independent of F_s. The following give us examples of martingales:

i. if Z_t ∈ L¹ for all t ≥ 0, then Z̃_t := Z_t − E[Z_t] is a martingale,

ii. if Z_t ∈ L² for all t ≥ 0, then Z̃_t² − E[Z̃_t²] is a martingale,

iii. if for some θ ∈ R we have E[e^{θZ_t}] < ∞ for all t ≥ 0, then e^{θZ_t}/E[e^{θZ_t}] is a martingale.

In particular, B_t, B_t² − t and e^{θB_t − θ²t/2} are all martingales with respect to a filtration (F_t)_{t≥0} for which (B_t)_{t≥0} is a Brownian motion.
Warning: It is important to remember that a process is a martingale with respect to a filtration: giving yourself more information (enlarging the filtration) may destroy the martingale property. For us, even when we don't explicitly mention it, there is a filtration implicitly assumed (usually the natural filtration associated with the process, augmented to satisfy the usual conditions).
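As a sanity check on the Brownian examples above, the martingale property forces E[M_t] = E[M_0] for every t, and this much is easy to verify by simulation (a rough sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(5)

n_paths, theta = 400_000, 0.5
for t in [0.5, 1.0, 2.0]:
    B = rng.standard_normal(n_paths) * np.sqrt(t)          # B_t ~ N(0, t)
    print(t,
          B.mean(),                                        # E[B_t] = 0
          (B ** 2 - t).mean(),                             # E[B_t^2 - t] = 0
          np.exp(theta * B - theta ** 2 * t / 2).mean())   # should be near 1
```

Of course, constancy of expectations is only a necessary condition; the full martingale property involves conditioning on F_s, which the proofs above handle via the independent-increments structure.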
Given a martingale (or submartingale) it is easy to generate many more.

Proposition 5.2 Let (X_t)_{t≥0} be a martingale (respectively sub-martingale) and ϕ : R → R be a convex (respectively convex and increasing) function such that E[|ϕ(X_t)|] < ∞ for every t ≥ 0. Then (ϕ(X_t))_{t≥0} is a sub-martingale.

Proof. Conditional Jensen inequality. □

In particular, if (X_t)_{t≥0} is a martingale with E[|X_t|^p] < ∞ for some p ≥ 1 and all t ≥ 0, then |X_t|^p is a sub-martingale (and consequently, t ↦ E[|X_t|^p] is non-decreasing).

5.2 Doob’s maximal inequalities


Doob was the person who placed martingales on a firm mathematical foundation (beginning in the 1940s). He initially called them 'processes with the property E', but reverted to the term martingale in his monumental book.
Doob's inequalities are fundamental to proving convergence theorems for martingales. You already encountered them in the discrete setting, and we shall recall those results that underpin our proofs in the continuous world here. They allow us to control the running maximum of a martingale.
Theorem 5.3 If (X_n)_{n≥0} is a discrete martingale (or a positive submartingale) w.r.t. some filtration (F_n), then for any N ∈ N, p ≥ 1 and λ > 0,
\[
\lambda^p\, \mathbb{P}\Big[ \sup_{n\le N} |X_n| \ge \lambda \Big] \le \mathbb{E}\big[ |X_N|^p \big],
\]
and for any p > 1,
\[
\mathbb{E}\big[ |X_N|^p \big] \le \mathbb{E}\Big[ \sup_{n\le N} |X_n|^p \Big] \le \Big( \frac{p}{p-1} \Big)^p\, \mathbb{E}\big[ |X_N|^p \big].
\]
We'd now like to extend this to continuous time. Suppose that X is indexed by t ∈ [0, ∞). Take a countable dense set D in [0, T], e.g. D = Q ∩ [0, T], and an increasing sequence of finite subsets D_n ⊆ D such that ∪_{n=1}^∞ D_n = D. The above inequalities hold for X indexed by t ∈ D_n ∪ {T}. Monotone convergence then yields the result for t ∈ D. If X has regular sample paths (e.g. right continuous) then the supremum over a countable dense set in [0, T] is the same as over the whole of [0, T] and so:

Theorem 5.4 (Doob's maximal and L^p inequalities) If (X_t)_{t≥0} is a right continuous martingale or positive sub-martingale, then for any T ≥ 0 and λ > 0,
\[
\lambda^p\, \mathbb{P}\Big[ \sup_{t\le T} |X_t| \ge \lambda \Big] \le \mathbb{E}\big[ |X_T|^p \big], \quad p \ge 1,
\qquad
\mathbb{E}\Big[ \sup_{t\le T} |X_t|^p \Big] \le \Big( \frac{p}{p-1} \Big)^p\, \mathbb{E}\big[ |X_T|^p \big], \quad p > 1. \tag{19}
\]
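For instance, for p = 2 and X = B the L² inequality gives E[sup_{t≤T} B_t²] ≤ 4E[B_T²] = 4T. A quick Monte Carlo check (our own sketch; the discrete grid slightly underestimates the supremum):

```python
import numpy as np

rng = np.random.default_rng(6)

T, steps, n_paths = 1.0, 1000, 10_000
dB = rng.standard_normal((n_paths, steps)) * np.sqrt(T / steps)
B = np.cumsum(dB, axis=1)                  # Brownian paths on a grid

# Grid proxy for sup_{t <= T} |B_t|^2; |B| is a positive submartingale,
# so Doob's L^2 inequality applies with E[B_T^2] = T.
sup_sq = np.abs(B).max(axis=1) ** 2
print(sup_sq.mean(), "<=", 4 * T)          # Doob bound E[sup B_t^2] <= 4T
```

The observed mean is well inside the bound; the constant (p/(p − 1))^p = 4 is not attained by Brownian motion but cannot be improved over all martingales.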
As an application of Doob's maximal inequality, we derive a useful bound for Brownian motion.

Proposition 5.5 Let (B_t)_{t≥0} be Brownian motion and S_t = sup_{u≤t} B_u. For any λ > 0 we have
\[
\mathbb{P}[S_t \ge \lambda t] \le e^{-\lambda^2 t/2}.
\]
Proof. Recall that e^{αB_t − α²t/2}, t ≥ 0, is a non-negative martingale. It follows that, for α ≥ 0,
\[
\mathbb{P}[S_t \ge \lambda t]
\le \mathbb{P}\Big[ \sup_{u\le t} e^{\alpha B_u - \alpha^2 t/2} \ge e^{\alpha\lambda t - \alpha^2 t/2} \Big]
\le \mathbb{P}\Big[ \sup_{u\le t} e^{\alpha B_u - \alpha^2 u/2} \ge e^{\alpha\lambda t - \alpha^2 t/2} \Big]
\le e^{-\alpha\lambda t + \alpha^2 t/2}\, \underbrace{\mathbb{E}\big[ e^{\alpha B_t - \alpha^2 t/2} \big]}_{=1},
\]
where the last step is Doob's maximal inequality applied to the martingale e^{αB_u − α²u/2}. The bound now follows since min_{α≥0} e^{−αλt+α²t/2} = e^{−λ²t/2} (with the minimum achieved when α = λ). □
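By the reflection principle S_t has the law of |B_t|, so the exact tail is P[S_t ≥ λt] = erfc(λ√t/√2), and the bound can be checked against it directly (a small sketch, not from the notes):

```python
import math

def exact_tail(lam, t):
    # P[S_t >= lam * t] = P[|B_t| >= lam * t], with B_t ~ N(0, t):
    # = 2 * P[N(0, 1) >= lam * sqrt(t)] = erfc(lam * sqrt(t) / sqrt(2))
    return math.erfc(lam * math.sqrt(t) / math.sqrt(2.0))

def doob_bound(lam, t):
    # The exponential bound from Proposition 5.5.
    return math.exp(-lam ** 2 * t / 2.0)

for lam in [0.5, 1.0, 2.0]:
    for t in [1.0, 4.0]:
        assert exact_tail(lam, t) <= doob_bound(lam, t)
        print(lam, t, exact_tail(lam, t), doob_bound(lam, t))
```

The inequality is the classical Gaussian tail fact erfc(u) ≤ e^{−u²} for u ≥ 0, applied with u = λ√t/√2; the exponential rate λ²t/2 in the bound is sharp.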
In the next subsection, we are going to show that even if a supermartingale is not right continuous, it has a right continuous version (this is Doob's Regularisation Theorem). To prove this, we need a slight variant of the maximal inequality, this time for a supermartingale, which in turn relies on Doob's Optional Stopping Theorem for discrete supermartingales.
Theorem 5.6 (Doob's Optional Stopping Theorem for discrete supermartingales, bounded case) If (Y_n)_{n≥1} is a supermartingale, then for any choice of bounded stopping times S and T such that S ≤ T, we have
\[
Y_S \ge \mathbb{E}[Y_T | \mathcal{F}_S].
\]
Here's the version of the maximal inequality that we shall need.

Proposition 5.7 Let (X_t : t ≥ 0) be a supermartingale. Then, for all λ, T > 0,
\[
\mathbb{P}\Big[ \sup_{t\in[0,T]\cap\mathbb{Q}} |X_t| \ge \lambda \Big] \le \frac{1}{\lambda}\big( 2\,\mathbb{E}[|X_T|] + \mathbb{E}[|X_0|] \big). \tag{20}
\]
In particular, sup_{t∈[0,T]∩Q} |X_t| < ∞ a.s.

Proof. Take a sequence of rational numbers 0 = t_0 < t_1 < … < t_n = T. Applying Theorem 5.6 with S = min{t_i : X_{t_i} ≥ λ} ∧ T, we obtain
\[
\mathbb{E}[X_0] \ge \mathbb{E}[X_S] \ge \lambda\, \mathbb{P}\Big[ \sup_{1\le i\le n} X_{t_i} \ge \lambda \Big] + \mathbb{E}\big[ X_T\, \mathbf{1}_{\{\sup_{1\le i\le n} X_{t_i} < \lambda\}} \big].
\]
Rearranging,
\[
\lambda\, \mathbb{P}\Big[ \sup_{1\le i\le n} X_{t_i} \ge \lambda \Big] \le \mathbb{E}[X_0] + \mathbb{E}[X_T^-]
\]
(where X_T^− = −min(X_T, 0)). Now (X_t^−) is a non-negative submartingale and so we can apply Doob's inequality directly to it, from which
\[
\lambda\, \mathbb{P}\Big[ \sup_{1\le i\le n} X_{t_i}^- \ge \lambda \Big] \le \mathbb{E}[X_T^-],
\]
and, since E[X_T^−] ≤ E[|X_T|], taking the (monotone) limit along nested sequences in [0, T] ∩ Q gives the result. □

5.3 Convergence and regularisation theorems


As advertised, our aim in this section is to prove that, provided the filtration satisfies
‘the usual conditions’, any martingale has a version with right continuous paths.
First we recall the notion of upcrossing numbers.
Definition 5.8 Let f : I → R be a function defined on a subset I of [0, ∞). If a < b,
the upcrossing number of f along [a, b], which we shall denote U([a, b], ( ft )t∈I ) is
the maximal integer k ≥ 1 such that there exists a sequence s1 < t1 < · · · < sk < tk
of elements of I such that f (si ) ≤ a and f (ti ) ≥ b for every i = 1, . . . , k.
If even for k = 1 there is no such sequence, we take U([a, b], ( ft )t∈I ) = 0. If
such a sequence exists for every k ≥ 1, we set U([a, b], ( ft )t∈I ) = ∞.
Upcrossing numbers are a convenient tool for studying the regularity of functions.
We omit the proof of the following analytic lemma.
Lemma 5.9 Let D be a countable dense set in [0, ∞) and let f be a real function
defined on D. Assume that for every T ∈ D
i. f is bounded on D ∩ [0, T ];

ii. for all rationals a and b such that a < b

U([a, b], ( ft )t∈D∩[0,T ] ) < ∞.

Then the right limit
\[
f(t+) = \lim_{s\downarrow t,\ s\in D} f(s)
\]
exists for every real t ≥ 0, and similarly the left limit
\[
f(t-) = \lim_{s\uparrow t,\ s\in D} f(s)
\]
exists for any real t > 0.
Furthermore, the function g : R+ → R defined by g(t) = f (t+) is càdlàg (‘con-
tinue à droite avec des limites à gauche’; i.e. right continuous with left limits) at
every t > 0.

Lemma 5.10 (Doob's upcrossing lemma in discrete time) Let (X_t)_{t≥0} be a supermartingale and F a finite subset of [0, T]. If a < b then
\[
\mathbb{E}\Big[ U\big([a,b], (X_n : n \in F)\big) \Big] \le \sup_{n\in F} \frac{\mathbb{E}\big[ (X_n - a)^- \big]}{b-a} \le \frac{\mathbb{E}\big[ (X_T - a)^- \big]}{b-a}.
\]
The last inequality follows since (X_t − a)^− is a submartingale.


Taking an increasing sequence Fn and setting ∪n Fn = F, this immediately ex-
tends to a countable F ⊂ [0, T ]. From this we deduce:

Theorem 5.11 If (X_t) is a right-continuous super-martingale and sup_t E[X_t^−] < ∞, then lim_{t→∞} X_t exists a.s. In particular, a non-negative right-continuous super-martingale converges a.s. as t → ∞.

Proof. This is immediate since, by right continuity, upcrossings of [a, b] over t ∈ [0, ∞) are the same as upcrossings of [a, b] over t ∈ [0, ∞) ∩ Q, and a sequence (x_n)_{n≥1} converges (in [−∞, ∞]) if and only if U([a, b], (x_n)_{n≥1}) < ∞ for all a < b with a, b ∈ Q. □

The next result says that we can talk about left and right limits for a general supermartingale, and then our analytic lemma will tell us how to find a càdlàg version.

Theorem 5.12 If (X_t : t ≥ 0) is a supermartingale then for P-almost every ω ∈ Ω,
\[
\forall t \in (0,\infty) \quad \lim_{r\uparrow t,\ r\in\mathbb{Q}} X_r(\omega) \ \text{and}\ \lim_{r\downarrow t,\ r\in\mathbb{Q}} X_r(\omega) \ \text{exist and are finite.} \tag{21}
\]

Proof. Fix T > 0. From Proposition 5.7 and Lemma 5.10, there exists Ω_T ⊆ Ω, with P(Ω_T) = 1, such that for any ω ∈ Ω_T
\[
\forall a, b \in \mathbb{Q} \ \text{with}\ a < b, \quad U\big([a,b], (X_t(\omega) : t \in [0,T]\cap\mathbb{Q})\big) < \infty,
\]
and
\[
\sup_{t\in[0,T]\cap\mathbb{Q}} |X_t(\omega)| < \infty.
\]
It follows that the limits in (21) are well defined and finite for all t ≤ T and ω ∈ Ω_T. To complete the proof, work on the full-measure set Ω_1 ∩ Ω_2 ∩ Ω_3 ∩ …. □
Using this, even if X is not right-continuous, its right-continuous version is a.s.
well defined. The following fundamental regularisation result is again due to Doob.

Theorem 5.13 Let X be a supermartingale with respect to a right-continuous and complete filtration (F_t). If t ↦ E[X_t] is right continuous (e.g. if X is a martingale) then X admits a modification with càdlàg paths, which is also an (F_t)-supermartingale.

Remark 5.14 If X is a martingale then its càdlàg modification is also a martin-


gale.

Proof. By Theorem 5.12, there exists Ω_0 ⊆ Ω, with P[Ω_0] = 1, such that the process
\[
X_{t+}(\omega) = \begin{cases} \lim_{r\downarrow t,\ r\in\mathbb{Q}} X_r(\omega) & \omega \in \Omega_0 \\ 0 & \omega \notin \Omega_0 \end{cases}
\]
is well defined and adapted to (F_t). By Lemma 5.9, it has càdlàg paths.
To check that we really have only produced a modification of X_t, that is, X_t = X_{t+} almost surely, let t_n ↓ t be a sequence of rationals. Set Y_k = X_{t_{−k}} for every integer k ≤ 0. Then Y is a backwards supermartingale with respect to the (backward) discrete filtration H_k = F_{t_{−k}} and sup_{k≤0} E[|Y_k|] < ∞. The convergence theorem for backwards supermartingales (see Appendix C) then implies that X_{t_n} converges to X_{t+} in L¹. In particular, X_{t+} ∈ L¹ and, thanks to L¹ convergence, we can pass to the limit n → ∞ in the inequality X_t ≥ E[X_{t_n}|F_t] to obtain X_t ≥ E[X_{t+}|F_t]. Right continuity of t ↦ E[X_t] implies E[X_{t+} − X_t] = 0, so that X_t = X_{t+} almost surely.
Now, to check the supermartingale property, let s < t and let (s_n)_{n≥0} be a sequence of rationals decreasing to s, with s_n < t for all n. Then, as above, X_{s_n} → X_{s+} in L¹, so if A ∈ F_{s+} (which implies A ∈ F_{s_n} for every n), with t_n as above,
\[
\mathbb{E}[X_{s+}\mathbf{1}_A] = \lim_{n\to\infty} \mathbb{E}[X_{s_n}\mathbf{1}_A] \ge \lim_{n\to\infty} \mathbb{E}[X_{t_n}\mathbf{1}_A]
= \mathbb{E}[X_{t+}\mathbf{1}_A] = \mathbb{E}\big[ \mathbb{E}[X_{t+}|\mathcal{F}_{s+}]\mathbf{1}_A \big].
\]
Since this holds for all A ∈ F_{s+}, and since X_{s+} and E[X_{t+}|F_{s+}] are both F_{s+}-measurable, this shows
\[
X_{s+} \ge \mathbb{E}[X_{t+}|\mathcal{F}_{s+}].
\]
For martingales all the inequalities can be replaced by equalities, and for submartingales we use that −X is a supermartingale. □

Remark 5.15 Let’s make some comments on the assumptions of the theorem.
i. The assumption that the filtration is right continuous is necessary. For ex-
ample, let Ω = {−1, +1}, P[{1}] = 1/2 = P[{−1}]. We set ε(ω) = ω and
Xt = 0 if 0 ≤ t ≤ 1, Xt = ε if t > 1. Then X is a martingale with respect
to the canonical filtration (which is complete since there are no nonempty
negligible sets), but no modification of X can be right continuous at t = 1.

ii. Similarly, take Xt = f (t), where f (t) is deterministic, non-increasing and not
right continuous. Then no modification can have right continuous sample
paths.

5.4 Martingale convergence and optional stopping


Theorem 5.16 Let X be a supermartingale with right continuous sample paths. Assume that (X_t)_{t≥0} is bounded in L¹. Then there exists X_∞ ∈ L¹ such that lim_{t→∞} X_t = X_∞ almost surely.

Proof. Let D be a countable dense subset of R_+. We showed before that for a < b
\[
\mathbb{E}\big[ U([a,b], (X_t)_{t\in D\cap[0,T]}) \big] \le \frac{1}{b-a}\, \mathbb{E}\big[ (X_T - a)^- \big].
\]
By the Monotone Convergence Theorem,
\[
\mathbb{E}\big[ U([a,b], (X_t)_{t\in D}) \big] \le \frac{1}{b-a} \sup_{t\ge 0} \mathbb{E}\big[ (X_t - a)^- \big] < \infty
\]
(since X is bounded in L¹). This implies that X_∞ = lim_{D∋t→∞} X_t exists a.s. in [−∞, ∞]. We can exclude the values ±∞ since Fatou's Lemma gives
\[
\mathbb{E}[|X_\infty|] \le \liminf_{D\ni t\to\infty} \mathbb{E}[|X_t|] < \infty,
\]
so X_∞ ∈ L¹. Right continuity of sample paths allows us to remove the restriction that t ∈ D. □
Under the assumptions of this theorem, X_t may not converge to X_∞ in L¹. The next result gives, for martingales, necessary and sufficient conditions for L¹-convergence.

Definition 5.17 A martingale is said to be closed if there exists a random variable


Z ∈ L1 such that for every t ≥ 0, Xt = E[Z|Ft ].

Theorem 5.18 Let (X_t : t ≥ 0) be a martingale with right continuous sample paths. Then the following are equivalent:

i. X is closed;

ii. the collection (Xt )t≥0 is uniformly integrable;

iii. Xt converges almost surely and in L1 as t → ∞.

Moreover, if these properties hold, Xt = E[X∞ |Ft ] for every t ≥ 0, where X∞ ∈ L1


is the almost sure limit of Xt as t → ∞.

Proof. That the first condition implies the second is easy: if Z ∈ L¹, then the family of E[Z|G], as G varies over the sub-σ-fields of F, is uniformly integrable.
If the second condition holds, then, in particular, (Xt )t≥0 is bounded in L1 and,
by Theorem 5.16, Xt → X∞ almost surely. By the uniform integrability, we also
have convergence in L1 , so the third condition is satisfied.
Finally, if the third condition holds, for every s ≥ 0, pass to the limit as t → ∞ in
the equality Xs = E[Xt |Fs ] (using the fact that conditional expectation is continuous
for the L1 -norm) and obtain Xs = E[X∞ |Fs ]. 
We should now like to establish conditions under which we have an optional
stopping theorem for continuous martingales. As usual, our starting point will be
the corresponding discrete time result and we shall pass to a suitable limit.
Theorem 5.19 (Optional stopping for uniformly integrable discrete martingales) Let (Yn)n∈N be a uniformly integrable martingale with respect to the filtration (Gn)n∈N, and let Y∞ be the a.s. limit of Yn as n → ∞. Then, for every choice of stopping times S and T such that S ≤ T, we have YT ∈ L1 and

YS = E[YT |GS],

where

GS = {A ∈ G∞ : A ∩ {S = n} ∈ Gn for every n ∈ N},

with the convention that YT = Y∞ on the event {T = ∞}, and similarly for YS.
Let (Xt )t≥0 be a right continuous martingale or supermartingale such that Xt con-
verges almost surely as t → ∞ to a limit X∞ . Then for every stopping time T , we
define
XT (ω) = 1{T (ω)<∞} XT (ω) (ω) + 1{T (ω)=∞} X∞ (ω).
Theorem 5.20 Let (Xt )t≥0 be a uniformly integrable martingale with right contin-
uous sample paths. Let S and T be two stopping times with S ≤ T . Then XS and XT
are in L1 and XS = E[XT |FS ].
In particular, for every stopping time S we have XS = E[X∞ |FS ] and E[XS ] =
E[X∞ ] = E[X0 ].
Proof. For any integer n ≥ 0 set

Tn = ∑_{k=0}^{∞} (k + 1)2^{−n} 1{k2^{−n} < T ≤ (k+1)2^{−n}} + ∞ · 1{T = ∞},

Sn = ∑_{k=0}^{∞} (k + 1)2^{−n} 1{k2^{−n} < S ≤ (k+1)2^{−n}} + ∞ · 1{S = ∞}.

Then (Tn) and (Sn) are sequences of stopping times that decrease respectively to T and S. Moreover, Sn ≤ Tn for every n ≥ 0.
For each fixed n, 2^n Sn and 2^n Tn are stopping times of the discrete filtration H^{(n)}_k = F_{k/2^n}, and Y^{(n)}_k = X_{k/2^n} is a discrete martingale with respect to this filtration. From Theorem 5.19, Y^{(n)}_{2^n Sn} and Y^{(n)}_{2^n Tn} are in L1 and

XSn = Y^{(n)}_{2^n Sn} = E[Y^{(n)}_{2^n Tn} | H^{(n)}_{2^n Sn}] = E[XTn |FSn].
Let A ∈ FS . Since FS ⊆ FSn we have A ∈ FSn and so E[1A XSn ] = E[1A XTn ]. By
right continuity, XS = limn→∞ XSn and XT = limn→∞ XTn . The limits also hold in
L1 (in fact, by Theorem 5.19, XSn = E[X∞ |FSn ] for every n and so (XSn )n≥1 and
(XTn )n≥1 are uniformly integrable). L1 convergence implies that the limits XS and
XT are in L1 and allows us to pass to a limit, E[1A XS ] = E[1A XT ]. This holds for
all A ∈ FS and so, since XS is FS-measurable, we conclude that XS = E[XT |FS], as required. □
Corollary 5.21 In particular, for any martingale with right continuous paths and any two bounded stopping times S ≤ T, we have XS, XT ∈ L1 and XS = E[XT |FS].

Proof. Let a be such that S ≤ T ≤ a. The martingale (Xt∧a)t≥0 is closed by Xa and so we may apply our previous results. □
Corollary 5.22 Suppose that (Xt)t≥0 is a martingale with right continuous paths and T is a stopping time.

i. (Xt∧T)t≥0 is a martingale;

ii. if, in addition, (Xt)t≥0 is uniformly integrable, then (Xt∧T)t≥0 is uniformly integrable and for every t ≥ 0, Xt∧T = E[XT |Ft].

We omit the proof, which can be found in Le Gall.

Above all, optional stopping is a powerful tool for explicit calculations.
Example 5.23 Fix a > 0 and let Ta be the first hitting time of a by standard Brownian motion. Then for each λ > 0,

E[e^{−λTa}] = e^{−a√(2λ)}.

Recall that Nt^λ = exp(λBt − (λ²/2)t) is a martingale. So the stopped process (N^λ_{t∧Ta})t≥0 is still a martingale; it is bounded above by e^{aλ} and hence is uniformly integrable, so E[N^λ_{Ta}] = E[N^λ_0]. That is,

e^{aλ} E[e^{−λ² Ta/2}] = E[N^λ_0] = 1.

Replace λ by √(2λ) and rearrange.
Warning: This argument fails if λ < 0, the reason being that we lose the uniform integrability.
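As a quick numerical sanity check (our own illustration, not part of the notes): the reflection principle gives P(Ta ≤ t) = P(|Bt| ≥ a), so Ta has the same law as a²/Z² with Z standard normal, and we can estimate the Laplace transform by Monte Carlo. The numpy usage and all variable names here are our own choices.

```python
import numpy as np

# Check E[exp(-lam * T_a)] = exp(-a * sqrt(2 * lam)) by Monte Carlo.
# By the reflection principle, T_a has the same law as a**2 / Z**2
# with Z a standard normal random variable.
rng = np.random.default_rng(0)
a, lam = 1.0, 0.5
Z = rng.standard_normal(1_000_000)
T = a**2 / Z**2                        # samples of the hitting time T_a
estimate = np.exp(-lam * T).mean()     # Monte Carlo Laplace transform
exact = np.exp(-a * np.sqrt(2 * lam))  # = exp(-1) here
```

With a = 1 and λ = 1/2 both sides are close to e⁻¹ ≈ 0.368.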
6 Continuous semimartingales
Recall that our original goal was to make sense of differential equations driven
by ‘rough’ inputs. In fact, we’ll recast our differential equations as integral equa-
tions and so we must develop a theory that allows us to integrate with respect to
‘rough’ driving processes. The class of processes with which we work are called
semimartingales, and we shall specialise to the continuous ones.
We’re going to start with functions for which the integration theory that you
already know is adequate - these are called functions of finite variation.
Throughout, we assume that a filtered probability space (Ω, F , (Ft ), P) satis-
fying the usual conditions is given.
6.1 Functions of finite variation

Throughout this section we only consider real-valued right-continuous functions on [0, ∞). Our arguments will be shift invariant so, without loss of generality, we assume that any such function a satisfies a(0) = 0. Recall the following definition.
Definition 6.1 The (total) variation of a function a over [0, T] is defined as

V(a)T = sup_π ∑_{i=0}^{nπ−1} |a(ti+1) − a(ti)|,

where the supremum is over partitions π = {0 = t0 < t1 < . . . < tnπ = T} of [0, T]. We say that a is of finite variation on [0, T] if V(a)T < ∞. The function a is of finite variation if V(a)T < ∞ for all T ≥ 0 and of bounded variation if limT→∞ V(a)T < ∞.
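For a smooth function the supremum is approached along fine partitions and recovers ∫|a′(s)|ds. A small numerical illustration (our own, not from the notes): for a(t) = sin t on [0, 2π] the total variation is ∫₀^{2π} |cos s| ds = 4.

```python
import numpy as np

# Total variation of a(t) = sin(t) on [0, 2*pi] along a fine partition.
# For smooth a, sum |a(t_{i+1}) - a(t_i)| approaches int |a'(s)| ds = 4.
t = np.linspace(0.0, 2 * np.pi, 100_001)
a = np.sin(t)
variation = np.abs(np.diff(a)).sum()  # V(a)_T over this partition
```

On monotone stretches the sum telescopes exactly, so the only error comes from the two intervals containing the extrema of sin.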
Remark 6.2 Note that t 7→ V(a)t is non-negative, right-continuous and non-decreasing in t. Monotonicity follows since any partition of [0, s] may be extended to a partition of [0, t], t ≥ s.
Proposition 6.3 The function a is of finite variation if and only if it is equal to the difference of two non-decreasing functions, a1 and a2. Moreover, if a is of finite variation, then a1 and a2 can be chosen so that V(a)t = a1(t) + a2(t). If a is càdlàg then V(a)t is also càdlàg.

Proof.

V(a)t − a(t) = sup_π ∑_{i=0}^{n(π)−1} ( |a(ti+1) − a(ti)| − (a(ti+1) − a(ti)) )

is a non-decreasing function of t, as is V(a)t + a(t). □
If we define measures µ+, µ− by

µ+((0, t]) = (V(a)t + a(t))/2,   µ−((0, t]) = (V(a)t − a(t))/2,
then we can develop a theory of integration with respect to a by declaring that

∫_0^t f(s) da(s) = ∫_0^t f(s) µ+(ds) − ∫_0^t f(s) µ−(ds),

provided that

∫_0^t |f(s)| |µ|(ds) = ∫_0^t |f(s)| (µ+(ds) + µ−(ds)) < ∞.

We say that µ = µ+ − µ− is the signed measure associated with a, that (µ+, µ−) is its Jordan decomposition, and that ∫_0^t f(s) da(s) is the Lebesgue-Stieltjes integral of f with respect to a.
We sometimes use the notation

(f · a)(t) = ∫_0^t f(s) da(s).

The function (f · a) will be right continuous and of finite variation whenever a is of finite variation and f is a-integrable (exercise).
Proposition 6.4 (Associativity) Let a be of finite variation as above and let f, g be measurable functions such that f is a-integrable and g is (f · a)-integrable. Then gf is a-integrable and

∫_0^t g(s) d(f · a)(s) = ∫_0^t g(s) f(s) da(s).

In our 'dot' notation:

g · (f · a) = (gf) · a. (22)
Proposition 6.5 (Stopping) Let a be of finite variation as above and fix t ≥ 0. Set at(s) = a(t ∧ s). Then at is of finite variation and for any measurable a-integrable function f

∫_0^{u∧t} f(s) da(s) = ∫_0^u f(s) dat(s) = ∫_0^u f(s) 1[0,t](s) da(s),   u ∈ [0, ∞].
Proposition 6.6 (Integration by parts) Let a and b be two continuous functions of finite variation with a(0) = b(0) = 0. Then for any t

a(t)b(t) = ∫_0^t a(s) db(s) + ∫_0^t b(s) da(s).
Proposition 6.7 (Chain rule) If F is a C1 function and a is continuous and of finite variation, then F(a(t)) is also of finite variation and

F(a(t)) = F(a(0)) + ∫_0^t F′(a(s)) da(s).
Proof. The statement is trivially true for F(x) = x. Now by Proposition 6.6, it is
straightforward to check that if the statement is true for F, then it is also true for
xF(x). Hence, by induction, the statement holds for all polynomials. To complete
the proof, approximate F ∈ C1 by a sequence of polynomials. 
Proposition 6.8 (Change of variables) If a is non-decreasing and right-continuous then so is

c(s) := inf{t ≥ 0 : a(t) > s},

where inf ∅ = +∞. Let a(0) = 0. Then, for any Borel measurable function f ≥ 0 on R+, we have

∫_0^∞ f(u) da(u) = ∫_0^{a(∞)} f(c(s)) ds.

Proof. If f(u) = 1[0,ν](u), then the claim becomes

a(ν) = ∫_0^∞ 1{c(s) ≤ ν} ds = inf{s : c(s) > ν},

and equality holds by definition of c. Take differences to get indicators of sets (u, ν]. The Monotone Class Theorem allows us to extend to functions of compact support and then take increasing limits to obtain the formula in general. □
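A numerical illustration on a compact interval (the choices of a and f are our own): with a(t) = t² on [0, 2] the right-continuous inverse is c(s) = √s, and ∫_0^2 f(u) da(u) should equal ∫_0^{a(2)} f(c(s)) ds.

```python
import numpy as np

# Check int_0^T f(u) da(u) = int_0^{a(T)} f(c(s)) ds
# for a(t) = t**2 (so c(s) = sqrt(s)), f(u) = cos(u) and T = 2.
u = np.linspace(0.0, 2.0, 100_001)
lhs = (np.cos(u[:-1]) * np.diff(u**2)).sum()      # Stieltjes sum against da
s = np.linspace(0.0, 4.0, 100_001)                # a(T) = 4
rhs = (np.cos(np.sqrt(s[:-1])) * np.diff(s)).sum()  # time-changed Lebesgue integral
```

Both sums approximate the same number, illustrating that da-integration is Lebesgue integration run through the time change c.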
6.2 Processes of finite variation

Recall that a filtered probability space (Ω, F, (Ft), P) satisfying the usual conditions is given.

Definition 6.9 An adapted right-continuous process A = (At : t ≥ 0) is called a finite variation process (or a process of finite variation) if A0 = 0 and t 7→ At is (a function) of finite variation a.s..
Proposition 6.10 Let A be a finite variation process and K a progressively measurable process such that

∫_0^t |Ks(ω)| |dAs(ω)| < ∞  for all t ≥ 0 and all ω ∈ Ω.

Then ((K · A)t : t ≥ 0), defined by (K · A)t(ω) := ∫_0^t Ks(ω) dAs(ω), is a finite variation process.
Proof. The right continuity is immediate from the deterministic theory, but we need to check that (K · A)t is adapted. For this we check that if t > 0 is fixed and h : [0, t] × Ω → R is measurable with respect to B([0, t]) ⊗ Ft, and if

∫_0^t |h(s, ω)| |dAs(ω)| < ∞

for every ω ∈ Ω, then

∫_0^t h(s, ω) dAs(ω)

is Ft-measurable.
Fix t > 0. Consider first h defined by h(s, ω) = 1(u,v](s)1Γ(ω) for (u, v] ⊆ [0, t] and Γ ∈ Ft. Then

(h · A)t = 1Γ (Av − Au)

is Ft-measurable. By the Monotone Class Theorem, (h · A)t is Ft-measurable for any h = 1G with G ∈ B([0, t]) ⊗ Ft, or, more generally, any bounded B([0, t]) ⊗ Ft-measurable function h. If h is a general B([0, t]) ⊗ Ft-measurable function satisfying

∫_0^t |h(s, ω)| |dAs(ω)| < ∞  for all ω ∈ Ω,

then h is a pointwise limit, h = limn→∞ hn, of simple functions with |hn| ≤ |h|. The integrals ∫_0^t hn(s, ω) dAs(ω) converge by the Dominated Convergence Theorem, and hence ∫_0^t h(s, ω) dAs(ω) is also Ft-measurable (as a limit of Ft-measurable functions). In particular, (K · A)t(ω) is Ft-measurable since, by progressive measurability, (s, ω) 7→ Ks(ω) on [0, t] is B([0, t]) ⊗ Ft-measurable. □
It is worth recording that our integral can be obtained through the limiting procedure that one might expect. Let f : [0, T] → R be continuous and let 0 = t^n_0 < t^n_1 < · · · < t^n_{pn} = T be a sequence of partitions of [0, T] with mesh tending to zero. Then

∫_0^T f(s) da(s) = lim_{n→∞} ∑_{i=1}^{pn} f(t^n_{i−1}) (a(t^n_i) − a(t^n_{i−1})).

The proof is easy: let fn : [0, T] → R be defined by fn(s) = f(t^n_{i−1}) if s ∈ (t^n_{i−1}, t^n_i], 1 ≤ i ≤ pn, and fn(0) = 0. Then

∑_{i=1}^{pn} f(t^n_{i−1}) (a(t^n_i) − a(t^n_{i−1})) = ∫_{[0,T]} fn(s) µ(ds),

where µ is the signed measure associated with a. The desired result now follows by the Dominated Convergence Theorem.
In the argument above, fn took the value of f at the left endpoint of each inter-
val. In the finite variation case, we could equally have approximated by fn taking
the value of f at the midpoint of the interval, or the right hand endpoint, or any
other point in between. However, we are going to extend our theory to processes
that do not have finite variation and then, even if f is continuous, it matters whether
fn takes the value of f at the left or right endpoint of the intervals.
The processes that make our theory work are slight generalisations of martin-
gales.
6.3 Continuous local martingales
Definition 6.11 An adapted process (Mt : t ≥ 0) is called a continuous local mar-
tingale if M0 = 0, it has continuous trajectories a.s. and if there exists a non-
decreasing sequence of stopping times (τn )n≥1 such that τn ↑ ∞ a.s. and for each n,
M τn = (Mt∧τn : t ≥ 0) is a (uniformly integrable) martingale. We say (τn ) reduces
M.
More generally, when we do not assume that M0 = 0, we say that M is a con-
tinuous local martingale if Nt = Mt − M0 is a continuous local martingale.
Any martingale is a local martingale, but the converse is false.

Proposition 6.12 i. A non-negative continuous local martingale such that M0 ∈ L1 is a supermartingale.

ii. A continuous local martingale M such that there exists a random variable Z ∈ L1 with |Mt| ≤ Z for every t ≥ 0 is a uniformly integrable martingale.

iii. If M is a continuous local martingale and M0 = 0 (or more generally M0 ∈ L1), the sequence of stopping times

Tn = inf{t ≥ 0 : |Mt| ≥ n}

reduces M.

iv. If M is a continuous local martingale, then for any stopping time ρ, the stopped process M^ρ is also a continuous local martingale.
Proof. (i) Write Mt = M0 +Nt . By definition, there exists a sequence Tn of stopping
times that reduces N. Thus, if s ≤ t, for every n,

Ns∧Tn = E[Nt∧Tn |Fs ].

We can add M0 to both sides (M0 is Fs -measurable and in L1 ) and we find

Ms∧Tn = E[Mt∧Tn |Fs ].

Since M takes non-negative values, let n → ∞ and apply Fatou’s lemma for condi-
tional expectations to find
Ms ≥ E[Mt |Fs ]. (23)
Taking s = 0, E[Mt ] ≤ E[M0 ] < ∞. So Mt ∈ L1 for every t ≥ 0, and (23) says that
M is a supermartingale.
(ii) By the same argument,

Ms∧Tn = E[Mt∧Tn |Fs ].

Since |Mt∧Tn | ≤ Z, this time apply the Dominated Convergence Theorem to see that
Mt∧Tn converges in L1 (to Mt ) and Ms = E[Mt |Fs ].
The other two statements are immediate. □
Theorem 6.13 A continuous local martingale M with M0 = 0 a.s. is a process of finite variation if and only if M is indistinguishable from zero.

Proof. Assume M is a continuous local martingale and of finite variation. Let

τn = inf{t ≥ 0 : ∫_0^t |dMs| ≥ n} = inf{t ≥ 0 : V(M)t ≥ n},

which are stopping times since V(M)t = ∫_0^t |dMs| is continuous and adapted.
Let N = M^{τn}, which is bounded since

|Nt| = |Mt∧τn| ≤ |∫_0^{t∧τn} dMu| ≤ ∫_0^{t∧τn} |dMu| ≤ n,

and hence (Nt) is a martingale.
Let t > 0 and let π = {0 = t0 < t1 < t2 < . . . < tm(π) = t} be a partition of [0, t]. Then

E[Nt²] = ∑_{i=1}^{m(π)} E[N²_{ti} − N²_{ti−1}] = ∑_{i=1}^{m(π)} E[(Nti − Nti−1)²]
 ≤ E[ ( sup_{1≤i≤m(π)} |Nti − Nti−1| ) · ∑_{i=1}^{m(π)} |Nti − Nti−1| ]
 ≤ n E[ sup_{1≤i≤m(π)} |Nti − Nti−1| ] → 0  as δ(π) → 0,

where δ(π) is the mesh of π; in the second inequality we used ∑_i |Nti − Nti−1| ≤ V(N)t = V(M)t∧τn ≤ n, and the convergence holds by the Dominated Convergence Theorem (since |Nti − Nti−1| ≤ V(N)t ≤ n, so n is a dominating function). Hence E[M²_{t∧τn}] = E[Nt²] = 0 for every n.
It then follows by Fatou's Lemma that

E[Mt²] = E[ lim_{n→∞} M²_{t∧τn} ] ≤ lim inf_{n→∞} E[M²_{t∧τn}] = 0,

which implies that Mt = 0 a.s., and so by continuity of paths, P[Mt = 0 ∀t ≥ 0] = 1. □
6.4 Quadratic variation of a continuous local martingale
If our martingales are going to be interesting, then they’re going to have unbounded
variation. But remember that we said that we’d use Brownian motion as a ba-
sic building block, and that while Brownian motion has infinite variation, it has
bounded quadratic variation, defined over [0, T ] by
lim_{δ(πn)→0} ∑_{j=1}^{N(πn)} (Btj − Btj−1)² = T.

We are now going to see that the analogue of this process exists for any continuous
local martingale. Ultimately, we shall see that the quadratic variation is in some
sense a ‘clock’ for a local martingale, but that will be made more precise in the
very last result of the course.
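It is instructive to see this numerically before the general construction; the simulation below (numpy, our own illustration, not part of the notes) approximates Brownian increments over a fine partition of [0, 1] and sums their squares.

```python
import numpy as np

# Quadratic variation of Brownian motion on [0, T]:
# the sum of squared increments over a fine partition is close to T.
rng = np.random.default_rng(1)
T, n = 1.0, 1_000_000
dB = rng.standard_normal(n) * np.sqrt(T / n)  # increments B_{t_j} - B_{t_{j-1}}
qv = (dB**2).sum()                            # should be close to T = 1
```

The sum has mean T and standard deviation of order √(2T²/n), so it concentrates around T as the mesh shrinks.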
Theorem 6.14 Let M be a continuous local martingale. There exists a unique (up to indistinguishability) non-decreasing, continuous, adapted finite variation process (⟨M, M⟩t : t ≥ 0), starting in zero, such that (Mt² − ⟨M, M⟩t : t ≥ 0) is a continuous local martingale.
Furthermore, for any T > 0 and any sequence of partitions πn = {0 = t^n_0 < t^n_1 < . . . < t^n_{n(πn)} = T} with δ(πn) = sup_{1≤i≤n(πn)} (t^n_i − t^n_{i−1}) → 0 as n → ∞,

⟨M, M⟩T = lim_{n→∞} ∑_{i=1}^{n(πn)} (M_{t^n_i} − M_{t^n_{i−1}})², (24)

where the limit is in probability.

The process ⟨M, M⟩ is called the quadratic variation of M, or simply the increasing process of M, and is often denoted ⟨M, M⟩t = ⟨M⟩t.
Proof. [Sketch of the proof (NOT EXAMINABLE)]
Uniqueness is a direct consequence of Theorem 6.13, since if A, A′ are two such processes then (M² − A) − (M² − A′) = A′ − A is a local martingale starting in zero and of finite variation, which implies A = A′ by Theorem 6.13.
The idea of existence is as follows. First suppose that M is bounded. Take a sequence of partitions 0 = t^n_0 < · · · < t^n_{pn} = T with mesh tending to zero. Then check that

X^n_t := ∑_{i=1}^{pn} M_{t^n_{i−1}} (M_{t^n_i ∧ t} − M_{t^n_{i−1} ∧ t})

is a (bounded) martingale. Now observe that

M²_{t^n_j} − 2X^n_{t^n_j} = ∑_{i=1}^{j} (M_{t^n_i} − M_{t^n_{i−1}})².

A direct computation gives

lim_{n,m→∞} E[(X^n_t − X^m_t)²] = 0,

and by Doob's L2-inequality

lim_{n,m→∞} E[ sup_{t≤T} (X^n_t − X^m_t)² ] = 0.

By passing to a subsequence, X^{nk} → Y uniformly on [0, T] almost surely, where (Yt)t≤T is a continuous process which inherits the martingale property from X. The quantity

M²_{t^n_j} − 2X^n_{t^n_j} = ∑_{i=1}^{j} (M_{t^n_i} − M_{t^n_{i−1}})² =: QV^{πn}(M)_{t^n_j}

is non-decreasing along {t^n_j : j ≤ n(πn)}. Letting n → ∞, Mt² − 2Yt is almost surely non-decreasing and we set ⟨M, M⟩t = Mt² − 2Yt.
To move to a general continuous local martingale, we consider a sequence of stopped processes. Details are in, for example, Le Gall's book. □
Our theory of integration is going to be an ‘L2 -theory’. Let us introduce the
martingales with which we are going to work. We are going to think of them as
being defined up to indistinguishability - nothing changes if we change the pro-
cess on a null set. Think of this as analogous to considering Lebesgue integrable
functions as being defined ‘almost everywhere’.
Definition 6.15 Let H̄2 be the space of L2-bounded càdlàg martingales, i.e. ((Ft), P)-martingales M such that sup_{t≥0} E[Mt²] < ∞, and let H2 be the subspace consisting of continuous L2-bounded martingales. Finally, let H2_0 = {M ∈ H2 : M0 = 0 a.s.}. We note that the space H̄2 is also sometimes denoted M2.
It follows from Doob's L2-inequality that

E[ sup_{t≥0} Mt² ] ≤ 4 sup_{t≥0} E[Mt²] < +∞,  M ∈ H̄2.

Consequently, {Mt : t ≥ 0} is bounded by a square integrable random variable (sup_{t≥0} |Mt|) and in particular is uniformly integrable. It follows from the Optional Stopping Theorem that Mt = E[M∞|Ft] for some square integrable random variable M∞.
Conversely, we can start with a random variable Y ∈ L2(Ω, F∞, P) and define a martingale Mt := E[Y|Ft] ∈ H̄2 (with M∞ = Y).
Two L2-bounded martingales M, M′ are indistinguishable if and only if M∞ = M′∞ a.s., and so if we endow H̄2 with the norm

‖M‖H2 := (E[M∞²])^{1/2} = ‖M∞‖_{L2(Ω,F∞,P)},  M ∈ H̄2, (25)

then H̄2 can be identified with the familiar L2(Ω, F∞, P) space.
Theorem 6.16 H2 is a closed subspace of the space of L2-bounded càdlàg martingales.

Proof. This is almost a matter of writing down definitions. Suppose that the sequence M^n ∈ H2 converges in ‖ · ‖H2 to some L2-bounded càdlàg martingale M. By Doob's L2-inequality

E[ sup_{t≥0} |M^n_t − Mt|² ] ≤ 4‖M^n − M‖²_{H2} −→ 0, as n → ∞.

Passing to a subsequence, we have sup_{t≥0} |M^{nk}_t − Mt| → 0 a.s. and hence M has continuous paths a.s., which completes the proof. □
For continuous local martingales, the norm in (25) can be re-expressed in terms
of the quadratic variation:
Theorem 6.17 Let M be a continuous local martingale with M0 ∈ L2.

i. TFAE:

(a) M is a martingale, bounded in L2;
(b) E[⟨M, M⟩∞] < ∞.

Furthermore, if these properties hold, Mt² − ⟨M, M⟩t is a uniformly integrable martingale and, in particular, E[M∞²] = E[M0²] + E[⟨M, M⟩∞].

ii. TFAE:

(a) M is a martingale and Mt ∈ L2 for every t ≥ 0;
(b) E[⟨M, M⟩t] < ∞ for every t ≥ 0.

Furthermore, if these properties hold, Mt² − ⟨M, M⟩t is a martingale.
Proof. The second statement will follow from the first on applying it to Mt∧a for every choice of a ≥ 0.
To prove the first set of equivalences, without loss of generality, suppose that M0 = 0 (or replace M by M − M0).
Suppose that M is a martingale, bounded in L2. Doob's L2-inequality implies that for every T > 0,

E[ sup_{0≤t≤T} Mt² ] ≤ 4E[MT²],

and so, letting T → ∞,

E[ sup_{t≥0} Mt² ] ≤ 4 sup_{t≥0} E[Mt²] = C < ∞.

Let Sn = inf{t ≥ 0 : ⟨M, M⟩t ≥ n}. Then the continuous local martingale M²_{t∧Sn} − ⟨M, M⟩_{t∧Sn} is dominated by sup_{s≥0} Ms² + n, which is integrable. By Proposition 6.12 this continuous local martingale is a uniformly integrable martingale and hence

E[⟨M, M⟩_{t∧Sn}] = E[M²_{t∧Sn}] ≤ E[ sup_{s≥0} Ms² ] ≤ C < ∞.

Let n and then t tend to infinity and use the Monotone Convergence Theorem to obtain E[⟨M, M⟩∞] < ∞.
Conversely, assume that E[⟨M, M⟩∞] < ∞. Set Tn = inf{t ≥ 0 : |Mt| ≥ n}. Then the continuous local martingale M²_{t∧Tn} − ⟨M, M⟩_{t∧Tn} is dominated by n² + ⟨M, M⟩∞, which is integrable. From Proposition 6.12 again, this continuous local martingale is a uniformly integrable martingale and hence for every t ≥ 0,

E[M²_{t∧Tn}] = E[⟨M, M⟩_{t∧Tn}] ≤ E[⟨M, M⟩∞] = C′ < ∞. (26)

Let n → ∞ and use Fatou's lemma to see that E[Mt²] ≤ C′ < ∞, so (Mt)t≥0 is bounded in L2.
We still have to check that (Mt)t≥0 is a martingale. However, (26) shows that (Mt∧Tn)n≥1 is uniformly integrable and so converges both almost surely and in L1 to Mt for every t ≥ 0. Recalling that M^{Tn} is a martingale, L1 convergence allows us to pass to the limit as n → ∞ in the martingale property, so M is a martingale.
Finally, if the two properties hold, then M² − ⟨M, M⟩ is dominated by sup_{t≥0} Mt² + ⟨M, M⟩∞, which is integrable, and so Proposition 6.12 again says that M² − ⟨M, M⟩ is a uniformly integrable martingale. □

Our previous theorem immediately yields that for a martingale M with M0 = 0 we have

‖M‖²_{H2} = E[M∞²] = E[⟨M⟩∞].

We can also deduce a complement to Theorem 6.13.
Corollary 6.18 Let M be a continuous local martingale with M0 = 0. Then the following are equivalent:

i. M is indistinguishable from zero;

ii. ⟨M⟩t = 0 for all t ≥ 0 a.s.;

iii. M is a process of finite variation.

In other words, there is nothing 'in between' finite variation and finite quadratic variation for this class of processes.

Proof. We already know that the first and third statements are equivalent. That the first implies the second is trivial, so we must just show that the second implies the first. We have ⟨M⟩∞ = limt→∞ ⟨M⟩t = 0. From Theorem 6.17, M ∈ H2 and E[M∞²] = E[⟨M⟩∞] = 0, and so Mt = E[M∞|Ft] = 0 almost surely. □
We can see that the quadratic variation of a martingale is telling us something about how its variance increases with time. We also need an analogous quantity for the 'covariance' between two martingales. This is most easily defined through polarisation.

Definition 6.19 The quadratic co-variation between two continuous local martingales M, N is defined by

⟨M, N⟩ := (1/2) (⟨M + N, M + N⟩ − ⟨M, M⟩ − ⟨N, N⟩). (27)

It is often called the (angle) bracket process of M and N.
Proposition 6.20 For two continuous local martingales M, N:

i. the process ⟨M, N⟩ is the unique finite variation process, zero at zero, such that (Mt Nt − ⟨M, N⟩t : t ≥ 0) is a continuous local martingale;

ii. the mapping (M, N) 7→ ⟨M, N⟩ is bilinear and symmetric;

iii. for any stopping time τ,

⟨M^τ, N^τ⟩t = ⟨M^τ, N⟩t = ⟨M, N^τ⟩t = ⟨M, N⟩τ∧t,  t ≥ 0, a.s.; (28)

iv. for any t > 0 and any sequence of partitions πn of [0, t] with mesh converging to zero,

∑_{ti ∈ πn} (Mti+1 − Mti)(Nti+1 − Nti) → ⟨M, N⟩t, (29)

the convergence being in probability.
Proof. (i) (M + N)t² − ⟨M + N, M + N⟩t is a continuous local martingale and, by adding and subtracting terms, it is equal to

(Mt² − ⟨M, M⟩t) + (Nt² − ⟨N, N⟩t) + 2( Mt Nt − (1/2)(⟨M + N, M + N⟩t − ⟨M, M⟩t − ⟨N, N⟩t) );

the first two bracketed terms are continuous local martingales, hence so is the third, which is exactly Mt Nt − ⟨M, N⟩t. Uniqueness follows from Theorem 6.13.
(iv) Note that

(Mt + Nt − Ms − Ns)² − (Mt − Ms)² − (Nt − Ns)² = 2(Mt − Ms)(Nt − Ns).

The asserted convergence then follows from Theorem 6.14.
(ii) Both properties follow from (iv). Symmetry is obvious from the definition in (27).
(iii) Follows from (iv). □
Definition 6.21 Two continuous local martingales M, N are said to be orthogonal if ⟨M, N⟩ = 0.

For example, if B and B′ are independent Brownian motions, then ⟨B, B′⟩ = 0.
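Numerically (our own illustration, not part of the notes), the approximating sums in (29) for two independent Brownian motions are close to zero, while taking M = N = B recovers ⟨B, B⟩t = t.

```python
import numpy as np

# Cross-variation sums for two independent Brownian motions B, B'.
rng = np.random.default_rng(2)
T, n = 1.0, 1_000_000
dB = rng.standard_normal(n) * np.sqrt(T / n)   # increments of B
dBp = rng.standard_normal(n) * np.sqrt(T / n)  # independent increments of B'
cross = (dB * dBp).sum()   # approximates <B, B'>_T = 0
quad = (dB * dB).sum()     # approximates <B, B>_T = T
```

The cross sum has mean zero and standard deviation of order T/√n, so it vanishes as the mesh shrinks.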
Remark 6.22 It follows that if M and N are two martingales bounded in L2 and with M0 N0 = 0 a.s., then (Mt Nt − ⟨M, N⟩t : t ≥ 0) is a uniformly integrable martingale. In particular, for any stopping time τ,

E[Mτ Nτ] = E[⟨M, N⟩τ]. (30)
Take M, N ∈ H2, which we recall has the norm ‖M‖²_{H2} = E[⟨M, M⟩∞] = E[M∞²]. Then we see that this norm is consistent with the inner product on H2 × H2 given by E[⟨M, N⟩∞] = E[M∞ N∞] and, by the usual Cauchy-Schwarz inequality,

|E[⟨M, N⟩∞]| = |E[M∞ N∞]| ≤ √( E[⟨M⟩∞] E[⟨N⟩∞] ).

Actually, it is easy to obtain an almost sure version of this inequality, using that

∑_i |Mti+1 − Mti| |Nti+1 − Nti| ≤ √( ∑_i (Mti+1 − Mti)² ) √( ∑_i (Nti+1 − Nti)² )

and taking limits to deduce that

|⟨M, N⟩t| ≤ √( ⟨M⟩t ⟨N⟩t ).
It's often convenient to have a more general version of this inequality.

Theorem 6.23 (Kunita-Watanabe inequality) Let M, N be continuous local martingales and K, H two measurable processes. Then for all 0 ≤ t ≤ ∞,

∫_0^t |Hs| |Ks| |d⟨M, N⟩s| ≤ ( ∫_0^t Hs² d⟨M⟩s )^{1/2} ( ∫_0^t Ks² d⟨N⟩s )^{1/2}  a.s.. (31)

We omit the proof, which approximates H, K by simple functions and then essentially uses the Cauchy-Schwarz inequality for sums noted above.
6.5 Continuous semimartingales

Definition 6.24 A stochastic process X = (Xt : t ≥ 0) is called a continuous semimartingale if it can be written as

Xt = X0 + Mt + At,  t ≥ 0, (32)

where M is a continuous local martingale, A is a continuous process of finite variation, and M0 = A0 = 0 a.s..
The decomposition is unique (up to indistinguishability). It should be remembered that there is a filtration (Ft) and a probability measure P implicit in our definition.
Proposition 6.25 A continuous semimartingale is of finite quadratic variation and, in the notation above, ⟨X, X⟩ = ⟨M, M⟩.

Proof. Fix t ≥ 0 and consider a sequence of partitions of [0, t], πm = {0 = t0 < t1 < . . . < tnm = t}, with mesh(πm) → 0 as m → ∞. Then

∑_{i=1}^{nm} (Xti − Xti−1)² = ∑_{i=1}^{nm} (Mti − Mti−1)² + ∑_{i=1}^{nm} (Ati − Ati−1)² + 2 ∑_{i=1}^{nm} (Mti − Mti−1)(Ati − Ati−1),

and we label the three sums on the right (i), (ii) and (iii). It follows from the properties of M and A that, as m → ∞,

(i) → ⟨M, M⟩t,
(ii) ≤ sup_{1≤i≤nm} |Ati − Ati−1| · Vt(A) → 0 a.s.,
|(iii)| ≤ sup_{1≤i≤nm} |Mti − Mti−1| · Vt(A) → 0 a.s. □
If X, Y are two continuous semimartingales, we can define their co-variation ⟨X, Y⟩ via the polarisation formula that we used for martingales. If Xt = X0 + Mt + At and Yt = Y0 + Nt + A′t, then ⟨X, Y⟩t = ⟨M, N⟩t.
7 Stochastic Integration
At the beginning of the course we argued that whereas classically differential equa-
tions take the form
dX(t) = a(t, X(t))dt,
in many settings, the dynamics of the physical quantity in which we are interested
may also have a random component and so perhaps takes the form
dXt = a(t, Xt )dt + b(t, Xt )dBt .
We actually understand equations like this in the integral form:

Xt − X0 = ∫_0^t a(s, Xs) ds + ∫_0^t b(s, Xs) dBs.
If a is nice enough, then the first term has a classical interpretation. It is the second
term, or rather a generalisation of it, that we want to make sense of now.
The first approach will be to mimic what we usually do for construction of the
Lebesgue integral, namely work out how to integrate simple functions and then
extend to general functions through passage to the limit. We’ll then provide a very
slick, but not at all intuitive, approach that nonetheless gives us some ‘quick wins’
in proving properties of the integral.
7.1 Stochastic integral w.r.t. L2 -bounded martingales

Remark on Notation: We are going to use the notation ϕ • M for the (Itô) stochastic integral of ϕ with respect to M. This is not universally accepted notation; many authors would write ∫_0^t ϕs dMs for (ϕ • M)t. Moreover, for emphasis, when the integrator is stochastic, we have used '•' in place of the '·' that we used for the Stieltjes integral.
We're going to develop a theory of integration w.r.t. martingales in H2. Recall that H2_0 is the space of continuous martingales M, zero at zero, which are bounded in L2. It is a Hilbert space with the inner product ⟨M, N⟩_{H2} = E[M∞ N∞] and induced norm

‖M‖_{H2} = (E[M∞²])^{1/2} = (E[⟨M⟩∞])^{1/2}.

(In a very real sense we are identifying H2 with L2.)
Define E to be the space of simple bounded processes of the form

ϕt = ∑_{i=0}^{m} ϕ^{(i)} 1_{(ti, ti+1]}(t),  t ≥ 0, (33)

for some m ∈ N, 0 ≤ t0 < t1 < . . . < tm+1, and where the ϕ^{(i)} are bounded Fti-measurable random variables. Define the stochastic integral ϕ • M of ϕ in (33) with respect to M ∈ H2 via

(ϕ • M)t := ∑_{i=0}^{m} ϕ^{(i)} (Mt∧ti+1 − Mt∧ti),  t ≥ 0. (34)
If we write M^i_t := ϕ^{(i)} (Mt∧ti+1 − Mt∧ti), then clearly M^i ∈ H2 and so ϕ • M is a martingale. Moreover, since for i ≠ j the intervals (ti, ti+1] and (tj, tj+1] are disjoint, M^i_t M^j_t is a martingale and hence ⟨M^i, M^j⟩t = 0. Using the bilinearity of the bracket process then yields

⟨ϕ • M⟩t = ∑_{i=0}^{m} ⟨M^i⟩t = ∑_{i=0}^{m} (ϕ^{(i)})² (⟨M⟩ti+1∧t − ⟨M⟩ti∧t) = ∫_0^t ϕs² d⟨M⟩s,  t ≥ 0. (35)

We already used the notation that if K is progressively measurable and A is of finite variation, then

(K · A)t = ∫_0^t Ks(ω) dAs(ω),  t ≥ 0.

In that notation

⟨ϕ • M⟩ = ϕ² · ⟨M⟩.

More generally, for N ∈ H2,

⟨ϕ • M, N⟩t = ∑_{i=0}^{m} ⟨M^i, N⟩t = ∑_{i=0}^{m} ϕ^{(i)} (⟨M, N⟩ti+1∧t − ⟨M, N⟩ti∧t) = ∫_0^t ϕs d⟨M, N⟩s = (ϕ · ⟨M, N⟩)t. (36)
Proposition 7.1 Let M ∈ H2. The mapping ϕ 7→ ϕ • M is a linear map from E to H2_0. Moreover,

‖ϕ • M‖²_{H2} = E[ ∫_0^∞ ϕt² d⟨M⟩t ]. (37)

The proof is easy: we just need to show linearity. But given ϕ, ψ ∈ E, we use a common refinement of the partitions on which they are constant to write them as simple processes with respect to the same partition, and the result is trivial.
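The isometry (37) can be checked by simulation for M = B a Brownian motion (so ⟨B⟩t = t) and a simple integrand. The particular choice ϕ^{(i)} = ±1 according to the sign of Bti, and the numpy implementation, are our own illustration.

```python
import numpy as np

# Elementary stochastic integral (phi . B)_T for phi^{(i)} = +-1 according to
# the sign of B_{t_i} (a bounded F_{t_i}-measurable choice), with t_i = i/4.
# Since phi^2 = 1 and <B>_t = t, the isometry gives E[(phi . B)_T^2] = T = 1,
# and the martingale property gives E[(phi . B)_T] = 0.
rng = np.random.default_rng(3)
n_paths, m, T = 100_000, 4, 1.0
dB = rng.standard_normal((n_paths, m)) * np.sqrt(T / m)  # increments over (t_i, t_{i+1}]
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])  # B at left endpoints t_i
phi = np.where(B_left >= 0, 1.0, -1.0)                   # previsible simple integrand
integral = (phi * dB).sum(axis=1)                        # (phi . B)_T, path by path
mean, second_moment = integral.mean(), (integral**2).mean()
```

The previsibility of ϕ (it looks only at the left endpoint) is exactly what makes both identities hold.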

Remark 7.2 If we were considering left continuous martingales, then it would be


important that the processes in E are left continuous.
We are expecting an L2-theory; we have already found an expression for the 'L2-norm' of ϕ • M. Let us define the appropriate spaces more carefully.

Definition 7.3 Given M ∈ H2 we denote by L2(M) the space of progressively measurable processes K such that

‖K‖²_{L2(M)} := E[ ∫_0^∞ Kt² d⟨M⟩t ] < +∞. (38)

L2(M) is a Hilbert space, with inner product

(H, K) 7→ E[ ∫_0^∞ Ht Kt d⟨M⟩t ] = E[(HK · ⟨M⟩)∞].
We have E ⊆ L2(M) and (35) tells us that the map E → H2_0 given by ϕ 7→ ϕ • M is a linear isometry. If we can show that the elementary processes are dense in L2(M), this observation will allow us to define integrals of processes from L2(M) with respect to M via a limiting procedure.

Proposition 7.4 Let M ∈ H2. Then E is a dense vector subspace of L2(M).

Proof. It is enough to show that if K ∈ L2(M) is orthogonal to ϕ for all ϕ ∈ E, then K = 0 (as an element of L2(M)). So suppose that ⟨K, ϕ⟩_{L2(M)} = 0 for all ϕ ∈ E. Let X = K · ⟨M⟩, i.e. Xt = ∫_0^t Ku d⟨M⟩u. This is well defined and, by Cauchy-Schwarz,

E[|Xt|] ≤ E[ ∫_0^t |Ku| d⟨M⟩u ] ≤ √( E[ ∫_0^t Ku² d⟨M⟩u ] E[⟨M⟩t] ) < +∞

since M ∈ H2 and K ∈ L2(M) (we took one of the functions to be identically one in Cauchy-Schwarz).
Taking ϕ = ξ 1_{(s,t]} ∈ E, with 0 ≤ s < t and ξ a bounded Fs-measurable random variable, we have

0 = ⟨K, ϕ⟩_{L2(M)} = E[ ξ ∫_s^t Ku d⟨M⟩u ] = E[ξ (Xt − Xs)].

Since this holds for any Fs-measurable bounded ξ, we conclude that E[(Xt − Xs)|Fs] = 0. In other words, X is a martingale. But X is also continuous and of finite variation and hence X ≡ 0 a.s. Thus K = 0 d⟨M⟩-a.e., a.s., and hence K = 0 in L2(M). □
We now know that any K ∈ L2 (M) is a limit of simple processes ϕ n → K. For
each ϕ n we can define the stochastic integral ϕ n • M. The isometry property then
shows that ϕ n • M converge in H2 to some element that we denote K • M and which
does not depend on the choice of approximating sequence ϕ n .
Theorem 7.5 Let M ∈ H2. The mapping ϕ 7→ ϕ • M from E to H2_0 defined in (34) has a unique extension to a linear isometry from L2(M) to H2_0, which we denote K 7→ K • M.
Remark 7.6 For K ∈ L2(M), the martingale K • M is called the Itô stochastic integral of K with respect to M and is often written as (K • M)t = ∫_0^t Ku dMu. The isometry property may then be written as

‖K • M‖²_{H2} = E[ ( ∫_0^∞ Kt dMt )² ] = E[ ∫_0^∞ Kt² d⟨M⟩t ] = ‖K‖²_{L2(M)}. (39)

Notice that if B is standard Brownian motion and we calculate (B • B)t, then

(B • B)t = lim_{δ(π)→0} ∑_{j=0}^{N(π)−1} Btj (Btj+1 − Btj). (40)
We also know already that the quadratic variation is

t = lim_{δ(π)→0} ∑_{j=0}^{N(π)−1} (Btj+1 − Btj)² = Bt² − B0² − 2 lim_{δ(π)→0} ∑_{j=0}^{N(π)−1} Btj (Btj+1 − Btj),

and so rearranging we find

∫_0^t Bs dBs = (1/2)(Bt² − B0² − t) = (1/2)(Bt² − t).

This is not what one would have predicted from classical integration theory (the extra term here comes from the quadratic variation).
Even more strangely, it matters that in (40) we took the left endpoint of the interval for evaluating the integrand. On the problem sheet, you are asked to evaluate

lim_{δ(π)→0} ∑_j Btj+1 (Btj+1 − Btj),  and  lim_{δ(π)→0} ∑_j ((Btj + Btj+1)/2)(Btj+1 − Btj).

Each gives a different answer.


We can more generally define
Z T  f (Bt ) + f (Bt )  
j j+1
f (Bs ) ◦ dBs = lim ∑ Bt j+1 − Btj .
0 δ (π)→0 2

This so-called Stratonovich integral has the advantage that from the point of view
of calculations, the rules of Newtonian calculus hold true. From a modelling per-
spective however, it can be the wrong choice. For example, suppose that we are
modelling the change in a population size over time and we use [ti ,ti+1 ) to repre-
sent the (i + 1)st generation. The change over (ti ,ti+1 ) will be driven by the number
of adults, so the population size at the beginning of the interval.
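The difference between the evaluation points is easy to see numerically. The following sketch (an illustration, not part of the notes; it assumes NumPy is available) simulates one Brownian path on a fine grid and compares the left-endpoint (Itô), right-endpoint and midpoint (Stratonovich) sums for the integral of B against itself on [0, 1].

```python
import numpy as np

# Simulate one Brownian path on a fine grid of [0, 1].
rng = np.random.default_rng(0)
n, t = 200_000, 1.0
dt = t / n
b = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
db = np.diff(b)

left = np.sum(b[:-1] * db)                 # Itô sum (left endpoints)
right = np.sum(b[1:] * db)                 # right-endpoint sum
mid = np.sum(0.5 * (b[:-1] + b[1:]) * db)  # Stratonovich (midpoint) sum

print(left, (b[-1] ** 2 - t) / 2)   # Itô limit:          (B_1^2 - 1)/2
print(right, (b[-1] ** 2 + t) / 2)  # right-endpoint:     (B_1^2 + 1)/2
print(mid, b[-1] ** 2 / 2)          # Stratonovich limit:  B_1^2 / 2
```

The left and right sums differ by the accumulated quadratic variation, approximately t, while the midpoint sum telescopes to B₁²/2 exactly, mirroring the Newtonian chain rule.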

7.2 Intrinsic characterisation of stochastic integrals using the quadratic covariation

We can also characterise the Itô integral in a slightly different way.

Theorem 7.7 Let M ∈ H². For any K ∈ L²(M) there exists a unique element in H²₀, denoted K • M, such that
\[
\langle K \bullet M, N\rangle = K \cdot \langle M, N\rangle, \qquad \forall N \in H^2. \tag{41}
\]
Furthermore, ‖K • M‖_{H²} = ‖K‖_{L²(M)} and the map
\[
L^2(M) \ni K \longmapsto K \bullet M \in H^2_0
\]
is a linear isometry.

Proof. We first check uniqueness. Suppose that there are two such elements, X and X′. Then
\[
\langle X, N\rangle - \langle X', N\rangle = \langle X - X', N\rangle \equiv 0, \qquad \forall N \in H^2.
\]
Taking N = X − X′ we conclude, by Corollary 6.18, that X = X′.
Now let us verify (41) for the Itô integral. Fix N ∈ H². First note that for K ∈ L²(M) the Kunita-Watanabe inequality shows that
\[
E\Bigl[\int_0^\infty |K_s|\, |d\langle M, N\rangle_s|\Bigr] \le \|K\|_{L^2(M)} \|N\|_{H^2} < \infty,
\]
and thus the variable
\[
\int_0^\infty K_s\, d\langle M, N\rangle_s = \bigl(K \cdot \langle M, N\rangle\bigr)_\infty
\]
is well defined and in L¹.


If K is an elementary process, in the notation of (34) and (35),
\[
\langle K \bullet M, N\rangle = \sum_{i=0}^{m} \langle M^i, N\rangle
\]
and
\[
\langle M^i, N\rangle_t = \varphi^{(i)}\bigl(\langle M, N\rangle_{t_{i+1}\wedge t} - \langle M, N\rangle_{t_i\wedge t}\bigr),
\]
so
\[
\langle K \bullet M, N\rangle_t = \sum_i \varphi^{(i)}\bigl(\langle M, N\rangle_{t_{i+1}\wedge t} - \langle M, N\rangle_{t_i\wedge t}\bigr) = \int_0^t K_s\, d\langle M, N\rangle_s.
\]

Now observe that the mapping X ↦ ⟨X, N⟩_∞ is continuous from H² into L¹. Indeed, by Kunita-Watanabe
\[
E\bigl[|\langle X, N\rangle_\infty|\bigr] \le E\bigl[\langle X, X\rangle_\infty\bigr]^{1/2} E\bigl[\langle N, N\rangle_\infty\bigr]^{1/2} = \|N\|_{H^2} \|X\|_{H^2}.
\]
So if Kⁿ is a sequence in E such that Kⁿ → K in L²(M),
\[
\langle K \bullet M, N\rangle_\infty = \lim_{n\to\infty} \langle K^n \bullet M, N\rangle_\infty = \lim_{n\to\infty} \bigl(K^n \cdot \langle M, N\rangle\bigr)_\infty = \bigl(K \cdot \langle M, N\rangle\bigr)_\infty,
\]
where the convergence is in L¹ and the last equality is again a consequence of Kunita-Watanabe, by writing
\[
E\Bigl[\Bigl|\int_0^\infty \bigl(K^n_s - K_s\bigr)\, d\langle M, N\rangle_s\Bigr|\Bigr] \le E\bigl[\langle N, N\rangle_\infty\bigr]^{1/2}\, \|K^n - K\|_{L^2(M)}.
\]

We have thus obtained
\[
\langle K \bullet M, N\rangle_\infty = \bigl(K \cdot \langle M, N\rangle\bigr)_\infty,
\]
but replacing N by the stopped martingale N^t in this identity also gives
\[
\langle K \bullet M, N\rangle_t = \bigl(K \cdot \langle M, N\rangle\bigr)_t,
\]
which completes the proof of (41). □
We could write the relationship (41) as
\[
\Bigl\langle \int_0^{\cdot} K_s\, dM_s,\; N\Bigr\rangle_t = \int_0^t K_s\, d\langle M, N\rangle_s;
\]
that is, the stochastic integral 'commutes' with the bracket. One important consequence is that if M ∈ H² and K ∈ L²(M), then applying (41) twice gives
\[
\langle K \bullet M, K \bullet M\rangle = K \cdot \bigl(K \cdot \langle M, M\rangle\bigr) = K^2 \cdot \langle M, M\rangle.
\]
In other words, the bracket process of ∫ K_s dM_s is ∫ K_s² d⟨M, M⟩_s. More generally, for N another martingale and H ∈ L²(N),
\[
\Bigl\langle \int_0^{\cdot} H_s\, dN_s,\; \int_0^{\cdot} K_s\, dM_s\Bigr\rangle_t = \int_0^t H_s K_s\, d\langle M, N\rangle_s.
\]

Proposition 7.8 (Associativity of stochastic integration) Let H ∈ L²(M). If K is progressive, then KH ∈ L²(M) if and only if K ∈ L²(H • M). In that case,
\[
(KH) \bullet M = K \bullet (H \bullet M).
\]
(This is the analogue of what we already know for finite variation processes, where K · (H · A) = (KH) · A.)

Proof.
\[
E\Bigl[\int_0^\infty K_s^2 H_s^2\, d\langle M, M\rangle_s\Bigr] = E\Bigl[\int_0^\infty K_s^2\, d\langle H \bullet M, H \bullet M\rangle_s\Bigr],
\]
which gives the first assertion. For the second, for N ∈ H² we write
\[
\langle (KH) \bullet M, N\rangle = KH \cdot \langle M, N\rangle = K \cdot \bigl(H \cdot \langle M, N\rangle\bigr) = K \cdot \langle H \bullet M, N\rangle = \langle K \bullet (H \bullet M), N\rangle,
\]
and by uniqueness in (41) this implies (KH) • M = K • (H • M). □
Recall that if M ∈ H² and τ is a stopping time, then M^τ = (M_{t∧τ}, t ≥ 0) denotes the stopped process, which is itself a martingale, and clearly M^τ ∈ H². For any N ∈ H² we have
\[
\langle M^\tau, N\rangle = \langle M, N\rangle^\tau = \mathbf{1}_{[0,\tau]} \cdot \langle M, N\rangle = \langle \mathbf{1}_{[0,\tau]} \bullet M, N\rangle,
\]
so by uniqueness in Theorem 7.7, 1_{[0,τ]} • M = M^τ. In fact a much more general property holds true.

Proposition 7.9 (Stopped stochastic integrals) Let M ∈ H², K ∈ L²(M) and τ a stopping time. Then
\[
(K \bullet M)^\tau = K \bullet M^\tau = K \mathbf{1}_{[0,\tau]} \bullet M.
\]

Proof. We already argued above that the result holds for K ≡ 1. Associativity says
\[
K \bullet M^\tau = K \bullet \bigl(\mathbf{1}_{[0,\tau]} \bullet M\bigr) = K \mathbf{1}_{[0,\tau]} \bullet M.
\]
Applying the same result to the martingale K • M we obtain
\[
(K \bullet M)^\tau = \mathbf{1}_{[0,\tau]} \bullet (K \bullet M) = \mathbf{1}_{[0,\tau]} K \bullet M,
\]
which gives the desired equalities. □

7.3 Extensions: stochastic integration with respect to continuous semimartingales
Definition 7.10 For a continuous local martingale M, denote by L²_loc(M) the space of progressively measurable processes K such that
\[
\int_0^t K_s^2\, d\langle M\rangle_s < +\infty \quad \text{a.s.}, \qquad \forall t \ge 0.
\]

Theorem 7.11 Let M be a continuous local martingale. For any K ∈ L²_loc(M) there exists a unique continuous local martingale, vanishing at zero, denoted K • M and called the Itô integral of K with respect to M, such that for any continuous local martingale N
\[
\langle K \bullet M, N\rangle = K \cdot \langle M, N\rangle. \tag{42}
\]
If M ∈ H² and K ∈ L²(M) then this definition coincides with the previous one.

Proof. We only sketch the proof. Not surprisingly, we use a stopping argument. For every n ≥ 1, set
\[
\tau_n = \inf\Bigl\{t \ge 0 : \int_0^t (1 + K_s^2)\, d\langle M\rangle_s \ge n\Bigr\},
\]
so that (τ_n) is a sequence of stopping times that increases to infinity. Since ⟨M^{τ_n}⟩_∞ = ⟨M⟩_{τ_n} ≤ n, the stopped martingale M^{τ_n} is in H². Also
\[
\int_0^\infty K_s^2\, d\langle M^{\tau_n}, M^{\tau_n}\rangle_s = \int_0^{\tau_n} K_s^2\, d\langle M, M\rangle_s \le n,
\]
so that K ∈ L²(M^{τ_n}) and the definition of K • M^{τ_n} makes sense. If m > n,
\[
K \bullet M^{\tau_n} = (K \bullet M^{\tau_m})^{\tau_n},
\]
so there is a unique process, which we denote K • M, such that
\[
(K \bullet M)^{\tau_n} = K \bullet M^{\tau_n}
\]
and (K • M)_t = lim_{n→∞}(K • M^{τ_n})_t. Since each K • M^{τ_n} is a martingale, the process K • M is a continuous local martingale with reducing sequence (τ_n).
If N is a continuous local martingale (without loss of generality N₀ = 0), we consider a reducing sequence
\[
\tilde\tau_n = \inf\{t \ge 0 : |N_t| \ge n\} \quad\text{and set}\quad \rho_n := \tau_n \wedge \tilde\tau_n.
\]
Then N^{ρ_n} ∈ H²₀ and hence, using (28), the fact that τ_n ≥ ρ_n, and Theorem 7.7,
\[
\langle K \bullet M, N\rangle^{\rho_n} = \langle (K \bullet M)^{\rho_n}, N^{\rho_n}\rangle = \langle (K \bullet M^{\tau_n})^{\rho_n}, N^{\rho_n}\rangle = \langle K \bullet M^{\tau_n}, N^{\rho_n}\rangle
= K \cdot \langle M^{\tau_n}, N^{\rho_n}\rangle = K \cdot \langle M, N\rangle^{\rho_n} = \bigl(K \cdot \langle M, N\rangle\bigr)^{\rho_n},
\]
so that ⟨K • M, N⟩ = K · ⟨M, N⟩ as required. Uniqueness of K • M follows as in Theorem 7.7. □
Naturally, we’re going to define an integral with respect to a continuous semi-
martingale X = X0 + M + A as a sum of integrals w.r.t. M and w.r.t. A.

Definition 7.12 We say that a progressively measurable process K is locally bounded if, almost surely,
\[
\sup_{u \le t} |K_u| < +\infty \qquad \forall t \ge 0.
\]
In particular, any adapted process with continuous sample paths is a locally bounded progressively measurable process.
If K is progressively measurable and locally bounded, then for any finite variation process A, almost surely,
\[
\int_0^t |K_s|\, |dA_s| < \infty \qquad \forall t \ge 0,
\]
and, similarly, K ∈ L²_loc(M) for every continuous local martingale M.

Definition 7.13 Let X = X₀ + M + A be a continuous semimartingale and K a locally bounded process. The Itô stochastic integral of K with respect to X is the continuous semimartingale K • X defined by
\[
K \bullet X := K \bullet M + K \cdot A,
\]
often written
\[
(K \bullet X)_t = \int_0^t K_s\, dX_s = \int_0^t K_s\, dM_s + \int_0^t K_s\, dA_s.
\]

This integral inherits all the nice properties of the Stieltjes integral and the Itô
integral that we have already derived (linearity, associativity, stopping etc.).
And of course, it is still the case for an elementary function ϕ ∈ E that
\[
(\varphi \bullet X)_t = \sum_{i=1}^{m} \varphi^{(i)}\bigl(X_{t_{i+1}\wedge t} - X_{t_i\wedge t}\bigr).
\]

We should also like to know how our integral behaves under limits.

Proposition 7.14 (Stochastic Dominated Convergence Theorem) Let X be a continuous semimartingale and (Kⁿ) a sequence of locally bounded processes with K_tⁿ → 0 as n → ∞ a.s. for all t. Further suppose that |K_tⁿ| ≤ K_t for all n, where K is a locally bounded process. Then Kⁿ • X converges to zero in probability and, more precisely, for all t ≥ 0,
\[
\sup_{s \le t}\Bigl|\int_0^s K^n_u\, dX_u\Bigr| \longrightarrow 0 \quad\text{in probability as } n \to \infty.
\]

Proof. We can treat the finite variation part, X₀ + A, and the local martingale part, M, separately. For the first, note that
\[
\Bigl|\int_0^t K^n_u\, dA_u\Bigr| = \Bigl|\int_0^t K^n_u\, dA^+_u - \int_0^t K^n_u\, dA^-_u\Bigr| \le \int_0^t |K^n_u|\, dA^+_u + \int_0^t |K^n_u|\, dA^-_u = \int_0^t |K^n_u|\, |dA_u|.
\]
The a.s. pointwise convergence of Kⁿ to 0, together with the bound |Kⁿ| ≤ K, allows us to apply the (usual) Dominated Convergence Theorem to conclude that, for any t > 0, ∫_0^t |K^n_u||dA_u| converges to 0 a.s. (in fact, as ∫_0^t |K^n_u||dA_u| is non-decreasing in t, the convergence is uniform on any compact interval).
For the continuous local martingale part M, let (τ_m) be a reducing sequence such that M^{τ_m} ∈ H²₀ and K ∈ L²(M^{τ_m}). Then, by the Itô isometry,
\[
\|K^n \bullet M^{\tau_m}\|_{H^2}^2 = E\Bigl[\Bigl(\int_0^{\tau_m} K^n_t\, dM_t\Bigr)^2\Bigr] = E\Bigl[\int_0^\infty (K^n_t)^2 \mathbf{1}_{[0,\tau_m]}(t)\, d\langle M\rangle_t\Bigr] = \|K^n\|_{L^2(M^{\tau_m})}^2.
\]
The right-hand side tends to zero by the usual Dominated Convergence Theorem. For a fixed t ≥ 0, and any given ε > 0, we may take m large enough that P[τ_m ≤ t] ≤ ε/2. We then have
\[
P\Bigl[\sup_{s \le t} |(K^n \bullet M)_s| > \varepsilon\Bigr] \le P\Bigl[\sup_{s \le t \wedge \tau_m} |(K^n \bullet M)_s| > \varepsilon\Bigr] + \varepsilon/2 \le \frac{1}{\varepsilon^2}\|K^n \bullet M^{\tau_m}\|_{H^2}^2 + \varepsilon/2 \le \varepsilon,
\]
for n large enough. □
From this we can also confirm that even in their most general form our stochas-
tic integrals can be thought of as limits of integrals of simple functions.

Proposition 7.15 Let X be a continuous semimartingale and K a left-continuous locally bounded process. If πⁿ is a sequence of partitions of [0, t] with mesh converging to zero then
\[
\sum_{t_i \in \pi^n} K_{t_i}\bigl(X_{t_{i+1}} - X_{t_i}\bigr) \longrightarrow \int_0^t K_s\, dX_s \quad\text{in probability as } n \to \infty.
\]
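As a concrete illustration of this convergence (a sketch, not part of the notes; NumPy assumed), one can fix a single Brownian path, take the left-continuous integrand K_s = cos(B_s), and compute left-endpoint Riemann sums on coarser and coarser sub-partitions of the fine grid; the sums stabilise as the mesh shrinks.

```python
import numpy as np

# One Brownian path on a fine grid of [0, 1].
rng = np.random.default_rng(7)
n, t = 2 ** 18, 1.0
dt = t / n
b = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

def riemann_sum(step):
    """Left-endpoint sum of cos(B) dB on the sub-grid with mesh step * dt."""
    bs = b[::step]
    return np.sum(np.cos(bs[:-1]) * np.diff(bs))

# Successive refinements of the partition: the sums settle towards a common
# value, the Itô integral of cos(B) with respect to B over [0, 1].
print([riemann_sum(2 ** k) for k in (8, 6, 4, 2, 0)])
```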

7.4 Itô’s formula and its applications


We already saw that the stochastic integral of Brownian motion with respect to
itself did not behave as we would expect from Newtonian calculus. So what are
the analogues of integration by parts and the chain rule for stochastic integrals?

Proposition 7.16 (Integration by parts) If X and Y are two continuous semimartingales then, a.s. for all t ≥ 0,
\[
X_t Y_t = X_0 Y_0 + \int_0^t X_s\, dY_s + \int_0^t Y_s\, dX_s + \langle X, Y\rangle_t \tag{43}
\]
\[
\phantom{X_t Y_t} = X_0 Y_0 + (X \bullet Y)_t + (Y \bullet X)_t + \langle X, Y\rangle_t.
\]

Proof. Fix t and let πⁿ be a sequence of partitions of [0, t] with mesh converging to zero. Note that
\[
X_t Y_t - X_s Y_s = X_s(Y_t - Y_s) + Y_s(X_t - X_s) + (X_t - X_s)(Y_t - Y_s),
\]
so for any n
\[
X_t Y_t - X_0 Y_0 = \sum_{t_i \in \pi^n} \Bigl[ X_{t_i}(Y_{t_{i+1}} - Y_{t_i}) + Y_{t_i}(X_{t_{i+1}} - X_{t_i}) + (X_{t_{i+1}} - X_{t_i})(Y_{t_{i+1}} - Y_{t_i}) \Bigr]
\longrightarrow (X \bullet Y)_t + (Y \bullet X)_t + \langle X, Y\rangle_t \quad\text{as } n \to \infty. \qquad \Box
\]

Theorem 7.17 (Itô's formula) Let X¹, ..., X^d be continuous semimartingales and F : ℝ^d → ℝ a C² function. Then (F(X_t¹, ..., X_t^d) : t ≥ 0) is a continuous semimartingale and a.s. for all t ≥ 0
\[
F(X^1_t, \dots, X^d_t) = F(X^1_0, \dots, X^d_0) + \sum_{i=1}^{d} \int_0^t \frac{\partial F}{\partial x_i}(X^1_s, \dots, X^d_s)\, dX^i_s + \frac{1}{2} \sum_{1 \le i,j \le d} \int_0^t \frac{\partial^2 F}{\partial x_i \partial x_j}(X^1_s, \dots, X^d_s)\, d\langle X^i, X^j\rangle_s. \tag{44}
\]
In particular, for d = 1, we have
\[
F(X_t) = F(X_0) + \int_0^t F'(X_s)\, dX_s + \frac{1}{2} \int_0^t F''(X_s)\, d\langle X\rangle_s.
\]
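As a numerical sanity check of the d = 1 formula (an illustration, not in the notes; NumPy assumed), take X = B a standard Brownian motion, so d⟨X⟩_s = ds, and F(x) = sin(x); both sides of the formula can then be computed along a simulated path.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 100_000, 1.0
dt = t / n
b = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
db = np.diff(b)

# Right-hand side of Itô's formula for F = sin: the stochastic integral of
# F'(B) = cos(B) plus the quadratic-variation correction with F'' = -sin.
stochastic_term = np.sum(np.cos(b[:-1]) * db)
correction_term = 0.5 * np.sum(-np.sin(b[:-1])) * dt

lhs = np.sin(b[-1]) - np.sin(0.0)
print(lhs, stochastic_term + correction_term)  # agree up to discretisation error
```

Omitting the second-order term gives a visibly wrong answer; that discrepancy is exactly the failure of the Newtonian chain rule discussed earlier.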

Proof. Let X^i = X^i_0 + M^i + A^i be the semimartingale decomposition of X^i and denote by V^i the total variation process of A^i. Let
\[
\tau^i_r = \inf\{t \ge 0 : |X^i_t| + V^i_t + \langle M^i\rangle_t > r\},
\]
and τ_r = min{τ^i_r : i = 1, ..., d}. Then (τ_r)_{r≥0} is a family of stopping times with τ_r ↑ ∞. It is sufficient to prove (44) up to time τ_r. We will prove that the result holds for polynomials and then the full result follows by approximating C² functions by polynomials.
First note that it is obvious that the set of functions for which the formula holds
is a vector space containing the functions F ≡ 1 and F(x1 , . . . , xd ) = xi for i ≤ d.
We now check that if (44) holds for two functions F and G, then it holds for the product FG. Writing F_t = F(X_t) and G_t = G(X_t), integration by parts yields
\[
F_t G_t - F_0 G_0 = \int_0^t F_s\, dG_s + \int_0^t G_s\, dF_s + \langle F, G\rangle_t. \tag{45}
\]
By associativity of stochastic integration, and because (44) holds for G,
\[
\int_0^t F_s\, dG_s = \sum_{i=1}^{d} \int_0^t F(X_s) \frac{\partial G}{\partial x_i}(X_s)\, dX^i_s + \frac{1}{2} \sum_{1 \le i,j \le d} \int_0^t F(X_s) \frac{\partial^2 G}{\partial x_i \partial x_j}(X_s)\, d\langle X^i, X^j\rangle_s,
\]
with a similar expression for ∫_0^t G_s dF_s. Using the fact that (44) holds for F and G, we also have
\[
\langle F, G\rangle_t = \sum_{i=1}^{d} \sum_{j=1}^{d} \int_0^t \frac{\partial F}{\partial x_i}(X_s) \frac{\partial G}{\partial x_j}(X_s)\, d\langle X^i, X^j\rangle_s.
\]
Substituting these into (45), we obtain Itô's formula for FG.
To pass to limits of sequences of polynomials, use the stochastic Dominated Convergence Theorem (and the fact that everything is nicely bounded up to time τ_r). □
As a first application of this, suppose that M is a continuous local martingale and A is a process of finite variation. Then ⟨M, A⟩ ≡ 0 and applying Itô's formula with X¹ = M and X² = A yields
\[
F(M_t, A_t) = F(M_0, A_0) + \int_0^t \frac{\partial F}{\partial m}(M_s, A_s)\, dM_s + \int_0^t \frac{\partial F}{\partial a}(M_s, A_s)\, dA_s + \frac{1}{2} \int_0^t \frac{\partial^2 F}{\partial m^2}(M_s, A_s)\, d\langle M\rangle_s.
\]
Note that this gives us the semimartingale decomposition of F(M_t, A_t) and we can, for example, read off the conditions on F under which we recover a local martingale. In particular, taking F(x, y) = exp(λx − (λ²/2)y) with X¹ = M and X² = ⟨M, M⟩, we obtain:

Proposition 7.18 Let M be a continuous local martingale and λ ∈ ℝ. Then
\[
\mathcal{E}^\lambda(M)_t := \exp\Bigl(\lambda M_t - \frac{\lambda^2}{2}\langle M\rangle_t\Bigr), \qquad t \ge 0, \tag{46}
\]
is a continuous local martingale. In fact the same holds true for any λ ∈ ℂ, with the real and imaginary parts being local martingales.

Proof. Computing the partial derivatives and simplifying gives
\[
\mathcal{E}^\lambda(M)_t = \mathcal{E}^\lambda(M)_0 + \int_0^t \frac{\partial F^\lambda}{\partial x}(M_s, \langle M\rangle_s)\, dM_s.
\]
Note that ∂F/∂x(x, y) = λF(x, y), so we could have written this as
\[
\mathcal{E}^\lambda(M)_t = \mathcal{E}^\lambda(M)_0 + \lambda \int_0^t \mathcal{E}^\lambda(M)_s\, dM_s,
\]
or in 'differential form' as
\[
d\mathcal{E}^\lambda(M)_t = \lambda\, \mathcal{E}^\lambda(M)_t\, dM_t,
\]
which shows that E^λ(M) solves the stochastic exponential differential equation driven by M: dY_t = λY_t dM_t. □
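A quick Monte Carlo check (an illustration, not part of the notes; NumPy assumed): for M = B a standard Brownian motion, ⟨B⟩_t = t and E^λ(B)_t = exp(λB_t − λ²t/2) is a true martingale, so its expectation should stay equal to E^λ(B)_0 = 1 for every t.

```python
import numpy as np

# B_t ~ N(0, t), so samples of the exponential martingale at time t can be
# drawn directly, without simulating whole paths.
rng = np.random.default_rng(2)
lam, t, n_paths = 0.7, 2.0, 1_000_000
b_t = rng.normal(0.0, np.sqrt(t), n_paths)
samples = np.exp(lam * b_t - 0.5 * lam ** 2 * t)
print(samples.mean())  # ≈ 1
```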
Here is a beautiful application of exponential martingales:

Theorem 7.19 (Lévy's characterisation of Brownian motion) Let M be a continuous local martingale starting at zero. Then M is a standard Brownian motion if and only if ⟨M⟩_t = t a.s. for all t ≥ 0.

Proof. We know that the quadratic variation of a Brownian motion B is given by ⟨B⟩_t = t.
Suppose M is a continuous local martingale starting at zero with ⟨M⟩_t = t a.s. for all t ≥ 0. Then, by Proposition 7.18,
\[
\exp\Bigl(i\xi M_t + \frac{\xi^2}{2} t\Bigr), \qquad t \ge 0,
\]
is a local martingale for any ξ ∈ ℝ and, since it is bounded on compact time intervals, it is a martingale. Let 0 ≤ s < t. We have
\[
E\Bigl[\exp\Bigl(i\xi M_t + \frac{\xi^2}{2} t\Bigr) \Bigm| \mathcal{F}_s\Bigr] = \exp\Bigl(i\xi M_s + \frac{\xi^2}{2} s\Bigr),
\]
which we can rewrite as
\[
E\bigl[e^{i\xi(M_t - M_s)} \bigm| \mathcal{F}_s\bigr] = e^{-\frac{\xi^2}{2}(t-s)}. \tag{47}
\]
In other words, M_t − M_s is centred Gaussian with variance t − s.
It follows also from (47) that for A ∈ F_s,
\[
E\bigl[\mathbf{1}_A\, e^{i\xi(M_t - M_s)}\bigr] = P[A]\, E\bigl[e^{i\xi(M_t - M_s)}\bigr],
\]
so fixing A ∈ F_s with P[A] > 0 and writing P_A = P[· ∩ A]/P[A] (which is a probability measure on F_s) for the conditional probability given A, we have that M_t − M_s has the same distribution under P as under P_A. Hence M_t − M_s is independent of F_s, and M is an (F_t)-Brownian motion. □
So the quadratic variation is capturing all the information about M. This is surprising: recall that it is a special property of Gaussians that they are characterised by their means and the variance-covariance matrix, but in general we need to know much more. It turns out that what we just saw for Brownian motion has a powerful consequence for all continuous local martingales: they are characterised by their quadratic variation and, in fact, they are all time changes of Brownian motion.

Theorem 7.20 (Dambis, Dubins and Schwarz) Let M be an ((F_t), P)-continuous local martingale with M₀ = 0 and ⟨M⟩_∞ = ∞ a.s. Let τ_s := inf{t ≥ 0 : ⟨M⟩_t > s}. Then the process B defined by B_s := M_{τ_s} is an ((F_{τ_s}), P)-Brownian motion and M_t = B_{⟨M⟩_t} for all t ≥ 0, a.s.

Proof. Note that τ_s is the first hitting time of the open set (s, ∞) by the adapted process ⟨M⟩ with continuous sample paths, and hence τ_s is a stopping time (recall that (F_t) is right-continuous). Further, ⟨M⟩_∞ = ∞ a.s. implies that τ_s < ∞ a.s. The process (τ_s : s ≥ 0) is non-decreasing and right-continuous (in fact s ↦ τ_s is the right-continuous inverse of t ↦ ⟨M⟩_t). Let G_s := F_{τ_s}; note that it satisfies the usual conditions. The process B is right-continuous by continuity of M and right-continuity of τ. We have
\[
\lim_{u \uparrow s} B_u = \lim_{u \uparrow s} M_{\tau_u} = M_{\tau_{s-}}.
\]
But [τ_{s−}, τ_s] is either a point or an interval of constancy of ⟨M⟩. The latter are known (exercise) to coincide a.s. with the intervals of constancy of M, and hence M_{τ_{s−}} = M_{τ_s} = B_s, so that B has a.s. continuous paths. To conclude that B is a (G_s)-Brownian motion, by Lévy's theorem, it remains to show that (B_s) and (B_s² − s) are (G_s)-local martingales.
Note that M^{τ_n} and (M^{τ_n})² − ⟨M⟩^{τ_n} are uniformly integrable martingales. Taking 0 ≤ u < s < n and applying the Optional Stopping Theorem we obtain
\[
E[B_s \,|\, \mathcal{G}_u] = E[M^{\tau_n}_{\tau_s} \,|\, \mathcal{F}_{\tau_u}] = M^{\tau_n}_{\tau_u} = M_{\tau_u} = B_u
\]
and
\[
E[B_s^2 - s \,|\, \mathcal{G}_u] = E\bigl[(M^{\tau_n}_{\tau_s})^2 - \langle M\rangle^{\tau_n}_{\tau_s} \bigm| \mathcal{F}_{\tau_u}\bigr] = (M^{\tau_n}_{\tau_u})^2 - \langle M\rangle^{\tau_n}_{\tau_u} = (M_{\tau_u})^2 - \langle M\rangle_{\tau_u} = B_u^2 - u,
\]
where we used continuity of ⟨M⟩ to write ⟨M⟩_{τ_u} = u. It follows that B is indeed a (G_s)-Brownian motion.
Finally, B_{⟨M⟩_t} = M_{τ_{⟨M⟩_t}} = M_t, again since the intervals of constancy of M and of ⟨M⟩ coincide a.s., so that s ↦ τ_s is constant on [t, τ_{⟨M⟩_t}]. □
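The theorem can be illustrated numerically (a sketch, not part of the notes; NumPy assumed). Take M_t = ∫₀ᵗ σ(s) dB_s with the deterministic integrand σ(s) = 1 + s, so that ⟨M⟩_t = ∫₀ᵗ (1 + s)² ds. By the theorem, M_t = B_{⟨M⟩_t} for some Brownian motion B, so M_t/√⟨M⟩_t should be standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, t = 50_000, 200, 1.0
dt = t / n_steps
s_grid = np.arange(n_steps) * dt
sigma = 1.0 + s_grid                      # deterministic integrand σ(s) = 1 + s

db = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
m_t = (sigma * db).sum(axis=1)            # M_t = ∫ σ(s) dB_s, one sample per path
qv_t = np.sum(sigma ** 2) * dt            # ⟨M⟩_t, non-random for deterministic σ

z = m_t / np.sqrt(qv_t)                   # reading M at its intrinsic clock
print(z.mean(), z.var())                  # ≈ 0 and ≈ 1: standard Gaussian
```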

A The Monotone Class Lemma/ Dynkin’s π − λ Theorem
There are multiple names used for this result (often with slightly different formu-
lations).
Let E be an arbitrary set and let P(E) be the set of all subsets of E.

Definition A.1 A subset M of P(E) is called a monotone class, or a Dynkin system / λ-system, if

i. E ∈ M ;

ii. if A, B ∈ M and A ⊂ B, then B\A ∈ M ;

iii. if (An )n≥0 is an increasing sequence of subsets of E such that An ∈ M , then


∪n≥0 An ∈ M .

The monotone class generated by an arbitrary subset C of P(E) is
\[
\mathcal{M}(\mathcal{C}) = \bigcap_{\mathcal{M}\ \text{monotone class},\ \mathcal{C} \subset \mathcal{M}} \mathcal{M}.
\]

Equivalently, M is a monotone class if

i. E ∈ M;

ii. if A, B ∈ M and A ⊂ B, then B\A ∈ M;

iii. if (A_n)_{n≥0} is a sequence of subsets of E such that A_i ∩ A_j = ∅ for i ≠ j, then ∪_{n≥0} A_n ∈ M.

Definition A.2 A collection I of subsets of E such that ∅ ∈ I and, for all A, B ∈ I, A ∩ B ∈ I, is called a π-system.

You may have seen the result expressed as:

Theorem A.3 (Dynkin's π−λ Theorem) If P is a π-system and D is a λ-system such that P ⊆ D, then σ(P) ⊆ D.

Le Gall’s (equivalent) formulation is:

Lemma A.4 (Monotone class lemma) If C ⊂ P(E) is stable under finite intersections, then M(C) = σ(C).

In other words, a Dynkin system which is also a π-system is a σ -algebra.


Here are some useful consequences:

i. Let A be a σ-field on E and let µ, ν be two probability measures on (E, A). Assume that there exists C ⊂ A which is stable under finite intersections and such that σ(C) = A and µ(A) = ν(A) for every A ∈ C; then µ = ν. (Use that G = {A ∈ A : µ(A) = ν(A)} is a monotone class.)

ii. Let (Xi )i∈I be an arbitrary collection of random variables, and let G be
a σ -field on some probability space. In order to show that the σ -fields
σ (Xi : i ∈ I) and G are independent, it is enough to verify that (Xi1 , . . . , Xi p )
is independent of G for any choice of the finite set {i1 , . . . , i p } ⊂ I. (Observe
that the class of all events that depend on a finite number of the Xi is stable
under finite intersections and generates σ (Xi , i ∈ I).)
iii. Let (Xi )i∈I be an arbitrary collection of random variables and let Z be a
bounded real variable. Let i0 ∈ I. In order to verify that E[Z|Xi , i ∈ I] =
E[Z|Xi0 ], it is enough to show that E[Z|Xi0 , Xi1 . . . Xi p ] = E[Z|Xi0 ] for any
choice of the finite collection {i1 , . . . i p } ⊂ I. (Observe that the class of all
events A such that E[1A Z] = E[1A E[Z|Xi0 ]] is a monotone class.)

B Review of some basic probability


B.1 Convergence of random variables.
a) Xn → X a.s. iff P{ω : limn→∞ Xn (ω) = X(ω)} = 1.
b) Xn → X in probability iff ∀ε > 0, limn→∞ P{|Xn − X| > ε} = 0.
c) Xn converges to X in distribution (denoted Xn ⇒ X) iff limn→∞ P{Xn ≤ x} =
P{X ≤ x} ≡ FX (x) for all x at which FX is continuous.

Theorem B.1 a) implies b) implies c).

Proof. (b ⇒ c) Let ε > 0. Then


P{Xn ≤ x} − P{X ≤ x + ε} = P{Xn ≤ x, X > x + ε} − P{X ≤ x + ε, Xn > x}
≤ P{|Xn − X| > ε}
and hence lim sup P{Xn ≤ x} ≤ P{X ≤ x + ε}. Similarly, lim inf P{Xn ≤ x} ≥
P{X ≤ x − ε}. Since ε is arbitrary, the implication follows. 

B.2 Convergence in probability.


a) If Xn → X in probability and Yn → Y in probability then aXn +bYn → aX +bY
in probability.
b) If Q : R → R is continuous and Xn → X in probability then Q(Xn ) → Q(X)
in probability.
c) If Xn → X in probability and Xn − Yn → 0 in probability, then Yn → X in
probability.

Remark B.2 (b) and (c) hold with convergence in probability replaced by convergence in distribution; however (a) is not in general true for convergence in distribution.

B.3 Uniform Integrability
If X is an integrable random variable (that is E[|X|] < ∞) and Λn is a sequence
of sets with P[Λn ] → 0, then E[|X1Λn |] → 0 as n → ∞. (This is a consequence
of the DCT since |X| dominates |X1Λn | and |X1Λn | → 0 a.s.) Uniform integrability
demands that this type of property holds uniformly for random variables from some
class.

Definition B.3 (Uniform Integrability) A class C of random variables is called


uniformly integrable if given ε > 0 there exists K ∈ (0, ∞) such that

E[|X|1{|X|>K} ] < ε for all X ∈ C .

There are two reasons why this definition is important:

i. Uniform integrability is necessary and sufficient for passing to the limit under an expectation,

ii. it is often easy to verify in the context of martingale theory.

Property (i) should be sufficient to guarantee that uniform integrability is interesting, but in fact uniform integrability is not often used in analysis, where it is usually simpler to use the MCT or the DCT. It is only taken seriously in probability, and that is because of (ii).

Proposition B.4 Suppose that {X_α, α ∈ I} is a uniformly integrable family of random variables on some probability space (Ω, F, P). Then

i.
sup E[|Xα |] < ∞,
α

ii.
P[|Xα | > N] → 0 as N → ∞, uniformly in α.

iii.
E[|Xα |1Λ ] → 0 as P[Λ] → 0, uniformly in α.

Conversely, either (i) and (iii) or (ii) and (iii) imply uniform integrability.
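A standard counterexample makes the definition concrete (an illustration, not part of the original; NumPy assumed): the family X_n = n·1{U < 1/n}, with U uniform on [0, 1], satisfies E[X_n] = 1 for every n, so it is bounded in L¹, yet the tail mass E[X_n 1{X_n > K}] equals 1 for every n > K, so no single truncation level K works uniformly.

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.uniform(size=2_000_000)  # samples of U ~ Uniform[0, 1]

def tail_mass(n, K):
    """Monte Carlo estimate of E[X_n 1{X_n > K}] for X_n = n 1{U < 1/n}."""
    x = n * (u < 1.0 / n)
    return (x * (x > K)).mean()

for n in (10, 100, 1000):
    print(n, tail_mass(n, K=5.0))  # stays ≈ 1: the family is not uniformly integrable
```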

B.4 The Convergence Theorems


Theorem B.5 (Bounded Convergence Theorem) Suppose that Xn ⇒ X and that
there exists a constant b such that P(|Xn | ≤ b) = 1. Then E[Xn ] → E[X].

Proof. Let {x_i} be a partition of ℝ such that F_X is continuous at each x_i. Then
\[
\sum_i x_i\, P\{x_i < X_n \le x_{i+1}\} \le E[X_n] \le \sum_i x_{i+1}\, P\{x_i < X_n \le x_{i+1}\},
\]
and taking limits we have
\[
\sum_i x_i\, P\{x_i < X \le x_{i+1}\} \le \liminf_{n\to\infty} E[X_n] \le \limsup_{n\to\infty} E[X_n] \le \sum_i x_{i+1}\, P\{x_i < X \le x_{i+1}\}.
\]
As max|x_{i+1} − x_i| → 0, the left and right sides converge to E[X], as required. □
Lemma B.6 Let X ≥ 0 a.s. Then limM→∞ E[X ∧ M] = E[X].
Proof. Check the result first for X having a discrete distribution and then extend to
general X by approximation. 
Theorem B.7 (Monotone Convergence Theorem.) Suppose 0 ≤ Xn ≤ X and Xn →
X in probability. Then limn→∞ E[Xn ] = E[X].
Proof. For M > 0
E[X] ≥ E[Xn ] ≥ E[Xn ∧ M] → E[X ∧ M]
where the convergence on the right follows from the bounded convergence theo-
rem. It follows that
E[X ∧ M] ≤ lim inf E[Xn ] ≤ lim sup E[Xn ] ≤ E[X]
n→∞ n→∞

and the result follows by Lemma B.6. 

Lemma B.8 (Fatou’s lemma.) If Xn ≥ 0 and Xn ⇒ X, then lim inf E[Xn ] ≥ E[X].
Proof. Since E[Xn ] ≥ E[Xn ∧ M] we have
lim inf E[Xn ] ≥ lim inf E[Xn ∧ M] = E[X ∧ M].
By the Monotone Convergence Theorem E[X ∧ M] → E[X] and the result follows.

Theorem B.9 (Dominated Convergence Theorem) Assume Xn ⇒ X, Yn ⇒ Y , |Xn | ≤
Yn , and E[Yn ] → E[Y ] < ∞. Then E[Xn ] → E[X].
Proof. For simplicity, assume in addition that Xn + Yn ⇒ X + Y and Yn − Xn ⇒
Y − X (otherwise consider subsequences along which (Xn ,Yn ) ⇒ (X,Y )). Then
by Fatou’s lemma lim inf E[Xn + Yn ] ≥ E[X + Y ] and lim inf E[Yn − Xn ] ≥ E[Y −
X]. From these observations lim inf E[Xn ] + lim E[Yn ] ≥ E[X] + E[Y ], and hence
lim inf E[X_n] ≥ E[X]. Similarly lim inf E[−X_n] ≥ E[−X], so lim sup E[X_n] ≤ E[X], and hence E[X_n] → E[X]. □

Lemma B.10 (Markov’s inequality)
P{|X| > a} ≤ E[|X|]/a, a > 0.

Proof. Note that |X| ≥ aI{|X|>a} . Taking expectations proves the desired inequality.


B.5 Information and independence.
Information obtained by observations of the outcome of a random experiment is
represented by a sub-σ -algebra D of the collection of events F . If D ∈ D, then
the observer “knows” whether or not the outcome is in D.
An S-valued random variable Y is independent of a σ -algebra D if

P({Y ∈ B} ∩ D) = P{Y ∈ B}P(D), ∀B ∈ B(S), D ∈ D.

Two σ -algebras D1 , D2 are independent if

P(D1 ∩ D2 ) = P(D1 )P(D2 ), ∀D1 ∈ D1 , D2 ∈ D2 .

Random variables X and Y are independent if σ (X) and σ (Y ) are independent, that
is, if
P({X ∈ B1 } ∩ {Y ∈ B2 }) = P{X ∈ B1 }P{Y ∈ B2 }.

B.6 Conditional expectation.


Interpretation of conditional expectation in L2 .
Problem: Approximate X ∈ L2 using information represented by D such that the
mean square error is minimized, i.e., find the D-measurable random variable Y that
minimizes E[(X −Y )2 ].
Solution: Suppose Y is a minimizer. For any ε 6= 0 and any D-measurable random
variable Z ∈ L2

E[|X −Y |2 ] ≤ E[|X −Y − εZ|2 ] = E[|X −Y |2 ] − 2εE[Z(X −Y )] + ε 2 E[Z 2 ].

Hence 2εE[Z(X −Y )] ≤ ε 2 E[Z 2 ]. Since ε is arbitrary, E[Z(X −Y )] = 0 and hence

E[ZX] = E[ZY ] (48)

for every D-measurable Z with E[Z 2 ] < ∞. 


With (48) in mind, for an integrable random variable X, the conditional expectation of X, denoted E[X|D], is the unique (up to changes on events of probability zero) random variable Y satisfying

A) Y is D-measurable.

B) ∫_D X dP = ∫_D Y dP for all D ∈ D.

Note that Condition B is a special case of (48) with Z = 1_D (where 1_D denotes the indicator function of the event D) and that Condition B implies that (48) holds for all bounded D-measurable random variables. Existence of conditional expectations is a consequence of the Radon-Nikodym theorem.
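The projection picture can be checked in a small simulation (an illustration, not part of the notes; NumPy assumed): for D = σ(Y) with Y discrete, E[X|Y] is the group-wise mean, and any other D-measurable candidate has strictly larger mean-square error.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.integers(0, 3, size=500_000)   # Y takes values 0, 1, 2
x = y + rng.normal(size=y.size)        # X = Y + noise, so E[X | Y] = Y

# Group-wise means estimate E[X | Y = k].
cond_exp = np.array([x[y == k].mean() for k in range(3)])
print(cond_exp)  # ≈ [0, 1, 2]

# The projection property: perturbing the conditional expectation raises the MSE.
mse_opt = np.mean((x - cond_exp[y]) ** 2)
mse_other = np.mean((x - (cond_exp[y] + 0.3)) ** 2)
print(mse_opt < mse_other)  # True
```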

C Convergence of backwards discrete time supermartingales

For a backwards supermartingale, Y_n is H_n-measurable and, for m < n ≤ 0, E[Y_n | H_m] ≤ Y_m. Notice that the σ-fields (H_n)_{n∈−ℕ} get ‘smaller and smaller’ as n → −∞.
If (Y_n)_{n∈−ℕ} is a backwards supermartingale and the sequence (Y_n)_{n∈−ℕ} is bounded in L¹, then (Y_n)_{n∈−ℕ} converges almost surely and in L¹ as n → −∞.

D A very short primer in functional analysis


We start with a brief review of basic notions of functional analysis, leading to Hilbert spaces and the identification of their dual.

E Normed vector spaces


We start with basics. A vector space V over R is a set endowed with two binary op-
erations: addition and multiplication by a scalar, which satisfy the natural axioms.
We focus the discussion on real scalars as this is relevant for us but most of what
follows applies to spaces over complex numbers (or more general fields).

Definition E.1 A norm ‖·‖ on a vector space V is a mapping from V to [0, ∞) such that

(i) for any a ∈ ℝ, v ∈ V, ‖av‖ = |a|‖v‖ (absolute homogeneity);

(ii) for any x, y ∈ V, ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality);

(iii) ‖v‖ = 0 if and only if v is the zero vector in V (separates points).

Note that a norm induces a metric on V through d(x, y) = kx − yk and hence a


topology on V . A norm is then a continuous function from V to R. The space of
continuous linear functions plays a special role:

Definition E.2 Given a normed vector space over the reals (V, ‖·‖_V), its dual V′ is the space of all continuous linear maps (functionals) from V to ℝ. V′ is itself a vector space over ℝ, equipped with the norm
\[
\|\varphi\|_{V'} := \sup_{v \in V,\ \|v\|_V \le 1} |\varphi(v)|.
\]

The classical examples of spaces to consider are spaces of sequences or of functions. Let (S, F, µ) be a measurable space endowed with a σ-finite measure. Then for a real-valued measurable function f on S we can consider
\[
\|f\|_p := \Bigl(\int_S |f(x)|^p\, \mu(dx)\Bigr)^{1/p}
\]

and let 𝓛^p(S, F, µ) be the space of such functions for which ‖f‖_p < ∞. Observe that ‖·‖_p is not yet a norm on 𝓛^p: indeed it fails to satisfy (iii) in Definition E.1, since if f = 0 µ-a.e. but is not identically zero, e.g. f = 1_A for a measurable A ∈ F with µ(A) = 0, then still ‖f‖_p = 0. We then say that ‖·‖_p is a semi-norm on 𝓛^p.
To remedy this, we consider the space L^p(S, F, µ), which is the quotient of 𝓛^p with respect to the equivalence relation f ∼ g iff f = g µ-a.e. Put differently, L^p is the space of equivalence classes of functions equal µ-a.e. whose pth power is integrable. Then (L^p(S, F, µ), ‖·‖_p) is a normed vector space for p ≥ 1. The triangle inequality for ‖·‖_p is simply the Minkowski inequality.
A more geometric notion of measuring the relation between vectors is given by
an inner product.
Definition E.3 Given a vector space V over ℝ, a mapping ⟨·, ·⟩ : V × V → ℝ is called an inner product if

(i) it is bilinear and symmetric: ⟨ax + bz, y⟩ = a⟨x, y⟩ + b⟨z, y⟩ and ⟨x, y⟩ = ⟨y, x⟩ for a, b ∈ ℝ, x, y, z ∈ V;

(ii) for any x ∈ V, ⟨x, x⟩ ≥ 0;

(iii) ⟨x, x⟩ = 0 if and only if x is the zero vector in V.
This notion is very familiar on V = ℝⁿ, where an inner product is given by
\[
\langle x, y\rangle = x^T y = \sum_{i=1}^{n} x_i y_i.
\]

An inner product satisfies the Cauchy-Schwarz inequality:

Proposition E.4 An inner product on a vector space satisfies
\[
|\langle x, y\rangle| \le \sqrt{\langle x, x\rangle}\,\sqrt{\langle y, y\rangle}, \qquad x, y \in V. \tag{49}
\]

Proof. Let ‖x‖ := √⟨x, x⟩, x ∈ V. Fix x, y ∈ V and define a quadratic function Q : ℝ → ℝ by
\[
Q(r) = \|x + ry\|^2 = \|y\|^2 r^2 + 2\langle x, y\rangle r + \|x\|^2, \qquad r \in \mathbb{R},
\]
which clearly is non-negative, and hence its discriminant has to be non-positive, i.e.
\[
4|\langle x, y\rangle|^2 - 4\|x\|^2 \|y\|^2 \le 0, \quad\text{that is}\quad |\langle x, y\rangle| \le \|x\|\,\|y\|,
\]
as required. We note also that equality holds if and only if the vectors x, y are linearly dependent, i.e. x = ry for some r ∈ ℝ. □

The above implies that kxk := < x, x > is a norm on V . We say that the norm
is induced by an inner product. Among spaces L p defined above only L2 has norm
which is induced by an inner product, namely by
Z
< f , g >= f (x)g(x)µ(dx). (50)
S

F Banach spaces
We first define Cauchy sequences which embody the idea of a converging sequence
when we do not know the limiting element.
Definition F.1 A sequence (xn ) of elements in a normed vector space (X, k · k) is
called a Cauchy sequence if for any ε > 0 there exists N ≥ 1 such that for all
n, m ≥ N we have kxn − xm k ≤ ε.

Definition F.2 A normed vector space (X, ‖·‖) is complete if every Cauchy sequence converges to an element x ∈ X. It is then called a Banach space. Further, if the norm is induced by an inner product, then (X, ‖·‖) is called a Hilbert space.

Naturally, the Euclidean space ℝ^d is a Banach space (and in fact a Hilbert space) with the norm ‖x‖ = (∑_{i=1}^d x_i²)^{1/2}. This implies (reasoning for d = 1) that

Proposition F.3 If (X, k · kX ) is a normed vector space over R then its dual (X 0 , k ·
kX 0 ) in Definition E.2 is a Banach space.
In many cases it is interesting to build linear functionals satisfying certain additional properties. This is often done using the Hahn-Banach theorem. It states in particular that a continuous linear functional defined on a linear subspace Y of X can be continuously extended to the whole of X without increasing its norm. A version of this is also known as the separating hyperplane theorem, since it allows one to separate two convex sets (one open) by an affine hyperplane.
An important step in studying continuous linear functionals on X is achieved
by describing the structure of X 0 . We have
Proposition F.4 Let (S , F, µ) be a measurable space with a σ -finite measure.
Then for any p ≥ 1, L p (S , F, µ) is a Banach space and for p > 1 its dual is
equivalent to (isometric to) the space Lq (S , F, µ), where 1/p + 1/q = 1.
In particular we see that L2 is its own dual. This means that any continuous linear
functional on L2 can be identified with an element in L2 . This property remains
true for any Hilbert space:
Proposition F.5 Let (X, ‖·‖) be a Hilbert space with the norm induced by an inner product, ‖x‖ = √⟨x, x⟩. If φ : X → ℝ is a continuous linear map then there exists an element x_φ ∈ X such that
\[
\varphi(y) = \langle x_\varphi, y\rangle, \qquad \forall y \in X.
\]
In particular, if φ : L²(S, F, µ) → ℝ is a continuous linear map then there exists an element f_φ ∈ L²(S, F, µ) such that
\[
\varphi(g) = \int_S g(x) f_\varphi(x)\, \mu(dx), \qquad \forall g \in L^2(S, F, \mu).
\]

Note that the inner product ⟨x, y⟩, or the integral ∫_S g(x) f_φ(x) µ(dx), in the above statement is well defined by (49)–(50).
On a (separable) Hilbert space, we can also state an infinite dimensional ana-
logue of the Pythagorean theorem. Recall that Rn is a Hilbert space with inner
product < x, y >= ∑ni=1 xi yi . This uses the canonical basis in Rn but if we take any
orthonormal basis in Rn , say (ε1 , . . . , εn ), then
x = ∑ni=1 < x, εi > εi and hence kxk2 = ∑ni=1 < x, εi >2 ,

which is the Pythagorean theorem. The same reasoning gives < x, y >= ∑ni=1 <
x, εi >< y, εi >. The infinite dimensional version is known as the
Proposition F.6 (Parseval’s identity) Let (X, k · k) be a separable Hilbert space
with the norm induced by an inner product (x, y) →< x, y > and let (εn : n ≥ 1) be
an orthonormal basis of X. Then for any x, y ∈ X

< x, y > = ∑n≥1 < x, εn >< y, εn >, and in particular kxk2 = ∑n≥1 < x, εn >2 .
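Parseval's identity is easy to verify numerically in Rn with a randomly chosen orthonormal basis (obtained here, purely for illustration, from a QR factorisation):

```python
import numpy as np

# Numerical check of Parseval's identity in R^n: for an orthonormal
# basis (columns of Q), <x,y> equals the sum of products of coefficients.
rng = np.random.default_rng(2)
n = 6
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # columns = orthonormal basis
x, y = rng.normal(size=n), rng.normal(size=n)

cx = Q.T @ x        # coefficients <x, eps_k>
cy = Q.T @ y        # coefficients <y, eps_k>

assert abs(x @ y - cx @ cy) < 1e-10            # <x,y> = sum <x,e_k><y,e_k>
assert abs(x @ x - cx @ cx) < 1e-10            # ||x||^2 = sum <x,e_k>^2
assert np.allclose(Q @ cx, x)                  # x = sum <x,e_k> e_k
```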

Finally we state one more result, which is crucial for the construction of the
stochastic integral.
Proposition F.7 Suppose (X, k · kX ) and (Y, k · kY ) are two Banach spaces, E ⊂ X
is a dense vector subspace of X and I : E → Y is a linear isometry, i.e. a linear map which preserves the norm, kI(x)kY = kxkX for all x ∈ E . Then I may be extended in a unique way to a linear isometry from X to Y .
Proof. Take x ∈ X and xn → x with xn ∈ E . Then, by the isometry property, (I(xn ))
is Cauchy in Y since (xn ) is Cauchy in X, and so, since Y is complete, it converges to some element, which we denote I(x). Further, if we have two sequences in E , (xn ) and (yn ), both converging to x ∈ X and giving rise to two potentially different elements I(x) and I(x)0 , then we can build a third sequence z2n = xn , z2n+1 = yn , which also converges to x, and we see that (I(zn )) has to converge and its limit has to agree with both I(x) and I(x)0 , so that I(x) = I(x)0 . It follows that we have defined I(x) ∈ Y uniquely for all x ∈ X. Further,

kI(x)kY = limn kI(xn )kY = limn kxn kX = kxkX

so that I is norm-preserving. Finally, if x, y ∈ X then we can write them as limits of sequences of elements in E , say (xn ) and (yn ) respectively. For any a, b ∈ R,
axn + byn ∈ E since E is a vector space, and then by the above and linearity of I on
E we have
 
I(ax + by) = limn I(axn + byn ) = limn (aI(xn ) + bI(yn )) = aI(x) + bI(y),

so that I is linear on X as required. 
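This extension-by-density pattern is exactly how the stochastic integral is built: an isometry is first defined on step functions and then extended to their L2 closure. A hypothetical numerical sketch (the map I(f) = s · f, with s a fixed ±1-valued function, is an isometry of L2 [0, 1] chosen purely for illustration):

```python
import numpy as np

# Extend an isometry defined on step functions (dense in L^2[0,1]) to
# f(t) = sqrt(t), following the Cauchy-sequence argument of Prop. F.7.
t = np.linspace(0.0, 1.0, 2**14, endpoint=False)
dt = t[1] - t[0]
l2_norm = lambda g: np.sqrt(np.sum(g**2) * dt)

s = np.sign(np.sin(2 * np.pi * 8 * t + 1e-9))   # |s| = 1, so f -> s*f preserves norms
f = np.sqrt(t)

def step_approx(f, k):
    """Piecewise-constant approximation of f on 2**k dyadic intervals."""
    m = 2**k
    vals = f.reshape(m, -1).mean(axis=1)         # average f on each interval
    return np.repeat(vals, len(f) // m)

norms = []
for k in (2, 4, 6, 8):
    f_k = step_approx(f, k)                      # element of the dense set E
    norms.append(l2_norm(s * f_k))
    # the isometry property holds exactly on E:
    assert abs(l2_norm(s * f_k) - l2_norm(f_k)) < 1e-12

# As f_k -> f, the images I(f_k) = s*f_k form a Cauchy sequence whose
# limit has norm ||f||, as the extension demands.
assert abs(norms[-1] - l2_norm(f)) < 1e-3
```

Averaging on dyadic intervals is an orthogonal projection, so the norms increase monotonically up to k f k as the partition is refined.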

