Continuous Martingales and Stochastic Calculus: Alison Etheridge March 11, 2018
Contents
1 Introduction
6 Continuous semimartingales
6.1 Functions of finite variation
6.2 Processes of finite variation
6.3 Continuous local martingales
6.4 Quadratic variation of a continuous local martingale
6.5 Continuous semimartingales
7 Stochastic Integration
7.1 Stochastic integral w.r.t. L²-bounded martingales
7.2 Intrinsic characterisation of stochastic integrals using the quadratic co-variation
7.3 Extensions: stochastic integration with respect to continuous semimartingales
7.4 Itô's formula and its applications
F Banach spaces
These notes are based heavily on notes by Jan Obłój from last year's course, and on the book by Jean-François Le Gall, Brownian Motion, Martingales, and Stochastic Calculus, Springer, 2016. The first five chapters of that book cover everything in the course (and more). Other useful references (in no particular order) include:
1. D. Revuz and M. Yor, Continuous Martingales and Brownian Motion, Springer (revised 3rd ed.), 2001. Chapters 0–4.
2. I. Karatzas and S. Shreve, Brownian Motion and Stochastic Calculus, Springer (2nd ed.), 1991. Chapters 1–3.
3. R. Durrett, Stochastic Calculus: A Practical Introduction, CRC Press, 1996. Sections 1.1–2.10.
4. F. Klebaner, Introduction to Stochastic Calculus with Applications, 3rd ed., Imperial College Press, 2012. Chapters 1, 2, 3.1–3.11, 4.1–4.5, 7.1–7.8, 8.1–8.7.
5. J. M. Steele, Stochastic Calculus and Financial Applications, Springer, 2010. Chapters 3–8.
6. B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, 6th ed., Springer (Universitext), 2007. Chapters 1–3.
7. S. Shreve, Stochastic Calculus for Finance, Vol. 2: Continuous-Time Models, Springer Finance, Springer-Verlag, New York, 2004. Chapters 3–4.
The appendices gather together some useful results that we take as known.
1 Introduction
Our topic is part of the huge field devoted to the study of stochastic processes.
Since first year, you've had the notion of a random variable, X say, on a probability space (Ω, F, P) and taking values in a state space (E, E). X : Ω → E is just a measurable mapping, so that for each A ∈ E, X⁻¹(A) ∈ F and so, in particular, we can assign a probability to the event {X ∈ A}. Often (E, E) is just (R, B(R)) (where B(R) denotes the Borel sets on R) and this just says that for each x ∈ R we can assign a probability to the event {X ≤ x}.
Definition 1.1 A stochastic process, indexed by some set T , is a collection of
random variables {Xt }t∈T , defined on a common probability space (Ω, F , P) and
taking values in a common state space (E, E ).
For us, T will generally be either [0, ∞) or [0, T ] and we think of Xt as a random
quantity that evolves with time.
When we model deterministic quantities that evolve with (continuous) time, we often appeal to ordinary differential equations as models. In this course we develop the 'calculus' necessary to build an analogous theory of stochastic (ordinary) differential equations.
An ordinary differential equation might take the form
dX(t) = a(t, X(t))dt,
for a suitably nice function a. A stochastic equation is often formally written as
dX(t) = a(t, X(t))dt + b(t, X(t))dBt ,
where the second term on the right models ‘noise’ or fluctuations. Here (Bt )t≥0
is an object that we call Brownian motion. We shall consider what appear to be
more general driving noises, but the punchline of the course is that under rather
general conditions they can all be built from Brownian motion. Indeed, if we added possible (random) 'jumps' in X(t) at times given by a Poisson process, we'd capture essentially the most general theory. We are not going to allow jumps, so we'll be thinking of settings in which our stochastic equation has a continuous solution t ↦ Xt, and Brownian motion will be a fundamental object.
It requires some measure-theoretic niceties to make sense of all this.
Definition 1.2 The mapping t ↦ Xt(ω), for a fixed ω ∈ Ω, represents a realisation of our stochastic process, called a sample path or trajectory. We shall assume that

    (t, ω) ↦ Xt(ω) : ([0, ∞) × Ω, B([0, ∞)) ⊗ F) → (R, B(R))

is measurable, i.e. that the process is jointly measurable.
2 An overview of Gaussian variables and processes
Brownian motion is a special example of a Gaussian process - or at least a version
of one that is assumed to have continuous sample paths. In this section we give an
overview of Gaussian variables and processes, but in the next we shall give a direct
construction of Brownian motion, due to Lévy, from which continuity of sample
paths is an immediate consequence.
We say Y has Gaussian (or normal) distribution with mean m and variance σ², written Y ∼ N(m, σ²), if Y = σX + m where X ∼ N(0, 1). Then

    E[e^{iξY}] = exp( imξ − σ²ξ²/2 ),   ξ ∈ R,

and, if σ > 0, the density on R is

    pY(x) = (1/(σ√(2π))) exp( −(x − m)²/(2σ²) ),   x ∈ R.
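As a quick numerical sanity check (not part of the notes; the sample size, seed and test points are arbitrary choices of ours), one can sample Y = σX + m and compare the empirical characteristic function with the closed form above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 1.5, 2.0

# Y = sigma*X + m with X ~ N(0,1), i.e. Y ~ N(m, sigma^2)
Y = sigma * rng.standard_normal(200_000) + m

# compare the empirical characteristic function E[exp(i xi Y)]
# with the closed form exp(i m xi - sigma^2 xi^2 / 2)
for xi in [0.3, 0.7, 1.0]:
    empirical = np.exp(1j * xi * Y).mean()
    exact = np.exp(1j * m * xi - sigma**2 * xi**2 / 2)
    assert abs(empirical - exact) < 0.02
```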
We think of a constant ‘random’ variable as being a degenerate Gaussian. Then
the space of Gaussian variables (resp. distributions) is closed under convergence in
probability (resp. distribution).
Proposition 2.2 Let (Xn) be a sequence of Gaussian random variables, with Xn ∼ N(mn, σn²), which converges in distribution to a random variable X. Then
(i) X is also Gaussian, X ∼ N(m, σ²), with m = lim_{n→∞} mn and σ² = lim_{n→∞} σn²; and
(ii) if, moreover, Xn → X in probability, then Xn → X in L^p for every p ≥ 1.
Proof. Convergence in distribution is equivalent to saying that the characteristic
functions converge:
    E[e^{iξXn}] = exp( imnξ − σn²ξ²/2 ) −→ E[e^{iξX}],   ξ ∈ R.   (1)
Taking absolute values, we see that the sequence exp(−σn²ξ²/2) converges, which in turn implies that σn² → σ² ∈ [0, ∞) (where we ruled out the case σn → ∞, since the limit has to be the absolute value of a characteristic function and so, in particular, has to be continuous). We deduce that
    e^{imnξ} −→ e^{σ²ξ²/2} E[e^{iξX}],   ξ ∈ R.
We now argue that this implies that the sequence mn converges to some finite m. Suppose first that the sequence {mn}n≥1 is bounded, and consider any two convergent subsequences, converging to m and m′ say. Then rearranging (1) yields m = m′, and so the sequence converges.
Now suppose that lim supn→∞ mn = ∞ (the case lim infn→∞ mn = −∞ is similar).
There exists a subsequence {mn_k}k≥1 which tends to infinity. Given M, for k large enough that mn_k > M,

    P[Xn_k ≥ M] ≥ P[Xn_k ≥ mn_k] = 1/2,

and so, using convergence in distribution,

    P[X ≥ M] ≥ lim sup_{k→∞} P[Xn_k ≥ M] ≥ 1/2   for any M > 0.

This is clearly impossible for any fixed random variable X and gives us the desired contradiction. This completes the proof of (i).
To show (ii), observe that the convergence of σn and mn implies, in particular, that

    sup_n E[e^{θXn}] = sup_n e^{θmn + θ²σn²/2} < ∞   for any θ ∈ R.

Since exp(|x|) ≤ exp(x) + exp(−x), this remains finite if we take |Xn| in place of Xn. This implies that sup_n E[|Xn|^p] < ∞ for any p ≥ 1, and hence also

    sup_n E[|Xn − X|^p] < ∞   for any p ≥ 1.   (2)
Fix p ≥ 1. Then the sequence |Xn − X|^p converges to zero in probability (by assumption) and is uniformly integrable since, for any q > p,

    E[|Xn − X|^p ; |Xn − X|^p > r] ≤ (1/r^{(q−p)/p}) E[|Xn − X|^q] → 0 as r → ∞

(by equation (2) above). It follows that we also have convergence of Xn to X in L^p.
2.2 Gaussian vectors and spaces
So far we’ve considered only real-valued Gaussian variables.
Definition 2.3 A random vector X taking values in R^d is called Gaussian if and only if

    ⟨u, X⟩ := uᵀX = Σ_{i=1}^d ui Xi

is Gaussian for all u ∈ R^d.
It follows immediately that the image of a Gaussian vector under a linear transfor-
mation is also Gaussian: if X ∈ Rd is Gaussian and A is an m × d matrix, then AX
is Gaussian in Rm .
In particular, if X is a Gaussian vector with mean vector mX and covariance matrix ΓX, then, writing qX(u) := uᵀΓXu,

    uᵀX ∼ N(uᵀmX, qX(u)),   u ∈ R^d.   (3)
Proposition 2.5 Let X be a Gaussian vector and ΓX its covariance matrix. Then
X1 , . . . , Xd are independent if and only if ΓX is a diagonal matrix (i.e. the variables
are pairwise uncorrelated).
Warning: It is crucial to assume that the vector X is Gaussian, and not just that X1, ..., Xd are Gaussian. For example, consider X1 ∼ N(0, 1) and ε an independent random variable with P[ε = 1] = 1/2 = P[ε = −1]. Let X2 := εX1. Then X2 ∼ N(0, 1) and cov(X1, X2) = 0, while clearly X1, X2 are not independent.
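The warning can be checked numerically. The following NumPy sketch (sample size and seed are arbitrary) confirms that X1 and X2 = εX1 are uncorrelated but far from independent, since |X2| = |X1| exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
X1 = rng.standard_normal(n)
eps = rng.choice([-1.0, 1.0], size=n)   # independent random sign
X2 = eps * X1                           # marginally N(0,1) again

# uncorrelated: E[X1 X2] = E[eps] E[X1^2] = 0
assert abs(np.mean(X1 * X2)) < 0.02

# ...but not independent: |X2| = |X1|, so E[|X1||X2|] = E[X1^2] = 1,
# whereas E|X1| E|X2| = 2/pi, roughly 0.64
assert abs(np.mean(np.abs(X1 * X2)) - 1.0) < 0.02
assert abs(np.mean(np.abs(X1)) * np.mean(np.abs(X2)) - 2 / np.pi) < 0.02
```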
By definition, a Gaussian vector X remains Gaussian if we add to it a determin-
istic vector m ∈ Rd . Hence, without loss of generality, by considering X − mX , it
suffices to consider centred Gaussian vectors. The variance-covariance matrix ΓX
is symmetric and non-negative definite (as observed above). Conversely, for any
such matrix Γ, there exists a Gaussian vector X with ΓX = Γ, and, indeed, we can
construct it as a linear transformation of a Gaussian vector with i.i.d. coordinates.
Theorem 2.6 Let Γ be a symmetric non-negative definite d × d matrix. Let (ε1, ..., εd) be an orthonormal basis of R^d which diagonalises Γ, i.e. Γεi = λiεi, for some λ1 ≥ λ2 ≥ ... ≥ λr > 0 = λr+1 = ... = λd, where 1 ≤ r ≤ d is the rank of Γ. Let Z1, ..., Zd be i.i.d. N(0, 1) variables, let A be the matrix with columns ε1, ..., εd, set Yi := √λi Zi and

    X := AY.   (4)

Then X is a centred Gaussian vector with covariance matrix Γ and, when r = d, it admits the density

    pX(x) = (1/((2π)^{d/2} √det(Γ))) exp( −(1/2) xᵀΓ⁻¹x ),   x ∈ R^d.
Proof. Let A be the matrix whose columns are the εi, so that Γ = AΛAᵀ, where Λ is the diagonal matrix with entries λi on the diagonal. Let Z1, ..., Zd be i.i.d. standard centred Gaussian variables and Yi = √λi Zi. Let X be given by (4), i.e. X = AY. Then

    ⟨u, X⟩ = ⟨u, AY⟩ = ⟨Aᵀu, Y⟩ = Σ_{i=1}^d √λi ⟨u, εi⟩ Zi

is Gaussian and centred. Its variance is given by

    var(⟨u, X⟩) = Σ_{i=1}^d λi (uᵀεi)² = Σ_{i=1}^d uᵀεi λi εiᵀu = (Aᵀu)ᵀ Λ (Aᵀu) = uᵀAΛAᵀu = uᵀΓu,

and we conclude that Y is also a centred Gaussian vector, with covariance matrix Λ. Independence of Y1, ..., Yd then follows from Proposition 2.5. It follows that, when r = d, Y admits a density on R^d given by

    pY(y) = (1/((2π)^{d/2} √det(Λ))) exp( −(1/2) yᵀΛ⁻¹y ),   y ∈ R^d.

A change of variables, together with det(Λ) = det(Γ) and |det(A)| = 1, gives the desired density for X.
Once we know that we can write X = Σ_{i=1}^r Yi εi in this way, we have an easy way to compute conditional expectations within the family of random variables
which are linear transformations of a Gaussian vector X. To see how it works, suppose that X is a centred Gaussian vector in R^d and define Z := X1 − Σ_{i=2}^d ai Xi, with the coefficients ai chosen in such a way that Z and Xi are uncorrelated for i = 2, ..., d; that is,

    cov(X1, Xi) − Σ_{j=2}^d aj cov(Xj, Xi) = 0,   2 ≤ i ≤ d.
    E[X1 | σ(X2, ..., Xd)] = E[Z + Σ_{i=2}^d ai Xi | σ(X2, ..., Xd)]
        = E[Z | σ(X2, ..., Xd)] + E[Σ_{i=2}^d ai Xi | σ(X2, ..., Xd)] = Σ_{i=2}^d ai Xi

(here E[Z | σ(X2, ..., Xd)] = E[Z] = 0, because Z is uncorrelated with, and hence by joint Gaussianity independent of, (X2, ..., Xd)),
and by definition all elements of this space are Gaussian random variables. This is
a simple example of a Gaussian space and it is useful to think of such spaces in
much greater generality.
The theorem follows from monotone class arguments, which (see Appendix A) re-
duce it to checking that it holds true for any finite subcollection of random variables
- which is Proposition 2.5.
Corollary 2.9 Let H be a Gaussian space and K a closed subspace. Let pK denote the orthogonal projection onto K. Then, for X ∈ H, writing X = pK(X) + Y with Y := X − pK(X) orthogonal to K (and hence independent of σ(K)),

    E[X|σ(K)] = E[pK(X)|σ(K)] + E[Y|σ(K)] = pK(X) + E[Y] = pK(X),

where we have used that Y is a centred Gaussian and so has zero mean.
Warning: For an arbitrary X ∈ L², E[X|σ(K)] is the orthogonal projection of X onto the space L²(Ω, σ(K), P), which is in general strictly bigger than K, so we could not conclude that E[X|σ(K)] = pK(X).

For a (centred) Gaussian process (Xt)t≥0 we define the covariance function

    Γ(s, t) := cov(Xt, Xs).
For any fixed n-tuple (Xt1 , . . . , Xtn ) the covariance matrix (Γ(ti ,t j )) has to be sym-
metric and positive semi-definite. As the following result shows, the converse also
holds - for any such function, Γ, we may construct an associated Gaussian process.
Theorem 2.11 Let Γ : [0, ∞)² → R be symmetric and such that, for any n ∈ N and 0 ≤ t1 < t2 < ... < tn,

    Σ_{1≤i,j≤n} ui uj Γ(ti, tj) ≥ 0,   u ∈ R^n.

Then there exists a (centred) Gaussian process (Xt)t≥0 whose covariance function is Γ.
This result will follow from the (more general) Daniell-Kolmogorov Theorem 2.13
below.
Recalling from Proposition 2.2 that an L2 -limit of Gaussian variables is also
Gaussian, we observe that the closed linear subspace of L2 spanned by the variables
(Xt : t ≥ 0) is a Gaussian space.
2.5 Constructing distributions on (R^[0,∞), B(R^[0,∞)))
In this section, we’re going to provide a very general result about constructing
continuous time stochastic processes and a criterion due to Kolmogorov which
gives conditions under which there will be a version of the process with continuous
paths.
Let T be the set of finite increasing sequences of non-negative numbers, i.e. t ∈ T if and only if t = (t1, t2, ..., tn) for some n and 0 ≤ t1 < t2 < ... < tn.
Suppose that for each t ∈ T of length n we have a probability measure Qt on
(Rn , B(Rn )). The collection (Qt : t ∈ T) is called a family of finite-dimensional
(marginal) distributions.
(In other words, if we integrate out over the distribution at the jth time point then
we recover the corresponding marginal for the remaining lower dimensional vec-
tor.)
If we have a probability measure Q on (R^[0,∞), B(R^[0,∞))), then it defines a family of finite-dimensional distributions via

    Qt(A) := Q[{ω : (ω(t1), ..., ω(tn)) ∈ A}],

where t = (t1, t2, ..., tn), A ∈ B(R^n), and we note that the set in question is in B(R^[0,∞)) as it depends on finitely many coordinates. But we'd like a converse: if I give you the Qt, when does there exist a corresponding measure Q?
We won’t prove this, but notice that (6) defines P on the cylinder sets and so if we
have countable additivity then the proof reduces to an application of Carathéodory’s
extension theorem. Uniqueness is a consequence of the Monotone Class Lemma.
This is a remarkably general result, but it doesn’t allow us to say anything
meaningful about the paths of the process. For that we appeal to Kolmogorov’s
criterion.
Theorem 2.14 (Kolmogorov's continuity criterion) Suppose that a stochastic process (Xt : t ≤ T) defined on (Ω, F, P) satisfies

    E[|Xt − Xs|^p] ≤ C|t − s|^{1+β},   s, t ∈ [0, T],

for some constants p, β, C > 0. Then there exists a modification X̃ of X whose sample paths are γ-Hölder continuous for every γ ∈ (0, β/p); in particular, a.s.,

    sup_{0≤s<t≤T} |X̃t − X̃s| / |t − s|^γ < ∞.   (8)
Definition 3.1 The discrete time stochastic process {Sn }n≥0 is a symmetric simple
random walk under the measure P if Sn = ∑ni=1 ξi , where the ξi can take only the
values ±1, and are i.i.d. under P with P[ξi = −1] = 1/2 = P[ξi = 1].
Lemma 3.2 {Sn }n≥0 is a P-martingale (with respect to the natural filtration) and
cov(Sn , Sm ) = n ∧ m.
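Lemma 3.2 is easy to test by simulation. The sketch below (sample sizes, the pair (n, m) and the seed are arbitrary choices of ours) checks the covariance formula cov(Sn, Sm) = n ∧ m and the orthogonality of a martingale increment to the past:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, N = 200_000, 20

# i.i.d. +/-1 steps with probability 1/2 each; S[:, k-1] holds S_k
xi = rng.choice([-1, 1], size=(reps, N))
S = np.cumsum(xi, axis=1)

n, m = 15, 6
emp_cov = np.mean(S[:, n - 1] * S[:, m - 1])   # S is centred
assert abs(emp_cov - min(n, m)) < 0.15

# martingale property: the increment S_n - S_{n-1} is mean zero and
# uncorrelated with the earlier value S_m (m < n)
assert abs(np.mean((S[:, n - 1] - S[:, n - 2]) * S[:, m - 1])) < 0.1
```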
i. for each s ≥ 0 and t > 0, the random variable Bt+s − Bs has the normal distribution with mean zero and variance σ²t,
ii. for each 0 ≤ t1 < t2 < ... < tn, the increments Bt2 − Bt1, ..., Btn − Btn−1 are independent,
iii. B0 = 0,
iv. Bt is continuous in t ≥ 0.
We can write down the finite dimensional distributions using the independence of
increments. They admit a density with respect to Lebesgue measure. We write
p(t, x, y) for the transition density

    p(t, x, y) = (1/√(2πt)) exp( −(x − y)²/(2t) ).
Although the sample paths of Brownian motion are continuous, it does not
mean that they are nice in any other sense. In fact the behaviour of Brownian
motion is distinctly odd. Here are just a few of its strange behavioural traits.
ii. Brownian motion will eventually hit any and every real value no matter how
large, or how negative. No matter how far above the axis, it will (with prob-
ability one) be back down to zero at some later time.
iii. Once Brownian motion hits a value, it immediately hits it again infinitely
often, and then again from time to time in the future.
iv. It doesn’t matter what scale you examine Brownian motion on, it looks just
the same. Brownian motion is a fractal.
The last property is really a consequence of the construction of the process. We’ll
formulate the second and third more carefully later.
We could recover the existence of Brownian motion from the general principles
outlined so far (Daniell-Kolmogorov Theorem and the Kolmogorov continuity cri-
terion plus what we know about Gaussian processes), but we are now going to take
a short digression to describe a beautiful (and useful) construction due to Lévy.
The idea is that we can simply produce a path of Brownian motion by direct
polygonal interpolation. We require just one calculation.
Lemma 3.4 Suppose that {Bt}t≥0 is standard Brownian motion. Conditional on Bt1 = x1, the probability density function of Bt1/2 is

    p_{t1/2}(x) = √(2/(πt1)) exp( −(1/2) (x − x1/2)² / (t1/4) ),   x ∈ R,

i.e. Bt1/2 is conditionally normal with mean x1/2 and variance t1/4.
The construction: Without loss of generality we take the range of t to be [0, 1].
Lévy’s construction builds (inductively) a polygonal approximation to the Brown-
ian motion from a countable collection of independent normally distributed random
variables with mean zero and variance one. We index them by the dyadic points of
[0, 1], a generic variable being denoted ξ (k2−n ) where n ∈ N and k ∈ {0, 1, . . . , 2n }.
The induction begins with
X1 (t) = tξ (1).
We now determine the appropriate value for Xn+1((2k − 1)2^{−(n+1)}). The values at the even points are inherited from the previous stage: Xn+1(2k·2^{−(n+1)}) = Xn(k2^{−n}). Conditional on these values, by Lemma 3.4 (applied to the interval [(k − 1)2^{−n}, k2^{−n}], of length 2^{−n}), the increment

    Xn+1((2k − 1)2^{−(n+1)}) − Xn+1(2(k − 1)2^{−(n+1)})

should be normally distributed with mean half the corresponding increment of Xn and variance 2^{−n}/4. Now if X ∼ N(0, 1), then aX + b ∼ N(b, a²), and so we take

    Xn+1((2k − 1)2^{−(n+1)}) − Xn+1(2(k − 1)2^{−(n+1)})
        = 2^{−(n/2+1)} ξ((2k − 1)2^{−(n+1)}) + (1/2)[Xn+1(2k·2^{−(n+1)}) − Xn+1(2(k − 1)2^{−(n+1)})].

In other words,

    Xn+1((2k − 1)2^{−(n+1)}) = (1/2) Xn((k − 1)2^{−n}) + (1/2) Xn(k2^{−n}) + 2^{−(n/2+1)} ξ((2k − 1)2^{−(n+1)})
        = Xn((2k − 1)2^{−(n+1)}) + 2^{−(n/2+1)} ξ((2k − 1)2^{−(n+1)}),   (9)

since Xn is linear on [(k − 1)2^{−n}, k2^{−n}], so its value at the midpoint is the average of its values at the endpoints.
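The recursion (9) is straightforward to implement. The sketch below is ours, not part of the notes: we index the stages from n = 0 (so the initial approximation is the straight line tξ(1), with values at {0, 1}), and an interval of length 2^{−n} receives midpoint noise of standard deviation 2^{−(n/2+1)}, as in (9):

```python
import numpy as np

rng = np.random.default_rng(2)

def levy_refine(x, n):
    # x holds the current approximation at the dyadic points k 2^{-n};
    # insert midpoint values: average of neighbours plus
    # 2^{-(n/2+1)} times a fresh standard normal, as in (9)
    noise = 2.0 ** (-(n / 2 + 1)) * rng.standard_normal(len(x) - 1)
    mid = 0.5 * (x[:-1] + x[1:]) + noise
    out = np.empty(2 * len(x) - 1)
    out[0::2] = x     # values at the old dyadic points are unchanged
    out[1::2] = mid
    return out

# start from the straight line t*xi(1), i.e. values (0, xi(1)) at {0, 1}
x = np.array([0.0, rng.standard_normal()])
for n in range(10):
    x = levy_refine(x, n)

# x now approximates a Brownian path at the points k 2^{-10}; its
# increments over consecutive dyadic intervals have variance 2^{-10}
inc = np.diff(x)
```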
Lemma 3.5

    P[ lim_{n→∞} Xn(t) exists for 0 ≤ t ≤ 1, uniformly in t ] = 1.
Proof: Notice that max_t |Xn+1(t) − Xn(t)| will be attained at a vertex, that is, for t ∈ {(2k − 1)2^{−(n+1)}, k = 1, 2, ..., 2^n}, and using (9),

    P[ max_t |Xn+1(t) − Xn(t)| ≥ 2^{−n/4} ]
        = P[ max_{1≤k≤2^n} |ξ((2k − 1)2^{−(n+1)})| ≥ 2^{n/4+1} ]
        ≤ 2^n P[ |ξ(1)| ≥ 2^{n/4+1} ].
Now

    P[ξ(1) ≥ x] ≤ (1/(x√(2π))) e^{−x²/2}

(exercise) and, combining this with the fact that exp(−2^{n/2+1}) < 2^{−2n+2}, we deduce that, for n sufficiently large,

    P[ max_t |Xn+1(t) − Xn(t)| ≥ 2^{−n/4} ] ≤ 2^{−n}.
[Figure: the successive polygonal approximations X1, X2, X3 of the Lévy construction.]
Consider now, for k > n,

    P[ max_t |Xk(t) − Xn(t)| ≥ 2^{−n/4+3} ] = 1 − P[ max_t |Xk(t) − Xn(t)| < 2^{−n/4+3} ]

and

    P[ max_t |Xk(t) − Xn(t)| ≤ 2^{−n/4+3} ]
        ≥ P[ Σ_{j=n}^{k−1} max_t |Xj+1(t) − Xj(t)| ≤ 2^{−n/4+3} ]
        ≥ P[ max_t |Xj+1(t) − Xj(t)| ≤ 2^{−j/4} for j = n, ..., k − 1 ]
        ≥ 1 − Σ_{j=n}^{k−1} 2^{−j} ≥ 1 − 2^{−n+1}

for all k ≥ n, where in the middle inequality we used Σ_{j=n}^∞ 2^{−j/4} ≤ 2^{−n/4+3}. The events on the left are increasing (since the maximum can only increase by the addition of a new vertex), so

    P[ max_t |Xk(t) − Xn(t)| ≥ 2^{−n/4+3} for some k > n ] ≤ 2^{−n+1}.
Lemma 3.6 Let X(t) := lim_{n→∞} Xn(t) if the limit exists uniformly, and zero otherwise. Then X(t) satisfies the conditions of Definition 3.3 (for t restricted to [0, 1]).
Proof: By construction, properties i–iii of Definition 3.3 hold for the approximation Xn(t) restricted to Tn = {k2^{−n}, k = 0, 1, ..., 2^n}. Since we don't change Xk on Tn for k > n, the same must be true for X on ∪_{n=1}^∞ Tn. A uniform limit of continuous functions is continuous, so condition iv holds and now, by approximation of any 0 ≤ t1 ≤ t2 ≤ ... ≤ tn ≤ 1 from within the dense set ∪_{n=1}^∞ Tn, we see that in fact all four properties hold without restriction for t ∈ [0, 1].
3.2 Wiener Measure
Let C(R+, R) be the space of continuous functions from [0, ∞) to R. Given a Brownian motion (Bt : t ≥ 0) on (Ω, F, P), consider the map

    Ω ∋ ω ↦ (Bt(ω) : t ≥ 0) ∈ C(R+, R),   (10)

which is measurable w.r.t. B(C(R+, R)), the smallest σ-algebra such that the coordinate mappings (i.e. (ωt : t ≥ 0) ↦ ω(t0) for a fixed t0) are measurable. (In fact B(C(R+, R)) is also the Borel σ-algebra generated by the topology of uniform convergence on compacts.)
Definition 3.7 The Wiener measure W is the image of P under the mapping in (10);
it is the probability measure on the space of continuous functions such that the
canonical process, i.e. (Bt (ω) = ω(t),t ≥ 0), is a Brownian motion.
In other words, W is the unique probability measure on (C(R+, R), B(C(R+, R))) such that, for every 0 < t1 < t2 < ... < tn and A1, ..., An ∈ B(R),

    W[ω : ω(t1) ∈ A1, ..., ω(tn) ∈ An] = ∫_{A1×⋯×An} Π_{i=1}^n p(ti − ti−1, yi−1, yi) dy1 ⋯ dyn,

where y0 := 0 and t0 := 0.
(Uniqueness follows from the Monotone Class Lemma, since B(C(R+, R)) is generated by the finite-dimensional projections.)
Proposition 3.9 Let B be a standard real-valued Brownian motion. Then
i. −Bt is also a Brownian motion, (symmetry)
ii. for every c > 0, cB_{t/c²} is a Brownian motion, (scaling)
iii. X0 := 0, Xt := tB_{1/t} for t > 0, is a Brownian motion. (time inversion)
Definition 3.10 Let π be a partition of [0, T ], N(π) the number of intervals that
make up π and δ (π) be the mesh of π (that is the length of the longest interval in
the partition). Write 0 = t0 < t1 < . . . < tN(π) = T for the endpoints of the intervals
of the partition. Then the variation of a function f : [0, T] → R is

    lim_{δ→0} sup_{π: δ(π)=δ} Σ_{j=1}^{N(π)} |f(tj) − f(tj−1)|.
If the function is ‘nice’, for example differentiable, then it has bounded variation.
Our ‘rough’ paths will have unbounded variation. To quantify roughness we can
extend the idea of variation to that of p-variation.
Definition 3.11 In the notation of Definition 3.10, the p-variation of a function f : [0, T] → R is defined as

    lim_{δ→0} sup_{π: δ(π)=δ} Σ_{j=1}^{N(π)} |f(tj) − f(tj−1)|^p.
Notice that for p > 1, the p-variation will be finite for functions that are much rougher than those for which the variation is bounded. For example, roughly speaking, finite 2-variation will follow if the fluctuation of the function over an interval of length of order δ is of order √δ.
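The contrast is easy to see numerically. The sketch below (our own illustration; grid sizes and the seed are arbitrary) evaluates the variation and 2-variation sums of one sampled Brownian path along refining dyadic partitions: the first blows up as the mesh shrinks, while the second stays close to T = 1:

```python
import numpy as np

rng = np.random.default_rng(3)

# one Brownian path on [0, 1], sampled at 2^16 equally spaced points
n = 2**16
B = np.concatenate([[0.0], np.cumsum(rng.standard_normal(n) * np.sqrt(1.0 / n))])

for k in [8, 10, 12, 14]:              # dyadic partitions of mesh 2^{-k}
    inc = np.diff(B[:: n // 2**k])     # increments over the partition
    v1 = np.abs(inc).sum()             # variation sum: grows like 2^{k/2}
    v2 = (inc**2).sum()                # 2-variation sum: stays near T = 1
    print(k, round(v1, 1), round(v2, 3))
```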
For a typical Brownian path, the 2-variation will be infinite. However, a slightly
weaker analogue of the 2-variation does exist.
Theorem 3.12 Let Bt denote Brownian motion under P and, for a partition π of [0, T], define

    S(π) = Σ_{j=1}^{N(π)} (Btj − Btj−1)².

Then

    E[|S(π) − T|²] → 0 as δ(π) → 0,   (11)

i.e. S(π) converges to T in L² as the mesh of the partition tends to zero.
We say that the quadratic variation process of Brownian motion, which we denote by {⟨B⟩t}t≥0, is ⟨B⟩t = t. More generally, we can define the quadratic variation process associated with any bounded continuous martingale.
Definition 3.13 Suppose that {Mt}t≥0 is a bounded continuous P-martingale. The quadratic variation process associated with {Mt}t≥0 is the process {⟨M⟩t}t≥0 such that, for any sequence of partitions πn of [0, T] with δ(πn) → 0,

    E[ ( Σ_{j=1}^{N(πn)} (Mtj − Mtj−1)² − ⟨M⟩T )² ] → 0 as n → ∞.   (12)
Remark: We don’t prove it here, but the limit in (12) will be independent of the
sequence of partitions.
Proof of Theorem 3.12: We expand the expression inside the expectation in (11) and make use of our knowledge of the normal distribution. Let {t_{n,j}}_{j=0}^{N(πn)} denote the endpoints of the intervals that make up the partition πn. First observe that

    |S(πn) − T|² = ( Σ_{j=1}^{N(πn)} { (Bt_{n,j} − Bt_{n,j−1})² − (t_{n,j} − t_{n,j−1}) } )².

It is convenient to write δ_{n,j} for (Bt_{n,j} − Bt_{n,j−1})² − (t_{n,j} − t_{n,j−1}). Then

    |S(πn) − T|² = Σ_{j=1}^{N(πn)} δ_{n,j}² + 2 Σ_{j<k} δ_{n,j} δ_{n,k},

and, by independence of the increments, E[δ_{n,j} δ_{n,k}] = 0 for j ≠ k. For a normally distributed random variable X, with mean zero and variance λ, E[|X|⁴] = 3λ², so we have

    E[δ_{n,j}²] = 3(t_{n,j} − t_{n,j−1})² − 2(t_{n,j} − t_{n,j−1})² + (t_{n,j} − t_{n,j−1})² = 2(t_{n,j} − t_{n,j−1})².
Summing over j,

    E[|S(πn) − T|²] = 2 Σ_{j=1}^{N(πn)} (t_{n,j} − t_{n,j−1})² ≤ 2 Σ_{j=1}^{N(πn)} δ(πn) (t_{n,j} − t_{n,j−1}) = 2δ(πn)T → 0 as n → ∞.
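The computation in the proof can be checked by simulation. For the uniform partition of [0, T] into N intervals, the argument gives E|S(π) − T|² = 2 Σ (Δt)² = 2T²/N exactly; the sketch below (our own check; parameters arbitrary) estimates this expectation over many independent paths:

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, reps = 1.0, 256, 20_000       # uniform partition of [0, T], mesh T/N

# increments of `reps` independent Brownian paths over the partition
inc = rng.standard_normal((reps, N)) * np.sqrt(T / N)
S = (inc**2).sum(axis=1)            # S(pi) for each path

mse = np.mean((S - T) ** 2)
# the proof gives, for the uniform partition, exactly 2 * T^2 / N
assert abs(mse - 2 * T**2 / N) < 1e-3
```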
Corollary 3.14 Brownian sample paths are of infinite variation on any interval
almost surely.
Corollary 3.15 Brownian sample paths are almost surely nowhere locally Hölder continuous of order γ > 1/2.
Theorem (Blumenthal's 0–1 law) The germ σ-algebra F0+ := ∩_{t>0} Ft is trivial, i.e. P[A] ∈ {0, 1} for every A ∈ F0+.

Proof. Let 0 < t1 < t2 < ... < tk and let g : R^k → R be a bounded continuous function. Also, fix A ∈ F0+. Then, by a continuity argument,

    E[1_A g(Bt1, ..., Btk)] = lim_{ε↓0} E[1_A g(Bt1 − Bε, ..., Btk − Bε)].

If 0 < ε < t1, the variables Bt1 − Bε, ..., Btk − Bε are independent of Fε (by the Markov property) and thus also of F0+. It follows that

    E[1_A g(Bt1, ..., Btk)] = lim_{ε↓0} P[A] E[g(Bt1 − Bε, ..., Btk − Bε)] = P[A] E[g(Bt1, ..., Btk)].

We have thus obtained that F0+ is independent of σ(Bt1, ..., Btk). Since this holds for any finite collection {t1, ..., tk} of (strictly) positive reals, F0+ is independent of σ(Bt, t > 0). However, σ(Bt, t > 0) = σ(Bt, t ≥ 0), since B0 is the pointwise limit of Bt as t → 0. Since F0+ ⊂ σ(Bt, t ≥ 0), we conclude that F0+ is independent of itself and so must be trivial.
Proposition Let B be a standard Brownian motion.
i. Then, a.s., for every ε > 0,

    sup_{0≤s≤ε} Bs > 0 and inf_{0≤s≤ε} Bs < 0.

ii. Moreover, a.s.,

    sup_{s≥0} Bs = +∞ and inf_{s≥0} Bs = −∞.
Remark 3.18 It is not a priori obvious that sup0≤s≤ε Bs is even measurable, since
this is an uncountable supremum of random variables, but since sample paths are
continuous, we can restrict to rational values of s ∈ [0, ε] so that we are taking the
supremum over a countable set. We implicitly use this observation in what follows.
Proof. (i) Let (εp)p≥0 be a sequence of strictly positive reals decreasing to zero and set A := ∩_{p≥0} {sup_{0≤s≤εp} Bs > 0}. Since this is a monotone decreasing intersection,

    P[A] = lim_{p→∞} P[ sup_{0≤s≤εp} Bs > 0 ],

and

    P[ sup_{0≤s≤εp} Bs > 0 ] ≥ P[Bεp > 0] = 1/2.

So P[A] ≥ 1/2 and, since A ∈ F0+, by Blumenthal's 0–1 law P[A] = 1. Hence, a.s., for all ε > 0, sup_{0≤s≤ε} Bs > 0. Replacing B by −B we obtain P[inf_{0≤s≤ε} Bs < 0] = 1.
(ii) Write

    1 = P[ sup_{0≤s≤1} Bs > 0 ] = lim_{δ↓0} ↑ P[ sup_{0≤s≤1} Bs > δ ],

where ↑ indicates an increasing limit. Now use the scale invariance property (B^λ_t := B_{λ²t}/λ is a Brownian motion), with λ = δ, to see that for any δ > 0,

    P[ sup_{0≤s≤1} Bs > δ ] = P[ sup_{0≤s≤1/δ²} B^δ_s > 1 ] = P[ sup_{0≤s≤1/δ²} Bs > 1 ].   (13)

Letting δ ↓ 0, we find P[sup_{s≥0} Bs > 1] = 1 and, applying the scaling property once more,

    P[ sup_{s≥0} Bs > M ] = 1   for every M > 0,
and, replacing B with −B,

    P[ inf_{s≥0} Bs < −M ] = 1.
Proposition 4.2 An adapted process (Xt ) whose paths are all right-continuous (or
are all left-continuous) is progressively measurable.
Definition 4.3 A filtration (Ft )t≥0 (or the filtered space (Ω, F , (Ft )t≥0 , P)) is said
to satisfy the usual conditions if it is right-continuous and complete.
The ‘first time a certain phenomenon occurs’ will be a stopping time. Our fundamental examples will be first hitting times of sets. If (Xt) is a stochastic process and Γ ∈ B(R), we set

    HΓ := inf{t ≥ 0 : Xt ∈ Γ}.

Then:
i. if (Xt) has right-continuous paths, then HΓ, for Γ an open set, is a stopping time relative to (Ft+);
ii. if (Xt) has continuous paths, then HΓ, for Γ a closed set, is a stopping time relative to (Ft).
With a stopping time we can associate ‘the information known at time τ’:

    Fτ := {A ∈ F : A ∩ {τ ≤ t} ∈ Ft ∀t ≥ 0},
    Fτ+ := {A ∈ F : A ∩ {τ < t} ∈ Ft ∀t ≥ 0},
    Fτ− := σ({A ∩ {τ > t} : t ≥ 0, A ∈ Ft}).
(iii) If τ = t then Fτ = Ft , Fτ+ = Ft+ .
Proof.
We give a proof of (v).
Note that {ρ ≤ t} = {ρ ≤ t} ∩ {τ ≤ t} (since ρ ≥ τ), and this belongs to Ft since ρ is Fτ-measurable. Hence ρ is a stopping time. We have τn ↓ τ by definition, and clearly τn is Fτ-measurable since τ is Fτ-measurable.
It is often useful to be able to ‘stop’ a process at a stopping time and know
that the result still has nice measurability properties. If (Xt )t≥0 is progressively
measurable and τ is a stopping time, then X τ := (Xt∧τ : t ≥ 0) is progressively
measurable.
We’re going to use the sequence τn of stopping times in (15) to prove an impor-
tant generalisation of the Markov property for Brownian motion called the strong
Markov property. Recall that the Markov property says that Brownian motion has
‘no memory’ - we can start it again from Bs and Bt+s − Bs is just a Brownian
motion, independent of the path followed by B up to time s. The strong Markov
property says that the same is true if we replace s by a stopping time.
Granted (17), taking A = Ω, we find that B and B(τ) have the same finite dimen-
sional distributions, and since B(τ) has continuous paths, it must be a Brownian
motion. On the other hand (as usual using a monotone class argument), (17) says that (B^{(τ)}_{t1}, ..., B^{(τ)}_{tp}) is independent of Fτ, and so B^{(τ)} is independent of Fτ.
To establish (17), first observe that, by continuity of B and F,

    F(B^{(τ)}_{t1}, ..., B^{(τ)}_{tp}) = lim_{n→∞} Σ_{k=0}^∞ 1_{{(k−1)2^{−n} < τ ≤ k2^{−n}}} F(B_{k2^{−n}+t1} − B_{k2^{−n}}, ..., B_{k2^{−n}+tp} − B_{k2^{−n}})   a.s.
It was not until the 1940s that Doob properly formulated the strong Markov property, and it was 1956 before Hunt proved it for Brownian motion.
The following result, known as the reflection principle, was known at the end
of the 19th Century for random walk and appears in the famous 1900 thesis of
Bachelier, which introduced the idea of modelling stock prices using Brownian
motion (although since he had no formulation of the strong Markov property, his
proof is not rigorous).
Theorem (Reflection principle) Let (Bt)t≥0 be a standard Brownian motion, St := sup_{u≤t} Bu, and let a > 0 and b ≤ a. Then

    P[St ≥ a, Bt ≤ b] = P[Bt ≥ 2a − b].

Proof. We apply the strong Markov property to the stopping time Ta = inf{t > 0 : Bt = a}. We have already seen that Ta < ∞ a.s. and so, in the notation of Theorem 4.8,

    P[St ≥ a, Bt ≤ b] = P[Ta ≤ t, Bt ≤ b] = P[Ta ≤ t, B^{(Ta)}_{t−Ta} ≤ b − a]   (18)

(since B^{(Ta)}_{t−Ta} = Bt − B_{Ta} = Bt − a).
Now B^{(Ta)} is a Brownian motion, independent of F_{Ta} and hence of Ta. Since B^{(Ta)} has the same distribution as −B^{(Ta)}, (Ta, B^{(Ta)}) has the same distribution as (Ta, −B^{(Ta)}). So

    P[Ta ≤ t, B^{(Ta)}_{t−Ta} ≤ b − a] = P[Ta ≤ t, −B^{(Ta)}_{t−Ta} ≤ b − a]
        = P[Ta ≤ t, Bt ≥ 2a − b] = P[Bt ≥ 2a − b],

where the final equality holds because b ≤ a, so that {Bt ≥ 2a − b} ⊆ {Ta ≤ t}.
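The identity P[St ≥ a, Bt ≤ b] = P[Bt ≥ 2a − b] can be tested by Monte Carlo on discretised paths. A sketch (our own; the discretisation slightly underestimates the running maximum, so only approximate agreement should be expected):

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(5)
reps, N, t = 20_000, 1000, 1.0
a, b = 1.0, 0.5    # check P[S_t >= a, B_t <= b] = P[B_t >= 2a - b]

# discretised Brownian paths on [0, t]
inc = rng.standard_normal((reps, N)) * np.sqrt(t / N)
B = np.cumsum(inc, axis=1)

S = B.max(axis=1)                              # (discretised) running maximum
lhs = np.mean((S >= a) & (B[:, -1] <= b))
rhs = 0.5 * erfc((2 * a - b) / sqrt(2 * t))    # P[B_t >= 2a - b]

# agreement up to Monte Carlo and discretisation error
assert abs(lhs - rhs) < 0.02
```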
5.1 Definitions
Definition 5.1 An adapted stochastic process (Xt)t≥0 such that Xt ∈ L¹(P) (i.e. E[|Xt|] < ∞) for every t ≥ 0, is called
i. a martingale if, for every 0 ≤ s ≤ t, E[Xt|Fs] = Xs;
ii. a supermartingale if, for every 0 ≤ s ≤ t, E[Xt|Fs] ≤ Xs;
iii. a submartingale if, for every 0 ≤ s ≤ t, E[Xt|Fs] ≥ Xs.
In particular, Bt, Bt² − t and e^{θBt − θ²t/2} are all martingales with respect to a filtration (Ft)t≥0 for which (Bt)t≥0 is a Brownian motion.
Warning: It is important to remember that a process is a martingale with respect
to a filtration - giving yourself more information (enlarging the filtration) may
destroy the martingale property. For us, even when we don’t explicitly mention it,
there is a filtration implicitly assumed (usually the natural filtration associated with
the process, augmented to satisfy the usual conditions).
Given a martingale (or submartingale) it is easy to generate many more.
Theorem 5.4 (Doob's maximal and L^p inequalities) If (Xt)t≥0 is a right-continuous martingale or positive submartingale, then for any T ≥ 0 and λ > 0,

    λ^p P[ sup_{t≤T} |Xt| ≥ λ ] ≤ E[|XT|^p],   p ≥ 1,   (19)

and

    E[ sup_{t≤T} |Xt|^p ] ≤ (p/(p − 1))^p E[|XT|^p],   p > 1.
As an application of Doob’s maximal inequality, we derive a useful bound for
Brownian motion.
Proposition 5.5 Let (Bt)t≥0 be Brownian motion and St = sup_{u≤t} Bu. For any λ > 0 we have

    P[St ≥ λt] ≤ e^{−λ²t/2}.
Proof. Recall that e^{αBt − α²t/2}, t ≥ 0, is a non-negative martingale. It follows that, for α ≥ 0,

    P[St ≥ λt] ≤ P[ sup_{u≤t} e^{αBu − α²t/2} ≥ e^{αλt − α²t/2} ]
        ≤ P[ sup_{u≤t} e^{αBu − α²u/2} ≥ e^{αλt − α²t/2} ]
        ≤ e^{−αλt + α²t/2} E[e^{αBt − α²t/2}] = e^{−αλt + α²t/2},

where the last line is Doob's maximal inequality (19) with p = 1, together with E[e^{αBt − α²t/2}] = 1. Taking α = λ, which minimises the right-hand side, gives the result.
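The resulting bound is easy to compare with simulation (our own sketch; parameters arbitrary). Note the bound is not tight: the reflection principle gives the exact value P[St ≥ λt] = 2P[Bt ≥ λt].

```python
import numpy as np

rng = np.random.default_rng(6)
reps, N, t, lam = 20_000, 1000, 1.0, 2.0

# discretised Brownian paths and their running maxima S_t
inc = rng.standard_normal((reps, N)) * np.sqrt(t / N)
S = np.cumsum(inc, axis=1).max(axis=1)

p_emp = np.mean(S >= lam * t)
bound = np.exp(-lam**2 * t / 2)   # the bound e^{-lambda^2 t / 2}
assert p_emp <= bound             # the bound holds, with plenty of slack
```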
Proof. Take a sequence of rational numbers 0 = t0 < t1 < ... < tn = T. Applying Theorem 5.6 with S = min{ti : Xti ≥ λ} ∧ T, we obtain

    E[X0] ≥ E[XS] ≥ λ P( sup_{1≤i≤n} Xti ≥ λ ) + E[XT ; sup_{1≤i≤n} Xti < λ].

Rearranging,

    λ P( sup_{1≤i≤n} Xti ≥ λ ) ≤ E[X0] + E[XT−]

(where XT− = −min(XT, 0)). Now (Xt−)t≥0 is a non-negative submartingale, and so we can apply Doob's inequality directly to it, from which

    λ P( sup_{1≤i≤n} Xti− ≥ λ ) ≤ E[XT−],

and, since E[XT−] ≤ E[|XT|], taking the (monotone) limit along nested sequences in [0, T] ∩ Q gives the result.
exists for any real t > 0.
Furthermore, the function g : R+ → R defined by g(t) = f (t+) is càdlàg (‘con-
tinue à droite avec des limites à gauche’; i.e. right continuous with left limits) at
every t > 0.
Lemma 5.10 (Doob's upcrossing lemma in discrete time) Let (Xt)t≥0 be a supermartingale and F a finite subset of [0, T]. If a < b, then the number U([a, b], (Xt)t∈F) of upcrossings of [a, b] by (Xt)t∈F satisfies

    E[U([a, b], (Xt)t∈F)] ≤ (1/(b − a)) E[(XT − a)−].
    ∀t ∈ (0, ∞),   lim_{r↑t, r∈Q} Xr(ω) and lim_{r↓t, r∈Q} Xr(ω) exist and are finite.   (21)
Proof. Fix T > 0. From Lemmas 5.7 and 5.10, there exists ΩT ⊆ Ω, with P(ΩT) = 1, such that for any ω ∈ ΩT,

    ∀a, b ∈ Q with a < b,   U([a, b], (Xt(ω) : t ∈ [0, T] ∩ Q)) < ∞,

and

    sup_{t∈[0,T]∩Q} |Xt(ω)| < ∞.

It follows that the limits in (21) are well defined and finite for all t ≤ T and ω ∈ ΩT. To complete the proof, take Ω := Ω1 ∩ Ω2 ∩ Ω3 ∩ ⋯.
Using this, even if X is not right-continuous, its right-continuous version is a.s.
well defined. The following fundamental regularisation result is again due to Doob.
Theorem 5.13 Let X be a supermartingale with respect to a right-continuous and
complete filtration (Ft ). If t 7→ E[Xt ] is right continuous (e.g. if X is a mar-
tingale) then X admits a modification with càdlàg paths, which is also an (Ft )-
supermartingale.
Proof. By Theorem 5.12, there exists Ω0 ⊆ Ω, with P[Ω0] = 1, such that the process

    Xt+(ω) := lim_{r↓t, r∈Q} Xr(ω) for ω ∈ Ω0,   Xt+(ω) := 0 for ω ∉ Ω0,

is well defined and adapted to (Ft). By Lemma 5.9, it has càdlàg paths.
To check that we really have only produced a modification of Xt, that is Xt = Xt+ almost surely, let tn ↓ t be a sequence of rationals. Set Yk := X_{t_{−k}} for every integer k ≤ 0. Then Y is a backwards supermartingale with respect to the (backward) discrete filtration Hk := F_{t_{−k}}, and sup_{k≤0} E[|Yk|] < ∞. The convergence theorem for backwards supermartingales (see Appendix C) then implies that Xtn converges to Xt+ in L¹. In particular, Xt+ ∈ L¹ and, thanks to L¹ convergence, we can pass to the limit n → ∞ in the inequality Xt ≥ E[Xtn|Ft] to obtain Xt ≥ E[Xt+|Ft].
Right continuity of t ↦ E[Xt] implies E[Xt+ − Xt] = 0, so that Xt = Xt+ almost surely.
Now, to check the supermartingale property, let s < t and let (s_n)_{n≥0} be a sequence of rationals decreasing to s, with s_n < t for all n. Then, as above, X_{s_n} → X_{s+} in L¹, so if A ∈ F_{s+} (which implies A ∈ F_{s_n} for every n), then with t_n as above we may pass to the limit in E[1_A X_{s_n}] ≥ E[1_A X_{t_n}] to obtain E[1_A X_{s+}] ≥ E[1_A X_{t+}].
Since this holds for all A ∈ Fs+ and since Xs+ and E[Xt+ |Fs+ ] are both Fs+ -
measurable, this shows
Xs+ ≥ E[Xt+ |Fs+ ].
For martingales all the inequalities can be replaced by equalities and for submartin-
gales we use that −X is a supermartingale.
Remark 5.15 Let’s make some comments on the assumptions of the theorem.
i. The assumption that the filtration is right continuous is necessary. For ex-
ample, let Ω = {−1, +1}, P[{1}] = 1/2 = P[{−1}]. We set ε(ω) = ω and
Xt = 0 if 0 ≤ t ≤ 1, Xt = ε if t > 1. Then X is a martingale with respect
to the canonical filtration (which is complete since there are no nonempty
negligible sets), but no modification of X can be right continuous at t = 1.
ii. Similarly, take Xt = f (t), where f (t) is deterministic, non-increasing and not
right continuous. Then no modification can have right continuous sample
paths.
Proof. Let D be a countable dense subset of R_+. We showed before that for a < b,
E[ U([a,b], (X_t)_{t∈D∩[0,T]}) ] ≤ (1/(b−a)) E[(X_T − a)^−].
By the Monotone Convergence Theorem,
E[ U([a,b], (X_t)_{t∈D}) ] ≤ (1/(b−a)) sup_{t≥0} E[(X_t − a)^−] < ∞.
Theorem 5.18 Let (X_t : t ≥ 0) be a martingale with right continuous sample paths. Then TFAE:
i. X is closed;
ii. (X_t)_{t≥0} is uniformly integrable;
iii. X_t converges almost surely and in L¹ as t → ∞ to a limit X_∞, and X_s = E[X_∞ | F_s] for every s ≥ 0.
Proof. That the first condition implies the second is easy: if Z ∈ L¹, then the family of conditional expectations E[Z|G], as G varies over the sub-σ-fields of F, is uniformly integrable.
If the second condition holds, then, in particular, (Xt )t≥0 is bounded in L1 and,
by Theorem 5.16, Xt → X∞ almost surely. By the uniform integrability, we also
have convergence in L1 , so the third condition is satisfied.
Finally, if the third condition holds, for every s ≥ 0, pass to the limit as t → ∞ in
the equality Xs = E[Xt |Fs ] (using the fact that conditional expectation is continuous
for the L1 -norm) and obtain Xs = E[X∞ |Fs ].
We should now like to establish conditions under which we have an optional
stopping theorem for continuous martingales. As usual, our starting point will be
the corresponding discrete time result and we shall pass to a suitable limit.
Theorem 5.19 (Optional stopping in discrete time) Let (Y_n)_{n∈N} be a uniformly integrable martingale with respect to a discrete filtration (G_n)_{n∈N}, and let S ≤ T be stopping times. Then Y_S and Y_T are in L¹ and
Y_S = E[ Y_T | G_S ],
where
G_S = {A ∈ G_∞ : A ∩ {S = n} ∈ G_n for every n ∈ N},
with the convention that Y_T = Y_∞ on the event {T = ∞}, and similarly for Y_S.
Let (Xt )t≥0 be a right continuous martingale or supermartingale such that Xt con-
verges almost surely as t → ∞ to a limit X∞ . Then for every stopping time T , we
define
XT (ω) = 1{T (ω)<∞} XT (ω) (ω) + 1{T (ω)=∞} X∞ (ω).
Theorem 5.20 Let (Xt )t≥0 be a uniformly integrable martingale with right contin-
uous sample paths. Let S and T be two stopping times with S ≤ T . Then XS and XT
are in L1 and XS = E[XT |FS ].
In particular, for every stopping time S we have XS = E[X∞ |FS ] and E[XS ] =
E[X∞ ] = E[X0 ].
Proof. For each n ≥ 1, define
S_n = Σ_{k=0}^∞ ((k+1)/2^n) 1_{{k 2^{−n} < S ≤ (k+1) 2^{−n}}} + ∞ · 1_{{S=∞}},
and define T_n from T in the same way. Then (T_n) and (S_n) are sequences of stopping times that decrease respectively to T and S. Moreover, S_n ≤ T_n for every n ≥ 0.
For each fixed n, 2^n S_n and 2^n T_n are stopping times of the discrete filtration H_k^{(n)} = F_{k/2^n}, and Y_k^{(n)} = X_{k/2^n} is a discrete martingale with respect to this filtration. From Theorem 5.19, Y^{(n)}_{2^n S_n} and Y^{(n)}_{2^n T_n} are in L¹ and
X_{S_n} = Y^{(n)}_{2^n S_n} = E[ Y^{(n)}_{2^n T_n} | H^{(n)}_{2^n S_n} ] = E[ X_{T_n} | F_{S_n} ].
Let A ∈ FS . Since FS ⊆ FSn we have A ∈ FSn and so E[1A XSn ] = E[1A XTn ]. By
right continuity, XS = limn→∞ XSn and XT = limn→∞ XTn . The limits also hold in
L1 (in fact, by Theorem 5.19, XSn = E[X∞ |FSn ] for every n and so (XSn )n≥1 and
(XTn )n≥1 are uniformly integrable). L1 convergence implies that the limits XS and
XT are in L1 and allows us to pass to a limit, E[1A XS ] = E[1A XT ]. This holds for
all A ∈ FS and so since XS is FS -measurable we conclude that XS = E[XT |FS ], as
required.
Corollary 5.21 In particular, for any martingale with right continuous paths and
two bounded stopping times, S ≤ T , we have XS , XT ∈ L1 and XS = E[XT |FS ].
Proof. Let a be such that S ≤ T ≤ a. The martingale (Xt∧a )t≥0 is closed by Xa and
so we may apply our previous results.
Corollary 5.22 Suppose that (Xt )t≥0 is a martingale with right continuous paths
and T is a stopping time.
ii. if, in addition, (Xt )t≥0 is uniformly integrable, then (Xt∧T )t≥0 is uniformly
integrable and for every t ≥ 0, Xt∧T = E[XT |Ft ].
Example 5.23 Fix a > 0 and let T_a be the first hitting time of a by standard Brownian motion. Then for each λ > 0,
E[ e^{−λ T_a} ] = e^{−a √(2λ)}.
Recall that N_t^λ = exp( λ B_t − (λ²/2) t ) is a martingale. So N^λ_{t∧T_a} is still a martingale; it is bounded above by e^{aλ} and hence uniformly integrable, so E[N^λ_{T_a}] = E[N^λ_0]. That is,
e^{aλ} E[ e^{−λ² T_a / 2} ] = E[N^λ_0] = 1.
Replace λ by √(2λ) and rearrange.
Warning: This argument fails if λ < 0, the reason being that we lose the uniform integrability.
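A quick numerical sanity check of this identity (not part of the argument above): integrating e^{−λt} against the density of T_a, f(t) = a e^{−a²/(2t)}/√(2πt³), which follows from the reflection principle, should reproduce e^{−a√(2λ)}. A minimal sketch; the truncation level and grid size are ad hoc choices:

```python
# Numerical check of E[exp(-lam * T_a)] = exp(-a * sqrt(2 * lam)) by quadrature
# against the density of T_a (a consequence of the reflection principle):
#     f(t) = a * exp(-a^2 / (2 t)) / sqrt(2 * pi * t^3),   t > 0.
import numpy as np

def laplace_transform_hitting_time(a, lam, t_max=200.0, n=1_000_000):
    """Trapezoidal approximation of int_0^{t_max} exp(-lam*t) f(t) dt."""
    t = np.linspace(1e-8, t_max, n)
    density = a * np.exp(-a**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t**3)
    y = np.exp(-lam * t) * density
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

a, lam = 1.0, 1.0
numeric = laplace_transform_hitting_time(a, lam)
exact = float(np.exp(-a * np.sqrt(2.0 * lam)))   # e^{-sqrt(2)}
```

The truncation at t_max is harmless here because e^{−λt} kills the heavy tail of T_a.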
6 Continuous semimartingales
Recall that our original goal was to make sense of differential equations driven
by ‘rough’ inputs. In fact, we’ll recast our differential equations as integral equa-
tions and so we must develop a theory that allows us to integrate with respect to
‘rough’ driving processes. The class of processes with which we work are called
semimartingales, and we shall specialise to the continuous ones.
We’re going to start with functions for which the integration theory that you
already know is adequate - these are called functions of finite variation.
Throughout, we assume that a filtered probability space (Ω, F , (Ft ), P) satis-
fying the usual conditions is given.
where the supremum is over partitions π = {0 = t0 < t1 < . . . < tnπ = T } of [0, T ].
We say that a is of finite variation on [0, T ] if V (a)T < ∞. The function a is of finite
variation if V (a)T < ∞ for all T ≥ 0 and of bounded variation if limT →∞ V (a)T < ∞.
Proposition 6.3 The function a is of finite variation if and only if it is equal to the
difference of two non-decreasing functions, a1 and a2 .
Moreover, if a is of finite variation, then a1 and a2 can be chosen so that
V (a)t = a1 (t) + a2 (t). If a is càdlàg then V (a)t is also càdlàg.
Proof. Observe that
V(a)_t − ( a(t) − a(0) ) = sup_π Σ_{i=0}^{n(π)−1} ( |a(t_{i+1}) − a(t_i)| − ( a(t_{i+1}) − a(t_i) ) ),
in which every summand is non-negative; hence t ↦ V(a)_t − a(t) is non-decreasing, and similarly t ↦ V(a)_t + a(t). Taking a_1 = (V(a) + a)/2 and a_2 = (V(a) − a)/2 gives the decomposition, with V(a)_t = a_1(t) + a_2(t).
then we can develop a theory of integration with respect to a by declaring that
∫_0^t f(s) da(s) = ∫_0^t f(s) μ_+(ds) − ∫_0^t f(s) μ_−(ds),
provided that
∫_0^t |f(s)| |μ|(ds) = ∫_0^t |f(s)| ( μ_+(ds) + μ_−(ds) ) < ∞.
In our ‘dot’-notation:
g · ( f · a) = (g f ) · a. (22)
Proposition 6.5 (Stopping) Let a be of finite variation as above and fix t ≥ 0. Set a^t(s) = a(t ∧ s). Then a^t is of finite variation and for any measurable a-integrable function f,
∫_0^{u∧t} f(s) da(s) = ∫_0^u f(s) da^t(s) = ∫_0^u f(s) 1_{[0,t]}(s) da(s),   u ∈ [0, ∞].
Proof. The statement is trivially true for F(x) = x. Now by Proposition 6.6, it is
straightforward to check that if the statement is true for F, then it is also true for
xF(x). Hence, by induction, the statement holds for all polynomials. To complete
the proof, approximate F ∈ C1 by a sequence of polynomials.
Proof. The right continuity is immediate from the deterministic theory, but we
need to check that (K · A)t is adapted. For this we check that if t > 0 is fixed and
h : [0,t] × Ω → R is measurable with respect to B([0,t]) ⊗ Ft , and if
∫_0^t |h(s, ω)| |dA_s(ω)| < ∞
for every ω ∈ Ω, then
∫_0^t h(s, ω) dA_s(ω)
is F_t-measurable.
Fix t > 0. Consider first h defined by h(s, ω) = 1(u,v] (s)1Γ (ω) for (u, v] ⊆ [0,t]
and Γ ∈ Ft . Then
(h · A)t = 1Γ (Av − Au )
is Ft -measurable. By the Monotone Class Theorem, (h · A)t is Ft -measurable for
any h = 1G with G ∈ B([0,t]) ⊗ Ft , or, more generally, any bounded B([0,t]) ⊗
F_t-measurable function h. If h is a general B([0,t]) ⊗ F_t-measurable function satisfying
∫_0^t |h(s, ω)| |dA_s(ω)| < ∞   ∀ω ∈ Ω,
then h is a pointwise limit, h = lim_{n→∞} h_n, of simple functions with |h_n| ≤ |h|. The integrals ∫_0^t h_n(s, ω) dA_s(ω) converge by the Dominated Convergence Theorem, and hence ∫_0^t h(s, ω) dA_s(ω) is also F_t-measurable (as a limit of F_t-measurable functions).
It is worth recording that our integral can be obtained through the limiting
procedure that one might expect. Let f : [0,T] → R be continuous and let 0 = t_0^n < t_1^n < ⋯ < t_{p_n}^n = T be a sequence of partitions of [0,T] with mesh tending to zero. Then
∫_0^T f(s) da(s) = lim_{n→∞} Σ_{i=1}^{p_n} f(t_{i−1}^n) ( a(t_i^n) − a(t_{i−1}^n) ).
The proof is easy: let f_n : [0,T] → R be defined by f_n(s) = f(t_{i−1}^n) if s ∈ (t_{i−1}^n, t_i^n], 1 ≤ i ≤ p_n, and f_n(0) = 0. Then
Σ_{i=1}^{p_n} f(t_{i−1}^n) ( a(t_i^n) − a(t_{i−1}^n) ) = ∫_{[0,T]} f_n(s) μ(ds),
where µ is the signed measure associated with a. The desired result now follows
by the Dominated Convergence Theorem.
In the argument above, fn took the value of f at the left endpoint of each inter-
val. In the finite variation case, we could equally have approximated by fn taking
the value of f at the midpoint of the interval, or the right hand endpoint, or any
other point in between. However, we are going to extend our theory to processes
that do not have finite variation and then, even if f is continuous, it matters whether
fn takes the value of f at the left or right endpoint of the intervals.
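For integrators of finite variation this insensitivity to the evaluation point is easy to see numerically. A sketch with the ad hoc illustrative choices f(s) = cos s and a(s) = s² on [0,1], for which ∫_0^1 f da = ∫_0^1 2s cos s ds = 2(sin 1 + cos 1) − 2:

```python
# Left-, mid- and right-endpoint Riemann-Stieltjes sums against the finite
# variation integrator a(s) = s^2 all converge to the same limit.
import numpy as np

n = 100_000
t = np.linspace(0.0, 1.0, n + 1)
da = np.diff(t**2)                        # increments a(t_{i+1}) - a(t_i)

left  = float(np.sum(np.cos(t[:-1]) * da))               # f at left endpoints
right = float(np.sum(np.cos(t[1:]) * da))                # f at right endpoints
mid   = float(np.sum(np.cos(0.5 * (t[:-1] + t[1:])) * da))  # f at midpoints
exact = 2.0 * (np.sin(1.0) + np.cos(1.0)) - 2.0
```

All three sums agree with the exact value to within O(1/n); for a Brownian integrator, as we shall see, the left and midpoint choices converge to genuinely different limits.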
The processes that make our theory work are slight generalisations of martin-
gales.
6.3 Continuous local martingales
Definition 6.11 An adapted process (Mt : t ≥ 0) is called a continuous local mar-
tingale if M0 = 0, it has continuous trajectories a.s. and if there exists a non-
decreasing sequence of stopping times (τn )n≥1 such that τn ↑ ∞ a.s. and for each n,
M^{τ_n} = (M_{t∧τ_n} : t ≥ 0) is a (uniformly integrable) martingale. We say that (τ_n) reduces
M.
More generally, when we do not assume that M0 = 0, we say that M is a con-
tinuous local martingale if Nt = Mt − M0 is a continuous local martingale.
iii. If M is a continuous local martingale with M_0 = 0, then
T_n = inf{t ≥ 0 : |M_t| ≥ n}
reduces M.
iv. If M is a continuous local martingale, then for any stopping time ρ, the
stopped process M ρ is also a continuous local martingale.
Proof. (i) Write M_t = M_0 + N_t. By definition, there exists a sequence T_n of stopping times that reduces N. Thus, if s ≤ t, for every n,
M_{s∧T_n} = E[ M_{t∧T_n} | F_s ].
Since M takes non-negative values, let n → ∞ and apply Fatou's lemma for conditional expectations to find
M_s ≥ E[M_t | F_s].   (23)
Taking s = 0, E[Mt ] ≤ E[M0 ] < ∞. So Mt ∈ L1 for every t ≥ 0, and (23) says that
M is a supermartingale.
(ii) By the same argument, M_{s∧T_n} = E[M_{t∧T_n} | F_s]. Since |M_{t∧T_n}| ≤ Z, this time apply the Dominated Convergence Theorem to see that M_{t∧T_n} converges in L¹ (to M_t) and M_s = E[M_t | F_s].
The other two statements are immediate.
Theorem 6.13 A continuous local martingale M with M0 = 0 a.s., is a process of
finite variation if and only if M is indistinguishable from zero.
(where δ (π) is the mesh of π), by the Dominated Convergence Theorem (since
|Nti − Nti−1 | ≤ V (N)t ≤ n and so n is a dominating function).
It then follows by Fatou's Lemma that
E[M_t²] = E[ lim_{n→∞} M²_{t∧τ_n} ] ≤ liminf_{n→∞} E[ M²_{t∧τ_n} ] = 0.
We are now going to see that the analogue of this process exists for any continuous
local martingale. Ultimately, we shall see that the quadratic variation is in some
sense a ‘clock’ for a local martingale, but that will be made more precise in the
very last result of the course.
Theorem 6.14 Let M be a continuous local martingale. There exists a unique (up to indistinguishability) non-decreasing, continuous, adapted finite variation process (⟨M,M⟩_t : t ≥ 0), started from zero, such that (M_t² − ⟨M,M⟩_t : t ≥ 0) is a continuous local martingale.
Furthermore, for any T > 0 and any sequence of partitions π_n = {0 = t_0^n < t_1^n < … < t_{n(π_n)}^n = T} with δ(π_n) = sup_{1≤i≤n(π_n)} (t_i^n − t_{i−1}^n) → 0 as n → ∞,
⟨M,M⟩_T = lim_{n→∞} Σ_{i=1}^{n(π_n)} ( M_{t_i^n} − M_{t_{i−1}^n} )²,   (24)
where the limit is in probability.
To move to a general continuous local martingale, we consider a sequence of
stopped processes.
Details are in, for example, Le Gall’s book.
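The convergence (24) is easy to observe by simulation for Brownian motion, where ⟨B,B⟩_T = T. A sketch (not part of the notes), which also illustrates Theorem 6.13: the same path has enormous total variation:

```python
# Sum of squared increments of a simulated Brownian path on [0, 1] is close to
# <B,B>_1 = 1, while the sum of absolute increments (the total variation
# approximation) blows up like sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 1_000_000
dB = rng.normal(0.0, np.sqrt(T / n), size=n)     # B_{t_i} - B_{t_{i-1}}

quadratic_variation = float(np.sum(dB**2))       # ~ T = 1
total_variation = float(np.sum(np.abs(dB)))      # ~ sqrt(2 n T / pi), huge
```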
Our theory of integration is going to be an ‘L2 -theory’. Let us introduce the
martingales with which we are going to work. We are going to think of them as
being defined up to indistinguishability - nothing changes if we change the pro-
cess on a null set. Think of this as analogous to considering Lebesgue integrable
functions as being defined ‘almost everywhere’.
Definition 6.15 Let H² be the space of L²-bounded càdlàg martingales, i.e. càdlàg martingales (M_t)_{t≥0} with sup_{t≥0} E[M_t²] < ∞.
Theorem 6.17 Let M be a continuous local martingale with M_0 ∈ L².
i. TFAE: (a) M is a martingale bounded in L²; (b) E[⟨M,M⟩_∞] < ∞.
ii. TFAE: (a) M is a martingale with E[M_t²] < ∞ for every t ≥ 0; (b) E[⟨M,M⟩_t] < ∞ for every t ≥ 0.
Proof. The second statement will follow from the first on applying it to Mt∧a for
every choice of a ≥ 0.
To prove the first set of equivalences, without loss of generality, suppose that
M0 = 0 (or replace M by M − M0 ).
Suppose that M is a martingale, bounded in L². Doob's L²-inequality implies that for every T > 0,
E[ sup_{0≤t≤T} M_t² ] ≤ 4 E[M_T²],
and letting T → ∞ (Monotone Convergence), E[ sup_{s≥0} M_s² ] ≤ 4 sup_{T≥0} E[M_T²] < ∞.
Let S_n = inf{t ≥ 0 : ⟨M,M⟩_t ≥ n}. Then the continuous local martingale M²_{t∧S_n} − ⟨M,M⟩_{t∧S_n} is dominated by sup_{s≥0} M_s² + n, which is integrable. By Proposition 6.12 this continuous local martingale is a uniformly integrable martingale and hence
E[⟨M,M⟩_{t∧S_n}] = E[M²_{t∧S_n}] ≤ E[ sup_{s≥0} M_s² ] ≤ C < ∞.
Let n and then t tend to infinity and use the Monotone Convergence Theorem to obtain E[⟨M,M⟩_∞] < ∞.
Conversely, assume that E[⟨M,M⟩_∞] < ∞. Set T_n = inf{t ≥ 0 : |M_t| ≥ n}. Then the continuous local martingale M²_{t∧T_n} − ⟨M,M⟩_{t∧T_n} is dominated by n² + ⟨M,M⟩_∞, which is integrable. From Proposition 6.12 again, this continuous local martingale is a uniformly integrable martingale and hence for every t ≥ 0,
E[M²_{t∧T_n}] = E[⟨M,M⟩_{t∧T_n}] ≤ E[⟨M,M⟩_∞] = C′ < ∞.   (26)
Let n → ∞ and use Fatou's lemma to see that E[M_t²] ≤ C′ < ∞, so (M_t)_{t≥0} is bounded in L².
We still have to check that (Mt )t≥0 is a martingale. However, (26) shows that
(Mt∧Tn )n≥1 is uniformly integrable and so converges both almost surely and in L1
to Mt for every t ≥ 0. Recalling that M Tn is a martingale, L1 convergence allows us
to pass to the limit as n → ∞ in the martingale property, so M is a martingale.
Finally, if the two properties hold, then M² − ⟨M,M⟩ is dominated by sup_{t≥0} M_t² + ⟨M,M⟩_∞, which is integrable, and so Proposition 6.12 again says that M² − ⟨M,M⟩ is a uniformly integrable martingale.
Our previous theorem immediately yields that for a martingale M ∈ H² with M_0 = 0 we have
‖M‖²_{H²} = E[M_∞²] = E[⟨M⟩_∞].
We can also deduce a complement to Theorem 6.13.
In other words, there is nothing ‘in between’ finite variation and finite quadratic
variation for this class of processes.
Proof. We already know that the first and third statements are equivalent. That the first implies the second is trivial, so we must just show that the second implies the first. We have ⟨M⟩_∞ = lim_{t→∞} ⟨M⟩_t = 0. From Theorem 6.17, M ∈ H² and E[M_∞²] = E[⟨M⟩_∞] = 0, and so M_t = E[M_∞ | F_t] = 0 almost surely.
We can see that the quadratic variation of a martingale is telling us something
about how its variance increases with time. We also need an analogous quantity
for the ‘covariance’ between two martingales. This is most easily defined through
polarisation.
Definition 6.19 The quadratic co-variation between two continuous local martingales M, N is defined by
⟨M,N⟩ := ½ ( ⟨M+N, M+N⟩ − ⟨M,M⟩ − ⟨N,N⟩ ).   (27)
It is often called the (angle) bracket process of M and N.
i. the process ⟨M,N⟩ is the unique finite variation process, vanishing at zero, such that (M_t N_t − ⟨M,N⟩_t : t ≥ 0) is a continuous local martingale;
iii. for any stopping time τ,
iv. for any t > 0 and any sequence of partitions π_n of [0,t] with mesh converging to zero,
Σ_{t_i ∈ π_n} ( M_{t_{i+1}} − M_{t_i} )( N_{t_{i+1}} − N_{t_i} ) → ⟨M,N⟩_t,   (29)
where the convergence is in probability.
Remark 6.22 It follows that if M and N are two martingales bounded in L2 and
with M0 N0 = 0 a.s., then (Mt Nt − hM, Nit , t ≥ 0) is a uniformly integrable martin-
gale. In particular, for any stopping time τ,
Take M, N ∈ H², which we recall carries the norm ‖M‖²_{H²} = E[⟨M,M⟩_∞] = E[M_∞²]. Then we see that this norm is consistent with the inner product on H² × H² given by E[⟨M,N⟩_∞] = E[M_∞ N_∞], and, by the usual Cauchy–Schwarz inequality,
| E[⟨M,N⟩_∞] | = | E[M_∞ N_∞] | ≤ √( E[⟨M⟩_∞] E[⟨N⟩_∞] ).
Actually, it is easy to obtain an almost sure version of this inequality, using that
Σ_i |M_{t_{i+1}} − M_{t_i}| |N_{t_{i+1}} − N_{t_i}| ≤ √( Σ_i ( M_{t_{i+1}} − M_{t_i} )² ) · √( Σ_i ( N_{t_{i+1}} − N_{t_i} )² )
and taking limits to deduce that
|⟨M,N⟩_t| ≤ √(⟨M⟩_t) √(⟨N⟩_t).
It’s often convenient to have a more general version of this inequality.
Theorem 6.23 (Kunita–Watanabe inequality) Let M, N be continuous local martingales and K, H two measurable processes. Then for all 0 ≤ t ≤ ∞,
∫_0^t |H_s| |K_s| |d⟨M,N⟩_s| ≤ ( ∫_0^t H_s² d⟨M⟩_s )^{1/2} ( ∫_0^t K_s² d⟨N⟩_s )^{1/2}   a.s.   (31)
If X, Y are two continuous semimartingales, we can define their co-variation ⟨X,Y⟩ via the polarisation formula that we used for martingales. If X_t = X_0 + M_t + A_t and Y_t = Y_0 + N_t + A′_t, then ⟨X,Y⟩_t = ⟨M,N⟩_t.
7 Stochastic Integration
At the beginning of the course we argued that whereas classically differential equa-
tions take the form
dX(t) = a(t, X(t))dt,
in many settings, the dynamics of the physical quantity in which we are interested
may also have a random component and so perhaps takes the form
dXt = a(t, Xt )dt + b(t, Xt )dBt .
We actually understand equations like this in the integral form:
X_t − X_0 = ∫_0^t a(s, X_s) ds + ∫_0^t b(s, X_s) dB_s.
If a is nice enough, then the first term has a classical interpretation. It is the second
term, or rather a generalisation of it, that we want to make sense of now.
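The integral form also suggests the simplest numerical scheme, Euler–Maruyama: over a small step the two integrals are replaced by a(t, X_t)h and b(t, X_t)(B_{t+h} − B_t). A simulation sketch under the ad hoc illustrative choice a(t,x) = μx, b(t,x) = σx, for which E[X_t] = X_0 e^{μt}:

```python
# Euler-Maruyama discretisation of dX = mu*X dt + sigma*X dB, checked against
# the known mean E[X_T] = X_0 * exp(mu * T).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, x0 = 0.05, 0.2, 1.0
T, n_steps, n_paths = 1.0, 200, 50_000
h = T / n_steps

x = np.full(n_paths, x0)
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(h), size=n_paths)
    x = x + mu * x * h + sigma * x * dB     # one Euler step per path

estimated_mean = float(x.mean())
exact_mean = float(x0 * np.exp(mu * T))
```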
The first approach will be to mimic what we usually do for construction of the
Lebesgue integral, namely work out how to integrate simple functions and then
extend to general functions through passage to the limit. We’ll then provide a very
slick, but not at all intuitive, approach that nonetheless gives us some ‘quick wins’
in proving properties of the integral.
A process of the form
ϕ_t = Σ_{i=0}^m ϕ^{(i)} 1_{(t_i, t_{i+1}]}(t),   (33)
for some m ∈ N, 0 ≤ t_0 < t_1 < … < t_{m+1}, where the ϕ^{(i)} are bounded F_{t_i}-measurable random variables, is called an elementary process; we write E for the set of such processes. Define the stochastic integral ϕ • M of ϕ in (33) with respect to M ∈ H² via
(ϕ • M)_t := Σ_{i=0}^m ϕ^{(i)} ( M_{t∧t_{i+1}} − M_{t∧t_i} ),   t ≥ 0.   (34)
If we write M^i_t := ϕ^{(i)} ( M_{t∧t_{i+1}} − M_{t∧t_i} ), then clearly M^i ∈ H² and so ϕ • M is a martingale. Moreover, since for i ≠ j the intervals (t_i, t_{i+1}] and (t_j, t_{j+1}] are disjoint, M^i_t M^j_t is a martingale and hence ⟨M^i, M^j⟩_t = 0. Using the bilinearity of the bracket process then yields
⟨ϕ • M⟩_t = Σ_{i=0}^m ⟨M^i⟩_t = Σ_{i=0}^m (ϕ^{(i)})² ( ⟨M⟩_{t_{i+1}∧t} − ⟨M⟩_{t_i∧t} ) = ∫_0^t ϕ_s² d⟨M⟩_s,   t ≥ 0.   (35)
We already used the notation that if K is progressively measurable and A is of finite variation, then
(K · A)_t = ∫_0^t K_s(ω) dA_s(ω),   t ≥ 0.
In that notation,
⟨ϕ • M⟩ = ϕ² · ⟨M⟩.
More generally, for N ∈ H²,
⟨ϕ • M, N⟩_t = Σ_{i=0}^m ⟨M^i, N⟩_t = Σ_{i=0}^m ϕ^{(i)} ( ⟨M,N⟩_{t_{i+1}∧t} − ⟨M,N⟩_{t_i∧t} ) = ∫_0^t ϕ_s d⟨M,N⟩_s = (ϕ · ⟨M,N⟩)_t.   (36)
The proof is easy – we just need to show linearity. Given ϕ, ψ ∈ E, we pass to a common refinement of the partitions on which they are constant to write them as elementary processes over the same partition, and the result is then trivial.
We are expecting an L2 -theory - we have already found an expression for the ‘L2 -
norm’ of ϕ • M. Let us define the appropriate spaces more carefully.
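The isometry (35) can be seen in simulation. A sketch (not part of the notes) with M = B and the ad hoc elementary integrand ϕ^{(i)} = sign(B_{t_i}) — bounded and F_{t_i}-measurable — for which ∫_0^T ϕ_s² ds = T, so E[(ϕ • B)_T²] should equal T:

```python
# Elementary stochastic integral of phi_i = sign(B_{t_i}) against Brownian
# motion: mean ~ 0 (martingale property), second moment ~ T (the isometry (35)).
import numpy as np

rng = np.random.default_rng(2)
T, m, n_paths = 1.0, 50, 50_000
h = T / m

dB = rng.normal(0.0, np.sqrt(h), size=(n_paths, m))
B_left = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)[:, :-1]])
phi = np.where(B_left >= 0.0, 1.0, -1.0)        # known at the left endpoint t_i

integral = np.sum(phi * dB, axis=1)             # (phi . B)_T on each path
mean_est = float(integral.mean())               # ~ 0
second_moment = float(np.mean(integral**2))     # ~ T = 1
```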
We have E ⊆ L²(M) and (35) tells us that the map E → H²_0 given by ϕ ↦ ϕ • M is a linear isometry. If we can show that the elementary processes are dense in L²(M), this observation will allow us to define integrals of processes from L²(M) with respect to M via a limiting procedure.
E[|X_t|] ≤ E[ ∫_0^t |K_u| d⟨M⟩_u ] ≤ √( E[ ∫_0^t K_u² d⟨M⟩_u ] · E[⟨M⟩_t] ) < +∞,
since M ∈ H² and K ∈ L²(M) (we took one of the processes to be identically one in Cauchy–Schwarz).
Taking ϕ = ξ 1_{(s,t]} ∈ E, with 0 ≤ s < t and ξ a bounded F_s-measurable r.v., we have
0 = ⟨K, ϕ⟩_{L²(M)} = E[ ξ ∫_s^t K_u d⟨M⟩_u ] = E[ ξ (X_t − X_s) ].
Since this holds for any Fs -measurable bounded ξ , we conclude that E[(Xt −
Xs )|Fs ] = 0. In other words, X is a martingale. But X is also continuous and
of finite variation and hence X ≡ 0 a.s. Thus K = 0, d⟨M⟩-a.e. almost surely, and hence K = 0 in L²(M).
We now know that any K ∈ L2 (M) is a limit of simple processes ϕ n → K. For
each ϕ n we can define the stochastic integral ϕ n • M. The isometry property then
shows that ϕ n • M converge in H2 to some element that we denote K • M and which
does not depend on the choice of approximating sequence ϕ n .
We also know already that the quadratic variation of Brownian motion is
t = lim_{δ(π)→0} Σ_{j=0}^{N(π)−1} ( B_{t_{j+1}} − B_{t_j} )² = B_t² − B_0² − 2 lim_{δ(π)→0} Σ_{j=0}^{N(π)−1} B_{t_j} ( B_{t_{j+1}} − B_{t_j} ),
so the left-endpoint (Itô) sums give ∫_0^t B_s dB_s = (B_t² − B_0² − t)/2, whereas evaluating the integrand at the midpoints of the intervals would instead produce (B_t² − B_0²)/2.
This so-called Stratonovich integral has the advantage that from the point of view
of calculations, the rules of Newtonian calculus hold true. From a modelling per-
spective however, it can be the wrong choice. For example, suppose that we are
modelling the change in a population size over time and we use [ti ,ti+1 ) to repre-
sent the (i + 1)st generation. The change over (ti ,ti+1 ) will be driven by the number
of adults, so the population size at the beginning of the interval.
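The endpoint sensitivity for a Brownian integrator is easy to see numerically: over [0,1] the left-endpoint sums converge to the Itô value (B_1² − 1)/2 and the midpoint sums to the Stratonovich value B_1²/2, so their gap converges to ⟨B,B⟩_1/2 = 1/2. A simulation sketch (grid sizes are ad hoc):

```python
# Left-endpoint (Ito) versus midpoint (Stratonovich) sums for int_0^1 B dB:
# the gap between them concentrates at <B,B>_1 / 2 = 1/2.
import numpy as np

rng = np.random.default_rng(3)
n, n_paths = 10_000, 200
h = 1.0 / (2 * n)        # fine grid: even indices = partition points, odd = midpoints

gaps = np.empty(n_paths)
for p in range(n_paths):
    B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(h), size=2 * n))])
    increments = B[2::2] - B[0:-2:2]          # B_{t_{i+1}} - B_{t_i}
    left = np.sum(B[0:-2:2] * increments)     # integrand at t_i
    mid = np.sum(B[1:-1:2] * increments)      # integrand at (t_i + t_{i+1})/2
    gaps[p] = mid - left

mean_gap = float(gaps.mean())                 # ~ 1/2
```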
Theorem 7.7 Let M ∈ H². For any K ∈ L²(M) there exists a unique element K • M of H²_0 such that
⟨K • M, N⟩ = K · ⟨M,N⟩   for every N ∈ H²,   (41)
and the map
L²(M) ∋ K ↦ K • M ∈ H²_0
is a linear isometry.
Proof. We first check uniqueness. Suppose that there are two such elements, X and
X 0 . Then
hX, Ni − hX 0 , Ni = hX − X 0 , Ni ≡ 0, ∀N ∈ H2 .
Taking N = X − X 0 we conclude, by Corollary 6.18, that X = X 0 .
Now let us verify (41) for the Itô integral.
Fix N ∈ H2 . First note that for K ∈ L2 (M) the Kunita-Watanabe inequality
shows that
E[ ∫_0^∞ |K_s| |d⟨M,N⟩_s| ] ≤ ‖K‖_{L²(M)} ‖N‖_{H²} < ∞
and, for an elementary process ϕ as in (33),
⟨M^i, N⟩_t = ϕ^{(i)} ( ⟨M,N⟩_{t_{i+1}∧t} − ⟨M,N⟩_{t_i∧t} ),
so
⟨ϕ • M, N⟩_t = Σ_i ϕ^{(i)} ( ⟨M,N⟩_{t_{i+1}∧t} − ⟨M,N⟩_{t_i∧t} ) = ∫_0^t ϕ_s d⟨M,N⟩_s.
Now observe that the mapping X ↦ ⟨X,N⟩_∞ is continuous from H² into L¹. Indeed, by Kunita–Watanabe,
E[ |⟨X,N⟩_∞| ] ≤ E[⟨X,X⟩_∞]^{1/2} E[⟨N,N⟩_∞]^{1/2} = ‖X‖_{H²} ‖N‖_{H²}.
but replacing N by the stopped martingale N^t in this identity also gives
⟨K • M, N⟩_t = ( K · ⟨M,N⟩ )_t,
which completes the proof of (41).
We could write the relationship (41) as
⟨ ∫_0^· K_s dM_s, N ⟩_t = ∫_0^t K_s d⟨M,N⟩_s;
that is, the stochastic integral 'commutes' with the bracket. One important consequence is that if M ∈ H² and K ∈ L²(M), then applying (41) twice gives
⟨K • M, K • M⟩ = K · ( K · ⟨M,M⟩ ) = K² · ⟨M,M⟩.
In other words, the bracket process of ∫ K_s dM_s is ∫ K_s² d⟨M,M⟩_s. More generally,
Proposition 7.9 (Stopped stochastic integrals) Let M ∈ H2 , K ∈ L2 (M) and τ a
stopping time. Then
(K • M)^τ = K • M^τ = ( K 1_{[0,τ]} ) • M.
Theorem 7.11 Let M be a continuous local martingale. For any K ∈ L²_loc(M) there exists a unique continuous local martingale, vanishing at zero, denoted K • M and called the Itô integral of K with respect to M, such that for any continuous local martingale N,
⟨K • M, N⟩ = K · ⟨M,N⟩.   (42)
If M ∈ H2 and K ∈ L2 (M) then this definition coincides with the previous one.
Proof. We only sketch the proof. Not surprisingly, we use a stopping argument. For every n ≥ 1, set
τ_n = inf{ t ≥ 0 : ∫_0^t (1 + K_s²) d⟨M⟩_s ≥ n }.
Then M^{τ_n} ∈ H² and K ∈ L²(M^{τ_n}), and one checks that for m ≥ n,
K • M^{τ_n} = ( K • M^{τ_m} )^{τ_n},
so there is a unique process, that we denote K • M, such that
(K • M)^{τ_n} = K • M^{τ_n}.
In particular, any adapted process with continuous sample paths is a locally bounded
progressively measurable process.
If K is progressively measurable and locally bounded, then for any finite variation process A, almost surely,
∫_0^t |K_s| |dA_s| < ∞   for all t ≥ 0,
so for a continuous semimartingale X = X_0 + M + A we may define
K • X := K • M + K · A,
often written
(K • X)_t = ∫_0^t K_s dX_s = ∫_0^t K_s dM_s + ∫_0^t K_s dA_s.
This integral inherits all the nice properties of the Stieltjes integral and the Itô
integral that we have already derived (linearity, associativity, stopping etc.).
And of course, it is still the case for an elementary process ϕ ∈ E that
(ϕ • X)_t = Σ_i ϕ^{(i)} ( X_{t_{i+1}∧t} − X_{t_i∧t} ).
We should also like to know how our integral behaves under limits.
Proof. We can treat the finite variation part, X0 + A, and the local martingale part,
M, separately. For the first, note that
| ∫_0^t K_u^n dA_u | = | ∫_0^t K_u^n dA_u^+ − ∫_0^t K_u^n dA_u^− |
≤ ∫_0^t |K_u^n| dA_u^+ + ∫_0^t |K_u^n| dA_u^− = ∫_0^t |K_u^n| |dA_u|.
The right hand side tends to zero by the usual Dominated Convergence Theorem.
For a fixed t ≥ 0, and any given ε > 0, we may take m large enough that P[τ_m ≤ t] ≤ ε/2. We then have
P[ sup_{s≤t} |(K^n • M)_s| > ε ] ≤ P[ sup_{s≤t∧τ_m} |(K^n • M)_s| > ε ] + ε/2
≤ (1/ε²) ‖K^n • M^{τ_m}‖²_{H²} + ε/2 ≤ ε,
for n large enough.
From this we can also confirm that even in their most general form our stochas-
tic integrals can be thought of as limits of integrals of simple functions.
Proposition 7.15 Let X be a continuous semimartingale and K a left-continuous locally bounded process. If π^n is a sequence of partitions of [0,t] with mesh converging to zero, then
Σ_{t_i ∈ π^n} K_{t_i} ( X_{t_{i+1}} − X_{t_i} ) → ∫_0^t K_s dX_s   in probability as n → ∞.
Proof. Fix t and let π^n be a sequence of partitions of [0,t] with mesh converging to zero. Note that, for consecutive points t_i < t_{i+1} of π^n,
X_{t_{i+1}} Y_{t_{i+1}} − X_{t_i} Y_{t_i} = X_{t_i} (Y_{t_{i+1}} − Y_{t_i}) + Y_{t_i} (X_{t_{i+1}} − X_{t_i}) + (X_{t_{i+1}} − X_{t_i})(Y_{t_{i+1}} − Y_{t_i}),
so for any n,
X_t Y_t − X_0 Y_0 = Σ_{t_i ∈ π^n} [ X_{t_i} (Y_{t_{i+1}} − Y_{t_i}) + Y_{t_i} (X_{t_{i+1}} − X_{t_i}) + (X_{t_{i+1}} − X_{t_i})(Y_{t_{i+1}} − Y_{t_i}) ]
→ (X • Y)_t + (Y • X)_t + ⟨X,Y⟩_t   as n → ∞.
Proof. Let X i = X0i + M i + Ai be the semimartingale decomposition of X i and
denote by V i the total variation process of Ai . Let
and τr = min{τri , i = 1, . . . , d}. Then (τr )r≥0 is a family of stopping times with
τr ↑ ∞. It is sufficient to prove (44) up to time τr . We will prove that the result holds
for polynomials and then the full result follows by approximating C2 functions by
polynomials.
First note that it is obvious that the set of functions for which the formula holds
is a vector space containing the functions F ≡ 1 and F(x1 , . . . , xd ) = xi for i ≤ d.
We now check that if (44) holds for two functions F and G, then it holds for
the product FG. Integration by parts yields
F_t G_t − F_0 G_0 = ∫_0^t F_s dG_s + ∫_0^t G_s dF_s + ⟨F, G⟩_t.   (45)
Proposition 7.18 Let M be a continuous local martingale and λ ∈ R. Then
E^λ(M)_t := exp( λ M_t − (λ²/2) ⟨M⟩_t ),   t ≥ 0,   (46)
is a continuous local martingale. In fact the same holds true for any λ ∈ C, with the real and imaginary parts being local martingales.
or, in 'differential form',
dE^λ(M)_t = λ E^λ(M)_t dM_t,
which shows that E^λ(M) solves the stochastic exponential differential equation driven by M: dY_t = λ Y_t dM_t.
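This differential form can be checked numerically for M = B: the Euler recursion Y_{k+1} = Y_k(1 + λΔB_k) should track the closed form exp(λB_t − λ²t/2) along each path. A sketch (step size chosen ad hoc):

```python
# Euler scheme for dY = lam * Y dB versus the stochastic exponential
# exp(lam * B_T - lam^2 T / 2), compared path by path.
import numpy as np

rng = np.random.default_rng(4)
lam, T, n = 1.0, 1.0, 100_000
h = T / n

dB = rng.normal(0.0, np.sqrt(h), size=n)
y = float(np.prod(1.0 + lam * dB))       # product of Euler factors Y_{k+1}/Y_k

closed_form = float(np.exp(lam * dB.sum() - 0.5 * lam**2 * T))
ratio = y / closed_form                  # ~ 1 for a fine discretisation
```

The agreement comes from log(1 + λΔB) ≈ λΔB − λ²(ΔB)²/2 and Σ(ΔB)² ≈ T, i.e. exactly the quadratic variation correction in (46).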
Here is a beautiful application of exponential martingales:
exp( iξ M_t + (ξ²/2) t ),   t ≥ 0,
is a (complex-valued) martingale, so that for s < t,
E[ exp( iξ M_t + (ξ²/2) t ) | F_s ] = exp( iξ M_s + (ξ²/2) s ),
that is,
E[ e^{iξ(M_t − M_s)} | F_s ] = e^{−ξ²(t−s)/2}.   (47)
In other words, M_t − M_s is centred Gaussian with variance t − s.
It follows also from (47) that for A ∈ Fs ,
h i h i
E 1A eiξ (Mt −Ms ) = P[A]E eiξ (Mt −Ms ) ,
so fixing A ∈ F_s with P[A] > 0 and writing P_A = P[· ∩ A]/P[A] for the conditional probability given A, we see that M_t − M_s has the same distribution under P as under P_A. Hence M_t − M_s is independent of F_s, and M is an (F_t)-Brownian motion.
So the quadratic variation is capturing all the information about M. This is sur-
prising - recall that it is a special property of Gaussians that they are characterised
by their means and the variance-covariance matrix, but in general we need to know
much more. It turns out that what we just saw for Brownian motion has a powerful
consequence for all continuous local martingales - they are characterised by their
quadratic variation and, in fact, they are all time changes of Brownian motion.
Proof. Note that τs is the first hitting time of an open set (s, ∞) for an adapted
process hMi with continuous sample paths, and hence τs is a stopping time (recall
that (Ft ) is right-continuous). Further, hMi∞ = ∞ a.s. implies that τs < ∞ a.s.
The process (τs : s ≥ 0) is non-decreasing and right-continuous (in fact s → τs is
the right-continuous inverse of t → hMit ). Let Gs := Fτs . Note that it satisfies
the usual conditions. The process B is right continuous by continuity of M and
right-continuity of τ. We have
and
E[ B_s² − s | G_u ] = E[ (M^{τ_n}_{τ_s})² − ⟨M⟩^{τ_n}_{τ_s} | F_{τ_u} ] = (M^{τ_n}_{τ_u})² − ⟨M⟩^{τ_n}_{τ_u} = (M_{τ_u})² − ⟨M⟩_{τ_u} = B_u² − u,
where we used continuity of hMi to write hMiτu = u. It follows that B is indeed a
(Gs )-Brownian motion.
Finally, B_{⟨M⟩_t} = M_{τ_{⟨M⟩_t}} = M_t, again since the intervals of constancy of M and of ⟨M⟩ coincide a.s., so that M is constant on [t, τ_{⟨M⟩_t}].
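The 'quadratic variation as a clock' picture can be illustrated in simulation. A sketch (not from the notes) with the ad hoc Wiener-integral example M_t = ∫_0^t (1+s) dB_s, for which ⟨M⟩_1 = ∫_0^1 (1+s)² ds = 7/3, so M_1 should be distributed like a Brownian motion run until time 7/3, i.e. N(0, 7/3):

```python
# Variance of M_1 = int_0^1 (1+s) dB_s matches the 'clock' <M>_1 = 7/3.
import numpy as np

rng = np.random.default_rng(5)
n, n_paths = 500, 20_000
h = 1.0 / n
s = np.arange(n) * h                           # left endpoints of [0, 1]

dB = rng.normal(0.0, np.sqrt(h), size=(n_paths, n))
M1 = np.sum((1.0 + s) * dB, axis=1)            # Riemann-type sums for int (1+s) dB

clock = 7.0 / 3.0                              # <M>_1 = int_0^1 (1+s)^2 ds
variance_est = float(M1.var())                 # ~ 7/3
```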
A The Monotone Class Lemma/ Dynkin’s π − λ Theorem
There are multiple names used for this result (often with slightly different formu-
lations).
Let E be an arbitrary set and let P(E) be the set of all subsets of E.
i. E ∈ M ;
Definition A.2 A collection I of subsets of E such that ∅ ∈ I and, for all A, B ∈ I, A ∩ B ∈ I, is called a π-system.
Lemma A.4 (Monotone class lemma) If C ⊂ P(E) is stable under finite intersections, then the smallest monotone class containing C is σ(C).
ii. Let (Xi )i∈I be an arbitrary collection of random variables, and let G be
a σ -field on some probability space. In order to show that the σ -fields
σ (Xi : i ∈ I) and G are independent, it is enough to verify that (Xi1 , . . . , Xi p )
is independent of G for any choice of the finite set {i1 , . . . , i p } ⊂ I. (Observe
that the class of all events that depend on a finite number of the Xi is stable
under finite intersections and generates σ (Xi , i ∈ I).)
iii. Let (Xi )i∈I be an arbitrary collection of random variables and let Z be a
bounded real variable. Let i0 ∈ I. In order to verify that E[Z|Xi , i ∈ I] =
E[Z|Xi0 ], it is enough to show that E[Z|Xi0 , Xi1 . . . Xi p ] = E[Z|Xi0 ] for any
choice of the finite collection {i1 , . . . i p } ⊂ I. (Observe that the class of all
events A such that E[1A Z] = E[1A E[Z|Xi0 ]] is a monotone class.)
Remark B.2 (b) and (c) hold with convergence in probability replaced by con-
vergence in distribution; however (a) is not in general true for convergence in
distribution.
B.3 Uniform Integrability
If X is an integrable random variable (that is E[|X|] < ∞) and Λn is a sequence
of sets with P[Λn ] → 0, then E[|X1Λn |] → 0 as n → ∞. (This is a consequence
of the DCT since |X| dominates |X1Λn | and |X1Λn | → 0 a.s.) Uniform integrability
demands that this type of property holds uniformly for random variables from some
class.
i. Uniform integrability is necessary and sufficient for passing to the limit un-
der an expectation,
i.
sup E[|Xα |] < ∞,
α
ii.
P[|Xα | > N] → 0 as N → ∞, uniformly in α.
iii.
E[|Xα |1Λ ] → 0 as P[Λ] → 0, uniformly in α.
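Note that condition (i) alone is strictly weaker than uniform integrability. The standard counterexample X_n = n·1_{A_n} with P[A_n] = 1/n makes this concrete; a small sketch (exact computation, no sampling needed):

```python
# sup_n E|X_n| < infinity does NOT imply uniform integrability.
# Counterexample: X_n = n on an event of probability 1/n, else 0.

def mean_abs(n):
    """E|X_n| = n * (1/n) = 1 for every n, so (i) holds."""
    return n * (1.0 / n)

def tail_mean(n, N):
    """E[|X_n| 1_{|X_n| > N}]: equal to 1 as soon as n > N, so the tail
    expectations do not tend to 0 uniformly in n as N -> infinity."""
    return n * (1.0 / n) if n > N else 0.0
```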
and taking limits we have
As max |xi+1 − xi | → 0, the left and right sides converge to E[X], as required.
Lemma B.6 Let X ≥ 0 a.s. Then limM→∞ E[X ∧ M] = E[X].
Proof. Check the result first for X having a discrete distribution and then extend to
general X by approximation.
Theorem B.7 (Monotone Convergence Theorem.) Suppose 0 ≤ Xn ≤ X and Xn →
X in probability. Then limn→∞ E[Xn ] = E[X].
Proof. For M > 0
E[X] ≥ E[Xn ] ≥ E[Xn ∧ M] → E[X ∧ M]
where the convergence on the right follows from the bounded convergence theorem. It follows that
E[X ∧ M] ≤ liminf_{n→∞} E[X_n] ≤ limsup_{n→∞} E[X_n] ≤ E[X],
and letting M → ∞ (using Lemma B.6) completes the proof.
Lemma B.8 (Fatou’s lemma.) If Xn ≥ 0 and Xn ⇒ X, then lim inf E[Xn ] ≥ E[X].
Proof. Since E[Xn ] ≥ E[Xn ∧ M] we have
lim inf E[Xn ] ≥ lim inf E[Xn ∧ M] = E[X ∧ M].
By the Monotone Convergence Theorem E[X ∧ M] → E[X] and the result follows.
Theorem B.9 (Dominated Convergence Theorem) Assume Xn ⇒ X, Yn ⇒ Y , |Xn | ≤
Yn , and E[Yn ] → E[Y ] < ∞. Then E[Xn ] → E[X].
Proof. For simplicity, assume in addition that Xn + Yn ⇒ X + Y and Yn − Xn ⇒
Y − X (otherwise consider subsequences along which (Xn ,Yn ) ⇒ (X,Y )). Then
by Fatou’s lemma lim inf E[Xn + Yn ] ≥ E[X + Y ] and lim inf E[Yn − Xn ] ≥ E[Y −
X]. From these observations lim inf E[X_n] + lim E[Y_n] ≥ E[X] + E[Y], and hence lim inf E[X_n] ≥ E[X]. Similarly lim inf E[−X_n] ≥ E[−X], so lim sup E[X_n] ≤ E[X], and the result follows.
Lemma B.10 (Markov's inequality) For a > 0,
P[|X| > a] ≤ E[|X|]/a.
Proof. Note that |X| ≥ a 1_{{|X|>a}}. Taking expectations proves the desired inequality.
B.5 Information and independence.
Information obtained by observations of the outcome of a random experiment is
represented by a sub-σ -algebra D of the collection of events F . If D ∈ D, then
the observer “knows” whether or not the outcome is in D.
An S-valued random variable Y is independent of a σ-algebra D if
P( {Y ∈ B} ∩ D ) = P{Y ∈ B} P(D)   for all B ∈ B(S) and D ∈ D.
Random variables X and Y are independent if σ (X) and σ (Y ) are independent, that
is, if
P({X ∈ B1 } ∩ {Y ∈ B2 }) = P{X ∈ B1 }P{Y ∈ B2 }.
A) Y is D-measurable.
for all D ∈ D.
R R
B) D XdP = D Y dP
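Conditions A) and B) are easy to verify on a finite sample space. The sketch below (the space, the partition generating D, and the values of X are our own illustrative choices) builds Y = E[X|D] as the cell-wise average and checks both properties:

```python
# Finite-sample-space sketch of conditional expectation. Omega = {0,...,5}
# with uniform P; D is generated by the partition {0,1,2} and {3,4,5}.
# Property B) is checked on the generating cells, which suffices here since
# every D in this sigma-algebra is a union of cells.

omega = list(range(6))
P = {w: 1.0 / 6 for w in omega}
X = {0: 1.0, 1: 2.0, 2: 3.0, 3: 10.0, 4: 20.0, 5: 30.0}
partition = [{0, 1, 2}, {3, 4, 5}]

# Y = E[X | D]: on each cell, the P-weighted average of X over that cell.
Y = {}
for cell in partition:
    pc = sum(P[w] for w in cell)
    avg = sum(X[w] * P[w] for w in cell) / pc
    for w in cell:
        Y[w] = avg

# A) Y is D-measurable: constant on every cell of the partition.
measurable = all(len({Y[w] for w in cell}) == 1 for cell in partition)

# B) the integrals of X and Y agree on every generating cell.
matches = all(
    abs(sum(X[w] * P[w] for w in cell) - sum(Y[w] * P[w] for w in cell)) < 1e-12
    for cell in partition
)
```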
C Convergence of backwards discrete time supermartingales
A backwards supermartingale (Yn )n∈−N with respect to a filtration (Hn )n∈−N satisfies: Yn is Hn -measurable and, for m < n ≤ 0, E[Yn |Hm ] ≤ Ym . Notice that the σ-fields (Hn )n∈−N get ‘smaller and smaller’ as n → −∞.

Theorem C.1 If (Yn )n∈−N is a backwards supermartingale and the sequence (Yn )n∈−N is bounded in L1 , then (Yn )n∈−N converges almost surely and in L1 as n → −∞.
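The classical application is the strong law of large numbers: if Sn = ξ1 + · · · + ξn for i.i.d. integrable ξi , then Y−n = Sn /n is a backwards martingale, and the convergence theorem forces the running averages to converge. A simulation sketch (the seed, step distribution, and sample sizes are our own arbitrary choices):

```python
import random

# If S_n = xi_1 + ... + xi_n for i.i.d. integrable xi_i, then Y_{-n} = S_n / n
# is a backwards martingale, and backwards convergence yields the strong law
# of large numbers. Here xi_i ~ Bernoulli(1/2), so S_n / n should settle near 1/2.

random.seed(0)
xs = [random.random() < 0.5 for _ in range(200_000)]

running = []          # snapshots of S_n / n at a few large n
s = 0
for n, x in enumerate(xs, start=1):
    s += x
    if n % 50_000 == 0:
        running.append(s / n)
# running drifts towards E[xi_1] = 1/2 as n grows.
```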
Definition E.2 Given a normed vector space (V, ‖·‖V ) over the reals, its dual V′ is the space of all continuous linear maps (functionals) from V to R. V′ is itself a vector space over R, equipped with the norm

‖φ‖V′ := sup_{v∈V, ‖v‖V ≤ 1} |φ(v)|.
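As a finite-dimensional illustration (our own example, not from the text): on R³ with the ℓ¹ norm, the functional φ(v) = ∑ aᵢvᵢ has dual norm maxᵢ |aᵢ|, so the dual of (R³, ℓ¹) is (R³, ℓ∞). The sketch below checks this numerically:

```python
import random

# Dual-norm sketch: on V = R^3 with the l1 norm, phi(v) = sum(a_i v_i) has
# dual norm max_i |a_i| (the l-infinity norm of a). The vector a is an
# arbitrary illustrative choice.

a = [2.0, -5.0, 1.0]

def phi(v):
    return sum(ai * vi for ai, vi in zip(a, v))

dual_norm = max(abs(ai) for ai in a)   # claimed value of ||phi||_{V'}

# |phi(v)| <= dual_norm for random vectors on the l1 unit sphere ...
random.seed(1)
ok = True
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in a]
    s = sum(abs(vi) for vi in v)
    v = [vi / s for vi in v]           # normalise so ||v||_1 = 1
    ok = ok and abs(phi(v)) <= dual_norm + 1e-12

# ... and the supremum is attained at a signed basis vector:
attained = abs(phi([0.0, -1.0, 0.0]))  # = |a_2|
```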
For a measurable function f on (S , F, µ) and p ≥ 1, set

‖f‖p := ( ∫_S |f|^p dµ )^{1/p},

and let ℒ^p(S , F, µ) be the space of such functions for which ‖f‖p < ∞. Observe that ‖·‖p is not yet a norm on ℒ^p: indeed, it fails to satisfy (iii) in Definition E.1, since if f = 0 µ-a.e. but f is not identically zero, e.g. f = 1_A for a measurable A ∈ F with µ(A) = 0, then still ‖f‖p = 0. We then say that ‖·‖p is a semi-norm on ℒ^p.

To remedy this, we consider the space L^p(S , F, µ), the quotient of ℒ^p with respect to the equivalence relation f ∼ g iff f = g µ-a.e. Put differently, L^p is the space of equivalence classes of functions equal µ-a.e. whose pth power is integrable. Then (L^p(S , F, µ), ‖·‖p ) is a normed vector space for p ≥ 1. The triangle inequality for ‖·‖p is simply the Minkowski inequality.
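Minkowski's inequality can be spot-checked numerically: with the counting measure on a finite set, functions are simply vectors (the random trials below are our own illustrative choice):

```python
import random

# Numerical spot-check of Minkowski's inequality ||f + g||_p <= ||f||_p + ||g||_p
# for the counting measure on {1,...,10}, where functions are just vectors.

def lp_norm(f, p):
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

random.seed(2)
violations = 0
for _ in range(500):
    f = [random.uniform(-5, 5) for _ in range(10)]
    g = [random.uniform(-5, 5) for _ in range(10)]
    fg = [x + y for x, y in zip(f, g)]
    for p in (1.0, 1.5, 2.0, 3.0):
        if lp_norm(fg, p) > lp_norm(f, p) + lp_norm(g, p) + 1e-9:
            violations += 1
# violations stays 0, in line with Minkowski's inequality for p >= 1.
```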
A more geometric notion of measuring the relation between vectors is given by
an inner product.
Definition E.3 Given a vector space V over R, a mapping < ·, · >: V ×V → R is
called an inner product if
(i) it is bilinear and symmetric: < ax + bz, y >= a < x, y > +b < z, y > and
< x, y >=< y, x > for a, b ∈ R, x, y, z ∈ V ;
(ii) for any x ∈ V , < x, x >≥ 0;
(iii) < x, x >= 0 if and only if x is the zero vector in V .
This notion is very familiar on V = R^n, where an inner product is given by

< x, y > = x^T y = ∑_{i=1}^n x_i y_i.

Any inner product satisfies the Cauchy–Schwarz inequality. For x, y ∈ V and t ∈ R consider

q(t) := < x + ty, x + ty > = < y, y > t² + 2 < x, y > t + < x, x >,

which clearly is non-negative and hence its discriminant has to be non-positive, i.e.

< x, y >² ≤ < x, x > < y, y >, (49)

as required. We note also that equality holds if and only if the vectors x, y are linearly dependent, i.e. x = ry for some r ∈ R.
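The inequality and its equality case are easy to test numerically (the vectors below are our own illustrative choices):

```python
import random

# Cauchy-Schwarz on R^n with the standard dot product: <x,y>^2 <= <x,x><y,y>,
# with equality when x and y are linearly dependent.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

random.seed(3)
ok = True
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(5)]
    y = [random.gauss(0, 1) for _ in range(5)]
    ok = ok and dot(x, y) ** 2 <= dot(x, x) * dot(y, y) + 1e-9

# Equality case: y = r*x gives <x,y>^2 = r^2 <x,x>^2 = <x,x><y,y>.
x = [1.0, 2.0, -3.0]
y = [2.0 * xi for xi in x]
gap = dot(x, x) * dot(y, y) - dot(x, y) ** 2   # should vanish
```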
The above implies that ‖x‖ := √(< x, x >) is a norm on V. We say that the norm is induced by an inner product. Among the spaces L^p defined above, only L² has a norm which is induced by an inner product, namely by

< f, g > = ∫_S f(x)g(x) µ(dx). (50)
F Banach spaces
We first define Cauchy sequences which embody the idea of a converging sequence
when we do not know the limiting element.
Definition F.1 A sequence (x_n) of elements in a normed vector space (X, ‖·‖) is called a Cauchy sequence if for any ε > 0 there exists N ≥ 1 such that for all n, m ≥ N we have ‖x_n − x_m‖ ≤ ε.

Definition F.2 A normed vector space (X, ‖·‖) is complete if every Cauchy sequence converges to an element x ∈ X. It is then called a Banach space. Further, if the norm is induced by an inner product then (X, ‖·‖) is called a Hilbert space.
Proposition F.3 If (X, ‖·‖_X) is a normed vector space over R then its dual (X′, ‖·‖_{X′}) in Definition E.2 is a Banach space.
In many cases it is interesting to build linear functionals satisfying certain addi-
tional properties. This is often done using the Hahn-Banach theorem. It states in
particular that a continuous linear functional defined on a linear subspace Y of X
can be continuously extended to the whole of X without increasing its norm. A
version of this is also known as the separating hyperplane theorem, since it allows one to separate two convex sets (one of them open) using an affine hyperplane.
An important step in studying continuous linear functionals on X is achieved
by describing the structure of X 0 . We have
Proposition F.4 Let (S , F, µ) be a measurable space with a σ-finite measure µ. Then for any p ≥ 1, L^p(S , F, µ) is a Banach space, and for p > 1 its dual is isometrically isomorphic to L^q(S , F, µ), where 1/p + 1/q = 1.
In particular we see that L2 is its own dual. This means that any continuous linear
functional on L2 can be identified with an element in L2 . This property remains
true for any Hilbert space:
Proposition F.5 Let (X, ‖·‖) be a Hilbert space with the norm induced by an inner product, ‖x‖ = √(< x, x >). If φ : X → R is a continuous linear map then there exists an element x_φ ∈ X such that

φ(x) = < x_φ, x > for all x ∈ X.

In particular, when X = L²(S , F, µ) there exists f_φ ∈ L² such that φ(g) = ∫_S g(x) f_φ(x) µ(dx) for all g ∈ L².

Note that the inner product < x_φ, x >, or the integral ∫_S g(x) f_φ(x) µ(dx), in the above statement is well defined by (49)–(50).
On a (separable) Hilbert space, we can also state an infinite dimensional analogue of the Pythagorean theorem. Recall that R^n is a Hilbert space with inner product < x, y > = ∑_{i=1}^n x_i y_i. This uses the canonical basis in R^n, but if we take any orthonormal basis of R^n, say (ε_1, . . . , ε_n), then

x = ∑_{i=1}^n < x, ε_i > ε_i and hence ‖x‖² = ∑_{i=1}^n < x, ε_i >²,

which is the Pythagorean theorem. The same reasoning gives < x, y > = ∑_{i=1}^n < x, ε_i > < y, ε_i >. The infinite dimensional version is the following.
Proposition F.6 (Parseval's identity) Let (X, ‖·‖) be a separable Hilbert space with the norm induced by an inner product (x, y) ↦ < x, y >, and let (ε_n : n ≥ 1) be an orthonormal basis of X. Then for any x, y ∈ X

< x, y > = ∑_{n≥1} < x, ε_n > < y, ε_n >, and in particular ‖x‖² = ∑_{n≥1} < x, ε_n >².
Finally we state one more result, which is crucial for the construction of the
stochastic integral.
Proposition F.7 Suppose (X, ‖·‖_X) and (Y, ‖·‖_Y) are two Banach spaces, E ⊂ X is a dense vector subspace of X, and I : E → Y is a linear isometry, i.e. a linear map which preserves the norm: ‖I(x)‖_Y = ‖x‖_X for all x ∈ E. Then I may be extended in a unique way to a linear isometry from X to Y.
Proof. Take x ∈ X and x_n → x with x_n ∈ E. Then, by the isometry property, (I(x_n)) is Cauchy in Y, since (x_n) is Cauchy in X. It follows that it converges to some element, which we denote I(x). Further, if two sequences in E, (x_n) and (y_n), both converge to x ∈ X, giving rise to potentially different elements I(x) and I(x)′, then we can build a third sequence z_{2n} = x_n, z_{2n+1} = y_n, which also converges to x, and we see that (I(z_n)) has to converge, with limit agreeing with both I(x) and I(x)′, so that I(x) = I(x)′ is unique. Thus I(x) ∈ Y is uniquely defined for all x ∈ X. Further,