
Anton Bovier

Stochastic Processes
Lecture, Summer term 2020, Bonn

November 27, 2020


Contents

1 A review of measure theory
   1.1 Probability spaces
   1.2 Construction of measures
   1.3 Random variables
   1.4 Integrals
   1.5 𝓛p and Lp spaces
   1.6 Fubini's theorem
   1.7 Densities, Radon-Nikodým derivatives

2 Conditional expectations and conditional probabilities
   2.1 Conditional expectations
   2.2 Elementary properties of conditional expectations
   2.3 The case of random variables with absolutely continuous distributions
   2.4 The special case of L2-random variables
   2.5 Conditional probabilities and conditional probability measures

3 Stochastic processes
   3.1 Definition of stochastic processes
   3.2 Construction of stochastic processes; Kolmogorov's theorem
   3.3 Examples of stochastic processes
      3.3.1 Independent random variables
      3.3.2 Gaussian processes
      3.3.3 Markov processes
      3.3.4 Gibbs measures

4 Martingales
   4.1 Definitions
   4.2 Upcrossings and convergence
   4.3 Inequalities
   4.4 Doob decomposition
   4.5 A discrete time Itô formula
   4.6 Central limit theorem for martingales
   4.7 Stopping times, optional stopping

5 Markov processes
   5.1 Markov processes with stationary transition probabilities
   5.2 The strong Markov property
   5.3 Markov processes and martingales
   5.4 Harmonic functions and martingales
   5.5 Dirichlet problems
      5.5.1 Green function, equilibrium potential, and equilibrium measure
   5.6 Doob's h-transform
   5.7 Markov chains with countable state space

6 Random walks and Brownian motion
   6.1 Random walks
   6.2 Construction of Brownian motion
   6.3 Donsker's invariance principle
   6.4 Martingale and Markov properties
   6.5 Sample path properties
   6.6 The law of the iterated logarithm

References

Index
Chapter 1
A review of measure theory

In this first chapter I review the main concepts of measure theory that we will need. I will not give proofs in most cases. Those familiar with my W-Theorie 1 lecture will find that most of this material was covered there. The main new concepts are uniform integrability and Radon-Nikodým derivatives. For more details, there is a wealth of references on measure theory; see e.g. [2, 14, 10, 8, 5, 1].

1.1 Probability spaces

A space, Ω, is an arbitrary non-empty set. Elements of a space Ω will be denoted by ω. If A ⊂ Ω is a subset of Ω, we denote by 1A the indicator function of the set A, i.e.

    1A(ω) = { 1, if ω ∈ A;  0, if ω ∈ Aᶜ ≡ Ω \ A }.    (1.1.1)

Definition 1.1. Let Ω be a space. A family A ≡ {Aλ}_{λ∈I}, Aλ ⊂ Ω, with I an arbitrary set, is called a class of Ω. A non-empty class A of Ω is called an algebra, if:
(i) Ω ∈ A.
(ii) For all A ∈ A, Aᶜ ∈ A.
(iii) For all A, B ∈ A, A ∪ B ∈ A.
If A is an algebra, and moreover
(iv) ⋃_{n=1}^∞ An ∈ A whenever, for all n ∈ ℕ, An ∈ A,
then A is called a σ-algebra.

Definition 1.2. A space, Ω, together with a σ-algebra, F, of subsets of Ω, is called a measurable space, (Ω, F).


Definition 1.3. Let (Ω, F) be a measurable space. A map µ : F → [0, ∞] from F to the non-negative real numbers (and infinity) is called a (positive) measure, if
(i) µ(∅) = 0.
(ii) For any countable family {An}_{n∈ℕ} of mutually disjoint elements of F,

    µ(⋃_{n∈ℕ} An) = Σ_{n∈ℕ} µ(An).    (1.1.2)

A measure, µ, is called finite, if µ(Ω) < ∞. A measure is called σ-finite, if there exists a countable class, {Ωn}_{n∈ℕ}, such that, for all n ∈ ℕ, Ωn ∈ F, µ(Ωn) < ∞, and Ω = ⋃_{n∈ℕ} Ωn. A triple (Ω, F, µ) is called a measure space.

Definition 1.4. Let (Ω, F) be a measurable space. A positive measure, P, on (Ω, F) that satisfies P[Ω] = 1 is called a probability measure. A triple (Ω, F, P), where Ω is a set, F a σ-algebra of subsets of Ω, and P a probability measure on (Ω, F), is called a probability space.
Probability spaces provide the scenery where probability theory takes place. The
set of sceneries is huge, since we have so far not made any restriction on the al-
lowable spaces Ω. In most instances, we will, however, want to stay on reasonable
grounds. Fortunately, there is a quite canonical setting where everything we ever
want to do can be constructed. This is the realm where Ω is a topological space and
F = B(Ω) is the Borel-σ-algebra of Ω.
We recall the definition of a topological space.
Definition 1.5. A space, E, is called a topological space, if for every point p ∈ E there exists a collection, 𝒰_p, of subsets of E, called (open) neighbourhoods, with the following properties:
(i) For every point, p, 𝒰_p ≠ ∅.
(ii) Every neighbourhood of p contains p.
(iii) If U1, U2 ∈ 𝒰_p, then there exists U3 ∈ 𝒰_p such that U3 ⊂ U1 ∩ U2.
(iv) If U ∈ 𝒰_p and q ∈ U, then there exists V ∈ 𝒰_q such that V ⊂ U.
Recall that in a topological space one can define notions such as open sets and closed sets: an open set has the property that each of its points has a neighbourhood that is contained in the set, and closed sets are the complements of open sets. Note that the empty set is also considered an open set by default. Since the entire space E is also open, the empty set is, in addition, a closed set.
Definition 1.6. Two topological spaces are considered equivalent, or have the same topology, if they contain the same open sets. In particular, given two collections of neighbourhoods, 𝒰_p and 𝒱_p, on a space E, they generate the same topology¹, if for any p ∈ E and any U ∈ 𝒰_p there exists V ∈ 𝒱_p such that V ⊂ U, and for any V ∈ 𝒱_p there exists U ∈ 𝒰_p such that U ⊂ V.

¹ One says that the collection of neighbourhoods B = {𝒰_p, p ∈ E} generates a topology T, or is a base for the topology T, if every open set in T can be written as a union of elements of B.

Definition 1.7. A topological space, E, is called:
(i) Hausdorff, if any two distinct points in E have disjoint neighbourhoods;
(ii) separable, if there exists a countable subset, E0 ⊂ E, whose closure² is E.
Definition 1.8. Let E be a topological space. The Borel-σ-algebra, B(E), of E is the smallest σ-algebra that contains all open sets of E.
The point behind the notion of the Borel-σ-algebra is that it is big enough to satisfy our needs, but small enough to ensure that it is possible to construct measures on it. Larger σ-algebras, such as the power set of an uncountable topological space, do not usually allow one to define measures with nice properties on them.
One says that the Borel-σ-algebra is generated by the open sets of E. This notion will be used quite frequently. We say in general that a class, A, of a space Ω generates a σ-algebra, σ(A), defined as the smallest σ-algebra that contains A,

    σ(A) ≡ ⋂_{F ⊇ A, F a σ-algebra} F.
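On a finite space, the generated σ-algebra can be computed mechanically by closing the class under complements and unions. The following Python sketch (the function name and the toy example are our own, not from the text) computes σ(A) for A = {{1}} on Ω = {1, 2, 3}:

```python
from itertools import combinations

def generate_sigma_algebra(omega, seed):
    """Smallest sigma-algebra on the finite set `omega` containing the
    classes in `seed`.  On a finite space, closing under complements
    and pairwise unions already gives closure under countable unions."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(a) for a in seed}
    while True:
        new = set(family)
        new |= {omega - a for a in family}                   # complements
        new |= {a | b for a, b in combinations(family, 2)}   # unions
        if new == family:
            return family
        family = new

# sigma({{1}}) on Omega = {1,2,3} consists of the four sets
# {}, {1}, {2,3}, and Omega itself
sigma = generate_sigma_algebra({1, 2, 3}, [{1}])
```

Note how the complement {2, 3} is forced into the family even though only {1} was prescribed.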

Even more structure appears if we work on metric spaces.


Definition 1.9. Let E be a set. A map, ρ : E × E → [0, ∞], is called a metric, if
(i) ρ(x, y) = 0, if and only if x = y;
(ii) ρ(x, y) = ρ(y, x), for all x, y ∈ E;
(iii) ρ(x, z) ≤ ρ(x, y) + ρ(y, z), for all x, y, z ∈ E.
For r ∈ R+ and x ∈ E, the set Br (x) ≡ {y ∈ E : ρ(x, y) < r} is called the (open) ball of
radius r.
The set of neighbourhoods obtained from the open balls associated to a metric,
ρ, is called the metric topology. A topological space endowed with a metric and its
metric topology is called a metric space.
A sequence xn ∈ E, n ∈ ℕ, is called a Cauchy sequence, if for any ε > 0 there exists n0 ∈ ℕ such that, for all n, m ≥ n0, ρ(xn, xm) < ε. A metric space, E, is called complete, if any Cauchy sequence in E converges.
A related concept is that of a normed space.
Definition 1.10. Let E be a vector space. A map ‖·‖ : E → ℝ₊ is called a norm, if
(i) for all x ∈ E, ‖x‖ ≥ 0, and ‖x‖ = 0 iff x = 0;
(ii) for all x ∈ E and α ∈ ℝ, ‖αx‖ = |α| ‖x‖;
(iii) for any x, y ∈ E, ‖x + y‖ ≤ ‖x‖ + ‖y‖.
A vector space equipped with a norm is called a normed (vector) space.
Defining ρ(x, y) ≡ ‖x − y‖ yields a metric, so every normed space can be turned into a metric space. A normed vector space that is a complete metric space with respect to this norm is called a Banach space.
² The closure of a subset, A, of a topological space is the intersection of all closed subsets containing A.

A further useful specialisation is the restriction to so-called Polish spaces.


Definition 1.11. A topological space E is called Polish if it is separable and completely metrisable. A completely metrisable space is a space that is homeomorphic to a complete metric space.
That is, a Polish space is essentially a complete, separable metric space, up to the fact that the metric may not have been fixed. Recall that ℝ^d is a Polish space, and so is ℝ^ℕ when equipped with the product topology.
Note that in many cases, different families of sets generate the same σ-algebra.
For instance, if E is a metric space with the topology given by the metric topology,
then the set of open balls generates the Borel-σ-algebra B(E). But also the set of
closed balls will generate B(E). If E is the real line, the half-lines also generate the
Borel-σ-algebra.
An advantage of Ω being a Polish space lies in the fact that one can choose as a generator of the Borel-σ-algebra a countable collection of sets. For example, in the case of the real line, the Borel-σ-algebra is already generated by the half-lines (−∞, q], with q ∈ ℚ (just observe that if x is any real number, there exists a sequence qn ↓ x, and the set ⋂_{n∈ℕ} (−∞, qn] = (−∞, x] is therefore also contained in the σ-algebra generated by these half-lines).
A related, but more general, class of spaces are sometimes useful. These are
called Lusin spaces. These are spaces that are homeomorphic to a Borel subset of a
compact metrisable space. For instance, R is a Lusin space. To see this, note that R
is homeomorphic to the space (0, 1), which is a Borel subset of the compact metric
space [0, 1].
Two notions of special types of classes are very useful in this context.

Definition 1.12. Let Ω be a space. A class of subsets of Ω, T, is called a Π-system, if T is closed under finite intersections; a class, G, is called a λ-system, if
(i) Ω ∈ G,
(ii) if A, B ∈ G and A ⊃ B, then A \ B ∈ G,
(iii) if An ∈ G and An ⊂ An+1, then lim_{n→∞} An = ⋃_{n∈ℕ} An ∈ G.

The following useful observation is called Dynkin’s theorem.

Theorem 1.13 (Dynkin's theorem). If T is a Π-system and G is a λ-system, then G ⊃ T implies that G contains the smallest σ-algebra containing T.

The most useful application of Dynkin's theorem is the observation that, if two probability (resp. σ-finite) measures are equal on a Π-system, T, then they are equal on σ(T), the σ-algebra generated by T, since the class of sets on which the two measures coincide forms a λ-system containing T.
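As a toy illustration of this uniqueness argument (the space, the measures, and the function names are our own, not from the text): two probability measures on {1, 2, 3, 4} that agree on the Π-system T = {{1}, {1, 2}} must agree on all of σ(T), even though they differ on sets outside σ(T):

```python
from itertools import combinations

def sigma_algebra(omega, seed):
    """Smallest sigma-algebra on finite `omega` containing `seed`."""
    omega = frozenset(omega)
    fam = {frozenset(), omega} | {frozenset(s) for s in seed}
    while True:
        new = (set(fam) | {omega - a for a in fam}
                        | {a | b for a, b in combinations(fam, 2)})
        if new == fam:
            return fam
        fam = new

def measure(weights):
    """Probability measure from point masses: mu(A) = sum of weights."""
    return lambda A: round(sum(weights[x] for x in A), 10)

omega = {1, 2, 3, 4}
T = [{1}, {1, 2}]                            # a Pi-system
mu = measure({1: .1, 2: .2, 3: .3, 4: .4})
nu = measure({1: .1, 2: .2, 3: .4, 4: .3})   # differs only inside {3,4}

# the atoms of sigma(T) are {1}, {2}, {3,4}; mu and nu agree on each,
# hence on every set of sigma(T), although mu({3}) != nu({3})
sigma_T = sigma_algebra(omega, T)
assert all(mu(A) == nu(A) for A in sigma_T)
assert mu(frozenset({3})) != nu(frozenset({3}))
```

The point is that agreement on the small intersection-stable class T already pins both measures down on the whole generated σ-algebra.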

Examples.

The general setup allows us to treat many important examples on the same footing.

Countable spaces. If Ω is a countable space, the natural topology is the discrete topology. Here the set of neighbourhoods of a point p contains the set {p}. Clearly this is a topology, and all sets are open and closed with respect to it. The Borel-σ-algebra consists of the power set of Ω. Countable spaces equipped with the discrete metric, defined by ρ(x, y) = 1_{x≠y}, are metric spaces.
Euclidean space. ℝ^d equipped with the Euclidean metric ρ(x, y) ≡ ‖x − y‖ is a metric space. Choosing as sets of neighbourhoods the set of all open balls, Br(p) ≡ {x ∈ ℝ^d : ‖x − p‖ < r}, turns this into a topological space. The corresponding Borel-σ-algebra is the smallest σ-algebra containing all these balls.
Note that, since on ℝ^d the Euclidean norm and the sup-norm are equivalent, the Borel-σ-algebra is also generated by open (or closed) rectangles.
Infinite product spaces. If E is a topological space, then the infinite Cartesian product space, E^∞, can also be turned into a topological space through the product topology. Here the set of neighbourhoods of a point p ≡ (p1, p2, p3, ...) is given by the collection of sets

    U_{p1} × U_{p2} × ··· × U_{pk} × E × E × ···,    (1.1.3)

where k ∈ ℕ and U_{pi} ∈ 𝒰_{pi}. If B(E) is the Borel-σ-algebra of E, then the Borel-σ-algebra of E^∞ is the product σ-algebra, B(E^∞) = B(E)^{⊗∞}, i.e. the σ-algebra that is generated by the family of sets A1 × ··· × Ak × E × ···, k ∈ ℕ, Ai ∈ B(E) (where of course it also suffices to choose the sets E × ··· × E × Ak × E × ···, k ∈ ℕ, with Ak running through a generator of B(E)).
If E is a metric space, then one can also turn E^∞ into a metric space, such that the associated metric topology is equivalent to the product topology. This is done, e.g., by setting

    ρ_{E^∞}(p, q) ≡ Σ_{n=1}^∞ 2^{−n} ρ_E(pn, qn) / (1 + ρ_E(pn, qn)).    (1.1.4)

Note that this implies that, if E is a Polish space, then the infinite product space E^∞ equipped with the product topology is also Polish.
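A small numeric sketch of the metric (1.1.4) for E = ℝ (the function and its truncation parameter are our own illustration, not part of the text): each summand is bounded by 2^{−n}, so sequences agreeing in more initial coordinates are closer, exactly as the product topology demands:

```python
def product_metric(p, q, rho=lambda x, y: abs(x - y), terms=60):
    """Truncation of the product metric (1.1.4):
    sum_{n>=1} 2^{-n} * rho(p_n, q_n) / (1 + rho(p_n, q_n)).
    Each summand is at most 2^{-n}, so stopping after `terms`
    coordinates changes the value by at most 2^{-terms}."""
    total = 0.0
    for n, (x, y) in enumerate(zip(p, q), start=1):
        if n > terms:
            break
        d = rho(x, y)
        total += 2.0 ** (-n) * d / (1.0 + d)
    return total

# sequences that agree in more initial coordinates are closer:
p = [0.0] * 50
q3 = [0.0] * 3 + [1.0] * 47     # first differs at coordinate 4
q10 = [0.0] * 10 + [1.0] * 40   # first differs at coordinate 11
assert product_metric(p, p) == 0.0
assert product_metric(p, q10) < product_metric(p, q3) < 1.0
```

The factor d/(1+d) also caps each coordinate's contribution, which is what makes the sum converge even when individual coordinates are far apart.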
Infinite product spaces will be the setting in which we discuss stochastic processes with discrete time, the main topic of this course.
Function spaces. Important examples of metric spaces are normed function spaces, such as the space of bounded, real-valued functions on ℝ (or on subsets I ⊂ ℝ), equipped with the supremum norm

    ‖f − g‖_∞ ≡ sup_{t∈I} |f(t) − g(t)|.    (1.1.5)

In the case when I is infinite (e.g. I = ℝ₊), we will often use a weaker topology that "ignores infinity", called the topology of "uniform convergence on finite subsets". It can be metrised via

    ρ(f, g) ≡ Σ_{n=1}^∞ 2^{−n} sup_{0≤t≤n} |f(t) − g(t)| / (1 + sup_{0≤t≤n} |f(t) − g(t)|).    (1.1.6)

We will begin to deal with such examples in the later parts of this course, when we
introduce Gaussian random processes with continuous time.
Spaces of measures. Another space we often encounter in probability theory is the space of measures on a Borel-σ-algebra. There are various ways to introduce topologies on spaces of measures, but a very common one is the so-called weak topology. Let E be the topological space in question, and C0(E, ℝ) the space of real-valued, bounded, and continuous functions on E. We denote by M₊(E, B(E)) the set of all positive measures on (E, B(E)). One can then define neighbourhoods of a measure µ of the form

    B_{ε,k,f1,...,fk}(µ) ≡ { ν ∈ M₊(E, B(E)) : max_{i=1,...,k} |µ(fi) − ν(fi)| < ε },    (1.1.7)

where ε > 0, k ∈ ℕ, and fi ∈ C0(E, ℝ).
If E is a Polish space, then the weak topology can also be derived from a suitably defined metric.

1.2 Construction of measures

The problem of the construction of measures in the general context of topological spaces is not entirely trivial. This is due to the richness of a Borel-σ-algebra and the hidden subtlety associated with the requirement of σ-additivity. The general strategy is to construct a "measure" first on a simpler class of sets, an algebra or a semi-algebra, and then to use a powerful theorem ensuring the unique extendibility to the σ-algebra. To do this, we first define the notion of a σ-additive set-function.
Definition 1.14. Let A be a class of subsets of some set Ω. A function ν : A → [0, ∞] is called a positive, σ-additive (or countably additive) set-function, if
(i) ν(∅) = 0,
(ii) for any sequence, Ak, k ∈ ℕ, of mutually disjoint elements of A such that ⋃_{k∈ℕ} Ak ∈ A,

    ν(⋃_{k∈ℕ} Ak) = Σ_{k∈ℕ} ν(Ak).    (1.2.1)

The aim of this section is to prove the following version of Carathéodory's theorem.
Theorem 1.15 (Carathéodory’s theorem). Let Ω be a set and let S be an algebra
of subsets of Ω. Let µ0 : S → [0, ∞] be a σ-additive set-function. Then there exists a
measure, µ, on (Ω, σ(S)), such that µ = µ0 on S. If µ0 is σ-finite, then µ is unique.
Proof. We begin by defining the notion of an outer measure.
Definition 1.16. Let Ω be a set. A map µ∗ : P(Ω) → [0, ∞] is called an outer measure if:
(i) µ∗(∅) = 0;
(ii) if A ⊂ B, then µ∗(A) ≤ µ∗(B) (increasing);
(iii) µ∗ is σ-subadditive, i.e., for any sequence An ∈ P(Ω), n ∈ ℕ,

    µ∗(⋃_{n∈ℕ} An) ≤ Σ_{n∈ℕ} µ∗(An).    (1.2.2)

Note that an outer measure is far less constrained than a measure; this is why it can be defined on the full power set, not just on a σ-algebra.
Example. If (Ω, F, µ) is a measure space, we can define an extension of µ that will be an outer measure on P(Ω) as follows: for any D ⊂ Ω, let

    µ∗(D) ≡ inf{µ(F) : F ∈ F, F ⊃ D}.    (1.2.3)

This is of course not how we want to proceed when constructing a measure. Rather, we will construct an outer measure from a σ-additive function on an algebra (that is also a Π-system), and then use this to construct a measure.
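On a finite measure space, the outer extension (1.2.3) can be computed by brute force; a minimal sketch (the toy space and the function name are our own):

```python
# a finite measure space: F is generated by the partition {1,2} | {3,4}
omega = frozenset({1, 2, 3, 4})
F = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega]
mu = {F[0]: 0.0, F[1]: 1.0, F[2]: 2.0, F[3]: 3.0}

def outer_measure(D):
    """The outer extension (1.2.3): infimum of mu over measurable covers."""
    return min(mu[S] for S in F if D <= S)

# {1} is not in F; its cheapest measurable cover is {1,2}
assert outer_measure(frozenset({1})) == 1.0
# {1,3} meets both partition blocks, so only Omega covers it
assert outer_measure(frozenset({1, 3})) == 3.0
```

This makes visible how µ∗ extends µ to every subset while agreeing with µ on F itself.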
Next we define the notion of µ∗ -measurability of sets.

Definition 1.17. Let Ω be a set and µ∗ an outer measure. A subset B ⊂ Ω is called µ∗-measurable, if, for all subsets A ⊂ Ω,

    µ∗(A) = µ∗(A ∩ B) + µ∗(A ∩ Bᶜ).    (1.2.4)

The set of µ∗-measurable sets is called M(µ∗).

Theorem 1.18. Let µ∗ be an outer measure. Then:
(i) M(µ∗) is a σ-algebra that contains all subsets B ⊂ Ω such that µ∗(B) = 0.
(ii) The restriction of µ∗ to M(µ∗) is a measure.

Proof. Note first that, in general, by sub-additivity,

    µ∗(A) ≤ µ∗(A ∩ B) + µ∗(A ∩ Bᶜ).    (1.2.5)

If µ∗(B) = 0, we also have that

    µ∗(A) ≥ µ∗(A ∩ Bᶜ) = µ∗(A ∩ B) + µ∗(A ∩ Bᶜ).    (1.2.6)

Thus, M(µ∗) contains all sets B with µ∗(B) = 0. This implies in particular that ∅ ∈ M(µ∗). Also, by the symmetry of the definition, M(µ∗) contains, with each of its elements, also its complement. Thus the only non-trivial part of (i) is the stability under countable unions. Let B1, B2 be in M(µ∗). Note also that the inequality converse to (1.2.5) holds trivially if µ∗(A) = +∞; therefore we can assume henceforth that the sets A satisfy µ∗(A) < ∞. Then

    µ∗(A ∩ (B1 ∪ B2)) = µ∗(A ∩ (B1 ∪ B2) ∩ B1) + µ∗(A ∩ (B1 ∪ B2) ∩ B1ᶜ)
                      = µ∗(A ∩ B1) + µ∗(A ∩ B2 ∩ B1ᶜ),    (1.2.7)

where we used that B1 ∈ M(µ∗) for the first equality. Then

    µ∗(A ∩ (B1 ∪ B2)) + µ∗(A ∩ (B1 ∪ B2)ᶜ)    (1.2.8)
        = µ∗(A ∩ B1) + µ∗(A ∩ B2 ∩ B1ᶜ) + µ∗(A ∩ B1ᶜ ∩ B2ᶜ)
        = µ∗(A ∩ B1) + µ∗(A ∩ B1ᶜ) = µ∗(A).

Thus B1 ∪ B2 ∈ M(µ∗). This implies that M(µ∗) is closed under finite unions. Since it is also closed under passage to the complement, it is closed under finite intersections. Thus it is enough to show that countable unions of pairwise disjoint sets, Bk ∈ M(µ∗), k ∈ ℕ, are in M(µ∗). To show this, we show that, for all m ∈ ℕ,

    µ∗(A) = Σ_{n=1}^m µ∗(A ∩ Bn) + µ∗(A ∩ ⋂_{n=1}^m Bnᶜ).    (1.2.9)

This holds for m = 1 by definition, and if it holds for m, then

    µ∗(A ∩ ⋂_{n=1}^m Bnᶜ) = µ∗(A ∩ ⋂_{n=1}^m Bnᶜ ∩ B_{m+1}) + µ∗(A ∩ ⋂_{n=1}^{m+1} Bnᶜ)
                          = µ∗(A ∩ B_{m+1}) + µ∗(A ∩ ⋂_{n=1}^{m+1} Bnᶜ),

where the first equality uses that B_{m+1} ∈ M(µ∗), and the second that the Bn are pairwise disjoint, so that A ∩ ⋂_{n=1}^m Bnᶜ ∩ B_{m+1} = A ∩ B_{m+1}. So, inserting this into (1.2.9), it holds for m + 1. Hence, by induction, it is true for all m ∈ ℕ.
From (1.2.9) we deduce further that

    µ∗(A) ≥ Σ_{n=1}^m µ∗(A ∩ Bn) + µ∗(A ∩ ⋂_{n=1}^∞ Bnᶜ).    (1.2.10)

Now we let m tend to infinity, and use sub-additivity:

    µ∗(A) ≥ Σ_{n=1}^∞ µ∗(A ∩ Bn) + µ∗(A ∩ ⋂_{n=1}^∞ Bnᶜ)    (1.2.11)
          ≥ µ∗(A ∩ ⋃_{n=1}^∞ Bn) + µ∗(A ∩ ⋂_{n=1}^∞ Bnᶜ).

The converse inequality follows easily. Clearly,

    µ∗(A) = µ∗((A ∩ ⋃_{n=1}^∞ Bn) ∪ (A ∩ (⋃_{n=1}^∞ Bn)ᶜ))
          ≤ µ∗(A ∩ ⋃_{n=1}^∞ Bn) + µ∗(A ∩ (⋃_{n=1}^∞ Bn)ᶜ).    (1.2.12)

Thus equality holds, and the union ⋃_{n=1}^∞ Bn ∈ M(µ∗).
It remains to prove that µ∗ restricted to M(µ∗) is a measure. We know already that µ∗(∅) = 0. Let now Bn be disjoint as above, and choose A = ⋃_{n=1}^∞ Bn in the first line of (1.2.11). This gives

    µ∗(⋃_{n=1}^∞ Bn) ≥ Σ_{n=1}^∞ µ∗(Bn).    (1.2.13)

Since the converse inequality holds by sub-additivity, equality holds and the result is proven. □

The preceding theorem provides a clear strategy for proving Carathéodory's theorem. All we need is to prescribe a σ-additive function, µ0, on the algebra, and then construct an outer measure µ∗. This can be done in the following way: if S is an algebra, set

    µ∗(D) = inf{ Σ_{n∈ℕ} µ0(Fn) : Fn ∈ S, ⋃_{n∈ℕ} Fn ⊃ D }.    (1.2.14)

One needs to show that this is sub-additive and defines an outer measure. Once this is done, it remains to show that M(µ∗) contains σ(S). This is done by showing that it contains S, since M(µ∗) is a σ-algebra.
Let us now conclude our proof by carrying out these steps.

Lemma 1.19. Let S be an algebra, µ0 a σ-additive function on S, and µ∗ defined by (1.2.14). Then µ∗ is an outer measure.

Proof. First, note that the first two conditions for µ∗ to be an outer measure are trivially satisfied. To prove sub-additivity, let An, n ∈ ℕ, be a family of subsets of Ω. For each n and given ε > 0, we can choose a family of sets Fn,i ∈ S, i ∈ ℕ, such that An ⊂ ⋃_{i∈ℕ} Fn,i and Σ_{i∈ℕ} µ0(Fn,i) ≤ µ∗(An) + ε2^{−n}. Such families must exist due to the definition of µ∗ as the infimum over the sums of the masses of all such covers. Then, since ⋃_n ⋃_i Fn,i ⊃ ⋃_n An,

    µ∗(⋃_{n∈ℕ} An) ≤ Σ_{n,i∈ℕ} µ0(Fn,i) ≤ Σ_{n∈ℕ} (µ∗(An) + ε2^{−n}) ≤ Σ_{n∈ℕ} µ∗(An) + 2ε,    (1.2.15)

which proves the claim since ε > 0 is arbitrary. □

Lemma 1.20. Let µ∗ be the outer measure defined by (1.2.14), and let M(µ∗) be the σ-algebra of µ∗-measurable sets. Then σ(S) ⊂ M(µ∗).

Proof. We must show that M(µ∗) contains a family that generates σ(S). In fact, we will show that it contains all elements of the algebra S. To see this, let A ⊂ Ω be arbitrary with µ∗(A) < ∞. Then, for any ε > 0, there is a collection (Fn)_{n∈ℕ}, with Fn ∈ S for all n ∈ ℕ, such that A ⊂ ⋃_{n∈ℕ} Fn and Σ_{n∈ℕ} µ0(Fn) ≤ µ∗(A) + ε. But then, for B ∈ S, set Bn = Fn ∩ B ∈ S. The union of all the Bn covers A ∩ B, while A ∩ Bᶜ is covered by the union of the sets Fn \ B = Fn \ Bn. Hence

    µ∗(A ∩ B) ≤ µ∗(⋃_{n∈ℕ} Bn) ≤ Σ_{n∈ℕ} µ0(Bn),    (1.2.16)

where the last inequality uses sub-additivity and the fact that µ∗ equals µ0 on S. Also, for the same reasons,

    µ∗(A ∩ Bᶜ) ≤ µ∗(⋃_{n∈ℕ} (Fn \ Bn)) ≤ Σ_{n∈ℕ} µ0(Fn \ Bn).    (1.2.17)

Adding both inequalities and using that µ0(Bn) + µ0(Fn \ Bn) = µ0(Fn), we get that

    µ∗(A ∩ B) + µ∗(A ∩ Bᶜ) ≤ Σ_{n∈ℕ} µ0(Fn) ≤ µ∗(A) + ε.    (1.2.18)

Since ε > 0 is arbitrary, this proves

    µ∗(A) ≥ µ∗(A ∩ B) + µ∗(A ∩ Bᶜ),    (1.2.19)

and since the opposite inequality follows by sub-additivity, B ∈ M(µ∗). □

Thus we have in fact constructed an outer measure that is a measure on σ(S) and that extends µ0 on S. The uniqueness of the extension in the finite case follows from Dynkin's theorem: assume that there are two measures, µ and ν, on σ(S) that coincide on S. One verifies easily that the class of sets B ∈ σ(S) with µ(B) = ν(B) is a λ-system which contains the Π-system S; by Dynkin's theorem this λ-system must be σ(S). Finally, if µ0 is σ-finite, one uses the following standard argument (which allows one to carry many results over from finite to σ-finite measures): by σ-finiteness, there exists a sequence of increasing sets, Ωn ↑ Ω, with µ0(Ωn) < ∞. Then the measures µn ≡ µ 1_{Ωn} ↑ µ. So if there are two extensions of a given σ-additive set-function, then their restrictions to all Ωn are finite measures and must coincide. But then so must their limits. This concludes the proof of Carathéodory's theorem. □
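The whole construction can be traced on a toy example (the space, algebra, and values below are our own, not from the text): starting from the algebra generated by a two-block partition, the cover formula (1.2.14) and the criterion (1.2.4) recover exactly the sets one expects to be measurable:

```python
from itertools import combinations

# the algebra S generated by the partition {0,1} | {2,3} of Omega
O = frozenset({0, 1, 2, 3})
S = [frozenset(), frozenset({0, 1}), frozenset({2, 3}), O]
mu0 = {S[0]: 0.0, S[1]: 1.0, S[2]: 2.0, S[3]: 3.0}
all_subsets = [frozenset(c) for r in range(5) for c in combinations(O, r)]

def outer(D):
    """mu*(D) as in (1.2.14).  For an algebra on a finite space the
    union of a finite cover lies in S again, so single-set covers
    already attain the infimum."""
    return min(mu0[F] for F in S if D <= F)

def caratheodory_measurable(B):
    """Criterion (1.2.4), tested against every test set A."""
    return all(outer(A) == outer(A & B) + outer(A - B)
               for A in all_subsets)

M = [B for B in all_subsets if caratheodory_measurable(B)]
# no null sets here, and sigma(S) = S, so M(mu*) is the algebra itself
assert set(M) == set(S)
```

For instance B = {0} fails the criterion with A = {0, 1}: its two halves each need the cover {0, 1}, so the right side of (1.2.4) is 2 while the left side is 1.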

Remark. Carathéodory's theorem may appear rather striking at first in its generality: it makes no assumptions on the nature of the space Ω whatsoever. Does this mean that the construction of a measure is in general trivial? The answer is of course no, but Carathéodory's theorem cleanly separates the topological aspects from the algebraic aspects of measure theory. Namely, it shows that, in a concrete situation, to construct a measure one needs to construct a σ-additive set-function on an algebra that contains a Π-system generating the desired σ-algebra. The proof of Carathéodory's theorem shows that the extension to a measure is essentially a matter of algebra and completely general. We will see later how topological aspects enter into the construction of additive set-functions, and why aspects like separability and metric topologies become relevant.

Remark. The σ-algebra M(µ∗) is in general not equal to the σ-algebra generated by S. In particular, we have seen that M(µ∗) contains all sets of µ∗-measure zero, not all of which need be in σ(S). This observation suggests considering, in general, extensions of a given σ-algebra with respect to a measure that ensure that all sets of measure zero are measurable. Let (Ω, F, µ) be a measure space. Define the outer measure, µ∗, as in (1.2.3), and define the inner measure, µ_*, as

    µ_*(D) ≡ sup{µ(F) : F ∈ F, F ⊂ D}.    (1.2.20)

Then set

    M(µ) ≡ {A ⊂ Ω : µ∗(A) = µ_*(A)}.    (1.2.21)

One can easily check that M(µ) is a σ-algebra that contains F and all sets of outer measure zero.
Terminology. A measure, µ, defined on a Borel-σ-algebra F = B(Ω) is sometimes called a Borel measure. The measure space (Ω, M(µ), µ) is called the completion of (Ω, F, µ).
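A finite sketch of (1.2.20) and (1.2.21), with a toy measure of our own choosing: because {2, 3} is a null set, every subset of it has inner and outer measure zero, so here the completion M(µ) is the whole power set:

```python
from itertools import combinations

O = frozenset({1, 2, 3})
F = [frozenset(), frozenset({1}), frozenset({2, 3}), O]
mu = {F[0]: 0.0, F[1]: 1.0, F[2]: 0.0, F[3]: 1.0}   # {2,3} is a null set

def outer(D):
    """(1.2.3): infimum of mu over measurable supersets of D."""
    return min(mu[S] for S in F if D <= S)

def inner(D):
    """(1.2.20): supremum of mu over measurable subsets of D."""
    return max(mu[S] for S in F if S <= D)

all_subsets = [frozenset(c) for r in range(4) for c in combinations(O, r)]
completion = [D for D in all_subsets if outer(D) == inner(D)]
# every one of the 8 subsets of Omega is in the completion M(mu):
assert len(completion) == 8
```

The set {2}, for example, is not in F, but both its inner and outer measure are 0, so it becomes measurable in the completion.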

It is a nice feature of null-sets that not only can they be added, but they can also be gotten rid of. This is the content of the next lemma.

Lemma 1.21. Let (Ω, F, µ) be a probability space and assume that G ⊂ Ω is such that µ∗(G) = 1. Then, for any A ∈ F, µ∗(G ∩ A) = µ(A), and if 𝒢 ≡ F ∩ G (that is, the class of all subsets of G of the form G ∩ A, A ∈ F), then (G, 𝒢, µ∗) is a probability space.

Proof. Exercise. □

Lebesgue measure. The prime example of the construction of a measure using Carathéodory's theorem is the Lebesgue measure on ℝ. Consider the algebra, S, of all sets that can be written as finite unions of semi-open, disjoint intervals of the form (a, b] and (a, +∞), a ∈ ℝ ∪ {−∞}, b ∈ ℝ. Clearly, the function λ, defined by

    λ(⋃_i (ai, bi]) = Σ_i (bi − ai),    (1.2.22)

provides a countably additive set-function (this needs a proof!). Then we know that this can be extended to σ(S) = B(ℝ); more precisely, one actually constructs a measure on the σ-algebra M(λ∗), and strictly speaking it is this measure on the complete measure space (ℝ, M(λ), λ) that is called the Lebesgue measure.
Of course the same construction can be carried out on any finite non-empty interval, I ⊂ ℝ; the corresponding measures are finite and thus unique. It is easy to see that λ as a measure on ℝ is σ-finite and hence also unique.
The construction carries over, with obvious modifications, to ℝ^d: just replace half-open intervals by half-open rectangles. The key is that we have a natural notion of volume for the elementary objects, and that this provides a σ-additive function on the corresponding algebra.
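The set-function (1.2.22) is elementary to compute on the algebra S; a short sketch (with toy values of our own) checking finite additivity and translation invariance, two properties the Carathéodory extension then inherits:

```python
def lam(intervals):
    """The set-function (1.2.22): total length of a finite union of
    disjoint half-open intervals (a_i, b_i]."""
    assert all(a <= b for a, b in intervals)
    return sum(b - a for a, b in intervals)

# finite additivity on disjoint pieces of (0, 1]:
whole = lam([(0, 1)])
split = lam([(0, 0.25), (0.25, 1)])
assert whole == split == 1.0

# translation invariance:
assert lam([(5, 6.5)]) == lam([(0, 1.5)]) == 1.5
```

The genuinely hard part, countable additivity of λ on S, is the "this needs a proof!" step above and is not captured by such finite checks.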
On topological spaces, one can ask for a number of continuity-related properties of measures that will occasionally come in very handy.

Definition 1.22. Let Ω be a Hausdorff space and B(Ω) the corresponding Borel-σ-algebra. A measure, µ, on (Ω, F = B(Ω)) is called:
(0) a Borel measure, if for any compact set³, C ∈ F, µ(C) < ∞;
(i) inner regular or tight, if, for all B ∈ F, µ(B) = sup_{C⊂B} µ(C), where the supremum is over all compact sets contained in B;
(ii) outer regular, if, for all B ∈ F, µ(B) = inf_{O⊃B} µ(O), where the infimum is over all open sets containing B;
(iii) locally finite, if for any point p ∈ Ω there exists a neighbourhood U_p such that µ(U_p) < ∞;
(iv) a Radon measure, if it is inner regular and locally finite.
A very important result is that on a compact metrisable space⁴, all probability
measures are inner regular. The following result will be used in the construction of
stochastic processes in Section 3.2.
Theorem 1.23. Let Ω be a (Hausdorff) compact metrisable space and let P be a
probability measure on (Ω, B(Ω)). Then P is inner regular.
Proof. Let A be the class of elements, B, of B(Ω), such that, for all ε > 0, there exists
a compact set, K ⊂ B, and an open set, G ⊃ B, such that P(B\K) < ε and P(G\B) < ε.
Step 1: Show that A is an algebra. First, if B ∈ A, then its complement, B^c, will
also be in A (for G^c is closed and hence compact, B^c ⊃ G^c, and B^c\G^c = G\B,
and vice versa for the open set K^c ⊃ B^c). Next, if B1, B2 ∈ A, then there are
Ki ⊂ Bi and Gi ⊃ Bi, such that P(Bi\Ki) < ε/2 and P(Gi\Bi) < ε/2. Then
K = K1 ∪ K2 and G = G1 ∪ G2 are the desired sets for B = B1 ∪ B2. Thus A is an algebra.
Step 2: Show that A is a σ-algebra. Now let Bn be an increasing sequence of
elements of A such that ⋃_{n∈N} Bn = B. We choose sets Kn and Gn as before, but with
ε/2 replaced by ε2^{−n−1}. Then there exists N < ∞ such that P(B \ ⋃_{n=1}^{N} Kn) < ε.
Indeed, P(B \ ⋃_{n∈N} Kn) < ε/2, while P(⋃_{n∈N} Kn \ ⋃_{n=1}^{N} Kn) < ε/2 for N large
enough. Therefore, there exists a compact set K ≡ ⋃_{n=1}^{N} Kn such that P(B\K) < ε.
The infinite union of the Gn contains B and is open, and P(⋃_{n∈N} Gn \ B) < ε, and
so B ∈ A. Thus A is a σ-algebra.
Step 3: Show that B(Ω) = A. We need to verify that any compact set K ∈ B(Ω)
is in A. Since Ω is metrisable, there exists a metric, ρ, such that the topology of Ω
is equivalent to the metric topology. If K is a closed and thus compact subset of Ω,
then K is the intersection of the sequence of open sets Gn ≡ {ω ∈ Ω : ρ(ω, K) < 1/n},

    K = ⋂_{n∈N} Gn.                                                        (1.2.23)

Since Gn ↓ K and P is finite, it follows that P[Gn] ↓ P[K], because Gn\K ↓ ∅ and
by σ-additivity P[Gn\K] ↓ 0. This means that K ∈ A. Thus A is a σ-algebra that
contains all closed sets, and since B(Ω) is the smallest σ-algebra that contains all
closed sets, B(Ω) ⊂ A. By definition A ⊂ B(Ω), thus B(Ω) = A.
Now, for any B ∈ B(Ω) and K ⊂ B compact, P(B) = P(K) + P(B\K). For any
B ∈ A and any ε > 0, by definition, there exists K such that P(B\K) < ε. Thus
sup{P(K) : K ⊂ B} = P(B), so P is inner regular. □

³ For Hausdorff spaces it holds also that compact sets are closed. Closed sets in a compact
topological space are compact.
⁴ A topological space E is compact if every open cover of E has a finite subcover. In other words,
if E is the union of a family of open sets, there is a finite subfamily whose union is E.

Remark. We used in the proof the fact that Ω is compact to conclude that all closed
sets are compact. Otherwise, we would only deduce that, for all B ∈ B(Ω), P(B) =
sup{P(C) : C ⊂ B}, where the supremum is over all closed sets. Sometimes this is called
inner regular, and tight is reserved for our definition. Note also that the conclusion
of the theorem also holds for complete separable metrisable (Polish) spaces without
the compactness assumption.

Remark. Note that the proof shows that P is also outer regular. Measures that are
both inner and outer regular are sometimes called regular.

1.3 Random variables

Definition 1.24. Let (Ω, F) and (E, G) be two measurable spaces. A map f : Ω → E
is called measurable from (Ω, F) to (E, G), if, for all A ∈ G, f^{−1}(A) ≡ {ω ∈ Ω : f(ω) ∈
A} ∈ F.

The notion of measurability implies that a measurable map is capable of transporting
a measure from one space to another. Namely, if (Ω, F, P) is a probability
space, and f is a measurable map from (Ω, F) to (E, G), then

    P_f ≡ P ∘ f^{−1}

defines a probability measure on (E, G), called the induced measure. Namely, for
any B ∈ G, by definition

    P_f(B) = P( f^{−1}(B) )

is well defined, since f^{−1}(B) ∈ F.
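For a finite sample space the induced measure can be tabulated directly. The following minimal Python sketch (the helper name `pushforward` and the coin-toss example are our own illustration, not notation from the text) computes P ∘ f^{−1} for a discrete P:

```python
from collections import defaultdict

def pushforward(P, f):
    """Induced measure P_f = P o f^{-1} for a probability on a finite
    sample space; P is a dict mapping each outcome omega to P({omega})."""
    Pf = defaultdict(float)
    for omega, p in P.items():
        Pf[f(omega)] += p      # the mass of omega lands on the value f(omega)
    return dict(Pf)

# two fair coin tosses; f counts the number of heads
P = {('H', 'H'): 0.25, ('H', 'T'): 0.25, ('T', 'H'): 0.25, ('T', 'T'): 0.25}
Pf = pushforward(P, lambda omega: omega.count('H'))
print(Pf)  # {2: 0.25, 1: 0.5, 0: 0.25}
```

Note that `pushforward` never needs f to be invertible: it only aggregates the masses of the preimages, exactly as P(f^{−1}(B)) prescribes.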


The standard notion of a random variable refers to a measurable function from
some measurable space to the space (R, B(R)). We will generally extend this notion
and call any measurable map from a measurable space (Ω, F) to a measurable space
(E, B(E)), where E is a topological, respectively metric, space, an E-valued random
variable or an E-valued Borel function. Our privileged picture is then that we have
an unspecified, so-called abstract probability space (Ω, F, P) on which all kinds of
random variables, be they reals, infinite sequences, functions, or measures, are defined,
possibly simultaneously.
An important notion is then that of the σ-algebra generated by random variables.

Definition 1.25. Let (Ω, F) be a measurable space, and let (E, B(E)) be a topological
space equipped with its Borel-σ-algebra. Let f be an E-valued random variable. We
say that σ(f) is the smallest σ-algebra such that f is measurable from (Ω, σ(f)) to
(E, B(E)).
Note that σ(f) depends on the set of values f takes. E.g., if f is real-valued but
takes only finitely many values, the σ-algebra generated by f has just finitely many
elements. If f is a constant function, then σ(f) = {Ω, ∅}, the trivial σ-algebra. This
notion is particularly useful if several random variables are defined on the same
probability space.
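On a finite Ω this can be made completely concrete: the σ-algebra generated by a finitely-valued f consists of all unions of the preimage atoms f^{−1}({v}). A small sketch (all names here are illustrative, not from the text):

```python
from itertools import combinations

def sigma_generated_by(Omega, f):
    """sigma(f) for a finitely-valued f on a finite Omega: every element
    is a union of the atoms f^{-1}({v}); sets are frozensets."""
    atoms_by_value = {}
    for omega in Omega:
        atoms_by_value.setdefault(f(omega), set()).add(omega)
    atoms = [frozenset(a) for a in atoms_by_value.values()]
    sigma = set()
    for r in range(len(atoms) + 1):            # all unions of atoms,
        for combo in combinations(atoms, r):   # including the empty union
            sigma.add(frozenset().union(*combo))
    return sigma

Omega = {1, 2, 3, 4, 5, 6}                     # one throw of a die
sigma = sigma_generated_by(Omega, lambda w: w % 2)
print(len(sigma))  # 4
```

Here f is the parity of the face, so σ(f) has the four elements ∅, the odd faces, the even faces, and Ω; for a constant f the same routine returns just {∅, Ω}, the trivial σ-algebra mentioned above.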
Dynkin’s theorem has a useful analogue for so-called monotone classes of func-
tions.
Theorem 1.26 (Monotone class theorem). Let H be a class of bounded functions
from Ω to R. Assume that
(i) H is a vector space over R,
(ii) 1 ∈ H,
(iii) if fn ≥ 0 are in H, and fn ↑ f, where f is bounded, then f ∈ H.
If H contains the indicator functions of every element of a Π-system S, then H
contains every bounded σ(S)-measurable function.
Proof. Let D be the class of subsets D of Ω such that 1_D ∈ H. Then D is a λ-system.
Since by hypothesis D contains S, by Dynkin's theorem, D contains the σ-algebra
generated by S. Now let f be a σ(S)-measurable function s.t. 0 ≤ f ≤ K < ∞ for
some constant K. Set

    D(n, i) ≡ {ω ∈ Ω : i2^{−n} ≤ f(ω) < (i + 1)2^{−n}},                    (1.3.1)

and set
    fn(ω) ≡ Σ_{i=0}^{K2^n} i2^{−n} 1_{D(n,i)}(ω).                          (1.3.2)

Every D(n, i) is σ(S)-measurable, and so 1_{D(n,i)} ∈ H, and so by (i), fn ∈ H. Since
fn ↑ f, by (iii), f ∈ H.
To conclude, we take a general bounded σ(S)-measurable function, decompose it into
its positive and negative parts, and treat each part as before. □
An important property of measurable functions is that the space of measurable
functions is closed under limit procedures.

Lemma 1.27. Let fn, n ∈ N, be real-valued random variables. Then the functions

    f⁺ ≡ lim sup_{n→∞} fn   and   f⁻ ≡ lim inf_{n→∞} fn                    (1.3.3)

are measurable. In particular, if fn → f pointwise, then f is measurable.


The proof is left as an exercise.
If Ω is a topological space, we have the natural class of continuous functions
from Ω to R. It is easy to see that all continuous functions are measurable if Ω and
R are equipped with their Borel σ-algebras. Thus, all functions that are pointwise
limits of continuous functions are measurable, etc.

Remark. Instead of introducing the Borel-σ-algebra, one could go a different path
and introduce what is called the Baire-σ-algebra. Here one proceeds from the idea
that on a topological space one naturally has the notion of continuous functions.
One certainly will want all of these to be measurable functions, but one
will want more: any pointwise limit of continuous functions should be measurable,
as well as limits of sequences of such functions. In this way one arrives at a class of
functions, called Baire functions, that is defined as the smallest class of functions
that is closed under pointwise limits and that contains the continuous functions.
One can then define the Baire-σ-algebra as the smallest σ-algebra that makes all
Baire functions measurable. The Borel-σ-algebra always contains the Baire-σ-
algebra, but in general they are not the same. However, on most spaces
we will consider (Polish spaces), the two concepts coincide.

1.4 Integrals

We will now recall the notion of the integral of a measurable function (respectively,
the expectation of a random variable).
To do this, one first introduces the notion of non-negative simple functions:

Definition 1.28. A function, g : Ω → R, is called a non-negative simple function if,
for some k ∈ N, there are non-negative numbers, w1, . . . , wk, and a partition of Ω,
(Ai ∈ F)_{i=1}^{k} with ⋃_{i=1}^{k} Ai = Ω, such that Ai = {ω ∈ Ω : g(ω) = wi}. Then we can write

    g(ω) = Σ_{i=1}^{k} wi 1_{Ai}(ω).

The space of non-negative simple functions is denoted by E₊.

Obviously, simple functions are measurable, and it is clear what the integral of a
simple function should be.

Definition 1.29. Let (Ω, F, µ) be a measure space and g = Σ_{i=1}^{k} wi 1_{Ai} a non-negative
simple function. Then
    ∫_Ω g dµ = Σ_{i=1}^{k} wi µ(Ai).                                       (1.4.1)

Note that ∫ 1_A dµ = µ(A) under any interpretation of what an integral
should be; the rest follows from the requirement of linearity.


The integral of a general measurable function is defined by approximation with
simple functions.
Definition 1.30.
(i) Let f be non-negative and measurable. Then

    ∫_Ω f dµ ≡ sup_{g≤f, g∈E₊} ∫_Ω g dµ.                                   (1.4.2)

Note that the value of the integral is in [0, +∞].
(ii) If f is measurable, set

    f(ω) = 1_{f(ω)≥0} f(ω) + 1_{f(ω)<0} f(ω) ≡ f₊(ω) − f₋(ω).

If either ∫_Ω f₊(ω) dµ < ∞ or ∫_Ω f₋(ω) dµ < ∞, define

    ∫_Ω f(ω) dµ ≡ ∫_Ω f₊(ω) dµ − ∫_Ω f₋(ω) dµ.                             (1.4.3)

(iii) We call a function f integrable or absolutely integrable, if

    ∫ |f| dµ < ∞.

Notation. If P is a probability measure and X a random variable, we write

    ∫ X dP ≡ E[X],                                                          (1.4.4)

and call E[X] the expectation of X.

Remark. One sometimes also writes µ(f) for ∫ f dµ. But this should be avoided, in
particular in the case of probabilities. Never write P(X) if X is a random variable!!


We state the key properties of the integral without proof.
The most fundamental property is the monotone convergence theorem, which to
a large extent justifies the (otherwise strange) definition above.

Theorem 1.31 (Monotone convergence theorem). Let (Ω, F, µ) be a measure space
and f a real-valued non-negative measurable function. Let f1 ≤ f2 ≤ · · · ≤ f be a
monotone increasing sequence of non-negative measurable functions that converges
pointwise to f. Then
    ∫_Ω f dµ = lim_{n→∞} ∫_Ω fn dµ.                                        (1.4.5)

The monotone convergence theorem allows us to provide an “explicit” construction
of the integral, as originally used by Lebesgue as a definition.

Lemma 1.32. Let f be a non-negative measurable function. Then

    ∫_Ω f dµ = lim_{n→∞} [ Σ_{k=0}^{n2^n − 1} 2^{−n} k µ({ω : 2^{−n} k ≤ f(ω) < 2^{−n}(k + 1)})
                           + n µ({ω : f(ω) ≥ n}) ].                        (1.4.6)
The following lemma is known as Fatou’s lemma:


Lemma 1.33. Let fn be a sequence of measurable non-negative functions. Then

    ∫_Ω lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.                     (1.4.7)

Equally central is Lebesgue's dominated convergence theorem:

Theorem 1.34 (Lebesgue's dominated convergence theorem). Let fn be a se-
quence of absolutely integrable functions, and let f be a measurable function such
that
    lim_{n→∞} fn(ω) = f(ω)   for µ-almost all ω.

Let g ≥ 0 be a non-negative function such that ∫_Ω g dµ < ∞ and

    |fn(ω)| ≤ g(ω)   for µ-almost all ω.

Then f is absolutely integrable with respect to µ and

    lim_{n→∞} ∫_Ω fn dµ = ∫_Ω f dµ.                                        (1.4.8)

In the case when we are dealing with integrals with respect to a probability mea-
sure, there exists a very useful improvement of the dominated convergence theorem
that leads us to the important notion of uniform integrability.
We have frequently used the following fact about probability measures:

Lemma 1.35. Let (Ω, F, P) be a probability space and let X be an absolutely inte-
grable real-valued random variable on this space. Then, for any ε > 0, there exists
K < ∞, such that
    E[ |X| 1_{|X|>K} ] < ε.                                                 (1.4.9)

Proof. This is a direct consequence of the monotone convergence theorem. The
family of random variables X_K ≡ |X| 1_{|X|≤K} is monotone increasing in K and
converges pointwise to |X|. Therefore, lim_{K↑∞} E X_K = E|X|. But E|X| = E X_K +
E|X| 1_{|X|>K}, and so lim_{K↑∞} E|X| 1_{|X|>K} = 0, which by definition of convergence
implies the assertion of the lemma. □

When dealing with families of random variables, one problem is that this prop-
erty will in general not hold uniformly. A nice situation occurs if it does:
Definition 1.36 (Uniform integrability). Let (Ω, F, P) be a probability space. A
class, C, of real-valued random variables is called uniformly integrable, if, for any
ε > 0, there exists K < ∞, such that, for all X ∈ C,

    E[ |X| 1_{|X|>K} ] < ε.                                                 (1.4.10)

Note that, in particular, if C is uniformly integrable, then there exists a constant,
c < ∞, such that, for all X ∈ C, E(|X|) ≤ c.
Remark. The definition of uniform integrability immediately extends to random
variables taking values in a Banach space by replacing the absolute value by the
norm.
Remark. The simplest example of a class of random variables that is not uniformly
integrable is given as follows. Take Xn such that

    P(Xn = 1) = 1 − 1/n   and   P(Xn = n) = 1/n.                           (1.4.11)

Clearly, for any K, lim_{n→∞} E(|Xn| 1_{|Xn|>K}) = 1. One should always keep this example
in mind when reflecting upon uniform integrability. Note that, on the other hand, the
class of random variables (Yn, n ∈ N) with

    P(Yn = 1) = 1 − 1/n²   and   P(Yn = n) = 1/n²                          (1.4.12)

is uniformly integrable.
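Both claims are easy to check numerically. In the sketch below (the function names are ours), `tail_X` computes E[|Xn| 1_{|Xn|>K}] for a family whose large value n carries mass 1/n, while `tail_Y` does the same when that value only carries mass 1/n²:

```python
def tail_X(n, K):
    # E[|X_n| 1_{|X_n| > K}] when P(X_n = 1) = 1 - 1/n, P(X_n = n) = 1/n
    return (1 > K) * (1 - 1 / n) + (n > K) * n * (1 / n)

def tail_Y(n, K):
    # same tail expectation when the value n carries probability 1/n**2
    return (1 > K) * (1 - 1 / n**2) + (n > K) * n * (1 / n**2)

K = 50
print(max(tail_X(n, K) for n in range(1, 2000)))  # stays at 1 for every K: not u.i.
print(max(tail_Y(n, K) for n in range(1, 2000)))  # at most 1/(K+1): u.i.
```

However large K is chosen, some Xn with n > K contributes a full unit of tail mass; for Yn the worst tail is 1/(K+1), which can be made uniformly small.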
The following lemma gives an equivalent formulation of uniform integrability
that is sometimes useful.
Lemma 1.37. A class, C, of real-valued random variables is uniformly integrable,
if and only if sup_{X∈C} E|X| < ∞ and for all ε > 0 there exists δ > 0, such that, for any
set A ∈ F with
    P(A) < δ,                                                               (1.4.13)
it holds that, for all X ∈ C,
    E[ |X| 1_A ] < ε.                                                       (1.4.14)
Proof. First, assume that the class C is uniformly integrable. For given ε, let K be
such that E[|X| 1_{|X|>K}] < ε/2 for all X ∈ C, and K > 2/ε. Then

    E[|X| − (K ∧ |X|)] = E[(|X| − K) 1_{|X|>K}]
                       = E[|X| 1_{|X|>K}] − K P(|X| > K) < ε/2,             (1.4.15)

for all X ∈ C. Now choose A such that P(A) ≤ K^{−2}. Then

    E[|X| 1_A] ≤ E[(|X| − (K ∧ |X|)) 1_A] + K P(A)
              ≤ E[|X| − (K ∧ |X|)] + K P(A) ≤ ε/2 + K^{−1} < ε.             (1.4.16)

Thus, the property claimed in the lemma follows with δ = K^{−2}.
Now assume the conclusion of the lemma holds. By Chebyshev's inequality,
P(|X| > K) ≤ K^{−1} E|X| ≤ K^{−1} sup_{X∈C} E|X|, for all X ∈ C. Hence we can find, for any
δ > 0, K < ∞ such that P(|X| > K) ≤ δ. Then E[|X| 1_{|X|>K}] < ε, if δ is chosen as in the
conclusion of the lemma. Hence C is uniformly integrable. □

Theorem 1.38 (Uniform integrability). Let Xn, n ∈ N, and X be integrable random
variables on some probability space (Ω, F, P). Then lim_{n→∞} E|Xn − X| = 0, if and
only if
(i) Xn → X in probability, and
(ii) the family Xn, n ∈ N, is uniformly integrable.

Proof. The key idea is that if a sequence of random variables is uniformly inte-
grable, then this is almost as good as if the sequence were bounded. But a sequence
of bounded random variables has limit points, and these limit points must all be the
same if the sequence converges in probability; so Xn converges to X, and then Xn
also converges in L¹ by dominated convergence. Conversely, if Xn converges to X
in L¹, then Xn is essentially bounded, since almost all Xn are very close to X (which
is essentially bounded in the sense of Lemma 1.35). Let us now make this rigorous.
We first show the “if” part. Define

    φK(x) ≡  K    if x > K,
             x    if |x| ≤ K,                                               (1.4.17)
             −K   if x < −K.

We have obviously from the uniform integrability that, for any ε, there exists K < ∞,
such that
    E(|φK(Xn) − Xn|) ≤ ε,                                                   (1.4.18)

for all n ≥ 0 (where for convenience we set X ≡ X0). Moreover, since |φK(x) − φK(y)| ≤
|x − y|, (i) implies that φK(Xn) → φK(X) in probability. Thus, for any δ > 0 and ε > 0,
there exists n0 ∈ N such that, for all n ≥ n0, P(|φK(Xn) − φK(X)| > δ) ≤ ε/K. Then,
since |φK(Xn)| ≤ K, for such n,

    E(|φK(Xn) − φK(X)|) ≤ δ + 2ε,                                           (1.4.19)

and so lim_{n→∞} E(|φK(Xn) − φK(X)|) = 0. In view of the fact that (1.4.18) holds for
any ε, it follows that E(|Xn − X|) → 0.
Let us now show the converse (“only if”) direction. If E(|Xn − X|) → 0, then by
Chebyshev's inequality, P(|Xn − X| > ε) ≤ ε^{−1} E(|Xn − X|) → 0, so Xn → X in
probability. Now write Xn = (Xn − X) + X and use that, by the triangle inequality,

    E(|Xn|) ≤ E(|X|) + E(|Xn − X|).                                         (1.4.20)

For any ε > 0, there exists n0 such that, for all n ≥ n0, E(|Xn − X|) < ε. Since all Xn and
X are integrable, there exists K such that, for all n ≤ n0, E(|Xn| 1_{|Xn|>K}) < ε, and
also E(|X| 1_{|X|>K}) < ε. Hence

    E(|Xn| 1_{|Xn|>2K}) ≤  ε                          if n ≤ n0,
                           E(|X| 1_{|Xn|>2K}) + ε     if n > n0.            (1.4.21)

Finally we use that, for n > n0,

    E(|X| 1_{|Xn|>2K}) ≤ E(|X| 1_{|X|>2K−|X−Xn|})                           (1.4.22)
                       ≤ E(|X| 1_{|X|>K}) + E(|X| 1_{|X|≤K} 1_{|X−Xn|>K})
                       ≤ ε + K P(|X − Xn| > K) ≤ 2ε.

This concludes the proof. □

The importance of this result lies in the fact that in probability theory, we are
very often dealing with functions that are not really bounded, and where Lebesgue's
theorem is not immediately applicable either. Uniform integrability is the best pos-
sible condition for convergence of the integrals. Note that the simple example
(1.4.12) of a uniformly integrable family given above furnishes a nice example
where E(|Yn − 1|) → 0, but where Lebesgue's dominated convergence theorem can-
not be applied.
Exercise: Use the previous criterion to prove Lebesgue’s dominated convergence
theorem in the case of probability measures.

1.5 ℒ^p and L^p spaces

I will only rather briefly summarise some frequently used notions concerning spaces
of integrable functions. Given a measure space, (Ω, F, µ), one defines, for p ∈ [1, ∞)
and measurable functions, f,

    ‖f‖_{p,µ} ≡ ‖f‖_p ≡ (E|f|^p)^{1/p} = ( ∫ |f|^p dµ )^{1/p}.              (1.5.1)

Also, set
    ‖f‖_∞ ≡ sup_{ω∈Ω} |f(ω)|.                                               (1.5.2)

One may also define

    ‖f‖_{∞,µ} ≡ esup_{ω∈Ω} |f(ω)|,                                          (1.5.3)

where esup is called the essential supremum (with respect to the measure µ). This is
defined as
    esup g = inf_{O∈F: µ(O)=0} sup_{ω∈O^c} g(ω).                            (1.5.4)

The set of functions, f, such that ‖f‖_{p,µ} < ∞ is denoted by ℒ^p(Ω, F, µ) ≡ ℒ^p.


There are two crucial inequalities.

Lemma 1.39 (Minkowski inequality). For f, g ∈ L^p,

    ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.                                              (1.5.5)
Lemma 1.40 (Hölder inequality). For measurable functions f, g and p, q ∈ [1, ∞]
such that 1/p + 1/q = 1,

    | ∫ f g dµ | ≤ ‖f‖_p ‖g‖_q.                                             (1.5.6)

Both inequalities follow from one of the most important inequalities in integra-
tion theory, Jensen's inequality.

Theorem 1.41 (Jensen's inequality). Let (Ω, F, P) be a probability space, let X be
an absolutely integrable random variable, and let ϕ : R → R be a convex function.
Then, for any c ∈ R,
    E ϕ(X − EX + c) ≥ ϕ(c),                                                 (1.5.7)
and in particular
    E ϕ(X) ≥ ϕ(EX).                                                         (1.5.8)

Proof. If ϕ is convex, then for any y there is a straight line below ϕ that touches
ϕ at (y, ϕ(y)), i.e. there exists m ∈ R such that ϕ(x) ≥ ϕ(y) + (x − y)m. Choosing x =
X − EX + c and y = c and taking expectations on both sides yields (1.5.7); choosing
c = EX then gives (1.5.8). □

[Fig. 1.1 Convex function: ϕ lies above the tangent line f(x) = ϕ(y) + m(x − y) through (y, ϕ(y)).]

Exercise: Prove the Hölder inequalities (for p > 1) using Jensen’s inequality.
Since Minkowski's inequality is really a triangle inequality, and homogeneity
with respect to scalar multiplication is trivial, we would be inclined to think that ‖·‖_p
is a norm and ℒ^p is a normed space. In fact, the only problem is that ‖f‖_p = 0 does
not imply f = 0, since f may be non-zero on sets of µ-measure zero. Therefore, to
define a normed space, one considers equivalence classes of functions in ℒ^p by call-
ing two functions, f, f′, equivalent if f − f′ is non-zero only on a set of measure zero.
The space of these equivalence classes is called L^p ≡ L^p(Ω, F, µ).
The following fact about L^p spaces will be useful to know.

Lemma 1.42. The spaces L^p(Ω, F, µ) are Banach spaces (i.e. complete normed vec-
tor spaces).

Proof. The only non-trivial fact that needs to be proven is the completeness
of L^p. Let fi ∈ L^p, i ∈ N, be a Cauchy sequence. Then there are nk ∈ N, such that, for
all i, j ≥ nk, ‖fi − fj‖_p ≤ 2^{−k−k/p}. Set gk ≡ f_{nk} and

    F ≡ Σ_{k∈N} 2^{kp} |gk − gk+1|^p.                                       (1.5.9)

Then
    ∫ F dµ = Σ_{k∈N} 2^{kp} ∫ |gk − gk+1|^p dµ = Σ_{k∈N} 2^{kp} ‖gk − gk+1‖_p^p ≤ 1.   (1.5.10)

Therefore, F is integrable and hence finite except possibly on a set of measure zero.
It follows that, for all ω ∈ Ω s.t. F(ω) is finite, |gk(ω) − gk+1(ω)| ≤ 2^{−k} F(ω)^{1/p}. It
follows further, using a telescopic expansion and the triangle inequality, that gk(ω) is a
Cauchy sequence of real numbers, and hence convergent. Set f(ω) = lim_{k→∞} gk(ω).
For the ω in the null-set where F(ω) = +∞, we set f(ω) = 0. It follows readily that

    ∫ |gk − f|^p dµ → 0,                                                    (1.5.11)

and, using once more the Cauchy property of fn, that

    ∫ |fn − f|^p dµ → 0.                                                    (1.5.12)
□

The case p = 2 is particularly nice, in that the space L² is not only a Banach
space, but a Hilbert space. The point here is that the Hölder inequality, applied in
the case p = 2, yields
    | ∫ f g dµ | ≤ ‖f‖₂ ‖g‖₂.                                               (1.5.13)

This means that on L², there exists a quadratic form (·, ·)_µ,

    (f, g)_µ ≡ ∫ f g dµ,                                                    (1.5.14)

which has the properties of a scalar product, the L²-norm being the derived norm,
‖f‖₂ = √((f, f)_µ). Although L² spaces are somehow not the most natural setting for
probability, it is sometimes quite convenient to exploit this additional structure.


1.6 Fubini’s theorem

An ever-important tool for the computation of integrals on product spaces is Fu-
bini's theorem. We consider first the case of non-negative functions.

Theorem 1.43 (Fubini-Tonelli). Let (Ω1, F1, µ1) and (Ω2, F2, µ2) be two mea-
sure spaces, and let f be a real-valued, non-negative measurable function on
(Ω1 × Ω2, F1 ⊗ F2). Then the two functions

    h(x) ≡ ∫_{Ω2} f(x, y) µ2(dy)   and   g(y) ≡ ∫_{Ω1} f(x, y) µ1(dx)

are measurable with respect to F1, resp. F2, and

    ∫_{Ω1×Ω2} f d(µ1 ⊗ µ2) = ∫_{Ω1} h dµ1 = ∫_{Ω2} g dµ2.                   (1.6.1)

Now we turn to the general case.

Theorem 1.44 (Fubini-Lebesgue). Let f : (Ω1 × Ω2, F1 ⊗ F2) → (R, B(R)) be ab-
solutely integrable with respect to the product measure µ1 ⊗ µ2. Then:
(i) For µ1-almost all x, f(x, y) is absolutely integrable with respect to µ2, and vice
versa.
(ii) The functions h(x) = ∫_{Ω2} f(x, y) µ2(dy) and g(y) = ∫_{Ω1} f(x, y) µ1(dx) are well-
defined except possibly on a set of measure zero with respect to the measures µ1,
resp. µ2, and absolutely integrable with respect to these same measures.
(iii)
    ∫_{Ω1×Ω2} f d(µ1 ⊗ µ2) = ∫_{Ω1} h(x) µ1(dx) = ∫_{Ω2} g(y) µ2(dy).      (1.6.2)
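For finite spaces, the theorem reduces to the statement that two finite iterated sums can be swapped, which one can check directly (the weights and the function f below are arbitrary illustrative choices):

```python
mu1 = {0: 0.5, 1: 1.5, 2: 1.0}   # a finite measure on Omega1 = {0, 1, 2}
mu2 = {0: 2.0, 1: 0.5}           # a finite measure on Omega2 = {0, 1}
f = lambda x, y: x + 2 * y       # any (here trivially integrable) function

# integrate over Omega2 first, then over Omega1 ...
lhs = sum(mu1[x] * sum(f(x, y) * mu2[y] for y in mu2) for x in mu1)
# ... or over Omega1 first, then over Omega2
rhs = sum(mu2[y] * sum(f(x, y) * mu1[x] for x in mu1) for y in mu2)

print(abs(lhs - rhs) < 1e-12)  # True: the iterated integrals agree
```

The subtlety of Theorem 1.44 lies entirely in the infinite setting, where absolute integrability is what licenses the exchange; for finite sums the identity is pure rearrangement.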
1.7 Densities, Radon-Nikodým derivatives

In Probability 1 we have encountered the notion of a probability density. In fact,
we had constructed the Lebesgue-Stieltjes measure on R by prescribing a distribution
function, F, (i.e. a non-decreasing, right-continuous function) in terms of which any
interval (a, b] had measure µ((a, b]) = F(b) − F(a). In the special case when there was
a positive function f, such that, for all a < b, F(b) − F(a) = ∫_a^b f(x) dx, where dx
indicates the standard Lebesgue measure, we called f the density of µ and said that µ
is absolutely continuous with respect to Lebesgue measure.
We now want to generalise these notions to the general context of positive measures.
In particular, we want to be able to say when two measures are absolutely continuous
with respect to each other, and define the corresponding relative densities.
First we notice that it is rather easy to modify a given measure µ on a measurable
space (Ω, F) with the help of a measurable function f. To do so, we set, for any A ∈ F,

    µ_f(A) ≡ ∫_A f dµ.                                                      (1.7.1)

Exercise: Show that if f is measurable and integrable, but not necessarily non-negative,
then µ_f, defined as in (1.7.1), defines an additive set-function. Show that, if f ≥ 0, µ_f
is indeed a measure on (Ω, F).
We see that in the case when µ is the Lebesgue measure, µ_f is the absolutely
continuous measure with density f. In the general case, we have that, if µ(O) = 0,
then it is also true that µ_f(O) = 0. The latter property will define the notion of
absolute continuity between general measures.
Definition 1.45. Let µ, ν be two measures on a measurable space (Ω, F).
(i) We say that ν is absolutely continuous with respect to µ, or ν ≪ µ, if and only if
all µ-null sets, O (i.e. all sets O with µ(O) = 0), are ν-null sets.
(ii) We say that two measures, µ, ν, are equivalent if µ ≪ ν and ν ≪ µ.
(iii) We say that a measure ν is singular with respect to µ, or ν ⊥ µ, if there exists a
set O ∈ F such that µ(O) = 0 and ν(O^c) = 0.
It is important to keep in mind that the notion of absolute continuity is not sym-
metric.
The following important theorem, called the Radon-Nikodým theorem, asserts
that absolute continuity is equivalent to the existence of a density.
Theorem 1.46 (Radon-Nikodým theorem). Let µ, ν be two σ-finite measures on a
measurable space (Ω, F). Then the following two statements are equivalent:
(i) ν ≪ µ.
(ii) There exists a non-negative measurable function, f, such that ν = µ_f.
Moreover, f is unique up to µ-null sets.

Definition 1.47. If ν ≪ µ, then a non-negative measurable function f such that ν = µ_f is
called the Radon-Nikodým derivative of ν with respect to µ, denoted

    f = dν/dµ.                                                              (1.7.2)
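On a finite (or countable) space the theorem is elementary: if ν ≪ µ, the density on each atom is just the ratio of the point masses. A sketch with made-up data (function and variable names are ours):

```python
def rn_derivative(nu, mu):
    """dnu/dmu for two measures on a finite set, assuming nu << mu;
    the density is the pointwise ratio of the point masses."""
    assert all(mu.get(w, 0) > 0 for w, m in nu.items() if m > 0), "nu not << mu"
    return {w: nu.get(w, 0) / mu[w] for w in mu if mu[w] > 0}

mu = {'a': 0.5, 'b': 0.25, 'c': 0.25}
nu = {'a': 0.1, 'b': 0.6, 'c': 0.3}
f = rn_derivative(nu, mu)

# check nu(A) = integral over A of f dmu for an event A
A = {'a', 'b'}
print(abs(sum(nu[w] for w in A) - sum(f[w] * mu[w] for w in A)) < 1e-12)  # True
```

Here f('b') = 0.6/0.25 = 2.4, and ν(A) = ∫_A f dµ holds for every A by construction; the whole difficulty of the theorem is producing such an f without atoms to divide by.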

Proof. Note that the implication (ii) ⇒ (i) is obvious from the definition. The other
direction is more tricky.
We consider for simplicity the case when µ, ν are finite measures. The extension
to σ-finite measures can then easily be carried through by using suitable partitions
of Ω.
We need a few concepts and auxiliary results. The first is the notion of the es-
sential supremum of a class of measurable functions (not to be confused with the
essential supremum of a function).
Definition 1.48. Let (Ω, F, µ) be a measure space and T an arbitrary non-empty set.
The essential supremum, g ≡ esup_{t∈T} gt, of a class, {gt, t ∈ T}, of measurable func-
tions gt : Ω → [−∞, +∞] (with respect to µ), is defined by the properties:
(i) g is measurable;
(ii) g ≥ gt, µ-almost everywhere, for each t ∈ T;
(iii) for any h that satisfies (i) and (ii), h ≥ g, µ-a.e.
Note that, by definition, if there are two g that satisfy this definition, then they
are µ-a.e. equal. Note also that the essential supremum depends on µ only through its
null-sets.
The first fact we need to establish is that the essential supremum is always equal
to the supremum over a countable set.
Lemma 1.49. Let (Ω, F, µ) be a measure space with µ a σ-finite measure. Let {gt, t ∈
T} be a non-empty class of real measurable functions. Then there exists a countable
subset T′ ⊂ T, such that
    sup_{t∈T′} gt = esup_{t∈T} gt.                                          (1.7.3)

Proof. It is enough to consider the case when µ is finite. Moreover, we may restrict
ourselves to the case when |gt| < C, for all t ∈ T (e.g. by passing from gt to tanh(gt),
which is monotone and preserves all properties of the definition). Let S denote the
class of all countable subsets of T. Set

    α ≡ sup_{I∈S} E( sup_{t∈I} gt ).                                        (1.7.4)

Now let In ∈ S be an increasing sequence of subsets such that

    lim_{n→∞} E( sup_{t∈In} gt ) = α,                                       (1.7.5)

and set T′ = ⋃_{n∈N} In. Of course, T′ is countable and α = E( sup_{t∈T′} gt ). The function
g ≡ sup_{t∈T′} gt is measurable, since it is the supremum over a countable set of mea-
surable functions. To see that it also satisfies (ii), assume that there exists t ∈ T, such
that gt > g on a set of positive measure. Then, for this t, E(max(g, gt)) > E(g) = α.
On the other hand, T′ ∪ {t} is a countable subset of T, and so by definition of α,
E(max(g, gt)) ≤ α, which yields a contradiction. Thus (ii) holds. To show (iii), as-
sume that there exists h satisfying (i) and (ii). By (ii), h ≥ gt, a.e., for each t ∈ T, and
thus also h ≥ sup_{t∈T′} gt = g, a.e., since a countable union of null-sets is a null-set.
Thus g satisfies property (iii), too. Therefore, g = esup_{t∈T} gt. □

The notion of essential supremum is used in the next lemma, which is the major
step in the proof of the Radon-Nikodým theorem.

Lemma 1.50. Let (Ω, F, µ) be a measure space, with µ a σ-finite measure, and let
ν be another σ-finite measure on (Ω, F). Let H be the family of all measurable
functions, h ≥ 0, such that, for all A ∈ F, ∫_A h dµ ≤ ν(A). Then, for all A ∈ F,

    ν(A) = ψ(A) + ∫_A g dµ,                                                 (1.7.6)

where ψ is a measure that is singular with respect to µ and

    g = esup_{h∈H} h                                                        (1.7.7)

with respect to µ.

Proof. We again assume µ, ν to be finite, and leave the extension to σ-finite mea-
sures as an easy exercise. We also exclude the trivial case of µ = 0. From Lemma
1.49 we know that there exists a sequence of functions hn ∈ H, such that g =
sup_{n∈N} hn. Let us first note that if h1, h2 ∈ H, then so is h ≡ max(h1, h2). To see
this, note that the disjoint sets

    A1 ≡ {ω ∈ A : h1(ω) ≥ h2(ω)},   A2 ≡ {ω ∈ A : h2(ω) > h1(ω)}           (1.7.8)

are measurable and A1 ∪ A2 = A. But

    ∫_A h dµ = ∫_{A1} h1 dµ + ∫_{A2} h2 dµ ≤ ν(A1) + ν(A2) = ν(A),          (1.7.9)

which implies h ∈ H. We may therefore assume the sequence hn ordered such that
hn ≤ hn+1, for all n ≥ 1. Then g = lim_{n→∞} hn, and by monotone convergence, for all
A ∈ F,
    ∫_A g dµ = lim_{n→∞} ∫_A hn dµ ≤ ν(A).                                  (1.7.10)
As a consequence, ψ defined by (1.7.6) satisfies ψ(A) ≥ 0, for all A ∈ F. Moreover,
trivially ψ(∅) = 0, and, as both ν and g dµ are measures, ψ, defined as their difference,
is σ-additive. Thus ψ is a measure.
It remains to show that ψ is singular with respect to µ. To this end we construct
a set of zero ψ-measure whose complement has zero µ-measure. Of course, this can
only be done through a delicate limiting procedure. To begin, we define collections
of sets whose ψ-measure is much smaller than their µ-measure. More precisely, for
n ∈ N and A ∈ F with µ(A) > 0, let

    Dn(A) ≡ {B ∈ F : B ⊂ A, ψ(B) < n^{−1} µ(B)}.                            (1.7.11)

The key fact is that any set A of positive µ-measure contains such subsets, i.e.
Dn(A) ≠ ∅ whenever µ(A) ≠ 0. This is proven by contradiction: assume that Dn(A) =
∅. Then set h0 = n^{−1} 1_A. For all B ∈ F one has that

    ∫_B h0 dµ = n^{−1} µ(A ∩ B) ≤ ψ(A ∩ B) ≤ ψ(B) = ν(B) − ∫_B g dµ.        (1.7.12)

But then ∫_B (h0 + g) dµ ≤ ν(B), for all B ∈ F, so that g + h0 ∈ H, which contradicts
the fact that g = esup_{h∈H} h, since h0 > 0 on a set of positive µ-measure. Ideally we
might try to look at the union of all sets B ∈ Dn(Ω). If this were an element of F, it
would have ψ-mass of order n^{−1}, while the complement of this set must have
vanishing µ-mass (otherwise there would be parts of Dn(Ω) in this complement). The
problem is that this union is not necessarily countable and thus it may not be in F.
Therefore we must resort to a delicate iterative procedure to construct the desired set.
We begin by choosing a set B1,n ∈ Dn (Ω) with the property that

1
µ(B1,n ) ≥ sup {µ(B) : B ∈ Dn (Ω)} ≡ α1,n . (1.7.13)
2
Morally, B1,n is our first attempt to pick up as much µ-mass as we can from the ψ-
tiny sets. If we were lucky, and µ(Bc1,n ) = 0, then we stop the procedure. Otherwise,
we continue by picking up as much mass as we can from what was left, i.e. we
choose B2,n ∈ Dn (Bc1,n ) with

1 n o
µ(B2,n ) ≥ sup µ(B) : B ∈ Dn (Bc1,n ) ≡ α2,n . (1.7.14)
2
If µ (B2,n ∪ B1,n )c = 0, we are happy and stop. Otherwise, we continue and choose

B3,n ∈ Dn (B1,n ∪ B2,n )c with


1 n o
µ(B3,n ) ≥ sup µ(B) : B ∈ Dn (Bc1,n ∩ Bc2,n ) ≡ α3,n , (1.7.15)
2
and so on. If the process stops at some kn -th step, set B j,n = ∅ for j > kn .
It is obvious from the definition that Bj,n ∈ Dn(Ω), if Bj,n ≠ ∅. Since Dn(Ω) is
closed under countable disjoint unions (both ψ and µ being measures), also Mn ≡
∪_{j=1}^∞ Bj,n ∈ Dn(Ω). We want to show that µ(Mnc) = 0, that is, we have picked up

Fig. 1.2 Construction of the sets Bi,n (disjoint sets B1,n, B2,n, … chosen inside Dn(Ω))

all the mass eventually. To do this, note again that, if µ(Mnc ) > 0, then there exists
D ∈ Dn (Mnc ) with µ(D) > 0.
On the other hand, for any m ∈ N,

2αm,n = sup{µ(B) : B ∈ Dn(∩_{j=1}^{m−1} Bcj,n)} ≥ sup{µ(B) : B ∈ Dn(Mnc)} ≥ µ(D), (1.7.16)

since Mnc ⊂ ∩_{j=1}^{m−1} Bcj,n.


Thus, if µ(D) > 0, then there exists some α > 0 such that µ(Bm,n) ≥ αm,n ≥ α, for
all m. Since all Bj,n are disjoint, this would imply that µ(Mn) = ∞, which contradicts
the assumption that µ is a finite measure. Thus we conclude that µ(Mnc) = 0, and so
ψ(Mn) < n−1 µ(Mn) = n−1 µ(Ω). Therefore,
∞ 
\  ∞
ψ  Mn  ≤ lim ψ (Mn ) = 0,
 (1.7.17)
n=1
n=1
\ c 
 ∞  ∞  ∞
[  X
µ  Mn   = µ  Mnc  ≤ µ Mnc = 0.

n=1 n=1 n=1

This proves that ψ is singular with respect to µ. □
As the first consequence of this lemma, we state the famous Lebesgue decompo-
sition theorem.
Theorem 1.51 (Lebesgue decomposition). If µ, ν are σ-finite measures on a mea-
surable space (Ω, F), then there exist two uniquely determined measures, νc, νs, such
that ν = νs + νc, where νc is absolutely continuous with respect to µ and νs is singular
with respect to µ.

Proof. Lemma 1.50 provides the existence of two measures νs and νc with the de-
sired properties. To prove the uniqueness of this decomposition, assume that there
are ν̃s, ν̃c with the same properties. Since the measures νs, ν̃s are carried on sets of
zero µ-mass, they can only be different if there exists a set A ∈ F with µ(A) = 0
and νs(A) ≠ ν̃s(A). But then νc(A) ≠ ν̃c(A) as well, while by absolute continuity,
νc(A) = ν̃c(A) = 0. Thus νs = ν̃s and consequently νc = ν̃c. □
The Radon-Nikodým theorem is now immediate: Assume that ν is absolutely
continuous with respect to µ. The decomposition (1.7.6) applied to µ-null sets A
then implies that for all these sets, ψ(A) = 0. But ψ is singular with respect to µ,
so there is a µ-null set, A, for which ψ(Ac) = 0. Since for all such A,
ψ(A) = 0, it follows that ψ(Ω) = ψ(A) + ψ(Ac) = 0, and so ψ is the zero measure.
All that remains is to assert that the Radon-Nikodým derivative is unique a.e. To
do this, assume that there exists another measurable function, g∗, such that

ν(A) = ∫_A g∗ dµ. (1.7.18)

Now define the measurable set A = {ω : C > g∗(ω) > g(ω) > −C}. Then, by assumption,

∫_A g∗ dµ = ν(A) = ∫_A g dµ. (1.7.19)

But since g∗ > g on A, this can only hold if µ(A) = 0, for all C < ∞. Thus, µ(g∗ >
g) = 0. In the same way one shows that µ(g∗ < g) = 0, implying that g and g∗ differ
at most on sets of measure zero. □
Remark. We have said (and seen in the proof) that the Radon-Nikodým derivative
is defined modulo null-sets (w.r.t. µ). This is completely natural. Note that if µ and
ν are equivalent, then 0 < dν/dµ < ∞ almost everywhere, and dµ/dν = (dν/dµ)^{−1}.

The following property of the Radon-Nikodým derivative will be needed later.


Lemma 1.52. Let µ, ν be σ-finite measures on (Ω, F), and let ν ≪ µ. If X is F-
measurable and ν-integrable, then, for any A ∈ F,

∫_A X dν = ∫_A X (dν/dµ) dµ. (1.7.20)

Proof. We may assume that µ is finite and X non-negative. Appealing to the mono-
tone convergence theorem, it is also enough to consider bounded X (otherwise, ap-
proximate and pass to the limit on both sides). Let H be the class of all bounded
non-negative F-measurable functions for which (1.7.20) is true. Then H satisfies
the hypothesis of Theorem 1.26: clearly, (i) H is a vector space, (ii) the function
1 is contained in H by definition of the Radon-Nikodým derivative, and the prop-
erty (1.7.20) is stable under monotone convergence by the monotone convergence
theorem. Also, H contains the indicator functions of all elements of F. Then the
assertion of Theorem 1.26 implies that H contains all bounded F-measurable func-
tions, as claimed. □
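The change-of-measure identity (1.7.20) can be checked concretely on a finite sample space, where the Radon-Nikodým derivative is just a ratio of point masses. The following minimal sketch is not from the text; Ω, µ, and the chosen density are illustrative.

```python
# A finite-sample-space sketch of Lemma 1.52; Omega, mu and the density
# below are illustrative choices, not taken from the text.
OMEGA = range(4)
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}          # a finite measure mu
density = {0: 2.0, 1: 0.5, 2: 1.0, 3: 1.5}     # dnu/dmu, chosen > 0
nu = {w: density[w] * mu[w] for w in OMEGA}    # nu({w}) = (dnu/dmu)(w) mu({w})

def integral(f, m, A):
    """Integral of f over the set A with respect to the measure m."""
    return sum(f(w) * m[w] for w in A)

X = lambda w: w * w                            # an integrable function
A = {1, 2, 3}                                  # a measurable set
lhs = integral(X, nu, A)                       # int_A X dnu
rhs = integral(lambda w: X(w) * density[w], mu, A)  # int_A X (dnu/dmu) dmu
assert abs(lhs - rhs) < 1e-12
```

On a finite space the identity is an algebraic triviality; the content of the lemma is that the same bookkeeping survives the passage to general σ-finite measures via the monotone class theorem.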
Chapter 2
Conditional expectations and conditional
probabilities

In this chapter we will considerably generalise the notion of conditional expectations and condi-
tional probabilities from elementary probability theory. In elementary
probability, we could condition only on events of positive probability. This notion
is too restrictive, as we have seen in the context of Markov processes, where this
limited us to discrete state spaces. The new notion we will introduce is
conditioning on σ-algebras. In this chapter we largely follow the presentation in
Chow and Teicher [4], where much further material can be found.

2.1 Conditional expectations

Definition 2.1. Consider a probability space (Ω, F, P). Let G ⊂ F be a sub-σ-algebra
of F. Let X be a random variable, i.e. an F-measurable (real-valued) function on Ω,
and let E[X] be defined. We say that a function Y is a conditional expectation of X
given G, written Y = E(X|G), if
(i) Y is G-measurable, and
(ii) For all A ∈ G,

∫_A Y dP = ∫_A X dP. (2.1.1)

Remark. If two functions Y, Y′ both satisfy the conditions of a conditional expec-
tation, then they can differ only on sets of probability zero, i.e. P(Y = Y′) = 1. One
calls such different realisations of a conditional expectation versions.

Remark. Recall that EX is defined as EX = EX+ − EX− if either EX+ < ∞ or
EX− < ∞. This is the weakest possible assumption under which a definition of con-
ditional expectation can make sense. Existence of conditional expectations can be
established under just this condition (see [4]); however, in the sequel we will only
treat the simple case when X is absolutely integrable, E(|X|) < ∞.


For a set A of positive probability, the classical notion of conditional expectation
with respect to an event is related to the notion of a conditional expectation with
respect to a σ-algebra as follows. Set A ≡ σ(A) = {∅, A, Ac, Ω}. Then

E(X|A) = E(X|A)1A + E(X|Ac)1Ac , (2.1.2)

where on the left we condition on the σ-algebra A, while E(X|A), E(X|Ac) on the
right are the elementary conditional expectations given the events A and Ac.

To check this, note that the right-hand side is obviously A-measurable. There are only
four sets in A, namely ∅, A, Ac, and Ω. But

E(1∅ E(X|A)1A) + E(1∅ E(X|Ac)1Ac) = 0 = E(1∅ X), (2.1.3)
E(1A E(X|A)1A) + E(1A E(X|Ac)1Ac) = E(X|A)P(A) = E(1A X), (2.1.4)
E(1Ac E(X|A)1A) + E(1Ac E(X|Ac)1Ac) = E(X|Ac)P(Ac) = E(1Ac X), (2.1.5)
E(E(X|A)1A) + E(E(X|Ac)1Ac) = E(X|A)P(A) + E(X|Ac)P(Ac) = E(X), (2.1.6)

so this is really the conditional expectation.
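The defining property (ii) behind identity (2.1.2) can be sanity-checked numerically. The sketch below is not from the text: Lebesgue measure on [0, 1] is approximated by a midpoint grid, and the choices X(ω) = ω and A = [0, 1/3) are purely illustrative.

```python
# Numerical sketch of identity (2.1.2) on ([0,1], B([0,1]), Lebesgue);
# the grid approximation, X(w) = w and A = [0, 1/3) are illustrative.
N = 100_000
omegas = [(i + 0.5) / N for i in range(N)]     # midpoint grid for [0, 1]
X = lambda w: w
in_A = lambda w: w < 1 / 3
in_Ac = lambda w: not in_A(w)

def cond_mean(pred):
    """Classical conditional expectation E(X | event) as a plain average."""
    pts = [w for w in omegas if pred(w)]
    return sum(X(w) for w in pts) / len(pts)

E_X_A, E_X_Ac = cond_mean(in_A), cond_mean(in_Ac)
Y = lambda w: E_X_A if in_A(w) else E_X_Ac     # candidate for E(X | sigma(A))

# defining property (ii): integrals of Y and X agree on the sets of sigma(A)
for pred in (in_A, in_Ac, lambda w: True):
    int_Y = sum(Y(w) for w in omegas if pred(w)) / N
    int_X = sum(X(w) for w in omegas if pred(w)) / N
    assert abs(int_Y - int_X) < 1e-9
```

Here E_X_A ≈ 1/6 and E_X_Ac ≈ 2/3, and the step function Y built from them integrates like X over every set of σ(A), exactly as in (2.1.3)–(2.1.6).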


Intuitively, this notion of conditional expectation can be seen as “integrating”
the random variable partially, i.e. with respect to all degrees of freedom that do not
affect the σ-algebra G. A trivial example would be the case where Ω = R2 , and G
is the σ-algebra of events that depend only on the first coordinate, say x. Then the
conditional expectation of a function f (x, y) is just the integral with respect of the
variables y (recall the construction of the integral in Fubini’s theorem), modulo re-
normalisation. What is left is, of course, a function that depends only on x, and that
also satisfies property (ii). The advantage of the notion of a conditional expectation
given a σ-algebra is that it largely generalises this concept.
In many cases that we will encounter, the σ-algebra, G, with respect to which we
are conditioning is the σ-algebra, σ(Y), generated by some other random variable,
Y. In that case we will often write

E(X|σ(Y)) ≡ E(X|Y) (2.1.7)

and call this the conditional expectation of X given Y. We may then also think of
it as a function of the value of the random variable Y. As we can see, the difficulty
associated with constructing conditional expectations in the general case relates to
making sense of expressions of the form 0/0. The key to the construction of condi-
tional expectations in the general case will be the concept of the Radon-Nikodým
derivative.

Theorem 2.2. Let (Ω, F, P) be a probability space, let X be a random variable
such that E(|X|) < ∞, and let G ⊂ F be a sub-σ-algebra of F. Then
(i) there exists a G-measurable function, E(X|G), unique up to sets of measure
zero, the conditional expectation of X given G, such that for all A ∈ G,

∫_A E(X|G) dP = ∫_A X dP. (2.1.8)

(ii) If X is absolutely integrable and Z is an absolutely integrable, G-measurable
random variable such that E(Z) = E(X) and, for some π-system D with σ(D) =
G,

∫_A Z dP = ∫_A X dP, ∀A ∈ D, (2.1.9)

then Z = E(X|G) almost everywhere.

Proof. We begin by proving (i). Define the set functions λ, λ+, λ− as

λ±(A) ≡ ∫_A X± dP, λ ≡ λ+ − λ−. (2.1.10)

Now we can consider the restriction of λ to G, denoted by λG, and the restriction
of P to G, PG. Clearly, λ± are absolutely continuous with respect to P, and their
restrictions to G, λ±G, are absolutely continuous with respect to the restriction of P
to G, PG. But since X is assumed to be absolutely integrable with respect to P and
P is a probability measure, it follows that also λ±G are finite measures. Therefore,
the Radon-Nikodým theorem 1.46 implies that there exist G-measurable functions,
Y± = dλ±G/dPG, such that, for all A ∈ G,

∫_A Y± dP = λ±(A) = ∫_A X± dP, (2.1.11)

and hence Y = dλG/dPG ≡ Y+ − Y−, such that

∫_A Y dP = λ(A) = ∫_A X dP. (2.1.12)

Thus, Y has the properties of a conditional expectation and we may set E(X|G) =
Y = dλG/dPG. Note that Y is unique up to sets of measure zero. Finally, to show that the
conditional expectation is unique in the same sense, assume that there is a function
Y′ satisfying the conditions of the conditional expectation that differs from Y on a
set of positive measure. Then one may set A± = {ω : ±(Y′(ω) − Y(ω)) > 0}, and at
least one of these sets, say A+, has positive measure. Then

∫_{A+} X dP = ∫_{A+} Y′ dP > ∫_{A+} Y dP = ∫_{A+} X dP, (2.1.13)

which is impossible. This proves uniqueness and hence (i) is established.


To prove (ii), set

A ≡ {A ∈ F : ∫_A Z dP = ∫_A X dP}. (2.1.14)

Then Ω ∈ A, and D ⊂ A, by assumption. Also, A is a λ-system, and so by Dynkin’s
theorem, A ⊃ σ(D) = G, and so Z is the desired conditional expectation. □

2.2 Elementary properties of conditional expectations

Conditional expectations share most of the properties of ordinary expectations. The
following is a list of elementary properties:

Lemma 2.3. Let (Ω, F, P) be a probability space and let G ⊂ F be a sub-σ-algebra.
Then:
(i) If X is G-measurable, then E(X|G) = X, a.s.;
(ii) The map X → E(X|G) is linear;
(iii) E[E(X|G)] = E(X);
(iv) If B ⊂ G is a σ-algebra, then E[E(X|G)|B] = E(X|B), a.s.;
(v) |E(X|G)| ≤ E(|X| |G), a.s.;
(vi) If X ≤ Y, then E(X|G) ≤ E(Y|G), a.s..

Proof. Left as an exercise! □

The following theorem summarises the most important properties of conditional
expectations with regard to limits.

Theorem 2.4. Let Xn, n ∈ N, and Y be absolutely integrable random variables on a
probability space (Ω, F, P), and let G ⊂ F be a sub-σ-algebra. Then
(i) If Y ≤ Xn ↑ X a.s., then E(Xn|G) ↑ E(X|G) a.s..
(ii) If Y ≤ Xn a.s., then

E(lim inf_{n→∞} Xn | G) ≤ lim inf_{n→∞} E(Xn | G). (2.2.1)

(iii) If Xn → X a.s., and |Xn| ≤ |Y|, for all n, then

lim_{n→∞} E(Xn|G) = E(X|G) a.s.. (2.2.2)

Of course, these are just the analogs of the three basic convergence theorems for
ordinary expectations. Note that we replaced the usual condition 0 ≤ Xn in (i) and
(ii) by Y ≤ Xn , which is of course a trivial generalisation (just pass to Xn − Y). We
leave the proofs as exercises.
A useful, but not unexpected, property is the following lemma.

Lemma 2.5. Let X be integrable and let Y be bounded and G-measurable. Then

E(XY|G) = YE(X|G), a.s. (2.2.3)

Remark. The condition that Y be bounded can be relaxed to demanding that XY be
integrable, which is all we use in the proof.

Proof. We may assume that X, Y are non-negative; otherwise decompose them into
positive and negative parts and use linearity of the conditional expectation.
Define, for any A ∈ F,

ν(A) ≡ ∫_A XY dP, µ(A) ≡ ∫_A X dP. (2.2.4)

Both µ and ν are finite measures that are absolutely continuous with respect to P.
Then

dνG/dPG = E(XY|G), dµG/dPG = E(X|G), dµ/dP = X. (2.2.5)

Then, using Lemma 1.52, for any A ∈ G,

∫_A Y dµG = ∫_A Y (dµG/dPG) dPG = ∫_A Y E(X|G) dPG, (2.2.6)

whereas for any A ∈ F,

∫_A Y dµ = ∫_A Y (dµ/dP) dP = ∫_A Y X dP. (2.2.7)

Specialising the second equality to the case when A ∈ G, we find that for those A,

∫_A Y E(X|G) dP = ∫_A Y X dP. (2.2.8)

Now Z ≡ Y E(X|G) is G-measurable, and (2.2.8) is precisely the defining property
for Z to be the conditional expectation of XY. This concludes the proof. □
There should be a natural connection between independence and conditional ex-
pectation, as it was the case for the elementary notion of conditional expectation.
Here it is.
Theorem 2.6. Two σ-algebras, G1, G2, are independent if and only if, for all G2-
measurable integrable random variables, X,

E[X|G1] = E[X], a.s. (2.2.9)

Note that in the theorem we can replace “for all integrable G2-measurable random
variables” by “for all random variables of the form X = 1B, B ∈ G2”.
Proof. Assume first that G1 and G2 are independent. Let A ∈ G1 and X be G2 -
measurable. The random variables 1A and X are independent, thus

E(1A X) = E(1A )E(X) = E(1A E(X)), (2.2.10)

and from the definition of conditional expectation

E[1A E(X|G1 )] = E(1A X), ∀A ∈ G1 . (2.2.11)

Thus (2.2.9) holds.


Now assume that (2.2.9) holds. Choose X = 1B , B ∈ G2 . Then

E(1B |G1 ) = E(1B ) = P(B). (2.2.12)



Then, for all A ∈ G1,

P(A ∩ B) = E(1A 1B) = E[E(1A 1B|G1)] (2.2.13)
= E[E(1B|G1)1A] = E(P(B)1A) = P(A)P(B).

Thus G1 and G2 are independent. □

2.3 The case of random variables with absolutely continuous
distributions

Let us consider some cases where conditional expectations can be computed more
“explicitly”. For this, consider two random variables, X, Y, with values in Rm and
Rn (in the sequel, nothing but notation changes if we assume n = m = 1, so we will
do this). We assume that the joint distribution of X and Y is absolutely continuous
with respect to Lebesgue’s measure with density p(x, y). That is, for any function
f : Rm × Rn → R+,

E(f(X, Y)) = ∫ f(x, y) p(x, y) dx dy. (2.3.1)

The (marginal) density of the random variable Y is then

q(y) = ∫ p(x, y) dx (2.3.2)

(where we should modify the density to be zero when ∫ p(x, y) dx = ∞; this can be
done because this can be true only on a set of Lebesgue measure zero). Let us note
also that the set where q(y) = 0 has measure zero:

∫∫ 1_{q(y)=0} p(x, y) dx dy = ∫ 1_{q(y)=0} q(y) dy = 0. (2.3.3)

Let now h : Rm → R+ be a measurable function. We want to compute E(h(X)|Y). To
do this, take a measurable function g : Rn → R+. Then

E(h(X)g(Y)) = ∫ h(x) g(y) p(x, y) dx dy (2.3.4)
= ∫ ( ∫ h(x) p(x, y) dx ) g(y) dy
= ∫ [ ( ∫ h(x) p(x, y) dx ) / q(y) ] g(y) q(y) 1_{q(y)>0} dy
≡ ∫ φ(y) g(y) q(y) 1_{q(y)>0} dy
= E(φ(Y) g(Y)),

where we were allowed to introduce the indicator function 1q(y)>0 because as we


have seen, the complementary set has measure zero.
From this calculation we can derive the following
Proposition 2.7. With the notation above, let ν(y, dx) be the measure on Rm defined
by

ν(y, dx) ≡ (p(x, y)/q(y)) dx, if q(y) > 0,
           δ0(dx),            if q(y) = 0. (2.3.5)

Then for any measurable1 function h : Rm → R+,

E(h(X)|Y)(ω) = ∫ h(x) ν(Y(ω), dx). (2.3.6)

Proof. It is obvious that the right-hand side of Equation (2.3.6) is measurable with
respect to σ(Y). Verifying the second defining property of the conditional expecta-
tion amounts to repeating the computations in Eq. (2.3.4). □
Definition 2.8. The function p(x, y)/q(y), as a function of x, is called the conditional density
of X given Y = y.
What is particular here is that we can represent it as an expectation with respect to
an explicitly given probability measure.

2.4 The special case of L2-random variables

Conditional expectations have a particularly nice interpretation in the case when the
random variable X is square-integrable, i.e. if X ∈ L2(Ω, F, P) (since for the moment
we think of conditional expectations as equivalence classes modulo sets of measure
zero, we may consider X as an element of the space L2 of equivalence classes rather
than the space of square-integrable functions itself). We will identify the
space L2(Ω, G, P) as the subspace of L2(Ω, F, P) for which at least one representative
of each equivalence class is G-measurable.
Proof. The Jensen-inequality applied to the conditional expectation yields that
E(X 2 |G) ≥ E(X|G)2 , and hence E[E(X|G)2 ] ≤ E[E(X 2 |G)] = E(X 2 ) < ∞, so that
E(X|G) ∈ L2 (Ω, G, P). Moreover, for any bounded, G-measurable function Z,

E[Z(X − E(X|G))] = E(ZX) − E[Z E(X|G)] = E(ZX) − E[E(ZX|G)] = 0. (2.4.1)

Thus, X − E(X|G) is orthogonal to all bounded G-measurable random variables, and


using that these form a dense set in L2 (Ω, G, P), it is orthogonal to L2 (Ω, G, P). This
proves the theorem. u t
1 One can show that the statement holds true for any measurable and integrable h : Rm → R.

Note that this interpretation of the conditional expectation can be used to define
the conditional expectation for L2 -random variables.
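The projection picture of Theorem 2.9 is easiest to see on a finite sample space, where G-measurable functions are exactly the functions constant on the atoms of G. The following sketch is illustrative; Ω, the partition, and the values of X are arbitrary choices, not from the text.

```python
# Finite-Omega sketch of Theorem 2.9: Omega = {0,...,5} with uniform P, and
# G generated by the partition {0,1,2}, {3,4,5}; all numbers are illustrative.
import itertools

OMEGA = range(6)
P = 1 / 6
atoms = [{0, 1, 2}, {3, 4, 5}]                 # the atoms of G
X = {0: 1.0, 1: 4.0, 2: 1.0, 3: 0.0, 4: 2.0, 5: 10.0}

# E(X|G) is constant on each atom of G, equal to the average over the atom
Y = {w: sum(X[v] for v in B) / len(B) for B in atoms for w in B}

def l2_dist(Z):                                # squared L2 distance to X
    return sum((X[w] - Z[w]) ** 2 * P for w in OMEGA)

# Y minimises the distance among G-measurable Z (i.e. Z constant on atoms);
# here we scan a grid of constant values a, b on the two atoms:
grid = [i / 2 for i in range(25)]              # 0.0, 0.5, ..., 12.0
best = min(l2_dist({w: (a if w in atoms[0] else b) for w in OMEGA})
           for a, b in itertools.product(grid, repeat=2))
assert l2_dist(Y) <= best + 1e-12

# and X - Y is orthogonal to the indicators of the atoms of G, as in (2.4.1):
for B in atoms:
    assert abs(sum((X[w] - Y[w]) * P for w in B)) < 1e-12
```

The two assertions are the two halves of the theorem: least-squares optimality among G-measurable candidates, and orthogonality of the residual X − E(X|G) to L2(Ω, G, P).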

2.5 Conditional probabilities and conditional probability
measures

From conditional expectations we now want to construct conditional probability
measures. This seems quite straightforward, but there are some non-trivial techni-
calities that arise from the “versions” issue of conditional expectations.
As before we consider a probability space (Ω, F, P) and a sub-σ-algebra G. For
any A ∈ F, we can define
P(A|G) ≡ E(1A |G), (2.5.1)
and call it the conditional probability of A given G. It is a G-measurable function
that satisfies
∫_B P(A|G) dP = E(1B E(1A|G)) = E(1A 1B) = P(A ∩ B), (2.5.2)

for any B ∈ G.
It clearly inherits from the conditional expectation the following properties:
(i) 0 ≤ P(A|G) ≤ 1, a.s.;
(ii) P(A|G) = 0, a.s., if and only if P(A) = 0; also P(A|G) = 1, a.s., if and only if
P(A) = 1;
(iii) If An ∈ F, n ∈ N, are disjoint sets, then

P(∪_{n∈N} An | G) = Σ_{n∈N} P(An|G), a.s.; (2.5.3)

(iv) If An ∈ F are such that limn→∞ An = A, then

lim_{n→∞} P(An|G) = P(A|G), a.s.. (2.5.4)

These observations bring us close to thinking that conditional probabilities can be
thought of as G-measurable functions taking values in the probability measures, at
least for almost all ω. The problem, however, is that the requirement of σ-additivity,
which seems to be satisfied due to (iii), is in fact problematic: (iii) says that, for any
sequence An, there exists a set of measure one such that, for all ω in this set,

P(∪_{n∈N} An | G)(ω) = Σ_{n∈N} P(An|G)(ω). (2.5.5)

However, this set may depend on the sequence, and since there are uncountably
many such sequences, it is unclear whether there exists a set of full measure on
which (2.5.5) holds for all sequences of sets.
These considerations lead to the definition of so-called regular conditional prob-
abilities.

Definition 2.10. Let (Ω, F, P) be a probability space and let G be a sub-σ-algebra.


A regular conditional probability measure or regular conditional probability on F
given G is a function, P : Ω × F → [0, 1], such that
(i) for each ω ∈ Ω, P(ω, ·) is a probability measure on (Ω, F);
(ii) for each A ∈ F, P(·, A) is a G-measurable function coinciding with the condi-
tional probability P(A|G) almost everywhere.

The point is that, if we have a regular conditional probability, then we can express
conditional expectations as expectations with respect to ordinary probability measures.

Theorem 2.11. With the notation from above, if P is a regular conditional proba-
bility on F given G, then for an F-measurable integrable random variable, X,

E(X|G)(ω) = ∫ X(ω̃) P(ω, dω̃), a.s. (2.5.6)

Proof. As often, we may assume X positive. The proof then goes through the mono-
tone class theorem (Theorem 1.26), quite similar to the proof of Lemma 1.52. One
defines the class of functions for which (2.5.6) holds, verifies that it satisfies the hy-
pothesis of the monotone class theorem, and notices that it is true for all indicator
functions of sets in F. □

The question remains whether and when regular conditional probabilities exist.
An example of a regular conditional probability measure (on the measure space
(Rn × Rm, B(Rn × Rm), p(x, y)dxdy)) is the measure ν from Proposition 2.7. It is easy
to check that this has all the required properties; in particular, it exists for
every y.
A central result for us is the existence in the case when Ω is a Polish space.

Theorem 2.12. Let (Ω, B(Ω), P) be a probability space where Ω is a Polish space
and B(Ω) is the Borel-σ-algebra. Let G ⊂ B(Ω) be a sub-σ-algebra. Then there
exists a regular conditional probability P(ω, A) given G.

We will not give the proof of this theorem here.


Chapter 3
Stochastic processes

We are finally ready to come to the main topic of this course, stochastic processes.
In this chapter we give the basic definitions, prove the fundamental theorem of
Kolmogorov and Daniell, and discuss some examples.

3.1 Definition of stochastic processes

There are various equivalent ways in which stochastic processes can be defined,
and it will be useful to always keep them in mind.

The traditional definition.

The standard way to define stochastic processes is as follows. We begin with an
abstract probability space (Ω, F, P). Next we need a measurable space (S, B) (which
in almost all cases will be a Polish space together with its Borel σ-algebra). The
space S is called the state space. Next, we need a set I, called the index set. Then a
stochastic process with state space S and index set I is a collection of (S, B)-valued
random variables, {Xt, t ∈ I}, defined on (Ω, F, P).
We call such a stochastic process also a stochastic process indexed by I. The
term stochastic process is often reserved for the cases when I is either N, Z, R+, or
R. The index set is then interpreted as a time parameter. Depending on whether the
index set is discrete or continuous, one refers to stochastic processes with discrete or
continuous time. However, there is also an extensive theory of stochastic processes
indexed by more complicated sets, such as Rd, Zd, etc. Often these are also referred
to as stochastic fields. We will mostly be concerned with the standard case of one-
dimensional index sets, but I will give examples of the more general case below.


From the point of view of mappings, we have the picture that for any t ∈ I, there
is a measurable map,

Xt : Ω → S, (3.1.1)

whose inverse maps B into F. For this to work, we do want, of course, F to be so
rich that it makes all functions Xt, t ∈ I, measurable. We denote this σ-algebra by

σ(Xt ; t ∈ I). (3.1.2)

An example of a stochastic process with discrete time is a family of independent
random variables.
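Another elementary example with discrete time is the simple random walk. The sketch below is illustrative code, not from the text: each call produces one realisation t ↦ Xt(ω) of the process with index set I = {0, 1, …, n} and state space S = Z, the randomness of ω being played by the state of the generator.

```python
# Simple random walk: a stochastic process with index set I = {0, 1, ..., n}
# and state space S = Z, built from i.i.d. +-1 increments (illustrative code,
# not from the text).
import random

def sample_path(n, rng):
    """One realisation t -> X_t(omega); the rng state plays the role of omega."""
    path = [0]
    for _ in range(n):
        path.append(path[-1] + rng.choice([-1, 1]))
    return path

path = sample_path(10, random.Random(0))
assert len(path) == 11 and path[0] == 0
assert all(abs(a - b) == 1 for a, b in zip(path, path[1:]))
```

Each run of `sample_path` returns one element of S^I restricted to finitely many times, which is exactly the sample-path point of view discussed next.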

Sample paths.

Given a stochastic process as defined above, we can take a different perspective and
view, for each ω ∈ Ω, X(ω) as a map from I to S,

X(ω) : I → S, t ↦ Xt(ω). (3.1.3)

We call such a function a sample path of X, or a realisation of X. Clearly here we
want to see the stochastic process as a random variable with values in the space of
functions,

X : Ω → S I, ω ↦ X(ω), (3.1.4)

where we view S I as the space of functions I → S. To complete this image, we need
to endow S I with a σ-algebra, BI. How should we choose the σ-algebra on S I? Our
picture will be that X maps (Ω, F) to (S I, BI). If this map is measurable, then the
marginals Xt : Ω → S should be measurable. This will be the case if the projection
maps πt : S I → S, which map a function x ∈ S I to its value at time t, πt(x) = xt, are
measurable from BI to B.
Lemma 3.1. Let BI be the smallest σ-algebra that contains all subsets of S I of the
form

C(A, t) ≡ {x ∈ S I : xt ∈ A}, (3.1.5)

with A ∈ B, t ∈ I. Then BI is the smallest σ-algebra such that all the maps πt : S I → S
that map x ↦ xt are measurable from (S I, BI) → (S, B). Moreover, σ(Xt, t ∈ I) is the
smallest σ-algebra such that the map X : Ω → S I is measurable from (Ω, σ(Xt, t ∈ I))
to (S I, BI).
Proof. We first show that all πt are measurable from

BI → B. (3.1.6)

To do this, let A ∈ B, and choose t ∈ I. Then

π_t^{-1}(A) = C(A, t) ∈ BI. (3.1.7)

Thus each πt is measurable. On the other hand, assume that there is some t and some
A such that C(A, t) ∉ BI. Then clearly π_t^{-1}(A) ∉ BI, and then πt is not measurable!
So all C(A, t) must be contained, but no more have to be.
To show that σ(Xt, t ∈ I) is the smallest σ-algebra that makes X measurable, note
that

X^{-1}(C(A, t)) = {ω ∈ Ω : Xt(ω) ∈ A}. (3.1.8)

Hence the σ-algebra must contain σ(Xt, t ∈ I), and since that σ-algebra contains the
pre-images of the generators of BI, it is also sufficiently large. □

As a consequence of this lemma, the maps Xt ≡ πt ◦ X : Ω → S are the compo-
sitions of two measurable maps and hence measurable from σ(Xt, t ∈ I) to B, as it
should be:

Lemma 3.2. The map X : Ω → S I is measurable from F → BI if and only if, for
each t, Xt is measurable from F → B.

Definition 3.3. If J ⊂ I is finite, and B ∈ B J, we call a set

C(B, J) ≡ {x ∈ S I : x J ≡ {xt, t ∈ J} ∈ B} (3.1.9)

a cylinder set or, more precisely, a finite dimensional cylinder set. If B is of the form
B = ×t∈J At, At ∈ B, we call such a set a special cylinder.

It is clear that BI contains all finite dimensional cylinder sets, but of course it
contains much more. We call BI the product σ-algebra, or the σ-algebra generated by
the cylinder sets.
It is easy to check that the special cylinders form a π-system, and the cylinders
form an algebra; both generate BI.
Thus we see that the choice of the σ-algebra BI is just the right one to make
the two points of view on stochastic processes equivalent from the point of view of
measurability.
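The defining feature of a special cylinder — that membership depends on a path only through its coordinates in the finite base J — is easy to illustrate. The code below is an illustrative sketch with arbitrarily chosen sets, not from the text.

```python
# Membership in a special cylinder C(B, J) with B = A_1 x A_3 depends on a
# path only through its coordinates in J = {1, 3}; the sets are illustrative.
def in_special_cylinder(x, constraints):
    """x: a path given as dict t -> x_t; constraints: dict t -> allowed set A_t."""
    return all(x[t] in A for t, A in constraints.items())

constraints = {1: {0, 1}, 3: {2}}              # J = {1, 3}, A_1 = {0,1}, A_3 = {2}
x1 = {0: 5, 1: 0, 2: 9, 3: 2, 4: -7}           # agrees on J: in the cylinder
x2 = {0: 5, 1: 0, 2: 9, 3: 3, 4: -7}           # fails at t = 3: not in it
assert in_special_cylinder(x1, constraints)
assert not in_special_cylinder(x2, constraints)
```

The coordinates outside J (here t = 0, 2, 4) are completely unconstrained, which is exactly why a cylinder with base J can be identified with a set in B J.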

The law of a stochastic process.

Once we view X as a map from Ω to the S -valued functions on I, we can define the
probability distribution induced by P on the space (S I , BI ),

µX ≡ P ◦ X −1 (3.1.10)

on (S I , BI ) as the distribution of the random variable X.



Canonical process.

Given a stochastic process with law µ, one can of course realise this process on the
probability space (S I, BI, µ). In that case the random variable X is the trivial map

X : S I → S I, x ↦ X(x) = x. (3.1.11)

The viewpoint of the canonical process is, however, not terribly helpful, since more
often than not we want to keep a much richer probability space on which many
other random objects can be defined.

3.2 Construction of stochastic processes; Kolmogorov’s theorem

The construction of a stochastic process may appear rather formidable, but we may
draw encouragement from the fact that we have introduced a rather coarse σ-algebra
on the space S I . The most fundamental observation is that stochastic processes are
determined by their observation on just finitely many points in time. We first make
this important notion precise.
For any J ⊂ I, we will denote by π J the canonical projection from S I to S J ,
i.e. π J X ∈ S J , such that, for all t ∈ J, (π J X)t = Xt . Naturally, we can define the
distributions
µXJ ≡ P ◦ (π J X)−1 (3.2.1)
on S J .
Definition 3.4. Let F(I) denote the set of all finite, non-empty subsets of I. Then
the collection of probability measures

{µXJ : J ∈ F(I)} (3.2.2)

is called the collection of finite dimensional distributions1 of X.


Note that the finite dimensional distributions determine µX on the algebra of finite
dimensional cylinder sets. Hence, by Dynkin’s theorem, they determine the distri-
bution on the σ-algebra BI . This is nice. What is nicer, is that one can also go the
other way and construct the law of a stochastic process from specified finite dimen-
sional distributions. This will be the content of the fundamental theorem of Daniell
and Kolmogorov.
Theorem 3.5. Let S be a compact metrisable space, and let B ≡ B(S ) be its Borel-
σ-algebra. Let I be a set. Suppose that, for each J ∈ F(I), there exists a probability
measure, µ J , on (S J , B J ), such that for any J1 ⊂ J2 ∈ F(I),
1 Alternative appellations are “finite dimensional marginal distributions” or “finite dimensional
marginals”.

µJ1 = µJ2 ◦ π_{J1}^{-1}, (3.2.3)

where πJ1 denotes the canonical projection from S J2 → S J1. Then there exists a
unique measure, µ, on (S I, BI), such that, for all J ∈ F(I),

µ ◦ π_J^{-1} = µJ. (3.2.4)

Proof. It will not come as a surprise that we will use Carathéodory’s theorem to
prove our result. To do this, we have to construct a σ-additive set function on an
algebra that generates the σ-algebra BI . Of course, this algebra will be the algebra
of all finite-dimensional cylinder events. It is rather easy to see what this set function
will have to be. Namely, if B is a finite dimensional cylinder, then there exists J ∈
F(I), and A J ∈ B J , such that B = A J × S I\J (we call in such a case J the base of the
cylinder). Then we can define

µ0 (B) = µ J (A J ). (3.2.5)

Clearly µ0(∅) = 0, and µ0 is finitely additive: if B1, B2 are disjoint finite dimensional
cylinders with bases Ji, then we can write Bi, i = 1, 2, in the form Ai × S I\J, where
J = J1 ∪ J2, and Ai ∈ B J are disjoint. Then it is clear that

µ0(B1 ∪ B2) = µJ(A1 ∪ A2) = µJ(A1) + µJ(A2) = µ0(B1) + µ0(B2), (3.2.6)

where the consistency relations (3.2.3) were used in the last step. The usual way to
prove σ-additivity is to use the fact that an additive set-function, µ0, is σ-additive if
and only if, for any sequence Gn ↓ ∅, µ0(Gn) ↓ 0.
Therefore, the proof will be finished once we establish the following lemma.

Lemma 3.6. Let Bn, n ∈ N, be a sequence of cylinder sets such that Bn ⊃ Bn+1 for all
n. If there exists an ε > 0 such that for all n ∈ N, µ0(Bn) ≥ 2ε, then ∩_{n∈N} Bn ≠ ∅.

Proof. If Bn satisfies the assumptions of the lemma, then there exists a sequence
Jn ∈ F(I) and An ∈ B Jn , such that Bn = An × S I\Jn , Jn ⊂ Jn+1 and

µ0 (Bn ) = µ Jn (An ). (3.2.7)

It will be enough to assume that Jn = {1, . . . , n}. Since µJn is a probability measure on
the compact metrisable space S Jn, Theorem 1.23 implies that, for any ε > 0, there
exists a compact subset, Kn ⊂ An, such that

µJn(Kn) ≥ µJn(An) − ε 2^{-n}, (3.2.8)

or, with Hn = Kn × S I\Jn,

µ0(Hn) ≥ µ0(Bn) − ε 2^{-n}. (3.2.9)

Now, under the hypothesis of the lemma, for all n ∈ N,


µ0(H1 ∩ · · · ∩ Hn) ≥ µ0(B1 ∩ · · · ∩ Bn) − Σ_{i=1}^{n} µ0(Bi\Hi) ≥ 2ε − ε Σ_{i=1}^{∞} 2^{-i} = ε. (3.2.10)

In particular, for any n, H1 ∩ · · · ∩ Hn ≠ ∅. Now let xn ∈ H1 ∩ · · · ∩ Hn, and hence
πJk xn ∈ K1 ∩ · · · ∩ Kk, for any k ≤ n. By compactness of this set, there exists a subse-
quence, ni, such that limi→∞ πJk xni ∈ ∩_{j=1}^{k} Kj.
Taking subsequently sub-subsequences2, we can construct a sequence in such a
way that πJk xni → x^k ∈ ∩_{j=1}^{k} Kj for all k. Clearly, πJℓ x^k = x^ℓ, for all ℓ ≤ k. Then
there exists an x ∈ S I whose projections are equal to these limits for all k, and hence
x ∈ ∩_{j=1}^{k} Bj for all k; hence x ∈ ∩_{n∈N} Bn and so ∩_{n∈N} Bn ≠ ∅. But this is the claim of
the lemma. □

So we are done: µ0 is σ-additive on the algebra of finite dimensional cylinders,
and so there exists a unique probability measure on the σ-algebra BI with the ad-
vertised properties. □

Note that we have used the assumption on the space S only to ensure that the
measures µ J , for J ∈ F(I), are all inner regular. Thus we can replace the assertion of
the theorem by:

Theorem 3.7. Let S be a topological space, and let B = B(S ) be its Borel-σ-
algebra. Let I be a set. Suppose that, for each J ∈ F(I), there exists an inner regular
probability measure, µ J , on (S J , B J ), such that for any J1 ⊂ J2 ∈ F(I),

    µ J1 = µ J2 ◦ π J1 −1 ,   (3.2.11)

where π J1 denotes the canonical projection from S J2 to S J1 . Then there exists a
unique measure, µ, on (S I , BI ), such that, for all J ∈ F(I),

    µ ◦ π J −1 = µ J .   (3.2.12)

Finally, one can show that the assumption that S be compact and metrisable in
Theorem 1.23 can be replaced by assuming that S be Polish.
This is due to the following extension of Theorem 1.23.

Theorem 3.8. Let S be a Polish space and let P be a probability measure on
(S , B(S )). Then P is inner regular.

Proof. We only need to modify the proof of Theorem 1.23 slightly. Instead of show-
ing that any compact set is an element of the algebra A, we will show that any closed
set is in A. The main step here is to show that S ∈ A, since then the mass of S is well
approximated by masses of compact subsets and we are done.

2 This is possible because the subsequence for k + 1 is a sub-subsequence of that for k, due to
∩_{j=1}^{k} K j ⊃ ∩_{j=1}^{k+1} K j . Indeed, denote by (ni,k )i≥1 the subsequence for k. Applying the Cantor
diagonal procedure, i.e., taking (nk,k )k≥1 , provides the desired subsequence.
3.2 Construction of stochastic processes; Kolmogorov’s theorem 47

Denote by Kr (x) = {y ∈ S : ρ(x, y) ≤ r} the closed ball around x with radius r with
respect to a complete metric ρ on S . Then there is a countable sequence of points,
(xn , n ∈ N), such that, for any r > 0, S = ∪_{n∈N} Kr (xn ). By σ-additivity,

    lim_{k↑∞} P( ∪_{i=1}^{k} Kr (xi ) ) = 1.   (3.2.13)

In particular, for any ε > 0, there exists a sequence kn , such that, for all n ∈ N,

    P( ∪_{i=1}^{kn} K1/n (xi ) ) ≥ 1 − ε 2−n .   (3.2.14)

Clearly the finite unions of the balls are closed, and so is their intersection,

    K ≡ ∩_{n∈N} ∪_{i=1}^{kn} K1/n (xi ),   (3.2.15)

and it holds that

    P(S ) − P(K) ≤ P( ∪_{n∈N} ( S \ ∪_{i=1}^{kn} K1/n (xi ) ) )   (3.2.16)
                ≤ ∑_{n∈N} P( S \ ∪_{i=1}^{kn} K1/n (xi ) ) ≤ ε ∑_{n=1}^{∞} 2−n = ε.
But K is a subset of each of the sets ∪_{i=1}^{kn} K1/n (xi ), which is bounded. Hence K is
closed and totally bounded (totally bounded is exactly the property just stated: for every
n there is a finite collection of balls of radius 1/n that cover K). Since the space is
complete, it follows that K is compact. Thus S is in A. This fact can then be used to
prove that all closed sets are in A. Namely, let K be compact such that P(S \ K) ≤ ε.
Let A be closed. Then P(A \ (A ∩ K)) ≤ P(S \ K) ≤ ε, hence A ∈ A. The theorem
then follows from Dynkin's theorem. □

As a consequence:

Corollary 3.9. The hypothesis of Theorem 3.7 holds for any probability measure
on (S , B(S )) when S is a Polish space.

Remark. Note that we have seen no need to distinguish cases according to the nature
of the set I.

3.3 Examples of stochastic processes

The Kolmogorov-Daniell theorem goes a long way in helping to construct stochastic
processes. However, one should not be deceived: prescribing a consistent family of
finite dimensional distributions (i.e., distributions satisfying (3.2.4)) is by no means
an easy task, and in practice we want to have a simpler way of describing a stochastic
process of our liking.
In this section I discuss some of the most important classes of examples without
going into too much detail.

3.3.1 Independent random variables

We have of course already encountered independent random variables in the first
course on probability. We can now formulate the existence of independent random
variables in full generality and with full rigour.

Theorem 3.10. Let I be a set and let, for each t ∈ I, µt be a probability measure on
(S , B(S )), where S is a Polish space. Then there exists a unique probability measure,
µ, on (S I , BI ), such that, for J ∈ F(I), At ∈ B, and A J ≡ ×t∈J At ∈ B J ,

    µ( π J −1 (A J ) ) ≡ µ J (A J ) = ∏_{t∈J} µt (At ).   (3.3.1)

Proof. Eq. (3.3.1) prescribes µ J on the rectangles ×t∈J At . But the rectangles are a
Π-system that generates the σ-algebra B J . By Carathéodory's theorem this fixes a
unique measure µ J on B J . This family is easily verified to satisfy the consistency
relations in the Kolmogorov-Daniell theorem. Hence, a unique measure µ on (S I , BI )
with these marginals exists. □

Remark. Note that we do not assume I to be countable. In the case when I is un-
countable, such a collection of random variables is sometimes called white noise.
This is, however, a rather unpleasant object. When we discuss seriously the issue
of stochastic processes with continuous time, we will see that we always will want
additional properties of sample paths that the theorem above does not provide.

Independent random variables are a major building block for more interesting
stochastic processes. We have already encountered sums of independent random
variables. Other interesting processes are e.g. maxima of independent random vari-
ables: If Xi , i ∈ N are independent random variables, define

Mn = max Xk . (3.3.2)
1≤k≤n

The study of such maxima is an interesting topic in itself.
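As a quick illustration (a Python sketch added here, not part of the original notes; the function name is ours), the process of maxima Mn from (3.3.2) can be computed along any sample path with a running accumulation:

```python
from itertools import accumulate
import random

def running_maxima(xs):
    """Return the path M_k = max(x_1, ..., x_k), k = 1, ..., n, as in (3.3.2)."""
    return list(accumulate(xs, max))

# Simulate a path of i.i.d. standard Gaussians (seed fixed for reproducibility)
random.seed(0)
X = [random.gauss(0.0, 1.0) for _ in range(1000)]
M = running_maxima(X)
```

By construction M is non-decreasing and its last entry is the overall maximum of the path.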


Of course one can look at many more functions of independent random variables.

3.3.2 Gaussian processes

Gaussian processes is one of the most important classes of stochastic process that
can be defined with the help of densities. Let us proceed in two steps.
First, we consider finite dimensional Gaussian vectors. Let n ∈ N be fixed, and let
C be a real symmetric positive definite n × n matrix. We denote by C −1 its inverse.
Define the Gaussian density,
    fC (x1 , . . . , xn ) ≡ (2π)−n/2 (det C)−1/2 exp( −(1/2)(x, C −1 x) ).   (3.3.3)

You see that the necessity of having C positive definite derives from the fact that we
want this density to be integrable with respect to the n-dimensional Lebesgue measure.

Definition 3.11. A family of n real random variables is called jointly Gaussian with
mean zero and covariance C, if and only if their distribution is absolutely continuous
w.r.t. the Lebesgue measure on Rn with density given by fC .

Remark. In this section we always assume that Gaussian random variables have
mean zero. The corresponding expressions in the general case can be recovered by
simple computations.
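To make the formula (3.3.3) concrete, here is a small Python check (an illustration added to these notes; the function names are ours): for a diagonal covariance C, det C is the product of the variances and (x, C −1 x) = ∑ xi2 /σi2 , so fC factorises into a product of one-dimensional Gaussian densities.

```python
import math

def gaussian_density_diag(x, variances):
    """f_C(x) from (3.3.3) in the special case C = diag(variances):
    det C = prod(variances) and (x, C^{-1} x) = sum(x_i^2 / var_i)."""
    n = len(x)
    quad = sum(xi * xi / v for xi, v in zip(x, variances))
    norm = (2 * math.pi) ** (n / 2) * math.sqrt(math.prod(variances))
    return math.exp(-0.5 * quad) / norm

def gaussian_density_1d(x, var):
    """One-dimensional centred Gaussian density with variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)
```

The factorisation reflects the fact that, for diagonal C, the components are independent.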

The definition of Gaussian vectors is no problem. The question is, can we define
Gaussian processes? From what we have learned, it will be crucial to be able to
define compatible families of finite dimensional distributions.
The following result will be important.

Lemma 3.12. Let X1 , . . . , Xn be random variables whose joint distribution is Gaus-


sian with covariance matrix C and mean zero.
(i) For any k, ` ∈ {1, . . . , n},
E(Xk X` ) = Ck,` . (3.3.4)
(ii) If J ⊂ {1, . . . , n} with |J| = m, then the random variables X` , ` ∈ J, are jointly
Gaussian with covariance given by the m × m matrix C J with elements C J k,` = Ck,` ,
for k, ` ∈ J.
(iii) For any u ∈ Rn , we have that

    E( e(u,X) ) = e(u,Cu)/2 ,   (3.3.5)

and

    E( ei(u,X) ) = e−(u,Cu)/2 .   (3.3.6)

Proof. Item (i) follows from the first part of Item (iii), or by direct computation. (ii)
can be proven in various ways. Note that it is clear that, if we denote the random
vector with components X` , ` ∈ J, by X J , then, trivially,

    E(XkJ X`J ) = Ck,` .   (3.3.7)



Therefore, all we have to show to infer (ii) is that X J is jointly Gaussian. For this we
need to compute the joint marginal density of this vector. Without loss of generality,
we can assume that J = {1, . . . , m}. Let us write x = (x J , xR ). We have to compute

    ∫ fC (x J , xR ) dxR ,   (3.3.8)

where the notation should be obvious. Now it is clear that the matrix C −1 can be
written in block form as

    C −1 = ( A  D ; Dt  B ),   (3.3.9)
where the particular form of A, B, D does not matter. Then we can write

    ((x J , xR ), C −1 (x J , xR )) = (x J , Ax J ) + (x J , DxR ) + (xR , Dt x J ) + (xR , BxR ).   (3.3.10)

By completing the square, this can be written as

    (x J , Ax J ) + (xR + B−1 Dt x J , B(xR + B−1 Dt x J )) − (x J , DB−1 Dt x J ).   (3.3.11)

Therefore,

    ∫ fC (x J , xR ) dxR = const. e−(1/2)(x J , (A−DB−1 Dt )x J ) ∫ e−(1/2)(xR +B−1 Dt x J , B(xR +B−1 Dt x J )) dxR
                        = const.′ e−(1/2)(x J , (A−DB−1 Dt )x J ),   (3.3.12)

for constants that we do not care to compute. Moreover, we know that the integral
over the expression in the last line is equal to one, so the quadratic form in the
exponent must be positive definite. That shows that the marginal distribution of X J is
Gaussian, and since we know the covariances from (i), we know that the covariance
matrix is just the restriction of C to the set J.
Item (iii) is just computation. We first compute the moment generating function,
or the Laplace transform, of our jointly Gaussian vector. We define, for u ∈ Cn ,

    φC (u) ≡ E( e(u,X) ) ≡ E( e∑_{i} ui Xi ) = ∫ dx1 . . . dxn fC (x1 , . . . , xn ) e∑_{i} ui xi .   (3.3.13)

Then

    φC (u) = (2π)−n/2 (det C)−1/2 ∫ dx exp( −(1/2)(x, C −1 x) + (u, x) )
           = (2π)−n/2 (det C)−1/2 ∫ dx exp( −(1/2)(x − Cu, C −1 (x − Cu)) + (1/2)(u, Cu) )
           = exp( (1/2)(u, Cu) ) (2π)−n/2 (det C)−1/2 ∫ dx exp( −(1/2)(x − Cu, C −1 (x − Cu)) )
           = exp( (1/2)(u, Cu) ),   (3.3.14)

where in the last line we used that the domain of integration in the integral is invariant
under translation. To obtain the analogous result for the Fourier transform, i.e. Eq.
(3.3.6), use Cauchy's theorem to show that the last line in Eq. (3.3.14) is also valid
if u is a complex vector.
Now it is easy to compute the correlation functions. Clearly,

    E(Xk X` ) = d2 φC (u)/duk du` |u=0 = Ck,` .   (3.3.15)

This establishes (i). (ii) is now quite simple. To compute the Laplace or Fourier
transform of the vector X` , ` ∈ J, we just need to set ui = uiJ for i ∈ J and ui = 0
for i ∉ J. The result is precisely the Laplace transform of a Gaussian vector with
covariance C J . Since the Fourier (and also the Laplace) transform determines the
distribution uniquely, (ii) follows. □
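A standard way to realise a Gaussian vector with a prescribed covariance, consistent with item (i), is the factorisation C = L Lt with L lower triangular: if Z has i.i.d. standard Gaussian entries, then X = LZ satisfies E(X Xt ) = L E(Z Zt ) Lt = C. The Python sketch below (added for illustration; a naive O(n3) Cholesky factorisation, suitable only for small matrices) makes this concrete.

```python
import math

def cholesky(c):
    """Lower-triangular L with L L^T = c, for a symmetric positive definite
    matrix c given as a list of rows."""
    n = len(c)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def l_lt(L):
    """Recompute the product L L^T to verify the factorisation."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Feeding L a vector of independent standard Gaussians then produces a sample with covariance C, which is exactly how finite dimensional Gaussian marginals are simulated in practice.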

This result is very encouraging for the prospect of defining Gaussian processes.
If we can specify an infinite dimensional positive definite matrix, C, then all its finite di-
mensional sub-matrices, C J , J ∈ F(N), are positive definite, and the ensuing family of finite
dimensional distributions are Gaussian distributions that do satisfy the consistency
requirements of Kolmogorov's theorem! The result is:

Theorem 3.13. Let C : N×N → R define a positive quadratic form. Then there exists
a unique stochastic process, X, with index set N and state space R, such that, for all
finite J ⊂ N, the marginal distributions are |J|-dimensional Gaussian vectors with
covariance C J . X is called a Gaussian process indexed by N.

There is an obvious generalisation of this theorem to arbitrary index sets:

Theorem 3.14. Let I be a set and let C : I × I → R define a positive quadratic form,
in the sense that, for every J ∈ F(I), the matrix C J with elements C J t,s = C(t, s), t, s ∈ J,
is positive definite. Then there exists a unique stochastic process, X, with index set I
and state space R, such that, for all finite J ⊂ I, the marginal distributions are |J|-
dimensional Gaussian vectors with covariance C J . X is called a Gaussian process
indexed by I.

Thus the trick is to construct positive quadratic forms. Of course it is easy to guess
a few by going the other way, and using independent Gaussian random variables
as building blocks. For example, consider Xn , n ∈ N, to be independent Gaussian
random variables with mean zero and variance σ2n . Set Zn ≡ ∑_{k=1}^{n} Xk . Then

    Cn,m ≡ E(Zn Zm ) = ∑_{i=1}^{n} ∑_{j=1}^{m} E(Xi X j ) = ∑_{i=1}^{m∧n} E(Xi2 ) = ∑_{i=1}^{m∧n} σ2i .   (3.3.16)

Thus the quadratic form Cn,m = ∑_{i=1}^{m∧n} σ2i is apparently positive. In fact, if u ∈ RN ,

    (u, Cu) = ∑_{n,m∈N} un um ∑_{i=1}^{m∧n} σ2i = ∑_{i∈N} σ2i ( ∑_{m≥i} um )( ∑_{n≥i} un )   (3.3.17)
            = ∑_{i∈N} σ2i ( ∑_{m≥i} um )2 ≥ 0,

and it is equal to zero if and only if u = 0.
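The computation (3.3.16)-(3.3.17) is easy to check numerically. The Python fragment below (an added illustration, with names of our choosing) builds a truncation of the covariance matrix of the partial sums and verifies the rewriting of the quadratic form as a sum of squares.

```python
def partial_sum_cov(sigmas):
    """C_{n,m} = sum_{i <= min(n,m)} sigma_i^2, cf. (3.3.16); 0-based indices."""
    n = len(sigmas)
    return [[sum(s * s for s in sigmas[: min(i, j) + 1]) for j in range(n)]
            for i in range(n)]

def quad_form(u, c):
    """(u, C u) for a vector u and a matrix c of matching size."""
    n = len(u)
    return sum(u[i] * c[i][j] * u[j] for i in range(n) for j in range(n))
```

The identity (3.3.17) says that quad_form(u, C) equals the weighted sum of the squared tail sums of u, which in particular is non-negative.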


Now we have seen that, in the construction of stochastic processes, having
discrete time did not appear (so far) to be much of an advantage. Thus the
above example may make us courageous enough to attempt to define a Gaussian process on
R+ . To this end, define a function C : R+ × R+ → R+ by

C(t, s) ≡ t ∧ s. (3.3.18)

What we have to check is that, for any J ∈ F(R+ ), the restriction of C to a quadratic
form on R J is positive. But indeed,
    ∑_{t,s∈J} ut u s (t ∧ s) = ∑_{t,s∈J} ut u s ∫_{0}^{t∧s} 1 dr = ∫_{0}^{∞} ( ∑_{t∈J, t≥r} ut )2 dr ≥ 0.   (3.3.19)

Thus all finite dimensional distributions exist as Gaussian vectors, and the com-
patibility conditions are trivially satisfied. Therefore there exists a Gaussian process
on R+ with this covariance. This process is called “Brownian motion”. Note, how-
ever, that this constructs the process only in the product topology, which does not
yet yield any nice path properties. We will later see that this process can actually
be constructed on the space of continuous functions, and this object will then be more
properly called Brownian motion.
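The identity (3.3.19) can also be verified numerically. In the Python sketch below (illustrative only), for a finite set of positive, increasingly sorted times the integrand ( ∑_{t≥r} ut )2 is piecewise constant, so the integral reduces to a finite sum; both evaluations of (u, Cu) with C(t, s) = t ∧ s must agree and be non-negative.

```python
def bm_quad_form(times, u):
    """(u, C u) for the Brownian covariance C(t, s) = min(t, s), computed directly."""
    return sum(ui * uj * min(ti, tj)
               for ti, ui in zip(times, u) for tj, uj in zip(times, u))

def bm_quad_form_integral(times, u):
    """Same quantity via (3.3.19): the integrand (sum_{t >= r} u_t)^2 is constant
    on each interval (t_{j-1}, t_j]; times must be positive and sorted increasingly."""
    total, prev = 0.0, 0.0
    for j, t in enumerate(times):
        tail = sum(u[j:])
        total += (t - prev) * tail * tail
        prev = t
    return total
```

The agreement of the two evaluations is exactly the positivity argument behind the existence of the process.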
Exercise. Let Xk , k ∈ N, be independent Gaussian random variables with mean zero
and variance σ2 = 1. Define, for n ∈ N and t ∈ [0, 1],

    Zn (t) ≡ (1/√n) ∑_{k=1}^{[nt]} Xk ,   (3.3.20)

where [·] denotes the largest integer not exceeding the argument. Show that
(i) Zn (t) is a stochastic process with index set [0, 1] and state space R.
(ii) Compute the covariance, Cn , of Zn and show that, for any I ∈ F([0, 1]), CnI → C I ,
where C(s, t) = s ∧ t.
(iii) Show that the finite dimensional distributions of the processes Zn converge, as
n → ∞, to those of the “Brownian motion” defined above.
(iv) Show that the results (i)–(iii) remain true if, instead of requiring that the Xk be
Gaussian, we just assume that their variance equals 1.
Note that to prove (iv), you need to prove the multi-dimensional analogue of the
central limit theorem. This requires, however, little more than an adaptation of the
notation from the standard CLT in dimension one.
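For part (ii) of the exercise, note that independence and unit variance give E(Zn (s)Zn (t)) = [n (s ∧ t)]/n, which converges to s ∧ t at rate 1/n. A one-line Python check (added here; the function name is ours):

```python
import math

def cov_zn(s, t, n):
    """E(Z_n(s) Z_n(t)) for the process (3.3.20): the X_k are independent with
    variance 1, so only the [n (s ∧ t)] common summands contribute."""
    return math.floor(n * min(s, t)) / n
```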

3.3.3 Markov processes

Gaussian processes were built from independent random variables using densities.
Another important way to construct non-trivial processes uses conditional probabil-
ities. Markov processes are the most prominent examples. In the case of Markov
processes we really think of the index set, N0 or R+ , as time. The process Xt then
shall have two properties: (1) it should be causal, i.e. we want an expression for the
law of Xt given the σ-algebra Ft− ≡ σ(X s , s < t), (2) we want this law to be forgetful
of the past: if we know the position (value; we will think mostly of a Markov pro-
cess as a “particle” moving around in S ) of X at some time s < t, then the law of
Xt should be independent of the positions of X s0 with s0 < s. In a way, Markov pro-
cesses are meant to be the stochastic analogues of deterministic evolutions (differential
equations).
To set such a process up, let us consider the (much simpler) case of discrete time,
i.e. I = N0 (we always want zero in our index set). The main building block for a
Markov chain is then the so-called (one-step) transition kernel, P : N0 × S × B →
[0, 1], with the following properties:
(i) For each t ∈ N0 and x ∈ S , Pt (x, ·) is a probability measure on (S , B).
(ii) For each A ∈ B, and t ∈ N0 , Pt (·, A) is a B-measurable function on S .
In the sequel we denote by Ft = σ(X s , s ≤ t) ⊂ F the sigma algebra generated by
the process up to time t.

Definition 3.15. A stochastic process X with state space S and index set N0 is a
discrete time Markov process with transition kernel P if, for all A ∈ B, t ∈ N,

P(Xt ∈ A|Ft−1 )(ω) = Pt−1 (Xt−1 (ω), A), P − a.s.. (3.3.21)

Equivalently, we can ask that, for all bounded measurable functions f : S → R,

    E( f (Xt )|Ft−1 )(ω) = ∫ Pt−1 (Xt−1 (ω), dx) f (x), P − a.s.   (3.3.22)

The remarkable thing is that this requirement fixes the law P up to one more proba-
bility measure on (S , B), the so-called initial distribution, P0 .

Theorem 3.16. Let S be a Polish space, P a transition kernel, and P0 a probabil-
ity measure on (S , B(S )). Then there exists a unique stochastic process satisfying
(3.3.21) and P(X0 ∈ A) = P0 (A), for all A.

Proof. In view of the Kolmogorov-Daniell theorem, we have to show that our re-
quirements fix all finite dimensional distributions, and that these satisfy the compat-
ibility conditions. This is more a problem of notation than anything else. We will
need to be able to derive formulas for

    P( Xtn ∈ An , . . . , Xt1 ∈ A1 ).   (3.3.23)

To get started, we consider P(Xt ∈ A|F s ), for s < t. To do this, we use the
elementary properties of conditional expectations (we drop the "a.s." that applies to
all equations relating to conditional expectations):

    P(Xt ∈ A|F s ) = E[ P(Xt ∈ A|Ft−1 ) | F s ]
                  = E[ Pt−1 (Xt−1 , A) | F s ]
                  = E[ E[ Pt−1 (Xt−1 , A) | Ft−2 ] | F s ],

where we used F s ⊂ F s0 for all s < s0 . Moreover,

    P(Xt ∈ A|F s ) = E[ E[ ∫ Pt−1 (xt−1 , A) Pt−2 (Xt−2 (ω), dxt−1 ) | Ft−2 ] | F s ]
                  = E[ ∫ Pt−1 (xt−1 , A) Pt−2 (xt−2 , dxt−1 ) · · · P s+1 (x s+1 , dx s+2 ) P s (X s (ω), dx s+1 ) | F s ]
                  = ∫ Pt−1 (xt−1 , A) Pt−2 (xt−2 , dxt−1 ) · · · P s (X s (ω), dx s+1 ),   (3.3.24)

since X s is F s -measurable. We will set


    P s,t (x, A) ≡ ∫ Pt−1 (xt−1 , A) Pt−2 (xt−2 , dxt−1 ) · · · P s (x, dx s+1 ),   (3.3.25)

and call P s,t the transition kernel from time s to time t. With this object defined, we
can now proceed to more complicated expressions:
 
    P( Xtn ∈ An , . . . , Xt1 ∈ A1 )
      = E[ P(Xtn ∈ An |Ftn−1 ) 1An−1 (Xtn−1 ) · · · 1A1 (Xt1 ) ]
      = E[ E[ Ptn−1 ,tn (Xtn−1 (ω), An ) | Ftn−1 ] 1An−1 (Xtn−1 ) · · · 1A1 (Xt1 ) ]
      = E[ E[ ∫_{An−1} Ptn−1 ,tn (xn−1 , An ) Ptn−2 ,tn−1 (Xtn−2 (ω), dxn−1 ) | Ftn−2 ] 1An−2 (Xtn−2 ) · · · 1A1 (Xt1 ) ]
      = ∫_{An−1} Ptn−1 ,tn (xn−1 , An ) ∫_{An−2} Ptn−2 ,tn−1 (xn−2 , dxn−1 ) · · · ∫_{A1} Pt1 ,t2 (x1 , dx2 ) ∫_{S} P0,t1 (x0 , dx1 ) P0 (dx0 ).   (3.3.26)

Thus, we have the desired expression for the marginal distributions in terms of the
transition kernel P and the initial distribution P0 . The compatibility relations follow
from the following obvious, but important, property of the transition kernels.

Lemma 3.17. The transition kernels P s,t satisfy the Chapman-Kolmogorov equa-
tions

    P s,t (x, A) = ∫ Pr,t (y, A) P s,r (x, dy)   (3.3.27)

for any s < r < t.


Proof. This is obvious from the definition. □

The proof of the compatibility relations is now also obvious: if some of the Ai
are equal to S , we can use (3.3.27) and recover the expressions for the lower dimen-
sional marginals. □
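For a finite state space the kernels P s,t are stochastic matrices and (3.3.25) is simply a matrix product, so the Chapman-Kolmogorov equations (3.3.27) reduce to associativity of matrix multiplication. The Python sketch below (an illustration added here, not from the notes) checks this on a two-state chain.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transition(kernels, s, t):
    """Finite-state analogue of (3.3.25): P_{s,t} = P_s P_{s+1} ... P_{t-1},
    where kernels[r] is the one-step stochastic matrix at time r; requires s < t."""
    out = kernels[s]
    for r in range(s + 1, t):
        out = mat_mul(out, kernels[r])
    return out
```

Chapman-Kolmogorov then reads transition(k, s, t) = transition(k, s, r) · transition(k, r, t) for any intermediate time r, and each P s,t has rows summing to one.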
Exercise. Consider the Brownian motion process from the last sub-section. Show
that this process is Markov in the sense that all finite dimensional distributions sat-
isfy the Markov property.
Hint: Let J = {tn > tn−1 > · · · > t1 }. Show that the family of random variables
Yn ≡ Xtn − Xtn−1 , Xtn−1 , Xtn−2 , . . . , Xt1 are jointly Gaussian and that Yn is independent
of the σ-algebra generated by Xtn−1 , . . . , Xt1 .

3.3.4 Gibbs measures

As an aside, I will briefly explain another important way to construct stochastic
processes with the help of conditional expectations and densities, which is central in
statistical mechanics. It is particularly useful in the setting where I is not an ordered
set; the most prominent example is I = Zd .
In order not to introduce too much notation, I will stick to a simple example,
the so-called Ising model. In this case, S = {−1, 1}. The main object is a family of
functions, HΛ : S Zd → R, called Hamiltonians, that are defined for every finite Λ ⊂
Zd and are given by

    HΛ (X) = − ∑_{i, j : i∨ j∈Λ} Xi X j Ji j .   (3.3.28)

Using this function, we will construct a family of probability kernels, µΛ , that have
the following properties:
(i) For each y ∈ S Zd , µΛ (·, y) is a probability measure on S Zd ;
(ii) For each A ∈ BZd , µΛ (A, ·) is an FΛc -measurable function, where FΛc = σ(Xi , i ∈ Λc );
(iii) For any pair of volumes, Λ, Λ0 , with Λ ⊂ Λ0 , and any A ∈ BZd ,

    ∫ µΛ (z, A) µΛ0 (x, dz) = µΛ0 (x, A).   (3.3.29)

We will indeed give an explicit formula for µΛ :

    µΛ (A, y) = ( ∑_{xi ,i∈Λ} 1_{(xΛ ,yΛc )∈A} e−βHΛ ((xΛ ,yΛc )) ) / ( ∑_{xi ,i∈Λ} e−βHΛ ((xΛ ,yΛc )) ).   (3.3.30)

It is easily checked that this expression indeed defines a kernel with properties (i)
and (ii). An expression of this type is called a local Gibbs specification.
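For a tiny one-dimensional chain the specification (3.3.30) can be evaluated by brute force. The Python sketch below is an illustration with a hypothetical parametrisation (constant nearest-neighbour couplings Ji j = J, a chain Λ of consecutive sites, and two fixed boundary spins playing the rôle of yΛc ); it lets one check property (i), i.e. that the weights sum to 1.

```python
import itertools
import math

def local_gibbs(beta, j_coupling, boundary, lam_size):
    """Probabilities µ_Λ({x_Λ}, y) for a 1-d nearest-neighbour Ising chain of
    lam_size spins in Λ, with fixed boundary spins y = (left, right); a toy
    instance of the specification (3.3.30)."""
    left, right = boundary

    def energy(spins):
        chain = (left,) + spins + (right,)
        return -j_coupling * sum(chain[i] * chain[i + 1]
                                 for i in range(len(chain) - 1))

    configs = list(itertools.product((-1, 1), repeat=lam_size))
    weights = [math.exp(-beta * energy(c)) for c in configs]
    z = sum(weights)
    return {c: w / z for c, w in zip(configs, weights)}
```

With ferromagnetic coupling J > 0 and aligned boundary spins, the configuration aligned with the boundary carries the largest weight, as one expects from the energy (3.3.28).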
Now we see that the properties of these kernels are reminiscent of those of regular
conditional probabilities.
One defines the notion of a Gibbs measure as follows:
Definition 3.18. A probability measure µ on S Zd is called a Gibbs measure if and
only if, for any finite Λ ⊂ Zd , the kernel µΛ is a regular conditional probability for µ
given FΛc .
More specifically, if the kernel is the Gibbs specification (3.3.30), it will be called
a Gibbs measure for the d-dimensional Ising model at temperature β−1 .
One can prove that such Gibbs measures exist; for this one shows that any ac-
cumulation point of a sequence µΛn (·, x), where Λn ↑ Zd is any increasing sequence
of volumes that converges to Zd (in the sense, that, for any finite Λ, there exists
n0 , such that, for all n ≥ n0 , Λ ⊂ Λn ), will be a Gibbs measure. This is relatively
straightforward, by writing equation (3.3.29) for a sequence of volumes Λn ↑ Zd :
Z
µΛ (z, A)µΛn (x, dz) = µΛn (x, A). (3.3.31)

If µΛn converges weakly to some measure µ, then the right-hand side converges to
µ(A). The left-hand side will converge to ∫ µΛ (z, A) µ(dz), since one can easily see
that µΛ (z, A) is a continuous function of z, if A is a cylinder event (in fact, in our example,
it is a local function on a discrete space). But then µ satisfies the desired properties
of a Gibbs measure.
The existence of accumulation points is then guaranteed by the fact that S Zd is
compact (by Tychonov's theorem, since S = {−1, 1} is compact) and that the set of probability
measures over a compact space is compact. What makes this setting interesting is
that there is no general uniqueness result. In fact, if d ≥ 2, and β > βc , for a certain βc ,
then it is known that there exists more than one Gibbs measure. This mathematical
fact is connected to the physical phenomenon of a so-called phase transition, and
this is what makes the study of Gibbs measures so interesting. For deeper material
on Gibbs measures see [3, 12, 6].
Chapter 4
Martingales

In this chapter we introduce the fundamental concept of martingales, which plays a
central rôle in the theory of stochastic processes. Martingales are “truly random”
stochastic processes, in the sense that their observation in the past does not allow for
useful prediction of the future. By useful we mean here that no gambling strategies
can be devised that would allow for systematic gains. In this chapter we will always
assume that random variables take values in R, unless specified otherwise. The
treatment of martingales follows largely the book of Rogers and Williams [11], with
the exception of a section on the central limit theorem, which is inspired by
Billingsley's presentation [2].

4.1 Definitions

We begin by formally introducing the notion of a filtration of a σ-algebra, which we
have already briefly encountered in the context of Markov processes. We remain in
the context of discrete index sets.

Definition 4.1. Let (Ω, F) be a measurable space. A family of sub-σ-algebras,
{Fn , n ∈ N0 }, of F that satisfies

    F0 ⊂ F1 ⊂ · · · ⊂ F∞ ≡ σ( ∪_{n∈N0} Fn ) ⊂ F,   (4.1.1)

is called a filtration of the σ-algebra F. We call the quadruple (Ω, F, P, {Fn , n ∈ N0 }) a
filtered (probability) space.

In this chapter we will henceforth always assume that we are given a filtered
space. Also, all stochastic processes are assumed to have state space R.


Filtrations and stochastic processes are closely linked. We will see that this goes
in two ways.
Definition 4.2. A stochastic process, {Xn , n ∈ N0 }, is called adapted to the filtration
{Fn , n ∈ N0 }, if, for every n, Xn is Fn -measurable.
Now the other direction:
Definition 4.3. Let {Xn , n ∈ N0 } be a stochastic process on (Ω, F, P). The natural
filtration, {Wn , n ∈ N0 } with respect to X is the smallest filtration such that X is
adapted to it, that is,
Wn = σ(X0 , . . . , Xn ). (4.1.2)
We see that the basic idea of the natural filtration is that functions of the process
that are measurable with respect to Wn depend only on the observations of the
process up to time n.
We now define martingales.
Definition 4.4. A stochastic process, X, on a filtered space is called a martingale, if
and only if the following hold:
(i) The process X is adapted to the filtration {Fn , n ∈ N0 };
(ii) For all n ∈ N0 , E(|Xn |) < ∞;
(iii) For all n ∈ N,
E(Xn |Fn−1 ) = Xn−1 , a.s.. (4.1.3)
If (i) and (ii) hold, but instead of (iii) it holds that E(Xn |Fn−1 ) ≥ Xn−1 , respectively
E(Xn |Fn−1 ) ≤ Xn−1 , then the process X is called a sub-martingale, respectively a
super-martingale.
In particular, for a martingale E(Xn ) = E(Xn−1 ), for a sub-martingale E(Xn ) ≥
E(Xn−1 ), finally, for a super-martingale E(Xn ) ≤ E(Xn−1 ).
It is clear that the property (iii) is what makes martingales special: intuitively, it
means that the best guess for what Xn could be, knowing what happened up to time
n − 1 is simply Xn−1 . No prediction on the direction of change is possible.
We will now head for the fundamental theorem concerning the impossibility of
winning systems in games build on martingales.
To put us into the gambling mood, we think of the increments of the process,
Yn ≡ Xn − Xn−1 , as the result of (not necessarily independent) games (Examples: (i)
Coin tosses, or (ii) the daily increase of the price of a stock). We are allowed to bet
on the outcome in the following way: at each moment in time, k − 1, we choose a
number Ck ∈ R.
Then our wealth will increase by the amount Ck Yk on the k-th day, i.e., the wealth
process, Wn , is given by Wn = ∑_{k=1}^{n} Ck Yk . (Examples: (i) in the coin toss case, choosing
Ck > 0 means to bet the amount Ck on heads (= {Yk = +1}), and Ck < 0 means to bet
the amount −Ck on tails (= {Yk = −1}); (ii) in the stock case, Ck represents
the number of shares an investor decides to hold from time k − 1 to time k (here
negative values can be realised by short-selling).)
The choice of the Ck is done knowing the process up to time k − 1. This justifies
the following definition.

Definition 4.5. A stochastic process {Cn , n ∈ N} is called previsible1 , if, for all n ∈ N,
Cn is Fn−1 -measurable.
Given an adapted stochastic process, X and a previsible process C, we can define
the wealth process
    Wn ≡ ∑_{k=1}^{n} Ck (Xk − Xk−1 ) ≡ (C • X)n .   (4.1.4)
Definition 4.6. The process C • X is called the martingale transform of X by C or
the discrete stochastic integral of C with respect to X.
Now we can formulate the general “no-system” theorem for martingales:
Theorem 4.7. Let (Ω, F, P, {Fn , n ∈ N}) be a filtered probability space.
(i) Let C be a bounded, non-negative previsible process, i.e., there exists K < ∞
such that, for all n and all ω ∈ Ω, 0 ≤ Cn (ω) ≤ K. Let X be a super-martingale.
Then C • X is a super-martingale that vanishes at n = 0.
(ii) Let C be a bounded previsible process (boundedness as above) and X be a
martingale. Then C • X is a martingale that vanishes at zero.
(iii) Both in (i) and (ii), the condition of boundedness can be replaced by Cn ∈ L2 ,
if also Xn ∈ L2 .
Remark. In terms of gambling, (i) says that, if the underlying process has a tendency
to fall, then playing against the trend (“investing in a falling stock”) leads to a wealth
process that tends to fall. On the other hand, (ii) says that, if the underlying process
X is a martingale, then no matter what strategy you use, the wealth process has mean
zero.
Proof. (i) and (ii). Integrability is trivial to check. We also have that Wn − Wn−1 =
Cn (Xn − Xn−1 ). Then

E(Wn − Wn−1 |Fn−1 ) = Cn E(Xn − Xn−1 |Fn−1 ), (4.1.5)

by Lemma 2.5. If X is a martingale, the conditional expectation on the right is zero,


so E(Wn − Wn−1 |Fn−1 ) = 0, and W is a martingale. If X is a super-martingale, the
conditional expectation is non-positive and this remains true for the product, if Cn
is non-negative. This proves (i) and (ii).
To prove (iii), we just need to show that under the hypothesis of (iii), Eq. (4.1.5)
still holds. But first, by the Cauchy-Schwartz inequality, Cn (Xn − Xn−1 ) is absolutely
integrable. Next, chose since the bounded functions are dense in L2 , take a sequence
of bounded functions Cnk that converge to Cn . Then

E(Wn − Wn−1 |Fn−1 ) = Cnk E(Xn − Xn−1 |Fn−1 ) + E((Cn −Cnk )(Xn − Xn−1 )|Fn−1 ). (4.1.6)

Again by Cauchy-Schwarz, the second term tends to zero as k ↑ ∞, while the first
tends to Cn E(Xn − Xn−1 |Fn−1 ), almost surely. □
1 The terminology previsible refers to the fact that Cn can be foreseen from the information
available at time n − 1.
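The content of part (ii) of Theorem 4.7 can be checked exactly on a toy example: take X to be a simple symmetric ±1 random walk and compute E((C • X)n ) for any previsible strategy by enumerating all 2n equally likely increment paths. The Python sketch below is an added illustration with names of our choosing; the "doubling after every loss" rule is the classical casino strategy, included to show that even it yields zero expected gain.

```python
import itertools

def expected_transform(n, strategy):
    """E[(C • X)_n] for X_n = Y_1 + ... + Y_n with i.i.d. fair ±1 increments,
    where strategy(past_increments) returns the previsible bet C_k.
    Exact computation by enumerating all 2^n paths."""
    total = 0.0
    for incs in itertools.product((-1, 1), repeat=n):
        total += sum(strategy(incs[:k - 1]) * incs[k - 1]
                     for k in range(1, n + 1))
    return total / 2 ** n

def doubling(past):
    """Bet 1, doubling after every loss: previsible, since it depends only on
    the past increments."""
    c = 1.0
    for y in reversed(past):
        if y == -1:
            c *= 2.0
        else:
            break
    return c
```

Both the trivial strategy and the doubling strategy give expected wealth zero, exactly as the "no-system" theorem predicts.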

The quantities Yn = Xn − Xn−1 are called martingale differences. A sequence S n ≡
∑_{k=1}^{n} Yk , where E(Yn |Fn−1 ) = 0, is called a martingale difference sequence. If the Yn are
square integrable, then the variance of a martingale difference sequence satisfies

    E(S n2 ) = ∑_{k=1}^{n} E(Yk2 ).   (4.1.7)

Some examples.

A canonical way to construct a martingale is to take any integrable random variable,
X, on a filtered probability space, (Ω, F, P, {Fn , n ∈ N0 }), and to define

    Xn ≡ E(X|Fn ).   (4.1.8)

Then, by the properties of conditional expectation,

E(Xn |Fn−1 ) = E[E(X|Fn )|Fn−1 ] = E(X|Fn−1 ) = Xn−1 , a.s. (4.1.9)

In this case, we should expect that limn→∞ Xn = X, a.s..


Another example is a Markov chain whose transition kernel has the property that

    ∫ x Pt (y, dx) = y.   (4.1.10)

In particular, sums of independent random variables with mean zero are martingales.

4.2 Upcrossings and convergence

Consider an interval [a, b]. We want to count the number of times a process crosses
this interval from below.
Definition 4.8. Let a < b ∈ R and let X s be a stochastic process with values in R. We
say that an upcrossing of [a, b] occurs between times s and t if
(i) X s < a, Xt > b,
(ii) for all r such that s < r < t, Xr ∈ [a, b].
We denote by U N (X, [a, b])(ω) the number of upcrossings in the time interval
[0, N].
We consider an (obviously) previsible process constructed as follows:

C1 = 1X0 <a ; Cn = 1Cn−1 =1 1Xn−1 ≤b + 1Cn−1 =0 1Xn−1 <a , for n ≥ 2. (4.2.1)
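The count U N (X, [a, b]) is easy to compute along a discrete path by mimicking the strategy (4.2.1): start "waiting", switch to "holding" when the path drops strictly below a, and record an upcrossing once it then exceeds b. A Python sketch (added here for illustration; the function name is ours):

```python
def upcrossings(path, a, b):
    """Count the upcrossings of [a, b] by a discrete path: each completed
    passage from strictly below a to strictly above b, following (4.2.1)."""
    count, holding = 0, False
    for x in path:
        if not holding and x < a:
            holding = True          # "buy": path dropped below a
        elif holding and x > b:
            count += 1              # "sell": one upcrossing completed
            holding = False
    return count
```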



This process represents a “winning” strategy: wait until the process (say, the price of
. . . ) drops below a. Buy the stock, and hold it until its price exceeds b; sell, wait
until the price drops below a, and so on. Our wealth process is W = C • X.
Now each time there is an upcrossing of [a, b] we win at least (b − a). Thus, at
time N, we have

    WN ≥ (b − a) U N (X, [a, b]) − |a − XN | 1XN <a ,   (4.2.2)

where the last term accounts for the maximum loss that we could have incurred if we are
invested at time N and the price is below a.
Naive intuition would suggest that in the long run, the first term must win. Our
theorem above says that this is false, if we are in a fair or disadvantageous game
(that is, in practice, always).

Theorem 4.9 (Doob's upcrossing lemma). Let X be a super-martingale. Then, for
any a < b ∈ R,

    (b − a) E(U N (X, [a, b])) ≤ E( |a − XN | 1XN <a ).   (4.2.3)

Proof. The process C defined in (4.2.1) is a bounded, non-negative previsible pro-
cess. Therefore (i) of Theorem 4.7 implies that W ≡ C • X is a super-martingale with
W0 = 0. Therefore 0 ≥ E(WN ), and taking the expectation of (4.2.2) gives (4.2.3). □

[Fig. 4.1 Illustration of Theorem 4.9 (schematic): the previsible process C switches
from 0 (“wait”, while Xn−1 ≥ a) to 1 (“buy”, once Xn−1 < a), stays at 1 while
Xn−1 ≤ b, and switches back to 0 (“sell”) once Xn−1 > b; if the last “buy” is still
open at time N with XN < a, the maximum possible loss is |a − XN |.]

The result has the following, quite remarkable consequence:

Definition 4.10. We say that a stochastic process with discrete time and values in a
Banach space is L p -bounded if

    sup_{n∈N} E( kXn k p ) < ∞.   (4.2.4)

Remark. Note that the requirement to be L p -bounded is strictly stronger than just
asking that, for all n, E( kXn k p ) < ∞.
62 4 Martingales

Corollary 4.11. Let X_n be an L^1-bounded super-martingale. Define U_∞(X, [a, b]) = lim_{N→∞} U_N(X, [a, b]) for any interval [a, b]. Then

(b − a) E(U_∞(X, [a, b])) ≤ |a| + sup_n E(|X_n|) < ∞.   (4.2.5)

In particular, P(U_∞(X, [a, b]) = ∞) = 0.

Proof. Exercise! ⊓⊔

This is quite impressive: a (super-)martingale that is L^1-bounded cannot cross any interval infinitely often. The next result is even more striking, and in fact one of the most important results about martingales.

Theorem 4.12 (Doob's super-martingale convergence theorem). Let X_n be an L^1-bounded super-martingale. Then, almost surely, X_∞ ≡ lim_{n→∞} X_n exists and is a finite random variable.

Proof. Define

Λ ≡ {ω : X_n(ω) does not converge to a limit in [−∞, +∞]}   (4.2.6)
  = {ω : limsup_{n→∞} X_n(ω) > liminf_{n→∞} X_n(ω)}
  = ∪_{a<b∈Q} {ω : limsup_{n→∞} X_n(ω) > b > a > liminf_{n→∞} X_n(ω)} ≡ ∪_{a<b∈Q} Λ_{a,b}.

But
Λ_{a,b} ⊂ {ω : U_∞(X, [a, b])(ω) = ∞}.   (4.2.7)
Therefore, by Corollary 4.11, P(Λ_{a,b}) = 0, and thus also P(∪_{a<b∈Q} Λ_{a,b}) = 0, since countable unions of null-sets are null-sets.
Thus the limit of X_n exists in [−∞, ∞] with probability one. It remains to show that it is finite. To do this, we use Fatou's lemma:

E(|X_∞|) = E(liminf_{n→∞} |X_n|) ≤ liminf_{n→∞} E(|X_n|) ≤ sup_{n∈N_0} E(|X_n|) < ∞.   (4.2.8)

So X_∞ is almost surely finite. ⊓⊔

Doob's convergence theorem implies that positive super-martingales always converge a.s. This is because the super-martingale property ensures in this case that E(|X_n|) = E(X_n) ≤ E(X_0), so the uniform boundedness in L^1 is always guaranteed.
Our next result gives a sharp criterion for convergence that brings to light the importance of the notion of uniform integrability.
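This phenomenon is easy to observe numerically. The following sketch (our illustration, not from the text) simulates the positive martingale M_n = Π_{i≤n} U_i with U_i i.i.d. uniform on [0, 2]: each factor has mean 1, so E(M_n) = 1 for all n, yet M_n → 0 a.s. because E(log U_i) = log 2 − 1 < 0. In particular the a.s. limit is not an L^1 limit, which is exactly where uniform integrability fails.

```python
import random

random.seed(1)
n_steps, n_paths = 1000, 100
finals = []
for _ in range(n_paths):
    m = 1.0
    for _ in range(n_steps):
        m *= random.uniform(0.0, 2.0)   # i.i.d. factor with mean 1: M_n is a martingale
    finals.append(m)

# a.s. convergence to 0: every simulated path has essentially died out,
# even though E(M_n) = 1 for every n
frac_tiny = sum(f < 1e-6 for f in finals) / n_paths
assert frac_tiny > 0.95
```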

Theorem 4.13. Let X be an L^1-bounded super-martingale, so that, by Theorem 4.12, X_∞ ≡ lim_{n→∞} X_n exists a.s. Then X_n → X_∞ in L^1 if and only if the sequence {X_n, n ∈ N_0} is uniformly integrable. In that case, for n ∈ N_0,

E(X_∞|F_n) ≤ X_n, a.s.,   (4.2.9)

with equality holding if X is a martingale.


Proof. The first statement follows from Theorem 1.38. For the second statement, note that for m ≥ n, E(X_m|F_n) ≤ X_n a.s. Therefore, for any A ∈ F_n,

E(X_m 1_A) = E(1_A E(X_m|F_n)) ≤ E(X_n 1_A).   (4.2.10)

Since X converges in L^1, E(X_m 1_A) → E(X_∞ 1_A), and hence

E(X_∞ 1_A) = E(1_A E(X_∞|F_n)) ≤ E(X_n 1_A).   (4.2.11)

But since this holds for any A ∈ F_n, this implies (4.2.9). ⊓⊔
A martingale with the property that there exists integrable X∞ such that Xn =
E(X∞ |Fn ) is called a closed martingale. The preceding theorem thus says in partic-
ular that martingales that are uniformly integrable and converge a.s. are closed.
The martingales of our first example are by definition closed. The next result
implies that such martingales converge almost surely and in L1 . To show this, we
need yet another result of Doob that implies the uniform integrability of conditional
expectations.
Theorem 4.14. Let X be an absolutely integrable random variable on some proba-
bility space (Ω, F, P). Then the family

{E(X|G) : G is a sub-σ-algebra of F} (4.2.12)

is uniformly integrable.
Proof. Since X is absolutely integrable, for any ε > 0, we can find δ > 0 such that, if F ∈ F with P(F) < δ, then E(|X|1_F) < ε. Let such ε and δ be given. Choose K such that K^{−1} E(|X|) < δ. Let now G ⊂ F be a σ-algebra, and let Y be a version of E(X|G). Then Jensen's inequality for conditional expectations implies that

|Y| ≤ E(|X| |G), a.s.   (4.2.13)

By Chebyshev's inequality, K P(|Y| > K) ≤ E(|Y|) ≤ E(|X|), so P(|Y| > K) < δ. On the other hand, since the event {|Y| > K} ∈ G, we can argue that

E(|Y|1_{|Y|>K}) ≤ E[1_{|Y|>K} E(|X| |G)] = E[E(|X|1_{|Y|>K} |G)] = E(|X|1_{|Y|>K}) < ε,

where in the last step we have set F = {|Y| > K}. This is the uniform integrability property we want to prove. ⊓⊔
Theorem 4.15 (Lévy's upward theorem). Let ξ be an absolutely integrable random variable on a filtered probability space (Ω, F, P, {F_n, n ∈ N_0}). Define X_n ≡ E(ξ|F_n), a.s. Then X_n is a uniformly integrable martingale and

X_n → X_∞ = E(ξ|F_∞),   (4.2.14)

almost surely and in L^1.

Proof. X_n is an L^1-bounded martingale by the properties of conditional expectations. Theorem 4.14 implies that X_n is uniformly integrable. Thus X_n converges almost surely and in L^1. We have to show the last equality in (4.2.14). For any n, any F ∈ F_n, and for any m ≥ n,

E[1_F E(ξ|F_∞)] = E[1_F E(E(ξ|F_∞)|F_m)] = E[1_F E(ξ|F_m)] = E(1_F X_m).   (4.2.15)

In particular, for any m > n,

E[1_F X_n] = E[1_F X_m],   (4.2.16)

and so
E[1_F X_n] = lim_{m↑∞} E[1_F X_m] = E[1_F X_∞]   (4.2.17)

since X_m converges in L^1. Thus E[1_F E(ξ|F_∞)] = E(1_F X_∞) for any F in the π-system ∪_{n∈N_0} F_n that generates the σ-algebra F_∞. Using Lebesgue's dominated convergence theorem, one can verify easily that the class of sets for which the equality holds is also a λ-system. Therefore, by Dynkin's theorem, the equality holds for the σ-algebra generated by the π-system, that is, for F_∞. But this means that E(ξ|F_∞) = X_∞, almost surely. ⊓⊔

Note that, when F = F∞ , the theorem says that E(ξ|Fn ) → ξ.

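For a concrete illustration (ours, with a hypothetical integrand f), take ξ = f(U) with U uniform on [0, 1] and F_n generated by the first n binary digits of U. Then E(ξ|F_n) is the average of f over the dyadic interval of length 2^{−n} containing U, and Lévy's upward theorem says these averages converge to f(U):

```python
import math, random

def cond_exp(f, u, n, m=256):
    """E(f(U) | F_n) evaluated at U = u, where F_n = sigma(first n binary digits of U):
    the average of f over the dyadic interval of length 2^-n containing u
    (approximated here by an m-node midpoint sum)."""
    h = 2.0 ** (-n)
    lo = math.floor(u / h) * h
    return sum(f(lo + (j + 0.5) * h / m) for j in range(m)) / m

random.seed(2)
f = lambda t: math.sin(3.0 * t) + t * t
u = random.random()
# the closed martingale X_n = E(f(U)|F_n) closes in on xi = f(U) as n grows
errors = [abs(cond_exp(f, u, n) - f(u)) for n in range(1, 12)]
assert errors[-1] < 1e-2
```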

An application of this result is Kolmogorov’s 0 − 1 law.

Theorem 4.16 (Kolmogorov's 0 − 1 law). Let X_n, n ∈ N, be a sequence of independent random variables. Define T_n = σ(X_{n+1}, X_{n+2}, . . . ) and T ≡ ∩_{n∈N} T_n. Then, P(F) ∈ {0, 1} if F ∈ T.

Proof. Let F_n ≡ σ(X_1, . . . , X_n), F ∈ T, and set η = 1_F. Since η is bounded and F_∞-measurable, the preceding theorem tells us that

η = E(η|F_∞) = lim_{n→∞} E(η|F_n), a.s.   (4.2.18)

Now η is Tn -measurable for each n and hence independent of Fn . Thus, for any n

E(η|Fn ) = E(η) = P(F), a.s. (4.2.19)

and so η = P(F), a.s. But η takes only the values 0 and 1, being an indicator function. Thus P(F) ∈ {0, 1}, proving the theorem. ⊓⊔

The next theorem concerns filtrations extending into the infinite past. It is called the Lévy-Doob downward theorem and is in some sense an inverted version of the upward theorem.

Theorem 4.17 (Lévy-Doob downward theorem). Let (Ω, F, P) be a probability space, and let {G_{−n}, n ∈ N} be a collection of sub-σ-algebras of F such that

G_{−∞} ≡ ∩_{k∈N} G_{−k} ⊂ · · · ⊂ G_{−n−1} ⊂ G_{−n} ⊂ · · · ⊂ G_{−1}.   (4.2.20)

Let {X_{−n}, n ∈ N} be a super-martingale relative to {G_{−n}, n ∈ N}, i.e.

E(X_{−n}|G_{−m}) ≤ X_{−m}, a.s.   (4.2.21)

for m ≥ n. Assume that sup_{n≥1} E(X_{−n}) < ∞. Then the process X is uniformly integrable and the limit

X_{−∞} = lim_{n→∞} X_{−n}   (4.2.22)

exists a.s. and in L1 . Moreover,

E(X−n |G−∞ ) ≤ X−∞ , a.s. (4.2.23)

with equality in the martingale case.

Remark. Note that the limit we are considering here is really quite different from the one in the previous convergence theorems. We are really looking backward in time: as n tends to infinity, X_{−n} is measurable with respect to smaller and smaller σ-algebras, contrary to the usual X_n, which depend on more and more information. Therefore, while the limit of a convergent martingale X_n is in general a genuine random variable (it is constant only if the entire sequence is constant), a convergent X_{−n} has a much better chance to converge to a real constant. We will see shortly why this can be used to prove things like the strong law of large numbers.

Proof. The condition sup_{n∈N} E(X_{−n}) < ∞ and the super-martingale property imply that the sequence E(X_{−n}) is increasing and bounded, and so the limit lim_{n↑∞} E(X_{−n}) exists and is finite. Thus, for any ε > 0, there is k ∈ N such that

0 ≤ E(X_{−n}) − E(X_{−k}) ≤ ε/2,   (4.2.24)

for all n ≥ k. Now, for such n, k, and λ > 0,

E(|X_{−n}|1_{|X_{−n}|>λ}) = −E(X_{−n}1_{X_{−n}<−λ}) + E(X_{−n}) − E(X_{−n}1_{X_{−n}≤λ})
  ≤ −E(X_{−k}1_{X_{−n}<−λ}) + E(X_{−n}) − E(X_{−k}1_{X_{−n}≤λ}),   (4.2.25)

where we used the super-martingale property to replace n by k. Next we can replace E(X_{−n}) by E(X_{−k}) with an error of at most ε/2:

E(|X_{−n}|1_{|X_{−n}|>λ}) ≤ −E(X_{−k}1_{X_{−n}<−λ}) + E(X_{−k}) − E(X_{−k}1_{X_{−n}≤λ}) + ε/2
  = −E(X_{−k}1_{X_{−n}<−λ}) + E(X_{−k}1_{X_{−n}>λ}) + ε/2
  ≤ E(|X_{−k}|1_{|X_{−n}|>λ}) + ε/2.   (4.2.26)

Since X_{−k} is absolutely integrable, there exists δ > 0 such that for all F,

P(F) < δ ⟹ E(|X_{−k}|1_F) < ε/2.   (4.2.27)

But P(|X_{−n}| > λ) ≤ λ^{−1} E(|X_{−n}|). To control E(|X_{−n}|), let us set X^− ≡ max(−X, 0), and write

E(|X_{−n}|) = E(X_{−n}) + 2E(X^−_{−n}).   (4.2.28)

But X^− is a sub-martingale, and so

E(|X_{−n}|) ≤ sup_{n∈N} E(X_{−n}) + 2E(X^−_{−1}).   (4.2.29)

Thus we can choose K < ∞ such that

P(|X_{−n}| > K) ≤ δ, if n ≥ k,   (4.2.30)
E(|X_{−j}|1_{|X_{−j}|>K}) < ε, if j < k

(for the second we just use the integrability for the finitely many values of j; for the first we use the uniform bound (4.2.29)). Then the first inequality implies that E(|X_{−n}|1_{|X_{−n}|>K}) ≤ ε for n ≥ k, via (4.2.26) and the implication (4.2.27). This proves the uniform integrability. But uniform integrability trivially implies boundedness in L^1, and then the upcrossing lemma implies a.s. convergence. By uniform integrability, convergence in L^1 follows. Equation (4.2.23) follows from the convergence of X_{−m} to X_{−∞}. ⊓⊔

As an application we give a new proof of Kolmogorov’s law of large numbers.

Theorem 4.18 (Kolmogorov's law of large numbers). Let X_n, n ∈ N, be iid random variables with E(|X_n|) < ∞. Let µ = E(X_n). Set S_n ≡ Σ_{i=1}^n X_i. Then

n^{−1} S_n → µ,   (4.2.31)

a.s. and in L^1.

Proof. Define G_{−n} = σ(S_n, S_{n+1}, . . . ). Then, for n ≥ 1,

E(X_1|G_{−n}) = E(X_2|G_{−n}) = · · · = E(X_n|G_{−n}).   (4.2.32)

The reason for these equalities is simply that knowing something about the sums S_n, S_{n+1}, etc. affects the expectations of the X_k, k ≤ n, all in the same way: we could simply re-label the first indices without changing anything. Then, by linearity,

E(X1 |G−n ) = (n − 1)−1 E(S n−1 |G−n ) = n−1 E(S n |G−n ) = n−1 S n , a.s. (4.2.33)

where we used the fact that S n is G−n measurable. Thus, L−n ≡ n−1 S n is a mar-
tingale with respect to the filtration {G−n , n ∈ N}. Thus, by the preceding theorem
L ≡ limn→∞ L−n exists a.s. and in L1 .

But clearly we also have, for any finite k, that L = limn→∞ n−1 (Xk+1 + · · · + Xn+k ),
which means that L is measurable with respect to Tk , for any k. Now Kolmogorov’s
zero-one law implies that, for any c, P(L ≤ c) ∈ {0, 1}. Since as a function of c this is
monotone and right-continuous, there must be exactly one c0 , such that P(L ≤ c) = 1
for all c ≥ c_0 and P(L ≤ c) = 0 for all c < c_0, and hence P(L = c_0) = 1. Then E(L) = c_0. But E(L_{−n}) = µ for all n, so c_0 = µ. ⊓⊔

The proof above shows some of the power of martingales!

4.3 Inequalities

In this section we derive some fundamental inequalities for martingales. One of the
most useful ones is the following maximum inequality.

Theorem 4.19 (Sub-martingale maximum inequality). Let Z be a non-negative sub-martingale. Then, for c > 0 and n ∈ N,

c P(max_{k≤n} Z_k ≥ c) ≤ E(Z_n 1_{max_{k≤n} Z_k ≥ c}) ≤ E(Z_n).   (4.3.1)

Remark. You may recall a similar result for sums of iid random variables as Kol-
mogorov’s inequality. The estimate is extremely powerful, since it gives the same
estimate for the probability of the maximum to exceed c as Chebyshev’s inequality
would give for just the endpoint!

Proof. Define the sequence of disjoint events F_0 ≡ {Z_0 ≥ c},

F_k ≡ ∩_{ℓ<k} {Z_ℓ < c} ∩ {Z_k ≥ c} = {ω : min(ℓ ∈ N_0 : Z_ℓ ≥ c) = k}, k ∈ N,   (4.3.2)

and set
F ≡ {max_{k≤n} Z_k ≥ c} = ∪_{k=0}^n F_k.   (4.3.3)

Clearly, the events Fk ∈ Fk are disjoint. Moreover, on Fk we know that Zk ≥ c. Thus

E(Zn 1Fk ) ≥ E(Zk 1Fk ) ≥ cP(Fk ) (4.3.4)

for all k ≤ n. Here the first inequality used of course the sub-martingale property of
Z. Thus
E(Z_n 1_F) = Σ_{k=0}^n E(Z_n 1_{F_k}) ≥ c Σ_{k=0}^n P(F_k) = c P(F).   (4.3.5)

This implies the assertion of the theorem. ⊓⊔

This implies the following corollary.



Corollary 4.20. Let M be a (sub-)martingale and f : R → [0, ∞) a non-negative, convex, and non-decreasing function such that E(f(M_n)) < ∞ for all n. Then, for any c > 0,

P(sup_{k≤n} M_k > c) ≤ E(f(M_n)) / f(c).   (4.3.6)

Proof. Note that if M_n is a (sub-)martingale and f an increasing convex function such that E(f(M_n)) < ∞, then f(M_n) is a sub-martingale. Namely, Jensen's inequality also holds for conditional expectations, and so E(f(M_{n+1})|F_n) ≥ f(E(M_{n+1}|F_n)) ≥ f(M_n), a.s. Since f is non-decreasing, P(max_{k≤n} M_k > c) ≤ P(max_{k≤n} f(M_k) ≥ f(c)). Using Theorem 4.19 for the non-negative sub-martingale f(M_n) yields the assertion of the corollary. ⊓⊔

This allows us to obtain many useful inequalities from the one of Theorem 4.19! In particular, Kolmogorov's inequality follows by choosing f(x) = x². Another useful choice is the exponential function, f(x) = exp(λx), for λ > 0.
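For instance, applying the maximal inequality to the non-negative sub-martingale M², where M is a simple random walk (our Monte Carlo illustration, not part of the text), gives the Kolmogorov-type bound P(max_{k≤n} |M_k| ≥ c) ≤ E(M_n²)/c² = n/c²; a simulation stays comfortably below this bound:

```python
import random

random.seed(3)
n, c, trials = 100, 25, 2000
exceed = 0
for _ in range(trials):
    m, peak = 0, 0
    for _ in range(n):
        m += random.choice([-1, 1])
        peak = max(peak, abs(m))
    exceed += peak >= c

p_hat = exceed / trials
bound = n / c ** 2       # E(M_n^2)/c^2 = n/c^2 for the simple random walk
assert p_hat <= bound
```

Note that the bound controls the probability for the running maximum at the same cost Chebyshev's inequality would charge for the endpoint alone.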
Our next target is Doob’s L p inequality. The next lemma is a first step in this
direction.

Lemma 4.21. Let X and Y be non-negative random variables such that, for all c > 0,

c P(X ≥ c) ≤ E(Y 1_{X≥c}).   (4.3.7)

Then, for p > 1 and q^{−1} = 1 − p^{−1},

‖X‖_p ≤ q‖Y‖_p.   (4.3.8)

Proof. By our hypothesis, it holds that

L ≡ ∫_0^∞ p c^{p−1} P(X ≥ c) dc ≤ ∫_0^∞ p c^{p−2} E(Y 1_{X≥c}) dc ≡ R.   (4.3.9)

Using Fubini's theorem for non-negative integrands, we can write

L = ∫_0^∞ p c^{p−1} (∫_Ω 1_{X(ω)≥c} P(dω)) dc = ∫_Ω (∫_0^{X(ω)} p c^{p−1} dc) P(dω) = ∫_Ω X(ω)^p P(dω) = E(X^p).

Starting from the right-hand side, we can perform the same calculation, and derive that

R = q E(X^{p−1} Y) ≤ q‖Y‖_p ‖X^{p−1}‖_q,   (4.3.10)

where the second step is just Hölder's inequality. Then

E(X^p) ≤ q‖Y‖_p ‖X^{p−1}‖_q.   (4.3.11)

Assume that ‖X‖_p is finite. Clearly, (p − 1)q = p, and so

‖X^{p−1}‖_q = (E(X^{q(p−1)}))^{1/q} = (E(X^p))^{1/q}.   (4.3.12)

Therefore (4.3.11) reads

‖X‖_p^p ≤ q‖Y‖_p ‖X‖_p^{p/q},   (4.3.13)

or ‖X‖_p ≤ q‖Y‖_p, as claimed. If ‖X‖_p = ∞, one derives the inequality first for X ∧ n, and then uses monotone convergence. This proves the lemma. ⊓⊔
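The layer-cake computation in the proof, E(X^p) = ∫_0^∞ p c^{p−1} P(X ≥ c) dc, is easy to verify numerically; here is a small check (our illustration) for X uniform on {1, 2, 3} and p = 2:

```python
values = [1.0, 2.0, 3.0]        # X uniform on {1, 2, 3}
p = 2.0

def tail(c):                    # P(X >= c)
    return sum(v >= c for v in values) / len(values)

# numerical layer-cake integral over [0, max(X)]; the integrand vanishes beyond
steps, top = 30000, 3.0
dc = top / steps
layer_cake = sum(p * ((i + 0.5) * dc) ** (p - 1) * tail((i + 0.5) * dc) * dc
                 for i in range(steps))
moment = sum(v ** p for v in values) / len(values)   # E(X^p) = 14/3 directly

assert abs(layer_cake - moment) < 1e-2
```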

We can now formulate Doob’s L p -inequality.

Theorem 4.22 (Doob's L^p-inequality). Let p > 1 and q^{−1} = 1 − p^{−1}. Let Z be a non-negative sub-martingale bounded in L^p, and define

Z* ≡ sup_{k∈N_0} Z_k.   (4.3.14)

Then Z* ∈ L^p, and
‖Z*‖_p ≤ q sup_{n∈N_0} ‖Z_n‖_p.   (4.3.15)

The limit Z_∞ ≡ lim_{n→∞} Z_n exists a.s. and in L^p, and

‖Z_∞‖_p = sup_{n∈N_0} ‖Z_n‖_p = lim_{n→∞} ‖Z_n‖_p.   (4.3.16)

If Z is of the form Z = |M|, where M is a martingale bounded in L^p, then M_∞ ≡ lim_{n→∞} M_n exists a.s. and in L^p, and Z_∞ = |M_∞|, a.s.

Proof. Define Z_n* ≡ sup_{k≤n} Z_k. Theorem 4.19 implies that the random variables Z_n* and Z_n satisfy the hypothesis on X and Y in Lemma 4.21. Therefore,

‖Z_n*‖_p ≤ q‖Z_n‖_p ≤ q sup_{k∈N_0} ‖Z_k‖_p.   (4.3.17)

Since ‖Z_n*‖_p is increasing in n, the monotone convergence theorem implies (4.3.15). Furthermore, −Z is a super-martingale bounded in L^p, and hence in L^1. It follows that Z_∞ exists a.s. But

|Z_∞ − Z_n|^p ≤ (2Z*)^p ∈ L^1,   (4.3.18)

so that, by Lebesgue's dominated convergence theorem, E(|Z_∞ − Z_n|^p) → 0, i.e. Z_n → Z_∞ in L^p.
The last assertion in (4.3.16) follows since, by Jensen's inequality and the sub-martingale property,

E(Z_n^p) = E(E(Z_n^p|F_{n−1})) ≥ E((E(Z_n|F_{n−1}))^p) ≥ E(Z_{n−1}^p),   (4.3.19)

and so ‖Z_n‖_p is a non-decreasing sequence. The remaining assertions are straightforward. ⊓⊔
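A quick Monte Carlo check of (4.3.15) (our illustration): take Z = |M| with M a simple random walk and p = q = 2, so the claim is E(max_{k≤n} M_k²) ≤ 4 E(M_n²):

```python
import random

random.seed(4)
n, trials, p = 200, 1000, 2
sum_star = sum_end = 0.0
for _ in range(trials):
    m, star = 0, 0
    for _ in range(n):
        m += random.choice([-1, 1])
        star = max(star, abs(m))   # running maximum Z*_n
    sum_star += star ** p
    sum_end += abs(m) ** p

lhs = (sum_star / trials) ** (1 / p)     # estimate of ||Z*||_p
rhs = 2 * (sum_end / trials) ** (1 / p)  # q ||Z_n||_p with q = p/(p-1) = 2
assert lhs <= rhs
```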

4.4 Doob decomposition

One of the games when dealing with stochastic processes is to “extract the martin-
gale part”. There are several such decompositions, but the following Doob decom-
position is very important and its continuous time analogue will be fundamental for
the theory of stochastic integration.
Theorem 4.23 (Doob decomposition).
(i) Let {X_n, n ∈ N_0} be an adapted process on a filtered space (Ω, F, P, {F_n, n ∈ N_0}) with X_n ∈ L^1 for all n. Then X can be written in the form²

X = X_0 + M + A,   (4.4.1)

where M is a martingale with M_0 = 0 and A is a previsible process with A_0 = 0. This decomposition is unique modulo indistinguishability, i.e. if X = X_0 + M′ + A′ for some other M′, A′, then

P(M_n = M′_n, A_n = A′_n, ∀n ∈ N) = 1.   (4.4.2)

(ii) The process X is a sub-martingale if and only if A is an increasing process, in the sense that

P(A_n ≤ A_{n+1}, ∀n ∈ N) = 1.   (4.4.3)
Proof. The proof is unsurprisingly very easy. All we need to do is to derive explicit formulae for M and A. Assume first that a decomposition of the claimed form exists. Then

E((X_n − X_{n−1})|F_{n−1}) = E((M_n − M_{n−1})|F_{n−1}) + E((A_n − A_{n−1})|F_{n−1})
  = 0 + A_n − A_{n−1}   (4.4.4)

by the martingale and previsibility properties. Therefore

A_n = Σ_{k=1}^n E((X_k − X_{k−1})|F_{k−1}), a.s.   (4.4.5)

So now just define A_n by (4.4.5), and M_n by M_n ≡ X_n − X_0 − A_n. Clearly M is then a martingale, and A is by construction previsible. To see uniqueness, note that M_n − M′_n = A′_n − A_n; applying the conditional expectation with respect to F_{n−1}, we get M_{n−1} − M′_{n−1} = A′_n − A_n a.s. Then, from M_0 = M′_0 = 0, it follows that A′_1 = A_1 a.s., from which M_1 = M′_1 a.s., and so on. This ends the proof of (i). The assertion of (ii) is obvious from (4.4.4). ⊓⊔
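As a concrete example (our illustration), let S_n be a simple random walk and X_n = S_n², a sub-martingale. Here (4.4.5) gives A_n = n (previsible and increasing) and M_n = S_n² − n. The martingale property of M can be checked exactly by enumerating all paths:

```python
from itertools import product

n = 10
# E(M_n | F_{n-1}) = M_{n-1} for M_k = S_k^2 - k: check exactly on every prefix
for prefix in product([-1, 1], repeat=n - 1):
    s = sum(prefix)
    avg_next = (((s + 1) ** 2 - n) + ((s - 1) ** 2 - n)) / 2
    assert avg_next == s * s - (n - 1)

# and E(M_n) = E(M_0) = 0, exactly, averaging over all 2^n equally likely paths
total = sum(sum(steps) ** 2 - n for steps in product([-1, 1], repeat=n))
mean_Mn = total / 2 ** n
assert mean_Mn == 0.0
```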
An immediate application of the decomposition theorem is a maximum inequal-
ity without positivity assumption.
2 To make sure that there is no confusion about notation: the following equation is to be understood
in the sense that X0 = X0 , and for n ≥ 1, Xn = X0 + Mn + An .

Lemma 4.24. If X is either a sub-martingale or a super-martingale, then, for n ∈ N and c > 0,

c P(sup_{k≤n} |X_k| ≥ 3c) ≤ 4E(|X_0|) + 3E(|X_n|).   (4.4.6)
Proof. We consider the case when X is a sub-martingale; the case of the super-martingale is identical upon passing to −X. Then there is a Doob decomposition

X = X_0 + M + A   (4.4.7)

with A an increasing process. Thus

sup_{k≤n} |X_k| ≤ |X_0| + sup_{k≤n} |M_k| + sup_{k≤n} A_k = |X_0| + sup_{k≤n} |M_k| + A_n.   (4.4.8)

Note that |M| is a non-negative sub-martingale, so for the supremum of |M_k| we can use Theorem 4.19. We also use the simple observation that, if x + y + z ≥ 3c, then at least one of x, y, z must be at least c. Thus,

c P(sup_{k≤n} |X_k| ≥ 3c) ≤ c P(|X_0| ≥ c) + c P(sup_{k≤n} |M_k| ≥ c) + c P(A_n ≥ c)
  ≤ E(|X_0|) + E(|M_n|) + E(A_n).   (4.4.9)

Now
E(|M_n|) = E(|X_n − X_0 − A_n|) ≤ E(|X_n|) + E(|X_0|) + E(A_n)   (4.4.10)
and
E(A_n) = E(X_n − X_0 − M_n) = E(X_n − X_0) ≤ E(|X_n|) + E(|X_0|).   (4.4.11)
Inserting these two bounds into (4.4.9) gives the claimed inequality. ⊓⊔
The Doob decomposition gives rise to two important derived processes associated with a martingale M: the bracket, ⟨M⟩, and the square bracket, [M].
Definition 4.25. Let M be a martingale in L² with M_0 = 0. Then M² is a sub-martingale with Doob decomposition

M² = N + ⟨M⟩,   (4.4.12)

where N is a martingale that vanishes at zero and ⟨M⟩ is a previsible process that vanishes at zero. The process ⟨M⟩ is called the bracket of M.

Note that boundedness in L^1 of ⟨M⟩ is equivalent to boundedness in L² of M. From the formulas associated with the Doob decomposition, we derive that

⟨M⟩_n − ⟨M⟩_{n−1} = E((M_n² − M_{n−1}²)|F_{n−1}) = E((M_n − M_{n−1})² | F_{n−1}).   (4.4.13)

Definition 4.26. Let M be as before. We define

[M]_n ≡ Σ_{k=1}^n (M_k − M_{k−1})².   (4.4.14)

Lemma 4.27. If M is as before, then

M² − [M] ≡ V = C • M,   (4.4.15)

where C_n ≡ 2M_{n−1}. V is a martingale. If M is bounded in L², then V is bounded in L^1.

Proof. Exercise! ⊓⊔

4.5 A discrete time Itô formula.

We will now give in some way a justification of the name "discrete stochastic integral" for the martingale transform. We consider a martingale M with M_0 = 0 and a function F : R → R. We want to consider the process F(M_T) and ask whether we can represent F(M_T) − F(M_0) as a "stochastic integral". Since we have called C • M a stochastic integral, we might expect that this formula could simply read F(M_T) = (F′ • M)_T + F(M_0), as in the usual fundamental theorem of calculus, but this will turn out not to be the case in general.
Let us consider the situation when the increments of M_t are getting very small; the idea here is that the spacings between consecutive times are really small. So we introduce a parameter ε > 0 that will later tend to zero, while we think of T = ε^{−1}C. We also assume that E((M_t − M_{t−1})²) = O(ε). To see why this may be reasonable, think of M_t ≡ B_{t/T}, with B Brownian motion, where E((B_{t/T} − B_{(t−1)/T})²) = 1/T = ε. Assuming that F is a smooth function, we can expand F(M_t) in a Taylor series:

F(M_t) = F(M_{t−1}) + (M_t − M_{t−1}) F′(M_{t−1})   (4.5.1)
  + (1/2) F′′(M_{t−1})(M_t − M_{t−1})² + O((M_t − M_{t−1})³),

where we assume that
E[O((M_t − M_{t−1})³)] ≤ K ε^{3/2},   (4.5.2)

and therefore T E[O((M_T − M_{T−1})³)] ≤ K ε^{1/2} ↓ 0, so that as ε ↓ 0 these error terms will be negligible. Now we may iterate this procedure to obtain

F(M_T) = F(M_0) + Σ_{t=1}^T F′(M_{t−1})(M_t − M_{t−1})   (4.5.3)
  + (1/2) Σ_{t=1}^T F′′(M_{t−1})(M_t − M_{t−1})² + O(ε^{1/2}).

This expression looks almost like the Doob decomposition of the process F(M_t),
except that the last term is not exactly predictable. In fact, from the Doob decomposition, we would instead expect a predictable term of the form

Σ_{t=1}^T F′′(M_{t−1}) E[(M_t − M_{t−1})² | F_{t−1}].   (4.5.4)

However, under reasonable assumptions (on F and on the behavior of the increments of the martingale M), the martingale

∆_T ≡ Σ_{t=1}^T F′′(M_{t−1}) ((M_t − M_{t−1})² − E[(M_t − M_{t−1})² | F_{t−1}])   (4.5.5)

satisfies E(∆_T²) = O(ε), and is therefore negligible in our approximation. This implies the discrete version of Itô's formula:

F(M_T) = F(M_0) + Σ_{t=1}^T F′(M_{t−1})(M_t − M_{t−1})   (4.5.6)
  + (1/2) Σ_{t=1}^T F′′(M_{t−1}) E[(M_t − M_{t−1})² | F_{t−1}] + O(ε^{1/2}).
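The discrete Itô formula is easy to test numerically. In the sketch below (ours), the martingale has increments ±√ε, so E((M_t − M_{t−1})² | F_{t−1}) = ε deterministically, and the only discrepancy in (4.5.6) is the Taylor remainder of order ε^{1/2}:

```python
import math, random

random.seed(6)
eps = 1e-4
T = int(1 / eps)              # T = eps^{-1} steps of size sqrt(eps)
F = math.cos
dF = lambda x: -math.sin(x)
d2F = lambda x: -math.cos(x)

m = 0.0
rhs = F(0.0)                  # running right-hand side of the discrete Ito formula
for _ in range(T):
    dm = random.choice([-1.0, 1.0]) * math.sqrt(eps)
    rhs += dF(m) * dm + 0.5 * d2F(m) * eps   # E((dM)^2 | F_{t-1}) = eps here
    m += dm

gap = abs(F(m) - rhs)
assert gap < 0.01             # remainder is O(eps^{1/2}) = O(0.01)
```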

4.6 Central limit theorem for martingales

One important further result for martingales concerns central limit theorems. There
are various different formulations of such theorems. We will present one which em-
phasises the rôle of the bracket.

Theorem 4.28 (Central limit theorem). Let M be a martingale with M_0 = 0. Set s_n² ≡ Σ_{i=1}^n E((M_i − M_{i−1})²) = E([M]_n). Assume that, as n → ∞,

s_n^{−2} max_{k≤n} E((M_k − M_{k−1})²) ↓ 0, and, for all ε > 0,
s_n^{−2} Σ_{k=1}^n E((M_k − M_{k−1})² 1_{|M_k − M_{k−1}| > ε s_n} | F_{k−1}) ↓ 0, a.s.   (4.6.1)

If moreover ⟨M⟩_n/s_n² → 1 in probability, then

s_n^{−1} M_n → N(0, 1)   (4.6.2)

in distribution.

Remark. Condition (4.6.1) is called the conditional Lindeberg condition. In the case when M_n = S_n = Σ_{i=1}^n X_i with independent centered random variables X_i, (4.6.1) reduces to the usual Lindeberg condition

s_n^{−2} Σ_{k=1}^n E(X_k² 1_{|X_k| > ε s_n}) ↓ 0.   (4.6.3)

Moreover, in this case E([M]_n) = ⟨M⟩_n, and so the condition ⟨M⟩_n/s_n² → 1 is trivially verified (it is equal to one for all n). Thus the above theorem implies the usual CLT for sums of independent random variables under the weakest possible conditions.
Interestingly, the conditions for the CLT for the martingale include a law of large numbers for the bracket of the martingale. This is worth keeping in mind.

Proof. To simplify notation we set (for n fixed) M̃_k ≡ M_k/s_n. (We should really write M̃_k^n to indicate that this quantity depends explicitly on n; the same is true for all other objects carrying a tilde.) Then the assumptions of the theorem read:

max_{k≤n} E((M̃_k − M̃_{k−1})²) → 0,
Σ_{k=1}^n E((M̃_k − M̃_{k−1})² 1_{|M̃_k − M̃_{k−1}| > ε} | F_{k−1}) → 0,   (4.6.4)
⟨M̃⟩_n → 1, in probability,

as n → ∞. We have to prove that M̃_n → N(0, 1). This holds if and only if, for all u ∈ R,

lim_{n→∞} E(e^{iuM̃_n}) = e^{−u²/2}.   (4.6.5)

Let us set X̃_k ≡ M̃_k − M̃_{k−1}. Then, it holds that

⟨M̃⟩_n = Σ_{k=1}^n E(X̃_k² | F_{k−1}) = ⟨M̃⟩_{n−1} + E(X̃_n² | F_{n−1}),
M̃_n = Σ_{k=1}^n X̃_k = M̃_{n−1} + X̃_n.   (4.6.6)

Things are a little tricky, and the following decomposition is quite helpful:

|E(e^{iuM̃_n}) − e^{−u²/2}|
  = |E(e^{iuM̃_n} (1 − e^{u²⟨M̃⟩_n/2} e^{−u²/2})) + e^{−u²/2} E(e^{iuM̃_n} e^{u²⟨M̃⟩_n/2} − 1)|   (4.6.7)
  ≤ E|1 − e^{u²⟨M̃⟩_n/2} e^{−u²/2}| + |E(e^{iuM̃_n} e^{u²⟨M̃⟩_n/2} − 1)|
  ≤ E|1 − e^{u²⟨M̃⟩_n/2} e^{−u²/2}| + Σ_{k=1}^n |E(e^{iuM̃_k} e^{u²⟨M̃⟩_k/2} − e^{iuM̃_{k−1}} e^{u²⟨M̃⟩_{k−1}/2})|.

Now we show that the result holds under the assumption

⟨M̃⟩_n ≤ C   (4.6.8)

for some finite constant C. In a second step we will show how to remove this assumption. First, notice that the assumption that ⟨M̃⟩_n → 1 in probability implies that

E|1 − e^{u²⟨M̃⟩_n/2} e^{−u²/2}| → 0, as n → ∞.   (4.6.9)

Thus we need to deal with the second term in (4.6.7). Using (4.6.6), we get

E(e^{iuM̃_k} e^{u²⟨M̃⟩_k/2} − e^{iuM̃_{k−1}} e^{u²⟨M̃⟩_{k−1}/2})
  = E(e^{iuM̃_{k−1}} e^{u²⟨M̃⟩_{k−1}/2} (e^{iuX̃_k + (u²/2)E(X̃_k²|F_{k−1})} − 1))   (4.6.10)
  = E(e^{iuM̃_{k−1}} e^{u²⟨M̃⟩_{k−1}/2} E(e^{iuX̃_k + (u²/2)E(X̃_k²|F_{k−1})} − 1 | F_{k−1})).

This implies that

|E(e^{iuM̃_k} e^{u²⟨M̃⟩_k/2} − e^{iuM̃_{k−1}} e^{u²⟨M̃⟩_{k−1}/2})|   (4.6.11)
  ≤ e^{Cu²/2} E|E(e^{iuX̃_k + (u²/2)E(X̃_k²|F_{k−1})} − 1 | F_{k−1})|.

To simplify the notation, set σ_k² ≡ E(X̃_k²|F_{k−1}). To bound E(e^{iuX̃_k + (u²/2)σ_k²} − 1 | F_{k−1}), we use the following elementary estimates:

e^{ix} = 1 + ix − x²/2 + R₁(x), with |R₁(x)| ≤ min(x², |x|³),   (4.6.12)
e^{x²/2} = 1 + x²/2 + R₂(x), with |R₂(x)| ≤ x⁴ e^{x²/2}.   (4.6.13)

With this we get

E(e^{iuX̃_k + (u²/2)σ_k²} − 1 | F_{k−1})   (4.6.14)
  = E([1 + iuX̃_k − (u²/2)X̃_k² + R₁(uX̃_k)][1 + (u²/2)σ_k² + R₂(uσ_k)] − 1 | F_{k−1}).

Since σ_k is F_{k−1}-measurable, the second bracket can be taken out of the conditional expectation. Also, E(X̃_k|F_{k−1}) = 0, since M̃ is a martingale, and E(X̃_k²|F_{k−1}) = σ_k², so that

E(e^{iuX̃_k + (u²/2)σ_k²} − 1 | F_{k−1}) = (1 + (u²/2)σ_k² + R₂(uσ_k)) E(R₁(uX̃_k) | F_{k−1})
  + (1 − (u²/2)σ_k²) R₂(uσ_k) − (u⁴/4)σ_k⁴.   (4.6.15)

We use the following bounds:

(i) ⟨M̃⟩_n = Σ_{k=1}^n σ_k² ≤ C. In particular, σ_k² is both bounded and summable.
(ii) σ_k² = E(X̃_k²|F_{k−1}) ≤ ε² + E(X̃_k² 1_{|X̃_k|>ε} | F_{k−1}). This is nice, because the second term is controlled by the Lindeberg condition.
(iii) |E(R₁(uX̃_k)|F_{k−1})| ≤ |u|³ ε σ_k² + u² E(X̃_k² 1_{|X̃_k|>ε} | F_{k−1}). This holds by computing the conditional expectation given F_{k−1} of both sides of the inequality

min{u²X̃_k², |u|³|X̃_k|³} = min{u²X̃_k², |u|³|X̃_k|³} (1_{|X̃_k|≤ε} + 1_{|X̃_k|>ε})
  ≤ |u|³|X̃_k|³ 1_{|X̃_k|≤ε} + u²X̃_k² 1_{|X̃_k|>ε}
  ≤ |u|³ ε X̃_k² + u² X̃_k² 1_{|X̃_k|>ε}.   (4.6.16)

(iv) |R₂(uσ_k)| ≤ e^{u²C/2} u⁴ σ_k⁴ ≤ e^{u²C/2} u⁴ C².
Using these estimates, we get

|(4.6.15)| ≤ (1 + Cu²/2 + e^{u²C/2} u⁴C²)(|u|³ ε σ_k² + u² E(X̃_k² 1_{|X̃_k|>ε}|F_{k−1}))
  + (1 + Cu²/2) e^{u²C/2} u⁴ σ_k² (ε² + C E(X̃_k² 1_{|X̃_k|>ε}|F_{k−1}))
  + (u⁴/4) σ_k² (ε² + C E(X̃_k² 1_{|X̃_k|>ε}|F_{k−1}))
  ≤ K(u) (ε σ_k² + E(X̃_k² 1_{|X̃_k|>ε}|F_{k−1})),   (4.6.17)

for some constant K(u) < ∞ (say, for ε ≤ 1). But

Σ_{k=1}^n (ε σ_k² + E(X̃_k² 1_{|X̃_k|>ε}|F_{k−1})) ≤ Cε + Σ_{k=1}^n E(X̃_k² 1_{|X̃_k|>ε}|F_{k−1}),   (4.6.18)

where by the Lindeberg condition the second term tends to zero for any ε > 0. Thus the limit as n ↑ ∞ of the second term in (4.6.7) is bounded by a constant times ε, for any ε > 0, that is, it is equal to zero, as desired. This proves the CLT under the assumption (4.6.8).
To conclude, let us show that we can remove Assumption (4.6.8). Define, for n fixed and m ≤ n,

A_m ≡ {ω ∈ Ω : ⟨M̃⟩_m ≡ Σ_{k=1}^m E(X̃_k² | F_{k−1}) ≤ C}.   (4.6.19)

(To be more formal, we should write A_m^n, as this set is different for different choices of n; remember that all objects carrying a tilde are normalised by s_n.) Of course, for m ≤ n, A_n ⊂ A_m, and so P(A_n) ≤ P(A_m). Moreover, by assumption ⟨M⟩_n/s_n² → 1 in probability, and so (for C > 1) lim_{n→∞} P(A_n) = 1. Notice that Σ_{k=1}^m E(X̃_k² | F_{k−1}) is F_{m−1}-measurable, and hence so is 1_{A_m}. Thus, if we set Z_m ≡ X̃_m 1_{A_m}, it holds that E(Z_m|F_{m−1}) = 0, for all m ≤ n. Therefore the variables {Z_m, m ≤ n}, for fixed n, form a martingale difference sequence. Since |Z_m| ≤ |X̃_m|, all the properties used in the calculations above carry over to the Z_m. Therefore, repeating the calculations above with M̃_n replaced by M̂_n ≡ Σ_{m=1}^n Z_m, we find that

lim_{n→∞} E(e^{iuM̂_n}) = e^{−u²/2}.   (4.6.20)

Since on A_m it holds that M̂_m = M̃_m, and since A_n ⊂ A_m, it is true that on A_n we have M̂_n = M̃_n. Therefore,

lim_{n→∞} E(e^{iuM̃_n}) = lim_{n→∞} E(e^{iuM̃_n} 1_{A_n}) + lim_{n→∞} E(e^{iuM̃_n} 1_{A_n^c})   (4.6.21)
  = lim_{n→∞} E(e^{iuM̂_n} 1_{A_n}) + 0
  = lim_{n→∞} E(e^{iuM̂_n}) − lim_{n→∞} E(e^{iuM̂_n} 1_{A_n^c})
  = e^{−u²/2},

where we used that P(A_n^c) → 0. This concludes the proof of the theorem. ⊓⊔
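A simulation illustrating the theorem (ours): take increments X_k = η_k · sign(M_{k−1}) with η_k i.i.d. ±1. The sign is previsible, so M is a martingale with E(X_k² | F_{k−1}) = 1, hence ⟨M⟩_n = n = s_n² and the Lindeberg condition holds trivially; M_n/√n should then be approximately standard normal:

```python
import random
import statistics as st

random.seed(7)
n, trials = 400, 3000
finals = []
for _ in range(trials):
    m = 0
    for _ in range(n):
        eta = random.choice([-1, 1])
        m += eta * (1 if m >= 0 else -1)   # previsible sign: a function of the past
    finals.append(m / n ** 0.5)            # s_n = sqrt(n) here

mean, var = st.mean(finals), st.variance(finals)
assert abs(mean) < 0.1 and abs(var - 1.0) < 0.15
```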

Very similar computations to those presented above play an important rôle in what is called the concentration of measure phenomenon. Without going into too many details, let me briefly describe this. The setting one is considering is the following. We have n independent, identically distributed random variables, X₁, . . . , X_n, assumed to have mean zero, variance one, and to satisfy, e.g., E(e^{uX_i}) < ∞, for all u ∈ R. Let f : R^n → R be a differentiable function that satisfies

max_{k=1,...,n} sup_{x∈R^n} |∂f/∂x_k (x₁, . . . , x_n)| ≤ 1.   (4.6.22)

Set F ≡ f(X₁, . . . , X_n). Then one can show that, for some constant C > 0,

P(|F − E(F)| > ρn) ≤ 2e^{−nρ²/(2C)}.   (4.6.23)

The proof relies on the exponential Markov inequality, which states that

P(F − E(F) > nρ) ≤ inf_{t≥0} e^{−tnρ} E(e^{t(F−E(F))}).   (4.6.24)

The trick is to bound the Laplace transform by

E(e^{t(F−E(F))}) ≤ e^{t²nC/2}   (4.6.25)

(and not, as one might worry, of order exp(n²)!).
To do this, one writes F − E(F) as a martingale difference sequence with respect to the filtration generated by the random variables X_i:

F − E(F) = Σ_{k=1}^n (E(F|F_k) − E(F|F_{k−1})).   (4.6.26)

The computations one has to do are quite similar to those we have performed in the proof of the central limit theorem. There is one small trick that is useful: set F^u ≡ f(X₁, . . . , X_{k−1}, uX_k, X_{k+1}, . . . , X_n). Then

F − F⁰ = ∫₀¹ du (d/du) F^u = ∫₀¹ du X_k (∂f/∂x_k)(X₁, . . . , uX_k, X_{k+1}, . . . , X_n)   (4.6.27)

and

E(F|F_k) − E(F|F_{k−1}) = ∫₀¹ du (E[(d/du) F^u | F_k] − E[(d/du) F^u | F_{k−1}])
  ≡ E(Z_k|F_k) − E(Z_k|F_{k−1}),   (4.6.28)

where |Z_k| ≤ |X_k|. Hence

E(e^{λ(E(F|F_k)−E(F|F_{k−1}))} − 1 − λ(E(F|F_k) − E(F|F_{k−1})) | F_{k−1})
  ≤ λ² E((E(F|F_k) − E(F|F_{k−1}))² e^{λ|E(Z_k|F_k)−E(Z_k|F_{k−1})|} | F_{k−1})
  ≤ λ²C   (4.6.29)

by the assumption on the law of X_k. We leave the remaining details of the calculation as an exercise. For more on concentration of measure, see e.g. [9].

4.7 Stopping times, optional stopping

In a stochastic process we often want to consider random times that are determined by the occurrence of a particular event. If this event depends only on what happens "in the past", we call such a random time a stopping time. Stopping times are nice, since we can determine their occurrence as we observe the process; if we are only interested in them, we can stop the process at that moment, hence the name.

Definition 4.29. A map T : Ω → N_0 ∪ {+∞} is called a stopping time (with respect to a filtration {F_n, n ∈ N_0}), if, for all n ∈ N_0 ∪ {+∞},

{T = n} ∈ F_n.   (4.7.1)

Example. The most important examples of stopping times are hitting times. Let X be an adapted process, and let B ∈ B. Define

τ_B ≡ inf{t > 0 : X_t ∈ B}.   (4.7.2)

Then τ_B is a stopping time. To see this, note that, if n ∈ N,

{τ_B = n} = {ω : X_n(ω) ∈ B, X_k(ω) ∉ B, ∀ 0 < k < n}.   (4.7.3)

This event is manifestly in F_n. Finally, {τ_B = ∞} = {X_n ∉ B, ∀n ∈ N} ∈ F_∞.


In principle, all stopping times can be realised as first hitting times of some process. To do so, define

X_{[T,∞)}(n, ω) = 1, if n ≥ T(ω), and 0 otherwise.   (4.7.4)

This process is adapted, and T is the first time it hits the value 1, i.e. T = τ_{{1}}.


It is sometimes very convenient to have the notion of a σ-algebra of events that take place before a stopping time.

Definition 4.30. The pre-T-σ-algebra, F_T, is the set of events F ∈ F such that, for all n ∈ N_0 ∪ {+∞},

F ∩ {T ≤ n} ∈ F_n.   (4.7.5)

Pre-T-σ-algebras will play an important rôle in the formulation of the strong Markov property.
There are some useful elementary facts associated with this concept.

Lemma 4.31. Let S, T be stopping times. Then:

(i) If X is an adapted process, then X_T is F_T-measurable.
(ii) If S ≤ T, then F_S ⊂ F_T.
(iii) F_{T∧S} = F_T ∩ F_S.
(iv) If F ∈ F_{S∨T}, then F ∩ {S ≤ T} ∈ F_T.
(v) F_{S∨T} = σ(F_T, F_S).

Proof. Exercise. ⊓⊔

We now return to our gambling model. We consider a super-martingale X and we
want to play a strategy, C, that depends on a stopping time, T : say, we keep one unit
of stock until the random time T . Then

Cn ≡ CnT ≡ 1n≤T . (4.7.6)

Note that C T is a previsible process. Namely,

{CnT = 0} = {T ≤ n − 1} ∈ Fn−1 , (4.7.7)

and since CnT only takes the two values 0, 1, this suffices to show that CnT is Fn−1 -measurable. The wealth process associated to this strategy is then

(C T • X)n = XT ∧n − X0 . (4.7.8)

Definition 4.32. We define the stopped process X T via

XnT (ω) ≡ XT (ω)∧n (ω). (4.7.9)


80 4 Martingales

With this definition we have (for our choice of C)

C T • X = X T − X0 . (4.7.10)

Theorem 4.33. (i) If X is a super-martingale and T is a stopping time, then the


stopped process, X T , is a super-martingale. In particular, for all n ∈ N,

E(XT ∧n ) ≤ E(X0 ). (4.7.11)

(ii) If X is a martingale and T is a stopping time, then X T is a martingale. In


particular
E(XT ∧n ) = E(X0 ). (4.7.12)
Proof. It follows directly from Theorem 4.7(i) because C T is positive and bounded.
t
u
This theorem is disappointing news for those who might have hoped to reach
a certain gain by playing until they have won a preset sum of money, and stopping
then. In a martingale setting, the sure gain that will occur if this stopping time is
reached before time n is offset by the expected loss, if the target has not yet been
reached.
Note, however, that the theorem does not assert that E(XT ) ≤ E(X0 ) (see example
below). The following theorem, called Doob’s Optional Stopping Theorem, gives
conditions under which even that holds.
Theorem 4.34 (Doob’s Optional Stopping Theorem).
(i) Let T be a stopping time, and let X be a super-martingale. Then, XT is integrable
and
E(XT ) ≤ E(X0 ), (4.7.13)
if one of the following conditions holds:
(a) T is bounded (i.e. there exists N ∈ N, s.t. T (ω) ≤ N ∀ω ∈ Ω);
(b) X is bounded, and T is a.s. finite;
(c) E(T ) < ∞, and, for some K < ∞,

|Xn (ω) − Xn−1 (ω)| ≤ K, (4.7.14)

for all n ∈ N, ω ∈ Ω.
(ii) If X is a martingale and one of the conditions (a)-(c) holds, then E(XT ) = E(X0 ).
Remark. This theorem may look strange and seem to contradict the “no strategy” idea: take
a simple random walk, S n (i.e. a series of fair games), and define a stopping time
T = inf{n : S n = 10}. Then clearly E(S T ) = 10 ≠ E(S 0 ) = 0! So we conclude,
using (c), that E(T ) = +∞. In fact, the “sure” gain if we achieve our goal is offset
by the fact that on average it takes infinitely long to reach it (of course, most games
will end quickly, but chances are that some may take very, very long!).
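A hedged numerical sketch of this remark (plain Python; horizon, target, and sample size are ad-hoc choices): truncating at a fixed horizon n, Theorem 4.33 applies, so the mean of S_{T∧n} stays near E(S 0 ) = 0, even though every path that reaches the target has the value 10.

```python
import random

def stopped_mean(n_paths=20000, horizon=100, target=10, seed=1):
    """Monte Carlo estimate of E(S_{T ∧ n}) for a fair ±1 walk,
    with T = inf{n : S_n = target}, truncated at the given horizon."""
    random.seed(seed)
    total = 0
    for _ in range(n_paths):
        s = 0
        for _ in range(horizon):
            if s == target:          # T has occurred: the process is stopped
                break
            s += random.choice([-1, 1])
        total += s
    return total / n_paths

m = stopped_mean()
assert abs(m) < 1.0                  # consistent with E(S_{T ∧ n}) = 0
```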

Proof. We already know that E(XT ∧n ) − E(X0 ) ≤ 0 for all n ∈ N. Consider case (a).
Then we know that T ∧ N = T , and so E(XT ) = E(XT ∧N ) ≤ E(X0 ), as claimed.
In case (b), we start from E(XT ∧n ) − E(X0 ) ≤ 0 and let n → ∞. Since T is almost
surely finite, limn→∞ XT ∧n = XT , a.s., and since Xn is uniformly bounded,

lim_{n→∞} E(XT∧n ) = E( lim_{n→∞} XT∧n ) = E(XT ), (4.7.15)

which implies the result.


In the last case, (c), we observe that

|XT∧n − X0 | = |Σ_{k=1}^{T∧n} (Xk − Xk−1 )| ≤ KT, (4.7.16)

and by assumption E(KT ) < ∞. Thus, we can again take the limit n → ∞ and use
Lebesgue’s dominated convergence theorem to justify that the inequality survives.
Finally, to justify (ii), use that if X is a martingale, then both X and −X are super-
martingales. The ensuing two inequalities imply the desired equality. u t

Case (c) in the above theorem is certainly the most frequent situation one may
hope to be in. For this it is good to know how to show that E(T ) < ∞, if that is the
case. The following lemma states that this is always the case, whenever, eventually,
the probability that the event leading to T occurs is reasonably big.

Lemma 4.35. Suppose that T is a stopping time and that there exists N ∈ N and
ε > 0, such that, for all n ∈ N,

P(T ≤ n + N|Fn ) > ε, a.s. (4.7.17)

Then E(T ) < ∞.

Proof. Consider P (T > kN). Clearly we can write

P(T > kN) = E[1_{T >(k−1)N} 1_{T >kN} ] (4.7.18)
 = E[E[1_{T >(k−1)N} 1_{T >kN} |F_{(k−1)N} ]]
 = E[1_{T >(k−1)N} E[1_{T >kN} |F_{(k−1)N} ]]
 ≤ (1 − ε) E[1_{T >(k−1)N} ]
 ≤ (1 − ε)^k ,

by iteration. The exponential decay of the probability implies the finiteness of the
expectation of T immediately. u t

Finally we state Doob's super-martingale inequalities for non-negative super-martingales.

Theorem 4.36. Let X be a non-negative super-martingale and T a stopping time.
Then

E(XT ) ≤ E(X0 ). (4.7.19)


Moreover, for any c > 0,

c P(sup_k Xk > c) ≤ E(X0 ). (4.7.20)

Proof. We know that E(XT ∧n ) ≤ E(X0 ). Using Fatou’s lemma and the fact that Xn →
X∞ , a.s., allows us to show that
 
E(X0 ) ≥ lim inf_n E(XT∧n ) ≥ E(lim inf_n XT∧n ) = E(XT ). (4.7.21)

For (4.7.20), set T = inf{n : Xn > c}. Then, E(X0 ) ≥ E(XT ). But XT > c, provided the
set {n : Xn > c} is not empty, so E(XT ) ≥ cP(supk Xk > c). u t
Chapter 5
Markov processes

We have seen the definition and construction of discrete time Markov chains
already in Chapter 3. Markov chains are among the most important stochastic
processes used to model real-life phenomena that involve disorder. This is because
the construction of these processes is very much adapted to our thinking about such
processes. Moreover, Markov processes can be very easily implemented in numerical
algorithms. This makes it possible to simulate numerically even very complicated
systems. We will always imagine a Markov process as a “particle” moving around in
state space; mind, however, that these “particles” can represent all kinds of very
complicated things, once we allow the state space to be sufficiently general. In this
section, S will always be a complete separable metric space.

5.1 Markov processes with stationary transition probabilities

In general, we call a stochastic process whose index set supports the action of a
group (or semi-group) stationary (with respect to the action of this (semi-)group) if
all finite dimensional distributions are invariant under the simultaneous shift of all
time indices. Specifically, if the index set, I, is R+ or Z, resp. N, then a stochastic
process is stationary if for all ℓ ∈ N, s1 , . . . , sℓ ∈ I, all A1 , . . . , Aℓ ∈ B, and all t ∈ I,

P[X_{s1} ∈ A1 , . . . , X_{sℓ} ∈ Aℓ ] = P[X_{s1 +t} ∈ A1 , . . . , X_{sℓ +t} ∈ Aℓ ]. (5.1.1)

We can express this also as follows: define the shift θ, for any t ∈ I, by (X ◦ θt )s ≡
Xt+s . Then X is stationary if and only if, for all t ∈ I, the processes X and X ◦ θt have
the same finite dimensional distributions.
In the case of Markov processes, a necessary (but not sufficient) condition for
stationarity is the stationarity of the transition kernels. Recall that we have defined
the one-step transition kernel Pt of a Markov process in Section 3.3.


Definition 5.1. A Markov process with discrete time N0 and state space S is said
to have stationary transition probabilities (kernels), if its one step transition kernel,
Pt , is independent of t, i.e. there exists a probability kernel, P, s.t.

Pt (x, A) = P(x, A), (5.1.2)

for all t ∈ N, x ∈ S , and A ∈ B.


Remark. With the notation P_{s,t} for the transition kernel from time s to time t, we
could alternatively state that a Markov process has stationary transition probabilities
(kernels) if there exists a family of transition kernels Pt (x, A), s.t.

P_{s,t} (x, A) = P_{t−s} (x, A), (5.1.3)

for all s < t ∈ N, x ∈ S , and A ∈ B. Note that there is a potential conflict between
this use of Pt for the (t − s)-step kernel and the use of Pt for the one-step kernel at
time t in Definition 5.1; the two should not be confused.
A key concept for Markov chains with stationary transition kernels is the notion
of an invariant distribution.
Definition 5.2. Let P be the transition kernel of a Markov chain with stationary
transition kernels. Then a probability measure, π, on (S , B) is called an invariant
(probability) distribution, if
∫ π(dx)P(x, A) = π(A), (5.1.4)

for all A ∈ B. More generally, a positive, σ-finite measure, π, satisfying (5.1.4), is


called an invariant measure.
Lemma 5.3. A Markov chain with stationary probability kernels and initial distri-
bution P0 = π is a stationary stochastic process, if and only if π is an invariant
probability distribution.
Proof. Exercise. u
t
In the case when the state space, S , is finite, we have seen that there is always at
least one invariant measure, which then can be chosen to be a probability measure. In
the case of general state spaces, while there still will always be an invariant measure
(through a generalisation of the Perron-Frobenius theorem to the operator setting),
there appears a new issue, namely whether there is an invariant measure that is finite,
viz. whether there exists an invariant probability distribution.
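For a finite state space, an invariant probability distribution can be computed numerically. The following sketch (the two-state kernel is an ad-hoc illustration, not from the notes) finds π with πP = π by power iteration and checks it against the closed form (b/(a+b), a/(a+b)) for a two-state chain with jump rates a, b.

```python
def stationary(P, iters=500):
    """Approximate the invariant distribution of a row-stochastic kernel P
    by iterating pi -> pi P, cf. (5.1.4)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# two-state chain: jump 0 -> 1 w.p. a = 0.3, jump 1 -> 0 w.p. b = 0.1
P = [[0.7, 0.3],
     [0.1, 0.9]]
pi = stationary(P)
assert abs(pi[0] - 0.25) < 1e-9 and abs(pi[1] - 0.75) < 1e-9
```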

5.2 The strong Markov property

The setting of Markov processes is well suited to applying the notion of stopping
times introduced in the last section. In fact, one of the very important properties of
Markov processes is the fact that we can split expectations between past and future
also at random times.

Theorem 5.4. Let X be a Markov process with stationary transition kernels. Let
Fn = σ(X0 , . . . , Xn ) be the natural filtration, and let T be a stopping time. Let F and
G be F-measurable functions, and let F in addition be measurable with respect to
the pre-T -σ-algebra FT . Then

E [1T <∞ F G ◦ θT |F0 ] = E [1T <∞ F E′ [G|F′0 ](XT ) | F0 ], (5.2.1)

where E′ and F′0 refer to an independent copy, X′ , of the Markov chain X.

Remark. If this looks fancy, just think of G as a function of the Markov process, i.e.
G = G(X_{i1} , . . . , X_{ik} ), and F = F(XT , XT−1 , . . . , X0 ). Then the statement of the theorem
says that

E[1T <∞ F(XT , XT−1 , . . . , X0 ) G(XT+i1 , . . . , XT+ik ) | F0 ] (5.2.2)
 = E[1T <∞ F(XT , XT−1 , . . . , X0 ) E′ [G(X′_{i1} , . . . , X′_{ik} ) | F′0 ](XT ) | F0 ].

Proof. We have

E [1T <∞ F G ◦ θT |F0 ] = E [E [1T <∞ F G ◦ θT |FT ] |F0 ] (5.2.3)


= E [1T <∞ F E [G ◦ θT |FT ] |F0 ] .
Now E [G ◦ θT |FT ] depends only on XT and by stationarity is equal to E′ [G|F′0 ](XT ),
which yields the claim of the theorem. u t

5.3 Markov processes and martingales

We now want to develop some theory that will be more important and more difficult
in the continuous time case. First we want to see how the transition kernels can be
seen as operators acting on spaces of measures and on spaces of functions, respectively.
If µ is a σ-finite measure on S , and P is a Markov transition kernel, we define
the measure µP as

µP(A) ≡ ∫_S P(x, A) dµ(x), (5.3.1)

and similarly, for the t-step transition kernel, Pt ,

µPt (A) ≡ ∫_S Pt (x, A) dµ(x). (5.3.2)

By the Markov property, we have of course that

µPt (A) = µP^t (A), (5.3.3)

where P^t denotes the t-fold iterate of the one-step kernel P.



The action on measures has of course the following natural interpretation in terms
of the process: if P(X0 ∈ A) = µ(A), then

P(Xt ∈ A) = µPt (A). (5.3.4)

Alternatively, if f is a bounded, measurable function on S , we define

(P f )(x) ≡ ∫_S f (y)P(x, dy), (5.3.5)

and

(Pt f )(x) ≡ ∫_S f (y)Pt (x, dy), (5.3.6)

where again

Pt f = P^t f. (5.3.7)
We say that Pt is a semi-group acting on the space of measures, respectively on the
space of bounded measurable functions. The interpretation of the action on functions
is given as follows.

Lemma 5.5. Let Pt be a Markov semi-group acting on bounded measurable functions f . Then
(Pt f )(x) = E ( f (Xt )|F0 ) (x) ≡ E x f (Xt ). (5.3.8)

Proof. We only need to show this for t = 1. Then, by definition,


E_x ( f (X1 )) = ∫_S f (y)P(X1 ∈ dy|F0 )(x) = ∫_S f (y)P(x, dy) = (P f )(x). (5.3.9)

t
u

Notice that, by telescopic expansion, we have the elementary formula


P^t f − f = Σ_{s=0}^{t−1} P^s (P − 1) f = Σ_{s=0}^{t−1} P^s L f, (5.3.10)

where we call L ≡ P − 1 the (discrete) generator of our Markov process (this formula
will have a complete analogue in the continuous-time case).
An interesting consequence is the following observation:

Lemma 5.6 (Discrete time martingale problem). Let L be the generator of a


Markov process, Xt , and let f be a bounded measurable function. Then
Mt ≡ f (Xt ) − f (X0 ) − Σ_{s=0}^{t−1} L f (X s ) (5.3.11)

is a martingale.
5.3 Markov processes and martingales 87

Proof. Let t, r ≥ 0. Then


E(Mt+r |Ft ) = E( f (Xt+r )|Ft ) − E( f (X0 )|Ft ) − Σ_{s=0}^{t+r−1} E(L f (X s )|Ft )
 = P^r f (Xt ) − f (Xt ) + f (Xt ) − f (X0 ) − Σ_{s=t}^{t+r−1} E(L f (X s )|Ft ) − Σ_{s=0}^{t−1} E(L f (X s )|Ft )
 = f (Xt ) − f (X0 ) − Σ_{s=0}^{t−1} L f (X s ) + P^r f (Xt ) − f (Xt ) − Σ_{s=0}^{r−1} P^s L f (Xt )
 = Mt + 0, (5.3.12)

where the last term vanishes because of (5.3.10). This proves the lemma. t
u

Remark. (5.3.11) is of course the Doob decomposition of the process f (Xt ), since
Σ_{s=0}^{t−1} L f (X s ) is a previsible process. One may check that this can be obtained directly
using the formula (4.4.5) [Exercise!].
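The martingale of Lemma 5.6 is easy to test numerically. The sketch below (the three-state kernel and test function are arbitrary illustrative choices) builds L f = P f − f, evaluates Mt along simulated paths, and checks that the empirical mean of Mt stays near M0 = 0.

```python
import random

def generator_apply(P, f):
    """(Lf)(x) = sum_y P(x,y) f(y) - f(x), the discrete generator L = P - 1."""
    n = len(P)
    return [sum(P[x][y] * f[y] for y in range(n)) - f[x] for x in range(n)]

def martingale_value(path, f, Lf):
    """M_t = f(X_t) - f(X_0) - sum_{s<t} (Lf)(X_s) along one path, cf. (5.3.11)."""
    t = len(path) - 1
    return f[path[t]] - f[path[0]] - sum(Lf[path[s]] for s in range(t))

def sample_path(P, x0, t_max, rng):
    """Simulate the chain by inverting the cumulative row probabilities."""
    path = [x0]
    for _ in range(t_max):
        u, acc, nxt = rng.random(), 0.0, len(P[path[-1]]) - 1
        for y, p in enumerate(P[path[-1]]):
            acc += p
            if u < acc:
                nxt = y
                break
        path.append(nxt)
    return path

rng = random.Random(3)
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.0, 0.6]]
f = [1.0, 4.0, 9.0]
Lf = generator_apply(P, f)

total = sum(martingale_value(sample_path(P, 0, 15, rng), f, Lf)
            for _ in range(20000))
assert abs(total / 20000) < 0.3      # E(M_t) = M_0 = 0
```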

What is important about this observation is that it gives rise to a characterisation


of Markov processes that will be extremely useful in the continuous time setting.
Namely, one can ask whether the requirement that Mt be a martingale for a given
family of pairs ( f, L f ) fully characterises a Markov process.

Theorem 5.7. Let X be a discrete time adapted stochastic process on a filtered


space. Then X is a Markov process with transition kernel P = 1 + L, if and only
if, for all bounded measurable functions f the expression on the right-hand side of
(5.3.11) is a martingale.

Proof. Lemma 5.6 already provides the “only if” part, so it remains to show the “if”
part.
First, if we assume that X is a Markov process, setting r = 1 and t = 0 above and
taking conditional expectations given F0 , we see from Lemma 5.5 that E( f (X1 )|F0 ) =
f (X0 ) + (L f )(X0 ), implying that the transition kernel must be 1 + L.
It remains to show that X is indeed a Markov process. For this we want to show
that
E( f (Xt+s )|Ft ) = ((1 + L)^s f )(Xt ) ≡ (P^s f )(Xt ), (5.3.13)
follows from the martingale problem formulation. To see this, we just use the above
calculation to see that

E( f (Xt+r )|Ft ) = E(Mt+r |Ft ) + f (X0 ) + Σ_{s=0}^{t−1} (L f )(X s ) + Σ_{s=t}^{t+r−1} E((L f )(X s )|Ft )
 = Mt + f (X0 ) + Σ_{s=0}^{t−1} (L f )(X s ) + Σ_{s=t}^{t+r−1} E((L f )(X s )|Ft )
 = f (Xt ) + Σ_{s=0}^{r−1} E((L f )(Xt+s )|Ft ). (5.3.14)

Now let again r = 1. Then

E( f (Xt+1 )|Ft ) = f (Xt ) + (L f )(Xt ) = ((1 + L) f )(Xt ) ≡ P f (Xt ), (5.3.15)

which is (5.3.13) for s = 1. Now proceed by induction: assume that (5.3.13) holds
for all bounded measurable functions for s ≤ r − 1. We must show that it
then also holds for s = r. To do this, we use the induction hypothesis for the last sum in (5.3.14),
Σ_{s=0}^{r−1} E((L f )(Xt+s )|Ft ) = Σ_{s=0}^{r−1} (P^s (L f ))(Xt ) = (P^r f )(Xt ) − f (Xt ), (5.3.16)

where we undid the telescopic sum. Inserting this into (5.3.14) yields (5.3.13) for
s = r. Hence (5.3.13) holds for all r, by induction. u
t

Remark. The full strength of this theorem will come out in the continuous time case.
A crucial point is that it will not be necessary to even consider all bounded functions,
but just sufficiently rich classes. This allows one to formulate martingale problems even
when one cannot write down the generator in an explicit form. The idea of characterising
Markov processes by the associated martingale problem goes back to Stroock
and Varadhan, see [13].

5.4 Harmonic functions and martingales

We have seen that measures that satisfy µL = 0 are of special importance in the
theory of Markov processes (they are the invariant measures). Also of central im-
portance are functions that satisfy L f = 0. In this section we will assume that the
transition kernels of our Markov chains have bounded support, so that for some
K < ∞, |Xt+1 − Xt | ≤ K < ∞ for all t.

Definition 5.8. Let L be the generator of a Markov process and D ⊆ S . A bounded


measurable function f : S → R that satisfies

L f (x) = 0, ∀x ∈ D, (5.4.1)

is called a harmonic function on D. A function is called subharmonic (resp. superharmonic) on D if L f (x) ≥ 0 (resp. L f (x) ≤ 0) for all x ∈ D.
Theorem 5.9. Let Xt be a Markov process with generator L. Then, a bounded measurable function f is
(i) harmonic, if and only if f (Xt ) is a martingale;
(ii) subharmonic, if and only if f (Xt ) is a submartingale;
(iii) superharmonic, if and only if f (Xt ) is a supermartingale;
Proof. Simply use Lemma 5.6. u
t
Remark. Theorem 5.9 establishes a profound relationship between potential theory
and martingales.
A nice application of the preceding result is the maximum principle.
Theorem 5.10 (Maximum principle). Let X be a Markov process and let D be a
bounded open domain such that E(τDc ) < ∞. Assume that f is a bounded subharmonic function on D. Then

sup_{x∈D} f (x) ≤ sup_{x∈Dc} f (x). (5.4.2)

Proof. Let us define T ≡ τDc . Then f (X T ) is a submartingale, and thus

E ( f (XT )|F0 ) (x) ≥ f (x). (5.4.3)

Since XT ∈ Dc , it must be true that

sup_{y∈Dc} f (y) ≥ E( f (XT )|F0 )(x) ≥ f (x), (5.4.4)

for all x ∈ D, hence the claim of the theorem. Of course, we again used Doob's
optional stopping theorem, case (i)(c). u t
The theorem can be phrased as saying that (sub) harmonic functions take on their
maximum on the boundary, since of course the set Dc in (5.4.2) can be replaced by
a subset, ∂D ⊂ Dc such that P x (XT ∈ ∂D) = 1.
The above proof is an example of how intrinsically analytic results can be proven
with probabilistic means. The next section will further develop this theme.
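For the simple random walk on Z the generator is (L f )(x) = ½( f (x−1) + f (x+1)) − f (x), so subharmonic means discretely convex. A tiny sketch (the choice f (x) = x² and the interval are ad hoc) checks the maximum principle (5.4.2) on an interval:

```python
def srw_generator(f, x):
    """(Lf)(x) = (f(x-1) + f(x+1))/2 - f(x) for the simple random walk."""
    return 0.5 * (f[x - 1] + f[x + 1]) - f[x]

N = 10
f = [x * x for x in range(N + 1)]        # convex, hence subharmonic: Lf = 1
D = range(1, N)                          # interior; D^c here is {0, N}
assert all(srw_generator(f, x) >= 0 for x in D)
assert max(f[x] for x in D) <= max(f[0], f[N])   # maximum on the boundary
```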

5.5 Dirichlet problems

Let us now consider a connected bounded open subset D of S . We define the stopping time T = τDc ≡ inf{t > 0 : Xt ∈ Dc }.
If g is a bounded measurable function on D, we consider the Dirichlet problem
associated to a generator, L, of a Markov process, X:

−(L f )(x) = g(x), x ∈ D, (5.5.1)
f (x) = 0, x ∈ Dc .

Theorem 5.11. Assume that E(τDc ) < ∞. Then (5.5.1) has a unique solution given
by
f (x) = E[Σ_{t=0}^{τDc −1} g(Xt ) | F0 ](x) ≡ E_x [Σ_{t=0}^{τDc −1} g(Xt )] (5.5.2)

for x ∈ D, and f (x) = 0, for x ∈ Dc .


Proof. Consider the martingale Mt from Lemma 5.6. Set T ≡ τDc . We know from
Theorem 4.33 that M T is also a martingale. Moreover,
MT = f (XT ) − f (X0 ) − Σ_{t=0}^{T−1} (L f )(Xt ) = 0 − f (X0 ) − Σ_{t=0}^{T−1} (L f )(Xt ). (5.5.3)

But we want f such that −L f = g on D. Thus, (5.5.3) seen as a problem for f , reads
MT = − f (X0 ) + Σ_{t=0}^{T−1} g(Xt ). (5.5.4)

Taking expectations conditioned on F0 yields

0 = − f (X0 ) + E[Σ_{t=0}^{T−1} g(Xt ) | F0 ], (5.5.5)

or

f (x) = E_x [Σ_{t=0}^{T−1} g(Xt )]. (5.5.6)

Here we relied of course on Doob's optional stopping theorem for E(MT ) = 0.


Thus, any solution of the Dirichlet problem is given by (5.5.6). To verify existence,
we just need to check that (5.5.6) solves −L f = g on D. To do this we use the
Markov property “backwards”, to see that

P f (x) = P E_x [Σ_{t=0}^{T−1} g(Xt )] = ∫_D P(x, dy) E_y [Σ_{t=0}^{T−1} g(Xt )] + ∫_{Dc} P(x, dy) · 0
 = E_x [Σ_{t=1}^{T−1} g(Xt )] = E_x [Σ_{t=0}^{T−1} g(Xt )] − g(x) = f (x) − g(x). (5.5.7)

This concludes the proof. u t
We see that the Markov process produces a solution of the Dirichlet problem.
We can express the solution in terms of an integral kernel, called the Green kernel,
G D (x, dy), as
Z T −1 
X 
f (x) = G D (x, dy)g(y) ≡ E x  g(Xt ) ,
 (5.5.8)
t=0

or, in more explicit terms,



X
G D (x, dy) = PtD (x, dy), (5.5.9)
t=0

where
P_D^t (x, dy) = ∫_D P(x, dz1 ) ∫_D P(z1 , dz2 ) · · · ∫_D P(zt−1 , dy). (5.5.10)

The preceding theorem has an obvious extension to more complicated boundary


value problems:

Theorem 5.12. Let D be as above, and let h be a bounded function on Dc . Assume


that E(τDc ) < ∞. Then
f (x) ≡ E_x [Σ_{t=0}^{T−1} g(Xt )] + E_x (h(XT )) for x ∈ D, and f (x) ≡ h(x) for x ∈ Dc , (5.5.11)

is the unique solution of the Dirichlet problem

−(L f )(x) = g(x), x ∈ D, (5.5.12)


f (x) = h(x), x ∈ Dc .

Proof. Identical to the previous one. t


u

Theorem 5.12 is a two-way game: it allows us to produce solutions of analytic problems
in terms of stochastic processes, and it allows us to compute interesting probabilistic
quantities analytically. As an example, assume that Dc = A ∪ B with A ∩ B = ∅.
Set h = 1A . Then, clearly, for x ∈ D,

E x (h(XT )) = P x (XT ∈ A) ≡ P x (τA < τB ), (5.5.13)

and so P x (XT ∈ A) can be represented as the solution of the boundary value problem

(L f )(x) = 0, x ∈ D, (5.5.14)
f (x) = 1, x ∈ A,
f (x) = 0, x ∈ B.

This is a generalisation of the ruin problem for the random walk that we discussed
in Probability 1.
Exercise. Derive the formula for P x (τA < τB ) directly from the Markov property
without using Lemma 5.6.
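A numerical sketch of this ruin problem (simple random walk on {0, . . . , N} with A = {N}, B = {0}; the instance and the iterative solver are ad-hoc choices): solving (L f )(x) = 0 in the interior by repeated averaging recovers the classical gambler's-ruin answer x/N.

```python
def ruin_probability(N, iters=20000):
    """Solve (Lh)(x) = 0 on {1,...,N-1} with h(0) = 0, h(N) = 1 for the
    simple random walk, by Gauss-Seidel sweeps over the interior."""
    h = [0.0] * (N + 1)
    h[N] = 1.0
    for _ in range(iters):
        for x in range(1, N):
            h[x] = 0.5 * (h[x - 1] + h[x + 1])
    return h

h = ruin_probability(8)
assert all(abs(h[x] - x / 8) < 1e-6 for x in range(9))
```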

5.5.1 Green function, equilibrium potential, and equilibrium measure

Let us consider the case where the solution of the Dirichlet problem is unique. Then
the solution of (5.5.12) can be written in the form
f (x) = ∫_D G_{Dc} (x, dz)g(z) + ∫_{Dc} H_{Dc} (x, dz)h(z), (5.5.15)

where τ c −1 
D
1X(t)∈A 
 X 
G Dc (x, A) = E x 
t=0

is called the Green kernel, and

1X(τDc )∈A
h i
HDc (x, A) = E x (5.5.16)

X
= P x (τDc = t ∧ X(t) ∈ A)
t=0

is called the Poisson kernel. The Green kernel can also be characterised as the weak
solution of the problem

−(L G_{Dc} )(x, dz) = δz (dx), ∀ x ∈ D, (5.5.17)
G_{Dc} (x, dz) = 0, ∀ x ∈ Dc .

Let A, B ⊂ S be two disjoint subsets. Consider the Dirichlet problem

(−L f )(x) = 0, ∀ x ∈ S \ (A ∪ B),


f (x) = 1, ∀ x ∈ A, (5.5.18)
f (x) = 0, ∀ x ∈ B.

Suppose that (5.5.18) has a unique solution, e.g. because E x [τA∪B ] < ∞ for all x ∈ S .
The harmonic function that solves (5.5.18) will be denoted by hA,B (x) and is called
the equilibrium potential. We have already seen that

hA,B (x) = E x [1A (X(τA∪B ))] = P x (τA < τB ) , x ∈ S \ (A ∪ B). (5.5.19)

We would like to view this equation as an analytic expression for the probability
on the right-hand side. Naturally we would like to obtain such a formula also when
x ∈ A or x ∈ B. However, using the Markov property, we see that
P_x (τA < τB ) = ∫_{(A∪B)c} P(x, dy) P_y (τA < τB ) + ∫_A P(x, dy) (5.5.20)
 = ∫_S P(x, dy) hA,B (y) = (P hA,B )(x)
 = (L hA,B )(x) + hA,B (x).

For x ∈ B, the latter can be also written as (since hA,B (x) = 0)

P x (τA < τB ) = (LhA,B )(x), (5.5.21)

and for x ∈ A as

− (LhA,B )(x) = 1 − P x (τA < τB ) = P x (τB < τA ) . (5.5.22)

The quantity eA,B (x) ≡ −LhA,B (x), x ∈ A, is called the equilibrium measure on A.
The following simple observation provides the fundamental connection between
the objects we have introduced so far, and leads to a different representation of the
equilibrium potential. Pretend that the equilibrium measure eA,B is already known.
Then the equilibrium potential satisfies the inhomogeneous Dirichlet problem

−(Lh)(x) = eA,B (x), ∀ x ∈ S \ B,


(5.5.23)
h(x) = 0, ∀ x ∈ B.

The solution of (5.5.23) can be written in terms of the Green function.

Lemma 5.13. With the notation introduced above,


hA,B (x) = ∫_A G_B (x, dy) eA,B (y). (5.5.24)

Note that, for A = {a}, ea,B (a) = P_a (τB < τa ) has the meaning of an escape probability.
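Lemma 5.13 is easy to verify on a toy chain. In this sketch (a walk on {0,1,2,3} with ±1 steps, reflected at 3, with A = {3} and B = {0}; all choices are ad hoc) the equilibrium potential, the Green function, and the equilibrium measure are obtained by fixed-point iteration, and h = G·e is checked.

```python
def equilibrium_data(iters=5000):
    """Walk on {0,1,2,3}: +-1 steps w.p. 1/2, reflected at 3; A={3}, B={0}.
    Returns (h, G, e3) with h(x) = P_x(tau_A < tau_B),
    G(x) = E_x[# visits to 3 before tau_0], and e3 = -(Lh)(3)."""
    h = [0.0, 0.0, 0.0, 1.0]
    for _ in range(iters):                # harmonicity on the interior {1, 2}
        h[1] = 0.5 * (h[0] + h[2])
        h[2] = 0.5 * (h[1] + h[3])
    G = [0.0] * 4                         # expected visits to state 3
    for _ in range(iters):
        G[3] = 1.0 + G[2]                 # from 3 the walk steps to 2 surely
        G[2] = 0.5 * (G[1] + G[3])
        G[1] = 0.5 * (G[0] + G[2])
    e3 = h[3] - h[2]                      # -(Lh)(3), since the step 3 -> 2 is sure
    return h, G, e3

h, G, e3 = equilibrium_data()
# Lemma 5.13 with A = {3}: h(x) = G(x) * e_{A,B}(3)
assert all(abs(h[x] - G[x] * e3) < 1e-6 for x in range(4))
```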

5.6 Doob’s h-transform

Let us consider a discrete time Markov process, X, with generator L = P − 1.
We may want to consider modifications of the process. One important type of
conditioning is to require the process to reach some set in a particular place (e.g.,
for a random walk in a finite interval, we may be interested in the walk conditioned
to exit on a specific side of the interval; this may correspond to a sequence of games
conditioned on the player winning).
How and when can we do this, and what is the nature of the resulting process?
In particular, is the resulting process again a Markov process, and if so, what is its
generator?
Given a non-negative harmonic function h, let us define a new measure, Ph , on
the space of paths as follows: If Y is an Ft -measurable random variable, then
Eh [Y|F0 ] = (1/h(X0 )) E[h(Xt )Y|F0 ]. (5.6.1)
The first thing to check is of course whether this defines in a consistent way a probability
measure. In particular, Ph should not depend on the choice of t. This is true
because h(Xt ) is a martingale.

Lemma 5.14. Let Y be F s -measurable. Then, for any t ≥ s,


Eh [Y|F0 ] ≡ (1/h(X0 )) E[h(X s )Y|F0 ] = (1/h(X0 )) E[h(Xt )Y|F0 ]. (5.6.2)

In particular, Ph [Ω|F0 ] = 1.

Proof. Just introduce a conditional expectation:

E[h(Xt )Y|F0 ] = E[E[h(Xt )Y|F s ]|F0 ] = E[YE[h(Xt )|F s ]|F0 ], (5.6.3)

and use that h(Xt ) is a martingale

= E[Yh(X s )|F0 ], (5.6.4)

from which the result follows. t


u

This lemma shows why it is important that h is a harmonic function.


Now we turn to the question of whether the law Ph is that of a Markov chain.

Theorem 5.15. Let X be a Markov chain with generator L. Let h be a positive harmonic
function. Then the h-transformed measure, Ph , is the law of a Markov process
with generator Lh , where for any bounded measurable function f ,
Lh f (x) ≡ (1/h(x)) ∫_S P(x, dy) h(y) f (y) − f (x). (5.6.5)
Proof. To prove this theorem we turn to the martingale problem. We will show that
for Lh defined by (5.6.5),
Mth ≡ f (Xt ) − f (X0 ) − Σ_{s=0}^{t−1} (Lh f )(X s ) (5.6.6)

is a martingale under the law Eh , i.e. that, for t > r,

Eh [Mth |Fr ] = Mrh . (5.6.7)

Note first that, by definition


Eh [Mth |Fr ] = (1/h(Xr )) E[h(Xt ) f (Xt )|Fr ] − f (X0 ) − Σ_{s=0}^{r−1} (Lh f )(X s )
 − Σ_{s=r}^{t−1} (1/h(Xr )) E[h(X s ) Lh f (X s )|Fr ]. (5.6.8)

The middle terms are part of Mrh and we must consider E[ f (Xt )h(Xt )|Fr ]. This is
done by applying the martingale problem for P and the function f h. This yields

E[ f (Xt )h(Xt )|Fr ] = f (Xr )h(Xr ) + Σ_{s=r}^{t−1} E[(L( f h))(X s )|Fr ].

Inserting this in (5.6.8) gives


Eh [Mth |Fr ] = f (Xr ) − f (X0 ) − Σ_{s=0}^{r−1} (Lh f )(X s )
 + (1/h(Xr )) Σ_{s=r}^{t−1} [E[(L( f h))(X s )|Fr ] − E[h(X s ) Lh f (X s )|Fr ]]
 = Mrh + (1/h(Xr )) Σ_{s=r}^{t−1} [E[(L( f h))(X s )|Fr ] − E[h(X s ) Lh f (X s )|Fr ]].

The second term will vanish if we choose Lh f (x) = h(x)^{−1} (L(h f ))(x), i.e. as defined
in (5.6.5).
Hence we see that under Ph , X solves the martingale problem corresponding to
the generator Lh , and so is a Markov chain with transition kernel Ph = Lh + 1. The
process X under Ph is called the (Doob) h-transform of the original Markov process.
t
u
Now let us take h to be the equilibrium potential h(x) = hA,B (x), for A, B ⊂ S . Then
(recall the definition of hA,B , (5.5.18))

Ph (x, dy) = (1/P_x (τA ≤ τB )) P(x, dy) hA,B (y)
 = (1/P_x (τA ≤ τB )) P_x (X1 ∈ dy ∧ τA ≤ τB )
 = P_x (X1 ∈ dy | τA ≤ τB ). (5.6.9)

So Ph are the transition probabilities of the process conditioned to hit A before B.
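Concretely (a hypothetical instance: the simple random walk on {0, . . . , N} conditioned to hit N before 0, where h(x) = x/N by the ruin problem), the h-transformed kernel Ph (x, dy) = P(x, dy) h(y)/h(x) acquires an upward drift while remaining stochastic:

```python
def h_transform_steps(x, N):
    """P^h(x, x+-1) = P(x, x+-1) h(x+-1)/h(x) for the walk conditioned on
    tau_N < tau_0, with equilibrium potential h(y) = y/N (0 < x < N)."""
    h = lambda y: y / N
    up = 0.5 * h(x + 1) / h(x)
    down = 0.5 * h(x - 1) / h(x)
    return up, down

for x in range(1, 10):
    up, down = h_transform_steps(x, 10)
    assert abs(up + down - 1.0) < 1e-12   # P^h is again a transition kernel
    assert up > 0.5                       # the conditioning pushes towards N
```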


Lemma 5.16. With the notation above, if Y is an FτA∪B -measurable function,

Eh_x [Y] = E_x [Y|τA < τB ]. (5.6.10)

Proof. We have by definition

Eh_x [Y] = (1/h(x)) E_x [Y h(XτA∪B )] (5.6.11)
 = (1/P_x [τA < τB ]) E_x [Y 1_{τA <τB} ]
 = E_x [Y|τA < τB ].

t
u

Exercise. As a simple example, consider a simple random walk on {−N, −N +
1, . . . , N}. Assume we want to condition this process on hitting +N before −N. Then
let

h(x) = P_x [τN = τ{N}∪{−N} ] = P_x [τN < τ−N ]. (5.6.12)

Compute h(x) and use it to compute the transition probabilities of the h-transformed walk.
Plot the probabilities to jump down in the new chain!

5.7 Markov chains with countable state space

Much of the theory of Markov chains with countable state space is similar to the
case of finite state space. In particular, the notions of communicating classes, irreducibility,
and periodicity carry over. There are, however, important new concepts
in the case when the state space is infinite. These are the notions of recurrence and
transience. It will be useful to use a notation close to the matrix notation of finite
chains. Thus we set

P(i, { j}) = p(i, j). (5.7.1)

We place ourselves in the setting of an irreducible Markov chain, i.e. all states
in S communicate (i.e. for any i, j ∈ S , P_i (τ_j < ∞) > 0). We may also for simplicity
assume that our chain is aperiodic. In the case of finite state space, we have seen that
such chains are ergodic in the sense that there exists a unique invariant probability
distribution, and the marginal distributions at time t converge to this distribution
independently of the starting measure. Essentially this is true because the chain is
trapped on the finite set. If S is infinite, a new phenomenon is possible: the chain
may run “to infinity”.
Definition 5.17. Let X be an irreducible aperiodic Markov chain with countable
state space S . Then:
(i) X is called transient, if for any i ∈ S ,

Pi (τi < ∞) < 1; (5.7.2)

(ii) X is called recurrent, if it is not transient.


(iii) X is called positive recurrent or ergodic, if, for all i ∈ S ,

Ei (τi ) < ∞. (5.7.3)

Remark. The notion of recurrence and transience can be defined for states rather
than for the entire chain. In the case of irreducible and aperiodic chains, all states
have the same characteristics.
Some simple consequences of the definition are the following.
Lemma 5.18. Let X be an irreducible Markov chain with countable state space.
Then X is transient, iff

P_ℓ (Xt = ℓ, i.o.) = 0. (5.7.4)

Proof. Assume that X is transient. Then P_ℓ (τℓ < ∞) = c < 1. By the first Borel-Cantelli lemma, (5.7.4) holds if

Σ_{t=0}^{∞} P_ℓ (Xt = ℓ) < ∞. (5.7.5)

But ∞  ∞

P` (Xt = `) = E`  1Xt =`  =
X X  X
 nP` (Xt = `, n-times) . (5.7.6)
t=0 t=0 n=1

Using the strong Markov property,

P_ℓ (Xt = ℓ, n-times) = P_ℓ (τℓ < ∞)^n P_ℓ (τℓ = ∞) = c^n (1 − c). (5.7.7)

Inserting this equality into (5.7.6) yields that (5.7.5) holds and thus that (5.7.4) is
true.
To show the converse, assume that (5.7.4) holds. Then

1 = 1 − P_ℓ (Xt = ℓ, i.o.) = P_ℓ (Xt = ℓ, finitely many times)
 = Σ_{n=0}^{∞} P_ℓ (Xt = ℓ, n-times) = Σ_{n=0}^{∞} c^n (1 − c). (5.7.8)

The latter sum equals 1 if and only if c < 1. Thus X is transient. u t
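A numerical companion to this proof (biased ±1 walk on Z; the parameters are ad-hoc choices): summing the return probabilities P_0(X_{2n} = 0) = C(2n, n)(pq)^n gives the expected number of visits to 0, which by the geometric-series computation above equals 1/(1 − c) (counting the visit at time 0). For p ≠ q it is finite, so the biased walk is transient.

```python
def expected_visits(p, n_max=4000):
    """E_0[sum_t 1_{X_t = 0}] for the +-1 walk with up-probability p:
    the walk is at 0 only at even times, P_0(X_{2n} = 0) = C(2n,n)(pq)^n.
    Terms are built by the ratio C(2n,n)/C(2n-2,n-1) = 2n(2n-1)/n^2."""
    q = 1.0 - p
    total, term = 1.0, 1.0               # n = 0 term: the visit at time 0
    for n in range(1, n_max):
        term *= (2 * n) * (2 * n - 1) / (n * n) * (p * q)
        total += term
    return total

p = 0.6
E = expected_visits(p)                   # closed form: 1/|p - q| = 5
c = 1.0 - 1.0 / E                        # return probability, since E = 1/(1-c)
assert abs(E - 5.0) < 1e-6
assert c < 1.0                           # transient, in line with (5.7.2)
```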

Positive recurrent chains are called ergodic, because they are ergodic in the same
sense as finite Markov chains.

Lemma 5.19. Let X be a positive recurrent Markov chain with countable state space, S .
Then, for any j, ℓ ∈ S ,

µ( j) ≡ E_ℓ [Σ_{t=1}^{τℓ} 1_{Xt = j} ] / E_ℓ (τℓ ) (5.7.9)

is the unique invariant probability distribution of X.
Proof. Define νℓ ( j) = E_ℓ [Σ_{t=1}^{τℓ} 1_{Xt = j} ]. We show first that νℓ is an invariant measure.
Obviously, 1 = Σ_{m∈S} 1_{Xt−1 =m} , and hence, using the strong Markov property,
νℓ ( j) ≡ E_ℓ [Σ_{t=1}^{τℓ} 1_{Xt = j} ] = E_ℓ [Σ_{m∈S} Σ_{t=1}^{τℓ} 1_{Xt = j} 1_{Xt−1 =m} ]
 = Σ_{m∈S} E_ℓ [Σ_{t=1}^{τℓ} 1_{Xt−1 =m} P[Xt = j|Ft−1 ](m)]
 = Σ_{m∈S} E_ℓ [Σ_{t=1}^{τℓ} 1_{Xt−1 =m} ] p(m, j)
 = Σ_{m∈S} E_ℓ [Σ_{t=1}^{τℓ} 1_{Xt =m} ] p(m, j)
 = Σ_{m∈S} νℓ (m) p(m, j).

Thus νℓ solves the invariance equation and thus is an invariant measure. It remains
to show that νℓ is normalisable. But

Σ_{j∈S} νℓ ( j) = E_ℓ (τℓ ) < ∞, (5.7.10)

by assumption. Thus νℓ ( j)/Σ_{i∈S} νℓ (i) = µ( j) is an invariant probability distribution.
Next we want to show uniqueness. Note first that for any irreducible Markov
chain (with discrete state space) it holds that, if µ is an invariant measure and µ(i) =
0 for some i ∈ S , then µ ≡ 0. Namely, if µ( j) > 0 for some j, then there exists t finite
such that (P^t)_{ji} > 0, and µ(i) ≥ µ( j)(P^t)_{ji} > 0, in contradiction to the hypothesis.
We will now actually show that νℓ is the only invariant measure such that νℓ (ℓ) = 1
(which implies the desired uniqueness result immediately). To do so, we will show
that for any other invariant measure, ν, such that ν(ℓ) = 1, we have that ν( j) ≥ νℓ ( j)
for all j. For then, ν − νℓ is a positive invariant measure as well, and being zero in ℓ,
must vanish identically. Hence ν = νℓ .
Now we clearly have that

ν(i) = Σ_{j≠ℓ} p( j, i)ν( j) + p(ℓ, i), (5.7.11)

since ν(ℓ) = 1, by hypothesis. We want to think of p(ℓ, i) as

p(ℓ, i) = E_ℓ [1_{τℓ ≥1} 1_{X1 =i} ]. (5.7.12)

Now iterate the same relation in the first term in (5.7.11). Thus

ν(i) = Σ_{j1 , j2 ≠ℓ} p( j2 , j1 )p( j1 , i)ν( j2 ) + Σ_{j1 ≠ℓ} p(ℓ, j1 )p( j1 , i) + E_ℓ [1_{τℓ ≥1} 1_{X1 =i} ]
 = Σ_{j1 , j2 ≠ℓ} p( j2 , j1 )p( j1 , i)ν( j2 ) + E_ℓ [Σ_{s=1}^{2∧τℓ} 1_{Xs =i} ]. (5.7.13)
Further iteration yields, for any n ∈ N,
\[
\begin{aligned}
\nu(i) &= \sum_{j_1,j_2,\dots,j_n\neq\ell} p(j_n,j_{n-1})\cdots p(j_2,j_1)p(j_1,i)\,\nu(j_n) + E_\ell\Bigl[\sum_{s=1}^{n\wedge\tau_\ell} 1_{X_s=i}\Bigr] \\
&\ge E_\ell\Bigl[\sum_{s=1}^{n\wedge\tau_\ell} 1_{X_s=i}\Bigr]. \tag{5.7.14}
\end{aligned}
\]
Letting n ↑ ∞, this implies ν(i) ≥ ν_ℓ(i), as desired, and the proof is complete. □
Corollary 5.20. An ergodic Markov chain satisfies
\[
\mu(j) = \frac{1}{E_j(\tau_j)}. \tag{5.7.15}
\]
Proof. Just set ℓ = j in the definition of µ(j), and note that ν_j(j) = E_j(Σ_{t=1}^{τ_j} 1_{X_t=j}) = 1. □
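The relation µ(j) = 1/E_j(τ_j) is easy to check numerically. The following sketch (the 3-state transition matrix is a made-up example, not from the text) computes µ as the normalised left eigenvector of P and E_j(τ_j) by first-step analysis:

```python
import numpy as np

# Hypothetical irreducible 3-state transition matrix (illustration only).
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])

# Invariant distribution: left eigenvector of P for eigenvalue 1, normalised.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
mu /= mu.sum()

def expected_return_time(P, j):
    """E_j(tau_j) by first-step analysis: solve m = 1 + Q m, where Q is P
    with column j zeroed (re-entering j before the return is forbidden)."""
    n = P.shape[0]
    Q = P.copy()
    Q[:, j] = 0.0
    m = np.linalg.solve(np.eye(n) - Q, np.ones(n))
    return m[j]

products = [mu[j] * expected_return_time(P, j) for j in range(3)]
```

Each product µ(j)·E_j(τ_j) comes out as 1 up to numerical precision, in accordance with (5.7.15).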
We have seen that positive recurrence is needed to ensure the existence of an invariant probability measure. Next we show that if the chain is in addition aperiodic, we get convergence towards this invariant measure.
Let us show first that the existence of a strictly positive invariant probability measure ensures positive recurrence.
Lemma 5.21. Let X be an irreducible Markov chain with countable state space. If
there exists an invariant probability measure µ, then µ(i) = 1/Ei τi , and X is positive
recurrent.
Proof. Since µ is a probability measure, there must be a state i ∈ S such that µ(i) > 0. By irreducibility, for any ℓ ∈ S there exists n ∈ N such that (P^n)_{iℓ} > 0, and hence µ(ℓ) > 0. Then λ(j) = µ(j)/µ(ℓ), j ∈ S, is an invariant measure that satisfies λ(ℓ) = 1. Thus λ ≥ ν_ℓ. Hence
\[
E_\ell\,\tau_\ell = \sum_{i\in S}\nu_\ell(i) \le \sum_{i\in S}\frac{\mu(i)}{\mu(\ell)} = \frac{1}{\mu(\ell)} < \infty. \tag{5.7.16}
\]
Therefore, X is positive recurrent, and µ(i) = 1/E_i τ_i then follows from the uniqueness statement of Lemma 5.19 together with Corollary 5.20. □
We can now state an ergodic theorem.
Theorem 5.22. Let X be an irreducible, aperiodic, and positive recurrent Markov
chain with countable state space. Let P denote its transition kernel and µ its unique
invariant probability measure. Then, for any initial distribution π0 , we have that for
all i ∈ S,
\[
\lim_{n\uparrow\infty} (\pi_0 P^n)_i = \mu(i). \tag{5.7.17}
\]

Proof. The proof uses the method of coupling. Let π_0 be our initial distribution. We construct a second Markov chain, Y, independent of X, with the same transition kernel but initial distribution µ. Then we define the stopping time T with respect to the filtration F_n ≡ σ(X_0, Y_0, X_1, Y_1, …, X_n, Y_n) as
\[
T \equiv \inf\{n : X_n = Y_n = i\}, \tag{5.7.18}
\]
where i ∈ S is an arbitrary state in S.


We show first that T is almost surely finite. To do this, we consider the pair W = (X, Y) as a Markov chain with state space S × S. Its transition kernel P̃ has elements
\[
\tilde p_{(ik)(jm)} \equiv p_{ij}\, p_{km}. \tag{5.7.19}
\]
The initial distribution of this chain is π̃_0((jk)) = π_0(j)µ(k). Since P is irreducible and aperiodic, for any i, j, k, m there exists n such that
\[
\tilde p^{\,n}_{(ik)(jm)} = p^n_{ij}\, p^n_{km} > 0. \tag{5.7.20}
\]
Hence W is irreducible. Furthermore, it is evident that the invariant distribution of W is given by
\[
\tilde\mu((jk)) = \mu(j)\mu(k) > 0. \tag{5.7.21}
\]
Hence, by Lemma 5.21, W is positive recurrent, and thus, with T = inf{n ≥ 0 : W_n = (ii)}, we have P_{ii}(T < ∞) = 1. But from this it follows that for any (kℓ), P_{kℓ}(T < ∞) = 1. Just note that by irreducibility there exists m < ∞ such that P_{ii}(W_m = (kℓ)) > 0. But then P_{ii}(T = ∞) ≥ P_{ii}(W_m = (kℓ))\,P_{kℓ}(T = ∞). So if the last probability were positive, then so would be P_{ii}(T = ∞), in contradiction to what we know.
Next we construct a new Markov chain with state space S as
\[
Z_n = \begin{cases} X_n, & \text{if } n < T,\\ Y_n, & \text{if } n \ge T. \end{cases} \tag{5.7.22}
\]
This chain has the same law as X. It follows that
\[
\begin{aligned}
P(X_n = i) &= P(Z_n = i) \tag{5.7.23}\\
&= P(Z_n = i,\ n < T) + P(Z_n = i,\ n \ge T)\\
&= P(X_n = i,\ n < T) + P(Y_n = i,\ n \ge T)\\
&= P(Y_n = i) - P(Y_n = i,\ n < T) + P(X_n = i,\ n < T)\\
&= \mu(i) + \bigl(P(X_n = i\,|\,T > n) - P(Y_n = i\,|\,T > n)\bigr)\, P(T > n).
\end{aligned}
\]
The expression in the brackets is bounded by one in absolute value, while the factor P(T > n) tends to zero as n ↑ ∞. This proves the theorem. □

Remark. Note that both irreducibility and aperiodicity were used in the proof. It is
clear from elementary examples that the conclusion cannot hold for periodic Markov
chains.
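The convergence (5.7.17) is also easy to observe numerically. A minimal sketch, with a made-up irreducible and aperiodic 3-state kernel (not from the text), tracks the total-variation distance of π_0 P^n from µ:

```python
import numpy as np

# Hypothetical irreducible, aperiodic 3-state kernel (illustration only).
P = np.array([[0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4],
              [0.6, 0.1, 0.3]])

# Invariant distribution mu as the normalised left eigenvector for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
mu /= mu.sum()

# Total-variation distance of pi_0 P^n from mu, for a deterministic start pi_0.
pi = np.array([1.0, 0.0, 0.0])
tv = []
for _ in range(60):
    pi = pi @ P
    tv.append(0.5 * np.abs(pi - mu).sum())
```

The distances `tv` decay geometrically (at the rate of the subdominant eigenvalue of P), illustrating the exponentially fast coupling in the proof.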

Finally we note that the strong ergodic theorem that we know for irreducible Markov chains with finite state space also holds for positive recurrent chains with countable state space. The proof is identical to that in the finite case, given that we already know existence and uniqueness of an invariant probability measure.
Chapter 6
Random walks and Brownian motion

The goal of this chapter is to introduce Brownian motion as a continuous time stochastic process with continuous paths and to explain its connection to random walks through Donsker's invariance principle. A very detailed source on Brownian motion is the classical book by Itô and McKean [7].

6.1 Random walks

The innocent looking stochastic processes
\[
S_n \equiv \sum_{i=1}^n X_i, \tag{6.1.1}
\]
with X_i, i ∈ N, iid random variables, are generally called random walks and receive considerable attention in probability theory. A special case is the so-called simple random walk on Z^d, characterised by the fact that the random variables X_i take values in the set of ± unit vectors of the lattice Z^d. Consequently, S_n ∈ Z^d is a stochastic process with discrete state space. Obviously, S_n is a Markov chain, and, moreover, the coordinate processes, S_n^µ, µ = 1, …, d, are sub-, super-, or martingales, depending on whether E(X_1^µ) is positive, negative, or zero.
Let us focus on the centred case, E(X_1) = 0. In this case we have seen that Z_n ≡ n^{−1/2} S_n converges in distribution to a Gaussian random variable. By considering the process coordinate-wise, it is also enough to think about d = 1. We now want to extend this result to a convergence result on the level of stochastic processes. That is, rather than saying something about the position of the random walk at a time n, we want to trace the entire trajectory of the process and try to give a description of its statistical properties in terms of some limiting stochastic process.
It is rather clear from the central limit theorem that we must consider a rescaling
like

\[
Z_n(t) \equiv n^{-1/2} \sum_{k=1}^{[tn]} X_k. \tag{6.1.2}
\]
In that case we have from the central limit theorem that, for any t ∈ (0, 1],
\[
Z_n(t) \xrightarrow{\ \mathcal D\ } B_t, \tag{6.1.3}
\]
([x] denotes the lower integer part of x) where B_t is a centred Gaussian random variable with variance t. Moreover, for any finite collection of indices t_1, …, t_ℓ, define Y_n(i) ≡ Z_n(t_i) − Z_n(t_{i−1}). Then the random variables Y_n(i) are independent, and it is easy to see that they converge, as n → ∞, jointly to a family of independent centred Gaussian variables with variances t_i − t_{i−1}. This implies that the finite dimensional distributions of the processes Z_n(t), t ∈ (0, 1], converge to the finite dimensional distributions of the Gaussian process with covariance C(s, t) = s ∧ t, which we introduced in Section 3.3.2 and which we have preliminarily called Brownian motion.
We now want to go a step further and discuss the properties of the paths of our
processes.

[Plots omitted.]
Fig. 6.1 Paths of S_n for various values of n.

From looking at the pictures, it is clear that the limiting process B_t should have rather continuous-looking sample paths.
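Such pictures are easy to reproduce. The following sketch builds the rescaled path Z_n of (6.1.2) for a simple random walk (assuming ±1 increments):

```python
import numpy as np

rng = np.random.default_rng(0)

def rescaled_walk(n, rng):
    """The path t -> Z_n(t) = n^{-1/2} sum_{k <= [tn]} X_k on the grid t = k/n."""
    steps = rng.choice([-1.0, 1.0], size=n)   # simple random walk increments
    return np.concatenate([[0.0], np.cumsum(steps) / np.sqrt(n)])

Z = rescaled_walk(10_000, rng)
```

Plotting `Z` against the grid `np.linspace(0, 1, len(Z))` for increasing n reproduces the qualitative picture of Fig. 6.1: the jumps of size n^{−1/2} shrink and the path looks increasingly like a continuous function.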

6.2 Construction of Brownian motion

Before stating the desired convergence result, we have to define and construct the
limiting object, the Brownian motion.
Definition 6.1. A stochastic process {B_t ∈ R^d, t ∈ R_+}, defined on a probability space (Ω, F, P), is called a d-dimensional Brownian motion starting in 0, iff
(o) B_0 = 0, a.s.;
(i) for any p ∈ N and any 0 = t_0 < t_1 < ⋯ < t_p, the random variables B_{t_1}, B_{t_2} − B_{t_1}, …, B_{t_p} − B_{t_{p−1}} are independent, and each B_{t_i} − B_{t_{i−1}} is a centred Gaussian r.v. with variance t_i − t_{i−1};
(ii) for any ω ∈ Ω, the map t ↦ B_t(ω) is continuous.
Remark. The requirement (ii) is sometimes replaced by "For P-almost all ω ∈ Ω,
the map t 7→ Bt (ω) is continuous". Since we already know that we do not care much
about what happens on sets of probability zero, this does not make a big difference.
In fact, we can then modify the process by replacing all the paths B(ω) that are not
continuous by constant paths (in fact, we will do this when constructing BM later).
It will, however, be convenient to have all paths continuous, so that we can think of
B as a map from Ω to the space of continuous functions.
The question is whether such a process exists. The first property can, as we have
seen, be established with the help of Kolmogorov’s theorem. The problem with
this is that it constructs the process on the space ((R^d)^{R_+}, B^{R_+}(R^d)); but the second
requirement, the continuity of the sample paths, is not a measurable property with
respect to the product σ-algebra. Therefore, we have to proceed differently. In fact,
we want to construct Brownian motion as a random variable with values in the space
C(R+ , Rd ).
Theorem 6.2. Brownian motion exists.
Proof. We consider the case d = 1, the extension to higher dimensions is straight-
forward. We consider a probability space (Ω, F, P) on which an infinite family of
independent standard Gaussian random variables is defined. We define the so-called
Haar functions, h_n^k, on [0, 1] via
\[
h_0^0(t) \equiv 1, \qquad
h_n^k(t) \equiv 2^{(n-1)/2}\bigl[1_{[(2k)2^{-n},(2k+1)2^{-n})}(t) - 1_{[(2k+1)2^{-n},(2k+2)2^{-n})}(t)\bigr], \tag{6.2.1}
\]
for k ∈ {0, …, 2^{n−1} − 1} and n ≥ 1. We set I(n) ≡ {0, …, 2^{n−1} − 1} for n ≥ 1 and I(0) = {0}. The functions h_n^k, n ∈ N, k ∈ I(n), form a complete orthonormal system of functions in L²([0, 1]), as one may easily check. Now set
\[
f_n^k(t) = \int_0^t h_n^k(u)\,du, \tag{6.2.2}
\]
and set
\[
B_t^{(n)} \equiv \sum_{m=0}^n \sum_{k\in I(m)} f_m^k(t)\, X_{m,k} \tag{6.2.3}
\]
for t ∈ [0, 1], where the X_{m,k} are our independent standard normal random variables.
We will show (i) that the continuous functions B^{(n)}(ω) converge uniformly, almost surely, and hence to continuous functions, and (ii) that the covariances of B^{(n)} converge to the correct limit. The limit, modified to be B_t(ω) ≡ 0 when B_t^{(n)}(ω) does not converge to a continuous function, will then be Brownian motion on [0, 1].
Let us now prove (i). The point here is, of course, that the functions f_n^k(t) are very small, namely,
\[
|f_n^k(t)| \le 2^{-(n+1)/2}. \tag{6.2.4}
\]
Moreover, for given t, there is only one value of k such that f_n^k(t) ≠ 0. Therefore,
\[
\begin{aligned}
P\Bigl[\sup_{0\le t\le1}|B_t^{(n)} - B_t^{(n-1)}| > a_n\Bigr]
&= P\Bigl[\sup_{0\le t\le1}\Bigl|\sum_{k\in I(n)} f_n^k(t)\,X_{n,k}\Bigr| > a_n\Bigr] \tag{6.2.5}\\
&\le P\Bigl[\sup_{k\in I(n)} |X_{n,k}| > 2^{(n+1)/2}\, a_n\Bigr]\\
&\le 2^n\, P\bigl[|X_{n,1}| > 2^{(n+1)/2}\, a_n\bigr]\\
&\le 2^n\, \frac{e^{-a_n^2 2^n}}{\sqrt{\pi/2}\; a_n\, 2^{(n+1)/2}} = \frac{2^{n/2}\, e^{-a_n^2 2^n}}{\sqrt{\pi}\, a_n},
\end{aligned}
\]
where we used the very useful bound
\[
P[|X| > u] \le \frac{1}{u\sqrt{\pi/2}}\, e^{-u^2/2} \tag{6.2.6}
\]
for Gaussian probabilities. Now we are close to being done: Choose a sequence a_n such that Σ_{n=0}^∞ a_n < ∞ and
\[
\sum_{n=1}^\infty P\Bigl[\sup_{0\le t\le1}|B_t^{(n)} - B_t^{(n-1)}| > a_n\Bigr] < \infty. \tag{6.2.7}
\]
Clearly, the choice a_n = 2^{−n/4} will do. Then, by the Borel–Cantelli lemma,
\[
P\Bigl[\sup_{0\le t\le1}|B_t^{(n)} - B_t^{(n-1)}| > a_n \text{ i.o.}\Bigr] = 0, \tag{6.2.8}
\]
and hence
\[
P\Bigl[\forall_{k\in\mathbb N}\,\exists_{n<\infty}\,\forall_{m>n}\ \sup_{0\le t\le1}|B_t^{(m)} - B_t^{(n)}| < 1/k\Bigr] = 1, \tag{6.2.9}
\]
which implies that, almost surely, the sequence B^{(n)} converges uniformly on the interval [0, 1]. Since uniformly convergent sequences of continuous functions converge to continuous functions, lim_{n→∞} B_t^{(n)} ≡ B_t(ω) in C([0, 1], R), for almost all ω.
To check (ii), we compute the covariances:
\[
\begin{aligned}
E\bigl(B_t^{(n)} B_s^{(n)}\bigr) &= \sum_{m=0}^n \sum_{k\in I(m)} \sum_{m'=0}^n \sum_{k'\in I(m')} f_m^k(t)\, f_{m'}^{k'}(s)\, E(X_{m,k}X_{m',k'})\\
&= \sum_{m=0}^n \sum_{k\in I(m)} f_m^k(t)\, f_m^k(s) \tag{6.2.10}\\
&= \int_0^1 du \int_0^1 dv\; 1_{[0,t]}(u)\, 1_{[0,s]}(v) \sum_{m=0}^n \sum_{k\in I(m)} h_m^k(u)\, h_m^k(v).
\end{aligned}
\]
Taking n → ∞ we obtain
\[
\lim_{n\to\infty} E\bigl(B_t^{(n)} B_s^{(n)}\bigr)
= \int_0^1 du \int_0^1 dv\; 1_{[0,t]}(u)\, 1_{[0,s]}(v) \sum_{m=0}^\infty \sum_{k\in I(m)} h_m^k(u)\, h_m^k(v)
= \int_0^1 du\; 1_{[0,t]}(u)\, 1_{[0,s]}(u) = s\wedge t, \tag{6.2.11}
\]
due to the fact that the system h_n^k is a complete orthonormal system. Now note that, from the definition of Brownian motion, for s < t,
\[
E B_t B_s = E\bigl[((B_t - B_s) + B_s)B_s\bigr] = E B_s^2 = s = t\wedge s, \tag{6.2.12}
\]
so the limiting covariance is that of Brownian motion. Finally, since the B_t^{(n)} are Gaussian processes whose covariances converge, the limit is necessarily Gaussian with the limiting covariance (Exercise! Hint: Show that the Fourier transforms converge!).
This provides B_t on [0, 1]. To construct B_t for t ∈ (k, k+1], just take k+1 independent copies of the B we just constructed, say B_{t,1}, …, B_{t,k+1}, via
\[
B_t = \sum_{i=1}^k B_{1,i} + B_{t-k,\,k+1}. \tag{6.2.13}
\]
Finally, to construct d-dimensional Brownian motion, take d independent copies of B_t, say B_{t,1}, …, B_{t,d}, and let e_µ, µ = 1, …, d, be an orthonormal basis of R^d. Then set
\[
\hat B_t \equiv \sum_{\mu=1}^d e_\mu B_{t,\mu}. \tag{6.2.14}
\]
It is easily checked that this process is a Brownian motion in R^d. This concludes the existence proof. □
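The construction used in this proof translates directly into code. A minimal sketch computes the partial sum B^{(N)} of the series (6.2.3) on a grid, with the Schauder functions f_n^k (the integrated Haar functions) written in closed form as tents of height 2^{−(n+1)/2}:

```python
import numpy as np

def schauder(n, k, t):
    """f_n^k(t) = int_0^t h_n^k(u) du: a tent of height 2^{-(n+1)/2}."""
    if n == 0:
        return t                                  # integral of h_0^0 == 1
    left = (2 * k) * 2.0 ** (-n)
    mid = (2 * k + 1) * 2.0 ** (-n)
    right = (2 * k + 2) * 2.0 ** (-n)
    up = np.clip(t, left, mid) - left             # rising part of the tent
    down = np.clip(t, mid, right) - mid           # falling part of the tent
    return 2.0 ** ((n - 1) / 2) * (up - down)

def brownian_path(N, t, rng):
    """Partial sum B^{(N)}_t of the series (6.2.3) with iid standard normals."""
    B = rng.standard_normal() * schauder(0, 0, t)
    for n in range(1, N + 1):
        for k in range(2 ** (n - 1)):
            B = B + rng.standard_normal() * schauder(n, k, t)
    return B

t = np.linspace(0.0, 1.0, 1025)
B = brownian_path(10, t, np.random.default_rng(1))
```

Already at level N = 10 the partial sums are visually indistinguishable from a Brownian path, consistent with the geometric error bound (6.2.5).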

Having constructed the random variable Bt in C(R+ , Rd ), we can now define its
distribution, the so-called Wiener measure.
For this it is useful to observe the following.

Lemma 6.3. The smallest σ-algebra, C, on C(R_+, R^d) that makes the marginals w ↦ w(t) measurable for all t ∈ R_+ coincides with the Borel σ-algebra, B ≡ B(C(R_+, R^d)), of the metrisable space C(R_+, R^d) equipped with the topology of uniform convergence on compact sets.

Proof. First, C ⊂ B, since the evaluation maps w ↦ w(t) are continuous and hence measurable with respect to the Borel σ-algebra B. To prove that B ⊂ C, we note that the topology of uniform convergence is equivalent to the metric topology relative to the metric
\[
d(w, w') \equiv \sum_{n\in\mathbb N} 2^{-n}\Bigl(\sup_{0\le t\le n}|w(t)-w'(t)|\wedge 1\Bigr). \tag{6.2.15}
\]
We thus have to show that any ball with respect to this distance is measurable with respect to C. But since the w are continuous functions,
\[
\sup_{t\in[0,n]}\bigl(|w(t)-w'(t)|\wedge 1\bigr) = \sup_{t\in[0,n]\cap\mathbb Q}\bigl(|w(t)-w'(t)|\wedge 1\bigr), \tag{6.2.16}
\]
and we see that, e.g., the set {w : d(w, 0) < ρ} is in fact in C. □

Note that by construction, the map ω 7→ B(ω) is measurable, since the maps
ω 7→ Bt (ω) are measurable for all t, and by definition of C, all coordinate maps
B 7→ Bt are measurable. Thus the following definition makes sense.

Definition 6.4. Let B_t be a Brownian motion in R^d defined on a probability space (Ω, F, P). The probability measure on (C(R_+, R^d), B(C(R_+, R^d))) given as the image of P under the map ω ↦ {B_t(ω), t ∈ R_+} is called the d-dimensional Wiener measure.

Note that uniqueness of the Wiener measure is a consequence of the Kolmogorov-


Daniell theorem, since we have already seen that the finite-dimensional distributions
are fixed by the prescription of the covariances.

6.3 Donsker’s invariance principle

We are now in the position to prove Donsker’s theorem.

Theorem 6.5. Let X_i be independent, identically distributed random variables with mean zero and variance one. Let Z_n(t) be as defined in (6.1.2). Then the processes Z_n(t), t ∈ [0, 1], converge in distribution to Brownian motion. More precisely, if B_t is a Brownian motion, then there exists a sequence of processes Z̃_n(t), t ∈ [0, 1], such that the process Z̃_n(t), t ∈ [0, 1], has the same distribution as Z_n(t), t ∈ [0, 1], and, for all ε > 0,
\[
\lim_{n\uparrow\infty} P\Bigl[\sup_{t\in[0,1]} \bigl|\tilde Z_n(t) - B_t\bigr| > \epsilon\Bigr] = 0. \tag{6.3.1}
\]

Remark. The assertion of the theorem implies what is called weak convergence in the uniform topology on [0, 1]. This means the following: Take any bounded function F : B([0, 1], R) → R that is continuous in the uniform topology, meaning that for any ε > 0 one can find δ > 0 such that whenever two functions w, w' satisfy sup_{t∈[0,1]} |w(t) − w'(t)| < δ, then |F(w) − F(w')| < ε. Then
\[
\lim_{n\uparrow\infty} E F(Z_n) = E F(B). \tag{6.3.2}
\]
This is easily proven from the assertion of our theorem: First,
\[
E F(Z_n) = E F(\tilde Z_n). \tag{6.3.3}
\]
Next,
\[
\begin{aligned}
E\bigl|F(\tilde Z_n) - F(B)\bigr| &\le E\Bigl[\bigl|F(\tilde Z_n) - F(B)\bigr|\, 1_{\sup_{t\in[0,1]}|\tilde Z_n(t)-B_t|\le\delta}\Bigr] \tag{6.3.4}\\
&\quad + E\Bigl[\bigl|F(\tilde Z_n) - F(B)\bigr|\, 1_{\sup_{t\in[0,1]}|\tilde Z_n(t)-B_t|>\delta}\Bigr]\\
&\le \epsilon + C\, P\Bigl[\sup_{t\in[0,1]}\bigl|\tilde Z_n(t) - B_t\bigr| > \delta\Bigr],
\end{aligned}
\]
with C = 2 sup|F| < ∞. This implies that
\[
\lim_{n\uparrow\infty} \bigl|E F(\tilde Z_n) - E F(B)\bigr| = 0. \tag{6.3.5}
\]
Obviously, the interval [0, 1] can be replaced with any other finite interval.
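A heuristic numerical illustration of (6.3.2): take F(w) = sup_{t≤1} w(t), which is continuous in the uniform topology (though unbounded, so strictly speaking outside the class treated above). By the reflection principle, sup_{t≤1} B_t has the law of |N(0,1)|, so E F(B) = √(2/π) ≈ 0.80; a Monte Carlo estimate of E F(Z_n) comes close, up to a discretisation bias of order n^{−1/2}:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_paths = 400, 20_000

# Monte Carlo estimate of E sup_t Z_n(t) for simple random walk increments.
steps = rng.choice([-1.0, 1.0], size=(n_paths, n)) / np.sqrt(n)
paths = np.cumsum(steps, axis=1)
est = np.maximum(paths.max(axis=1), 0.0).mean()   # Z_n(0) = 0, so the sup is >= 0

limit = np.sqrt(2.0 / np.pi)                      # E|N(0,1)|, the Brownian value
```

With these parameters `est` lands a few percent below `limit`, as expected from the negative discretisation bias of the running maximum.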

Proof. We will give an interesting proof of this theorem which does not use what we already know about finite dimensional distributions. For simplicity we consider the case d = 1 only. It is based on the famous Skorokhod embedding, which constructs any desired random walk from a Brownian motion. This goes as follows: we assume that F is the common distribution function of our random variables X_i, assumed to have finite second moment σ². We now want to construct stopping times, T, for the Brownian motion, B, such that (i) the law of B_T is F, and (ii) E(T) = σ². This is a little tricky. First, we construct a probability measure on (−R_+) × R_+ from the restrictions, F_±, of F to the positive and negative axes:
\[
\mu(da, db) \equiv \gamma\,(b-a)\, dF_-(a)\, dF_+(b), \tag{6.3.6}
\]
where γ provides the normalisation, i.e.,
\[
\gamma^{-1} = \int_0^\infty b\, dF_+(b) = -\int_{-\infty}^0 a\, dF_-(a). \tag{6.3.7}
\]

We need some elementary facts that are easy if we accept that our results on martingales carry over to continuous time.

Lemma 6.6. Let a < 0 < b and τ ≡ inf{t > 0 : B_t ∉ (a, b)}. Then
(i) P(B_τ = a) = b/(b−a);
(ii) E(τ) = |ab|.

Proof. As we will discuss shortly, B_t is a martingale, and let us anticipate that Doob's optional stopping theorem also holds for Brownian motion. Then 0 = E(B_τ) = bP[B_τ = b] + aP[B_τ = a] = b + (a − b)P[B_τ = a], which gives (i). To prove (ii), consider
\[
M_t = (B_t - a)(b - B_t) + t, \tag{6.3.8}
\]
which is a martingale with M_0 = −ab. On the other hand (again assuming that we can use the optional stopping theorem),
\[
E(M_0) = E(M_\tau) = E(\tau) + E\bigl((B_\tau - a)(b - B_\tau)\bigr) = E(\tau) + 0, \tag{6.3.9}
\]
which gives the claimed result. □

The Skorokhod embedding is now constructed by choosing α < 0 < β at random from µ, and T = inf{t > 0 : B_t ∉ (α, β)}. Then:

Theorem 6.7. The law of B_T is F and E(T) = σ².

Proof. Let b > 0. Then
\[
P(B_T \in db) = \int_{-\infty}^0 \gamma\,(b-a)\,\frac{-a}{b-a}\, dF_-(a)\, dF_+(b) = dF_+(b). \tag{6.3.10}
\]
Analogously, for a < 0, P(B_T ∈ da) = dF_−(a). This proves the first assertion. Finally, by a simple computation,
\[
E(T) = \int_0^\infty \int_{-\infty}^0 \mu(da, db)\, |ab| = \int_{-\infty}^\infty x^2\, dF(x) = \sigma^2. \tag{6.3.11}
\]
This proves (ii). □

Exercise. Construct the Skorokhod embedding for the simple random walk on Z.
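For the simple random walk the exercise has an explicit answer: F puts mass ½ on each of ±1, so µ is concentrated on (α, β) = (−1, 1), T is the exit time of (−1, 1), and Lemma 6.6 gives E(T) = 1 and P(B_T = ±1) = ½. A sketch checking this on a time-discretised Brownian motion (grid monitoring biases T slightly upward):

```python
import numpy as np

rng = np.random.default_rng(3)
dt, horizon, n_paths = 2e-3, 8.0, 1000
n_steps = int(horizon / dt)

# Discretised Brownian paths monitored on a grid of mesh dt.
B = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)

hit = np.abs(B) >= 1.0                 # exit from the interval (-1, 1)
exited = hit.any(axis=1)
first = hit.argmax(axis=1)             # index of the first exit
T = (first[exited] + 1) * dt
sign = np.sign(B[exited, first[exited]])

mean_T, mean_sign = T.mean(), sign.mean()
```

`mean_T` comes out close to E(T) = 1 and `mean_sign` close to 0, i.e. B_T is (approximately) a fair ±1 coin flip, as the embedding predicts.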
We can now define a sequence of stopping times T_1 = T, T_2 = T_1 + T'_2, …, where the T'_i are independent and constructed in the same way as T on the Brownian motions B_{T_{i−1}+t} − B_{T_{i−1}}. Then it follows immediately from the preceding theorem that:

Theorem 6.8. The process S̃_n, n ∈ N, where S̃_n ≡ B_{T_n} for all n ∈ N, has the same distribution as the process S_n ≡ Σ_{i=1}^n X_i, where the X_i are iid with distribution F. Similarly, there are stopping times T_k^n such that Z̃_n(t) ≡ B_{T^n_{[nt]}} has the same distribution as Z_n(t), and the T_k^n have the same distribution as T_k/n.

Proof. Let X_i be iid with distribution function F. By Theorem 6.7, the random variables X̃_i ≡ B_{T_i} − B_{T_{i−1}} are iid with the same distribution as the X_i. Therefore, S_n has the same law as B_{T_n}, and Z_n(t) has the same distribution as n^{−1/2} B_{T_{[nt]}}. However, we can also construct the Skorokhod embedding to reproduce the random variables n^{−1/2} X_i as B_{T_i^n} − B_{T_{i−1}^n}. Then Z_n(t) also has the same distribution as Z̃_n(t) ≡ B_{T^n_{[nt]}}.
Now we use an important property of Brownian motion:


6.3 Donsker’s invariance principle 111

Lemma 6.9. For any a ∈ R+ , the processes Bt and Bat ≡ a−1 Bta2 have the same dis-
tribution.

Proof. Obviously, Ba is a Gaussian process. It suffices to show that B and Ba have


the same covariance. But trivially

EBat Bas = a−2 EBa2 t Ba2 s = a−2 (a2 t) ∧ (a2 s) = s ∧ t. (6.3.12)

which is the covariance of B. u


t

From the scaling property it follows easily that T in have the same law as T i /n.
This proves the theorem. u
t

The Skorokhod embedding provides the means to prove Donsker's theorem. Namely, we will show that the process Z̃_n(t) converges uniformly to B_t in probability. This is possible since it is coupled to B_t realisation-wise, unlike the original Z_n(t), which would not know which particular B_t(ω) it should stick with. We will set σ² = 1.
Note first that, by the continuity of Brownian motion, for any ε > 0 we can find δ > 0 such that
\[
P\bigl(\exists u,t\in[0,1],\ |u-t|\le\delta \text{ s.t. } |B_u - B_t| > \epsilon\bigr) \le \epsilon/2. \tag{6.3.13}
\]
Next, by the independence of the T'_i and the law of large numbers,
\[
\lim_{n\to\infty} \frac{T_n}{n} = E(T) = 1, \quad a.s. \tag{6.3.14}
\]
Thus
\[
\lim_{n\to\infty} n^{-1}\sup_{k\le n} |T_k - k| = 0, \quad a.s. \tag{6.3.15}
\]
This holds since otherwise there exists, with positive probability, a sequence k_n ↑ ∞ with k_n ≤ n such that for all n, |T_{k_n}/k_n − 1| ≥ εn/k_n ≥ ε, for some ε > 0. But this contradicts (6.3.14). Therefore, there exists n_1 such that for all n ≥ n_1,
\[
P\Bigl[n^{-1}\sup_{k\le n}|T_k - k| \ge \delta/3\Bigr] \le \epsilon/2. \tag{6.3.16}
\]
Since the T_i^n have the same law as the T_i/n, this implies that also
\[
P\Bigl[\sup_{k\le n}\bigl|T_k^n - k/n\bigr| \ge \delta/3\Bigr] \le \epsilon/2. \tag{6.3.17}
\]

Finally, the process Z̃_n(t) coincides, for any t = k/n, with B_{T_k^n}. Therefore,
\[
\begin{aligned}
P\Bigl[\sup_{0\le t\le1}\bigl|\tilde Z_n(t) - B_t\bigr| \ge \epsilon\Bigr]
&\le P\Bigl[\sup_{0\le t\le1}\bigl|\tilde Z_n(t) - B_t\bigr| \ge \epsilon,\ |T_k^n - k/n| \le \delta/3,\ \forall_{k\le n}\Bigr]\\
&\quad + P\Bigl[\sup_{k\le n}\bigl|T_k^n - k/n\bigr| \ge \delta/3\Bigr]\\
&\le P\bigl[\exists k\le n,\ t\in[0,1],\ |T_k^n - t| \le \delta :\ |B_{T_k^n} - B_t| \ge \epsilon\bigr] + \epsilon/2\\
&\le P\bigl[\exists u,t\in[0,1],\ |u-t|\le\delta :\ |B_u - B_t| \ge \epsilon\bigr] + \epsilon/2 \le \epsilon. \tag{6.3.18}
\end{aligned}
\]
This implies that the difference between Z̃_n(t) and B_t converges uniformly in t ∈ [0, 1] to zero in probability. On the other hand, Z̃_n(t) has the same law as Z_n(t). This implies weak convergence as claimed. □

6.4 Martingale and Markov properties

Although we have not studied with full rigour the concepts of martingales and Markov processes in continuous time, Brownian motion is a good example on which to get provisionally acquainted with them. The nice thing here is that we know already that it has continuous paths, so that we need not worry about discontinuities; moreover, a path is determined by knowing it on a dense set of times, say the rational numbers, so we also need not worry about uncountability of the index set.

Proposition 6.10. Brownian motion is a continuous time martingale, in the sense that, if F_t = σ(B_s, s ≤ t), then, for any s < t,
\[
E[B_t | \mathcal F_s] = B_s. \tag{6.4.1}
\]

Proof. Of course we have not defined what a continuous time filtration is, but we will not worry about this at the moment, and just take F_t as the σ-algebra generated by {B_s}_{s≤t}. Now we know that B_t = B_t − B_s + B_s, where B_t − B_s and B_s are independent. Thus
\[
E[B_t|\mathcal F_s] = E[B_t - B_s + B_s|\mathcal F_s] = E[B_t - B_s|\mathcal F_s] + E[B_s|\mathcal F_s] = 0 + B_s, \tag{6.4.2}
\]
as claimed. □
Next we show that Brownian motion is also a Markov process. As a definition of
a continuous time Markov process, we adopt the obvious generalisation of (3.3.21).

Definition 6.11. A stochastic process with state space S and index set R_+ is called a continuous time Markov process if there exists a two-parameter family of probability kernels, P_{s,t}, satisfying the Chapman–Kolmogorov equations,
\[
P_{s,t}(x, A) = \int_S P_{r,t}(y, A)\, P_{s,r}(x, dy), \qquad \forall r\in(s,t),\ A\in\mathcal B, \tag{6.4.3}
\]
such that for all A ∈ B and s < t ∈ R_+,
\[
P[B_t\in A\,|\,\mathcal F_s](\omega) = P_{s,t}(B_s(\omega), A), \quad a.s. \tag{6.4.4}
\]

This definition may not sound abstract enough, because it stipulates that we search for the kernels P_{s,t}; one may replace this by saying that
\[
P[B_t \in A\,|\,\mathcal F_s] \tag{6.4.5}
\]
is independent of the σ-algebras F_r, for all r < s; or in other words, that P[B_t ∈ A|F_s](ω) is a function of B_s(ω), a.s. You can see that we will have to worry a little
bit about these definitions in general, but by the continuity of Brownian motion, we
may just look at rational times and then no problem arises. We come to these things
in the next course. We see that the two definitions are really the same, using the
existence of regular conditional probabilities: namely, P s,t will be just the regular
version of P[Bt ∈ A|F s ].

Proposition 6.12. Brownian motion in dimension d is a continuous time Markov process with transition kernel
\[
P_{s,t}(x, A) = \frac{1}{(2\pi(t-s))^{d/2}} \int_A \exp\Bigl(-\frac{|y-x|^2}{2(t-s)}\Bigr)\, dy. \tag{6.4.6}
\]

Proof. The proof is next to trivial from the defining property (i) of Brownian motion and left as an exercise. □

We now come, again somewhat informally, to the martingale problem associated


with Brownian motion.

Theorem 6.13. Let f be a twice differentiable function with bounded second derivatives. Let B_t be Brownian motion. Then
\[
M_t = f(B_t) - f(B_0) - \frac12 \int_0^t \Delta f(B_s)\, ds \tag{6.4.7}
\]
is a martingale.

Proof. We consider for simplicity only the case d = 1; the general case works the same way. We proceed as in the discrete time case.
\[
\begin{aligned}
E[M_{t+r}|\mathcal F_t] &= f(B_t) - f(B_0) - \frac12\int_0^t f''(B_s)\,ds \tag{6.4.8}\\
&\quad + E[f(B_{t+r}) - f(B_t)|\mathcal F_t] - \frac12\int_0^r E[f''(B_{t+s})|\mathcal F_t]\,ds\\
&= M_t + \frac{1}{\sqrt{2\pi r}}\int_{\mathbb R} f(y)\exp\Bigl(-\frac{(y-B_t)^2}{2r}\Bigr)dy - f(B_t)\\
&\quad - \frac12\int_0^r \frac{1}{\sqrt{2\pi s}}\int_{\mathbb R} f''(y)\exp\Bigl(-\frac{(y-B_t)^2}{2s}\Bigr)dy\,ds\\
&= M_t.
\end{aligned}
\]
The last equality holds since, using integration by parts,
\[
\begin{aligned}
\frac{1}{\sqrt{2\pi s}}\int_{\mathbb R} f''(y)\exp\Bigl(-\frac{(y-x)^2}{2s}\Bigr)dy
&= \int_{\mathbb R} f(y)\,\frac{d^2}{dy^2}\,\frac{1}{\sqrt{2\pi s}}\exp\Bigl(-\frac{(y-x)^2}{2s}\Bigr)dy \tag{6.4.9}\\
&= \frac{1}{\sqrt{2\pi}}\int_{\mathbb R} f(y)\bigl[-s^{-3/2} + (y-x)^2 s^{-5/2}\bigr]\exp\Bigl(-\frac{(y-x)^2}{2s}\Bigr)dy\\
&= 2\int_{\mathbb R} f(y)\,\frac{d}{ds}\,\frac{1}{\sqrt{2\pi s}}\exp\Bigl(-\frac{(y-x)^2}{2s}\Bigr)dy.
\end{aligned}
\]
Integrating the last expression in (6.4.8) over s yields
\[
\frac{2}{\sqrt{2\pi r}}\int_{\mathbb R} f(y)\exp\Bigl(-\frac{(x-y)^2}{2r}\Bigr)dy - 2f(x), \tag{6.4.10}
\]
where we used that
\[
\lim_{h\downarrow0} \frac{1}{\sqrt{2\pi h}}\int_{\mathbb R} f(y)\exp\Bigl(-\frac{(x-y)^2}{2h}\Bigr)dy = f(x). \tag{6.4.11}
\]
Inserting this into (6.4.8) concludes the proof. □
Note that we really used that the function
\[
e(t, x) \equiv \frac{1}{\sqrt{2\pi t}}\exp\Bigl(-\frac{|x|^2}{2t}\Bigr) \tag{6.4.12}
\]
satisfies the (parabolic) partial differential equation
\[
\frac{\partial}{\partial t}\, e(t, x) = \frac12\, \Delta e(t, x), \tag{6.4.13}
\]
with the (singular) initial condition
\[
\lim_{t\downarrow0} e(t, x) = \delta(x) \tag{6.4.14}
\]
(where δ here denotes the Dirac delta function, i.e., for any bounded integrable function f, ∫_R δ(x)f(x)dx = f(0)). e(t, x) is called the heat kernel associated to (one-dimensional) Brownian motion.
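That e(t, x) solves the heat equation (6.4.13) can be confirmed numerically by a quick finite-difference check (a sanity check only, not a proof):

```python
import numpy as np

def heat_kernel(t, x):
    """e(t, x) = (2 pi t)^{-1/2} exp(-x^2/(2t)), cf. (6.4.12) with d = 1."""
    return np.exp(-x ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

t0, dt, dx = 1.0, 1e-6, 1e-3
x = np.linspace(-3.0, 3.0, 601)

# Central differences for d/dt e and (1/2) d^2/dx^2 e.
de_dt = (heat_kernel(t0 + dt, x) - heat_kernel(t0 - dt, x)) / (2.0 * dt)
half_lap = 0.5 * (heat_kernel(t0, x + dx) - 2.0 * heat_kernel(t0, x)
                  + heat_kernel(t0, x - dx)) / dx ** 2

err = np.max(np.abs(de_dt - half_lap))
```

The residual `err` is of the order of the finite-difference truncation error, confirming (6.4.13) pointwise on the grid.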

Remark. Let us note that if we rewrite (6.4.7) in the form
\[
f(B_t) = f(B_0) + M_t + \frac12\int_0^t \Delta f(B_s)\,ds, \tag{6.4.15}
\]
it formally resembles the Itô formula (4.5.6) that we derived formally in Section 4. The martingale M_t should then play the rôle of the stochastic integral, i.e. we would like to think of
\[
M_t = \int_0^t \nabla f(B_s)\cdot dB_s. \tag{6.4.16}
\]
It will turn out that this is indeed a correct interpretation and that (6.4.15) is the Itô formula for Brownian motion.

The preceding theorem justifies calling L = ½Δ the generator of Brownian motion, and thinking of (6.4.7) as the associated martingale problem. The connection between Markov processes and potential theory, established for discrete time Markov processes, also carries over to Brownian motion; in this case, it links to the classical potential theory associated to the Laplace operator Δ.

6.5 Sample path properties

We have constructed Brownian motion on a space of continuous paths. What else can we say about the properties of these paths? The striking feature is that Brownian paths are almost surely nowhere differentiable!
The following theorem shows even more: they are not even Lipschitz continuous anywhere.

Theorem 6.14. For almost all ω, B(ω) is nowhere Lipschitz continuous.

Proof. Let K > 0 and define
\[
A_{n,K} \equiv \{\omega\in\Omega : \exists_{s\in[0,1]}\,\forall_{|t-s|\le 2/n}\ |B_t - B_s| \le K|t-s|\}. \tag{6.5.1}
\]
Clearly,
\[
A_{n,K} \subset \bigcup_{k=2}^{n} \bigl\{|B_{j/n} - B_{(j-1)/n}| \le 4K/n, \text{ for } j\in\{k-1,k,k+1\}\bigr\}. \tag{6.5.2}
\]
Now
\[
P[A_{n,K}] \le (n-1)\bigl(P[|B_{1/n} - B_0| \le 4K/n]\bigr)^3 \le (n-1)\bigl(P[|B_{1/n}| \le 4K/n]\bigr)^3 \le C n^{-1/2} \tag{6.5.3}
\]
for the finite constant C = (8K/\sqrt{2\pi})^3. Now A_{n,K} ⊂ A_{n+1,K}, and so for all n and all K,
\[
P[A_{n,K}] \le \lim_{\ell\to\infty} P[A_{\ell,K}] = 0. \tag{6.5.4}
\]
Finally, by monotonicity of the Lipschitz property, it follows that
\[
P[\exists_{K<\infty}\ A_{n,K}] \le \sum_{K\in\mathbb N} P[A_{n,K}] = 0. \tag{6.5.5}
\]
□

Remark. The argument used in the proof can be extended to show that Brownian motion is nowhere Hölder continuous with exponent larger than 1/2. Namely, for α > 1/2, let k be chosen such that k(α − 1/2) > 1. Then define
\[
A_{n,K} \equiv \{\omega\in\Omega : \exists_{s\in[0,1]}\,\forall_{|t-s|\le k/n}\ |B_t - B_s| \le K|t-s|^\alpha\}. \tag{6.5.6}
\]
We then obtain that
\[
P[A_{n,K}] \le (n-1)\bigl(P[|B_{1/n} - B_0| \le 2kK/n^\alpha]\bigr)^k \le (n-1)\bigl(P[|B_{1/n}| \le 2kK/n^\alpha]\bigr)^k \le C n^{-k(\alpha-1/2)+1}, \tag{6.5.7}
\]
which yields the conclusion as in the case α = 1.

An important notion is that of the quadratic variation. Let t_k^n ≡ (k2^{−n}) ∧ t and set
\[
[B]_t^n \equiv \sum_{k=1}^\infty \bigl[B_{t_k^n} - B_{t_{k-1}^n}\bigr]^2. \tag{6.5.8}
\]

Lemma 6.15. With probability one, as n → ∞, [B]_t^n → t, uniformly on compact intervals.

Proof. Note that all sums over k contain only finitely many non-zero terms, and that all the summands in (6.5.8) are independent random variables, satisfying (for t_k^n ≤ t)
\[
E\bigl(B_{t_k^n} - B_{t_{k-1}^n}\bigr)^2 = 2^{-n}, \tag{6.5.9}
\]
\[
\operatorname{var}\Bigl(\bigl(B_{t_k^n} - B_{t_{k-1}^n}\bigr)^2\Bigr) = 2\cdot 2^{-2n}. \tag{6.5.10}
\]
Thus
\[
E[B]_t^n = t, \qquad \operatorname{var}\bigl([B]_t^n\bigr) \le 2^{-n+1}\, t, \tag{6.5.11}
\]
and thus
\[
\lim_{n\to\infty} [B]_t^n = t, \quad a.s. \tag{6.5.12}
\]
By telescopic expansion,
\[
\begin{aligned}
B_t^2 - B_0^2 &= \sum_{k=1}^\infty \bigl(B_{t_k^n}^2 - B_{t_{k-1}^n}^2\bigr) \tag{6.5.13}\\
&= \sum_{k=1}^\infty \bigl(B_{t_k^n} - B_{t_{k-1}^n}\bigr)\bigl(B_{t_k^n} + B_{t_{k-1}^n}\bigr)\\
&= \sum_{k=1}^\infty 2B_{t_{k-1}^n}\bigl(B_{t_k^n} - B_{t_{k-1}^n}\bigr) + [B]_t^n.
\end{aligned}
\]
Now set
\[
V_t^n \equiv B_t^2 - [B]_t^n = \sum_{k=1}^\infty 2B_{t_{k-1}^n}\bigl(B_{t_k^n} - B_{t_{k-1}^n}\bigr). \tag{6.5.14}
\]
One can check easily that, for any n, V^n is a martingale. Then also
\[
V_t^n - V_t^{n+1} = [B]_t^{n+1} - [B]_t^n \tag{6.5.15}
\]
is a martingale. If we accept that Doob's L²-inequality (Theorem 4.22) applies in the continuous martingale case as well, we get that, for any T < ∞,
\[
\Bigl\| \sup_{0\le t\le T} \bigl| [B]_t^{n+1} - [B]_t^n \bigr| \Bigr\|_2 \le 2\, \bigl\| [B]_T^{n+1} - [B]_T^n \bigr\|_2 = 2\sqrt{T\,2^{-n-1}}, \tag{6.5.16}
\]
where the last equality is obtained by explicit computation. This implies that [B]_t^n converges uniformly on compact intervals. □
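The dyadic quadratic variation of (6.5.8) is easy to compute from a simulated path: coarsen a finely sampled path to mesh 2^{−n} and sum the squared increments. A sketch (here for t = 1):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 16                                   # finest dyadic level, mesh 2^{-N}

# One Brownian path sampled on the grid k 2^{-N}, k = 0, ..., 2^N.
B = np.concatenate([[0.0],
                    np.cumsum(rng.standard_normal(2 ** N) * 2.0 ** (-N / 2))])

def quad_var(B, n, N):
    """[B]^n_1: sum of squared increments over the dyadic grid of mesh 2^{-n}."""
    incr = np.diff(B[::2 ** (N - n)])
    return float(np.sum(incr ** 2))

qv = [quad_var(B, n, N) for n in range(4, N + 1)]
```

The values in `qv` fluctuate around 1 with variance of order 2^{−n}, in line with (6.5.11).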

Remark. Lemma 6.15 plays a crucial rôle in stochastic calculus. It justifies the claim that d[B]_t = dt. If we go with this into our "discrete Itô formula" (Section 4.6), it justifies in a more precise way the step from Eq. (4.5.3) to Eq. (4.5.6).

Remark. The definition of the quadratic variation we adopt here via dyadic partitions is different from the "true" quadratic variation, which would be
\[
\sup\Bigl\{\sum_{k=1}^n [B_{t_k} - B_{t_{k-1}}]^2 : n\in\mathbb N,\ 0 = t_0 < t_1 < \cdots < t_n = 1\Bigr\}, \tag{6.5.17}
\]
and which can be shown to be infinite almost surely (note that the choices of the t_i can be adapted to the specific realisation of the BM). The dyadic version above is, however, important in the construction of stochastic integrals.

Remark. The fact that the quadratic variation of BM converges to t implies that the linear variation,
\[
\sum_{k=1}^\infty \bigl|B_{t_k^n} - B_{t_{k-1}^n}\bigr|, \tag{6.5.18}
\]
diverges, as n → ∞, on every interval. This means in particular that the length of a Brownian path between any two times t, t' is infinite.
6.6 The law of the iterated logarithm

How precisely random phenomena can be controlled is witnessed by the so-called law of the iterated logarithm (LIL). It states (not in its most general form):

Theorem 6.16. Let S_n = Σ_{i=1}^n X_i, where the X_i are independent identically distributed random variables with mean zero and variance σ². Then
\[
P\Bigl[\limsup_{n\to\infty} \frac{S_n}{\sigma\sqrt{2n\ln\ln n}} = 1\Bigr] = 1. \tag{6.6.1}
\]

Remark. Just as the CLT, the LIL has extensions to the case of non-identically distributed random variables. For a host of results, see [4], Chapter 10. Furthermore, there are extensions to the case of martingales, under similar conditions as for the CLT.

The nicest proof of this result passes through the analogous result for Brownian motion and then uses the Skorokhod embedding theorem. The proof below follows [11]. Thus we want to prove first:

Theorem 6.17. Let B_t be a one-dimensional Brownian motion. Then
\[
P\Bigl[\limsup_{t\to\infty} \frac{B_t}{\sqrt{2t\ln\ln t}} = 1\Bigr] = 1, \tag{6.6.2}
\]
and
\[
P\Bigl[\limsup_{t\downarrow0} \frac{B_t}{\sqrt{2t\ln\ln(1/t)}} = 1\Bigr] = 1. \tag{6.6.3}
\]
Proof. Note first that the two statements are equivalent, since the two processes $B_t$ and $tB_{1/t}$ have the same law (Exercise!).
We concentrate on (6.6.3). Set $h(t)\equiv\sqrt{2t\ln\ln(1/t)}$. Basically, the idea is to use exponentially shrinking subsequences $t_n\equiv\theta^n$ in such a way that the variables $B_{t_n}$ are essentially independent. Then, for the lower bound, it is enough to show that along such a subsequence, the threshold $h(t_n)$ is reached infinitely often: this proves that the limsup is as large as claimed. For the upper bound, one shows that along such subsequences, the threshold $h(t_n)$ is not exceeded, and then uses a maximum inequality for martingales to control the intermediate values of $t$.
We first show that $\limsup_{t\downarrow0}(\cdots)\le1$. For this we will assume that we can use Doob's submartingale inequality, Theorem 4.19, also in the continuous-time case. Define
\[
Z_t \equiv \exp\Big(\alpha B_t - \frac12\alpha^2 t\Big).\tag{6.6.4}
\]
A simple calculation shows that $Z_t$ is a martingale (with $E(Z_t)=1$), and so
\[
P\Big[\sup_{s\le t}(B_s-\alpha s/2)>\beta\Big]
=P\Big[\sup_{s\le t}\mathrm e^{\alpha B_s-\alpha^2 s/2}>\mathrm e^{\alpha\beta}\Big]
\le \mathrm e^{-\alpha\beta}E(Z_t)=\mathrm e^{-\alpha\beta}.\tag{6.6.5}
\]
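A quick Monte Carlo sanity check of (6.6.4)-(6.6.5) (an illustration, not part of the notes; it assumes numpy, and the discretisation of $\sup_s$ only underestimates the true supremum):

```python
import numpy as np

# Illustration only: check E(Z_t) = 1 and the maximal bound
# P[sup_s (B_s - alpha*s/2) > beta] <= exp(-alpha*beta) by simulation.
rng = np.random.default_rng(2)
paths, steps, t = 20_000, 512, 1.0
alpha, beta = 1.0, 1.5
dt = t / steps

B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(paths, steps)), axis=1)
Z_t = np.exp(alpha * B[:, -1] - 0.5 * alpha**2 * t)

s = dt * np.arange(1, steps + 1)
p_hat = ((B - alpha * s / 2).max(axis=1) > beta).mean()
bound = np.exp(-alpha * beta)
print(f"E(Z_t) ~ {Z_t.mean():.3f}, P-hat = {p_hat:.4f} <= {bound:.4f}")
```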

Let $\theta,\delta\in(0,1)$, and choose $t_n=\theta^n$, $\alpha_n=\theta^{-n}(1+\delta)h(\theta^n)$, and $\beta_n=\frac12 h(\theta^n)$. Then
\[
P\Big[\sup_{s\le\theta^n}(B_s-\alpha_n s/2)>\beta_n\Big]\le n^{-(1+\delta)}(\ln 1/\theta)^{-(1+\delta)},\tag{6.6.6}
\]
since $\alpha_n\beta_n=(1+\delta)\ln\ln\theta^{-n}=(1+\delta)(\ln n+\ln\ln\theta^{-1})$. Therefore, the Borel-Cantelli


lemma implies that, almost surely, for all but finitely many values of $n$,
\[
\sup_{s\le\theta^n}\Big(B_s-\frac{s}{2}(1+\delta)\theta^{-n}h(\theta^n)\Big)\le\frac12 h(\theta^n).\tag{6.6.7}
\]

It follows that
\[
\sup_{s\le\theta^n}B_s\le\frac{\theta^n}{2}(1+\delta)\theta^{-n}h(\theta^n)+\frac12h(\theta^n)=\frac12(2+\delta)h(\theta^n),\tag{6.6.8}
\]

and so, for any $\theta^{n+1}\le t\le\theta^n$,
\[
B_t\le\sup_{s\le\theta^n}B_s\le\frac12(2+\delta)\theta^{-1/2}h(t),\tag{6.6.9}
\]

hence, almost surely,
\[
\limsup_{t\downarrow0}B_t/h(t)\le\frac12\theta^{-1/2}(2+\delta).\tag{6.6.10}
\]
Since this holds for any δ > 0 and θ < 1 almost surely, it holds along any countable
subsequence δk ↓ 0, θk ↑ 1, almost surely, and

\[
\limsup_{t\downarrow0}B_t/h(t)\le1,\quad\text{a.s.}\tag{6.6.11}
\]

To prove the converse inequality, consider the event

\[
A_n\equiv\big\{B_{\theta^n}-B_{\theta^{n+1}}>(1-\theta)^{1/2}h(\theta^n)\big\}.\tag{6.6.12}
\]

The events are independent, and their probability can be bounded easily using that, for any $u>0$,
\[
\frac1{\sqrt{2\pi}}\int_u^\infty \mathrm e^{-x^2/2}\,dx\ \ge\ \frac1{u\sqrt{2\pi}}\,\mathrm e^{-u^2/2}\big(1-2u^{-2}\big).\tag{6.6.13}
\]
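The bound (6.6.13) can be checked deterministically, since the left-hand side equals $\tfrac12\operatorname{erfc}(u/\sqrt2)$ (an illustration, not part of the notes):

```python
import math

# Deterministic check of the Gaussian tail lower bound (6.6.13), using
# (1/sqrt(2 pi)) * int_u^inf e^{-x^2/2} dx = erfc(u/sqrt(2)) / 2.
def gauss_tail(u):
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def lower_bound(u):
    return math.exp(-u * u / 2) * (1 - 2 / u**2) / (u * math.sqrt(2 * math.pi))

for u in (2.0, 3.0, 5.0, 10.0):
    print(f"u={u:4.1f}: tail={gauss_tail(u):.3e} >= bound={lower_bound(u):.3e}")
```

Note that the prefactor $(1-2u^{-2})$ only makes the bound useful for $u>\sqrt2$, which is all that is needed here, since $u=\theta^{-n/2}h(\theta^n)\to\infty$.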
This implies that
\[
P[A_n]=\frac1{\sqrt{2\pi\theta^n(1-\theta)}}\int_{(1-\theta)^{1/2}h(\theta^n)}^{\infty}\exp\Big(-\frac{x^2}{2\theta^n(1-\theta)}\Big)dx\tag{6.6.14}
\]
\[
=\frac1{\sqrt{2\pi}}\int_{\theta^{-n/2}h(\theta^n)}^{\infty}\exp\Big(-\frac{x^2}{2}\Big)dx
\ \ge\ \frac{\exp\big(-\theta^{-n}h(\theta^n)^2/2\big)}{\sqrt{2\pi}\,\theta^{-n/2}h(\theta^n)}\big(1-2\theta^n h(\theta^n)^{-2}\big)\equiv\gamma_n.
\]

Now $\theta^{-n}h(\theta^n)^2=2\ln n+2\ln\ln(1/\theta)$, and so
\[
\gamma_n\ge C\,\frac1{n\sqrt{\ln n}},\tag{6.6.15}
\]
so that $\sum_n\gamma_n=+\infty$; hence, the second Borel-Cantelli lemma implies that, with probability one, $A_n$ happens infinitely often, i.e. for infinitely many $n$,

\[
B_{\theta^n}\ge(1-\theta)^{1/2}h(\theta^n)+B_{\theta^{n+1}}.\tag{6.6.16}
\]

Now, the upper bound (6.6.11) also holds for −Bt , so that, almost surely, for all but
finitely many n,
\[
B_{\theta^{n+1}}\ge-h(\theta^{n+1}).\tag{6.6.17}
\]
But by some simple estimates,
\[
h(\theta^{n+1})=\theta^{1/2}h(\theta^n)\sqrt{\frac{\ln\ln(\theta^{-n}\theta^{-1})}{\ln\ln(\theta^{-n})}}\le\theta^{1/2}h(\theta^n)\big(1+O(\ln\theta^{-1}/n)\big),\tag{6.6.18}
\]

so that, for infinitely many $n$,
\[
B_{\theta^n}\ge\big((1-\theta)^{1/2}-2\theta^{1/2}\big)h(\theta^n).\tag{6.6.19}
\]
This implies that
\[
\limsup_{n\to\infty}B_{\theta^n}/h(\theta^n)\ge(1-\theta)^{1/2}-2\theta^{1/2},\tag{6.6.20}
\]
for all $\theta\in(0,1)$; hence, letting $\theta\downarrow0$,
\[
\limsup_{t\downarrow0}B_t/h(t)\ge1,\tag{6.6.21}
\]
which completes the proof. □

From the LIL for Brownian motion one can prove the LIL for random walk using
the Skorokhod embedding.

Proof (of Theorem 6.16). From the construction of the Skorokhod embedding, we know that we may choose $S_n(\omega)=B_{T_n}(\omega)$. The strong law of large numbers implies that $T_n/n\to1$, a.s., and so also $h(T_n)/h(n)\to1$, a.s.. Thus the upper bound follows trivially:
\[
\limsup_{n\to\infty}\frac{S_n}{h(n)}=\limsup_{n\to\infty}\frac{B_{T_n}}{h(T_n)}\le\limsup_{t\to\infty}\frac{B_t}{h(t)}=1.\tag{6.6.22}
\]
To prove the complementing lower bound, note that, by Kolmogorov's $0-1$-law, $\rho\equiv\limsup_{n\to\infty}S_n/h(n)$ is almost surely a constant (since the limsup is measurable with respect to the tail-$\sigma$-algebra). Then there exists $n_0<\infty$, such that for all $n\ge n_0$, $B_{T_n}/h(T_n)\le\rho$. Assume $\rho<1$; we will show that this leads to a contradiction with (6.6.2) of Theorem 6.17. To do so, we must show that the Brownian motion cannot rise too far in the intervals $[T_n,T_{n+1}]$. But recall that $T_{n+1}$ is defined as the stopping time at the random interval $[\alpha,\beta]$ of the Brownian motion $B_t$. We want to show that in no such interval can the BM climb by more than $\epsilon\sqrt{2n\ln\ln n}$. An explicit computation shows that
\[
\phi(x)\equiv P\Big[\sup_{t\le T_1}B_t>x\Big]=\gamma\int_{-\infty}^{0}dF_-(a)\int_x^{\infty}dF_+(b)\,(b-a)\,\frac{-a}{x-a},\tag{6.6.23}
\]
where the ratio $\frac{-a}{x-a}$ is the probability that the BM reaches $x$ before $a$ (i.e. before $T_1$) (the logic of the formula is that for $B_t$ to exceed $x$ before $T_1$, the random variable $\beta$ must be larger than $x$, and then $B_t$ may not hit the lower boundary before reaching $x$). Now we will be done by Borel-Cantelli, if
x). Now we will be done by Borel-Cantelli, if
X √
φ( 2n ln ln n) < ∞, (6.6.24)
n

or in fact the stronger but simpler condition


X √
φ( n) < ∞ (6.6.25)
n

holds for all  > 0. For then, except finitely often,

\[
\sup_{T_n<t<T_{n+1}}B_t\le h(n)(\rho+\epsilon),\tag{6.6.26}
\]
which implies
\[
\limsup_{t\to\infty}\frac{B_t}{h(t)}<\rho+\epsilon,\tag{6.6.27}
\]
which is smaller than $1$ if $\epsilon$ is, e.g., $(1-\rho)/2$, thus contradicting the result for BM.
We are left with checking (6.6.25). We may decompose $\phi$ as
\[
\phi(x)=\gamma\int_{-\infty}^{0}dF_-(a)\int_x^{\infty}dF_+(b)\,(b-x)\,\frac{-a}{x-a}
+\gamma\int_{-\infty}^{0}dF_-(a)\int_x^{\infty}dF_+(b)\,|a|
\equiv\phi_1(x)+\phi_2(x).\tag{6.6.28}
\]
Now $\sum_n\phi_2(\epsilon\sqrt n)<\infty$ if $\int_0^\infty\phi_2(\epsilon\sqrt x)\,dx<\infty$. Recalling the formula for $\gamma$, (6.3.7), we see that
\[
\int_0^\infty\phi_2(\epsilon\sqrt x)\,dx=\int_0^\infty\big(1-F(\epsilon\sqrt x)\big)dx=2\epsilon^{-2}\int_0^\infty(1-F(t))\,t\,dt\le\epsilon^{-2}E(X^2)<\infty.\tag{6.6.29}
\]
To deal with $\phi_1$, use that $x-a>x$, and then, as before,
\[
\phi_1(x)\le x^{-1}\int_x^\infty(b-x)\,dF_+(b).\tag{6.6.30}
\]

Comparing the sum to an integral, we must check the finiteness of
\[
\int_0^\infty dx\,\frac1{\epsilon\sqrt x}\int_{\epsilon\sqrt x}^\infty dF_+(b)\,(b-\epsilon\sqrt x)=2\epsilon^{-2}\int_0^\infty dt\int_t^\infty dF_+(b)\,(b-t),\tag{6.6.31}
\]
which again holds since $F$ has finite second moment. This concludes the proof. □

Remark. One can show more than what we did. For one thing, not only is $\limsup_t B_t/h(t)=+1$ (and hence, by symmetry, $\liminf_t B_t/h(t)=-1$), a.s.; it is also true that the set of limit points of the process $B_t/h(t)$ is the entire interval $[-1,1]$; i.e., for any $a\in[-1,1]$, there exist subsequences $t_n$ such that $\lim_n B_{t_n}/h(t_n)=a$.

The LIL shows that at any given time $t$, the increment of BM, $B_{t+\delta}-B_t$, grows as fast as $\sqrt{2\delta\ln\ln(1/\delta)}$. The following theorem, called Lévy's theorem, shows that there are exceptional times when it increases even faster.

Theorem 6.18. Let $B$ be Brownian motion. Then
\[
P\left[\limsup_{\delta\downarrow0}\sup_{t\in[0,1]}\frac{B_{t+\delta}-B_t}{\sqrt{2\delta|\ln\delta|}}=1\right]=1.\tag{6.6.32}
\]

Remark. This theorem implies in particular that Brownian motion is almost surely Hölder continuous with exponent $\alpha$, i.e.
\[
P\left[\limsup_{\delta\downarrow0}\sup_{t\in[0,1]}\delta^{-\alpha}|B_{t+\delta}-B_t|=0\right]=1,\tag{6.6.33}
\]

for any $\alpha<1/2$. This is a basic property of Brownian motion that one should memorise. But Theorem 6.18 is sharper than that: it states that, almost surely, on any compact interval, there will be points where BM increases like $\sqrt{\delta|\ln\delta|}$, faster than what one would guess from the LIL, which states that at any given point, it increases like $\sqrt{\delta\ln|\ln\delta|}$!
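As an illustration of the Hölder property (not from the notes; a simulation sketch assuming numpy and a fixed seed), one can estimate the exponent from the maximal increments of a sampled path at dyadic scales:

```python
import numpy as np

# Illustration only: estimate the Hoelder exponent of a sampled BM path
# from the maximal increment at dyadic scales delta = 2^{-n}.
rng = np.random.default_rng(3)
N = 2**16
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / N), size=N))])

levels = np.arange(6, 15)
max_inc = [np.abs(np.diff(B[::N // 2**n])).max() for n in levels]

# log2(max increment) ~ -alpha * n, so minus the fitted slope estimates alpha
alpha_hat = -np.polyfit(levels, np.log2(max_inc), 1)[0]
print(f"estimated Hoelder exponent: {alpha_hat:.2f}")
```

The estimate comes out slightly below $1/2$, reflecting the logarithmic correction of Theorem 6.18 at these scales.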

Proof. The proof we give here is due to Lévy and differs from that of the LIL in that it does not use maximum inequalities for the upper bound, but a new technique, called chaining. We first prove the lower bound. For $\epsilon\in(0,1)$, it is enough to exhibit candidates for the highly singular behaviour:
\[
P\Big[\max_{k\le2^n}\big(B_{k2^{-n}}-B_{(k-1)2^{-n}}\big)\le(1-\epsilon)\sqrt{2^{1-n}\ln2^n}\Big]\tag{6.6.34}
\]
\[
=\Big(1-P\big[B_{2^{-n}}>(1-\epsilon)\sqrt{2^{1-n}\ln2^n}\big]\Big)^{2^n}
=\Big(1-P\big[B_1>(1-\epsilon)\sqrt{2\ln2^n}\big]\Big)^{2^n}
\]
\[
\le\Big(1-\frac1{\sqrt{2\pi\,2n\ln2}}\exp\big(-(1-\epsilon)^2\ln2^n\big)\Big)^{2^n}
\le\exp\Big(-\frac{2^{n(2\epsilon-\epsilon^2)}}{\sqrt{2\pi\,2n\ln2}}\Big)\sim\exp\big(-2^{(2\epsilon-\epsilon^2)n}\big),
\]

which tends to zero and is summable over $n$, for any $\epsilon>0$. By the first Borel-Cantelli lemma, this implies that the event considered can happen only for finitely many values of $n$, almost surely. Thus
\[
P\left[\limsup_{\delta\downarrow0}\sup_{t\in[0,1]}\frac{B_{t+\delta}-B_t}{\sqrt{2\delta|\ln\delta|}}\ge1\right]=1.\tag{6.6.35}
\]
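The first step of the proof can be illustrated numerically (not part of the notes; a sketch assuming numpy and a fixed seed): on the dyadic grid, the largest normalised increment is indeed close to $1$ for large $n$:

```python
import numpy as np

# Illustration only: max_{k <= 2^n} (B_{k 2^-n} - B_{(k-1) 2^-n})
# divided by sqrt(2^{1-n} ln 2^n) should be close to 1 for large n.
rng = np.random.default_rng(4)
n = 16
incr = rng.normal(0.0, np.sqrt(2.0**-n), size=2**n)   # dyadic BM increments
ratio = incr.max() / np.sqrt(2.0**(1 - n) * np.log(2.0**n))
print(f"normalised maximal dyadic increment at n={n}: {ratio:.3f}")
```

This is just the classical fact that the maximum of $N$ i.i.d. standard Gaussians is close to $\sqrt{2\ln N}$, here with $N=2^n$.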

The upper bound is more tricky and uses an interesting technique of chaining. We first establish that the required bounds hold on a $2^{-n}$ grid. By convention we set $h(\epsilon)\equiv\sqrt{2\epsilon|\ln\epsilon|}$. Then we estimate
\[
P\Big[\max_{j\equiv j_2-j_1\le2^{\epsilon n},\ j_i\le2^n}h(j2^{-n})^{-1}\big|B_{j_2 2^{-n}}-B_{j_1 2^{-n}}\big|>1+2\epsilon\Big]\tag{6.6.36}
\]
\[
\le 2^{(1+\epsilon)n}\,P\big[B_{j2^{-n}}>(1+2\epsilon)h(j2^{-n})\big]
\le 2^{(1+\epsilon)n}\,P\big[|B_1|>(1+2\epsilon)\sqrt{2\ln2^{n(1-\epsilon)}}\big]
\]
\[
\le 2^{(1+\epsilon)n}\,\frac{2}{\sqrt{2\pi\,2n\ln2}}\exp\big(-(1+2\epsilon)^2\ln2^{n(1-\epsilon)}\big)\le 2^{-\epsilon n},
\]
for $n$ large enough.
This bound is summable over $n$, so that, by the first Borel-Cantelli lemma, almost surely, there exists an $n(\omega)<\infty$, such that for all $n\ge n(\omega)$,
\[
\max_{j\equiv j_2-j_1\le2^{\epsilon n},\ j_i\le2^n}h(j2^{-n})^{-1}\big|B_{j_2 2^{-n}}-B_{j_1 2^{-n}}\big|\le1+2\epsilon.\tag{6.6.37}
\]

We may choose $n(\omega)$ in such a way that $2^{\epsilon(n+1)}-1>2$ and $2^{-n(1-\epsilon)}<1/\mathrm e$, and
\[
\sum_{m=n+1}^{\infty}h(2^{-m})\le\epsilon\,h\big(2^{-(1-\epsilon)(n+1)}\big),\tag{6.6.38}
\]
for all $n>n(\omega)$.


Now let $t_1<t_2$ in $[0,1]$ be such that $\delta\equiv t_2-t_1<2^{-n(\omega)(1-\epsilon)}$, and choose $n\ge n(\omega)$ such that $2^{-(n+1)(1-\epsilon)}\le\delta\le2^{-n(1-\epsilon)}$. Obviously, we can represent the numbers $t_i$ in a binary representation as
\[
t_1=j_1 2^{-n}-2^{-n_1}-2^{-n_2}-\cdots,\qquad
t_2=j_2 2^{-n}+2^{-m_1}+2^{-m_2}+\cdots,
\]
with $n<n_1<n_2<\cdots$ and $n<m_1<m_2<\cdots$.

By our estimates, we then have the bound
\[
|B_{t_2}-B_{t_1}|\le|B_{j_1 2^{-n}}-B_{t_1}|+|B_{j_2 2^{-n}}-B_{t_2}|+|B_{j_1 2^{-n}}-B_{j_2 2^{-n}}|\tag{6.6.39}
\]
\[
\le 2\sum_{m>n}(1+2\epsilon)h(2^{-m})+(1+2\epsilon)h(j2^{-n})
\le 2(1+2\epsilon)\epsilon\,h\big(2^{-(n+1)(1-\epsilon)}\big)+(1+2\epsilon)h(j2^{-n})
\le(1+4\epsilon)h(\delta).
\]

This provides the upper bound and concludes the proof. □
References

1. H. Bauer and R. Burckel. Probability theory and elements of measure theory. Academic Press
London, 1981.
2. P. Billingsley. Probability and measure. Wiley Series in Probability and Mathematical Statis-
tics. John Wiley & Sons Inc., New York, 1995.
3. A. Bovier. Statistical mechanics of disordered systems. Cambridge Series in Statistical and
Probabilistic Mathematics. Cambridge University Press, Cambridge, 2006.
4. Y. Chow and H. Teicher. Probability theory. Springer Texts in Statistics. Springer-Verlag,
New York, 1997.
5. J. Doob. Measure theory, volume 143. Springer, 1994.
6. H.-O. Georgii. Gibbs measures and phase transitions, volume 9 of de Gruyter Studies in
Mathematics. Walter de Gruyter & Co., Berlin, 1988.
7. K. Itô and H. P. McKean, Jr. Diffusion processes and their sample paths. Die Grundlehren
der Mathematischen Wissenschaften, Band 125. Academic Press Inc., Publishers, New York,
1965.
8. O. Kallenberg. Random measures. Akademie-Verlag, Berlin, 1983.
9. M. Ledoux and M. Talagrand. Probability in Banach spaces, volume 23 of Ergebnisse der
Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)].
Springer-Verlag, Berlin, 1991.
10. M. Rao. Measure theory and integration, volume 265. CRC, 2004.
11. L. Rogers and D. Williams. Diffusions, Markov processes, and martingales, volume 1. Cam-
bridge University Press, 2000.
12. B. Simon. The statistical mechanics of lattice gases. Vol. I. Princeton Series in Physics.
Princeton University Press, Princeton, NJ, 1993.
13. D. Stroock and S. Varadhan. Multidimensional diffusion processes, volume 233. Springer,
1979.
14. M. Taylor. Measure theory and integration, volume 76. Amer. Math. Society, 2006.

Index

0 − 1-law density, 24
Kolmogorov’s, 64 Dirichlet problem, 89
L p -space, 20 Donsker’s theorem, 108
L p -space, 20 Doob decomposition, 70, 87
Π-system, 4 Doob’s super-martingale inequality, 82
λ-system, 4 Dynkin’s theorem, 4
σ-additive, 6
σ-algebra, 1 equilibrium measure, 93
Borel, 3 equilibrium potential, 92
generated, 3 equivalence (of measures), 24
ergodic, 97
absolute continuity, 24 ergodic theorem, 99
absolutely integrable, 16 essential supremum, 20, 25
adapted process, 58 expectation, 16
algebra, 1
Fatou’s lemma, 17
Baire σ-algebra, 15 filtered space, 57
Baire function, 15 filtration, 57
Banach space, 3 filtrations
Borel measure, 11, 12 natural, 58
Borel-σ-algebra, 3 Fourier transform, 50
Brownian motion, 51, 104 Fubini-Lebesgue theorem, 23
construction, 104 Fubini-Tonelli theorem, 22

Carathéodory’s theorem, 6 Gaussian density, 49


Cauchy sequence, 3 Gaussian process, 49
central limit theorem generator, 86
for martingales, 73 Gibbs measure, 54
chaining, 123 Green kernel, 92
Chapman-Kolmogorov equations, 54, 113
class, 1 Hölder inequality, 20
closed, 2 Haar functions, 105
concentration of measure, 77 harmonic function, 88
conditional expectation, 31 heat kernel, 115
conditional probability, 38
coupling, 100 independent random variables, 48
cylinder set, 43 index set, 41


indicator function, 1 transform, 59


induced measure, 13 martingale difference sequence, 60
inequality maximum inequality, 67, 71
Hölder, 20 maximum principle, 89
inequality, 20 measurable space, 1
Jensen, 20 measure, 2
maximum, 71 σ-finite, 2
upcrossing, 61 equilibrium, 93
initial distribution, 53 finite, 2
inner regular, 12 invariant, 98
integrable, 16 probability, 2
invariance principle, 108 Wiener, 107
invariant measure space, 2
distribution, 84, 98 metric, 3
measure, 84 metric space, 3
invariant measure, 98 Minkowski inequality, 20
Ising model, 54 monotone class theorem, 14
Itô formula, 72, 115 monotone convergence theorem, 16

Jensen inequality, 20 norm, 3


normed vector space, 3
Kolmogorov’s 0 − 1-law, 64
Kolmogorov’s LLN, 66 open, 2
Kolmogorov-Daniell theorem, 45 outer measure, 7, 9
outer regular, 12
Lévy’s downward theorem, 65
Lévy’s theorem, 122 Poisson kernel, 92
Laplace transform, 50 Polish space, 4
law of large numbers, 66 positive recurrent, 97
law of the iterated logarithm, 118 potential
Lebesgue decomposition theorem, 29 equilibrium, 92
Lebesgue integral, 15 pre-T -σ-algebra, 79
Lebesgue measure, 11 previsible process, 59
Lebesgue’s dominated convergence theorem, probability
17 regular conditional, 38
Lindeberg condition, 74 probability measure, 2
local specification, 55 process
Lusin space, 4 adapted, 58
product space, 5
marginals, 45 product topology, 5
finite dimensional, 45
Markov chain, 52 quadratic variation, 116
Markov inequality
exponential, 77 Radon measure, 12
Markov process, 52, 83 Radon-Nikodým derivative, 25
continuous time, 113 Radon-Nikodým theorem, 25
stationary, 83 random variable, 13
Markov property, 54 random walk, 103
strong, 85 recurrence, 96
martingale, 57 recurrent
convergence theorem, 62 positive, 97
problem, 86 regular conditional probability, 38, 55
sub, 58
super, 58 sample path, 42

set-function, 6 strong Markov property, 85


simple function, 15 supremum norm, 6
Skorokhod embedding, 110
space, 1 time
Banach, 3 continuous, 41
complete, 3 discrete, 41
filtered, 57 topological space, 2
Hausdorff, 3 topology, 2
Lusin, 4 transience, 96
measurable, 1 transition kernel, 52
metric, 3 stationary, 83
normed, 3
Polish, 4
uniform integrability, 17, 63
topological, 2
upcrossing, 60
special cylinder, 43
inequality, 61
state space, 41
stationary process, 83
statistical mechanics, 54 version (of conditional expectation), 31
stochastic integral, 59
stochastic process, 41 white noise, 48
stopping time, 78 Wiener measure, 107
