Andrea Pascucci
Probability
Theory II
Stochastic Calculus
UNITEXT
Volume 166
Editor-in-Chief
Alfio Quarteroni, Politecnico di Milano, Milan, Italy
École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Series Editors
Luigi Ambrosio, Scuola Normale Superiore, Pisa, Italy
Paolo Biscari, Politecnico di Milano, Milan, Italy
Ciro Ciliberto, Università di Roma “Tor Vergata”, Rome, Italy
Camillo De Lellis, Institute for Advanced Study, Princeton, NJ, USA
Victor Panaretos, Institute of Mathematics, École Polytechnique Fédérale de
Lausanne (EPFL), Lausanne, Switzerland
Lorenzo Rosasco, DIBRIS, Università degli Studi di Genova, Genova, Italy
Center for Brains Mind and Machines, Massachusetts Institute of Technology,
Cambridge, Massachusetts, US
Istituto Italiano di Tecnologia, Genova, Italy
The UNITEXT - La Matematica per il 3+2 series is designed for undergraduate
and graduate academic courses, and also includes books addressed to PhD students
in mathematics, presented at a sufficiently general and advanced level so that the
student or scholar interested in a more specific theme would get the necessary
background to explore it.
Originally released in Italian, the series now publishes textbooks in English
addressed to students in mathematics worldwide.
Some of the most successful books in the series have evolved through several
editions, adapting to the evolution of teaching curricula.
Submissions must include at least 3 sample chapters, a table of contents, and
a preface outlining the aims and scope of the book, how the book fits in with the
current literature, and which courses the book is suitable for.
For any further information, please contact the Editor at Springer:
[email protected]
THE SERIES IS INDEXED IN SCOPUS
***
UNITEXT is glad to announce a new series of free webinars and interviews
handled by the Board members, who rotate in order to interview top experts in their
field.
Access this link to subscribe to the events:
https://cassyni.com/s/springer-unitext
Andrea Pascucci
Probability Theory II
Stochastic Calculus
Andrea Pascucci
Dipartimento di Matematica
Alma Mater Studiorum – Università di
Bologna
Bologna, Italy
Translation from the Italian language edition: “Teoria della Probabilità. Processi e calcolo stocastico”
by Andrea Pascucci, © Springer-Verlag Italia S.r.l., part of Springer Nature 2024. Published by Springer
Milano. All Rights Reserved.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Cover illustration: Cino Valentini, Archeologia 2 , 2021, affresco acrilico, private collection
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
“For over two millennia, Aristotle’s logic has ruled over the thinking of western
intellectuals. All precise theories, all scientific models, even models of the process
of thinking itself, have in principle conformed to the straight-jacket of logic.
But from its shady beginnings devising gambling strategies and counting corpses
in medieval London, probability theory and statistical inference now emerge as
better foundations for scientific models, especially those of the process of thinking
and as essential ingredients of theoretical mathematics, even the foundations of
mathematics itself. We propose that this sea change in our perspective will affect
virtually all of mathematics in the next century.”
David Bryant Mumford, The Dawning of the Age of Stochasticity [99]
“A mathematician is someone who loves philosophy, art, and poetry because they
find the profound human need everywhere, against and beyond the often ridiculous
oppositions between “hard” and “soft” sciences. Awareness of such an intertwining
further enhances (…) the high, inescapable and indestructible moral choice to
carry out one’s own action as a scientist and as a human being in society towards
good. And if good and true come together, they can only produce beauty.”
Rino Caputo, Preface to Le anime della matematica [147]
Preface
America’s Cold War efforts that Wiener’s work was declared top secret. But all
of it, Wiener insisted, could have been deduced from Kolmogorov’s early paper.”
Finally, probability is at the foundation of the development of the most recent
technologies in Machine Learning and all related applications in Artificial Intelli-
gence, such as autonomous driving, speech and image recognition, and more (see,
for example, [54] and [122]). Nowadays, an advanced knowledge of Probability
Theory is a minimum requirement for anyone interested in pursuing applied
mathematics in any of the aforementioned fields.
It should be acknowledged that there are numerous monographs on stochastic
analysis: among my favorites I mention, in alphabetical order, Baldi [6], Bass [9],
Baudoin [13], Doob [35], Durrett [37], Friedman [50], Kallenberg [66], Karatzas
and Shreve [67], Mörters and Peres [98], Revuz and Yor [123], Schilling [129], and
Stroock [133]. Other excellent texts that have been major sources of inspiration and
ideas include those by Bass [10], Durrett [38], Klenke [68], and Williams [148]. In
any case, this list is far from exhaustive.
After more than two decades of teaching experience in this field, this book
represents my endeavor to systematically, concisely, and as comprehensively as
possible, compile the fundamental concepts of stochastic calculus that, in my view,
should constitute the essential knowledge for a modern mathematician, whether pure
or applied.
I would like to conclude by expressing my heartfelt gratitude to the exceptional
group of probabilists at the Department of Mathematics in Bologna: Stefano
Pagliarani, Elena Bandini, Cristina Di Girolami, Salvatore Federico, Antonello
Pesce, and Giacomo Lucertini, as well as those whom I hope will join us in the
future. A big thank you also goes to Andrea Cosso for his valuable collaboration
during the (all too short!) time he was a member of our department. Lastly, I extend
a special thank you to all the students who have taken my courses on probability
theory and stochastic calculus. This book was created for them, inspired by the
passion and energy they have shared with me. It is dedicated to them because I
cannot refrain from adopting as my own, at least as an aspiration, the famous phrase of
a great scientist: “I never teach my pupils; I only attempt to provide the conditions
in which they can learn.”
Readers who wish to report any errors, typos, or suggestions for improvement
can do so at the following address: [email protected].
The corrections received after publication will be made available on the website
at: https://unibo.it/sitoweb/andrea.pascucci/.
f : (Ω, F) −→ (E, E).

If (E, E) = (R, B), mF⁺ (resp. bF) denotes the class of F-measurable and
non-negative (resp. F-measurable and bounded) functions.
• 𝒩 is the family of negligible sets (cf. Definition 1.1.16 in [113])
• Numerical sets:
  – natural numbers: N = {1, 2, 3, …}, N_0 = N ∪ {0}, I_n := {1, …, n} for n ∈ N
  – real numbers R, extended real numbers R̄ = R ∪ {±∞}, positive real numbers R_{>0} = ]0, +∞[, non-negative real numbers R_{≥0} = [0, +∞[
⟨x, y⟩ = x · y = ∑_{i=1}^{d} x_i y_i,  x = (x_1, …, x_d), y = (y_1, …, y_d) ∈ R^d

x ∧ y = min{x, y},  x ∨ y = max{x, y}

x⁺ = x ∨ 0,  x⁻ = (−x) ∨ 0
Contents

1 Stochastic Processes
 1.1 Stochastic Processes: Law and Finite-Dimensional Distributions
  1.1.1 Measurable Processes
 1.2 Uniqueness
 1.3 Existence
 1.4 Filtrations and Martingales
 1.5 Proof of Kolmogorov's Extension Theorem
 1.6 Key Ideas to Remember
2 Markov Processes
 2.1 Transition Law and Feller Processes
 2.2 Markov Property
 2.3 Processes with Independent Increments and Martingales
 2.4 Finite-Dimensional Laws and Chapman-Kolmogorov Equation
 2.5 Characteristic Operator and Kolmogorov Equations
  2.5.1 The Local Case
  2.5.2 Backward Kolmogorov Equation
  2.5.3 Forward Kolmogorov (or Fokker-Planck) Equation
 2.6 Markov Processes and Diffusions
 2.7 Key Ideas to Remember
3 Continuous Processes
 3.1 Continuity and a.s. Continuity
 3.2 Canonical Version of a Continuous Process
 3.3 Kolmogorov's Continuity Theorem
 3.4 Proof of Kolmogorov's Continuity Theorem
 3.5 Key Ideas to Remember
4 Brownian Motion
 4.1 Definition
 4.2 Markov and Feller Properties

References

Index
Abbreviations

r.v. = random variable
a.s. = almost surely. A certain property holds a.s. if there exists N ∈ 𝒩 (negligible set) such that the property is true for every ω ∈ Ω \ N
a.e. = almost everywhere (with respect to the Lebesgue measure)
Chapter 1
Stochastic Processes
In this section, we give two equivalent definitions of stochastic process. The first
definition is quite simple and intuitive; the second is more abstract but essential
for the proof of some general results on stochastic processes. We also introduce
some accessory notions: the space of trajectories, the law and the finite-dimensional
distributions.
Let I be a generic non-empty set. Given .d ∈ N, let .mF be the set of random
variables with values in .Rd , defined on a probability space .(Ω, F , P ). The concept
of a stochastic process extends that of a function from I to .Rd , admitting that the
values taken may be random: in other words, just as a function
f : I −→ R^d

associates to each t ∈ I a point f(t) ∈ R^d, a stochastic process is a map

X : I −→ mF,  t ↦ X_t.
R^I = {x : I −→ R}

the family of functions from I to R. For each x ∈ R^I and t ∈ I, we write x_t instead
of x(t) and say that x_t is the t-th component of x: in this way we interpret R^I as the
Cartesian product of R for a number |I| of times (even if I is not finite or countable).
For example, if I = {1, …, d} then R^I is identifiable with R^d, while if I = N then
R^N is the set of sequences x = (x_1, x_2, …) of real numbers. An element x ∈ R^I
can be seen as a parameterized curve in R, where I is the set of parameters.
We say that .RI is the space of trajectories from I to .R and .x ∈ RI is a real
trajectory. There is nothing special about considering real trajectories: we could
directly consider .Rd or even a generic measurable space .(E, E ) instead of .R. In
such a case, the space of trajectories is .E I , the set of functions from I with values
C_t(H) := {x ∈ R^I | x_t ∈ H}

C_{t_1,…,t_n}(H) := {x ∈ R^I | (x_{t_1}, …, x_{t_n}) ∈ H} = ⋂_{i=1}^{n} C_{t_i}(H_i)  if H = H_1 × ⋯ × H_n,   (1.1.1)

F^I := σ(C)
X : Ω −→ R^I.

Remark 1.1.4 The fact that X is a random variable means that the measurability
condition holds:

(X ∈ C) ∈ F  for every C ∈ F^I.   (1.1.2)
and therefore Definitions 1.1.1 and 1.1.3 are equivalent. In summary, one can also
say that a real stochastic process X is a function
X : I × Ω −→ R,  (t, ω) ↦ X_t(ω)
that
• associates to each .t ∈ I the random variable .ω |→ Xt (ω): this is the standpoint
of Definition 1.1.1;
• associates to each .ω ∈ Ω the trajectory .t |→ Xt (ω): this is the standpoint of
Definition 1.1.3. Note that each outcome .ω ∈ Ω corresponds to (and can be
identified with) a trajectory of the process.
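These two standpoints can be sketched numerically (an illustrative construction, not from the text, with a hypothetical finite index set and sample space): a real process is then just an array indexed by (t, ω), whose rows are the random variables X_t and whose columns are the trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite setting: |I| = 5 time indices, |Omega| = 4 outcomes.
# A process is a map (t, omega) -> X_t(omega), here a 5 x 4 array.
n_times, n_outcomes = 5, 4
X = rng.normal(size=(n_times, n_outcomes))

# Standpoint of Definition 1.1.1: t -> random variable X_t (a row of the array).
X_t = X[2, :]          # the random variable at index t = 2, one value per outcome

# Standpoint of Definition 1.1.3: omega -> trajectory t -> X_t(omega) (a column).
trajectory = X[:, 1]   # the trajectory corresponding to the outcome omega = 1

# The two views describe the same object:
assert X_t[1] == trajectory[2] == X[2, 1]
```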
Example 1.1.5 Every function .f : I −→ R can be seen as a stochastic process
interpreting, for each fixed .t ∈ I , .f (t) as a constant random variable. In other
words, if .Ω = {ω} is a sample space consisting of a single element, the process
defined by .Xt (ω) = f (t) has only one trajectory which is the function f . The
measurability condition (1.1.3) is obvious since .F = {∅, Ω}. In this sense, the
concept of a stochastic process generalizes that of a function because it allows the
existence of multiple trajectories.
From the standpoint of Definition 1.1.3 a stochastic process is a random variable
and therefore we can define its law.
Definition 1.1.6 (Law) The distribution (or law) of the stochastic process X is the
probability measure on .(RI , F I ) defined by
μ_X(C) = P(X ∈ C),  C ∈ F^I.
¹ Indeed, (X_t ∈ H) = (X ∈ C) where C is the one-dimensional cylinder (i.e., in which only one
component is fixed) defined by {x ∈ R^I | x_t ∈ H}: so it is clear that if X is a stochastic process
then X_t ∈ mF for every t ∈ I. Conversely, the family

H := {C ∈ F^I | X^{-1}(C) ∈ F}
which are the distributions .μ(Xt1 ,...,Xtn ) of the random vectors .(Xt1 , . . . , Xtn ) as the
choice of a finite number of indices .t1 , . . . , tn ∈ I varies. The law of a process
is uniquely determined by the finite-dimensional distributions: in other words, it is
equivalent to knowing the law or the finite-dimensional distributions of a stochastic
process.2
The one-dimensional distributions are not sufficient to identify the law of a
process. This is clear when I is finite and therefore the process is simply a random
vector: in fact, the one-dimensional distributions are the marginal laws of the vector
which obviously do not identify the joint law. Another interesting example is given
in Remark 4.1.5.
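A quick numerical illustration (a toy construction of mine, not from the text): take I = {1, 2}, a standard normal Z and an independent copy Z′. The processes X = (Z, Z) and Y = (Z, Z′) have the same one-dimensional distributions but different joint laws.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
Z = rng.normal(size=n)
Zp = rng.normal(size=n)          # independent copy of Z

X = np.stack([Z, Z])             # X_1 = X_2 = Z
Y = np.stack([Z, Zp])            # Y_1 = Z, Y_2 = Z' independent

# Same one-dimensional (marginal) distributions: every component is N(0,1) ...
for comp in (X[0], X[1], Y[0], Y[1]):
    assert abs(comp.mean()) < 0.02 and abs(comp.std() - 1.0) < 0.02

# ... but different joint laws: cov(X_1, X_2) = 1 while cov(Y_1, Y_2) = 0.
assert abs(np.cov(X)[0, 1] - 1.0) < 0.02
assert abs(np.cov(Y)[0, 1]) < 0.02
```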
Example 1.1.8 Let .A, B ∼ N0,1 be independent random variables. Consider the
stochastic process .X = (Xt )t∈R defined by
X_t = At + B,  t ∈ R.
m(t) := E[X_t],  c(s, t) := cov(X_s, X_t),  s, t ∈ I.
and therefore the finite-dimensional distributions identify μ_X on C. On the other hand, C is a
∩-closed family and generates F^I: by Corollary I-cc2 in [113], if two probability measures on
(R^I, F^I) coincide on C then they are equal. In other words, if μ_1(C) = μ_2(C) for each C ∈ C
then μ_1 ≡ μ_2. We will see that, thanks to Carathéodory's theorem, a probability measure extends
uniquely from C to F^I: this is the content of one of the first fundamental results on stochastic
processes, Kolmogorov's extension theorem, which we will examine in Sect. 1.3.
where

M = (m(t_1), …, m(t_n))  and  C = (c(t_i, t_j))_{i,j=1,…,n}.   (1.1.4)

We observe that C = (c(t_i, t_j))_{i,j=1,…,n} is a symmetric and positive semi-definite
matrix. Obviously, if I is finite then X is nothing but a random vector with multi-normal
distribution. The process of Example 1.1.8 is Gaussian with zero mean and
covariance function c(s, t) = st + 1. The trivial process of Example 1.1.5 is also
Gaussian with mean function f(t) and identically zero covariance function: in this
case, X_t ∼ δ_{f(t)} for every t ∈ I. Finally, a fundamental example of Gaussian
process is the Brownian motion that we will define in Chap. 4.
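The covariance c(s, t) = st + 1 of Example 1.1.8 is easy to verify by Monte Carlo (an illustrative sketch; the sample sizes and the choice of s, t are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
A = rng.normal(size=n)
B = rng.normal(size=n)           # A, B independent N(0,1), as in Example 1.1.8

s, t = 0.5, 2.0
Xs = A * s + B                   # samples of X_s = As + B
Xt = A * t + B                   # samples of X_t = At + B

# X is Gaussian with zero mean and covariance c(s, t) = s*t + 1, since
# cov(As + B, At + B) = s*t*var(A) + var(B) = s*t + 1.
assert abs(Xt.mean()) < 0.02
emp_cov = np.mean((Xs - Xs.mean()) * (Xt - Xt.mean()))
assert abs(emp_cov - (s * t + 1.0)) < 0.05
```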
Remark 1.1.10 ([!]) There are families of trajectories, even very significant ones,
that do not belong to the σ-algebra F^I. The idea is that every element of F^I is
characterized by a countable number of coordinates³ and this is highly restrictive
when I is uncountable. For example, if I = [0, 1] we have

C[0, 1] ∉ F^{[0,1]},

where C[0, 1] denotes the family of continuous functions on [0, 1].⁴
³ More precisely, let us solve Exercise 1.4 in [9]: consider I = [0, 1] (thus the space of trajectories
R^I is the family of functions from [0, 1] to R). Given a sequence τ = (t_n)_{n≥1} ∈ [0, 1]^N, we
denote again by τ the map from R^{[0,1]} to R^N defined by τ(x) := (x_{t_n})_{n≥1}, and put

M := {τ^{-1}(H) | τ ∈ [0, 1]^N, H ∈ F^N},

where F^N denotes the σ-algebra generated by cylinders in R^N. Then M ⊆ F^{[0,1]} and contains
the family of finite-dimensional cylinders of R^{[0,1]}, which is a ∩-closed family that generates
F^{[0,1]}. Moreover, one proves that M is a monotone family: it follows from Corollary A.0.4 in
[113] that M = F^{[0,1]}, i.e., every element C ∈ F^{[0,1]} is of the form C = τ^{-1}(H) for some
sequence τ in [0, 1] and some H ∈ F^N. In other words, C is characterized by the choice of a
countable number of coordinates τ = (t_n)_{n≥1} (as well as by H ∈ F^N).
⁴ By contradiction, if C[0, 1] = τ^{-1}(H) for some sequence of coordinates τ = (t_n)_{n≥1} in [0, 1]
and H ∈ F^N, then modifying x ∈ C[0, 1] at a point t not belonging to the sequence τ should still
result in a continuous function, and this is clearly false.
We have given two equivalent definitions of stochastic process, each with its own
advantages and disadvantages:
(i) a stochastic process is a function with random values (Definition 1.1.1)
X : I −→ mF
that associates to each .t ∈ I the random variable .Xt defined on the probability
space .(Ω, F , P );
(ii) a stochastic process is a random variable with values in a space of
trajectories (Definition 1.1.3): according to this much more abstract definition,
a process .X = X(ω) is a random variable
X : Ω −→ R^I
from the probability space .(Ω, F , P ) to the space of trajectories .RI , equipped
with the structure of a measurable space with the .σ -algebra .F I . This definition
is used in the proof of the most general and theoretical results even if it is a less
operational notion and more difficult to apply to the study of concrete examples.
Note that the previous definitions do not require any assumptions about the type of
dependence of X with respect to the variable t (for example, measurability or some
kind of regularity). Obviously, the problem does not arise if I is a generic set, devoid
of any measurable or metric space structure; however, if I is a real interval then it
is possible to endow the product space .I × Ω with a structure of measurable space
with the product .σ -algebra .B ⊗ F .
X : (I × Ω, B ⊗ F) −→ (R, B).
1.2 Uniqueness
There are various notions of equivalence between stochastic processes. First of all,
two processes X = (X_t)_{t∈I} and Y = (Y_t)_{t∈I} are equal in law if they have the same
distribution (or, equivalently, if they have the same finite-dimensional distributions):
in this case X and Y could even be defined on different probability spaces. When X
and Y are defined on the same probability space (Ω, F, P), we can provide other
notions of equivalence expressed in terms of equality of trajectories. We first recall
that, in a probability space (Ω, F, P), a subset A of Ω is almost sure (with respect
to P) if there exists an event C ⊆ A such that P(C) = 1. If the probability space
is complete⁵ then every almost sure set A is an event and therefore we can simply
write P(A) = 1.

Definition 1.2.1 (Modifications) Let X = (X_t)_{t∈I} and Y = (Y_t)_{t∈I} be stochastic
processes on (Ω, F, P). We say that X and Y are modifications if P(X_t = Y_t) = 1
for every t ∈ I.

Remark 1.2.2 The previous definition can be easily generalized to the case of X, Y
generic functions from Ω with values in R^I: in this case (X_t = Y_t) is not necessarily
an event, and the definition requires that (X_t = Y_t) be almost sure for every t ∈ I.

⁵ We recall the definition given in Remark 2.1.11 in [113]: a probability space (Ω, F, P) is
complete if 𝒩 ⊆ F, where 𝒩 denotes the family of negligible sets (cf. Definition 1.1.16 in
[113]).
Definition 1.2.3 (Indistinguishable) We say that X and Y are indistinguishable if the set

(X = Y) := {ω ∈ Ω | X_t(ω) = Y_t(ω) for every t ∈ I}

is almost sure.
Remark 1.2.4 ([!]) Two processes X and Y are indistinguishable if they have
almost all the same trajectories. Even if X and Y are stochastic processes, it is not
necessarily true that (X = Y) is an event. In fact, (X = Y) = (X − Y)^{-1}({0})
where 0 denotes the identically zero trajectory: however, {0} ∉ F^I unless I is finite
or countable (cf. Remark 1.1.10).
On the other hand, if the space (Ω, F, P) is complete then X and Y are
indistinguishable if and only if P(X = Y) = 1 since the completeness of the
space guarantees that (X = Y) ∈ F in the case (X = Y) is almost sure. For this
and other reasons that we will explain later, from now on we will often assume that
(Ω, F, P) is complete.
Remark 1.2.5 ([!]) If X and Y are modifications then they have the same finite-dimensional
distributions and therefore are equal in law. If X and Y are indistinguishable
then they are also modifications since for every t ∈ I we have
(X = Y) ⊆ (X_t = Y_t). Conversely, two modifications are not necessarily
indistinguishable: for example, on Ω = [0, 1] endowed with the Lebesgue measure,
let X_t(ω) := 0 and Y_t(ω) := 1_{{t}}(ω) for t ∈ [0, 1]. For every fixed t, the set

(X_t = Y_t) = [0, 1] \ {t}

has Lebesgue measure equal to one, i.e., it is an almost sure event. On the other
hand, all the trajectories of X are different from those of Y at one point.
We also note that X and Y are equal in law, but X has all continuous trajectories
and Y has all discontinuous trajectories: therefore, there are important properties of
the trajectories of a stochastic process (such as, for example, continuity) that do not
depend on the distribution of the process.
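The distinction can be made concrete with a small simulation (an illustrative rendering of a classical construction: Ω = [0, 1] with a uniform outcome ω, X ≡ 0 and Y_t(ω) = 1 if t = ω):

```python
import numpy as np

rng = np.random.default_rng(3)
omegas = rng.uniform(size=100_000)      # sampled outcomes omega in [0, 1)

def Y(t, omega):
    """Y_t(omega) = 1 if t == omega, 0 otherwise; X_t(omega) = 0 identically."""
    return (np.asarray(omega) == t).astype(float)

# Modification: for each *fixed* t, P(X_t = Y_t) = 1, since a uniform outcome
# almost never hits the single point t.
for t in (0.0, 0.25, 0.7):
    assert np.mean(Y(t, omegas) != 0.0) == 0.0

# Not indistinguishable: every trajectory of Y differs from the zero trajectory
# of X at exactly one point, namely t = omega.
omega0 = omegas[0]
assert Y(omega0, omega0) == 1.0
```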
In the case of continuous processes, we have the following particular result.
Proposition 1.2.7 Let I be a real interval and let X = (X_t)_{t∈I} and Y = (Y_t)_{t∈I} be
processes with a.s. continuous trajectories.⁶ If X is a modification of Y, then X, Y
are indistinguishable.

Proof By assumption, the trajectories X(ω) and Y(ω) are continuous for every ω ∈
A, with A almost sure. Moreover, P(X_t = Y_t) = 1 for every t ∈ I and consequently
the set

C := A ∩ ⋂_{t ∈ I∩Q} (X_t = Y_t)

is almost sure. For every t ∈ I, there exists an approximating sequence (t_n)_{n∈N} in
I ∩ Q: by the continuity hypothesis, for every ω ∈ C we have

X_t(ω) = lim_{n→∞} X_{t_n}(ω) = lim_{n→∞} Y_{t_n}(ω) = Y_t(ω),

that is, X(ω) = Y(ω) for every ω ∈ C. ⊔⊓
1.3 Existence
⁶ The set of ω ∈ Ω such that t ↦ X_t(ω) and t ↦ Y_t(ω) are continuous functions is almost sure.
Let us make a preliminary remark: if μ_{t_1,…,t_n} are the finite-dimensional distributions
of a real stochastic process (X_t)_{t∈I}, then we have

μ_{t_1,…,t_n}(H_1 × ⋯ × H_n) = P(X_{t_1} ∈ H_1, …, X_{t_n} ∈ H_n).

As a consequence, the following consistency properties hold: for every finite family
of indices t_1, …, t_n ∈ I, for every H_1, …, H_n ∈ B and for every permutation ν of
the indices 1, 2, …, n, we have

μ_{t_{ν(1)},…,t_{ν(n)}}(H_{ν(1)} × ⋯ × H_{ν(n)}) = μ_{t_1,…,t_n}(H_1 × ⋯ × H_n),   (1.3.2)

μ_{t_1,…,t_n,t_{n+1}}(H_1 × ⋯ × H_n × R) = μ_{t_1,…,t_n}(H_1 × ⋯ × H_n).   (1.3.3)
A posteriori, it is clear that (1.3.2) and (1.3.3) are necessary conditions for the
distributions μ_{t_1,…,t_n} to be the finite-dimensional distributions of a stochastic
process. The following result shows that these conditions are also sufficient.

Theorem 1.3.1 (Kolmogorov's Extension Theorem [!!!]) Let I be a non-empty
set. Suppose that, for each finite family of indices t_1, …, t_n ∈ I, a distribution
μ_{t_1,…,t_n} on R^n is given, and the consistency properties (1.3.2) and (1.3.3) are
satisfied. Then there exists a unique probability measure μ on (R^I, F^I) that has
μ_{t_1,…,t_n} as finite-dimensional distributions, i.e., such that

μ(C_{t_1,…,t_n}(H)) = μ_{t_1,…,t_n}(H),  H ∈ B(R^n).   (1.3.4)
Corollary 1.3.3 ([!]) Under the assumptions of Theorem 1.3.1, there exist a
complete probability space (Ω, F, P) and a stochastic process X on it having the
μ_{t_1,…,t_n} as finite-dimensional distributions.
Proof Proceed in a similar way to the case of real random variables (cf.
Remark 2.1.17 in [113]). Let (Ω, F, P) = (R^I, F^I_μ, μ) be the complete
probability space given by Theorem 1.3.1 and let X be the canonical process,
X_t(x) := x_t for x ∈ R^I. Then, for every H ∈ B(R^n),

P((X_{t_1}, …, X_{t_n}) ∈ H) = μ(C_{t_1,…,t_n}(H)) = μ_{t_1,…,t_n}(H)

by (1.3.4). ⊔⊓
Now consider a stochastic process X on the space (Ω, F, P). Denote by μ_X the
law of X and by F^I_{μ_X} the μ_X-completion of F^I (cf. Remark 1.3.2).

Definition 1.3.4 (Canonical Version of a Stochastic Process [!]) The canonical
version (or realization) of a process X is the process X̄, on the probability space
(R^I, F^I_{μ_X}, μ_X), defined by X̄(w) = w for each w ∈ R^I.

Remark 1.3.5 By Corollary 1.3.3, X and its canonical realization X̄ are equal in
law. Moreover, X̄ is defined on the complete probability space (R^I, F^I_{μ_X}, μ_X) in
which the sample space is R^I and the outcomes are the trajectories of the process.
Corollary 1.3.6 (Existence of Gaussian Processes [!]) Let

m : I −→ R,  c : I × I −→ R

be functions such that, for every finite family of indices t_1, …, t_n ∈ I, the matrix
C = (c(t_i, t_j))_{i,j=1,…,n} is symmetric and positive semi-definite. Then there exists a
Gaussian process with mean function m and covariance function c. In particular,
there exists a Gaussian process with zero mean and covariance function
c(s, t) = min{s, t} for s, t ∈ R_{≥0}.

Proof The consistency properties (1.3.2) and (1.3.3) hold with N_{M,C}, where M
and C are as in (1.1.4), instead of μ_{t_1,…,t_n}, and (X_{t_1}, …, X_{t_n}) ∼ N_{M,C}. Then the first part of
the thesis follows from Corollary 1.3.3.
Now let t_1, …, t_n ∈ R_{≥0}: the matrix C = (min{t_i, t_j})_{i,j=1,…,n} is obviously
symmetric and is also positive semi-definite since, for every η_1, …, η_n ∈ R, we
have

∑_{i,j=1}^{n} η_i η_j min{t_i, t_j} = ∑_{i,j=1}^{n} η_i η_j ∫_0^∞ 1_{[0,t_i]}(s) 1_{[0,t_j]}(s) ds = ∫_0^∞ ( ∑_{i=1}^{n} η_i 1_{[0,t_i]}(s) )² ds ≥ 0.

⊔⊓
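The positive semi-definiteness of C_ij = min{t_i, t_j} can also be checked numerically, and Corollary 1.3.6 then says that sampling the corresponding finite-dimensional laws amounts to sampling multinormal vectors (an illustrative sketch; the choice of indices t_i is mine):

```python
import numpy as np

t = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
C = np.minimum.outer(t, t)               # C_ij = min{t_i, t_j}

# Symmetric and positive semi-definite, as the integral identity shows:
assert np.allclose(C, C.T)
assert np.linalg.eigvalsh(C).min() > -1e-10

# Finite-dimensional samples of the Gaussian process with zero mean and
# covariance c(s, t) = min{s, t} are just multinormal vectors N_{0,C}:
rng = np.random.default_rng(4)
samples = rng.multivariate_normal(np.zeros(len(t)), C, size=200_000)
emp_cov = np.cov(samples.T)
assert np.abs(emp_cov - C).max() < 0.06
```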
Corollary 1.3.7 (Existence of Independent Sequences of Random Variables [!])
Let (μ_n)_{n∈N} be a sequence of real distributions. There exists a sequence (X_n)_{n∈N} of
independent random variables defined on a complete probability space (Ω, F, P),
such that X_n ∼ μ_n for every n ∈ N.
Proof Apply Corollary 1.3.3 with I = N. The family of finite-dimensional
distributions defined by μ_{1,…,n} := μ_1 ⊗ ⋯ ⊗ μ_n satisfies the consistency
properties (1.3.2) and (1.3.3). ⊔⊓

More generally, suppose that the distributions μ_n on R^n satisfy the consistency
condition

μ_{n+1}(H × R) = μ_n(H),  H ∈ B_n, n ∈ N.

Then there exists a sequence (X_n)_{n∈N} of random variables defined on a complete
probability space (Ω, F, P), such that (X_1, …, X_n) ∼ μ_n for every n ∈ N.
1.4 Filtrations and Martingales

In this section, we consider the particular case where I is a subset of R, typically

I = R_{≥0}  or  I = [0, 1]  or  I = N.
F_s ⊆ F_t ⊆ F,  s, t ∈ I, s ≤ t.
Remark 1.4.4 We use the notation G^X for the filtration generated by X because
we want to reserve the symbol F^X for another filtration that we will define later
in Sect. 6.2.2 and call standard filtration for X. The filtration generated by X is
the “smallest” filtration that includes information about the process X: clearly, X is
adapted to (F_t)_{t∈I} if and only if G_t^X ⊆ F_t for every t ∈ I.
Remark 1.4.5 If X̄ is the canonical version of X (cf. Definition 1.3.4), then
E[X_t] = E[X_T],  t, T ∈ I.
X_n := Z_1 + ⋯ + Z_n,  n ∈ N.
Here Z_n represents the win or loss at the n-th play, q is the probability of winning,
and X_n is the balance after n plays. Consider the filtration (G_n^Z)_{n∈N} of information
on the outcomes of the plays, G_n^Z = σ(Z_1, …, Z_n). Then we have

E[X_{n+1} | G_n^Z] = E[X_n + Z_{n+1} | G_n^Z] = X_n + E[Z_{n+1}] = X_n + 2q − 1.
E[X_T | F_t] = E[E[X | F_T] | F_t] = E[X | F_t] = X_t,  t, T ∈ I, t ≤ T.
Remark 1.4.10 ([!]) We will often use the following remarkable identity, valid for
a real-valued square-integrable martingale X, i.e. X such that E[X_t²] < ∞ for
t ∈ I:

E[(X_t − X_s)² | F_s] = E[X_t² − X_s² | F_s],  s ≤ t.   (1.4.3)

Indeed, by the martingale property,

E[(X_t − X_s)² | F_s] = E[X_t² | F_s] − 2X_s E[X_t | F_s] + X_s² = E[X_t² | F_s] − X_s² = E[X_t² − X_s² | F_s].
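Taking expectations in (1.4.3) gives E[(X_t − X_s)²] = E[X_t²] − E[X_s²], which is easy to check by simulation for a simple symmetric random walk (an illustrative sketch; the parameters are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, N = 100_000, 50
Z = rng.choice([-1, 1], size=(n_paths, N))   # fair +-1 steps (q = 1/2)
X = Z.cumsum(axis=1)                          # martingale X_n = Z_1 + ... + Z_n

s, t = 20, 50                                 # X_s = X[:, s-1], X_t = X[:, t-1]
lhs = np.mean((X[:, t - 1] - X[:, s - 1]) ** 2)
rhs = np.mean(X[:, t - 1] ** 2) - np.mean(X[:, s - 1] ** 2)

# Both sides estimate E[(X_t - X_s)^2] = E[X_t^2] - E[X_s^2] = t - s = 30.
assert abs(lhs - (t - s)) < 0.5
assert abs(rhs - (t - s)) < 1.0
```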
Definition 1.4.11 Let X = (X_t)_{t∈I} be a stochastic process on the filtered space
(Ω, F, P, F_t). We say that X is a sub-martingale if:

X_t ≤ E[X_T | F_t],  t, T ∈ I, t ≤ T.
X_n = M_n + A_n,  n ≥ 0.   (1.4.4)
Proof (Uniqueness) If two processes M and A, with the properties of the statement,
exist then we have

X_{n+1} − X_n = M_{n+1} − M_n + A_{n+1} − A_n,  n ≥ 0.   (1.4.5)

Conditioning on F_n and exploiting the fact that X is adapted, M is a martingale and
A is predictable, we have

A_{n+1} = A_n + E[X_{n+1} | F_n] − X_n,  A_0 = 0.   (1.4.6)

Note that from (1.4.6) it follows that if X is a sub-martingale then the process A has
almost surely monotone increasing trajectories.
Inserting (1.4.6) into (1.4.5) we also find

M_{n+1} = M_n + X_{n+1} − E[X_{n+1} | F_n]  if n ∈ N,  M_0 = X_0.   (1.4.7)
M_n = X_n − n(2q − 1),  A_n = n(2q − 1).
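A minimal simulation of this decomposition for the gambling process (illustrative, with my own choice of q): the compensated balance M_n = X_n − n(2q − 1) has constant zero mean, as a martingale should.

```python
import numpy as np

rng = np.random.default_rng(6)
q = 0.6                                    # probability of winning each play
n_paths, N = 100_000, 40
Z = np.where(rng.uniform(size=(n_paths, N)) < q, 1, -1)
X = Z.cumsum(axis=1)                       # balance X_n after n plays

n = np.arange(1, N + 1)
A = n * (2 * q - 1)                        # predictable part A_n = n(2q - 1)
M = X - A                                  # martingale part M_n = X_n - A_n

# E[M_n] = 0 for every n, consistent with the martingale property of M.
assert np.abs(M.mean(axis=0)).max() < 0.2

# One-step increment check: E[X_{n+1} - X_n | F_n] = E[Z_{n+1}] = 2q - 1.
assert abs(Z[:, 10].mean() - (2 * q - 1)) < 0.02
```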
1.5 Proof of Kolmogorov's Extension Theorem

C_{t_1,…,t_n}(H_1 × ⋯ × H_n) = ⋂_{i=1}^{n} C_{t_i}(H_i),   (1.5.1)
(C_t(H))^c = C_t(H^c),

(C_{t_1,…,t_n}(H_1 × ⋯ × H_n))^c = ( ⋂_{i=1}^{n} C_{t_i}(H_i) )^c = ⋃_{i=1}^{n} C_{t_i}(H_i^c).

C_{t_1}(H_1) ∪ C_{t_2}(H_2) = C_{t_1,t_2}(H_1 × H_2) ⊎ C_{t_1,t_2}(H_1^c × H_2) ⊎ C_{t_1,t_2}(H_1 × H_2^c),

and in general

⋃_{i=1}^{n} C_{t_i}(H_i) = ⊎ C_{t_1,…,t_n}(K_1 × ⋯ × K_n)

where the disjoint union is taken among all the different possible combinations of
K_1 × ⋯ × K_n where K_i is H_i or H_i^c, except for the case where K_i = H_i^c for every
i = 1, …, n. ⊔⊓
We define μ on C as in (1.3.4), that is

μ(C_{t_1,…,t_n}(H)) := μ_{t_1,…,t_n}(H),  H ∈ B(R^n).

D_n = C \ ⊎_{k=1}^{n} C_k,  n ∈ N.

μ(C) = ∑_{k=1}^{n} μ(C_k) + μ(D_n).
⁸ Formula (1.5.2) implies the σ-sub-additivity: if A ∈ C and (A_n)_{n∈N} is a sequence of elements
in C such that

A ⊆ ⋃_{n∈N} A_n,

set

C_n = (A ∩ A_n) \ ⋃_{k=1}^{n−1} A_k

with C_n which, by Lemma 1.5.1, is a finite and disjoint union of cylinders for each n ≥ 2. Then
from (1.5.2) it follows that

μ(A) ≤ ∑_{n∈N} μ(A_n).
D_n = ⊎_{k=1}^{N_n} C̃_k,  C̃_k = {x ∈ R^I | (x_{t_1}, …, x_{t_n}) ∈ H_{k,1} × ⋯ × H_{k,n}},

for some sequence (t_n)_{n∈N} in I and H_{k,n} ∈ B. Now we use the following fact,
the proof of which we postpone to the end: it is possible to construct a sequence
(K_n)_{n∈N} such that

• K_n ⊆ R^n is a compact subset of

B_n := ⋃_{k=1}^{N_n} (H_{k,1} × ⋯ × H_{k,n});   (1.5.4)

• K_{n+1} ⊆ K_n × R;
• μ_{t_1,…,t_n}(K_n) ≥ ε/2.
Thus, we conclude the proof of (1.5.3). Since K_n ≠ ∅, for each n ∈ N there exists
a vector

(y_1^{(n)}, …, y_n^{(n)}) ∈ K_n.

By compactness, the sequence (y_1^{(n)})_{n∈N} admits a subsequence (y_1^{(k_n)})_{n∈N}
converging to a point y_1 ∈ K_1. Similarly, the sequence (y_1^{(k_n)}, y_2^{(k_n)})_{n∈N} admits a
subsequence converging to (y_1, y_2) ∈ K_2. Repeating the argument, we construct
a sequence (y_n)_{n∈N} such that (y_1, …, y_n) ∈ K_n for each n ∈ N. Therefore

{x ∈ R^I | x_{t_k} = y_k, k ∈ N} ⊆ D_n
in which RI and the elements of (Dn )n∈N are repeated a sufficient number of times.
¹⁰ It is enough to combine the property of internal regularity of $\mu_{t_1,\dots,t_n}$ (cf. Proposition 1.4.9 in
[113]) with the fact that, by the continuity from below, for each ε > 0 there exists a compact K
such that μt1 ,...,tn (Rn \ K) < ε: note that this latter fact is nothing but the tightness property of the
distribution μt1 ,...,tn (cf. Definition 3.3.5 in [113]).
22 1 Stochastic Processes
Setting

$$K_n := \bigcap_{h=1}^{n} \left(\widetilde{K}_h \times \mathbb{R}^{n-h}\right),\tag{1.5.5}$$

we have

$$B_n \setminus K_n \subseteq \bigcup_{h=1}^{n} \left(B_h \setminus \widetilde{K}_h\right) \times \mathbb{R}^{n-h}$$
and consequently
$$\mu_{t_1,\dots,t_n}(B_n \setminus K_n) \le \sum_{h=1}^{n} \mu_{t_1,\dots,t_n}\left(\left(B_h \setminus \widetilde{K}_h\right) \times \mathbb{R}^{n-h}\right) = \sum_{h=1}^{n} \mu_{t_1,\dots,t_h}\left(B_h \setminus \widetilde{K}_h\right) \le \sum_{h=1}^{n} \frac{\varepsilon}{2^{h+1}} \le \frac{\varepsilon}{2}.$$
Then we have
$$\mu_{t_1,\dots,t_n}(K_n) = \mu_{t_1,\dots,t_n}(B_n) - \mu_{t_1,\dots,t_n}(B_n \setminus K_n) \ge \frac{\varepsilon}{2},$$

since $\mu_{t_1,\dots,t_n}(B_n) = \mu(D_n) \ge \varepsilon$ by hypothesis. This concludes the proof. □
Kolmogorov’s extension theorem generalizes, with a substantially identical
proof, to the case where the trajectories have values in a separable and complete
metric space $(M, \varrho)$.¹¹ We recall the notation $\mathcal{B}_\varrho$ for the Borel σ-algebra on $(M, \varrho)$; moreover, $M^I$ is the family of functions from $I$ with values in $M$ and $\mathcal{F}_\varrho^I$ is the σ-algebra generated by finite-dimensional cylinders
11 The first part of the proof, based on Carathéodory’s theorem, is exactly the same. In the second
part, and in particular in the construction of the sequence of compact Kn in (1.5.5), the tightness
property is crucial: here we exploit the fact that, under the assumption that (M, ϱ) is separable and
complete, every distribution on Bϱ is tight (see, for example, Theorem 1.4 in [16]). Kolmogorov’s
theorem does not extend to any measurable space: in this regard, see, for example, [59] page 214.
1.6 Key Ideas to Remember 23
We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while excluding overly technical or
ancillary details. If you have any doubt about what the following succinct statements
mean, please review the corresponding section.
• Section 1.1: we define a stochastic process as a function taking random values
or equivalently, albeit in a more abstract way, as a random variable with values
in the functional space of trajectories. The finite-dimensional distributions of a
process determine its law, playing the same role as the distribution of a random
variable.
• Section 1.2: we compare the different notions of equality between stochastic
processes, introducing the definitions of equivalence in law, indistinguishable
processes and modifications.
• Section 1.3: the main existence result for processes is Kolmogorov’s extension
Theorem 1.3.1. It states that it is possible to construct a stochastic process start-
ing from given finite-dimensional distributions that satisfy natural consistency
properties: it is a corollary of Carathéodory’s Theorem 1.4.29 in [113], and the
proof, being somewhat technical, can be safely skipped at a first reading.
$\mathcal{C}$ — family of finite-dimensional cylinders, 3
$\mathcal{F}^I = \sigma(\mathcal{C})$ — σ-algebra generated by finite-dimensional cylinders, 3
$\mathcal{F}^I_\mu$ — completion of $\mathcal{F}^I$ with respect to the measure $\mu$, 11
$\mathcal{G}^X_t = \sigma(X_s,\ s \le t)$ — filtration generated by the process $X$, 14
Chapter 2
Markov Processes
World is stochastic.
From “Students’ opinions on educational activities”, A.Y.
2022/23 University of Bologna
$$p = p(t, x; T, H), \qquad 0 \le t \le T,\ x \in \mathbb{R}^N,\ H \in \mathcal{B}_N,$$

$$p(t, X_t; T, H) = P(X_T \in H \mid X_t), \qquad 0 \le t \le T,\ H \in \mathcal{B}_N.$$
¹ We recall the convention where $P(X_T \in H \mid X_t)$ denotes the usual conditional expectation $E\left[\mathbf{1}_H(X_T) \mid X_t\right]$, as in Remark 4.3.5 in [113].
Remark 2.1.2 By properties (i) and (ii) of Definition 2.1.1, if X has transition law
p then p(t, Xt ; T , ·) is a regular version2 of the conditional law of XT given Xt .
Hence, we have
$$\int_{\mathbb{R}^N} p(t, X_t; T, dy)\,\varphi(y) = E\left[\varphi(X_T) \mid X_t\right], \qquad \varphi \in b\mathcal{B}_N.\tag{2.1.1}$$
In other words,
$$p(t, x; T, H) = p(0, x; T-t, H), \qquad 0 \le t \le T,\ x \in \mathbb{R},\ H \in \mathcal{B}.$$
Equation (2.1.3) means that the conditional expectation function of ϕ(XT ) given Xt
is equal to the conditional expectation function of the temporally translated process
at the initial time.4
Example 2.1.6 (Poisson Transition Law [!]) Recall that Poissonx,λ denotes the
Poisson distribution with parameter λ > 0 and centered at x ∈ R, defined in
Example 1.4.17 in [113]. The Poisson transition law with parameter λ > 0 is
defined by
$$p(t, x; T, \cdot) = e^{-\lambda(T-t)} \sum_{n=0}^{+\infty} \frac{(\lambda(T-t))^n}{n!}\,\delta_{x+n}, \qquad 0 \le t \le T,\ x \in \mathbb{R}.$$
$$E_x[Y] = E\left[Y \mid X_0 = x\right],$$
For clarity: the right-hand side of (2.1.4) is the conditional expectation of ϕ (XT −t ) given X0 ,
evaluated at Xt .
28 2 Markov Processes
Properties (i) and (ii) of Definition 2.1.1 are obvious. The Poisson transition law is
time-homogeneous and invariant under translations in the sense that
Example 2.1.9 (Gaussian Transition Law [!]) The Gaussian transition law is
defined by p(t, x; T , ·) = Nx,T −t for every 0 ≤ t ≤ T and x ∈ R. It is an
absolutely continuous transition law since
$$p(t, x; T, H) := N_{x,T-t}(H) = \int_H \Gamma(t, x; T, y)\,dy, \qquad 0 \le t < T,\ x \in \mathbb{R},\ H \in \mathcal{B},$$

where

$$\Gamma(t, x; T, y) = \frac{1}{\sqrt{2\pi(T-t)}}\,e^{-\frac{(x-y)^2}{2(T-t)}}, \qquad 0 \le t < T,\ x, y \in \mathbb{R},$$
is the Gaussian transition density. It is clear that p satisfies properties (i) and (ii) of
Definition 2.1.1.
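Properties (i) and (ii) can also be checked numerically. The sketch below (grid width, step size, and sample point are arbitrary choices) verifies that $\Gamma(t, x; T, \cdot)$ integrates to $1$ and has mean $x$, i.e. that $p(t, x; T, \cdot) = N_{x,T-t}$ is a probability distribution centered at the initial position:

```python
import math

def gauss_density(t, x, T, y):
    """Gaussian transition density Γ(t, x; T, y) with variance T − t."""
    s = T - t
    return math.exp(-(x - y) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)

t, x, T = 0.0, 1.5, 2.0
# trapezoid-like Riemann sum on a wide grid around x (tails are negligible)
ys = [x - 10 + k * 1e-4 for k in range(200001)]
h = 1e-4
mass = sum(gauss_density(t, x, T, y) for y in ys) * h
mean = sum(y * gauss_density(t, x, T, y) for y in ys) * h
assert abs(mass - 1.0) < 1e-6   # total mass 1: p(t, x; T, ·) is a distribution
assert abs(mean - x) < 1e-6     # mean x: the law is centered at the datum
```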
We now introduce a notion of “continuous dependence” of the transition law with
respect to the initial datum (t, x).
2.1 Transition Law and Feller Processes 29
Definition 2.1.10 (Feller Property) A transition law p has the Feller property if
for every h > 0 and ϕ ∈ bC(RN ) the function
$$(t, x) \longmapsto \int_{\mathbb{R}^N} p(t, x; t+h, dy)\,\varphi(y)$$
is continuous. A Feller process is a process with a transition law that satisfies the
Feller property.
The Feller property is equivalent to the continuity under weak convergence of
the transition law p = p(t, x; t + h, ·) with respect to the pair (t, x) of initial
time and position: more precisely, recalling the definition of weak convergence of
distributions (cf. Remark 3.1.1 in [113]), the fact that X is a Feller process with
transition law p means that
$$p(t_n, x_n; t_n + h, \cdot) \xrightarrow{\;d\;} p(t, x; t+h, \cdot)$$

whenever $(t_n, x_n) \to (t, x)$. The Feller property plays an important role in the study of Markov
processes (cf. Chap. 7) and the regularity properties of continuous-time filtrations
(cf. Sect. 6.2.1).
Example 2.1.11 Poisson and Gaussian transition laws satisfy the Feller property
(cf. Examples 2.4.5 and 2.4.6): therefore, we say that the related stochastic processes
that we will introduce later, respectively the Poisson process and the Brownian
motion, are Feller processes.
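For the Gaussian law the Feller map of Definition 2.1.10 can be probed by Monte Carlo. The sketch below is illustrative (the function name, sample size, and seed are arbitrary); it uses the known Gaussian identity $E[\cos(x + \sqrt{h}\,Z)] = \cos(x)\,e^{-h/2}$ for $Z \sim N_{0,1}$ as a reference value, and checks that nearby initial points give nearby integrals, as continuity requires.

```python
import math
import random

def feller_map(x, h, phi, n=100_000, seed=1):
    """Monte Carlo evaluation of x ↦ ∫ p(t, x; t+h, dy) φ(y) for the
    (time-homogeneous) Gaussian transition law: X_{t+h} = x + sqrt(h) Z."""
    rng = random.Random(seed)
    s = math.sqrt(h)
    return sum(phi(x + s * rng.gauss(0, 1)) for _ in range(n)) / n

phi = math.cos                         # a bounded continuous test function
h = 0.5
exact = math.cos(1.0) * math.exp(-h / 2)   # E[cos(x + sqrt(h) Z)], x = 1
assert abs(feller_map(1.0, h, phi) - exact) < 0.01
# Feller property: moving the initial point slightly moves the integral slightly
assert abs(feller_map(1.01, h, phi) - exact) < 0.02
```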
We conclude the section with a technical result. Recall Definition 1.3.4 of the
canonical version of a stochastic process.
Proposition 2.1.12 If p is a transition law for the process X, defined on the space
(Ω, F , P ), then it is also a transition law for its canonical version X.
Proof Recall that X is defined on the probability space (RI , FμI X , μX ), where FμI X
denotes the μX -completion of F I , and X(w) = w for every w ∈ RI . Given 0 ≤
$t \le T$ and $H \in \mathcal{B}$, let $Z := p(t, X_t; T, H)$: we have to verify that

$$Z = E^{\mu_X}\left[\mathbf{1}_H(X_T) \mid X_t\right]\tag{2.1.5}$$
where E μX [·] denotes the expected value under the probability measure μX . Clearly
Z ∈ mσ (Xt ). Moreover, if W ∈ bσ (Xt ) then by Doob’s theorem W = ϕ(Xt ) with
ϕ ∈ bB and we have
$$= E^P\left[p(t, X_t; T, H)\,\varphi(X_t)\right] = \cdots$$
$$p(t, X_t; T, H) = P(X_T \in H \mid \mathcal{F}_t), \qquad 0 \le t \le T,\ H \in \mathcal{B}.\tag{2.2.1}$$

$$E\left[\varphi(X_T) \mid X_t\right] = E\left[\varphi(X_T) \mid \mathcal{F}_t\right].\tag{2.2.3}$$
The Markov property can be generalized in the following way. Observe that if $t \le t_1 < t_2$ and $\varphi_1, \varphi_2 \in b\mathcal{B}$ then, by the tower property, we have

$$E\left[\varphi_1(X_{t_1})\varphi_2(X_{t_2}) \mid X_t\right] = E\left[E\left[\varphi_1(X_{t_1})\varphi_2(X_{t_2}) \mid \mathcal{F}_{t_1}\right] \mid X_t\right] = E\left[\varphi_1(X_{t_1})\,E\left[\varphi_2(X_{t_2}) \mid \mathcal{F}_{t_1}\right] \mid X_t\right] =$$

(by Doob's theorem)

$$= E\left[\varphi_1(X_{t_1})\,E\left[\varphi_2(X_{t_2}) \mid X_{t_1}\right] \mid X_t\right] = \cdots$$
6 Formula (2.2.3) is not an equality but a notation that must be interpreted in the sense of
Hence, we have⁷

$$E\left[Y \mid X_t\right] = E\left[Y \mid \mathcal{F}_t\right]\tag{2.2.4}$$

for $Y = \varphi_1(X_{t_1})\varphi_2(X_{t_2})$ with $t \le t_1 < t_2$ and $\varphi_1, \varphi_2 \in b\mathcal{B}$. By induction, it is not difficult to prove that (2.2.4) also holds if

$$Y = \prod_{k=1}^{n} \varphi_k(X_{t_k})\tag{2.2.5}$$

for every $t \le t_1 < \cdots < t_n$ and $\varphi_1, \dots, \varphi_n \in b\mathcal{B}$. Finally, by⁸ Dynkin's Theorem A.0.8 in [113], (2.2.4) is valid for every bounded r.v. that is measurable with respect to the σ-algebra generated by the random variables of the type $X_s$ with $s \ge t$, that is

$$\mathcal{G}^X_{t,\infty} := \sigma(X_s,\ s \ge t).\tag{2.2.6}$$

$$E\left[Y \mid X_t\right] = E\left[Y \mid \mathcal{F}_t\right], \qquad Y \in b\mathcal{G}^X_{t,\infty}.\tag{2.2.7}$$
The following corollary expresses the essence of the Markov property: the past (i.e., $\mathcal{F}_t$) and the future (i.e., $\mathcal{G}^X_{t,\infty}$) are conditionally independent⁹ given the present:

$$P(A \mid X_t)\,P(B \mid X_t) = P(A \cap B \mid X_t).$$
2.2 Markov Property 33
$$E\left[Y \mid X_t\right] E\left[Z \mid X_t\right] = E\left[YZ \mid X_t\right], \qquad Y \in b\mathcal{G}^X_{t,\infty},\ Z \in b\mathcal{F}_t.\tag{2.2.8}$$
$$E\left[W\,E\left[Y \mid X_t\right] E\left[Z \mid X_t\right]\right] = E\left[W\,E\left[Y \mid X_t\right] Z\right] = E\left[W\,E\left[Y \mid \mathcal{F}_t\right] Z\right] = E\left[E\left[WYZ \mid \mathcal{F}_t\right]\right] = E\left[WYZ\right]$$
where $E^{\mu_X}[\cdot]$ denotes the expected value under the probability measure $\mu_X$. Obviously, $Z \in m\mathcal{G}^X_t$ and therefore it remains to verify that

$$W = \varphi(X_{t_1}, \dots, X_{t_n})$$

□
$$X_T^{t,x} := X_T - X_t + x, \qquad 0 \le t \le T,\ x \in \mathbb{R}.\tag{2.3.1}$$
10 We use Dynkin’s Theorem A.0.8 in [113] in a similar way to what was done in the proof of
Theorem 2.2.4.
2.3 Processes with Independent Increments and Martingales 35
Proof Let us prove that $p$ in (2.3.1) is a transition law for $X$. Clearly, $p(t, x; T, \cdot)$ is a distribution and $p(t, x; t, \cdot) = \delta_x$. Moreover, if $\mu_{X_T - X_t}$ denotes the law of $X_T - X_t$, then by Fubini's theorem, for any $H \in \mathcal{B}$ the function

$$x \longmapsto p(t, x; T, H) = \mu_{X_T - X_t}(H - x)$$
$$E\left[\varphi(X_T) \mid X_t\right] = E\left[\varphi(X_T - X_t + X_t) \mid X_t\right] =$$

(by the freezing Lemma 4.2.11 in [113], since $X_T - X_t$ is independent of $X_t$ and obviously $X_t$ is $\sigma(X_t)$-measurable)

$$= E\left[\varphi\left(X_T^{t,x}\right)\right]\Big|_{x = X_t} = \int_{\mathbb{R}} p(t, X_t; T, dy)\,\varphi(y).$$

□
A process with independent increments is not necessarily integrable, nor constant
in mean, and therefore not necessarily a martingale. However, we have the following
Let X be a Markov process with initial distribution .μ (i.e., .X0 ∼ μ) and transition
law p. The following result shows that, starting from the knowledge of .μ and p, it
is possible to determine the finite-dimensional distributions (and therefore the law!)
of X.
Proposition 2.4.1 (Finite-Dimensional Distributions [!]) Let .X = (Xt )t≥0 be
a Markov process with transition law p and such that .X0 ∼ μ. For every
.t0 , t1 , . . . , tn ∈ R with .0 = t0 < t1 < t2 < · · · < tn , and .H ∈ Bn+1 we have
$$P\left((X_{t_0}, X_{t_1}, \dots, X_{t_n}) \in H\right) = \int_H \mu(dx_0) \prod_{i=1}^{n} p(t_{i-1}, x_{i-1}; t_i, dx_i).\tag{2.4.1}$$
Now suppose (2.4.1) is true for n and prove it for .n + 1: for .H ∈ Bn+1 and .K ∈ B
we have
$$P\left((X_{t_0}, \dots, X_{t_{n+1}}) \in H \times K\right) = E\left[\mathbf{1}_H(X_{t_0}, \dots, X_{t_n})\,E\left[\mathbf{1}_K(X_{t_{n+1}}) \mid \mathcal{F}_{t_n}\right]\right] = \cdots = \int_{H \times K} \mu(dx_0) \prod_{i=1}^{n+1} p(t_{i-1}, x_{i-1}; t_i, dx_i).$$

□
Remark 2.4.2 In the particular case .μ = δx0 , (2.4.1) becomes
$$P\left((X_{t_1}, \dots, X_{t_n}) \in H\right) = \int_H \prod_{i=1}^{n} p(t_{i-1}, x_{i-1}; t_i, dx_i), \qquad H \in \mathcal{B}_n.\tag{2.4.2}$$
Proof Intuitively, the Chapman-Kolmogorov equation expresses the fact that the
probability of moving from position .x1 at time .t1 to a position in H at time .t3
is equal to the probability of transitioning to a position .x2 at an interim time .t2 ,
followed by a transition from .x2 to H , integrated over all possible values of .x2 .
We have

$$p(t_1, X_{t_1}; t_3, H) = E\left[\mathbf{1}_H(X_{t_3}) \mid X_{t_1}\right] = \cdots$$

(by (2.1.1))

$$= \int_{\mathbb{R}} p(t_1, X_{t_1}; t_2, dx_2)\,p(t_2, x_2; t_3, H).$$

□
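In density form, the Chapman-Kolmogorov equation for the Gaussian transition density reads $\Gamma(t_1, x; t_3, z) = \int_{\mathbb{R}} \Gamma(t_1, x; t_2, y)\,\Gamma(t_2, y; t_3, z)\,dy$, and this can be verified numerically. The following sketch is illustrative (the grid, the step, and the sample points are arbitrary choices):

```python
import math

def G(t, x, T, y):
    """Gaussian transition density Γ(t, x; T, y)."""
    s = T - t
    return math.exp(-(x - y) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)

# Chapman-Kolmogorov in density form:
#   Γ(t1, x; t3, z) = ∫ Γ(t1, x; t2, y) Γ(t2, y; t3, z) dy
t1, t2, t3, x, z = 0.0, 1.0, 3.0, 0.2, -0.5
ys = [-12 + 24 * k / 100000 for k in range(100001)]
h = ys[1] - ys[0]
lhs = G(t1, x, t3, z)
rhs = sum(G(t1, x, t2, y) * G(t2, y, t3, z) for y in ys) * h
assert abs(lhs - rhs) < 1e-8
```

The identity amounts to the fact that the convolution of two Gaussian densities is again Gaussian, with variances adding up, as the text notes.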
We now show that the Chapman-Kolmogorov equation is actually a necessary
and sufficient condition, in the sense that it is always possible to construct a Markov
process from an initial law and a transition law p provided that it verifies (2.4.3).
Theorem 2.4.4 ([!]) Let .μ be a distribution on .R and let .p = p(t, x; T , H ) be a
transition law11 that verifies the Chapman-Kolmogorov equation
$$p(t_1, x; t_3, H) = \int_{\mathbb{R}} p(t_1, x; t_2, dy)\,p(t_2, y; t_3, H),\tag{2.4.4}$$
for every .0 ≤ t1 < t2 < t3 , .x ∈ R and .H ∈ B. Then there exists a Markov process
X = (Xt )t≥0 with transition law p and such that .X0 ∼ μ.
and if .t0 , . . . , tn are not ordered in increasing order, we define .μt0 ,...,tn by (1.3.2) by
reordering the times. In this way, the consistency property (1.3.2) is automatically
satisfied by construction. On the other hand, the Chapman-Kolmogorov equation
guarantees the validity of the second consistency property (1.3.3) since, after
ordering the times in increasing order, we have
$$\begin{aligned} &= E\left[\mathbf{1}_H(X_{t_0}, X_{t_1}, \dots, X_{t_n})\,\varphi(X_T)\right] \\ &= \int_H \mu(dx_0) \prod_{i=1}^{n} p(t_{i-1}, x_{i-1}; t_i, dx_i) \int_{\mathbb{R}} p(t_n, x_n; T, dy)\,\varphi(y) \\ &= E\left[\mathbf{1}_H(X_{t_0}, \dots, X_{t_n}) \int_{\mathbb{R}} p(t_n, X_{t_n}; T, dy)\,\varphi(y)\right] \\ &= E\left[\mathbf{1}_{C_{t_0,\dots,t_n}(H)}\, Z\right]. \end{aligned}$$

□
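The product structure (2.4.1) behind Theorem 2.4.4 is also a simulation recipe: draw $X_{t_0} \sim \mu$, then chain the transition kernels. The following Python sketch does this for the Gaussian law (function names, sample size, and seed are arbitrary choices) and checks that $X_1 \sim N_{0,1}$ when $\mu = \delta_0$:

```python
import math
import random

def sample_path(times, x0, sample_p, rng):
    """Draw (X_{t_0}, ..., X_{t_n}) following the product structure of
    (2.4.1): start from the initial datum, then chain the kernels."""
    xs = [x0]
    for i in range(1, len(times)):
        xs.append(sample_p(times[i - 1], xs[-1], times[i], rng))
    return xs

def gauss_kernel(t, x, T, rng):
    # Gaussian transition law: p(t, x; T, ·) = N_{x, T−t}
    return rng.gauss(x, math.sqrt(T - t))

rng = random.Random(42)
times = [0.0, 0.5, 1.0]
# μ = δ_0: every path starts at 0
finals = [sample_path(times, 0.0, gauss_kernel, rng)[-1] for _ in range(100_000)]
var1 = sum(v * v for v in finals) / len(finals)
assert abs(var1 - 1.0) < 0.05   # X_1 ~ N(0, 1) for this law
```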
Example 2.4.5 (Poisson Transition Law [!]) The Poisson transition law with
parameter .λ > 0 (cf. Example 2.1.6)
$$p(t, x; T, \cdot) = e^{-\lambda(T-t)} \sum_{n=0}^{+\infty} \frac{(\lambda(T-t))^n}{n!}\,\delta_{x+n}, \qquad 0 \le t \le T,\ x \in \mathbb{R},$$
where
$$\Gamma(t, x; T, y) = \frac{1}{\sqrt{2\pi(T-t)}}\,e^{-\frac{(x-y)^2}{2(T-t)}}, \qquad 0 \le t < T,\ x, y \in \mathbb{R},$$
is the Gaussian transition density. The Gaussian transition law satisfies the
Chapman-Kolmogorov equation as it is verified directly by calculating the
convolution of two Gaussians or, more easily, the product of their characteristic
functions. We will study later, in Chap. 4, the Markov process associated with $p$, the so-called Brownian motion. For the Poisson transition law of Example 2.4.5 the same verification is direct: for $t \le s \le T$,

$$e^{-\lambda(T-t)} \sum_{i=0}^{+\infty} \frac{\lambda^i}{i!}\,\delta_{x+i}(H) \sum_{j=0}^{i} \binom{i}{j} (s-t)^j (T-s)^{i-j} = p(t, x; T, H),$$

since the inner sum equals $(T-t)^i$ by the binomial theorem.

2.5 Characteristic Operator and Kolmogorov Equations 41

For any $\varphi \in bC$ and $T > 0$ the function
$$x \longmapsto \int_{\mathbb{R}} \Gamma(0, x; T, y)\,\varphi(y)\,dy\tag{2.4.5}$$
is continuous and therefore the Brownian motion is a Feller process. Actually, one
verifies that the function in (2.4.5) is .C ∞ for each .T > 0 and .ϕ ∈ bB (not just
for .ϕ ∈ bC): for this reason we say that Brownian motion verifies the strong Feller
property.
Remark 2.4.7 (Transition Law and Semigroups) For each transition law $p = p(t, x; T, \cdot)$, there exists a corresponding family $\left(p_{t,T}\right)_{0 \le t \le T}$ of linear and bounded operators

$$p_{t,T} : b\mathcal{B} \longrightarrow b\mathcal{B}$$

defined by

$$p_{t,T}\,\varphi := \int_{\mathbb{R}} p(t, \cdot\,; T, dy)\,\varphi(y), \qquad \varphi \in b\mathcal{B}.$$
$$E\left[\varphi(X_T) \mid \mathcal{F}_t\right], \qquad 0 \le t < T,$$

$$E\left[\varphi(X_T) \mid \mathcal{F}_t\right] = u(t, X_t)\tag{2.5.1}$$

where

$$u(t, x) := \int_{\mathbb{R}^N} p(t, x; T, dy)\,\varphi(y), \qquad 0 \le t \le T,\ x \in \mathbb{R}^N.\tag{2.5.2}$$
and therefore

$$\mathcal{A}_t\varphi(x) = \lim_{T-t\to0^+} \frac{\varphi(x + \gamma(T) - \gamma(t)) - \varphi(x)}{T-t} = \lim_{T-t\to0^+} \frac{1}{T-t}\left(\nabla\varphi(x)\cdot(\gamma(T)-\gamma(t)) + o\left(|\gamma(T)-\gamma(t)|\right)\right).$$

Hence

$$\mathcal{A}_t = \gamma'(t) \cdot \nabla = \sum_{j=1}^{N} \gamma_j'(t)\,\partial_{x_j}.$$
Notice, in particular, that the characteristic operator .At depends on the process X
and not on the specific version of its transition law. By (2.5.5), in analogy with
Example 2.5.3, we can interpret .At ϕ(x) as an “average directional derivative” (or
In the following section, we show that for a wide class of transition laws, it is
possible to give a more detailed representation of the characteristic operator.
¹³ By assumption, $|\varphi(x)| \le |x|^2 g(|x|)$ for $|x| \le 1$ with $g$ going to zero as $|x| \to 0^+$ and it is not restrictive to assume $g$ monotonically increasing. Then (2.5.7) follows from the fact that

$$g(|x|)\,\chi\!\left(\frac{x}{\delta}\right) \le \chi(x)\,g(\delta), \qquad x \in \mathbb{R}^N,\ 0 < \delta \le \frac{1}{2}.$$
$$|\varphi_\delta(x)| \le g(\delta)\,|x|^2\,\chi(x), \qquad x \in \mathbb{R}^N,\ 0 < \delta \le \frac{1}{2}.\tag{2.5.7}$$
$$\mathcal{A}\varphi = \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}\,\partial_{x_i x_j}\varphi(x_0) + \sum_{i=1}^{N} b_i\,\partial_{x_i}\varphi(x_0), \qquad \varphi \in C^2(\mathbb{R}^N).\tag{2.5.8}$$
$$\mathcal{A}\varphi = \mathcal{A}\,T_{2,x_0}(\varphi) = \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}\,\partial_{x_i x_j}\varphi(x_0) + \sum_{i=1}^{N} b_i\,\partial_{x_i}\varphi(x_0)$$
$$\varphi_{ij}(x) = (x - x_0)_i (x - x_0)_j, \qquad \varphi_j(x) = (x - x_0)_j, \qquad x \in \mathbb{R}^N.\tag{2.5.9}$$
$$\varphi_\eta(x) = -\langle x - x_0, \eta\rangle^2 = -\sum_{i,j=1}^{N} \eta_i \eta_j\,\varphi_{ij}(x);$$

$$\mathcal{A}\varphi_\eta = -2\langle \mathcal{C}\eta, \eta\rangle \le 0.$$
$$\mathcal{A}\varphi = \frac{1}{2}\sum_{i,j=1}^{N} \partial_{x_i x_j}\varphi(x_0) \sum_{h=1}^{N} m_{ih} m_{jh} = \frac{1}{2}\sum_{h=1}^{N} \sum_{i,j=1}^{N} \partial_{x_i x_j}\varphi(x_0)\,m_{ih} m_{jh} \le 0,$$
$$\mathcal{A}_t\varphi(x) = \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}(t,x)\,\partial_{x_i x_j}\varphi(x) + \sum_{i=1}^{N} b_i(t,x)\,\partial_{x_i}\varphi(x), \qquad (t,x) \in \mathbb{R}_{>0}\times\mathbb{R}^N,\tag{2.5.10}$$
where C (t, x) = (cij (t, x)) is an N × N symmetric, positive semi-definite matrix
and b(t, x) = (bj (t, x)) ∈ RN . In other words, At is a second-order partial
differential operator of elliptic-parabolic type.
Combining (2.5.4) with the expression of the coefficients of At given by the
functions in (2.5.9), we obtain the formulas15
$$b_i(t,x) = \lim_{T-t\to0^+} \int_{\mathbb{R}^N} \frac{p(t,x;T,dy)}{T-t}\,(y-x)_i = \lim_{T-t\to0^+} E\left[\frac{(X_T - X_t)_i}{T-t} \;\Big|\; X_t = x\right],\tag{2.5.11}$$
14 It can be shown that the property of being local corresponds to the continuity of the trajectories of
the associated Markov process. For the characterization of the characteristic operator of a generic
Markov process, see, for example, [132].
¹⁵ If $\mathcal{A}_t$ is local at $x$ then the integration domain in (2.5.11) and (2.5.12) can be restricted to $|x - y| < 1$.
$$c_{ij}(t,x) = \lim_{T-t\to0^+} \int_{\mathbb{R}^N} \frac{p(t,x;T,dy)}{T-t}\,(y-x)_i (y-x)_j = \lim_{T-t\to0^+} E\left[\frac{(X_T - X_t)_i (X_T - X_t)_j}{T-t} \;\Big|\; X_t = x\right],\tag{2.5.12}$$
¹⁶ Notice that

$$c_{ij}(t,x) = \lim_{T-t\to0^+} \int_{\mathbb{R}^N} \frac{p(t,x;T,dy)}{T-t}\,\bigl(y - x - (T-t)b(t,x)\bigr)_i \bigl(y - x - (T-t)b(t,x)\bigr)_j = \lim_{T-t\to0^+} E\left[\frac{\bigl(X_T - X_t - (T-t)b(t,X_t)\bigr)_i \bigl(X_T - X_t - (T-t)b(t,X_t)\bigr)_j}{T-t} \;\Big|\; X_t = x\right]$$

as can be verified by expanding the product inside the integral and observing that

$$\lim_{T-t\to0^+} (T-t) \int_{\mathbb{R}^N} p(t,x;T,dy)\,b_i(t,x)\,b_j(t,x) = \lim_{T-t\to0^+} \int_{\mathbb{R}^N} p(t,x;T,dy)\,(y-x)_i\,b_j(t,x) = 0.$$
based on the definition of the characteristic operator in the form (2.5.6). The
previous steps are justified rigorously under the assumption that .u(t, ·) ∈ D: in
Example 2.5.12 this assumption is satisfied if .ϕ ∈ C 1 (RN ) since .x |→ u(t, x) =
ϕ(x + γ (T ) − γ (t)) inherits the regularity properties of .ϕ. We will examine later
other significant examples in which .u(t, ·) ∈ bC 2 (RN ) thanks to the regularizing
properties of the kernel .p(t, x; T , dy).
Therefore, at least formally, the function u in (2.5.13) solves the Cauchy problem
for the backward Kolmogorov equation17 (with final datum)
$$\begin{cases} \partial_t u(t,x) + \mathcal{A}_t u(t,x) = 0, & (t,x) \in [0,T[\times\mathbb{R}^N, \\ u(T,x) = \varphi(x), & x \in \mathbb{R}^N, \end{cases}\tag{2.5.15}$$

or in integral form

$$u(t,x) = \varphi(x) + \int_t^T \mathcal{A}_s u(s,x)\,ds, \qquad (t,x) \in [0,T]\times\mathbb{R}^N.$$
$$\Gamma(t, x; T, y) = \frac{1}{\sqrt{2\pi(T-t)}}\,e^{-\frac{(x-y)^2}{2(T-t)}}, \qquad 0 \le t < T,\ x, y \in \mathbb{R}.\tag{2.5.16}$$
¹⁷ Being $u(t,x) = \int_{\mathbb{R}^N} p(t,x;T,dy)\,\varphi(y)$, it is also customary to say that the transition law $(t,x) \mapsto p(t,x;T,dy)$ solves the backward problem

$$\begin{cases} \partial_t p(t,x;T,dy) + \mathcal{A}_t p(t,x;T,dy) = 0, & (t,x) \in [0,T[\times\mathbb{R}^N, \\ p(T,x;T,\cdot) = \delta_x, & x \in \mathbb{R}^N. \end{cases}$$
The Markov process associated with $p$ is the Brownian motion that will be introduced in Chap. 4. A direct calculation shows that

$$\partial_t\Gamma(t,x;T,y) = -\partial_T\Gamma(t,x;T,y) = \frac{T-t-(x-y)^2}{2(T-t)^2}\,\Gamma(t,x;T,y),$$

$$\partial_x\Gamma(t,x;T,y) = -\partial_y\Gamma(t,x;T,y) = \frac{y-x}{T-t}\,\Gamma(t,x;T,y),$$

$$\partial_{xx}\Gamma(t,x;T,y) = \partial_{yy}\Gamma(t,x;T,y) = -\frac{T-t-(x-y)^2}{(T-t)^2}\,\Gamma(t,x;T,y),$$
and also

$$\left(\partial_T - \frac{1}{2}\,\partial_{yy}\right)\Gamma(t,x;T,y) = 0, \qquad t < T,\ x, y \in \mathbb{R},\tag{2.5.18}$$

which is called the forward Kolmogorov equation and will be studied in Sect. 2.5.3. The characteristic operator of $p$ is the Laplace operator

$$\mathcal{A}_t = \frac{1}{2}\,\partial_{xx}$$
as can also be verified using formulas (2.5.11) and (2.5.12) which here become

$$b(t,x) = \lim_{T-t\to0^+} \int_{\mathbb{R}} \frac{\Gamma(t,x;T,y)}{T-t}\,(y-x)\,dy = 0,$$

$$c(t,x) = \lim_{T-t\to0^+} \int_{\mathbb{R}} \frac{\Gamma(t,x;T,y)}{T-t}\,(y-x)^2\,dy = 1.$$
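These two limits can be checked empirically: for the Gaussian law the increment over a window $\Delta t$ is $N(0, \Delta t)$, so $E[\Delta X]/\Delta t \to 0$ and $E[\Delta X^2]/\Delta t \to 1$. The sketch below is a Monte Carlo illustration (function name, sample size, seed, and window sizes are arbitrary choices):

```python
import math
import random

def coefficient_estimates(dt, n=200_000, seed=7):
    """Finite-Δt versions of (2.5.11)-(2.5.12) for the Gaussian law:
    b ≈ E[ΔX]/Δt and c ≈ E[ΔX²]/Δt with ΔX ~ N(0, Δt)."""
    rng = random.Random(seed)
    incs = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    b = sum(incs) / (n * dt)
    c = sum(v * v for v in incs) / (n * dt)
    return b, c

for dt in (0.1, 0.01):
    b, c = coefficient_estimates(dt)
    # drift ≈ 0 and diffusion coefficient ≈ 1, uniformly in the window
    assert abs(b) < 0.1 and abs(c - 1.0) < 0.05
```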
$$\begin{cases} \partial_t u(t,x) + \frac{1}{2}\,\partial_{xx}u(t,x) = 0, & (t,x) \in [0,T[\times\mathbb{R}, \\ u(T,x) = \varphi(x), & x \in \mathbb{R}. \end{cases}\tag{2.5.20}$$
Note that, if $v$ denotes the solution of the forward problem (2.5.19) with initial time $t = 0$, then $u(t,x) := v(T-t, x)$ solves the backward problem (2.5.20).
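Both heat equations satisfied by $\Gamma$ (the backward one in $(t,x)$ and the forward one (2.5.18) in $(T,y)$) can be checked by finite differences. The sketch below is illustrative (the sample point and the step size $h$ are arbitrary hand-picked choices):

```python
import math

def Gamma(t, x, T, y):
    """Gaussian transition density Γ(t, x; T, y) of (2.5.16)."""
    s = T - t
    return math.exp(-(x - y) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)

t, x, T, y, h = 0.3, 0.1, 1.0, 0.7, 1e-4

# backward equation in (t, x):  ∂_t Γ + ½ ∂_xx Γ = 0
dt_ = (Gamma(t + h, x, T, y) - Gamma(t - h, x, T, y)) / (2 * h)
dxx = (Gamma(t, x + h, T, y) - 2 * Gamma(t, x, T, y) + Gamma(t, x - h, T, y)) / h**2
assert abs(dt_ + 0.5 * dxx) < 1e-5

# forward equation (2.5.18) in (T, y):  ∂_T Γ − ½ ∂_yy Γ = 0
dT = (Gamma(t, x, T + h, y) - Gamma(t, x, T - h, y)) / (2 * h)
dyy = (Gamma(t, x, T, y + h) - 2 * Gamma(t, x, T, y) + Gamma(t, x, T, y - h)) / h**2
assert abs(dT - 0.5 * dyy) < 1e-5
```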
A deep connection between the theory of stochastic processes and that of partial
differential equations is given by the fact that, if it exists, the transition density of
a Markov process (for example, the Gaussian density in the case of a Brownian
motion) is the fundamental solution of the Kolmogorov equations (corresponding
to the heat equations in the case of a Brownian motion). A general treatment on
the existence and uniqueness of the solution of the Cauchy problem for partial
differential equations of parabolic type is given in Chap. 20, while in Chap. 15 we
deepen the connection with stochastic differential equations.
Example 2.5.11 ([!]) Consider the Poisson transition law with parameter .λ > 0 of
Example 2.4.5:
$$\cdots = \lambda u(t,x) - \lambda e^{-\lambda(T-t)} \sum_{n\ge1} \frac{(\lambda(T-t))^{n-1}}{(n-1)!}\,\varphi(x+n) = \lambda u(t,x) - \lambda e^{-\lambda(T-t)} \sum_{n\ge0} \frac{(\lambda(T-t))^n}{n!}\,\varphi(x+n+1)$$
In conclusion, we have

$$\partial_T \int_{\mathbb{R}^N} p(t,x;T,dy)\,\varphi(y) = \int_{\mathbb{R}^N} p(t,x;T,dy)\,\mathcal{A}_T\varphi(y), \qquad \varphi \in D,\tag{2.5.22}$$
which is called the forward Kolmogorov equation or also the Fokker-Planck
equation. Here .ϕ must be interpreted as a test function and (2.5.22) as the weak
(or distributional) form of the equation
where .AT∗ denotes the adjoint operator of .AT . For example, if .AT is a differential
operator of the form (2.5.10) then .AT∗ is obtained formally by integration by parts:
$$\int_{\mathbb{R}^N} \left(\mathcal{A}_T^* u\right)(y)\,v(y)\,dy = \int_{\mathbb{R}^N} u(y)\,\mathcal{A}_T v(y)\,dy,$$
for any pair of test functions .u, v. If the coefficients are sufficiently regular, it is
possible to write the forward operator more explicitly:
$$\mathcal{A}_T^* u = \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}\,\partial_{y_i y_j}u + \sum_{j=1}^{N} b_j^*\,\partial_{y_j}u + a^* u,\tag{2.5.23}$$

where

$$b_j^* := -b_j + \sum_{i=1}^{N} \partial_{y_i}c_{ij}, \qquad a^* := -\sum_{i=1}^{N} \partial_{y_i}b_i + \frac{1}{2}\sum_{i,j=1}^{N} \partial_{y_i y_j}c_{ij}.\tag{2.5.24}$$
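The integration-by-parts computation behind (2.5.23)-(2.5.24) can be verified symbolically in the one-dimensional case. The sketch below uses SymPy (assumed available); it checks that $(\mathcal{A}^* u)\,v - u\,(\mathcal{A} v)$ is an exact derivative, so that the two integrals agree for test functions vanishing at infinity. The `flux` expression is the boundary term produced by the integrations by parts.

```python
import sympy as sp

# One spatial dimension: A v = ½ c v'' + b v', and its formal adjoint
# A* u = ½ c u'' + b* u' + a* u with b* = −b + c', a* = −b' + ½ c''
# (the N = 1 case of (2.5.23)-(2.5.24)).
y = sp.symbols('y')
c, b, u, v = (sp.Function(n)(y) for n in ('c', 'b', 'u', 'v'))

A_v = sp.Rational(1, 2) * c * v.diff(y, 2) + b * v.diff(y)
b_star = -b + c.diff(y)
a_star = -b.diff(y) + sp.Rational(1, 2) * c.diff(y, 2)
Astar_u = sp.Rational(1, 2) * c * u.diff(y, 2) + b_star * u.diff(y) + a_star * u

# (A*u) v − u (Av) = d/dy[flux], hence ∫(A*u)v dy = ∫u(Av) dy
flux = sp.Rational(1, 2) * c * (u.diff(y) * v - u * v.diff(y)) \
       + (c.diff(y) / 2 - b) * u * v
assert sp.expand(Astar_u * v - u * A_v - flux.diff(y)) == 0
```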
The term “distributional solution” is used to indicate the fact that .p(t, x; T , ·), being
a distribution, does not generally have the regularity required to support the operator
.AT which in fact appears in (2.5.22) applied to the test function .ϕ. Note that
the problem (2.5.25) is written in the forward variables .(T , y) on .]t, +∞[×RN ,
assuming fixed the backward variables .(t, x).
The existence of the distributional solution of (2.5.25) can be proved under very
general assumptions (see, for example, Theorem 1.1.9 in [133]): although the notion
of distributional solution is very weak, this is the best result one can hope to obtain
without assuming further hypotheses, as shown by the following
Example 2.5.12 ([!]) Let us resume Example 2.5.3. The operator $\mathcal{A}_t = \gamma'(t)\cdot\nabla_x$, with $\nabla_x = (\partial_{x_1}, \dots, \partial_{x_N})$, is obviously local at every $x \in \mathbb{R}^N$: it can also be determined using formulas (2.5.11) and (2.5.12) which, for $p$ as in (2.5.3) with $\gamma$ differentiable, give

$$b(t,x) = \lim_{T-t\to0^+} \frac{1}{T-t} \int_{\mathbb{R}^N} \delta_{x+\gamma(T)-\gamma(t)}(dy)\,(y-x) = \gamma'(t),$$

$$c_{ij}(t,x) = \lim_{T-t\to0^+} \frac{1}{T-t} \int_{\mathbb{R}^N} \delta_{x+\gamma(T)-\gamma(t)}(dy)\,(y-x)_i (y-x)_j = 0.$$
Clearly, since .p(t, x; T , ·) is a measure, the gradient .∇y p(t, x; T , ·) is not defined
in the classical sense but in the sense of distributions. Therefore, problem (2.5.26)
should be understood as in (2.5.22), that is, as an integral equation where the
gradient is applied to the function .ϕ:
$$\varphi(x+\gamma(T)-\gamma(t)) = \varphi(x) + \int_t^T \gamma'(s)\cdot(\nabla\varphi)(x+\gamma(s)-\gamma(t))\,ds, \qquad \varphi \in C^1(\mathbb{R}^N);$$

by differentiating, we find

$$\frac{d}{dT}\,\varphi(x+\gamma(T)-\gamma(t)) = \gamma'(T)\cdot(\nabla\varphi)(x+\gamma(T)-\gamma(t)).$$
Intuitively, the characteristic operator provides the infinitesimal increment (also called the drift) of a process: by removing the drift, we get a martingale. This fact is made rigorous by the following remarkable result, which shows how to compensate a process, by means of the characteristic operator, so as to make it a martingale.
Theorem 2.5.13 ([!]) Let $X$ be a Markov process with characteristic operator $\mathcal{A}_t$ defined on a domain $D$. If $\psi \in D$ is such that $\mathcal{A}_t\psi(X_t) \in L^1([0,T]\times\Omega)$, then the process

$$M_t := \psi(X_t) - \int_0^t \mathcal{A}_s\psi(X_s)\,ds, \qquad t \in [0,T],$$

is a martingale.
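A concrete instance: for the Poisson process with rate $\lambda$ one can take $\psi(x) = x$, for which the characteristic operator gives the constant $\lambda$, so the compensated process is $M_t = N_t - \lambda t$. The Monte Carlo sketch below (function name, parameters, and seed are arbitrary choices) checks the martingale's constant zero mean at the terminal time:

```python
import random

def compensated_poisson(lam, T, rng):
    """Simulate M_T = N_T − λT for a rate-λ Poisson process with
    i.i.d. exponential inter-arrival times (ψ(x) = x, A ψ = λ)."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(lam)
        if t > T:
            return n - lam * T
        n += 1

rng = random.Random(3)
lam, T, trials = 2.0, 5.0, 100_000
mean = sum(compensated_poisson(lam, T, rng) for _ in range(trials)) / trials
assert abs(mean) < 0.05   # E[M_T] = E[M_0] = 0, as for a martingale
```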
Proof We have .Mt ∈ L1 (Ω, P ), for any .t ∈ [0, T ], thanks to the assumptions18 on
.ψ. It remains to prove that
$$E\left[M_t - M_s \mid \mathcal{F}_s\right] = 0, \qquad 0 \le s \le t \le T,$$

that is

$$E\left[\psi(X_t) - \psi(X_s) - \int_s^t \mathcal{A}_r\psi(X_r)\,dr \;\Big|\; \mathcal{F}_s\right] = 0, \qquad 0 \le s \le t \le T.$$

(by the Markov property (2.5.1) applied to the first and last term)

$$= E\left[\psi(X_t) \mid \mathcal{F}_s\right] - \psi(X_s) - \int_s^t E\left[\mathcal{A}_r\psi(X_r) \mid \mathcal{F}_s\right] dr =$$

(since, as we will prove shortly, it is possible to exchange the time integral with the conditional expectation)

$$= E\left[\psi(X_t) - \psi(X_s) - \int_s^t \mathcal{A}_r\psi(X_r)\,dr \;\Big|\; \mathcal{F}_s\right]$$

is a version of the conditional expectation of $\int_s^t \mathcal{A}_r\psi(X_r)\,dr$ given $\mathcal{F}_s$. First of all, from the fact that $E\left[\mathcal{A}_r\psi(X_r) \mid \mathcal{F}_s\right] \in m\mathcal{F}_s$ it follows that also $Z \in m\mathcal{F}_s$. Then, for every $G \in \mathcal{F}_s$, we have

$$E[Z\mathbf{1}_G] = E\left[\int_s^t E\left[\mathcal{A}_r\psi(X_r) \mid \mathcal{F}_s\right] dr\; \mathbf{1}_G\right] = \cdots$$
18 We also recall that .ψ is bounded since .D ⊆ bBN : this assumption is not restrictive and can be
significantly weakened.
2.6 Markov Processes and Diffusions 55
□
$$\mathcal{A}_t = \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}(t,x)\,\partial_{x_i x_j} + \sum_{i=1}^{N} b_i(t,x)\,\partial_{x_i}, \qquad (t,x) \in \mathbb{R}\times\mathbb{R}^N.$$
matrix .C (t, Xt ), consistently with Eqs. (2.5.11) and (2.5.12). Itô developed a theory
of stochastic calculus based on which the previous idea can be formalized in terms
of the stochastic differential equation
19 The most important result in this regard is the famous Hörmander’s theorem [62].
20 Malliavin’s calculus extends the mathematical field of calculus of variations from deterministic
functions to stochastic processes. For a general reference see, e.g., [101].
2.7 Key Ideas to Remember 57
We summarize the core concepts and key insights from the chapter to facilitate
comprehension, omitting the more technical or less significant details. As usual,
if you have any doubt about what the following succinct statements mean, please
review the corresponding section.
• Section 2.1: the transition law of a stochastic process .X = (Xt )t≥0 is the family
of the conditional distributions of .XT given .Xt , indexed by .t, T with .t ≤ T . Two
notable examples of transition laws are the Gaussian and Poisson ones.
• Section 2.2: for a Markov process, conditioning on .Ft (the .σ -algebra of
information up to time t) is equivalent to conditioning on .Xt : in this sense, the
Markov property is a “memoryless” property.
• Section 2.3: processes with independent increments are Markov processes.
• Section 2.4: starting from the initial distribution and the transition law of a
Markov process, it is possible to derive the finite-dimensional distributions, and
therefore the law of the process: moreover, the transition law of a Markov process
verifies an important identity, the Chapman-Kolmogorov equation (2.4.3), which
expresses a consistency property between the distributions that make up the
transition law.
• Section 2.5: if it exists, the average directional derivative along the trajectories of
X, i.e.
$$\lim_{T-t\to0^+} E\left[\frac{\varphi(X_T) - \varphi(X_t)}{T-t} \;\Big|\; X_t = x\right] =: \mathcal{A}_t\varphi(x),$$
defines the characteristic operator .At of the Markov process X, at least for .ϕ in
an appropriate space of functions.
• Section 2.5.1: for continuous Markov processes, .At is a second-order elliptic-
parabolic partial differential operator whose prototype is the Laplace operator.
The coefficients of .At are the infinitesimal increments of the mean and covari-
ance matrix of X (cf. formulas (2.5.11) and (2.5.12)).
• Sections 2.5.2 and 2.5.3: the transition law is the solution of the backward
and forward Kolmogorov equations. The prototypes of such equations are the
backward and forward versions of the heat equation.
• Section 2.6: we call diffusion a continuous Markov process. A classical approach
to the construction of diffusions consists in determining their transition law
as fundamental solutions of the backward or forward Kolmogorov equation.
Alternatively, diffusions are constructed as solutions of stochastic differential
equations, the theory of which will be developed starting from Chap. 14.
The notion of continuity for stochastic processes, although intuitive, hides some
small pitfalls and must therefore be analyzed carefully.
In this chapter, $I$ denotes a real interval of the form $I = [0,T]$ or $I = [0,+\infty[$. Moreover, $C(I)$ is the set of continuous real-valued functions on $I$. In the first part of the chapter, we confirm a natural and unsurprising fact:
a continuous process can be defined as a random variable with values in the space
of continuous functions .C(I ), rather than in the space .RI of all trajectories, as
seen in the broader definition of a stochastic process (cf. Definition 1.1.3). Then
we prove the fundamental Kolmogorov continuity theorem, according to which, up to modifications, one can deduce the continuity of a process from a condition on its law: this is a deep result because it allows one to deduce a "pointwise" property (of individual trajectories) from a condition "in the average" (i.e. on the law of the process).
$$M = \sup_{t\in[0,1]\cap\mathbb{Q}} X_t.$$
1 We cannot use (X ∈ C(I )) instead of A because if (Ω, F , P ) is not complete then X1(X∈C(I ))
would not necessarily be a stochastic process.
2 Actually, the argument is more subtle and will be clarified in Sect. 3.3.
3.2 Canonical Version of a Continuous Process 61
$$J(\omega) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} X_{\frac{k}{n}}(\omega)$$
since the integral of a continuous function is equal to the limit of Riemann sums.
Finally, $(I^+ = \emptyset) = (M \le 0) \in \mathcal{F}$ and thus also

$$(T < t) = (I^+ = \emptyset) \cup \bigcup_{s\in\mathbb{Q}\cap[0,t[} (X_s > 0)$$
In this section, we focus on the case $I = [0,1]$. We recall that $C([0,1])$ (we also write, more simply, $C[0,1]$) is a separable and complete metric space, i.e., a Polish space, with the uniform metric

$$\varrho_{\max}(v,w) = \max_{t\in[0,1]} |v(t) - w(t)|, \qquad v, w \in C[0,1].$$

We consider $I = [0,1]$ only for simplicity: the results of this section can be easily extended to the case where $I = [0,T]$ or even $I = \mathbb{R}_{\ge0}$ considering the distance
$$\varrho_{\max}(v,w) = \sum_{n\ge1} \frac{1}{2^n} \min\left\{1,\ \max_{t\in[0,n]} |v(t) - w(t)|\right\}, \qquad v, w \in C(\mathbb{R}_{\ge0}).$$
We denote by .Bρmax the Borel .σ -algebra on .C[0, 1] (cf. Section 1.4.2 in [113]).
According to the general Definition 1.1.3, a stochastic process .X = (Xt )t∈I is
a measurable function from .(Ω, F ) to .(RI , F I ). We now show that if X is con-
tinuous then it is possible to replace the codomain .(RI , F I ) with .(C(I ), Bϱmax ),
maintaining the measurability property with respect to the .σ -algebra .Bϱmax . This
fact is not trivial and deserves to be proven rigorously. In fact, based on Remark
1.1.10, .C[0, 1] itself does not belong to .F [0,1] and therefore in general .(X ∈
C[0, 1]) is not an event. Similarly, the singletons .{w} are not elements of .F [0,1]
and therefore even if
is measurable.
Proof First, we show that $\mathcal{B}_{\varrho_{\max}}$ is the σ-algebra generated by the family $\widetilde{\mathcal{C}}$ of cylinders of the form³

$$\widetilde{C}_t(H) := \{w \in C[0,1] \mid w(t) \in H\}, \qquad t \in [0,1],\ H \in \mathcal{B}.\tag{3.2.1}$$
In fact, cylinders of the type (3.2.1) with H open in .R generate .σ (C~) and are open
with respect to .ϱmax : therefore .Bϱmax ⊇ σ (C~).
Conversely, since .(C[0, 1], ϱmax ) is separable, every open set is a countable
union of open disks. Therefore, .Bϱmax is generated by the family of open disks
that are sets of the form
where .w ∈ C[0, 1] is the center and .r > 0 is the radius of the disk. On the other
hand, each disk is obtained by countable operations of union and intersection of cylinders of $\widetilde{\mathcal{C}}$ in the following way:

$$D(w,r) = \bigcup_{n\in\mathbb{N}}\ \bigcap_{t\in[0,1]\cap\mathbb{Q}} \left\{v \in C[0,1] \;\middle|\; |v(t) - w(t)| < r - \tfrac{1}{n}\right\}.$$
Thus, each disk belongs to .σ (C~) and this proves the opposite inclusion.
Now we prove the thesis: as just proven, we have

$$X^{-1}\left(\mathcal{B}_{\varrho_{\max}}\right) = X^{-1}\left(\sigma(\widetilde{\mathcal{C}})\right) =$$

(since X is continuous)

$$= X^{-1}(\sigma(\mathcal{C})) \subseteq \mathcal{F}$$

where the last inclusion is due to the fact that X is a stochastic process. □
3 We use the “tilde” to distinguish the cylinders of continuous functions from the cylinders of .R [0,1]
defined in (1.1.1).
$$\mu_X(H) = P(X \in H), \qquad H \in \mathcal{B}_{\varrho_{\max}}.$$

Two continuous processes X and Y are equal in law (or in distribution) if $\mu_X = \mu_Y$: in this case we write $X \stackrel{d}{=} Y$.
In analogy with Definition 1.3.4 we give the following
Definition 3.2.3 (Canonical Version of an a.s. Continuous Process [!]) Let .X =
(Xt )t∈I be an a.s. continuous process defined on the space .(Ω, F , P ) and with law
.μX . The canonical version of X is the stochastic process defined as the identity
4 By Remark 3.1.3, the definition extends to the case of X a.s. continuous in an obvious way.
$$|\widetilde{X}_t(\omega) - \widetilde{X}_s(\omega)| \le c_{\alpha,\omega}\,|t-s|^\alpha, \qquad t, s \in [0,1].$$
In Sect. 3.4 we give a proof of Theorem 3.3.1, inspired by the original ideas of
Kolmogorov. Let us consider some examples.
Example 3.3.2 ([!]) We resume Corollary 1.3.6 and consider a Gaussian process (Xt)t∈[0,1] with mean function m ≡ 0 and covariance c(s, t) = s ∧ t. By definition, (Xt, Xs) ∼ N0,Ct,s where

    Ct,s = ( t      s ∧ t )
           ( s ∧ t    s   )

Hence, for s ≤ t, the increment Xt − Xs is equal in law to √(t − s) Z with Z ∼ N0,1; then, for every p > 0 we have

    E[|Xt − Xs|^p] = |t − s|^{p/2} E[|Z|^p]
    E[|Nt − Ns|^p] = e^{−λ(t−s)} Σ_{n=0}^{∞} n^p (λ(t−s))^n / n!

                   ≥ e^{−λ(t−s)} Σ_{n=1}^{∞} (λ(t−s))^n / n!

                   = e^{−λ(t−s)} (e^{λ(t−s)} − 1) = λ(t − s) + o(t − s)
for .t − s → 0. Thus, condition (3.3.1) is not satisfied for any value of .ε > 0. Indeed,
in Chap. 5 we will discover that the Poisson law corresponds to a process N with
discontinuous trajectories.
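The two moment computations above lend themselves to a quick numerical sanity check. The following sketch (our own illustration, not part of the text; all names are ours) verifies by Monte Carlo that Gaussian increments satisfy E[|Xt − Xs|⁴] = 3|t − s|², while the mean Poisson increment is only linear in t − s:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
s, t, lam = 0.5, 0.7, 2.0

# Gaussian case: X_t - X_s is equal in law to sqrt(t - s) * Z with Z ~ N(0, 1),
# so E[|X_t - X_s|^4] = |t - s|^2 * E[Z^4] = 3 * |t - s|^2.
Z = rng.standard_normal(n)
emp = np.mean(np.abs(np.sqrt(t - s) * Z) ** 4)
exact = 3.0 * (t - s) ** 2

# Poisson case: E[N_t - N_s] = lam * (t - s), i.e. linear in t - s, so a bound
# of order |t - s|^{1 + eps} cannot hold for small t - s.
incr = rng.poisson(lam * (t - s), size=n)
poisson_mean = incr.mean()
```

With the seed fixed above, `emp` lands close to `exact = 0.12` and `poisson_mean` close to λ(t − s) = 0.4.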
Theorem 3.3.1 can be extended in several directions: the most interesting ones
concern higher-order regularity, the extension to the case of multidimensional I ,
and the case of processes with values in Banach spaces. In relatively recent times, it
has been observed that Kolmogorov’s continuity theorem is essentially an analytical
result that can be proved as a corollary of the Sobolev embedding theorem, in a very
general version for the so-called Besov spaces. We provide here the statement given
in [128].
Theorem 3.3.4 (Kolmogorov’s Continuity Theorem) [[!!!]] Let X = (Xt)t∈R^d be a real stochastic process. If there exist k ∈ N0, 0 < ε < p, and δ > 0 such that

    E[|Xt − Xs|^p] ≤ c|t − s|^{d+ε+kp}

for every t, s ∈ R^d with |t − s| < δ, then X admits a modification X̃ whose trajectories are differentiable up to order k, with locally α-Hölder derivatives for every α ∈ [0, ε/p[.
Theorem 3.3.4 also extends to processes with values in a Banach space: the
following example is particularly relevant in the study of stochastic differential
equations.
Example 3.3.5 Let (Xt^x)t∈[0,1] be a family of continuous stochastic processes, indexed by x ∈ R^d: as in Sect. 3.2, we consider X^x as a r.v. with values in (C[0, 1], Bϱmax), which is a Banach space with the norm ‖w‖∞ = max_{t∈[0,1]} |w(t)|. If

    E[‖X^x − X^y‖∞^p] ≤ c|x − y|^{d+ε},   x, y ∈ R^d,

then there exists a modification X̃ (i.e., we have⁶ X̃^x = X^x a.s. for each x ∈ R^d) such that

    ‖X̃^x(ω) − X̃^y(ω)‖∞ ≤ c|x − y|^α,   x, y ∈ K,
We have to prove that, if X = (Xt)t∈[0,1] is a real stochastic process and there exist three constants p, ε, c > 0 such that

    E[|Xt − Xs|^p] ≤ c|t − s|^{1+ε},   t, s ∈ [0, 1],   (3.4.1)

⁶ In the sense that P(X̃t^x = Xt^x, t ∈ [0, 1]) = 1.
We observe that from (3.4.2) it follows that, fixing .t ∈ [0, 1], there exists the limit
in probability
. lim Xs = Xt
s→t
and consequently, there is also almost sure convergence. However, this is not enough
to prove the thesis: in fact, the same result holds, for example, for the Poisson
process which has all discontinuous trajectories (cf. (5.1.5)). Indeed, Kolmogorov
realized that from (3.4.2) it is not possible to directly obtain an estimate of the
increment .Xt − Xs for every .t, s since .[0, 1] is uncountable. Thus, his idea was to
first restrict t, s to the countable family of dyadic rationals of [0, 1] defined by

    D = ⋃_{n≥1} Dn,   Dn = {k/2^n | k = 0, 1, . . . , 2^n}.
We observe that Dn ⊆ Dn+1 for every n ∈ N. Two elements t, s ∈ Dn are called consecutive if |t − s| = 2^{−n}.
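Kolmogorov's dyadic grids can be sketched in a few lines of code (an illustration of ours, not the book's): the grids are nested and consecutive points of Dn are exactly 2⁻ⁿ apart.

```python
from fractions import Fraction

def dyadics(n):
    """The grid D_n = {k / 2^n : k = 0, ..., 2^n} as exact rationals."""
    return [Fraction(k, 2 ** n) for k in range(2 ** n + 1)]

D3, D4 = dyadics(3), dyadics(4)
nested = set(D3) <= set(D4)                 # D_n ⊆ D_{n+1}
gaps = {b - a for a, b in zip(D3, D3[1:])}  # spacing between consecutive points
```

Here `nested` is True and `gaps` contains only the single value 1/8 = 2⁻³.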
Second Step We estimate the increment Xt − Xs assuming that t, s are consecutive in Dn: by (3.4.2) we have

    P( |X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα} ) ≤ c 2^{n(αp−1−ε)}.
Then, setting

    An = ( max_{1≤k≤2^n} |X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα} ) = ⋃_{1≤k≤2^n} ( |X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα} ),
we have

    P(An) ≤ Σ_{k=1}^{2^n} P( |X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα} ) ≤ c Σ_{k=1}^{2^n} 2^{n(αp−1−ε)} = c 2^{n(αp−ε)}.
Hence, if α < ε/p, we have
    Σ_{n≥1} P(An) < ∞
and by Borel-Cantelli’s Lemma 1.3.28 in [113] .P (An i.o.) = 0: this means that
there exists .N ∈ F , with .P (N) = 0, such that for every .ω ∈ Ω \ N there exists
.nα,ω ∈ N for which
As a consequence, we also have that for every .ω ∈ Ω \ N there exists .cα,ω > 0 such
that
    n̄ = min{k | t, s ∈ Dk},   n = max{k | t − s < 2^{−k}},

so that n < n̄. Moreover, for k = n + 1, . . . , n̄, we recursively define the sequences

    sn = max{τ ∈ Dn | τ ≤ s},   sk = sk−1 + 2^{−k} sgn(s − sk−1),

and analogously (tk)k, so that

    |s − sk| < 2^{−k},   |t − tk| < 2^{−k},   k = n, . . . , n̄,
whence

    Xt − Xs = Xtn − Xsn + Σ_{k=n+1}^{n̄} (Xtk − Xtk−1) − Σ_{k=n+1}^{n̄} (Xsk − Xsk−1)

and therefore, for ω ∈ Ω \ N,

    |Xt(ω) − Xs(ω)| ≤ cα,ω 2^{−nα} + 2 cα,ω Σ_{k=n+1}^{n̄} 2^{−kα}

                    ≤ 2 cα,ω Σ_{k=n}^{∞} 2^{−kα}

                    = (2 cα,ω / (1 − 2^{−α})) 2^{−nα},
so that |Xt − Xs| ≤ c′α,ω |t − s|^α for some positive constant c′α,ω.
Fourth Step We proved that for every ω ∈ Ω \ N the trajectory X(ω) is α-Hölder continuous on D and therefore extends uniquely to an α-Hölder continuous function on [0, 1], which we denote by X̃(ω). Now we define the process X̃ whose trajectories are equal to X̃(ω) if ω ∈ Ω \ N and are identically zero on N. We prove that X̃ is a modification of X, that is, P(Xt = X̃t) = 1 for every fixed t ∈ [0, 1]: this is obvious if t ∈ D. On the other hand, if t ∈ [0, 1] \ D, we consider a sequence (tn)n∈N in D that approximates t. We already have observed that, by (3.4.2), Xtn
3.5 Key Ideas to Remember

We provide a summary of the chapter’s major findings and the essential concepts needed for a first reading, leaving out technical or secondary details. As usual, if you have any doubt about what the following succinct statements mean, please review the corresponding section.
• Sections 3.1 and 3.2: a continuous stochastic process X can be regarded as a
random variable with values in the Polish metric space of continuous trajectories,
.(C(I ), Bϱmax ). The law of X is therefore a distribution on the Borel .σ -algebra
.Bϱmax .
Brownian motion stands out as one of the paramount stochastic processes. It owes
its name to the botanist Robert Brown, who, circa 1820, documented the erratic
motion exhibited by pollen grains suspended within a solution. This phenomenon,
characterized by the seemingly random movement of particles due to collisions with
surrounding molecules, has since found widespread applications in various fields,
ranging from physics and chemistry to finance and biology. Brownian motion was
used by Louis Bachelier in 1900 in his doctoral thesis as a model for the price of
stocks and was studied by Albert Einstein in one of his famous papers in 1905.
The first rigorous mathematical definition of a Brownian motion is due to Norbert
Wiener in 1923.
4.1 Definition
Definition 4.1.1 (Brownian Motion [!!!]) Let W = (Wt )t≥0 be a real stochastic
process defined on a filtered probability space (Ω, F , P , Ft ). We say that W is a
Brownian motion if it satisfies the following properties:
(i) W0 = 0 a.s.;
(ii) W is a.s. continuous;
(iii) W is adapted to (Ft )t≥0 , i.e., Wt ∈ mFt for every t ≥ 0;
(iv) Wt − Ws is independent of Fs for every t ≥ s ≥ 0;
(v) Wt − Ws ∼ N0,t−s for every t ≥ s ≥ 0.
Fig. 4.2 1000 trajectories of a Brownian motion and histogram of its sample distribution at time
t =1
Remark 4.1.2 Let us briefly comment on the properties of Definition 4.1.1: by (i)
a Brownian motion starts from the origin, just as a convention. Property (ii) ensures
that almost all trajectories of W are continuous. Moreover, W is adapted to the
filtration (Ft )t≥0 : this means that, at any fixed time t, the information in Ft is
sufficient to observe the entire trajectory of W up to time t. Properties (iv) and
(v) are less intuitive but can be justified by some notable features, observable at
the statistical level, of random motions: we call (iv) and (v) the independence and
stationarity properties of the increments, respectively (cf. Definition 2.3.1). Notice
that Wt −Ws is equal in law to Wt−s . Figures 4.1 and 4.2 show the plot of trajectories
of a Brownian motion.
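Plots like Figs. 4.1–4.2 are easy to reproduce. The following sketch (our own illustration, not the book's code; all names are ours) simulates 1000 discretized trajectories on [0, 1] by cumulating independent N(0, dt) increments, in line with properties (i), (iv) and (v):

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, m = 1000, 500
dt = 1.0 / m

# Independent, stationary increments W_{t_k} - W_{t_{k-1}} ~ N(0, dt): (iv)-(v)
incr = rng.standard_normal((n_paths, m)) * np.sqrt(dt)
# W_0 = 0 (property (i)); each row is one trajectory on the grid k/m
W = np.concatenate([np.zeros((n_paths, 1)), incr.cumsum(axis=1)], axis=1)

W1 = W[:, -1]                       # samples of W_1, which should be ~ N(0, 1)
mean1, var1 = W1.mean(), W1.var()
```

The histogram of `W1` approximates the standard normal density, matching the sample distribution shown in Fig. 4.2.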
Remark 4.1.3 In Definition 4.1.1 the filtration (Ft ) is not necessarily the one
generated by W : the latter was denoted by (GtW )t≥0 in Definition 1.4.3. Clearly,
property (iii) of a Brownian motion implies that GtW ⊆ Ft for every t ≥ 0. We
will see in Sect. 6.2 that it is generally preferable to work with filtrations strictly
larger than G W in order to satisfy appropriate technical assumptions, including, for
example, completeness.
We give a useful characterization of a Brownian motion.
Proposition 4.1.4 ([!]) An a.s. continuous stochastic process W = (Wt )t≥0 is a
Brownian motion with respect to its own filtration (GtW )t≥0 if and only if it is a
Gaussian process with zero mean function, E [Wt ] = 0, and covariance function
cov(Ws , Wt ) = s ∧ t.
Proof Let W be a Brownian motion on (Ω, F, P, (GtW)t≥0). For each 0 = t0 < t1 < · · · < tn, the random variables Zk := Wtk − Wtk−1 have normal distribution; moreover, by properties (iii) and (iv) of a Brownian motion, Zk is independent of G^W_{tk−1} and therefore of Z1, . . . , Zk−1 ∈ mG^W_{tk−1}. This proves that (Z1, . . . , Zn) is a multi-normal vector with independent components. Also (Wt1, . . . , Wtn) is multi-normal because it is obtained from (Z1, . . . , Zn) by the linear transformation

    Wth = Σ_{k=1}^{h} Zk,   h = 1, . . . , n,
and this proves that W is a Gaussian process. We also observe that, assuming s < t,
we have
Remark 4.1.5 ([!]) Proposition 4.1.4 states that the finite-dimensional distributions of a Brownian motion are uniquely determined: hence the Brownian motion is unique in law. Notice that, if W is a Brownian motion then the process W̃t := √t W1 has the same one-dimensional distributions as W but is obviously not a Brownian motion.
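Remark 4.1.5 can be checked numerically. A sketch of ours (not part of the text): W̃t := √t W1 has variance t, so the one-dimensional marginals match those of W, but cov(W̃s, W̃t) = √(st), not s ∧ t.

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.standard_normal(400_000)
s, t = 0.25, 1.0

Ws_tilde = np.sqrt(s) * W1                        # W~_s = sqrt(s) * W_1
var_s = Ws_tilde.var()                            # ~ s: same marginal as W_s
cov_st = np.mean(Ws_tilde * (np.sqrt(t) * W1))    # ~ sqrt(s*t) = 0.5, not s ∧ t = 0.25
```

The estimated covariance is close to √(st) = 0.5, visibly different from s ∧ t = 0.25, so W̃ fails the characterization of Proposition 4.1.4.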
There are numerous proofs of the existence of a Brownian motion: some of them
can be found, for example, in the monographs by Schilling [129] and Bass [9]. Here
we see the result as a corollary of Kolmogorov’s extension and continuity theorems.
Theorem 4.1.6 A Brownian motion exists.
Proof The main step is the construction of a Brownian motion on the bounded time interval [0, 1]. By Kolmogorov’s extension theorem (in particular, by Corollary 1.3.6) there exists a Gaussian process W(0) = (Wt(0))t∈[0,1] with zero mean function and covariance function cov(Ws(0), Wt(0)) = s ∧ t. By Kolmogorov’s continuity theorem and Example 3.3.2, W(0) admits a continuous modification that, by Proposition 4.1.4, satisfies the properties of a Brownian motion on [0, 1].
Now take a sequence (W(n))n∈N of independent copies of W(0). We “glue” these processes together by defining Wt = Wt(0) for t ∈ [0, 1] and

    Wt = Σ_{k=0}^{[t]−1} W1(k) + W([t])_{t−[t]},   t > 1,

where [t] denotes the integer part of t. Then it is easy to prove that W is a Brownian motion. ⊓⊔
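The gluing step of the proof can be sketched as follows (our own illustration under stated assumptions: each copy is approximated on a discrete grid, and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1000                                    # grid points per unit interval

def bm_on_unit_interval():
    """One discretized trajectory of a Brownian motion on [0, 1]."""
    incr = rng.standard_normal(m) * np.sqrt(1.0 / m)
    return np.concatenate([[0.0], incr.cumsum()])

copies = [bm_on_unit_interval() for _ in range(3)]   # W^(0), W^(1), W^(2)

def W(t):
    """Glued trajectory at time t in [0, 3[: sum of completed copies plus the
    current copy evaluated at the fractional part t - [t]."""
    k, frac = int(t), t - int(t)
    return sum(c[-1] for c in copies[:k]) + copies[k][int(round(frac * m))]

# continuity of the gluing at integer times: W(1) equals the endpoint of copy 0
glue_ok = abs(W(1.0) - copies[0][-1])
```

The trajectory is continuous across the gluing points by construction: at t = 1 the new copy contributes its starting value 0.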
Remark 4.1.7 As seen in Example 3.3.2, a Brownian motion admits a modification with trajectories that are not only continuous but also locally α-Hölder continuous for every α < 1/2. The exponent α is strictly less than 1/2, and this result cannot be improved: for more details, we refer, for example, to Chapter 7 in [9]. A classic result, the Law of the iterated logarithm, precisely describes the asymptotic behavior of Brownian increments:

    limsup_{t→0+} |Wt| / √(2t log log(1/t)) = 1   a.s.

Consequently, the trajectories of a Brownian motion are almost surely not differentiable at any point: precisely, there exists N ∈ F, with P(N) = 0, such that for every ω ∈ Ω \ N, the function t |→ Wt(ω) is not differentiable at any point in [0, +∞[.
    WT^{t,x} := WT − Wt + x,   T ≥ t.
Definition 4.2.1 The process W^{t,x} = (WT^{t,x})T≥t is called Brownian motion with initial point x at time t and has the following properties:
(i) Wt^{t,x} = x;
(ii) the trajectories T |→ WT^{t,x} are a.s. continuous;
(iii) WT^{t,x} ∈ mFT for every T ≥ t;
(iv) WT^{t,x} − Ws^{t,x} = WT − Ws is independent of Fs for every T ≥ s ≥ t;
(v) WT^{t,x} − Ws^{t,x} ∼ N0,T−s for every T ≥ s ≥ t.
Remark 4.2.2 The process W^{t,x} is also a Brownian motion with respect to its generated filtration, defined by

    GT^{t,x} := σ(Ws^{t,x}, s ∈ [t, T]),   T ≥ t.

Note that GT^{t,x} ⊆ FT and there is a strict inclusion Gt^{t,x} = {∅, Ω} ⊂ Ft if t > 0. By Proposition 2.3.2, we have
Theorem 4.2.3 (Markov Property [!]) Let .W = (Wt )t≥0 be a Brownian motion
on .(Ω, F , P , Ft ). Then W is a Markov process with Gaussian transition density
    Γ(t, x; T, y) = 1/√(2π(T − t)) · e^{−(x−y)²/(2(T−t))},   0 ≤ t < T, x, y ∈ R.   (4.2.1)
    u(t, Wt) = E[ϕ(WT) | Ft]

where

    u(t, x) := ∫_R Γ(t, x; T, y) ϕ(y) dy.   (4.2.2)
This is in agreement with Example 2.5.9, being At = (1/2)∂xx the characteristic operator of the Gaussian transition distribution. Note that the hypothesis ϕ ∈ bC(R) is only used to prove the continuity of u(t, x) up to t = T.
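Formula (4.2.2) can be checked numerically. In the sketch below (ours, not the book's; the function name `Gamma` and the test function φ = cos are our choices), the quadrature of Γ against φ is compared with the closed form of E[cos(x + √(T − t) Z)] obtained from the Gaussian characteristic function:

```python
import numpy as np

def Gamma(t, x, T, y):
    """Gaussian transition density (4.2.1)."""
    return np.exp(-((x - y) ** 2) / (2 * (T - t))) / np.sqrt(2 * np.pi * (T - t))

phi = np.cos
t, T, x = 0.5, 1.5, 0.3

# u(t, x) = ∫ Γ(t, x; T, y) φ(y) dy, approximated on a wide fine grid
y = np.linspace(x - 10.0, x + 10.0, 20001)
dy = y[1] - y[0]
u_quad = np.sum(Gamma(t, x, T, y) * phi(y)) * dy

# E[cos(x + sigma*Z)] = cos(x) * exp(-sigma^2 / 2) with sigma^2 = T - t
u_exact = np.cos(x) * np.exp(-(T - t) / 2.0)
```

The two values agree to several decimal places, illustrating that u(t, x) = E[ϕ(WT) | Wt = x].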
Recall Definition 3.2.3, which defines the canonical version of an a.s. continuous
process. An immediate consequence of Proposition 4.1.4 is the following
Corollary 4.3.3 Given a Brownian motion W , its canonical version .W is a
Brownian motion on the Wiener space equipped with the filtration .G W generated
by .W.
Given a Brownian motion W , we will later introduce (cf. Sect. 6.2.3) a filtration
larger than .G W so that some useful regularity properties hold.
Example 4.3.4 Let W be a real Brownian motion and 0 < t < T. We have the following expressions for the joint densities of Wt and WT:

    γ(Wt,WT)(t, x; T, y) = γ(WT,Wt)(T, y; t, x) = 1/(2π√(t(T − t))) · e^{−(Tx² − 2txy + ty²)/(2t(T−t))}.

Consequently, the conditional densities are

    γWT|Wt(T, y; t, x) = γ(WT,Wt)(T, y; t, x) / γWt(t, x) = Γ(t, x; T, y),

    γWt|WT(t, x; T, y) = γ(Wt,WT)(t, x; T, y) / γWT(T, y) = 1/√(2π t(T−t)/T) · e^{−T(x − (t/T)y)²/(2t(T−t))},

and

    μWt|WT = N_{(t/T)WT, t(T−t)/T}.
    Xt := Wt² − t;

    Yt := e^{σWt − σ²t/2}   for every σ ∈ C.
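A necessary condition for the martingale property is a constant mean, which is easy to verify by simulation. The following sketch (ours, not part of the text; all names are ours) checks that E[Wt² − t] = 0 and E[exp(σWt − σ²t/2)] = 1, using Wt ∼ √t Z:

```python
import numpy as np

rng = np.random.default_rng(4)
n, t, sigma = 500_000, 2.0, 0.8
Wt = np.sqrt(t) * rng.standard_normal(n)       # W_t ~ N(0, t)

quad_mean = np.mean(Wt ** 2 - t)                              # quadratic martingale: ~ 0
expo_mean = np.mean(np.exp(sigma * Wt - sigma ** 2 * t / 2))  # exponential martingale: ~ 1
```

Both Monte Carlo means agree with the theoretical values 0 and 1 within sampling error.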
and therefore W is an absolutely integrable process. Part (i) follows from Proposi-
tion 2.3.4, being W a process with constant zero mean and independent increments.
Similarly, (ii) and (iii) are proven: for example, we have

    E[XT | Ft] = E[(WT − Wt + Wt)² | Ft] − T

               = E[(WT − Wt)² | Ft] + 2Wt E[WT − Wt | Ft] + Wt² − T

               = (T − t) + 0 + Wt² − T = Wt² − t.   ⊓⊔
We give a useful characterization of a Brownian motion in terms of exponential martingales.

Proposition 4.4.2 ([!]) A continuous and adapted process W, defined on the space (Ω, F, P, Ft) and such that W0 = 0 a.s., is a Brownian motion if and only if the process

    Mt := e^{iηWt + (η²/2)t},   t ≥ 0,

is a martingale for every η ∈ R.

from which the thesis follows: in particular, the independence property follows from 14) of Theorem 4.2.10 in [113]. ⊓⊔
The following version of Theorem 2.5.13 provides a general method for con-
structing a martingale by composing a Brownian motion W with a sufficiently
regular function .f = f (t, x). We also assume on f a growth condition of the type
with .cT a positive constant dependent on T and .α ∈ [0, 2[: this ensures the
integrability of the process .f (t, Wt ) for .t ∈ [0, T ].
Theorem 4.4.3 ([!]) Let .f = f (t, x) ∈ C 1,2 (R≥0 × R) be a function that verifies,
together with its first and second derivatives, the growth condition (4.4.1). Then the
process
    Mt := f(t, Wt) − f(0, W0) − ∫_0^t (∂s f + (1/2)∂xx f)(s, Ws) ds,   t ∈ [0, T],
In conclusion, we have

    E[MT − Mt | Ft] = E[ f(T, WT) − f(t, Wt) − ∫_t^T (∂s f + (1/2)∂xx f)(s, Ws) ds | Ft ] = 0
4.5 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental concepts to be retained from a first reading, leaving aside the more technical or secondary matters. As usual, if you have any doubt about what the following succinct statements mean, please review the corresponding section.
• Section 4.1: a Brownian motion W is a continuous and adapted process, with
independent and stationary increments having normal distribution. It is charac-
terized by being a Gaussian process with zero mean function and covariance
function .cov(Ws , Wt ) = s ∧ t.
• Section 4.2: W is a Markov process with transition law equal to the law of .WTt,x .
Moreover, W is a strong Feller process.
• Section 4.3: the finite-dimensional densities of W are uniquely determined and
the law of W is called Wiener measure.
• Section 4.4: W is a martingale and other notable examples of martingales can
be constructed as functions of W : for instance, the quadratic and the exponential
martingales. The latter provides a characterization of the Brownian motion (cf.
Proposition 4.4.2). Theorem 4.4.3 shows how to “compensate” a function of W
to make it a martingale and indicates the connection with the heat equation that
will be further explored in the following chapters.
We are too small and the universe too large and too interrelated
for thoroughly deterministic thinking.
Don S. Lemons, [88]
The Poisson process, denoted as .(Nt )t≥0 , serves as the prototype of what are known
as “pure jump processes”. Intuitively, .Nt indicates the number of times within
the time interval .[0, t] that a specific event (referred to as an episode) occurs:
for example, if the single episode consists of the arrival of a spam email in a
mailbox, then .Nt represents the number of spam emails that arrive in the period
.[0, t]; similarly, .Nt can indicate the number of children born in some country or the
number of earthquakes that occur in some geographical area in the period .[0, t].
5.1 Definition
    T0 := 0,   Tn := τ1 + · · · + τn,   n ∈ N,

in which Tn represents the instant at which the n-th episode occurs. Then

    Tn ∼ Gamma_{n,λ},   n ∈ N.   (5.1.1)
Moreover, almost surely3 the sequence .(Tn )n≥0 is monotonically increasing and
Proof Formula (5.1.1) follows from (2.6.7) in [113]. The monotonicity follows
from the fact that .τn ≥ 0 a.s. for every .n ∈ N. Finally, (5.1.2) follows from Borel-
Cantelli’s Lemma 1.3.28 in [113]: in fact, for every .ε > 0, we have
    ( lim_{n→∞} Tn = +∞ ) ⊇ ((τn > ε) i.o.) = ⋂_{n≥1} ⋃_{k≥n} (τk > ε)

and the events (τk > ε) are independent and such that

    Σ_{n≥1} P(τn > ε) = +∞.   ⊓⊔
Definition 5.1.2 (Poisson Process, I) The Poisson process (Nt)t≥0 with parameter λ > 0 is defined by

    Nt = Σ_{n=1}^{∞} n 1_{[Tn,Tn+1[}(t),   t ≥ 0.   (5.1.3)
By definition, Nt takes non-negative integer values and precisely Nt = n if and only if t belongs to the interval with random endpoints [Tn, Tn+1[; hence we have the equality of events

    (Nt = n) = (Tn ≤ t < Tn+1),   n ∈ N ∪ {0}.   (5.1.4)
At the random time .Tn , when the n-th episode occurs, the process makes a jump of
size 1: Fig. 5.1 shows the plot of a Poisson process trajectory in the time interval
    γn,λ(t) := λ e^{−λt} (λt)^{n−1}/(n − 1)! · 1_{R≥0}(t),   n ∈ N.
³ The set of ω ∈ Ω such that Tn(ω) ≤ Tn+1(ω) for every n ∈ N and lim_{n→∞} Tn(ω) = +∞ is an almost certain event.
[0, 10]. We recall that a trajectory of N is a function of the form t |→ Nt(ω), defined by

    Nt = ♯{n ∈ N | Tn ≤ t}.

    P(Nt = n) = e^{−λt} (λt)^n / n!,   t ≥ 0, n ∈ N ∪ {0}.   (5.1.6)
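The constructive definition can be sketched directly in code (our own illustration, not part of the text; all names are ours): draw i.i.d. Expλ waiting times τn, set Tn = τ1 + · · · + τn and count Nt = ♯{n | Tn ≤ t}. Then Nt should indeed be Poisson distributed with parameter λt, consistently with (5.1.6).

```python
import numpy as np

rng = np.random.default_rng(5)
lam, t, n_paths = 3.0, 2.0, 100_000

# i.i.d. Exp(lam) waiting times; 60 arrivals >> lam*t = 6 suffices in practice
tau = rng.exponential(1.0 / lam, size=(n_paths, 60))
T = tau.cumsum(axis=1)               # arrival times T_1, T_2, ...
N_t = (T <= t).sum(axis=1)           # N_t = #{n : T_n <= t}

mean_Nt, var_Nt = N_t.mean(), N_t.var()   # both ~ lam * t = 6 for a Poisson law
</n>```

Sample mean and variance both come out close to λt = 6, the signature of the Poisson distribution.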
⁴ In other words, every fixed t is almost surely (i.e., for almost all trajectories) a point of continuity for the Poisson process. This apparent paradox is explained by the fact that almost every trajectory has at most countably many discontinuities, since it is monotonically increasing, and such discontinuities are spread over the entire interval [0, +∞[, which has the cardinality of the continuum. Thus, all trajectories are discontinuous but every single t is a point of discontinuity only for a negligible family of trajectories.
In particular, the parameter .λ, called intensity of the Poisson process, is equal
to the expected number of jumps in the unit time interval .[0, 1];
(iii) the characteristic function of Nt is given by

    ϕNt(η) = e^{λt(e^{iη} − 1)},   t ≥ 0, η ∈ R.   (5.1.7)
Proof
(i) Right-continuity and monotonicity follow from the definition. For every t > 0, let Nt− = lim_{s↗t} Ns and ΔNt = Nt − Nt−. We note that ΔNt ∈ {0, 1} a.s. and, for a fixed t > 0, the set of trajectories that are discontinuous at t is given by

    (ΔNt = 1) = ⋃_{n=1}^{∞} (Tn = t)

which is a negligible event since the random variables Tn are absolutely continuous. This proves (5.1.5).
(ii) By (5.1.4) we have

    P(Nt = n) = P(Tn ≤ t) − P(Tn+1 ≤ t) =

    ψ(η) = λ(e^{iη} − 1)   (5.1.8)
    Xt = Σ_{n=0}^{Nt} Zn,   t ≥ 0.
Note that the Poisson process is a particular case of X in which .Zn ≡ 1 for .n ∈ N.
In Fig. 5.2 two trajectories of the compound Poisson process with normal jumps and
different choices of the intensity parameter are represented.
Taking advantage of the independence assumption, it is easy to calculate the CHF
of .Xt : actually, it is a calculation already carried out in Exercise 2.5.4 in [113] where
we proved that
Fig. 5.2 On the left: plot of a trajectory of the compound Poisson process with .λ = 10 and
.Zn∼ N0,10−2 . On the right: plot of a trajectory of the compound Poisson process with .λ = 1000
and .Zn ∼ N0,10−2
where .ϕZ (η) is the CHF of .Z1 . Also in this case, the CHF of .Xt is homogeneous in
time and .ψ is called the characteristic exponent of the compound Poisson process.
As a particular case, we find (5.1.8) for .Zn ∼ δ1 , that is, for unitary jumps as in the
Poisson process.
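The compound Poisson process of Fig. 5.2 can be simulated in a few lines. A sketch of ours (not the book's code; names and parameters are ours), using that conditionally on Nt the sum of the jumps is N(0, Nt · Var(Z)), so that E[Xt] = 0 and Var(Xt) = λt E[Z²]:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, t, jump_var, n_paths = 10.0, 1.0, 1e-2, 200_000   # as in the left panel of Fig. 5.2

N = rng.poisson(lam * t, size=n_paths)                  # number of jumps in [0, t]
# conditionally on N, the sum of N i.i.d. N(0, jump_var) jumps is N(0, N * jump_var)
X = np.sqrt(N * jump_var) * rng.standard_normal(n_paths)

mean_X, var_X = X.mean(), X.var()     # ~ 0 and ~ lam * t * jump_var = 0.1
```

The sample variance matches the compound-Poisson identity Var(Xt) = λt E[Z²] within sampling error.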
The following theorem provides two crucial properties of the increments .Nt − Ns
of the Poisson process. As usual (cf. (1.4.1)), .G N = (GtN )t≥0 denotes the filtration
generated by N .
Theorem 5.2.1 ([!]) For every .0 ≤ s < t we have:
(i) .Nt − Ns ∼ Poissonλ(t−s) ;
(ii) .Nt − Ns is independent of .GsN .
Property (i) implies that the r.v. .Nt − Ns and .Nt−s are equal in law and for this
reason, we say that N has stationary increments. Property (ii) states that N is a
process with independent increments according to Definition 2.3.1.
The proof of Theorem 5.2.1 is postponed to Sect. 5.4.
Definition 5.2.2 (Càdlàg Function) We say that a function f , from a real interval
I to .R, is càdlàg (from the French “continue à droite, limite à gauche”) if at every
point it is continuous from the right and has a finite limit from the left.5
The definition of Poisson process can be generalized as follows.
Definition 5.2.3 (Poisson Process, II) A Poisson process with intensity .λ > 0,
defined on a filtered probability space .(Ω, F , P , Ft ), is a stochastic process
.(Nt )t≥0 such that:
⁵ If I = [a, b], at the endpoints we assume by definition that lim_{x↘a} f(x) = f(a) and the limit lim_{x↗b} f(x) exists and is finite.
are independent and with distribution .Expλ : for more details see, for example,
Chapter 5 in [9]. Note that in Definition 5.2.3 the filtration is not necessarily the
one generated by the process.
Theorem 5.2.4 (Markov Property [!]) The Poisson process N is a Markov and
Feller process with transition law
then

    u(t, Nt) = E[ϕ(NT) | Ft].
Conversely, if N satisfies (5.2.1) and properties (i), (ii) and (iii) of Definition
5.2.3, properties (iv) and (v) remain to be proven. Applying the expected value to
(5.2.1), we get

    E[e^{iη(Nt−Ns)}] = e^{λ(e^{iη}−1)(t−s)},   0 ≤ s ≤ t, η ∈ R.
Then (v) is an obvious consequence of the fact that the characteristic function determines the distribution; property (iv) of independent increments follows from point 14) of Theorem 4.2.10 in [113]. ⊓⊔
Remark 5.2.6 (Poisson Process with Stochastic Intensity) The characterization
given in Proposition 5.2.5 enables the definition of a broad range of processes, with
the Poisson process being just one specific example. In a space .(Ω, F , P , Ft )
consider a process .N = (Nt )t≥0 that satisfies properties (i), (ii) and (iii) of
Definition 5.2.3 and a non-negative valued process .λ = (λt )t≥0 such that for each
t ≥ 0,

    λt ∈ mF0   and   ∫_0^t λs ds < ∞   a.s.

If

    E[e^{iη(Nt−Ns)} | Fs] = e^{(e^{iη}−1) ∫_s^t λr dr}
Consider a Poisson process .N = (Nt )t≥0 on the space .(Ω, F , P , Ft ). Note that N
is not a martingale since .E [Nt ] = λt is a strictly increasing function and therefore
the process is not constant in mean. However, being a process with independent
increments, from Proposition 2.3.4 we have the following
Proposition 5.3.1 (Compensated Poisson Process) The compensated Poisson process, defined by

    Ñt := Nt − λt,   t ≥ 0,

is a martingale.
We explicitly observe that Ñ takes real values, unlike N which takes only integer values: in Fig. 5.3 a trajectory of a compensated Poisson process is depicted.
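A necessary consequence of Proposition 5.3.1 is that Ñ is constant in mean: E[Ñt] = 0 for every t. The sketch below (our own illustration, not a proof of the full martingale property; names are ours) checks this numerically at a few times:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n_paths = 4.0, 200_000

means = []
for t in (0.5, 1.0, 2.0):
    Nt = rng.poisson(lam * t, size=n_paths)
    means.append(np.mean(Nt - lam * t))   # E[N~_t] = E[N_t] - lam*t ~ 0
max_dev = max(abs(m) for m in means)
```

All three sample means of Ñt are close to zero, as the compensation λt removes the drift E[Nt] = λt.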
Remark 5.3.2 The fact that Ñ is a martingale also follows by applying Theorem 2.5.13 with ϕ(x) = x. More generally, Theorem 2.5.13 shows how it is possible to “compensate” a process that is a function of Nt in order to obtain a martingale.
    Nh(s) := Ns+h − Ns,   h ∈ R≥0,   (5.4.1)

is a Poisson process with respect to the conditional probability given the event (Ns = k), i.e. N(s) is a Poisson process on the space (Ω, F, P(· | Ns = k)).
    T0(s) = 0,   Tn(s) = Tk+n − s,   n ∈ N,

    (Nh(s) = n) ∩ A = (Ns+h = n + k) ∩ A = (Tn+k ≤ s + h < Tn+k+1) ∩ A = (Tn(s) ≤ h < Tn+1(s)) ∩ A
92 5 Poisson Process
that is, in accordance with the definition of the Poisson process in the form (5.1.4),
on the event A we have
    τ1(s) := Tk+1 − s,   τn(s) := Tn(s) − Tn−1(s) ≡ τk+n,   n ≥ 2,

    = P(Ns = k) Π_{j=1}^{J} Expλ(Hj).   (5.4.3)
Now it is sufficient to consider the case where .H1 is an interval, .H1 = [0, c]: since
Tk and .τk+1 are independent under P , the joint density is given by the product of
.
    = ∫_0^s ( ∫_{s−x}^{c+s−x} λ e^{−λy} dy ) Gammak,λ(dx)

    = ∫_0^s e^{−λ(c+s−x)} (e^{λc} − 1) Gammak,λ(dx)

    = (sλ)^k/k! · e^{−λ(c+s)} (e^{λc} − 1) = Poissonλs({k}) Expλ([0, c])

which proves (5.4.4) with H1 = [0, c].
Second Step By the first step, .Nt − Ns is a Poisson process conditionally on .(Ns =
k) and therefore we have
for every .s < t and .n, k ∈ N ∪ {0}. By the law of total probability, we have
for every s < t and n, k ∈ N ∪ {0}. By the law of total probability, we have

    P(Nt − Ns = n) = Σ_{k≥0} P(Nt − Ns = n | Ns = k) P(Ns = k) =

(by (5.4.5))

    = Σ_{k≥0} Poissonλ(t−s)({n}) P(Ns = k) = Poissonλ(t−s)({n}),   (5.4.6)
and this proves property (i). Moreover, as a consequence of (5.4.6), formula (5.4.5)
is equivalent to
which proves that the consecutive increments .Nt − Ns and .Ns = Ns − N0 are
independent under P .
More generally, we verify that .Nt − Nr and .Nr − Ns , with .0 ≤ s < r < t, are
independent under P . Recalling the notation (5.4.1), we have
(here we use the fact that N(s) is a Poisson process conditionally on (Ns = j) and therefore, as just proved, the increments Nt−s(s) − Nr−s(s) and Nr−s(s) are independent under P(· | Ns = j); moreover, Nr−s(s) = Nr − Ns and Ns are independent under P and therefore P(Nr−s(s) = k | Ns = j) = P(Nr−s(s) = k))

    = Σ_{j≥0} P(Nt−s(s) − Nr−s(s) = n | Ns = j) P(Nr−s(s) = k) P(Ns = j)

    = P(Nt−s(s) − Nr−s(s) = n) P(Nr−s(s) = k)

    = P(Nt − Nr = n) P(Nr − Ns = k).
Thus, we have proved that, for 0 ≤ s < r < t, the increment Nt − Nr is independent of X := Nr and Y := Nr − Ns: consequently, Nt − Nr is also independent of Ns = X − Y and this proves property (ii). ⊓⊔
5.5 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental concepts to be retained from a first reading, leaving out the more technical or less crucial details. As usual, if you have any doubt about what the following succinct statements mean, please review the corresponding section.
• Section 5.1: the Poisson process N is the prototype of jump processes. Sometimes called a “counting process”, Nt indicates the number of times in the interval [0, t] in which an episode occurs. The discontinuities of N are jumps of unit size; in various applications, the compound Poisson process is used, which has jumps whose size is random. The CHF of a (compound) Poisson process is homogeneous in time and can be expressed in explicit form in terms of the characteristic exponent.
• Section 5.2: N is a process with independent increments and enjoys the Markov and Feller properties.
• Section 5.3: the compensated process Ñt = Nt − λt is a martingale.
• Section 5.4: from the constructive definition of the Poisson process given in Sect. 5.1, one can deduce some remarkable properties, namely the fact that Nt − Ns ∼ Poissonλ(t−s) and Nt − Ns is independent of GsN (cf. Theorem 5.2.1); however, this requires some work and the proof can be skipped at a first reading.
Stopping times are a fundamental tool in the study of stochastic processes: they
are particular random times that satisfy a consistency property with respect to the
assigned filtration of information. The concept of stopping time is at the basis of
some deep results on the structure of martingales: the optional sampling theorem,
the maximal inequalities, and the upcrossing lemma. The inherent challenges in
establishing these results become apparent even within the discrete framework. To
move to continuous time, it will be necessary to introduce further assumptions on
filtrations, the so-called usual conditions. The second part of the chapter collects
some technical results: it shows how to extend the filtrations of Markov processes
and other important classes of stochastic processes, in order to guarantee the usual
conditions while maintaining the properties of the processes.
6.1 The Discrete Case

In this section, we consider the case of a finite number of time instants, within a filtered probability space (Ω, F, P, (Fn)n=0,1,...,N) with N ∈ N.
Definition 6.1.1 (Discrete Stopping Time) A discrete stopping time is a random variable

    τ : Ω −→ {0, 1, . . . , N, ∞}

such that

    (τ = n) ∈ Fn,   n = 0, . . . , N.   (6.1.1)
We employ the symbol .∞ to represent a constant number that is not part of the
set .{0, 1, . . . , N } of the specified time instances: the reason for using such a symbol
will be clearer later, e.g. in Example 6.1.3. We assume .N < ∞ so that
(τ ≥ n) := (τ = n) ∪ · · · ∪ (τ = N ) ∪ (τ = ∞)
.
for every .n = 0, . . . , N .
Remark 6.1.2 Note that:
(i) condition (6.1.1) is equivalent to
(τ ≤ n) ∈ Fn ,
. n = 0, 1, . . . , N ;
(ii) we have

    (τ ≥ n + 1) = (τ ≤ n)^c ∈ Fn,   n = 0, . . . , N;   (6.1.2)

(iii) if τ, σ are stopping times, then τ ∧ σ and τ ∨ σ are stopping times as well, since

    (τ ∧ σ ≤ n) = (τ ≤ n) ∪ (σ ≤ n),
    (τ ∨ σ ≤ n) = (τ ≤ n) ∩ (σ ≤ n),   n = 0, . . . , N;
    J(ω) = {n | Xn(ω) ∉ H},   ω ∈ Ω.

From now on, we adopt the convention min ∅ = ∞ and therefore write more concisely

    τ = min{n | Xn ∉ H}.

A simple example of a random time that is not a stopping time is the last exit time of X from H:

    τ̄(ω) = { max J(ω)   if J(ω) ≠ ∅,
            { ∞          otherwise.
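The distinction is visible in code. In the sketch below (our own illustration, not the book's; names are ours) the first exit time of a random walk from H = ]−3, 3[ is computed by scanning the path forward, and can be decided at each step from the past alone, whereas the last exit time needs the whole trajectory:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 50

steps = rng.choice([-1, 1], size=N)
X = np.concatenate([[0], steps.cumsum()])     # random walk X_0, ..., X_N

# times at which the walk is outside H = ]-3, 3[
outside = [n for n in range(N + 1) if abs(X[n]) >= 3]

tau = outside[0] if outside else np.inf       # first exit time: a stopping time
tau_bar = outside[-1] if outside else np.inf  # last exit time: not a stopping time
```

Deciding (τ = n) only requires X0, . . . , Xn, while deciding (τ̄ = n) requires knowing that the walk never leaves H again after n.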
    Fτ := {A ∈ F | A ∩ (τ = n) ∈ Fn for every n = 0, . . . , N}.   (6.1.3)
(iv) if .X = (Xn )n=0,...,N is a process adapted to the filtration then .Xτ ∈ mFτ .
Proof Part (i) follows from the fact that if τ ≡ k then

    A ∩ (τ = n) = { A   if k = n,
                  { ∅   if k ≠ n.
    A ∩ (σ = n) = (A ∩ (τ ≤ n)) ∩ (σ = n),

where A ∩ (τ ≤ n) ∈ Fn and (σ = n) ∈ Fn.
. (τ ≤ σ ) ∩ (τ = n) = (σ ≥ n) ∩ (τ = n) ∈ Fn ,
(τ ≤ σ ) ∩ (σ = n) = (τ ≤ n) ∩ (σ = n) ∈ Fn ,
    A ∩ (τ ∧ σ ≤ n) = A ∩ ((τ ≤ n) ∪ (σ ≤ n))
                    = (A ∩ (τ ≤ n)) ∪ (A ∩ (σ ≤ n)) ∈ Fn,   n = 0, . . . , N,

    A ∩ (τ = n) = (A ∩ (τ ∧ σ = n)) ∩ (τ = n) ∈ Fn
. (Xτ ∈ H ) ∩ (τ = n) = (Xn ∈ H ) ∩ (τ = n) ∈ Fn , n = 0, . . . , N.
Xnτ = Xn∧τ ,
. n = 0, . . . , N.
Proposition 6.1.7
(i) If X is adapted, then .Xτ is adapted;
(ii) if X is a sub-martingale, then .Xτ is a sub-martingale as well.
Proof Part (i) follows from the fact that, for n = 0, . . . , N, we have¹

    Xτ∧n = X0 + Σ_{k=1}^{τ∧n} (Xk − Xk−1) = X0 + Σ_{k=1}^{n} (Xk − Xk−1) 1_{(k≤τ)}
and, by (6.1.2), (k ≤ τ) ∈ Fk−1. Part (ii) follows by applying the conditional expectation given Fn−1 to the identity

    Xnτ − Xn−1τ = (Xn − Xn−1) 1_{(τ≥n)},   n = 1, . . . , N,
¹ With the convention Σ_{k=1}^{0} · · · = 0.
E [Z1G ] ≤ E [X1G ]
. for every G ∈ G .
Xτ ∧σ ≤ E [Xτ | Fσ ] ;
.
(iii) for every stopping time .τ0 , the stopped process .Xτ0 is a sub-martingale.
Proof [(i) =⇒ (ii)] Observe that

    Xτ = Xτ∧σ + Σ_{σ<k≤τ} (Xk − Xk−1)   (6.1.4)

       = Xτ∧σ + Σ_{k=1}^{N} (Xk − Xk−1) 1_{(σ<k≤τ)}.
Now, by points (ii) and (iv) of Proposition 6.1.5, Xτ∧σ ∈ mFτ∧σ ⊆ mFσ and therefore conditioning (6.1.4) to Fσ we have

    E[Xτ | Fσ] = Xτ∧σ + Σ_{k=1}^{N} E[(Xk − Xk−1) 1_{(σ<k≤τ)} | Fσ].
To conclude, it is sufficient to prove that E[(Xk − Xk−1) 1_{(σ<k≤τ)} | Fσ] ≥ 0 for k = 1, . . . , N or equivalently, thanks to Lemma 6.1.8,

    E[Xk−1 1_{(σ<k≤τ)} 1G] ≤ E[Xk 1_{(σ<k≤τ)} 1G],   G ∈ Fσ, k = 1, . . . , N.   (6.1.5)
2 Z ≤ E[X | G] means Z ≤ Y a.s. where Y = E[X | G].
Formula (6.1.5) follows from the sub-martingale property of X once observed that,
by definition of .Fσ and by Remark 6.1.2-(ii), we have
(σ < k ≤ τ) ∩ G = ((σ < k) ∩ G) ∩ (τ ≥ k), where (σ < k) ∩ G ∈ F_{k−1} and (τ ≥ k) ∈ F_{k−1}.
Xσ ≤ E[Xτ | Fσ].    (6.1.6)
Proof Formula (6.1.7) is a sort of Markov inequality (cf. (3.12) in [113]) for discrete
martingales. If M is a martingale then, by Proposition 1.4.12, .|M| is a non-negative
sub-martingale: therefore it is enough to prove the thesis under the assumption that
M is a non-negative sub-martingale. In this case, we denote by .τ the first instant in
which M exceeds the level .λ,
τ = min{n | Mn ≥ λ},
and we set
M̄ = max_{0≤n≤N} Mn.
(M̄ ≥ λ) = (τ ≤ N) ∈ F_{τ∧N}.
Then we have
λ P(M̄ ≥ λ) = E[λ 1(M̄≥λ)] ≤ E[M_{τ∧N} 1(M̄≥λ)] ≤

(since (M̄ ≥ λ) ∈ F_{τ∧N})

= E[ E[MN 1(M̄≥λ) | F_{τ∧N}] ] = E[MN 1(M̄≥λ)].    (6.1.9)
Now observe that M̄^p = max_{0≤n≤N} Mn^p. From (3.1.7) in [113] we have

E[M̄^p] = p ∫_0^{+∞} λ^{p−1} P(M̄ ≥ λ) dλ ≤

(by (6.1.9))

≤ p ∫_0^{+∞} λ^{p−2} E[MN 1(M̄≥λ)] dλ ≤

(by Hölder's inequality, with p/(p−1) being the conjugate exponent of p)

≤ p/(p−1) E[MN^p]^{1/p} E[M̄^p]^{1−1/p},

hence (6.1.8) follows by dividing by E[M̄^p]^{1−1/p} and raising to the power of p. □
Corollary 6.1.12 (Doob’s Maximal Inequalities) Let .M = (Mn )n=0,1,...,N
be a martingale or a non-negative sub-martingale on the space .(Ω, F , P ,
(Fn )n=0,1,...,N ), and let .τ be a discrete stopping time. Then:
(i) for every .λ > 0
P( max_{0≤n≤τ∧N} |Mn| ≥ λ ) ≤ E[|Mτ|] / λ;
idea that we might have of a martingale as a process whose trajectories are strongly
“oscillating” (think, for example, of a Brownian motion).
To formalize the result, let us fix .a, b ∈ R with .a < b. The upcrossing
lemma provides an estimate of the number of times a martingale “rises” from a
value less than a to a value greater than b. More precisely, given a martingale
.M = (Mn )n=0,...,N on the space .(Ω, F , P , (Fn )n=0,...,N ), let .τ0 := 0 and,
recursively for .k ∈ N,
the time of the k-th upcrossing of the trajectory .M(ω); instead, if .τk (ω) = ∞ then
the total number of upcrossings of the trajectory .M(ω) is less than k. Ultimately,
the number of upcrossings of M on .[a, b] is given by
Now it is good to remember that, by definition (cf. Notation 6.1.4), .Mτk ≡ Mτk ∧N so
that .Mτk = MN on .(τk = ∞): in particular, it is not necessarily true that .Mτk (ω) ≥
b if .τk (ω) = ∞. This remark is important because, between an upcrossing time
.τk (ω) ≤ N and the next one, the trajectory .M(ω) must “descend” from .Mτk (ω) ≥ b
to M_{σk+1}(ω) ≤ a. The optional sampling theorem says that this cannot happen "too often": if σ_{k+1} ≤ N, by (6.1.11) we would have b ≤ E[M_{τk}] ≤ E[M_{σk+1}] ≤ a, and this is absurd by the assumption a < b. Therefore, for every k ∈ N, the
event .(τk = ∞) cannot be negligible and, as already mentioned, such an event is
identifiable with the set of trajectories that have fewer than k upcrossings. In this
sense, the martingale property and the optional sampling theorem limit the number
of possible upcrossings, and thus oscillations, of M on [a, b]. Now it is obvious that ν_{a,b} ≤ N, indeed more precisely ν_{a,b} ≤ N/2 if N ≥ 2: the surprising fact of the
Proof Since .a, b are fixed, during the proof we denote .νa,b simply by .ν. By
definition, .τk ≤ N on .(k ≤ ν) and .τk = ∞ on .(k > ν): therefore, recalling
again that .Mτ ≡ Mτ ∧N for every stopping time .τ , we have
∑_{k=1}^{N} (M_{τk} − M_{σk}) = ∑_{k=1}^{ν} (M_{τk} − M_{σk}) + M_{τν+1} − M_{σν+1}.    (6.1.12)
Now there is a small problem: the last term .Mτν+1 − Mσν+1 = MN − Mσν+1 may
have a negative sign (since .MN could also be less than a). To solve this problem
(we will see shortly what the advantage will be) we introduce the process Y defined
by .Yn = (Mn − a)+ . We recall that Y is a non-negative sub-martingale (Proposition
1.4.12) and the number of upcrossings of M on .[a, b] is equal to the number of
upcrossings of Y on .[0, b − a] since
∑_{k=1}^{N} (Y_{τk} − Y_{σk}) = ∑_{k=1}^{ν} (Y_{τk} − Y_{σk}) + Y_{τν+1} − Y_{σν+1} ≥ ∑_{k=1}^{ν} (Y_{τk} − Y_{σk}) ≥ (b − a)ν,    (6.1.13)
YN ≥ Y_{σN+1} − Y_{σ1} = ∑_{k=1}^{N} (Y_{σk+1} − Y_{σk})

= ∑_{k=1}^{N} (Y_{σk+1} − Y_{τk}) + ∑_{k=1}^{N} (Y_{τk} − Y_{σk}) ≥

(by (6.1.13))

≥ ∑_{k=1}^{N} (Y_{σk+1} − Y_{τk}) + (b − a)ν.
Applying the expected value and the optional sampling theorem ((6.1.11) with M = Y) we finally have the thesis. □
Exercise 6.1.14 Prove that, for every .a < b, a continuous function .f : [0, 1] −→
R can have only a finite number of upcrossings on .[a, b].
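The count ν_{a,b} can be computed mechanically for a finite path; the sketch below (our own, with hypothetical names) mirrors the alternating times σ_k (a descent to a value ≤ a) and τ_k (the next rise to a value ≥ b) described above:

```python
def upcrossings(path, a, b):
    # nu_{a,b}: number of completed moves from a value <= a to a later value >= b
    count, waiting_for_rise = 0, False
    for x in path:
        if not waiting_for_rise:
            if x <= a:            # a sigma_k: the path has come down to level a
                waiting_for_rise = True
        elif x >= b:              # the matching tau_k: an upcrossing is completed
            count += 1
            waiting_for_rise = False
    return count

assert upcrossings([0, 2, 0, 2, 0, 2], 0, 2) == 3
assert upcrossings([0, 1, 0, 1, 0], 0, 2) == 0   # never reaches b = 2
```

The same function, applied to finer and finer samplings of a continuous function on [0, 1], gives a concrete way to experiment with the exercise above.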
The analysis of stopping times in the continuous case, where .I = R≥0 , requires
additional technical assumptions on filtrations, commonly referred to as the “usual
conditions”. We will delve into these conditions in the subsequent sections.
Definition 6.2.1 (Usual Conditions) We say that a filtration (Ft )t≥0 in the
complete space (Ω, F , P ) satisfies the usual conditions if:
(i) it is complete, i.e., F0 (and therefore also Ft for every t > 0) contains the
family N of negligible events;4
(ii) it is right-continuous, i.e., for every t ≥ 0 we have Ft = Ft+ where
Ft+ := ⋂_{ε>0} F_{t+ε}.    (6.2.1)
If X is adapted to a filtration (Ft ) that satisfies the usual conditions, then every
modification of X is adapted to (Ft ) as well: without the completeness assumption
on the filtration, this statement is false. The right-continuity assumption is more
subtle: it means that the knowledge of information up to time t, represented by
Ft , allows us to know what happens “immediately after” t, i.e., Ft+ . To better
understand this fact, which may now appear obscure, we introduce the concepts of
stopping time in R≥0 and exit time of an adapted process.
Definition 6.2.2 (Stopping Time) In a filtered space (Ω, F , P , Ft ), a stopping
time is a random variable5
τ : Ω −→ R≥0 ∪ {∞}

such that

(τ ≤ t) ∈ Ft,    t ≥ 0.    (6.2.2)
Example 6.2.3 (First Exit Time [!]) Given a process X = (Xt )t≥0 and H ⊆ R,
we set
τ(ω) = inf J(ω)  if J(ω) ≠ ∅,    τ(ω) = ∞  if J(ω) = ∅,    where J(ω) = {t ≥ 0 | Xt(ω) ∉ H}. In other words,

τ = inf{t ≥ 0 | Xt ∉ H}
assuming by convention that the infimum of the empty set is ∞ so that τ (ω) = ∞
if Xt (ω) ∈ H for every t ≥ 0. We say that τ is the first exit time of X from H .
Proposition 6.2.4 (Exit Time from an Open Set [!]) Let X be an adapted and
continuous process on the space (Ω, F , P , Ft ). The first exit time of X from an
open set H is a stopping time.
Proof The thesis is a consequence of the equality
(τ > t) = ⋃_{n∈N} ⋂_{s∈Q∩[0,t)} ( dist(Xs, H^c) ≥ 1/n )    (6.2.3)

since ( dist(Xs, H^c) ≥ 1/n ) ∈ Fs for s ≤ t and therefore (τ ≤ t) = (τ > t)^c ∈ Ft.
Let us prove (6.2.3): if ω belongs to the right-hand side then there exists n ∈ N
such that dist(Xs (ω), H c ) ≥ n1 for every s ∈ Q ∩ [0, t); since X has continuous
trajectories, it follows that dist(Xs (ω), H c ) ≥ n1 for every s ∈ [0, t] and therefore,
again by the continuity of X, it must be τ (ω) > t.
Conversely, if τ (ω) > t then K := {Xs (ω) | s ∈ [0, t]} is a compact subset of H :
since H is open, it follows that dist(K, H^c) > 0 and this is enough to conclude. □
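Numerically, the same idea — monitoring dist(Xs, H^c) only at countably many times — suggests a simple grid approximation of the first exit time (a sketch under our own assumptions, not the book's construction):

```python
def first_exit_time(path, dt, is_in_H):
    # Approximate first exit time of a sampled trajectory from a set H,
    # observing the path only on the grid {0, dt, 2*dt, ...} -- a discrete
    # stand-in for the dense set of rational times used in the proof.
    for k, x in enumerate(path):
        if not is_in_H(x):
            return k * dt
    return float("inf")  # convention: the infimum of the empty set is infinity

# Deterministic example: the path X_t = t leaves H = (-1, 1) at time 1.
dt = 0.001
path = [k * dt for k in range(2001)]
tau = first_exit_time(path, dt, lambda x: -1 < x < 1)
assert abs(tau - 1.0) <= dt
```

For continuous trajectories, refining the grid recovers the true exit time in the limit, which is exactly the density argument used in the proof.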
In the next lemma, we prove that for every stopping time τ we have
(τ < t) ∈ Ft ,
. t > 0. (6.2.4)
In general, (6.2.4) is weaker than (6.2.2) but, under the usual conditions on the
filtration, the two properties are equivalent.
Lemma 6.2.5 ([!]) Every stopping time τ satisfies (6.2.4). Conversely, if (6.2.4)
holds and the filtration (Ft )t≥0 is right-continuous, then τ is a stopping time.
Proof We have
(τ < t) = ⋃_{n∈N} ( τ ≤ t − 1/n ).
6.2 The Continuous Case 109
If τ is a stopping time, then ( τ ≤ t − 1/n ) ∈ F_{t−1/n} ⊆ Ft for every n ∈ N, and this proves the first part of the thesis.
Conversely, if (6.2.4) holds, then for every ε > 0 we have
(τ ≤ t) = ⋂_{n∈N, 1/n<ε} ( τ < t + 1/n ) ∈ F_{t+ε}.
Therefore
(τ ≤ t) ∈ ⋂_{ε>0} F_{t+ε} = Ft
(τ = t) = (τ ≤ t) \ (τ < t) ∈ Ft.
Moreover
(τ = ∞) = ⋂_{t≥0} (τ ≥ t) ∈ σ( ⋃_{t≥0} Ft ).
and the thesis follows from the fact that (Xs ∈ H^c) ∈ Ft for s ≤ t since X is adapted to (Ft). The second part of the thesis follows directly from Lemma 6.2.5. □
Remark 6.2.8 Under the usual conditions, also the exit time from a Borel set
is a stopping time. However, establishing this fact demands a substantially more
challenging proof: see, for example, Section I.10 in [20].
Remark 6.2.9 ([!]) Let us comment on Proposition 6.2.7 by observing Fig. 6.1
where the first exit time τ of X from the closed set H is represented. Up to time
τ , including τ , the trajectory of X is in H . Now note the difference between the
events
Intuitively, it is plausible that, without the need to impose conditions on the filtration,
one can prove (this is what we did in Proposition 6.2.7) that (τ < t) ∈ Ft , i.e., that
the fact that X exits H before time t is observable based on the knowledge of what
happened up to time t (i.e., Ft , in particular knowing the trajectory of the process up
to time t). On the contrary, it is only thanks to the right-continuity of the filtration
that one can prove that (τ ≤ t) ∈ Ft . Indeed, if t = τ (ω) then Xt (ω) ∈ ∂H
and based on the observation of the trajectory of X up to time t (i.e., having the
information in Ft ) it is not possible to know whether X(ω) will continue to remain
inside H or exit H immediately after t. In fact, for a generic filtration (τ ≤ t) ∈ / Ft ,
i.e., as already observed, the condition (τ < t) ∈ Ft is weaker than (τ ≤ t) ∈ Ft .
On the other hand, if (Ft )t≥0 satisfies the usual conditions (in particular, the right-
continuity property) then the two conditions (τ < t) ∈ Ft and (τ ≤ t) ∈ Ft are
equivalent (Lemma 6.2.5). As we anticipated, this means that the right-continuity of
the filtration ensures that knowing Ft we can also see what happens “immediately
after” time t.
We have explained the importance of the usual conditions on filtrations and the
reasons why it is preferable to assume the validity of such hypotheses. In
this section, we prove that it is always possible to modify a filtration so that it
satisfies the usual conditions and, under appropriate conditions, it is also possible
The results of this section and the rest of the chapter are useful but have
quite technical and less informative proofs: at a first reading, it is therefore
recommended to read the statements but skip the proofs.
Consider a complete space .(Ω, F , P ) equipped with a generic filtration .(Ft )t≥0
and denote by .N the family of negligible events. It is always possible to expand
.(Ft )t≥0 so that the usual conditions are satisfied:
(i) by setting

F̄t := σ(Ft ∪ N),    t ≥ 0,    (6.2.6)
Suppose that .X = (Xt )t≥0 is a Markov process with transition law p on the
complete filtered space .(Ω, F , P , Ft ). In general, it is not a problem to “shrink”
the filtration: more precisely, if .(Gt )t≥0 is a filtration such that .GtX ⊆ Gt ⊆ Ft for
every .t ≥ 0, i.e., .(Gt )t≥0 is smaller than .(Ft )t≥0 but larger than .(GtX )t≥0 , then it
is immediate to verify that X is a Markov process also on the space .(Ω, F , P , Gt ).
6 Obviously, we have F̄t ⊆ F̄T if 0 ≤ t ≤ T. Moreover, F̄t ⊆ F for every t ≥ 0 thanks to the completeness assumption on (Ω, F, P).
The problem is not obvious when we want to enlarge the filtration. The following
results provide conditions under which it is possible to enlarge the filtration of a
Markov process so that it verifies the usual conditions, without affecting the Markov
property.
Proposition 6.2.12 Let .X = (Xt )t≥0 be a Markov process with transition law p
on the complete filtered space .(Ω, F , P , Ft ). Then X is a Markov process with
transition law p on .(Ω, F , P ) with respect to the completed filtration .(F¯t )t≥0
in (6.2.6).
Proof Clearly, X is adapted to .F¯ so we only need to prove that
Formula (6.2.7) is true if G ∈ Ft: on the other hand (see Remark 1.4.3 in [113]) G ∈ F̄t = σ(Ft ∪ N) if and only if G = A ∪ N for some A ∈ Ft and N ∈ N.
Therefore, we have
E[Z1G] = E[Z1A] = E[1(XT∈H) 1A] = E[1(XT∈H) 1G]. □
It is possible to enlarge the filtration to make it right-continuous and maintain
the Markov property, assuming additional continuity assumptions for the process
trajectories (e.g., a.s. right-continuity) and for the process transition law (the Feller
property, Definition 2.1.10).
Proposition 6.2.13 Let .X = (Xt )t≥0 be a Markov process with transition law p on
the complete filtered space .(Ω, F , P , Ft ). Suppose that X is a Feller process with
a.s. right-continuous trajectories. Then X is a Markov process with transition law
p on .(Ω, F , P , Ft+ ).
Proof Clearly, X is adapted to .(Ft+ )t≥0 so there is only to prove the Markov
property, namely that for every .0 ≤ t < T and .ϕ ∈ bB we have
Z = E[ϕ(XT) | Ft+]    where    Z := ∫_R p(t, Xt; T, dy) ϕ(y).
Now, let .h > 0 such that .t + h < T : we have .G ∈ Ft+h and therefore, by the
Markov property of X with respect to .(Ft )t≥0 , we have
E[ϕ(XT) 1G] = E[ ∫_R p(t + h, X_{t+h}; T, dy) ϕ(y) 1G ].    (6.2.9)
Utilizing the a.s. right-continuity of the trajectories of X and the Feller property of p, we can take the limit as h tends to 0+ in (6.2.9). Applying the dominated convergence theorem yields (6.2.8). □
Remark 6.2.14 ([!]) Combining Propositions 6.2.12 and 6.2.13 we have the fol-
lowing result: if X is an a.s. right-continuous, Markov and Feller process on the
complete space .(Ω, F , P , Ft ) then X is a Markov process also on the complete
space .(Ω, F , P , (F¯t+ )t≥0 ) where the usual conditions hold.
Next, we show that for a Markov process X with respect to its own standard
filtration .F X , we simply have
FtX = σ(GtX ∪ N),    t ≥ 0.    (6.2.10)
1A = E[1A | FtX] ∈ bσ(GtX ∪ N). □
Remark 6.2.16 ([!]) Combining Propositions 6.2.12, 6.2.13, and 6.2.15, we have
the following result: let X be a Markov and Feller right-continuous process with
continuity of F^X, we have

σ(Xt) ⊆ ⋂_{ε>0} σ(Xs, t ≤ s ≤ t + ε) ⊆ Ft ∩ F^X_{t,∞};
if this were the case, the thesis would be an obvious consequence of Example 4.3.3
in [113]. On the other hand, by Corollary 2.2.5, .Ft and .Gt,∞X are, conditionally on
therefore we have
P(A | Xt) = P(A ∩ A | Xt) = P(A | Xt)².
F^X_0 = F^X_0 ∩ F^X_{0,∞} since τ is a stopping time; here (τ = 0) indicates the
event according to which the process X exits immediately from H . Then we have
.P (τ = 0 | X0 ) = 0 or .P (τ = 0 | X0 ) = 1, that is almost all trajectories of X exit
immediately from H or almost none. This fact is particularly interesting when .X0
belongs to the boundary of H .
We now study the filtration enlargement for the Poisson process and the Brownian
motion. To treat the subject in a unified way, we introduce a class of processes of
which Poisson and Brownian are particular cases.
Definition 6.2.19 (Lévy Process) Let .X = (Xt )t≥0 be a real stochastic process
defined on a complete filtered probability space .(Ω, F , P , Ft ). We say that X is a
Lévy process if it satisfies the following properties:
(i) .X0 = 0 a.s.;
(ii) the trajectories of X are a.s. càdlàg;
(t, x) ⟼ ∫_R p(t, x; t + h, dy) ϕ(y) = ∫_R p(0, x; h, dy) ϕ(y) = E[ϕ(Xh + x)]
where ψ is called the characteristic exponent of X: for example, ψ(η) = −η²/2 for Brownian motion and ψ(η) = λ(e^{iη} − 1) for the Poisson process (cf. Remark 5.1.4). Then, setting for simplicity p(T, ·) = p(0, 0; T, ·), we have the following remarkable relation:
following remarkable relation:
(since .p(T , dy) solves the forward Kolmogorov equation (2.5.25), .∂T p(T , ·) =
AT∗ p(T , ·) where .AT∗ is the adjoint of the infinitesimal generator or characteristic
operator of X)
= ∫_R e^{iηy} A*_T p(T, dy).
A*_T = ψ(i∂y).

For example, for the Brownian motion we have ψ(η) = −η²/2 and

A*_T = ψ(i∂y) = ½ ∂yy,
while for the Poisson process, since .ψ(η) = λ(eiη − 1), we have
as a Taylor series expansion valid for every analytic function .ϕ. The general
expression of the characteristic exponent of a Lévy process is given by the famous
Lévy-Khintchine formula
ψ(η) = iμη − σ²η²/2 + ∫_R ( e^{iηx} − 1 − iηx 1_{|x|≤1} ) ν(dx)
For each .H ∈ B, .ν(H ) indicates the expected number of jumps of the process
trajectories in a unit time period, with size .Δt X ∈ H : for example, for the Poisson
process, we have .ν = λδ1 and for the compound Poisson process of Example 5.1.5,
we have .ν = λμZ where .μZ is the law of the variables .Zn , i.e., the individual jumps
of the process.
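The relation E[e^{iηXt}] = e^{tψ(η)} can be verified directly in the Poisson case, where ψ(η) = λ(e^{iη} − 1), by summing the Poisson law (a numerical sketch of ours; the truncation level kmax is an arbitrary assumption):

```python
import cmath
import math

def poisson_cf(eta, lam, t, kmax=100):
    # E[exp(i*eta*N_t)] = sum_k P(N_t = k) * exp(i*eta*k), truncated at kmax
    pmf = math.exp(-lam * t)          # P(N_t = 0)
    total = pmf + 0j
    for k in range(1, kmax):
        pmf *= lam * t / k            # P(N_t = k) from P(N_t = k-1)
        total += pmf * cmath.exp(1j * eta * k)
    return total

def levy_cf(eta, lam, t):
    # exp(t * psi(eta)) with the Poisson characteristic exponent psi(eta) = lam*(e^{i eta} - 1)
    return cmath.exp(t * lam * (cmath.exp(1j * eta) - 1))

for eta in (0.3, 1.0, 2.5):
    assert abs(poisson_cf(eta, 2.0, 1.5) - levy_cf(eta, 2.0, 1.5)) < 1e-9
```

The Brownian case ψ(η) = −η²/2 can be checked the same way against the Gaussian characteristic function.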
If a Lévy process X is a.s. continuous then .ν ≡ 0 and therefore necessarily X
is a Brownian motion with drift, i.e., a process of the form .Xt = μt + σ Wt with
.μ, σ ∈ R and W Brownian motion. Among the reference texts for the general
Proof It suffices to verify that, for each .0 ≤ s < t, the increment .Xt − Xs is
independent of .F¯s and of .Fs+ , i.e., we have
P(Xt − Xs ∈ H | G) = P(Xt − Xs ∈ H),    H ∈ B,    (6.2.12)
if .G ∈ F¯s ∪ Fs+ with .P (G) > 0. Let us first consider the case .G ∈ F¯s (always
assuming .P (G) > 0). Equation (6.2.12) is true if .G ∈ Fs : on the other hand (cf.
Remark 1.4.3 in [113]) .G ∈ F¯s = σ (Fs ∪ N ) if and only if .G = A ∪ N for some
.A ∈ Fs and .N ∈ N (and necessarily .P (A) > 0 since .P (G) > 0). Hence we have
Now let us consider the case .G ∈ Fs+ with .P (G) > 0. Here we use the fact
that, by Corollary 2.5.8 in [113], Eq. (6.2.12) is true if and only if we have
E[ϕ(Xt − Xs) | G] = E[ϕ(Xt − Xs)],
for every .ϕ ∈ bC. We observe that, for every .h > 0, .G ∈ Fs+h and therefore G is
independent from .Xt+h − Xs+h : then we have
Remark 6.2.25 ([!]) By Corollary 4.3.3 and Theorem 6.2.22, the canonical Brow-
nian motion is a Brownian motion, according to Definition 4.1.1, on the space
(C(R≥0), B_{μW}, μW, F^W). Moreover, the Wiener space is a Polish metric space
We resume the study of stopping times with values in .R≥0 ∪ {∞} (cf. Definition
6.2.2), on a filtered space .(Ω, F , P , Ft ) satisfying the usual conditions. We leave
as an exercise the proof of the following
Proposition 6.2.26
(i) If .τ = t a.s. then .τ is a stopping time;
(ii) if .τ, σ are stopping times then also .τ ∧ σ and .τ ∨ σ are stopping times;
(iii) if (τn)n≥1 is an increasing sequence (i.e., τn ≤ τn+1 a.s. for every n ∈ N) then sup_{n∈N} τn is a stopping time;
(iv) if (τn)n≥1 is a decreasing sequence (i.e., τn ≥ τn+1 a.s. for every n ∈ N) then inf_{n∈N} τn is a stopping time;
(v) if .τ is a stopping time then for every .ε ≥ 0 also .τ + ε is a stopping time.
Now consider a stochastic process X = (Xt)t≥0 on the filtered space (Ω, F, P, Ft) that verifies the usual conditions. In the analysis of stopping
In other words, X is progressively measurable if, for every fixed .t > 0, the
function .g := X|[0,t]×Ω , defined by
modification (for a proof of this fact see, for example [96], Theorem T46 on p. 68).
We will only need the following much simpler result:
Proposition 6.2.28 If X is adapted to .(Ft ) and has a.s. right-continuous trajecto-
ries (or has a.s. left-continuous trajectories) then it is progressively measurable.
Proof Consider the sequences

X⃖^(n)_t := ∑_{k=1}^{∞} X_{(k−1)/2^n} 1_{[(k−1)/2^n, k/2^n)}(t),    X⃗^(n)_t := ∑_{k=1}^{∞} X_{k/2^n} 1_{[(k−1)/2^n, k/2^n)}(t),    t ∈ [0, T], n ∈ N.
Note that .Fτ is a .σ -algebra and .Fτ = Ft if .τ is the constant stopping time equal
to t. Moreover, given a process .X = (Xt )t≥0 we define
(Xτ)(ω) := X_{τ(ω)}(ω)  if τ(ω) < ∞,    (Xτ)(ω) := 0  if τ(ω) = ∞.
Proposition 6.2.29 In a filtered probability space where the usual conditions are
in force, we have:
(i) .τ ∈ mFτ ;
(ii) if .τ ≤ σ then .Fτ ⊆ Fσ ;
(iii) .Fτ ∩ Fσ = Fτ ∧σ ;
Proof
(i) We have to show that .(τ ∈ H ) ∩ (τ ≤ t) ∈ Ft for every .t ≥ 0 and .H ∈ B: the
thesis follows easily since by Lemma 2.1.5 in [113] it is sufficient to consider
H of the type .(−∞, s] with .s ∈ R.
(ii) If .τ ≤ σ then .(σ ≤ t) ⊆ (τ ≤ t): hence for every .A ∈ Fτ we have
A ∩ (σ ≤ t) = (A ∩ (τ ≤ t)) ∩ (σ ≤ t), where A ∩ (τ ≤ t) ∈ Ft and (σ ≤ t) ∈ Ft.
f : (Ω, Ft) −→ ([0, t] × Ω, B ⊗ Ft),    f(ω) := (τ(ω) ∧ t, ω),
We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while disregarding the more technical
or secondary matters. As usual, if you have any doubt about what the following
succinct statements mean, please review the corresponding section.
• Section 6.1: stopping times are random times that comply with the information
structure of the assigned filtration. They are a useful tool in various fields and in
particular for the study of the fundamental properties of martingales. Even in the
6.3 Key Ideas to Remember 121
discrete case, many of the main ideas and techniques related to stopping times
emerge: the proofs, although using elementary tools, can be quite challenging.
Stopping a process maintains its essential properties such as being adapted and
the martingale property.
• Section 6.1.1: the optional sampling theorem and Doob’s maximal inequalities
are crucial results that will be systematically used in the following chapters: so it
is useful to dwell on the details of the proofs. The upcrossing lemma is a rather
unusual and subtle result, whose use will be limited to proving the continuity of
martingale trajectories: its proof can be skipped at a first reading.
• Section 6.2.1: the study of stopping times in the continuous case involves some
technical difficulties. First of all, it is necessary to assume the so-called usual
conditions on the filtration: these are crucial, for example, in the study of exit
times of a process from a closed set.
• Sections 6.2.2 and 6.2.3: every filtration can be enlarged in such a way that it
satisfies the usual conditions, but in that case, it is necessary to prove that certain
properties of the processes remain valid: for instance, the Markov property or the
independence properties of the increments of a Lévy process. It is useful to grasp
the statements in these sections, but one can gloss over the technical aspects of
the proofs.
• Section 6.2.4: the notion of progressively measurable process strengthens that
of an adapted process as it requires a joint measurability property in .(t, ω). In
particular, a progressively measurable process is also measurable as a function of
the time variable: this is relevant in the context of stochastic integration theory.
Main notations used or introduced in this chapter:
L’appartenenza
è assai di più della salvezza personale
è la speranza di ogni uomo che sta male
e non gli basta esser civile.
È quel vigore che si sente se fai parte di qualcosa
che in sé travolge ogni egoismo personale
con quell’aria più vitale che è davvero contagiosa.1
Giorgio Gaber
In this chapter, .X = (Xt )t≥0 denotes a Markov process with transition law p on a
filtered probability space .(Ω, F , P , Ft ) satisfying the usual conditions. The strong
Markov property is an extension of the Markov property in which the initial time is
a stopping time.
Definition 7.1.1 (Strong Markov property) We say that X satisfies the strong
Markov property if for any h > 0, ϕ ∈ bB and τ being an almost surely finite
stopping time, we have
∫_R p(τ, Xτ; τ + h, dy) ϕ(y) = E[ϕ(X_{τ+h}) | Fτ].    (7.1.1)
1 Belonging is much more than personal salvation; it is the hope of every man who suffers and for whom being civil is not enough. It is that vigor you feel when you are part of something that in itself sweeps away all personal selfishness, with that more vital air that is truly contagious.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 123
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_7
124 7 Strong Markov Property
E[Z1A] = E[ϕ(X_{τ+h}) 1A].    (7.1.2)
First, consider the case where τ takes only a countable infinity of values tk , k ∈ N:
in this case, (7.1.2) follows from the fact that
E[Z1A] = ∑_{k=1}^{∞} E[Z 1_{A∩(τ=tk)}]

= ∑_{k=1}^{∞} E[ ∫_R p(tk, X_{tk}; tk + h, dy) ϕ(y) 1_{A∩(τ=tk)} ] =

= ∑_{k=1}^{∞} E[ϕ(X_{tk+h}) 1_{A∩(τ=tk)}] = E[ϕ(X_{τ+h}) 1A].
7.1 Feller and Strong Markov Properties 125
In the general case, consider the approximating sequence of stopping times defined
as
τn(ω) = k/2^n  if (k−1)/2^n ≤ τ(ω) < k/2^n for k ∈ N,    τn(ω) = ∞  if τ(ω) = ∞.
W^τ_t := W_{t+τ} − Wτ,    t ≥ 0,    (7.1.3)
= e^{−iηWτ} E[ e^{iηW_{t+τ}} | Fτ ]

= e^{−iηWτ} E[ e^{iηW_{t+τ}} | Wτ ] = e^{−η²t/2}
thanks to the strong Markov property in the form (7.1.1). From Theorem 4.2.10 in [113] it follows that W^τ_t ∼ N_{0,t} and is independent of Fτ. Similarly, we prove that W^τ_t − W^τ_s ∼ N_{0,t−s} and is independent of F_{τ+s} for every 0 ≤ s ≤ t. □
Fig. 7.1 Trajectories of a Brownian and its reflected process starting from .t0 = 0.2
Consider a Brownian motion W defined on the filtered space .(Ω, F , P , Ft ) and fix
t0 ≥ 0. We say that
W̃t := W_{t∧t0} − (Wt − W_{t∧t0}),    t ≥ 0,
is the reflected process of W starting from .t0 . Figure 7.1 represents a trajectory of W
and its reflected process .W ~ starting from .t0 = 0.2.
It is not difficult to check² that W̃ also is a Brownian motion on (Ω, F, P, Ft).
It is noteworthy that this result generalizes to the case where .t0 is a stopping time.
Theorem 7.2.1 (Reflection Principle) [!] Let .W = (Wt )t≥0 be a Brownian motion
on the filtered space .(Ω, F , P , Ft ) and .τ a stopping time. Then the reflected
process starting from .τ , defined as
2 For s ≤ t we have

W̃t = Wt  if t ≤ t0,    W̃t = 2W_{t0} − Wt  if t > t0,

so that W̃t ∈ mFt. Moreover,

W̃t − W̃s = Wt − Ws  if s, t ≤ t0;    W̃t − W̃s = (W_{t0} − Ws) − (Wt − W_{t0})  if s < t0 < t;    W̃t − W̃s = −(Wt − Ws)  if t0 ≤ s, t,

and therefore W̃t − W̃s is independent of Fs and has distribution N_{0,t−s}.
7.2 Reflection Principle 127
Proof It is enough to prove the thesis on a time interval .[0, T ] for a fixed .T > 0
and therefore it is not restrictive to assume .τ < ∞ so that the Brownian motion .W τ
in (7.1.3) is well defined. We observe that
Wt = Wt∧τ + Wt−τ
.
τ
1(t≥τ ) , ~t = Wt∧τ − Wt−τ
W τ
1(t≥τ ) .
The thesis follows from the fact that, being a Brownian motion, W^τ is equal in law to −W^τ and is independent of Fτ and therefore of W_{t∧τ} and of τ: it follows that W and W̃ are equal in law. □
Consider the process of the maximum of W, defined by
W̄t := max_{s∈[0,t]} Ws,    t ≥ 0.
τa := inf{t ≥ 0 | Wt ≥ a}
(Wt ≤ a, W̄t ≥ a) = (W̃t ≥ a)
3 We set A = (Wt ≤ a, W̄t ≥ a) and B = (W̃t ≥ a). If ω ∈ A then τa(ω) ≤ t and therefore W̃t(ω) = 2W_{τa(ω)}(ω) − Wt = 2a − Wt ≥ a, from which ω ∈ B. Conversely, assume W̃t(ω) ≥ a: if τa(ω) > t we would have a ≤ W̃t(ω) = Wt(ω), which is absurd. Then it must be τa(ω) ≤ t and therefore obviously W̄t(ω) ≥ a and also a ≤ W̃t(ω) = 2a − Wt(ω), so that Wt(ω) ≤ a.
so that

γ_{τa}(t) = a e^{−a²/(2t)} / ( √(2π) t^{3/2} ) 1_{]0,+∞[}(t);
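The density γ_{τa} can be cross-checked against the reflection principle: integrating it over (0, T] must give P(τa ≤ T) = 2P(WT ≥ a) = erfc(a/√(2T)). A numerical sketch of ours (the step count for the midpoint rule is an arbitrary choice):

```python
import math

def tau_a_density(t, a):
    # gamma_{tau_a}(t) = a * exp(-a^2/(2t)) / (sqrt(2*pi) * t^(3/2)), t > 0
    return a * math.exp(-a * a / (2 * t)) / (math.sqrt(2 * math.pi) * t ** 1.5)

def prob_hit_by(a, T, steps=100000):
    # P(tau_a <= T), by midpoint-rule integration of the density over (0, T]
    dt = T / steps
    return sum(tau_a_density((k + 0.5) * dt, a) for k in range(steps)) * dt

a, T = 1.0, 2.0
reflection = math.erfc(a / math.sqrt(2 * T))   # 2 * P(W_T >= a)
assert abs(prob_hit_by(a, T) - reflection) < 1e-3
```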
We set .I = R≥0 and suppose that X is the canonical version (cf. Proposition 2.2.6)
of a Markov process with time-homogeneous transition law p: thus, X is defined on
the complete space .(RI , FμI , μ, F X ) where .μ is the law of the process X and .F X
is the standard filtration of X (cf. Definition 6.2.11). Moreover .Xt (ω) = ω(t) for
every .t ≥ 0 and .ω ∈ RI .
To express the Markov property more effectively, we introduce the family of
translations .(θt )t≥0 defined by
θt : RI −→ RI,    (θt ω)(s) = ω(t + s),    s ≥ 0, ω ∈ RI.
Intuitively, the translation operator .θt “cuts and removes” the part of the trajectory .ω
up to time t. Given a random variable Y , we denote by .Y ◦ θt the translated random
variable defined by
Note that (Xs ◦ θt)(ω) = ω(t + s) = X_{t+s}(ω) or, more simply,

Xs ◦ θt = X_{t+s}.
Ex[Y] := E[Y | X0 = x]
a version of the conditional expectation function of Y given X0 (cf. Definition 4.2.16 in [113]) and F^X_{0,∞} = σ(Xs, s ≥ 0) (cf. (2.2.6)).
7.3 The Homogeneous Case 129
Theorem 7.3.1 (Strong Markov Property in the Homogeneous Case [!]) Let
X be the canonical version of a strong Markov process with time-homogeneous
transition law. For every a.s. finite stopping time τ and for every Y ∈ bF^X_{0,∞}, we have

E_{Xτ}[Y] = E[Y ◦ θτ | Fτ].    (7.3.1)
Proof For clarity, we explicitly observe that the left-hand side of (7.3.1) indicates
the function .Ex [Y ] evaluated at .x = Xτ . If X satisfies the strong Markov
property (7.1.1), we have
E[ϕ(Xh) ◦ θτ | Fτ] = E[ϕ(X_{τ+h}) | Fτ] = ∫_R p(τ, Xτ; τ + h, dy) ϕ(y) = ∫_R p(0, Xτ; h, dy) ϕ(y) = E_{Xτ}[ϕ(Xh)],
which proves (7.3.1) for .Y = ϕ(Xh ) with .h ≥ 0 and .ϕ ∈ bB. The general case is
proved as in Theorem 2.2.4, first extending (7.3.1) to the case
Y = ∏_{i=1}^{n} ϕi(X_{hi})
with 0 ≤ h1 < · · · < hn and ϕ1, . . . , ϕn ∈ bB, and finally using the second Dynkin's theorem. □
All the results on Markov processes encountered thus far seamlessly extend to
the multidimensional case, where processes take values in .Rd , without encountering
any significant difficulty. The following Theorem 7.3.2 is preliminary to the study
of the relationship between Markov processes and harmonic functions: we recall
that a harmonic function is a solution of the Laplace operator or more generally
of a partial differential equation of elliptic type. We assume the following general
hypotheses:
• D is an open set in .Rd ;
• X is the canonical version of a strong Markov process with values in .Rd ;
• X is continuous and has a time-homogeneous transition law p;
• .X0 ∈ D a.s.;
• .τD < ∞ a.s. where .τD is the exit time of X from D (cf. Example 6.2.3).
We denote by .∂D the boundary of D and observe that, based on the assumptions
made, .XτD ∈ ∂D a.s. In the following statement, .Ex [·] ≡ E [· | X0 = x] indicates
the conditional expectation function given .X0 .
Theorem 7.3.2 Let ϕ ∈ bB(∂D). If⁴

u(x) = Ex[ϕ(X_{τD})]    (7.3.2)
then we have:
(i) the process (u(X_{t∧τD}))t≥0 is a martingale with respect to the filtration (F^X_{t∧τD})t≥0;
(ii) for every y ∈ D and ϵ > 0 such that D(y, ϵ) := {z ∈ Rd | |z − y| < ϵ} ⊆ D we have

u(x) = Ex[ u(X_{τD(y,ϵ)}) ]    (7.3.3)
X_{τD} ◦ θτ = X_{τD}.    (7.3.4)
since the trajectory .ω and the trajectory .θτ (ω), obtained by cutting and removing the
part of .ω up to the instant .τ (ω), exit D for the first time at the same point .XτD (ω).
Let us prove (i): for .0 ≤ s ≤ t we have
E[u(X_{t∧τD}) | F_{s∧τD}] = E[ E_{X_{t∧τD}}[ϕ(X_{τD})] | F_{s∧τD} ] =

= E[ E[ ϕ(X_{τD}) ◦ θ_{t∧τD} | F_{t∧τD} ] | F_{s∧τD} ] =
4 Formula (7.3.2) means that u is a version of the conditional expectation function of ϕ(X_{τD}) given X0.
that is

u(X0) = E[ u(X_{τD(y,ϵ)}) | X0 ]
In this chapter, we extend some important results from the discrete to the continuous
case, such as the optional sampling theorem and Doob’s maximal inequalities for
martingales. The general strategy consists of three steps:
• the results are first extended from the discrete case, in which the number of time instants is finite, to the case in which the time instants are the so-called dyadic rationals defined by

D := ⋃_{n≥1} Dn,    Dn := { k/2^n | k ∈ N0 } = { 0, 1/2^n, 2/2^n, 3/2^n, . . . }.
1 The inability to be satisfied by any earthly thing, nor, so to speak, by the entire earth; to consider
the immeasurable vastness of space, the wondrous number and magnitude of the worlds, and to
find that everything is small and insufficient for the capacity of one’s own soul; to imagine the
number of worlds as infinite, and the universe as infinite, and to feel that our soul and desire would
still be greater than this vast universe; and to always accuse things of inadequacy and nullity, and
to suffer from lack and emptiness, and therefore boredom - this, to me, seems the greatest sign of
greatness and nobility that one can perceive in human nature. Translation by J. Galassi
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 133
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_8
134 8 Continuous Martingales
We observe that Dn ⊆ Dn+1 for every n ∈ N and D is a countable set dense in R≥0;
Mn := sup_{t∈D_{T,n}} Xt,    M := sup_{t∈D(T)} Xt.
Fix ε > 0. Recalling that D_{T,n} ⊆ D_{T,n+1}, by Beppo Levi's theorem we have²

P(M > λ − ε) ≤ E[XT] / (λ − ε).
E[M^p] = lim_{n→∞} E[Mn^p] ≤ ( p/(p−1) )^p E[XT^p]. □
In the following statements, we will always assume the hypothesis of right-
continuity of the processes: we will see in Sect. 8.2 that, if the filtration satisfies
the usual conditions, every martingale admits a càdlàg modification.
Theorem 8.1.2 (Doob's Maximal Inequalities [!]) Let X = (X_t)_{t≥0} be a right-continuous martingale (or a non-negative sub-martingale). For every T, λ > 0 and p > 1 we have

    P( sup_{t∈[0,T]} |X_t| ≥ λ ) ≤ E[|X_T|] / λ,    (8.1.4)

    E[ sup_{t∈[0,T]} |X_t|^p ] ≤ (p/(p−1))^p E[|X_T|^p].    (8.1.5)

² Note that

    P(M > λ − ε) = E[1_{(M>λ−ε)}] = lim_{n→∞} E[1_{(M_n>λ−ε)}] = lim_{n→∞} P(M_n > λ − ε),
Proof The thesis is an immediate consequence of Lemma 8.1.1 since, if X has right-continuous trajectories, then sup_{t∈[0,T]} |X_t| = sup_{t∈D(T)} |X_t|.    □
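Inequality (8.1.4) lends itself to a quick Monte Carlo sanity check. The following is an informal Python sketch (our illustration, not part of the text): it simulates discretized Brownian paths — |W| is a non-negative right-continuous sub-martingale — and compares the empirical tail probability of the running maximum with the bound E[|W_T|]/λ:

```python
import random, math

random.seed(0)
T, steps, n_paths, lam = 1.0, 200, 2000, 1.5
dt = T / steps

exceed, abs_WT = 0, 0.0
for _ in range(n_paths):
    w, running_max = 0.0, 0.0
    for _ in range(steps):
        w += random.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        running_max = max(running_max, abs(w))
    exceed += running_max >= lam
    abs_WT += abs(w)

p_hat = exceed / n_paths           # estimate of P(sup_{t<=T} |W_t| >= lam)
bound = abs_WT / n_paths / lam     # estimate of E[|W_T|] / lam
assert p_hat <= bound + 0.05       # (8.1.4), up to sampling and discretization error
```

The margin `0.05` hedges against Monte Carlo noise; with these parameters the empirical probability is roughly half the Doob bound.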
Proof We will see later (cf. Corollary 8.4.1) that stopping a right-continuous martingale results in a martingale. Then the thesis follows from Theorem 8.1.2 applied to (X_{t∧τ})_{t≥0}.    □
To extend some results on stopping times and martingales from the discrete case
to the continuous one, the following technical approximation result is useful.
Lemma 8.1.4 Let τ : Ω −→ [0, +∞] be a stopping time. There exists a sequence (τ_n)_{n∈N} of discrete stopping times (cf. Definition 6.1.1)

    τ_n : Ω −→ { k/2^n | k = 1, 2, . . . , n2^n }
such that:
(i) .τn −→ τ as .n → ∞;
(ii) .τn+1 (ω) ≤ τn (ω) if .n > τ (ω).
Proof For each n ∈ N we set

    τ_n(ω) = { k/2^n   if (k−1)/2^n ≤ τ(ω) < k/2^n for k ∈ {1, 2, . . . , n2^n},
             { n       if τ(ω) ≥ n.
8.1 Optional Sampling and Maximal Inequalities 137
By construction we have

    τ_n(ω) − 1/2^n ≤ τ(ω) ≤ τ_n(ω),
which proves (i) and (ii). Finally, for every fixed n ∈ N, τ_n is a discrete stopping time with respect to the filtration defined by F_{k/2^n} for k = 0, 1, . . . , n2^n, since we have

    (τ_n = k/2^n) = ( (k−1)/2^n ≤ τ < k/2^n ) ∈ F_{k/2^n},    k = 0, 1, . . . , n2^n − 1,

    (τ_n = n) = ( τ ≥ n − 1/2^n ) = ( τ < n − 1/2^n )^c ∈ F_{n−1/2^n} ⊆ F_n.    □
Remark 8.1.5 Based on (ii) of Lemma 8.1.4, if .τ (ω) < ∞, the approximating
sequence .(τn (ω))n∈N has the property of being monotonically decreasing at least
for large n. On the other hand, if τ(ω) = ∞ then τ_n(ω) = n for every n, so that τ_n(ω) → ∞ = τ(ω).
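The dyadic rounding of Lemma 8.1.4 is easy to make concrete. Here is a hedged Python sketch (our illustration; the function name `tau_n` is ours) of the approximating sequence and a check of properties (i) and (ii):

```python
import math

def tau_n(tau, n):
    """Lemma 8.1.4: round tau up to the dyadic grid k/2^n, capping at n
    when tau >= n (as in the proof's piecewise definition)."""
    if tau >= n:
        return n
    k = math.floor(tau * 2**n) + 1   # the k with (k-1)/2^n <= tau < k/2^n
    return k / 2**n

tau = 0.7310                          # a sample (non-dyadic) stopping value
approx = [tau_n(tau, n) for n in range(1, 12)]

# (i) tau_n -> tau from above, with error at most 2^-n:
assert all(0 < a - tau <= 2**-n for n, a in zip(range(1, 12), approx))
# (ii) the sequence is non-increasing once n > tau (here: for all n shown):
assert all(x >= y for x, y in zip(approx, approx[1:]))
```

Floating-point rounding could in principle disturb the grid index for pathological `tau`; exact dyadic arithmetic would remove that caveat.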
We give a first version of the optional sampling theorem: we will see a second
one, with weaker assumptions on stopping times, in Theorem 8.5.4.
Theorem 8.1.6 (Optional Sampling Theorem [!!!]) Let .X = (Xt )t≥0 be a right-
continuous sub-martingale. If .τ1 and .τ2 are stopping times such that .τ1 ≤ τ2 ≤ T
for some T > 0, then we have

    X_{τ_1} ≤ E[ X_{τ_2} | F_{τ_1} ].
and the right-continuity of X, we have X_{τ̄_{i,n}} → X_{τ_i} as n → ∞. On the other hand, by the discrete version of the optional sampling theorem (cf. Theorem 6.1.10) we have

    X_{τ̄_{i,n}} = E[ X_T | F_{τ̄_{i,n}} ]    (8.1.6)

and therefore, by Proposition C.0.7 in [113] (and Remark C.0.8 in [113]), the sequences (X_{τ̄_{i,n}})_{n∈N} are uniformly integrable. Then, by Vitali's convergence theorem C.0.2 in [113], we also have convergence in L¹(Ω, P):

    X_{τ̄_{i,n}} → X_{τ_i} in L¹,    i = 1, 2.    (8.1.7)
The thesis follows by taking the limit as n → ∞, thanks to (8.1.7) and remembering that the convergence in L¹(Ω, P) of X_{τ̄_{i,n}} implies the convergence of the conditional expectations E[ X_{τ̄_{i,n}} | F_{τ_1} ] (cf. Theorem 4.2.10 in [113]).
If X is a sub-martingale, the proof is completely analogous except for the fact that uniform integrability cannot be deduced directly from (8.1.6) but requires a slightly more subtle argument: for details, we refer to [6], Theorem 5.13.    □
The following useful result shows that the martingale property is equivalent to
the property of having constant expectation over time, at least if we also consider
random times (more precisely, bounded stopping times).
Theorem 8.1.7 ([!]) Let X = (X_t)_{t≥0} be an adapted, right-continuous and absolutely integrable (i.e., such that X_t ∈ L¹(Ω, P) for every t ≥ 0) process. Then X is a martingale if and only if E[X_τ] = E[X_0] for every bounded stopping time τ.
Proof If X is a right-continuous martingale then it is constant on average on bounded stopping times by the optional sampling Theorem 8.1.6. Conversely, since X is adapted by hypothesis, it remains only to verify that

    E[X_t 1_A] = E[X_s 1_A],    s ≤ t, A ∈ F_s.

To this end, it suffices to apply the assumption to the bounded stopping times t and

    τ := s1_A + t1_{A^c},

and subtracting one equation from the other yields the thesis.    □
In this section, we prove that, under the usual conditions on the filtration, every
martingale admits a càdlàg modification and thus the right-continuity assumption
made in the statements of the previous section can be removed. We first prove that
a martingale can only have jump discontinuities (with jumps of finite size) on the
dyadic rationals of .R≥0 .
Lemma 8.2.1 Let X = (X_t)_{t∈D} be a martingale or a non-negative sub-martingale. There exists a negligible event N such that, for every t ≥ 0, the limits in (8.2.1) exist and are finite for every ω ∈ Ω \ N. Moreover, if sup_{t∈D} E[|X_t|] < ∞ then also the limit in (8.2.2) exists and is finite.
where ν_{n,a,b} is the number of upcrossings of (|X_t|)_{t∈D_n∩[0,n]} on [a, b]. Taking the limit as n → ∞ and using Beppo Levi's theorem, we have

    P( sup_{t∈D} |X_t| ≥ λ ) ≤ κ/λ,    E[ν_{a,b}] ≤ κ/(b − a),
where ν_{a,b} is the number of upcrossings of (|X_t|)_{t∈D} on [a, b]. This implies the existence of two negligible events N_0 and N_{a,b} for which the event

    N := N_0 ∪ ⋃_{a<b, a,b∈Q_{≥0}} N_{a,b}

is negligible: for every ω ∈ Ω \ N we have that sup_{t∈D} |X_t(ω)| < ∞ and, on every interval with non-negative rational endpoints, there are only a finite number of upcrossings of |X(ω)|; consequently the limits in (8.2.1) and (8.2.2) exist and are finite on Ω \ N.
Now consider the case where X is a generic martingale. For every n ∈ N, we can apply what has just been proven to the stopped process (X_{t∧n})_{t∈D}. Indeed it is immediate to verify that (X_{t∧n})_{t∈D} is a martingale and

    X_∞ := lim_{n→∞} X_n.
Proof We only prove the case where X is a martingale. By Lemma 8.2.1 the trajectories of (X_t)_{t∈D} have finite right and left limits almost surely. Then the process

    X̃_t := lim_{s→t+, s∈D} X_s,    t ≥ 0,

is well defined. Moreover

    X̃_t = E[X_T | F_t],    0 ≤ t ≤ T;    (8.2.3)
In light of Theorem 8.2.3, from now on, given a martingale with respect to a filtration satisfying the usual conditions, we will always implicitly consider a càdlàg version of it.
In this section we introduce the space of processes on which we will build the
stochastic integral and prove that it is a Banach space.
Definition 8.3.1 For T > 0, we denote by M_T^{c,2} the space of continuous square-integrable martingales X = (X_t)_{t∈[0,T]} and set

    ‖X‖_T := ‖X_T‖_{L²(Ω,P)} = √(E[X_T²]).
    P( sup_{t∈[0,T]} |X_{n,t} − X_{m,t}| ≥ ε ) ≤ E[|X_{n,T} − X_{m,T}|²]^{1/2} / ε = ‖X_n − X_m‖_T / ε.

In particular, if t = T, we have

    lim_{k→∞} ‖X − X_{n_k}‖_T = 0.
One of the main motivations for the introduction of stopping times is the use of
so-called “localization” techniques, which allow for relaxation of the integrability
assumptions. In this section, we analyze the specific case of martingales.
Consider a filtered space .(Ω, F , P , Ft ) satisfying the usual conditions. The
concept of local martingale extends that of martingale by removing the integrability
condition of the process. This allows us to include important classes of processes (for
example, stochastic integrals) that are martingales only if stopped (or “localized”).
We first observe that, as in the discrete case (cf. Proposition 6.1.7), the martingale
property is preserved by stopping the process.
Corollary 8.4.1 (Stopped Martingale) Let .X = (Xt )t≥0 be a (càdlàg) mar-
tingale and .τ0 a stopping time. Then also the stopped process .(Xt∧τ0 )t≥0 is a
martingale.
Proof Since X is càdlàg and adapted by hypothesis, by Proposition 6.2.29 we have X_{t∧τ_0} ∈ mF_{t∧τ_0} ⊆ mF_t. Moreover, by Theorem 8.1.6, X_{t∧τ_0} = E[X_t | F_{t∧τ_0}] ∈ L¹(Ω, P) for every t ≥ 0. Again by Theorem 8.1.6, for every bounded stopping time τ we have E[X_{τ∧τ_0}] = E[X_0] and therefore the thesis follows from Theorem 8.1.7.    □
Definition 8.4.2 (Local Martingale) We say that .X = (Xt )t≥0 is a local
martingale if .X0 ∈ mF0 and there exists a non-decreasing sequence .(τn )n∈N of
stopping times, called localizing sequence for X, such that:
(i) .τn ↗ ∞ as .n → ∞;
(ii) for every .n ∈ N, the stopped and translated process .(Xt∧τn − X0 )t≥0 is a
martingale.
We denote by .M c,loc the space of continuous local martingales.
By Corollary 8.4.1, every (càdlàg) martingale is a local martingale with localiz-
ing sequence .τn ≡ ∞.
Example 8.4.3 Consider the constant process .X = (Xt )t≥0 with .Xt ≡ X0 ∈ mF0
for every .t ≥ 0. If .X0 ∈ L1 (Ω, P ) then X is a martingale. If .X0 ∈/ L1 (Ω, P ), the
process X is not a martingale due to the lack of integrability but is obviously a local
martingale: in fact, setting .τn ≡ ∞, we have .Xt∧τn − X0 ≡ 0.
Example 8.4.4 Let W be a Brownian motion on (Ω, F, P, F_t) and Y ∈ mF_0. Consider the process

    X_t := Y W_t.

If Y ∈ L¹(Ω, P) then X_t ∈ L¹(Ω, P), since Y and W_t − W_0 are independent, and

    E[Y W_t | F_s] = Y E[W_t | F_s] = Y W_s,    s ≤ t,

so that X is a martingale.
Without further assumptions on Y apart from the .F0 -measurability, the process
X may not be a martingale due to the lack of integrability but is still a local
martingale: the idea is to remove the trajectories where Y is “too large” by setting
    τ_n := { 0   if |Y| > n,
           { ∞   if |Y| ≤ n,
which defines an increasing sequence of stopping times (note that (τ_n ≤ t) = (|Y| > n) ∈ F_0 ⊆ F_t). Then, for every n ∈ N, the process

    X_{t∧τ_n} = Y W_{t∧τ_n} = Y 1_{(|Y|≤n)} W_t,    t ≥ 0,

is a martingale since it is of the type W_t Ȳ where Ȳ = Y 1_{(|Y|≤n)} is a bounded F_0-measurable random variable.
Exercise 8.4.5 (Brownian Motion with Random Initial Value) Let .W =
(Wt )t≥0 be a Brownian motion on .(Ω, F , P , Ft ). Given .t0 ≥ 0 and .Z ∈ mFt0 , let
    W_t^{t_0,Z} := W_t − W_{t_0} + Z,    t ≥ t_0.

The process W^{t_0,Z} has an initial value (at time t_0) equal to Z, is continuous, adapted and has independent and stationary increments, equal to the increments of a standard Brownian motion. If Z ∈ L¹(Ω, P) then (W_t^{t_0,Z})_{t≥t_0} is a martingale; in general, W^{t_0,Z} is a local martingale with localizing sequence τ_n ≡ ∞.
We also notice that, given any distribution μ, it is not difficult to construct a Brownian motion W^μ with initial distribution W_0^μ ∼ μ on the space (Ω × R, F ⊗ B, P ⊗ μ).
8.4 The Space M c,loc of Continuous Local Martingales 145
Remark 8.4.6 ([!]) If X is a local martingale with localizing sequence .(τn )n∈N
then:
(i) X has a modification with càdlàg trajectories that is constructed from the
existence of a càdlàg modification of each martingale .Xt∧τn .
Hereafter, the fact that a local martingale is càdlàg will be always implicitly
assumed by convention;
(ii) X is adapted since .X0 ∈ mF0 by definition and .Xt − X0 is the pointwise limit
of .Xt∧τn − X0 which is .mFt -measurable by definition of martingale;
(iii) a priori .Xt does not have any integrability property;
(iv) if X has càdlàg trajectories then there exists a localizing sequence .(τ̄n )n∈N
such that
    τ̄_n ≤ n,    |X_{t∧τ̄_n}| ≤ n,    t ≥ 0, n ∈ N.

Indeed, by Proposition 6.2.7, the exit time σ_n of |X| from the interval [−n, n] is a stopping time; moreover, since X is càdlàg (and therefore every trajectory of X is bounded on every compact time interval) we have σ_n ↗ ∞. Then

    τ̄_n := τ_n ∧ σ_n ∧ n

is a localizing sequence with the required properties.
The thesis follows by taking the limit as .n → ∞ and using the dominated
convergence theorem for the conditional expectation. Notice that, in particular,
every bounded local martingale is a true martingale. Convergence in (8.4.1)
is a very delicate issue: for example, there exist uniformly integrable local
martingales that are not martingales;5
    X_s ≥ E[X_t | F_s],    0 ≤ s ≤ t ≤ T,    (8.4.2)

and therefore from the assumption we get E[X_t] = E[X_0] for every t ∈ [0, T]. If it were X_s > E[X_t | F_s] on a non-negligible event, we would have a contradiction from (8.4.2).
In this section we prove a further version of the optional sampling theorem. Let (Ω, F, P, F_t) be a filtered space satisfying the usual conditions. To deal with the case where the time index varies in R_{≥0} we introduce an integrability condition that will allow us to easily reduce to the case [0, T] by using stopping times.
Definition 8.5.1 Let p ≥ 1. We say that a process X = (X_t)_{t≥0} is uniformly in L^p if

    sup_{t≥0} E[|X_t|^p] < ∞.
Proposition 8.5.2 Let .X = (Xt )t≥0 be a martingale. The following statements are
equivalent:
(i) X is uniformly in .L2 ;
(ii) there exists an F_∞-measurable random variable X_∞ ∈ L²(Ω, P) such that

    X_t = E[X_∞ | F_t],    t ≥ 0.
[(i) ⇒ (ii)] Consider the discrete martingale (X_n)_{n∈N}. By Theorem 8.2.2, for almost every ω ∈ Ω, the limit

    X_∞(ω) := lim_{n→∞} X_n(ω)

exists and is finite; we also set X_∞(ω) = 0 for the ω for which such limit does not exist or is not finite. Clearly, X_∞ ∈ mF_∞ and also X_∞ ∈ L²(Ω, P) since, by Fatou's lemma, we have

    E[X_∞²] ≤ lim_{n→∞} E[X_n²] ≤ sup_{t≥0} E[X_t²] < ∞.

Moreover

    X_n = E[X_∞ | F_n],    n ∈ N,    (8.5.3)

and (8.5.1) follows by taking the limit as n → +∞, by Beppo Levi's theorem.    □
Example 8.5.3 A real Brownian motion W is not uniformly in L² since E[W_t²] = t. However, for any fixed T > 0, the process X_t := W_{t∧T} is a martingale that is uniformly in L² with X_∞ = W_T.
The next result is a version of the optional sampling theorem for martingales that are uniformly in L². Such an integrability condition is necessary, as is evident from the following example: given a real Brownian motion W and a > 0, consider the stopping time τ_a = inf{t ≥ 0 | W_t ≥ a}. We have seen in Remark 7.2.3-(ii) that τ_a < ∞ a.s. but

    0 = W_0 < E[W_{τ_a}] = a.
Theorem 8.5.4 (Optional Sampling Theorem [!]) Let X = (X_t)_{t≥0} be a (càdlàg) martingale that is uniformly in L². If τ_1 and τ_2 are stopping times such that τ_1 ≤ τ_2 < ∞, then we have

    X_{τ_1} = E[ X_{τ_2} | F_{τ_1} ].

    X_0 ≤ E[X_τ | F_0].    (8.5.4)

First, we observe that by (8.5.1) we have X_τ ∈ L²(Ω, P). Applying the optional sampling Theorem 8.1.6 with the sequence of bounded stopping times τ ∧ n, we have

    X_0 ≤ E[X_{τ∧n} | F_0],

thanks to (8.5.1).
To prove the thesis, it is sufficient to verify that for every A ∈ F_{τ_1}, we have

    E[X_{τ_1} 1_A] = E[X_{τ_2} 1_A].    (8.5.5)

Consider

    τ := τ_1 1_A + τ_2 1_{A^c}.
We distill the chapter’s key findings and essential concepts for easy comprehension
upon initial perusal, setting aside the intricacies of technical or secondary details.
As usual, if you have any doubt about what the following succinct statements mean,
please review the corresponding section.
• Section 8.1: the optional sampling theorem and Doob's maximal inequalities extend without difficulty from discrete to continuous martingales.
• Section 8.2: under the usual conditions, every martingale admits a càdlàg modification; therefore the continuity assumption of Sect. 8.1 is actually not restrictive.
• Section 8.3: the space M^{c,2} of continuous square-integrable martingales X on [0, T] is a Banach space, equipped with the L² norm of the final value, ‖X_T‖_{L²(Ω,P)}.
    g : [0, T] −→ R^d
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 151
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_9
152 9 Theory of Variation
    V(g; π) := Σ_{k=1}^N |g(t_k) − g(t_{k−1})|.
We say that

    g : R_{≥0} −→ R^d

is locally of bounded variation, and we write g ∈ BV, if g|_{[0,T]} ∈ BV_T for every T > 0.
    V(g; π) = Σ_{k=1}^N |g(t_k) − g(t_{k−1})| = Σ_{k=1}^N (g(t_k) − g(t_{k−1})) = g(T) − g(0)
where

    |π| := max_{1≤k≤N} (t_k − t_{k−1})

is called the mesh of π (i.e. the length of the longest subinterval). Interpreting t ↦ g(t) as a trajectory (or parametrized curve) in R^d, the fact that g ∈ BV_T means that g is rectifiable, in the sense that the length of g can be computed
9.1 Riemann-Stieltjes Integral 153
    V(g; π) = Σ_{k=1}^N |g(t_k) − g(t_{k−1})| ≤ c Σ_{k=1}^N (t_k − t_{k−1}) = cT

for every π ∈ P_T.
(iv) If g is an integral function of the type

    g(t) = ∫_0^t u(s) ds,    t ∈ [0, T],

with u ∈ L¹([0, T]), then

    V(g; π) = Σ_{k=1}^N | ∫_{t_{k−1}}^{t_k} u(s) ds | ≤ Σ_{k=1}^N ∫_{t_{k−1}}^{t_k} |u(s)| ds = ‖u‖_{L¹},

for every π ∈ P_T.
(v) It is not difficult to prove that the function

    g(t) = { 0            if t = 0,
           { t sin(1/t)   if 0 < t ≤ T,

is continuous but not of bounded variation on [0, T].
    τ = {τ_1, . . . , τ_N},    τ_k ∈ [t_{k−1}, t_k],    k = 1, . . . , N.
1 A polygonal approximation is obtained by connecting a finite number of line segments along the
curve.
    S(f, g; π, τ) := Σ_{k=1}^N f(τ_k)(g(t_k) − g(t_{k−1}))

is the Riemann-Stieltjes sum of f with respect to g, relative to the partition π and the choice of points τ.
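The Riemann-Stieltjes sum above can be computed directly. The following informal Python sketch (our illustration; the function name `rs_sum` is ours) evaluates S(f, g; π, τ) on uniform partitions and checks numerically that it converges as the mesh shrinks, for a smooth increasing integrator:

```python
import math

def rs_sum(f, g, ts, taus):
    """Riemann-Stieltjes sum S(f, g; pi, tau) on partition ts with tags taus."""
    return sum(f(tau) * (g(b) - g(a))
               for a, b, tau in zip(ts, ts[1:], taus))

f = math.cos
g = lambda t: t * t                             # g in BV_T (smooth, increasing)

# int_0^1 cos(t) d(t^2) = int_0^1 2 t cos(t) dt = 2 (cos 1 + sin 1 - 1)
exact = 2 * (math.cos(1) + math.sin(1) - 1)

for N in (10, 100, 1000):
    ts = [k / N for k in range(N + 1)]
    taus = ts[:-1]                              # left endpoints as tags
    assert abs(rs_sum(f, g, ts, taus) - exact) < 3 / N   # error ~ |pi|
```

Any admissible choice of tags τ_k ∈ [t_{k−1}, t_k] gives the same limit, in accordance with Proposition 9.1.3.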
Proposition 9.1.3 (Riemann-Stieltjes Integral) For every f ∈ C[0, T] and g ∈ BV_T the limit

    ∫_0^T f dg := lim_{|π|→0} S(f, g; π, τ)

exists and is finite. More precisely, for every ε > 0 there exists δ_ε > 0 such that

    | S(f, g; π, τ) − ∫_0^T f dg | < ε
    |S(f, g; π′, τ′) − S(f, g; π″, τ″)| < ε

for every π′, π″ ∈ P_T such that |π′|, |π″| < δ_ε and for every τ′ ∈ T_{π′} and τ″ ∈ T_{π″}.

    |S(f, g; π′, τ′) − S(f, g; π″, τ″)| ≤ ε Σ_{k=1}^N |g(t_k) − g(t_{k−1})| ≤ ε V(g; π)
For every f ∈ C[0, T], π = {t_0, . . . , t_N} ∈ P_T and τ ∈ T_π, let k̄ be the index for which t̄ ∈ ]t_{k̄−1}, t_k̄]. Then we have

    S(f, g; π, τ) = f(τ_k̄)( g(t_k̄) − g(t_{k̄−1}) ) = f(τ_k̄) → f(t̄)   as |π| → 0.

Hence

    ∫_0^T f dg = f(t̄).

Note that

    ∫_0^T f(t) dg(t) = ∫_{[0,T]} f(t) δ_t̄(dt)

where the right-hand side is the integral with respect to the Dirac delta measure centered at t̄.
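A short numerical check of this example (an informal sketch of ours, assuming g = 1_{[t̄,T]}): with a step-function integrator, every Riemann-Stieltjes sum collapses to the single increment straddling t̄, and the limit is f(t̄):

```python
import math

t_bar, T, N = 0.37, 1.0, 100_000
g = lambda t: 1.0 if t >= t_bar else 0.0    # unit jump at t_bar: dg = delta_{t_bar}
f = math.exp

ts = [k * T / N for k in range(N + 1)]
# Left-endpoint tags; only the subinterval containing t_bar contributes.
s = sum(f(a) * (g(b) - g(a)) for a, b in zip(ts, ts[1:]))
assert abs(s - f(t_bar)) < 1e-4
```

The tag of the contributing subinterval converges to t̄ as the mesh shrinks, which is exactly the mechanism in the computation above.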
Example 9.1.5 Let

    g(t) = ∫_0^t u(s) ds,    t ∈ [0, T],

with u non-negative, and choose the points

    τ_k ∈ arg min_{[t_{k−1},t_k]} f,    k = 1, . . . , N.

Then we have

    S(f, g; π, τ) = Σ_{k=1}^N f(τ_k)(g(t_k) − g(t_{k−1}))
                  = Σ_{k=1}^N f(τ_k) ∫_{t_{k−1}}^{t_k} u(s) ds
                  ≤ Σ_{k=1}^N ∫_{t_{k−1}}^{t_k} f(s)u(s) ds = ∫_0^T f(s)u(s) ds.
The opposite inequality holds choosing

    τ_k ∈ arg max_{[t_{k−1},t_k]} f,    k = 1, . . . , N.
The general result that provides the rules for Riemann-Stieltjes integration is the
following important Itô’s formula.
Theorem 9.1.6 (Deterministic Itô's Formula) For every F = F(t, x) ∈ C¹([0, T] × R) and g ∈ BV_T ∩ C[0, T] we have

    F(T, g(T)) − F(0, g(0)) = ∫_0^T (∂_t F)(t, g(t)) dt + ∫_0^T (∂_x F)(t, g(t)) dg(t).

Proof For every π = {t_0, . . . , t_N} ∈ P_T we have

    F(T, g(T)) − F(0, g(0)) = Σ_{k=1}^N ( F(t_k, g(t_k)) − F(t_{k−1}, g(t_{k−1})) ) =

(by the mean value theorem and the continuity of g, with τ′, τ″ ∈ T_π)

    = Σ_{k=1}^N ( (∂_t F)(τ′_k, g(τ″_k))(t_k − t_{k−1}) + (∂_x F)(τ′_k, g(τ″_k))(g(t_k) − g(t_{k−1})) ).

The latter formally recalls the usual chain rule for the differentiation of composite functions.
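The deterministic Itô formula can be verified numerically for a concrete pair F, g. The following is an informal Python sketch (our illustration; the choices F(t, x) = t·x² and g(t) = sin t are ours) comparing the two sides of the formula on a fine partition:

```python
import math

# F(t, x) = t * x**2 and g(t) = sin(t): both smooth, g in BV_T on [0, T].
T, N = 1.0, 100_000
dt = T / N

lhs = T * math.sin(T) ** 2 - 0.0            # F(T, g(T)) - F(0, g(0))

rhs = 0.0
for k in range(N):
    t = k * dt
    g_inc = math.sin(t + dt) - math.sin(t)  # dg(t) over the subinterval
    rhs += math.sin(t) ** 2 * dt            # (d_t F)(t, g(t)) dt = g(t)^2 dt
    rhs += 2 * t * math.sin(t) * g_inc      # (d_x F)(t, g(t)) dg(t) = 2 t g(t) dg(t)

assert abs(lhs - rhs) < 1e-4                # the two sides agree as |pi| -> 0
```

Here the Riemann-Stieltjes integral against dg is approximated by the actual increments of g, exactly as in the Riemann-Stieltjes sums of Sect. 9.1.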
    F(T, g(T)) − F(0, g(0)) = ∫_0^T (∂_t F)(t, g(t)) dt + ∫_0^T (∇_x F)(t, g(t)) · dg(t)
                            = ∫_0^T (∂_t F)(t, g(t)) dt + Σ_{i=1}^d ∫_0^T (∂_{x_i} F)(t, g(t)) dg_i(t)

or in differential notation
Example 9.1.8 Let us consider some examples of application of the deterministic Itô formula:
(i) for F(t, x) = x we have

    g(T) − g(0) = ∫_0^T dg;

(ii) for F(t, x) = x² we have, in differential notation,

    dg²(t) = 2g(t) dg(t).
    μ_g^±(]a, b]) = g_±(b) − g_±(a),    a ≤ b.

We set

    |μ_g| := μ_g^+ + μ_g^−,    μ_g(H) = μ_g^+(H) − μ_g^−(H).    (9.2.1)

We say that μ_g is a signed measure since it can also take negative values, including −∞. We define the integral of f with respect to g on H as

    ∫_H f dμ_g := ∫_H f dμ_g^+ − ∫_H f dμ_g^−.
2 We define the measures on .R≥0 since the space of non-negative real numbers will be the set
of time indices for stochastic processes. To apply Theorem 1.4.33 in [113], we can extend the
functions .g+ , g− so that they are continuous and constant for .t ≤ 0. All the results of the section
obviously hold on .(R, B ).
9.2 Lebesgue-Stieltjes Integral 159
    f_π^±(t) = Σ_{k=1}^N f(τ_k^±) 1_{[t_{k−1},t_k[}(t)

with τ_k^− and τ_k^+ points of minimum and maximum of f on [t_{k−1}, t_k], respectively. Then we have

    Σ_{k=1}^N f(τ_k^−)( g_+(t_k) − g_+(t_{k−1}) ) = ∫_{[0,T]} f_π^− dμ_g^+ ≤ ∫_{[0,T]} f dμ_g^+
        ≤ ∫_{[0,T]} f_π^+ dμ_g^+ = Σ_{k=1}^N f(τ_k^+)( g_+(t_k) − g_+(t_{k−1}) ).
Proof First, assume that A and X are bounded a.s. by some N ∈ N. For fixed n ∈ N, let τ_k = kτ/n for k = 0, . . . , n. We have

    E[ ∫_0^τ X dA_t ] = E[ Σ_{k=1}^n X (A_{τ_k} − A_{τ_{k−1}}) ]
                      = E[ Σ_{k=1}^n E[X | F_{τ_k}] (A_{τ_k} − A_{τ_{k−1}}) ]
                      = E[ Σ_{k=1}^n M_{τ_k} (A_{τ_k} − A_{τ_{k−1}}) ]
                      = E[ ∫_0^τ M_t^{(n)} dA_t ]

where

    M_t^{(n)} = M_0 + Σ_{k=1}^n M_{τ_k} 1_{]τ_{k−1},τ_k]}(t).

By the right-continuity of M, we have M_t^{(n)} → M_t as n → ∞ for almost every ω such that t ≤ τ(ω). Given the boundedness of X and therefore of M, the thesis follows from the dominated convergence theorem.
Moving on to the general case, it is sufficient to apply what we have just proved to X ∧ N and A ∧ N, using Beppo Levi's theorem to take the limit as N → ∞.    □
9.3 Semimartingales
    X_t := x + μt + σW_t,    t ≥ 0,

    V_T^{(2)}(g; π) := Σ_{k=1}^N |g(t_k) − g(t_{k−1})|².    (9.3.1)

    lim_{|π|→0} V_T^{(2)}(g; π) = 0.
Proof Since g is uniformly continuous on the compact interval [0, T], for every ε > 0 there exists δ_ε > 0 such that |g(t) − g(s)| ≤ ε whenever |t − s| < δ_ε. Hence, for every π ∈ P_T with |π| < δ_ε,

    V_T^{(2)}(g; π) ≤ ε Σ_{k=1}^N |g(t_k) − g(t_{k−1})| ≤ ε V_T(g).    □
Example 9.3.5 ([!]) If W is a real Brownian motion, then

    lim_{|π|→0} V_T^{(2)}(W; π) = T   in L²(Ω, P),    (9.3.2)

and consequently, the trajectories of W are not of bounded variation almost surely. To prove (9.3.2), given a partition π = {t_0, t_1, . . . , t_N} ∈ P_T, we set

    δ_k = t_k − t_{k−1},    Δ_k = W_{t_k} − W_{t_{k−1}},    k = 1, . . . , N,
and observe that E[Δ_k⁴] = 3δ_k² and

    E[Δ_k² − δ_k] = 0,
    E[(Δ_h² − δ_h)(Δ_k² − δ_k)] = E[(Δ_h² − δ_h) E[Δ_k² − δ_k | F_{t_h}]] = 0,   h < k.    (9.3.3)

Then

    E[( V_T^{(2)}(W; π) − T )²] = Σ_{k=1}^N E[(Δ_k² − δ_k)²] + 2 Σ_{h<k} E[(Δ_h² − δ_h)(Δ_k² − δ_k)] =

(by (9.3.3))

    = Σ_{k=1}^N E[Δ_k⁴ − 2Δ_k²δ_k + δ_k²] = Σ_{k=1}^N 2δ_k² ≤ 2|π| Σ_{k=1}^N δ_k = 2|π|T.
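The convergence (9.3.2) is easy to observe by simulation. The following informal Python sketch (our illustration, using the standard library only) averages V_T^{(2)}(W; π) over simulated Brownian paths and contrasts it with the vanishing quadratic variation of a smooth path (Proposition 9.3.4):

```python
import random, math

random.seed(1)
T, n_paths = 1.0, 300

def mean_quad_var(n_steps):
    """Monte Carlo average of V_T^(2)(W; pi) on a uniform mesh T/n_steps."""
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        total += sum(random.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n_steps))
    return total / n_paths

# As |pi| -> 0 the quadratic variation of W concentrates around <W>_T = T:
assert abs(mean_quad_var(1000) - T) < 0.05

# By contrast, for the smooth (hence BV) path g(t) = t the same sum is
# V_T^(2)(g; pi) = N * (T/N)^2 = T^2 / N, vanishing with the mesh:
assert 1000 * (T / 1000) ** 2 < 0.01
```

The variance bound 2|π|T obtained above explains why the Monte Carlo average is already tight at this mesh size.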
In Example 9.3.5 we have repeatedly used the martingale property to prove that W has positive quadratic variation and therefore is not of bounded variation. In fact, this result extends to the entire class of continuous local martingales: their trajectories are not of bounded variation unless the process is constant.
Theorem 9.3.6 ([!]) Let .X = (Xt )t≥0 be a continuous local martingale, .X ∈
M c,loc . If .X ∈ BV then X is indistinguishable from the process identically equal to
.X0 .
Proof Without loss of generality, we can assume X_0 = 0. First, we prove the thesis in the case where X ∈ BV is a bounded continuous martingale: precisely, suppose there exists a constant K such that

    |X_t| + V_t(X) ≤ K,    t ≥ 0.    (9.3.4)

Given π = {t_0, . . . , t_N} ∈ P_T, we set

    Δ_k = X_{t_k} − X_{t_{k−1}},    Δ_π = max_{1≤k≤N} |X_{t_k} − X_{t_{k−1}}|.

Then we have

    E[X_T²] = E[ Σ_{k=1}^N ( X_{t_k}² − X_{t_{k−1}}² ) ] = E[ Σ_{k=1}^N ( X_{t_k} − X_{t_{k−1}} )² ] ≤ E[Δ_π V_T(X)] ≤ K E[Δ_π],

which, as |π| → 0, tends to zero by (9.3.4) and the dominated convergence theorem, since Δ_π → 0 by the uniform continuity of X on [0, T]. Hence E[X_T²] = 0 and by Doob's maximal inequality

    E[ sup_{0≤t≤T} X_t² ] ≤ 4 E[X_T²] = 0.
    X_t² = M_t + ⟨X⟩_t,    t ≥ 0;

(iv)

    E[(X_t − X_s)² | F_s] = E[⟨X⟩_t − ⟨X⟩_s | F_s],    t ≥ s ≥ 0.    (9.4.1)
Formula (9.4.1) is the first version of an important identity called Itô’s isometry
(see Sect. 10.2.1).
More generally, if .X ∈ M c,loc then (ii) and (iii) still hold, while (i) is replaced by
(i’) .M ∈ M c,loc .
The process .〈X〉 is called the quadratic variation process of X and we have
    ⟨X⟩_t = lim_{n→∞} Σ_{k=1}^{2^n} ( X_{tk/2^n} − X_{t(k−1)/2^n} )²,    t > 0,    (9.4.2)

⁴ Clearly ⟨X⟩ is also absolutely integrable since ⟨X⟩_t = X_t² − M_t with X_t ∈ L²(Ω, P) by hypothesis and M_t ∈ L¹(Ω, P) by definition of martingale.

For a continuous semimartingale S = X + A, with X ∈ M^{c,loc} and A ∈ BV, we have

    ⟨S⟩_t := lim_{n→∞} Σ_{k=1}^{2^n} ( S_{tk/2^n} − S_{t(k−1)/2^n} )² = ⟨X⟩_t    (9.4.3)

in probability and therefore we say that ⟨S⟩ is the quadratic variation process of S.
The proof of Theorem 9.4.1 is postponed to Sect. 9.6.
Example 9.4.2 Let X_t = t + W_t, where W is a Brownian motion; then by definition ⟨X⟩_t = ⟨W⟩_t = t. Note that E[X_t² − t] = t² and X_t² − t is not a martingale.
Remark 9.4.3 Theorem 9.4.1 is a special case of a deep and more general result,
known as Doob-Meyer decomposition theorem, which states that every càdlàg sub-
martingale X of class D (i.e., such that the family of random variables .Xτ , with .τ
stopping time, is uniformly integrable) can be uniquely written in the form .X =
M + A where M is a continuous martingale and A is an increasing process such
that .A0 = 0.
This result was first proved by Meyer in the 1960s and since
then many other proofs have been provided. A particularly concise proof has been
recently proposed in [14]: the very intuitive idea is to discretize the process X on
the dyadics, use the discrete version of the Doob’s decomposition theorem (cf.
Theorem 1.4.15) and finally prove that the sequence of discrete decompositions
converges to the desired decomposition, using Komlós’ Lemma 9.6.1.
Remark 9.4.4 By the optional sampling Theorem 8.1.6, the important iden-
tity (9.4.1) is generalized to the case where instead of .t, s there are two bounded
stopping times .τ, σ such that .σ ≤ τ ≤ T a.s. for some .T > 0.
    ⟨X, Y⟩ := ( ⟨X + Y⟩ − ⟨X − Y⟩ ) / 4,    (9.5.1)
is the unique (up to indistinguishability) process such that
(i) .〈X, Y 〉 ∈ BV is adapted, continuous, and such that .〈X, Y 〉0 = 0;
(ii) .XY − 〈X, Y 〉 ∈ M c,loc and is a true martingale if .X, Y ∈ M c,2 .
9.5 Covariation Matrix 167
in probability.
Proof Given the elementary equality

    XY = ( (X + Y)² − (X − Y)² ) / 4,

it is easy to verify that the process ⟨X, Y⟩ defined as in (9.5.1) satisfies properties (i) and (ii). Uniqueness follows directly from Theorem 9.3.6. Formula (9.5.2) follows from the identity and from the martingale property of XY − ⟨X, Y⟩. Formula (9.5.3) is a simple consequence of (9.5.1), applied to X + Y and X − Y, and of Proposition 11.2.4, whose proof is given in Chap. 11.    □
Remark 9.5.2 By uniqueness, we have .〈X, X〉 = 〈X〉. The following properties
are direct consequences of definition (9.5.1) of covariation and of (9.5.3):
(i) symmetry: ⟨X, Y⟩ = ⟨Y, X⟩;
(ii) bi-linearity: ⟨αX + βY, Z⟩ = α⟨X, Z⟩ + β⟨Y, Z⟩, for α, β ∈ R;
(iii) Cauchy-Schwarz inequality: |⟨X, Y⟩| ≤ √(⟨X⟩⟨Y⟩).
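The polarization definition (9.5.1) can be tested numerically on a pair of correlated Brownian motions. An informal Python sketch of ours (the correlation construction Y = ρW¹ + √(1−ρ²)W² is a standard choice, not from the text):

```python
import random, math

random.seed(2)
T, n_steps, rho = 1.0, 5000, 0.6
dt = T / n_steps

def qv(increments):
    """Discrete quadratic variation: sum of squared increments."""
    return sum(d * d for d in increments)

dX, dY = [], []
for _ in range(n_steps):
    d1 = random.gauss(0.0, math.sqrt(dt))
    d2 = random.gauss(0.0, math.sqrt(dt))
    dX.append(d1)
    dY.append(rho * d1 + math.sqrt(1 - rho**2) * d2)   # corr(dX, dY) = rho

# Polarization (9.5.1): <X, Y>_T = (<X+Y>_T - <X-Y>_T) / 4, here ~ rho * T.
cov = (qv([x + y for x, y in zip(dX, dY)])
       - qv([x - y for x, y in zip(dX, dY)])) / 4
assert abs(cov - rho * T) < 0.1
```

Note that the polarization identity makes `cov` equal to the sum of the products dX·dY exactly, so the sketch also illustrates the symmetric bilinear character of the covariation.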
Since the quadratic variation of a continuous BV function is zero (cf. Proposi-
tion 9.3.4), the definition of quadratic variation extends to continuous semimartin-
gales in a natural way: recall that in Theorem 9.4.1 we defined the quadratic
variation process of a continuous semimartingale .S = X + A, with .X ∈ M c,loc
and .A ∈ BV adapted, as .〈S〉 := 〈X〉.
Definition 9.5.3 (Covariation Matrix of a Semimartingale) If .S = (S 1 , . . . , S d )
is a continuous d-dimensional semimartingale with decomposition .S = X + A, the
covariation matrix of S is the .d × d symmetric matrix defined by
    C_n = { λ_n f_n + · · · + λ_N f_N | N ≥ n, λ_n, . . . , λ_N ≥ 0, λ_n + · · · + λ_N = 1 }.

Setting

    a_n := inf_{g ∈ C_n} ‖g‖,    n ∈ N,

we have a_n ≤ a_{n+1} and a := sup_{n∈N} a_n ≤ K. Then for each n ∈ N there exists g_n ∈ C_n such that ‖g_n‖ ≤ a + 1/n. On the other hand, for each ε > 0 there exists n_ε such that ‖(g_n + g_m)/2‖ ≥ a − ε for every n, m ≥ n_ε, since (g_n + g_m)/2 belongs to C_{n∧m}; by the parallelogram identity, ‖g_n − g_m‖² = 2‖g_n‖² + 2‖g_m‖² − 4‖(g_n + g_m)/2‖² is then arbitrarily small for large n, m, which proves that (g_n)_{n∈N} is a Cauchy sequence and therefore convergent.    □
Proof of Theorem 9.4.1 Uniqueness follows directly from Theorem 9.3.6 since, if M′ and A′ satisfy (i), (ii) and (iii), then M − M′ is a continuous martingale of bounded variation starting from 0. We prove existence assuming first that X = (X_t)_{t∈[0,1]} is a continuous and bounded martingale:

    |X_t| ≤ K,    t ∈ [0, 1],

for some positive constant K. This is the difficult part of the proof, in which the main ideas emerge. We proceed step by step.
9.6 Proof of Doob’s Decomposition Theorem 169
Step 1 For k = 0, 1, . . . , 2^n set

    X_{n,k} = X_{k/2^n},    A_{n,k} = Σ_{i=1}^k ( X_{n,i} − X_{n,i−1} )²,    F_{n,k} := F_{k/2^n}.

Clearly k ↦ X_{n,k} and k ↦ A_{n,k} are processes adapted to the discrete filtration (F_{n,k})_{k=0,1,...,2^n} and k ↦ A_{n,k} is increasing. Moreover, the process

    M_{n,k} := X_{n,k}² − A_{n,k},    k = 0, 1, . . . , 2^n,
is a martingale since

    E[A_{n,k} − A_{n,k−1} | F_{n,k−1}] = E[( X_{n,k} − X_{n,k−1} )² | F_{n,k−1}] =

(by (1.4.3))

    = E[X_{n,k}² − X_{n,k−1}² | F_{n,k−1}].    (9.6.2)
Step 2 Note that, for each fixed n ∈ N, the final value A_{n,2^n} of the process A_{n,·} is clearly in L²(Ω, P), being a finite sum of terms that are bounded by hypothesis: however, the number of such terms increases exponentially in n and this explains the difficulty in proving (9.6.3), which is a uniform estimate in n ∈ N. Here we essentially use the martingale property and the boundedness of X (note that in the general hypotheses X is square-integrable but in (9.6.3) powers of X of order four appear). We have

    A_{n,2^n}² = Σ_{k=1}^{2^n} ( X_{n,k} − X_{n,k−1} )⁴ + 2 Σ_{k=1}^{2^n} Σ_{h=k+1}^{2^n} ( X_{n,k} − X_{n,k−1} )² ( X_{n,h} − X_{n,h−1} )²
               = Σ_{k=1}^{2^n} ( X_{n,k} − X_{n,k−1} )⁴ + 2 Σ_{k=1}^{2^n} ( X_{n,k} − X_{n,k−1} )² ( A_{n,2^n} − A_{n,k} ).
    (9.6.4)
By taking the expectation, we estimate the first sum of Eq. (9.6.4) pointwise using Eq. (9.6.1). Then, we apply the tower property in the second sum:

    E[A_{n,2^n}²] ≤ 2K² Σ_{k=1}^{2^n} E[( X_{n,k} − X_{n,k−1} )²]
                  + 2 Σ_{k=1}^{2^n} E[ ( X_{n,k} − X_{n,k−1} )² E[A_{n,2^n} − A_{n,k} | F_{n,k}] ]
                  = 2K² E[A_{n,2^n}] + 2 Σ_{k=1}^{2^n} E[ ( X_{n,k} − X_{n,k−1} )² E[X_{n,2^n}² − X_{n,k}² | F_{n,k}] ] ≤

(since |X_{n,2^n}² − X_{n,k}²| ≤ 2K²)

    ≤ 6K² E[A_{n,2^n}] ≤ 6K² E[A_{n,2^n}²]^{1/2},

having applied Hölder's inequality in the last step. This concludes the proof of (9.6.3).
Step 3 We extend the discrete martingale M_{n,·} to the whole [0, 1] by setting

    M_t^{(n)} := E[M_{n,2^n} | F_t],    t ∈ [0, 1].

For every t ∈ [ (k−1)/2^n, k/2^n [ we have, by the tower property,

    M_t^{(n)} = E[ E[M_{n,2^n} | F_{n,k}] | F_t ]
             = E[M_{n,k} | F_t]
             = E[X_{n,k}² − A_{n,k} | F_t]
             = E[X_{n,k}² − ( X_{n,k} − X_{n,k−1} )² | F_t] − A_{n,k−1}
             = E[2 X_{n,k} X_{n,k−1} | F_t] − X_{n,k−1}² − A_{n,k−1}.

Then, from the continuity of X, it follows that M^{(n)} also is a continuous process.
Moreover, by Step 2 the sequence

    M_1^{(n)} = X_1² − A_{n,2^n}

is bounded in L²(Ω, P). One could prove that (M_1^{(n)})_{n∈N} is a Cauchy sequence, converging in L² norm (and therefore in probability), but the direct proof of this fact is a bit technical and laborious. Therefore, here we prefer to take a shortcut relying on Komlós' Lemma 9.6.1: for each n ∈ N there exist non-negative weights λ_n^{(n)}, . . . , λ_{N_n}^{(n)} whose sum is equal to one, such that, setting

    M̃_{n,t} = λ_n^{(n)} M_t^{(n)} + · · · + λ_{N_n}^{(n)} M_t^{(N_n)},    t ∈ [0, 1],

the sequence (M̃_{n,1})_{n∈N} converges in L²(Ω, P) to some random variable Z. We then set

    M_t := E[Z | F_t],    t ∈ [0, 1],

so that

    A_t := X_t² − M_t

is continuous.
To show that A is increasing, we first fix two dyadic numbers s, t ∈ [0, 1] with s ≤ t: then there exists n̄ such that s, t ∈ D_n for every n ≥ n̄, that is, s = k_n/2^n and t = h_n/2^n for certain k_n, h_n ∈ {0, 1, . . . , 2^n}. Now by construction

    X_{n,k_n}² − M_{n,k_n} = A_{n,k_n} ≤ A_{n,h_n} = X_{n,h_n}² − M_{n,h_n}

and a similar inequality also holds for every convex combination, so in the limit we have A_s(ω) ≤ A_t(ω) for every ω ∈ Ω \ F. From the density of dyadic numbers and the continuity of A, the inequality extends to all 0 ≤ s ≤ t ≤ 1. Finally, since M is a martingale,

    E[X_t² − X_s² | F_s] = E[M_t − M_s | F_s] + E[A_t − A_s | F_s]
                         = E[A_t − A_s | F_s].
Step 4 Now suppose that X = (X_t)_{t≥0} is a continuous, not necessarily bounded, martingale but such that X_t ∈ L²(Ω, P) for every t ≥ 0. We use a localization procedure and define the sequence of stopping times

    τ_n = inf{t | |X_t| ≥ n} ∧ n,    n ∈ N.

By the previous steps, for every n there exist a continuous martingale M^{(n)} and a continuous increasing process A^{(n)} such that

    X_{t∧τ_n}² = M_t^{(n)} + A_t^{(n)},    t ≥ 0.

By uniqueness, for every m > n we have M_t^{(n)} = M_t^{(m)} and A_t^{(n)} = A_t^{(m)} for t ∈ [0, τ_n]: thus the definition M_t := M_t^{(n)} and A_t := A_t^{(n)} is well posed for every n such that τ_n ≥ t. Clearly, M, A are continuous processes, A is increasing and M is a martingale: indeed, if 0 ≤ s ≤ t, for every n such that τ_n ≥ t we have

    M_{s∧τ_n} = E[M_{t∧τ_n} | F_s].
We highlight the major takeaways from this chapter and the key concepts you should
remember after your first read-through, skipping over the technical jargon and less
important details. As usual, if you have any doubt about what the following succinct
statements mean, please review the corresponding section.
• Section 9.1: to facilitate the understanding of the stochastic integration theory,
we recall the definition of the Riemann-Stieltjes integral. It is the natural
generalization of the Riemann integral, defined under the assumption that the
integrand function is continuous and the integrator is of bounded variation.
The main rules of integral calculus are provided by Itô’s formula which, in a
deterministic version, anticipates the analogous result for the stochastic integral.
• Section 9.2: the Lebesgue integral can be generalized as well. In fact, by
Carathéodory’s theorem, to each .BV function is associated a (signed) measure,
called the Lebesgue-Stieltjes measure. The related integral, called the Lebesgue-
Stieltjes integral, admits a class of integrable functions much larger than the
Riemann-Stieltjes integral.
• Section 9.3: a semimartingale is an adapted process that decomposes into the sum of a local martingale with a BV process. For a continuous semimartingale, this decomposition is unique: in fact, if a process is simultaneously a continuous local martingale and of bounded variation, then it is indistinguishable from a constant process. This is due to the fact that a continuous and BV process X has zero quadratic variation and this, in combination with the martingale property, implies (see (9.3.5)) that X is constant. A direct and instructive calculation shows that the quadratic variation process of a Brownian motion W is equal to ⟨W⟩_T = T: consequently, almost all trajectories of W are not of bounded variation.
• Section 9.4: Doob's decomposition theorem states that for every continuous local martingale X there exists an increasing (and therefore BV) process, called the quadratic variation process and denoted by ⟨X⟩, which "compensates" the local sub-martingale X² in the sense that X² − ⟨X⟩ is a continuous local martingale. In practice, this result states that X² is a semimartingale and provides its Doob decomposition into BV and martingale parts.
• Section 9.6: the general idea of the proof of Doob's decomposition theorem is simple: the process ⟨X⟩ can be constructed path by path as the limit of the quadratic variation process. However, considering the significance of the technical details involved, it is advisable to skip this section during the initial reading.
174 9 Theory of Variation
¹ So we want to define X_t not only as a random variable for fixed t, but as a stochastic process indexed by t ≥ 0: we will see that this entails some additional difficulty due to the fact that t varies in an uncountable set.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 175
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_10
176 10 Stochastic Integral
belongs to the space .M c,2 , the so-called Itô isometry holds, and finally, the quadratic
variation process is given explicitly by
⟨X⟩_t = ∫_0^t u_s² d⟨B⟩_s,   t ≥ 0.
The last part of the chapter is dedicated to the definition of the stochastic integral in the case where B is a continuous semimartingale. We will also introduce the important class of Itô processes, which are continuous semimartingales that can be uniquely decomposed into the sum of a Lebesgue integral (of a progressively measurable and absolutely integrable process) and a Brownian stochastic integral.
Remark 10.1.2 Property (i) is more than a simple condition of joint measurability in (t,ω) (which would be natural since we are defining an integral): it also incorporates the critical assumption that the information structure of the considered filtration is upheld. Let us remember that, if u is continuous, then (i) is equivalent to the fact that u is adapted to (F_t).
10.1 Integral with Respect to a Brownian Motion 177
u_t = Σ_{k=1}^N α_k 1_{[t_{k−1},t_k[}(t),   t ≥ 0,   (10.1.2)
where 0 ≤ t_0 < t_1 < · · · < t_N and α_1, ..., α_N are random variables such that P(α_k ≠ α_{k+1}) > 0 for k = 1, ..., N − 1. For every T ≥ t_N we set

∫_0^T u_t dB_t := Σ_{k=1}^N α_k (B_{t_k} − B_{t_{k−1}})
and define the stochastic integral for two generic integration endpoints a and b, with 0 ≤ a ≤ b, as

∫_a^b u_t dB_t := ∫_0^{t_N} u_t 1_{[a,b[}(t) dB_t.   (10.1.3)
In this introductory part, we do not worry about clarifying all the details of
the definition of integral, such as the fact that (10.1.3) is well posed because it is
independent, up to indistinguishable processes, of the representation (10.1.2) of the
process u.
Remark 10.1.5 A simple process is piecewise constant as a function of time and has trajectories that depend on the coefficients α_1, ..., α_N, which are random. From the fact that u ∈ L² some properties of the variables α_1, ..., α_N follow:
(i) since u is progressively measurable and α_k = u_t ∈ mF_t for every t ∈ [t_{k−1}, t_k[, then

α_k ∈ mF_{t_{k−1}},   k = 1, ..., N;   (10.1.4)
² The Poisson process is a BV process and therefore we can define the related stochastic integral in the Lebesgue-Stieltjes sense: however, if the integrand is not continuous from the left, the integral loses the fundamental property of being a (local) martingale; for an intuitive explanation of this fact, see Section 2.1 in [37].
= Σ_{k=1}^N E[α_k²](t_k − t_{k−1}) < +∞
Proof First, let us observe that formulas (10.1.5), (10.1.6), (10.1.7) and (10.1.8) are equivalent to

E[X_t − X_s | F_s] = 0,   (10.1.10)

E[(X_t − X_s)² | F_s] = E[⟨X⟩_t − ⟨X⟩_s | F_s],
= X_{t_k} + Σ_{i=k+1}^h E[α_i (B_{t_i} − B_{t_{i−1}}) | F_{t_k}]

= X_{t_k} + Σ_{i=k+1}^h E[α_i E[B_{t_i} − B_{t_{i−1}} | F_{t_{i−1}}] | F_{t_k}] = X_{t_k}

where the last equality follows from the independence and stationarity of Brownian increments, for which we have

E[B_{t_i} − B_{t_{i−1}} | F_{t_{i−1}}] = E[B_{t_i} − B_{t_{i−1}}] = 0

for every i = 1, ..., N.
Regarding Itô's isometry, still assuming that s = t_k and t = t_h, we have

E[(∫_s^t u_r dB_r)² | F_s] = E[(X_{t_h} − X_{t_k})² | F_{t_k}]

= E[(Σ_{i=k+1}^h α_i (B_{t_i} − B_{t_{i−1}}))² | F_{t_k}]
= Σ_{i=k+1}^h E[α_i² (B_{t_i} − B_{t_{i−1}})² | F_{t_k}]
  + 2 Σ_{k+1≤i<j≤h} E[α_i (B_{t_i} − B_{t_{i−1}}) α_j (B_{t_j} − B_{t_{j−1}}) | F_{t_k}] =

= Σ_{i=k+1}^h E[α_i² E[(B_{t_i} − B_{t_{i−1}})² | F_{t_{i−1}}] | F_{t_k}]
  + 2 Σ_{k+1≤i<j≤h} E[α_i (B_{t_i} − B_{t_{i−1}}) α_j E[B_{t_j} − B_{t_{j−1}} | F_{t_{j−1}}] | F_{t_k}] =

= Σ_{i=k+1}^h E[α_i² (t_i − t_{i−1}) | F_{t_k}]

= Σ_{i=k+1}^h E[∫_s^t α_i² 1_{[t_{i−1},t_i[}(r) dr | F_s]

= E[∫_s^t u_r² dr | F_s].
(by (10.1.7))

= E[∫_s^T u_r v_r 1_{[s,t[}(r) 1_{[t,T[}(r) dr] = 0.
= X_s Y_s + E[⟨X,Y⟩_t − ⟨X,Y⟩_s | F_s]
Hence, also the sequence of stochastic integrals is a Cauchy sequence in L²(Ω,P), thereby ensuring the existence of

∫_0^T u_s dB_s := lim_{n→∞} ∫_0^T u_{n,s} dB_s.
With this procedure, the stochastic integral is defined for a fixed T as a limit in L²(Ω,P)-norm, i.e., only up to a negligible event. We will see in Sect. 10.2.3 that,
In Sect. 10.2.4 we will further extend the integral to the case of integrands u ∈ L²_loc, that is, u is progressively measurable and satisfies the mild integrability condition

∫_0^T u_t² dt < ∞,   T > 0, a.s.   (10.1.12)
which is considerably weaker than (10.1.1): for example, every adapted continuous process u belongs to L²_loc since the integral in (10.1.12), on the compact interval [0,T], is finite by the continuity of the trajectories of u. On the other hand, u_t = exp(B_t⁴) is in L²_loc but not³ in L². Theorem 10.1.6 does not extend to the case of u ∈ L²_loc; however, we will prove that in this case the integral process is a local martingale.
To prove the density of the class of simple processes in the space .L2 , we use
the following consequence of Proposition B.3.3 in [113], namely the so-called
“continuity in mean” of absolutely integrable functions.
Corollary 10.1.8 (Continuity in Mean) If f ∈ L¹(R) then for almost every x ∈ R we have

lim_{h→0} (1/h) ∫_x^{x+h} |f(x) − f(y)| dy = 0.
t_{n,k} = Tk/2^n,   k = 0, ..., 2^n,   (10.1.13)

the dyadic numbers of [0,T], and define the simple process

u_{n,t} = Σ_{k=1}^{2^n} α_{n,k} 1_{[t_{n,k−1},t_{n,k}[},   α_{n,k} = u_{t_{n,k−1}} 1_{{|u_{t_{n,k−1}}| ≤ n}},   t ∈ [0,T].
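The dyadic, truncated approximant above can be coded directly for a deterministic trajectory t ↦ u_t. A minimal sketch (function names are ours, not the book's): the value on each dyadic interval is the left-endpoint value of u, set to zero when it exceeds the truncation level n.

```python
def simple_approx(u, T, n):
    """Build the simple process u_n of (10.1.13): piecewise constant on the
    dyadic intervals [t_{n,k-1}, t_{n,k}[ with left-endpoint values of u,
    truncated at level n."""
    knots = [T * k / 2 ** n for k in range(2 ** n + 1)]

    def u_n(t):
        if not 0 <= t < T:
            return 0.0
        k = int(t * 2 ** n / T) + 1        # t lies in [t_{n,k-1}, t_{n,k}[
        a = u(knots[k - 1])                # alpha_{n,k}: value at the left endpoint
        return a if abs(a) <= n else 0.0   # truncation 1_{|u| <= n}

    return u_n

# on [t_{2,1}, t_{2,2}[ = [0.25, 0.5[ the approximant equals u(0.25)
u_n = simple_approx(lambda t: t, 1.0, 2)
assert abs(u_n(0.3) - 0.25) < 1e-12
# values exceeding the truncation level n are replaced by 0
assert simple_approx(lambda t: 10.0, 1.0, 2)(0.3) == 0.0
```

For a continuous u the left-endpoint values converge pointwise as n → ∞, which is the deterministic core of the density argument; the measure-theoretic part (continuity in mean) handles general u ∈ L².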
³ Since

E[∫_0^T e^{2B_t⁴} dt] = ∫_R ∫_0^T e^{2x⁴} (1/√(2πt)) e^{−x²/(2t)} dt dx = +∞.
10.2 Integral with Respect to Continuous Square-Integrable Martingales 183
end, we define⁴

u_{n,t} := ⨍_{(t−1/n)∨0}^t u_s ds,   0 < t ≤ T, n ∈ N.
Note that u_n is continuous and adapted (and therefore progressively measurable). Moreover, we have

E[∫_0^T (u_t − u_{n,t})² dt] = E[∫_0^T (⨍_{(t−1/n)∨0}^t (u_t − u_s) ds)² dt] ≤
and therefore we can take the limit in (10.1.14) as .n → ∞ and conclude using the
Lebesgue dominated convergence theorem.
We assume that the integrator process B belongs to the class M^{c,2}, i.e., B is a continuous martingale such that B_t ∈ L²(Ω,P) for every t ≥ 0. The construction of the stochastic integral is similar to the case of a Brownian motion, with some additional technicalities.
We denote by ⟨B⟩ the quadratic variation process defined in Theorem 9.4.1: ⟨B⟩ is a continuous and increasing process associated with the Lebesgue-Stieltjes
⁴ Here ⨍_a^b u_s ds = (1/(b−a)) ∫_a^b u_s ds for a < b.
indicate the integral with respect to .μ〈B〉 . For example, if B is a Brownian motion
then .〈B〉t = t and the corresponding Lebesgue-Stieltjes measure is simply the
Lebesgue measure, as seen in Sect. 10.1.
Definition 10.2.1 We denote by .L2B the class of processes .u = (ut )t≥0 such that:
(i) u is progressively measurable;
(ii) for every .T ≥ 0 we have
E[∫_0^T u_t² d⟨B⟩_t] < ∞.   (10.2.1)
Generally, the process B will be fixed once and for all and therefore, if there is no
risk of confusion, we will simply write .L2 instead of .L2B .
At a later stage, we will weaken the integrability condition (ii) by requiring that
u belongs to the following class.
Definition 10.2.2 We denote by .L2B,loc (or, more simply, .L2loc ) the class of pro-
cesses u such that
(i) u is progressively measurable;
(ii') for every T ≥ 0 we have

∫_0^T u_t² d⟨B⟩_t < ∞ a.s.   (10.2.2)
Consider a very particular class of integrands that, with respect to the temporal variable, are indicator functions of an interval: precisely, an indicator process is a stochastic process of the form

u_t = α 1_{[t_0,t_1[}(t),   t ≥ 0,   (10.2.3)

where α is an F_{t_0}-measurable and bounded random variable (i.e., such that |α| ≤ c a.s. for some positive constant c) and t_1 > t_0 ≥ 0.
Remark 10.2.3 Every indicator process u belongs to .L2 : in fact, u is càdlàg and
adapted, therefore progressively measurable; moreover, u satisfies (10.2.1) since
E[∫_0^T u_t² d⟨B⟩_t] = E[α²(⟨B⟩_{T∧t_1} − ⟨B⟩_{T∧t_0})] ≤ c² E[⟨B⟩_{T∧t_1} − ⟨B⟩_{T∧t_0}] < ∞
for every .T ≥ 0.
The definition of the stochastic integral of an indicator process is elementary and
completely explicit: it is defined, path by path, by multiplying .α by an increment
of B.
Definition 10.2.4 (Stochastic Integral of Indicator Processes) Let u be the indi-
cator process in (10.2.3) and .B ∈ M c,2 . For every .T ≥ t1 we set
∫_0^T u_t dB_t := α(B_{t_1} − B_{t_0})   (10.2.4)

and we define the stochastic integral for two generic integration endpoints a and b, with 0 ≤ a ≤ b, as

∫_a^b u_t dB_t := ∫_0^{t_1} u_t 1_{[a,b[}(t) dB_t.   (10.2.5)
Remark 10.2.5 If [t_0,t_1[ ∩ [a,b[ ≠ ∅, the integral in the right-hand side of (10.2.5) is defined by (10.2.4), interpreting u_t 1_{[a,b[}(t) as the simple process α 1_{[t_0∨a, t_1∧b[}(t) and choosing T = t_1. Otherwise, it is understood that the integral is null by definition.
Remark 10.2.6 ([!]) Being defined in terms of increments of B, the stochastic
integral does not depend on the initial value .B0 . Moreover, X is an adapted and
continuous process.
In the next result, we establish some fundamental properties of the stochastic
integral. The second part of the proof is based on the remarkable identity (9.4.1),
valid for every .B ∈ M c,2 , which we recall here:
E[(B_t − B_s)² | F_s] = E[⟨B⟩_t − ⟨B⟩_s | F_s],   0 ≤ s ≤ t.   (10.2.6)
where we have exploited the fact that .α ∈ mFs and the martingale property
of B. This proves (10.2.7) which is equivalent to the martingale property of
X. Clearly .XT ∈ L2 (Ω, P ) for every .T ≥ 0 since .XT is the product of
the bounded random variable .α, times an increment of B which is square-
integrable.
and therefore

E[X_t Y_t | F_s] = X_s Y_s + E[∫_s^t u_r dB_r ∫_s^t v_r dB_r | F_s]
  + X_s E[∫_s^t v_r dB_r | F_s] + Y_s E[∫_s^t u_r dB_r | F_s] =
so the thesis follows. □
Remark 10.2.8 Formulas (10.2.7), (10.2.8), (10.2.9), (10.2.10), and (10.2.11) can
be rewritten in the form
E[X_t − X_s | F_s] = 0,

E[(X_t − X_s)² | F_s] = E[⟨X⟩_t − ⟨X⟩_s | F_s],
By taking the expected value, we also obtain the unconditional versions of Itô's isometry:

E[(∫_s^t u_r dB_r)²] = E[∫_s^t u_r² d⟨B⟩_r],   (10.2.12)

E[∫_s^t u_r dB_r ∫_s^t v_r dB_r] = E[∫_s^t u_r v_r d⟨B⟩_r],

E[∫_s^t u_r dB_r ∫_t^T v_r dB_r] = 0,   (10.2.13)
u_t = Σ_{k=1}^N u_{k,t},   u_{k,t} := α_k 1_{[t_{k−1},t_k[}(t),   (10.2.14)
where:
(i) .0 ≤ t0 < t1 < · · · < tN ;
(ii) .αk is a bounded .Ftk−1 -measurable random variable for each .k = 1, . . . , N.
One can also require that P(α_k ≠ α_{k+1}) > 0 for k = 1, ..., N − 1, so that the representation (10.2.14) of u is unique.
Definition 10.2.10 (Stochastic Integral of Simple Processes) Let u be a simple
process of the form (10.2.14) and let .B ∈ M c,2 . The stochastic integral of u with
respect to B is the stochastic process
∫_0^t u_s dB_s := Σ_{k=1}^N ∫_0^t u_{k,s} dB_s = Σ_{k=1}^N α_k (B_{t∧t_k} − B_{t∧t_{k−1}}).
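Path by path, the formula above is elementary to evaluate. A sketch (names are ours), where the path of B is given as any function of time:

```python
def simple_integral(alphas, times, B, t):
    """sum_k alpha_k (B_{t ^ t_k} - B_{t ^ t_{k-1}}), as in Definition 10.2.10.

    alphas = [a_1, ..., a_N], times = [t_0, ..., t_N], B a path s -> B_s.
    """
    return sum(a * (B(min(t, times[k])) - B(min(t, times[k - 1])))
               for k, a in enumerate(alphas, start=1))

# with the (deterministic, BV) test path B(s) = s the increments are explicit:
# 2*(B(1)-B(0)) + 3*(B(1.5)-B(1)) = 2*1 + 3*0.5 = 3.5
assert simple_integral([2.0, 3.0], [0.0, 1.0, 2.0], lambda s: s, 1.5) == 3.5
```

Replacing the test path by a simulated trajectory of B ∈ M^{c,2} gives one sample of the integral process at time t; the point of the definition is precisely that no limit procedure is needed for simple integrands.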
Theorem 10.2.11 Theorem 10.2.7 remains valid under the assumption that .u, v are
simple processes.
Proof The continuity and the martingale property (10.2.7) are immediate by linearity. As for Itô's isometry (10.2.9), we first write v in the form (10.2.14) with respect to the same choice of t_0, ..., t_N, for certain v_{k,t} = β_k 1_{[t_{k−1},t_k[}(t): note that

u_t v_t = Σ_{k=1}^N Σ_{h=1}^N u_{k,t} v_{h,t} = Σ_{k=1}^N α_k β_k 1_{[t_{k−1},t_k[}(t).   (10.2.15)
Then we have

E[∫_s^t u_r dB_r ∫_s^t v_r dB_r | F_s]

= E[Σ_{k=1}^N ∫_s^t u_{k,r} dB_r Σ_{h=1}^N ∫_s^t v_{h,r} dB_r | F_s]

= Σ_{k=1}^N E[∫_s^t u_{k,r} dB_r ∫_s^t v_{k,r} dB_r | F_s]
  + 2 Σ_{h<k} E[∫_{t_{h−1}}^{t_h} u_{h,r} 1_{[s,t[}(r) dB_r ∫_{t_{k−1}}^{t_k} v_{k,r} 1_{[s,t[}(r) dB_r | F_s] =

= Σ_{k=1}^N E[∫_s^t u_{k,r} v_{k,r} d⟨B⟩_r | F_s] =

(by (10.2.15))

= E[∫_s^t u_r v_r d⟨B⟩_r | F_s].
Finally, the fact that ⟨X,Y⟩ in (10.2.11) is the covariation process of X and Y is proven as in the proof of Theorem 10.2.7-(iii). □
10.2.3 Integral in L2
In this section, we extend the class of integrands by exploiting the density of simple
processes in .L2B (cf. Definition 10.2.1). The stochastic integral is now defined as
a limit in .M c,2 and therefore, recalling Remark 8.3.2, as an equivalence class and
no longer path by path. However, the fundamental properties of the integral remain
valid: the martingale property and Itô’s isometry. As usual, since B is fixed, we
simply write .L2 instead of .L2B .
Lemma 10.1.7 has the following generalization, which is proven with a technical
trick: the idea is to make a change of time variable to “realign” the continuous and
increasing process .〈B〉t to the Brownian case in which .〈B〉t ≡ t; for details, we
refer to Lemma 2.2.7 in [67].
Lemma 10.2.12 Let u ∈ L². For every T > 0 there exists a sequence (u_n)_{n∈N} of simple processes such that

lim_{n→∞} E[∫_0^T (u_s − u_{n,s})² d⟨B⟩_s] = 0.
We now see how to define the stochastic integral of .u ∈ L2 . Given .T > 0 and
an approximating sequence .(un )n∈N of simple processes as in Lemma 10.2.12, we
denote by
X_{n,t} = ∫_0^t u_{n,s} dB_s,   t ∈ [0,T],   (10.2.16)

It follows that (X_n)_{n∈N} is a Cauchy sequence in (M_T^{c,2}, ‖·‖_T) and therefore there exists

X := lim_{n→∞} X_n   in M_T^{c,2}.   (10.2.17)
Proof Let X be the limit in (10.2.17) defined from the approximating sequence (u_n)_{n∈N}. Let (v_n)_{n∈N} be another approximating sequence for u and

Y_{n,t} = ∫_0^t v_{n,s} dB_s,   t ∈ [0,T].   (10.2.18)

Then ‖Y_n − X‖_T ≤ ‖Y_n − X_n‖_T + ‖X_n − X‖_T and it is enough to observe that, again by Itô's isometry, we have

‖Y_n − X_n‖²_T = E[(∫_0^T (v_{n,t} − u_{n,t}) dB_t)²] = E[∫_0^T (v_{n,t} − u_{n,t})² d⟨B⟩_t] → 0 as n → ∞.   □
which can be verified directly for simple .u, v and, in general, by approximation.
Proof Let us consider the approximations u_n and v_n defined as in Lemma 10.2.12. By construction, for every n ∈ N and t ∈ [0,T], u_{n,t} = v_{n,t} almost surely on F. It follows that the corresponding integrals (X_{n,t})_{t∈[0,T]} in (10.2.16) and (Y_{n,t})_{t∈[0,T]} in (10.2.18) are modifications on F. Taking the limit in n, we deduce that (X_t)_{t∈[0,T]} and (Y_t)_{t∈[0,T]} are modifications on F: the thesis follows from the continuity of X and Y. □
Remark 10.2.18 Suppose that, for some T > 0, we have

∫_0^T u_t dB_t = ∫_0^T v_t dB_t
τ_n = Σ_{k=1}^{2^n} t_{n,k} 1_{F_{n,k}}

with

F_{n,1} = (0 ≤ τ ≤ T/2^n),   F_{n,k} = (t_{n,k−1} < τ ≤ t_{n,k}),   k = 2, ..., 2^n.
We note that (F_{n,k})_{k=1,...,2^n} forms a partition of Ω with F_{n,k} ∈ F_{t_{n,k}} and (τ_n)_{n∈N} is a decreasing sequence of stopping times that converges to τ. By continuity, we have X_{τ_n} → X_τ. Moreover, setting

Y = ∫_0^T u_s 1_{(s≤τ)} dB_s,   Y_n = ∫_0^T u_s 1_{(s≤τ_n)} dB_s,
using Itô's isometry, it is easy to prove that Y_n → Y in L²(Ω,P) and therefore also almost surely.
To prove the thesis, i.e., the fact that X_τ = Y a.s., it is sufficient to verify that X_{τ_n} = Y_n a.s. for each n ∈ N. Now, on F_{n,k} we have
X_{τ_n} = X_{t_{n,k}} = ∫_0^T u_s dB_s − ∫_{t_{n,k}}^T u_s dB_s,

and therefore

X_{τ_n} = ∫_0^T u_s dB_s − Σ_{k=1}^{2^n} 1_{F_{n,k}} ∫_{t_{n,k}}^T u_s dB_s.   (10.2.21)
Weakening the integrability condition on the integrand from .L2 to .L2loc , some of the
fundamental properties of the integral are lost, including the martingale property
and Itô’s isometry. However, we will prove that the integral is a local martingale
and provide a “surrogate” for Itô’s isometry, Lemma 10.2.25.
We recall that u ∈ L²_loc if it is progressively measurable and, for every t > 0,

A_t := ∫_0^t u_s² d⟨B⟩_s < ∞ a.s.   (10.2.22)
Fig. 10.1 On the left: plot of a trajectory of a Brownian motion W. On the right: plot of the related trajectory of A_t = ∫_0^t W_s² ds, corresponding to the process in (10.2.22) with u = W and B Brownian motion
Fig. 10.2 Plot of two trajectories of the process A in (10.2.22) and the corresponding stopping
times .τn and .τn+1 in (10.2.23)
Remark 10.2.20 ([!]) Note that the class L² depends on the fixed probability measure, as opposed to L²_loc, which is invariant with respect to equivalent⁶ probability measures.
Let us fix T > 0 and consider the sequence of stopping times defined by

τ_n = T ∧ inf{t ≥ 0 | A_t ≥ n},   n ∈ N,   (10.2.23)

and represented in Fig. 10.2. Due to the continuity of A, we have τ_n ↗ T almost surely, and thus the sequence of events F_n := (τ_n = T) is such that F_n ↗ Ω \ N with P(N) = 0. Truncating u at time τ_n, we define the process

u_{n,t} := u_t 1_{(t≤τ_n)},   t ∈ [0,T],

u_{n,t} = u_{n+h,t} = u_t   on F_n,
6 Equivalent measures have the same certain (and, therefore, also negligible) events.
and therefore the processes (X_{n,t})_{t∈[0,T]} and (X_{n+h,t})_{t∈[0,T]} are indistinguishable on F_n thanks to Proposition 10.2.17. Hence, the following definition is well-posed:

Definition 10.2.21 (Stochastic Integral of Processes in L²_loc) The stochastic integral of u ∈ L²_loc with respect to B ∈ M^{c,2} on [0,T] is the continuous and adapted process X = (X_t)_{t∈[0,T]} that on F_n is indistinguishable from X_n in (10.2.24) for every n ∈ N. As usual, we write

X_t = ∫_0^t u_s dB_s,   t ∈ [0,T].   (10.2.25)
is well-defined.
Proposition 10.2.19 has the following simple generalization.
Proposition 10.2.23 (Integral with Random Integration Endpoint) Let X be the stochastic integral process of u ∈ L²_loc with respect to B ∈ M^{c,2}. Let τ be a stopping time such that 0 ≤ τ ≤ T for some T > 0. Then (u_t 1_{(t≤τ)})_{t≥0} ∈ L²_loc and

X_τ = ∫_0^τ u_s dB_s = ∫_0^T u_s 1_{(s≤τ)} dB_s   a.s.

Proof It is clear that (u_t 1_{(t≤τ)})_{t≥0} ∈ L²_loc. Let (τ_n)_{n∈N} be the sequence of stopping times in (10.2.23). By definition, on the event F_n = (τ_n = T), we have

X_τ = ∫_0^τ u_s 1_{(s≤τ_n)} dB_s =
τ_n := n ∧ inf{t ≥ 0 | A_t ≥ n},   n ∈ N,

Proof By Proposition 10.2.23 (with the choice τ = t∧τ_n and T = t), for every t ≥ 0 we have

X_{t∧τ_n} = ∫_0^t u_s 1_{(s≤τ_n)} dB_s   a.s.
is a martingale: it follows that XY − A ∈ M^{c,loc} with localizing sequence (τ_n)_{n∈N} and therefore A = ⟨X,Y⟩. □
For the stochastic integral of .u ∈ L2loc , we no longer have a fundamental tool
such as Itô’s isometry: in many situations it can be conveniently replaced by the
following lemma.
Lemma 10.2.25 ([!]) Let

X_t = ∫_0^t u_s dB_s,   ⟨X⟩_t = ∫_0^t u_s² d⟨B⟩_s.

Then, for every ε, δ, t > 0,

P(|X_t| ≥ ε) ≤ P(⟨X⟩_t ≥ δ) + δ/ε².
Proof Let τ_δ := inf{t ≥ 0 | ⟨X⟩_t ≥ δ}. Since (τ_δ ≤ t) = (⟨X⟩_t ≥ δ) by the continuity of ⟨X⟩, it suffices to prove that

P((|X_t| ≥ ε) ∩ (τ_δ > t)) ≤ δ/ε².
Now we have

P((|∫_0^t u_s dB_s| ≥ ε) ∩ (t < τ_δ)) = P((|∫_0^t u_s 1_{(s<τ_δ)} dB_s| ≥ ε) ∩ (t < τ_δ))

≤ P(|∫_0^t u_s 1_{(s<τ_δ)} dB_s| ≥ ε) ≤
□
The following result shows that the stochastic integral of .u ∈ L2loc can also be
defined by approximation, as we did for .u ∈ L2 , provided that we use convergence
in probability instead of in .L2 (Ω, P )-norm.
Proposition 10.2.26 Let u, u_n ∈ L²_loc, n ∈ N, be such that

∫_0^t |u_{n,s} − u_s|² d⟨B⟩_s → 0 in probability as n → ∞.   (10.2.26)

Then

∫_0^t u_{n,s} dB_s → ∫_0^t u_s dB_s in probability.
of progressive measurability of the integrand. The following result is also the basis
of the numerical approximation methods for the stochastic integral.
Corollary 10.2.27 ([!]) Let u be a continuous and adapted process, B ∈ M^{c,2}, and (π_n)_{n∈N} a sequence of partitions of [0,t], with π_n = (t_{n,k})_{k=0,...,m_n}, such that lim_{n→∞} |π_n| = 0. Then

Σ_{k=1}^{m_n} u_{t_{n,k−1}} (B_{t_{n,k}} − B_{t_{n,k−1}}) → ∫_0^t u_s dB_s in probability.

Proof Setting

u_{n,s} = Σ_{k=1}^{m_n} u_{t_{n,k−1}} 1_{[t_{n,k−1},t_{n,k}[}(s)

we have

Σ_{k=1}^{m_n} u_{t_{n,k−1}} (B_{t_{n,k}} − B_{t_{n,k−1}}) = ∫_0^t u_{n,s} dB_s.
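These left-point sums are the basis of the numerical schemes mentioned above. The sketch below (pure Python; names are ours) evaluates them for the integrand u = B on a simulated Brownian path and checks the purely algebraic identity Σ_k x_{k−1}Δx_k = (x_N² − x_0² − Σ_k (Δx_k)²)/2, valid for every path, which is what produces the limit ∫_0^t W_s dW_s = (W_t² − t)/2 once the squared increments converge to the quadratic variation t.

```python
import math
import random

def left_riemann_sum(path):
    """Left-point sums of Corollary 10.2.27 with integrand u = B:
    sum_k B_{t_{n,k-1}} (B_{t_{n,k}} - B_{t_{n,k-1}})."""
    return sum(path[k - 1] * (path[k] - path[k - 1])
               for k in range(1, len(path)))

# simulate a Brownian path on [0, 1] with 1000 uniform steps
random.seed(42)
dt = 1.0 / 1000
w = [0.0]
for _ in range(1000):
    w.append(w[-1] + random.gauss(0.0, math.sqrt(dt)))

s = left_riemann_sum(w)
qv = sum((w[k] - w[k - 1]) ** 2 for k in range(1, len(w)))

# exact algebraic identity, path by path:
#   sum_k x_{k-1} dx_k = (x_N^2 - x_0^2 - sum_k (dx_k)^2) / 2
assert abs(s - (w[-1] ** 2 - qv) / 2) < 1e-9
```

Since qv concentrates around t as the mesh shrinks, the sums concentrate around (W_t² − t)/2; note how the left-endpoint choice is essential, as any other evaluation point changes the limit.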
If (u¹, B¹) =ᵈ (u², B²) (i.e. (u¹, B¹) and (u², B²) are equal in law) then we also have (u¹, B¹, X¹) =ᵈ (u², B², X²).
A similar result holds under much more general assumptions: in this regard, see,
for example, Exercise IV.5.16 in [123].
S = A + B

where the two integrals on the right-hand side have the meaning that we now explain.
Let μ_A be the Lebesgue-Stieltjes measure⁷ associated with A and defined path by path: we denote by

∫_0^t u_r dA_r := ∫_{[0,t]} u_r μ_A(dr)
for every .t ≥ 0.
As for the integral with respect to B ∈ M^{c,loc}, one can use a localization procedure entirely analogous⁸ to that of Sect. 10.2.4. In conclusion, recalling Definition 9.5.3 of the quadratic variation of a semimartingale, we have the following

Proposition 10.3.2 Let S = A + B be a continuous semimartingale and u ∈ L²_{S,loc}. The stochastic integral process

X_t := ∫_0^t u_r dS_r = ∫_0^t u_r dA_r + ∫_0^t u_r dB_r,   t ≥ 0,
⁸ Let (τ_n)_{n∈N} be a localizing sequence for B: as in Remark 8.4.6-(iv) we can assume |B_{t∧τ_n}| ≤ n, so that B_n := (B_{t∧τ_n})_{t≥0} ∈ M^{c,2}. If u ∈ L²_{S,loc} then

∫_0^t u_r² d⟨B_n⟩_r ≤ ∫_0^t u_r² d⟨B⟩_r < ∞ a.s.

This is true if u is simple and in general it can be proved by approximation, as in Proposition 10.2.17. Since F_{n,T} ↗ F_T with P(F_T) = 1, we define the integral

Y_t = ∫_0^t u_r dB_r,   0 ≤ t ≤ T,

as the equivalence class of continuous and adapted processes that, for each n ∈ N, are indistinguishable from (Y_{n,t})_{t∈[0,T]} on F_{n,T}. If Y and Ȳ indicate respectively the stochastic integral processes of u on the intervals [0,T] and [0,T̄] with T ≤ T̄, then Y and Ȳ|_{[0,T]} are indistinguishable on [0,T]. Therefore, the Itô stochastic integral process of u ∈ L²_{S,loc} with respect to B ∈ M^{c,loc} is well defined:

Y_t = ∫_0^t u_r dB_r,   t ≥ 0,

and a localizing sequence for Y is given by τ̄_n = τ_n ∧ τ'_n where τ'_n = inf{t ≥ 0 | ⟨I⟩_t ≥ n}.
In the next section, we deal with the particular case where .At = t and B is a
Brownian motion.
Definition 10.4.1 (Itô Process) An Itô process is a continuous stochastic process of the form

X_t = X_0 + ∫_0^t u_s ds + ∫_0^t v_s dW_s,   t ≥ 0,   (10.4.1)

where:
(i) .X0 ∈ mF0 ;
(ii) u ∈ L¹_loc, that is, u is progressively measurable and such that

∫_0^t |u_s| ds < ∞ a.s.

for any t ≥ 0;
(iii) v ∈ L²_loc, that is, v is progressively measurable and such that⁹

∫_0^t |v_s|² ds < ∞ a.s.

for any t ≥ 0.
Notation 10.4.2 (Differential Notation [!]) To indicate the Itô process in (10.4.1), the so-called "differential notation" is often used:

dX_t = u_t dt + v_t dW_t.   (10.4.2)

This notation, in addition to being more compact, has the merit of evoking the expressions of classical differential calculus. In rigorous terms, dX_t is neither a "derivative" nor a "differential of the process X": these terms have not been defined; rather, it is a symbol that holds significance solely within the context of expression (10.4.2), which, in turn, is a writing whose precise meaning is given by the integral equation (10.4.1). When we talk about stochastic differential calculus, we refer to this type of symbolic calculation whose true meaning is given by the corresponding integral relations.
Remark 10.4.3 ([!]) The representation of an Itô process is unique in the following sense: if X is the process in (10.4.2) and we also have

dX_t = u'_t dt + v'_t dW_t,

then u = u' and v = v' almost everywhere, almost surely. In particular, if u, u', v, v' are continuous, then u is indistinguishable from u' and v is indistinguishable from v'.
Indeed, the process

M_t := ∫_0^t v_s dW_s − ∫_0^t v'_s dW_s = ∫_0^t u'_s ds − ∫_0^t u_s ds
where the second and third equalities are due respectively to Proposition 10.2.23 and Itô's isometry. Taking the limit as n → ∞, by Beppo Levi's theorem, we have

E[∫_0^∞ (v_s − v'_s)² ds] = 0

and therefore P(v = v' a.e.) = 1. On the other hand, by Proposition B.3.2 in [113], we also have that

P(u = u' a.e.) = 1.
We summarize the contents of the chapter and provide a roadmap for reading,
glossing over technical and secondary aspects. As usual, if you have any doubt
about what the following succinct statements mean, please review the corresponding
section.
• Section 10.1: when approaching these topics for the first time, it is preferable
to select some content and postpone the general treatment and in-depth studies
to a later time. In particular, it is best to first consider only the case where the
integrator is a Brownian motion. As for the integrand, the crucial assumption is
that it is a progressively measurable process; the construction of the Brownian
integral takes place in three steps, gradually widening the class of integrands:
(1) the definition of the integral of simple processes is explicit: it is a Riemann
sum of Brownian increments. In this case, three fundamental properties of
the integral are directly proven:
10.5 Key Ideas to Remember 207
Itô's formula is the most important tool in stochastic differential calculus. In this
chapter, we present several versions that provide the general rules of stochastic
calculus and generalize the analogous deterministic formula of Theorem 9.1.6 for
the Lebesgue-Stieltjes integral.
210 11 Itô’s Formula
dF(X_t) = F'(X_t) dX_t + ½ F''(X_t) d⟨X⟩_t.   (11.1.2)
Idea of the Proof Given a partition π = {t_0, ..., t_N} of [0,t], we write the difference F(X_t) − F(X_0) as a telescoping sum and then expand it in a Taylor series up to the second order: we obtain

F(X_t) − F(X_0) = Σ_{k=1}^N (F(X_{t_k}) − F(X_{t_{k−1}}))

= Σ_{k=1}^N F'(X_{t_{k−1}})(X_{t_k} − X_{t_{k−1}}) + ½ Σ_{k=1}^N F''(X_{t_{k−1}})(X_{t_k} − X_{t_{k−1}})² + "remainder".
Σ_{k=1}^N F'(X_{t_{k−1}})(X_{t_k} − X_{t_{k−1}}) → ∫_0^t F'(X_s) dX_s,

Σ_{k=1}^N F''(X_{t_{k−1}})(X_{t_k} − X_{t_{k−1}})² → ∫_0^t F''(X_s) d⟨X⟩_s

as |π| → 0, and the remainder term is negligible. The detailed proof, which involves more technical intricacies, is presented in Sect. 11.3.
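The telescoping argument can be played with numerically. For F(x) = x² the second-order Taylor expansion has no remainder at all, so the identity below holds exactly for every path and every partition — a sketch with hypothetical names, illustrating why the second-order term cannot be dropped:

```python
def taylor_telescope(path, dF, d2F):
    """First- and second-order sums of the telescoping Taylor expansion
    in the Idea of the Proof: returns (s1, s2) with
    s1 = sum_k F'(X_{t_{k-1}}) dX_k, s2 = (1/2) sum_k F''(X_{t_{k-1}}) dX_k^2."""
    s1 = sum(dF(path[k - 1]) * (path[k] - path[k - 1])
             for k in range(1, len(path)))
    s2 = 0.5 * sum(d2F(path[k - 1]) * (path[k] - path[k - 1]) ** 2
                   for k in range(1, len(path)))
    return s1, s2

# F(x) = x^2: F' = 2x, F'' = 2, and the Taylor remainder vanishes
# identically, so F(X_t) - F(X_0) = s1 + s2 for ANY path and ANY partition
path = [0.0, 1.0, 0.5, 2.0]
s1, s2 = taylor_telescope(path, lambda x: 2 * x, lambda x: 2.0)
assert abs((path[-1] ** 2 - path[0] ** 2) - (s1 + s2)) < 1e-12
```

For a Brownian path the increments do not shrink fast enough for s2 to vanish in the limit: the squared increments add up to the quadratic variation, which is exactly the extra term in (11.1.2).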
Remark 11.1.2 Compared to the deterministic version (9.1.3), in Itô's formula (11.1.2) an additional second-order term appears, which comes from the quadratic variation of X: the factor ½ appearing in front of it is the coefficient of the Taylor series expansion of F.
Likewise, we establish a more comprehensive version of Itô's formula.

Theorem 11.1.3 (Itô's Formula) Let X be a continuous real semimartingale and F = F(t,x) ∈ C^{1,2}(R_{≥0} × R). Then almost surely, for every t ≥ 0 we have

F(t, X_t) = F(0, X_0) + ∫_0^t (∂_t F)(s, X_s) ds + ∫_0^t (∂_x F)(s, X_s) dX_s + ½ ∫_0^t (∂_{xx} F)(s, X_s) d⟨X⟩_s

or, in differential notation,

dF(t, X_t) = ∂_t F(t, X_t) dt + (∂_x F)(t, X_t) dX_t + ½ (∂_{xx} F)(t, X_t) d⟨X⟩_t.
We consider Itô's formula for a real Brownian motion W and delve into several illustrative examples. Recall that the quadratic variation process of W is simply ⟨W⟩_t = t.
Example 11.1.5
(i) if F(t,x) = f(t)x, with f ∈ C¹(R), then we have

f(t)W_t = ∫_0^t f'(s)W_s ds + ∫_0^t f(s) dW_s

(ii) if F(t,x) = x², then

∂_t F(t,x) = 0,   ∂_x F(t,x) = 2x,   ∂_{xx} F(t,x) = 2,

and therefore

W_t² = 2 ∫_0^t W_s dW_s + t
or, in differential notation,

dX_t = (a + σ²/2) X_t dt + σ X_t dW_t.

With the choice a = −σ²/2, the drift of the process vanishes, and we obtain

X_t = 1 + σ ∫_0^t X_s dW_s

which is a continuous martingale: specifically, X_t = e^{σW_t − σ²t/2} is the exponential martingale introduced in Proposition 4.4.1.
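The martingale property forces E[X_t] = X_0 = 1 for every t, which is easy to probe by Monte Carlo using only the terminal law W_t ∼ N(0,t). A sketch (function name, sample size and tolerance are ad hoc choices of ours):

```python
import math
import random

def exp_martingale_mean(sigma, t, n_samples, seed=0):
    """Monte Carlo estimate of E[exp(sigma*W_t - sigma^2*t/2)],
    sampling the terminal value W_t ~ N(0, t) directly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w_t = rng.gauss(0.0, math.sqrt(t))
        total += math.exp(sigma * w_t - 0.5 * sigma ** 2 * t)
    return total / n_samples

# the exponential martingale has constant expectation 1
m = exp_martingale_mean(sigma=0.5, t=1.0, n_samples=200_000)
assert abs(m - 1.0) < 0.02
```

Without the drift correction −σ²t/2 the estimate would concentrate around e^{σ²t/2} instead, which is one way to "see" the extra Itô term of (11.1.2) at work.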
Remark 11.1.6 ([!]) Itô's formula shows that every stochastic process of the form X_t = F(t, W_t), with F sufficiently regular, is an Itô process according to (10.4.1). If F solves the heat equation

∂_t F(t,x) + ½ ∂_{xx} F(t,x) = 0,   t > 0, x ∈ R,   (11.1.3)

then the drift of X vanishes and therefore X is a local martingale.¹ Conversely, if X is a local martingale then by Remark 10.4.3 we have that

(∂_t F + ½ ∂_{xx} F)(t, W_t) = 0   (11.1.4)

in the sense of indistinguishability, and this implies² that F solves the heat equation (11.1.3).
dX_t = μ_t dt + σ_t dW_t   (11.1.5)

that is, d⟨X⟩_t = σ_t² dt. Hence we have the following further version of Itô's formula.
¹ We find here the result of Theorem 4.4.3, proven in the context of Markov process theory!
² The stochastic equation (11.1.4) is equivalent to the deterministic equation (11.1.3): just observe that if f is a continuous function such that f(W_t) = 0 a.s. for a t > 0 then f ≡ 0; in fact, if it were f(x̄) > 0 for some x̄ ∈ R then we would also have f(x) > 0 for |x − x̄| < r for some r > 0 sufficiently small; this leads to a contradiction since, the Gaussian density being strictly positive, we would have

0 < E[f(W_t) 1_{(|W_t − x̄| < r)}] = 0.
Corollary 11.1.7 (Itô's Formula for Itô Processes) Let X be the Itô process in (11.1.5). For each F = F(t,x) ∈ C^{1,2}(R_{≥0} × R) we have

F(t, X_t) = F(0, X_0) + ∫_0^t (∂_t F)(s, X_s) ds + ∫_0^t (∂_x F)(s, X_s) dX_s + ½ ∫_0^t (∂_{xx} F)(s, X_s) σ_s² ds   (11.1.6)

or equivalently

dF(t, X_t) = (∂_t F + μ_t ∂_x F + ½ σ_t² ∂_{xx} F)(t, X_t) dt + σ_t ∂_x F(t, X_t) dW_t.
Example 11.1.8 ([!!]) Let us calculate the stochastic differential of the process

Y_t = e^{t ∫_0^t W_s dW_s}.

First of all, we notice that we cannot use Itô's formula for Brownian motion from Corollary 11.1.4, because Y_t is not a function of W_t but depends on (W_s)_{s∈[0,t]}, that is, on the entire trajectory of W in the interval [0,t]. The general criterion to correctly apply Itô's formula is to first analyze how Y_t depends on the variable t, distinguishing the "deterministic" from the "stochastic" dependence: in this example, the deterministic dependence is through the factor t multiplying the integral in

t ⟼ exp(t ∫_0^t W_s dW_s),

which allows us to establish that

Y_t = F(t, X_t),   F(t,x) = e^{tx},   X_t = ∫_0^t W_s dW_s,

and therefore dX_t = W_t dW_t and d⟨X⟩_t = W_t² dt. Then we can apply Itô's formula (11.1.6): since

∂_t F(t,x) = x F(t,x),   ∂_x F(t,x) = t F(t,x),   ∂_{xx} F(t,x) = t² F(t,x),
we get

dY_t = (X_t + (tW_t)²/2) Y_t dt + tW_t Y_t dW_t.
so that

φ_{X_t}(η) = e^{iη m(t) − η² C(t)/2}
d(tW_t) = t dW_t + W_t dt

that is

X_t = tW_t − ∫_0^t s dW_s = ∫_0^t (t − s) dW_s.
is not written in the form of an Itô process: to circumvent this problem, we define the Itô process

Y_t^{(a)} := ∫_0^t (a − s) dW_s

for which

Y_t^{(a)} ∼ N(0, t³/3 + at(a − t)),

and the thesis follows from the fact that X_t = Y_t^{(t)}.
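The variance t³/3 + at(a−t) comes from Itô's isometry, Var Y_t^{(a)} = ∫_0^t (a−s)² ds; for a = t it reduces to Var X_t = t³/3. A discretized simulation reproduces this (a sketch of ours; sample size and tolerances are ad hoc):

```python
import math
import random

def sample_integral(t, n_steps, rng):
    """One sample of the left-point discretization
    sum_k (t - s_{k-1}) dW_k of X_t = int_0^t (t - s) dW_s."""
    dt = t / n_steps
    return sum((t - k * dt) * rng.gauss(0.0, math.sqrt(dt))
               for k in range(n_steps))

rng = random.Random(1)
t = 1.0
samples = [sample_integral(t, 100, rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

# Ito isometry: Var X_t = int_0^t (t - s)^2 ds = t^3 / 3
assert abs(var - t ** 3 / 3) < 0.03
```

The small residual gap between the empirical variance and t³/3 is the left-point discretization bias Σ_k (t − s_{k−1})² Δs − t³/3, of order 1/n_steps.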
We prove some classical inequalities that are a basic tool in the study of martingales and stochastic differential equations.

Theorem 11.2.1 (Burkholder-Davis-Gundy [!]) Let X be a continuous local martingale such that X_0 = 0 a.s. and τ an a.s. finite stopping time (i.e., such that τ < ∞ a.s.). For every p > 0 there exist two positive constants c_p, C_p such that

c_p E[⟨X⟩_τ^{p/2}] ≤ E[sup_{t∈[0,τ]} |X_t|^p] ≤ C_p E[⟨X⟩_τ^{p/2}].   (11.2.1)
and assume for the moment that X̄_τ ≤ n a.s. for some n ∈ N. Then, by Doob's maximal inequality, Corollary 8.1.3, we have

E[X̄_τ^p] ≤ c_p E[|X_τ|^p] =

(since the first term is null because the stochastic integral is a martingale, given the boundedness assumption on X̄_τ)

= c'_p E[∫_0^τ |X_t|^{p−2} d⟨X⟩_t]

≤ c'_p E[∫_0^τ X̄_τ^{p−2} d⟨X⟩_t]

= c'_p E[X̄_τ^{p−2} ⟨X⟩_τ] ≤

(by Hölder's inequality with exponents p/(p−2) and p/2)

≤ c'_p E[X̄_τ^p]^{(p−2)/p} E[⟨X⟩_τ^{p/2}]^{2/p}

and from this inequality, the thesis easily follows. To remove the boundedness assumption, it is sufficient to apply the result just proved to the stopping time
218 11 Itô’s Formula
Levi’s theorem.
Let us now prove the first inequality: with the usual localization argument based
⎡ p ⎤that .τ , .X̄τ and .〈X〉τ are
on Beppo Levi’s theorem, it is not restrictive to assume
bounded by a positive constant. We also assume .E X̄τ > 0 otherwise there is
nothing to prove. Let .r = p2 > 1 and .A = 〈X〉. By the deterministic Itô’s formula,
Theorem 9.1.6 and formula (9.1.4), we have
1
dArt = At dAtr−1 + dArt
.
r
that is,

$$(r-1)A_\tau^r = r\int_0^\tau A_t\, dA_t^{r-1}.$$

Since also

$$A_\tau^r = A_\tau\, A_\tau^{r-1} = A_\tau \int_0^\tau dA_t^{r-1},$$

we finally obtain

$$A_\tau^r = r\int_0^\tau (A_\tau - A_t)\, dA_t^{r-1}.$$
Then we have

$$E\left[A_\tau^r\right] = rE\left[\int_0^\tau (A_\tau - A_t)\, dA_t^{r-1}\right] =$$

(by (9.4.1) and (1.4.3) (see also Remark 9.4.4), remembering the notation $A = \langle X\rangle$)

$$= rE\left[\int_0^\tau E\left[X_\tau^2 - X_t^2 \mid \mathcal{F}_t\right] d\langle X\rangle_t^{r-1}\right] \le rE\left[\int_0^\tau E\left[\bar X_\tau^2 \mid \mathcal{F}_t\right] d\langle X\rangle_t^{r-1}\right] =$$

To conclude, just apply Hölder's inequality with exponents $r$, $\frac{r}{r-1}$ and finally divide
by $E\left[\langle X\rangle_\tau^r\right]^{\frac{r-1}{r}}$. □
We have the following immediate
Corollary 11.2.2 ([!]) Let $\sigma \in L^2$ and W be a real Brownian motion. For every
$p \ge 2$ and $T > 0$ we have

$$E\left[\sup_{0\le t\le T}\left|\int_0^t \sigma_s\, dW_s\right|^p\right] \le c_p\, T^{\frac{p-2}{2}}\, E\left[\int_0^T |\sigma_s|^p\, ds\right] \tag{11.2.2}$$
we obtain

$$E\left[\sup_{0\le t\le T} |X_t|^p\right] \le c_p\, E\left[\langle X\rangle_T^{p/2}\right] = c_p\, E\left[\left(\int_0^T \sigma_t^2\, dt\right)^{p/2}\right].$$

The thesis follows by applying Hölder's inequality with exponents $\frac{p}{2}$ and $\frac{p}{p-2}$. □
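To make the comparison in (11.2.1) concrete, the following sketch (my own illustration, not from the text) simulates $X_t = \int_0^t W_s\,dW_s$ and checks the $p = 2$ case, where Doob's inequality gives the explicit constant $C_2 = 4$: the sample mean of $\sup_{t\le 1}|X_t|^2$ should not exceed four times the sample mean of $\langle X\rangle_1 = \int_0^1 W_s^2\,ds$ (whose exact expectation is $1/2$).

```python
import math
import random

def path_stats(T, n, rng):
    """Simulate X_t = \int_0^t W_s dW_s; return (sup_t |X_t|, <X>_T = \int_0^T W_s^2 ds)."""
    dt = T / n
    w = x = qv = 0.0
    sup_abs = 0.0
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))
        x += w * dW          # left-point Ito sum
        qv += w * w * dt     # quadratic-variation increment
        w += dW
        sup_abs = max(sup_abs, abs(x))
    return sup_abs, qv

rng = random.Random(1)
stats = [path_stats(1.0, 400, rng) for _ in range(5000)]
e_sup2 = sum(s * s for s, _ in stats) / len(stats)
e_qv = sum(q for _, q in stats) / len(stats)
# p = 2 upper bound with Doob's constant: E[sup |X|^2] <= 4 E[<X>_T]
print(e_sup2 <= 4 * e_qv)
```

The bound holds with a comfortable margin, and `e_qv` is close to the exact value $1/2$.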
Remark 11.2.3 Assume $p > 4$ and

$$X_t := \int_0^t \sigma_s\, dW_s \qquad \text{with} \qquad E\left[\int_0^T |\sigma_s|^p\, ds\right] < \infty.$$

$$\langle X\rangle_t = \lim_{n\to\infty} \sum_{k=1}^{2^n} \left(X_{\frac{tk}{2^n}} - X_{\frac{t(k-1)}{2^n}}\right)^2, \qquad t \ge 0,$$

$$\lim_{n\to\infty} \sum_{k=1}^{2^n} \left(S_{\frac{tk}{2^n}} - S_{\frac{t(k-1)}{2^n}}\right)^2 = \langle X\rangle_t, \qquad t \ge 0, \tag{11.2.3}$$
in probability.
Proof As usual, we denote by $t_{n,k} = \frac{tk}{2^n}$, $k = 0, \dots, 2^n$, the dyadic rationals of
the interval [0, t]. We first assume that X is a bounded continuous local martingale,
$|X| \le K$ with K a positive constant. Given $n \in \mathbb{N}$ and $k \in \{1, \dots, 2^n\}$, we consider
the process

$$Y_s := X_s - X_{t_{n,k-1}}, \qquad s \ge t_{n,k-1},$$

and observe that $\langle Y\rangle_s = \langle X\rangle_s - \langle X\rangle_{t_{n,k-1}}$: indeed, it is enough to observe that

$$Y_s^2 - \left(\langle X\rangle_s - \langle X\rangle_{t_{n,k-1}}\right) = X_s^2 - \langle X\rangle_s + M_s, \qquad M_s := -2X_s X_{t_{n,k-1}} + X_{t_{n,k-1}}^2 + \langle X\rangle_{t_{n,k-1}},$$

and it is easily verified that $(M_s)_{s\ge t_{n,k-1}}$ is a martingale. Applying Itô's formula, we
obtain

$$\left(X_{t_{n,k}} - X_{t_{n,k-1}}\right)^2 - \left(\langle X\rangle_{t_{n,k}} - \langle X\rangle_{t_{n,k-1}}\right) = 2\int_{t_{n,k-1}}^{t_{n,k}} \left(X_s - X_{t_{n,k-1}}\right) dY_s.$$
$$R_n := \sum_{k=1}^{2^n} \left(X_{t_{n,k}} - X_{t_{n,k-1}}\right)^2 - \langle X\rangle_t = 2\sum_{k=1}^{2^n} \int_{t_{n,k-1}}^{t_{n,k}} \left(X_s - X_{t_{n,k-1}}\right) dY_s.$$
Thanks to the Itô isometry in the form (10.2.12) and (10.2.13) (recall also
Theorem 10.2.15), we have

$$E\left[R_n^2\right] = 4\sum_{k=1}^{2^n} E\left[\int_{t_{n,k-1}}^{t_{n,k}} \left(X_s - X_{t_{n,k-1}}\right)^2 d\langle Y\rangle_s\right] = 4E\left[\int_0^t \sum_{k=1}^{2^n} \left(X_s - X_{t_{n,k-1}}\right)^2 \mathbf{1}_{[t_{n,k-1}, t_{n,k}]}(s)\, d\langle Y\rangle_s\right]$$
and taking the limit, by the dominated convergence theorem, we have
$\lim_{n\to\infty} E\left[R_n^2\right] = 0$. Therefore, in this particular case, we have proved convergence
in $L^2$ norm, which obviously implies convergence in probability.
To remove the boundedness assumption on X, it is sufficient to use a localization
argument, proving the thesis for the bounded martingale $X_{t\wedge\tau_n}$ with a suitable
localizing sequence $(\tau_n)$, and then letting n tend to infinity: with this procedure, we obtain
convergence in probability. The proof of (11.2.3) is similar and is omitted. □
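The dyadic sums above are easy to observe numerically. The sketch below (my own illustration; the helper name is mine) samples a Brownian path on the dyadics of $[0, t]$ and computes the sum of squared increments, which should be close to $\langle W\rangle_t = t$.

```python
import math
import random

def dyadic_qv(w_path, n):
    """Sum of squared increments of a path sampled at the dyadics t k / 2^n."""
    return sum((w_path[k] - w_path[k - 1]) ** 2 for k in range(1, 2 ** n + 1))

rng = random.Random(2)
t, n = 1.0, 12                      # 2^12 dyadic subintervals of [0, t]
dt = t / 2 ** n
w = [0.0]
for _ in range(2 ** n):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
qv = dyadic_qv(w, n)
print(round(qv, 1))                 # close to <W>_t = t = 1
```

The fluctuation of the sum around t has variance of order $2t^2/2^n$, consistent with the $L^2$ convergence in the proof.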
where .Vs (A) denotes the first variation process of A on .[0, s] (cf. Definition 9.1.1).
By continuity, .τn ↗ ∞ a.s. and therefore it is enough to prove Itô’s formula for
.Xt∧τn for each .n ∈ N: equivalently, it is enough to prove for each fixed .N̄ ∈ N
that (11.1.1) holds in the case where the processes .|X|, |M|, A, 〈X〉 and .V (A) are
bounded by .N̄. In this case, it is not restrictive to assume that the function F has
compact support, possibly modifying it outside .[−N̄ , N̄ ]. At first, we also assume
that .F ∈ C 3 (R).
We use the notation (8.1.1) for the dyadics

$$D(t) = \left\{ t_{n,k} = \frac{tk}{2^n} \;\Big|\; k = 0, \dots, 2^n,\; n \in \mathbb{N} \right\}$$

of [0, t] and indicate with $\Delta_{n,k}Y = Y_{t_{n,k}} - Y_{t_{n,k-1}}$ the increment of a generic process
Y. Moreover, let $F_{n,k} := F_{t_{n,k}}$ and

$$\delta_n(Y) = \sup_{\substack{s,r\in D(t)\\ |s-r| < \frac{1}{2^n}}} |Y_s - Y_r|, \qquad n \in \mathbb{N}.$$
with

$$|R_n| \le \|F'''\|_\infty \sum_{k=1}^{2^n} \left|\Delta_{n,k}X\right|^3. \tag{11.3.4}$$
In the next two steps, we estimate the individual terms in (11.3.3) to show that they
converge to the corresponding terms in (11.1.1) and .Rn −→ 0 as .n → ∞.
Second Step Regarding the first sum in (11.3.3), we have

$$\sum_{k=1}^{2^n} F'(X_{t_{n,k-1}})\,\Delta_{n,k}X = I_n^{1,A} + I_n^{1,M}$$

with

$$I_n^{1,A} := \sum_{k=1}^{2^n} F'(X_{t_{n,k-1}})\,\Delta_{n,k}A \xrightarrow[n\to\infty]{} \int_0^t F'(X_s)\, dA_s \tag{11.3.5}$$

$$I_n^{1,M} := \sum_{k=1}^{2^n} F'(X_{t_{n,k-1}})\,\Delta_{n,k}M \xrightarrow[n\to\infty]{} \int_0^t F'(X_s)\, dM_s$$
where

$$I_n^{2,A} := \sum_{k=1}^{2^n} F''(X_{t_{n,k-1}})(\Delta_{n,k}A)^2, \qquad I_n^{2,AM} := \sum_{k=1}^{2^n} F''(X_{t_{n,k-1}})(\Delta_{n,k}A)(\Delta_{n,k}M),$$

$$I_n^{2,M} := \sum_{k=1}^{2^n} F''(X_{t_{n,k-1}})(\Delta_{n,k}M)^2.$$
Now we have

$$I_n^{2,A} \xrightarrow[n\to\infty]{} 0$$

by the uniform continuity of the trajectories of A on [0, t]. A similar result holds for
$I_n^{2,AM}$. Recalling that by definition $\langle X\rangle = \langle M\rangle$, it remains to prove that

$$I_n^{2,M} \xrightarrow[n\to\infty]{} \int_0^t F''(X_s)\, d\langle M\rangle_s.$$

Since

$$\sum_{k=1}^{2^n} F''(X_{t_{n,k-1}})\,\Delta_{n,k}\langle M\rangle \xrightarrow[n\to\infty]{} \int_0^t F''(X_s)\, d\langle M\rangle_s,$$

we prove that

$$\sum_{k=1}^{2^n} F''(X_{t_{n,k-1}})\left((\Delta_{n,k}M)^2 - \Delta_{n,k}\langle M\rangle\right) \xrightarrow[n\to\infty]{} 0$$
in $L^2(\Omega, P)$ norm. Setting $G_{n,k} = F''(X_{t_{n,k-1}})\left((\Delta_{n,k}M)^2 - \Delta_{n,k}\langle M\rangle\right)$ and expanding
the square of the sum, we have

$$E\left[\left(\sum_{k=1}^{2^n} G_{n,k}\right)^2\right] = E\left[\sum_{k=1}^{2^n} G_{n,k}^2\right]$$
since:

• $\delta_n(M) \le 2\bar N$ and $\delta_n(M) \xrightarrow[n\to\infty]{} 0$ almost everywhere by the uniform continuity
of M on [0, t]: consequently, $E\left[\delta_n^4(M)\right] \to 0$ by the dominated convergence
theorem. Similarly, $E\left[\delta_n(\langle M\rangle)\right] \xrightarrow[n\to\infty]{} 0$.

The estimate of the remaining term is entirely analogous.
Fourth Step We conclude the proof by removing the additional regularity assump-
tion on F . Given .F ∈ C 2 (R) with compact support, consider a sequence .(Fn )n∈N
of .C 3 functions that converge uniformly to F along with their first and second
derivatives. We apply Itô's formula to $F_n$ and let n tend to infinity: we have
$F_n(X_s) \xrightarrow[n\to\infty]{} F(X_s)$ for every $s \in [0, t]$. By the dominated convergence theorem,
we have a.s.

$$\lim_{n\to\infty} \int_0^t \left(F'_n(X_s) - F'(X_s)\right) dA_s = \lim_{n\to\infty} \int_0^t \left(F''_n(X_s) - F''(X_s)\right) d\langle X\rangle_s = 0$$
We outline the chapter’s main findings and essential concepts, omitting technical
details. As usual, if you have any doubt about what the following succinct statements
mean, please review the corresponding section.
• Section 11.3: the significance of the quadratic variation process becomes apparent
in the proof of Itô's formula: in particular, it introduces an additional
term that modifies the usual rules of deterministic integral calculus. Itô's
formula provides the Doob decomposition of a process that is a sufficiently
regular function of a continuous semimartingale, giving the expression of the
drift and the diffusive part.
. Sections 11.1.1 and 11.1.2: the heat operator appears in the drift term of Itô’s
formula for the Brownian motion: a process of the form .Xt = F (t, Wt ) is a
(local) martingale if and only if the function F is a solution of the heat equation.
An application of Itô’s formula shows that Itô processes with deterministic
coefficients have normal distribution.
. Section 11.2: the Burkholder-Davis-Gundy inequality generalizes the Itô isom-
etry and provides a comparison between the .Lp norm of a continuous local
martingale X and the .Lp/2 norm of the related quadratic variation process .〈X〉.
Main notations used or introduced in this chapter:
You, you are never truly enough for me, never enough; you, you sweet
land of mine where I have never been.1
Lucio Dalla
In this chapter, we extend the definitions and results of the previous chapters to
the multidimensional case. We do not introduce any really new concepts; however,
some results, such as Itô’s formula, become technically more complicated and for
this reason, some formal rules introduced in Sect. 12.3 can be useful for practical
calculations.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 227
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_12
$$d\langle W^i, W^j\rangle_t = \delta_{ij}\, dt \tag{12.1.1}$$

$$\frac{1}{\left(2\pi(t-s)\right)^{d/2}}\, e^{-\frac{|x|^2}{2(t-s)}} = \prod_{i=1}^d \frac{1}{\sqrt{2\pi(t-s)}}\, e^{-\frac{x_i^2}{2(t-s)}}, \qquad x \in \mathbb{R}^d,$$
2. For $t \ge s \ge 0$, we have

$$E\left[W_t^i W_t^j \mid \mathcal{F}_s\right] = E\left[\left(W_t^i - W_s^i\right) W_t^j \mid \mathcal{F}_s\right] + W_s^i\, E\left[W_t^j \mid \mathcal{F}_s\right] = W_s^i W_s^j$$

since

$$E\left[\left(W_t^i - W_s^i\right) W_t^j \mid \mathcal{F}_s\right] = E\left[\left(W_t^i - W_s^i\right)\left(W_t^j - W_s^j\right) \mid \mathcal{F}_s\right] + W_s^j\, E\left[W_t^i - W_s^i \mid \mathcal{F}_s\right] = 0.$$
In this section, we briefly show how to define the stochastic integral of multidi-
mensional processes, focusing in particular on Brownian motion and Itô processes.
For simplicity, we only deal with the case where the integrator is in M c,2 even
though all the results extend directly to integrators that are continuous semimartin-
gales. Hereafter, d and N denote two natural numbers.
Definition 12.1.4 Let B = (B 1 , . . . , B d ) ∈ M c,2 be a d-dimensional process.
Consider a process u = (uij ) with values in the space of matrices of dimension
N × d. We write u ∈ L2B (or simply u ∈ L2 ) if uij ∈ L2B j for each i = 1, . . . , N
and j = 1, . . . , d. The class L2loc ≡ L2B,loc is defined in an analogous way. The
stochastic integral of u with respect to B is the N -dimensional process, defined
component by component as

$$\int_0^t u_s\, dB_s := \left(\sum_{j=1}^d \int_0^t u_s^{ij}\, dB_s^j\right)_{i=1,\dots,N}$$

for $t \ge 0$.
Theorem 12.1.5 ([!]) Let

$$X_t = \int_0^t u_s\, dB_s^1, \qquad Y_t = \int_0^t v_s\, dB_s^2,$$
(ii) if $u \in L^2_{B^1}$ and $v \in L^2_{B^2}$ then the following version of Itô's isometry holds:

$$E\left[\int_t^T u_s\, dB_s^1 \int_t^T v_s\, dB_s^2 \;\Big|\; \mathcal{F}_t\right] = E\left[\int_t^T u_s v_s\, d\langle B^1, B^2\rangle_s \;\Big|\; \mathcal{F}_t\right], \qquad 0 \le t \le T. \tag{12.1.3}$$

Proof When u and v are indicator processes, (12.1.3) is proven by repeating the
proof of Theorem 10.2.7-(ii) with the only difference that, instead of (10.2.6), we
use (9.5.2) in the form

$$E\left[(B_T^1 - B_t^1)(B_T^2 - B_t^2) \mid \mathcal{F}_t\right] = E\left[\langle B^1, B^2\rangle_T - \langle B^1, B^2\rangle_t \mid \mathcal{F}_t\right], \qquad 0 \le t \le T.$$
Proof Equation (12.1.4) follows directly from (12.1.3) and point (iii) of Proposition 12.1.2. □
Remark 12.1.7 The components of the covariation matrix (cf. Definition 9.5.3) of
the integral process

$$X_t = \int_0^t u_s\, dB_s$$

are

$$\langle X\rangle_t^{ij} = \Big\langle \sum_{h=1}^d \int_0^t u_s^{ih}\, dB_s^h,\; \sum_{k=1}^d \int_0^t u_s^{jk}\, dB_s^k \Big\rangle =$$

(by (12.1.2))

$$= \sum_{h,k=1}^d \int_0^t u_s^{ih} u_s^{jk}\, d\langle B^h, B^k\rangle_s \tag{12.1.5}$$

for $i, j = 1, \dots, N$.
where:
(i) $X_0 \in m\mathcal{F}_0$ is an N-dimensional random variable;
(ii) u is an N-dimensional process in $L^1_{loc}$, i.e., u is progressively measurable and
such that, for every $t \ge 0$,

$$\int_0^t |u_s|\, ds < \infty \quad \text{a.s.}$$

where $|v|$ denotes the Hilbert-Schmidt norm of the matrix v, i.e., the Euclidean
norm in $\mathbb{R}^{N\times d}$, defined by

$$|v|^2 = \sum_{i=1}^N \sum_{j=1}^d (v^{ij})^2.$$

$$dX_t = u_t\, dt + v_t\, dW_t.$$
Combining (12.1.5) with the fact that $\langle W\rangle_t = tI$, we obtain the following

Proposition 12.2.2 Let X be the Itô process in (12.2.1). The covariation matrix of
X is

$$\langle X\rangle_t = \int_0^t v_s v_s^*\, ds, \qquad t \ge 0,$$

that is,

$$d\langle X^i, X^j\rangle_t = C_t^{ij}\, dt, \qquad C^{ij} := \left(vv^*\right)^{ij} = \sum_{k=1}^d v^{ik} v^{jk}. \tag{12.2.2}$$
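The matrix product in (12.2.2) is elementary but worth seeing once in code. The sketch below (my own helper, not from the text) computes $C = vv^*$ entry by entry for a small $N \times d$ diffusion matrix.

```python
def covariation_matrix(v):
    """C = v v* for an N x d matrix v, as in (12.2.2): C[i][j] = sum_k v[i][k] * v[j][k]."""
    N, d = len(v), len(v[0])
    return [[sum(v[i][k] * v[j][k] for k in range(d)) for j in range(N)] for i in range(N)]

v = [[1.0, 0.0],
     [0.5, 2.0]]          # N = 2, d = 2 diffusion matrix
C = covariation_matrix(v)
print(C)                  # [[1.0, 0.5], [0.5, 4.25]]
```

Note that C is always symmetric and positive semi-definite, as a covariation matrix must be.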
Proof We have

$$E\left[\left|\int_0^t v_s\, dW_s\right|^2\right] = E\left[\sum_{i=1}^N \left(\sum_{j=1}^d \int_0^t v_s^{ij}\, dW_s^j\right)^2\right] =$$

(by (12.1.4))

$$= \sum_{i=1}^N \sum_{j=1}^d E\left[\left(\int_0^t v_s^{ij}\, dW_s^j\right)^2\right] = \sum_{i=1}^N \sum_{j=1}^d E\left[\int_0^t (v_s^{ij})^2\, ds\right]. \qquad \square$$
Example 12.2.4 In the simplest case where u, v are constants, we have

$$X_t = X_0 + ut + vW_t,$$
$$F(t, X_t) = F(0, X_0) + \int_0^t (\partial_t F)(s, X_s)\, ds + \sum_{j=1}^d \int_0^t (\partial_{x_j} F)(s, X_s)\, dX_s^j + \frac{1}{2}\sum_{i,j=1}^d \int_0^t (\partial_{x_i x_j} F)(s, X_s)\, d\langle X^i, X^j\rangle_s$$

or, in differential form,

$$dF(t, X_t) = \partial_t F(t, X_t)\, dt + \sum_{j=1}^d (\partial_{x_j} F)(t, X_t)\, dX_t^j + \frac{1}{2}\sum_{i,j=1}^d (\partial_{x_i x_j} F)(t, X_t)\, d\langle X^i, X^j\rangle_t.$$
$$d\langle W^i, W^j\rangle_t = \delta_{ij}\, dt \tag{12.3.1}$$

$$dX_t = \mu_t\, dt + \sigma_t\, dW_t \tag{12.3.2}$$

that is, recalling the notation $\langle X\rangle$ for the covariation matrix of X (cf.
Definition 9.5.3),

$$d\langle X\rangle_t = C_t\, dt.$$
have

$$F(t, W_t) = F(0, 0) + \int_0^t (\partial_t F)(s, W_s)\, ds + \sum_{j=1}^d \int_0^t (\partial_{x_j} F)(s, W_s)\, dW_s^j + \frac{1}{2}\int_0^t (\Delta F)(s, W_s)\, ds$$

where $\Delta$ denotes the Laplacian

$$\Delta = \sum_{j=1}^d \partial_{x_j x_j}.$$

For example,

$$d|W_t|^2 = N\, dt + 2W_t \cdot dW_t = N\, dt + 2\sum_{i=1}^N W_t^i\, dW_t^i.$$
$$F(t, X_t) = F(0, X_0) + \int_0^t (\partial_t F)(s, X_s)\, ds + \sum_{j=1}^N \int_0^t (\partial_{x_j} F)(s, X_s)\, dX_s^j + \frac{1}{2}\sum_{i,j=1}^N \int_0^t (\partial_{x_i x_j} F)(s, X_s)\, C_s^{ij}\, ds$$

with martingale part, in differential form,

$$\sum_{j=1}^N \sum_{k=1}^d \sigma_t^{jk}\, \partial_{x_j} F(t, X_t)\, dW_t^k.$$
$$dY_t = \sigma_t\, dW_t, \qquad dX_t = dY_t - \frac{1}{2}\,\sigma_t\sigma_t^*\eta\, dt.$$

We have $M_t^\eta = F(X_t)$, so that

$$dM_t^\eta = M_t^\eta\,\langle \eta, dX_t\rangle + \frac{1}{2}\, M_t^\eta\,\langle \sigma_t\sigma_t^*\eta, \eta\rangle\, dt = M_t^\eta\,\langle \eta, dY_t\rangle = M_t^\eta \sum_{i=1}^N \sum_{j=1}^d \eta_i\, \sigma_t^{ij}\, dW_t^j,$$

where

$$M_t^\eta := e^{i\langle \eta, W_t\rangle + \frac{|\eta|^2}{2}t}, \qquad t \ge 0,\; \eta \in \mathbb{R}^d, \tag{12.3.4}$$
$$dX_t^i = \mu_t^i\, dt + \sum_{k=1}^d \sigma_t^{ik}\, dW_t^k, \qquad i = 1, \dots, N, \tag{12.3.5}$$

and calculate the product "$*$" on the right-hand side as a product of the "polynomials"
$dX^i$ in (12.3.5) according to the following calculation rules

$$dt * dt = dt * dW_t^i = dW_t^i * dt = 0, \qquad dW_t^i * dW_t^j = \delta_{ij}\, dt. \tag{12.3.6}$$
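The formal rules (12.3.6) amount to a tiny multiplication table, which can be encoded directly. The sketch below (my own encoding, not from the text) represents a differential as `'dt'` or `('dW', i)` and returns the dt-coefficient of the formal product.

```python
def box_product(a, b):
    """Formal product of stochastic differentials per (12.3.6).
    Differentials are 'dt' or ('dW', i); the result is the coefficient of dt."""
    if a == 'dt' or b == 'dt':
        return 0                     # dt*dt = dt*dW^i = dW^i*dt = 0
    (_, i), (_, j) = a, b
    return 1 if i == j else 0        # dW^i * dW^j = delta_ij dt

print(box_product('dt', 'dt'))                  # 0
print(box_product(('dW', 1), ('dW', 1)))        # 1  (i.e. dt)
print(box_product(('dW', 1), ('dW', 2)))        # 0
```

This is exactly the bookkeeping one performs by hand when computing covariations of Itô processes.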
Consequently,

$$dX_t^1 = dW_t^1, \qquad dX_t^2 = W_t^2\, dW_t^1,$$

while for the function F we have

$$\partial_t F = x_1 F, \quad \partial_{x_1} F = tF, \quad \partial_{x_2} F = e^{tx_1}, \quad \partial_{x_1 x_1} F = t^2 F, \quad \partial_{x_1 x_2} F = te^{tx_1}, \quad \partial_{x_2 x_2} F = 0,$$

and by the formal rules (12.3.6) for the calculation of covariation processes

$$d\langle X^1\rangle_t = dt, \qquad d\langle X^1, X^2\rangle_t = W_t^2\, dt.$$

Consequently,

$$dY_t = W_t^1 Y_t\, dt + tY_t\, dW_t^1 + e^{tW_t^1}\, dX_t^2 + \frac{1}{2}\left(t^2 Y_t + 2te^{tW_t^1} W_t^2\right) dt.$$
$$\langle X^i, X^j\rangle_t = \delta_{ij}\, t, \qquad t \ge 0. \tag{12.4.1}$$

Proof We use Proposition 12.3.6 and verify that, for every $\eta \in \mathbb{R}^d$, the exponential
process

$$M_t^\eta := e^{i\langle\eta, X_t\rangle + \frac{|\eta|^2}{2}t}$$
since

$$\operatorname{cov}(B_t^i, B_t^j) = E\left[B_t^i B_t^j\right] = E\left[\sum_{k=1}^d \int_0^t \alpha_s^{ik}\, dW_s^k \sum_{h=1}^d \int_0^t \alpha_s^{jh}\, dW_s^h\right] =$$
or alternatively by

$$dS_t^i = \mu_t^i S_t^i\, dt + \sum_{j=1}^d v_t^{ij} S_t^i\, dW_t^j, \qquad i = 1, \dots, N, \tag{12.4.3}$$

$$dt * dt = dt * dB_t^i = dB_t^i * dt = 0, \qquad dB_t^i * dB_t^j = \varrho_t^{ij}\, dt. \tag{12.4.4}$$
For example, let us assume the dynamics (12.4.2) with $N = 2$ and let B be a two-dimensional
Brownian motion defined as in Example 12.1.3, with correlation matrix

$$\begin{pmatrix} 1 & \varrho \\ \varrho & 1 \end{pmatrix}, \qquad \varrho \in [-1, 1].$$

Then we have

$$d\left(\frac{S_t^1}{S_t^2}\right) = \frac{dS_t^1}{S_t^2} - \frac{S_t^1}{(S_t^2)^2}\, dS_t^2 - \frac{1}{(S_t^2)^2}\, d\langle S^1, S^2\rangle_t + \frac{S_t^1}{(S_t^2)^3}\, d\langle S^2\rangle_t = \frac{S_t^1}{S_t^2}\left(\mu_t^1 - \mu_t^2 - \varrho_t \sigma_t^1\sigma_t^2 + (\sigma_t^2)^2\right) dt + \frac{S_t^1}{S_t^2}\left(\sigma_t^1\, dB_t^1 - \sigma_t^2\, dB_t^2\right).$$
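A correlated Brownian motion as in the example above can be built from independent increments via a Cholesky factor of the correlation matrix. The sketch below (my own construction, not from the text) generates increments of a two-dimensional B with $d\langle B^1, B^2\rangle_t = \varrho\,dt$ and checks the quadratic covariation numerically.

```python
import math
import random

def correlated_bm_increments(rho, n, dt, rng):
    """Increments of a 2-d Brownian motion B with d<B^1,B^2>_t = rho dt, obtained
    as B = alpha W for independent W, where alpha alpha* = [[1, rho], [rho, 1]]."""
    a21, a22 = rho, math.sqrt(1.0 - rho * rho)   # Cholesky factor of the correlation matrix
    out = []
    for _ in range(n):
        dw1 = rng.gauss(0.0, math.sqrt(dt))
        dw2 = rng.gauss(0.0, math.sqrt(dt))
        out.append((dw1, a21 * dw1 + a22 * dw2))
    return out

rng = random.Random(3)
rho, n = 0.7, 200000
inc = correlated_bm_increments(rho, n, 1.0 / n, rng)
qcov = sum(d1 * d2 for d1, d2 in inc)   # approximates <B^1, B^2>_1 = rho
print(round(qcov, 1))
```

The sum of products of increments approximates $\langle B^1, B^2\rangle_1 = \varrho$, in agreement with rule (12.4.4).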
We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while disregarding the more technical
or secondary matters. As usual, if you have any doubt about what the following
succinct statements mean, please review the corresponding section.
• Sections 12.1, 12.2, and 12.3: these sections contain the multidimensional
extension of the main concepts of stochastic integration. Since several technical
and non-substantial complications arise, the rules of thumb of Remark 12.3.7
come in handy when applying Itô’s formula.
• Section 12.4: a classical result by Lévy provides a characterization of a Brownian
motion in terms of the martingale property and the expression of the covariation
matrix. In certain applications, such as in finance (see Example 12.4.4), it’s
common to employ correlated Brownian motion and the associated Itô’s formula.
Symbols introduced in this chapter:
we obtain
1 The filtration obtained by completing the filtration generated by W so that it satisfies the usual
conditions.
Moreover, if $Q \sim P$:
(a) almost surely we have

$$M_t^\lambda = E^P\left[\frac{dQ}{dP} \;\Big|\; \mathcal{F}_t^W\right], \qquad t \in [0, T]; \tag{13.1.4}$$

$$dX_t = b_t\, dt + \sigma_t\, dW_t \tag{13.1.6}$$
We will prove Theorem 13.1.1 in Sect. 13.5.1, as a corollary of the two main
results of this chapter, Girsanov theorem and the Brownian martingale representa-
tion theorem.
In some applications, we are interested in replacing the drift $b_t$ of an Itô process of
the form (13.1.6) with a suitable drift $r_t \in L^1_{loc}$. Theorem 13.1.1 states that this is
possible by changing the probability measure, provided that there exists a process
$\lambda \in L^2_{loc}$ such that $r_t = b_t - \sigma_t\lambda_t$ and $M^\lambda$ in (13.1.1) is a martingale. In this section,
we present a specific application in the field of mathematical finance.
In the one-dimensional Black&Scholes model [19] of Example 12.4.4, the price
S of a risky asset has the following stochastic dynamics
where W is a real Brownian motion on .(Ω, F , P , Ft ) and .μ, σ are two real
parameters called expected return rate and volatility, respectively. We assume .σ > 0
in order not to cancel the random effect of the Brownian motion that describes the
$$\lambda = \frac{\mu - r}{\sigma} \in \mathbb{R}_+. \tag{13.1.9}$$
The choice of .λ is such that the dynamics of S becomes
thus formally analogous4 to (13.1.8) but with the expected return rate equal to the
risk-free rate. The measure Q does not intend to describe the real dynamics of the
stock: Q is called "risk-neutral measure" or also "martingale measure" because the
process $\tilde S_t := e^{-rt} S_t$ of the discounted asset price5 is a Q-martingale6 and, in
particular, we have

$$S_0 = e^{-rT} E^Q\left[S_T\right]. \tag{13.1.10}$$
$dS_t = \mu S_t\, dt$,
with deterministic solution $S_t = S_0 e^{\mu t}$: the latter is called a compound capitalization formula with
interest rate $\mu$.
3 The interest rate paid by the bank account which is assumed to be the risk-free investment of
reference.
4. $W_t^\lambda = W_t + \lambda t$ is a real Brownian motion under the measure Q.
5 The discount factor .e−rt eliminates the “time value” of prices.
6 As opposed to the real measure P under which, being .μ > r, the discounted price is a
sub-martingale: this describes the expectation of a higher return compared to a bank account,
considering the riskiness of the asset.
“payoff function” .ϕ is given and the random variable .ϕ(ST ) represents the value of
the derivative at time T . For consistency with formula (13.1.10), the (discounted)
expected value in the risk-neutral measure
e−rT E Q [ϕ(ST )]
. (13.1.11)
is called “risk-neutral price”, at the initial time, of the derivative with payoff .ϕ. The
expected value in (13.1.11) can be calculated explicitly using the fact that .ST has
log-normal distribution, returning the famous Black&Scholes formula.
The parameter .λ in (13.1.9) is called “market price of risk” because it is defined
as the ratio between the return differential .μ − r required to assume the risk of
investing in S and the volatility .σ that measures the riskiness of S.
Unlike P , the measure Q does not have a “statistical” purpose and does not
reflect the actual probabilities of events; rather, it is an artificial measure under
which all market prices (of the bank account, of the stock S and of the derivative
.ϕ(ST )) are deemed fair: the purposes of Q are mainly the valuation of derivatives
and the study of some fundamental properties of financial models, such as absence
of arbitrage and completeness. For a full treatment of these topics, we refer, for
example, to [111, 112] and [115].
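Formula (13.1.10) is easy to check by simulation: under Q, $S_T$ is log-normal with drift r, so the discounted Monte Carlo mean of $S_T$ should return the initial price. The sketch below (my own illustration; the function name is mine) performs this check.

```python
import math
import random

def mc_discounted_mean(S0, r, sigma, T, n_samples, rng):
    """Under Q, S_T = S0 * exp((r - sigma^2/2) T + sigma sqrt(T) Z) with Z ~ N(0,1).
    Returns e^{-rT} E^Q[S_T], which by (13.1.10) should equal S0."""
    acc = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        acc += S0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
    return math.exp(-r * T) * acc / n_samples

rng = random.Random(4)
price = mc_discounted_mean(100.0, 0.03, 0.2, 1.0, 200000, rng)
print(round(price))   # close to 100
```

Replacing $S_T$ by a payoff $\varphi(S_T)$ in the same loop gives the Monte Carlo counterpart of the risk-neutral price (13.1.11).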
In this section, we give some conditions on the process .λ that guarantee that the
exponential martingale (13.1.1) is a true martingale.
Proposition 13.2.1 Assume that

$$\int_0^T |\lambda_t|^2\, dt \le \kappa \quad \text{a.s.} \tag{13.2.1}$$

for a certain constant $\kappa$. Then the exponential martingale $M^\lambda$ in (13.1.1) is a true
martingale and

$$E\left[\sup_{0\le t\le T}\left(M_t^\lambda\right)^p\right] < \infty, \qquad p \ge 1.$$
where the Brownian motion W and $\lambda \in L^2_{loc}$ are both d-dimensional processes.7
Under condition (13.2.1), the Burkholder-Davis-Gundy inequality provides the
following summability estimate for Y: for every $p > 0$, we have

$$E\left[\bar Y_T^p\right] \le c\, E\left[\langle Y\rangle_T^{p/2}\right] \le c\, \kappa^{p/2}.$$
$$\tau := \inf\{t \ge 0 \mid Z_t \ge \varepsilon\} \wedge T.$$

Then $\tau$ is a bounded stopping time and by the optional sampling Theorem 8.1.6, we
have

$$E[Z_0] \ge E[Z_\tau] \ge E\left[Z_\tau \mathbf{1}_{(\bar Z_T \ge \varepsilon)}\right] \ge \varepsilon\, P(\bar Z_T \ge \varepsilon). \qquad \square$$
Proposition 13.2.4 (Exponential Integrability) Let Y be the stochastic integral
in (13.2.2) with $\lambda \in L^2$ satisfying the condition (13.2.1). Then we have

$$P\left(\bar Y_T \ge \epsilon\right) \le 2e^{-\frac{\epsilon^2}{2\kappa}}, \qquad \epsilon > 0, \tag{13.2.3}$$

$$Z_t^\alpha = e^{\alpha Y_t - \frac{\alpha^2}{2}\langle Y\rangle_t},$$

Hence

$$P\left(\sup_{0\le t\le T} Y_t \ge \epsilon\right) \le P\left(\sup_{0\le t\le T} Z_t^\alpha \ge e^{\alpha\epsilon - \frac{\alpha^2\kappa}{2}}\right) \le e^{-\alpha\epsilon + \frac{\alpha^2\kappa}{2}}$$

An analogous estimate holds for $-Y$ and this proves (13.2.3). Finally, (13.2.4) is an
immediate consequence of (13.2.3), Proposition 3.1.6 in [113] and Example 3.1.7
in [113]. □
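The Gaussian tail bound (13.2.3) can be observed numerically in the simplest case $\lambda \equiv 1$, where $Y = W$ and $\kappa = T$. The sketch below (my own illustration, not from the text) estimates $P(\bar Y_T \ge \epsilon)$ by Monte Carlo and compares it with the bound $2e^{-\epsilon^2/(2\kappa)}$.

```python
import math
import random

def sup_abs_bm(T, n, rng):
    """Running sup of |W_t| on [0, T] along an Euler grid."""
    dt = T / n
    w, m = 0.0, 0.0
    for _ in range(n):
        w += rng.gauss(0.0, math.sqrt(dt))
        m = max(m, abs(w))
    return m

rng = random.Random(5)
T, eps = 1.0, 2.0
n_paths = 20000
hits = sum(sup_abs_bm(T, 200, rng) >= eps for _ in range(n_paths))
p_emp = hits / n_paths
bound = 2 * math.exp(-eps ** 2 / (2 * T))   # (13.2.3) with lambda = 1, kappa = T
print(p_emp <= bound)
```

By the reflection principle the true probability is about 0.09 here, well below the bound of about 0.27, so the inequality holds with room to spare.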
Remark 13.2.5 Proposition 13.2.4 extends to the case where $\sigma$ is an $N \times d$-dimensional
process: in this case we have

$$P\left(\bar Y_T \ge \epsilon\right) \le 2N e^{-\frac{\epsilon^2}{2\kappa N}}, \qquad \epsilon > 0, \tag{13.2.5}$$
Remark 13.2.7 Condition (13.2.7) is sharp in the sense that, for every $0 < \alpha < \frac{1}{2}$,
there exists a process $\lambda \in L^2_{loc}$ that satisfies

$$E\left[\exp\left(\alpha \int_0^T |\lambda_s|^2\, ds\right)\right] < \infty$$

and is such that $M^\lambda$ in (13.1.1) is not a martingale: for details see Chapter 6 in [90].
$$\frac{dQ}{dP} = M_T^\lambda. \tag{13.3.2}$$

The proof of the following lemma is based on the Bayes formula of Theorem 4.2.14
in [113]: for every $X \in L^1(\Omega, Q)$ we have

$$E^Q[X \mid \mathcal{F}_t] = \frac{E^P\left[X M_T^\lambda \mid \mathcal{F}_t\right]}{E^P\left[M_T^\lambda \mid \mathcal{F}_t\right]}, \qquad t \in [0, T]. \tag{13.3.3}$$
and thus $X_t \in L^1(\Omega, Q)$ if and only if $X_t M_t^\lambda \in L^1(\Omega, P)$. Similarly, for $s \le t$ we
have

$$E^P\left[X_t M_T^\lambda \mid \mathcal{F}_s\right] = E^P\left[E^P\left[X_t M_T^\lambda \mid \mathcal{F}_t\right] \mid \mathcal{F}_s\right] = E^P\left[X_t M_t^\lambda \mid \mathcal{F}_s\right].$$

is a Q-martingale since $M^\lambda \left(M^\lambda\right)^{-1}$ is obviously a P-martingale. Moreover, for
every absolutely integrable random variable X, we have

$$E^P[X] = E^P\left[X \left(M_T^\lambda\right)^{-1} M_T^\lambda\right] = E^Q\left[X \left(M_T^\lambda\right)^{-1}\right]$$

and therefore

$$\frac{dP}{dQ} = \left(M_T^\lambda\right)^{-1}.$$
In particular, .P , Q are equivalent measures, in the sense that they have the same
certain and negligible events, since they have mutually strictly positive densities.
A Brownian motion is a martingale and therefore is a “driftless process”:
Girsanov theorem states that if a drift is added to a Brownian motion, this process
is still a Brownian motion with respect to a new probability measure. To understand
this result, which at first glance seems a bit strange, it is helpful to keep in mind
the elementary Example 1.4.8 at the end of which we observed that the martingale
property is not a property of the paths of the process, but rather depends on the
probability measure under consideration.
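The change-of-measure mechanism can be seen in a one-line Monte Carlo experiment (my own illustration, not from the text): with a constant $\lambda$, under P the process $W_t^\lambda = W_t + \lambda t$ has mean $\lambda T$ at time T, but reweighting by the density $M_T^\lambda = e^{-\lambda W_T - \frac{1}{2}\lambda^2 T}$ recovers mean zero, as Girsanov's theorem predicts for a Q-Brownian motion.

```python
import math
import random

def girsanov_check(lam, T, n_paths, rng):
    """Compare E^P[W^lam_T] with the reweighted mean E^P[M^lam_T * W^lam_T] = E^Q[W^lam_T]."""
    acc_p = acc_q = 0.0
    for _ in range(n_paths):
        wT = rng.gauss(0.0, math.sqrt(T))                 # W_T under P
        m = math.exp(-lam * wT - 0.5 * lam ** 2 * T)      # M^lam_T (constant lambda)
        acc_p += wT + lam * T                             # W^lam_T, plain P-average
        acc_q += m * (wT + lam * T)                       # W^lam_T, reweighted by the density
    return acc_p / n_paths, acc_q / n_paths

rng = random.Random(6)
mean_p, mean_q = girsanov_check(0.5, 1.0, 200000, rng)
print(round(mean_p, 1), round(mean_q, 1))  # about 0.5 and 0.0
```

The same trajectories thus have nonzero drift under P and zero drift under Q: the martingale property depends on the measure, not on the paths.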
By Lemma 13.2.1, the process $\left(X_{t\wedge\tau_n}^\eta M_{t\wedge\tau_n}^\lambda\right)$ is a P-martingale and

$$E^P\left[X_{t\wedge\tau_n}^\eta M_{t\wedge\tau_n}^\lambda \mid \mathcal{F}_s\right] = X_{s\wedge\tau_n}^\eta M_{s\wedge\tau_n}^\lambda, \qquad s \le t,\; n \in \mathbb{N}.$$

Hence, to prove that $X^\eta M^\lambda$ is a martingale, it is sufficient to show that $\left(X_{t\wedge\tau_n}^\eta M_{t\wedge\tau_n}^\lambda\right)$
converges to $\left(X_t^\eta M_t^\lambda\right)$ in $L^1$-norm as n tends to infinity. Since

$$\lim_{n\to\infty} X_{t\wedge\tau_n}^\eta = X_t^\eta \quad \text{a.s.}$$

and $0 \le X_{t\wedge\tau_n}^\eta \le e^{\frac{|\eta|^2 T}{2}}$, it is enough to prove that

$$\lim_{n\to\infty} M_{t\wedge\tau_n}^\lambda = M_t^\lambda \quad \text{in } L^1(\Omega, P).$$

Let

$$M_{n,t} = \min\{M_{t\wedge\tau_n}^\lambda,\, M_t^\lambda\};$$
Another reason for interest in exponential martingales is the fact that they are
a useful approximation tool. Hereafter, W is a Brownian motion on the space
.(Ω, F , P ) equipped with the standard Brownian filtration .F
W : the choice of
this particular filtration is crucial for the validity of the following results. The
next theorem is the main ingredient in the proof of the Brownian martingale
representation theorem that we will present in Sect. 13.5.
The proofs of this section are a bit technical and can be skipped at a first
reading.
Theorem 13.4.1 The space of linear combinations of random variables of the form

$$M_T^\lambda = \exp\left(-\int_0^T \lambda(t)\, dW_t - \frac{1}{2}\int_0^T \lambda(t)^2\, dt\right),$$

with $\lambda$ a deterministic function in $L^\infty([0, T])$, is dense in $L^2(\Omega, \mathcal{F}_T^W)$.

$$\varphi(W_{t_1}, \dots, W_{t_n}), \qquad \varphi \in C_0^\infty(\mathbb{R}^n),\; n \in \mathbb{N},$$

$$\mathcal{G}_n := \sigma(W_{t_1}, \dots, W_{t_n}), \qquad n \in \mathbb{N},$$
$$X_n = \varphi_n(W_{t_1}, \dots, W_{t_n})$$

for some measurable function $\varphi_n$ that is square-integrable with respect to the law
$\mu_{W_{t_1},\dots,W_{t_n}}$: by density, $\varphi_n$ can be approximated in $L^2$ by a sequence $(\varphi_{n,k})_{k\in\mathbb{N}}$
Then, by Theorem 8.2.2 on the convergence of discrete martingales, the
a.s. pointwise limit

$$M := \lim_{n\to\infty} X_n$$

exists. Moreover, $\lim_{n\to\infty} X_n = M$ in $L^2(\Omega, P)$
due to (13.4.3). Since the elements of $\mathcal{F}_T^W$ and $\mathcal{G}_T^W$ differ only by negligible events,
it follows that $M = E\left[X \mid \mathcal{F}_T^W\right]$. □
Proof of Theorem 13.4.1 It is sufficient to prove that if $X \in L^2(\Omega, \mathcal{F}_T^W)$ and, for
every $\lambda \in L^\infty([0, T])$,

$$\langle X, M_T^\lambda\rangle_{L^2(\Omega)} = E\left[X M_T^\lambda\right] = 0 \tag{13.4.4}$$
Proof We restrict our attention to the one-dimensional case for simplicity. As for
uniqueness, if $u, v \in L^2$ satisfy (13.5.2), then

$$\int_0^T (u_t - v_t)\, dW_t = 0$$

and thus $(u_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^2$. The thesis follows by taking the limit
in (13.5.4). □
Proof of Theorem 13.5.1 The uniqueness of u follows from the uniqueness of the
representation of an Itô process (cf. Remark 10.4.3).
As for the existence, let us first consider the case where X is a martingale such
that $X_T \in L^2(\Omega, P)$. By Theorem 13.5.3, there exists $u \in L^2$ such that

$$X_T = E[X_T] + \int_0^T u_t\, dW_t,$$

$$E\left[|Y_n - X_T|\right] \le \frac{1}{2^n}, \qquad n \in \mathbb{N}.$$

By the previous point, the sequence of martingales

$$X_{n,t} := E\left[Y_n \mid \mathcal{F}_t^W\right], \qquad t \in [0, T],$$
modification. Since
By (13.5.5) and Proposition 10.2.26, we can take the limit in (13.5.6) to conclude
the proof. ⨆
⨅
Note that $\lambda_t := -\frac{u_t}{M_t}$ belongs to $L^2_{loc}$ since M is an adapted, continuous, and strictly
positive process. Consequently, we have

$$M_t = 1 - \int_0^t M_s \lambda_s\, dW_s, \qquad t \in [0, T],$$
that is, M solves a linear stochastic differential equation of which the exponential
martingale .M λ in (13.1.1) is the unique8 solution. Hence .M = M λ in the sense of
indistinguishability.
By construction, M is a martingale and therefore, by Girsanov Theorem 13.3.3,
$W^\lambda$ in (13.1.5) is a Brownian motion on $(\Omega, \mathcal{F}, Q, \mathcal{F}_t^W)$. Finally, we have

$$dX_t = b_t\, dt + \sigma_t\, dW_t =$$

(by (13.1.5))

$$= b_t\, dt + \sigma_t\left(dW_t^\lambda - \lambda_t\, dt\right)$$
We summarize the main findings and key concepts of the chapter, omitting technical
details. As usual, if you have any doubt about what the following succinct statements
mean, please review the corresponding section.
. Section 13.1: the exponential martingale .M λ in (13.1.1), with .λ ∈ L2loc , is the
main tool used throughout the chapter. If .M λ is a true martingale, then it can
be used as a density (or Radon-Nikodym derivative) to define a measure Q
equivalent to the initially considered measure P . The process .W λ in (13.1.5),
obtained by adding a drift .λ to a Brownian motion, is a Brownian motion under
the new measure Q. The idea is that there is a correspondence between changes in
drift of a Brownian motion (and related Itô processes) and changes in probability
measure: the drift coefficient .λ acts as the exponent of the martingale .M λ , which
is the Radon-Nikodym derivative of the change of measure.
. Section 13.1.1: the results on changes in drift and measure (often referred to
as “Girsanov’s change of measure” in financial jargon) are pivotal in modern
financial derivatives valuation theory. It is worth noting that a Girsanov’s change
of measure alters the drift term of an Itô process while leaving the diffusion
coefficient unchanged.
. Section 13.2: we provide sufficient conditions on the process .λ for .M λ to be a
true martingale. Novikov’s condition is a classic condition that is often used in
probability theory and mathematical finance.
. Section 13.3: the proof of Girsanov theorem is a relatively direct consequence
of Proposition 4.4.2 that characterizes Brownian motion in terms of exponential
martingales.
8 The fact that .M λ is a solution is a simple check with Itô’s formula. For uniqueness, it is not
difficult to adapt the proof of Theorem 17.1.1 that we will prove later.
. Sections 13.4 and 13.5: the proof of the Brownian martingale representation
theorem is quite challenging and is based on a density result of exponential mar-
tingales in the space .L2 (Ω, FTW ) where .F W indicates the standard Brownian
filtration (which satisfies the usual conditions). A significant corollary is the fact
that every local Brownian martingale admits a continuous modification.
Main notations used or introduced in this chapter:
Starting from this chapter, we begin the study of Stochastic Differential Equations,
hereafter abbreviated as SDEs. As anticipated in Sect. 2.6, such equations were
originally introduced for the construction of continuous Markov processes or
diffusions. Over time, SDEs have become increasingly important in stochastic
modeling across a wide range of fields. SDEs generalize deterministic differential
equations by incorporating a random perturbation factor, which allows them to
model systems that are subject to uncertainty. In addition, SDEs can be used to
construct explicit examples of continuous semimartingales.
In this chapter, we introduce the notion of solution to an SDE and the related
problems of existence and uniqueness. These problems have a dual formulation,
in a weak and strong sense. We give a very particular existence and uniqueness
result from which some peculiarities of SDEs compared to the usual deterministic
equations can be deduced, including the so-called “regularization by noise” effect.
We see that it is possible to transfer the study of an SDE to a canonical setting
and analyze the relationship between weak and strong solvability. Finally, we prove
some preliminary estimates of continuous dependence and integrability of solutions.
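The chapter's object of study, an SDE $dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$, is most easily grasped through its simplest numerical approximation. The following Euler–Maruyama sketch is my own illustration (the scheme itself is standard but is not introduced in this chapter; the helper name is mine).

```python
import math
import random

def euler_maruyama(b, sigma, x0, t0, T, n, rng):
    """Minimal Euler-Maruyama sketch for dX_t = b(t, X_t) dt + sigma(t, X_t) dW_t.
    Returns the approximate path on a uniform grid of [t0, T]."""
    dt = (T - t0) / n
    x, path = x0, [x0]
    for k in range(n):
        t = t0 + k * dt
        dW = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over one step
        x = x + b(t, x) * dt + sigma(t, x) * dW
        path.append(x)
    return path

# Ornstein-Uhlenbeck-type example: dX = -X dt + 0.5 dW, X_0 = 1
rng = random.Random(7)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.5, 1.0, 0.0, 1.0, 1000, rng)
print(len(path))  # 1001
```

The deterministic part of the scheme is the usual explicit Euler method; the random perturbation $\sigma\,\Delta W$ is what distinguishes an SDE from an ordinary differential equation.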
are measurable functions:1 b is called the drift coefficient and $\sigma$ the diffusion
coefficient of the SDE. In (14.1.2), $\mathbb{R}^{N\times d}$ indicates the space of matrices of
dimension $N \times d$. To simplify the presentation, we will always assume the following
Assumption 14.1.1 The functions $b, \sigma$ are measurable and locally bounded in x
uniformly in t (in short, we write $b, \sigma \in L^\infty_{loc}(]t_0, T[\times\mathbb{R}^N)$): precisely, for each

1 ... the time variable. This type of equation intervenes, for example, in the study of optimal control
problems and stochastic filtering. We will restrict our attention to deterministic coefficients. We
refer, for example, to [77] and [66] for a general treatment.
2 On the probability space .(Ω, F , P , (Ft )t∈[t0 ,T ] ), we say that .W = (Wt )t∈[t0 ,T ] is a Brownian
motion starting at time .t0 if:
(i) .Wt0 = 0 a.s.;
(ii) W is a.s. continuous;
(iii) W is adapted to .(Ft )t∈[t0 ,T ] ;
(iv) .Wt − Ws is independent of .Fs for every .t0 ≤ s ≤ t ≤ T ;
(v) .Wt − Ws ∼ N0,(t−s)I for every .t0 ≤ s ≤ t ≤ T , where I denotes the .d × d identity matrix.
For example, let .B = (Bt )t≥0 be a standard Brownian motion on .(Ω, F , P , (Ft )t≥0 ); then .Wt :=
Bt − Bt0 is a Brownian motion starting at time .t0 on .(Ω, F , P ) with respect to the filtration
.(Ft )t≥t0 or even with respect to the standard filtration defined by
Note that there is a strict inclusion .FtW ⊂ FtB in the case .t0 > 0. Moreover, since the stochastic
integral depends only on the Brownian increments (cf. Corollary 10.2.27), we have a.s.

$$\int_{t_0}^t u_s\, dB_s = \int_{t_0}^t u_s\, dW_s, \qquad t \ge t_0.$$
such that (14.1.4) holds for every $t \in [t_0, T]$ almost surely. We explicitly note that, under the local
boundedness Assumption 14.1.1, we have

$$\int_{t_0}^T |b(t, X_t)|\, dt + \int_{t_0}^T |\sigma(t, X_t)|^2\, dt < \infty \quad \text{a.s.} \tag{14.1.3}$$

To indicate that X is a solution of the SDE with coefficients $b, \sigma$ on the set-up
$(W, \mathcal{F}_t)$ we write

$$X \in \mathbf{SDE}(b, \sigma, W, \mathcal{F}_t).$$
4 By Theorem 6.2.22 and the independence of Z from .F W (cf. Remark 14.1.4), W is a Brownian
motion also with respect to .F Z,W .
5 The smallest filtration verifying the usual conditions.
We recall from Example 11.1.9 that if also the initial datum is deterministic, then
Xt is a Gaussian process.
.
• in the weak sense, if for every distribution .μ0 on .RN there exist a set-up .(W, Ft )
and a solution .X ∈ SDE(b, σ, W, Ft ) such that .Xt0 ∼ μ0 ;
• in the strong sense, if for every set-up .(W, Ft ) and .Z ∈ mFt0 there exists a
strong solution .X ∈ SDE(b, σ, W, FtZ,W ) such that .Xt0 = Z a.s.
Although it may seem counter-intuitive, it is possible for a process to satisfy an
equation of the type
X_t = x + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s
To prove that the SDE (14.1.5) is solvable in the weak sense, consider a Brownian
motion X defined on the space .(Ω, F , P , F X ). The process
W_t := ∫_0^t σ(X_s) dX_s   (14.1.6)

which means that X is a solution of the SDE (14.1.5) with respect to W, i.e., X ∈ SDE(0, σ, W, F^X), with null initial datum. The crucial point is that it can
In the definition of strong uniqueness, the two processes X and Y are defined
on the same probability space .(Ω, F , P ) and are solutions of the SDE on the
setups .(W, Ft ) and .(W, Gt ), respectively: here W is a Brownian motion with
respect to both filtrations .(Ft ) and .(Gt ) which can be different. Strong uniqueness
is also known as “pathwise uniqueness”. In the definition of uniqueness in law, the
processes X and Y can be solutions on different set-ups .(W, Ft ) and .(B, Gt ), even
defined on different probability spaces.
Example 14.1.12 ([!]) For the SDE in Example 14.1.10, there is weak but not
strong uniqueness. In fact, every solution X of the SDE (14.1.5) is a local martingale
with .〈X〉t = t and therefore, by Lévy’s characterization Theorem 12.4.1, X is a
Brownian motion: hence there is uniqueness in law.
On the other hand, if X is the weak solution constructed in Example 14.1.10, we
can verify that −X is also a solution of the SDE and therefore there is no strong uniqueness.
6 Here the Meyer-Tanaka formula is used: see, for example, Section 5.3.2 in [112] or Section 2.11
in [37].
14.2 Weak Existence and Uniqueness via Girsanov Theorem 269
Here we used the fact that .P (Xs = 0) = 0 for every .s ≥ 0 since X is a Brownian
motion.
Remark 14.1.13 ([!]) Theorem 14.3.6, by Yamada and Watanabe, states that if an
SDE is solvable in the strong sense then it is also solvable in the weak sense.
Furthermore, strong uniqueness implies uniqueness in law: while this result may
seem intuitive, its proof is not straightforward; indeed, strong uniqueness pertains
to solutions defined on the same space, whereas proving weak uniqueness requires
dealing with solutions that may be defined on different spaces. Finally, we also have
that if for an SDE there is strong uniqueness then every solution is a strong solution.
Remark 14.1.14 Recently, a further notion of uniqueness for SDEs, called “path-
by-path uniqueness”, has also been studied: see in this regard [31, 48] and [130].
There are many ways to prove weak existence and uniqueness for an SDE. In this
section, we examine a very particular technique that exploits the results on changes
of measure of Chap. 13. The following remarkable Theorem 14.2.3 is an example
of the so-called “regularizing effect of Brownian motion”, whereby weak existence
and uniqueness for an SDE are obtained under minimal regularity assumptions on
the drift coefficient, which is here assumed to be only measurable and bounded.
Under such assumptions, the corresponding ordinary differential equation (without
the Brownian part) does not generally have a unique solution as shown by the well-
known
Example 14.2.1 (Peano's Brush) The SDE (14.1.1) with b(t, x) = |x|^α, σ = 0 and null initial datum reduces to the Volterra integral equation

X_t = ∫_0^t |X_s|^α ds.   (14.2.1)
270 14 Stochastic Differential Equations
Equation (14.2.1) has the null function as its unique solution if α ≥ 1, while if α ∈ ]0, 1[ there are infinitely many solutions of the form

X_t = 0  if 0 ≤ t ≤ s,   X_t = ((t − s)/β)^β  if s ≤ t ≤ T,

where β = 1/(1 − α) and s ∈ [0, T].
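This non-uniqueness is easy to check numerically. The following sketch (not from the text; function names and grid parameters are illustrative) verifies, for α = 1/2, that several members of the family — one for each switching time s — satisfy the Volterra equation (14.2.1) up to discretization error:

```python
def peano_solution(t, s, alpha=0.5):
    """One member of the family of solutions of X' = |X|^alpha, X(0) = 0:
    zero up to time s, then ((t - s)/beta)^beta with beta = 1/(1 - alpha)."""
    beta = 1.0 / (1.0 - alpha)
    return 0.0 if t <= s else ((t - s) / beta) ** beta

def volterra_residual(s, alpha=0.5, T=2.0, n=100_000):
    """Max |X_t - int_0^t |X_u|^alpha du| over a grid (left Riemann sums)."""
    dt = T / n
    integral, worst = 0.0, 0.0
    for k in range(n + 1):
        t = k * dt
        x = peano_solution(t, s, alpha)
        worst = max(worst, abs(x - integral))
        integral += abs(x) ** alpha * dt
    return worst

# Different switching times s give genuinely different solutions,
# yet each satisfies the integral equation up to O(dt) error.
residuals = [volterra_residual(s) for s in (0.0, 0.5, 1.0)]
```

Each residual is of the order of the grid step, while the three trajectories differ substantially on [0, T].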
A similar phenomenon also occurs in the stochastic case.
Example 14.2.2 (Itô and Watanabe [64] [!]) The SDE

dX_t = 3X_t^{1/3} dt + 3X_t^{2/3} dW_t,   X_0 = 0,
b : ]0, T[ × R^d → R^d

M_t := exp( ∫_0^t b(s, X_s) dX_s − ½ ∫_0^t |b(s, X_s)|² ds ),   t ∈ [0, T],   (14.2.3)
are Brownian motions respectively on the spaces (Ω_i, F^{(i)}, Q_i, F_t^{(i)}) where dQ_i/dP_i = M_T^{(i)}. Therefore, the law of X^{(1)} under Q_1 is equal to the law of X^{(2)} under Q_2: from (14.2.5), (14.2.6), and Corollary 10.2.28, it follows that the law of (X^{(1)}, W^{(1)}, M^{(1)}) under Q_1 is equal to the law of (X^{(2)}, W^{(2)}, M^{(2)}) under Q_2. Finally, for every 0 ≤ t_1 < ··· < t_n ≤ T and H ∈ B_{2nd} we have

P_1((X_{t_1}^{(1)}, W_{t_1}^{(1)}, …, X_{t_n}^{(1)}, W_{t_n}^{(1)}) ∈ H)
  = ∫_{Ω_1} 1_H(X_{t_1}^{(1)}, W_{t_1}^{(1)}, …, X_{t_n}^{(1)}, W_{t_n}^{(1)}) dQ_1 / M_T^{(1)}
  = ∫_{Ω_2} 1_H(X_{t_1}^{(2)}, W_{t_1}^{(2)}, …, X_{t_n}^{(2)}, W_{t_n}^{(2)}) dQ_2 / M_T^{(2)}
  = P_2((X_{t_1}^{(2)}, W_{t_1}^{(2)}, …, X_{t_n}^{(2)}, W_{t_n}^{(2)}) ∈ H)
Remark 14.2.4 Theorem 14.2.3 can be extended in various directions. Using the
Novikov condition (Theorem 13.2.6) to prove that the process in (14.2.3) is a
martingale, one proves the existence of a weak solution of the SDE (14.2.2) under
the more general assumption of linear growth in x (in addition to measurability) of
the coefficient b: for more details see, for example, Proposition 5.3.6 in [67].
In Sect. 18.4 we will prove a “strong version” of Theorem 14.2.3, under the more
restrictive assumption that .b = b(t, x) is a bounded and Hölder continuous function
in the variable x, uniformly in t.
We examine the relationship between strong and weak solvability. For simplicity,
we assume .t0 = 0 and, given .N, d ∈ N and .T > 0, we consider an SDE with
coefficients
Furthermore, we let .μ0 be a distribution on .RN that we will use as the initial
condition.
Since the results of this section are rather technical, on a first reading it is recommended to read the statements and skip the proofs.
Definition 14.3.1 (Weak Solution of an SDE) The SDE with coefficients .b, σ
and initial law .μ0 is solvable in the weak sense if there exist a set-up .(W, Ft ) and a
solution .X ∈ SDE(b, σ, W, Ft ) such that .X0 ∼ μ0 . In this case, almost surely
X_t = X_0 + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s,   t ∈ [0, T],   (14.3.1)
and we say that the pair .(X, W ) is a weak solution of the SDE with coefficients .b, σ
and initial law .μ0 .
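Although the chapter works at the level of existence and uniqueness, the integral form (14.3.1) also suggests the simplest discretization, the Euler–Maruyama scheme X_{k+1} = X_k + b(t_k, X_k)Δ + σ(t_k, X_k)√Δ ξ_k with i.i.d. standard normal ξ_k. A minimal sketch (not from the text; the coefficient choices below are purely illustrative):

```python
import math
import random

def euler_maruyama(b, sigma, x0, T=1.0, n=1000, rng=random.Random(0)):
    """Euler-Maruyama discretization of dX = b(t,X) dt + sigma(t,X) dW:
    X_{k+1} = X_k + b(t_k, X_k)*dt + sigma(t_k, X_k)*sqrt(dt)*Z_k."""
    dt = T / n
    x, t, path = x0, 0.0, [x0]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = x + b(t, x) * dt + sigma(t, x) * math.sqrt(dt) * z
        t += dt
        path.append(x)
    return path

# Illustrative choice: Ornstein-Uhlenbeck-type coefficients.
path = euler_maruyama(b=lambda t, x: -x, sigma=lambda t, x: 0.5, x0=1.0)
```

For the coefficients above, E[X_1] = e^{−1}·X_0, which the scheme reproduces up to discretization and Monte Carlo error.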
Remark 14.3.2 ([!]) To prove that an SDE is solvable in the weak sense, it is
necessary to construct not only the process X but also the set-up .(W, Ft ) on which
the SDE is written: for this reason, the weak solution is typically referred to as the
pair .(X, W ), not just the process X.
We now see that it is always possible to transfer the problem of weak solvability
of an SDE to a “canonical setting”.
14.3 Weak vs Strong Solutions: The Yamada-Watanabe Theorem 273
Ω_n = C([0, T]; R^n)

X_t(w) := w(t),   w ∈ Ω_n, t ∈ [0, T],

μ_{X,W}(H) = P((X, W) ∈ H),   H ∈ G_T^{N+d}.
Hereafter, we will repeatedly use the fact that .ΩN +d is a Polish space on which,
thanks to Theorem 4.3.2 in [113], it is possible to define a regular version of
the conditional probability. The following lemma is a crucial ingredient in all
subsequent analysis.
Lemma 14.3.5 (Transfer of Solutions [!]) If .(X, W ) is a weak solution of the SDE
with coefficients .b, σ and initial law .μ0 on the space .(Ω, F , P ), then the canonical
process .(X, W) defined by
X_t(x, w) := x(t),   W_t(x, w) := w(t),   (x, w) ∈ Ω_{N+d}, t ∈ [0, T],

is a weak solution of the SDE with coefficients b, σ and initial law μ_0 on the space (Ω_{N+d}, G_T^{N+d}, μ_{X,W}).
(Ω, F, P) —(X,W)→ (Ω_{N+d}, G_T^{N+d}, μ_{X,W}) —(X,W)→ (Ω_{N+d}, G_T^{N+d})

and by construction the canonical pair on Ω_{N+d} has the same law as the original pair (X, W). The fact that the canonical W is a Brownian motion is a consequence⁸ of this equality in law. Suppose for the moment
7 We saw in Proposition 3.2.1 that, in the space of continuous trajectories, the .σ -algebra generated
by cylinders (or, equivalently, by the identity process) coincides with the Borel .σ -algebra.
8 In particular, it is sufficient to show the independence of the increments using the characteristic
that the initial law is μ_0 = δ_{x_0} for some x_0 ∈ R^N and therefore X_0 = x_0 almost surely. Letting

J_t := ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s,

both on the original set-up and on the canonical one, we have that the corresponding triples (X, W, J) are equal in law by Corollary 10.2.28. Therefore, on the canonical space, X − x_0 − J is indistinguishable from the null process, and this proves the thesis.
The case where the initial datum X_0 is random can be handled by conditioning on X_0. Precisely, to lighten the notation, let P := μ_{X,W}: by Theorem 4.3.2 in [113], there exists a regular version

P(· | X_0) = ( P_{x,w}(· | X_0) )_{(x,w)∈Ω_{N+d}}
Ω_{N+d}: under the measure P_{x,w}(· | X_0), the canonical process (X, W) is a solution of the SDE with coefficients b, σ and initial datum x(0). To conclude, it is sufficient to observe that, for

Z := sup_{t∈[0,T]} | X_t − X_0 − ∫_0^t b(s, X_s) ds − ∫_0^t σ(s, X_s) dW_s |
9 Further reference sources are Theorem 21.14 and Lemma 21.17 in [66] and Section V.17 in [124].
where μ_W is the law of the d-dimensional Brownian motion, and with the filtration (G_t)_{t∈[0,T]} generated by the identity process

(Z, W) : R^N × Ω_d

(X^i, W) =ᵈ (X^i, W^i),   i = 1, 2,   (14.3.4)

the pair on the left being the canonical version, and consequently

(X^1, W^1) =ᵈ (X^1, W) =ᵈ (X^2, W) =ᵈ (X^2, W^2).
(iii) Again, we consider only the case of a deterministic initial datum. Let X ∈ SDE(b, σ, W, F_t) be a solution with initial datum X_0 = x ∈ R^N a.s. We apply the construction of point (ii) with X^1 = X^2 = X, that is, we construct on the space Ω_N × Ω_N × Ω_d the measure P as in (14.3.3) and the canonical process (X^1, X^2, W), where X^1, X^2 are equal in law to X and are solutions of the SDE with respect to the Brownian motion W.
We consider the conditional probability P(· | W) = (P_w(· | W))_{w∈Ω_d} and the related conditional laws, noting that μ_{X^i|W} = μ_{X|W} by (14.3.4). We have¹² that the random variables X^1 and X^2 are simultaneously equal a.s. and independent in P_w(· | W) for almost every w ∈ Ω_d, and therefore¹³ X^1 and X^2 have a Dirac delta distribution under P_w(· | W). In other terms, for almost every w ∈ Ω_d we have μ_{X|W}(H; w) = μ_{X^i|W}(H; w) = δ_{F(w)}(H) for some measurable map F from Ω_d
and since P(X^1 = X^2 | W) ≤ 1, we also have P_w(X^1 = X^2 | W) = 1 for almost every w ∈ Ω_d. Moreover, from definition (14.3.3) of P, it is not difficult to verify that the joint conditional law of X^1, X^2 is the product of the marginals

μ_{X^1,X^2|W}(H × K) = P( (X^1, X^2) ∈ H × K | W ) = μ_{X|W}(H) μ_{X|W}(K)
a.s. and independent, then .X ∼ δx0 for some .x0 ∈ R. Prove that an analogous result holds for .X, Y
with values in the space .Ωn .
14.4 Standard Assumptions and Preliminary Estimates 277
14 In fact, in [67] more is proved (see also Remark 2 on page 310 in [123]): highlighting the
dependence on the initial datum .x ∈ RN , the function .F = F (x, w) is jointly measurable and, for
.Z ∈ mF0 , .X = F (Z, W ) is a strong solution of the SDE with random initial datum .X0 = Z.
where .μ, σ are real parameters. In this case, .b(t, x) = μx and .σ (t, x) = σ x, so the
standard assumptions are obviously satisfied. As in Example 11.1.5-(iii), a direct
application of Itô’s formula shows that
X_t = X_0 exp( (μ − σ²/2) t + σ W_t ).
E[ sup_{t_0≤t≤t_1} | ∫_{t_0}^t b(s, X_s) ds + ∫_{t_0}^t σ(s, X_s) dW_s |^p ]
  ≤ c̄_1 (t_1 − t_0)^{(p−2)/2} ∫_{t_0}^{t_1} ( 1 + E[ sup_{t_0≤r≤s} |X_r|^p ] ) ds   (14.4.4)

(by (14.4.1))

≤ (t_1 − t_0)^{p−1} c_1^p ∫_{t_0}^{t_1} E[ (1 + |X_s|)^p ] ds ≤

(by (14.4.6))

≤ 2^{p−1} (t_1 − t_0)^{p−1} c_1^p ∫_{t_0}^{t_1} ( 1 + E[ |X_s|^p ] ) ds
≤ 2^{p−1} (t_1 − t_0)^{p−1} c_1^p ∫_{t_0}^{t_1} ( 1 + E[ sup_{t_0≤r≤s} |X_r|^p ] ) ds.

(by (14.4.2))

≤ (t_1 − t_0)^{p−1} c_2^p ∫_{t_0}^{t_1} E[ |X_s − Y_s|^p ] ds
≤ (t_1 − t_0)^{p−1} c_2^p ∫_{t_0}^{t_1} E[ sup_{t_0≤r≤s} |X_r − Y_r|^p ] ds.
with b, σ satisfying the linear growth assumption (14.4.1). Then for every T > 0 and p ≥ 2 there exists a positive constant c = c(T, p, d, N, c_1) such that

E[ sup_{0≤t≤T} |X_t|^p ] ≤ c ( 1 + E[|X_0|^p] ).   (14.5.1)
Proof It is not restrictive to assume E[|X_0|^p] < ∞, otherwise the thesis is obvious. The general idea of the proof is simple: from estimate (14.4.4) we have

v(t) := E[ X̄_t^p ] ≤ 2^{p−1} ( E[|X_0|^p] + c̄_1 ∫_0^t ( 1 + E[X̄_s^p] ) ds ),   t ∈ [0, T],

or equivalently

v(t) ≤ c ( 1 + E[|X_0|^p] + ∫_0^t v(s) ds ),   t ∈ [0, T],

and therefore the thesis would follow directly from Grönwall's lemma.
As a matter of fact, to apply Grönwall’s lemma, it is necessary to know a priori15
that .v ∈ L1 ([0, T ]). For this reason, it is necessary to proceed more carefully using
15 Based on what has been proven so far, we do not even know if v is a continuous function.
with the convention min ∅ = T. Since X is a.s. continuous, (τ_n) is an increasing sequence of stopping times such that τ_n ↗ T a.s. With b_n, σ_n as in (17.1.2), we have

X_{t∧τ_n} = X_0 + ∫_0^{t∧τ_n} b(s, X_s) ds + ∫_0^{t∧τ_n} σ(s, X_s) dW_s
         = X_0 + ∫_0^t b_n(s, X_{s∧τ_n}) ds + ∫_0^t σ_n(s, X_{s∧τ_n}) dW_s.
The coefficients .bn = bn (t, x) and .σn = σn (t, x), although stochastic, satisfy the
linear growth condition (14.4.1) with the same constant .c1 : the proof of estimate
(14.4.4) can be repeated in a substantially identical way to the case of deterministic
.b, σ , to obtain
v_n(t_1) := E[ sup_{0≤t≤t_1} |X_{t∧τ_n}|^p ]
  ≤ 2^{p−1} ( E[|X_0|^p] + c̄_1 ∫_0^{t_1} ( 1 + E[ sup_{0≤r≤s} |X_{r∧τ_n}|^p ] ) ds ),   t_1 ∈ [0, T],

where the inner expectation equals v_n(s); or equivalently

v_n(t_1) ≤ c ( 1 + E[|X_0|^p] + ∫_0^{t_1} v_n(s) ds ),   t_1 ∈ [0, T],

and taking the limit as n goes to infinity, we get (14.5.1) by Beppo Levi's theorem. □
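The Grönwall step used above — an integral inequality v(t) ≤ a + c ∫_0^t v(s) ds forcing the bound v(t) ≤ a e^{ct} — can be illustrated numerically. The sketch below (illustrative, not from the text) iterates the map v ↦ a + c ∫_0^t v, whose iterates converge to the extremal solution a e^{ct} while never exceeding the Grönwall bound:

```python
import math

def gronwall_bound(a, c, t):
    """Gronwall: v(t) <= a + c * int_0^t v(s) ds  implies  v(t) <= a*exp(c t)."""
    return a * math.exp(c * t)

def picard_iterates(a, c, T=1.0, n=1000, iters=40):
    """Iterate v -> a + c * int_0^t v (left Riemann sums), starting from the
    constant a; each iterate is a partial exponential series a*sum (ct)^m/m!."""
    dt = T / n
    v = [a] * (n + 1)
    for _ in range(iters):
        integral, new = 0.0, []
        for k in range(n + 1):
            new.append(a + c * integral)
            integral += v[k] * dt
        v = new
    return v

v = picard_iterates(a=1.0, c=2.0)
```

After a few dozen iterations the terminal value is close to a e^{cT} but, as Grönwall's lemma predicts, still below it.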
If the diffusive coefficient .σ is bounded, a stronger integrability estimate than
that of Theorem 14.5.2 holds.
14.5 Some A Priori Estimates 283
with b satisfying the linear growth assumption (14.4.1) and σ bounded by a constant κ, i.e., |σ(t, x)| ≤ κ for (t, x) ∈ [0, T] × R^N. Then there exist two positive constants α and c, depending only on T, κ, c_1 and N, such that

E[ e^{α X̄_T²} ] ≤ c E[ e^{c|X_0|²} ],   X̄_T := sup_{0≤t≤T} |X_t|.

Proof Let

M̄_T = sup_{0≤t≤T} | ∫_0^t σ(s, X_s) dW_s |.
Consequently

{ X̄_T ≥ (|X_0| + c_1T + δ) e^{c_1T} } ⊆ { M̄_T ≥ δ }

and by Proposition 13.2.4 (and estimate (13.2.5)) there exists a positive constant c, depending only on N, κ and T, such that¹⁶

P( X̄_T ≥ (|X_0| + c_1T + δ) e^{c_1T} | X_0 ) ≤ c e^{−δ²/c}.   (14.5.2)

δ = λ e^{−c_1T} − |X_0| − c_1T ≥ (λ/2) e^{−c_1T}   if λ ≥ ā|X_0| + b̄   (14.5.3)
16 Provided that we switch to the canonical setting by means of Lemma 14.3.5 (this is not restrictive
since the thesis depends only on the law of X), a regular version of the conditional probability exists
and estimate (14.5.2) holds pointwise as a consequence of Proposition 13.2.4.
with ā := 2e^{c_1T} and b̄ := 2c_1T e^{c_1T}. So, combining (14.5.2) and (14.5.3), we have

P( X̄_T ≥ λ | X_0 ) ≤ c e^{−c̄λ²},   λ ≥ ā|X_0| + b̄,   (14.5.4)

with c, c̄ positive constants depending only on T, κ, c_1 and N. Now we apply Proposition 3.1.6 in [113] with f(λ) = e^{αλ²}, where the constant α > 0 will be determined later: we have

E[ e^{α X̄_T²} | X_0 ] = 1 + 2α ∫_0^∞ λ e^{αλ²} P( X̄_T ≥ λ | X_0 ) dλ ≤

(by (14.5.4))

≤ 1 + 2α ∫_0^{ā|X_0|+b̄} λ e^{αλ²} dλ + 2αc ∫_{ā|X_0|+b̄}^{+∞} λ e^{λ²(α−c̄)} dλ.
We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while disregarding the more technical
or secondary matters. As usual, if you have any doubt about what the following
succinct statements mean, please review the corresponding section.
• Section 14.1: we introduce the concepts of solution of an SDE on a set-up
.(W, Ft ) and of solvability of an SDE in the strong sense (i.e., with solutions
adapted to the filtration generated by the initial datum and by W ) and in the
weak sense: in the latter case, not having the set-up fixed a priori, a solution is
constituted by the pair .(X, W ).
• Section 14.2: thanks to the regularizing effect of the Brownian motion and in
contrast with what happens in the deterministic case, we can have existence and
uniqueness of the solution of an SDE with a strongly irregular drift coefficient,
even only measurable and bounded.
• Section 14.3: the solution transfer technique allows us to set the problem of
solvability of an SDE in the canonical space of continuous trajectories: this
is particularly useful for the study of weak solutions. The Yamada-Watanabe
Theorem clarifies the relationship between the concepts of solvability in a weak
and strong sense:
(i) if an SDE is solvable in the strong sense then it is also solvable in the weak
sense;
14.6 Key Ideas to Remember 285
(ii) if for an SDE there is uniqueness in the strong sense then there is also
uniqueness in the weak sense;
(iii) if for an SDE there is solvability in the weak sense and uniqueness in the
strong sense then there is solvability in the strong sense.
• Sections 14.4 and 14.5: under the “standard assumptions” of linear growth and
Lipschitz continuity of the coefficients, we prove some integrability estimates
that will be crucial in the study of strong solutions.
Main notations used or introduced in this chapter:
If there exists a solution X^{t,x} = (X_s^{t,x})_{s∈[t,T]} to (15.0.1) with initial datum (t, x), then by Itô's formula, for any suitably smooth function u we have

u(s, X_s^{t,x}) = u(t, x) + ∫_t^s (∂_r + A_r) u(r, X_r^{t,x}) dr
               + ∫_t^s ∇u(r, X_r^{t,x}) σ(r, X_r^{t,x}) dW_r,   s ∈ [t, T],   (15.0.2)

where

A_t := ½ Σ_{i,j=1}^N c_{ij}(t, x) ∂_{x_i x_j} + Σ_{j=1}^N b_j(t, x) ∂_{x_j},   c := σσ*,   (15.0.3)

is the so-called characteristic operator of the SDE (15.0.1) (see Definition 15.1.1).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 287
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_15
288 15 Feynman-Kac Formulas
Consider an SDE of the form (15.0.1) with coefficients b, σ ∈ L^∞_loc that satisfy the linear growth assumption (14.4.1). Suppose there exists a solution X^{t,x} = (X_s^{t,x})_{s∈[t,T]} with initial datum (t, x). Then, given a function ψ = ψ(x) ∈ bC²(R^N) (i.e., ψ has continuous and bounded derivatives up to the second order), by Itô's formula we have

E[ (ψ(X_s^{t,x}) − ψ(x)) / (s − t) ]
  = E[ (1/(s−t)) ∫_t^s A_r ψ(X_r^{t,x}) dr + (1/(s−t)) ∫_t^s ∇ψ(X_r^{t,x}) σ(r, X_r^{t,x}) dW_r ] =
15.1 Characteristic Operator of an SDE 289
where we used the dominated convergence theorem and the estimates of Theorem 14.5.2 to evaluate the limit: thus, we have

d/ds E[ ψ(X_s^{t,x}) ] |_{s=t} = A_t ψ(x).   (15.1.1)
This serves as the motivation for the following definition, which mirrors formula
(2.5.5) for Markov processes.
Definition 15.1.1 (Characteristic Operator of an SDE) The operator .At in
(15.0.3) is called the characteristic operator of the SDE (15.0.1).
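Identity (15.1.1) can be checked in a fully explicit case. For geometric Brownian motion (b(x) = μx, σ(x) = σx) and ψ(x) = x², one has A ψ(x) = (2μ + σ²)x², while the lognormal second moment gives E[(X_s^{t,x})²] = x² e^{(2μ+σ²)(s−t)}. The sketch below (illustrative names, not from the text) compares the finite-difference quotient of (15.1.1) with A ψ:

```python
import math

def char_op_gbm(mu, sigma, x):
    """A psi(x) for psi(x) = x^2 under dX = mu X dt + sigma X dW:
    A psi = mu*x * 2x + (1/2) sigma^2 x^2 * 2 = (2 mu + sigma^2) x^2."""
    return (2 * mu + sigma * sigma) * x * x

def expected_psi(mu, sigma, x, h):
    """E[(X_h^x)^2] = x^2 * exp((2 mu + sigma^2) h)  (lognormal 2nd moment)."""
    return x * x * math.exp((2 * mu + sigma * sigma) * h)

def difference_quotient(mu, sigma, x, h=1e-6):
    """Finite-difference version of (15.1.1): (E[psi(X_h)] - psi(x)) / h."""
    return (expected_psi(mu, sigma, x, h) - x * x) / h

dq = difference_quotient(0.1, 0.3, 2.0)
```

As h shrinks, the quotient converges to A ψ(x) = (2μ + σ²)x², as (15.1.1) predicts.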
Remark 15.1.2 ([!]) Given m ∈ R^N, consider the functions

ψ_i(x) := x_i,   ψ_{ij}(x) := (x_i − m_i)(x_j − m_j),   x ∈ R^N, i, j = 1, …, N.

Formula (15.1.1) is valid with ψ = ψ_i and ψ = ψ_{ij}: this can be proved using the same arguments as above, since the linear growth hypothesis on the coefficients b, σ and the L^p estimates of Theorem 14.5.2 justify convergence and the martingale property of the stochastic integral.
Based on formulas (15.1.2) and (15.1.3), the coefficients .bi (t, x) and .cij (t, x)
represent the infinitesimal increments of expectation and covariance matrix of .Xt,x ,
in agreement with Remark 2.5.8.
is a local martingale: this result is similar to Theorem 2.5.13 and shows how to
“compensate” the process .s |→ u(s, Xst,x ) to obtain a (local) martingale. These sim-
ilarities between Markov processes and solutions of SDEs are not coincidental: we
will prove later (see Theorems 17.3.1 and 18.2.3) that, under suitable assumptions
on the coefficients, the solution of an SDE is a diffusion.
In this section, we provide some simple conditions that ensure that the first exit time
of the solution of the SDE (15.0.1) from a bounded domain1 D of .RN , is absolutely
integrable and therefore a.s. finite. We make the following
Assumption 15.2.1
(i) The coefficients of the SDE (15.0.1) are measurable and locally bounded, b, σ ∈ L^∞_loc([0, +∞[ × R^N);
(ii) for every t ≥ 0 and x ∈ D there exists a solution X^{t,x} of (15.0.1) with initial condition X_t^{t,x} = x, on a set-up (W, F_t).
We denote by τ_{t,x} the first exit time of X^{t,x} from D,

A_t f(x) ≤ −1,   t ≥ 0, x ∈ D.   (15.2.1)

Then E[τ_x] is finite for every x ∈ D. In particular, such a function exists if for some λ > 0 and i ∈ {1, …, N} we have²

c_{ii}(t, x) ≥ λ,   t ≥ 0, x ∈ D.   (15.2.2)
Since ∇f and σ(s, ·) are bounded on D for s ≤ t, the stochastic integral has zero expectation and by (15.2.1) we have

E[ f(X^x_{t∧τ_x}) ] ≤ f(x) − E[t ∧ τ_x];

thus, since f ≥ 0,

E[t ∧ τ_x] ≤ f(x).

Letting t → ∞, we conclude that

E[τ_x] ≤ f(x).
Now suppose that (15.2.2) holds and consider only the case i = 1: then it is enough to set

f(x) = α ( e^{βR} − e^{βx_1} ),

where α, β are suitable positive constants and R is large enough so that D is included in the Euclidean ball of radius R, centered at the origin. Indeed, f is non-negative on D and we have

A_t f(x) = −α e^{βx_1} ( ½ c_{11}(t, x) β² + b_1(t, x) β ) ≤ −αβ e^{−βR} ( λβ/2 − ‖b‖_{L∞(D)} ),
(X^x_{t∧τ_x})_1 = x_1 + ∫_0^{t∧τ_x} b_1(s, X^x_s) ds + Σ_{i=1}^d ∫_0^{t∧τ_x} σ_{1i}(s, X^x_s) dW^i_s,
and in expectation

E[ (X^x_{t∧τ_x})_1 ] ≥ x_1 + λ E[t ∧ τ_x],
In this section, we consider the case where the coefficients .b = b(x) and .σ = σ (x)
of the SDE (15.0.1) are independent of time and therefore denote .At in (15.0.3)
simply as .A . For many aspects, this condition is not restrictive since even problems
with time dependence can be treated in this context by inserting time among the state
variables as in the following Example 15.3.7. In addition to Assumption 15.2.1, we
assume that .E [τx ] is finite for every .x ∈ D, where D is a bounded domain.
The following result provides a representation formula (and, consequently, a
uniqueness result) for the classical solutions of the Dirichlet problem for the elliptic-
parabolic operator .A :
A u − au = f  in D,   u|_{∂D} = ϕ,   (15.3.1)
where .f, a, ϕ are given functions. As previously stated, formula (15.3.2) serves as
the foundation for Monte Carlo-type methods used in the numerical approximation
of solutions to the Dirichlet problem (15.3.1).
Theorem 15.3.1 (Feynman-Kac Formula [!!]) Let f ∈ L^∞(D), ϕ ∈ C(∂D) and a ∈ C(D) such that a ≥ 0. If u ∈ C²(D) ∩ C(D̄) is a solution of the Dirichlet
Proof For ε > 0 sufficiently small, let D_ε be a domain such that

x ∈ D_ε,   D̄_ε ⊆ D,   dist(∂D_ε, ∂D) ≤ ε.

Let τ_ε be the exit time of X^x from D_ε and observe that, X^x being continuous (Fig. 15.1),

lim_{ε→0⁺} τ_ε = τ_x.
15.3 The Autonomous Case: The Dirichlet Problem 293
Let

Z_t = e^{−∫_0^t a(X^x_s) ds},

and note that, by hypothesis, Z_t ∈ ]0, 1]. Moreover, if u_ε ∈ C₀²(R^N) is such that u_ε = u on D_ε, by Itô's formula we have

d( Z_t u_ε(X^x_t) ) = Z_t ( (A u_ε − a u_ε)(X^x_t) dt + ∇u_ε(X^x_t) σ(X^x_t) dW_t )

so that

Z_{τ_ε} u(X^x_{τ_ε}) = u(x) + ∫_0^{τ_ε} Z_t f(X^x_t) dt + ∫_0^{τ_ε} Z_t ∇u(X^x_t) σ(X^x_t) dW_t.
Existence results for problem (15.3.1) are well known in the uniformly elliptic
case: we recall the following classical theorem (see, for example, Theorem 6.13 in
[53]).
Theorem 15.3.4 Under the following assumptions:
(i) A in (15.0.3) is a uniformly elliptic operator, i.e., there exists a constant λ > 0 such that

Σ_{i,j=1}^N c_{ij}(x) ξ_i ξ_j ≥ λ|ξ|²,   x ∈ D, ξ ∈ R^N;

3 This is a regularity condition on the boundary of D, satisfied if, for example, ∂D is a C²-manifold.
The law μ_x is usually called the harmonic measure of A on ∂D. If X^x is a Brownian motion with initial point x ∈ R^N, then A = ½Δ and, when D = B(0, R) is the Euclidean ball of radius R, μ_x has a density (with respect to the surface measure) whose explicit expression is known: it corresponds to the so-called Poisson kernel

(R² − |x|²) / ( R ω_N |x − y|^N ),

where ω_N denotes the measure of the unit spherical surface in R^N.
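The harmonic-measure representation can be tested by simulation: for any harmonic function u, the value u(x) must equal the average of its boundary values over Brownian exit points from x. The sketch below (illustrative names and step sizes, not from the text) does this on the unit disk with the harmonic function u(x, y) = x² − y²:

```python
import math
import random

def exit_point_unit_disk(x, y, dt=1e-3, rng=random.Random(0)):
    """Euler walk of 2D Brownian motion from (x, y) until it leaves the unit
    disk; the exit position is projected back onto the circle."""
    s = math.sqrt(dt)
    while x * x + y * y < 1.0:
        x += rng.gauss(0.0, s)
        y += rng.gauss(0.0, s)
    r = math.hypot(x, y)
    return x / r, y / r

def harmonic_measure_check(x=0.3, y=0.4, n_paths=2000, rng=random.Random(1)):
    """u(x, y) = x^2 - y^2 is harmonic, so u(x, y) = E[u(B_tau)]: the value at
    x equals the boundary values averaged against the harmonic measure."""
    acc = 0.0
    for _ in range(n_paths):
        ex, ey = exit_point_unit_disk(x, y, rng=rng)
        acc += ex * ex - ey * ey
    return acc / n_paths

mc = harmonic_measure_check()
exact = 0.3**2 - 0.4**2   # u at the starting point
```

The Monte Carlo average matches u(0.3, 0.4) up to statistical and discretization error.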
Example 15.3.7 (Heat Equation) Let W be a real Brownian motion. The process X_t = (W_t, −t) is the solution of the SDE

dX_t¹ = dW_t,   dX_t² = −dt,

with characteristic operator

A = ½ ∂_{x_1 x_1} − ∂_{x_2},

and we consider the rectangle

D = ]a_1, b_1[ × ]a_2, b_2[.
Examining the explicit expression of the trajectories of X (see Fig. 15.2), it is clear that the value u(x̄_1, x̄_2) of a solution of the heat equation depends only on the values of u on the part of the boundary of D contained in {x_2 < x̄_2}. In general, the value of u in D depends only on the values of u on the parabolic boundary of D, defined by
A = Σ_{i=1}^N b_i(x) ∂_{x_i}.

d/dt X_t = b(X_t).
If the exit time of X from D is finite (cf. Remark 15.2.3) then we have the
representation
u(x) = e^{−∫_0^{τ_x} a(X^x_t) dt} ϕ(X^x_{τ_x}) − ∫_0^{τ_x} e^{−∫_0^t a(X^x_s) ds} f(X^x_t) dt,   (15.3.3)
Note that, as in Example 2.5.12, the solution u inherits the regularity properties
of .ϕ and therefore in general the differential equation in (15.3.4) has to be
understood in a distributional sense. From a probabilistic perspective, the transition
law of the process X (which in this case is a Dirac delta distribution, i.e., .Xtx ∼
δ(x1 +t,x2 −tx1 −t 2 /2) ) is the fundamental solution of the Cauchy problem (15.3.4).
Theorem 15.3.1 also has a parabolic counterpart, with an entirely analogous proof. Precisely, given the bounded domain D, we consider the cylinder

D_T = ]0, T[ × D

and we denote by

∂_p D_T := ∂D_T \ ({0} × D)
where .At is the characteristic operator in (15.0.3) and .f, a, ϕ are given functions.
Chapter 20 is dedicated to a concise presentation of the main existence and
uniqueness results for problem (15.4.3) in the case of uniformly parabolic operators
with Hölder and bounded coefficients.
Since problem (15.4.3) is posed on an unbounded domain, it is necessary to
introduce appropriate assumptions on the behavior at infinity of the coefficients.
15.4 The Evolutionary Case: The Cauchy Problem 299
Assumption 15.4.3
(i) The coefficients .b = b(t, x) and .σ = σ (t, x) are measurable functions, with at
most linear growth in x uniformly in .t ∈ [0, T [;
(ii) .a ∈ C([0, T [ ×RN ) with .inf a =: a0 > −∞.
Theorem 15.4.4 (Feynman-Kac Formula [!!]) Suppose there exists a solution u ∈ C²([0, T[ × R^N) ∩ C([0, T] × R^N) of the Cauchy problem (15.4.3). Take
(2) the matrix σ is bounded and there exist two positive constants M and α, with α sufficiently small, such that

|u(t, x)| + |f(t, x)| ≤ M e^{α|x|²},   (t, x) ∈ [0, T[ × R^N.   (15.4.5)

If the SDE (15.0.1) has a solution X^{t,x} with initial datum (t, x) ∈ [0, T[ × R^N then the representation formula holds

u(t, x) = E[ e^{−∫_t^T a(s, X_s^{t,x}) ds} ϕ(X_T^{t,x}) − ∫_t^T e^{−∫_t^s a(r, X_r^{t,x}) dr} f(s, X_s^{t,x}) ds ].   (15.4.6)
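In the simplest instance of (15.4.6) — b = 0, σ = 1, a = f = 0, so that X^{t,x} is a Brownian motion started at x — the representation reduces to u(t, x) = E[ϕ(X_T^{t,x})], which is directly checkable by Monte Carlo. A sketch (illustrative names and sample sizes, not from the text), with ϕ(x) = x², for which u(t, x) = x² + (T − t) exactly:

```python
import math
import random

def feynman_kac_mc(t, x, T=1.0, n_paths=50_000, seed=0):
    """Monte Carlo evaluation of u(t, x) = E[phi(X_T^{t,x})] in the heat
    equation case b = 0, sigma = 1, a = f = 0, phi(x) = x^2, where
    X_T^{t,x} = x + (W_T - W_t) ~ N(x, T - t)."""
    rng = random.Random(seed)
    s = math.sqrt(T - t)
    acc = 0.0
    for _ in range(n_paths):
        xT = x + s * rng.gauss(0.0, 1.0)
        acc += xT * xT
    return acc / n_paths

u_mc = feynman_kac_mc(0.25, 0.5)
u_exact = 0.5**2 + (1.0 - 0.25)   # x^2 + (T - t)
```

The statistical error decays like n_paths^{−1/2}, the basis of the Monte Carlo methods mentioned in Sect. 15.3.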
Proof Fix (t, x) ∈ [0, T[ × R^N and, for simplicity, let X = X^{t,x}. If τ_R denotes the exit time of X from the Euclidean ball of radius R, by Theorem 15.4.1 we have

u(t, x) = E[ e^{−∫_t^{T∧τ_R} a(s, X_s) ds} u(T ∧ τ_R, X_{T∧τ_R})
          − ∫_t^{T∧τ_R} e^{−∫_t^s a(r, X_r) dr} f(s, X_s) ds ].   (15.4.7)

Since

lim_{R→∞} T ∧ τ_R = T,

the thesis follows by taking the limit in R in (15.4.7) thanks to the dominated convergence theorem. In fact, we have pointwise convergence of the integrands and
The following theorem provides the explicit expression of the solution of a linear SDE.
304 16 Linear Equations
Theorem 16.1.1 The solution X^x = (X^x_t)_{t≥0} of (16.0.1) with initial datum X^x_0 = x ∈ R^N is given by

X^x_t = e^{tB} ( x + ∫_0^t e^{−sB} b ds + ∫_0^t e^{−sB} σ dW_s ).   (16.1.1)

The solution X^x is a Gaussian process: in particular, X^x_t ∼ N_{m_t(x), C_t} where

m_t(x) = e^{tB} ( x + ∫_0^t e^{−sB} b ds ),   C_t = ∫_0^t e^{sB} σ (e^{sB} σ)* ds.

Proof To prove that X^x in (16.1.1) solves the SDE (16.0.1), it is sufficient to apply Itô's formula using the expression X^x_t = e^{tB} Y^x_t where

Y^x_t := x + ∫_0^t e^{−sB} b ds + ∫_0^t e^{−sB} σ dW_s.

We now recall that, since Y^x is an Itô process with deterministic coefficients, by the multidimensional version of Example 11.1.9, we have

Y^x_t ∼ N_{μ_t(x), 𝒞_t},   μ_t(x) = x + ∫_0^t e^{−sB} b ds,   𝒞_t = ∫_0^t e^{−sB} σσ* e^{−sB*} ds.   (16.1.2)

The thesis follows easily from the fact that X^x is a linear transformation of Y^x. □
Remark 16.1.2 ([!]) The process

T ↦ X^{t,x}_T := X^x_{T−t},   T ≥ t,

solves the SDE (16.0.1) with initial datum (t, x). If the covariance matrix C_{T−t} is positive definite, then the random variable X^{t,x}_T is absolutely continuous with Gaussian density Γ(t, x; T, ·) given by

Γ(t, x; T, y) = (1/√((2π)^N det C_{T−t})) exp( −½ ⟨C_{T−t}^{−1}(y − m_{T−t}(x)), y − m_{T−t}(x)⟩ ).
A_t = ½ Σ_{i,j=1}^N c_{ij} ∂_{x_i x_j} + ⟨Bx + b, ∇⟩,   c := σσ*,   (16.1.3)
which is the simplified version of the Langevin equation [86] used in physics to
describe the random motion of a particle in phase space: .Vt and .Xt represent the
velocity and position of the particle at time t, respectively. Paul Langevin was the
first, in 1908, to apply Newton’s laws to the random Brownian motion studied by
Einstein a few years earlier. Lemons [88] provides an interesting account of the
approaches of Einstein and Langevin.
Referring to the general notation (16.0.1), we have d = 1, N = 2 and

B = ( 0 0 ; 1 0 ),   σ = ( 1 ; 0 ),   (16.1.4)

and

C_t = ∫_0^t e^{sB} σσ* e^{sB*} ds = ∫_0^t ( 1 0 ; s 1 )( 1 0 ; 0 0 )( 1 s ; 0 1 ) ds = ( t  t²/2 ; t²/2  t³/3 ).   (16.1.5)
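The covariance (16.1.5) can be confirmed by simulation: taking V_t = W_t and X_t = ∫_0^t V_s ds, the empirical second moments of (V_t, X_t) should approach (t, t²/2, t³/3). A sketch (illustrative names and sample sizes, not from the text):

```python
import math
import random

def simulate_langevin(t=1.0, n_steps=200, rng=None):
    """One path of dV = dW, dX = V dt started at (0, 0), i.e., the Langevin
    pair with B, sigma as in (16.1.4); Euler discretization."""
    rng = rng or random.Random(0)
    dt = t / n_steps
    v = x = 0.0
    for _ in range(n_steps):
        x += v * dt                          # dX = V dt
        v += rng.gauss(0.0, math.sqrt(dt))   # dV = dW
    return v, x

def empirical_covariance(t=1.0, n_paths=10_000):
    """Monte Carlo second moments of (V_t, X_t); compare with (16.1.5):
    E[V^2] = t, E[V X] = t^2/2, E[X^2] = t^3/3."""
    rng = random.Random(1)
    vv = vx = xx = 0.0
    for _ in range(n_paths):
        v, x = simulate_langevin(t, rng=rng)
        vv += v * v
        vx += v * x
        xx += x * x
    return vv / n_paths, vx / n_paths, xx / n_paths

vv, vx, xx = empirical_covariance()
```

At t = 1 the three estimates cluster around 1, 1/2 and 1/3 respectively, matching the degenerate but positive definite matrix C_1.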
Note that C_t is positive definite for every t > 0 and therefore (V, X) has transition density

Γ(t, z; T, ζ) = (√3 / (π(T−t)²)) exp( −½ ⟨C_{T−t}^{−1}(ζ − e^{(T−t)B} z), ζ − e^{(T−t)B} z⟩ )   (16.1.6)

for t < T and z = (v, x), ζ = (η, ξ) ∈ R², where

C_t^{−1} = ( 4/t  −6/t² ; −6/t²  12/t³ ).
½ ∂_{vv} + v ∂_x + ∂_t   (16.1.7)

and (T, η, ξ) ↦ Γ(t, v, x; T, η, ξ) is a fundamental solution of the forward Kolmogorov operator

½ ∂_{ηη} − η ∂_ξ − ∂_T.   (16.1.8)
Operators (16.1.7) and (16.1.8) are not uniformly parabolic because the matrix of the second-order part

σσ* = ( 1 0 ; 0 0 )
is degenerate; nonetheless, like the classical heat equation operator, they have a
Gaussian fundamental solution. Kolmogorov [70] was the first to exhibit the explicit
expression (16.1.6) of the fundamental solution of (16.1.7) (see also the introduction
of Hörmander’s work [62]). In mathematical finance, the backward operator (16.1.7)
is employed to evaluate some complex derivative instruments, notably including the
so-called Asian options (see, for example, [8] and [112]).
Example 16.1.4 ([!]) In Example 16.1.3 we proved that, setting
X_t := ∫_0^t W_s ds,

the pair (W, X) has a two-dimensional normal distribution with covariance matrix given in (16.1.5). It follows in particular that X_t ∼ N_{0, t³/3}, confirming what we had already observed in Example 11.1.10.
Let us prove that X is not a Markov process. In Theorem 17.3.1 we will see
that the pair .(W, X), being a solution of the Langevin SDE, is a Markov process:
Theorem 17.3.1 does not apply to X which is an Itô process but is not a solution of
an SDE of the form (17.0.1). In fact, we have
E[X_T | F_t] = X_t + E[ ∫_t^T W_s ds | F_t ] = X_t + (T − t)W_t.   (16.1.9)

To verify the last equality, note that

d(tW_t) = W_t dt + t dW_t,
16.1 Solution and Transition Law of a Linear SDE 307
namely

T W_T = t W_t + ∫_t^T W_s ds + ∫_t^T s dW_s,

from which

E[T W_T | F_t] = t W_t + E[ ∫_t^T W_s ds | F_t ] + E[ ∫_t^T s dW_s | F_t ]

and therefore

E[ ∫_t^T W_s ds | F_t ] = (T − t)W_t.
By (16.1.9), .E [XT | Ft ] is a function not only of .Xt but also of .Wt : incidentally,
this is a further confirmation of the Markov property of the pair .(W, X). If X were
a Markov process, then we should have3
which combined with (16.1.9) would imply .Wt = f (Xt ) a.s. for some .f ∈ mB.
However, this is absurd: in fact, if .Wt = f (Xt ) a.s. then .μWt |Xt = δf (Xt ) and this
contrasts with the fact that .(Wt , Xt ) has a two-dimensional Gaussian density.
Remark 16.1.5 The results of this section extend to the case of linear SDEs of the
type
where the matrices .B, b and .σ are measurable and bounded functions of time. In
this case, the matrix exponential .etB in the expression of the solution provided by
Theorem 16.1.1 is replaced by the solution .Ф(t) of the matrix Cauchy problem
$$\begin{cases} \Phi'(t) = B(t)\,\Phi(t),\\ \Phi(0) = I_N. \end{cases}$$
We have seen that the solution X of the linear SDE (16.0.1) has a multi-normal
transition law. Clearly, it is of particular interest when X admits a transition density
and therefore the related Kolmogorov equations have a fundamental solution. In this
section, we see that the non-degeneracy of the covariance matrix of .Xt ,
$$C_t := \operatorname{cov}(X_t) = \int_0^t G_s G_s^*\,ds, \qquad G_t := e^{tB}\sigma, \tag{16.2.1}$$
verifies the final condition .γ (T ) = y. We say that v is a control for .(B, σ ) on .[0, T ].
Theorem 16.2.2 ([!]) The matrix .CT in (16.2.1) is positive definite if and only if
(B, σ ) is controllable on .[0, T ].
Proof We preliminarily observe that $C_t = e^{tB}\,\mathscr{C}_t\,e^{tB^*}$, where
$$\mathscr{C}_t = \int_0^t G_{-s}\,G_{-s}^*\,ds$$
is the covariance matrix in (16.1.2). Clearly, $C_T > 0$ if and only if $\mathscr{C}_T > 0$.
We suppose $C_T > 0$ and prove that $(B, \sigma)$ is controllable on $[0,T]$. Consider the solution
$$\gamma(t) = e^{tB}\left(x + \int_0^t G_{-s}\,v(s)\,ds\right), \qquad t \in [0,T],$$
Conversely, assume that $(B, \sigma)$ is controllable on $[0,T]$ and suppose, for contradiction, that $\mathscr{C}_T$ (the covariance matrix in (16.1.2)) is degenerate, i.e., there exists $w \in \mathbb{R}^N \setminus \{0\}$ such that
$$\langle \mathscr{C}_T w, w\rangle = 0.$$
Equivalently, we have
$$\int_0^T |w^* G_{-s}|^2\,ds = 0.$$
This contradicts (16.2.3), hence the controllability hypothesis, and concludes the proof. □
Remark 16.2.3 The control v in (16.2.4) is optimal in the sense that it minimizes
the “cost functional”
$$U(v) := \|v\|^2_{L^2([0,T])} = \int_0^T |v(t)|^2\,dt.$$
Then we find $v(s) = \frac{1}{2}\,G^*_{-s}\lambda$ with $\lambda$ determined by the constraint (16.2.3), that is $\lambda = 2\,C_T^{-1}z$, in agreement with (16.2.4).
Example 16.2.4 Let us resume Example 16.1.3 with the matrices $B, \sigma$ as in (16.1.4). In this case, the control $v = v(t)$ has real values and the problem (16.2.2) becomes
$$\begin{cases} \gamma_1'(t) = v(t),\\ \gamma_2'(t) = \gamma_1(t),\\ \gamma(0) = (x_1, x_2). \end{cases} \tag{16.2.5}$$
The control acts directly only on the first component of .γ but also affects the
second component .γ2 through the second equation: by Theorem 16.2.2, .(B, σ ) is
controllable on .[0, T ] for every .T > 0 with a control given explicitly by formula
(16.2.4) (see Fig. 16.1).
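Formula (16.2.4) is elided above, but the minimum-energy control it refers to can be sketched numerically with the classical Gramian construction from linear control theory (the steering formula below is that standard construction, not a quotation from the text; parameter values are our choices). For the system (16.2.5):

```python
import numpy as np

# gamma' = A gamma + a v with A = [[0,0],[1,0]], a = (1,0)^T, as in (16.2.5).
# Since A^2 = 0, exp(tA) = I + tA.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
a = np.array([1.0, 0.0])

def expA(t):
    return np.eye(2) + t * A

T = 1.0
x = np.array([0.0, 0.0])   # initial state
y = np.array([1.0, 2.0])   # target state at time T

# Controllability Gramian W_T = int_0^T exp((T-s)A) a a^T exp((T-s)A)^T ds,
# computed by a simple Riemann sum.
n = 2000
ds = T / n
grid = np.arange(n) * ds
W = sum(expA(T - s) @ np.outer(a, a) @ expA(T - s).T for s in grid) * ds

# Minimum-energy control steering x to y: v(s) = a^T exp((T-s)A)^T lambda.
lam = np.linalg.solve(W, y - expA(T) @ x)
def v(s):
    return a @ expA(T - s).T @ lam

# Euler integration of the controlled ODE: the endpoint should hit the target.
gamma = x.copy()
for s in grid:
    gamma = gamma + ds * (A @ gamma + a * v(s))
print(gamma)  # approximately y = (1, 2)
```

The control acts on the first component only, yet the target value of the second component is reached as well, illustrating Theorem 16.2.2.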
$$w^*\sigma = w^*B\sigma = \cdots = w^*B^{N-1}\sigma = 0. \tag{16.3.2}$$
Therefore, if the matrix (16.3.1) does not have maximum rank, by (16.3.2) and the
Cayley-Hamilton theorem, we have
$$w^*B^k\sigma = 0, \qquad k \in \mathbb{N}_0,$$
and therefore, by the exponential series,
$$w^*e^{tB}\sigma = 0, \qquad t \ge 0.$$
Consequently,
$$\langle C_T w, w\rangle = \int_0^T |w^*e^{tB}\sigma|^2\,dt = 0, \tag{16.3.3}$$
$$f(t) := w^*e^{tB}\sigma = 0, \qquad t \in [0,T].$$
By differentiating, we obtain
$$0 = \frac{d^k}{dt^k}f(t)\Big|_{t=0} = w^*B^k\sigma, \qquad k \in \mathbb{N}_0,$$
and therefore, by (16.3.2), the matrix (16.3.1) does not have maximum rank. □
Remark 16.3.2 Since the Kalman condition does not depend on T, $C_T$ is positive definite for some $T > 0$ if and only if it is for every $T > 0$.
Example 16.3.3 In Example 16.1.3, we have
$$\sigma = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad B\sigma = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
and thus $(\sigma \;\; B\sigma)$ is the $2\times 2$ identity matrix, which obviously satisfies the Kalman condition.
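Numerically, the Kalman condition amounts to a rank check on the block matrix $(\sigma\ B\sigma\ \cdots\ B^{N-1}\sigma)$; a minimal sketch in numpy (the function names are ours, not from the text):

```python
import numpy as np

def kalman_matrix(B, sigma):
    """Stack the blocks sigma, B@sigma, ..., B^(N-1)@sigma column-wise."""
    blocks = [sigma]
    for _ in range(B.shape[0] - 1):
        blocks.append(B @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(B, sigma):
    """Kalman condition: the stacked matrix has maximum rank N."""
    return np.linalg.matrix_rank(kalman_matrix(B, sigma)) == B.shape[0]

# Langevin example: B = [[0, 0], [1, 0]], sigma = (1, 0)^T as in (16.1.4)
B = np.array([[0.0, 0.0], [1.0, 0.0]])
sigma = np.array([[1.0], [0.0]])
print(kalman_matrix(B, sigma))    # the 2x2 identity matrix
print(is_controllable(B, sigma))  # True
```

Replacing $\sigma$ with $(0, 1)^T$ gives $B\sigma = 0$, a rank-one Kalman matrix, hence a non-controllable pair.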
16.4 Hörmander's Condition
The non-degeneracy of the covariance matrix of a linear SDE can also be char-
acterized in terms of a well-known condition in the context of partial differential
equations. Consider the linear SDE (16.0.1) under the assumption that .σ has rank
d: then, up to a linear transformation, it is not restrictive to assume
$$\sigma = \begin{pmatrix} I_d \\ 0 \end{pmatrix}.$$
In this case, the Kolmogorov operator takes the form
$$\mathcal{K} = \frac{1}{2}\,\Delta_d + \langle b + Bx, \nabla\rangle + \partial_t, \qquad (t,x) \in \mathbb{R}^{N+1}, \tag{16.4.1}$$
where .Δd denotes the Laplace operator in the first d variables .x1 , . . . , xd .
By convention, we identify a first-order differential operator on $\mathbb{R}^N$ of the type
$$Z := \sum_{i=1}^{N} \alpha_i(x)\,\partial_{x_i}$$
with the vector field of its coefficients and therefore also write $Z = (\alpha_1, \ldots, \alpha_N)$. The commutator of Z and
$$U = \sum_{i=1}^{N} \beta_i\,\partial_{x_i}$$
is defined by
$$[Z, U] = ZU - UZ = \sum_{i=1}^{N} (Z\beta_i - U\alpha_i)\,\partial_{x_i}.$$
Hörmander's theorem [62] (see also Stroock [133] for a more recent treatment) is a result of remarkable generality. Here, we revisit a specific version pertinent
to the operator .K in (16.4.1): this theorem states that .K has a smooth fundamental
solution if and only if, at every point .x ∈ RN , the first-order operators (vector fields)
$$\partial_{x_1}, \ldots, \partial_{x_d}, \qquad Y := \langle Bx, \nabla\rangle,$$
together with their commutators of any order, span .RN . This is the so-called
Hörmander’s condition. Note that .∂x1 , . . . , ∂xd are the derivatives that appear in
the second-order part of .K , corresponding to the directions of Brownian diffusion,
while Y is the drift of the operator: therefore, essentially, the existence of the
fundamental solution is equivalent to the fact that .RN is spanned at every point
by the directional derivatives that appear in .K as second derivatives and as drift,
together with their commutators of any order.
Example 16.4.1
(i) If $d = N$ then $\mathcal{K}$ is a uniformly parabolic operator and Hörmander's condition is obviously satisfied, without resorting to the drift and commutators, since $\partial_{x_1}, \ldots, \partial_{x_N}$ form the canonical basis of $\mathbb{R}^N$.
(ii) In the case of the Langevin operator of Example 16.1.3, we have $Y = x_1\partial_{x_2}$. Thus $\partial_{x_1} = (1,0)$ together with the commutator $[\partial_{x_1}, Y] = \partial_{x_2} = (0,1)$ spans $\mathbb{R}^2$, and Hörmander's condition is satisfied.
(iii) Consider the operator
$$\mathcal{K} = \frac{1}{2}\,\partial_{x_1 x_1} + x_1\partial_{x_2} + x_2\partial_{x_3} + \partial_t, \qquad (x_1, x_2, x_3) \in \mathbb{R}^3.$$
Here .N = 3, .d = 1 and .Y = x1 ∂x2 + x2 ∂x3 : also in this case Hörmander’s
condition is satisfied since
$$\partial_{x_1}, \qquad [\partial_{x_1}, Y] = \partial_{x_2}, \qquad [[\partial_{x_1}, Y], Y] = \partial_{x_3}$$
span $\mathbb{R}^3$ at every point. In general, if $Y = \langle Bx, \nabla\rangle$ then
$$[\partial_{x_i}, Y] = \sum_{k=1}^{N} b_{ki}\,\partial_{x_k}$$
is the i-th column of the matrix B. Moreover, $[[\partial_{x_i}, Y], Y]$ is the i-th column of the matrix $B^2$ and an analogous representation holds for higher-order commutators.
Building upon the research in [34, 85, 106, 119] and [114], a theory analogous
to the classical treatment of uniformly parabolic equations has been developed for
Kolmogorov equations with variable coefficients of the type .∂t + At with .At as in
(16.1.3) and .σ = σ (t, x).
16.5 Examples and Applications
Linear SDEs are the basis of many important stochastic models: here we briefly
present some examples.
Example 16.5.1 (Vasicek Model) One of the simplest and most famous stochastic
models for the evolution of interest rates (also called short rates or short-term rates)
was proposed by Vasicek [143]:
$$dr_t = \kappa(\theta - r_t)\,dt + \sigma\,dW_t.$$
Here W is a real Brownian motion, .σ represents the volatility of the rate and the
parameters .κ, θ are called respectively “speed of mean reversion” and “long-term
mean level”. The particular form of the drift .κ(θ − rt ), with .κ > 0, is designed to
capture the so-called “mean reversion” property, an essential characteristic of the
interest rate that distinguishes it from other financial prices: unlike stock prices,
for example, interest rates cannot rise indefinitely. This is because at very high
levels they would hinder economic activity, leading to a decrease in interest rates.
Consequently, interest rates move in a bounded range, showing a tendency to return
to a long-term value, represented by the parameter .θ in the model. As soon as .rt
exceeds the level .θ , the drift becomes negative and “pushes” .rt to decrease while on
the contrary, if .rt < θ , the drift is positive and tends to make .rt grow towards .θ . The
fact that .rt has a normal distribution makes the model very simple to use and allows
for explicit formulas for more complex financial instruments, such as interest rate
derivatives. Among various resources, [18] stands out as an excellent introductory
text for interest rate modeling (Fig. 16.2).
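Since the transition law of the Vasicek process is Gaussian with mean $\theta + (r_0 - \theta)e^{-\kappa t}$ and variance $\sigma^2(1 - e^{-2\kappa t})/(2\kappa)$, trajectories such as those of Fig. 16.2 can be simulated exactly, with no discretization error; a sketch (function name, seed and horizon are our choices):

```python
import numpy as np

def vasicek_terminal(r0, kappa, theta, sigma, T, n_steps, n_paths, rng):
    """Exact simulation of dr = kappa*(theta - r) dt + sigma dW using the
    Gaussian transition law of the linear SDE (no discretization error)."""
    dt = T / n_steps
    decay = np.exp(-kappa * dt)
    std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * kappa))
    r = np.full(n_paths, r0)
    for _ in range(n_steps):
        r = theta + (r - theta) * decay + std * rng.standard_normal(n_paths)
    return r

rng = np.random.default_rng(0)
# Parameters as in Fig. 16.2: kappa = 1, r0 = theta = 5%, sigma = 8%.
rT = vasicek_terminal(0.05, 1.0, 0.05, 0.08, T=10.0, n_steps=100, n_paths=20000, rng=rng)
print(rT.mean())  # close to the long-term mean theta = 0.05
print(rT.std())   # close to the stationary value sigma/sqrt(2*kappa)
```

The sample mean and standard deviation at a long horizon approach the stationary values, illustrating the mean-reversion discussed above.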
with solution
$$B_t = B_0(1-t) + bt + (1-t)\int_0^t \frac{dW_s}{1-s}, \qquad 0 \le t < 1.$$
Fig. 16.2 Plot of two trajectories of the Vasicek process with parameters $\kappa = 1$, $X_0 = \theta = 5\%$ and $\sigma = 8\%$
We have
$$\mathbb{E}[B_t] = B_0(1-t) + bt,$$
so that
The Brownian bridge is useful for modeling a system that starts at some level .B0
and is expected to reach level b at some future time, for example .t = 1. In Fig. 16.3,
four trajectories of a Brownian bridge B with initial value .B0 = 0 and .B1 = 1 are
shown.
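The explicit solution above corresponds to the bridge dynamics $dB_t = \frac{b - B_t}{1-t}\,dt + dW_t$ (this SDE form is our restatement, consistent with the solution formula). An Euler sketch that produces trajectories like those of Fig. 16.3, stopped just before the singular time $t = 1$ (step count and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def brownian_bridge(b0, b, n_steps=1000, n_paths=5000):
    """Euler scheme for dB_t = (b - B_t)/(1 - t) dt + dW_t on [0, 1),
    stopped shortly before t = 1 where the drift blows up."""
    dt = 1.0 / n_steps
    B = np.full(n_paths, float(b0))
    t = 0.0
    for _ in range(n_steps - 1):          # stop at t = 1 - dt
        drift = (b - B) / (1.0 - t)
        B = B + drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
        t += dt
    return B

end = brownian_bridge(b0=0.0, b=1.0)
print(end.mean())  # close to the pinned value b = 1
print(end.var())   # small: the bridge variance t(1-t) vanishes at t = 1
```

The sample of endpoint values concentrates around the target level b, as the bridge construction requires.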
Here W is a real Brownian motion, .μ and .η are the positive parameters of friction
and diffusion. In matrix form
with
$$B = \begin{pmatrix} -\mu & 0 \\ 1 & 0 \end{pmatrix}, \qquad \sigma = \begin{pmatrix} \eta \\ 0 \end{pmatrix},$$
and
$$e^{tB} = I + \sum_{n=1}^{\infty} \frac{(tB)^n}{n!} = \begin{pmatrix} e^{-\mu t} & 0 \\ \frac{1 - e^{-\mu t}}{\mu} & 1 \end{pmatrix}.$$
and
$$C_t = \int_0^t e^{sB}\sigma\,\sigma^* e^{sB^*}\,ds = \eta^2\int_0^t \begin{pmatrix} e^{-2\mu s} & \dfrac{e^{-\mu s} - e^{-2\mu s}}{\mu} \\[2mm] \dfrac{e^{-\mu s} - e^{-2\mu s}}{\mu} & \dfrac{(1 - e^{-\mu s})^2}{\mu^2} \end{pmatrix} ds.$$
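The closed-form matrix exponential and the entries of $C_t$ can be checked numerically; a small sketch (the series truncation, quadrature and parameter values are our choices):

```python
import numpy as np

mu, eta, t = 0.7, 1.3, 2.0
B = np.array([[-mu, 0.0], [1.0, 0.0]])
sigma = np.array([eta, 0.0])

def expm_series(M, terms=60):
    """Matrix exponential via its power series (ample terms for small matrices)."""
    out, P = np.eye(M.shape[0]), np.eye(M.shape[0])
    for n in range(1, terms):
        P = P @ M / n
        out = out + P
    return out

closed_form = np.array([[np.exp(-mu * t), 0.0],
                        [(1.0 - np.exp(-mu * t)) / mu, 1.0]])
print(np.allclose(expm_series(t * B), closed_form))  # True

# Midpoint quadrature of C_t = int_0^t e^{sB} sigma sigma^* e^{sB^*} ds;
# the (1,1) entry must equal eta^2 (1 - e^{-2 mu t}) / (2 mu).
n = 4000
C = np.zeros((2, 2))
for s in (np.arange(n) + 0.5) * (t / n):
    g = expm_series(s * B) @ sigma
    C += np.outer(g, g)
C *= t / n
print(C[0, 0], eta**2 * (1.0 - np.exp(-2.0 * mu * t)) / (2.0 * mu))  # these agree
```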
Next, we present two examples of very popular SDEs frequently used in the field of mathematical finance. Although not linear SDEs of the form (16.0.1), these equations have an "affine structure" (in the sense of [36]) that allows one to derive the expression of their CHF and density in terms of special functions.
Example 16.5.4 (CIR Model) The Cox-Ingersoll-Ross (CIR) model [29] is a
variant of the Vasicek model of Example 16.5.1 in which the diffusion coefficient
is a square root function: this implies that, unlike Vasicek, the solution (the interest
rate) takes non-negative values. Specifically, we consider the following stochastic
dynamics
$$dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t \tag{16.5.1}$$
where .κ, θ, σ are positive parameters and W is a real Brownian motion. Using Itô’s
formula, we determine the CHF .ϕXt of .Xt : first, we have
$$de^{i\eta X_t} = i\eta\, e^{i\eta X_t}\,dX_t - \frac{\eta^2}{2}\,e^{i\eta X_t}\,d\langle X\rangle_t = e^{i\eta X_t}\left(i\eta\kappa(\theta - X_t) - \frac{(\eta\sigma)^2}{2}\,X_t\right)dt + i\eta\sigma\, e^{i\eta X_t}\sqrt{X_t}\,dW_t =$$
(putting $a(\eta) = i\eta\kappa\theta$, $b(\eta) = i\eta\kappa - \frac{(\eta\sigma)^2}{2}$ and $c(\eta, X_t) = i\eta\sigma\, e^{i\eta X_t}\sqrt{X_t}$)
$$= \big(a(\eta) + b(\eta)X_t\big)\,e^{i\eta X_t}\,dt + c(\eta, X_t)\,dW_t.$$
Equivalently, the function $u(t, \eta) := \varphi_{X_t}(\eta)$ satisfies the following Cauchy problem for a first-order partial differential equation
$$\begin{cases} \partial_t u(t,\eta) = \big(a(\eta) - i\,b(\eta)\,\partial_\eta\big)\,u(t,\eta), & t > 0,\ \eta \in \mathbb{R},\\ u(0,\eta) = e^{i\eta x}. \end{cases}$$
$$d(t) := \frac{2\kappa}{(1 - e^{-\kappa t})\,\sigma^2}, \qquad \lambda(t) := 2x\,e^{-\kappa t}\,d(t),$$
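The noncentral chi-square law encoded by $d(t)$ and $\lambda(t)$ is not derived in full here, but a quick consistency check is available: since the drift is linear, the conditional mean of the CIR process is $\theta + (x - \theta)e^{-\kappa t}$, and a full-truncation Euler scheme for (16.5.1) should reproduce it (scheme and parameter values are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)

kappa, theta, sigma, x0, T = 1.5, 0.04, 0.3, 0.02, 1.0
n_steps, n_paths = 400, 50000
dt = T / n_steps

# Full-truncation Euler scheme for dX = kappa(theta - X) dt + sigma sqrt(X) dW;
# the positive part under the square root keeps the scheme well defined.
X = np.full(n_paths, x0)
for _ in range(n_steps):
    Xp = np.maximum(X, 0.0)
    X = X + kappa * (theta - Xp) * dt \
          + sigma * np.sqrt(Xp * dt) * rng.standard_normal(n_paths)

exact_mean = theta + (x0 - theta) * np.exp(-kappa * T)
print(X.mean(), exact_mean)  # agree up to Monte Carlo and scheme error
```

Here $2\kappa\theta \ge \sigma^2$ (the Feller condition), so the simulated rate stays essentially positive.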
Example 16.5.5 (CEV Model) The constant elasticity of variance (CEV) model
has origins in physics and was introduced in mathematical finance by Cox [27, 28]
to describe the dynamics of the price of a risky asset: the CEV equation is of the
form
$$dX_t = \sigma X_t^{\beta}\,dW_t, \tag{16.5.2}$$
$$\Gamma_{\pm}(t,x;T,y) = \frac{x^{\frac{1}{2} - 2\beta}\,\sqrt{y}}{(1-\beta)\,\sigma^2 (T-t)}\; e^{-\frac{x^{2(1-\beta)} + y^{2(1-\beta)}}{2(1-\beta)^2\sigma^2(T-t)}}\; I_{\pm\frac{1}{2(1-\beta)}}\!\left(\frac{(xy)^{1-\beta}}{(1-\beta)^2\sigma^2(T-t)}\right),$$
where $I_\nu(x)$ is the modified Bessel function of the first kind defined by
$$I_\nu(x) = \left(\frac{x}{2}\right)^{\nu}\sum_{k=0}^{\infty}\frac{x^{2k}}{2^{2k}\,k!\,\Gamma_E(\nu + k + 1)},$$
and $\Gamma_E$ denotes the Euler Gamma function. Both $\Gamma_+$ and $\Gamma_-$ are fundamental solutions of $\partial_t + \mathcal{A}$, where $\mathcal{A}$ is the characteristic operator of X:
$$\mathcal{A} = \frac{\sigma^2 x^{2\beta}}{2}\,\partial_{xx}.$$
Precisely, we have
and
$$\lim_{\substack{(t,x)\to(T,x_0)\\ t<T}}\ \int_{\mathbb{R}_{>0}}\Gamma_{\pm}(t,x;T,y)\,\varphi(y)\,dy = \varphi(x_0), \qquad x_0 \in \mathbb{R}_{\ge 0},$$
where
$$a := \int_0^{+\infty}\Gamma_{+}(t,x;T,y)\,dy < 1.$$
On the other hand, if $\beta < \frac{1}{2}$ then X reaches 0 but is "reflected": in this case $\Gamma_-$ has an integral equal to one on $\mathbb{R}_{>0}$ and is the transition density of X.
In [33] and [61] it is proven that X is a strictly local martingale and for this reason it is not a good model for the price of a risky asset because it creates "arbitrage opportunities": in fact, if $\beta < \frac{1}{2}$, buying the asset at time $\tau_x$ at zero cost, there is a certain gain since the price later becomes positive. For this reason, in the CEV model introduced by Cox [27], the price is defined as the process obtained by stopping the solution X at time $\tau_x$, that is
$$S_t := X_{t\wedge\tau_x}, \qquad t \ge 0.$$
In the financial interpretation, .τx represents the default time of the risky asset.
Delbaen and Shirakawa [33] show that S is a non-negative martingale for every
.0 < β < 1. The unstopped process X is instead used as a model for the dynamics
of interest rates and volatility (or risk index, positive by definition) of financial
assets, as in the famous CIR [29] and Heston [60] models. The CEV model (and its
stochastic volatility counterpart, the popular SABR model [58] used in interest rate
modeling) is an interesting example of a degenerate model because the infinitesimal
generator is not uniformly elliptic and the law of the price process is not absolutely
continuous with respect to the Lebesgue measure.
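The martingale property of the stopped price S proved by Delbaen and Shirakawa can be illustrated by simulation; a sketch with an Euler scheme absorbed at zero ($\beta$, $\sigma$ and the discretization are our choices, and the absorption time is only approximated on a grid):

```python
import numpy as np

rng = np.random.default_rng(3)

beta, sigma, x0, T = 1.0 / 3.0, 0.5, 1.0, 1.0
n_steps, n_paths = 1000, 50000
dt = T / n_steps

# Euler scheme for dX = sigma X^beta dW with absorption at the default time
# tau_x (first hitting time of 0): once a path reaches 0 it stays there.
X = np.full(n_paths, x0)
alive = np.ones(n_paths, dtype=bool)
for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal(n_paths)
    X[alive] = X[alive] + sigma * X[alive]**beta * dW[alive]
    absorbed = alive & (X <= 0.0)
    X[absorbed] = 0.0
    alive &= ~absorbed

print(X.mean())  # close to x0 = 1: the stopped price S is a martingale
```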
16.6 Key Ideas to Remember
We highlight the key outcomes of the chapter and the fundamental concepts to
remember from an initial reading, omitting the more technical or peripheral matters.
If any of the following brief statements are unclear, please refer back to the relevant
section for clarification.
• Section 16.1: linear SDEs have explicit Gaussian solutions. A particularly
interesting example is provided by the Langevin kinetic model whose solution
admits a density although the diffusive coefficient of the SDE is degenerate.
• Sections 16.2, 16.3, and 16.4: the study of the absolute continuity of the solution
of a linear SDE opens up interesting links with the theories of optimal control
and PDEs. The fact that the covariance matrix of the solution of a linear SDE
is positive definite is equivalent to the controllability of an appropriate linear
system: in this regard, the Kalman condition provides a simple operational
criterion. There is an additional equivalence with the Hörmander condition, which is well-known in the context of PDE theory.
• Section 16.5: linear SDEs are the basis of classic stochastic models and find
wide-ranging applications in various fields. In this section we present numerous
examples of linear and non-linear SDEs used in mathematical finance and
beyond.
Chapter 17
Strong Solutions
We present classical results regarding the strong existence and pathwise uniqueness
for SDEs. We maintain the general notations introduced in Chap. 14 and focus on
the SDE
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \tag{17.0.1}$$
whose coefficients $b, \sigma$ satisfy the standard assumptions of Definition 14.4.1 for regularity (local Lipschitz
continuity) and linear growth. Here .N, d ∈ N and .0 ≤ t0 < T are fixed. We prove
the following results:
• Theorem 17.1.1 on strong uniqueness;
• Theorem 17.2.1 on strong solvability and the flow property;
• Theorem 17.3.1 on the Markov property;
• Theorem 17.4.1 and Corollary 17.4.2 on estimates of dependence on the initial
datum, regularity of trajectories, Feller property, and strong Markov property.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 323
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_17
17.1 Uniqueness
for every t ∈ [t0 , T ] and x, y ∈ RN such that |x|, |y| ≤ n. Then for the SDE (17.0.1)
with initial datum Z there is strong uniqueness according to Definition 14.1.11.
Proof Let X, Y be two solutions of the SDE (17.0.1) with initial datum Z, i.e.
X ∈ SDE(b, σ, W, Ft ) and Y ∈ SDE(b, σ, W, Gt ). We use a localization argument1
and set
with the convention min ∅ = T . Note that τn = t0 on (|Z| > n) ∈ Ft0 ∩ Gt0 . Since
by hypothesis X, Y are adapted and a.s. continuous, τn is an increasing sequence of
stopping times2 with values in [t0 , T ], such that τn ↗ T a.s. We set
Moreover, we have
$$\big|b_n(s, X_{s\wedge\tau_n}) - b_n(s, Y_{s\wedge\tau_n})\big| = \big|b_n(s, X_{s\wedge\tau_n}) - b_n(s, Y_{s\wedge\tau_n})\big|\,\mathbb{1}_{(|Z|\le n)} \le$$
1 The localization argument is necessary even under the hypothesis of global Lipschitz continuity
From (17.1.3) and (17.1.4), proceeding exactly as in the proof of estimate (14.4.5)
with p = 2, we obtain
$$v_n(t) \le \bar{c}\int_{t_0}^{t} v_n(s)\,ds, \qquad t \in [t_0, T],$$
for a positive constant $\bar{c} = \bar{c}(T, d, N, \kappa_n)$. Since X and Y are a.s. continuous and adapted (and therefore progressively measurable), Fubini's theorem ensures that $v_n$ is a measurable function on $[t_0, T]$, that is, $v_n \in m\mathscr{B}$. Moreover, $v_n$ is bounded, precisely $|v_n| \le 4n^2$, by construction. From Grönwall's lemma, we obtain that $v_n \equiv 0$ and therefore
$$\mathbb{E}\left[\sup_{t_0\le t\le T}\big|X_{t\wedge\tau_n} - Y_{t\wedge\tau_n}\big|^2\right] = v_n(T) = 0.$$
where
(i) h is a strictly increasing function such that $h(0) = 0$ and for every $\varepsilon > 0$
$$\int_0^{\varepsilon}\frac{ds}{h^2(s)} = \infty; \tag{17.1.6}$$
(ii) k is a strictly increasing, concave function such that $k(0) = 0$ and for every $\varepsilon > 0$
$$\int_0^{\varepsilon}\frac{ds}{k(s)} = \infty.$$
17.2 Existence
We are interested in studying the solvability in the strong sense, which, as seen
in Sect. 14.1, requires that the solution is adapted to the standard filtration of the
Brownian motion and the initial datum. As stated3 in [124], the point where Itô's original theory of strong solutions of SDEs proves to be truly effective is the
theory of flows, which plays an important role in many applications: in this regard,
we indicate [82] as a reference monograph (additional valuable resources include
[12, 47] and [51]).
Theorem 17.2.1 (Strong Solvability and Flow Property [!]) Suppose that the coefficients $b, \sigma$ satisfy the standard assumptions4 (14.4.1) and (14.4.2) on $]t_0, T[\times\mathbb{R}^N$. Given a set-up $(W, \mathcal{F}_t)$, we have:
3 [124] page 136: "Where the 'strong' or 'pathwise' approach of Itô's original theory of SDEs really comes into its own is in the theory of flows. Flows are now very big business; and the martingale-problem approach, for all that it has other interesting things to say, cannot deal with them in any natural way."
4 Actually, using a localization argument as in the proof of Theorem 17.1.1, it is sufficient to assume
$$X_t^{(0)} \equiv x, \qquad X_t^{(n)} = x + \int_{t_0}^{t} b\big(s, X_s^{(n-1)}\big)\,ds + \int_{t_0}^{t}\sigma\big(s, X_s^{(n-1)}\big)\,dW_s, \qquad n \in \mathbb{N}. \tag{17.2.4}$$
with .c = c(T , d, N, x, c1 , c2 ) > 0 where .c1 , c2 are the constants of the standard
assumptions on coefficients. Let .n = 1: by (14.4.4) we have
$$\mathbb{E}\left[\sup_{t_0\le t\le t_1}\big|X_t^{(1)} - X_t^{(0)}\big|^2\right] = \mathbb{E}\left[\sup_{t_0\le t\le t_1}\left|\int_{t_0}^{t} b(s,x)\,ds + \int_{t_0}^{t}\sigma(s,x)\,dW_s\right|^2\right]$$
(by (14.4.5))
$$\le \bar{c}_2\int_{t_0}^{t_1}\mathbb{E}\left[\sup_{t_0\le r\le s}\big|X_r^{(n)} - X_r^{(n-1)}\big|^2\right]ds \le \cdots \le \frac{(4cT)^n}{n!}, \qquad n \in \mathbb{N}.$$
Then, by Borel-Cantelli's Lemma 1.3.28 in [113] we have
$$P\left(\sup_{t_0\le t\le T}\big|X_t^{(n)} - X_t^{(n-1)}\big| \ge \frac{1}{2^n}\ \text{i.o.}\right) = 0,$$
that is, for almost every $\omega \in \Omega$ there exists $n_\omega \in \mathbb{N}$ such that
$$\sup_{t_0\le t\le T}\big|X_t^{(n)}(\omega) - X_t^{(n-1)}(\omega)\big| \le \frac{1}{2^n}, \qquad n \ge n_\omega.$$
Since
$$X_t^{(n)} = x + \sum_{k=1}^{n}\big(X_t^{(k)} - X_t^{(k-1)}\big),$$
it follows that, almost surely, $X_t^{(n)}$ converges uniformly in $t \in [t_0, T]$ as $n \to +\infty$ to a limit that we denote by $X_t$: to express this fact, in symbols we write $X_t^{(n)} \Rightarrow X_t$ a.s. Note that $X = (X_t)_{t\in[t_0,T]}$ is a.s. continuous (thanks to the uniform convergence) and adapted to $\mathcal{F}^W$: moreover, $X_t = X_t(x, \omega) \in m(\mathscr{B}_N \otimes \mathcal{F}_t^W)$ for each $t \in [t_0, T]$ because this measurability property holds for $X_t^{(n)}$ for each $n \in \mathbb{N}$.
By (14.4.1) and since X is a.s. continuous, it is clear that condition (14.1.3) is satisfied. To verify that, almost surely, we have
$$X_t = x + \int_{t_0}^{t} b(s, X_s)\,ds + \int_{t_0}^{t}\sigma(s, X_s)\,dW_s, \qquad t \in [t_0, T],$$
This concludes the proof of existence in the case of deterministic initial datum.
(2) Now consider the case of a random initial datum .Z ∈ mFt0 . Let .f = f (x, ω)
be the function on .RN × Ω defined by
$$f(x, \cdot) := \sup_{t_0\le t\le T}\left|X_t^{t_0,x} - x - \int_{t_0}^{t} b\big(s, X_s^{t_0,x}\big)\,ds - \int_{t_0}^{t}\sigma\big(s, X_s^{t_0,x}\big)\,dW_s\right|.$$
Note that $f \in m(\mathscr{B}_N \otimes \mathcal{F}_T^W)$ since $X_t^{t_0,\cdot} \in m(\mathscr{B}_N \otimes \mathcal{F}_t^W)$ for each $t \in [t_0, T]$. Moreover, for each $x \in \mathbb{R}^N$ we have $f(x, \cdot) = 0$ a.s. and therefore also $F(x) := \mathbb{E}\left[f(x, \cdot)\right] = 0$.
(by the freezing lemma in Theorem 4.2.10 in [113], since $Z \in m\mathcal{F}_{t_0}$, $f \in m(\mathscr{B}_N \otimes \mathcal{F}_T^W)$ with $\mathcal{F}_{t_0}$ and $\mathcal{F}_t^W$ independent $\sigma$-algebras by Remark 14.1.4 and $f \ge 0$)
$$= \mathbb{E}\left[f(Z, \cdot)\mid\mathcal{F}_{t_0}\right].$$
It follows that
$$\mathbb{E}\left[f(Z, \cdot)\right] = 0$$
and therefore .Xt0 ,Z in (17.2.2) is a solution of the SDE (17.0.1); actually, .Xt0 ,Z
is a strong solution because it is clearly adapted to .F Z,W .
that is, $X^{t_0,Z}$ is a solution on $[t, T]$ of the SDE (17.0.1) with initial datum $X_t^{t_0,Z}$. On the other hand, as proven in point (2), also $X^{t, X_t^{t_0,Z}}$ is a solution of the same SDE. By uniqueness, the processes $X^{t_0,Z}$ and $X^{t, X_t^{t_0,Z}}$ are indistinguishable on $[t, T]$. This proves (17.2.3) and concludes the proof of the theorem. □
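The Picard scheme (17.2.4) behind the existence proof can be run pathwise along a fixed Brownian trajectory; a sketch for the scalar SDE $dX = -X\,dt + dW$ (our test equation, with $b(x) = -x$ and $\sigma = 1$), where the sup-distance between successive iterates should decay factorially as in the proof:

```python
import numpy as np

rng = np.random.default_rng(4)

# Picard iteration (17.2.4) along one fixed Brownian path on [0, 1].
T, n = 1.0, 2000
dt = T / n
W = np.concatenate([[0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n))])

x = 1.0
X = np.full(n + 1, x)        # X^(0) identically equal to x
dists = []
for _ in range(8):
    # X^(k+1)_t = x + int_0^t (-X^(k)_s) ds + W_t  (Riemann sum for ds-integral)
    drift = np.concatenate([[0.0], np.cumsum(-X[:-1] * dt)])
    X_new = x + drift + W
    dists.append(np.max(np.abs(X_new - X)))
    X = X_new

print(dists)  # sup-distances between successive iterates shrink rapidly
```

The decay mirrors the bound $(4cT)^n/n!$ appearing in the convergence estimate of the proof.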
In this section we show that, under suitable assumptions, the solution of an SDE is a
continuous Markov process (i.e., a diffusion). Hereafter, we will refer systematically
to the results of Sect. 2.5 concerning the characteristic operator of a Markov process.
Theorem 17.3.1 (Markov Property [!]) Assume that the coefficients .b, σ satisfy
conditions (14.4.1) and (17.1.1) of linear growth and local Lipschitz continuity. If
.X ∈ SDE(b, σ, W, Ft ) then X is a Markov process with transition law p where,
$$\mathcal{A}_t = \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}(t,x)\,\partial_{x_i x_j} + \sum_{j=1}^{N} b_j(t,x)\,\partial_{x_j}, \qquad c_{ij} := (\sigma\sigma^*)_{ij}. \tag{17.3.1}$$
$$p(t, X_t; s, H) = P(X_s \in H \mid X_t), \qquad t_0 \le t \le s \le T,\ H \in \mathscr{B}_N.$$
$$X_s = X_s^{t, X_t} \qquad \text{for every } s \in [t, T].$$
Therefore, we have
$$\mathbb{E}\left[\mathbb{1}_H\big(X_s^{t,X_t}\big)\,\Big|\, X_t\right] =$$
(by (4.2.7) in [113] of the freezing lemma, being $X_t \in m\mathcal{F}_t$ and therefore, by Remark 14.1.4, independent of $\mathcal{F}_s^W$, and $(x, \omega) \mapsto \mathbb{1}_H\big(X_s^{t,x}(\omega)\big) \in m(\mathscr{B}_N \otimes \mathcal{F}_s^W)$ thanks to (17.2.1))
$$= \mathbb{E}\left[\mathbb{1}_H\big(X_s^{t,x}\big)\right]\Big|_{x=X_t} = p(t, X_t; s, H).$$
On the other hand, it is enough to repeat the previous steps, conditioning on .Ft
instead of .Xt , to prove the Markov property
$$p(t, X_t; s, H) = P(X_s \in H \mid \mathcal{F}_t), \qquad 0 \le t_0 \le t \le s \le T,\ H \in \mathscr{B}_N.$$
Finally, the fact that $\mathcal{A}_t$ is the characteristic operator of X has been proved in Sect. 15.1 (in particular, compare (15.1.1) with definition (2.5.5)). □
Remark 17.3.2 Under the assumptions of Theorem 17.3.1, by the Markov property we have
$$\mathbb{E}\left[\varphi(X_T)\mid\mathcal{F}_t\right] = u(t, X_t)$$
where
$$u(t, x) := \int_{\mathbb{R}^N} p(t, x; T, dy)\,\varphi(y).$$
We recall that, by the results of Sects. 2.5.3 and 2.5.2, the transition law p is a
solution of the Kolmogorov backward and forward equations, given respectively by
where .As∗ indicates the adjoint operator of .At in (17.3.1), acting in the forward
variable y.
$$0 = \varphi(T, X_T^{t,x}) - \varphi(t, x) = \int_t^T (\partial_s + \mathcal{A}_s)\,\varphi(s, X_s^{t,x})\,ds + \int_t^T \nabla\varphi(s, X_s^{t,x})\,\sigma(s, X_s^{t,x})\,dW_s$$
where .At is the characteristic operator in (17.3.1). Applying the expected value and
Fubini’s theorem, we obtain
$$0 = \mathbb{E}\left[\int_t^T (\partial_s + \mathcal{A}_s)\,\varphi(s, X_s^{t,x})\,ds\right] = \int_t^T \mathbb{E}\left[(\partial_s + \mathcal{A}_s)\,\varphi(s, X_s^{t,x})\right]ds = \int_t^T\!\!\int_{\mathbb{R}^N} (\partial_s + \mathcal{A}_s)\,\varphi(s, y)\,p(t, x; s, dy)\,ds \tag{17.3.2}$$
where .p(t, x; s, dy) denotes the law of the random variable .Xst,x which, by
Theorem 17.3.1, is the transition law of the Markov process X.
By (17.3.2), for every .t ≥ 0 we have
$$\int_{\mathbb{R}^{N+1}} (\partial_s + \mathcal{A}_s)\,\varphi(s, y)\,p(t, x; s, dy)\,ds = 0, \qquad \varphi \in C_0^{\infty}(]t, +\infty[\times\mathbb{R}^N),$$
and thus we recover the result of Sect. 2.5.3 according to which p is a distributional
solution of the forward Kolmogorov equation
$$\big(\partial_s - \mathcal{A}_s^*\big)\,p(t, x; s, \cdot) = 0, \qquad s > t. \tag{17.3.3}$$
(by (17.4.3))
$$\le c\,\mathbb{E}\left[\big|X_{t_1}^{t_0,Z_1} - Z_1\big|^p\right] \le$$
(by (14.4.4))
$$\le c\,\bar{c}_1\,|t_1 - t_0|^{\frac{p-2}{2}}\int_{t_0}^{t_1}\left(1 + \mathbb{E}\left[\sup_{t_0\le r\le s}\big|X_r^{t_0,Z_1}\big|^p\right]\right)ds \le$$
We estimate the last term of (17.4.2) using a completely analogous approach, which concludes the proof. □
Corollary 17.4.2 (Feller and Strong Markov Properties) Under the standard
assumptions (14.4.1)–(14.4.2) and the usual conditions on the filtration, every X ∈
SDE(b, σ, W, Ft ) is a Feller process and satisfies the strong Markov property.
Proof By Theorem 17.3.1, X is a Markov process with transition law p =
p(t, x; T , ·) where, for every t, T ≥ 0 with t ≤ T and x ∈ RN , p(t, x; T , ·) is
the law of the r.v. $X_T^{t,x}$. By (17.4.1) and Kolmogorov's continuity theorem (in the multidimensional version of Theorem 3.3.4), the process $(t, x, T) \mapsto X_T^{t,x}$ admits a modification $\widetilde{X}_T^{t,x}$ with locally $\alpha$-Hölder continuous trajectories for every $\alpha \in [0, 1[$ with respect to the so-called "parabolic" distance: precisely, for every $\alpha \in [0, 1[$, $n \in \mathbb{N}$ and $\omega \in \Omega$ there exists $c_{\alpha,n,\omega} > 0$ such that
$$\big|\widetilde{X}_r^{t,x}(\omega) - \widetilde{X}_u^{s,y}(\omega)\big| \le c_{\alpha,n,\omega}\left(|x - y| + |t - s|^{\frac{1}{2}} + |r - u|^{\frac{1}{2}}\right)^{\alpha}.$$
is continuous thanks to the dominated convergence theorem and this proves the Feller property. The strong Markov property follows from Theorem 7.1.2. □
Chapter 18
Weak Solutions
In this chapter, we present weak existence and uniqueness results for SDEs with coefficients $b, \sigma$ as in (18.0.1), where $N, d \in \mathbb{N}$ and $T > 0$ are fixed. To this end, we describe what is known as
the "martingale problem" due to Stroock and Varadhan [136]: this problem pertains to the construction of a distribution with respect to which the canonical process X is a semimartingale with drift $b(t, X_t)$ and covariance matrix $(\sigma\sigma^*)(t, X_t)$. The
solution to the martingale problem, if it exists, is the law of the solution of the
corresponding SDE: in fact, the martingale problem turns out to be equivalent to the
weak solvability problem.
The analytical results on the fundamental solution of parabolic PDEs (cf.
Chap. 20) provide a solution to the martingale problem under Hölder regularity
and uniform ellipticity assumptions on the coefficients. Under these assumptions,
we prove existence and uniqueness in the weak sense for SDEs, along with
strong Markov, Feller, and other regularity properties of the trajectories of the
solution. We also showcase broader findings from prominent mathematicians,
including Skorokhod, Stroock, Varadhan, Krylov, Veretennikov and Zvonkin. In
the last section, we prove a “regularization by noise” result that guarantees strong
uniqueness for SDEs with bounded Hölder drift.
The results of this chapter mark the endpoint of the study of construction methods
for diffusions, whose historical motivations had been illustrated in Sect. 2.6.
Assume that the SDE with coefficients .b, σ admits a weak solution .(X, W ) and
denote as usual by .μX,W its law. By Lemma 14.3.5, the canonical process .(X, W) is
also a solution of the SDE with coefficients .b, σ on the space .(ΩN +d , GTN +d , μX,W )
and consequently,1 for each .i, j = 1, . . . , N, the processes
$$M_t^i := X_t^i - \int_0^t b_i(s, X_s)\,ds, \tag{18.1.1}$$
$$M_t^{ij} := M_t^i M_t^j - \int_0^t c_{ij}(s, X_s)\,ds, \qquad (c_{ij}) := \sigma\sigma^*, \tag{18.1.2}$$
are local martingales with respect to the filtration .(GtN +d )t∈[0,T ] generated by
.(X, W).
Note that the Brownian motion .W does not appear in the definitions (18.1.1) and
(18.1.2) and, still denoting by .X the identity process on .ΩN , one can verify that the
processes formally defined as in (18.1.1) and (18.1.2) are local martingales on the
space .(ΩN , GTN , μX ). This motivates the following
Definition 18.1.1 (Martingale Problem) A solution to the martingale problem
for .b, σ is a probability measure on the canonical space .(ΩN , GTN ) such that the
processes .Mi , Mij in (18.1.1) and (18.1.2) are local martingales with respect to the
filtration .GtN generated by the identity process .X.
Remark 18.1.2 ([!!]) It is worth emphasizing that the martingale condition on the processes in (18.1.1) and (18.1.2) basically means that X is a semimartingale with drift $b(t, X_t)$ and covariation matrix $\mathcal{C}_t := \big(c_{ij}(t, X_t)\big)$.
If .(X, W ) is a solution of the SDE with coefficients .b, σ then .μX is a solution
of the martingale problem for .b, σ . We now show a result in the opposite direction
then
$$\langle M^i, M^j\rangle_t = \int_0^t c_{ij}(s, X_s)\,ds$$
that allows us to conclude that the martingale problem and the weak solvability of
an SDE are equivalent.
Theorem 18.1.3 (Stroock and Varadhan) If .μ is a solution to the martingale
problem for .b, σ , then there exists a weak solution to the SDE with coefficients .b, σ
and initial law .μ0 defined by
μ0 (H ) := μ(X0 ∈ H ),
. H ∈ BN .
Proof We provide the proof only in the scalar case .N = d = 1 and refer, for
example, to Section 5.4.B in [67] for the general case. The fact that .μ is a solution
to the martingale problem for .b, σ , means that the process defined on .(ΩN , GTN , μ)
as in (18.1.1), that is
$$M_t = X_t - \int_0^t b(s, X_s)\,ds, \tag{18.1.3}$$
$$\int_0^t \sigma(s, X_s)\,dB_s = X_t - X_0 - \int_0^t b(s, X_s)\,ds,$$
that is, .(X, B) is a solution to the SDE with coefficients .b, σ . Note that the solution
.(X, B) is defined on the space .(ΩN , GTN , μ).
In the general case where .σ can be zero, consider the space .(ΩN +d , GTN +d , μ ⊗
μW ) where .μW is the Wiener measure and the canonical process .(X, W) is such
that .W is a real Brownian motion (we recall that we are dealing only with the case
$N = d = 1$). Let $J_t = \mathbb{1}_{(\sigma(t, X_t)\ne 0)}$ and
$$B_t = \int_0^t \frac{J_s}{\sigma(s, X_s)}\,dM_s + \int_0^t (1 - J_s)\,dW_s.$$
where in the last step we used the fact that, by the Itô isometry,
$$\mathbb{E}\left[\left(\int_0^t (J_s - 1)\,dM_s\right)^2\right] = \mathbb{E}\left[\int_0^t (J_s - 1)^2\,\sigma^2(s, X_s)\,ds\right] = 0. \qquad \square$$
Remark 18.1.4 It is interesting to note in the previous proof that if .σ /= 0, i.e., in
the non-degenerate case, the Brownian motion .B is constructed as a functional of .X
and therefore the space .ΩN is sufficient to “support” the solution .(X, B) of the SDE.
On the contrary, in the degenerate case where .σ can be zero, the Brownian motion
.W comes into play to “guarantee sufficient randomness” to the system and it is
therefore necessary to define the solution on the enlarged space .ΩN +d . This further
explicates the difference between weak and strong solutions illustrated earlier in
Remarks 14.1.7 and 14.3.7.
Remark 18.1.5 Stroock and Varadhan (cf. Theorem 6.2.3 in [136]) prove that, for the martingale problem, the equality of marginal distributions implies the equality of finite-dimensional distributions and therefore the uniqueness in law. Precisely, suppose that $b, \sigma$ are measurable and bounded functions: if for every $t \in [0, T]$, $x \in \mathbb{R}^N$ and $\varphi \in bC(\mathbb{R}^N)$ we have
$$\mathbb{E}^{\mu_1}\left[\varphi(X_t)\right] = \mathbb{E}^{\mu_2}\left[\varphi(X_t)\right]$$
where $\mu_1, \mu_2$ are solutions of the martingale problem for $b, \sigma$ with initial law $\delta_x$, then there exists at most one solution of the martingale problem for $b, \sigma$ with initial law $\delta_x$. Hereafter, we will not use this result but will adopt a more analytical approach to prove weak uniqueness using existence theorems for the Kolmogorov equation associated with the SDE.
18.2 Equations with Hölder Coefficients
We consider an SDE with coefficients .b, σ as in (18.0.1) and define the diffusion
matrix
$$\mathcal{C} = (c_{ij}) := \sigma\sigma^*.$$
The elements of .bCTα are continuous functions in .(t, x), Hölder continuous in
the spatial variable x, uniformly with respect to the time variable t. In fact, the
continuity condition in t can be omitted and will only be assumed for the sake of
simplifying the presentation.
In this section, we prove a weak existence and uniqueness result for SDEs under the following
Assumption 18.2.2
(i) .cij , bi ∈ bCTα for some .α ∈ ]0, 1] and for each .i, j = 1, . . . , N ;
(ii) the diffusion matrix $\mathcal{C}$ is uniformly positive definite: there exists a positive constant $\lambda_0$ such that
$$\frac{1}{\lambda_0}\,|\eta|^2 \le \langle \mathcal{C}(t, x)\eta, \eta\rangle \le \lambda_0\,|\eta|^2, \qquad (t, x) \in ]0, T[\times\mathbb{R}^N,\ \eta \in \mathbb{R}^N. \tag{18.2.2}$$
Theorem 18.2.3 ([!!]) Under Assumption 18.2.2, for every distribution $\mu_0$ on $\mathbb{R}^N$ there exists a weak solution $(X, W)$, unique in law, of the SDE
$$\mathcal{A}_t := \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}(t, x)\,\partial_{x_i x_j} + \sum_{i=1}^{N} b_i(t, x)\,\partial_{x_i}, \qquad (t, x) \in ]0, T[\times\mathbb{R}^N.$$
(iii) X admits a modification with $\beta$-Hölder continuous trajectories for every $\beta < \frac{1}{2}$.
The proof of Theorem 18.2.3 is based on the existence results of the fundamental
solution for parabolic PDEs of Theorem 18.2.6 below.
Notation 18.2.4 We denote by .C 1,2 (]0, T [×RN ) the space of functions defined
on .]0, T [×RN that are continuously differentiable with respect to t and twice
continuously differentiable with respect to x.
Definition 18.2.5 (Backward Cauchy Problem) A classical solution of the backward Cauchy problem for the operator $\partial_t + \mathcal{A}_t$ on $]0, s[\times\mathbb{R}^N$ is a function $u \in C^{1,2}(]0, s[\times\mathbb{R}^N)\cap C(]0, s]\times\mathbb{R}^N)$ such that
$$\begin{cases} \partial_t u(t, x) + \mathcal{A}_t u(t, x) = 0, & (t, x) \in ]0, s[\times\mathbb{R}^N,\\ u(s, x) = \varphi(x), & x \in \mathbb{R}^N. \end{cases} \tag{18.2.4}$$
(i) for every $s \in ]0, T]$ and for every $\varphi \in bC(\mathbb{R}^N)$ the function defined by
$$u(t, x) = \int_{\mathbb{R}^N}\Gamma(t, x; s, y)\,\varphi(y)\,dy, \qquad (t, x) \in ]0, s[\times\mathbb{R}^N, \tag{18.2.5}$$
is a transition law,4 enjoys the Feller property (cf. Definitions 2.1.1 and 2.1.10)
and satisfies the Chapman-Kolmogorov equation (2.4.4);
(iii) for every $(s, y) \in ]0, T]\times\mathbb{R}^N$, we have $\Gamma(\cdot, \cdot; s, y) \in C^{1,2}(]0, s[\times\mathbb{R}^N)$ and the following Gaussian estimates hold: there exist two positive constants $\lambda, c$ that depend only on $T, N, \alpha, \lambda_0, [c_{ij}]_\alpha$ and $[b_i]_\alpha$, for which we have
$$\frac{1}{c}\,G\big(\lambda^{-1}(s-t), x-y\big) \le \Gamma(t, x; s, y) \le c\,G\big(\lambda(s-t), x-y\big), \tag{18.2.6}$$
$$\big|\partial_{x_i}\Gamma(t, x; s, y)\big| \le \frac{c}{\sqrt{s-t}}\,G\big(\lambda(s-t), x-y\big),$$
$$\big|\partial_{x_i x_j}\Gamma(t, x; s, y)\big| + \big|\partial_t\Gamma(t, x; s, y)\big| \le \frac{c}{s-t}\,G\big(\lambda(s-t), x-y\big)$$
for every $(t, x) \in ]0, s[\times\mathbb{R}^N$, where G denotes the standard N-dimensional Gaussian function
$$G(t, x) = \frac{1}{(2\pi t)^{N/2}}\,e^{-\frac{|x|^2}{2t}}, \qquad t > 0,\ x \in \mathbb{R}^N.$$
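The kernel G above is the Gaussian transition density, and item (ii) includes the Chapman-Kolmogorov equation; for G itself this can be verified by direct quadrature (grid and test points are our choices):

```python
import numpy as np

# Numerical check of the Chapman-Kolmogorov equation for the Gaussian kernel
# in dimension N = 1: int G(t1, x - z) G(t2, z - y) dz = G(t1 + t2, x - y).
def G(t, x):
    return np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

t1, t2, x, y = 0.3, 0.5, 0.7, -0.2
z = np.linspace(-15.0, 15.0, 200001)
dz = z[1] - z[0]
lhs = np.sum(G(t1, x - z) * G(t2, z - y)) * dz
rhs = G(t1 + t2, x - y)
print(lhs, rhs)  # equal up to quadrature error
```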
$$\psi_i(x) = x_i, \qquad \psi_{ij}(x) = x_i x_j, \qquad x \in \mathbb{R}^N,\ i, j = 1, \ldots, N,$$
We observe that the boundedness hypothesis of the coefficients and the Gaussian
estimate from above (18.2.6) guarantee that .At ψi (Xt ), At ψij (Xt ) ∈ L1 ([0, T ] ×
ΩN ): then from Theorem 2.5.13 it follows that the processes
ˆ t
Mit := Xit −
. bi (s, Xs )ds,
0
ˆ t⎛ ⎞
ij j j
Zt := Xit Xt − cij (s, Xs ) + bi (s, Xs )Xs + bj (s, Xs )Xis ds
0
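To make the martingale property of M^i concrete, one can simulate a one-dimensional toy SDE dX_t = b(t,X_t)dt + dW_t with an Euler scheme (a hypothetical discretization, not the book's construction) and check empirically that M_t = X_t − ∫_0^t b(s,X_s)ds has constant expectation; the drift, seed and sample sizes below are arbitrary.

```python
import math, random

random.seed(0)
b = lambda t, x: math.sin(x)        # a bounded drift, chosen only for illustration
n_paths, n_steps, T = 10000, 100, 1.0
dt = T / n_steps

mean_M = [0.0] * (n_steps + 1)      # Monte Carlo mean of M_t = X_t - int_0^t b(s, X_s) ds
for _ in range(n_paths):
    x, drift_int = 0.0, 0.0
    for k in range(n_steps):
        mean_M[k] += (x - drift_int) / n_paths
        drift = b(k * dt, x)
        drift_int += drift * dt
        x += drift * dt + random.gauss(0.0, math.sqrt(dt))
    mean_M[n_steps] += (x - drift_int) / n_paths
```

Since M_0 = 0, the Monte Carlo mean of M_t stays near zero at every time step, up to statistical noise of order n_paths^{−1/2}.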
Uniqueness in Law and Main Properties Let us prove that if (X, W) is a weak solution of the SDE (18.2.3) on [0,T] then X is a Markov process. Fix ϕ ∈ bC(R^N) and consider the solution⁶ u in (18.2.5) of the backward Cauchy problem (18.2.4). Note that u is a bounded function since, by the Gaussian estimate (18.2.6), we have

|u(t,x)| ≤ c ‖ϕ‖_{L∞(R^N)} ∫_{R^N} G(λ(s−t), x−y) dy = c ‖ϕ‖_{L∞(R^N)}.
Given the arbitrariness of .ϕ, it follows that X is a Markov process with transition
density .𝚪: by Theorem 18.2.6-(ii) X is a Feller process and therefore also enjoys
the strong Markov property by Theorem 7.1.2.
By Kolmogorov's continuity Theorem 3.3.4, the process X admits a modification with β-Hölder continuous trajectories for every β < 1/2: indeed, for every 0 ≤ t < s ≤ T and p > 0, the following integral estimate holds

E[ |X_t − X_s|^p ] = E[ E[ |X_t − X_s|^p | X_t ] ]
⁵ Possibly extending the canonical space to support also the Brownian motion W with respect to which to write the SDE, as in the proof of Theorem 18.1.3 and in the subsequent Remark 18.1.4.
6 As we will see in Chap. 20, in general the Cauchy problem (18.2.4) admits more than one
solution.
= E[ ∫_{R^N} |X_t − y|^p Γ(t, X_t; s, y) dy ] ≤
We present an existence and uniqueness result for weak solutions under significantly
broader assumptions than those of Theorem 18.2.3.
Theorem 18.3.1 (Skorokhod [131], Stroock and Varadhan [136], Krylov [74,
75] [!!]) Let .μ0 be a distribution on .RN . Suppose that
(i) the coefficients .b, σ are bounded, Borel measurable functions
and at least one of the following assumptions holds:
(ii) .b(t, ·), σ (t, ·) are continuous functions for every .t ∈ [0, T ];
(iii) condition (18.2.2) of uniform ellipticity holds.
Then there exists a weak solution .(X, W ) of the SDE
on .[0, T ] with initial law .μ0 . Moreover, if both assumptions (ii) and (iii) hold, then
there is also uniqueness in the weak sense.
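Weak solvability concerns the law of the solution rather than a pathwise construction, and numerically one samples this law, for instance with an Euler scheme. Below is a sketch for the Ornstein-Uhlenbeck equation dX_t = −X_t dt + dW_t, chosen only because the marginal law of X_T is known in closed form; all numerical parameters are arbitrary.

```python
import math, random

random.seed(1)
x0, T, n_steps, n_paths = 1.0, 1.0, 100, 10000
dt = T / n_steps

samples = []
for _ in range(n_paths):
    x = x0
    for _ in range(n_steps):
        x += -x * dt + random.gauss(0.0, math.sqrt(dt))  # Euler-Maruyama step
    samples.append(x)

mc_mean = sum(samples) / n_paths
mc_var = sum((s - mc_mean) ** 2 for s in samples) / n_paths
exact_mean = x0 * math.exp(-T)           # E[X_T] = x0 e^{-T}
exact_var = (1 - math.exp(-2 * T)) / 2   # Var[X_T] = (1 - e^{-2T})/2
```

The simulated mean and variance match the exact Gaussian marginal up to discretization bias O(dt) and Monte Carlo noise.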
As for Theorem 18.2.3, the proof of weak solvability hinges on the martingale
problem, and therefore consists in the construction of the law of the solution.
However, in the proof of Theorem 18.2.3, this probability distribution is defined by
the fundamental solution of the backward Kolmogorov equation, whose existence is
18.4 Strong Uniqueness Through Regularization by Noise
The main result of the section is the following theorem that provides an example of
“regularization by noise”: it extends the results of Sect. 14.2 to the case of strong
solutions.
Theorem 18.4.1 (Zvonkin [154], Veretennikov [144] [!!]) Assume the following
hypotheses:
(i) the drift coefficient is bounded and Hölder continuous, .b ∈ bCTα for some
.α ∈ ]0, 1];
for SDEs with irregular coefficients, without assuming the uniform ellipticity of
the diffusion matrix, starting from suitable .Lp estimates for the solutions of the
associated Fokker-Planck equation. Finally, we point out the recent results in [57]
on the approximation of solutions, under minimal regularity assumptions.
For the proof of Theorem 18.4.1 we follow Fedrizzi and Flandoli [43] who use
the so-called Itô-Tanaka trick and the following
Proposition 18.4.3 Under the assumptions of Theorem 18.2.6, let Γ be the fundamental solution of the Kolmogorov operator ∂_t + A_t on ]0,T[ × R^N. For every λ ≥ 1, the R^N-valued function

u_λ(t,x) := ∫_t^T ∫_{R^N} e^{−λ(s−t)} Γ(t,x; s,y) b(s,y) dy ds,   (t,x) ∈ ]0,T] × R^N,
Moreover, there exists a constant c > 0, which depends only on N, λ_0, T and the norms [b_i]_α and [c_ij]_α in (18.2.1), such that

|u_λ(t,x) − u_λ(t,y)| ≤ c |x − y| / √λ,   (18.4.2)

|∇_x u_λ(t,x) − ∇_x u_λ(t,y)| ≤ c |x − y|,
⁷ Here (∇_x u_λ · σ)_ij = Σ_{k=1}^N (∇_x u_λ)_ik σ_kj,   i = 1, …, N,  j = 1, …, d.
or equivalently

∫_0^t b(s, X_s) ds = u_λ(0, X_0) − u_λ(t, X_t) + λ ∫_0^t u_λ(s, X_s) ds + ∫_0^t (∇_x u_λ · σ)(s, X_s) dW_s.   (18.4.3)
In this way, the drift coefficient b is replaced by the more regular function u_λ: at this point, with some small adjustments, one can proceed as in the case of Lipschitz coefficients, using Grönwall's lemma to prove uniqueness. In fact, let X' be another solution of the SDE (18.4.1) with respect to the same Brownian motion W and let Z := X − X'. Writing X' as in (18.4.4) as well and subtracting the two equations, we obtain
ˆ t
Zt = −uλ (t, Xt ) + uλ (t, Xt' ) + λ
. (uλ (s, Xs ) − uλ (s, Xs' ))ds
0
ˆ t ( )
+ σ (s, Xs ) − σ (s, Xs' ) dWs
0
ˆ t ( )
+ (∇x uλ · σ )(s, Xs ) − (∇x uλ · σ )(s, Xs' ) dWs .
0
(1/4) E[ |Z_t|² ] ≤ E[ |u_λ(t, X_t) − u_λ(t, X'_t)|² ]

  + λ² T E[ ∫_0^t |u_λ(s, X_s) − u_λ(s, X'_s)|² ds ]

  + E[ ∫_0^t |σ(s, X_s) − σ(s, X'_s)|² ds ]

  + E[ ∫_0^t |(∇_x u_λ · σ)(s, X_s) − (∇_x u_λ · σ)(s, X'_s)|² ds ] ≤
(by the estimates (18.4.2) of Proposition 18.4.3 with λ ≥ 1 and the Lipschitz assumption on σ)

≤ (c/λ) E[ |Z_t|² ] + c(1 + λ) ∫_0^t E[ |Z_s|² ] ds,

for some positive constant c that depends only on N, λ_0, T and the norms [b]_α and [σ]_1. In other words, we have

( 1/4 − c/λ ) E[ |Z_t|² ] ≤ c(1 + λ) ∫_0^t E[ |Z_s|² ] ds,

and choosing λ large enough that 1/4 − c/λ > 0 we get

E[ |Z_t|² ] ≤ c̄ ∫_0^t E[ |Z_s|² ] ds

for a suitable positive constant c̄. The thesis follows from Grönwall's lemma. □
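Grönwall's lemma, which closes the argument, states that f(t) ≤ a + b ∫_0^t f(s) ds implies f(t) ≤ a e^{bt}. A quick numerical illustration on the extremal case, where the integral inequality is an equality, follows; the constants are arbitrary.

```python
import math

a, b, T, n = 2.0, 3.0, 1.0, 100000
dt = T / n

# build the function saturating the inequality: f(t) = a + b * int_0^t f(s) ds,
# discretized on a uniform mesh (equivalent to f_{k+1} = f_k * (1 + b*dt))
f, integral = [a], 0.0
for _ in range(n):
    integral += f[-1] * dt
    f.append(a + b * integral)

gronwall_bound = a * math.exp(b * T)   # Gronwall: f(T) <= a e^{bT}
ratio = f[-1] / gronwall_bound         # approaches 1 from below as the mesh is refined
```

The discrete solution a(1 + b dt)^n stays below a e^{bT} and converges to it, showing the bound is sharp.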
Remark 18.4.4 Formula (18.4.4) can be used as in the proof of Theorem 17.4.1
to obtain the continuous dependence estimate (17.4.1) on the parameters. As a
consequence of Kolmogorov’s continuity Theorem 3.3.4, under the assumptions
of Theorem 18.4.1 the solution of the SDE (18.4.1) with initial datum x at
time t, admits a modification .(t, x, s) |→ Xst,x with locally .α-Hölder continuous
trajectories for every .α ∈ [0, 1[ with respect to the “parabolic” distance: precisely,
for every .α ∈ [0, 1[, .n ∈ N and .ω ∈ Ω there exists .cα,n,ω > 0 such that
|X_{s₁}^{t₁,x}(ω) − X_{s₂}^{t₂,y}(ω)| ≤ c_{α,n,ω} ( |x − y| + |t₁ − t₂|^{1/2} + |s₁ − s₂|^{1/2} )^α,   (18.4.5)

for every t₁, t₂, s₁, s₂ ∈ [0,T] such that t₁ ≤ s₁, t₂ ≤ s₂, and for every x, y ∈ R^N
such that .|x|, |y| ≤ n.
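The Kolmogorov-type estimate behind (18.4.5) rests on moment bounds of the form E|X_s − X_t|^p ≤ c |s − t|^{p/2}. For Brownian motion this scaling is exact, E|W_{t+h} − W_t|⁴ = 3h², and easy to check by Monte Carlo; the seed and sample size below are arbitrary.

```python
import math, random

random.seed(2)
p, n_samples = 4.0, 50000

def mean_abs_increment_p(h):
    # estimate E|W_{t+h} - W_t|^p by sampling the increment, which is N(0, h)
    total = sum(abs(random.gauss(0.0, math.sqrt(h))) ** p for _ in range(n_samples))
    return total / n_samples

m1 = mean_abs_increment_p(0.1)
m2 = mean_abs_increment_p(0.4)
# E|W_{t+h} - W_t|^4 = 3 h^2, so quadrupling h should multiply the moment by 16
scaling = m2 / m1
```

The observed fourth-moment ratio is close to 16, consistent with the exponent p/2 = 2 in the estimate.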
18.5 Key Ideas to Remember

We summarize the most relevant results of the chapter. As usual, if you have
any doubt about what the following succinct statements mean, please review the
corresponding section.
• Section 18.1: through the Stroock-Varadhan martingale problem, the study of
weak solvability of an SDE is reduced to the construction of a distribution (the
law of the solution) on the canonical space that makes the processes in (18.1.1)
and (18.1.2) martingales.
• Sections 18.2 and 18.3: we exploit the analytical results on the existence of
the fundamental solution of uniformly parabolic PDEs to solve the martingale
problem. As a consequence, we prove existence, weak uniqueness, and Markov
properties for SDEs with Hölder and bounded coefficients. The assumptions
are further weakened in Theorem 18.3.1 whose proof is based on properties of
relative compactness in the space of distributions.
• Section 18.4: we establish a “regularization by noise” result, ensuring strong
uniqueness for SDEs with Hölder continuous and bounded drift, under a uniform
ellipticity condition.
Main notations used or introduced in this chapter:
We offer a concise and relaxed exploration of various paths that the theory of
stochastic differential equations has taken. At the end of each section, we include a
bibliography, directing interested readers to further literature on the specific topics
discussed.
dX_t = u_t dt + v_t dB_t   (19.1.1)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_19
which “mimics” X in the sense that it has the same marginal distributions, i.e., it is such that Y_t =^d X_t for each t. This result can be useful when one is interested in
the law of .Xt for a fixed time t and not in the entire law of the process X. Since
the coefficients .b = b(t, y) and .σ = σ (t, y) in (19.1.2) are deterministic functions,
by the results of the previous chapters, Y is a Markov process, sometimes called
Markovian projection of X.
Remark 19.1.1 Processes with the same one-dimensional distributions can have very distinct properties: for example, we saw in Remark 4.1.5 that a Brownian motion W has the same one-dimensional distributions as the process W̃_t := √t W₁. However, despite this equivalence, the two processes are inherently distinct in law, and their trajectories demonstrate entirely different properties, as illustrated in Fig. 19.1.
Theorem 19.1.2 (Gyöngy [56]) Let X be an Itô process of the form (19.1.1) with
the coefficients .u, v being progressively measurable, bounded, and satisfying the
uniform ellipticity condition
for some positive constant λ. There exist two bounded and measurable functions

b : [0,T] × R^N → R^N,   σ : [0,T] × R^N → R^{N×N},
1 Formula (19.1.3) means that .b(t, ·) and .(σ σ ∗ )(t, ·) are respectively versions of the conditional
expectation functions of .ut and .vt vt∗ given .Xt , according to Definition 4.2.16 in [113].
and the SDE (19.1.2) with coefficients b, σ admits a weak solution (Y, W) such that Y_t =^d X_t for every t ∈ [0,T].
Proof We only give a sketch of the proof. Let b and C = (c_ij) be versions of the conditional expectation functions of u_t and v_t v_t^* given X_t respectively, as in (19.1.3). Moreover, let σ = C^{1/2} be the positive definite square root of the positive definite matrix C: the complete proof in [56] uses a regularization argument for the coefficients that allows one to reduce to the case where b_i(t,·) and c_ij(t,·) are at least Hölder continuous, so as to satisfy the hypotheses of Theorem 18.2.6 for the existence of a fundamental solution of the characteristic operator A_t + ∂_t, where

A_t := (1/2) Σ_{i,j=1}^N c_ij(t,x) ∂_{x_i x_j} + Σ_{i=1}^N b_i(t,x) ∂_{x_i}.
Hence, for fixed s ∈ ]0,T] and ϕ ∈ C_0^∞(R^N), consider the classical, bounded solution f of the backward Cauchy problem

∂_t f(t,x) + A_t f(t,x) = 0,   (t,x) ∈ ]0,s[ × R^N,
f(s,x) = ϕ(x),   x ∈ R^N.
Applying Itô's formula, we have

f(s, X_s) = f(0, X_0) + (1/2) Σ_{i,j=1}^N ∫_0^s (v_t v_t^*)_ij ∂_{x_i x_j} f(t, X_t) dt

  + ∫_0^s ( u_t ∇_x f(t, X_t) + ∂_t f(t, X_t) ) dt + ∫_0^s ∇_x f(t, X_t) v_t dB_t.   (19.1.4)
Taking expectations (the stochastic integral has zero expectation²), we get

E[f(s, X_s)] = E[f(0, X_0)] + (1/2) Σ_{i,j=1}^N ∫_0^s E[ (v_t v_t^*)_ij ∂_{x_i x_j} f(t, X_t) ] dt

  + ∫_0^s E[ u_t ∇_x f(t, X_t) + ∂_t f(t, X_t) ] dt =
2 Here we use a technical argument that relies on the analytical results of Chap. 20: the estimate
of Corollary 20.2.7 guarantees that .∇x f (t, Xt )vt ∈ L2 and therefore the stochastic integral in
(19.1.4) has zero expectation.
356 19 Complements
= E[f(0, X_0)] + (1/2) Σ_{i,j=1}^N ∫_0^s E[ E[(v_t v_t^*)_ij | X_t] ∂_{x_i x_j} f(t, X_t) ] dt

  + ∫_0^s E[ E[u_t | X_t] ∇_x f(t, X_t) + ∂_t f(t, X_t) ] dt =
(by (19.1.3))
= E[f(0, X_0)] + ∫_0^s E[ (A_t f + ∂_t f)(t, X_t) ] dt = E[f(0, X_0)].   (19.1.5)
On the other hand, by Theorem 18.3.1 there exists a weak solution (Y, W) of the SDE (19.1.2) with initial law equal to the law of X_0. By Itô's formula, the process f(t, Y_t) is a martingale³ and therefore, by (19.1.5), we have

E[ϕ(Y_s)] = E[f(0, Y_0)] = E[f(0, X_0)] = E[ϕ(X_s)],

so that Y_s =^d X_s, given the arbitrariness of ϕ. □
Remark 19.1.3 (Bibliographic Note) Markovian projection methods are widely used in mathematical finance for the calibration of local-stochastic volatility and interest rate models: in this regard, see, for example, [3], [83] and Section 11.5 in [55]. A version of Gyöngy's Theorem 19.1.2 that relaxes the hypotheses on the coefficients was more recently proven by Brunick and Shreve [22].
³ Precisely, by Itô's formula the process f(t, Y_t) is a local martingale, but it is also a true martingale by the boundedness of the function f.
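The conditional-expectation coefficients in (19.1.3) can be estimated from simulated paths. Below is a crude sketch for dX_t = Y dt + dW_t, where Y = ±1 is a hidden regime drawn at time 0, a toy model chosen because E[Y | X_t = x] = tanh(x) is known in closed form; paths are binned by the value of X_T to approximate the Markovian projection drift, and all numerical choices are illustrative.

```python
import math, random

random.seed(3)
T, n_steps, n_paths = 1.0, 50, 40000
dt = T / n_steps

pairs = []  # samples of (X_T, Y); the "drift" u_t = Y is constant along each path
for _ in range(n_paths):
    y = random.choice((-1.0, 1.0))
    x = 0.0
    for _ in range(n_steps):
        x += y * dt + random.gauss(0.0, math.sqrt(dt))
    pairs.append((x, y))

def b_hat(x0, width=0.2):
    # bin estimator of the projected drift b(T, x0) = E[Y | X_T = x0]
    vals = [y for (x, y) in pairs if abs(x - x0) < width]
    return sum(vals) / len(vals)

estimate = b_hat(1.0)
exact = math.tanh(1.0)   # for this toy model, E[Y | X_t = x] = tanh(x) for every t
```

The binned estimate converges to tanh(x₀) as the number of paths grows and the bin width shrinks.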
19.2 Backward Stochastic Differential Equations
in this case, we speak of backward SDEs (or BSDEs). The most elementary
example is
dY_t = 0,
Y_T = η.   (19.2.1)
Y_t = Y_0 + ∫_0^t Z_s dW_s = ( Y_0 + ∫_0^T Z_s dW_s ) − ∫_t^T Z_s dW_s = η − ∫_t^T Z_s dW_s.
Although it may not seem obvious, it is not difficult to prove that .(Y, Z) is the only
pair of processes in .L2 that satisfies (19.2.3): in fact, if (19.2.3) were also satisfied
by .(Y ' , Z ' ) ∈ L2 , then, setting .A = Y − Y ' and .B = Z − Z ' , we would have
dA_t = B_t dW_t,
A_T = 0,
and therefore, by Itô's formula,

A_t² = − ∫_t^T 2 A_s dA_s − ∫_t^T B_s² ds

and

E[ A_t² + ∫_t^T B_s² ds ] = − E[ ∫_t^T 2 A_s dA_s ] = 0
where the last equality is due to the fact that A, and therefore also the stochastic
integral, is a martingale. Based on what has just been proven, the following
definition is well posed.
Definition 19.2.1 Let W be a Brownian motion on the space (Ω, F, P) endowed with the standard filtration F^W. We say that the pair (Y, Z) ∈ L², unique solution of the SDE (19.2.3), is the adapted solution of the BSDE (19.2.1) with final datum η ∈ L²(Ω, F_T^W, P).
Note that by definition we have
dY_t = Z_t dW_t,
Y_T = η.
Given .u = u(t, x) ∈ C 1,2 ([0, T [×RN ), applying Itô’s formula to .Yt := u(t, Xt ) we
obtain
where .Y t,x is the solution of the FBSDE (19.2.5) with initial datum .Xt = x.
Formula (19.2.6) is a nonlinear Feynman-Kac formula that generalizes the classical
representation formula of Sect. 15.4.
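For the elementary BSDE (19.2.1) with final datum η = W_T², the adapted solution is explicit: Y_t = E[η | F_t] = W_t² + (T − t) and, by Itô's formula, Z_t = 2W_t. The terminal identity and the backward representation Y_t = η − ∫_t^T Z_s dW_s can be checked on a simulated path; the seed and mesh below are arbitrary.

```python
import math, random

random.seed(4)
T, n = 1.0, 20000
dt = T / n

W = [0.0]
for _ in range(n):
    W.append(W[-1] + random.gauss(0.0, math.sqrt(dt)))

Y = [w * w + (T - k * dt) for k, w in enumerate(W)]   # Y_t = W_t^2 + (T - t)
Z = [2 * w for w in W]                                # Z_t = 2 W_t
eta = W[-1] ** 2                                      # final datum eta = W_T^2

# backward representation Y_t = eta - int_t^T Z_s dW_s (Ito sum, left endpoints)
t_idx = n // 2
stoch_int = sum(Z[k] * (W[k + 1] - W[k]) for k in range(t_idx, n))
error = abs(Y[t_idx] - (eta - stoch_int))
```

The discrepancy comes only from replacing the quadratic variation Σ(ΔW)² by its limit T − t, so it shrinks as the mesh is refined.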
Remark 19.2.2 (Bibliographic Note) The main motivation for the study of
BSDEs comes from the theory of optimal stochastic control, starting from the works
[17] and [15]; some applications to mathematical finance are discussed in [39]. The
earliest results about existence and the nonlinear Feynman-Kac representation come
from [109], [117], and [2]. We point to the following books as essential references
for the theory of backward equations: Ma and Yong [93], Yong and Zhou [150],
Pardoux and Rascanu [110], and Zhang [152].
19.3 Filtering and Stochastic Heat Equation

In this section, we outline some basic ideas of the theory of stochastic filtering and, in a simple and explicit case, introduce the notion of stochastic partial differential equation (abbreviated SPDE), which arises naturally in this type of problem.
Given .(W, B) a standard two-dimensional Brownian motion, we consider the
process
X_t^σ := σ W_t + √(1 − σ²) B_t,   σ ∈ [0,1].
4 Since it is a non-linear problem, the solution u is understood in a generalized sense, for example
as a “viscosity solution” (see, for example, Theorem 2.1, Chap. 8 in [93]).
Suppose that .Xσ represents a signal that is transmitted but not observable with
precision due to some disturbance in the transmission: precisely, we assume that
we can observe precisely .Wt , called the observation process, while the Brownian
motion .Bt represents the noise in the transmission.
It is easy to verify that X^σ is a real Brownian motion for every σ ∈ [0,1]. The problem of stochastic filtering consists in obtaining the best estimate of the signal X^σ based on the observation W: in fact, it is not difficult to prove that

μ_{X_t^σ | F_t^W} = N_{σ W_t, (1−σ²)t}   (19.3.1)

where μ_{X_t^σ | F_t^W} denotes the conditional law of X_t^σ given the σ-algebra F_t^W of
observations on W up to time t (here .F W is the standard filtration for W ). To prove
(19.3.1) it is enough to calculate the conditional characteristic function
⎾ ⏋ ⎾ √ ⏋
iηXtσ iη 1−σ 2 Bt
.ϕXσ |F W (η) = E e | FtW =e iησ Wt
E e | FtW
=
t t
Γ(t,x) = (1/√(2πt)) e^{−x²/(2t)},   t > 0, x ∈ R.   (19.3.2)
If .0 ≤ σ < 1 then .Xtσ has the following conditional density given .FtW :
If .σ > 0, clearly the conditional density .pt (x) is a stochastic process: from a
practical standpoint, having the observation of .Wt available and inserting it into
(19.3.3), we obtain the expression of the law of .Xtσ estimated (or “filtered”) based
on such observation. Note that .pt (x) is a Gaussian function with stochastic drift,
dependent on the observation, and variance proportional to .1 − σ 2 . Figure 19.2
represents the plot of a simulation of the stochastic Gaussian density .pt (x).
In analogy with the unconditioned case examined in Sects. 2.5.3 and 17.3.1,
.pt (x) is a solution of the Kolmogorov forward (Fokker-Planck) equation which in
this case is a SPDE: in fact, recalling the expression (19.3.3) of .pt (x) in terms of
Fig. 19.2 Plot of a simulation of the fundamental solution .pt (x) of the stochastic heat equation
dp_t(x) = (1 − σ²)(∂_s Γ)((1 − σ²)t, x − σW_t) dt − σ (∂_y Γ)((1 − σ²)t, x − σW_t) dW_t

  + (σ²/2)(∂_yy Γ)((1 − σ²)t, x − σW_t) dt =

(since Γ solves the forward heat equation ∂_s Γ(s,y) = (1/2) ∂_yy Γ(s,y))

= (1/2)(∂_yy Γ)((1 − σ²)t, x − σW_t) dt − σ (∂_y Γ)((1 − σ²)t, x − σW_t) dW_t

= (1/2) ∂_xx p_t(x) dt − σ ∂_x p_t(x) dW_t.
In other words, the conditional density p_t(x) is the fundamental solution of the stochastic heat equation

dp_t(x) = (1/2) ∂_xx p_t(x) dt − σ ∂_x p_t(x) dW_t
which, in the case .σ = 0 where the observation is null, degenerates into the classical
heat equation.
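Since p_t(x) = Γ((1−σ²)t, x − σW_t) is, for each observed value of W_t, a Gaussian density centered at σW_t with variance (1−σ²)t, a direct simulation can verify that it integrates to one and has the expected conditional mean; the parameters and grid below are arbitrary.

```python
import math, random

random.seed(5)
sigma, t = 0.6, 1.0
Wt = random.gauss(0.0, math.sqrt(t))   # one simulated observation of W_t

var = (1 - sigma ** 2) * t
def p(x):
    # filtered density p_t(x) = Gamma((1 - sigma^2) t, x - sigma W_t):
    # a Gaussian with stochastic, observation-dependent drift
    return math.exp(-(x - sigma * Wt) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Riemann sums over a wide grid
dx = 0.001
grid = [-10 + k * dx for k in range(20001)]
mass = sum(p(x) for x in grid) * dx        # total mass, should be ~ 1
mean = sum(x * p(x) for x in grid) * dx    # conditional mean, should be ~ sigma * W_t
```

Inserting a fresh observation of W_t shifts the whole density, which is exactly the "filtering" effect described above.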
Remark 19.3.1 (Bibliographic Note) Among the numerous monographs on the
theory of SPDEs, we particularly mention the books by Rozovskii [125], Kunita
[82], Prévôt and Röckner [120], Kotelenez [73], Chow [24], Liu and Röckner [91],
Lototsky and Rozovskii [92], and Pardoux [108]. For the study of stochastic filtering
problems, we refer, for example, to Fujisaki et al. [52], Pardoux [107], Fristedt et al.
[51], Elworthy et al. [40]. In [146] and [81], alternative approaches to the derivation
of filtering SPDEs are proposed, based on arguments similar to those used for the
proof of Feynman-Kac formulas.
19.4 Backward Stochastic Integral and Krylov's SPDE

In this section, we present an interesting result according to which, for a fixed time T, the solution of an SDE, seen as a stochastic process varying with the time and the initial datum, i.e., (t,x) ↦ X_T^{t,x} in the usual notations of Chap. 15, solves a stochastic partial differential equation (SPDE) involving the characteristic operator of the SDE. The statement of this result and the formulation of Krylov's backward SPDE (so called in Section 1.2.3 in [126]) require the introduction of the backward stochastic integral, in which the temporal structure of information, Brownian motion, and the related filtration are inverted.
Let W be a d-dimensional Brownian motion on the space (Ω, F, P). For t ∈ [0,T] we consider the (completed) σ-algebra of the increments of the Brownian motion after time t,

F̂_t := σ(Ĝ_t ∪ N),   Ĝ_t := σ(W_s − W_t, s ∈ [t,T]),   (19.4.1)

and the process

W⃗_t := W_T − W_{T−t},   t ∈ [0,T],
under the assumptions on u for which the right-hand side of (19.4.2) is defined in
the usual sense of Itô, i.e.,
→ -progressively measurable (thus .ut ∈ mFˆt for every .t ∈ [0, T ]);
(i) .t |→ uT −t is .F
(ii) .u ∈ L2 ([0, T ]) a.s.
sum in (19.4.3) is evaluated at the right endpoint of each interval of the partition
and .utk ∈ mFˆtk by hypothesis.
An N-dimensional backward Itô process is a process of the form

X_t = X_T + ∫_t^T u_s ds + ∫_t^T v_s ⋆ dW_s,   t ∈ [0,T],
−dF(t, X_t) = ( (∂_t F)(t, X_t) + (1/2) Σ_{i,j=1}^N (v_t v_t^*)_ij (∂_{x_i x_j} F)(t, X_t) + u_t (∇_x F)(t, X_t) ) dt

  + Σ_{i=1}^N Σ_{j=1}^d (v_t)_ij (∂_{x_i} F)(t, X_t) ⋆ dW_t^j.   (19.4.5)
with initial condition X_t^{t,x} = x. Then the process (t,x) ↦ X_T^{t,x} solves the backward SPDE

−dX_T^{t,x} = A_t X_T^{t,x} dt + (∇_x X_T^{t,x}) σ(t,x) ⋆ dW_t,
X_T^{T,x} = x,   (19.4.7)
where

A_t = (1/2) Σ_{i,j=1}^N c_ij(t,x) ∂_{x_j x_i} + Σ_{i=1}^N b_i(t,x) ∂_{x_i},   (c_ij) := σσ^*,
is the characteristic operator of the SDE (19.4.6). The explicit expressions of the drift coefficient and the diffusion term in (19.4.7) are

A_t X_T^{t,x} = (1/2) Σ_{i,j=1}^N c_ij(t,x) ∂_{x_j x_i} X_T^{t,x} + Σ_{i=1}^N b_i(t,x) ∂_{x_i} X_T^{t,x},

(∇_x X_T^{t,x}) σ(t,x) ⋆ dW_t = Σ_{i=1}^N Σ_{j=1}^d σ_ij(t,x) ∂_{x_i} X_T^{t,x} ⋆ dW_t^j.
Proof We only give a sketch of the proof and refer to Proposition 5.3 in [126]
for the details. To simplify the presentation, we only treat the one-dimensional and
autonomous case, following the approach proposed in [145]. First of all, thanks
to Kolmogorov’s continuity theorem 3.3.4 and the .Lp estimates of dependence on
the initial datum, extending the results of Corollary 17.4.2, we have that, up to
modifications, .x |→ XTt,x is sufficiently regular to support the derivatives that appear
in the SPDE in the classical sense. We use the Taylor series expansion for functions
of class .C 2 (R):
f(δ) − f(0) = δ f'(0) + (δ²/2) f''(λδ),   for some λ ∈ [0,1].   (19.4.8)
Given a partition t = t_0 < t_1 < ⋯ < t_n = T of [t,T], we have

X_T^{t,x} − x = X_T^{t,x} − X_T^{T,x} = Σ_{k=1}^n ( X_T^{t_{k−1},x} − X_T^{t_k,x} ) =

(by the flow property, X_T^{t_{k−1},x} = X_T^{t_k, X_{t_k}^{t_{k−1},x}})

= Σ_{k=1}^n ( X_T^{t_k, X_{t_k}^{t_{k−1},x}} − X_T^{t_k,x} ) =

(by (19.4.8) with f(δ) = X_T^{t_k, x+δ} and δ = Δ_k X := X_{t_k}^{t_{k−1},x} − x)

= Σ_{k=1}^n ( Δ_k X ∂_x X_T^{t_k,x} + ((Δ_k X)²/2) ∂_{xx} X_T^{t_k, x+λ_k Δ_k X} ).   (19.4.9)
Therefore, setting

Δ̃_k X := b(x)(t_k − t_{k−1}) + σ(x)(W_{t_k} − W_{t_{k−1}}),

we have

Δ_k X − Δ̃_k X = ∫_{t_{k−1}}^{t_k} ( b(X_s^{t_{k−1},x}) − b(x) ) ds + ∫_{t_{k−1}}^{t_k} ( σ(X_s^{t_{k−1},x}) − σ(x) ) dW_s = O(Δ_k t),
with the constant c depending only on T and the Lipschitz constants of b and .σ .
Hence, from (19.4.9) we obtain

X_T^{t,x} − x = Σ_{k=1}^n ( Δ̃_k X ∂_x X_T^{t_k,x} + ((Δ̃_k X)²/2) ∂_{xx} X_T^{t_k,x} + O(Δ_k t) ).
Note that ∂_x X_T^{t_k,x}, ∂_{xx} X_T^{t_k,x} ∈ mF̂_{t_k}; therefore, by (19.4.3), letting the partition mesh go to zero, we have

Σ_{k=1}^n Δ̃_k X ∂_x X_T^{t_k,x} → ∫_t^T b(x) ∂_x X_T^{s,x} ds + ∫_t^T σ(x) ∂_x X_T^{s,x} ⋆ dW_s,

Σ_{k=1}^n (Δ̃_k X)² ∂_{xx} X_T^{t_k,x} → ∫_t^T σ²(x) ∂_{xx} X_T^{s,x} ds,
Proof To fix the ideas, let us start with the one-dimensional case: by the backward SPDE (19.4.7) and Itô's formula (19.4.5), we have

−dF(X_T^{t,x}) = ( (σ²(t,x)/2) F''(X_T^{t,x}) (∂_x X_T^{t,x})² + (σ²(t,x)/2) F'(X_T^{t,x}) ∂_{xx} X_T^{t,x} + b(t,x) F'(X_T^{t,x}) ∂_x X_T^{t,x} ) dt + σ(t,x) F'(X_T^{t,x}) ∂_x X_T^{t,x} ⋆ dW_t =

(since ∂_x V_T^{t,x} = F'(X_T^{t,x}) ∂_x X_T^{t,x} and ∂_{xx} V_T^{t,x} = F''(X_T^{t,x})(∂_x X_T^{t,x})² + F'(X_T^{t,x}) ∂_{xx} X_T^{t,x})

= ( (σ²(t,x)/2) ∂_{xx} V_T^{t,x} + b(t,x) ∂_x V_T^{t,x} ) dt + σ(t,x) ∂_x V_T^{t,x} ⋆ dW_t
which proves the thesis. The multidimensional case is analogous: first of all

∂_{x_h x_k} V_T^{t,x} = Σ_{i,j=1}^N (∂_{x_i x_j} F)(X_T^{t,x}) (∂_{x_h} X_T^{t,x})_i (∂_{x_k} X_T^{t,x})_j + (∇_x F)(X_T^{t,x}) ∂_{x_h x_k} X_T^{t,x},   (19.4.10)

and by (19.4.7) and (19.4.5)

−dF(X_T^{t,x}) = (1/2) Σ_{i,j=1}^N ( (∇_x X_T^{t,x}) σ(t,x) ( (∇_x X_T^{t,x}) σ(t,x) )^* )_ij (∂_{x_i x_j} F)(X_T^{t,x}) dt

  + (A_t X_T^{t,x})(∇_x F)(X_T^{t,x}) dt + (∇_x F)(X_T^{t,x}) (∇_x X_T^{t,x}) σ(t,x) ⋆ dW_t =

(by (19.4.10))
Lo so
Del mondo e anche del resto
Lo so
Che tutto va in rovina
Ma di mattina
Quando la gente dorme
Col suo normale malumore
Mi può bastare un niente
Forse un piccolo bagliore
Un’aria già vissuta
Un paesaggio o che ne so
E sto bene
Io sto bene come uno quando sogna
Non lo so se mi conviene
Ma sto bene, che vergogna1
Giorgio Gaber
¹ I know about the world and everything else; I know that everything falls apart. But in the morning, when people are asleep with their usual bad mood, a mere nothing can be enough for me: perhaps a small glimmer, an air already lived, a landscape, or who knows. And I feel fine, I feel fine like someone dreaming. I don't know if it suits me, but I feel fine, what a shame.
L := (1/2) Σ_{i,j=1}^N c_ij(t,x) ∂_{x_i x_j} + Σ_{j=1}^N b_j(t,x) ∂_{x_j} + a(t,x) − ∂_t   (20.0.1)
S_T := ]0,T[ × R^N,   (20.0.2)
where .T > 0 is fixed. We assume that the matrix of coefficients .(cij ) is symmetric
and positive semidefinite: in this case, we say that .L is a forward parabolic
operator.
The interest in this type of operators is due to the fact that, as seen previously in
Sects. 2.5 and 15.1,
A_t := (1/2) Σ_{i,j=1}^N c_ij(t,x) ∂_{x_i x_j} + Σ_{j=1}^N β_j(t,x) ∂_{x_j},   (c_ij) := σσ^*,   (20.0.3)
and the related forward Kolmogorov operator .L = At∗ − ∂t is, at least formally, of
the form (20.0.1) with
b_j := −β_j + Σ_{i=1}^N ∂_{x_i} c_ij,   a := − Σ_{i=1}^N ∂_{x_i} β_i + (1/2) Σ_{i,j=1}^N ∂_{x_i x_j} c_ij.   (20.0.5)
Note that in a forward operator the time derivative appears with a negative sign: as already mentioned in Sect. 2.5.2, operators of this type typically arise in physics in the description of phenomena that evolve over time, such as heat diffusion in a body. On the other hand, every forward operator can be transformed into a backward² parabolic operator with the simple change of variables s = T − t: it follows that all the results we prove in this chapter for forward operators admit an analogous backward formulation.
for the operator L in (20.0.1). A classic example due to Tychonoff [141] shows that the problem (20.1.1) for the heat operator admits infinitely many solutions: in fact, in addition to the identically zero solution, also the functions of the type

u_α(t,x) := Σ_{k=0}^∞ (x^{2k}/(2k)!) ∂_t^k e^{−1/t^α},   α > 1,   (20.1.2)
However, the solutions in (20.1.2) are in some sense “pathological”: they oscillate, changing sign infinitely many times, and have very rapid growth as |x| → ∞.
In light of Tychonoff’s example, the study of the uniqueness of the solution of
the problem (20.1.1) consists in determining suitable classes of functions, called
uniqueness classes for .L , within which the solution, if it exists, is unique. In this
section, we assume the following minimal hypotheses on the coefficients of .L in
(20.0.1):
Assumption 20.1.1
(i) For each .i, j = 1, . . . , N , the coefficients .cij , bi and a are real-valued
measurable functions;
(ii) the matrix .C (t, x) := (cij (t, x)) is symmetric and positive semidefinite for
every .(t, x) ∈ ST ;
(iii) the coefficient a is upper bounded: there exists .a0 ∈ R such that
a(t,x) ≤ a_0,   (t,x) ∈ S_T.
We will prove that a uniqueness class is given by functions that do not grow too
rapidly to infinity in the sense that they satisfy the estimate
|u(t,x)| ≤ C e^{C|x|²},   (t,x) ∈ S_T,   (20.1.3)
for some positive constant C. This result, contained in Theorem 20.1.8, is proven
under very general conditions, namely Assumption 20.1.1 and the following
Assumption 20.1.2 There exists a constant M such that
Theorem 20.1.10 shows that, under Assumptions 20.1.1 and 20.1.3, a uniqueness
class is given by functions with at most polynomial growth, which satisfy an
estimate of the type
D_T = ]0,T[ × D
where D is a bounded domain (open and connected set) of .RN . We denote by .∂D
the boundary of D and say that
is the parabolic boundary of .DT . As before, .C 1,2 (DT ) is the space of functions
defined on .DT , that are continuously differentiable with respect to t and twice
continuously differentiable with respect to x.
20.1 Uniqueness: The Maximum Principle
Proof First, we observe that it is not restrictive to take a_0 < 0 in Assumption 20.1.1. If it were not, it would be enough to prove the thesis for the function

u_λ(t,x) := e^{−λt} u(t,x),   (20.1.6)

which satisfies

L u_λ − λ u_λ = e^{−λt} L u,   (20.1.7)

choosing λ > a_0.
Now we proceed by contradiction. Denying the thesis, there would exist a point (t,x) ∈ D_T such that u(t,x) > 0: in fact, we can also assume that

u(t,x) = max_{[0,t] × D̄} u.

It follows that ∇_x u(t,x) = 0, ∂_t u(t,x) ≥ 0 and the Hessian matrix D_x² u(t,x) is negative semidefinite, hence of the form −MM^* with M = (m_ih); therefore
L u(t,x) = −(1/2) Σ_{i,j=1}^N c_ij(t,x) Σ_{h=1}^N m_ih m_jh + Σ_{j=1}^N b_j(t,x) ∂_{x_j} u(t,x) + a(t,x) u(t,x) − ∂_t u(t,x)

= −(1/2) Σ_{h=1}^N Σ_{i,j=1}^N c_ij(t,x) m_ih m_jh + a(t,x) u(t,x) − ∂_t u(t,x),

where the double sum is non-negative since C = (c_ij) ≥ 0.
Proof Consider first the case .a0 ≤ 0 and therefore .a0+ = 0. Suppose that u and .L u
are bounded respectively on .∂p DT and .DT , otherwise there is nothing to prove.
Letting
we have
and .−w ≤ u ≤ w on .∂p DT . Then estimate (20.1.8) follows from the comparison
principle, Corollary 20.1.6.
Let now .a0 > 0. Consider .uλ in (20.1.6) with .λ = a0 : as just proved, we have
(by (20.1.7))
We establish analogous results to those presented in the preceding section for the
Cauchy problem (20.1.1).
Theorem 20.1.8 (Weak Maximum Principle) Let Assumptions 20.1.1 and 20.1.2
be in force. If .u ∈ C 1,2 (ST ) ∩ C([0, T [×RN ) is such that
L u ≤ 0,   in S_T,
u(0,·) ≥ 0,   in R^N,   (20.1.9)
We explicitly note that Assumptions 20.1.1 and 20.1.2 are very mild, so as to
include, for example, the case when .L is a first-order operator. We first prove the
following
Lemma 20.1.9 Under Assumption 20.1.1, if .u ∈ C 1,2 (ST ) ∩ C([0, T [×RN )
verifies (20.1.9) and is such that
Fix .(t0 , x0 ) ∈ ST . Thanks to condition (20.1.11), there exists .R > |x0 | such that
u(t, x) + ε > 0,
. t ∈ ]0, T [, |x| = R,
and from the weak maximum principle of Theorem 20.1.5, applied on the cylinder
L = (1/2) Δ − ∂_t,

such that

v(t,x) ≥ e^{γ|x|²},

L v(t,x) = 0 and
From Lemma 20.1.9 we deduce that .u + εv ≥ 0 for every .ε > 0, which proves the
thesis.
The general case is only technically more complicated and exploits Assumption 20.1.2 on the coefficients of the operator. Fix γ > C and two constants α, β ∈ R, to be determined later, and consider the function

v(t,x) = exp( γ|x|² / (1 − αt) + βt ),   0 ≤ t ≤ 1/(2α), x ∈ R^N.
Since

(L v)/v = (2γ² / (1 − αt)²) ⟨C x, x⟩ + (γ / (1 − αt)) tr C + (2γ / (1 − αt)) Σ_{i=1}^N b_i x_i + a − (αγ |x|² / (1 − αt)²) − β,

choosing first α and then β sufficiently large we obtain

(L v)/v ≤ 0.   (20.1.12)
(1/2) Σ_{i,j=1}^N c_ij ∂_{x_i x_j} w + Σ_{i=1}^N b̂_i ∂_{x_i} w + â w − ∂_t w = (L u)/v ≤ 0,

where

b̂_i = b_i + Σ_{j=1}^N c_ij (∂_{x_j} v)/v,   â = (L v)/v.
Since â ≤ 0 by (20.1.12), we can apply Lemma 20.1.9 to conclude that w (and thus also u) is non-negative. □
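The maximum principle has a discrete counterpart underlying the stability of explicit finite-difference schemes for the heat operator (1/2)Δ − ∂_t: when dt ≤ h², the update u_k ↦ u_k + (dt/2h²)(u_{k+1} − 2u_k + u_{k−1}) is a convex combination of neighboring values, so the scheme cannot create new maxima. A sketch with arbitrary grid sizes follows.

```python
# explicit scheme for du/dt = (1/2) d2u/dx2 on [0,1] with zero boundary data
nx, nt = 50, 200
h = 1.0 / nx
dt = h * h          # dt <= h^2 makes each update a convex combination of neighbors
r = dt / (h * h)

u = [0.0] * (nx + 1)
for k in range(nx + 1):
    x = k * h
    u[k] = x * (1 - x)          # initial datum, maximum 0.25 at x = 1/2

initial_max = max(u)
for _ in range(nt):
    new = u[:]
    for k in range(1, nx):
        # with r = 1 this is exactly the average of the two neighbors
        new[k] = u[k] + 0.5 * r * (u[k + 1] - 2 * u[k] + u[k - 1])
    u = new

final_max = max(u)   # never exceeds the maximum over the parabolic boundary
```

The maximum of the discrete solution is monotonically non-increasing, mirroring the statement that the maximum is attained on the parabolic boundary.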
for some positive constants C and p, then .u ≥ 0 in .[0, T [×RN . Consequently, there
exists at most one classical solution of the Cauchy problem (20.1.1) that satisfies the
polynomial growth estimate (20.1.4) at infinity.
Proof We only prove the case a_0 < 0. Consider the function

v(t,x) = e^{αt} ( κt + |x|² )^q

and verify that for every q > 0 it is possible to choose α, κ such that L v < 0 on S_T. Then for p < 2q and for every ε > 0 we have L(u + εv) < 0 on S_T and, thanks to condition (20.1.13), we can apply Lemma 20.1.9 to deduce that u + εv ≥ 0 on S_T. The thesis follows from the arbitrariness of ε. □
We now prove the analogue of Theorem 20.1.7: the following result provides
estimates, in .L∞ norm, of the dependence of the solution in terms of the initial
datum and the inhomogeneous term. These estimates play a crucial role, for
example, in the proof of the stability of numerical schemes.
Theorem 20.1.11 If the operator L satisfies Assumptions 20.1.1 and 20.1.2, then for every u ∈ C^{1,2}(S_T) ∩ C([0,T[ × R^N) that satisfies the exponential growth estimate (20.1.3) we have

sup_{[0,T[ × R^N} |u| ≤ e^{a_0^+ T} ( sup_{R^N} |u(0,·)| + T sup_{S_T} |L u| ),   a_0^+ := max{0, a_0}.
we have

L w_± ≤ a sup_{R^N} |u(0,·)| − sup_{S_T} |L u| ± L u ≤ 0,   in S_T,
w_±(0,·) ≥ 0,   in R^N,

and clearly w_± satisfies the estimate (20.1.10). It follows from Theorem 20.1.8 that w_± ≥ 0 in S_T and this proves the thesis. On the other hand, if a_0 ≥ 0 then it is
(1/λ_0) |η|² ≤ ⟨C(t,x) η, η⟩ ≤ λ_0 |η|²,   (t,x) ∈ S_T, η ∈ R^N.   (20.2.3)

For convenience, we assume λ_0 large enough so that [c_ij]_α, [b_i]_α, [a]_α ≤ λ_0 for each i, j = 1, …, N.
3 See, for example, the works of Pogorzelski [118] and Aronson [5] on the construction of the
fundamental solution. The book by Friedman [50] is still a classic reference text for the parametrix
method and the main source that inspired our presentation.
4 It is possible to assume slightly weaker hypotheses: in this regard, see Section 6.4 in [50]. In
particular, the continuity condition in time is only for convenience: the results of this section extend
without difficulty to the case of coefficients that are measurable in t; in this case, the PDE is
understood in an integro-differential sense, as in (20.2.5).
A := (1/2) Σ_{i,j=1}^N c_ij(t,x) ∂_{x_i x_j} + Σ_{j=1}^N b_j(t,x) ∂_{x_j} + a(t,x)   (20.2.4)
|f(t,x)| ≤ c_1 e^{c_2 |x|²} / (t − t_0)^{1−β},   (t,x) ∈ ]t_0,T[ × R^N,   (20.2.6)

where c_1, c_2 are positive constants with c_2 < 1/(4λ_0(T − t_0));
(ii) for every n ∈ N, there exists a constant κ_n such that

|f(t,x) − f(t,y)| ≤ κ_n |x − y|^β / (t − t_0)^{1−β/2},   t_0 < t < T, |x|, |y| ≤ n.   (20.2.7)
Γ(t_0,x_0; t,x) ≤ c G(λ(t − t_0), x − x_0),   (20.2.8)

|∂_{x_i} Γ(t_0,x_0; t,x)| ≤ (c/√(t − t_0)) G(λ(t − t_0), x − x_0),   (20.2.9)

|∂_{x_i x_j} Γ(t_0,x_0; t,x)| + |∂_t Γ(t_0,x_0; t,x)| ≤ (c/(t − t_0)) G(λ(t − t_0), x − x_0)   (20.2.10)

for every (t,x) ∈ ]t_0,T[ × R^N, where G is the Gaussian function in (20.3.1). Furthermore, there exist two positive constants λ̄, c̄, only dependent on T, N, λ_0, α, such that

Γ(t_0,x_0; t,x) ≥ c̄ G(λ̄(t − t_0), x − x_0).   (20.2.11)
𝚪(t0 , x0 ; t, x)
.
ˆ
= 𝚪(t0 , x0 ; s, y)𝚪(s, y; t, x)dy, 0 ≤ t0 < s < t < T , x, x0 ∈ RN ;
RN
|ϕ(x) − ϕ(y)|
. ‖ϕ‖bC α (RN ) := sup |ϕ| + sup .
RN x/=y |x − y|α
The following result shows that estimate (20.2.10) can be refined in the sense that
1
the non-integrable singularity . t−t 0
can be replaced by an integrable one when the
initial datum is Hölder continuous.
Corollary 20.2.7 Under the assumptions of Theorem 20.2.5, consider the solution $u$ in (20.2.12) of the Cauchy problem (20.2.13) with $a = f = 0$. If $\varphi \in bC^\delta(\mathbb{R}^N)$ for some $\delta > 0$, then there exists a constant $c$, which depends only on $T, N, \delta, \alpha, \lambda_0, [c_{ij}]_\alpha$ and $[b_i]_\alpha$, such that
$$\big|D_x^k u(t,x)\big| \le \frac{c}{(t-t_0)^{\frac{k-\delta}{2}}}\, \|\varphi\|_{bC^\delta(\mathbb{R}^N)}, \qquad t > t_0,\ x \in \mathbb{R}^N,\ k = 0,1,2. \tag{20.2.15}$$
we have
$$0 = \partial_{x_i x_j} \int_{\mathbb{R}^N} \Gamma(t_0,x_0;t,x)\,dx_0 = \int_{\mathbb{R}^N} \partial_{x_i x_j} \Gamma(t_0,x_0;t,x)\,dx_0.$$
Hence
$$\big|\partial_{x_i x_j} u(t,x)\big| = \Big|\int_{\mathbb{R}^N} \partial_{x_i x_j}\Gamma(t_0,x_0;t,x)\,\big(\varphi(x_0) - \varphi(x)\big)\,dx_0\Big| \le \tag{20.2.16}$$
20.3 The Parametrix Method 383
and the conclusion follows thanks to the elementary estimates of Lemma 20.3.4. □
$$G(C,x) = \frac{1}{\sqrt{(2\pi)^N \det C}}\, e^{-\frac{1}{2}\langle C^{-1}x,\, x\rangle}, \qquad x \in \mathbb{R}^N.$$
Notice that
$$\frac{1}{2}\sum_{i,j=1}^{N} C_{ij}\,\partial_{x_i x_j} G(tC, x) = \partial_t G(tC, x), \qquad t > 0,\ x \in \mathbb{R}^N.$$
$$G(t,x) \equiv G(tI_N, x) = \frac{1}{(2\pi t)^{\frac{N}{2}}}\, e^{-\frac{|x|^2}{2t}}, \qquad t > 0,\ x \in \mathbb{R}^N, \tag{20.3.1}$$
to indicate the usual standard Gaussian function, solution of the heat equation $\frac{1}{2}\Delta G(t,x) = \partial_t G(t,x)$.
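As a quick numerical sanity check (a sketch in dimension $N = 1$; the grid and sample point below are illustrative, not from the text), one can verify by finite differences that $G$ solves the heat equation and integrates to one:

```python
import numpy as np

# Standard 1-D Gaussian G(t, x) from (20.3.1).
def G(t, x):
    return np.exp(-x**2 / (2*t)) / np.sqrt(2*np.pi*t)

t, x, h = 0.7, 0.3, 1e-4
# (1/2) d^2G/dx^2 via a central second difference ...
lhs = 0.5 * (G(t, x + h) - 2*G(t, x) + G(t, x - h)) / h**2
# ... equals dG/dt via a central first difference.
rhs = (G(t + h, x) - G(t - h, x)) / (2*h)
assert abs(lhs - rhs) < 1e-6

# G(t, .) is a probability density in x for each fixed t > 0.
xs = np.linspace(-30.0, 30.0, 600001)
mass = np.sum(G(t, xs)) * (xs[1] - xs[0])
assert abs(mass - 1.0) < 1e-6
```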
$$L_y := \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}(t,y)\,\partial_{x_i x_j} - \partial_t.$$
The operator $L_y$ acts in the variables $(t,x)$ and has coefficients that depend only on the time variable $t$, since $y$ is fixed. Thanks to Assumption 20.2.2, and in particular to the fact that the matrix $\mathcal{C} = (c_{ij})$ is uniformly positive definite, the fundamental solution of $L_y$ has the following explicit expression:
$$\Gamma_y(t_0,x_0;t,x) = G\big(\mathcal{C}_{t_0,t}(y),\, x-x_0\big), \qquad \mathcal{C}_{t_0,t}(y) := \int_{t_0}^{t} \mathcal{C}(s,y)\,ds. \tag{20.3.2}$$
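To make (20.3.2) concrete, here is a sketch in dimension $N = 1$ with a hypothetical coefficient $c(s,y) = 1 + \frac{1}{2}\sin(s+y)$ (chosen only because its time integral has a closed form): with $y$ frozen, $\Gamma_y = G(\mathcal{C}_{t_0,t}(y), x-x_0)$ and $L_y\Gamma_y = 0$, which we verify by finite differences.

```python
import numpy as np

def c(s, y):                      # hypothetical coefficient; depends only on time once y is frozen
    return 1.0 + 0.5*np.sin(s + y)

def C(t0, t, y):                  # C_{t0,t}(y) = integral of c(s, y) over [t0, t], closed form
    return (t - t0) + 0.5*(np.cos(t0 + y) - np.cos(t + y))

def Gamma_y(t0, x0, t, x, y):     # G(C_{t0,t}(y), x - x0) as in (20.3.2), N = 1
    v = C(t0, t, y)
    return np.exp(-(x - x0)**2 / (2*v)) / np.sqrt(2*np.pi*v)

# Check L_y Gamma_y = (1/2) c(t, y) d^2/dx^2 Gamma_y - d/dt Gamma_y = 0.
t0, x0, t, x, y, h = 0.0, 0.0, 1.0, 0.4, 0.2, 1e-4
d2x = (Gamma_y(t0, x0, t, x + h, y) - 2*Gamma_y(t0, x0, t, x, y)
       + Gamma_y(t0, x0, t, x - h, y)) / h**2
dt = (Gamma_y(t0, x0, t + h, x, y) - Gamma_y(t0, x0, t - h, x, y)) / (2*h)
residual = 0.5*c(t, y)*d2x - dt
assert abs(residual) < 1e-6
```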
$$L\Gamma(t_0,x_0;t,x) = L P(t_0,x_0;t,x) + \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, L P(s,y;t,x)\,dy\,ds - \Phi(t_0,x_0;t,x) \tag{20.3.5}$$
since formally $P(t,y;t,x)\,dy = \delta_x(dy)$, where $\delta_x$ denotes the Dirac delta centered at $x$.
where
$$\big|\Phi(t_0,x_0;t,x) - \Phi(t_0,x_0;t,y)\big| \le \frac{c\,|x-y|^{\frac{\alpha}{2}}}{(t-t_0)^{1-\frac{\alpha}{4}}} \Big(G\big(\lambda(t-t_0),\, x-x_0\big) + G\big(\lambda(t-t_0),\, y-x_0\big)\Big) \tag{20.3.10}$$
and the constant $c$ may vary from one line to another. When necessary, we will explicitly state the quantities on which $c$ depends.
Lemma 20.3.4 Let $G$ be the Gaussian function in (20.3.1). For every $p > 0$ and $\lambda_1 > \lambda_0$ there exists a constant $c = c(p, N, \lambda_1, \lambda_0)$ such that
$$\Big(\frac{|x|}{\sqrt{t}}\Big)^{p} G(\lambda_0 t, x) \le c\, G(\lambda_1 t, x), \qquad t > 0,\ x \in \mathbb{R}^N.$$
Proof For simplicity, let $z = \frac{|x|}{\sqrt{t}}$; we have
$$z^p\, G(\lambda_0 t, x) = \frac{z^p}{(2\pi \lambda_0 t)^{\frac{N}{2}}} \exp\Big(-\frac{z^2}{2\lambda_0}\Big) = \Big(\frac{\lambda_1}{\lambda_0}\Big)^{\frac{N}{2}} g(z)\, G(\lambda_1 t, x)$$
where
$$g(z) := z^p\, e^{-\frac{\kappa z^2}{2}}, \qquad \kappa = \frac{1}{\lambda_0} - \frac{1}{\lambda_1} > 0,\ z \in \mathbb{R}_+,$$
attains its global maximum at $z_0 = \sqrt{\frac{p}{\kappa}}$, where $g(z_0) = \big(\frac{p}{e\kappa}\big)^{\frac{p}{2}}$. □
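The proof even yields an explicit admissible constant, $c = (\lambda_1/\lambda_0)^{N/2}\, g(z_0)$, which can be checked numerically (a sketch with illustrative values of $p, \lambda_0, \lambda_1$ in dimension $N = 1$):

```python
import numpy as np

p, lam0, lam1, N = 3.0, 1.0, 2.0, 1
kappa = 1/lam0 - 1/lam1
# Constant from the proof: (lam1/lam0)^(N/2) * g(z0), g(z0) = (p/(e*kappa))^(p/2).
c = (lam1/lam0)**(N/2) * (p/(np.e*kappa))**(p/2)

def G(v, x):                       # Gaussian (20.3.1) with "time" v, N = 1
    return np.exp(-x**2 / (2*v)) / np.sqrt(2*np.pi*v)

t = 0.5
xs = np.linspace(-10.0, 10.0, 2001)
lhs = (np.abs(xs)/np.sqrt(t))**p * G(lam0*t, xs)
rhs = c * G(lam1*t, xs)
ok = bool(np.all(lhs <= rhs + 1e-12))   # the lemma's inequality, pointwise
assert ok
```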
$$\frac{1}{\lambda_0^N}\, G\Big(\frac{t-t_0}{\lambda_0},\, x-x_0\Big) \le \Gamma_y(t_0,x_0;t,x) \le \lambda_0^N\, G\big(\lambda_0(t-t_0),\, x-x_0\big) \tag{20.3.11}$$
for every $0 \le t_0 < t < T$ and $x, x_0, y \in \mathbb{R}^N$, where $\lambda_0$ is the constant of Assumption 20.2.2. Moreover, for every $\lambda > \lambda_0$ there exists a positive constant $c = c(T, N, \lambda, \lambda_0)$ such that
$$\big|\partial_{x_i}\Gamma_y(t_0,x_0;t,x)\big| \le \frac{c}{\sqrt{t-t_0}}\, G\big(\lambda(t-t_0),\, x-x_0\big), \tag{20.3.12}$$
$$\big|\partial_{x_i x_j}\Gamma_y(t_0,x_0;t,x)\big| \le \frac{c}{t-t_0}\, G\big(\lambda(t-t_0),\, x-x_0\big), \tag{20.3.13}$$
$$\big|\partial_{x_i x_j x_k}\Gamma_y(t_0,x_0;t,x)\big| \le \frac{c}{(t-t_0)^{3/2}}\, G\big(\lambda(t-t_0),\, x-x_0\big), \tag{20.3.14}$$
$$\big|\Gamma_y(t_0,x_0;t,x) - \Gamma_\eta(t_0,x_0;t,x)\big| \le c\,|y-\eta|^\alpha\, G\big(\lambda(t-t_0),\, x-x_0\big), \tag{20.3.15}$$
$$\big|\partial_{x_i}\Gamma_y(t_0,x_0;t,x) - \partial_{x_i}\Gamma_\eta(t_0,x_0;t,x)\big| \le \frac{c\,|y-\eta|^\alpha}{\sqrt{t-t_0}}\, G\big(\lambda(t-t_0),\, x-x_0\big), \tag{20.3.16}$$
$$\big|\partial_{x_i x_j}\Gamma_y(t_0,x_0;t,x) - \partial_{x_i x_j}\Gamma_\eta(t_0,x_0;t,x)\big| \le \frac{c\,|y-\eta|^\alpha}{t-t_0}\, G\big(\lambda(t-t_0),\, x-x_0\big). \tag{20.3.17}$$
$$\frac{t-t_0}{\lambda_0}\,|y_0|^2 \le \langle \mathcal{C}_{t_0,t}(y)\,y_0,\, y_0\rangle \le \lambda_0 (t-t_0)\,|y_0|^2; \tag{20.3.18}$$
consequently, we have
$$\frac{|y_0|^2}{\lambda_0 (t-t_0)} \le \langle \mathcal{C}_{t_0,t}^{-1}(y)\,y_0,\, y_0\rangle \le \frac{\lambda_0\,|y_0|^2}{t-t_0} \tag{20.3.19}$$
and also
$$\Big(\frac{t-t_0}{\lambda_0}\Big)^{N} \le \det \mathcal{C}_{t_0,t}(y) \le \lambda_0^N\,(t-t_0)^N. \tag{20.3.20}$$
Formula (20.3.19) follows from the fact that if $A, B$ are symmetric and positive definite matrices, then the inequality between quadratic forms $A \le B$ (i.e., $\langle A y_0, y_0\rangle \le \langle B y_0, y_0\rangle$ for every $y_0 \in \mathbb{R}^N$) implies $B^{-1} \le A^{-1}$. Formula (20.3.20) follows from the fact that the minimum and maximum eigenvalues of a symmetric matrix $C$ are respectively $\min_{|y_0|=1}\langle C y_0, y_0\rangle$ and $\max_{|y_0|=1}\langle C y_0, y_0\rangle =: \|C\|$, where $\|C\|$ is the spectral norm of $C$. We note that (20.3.18)–(20.3.19) can be rewritten respectively in the form
$$\frac{t-t_0}{\lambda_0} \le \|\mathcal{C}_{t_0,t}(y)\| \le \lambda_0(t-t_0), \qquad \frac{1}{\lambda_0(t-t_0)} \le \|\mathcal{C}_{t_0,t}^{-1}(y)\| \le \frac{\lambda_0}{t-t_0}. \tag{20.3.21}$$
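The two linear-algebra facts used above are easy to confirm numerically (a sketch with random symmetric positive definite matrices; sizes and seeds are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# A symmetric positive definite, and B = A + P1 P1^T >= A as quadratic forms.
P0 = rng.standard_normal((N, N))
A = P0 @ P0.T + np.eye(N)
P1 = rng.standard_normal((N, N))
B = A + P1 @ P1.T

# A <= B implies B^{-1} <= A^{-1}: A^{-1} - B^{-1} is positive semidefinite.
diff_eigs = np.linalg.eigvalsh(np.linalg.inv(A) - np.linalg.inv(B))
assert np.all(diff_eigs >= -1e-10)

# Extreme eigenvalues bound the quadratic form on the unit sphere, and the
# spectral norm of the symmetric positive definite A is its top eigenvalue.
eigs = np.linalg.eigvalsh(A)
ys = rng.standard_normal((1000, N))
ys /= np.linalg.norm(ys, axis=1, keepdims=True)
quad = np.einsum('ki,ij,kj->k', ys, A, ys)
assert np.all(quad >= eigs[0] - 1e-10) and np.all(quad <= eigs[-1] + 1e-10)
assert np.isclose(np.linalg.norm(A, 2), eigs[-1])
```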
Estimates (20.3.11) then follow directly from the definition of $\Gamma_y(t_0,x_0;t,x)$. As for (20.3.12), letting $\nabla_x = (\partial_{x_1}, \dots, \partial_{x_N})$, we have
$$\big|\nabla_x \Gamma_y(t_0,x_0;t,x)\big| = \big|\mathcal{C}_{t_0,t}^{-1}(y)(x-x_0)\big|\, \Gamma_y(t_0,x_0;t,x) \le \big\|\mathcal{C}_{t_0,t}^{-1}(y)\big\|\, |x-x_0|\, \Gamma_y(t_0,x_0;t,x) \le$$
(by (20.3.20))
$$\le \frac{\lambda_0^N\, \big|\det \mathcal{C}_{t_0,t}(y) - \det \mathcal{C}_{t_0,t}(\eta)\big|}{\sqrt{\det \mathcal{C}_{t_0,t}(y)\,\det \mathcal{C}_{t_0,t}(\eta)}} \le$$
(since $|\det A - \det B| \le c\,\|A - B\|$, where $\|\cdot\|$ indicates the spectral norm and $c$ is a constant that depends only on $\|A\|, \|B\|$ and the dimension of the matrices)
$$\le \frac{c}{\sqrt{\det \mathcal{C}_{t_0,t}(y)}}\, \Big\|\frac{1}{t-t_0}\big(\mathcal{C}_{t_0,t}(y) - \mathcal{C}_{t_0,t}(\eta)\big)\Big\|$$
and (20.3.22) follows from Assumption 20.2.2, in particular from the Hölder condition on the coefficients $c_{ij}$. Regarding (20.3.23), by the mean value theorem and (20.3.19) we have
$$\Big| e^{-\frac{1}{2}\langle \mathcal{C}_{t_0,t}^{-1}(y)x,\,x\rangle} - e^{-\frac{1}{2}\langle \mathcal{C}_{t_0,t}^{-1}(\eta)x,\,x\rangle} \Big| \le \Big|\langle \mathcal{C}_{t_0,t}^{-1}(y)x,\,x\rangle - \langle \mathcal{C}_{t_0,t}^{-1}(\eta)x,\,x\rangle\Big|\, e^{-\frac{|x|^2}{2\lambda_0(t-t_0)}}$$
$$\le \big\|\mathcal{C}_{t_0,t}^{-1}(y) - \mathcal{C}_{t_0,t}^{-1}(\eta)\big\|\, |x|^2\, e^{-\frac{|x|^2}{2\lambda_0(t-t_0)}} \le$$
(by (20.3.21))
$$\le c\, \Big\|\frac{1}{t-t_0}\big(\mathcal{C}_{t_0,t}(y) - \mathcal{C}_{t_0,t}(\eta)\big)\Big\|\, \frac{|x|^2}{t-t_0}\, e^{-\frac{|x|^2}{2\lambda_0(t-t_0)}} \le$$
and the proof of (20.3.16) and (20.3.17) follows a similar line of reasoning as used previously. □
Lemma 20.3.5 enables us to estimate the terms $(L P)_k$ in (20.3.8) of the parametrix expansion.
Lemma 20.3.6 For every $\lambda > \lambda_0$ there exists a positive constant $c = c(T, N, \lambda, \lambda_0)$ such that
$$\big|(L P)_k(t_0,x_0;t,x)\big| \le \frac{m_k}{(t-t_0)^{1-\frac{\alpha k}{2}}}\, G\big(\lambda(t-t_0),\, x-x_0\big) \tag{20.3.24}$$
For $k = 1$ we have
$$\big|L P(t_0,x_0;t,x)\big| = \big|(L - L_{x_0}) P(t_0,x_0;t,x)\big| \le \frac{1}{2}\sum_{i,j=1}^{N} \big|\big(c_{ij}(t,x) - c_{ij}(t,x_0)\big)\, \partial_{x_i x_j}\Gamma_{x_0}(t_0,x_0;t,x)\big| + \sum_{i=1}^{N} \big|b_i(t,x)\, \partial_{x_i}\Gamma_{x_0}(t_0,x_0;t,x)\big| + |a(t,x)|\,\Gamma_{x_0}(t_0,x_0;t,x).$$
The first term is the most delicate: by the estimates (20.3.25) and (20.3.13), for $\lambda' = \frac{\lambda_0+\lambda}{2}$ we have
$$\big|\big(c_{ij}(t,x) - c_{ij}(t,x_0)\big)\, \partial_{x_i x_j}\Gamma_{x_0}(t_0,x_0;t,x)\big| \le \frac{c\,|x-x_0|^\alpha}{t-t_0}\, G\big(\lambda'(t-t_0),\, x-x_0\big) \le$$
The other terms are easily estimated using the boundedness hypothesis on the coefficients and estimate (20.3.12) for the first derivatives:
$$\big|b_i(t,x)\, \partial_{x_i}\Gamma_{x_0}(t_0,x_0;t,x)\big| + |a(t,x)|\, \Gamma_{x_0}(t_0,x_0;t,x) \le c\,\Big(\frac{1}{\sqrt{t-t_0}} + 1\Big)\, G\big(\lambda(t-t_0),\, x-x_0\big),$$
and the thesis follows from the properties of Euler's Gamma function. □
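The role of the Gamma function is that the iterated time integrals produce constants of the form $m_k = M^k\,\Gamma(\alpha/2)^k / \Gamma(k\alpha/2)$ (a standard parametrix computation; the values of $M$ and $\alpha$ below are illustrative, not from the text). Working in logarithms, one can check that $\sum_k m_k\, r^{k-1}$ converges for every $r$, i.e. the series has infinite radius of convergence:

```python
import math

alpha, M = 1.0, 2.0

# log of m_k = M^k * Gamma(alpha/2)^k / Gamma(k*alpha/2), via lgamma to avoid overflow.
def log_mk(k):
    return k*math.log(M*math.gamma(alpha/2)) - math.lgamma(k*alpha/2)

r = 100.0
def log_term(k):                  # log of the k-th term m_k * r^(k-1)
    return log_mk(k) + (k - 1)*math.log(r)

# The Gamma function in the denominator eventually crushes the geometric factors.
assert log_term(10**6) < -100000
assert log_term(10**7) < log_term(10**6)
```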
Remark 20.3.7 The Chapman-Kolmogorov equation is a crucial tool in the parametrix method: it can be proved by a direct calculation or, alternatively, as a consequence of the uniqueness result of Theorem 20.1.8. In fact, for $t_0 < s < t < T$ and $x, x_0, y \in \mathbb{R}^N$, the functions $u_1(t,x) := G(t-t_0,\, x-x_0)$ and
$$u_2(t,x) = \int_{\mathbb{R}^N} G(s-t_0,\, y-x_0)\, G(t-s,\, x-y)\,dy$$
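The direct calculation is the classical Gaussian convolution identity, which can be verified numerically (a sketch in dimension $N = 1$ with illustrative times and points):

```python
import numpy as np

def G(t, x):                      # heat kernel (20.3.1), N = 1
    return np.exp(-x**2 / (2*t)) / np.sqrt(2*np.pi*t)

# Chapman-Kolmogorov: int G(s - t0, y - x0) G(t - s, x - y) dy = G(t - t0, x - x0).
t0, s, t, x0, x = 0.0, 0.4, 1.0, -0.3, 0.8
ys = np.linspace(-40.0, 40.0, 400001)
dy = ys[1] - ys[0]
lhs = np.sum(G(s - t0, ys - x0) * G(t - s, x - ys)) * dy   # Riemann sum of the convolution
rhs = G(t - t0, x - x0)
assert abs(lhs - rhs) < 1e-6
```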
Lemma 20.3.8 Let $\kappa > 0$. Given $\kappa_1 \in\, ]0,\kappa[$ there exists a positive constant $c$ such that
$$e^{-\kappa \frac{|\eta - x_0|^2}{t}} \le c\, e^{-\kappa_1 \frac{|y - x_0|^2}{t}} \tag{20.3.26}$$
for every $t > 0$ and $\eta, y, x_0 \in \mathbb{R}^N$ with $|y-\eta|^2 \le t$. The proof uses the elementary inequalities
$$2|ab| \le \varepsilon a^2 + \frac{b^2}{\varepsilon}$$
and
$$(a+b)^2 \le (1+\varepsilon)\,a^2 + \Big(1 + \frac{1}{\varepsilon}\Big)\,b^2:$$
since $|y-\eta|^2 \le t$ by hypothesis and $\varepsilon$ can be taken sufficiently small, being $\kappa_1 < \kappa$, the statement holds with $c = e^{\kappa_1\left(1+\frac{1}{\varepsilon}\right)}$. □
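A numerical check of Lemma 20.3.8 with this constant (a sketch in dimension 1; the sampling below is arbitrary):

```python
import numpy as np

kappa, kappa1 = 1.0, 0.5
eps = kappa/kappa1 - 1            # largest eps with kappa1*(1 + eps) <= kappa
c = np.exp(kappa1*(1 + 1/eps))    # constant from the proof sketch

rng = np.random.default_rng(1)
t, x0 = 2.0, 0.0
etas = rng.uniform(-10, 10, 10000)
ys = etas + rng.uniform(-1, 1, 10000)*np.sqrt(t)   # enforce |y - eta|^2 <= t
lhs = np.exp(-kappa*(etas - x0)**2 / t)
rhs = c*np.exp(-kappa1*(ys - x0)**2 / t)
ok = bool(np.all(lhs <= rhs + 1e-12))
assert ok
```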
Proof of Proposition 20.3.2 For every $\lambda > \lambda_0$ we have
$$\big|\Phi(t_0,x_0;t,x)\big| \le \sum_{k=1}^{\infty} \big|(L P)_k(t_0,x_0;t,x)\big| \le \frac{c}{(t-t_0)^{1-\frac{\alpha}{2}}}\, G\big(\lambda(t-t_0),\, x-x_0\big)$$
with $c = c(T, N, \lambda, \lambda_0)$ a positive constant, since the power series $\sum_{k=1}^{\infty} m_k r^{k-1}$ has infinite radius of convergence. This proves (20.3.9). The convergence of the series is
$$\sum_{k=2}^{\infty} (L P)_k(t_0,x_0;t,x) = \Phi(t_0,x_0;t,x) - L P(t_0,x_0;t,x)$$
for every $\lambda > \lambda_0$, $0 \le t_0 < t < T$ and $x, y, x_0 \in \mathbb{R}^N$, with $c = c(T, N, \lambda, \lambda_0) > 0$. Now, if $|x-y|^2 > t-t_0$ then (20.3.27) follows directly from (20.3.24) with $k = 1$. To study the case $|x-y|^2 \le t-t_0$, we observe that
$$L P(t_0,x_0;t,x) - L P(t_0,x_0;t,y) = F_1 + F_2,$$
where
$$F_1 = \frac{1}{2}\sum_{i,j=1}^{N} \Big(\big(c_{ij}(t,x) - c_{ij}(t,x_0)\big)\,\partial_{x_i x_j} P(t_0,x_0;t,x) - \big(c_{ij}(t,y) - c_{ij}(t,x_0)\big)\,\partial_{y_i y_j} P(t_0,x_0;t,y)\Big)$$
$$= \underbrace{\frac{1}{2}\sum_{i,j=1}^{N} \big(c_{ij}(t,x) - c_{ij}(t,y)\big)\,\partial_{x_i x_j} P(t_0,x_0;t,x)}_{=:G_1} + \underbrace{\frac{1}{2}\sum_{i,j=1}^{N} \big(c_{ij}(t,y) - c_{ij}(t,x_0)\big)\,\big(\partial_{x_i x_j} P(t_0,x_0;t,x) - \partial_{y_i y_j} P(t_0,x_0;t,y)\big)}_{=:G_2},$$
$$F_2 = \sum_{j=1}^{N} \big(b_j(t,x)\,\partial_{x_j} P(t_0,x_0;t,x) - b_j(t,y)\,\partial_{y_j} P(t_0,x_0;t,y)\big).$$
Due to the Hölder continuity assumption on the coefficients and the Gaussian estimate (20.3.13), under the condition $|x-y|^2 \le t-t_0$ we have
$$|G_1| \le \frac{c\,|x-y|^\alpha}{t-t_0}\, G\big(\lambda(t-t_0),\, x-x_0\big) \le \frac{c\,|x-y|^{\frac{\alpha}{2}}}{(t-t_0)^{1-\frac{\alpha}{4}}}\, G\big(\lambda(t-t_0),\, x-x_0\big).$$
Regarding $G_2$, we still use the Hölder continuity of the coefficients and combine the mean value theorem (with $\eta$ belonging to the segment with endpoints $x, y$) with the Gaussian estimate (20.3.14) of the third derivatives: we obtain
$$|G_2| \le \frac{c\,|x-y|}{(t-t_0)^{\frac{3}{2}}}\, |y-x_0|^\alpha\, G\Big(\frac{\lambda+\lambda_0}{2}(t-t_0),\, \eta-x_0\Big) \le$$
A similar estimate holds for $F_2$, which can be proved using the Hölder continuity of the coefficients $b_j$ and $a$. This concludes the proof of (20.3.27).
We now prove (20.3.10) using the fact that $\Phi$ solves equation (20.3.6), so we have
$$\Phi(t_0,x_0;t,x) - \Phi(t_0,x_0;t,y) = L P(t_0,x_0;t,x) - L P(t_0,x_0;t,y) + \underbrace{\int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} \Phi(t_0,x_0;s,\eta)\,\big(L P(s,\eta;t,x) - L P(s,\eta;t,y)\big)\,d\eta\,ds}_{=:I(t_0,x_0;t,x,y)}.$$
where $f \in C(]t_0,T[\times\mathbb{R}^N)$ satisfies Assumption 20.2.4 of growth and local Hölder continuity. The main result of this section is the following.
Proposition 20.3.9 Definition (20.3.29) is well-posed and $V_f \in C(]t_0,T[\times\mathbb{R}^N)$. Moreover, for every $i,j = 1,\dots,N$ the following derivatives exist and are continuous on $]t_0,T[\times\mathbb{R}^N$:
$$\partial_{x_i} V_f(t,x) = \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y)\,\partial_{x_i} P(s,y;t,x)\,dy\,ds, \tag{20.3.30}$$
$$\partial_{x_i x_j} V_f(t,x) = \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y)\,\partial_{x_i x_j} P(s,y;t,x)\,dy\,ds, \tag{20.3.31}$$
$$\partial_t V_f(t,x) = f(t,x) + \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y)\,\partial_t P(s,y;t,x)\,dy\,ds. \tag{20.3.32}$$
Proof Let
$$I(s;t,x) := \int_{\mathbb{R}^N} f(s,y)\,\Gamma_y(s,y;t,x)\,dy, \qquad t_0 \le s < t < T,\ x \in \mathbb{R}^N,$$
so that
$$V_f(t,x) = \int_{t_0}^{t} I(s;t,x)\,ds,$$
with $\beta > 0$.
$$\le \frac{c\, e^{2 c_2 |x|^2}}{(s-t_0)^{1-\beta}\,\sqrt{t-s}}.$$
This is sufficient to prove (20.3.30); moreover, by (20.3.28) we have
$$\big|\partial_{x_i} I(s;t,x)\big| \le \frac{c\, e^{2 c_2 |x|^2}}{(s-t_0)^{1-\beta}\,\sqrt{t-s}}.$$
Proof of (20.3.31) The proof of the existence of the second-order derivatives is more involved, since repeating the previous argument with estimate (20.3.13) would produce a singular term of the type $\frac{1}{t-s}$, which is not integrable on the interval $[t_0,t]$. Proceeding carefully, it is possible to prove more precise and uniform estimates on $]t_0,T[\times D_n$ for each fixed $n \in \mathbb{N}$, where $D_n := \{|x| \le n\}$.
where
$$J(s;t,x) = \int_{D_{n+1}} f(s,y)\,\partial_{x_i x_j} P(s,y;t,x)\,dy, \qquad H(s;t,x) = \int_{\mathbb{R}^N \setminus D_{n+1}} f(s,y)\,\partial_{x_i x_j} P(s,y;t,x)\,dy.$$
⁸ For clarity, the term $\big(\partial_{x_i x_j}\Gamma_\eta(s,y;t,x)\big)\big|_{\eta=x}$ is obtained by first applying the derivatives $\partial_{x_i x_j}$ to $\Gamma_\eta(s,y;t,x)$, keeping $\eta$ fixed, and then evaluating the result at $\eta = x$. Note that, under Assumption 20.2.2, $\Gamma_\eta(s,y;t,x)$ as a function of $\eta$ is not differentiable.
By the local Hölder continuity of $f$, being $x, y \in D_{n+1}$, and estimate (20.3.13), we have
$$\big|J_1(s;t,x)\big| \le \frac{c}{(s-t_0)^{1-\frac{\beta}{2}}} \int_{D_{n+1}} \frac{|x-y|^\beta}{t-s}\, G\big(\lambda(t-s),\, x-y\big)\,dy \le$$
$$\big|J_2(s;t,x)\big| \le \frac{c\, e^{c_2|x|^2}}{(s-t_0)^{1-\beta}} \int_{D_{n+1}} \frac{|y-x|^\alpha}{t-s}\, G\big(\lambda(t-s),\, x-y\big)\,dy \le \frac{c\, e^{c_2|x|^2}}{(s-t_0)^{1-\beta}\,(t-s)^{1-\frac{\alpha}{2}}},$$
and therefore
$$\int_{D_{n+1}} \big(\partial_{x_i x_j}\Gamma_\eta(s,y;t,x)\big)\big|_{\eta=x}\,dy = -\int_{D_{n+1}} \big(\partial_{y_i x_j}\Gamma_\eta(s,y;t,x)\big)\big|_{\eta=x}\,dy =$$
(by the divergence theorem, denoting by $\nu$ the outer normal to $D_{n+1}$ and by $d\sigma(y)$ the surface measure on the boundary $\partial D_{n+1}$)
$$= -\int_{\partial D_{n+1}} \big(\partial_{x_j}\Gamma_\eta(s,y;t,x)\big)\big|_{\eta=x}\,\nu_i(y)\,d\sigma(y)$$
$$\le \frac{c\, e^{c_2|x|^2}}{(s-t_0)^{1-\beta}\,\sqrt{t-s}}.$$
(by Lemma 20.3.4, with $\lambda' > \lambda$, and the growth assumption (20.2.6) on $f$)
$$\le \frac{c}{(s-t_0)^{1-\beta}} \int_{\mathbb{R}^N} e^{c_2|y|^2}\, G\big(\lambda'(t-s),\, x-y\big)\,dy \le \frac{c\, e^{c|x|^2}}{(s-t_0)^{1-\beta}}$$
with $c > 0$, recalling that $c_2 < \frac{1}{4\lambda_0 T}$ by assumption and choosing $\lambda' - \lambda_0$ sufficiently small. In conclusion, we have proved that, for every $t_0 \le s < t < T$ and $x \in D_n$, with $n \in \mathbb{N}$ fixed, there exists a constant $c$ such that
$$\big|\partial_{x_i x_j} I(s;t,x)\big| = \Big|\int_{\mathbb{R}^N} f(s,y)\,\partial_{x_i x_j} P(s,y;t,x)\,dy\Big| \le \frac{c}{(s-t_0)^{1-\frac{\beta}{2}}\,(t-s)^{1-\frac{\gamma}{2}}}. \tag{20.3.35}$$
Now, we have
$$\frac{V_f(t+h,x) - V_f(t,x)}{h} = \int_{t_0}^{t} \frac{I(s;t+h,x) - I(s;t,x)}{h}\,ds + \frac{1}{h}\int_{t}^{t+h} I(s;t+h,x)\,ds =: I_1(t,x) + I_2(t,x).$$
By the mean value theorem, there exists $\hat{t}_s \in [t, t+h]$ such that
$$I_1(t,x) = \int_{t_0}^{t} \partial_t I(s;\hat{t}_s,x)\,ds \xrightarrow[h \to 0]{} \int_{t_0}^{t} \partial_t I(s;t,x)\,ds$$
where the second integral on the right-hand side tends to zero as $h \to 0$ since $f$ is continuous, while to estimate the first integral we assume $x \in D_n$ and proceed as in the proof of (20.3.31): specifically, we write
$$\frac{1}{h}\int_{t}^{t+h} \big(I(s;t+h,x) - f(s,x)\big)\,ds = \underbrace{\frac{1}{h}\int_{t}^{t+h}\!\!\int_{D_{n+1}} \big(f(s,y) - f(s,x)\big)\,\Gamma_y(s,y;t+h,x)\,dy\,ds}_{=:J_1(t,x)} + \underbrace{\frac{1}{h}\int_{t}^{t+h}\!\!\int_{\mathbb{R}^N \setminus D_{n+1}} \big(f(s,y) - f(s,x)\big)\,\Gamma_y(s,y;t+h,x)\,dy\,ds}_{=:J_2(t,x)}.$$
Assuming $h > 0$ for simplicity: by the Hölder continuity of $f$ and estimate (20.3.11) of $\Gamma_y$, we have
$$\big|J_1(t,x)\big| \le \frac{\lambda_0^N\, \kappa_{n+1}}{h} \int_{t}^{t+h}\!\!\int_{D_{n+1}} |x-y|^\beta\, G\big(\lambda_0(t+h-s),\, x-y\big)\,dy\,ds \le$$
On the other hand, thanks to the growth assumption (20.2.6) on $f$ and (20.3.11), it can be readily proved that
$$\big|J_2(t,x)\big| \le \frac{c}{h} \int_{t}^{t+h}\!\!\int_{|x-y|>1} e^{c_2|y|^2}\, G\big(\lambda_0(t+h-s),\, x-y\big)\,dy\,ds \xrightarrow[h \to 0^+]{} 0.$$
$$L\,\Gamma(t_0,x_0;t,x) = 0, \qquad 0 \le t_0 < t < T,\ x, x_0 \in \mathbb{R}^N, \tag{20.3.37}$$
$$\le \lambda^N\, G\big(\lambda(t-t_0),\, x-x_0\big) + c \int_{t_0}^{t} \frac{1}{(s-t_0)^{1-\frac{\alpha}{2}}} \int_{\mathbb{R}^N} G\big(\lambda(s-t_0),\, y-x_0\big)\, G\big(\lambda(t-s),\, x-y\big)\,dy\,ds =$$
and this proves, in particular, the upper bound (20.2.8). Formula (20.2.9) is proven in a completely analogous way.
Now, we prove (20.2.10). By repeating the proof of (20.3.35) with $\Phi(t_0,x_0;s,y)$ in place of $f(s,y)$ and using the estimates from Proposition 20.3.2, we establish the existence of a positive constant $c = c(T, N, \lambda, \lambda_0)$ such that
$$\Big|\int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\,\partial_{x_i x_j} P(s,y;t,x)\,dy\Big| \le \frac{c}{(s-t_0)^{1-\frac{\alpha}{4}}\,(t-s)^{1-\frac{\alpha}{4}}}\, G\big(\lambda(t-t_0),\, x-x_0\big), \qquad t_0 \le s < t < T,\ x, x_0 \in \mathbb{R}^N. \tag{20.3.39}$$
Here we use the limit argument of Example 3.3.3 in [113]: in probabilistic terms, this corresponds to the weak convergence of the normal distribution to the Dirac delta as the variance tends to zero. On the other hand, by (20.3.38),
$$\big|H(t,x)\big| \le c\,(t-t_0)^{\frac{\alpha}{2}} \int_{\mathbb{R}^N} \varphi(x_0)\, G\big(\lambda(t-t_0),\, x-x_0\big)\,dx_0 \xrightarrow[(t,x)\to(t_0,\bar{x})]{} 0.$$
This proves that $u \in C([t_0,T[\times\mathbb{R}^N)$ and is therefore a classical solution of the Cauchy problem (20.2.2).
Step 4 We prove that $u$ in (20.2.12) is a classical solution of the non-homogeneous Cauchy problem (20.2.13). We use the definition of $\Gamma$ in (20.3.4) and focus on the term
$$\int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y)\,\Gamma(s,y;t,x)\,dy\,ds = \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y)\,P(s,y;t,x)\,dy\,ds + \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y) \int_{s}^{t}\!\!\int_{\mathbb{R}^N} \Phi(s,y;\tau,\eta)\,P(\tau,\eta;t,x)\,d\eta\,d\tau\,dy\,ds = V_f(t,x) + V_F(t,x),$$
where
$$F(\tau,\eta) := \int_{t_0}^{\tau}\!\!\int_{\mathbb{R}^N} f(s,y)\,\Phi(s,y;\tau,\eta)\,dy\,ds.$$
We will soon prove that $F$ satisfies Assumption 20.2.4; it is therefore possible to apply Proposition 20.3.9 to $V_f$ and $V_F$, obtaining
$$L\big(V_f(t,x) + V_F(t,x)\big) = -f(t,x) - F(t,x) + \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} \big(f(s,y) + F(s,y)\big)\, L P(s,y;t,x)\,dy\,ds = -f(t,x) + \int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y)\, I(s,y;t,x)\,dy\,ds$$
where
$$\big|F(\tau,\eta)\big| \le \int_{t_0}^{\tau}\!\!\int_{\mathbb{R}^N} \frac{c\, e^{c_2|y|^2}}{(s-t_0)^{1-\frac{\beta}{2}}\,(\tau-s)^{1-\frac{\alpha}{2}}}\, G\big(\lambda(\tau-s),\, \eta-y\big)\,dy\,ds \le \frac{c\, e^{c|\eta|^2}}{(\tau-t_0)^{1-\frac{\alpha+\beta}{2}}}$$
and
$$\big|F(\tau,\eta) - F(\tau,\eta')\big| \le \frac{c\,|\eta-\eta'|^{\frac{\alpha}{2}}}{(\tau-t_0)^{1-\frac{\alpha+2\beta}{4}}}\,\Big(e^{c|\eta|^2} + e^{c|\eta'|^2}\Big).$$
Finally, using the upper bound (20.2.8) of $\Gamma$ and proceeding as in the proof of estimate (20.3.34), we have that
$$\int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} f(s,y)\,\Gamma(s,y;t,x)\,dy\,ds \xrightarrow[(t,x)\to(t_0,\bar{x})]{} 0$$
for every $\bar{x} \in \mathbb{R}^N$. This concludes the proof that $u$ in (20.2.12) is a classical solution of the non-homogeneous Cauchy problem (20.2.13).
Step 5 The Chapman-Kolmogorov equation and formula (20.2.14) can be proved as in Remark 20.3.7, as a consequence of the uniqueness result of Theorem 20.1.8.
are both bounded solutions (thanks to estimate (20.3.38)) of the Cauchy problem
$$\begin{cases}
L u = 0 & \text{in } ]t_0,T[\times\mathbb{R}^N,\\
u(t_0,\cdot) = 1 & \text{in } \mathbb{R}^N,
\end{cases}$$
$$\Gamma(t_0,y;t_1,x_1) < 0, \qquad |y-x_0| < r,$$
for a suitable $r > 0$. Consider $\varphi \in bC(\mathbb{R}^N)$ such that $\varphi(y) > 0$ for $|y-x_0| < r$ and $\varphi(y) \equiv 0$ for $|y-x_0| \ge r$: the function
$$u(t,x) := \int_{\mathbb{R}^N} \varphi(y)\,\Gamma(t_0,y;t,x)\,dy, \qquad t \in\, ]t_0,T[,\ x \in \mathbb{R}^N,$$
is bounded thanks to estimate (20.3.38) of $\Gamma$, satisfies $u(t_1,x_1) < 0$, and is a classical solution of the Cauchy problem (20.2.2). But this is absurd, because it contradicts the maximum principle, Theorem 20.1.8.
Now we observe that, for every $\lambda > 1$, we have
$$G(\lambda t, x) \le G\Big(\frac{t}{\lambda},\, x\Big)$$
if $|x| < c_\lambda \sqrt{t}$, where $c_\lambda = \sqrt{\frac{2\lambda N}{\lambda^2-1}\log\lambda}$.
Then, by definition (20.3.4), we have
$$\Gamma(t_0,x_0;t,x) \ge P(t_0,x_0;t,x) - \Big|\int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\,P(s,y;t,x)\,dy\,ds\Big| \ge \frac{1}{\lambda^N}\, G\Big(\frac{t-t_0}{\lambda},\, x-x_0\Big) - c\,(t-t_0)^{\frac{\alpha}{2}}\, G\big(\lambda(t-t_0),\, x-x_0\big) =$$
(if $|x-x_0| \le c_\lambda \sqrt{t-t_0}$)
$$\ge \big(\lambda^{-N} - c\,(t-t_0)^{\frac{\alpha}{2}}\big)\, G\Big(\frac{t-t_0}{\lambda},\, x-x_0\Big) \ge \frac{1}{2\lambda^N}\, G\Big(\frac{t-t_0}{\lambda},\, x-x_0\Big) \tag{20.3.40}$$
if $0 < t-t_0 \le T_\lambda := \big(2c\lambda^N\big)^{-\frac{2}{\alpha}} \wedge T$.
Given $x, x_0 \in \mathbb{R}^N$ and $0 \le t_0 < t < T$, let $m \in \mathbb{N}$ be the integer part of
$$\max\Big\{\frac{4\,|x-x_0|^2}{c_\lambda^2\,(t-t_0)},\ \frac{T}{T_\lambda}\Big\}.$$
We set
$$t_k = t_0 + k\,\frac{t-t_0}{m+1}, \qquad x_k = x_0 + k\,\frac{x-x_0}{m+1}, \qquad k = 1,\dots,m,$$
$$t_{k+1} - t_k = \frac{t-t_0}{m+1} \le \frac{T}{m+1} \le T_\lambda. \tag{20.3.41}$$
$$|y_{k+1} - y_k| \le 2r + |x_{k+1} - x_k| = 2r + \frac{|x-x_0|}{m+1} \le 2r + \frac{c_\lambda}{2}\sqrt{\frac{t-t_0}{m+1}} = c_\lambda \sqrt{\frac{t-t_0}{m+1}} \tag{20.3.42}$$
$$= c_\lambda \sqrt{t_{k+1} - t_k}. \tag{20.3.43}$$
$$\times \prod_{k=1}^{m-1} \Gamma(t_k,y_k;t_{k+1},y_{k+1})\,\Gamma(t_m,y_m;t,x)\,dy_1 \cdots dy_m \ge$$
$$\times \prod_{k=1}^{m-1} \mathbf{1}_{D(x_k,r)}(y_k)\,\Gamma(t_k,y_k;t_{k+1},y_{k+1})\,\mathbf{1}_{D(x_m,r)}(y_m)\,\Gamma(t_m,y_m;t,x)\,dy_1 \cdots dy_m \ge$$
$$\Gamma(t_0,x_0;t,x) \ge \frac{1}{c\,(t-t_0)^{\frac{N}{2}}}\, e^{-cm}$$
and, by the choice of $m$, this is enough to prove the thesis and conclude the proof of Theorem 20.2.5.
For consistency with the notations of this chapter, we state and prove Proposition 18.4.3 in its forward version.
Proposition 20.3.10 Under Assumption 20.2.2, let $\Gamma$ be the fundamental solution of the operator $\mathcal{A} - \partial_t$ on $S_T$ with $\mathcal{A}$ in (20.2.4). For every $\lambda \ge 1$, the vector-valued function
$$u_\lambda(t,x) := \int_{0}^{t} e^{-\lambda(t-t_0)} \int_{\mathbb{R}^N} b(t_0,x_0)\,\Gamma(t_0,x_0;t,x)\,dx_0\,dt_0, \qquad (t,x) \in [0,T[\times\mathbb{R}^N,$$
Moreover, there exists a constant $c > 0$, which depends only on $N, \lambda_0$ and $T$, such that
$$\big|u_\lambda(t,x) - u_\lambda(t,y)\big| \le \frac{c}{\sqrt{\lambda}}\,|x-y|, \tag{20.3.44}$$
$$\big|\nabla_x u_\lambda(t,x) - \nabla_x u_\lambda(t,y)\big| \le c\,|x-y|, \tag{20.3.45}$$
where
$$I_b(t_0;t,x) := \int_{\mathbb{R}^N} b(t_0,x_0)\,P(t_0,x_0;t,x)\,dx_0,$$
$$J_b(t_0;t,x) := \int_{\mathbb{R}^N} b(t_0,x_0)\, \underbrace{\int_{t_0}^{t}\!\!\int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\,P(s,y;t,x)\,dy\,ds}_{=:R(t_0,x_0;t,x)}\, dx_0. \tag{20.3.46}$$
$$\big|I_b(t_0;t,x) - I_b(t_0;t,y)\big| \le c\,\frac{|x-y|}{\sqrt{t-t_0}}, \qquad x, y \in \mathbb{R}^N.$$
A similar result holds for $J_b$: in fact, by (20.3.9), the mean value theorem, and estimate (20.3.12), for $\lambda_1 > \lambda_0$ we have
$$= c\,\frac{|x-y|}{(t-t_0)^{\frac{1-\alpha}{2}}}\, G\big(\lambda_1(t-t_0),\, \bar{x}-x_0\big) \tag{20.3.47}$$
$$\big|J_b(t_0;t,x) - J_b(t_0;t,y)\big| \le c\,\frac{|x-y|}{(t-t_0)^{\frac{1-\alpha}{2}}}, \qquad x, y \in \mathbb{R}^N.$$
Hence, we have
$$\big|u_\lambda(t,x) - u_\lambda(t,y)\big| \le c\,|x-y| \int_{0}^{t} \frac{e^{-\lambda(t-t_0)}}{\sqrt{t-t_0}}\,dt_0,$$
which yields (20.3.44). The proof of (20.3.45) is analogous and is based on the arguments also used for the proof of Proposition 20.3.9. □
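The factor $1/\sqrt{\lambda}$ in (20.3.44) comes from the elementary bound $\int_0^t e^{-\lambda(t-t_0)}\,(t-t_0)^{-1/2}\,dt_0 \le \Gamma(1/2)\,\lambda^{-1/2} = \sqrt{\pi/\lambda}$, which can be checked numerically (illustrative values of $\lambda$ and $t$):

```python
import numpy as np

lam, t = 50.0, 1.0
n = 2_000_000
h = t / n
s = (np.arange(n) + 0.5) * h      # midpoint rule on (0, t]; s^{-1/2} is integrable at 0
integral = np.sum(np.exp(-lam*s) / np.sqrt(s)) * h

bound = np.sqrt(np.pi/lam)        # Gamma(1/2)/sqrt(lam)
assert integral <= bound          # the integrand is convex, so midpoint underestimates
assert integral >= 0.9*bound      # nearly attained since lam*t >> 1
```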
20.4 Key Ideas to Remember
The chapter is structured into two parts, focusing on the study of uniqueness and existence for the parabolic Cauchy problem, respectively.
• Section 20.1: uniqueness is proven under very general assumptions (cf. Assumptions 20.1.1, (20.1.2), and 20.1.3). The main results are the maximum and comparison principles. Uniqueness classes for the Cauchy problem consist of functions that do not grow too rapidly at infinity.
• Sections 20.2 and 20.3: we present the classic parametrix method for the construction of the fundamental solution of a uniformly parabolic operator with bounded coefficients that are Hölder continuous in the spatial variable. This is a fairly long and complex technique, based on suitable estimates involving Gaussian functions and on the study of singular integrals. The fundamental Theorem 20.2.5 provides, in addition to existence and the property of being a density, a comparison between the fundamental solution and the Gaussian function, the Chapman-Kolmogorov property, and Duhamel's formula for the solution of the non-homogeneous Cauchy problem.
Main notations used or introduced in this chapter:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 413
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1
414 References
18. Bjork, T.: Arbitrage Theory in Continuous Time, 2nd edn. Oxford University Press, Oxford
(2004)
19. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81,
637–654 (1973)
20. Blumenthal, R.M., Getoor, R.K.: Markov Processes and Potential Theory. Pure and Applied
Mathematics, vol. 29. Academic Press, New York-London (1968)
21. Brémaud, P.: Point Processes and Queues. Springer, New York (1981). Martingale dynamics,
Springer Series in Statistics
22. Brunick, G., Shreve, S.: Mimicking an Itô process by a solution of a stochastic differential
equation. Ann. Appl. Probab. 23, 1584–1628 (2013)
23. Champagnat, N., Jabin, P.-E.: Strong solutions to stochastic differential equations with rough
coefficients. Ann. Probab. 46, 1498–1541 (2018)
24. Chow, P.-L.: Stochastic Partial Differential Equations, second edn. Advances in Applied
Mathematics. CRC Press, Boca Raton, FL (2015)
25. Chung, K.L., Doob, J.L.: Fields, optionality and measurability. Amer. J. Math. 87, 397–424
(1965)
26. Courrège, P.: Générateur infinitésimal d’un semi-groupe de convolution sur .Rn , et formule de
Lévy-Khinchine. Bull. Sci. Math. (2) 88, 3–30 (1964)
27. Cox, J.C.: Notes on Option Pricing I: Constant Elasticity of Variance Diffusion. Working
Paper, Stanford University, Stanford CA (1975)
28. Cox, J.C.: The constant elasticity of variance option pricing model. J. Portfolio Manag. 23,
15–17 (1997)
29. Cox, J.C., Ingersoll, J.E., Ross, S.A.: The relation between forward prices and futures prices.
J. Financ. Econ. 9, 321–346 (1981)
30. Criens, D., Pfaffelhuber, P., Schmidt, T.: The martingale problem method revisited. Electron. J. Probab. 28, 1–46 (2023)
31. Davie, A.M.: Uniqueness of solutions of stochastic differential equations. Int. Math. Res. Not.
IMRN 2007, Art. ID rnm124, 26 (2007)
32. Davydov, D., Linetsky, V.: Pricing and hedging path-dependent options under the CEV
process. Manag. Sci. 47, 949–965 (2001)
33. Delbaen, F., Shirakawa, H.: A note on option pricing for the constant elasticity of variance
model. Asia-Pac. Financ. Mark. 9, 85–99 (2002)
34. Di Francesco, M., Pascucci, A.: On a class of degenerate parabolic equations of Kolmogorov
type. AMRX Appl. Math. Res. Express 3, 77–116 (2005)
35. Doob, J.L.: Stochastic Processes. John Wiley & Sons/Chapman & Hall, New York/London
(1953)
36. Duffie, D., Filipović, D., Schachermayer, W.: Affine processes and applications in finance.
Ann. Appl. Probab. 13, 984–1053 (2003)
37. Durrett, R.: Stochastic Calculus. Probability and Stochastics Series. CRC Press, Boca Raton,
FL (1996). A practical introduction
38. Durrett, R.: Probability: Theory and Examples, vol. 49 of Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press, Cambridge (2019). Available at
https://services.math.duke.edu/~rtd/PTE/pte.html
39. El Karoui, N., Peng, S., Quenez, M.C.: Backward stochastic differential equations in finance.
Math. Finance 7, 1–71 (1997)
40. Elworthy, K.D., Le Jan, Y., Li, X.-M.: The Geometry of Filtering. Frontiers in Mathematics.
Birkhäuser Verlag, Basel (2010)
41. Evans, L.C.: Partial Differential Equations, second edn., vol. 19 of Graduate Studies in
Mathematics. American Mathematical Society, Providence, RI (2010)
42. Fabes, E.B., Stroock, D.W.: A new proof of Moser’s parabolic Harnack inequality using the
old ideas of Nash. Arch. Rational Mech. Anal. 96, 327–338 (1986)
43. Fedrizzi, E., Flandoli, F.: Pathwise uniqueness and continuous dependence of SDEs with non-
regular drift. Stochastics 83, 241–257 (2011)
44. Feehan, P.M.N., Pop, C.A.: On the martingale problem for degenerate-parabolic partial
differential operators with unbounded coefficients and a mimicking theorem for Itô processes.
Trans. Amer. Math. Soc. 367, 7565–7593 (2015)
45. Feller, W.: Zur Theorie der stochastischen Prozesse. Math. Ann. 113, 113–160 (1937)
46. Figalli, A.: Existence and uniqueness of martingale solutions for SDEs with rough or
degenerate coefficients. J. Funct. Anal. 254, 109–153 (2008)
47. Flandoli, F.: Regularity Theory and Stochastic Flows for Parabolic SPDEs, vol. 9 of
Stochastics Monographs. Gordon and Breach Science Publishers, Yverdon (1995)
48. Flandoli, F.: Random Perturbation of PDEs and Fluid Dynamic Models, vol. 2015 of Lecture
Notes in Mathematics. Springer, Heidelberg (2011). Lectures from the 40th Probability
Summer School held in Saint-Flour, 2010, École d’Été de Probabilités de Saint-Flour. [Saint-
Flour Probability Summer School]
49. Friedman, A.: Partial Differential Equations of Parabolic Type. Prentice-Hall, Englewood
Cliffs, NJ (1964)
50. Friedman, A.: Stochastic Differential Equations and Applications. Dover Publications,
Mineola, NY (2006). Two volumes bound as one, Reprint of the 1975 and 1976 original
published in two volumes
51. Fristedt, B., Jain, N., Krylov, N.: Filtering and Prediction: A Primer, vol. 38 of Student
Mathematical Library. American Mathematical Society, Providence, RI (2007)
52. Fujisaki, M., Kallianpur, G., Kunita, H.: Stochastic differential equations for the non linear
filtering problem. Osaka J. Math. 9, 19–40 (1972)
53. Gilbarg, D., Trudinger, N.S.: Elliptic Partial Differential Equations of Second Order,
second edn., vol. 224 of Grundlehren der mathematischen Wissenschaften [Fundamental
Principles of Mathematical Sciences]. Springer, Berlin (1983)
54. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). Available at
http://www.deeplearningbook.org
55. Guyon, J., Henry-Labordère, P.: Nonlinear Option Pricing. Chapman & Hall/CRC Financial
Mathematics Series. CRC Press, Boca Raton, FL (2014)
56. Gyöngy, I.: Mimicking the one-dimensional marginal distributions of processes having an Itô
differential. Probab. Theory Relat. Fields 71, 501–516 (1986)
57. Gyöngy, I., Krylov, N.V.: Existence of strong solutions for Itô’s stochastic equations via
approximations: revisited. Stoch. Partial Differ. Equations Anal. Comput. 10, 693–719 (2022)
58. Hagan, P.S., Kumar, D., Lesniewski, A., Woodward, D.E.: Managing smile risk. Wilmott
Magazine, September, 84–108 (2002)
59. Halmos, P.R.: Measure Theory. D. Van Nostrand Company, New York, NY (1950)
60. Heston, S.: A closed-form solution for options with stochastic volatility with applications to
bond and currency options. Rev. Financ. Stud. 6, 327–343 (1993)
61. Heston, S.L., Loewenstein, M., Willard, G.A.: Options and bubbles. Rev. Financ. Stud. 20(2),
359–390 (2007)
62. Hörmander, L.: Hypoelliptic second order differential equations. Acta Math. 119, 147–171
(1967)
63. Ikeda, N., Watanabe, S.: Stochastic Differential Equations and Diffusion Processes, vol. 24
of North-Holland Mathematical Library. North-Holland Publishing Co./Kodansha, Amster-
dam/Tokyo (1981)
64. Itô, K., Watanabe, S.: Introduction to stochastic differential equations. In: Proceedings of the
International Symposium on Stochastic Differential Equations (Res. Inst. Math. Sci., Kyoto
Univ., Kyoto, 1976), pp. i–xxx. Wiley, New York (1978)
65. Jacod, J., Shiryaev, A.N.: Limit Theorems for Stochastic Processes, second edn., vol. 288 of
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical
Sciences]. Springer, Berlin (2003)
66. Kallenberg, O.: Foundations of Modern Probability, second edn. Probability and its Applica-
tions (New York). Springer, New York (2002)
67. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus, second edn., vol. 113
of Graduate Texts in Mathematics. Springer, New York (1991)
68. Klenke, A.: Probability Theory, second edn. Universitext. Springer, London (2014). A
comprehensive course
69. Kolmogorov, A.N.: Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung.
Math. Ann. 104, 415–458 (1931)
70. Kolmogorov, A.N.: Selected Works of A. N. Kolmogorov. Vol. III. Kluwer Academic
Publishers Group, Dordrecht (1993). Edited by A. N. Shiryayev
71. Kolokoltsov, V.N.: Markov Processes, Semigroups and Generators, vol. 38 of De Gruyter
Studies in Mathematics. Walter de Gruyter & Co., Berlin (2011)
72. Komlós, J.: A generalization of a problem of Steinhaus. Acta Math. Acad. Sci. Hungar. 18,
217–229 (1967)
73. Kotelenez, P.: Stochastic Ordinary and Stochastic Partial Differential Equations, vol. 58 of
Stochastic Modelling and Applied Probability. Springer, New York (2008). Transition from
microscopic to macroscopic equations
74. Krylov, N.V.: Itô’s stochastic integral equations. Teor. Verojatnost. i Primenen 14, 340–348
(1969)
75. Krylov, N.V.: Correction to the paper “Itô’s stochastic integral equations” (Teor. Verojatnost.
i Primenen. 14, 340–348 (1969)). Teor. Verojatnost. i Primenen. 17, 392–393 (1972)
76. Krylov, N.V.: The selection of a Markov process from a Markov system of processes, and
the construction of quasidiffusion processes. Izv. Akad. Nauk SSSR Ser. Mat. 37, 691–708
(1973)
77. Krylov, N.V.: Controlled Diffusion Processes, vol. 14 of Stochastic Modelling and Applied
Probability. Springer, Berlin (2009). Translated from the 1977 Russian original by A. B.
Aries, Reprint of the 1980 edition
78. Krylov, N.V., Röckner, M.: Strong solutions of stochastic equations with singular time
dependent drift. Probab. Theory Related Fields 131, 154–196 (2005)
79. Krylov, N.V., Rozovsky, B.L.: On the first integrals and Liouville equations for diffusion
processes. In: Stochastic Differential Systems (Visegrád, 1980), pp. 117–125, vol. 36 of
Lecture Notes in Control and Information Sci. Springer, Berlin (1981)
80. Krylov, N.V., Rozovsky, B.L.: Characteristics of second-order degenerate parabolic Itô
equations. Trudy Sem. Petrovsk. 8, 153–168 (1982)
81. Krylov, N.V., Zatezalo, A.: A direct approach to deriving filtering equations for diffusion
processes. Appl. Math. Optim. 42, 315–332 (2000)
82. Kunita, H.: Stochastic Flows and Stochastic Differential Equations, vol. 24 of Cambridge
Studies in Advanced Mathematics. Cambridge University Press, Cambridge (1997). Reprint
of the 1990 original
83. Lacker, D., Shkolnikov, M., Zhang, J.: Inverting the Markovian projection, with an application
to local stochastic volatility models. Ann. Probab. 48, 2189–2211 (2020)
84. Ladyzhenskaia, O.A., Solonnikov, V.A., Ural’tseva, N.N.: Linear and Quasilinear Equations
of Parabolic Type. Translations of Mathematical Monographs, vol. 23. American Mathemat-
ical Society, Providence, RI (1968). Translated from the Russian by S. Smith
85. Lanconelli, E., Polidoro, S.: On a class of hypoelliptic evolution operators. Rend. Sem. Mat.
Univ. Politec. Torino 52, 29–63 (1994)
86. Langevin, P.: Sur la théorie du mouvement Brownien. C.R. Acad. Sci. Paris 146, 530–532
(1908)
87. Lee, E.B., Markus, L.: Foundations of Optimal Control Theory, second edn. Robert E. Krieger
Publishing Co., Melbourne, FL (1986)
88. Lemons, D.S.: An Introduction to Stochastic Processes in Physics. Johns Hopkins University
Press, Baltimore, MD (2002). Containing “On the theory of Brownian motion” by Paul
Langevin, translated by Anthony Gythiel
89. Levi, E.E.: Sulle equazioni lineari totalmente ellittiche alle derivate parziali. Rend. Circ. Mat.
Palermo 24, 275–317 (1907)
90. Liptser, R.S., Shiryaev, A.N.: Statistics of Random Processes. I, expanded edn., vol. 5 of
Applications of Mathematics (New York). Springer, Berlin (2001). General theory, Translated
from the 1974 Russian original by A. B. Aries, Stochastic Modelling and Applied Probability
91. Liu, W., Röckner, M.: Stochastic Partial Differential Equations: An Introduction. Universitext.
Springer, Cham (2015)
92. Lototsky, S.V., Rozovsky, B.L.: Stochastic Partial Differential Equations. Universitext.
Springer, Cham (2017)
93. Ma, J., Yong, J.: Forward-backward Stochastic Differential Equations and Their Applications,
vol. 1702 of Lecture Notes in Mathematics. Springer, Berlin (1999)
94. Mazliak, L., Shafer, G.: The Splendors and Miseries of Martingales - Their History from the
Casino to Mathematics. Trends in the History of Science. Birkhäuser, Cham (2022)
95. Menozzi, S.: Parametrix techniques and martingale problems for some degenerate Kol-
mogorov equations. Electron. Commun. Probab. 16, 234–250 (2011)
96. Meyer, P.-A.: Probability and Potentials. Blaisdell Publishing Co. Ginn, Waltham (1966)
97. Meyer, P.-A.: Stochastic processes from 1950 to the present. J. Électron. Hist. Probab. Stat. 5,
42 (2009). Translated from the French [MR1796860] by Jeanine Sedjro
98. Mörters, P., Peres, Y.: Brownian Motion, vol. 30 of Cambridge Series in Statistical and Prob-
abilistic Mathematics. Cambridge University Press, Cambridge (2010). With an appendix by
Oded Schramm and Wendelin Werner
99. Mumford, D.: The dawning of the age of stochasticity. Atti Accad. Naz. Lincei Cl. Sci. Fis.
Mat. Natur. Rend. Lincei (9) Mat. Appl. 11, 107–125 (2000). Mathematics towards the third
millennium (Rome, 1999)
100. Novikov, A.A.: A certain identity for stochastic integrals. Teor. Verojatnost. i Primenen. 17,
761–765 (1972)
101. Nualart, D.: The Malliavin Calculus and Related Topics, second edn. Probability and its
Applications (New York). Springer, Berlin (2006)
102. Øksendal, B.: Stochastic Differential Equations, fifth edn. Universitext. Springer, Berlin
(1998). An introduction with applications
103. Oleinik, O.A., Radkevic, E.V.: Second Order Equations with Nonnegative Characteristic
Form. Plenum Press, New York (1973). Translated from the Russian by Paul C. Fife
104. Ornstein, L.S., Uhlenbeck, G.E.: On the theory of the Brownian motion. Phys. Rev. 36, 823–
841 (1930)
105. Pagliarani, S., Pascucci, A.: The exact Taylor formula of the implied volatility. Finance Stoch.
21, 661–718 (2017)
106. Pagliarani, S., Pascucci, A., Pignotti, M.: Intrinsic Taylor formula for Kolmogorov-type
homogeneous groups. J. Math. Anal. Appl. 435, 1054–1087 (2016)
107. Pardoux, E.: Stochastic partial differential equations and filtering of diffusion processes.
Stochastics 3, 127–167 (1979)
108. Pardoux, E.: Stochastic Partial Differential Equations. SpringerBriefs in Mathematics.
Springer, Cham (2021). An introduction
109. Pardoux, E., Peng, S.G.: Adapted solution of a backward stochastic differential equation. Syst.
Control Lett. 14, 55–61 (1990)
110. Pardoux, E., Rascanu, A.: Stochastic Differential Equations, Backward SDEs, Partial Differ-
ential Equations, vol. 69 of Stochastic Modelling and Applied Probability. Springer, Cham
(2014)
111. Pascucci, A.: Calcolo stocastico per la finanza, vol. 33 of Unitext. Springer, Milano (2008)
112. Pascucci, A.: PDE and Martingale Methods in Option Pricing, vol. 2 of Bocconi & Springer
Series. Springer/Bocconi University Press, Milan (2011)
113. Pascucci, A.: Probability Theory. Volume 1 - Random Variables and Distributions. Unitext.
Springer, Milan (2024)
114. Pascucci, A., Pesce, A.: Sobolev embeddings for kinetic Fokker-Planck equations. J. Funct.
Anal. 286, Paper No. 110344, 40 (2024)
115. Pascucci, A., Runggaldier, W.J.: Financial Mathematics, vol. 59 of Unitext. Springer, Milan
(2012). Theory and problems for multi-period models, Translated and extended version of
the 2009 Italian original
116. Paulos, J.A.: A Mathematician Reads the Newspaper. Basic Books, New York (2013).
Paperback edition of the 1995 original with a new preface
117. Peng, S.G.: A nonlinear Feynman-Kac formula and applications. In: Control Theory, Stochas-
tic Analysis and Applications (Hangzhou, 1991), pp. 173–184. World Sci. Publ., River Edge,
NJ (1991)
118. Pogorzelski, W.: Étude de la solution fondamentale de l’équation parabolique. Ricerche Mat.
5, 25–57 (1956)
119. Polidoro, S.: Uniqueness and representation theorems for solutions of Kolmogorov-Fokker-
Planck equations. Rend. Mat. Appl. (7) 15, 535–560 (1995)
120. Prévôt, C., Röckner, M.: A Concise Course on Stochastic Partial Differential Equations,
vol. 1905 of Lecture Notes in Mathematics. Springer, Berlin (2007)
121. Protter, P.E.: Stochastic Integration and Differential Equations, second edn., vol. 21 of
Stochastic Modelling and Applied Probability. Springer, Berlin (2005). Version 2.1, Corrected
third printing
122. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press
(2006). Available at http://www.gaussianprocess.org/gpml/
123. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion, third edn., vol. 293 of
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical
Sciences]. Springer, Berlin (1999)
124. Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes, and Martingales. Vol. 2,
Cambridge Mathematical Library. Cambridge University Press, Cambridge (2000). Itô
calculus, Reprint of the second (1994) edition
125. Rozovsky, B.L.: Stochastic Evolution Systems, vol. 35 of Mathematics and its Applications
(Soviet Series). Kluwer Academic Publishers Group, Dordrecht (1990). Linear theory and
applications to nonlinear filtering, Translated from the Russian by A. Yarkho
126. Rozovsky, B.L., Lototsky, S.V.: Stochastic Evolution Systems, vol. 89 of Probability Theory
and Stochastic Modelling. Springer, Cham (2018). Linear theory and applications to non-
linear filtering
127. Salsburg, D.: The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth
Century. Henry Holt and Company (2002)
128. Schilling, R.L.: Sobolev embedding for stochastic processes. Expo. Math. 18, 239–242 (2000)
129. Schilling, R.L.: Brownian Motion—A Guide to Random Processes and Stochastic Calculus.
De Gruyter Textbook, De Gruyter, Berlin (2021). With a chapter on simulation by Björn
Böttcher, Third edition [of 2962168]
130. Shaposhnikov, A., Wresch, L.: Pathwise vs. path-by-path uniqueness, preprint,
arXiv:2001.02869 (2020)
131. Skorokhod, A.V.: Studies in the Theory of Random Processes. Dover Publications, Mineola,
NY (2017). Translated from the Russian by Scripta Technica, Inc., Reprint of the 1965 edition
132. Stroock, D.W.: Markov Processes from K. Itô’s Perspective, vol. 155 of Annals of Mathemat-
ics Studies. Princeton University Press, Princeton, NJ (2003)
133. Stroock, D.W.: Partial Differential Equations for Probabilists, vol. 112 of Cambridge Studies
in Advanced Mathematics. Cambridge University Press, Cambridge (2012). Paperback
edition of the 2008 original
134. Stroock, D.W., Varadhan, S.R.S.: Diffusion processes with continuous coefficients. I. Comm.
Pure Appl. Math. 22, 345–400 (1969)
135. Stroock, D.W., Varadhan, S.R.S.: Diffusion processes with continuous coefficients. II. Comm.
Pure Appl. Math. 22, 479–530 (1969)
136. Stroock, D.W., Varadhan, S.R.S.: Multidimensional Diffusion Processes. Classics in Mathe-
matics. Springer, Berlin (2006). Reprint of the 1997 edition
137. Struwe, M.: Variational Methods, fourth edn., vol. 34 of Ergebnisse der Mathematik und ihrer
Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics
and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer, Berlin
(2008). Applications to nonlinear partial differential equations and Hamiltonian systems
138. Taira, K.: Semigroups, Boundary Value Problems and Markov Processes, second edn.
Springer Monographs in Mathematics. Springer, Heidelberg (2014)
139. Tanaka, H.: Note on continuous additive functionals of the 1-dimensional Brownian path. Z.
Wahrscheinlichkeitstheorie Verw. Gebiete 1, 251–257 (1962/1963)
140. Trevisan, D.: Well-posedness of multidimensional diffusion processes with weakly differen-
tiable coefficients. Electron. J. Probab. 21, Paper No. 22, 41 (2016)
141. Tychonoff, A.: Théorèmes d'unicité pour l'équation de la chaleur. Math. Sbornik 42, 199–216
(1935)
142. van Casteren, J.A.: Markov Processes, Feller Semigroups and Evolution Equations, vol. 12
of Series on Concrete and Applicable Mathematics. World Scientific Publishing, Hackensack
(2011)
143. Vasicek, O.: An equilibrium characterization of the term structure. J. Financ. Econ. 5, 177–
188 (1977)
144. Veretennikov, A.Y.: Strong solutions and explicit formulas for solutions of stochastic integral
equations. Mat. Sb. (N.S.) 111(153), 434–452, 480 (1980)
145. Veretennikov, A.Y.: “Inverse diffusion” and direct derivation of stochastic Liouville equations.
Mat. Zametki 33, 773–779 (1983)
146. Veretennikov, A.Y.: On backward filtering equations for SDE systems (direct approach). In:
Stochastic Partial Differential Equations (Edinburgh, 1994), pp. 304–311, vol. 216 of London
Math. Soc. Lecture Note Ser. Cambridge Univ. Press, Cambridge (1995)
147. Vespri, V.: Le anime della matematica. Da Pitagora alle intelligenze artificiali. Diarkos
editore, Santarcangelo di Romagna (2023)
148. Williams, D.: Probability with Martingales. Cambridge Mathematical Textbooks. Cambridge
University Press, Cambridge (1991)
149. Yamada, T., Watanabe, S.: On the uniqueness of solutions of stochastic differential equations.
J. Math. Kyoto Univ. 11, 155–167 (1971)
150. Yong, J., Zhou, X.Y.: Stochastic Controls, vol. 43 of Applications of Mathematics (New
York). Springer, New York (1999). Hamiltonian systems and HJB equations
151. Zabczyk, J.: Mathematical Control Theory—An Introduction, Systems & Control: Founda-
tions & Applications. Birkhäuser/Springer, Cham (2020). Second edition [of 2348543]
152. Zhang, J.: Backward Stochastic Differential Equations, vol. 86 of Probability Theory and
Stochastic Modelling. Springer, New York (2017). From linear to fully nonlinear theory
153. Zhang, X.: Stochastic homeomorphism flows of SDEs with singular drifts and Sobolev
diffusion coefficients. Electron. J. Probab. 16, 38, 1096–1116 (2011)
154. Zvonkin, A.K.: A transformation of the phase space of a diffusion process that will remove
the drift. Mat. Sb. (N.S.) 93(135), 129–149, 152 (1974)
Index
A
Almost everywhere (a.e.), xix
Almost surely (a.s.), xix
A priori estimate
    L^p, 281
    exponential, 283
Arg max, xii
Aronson, D.G., 406
Assumptions
    standard for SDE, 277

B
Bachelier, L., 71
Black&Scholes, 247
Blumenthal, O., 114, 117
Boundary

C
Càdlàg, 88
Canonical version
    of a continuous process, 63
    of a Markov process, 33
    of a process, 12
Cauchy-Schwarz, 167
Change of drift, 244
Chapman-Kolmogorov, 37, 381, 391
Characteristic exponent, 116
Characteristics, 296
Coefficient
    diffusion, 205, 264
    drift, 264
Commutator, 312
Completion, 11
Condition
    Hörmander, 312
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1
G
Girsanov, I.V., 251, 253
Grönwall, T.H., 281
Gyöngy, I., 353

H
Hörmander, L., 306, 312, 313
Hilbert-Schmidt, 231

I
Independent increments, 34
Inequality
    Burkholder-Davis-Gundy, 216, 219
    Doob's maximal, 103, 104, 135
    upcrossing, 105
Infinitesimal generator, 42
Inhomogeneous term, 373
Integral
    Itô, 175
    Lebesgue-Stieltjes, 158
    Riemann-Stieltjes, 151, 154, 200
Intensity, 86
    stochastic, 90
Isometry
    Itô, 178, 186, 199, 232
Itô
    formula, 209, 210
        for Brownian motion, 211, 239
        for continuous semimartingales, 233
        for Itô processes, 214, 234
    integral, 175
    isometry, 178, 186, 199, 232
    process, 204
        multidimensional, 231
    process with deterministic coefficients, 215
Itô-Tanaka, 348

K
Kalman, R.E., 310
Kernel
    Poisson, 295
Kolmogorov, A.N., 11, 23, 64, 306
Kolmogorov equation
    backward, 47
    forward, 51, 332
Komlós, J., 168
Kronecker, L., 228
Krylov, N.V., 346, 363

L
Langevin, P., 305
Laplace, P.S., 129
Law
    0-1 of Blumenthal, 114, 117
    of a continuous process, 63
    iterated logarithm, 74
    of a stochastic process, 4
    transition, 25
        Gaussian, 28, 40
        homogeneous, 27
        linear SDE, 303
        Poisson, 27, 39
Lebesgue-Stieltjes, 158
Lemma
    Grönwall, 281
    Gyöngy, 353
    Komlós, 168
Levi, E.E., 342, 379
Lévy, P., 114, 238
Lévy-Khintchine, 116
Linear system
    controllability, 308

M
Markov
    process, 30
        finite-dimensional laws, 36
    property, 30
        extended, 32
Markov, A., 25, 30, 123, 330
Markovian projection, 354
Martingale, 15, 338
    Brownian, 77, 257
    càdlàg, 139
    discrete, 15
    exponential, 77, 212, 235, 244
    local, 143
    problem, 338
    quadratic, 77, 234
    stopped, 143
    sub-, 17
    super-, 17
    uniformly square-integrable, 146
Matrix
    covariation, 166, 232
Mean reversion, 314
Measurability
    progressive, 118
Measure
    harmonic, 295
    Lebesgue-Stieltjes, 158
    Wiener, 76
Mesh, 152