Probability Theory II: Stochastic Calculus (ISBN 3031631927 / 9783031631924)

UNITEXT 166, by Andrea Pascucci, focuses on advanced topics in probability theory and stochastic calculus, designed for undergraduate and graduate courses. The book, a translation of the original Italian edition, emphasizes stochastic processes, particularly Markov processes and martingales, and aims to provide a solid foundation for students and researchers in the field. It highlights the importance of probability theory across various disciplines, including physics, finance, and computer science, and serves as a resource for understanding and applying stochastic models.


UNITEXT 166

Andrea Pascucci

Probability
Theory II
Stochastic Calculus
UNITEXT

La Matematica per il 3+2

Volume 166

Editor-in-Chief
Alfio Quarteroni, Politecnico di Milano, Milan, Italy
École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Series Editors
Luigi Ambrosio, Scuola Normale Superiore, Pisa, Italy
Paolo Biscari, Politecnico di Milano, Milan, Italy
Ciro Ciliberto, Università di Roma “Tor Vergata”, Rome, Italy
Camillo De Lellis, Institute for Advanced Study, Princeton, NJ, USA
Victor Panaretos, Institute of Mathematics, École Polytechnique Fédérale de
Lausanne (EPFL), Lausanne, Switzerland
Lorenzo Rosasco, DIBRIS, Università degli Studi di Genova, Genova, Italy
Center for Brains, Minds and Machines, Massachusetts Institute of Technology,
Cambridge, Massachusetts, US
Istituto Italiano di Tecnologia, Genova, Italy
The UNITEXT - La Matematica per il 3+2 series is designed for undergraduate
and graduate academic courses, and also includes books addressed to PhD students
in mathematics, presented at a sufficiently general and advanced level so that the
student or scholar interested in a more specific theme would get the necessary
background to explore it.
Originally released in Italian, the series now publishes textbooks in English
addressed to students in mathematics worldwide.
Some of the most successful books in the series have evolved through several
editions, adapting to the evolution of teaching curricula.
Submissions must include at least 3 sample chapters, a table of contents, and
a preface outlining the aims and scope of the book, how the book fits in with the
current literature, and which courses the book is suitable for.
For any further information, please contact the Editor at Springer:
[email protected]
THE SERIES IS INDEXED IN SCOPUS
***
UNITEXT is glad to announce a new series of free webinars and interviews
handled by the Board members, who rotate in order to interview top experts in their
field.
Access this link to subscribe to the events:
https://cassyni.com/s/springer-unitext
Andrea Pascucci

Probability Theory II
Stochastic Calculus
Andrea Pascucci
Dipartimento di Matematica
Alma Mater Studiorum – Università di Bologna
Bologna, Italy

UNITEXT
ISSN 2038-5714   ISSN 2532-3318 (electronic)
La Matematica per il 3+2
ISSN 2038-5722   ISSN 2038-5757 (electronic)
ISBN 978-3-031-63192-4   ISBN 978-3-031-63193-1 (eBook)
https://doi.org/10.1007/978-3-031-63193-1
This book is a translation of the original Italian edition “Teoria della Probabilità” by Andrea Pascucci,
published by Springer-Verlag Italia S.r.l. in 2024. The translation was done with the help of an artificial
intelligence machine translation tool. A subsequent human revision was done primarily in terms of
content, so that the book will read stylistically differently from a conventional translation. Springer
Nature works continuously to further the development of tools for the production of books and on the
related technologies to support the authors.

Translation from the Italian language edition: “Teoria della Probabilità. Processi e calcolo stocastico”
by Andrea Pascucci, © Springer-Verlag Italia S.r.l., part of Springer Nature 2024. Published by Springer
Milano. All Rights Reserved.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

Cover illustration: Cino Valentini, Archeologia 2, 2021, acrylic fresco, private collection

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

If disposing of this product, please recycle the paper.


To my students

E ora, che ne sarà
del mio viaggio?
Troppo accuratamente l’ho studiato
senza saperne nulla. Un imprevisto
è la sola speranza. Ma mi dicono
che è una stoltezza dirselo.¹
Eugenio Montale, Prima del viaggio

¹ And now, what will become of my journey?
I’ve studied it too meticulously
without knowing anything about it. An unforeseen
event is the only hope. But they tell me
it’s foolish to say so.
Preface

“For over two millennia, Aristotle’s logic has ruled over the thinking of western
intellectuals. All precise theories, all scientific models, even models of the process
of thinking itself, have in principle conformed to the straight-jacket of logic.
But from its shady beginnings devising gambling strategies and counting corpses
in medieval London, probability theory and statistical inference now emerge as
better foundations for scientific models, especially those of the process of thinking
and as essential ingredients of theoretical mathematics, even the foundations of
mathematics itself. We propose that this sea change in our perspective will affect
virtually all of mathematics in the next century.”
David Bryant Mumford, The Dawning of the Age of Stochasticity [99]

“A mathematician is someone who loves philosophy, art, and poetry because they
find the profound human need everywhere, against and beyond the often ridiculous
oppositions between “hard” and “soft” sciences. Awareness of such an intertwining
further enhances (…) the high, inescapable and indestructible moral choice to
carry out one’s own action as a scientist and as a human being in society towards
good. And if good and true come together, they can only produce beauty.”
Rino Caputo, Preface to Le anime della matematica [147]

In Volume 1 of Probability Theory [113], we introduced fundamental concepts such as probability space and distribution, random variables, limit theorems, and
conditional expectation. This second volume complements the earlier material by
delving into more advanced classical topics in stochastic analysis. The primary
focus of this book lies in stochastic processes, with particular emphasis on two
crucial classes: Markov processes and martingales. The initial chapters provide a
general introduction to stochastic processes and explore the analysis of two key
examples of Markov processes: Brownian motion and the Poisson process. Historically, two major approaches have been employed for the construction of continuous Markov processes, often referred to as “diffusions.” The classical approach, pioneered
by A. N. Kolmogorov [69] and W. Feller [45], involves constructing a diffusion
based on its transition law, which is defined as the distributional solution to the backward and forward Kolmogorov differential equations. This approach relies on intricate analytical results from the theory of partial differential equations. Starting
from Chap. 8, we embark on a systematic study of martingales. One of the most
significant results in martingale theory is Doob’s decomposition theorem, which,
under appropriate assumptions, represents a process as the sum of a “drift part” and a “martingale part,” each with its own regularity properties. This type
of result forms the basis for the second approach, proposed by K. Itô, to the
construction of continuous Markov processes. Itô builds upon P. Lévy’s idea of
considering the infinitesimal increment of a diffusion as a Gaussian-type increment
with a suitable mean (drift) and covariance matrix (martingale part). In light of
these results, Itô develops a theory of stochastic differential calculus and provides a
method for constructing diffusions as solutions to stochastic differential equations.
The final part of the book is dedicated to an in-depth study of the existence and
uniqueness problems for stochastic equations and their connections to elliptic-
parabolic partial differential equations.
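To make the idea behind Itô’s construction concrete, here is a minimal numerical sketch (ours, not taken from the book): the Euler-Maruyama scheme, which realizes Lévy’s description by advancing the process with Gaussian increments whose mean comes from the drift and whose standard deviation comes from the diffusion coefficient. The function names are illustrative.

```python
import math
import random

def euler_maruyama(b, sigma, x0, T, n, seed=0):
    """Simulate one path of dX_t = b(X_t) dt + sigma(X_t) dW_t on [0, T]
    with n steps: each increment is Gaussian with mean b(x)*dt (drift part)
    and standard deviation sigma(x)*sqrt(dt) (martingale part)."""
    rng = random.Random(seed)
    dt = T / n
    x, path = x0, [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment W_{t+dt} - W_t
        x = x + b(x) * dt + sigma(x) * dw
        path.append(x)
    return path

# Example: geometric Brownian motion, b(x) = 0.05 x, sigma(x) = 0.2 x
path = euler_maruyama(lambda x: 0.05 * x, lambda x: 0.2 * x, 1.0, 1.0, 1000)
```

The scheme is only a first-order discretization; the book’s later chapters treat the exact meaning of such equations and their solutions.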
This book comprises sufficient material to support at least two semester-long
courses on stochastic processes and calculus, suitable for graduate or doctoral-level
studies. It is designed as a relatively concise compendium, given the intricacy of the
subject, providing a solid groundwork for those interested in exploring stochastic
models for practical applications and for those beginning their research journey in
the field of stochastic analysis.
As emphasized in the introduction of the first volume [113], it is worth restating
the quote by David Mumford that initiates the preface: nowadays, probability theory
is regarded as an indispensable component for the theoretical advancement of
mathematics and the very foundations of mathematics itself. In this context, the
noteworthy review article [97] examines the remarkable progress in research on
stochastic processes since the mid-twentieth century.
From an applied standpoint, probability theory serves as the fundamental tool
used to model and manage risk in all fields where phenomena are studied under
conditions of uncertainty:
• Physics and Engineering, where stochastic numerical methods, such as Monte Carlo methods, are extensively used. These methods were first formalized by Enrico Fermi and John von Neumann.
• Economics and Finance, starting with the famous Black-Scholes-Merton formula, for which Scholes and Merton received the Nobel Prize. Financial modeling generally requires an advanced background in mathematical-probabilistic-numerical
methods. The text [112] provides an introduction to the theory of financial
derivative valuation, balancing the probabilistic approach (based on martingale
theory) and the analytic approach (based on partial differential equations theory).
• Telecommunications: NASA utilizes the Kalman-Bucy filter method to filter
signals from satellites and probes sent into space. From [102], page 2: “In
1960 Kalman and in 1961 Kalman and Bucy proved what is now known as
the Kalman-Bucy filter. Basically the filter gives a procedure for estimating the
state of a system which satisfies a ‘noisy’ linear differential equation, based on a series of ‘noisy’ observations. Almost immediately the discovery found applications in aerospace engineering (Ranger, Mariner, Apollo etc.) and it now
has a broad range of applications. Thus the Kalman-Bucy filter is an example
of a recent mathematical discovery which has already proved to be useful -
it is not just ‘potentially’ useful. It is also a counterexample to the assertion
that ‘applied mathematics is bad mathematics’ and to the assertion that ‘the
only really useful mathematics is the elementary mathematics’. For the Kalman-
Bucy filter—as the whole subject of stochastic differential equations—involves
advanced, interesting and first class mathematics.”
• Medicine and Botany: the most important stochastic process, the Brownian
motion, is named after Robert Brown, a botanist who observed the irregular
movement of colloidal particles in suspension around 1830. Brownian motion
was used by Louis Jean-Baptiste Bachelier in 1900 in his doctoral thesis to model
stock prices and was the subject of one of Albert Einstein’s most famous works
published in 1905. The first mathematically rigorous definition of the Brownian
motion was given by Norbert Wiener in 1923.
• Genetics: it is the science that studies the transmission of traits and the
mechanisms by which they are inherited. Gregor Johann Mendel (1822–1884), a
Czech Augustinian monk considered the precursor of modern genetics, made a
fundamental methodological contribution by applying probability calculus to the
study of biological inheritance for the first time.
• Computer Science: quantum computers exploit the laws of quantum mechanics
for data processing. In a “classical” computer, the unit of information is the bit:
we can always determine the state of a bit and precisely establish whether it is 0
or 1. However, we cannot determine the state of a quantum bit (qubit), the unit of quantum information, with the same level of precision. We can only determine the probabilities that it assumes the values 0 and 1.
• Jurisprudence: the verdict issued by a judge in a court is based on the
probability of the defendant’s guilt estimated from the information provided by
the investigations. In this field, the concept of conditional probability plays a
fundamental role, and its misuse can lead to notorious miscarriages of justice,
some of which are recounted in [116].
• Meteorology: for forecasts beyond the fifth day, it is crucial to have proba-
bilistic meteorological models. These probabilistic models are generally run in
major international meteorological centers because they require highly complex
statistical-mathematical procedures that are computationally intensive. Since 2020, the Data Centre of the European Centre for Medium-Range Weather Forecasts (ECMWF) has been located in Bologna.
• Military Applications: from [127] page 139: “In 1938, Kolmogorov had pub-
lished a paper that established the basic theorems for smoothing and predicting
stationary stochastic processes. An interesting comment on the secrecy of war
efforts comes from Norbert Wiener (1894–1964) who, at the Massachusetts
Institute of Technology, worked on applications of these methods to military
problems during and after the war. These results were considered so important to
America’s Cold War efforts that Wiener’s work was declared top secret. But all
of it, Wiener insisted, could have been deduced from Kolmogorov’s early paper.”
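As a toy instance of the Monte Carlo methods mentioned in the first item of the list above (a sketch of ours, not from the book): π can be estimated as four times the empirical probability that a uniform random point of the unit square falls inside the quarter disc.

```python
import random

def monte_carlo_pi(n, seed=0):
    """Estimate pi = 4 * P(X^2 + Y^2 <= 1), with X, Y uniform on [0, 1],
    replacing the probability by an empirical frequency over n samples."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n

print(monte_carlo_pi(200_000))  # an estimate of pi, with error of order n**(-1/2)
```

The n^(−1/2) convergence rate is exactly the content of the central limit theorem recalled in Volume 1.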
Finally, probability is at the foundation of the development of the most recent
technologies in Machine Learning and all related applications in Artificial Intelli-
gence, such as autonomous driving, speech and image recognition, and more (see,
for example, [54] and [122]). Nowadays, an advanced knowledge of Probability
Theory is a minimum requirement for anyone interested in pursuing applied
mathematics in any of the aforementioned fields.
It should be acknowledged that there are numerous monographs on stochastic
analysis: among my favorites I mention, in alphabetical order, Baldi [6], Bass [9],
Baudoin [13], Doob [35], Durrett [37], Friedman [50], Kallenberg [66], Karatzas
and Shreve [67], Mörters and Peres [98], Revuz and Yor [123], Schilling [129], and
Stroock [133]. Other excellent texts that have been major sources of inspiration and
ideas include those by Bass [10], Durrett [38], Klenke [68], and Williams [148]. In
any case, this list is far from exhaustive.
After more than two decades of teaching experience in this field, this book
represents my endeavor to systematically, concisely, and as comprehensively as
possible, compile the fundamental concepts of stochastic calculus that, in my view,
should constitute the essential knowledge for a modern mathematician, whether pure
or applied.
I would like to conclude by expressing my heartfelt gratitude to the exceptional
group of probabilists at the Department of Mathematics in Bologna: Stefano
Pagliarani, Elena Bandini, Cristina Di Girolami, Salvatore Federico, Antonello
Pesce, and Giacomo Lucertini, as well as those whom I hope will join us in the
future. A big thank you also goes to Andrea Cosso for his valuable collaboration
during the (all too short!) time he was a member of our department. Lastly, I extend
a special thank you to all the students who have taken my courses on probability
theory and stochastic calculus. This book was created for them, inspired by the
passion and energy they have shared with me. It is dedicated to them because I cannot refrain from adopting as my own, at least as an aspiration, the famous phrase of a great scientist: “I never teach my pupils; I only attempt to provide the conditions in which they can learn.”
Readers who wish to report any errors, typos, or suggestions for improvement
can do so at the following address: [email protected].
The corrections received after publication will be made available on the website
at: https://unibo.it/sitoweb/andrea.pascucci/.

Bologna, Italy Andrea Pascucci


April 2024
Frequently Used Symbols and Notations

• A := B means that A is, by definition, equal to B
• ⊎ indicates the disjoint union
• A_n ↗ A indicates that (A_n)_{n∈N} is an increasing sequence of sets such that A = ⋃_{n∈N} A_n
• A_n ↘ A indicates that (A_n)_{n∈N} is a decreasing sequence of sets such that A = ⋂_{n∈N} A_n
• B_d = B(R^d) is the Borel σ-algebra in R^d; B := B_1
• mF is the class of F-measurable functions

  f : (Ω, F) −→ (E, E);

  if (E, E) = (R, B), mF+ (resp. bF) denotes the class of F-measurable and non-negative (resp. F-measurable and bounded) functions.
• N is the family of negligible sets (cf. Definition 1.1.16 in [113])
• Numerical sets:
  – natural numbers: N = {1, 2, 3, . . .}, N_0 = N ∪ {0}, I_n := {1, . . . , n} for n ∈ N
  – real numbers R, extended real numbers R̄ = R ∪ {±∞}, positive real numbers R_{>0} = ]0, +∞[, non-negative real numbers R_{≥0} = [0, +∞[

• Leb_d indicates the d-dimensional Lebesgue measure; Leb := Leb_1


• Indicator function of a set A:

  1_A(x) := 1 if x ∈ A,  0 otherwise


• Euclidean scalar product:

  ⟨x, y⟩ = x · y = ∑_{i=1}^d x_i y_i,   x = (x_1, . . . , x_d), y = (y_1, . . . , y_d) ∈ R^d

  In matrix operations, the d-dimensional vector x is identified with the d × 1 column matrix.
• Maximum and minimum of real numbers:

  x ∧ y = min{x, y},   x ∨ y = max{x, y}

• Positive and negative part:

  x+ = x ∨ 0,   x− = (−x) ∨ 0

• Argument of the maximum and minimum of f : A −→ R:

  arg max_{x∈A} f(x) = {y ∈ A | f(y) ≥ f(x) for every x ∈ A}

  arg min_{x∈A} f(x) = {y ∈ A | f(y) ≤ f(x) for every x ∈ A}
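The notations above translate directly into code. The following sketch (ours, with hypothetical helper names, not part of the book’s text) checks the elementary identities x = x+ − x− and |x| = x+ + x−, and computes an arg max over a finite set.

```python
def indicator(A, x):
    """1_A(x): 1 if x belongs to the set A, 0 otherwise."""
    return 1 if x in A else 0

def pos(x):
    """Positive part: x+ = x v 0."""
    return max(x, 0)

def neg(x):
    """Negative part: x- = (-x) v 0."""
    return max(-x, 0)

def argmax_set(f, A):
    """arg max of f over a finite set A: the set of all maximizers."""
    m = max(f(x) for x in A)
    return {y for y in A if f(y) == m}

x = -3.5
assert x == pos(x) - neg(x)       # x = x+ - x-
assert abs(x) == pos(x) + neg(x)  # |x| = x+ + x-
print(argmax_set(lambda t: -(t - 2) ** 2, {0, 1, 2, 3}))  # {2}
```

Note that arg max is a set, possibly with several elements, matching the definition above.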
Contents

1 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Stochastic Processes: Law and Finite-Dimensional
Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Measurable Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Existence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Filtrations and Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Proof of Kolmogorov’s Extension Theorem . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2 Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1 Transition Law and Feller Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3 Processes with Independent Increments and Martingales . . . . . . . . . . 34
2.4 Finite-Dimensional Laws and Chapman-Kolmogorov Equation. . . 36
2.5 Characteristic Operator and Kolmogorov Equations . . . . . . . . . . . . . . . 41
2.5.1 The Local Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.2 Backward Kolmogorov Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.5.3 Forward Kolmogorov (or Fokker-Planck) Equation . . . . . . 51
2.6 Markov Processes and Diffusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.7 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3 Continuous Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.1 Continuity and a.s. Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Canonical Version of a Continuous Process . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3 Kolmogorov’s Continuity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4 Proof of Kolmogorov’s Continuity Theorem . . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4 Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Markov and Feller Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75


4.3 Wiener Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76


4.4 Brownian Martingales. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5 Poisson Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Markov and Feller Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.3 Martingale Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4 Proof of Theorem 5.2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6 Stopping Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.1 The Discrete Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.1.1 Optional Sampling, Maximal Inequalities, and
Upcrossing Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2 The Continuous Case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2.1 Usual Conditions and Stopping Times . . . . . . . . . . . . . . . . . . . . 107
6.2.2 Filtration Enlargement and Markov Processes . . . . . . . . . . . . 110
6.2.3 Filtration Enlargement and Lévy Processes . . . . . . . . . . . . . . . 114
6.2.4 General Results on Stopping Times . . . . . . . . . . . . . . . . . . . . . . . 118
6.3 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7 Strong Markov Property. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.1 Feller and Strong Markov Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.2 Reflection Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.3 The Homogeneous Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8 Continuous Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.1 Optional Sampling and Maximal Inequalities . . . . . . . . . . . . . . . . . . . . . . 134
8.2 Càdlàg Martingales. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.3 The Space M^{c,2} of Square-Integrable Continuous Martingales . . . 141
8.4 The Space M^{c,loc} of Continuous Local Martingales . . . . . . . . . . . . . 143
8.5 Uniformly Square-Integrable Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.6 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
9 Theory of Variation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
9.1 Riemann-Stieltjes Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
9.2 Lebesgue-Stieltjes Integral. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.3 Semimartingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.3.1 Brownian Motion as a Semimartingale . . . . . . . . . . . . . . . . . . . . 161
9.3.2 Semimartingales of Bounded Variation. . . . . . . . . . . . . . . . . . . . 163
9.4 Doob’s Decomposition and Quadratic Variation Process . . . . . . . . . . 165
9.5 Covariation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
9.6 Proof of Doob’s Decomposition Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9.7 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

10 Stochastic Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175


10.1 Integral with Respect to a Brownian Motion. . . . . . . . . . . . . . . . . . . . . . . . 176
10.1.1 Proof of Lemma 10.1.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
10.2 Integral with Respect to Continuous Square-Integrable
Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
10.2.1 Integral of Indicator Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
10.2.2 Integral of Simple Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
10.2.3 Integral in L² . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
10.2.4 Integral in L²_loc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
10.2.5 Stochastic Integral as a Riemann-Stieltjes Integral . . . . . . . 200
10.3 Integral with Respect to Continuous Semimartingales . . . . . . . . . . . . . 202
10.4 Scalar Itô Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
10.5 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11 Itô’s Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
11.1 Itô’s Formula for Continuous Semimartingales . . . . . . . . . . . . . . . . . . . . . 209
11.1.1 Itô’s Formula for Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . 211
11.1.2 Itô’s Formula for Itô Processes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
11.2 Some Consequences of Itô’s Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
11.2.1 Burkholder-Davis-Gundy Inequalities . . . . . . . . . . . . . . . . . . . . . 216
11.2.2 Quadratic Variation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
11.3 Proof of Itô’s Formula. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
11.4 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
12 Multidimensional Stochastic Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.1 Multidimensional Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.2 Multidimensional Itô Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
12.3 Multidimensional Itô’s Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
12.4 Lévy’s Characterization and Correlated Brownian Motion . . . . . . . . 238
12.5 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
13 Changes of Measure and Martingale Representation . . . . . . . . . . . . . . . . . . 243
13.1 Change of Measure and Itô Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
13.1.1 An Application: Risk-Neutral Valuation of
Financial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
13.2 Integrability of Exponential Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
13.3 Girsanov Theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
13.4 Approximation by Exponential Martingales . . . . . . . . . . . . . . . . . . . . . . . . 254
13.5 Representation of Brownian Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
13.5.1 Proof of Theorem 13.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
13.6 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
14 Stochastic Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
14.1 Solving SDEs: Concepts of Existence and Uniqueness . . . . . . . . . . . . 264
14.2 Weak Existence and Uniqueness via Girsanov Theorem . . . . . . . . . . . 269
14.3 Weak vs Strong Solutions: The Yamada-Watanabe Theorem . . . . . . 272
14.4 Standard Assumptions and Preliminary Estimates . . . . . . . . . . . . . . . . . 277
14.5 Some A Priori Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
14.6 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
15 Feynman-Kac Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
15.1 Characteristic Operator of an SDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
15.2 Exit Time from a Bounded Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
15.3 The Autonomous Case: The Dirichlet Problem. . . . . . . . . . . . . . . . . . . . . 292
15.4 The Evolutionary Case: The Cauchy Problem . . . . . . . . . . . . . . . . . . . . . . 297
15.5 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
16 Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
16.1 Solution and Transition Law of a Linear SDE . . . . . . . . . . . . . . . . . . . . . . 303
16.2 Controllability of Linear Systems and Absolute Continuity . . . . . . . 308
16.3 Kalman Rank Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
16.4 Hörmander’s Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
16.5 Examples and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
16.6 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
17 Strong Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
17.1 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
17.2 Existence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
17.3 Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
17.3.1 Forward Kolmogorov Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
17.4 Continuous Dependence on Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
18 Weak Solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
18.1 The Stroock-Varadhan Martingale Problem . . . . . . . . . . . . . . . . . . . . . . . . 338
18.2 Equations with Hölder Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
18.3 Other Results for the Martingale Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 346
18.4 Strong Uniqueness Through Regularization by Noise. . . . . . . . . . . . . . 347
18.5 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
19 Complements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
19.1 Markovian Projection and Gyöngy’s Lemma . . . . . . . . . . . . . . . . . . . . . . . 353
19.2 Backward Stochastic Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . 356
19.3 Filtering and Stochastic Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
19.4 Backward Stochastic Integral and Krylov’s SPDE . . . . . . . . . . . . . . . . . 362
20 A Primer on Parabolic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
20.1 Uniqueness: The Maximum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
20.1.1 Cauchy-Dirichlet Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
20.1.2 Cauchy Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
20.2 Existence: The Fundamental Solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
20.3 The Parametrix Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
20.3.1 Gaussian Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
20.3.2 Proof of Proposition 20.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
20.3.3 Potential Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
20.3.4 Proof of Theorem 20.2.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
20.3.5 Proof of Proposition 18.4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
20.4 Key Ideas to Remember. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Abbreviations

r.v. = random variable, a.s. = almost surely. A property holds a.s. if there exists a negligible set N such that the property is true for every ω ∈ Ω \ N
a.e. = almost everywhere (with respect to the Lebesgue measure)

We indicate the importance of the results with the following symbols:


[!] means that you should pay close attention and try to understand well, because
an important concept, a new idea, or a new technique is being introduced
[!!] means that the result is very important
[!!!] means that the result is fundamental
Certain points of particular significance or relevance will be highlighted by gray shading.

Chapter 1
Stochastic Processes

Infinite product spaces are the natural habitat of probability theory
William Feller

Random variables describe the state of a random phenomenon: for example, an unobservable position of a particle in a physics model or the price at a future date of a stock in a financial model. Stochastic processes describe the dynamics, over
of a stock in a financial model. Stochastic processes describe the dynamics, over
time or depending on other parameters, of a random phenomenon. A stochastic
process can be defined as a parameterized family of random variables, each of
which represents the state of the phenomenon corresponding to a fixed value of the
parameters. We have already encountered a simple but notable stochastic process in
Volume 1, Example 2.6.4 in [113], in which .(Xn )n∈N represents the evolution over
time of the price of a risky asset. From a more abstract perspective, a stochastic
process can be defined as a random variable with values in a functional space,
typically a space of curves in .RN : each curve represents a trajectory or possible
evolution of the phenomenon in .RN as the parameters vary.
The theory of stochastic processes is nowadays one of the richest and most
fascinating fields of mathematics: we point out the excellent review article [97]
which, with a wealth of insights, tells the story of research on stochastic processes
from the middle of the last century onwards.

1.1 Stochastic Processes: Law and Finite-Dimensional Distributions

In this section, we give two equivalent definitions of stochastic process. The first
definition is quite simple and intuitive; the second is more abstract but essential
for the proof of some general results on stochastic processes. We also introduce
some accessory notions: the space of trajectories, the law and the finite-dimensional
distributions.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_1
Let I be a generic non-empty set. Given d ∈ N, let mF be the set of random variables with values in R^d, defined on a probability space (Ω, F, P). The concept of a stochastic process extends that of a function from I to R^d, admitting that the values taken may be random: in other words, just as a function

f : I → R^d

associates t ∈ I with the dependent variable f(t) ∈ R^d, similarly a stochastic process

X : I → mF

associates t ∈ I with the d-dimensional random variable X_t ∈ mF.


Definition 1.1.1 (Stochastic Process) A stochastic process is a function with d-dimensional random values

X : I → mF,  t ↦ X_t.

If d = 1 we say that X is a real stochastic process. If I is finite or countable then we say that X is a discrete stochastic process.
One can equivalently think of the stochastic process X as an indexed family .X =
(Xt )t∈I of random variables. To fix ideas, often the domain I will be a subset of .R
that represents a set of time indices; for example, if .I = N then a process .(Xn )n∈N
is simply a sequence of random variables.
More generally, a stochastic process X can be defined by assuming that .Xt , for
each .t ∈ I , is a random variable with values in a generic measurable space .(E, E )
instead of .Rd .
To give the second definition of a stochastic process, it is necessary to introduce some preliminary notation. We denote by

R^I = {x : I → R}

the family of functions from I to R. For each x ∈ R^I and t ∈ I, we write x_t instead of x(t) and say that x_t is the t-th component of x: in this way we interpret R^I as the Cartesian product of R taken |I| times (even if I is not finite or countable). For example, if I = {1, . . . , d} then R^I is identifiable with R^d, while if I = N then R^N is the set of sequences x = (x_1, x_2, . . .) of real numbers. An element x ∈ R^I can be seen as a parameterized curve in R, where I is the set of parameters.
We say that R^I is the space of trajectories from I to R and x ∈ R^I is a real trajectory. There is nothing special about considering real trajectories: we could directly consider R^d or even a generic measurable space (E, E) instead of R. In such a case, the space of trajectories is E^I, the set of functions from I with values in E. However, at least for the moment, we restrict our attention to E = R, which is involved in the study of one-dimensional (or real) stochastic processes.
We endow the space of trajectories with a measurable space structure. On R^I we introduce a σ-algebra that generalizes the product σ-algebra defined in Section 2.3.2 in [113]. We call a finite-dimensional cylinder, or simply cylinder, a subset of R^I in which a finite number of components are “fixed”.

Definition 1.1.2 (Finite-Dimensional Cylinder) Given t ∈ I and H ∈ B, we say that the set

C_t(H) := {x ∈ R^I | x_t ∈ H}

is a one-dimensional cylinder. Given distinct t_1, . . . , t_n ∈ I and H_1, . . . , H_n ∈ B, we set H = H_1 × · · · × H_n and say that

$$C_{t_1,\dots,t_n}(H) := \{x \in \mathbb{R}^I \mid (x_{t_1},\dots,x_{t_n}) \in H\} = \bigcap_{i=1}^{n} C_{t_i}(H_i) \tag{1.1.1}$$

is a finite-dimensional cylinder. We denote by C the family of finite-dimensional cylinders and by

F^I := σ(C)

the σ-algebra generated by such cylinders.
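To make Definition 1.1.2 concrete, here is a minimal Python sketch (entirely ours, not notation from the text; the helper `in_cylinder`, the encoding of a trajectory as a callable, and of each H_i as a membership predicate are illustrative choices): membership in a cylinder constrains a trajectory only at finitely many indices.

```python
# A trajectory x in R^I is modeled as a callable t -> x(t).
# A cylinder C_{t1,...,tn}(H1 x ... x Hn) constrains x only at t1,...,tn.

def in_cylinder(x, times, sets):
    """Return True if (x(t1),...,x(tn)) lies in H1 x ... x Hn,
    where each Hi is given as a membership predicate."""
    return all(H(x(t)) for t, H in zip(times, sets))

# Example: the cylinder {x | x(0.5) in [0,1], x(1.0) in (2,3)}
times = [0.5, 1.0]
sets = [lambda v: 0 <= v <= 1, lambda v: 2 < v < 3]

x1 = lambda t: t                         # x1(1.0) = 1 is not in (2,3)
x2 = lambda t: 2.2 if t >= 1 else 0.3    # satisfies both constraints

print(in_cylinder(x1, times, sets))  # False
print(in_cylinder(x2, times, sets))  # True
```

Any trajectory agreeing with x2 at the two times 0.5 and 1.0 belongs to the same cylinder, no matter how it behaves elsewhere.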


The .σ -algebra .F I is a very abstract object and, at least for the moment, it is not
important to try to visualize it concretely or to understand its structure in depth:
some additional information about .F I will be provided in Remark 1.1.10. We
introduced .F I in order to give the following alternative definition.
Definition 1.1.3 (Stochastic Process) A real stochastic process .X = (Xt )t∈I on
the probability space .(Ω, F , P ) is a random variable with values in the space of
trajectories .(RI , F I ):

X : Ω −→ RI .
.

Remark 1.1.4 The fact that X is a random variable means that the following measurability condition holds:

(X ∈ C) ∈ F for every C ∈ F^I.   (1.1.2)

In turn, condition (1.1.2) is equivalent¹ to the fact that

(X_t ∈ H) ∈ F for every H ∈ B, t ∈ I,   (1.1.3)

and therefore Definitions 1.1.1 and 1.1.3 are equivalent. In summary, one can also
say that a real stochastic process X is a function

X : I × Ω → R,  (t, ω) ↦ X_t(ω)

that
• associates to each t ∈ I the random variable ω ↦ X_t(ω): this is the standpoint of Definition 1.1.1;
• associates to each ω ∈ Ω the trajectory t ↦ X_t(ω): this is the standpoint of Definition 1.1.3. Note that each outcome ω ∈ Ω corresponds to (and can be identified with) a trajectory of the process.
Example 1.1.5 Every function .f : I −→ R can be seen as a stochastic process
interpreting, for each fixed .t ∈ I , .f (t) as a constant random variable. In other
words, if .Ω = {ω} is a sample space consisting of a single element, the process
defined by .Xt (ω) = f (t) has only one trajectory which is the function f . The
measurability condition (1.1.3) is obvious since .F = {∅, Ω}. In this sense, the
concept of a stochastic process generalizes that of a function because it allows the
existence of multiple trajectories.
From the standpoint of Definition 1.1.3 a stochastic process is a random variable
and therefore we can define its law.
Definition 1.1.6 (Law) The distribution (or law) of the stochastic process X is the
probability measure on .(RI , F I ) defined by

μ_X(C) = P(X ∈ C),  C ∈ F^I.

Remark 1.1.7 (Finite-Dimensional Distributions) Even the concept of the law of a stochastic process is abstract and not very convenient: from an operational perspective, a much more effective tool are the so-called finite-dimensional distributions

¹ Indeed, (X_t ∈ H) = (X ∈ C) where C is the one-dimensional cylinder (i.e., one in which only one component is fixed) defined by {x ∈ R^I | x_t ∈ H}: so it is clear that if X is a stochastic process then X_t ∈ mF for every t ∈ I. Conversely, the family

H := {C ∈ F^I | X⁻¹(C) ∈ F}

is a σ-algebra that, by hypothesis, includes the one-dimensional cylinders and therefore also C (cylinders are finite intersections of one-dimensional cylinders). Then H ⊇ σ(C) = F^I.
which are the distributions μ_(X_{t_1},...,X_{t_n}) of the random vectors (X_{t_1}, . . . , X_{t_n}) as the choice of a finite number of indices t_1, . . . , t_n ∈ I varies. The law of a process is uniquely determined by the finite-dimensional distributions: in other words, knowing the law is equivalent to knowing the finite-dimensional distributions of a stochastic process.²
The one-dimensional distributions are not sufficient to identify the law of a
process. This is clear when I is finite and therefore the process is simply a random
vector: in fact, the one-dimensional distributions are the marginal laws of the vector
which obviously do not identify the joint law. Another interesting example is given
in Remark 4.1.5.
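A tiny numerical illustration of this point, ours rather than the book's: on I = {1, 2} the two random vectors below have identical N(0,1) one-dimensional marginals, yet different joint laws, which the empirical cross-moment E[X_1 X_2] detects.

```python
import random

random.seed(3)

# Two "processes" on I = {1, 2} (i.e., random vectors) with the SAME
# one-dimensional N(0,1) marginals but DIFFERENT joint laws.
def sample_X():
    z = random.gauss(0, 1)
    return (z, z)                                    # fully correlated

def sample_Y():
    return (random.gauss(0, 1), random.gauss(0, 1))  # independent

N = 100_000
corr_X = sum(a * b for a, b in (sample_X() for _ in range(N))) / N
corr_Y = sum(a * b for a, b in (sample_Y() for _ in range(N))) / N

# E[X1*X2] is 1 in the first case and 0 in the second,
# even though each component is N(0,1) in both.
print(corr_X, corr_Y)  # close to 1.0 and 0.0 respectively
```

The marginals coincide, so only a joint (two-dimensional) distribution can tell the two models apart.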
Example 1.1.8 Let A, B ∼ N_{0,1} be independent random variables. Consider the stochastic process X = (X_t)_{t∈R} defined by

X_t = At + B,  t ∈ R.

Each trajectory of X is a linear function (a straight line) on R. It is not obvious how to specify the distribution of this process, but it is easy to calculate the finite-dimensional distributions; in fact, given t_1, . . . , t_n ∈ R we have

$$\begin{pmatrix} X_{t_1} \\ \vdots \\ X_{t_n} \end{pmatrix} = \alpha \begin{pmatrix} A \\ B \end{pmatrix}, \qquad \alpha = \begin{pmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_n & 1 \end{pmatrix},$$

and therefore, by Proposition 2.5.15 in [113], (X_{t_1}, . . . , X_{t_n}) ∼ N_{0, αα*}.
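The computation above can be checked by simulation; the following pure-Python sketch (variable names ours) draws (A, B) repeatedly and compares the empirical covariance of (X_s, X_t) with the corresponding entry of αα*, namely cov(X_s, X_t) = st + 1.

```python
import random

random.seed(0)
s, t = 0.5, 1.0
N = 200_000

xs, xt = [], []
for _ in range(N):
    A, B = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(A * s + B)   # sample of X_s
    xt.append(A * t + B)   # sample of X_t (same A, B: the process is coupled)

mean_s = sum(xs) / N
mean_t = sum(xt) / N
emp_cov = sum((u - mean_s) * (v - mean_t) for u, v in zip(xs, xt)) / N

# Theory: cov(X_s, X_t) = st*Var(A) + Var(B) = s*t + 1
print(emp_cov, s * t + 1)  # both close to 1.5
```

The same loop with independent draws of (A, B) for each time would destroy the coupling and give a different joint law, again illustrating that marginals alone are not enough.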


Example 1.1.9 (Gaussian Process) We say that a stochastic process is Gaussian if it has normal finite-dimensional distributions. If X = (X_t)_{t∈I} is Gaussian, consider the mean and covariance functions

m(t) := E[X_t],  c(s, t) := cov(X_s, X_t),  s, t ∈ I.

² The measure of a generic cylinder C_{t_1,...,t_n}(H) is expressed as

μ_X(C_{t_1,...,t_n}(H)) = μ_(X_{t_1},...,X_{t_n})(H)

and therefore the finite-dimensional distributions identify μ_X on C. On the other hand, C is a ∩-closed family and generates F^I: by Corollary I-cc2 in [113], if two probability measures on (R^I, F^I) coincide on C then they are equal. In other words, if μ_1(C) = μ_2(C) for each C ∈ C then μ_1 ≡ μ_2. We will see that, thanks to Carathéodory's theorem, a probability measure extends uniquely from C to F^I: this is the content of one of the first fundamental results on stochastic processes, Kolmogorov's extension theorem, which we will examine in Sect. 1.3.

These functions determine the finite-dimensional distributions (and therefore also the law!) of the process because, for each choice t_1, . . . , t_n ∈ I, we have

$$(X_{t_1},\dots,X_{t_n}) \sim N_{M,C}$$

where

$$M = (m(t_1),\dots,m(t_n)) \quad \text{and} \quad C = \big(c(t_i,t_j)\big)_{i,j=1,\dots,n}. \tag{1.1.4}$$

We observe that C = (c(t_i, t_j))_{i,j=1,...,n} is a symmetric and positive semi-definite matrix. Obviously, if I is finite then X is nothing but a random vector with multi-normal distribution. The process of Example 1.1.8 is Gaussian with zero mean and covariance function c(s, t) = st + 1. The trivial process of Example 1.1.5 is also Gaussian with mean function f(t) and identically zero covariance function: in this case, X_t ∼ δ_{f(t)} for every t ∈ I. Finally, a fundamental example of a Gaussian process is the Brownian motion that we will define in Chap. 4.
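As a sanity check, one can build the matrices M and C of (1.1.4) from a given pair (m, c) and verify symmetry and positive semi-definiteness numerically; the sketch below (helper names ours) does so for the covariance c(s, t) = st + 1 of Example 1.1.8, testing the quadratic form η*Cη on random vectors η.

```python
import random

def mean_vector(m, times):
    # M = (m(t1), ..., m(tn)) as in (1.1.4)
    return [m(t) for t in times]

def cov_matrix(c, times):
    # C = (c(ti, tj))_{i,j} as in (1.1.4)
    return [[c(s, t) for t in times] for s in times]

m = lambda t: 0.0
c = lambda s, t: s * t + 1.0   # covariance of X_t = A t + B

times = [0.2, 0.7, 1.3, 2.0]
n = len(times)
M = mean_vector(m, times)
C = cov_matrix(c, times)

# Symmetry: C[i][j] == C[j][i]
assert all(C[i][j] == C[j][i] for i in range(n) for j in range(n))

# Positive semi-definiteness: eta^T C eta >= 0 for random eta
# (here eta^T C eta = (sum eta_i t_i)^2 + (sum eta_i)^2, so this must hold)
random.seed(1)
for _ in range(1000):
    eta = [random.uniform(-1, 1) for _ in times]
    q = sum(eta[i] * C[i][j] * eta[j] for i in range(n) for j in range(n))
    assert q >= -1e-12
print("C is symmetric and the quadratic form was nonnegative on all trials")
```

This is only a spot check on random directions, not a proof; the analytic argument is the factorization of the quadratic form noted in the comment.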
Remark 1.1.10 ([!]) There are families of trajectories, even very significant ones, that do not belong to the σ-algebra F^I. The idea is that every element of F^I is characterized by a countable number of coordinates³ and this is highly restrictive when I is uncountable. For example, if I = [0, 1] we have

C[0, 1] ∉ F^{[0,1]}

since the family C[0, 1] of continuous functions cannot be characterized, in the space of all functions from [0, 1] to R, by imposing conditions on a countable number of coordinates.⁴ For the same reason, even the singletons {x} with x ∈ R^{[0,1]}, the subsets of R^{[0,1]} with a finite number of elements, and other significant
³ More precisely, let us solve Exercise 1.4 in [9]: consider I = [0, 1] (thus the space of trajectories R^I is the family of functions from [0, 1] to R). Given a sequence τ = (t_n)_{n≥1} ∈ [0, 1]^N, we identify τ with the map

τ : R^{[0,1]} → R^N,  τ(x) := (x_{t_n})_{n≥1},

and put

M = {τ⁻¹(H) | τ ∈ [0, 1]^N, H ∈ F^N},  τ⁻¹(H) = {x ∈ R^{[0,1]} | τ(x) ∈ H},

where F^N denotes the σ-algebra generated by the cylinders in R^N. Then M ⊆ F^{[0,1]} and contains the family of finite-dimensional cylinders of R^{[0,1]}, which is a ∩-closed family that generates F^{[0,1]}. Moreover, one proves that M is a monotone family: it follows from Corollary A.0.4 in [113] that M = F^{[0,1]}, i.e., every element C ∈ F^{[0,1]} is of the form C = τ⁻¹(H) for some sequence τ in [0, 1] and some H ∈ F^N. In other words, C is characterized by the choice of a countable number of coordinates τ = (t_n)_{n≥1} (as well as by H ∈ F^N).
⁴ By contradiction, if C[0, 1] = τ⁻¹(H) for some sequence of coordinates τ = (t_n)_{n≥1} in [0, 1] and some H ∈ F^N, then modifying x ∈ C[0, 1] at a point t ∉ τ should still result in a continuous function, and this is clearly false.

families such as, for example,

$$\Big\{ x \in \mathbb{R}^{[0,1]} \;\Big|\; \sup_{t \in [0,1]} x_t < 1 \Big\}$$

do not belong to F^{[0,1]}.


These examples may raise strong perplexity towards the .σ -algebra .F I which
is not wide enough to contain important families of trajectories like those just
considered. Actually, the problem is that the sample space .RI , of all the functions
from I to .R, is so large as to be hardly tractable as a measurable space, thus making
it difficult to develop a general theory of stochastic processes. For this reason, as
soon as possible we will replace .RI with a state space that, in addition to being
“smaller”, also possesses a useful metric space structure: this is the case of the space
of continuous trajectories that we will examine in Sect. 3.2.

1.1.1 Measurable Processes

We have given two equivalent definitions of stochastic process, each with its own
advantages and disadvantages:
(i) a stochastic process is a function with random values (Definition 1.1.1)

.X : I −→ mF

that associates to each .t ∈ I the random variable .Xt defined on the probability
space .(Ω, F , P );
(ii) a stochastic process is a random variable with values in a space of
trajectories (Definition 1.1.3): according to this much more abstract definition,
a process .X = X(ω) is a random variable

X : Ω −→ RI
.

from the probability space .(Ω, F , P ) to the space of trajectories .RI , equipped
with the structure of a measurable space with the .σ -algebra .F I . This definition
is used in the proof of the most general and theoretical results even if it is a less
operational notion and more difficult to apply to the study of concrete examples.
Note that the previous definitions do not require any assumptions about the type of
dependence of X with respect to the variable t (for example, measurability or some
kind of regularity). Obviously, the problem does not arise if I is a generic set, devoid
of any measurable or metric space structure; however, if I is a real interval then it
is possible to endow the product space .I × Ω with a structure of measurable space
with the product .σ -algebra .B ⊗ F .

Definition 1.1.11 (Measurable Process) A measurable stochastic process is a measurable function

X : (I × Ω, B ⊗ F) → (R, B).

By Lemma 2.3.11 in [113], if X is a measurable stochastic process then:
• X_t is a random variable for each t ∈ I;
• the trajectory t ↦ X_t(ω) is a Borel measurable function from I to R, for each ω ∈ Ω.

If I ⊆ R it is natural to interpret t ∈ I as a time index: then, as we will see in Sect. 1.4, the probability space will be enriched with new elements (filtrations) and a predominant role will be assumed by a particular class of stochastic processes, called martingales. In that context, we will strengthen the notion of measurability by introducing the concept of progressively measurable process (cf. Definition 6.2.27).
In the literature, the term “General Theory of Stochastic Processes” usually refers to the field that deals with the general properties of processes when I = R≥0: for a concise introduction see, for example, Chapter 16 in [9] and Chapter 1 in [65].

1.2 Uniqueness

There are various notions of equivalence between stochastic processes. First of all,
two processes .X = (Xt )t∈I and .Y = (Yt )t∈I are equal in law if they have the same
distribution (or, equivalently, if they have the same finite-dimensional distributions):
in this case X and Y could even be defined on different probability spaces. When X
and Y are defined on the same probability space .(Ω, F , P ), we can provide other
notions of equivalence expressed in terms of equality of trajectories. We first recall
that, in a probability space .(Ω, F , P ), a subset A of .Ω is almost sure (with respect
to P ) if there exists an event .C ⊆ A such that .P (C) = 1. If the probability space
is complete5 then every almost sure set A is an event and therefore we can simply
write .P (A) = 1.
Definition 1.2.1 (Modifications) Let .X = (Xt )t∈I and .Y = (Yt )t∈I be stochastic
processes on .(Ω, F , P ). We say that X and Y are modifications if .P (Xt = Yt ) = 1
for every .t ∈ I .
Remark 1.2.2 The previous definition can be easily generalized to the case of .X, Y
generic functions from .Ω to values in .RI : in this case .(Xt = Yt ) is not necessarily

5 We recall the definition given in Remark 2.1.11 in [113]: a probability space .(Ω, F , P ) is
complete if .N ⊆ F where .N denotes the family of negligible sets (cf. Definition 1.1.16 in
[113]).

an event and therefore we say that X is a modification of Y if the set (X_t = Y_t) is almost sure. This can be useful if it is not known a priori that X and/or Y are stochastic processes.
Definition 1.2.3 (Indistinguishable Processes) Let .X = (Xt )t∈I and .Y = (Yt )t∈I
be stochastic processes on .(Ω, F , P ). We say that X and Y are indistinguishable if
the set

. (X = Y ) := {ω ∈ Ω | Xt (ω) = Yt (ω) for every t ∈ I }

is almost sure.
Remark 1.2.4 ([!]) Two processes X and Y are indistinguishable if they have
almost all the same trajectories. Even if X and Y are stochastic processes, it is not
necessarily true that (X = Y) is an event. In fact, (X = Y) = (X − Y)⁻¹({0}), where 0 denotes the identically zero trajectory: however, {0} ∉ F^I unless I is finite or countable (cf. Remark 1.1.10).
On the other hand, if the space .(Ω, F , P ) is complete then X and Y are
indistinguishable if and only if .P (X = Y ) = 1 since the completeness of the
space guarantees that .(X = Y ) ∈ F in the case .(X = Y ) is almost sure. For this
and other reasons that we will explain later, from now on we will often assume that
.(Ω, F , P ) is complete.

Remark 1.2.5 ([!]) If X and Y are modifications then they have the same finite-dimensional distributions and therefore are equal in law. If X and Y are indistinguishable then they are also modifications since for every t ∈ I we have (X = Y) ⊆ (X_t = Y_t). Conversely, if X and Y are modifications then they are not necessarily indistinguishable: indeed,

$$(X = Y) = \bigcap_{t \in I} (X_t = Y_t)$$

but if I is uncountable, such an intersection might not belong to F or might have probability less than one. If I is finite or countable then X, Y are modifications if and only if they are indistinguishable.
Let us give an explicit example of processes that are modifications but are not
indistinguishable.
Example 1.2.6 ([!]) Consider the sample space Ω = [0, 1] with the Lebesgue measure as the probability measure. Let I = [0, 1], let X = (X_t)_{t∈I} be the identically zero process and Y = (Y_t)_{t∈I} the process defined by

$$Y_t(\omega) = \begin{cases} 1 & \text{if } \omega = t, \\ 0 & \text{if } \omega \in [0,1] \setminus \{t\}. \end{cases}$$

Then X and Y are modifications since, for every t ∈ I,

(X_t = Y_t) = {ω ∈ Ω | ω ≠ t} = [0, 1] \ {t}

has Lebesgue measure equal to one, i.e., it is an almost sure event. On the other hand, every trajectory of X differs from the corresponding trajectory of Y at one point.
We also note that X and Y are equal in law, but X has all continuous trajectories and Y has all discontinuous trajectories: therefore, there are important properties of the trajectories of a stochastic process (such as, for example, continuity) that do not depend on the distribution of the process.
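The dichotomy in this example can be seen numerically; the sketch below is ours and only heuristic, since a simulation samples finitely many ω: a fixed-t comparison never detects a difference, while every sampled trajectory of Y differs from the zero trajectory at the point t = ω.

```python
import random

random.seed(42)

def X(t, omega):   # the identically zero process
    return 0

def Y(t, omega):   # Y_t(omega) = 1 if omega == t, else 0
    return 1 if omega == t else 0

# Fix t and sample omega uniformly: the event (X_t = Y_t) has probability 1.
# (float(1/3) is not representable on the grid of values that
# random.random() returns, so equality omega == t never occurs here.)
t_fixed = 1 / 3
count_diff = 0
for _ in range(100_000):
    omega = random.random()
    if X(t_fixed, omega) != Y(t_fixed, omega):
        count_diff += 1
print(count_diff)  # 0: for each fixed t, X_t = Y_t almost surely

# Yet no trajectory of Y coincides with the zero trajectory of X:
# for every omega, the two trajectories differ at the point t = omega.
omega = random.random()
assert X(omega, omega) == 0 and Y(omega, omega) == 1
```

The simulation compares the processes one t at a time (modifications), whereas indistinguishability would require the whole trajectories to agree, which here fails for every ω.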
In the case of continuous processes, we have the following particular result.
Proposition 1.2.7 Let I be a real interval and let .X = (Xt )t∈I and .Y = (Yt )t∈I be
processes with a.s. continuous trajectories.6 If X is a modification of Y , then .X, Y
are indistinguishable.
Proof By assumption, the trajectories .X(ω) and .Y (ω) are continuous for every .ω ∈
A with A almost sure. Moreover, .P (Xt = Yt ) = 1 for every .t ∈ I and consequently
the set

$$C := A \cap \bigcap_{t \in I \cap \mathbb{Q}} (X_t = Y_t)$$

is almost sure. For every t ∈ I, there exists an approximating sequence (t_n)_{n∈N} in I ∩ Q: by the continuity hypothesis, for every ω ∈ C we have

$$X_t(\omega) = \lim_{n\to\infty} X_{t_n}(\omega) = \lim_{n\to\infty} Y_{t_n}(\omega) = Y_t(\omega)$$

and this proves that X, Y are indistinguishable. ∎



Remark 1.2.8 The result of Proposition 1.2.7 remains valid for processes that are
only continuous from the right or from the left.

1.3 Existence

In this section, we show that it is “always” possible to construct a stochastic process with assigned finite-dimensional distributions.

⁶ The set of ω ∈ Ω such that t ↦ X_t(ω) and t ↦ Y_t(ω) are continuous functions is almost sure.

Let us make a preliminary remark: if μ_{t_1,...,t_n} are the finite-dimensional distributions of a real stochastic process (X_t)_{t∈I}, then we have

$$\mu_{t_1,\dots,t_n}(H_1 \times \cdots \times H_n) = P\big((X_{t_1} \in H_1) \cap \cdots \cap (X_{t_n} \in H_n)\big), \qquad t_1,\dots,t_n \in I,\ H_1,\dots,H_n \in \mathcal{B}. \tag{1.3.1}$$

As a consequence, the following consistency properties hold: for every finite family
of indices .t1 , . . . , tn ∈ I , for every .H1 , . . . , Hn ∈ B and for every permutation .ν of
the indices .1, 2, . . . , n, we have

$$\mu_{t_1,\dots,t_n}(H_1 \times \cdots \times H_n) = \mu_{t_{\nu(1)},\dots,t_{\nu(n)}}(H_{\nu(1)} \times \cdots \times H_{\nu(n)}), \tag{1.3.2}$$
$$\mu_{t_1,\dots,t_n}(H_1 \times \cdots \times H_{n-1} \times \mathbb{R}) = \mu_{t_1,\dots,t_{n-1}}(H_1 \times \cdots \times H_{n-1}). \tag{1.3.3}$$

A posteriori, it is clear that (1.3.2) and (1.3.3) are necessary conditions for the
distributions .μt1 ,...,tn to be the finite-dimensional distributions of a stochastic
process. The following result shows that these conditions are also sufficient.
Theorem 1.3.1 (Kolmogorov’s Extension Theorem [!!!]) Let I be a non-empty set. Suppose that, for each finite family of indices t_1, . . . , t_n ∈ I, a distribution μ_{t_1,...,t_n} on R^n is given, and the consistency properties (1.3.2) and (1.3.3) are satisfied. Then there exists a unique probability measure μ on (R^I, F^I) that has the μ_{t_1,...,t_n} as finite-dimensional distributions, i.e., such that

$$\mu(C_{t_1,\dots,t_n}(H)) = \mu_{t_1,\dots,t_n}(H) \tag{1.3.4}$$

for each finite family of indices t_1, . . . , t_n ∈ I and H = H_1 × · · · × H_n ∈ B^n.


Remark 1.3.2 ([!]) Under the hypotheses of the previous theorem, the measure .μ
extends further to a .σ -algebra .FμI that contains .F I and such that the probability
space .(RI , FμI , μ) is complete: this is a consequence of Corollary 1.5.11 in
[113] and the constructive method used in the proof of Carathéodory’s theorem.
Sometimes, .FμI is called the .μ-completion of .F I .
We postpone the proof of Theorem 1.3.1 to Sect. 1.5 and now examine some
remarkable applications.
Corollary 1.3.3 (Existence of Processes with Assigned Finite-Dimensional Dis-
tributions [!]) Let I be a non-empty set. Suppose that, for each finite family of
indices .t1 , . . . , tn ∈ I , a distribution .μt1 ,...,tn on .Rn is given, and the consistency
properties (1.3.2) and (1.3.3) are satisfied. Then there exists a stochastic process
.X = (Xt )t∈I that is defined on a complete probability space and has .μt1 ,...,tn as

finite-dimensional distributions.
Proof Proceed in a similar way to the case of real random variables (cf.
Remark 2.1.17 in [113]). Let .(Ω, F , P ) = (RI , FμI , μ) be the complete

probability space defined in Remark 1.3.2. The identity function

X : (R^I, F^I_μ) → (R^I, F^I),

defined by X(w) = w for each w ∈ R^I, is a stochastic process since X⁻¹(F^I) = F^I ⊆ F^I_μ. Moreover, X has the μ_{t_1,...,t_n} as finite-dimensional distributions since, for each finite-dimensional cylinder C_{t_1,...,t_n}(H) as in (1.1.1), we have

μ_X(C_{t_1,...,t_n}(H)) = μ(X ∈ C_{t_1,...,t_n}(H)) =

(since X is the identity function)

= μ(C_{t_1,...,t_n}(H)) =

(by (1.3.4))

= μ_{t_1,...,t_n}(H). ∎



Now consider a stochastic process X on the space (Ω, F, P). Denote by μ_X the law of X and by F^I_{μ_X} the μ_X-completion of F^I (cf. Remark 1.3.2).
Definition 1.3.4 (Canonical Version of a Stochastic Process [!]) The canonical version (or realization) of a process X is the process X̄, on the probability space (R^I, F^I_{μ_X}, μ_X), defined by X̄(w) = w for each w ∈ R^I.
Remark 1.3.5 By Corollary 1.3.3, X and its canonical realization X̄ are equal in law. Moreover, X̄ is defined on the complete probability space (R^I, F^I_{μ_X}, μ_X) in which the sample space is R^I and the outcomes are the trajectories of the process.
Corollary 1.3.6 (Existence of Gaussian Processes [!]) Let

.m : I −→ R, c : I × I −→ R

be functions such that, for every finite family of indices .t1 , . . . , tn ∈ I , the matrix .C = (c(ti , tj ))i,j =1,...,n is symmetric and positive semi-definite. Then there exists a Gaussian process, defined on a complete probability space .(Ω, F , P ), with mean function m and covariance function c.
In particular, choosing .I = R≥0 , there exists a Gaussian process with mean function .m ≡ 0 and covariance function .c(s, t) = t ∧ s ≡ min{s, t}.
Proof The family of distributions .NM,C , with .M, C as in (1.1.4), is well defined thanks to the hypothesis on the covariance function c. Moreover, it satisfies the consistency properties (1.3.2) and (1.3.3), as can be verified by applying (1.3.1) with .NM,C instead of .μt1 ,...,tn and .(Xt1 , . . . , Xtn ) ∼ NM,C . Then the first part of the thesis follows from Corollary 1.3.3.
Now let .t1 , . . . , tn ∈ R≥0 : the matrix .C = (min{ti , tj })i,j =1,...,n is obviously symmetric and is also positive semi-definite since, for every .η1 , . . . , ηn ∈ R, we have

.∑_{i,j=1}^{n} ηi ηj min{ti , tj } = ∑_{i,j=1}^{n} ηi ηj ∫_0^∞ 1[0,ti ] (s) 1[0,tj ] (s) ds = ∫_0^∞ (∑_{i=1}^{n} ηi 1[0,ti ] (s))^2 ds ≥ 0. ⨆
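The positive semi-definiteness argument above lends itself to a quick numerical check. The following sketch (assuming Python with numpy; the variable names are ours, not from the text) builds the matrix C = (min(ti, tj)) for a few times, confirms that its eigenvalues are non-negative, and samples the corresponding centered Gaussian vector:

```python
import numpy as np

# Sanity check of the proof above (a sketch; numpy is assumed):
# C[i, j] = min(t_i, t_j) is symmetric positive semi-definite, so N_{0,C}
# is a well-defined finite-dimensional distribution, which we then sample.
t = np.array([0.5, 1.0, 2.0, 3.5])
C = np.minimum.outer(t, t)               # C[i, j] = min(t_i, t_j)

eigvals = np.linalg.eigvalsh(C)          # all eigenvalues are non-negative
assert np.allclose(C, C.T) and eigvals.min() >= -1e-12

# Sample (X_{t_1}, ..., X_{t_n}) ~ N_{0,C} and compare the empirical covariance to C
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros(len(t)), C, size=50_000)
emp_cov = np.cov(sample, rowvar=False)
print(np.abs(emp_cov - C).max())         # small, and shrinks as the sample grows
```

With I = R≥0 these are exactly the finite-dimensional distributions of the process with covariance min(s, t) mentioned in the corollary.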
Corollary 1.3.7 (Existence of Independent Sequences of Random Variables [!])
Let .(μn )n∈N be a sequence of real distributions. There exists a sequence .(Xn )n∈N of
independent random variables defined on a complete probability space .(Ω, F , P ),
such that .Xn ∼ μn for every .n ∈ N.
Proof Apply Corollary 1.3.3 with .I = N. The family of finite-dimensional distributions defined by

.μk1 ,...,kn := μk1 ⊗ · · · ⊗ μkn , k1 , . . . , kn ∈ N,

verifies the consistency properties (1.3.2)–(1.3.3). By Corollary 1.3.3, there exists a process .(Xk )k∈N that has .μk1 ,...,kn as finite-dimensional distributions. Independence follows from Theorem 2.3.25 in [113] and the arbitrariness of the choice of indices .k1 , . . . , kn ∈ N. ⨆

Corollary 1.3.7 admits the following slightly more general version, whose proof is left as an exercise. Compared to Corollary 1.3.3, the following result requires only a simplified version of the consistency property.
Corollary 1.3.8 (Existence of Sequences of Random Variables with Assigned Distribution [!]) Let a sequence .(μn )n∈N be given, where .μn is a distribution on .Rn and

.μn+1 (H × R) = μn (H ), H ∈ Bn , n ∈ N.

Then there exists a sequence .(Xn )n∈N of random variables defined on a complete
probability space .(Ω, F , P ), such that .(X1 , . . . , Xn ) ∼ μn for every .n ∈ N.

1.4 Filtrations and Martingales

In this section, we consider the particular case where I is a subset of .R, typically

.I = R≥0 or I = [0, 1] or I = N.

In this case, it is useful to think of t as a parameter denoting a point in time.


Definition 1.4.1 (Filtration) Let .I ⊆ R and .(Ω, F , P ) be a probability space. A
filtration .(Ft )t∈I is an increasing family of sub-.σ -algebras of .F , in the sense that

Fs ⊆ Ft ⊆ F ,
. s, t ∈ I, s ≤ t.

In many applications, a .σ -algebra represents a set of information; as for


filtrations, the idea is that
• the .σ -algebra .Ft represents the information available at time t;
• the filtration .(Ft )t∈I represents the flow of information that increases over time.
The concept of information is crucial in probability theory: for example, the
very definition of conditional probability is essentially motivated by the problem
of describing the effect of information on the probability of events. Filtrations
constitute the mathematical tool that dynamically describes (as a function of time)
the available information and for this reason play a fundamental role in the theory of
stochastic processes. The following definition formalizes the idea that a stochastic
process is observable based on the information of some filtration.
Definition 1.4.2 (Adapted Process) Let .X = (Xt )t∈I be a stochastic process on
the space .(Ω, F , P ). We say that X is adapted to the filtration .(Ft )t∈I if .Xt ∈ mFt
for every .t ∈ I .
Definition 1.4.3 (Filtration Generated by a Process) Let .X = (Xt )t∈I be a
stochastic process on the space .(Ω, F , P ). The filtration generated by X, denoted
by .G X = (GtX )t∈I , is defined as

.GtX := σ (Xs , s ≤ t) ≡ σ (Xs−1 (H ), s ≤ t, H ∈ B), t ∈ I. (1.4.1)

Remark 1.4.4 We use the notation .G X for the filtration generated by X because
we want to reserve the symbol .F X for another filtration that we will define later
in Sect. 6.2.2 and call standard filtration for X. The filtration generated by X is
the “smallest” filtration that includes information about the process X: clearly, X is
adapted to .(Ft )t∈I if and only if .GtX ⊆ Ft for every .t ∈ I .
Remark 1.4.5 If .X is the canonical version of X (cf. Definition 1.3.4), then

.GtX = σ (Cs (H ) | s ∈ I, s ≤ t, H ∈ B), t ∈ I,

that is, the filtration generated by .X is the one generated by cylinders.



We now introduce a fundamental class of stochastic processes.


Definition 1.4.6 (Martingale [!!!] ) Let .X = (Xt )t∈I , with .I ⊆ R, be a stochastic
process on the filtered space .(Ω, F , P , Ft ). We say that X is a martingale if:
(i) X is an absolutely integrable process, i.e. .Xt ∈ L1 (Ω, P ) for every .t ∈ I ;
(ii) we have

.Xt = E [XT | Ft ] , t, T ∈ I, t ≤ T . (1.4.2)

If I is finite or countable, we say that X is a discrete martingale.


The concept of martingale is central to the theory of stochastic processes and in
many applications. Equation (1.4.2), called the martingale property, means that the
current value (at time t) of the process is the best estimate of the future value (at
time .T ≥ t) given the currently available information. In economics, for example,
the martingale property translates into the fact that if X represents the price of a
good, then such price is fair in the sense that it is the best estimate of the future
value of the good based on the information available at the moment.
Let X be a martingale on the filtered space .(Ω, F , P , Ft ). As an immediate
consequence of Definition 1.4.6 and the properties of conditional expectation, we
have:
(i) X is adapted to .(Ft )t∈I ;
(ii) X has constant expectation since, applying the expected value to both sides
of (1.4.2) we get7

E [Xt ] = E [XT ] ,
. t, T ∈ I.

Remark 1.4.7 The term martingale originally referred to a series of strategies


used by French gamblers in the 18th century, including the doubling strategy we
mentioned in Example 3.2.4 in [113]. The interesting monograph [94] illustrates
the history of the concept of martingale through the contribution of many famous
historians and mathematicians.
Example 1.4.8 ([!]) The sequence over time of wins and losses in a fair gambling
game can be represented by a discrete martingale: sometimes we win and sometimes
we lose, but if the game is fair, wins and losses balance each other on average.
More precisely, let .(Zn )n∈N be a sequence of i.i.d. random variables with .Zn ∼
qδ1 + (1 − q)δ−1 and .0 < q < 1 fixed. Consider the stochastic process

Xn := Z1 + · · · + Zn ,
. n ∈ N.

7 We recall that .E [E [XT | Ft ]] = E [XT ] by definition of conditional expectation.



Here .Zn represents the win or loss at the n-th play, q is the probability of winning,
and .Xn is the balance after n plays. Consider the filtration .(GnZ )n∈N of information
on the outcomes of the plays, .GnZ = σ (Z1 , . . . , Zn ). Then we have
.E[Xn+1 | GnZ ] = E[Xn + Zn+1 | GnZ ] =

(since .Xn ∈ mGnZ and .Zn+1 is independent of .GnZ )

. = Xn + E [Zn+1 ] = Xn + 2q − 1.

So .(Xn ) is a martingale if .q = 1/2, that is, if the game is fair. If .q ≥ 1/2, that is, if the probability of winning a single bet is greater than or equal to the probability of losing, then .Xn ≤ E[Xn+1 | GnZ ] (and we say that .(Xn ) is a sub-martingale): in this case, we also have .E [Xn ] ≤ E [Xn+1 ], that is, the process is increasing on average.
This example shows that the martingale property is not a property of the
trajectories of the process but depends on the probability measure and the filtration
considered.
Example 1.4.9 Let .X ∈ L1 (Ω, P ) and .(Ft )t∈I be a filtration on .(Ω, F , P ). A
simple application of the tower property shows that the process defined by .Xt =
E [X | Ft ], .t ∈ I , is a martingale, in fact we have

E [XT | Ft ] = E [E [X | FT ] | Ft ] = E [X | Ft ] = Xt ,
. t, T ∈ I, t ≤ T .

Remark 1.4.10 ([!]) We will often use the following remarkable identity, valid for a real-valued square-integrable martingale X, i.e. X such that .E[Xt2 ] < ∞ for .t ∈ I :

.E[(Xt − Xs )2 | Fs ] = E[Xt2 − Xs2 | Fs ], s ≤ t. (1.4.3)

It is enough to observe that

.E[(Xt − Xs )2 | Fs ] = E[Xt2 − 2Xt Xs + Xs2 | Fs ] = E[Xt2 | Fs ] − 2Xs E [Xt | Fs ] + Xs2 =

(by the martingale property)

. = E[Xt2 | Fs ] − Xs2

from which (1.4.3) follows.
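Taking expectations on both sides of (1.4.3) (tower property) gives E[(Xt − Xs)²] = E[Xt²] − E[Xs²]; for the fair walk of Example 1.4.8 both sides equal t − s. A Monte Carlo sketch of this consequence (numpy assumed, names illustrative):

```python
import numpy as np

# Monte Carlo check of (1.4.3) after taking expectations: for the fair walk,
# E[(X_t - X_s)^2] = E[X_t^2] - E[X_s^2] = t - s (the variance gained between s and t).
rng = np.random.default_rng(3)
s, t, n_paths = 3, 8, 400_000
Z = np.where(rng.random((n_paths, t)) < 0.5, 1, -1)
X = Z.cumsum(axis=1)
Xs, Xt = X[:, s - 1], X[:, t - 1]

lhs = ((Xt - Xs) ** 2).mean()
rhs = (Xt ** 2 - Xs ** 2).mean()
print(lhs, rhs)   # both close to t - s = 5
```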



Definition 1.4.11 Let .X = (Xt )t∈I be a stochastic process on the filtered space
(Ω, F , P , Ft ). We say that X is a sub-martingale if:
.

(i) X is an absolutely integrable process, adapted to .(Ft )t∈I ;


(ii) we have

Xt ≤ E [XT | Ft ] ,
. t, T ∈ I, t ≤ T .

Furthermore, X is a super-martingale if .−X is a sub-martingale.


Proposition 1.4.12 ([!]) If X is a martingale and .ϕ : R −→ R is a convex function
such that .ϕ(Xt ) ∈ L1 (Ω, P ) for every .t ∈ I , then .ϕ(X) is a sub-martingale.
If X is a sub-martingale and .ϕ : R −→ R is a convex, increasing function such
that .ϕ(Xt ) ∈ L1 (Ω, P ) for every .t ∈ I , then .ϕ(X) is a sub-martingale.
Proof The first part is an immediate consequence of Jensen’s inequality. Similarly,
if X is a sub-martingale then .Xt ≤ E [XT | Ft ] for .t ≤ T and since .ϕ is increasing,
we also have

ϕ(Xt ) ≤ ϕ (E [XT | Ft ]) ≤ E [ϕ(XT ) | Ft ]


.

where for the second inequality we have reapplied Jensen’s inequality. ⨆



Remark 1.4.13 If X is a martingale then .|X| is a non-negative sub-martingale: however, this is not necessarily true if X is a sub-martingale since .x |→ |x| is not increasing. Moreover, if X is a sub-martingale then so is .X+ := X ∨ 0 = (|X| + X)/2.
In the last part of this section, we consider the particular case where .I = N ∪{0}.
We give a deep result, valid also in a much more general framework, on the structure
of adapted stochastic processes: Doob’s decomposition theorem. First, we introduce
the following
Definition 1.4.14 (Predictable Process) Let .A = (An )n≥0 be a discrete stochastic
process, defined on the filtered space .(Ω, F , P , (Fn )n≥0 ). We say that A is
predictable if:
(i) .A0 = 0;
(ii) .An ∈ mFn−1 for every .n ∈ N.
Theorem 1.4.15 (Doob’s Decomposition Theorem) Let .X = (Xn )n≥0 be
an adapted and absolutely integrable stochastic process on the filtered space
.(Ω, F , P , (Fn )n≥0 ). There exist, and are a.s. unique, a martingale M and a

predictable process A such that

Xn = Mn + An ,
. n ≥ 0. (1.4.4)

In particular, if X is a martingale then .M ≡ X and .A ≡ 0; if X is a sub-martingale


then the process A has almost surely monotone increasing trajectories.

Proof Uniqueness If two processes M and A, with the properties of the statement,
exist then we have

.Xn+1 − Xn = Mn+1 − Mn + An+1 − An , n ≥ 0. (1.4.5)

Conditioning on .Fn and exploiting the fact that X is adapted, M is a martingale and
A is predictable, we have

.E [Xn+1 | Fn ] − Xn = E [Mn+1 | Fn ] − Mn + An+1 − An = An+1 − An .

Consequently, the process A is uniquely determined by the recursive formula



An+1 = An + E [Xn+1 | Fn ] − Xn , if n ∈ N,
. (1.4.6)
A0 = 0.

Note that from (1.4.6) it follows that if X is a sub-martingale then the process A has
almost surely monotone increasing trajectories.
Inserting (1.4.6) into (1.4.5) we also find

Mn+1 = Mn + Xn+1 − E [Xn+1 | Fn ] , if n ∈ N,
. (1.4.7)
M0 = X0 .

Existence It is enough to prove that the processes M and A, defined respectively


by (1.4.7) and (1.4.6), verify the properties of the statement. This is a simple check: for example, it is easy to prove by induction on n that A is predictable. Similarly, one proves that M is a martingale and that (1.4.4) holds. ⨆


Example 1.4.16 ([!]) Let X be as in Example 1.4.8. Then the processes of the Doob decomposition of X are easily computed:

Mn = Xn − n(2q − 1),
. An = n(2q − 1).

Note that in this case the process A is deterministic; moreover, X is a sub-martingale


for .q ≥ 12 and in this case .(An )n≥0 is a monotone increasing sequence.
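The decomposition of this example can be checked directly on a simulated path. A sketch (numpy assumed; the convention X0 = 0 and all names are ours) evaluating the recursions (1.4.6)–(1.4.7), which here collapse to A_n = n(2q − 1):

```python
import numpy as np

# Doob decomposition of the walk of Example 1.4.8 along one path (a sketch).
# Since E[X_{n+1} | F_n] = X_n + (2q - 1), recursions (1.4.6)-(1.4.7) give the
# deterministic predictable part A_n = n(2q - 1) and M_n = X_n - A_n.
q = 0.7
rng = np.random.default_rng(2)
Z = np.where(rng.random(15) < q, 1, -1)
X = np.concatenate(([0], Z.cumsum()))        # X_0 = 0, X_1, ..., X_15

n = np.arange(len(X))
A = n * (2 * q - 1)                          # predictable, here deterministic
M = X - A                                    # martingale part

assert np.allclose(X, M + A)                 # the decomposition (1.4.4)
assert A[0] == 0 and np.all(np.diff(A) > 0)  # q > 1/2: A is increasing
```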

1.5 Proof of Kolmogorov’s Extension Theorem

Lemma 1.5.1 The family C of finite-dimensional cylinders is a semi-ring.


Proof Recalling definition (1.1.1) of finite-dimensional cylinder

.Ct1 ,...,tn (H1 × · · · × Hn ) = ⋂_{i=1}^{n} Cti (Hi ), (1.5.1)

and observing that Ct (H ) ∩ Ct (K) = Ct (H ∩ K) for every t ∈ I and H, K ∈ B, it is not difficult to prove that C is a ∩-closed family and ∅ ∈ C . It remains to prove that the difference of cylinders is a finite and disjoint union of cylinders: since C \ D = C ∩ D c , for C, D ∈ C , it is sufficient to prove that the complement of a cylinder is a disjoint union of cylinders.
For a one-dimensional cylinder we have

.(Ct (H ))c = Ct (H c ),

and therefore, by (1.5.1),

.(Ct1 ,...,tn (H1 × · · · × Hn ))c = (⋂_{i=1}^{n} Cti (Hi ))c = ⋃_{i=1}^{n} Cti (Hic )

where in general the union is not disjoint: however, we observe that

.Ct1 (H1 ) ∪ Ct2 (H2 ) = Ct1 ,t2 (H1 × H2 ) ⊎ Ct1 ,t2 (H1c × H2 ) ⊎ Ct1 ,t2 (H1 × H2c ),

and in general

.⋃_{i=1}^{n} Cti (Hi ) = ⊎ Ct1 ,...,tn (K1 × · · · × Kn )

where the disjoint union is taken among all the different possible combinations of K1 × · · · × Kn where Ki is Hi or Hic , except for the case where Ki = Hic for every i = 1, . . . , n. ⨆

We define μ on C as in (1.3.4), that is

.μ(Ct1 ,...,tn (H1 × · · · × Hn )) := μt1 ,...,tn (H1 × · · · × Hn ), t1 , . . . , tn ∈ I, H1 , . . . , Hn ∈ B.

If we prove that μ is a pre-measure (i.e., μ is additive, σ -sub-additive, and such that μ(∅) = 0) on C , then by Carathéodory's Theorem 1.5.5 in [113], μ extends uniquely to a probability measure on F I .
Clearly, μ(∅) = 0 and it is not difficult to prove that μ is finitely additive. To prove that μ is σ -sub-additive, consider a sequence (Cn )n∈N of disjoint cylinders whose union is a cylinder C and show that8

.μ(C) = ∑_{n∈N} μ(Cn ). (1.5.2)

To this end, set

.Dn = C \ ⋃_{k=1}^{n} Ck , n ∈ N.

By Lemma 1.5.1, Dn is a finite and disjoint union of cylinders: therefore, μ(Dn ) is well-defined (by the additivity of μ) and we have

.μ(C) = ∑_{k=1}^{n} μ(Ck ) + μ(Dn ).

Then it is enough to prove that

. lim_{n→∞} μ(Dn ) = 0. (1.5.3)

Clearly, Dn ↘ ∅ as n → ∞. We prove (1.5.3) by contradiction and, without loss of generality, by passing to a subsequence if necessary, suppose there exists ε > 0 such that μ(Dn ) ≥ ε for every n ∈ N: using a compactness argument, we show that in this case the intersection of the Dn is not empty, which yields the contradiction.

8 Formula (1.5.2) implies the σ -sub-additivity: if A ∈ C and (An )n∈N is a sequence of elements in C such that

.A ⊆ ⋃_{n∈N} An

it is enough to set C1 = A ∩ A1 ∈ C and

.Cn = (A ∩ An ) \ ⋃_{k=1}^{n−1} Ak

with Cn which, by Lemma 1.5.1, is a finite and disjoint union of cylinders for each n ≥ 2. Then from (1.5.2) it follows that

.μ(A) ≤ ∑_{n∈N} μ(An ).

We know that Dn is a finite and disjoint union of cylinders: since Dn ⊇ Dn+1 , possibly repeating9 the elements of the sequence, we can assume

.Dn = ⋃_{k=1}^{Nn} C~k , C~k = {x ∈ RI | (xt1 , . . . , xtn ) ∈ Hk,1 × · · · × Hk,n }

for some sequence (tn )n∈N in I and Hk,n ∈ B. Now we use the following fact, the proof of which we postpone to the end: it is possible to construct a sequence (Kn )n∈N such that
• Kn ⊆ Rn is a compact subset of

.Bn := ⋃_{k=1}^{Nn} (Hk,1 × · · · × Hk,n ); (1.5.4)

• Kn+1 ⊆ Kn × R;
• μt1 ,...,tn (Kn ) ≥ ε/2.
Thus, we conclude the proof of (1.5.3). Since Kn /= ∅, for each n ∈ N there exists a vector

.(y1(n) , . . . , yn(n) ) ∈ Kn .

By compactness, the sequence (y1(n) )n∈N admits a subsequence (y1(kn ) )n∈N converging to a point y1 ∈ K1 . Similarly, the sequence (y1(kn ) , y2(kn ) )n∈N admits a subsequence converging to (y1 , y2 ) ∈ K2 . Repeating the argument, we construct a sequence (yn )n∈N such that (y1 , . . . , yn ) ∈ Kn for each n ∈ N. Therefore

.{x ∈ RI | xtk = yk , k ∈ N} ⊆ Dn

for each n ∈ N and this completes the proof by contradiction.


Finally, we prove the existence of the sequence (Kn )n∈N . For each n ∈ N there exists10 a compact subset K~n of Bn in (1.5.4) such that μt1 ,...,tn (Bn \ K~n ) ≤ ε/2^{n+1} .

9 Defining a new sequence of the form

.RI , . . . , RI , D1 , . . . , D1 , D2 , . . . , D2 , D3 . . .

in which RI and the elements of (Dn )n∈N are repeated a sufficient number of times.
10 It is enough to combine the property of internal regularity of μ
t1 ,...,tn (cf. Proposition 1.4.9 in
[113]) with the fact that, by the continuity from below, for each ε > 0 there exists a compact K
such that μt1 ,...,tn (Rn \ K) < ε: note that this latter fact is nothing but the tightness property of the
distribution μt1 ,...,tn (cf. Definition 3.3.5 in [113]).

Setting

.Kn := ⋂_{h=1}^{n} (K~h × R^{n−h} ), (1.5.5)

we have that Kn is a compact subset of Bn and Kn+1 ⊆ Kn × R. Now observe that

.Bn \ Kn ⊆ ⋃_{h=1}^{n} (Bn \ (K~h × R^{n−h} )) ⊆ ⋃_{h=1}^{n} (Bh \ K~h ) × R^{n−h}

and consequently

.μt1 ,...,tn (Bn \ Kn ) ≤ ∑_{h=1}^{n} μt1 ,...,tn ((Bh \ K~h ) × R^{n−h} ) = ∑_{h=1}^{n} μt1 ,...,th (Bh \ K~h ) ≤ ∑_{h=1}^{n} ε/2^{h+1} ≤ ε/2.

Then we have

.μt1 ,...,tn (Kn ) = μt1 ,...,tn (Bn ) − μt1 ,...,tn (Bn \ Kn ) ≥ ε/2,

since μt1 ,...,tn (Bn ) = μ(Dn ) ≥ ε by hypothesis. This concludes the proof. ⨆

Kolmogorov’s extension theorem generalizes, with a substantially identical
proof, to the case where the trajectories have values in a separable and complete
metric space (M, ϱ).11 We recall the notation Bϱ for the Borel σ -algebra on
(M, ϱ); moreover, MI is the family of functions from I to values in M and FϱI
is the σ -algebra generated by finite-dimensional cylinders

.Ct1 ,...,tn (H ) := {x ∈ MI | (xt1 , . . . , xtn ) ∈ H }

where t1 , . . . , tn ∈ I and H = H1 × · · · × Hn with H1 , . . . , Hn ∈ Bϱ .

11 The first part of the proof, based on Carathéodory’s theorem, is exactly the same. In the second

part, and in particular in the construction of the sequence of compact Kn in (1.5.5), the tightness
property is crucial: here we exploit the fact that, under the assumption that (M, ϱ) is separable and
complete, every distribution on Bϱ is tight (see, for example, Theorem 1.4 in [16]). Kolmogorov’s
theorem does not extend to any measurable space: in this regard, see, for example, [59] page 214.

Theorem 1.5.2 (Kolmogorov’s Extension Theorem [!!!] ) Let I be a non-empty


set and (M, ϱ) a separable and complete metric space. Suppose that, for each
finite family of indices t1 , . . . , tn ∈ I , a distribution μt1 ,...,tn is given on Mn , and
the following consistency properties are satisfied: for each finite family of indices
t1 , . . . , tn ∈ I , for each H1 , . . . , Hn ∈ Bϱ and for each permutation ν of the indices
1, 2, . . . , n, we have

.μt1 ,...,tn (H1 × · · · × Hn ) = μtν(1) ,...,tν(n) (Hν(1) × · · · × Hν(n) ),
.μt1 ,...,tn (H1 × · · · × Hn−1 × M) = μt1 ,...,tn−1 (H1 × · · · × Hn−1 ).


⎛ ⎞
Then there exists a unique probability measure μ on MI , FϱI that has μt1 ,...,tn as
finite-dimensional distributions, i.e., such that

.μ(Ct1 ,...,tn (H )) = μt1 ,...,tn (H )

for each finite family of indices t1 , . . . , tn ∈ I and H = H1 × · · · × Hn with


H1 , . . . , Hn ∈ Bϱ .

1.6 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while excluding overly technical or
ancillary details. If you have any doubt about what the following succinct statements
mean, please review the corresponding section.
• Section 1.1: we define a stochastic process as a function taking random values
or equivalently, albeit in a more abstract way, as a random variable with values
in the functional space of trajectories. The finite-dimensional distributions of a
process determine its law, playing the same role as the distribution of a random
variable.
• Section 1.2: we compare the different notions of equality between stochastic
processes, introducing the definitions of equivalence in law, indistinguishable
processes and modifications.
• Section 1.3: the main existence result for processes is Kolmogorov’s extension
Theorem 1.3.1. It states that it is possible to construct a stochastic process start-
ing from given finite-dimensional distributions that satisfy natural consistency
properties: it is a corollary of Carathéodory’s Theorem 1.4.29 in [113], and the
proof, being somewhat technical, can be safely skipped at a first reading.

• Section 1.4: martingales constitute a fundamental class of stochastic processes


that, together with Markov processes, will be the main object of study in
the following chapters. Martingales have constant expectation and originate as
models for fair gambling games. The martingale property depends on the fixed
probability measure and filtration: a filtration describes the increasing flow of
observable information as the temporal index varies.
Main notations introduced in this chapter:

Symbol | Description | Page
.RI = {x : I −→ R} | Space of trajectories, I is the family of parameters | 2
.Ct1 ,...,tn (H ) = {x ∈ RI | xti ∈ Hi , i = 1, . . . , n} | Finite-dimensional cylinder with .ti ∈ I and .Hi ∈ B | 3
.C | Family of finite-dimensional cylinders | 3
.F I = σ (C ) | .σ -algebra generated by finite-dimensional cylinders | 3
.FμI | Completion of .F I with respect to the measure .μ | 11
.GtX = σ (Xs , s ≤ t) | Filtration generated by the process X | 14
Chapter 2
Markov Processes

World is stochastic.
From “Students’ opinions on educational activities”, A.Y.
2022/23 University of Bologna

Markov processes constitute a fundamental class of stochastic processes, character-


ized by a memoryless property which renders them highly tractable and beneficial
in practical applications. In this chapter the set of indices is .I = R≥0 , where .t ∈ I
is interpreted as a time instant.

2.1 Transition Law and Feller Processes

Definition 2.1.1 (Transition Law) A transition law on RN is a function

.p = p(t, x; T , H ), 0 ≤ t ≤ T , x ∈ RN , H ∈ BN ,

that satisfies the following conditions:


(i) for every 0 ≤ t ≤ T and x ∈ RN , p(t, x; T , ·) is a distribution, i.e., a
probability measure on BN , and p(t, x; t, ·) = δx ;
(ii) for every 0 ≤ t ≤ T and H ∈ BN , the function x |→ p(t, x; T , H ) is BN -
measurable.
Let X = (Xt )t≥0 be a stochastic process taking values in RN , defined on the
probability space (Ω, F , P ). We say that X has transition law p if:
(i) p is a transition law;
(ii) we have1

p(t, Xt ; T , H ) = P (XT ∈ H | Xt ),
. 0 ≤ t ≤ T , H ∈ BN .

1 We recall the convention where P (XT ∈ H | Xt ) denotes the usual conditional expectation E[1H (XT ) | Xt ], as in Remark 4.3.5 in [113].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 25


A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_2

Remark 2.1.2 By properties (i) and (ii) of Definition 2.1.1, if X has transition law
p then p(t, Xt ; T , ·) is a regular version2 of the conditional law of XT given Xt .
Hence, we have
ˆ
. p(t, Xt ; T , dy)ϕ(y) = E [ϕ(XT ) | Xt ] , ϕ ∈ bBN , (2.1.1)
RN

by Theorem 4.3.8 in [113]. Analogously, p(t, x; T , ·) is a regular version of the


conditional distribution function3 of XT given Xt and we have
ˆ
. p(t, x; T , dy)ϕ(y) = E [ϕ (XT ) | Xt = x] , x ∈ RN , ϕ ∈ bBN ,
RN
(2.1.2)
by Theorem 4.3.19 in [113]. Notice that
ˆ
.u(x) := p(t, x; T , dy)ϕ(y), x ∈ RN ,
RN

is a BN -measurable, bounded function: indeed, by (ii) of Definition 2.1.1, u ∈ bBN if ϕ = 1H and, by approximation, thanks to Lemma 2.2.3 in [113] and Beppo Levi's theorem, the same holds for every ϕ ∈ bBN . In accordance with notation (4.2.10) in [113], formula (2.1.2) indicates that u is a version of the conditional expectation function of ϕ (XT ) given Xt .
Remark 2.1.3 Definition 2.1.1 extends in an obvious way to the case where,
instead of (RN , BN ), a generic metric space (M, ϱ) is considered, equipped with
the Borel σ -algebra Bϱ (cf. Definition 1.4.4 in [113]).
Example 2.1.4 ([!]) Consider the “trivial” case of the deterministic process Xt =
γ (t) with γ : R≥0 −→ RN which is interpreted as a parametrized curve in RN .
We have

E [ϕ(XT ) | Xt ] = ϕ(γ (T )) = ϕ(γ (t) + γ (T ) − γ (t))


.

and therefore a regular version of the conditional expectation function of ϕ(XT )


given Xt equals
ˆ
E [ϕ(XT ) | Xt = x] = ϕ(x + γ (T ) − γ (t)) =
. δx+γ (T )−γ (t) (dy)ϕ(y).
R

In other words,

p(t, x; T , ·) = δx+γ (T )−γ (t)


.

2 Definition 4.3.1 in [113].


3 Theorem 4.3.16 in [113].

is a transition law of X: this result is a very particular case of Proposition 2.3.2 which we will prove later. Notice that the transition law is not unique: for example, if for every 0 ≤ t ≤ T we set

.p~(t, x; T , ·) = δx+γ (T )−γ (t) if x = γ (t), δx if x /= γ (t),

then p~ is also a transition law for X.
Remark 2.1.5 (Time-Homogeneous Transition Law) A transition law p is said
to be time-homogeneous if

p(t, x; T , H ) = p(0, x; T − t, H ),
. 0 ≤ t ≤ T , x ∈ R, H ∈ B.

If X has a time-homogeneous transition law p, then


ˆ
E [ϕ(XT ) | Xt = x] =
. p(t, x; T , dy)ϕ(y)
R
ˆ
= p(0, x; T − t, dy)ϕ(y) = E [ϕ(XT −t ) | X0 = x] .
R
(2.1.3)

Equation (2.1.3) means that the conditional expectation function of ϕ(XT ) given Xt
is equal to the conditional expectation function of the temporally translated process
at the initial time.4
Example 2.1.6 (Poisson Transition Law [!]) Recall that Poissonx,λ denotes the Poisson distribution with parameter λ > 0 and centered at x ∈ R, defined in Example 1.4.17 in [113]. The Poisson transition law with parameter λ > 0 is defined by

.p(t, x; T , ·) = Poissonx,λ(T −t) = ∑_{n=0}^{+∞} e^{−λ(T −t)} ((λ(T − t))^n / n!) δx+n , 0 ≤ t ≤ T , x ∈ R.
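As a concrete illustration of (2.1.2) for this transition law, here is a sketch (plain Python, standard library only; the function names are ours) that evaluates E[ϕ(XT) | Xt = x] as a truncated Poisson series:

```python
import math

# Sketch of the Poisson transition law: p(t, x; T, ·) puts weight
# e^{-λτ} (λτ)^n / n!, with τ = T - t, on the point x + n, so
# E[ϕ(X_T) | X_t = x] is the series Σ_n ϕ(x + n) e^{-λτ} (λτ)^n / n!.
def poisson_expectation(phi, t, x, T, lam, n_max=60):
    tau = lam * (T - t)
    return sum(phi(x + n) * math.exp(-tau) * tau**n / math.factorial(n)
               for n in range(n_max + 1))

# Property (i) of Definition 2.1.1: p(t, x; T, ·) is a probability measure
total = poisson_expectation(lambda y: 1.0, t=0.0, x=3.0, T=2.0, lam=1.5)
# Mean of Poisson_{x, λ(T-t)}: x + λ(T - t) = 3 + 3 = 6
mean = poisson_expectation(lambda y: y, t=0.0, x=3.0, T=2.0, lam=1.5)
print(total, mean)
```

Time-homogeneity is visible in the code: the weights depend on t and T only through τ = T − t.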

4 If, for simplicity, we denote

.Ex [Y ] = E [Y | X0 = x] ,

Eq. (2.1.3) can be written in the more compact form

.E [ϕ (XT ) | Xt ] = EXt [ϕ (XT −t )] . (2.1.4)

For clarity: the right-hand side of (2.1.4) is the conditional expectation of ϕ (XT −t ) given X0 ,
evaluated at Xt .

Properties (i) and (ii) of Definition 2.1.1 are obvious. The Poisson transition law is
time-homogeneous and invariant under translations in the sense that

p(t, x; T , H ) = p(0, 0; T − t, H − x),


. 0 ≤ t ≤ T , x ∈ R, H ∈ B.

Definition 2.1.7 (Transition Density) A transition law p is absolutely continuous


if, for every 0 ≤ t < T and x ∈ RN , there exists a density 𝚪 = 𝚪(t, x; T , ·) such
that
ˆ
.p(t, x; T , H ) = 𝚪(t, x; T , y)dy, H ∈ BN .
H

We say that 𝚪 is a transition density of p (or, of X, if p is the transition law of a


process X).
Remark 2.1.8 The transition density 𝚪 = 𝚪(t, x; T , y) of a process X is a function
of four variables: the first pair (t, x) represents the time and starting point of X; the
second pair (T , y) represents the time and random position of arrival of X. For any
ϕ ∈ bBN , we have
ˆ
. 𝚪(t, Xt ; T , y)ϕ(y)dy = E [ϕ(XT ) | Xt ] ,
RN

or, in terms of conditional expectation function,


ˆ
. 𝚪(t, x; T , y)ϕ(y)dy = E [ϕ(XT ) | Xt = x] , x ∈ RN .
RN

Example 2.1.9 (Gaussian Transition Law [!]) The Gaussian transition law is defined by p(t, x; T , ·) = Nx,T −t for every 0 ≤ t ≤ T and x ∈ R. It is an absolutely continuous transition law since

.p(t, x; T , H ) := Nx,T −t (H ) = ∫_H 𝚪(t, x; T , y) dy, 0 ≤ t < T , x ∈ R, H ∈ B,

where

.𝚪(t, x; T , y) = (1/√(2π(T − t))) e^{−(x−y)²/(2(T −t))} , 0 ≤ t < T , x, y ∈ R,

is the Gaussian transition density. It is clear that p satisfies properties (i) and (ii) of Definition 2.1.1.
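A quick numerical sketch (assuming Python with numpy; all names are ours) confirming that Γ(t, x; T, ·) behaves as required by Definition 2.1.1: it integrates to 1, and p(t, x; T, ·) = N_{x,T−t} has mean x and variance T − t:

```python
import numpy as np

# Numerical sketch of the Gaussian transition density: Γ(t, x; T, ·) is the
# N_{x, T-t} density, so it integrates to 1, has mean x and variance T - t.
# We check this with a plain Riemann sum on a wide grid.
def gaussian_transition_density(t, x, T, y):
    return np.exp(-((x - y) ** 2) / (2 * (T - t))) / np.sqrt(2 * np.pi * (T - t))

t, x, T = 0.5, 1.2, 2.0
y = np.linspace(x - 12.0, x + 12.0, 20_001)
dy = y[1] - y[0]
w = gaussian_transition_density(t, x, T, y)

mass = (w * dy).sum()                    # close to 1: property (i) of Definition 2.1.1
mean = (y * w * dy).sum()                # close to x = 1.2
var = ((y - mean) ** 2 * w * dy).sum()   # close to T - t = 1.5
print(mass, mean, var)
```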
We now introduce a notion of “continuous dependence” of the transition law with
respect to the initial datum (t, x).

Definition 2.1.10 (Feller Property) A transition law p has the Feller property if
for every h > 0 and ϕ ∈ bC(RN ) the function
ˆ
(t, x) |−→
. p(t, x; t + h, dy)ϕ(y)
RN

is continuous. A Feller process is a process with a transition law that satisfies the
Feller property.
The Feller property is equivalent to the continuity under weak convergence of
the transition law p = p(t, x; t + h, ·) with respect to the pair (t, x) of initial
time and position: more precisely, recalling the definition of weak convergence of
distributions (cf. Remark 3.1.1 in [113]), the fact that X is a Feller process with
transition law p means that

.p(tn , xn ; tn + h, ·) −d→ p(t, x; t + h, ·)

for every sequence (tn , xn ) converging to (t, x) as n → +∞.


When p is homogeneous in time, the Feller property reduces to continuity with
respect to x: precisely, p has the Feller property if for every h > 0 and ϕ ∈ bC(RN )
the function
ˆ
.x |−→ p(0, x; h, dy)ϕ(y)
RN

is continuous. The Feller property plays an important role in the study of Markov
processes (cf. Chap. 7) and the regularity properties of continuous-time filtrations
(cf. Sect. 6.2.1).
Example 2.1.11 Poisson and Gaussian transition laws satisfy the Feller property
(cf. Examples 2.4.5 and 2.4.6): therefore, we say that the related stochastic processes
that we will introduce later, respectively the Poisson process and the Brownian
motion, are Feller processes.
We conclude the section with a technical result. Recall Definition 1.3.4 of the
canonical version of a stochastic process.
Proposition 2.1.12 If p is a transition law for the process X, defined on the space (Ω, F , P ), then it is also a transition law for its canonical version .X.
Proof Recall that X is defined on the probability space (RI , FμI X , μX ), where FμI X
denotes the μX -completion of F I , and X(w) = w for every w ∈ RI . Given 0 ≤
t ≤ T and H ∈ B, let Z := p(t, Xt , T , H ): we have to verify that

Z = E μX [1H (XT ) | Xt ]
. (2.1.5)

where E μX [·] denotes the expected value under the probability measure μX . Clearly
Z ∈ mσ (Xt ). Moreover, if W ∈ bσ (Xt ) then by Doob’s theorem W = ϕ(Xt ) with
ϕ ∈ bB and we have

.E μX [ZW ] = E μX [p(t, Xt , T , H )ϕ(Xt )] = (2.1.6)

(since X and .X have the same law)

. = E P [p(t, Xt , T , H )ϕ(Xt )] =

(since p is a transition law of X)

. = E P [1H (XT )ϕ(Xt )] =

(again by the equality in law of X and .X)

. = E μX [1H (XT )ϕ(Xt )] .

This proves (2.1.5) and concludes the proof. ⨆


2.2 Markov Property


We consider for simplicity the scalar case, .N = 1.
Definition 2.2.1 (Markov Process) Let .X = (Xt )t≥0 be an adapted stochastic
process on the filtered space .(Ω, F , P , Ft ). We say that X is a Markov process if
it has a transition law p such that5

p(t, Xt ; T , H ) = P (XT ∈ H | Ft ),
. 0 ≤ t ≤ T , H ∈ B. (2.2.1)

Formula (2.2.1) is a memoryless property: intuitively, it expresses the fact that


the knowledge of .Ft (and, in particular, of the entire trajectory of X up to time t)
or just the value .Xt provide the same information regarding the distribution of the
future value .XT .
Proposition 2.2.2 (Markov Property) Let .X = (Xt )t≥0 be an adapted stochastic
process on the filtered space .(Ω, F , P , Ft ), with transition law p. Then X is a
Markov process if and only if
ˆ
. p(t, Xt ; T , dy)ϕ(y) = E [ϕ(XT ) | Ft ] , 0 ≤ t ≤ T , ϕ ∈ bB. (2.2.2)
R

⁵ Here, as in Remark 4.3.5 in [113], P(XT ∈ · | Ft) indicates a regular version of the conditional distribution of XT given Ft. Formula (2.2.1) is equivalent to p(t, Xt; T, H) = E[1_H(XT) | Ft], that is, p(t, Xt; T, H) is a version of the conditional expectation of 1_H(XT) given Ft.

Proof If X is a Markov process then p(t, Xt; T, ·) is a regular version of the conditional law of XT given Ft and (2.2.2) follows from Theorem I-cc20 in [113]. The converse is obvious, with the choice ϕ = 1_H, H ∈ B. □

Remark 2.2.3 By density, we also have that X is a Markov process if and only if (2.2.2) holds for every ϕ ∈ C0^∞. Combining (2.1.1) with (2.2.2), it is customary to write⁶

    E[ϕ(XT) | Xt] = E[ϕ(XT) | Ft].   (2.2.3)
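Identity (2.2.3) can be verified by exact enumeration in the simplest discrete setting. The following Python sketch (the two-state chain, its transition matrix and the test function are illustrative choices, not taken from the text) computes E[ϕ(X₂) | X₀ = a, X₁ = b] and E[ϕ(X₂) | X₁ = b] as weighted averages over all paths and checks that they coincide:

```python
import itertools

# Two-state Markov chain: states {0, 1}, one-step transition matrix P
# (rows sum to 1) and initial distribution mu. All values are illustrative.
P = [[0.7, 0.3],
     [0.4, 0.6]]
mu = [0.5, 0.5]

def phi(x):                      # bounded test function
    return x * x + 1.0

# Enumerate all paths (x0, x1, x2) with their probabilities.
paths = {}
for path in itertools.product(range(2), repeat=3):
    x0, x1, x2 = path
    paths[path] = mu[x0] * P[x0][x1] * P[x1][x2]

def cond_exp(fix):
    """E[phi(X_2) | X_i = v for each (i, v) in fix], by exact enumeration."""
    num = den = 0.0
    for path, p in paths.items():
        if all(path[i] == v for i, v in fix.items()):
            num += phi(path[2]) * p
            den += p
    return num / den

# Conditioning on the whole past (X_0, X_1) gives the same answer as
# conditioning on the present X_1 alone:
for a in range(2):
    for b in range(2):
        assert abs(cond_exp({0: a, 1: b}) - cond_exp({1: b})) < 1e-12
print("E[phi(X_2) | X_1 = b], b = 0, 1:", cond_exp({1: 0}), cond_exp({1: 1}))
```

Because the path probability factorizes as μ(x0)P(x0,x1)P(x1,x2), the dependence on x0 cancels in the ratio, which is exactly the memoryless property in this toy setting.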

The Markov property can be generalized in the following way. Observe that if t ≤ t1 < t2 and ϕ1, ϕ2 ∈ bB then, by the tower property, we have

    E[ϕ1(Xt1)ϕ2(Xt2) | Xt] = E[ E[ϕ1(Xt1)ϕ2(Xt2) | Ft1] | Xt ]
                           = E[ ϕ1(Xt1) E[ϕ2(Xt2) | Ft1] | Xt ] =

(by the Markov property)

    = E[ ϕ1(Xt1) E[ϕ2(Xt2) | Xt1] | Xt ] =

(by the Markov property applied to the external conditional expectation, ϕ1(Xt1)E[ϕ2(Xt2) | Xt1] being a bounded and Borel-measurable function of Xt1 by Doob's theorem)

    = E[ ϕ1(Xt1) E[ϕ2(Xt2) | Xt1] | Ft ] =

(by the Markov property applied to the internal conditional expectation)

    = E[ ϕ1(Xt1) E[ϕ2(Xt2) | Ft1] | Ft ]
    = E[ E[ϕ1(Xt1)ϕ2(Xt2) | Ft1] | Ft ]
    = E[ ϕ1(Xt1)ϕ2(Xt2) | Ft ].

⁶ Formula (2.2.3) is not an equality but a notation that must be interpreted in the sense of Convention 4.2.5 in [113]: precisely, (2.2.3) means that if Z = E[ϕ(XT) | Xt] then Z = E[ϕ(XT) | Ft]. However, there may exist a version Z′ of E[ϕ(XT) | Ft] that is not σ(Xt)-measurable (it is enough to modify Z on a negligible event that belongs to Ft but not to σ(Xt)) and therefore is not the expectation of ϕ(XT) conditioned on Xt. On the other hand, if (2.2.3) holds and Z′ = E[ϕ(XT) | Ft] then Z′ = f(Xt) a.s. for some f ∈ mB: indeed, taking a version Z of E[ϕ(XT) | Xt], by Doob's theorem, Z = f(Xt) and by (2.2.3) (and the uniqueness of the conditional expectation) Z = Z′ a.s. These subtleties are relevant when one has to verify in practice the validity of the Markov property: Example 11.1.10 is illuminating in this sense.

Hence, we have⁷

    E[Y | Xt] = E[Y | Ft]   (2.2.4)

for Y = ϕ1(Xt1)ϕ2(Xt2) with t ≤ t1 < t2 and ϕ1, ϕ2 ∈ bB. By induction, it is not difficult to prove that (2.2.4) also holds if

    Y = ∏_{k=1}^{n} ϕk(Xtk)   (2.2.5)

for every t ≤ t1 < ··· < tn and ϕ1, ..., ϕn ∈ bB. Finally, by⁸ Dynkin's Theorem A.0.8 in [113], (2.2.4) is valid for every bounded r.v. that is measurable with respect to the σ-algebra generated by the random variables of the type Xs with s ≥ t, that is

    G^X_{t,∞} := σ(Xs, s ≥ t).   (2.2.6)

The σ-algebra G^X_{t,∞} represents the future information on X starting from time t, by analogy with Definition 1.4.3. In conclusion, we have proven the following generalized Markov property.
Theorem 2.2.4 (Extended Markov Property) Let X be a Markov process on (Ω, F, P, Ft). We have

    E[Y | Xt] = E[Y | Ft],   Y ∈ bG^X_{t,∞}.   (2.2.7)

The following corollary expresses the essence of the Markov property: the past (i.e., Ft) and the future (i.e., G^X_{t,∞}) are conditionally independent⁹ given the present (i.e., σ(Xt)).

⁷ In accordance with convention (2.2.3).

⁸ We use Dynkin's Theorem A.0.8 in [113] in the following way: let A be the family of cylinders of the form C = ⋂_{k=1}^{n} (Xtk ∈ Hk) as t ≤ t1 ≤ ··· ≤ tn and H1, ..., Hn ∈ B. Then A is a ∩-closed family of events. Let H be the family of bounded random variables for which (2.2.4) holds: by Beppo Levi's theorem for conditional expectation, H is a monotone family; moreover, choosing ϕk = 1_{Hk} in (2.2.5), we have that H contains the indicator functions of elements of A. Then Theorem A.0.8 in [113] ensures that H also contains the bounded and σ(A)-measurable random variables.

⁹ More precisely: if there exists a regular version of the conditional probability P(· | Xt) (this is guaranteed if Ω is a Polish space) then (2.2.8) with Y = 1_A, A ∈ G^X_{t,∞}, and Z = 1_B, B ∈ Ft, becomes

    P(A | Xt)P(B | Xt) = P(A ∩ B | Xt).

Corollary 2.2.5 ([!]) Let X be a Markov process on (Ω, F, P, Ft). Then we have

    E[Y | Xt] E[Z | Xt] = E[YZ | Xt],   Y ∈ bG^X_{t,∞}, Z ∈ bFt.   (2.2.8)

Proof We verify that E[Y | Xt]E[Z | Xt] is a version of the expectation of YZ conditioned on Xt: the measurability property E[Y | Xt]E[Z | Xt] ∈ mσ(Xt) is obvious. Given W ∈ bσ(Xt), we have

    E[ W E[Y | Xt] E[Z | Xt] ] =

(since W E[Y | Xt] ∈ bσ(Xt) and by property (ii) of the definition of conditional expectation E[Z | Xt])

    = E[ W E[Y | Xt] Z ] =

(by the extended Markov property (2.2.7))

    = E[ W E[Y | Ft] Z ]
    = E[ E[WYZ | Ft] ] = E[WYZ]

which proves the second property of the definition of conditional expectation. □
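The conditional-independence identity of footnote 9 can also be checked exactly in a finite setting. The sketch below (chain, initial distribution and events are illustrative assumptions) verifies P(A | X₁)P(B | X₁) = P(A ∩ B | X₁) for a past event B and a future event A of a two-state chain:

```python
import itertools

# Two-state chain (illustrative values). We verify by exact enumeration that,
# given the present X_1, a past event B and a future event A are independent.
P = [[0.7, 0.3], [0.4, 0.6]]
mu = [0.2, 0.8]

prob = {}
for x0, x1, x2 in itertools.product(range(2), repeat=3):
    prob[(x0, x1, x2)] = mu[x0] * P[x0][x1] * P[x1][x2]

def p_given_x1(event, b):
    """P(event | X_1 = b); `event` is a predicate on the path (x0, x1, x2)."""
    num = sum(p for path, p in prob.items() if path[1] == b and event(path))
    den = sum(p for path, p in prob.items() if path[1] == b)
    return num / den

A = lambda path: path[2] == 1      # future event: X_2 = 1
B = lambda path: path[0] == 0      # past event:   X_0 = 0

for b in range(2):
    lhs = p_given_x1(A, b) * p_given_x1(B, b)
    rhs = p_given_x1(lambda q: A(q) and B(q), b)
    assert abs(lhs - rhs) < 1e-12
print("past and future are conditionally independent given X_1")
```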



Finally, we introduce the canonical version of a Markov process. The insistence on prioritizing the canonical version (cf. Definition 1.3.4) of a process is justified by the importance of the completeness property of the space and the fact that we can identify the outcomes with the trajectories of the process: this will be even clearer when, in Chap. 7, we will express the Markov property using an appropriate time translation operator.

Proposition 2.2.6 (Canonical Version of a Markov Process) Let X be a Markov process on the space (Ω, F, P, Ft) with transition law p and let X̄ be its canonical version. Then X̄ is a Markov process with transition law p on (R^I, F^I_{μX}, μX, G^X̄) where, as usual, G^X̄ denotes the filtration generated by X̄ (cf. (1.4.1) and Remark 1.4.5).
Proof By Proposition 2.1.12, p is also a transition law of X̄, so it suffices to prove that, for every 0 ≤ t ≤ T and H ∈ B, setting Z := p(t, X̄t; T, H), we have

    Z = E^{μX}[1_H(X̄T) | G^X̄_t]

where E^{μX}[·] denotes the expected value under the probability measure μX. Obviously, Z ∈ mG^X̄_t and therefore it remains to verify that

    E^{μX}[ZW] = E^{μX}[1_H(X̄T)W],   W ∈ bG^X̄_t.

Actually, thanks to¹⁰ Dynkin's Theorem A.0.8 in [113], it is sufficient to consider W of the form

    W = ϕ(X̄t1, ..., X̄tn)

with 0 ≤ t1 < ··· < tn ≤ t and ϕ ∈ bB^n. Now, it is enough to proceed as in the proof of Proposition 2.1.12:

    E^{μX}[ZW] = E^{μX}[ p(t, X̄t; T, H) ϕ(X̄t1, ..., X̄tn) ] =

(since X and X̄ have the same distribution)

    = E^P[ p(t, Xt; T, H) ϕ(Xt1, ..., Xtn) ] =

(by the Markov property of X)

    = E^P[ 1_H(XT) ϕ(Xt1, ..., Xtn) ] =

(again by the equality in distribution of X and X̄)

    = E^{μX}[ 1_H(X̄T) ϕ(X̄t1, ..., X̄tn) ]. □

2.3 Processes with Independent Increments and Martingales

Let X = (Xt)t≥0 be a stochastic process on the filtered space (Ω, F, P, Ft).

Definition 2.3.1 (Process with Independent Increments) We say that X has independent increments if:

(i) X is adapted to (Ft)t≥0;
(ii) the increment XT − Xt is independent of Ft for every 0 ≤ t < T.

Proposition 2.3.2 ([!]) Let X = (Xt)t≥0 be a process with independent increments. Then X is a Markov process with transition law p = p(t, x; T, ·) equal to the law of

    X_T^{t,x} := XT − Xt + x,   0 ≤ t ≤ T, x ∈ R.   (2.3.1)

¹⁰ We use Dynkin's Theorem A.0.8 in [113] in a similar way to what was done in the proof of Theorem 2.2.4.

Proof Let us prove that p in (2.3.1) is a transition law for X. Clearly, p(t, x; T, ·) is a distribution and p(t, x; t, ·) = δx. Moreover, if μ_{XT−Xt} denotes the law of XT − Xt, then by Fubini's theorem, for any H ∈ B the function

    x ⟼ p(t, x; T, H) = μ_{XT−Xt}(H − x)

is B-measurable. Finally, for fixed H ∈ B, p(t, Xt; T, H) = P(XT ∈ H | Xt) as a consequence of the fact that for every function ϕ ∈ bB we have

    E[ϕ(XT) | Xt] = E[ϕ(XT − Xt + Xt) | Xt] =

(by the freezing Lemma 4.2.11 in [113], since XT − Xt is independent of Xt and obviously Xt is σ(Xt)-measurable)

    = E[ϕ(X_T^{t,x})]|_{x=Xt} = ∫_R p(t, Xt; T, dy)ϕ(y).

Similarly, the Markov property (2.2.2) (and consequently (2.2.1)) is established, conditioning on Ft rather than Xt. □
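Proposition 2.3.2 can be spot-checked by Monte Carlo on the simplest process with independent increments, a symmetric ±1 random walk (the walk, the times and the test function are illustrative assumptions): E[ϕ(X_T) | X_t = x] should match E[ϕ(X_T − X_t + x)].

```python
import random
random.seed(0)

# X is a random walk with i.i.d. +-1 steps, hence independent increments.
# Compare E[phi(X_T) | X_t = x] (estimated by grouping paths on the value
# of X_t) with E[phi(X_T - X_t + x)] (law of the shifted increment, (2.3.1)).
t, T = 4, 10
x = 0
phi = lambda y: y * y
n = 200_000

cond = {}        # samples of phi(X_T) grouped by the value of X_t
incs = []        # samples of the increment X_T - X_t
for _ in range(n):
    steps = [random.choice((-1, 1)) for _ in range(T)]
    xt, xT = sum(steps[:t]), sum(steps)
    cond.setdefault(xt, []).append(phi(xT))
    incs.append(xT - xt)

lhs = sum(cond[x]) / len(cond[x])            # E[phi(X_T) | X_t = 0]
rhs = sum(phi(d + x) for d in incs) / n      # E[phi(X_T - X_t + 0)]
print(lhs, rhs)                              # both close to Var of 6 steps = 6
assert abs(lhs - rhs) < 0.2
```

Both estimates approximate E[(sum of the remaining T − t steps)²] = T − t = 6, illustrating that the conditional law of X_T given X_t = x is the law of X_T − X_t + x.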

It is interesting to compare the definitions of a process with independent increments and a martingale. We begin by observing that if X has independent increments, then for every n ∈ N and 0 ≤ t0 < t1 < ··· < tn, the increments Xtk − Xtk−1 are indeed independent; in particular, if X is square-integrable, i.e., Xt ∈ L²(Ω, P) for any t, then the increments are uncorrelated:

    cov(Xtk − Xtk−1, Xth − Xth−1) = 0,   1 ≤ k < h ≤ n.

Even a martingale has uncorrelated (but not necessarily independent) increments.


Proposition 2.3.3 Let X be a square-integrable martingale. Then X has uncorrelated increments.

Proof Let t0 ≤ t1 ≤ t2 ≤ t3. We have

    cov(Xt1 − Xt0, Xt3 − Xt2) = E[(Xt1 − Xt0)(Xt3 − Xt2)]
    = E[ E[(Xt1 − Xt0)(Xt3 − Xt2) | Ft2] ]
    = E[ (Xt1 − Xt0) E[Xt3 − Xt2 | Ft2] ] = 0. □
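A martingale can have uncorrelated increments that are nevertheless dependent. The toy construction below is an illustrative assumption (not from the text): the second increment's size is driven by the first sign, so the increments are dependent, yet their covariance vanishes by exact enumeration.

```python
import itertools

# Martingale with uncorrelated but NOT independent increments (toy example):
# d1 = e1,  d2 = e2 * (2 if e1 == 1 else 1), with e1, e2 independent fair
# +-1 signs; X0 = 0, X1 = d1, X2 = d1 + d2.
outcomes = list(itertools.product((-1, 1), repeat=2))   # (e1, e2), prob 1/4 each

def E(f):
    return sum(f(e1, e2) for e1, e2 in outcomes) / len(outcomes)

d1 = lambda e1, e2: e1
d2 = lambda e1, e2: e2 * (2 if e1 == 1 else 1)

# martingale property: E[d2 | e1] = 0 for both values of e1
for v in (-1, 1):
    assert sum(d2(v, e2) for e2 in (-1, 1)) == 0

cov = E(lambda a, b: d1(a, b) * d2(a, b)) - E(d1) * E(d2)
assert cov == 0.0                              # uncorrelated ...
dep = (E(lambda a, b: d1(a, b) * abs(d2(a, b)))
       - E(d1) * E(lambda a, b: abs(d2(a, b))))
assert dep != 0.0                              # ... but not independent
print("cov(d1, d2) =", cov, "  cov(d1, |d2|) =", dep)
```

Here cov(d1, |d2|) ≠ 0 certifies the dependence, while cov(d1, d2) = 0 is exactly the conclusion of Proposition 2.3.3.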



A process with independent increments is not necessarily integrable, nor constant
in mean, and therefore not necessarily a martingale. However, we have the following

Proposition 2.3.4 Let X be an absolutely integrable process with independent increments. Then the "compensated" process defined by X̃t := Xt − E[Xt] is a martingale.

Proof It is enough to observe that for every t ≤ T we have

    E[X̃T | Ft] = E[X̃T − X̃t | Ft] + X̃t =

(since X̃ also has independent increments)

    = E[X̃T − X̃t] + X̃t = X̃t

since X̃ has zero mean. □

Remark 2.3.5 [!] Proposition 2.3.4 provides the Doob decomposition X = X̃ + A of the process X: in this case the drift process At = E[Xt] is deterministic.
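A standard instance of Proposition 2.3.4 is the compensated Poisson process X̃t = Xt − λt, where Xt counts arrivals with exponential inter-arrival times of rate λ. The Monte Carlo sketch below (parameters are illustrative; it only spot-checks the constant-zero-mean consequence of the martingale property, not the full conditional statement) simulates arrivals with the standard-library exponential sampler:

```python
import random
random.seed(1)

# Compensated Poisson process: X_t = number of arrivals by time t (rate lam),
# tilde X_t = X_t - lam * t. By Proposition 2.3.4 tilde X is a martingale,
# so in particular E[tilde X_t] = E[tilde X_T] = 0.
lam, t, T, n = 2.0, 1.0, 3.0, 100_000

def count_by(time):
    """Number of arrivals in [0, time] from i.i.d. Exp(lam) inter-arrivals."""
    s, k = 0.0, 0
    while True:
        s += random.expovariate(lam)
        if s > time:
            return k
        k += 1

mean_t = sum(count_by(t) - lam * t for _ in range(n)) / n
mean_T = sum(count_by(T) - lam * T for _ in range(n)) / n
print(mean_t, mean_T)          # both close to 0
assert abs(mean_t) < 0.05 and abs(mean_T) < 0.05
```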

2.4 Finite-Dimensional Laws and Chapman-Kolmogorov Equation

Let X be a Markov process with initial distribution μ (i.e., X0 ∼ μ) and transition law p. The following result shows that, starting from the knowledge of μ and p, it is possible to determine the finite-dimensional distributions (and therefore the law!) of X.
Proposition 2.4.1 (Finite-Dimensional Distributions [!]) Let X = (Xt)t≥0 be a Markov process with transition law p and such that X0 ∼ μ. For every t0, t1, ..., tn ∈ R with 0 = t0 < t1 < t2 < ··· < tn, and H ∈ B^{n+1} we have

    P((Xt0, Xt1, ..., Xtn) ∈ H) = ∫_H μ(dx0) ∏_{i=1}^{n} p(ti−1, xi−1; ti, dxi).   (2.4.1)

Proof By Corollary A.0.5 in [113] it is sufficient to prove the thesis when H = H0 × ··· × Hn with Hi ∈ B. We proceed by induction: in the case n = 1 we have

    P((Xt0, Xt1) ∈ H0 × H1) = E[ 1_{H0}(Xt0) 1_{H1}(Xt1) ]
    = E[ 1_{H0}(Xt0) E[1_{H1}(Xt1) | Xt0] ]
    = E[ 1_{H0}(Xt0) ∫_{H1} p(t0, Xt0; t1, dx1) ] =

(by Fubini's theorem)

    = ∫_{H0×H1} μ(dx0) p(t0, x0; t1, dx1).

Now suppose (2.4.1) is true for n and prove it for n + 1: for H ∈ B^{n+1} and K ∈ B we have

    P((Xt0, ..., Xtn+1) ∈ H × K) = E[ 1_H(Xt0, ..., Xtn) E[1_K(Xtn+1) | Ftn] ] =

(by the Markov property)

    = E[ 1_H(Xt0, ..., Xtn) E[1_K(Xtn+1) | Xtn] ]
    = E[ 1_H(Xt0, ..., Xtn) ∫_K p(tn, Xtn; tn+1, dxn+1) ] =

(by inductive hypothesis and Fubini's theorem)

    = ∫_{H×K} μ(dx0) ∏_{i=1}^{n+1} p(ti−1, xi−1; ti, dxi). □



Remark 2.4.2 In the particular case μ = δ_{x0}, (2.4.1) becomes

    P((Xt1, ..., Xtn) ∈ H) = ∫_H ∏_{i=1}^{n} p(ti−1, xi−1; ti, dxi),   H ∈ B^n.   (2.4.2)
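For the Gaussian transition law of Example 2.1.9, formula (2.4.2) with n = 2 can be checked numerically: the iterated integral over H1 × H2 (inner integral in closed form via the normal CDF) should match a Monte Carlo estimate over simulated paths. All numerical choices below are illustrative assumptions.

```python
import math, random
random.seed(2)

# P(X_t1 in H1, X_t2 in H2) for the Gaussian transition law, starting at x0:
# (2.4.2) gives int_{H1} Gamma(0,x0;t1,x1) * P(x1 -> H2 in time t2-t1) dx1.
x0, t1, t2 = 0.0, 1.0, 2.0
H1 = (0.0, 1.0)            # H1 = [0, 1]
H2 = (-1.0, 0.5)           # H2 = [-1, 0.5]

def ncdf(z):               # standard normal CDF via math.erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gauss_density(x, m, var):
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# outer integral over H1 by the midpoint rule
m = 2000
h = (H1[1] - H1[0]) / m
quad = 0.0
for i in range(m):
    x1 = H1[0] + (i + 0.5) * h
    inner = (ncdf((H2[1] - x1) / math.sqrt(t2 - t1))
             - ncdf((H2[0] - x1) / math.sqrt(t2 - t1)))
    quad += gauss_density(x1, x0, t1) * inner * h

# Monte Carlo over paths sampled at t1 and t2 (independent Gaussian increments)
n, hits = 200_000, 0
for _ in range(n):
    x1 = x0 + math.sqrt(t1) * random.gauss(0, 1)
    x2 = x1 + math.sqrt(t2 - t1) * random.gauss(0, 1)
    hits += (H1[0] <= x1 <= H1[1]) and (H2[0] <= x2 <= H2[1])
mc = hits / n
print(quad, mc)
assert abs(quad - mc) < 0.01
```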

The following remarkable result provides a necessary condition for a transition law to be the transition law of a Markov process.

Proposition 2.4.3 (Chapman-Kolmogorov Equation [!!]) Let X be a Markov process with transition law p. For every 0 ≤ t1 < t2 < t3 and H ∈ B, we have

    p(t1, Xt1; t3, H) = ∫_R p(t1, Xt1; t2, dx2) p(t2, x2; t3, H).   (2.4.3)

Proof Intuitively, the Chapman-Kolmogorov equation expresses the fact that the probability of moving from position x1 at time t1 to a position in H at time t3 is equal to the probability of transitioning to a position x2 at an interim time t2, followed by a transition from x2 to H, integrated over all possible values of x2. We have

    p(t1, Xt1; t3, H) = E[1_H(Xt3) | Xt1] =
.

(by the tower property)

    = E[ E[1_H(Xt3) | Ft2] | Xt1 ] =

(by the Markov property (2.2.1))

    = E[ p(t2, Xt2; t3, H) | Xt1 ] =

(by (2.1.1))

    = ∫_R p(t1, Xt1; t2, dx2) p(t2, x2; t3, H). □
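The Chapman-Kolmogorov equation can be verified numerically for the Gaussian transition density of Example 2.1.9 by computing the convolution integral with a simple midpoint rule (the times, points and truncation range below are illustrative assumptions):

```python
import math

# Spot-check of Chapman-Kolmogorov (2.4.3) for the Gaussian transition
# density Gamma(t, x; T, y), a normal density in y with mean x, variance T - t.
def Gamma(t, x, T, y):
    v = T - t
    return math.exp(-(x - y) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

t1, t2, t3 = 0.0, 0.7, 2.0
x, y = 0.3, -1.1

# integrate Gamma(t1,x;t2,z) * Gamma(t2,z;t3,y) dz over a wide truncated range
lo, hi, m = -15.0, 15.0, 40_000
h = (hi - lo) / m
integral = sum(Gamma(t1, x, t2, lo + (i + 0.5) * h) *
               Gamma(t2, lo + (i + 0.5) * h, t3, y)
               for i in range(m)) * h
print(integral, Gamma(t1, x, t3, y))
assert abs(integral - Gamma(t1, x, t3, y)) < 1e-5
```

Analytically this is just the fact that the convolution of two centered Gaussian densities with variances t2 − t1 and t3 − t2 is the Gaussian density with variance t3 − t1.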



We now show that the Chapman-Kolmogorov equation is actually a necessary and sufficient condition, in the sense that it is always possible to construct a Markov process from an initial law and a transition law p provided that the latter verifies (2.4.3).

Theorem 2.4.4 ([!]) Let μ be a distribution on R and let p = p(t, x; T, H) be a transition law¹¹ that verifies the Chapman-Kolmogorov equation

    p(t1, x; t3, H) = ∫_R p(t1, x; t2, dy) p(t2, y; t3, H),   (2.4.4)

for every 0 ≤ t1 < t2 < t3, x ∈ R and H ∈ B. Then there exists a Markov process X = (Xt)t≥0 with transition law p and such that X0 ∼ μ.

Proof Consider the family of finite-dimensional distributions defined by (2.4.1): specifically, if 0 = t0 < t1 < t2 < ··· < tn we set

    μ_{t0,...,tn}(H) = ∫_H μ(dx0) ∏_{i=1}^{n} p(ti−1, xi−1; ti, dxi),   H ∈ B^{n+1},

and if t0, ..., tn are not ordered in increasing order, we define μ_{t0,...,tn} by (1.3.2) by reordering the times. In this way, the consistency property (1.3.2) is automatically satisfied by construction. On the other hand, the Chapman-Kolmogorov equation guarantees the validity of the second consistency property (1.3.3) since, after ordering the times in increasing order, we have

    μ_{t0,...,tk−1,tk,tk+1,...,tn}(H0 × ··· × Hk−1 × R × Hk+1 × ··· × Hn)
    = μ_{t0,...,tk−1,tk+1,...,tn}(H0 × ··· × Hk−1 × Hk+1 × ··· × Hn).

¹¹ That is, p verifies properties (i) and (ii) of Definition 2.1.1.



Since the assumptions of Kolmogorov's extension theorem are satisfied, we consider the stochastic process X = (Xt)t≥0 constructed canonically as in Corollary 1.3.3: X has the finite-dimensional distributions in (2.4.1) and is defined on the filtered space (Ω, F, P, (G^X_t)t≥0) with Ω = R^{[0,+∞)}: we recall that, by Remark 1.4.4, the filtration (G^X_t)t≥0 is the one generated by finite-dimensional cylinders.

It remains to prove that X is a Markov process with transition law p. Fixing 0 ≤ t < T and ϕ ∈ bB, we prove that the following formula, equivalent to (2.2.2), holds

    ∫_R p(t, Xt; T, dy)ϕ(y) = E[ϕ(XT) | G^X_t],

by directly verifying the properties of conditional expectation. Setting

    Z = ∫_R p(t, Xt; T, dy)ϕ(y)

clearly Z ∈ mG^X_t. By Remark 4.2.2 in [113], to conclude it is sufficient to prove that

    E[1_C ϕ(XT)] = E[1_C Z]

where C is a finite-dimensional cylinder in G^X_t of the form in (1.1.1): in particular, it is not restrictive to assume C = C_{t0,t1,...,tn}(H) with H ∈ B^{n+1} and tn = t. This allows us to use the finite-dimensional distributions in (2.4.1): in fact, we have

    E[ 1_{C_{t0,...,tn}(H)} ϕ(XT) ]
    = E[ 1_H(Xt0, Xt1, ..., Xtn) ϕ(XT) ]
    = ∫_H μ(dx0) ∏_{i=1}^{n} p(ti−1, xi−1; ti, dxi) ∫_R p(tn, xn; T, dy)ϕ(y)
    = E[ 1_H(Xt0, ..., Xtn) ∫_R p(tn, Xtn; T, dy)ϕ(y) ]
    = E[ 1_{C_{t0,...,tn}(H)} Z ]. □


Example 2.4.5 (Poisson Transition Law [!]) The Poisson transition law with parameter λ > 0 (cf. Example 2.1.6)

    p(t, x; T, ·) = Poisson_{x,λ(T−t)} = ∑_{n=0}^{+∞} e^{−λ(T−t)} (λ(T−t))^n / n! δ_{x+n},   0 ≤ t ≤ T, x ∈ R,

satisfies the Chapman-Kolmogorov equation: this can be proved proceeding as¹² in Example 2.6.5 in [113] on the sum of independent Poisson random variables. The Markov process associated with p is called the Poisson process and will be studied in Chap. 5. For any ϕ ∈ bC and t > 0 the function

    x ⟼ ∫_R Poisson_{x,λt}(dy)ϕ(y) = ∑_{n=0}^{+∞} e^{−λt} (λt)^n / n! ϕ(x + n)

is continuous and therefore the Poisson process is a Feller process.
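The computation in footnote 12 (which reduces Chapman-Kolmogorov for the Poisson law to the sum of two independent Poisson variables) can be spot-checked numerically with truncated sums; rate, times and target set below are illustrative choices:

```python
import math

# Chapman-Kolmogorov for the Poisson transition law, truncated numerically:
# sum_n e^{-lam(s-t)} (lam(s-t))^n / n! * p(s, x+n; T, H)  =  p(t, x; T, H),
# where p(t, x; T, .) = Poisson_{x, lam(T-t)}.
lam, t, s, T, x = 1.5, 0.0, 1.0, 3.0, 2
H = set(range(3, 7))       # H = {3, 4, 5, 6}
N = 80                     # truncation level; the discarded tails are negligible

def pois(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def p(t0, x0, T0, Hset):
    # p(t0, x0; T0, H): mass of jumps n such that x0 + n lies in H
    return sum(pois(n, lam * (T0 - t0)) for n in range(N) if x0 + n in Hset)

lhs = sum(pois(n, lam * (s - t)) * p(s, x + n, T, H) for n in range(N))
rhs = p(t, x, T, H)
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-12
```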


Example 2.4.6 (Gaussian Transition Law [!]) Consider the Gaussian transition law of Example 2.1.9:

    p(t, x; T, H) := ∫_H Γ(t, x; T, y)dy,   0 ≤ t < T, x ∈ R, H ∈ B,

where

    Γ(t, x; T, y) = 1/√(2π(T−t)) e^{−(x−y)²/(2(T−t))},   0 ≤ t < T, x, y ∈ R,

is the Gaussian transition density. The Gaussian transition law satisfies the Chapman-Kolmogorov equation, as can be verified directly by calculating the convolution of two Gaussians or, more easily, the product of their characteristic functions. We will study later, in Chap. 4, the Markov process associated with p,

¹² For 0 ≤ t < s < T, we have

    ∫_R p(t, x; s, dy) p(s, y; T, H) = ∑_{n=0}^{+∞} e^{−λ(s−t)} (λ(s−t))^n / n! p(s, x + n; T, H)
    = ∑_{n,m=0}^{+∞} e^{−λ(T−t)} (λ(s−t))^n / n! (λ(T−s))^m / m! δ_{x+n+m}(H) =

(by the change of indices i = n + m and j = n)

    = ∑_{i=0}^{+∞} e^{−λ(T−t)} λ^i δ_{x+i}(H) ∑_{j=0}^{i} (s−t)^j / j! (T−s)^{i−j} / (i−j)!
    = ∑_{i=0}^{+∞} e^{−λ(T−t)} (λ^i / i!) δ_{x+i}(H) ∑_{j=0}^{i} (i choose j) (s−t)^j (T−s)^{i−j}
    = p(t, x; T, H).

the so-called Brownian motion. For any ϕ ∈ bC and T > 0 the function

    x ⟼ ∫_R Γ(0, x; T, y)ϕ(y)dy   (2.4.5)

is continuous and therefore Brownian motion is a Feller process. Actually, one verifies that the function in (2.4.5) is C^∞ for each T > 0 and ϕ ∈ bB (not just for ϕ ∈ bC): for this reason we say that Brownian motion verifies the strong Feller property.
Remark 2.4.7 (Transition Law and Semigroups) For each transition law p = p(t, x; T, ·), there exists a corresponding family p = (p_{t,T})_{0≤t≤T} of linear and bounded operators

    p_{t,T} : bB ⟶ bB

defined by

    p_{t,T} ϕ := ∫_R p(t, ·; T, dy)ϕ(y),   ϕ ∈ bB.

Note that p_{t,T} ϕ ∈ bB for every ϕ ∈ bB and by Jensen's inequality we have

    ‖p_{t,T} ϕ‖_∞ ≤ ‖ϕ‖_∞.

The Chapman-Kolmogorov equation (2.4.4) corresponds to the so-called semigroup property of p:

    p_{t,s} ∘ p_{s,T} = p_{t,T},   t ≤ s ≤ T.

The family p = (p_{t,T})_{0≤t≤T} is called the semigroup of operators associated with the transition law p. Moreover, we say that p is a homogeneous semigroup if p_{t,T} = p_{0,T−t} for every t ≤ T: in this case, we simply write p_t instead of p_{0,t}. There are many monographs on Markov processes and semigroup theory: among the most recent, we mention [71, 142] and [138].
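In the simplest discrete setting, a time-homogeneous finite Markov chain, the operators p_{t,T} are just matrix powers of the one-step transition matrix, and the semigroup property reduces to P^{s−t} P^{T−s} = P^{T−t}. The following pure-Python sketch (the matrix is an illustrative assumption) checks this:

```python
# Semigroup property p_{t,s} p_{s,T} = p_{t,T} for a homogeneous finite chain:
# p_{t,T} acts on functions as the matrix power P^(T - t).
P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def power(A, n):
    R = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        R = matmul(R, A)
    return R

t, s, T = 0, 2, 5
lhs = matmul(power(P, s - t), power(P, T - s))   # p_{t,s} composed with p_{s,T}
rhs = power(P, T - t)                            # p_{t,T}
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(3) for j in range(3))
print("semigroup property holds; first row of p_{0,5}:", rhs[0])
```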

2.5 Characteristic Operator and Kolmogorov Equations

Let X be a stochastic process on the space (Ω, F, P, Ft). In various applications, there is a notable interest in calculating the conditional expectation

    E[ϕ(XT) | Ft],   0 ≤ t < T,

where ϕ ∈ bB is a given function. The problem is not trivial, even from a computational standpoint, because such a conditional expectation is an Ft-measurable random variable, i.e., it depends on the information up to time t, which in mathematical terms translates into a functional dependency. However, if X is a Markov process with transition law p then, by the memoryless property, we have

    E[ϕ(XT) | Ft] = u(t, Xt)   (2.5.1)

where

    u(t, x) := ∫_{R^N} p(t, x; T, dy)ϕ(y),   0 ≤ t ≤ T, x ∈ R^N.   (2.5.2)

Thus, the problem reduces to determining u as a function of real variables: this is a significant advantage of Markov processes.

In this section, we show that, as a consequence of the Chapman-Kolmogorov equation, the function u in (2.5.2) solves a Cauchy problem for which theoretical results and efficient numerical computation methods are available. More generally, we prove that, under appropriate assumptions, the transition law p = p(t, x; T, dy) solves the so-called Kolmogorov backward and forward equations: these are integro-differential equations solved by p(t, x; T, dy) in the backward variables (t, x) (corresponding to the initial time and value of the process X) and in the forward variables (T, y) (corresponding to the final time and value of the process X), respectively.

Notation 2.5.1 Given a function f = f(t, T), with t < T, we use the notation

    lim_{T−t→0+} f(t, T) := lim_{T→t+} f(t, T) = lim_{t→T−} f(t, T)

when the second and third limits exist and coincide.


Definition 2.5.2 (Characteristic Operator) Let p be a transition law on R^N. Suppose that the limit

    At ϕ(x) := lim_{T−t→0+} ∫_{R^N} [p(t, x; T, dy) − p(t, x; t, dy)] / (T − t) ϕ(y)

exists for every (t, x) ∈ R_{>0} × R^N and ϕ ∈ D, where D is a suitable subspace of bB_N, the space of measurable and bounded functions from R^N to R. Then we say that At is the characteristic operator (or infinitesimal generator) of p. If p is the transition law of a Markov process X, then we also say that At is the characteristic operator of X.
Note that At is a linear operator on D. The "domain" D on which the characteristic operator is defined depends on the transition law p: in the following sections we present some particular cases in which D can be explicitly determined. Let us start with the following simple
Example 2.5.3 ([!]) Consider the deterministic Markov process Xt = γ(t) from Example 2.1.4. A transition law of X is

    p(t, x; T, ·) = δ_{x+γ(T)−γ(t)}   (2.5.3)

and therefore

    At ϕ(x) = lim_{T−t→0+} [ϕ(x + γ(T) − γ(t)) − ϕ(x)] / (T − t) =

(assuming ϕ ∈ D := bC¹(R^N), the vector space of bounded and C¹ functions, and expanding in a first-order Taylor series)

    = lim_{T−t→0+} 1/(T − t) ( ∇ϕ(x) · (γ(T) − γ(t)) + o(|γ(T) − γ(t)|) ).

Such a limit exists only if the function γ is sufficiently regular: in particular, if γ is differentiable then we have

    At ϕ(x) = γ′(t) · ∇ϕ(x).

In this case, the characteristic operator is simply the directional derivative of ϕ along the curve γ: precisely, At is the first-order differential operator with constant coefficients

    At = γ′(t) · ∇ = ∑_{j=1}^{N} γ′_j(t) ∂_{xj}.
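The limit in Example 2.5.3 is easy to test numerically: with scalar γ and ϕ (the concrete choices below are illustrative assumptions), the difference quotient should converge to γ′(t)ϕ′(x) as T − t → 0⁺.

```python
import math

# Example 2.5.3 numerically: for the deterministic process X_t = gamma(t),
# (phi(x + gamma(t + eps) - gamma(t)) - phi(x)) / eps  ->  gamma'(t) * phi'(x).
gamma = math.sin                       # gamma(t) = sin t, so gamma'(t) = cos t
phi = lambda x: math.exp(-x * x)
dphi = lambda x: -2 * x * math.exp(-x * x)

t, x = 0.8, 0.5
exact = math.cos(t) * dphi(x)          # gamma'(t) * phi'(x)
for eps in (1e-3, 1e-5):
    quotient = (phi(x + gamma(t + eps) - gamma(t)) - phi(x)) / eps
    print(eps, quotient, exact)
    assert abs(quotient - exact) < 0.01   # first-order convergence in eps
```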

Remark 2.5.4 ([!]) Since p(t, x; t, ·) = δx for every t ≥ 0, we have

    At ϕ(x) = lim_{T−t→0+} ∫_{R^N} p(t, x; T, dy) [ϕ(y) − ϕ(x)] / (T − t).   (2.5.4)

Hence, if p is the transition law of a Markov process X, we have

    At ϕ(x) = lim_{T−t→0+} E[ (ϕ(XT) − ϕ(Xt)) / (T − t) | Xt = x ].   (2.5.5)

Notice, in particular, that the characteristic operator At depends on the process X and not on the specific version of its transition law. By (2.5.5), in analogy with Example 2.5.3, we can interpret At ϕ(x) as an "average directional derivative" (or average infinitesimal increment) of ϕ along the trajectories of X starting at time t from x. Let us also note that

    At ϕ(x) = − lim_{T−t→0+} ∫_{R^N} [p(T, x; T, dy) − p(t, x; T, dy)] / (T − t) ϕ(y).   (2.5.6)

In the following section, we show that for a wide class of transition laws, it is possible to give a more detailed representation of the characteristic operator.
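Formula (2.5.4) can be evaluated numerically for the Gaussian transition law, for which the generator turns out to be ½ϕ″ (see Example 2.5.9 below in this chapter; the test function and quadrature parameters here are illustrative assumptions):

```python
import math

# Formula (2.5.4) for the Gaussian transition law: the quotient
# (E[phi(x + sqrt(h) Z)] - phi(x)) / h, Z ~ N(0,1), should tend to
# (1/2) phi''(x) as h -> 0+. Expectation by midpoint quadrature in z.
phi = lambda y: math.cos(y)
x = 0.4
half_phi2 = -0.5 * math.cos(x)         # (1/2) phi''(x) for phi = cos

def expectation(h):
    lo, hi, m = -10.0, 10.0, 20_000
    step = (hi - lo) / m
    tot = 0.0
    for i in range(m):
        z = lo + (i + 0.5) * step
        w = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        tot += w * phi(x + math.sqrt(h) * z) * step
    return tot

for h in (0.1, 0.01, 0.001):
    quotient = (expectation(h) - phi(x)) / h
    print(h, quotient, half_phi2)
assert abs(quotient - half_phi2) < 1e-3
```

For ϕ = cos the expectation is available in closed form, E[cos(x + σZ)] = cos(x)e^{−σ²/2}, which makes the O(h) convergence of the quotient transparent.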

2.5.1 The Local Case

Definition 2.5.5 Let x0 ∈ R^N. We say that a linear operator A : C²(R^N) ⟶ R

• satisfies the maximum principle at x0 if A ϕ ≤ 0 for any ϕ ∈ C²(R^N) such that ϕ(x0) = max_{x∈R^N} ϕ(x);
• is local at x0 if A ϕ = 0 for every ϕ ∈ C²(R^N) that vanishes in a neighborhood of x0.
Remark 2.5.6 We note that:

(i) if A satisfies the maximum principle at x0 then A ϕ = 0 for every constant function ϕ;
(ii) if A is a local operator at x0 then A ϕ = A ψ for every ϕ, ψ that are equal in a neighborhood of x0;
(iii) combining (i) and (ii), we have that if A satisfies the maximum principle and is local at x0 then A ϕ = 0 for every ϕ that is constant in a neighborhood of x0;
(iv) if A satisfies the maximum principle and is local at x0 then A ϕ = A T_{2,x0}(ϕ) where T_{2,x0}(ϕ) is the second-order Taylor polynomial of ϕ with initial point x0.

Indeed, since A is a linear operator, it is enough to prove that A ϕ = 0 for every ϕ ∈ C²(R^N) whose second-order Taylor polynomial with initial point x0 is null. Moreover, it is not restrictive to assume x0 = 0. Consider a "cut-off" function χ ∈ C0^∞(R^N; R) such that 0 ≤ χ ≤ 1, χ(x) ≡ 1 for |x| ≤ 1 and χ(x) ≡ 0 for |x| ≥ 2. Letting ϕδ(x) = ϕ(x)χ(x/δ) for δ > 0, there exists¹³ a
¹³ By assumption, |ϕ(x)| ≤ |x|² g(|x|) for |x| ≤ 1 with g going to zero as |x| → 0+ and it is not restrictive to assume g monotonically increasing. Then (2.5.7) follows from the fact that

    g(|x|) χ(x/δ) ≤ χ(x) g(δ),   x ∈ R^N, 0 < δ ≤ 1/2.

function g such that g(δ) → 0 as δ → 0+ and

    |ϕδ(x)| ≤ g(δ)|x|²χ(x),   x ∈ R^N, 0 < δ ≤ 1/2.   (2.5.7)

Then, applying the maximum principle at 0 to the functions ψδ±(x) = −g(δ)|x|²χ(x) ± ϕδ(x), we obtain A ψδ± ≤ 0 or equivalently, by point (i),

    ±A ϕ = ±A ϕδ ≤ g(δ) A ψ,   ψ(x) := |x|²χ(x).

The thesis follows since δ > 0 is arbitrarily small.


The following result, which is a particular case of Courrège's theorem [26], provides an interesting characterization of local linear operators that satisfy the maximum principle.

Theorem 2.5.7 (Courrège's Theorem) A linear operator A on C²(R^N) satisfies the maximum principle and is local at x0 ∈ R^N if and only if there exist b ∈ R^N and a symmetric and positive semidefinite matrix C = (cij)_{1≤i,j≤N} such that

    A ϕ = 1/2 ∑_{i,j=1}^{N} cij ∂_{xi xj} ϕ(x0) + ∑_{i=1}^{N} bi ∂_{xi} ϕ(x0),   ϕ ∈ C²(R^N).   (2.5.8)

Proof By Remark 2.5.6 we have

    A ϕ = A T_{2,x0}(ϕ) =

(by the linearity of A)

    = 1/2 ∑_{i,j=1}^{N} cij ∂_{xi xj} ϕ(x0) + ∑_{i=1}^{N} bi ∂_{xi} ϕ(x0)

where cij := A ϕij and bj := A ϕj with

    ϕij(x) = (x − x0)i (x − x0)j,   ϕj(x) = (x − x0)j,   x ∈ R^N.   (2.5.9)

To check that C = (cij) ≥ 0, consider η ∈ R^N and set

    ϕη(x) = −⟨x − x0, η⟩² = −∑_{i,j=1}^{N} ηi ηj ϕij(x);

then by linearity and by the maximum principle at x0 we have

    A ϕη = −2⟨C η, η⟩ ≤ 0.

Conversely, if A is of the form (2.5.8) then it is clearly local at x0. Moreover, there exists a symmetric and positive semi-definite matrix M = (mij) such that

    C = M²,   i.e.,   cij = ∑_{h=1}^{N} mih mhj = ∑_{h=1}^{N} mih mjh.

If x0 is a maximum point for ϕ then ∇ϕ(x0) = 0 and the Hessian matrix of ϕ at x0 is negative semi-definite, so we have

    A ϕ = 1/2 ∑_{i,j=1}^{N} ∂_{xi xj} ϕ(x0) ∑_{h=1}^{N} mih mjh = 1/2 ∑_{h=1}^{N} ∑_{i,j=1}^{N} ∂_{xi xj} ϕ(x0) mih mjh ≤ 0,

that is, A satisfies the maximum principle at x0. □


Remark 2.5.8 ([!]) For every x ∈ R^N, the characteristic operator At of a transition law p satisfies the maximum principle at x: this follows immediately from (2.5.4). Then, under the further assumption that At is local¹⁴ at x, Theorem 2.5.7 provides the representation

    At ϕ(x) = 1/2 ∑_{i,j=1}^{N} cij(t, x) ∂_{xi xj} ϕ(x) + ∑_{i=1}^{N} bi(t, x) ∂_{xi} ϕ(x),   (t, x) ∈ R_{>0} × R^N,   (2.5.10)

where C(t, x) = (cij(t, x)) is an N × N symmetric, positive semi-definite matrix and b(t, x) = (bj(t, x)) ∈ R^N. In other words, At is a second-order partial differential operator of elliptic-parabolic type.

Combining (2.5.4) with the expression of the coefficients of At given by the functions in (2.5.9), we obtain the formulas¹⁵

    bi(t, x) = lim_{T−t→0+} ∫_{R^N} p(t, x; T, dy) / (T − t) (y − x)i
             = lim_{T−t→0+} E[ (XT − Xt)i / (T − t) | Xt = x ],   (2.5.11)

¹⁴ It can be shown that the property of being local corresponds to the continuity of the trajectories of the associated Markov process. For the characterization of the characteristic operator of a generic Markov process, see, for example, [132].
¹⁵ If At is local at x then the integration domain in (2.5.11) and (2.5.12) can be restricted to |x − y| < 1.

    cij(t, x) = lim_{T−t→0+} ∫_{R^N} p(t, x; T, dy) / (T − t) (y − x)i (y − x)j
              = lim_{T−t→0+} E[ (XT − Xt)i (XT − Xt)j / (T − t) | Xt = x ],   (2.5.12)

for i, j = 1, ..., N. Hence, the coefficients of At represent the infinitesimal increments of the mean and covariance matrix¹⁶ of the process X as it starts from (t, x). From formulas (2.5.11) and (2.5.12) it also follows that cij = cij(t, x) and bj = bj(t, x) are Borel measurable functions on R_{>0} × R^N.
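Formulas (2.5.11) and (2.5.12) suggest a simple statistical procedure: estimate b and c from many short-time increments of the process. The sketch below does this for a scalar model with constant drift and diffusion (the model X_{t+h} = x + μh + σ√h Z is an assumption made for illustration, so that b = μ and c = σ²):

```python
import math, random
random.seed(3)

# Estimating the generator coefficients from simulated short-time increments:
# with increments d = mu*h + sigma*sqrt(h)*Z, (2.5.11)-(2.5.12) give
# E[d]/h -> b = mu  and  E[d^2]/h -> c = sigma^2 (up to O(h) bias).
mu, sigma = 0.7, 1.3
h, n = 0.01, 1_000_000

incs = [mu * h + sigma * math.sqrt(h) * random.gauss(0, 1) for _ in range(n)]
b_hat = sum(incs) / (n * h)                 # approximates b(t, x) = mu
c_hat = sum(d * d for d in incs) / (n * h)  # approximates c(t, x) = sigma^2
print(b_hat, c_hat)
assert abs(b_hat - mu) < 0.06
assert abs(c_hat - sigma ** 2) < 0.05
```

Note that the drift estimate is the statistically hard one: its standard error scales like σ/√(nh), which is why a moderately large h and a large sample are used here.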

2.5.2 Backward Kolmogorov Equation

Let p be the transition law of a Markov process X. We exploit the Chapman-Kolmogorov equation to study the conditional expectation function in (2.5.2), defined by

    u(t, x) := ∫_{R^N} p(t, x; T, dy)ϕ(y) = E[ϕ(XT) | Xt = x],   0 ≤ t ≤ T, x ∈ R^N,   (2.5.13)
for ϕ ∈ bB. If it exists, the derivative ∂t u(t, x) is given by

    ∂t u(t, x) = lim_{h→0+} ∫_{R^N} [p(t, x; T, dy) − p(t − h, x; T, dy)] / h ϕ(y) =

¹⁶ Notice that

    cij(t, x) = lim_{T−t→0+} ∫_{R^N} p(t, x; T, dy) / (T − t) (y − x − (T−t)b(t, x))i (y − x − (T−t)b(t, x))j
              = lim_{T−t→0+} E[ (XT − Xt − (T−t)b(t, Xt))i (XT − Xt − (T−t)b(t, Xt))j / (T − t) | Xt = x ]

as can be verified by expanding the product inside the integral and observing that

    lim_{T−t→0+} (T − t) ∫_{R^N} p(t, x; T, dy) bi(t, x) bj(t, x) = lim_{T−t→0+} ∫_{R^N} p(t, x; T, dy) (y − x)i bj(t, x) = 0.

(by the Chapman-Kolmogorov equation)

    = lim_{h→0+} ∫_{R^N} [p(t, x; t, dz) − p(t − h, x; t, dz)] / h ∫_{R^N} p(t, z; T, dy)ϕ(y)
    = lim_{h→0+} ∫_{R^N} [p(t, x; t, dz) − p(t − h, x; t, dz)] / h u(t, z)
    = −At u(t, x)   (2.5.14)

based on the definition of the characteristic operator in the form (2.5.6). The previous steps are justified rigorously under the assumption that u(t, ·) ∈ D: in Example 2.5.12 this assumption is satisfied if ϕ ∈ C¹(R^N) since x ⟼ u(t, x) = ϕ(x + γ(T) − γ(t)) inherits the regularity properties of ϕ. We will examine later other significant examples in which u(t, ·) ∈ bC²(R^N) thanks to the regularizing properties of the kernel p(t, x; T, dy).

Therefore, at least formally, the function u in (2.5.13) solves the Cauchy problem for the backward Kolmogorov equation¹⁷ (with final datum)

    ∂t u(t, x) + At u(t, x) = 0,   (t, x) ∈ [0, T[ × R^N,
    u(T, x) = ϕ(x),   x ∈ R^N,   (2.5.15)

or in integral form

    u(t, x) = ϕ(x) + ∫_t^T As u(s, x) ds,   (t, x) ∈ [0, T] × R^N.

We emphasize that problem (2.5.15) is written in the backward variables (t, x) assuming the forward time T fixed.
Example 2.5.9 ([!]) Consider the Gaussian transition law p(t, x; T, dy) = Γ(t, x; T, y)dy of Example 2.1.9 with transition density defined by

    Γ(t, x; T, y) = 1/√(2π(T−t)) e^{−(x−y)²/(2(T−t))},   0 ≤ t < T, x, y ∈ R.   (2.5.16)

¹⁷ Since u(t, x) = ∫_{R^N} p(t, x; T, dy)ϕ(y), it is also customary to say that the transition law (t, x) ⟼ p(t, x; T, dy) solves the backward problem

    ∂t p(t, x; T, dy) + At p(t, x; T, dy) = 0,   (t, x) ∈ [0, T[ × R^N,
    p(T, x; T, ·) = δx,   x ∈ R^N,

in the backward variables (t, x).

The Markov process associated with p is the Brownian motion that will be introduced in Chap. 4. A direct calculation shows that

    ∂t Γ(t, x; T, y) = −∂T Γ(t, x; T, y) = [(T − t) − (x − y)²] / (2(T − t)²) Γ(t, x; T, y),
    ∂x Γ(t, x; T, y) = −∂y Γ(t, x; T, y) = (y − x)/(T − t) Γ(t, x; T, y),
    ∂xx Γ(t, x; T, y) = ∂yy Γ(t, x; T, y) = −[(T − t) − (x − y)²] / (T − t)² Γ(t, x; T, y),

from which we obtain the backward Kolmogorov equation

    (∂t + 1/2 ∂xx) Γ(t, x; T, y) = 0,   t < T, x, y ∈ R   (2.5.17)

and also

    (∂T − 1/2 ∂yy) Γ(t, x; T, y) = 0,   t < T, x, y ∈ R   (2.5.18)

which is called the forward Kolmogorov equation and will be studied in Sect. 2.5.3. The characteristic operator of p is the Laplace operator

    At = 1/2 ∂xx

as can also be verified using formulas (2.5.11) and (2.5.12), which here become

    b(t, x) = lim_{T−t→0+} ∫_R Γ(t, x; T, y)/(T − t) (y − x) dy = 0,
    c(t, x) = lim_{T−t→0+} ∫_R Γ(t, x; T, y)/(T − t) (y − x)² dy = 1.

Obviously, At is a local operator at every x ∈ R.
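The backward equation (2.5.17) lends itself to a quick finite-difference check at an arbitrary interior point (the evaluation point and step size below are illustrative assumptions):

```python
import math

# Finite-difference spot-check of the backward heat equation (2.5.17):
# (d/dt + (1/2) d^2/dx^2) Gamma(t, x; T, y) = 0, Gamma as in (2.5.16).
def Gamma(t, x, T, y):
    v = T - t
    return math.exp(-(x - y) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

t, x, T, y = 0.3, 0.2, 1.0, -0.5
e = 1e-4
dt = (Gamma(t + e, x, T, y) - Gamma(t - e, x, T, y)) / (2 * e)
dxx = (Gamma(t, x + e, T, y) - 2 * Gamma(t, x, T, y)
       + Gamma(t, x - e, T, y)) / e ** 2
residual = dt + 0.5 * dxx
print("residual of (2.5.17):", residual)    # should be ~0 up to FD error
assert abs(residual) < 1e-4
```

The same two lines with T, y in place of t, x (and the sign of the second-order term flipped) check the forward equation (2.5.18).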


Equations (2.5.17) and (2.5.18) are well known for their importance in physics
and economics:
• (2.5.18) is also called forward heat equation and appears in models that
describe the physical phenomenon of heat diffusion in a body. Precisely, the
solution v = v(T, y) of the forward Cauchy problem

  ∂_T v(T, y) = (1/2) ∂_yy v(T, y),  (T, y) ∈ ]t, +∞[×R,
  v(t, y) = ϕ(y),  y ∈ R,   (2.5.19)

represents the temperature, at time T and position y, of an infinitely long body
with assigned temperature ϕ at the initial time t;
• (2.5.17) is called backward heat equation and arises naturally in mathematical
finance, in the valuation of certain complex financial instruments, called
derivatives, whose value ϕ is known at the future time T: the price at time
t < T is given by the solution u = u(t, x) of the backward Cauchy problem

  ∂_t u(t, x) + (1/2) ∂_xx u(t, x) = 0,  (t, x) ∈ [0, T[×R,
  u(T, x) = ϕ(x),  x ∈ R.   (2.5.20)

Note that, if v denotes the solution of the forward problem (2.5.19) with initial time
t = 0, then u(t, x) := v(T − t, x) solves the backward problem (2.5.20); moreover,
u is given by formula (2.5.13) which here becomes

  u(t, x) = ∫_R 𝚪(t, x; T, y)ϕ(y)dy,  (t, x) ∈ [0, T] × R.  (2.5.21)

By interchanging derivative and integral, one can prove that u ∈
C^∞([0, T[×R) and ‖u‖_∞ ≤ ‖ϕ‖_∞ for every ϕ ∈ bB, and this justifies the
validity of (2.5.14).
Remark 2.5.10 In the theory of differential equations, 𝚪 in (2.5.16) is called
fundamental solution of the heat operator since, through the solution formula
(2.5.21), it provides the solution of the backward problem (2.5.20) for every final
datum ϕ ∈ bC (and similarly of the forward problem (2.5.19) for every initial datum
ϕ ∈ bC). We refer to Sect. 20.2 for the general definition of fundamental solution.

A deep connection between the theory of stochastic processes and that of partial
differential equations is given by the fact that, if it exists, the transition density of
a Markov process (for example, the Gaussian density in the case of a Brownian
motion) is the fundamental solution of the Kolmogorov equations (corresponding
to the heat equations in the case of a Brownian motion). A general treatment of
the existence and uniqueness of the solution of the Cauchy problem for partial
differential equations of parabolic type is given in Chap. 20, while in Chap. 15 we
examine in depth the connection with stochastic differential equations.
Example 2.5.11 ([!]) Consider the Poisson transition law with parameter λ > 0 of
Example 2.4.5:

  p(t, x; T, ·) = Poisson_{x, λ(T−t)} := e^{−λ(T−t)} Σ_{n=0}^{+∞} ((λ(T − t))^n / n!) δ_{x+n},  0 ≤ t ≤ T, x ∈ R.

For u as in (2.5.13) we have

  ∂_t u(t, x) = ∂_t (e^{−λ(T−t)} Σ_{n≥0} ((λ(T − t))^n / n!) ϕ(x + n))
             = λ e^{−λ(T−t)} Σ_{n≥0} ((λ(T − t))^n / n!) ϕ(x + n)
               + e^{−λ(T−t)} Σ_{n≥0} ∂_t ((λ(T − t))^n / n!) ϕ(x + n) =

(the exchange of series and derivative is justified by the fact that it is a power series
with infinite radius of convergence if ϕ ∈ bB)

  = λ u(t, x) − λ e^{−λ(T−t)} Σ_{n≥1} ((λ(T − t))^{n−1} / (n − 1)!) ϕ(x + n)
  = λ u(t, x) − λ e^{−λ(T−t)} Σ_{n≥0} ((λ(T − t))^n / n!) ϕ(x + n + 1)
  = −λ (u(t, x + 1) − u(t, x)).

Hence A_t is defined by

  A_t ϕ(x) = λ (ϕ(x + 1) − ϕ(x)),  ϕ ∈ D := bB.

In this case, A_t is a non-local operator at any x ∈ R.
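The limit defining the characteristic operator can also be observed numerically: the sketch below (with an arbitrary bounded test function and illustrative parameter values) evaluates the difference quotient (u(t, x) − ϕ(x))/(T − t) via the truncated Poisson series and compares it with λ(ϕ(x + 1) − ϕ(x)).

```python
import math

lam, x, tau = 2.0, 0.5, 1e-3       # intensity, state, small time step T - t
phi = math.sin                      # a bounded test function (illustrative choice)

# u(t, x) = E[phi(x + N_tau)] computed from the truncated Poisson series
u = sum(math.exp(-lam * tau) * (lam * tau) ** n / math.factorial(n) * phi(x + n)
        for n in range(30))

difference_quotient = (u - phi(x)) / tau
generator = lam * (phi(x + 1) - phi(x))   # A_t phi(x) = lambda (phi(x+1) - phi(x))
print(difference_quotient, generator)
```

As tau decreases, the difference quotient converges to the value of the generator, with an error of order tau.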

2.5.3 Forward Kolmogorov (or Fokker-Planck) Equation

Assume that p is the transition law of a Markov process X. By definition of
characteristic operator and assuming the existence of the derivative ∂_T p(t, x; T, dz),
for every ϕ ∈ D we have

  ∫_{R^N} ∂_T p(t, x; T, dz)ϕ(z) = lim_{h→0+} ∫_{R^N} ((p(t, x; T + h, dz) − p(t, x; T, dz))/h) ϕ(z) =

(by the Chapman-Kolmogorov equation)

  = ∫_{R^N} p(t, x; T, dy) lim_{h→0+} ∫_{R^N} ((p(T, y; T + h, dz) − p(T, y; T, dz))/h) ϕ(z)
  = ∫_{R^N} p(t, x; T, dy) A_T ϕ(y).

In conclusion, we have

  ∫_{R^N} ∂_T p(t, x; T, dy)ϕ(y) = ∫_{R^N} p(t, x; T, dy) A_T ϕ(y),  ϕ ∈ D,  (2.5.22)
which is called the forward Kolmogorov equation or also the Fokker-Planck
equation. Here ϕ must be interpreted as a test function and (2.5.22) as the weak
(or distributional) form of the equation

  ∂_T p(t, x; T, ·) = A*_T p(t, x; T, ·)

where A*_T denotes the adjoint operator of A_T. For example, if A_T is a differential
operator of the form (2.5.10) then A*_T is obtained formally by integration by parts:

  ∫_{R^N} (A*_T u)(y) v(y) dy = ∫_{R^N} u(y) A_T v(y) dy,

for any pair of test functions u, v. If the coefficients are sufficiently regular, it is
possible to write the forward operator more explicitly:

  A*_T u = (1/2) Σ_{i,j=1}^N c_ij ∂_{y_i y_j} u + Σ_{j=1}^N b*_j ∂_{y_j} u + a* u,  (2.5.23)

where

  b*_j := −b_j + Σ_{i=1}^N ∂_{y_i} c_ij,   a* := −Σ_{i=1}^N ∂_{y_i} b_i + (1/2) Σ_{i,j=1}^N ∂_{y_i y_j} c_ij.  (2.5.24)
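In dimension N = 1, formulas (2.5.23)-(2.5.24) say that A*u = (1/2)(cu)'' − (bu)' coincides with (1/2)c u'' + b* u' + a* u with b* = −b + c' and a* = −b' + (1/2)c''. The sketch below (with arbitrarily chosen smooth coefficients) compares the two expressions by central finite differences.

```python
import math

b = math.sin                        # drift coefficient (illustrative choice)
c = lambda y: 2.0 + math.cos(y)     # diffusion coefficient, bounded away from 0
u = lambda y: math.exp(-y * y)      # smooth test function

h = 1e-4
d1 = lambda f, y: (f(y + h) - f(y - h)) / (2 * h)          # f'
d2 = lambda f, y: (f(y + h) - 2 * f(y) + f(y - h)) / h**2  # f''

y = 0.4
# divergence form of the adjoint: (1/2)(c u)'' - (b u)'
lhs = 0.5 * d2(lambda z: c(z) * u(z), y) - d1(lambda z: b(z) * u(z), y)
# expanded form (2.5.23)-(2.5.24) with b* = -b + c' and a* = -b' + (1/2) c''
b_star = -b(y) + d1(c, y)
a_star = -d1(b, y) + 0.5 * d2(c, y)
rhs = 0.5 * c(y) * d2(u, y) + b_star * d1(u, y) + a_star * u(y)
print(lhs, rhs)   # the two expressions agree up to discretization error
```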

Formula (2.5.22) is also expressed by stating that p(t, x; ·, ·) is a distributional
solution of the forward Cauchy problem (with initial datum)

  ∂_T p(t, x; T, ·) = A*_T p(t, x; T, ·),  T > t,
  p(t, x; t, ·) = δ_x.   (2.5.25)

The term “distributional solution” is used to indicate the fact that p(t, x; T, ·), being
a distribution, does not generally have the regularity required to support the operator
A_T, which in fact appears in (2.5.22) applied to the test function ϕ. Note that
the problem (2.5.25) is written in the forward variables (T, y) on ]t, +∞[×R^N,
with the backward variables (t, x) fixed.
The existence of the distributional solution of (2.5.25) can be proved under very
general assumptions (see, for example, Theorem 1.1.9 in [133]): although the notion
without assuming further hypotheses, as shown by the following

Example 2.5.12 ([!]) Let us resume Example 2.5.3. The operator A_t = γ'(t) · ∇_x,
with ∇_x = (∂_{x_1}, . . . , ∂_{x_N}), is obviously local at every x ∈ R^N: it can also be
determined using formulas (2.5.11) and (2.5.12) which, for p as in (2.5.3) with γ
differentiable, give

  b(t, x) = lim_{T−t→0+} (1/(T − t)) ∫_{R^N} δ_{x+γ(T)−γ(t)}(dy)(y − x) = γ'(t),
  c_ij(t, x) = lim_{T−t→0+} (1/(T − t)) ∫_{R^N} δ_{x+γ(T)−γ(t)}(dy)(y − x)_i (y − x)_j = 0.

The Cauchy problem (2.5.25) for the forward Kolmogorov equation is

  ∂_T p(t, x; T, ·) = −γ'(T) · ∇_y p(t, x; T, ·),  T > t,
  p(t, x; t, ·) = δ_x.   (2.5.26)

Clearly, since p(t, x; T, ·) is a measure, the gradient ∇_y p(t, x; T, ·) is not defined
in the classical sense but in the sense of distributions. Therefore, problem (2.5.26)
should be understood as in (2.5.22), that is, as an integral equation where the
gradient is applied to the function ϕ:

  ϕ(x + γ(T) − γ(t)) = ϕ(x) + ∫_t^T γ'(s) · (∇ϕ)(x + γ(s) − γ(t)) ds,  ϕ ∈ C¹(R^N);

by differentiating, we find

  (d/dT) ϕ(x + γ(T) − γ(t)) = γ'(T) · (∇ϕ)(x + γ(T) − γ(t)).
dT
Intuitively, the characteristic operator provides the infinitesimal increment (also
called the drift) of a process: by removing the drift, we get a martingale. This fact is
made rigorous by the following remarkable result, which shows how to compensate
a process to make it a martingale, by means of the characteristic operator.

Theorem 2.5.13 ([!]) Let X be a Markov process with characteristic operator A_t
defined on a domain D. If ψ ∈ D is such that A_t ψ(X_t) ∈ L¹([0, T] × Ω), then the
process

  M_t := ψ(X_t) − ∫_0^t A_s ψ(X_s) ds,  t ∈ [0, T],

is a martingale.

Proof We have M_t ∈ L¹(Ω, P), for any t ∈ [0, T], thanks to the assumptions18 on
ψ. It remains to prove that

  E[M_t − M_s | F_s] = 0,  0 ≤ s ≤ t ≤ T,

that is

  E[ψ(X_t) − ψ(X_s) − ∫_s^t A_r ψ(X_r) dr | F_s] = 0,  0 ≤ s ≤ t ≤ T.

Integrating the forward Kolmogorov equation (2.5.22) over time with x = X_s, we
have

  0 = ∫_{R^N} p(s, X_s; t, dy)ψ(y) − ψ(X_s) − ∫_s^t ∫_{R^N} p(s, X_s; r, dy) A_r ψ(y) dr =

(by the Markov property (2.5.1) applied to the first and last term)

  = E[ψ(X_t) | F_s] − ψ(X_s) − ∫_s^t E[A_r ψ(X_r) | F_s] dr =

(since, as we will prove shortly, it is possible to exchange the time integral with the
conditional expectation)

  = E[ψ(X_t) − ψ(X_s) − ∫_s^t A_r ψ(X_r) dr | F_s]

which proves the thesis.
To justify the exchange between the integral and the conditional expectation, we
verify that the random variable

  Z := ∫_s^t E[A_r ψ(X_r) | F_s] dr

is a version of the conditional expectation of ∫_s^t A_r ψ(X_r) dr given F_s. First of all,
from the fact that E[A_r ψ(X_r) | F_s] ∈ mF_s it follows that also Z ∈ mF_s. Then,
for every G ∈ F_s, we have

  E[Z 1_G] = E[(∫_s^t E[A_r ψ(X_r) | F_s] dr) 1_G] =

(by Fubini’s theorem, given the integrability assumption on A_r ψ(X_r))

  = ∫_s^t E[E[A_r ψ(X_r) | F_s] 1_G] dr =

(by the properties of conditional expectation)

  = ∫_s^t E[A_r ψ(X_r) 1_G] dr =

(reapplying Fubini’s theorem)

  = E[(∫_s^t A_r ψ(X_r) dr) 1_G].

18 We also recall that ψ is bounded since D ⊆ bB_N: this assumption is not restrictive and can be
significantly weakened.
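Theorem 2.5.13 can be illustrated by a Monte Carlo experiment for the Poisson process of Example 2.5.11, where A_sψ(x) = λ(ψ(x + 1) − ψ(x)). With the illustrative choice ψ(x) = x² (unbounded, but integrable here), the sketch below discretizes the compensator integral on a time grid and checks that the sample mean of M_t stays near M_0 = ψ(0) = 0; this is a numerical illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, steps, paths = 1.0, 1.0, 400, 10000
dt = t / steps

# simulate Poisson paths on a time grid via independent Poisson increments
N = np.cumsum(rng.poisson(lam * dt, size=(paths, steps)), axis=1)
N = np.hstack([np.zeros((paths, 1), dtype=np.int64), N])   # prepend N_0 = 0

psi = lambda x: x ** 2
A_psi = lambda x: lam * (2 * x + 1)    # lambda((x + 1)^2 - x^2)

# M_t = psi(N_t) - integral_0^t A_s psi(N_s) ds (left-point Riemann sum)
compensator = A_psi(N[:, :-1]).sum(axis=1) * dt
M_t = psi(N[:, -1]) - compensator
print(M_t.mean())   # close to M_0 = psi(0) = 0
```

The residual error comes from the Monte Carlo noise and the Riemann-sum discretization of the compensator.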


2.6 Markov Processes and Diffusions

Continuous Markov processes are sometimes called diffusions, although it should
be noted that there is no unanimous agreement on this definition in the literature.
Associated with each N-dimensional diffusion are the measurable functions b =
(b_i)_{1≤i≤N} and C = (c_ij)_{1≤i,j≤N} defined in (2.5.11) and (2.5.12); these functions
are the coefficients of the characteristic operator (2.5.10):

  A_t = (1/2) Σ_{i,j=1}^N c_ij(t, x) ∂_{x_i x_j} + Σ_{i=1}^N b_i(t, x) ∂_{x_i},  (t, x) ∈ R × R^N.

We recall that C is an N × N symmetric and positive semi-definite matrix.


Historically, there are two main approaches to the construction of diffusions.
The first and more classical one is based on Kolmogorov’s equations: specifically,
the idea of A. N. Kolmogorov [69] and W. Feller [45] is to determine a transition
law p(t, x; T, dy) as the solution of the forward Kolmogorov equation

  ∂_T p(t, x; T, dy) = A*_T p(t, x; T, dy)   (2.6.1)

associated with the initial datum p(t, x; t, ·) = δ_x as in (2.5.25). Equation (2.6.1)
is the starting point for the study of the existence and regularity properties of

a density of p through analytical19 and probabilistic20 techniques. Although it


seems the most natural approach, Eq. (2.6.1) presents some technical difficulties:
it must be interpreted in a distributional sense in the forward variables, and it
involves the adjoint operator of A_t, whose precise definition requires appropriate
regularity assumptions on the coefficients (cf. (2.5.23) and (2.5.24)). For this reason,
attention has subsequently shifted to the Kolmogorov backward equation. The study
of diffusions using the backward equation has been one of the most effective and
successful approaches: Sect. 18.2 is dedicated to a summary of the main results in
this regard. The main objection to the use of Kolmogorov’s equations for the study
of diffusions is that the tools used are predominantly analytical in nature and rely on
technically complex results from the theory of partial differential equations: among
these, first and foremost, the construction of the fundamental solution of parabolic
equations that we will present in a synthetic way in Chap. 20.
The second approach to the construction of diffusions is the one initiated by
K. Itô: it is inspired by P. Lévy’s idea of considering the infinitesimal increment
X_{t+dt} − X_t of a diffusion as a Gaussian increment with drift b(t, X_t) and covariance
matrix C(t, X_t), consistently with Eqs. (2.5.11) and (2.5.12). Itô developed a theory
of stochastic calculus based on which the previous idea can be formalized in terms
of the stochastic differential equation

  dX_t = b(t, X_t) dt + σ(t, X_t) dW_t,   (2.6.2)

where W denotes a stochastic process with independent and Gaussian increments
(a Brownian motion, cf. Chap. 4) and C = σσ*. The primary challenge with this
approach lies in defining the stochastic differential (or integral) of processes whose
trajectories, while continuous, exhibit such irregularity that traditional mathematical
analysis tools prove inadequate: Chap. 10 is entirely dedicated to the theory of
stochastic integration in the Itô sense. Secondly, in order to construct a diffusion
X as a solution of Eq. (2.6.2), existence and uniqueness results are required for such
an equation: this problem has also been solved by Itô under standard assumptions
of local Lipschitz continuity and linear growth of the coefficients in perfect analogy
with the theory of ordinary differential equations. Subsequently, a significant step
forward was made by Stroock and Varadhan [134, 135] who built a bridge between
the theory of diffusions and that of martingales: Stroock and Varadhan showed that
the problem of the existence of a diffusion, as a solution of (2.6.2), is equivalent
to the so-called “martingale problem”, i.e., the problem of the existence of a
probability measure, on the canonical space of trajectories, with respect to which
the compensated process of Theorem 2.5.13 is a martingale. A concise presentation
of the main results by Stroock and Varadhan is provided in Chap. 18.
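Lévy’s idea of Gaussian infinitesimal increments translates directly into the Euler-Maruyama scheme for simulating (2.6.2). The sketch below is a minimal illustration with the Ornstein-Uhlenbeck choice b(t, x) = −x and σ(t, x) = 1, for which E[X_1] = x_0 e^{−1}; the coefficients and parameters are illustrative only, not the general theory.

```python
import numpy as np

rng = np.random.default_rng(1)

def euler_maruyama(b, sigma, x0, t, steps, paths, rng):
    """Simulate dX = b(t, X) dt + sigma(t, X) dW on [0, t] via Gaussian increments."""
    dt = t / steps
    X = np.full(paths, float(x0))
    for k in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=paths)   # Brownian increment
        X = X + b(k * dt, X) * dt + sigma(k * dt, X) * dW
    return X

# Ornstein-Uhlenbeck example: E[X_1] = x0 * exp(-1)
X1 = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 1.0, 1.0, 200, 50000, rng)
print(X1.mean())   # close to exp(-1) ≈ 0.3679
```

Each step adds a Gaussian increment with the drift and covariance prescribed by (2.5.11) and (2.5.12), which is exactly Lévy’s heuristic made into an algorithm.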

19 The most important result in this regard is the famous Hörmander’s theorem [62].
20 Malliavin’s calculus extends the mathematical field of calculus of variations from deterministic
functions to stochastic processes. For a general reference see, e.g., [101].

2.7 Key Ideas to Remember

We summarize the core concepts and key insights from the chapter to facilitate
comprehension, omitting the more technical or less significant details. As usual,
if you have any doubt about what the following succinct statements mean, please
review the corresponding section.
• Section 2.1: the transition law of a stochastic process X = (X_t)_{t≥0} is the family
of the conditional distributions of X_T given X_t, indexed by t, T with t ≤ T. Two
notable examples of transition laws are the Gaussian and Poisson ones.
• Section 2.2: for a Markov process, conditioning on F_t (the σ-algebra of
information up to time t) is equivalent to conditioning on X_t: in this sense, the
Markov property is a “memoryless” property.
• Section 2.3: processes with independent increments are Markov processes.
• Section 2.4: starting from the initial distribution and the transition law of a
Markov process, it is possible to derive the finite-dimensional distributions, and
therefore the law of the process: moreover, the transition law of a Markov process
verifies an important identity, the Chapman-Kolmogorov equation (2.4.3), which
expresses a consistency property between the distributions that make up the
transition law.
• Section 2.5: if it exists, the average directional derivative along the trajectories of
X, i.e.

  lim_{T−t→0+} E[(ϕ(X_T) − ϕ(X_t))/(T − t) | X_t = x] =: A_t ϕ(x),

defines the characteristic operator A_t of the Markov process X, at least for ϕ in
an appropriate space of functions.
• Section 2.5.1: for continuous Markov processes, A_t is a second-order elliptic-
parabolic partial differential operator whose prototype is the Laplace operator.
The coefficients of A_t are the infinitesimal increments of the mean and covari-
ance matrix of X (cf. formulas (2.5.11) and (2.5.12)).
• Sections 2.5.2 and 2.5.3: the transition law is the solution of the backward
and forward Kolmogorov equations. The prototypes of such equations are the
backward and forward versions of the heat equation.
• Section 2.6: we call diffusion a continuous Markov process. A classical approach
to the construction of diffusions consists in determining their transition law
as fundamental solutions of the backward or forward Kolmogorov equation.
Alternatively, diffusions are constructed as solutions of stochastic differential
equations, the theory of which will be developed starting from Chap. 14.

Main notations introduced in this chapter:

  Symbol                              Description                              Page
  p = p(t, x; T, H)                   Transition law                           25
  Poisson_{x, λ(T−t)}                 Poisson transition law                   27
  𝚪(t, x; T, y)                       Gaussian transition density              28
  𝐗                                   Canonical version of the process X       29
  G^X_{t,∞} = σ(X_s, s ≥ t)           σ-algebra of future information on X     32
  X^{t,x}_T = X_T − X_t + x           Translated process                       34
  A_t                                 Characteristic operator                  42
  A*_t                                Adjoint operator                         52
Chapter 3
Continuous Processes

As far as the laws of mathematics refer to reality, they are not
certain; and as far as they are certain, they do not refer to
reality.
Albert Einstein

The notion of continuity for stochastic processes, although intuitive, hides some
small pitfalls and must therefore be analyzed carefully.
In this chapter, I denotes a real interval of the form I = [0, T] or I =
[0, +∞[. Moreover, C(I) is the set of continuous functions mapping I to real
values. In the first part of the chapter, we confirm a natural and unsurprising fact:
a continuous process can be defined as a random variable with values in the space
of continuous functions C(I), rather than in the space R^I of all trajectories, as
seen in the broader definition of a stochastic process (cf. Definition 1.1.3). Then
we prove the fundamental Kolmogorov’s continuity theorem according to which,
up to modifications, one can deduce the continuity of a process from a condition
on its law: this is a deep result because it allows one to deduce a “pointwise” property
(of individual trajectories) from a condition “in the average” (i.e. on the law of the
process).

3.1 Continuity and a.s. Continuity

Definition 3.1.1 (Continuous Process) A stochastic process X = (X_t)_{t∈I} on
the space (Ω, F, P) is almost surely (a.s.) continuous if the family of continuous
trajectories

  (X ∈ C(I)) := {ω ∈ Ω | X(ω) ∈ C(I)}

is an almost sure set, i.e., it contains an event of probability one: (X ∈ C(I)) ⊇ A
with A ∈ F such that P(A) = 1.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_3

Remark 3.1.2 (Continuity and Completeness) If the space (Ω, F, P) is complete,
then X is a.s. continuous if and only if P(X ∈ C(I)) = 1. If (Ω, F, P) is
not complete, then it is not necessarily true that (X ∈ C(I)) is an event. In fact,
recall that, denoting by F^I the σ-algebra on R^I generated by cylinders, by the
Definition 1.1.3 of stochastic process, we have X^{−1}(H) ∈ F for every H ∈ F^I:
however, by Remark 1.1.10, C(I) ∉ F^I and therefore it is not necessarily true that
(X ∈ C(I)) ∈ F. Similarly, in an incomplete space, even if X is a.s. continuous, it
is not necessarily the case that quantities such as

  M := sup_{t∈I} X_t,   J := ∫_I X_t dt,   T := { inf I^+  if I^+ := {t ∈ I | X_t > 0} ≠ ∅,  0 otherwise }   (3.1.1)

are random variables.
Remark 3.1.3 (Continuity and Almost Sure Continuity) Let X be an a.s. con-
tinuous process defined on the space (Ω, F, P) and let A be as in Definition 3.1.1.
Then X is indistinguishable from X̄ := X1_A which has all continuous trajectories.1
More explicitly, X̄ is defined by

  X̄(ω) = { X(ω)  if ω ∈ A,  0  otherwise }.

We say that X̄ is a continuous version of X. Hence, provided that we switch
to a continuous version, we can eliminate the term “almost surely” and consider
continuous processes instead of a.s. continuous ones.
Now, one might wonder why the definition of a.s. continuous process was
introduced and not directly that of a continuous process. The fact is that a stochastic
process, such as the Brownian motion, is usually constructed from a given law, using
Kolmogorov’s extension theorem: in this way, one can only prove2 the almost sure
continuity of the trajectories and only later switch to a continuous version.
Remark 3.1.4 If X = (X_t)_{t∈I}, with I = [0, 1], is a continuous process then M, J
and T in (3.1.1) are well-defined and are random variables. In fact, it is enough to
observe that

  M = sup_{t∈[0,1]∩Q} X_t.

1 We cannot use (X ∈ C(I)) instead of A because if (Ω, F, P) is not complete then X1_{(X∈C(I))}
would not necessarily be a stochastic process.
2 Actually, the argument is more subtle and will be clarified in Sect. 3.3.

Moreover, J(ω) is well-defined for each ω ∈ Ω, since all trajectories of X are
continuous, and equals

  J(ω) = lim_{n→∞} (1/n) Σ_{k=1}^n X_{k/n}(ω)

since the integral of a continuous function is equal to the limit of Riemann sums.
Finally, (I^+ = ∅) = (M ≤ 0) ∈ F and thus also

  (T < t) = (I^+ = ∅) ∪ ⋃_{s∈Q∩[0,t[} (X_s > 0)

belongs to F for every 0 < t ≤ 1: this is enough to prove that T ∈ mF.
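For a concrete path these countable approximations are easy to carry out. The sketch below uses the illustrative trajectory X_t(ω) = sin(2πt) − 1/2, for which M = 1/2, J = −1/2 and T = 1/12.

```python
import math

X = lambda t: math.sin(2 * math.pi * t) - 0.5   # an illustrative continuous path

n = 10 ** 5
grid = [k / n for k in range(n + 1)]            # dense countable subset of [0, 1]

M = max(X(t) for t in grid)                     # sup over a countable dense set
J = sum(X(k / n) for k in range(1, n + 1)) / n  # Riemann sum of the integral
I_plus = [t for t in grid if X(t) > 0]
T = min(I_plus) if I_plus else 0.0              # first entrance time in {X > 0}

print(M, J, T)   # ≈ 0.5, -0.5, 1/12
```

Continuity of the path is what guarantees that the supremum and the entrance time over the grid approximate the true values.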

3.2 Canonical Version of a Continuous Process

In this section, we focus on the case I = [0, 1]. We recall that C([0, 1]) (we also
write, more simply, C[0, 1]) is a separable and complete metric space, i.e., a Polish
space, with the uniform metric

  ϱ_max(v, w) = max_{t∈[0,1]} |v(t) − w(t)|,  v, w ∈ C[0, 1].

We consider I = [0, 1] only for simplicity: the results of this section can be easily
extended to the case where I = [0, T] or even I = R_{≥0} considering the distance

  ϱ_max(v, w) = Σ_{n≥1} (1/2^n) min{1, max_{t∈[0,n]} |v(t) − w(t)|},  v, w ∈ C(R_{≥0}).

We denote by B_{ϱ_max} the Borel σ-algebra on C[0, 1] (cf. Section 1.4.2 in [113]).
According to the general Definition 1.1.3, a stochastic process X = (X_t)_{t∈I} is
a measurable function from (Ω, F) to (R^I, F^I). We now show that if X is con-
tinuous then it is possible to replace the codomain (R^I, F^I) with (C(I), B_{ϱ_max}),
maintaining the measurability property with respect to the σ-algebra B_{ϱ_max}. This
fact is not trivial and deserves to be proven rigorously. In fact, based on Remark
1.1.10, C[0, 1] itself does not belong to F^{[0,1]} and therefore in general (X ∈
C[0, 1]) is not an event. Similarly, the singletons {w} are not elements of F^{[0,1]}
and therefore even if

  X : (Ω, F) −→ (R^{[0,1]}, F^{[0,1]})

is a stochastic process, it is not necessarily true that (X = w) is an event. On the
contrary, in the space (C[0, 1], B_{ϱ_max}) singletons are measurable (they are disks of
radius zero in the uniform metric), that is, {w} ∈ B_{ϱ_max} for each w ∈ C[0, 1].

Proposition 3.2.1 Let X = (X_t)_{t∈[0,1]} be a continuous stochastic process on the
space (Ω, F, P). Then the map

  X : (Ω, F) −→ (C[0, 1], B_{ϱ_max})

is measurable.

Proof First, we show that B_{ϱ_max} is the σ-algebra generated by the family C̃ of
cylinders of the form3

  C̃_t(H) := {w ∈ C[0, 1] | w(t) ∈ H},  t ∈ [0, 1], H ∈ B.  (3.2.1)

In fact, cylinders of the type (3.2.1) with H open in R generate σ(C̃) and are open
with respect to ϱ_max: therefore B_{ϱ_max} ⊇ σ(C̃).
Conversely, since (C[0, 1], ϱ_max) is separable, every open set is a countable
union of open disks. Therefore, B_{ϱ_max} is generated by the family of open disks,
that is, sets of the form

  D(w, r) = {v ∈ C[0, 1] | ϱ_max(v, w) < r},

where w ∈ C[0, 1] is the center and r > 0 is the radius of the disk. On the other
hand, each disk is obtained by countable operations of union and intersection of
cylinders of C̃ in the following way:

  D(w, r) = ⋃_{n∈N} ⋂_{t∈[0,1]∩Q} {v ∈ C[0, 1] | |v(t) − w(t)| < r − 1/n}.

Thus, each disk belongs to σ(C̃) and this proves the opposite inclusion.
Now we prove the thesis: as just proven, we have

  X^{−1}(B_{ϱ_max}) = X^{−1}(σ(C̃)) =

(since X is continuous)

  = X^{−1}(σ(C)) ⊆ F

where the last inclusion is due to the fact that X is a stochastic process. □

3 We use the “tilde” to distinguish the cylinders of continuous functions from the cylinders of R^{[0,1]}
defined in (1.1.1).

Proposition 3.2.1 allows us to give the following

Definition 3.2.2 (Law of an a.s. Continuous Process) Let X = (X_t)_{t∈I} be a
continuous process4 on the space (Ω, F, P). The law of X is the distribution μ_X
defined on (C(I), B_{ϱ_max}) by

  μ_X(H) = P(X ∈ H),  H ∈ B_{ϱ_max}.

Two continuous processes X and Y are equal in law (or in distribution) if μ_X = μ_Y:
in this case we write X =^d Y.

In analogy with Definition 1.3.4 we give the following

Definition 3.2.3 (Canonical Version of an a.s. Continuous Process [!]) Let X =
(X_t)_{t∈I} be an a.s. continuous process defined on the space (Ω, F, P) and with law
μ_X. The canonical version of X is the stochastic process defined as the identity
function 𝐗(w) = w, w ∈ C(I), on the probability space (C(I), B_{ϱ_max}, μ_X).

Remark 3.2.4 The main properties of the canonical version 𝐗 are:
(i) 𝐗 is a continuous process equal in law to X;
(ii) 𝐗 is defined on the Polish metric space (C(I), ϱ_max): this fact is relevant for the
existence of the regular version of conditional probability (cf. Theorem 4.3.2
in [113]) and is crucial in the study of stochastic differential equations. In
Chap. 14 we will make extensive use of the canonical version of continuous
processes;
(iii) 𝐗 is defined on a sample space in which the outcomes are the trajectories:
t ↦ 𝐗_t(w) ≡ w(t), t ∈ I. This fact allows, for example, to give an intuitive
characterization of the strong Markov property (cf. Sect. 7.3).

Furthermore, the space (C(I), B_{ϱ_max}, μ_X) can be completed by considering as σ-
algebra of events the completion of B_{ϱ_max} with respect to μ_X (cf. Remark 1.4.3 in
[113]).
Remark 3.2.5 (Skorokhod Space) The Skorokhod space is an extension of the
space of continuous trajectories that intervenes in the study of discontinuous
stochastic processes (such as, for example, the Poisson process). The Skorokhod
space .D(I ) is formed by càdlàg functions (cf. Definition 5.2.2) from I to .R or,
more generally, with values in a metric space. All the results of this section extend
to the case of processes with a.s. càdlàg trajectories. In particular, it is possible
to define on .D(I ) a metric, the Skorokhod distance, equipped with which .D(I )
is a Polish space. Obviously .C(I ) is a subspace of .D(I ) and it can be proved
that the uniform and Skorokhod distances are equivalent on .C(I ). The monograph
[16] provides a complete treatment of the Skorokhod space and the compactness

4 By Remark 3.1.3, the definition extends to the case of X a.s. continuous in an obvious way.

properties (tightness) of families of probability measures on .D(I ), in analogy with


what was seen in Section 3.3.2 in [113].

3.3 Kolmogorov’s Continuity Theorem

Kolmogorov’s extension theorem establishes the existence of a process with a given
law but does not provide information on the regularity of its trajectories. In fact,
Example 1.2.6 shows that nothing can be said about the continuity of a process’s
trajectories based on its distribution: modifying5 a continuous process can make
it discontinuous without changing its law. For this reason, the construction of a
process using Kolmogorov’s extension theorem takes place in the space R^I of all
the trajectories.
On the other hand, if the law of a process X satisfies suitable conditions, then
there exists a continuous modification of X: the fundamental result in this regard is
the classical Kolmogorov’s continuity theorem, of which we offer various versions,
with the simplest being the following

Theorem 3.3.1 (Kolmogorov’s Continuity Theorem [!!!]) Let X = (X_t)_{t∈[0,1]}
be a real stochastic process defined on a probability space (Ω, F, P). If there exist
three positive constants c, ε, p, with p > ε, such that

  E[|X_t − X_s|^p] ≤ c|t − s|^{1+ε},  t, s ∈ [0, 1],  (3.3.1)

then X admits a modification X̃ with α-Hölder continuous trajectories for every
α ∈ [0, ε/p[: precisely, for every α ∈ [0, ε/p[ and ω ∈ Ω there exists a positive
constant c_{α,ω}, which depends only on α and ω, such that

  |X̃_t(ω) − X̃_s(ω)| ≤ c_{α,ω} |t − s|^α,  t, s ∈ [0, 1].

In Sect. 3.4 we give a proof of Theorem 3.3.1, inspired by the original ideas of
Kolmogorov. Let us consider some examples.
Example 3.3.2 ([!]) We resume Corollary 1.3.6 and consider a Gaussian process
(X_t)_{t∈[0,1]} with mean function m ≡ 0 and covariance c(s, t) = s ∧ t. By definition,
(X_t, X_s) ∼ N_{0,C_{t,s}} where

  C_{t,s} = ( t        s ∧ t )
           ( s ∧ t    s     )

and therefore X_t − X_s ∼ N_{0, t+s−2(s∧t)}. It is easy to prove an estimate of the
type (3.3.1): first of all, it is not restrictive to assume s < t, so that X_t − X_s =
√(t − s) Z with Z ∼ N_{0,1}; then, for every p > 0 we have

  E[|X_t − X_s|^p] = |t − s|^{p/2} E[|Z|^p]

where E[|Z|^p] is a finite constant. By Kolmogorov’s continuity theorem, X admits
a modification X̃ which is α-Hölder for every α < (p/2 − 1)/p = 1/2 − 1/p. Given the
arbitrariness of p, it follows that X̃ is α-Hölder for every α < 1/2.

5 Here “modifying a process” means taking a modification of it.
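The moment identity of this example can be checked in simulation: for p = 4 it gives E[|X_t − X_s|⁴] = 3|t − s|², i.e. condition (3.3.1) with ε = 1. The sketch below (with illustrative values of t and s) estimates the fourth moment by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(2)
t, s, samples = 0.7, 0.2, 400000

# X_t - X_s ~ sqrt(t - s) * Z with Z standard normal
Z = rng.normal(size=samples)
fourth_moment = np.mean((np.sqrt(t - s) * Z) ** 4)
print(fourth_moment, 3 * (t - s) ** 2)   # E|X_t - X_s|^4 = 3 |t - s|^2
```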
Example 3.3.3 ([!]) Let us verify Kolmogorov’s criterion (3.3.1) for the Poisson
transition law. If N_t − N_s ∼ Poisson_{λ(t−s)}, then for p > 0 we have

  E[|N_t − N_s|^p] = e^{−λ(t−s)} Σ_{n=0}^∞ n^p (λ(t − s))^n / n! =

(since the first term of the series is zero)

  = e^{−λ(t−s)} Σ_{n=1}^∞ n^p (λ(t − s))^n / n!
  ≥ e^{−λ(t−s)} Σ_{n=1}^∞ (λ(t − s))^n / n!
  = e^{−λ(t−s)} (e^{λ(t−s)} − 1) = λ(t − s) + o(t − s)

for t − s → 0. Thus, condition (3.3.1) is not satisfied for any value of ε > 0. Indeed,
in Chap. 5 we will discover that the Poisson law corresponds to a process N with
discontinuous trajectories.
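The failure of the criterion can also be seen numerically: E[|N_t − N_s|^p] behaves like λ(t − s) for every p, i.e. linearly in the increment, never like |t − s|^{1+ε}. The sketch below (with illustrative λ and p) evaluates the truncated series for decreasing time steps and checks that the ratio to t − s approaches λ.

```python
import math

lam, p = 1.5, 4

def moment(tau, terms=50):
    """E[|N_tau|^p] for N_tau ~ Poisson(lam * tau), by truncated series."""
    return sum(n ** p * math.exp(-lam * tau) * (lam * tau) ** n / math.factorial(n)
               for n in range(terms))

ratios = [moment(tau) / tau for tau in (1e-2, 1e-3, 1e-4)]
print(ratios)   # decreases toward lam = 1.5: linear in tau, so (3.3.1) fails for every eps > 0
```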
Theorem 3.3.1 can be extended in several directions: the most interesting ones
concern higher-order regularity, the extension to the case of multidimensional I,
and the case of processes with values in Banach spaces. In relatively recent times, it
has been observed that Kolmogorov’s continuity theorem is essentially an analytical
result that can be proved as a corollary of the Sobolev embedding theorem, in a very
general version for the so-called Besov spaces. We provide here the statement given
in [128].

Theorem 3.3.4 (Kolmogorov’s Continuity Theorem [!!!]) Let X = (X_t)_{t∈R^d}
be a real stochastic process. If there exist k ∈ N_0, 0 < ε < p, c > 0, and δ > 0 such that

  E[|X_t − X_s|^p] ≤ c|t − s|^{d+ε+kp}

for every t, s ∈ R^d with |t − s| < δ, then X admits a modification X̃ whose
trajectories are differentiable up to order k, with locally α-Hölder derivatives for
every α ∈ [0, ε/p[.

Theorem 3.3.4 also extends to processes with values in a Banach space: the
following example is particularly relevant in the study of stochastic differential
equations.

Example 3.3.5 Let (X^x_t)_{t∈[0,1]} be a family of continuous stochastic processes,
indexed by x ∈ R^d: as in Sect. 3.2, we consider X^x as a r.v. with values in
(C[0, 1], B_{ϱ_max}) which is a Banach space with the norm

  ‖X‖_∞ := max_{t∈[0,1]} |X_t|.

If

  E[‖X^x − X^y‖^p_∞] ≤ c|x − y|^{d+ε},  x, y ∈ R^d,

then there exists a modification X̃ (i.e., we have6 X̃^x = X^x a.s. for each x ∈ R^d)
such that

  ‖X̃^x(ω) − X̃^y(ω)‖_∞ ≤ c |x − y|^α,  x, y ∈ K,

for every compact subset K of R^d and α < ε/p, with c > 0 depending only on ω, α
and K.

3.4 Proof of Kolmogorov’s Continuity Theorem

We have to prove that, if X = (X_t)_{t∈[0,1]} is a real stochastic process and there exist
three constants p, ε, c > 0 such that

  E[|X_t − X_s|^p] ≤ c|t − s|^{1+ε},  t, s ∈ [0, 1],  (3.4.1)

then X admits a modification X̃ with α-Hölder continuous trajectories for every
α ∈ [0, ε/p[.
We divide the proof into four steps, of which the third is the most technical and
can be skipped at a first reading.

First Step We combine Markov’s inequality (3.1.2) in [113] with (3.4.1) to obtain
the estimate

  P(|X_t − X_s| ≥ λ) ≤ E[|X_t − X_s|^p] / λ^p ≤ c|t − s|^{1+ε} / λ^p,  λ > 0.  (3.4.2)

6 In the sense that P(X̃^x_t = X^x_t, t ∈ [0, 1]) = 1.

We observe that from (3.4.2) it follows that, fixing .t ∈ [0, 1], there exists the limit
in probability

. lim Xs = Xt
s→t

and consequently, there is also almost sure convergence. However, this is not enough
to prove the thesis: in fact, the same result holds, for example, for the Poisson
process which has all discontinuous trajectories (cf. (5.1.5)). Indeed, Kolmogorov
realized that from (3.4.2) it is not possible to directly obtain an estimate of the
increment .Xt − Xs for every .t, s since .[0, 1] is uncountable. Thus, his idea was to
first restrict .t, s to the countable family of dyadic rationals of .[0, 1] defined by
⋃ { }
. D= Dn , Dn = k
2n | k = 0, 1, . . . , 2n .
n≥1

We observe that .Dn ⊆ Dn+1 for every .n ∈ N. Two elements .t, s ∈ Dn are called
consecutive if .|t − s| = 2−n .
Second Step We estimate the increment X_t − X_s assuming that t, s are consecutive in D_n: by (3.4.2) we have

    P(|X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα}) ≤ c 2^{n(αp−1−ε)}.

Then, setting

    A_n = (max_{1≤k≤2^n} |X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα}) = ⋃_{1≤k≤2^n} (|X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα}),

by the sub-additivity of P, we have

    P(A_n) ≤ Σ_{k=1}^{2^n} P(|X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−nα}) ≤ c Σ_{k=1}^{2^n} 2^{n(αp−1−ε)} = c 2^{n(αp−ε)}.

Hence, if α < ε/p, we have

    Σ_{n≥1} P(A_n) < ∞

and by Borel-Cantelli’s Lemma 1.3.28 in [113] P(A_n i.o.) = 0: this means that there exists N ∈ F, with P(N) = 0, such that for every ω ∈ Ω \ N there exists n_{α,ω} ∈ N for which

    max_{1≤k≤2^n} |X_{k/2^n}(ω) − X_{(k−1)/2^n}(ω)| ≤ 2^{−nα},   n ≥ n_{α,ω}.

As a consequence, we also have that for every ω ∈ Ω \ N there exists c_{α,ω} > 0 such that

    max_{1≤k≤2^n} |X_{k/2^n}(ω) − X_{(k−1)/2^n}(ω)| ≤ c_{α,ω} 2^{−nα},   n ∈ N.

Third Step We estimate the increment X_t − X_s with t, s ∈ D, constructing an appropriate chain of consecutive points connecting s to t, and then using, through the triangle inequality, the estimate obtained in the previous step. Let t, s ∈ D with s < t: we set

    n̄ = min{k | t, s ∈ D_k},   n = max{k | t − s < 2^{−k}},

so that n < n̄. Moreover, for k = n + 1, . . . , n̄, we recursively define the sequence

    s_n = max{τ ∈ D_n | τ ≤ s},   s_k = s_{k−1} + 2^{−k} sgn(s − s_{k−1}),

where sgn(x) = x/|x| if x ≠ 0 and sgn(0) = 0. We define (t_k)_{n≤k≤n̄} in an analogous way. Then s_k, t_k ∈ D_k and we have

    |s_k − s_{k−1}| ≤ 2^{−k},   |t_k − t_{k−1}| ≤ 2^{−k},   k = n + 1, . . . , n̄.

Furthermore, we prove that |t_n − s_n| ≤ 2^{−n} and we have

    |s − s_k| < 2^{−k},   |t − t_k| < 2^{−k},   k = n, . . . , n̄,

from which s_{n̄} = s and t_{n̄} = t.

from which .sn̄ = s and .tn̄ = t. Then we have


n̄ ⎲

Xt − Xs = Xtn − Xsn +
. (Xtk − Xtk−1 ) − (Xsk − Xsk−1 )
k=n+1 k=n+1

and therefore, for every .ω ∈ Ω \ N ,



. |Xt (ω) − Xs (ω)| ≤ cα,ω 2−nα + 2 cα,ω 2−kα.
k=n+1


≤ 2cα,ω 2−kα
k=n
2cα,ω −nα
= 2 ,
1 − 2−α
' |t − s|α for some positive constant .c' .
so that .|Xt − Xs | ≤ cα,ω α,ω
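The chain construction of the third step is entirely algorithmic. The following Python sketch (not from the book; the helper name `dyadic_chain` is ours) builds the sequences (s_k), (t_k) with exact dyadic arithmetic for a concrete pair of dyadic points, so that the properties claimed above can be verified directly.

```python
from fractions import Fraction

def dyadic_chain(r, n, n_bar):
    # r_n = max{tau in D_n | tau <= r}; then r_k = r_{k-1} + 2^{-k} sgn(r - r_{k-1})
    chain = [Fraction(int(r * 2**n), 2**n)]
    for k in range(n + 1, n_bar + 1):
        prev = chain[-1]
        sgn = (r > prev) - (r < prev)          # sgn with sgn(0) = 0
        chain.append(prev + Fraction(sgn, 2**k))
    return chain

# s, t in D_5 with t - s = 1/4, so n = max{k | t - s < 2^{-k}} = 1 and n_bar = 5
s, t = Fraction(9, 32), Fraction(17, 32)
n, n_bar = 1, 5
s_chain = dyadic_chain(s, n, n_bar)
t_chain = dyadic_chain(t, n, n_bar)
```

The chains end exactly at s and t, consecutive chain points differ by at most 2^{−k}, and |t_n − s_n| ≤ 2^{−n}, exactly as needed in the triangle-inequality estimate.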

Fourth Step We proved that for every ω ∈ Ω \ N the trajectory X(ω) is α-Hölder continuous on D and therefore extends uniquely to an α-Hölder continuous function on [0, 1], which we denote by X̃(ω). Now we define the process X̃ whose trajectories are equal to X̃(ω) if ω ∈ Ω \ N and are identically zero on N. We prove that X̃ is a modification of X, that is, P(X_t = X̃_t) = 1 for every fixed t ∈ [0, 1]: this is obvious if t ∈ D. On the other hand, if t ∈ [0, 1] \ D, we consider a sequence (t_n)_{n∈N} in D that approximates t. We have already observed that, by (3.4.2), X_{t_n} converges to X_t in probability and thus also pointwise a.s., up to a subsequence: since X_{t_n} = X̃_{t_n} a.s., then also X_t = X̃_t a.s., and this concludes the proof.

3.5 Key Ideas to Remember

We provide a summary of the chapter’s major findings and the essential concepts to retain from a first reading, leaving aside technical or secondary details. As usual, if you have any doubt about what the following succinct statements mean, please review the corresponding section.
• Sections 3.1 and 3.2: a continuous stochastic process X can be regarded as a random variable with values in the Polish metric space of continuous trajectories, (C(I), Bϱmax). The law of X is therefore a distribution on the Borel σ-algebra Bϱmax.
• Section 3.3: Kolmogorov’s continuity theorem provides a condition on the law of a process so that it admits a modification with locally Hölder continuous trajectories. This is the case of the Gaussian transition law of Example 3.3.2 but not of the Poisson transition law of Example 3.3.3.
• Section 3.4: the first two steps of the proof of Kolmogorov’s continuity theorem
are based on Markov’s inequality and Borel-Cantelli’s lemma: they contain the
key ideas of the proof of this deep and fundamental result.
Main notations used or introduced in this chapter:

Symbol        Description                                                  Page
C(I)          Continuous functions on the interval I                       59
F^I           σ-algebra on R^I generated by finite-dimensional cylinders    3
ϱmax          Uniform distance on C(I)                                     61
Bϱmax         Borel σ-algebra on C(I)                                      61
Chapter 4
Brownian Motion

In this section we will define Brownian motion and construct it.


This event, like the birth of a child, is messy and painful, but
after a while we will be able to have fun with our new arrival.
Richard Durrett

Brownian motion stands out as one of the paramount stochastic processes. It owes
its name to the botanist Robert Brown, who, circa 1820, documented the erratic
motion exhibited by pollen grains suspended within a solution. This phenomenon,
characterized by the seemingly random movement of particles due to collisions with
surrounding molecules, has since found widespread applications in various fields,
ranging from physics and chemistry to finance and biology. Brownian motion was
used by Louis Bachelier in 1900 in his doctoral thesis as a model for the price of
stocks and was studied by Albert Einstein in one of his famous papers in 1905.
The first rigorous mathematical definition of a Brownian motion is due to Norbert
Wiener in 1923.

4.1 Definition

Definition 4.1.1 (Brownian Motion [!!!]) Let W = (Wt )t≥0 be a real stochastic
process defined on a filtered probability space (Ω, F , P , Ft ). We say that W is a
Brownian motion if it satisfies the following properties:
(i) W0 = 0 a.s.;
(ii) W is a.s. continuous;
(iii) W is adapted to (Ft )t≥0 , i.e., Wt ∈ mFt for every t ≥ 0;
(iv) Wt − Ws is independent of Fs for every t ≥ s ≥ 0;
(v) Wt − Ws ∼ N0,t−s for every t ≥ s ≥ 0.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 71


A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_4

Fig. 4.1 A trajectory of a Brownian motion

Fig. 4.2 1000 trajectories of a Brownian motion and histogram of its sample distribution at time
t =1
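Pictures like these are easy to reproduce by simulation. The following Python sketch (illustrative, not part of the text) samples discretized trajectories using the defining properties (iv)–(v): increments over disjoint intervals are independent with distribution N_{0,Δt}.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, T = 1000, 500, 1.0
dt = T / n_steps

# properties (iv)-(v): increments over disjoint intervals are independent N(0, dt)
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.hstack([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)])  # W_0 = 0

# sample distribution at time t = 1 should be close to N(0, 1), as in Fig. 4.2
mean_W1, var_W1 = W[:, -1].mean(), W[:, -1].var()
```

Plotting the rows of `W` against the time grid reproduces pictures of the kind shown in Figs. 4.1 and 4.2.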

Remark 4.1.2 Let us briefly comment on the properties of Definition 4.1.1: by (i)
a Brownian motion starts from the origin, just as a convention. Property (ii) ensures
that almost all trajectories of W are continuous. Moreover, W is adapted to the
filtration (Ft )t≥0 : this means that, at any fixed time t, the information in Ft is
sufficient to observe the entire trajectory of W up to time t. Properties (iv) and
(v) are less intuitive but can be justified by some notable features, observable at
the statistical level, of random motions: we call (iv) and (v) the independence and
stationarity properties of the increments, respectively (cf. Definition 2.3.1). Notice
that Wt −Ws is equal in law to Wt−s . Figures 4.1 and 4.2 show the plot of trajectories
of a Brownian motion.
Remark 4.1.3 In Definition 4.1.1 the filtration (Ft ) is not necessarily the one
generated by W : the latter was denoted by (GtW )t≥0 in Definition 1.4.3. Clearly,
property (iii) of a Brownian motion implies that GtW ⊆ Ft for every t ≥ 0. We

will see in Sect. 6.2 that it is generally preferable to work with filtrations strictly
larger than G W in order to satisfy appropriate technical assumptions, including, for
example, completeness.
We give a useful characterization of a Brownian motion.
Proposition 4.1.4 ([!]) An a.s. continuous stochastic process W = (Wt )t≥0 is a
Brownian motion with respect to its own filtration (GtW )t≥0 if and only if it is a
Gaussian process with zero mean function, E [Wt ] = 0, and covariance function
cov(Ws , Wt ) = s ∧ t.
Proof Let W be a Brownian motion on (Ω, F, P, (G^W_t)_{t≥0}). For each 0 = t_0 < t_1 < · · · < t_n, the random variables Z_k := W_{t_k} − W_{t_{k−1}} have normal distribution; moreover, by properties (iii) and (iv) of a Brownian motion, Z_k is independent of G^W_{t_{k−1}} and therefore of Z_1, . . . , Z_{k−1} ∈ mG^W_{t_{k−1}}. This proves that (Z_1, . . . , Z_n) is a multi-normal vector with independent components. Also (W_{t_1}, . . . , W_{t_n}) is multi-normal because it is obtained from (Z_1, . . . , Z_n) by the linear transformation

    W_{t_h} = Σ_{k=1}^{h} Z_k,   h = 1, . . . , n,

and this proves that W is a Gaussian process. We also observe that, assuming s < t, we have

    cov(W_s, W_t) = cov(W_s, W_t − W_s + W_s) = cov(W_s, W_t − W_s) + var(W_s) = s

by the independence of W_s and W_t − W_s: this proves that cov(W_s, W_t) = s ∧ t.


Conversely, let W be a Gaussian process with zero mean function and covariance function cov(W_s, W_t) = s ∧ t. Since E[W_0] = var(W_0) = 0, we have W_0 = 0 a.s. Properties (ii) and (iii) of the definition of a Brownian motion are obvious. To prove (v), it is enough to observe that, if s < t, we have

    var(W_t − W_s) = var(W_t) + var(W_s) − 2cov(W_t, W_s) = t + s − 2(s ∧ t) = t − s.

Finally, given τ ≤ s < t, the vector (W_t − W_s, W_τ) has a normal distribution because it is a linear combination of (W_τ, W_s, W_t) and

    cov(W_t − W_s, W_τ) = cov(W_t, W_τ) − cov(W_s, W_τ) = τ − τ = 0.

Consequently, W_t − W_s and W_τ are independent: since W is a Gaussian process, it follows that W_t − W_s is independent of (W_{τ_1}, . . . , W_{τ_n}) for every τ_1, . . . , τ_n ≤ s. Then, by Lemma 2.3.20 in [113], W_t − W_s is independent of G^W_s and this proves the validity of property (iv). ⨆
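Proposition 4.1.4 also suggests a direct way to sample the finite-dimensional marginals of W: on a time grid, (W_{t_1}, . . . , W_{t_n}) is a centered multinormal vector with covariance matrix (t_i ∧ t_j)_{i,j}. A minimal Python sketch (ours, for illustration) samples it via a Cholesky factorization and checks the empirical covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
times = np.array([0.25, 0.5, 0.75, 1.0])

# covariance function cov(W_s, W_t) = s ∧ t restricted to the grid
C = np.minimum.outer(times, times)

# sample the centered multinormal vector via the Cholesky factor of C
L = np.linalg.cholesky(C)
samples = rng.standard_normal((50_000, len(times))) @ L.T

emp_cov = np.cov(samples, rowvar=False)   # should approximate C
```

The same idea underlies the Kolmogorov-extension construction of W used in Theorem 4.1.6 below.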


Remark 4.1.5 ([!]) Proposition 4.1.4 states that the finite-dimensional distributions of a Brownian motion are uniquely determined: hence the Brownian motion is unique in law. Notice that, if W is a Brownian motion, then the process W̃_t := √t W_1 has the same one-dimensional distributions as W but is obviously not a Brownian motion.
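This remark can be checked numerically: W̃_t := √t W_1 has the right one-dimensional marginals but the wrong covariance, cov(W̃_s, W̃_t) = √(st) instead of s ∧ t. A quick Monte Carlo sketch (ours, not from the book):

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.standard_normal(200_000)
s, t = 0.25, 1.0

Ws_tilde, Wt_tilde = np.sqrt(s) * W1, np.sqrt(t) * W1

var_s = Ws_tilde.var()                     # matches the Brownian marginal: s
cov_st = np.cov(Ws_tilde, Wt_tilde)[0, 1]  # equals sqrt(s*t) = 0.5, not s ∧ t = 0.25
```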
There are numerous proofs of the existence of a Brownian motion: some of them
can be found, for example, in the monographs by Schilling [129] and Bass [9]. Here
we see the result as a corollary of Kolmogorov’s extension and continuity theorems.
Theorem 4.1.6 A Brownian motion exists.
Proof The main step is the construction of a Brownian motion on the bounded time interval [0, 1]. By Kolmogorov’s extension theorem (in particular, by Corollary 1.3.6) there exists a Gaussian process W^{(0)} = (W^{(0)}_t)_{t∈[0,1]} with zero mean function and covariance function cov(W^{(0)}_s, W^{(0)}_t) = s ∧ t. By Kolmogorov’s continuity theorem and Example 3.3.2, W^{(0)} admits a continuous modification that, by Proposition 4.1.4, satisfies the properties of a Brownian motion on [0, 1].
Now take a sequence (W^{(n)})_{n∈N} of independent copies of W^{(0)}. We “glue” these processes together by defining W_t = W^{(0)}_t for t ∈ [0, 1] and

    W_t = Σ_{k=0}^{[t]−1} W^{(k)}_1 + W^{([t])}_{t−[t]},   t > 1,

where [t] denotes the integer part of t. Then it is easy to prove that W is a Brownian motion. ⨆

Remark 4.1.7 As seen in Example 3.3.2, a Brownian motion admits a modification with trajectories that are not only continuous but also locally α-Hölder continuous for every α < 1/2. The exponent is strictly less than 1/2, and this result cannot be improved: for more details, we refer, for example, to Chapter 7 in [9]. A classic result, the Law of the iterated logarithm, precisely describes the asymptotic behavior of Brownian increments:

    lim sup_{t→0+} |W_t| / √(2t log log(1/t)) = 1   a.s.

Consequently, the trajectories of a Brownian motion are almost surely not differentiable at any point: precisely, there exists N ∈ F, with P(N) = 0, such that for every ω ∈ Ω \ N, the function t |→ W_t(ω) is not differentiable at any point in [0, +∞[.

4.2 Markov and Feller Properties

Let W = (W_t)_{t≥0} be a Brownian motion on (Ω, F, P, F_t). Given t ≥ 0 and x ∈ R, we set

    W^{t,x}_T := W_T − W_t + x,   T ≥ t.

Definition 4.2.1 The process W^{t,x} = (W^{t,x}_T)_{T≥t} is called Brownian motion with initial point x at time t and has the following properties:
(i) W^{t,x}_t = x;
(ii) the trajectories T |→ W^{t,x}_T are a.s. continuous;
(iii) W^{t,x}_T ∈ mF_T for every T ≥ t;
(iv) W^{t,x}_T − W^{t,x}_s = W_T − W_s is independent of F_s for every T ≥ s ≥ t;
(v) W^{t,x}_T − W^{t,x}_s ∼ N_{0,T−s} for every T ≥ s ≥ t.
Remark 4.2.2 The process W^{t,x} is also a Brownian motion with respect to its generated filtration, defined by

    G^{t,x}_T := σ(W^{t,x}_s, s ∈ [t, T]),   T ≥ t.

Note that G^{t,x}_T ⊆ F_T and there is a strict inclusion G^{t,x}_t = {∅, Ω} ⊂ F_t if t > 0.
By Proposition 2.3.2, we have
Theorem 4.2.3 (Markov Property [!]) Let .W = (Wt )t≥0 be a Brownian motion
on .(Ω, F , P , Ft ). Then W is a Markov process with Gaussian transition density

    𝚪(t, x; T, y) = (1/√(2π(T − t))) e^{−(x−y)²/(2(T−t))},   0 ≤ t < T, x, y ∈ R.   (4.2.1)

Consequently, for every .ϕ ∈ bB, we have

u(t, Wt ) = E [ϕ(WT ) | Ft ]
.

where
    u(t, x) := ∫_R 𝚪(t, x; T, y)ϕ(y)dy.   (4.2.2)
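Formula (4.2.2) can be evaluated numerically. The sketch below (illustrative; the truncated grid and the choice ϕ = cos are ours) approximates the integral by a Riemann sum and compares it with the closed form u(t, x) = cos(x) e^{−(T−t)/2}, which follows from E[cos(x + √(T−t) Z)] with Z ∼ N_{0,1}.

```python
import numpy as np

def Gamma(t, x, T, y):
    # Gaussian transition density (4.2.1)
    return np.exp(-(x - y)**2 / (2.0*(T - t))) / np.sqrt(2.0*np.pi*(T - t))

def u(t, x, T, phi):
    # u(t,x) = ∫_R Γ(t,x;T,y) φ(y) dy, truncated to [-10, 10]
    grid = np.linspace(-10.0, 10.0, 4001)
    h = grid[1] - grid[0]
    return (Gamma(t, x, T, grid) * phi(grid)).sum() * h

t, x, T = 0.3, 0.7, 1.0
approx = u(t, x, T, np.cos)
exact = np.cos(x) * np.exp(-(T - t) / 2.0)
```

Because the integrand decays like a Gaussian, the truncation and discretization errors are far below the tolerance used here.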

We have proven in Example 2.4.6 the following


Proposition 4.2.4 (Feller Property) A Brownian motion satisfies the strong Feller
property.

Remark 4.2.5 The function u in (4.2.2) belongs to C^∞([0, T[×R); moreover, if ϕ ∈ bC(R), proceeding as in Example 3.1.3 in [113], we get

    lim_{(t,x)→(T,y), t<T} u(t, x) = ϕ(y)

so that u ∈ C([0, T] × R) and u(T, ·) ≡ ϕ. Thus, u is a classical solution (cf. Definition 18.2.5) of the backward Cauchy problem

    ∂_t u(t, x) + (1/2)∂_{xx} u(t, x) = 0,   t ∈ [0, T[, x ∈ R,
    u(T, x) = ϕ(x),   x ∈ R.

This is in agreement with Example 2.5.9, A_t = (1/2)∂_{xx} being the characteristic operator of the Gaussian transition distribution. Note that the hypothesis ϕ ∈ bC(R) is only¹ used to prove the continuity of u(t, x) up to t = T.

4.3 Wiener Space

By Proposition 4.1.4, a Brownian motion has finite-dimensional Gaussian distribu-


tions. More precisely, by Proposition 2.4.1 (in particular, by formula (2.4.2)) we
have the following
Theorem 4.3.1 (Finite-Dimensional Densities) Let .W = (Wt )t≥0 be a real
Brownian motion. For every .0 < t1 < · · · < tn , the vector .(Wt1 , . . . , Wtn ) is
absolutely continuous with density

    γ_{(W_{t_1},...,W_{t_n})}(x_1, . . . , x_n) = 𝚪(0, 0; t_1, x_1)𝚪(t_1, x_1; t_2, x_2) · · · 𝚪(t_{n−1}, x_{n−1}; t_n, x_n)

with 𝚪 as in (4.2.1). The law² of W is called Wiener measure.


Definition 4.3.2 (Wiener Space) The probability space .(C(R≥0 ), BμW , μW ),
where .μW is the Wiener measure and .BμW is the .μW -completion3 of the Borel
.σ -algebra, is called Wiener space.

Recall Definition 3.2.3, which defines the canonical version of an a.s. continuous
process. An immediate consequence of Proposition 4.1.4 is the following
Corollary 4.3.3 Given a Brownian motion W , its canonical version .W is a
Brownian motion on the Wiener space equipped with the filtration .G W generated
by .W.

¹ u ∈ C^∞([0, T[×R) for every ϕ ∈ bB.
² Definition 3.2.2.
³ Cf. Remark 1.4.3 in [113].

Given a Brownian motion W , we will later introduce (cf. Sect. 6.2.3) a filtration
larger than .G W so that some useful regularity properties hold.
Example 4.3.4 Let W be a real Brownian motion and 0 < t < T. We have the following expressions for the joint densities of W_t and W_T:

    γ_{(W_t,W_T)}(t, x; T, y) = γ_{(W_T,W_t)}(T, y; t, x) = (1/(2π√(t(T − t)))) e^{−(Tx² − 2txy + ty²)/(2t(T−t))}.

By Proposition 4.3.20 in [113] we also have the conditional densities

    γ_{W_T|W_t}(T, y; t, x) = γ_{(W_T,W_t)}(T, y; t, x)/γ_{W_t}(t, x) = 𝚪(t, x; T, y),

    γ_{W_t|W_T}(t, x; T, y) = γ_{(W_t,W_T)}(t, x; T, y)/γ_{W_T}(T, y) = (1/√(2πt(T−t)/T)) e^{−T(x − (t/T)y)²/(2t(T−t))}.

Thus, in accordance with Theorem 4.2.3, we have

    μ_{W_T|W_t} = N_{W_t, T−t}

and

    μ_{W_t|W_T} = N_{(t/T)W_T, t(T−t)/T}.
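These conditional laws can be checked by simulation: under the joint law, the best linear predictor of W_t given W_T has slope t/T and residual variance t(T−t)/T, matching the conditional law μ_{W_t|W_T} above. A Monte Carlo sketch (ours, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
t, T, n = 0.3, 1.0, 400_000

# joint sample of (W_t, W_T) from independent Gaussian increments
Wt = np.sqrt(t) * rng.standard_normal(n)
WT = Wt + np.sqrt(T - t) * rng.standard_normal(n)

slope = np.cov(Wt, WT)[0, 1] / WT.var()   # conditional-mean coefficient, ~ t/T
resid_var = (Wt - slope * WT).var()       # conditional variance, ~ t(T-t)/T
```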

4.4 Brownian Martingales

Let W be a Brownian motion on the filtered space (Ω, F, P, F_t).

Proposition 4.4.1 The following processes are martingales:
(i) the Brownian motion W;
(ii) the quadratic martingale X_t := W_t² − t;
(iii) the exponential martingale Y_t := e^{σW_t − σ²t/2}, for every σ ∈ C.

Proof By Hölder’s inequality, we have

    E[|W_t|] ≤ E[W_t²]^{1/2} = √t

and therefore W is an absolutely integrable process. Part (i) follows from Proposition 2.3.4, since W is a process with constant zero mean and independent increments. Parts (ii) and (iii) are proven similarly: for example, we have

    E[X_T | F_t] = E[(W_T − W_t + W_t)² | F_t] − T
                 = E[(W_T − W_t)² | F_t] + 2W_t E[W_T − W_t | F_t] + W_t² − T
                 = (T − t) + 0 + W_t² − T = W_t² − t. ⨆
We give a useful characterization of a Brownian motion in terms of exponential
martingales.
Proposition 4.4.2 ([!]) A continuous and adapted process W, defined on the space (Ω, F, P, F_t) and such that W_0 = 0 a.s., is a Brownian motion if and only if

    M^η_t := e^{iηW_t + η²t/2}

is a martingale for every η ∈ R.
Proof If W is a Brownian motion then M^η is a martingale by Proposition 4.4.1-(iii). Conversely, it is sufficient to verify that for 0 ≤ s ≤ t:
(i) W_t − W_s has normal distribution N_{0,t−s};
(ii) W_t − W_s is independent of F_s.
The martingale property of M^η is equivalent to

    E[e^{iη(W_t−W_s)} | F_s] = e^{−η²(t−s)/2},   η ∈ R.

Applying the expected value, we obtain the characteristic function of W_t − W_s:

    E[e^{iη(W_t−W_s)}] = e^{−η²(t−s)/2},   η ∈ R,

from which the thesis follows: in particular, the independence property follows from 14) of Theorem 4.2.10 in [113]. ⨆


The following version of Theorem 2.5.13 provides a general method for constructing a martingale by composing a Brownian motion W with a sufficiently regular function f = f(t, x). We also assume on f a growth condition of the type

    |f(t, x)| ≤ c_T e^{c_T |x|^α},   (t, x) ∈ [0, T] × R,   (4.4.1)

with c_T a positive constant dependent on T and α ∈ [0, 2[: this ensures the integrability of the process f(t, W_t) for t ∈ [0, T].

Theorem 4.4.3 ([!]) Let f = f(t, x) ∈ C^{1,2}(R_{≥0} × R) be a function that verifies, together with its first and second derivatives, the growth condition (4.4.1). Then the process

    M_t := f(t, W_t) − f(0, W_0) − ∫_0^t (∂_s f + (1/2)∂_{xx} f)(s, W_s)ds,   t ∈ [0, T],

is a martingale. In particular, if f solves the backward heat equation, then f(t, W_t) is a martingale.
Proof The proof is entirely analogous to that of Theorem 2.5.13. For each s > t and x ∈ R, we have

    ∂_s ∫_R 𝚪(t, x; s, y)f(s, y)dy = ∫_R ∂_s(𝚪(t, x; s, y)f(s, y))dy =

(since ∂_s 𝚪(t, x; s, y) = (1/2)∂_{yy} 𝚪(t, x; s, y))

    = ∫_R 𝚪(t, x; s, y)∂_s f(s, y)dy + ∫_R (1/2)∂_{yy} 𝚪(t, x; s, y)f(s, y)dy =

(integrating by parts in the second integral)

    = ∫_R 𝚪(t, x; s, y)(∂_s f + (1/2)∂_{yy} f)(s, y)dy.

Setting x = W_t in the previous formula, by the Markov property we have

    ∂_s E[f(s, W_s) | F_t] = E[(∂_s f + (1/2)∂_{xx} f)(s, W_s) | F_t].

Now we integrate in s between t and T to obtain

    E[f(T, W_T) | F_t] − f(t, W_t) = ∫_t^T E[(∂_s f + (1/2)∂_{xx} f)(s, W_s) | F_t]ds =

(exchanging the signs of integral and conditional expectation as in the proof of Theorem 2.5.13)

    = E[∫_t^T (∂_s f + (1/2)∂_{xx} f)(s, W_s)ds | F_t].

In conclusion, we have

    E[M_T − M_t | F_t] = E[f(T, W_T) − f(t, W_t) − ∫_t^T (∂_s f + (1/2)∂_{xx} f)(s, W_s)ds | F_t] = 0

and this concludes the proof. ⨆
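For instance, with f(t, x) = x³ we have ∂_t f + (1/2)∂_{xx} f = 3x, so Theorem 4.4.3 gives that M_t = W_t³ − 3∫_0^t W_s ds is a martingale; in particular E[M_t] = 0 for every t. A Monte Carlo sketch (ours; the time integral is discretized by a left-point Euler sum):

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps, T = 20_000, 200, 1.0
dt = T / n_steps

dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)])

# f(t,x) = x^3: the compensator is 3 ∫_0^t W_s ds (left-point Riemann sum)
integral_T = W[:, :-1].sum(axis=1) * dt
M_T = W[:, -1]**3 - 3.0 * integral_T

mean_M_T = M_T.mean()   # the martingale M starts at 0, so E[M_T] = 0
```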


4.5 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while disregarding the more technical
or secondary matters. As usual, if you have any doubt about what the following
succinct statements mean, please review the corresponding section.
• Section 4.1: a Brownian motion W is a continuous and adapted process, with
independent and stationary increments having normal distribution. It is charac-
terized by being a Gaussian process with zero mean function and covariance
function .cov(Ws , Wt ) = s ∧ t.
• Section 4.2: W is a Markov process with transition law equal to the law of .WTt,x .
Moreover, W is a strong Feller process.
• Section 4.3: the finite-dimensional densities of W are uniquely determined and
the law of W is called Wiener measure.
• Section 4.4: W is a martingale and other notable examples of martingales can
be constructed as functions of W : for instance, the quadratic and the exponential
martingales. The latter provides a characterization of the Brownian motion (cf.
Proposition 4.4.2). Theorem 4.4.3 shows how to “compensate” a function of W
to make it a martingale and indicates the connection with the heat equation that
will be further explored in the following chapters.

Main notations used or introduced in this chapter:

Symbol          Description                                        Page
G^W             Filtration generated by W                          14
W^{t,x}         Brownian motion with initial point x at time t     75
G^{t,x}         Filtration generated by W^{t,x}                    75
𝚪(t, x; T, y)   Gaussian transition density                        75
μ_W             Wiener measure                                     76
Chapter 5
Poisson Process

We are too small and the universe too large and too interrelated
for thoroughly deterministic thinking.
Don S. Lemons, [88]

The Poisson process, denoted as .(Nt )t≥0 , serves as the prototype of what are known
as “pure jump processes”. Intuitively, .Nt indicates the number of times within
the time interval .[0, t] that a specific event (referred to as an episode) occurs:
for example, if the single episode consists of the arrival of a spam email in a
mailbox, then .Nt represents the number of spam emails that arrive in the period
.[0, t]; similarly, .Nt can indicate the number of children born in some country or the

number of earthquakes that occur in some geographical area in the period .[0, t].

5.1 Definition

Referring to the general notation of Definition 1.1.3, we assume .I = R≥0 . To


construct the Poisson process, we consider a sequence .(τn )n∈N of independent and
identically distributed random variables1 with exponential distribution, .τn ∼ Expλ ,
with parameter .λ > 0, defined on a complete probability space .(Ω, F , P ): here .τn
represents the time that elapses between the .(n − 1)-th episode and the next one.
Then we define the sequence

T0 := 0,
. Tn := τ1 + · · · + τn , n ∈ N,

in which .Tn represents the instant at which the n-th episode occurs.

1 Such a sequence exists by Corollary 1.3.7.


Lemma 5.1.1 We have²

    T_n ∼ Gamma_{n,λ},   n ∈ N.   (5.1.1)

Moreover, almost surely³ the sequence (T_n)_{n≥0} is monotonically increasing and

    lim_{n→∞} T_n = +∞.   (5.1.2)

Proof Formula (5.1.1) follows from (2.6.7) in [113]. The monotonicity follows from the fact that τ_n ≥ 0 a.s. for every n ∈ N. Finally, (5.1.2) follows from Borel-Cantelli’s Lemma 1.3.28 in [113]: in fact, for every ε > 0, we have

    (lim_{n→∞} T_n = +∞) ⊇ ((τ_n > ε) i.o.) = ⋂_{n≥1} ⋃_{k≥n} (τ_k > ε)

and the events (τ_k > ε) are independent and such that

    Σ_{n≥1} P(τ_n > ε) = +∞. ⨆
Definition 5.1.2 (Poisson Process, I) The Poisson process (N_t)_{t≥0} with parameter λ > 0 is defined by

    N_t = Σ_{n=1}^{∞} n 1_{[T_n, T_{n+1}[}(t),   t ≥ 0.   (5.1.3)

By definition, .Nt takes non-negative integer values and precisely .Nt = n if and
only if t belongs to the interval with random endpoints .[Tn , Tn+1 [; hence we have
the equality of events

    (N_t = n) = (T_n ≤ t < T_{n+1}),   n ∈ N ∪ {0}.   (5.1.4)

At the random time .Tn , when the n-th episode occurs, the process makes a jump of
size 1: Fig. 5.1 shows the plot of a Poisson process trajectory in the time interval

² Thus T_n is absolutely continuous with density

    γ_{n,λ}(t) := λe^{−λt} (λt)^{n−1}/(n − 1)! 1_{R≥0}(t),   n ∈ N.

³ The set of ω ∈ Ω such that T_n(ω) ≤ T_{n+1}(ω) for every n ∈ N and lim_{n→∞} T_n(ω) = +∞ is a certain event.

Fig. 5.1 Plot of a Poisson process trajectory

[0, 10]. We recall that a trajectory of N is a function of the form .t |→ Nt (ω), defined
.

from .R≥0 to .N ∪ {0}, and each .ω ∈ Ω corresponds to a different trajectory.


In conclusion, the random value N_t is equal to the number of jumps (or the number of episodes) between 0 and t:

    N_t = ♯{n ∈ N | T_n ≤ t}.

We will later give a more general characterization of the Poisson process, in


Definition 5.2.3.
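The construction above translates directly into a simulation: draw the exponential interarrival times τ_n, accumulate them into the jump times T_n, and count how many fall below t. A Python sketch (ours, not from the book):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n_paths = 3.0, 50_000

# interarrival times tau_n ~ Exp(lam); jump times T_n = tau_1 + ... + tau_n
tau = rng.exponential(1.0 / lam, size=(n_paths, 40))   # 40 jumps are plenty for t <= 1
T_jump = np.cumsum(tau, axis=1)

# N_t = #{n | T_n <= t}, here evaluated at t = 1
N1 = (T_jump <= 1.0).sum(axis=1)

mean_N1, var_N1 = N1.mean(), N1.var()
p0 = (N1 == 0).mean()   # should approximate P(N_1 = 0) = e^{-lam}
```

The sample mean and variance both approximate λ, and P(N_1 = 0) approximates e^{−λ}, anticipating Proposition 5.1.3-(ii).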
Proposition 5.1.3 The Poisson process (N_t)_{t≥0} has the following properties:
(i) almost surely the trajectories are right-continuous and monotonically increasing. Moreover, for every t > 0, we have⁴

    P(lim_{s→t} N_s = N_t) = 1;   (5.1.5)

(ii) N_t ∼ Poisson_{λt}, that is

    P(N_t = n) = e^{−λt} (λt)^n/n!,   t ≥ 0, n ∈ N ∪ {0}.   (5.1.6)

⁴ In other words, every fixed t is almost surely (i.e., for almost all trajectories) a point of continuity for the Poisson process. This apparent paradox is explained by the fact that almost every trajectory has at most countably many discontinuities, since it is monotonically increasing, and such discontinuities are arranged on the entire interval [0, +∞[, which has the cardinality of the continuum. Thus, all trajectories are discontinuous but every single t is a point of discontinuity only for a negligible family of trajectories.

As a consequence, N_0 = 0 a.s. and we have

    E[N_t] = var(N_t) = λt.

In particular, the parameter λ, called intensity of the Poisson process, is equal to the expected number of jumps in the unit time interval [0, 1];
(iii) the characteristic function of N_t is given by

    ϕ_{N_t}(η) = e^{λt(e^{iη}−1)},   t ≥ 0, η ∈ R.   (5.1.7)

Proof
(i) Right-continuity and monotonicity follow from the definition. For every t > 0, let N_{t−} = lim_{s↗t} N_s and ΔN_t = N_t − N_{t−}. We note that ΔN_t ∈ {0, 1} a.s. and, for a fixed t > 0, the set of trajectories that are discontinuous at t is given by

    (ΔN_t = 1) = ⋃_{n=1}^{∞} (T_n = t)

which is a negligible event since the random variables T_n are absolutely continuous. This proves (5.1.5).
(ii) By (5.1.4) we have

    P(N_t = n) = P(T_n ≤ t < T_{n+1}) =

(since (t ≥ T_{n+1}) ⊆ (t ≥ T_n))

    = P(T_n ≤ t) − P(T_{n+1} ≤ t) =

(since T_n ∼ Gamma_{n,λ})

    = ∫_0^t λe^{−λs} (λs)^{n−1}/(n − 1)! ds − ∫_0^t λe^{−λs} (λs)^n/n! ds,

from which, integrating by parts the second integral, (5.1.6) follows.
(iii) It is a simple calculation: by (ii) we have

    E[e^{iηN_t}] = Σ_{n≥0} e^{−λt} (λt)^n/n! e^{iηn} = e^{−λt} Σ_{n≥0} (λte^{iη})^n/n! = e^{−λt} e^{λte^{iη}} = e^{λt(e^{iη}−1)},

which concludes the proof. ⨆


Remark 5.1.4 (Characteristic Exponent) The characteristic function of the Poisson process has an interesting property of homogeneity with respect to time: in fact, by (5.1.7) the CHF of N_t is of the form ϕ_{N_t}(η) = e^{tψ(η)} where

    ψ(η) = λ(e^{iη} − 1)   (5.1.8)

is a function that depends on η but not on t. Consequently, the function ψ determines the CHF of N_t for every t and for this reason is called characteristic exponent of the Poisson process.
Example 5.1.5 (Compound Poisson Process [!]) The Poisson process N is the
starting point for the construction of stochastic processes even more interesting and
useful in applications. The first generalization consists in making the size of the
jumps random, as opposed to N where they are all fixed equal to 1.
Consider a probability space on which a Poisson process N is defined and a
sequence .(Zn )n∈N of identically distributed real random variables. Suppose that
the family formed by .(Zn )n∈N and .(τn )n∈N (the exponential random variables that
define N) is a family of independent random variables: this construction is possible
thanks to Corollary 1.3.7. We set by convention .Z0 = 0 and define the compound
Poisson process in the following way:


    X_t = Σ_{n=0}^{N_t} Z_n,   t ≥ 0.

Note that the Poisson process is a particular case of X in which .Zn ≡ 1 for .n ∈ N.
In Fig. 5.2 two trajectories of the compound Poisson process with normal jumps and
different choices of the intensity parameter are represented.
Taking advantage of the independence assumption, it is easy to calculate the CHF
of .Xt : actually, it is a calculation already carried out in Exercise 2.5.4 in [113] where
we proved that

Fig. 5.2 On the left: plot of a trajectory of the compound Poisson process with .λ = 10 and
.Zn∼ N0,10−2 . On the right: plot of a trajectory of the compound Poisson process with .λ = 1000
and .Zn ∼ N0,10−2

    ϕ_{X_t}(η) = e^{tψ(η)},   ψ(η) = λ(ϕ_Z(η) − 1)

where .ϕZ (η) is the CHF of .Z1 . Also in this case, the CHF of .Xt is homogeneous in
time and .ψ is called the characteristic exponent of the compound Poisson process.
As a particular case, we find (5.1.8) for .Zn ∼ δ1 , that is, for unitary jumps as in the
Poisson process.
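The formula for the characteristic exponent can be verified by simulation. The sketch below (ours, not from the book) draws N_t from Poisson_{λt}, sums N_t independent normal jumps, and compares the empirical CHF of X_t with e^{tλ(ϕ_Z(η)−1)}:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, t, sigma, n_paths = 2.0, 1.0, 0.1, 50_000

# X_t = Z_1 + ... + Z_{N_t} with N_t ~ Poisson(lam*t) and Z_n ~ N(0, sigma^2)
N = rng.poisson(lam * t, size=n_paths)
X = np.array([rng.normal(0.0, sigma, k).sum() for k in N])

eta = 2.0
emp_chf = np.exp(1j * eta * X).mean()
phi_Z = np.exp(-eta**2 * sigma**2 / 2)       # CHF of a N(0, sigma^2) jump
theo_chf = np.exp(t * lam * (phi_Z - 1.0))   # e^{t psi(eta)}
```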

5.2 Markov and Feller Properties

The following theorem provides two crucial properties of the increments .Nt − Ns
of the Poisson process. As usual (cf. (1.4.1)), .G N = (GtN )t≥0 denotes the filtration
generated by N .
Theorem 5.2.1 ([!]) For every .0 ≤ s < t we have:
(i) .Nt − Ns ∼ Poissonλ(t−s) ;
(ii) .Nt − Ns is independent of .GsN .
Property (i) implies that the r.v. .Nt − Ns and .Nt−s are equal in law and for this
reason, we say that N has stationary increments. Property (ii) states that N is a
process with independent increments according to Definition 2.3.1.
The proof of Theorem 5.2.1 is postponed to Sect. 5.4.
Definition 5.2.2 (Càdlàg Function) We say that a function f , from a real interval
I to .R, is càdlàg (from the French “continue à droite, limite à gauche”) if at every
point it is continuous from the right and has a finite limit from the left.5
The definition of Poisson process can be generalized as follows.
Definition 5.2.3 (Poisson Process, II) A Poisson process with intensity .λ > 0,
defined on a filtered probability space .(Ω, F , P , Ft ), is a stochastic process
.(Nt )t≥0 such that:

(i) .N0 = 0 a.s.;


(ii) N is a.s. càdlàg;
(iii) N is adapted to .(Ft )t≥0 , i.e., .Nt ∈ mFt for every .t ≥ 0;
(iv) .Nt − Ns is independent of .Fs for .s < t;

(v) .Nt − Ns ∼ Poissonλ(t−s) for .s < t.

By Theorem 5.2.1, the process N defined in (5.1.3) is a Poisson process accord-


ing to Definition 5.2.3 with respect to the filtration .G N generated by N . Conversely,
it can be shown that if N is a Poisson process according to Definition 5.2.3 then the

⁵ If I = [a, b], at the endpoints we assume by definition that lim_{x↘a} f(x) = f(a) and the limit lim_{x↗b} f(x) exists and is finite.

r.v. T_n, defined recursively by

    T_1 = inf{t ≥ 0 | ΔN_t = 1},   T_{n+1} := inf{t > T_n | ΔN_t = 1},

are independent and with distribution Exp_λ: for more details see, for example, Chapter 5 in [9]. Note that in Definition 5.2.3 the filtration is not necessarily the one generated by the process.
Theorem 5.2.4 (Markov Property [!]) The Poisson process N is a Markov and Feller process with transition law

    p(t, x; T, ·) = Poisson_{x,λ(T−t)}

and characteristic operator defined by

    A_t ϕ(x) = λ(ϕ(x + 1) − ϕ(x)),   x ∈ R.

If ϕ ∈ bB and u is a solution of the backward Cauchy problem

    ∂_t u(t, x) + A_t u(t, x) = 0,   (t, x) ∈ [0, T[×R,
    u(T, x) = ϕ(x),   x ∈ R,

then

    u(t, N_t) = E[ϕ(N_T) | F_t].

Proof The thesis is an immediate consequence of Proposition 2.3.2 and the results of Sect. 2.5.2 for the backward Kolmogorov equation: see in particular Example 2.5.11. The Feller property was proven in Example 2.4.5. ⨆
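The role of the characteristic operator can be checked numerically: the identity d/dt E[ϕ(N_t)] = E[A_t ϕ(N_t)] holds, and both sides can be computed from the Poisson pmf (5.1.6). A sketch (ours; ϕ = cos is an arbitrary bounded test function):

```python
import math

lam = 2.0
phi = math.cos   # a bounded test function

def E_phi(t, kmax=80):
    # E[phi(N_t)] computed from the Poisson pmf (5.1.6)
    return sum(math.exp(-lam*t) * (lam*t)**k / math.factorial(k) * phi(k)
               for k in range(kmax))

def E_Aphi(t, kmax=80):
    # E[A phi(N_t)] with A phi(x) = lam*(phi(x+1) - phi(x))
    return sum(math.exp(-lam*t) * (lam*t)**k / math.factorial(k)
               * lam * (phi(k + 1) - phi(k)) for k in range(kmax))

t, h = 0.7, 1e-5
lhs = (E_phi(t + h) - E_phi(t - h)) / (2*h)   # d/dt E[phi(N_t)], central difference
rhs = E_Aphi(t)
```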

We give a useful characterization of the Poisson process.
Proposition 5.2.5 ([!]) Let N = (N_t)_{t≥0} be a stochastic process on the space (Ω, F, P, F_t), which satisfies properties (i), (ii) and (iii) of Definition 5.2.3. Then N is a Poisson process of parameter λ > 0 if and only if

    E[e^{iη(N_t−N_s)} | F_s] = e^{λ(e^{iη}−1)(t−s)},   0 ≤ s ≤ t, η ∈ R.   (5.2.1)

Proof If N is a Poisson process, then by the independence and stationarity of increments and (5.1.7), we have

    E[e^{iη(N_t−N_s)} | F_s] = E[e^{iη(N_t−N_s)}] = E[e^{iηN_{t−s}}] = e^{λ(e^{iη}−1)(t−s)}.

Conversely, if N satisfies (5.2.1) and properties (i), (ii) and (iii) of Definition 5.2.3, properties (iv) and (v) remain to be proven. Applying the expected value to (5.2.1), we get

    E[e^{iη(N_t−N_s)}] = e^{λ(e^{iη}−1)(t−s)},   0 ≤ s ≤ t, η ∈ R.

Then (v) is an obvious consequence of the fact that the characteristic function determines the distribution; property (iv) of independent increments follows from point 14) of Theorem 4.2.10 in [113]. ⨆

Remark 5.2.6 (Poisson Process with Stochastic Intensity) The characterization
given in Proposition 5.2.5 enables the definition of a broad range of processes, with
the Poisson process being just one specific example. In a space .(Ω, F , P , Ft )
consider a process .N = (Nt )t≥0 that satisfies properties (i), (ii) and (iii) of
Definition 5.2.3 and a non-negative valued process .λ = (λt )t≥0 such that for each
t ≥ 0,

    λ_t ∈ mF_0   and   ∫_0^t λ_s ds < ∞ a.s.

If

    E[e^{iη(N_t − N_s)} | F_s] = e^{(e^{iη} − 1) ∫_s^t λ_r dr}

for each .0 ≤ s ≤ t and .η ∈ R, then N is called Poisson process with


stochastic intensity .λ. For further insights into stochastic intensity processes and
their significant applications, refer to, for instance, [21].
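In the simplest instance of Remark 5.2.6 the intensity is a single F_0-measurable random variable, so that conditionally on λ the process is an ordinary Poisson process (a "mixed" Poisson process). The sketch below (the two-point law for λ is an arbitrary choice of ours) exhibits the resulting overdispersion, Var(N_t) = E[λ]t + Var(λ)t² > E[N_t]:

```python
import random

def mixed_poisson_count(t, rng):
    """Draw an intensity known at time 0, then count Exp(lam) arrivals in [0, t]."""
    lam = rng.choice([1.0, 3.0])  # F_0-measurable intensity (hypothetical two-point law)
    n, s = 0, 0.0
    while True:
        s += rng.expovariate(lam)
        if s > t:
            return n
        n += 1

rng = random.Random(1)
t = 5.0
xs = [mixed_poisson_count(t, rng) for _ in range(20000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# E[N_t] = E[lam]*t = 10 while Var(N_t) = E[lam]*t + Var(lam)*t^2 = 10 + 25 = 35
```

For a genuine Poisson process mean and variance would coincide; the excess variance is the signature of the random intensity.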

5.3 Martingale Properties

Consider a Poisson process .N = (Nt )t≥0 on the space .(Ω, F , P , Ft ). Note that N
is not a martingale since .E [Nt ] = λt is a strictly increasing function and therefore
the process is not constant in mean. However, being a process with independent
increments, from Proposition 2.3.4 we have the following
Proposition 5.3.1 (Compensated Poisson Process) The compensated Poisson
process, defined by

    Ñ_t := N_t − λt,   t ≥ 0,

is a martingale.
We explicitly observe that Ñ takes real values, unlike N which takes only integer
values: in Fig. 5.3 a trajectory of a compensated Poisson process is depicted.

Fig. 5.3 A trajectory of the compensated Poisson process

Remark 5.3.2 The fact that Ñ is a martingale also follows by applying Theorem 2.5.13 with ϕ(x) = x. More generally, Theorem 2.5.13 shows how it is possible
to “compensate” a process that is a function of .Nt in order to obtain a martingale.
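The martingale property of Ñ can be observed empirically: along simulated trajectories of N, the increments of Ñ_t = N_t − λt average to zero. A minimal sketch (names ours):

```python
import random

def counts_at(lam, s, t, rng):
    """(N_s, N_t) along one trajectory built from Exp(lam) waiting times."""
    u, ns, nt = 0.0, 0, 0
    while True:
        u += rng.expovariate(lam)
        if u > t:
            return ns, nt
        nt += 1
        if u <= s:
            ns += 1

rng = random.Random(2)
lam, s, t = 1.5, 1.0, 4.0
pairs = [counts_at(lam, s, t, rng) for _ in range(40000)]
# Ntilde_u = N_u - lam*u; the martingale property forces mean-zero increments
inc = [(nt - lam * t) - (ns - lam * s) for ns, nt in pairs]
mean_inc = sum(inc) / len(inc)
```

Without the compensation term λt the same average would be λ(t − s) > 0, consistent with the observation that N itself is not a martingale.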

5.4 Proof of Theorem 5.2.1

We prove that if N is a Poisson process then for every 0 ≤ s < t:

(i) N_t − N_s ∼ Poisson_{λ(t−s)};
(ii) N_t − N_s is independent of G_s^N.

We divide the proof into two steps.
First Step We prove that, given s > 0 and k ∈ N ∪ {0}, the process defined by

    N_h^{(s)} = N_{s+h} − N_s,   h ∈ R_{≥0},   (5.4.1)

is a Poisson process with respect to the conditional probability given the event
(N_s = k), i.e. N^{(s)} is a Poisson process on the space (Ω, F, P(· | N_s = k)).

To this end, we define the “translated” jumps

    T_0^{(s)} = 0,   T_n^{(s)} = T_{k+n} − s,   n ∈ N,

which, on the event A := (N_s = k) ≡ (T_k ≤ s < T_{k+1}), form an increasing
sequence a.s. (see Fig. 5.4). We observe that

    (N_h^{(s)} = n) ∩ A = (N_{s+h} = n + k) ∩ A = (T_{n+k} ≤ s + h < T_{n+k+1}) ∩ A
                        = (T_n^{(s)} ≤ h < T_{n+1}^{(s)}) ∩ A

Fig. 5.4 Jump times T_n and "translated" jump times T_n^{(s)}

that is, in accordance with the definition of the Poisson process in the form (5.1.4),
on the event A we have

    (N_h^{(s)} = n) = (T_n^{(s)} ≤ h < T_{n+1}^{(s)}),   n ∈ N ∪ {0}.

Thus, it is sufficient to verify that the times

    τ_1^{(s)} := T_{k+1} − s,
    τ_n^{(s)} := T_n^{(s)} − T_{n−1}^{(s)} ≡ τ_{k+n},   n ≥ 2,

form a sequence of random variables that, with respect to P(· | N_s = k), have
distribution Exp_λ and are independent: therefore, we need to prove that
    P( ∩_{j=1}^{J} (τ_j^{(s)} ∈ H_j) | N_s = k ) = ∏_{j=1}^{J} Exp_λ(H_j)   (5.4.2)

for every J ∈ N and H_1, . . . , H_J ∈ B(R_{≥0}). Formula (5.4.2) is equivalent to

    P( (N_s = k) ∩ (T_{k+1} − s ∈ H_1) ∩ ∩_{j=2}^{J} (τ_{k+j} ∈ H_j) )
        = P(N_s = k) ∏_{j=1}^{J} Exp_λ(H_j).   (5.4.3)

Taking advantage of the fact that (N_s = k) ∩ (T_{k+1} − s ∈ H_1) = (T_k ≤ s) ∩
(T_{k+1} − s ∈ H_1), that T_{k+1} = T_k + τ_{k+1} and that the random variables
T_k, τ_{k+1}, . . . , τ_{k+J} are independent under P, (5.4.3) reduces to

    P((T_k ≤ s) ∩ (T_k + τ_{k+1} − s ∈ H_1)) = P(N_s = k) Exp_λ(H_1).   (5.4.4)

Now it is sufficient to consider the case where H_1 is an interval, H_1 = [0, c]: since
T_k and τ_{k+1} are independent under P, the joint density is given by the product of

the marginals and, recalling Lemma 5.1.1, we have

    P((T_k ≤ s) ∩ (τ_{k+1} ∈ [s − T_k, c + s − T_k]))

        = ∫_0^s ( ∫_{s−x}^{c+s−x} λ e^{−λy} dy ) Gamma_{k,λ}(dx)

        = ∫_0^s e^{−λ(c+s−x)} (e^{λc} − 1) Gamma_{k,λ}(dx)

        = (sλ)^k / k! · e^{−λ(c+s)} (e^{λc} − 1) = Poisson_{λs}({k}) Exp_λ([0, c])

which proves (5.4.4) with H_1 = [0, c].
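Identity (5.4.4) with H_1 = [0, c] lends itself to a direct Monte Carlo check: the left-hand side is the probability that N_s = k and the first jump after s occurs within time c, and it should factor as P(N_s = k) Exp_λ([0, c]). A sketch (all parameter values are arbitrary choices of ours):

```python
import math
import random

rng = random.Random(3)
lam, s, k, c = 1.0, 2.0, 2, 0.5
trials, hits, joint = 200000, 0, 0
for _ in range(trials):
    T, n = 0.0, 0
    while T <= s:
        T += rng.expovariate(lam)  # next jump time T_n
        n += 1
    # here T = T_n is the first jump time after s, hence N_s = n - 1
    if n - 1 == k:
        hits += 1
        if T - s <= c:  # event (T_{k+1} - s in [0, c])
            joint += 1
lhs = joint / trials                                # P(N_s = k, T_{k+1} - s <= c)
rhs = (hits / trials) * (1 - math.exp(-lam * c))    # P(N_s = k) * Exp_lam([0, c])
```

The factorization reflects the memoryless property of the exponential waiting times, which is exactly what the computation above exploits.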
Second Step By the first step, N_t − N_s is a Poisson process conditionally on
(N_s = k) and therefore we have

    P(N_t − N_s = n | N_s = k) = Poisson_{λ(t−s)}({n})   (5.4.5)

for every s < t and n, k ∈ N ∪ {0}. By the law of total probability, we have

    P(N_t − N_s = n) = Σ_{k≥0} P(N_t − N_s = n | N_s = k) P(N_s = k) =

(by (5.4.5))

        = Σ_{k≥0} Poisson_{λ(t−s)}({n}) P(N_s = k) = Poisson_{λ(t−s)}({n}),   (5.4.6)

and this proves property (i). Moreover, as a consequence of (5.4.6), formula (5.4.5)
is equivalent to

    P((N_t − N_s = n) ∩ (N_s = k)) = P(N_s = k) P(N_t − N_s = n)

which proves that the consecutive increments .Nt − Ns and .Ns = Ns − N0 are
independent under P .
More generally, we verify that .Nt − Nr and .Nr − Ns , with .0 ≤ s < r < t, are
independent under P . Recalling the notation (5.4.1), we have

    P((N_t − N_r = n) ∩ (N_r − N_s = k)) = P((N_{t−s}^{(s)} − N_{r−s}^{(s)} = n) ∩ (N_{r−s}^{(s)} = k)) =

(by the law of total probability)

        = Σ_{j≥0} P((N_{t−s}^{(s)} − N_{r−s}^{(s)} = n) ∩ (N_{r−s}^{(s)} = k) | N_s = j) P(N_s = j) =

(here we use the fact that N^{(s)} is a Poisson process conditionally on (N_s = j) and
therefore, as just proved, the increments N_{t−s}^{(s)} − N_{r−s}^{(s)} and N_{r−s}^{(s)} are independent
under P(· | N_s = j); moreover, N_{r−s}^{(s)} = N_r − N_s and N_s are independent under P
and therefore P(N_{r−s}^{(s)} = k | N_s = j) = P(N_{r−s}^{(s)} = k))

        = Σ_{j≥0} P(N_{t−s}^{(s)} − N_{r−s}^{(s)} = n | N_s = j) P(N_{r−s}^{(s)} = k) P(N_s = j)
        = P(N_{t−s}^{(s)} − N_{r−s}^{(s)} = n) P(N_{r−s}^{(s)} = k)
        = P(N_t − N_r = n) P(N_r − N_s = k).

Thus, we have proved that, for .0 ≤ s < r < t, the increment .Nt −Nr is independent
of .X := Nr and .Y := Nr − Ns : consequently, .Nt − Nr is also independent of
.Ns = X − Y and this proves property (ii). ⨆

5.5 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, leaving out the more technical or less
crucial details. As usual, if you have any doubt about what the following succinct
statements mean, please review the corresponding section.
. Section 5.1: the Poisson process N is the prototype of jump processes. Some-
times called a “counting process”, .Nt indicates the number of times in the
interval .[0, t] in which an episode occurs. The discontinuities of N are jumps of
unit size; in various applications, the compound Poisson process is used, which
has jumps whose size is random. The CHF of a (compound) Poisson process
is homogeneous in time and can be expressed in explicit form in terms of the
characteristic exponent.
. Section 5.2: N is a process with independent increments and enjoys the Markov
and Feller properties.
. Section 5.3: the compensated process .N ~t = Nt − λt is a martingale.
. Section 5.4: from the constructive definition of the Poisson process given in
Sect. 5.2, one can deduce some remarkable properties, namely the fact that
N_t − N_s ∼ Poisson_{λ(t−s)} and N_t − N_s is independent of G_s^N (cf. Theorem 5.2.1);
however, this requires some work and the proof can be skipped at a first
reading.

Main notations used or introduced in this chapter:

Symbol               Description                                      Page
N = (N_t)_{t≥0}      Poisson process                                    84
τ_n                  Time elapsed between two jumps (or episodes)       84
T_n                  n-th jump time                                     84
G^N                  Filtration generated by N                          88
Ñ_t = N_t − λt       Compensated Poisson process                        90
Chapter 6
Stopping Times

Passion glows within your heart
Like a furnace burning bright
Until you struggle through the dark
You'll never know the joy in life
                         Dream Theater, Illumination Theory

Stopping times are a fundamental tool in the study of stochastic processes: they
are particular random times that satisfy a consistency property with respect to the
assigned filtration of information. The concept of stopping time is at the basis of
some deep results on the structure of martingales: the optional sampling theorem,
the maximal inequalities, and the upcrossing lemma. The inherent challenges in
establishing these results become apparent even within the discrete framework. To
move to continuous time, it will be necessary to introduce further assumptions on
filtrations, the so-called usual conditions. The second part of the chapter collects
some technical results: it shows how to extend the filtrations of Markov processes
and other important classes of stochastic processes, in order to guarantee the usual
conditions while maintaining the properties of the processes.

6.1 The Discrete Case

In this section, we consider the case of a finite number of time instants, within a
filtered probability space .(Ω, F , P , (Fn )n=0,1,...,N ) with .N ∈ N.
Definition 6.1.1 (Discrete Stopping Time) A discrete stopping time is a random
variable

τ : Ω −→ {0, 1, . . . , N, ∞}
.

such that

.(τ = n) ∈ Fn , n = 0, . . . , N. (6.1.1)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 97


A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_6

We employ the symbol .∞ to represent a constant number that is not part of the
set .{0, 1, . . . , N } of the specified time instances: the reason for using such a symbol
will be clearer later, e.g. in Example 6.1.3. We assume .N < ∞ so that

(τ ≥ n) := (τ = n) ∪ · · · ∪ (τ = N ) ∪ (τ = ∞)
.

for every .n = 0, . . . , N .
Remark 6.1.2 Note that:
(i) condition (6.1.1) is equivalent to

(τ ≤ n) ∈ Fn ,
. n = 0, 1, . . . , N ;

(ii) we have

(τ ≥ n + 1) = (τ ≤ n)c ∈ Fn ,
. n = 0, . . . , N, (6.1.2)

and in particular .(τ = ∞) ∈ FN ;


(iii) if .τ, σ are stopping times then .τ ∧ σ and .τ ∨ σ are stopping times because

(τ ∧ σ ≤ n) = (τ ≤ n) ∪ (σ ≤ n),
.

(τ ∨ σ ≤ n) = (τ ≤ n) ∩ (σ ≤ n), n = 0, . . . , N ;

(iv) constant times are stopping times: precisely, if .τ ≡ k for some .k ∈


{0, . . . , N, ∞}, then .τ is a stopping time.
Example 6.1.3 (Exit Time [!]) Given an adapted real-valued process .X =
(Xn )n=0,1,...,N and .H ∈ B, we set

.J (ω) = {n | Xn (ω) ∈
/ H }, ω ∈ Ω.

The first exit time of X from H is defined as

    τ(ω) = min J(ω)  if J(ω) ≠ ∅,    τ(ω) = ∞  otherwise.

From now on, we adopt the convention .min ∅ = ∞ and therefore write more
concisely

    τ = min{n | X_n ∉ H}.

It is easy to see that τ is a stopping time: in fact, (τ = 0) = (X_0 ∉ H) ∈ F_0 and
we have

    (τ = n) = (X_0 ∈ H) ∩ · · · ∩ (X_{n−1} ∈ H) ∩ (X_n ∉ H) ∈ F_n,   n = 1, . . . , N.

A simple example of a random time that is not a stopping time is the last exit time of
X from H:

    τ̄(ω) = max J(ω)  if J(ω) ≠ ∅,    τ̄(ω) = ∞  otherwise.
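The difference between the two random times can be made concrete in code: the first exit time is computable from the observations X_0, . . . , X_n alone, so altering the path strictly after τ does not change τ, whereas the last exit time requires the whole trajectory. A minimal sketch (names ours):

```python
import random

def first_exit(path, H):
    """tau = min{n : X_n not in H}; decidable from X_0, ..., X_n alone."""
    for n, x in enumerate(path):
        if x not in H:
            return n
    return float('inf')  # convention: min of the empty set is infinity

def last_exit(path, H):
    """tau_bar = max{n : X_n not in H}; needs the whole path -> not a stopping time."""
    exits = [n for n, x in enumerate(path) if x not in H]
    return exits[-1] if exits else float('inf')

# deterministic check on a short path staying in H = {0, 1, 2}
H = {0, 1, 2}
assert first_exit([0, 1, 2, 3, 2, 3], H) == 3
assert last_exit([0, 1, 2, 3, 2, 3], H) == 5

# adaptedness: modifying the path strictly after tau leaves tau unchanged
rng = random.Random(4)
path = [0]
for _ in range(30):
    path.append(path[-1] + rng.choice([-1, 1]))
tau = first_exit(path, set(range(-2, 3)))
if tau != float('inf'):
    altered = path[: tau + 1] + [99] * (len(path) - tau - 1)
    assert first_exit(altered, set(range(-2, 3))) == tau
```

The same alteration test fails for `last_exit`, mirroring the fact that (τ̄ = n) is not F_n-measurable.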

Notation 6.1.4 Given a discrete stopping time τ and a stochastic process X =
(X_n)_{n=0,1,...,N}, we set

    (X_τ)(ω) := X_{τ(ω)}(ω)  if τ(ω) ∈ {0, . . . , N},    (X_τ)(ω) := X_N(ω)  if τ(ω) = ∞,

that is, X_τ := X_{τ∧N}, and

    F_τ := {A ∈ F | A ∩ (τ = n) ∈ F_n for every n = 0, . . . , N}.   (6.1.3)

It is easy to prove that F_τ is a σ-algebra: in fact, for example, if A ∈ F_τ then
A^c ∩ (τ = n) = (τ = n) \ (A ∩ (τ = n)) ∈ F_n and therefore A^c ∈ F_τ. We note
that F_τ = {A ∈ F | A ∩ (τ ≤ n) ∈ F_n for every n = 0, . . . , N}. Moreover F_∞
(that is, F_τ with τ ≡ ∞) is equal to F.
The following proposition collects other useful properties of .Fτ .
Proposition 6.1.5 Given .τ, σ discrete stopping times, we have:
(i) if .τ ≡ k for some .k ∈ {0, . . . , N } then .Fτ = Fk ;
(ii) if .τ ≤ σ then .Fτ ⊆ Fσ ;
(iii) .(τ ≤ σ ) ∈ Fτ ∩ Fσ ≡ Fτ ∧σ ;

(iv) if .X = (Xn )n=0,...,N is a process adapted to the filtration then .Xτ ∈ mFτ .
Proof Part (i) follows from the fact that if τ ≡ k then

    A ∩ (τ = n) = A  if k = n,    A ∩ (τ = n) = ∅  if k ≠ n.

Regarding (ii), it is enough to observe that, given n ∈ {0, . . . , N}, if τ ≤ σ then
(σ = n) ⊆ (τ ≤ n) and consequently for every A ∈ F_τ we have

    A ∩ (σ = n) = (A ∩ (τ ≤ n)) ∩ (σ = n),

where A ∩ (τ ≤ n) ∈ F_n and (σ = n) ∈ F_n.

As for (iii), recalling (6.1.2) we have

. (τ ≤ σ ) ∩ (τ = n) = (σ ≥ n) ∩ (τ = n) ∈ Fn ,
(τ ≤ σ ) ∩ (σ = n) = (τ ≤ n) ∩ (σ = n) ∈ Fn ,

and therefore .(τ ≤ σ ) ∈ Fτ ∩ Fσ . Now, if .A ∈ Fτ ∩ Fσ we have

.A ∩ (τ ∧ σ ≤ n) = A ∩ ((τ ≤ n) ∪ (σ ≤ n))
= (A ∩ (τ ≤ n)) ∪ (A ∩ (σ ≤ n)) ∈ Fn , n = 0, . . . , N,

so that F_τ ∩ F_σ ⊆ F_{τ∧σ}. Conversely, if A ∈ F_{τ∧σ}, since (τ = n) ⊆ (τ ∧ σ = n),
we have

A ∩ (τ = n) = (A ∩ (τ ∧ σ = n)) ∩ (τ = n) ∈ Fn
.

which proves the opposite inclusion.


Finally, consider .H ∈ B: to prove that .(Xτ ∈ H ) ∈ Fτ it is enough to observe
that

. (Xτ ∈ H ) ∩ (τ = n) = (Xn ∈ H ) ∩ (τ = n) ∈ Fn , n = 0, . . . , N.

This proves (iv). ⨆



Definition 6.1.6 (Stopped Process) Given a process .X = (Xn )n=0,...,N and a
stopping time .τ , the stopped process .Xτ = (Xnτ )n=0,...,N is defined by

Xnτ = Xn∧τ ,
. n = 0, . . . , N.

Proposition 6.1.7
(i) If X is adapted, then .Xτ is adapted;
(ii) if X is a sub-martingale, then .Xτ is a sub-martingale as well.
Proof Part (i) follows from the fact that, for n = 0, . . . , N, we have1

    X_{τ∧n} = X_0 + Σ_{k=1}^{τ∧n} (X_k − X_{k−1})
            = X_0 + Σ_{k=1}^{n} (X_k − X_{k−1}) 1_{(k≤τ)}

and, by (6.1.2), .(k ≤ τ ) ∈ Fk−1 . Part (ii) follows by applying the conditional
expectation given .Fn−1 to the identity

    X_n^τ − X_{n−1}^τ = (X_n − X_{n−1}) 1_{(τ≥n)},   n = 1, . . . , N,

and remembering that .(τ ≥ n) ∈ Fn−1 . ⨆



1 With the convention Σ_{k=1}^{0} (· · ·) = 0.
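The discrete "stochastic integral" identity used in the proof of part (i) can be verified path by path on a simulated random walk. A sketch (the walk and the exit level are arbitrary choices of ours):

```python
import random

rng = random.Random(5)
for _ in range(500):
    # symmetric random walk X_0, ..., X_20 and exit time from (-3, 3)
    path = [0]
    for _ in range(20):
        path.append(path[-1] + rng.choice([-1, 1]))
    tau = next((n for n, x in enumerate(path) if abs(x) >= 3), len(path) - 1)
    for n in range(len(path)):
        # X_{tau ^ n} = X_0 + sum_{k=1}^{n} (X_k - X_{k-1}) 1_{(k <= tau)}
        rhs = path[0] + sum(path[k] - path[k - 1] for k in range(1, n + 1) if k <= tau)
        assert rhs == path[min(tau, n)]
ok = True
```

The indicator 1_{(k≤τ)} is F_{k−1}-measurable by (6.1.2), which is precisely why the stopped process inherits adaptedness and the (sub-)martingale property.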

From Proposition 6.1.7 it also follows that if X is a martingale (or a super-martingale)
then X^τ is a martingale (or a super-martingale) as well. We leave as an
exercise the proof of the following
Lemma 6.1.8 Let .X ∈ L1 (Ω, F , P ) and .Z ∈ L1 (Ω, G , P ), where .G is a sub-.σ -
algebra of .F . Then2 .Z ≤ E [X | G ] if and only if

E [Z1G ] ≤ E [X1G ]
. for every G ∈ G .

Proposition 6.1.9 Let .X = (Xn )n=0,1,...,N be an absolutely integrable and adapted


process on the filtered space .(Ω, F , P , (Fn )n=0,1,...,N ). The following properties
are equivalent:
(i) X is a sub-martingale;
(ii) for every pair of stopping times .σ, τ we have

Xτ ∧σ ≤ E [Xτ | Fσ ] ;
.

(iii) for every stopping time .τ0 , the stopped process .Xτ0 is a sub-martingale.
Proof [(i) ⇒ (ii)] Observe that

    X_τ = X_{τ∧σ} + Σ_{σ<k≤τ} (X_k − X_{k−1}) =   (6.1.4)

(recalling that, by Notation 6.1.4, X_τ = X_{τ∧N})

        = X_{τ∧σ} + Σ_{k=1}^{N} (X_k − X_{k−1}) 1_{(σ<k≤τ)}.

Now, by points (ii) and (iv) of Proposition 6.1.5, X_{τ∧σ} ∈ mF_{τ∧σ} ⊆ mF_σ and
therefore conditioning (6.1.4) on F_σ we have

    E[X_τ | F_σ] = X_{τ∧σ} + Σ_{k=1}^{N} E[(X_k − X_{k−1}) 1_{(σ<k≤τ)} | F_σ].

To conclude, it is sufficient to prove that E[(X_k − X_{k−1}) 1_{(σ<k≤τ)} | F_σ] ≥ 0 for
k = 1, . . . , N or equivalently, thanks to Lemma 6.1.8,

    E[X_{k−1} 1_{(σ<k≤τ)} 1_G] ≤ E[X_k 1_{(σ<k≤τ)} 1_G],   G ∈ F_σ, k = 1, . . . , N.
                                                                           (6.1.5)

2 Z ≤ E[X | G] means Z ≤ Y a.s. where Y = E[X | G].

Formula (6.1.5) follows from the sub-martingale property of X once observed that,
by definition of F_σ and by Remark 6.1.2-(ii), we have

    (σ < k ≤ τ) ∩ G = ((σ < k) ∩ G) ∩ (τ ≥ k),

where (σ < k) ∩ G ∈ F_{k−1} and (τ ≥ k) ∈ F_{k−1}.

[(ii) ⇒ (iii)] From point (ii) with τ = τ_0 ∧ n and σ = n − 1 we get

    X_{τ_0∧(n−1)} ≤ E[X_{τ_0∧n} | F_{n−1}],   n = 1, . . . , N,

which implies the sub-martingale property of X^{τ_0}.
[(iii) ⇒ (i)] The claim follows by choosing τ_0 ≡ ∞. ⨆

6.1.1 Optional Sampling, Maximal Inequalities, and Upcrossing Lemma

The following result is an immediate consequence of Proposition 6.1.9 (see also
Notation 6.1.4).
Theorem 6.1.10 (Optional Sampling Theorem [!!!]) Let X = (X_n)_{n=0,...,N} be a
sub-martingale on the space (Ω, F, P, (F_n)_{n=0,...,N}). If τ, σ are discrete stopping
times such that σ ≤ τ then

    X_σ ≤ E[X_τ | F_σ].   (6.1.6)

If X is a martingale (respectively, a super-martingale) then formula (6.1.6) becomes
an equality (respectively, the direction of the inequality is reversed).
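In the martingale case, (6.1.6) taken in expectation gives E[X_σ] = E[X_τ], which is easy to probe by simulation: for a symmetric random walk, take σ = (first hit of level ±3) ∧ N and τ ≡ N, so that both sampled means should be near E[X_0] = 0. A sketch (parameters ours):

```python
import random

rng = random.Random(6)
N, trials = 50, 20000
sum_sigma = sum_tau = 0.0
for _ in range(trials):
    path = [0]
    for _ in range(N):
        path.append(path[-1] + rng.choice([-1, 1]))
    sigma = next((n for n, x in enumerate(path) if abs(x) >= 3), N)  # sigma <= N = tau
    sum_sigma += path[sigma]
    sum_tau += path[N]
mean_sigma = sum_sigma / trials
mean_tau = sum_tau / trials
# optional sampling, martingale case: E[M_sigma] = E[M_tau] = E[M_0] = 0
```

Replacing the symmetric steps with biased ones turns the walk into a sub- or super-martingale, and the two means then separate in the direction predicted by (6.1.6).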
We now prove two important consequences of the optional sampling theorem:
• Doob's maximal inequalities, which provide an estimate of the maximum of
a martingale;
• the upcrossing lemma, which provides an estimate on the local behavior of a
martingale and in particular on "how many times it can oscillate around an
interval".
A fundamental characteristic of both results is to provide estimates that depend
only on the final value of the martingale and not on the number N of time instants
considered: this crucial fact will allow us to easily move from the discrete case to
the continuous one as we will see in Chap. 8.

Theorem 6.1.11 (Doob's Maximal Inequalities [!!!]) Let M = (M_n)_{n=0,1,...,N}
be a martingale or a non-negative sub-martingale on the space (Ω, F, P,
(F_n)_{n=0,1,...,N}). Then:
(i) for every λ > 0 we have

    P( max_{0≤n≤N} |M_n| ≥ λ ) ≤ E[|M_N|] / λ;   (6.1.7)

(ii) for every p > 1 we have

    E[ max_{0≤n≤N} |M_n|^p ] ≤ ( p/(p−1) )^p E[|M_N|^p].   (6.1.8)

Proof Formula (6.1.7) is a sort of Markov inequality (cf. (3.12) in [113]) for discrete
martingales. If M is a martingale then, by Proposition 1.4.12, .|M| is a non-negative
sub-martingale: therefore it is enough to prove the thesis under the assumption that
M is a non-negative sub-martingale. In this case, we denote by .τ the first instant in
which M exceeds the level λ,

    τ = min{n | M_n ≥ λ},

and we set

    M̄ = max_{0≤n≤N} M_n.

By Example 6.1.3 .τ is a stopping time and by Proposition 6.1.5-(iii) we have

(M̄ ≥ λ) = (τ ≤ N ) ∈ Fτ ∧N .
.

Then we have

    λ P(M̄ ≥ λ) = E[λ 1_{(M̄≥λ)}] ≤ E[M_{τ∧N} 1_{(M̄≥λ)}] ≤

(by the optional sampling theorem)

        ≤ E[ E[M_N | F_{τ∧N}] 1_{(M̄≥λ)} ] =

(since (M̄ ≥ λ) ∈ F_{τ∧N})

        = E[ E[M_N 1_{(M̄≥λ)} | F_{τ∧N}] ] = E[M_N 1_{(M̄≥λ)}]   (6.1.9)

which proves (6.1.7).



Now observe that M̄^p = max_{0≤n≤N} M_n^p. From (3.1.7) in [113] we have

    E[M̄^p] = p ∫_0^{+∞} λ^{p−1} P(M̄ ≥ λ) dλ ≤

(by (6.1.9))

        ≤ p ∫_0^{+∞} λ^{p−2} E[M_N 1_{(M̄≥λ)}] dλ =

(by Fubini's theorem)

        = p E[ M_N ∫_0^{M̄} λ^{p−2} dλ ] = (p/(p−1)) E[M_N M̄^{p−1}] ≤

(by Hölder's inequality, with p/(p−1) being the conjugate exponent of p)

        ≤ (p/(p−1)) E[M_N^p]^{1/p} E[M̄^p]^{1−1/p},

hence (6.1.8) follows by dividing by E[M̄^p]^{1−1/p} and raising to the power of p. ⨅
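Both inequalities can be probed numerically on a symmetric random walk, which is a martingale: (6.1.7) with a fixed threshold λ and (6.1.8) with p = 2, for which the constant (p/(p−1))^p equals 4. A sketch (all parameter choices are ours):

```python
import random

rng = random.Random(7)
N, lam, trials = 100, 15.0, 20000
exceed = 0
sum_abs_end = sum_end_sq = sum_max_sq = 0.0
for _ in range(trials):
    x, m = 0, 0
    for _ in range(N):
        x += rng.choice([-1, 1])
        m = max(m, abs(x))  # running maximum of |M_n|
    exceed += m >= lam
    sum_abs_end += abs(x)
    sum_end_sq += x * x
    sum_max_sq += m * m
p_max = exceed / trials                  # P(max |M_n| >= lam)
bound_i = (sum_abs_end / trials) / lam   # E[|M_N|] / lam, bound in (6.1.7)
lhs_ii = sum_max_sq / trials             # E[(max |M_n|)^2]
rhs_ii = 4 * sum_end_sq / trials         # 4 E[M_N^2], bound in (6.1.8) with p = 2
```

Note that both bounds depend on the walk only through its terminal value, not on the number N of steps, which is the feature exploited in Chap. 8.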

Corollary 6.1.12 (Doob's Maximal Inequalities) Let M = (M_n)_{n=0,1,...,N}
be a martingale or a non-negative sub-martingale on the space (Ω, F, P,
(F_n)_{n=0,1,...,N}), and let τ be a discrete stopping time. Then:
(i) for every λ > 0

    P( max_{0≤n≤τ∧N} |M_n| ≥ λ ) ≤ E[|M_τ|] / λ;

(ii) for every p > 1

    E[ max_{0≤n≤τ∧N} |M_n|^p ] ≤ ( p/(p−1) )^p E[|M_τ|^p].

Proof It is sufficient to apply Theorem 6.1.11 to the stopped martingale M^τ (cf.
Definition 6.1.6 and Proposition 6.1.7). ⨆

We now prove a rather bizarre and surprising result, which will play a crucial
role in the study of the regularity and convergence properties of martingales: the
Upcrossing lemma. It shows that the number of “oscillations” of a martingale is
controlled by its final expectation. This result is unexpected and goes against the

idea that we might have of a martingale as a process whose trajectories are strongly
“oscillating” (think, for example, of a Brownian motion).
To formalize the result, let us fix a, b ∈ R with a < b. The upcrossing
lemma provides an estimate of the number of times a martingale "rises" from a
value less than a to a value greater than b. More precisely, given a martingale
M = (M_n)_{n=0,...,N} on the space (Ω, F, P, (F_n)_{n=0,...,N}), let τ_0 := 0 and,
recursively for k ∈ N,

    σ_k := min{n ∈ {τ_{k−1}, . . . , N} | M_n ≤ a},   τ_k := min{n ∈ {σ_k, . . . , N} | M_n ≥ b},

assuming as usual the convention min ∅ = ∞. By definition, τ_k ≥ σ_k ≥ τ_{k−1} and
σ_k, τ_k are stopping times with values in {0, . . . , N, ∞}. If τ_k(ω) ≤ N then τ_k(ω) is
the time of the k-th upcrossing of the trajectory M(ω); instead, if τ_k(ω) = ∞ then
the total number of upcrossings of the trajectory M(ω) is less than k. Ultimately,
the number of upcrossings of M on [a, b] is given by

    ν_{a,b} := max{k ∈ N ∪ {0} | τ_k ≤ N}.   (6.1.10)
. (6.1.10)

A fundamental ingredient of the proof of the upcrossing lemma is the optional
sampling theorem, according to which, for every sub-martingale M, we have

    E[M_{τ_k}] ≤ E[M_{σ_{k+1}}],   k ∈ N.   (6.1.11)

Now it is good to remember that, by definition (cf. Notation 6.1.4), M_{τ_k} ≡ M_{τ_k∧N} so
that M_{τ_k} = M_N on (τ_k = ∞): in particular, it is not necessarily true that M_{τ_k}(ω) ≥
b if τ_k(ω) = ∞. This remark is important because, between an upcrossing time
τ_k(ω) ≤ N and the next one, the trajectory M(ω) must "descend" from M_{τ_k}(ω) ≥ b
to M_{σ_{k+1}}(ω) ≤ a. The optional sampling theorem says that this cannot happen "too
often": if σ_{k+1} ≤ N, by (6.1.11) we would have b ≤ E[M_{τ_k}] ≤ E[M_{σ_{k+1}}] ≤ a
and this is absurd by the assumption a < b. Therefore, for every k ∈ N, the
event (τ_k = ∞) cannot be negligible and, as already mentioned, such an event is
identifiable with the set of trajectories that have fewer than k upcrossings. In this
sense, the martingale property and the optional sampling theorem limit the number
of possible upcrossings, and thus oscillations, of M on [a, b]. Now it is obvious that
ν_{a,b} ≤ N, indeed more precisely ν_{a,b} ≤ N/2 if N ≥ 2: the surprising fact of the
upcrossing lemma is that it provides an estimate of ν_{a,b} independent of N.


Lemma 6.1.13 (Upcrossing Lemma [!!]) For every sub-martingale M =
(M_n)_{n=0,...,N} and a < b, we have

    E[ν_{a,b}] ≤ E[(M_N − a)^+] / (b − a)

where ν_{a,b} in (6.1.10) indicates the number of upcrossings of M on [a, b].



Proof Since a, b are fixed, during the proof we denote ν_{a,b} simply by ν. By
definition, τ_k ≤ N on (k ≤ ν) and τ_k = ∞ on (k > ν): therefore, recalling
again that M_τ ≡ M_{τ∧N} for every stopping time τ, we have

    Σ_{k=1}^{N} (M_{τ_k} − M_{σ_k}) = Σ_{k=1}^{ν} (M_{τ_k} − M_{σ_k}) + M_{τ_{ν+1}} − M_{σ_{ν+1}}.   (6.1.12)

Now there is a small problem: the last term M_{τ_{ν+1}} − M_{σ_{ν+1}} = M_N − M_{σ_{ν+1}} may
have a negative sign (since M_N could also be less than a). To solve this problem
(we will see shortly what the advantage will be) we introduce the process Y defined
by Y_n = (M_n − a)^+. We recall that Y is a non-negative sub-martingale (Proposition
1.4.12) and the number of upcrossings of M on [a, b] is equal to the number of
upcrossings of Y on [0, b − a] since

    σ_k = min{n ∈ {τ_{k−1}, . . . , N} | Y_n = 0},   τ_k = min{n ∈ {σ_k, . . . , N} | Y_n ≥ b − a}.

Rewriting (6.1.12) for Y, now we have

    Σ_{k=1}^{N} (Y_{τ_k} − Y_{σ_k}) = Σ_{k=1}^{ν} (Y_{τ_k} − Y_{σ_k}) + Y_{τ_{ν+1}} − Y_{σ_{ν+1}}
                                   ≥ Σ_{k=1}^{ν} (Y_{τ_k} − Y_{σ_k}) ≥ (b − a)ν,   (6.1.13)

since3 Y_{τ_{ν+1}} − Y_{σ_{ν+1}} ≥ 0. To conclude, we observe that Y_N = Y_{σ_{N+1}} and

    Y_N ≥ Y_{σ_{N+1}} − Y_{σ_1} = Σ_{k=1}^{N} (Y_{σ_{k+1}} − Y_{σ_k})
        = Σ_{k=1}^{N} (Y_{σ_{k+1}} − Y_{τ_k}) + Σ_{k=1}^{N} (Y_{τ_k} − Y_{σ_k}) ≥

(by (6.1.13))

        ≥ Σ_{k=1}^{N} (Y_{σ_{k+1}} − Y_{τ_k}) + (b − a)ν.

Applying the expected value and the optional sampling theorem ((6.1.11) with M =
Y) we finally have the thesis

    E[Y_N] ≥ E[(b − a)ν].   ⨆


3 We have Y_{τ_{ν+1}} − Y_{σ_{ν+1}} = Y_N ≥ 0 on (σ_{ν+1} ≤ N) and Y_{τ_{ν+1}} − Y_{σ_{ν+1}} = 0 on (σ_{ν+1} = ∞).



Exercise 6.1.14 Prove that, for every .a < b, a continuous function .f : [0, 1] −→
R can have only a finite number of upcrossings on .[a, b].
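The upcrossing bound is also easy to test by simulation: for a symmetric random walk, the empirical mean of ν_{a,b} should not exceed the sampled value of E[(M_N − a)^+]/(b − a). A sketch (interval and horizon are arbitrary choices of ours):

```python
import random

def upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b]: reach <= a, then rise to >= b."""
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            below, count = False, count + 1
    return count

rng = random.Random(8)
N, a, b, trials = 200, -1, 2, 10000
sum_nu = sum_plus = 0.0
for _ in range(trials):
    path = [0]
    for _ in range(N):
        path.append(path[-1] + rng.choice([-1, 1]))
    sum_nu += upcrossings(path, a, b)
    sum_plus += max(path[-1] - a, 0)
mean_nu = sum_nu / trials
bound = (sum_plus / trials) / (b - a)   # sampled E[(M_N - a)^+] / (b - a)
```

As the discussion before Lemma 6.1.13 suggests, the bound depends on N only through the terminal value M_N, while the trivial pathwise bound ν ≤ N/2 grows with the horizon.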

6.2 The Continuous Case

The analysis of stopping times in the continuous case, where .I = R≥0 , requires
additional technical assumptions on filtrations, commonly referred to as the “usual
conditions”. We will delve into these conditions in the subsequent sections.

6.2.1 Usual Conditions and Stopping Times

Definition 6.2.1 (Usual Conditions) We say that a filtration (Ft )t≥0 in the
complete space (Ω, F , P ) satisfies the usual conditions if:
(i) it is complete, i.e., F0 (and therefore also Ft for every t > 0) contains the
family N of negligible events;4
(ii) it is right-continuous, i.e., for every t ≥ 0 we have F_t = F_{t+} where

    F_{t+} := ∩_{ε>0} F_{t+ε}.   (6.2.1)

If X is adapted to a filtration (Ft ) that satisfies the usual conditions, then every
modification of X is adapted to (Ft ) as well: without the completeness assumption
on the filtration, this statement is false. The right-continuity assumption is more
subtle: it means that the knowledge of information up to time t, represented by
Ft , allows us to know what happens “immediately after” t, i.e., Ft+ . To better
understand this fact, which may now appear obscure, we introduce the concepts of
stopping time in R≥0 and exit time of an adapted process.
Definition 6.2.2 (Stopping Time) In a filtered space (Ω, F , P , Ft ), a stopping
time is a random variable5

τ : Ω −→ R≥0 ∪ {∞}
.

such that

(τ ≤ t) ∈ Ft ,
. t ≥ 0. (6.2.2)

4 By assumption (Ω, F, P) is complete and therefore every negligible set is an event.
5 That is, (τ ∈ H) ∈ F for every H ∈ B. Consequently, also (τ = ∞) = (τ ∈ [0, ∞))^c ∈ F.

Example 6.2.3 (First Exit Time [!]) Given a process X = (Xt )t≥0 and H ⊆ R,
we set

    τ(ω) = inf J(ω)  if J(ω) ≠ ∅,    τ(ω) = ∞  if J(ω) = ∅,

where J(ω) = {t ≥ 0 | X_t(ω) ∉ H}. Hereafter, we will also write

    τ = inf{t ≥ 0 | X_t ∉ H},

assuming by convention that the infimum of the empty set is ∞ so that τ (ω) = ∞
if Xt (ω) ∈ H for every t ≥ 0. We say that τ is the first exit time of X from H .
Proposition 6.2.4 (Exit Time from an Open Set [!]) Let X be an adapted and
continuous process on the space (Ω, F , P , Ft ). The first exit time of X from an
open set H is a stopping time.
Proof The thesis is a consequence of the equality

    (τ > t) = ∪_{n∈N} ∩_{s∈Q∩[0,t)} ( dist(X_s, H^c) ≥ 1/n )   (6.2.3)

since (dist(X_s, H^c) ≥ 1/n) ∈ F_s for s ≤ t and therefore (τ ≤ t) = (τ > t)^c ∈ F_t.
Let us prove (6.2.3): if ω belongs to the right-hand side then there exists n ∈ N
such that dist(X_s(ω), H^c) ≥ 1/n for every s ∈ Q ∩ [0, t); since X has continuous
trajectories, it follows that dist(X_s(ω), H^c) ≥ 1/n for every s ∈ [0, t] and therefore,
again by the continuity of X, it must be τ(ω) > t.
Conversely, if τ(ω) > t then K := {X_s(ω) | s ∈ [0, t]} is a compact subset of H:
since H is open, it follows that dist(K, H^c) > 0 and this is enough to conclude. ⨆
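The role of the countable dense set of times in (6.2.3) can be mimicked on a grid: for a piecewise approximation of a continuous trajectory, whether the path has left an open interval by a given time is decided by its values at finitely many grid times. A minimal sketch (names and step sizes ours):

```python
import random

def first_exit_index(path, lo, hi):
    """First index at which the path leaves the open interval (lo, hi), or None."""
    for n, x in enumerate(path):
        if not (lo < x < hi):
            return n
    return None

# deterministic check: the path 0, 0.5, 1.0 leaves (-1, 1) exactly at index 2
assert first_exit_index([0.0, 0.5, 1.0], -1.0, 1.0) == 2

# random-walk approximation of a continuous trajectory on a fine time grid
rng = random.Random(9)
dt = 0.01
x, path = 0.0, [0.0]
for _ in range(20000):
    x += rng.choice([-1.0, 1.0]) * dt ** 0.5
    path.append(x)
n_exit = first_exit_index(path, -1.0, 1.0)
if n_exit is not None:
    tau = n_exit * dt  # discretized first exit time from the open set (-1, 1)
    # strictly before the exit index the path stays inside the open set
    assert all(-1.0 < v < 1.0 for v in path[:n_exit])
```

On the discrete grid the distinction between (τ < t) and (τ ≤ t) disappears, which is precisely the subtlety that the right-continuity of the filtration handles in continuous time.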
In the next lemma, we prove that for every stopping time τ we have

(τ < t) ∈ Ft ,
. t > 0. (6.2.4)

In general, (6.2.4) is weaker than (6.2.2) but, under the usual conditions on the
filtration, the two properties are equivalent.
Lemma 6.2.5 ([!]) Every stopping time τ satisfies (6.2.4). Conversely, if (6.2.4)
holds and the filtration (Ft )t≥0 is right-continuous, then τ is a stopping time.
Proof We have

    (τ < t) = ∪_{n∈N} ( τ ≤ t − 1/n ).

If τ is a stopping time, then (τ ≤ t − 1/n) ∈ F_{t−1/n} ⊆ F_t for every n ∈ N, and this
proves the first part of the thesis.
Conversely, if (6.2.4) holds, then for every ε > 0 we have

    (τ ≤ t) = ∩_{n∈N, 1/n<ε} ( τ < t + 1/n ) ∈ F_{t+ε}.

Therefore

    (τ ≤ t) ∈ ∩_{ε>0} F_{t+ε} = F_t

thanks to the right-continuity assumption on the filtration. ⨆



Remark 6.2.6 If τ is a stopping time then

    (τ = t) = (τ ≤ t) \ (τ < t) ∈ F_t.

Moreover

    (τ = ∞) = ∩_{t≥0} (τ ≥ t) ∈ σ( ∪_{t≥0} F_t ).

Since the union of σ-algebras is not generally a σ-algebra, we denote by

    F_∞ := σ( ∪_{t≥0} F_t )   (6.2.5)

the smallest σ-algebra that contains F_t for each t ≥ 0. Clearly (τ = ∞) ∈ F_∞.


Proposition 6.2.7 (Exit Time from a Closed Set) Let X be an adapted and
continuous process on the space (Ω, F , P , Ft ). The first exit time τ of X from a
closed set H satisfies (6.2.4). If the filtration is right-continuous then τ is a stopping
time.
Proof Since H^c is open and X is continuous, for each t > 0 we have

    (τ < t) = ∪_{s∈Q∩[0,t)} (X_s ∈ H^c)

and the thesis follows from the fact that (X_s ∈ H^c) ∈ F_t for s ≤ t since X is
adapted to (F_t). The second part of the thesis follows directly from Lemma 6.2.5. ⨆

Fig. 6.1 A trajectory of a continuous process X and its first exit time from a closed set H

Remark 6.2.8 Under the usual conditions, also the exit time from a Borel set
is a stopping time. However, establishing this fact demands a substantially more
challenging proof: see, for example, Section I.10 in [20].

Remark 6.2.9 ([!]) Let us comment on Proposition 6.2.7 by observing Fig. 6.1
where the first exit time τ of X from the closed set H is represented. Up to time
τ , including τ , the trajectory of X is in H . Now note the difference between the
events

    (τ < t) = "X exits H before time t",
    (τ ≤ t) = "X exits H before or immediately after t".

Intuitively, it is plausible that, without the need to impose conditions on the filtration,
one can prove (this is what we did in Proposition 6.2.7) that (τ < t) ∈ Ft , i.e., that
the fact that X exits H before time t is observable based on the knowledge of what
happened up to time t (i.e., Ft , in particular knowing the trajectory of the process up
to time t). On the contrary, it is only thanks to the right-continuity of the filtration
that one can prove that (τ ≤ t) ∈ Ft . Indeed, if t = τ (ω) then Xt (ω) ∈ ∂H
and based on the observation of the trajectory of X up to time t (i.e., having the
information in Ft ) it is not possible to know whether X(ω) will continue to remain
inside H or exit H immediately after t. In fact, for a generic filtration (τ ≤ t) ∉ F_t,
i.e., as already observed, the condition (τ < t) ∈ F_t is weaker than (τ ≤ t) ∈ F_t.
On the other hand, if (Ft )t≥0 satisfies the usual conditions (in particular, the right-
continuity property) then the two conditions (τ < t) ∈ Ft and (τ ≤ t) ∈ Ft are
equivalent (Lemma 6.2.5). As we anticipated, this means that the right-continuity of
the filtration ensures that knowing Ft we can also see what happens “immediately
after” time t.

6.2.2 Filtration Enlargement and Markov Processes

We have explained the importance of the usual conditions on filtrations and the
reasons why it is preferable to assume the validity of such hypotheses. In
this section, we prove that it is always possible to modify a filtration so that it
satisfies the usual conditions and, under appropriate conditions, it is also possible

to preserve some fundamental properties of the considered processes, such as the
Markov property.

The results of this section and the rest of the chapter are useful but have
quite technical and less informative proofs: at a first reading, it is therefore
recommended to read the statements but skip the proofs.

Consider a complete space .(Ω, F , P ) equipped with a generic filtration .(Ft )t≥0
and denote by .N the family of negligible events. It is always possible to expand
.(Ft )t≥0 so that the usual conditions are satisfied:

(i) by setting

    F̄_t := σ(F_t ∪ N),   t ≥ 0,   (6.2.6)

we define the smallest filtration6 in (Ω, F, P) which completes and extends
(F_t)_{t≥0};
(ii) the filtration (F_{t+})_{t≥0} defined by (6.2.1) is right-continuous.

Combining points (i) and (ii) (in any order), we obtain the filtration (F̄_{t+})_{t≥0} which
is the smallest filtration that extends (F_t)_{t≥0} and verifies the usual conditions.
Definition 6.2.10 (Standard Enlargement of a Filtration) The filtration
(F̄_{t+})_{t≥0} is called the standard enlargement of the filtration (F_t)_{t≥0}.
Now consider a stochastic process .X = (Xt )t≥0 on .(Ω, F , P ) and its associated
filtration

    G_t^X := σ(X_s, s ≤ t),   t ≥ 0,

that is, the filtration generated by X.


Definition 6.2.11 (Standard Filtration of a Process) The standard filtration of a
process X, hereafter denoted by F^X = (F_t^X)_{t≥0}, is the standard enlargement of
G^X.

Suppose that .X = (Xt )t≥0 is a Markov process with transition law p on the
complete filtered space .(Ω, F , P , Ft ). In general, it is not a problem to “shrink”
the filtration: more precisely, if .(Gt )t≥0 is a filtration such that .GtX ⊆ Gt ⊆ Ft for
every .t ≥ 0, i.e., .(Gt )t≥0 is smaller than .(Ft )t≥0 but larger than .(GtX )t≥0 , then it
is immediate to verify that X is a Markov process also on the space .(Ω, F , P , Gt ).

6 Obviously, we have .F¯t ⊆ F¯T if .0 ≤ t ≤ T . Moreover, .F¯t ⊆ F for every .t ≥ 0 thanks to the
completeness assumption of .(Ω, F , P ).

The problem is not obvious when we want to enlarge the filtration. The following
results provide conditions under which it is possible to enlarge the filtration of a
Markov process so that it verifies the usual conditions, without affecting the Markov
property.
Proposition 6.2.12 Let .X = (Xt )t≥0 be a Markov process with transition law p
on the complete filtered space .(Ω, F , P , Ft ). Then X is a Markov process with
transition law p on .(Ω, F , P ) with respect to the completed filtration .(F¯t )t≥0
in (6.2.6).
Proof Clearly, X is adapted to .F¯ so we only need to prove that

p(t, Xt ; T , H ) = P (XT ∈ H | F¯t ),


. 0 ≤ t ≤ T , H ∈ B.

Let .Z = p(t, Xt ; T , H ), then .Z ∈ mσ (Xt ) ⊆ mF¯t ; based on the definition of


conditional expectation, it remains to verify that for every .G ∈ F¯t we have
. E [Z1G ] = E [1(XT ∈H ) 1G ]. (6.2.7)

Formula (6.2.7) is true if .G ∈ Ft : on the other hand (see Remark 1.4.3 in [113])
G ∈ F¯t = σ (Ft ∪ N ) if and only if .G = A ∪ N for some .A ∈ Ft and .N ∈ N .
.

Therefore, we have
. E [Z1G ] = E [Z1A ] = E [1(XT ∈H ) 1A ] = E [1(XT ∈H ) 1G ].



It is possible to enlarge the filtration to make it right-continuous and maintain
the Markov property, assuming additional continuity assumptions for the process
trajectories (e.g., a.s. right-continuity) and for the process transition law (the Feller
property, Definition 2.1.10).
Proposition 6.2.13 Let .X = (Xt )t≥0 be a Markov process with transition law p on
the complete filtered space .(Ω, F , P , Ft ). Suppose that X is a Feller process with
a.s. right-continuous trajectories. Then X is a Markov process with transition law
p on .(Ω, F , P , Ft+ ).
Proof Clearly, X is adapted to .(Ft+ )t≥0 so there is only to prove the Markov
property, namely that for every .0 ≤ t < T and .ϕ ∈ bB we have
. Z = E [ϕ(XT ) | Ft+ ] where Z := ∫R p(t, Xt ; T , dy)ϕ(y).

By Fubini’s theorem, .Z ∈ mFt ⊆ mFt+ . Therefore, by definition of conditional


expectation, it remains to verify that for every .G ∈ Ft+ we have

E [ϕ(XT )1G ] = E [Z1G ] .


. (6.2.8)

Now, let .h > 0 such that .t + h < T : we have .G ∈ Ft+h and therefore, by the
Markov property of X with respect to .(Ft )t≥0 , we have
. E [ϕ(XT )1G ] = E [∫R p(t + h, Xt+h ; T , dy)ϕ(y) 1G ]. (6.2.9)

Utilizing the a.s. right-continuity of the trajectories of X and the Feller property
of p, we can take the limit as h tends to .0+ in (6.2.9). Applying the dominated
convergence theorem yields (6.2.8).


Remark 6.2.14 ([!]) Combining Propositions 6.2.12 and 6.2.13 we have the fol-
lowing result: if X is an a.s. right-continuous, Markov and Feller process on the
complete space .(Ω, F , P , Ft ) then X is a Markov process also on the complete
space .(Ω, F , P , (F¯t+ )t≥0 ) where the usual conditions hold.
Next, we show that for a Markov process X with respect to its own standard
filtration .F X , we simply have

FtX = σ (GtX ∪ N ),
. t ≥ 0. (6.2.10)

In other words, .F X is obtained by completing the filtration generated by X and the


property of right-continuity is automatically satisfied.
Proposition 6.2.15 ([!]) If X is a Markov process with respect to its standard
filtration .F X then (6.2.10) holds.
Proof The proof is based on the extended Markov property of Theorem 2.2.4
according to which we have7
. ZE [Y | Xt ] = E [ZY | FtX ], Z ∈ bσ (GtX ∪ N ), Y ∈ bGt,∞X .

Since every version of .E [Y | Xt ] is .σ (Xt )-measurable and given the uniqueness of
the conditional expectation up to negligible events, it follows that every version of
.E [ZY | FtX ] is .σ (GtX ∪ N )-measurable: given the assumptions on Y and Z, this
measurability property also holds if instead of ZY we put any random variable in
.bσ (G∞X ∪ N ). In particular, for .A ∈ FtX ⊆ σ (G∞X ∪ N ) we obtain

. 1A = E [1A | FtX ] ∈ bσ (GtX ∪ N ).



Remark 6.2.16 ([!]) Combining Propositions 6.2.12, 6.2.13, and 6.2.15, we have
the following result: let X be a Markov and Feller right-continuous process with

7 In the sense of Convention 4.2.5 in [113]. Note that .Z ∈ bσ (GtX ∪ N ) ⊆ bFtX .



respect to .G X ; then the standard filtration is .FtX = σ (GtX ∪ N ), .t ≥ 0, and X is


a Markov process also with respect to .F X .
We now consider a Markov process X on the space .(Ω, F , P , Ft ) in which the
usual conditions hold and recall definition (2.2.6) of the .σ -algebra .Gt,∞
X of future

information on X starting from time t.


Theorem 6.2.17 (Blumenthal’s 0-1 Law) Let X be a Markov process on
.(Ω, F , P , Ft ). If .A ∈ Ft ∩ Gt,∞X then .P (A | Xt ) = 1 or .P (A | Xt ) = 0.

Proof We explicitly note that A is not necessarily .σ (Xt )-measurable. In other
words, in general .σ (Xt ) is strictly included in .Ft ∩ Ft,∞X since, by the right-
continuity of .F X , we have

. σ (Xt ) ⊆ ⋂ε>0 σ (Xs , t ≤ s ≤ t + ε) ⊆ Ft ∩ Ft,∞X ;

if this were the case, the thesis would be an obvious consequence of Example 4.3.3
in [113]. On the other hand, by Corollary 2.2.5, .Ft and .Gt,∞X are, conditionally on

.Xt , independent: it follows that A is independent of itself (conditionally on .Xt ) and

therefore we have

P (A | Xt ) = P (A ∩ A | Xt ) = P (A | Xt )2 .
.

Hence, .P (A | Xt ) can only take the values 0 or 1. ⨆



Example 6.2.18 ([!]) We resume Example 6.2.3 and suppose that .τ is the exit
time from a closed set H of a continuous Markov process X on the space
.(Ω, F , P , F X ). We apply Blumenthal’s 0-1 law with .t = 0: clearly .(τ = 0) ∈
F0X = F0X ∩ F0,∞X since .τ is a stopping time; here .(τ = 0) indicates the
event according to which the process X exits immediately from H . Then we have
.P (τ = 0 | X0 ) = 0 or .P (τ = 0 | X0 ) = 1, that is almost all trajectories of X exit

immediately from H or almost none. This fact is particularly interesting when .X0
belongs to the boundary of H .

6.2.3 Filtration Enlargement and Lévy Processes

We now study the filtration enlargement for the Poisson process and the Brownian
motion. To treat the subject in a unified way, we introduce a class of processes of
which Poisson and Brownian are particular cases.
Definition 6.2.19 (Lévy Process) Let .X = (Xt )t≥0 be a real stochastic process
defined on a complete filtered probability space .(Ω, F , P , Ft ). We say that X is a
Lévy process if it satisfies the following properties:
(i) .X0 = 0 a.s.;
(ii) the trajectories of X are a.s. càdlàg;

(iii) X is adapted to .(Ft );


(iv) .Xt − Xs is independent of .Fs for every .0 ≤ s ≤ t;
(v) the increments .Xt −Xs and .Xt+h −Xs+h have the same law for every .0 ≤ s ≤ t
and .h ≥ 0.
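The stationarity and independence of the increments can be probed numerically. The following Python sketch (an illustration added here, not part of the text; all parameter values are arbitrary) simulates a compound Poisson process, a basic Lévy process, through its jump times and jump sizes, and compares increments over two disjoint windows of the same length h:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, h, n_paths = 2.0, 4.0, 0.5, 10000

def cp_increment(jump_times, jump_sizes, s, t):
    """Increment X_t - X_s of a compound Poisson path, given its jump
    times and jump sizes."""
    mask = (jump_times > s) & (jump_times <= t)
    return jump_sizes[mask].sum()

inc1, inc2 = [], []
for _ in range(n_paths):
    n_jumps = rng.poisson(lam * T)              # N_T ~ Poisson(lam*T)
    times = rng.uniform(0.0, T, n_jumps)        # jump times on [0, T]
    sizes = rng.standard_normal(n_jumps)        # i.i.d. jump sizes Z_i
    inc1.append(cp_increment(times, sizes, 0.5, 0.5 + h))
    inc2.append(cp_increment(times, sizes, 3.0, 3.0 + h))
inc1, inc2 = np.asarray(inc1), np.asarray(inc2)

# stationarity (v): both increments have the moments of a window of length h,
# namely mean 0 and variance lam * h * E[Z^2] = lam * h
assert abs(inc1.mean()) < 0.05 and abs(inc2.mean()) < 0.05
assert abs(inc1.var() - lam * h) < 0.1 and abs(inc2.var() - lam * h) < 0.1
# independence (iv): increments over disjoint windows are uncorrelated
assert abs(np.corrcoef(inc1, inc2)[0, 1]) < 0.04
```

The moment checks only test the first two moments, of course, but they already reflect the key point: the law of the increment depends on the window only through its length.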
Remark 6.2.20 ([!!]) Properties (iv) and (v) are expressed by saying that X has
independent and stationary increments. By Proposition 2.3.2, a Lévy process X is a
Markov process with transition law .p(t, x; T , ·) equal to the distribution of .XT −
Xt +x: such law is homogeneous in time thanks to the stationarity of the increments.
It follows in particular that every Lévy process is a Feller process: indeed, for every
.ϕ ∈ bC(R) and .h > 0 we have

. (t, x) |−→ ∫R p(t, x; t + h, dy)ϕ(y) =

(since .p(t, x; t + h, ·) is the distribution of .Xt+h − Xt + x which is equal in law to
.Xh + x by the stationarity of the increments)

. = ∫R p(0, x; h, dy)ϕ(y) = E [ϕ(Xh + x)]

and the continuity in .(t, x) follows from Lebesgue’s dominated convergence


theorem.
Moreover, one can prove that the CHF of a Lévy process X is of the form

. ϕXT (η) = e^{T ψ(η)}

where .ψ is called the characteristic exponent of X: for example, .ψ(η) = −η²/2
for Brownian motion and .ψ(η) = λ(e^{iη} − 1) for the Poisson process (cf.
Remark 5.1.4). Then, setting for simplicity .p(T , ·) = p(0, 0; T , ·), we have the
following remarkable relation:

. ψ(η)e^{T ψ(η)} = ∂T e^{T ψ(η)} = ∂T ∫R e^{iηy} p(T , dy) =

(assuming we can exchange the signs of derivative and integral)


. = ∫R e^{iηy} ∂T p(T , dy) =

(since .p(T , dy) solves the forward Kolmogorov equation (2.5.25), .∂T p(T , ·) =
AT∗ p(T , ·) where .AT∗ is the adjoint of the infinitesimal generator or characteristic
operator of X)
. = ∫R e^{iηy} AT∗ p(T , dy).

In the language of pseudo-differential calculus, this fact is expressed by stating that


ψ is the symbol of the operator .AT∗ and is denoted as
. AT∗ = ψ(i∂y ).

For example, for the Brownian motion we have .ψ(η) = −η²/2 and

. AT∗ = ψ(i∂y ) = (1/2) ∂yy ,

while for the Poisson process, since .ψ(η) = λ(eiη − 1), we have

AT∗ ϕ(y) = ψ(i∂y )ϕ(y) = λ(ϕ(y − 1) − ϕ(y)).


. (6.2.11)
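As a quick numerical sanity check (added here for illustration, not part of the text), the Poisson transition probabilities p(T, {k}) = e^{−λT}(λT)^k/k! do satisfy the forward equation ∂_T p(T, ·) = A_T∗ p(T, ·) with the operator (6.2.11), which for a pmf reads ∂_T p_k = λ(p_{k−1} − p_k):

```python
import math

lam, T, dT = 0.7, 1.3, 1e-6   # arbitrary rate, time, finite-difference step

def pmf(k, t):
    """Poisson transition pmf p(t, {k}) = e^{-lam t} (lam t)^k / k!."""
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k) if k >= 0 else 0.0

for k in range(6):
    dT_p = (pmf(k, T + dT) - pmf(k, T - dT)) / (2 * dT)   # central difference for d/dT p(T,{k})
    A_star_p = lam * (pmf(k - 1, T) - pmf(k, T))          # lam * (p_{k-1} - p_k), cf. (6.2.11)
    assert abs(dT_p - A_star_p) < 1e-6
```

The agreement is to finite-difference accuracy, for each state k separately.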

The representation (6.2.11) of .AT∗ as a pseudo-differential operator is also justified


by the formal expression

. e^{α∂y} ϕ(y) = ∑_{n=0}^{∞} ((α∂y )^n / n!) ϕ(y) = ϕ(y + α)

as a Taylor series expansion valid for every analytic function .ϕ. The general
expression of the characteristic exponent of a Lévy process is given by the famous
Lévy-Khintchine formula
. ψ(η) = iμη − σ²η²/2 + ∫R (e^{iηx} − 1 − iηx 1|x|≤1 ) ν(dx)

where .μ, σ ∈ R and .ν is a measure on .R such that .ν({0}) = 0 and


. ∫R (1 ∧ |x|²) ν(dx) < ∞.

For each .H ∈ B, .ν(H ) indicates the expected number of jumps of the process
trajectories in a unit time period, with size .Δt X ∈ H : for example, for the Poisson
process, we have .ν = λδ1 and for the compound Poisson process of Example 5.1.5,
we have .ν = λμZ where .μZ is the law of the variables .Zn , i.e., the individual jumps
of the process.
If a Lévy process X is a.s. continuous then .ν ≡ 0 and therefore necessarily X
is a Brownian motion with drift, i.e., a process of the form .Xt = μt + σ Wt with
.μ, σ ∈ R and W Brownian motion. Among the reference texts for the general

theory of Lévy processes, we indicate the monograph [4].
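The relation ϕ_{X_T}(η) = e^{Tψ(η)} can be checked by Monte Carlo simulation. In the sketch below (an illustration, not the text's construction; parameters are arbitrary), X_T is a Brownian motion with drift plus an independent Poisson process, a Lévy process with Lévy measure ν = λδ₁, and the empirical characteristic function is compared with the Lévy–Khintchine exponent (written here without the truncation term, the compensator being absorbed into the drift):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, lam, T, n = 0.3, 0.8, 1.5, 2.0, 200_000

# X_T = mu*T + sigma*W_T + N_T: Brownian motion with drift plus an
# independent Poisson process, i.e. a Lévy process with nu = lam * delta_1
X = mu * T + sigma * np.sqrt(T) * rng.standard_normal(n) + rng.poisson(lam * T, n)

def psi(eta):
    """Characteristic exponent; the jump term is written without the
    truncation 1_{|x|<=1}, with the compensator absorbed into the drift."""
    return 1j * mu * eta - 0.5 * (sigma * eta) ** 2 + lam * (np.exp(1j * eta) - 1.0)

for eta in (0.5, 1.0, 2.0):
    empirical = np.exp(1j * eta * X).mean()      # Monte Carlo CHF at time T
    assert abs(empirical - np.exp(T * psi(eta))) < 0.02
```

The tolerance reflects the Monte Carlo error only: the identity itself is exact for every η.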


Proposition 6.2.21 Let .X = (Xt )t≥0 be a Lévy process on the complete space
.(Ω, F , P , Ft ). Then X is a Lévy process also on .(Ω, F , P , (F¯t )t≥0 ) and on
.(Ω, F , P , (Ft+ )t≥0 ).

Proof It suffices to verify that, for each .0 ≤ s < t, the increment .Xt − Xs is
independent of .F¯s and of .Fs+ , i.e., we have

P (Xt − Xs ∈ H | G) = P (Xt − Xs ∈ H ),
. H ∈ B, (6.2.12)

if .G ∈ F¯s ∪ Fs+ with .P (G) > 0. Let us first consider the case .G ∈ F¯s (always
assuming .P (G) > 0). Equation (6.2.12) is true if .G ∈ Fs : on the other hand (cf.
Remark 1.4.3 in [113]) .G ∈ F¯s = σ (Fs ∪ N ) if and only if .G = A ∪ N for some
.A ∈ Fs and .N ∈ N (and necessarily .P (A) > 0 since .P (G) > 0). Hence we have

P (Xt − Xs ∈ H | G) = P (Xt − Xs ∈ H | A) = P (Xt − Xs ∈ H ).


.

Now let us consider the case .G ∈ Fs+ with .P (G) > 0. Here we use the fact
that, by Corollary 2.5.8 in [113], Eq. (6.2.12) is true if and only if we have

E [ϕ(Xt − Xs ) | G] = E [ϕ(Xt − Xs )] ,
.

for every .ϕ ∈ bC. We observe that, for every .h > 0, .G ∈ Fs+h and therefore G is
independent from .Xt+h − Xs+h : then we have

.E [ϕ(Xt+h − Xs+h ) | G] = E [ϕ(Xt+h − Xs+h )]

and we conclude by taking the limit as .h → 0+ , by the dominated convergence


theorem thanks to the right-continuity of the trajectories of X and the continuity
and boundedness of .ϕ. ⨆

Combining the previous results with Remark 6.2.16 we have the following
Theorem 6.2.22 ([!]) Let X be a Lévy process on the complete space .(Ω, F , P )
equipped with the filtration .G X generated by X. Then .FtX = σ (GtX ∪ N ), for
.t ≥ 0, and X is a Lévy process also with respect to the standard filtration .F X .

As a consequence of Blumenthal’s 0-1 law of Theorem 6.2.17, we have


Corollary 6.2.23 (Blumenthal’s 0-1 Law) Let .X = (Xt )t≥0 be a Lévy process.
For every .A ∈ F0X we have .P (A) = 0 or .P (A) = 1.
Let .(C(R≥0 ), BμW , μW ) be the Wiener space (cf. Definition 4.3.2): here .μW
is the Wiener measure (i.e., the law of a Brownian motion) defined on the .μW -
completion .BμW of the Borel .σ -algebra.
Definition 6.2.24 (Canonical Brownian Motion) The canonical Brownian
motion .W is the identity process8 on the Wiener space equipped with the standard
filtration .F W .

8 That is, .Wt (w) = w(t) for every .w ∈ C(R≥0 ) and .t ≥ 0.



Remark 6.2.25 ([!]) By Corollary 4.3.3 and Theorem 6.2.22, the canonical Brow-
nian motion is a Brownian motion, according to Definition 4.1.1, on the space
.(C(R≥0 ), BμW , μW , F W ). Moreover, the Wiener space is a Polish metric space

and a complete probability space in which the standard filtration .F W satisfies


the usual conditions: due to these important properties, the Wiener space and the
canonical Brownian motion constitute respectively the canonical space and process
of reference in the study of stochastic differential equations.

6.2.4 General Results on Stopping Times

We resume the study of stopping times with values in .R≥0 ∪ {∞} (cf. Definition
6.2.2), on a filtered space .(Ω, F , P , Ft ) satisfying the usual conditions. We leave
as an exercise the proof of the following
Proposition 6.2.26
(i) If .τ = t a.s. then .τ is a stopping time;
(ii) if .τ, σ are stopping times then also .τ ∧ σ and .τ ∨ σ are stopping times;
(iii) if .(τn )n≥1 is an increasing sequence (i.e., .τn ≤ τn+1 a.s. for every .n ∈ N) then
. sup τn is a stopping time;
n∈N
(iv) if .(τn )n≥1 is a decreasing sequence (i.e., .τn ≥ τn+1 a.s. for every .n ∈ N) then
. inf τn is a stopping time;
n∈N
(v) if .τ is a stopping time then for every .ε ≥ 0 also .τ + ε is a stopping time.
Now consider a stochastic process .X = (Xt )t≥0 on the filtered space
(Ω, F , P , Ft ) that verifies the usual conditions. In the analysis of stopping
.

times (and later, stochastic integration), it becomes necessary to impose a minimal


measurability condition on X concerning the time variable. This condition enhances
the notion of adapted process.
Definition 6.2.27 (Progressively Measurable Process) A process .X = (Xt )t≥0
is progressively measurable if, for every .t > 0, the function .(s, ω) |→ Xs (ω) on
.[0, t] × Ω to .R is measurable with respect to the product .σ -algebra .B ⊗ Ft .
d

In other words, X is progressively measurable if, for every fixed .t > 0, the
function .g := X|[0,t]×Ω , defined by

.g : ([0, t] × Ω, B ⊗ Ft ) −→ (R, B), g(s, ω) = Xs (ω), (6.2.13)

is .(B ⊗ Ft )-measurable. If X is progressively measurable then, by Lemma 2.3.11


in [113], it is adapted to .(Ft ). Conversely, a result by Chung and Doob [25] shows
that if X is adapted and measurable9 then it possesses a progressively measurable

9 That is, .(t, ω) |→ Xt (ω) is .B ⊗ F -measurable.



modification (for a proof of this fact see, for example [96], Theorem T46 on p. 68).
We will only need the following much simpler result:
Proposition 6.2.28 If X is adapted to .(Ft ) and has a.s. right-continuous trajecto-
ries (or has a.s. left-continuous trajectories) then it is progressively measurable.
Proof Consider the sequences

. ←X^{(n)}_t := ∑_{k=1}^{∞} X_{(k−1)/2^n} 1_{[(k−1)/2^n , k/2^n )} (t),   →X^{(n)}_t := ∑_{k=1}^{∞} X_{k/2^n} 1_{[(k−1)/2^n , k/2^n )} (t),   t ∈ [0, T ], n ∈ N.

Since X is adapted, it follows from Corollary 2.3.9 in [113] that .←X^{(n)} ∈ m(B ⊗ FT )
and .→X^{(n)} ∈ m(B ⊗ FT +1/2^n ). If X has a.s. left-continuous trajectories then .←X^{(n)}
converges pointwise .(Leb ⊗ P )-a.s. to X on .[0, T ] × Ω as .n → ∞: given the
arbitrariness of T , it follows that X is progressively measurable.
Similarly, if X has a.s. right-continuous trajectories then .→X^{(n)} converges point-
wise .(Leb ⊗ P )-a.s. to X on .[0, T ] × Ω as .n → ∞: it follows that, for every .ε > 0,
the map .(t, ω) |→ Xt (ω) is .(B ⊗ FT +ε )-measurable on .[0, T ] × Ω. Due to the
right-continuity of the filtration, we conclude that X is progressively measurable.


Given a stopping time .τ , we recall definition (6.2.5) of .F∞ and, in analogy
with (6.1.3), we define

Fτ := {A ∈ F∞ | A ∩ (τ ≤ t) ∈ Ft for every t ≥ 0}.


.

Note that .Fτ is a .σ -algebra and .Fτ = Ft if .τ is the constant stopping time equal
to t. Moreover, given a process .X = (Xt )t≥0 we define

. (Xτ )(ω) := Xτ (ω) (ω) if τ (ω) < ∞,   and   (Xτ )(ω) := 0 if τ (ω) = ∞.

Proposition 6.2.29 In a filtered probability space where the usual conditions are
in force, we have:
(i) .τ ∈ mFτ ;
(ii) if .τ ≤ σ then .Fτ ⊆ Fσ ;
(iii) .Fτ ∩ Fσ = Fτ ∧σ ;

(iv) if X is progressively measurable then .Xτ ∈ mFτ ;
(v) .Fτ = Fτ + := ⋂ε>0 Fτ +ε ;

Proof
(i) We have to show that .(τ ∈ H ) ∩ (τ ≤ t) ∈ Ft for every .t ≥ 0 and .H ∈ B: the
thesis follows easily since by Lemma 2.1.5 in [113] it is sufficient to consider
H of the type .(−∞, s] with .s ∈ R.
(ii) If .τ ≤ σ then .(σ ≤ t) ⊆ (τ ≤ t): hence for every .A ∈ Fτ we have

. A ∩ (σ ≤ t) = (A ∩ (τ ≤ t)) ∩ (σ ≤ t) ∈ Ft ,   since A ∩ (τ ≤ t) ∈ Ft and (σ ≤ t) ∈ Ft .

(iii) By point (ii) the inclusion .Fτ ∩ Fσ ⊇ Fτ ∧σ holds. Conversely, if .A ∈ Fτ ∩


Fσ then

. A ∩ (τ ∧ σ ≤ t) = A ∩ ((τ ≤ t) ∪ (σ ≤ t)) = (A ∩ (τ ≤ t)) ∪ (A ∩ (σ ≤ t)) ∈ Ft ,
since both .A ∩ (τ ≤ t) and .A ∩ (σ ≤ t) belong to .Ft .

(iv) We have to prove that .(Xτ ∈ H ) ∩ (τ ≤ t) = (Xτ ∧t ∈ H ) ∩ (τ ≤ t) ∈ Ft
for every .t ≥ 0 and .H ∈ B. Since .(τ ≤ t) ∈ Ft it is sufficient to prove that
.Xτ ∧t ∈ mFt : this is a consequence of the fact that .Xτ ∧t (ω) = (g ◦ f )(ω)
with f and g measurable functions defined by

. f : (Ω, Ft ) −→ ([0, t] × Ω, B ⊗ Ft ), f (ω) := (τ (ω) ∧ t, ω),

and g as in (6.2.13). The measurability of f follows from Corollary 2.3.9 in


[113] and the fact that, by (i), .(τ ∧ t) ∈ mFτ ∧t ⊆ mFt ; g is measurable since
X is progressively measurable.
(v) The inclusion .Fτ ⊆ Fτ + is obvious by (ii). Conversely, if .A ∈ Fτ + then by
definition .A ∩ (τ + ϵ ≤ t) ∈ Ft for every .t ≥ 0 and .ϵ > 0: therefore .A ∩ (τ ≤
t − ϵ) ∈ Ft for every .t ≥ 0 and .ϵ > 0, or equivalently .A ∩ (τ ≤ t) ∈ Ft+ϵ for
every .t ≥ 0 and .ϵ > 0. Due to the right-continuity hypothesis of the filtration,
we have .A ∩ (τ ≤ t) ∈ Ft for every .t ≥ 0 which means .A ∈ Fτ .

6.3 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while disregarding the more technical
or secondary matters. As usual, if you have any doubt about what the following
succinct statements mean, please review the corresponding section.
• Section 6.1: stopping times are random times that comply with the information
structure of the assigned filtration. They are a useful tool in various fields and in
particular for the study of the fundamental properties of martingales. Even in the

discrete case, many of the main ideas and techniques related to stopping times
emerge: the proofs, although using elementary tools, can be quite challenging.
Stopping a process maintains its essential properties such as being adapted and
the martingale property.
• Section 6.1.1: the optional sampling theorem and Doob’s maximal inequalities
are crucial results that will be systematically used in the following chapters: so it
is useful to dwell on the details of the proofs. The upcrossing lemma is a rather
unusual and subtle result, whose use will be limited to proving the continuity of
martingale trajectories: its proof can be skipped at a first reading.
• Section 6.2.1: the study of stopping times in the continuous case involves some
technical difficulties. First of all, it is necessary to assume the so-called usual
conditions on the filtration: these are crucial, for example, in the study of exit
times of a process from a closed set.
• Sections 6.2.2 and 6.2.3: every filtration can be enlarged in such a way that it
satisfies the usual conditions, but in that case, it is necessary to prove that certain
properties of the processes remain valid: for instance, the Markov property or the
independence properties of the increments of a Lévy process. It is useful to grasp
the statements in these sections, but one can gloss over the technical aspects of
the proofs.
• Section 6.2.4: the notion of progressively measurable process strengthens that
of an adapted process as it requires a joint measurability property in .(t, ω). In
particular, a progressively measurable process is also measurable as a function of
the time variable: this is relevant in the context of stochastic integration theory.
Main notations used or introduced in this chapter:

Symbol Description Page

.τ Typical letter used to indicate a stopping time 97
.Xτ Process X evaluated at (stopping) time .τ 99
.Fτ .σ -algebra of information at (stopping) time .τ 99
.X^τ Stopped process 100
.M̄ = max0≤n≤N Mn Maximum process 103
.N Negligible sets 107
.F¯t Completed .σ -algebra 111
.Ft+ “Right-augmented” .σ -algebra 111
.F X Standard filtration of a process X 111
Chapter 7
Strong Markov Property

L’appartenenza
è assai di più della salvezza personale
è la speranza di ogni uomo che sta male
e non gli basta esser civile.
È quel vigore che si sente se fai parte di qualcosa
che in sé travolge ogni egoismo personale
on quell’aria più vitale che è davvero contagiosa.1
Giorgio Gaber

In this chapter, .X = (Xt )t≥0 denotes a Markov process with transition law p on a
filtered probability space .(Ω, F , P , Ft ) satisfying the usual conditions. The strong
Markov property is an extension of the Markov property in which the initial time is
a stopping time.

7.1 Feller and Strong Markov Properties

Definition 7.1.1 (Strong Markov property) We say that X satisfies the strong
Markov property if for every h > 0, ϕ ∈ bB, and every almost surely finite
stopping time τ , we have
. ∫R p(τ, Xτ ; τ + h, dy)ϕ(y) = E [ϕ (Xτ +h ) | Fτ ] . (7.1.1)

1 Belonging

is much more than personal salvation


it’s the hope of every man who’s struggling
and being civil isn’t enough for him.

It’s that strength you feel when you’re part of something


that overwhelms every personal selfishness
with that more vital air that is truly contagious.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 123
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_7

Theorem 7.1.2 Let X be a Markov process. If X is a right-continuous Feller


process, then it satisfies the strong Markov property.
Proof Recall from Definition 2.1.10 that the transition law p of a Feller process is
such that, for every h > 0 and ϕ ∈ bC(R), the function
. (t, x) |−→ ∫R p(t, x; t + h, dy)ϕ(y)

is continuous. Given h > 0 and ϕ ∈ bC, we prove that, setting


. Z := ∫R p(τ, Xτ ; τ + h, dy)ϕ(y),

then Z = E [ϕ (Xτ +h ) | Fτ ]. We verify the properties of conditional expectation.


First of all, Z ∈ mFτ since:
• Z = f (τ, Xτ ) with f (t, x) := ∫R p(t, x; t + h, dy)ϕ(y), which is a continuous
function by the Feller property;
• Xτ ∈ mFτ by Proposition 6.2.29-(iv), being X adapted and right-continuous
(thus progressively measurable by Proposition 6.2.28).
Secondly, we prove that for every A ∈ Fτ we have

E [Z1A ] = E [ϕ (Xτ +h ) 1A ] .
. (7.1.2)

First, consider the case where τ takes only a countable infinity of values tk , k ∈ N:
in this case, (7.1.2) follows from the fact that

. E [Z1A ] = ∑_{k=1}^{∞} E [Z1A∩(τ =tk ) ]
= ∑_{k=1}^{∞} E [∫R p(tk , Xtk ; tk + h, dy)ϕ(y) 1A∩(τ =tk ) ] =

(by the Markov property (2.2.2), since A ∩ (τ = tk ) ∈ Ftk )

. = ∑_{k=1}^{∞} E [ϕ(Xtk +h )1A∩(τ =tk ) ] = E [ϕ(Xτ +h )1A ] .

In the general case, consider the approximating sequence of stopping times defined
as

. τn (ω) = k/2^n if (k − 1)/2^n ≤ τ (ω) < k/2^n for k ∈ N,   τn (ω) = ∞ if τ (ω) = ∞.

For every n ∈ N, τn takes only a countably infinite number of values. Moreover,


τn ≥ τ and thus if A ∈ Fτ then also A ∈ Fτn and we have
. E [∫R p(τn , Xτn ; τn + h, dy)ϕ(y) 1A ] = E [ϕ(Xτn +h ) 1A ] .

By taking the limit as n → ∞, we obtain (7.1.2). This limit is justified by


the dominated convergence theorem, given that the integrands are bounded and
converge pointwise almost surely. On the right-hand side, the convergence is
ensured by the right-continuity of X and the continuity of ϕ; on the left-hand side,
by the right-continuity of X and the Feller property. ⨆
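The dyadic approximation used in the second part of the proof can be made concrete. The sketch below (an illustration; the sample value of τ is arbitrary) implements τ_n = k/2^n on the event (k−1)/2^n ≤ τ < k/2^n for a fixed value of τ and checks the three properties the proof relies on: each τ_n takes only dyadic values, τ_n ≥ τ, and τ_n decreases to τ as n grows:

```python
import math

def tau_n(tau, n):
    """Dyadic approximation from above: tau_n = k/2^n, where k is chosen
    so that (k-1)/2^n <= tau < k/2^n."""
    return (math.floor(tau * 2**n) + 1) / 2**n

tau = 0.7310987                                 # a sample (non-dyadic) value of tau
approx = [tau_n(tau, n) for n in range(1, 30)]

assert all(a > tau for a in approx)                             # tau_n > tau
assert all(b <= a for a, b in zip(approx, approx[1:]))          # decreasing in n
assert all((a * 2**n).is_integer() for n, a in enumerate(approx, 1))  # dyadic values
assert approx[-1] - tau < 2**-28                                # tau_n -> tau
```

Since each τ_n takes countably many values, the discrete-valued case applies to it, and the limit n → ∞ recovers the general statement.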

Remark 7.1.3 [!] By Theorem 7.1.2, the Brownian motion, the Poisson process,
and more generally Lévy processes (cf. Definition 6.2.19) enjoy the strong Markov
property: so we say that they are strong Markov processes.
In analogy with the results of Sect. 4.2, we have
Proposition 7.1.4 Let W = (Wt )t≥0 be a Brownian motion on (Ω, F , P , Ft ) and
τ an a.s. finite stopping time. Then the process

Wtτ := Wt+τ − Wτ ,
. t ≥ 0, (7.1.3)

is a Brownian motion on (Ω, F , P , (Ft+τ )t≥0 ). In particular, W τ is independent


of Fτ .
Proof For every η ∈ R, we have
. E [e^{iηW^τ_t} | Fτ ] = E [e^{iη(Wt+τ −Wτ )} | Fτ ]
= e^{−iηWτ} E [e^{iηWt+τ} | Fτ ]
= e^{−iηWτ} E [e^{iηWt+τ} | Wτ ] = e^{−η²t/2}

thanks to the strong Markov property in the form (7.1.1). From Theorem 4.2.10 in
[113] it follows that Wtτ ∼ N0,t and is independent of Fτ . Similarly, we prove that
Wtτ − Wsτ ∼ N0,t−s and is independent of Fτ +s for every 0 ≤ s ≤ t. ⨆
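Proposition 7.1.4 lends itself to a simulation check. The Python sketch below (an illustration on a discrete time grid, with arbitrary parameters, not the text's construction) restarts discretized Brownian paths at the first passage time of a level a and verifies that the restarted increment W_{τ+t} − W_τ has the N(0, t) moments, even though τ is random:

```python
import numpy as np

rng = np.random.default_rng(2)
dt, n_steps, n_paths = 0.02, 1500, 3000
a, t = 0.5, 1.0
m = int(round(t / dt))                     # grid steps in the window [tau, tau + t]

# discretized Brownian paths on [0, n_steps*dt] (a grid approximation)
steps = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

incs = []
for path in W:
    hit = int(np.argmax(path >= a))        # first grid index with W >= a (0 if never)
    if path[hit] >= a and hit + m < path.size:
        incs.append(path[hit + m] - path[hit])   # W^tau_t = W_{tau+t} - W_tau
incs = np.asarray(incs)

# by Proposition 7.1.4, the restarted increment is ~ N(0, t)
assert abs(incs.mean()) < 0.1
assert abs(incs.var() - t) < 0.2
```

Paths that never reach a within the horizon are discarded; since the event (τ ≤ horizon − t) belongs to F_τ, conditioning on it does not alter the law of the restarted increment.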


Fig. 7.1 Trajectories of a Brownian and its reflected process starting from .t0 = 0.2

7.2 Reflection Principle

Consider a Brownian motion W defined on the filtered space .(Ω, F , P , Ft ) and fix
t0 ≥ 0. We say that
.

. W̃t := Wt∧t0 − (Wt − Wt∧t0 ), t ≥ 0,

is the reflected process of W starting from .t0 . Figure 7.1 represents a trajectory of W
and its reflected process .W̃ starting from .t0 = 0.2.
It is not difficult to check2 that .W̃ also is a Brownian motion on .(Ω, F , P , Ft ).
It is noteworthy that this result generalizes to the case where .t0 is a stopping time.
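The construction of the reflected process is easy to reproduce numerically. The sketch below (an illustrative grid approximation with arbitrary parameters, not from the text) builds W̃ from simulated paths, checks that it coincides with W up to t₀ and is mirrored around the level W_{t₀} afterwards, and that its variance at a later time is still that of a Brownian motion:

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n_steps, n_paths, t0 = 0.01, 200, 10000, 0.2
k0 = int(round(t0 / dt))                                 # grid index of t0

steps = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

# reflected process from t0:  W~_t = W_{t ^ t0} - (W_t - W_{t ^ t0})
W_stop = W[:, np.minimum(np.arange(n_steps + 1), k0)]    # W_{t ^ t0}
W_ref = 2.0 * W_stop - W

assert np.allclose(W_ref[:, :k0 + 1], W[:, :k0 + 1])     # same path up to t0
assert np.allclose(W_ref[:, k0:], 2.0 * W[:, [k0]] - W[:, k0:])  # mirrored after t0
# W~ is still a Brownian motion: e.g. Var(W~_T) = T at T = n_steps*dt = 2.0
assert abs(W_ref[:, -1].var() - n_steps * dt) < 0.12
```

The first two assertions are exact identities of the construction; the last one is a statistical check, within Monte Carlo error.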
Theorem 7.2.1 (Reflection Principle) [!] Let .W = (Wt )t≥0 be a Brownian motion
on the filtered space .(Ω, F , P , Ft ) and .τ a stopping time. Then the reflected
process starting from .τ , defined as

. W̃t := Wt∧τ − (Wt − Wt∧τ ), t ≥ 0,

is a Brownian motion on .(Ω, F , P , Ft ).

2 For .s ≤ t we have

. W̃t = Wt if t ≤ t0 ,   W̃t = 2Wt0 − Wt if t > t0 ,

so that .W̃t ∈ mFt . Moreover,

. W̃t − W̃s = Wt − Ws if s, t ≤ t0 ;   W̃t − W̃s = (Wt0 − Ws ) − (Wt − Wt0 ) if s < t0 < t;   W̃t − W̃s = −(Wt − Ws ) if t0 ≤ s, t;

and therefore .W̃t − W̃s is independent of .Fs and has distribution .N0,t−s .

Proof It is enough to prove the thesis on a time interval .[0, T ] for a fixed .T > 0
and therefore it is not restrictive to assume .τ < ∞ so that the Brownian motion .W τ
in (7.1.3) is well defined. We observe that

. Wt = Wt∧τ + W^τ_{t−τ} 1(t≥τ ) ,   W̃t = Wt∧τ − W^τ_{t−τ} 1(t≥τ ) .

The thesis follows from the fact that, being a Brownian motion, .W τ is equal in law
to .−W τ and is independent of .Fτ and therefore of .Wt∧τ and of .τ : it follows that
W and .W̃ are equal in law. ⨆

Consider the process of the maximum of W, defined by

. W̄t := max_{s∈[0,t]} Ws , t ≥ 0.

Corollary 7.2.2 For every .a > 0 we have

P (W̄t ≥ a) = 2P (Wt ≥ a),


. t ≥ 0. (7.2.1)

Proof We decompose .(W̄t ≥ a) into the disjoint union

(W̄t ≥ a) = (Wt > a) ∪ (Wt ≤ a, W̄t ≥ a).


.

We introduce the stopping time

τa := inf{t ≥ 0 | Wt ≥ a}
.

and the reflected process .W̃ of W starting from .τa . Then we have3

. (Wt ≤ a, W̄t ≥ a) = (W̃t ≥ a)

and the thesis follows from the reflection principle. ⨆



Remark 7.2.3 [!] Some notable consequences of Corollary 7.2.2 are:
(i) since .P (|Wt | ≥ a) = 2P (Wt ≥ a), from (7.2.1) it follows that .W̄t and .|Wt |
are equal in law;
(ii) since .(τa ≤ t) = (W̄t ≥ a), from (7.2.1) we have
. P (τa ≤ t) = 2P (Wt ≥ a) = (2/√π) ∫_{a/√(2t)}^{∞} e^{−y²} dy, (7.2.2)

3 We set .A = (Wt ≤ a, W̄t ≥ a) and .B = (W̃t ≥ a). If .ω ∈ A then .τa (ω) ≤ t and therefore
.W̃t (ω) = 2Wτa (ω) (ω) − Wt (ω) = 2a − Wt (ω) ≥ a from which .ω ∈ B. Conversely, assume .W̃t (ω) ≥ a:
if .τa (ω) > t we would have .a ≤ W̃t (ω) = Wt (ω) which is absurd. Then it must be .τa (ω) ≤ t and
therefore obviously .W̄t (ω) ≥ a and also .a ≤ W̃t (ω) = 2a − Wt (ω) so that .Wt (ω) ≤ a.

so that

. P (τa < +∞) = lim_{n→+∞} P (τa ≤ n) = 1

and, by differentiating (7.2.2), we obtain the expression of a density of .τa :

. γτa (t) = (a e^{−a²/(2t)} / (√(2π) t^{3/2})) 1]0,+∞[ (t);

(iii) for every .ε > 0

P (Wt ≤ 0 ∀t ∈ [0, ε]) = P (W̄ε ≤ 0) = P (|Wε | ≤ 0) = 0.


.
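Identity (7.2.1) and point (i) above can be verified by simulation. In the sketch below (an illustration with arbitrary parameters; the grid maximum slightly underestimates the true running maximum, hence the loose tolerances), the frequency of (W̄₁ ≥ a) is compared with 2P(W₁ ≥ a) computed from the Gaussian tail:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
dt, n_steps, n_paths, a = 1e-3, 1000, 10000, 1.0          # horizon t = 1

W = np.cumsum(sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)
running_max = W.max(axis=1)           # grid version of the maximum process at t = 1

p_max = (running_max >= a).mean()
p_tail = 2.0 * (1.0 - 0.5 * (1.0 + erf(a / sqrt(2.0))))   # 2 P(W_1 >= a), exact

# (7.2.1): P(max <= 1 of W >= a) = 2 P(W_1 >= a), up to discretization/MC error
assert abs(p_max - p_tail) < 0.03
# (i): the running maximum and |W_1| are equal in law; compare the means
assert abs(running_max.mean() - np.abs(W[:, -1]).mean()) < 0.04
```

Refining the grid (smaller dt) shrinks the discretization bias of the simulated maximum, so the empirical frequency approaches 2P(W₁ ≥ a) ≈ 0.317 for a = 1.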

7.3 The Homogeneous Case

We set .I = R≥0 and suppose that X is the canonical version (cf. Proposition 2.2.6)
of a Markov process with time-homogeneous transition law p: thus, X is defined on
the complete space .(RI , FμI , μ, F X ) where .μ is the law of the process X and .F X
is the standard filtration of X (cf. Definition 6.2.11). Moreover .Xt (ω) = ω(t) for
every .t ≥ 0 and .ω ∈ RI .
To express the Markov property more effectively, we introduce the family of
translations .(θt )t≥0 defined by

θt : RI −→ RI ,
. (θt ω)(s) = ω(t + s), s ≥ 0, ω ∈ RI .

Intuitively, the translation operator .θt “cuts and removes” the part of the trajectory .ω
up to time t. Given a random variable Y , we denote by .Y ◦ θt the translated random
variable defined by

(Y ◦ θt )(ω) := Y (θt (ω)),


. ω ∈ RI .

Note that .(Xs ◦ θt )(ω) = ω(t + s) = Xt+s (ω) or, more simply,

Xs ◦ θt = Xt+s .
.
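The action of the translation operators can be mimicked with trajectories represented as plain functions. The toy sketch below (an illustration, with an arbitrary sample trajectory) encodes θ_t as "cut and remove the trajectory up to time t" and checks the identity X_s ∘ θ_t = X_{t+s}, as well as its extension to a functional Y = ϕ(X_s):

```python
# trajectories as functions of time; theta_t cuts and removes the initial
# segment of the path up to time t
def theta(t):
    return lambda omega: (lambda s: omega(t + s))

def X(s):                                # canonical process: X_s(omega) = omega(s)
    return lambda omega: omega(s)

omega = lambda u: u**2 - 3.0 * u         # an arbitrary sample trajectory
t, s = 2.0, 1.5

assert X(s)(theta(t)(omega)) == X(t + s)(omega)    # X_s composed with theta_t = X_{t+s}

# a translated functional Y composed with theta_t, with Y = phi(X_s):
phi = abs
Y = lambda w: phi(X(s)(w))
assert Y(theta(t)(omega)) == phi(X(t + s)(omega))
```

This is exactly the mechanism used in (7.3.1): translating a functional of the path amounts to evaluating it along the shifted trajectory.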

In the following statement, we denote by

Ex [Y ] := E [Y | X0 = x]
.

a version of the conditional expectation function of Y given .X0 (cf. Definition 4.2.16
in [113]) and .F0,∞X = σ (Xs , s ≥ 0) (cf. (2.2.6)).

Theorem 7.3.1 (Strong Markov Property in the Homogeneous Case [!]) Let
X be the canonical version of a strong Markov process with time-homogeneous
transition law. For every a.s. finite stopping time .τ and for every .Y ∈ bF0,∞
X , we

have

EXτ [Y ] = E [Y ◦ θτ | Fτ ] .
. (7.3.1)

Proof For clarity, we explicitly observe that the left-hand side of (7.3.1) indicates
the function .Ex [Y ] evaluated at .x = Xτ . If X satisfies the strong Markov
property (7.1.1), we have

. E [ϕ (Xh ) ◦ θτ | Fτ ] = E [ϕ (Xτ +h ) | Fτ ] = ∫R p(τ, Xτ ; τ + h, dy)ϕ(y) =

(by the homogeneity assumption)


. = ∫R p(0, Xτ ; h, dy)ϕ(y) = EXτ [ϕ(Xh )]

which proves (7.3.1) for .Y = ϕ(Xh ) with .h ≥ 0 and .ϕ ∈ bB. The general case is
proved as in Theorem 2.2.4, first extending (7.3.1) to the case


. Y = ∏_{i=1}^{n} ϕi (Xhi )

with .0 ≤ h1 < · · · < hn and .ϕ1 , . . . , ϕn ∈ bB, and finally using the second
Dynkin’s theorem. ⨆

All the results on Markov processes encountered thus far seamlessly extend to
the multidimensional case, where processes take values in .Rd , without encountering
any significant difficulty. The following Theorem 7.3.2 is preliminary to the study
of the relationship between Markov processes and harmonic functions: we recall
that a harmonic function is a solution of the Laplace operator or more generally
of a partial differential equation of elliptic type. We assume the following general
hypotheses:
• D is an open set in .Rd ;
• X is the canonical version of a strong Markov process with values in .Rd ;
• X is continuous and has a time-homogeneous transition law p;
• .X0 ∈ D a.s.;

• .τD < ∞ a.s. where .τD is the exit time of X from D (cf. Example 6.2.3).

We denote by .∂D the boundary of D and observe that, based on the assumptions
made, .XτD ∈ ∂D a.s. In the following statement, .Ex [·] ≡ E [· | X0 = x] indicates
the conditional expectation function given .X0 .
Theorem 7.3.2 Let .ϕ ∈ bB(∂D). If 4
. u(x) = Ex [ϕ(XτD )] (7.3.2)

then we have:
(i) the process .(u(Xt∧τD ))t≥0 is a martingale with respect to the filtration
.(F^X_{t∧τD} )t≥0 ;
(ii) for every .y ∈ D and .ϵ > 0 such that .D(y, ϵ) := {z ∈ Rd | |z − y| < ϵ} ⊆ D
we have
. u(x) = Ex [u(XτD(y,ϵ) )] (7.3.3)

where .τD(y,ϵ) indicates the exit time of X from .D(y, ϵ).


Proof The proof is based on the crucial remark that if .τ is a stopping time and
τ ≤ τD , then we have
.

XτD ◦ θτ = XτD .
. (7.3.4)

More explicitly, for every .ω ∈ RI we have

(XτD ◦ θτ )(ω) = XτD (θτ (ω)) = XτD (ω)


.

since the trajectory .ω and the trajectory .θτ (ω), obtained by cutting and removing the
part of .ω up to the instant .τ (ω), exit D for the first time at the same point .XτD (ω).
Let us prove (i): for .0 ≤ s ≤ t we have
. E [u(Xt∧τD ) | Fs∧τD ] = E [EXt∧τD [ϕ(XτD )] | Fs∧τD ] =

(by the strong Markov property (7.3.1), since .ϕ(XτD ) ∈ bF0,∞X )

. = E [E [ϕ(XτD ) ◦ θt∧τD | Ft∧τD ] | Fs∧τD ] =

(by (7.3.4) with .τ = t ∧ τD )


. = E [E [ϕ(XτD ) | Ft∧τD ] | Fs∧τD ] =

4 Formula (7.3.2) means that u is a version of the conditional expectation function of .ϕ(XτD ) given .X0 .

(since .Fs∧τD ⊆ Ft∧τD )


. = E [ϕ(XτD ) | Fs∧τD ] =

(reapplying the strong Markov property (7.3.1))


⎡ ⎤
. = EXs∧τD ϕ(XτD ) = u(Xs∧τD ).

Now let us prove (ii). If .x ∈


/ D(y, ϵ), .τD(y,ϵ) = 0 and the thesis is an obvious
consequence of Example 4.2.18 in [113]. If .x ∈ D(y, ϵ), we observe that .τD(y,ϵ) ≤
τD < ∞ a.s. since X is continuous and applying the optional sampling theorem, in
the form of Theorem 8.5.4, to the martingale .Mt := u(Xt∧τD ) we have
⎡ ⎤
M0 = E MτD(y,ϵ) | F0X
.

that is
⎡ ⎤
u(X0 ) = E u(XτD(y,ϵ) ) | X0
.

which proves (7.3.3). ⨆
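As a concrete illustration (our own sketch, not part of the text): for a one-dimensional Brownian motion and $D = (0,1)$, taking $\varphi = \mathbb{1}_{\{1\}}$ in (7.3.2) gives the classical harmonic function $u(x) = x$, the probability of exiting at the right endpoint. A crude random-walk discretization (the function names and the step size are our choices) recovers this numerically:

```python
import random

def exit_point(x, step=0.02, rng=random):
    # Symmetric random walk started at x, a crude discretization of
    # Brownian motion, run until it leaves D = (0, 1).
    while 0.0 < x < 1.0:
        x += step if rng.random() < 0.5 else -step
    return max(0.0, min(1.0, x))

def u(x, n_paths=2000, rng=None):
    # Monte Carlo estimate of u(x) = E^x[phi(X_{tau_D})] with phi = 1_{{1}}:
    # the probability that the exit point of D is the right endpoint.
    rng = rng or random.Random(0)
    hits = sum(1 for _ in range(n_paths) if exit_point(x, rng=rng) >= 1.0)
    return hits / n_paths

print(u(0.3))  # close to 0.3, the harmonic function with u(0)=0, u(1)=1
```

The mean-value property (7.3.3) then says that $u(x)$ is also the average of $u$ over the exit point from any small ball around $x$; for $u(x)=x$ this is just the martingale property of the walk.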



Chapter 8
Continuous Martingales

Il non poter essere soddisfatto da alcuna cosa terrena, nè, per


dir così, dalla terra intera; considerare l’ampiezza inestimabile
dello spazio, il numero e la mole maravigliosa dei mondi, e
trovare che tutto è poco e piccino alla capacità dell’animo
proprio; immaginarsi il numero dei mondi infinito, e l’universo
infinito, e sentire che l’animo e il desiderio nostro sarebbe
ancora più grande che sì fatto universo; e sempre accusare le
cose d’insufficienza e di nullità, e patire mancamento e vòto, e
però noia, pare a me il maggior segno di grandezza e di nobiltà,
che si vegga della natura umana.1
Giacomo Leopardi

In this chapter, we extend some important results from the discrete to the continuous case, such as the optional sampling theorem and Doob's maximal inequalities for martingales. The general strategy consists of three steps:
• the results are first extended from the discrete case, in which the number of time instants is finite, to the case in which the time instants are the so-called dyadic rationals defined by
$$\mathcal{D} := \bigcup_{n\ge 1}\mathcal{D}_n, \qquad \mathcal{D}_n := \left\{\tfrac{k}{2^n} \;\middle|\; k\in\mathbb{N}_0\right\} = \left\{0, \tfrac{1}{2^n}, \tfrac{2}{2^n}, \tfrac{3}{2^n}, \dots\right\}.$$

1 The inability to be satisfied by any earthly thing, nor, so to speak, by the entire earth; to consider
the immeasurable vastness of space, the wondrous number and magnitude of the worlds, and to
find that everything is small and insufficient for the capacity of one’s own soul; to imagine the
number of worlds as infinite, and the universe as infinite, and to feel that our soul and desire would
still be greater than this vast universe; and to always accuse things of inadequacy and nullity, and
to suffer from lack and emptiness, and therefore boredom - this, to me, seems the greatest sign of
greatness and nobility that one can perceive in human nature. Translation by J. Galassi

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 133
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_8

We observe that $\mathcal{D}_n \subseteq \mathcal{D}_{n+1}$ for every $n\in\mathbb{N}$ and that $\mathcal{D}$ is a countable dense subset of $\mathbb{R}_{\ge 0}$;
• under the assumption of right-continuity of the trajectories, it is almost immediate to extend the validity of the results from the dyadic to the continuous case;
• finally, the assumption of continuity of the trajectories is not restrictive, since every martingale admits a modification with càdlàg trajectories: the proof is based on Doob's maximal inequalities (which allow one to prove that the trajectories do not diverge almost surely) and on the upcrossing lemma (which allows one to prove that the trajectories do not oscillate almost surely). The third fundamental ingredient is Vitali's convergence theorem (Theorem C.0.2 in [113]), which guarantees the preservation of the martingale property when taking the limit.
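The nesting $\mathcal{D}_n \subseteq \mathcal{D}_{n+1}$ and the density of $\mathcal{D}$ can be checked mechanically; here is a small sketch of ours, using exact rational arithmetic:

```python
from fractions import Fraction

def dyadics(n, t_max=1):
    # The points of D_n = {k / 2^n : k in N_0} lying in [0, t_max].
    return {Fraction(k, 2 ** n) for k in range(t_max * 2 ** n + 1)}

print(sorted(dyadics(2)))        # 0, 1/4, 1/2, 3/4, 1
print(dyadics(2) <= dyadics(3))  # True: D_n is contained in D_{n+1}
```

Density on $[0, t_{\max}]$ follows since consecutive points of $\mathcal{D}_n$ are at distance $2^{-n}$.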
In the second part of the chapter, we introduce some remarkable martingale spaces
that will play a central role in the theory of stochastic integration. We also give
the definition of local martingale, a notion that generalizes that of martingale by
weakening the integrability assumptions.

8.1 Optional Sampling and Maximal Inequalities

Consider a filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$. In this section, we do not assume the usual conditions on the filtration. Hereafter, for fixed $T>0$, we use the notation
$$\mathcal{D}(T) := \bigcup_{n\ge 1}\mathcal{D}_{T,n}, \qquad \mathcal{D}_{T,n} := \left\{\tfrac{kT}{2^n} \;\middle|\; k = 0,1,\dots,2^n\right\}, \quad n\in\mathbb{N}. \tag{8.1.1}$$
Lemma 8.1.1 (Doob's Maximal Inequalities on Dyadics) Let $X=(X_t)_{t\ge0}$ be a martingale or a non-negative sub-martingale. For every $T,\lambda>0$ and $p>1$, we have
$$P\left(\sup_{t\in\mathcal{D}(T)}|X_t| \ge\lambda\right) \le \frac{E[|X_T|]}{\lambda}, \tag{8.1.2}$$
$$E\left[\sup_{t\in\mathcal{D}(T)}|X_t|^p\right] \le \left(\frac{p}{p-1}\right)^p E\big[|X_T|^p\big]. \tag{8.1.3}$$

Proof If X is a martingale, then $|X|$ is a non-negative sub-martingale by Proposition 1.4.12. Therefore, it suffices to prove the thesis for a non-negative sub-martingale X. For fixed $T>0$ and each $n\in\mathbb{N}$, we consider the process $(X_t)_{t\in\mathcal{D}_{T,n}}$, which is a non-negative discrete sub-martingale with respect to the filtration $(\mathcal{F}_t)_{t\in\mathcal{D}_{T,n}}$, and set
$$M_n := \sup_{t\in\mathcal{D}_{T,n}} X_t, \qquad M := \sup_{t\in\mathcal{D}(T)} X_t.$$
Fix $\epsilon>0$. Recalling that $\mathcal{D}_{T,n}\subseteq\mathcal{D}_{T,n+1}$, by Beppo Levi's theorem we have²
$$P(M>\lambda-\epsilon) = \lim_{n\to\infty} P(M_n>\lambda-\epsilon) \le$$
(by Doob's maximal inequality for discrete sub-martingales, Theorem 6.1.11)
$$\le \frac{E[X_T]}{\lambda-\epsilon}.$$
Formula (8.1.2) follows from the arbitrariness of $\epsilon$.
Now let $p>1$. Since $\mathcal{D}_{T,n}\subseteq\mathcal{D}_{T,n+1}$ and $M_n^p = \sup_{t\in\mathcal{D}_{T,n}} X_t^p$, we have $0\le M_n^p \nearrow M^p = \sup_{t\in\mathcal{D}(T)} X_t^p$ as $n\to\infty$. Then, by Beppo Levi's theorem, we have
$$E[M^p] = \lim_{n\to\infty} E[M_n^p] \le$$
(by Doob's maximal inequality for discrete sub-martingales, Theorem 6.1.11)
$$\le \left(\frac{p}{p-1}\right)^p E\big[X_T^p\big]. \qquad\square$$

² Note that
$$P(M>\lambda-\epsilon) = E\big[\mathbb{1}_{(M>\lambda-\epsilon)}\big] = \lim_{n\to\infty} E\big[\mathbb{1}_{(M_n>\lambda-\epsilon)}\big] = \lim_{n\to\infty} P(M_n>\lambda-\epsilon),$$
since the sequence $\mathbb{1}_{(M_n>\lambda-\epsilon)}$ is monotonically increasing.

In the following statements, we will always assume the right-continuity of the processes: we will see in Sect. 8.2 that, if the filtration satisfies the usual conditions, every martingale admits a càdlàg modification.

Theorem 8.1.2 (Doob's Maximal Inequalities [!]) Let $X=(X_t)_{t\ge0}$ be a right-continuous martingale (or a non-negative sub-martingale). For every $T,\lambda>0$ and $p>1$ we have
$$P\left(\sup_{t\in[0,T]}|X_t| \ge\lambda\right) \le \frac{E[|X_T|]}{\lambda}, \tag{8.1.4}$$
$$E\left[\sup_{t\in[0,T]}|X_t|^p\right] \le \left(\frac{p}{p-1}\right)^p E\big[|X_T|^p\big]. \tag{8.1.5}$$

Proof The thesis is an immediate consequence of Lemma 8.1.1 since, if X has right-continuous trajectories, then $\sup_{t\in[0,T]}|X_t| = \sup_{t\in\mathcal{D}(T)}|X_t|$. □
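A quick empirical sanity check of (8.1.4) (our own sketch, with the simple symmetric random walk playing the role of the martingale):

```python
import random

def doob_check(n_steps=100, lam=25.0, n_paths=4000, seed=0):
    # Compare the empirical P(max_k |S_k| >= lam) with the Doob bound
    # E[|S_N|] / lam for the symmetric random walk S, a discrete martingale.
    rng = random.Random(seed)
    exceed, abs_end = 0, 0.0
    for _ in range(n_paths):
        s, m = 0, 0
        for _ in range(n_steps):
            s += 1 if rng.random() < 0.5 else -1
            m = max(m, abs(s))
        exceed += m >= lam
        abs_end += abs(s)
    return exceed / n_paths, abs_end / (n_paths * lam)

lhs, rhs = doob_check()
print(lhs <= rhs)  # the maximal inequality holds in the sample
```
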

In analogy with the discrete case, we have the following simple

Corollary 8.1.3 (Doob's Maximal Inequalities [!]) Let $X=(X_t)_{t\ge0}$ be a right-continuous martingale (or a non-negative sub-martingale). For every $\lambda>0$, $p>1$ and every stopping time $\tau$ such that $\tau\le T$ a.s. for some $T$, we have
$$P\left(\sup_{t\in[0,\tau]}|X_t| \ge\lambda\right) \le \frac{E[|X_\tau|]}{\lambda},$$
$$E\left[\sup_{t\in[0,\tau]}|X_t|^p\right] \le \left(\frac{p}{p-1}\right)^p E\big[|X_\tau|^p\big].$$

Proof We will see later (cf. Corollary 8.4.1) that stopping a right-continuous martingale results in a martingale. Then the thesis follows from Theorem 8.1.2 applied to $(X_{t\wedge\tau})_{t\ge0}$. □

To extend some results on stopping times and martingales from the discrete case to the continuous one, the following technical approximation result is useful.

Lemma 8.1.4 Let $\tau:\Omega\to[0,+\infty]$ be a stopping time. There exists a sequence $(\tau_n)_{n\in\mathbb{N}}$ of discrete stopping times (cf. Definition 6.1.1)
$$\tau_n : \Omega \longrightarrow \left\{\tfrac{k}{2^n} \;\middle|\; k=1,2,\dots,n2^n\right\}$$
such that:
(i) $\tau_n \to \tau$ as $n\to\infty$;
(ii) $\tau_{n+1}(\omega) \le \tau_n(\omega)$ if $n > \tau(\omega)$.

Proof For each $n\in\mathbb{N}$ we set
$$\tau_n(\omega) = \begin{cases} \frac{k}{2^n} & \text{if } \frac{k-1}{2^n} \le \tau(\omega) < \frac{k}{2^n} \text{ for some } k\in\{1,2,\dots,n2^n\},\\[4pt] n & \text{if } \tau(\omega)\ge n.\end{cases}$$
For every $\omega\in\Omega$ and $n\in\mathbb{N}$ such that $\tau(\omega)<n$ we have
$$\tau_n(\omega) - \tfrac{1}{2^n} \le \tau(\omega) \le \tau_n(\omega),$$
which proves (i) and (ii). Finally, for every fixed $n\in\mathbb{N}$, $\tau_n$ is a discrete stopping time with respect to the filtration defined by $\mathcal{F}_{\frac{k}{2^n}}$ for $k=0,1,\dots,n2^n$, since we have
$$\left(\tau_n = \tfrac{k}{2^n}\right) = \left(\tfrac{k-1}{2^n} \le \tau < \tfrac{k}{2^n}\right) \in \mathcal{F}_{\frac{k}{2^n}}, \qquad k=0,1,\dots,n2^n-1,$$
$$(\tau_n = n) = \left(\tau \ge n - \tfrac{1}{2^n}\right) = \left(\tau < n - \tfrac{1}{2^n}\right)^c \in \mathcal{F}_{n-\frac{1}{2^n}} \subseteq \mathcal{F}_n. \qquad\square$$

Remark 8.1.5 Based on (ii) of Lemma 8.1.4, if $\tau(\omega)<\infty$, the approximating sequence $(\tau_n(\omega))_{n\in\mathbb{N}}$ is monotonically decreasing at least for large n. On the other hand, if $\tau(\omega)=\infty$ then $\tau_n(\omega)=n$.
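The approximation of Lemma 8.1.4 is completely explicit and can be coded directly (a sketch of ours; `tau` is a plain number standing for the value $\tau(\omega)$):

```python
import math

def tau_n(tau, n):
    # Dyadic approximation of Lemma 8.1.4: the point k/2^n with
    # (k-1)/2^n <= tau < k/2^n, replaced by n when tau >= n.
    if tau >= n:
        return float(n)
    k = math.floor(tau * 2 ** n) + 1
    return k / 2 ** n

tau = 0.7303
print([tau_n(tau, n) for n in range(1, 8)])  # decreases to tau from above
```

By construction $0 < \tau_n(\omega) - \tau(\omega) \le 2^{-n}$ when $\tau(\omega) < n$, which is exactly properties (i) and (ii).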
We give a first version of the optional sampling theorem; we will see a second one, with weaker assumptions on the stopping times, in Theorem 8.5.4.

Theorem 8.1.6 (Optional Sampling Theorem [!!!]) Let $X=(X_t)_{t\ge0}$ be a right-continuous sub-martingale. If $\tau_1$ and $\tau_2$ are stopping times such that $\tau_1\le\tau_2\le T$ for some $T>0$, then we have
$$X_{\tau_1} \le E\big[X_{\tau_2} \mid \mathcal{F}_{\tau_1}\big].$$

Proof Suppose first that X is a right-continuous martingale. Consider the sequences $(\tau_{i,n})_{n\in\mathbb{N}}$, $i=1,2$, of discrete stopping times constructed as in Lemma 8.1.4, such that $\tau_{i,n}\to\tau_i$ as $n\to\infty$: by construction we also have $\tau_{1,n}\le\tau_{2,n}$ for every $n\in\mathbb{N}$. Let $\bar\tau_{i,n} := \tau_{i,n}\wedge T$. Due to the monotonicity property of $\bar\tau_{i,n}$ (cf. Lemma 8.1.4-(ii)) and the right-continuity of X, we have $X_{\bar\tau_{i,n}}\to X_{\tau_i}$ as $n\to\infty$. On the other hand, by the discrete version of the optional sampling theorem (cf. Theorem 6.1.10), we have
$$X_{\bar\tau_{i,n}} = E\big[X_T \mid \mathcal{F}_{\bar\tau_{i,n}}\big] \tag{8.1.6}$$
and therefore, by Proposition C.0.7 in [113] (and Remark C.0.8 in [113]), the sequences $(X_{\bar\tau_{i,n}})_{n\in\mathbb{N}}$ are uniformly integrable. Then, by Vitali's convergence theorem C.0.2 in [113], we also have convergence in $L^1(\Omega,P)$:
$$X_{\bar\tau_{i,n}} \xrightarrow{\;L^1\;} X_{\tau_i} \quad\text{as } n\to\infty, \qquad i=1,2. \tag{8.1.7}$$
Again by the optional sampling Theorem 6.1.10, we have
$$X_{\bar\tau_{1,n}} = E\big[X_{\bar\tau_{2,n}} \mid \mathcal{F}_{\bar\tau_{1,n}}\big]$$
so that, conditioning on $\mathcal{F}_{\tau_1}$ and using the tower property, we get
$$E\big[X_{\bar\tau_{1,n}} \mid \mathcal{F}_{\tau_1}\big] = E\big[X_{\bar\tau_{2,n}} \mid \mathcal{F}_{\tau_1}\big].$$
The thesis follows by taking the limit as $n\to\infty$, thanks to (8.1.7) and recalling that the convergence in $L^1(\Omega,P)$ of $X_{\bar\tau_{i,n}}$ implies the convergence of the conditional expectations $E\big[X_{\bar\tau_{i,n}} \mid \mathcal{F}_{\tau_1}\big]$ (cf. Theorem 4.2.10 in [113]).
If X is a sub-martingale, the proof is completely analogous, except that uniform integrability cannot be deduced directly from (8.1.6) and requires a slightly more subtle argument: for details, we refer to [6], Theorem 5.13. □

The following useful result shows that the martingale property is equivalent to the property of having constant expectation over time, at least if we also consider random times (more precisely, bounded stopping times).

Theorem 8.1.7 ([!]) Let $X=(X_t)_{t\ge0}$ be an adapted, right-continuous and absolutely integrable (i.e., such that $X_t\in L^1(\Omega,P)$ for every $t\ge0$) process. Then X is a martingale if and only if $E[X_\tau]=E[X_0]$ for every bounded³ stopping time $\tau$.

Proof If X is a right-continuous martingale⁴ then it is constant on average over bounded stopping times by the optional sampling Theorem 8.1.6. Conversely, since X is adapted by hypothesis, it remains only to verify that
$$E[X_t\mathbb{1}_A] = E[X_s\mathbb{1}_A], \qquad s\le t,\ A\in\mathcal{F}_s.$$
To this end, we consider
$$\tau := s\mathbb{1}_A + t\mathbb{1}_{A^c},$$
which is easily verified to be a bounded stopping time. Then, by hypothesis, we have
$$E[X_0] = E[X_\tau] = E[X_s\mathbb{1}_A] + E[X_t\mathbb{1}_{A^c}],$$
$$E[X_0] = E[X_t] = E[X_t\mathbb{1}_A] + E[X_t\mathbb{1}_{A^c}],$$
and subtracting one equation from the other yields the thesis. □

³ That is, there exists $T>0$ such that $\tau\le T$.
⁴ Under the usual conditions on the filtration, this assumption is not restrictive, since we will see in Sect. 8.2 that every martingale admits a càdlàg modification.

8.2 Càdlàg Martingales

In this section, we prove that, under the usual conditions on the filtration, every martingale admits a càdlàg modification, so that the right-continuity assumption made in the statements of the previous section can be removed. We first prove that a martingale can only have jump discontinuities (with jumps of finite size) on the dyadic rationals of $\mathbb{R}_{\ge0}$.

Lemma 8.2.1 Let $X=(X_t)_{t\in\mathcal{D}}$ be a martingale or a non-negative sub-martingale. There exists a negligible event N such that, for every $t\ge0$, the limits
$$\lim_{\substack{s\to t^-\\ s\in\mathcal{D}}} X_s(\omega), \qquad \lim_{\substack{s\to t^+\\ s\in\mathcal{D}}} X_s(\omega) \tag{8.2.1}$$
exist and are finite for every $\omega\in\Omega\setminus N$. Moreover, if $\sup_{t\in\mathcal{D}} E[|X_t|]<\infty$, then also the limit
$$\lim_{\substack{t\to+\infty\\ t\in\mathcal{D}}} X_t(\omega) \tag{8.2.2}$$
exists and is finite, for $\omega\in\Omega\setminus N$.


Proof The idea of the proof is as follows. The limits in (8.2.1) can diverge or fail to exist only in two cases: if $\sup_{t\in\mathcal{D}}|X_t(\omega)|=\infty$, or if there exists a non-trivial interval $[a,b]$ that is "crossed" by X an infinite number of times. Doob's maximal inequality and the upcrossing lemma exclude these two possibilities or, more precisely, imply that they occur only for $\omega$ belonging to a negligible event.
Consider first the case where $\kappa := \sup_{t\in\mathcal{D}} E[|X_t|]<\infty$. For fixed $n\in\mathbb{N}$, we apply the maximal inequality (6.1.7) and the upcrossing Lemma 6.1.13 to the non-negative discrete sub-martingale $(|X_t|)_{t\in\mathcal{D}_n\cap[0,n]}$: for every $\lambda>0$ and $0\le a<b$, we have
$$P\left(\max_{t\in\mathcal{D}_n\cap[0,n]}|X_t| \ge\lambda\right) \le \frac{E[|X_n|]}{\lambda} \le \frac{\kappa}{\lambda},$$
$$E\big[\nu_{n,a,b}\big] \le \frac{E\big[(|X_n|-a)^+\big]}{b-a} \le \frac{\kappa}{b-a},$$
where $\nu_{n,a,b}$ is the number of upcrossings of $(|X_t|)_{t\in\mathcal{D}_n\cap[0,n]}$ on $[a,b]$. Taking the limit as $n\to\infty$ and using Beppo Levi's theorem, we have
$$P\left(\sup_{t\in\mathcal{D}}|X_t| \ge\lambda\right) \le \frac{\kappa}{\lambda}, \qquad E\big[\nu_{a,b}\big] \le \frac{\kappa}{b-a},$$
where $\nu_{a,b}$ is the number of upcrossings of $(|X_t|)_{t\in\mathcal{D}}$ on $[a,b]$. This implies the existence of two negligible events $N_0$ and $N_{a,b}$ for which
$$\sup_{t\in\mathcal{D}}|X_t| < \infty \ \text{ on } \Omega\setminus N_0, \qquad \nu_{a,b} < \infty \ \text{ on } \Omega\setminus N_{a,b}.$$
Also the event
$$N := \bigcup_{\substack{a,b\in\mathbb{Q}\\ 0\le a<b}} N_{a,b} \,\cup\, N_0$$
is negligible: for every $\omega\in\Omega\setminus N$ we have $\sup_{t\in\mathcal{D}}|X_t(\omega)|<\infty$ and, on every interval with non-negative rational endpoints, there are only a finite number of upcrossings of $|X(\omega)|$; consequently, the limits in (8.2.1) and (8.2.2) exist and are finite on $\Omega\setminus N$.
Now consider the case where X is a generic martingale. For every $n\in\mathbb{N}$, we can apply what has just been proven to the stopped process $(X_{t\wedge n})_{t\in\mathcal{D}}$. Indeed, it is immediate to verify that $(X_{t\wedge n})_{t\in\mathcal{D}}$ is a martingale and
$$\sup_{t\in\mathcal{D}} E[|X_{t\wedge n}|] \le E[|X_n|]$$
since, by Proposition 1.4.12, $(|X_{t\wedge n}|)_{t\in\mathcal{D}}$ is a sub-martingale. Hence the limits in (8.2.1) exist and are finite almost surely for $t\le n$. The thesis follows from the arbitrariness of $n\in\mathbb{N}$. □

The argument used in the second part of the proof of Lemma 8.2.1 is easily adapted to prove the following

Theorem 8.2.2 ([!]) Let $X=(X_n)_{n\in\mathbb{N}}$ be a discrete martingale such that $\sup_{n\in\mathbb{N}} E[|X_n|]<\infty$. Then the pointwise limit
$$X_\infty := \lim_{n\to\infty} X_n$$
exists and is a.s. finite.
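A concrete instance (our own sketch): for U uniform on [0,1], the process $X_n = E[U \mid \text{first } n \text{ binary digits of } U]$ is a bounded martingale, so $\sup_n E[|X_n|]<\infty$ and Theorem 8.2.2 predicts a.s. convergence; here in fact $X_n \to U$ with an explicit rate:

```python
import random

def martingale_path(u, n_max):
    # X_n = E[U | first n binary digits of U]: the truncated dyadic
    # expansion of u plus the conditional mean 2^{-(n+1)} of the unseen tail.
    xs = []
    for n in range(1, n_max + 1):
        k = int(u * 2 ** n)  # integer encoding the first n digits
        xs.append(k / 2 ** n + 2 ** -(n + 1))
    return xs

u = random.Random(0).random()
xs = martingale_path(u, 30)
print(abs(xs[-1] - u) < 2 ** -29)  # |X_n - U| <= 2^{-(n+1)}, so this is True
```
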

The usual conditions, in particular the right-continuity of the filtration, play a crucial role in the proof of the next result.

Theorem 8.2.3 ([!]) Assume that the filtered probability space $(\Omega,\mathcal{F},P,\mathcal{F}_t)$ satisfies the usual conditions. Then every martingale (or non-negative sub-martingale) $X=(X_t)_{t\ge0}$ on $(\Omega,\mathcal{F},P,\mathcal{F}_t)$ admits a modification that is still a martingale (respectively, a non-negative sub-martingale) with càdlàg trajectories.

Proof We only prove the case where X is a martingale. By Lemma 8.2.1, the trajectories of $(X_t)_{t\in\mathcal{D}}$ have finite right and left limits almost surely. Then the process
$$\widetilde X_t := \lim_{\substack{s\to t^+\\ s\in\mathcal{D}}} X_s, \qquad t\ge0,$$
is well defined and has càdlàg trajectories. Let us prove that
$$\widetilde X_t = E[X_T \mid \mathcal{F}_t], \qquad 0\le t\le T; \tag{8.2.3}$$
this implies that $\widetilde X_t = X_t$ almost surely, i.e., $\widetilde X$ is a modification of X, and consequently also that $\widetilde X$ is a martingale.
Let us prove (8.2.3) by verifying the two defining properties of conditional expectation. First of all, by definition $\widetilde X_t \in m\mathcal{F}_{t^+} = m\mathcal{F}_t$ thanks to the usual conditions. Secondly, since X is a martingale, for every $A\in\mathcal{F}_t$ we have
$$E[X_s\mathbb{1}_A] = E[X_T\mathbb{1}_A], \qquad s\in[t,T]. \tag{8.2.4}$$
Taking the limit in (8.2.4) as $s\to t^+$, with $s\in\mathcal{D}\cap(t,T]$, we get $E\big[\widetilde X_t\mathbb{1}_A\big] = E[X_T\mathbb{1}_A]$, which proves (8.2.3). Convergence is justified by Vitali's Theorem C.0.2 in [113] since $X_s = E[X_T\mid\mathcal{F}_s]$, with $s\in\mathcal{D}\cap(t,T]$, is uniformly integrable by Proposition C.0.7 in [113]. □

Example 8.2.4 ([!]) Let $X\in L^1(\Omega,P)$. Under the assumptions of Theorem 8.2.3, the martingale $M_t := E[X\mid\mathcal{F}_t]$ admits a càdlàg version.

In light of Theorem 8.2.3, from now on, given a martingale with respect to a filtration that satisfies the usual conditions, we implicitly assume that we always consider a càdlàg version of it.

8.3 The Space $\mathcal{M}^{c,2}$ of Square-Integrable Continuous Martingales

In this section we introduce the space of processes on which we will build the stochastic integral, and we prove that it is a Banach space.

Definition 8.3.1 For $T>0$, we denote by $\mathcal{M}^{c,2}_T$ the space of continuous square-integrable martingales $X=(X_t)_{t\in[0,T]}$ and set
$$\|X\|_T := \|X_T\|_{L^2(\Omega,P)} = \sqrt{E\big[X_T^2\big]}.$$
Moreover, we denote by $\mathcal{M}^{c,2}$ the space of continuous martingales $X=(X_t)_{t\ge0}$ such that $X_t\in L^2(\Omega,P)$ for every $t\ge0$.
Remark 8.3.2 Note that $\|\cdot\|_T$ is a semi-norm on $\mathcal{M}^{c,2}_T$, in the sense that $\|X\|_T=0$ if and only if X is indistinguishable from the null process. This fact is a consequence of the continuity assumption on X and of Doob's maximal inequality, according to which we have
$$E\Big[\sup_{t\in[0,T]} X_t^2\Big] \le 4E\big[X_T^2\big] = 4\|X\|_T^2.$$
By identifying indistinguishable processes in $\mathcal{M}^{c,2}_T$, and thus considering $\mathcal{M}^{c,2}_T$ as the space of equivalence classes of processes (in the sense of indistinguishability), we obtain a complete normed space.

Proposition 8.3.3 $(\mathcal{M}^{c,2}_T, \|\cdot\|_T)$ is a Banach space.

Proof Let $(X_n)_{n\in\mathbb{N}}$ be a Cauchy sequence in $\mathcal{M}^{c,2}_T$ with respect to $\|\cdot\|_T$. It is enough to show that $(X_n)_{n\in\mathbb{N}}$ admits a convergent subsequence in $\mathcal{M}^{c,2}_T$.
By Doob's maximal inequality (8.1.4), for every $\epsilon>0$ and $n,m\in\mathbb{N}$ we have
$$P\Big(\sup_{t\in[0,T]}|X_{n,t}-X_{m,t}| \ge\epsilon\Big) \le \frac{E\big[|X_{n,T}-X_{m,T}|\big]}{\epsilon} \le$$
(by Hölder's inequality)
$$\le \frac{E\big[|X_{n,T}-X_{m,T}|^2\big]^{\frac12}}{\epsilon} = \frac{\|X_n-X_m\|_T}{\epsilon}.$$
Consequently, for every $k\in\mathbb{N}$ there exists $n_k\in\mathbb{N}$ such that
$$P\Big(\sup_{t\in[0,T]}|X_{n,t}-X_{m,t}| \ge\tfrac1k\Big) \le \tfrac1{2^k}, \qquad n,m\ge n_k,$$
and, by the Borel-Cantelli Lemma 1.3.28 in [113], $X_{n_k,\cdot}$ converges uniformly on $[0,T]$ almost surely: the limit, which we denote by X, is a continuous process (we may set the discontinuous trajectories to zero).
Fix $t\in[0,T]$: by Doob's inequality (8.1.5), $(X_{n_k,t})_{k\in\mathbb{N}}$ is also a Cauchy sequence in $L^2(\Omega,P)$, which is a complete space, and, by the uniqueness of the limit, it converges to $X_t$ in the sense that
$$\lim_{k\to\infty} E\big[|X_t - X_{n_k,t}|^2\big] = 0. \tag{8.3.1}$$
In particular, for $t=T$, we have
$$\lim_{k\to\infty} \|X - X_{n_k}\|_T = 0.$$
Finally, we prove that X is a martingale. For $0\le s\le t\le T$ and $G\in\mathcal{F}_s$ we have
$$E[X_{n_k,t}\mathbb{1}_G] = E[X_{n_k,s}\mathbb{1}_G]$$
since $X_{n_k}\in\mathcal{M}^{c,2}_T$. Taking the limit as $k\to\infty$, thanks to (8.3.1) we obtain $E[X_t\mathbb{1}_G]=E[X_s\mathbb{1}_G]$, which proves the thesis. □
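To make the norm concrete (our own sketch): for a Brownian motion, $\|W\|_T = \sqrt{E[W_T^2]} = \sqrt{T}$; the discrete analogue, with the symmetric random walk in place of W, is $E[S_N^2] = N$:

```python
import random

def norm_T(n_steps, n_paths=4000, seed=0):
    # ||X||_T = sqrt(E[X_T^2]) estimated by Monte Carlo for the walk
    # martingale S; the exact value is sqrt(n_steps).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s = sum(1 if rng.random() < 0.5 else -1 for _ in range(n_steps))
        total += s * s
    return (total / n_paths) ** 0.5

print(norm_T(100))  # close to 10 = sqrt(100)
```
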

8.4 The Space $\mathcal{M}^{c,\mathrm{loc}}$ of Continuous Local Martingales

One of the main motivations for the introduction of stopping times is the use of so-called "localization" techniques, which allow one to relax integrability assumptions. In this section, we analyze the specific case of martingales.
Consider a filtered space $(\Omega,\mathcal{F},P,\mathcal{F}_t)$ satisfying the usual conditions. The concept of local martingale extends that of martingale by removing the integrability condition on the process. This makes it possible to include important classes of processes (for example, stochastic integrals) that are martingales only if stopped (or "localized").
We first observe that, as in the discrete case (cf. Proposition 6.1.7), the martingale property is preserved by stopping the process.

Corollary 8.4.1 (Stopped Martingale) Let $X=(X_t)_{t\ge0}$ be a (càdlàg) martingale and $\tau_0$ a stopping time. Then the stopped process $(X_{t\wedge\tau_0})_{t\ge0}$ is also a martingale.

Proof Since X is càdlàg and adapted by hypothesis, by Proposition 6.2.29 we have $X_{t\wedge\tau_0} \in m\mathcal{F}_{t\wedge\tau_0} \subseteq m\mathcal{F}_t$. Moreover, by Theorem 8.1.6, $X_{t\wedge\tau_0} = E\big[X_t \mid \mathcal{F}_{t\wedge\tau_0}\big] \in L^1(\Omega,P)$ for every $t\ge0$. Again by Theorem 8.1.6, for every bounded stopping time $\tau$ we have $E\big[X_{\tau\wedge\tau_0}\big] = E[X_0]$, and therefore the thesis follows from Theorem 8.1.7. □

Definition 8.4.2 (Local Martingale) We say that $X=(X_t)_{t\ge0}$ is a local martingale if $X_0\in m\mathcal{F}_0$ and there exists a non-decreasing sequence $(\tau_n)_{n\in\mathbb{N}}$ of stopping times, called a localizing sequence for X, such that:
(i) $\tau_n\nearrow\infty$ as $n\to\infty$;
(ii) for every $n\in\mathbb{N}$, the stopped and translated process $(X_{t\wedge\tau_n}-X_0)_{t\ge0}$ is a martingale.
We denote by $\mathcal{M}^{c,\mathrm{loc}}$ the space of continuous local martingales.
By Corollary 8.4.1, every (càdlàg) martingale is a local martingale with localizing sequence $\tau_n\equiv\infty$.

Example 8.4.3 Consider the constant process $X=(X_t)_{t\ge0}$ with $X_t\equiv X_0\in m\mathcal{F}_0$ for every $t\ge0$. If $X_0\in L^1(\Omega,P)$ then X is a martingale. If $X_0\notin L^1(\Omega,P)$, the process X is not a martingale, due to the lack of integrability, but it is obviously a local martingale: in fact, setting $\tau_n\equiv\infty$, we have $X_{t\wedge\tau_n}-X_0\equiv0$.

Example 8.4.4 Let W be a Brownian motion on $(\Omega,\mathcal{F},P,\mathcal{F}_t)$ and $Y\in m\mathcal{F}_0$. Then the process
$$X_t := YW_t$$
is adapted. Moreover, if $Y\in L^1(\Omega,P)$, since $W_t = W_t - W_0$ and Y are independent, we also have $X_t\in L^1(\Omega,P)$ for every $t\ge0$ and
$$E[YW_t\mid\mathcal{F}_s] = Y\,E[W_t\mid\mathcal{F}_s] = YW_s, \qquad s\le t,$$
so that X is a martingale.
Without further assumptions on Y beyond $\mathcal{F}_0$-measurability, the process X may fail to be a martingale due to the lack of integrability, but it is still a local martingale: the idea is to remove the trajectories where Y is "too large" by setting
$$\tau_n := \begin{cases} 0 & \text{if } |Y|>n,\\ \infty & \text{if } |Y|\le n,\end{cases}$$
which defines an increasing sequence of stopping times (note that $(\tau_n\le t) = (|Y|>n) \in \mathcal{F}_0 \subseteq \mathcal{F}_t$). Then, for every $n\in\mathbb{N}$, the process
$$t\mapsto X_{t\wedge\tau_n} = X_t\mathbb{1}_{(\tau_n=\infty)} = W_tY\mathbb{1}_{(|Y|\le n)}$$
is a martingale since it is of the type $W_t\bar Y$, where $\bar Y = Y\mathbb{1}_{(|Y|\le n)}$ is a bounded $\mathcal{F}_0$-measurable random variable.
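A numerical sketch of the truncation idea (ours; Y is taken standard Cauchy, hence $\mathcal{F}_0$-measurable but with no finite mean, so $X_t = YW_t$ is not integrable):

```python
import math, random

rng = random.Random(0)

def sample_Y():
    # Standard Cauchy variable via the tangent of a uniform angle.
    return math.tan(math.pi * (rng.random() - 0.5))

def stopped_factor(y, n):
    # Effect of tau_n on the path t -> y * W_t(omega): the whole
    # trajectory is kept iff |y| <= n, otherwise it is stopped at time 0.
    return y if abs(y) <= n else 0.0

ys = [sample_Y() for _ in range(10000)]
kept = [stopped_factor(y, n=5) for y in ys]
print(max(abs(v) for v in kept))  # bounded by 5: Ybar = Y 1_{|Y|<=5}
```
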
Exercise 8.4.5 (Brownian Motion with Random Initial Value) Let $W=(W_t)_{t\ge0}$ be a Brownian motion on $(\Omega,\mathcal{F},P,\mathcal{F}_t)$. Given $t_0\ge0$ and $Z\in m\mathcal{F}_{t_0}$, let
$$W_t^{t_0,Z} := W_t - W_{t_0} + Z, \qquad t\ge t_0.$$
The process $W^{t_0,Z}$ has initial value (at time $t_0$) equal to Z, is continuous, adapted, and has independent and stationary increments, equal to the increments of a standard Brownian motion. If $Z\in L^1(\Omega,P)$ then $(W_t^{t_0,Z})_{t\ge t_0}$ is a martingale; in general, $W^{t_0,Z}$ is a local martingale with localizing sequence $\tau_n\equiv\infty$.
We also note that, given any distribution $\mu$, it is not difficult to construct a Brownian motion $W^\mu$ with initial distribution $W_0^\mu\sim\mu$ on the space $(\Omega\times\mathbb{R}, \mathcal{F}\otimes\mathcal{B}, P\otimes\mu)$.

Remark 8.4.6 ([!]) If X is a local martingale with localizing sequence $(\tau_n)_{n\in\mathbb{N}}$ then:
(i) X has a modification with càdlàg trajectories, constructed from the existence of a càdlàg modification of each martingale $X_{t\wedge\tau_n}$. Hereafter, the fact that a local martingale is càdlàg will always be implicitly assumed by convention;
(ii) X is adapted, since $X_0\in m\mathcal{F}_0$ by definition and $X_t - X_0$ is the pointwise limit of $X_{t\wedge\tau_n} - X_0$, which is $m\mathcal{F}_t$-measurable by the definition of martingale;
(iii) a priori, $X_t$ does not have any integrability property;
(iv) if X has càdlàg trajectories then there exists a localizing sequence $(\bar\tau_n)_{n\in\mathbb{N}}$ such that
$$\bar\tau_n \le n, \qquad |X_{t\wedge\bar\tau_n}| \le n, \quad t\ge0,\ n\in\mathbb{N}.$$
Indeed, by Proposition 6.2.7, the exit time $\sigma_n$ of $|X|$ from the interval $[-n,n]$ is a stopping time; moreover, since X is càdlàg (and therefore every trajectory of X is bounded on every compact time interval), we have $\sigma_n\nearrow\infty$. Then
$$\bar\tau_n := \tau_n\wedge\sigma_n\wedge n$$
is a localizing sequence for X: in particular, since $X_{t\wedge\tau_n} - X_0$ is a martingale, by Corollary 8.4.1 the process $X_{t\wedge\bar\tau_n} - X_0 = X_{(t\wedge\tau_n)\wedge(\sigma_n\wedge n)} - X_0$ is also a martingale;
(v) if there exists $Y\in L^1(\Omega,P)$ such that $|X_t|\le Y$ for every $t\ge0$, then X is a martingale: in fact, for $s\le t$ we have $X_{s\wedge\tau_n} - X_0 = E\big[X_{t\wedge\tau_n} - X_0 \mid \mathcal{F}_s\big]$, which, thanks to the integrability hypothesis, is equivalent to
$$X_{s\wedge\tau_n} = E\big[X_{t\wedge\tau_n}\mid\mathcal{F}_s\big]. \tag{8.4.1}$$
The thesis follows by taking the limit as $n\to\infty$ and using the dominated convergence theorem for conditional expectation. Notice that, in particular, every bounded local martingale is a true martingale. Convergence in (8.4.1) is a very delicate issue: for example, there exist uniformly integrable local martingales that are not martingales;⁵
(vi) if $X\ge0$ then X is a super-martingale because, arguing as in the previous point and using Fatou's lemma instead of the dominated convergence theorem, we obtain
$$X_s \ge E[X_t\mid\mathcal{F}_s], \qquad 0\le s\le t\le T. \tag{8.4.2}$$
Moreover, if $E[X_T]=E[X_0]$ then $(X_t)_{t\in[0,T]}$ is a true martingale. In fact, from (8.4.2) it is easy to deduce
$$E[X_0] \ge E[X_t] \ge E[X_T], \qquad 0\le t\le T,$$
and therefore from the assumption we get $E[X_t]=E[X_0]$ for every $t\in[0,T]$. If we had $X_s > E[X_t\mid\mathcal{F}_s]$ on a non-negligible event, taking expectations would contradict this equality.

⁵ See, for example, Chapter 2 in [37].

8.5 Uniformly Square-Integrable Martingales

In this section we prove a further version of the optional sampling theorem. Let $(\Omega,\mathcal{F},P,\mathcal{F}_t)$ be a filtered space satisfying the usual conditions. To deal with the case where the time index varies in $\mathbb{R}_{\ge0}$, we introduce an integrability condition that will allow us to reduce easily to the case $[0,T]$ by using stopping times.

Definition 8.5.1 Let $p\ge1$. We say that a process $X=(X_t)_{t\ge0}$ is uniformly in $L^p$ if
$$\sup_{t\ge0} E\big[|X_t|^p\big] < \infty.$$

Proposition 8.5.2 Let $X=(X_t)_{t\ge0}$ be a martingale. The following statements are equivalent:
(i) X is uniformly in $L^2$;
(ii) there exists an $\mathcal{F}_\infty$-measurable⁶ random variable $X_\infty\in L^2(\Omega,P)$ such that
$$X_t = E[X_\infty\mid\mathcal{F}_t], \qquad t\ge0.$$
In this case, we also have
$$E\Big[\sup_{t\ge0} X_t^2\Big] \le 4E\big[X_\infty^2\big]. \tag{8.5.1}$$

⁶ Recall the definition of $\mathcal{F}_\infty$ in (6.2.5).

Proof [(ii) ⇒ (i)] By Jensen's inequality, we have
$$E\big[X_t^2\big] = E\big[E[X_\infty\mid\mathcal{F}_t]^2\big] \le E\big[E[X_\infty^2\mid\mathcal{F}_t]\big] = E\big[X_\infty^2\big] < \infty. \tag{8.5.2}$$
[(i) ⇒ (ii)] Consider the discrete martingale $(X_n)_{n\in\mathbb{N}}$. By Theorem 8.2.2, for almost every $\omega\in\Omega$ the limit
$$X_\infty(\omega) := \lim_{n\to\infty} X_n(\omega)$$
exists and is finite; we set $X_\infty(\omega)=0$ for those $\omega$ for which the limit does not exist or is not finite. Clearly, $X_\infty\in m\mathcal{F}_\infty$ and also $X_\infty\in L^2(\Omega,P)$ since, by Fatou's lemma, we have
$$E\big[X_\infty^2\big] \le \lim_{n\to\infty} E\big[X_n^2\big] \le \sup_{t\ge0} E\big[X_t^2\big] < \infty$$
by assumption. Thanks to Remark C.0.10 in [113], $(X_n)_{n\in\mathbb{N}}$ is uniformly integrable and thus, by Vitali's Theorem C.0.2 in [113], $X_n$ converges to $X_\infty$ in $L^1(\Omega,P)$: from this, it also follows that
$$X_n = E[X_\infty\mid\mathcal{F}_n], \qquad n\in\mathbb{N}; \tag{8.5.3}$$
indeed, using the definition of conditional expectation, it is sufficient to observe that for every $A\in\mathcal{F}_n$ we have
$$0 = \lim_{N\to\infty} E[(X_n - X_N)\mathbb{1}_A] = E[(X_n - X_\infty)\mathbb{1}_A].$$
Then, given $t\ge0$ and taking $n\ge t$, we have
$$X_t = E[X_n\mid\mathcal{F}_t] = E\big[E[X_\infty\mid\mathcal{F}_n]\mid\mathcal{F}_t\big] = E[X_\infty\mid\mathcal{F}_t].$$
Finally, for every $n\in\mathbb{N}$, by Doob's maximal inequality, we have
$$E\Big[\sup_{t\in[0,n]} X_t^2\Big] \le 4E\big[X_n^2\big] \le$$
(by (8.5.3) and proceeding as in the proof of (8.5.2))
$$\le 4E\big[X_\infty^2\big],$$
and (8.5.1) follows by taking the limit as $n\to+\infty$, by Beppo Levi's theorem. □
Example 8.5.3 A real Brownian motion W is not uniformly in $L^2$ since $E\big[W_t^2\big]=t$. However, for any fixed $T>0$, the process $X_t := W_{t\wedge T}$ is a martingale that is uniformly in $L^2$, with $X_\infty = W_T$.
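The discrete analogue of Example 8.5.3 can be checked numerically (our own sketch): for the symmetric random walk stopped at time N, $X_\infty = S_N$ and (8.5.1) reads $E\big[\sup_k S_k^2\big] \le 4E\big[S_N^2\big] = 4N$:

```python
import random

def sup_square_vs_bound(n_steps=64, n_paths=5000, seed=0):
    # Estimate E[sup_{k<=N} S_k^2] and the bound 4 E[S_N^2] from (8.5.1)
    # for the symmetric random walk stopped at N (so X_infty = S_N).
    rng = random.Random(seed)
    sup_sq, end_sq = 0.0, 0.0
    for _ in range(n_paths):
        s, m = 0, 0
        for _ in range(n_steps):
            s += 1 if rng.random() < 0.5 else -1
            m = max(m, s * s)
        sup_sq += m
        end_sq += s * s
    return sup_sq / n_paths, 4.0 * end_sq / n_paths

lhs, rhs = sup_square_vs_bound()
print(lhs <= rhs)  # True: here E[S_N^2] = N = 64
```

Note that the left-hand estimate always dominates $E[S_N^2]$ pathwise, so the sample ratio lies between 1 and 4 times $E[S_N^2]$.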

The next result is a version of the optional sampling theorem for martingales that are uniformly in $L^2$. Such an integrability condition is necessary, as the following example makes evident: given a real Brownian motion W and $a>0$, consider the stopping time $\tau_a = \inf\{t\ge0 \mid W_t\ge a\}$. We have seen in Remark 7.2.3-(ii) that $\tau_a<\infty$ a.s. but
$$0 = W_0 < E\big[W_{\tau_a}\big] = a.$$
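A random-walk sketch (ours) of why optional sampling fails here: almost every path eventually reaches the level a, and at time $\tau_a$ the walk sits exactly at a, so $E[W_{\tau_a}] = a$, even though $E[W_{\tau_a\wedge n}] = 0$ for every bounded horizon n:

```python
import random

def value_at_hit(a, horizon, rng):
    # Run a symmetric random walk up to tau_a (truncated at `horizon`);
    # report whether level a was hit and the value of the walk at that time.
    s = 0
    for _ in range(horizon):
        s += 1 if rng.random() < 0.5 else -1
        if s >= a:
            return True, s
    return False, s

rng = random.Random(0)
outcomes = [value_at_hit(3, 20000, rng) for _ in range(300)]
hit_values = [v for hit, v in outcomes if hit]
print(len(hit_values) / len(outcomes))  # most paths hit the level
print(set(hit_values))                  # {3}: W_{tau_a} = a on hitting paths
```
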

Theorem 8.5.4 (Optional Sampling Theorem [!]) Let $X=(X_t)_{t\ge0}$ be a (càdlàg) martingale that is uniformly in $L^2$. If $\tau_1$ and $\tau_2$ are stopping times such that $\tau_1\le\tau_2<\infty$, then we have
$$X_{\tau_1} = E\big[X_{\tau_2}\mid\mathcal{F}_{\tau_1}\big].$$

Proof We begin by proving that if $X=(X_t)_{t\ge0}$ is a (càdlàg) sub-martingale that is uniformly in $L^2$, then for every stopping time $\tau$ such that $P(\tau<\infty)=1$ we have
$$X_0 \le E[X_\tau\mid\mathcal{F}_0]. \tag{8.5.4}$$
First, we observe that by (8.5.1) we have $X_\tau\in L^2(\Omega,P)$. Applying the optional sampling Theorem 8.1.6 with the sequence of bounded stopping times $\tau\wedge n$, we have
$$X_0 \le E[X_{\tau\wedge n}\mid\mathcal{F}_0].$$
Taking the limit as $n\to\infty$, we obtain (8.5.4) by the dominated convergence theorem, since
$$|X_{\tau\wedge n}| \le 1 + \sup_{t\ge0} X_t^2 \in L^1(\Omega,P)$$
thanks to (8.5.1).
To prove the thesis, it is sufficient to verify that for every $A\in\mathcal{F}_{\tau_1}$ we have
$$E\big[X_{\tau_1}\mathbb{1}_A\big] = E\big[X_{\tau_2}\mathbb{1}_A\big]. \tag{8.5.5}$$
Consider
$$\tau := \tau_1\mathbb{1}_A + \tau_2\mathbb{1}_{A^c},$$
which is a stopping time since
$$(\tau<t) = \big(A\cap(\tau_1<t)\big) \cup \big(A^c\cap(\tau_2<t)\big) \in \mathcal{F}_t, \qquad t\ge0.$$
Then, by (8.5.4), we have
$$E[X_0] = E[X_\tau] = E\big[X_{\tau_1}\mathbb{1}_A\big] + E\big[X_{\tau_2}\mathbb{1}_{A^c}\big],$$
$$E[X_0] = E\big[X_{\tau_1}\big] = E\big[X_{\tau_1}\mathbb{1}_A\big] + E\big[X_{\tau_1}\mathbb{1}_{A^c}\big],$$
and this proves (8.5.5). □


8.6 Key Ideas to Remember

We distill the chapter's key findings and essential concepts so that they can be grasped on a first reading, setting aside technical and secondary details. As usual, if you have any doubt about what the following succinct statements mean, please review the corresponding section.
• Section 8.1: the optional sampling theorem and Doob's maximal inequalities extend without difficulty from discrete to continuous martingales.
• Section 8.2: under the usual conditions, every martingale admits a càdlàg modification; therefore the continuity assumption of Sect. 8.1 is actually not restrictive.
• Section 8.3: the space $\mathcal{M}^{c,2}_T$ of continuous square-integrable martingales X on $[0,T]$ is a Banach space, equipped with the $L^2$ norm of the final value, $\|X_T\|_{L^2(\Omega,P)}$.
• Section 8.4: a local martingale is a process that can be approximated by true martingales through a localizing sequence of stopping times. In the definition of a local martingale, no assumptions are made regarding the integrability of the process or conditions on the initial data. Important classes of processes, including stochastic integrals, fall under the category of local martingales, as they are martingales only when stopped. Every bounded local martingale is a true martingale, and every non-negative local martingale is a super-martingale.
• Section 8.5: we introduce the class of uniformly square-integrable martingales and another version of the optional sampling theorem, in which the boundedness assumption on the stopping times is removed.
Main notations used or introduced in this chapter:

Symbol                 Description                                 Page
$\mathcal{D}$                     Dyadic rationals                            133
$\mathcal{D}(T)$                  Dyadic rationals of $[0,T]$                 134
$\mathcal{M}^{c,2}$               Continuous square-integrable martingales    141
$\mathcal{M}^{c,\mathrm{loc}}$    Continuous local martingales                143
Chapter 9
Theory of Variation

The traditional professor writes a, says b, and means c; but it


should be d.
George Pólya

In this chapter, we review some basic concepts of deterministic integration theory


in the sense of Riemann-Stieltjes and Lebesgue-Stieltjes. We shall see that, unfor-
tunately, the trajectories of a Brownian motion (and, in general, of a martingale) do
not have sufficient regularity to use such theories to define the Brownian integral
in a deterministic sense, path by path. To understand this fact, it is necessary to
introduce the concepts of first and second (or quadratic) variation of a function,
which are crucial in the construction of the stochastic integral. In the second
part of the chapter, we introduce an important class of stochastic processes called semimartingales. A semimartingale is the sum of a local martingale and a process whose trajectories are of bounded variation: under appropriate assumptions, such a decomposition is unique. We prove a particular version of the fundamental Doob-Meyer decomposition theorem: if X is a martingale, then $X^2$ is a semimartingale, i.e., it can be decomposed into the sum of a martingale and a process of bounded variation; the latter is the so-called quadratic variation process of X. The results of this chapter provide the background for the construction of the stochastic integral that we will present in the next chapter.

9.1 Riemann-Stieltjes Integral

In this section, we recall some classical results on integration in a deterministic framework. Given T > 0, a partition of the interval [0, T] is a set of the form π = {t_0, t_1, ..., t_N} with 0 = t_0 < t_1 < ··· < t_N = T. We denote by P_T the set of partitions of [0, T]. Given a function

    g : [0, T] → R^d

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_9

the first variation of g with respect to the partition π ∈ P_T is defined as

    V(g; π) := \sum_{k=1}^{N} |g(t_k) − g(t_{k−1})|.

Definition 9.1.1 (BV Function) We say that g is of bounded variation on [0, T], and we write g ∈ BV_T, if

    V_T(g) := \sup_{π ∈ P_T} V(g; π) < ∞.

We say that

    g : R_{≥0} → R^d

is locally of bounded variation, and we write g ∈ BV, if g|_{[0,T]} ∈ BV_T for every T > 0.

Note that the function t ↦ V_t(g) is increasing and non-negative.
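The text contains no code, but the telescoping computation of Example 9.1.2-(i) below is easy to check numerically. The following Python sketch (an aside assuming NumPy is available; the increasing function t³ + t is an arbitrary choice) computes V(g; π) on uniform partitions and verifies that it equals g(T) − g(0) regardless of the partition.

```python
import numpy as np

def first_variation(g, partition):
    """First variation V(g; pi): sum of |g(t_k) - g(t_{k-1})| over the partition."""
    vals = g(np.asarray(partition, dtype=float))
    return float(np.sum(np.abs(np.diff(vals))))

T = 2.0
g = lambda t: t**3 + t          # increasing on [0, T], so V_T(g) = g(T) - g(0)

for N in (4, 50, 1000):
    pi = np.linspace(0.0, T, N + 1)
    # The sum telescopes: the value is the same for every partition.
    assert abs(first_variation(g, pi) - (g(T) - g(0.0))) < 1e-9
```

For a non-monotone g, the same routine returns a partition-dependent lower bound for V_T(g), in line with the supremum in Definition 9.1.1.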


Example 9.1.2 ([!])
(i) Let d = 1. If g is a monotone function on [0, T] then g ∈ BV_T. In fact, if, for example, g is increasing then

    V(g; π) = \sum_{k=1}^{N} |g(t_k) − g(t_{k−1})| = \sum_{k=1}^{N} (g(t_k) − g(t_{k−1})) = g(T) − g(0)

for every π ∈ P_T. In the case d = 1, monotonicity is almost a characterization: it is known that g ∈ BV_T if and only if g is the difference of increasing monotone functions, g = g_+ − g_−. Moreover, if g is continuous then g_+ and g_− are continuous as well.

(ii) It is not difficult to show that, if g is continuous then

    V_T(g) = \lim_{|π| → 0} V(g; π)        (9.1.1)

where

    |π| := \max_{1 ≤ k ≤ N} |t_k − t_{k−1}|

is called the mesh of π (i.e. the length of the longest subinterval). Interpreting t ↦ g(t) as a trajectory (or parametrized curve) in R^d, the fact that g ∈ BV_T means that g is rectifiable, in the sense that the length of g can be computed

as the supremum of the lengths of polygonal approximations:¹ by definition, V_T(g) is the length of g. Equation (9.1.1) does not hold if g is discontinuous: for example, fixed s ∈ ]0, T[, the function

    g(t) = { 1  if t = s,
             0  if t ∈ [0, s[ ∪ ]s, T],

is such that V(g; π) = 2 for every π ∈ P_T such that s ∈ π, and V(g; π) = 0 for every π ∈ P_T such that s ∉ π.
(iii) If g ∈ Lip([0, T]; R^d), that is, there exists a constant c such that |g(t) − g(s)| ≤ c|t − s| for every t, s ∈ [0, T], then g ∈ BV_T since

    V(g; π) = \sum_{k=1}^{N} |g(t_k) − g(t_{k−1})| ≤ c \sum_{k=1}^{N} (t_k − t_{k−1}) = cT

for every π ∈ P_T.
(iv) If g is an integral function of the type

    g(t) = \int_0^t u(s) ds,   t ∈ [0, T],

with u ∈ L¹([0, T]; R^d) then g ∈ BV_T since

    V(g; π) = \sum_{k=1}^{N} \left| \int_{t_{k−1}}^{t_k} u(s) ds \right| ≤ \sum_{k=1}^{N} \int_{t_{k−1}}^{t_k} |u(s)| ds = ‖u‖_{L¹},

for every π ∈ P_T.
(v) It is not difficult to prove that the function

    g(t) = { 0            if t = 0,
             t sin(1/t)   if 0 < t ≤ T,

is continuous but not of bounded variation.
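As a numerical aside (a Python sketch assuming NumPy, not from the text), the divergence in case (v) can be made visible: along the points t_k = 1/((k + 1/2)π), where sin(1/t) alternates between ±1, the first variation behaves like a harmonic series and grows without bound as the partition is refined.

```python
import numpy as np

def g(t):
    t = np.asarray(t, dtype=float)
    safe = np.where(t == 0.0, 1.0, t)            # avoid 1/0; g(0) := 0 below
    return np.where(t == 0.0, 0.0, t * np.sin(1.0 / safe))

def first_variation(vals):
    return float(np.sum(np.abs(np.diff(vals))))

# Partition through t_k = 1/((k + 1/2) pi), where sin(1/t) = +-1: consecutive
# points contribute ~ t_k + t_{k+1}, a harmonic tail, so V diverges.
prev = 0.0
for n in (10, 100, 1000):
    ks = np.arange(n, 0, -1)                     # decreasing k gives increasing t
    pts = 1.0 / ((ks + 0.5) * np.pi)
    partition = np.concatenate(([0.0], pts, [1.0]))
    V = first_variation(g(partition))
    assert V > prev                              # variation keeps growing
    prev = V
```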


We now introduce the Riemann-Stieltjes integral. Given π = {t_0, ..., t_N} ∈ P_T, we denote by T_π the family of point choices relative to π: an element of T_π is of the form

    τ = {τ_1, ..., τ_N},   τ_k ∈ [t_{k−1}, t_k],   k = 1, ..., N.

¹ A polygonal approximation is obtained by connecting a finite number of line segments along the curve.
154 9 Theory of Variation

Given two functions f, g : [0, T] → R, π ∈ P_T and τ ∈ T_π, we say that

    S(f, g; π, τ) := \sum_{k=1}^{N} f(τ_k)(g(t_k) − g(t_{k−1}))

is the Riemann-Stieltjes sum of f with respect to g, relative to the partition π and the choice of points τ.
Proposition 9.1.3 (Riemann-Stieltjes Integral) For every f ∈ C[0, T] and g ∈ BV_T, the limit

    \lim_{|π| → 0} S(f, g; π, τ)        (9.1.2)

exists and is finite. This limit is called the Riemann-Stieltjes integral of f with respect to g on [0, T] and is denoted by

    \int_0^T f dg   or   \int_0^T f(t) dg(t).

More precisely, for every ε > 0 there exists δ_ε > 0 such that

    \left| S(f, g; π, τ) − \int_0^T f dg \right| < ε

for every π ∈ P_T with |π| < δ_ε, and every τ ∈ T_π.


Proof We use the Cauchy criterion and show that for every ε > 0 there exists δ_ε > 0 such that

    |S(f, g; π′, τ′) − S(f, g; π″, τ″)| < ε

for every π′, π″ ∈ P_T such that |π′|, |π″| < δ_ε and for every τ′ ∈ T_{π′} and τ″ ∈ T_{π″}.
Let π = π′ ∪ π″ = {t_0, ..., t_N}. Since f is uniformly continuous on the compact interval [0, T], given ε > 0 there exists δ_ε > 0 such that, for |π′|, |π″| < δ_ε, we have

    |S(f, g; π′, τ′) − S(f, g; π″, τ″)| ≤ ε \sum_{k=1}^{N} |g(t_k) − g(t_{k−1})| ≤ ε V(g; π)

which proves the thesis. □



Let us see some particular cases in which it is possible to calculate a Riemann-
Stieltjes integral starting from the general definition (9.1.2).
9.1 Riemann-Stieltjes Integral 155

Example 9.1.4 Fixed t̄ ∈ ]0, T[, let

    g(t) = { 0  if t ∈ [0, t̄[,
             1  if t ∈ [t̄, T].

For every f ∈ C[0, T], π = {t_0, ..., t_N} ∈ P_T and τ ∈ T_π, let k̄ be the index for which t̄ ∈ ]t_{k̄−1}, t_{k̄}]. Then we have

    S(f, g; π, τ) = f(τ_{k̄})(g(t_{k̄}) − g(t_{k̄−1})) = f(τ_{k̄}) → f(t̄)   as |π| → 0.

Hence

    \int_0^T f dg = f(t̄).

Note that

    \int_0^T f(t) dg(t) = \int_{[0,T]} f(t) δ_{t̄}(dt)

where the right-hand side is the integral with respect to the Dirac delta measure centered at t̄.
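The convergence S(f, g; π, τ) → f(t̄) can be observed numerically. In the Python sketch below (an aside assuming NumPy; f = cos and t̄ = 0.4 are arbitrary choices), the only nonzero increment of g is the one in the subinterval containing t̄, so the Riemann-Stieltjes sum reduces to f evaluated within half a mesh of t̄.

```python
import numpy as np

def rs_sum(f, g, partition, tau):
    """Riemann-Stieltjes sum S(f, g; pi, tau)."""
    gv = g(np.asarray(partition, dtype=float))
    return float(np.sum(f(np.asarray(tau, dtype=float)) * np.diff(gv)))

T, t_bar = 1.0, 0.4
f = np.cos
g = lambda t: (np.asarray(t) >= t_bar).astype(float)   # unit jump at t_bar

for N in (10, 100, 10000):
    pi = np.linspace(0.0, T, N + 1)
    tau = 0.5 * (pi[:-1] + pi[1:])                     # midpoint choices
    err = abs(rs_sum(f, g, pi, tau) - np.cos(t_bar))
    assert err <= 0.5 * T / N + 1e-12                  # |f'| <= 1, |tau_k - t_bar| <= |pi|/2
```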
Example 9.1.5 Let

    g(t) = \int_0^t u(s) ds,   t ∈ [0, T],

the integral function of Example 9.1.2-(iv), with u ∈ L¹([0, T]; R). By considering separately the positive and negative parts of u, it is not restrictive to assume u ≥ 0. Given π ∈ P_T and f ∈ C[0, T], we consider the particular choice of points

    τ_k ∈ \arg\min_{[t_{k−1}, t_k]} f,   k = 1, ..., N.

Then we have

    S(f, g; π, τ) = \sum_{k=1}^{N} f(τ_k)(g(t_k) − g(t_{k−1}))
                  = \sum_{k=1}^{N} f(τ_k) \int_{t_{k−1}}^{t_k} u(s) ds
                  ≤ \sum_{k=1}^{N} \int_{t_{k−1}}^{t_k} f(s) u(s) ds = \int_0^T f(s) u(s) ds.

We prove a similar inequality with the choice

    τ_k ∈ \arg\max_{[t_{k−1}, t_k]} f,   k = 1, ..., N,

and, taking the limit as |π| → 0, we conclude that

    \int_0^T f(t) dg(t) = \int_0^T f(t) u(t) dt ≡ \int_0^T f(t) g′(t) dt.

The general result that provides the rules for Riemann-Stieltjes integration is the following deterministic version of Itô's formula.
Theorem 9.1.6 (Deterministic Itô's Formula) For every F = F(t, x) ∈ C¹([0, T] × R) and g ∈ BV_T ∩ C[0, T] we have

    F(T, g(T)) − F(0, g(0)) = \int_0^T (∂_t F)(t, g(t)) dt + \int_0^T (∂_x F)(t, g(t)) dg(t).

Proof For every π = {t_0, ..., t_N} ∈ P_T, we have

    F(T, g(T)) − F(0, g(0)) = \sum_{k=1}^{N} (F(t_k, g(t_k)) − F(t_{k−1}, g(t_{k−1}))) =

(by the mean value theorem and the continuity of g, with τ′, τ″ ∈ T_π)

    = \sum_{k=1}^{N} \left( (∂_t F)(τ′_k, g(τ″_k))(t_k − t_{k−1}) + (∂_x F)(τ′_k, g(τ″_k))(g(t_k) − g(t_{k−1})) \right)

which proves the thesis, taking the limit as |π| → 0. □

Remark 9.1.7 When F depends only on x, Itô's formula becomes

    F(g(T)) − F(g(0)) = \int_0^T F′(g(t)) dg(t)

which is sometimes written, especially in the context of stochastic calculus (cf. Notation 10.4.2), in terms of the so-called "differential notation":

    dF(g(t)) = F′(g(t)) dg(t).        (9.1.3)

The latter formally recalls the usual chain rule for the derivative of composite functions.
9.1 Riemann-Stieltjes Integral 157

In the multidimensional case where g = (g_1, ..., g_d) takes values in R^d, setting ∇_x = (∂_{x_1}, ..., ∂_{x_d}), Itô's formula becomes

    F(T, g(T)) − F(0, g(0)) = \int_0^T (∂_t F)(t, g(t)) dt + \int_0^T (∇_x F)(t, g(t)) · dg(t)
                            = \int_0^T (∂_t F)(t, g(t)) dt + \sum_{i=1}^{d} \int_0^T (∂_{x_i} F)(t, g(t)) dg_i(t)

or in differential notation

    dF(t, g(t)) = (∂_t F)(t, g(t)) dt + (∇_x F)(t, g(t)) · dg(t).

Example 9.1.8 Let us consider some examples of the application of the deterministic Itô's formula:
(i) for F(t, x) = x we have

    g(T) − g(0) = \int_0^T dg

which generalizes the fundamental theorem of integral calculus;
(ii) for F(t, x) = f(t)x, with f ∈ C¹[0, T], we have

    f(T)g(T) − f(0)g(0) = \int_0^T f′(t)g(t) dt + \int_0^T f(t) dg(t)

which generalizes the integration by parts formula. In differential form we have

    d(f(t)g(t)) = f′(t)g(t) dt + f(t) dg(t)        (9.1.4)

which formally resembles the formula for the derivative of a product;
(iii) for F(t, x) = x² we have

    \int_0^T g(t) dg(t) = \frac{g²(T) − g²(0)}{2}

or

    dg²(t) = 2g(t) dg(t).
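A numerical check of case (iii) (a Python aside assuming NumPy; the smooth, hence BV, integrator g = cos is an arbitrary choice): the Riemann-Stieltjes sums of 2g dg with midpoint evaluations should reproduce g²(T) − g²(0).

```python
import numpy as np

g = np.cos                        # smooth, hence BV on [0, T]
T, N = 2.0, 4000
pi = np.linspace(0.0, T, N + 1)
tau = 0.5 * (pi[:-1] + pi[1:])    # midpoint choices
rs = float(np.sum(2.0 * g(tau) * np.diff(g(pi))))
lhs = g(T)**2 - g(0.0)**2         # F(g(T)) - F(g(0)) for F(x) = x^2
assert abs(rs - lhs) < 1e-5
```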

9.2 Lebesgue-Stieltjes Integral

Any real-valued function g ∈ BV ∩ C(R_{≥0}) decomposes into the difference g = g_+ − g_−, where g_+, g_− are increasing and continuous functions. By Theorem 1.4.33 in [113], g_+ and g_− are associated with two measures on² (R_{≥0}, B) which we denote by μ_g^+ and μ_g^−, respectively: we have

    μ_g^±([a, b]) = μ_g^±(]a, b]) = g_±(b) − g_±(a),   a ≤ b.

In order to apply Theorem 1.4.33 in [113], it would be sufficient to assume that g is right-continuous (as in Example 9.1.4 where μ_g = δ_{t̄}). However, to simplify the treatment, here we only consider a continuous function g because we will later study the stochastic integral only with respect to continuous integrators. We denote by

    |μ_g| := μ_g^+ + μ_g^−

the measure defined as the sum of μ_g^+ and μ_g^−. Moreover, for each H ∈ B such that at least one of μ_g^+(H) and μ_g^−(H) is finite, we set

    μ_g(H) = μ_g^+(H) − μ_g^−(H).        (9.2.1)

We say that μ_g is a signed measure since it can also take negative values, including −∞.

Definition 9.2.1 (Lebesgue-Stieltjes Measure) Given g ∈ BV ∩ C(R_{≥0}), we say that μ_g in (9.2.1) is the Lebesgue-Stieltjes measure associated with g. For each H ∈ B and f ∈ L¹(H, |μ_g|), we define the Lebesgue-Stieltjes integral of f with respect to g on H as

    \int_H f dμ_g := \int_H f dμ_g^+ − \int_H f dμ_g^−.

The Lebesgue-Stieltjes integral generalizes the Riemann-Stieltjes integral, extending the class of integrable functions.
Proposition 9.2.2 (Riemann-Stieltjes vs Lebesgue-Stieltjes) For every f ∈ C(R_{≥0}), g ∈ BV ∩ C(R_{≥0}) and T > 0, we have

    \int_0^T f dg = \int_{[0,T]} f dμ_g.

² We define the measures on R_{≥0} since the space of non-negative real numbers will be the set of time indices for stochastic processes. To apply Theorem 1.4.33 in [113], we can extend the functions g_+, g_− so that they are continuous and constant for t ≤ 0. All the results of the section obviously hold on (R, B).

Proof Given π = {t_0, ..., t_N} ∈ P_T, let us consider the simple functions

    f_π^±(t) = \sum_{k=1}^{N} f(τ_k^±) 1_{[t_{k−1}, t_k[}(t)

with

    τ_k^+ ∈ \arg\max_{[t_{k−1}, t_k]} f,   τ_k^− ∈ \arg\min_{[t_{k−1}, t_k]} f,   k = 1, ..., N.

Then we have

    \sum_{k=1}^{N} f(τ_k^−)(g_+(t_k) − g_+(t_{k−1})) = \int_{[0,T]} f_π^− dμ_g^+ ≤ \int_{[0,T]} f dμ_g^+ ≤ \int_{[0,T]} f_π^+ dμ_g^+
        = \sum_{k=1}^{N} f(τ_k^+)(g_+(t_k) − g_+(t_{k−1})).

Taking the limit as |π| → 0, we obtain

    \int_0^T f dg_+ = \int_{[0,T]} f dμ_g^+.

Proceeding in a similar manner with g_−, we conclude the proof. □



We prove a technical result that will be used later (see, for example, Theorem 11.2.1).
Proposition 9.2.3 In a filtered probability space (Ω, F, P, F_t) satisfying the usual conditions, let:
• τ be a finite (i.e. τ < ∞ a.s.) stopping time;
• A be a continuous, increasing, and adapted process with A_0 = 0;
• X be a non-negative integrable random variable.
Then we have

    E\left[ \int_0^τ X dA_t \right] = E\left[ \int_0^τ E[X | F_t] dA_t \right]

and, more precisely,

    E\left[ \int_0^τ X dA_t \right] = E\left[ \int_0^τ M_t dA_t \right]

for every càdlàg version M of the martingale E[X | F_t].

Proof First, assume that A and X are bounded a.s. by some N ∈ N. Fixed n ∈ N, let τ_k = kτ/n for k = 0, ..., n. We have

    E\left[ \int_0^τ X dA_t \right] = E\left[ \sum_{k=1}^{n} X (A_{τ_k} − A_{τ_{k−1}}) \right]
                                    = E\left[ \sum_{k=1}^{n} E[X | F_{τ_k}] (A_{τ_k} − A_{τ_{k−1}}) \right]
                                    = E\left[ \sum_{k=1}^{n} M_{τ_k} (A_{τ_k} − A_{τ_{k−1}}) \right]
                                    = E\left[ \int_0^τ M_t^{(n)} dA_t \right]

where

    M_t^{(n)} = M_0 + \sum_{k=1}^{n} M_{τ_k} 1_{]τ_{k−1}, τ_k]}(t).

Due to the right-continuity of M, we have

    \lim_{n→∞} M_t^{(n)}(ω) = M_t(ω)

for almost every ω such that t ≤ τ(ω). Given the boundedness of X and therefore of M, the thesis follows from the dominated convergence theorem.
Moving on to the general case, it is sufficient to apply what we have just proved to X ∧ N and A ∧ N, using Beppo Levi's theorem to take the limit as N → ∞. □

9.3 Semimartingales

Definition 9.3.1 We say that a process X = (Xt )t≥0 is


• increasing if the trajectories t |→ Xt (ω) are increasing functions3 for almost
every ω ∈ Ω;
• locally of bounded variation if X(ω) ∈ BV for almost every ω ∈ Ω (cf.
Definition 9.1.1). For brevity, we often omit the adjective “locally” and simply
speak of processes of bounded variation (or BV processes), still using the notation
BV to indicate the family of such processes;

• a semimartingale if it is of the form X = M + A where M is a local martingale and A is an adapted process, of bounded variation and such that A_0 = 0.

³ That is, X_s(ω) ≤ X_t(ω) if s ≤ t.
The interest in semimartingales is due to the fact that we will use such processes
as integrators in the Itô stochastic integral. We will restrict our attention to
continuous semimartingales, i.e., processes of the form X = M + A with M ∈
M c,loc (cf. Definition 8.4.2) and A continuous, adapted and of bounded variation.
Example 9.3.2 Let x, μ, σ ∈ R and W be a standard Brownian motion. The Brownian motion with drift

    X_t := x + μt + σW_t,   t ≥ 0,

is a continuous semimartingale with decomposition X = M + A where M_t = x + σW_t and A_t = μt. We will prove in Corollary 9.3.7 that the decomposition of a continuous semimartingale is unique.
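A simulation aside (Python with NumPy; the parameter values and seed are arbitrary choices): for the drifted Brownian motion above, the BV part μt contributes nothing to the quadratic variation, so the discrete quadratic variation of a simulated path is close to σ²T (anticipating the notation ⟨X⟩ of Sect. 9.4).

```python
import numpy as np

rng = np.random.default_rng(1)
x, mu, sigma = 2.0, 3.0, 0.5
T, N = 1.0, 20000
t = np.linspace(0.0, T, N + 1)
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(T / N), N))))
X = x + mu * t + sigma * W            # semimartingale X = M + A with A_t = mu t
qv = float(np.sum(np.diff(X)**2))
# The drift contributes only O(|pi|); in the limit sigma^2 T survives.
assert abs(qv - sigma**2 * T) < 0.05
```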
Remark 9.3.3 A deep result, the Doob-Meyer decomposition theorem, states that every càdlàg sub-martingale is a semimartingale: unlike the discrete case (cf. Theorem 1.4.15), the proof of this fact is far from elementary.
In [121], Chap. IV, Theorem 71, it is shown that if X is a continuous local martingale, X ∈ M^{c,loc}, with X_0 = 0 and 0 < α < 1/2, then the process |X|^α is not a semimartingale unless X is identically zero.

9.3.1 Brownian Motion as a Semimartingale

A Brownian motion W is a continuous martingale and therefore also a semimartingale. To show that its BV part is null (and almost all trajectories of W are not BV), we introduce the concept of second (or quadratic) variation of a function g relative to the partition π = {t_0, t_1, ..., t_N} ∈ P_T:

    V_T^{(2)}(g; π) := \sum_{k=1}^{N} |g(t_k) − g(t_{k−1})|².        (9.3.1)

Proposition 9.3.4 If g ∈ BV_T ∩ C[0, T] then

    \lim_{|π| → 0} V_T^{(2)}(g; π) = 0.

Proof Since g is uniformly continuous on the compact interval [0, T], for every ε > 0 there exists δ_ε > 0 such that

    \max_{1 ≤ k ≤ N} |g(t_k) − g(t_{k−1})| < ε

for every π ∈ P_T such that |π| < δ_ε. Consequently,

    V_T^{(2)}(g; π) ≤ ε \sum_{k=1}^{N} |g(t_k) − g(t_{k−1})| ≤ ε V_T(g).        □
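This decay is easy to observe numerically: for a C¹ (hence BV) function, the quadratic variation along uniform partitions shrinks proportionally to the mesh. A Python sketch (an aside assuming NumPy; g = sin is an arbitrary smooth choice):

```python
import numpy as np

g = np.sin
T = 1.0
qv = []
for N in (100, 1000, 10000):
    pi = np.linspace(0.0, T, N + 1)
    qv.append(float(np.sum(np.diff(g(pi))**2)))   # V_T^(2)(g; pi)
# For g in C^1, V_T^(2)(g; pi) ~ |pi| * int_0^T g'(t)^2 dt, which vanishes
# as the mesh |pi| = T/N tends to zero.
assert qv[0] > qv[1] > qv[2]
assert qv[2] < 1e-4
```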



Example 9.3.5 ([!]) If W is a real Brownian motion, then

    \lim_{|π| → 0} V_T^{(2)}(W; π) = T   in L²(Ω, P),        (9.3.2)

and consequently, the trajectories of W are not of bounded variation almost surely.
To prove (9.3.2), given a partition π = {t_0, t_1, ..., t_N} ∈ P_T, we set

    δ_k = t_k − t_{k−1},   Δ_k = W_{t_k} − W_{t_{k−1}},   k = 1, ..., N,

and observe that E[Δ_k⁴] = 3δ_k² and

    E[Δ_k² − δ_k] = 0,
    E[(Δ_h² − δ_h)(Δ_k² − δ_k)] = E[(Δ_h² − δ_h) E[Δ_k² − δ_k | F_{t_h}]] = 0        (9.3.3)

if h < k. Then we have

    E[(V_T^{(2)}(W; π) − T)²] = E\left[ \left( \sum_{k=1}^{N} (Δ_k² − δ_k) \right)² \right]
        = \sum_{k=1}^{N} E[(Δ_k² − δ_k)²] + 2 \sum_{h<k} E[(Δ_h² − δ_h)(Δ_k² − δ_k)] =

(since the terms of the second sum are null by (9.3.3))

    = \sum_{k=1}^{N} E[Δ_k⁴ − 2Δ_k²δ_k + δ_k²] =

(again by (9.3.3))

    = \sum_{k=1}^{N} 2δ_k² ≤ 2|π| \sum_{k=1}^{N} δ_k = 2|π|T

which proves the thesis.
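The limit (9.3.2) can be illustrated by Monte Carlo simulation. In the Python sketch below (an aside assuming NumPy; seed, mesh and number of paths are arbitrary choices), the discrete quadratic variation of simulated Brownian paths clusters around T, with fluctuations of order √(2|π|T), consistent with the bound 2|π|T on the L² error.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, n_paths = 1.0, 4096, 200
dt = T / N
# Brownian increments on a uniform partition, one row per path.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, N))
qv = np.sum(dW**2, axis=1)            # V_T^(2)(W; pi) for each path
# E[QV] = T and Var[QV] = 2 dt T, so the paths concentrate around T.
assert abs(float(qv.mean()) - T) < 0.02
assert float(qv.std()) < 0.1
```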

9.3.2 Semimartingales of Bounded Variation

In Example 9.3.5 we repeatedly used the martingale property to prove that W has positive quadratic variation and therefore is not of bounded variation. In fact, this result extends to the entire class of continuous local martingales: their trajectories are not of bounded variation unless they are constant.
Theorem 9.3.6 ([!]) Let X = (X_t)_{t≥0} be a continuous local martingale, X ∈ M^{c,loc}. If X ∈ BV then X is indistinguishable from the process identically equal to X_0.

Proof Without loss of generality, we can assume X_0 = 0. First, we prove the thesis in the case where X ∈ BV is a bounded continuous martingale: precisely, suppose there exists a constant K such that

    \sup_{t≥0} (|X_t| + V_t(X)) ≤ K.

Fixed T > 0 and π ∈ P_T, we set

    Δ_k = X_{t_k} − X_{t_{k−1}},   Δ_π = \max_{1 ≤ k ≤ N} |X_{t_k} − X_{t_{k−1}}|.

We observe that by identity (1.4.3) we have

    E[(X_{t_k} − X_{t_{k−1}})²] = E[X_{t_k}² − X_{t_{k−1}}²]

and, by the uniform continuity of the trajectories,

    \lim_{|π| → 0} Δ_π(ω) = 0,   0 ≤ Δ_π(ω) ≤ 2K,   ω ∈ Ω.        (9.3.4)

Then we have

    E[X_T²] = E\left[ \sum_{k=1}^{N} (X_{t_k}² − X_{t_{k−1}}²) \right] = E\left[ \sum_{k=1}^{N} (X_{t_k} − X_{t_{k−1}})² \right]
            ≤ E[Δ_π V_T(X; π)] ≤ K E[Δ_π]        (9.3.5)

which, as |π| → 0, tends to zero by (9.3.4) and the dominated convergence theorem. Hence E[X_T²] = 0 and by Doob's maximal inequality

    E\left[ \sup_{0≤t≤T} X_t² \right] ≤ 4E[X_T²] = 0.

Consequently, by continuity, almost surely the trajectories of X are identically zero on [0, T]. Given the arbitrariness of T, we conclude that X is indistinguishable from the null process.
In the general case, we consider a localizing sequence τ̄_n for which Y_{n,t} := X_{t∧τ̄_n} ∈ BV. We refine this sequence by defining the stopping times

    σ_n = inf{t ≥ 0 : |Y_{n,t}| + V_t(Y_{n,·}) ≥ n}.

Also τ_n := τ̄_n ∧ σ_n ∧ n is a localizing sequence for X: moreover, X_{t∧τ_n} is a bounded continuous martingale, constant for t ≥ n and whose first variation is bounded by n. As proven above, X_{t∧τ_n} is indistinguishable from the null process and the thesis follows by taking the limit as n → ∞. □

Corollary 9.3.7 ([!]) Let X be a continuous semimartingale. The decomposition X = M + A, with M ∈ M^{c,loc} and A ∈ BV a continuous, adapted process such that A_0 = 0, is unique.
Proof If X = M′ + A′ is another decomposition, then M − M′ = A′ − A is a continuous local martingale that is locally of bounded variation. By Theorem 9.3.6, M is indistinguishable from M′ and A is indistinguishable from A′. □

Remark 9.3.8 Without the continuity assumption, the decomposition of a semi-
martingale is generally not unique: discontinuities in the paths of a semimartingale
can lead to different decompositions. For example, the Poisson process N is
increasing and therefore of bounded variation: then .N = M + A with .A := N
and .M := 0. However, we have also the decomposition given by .At := λt and
.Mt := Nt −λt, where M is the compensated Poisson process (cf. Proposition 5.3.1).

9.4 Doob’s Decomposition and Quadratic Variation Process

In this section, we introduce a fundamental result that underpins the theory of stochastic integration: for every continuous local martingale X there exists an increasing process, called the quadratic variation process and denoted by ⟨X⟩, which "compensates" the local sub-martingale X² in the sense that X² − ⟨X⟩ is a continuous local martingale. The process ⟨X⟩ can be constructed path by path as the limit of the quadratic variation (9.3.1) as |π| → 0: this is consistent with what was seen in Example 9.3.5 related to the Brownian motion W, for which ⟨W⟩_t = t and the process W_t² − t is a continuous martingale.
Recall that M^{c,2} denotes the space of continuous martingales X such that X_t ∈ L²(Ω, P) for every t ≥ 0 (cf. Definition 8.3.1) and M^{c,loc} denotes the space of continuous local martingales (cf. Definition 8.4.2).


Theorem 9.4.1 (Doob’s Decomposition Theorem [[!!]) ] For every .X ∈ M c,2
there exist and are unique (up to indistinguishability) two processes M and .〈X〉
such that:
(i) M is a continuous martingale;
(ii) .〈X〉 is an adapted, continuous and increasing process,4 such that .〈X〉0 = 0;
(iii)

Xt2 = Mt + 〈X〉t ,
. t ≥ 0;

(iv)
⎡ ⎤
E (Xt − Xs )2 | Fs = E [〈X〉t − 〈X〉s | Fs ] ,
. t ≥ s ≥ 0. (9.4.1)

Formula (9.4.1) is the first version of an important identity called Itô’s isometry
(see Sect. 10.2.1).
More generally, if .X ∈ M c,loc then (ii) and (iii) still hold, while (i) is replaced by
(i’) .M ∈ M c,loc .
The process .〈X〉 is called the quadratic variation process of X and we have

2 ⎛

n
⎞2
.〈X〉t = lim X tkn − X t (k−1) , t > 0, (9.4.2)
n→∞ 2 2n
k=1

with convergence in probability. More generally, given a continuous semimartingale


of the form .S = X + A, with .X ∈ M c,loc and .A ∈ BV adapted, for every .t > 0 we

4 Clearly .〈X〉is also absolutely integrable since .〈X〉t = Xt2 − Mt with .Xt ∈ L2 (Ω, P ) by
hypothesis and .Mt ∈ L1 (Ω, P ) by definition of martingale.
166 9 Theory of Variation

have
2 ⎛

n
⎞2
.〈S〉t := lim S tkn − S t (k−1) = 〈X〉t (9.4.3)
n→∞ 2 2n
k=1

in probability and therefore we say that .〈S〉 is the quadratic variation process of S.
The proof of Theorem 9.4.1 is postponed to Sect. 9.6.
Example 9.4.2 Let X_t = t + W_t, where W is a Brownian motion; then by definition ⟨X⟩_t = ⟨W⟩_t = t. Note that E[X_t² − t] = t² and X_t² − t is not a martingale.

Remark 9.4.3 Theorem 9.4.1 is a special case of a deep and more general result, known as the Doob-Meyer decomposition theorem, which states that every càdlàg sub-martingale X of class D (i.e., such that the family of random variables X_τ, with τ a stopping time, is uniformly integrable) can be uniquely written in the form X = M + A where M is a continuous martingale and A is an increasing process such that A_0 = 0.
This result was first proved by Meyer in the 1960s and since then many other proofs have been provided. A particularly concise proof has been recently proposed in [14]: the very intuitive idea is to discretize the process X on the dyadics, use the discrete version of Doob's decomposition theorem (cf. Theorem 1.4.15) and finally prove that the sequence of discrete decompositions converges to the desired decomposition, using Komlós' Lemma 9.6.1.
Remark 9.4.4 By the optional sampling Theorem 8.1.6, the important identity (9.4.1) generalizes to the case where t, s are replaced by two bounded stopping times τ, σ such that σ ≤ τ ≤ T a.s. for some T > 0.

9.5 Covariation Matrix

We extend the concept of quadratic variation process to the multidimensional case.
Proposition 9.5.1 (Covariation Process) Let X, Y ∈ M^{c,loc} be real-valued processes. The covariation process of X and Y, defined by

    ⟨X, Y⟩ := \frac{⟨X + Y⟩ − ⟨X − Y⟩}{4},        (9.5.1)

is the unique (up to indistinguishability) process such that
(i) ⟨X, Y⟩ ∈ BV is adapted, continuous, and such that ⟨X, Y⟩_0 = 0;
(ii) XY − ⟨X, Y⟩ ∈ M^{c,loc} and is a true martingale if X, Y ∈ M^{c,2}.

If X, Y ∈ M^{c,2}, we have

    E[(X_t − X_s)(Y_t − Y_s) | F_s] = E[⟨X, Y⟩_t − ⟨X, Y⟩_s | F_s],   t ≥ s ≥ 0,        (9.5.2)

and

    ⟨X, Y⟩_t = \lim_{n→∞} \sum_{k=1}^{2^n} \left( X_{tk/2^n} − X_{t(k−1)/2^n} \right) \left( Y_{tk/2^n} − Y_{t(k−1)/2^n} \right),   t ≥ 0,        (9.5.3)

in probability.
Proof Given the elementary equality

    XY = \frac{(X + Y)² − (X − Y)²}{4},

it is easy to verify that the process ⟨X, Y⟩ defined as in (9.5.1) satisfies properties (i) and (ii). Uniqueness follows directly from Theorem 9.3.6. Formula (9.5.2) follows from the identity

    E[(X_t − X_s)(Y_t − Y_s) | F_s] = E[X_t Y_t − X_s Y_s | F_s]

and from the martingale property of XY − ⟨X, Y⟩. Formula (9.5.3) is a simple consequence of (9.5.1), applied to X + Y and X − Y, and of Proposition 11.2.4, whose proof is given in Chap. 11. □

Remark 9.5.2 By uniqueness, we have ⟨X, X⟩ = ⟨X⟩. The following properties are direct consequences of definition (9.5.1) of covariation and of (9.5.3):
(i) symmetry: ⟨X, Y⟩ = ⟨Y, X⟩;
(ii) bi-linearity: ⟨αX + βY, Z⟩ = α⟨X, Z⟩ + β⟨Y, Z⟩, for α, β ∈ R;
(iii) Cauchy-Schwarz: |⟨X, Y⟩| ≤ \sqrt{⟨X⟩⟨Y⟩}.
Since the quadratic variation of a continuous BV function is zero (cf. Proposition 9.3.4), the definition of quadratic variation extends to continuous semimartingales in a natural way: recall that in Theorem 9.4.1 we defined the quadratic variation process of a continuous semimartingale S = X + A, with X ∈ M^{c,loc} and A ∈ BV adapted, as ⟨S⟩ := ⟨X⟩.
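A simulation aside (Python with NumPy; the correlation ρ, seed and mesh are arbitrary choices): for two Brownian motions with correlation ρ, the discrete covariation approximates ⟨X, Y⟩_T = ρT, and the polarization formula (9.5.1) holds exactly at the level of the discrete sums, since (a + b)² − (a − b)² = 4ab term by term.

```python
import numpy as np

rng = np.random.default_rng(2)
rho, T, N = 0.6, 1.0, 50000
dt = T / N
d1 = rng.normal(0.0, np.sqrt(dt), N)
d2 = rng.normal(0.0, np.sqrt(dt), N)
dX = d1
dY = rho * d1 + np.sqrt(1.0 - rho**2) * d2   # Brownian increments, correlation rho
cov = float(np.sum(dX * dY))                  # discrete covariation <X, Y>_T
# Polarization: <X, Y> = (<X + Y> - <X - Y>) / 4, term by term.
polar = float(np.sum((dX + dY)**2) - np.sum((dX - dY)**2)) / 4.0
assert abs(cov - rho * T) < 0.05
assert abs(polar - cov) < 1e-8
```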
Definition 9.5.3 (Covariation Matrix of a Semimartingale) If S = (S¹, ..., S^d) is a continuous d-dimensional semimartingale with decomposition S = X + A, the covariation matrix of S is the d × d symmetric matrix defined by

    ⟨S⟩ := (⟨X^i, X^j⟩)_{i,j = 1, ..., d}.

9.6 Proof of Doob’s Decomposition Theorem

To prove Theorem 9.4.1 we adapt an argument proposed in [14], based on an interesting and useful result of functional analysis. The classic Bolzano-Weierstrass theorem ensures that from any bounded sequence in Euclidean space it is possible to extract a convergent subsequence. Although this result does not extend to the infinite-dimensional case, the following lemma shows that it is always possible to construct a convergent sequence of convex combinations (subsequences are particular convex combinations) of the elements of the starting sequence. More precisely, given a sequence (f_n)_{n∈N} in a Hilbert space, we denote by

    C_n = {λ_n f_n + ··· + λ_N f_N : N ≥ n, λ_n, ..., λ_N ≥ 0, λ_n + ··· + λ_N = 1}

the family of convex combinations of a finite number of elements of (f_k)_{k≥n}.


Lemma 9.6.1 (Komlós' Lemma [72]) Let (f_n)_{n∈N} be a bounded sequence in a Hilbert space. Then there exists a convergent sequence (g_n)_{n∈N}, with g_n ∈ C_n.
Proof If ‖f_n‖ ≤ K for each n ∈ N then, by the triangle inequality, ‖g‖ ≤ K for each g ∈ C_n. Therefore, setting

    a_n := \inf_{g ∈ C_n} ‖g‖,   n ∈ N,

we have a_n ≤ a_{n+1} and a := \sup_{n∈N} a_n ≤ K. Then for each n ∈ N there exists g_n ∈ C_n such that ‖g_n‖ ≤ a + 1/n. On the other hand, for each ε > 0 there exists n_ε ∈ N such that ‖(g_n + g_m)/2‖ ≥ a − ε for each n ≥ m ≥ n_ε, simply because (g_n + g_m)/2 ∈ C_m and by definition of a. Then, by the parallelogram identity, for each n, m ≥ n_ε we have

    ‖g_n − g_m‖² = 2‖g_n‖² + 2‖g_m‖² − ‖g_n + g_m‖² ≤ 4\left( a + \frac{1}{n ∧ m} \right)² − 4(a − ε)²

which proves that (g_n)_{n∈N} is a Cauchy sequence and therefore convergent. □
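The phenomenon behind the lemma can be illustrated in a finite-dimensional truncation of ℓ² (a Python aside assuming NumPy): the orthonormal sequence (e_n) has no convergent subsequence, since ‖e_n − e_m‖ = √2 for n ≠ m, yet the uniform convex combinations g_n = (e_n + ··· + e_N)/(N − n + 1) have norm (N − n + 1)^{−1/2}, which tends to zero.

```python
import numpy as np

d = 1000
E = np.eye(d)                     # rows: orthonormal vectors e_1, ..., e_d
for n, N in ((0, 9), (0, 99), (0, 999)):
    g = E[n:N + 1].mean(axis=0)   # uniform convex combination of N - n + 1 vectors
    expected = 1.0 / np.sqrt(N - n + 1)
    # The averaged vector has N - n + 1 entries equal to 1/(N - n + 1).
    assert abs(np.linalg.norm(g) - expected) < 1e-12
```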

Proof of Theorem 9.4.1 Uniqueness follows directly from Theorem 9.3.6 since if M′ and A′ satisfy (i), (ii) and (iii) then M − M′ is a continuous martingale of bounded variation starting from 0. We prove existence assuming first that X = (X_t)_{t∈[0,1]} is a continuous and bounded martingale:

    \sup_{t∈[0,1]} |X_t| ≤ K        (9.6.1)

for some positive constant K. This is the difficult part of the proof, in which the main ideas emerge. We proceed step by step.

Step 1 Fixing n ∈ N, we introduce the following notation to simplify the calculations on dyadics of [0, 1]:

    X_{n,k} = X_{k/2^n},   A_{n,k} = \sum_{i=1}^{k} (X_{n,i} − X_{n,i−1})²,   F_{n,k} := F_{k/2^n},   k = 0, 1, ..., 2^n.

Clearly k ↦ X_{n,k} and k ↦ A_{n,k} are processes adapted to the discrete filtration (F_{n,k})_{k=0,1,...,2^n} and k ↦ A_{n,k} is increasing. Moreover, the process

    M_{n,k} := X_{n,k}² − A_{n,k},   k = 0, 1, ..., 2^n,

is a discrete martingale. In fact, we have

    E[A_{n,k} − A_{n,k−1} | F_{n,k−1}] = E[(X_{n,k} − X_{n,k−1})² | F_{n,k−1}] =

(by (1.4.3))

    = E[X_{n,k}² − X_{n,k−1}² | F_{n,k−1}]        (9.6.2)

which proves the martingale property of M_{n,k}.


Step 2 This is the crucial point of the proof: we show that

    \sup_{n∈N} E[A_{n,2^n}²] ≤ 36K⁴.        (9.6.3)

Note that, for each fixed n ∈ N, the final value A_{n,2^n} of the process A_{n,·} is clearly in L²(Ω, P), being a finite sum of terms that are bounded by hypothesis: however, the number of such terms increases exponentially in n and this explains the difficulty in proving (9.6.3), which is a uniform estimate in n ∈ N. Here we essentially use the martingale property and the boundedness of X (note that in the general hypotheses X is square-integrable but in (9.6.3) powers of X of order four appear). We have

    A_{n,2^n}² = \sum_{k=1}^{2^n} (X_{n,k} − X_{n,k−1})⁴ + 2 \sum_{k=1}^{2^n} \sum_{h=k+1}^{2^n} (X_{n,k} − X_{n,k−1})² (X_{n,h} − X_{n,h−1})²
               = \sum_{k=1}^{2^n} (X_{n,k} − X_{n,k−1})⁴ + 2 \sum_{k=1}^{2^n} (X_{n,k} − X_{n,k−1})² (A_{n,2^n} − A_{n,k}).        (9.6.4)

By taking the expectation, we estimate the first sum of Eq. (9.6.4) pointwise using Eq. (9.6.1). Then, we apply the tower property in the second sum:

    E[A_{n,2^n}²] ≤ 2K² \sum_{k=1}^{2^n} E[(X_{n,k} − X_{n,k−1})²]
                  + 2 \sum_{k=1}^{2^n} E\left[ (X_{n,k} − X_{n,k−1})² E[A_{n,2^n} − A_{n,k} | F_{n,k}] \right] =

(by the martingale property (9.6.2) of M_{n,k} = X_{n,k}² − A_{n,k})

    = 2K² E[A_{n,2^n}] + 2 \sum_{k=1}^{2^n} E\left[ (X_{n,k} − X_{n,k−1})² E[X_{n,2^n}² − X_{n,k}² | F_{n,k}] \right] ≤

(since |X_{n,2^n}² − X_{n,k}²| ≤ 2K²)

    ≤ 6K² E[A_{n,2^n}] ≤ 6K² E[A_{n,2^n}²]^{1/2}

having applied Hölder's inequality in the last step. This concludes the proof of (9.6.3).
Step 3 We extend the discrete martingale M_{n,·} to the whole [0, 1] by setting

    M_t^{(n)} := E[M_{n,2^n} | F_t],   t ∈ [0, 1].

For every t ∈ [(k−1)/2^n, k/2^n] we have, by the tower property,

    M_t^{(n)} = E[E[M_{n,2^n} | F_{n,k}] | F_t]
              = E[M_{n,k} | F_t]
              = E[X_{n,k}² − A_{n,k} | F_t]
              = E[X_{n,k}² − (X_{n,k} − X_{n,k−1})² | F_t] − A_{n,k−1}
              = E[2X_{n,k} X_{n,k−1} | F_t] − X_{n,k−1}² − A_{n,k−1}
              = 2X_t X_{n,k−1} − X_{n,k−1}² − A_{n,k−1}.

Then, from the continuity of X, it follows that M^{(n)} also is a continuous process. Moreover, by Step 2 the sequence

    M_1^{(n)} = X_1² − A_{n,2^n}

is bounded in L²(Ω, P). One could prove that (M_1^{(n)})_{n∈N} is a Cauchy sequence, converging in L² norm (and therefore in probability), but the direct proof of this fact is a bit technical and laborious. Therefore, here we prefer to take a shortcut relying on Komlós' Lemma 9.6.1: for each n ∈ N there exist non-negative weights λ_n^{(n)}, ..., λ_{N_n}^{(n)}, whose sum is equal to one, such that setting

    M̃_{n,t} = λ_n^{(n)} M_t^{(n)} + ··· + λ_{N_n}^{(n)} M_t^{(N_n)},   t ∈ [0, 1],

we have that M̃_{n,1} converges in L²(Ω, P) to a random variable Z. Let M be a càdlàg version of the martingale defined by

    M_t := E[Z | F_t],   t ∈ [0, 1].

Since t ↦ M̃_{n,t} is a continuous martingale for each n ∈ N, by Doob's maximal inequality we have

    E\left[ \sup_{t∈[0,1]} |M̃_{n,t} − M_t|² \right] ≤ 4E[|M̃_{n,1} − M_1|²] = 4E[|M̃_{n,1} − Z|²].

Hence, after taking a subsequence, we have

    \lim_{n→∞} \sup_{t∈[0,1]} |M̃_{n,t}(ω) − M_t(ω)|² = 0,   ω ∈ Ω \ F,

with F negligible, from which we deduce the existence of a continuous version of M. Consequently, also the process

    A_t := X_t² − M_t

is continuous.
is continuous.
To show that A is increasing, we first fix two dyadic numbers s, t ∈ [0, 1] with s ≤ t: then there exists n̄ such that s, t ∈ D_n for every n ≥ n̄, that is, s = k_n/2^n and t = h_n/2^n for certain k_n, h_n ∈ {0, 1, ..., 2^n}. Now by construction

    X_{n,k_n}² − M_{n,k_n} = A_{n,k_n} ≤ A_{n,h_n} = X_{n,h_n}² − M_{n,h_n}

and a similar inequality also holds for every convex combination, so in the limit we have A_s(ω) ≤ A_t(ω) for every ω ∈ Ω \ F. From the density of dyadic numbers in [0, 1] and the continuity of A, it follows that A is increasing a.s. Finally, we prove (9.4.1): by (1.4.3) we have

    E[(X_t − X_s)² | F_s] = E[X_t² − X_s² | F_s]
                          = E[M_t − M_s | F_s] + E[A_t − A_s | F_s]
                          = E[A_t − A_s | F_s].

Step 4 Now suppose that .X = (Xt )t≥0 is a continuous, not necessarily bounded,
martingale but such that .Xt ∈ L2 (Ω, P ) for every .t ≥ 0. We use a localization
procedure and define the sequence of stopping times

τn = inf{t | |Xt | ≥ n} ∧ n,
. n ∈ N.

By the continuity of X, we have .τn ↗ ∞ as .n → ∞. By Corollary 8.4.1, .Xt∧τn


is a continuous, bounded martingale that is constant for .t ≥ n: then we can use
the previous arguments to show that there exist a continuous square-integrable
martingale .M (n) and a continuous and increasing process .A(n) such that

(n) (n)
.
2
Xt∧τ n
= Mt + At , t ≥ 0.

By uniqueness, for every .m > n we have .Mt(n) = Mt(m) and .A(n) t = A(m)
t for
(n) (n)
.t ∈ [0, τn ]: thus the definition .Mt := Mt and .At := At is well posed for every n
such that .τn ≥ t. Clearly, .M, A are continuous processes, A is increasing and M is
a martingale: indeed, if .0 ≤ s ≤ t, for every n such that .τn ≥ t we have
⎡ ⎤
Ms∧τn = E Mt∧τn | Fs .
.

Hence, we conclude by employing the same reasoning as in the proof of Theorem 8.1.6, given that the family $\{M_{t \wedge \tau_n} \mid n \in \mathbb{N}\}$ is uniformly integrable, as guaranteed by Doob's inequality

$$E\Big[\sup_{s \in [0,t]} |M_s|^2\Big] \le 4\, E\big[M_t^2\big]$$

and Remark C.0.10 in [113].


The same localizing sequence can be used to deal with the case where $X \in \mathcal{M}^{c,loc}$, in which case it is clear that $M \in \mathcal{M}^{c,loc}$.

Step 5 With the tools currently available, proving formulas (9.4.2) and (9.4.3) would require lengthy and laborious calculations. However, since we do not need these formulas soon, we defer their proof to a later stage, when we will have Itô's formula at our disposal: this will simplify the proof significantly (cf. Proposition 11.2.4).



9.7 Key Ideas to Remember

We highlight the major takeaways from this chapter and the key concepts you should
remember after your first read-through, skipping over the technical jargon and less
important details. As usual, if you have any doubt about what the following succinct
statements mean, please review the corresponding section.
• Section 9.1: to facilitate the understanding of the stochastic integration theory,
we recall the definition of the Riemann-Stieltjes integral. It is the natural
generalization of the Riemann integral, defined under the assumption that the
integrand function is continuous and the integrator is of bounded variation.
The main rules of integral calculus are provided by Itô’s formula which, in a
deterministic version, anticipates the analogous result for the stochastic integral.
• Section 9.2: the Lebesgue integral can be generalized as well. In fact, by Carathéodory's theorem, to each BV function there is associated a (signed) measure, called the Lebesgue-Stieltjes measure. The related integral, called the Lebesgue-Stieltjes integral, admits a much larger class of integrable functions than the Riemann-Stieltjes integral.
• Section 9.3: a semimartingale is an adapted process that decomposes into the sum of a local martingale and a BV process. For a continuous semimartingale, this decomposition is unique: in fact, if a process is simultaneously a continuous local martingale and of bounded variation, then it is indistinguishable from a constant process. This is due to the fact that a continuous BV process X has zero quadratic variation, and this, in combination with the martingale property, implies (see (9.3.5)) that X is constant. A direct and instructive calculation shows that the quadratic variation process of a Brownian motion W satisfies $\langle W \rangle_T = T$: consequently, almost all trajectories of W are not of bounded variation.
• Section 9.4: the Doob’s decomposition theorem states that for every continuous
local martingale X there exists an increasing (and therefore .BV) process, called
the quadratic variation process and denoted by .〈X〉, which “compensates” the
local sub-martingale .X2 in the sense that .X2 − 〈X〉 is a continuous local
martingale. In practice, this result states that .X2 is a semimartingale and provides
its Doob’s decomposition into .BV and martingale parts.
• Section 9.6: the general idea of the proof of Doob's decomposition theorem is simple: the process $\langle X \rangle$ can be constructed path by path as the limit of the quadratic variation along partitions. However, given the weight of the technical details involved, it is advisable to skip this section on a first reading.
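The statement about the quadratic variation of Brownian motion lends itself to a quick numerical check. The following sketch (not from the book; NumPy, with illustrative grid sizes) simulates one Brownian path on a fine partition of $[0, T]$ and compares the discretized quadratic variation, which stabilizes near T, with the first variation, which blows up as the mesh shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 200_000              # horizon and number of grid points (illustrative)
dt = T / n

# One Brownian path on the grid: independent increments ~ N(0, dt)
dW = rng.normal(0.0, np.sqrt(dt), size=n)

# Discretized quadratic variation: sum of squared increments over the partition
qv = np.sum(dW**2)               # concentrates around <W>_T = T

# Discretized first variation: sum of absolute increments, which diverges
# as the mesh shrinks (so the path is not of bounded variation)
fv = np.sum(np.abs(dW))

print(qv, fv)
```

Refining the partition makes the sum of squared increments concentrate around T, while the sum of absolute increments grows like the square root of the number of grid points: this is the numerical face of the fact that Brownian trajectories have finite quadratic variation but unbounded first variation.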
174 9 Theory of Variation

Main notations used or introduced in this chapter:

Symbol | Description | Page
$\mathcal{P}_T$ | Family of partitions $\pi$ of $[0, T]$ | 152
$V(g; \pi)$ | First variation of the function g relative to $\pi$ | 152
$BV_T$ | Family of functions of bounded variation on $[0, T]$ | 152
$V_T(g)$ | First variation of the function g on $[0, T]$ | 152
$BV$ | Family of functions locally of bounded variation | 152
$\int_0^T f \, dg$ | Riemann-Stieltjes integral of f with respect to g on $[0, T]$ | 154
$dF(g(t)) = F'(g(t))\,dg(t)$ | Deterministic Itô formula in differential notation | 156
$\mu_g$ | Lebesgue-Stieltjes measure of $g \in BV \cap C$ | 158
$\mathcal{M}^{c,2}$ | Continuous square-integrable martingales | 141
$\mathcal{M}^{c,loc}$ | Continuous local martingales | 143
$V_T^{(2)}(g; \pi)$ | Quadratic variation of the function g relative to $\pi$ | 161
$\langle X \rangle$ | Quadratic variation process | 165
$\langle X, Y \rangle$ | Covariation process | 166
Chapter 10
Stochastic Integral

One needs for stochastic integration a six months course to cover only the definitions. What is there to do?
Paul-André Meyer

In this chapter, we introduce the stochastic integral

$$X_t := \int_0^t u_s \, dB_s, \qquad t \ge 0,$$

interpreted as a stochastic process with varying integration endpoint.¹ We will
assume appropriate hypotheses on the integrand process u and the integrator pro-
cess B. The prototype for the integrator is the Brownian motion: since the Brownian
trajectories are not of bounded variation, we cannot adopt the deterministic theory
of Lebesgue-Stieltjes integration to define the integral path by path. Instead, we will
follow the construction due to Kiyosi Itô (1915–2008) which is based on the theory
of variation presented in Chap. 9: a crucial ingredient is the assumption that the
integrand process u is progressively measurable.
The construction of the stochastic integral is in some ways analogous to that of
the Lebesgue integral but is decidedly longer and more laborious: it begins with the
“simple” processes (i.e., piecewise constant in time) and advances to progressively
measurable processes whose trajectories satisfy a weak integrability property with
respect to the time variable. An important intermediate step is when u is a “square-
integrable process” (cf. Definition 10.1.1); in this case, the stochastic integral has
some remarkable properties: it is a continuous square-integrable martingale, i.e., it

¹ So we want to define $X_t$ not only as a random variable for fixed t, but as a stochastic process indexed by $t \ge 0$: we will see that this entails some additional difficulty due to the fact that t varies in an uncountable set.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 175
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_10

belongs to the space $\mathcal{M}^{c,2}$, the so-called Itô isometry holds, and finally, the quadratic variation process is given explicitly by

$$\langle X \rangle_t = \int_0^t u_s^2 \, d\langle B \rangle_s, \qquad t \ge 0.$$

The last part of the chapter is dedicated to the definition of the stochastic integral
in the case where B is a continuous semimartingale. We will also introduce the
important class of Itô processes which are continuous semimartingales that can
be uniquely decomposed into the sum of a Lebesgue integral (of a progressively
measurable and absolutely integrable process) with a Brownian stochastic integral.

As stated by Meyer in the quote at the beginning of the chapter, an entire semester course would be needed just to give the definition of the stochastic integral in full detail. For those approaching the theory of stochastic integration for the first time, it is advisable to follow the reading scheme indicated in Sect. 10.5, focusing on Sects. 10.1 and 10.4 and initially skipping Sects. 10.2 and 10.3.

10.1 Integral with Respect to a Brownian Motion

For an introductory purpose, we examine the particular case where B is a real Brownian motion defined on a filtered space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$. To overcome the problem of irregularity of Brownian trajectories, the idea is to selectively choose the class of integrand processes in order to exploit some probabilistic properties.
Definition 10.1.1 We denote by $\mathcal{L}^2$ the class of processes $u = (u_t)_{t \ge 0}$ such that:
(i) u is progressively measurable with respect to $(\mathcal{F}_t)$ (cf. Definition 6.2.27);
(ii) for every $T \ge 0$ we have

$$E\left[\int_0^T u_t^2 \, dt\right] < \infty. \qquad (10.1.1)$$

Remark 10.1.2 Property (i) is more than a simple condition of joint measurability in $(t, \omega)$ (which would be natural, since we are defining an integral): it also incorporates the crucial assumption that u respects the information structure of the given filtration. Recall that, if u is continuous, then (i) is equivalent to u being adapted to $(\mathcal{F}_t)$.

Remark 10.1.3 As previously mentioned, we restrict our attention to continuous integrators. However, it is possible to define the stochastic integral also with respect to càdlàg processes, such as the Poisson process. In such cases, it is necessary to impose a more stringent condition on the integrand, essentially requiring it to be approximable by left-continuous processes.²
As for the Lebesgue integral, the construction of the stochastic integral takes
place in steps, initially considering “simple” processes.
Definition 10.1.4 We say that $u \in \mathcal{L}^2$ is simple if

$$u_t = \sum_{k=1}^N \alpha_k \mathbb{1}_{[t_{k-1}, t_k[}(t), \qquad t \ge 0, \qquad (10.1.2)$$

where $0 \le t_0 < t_1 < \cdots < t_N$ and $\alpha_1, \dots, \alpha_N$ are random variables such that $P(\alpha_k \ne \alpha_{k+1}) > 0$ for $k = 1, \dots, N-1$. For every $T \ge t_N$ we set

$$\int_0^T u_t \, dB_t := \sum_{k=1}^N \alpha_k \big( B_{t_k} - B_{t_{k-1}} \big)$$

and define the stochastic integral for two generic integration endpoints a and b, with $0 \le a \le b$, as

$$\int_a^b u_t \, dB_t := \int_0^{t_N} u_t \mathbb{1}_{[a,b[}(t) \, dB_t. \qquad (10.1.3)$$

In this introductory part, we do not worry about clarifying all the details of the definition of the integral, such as the fact that (10.1.3) is well posed because it is independent, up to indistinguishable processes, of the representation (10.1.2) of the process u.
Remark 10.1.5 A simple process is piecewise constant as a function of time and has trajectories that depend on the coefficients $\alpha_1, \dots, \alpha_N$, which are random. From the fact that $u \in \mathcal{L}^2$, some properties of the variables $\alpha_1, \dots, \alpha_N$ follow:

(i) since u is progressively measurable and $\alpha_k = u_t \in m\mathcal{F}_t$ for every $t \in [t_{k-1}, t_k[$, then

$$\alpha_k \in m\mathcal{F}_{t_{k-1}}, \qquad k = 1, \dots, N; \qquad (10.1.4)$$
² The Poisson process is a BV process and therefore we can define the related stochastic integral in the Lebesgue-Stieltjes sense: however, if the integrand is not continuous from the left, the integral loses the fundamental property of being a (local) martingale; for an intuitive explanation of this fact, see Section 2.1 in [37].

(ii) by the integrability assumption (10.1.1) we have

$$E\left[\int_0^{t_N} u_t^2 \, dt\right] = \sum_{k=1}^N E\left[\int_0^{t_N} \alpha_k^2 \mathbb{1}_{[t_{k-1}, t_k[}(t) \, dt\right] = \sum_{k=1}^N E\big[\alpha_k^2\big](t_k - t_{k-1}) < +\infty$$

and therefore $\alpha_1, \dots, \alpha_N \in L^2(\Omega, P)$.
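For a concrete feeling of Definition 10.1.4, the following sketch (illustrative, not from the book) evaluates the stochastic integral of a simple process along one simulated Brownian path; the adapted coefficients are chosen as $\alpha_k = B_{t_{k-1}}$, which is $\mathcal{F}_{t_{k-1}}$-measurable as required by (10.1.4):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # 0 = t_0 < t_1 < ... < t_N (illustrative)

# Brownian values at the grid points: cumulative sum of independent increments
dB = rng.normal(0.0, np.sqrt(np.diff(t)))
B = np.concatenate([[0.0], np.cumsum(dB)])

# F_{t_{k-1}}-measurable coefficients: here alpha_k = B_{t_{k-1}} (an adapted choice)
alpha = B[:-1]

# Stochastic integral of the simple process: sum of alpha_k * (B_{t_k} - B_{t_{k-1}})
integral = np.sum(alpha * np.diff(B))
print(integral)
```

Note that for this particular choice of coefficients the sum telescopes algebraically to $\frac{1}{2}\big(B_{t_N}^2 - \sum_k (B_{t_k} - B_{t_{k-1}})^2\big)$, an identity that holds path by path for any grid.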


We now prove some fundamental properties of the stochastic integral.
Theorem 10.1.6 ([!]) Given two simple processes $u, v \in \mathcal{L}^2$, consider

$$X_t := \int_0^t u_s \, dB_s, \qquad Y_t := \int_0^t v_s \, dB_s, \qquad t \ge 0.$$

For $0 \le s \le t \le T$ the following properties hold:

(i) X is a continuous square-integrable martingale, $X \in \mathcal{M}^{c,2}$, and

$$E\left[\int_s^t u_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = 0; \qquad (10.1.5)$$

(ii) the Itô isometry holds:

$$E\left[\Big(\int_s^t u_r \, dB_r\Big)^2 \,\Big|\, \mathcal{F}_s\right] = E\left[\int_s^t u_r^2 \, dr \,\Big|\, \mathcal{F}_s\right] \qquad (10.1.6)$$

and more generally

$$E\left[\int_s^t u_r \, dB_r \int_s^t v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = E\left[\int_s^t u_r v_r \, dr \,\Big|\, \mathcal{F}_s\right], \qquad (10.1.7)$$

$$E\left[\int_s^t u_r \, dB_r \int_t^T v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = 0; \qquad (10.1.8)$$

(iii) the covariation process of X and Y (cf. Proposition 9.5.1) is given by

$$\langle X, Y \rangle_t = \int_0^t u_s v_s \, ds, \qquad t \ge 0. \qquad (10.1.9)$$

Finally, the unconditional versions of formulas (10.1.5), (10.1.6), (10.1.7) and (10.1.8) also hold.

Proof First, let us observe that formulas (10.1.5), (10.1.6), (10.1.7) and (10.1.8) are equivalent to

$$E[X_t - X_s \mid \mathcal{F}_s] = 0, \qquad (10.1.10)$$
$$E\big[(X_t - X_s)^2 \mid \mathcal{F}_s\big] = E[\langle X \rangle_t - \langle X \rangle_s \mid \mathcal{F}_s],$$
$$E[(X_t - X_s)(Y_t - Y_s) \mid \mathcal{F}_s] = E[\langle X, Y \rangle_t - \langle X, Y \rangle_s \mid \mathcal{F}_s],$$
$$E[(X_t - X_s)(Y_T - Y_t) \mid \mathcal{F}_s] = 0.$$

Let us prove (10.1.5), which is equivalent to the martingale property $E[X_t \mid \mathcal{F}_s] = X_s$: referring to (10.1.2) and recalling the notation (10.1.3), it is not restrictive to assume $s = t_k$ and $t = t_h$ for some $k, h$ with $k < h \le N$. We have

$$E\big[X_{t_h} \mid \mathcal{F}_{t_k}\big] = X_{t_k} + E\left[\int_{t_k}^{t_h} u_r \, dB_r \,\Big|\, \mathcal{F}_{t_k}\right] = X_{t_k} + \sum_{i=k+1}^h E\big[\alpha_i (B_{t_i} - B_{t_{i-1}}) \mid \mathcal{F}_{t_k}\big] =$$

(by (10.1.4) and the tower property)

$$= X_{t_k} + \sum_{i=k+1}^h E\big[\alpha_i\, E[B_{t_i} - B_{t_{i-1}} \mid \mathcal{F}_{t_{i-1}}] \mid \mathcal{F}_{t_k}\big] = X_{t_k}$$

where the last equality follows from the independence and stationarity of Brownian increments, for which we have

$$E\big[B_{t_i} - B_{t_{i-1}} \mid \mathcal{F}_{t_{i-1}}\big] = E\big[B_{t_i} - B_{t_{i-1}}\big] = 0$$

for every $i = 1, \dots, N$.
Regarding Itô’s isometry, still assuming that .s = tk and .t = th , we have
⎾⎛ˆ ⎞2 ⎤
t
E
. ur dBr | Fs
s
⎾( )2 ⎤
=E Xth − Xtk | Ftk
⎡⎛ ⎞2 ⎤
⎲h
( )
=E⎣ αi Bti − Bti−1 | F tk ⎦
i=k+1
180 10 Stochastic Integral


h ⎾ ( )2 ⎤
= E αi2 Bti − Bti−1 | Ftk
i=k+1

1 ⎲ ⎾ ( ) ( ) ⎤
+ E αi Bti − Bti−1 αj Btj − Btj −1 | Ftk =
2
k+1≤i<j ≤h

(by (10.1.4) and the tower property)


h ⎾ ⎾( )2 ⎤ ⎤
. = E αi2 E Bti − Bti−1 | Fti−1 | Ftk
i=k+1

1 ⎲ ⎾ ( ) ⎾ ⎤ ⎤
+ E αi Bti − Bti−1 αj E Btj − Btj −1 | Ftj −1 | Ftk =
2
k+1≤i<j ≤h

(since .Btj − Btj −1 is independent of .Ftj −1 )


h ⎾ ⎤
. = E αi2 (ti − ti−1 ) | Ftk
i=k+1


h ⎾ˆ t ⎤
= E αi2 1[ti−1 ,ti [ (r)dr | Fs
i=k+1 s
⎾ˆ t ⎤
=E u2r dr | Fs .
s

Formula (10.1.7) is proven in a similar way. Regarding (10.1.8), it is enough to observe that

$$E\left[\int_s^t u_r \, dB_r \int_t^T v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = E\left[\int_s^T u_r \mathbb{1}_{[s,t[}(r) \, dB_r \int_s^T v_r \mathbb{1}_{[t,T[}(r) \, dB_r \,\Big|\, \mathcal{F}_s\right] =$$

(by (10.1.7))

$$= E\left[\int_s^T u_r v_r \mathbb{1}_{[s,t[}(r) \mathbb{1}_{[t,T[}(r) \, dr \,\Big|\, \mathcal{F}_s\right] = 0.$$

Finally, $\langle X, Y \rangle$ in (10.1.9) is a BV process that is adapted, continuous, and such that $\langle X, Y \rangle_0 = 0$. Recalling Proposition 9.5.1, to prove that $\langle X, Y \rangle$ is the covariation process of X and Y, it is enough to verify that $XY - \langle X, Y \rangle$ is a martingale. For $0 \le s \le t$, we have

$$E[X_t Y_t \mid \mathcal{F}_s] = X_s Y_s + E[(X_t - X_s)(Y_t - Y_s) \mid \mathcal{F}_s] + X_s E[Y_t - Y_s \mid \mathcal{F}_s] + Y_s E[X_t - X_s \mid \mathcal{F}_s] =$$

(by (10.1.7), and since the last two terms vanish by (10.1.10))

$$= X_s Y_s + E\left[\int_s^t u_r v_r \, dr \,\Big|\, \mathcal{F}_s\right] = X_s Y_s + E[\langle X, Y \rangle_t - \langle X, Y \rangle_s \mid \mathcal{F}_s]$$

which proves the thesis. □
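The unconditional versions of (10.1.5) and (10.1.6) can be checked by Monte Carlo simulation. The sketch below (illustrative parameters, not from the book) approximates $E[\int u\,dB]$ and compares $E[(\int u\,dB)^2]$ with $E[\int u^2\,dt]$, using the adapted integrand $u_t = B_{t_{k-1}}$ on each subinterval:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps, T = 50_000, 50, 1.0          # illustrative Monte Carlo parameters
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])   # B_{t_{k-1}} on each step

u = B_left                                     # simple adapted integrand u_t = B_{t_{k-1}}
I = np.sum(u * dB, axis=1)                     # stochastic integral, one value per path

mean_I = I.mean()                              # should be close to 0 (martingale property)
lhs = np.mean(I**2)                            # estimate of E[(int u dB)^2]
rhs = np.mean(np.sum(u**2 * dt, axis=1))       # estimate of E[int u^2 dt] (Ito isometry)
print(mean_I, lhs, rhs)
```

Both estimates agree up to Monte Carlo error, and the mean of the integral is close to zero, exactly as the theorem predicts; adaptedness of u (using the left endpoint) is essential here.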



Thanks to Itô’s isometry (10.1.6), the stochastic integral extends to the case of
integrands in .L2 with an approximation procedure using simple processes. The
following density result holds, whose proof is postponed to Sect. 10.1.1.
Lemma 10.1.7 Let .u ∈ L2 . For every .T > 0 there exists a sequence .(un )n∈N of
simple processes in .L2 that converges to u in the .L2 (Ω × [0, T ])-norm:
⎾ˆ ⎤
T ( )2
. lim E us − un,s ds = 0. (10.1.11)
n→∞ 0

Given $u \in \mathcal{L}^2$, we consider an approximating sequence $(u_n)_{n \in \mathbb{N}}$ of simple processes as in Lemma 10.1.7 for a fixed $T > 0$. Then $(u_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $L^2([0, T] \times \Omega)$ and by the Itô isometry we have

$$\lim_{n,m \to \infty} E\left[\Big(\int_0^T u_{n,s} \, dB_s - \int_0^T u_{m,s} \, dB_s\Big)^2\right] = \lim_{n,m \to \infty} E\left[\int_0^T (u_{n,s} - u_{m,s})^2 \, ds\right] = 0.$$

Hence, the sequence of stochastic integrals is also a Cauchy sequence in $L^2(\Omega, P)$, thereby ensuring the existence of

$$\int_0^T u_s \, dB_s := \lim_{n \to \infty} \int_0^T u_{n,s} \, dB_s.$$

With this procedure, the stochastic integral is defined for a fixed T as a limit in the $L^2(\Omega, P)$ norm, i.e., only up to a negligible event. We will see in Sect. 10.2.3 that, thanks to Doob's maximal inequality, it is possible to construct the integral as a stochastic process (varying the integration endpoint) by defining it as a limit in the space of martingales $\mathcal{M}^{c,2}$. By approximation, the properties of Theorem 10.1.6 remain valid under the assumption that $u \in \mathcal{L}^2$.
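To see the approximation procedure at work, one can take $u = B$ itself and refine the left-endpoint sums: as will follow from Itô's formula in Chap. 11, the limit is $\frac{1}{2}(B_T^2 - T)$. The sketch below (illustrative, not from the book; a single path on dyadic partitions) measures the distance from this limit:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_fine = 1.0, 2**14                         # one path on a fine grid (illustrative)
dt = T / n_fine

dB = rng.normal(0.0, np.sqrt(dt), size=n_fine)
B = np.concatenate([[0.0], np.cumsum(dB)])

limit = 0.5 * (B[-1]**2 - T)                   # the value predicted by Ito's formula
errors = []
for m in (2**4, 2**8, 2**12):                  # coarser dyadic partitions of [0, T]
    step = n_fine // m
    Bc = B[::step]                             # the path sampled on m subintervals
    I_m = np.sum(Bc[:-1] * np.diff(Bc))        # left-endpoint (Ito) sum of int B dB
    errors.append(abs(I_m - limit))
print(errors)
```

Note that using left endpoints is not a detail: right-endpoint or midpoint sums converge to different limits, which is why the construction insists on progressively measurable integrands.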

In Sect. 10.2.4 we will further extend the integral to the case of integrands $u \in \mathcal{L}^2_{loc}$, that is, u progressively measurable and satisfying the mild integrability condition

$$\int_0^T u_t^2 \, dt < \infty \quad \text{for every } T > 0, \text{ a.s.,} \qquad (10.1.12)$$

which is considerably weaker than (10.1.1): for example, every adapted continuous process u belongs to $\mathcal{L}^2_{loc}$ since the integral in (10.1.12), over the compact interval $[0, T]$, is finite by the continuity of the trajectories of u. On the other hand, $u_t = \exp(B_t^4)$ is in $\mathcal{L}^2_{loc}$ but not³ in $\mathcal{L}^2$. Theorem 10.1.6 does not extend to the case of $u \in \mathcal{L}^2_{loc}$; however, we will prove that in this case the integral process is a local martingale.

10.1.1 Proof of Lemma 10.1.7

To prove the density of the class of simple processes in the space $\mathcal{L}^2$, we use the following consequence of Proposition B.3.3 in [113], namely the so-called "continuity in mean" of absolutely integrable functions.

Corollary 10.1.8 (Continuity in Mean) If $f \in L^1(\mathbb{R})$ then for almost every $x \in \mathbb{R}$ we have

$$\lim_{h \to 0} \frac{1}{h} \int_x^{x+h} |f(x) - f(y)| \, dy = 0.$$

We prove Lemma 10.1.7 first assuming that u is continuous. For fixed $T > 0$ and $n \in \mathbb{N}$, we denote by

$$t_{n,k} = \frac{Tk}{2^n}, \qquad k = 0, \dots, 2^n, \qquad (10.1.13)$$

the dyadic numbers of $[0, T]$ and define the simple process

$$u_{n,t} = \sum_{k=1}^{2^n} \alpha_{n,k} \mathbb{1}_{[t_{n,k-1}, t_{n,k}[}(t), \qquad \alpha_{n,k} = u_{t_{n,k-1}} \mathbb{1}_{\{|u_{t_{n,k-1}}| \le n\}}, \qquad t \in [0, T].$$

Then (10.1.11) follows from the dominated convergence theorem.
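The dyadic approximation just described is easy to visualize numerically. The sketch below (illustrative, not from the book; the truncation indicator is omitted for simplicity) freezes a continuous adapted process, here $u = B$, at the left dyadic points and estimates the $L^2([0,T] \times \Omega)$ error, which shrinks as the dyadic level grows:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_grid, n_paths = 1.0, 2**10, 2000          # illustrative sizes
dt = T / n_grid

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_grid))
B = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)])
u = B[:, :-1]                                  # a continuous adapted process (u = B),
                                               # sampled at the left point of each cell

def dyadic_error(n):
    """L2([0,T] x Omega) error of the piecewise-constant dyadic approximation u_n."""
    step = n_grid // 2**n
    u_n = np.repeat(u[:, ::step], step, axis=1)   # freeze u at the left dyadic point
    return np.sqrt(np.mean(np.sum((u - u_n)**2 * dt, axis=1)))

errs = [dyadic_error(n) for n in (2, 4, 6)]
print(errs)
```

For $u = B$ the squared error can even be computed in closed form as roughly $2^{-n-1}T^2$, which matches the decay observed in the simulation.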

³ Since

$$E\left[\int_0^T e^{2B_t^4} \, dt\right] = \int_{\mathbb{R}} \int_0^T e^{2x^4} \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{x^2}{2t}} \, dt \, dx = +\infty.$$

To conclude, it is enough to prove that every $u \in \mathcal{L}^2$ can be approximated in the $L^2([0, T] \times \Omega)$ norm by a sequence $(u_n)_{n \in \mathbb{N}}$ of continuous processes in $\mathcal{L}^2$. To this end, we define⁴

$$u_{n,t} := \fint_{(t - \frac{1}{n}) \vee 0}^{\,t} u_s \, ds, \qquad 0 < t \le T, \; n \in \mathbb{N}.$$

Note that $u_n$ is continuous and adapted (and therefore progressively measurable). Moreover, we have

$$E\left[\int_0^T (u_t - u_{n,t})^2 \, dt\right] = E\left[\int_0^T \Big(\fint_{(t - \frac{1}{n}) \vee 0}^{\,t} (u_t - u_s) \, ds\Big)^2 dt\right] \le$$

(by Jensen's inequality)

$$\le E\left[\int_0^T \fint_{(t - \frac{1}{n}) \vee 0}^{\,t} (u_t - u_s)^2 \, ds \, dt\right] = \int_0^T \fint_{(t - \frac{1}{n}) \vee 0}^{\,t} E\big[(u_t - u_s)^2\big] \, ds \, dt. \qquad (10.1.14)$$

Now, by Corollary 10.1.8 we have

$$\lim_{n \to \infty} \fint_{(t - \frac{1}{n}) \vee 0}^{\,t} E\big[(u_t - u_s)^2\big] \, ds = 0 \quad \text{a.e.}$$

and therefore we can take the limit in (10.1.14) as $n \to \infty$ and conclude using the Lebesgue dominated convergence theorem.

10.2 Integral with Respect to Continuous Square-Integrable Martingales

We assume that the integrator process B belongs to the class $\mathcal{M}^{c,2}$, i.e., B is a continuous martingale such that $B_t \in L^2(\Omega, P)$ for every $t \ge 0$. The construction of the stochastic integral is similar to the case of a Brownian motion, with some additional technicalities.
We denote by $\langle B \rangle$ the quadratic variation process defined in Theorem 9.4.1: $\langle B \rangle$ is a continuous and increasing process associated with the Lebesgue-Stieltjes measure $\mu_{\langle B \rangle}$ (cf. Sect. 9.2). We let

$$\int_{[a,b]} f \, d\mu_{\langle B \rangle} \qquad \text{or} \qquad \int_a^b f(t) \, d\langle B \rangle_t, \qquad 0 \le a \le b,$$

indicate the integral with respect to $\mu_{\langle B \rangle}$. For example, if B is a Brownian motion then $\langle B \rangle_t = t$ and the corresponding Lebesgue-Stieltjes measure is simply the Lebesgue measure, as seen in Sect. 10.1.

⁴ Here $\fint_a^b u_s \, ds = \frac{1}{b-a} \int_a^b u_s \, ds$ for $a < b$.
Definition 10.2.1 We denote by $\mathcal{L}^2_B$ the class of processes $u = (u_t)_{t \ge 0}$ such that:
(i) u is progressively measurable;
(ii) for every $T \ge 0$ we have

$$E\left[\int_0^T u_t^2 \, d\langle B \rangle_t\right] < \infty. \qquad (10.2.1)$$

Generally, the process B will be fixed once and for all and therefore, if there is no
risk of confusion, we will simply write .L2 instead of .L2B .
At a later stage, we will weaken the integrability condition (ii) by requiring that
u belongs to the following class.
Definition 10.2.2 We denote by $\mathcal{L}^2_{B,loc}$ (or, more simply, $\mathcal{L}^2_{loc}$) the class of processes u such that:
(i) u is progressively measurable;
(ii') for every $T \ge 0$ we have

$$\int_0^T u_t^2 \, d\langle B \rangle_t < \infty \quad \text{a.s.} \qquad (10.2.2)$$

Property (ii') is a very weak integrability condition that is automatically satisfied if, for example, u has continuous trajectories or, more generally, locally bounded ones (note that the integration domain in (10.2.2) is compact). Formula (10.2.2) is equivalent to $P\big(u \in L^2([0, T], \mu_{\langle B \rangle})\big) = 1$.

10.2.1 Integral of Indicator Processes

Consider a very particular class of integrands that, as functions of the time variable, are indicator functions of an interval: precisely, an indicator process is a stochastic process of the form

$$u_t = \alpha \mathbb{1}_{[t_0, t_1[}(t), \qquad t \ge 0, \qquad (10.2.3)$$

where $\alpha$ is an $\mathcal{F}_{t_0}$-measurable and bounded random variable (i.e., such that $|\alpha| \le c$ a.s. for some positive constant c) and $t_1 > t_0 \ge 0$.
Remark 10.2.3 Every indicator process u belongs to $\mathcal{L}^2$: in fact, u is càdlàg and adapted, therefore progressively measurable; moreover, u satisfies (10.2.1) since

$$E\left[\int_0^T u_t^2 \, d\langle B \rangle_t\right] = E\big[\alpha^2 \big(\langle B \rangle_{T \wedge t_1} - \langle B \rangle_{T \wedge t_0}\big)\big] \le c^2 E\big[\langle B \rangle_{T \wedge t_1} - \langle B \rangle_{T \wedge t_0}\big] < \infty$$

for every $T \ge 0$.
The definition of the stochastic integral of an indicator process is elementary and completely explicit: it is defined, path by path, by multiplying $\alpha$ by an increment of B.

Definition 10.2.4 (Stochastic Integral of Indicator Processes) Let u be the indicator process in (10.2.3) and $B \in \mathcal{M}^{c,2}$. For every $T \ge t_1$ we set

$$\int_0^T u_t \, dB_t := \alpha \big( B_{t_1} - B_{t_0} \big) \qquad (10.2.4)$$

and we define the stochastic integral for two generic integration endpoints a and b, with $0 \le a \le b$, as

$$\int_a^b u_t \, dB_t := \int_0^{t_1} u_t \mathbb{1}_{[a,b[}(t) \, dB_t. \qquad (10.2.5)$$

Remark 10.2.5 If $[t_0, t_1[ \,\cap\, [a, b[ \,\ne \emptyset$, the integral on the right-hand side of (10.2.5) is defined by (10.2.4), interpreting $u_t \mathbb{1}_{[a,b[}(t)$ as the simple process $\alpha \mathbb{1}_{[t_0 \vee a, t_1 \wedge b[}(t)$ and choosing $T = t_1$. Otherwise, it is understood that the integral is null by definition.
Remark 10.2.6 ([!]) Being defined in terms of increments of B, the stochastic integral does not depend on the initial value $B_0$. Moreover, the integral process is adapted and continuous.
In the next result, we establish some fundamental properties of the stochastic integral. The second part of the proof is based on the remarkable identity (9.4.1), valid for every $B \in \mathcal{M}^{c,2}$, which we recall here:

$$E\big[(B_t - B_s)^2 \mid \mathcal{F}_s\big] = E[\langle B \rangle_t - \langle B \rangle_s \mid \mathcal{F}_s], \qquad 0 \le s \le t. \qquad (10.2.6)$$

Throughout the chapter, we insist on providing the explicit expression of the quadratic variation of the stochastic integral, or the covariation of two integrals: the reason is that they appear in the most important tool for calculating stochastic integrals, Itô's formula, which we will present in Chap. 11.
Theorem 10.1.6 has the following natural extension.
Theorem 10.2.7 ([!]) Let

$$X_t := \int_0^t u_s \, dB_s, \qquad Y_t := \int_0^t v_s \, dB_s, \qquad t \ge 0,$$

where $u, v$ are indicator processes and $B \in \mathcal{M}^{c,2}$. For $0 \le s \le t \le T$, the following properties hold:

(i) X is a continuous square-integrable martingale, $X \in \mathcal{M}^{c,2}$, and we have

$$E\left[\int_s^t u_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = 0; \qquad (10.2.7)$$

(ii) the Itô isometry holds:

$$E\left[\Big(\int_s^t u_r \, dB_r\Big)^2 \,\Big|\, \mathcal{F}_s\right] = E\left[\int_s^t u_r^2 \, d\langle B \rangle_r \,\Big|\, \mathcal{F}_s\right] \qquad (10.2.8)$$

and more generally

$$E\left[\int_s^t u_r \, dB_r \int_s^t v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = E\left[\int_s^t u_r v_r \, d\langle B \rangle_r \,\Big|\, \mathcal{F}_s\right], \qquad (10.2.9)$$

$$E\left[\int_s^t u_r \, dB_r \int_t^T v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = 0; \qquad (10.2.10)$$

(iii) the covariation process of X and Y is given by

$$\langle X, Y \rangle_t = \int_0^t u_s v_s \, d\langle B \rangle_s, \qquad t \ge 0. \qquad (10.2.11)$$

Proof By Remark 10.2.5 it is not restrictive to assume $u = \alpha \mathbb{1}_{[s,t[}$ and $v = \beta \mathbb{1}_{[s,t[}$ with bounded $\alpha, \beta \in m\mathcal{F}_s$.

(i) We have

$$E\left[\int_s^t u_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = E[\alpha (B_t - B_s) \mid \mathcal{F}_s] = \alpha E[B_t - B_s \mid \mathcal{F}_s] = 0$$

where we have exploited the fact that $\alpha \in m\mathcal{F}_s$ and the martingale property of B. This proves (10.2.7), which is equivalent to the martingale property of X. Clearly $X_T \in L^2(\Omega, P)$ for every $T \ge 0$, since $X_T$ is the product of the bounded random variable $\alpha$ times an increment of B, which is square-integrable.

(ii) We directly prove (10.2.9): we have

$$E\left[\int_s^t u_r \, dB_r \int_s^t v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = E\big[\alpha\beta (B_t - B_s)^2 \mid \mathcal{F}_s\big] = \alpha\beta\, E\big[(B_t - B_s)^2 \mid \mathcal{F}_s\big] =$$

(by the crucial formula (10.2.6))

$$= \alpha\beta\, E[\langle B \rangle_t - \langle B \rangle_s \mid \mathcal{F}_s] = E[\alpha\beta(\langle B \rangle_t - \langle B \rangle_s) \mid \mathcal{F}_s] = E\left[\int_s^t u_r v_r \, d\langle B \rangle_r \,\Big|\, \mathcal{F}_s\right].$$

Formula (10.2.8) follows by taking $u = v$, and the proof of (10.2.10) is analogous.

(iii) The process $\langle X, Y \rangle$ in (10.2.11) is adapted, continuous and locally of bounded variation, since it is the difference of increasing processes:

$$\langle X, Y \rangle_t = \int_0^t (u_s v_s)^+ \, d\langle B \rangle_s - \int_0^t (u_s v_s)^- \, d\langle B \rangle_s.$$

Moreover, $\langle X, Y \rangle_0 = 0$. To conclude, it is enough to prove that $XY - \langle X, Y \rangle$ is a martingale: we have

$$X_t Y_t = \Big(X_s + \int_s^t u_r \, dB_r\Big)\Big(Y_s + \int_s^t v_r \, dB_r\Big) = X_s Y_s + \int_s^t u_r \, dB_r \int_s^t v_r \, dB_r + X_s \int_s^t v_r \, dB_r + Y_s \int_s^t u_r \, dB_r$$

and therefore

$$E[X_t Y_t \mid \mathcal{F}_s] = X_s Y_s + E\left[\int_s^t u_r \, dB_r \int_s^t v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] + X_s E\left[\int_s^t v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] + Y_s E\left[\int_s^t u_r \, dB_r \,\Big|\, \mathcal{F}_s\right] =$$

(by (10.2.9) and (10.2.7))

$$= X_s Y_s + E\left[\int_s^t u_r v_r \, d\langle B \rangle_r \,\Big|\, \mathcal{F}_s\right]$$

so

$$E[X_t Y_t - \langle X, Y \rangle_t \mid \mathcal{F}_s] = X_s Y_s - \langle X, Y \rangle_s. \qquad \square$$
Remark 10.2.8 Formulas (10.2.7), (10.2.8), (10.2.9), (10.2.10), and (10.2.11) can be rewritten in the form

$$E[X_t - X_s \mid \mathcal{F}_s] = 0,$$
$$E\big[(X_t - X_s)^2 \mid \mathcal{F}_s\big] = E[\langle X \rangle_t - \langle X \rangle_s \mid \mathcal{F}_s],$$
$$E[(X_t - X_s)(Y_t - Y_s) \mid \mathcal{F}_s] = E[\langle X, Y \rangle_t - \langle X, Y \rangle_s \mid \mathcal{F}_s],$$
$$E[(X_t - X_s)(Y_T - Y_t) \mid \mathcal{F}_s] = 0.$$

By taking the expected value, we also obtain the unconditional versions of the Itô isometry:

$$E\left[\Big(\int_s^t u_r \, dB_r\Big)^2\right] = E\left[\int_s^t u_r^2 \, d\langle B \rangle_r\right], \qquad (10.2.12)$$
$$E\left[\int_s^t u_r \, dB_r \int_s^t v_r \, dB_r\right] = E\left[\int_s^t u_r v_r \, d\langle B \rangle_r\right],$$
$$E\left[\int_s^t u_r \, dB_r \int_t^T v_r \, dB_r\right] = 0, \qquad (10.2.13)$$

and (10.2.11) with $u = v$ becomes

$$\langle X \rangle_t = \int_0^t u_s^2 \, d\langle B \rangle_s, \qquad t \ge 0.$$
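The last identity, $\langle X \rangle_t = \int_0^t u_s^2 \, d\langle B \rangle_s$, can be checked numerically when B is a Brownian motion (so that $\langle B \rangle_s = s$). The sketch below (illustrative integrand and grid, not from the book) compares the discretized quadratic variation of X with the Lebesgue integral of $u^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 1.0, 100_000                            # illustrative horizon and grid size
dt = T / n

dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])

u = np.cos(B[:-1])              # a bounded adapted integrand (illustrative choice)
dX = u * dB                     # increments of X_t = int_0^t u dB on the grid

qv_X = np.sum(dX**2)            # discretized quadratic variation of X
lebesgue = np.sum(u**2 * dt)    # int_0^T u^2 d<B>, with <B>_t = t for Brownian B
print(qv_X, lebesgue)
```

The two quantities agree up to discretization error: the squared increments of the integral "inherit" the factor $u^2$ from the integrand, which is exactly what (10.2.11) with $u = v$ expresses.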

10.2.2 Integral of Simple Processes

In this section, we extend the class of integrable processes to simple processes: these are sums of indicator processes like those considered in the previous section. By linearity, the definition of the stochastic integral extends, path by path, in an elementary and explicit way. The fundamental properties of the integral remain valid: the martingale property and the Itô isometry.

Definition 10.2.9 (Simple Process) A simple process u is a process of the form

$$u_t = \sum_{k=1}^N u_{k,t}, \qquad u_{k,t} := \alpha_k \mathbb{1}_{[t_{k-1}, t_k[}(t), \qquad (10.2.14)$$

where:
(i) $0 \le t_0 < t_1 < \cdots < t_N$;
(ii) $\alpha_k$ is a bounded $\mathcal{F}_{t_{k-1}}$-measurable random variable for each $k = 1, \dots, N$.

One can also require that $P(\alpha_k \ne \alpha_{k+1}) > 0$ for $k = 1, \dots, N-1$, so that the representation (10.2.14) of u is unique.
Definition 10.2.10 (Stochastic Integral of Simple Processes) Let u be a simple process of the form (10.2.14) and let $B \in \mathcal{M}^{c,2}$. The stochastic integral of u with respect to B is the stochastic process

$$\int_0^t u_s \, dB_s := \sum_{k=1}^N \int_0^t u_{k,s} \, dB_s = \sum_{k=1}^N \alpha_k \big( B_{t \wedge t_k} - B_{t \wedge t_{k-1}} \big).$$

Theorem 10.2.11 Theorem 10.2.7 remains valid under the assumption that $u, v$ are simple processes.

Proof The continuity and the martingale property (10.2.7) are immediate by linearity. As for the Itô isometry (10.2.9), we first write v in the form (10.2.14) with respect to the same choice of $t_0, \dots, t_N$, for certain $v_{k,t} = \beta_k \mathbb{1}_{[t_{k-1}, t_k[}(t)$: note that

$$u_t v_t = \sum_{k=1}^N \sum_{h=1}^N u_{k,t} v_{h,t} = \sum_{k=1}^N \alpha_k \beta_k \mathbb{1}_{[t_{k-1}, t_k[}(t). \qquad (10.2.15)$$

Then we have

$$E\left[\int_s^t u_r \, dB_r \int_s^t v_r \, dB_r \,\Big|\, \mathcal{F}_s\right] = E\left[\sum_{k=1}^N \int_s^t u_{k,r} \, dB_r \sum_{h=1}^N \int_s^t v_{h,r} \, dB_r \,\Big|\, \mathcal{F}_s\right]$$
$$= \sum_{k=1}^N E\left[\int_s^t u_{k,r} \, dB_r \int_s^t v_{k,r} \, dB_r \,\Big|\, \mathcal{F}_s\right] + 2 \sum_{h < k} E\left[\int_{t_{h-1}}^{t_h} u_{h,r} \mathbb{1}_{[s,t[}(r) \, dB_r \int_{t_{k-1}}^{t_k} v_{k,r} \mathbb{1}_{[s,t[}(r) \, dB_r \,\Big|\, \mathcal{F}_s\right] =$$

(by (10.2.9) and (10.2.10))

$$= \sum_{k=1}^N E\left[\int_s^t u_{k,r} v_{k,r} \, d\langle B \rangle_r \,\Big|\, \mathcal{F}_s\right] =$$

(by (10.2.15))

$$= E\left[\int_s^t u_r v_r \, d\langle B \rangle_r \,\Big|\, \mathcal{F}_s\right].$$

Finally, the fact that $\langle X, Y \rangle$ in (10.2.11) is the covariation process of X and Y is proven as in the proof of Theorem 10.2.7-(iii). □

10.2.3 Integral in L2

In this section, we extend the class of integrands by exploiting the density of simple processes in $\mathcal{L}^2_B$ (cf. Definition 10.2.1). The stochastic integral is now defined as a limit in $\mathcal{M}^{c,2}$ and therefore, recalling Remark 8.3.2, as an equivalence class and no longer path by path. However, the fundamental properties of the integral remain valid: the martingale property and the Itô isometry. As usual, since B is fixed, we simply write $\mathcal{L}^2$ instead of $\mathcal{L}^2_B$.

Lemma 10.1.7 has the following generalization, which is proven with a technical trick: the idea is to make a change of time variable to "realign" the continuous and increasing process $\langle B \rangle_t$ to the Brownian case, in which $\langle B \rangle_t \equiv t$; for details, we refer to Lemma 2.2.7 in [67].

Lemma 10.2.12 Let $u \in \mathcal{L}^2$. For every $T > 0$ there exists a sequence $(u_n)_{n \in \mathbb{N}}$ of simple processes such that

$$\lim_{n \to \infty} E\left[\int_0^T (u_s - u_{n,s})^2 \, d\langle B \rangle_s\right] = 0.$$

We recall the convention according to which $\mathcal{M}^{c,2}_T$ is the space of equivalence classes (up to indistinguishability) of continuous square-integrable martingales $X = (X_t)_{t \in [0,T]}$, equipped with the norm

$$\|X\|_T := \sqrt{E\big[X_T^2\big]}.$$

By Proposition 8.3.3, $(\mathcal{M}^{c,2}_T, \|\cdot\|_T)$ is a Banach space.



We now see how to define the stochastic integral of $u \in \mathcal{L}^2$. Given $T > 0$ and an approximating sequence $(u_n)_{n \in \mathbb{N}}$ of simple processes as in Lemma 10.2.12, we denote by

$$X_{n,t} = \int_0^t u_{n,s} \, dB_s, \qquad t \in [0, T], \qquad (10.2.16)$$

the sequence of their respective stochastic integrals. By Theorem 10.2.11, $X_n \in \mathcal{M}^{c,2}_T$ and by the Itô isometry (10.2.8) we have

$$\|X_n - X_m\|_T^2 = E\left[\Big(\int_0^T (u_{n,t} - u_{m,t}) \, dB_t\Big)^2\right] = E\left[\int_0^T (u_{n,t} - u_{m,t})^2 \, d\langle B \rangle_t\right].$$

It follows that $(X_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $(\mathcal{M}^{c,2}_T, \|\cdot\|_T)$ and therefore there exists

$$X := \lim_{n \to \infty} X_n \quad \text{in } \mathcal{M}^{c,2}_T. \qquad (10.2.17)$$

Proposition 10.2.13 (Stochastic Integral of $\mathcal{L}^2$ Processes) The limit process $X = (X_t)_{t \in [0,T]}$ in (10.2.17) is independent of the approximating sequence; it is called the stochastic integral process of u with respect to B on $[0, T]$ and denoted by

$$X_t = \int_0^t u_s \, dB_s, \qquad t \in [0, T].$$

Proof Let X be the limit in (10.2.17) defined from the approximating sequence $(u_n)_{n \in \mathbb{N}}$. Let $(v_n)_{n \in \mathbb{N}}$ be another approximating sequence for u and

$$Y_{n,t} = \int_0^t v_{n,s} \, dB_s, \qquad t \in [0, T]. \qquad (10.2.18)$$

Then $\|Y_n - X\|_T \le \|Y_n - X_n\|_T + \|X_n - X\|_T$ and it is enough to observe that, again by the Itô isometry, we have

$$\|Y_n - X_n\|_T^2 = E\left[\Big(\int_0^T (v_{n,t} - u_{n,t}) \, dB_t\Big)^2\right] = E\left[\int_0^T (v_{n,t} - u_{n,t})^2 \, d\langle B \rangle_t\right] \xrightarrow[n \to \infty]{} 0. \qquad \square$$


Remark 10.2.14 ([!]) By construction, the Itô stochastic integral

$$X_t = \int_0^t u_s \, dB_s, \qquad (10.2.19)$$

with $u \in \mathcal{L}^2$ and $B \in \mathcal{M}^{c,2}$, is an equivalence class in $\mathcal{M}^{c,2}$: each representative of this class is a continuous martingale, uniquely determined up to indistinguishable processes. From this perspective, unless a particular choice of the representative has been made, the single trajectories of the stochastic integral process are not defined and it does not make sense to consider $X_t(\omega)$ for a specific $\omega \in \Omega$.
Theorem 10.2.15 Theorem 10.2.7 remains valid under the assumption that $u, v \in \mathcal{L}^2$.

Proof Let $(u_n)_{n \in \mathbb{N}}$ and $(v_n)_{n \in \mathbb{N}}$ be sequences of simple processes approximating u and v in $(\mathcal{M}^{c,2}_T, \|\cdot\|_T)$, respectively. We denote by $(X_n)_{n \in \mathbb{N}}$ and $(Y_n)_{n \in \mathbb{N}}$ the corresponding stochastic integrals in (10.2.16) and (10.2.18). Equations (10.2.7) and (10.2.8) are a direct consequence of the fact that $X_{n,t} \to X_t$ in $L^2(\Omega, P)$ (and therefore also in $L^1(\Omega, P)$) and $X_{n,t} Y_{n,t} \to X_t Y_t$ in $L^1(\Omega, P)$, together with the general fact⁵ that if $Z_n \to Z$ in $L^1(\Omega, P)$ then $E[Z_n \mid \mathcal{G}] \to E[Z \mid \mathcal{G}]$ in $L^1(\Omega, P)$. The proof of (10.2.11) is identical to that of Theorem 10.2.7-(iii). □

Remark 10.2.16 ([!]) Let $B \in \mathcal{M}^{c,2}$ and $u \in \mathcal{L}^2_B$. By Theorem 10.2.15, the integral X in (10.2.19) belongs to $\mathcal{M}^{c,2}$ and therefore, in turn, can be used as an integrator. Since

$$\langle X \rangle_t = \int_0^t u_s^2 \, d\langle B \rangle_s,$$

we have that $v \in \mathcal{L}^2_X$ if v is progressively measurable and satisfies

$$E\left[\int_0^t v_s^2 \, d\langle X \rangle_s\right] = E\left[\int_0^t v_s^2 u_s^2 \, d\langle B \rangle_s\right] < \infty$$

for every $t \ge 0$. In this case, we have

$$\int_0^t v_s \, dX_s = \int_0^t v_s u_s \, dB_s,$$

which can be verified directly for simple $u, v$ and, in general, by approximation.

5 By Jensen’s inequality, we have

.E [|E [Zn | G ] − E [Z | G ]|] ≤ E [E [|Zn − Z| | G ]] = E [|Zn − Z|] .



In particular, if B is a Brownian motion, then the Lebesgue-Stieltjes measure associated with $\langle X \rangle$ is absolutely continuous with respect to the Lebesgue measure, with density $u^2$.
We now give two propositions whose statements seem almost obvious but actually, in light of Remark 10.2.14, require a rigorous proof. Both results are proven by an approximation procedure, technical and somewhat tedious.

Proposition 10.2.17 ([!]) Suppose that $u, v \in \mathcal{L}^2$ are modifications on an event F, in the sense that, for every $t \in [0, T]$, $u_t(\omega) = v_t(\omega)$ for every $\omega \in F \setminus N$, where N is a negligible event. Then the corresponding integral processes

$$X_t = \int_0^t u_s \, dB_s, \qquad Y_t = \int_0^t v_s \, dB_s,$$

are indistinguishable on F, that is, $\sup_{t \in [0,T]} |X_t(\omega) - Y_t(\omega)| = 0$ for $\omega \in F \setminus N$.

Proof Let us consider the approximations $u_n$ and $v_n$ defined as in Lemma 10.2.12. By construction, for every $n \in \mathbb{N}$ and $t \in [0, T]$, $u_{n,t} = v_{n,t}$ almost surely on F. It follows that the relative integrals $(X_{n,t})_{t \in [0,T]}$ in (10.2.16) and $(Y_{n,t})_{t \in [0,T]}$ in (10.2.18) are modifications on F. Taking the limit in n, we deduce that $(X_t)_{t \in [0,T]}$ and $(Y_t)_{t \in [0,T]}$ are modifications on F: the thesis follows from the continuity of X and Y. □

Remark 10.2.18 Suppose that, for some $T > 0$, we have
$$\int_0^T u_t\, dB_t = \int_0^T v_t\, dB_t,$$
where $u, v \in L^2$ and B is a Brownian motion. Then $P(u = v \text{ a.e. on } [0,T]) = 1$, that is, almost surely the trajectories of u and v are a.e. equal on $[0,T]$. Indeed, by Itô's isometry, we have
$$E\left[\int_0^T (u_t - v_t)^2\, dt\right] = E\left[\left(\int_0^T (u_t - v_t)\, dB_t\right)^2\right] = 0,$$
which proves the thesis.
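As a numerical aside (not from the book), Itô's isometry is easy to check by Monte Carlo simulation. In the sketch below the choices $u = W$, $B = W$ a Brownian motion and $T = 1$ are illustrative; in that case both sides equal $E\big[\int_0^T W_s^2\, ds\big] = T^2/2$, and the integrals are approximated by left-endpoint sums.

```python
import numpy as np

# Monte Carlo check of Itô's isometry E[(∫_0^T u dW)^2] = E[∫_0^T u^2 ds]
# for the illustrative choice u = W (both sides equal T^2/2 = 0.5 here).
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 200, 10000
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])   # W at left endpoints

stoch_int = np.sum(W_left * dW, axis=1)        # left-point sums for ∫ W dW
lhs = np.mean(stoch_int**2)                    # E[(∫ W dW)^2]
rhs = np.mean(np.sum(W_left**2, axis=1) * dt)  # E[∫ W^2 ds]
print(lhs, rhs)   # both close to 0.5
```

The discretization and sample size are arbitrary; the two estimates agree up to Monte Carlo error, in line with the remark above.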


Proposition 10.2.19 (Integral with Random Integration Endpoint [!]) Let X in (10.2.19) be the stochastic integral process of $u \in L^2$ with respect to $B \in \mathcal M^{c,2}$. Let $\tau$ be a stopping time such that $0 \le \tau \le T$ for some $T > 0$. Then $(u_t \mathbf 1_{(t\le\tau)})_{t\ge0} \in L^2$ and
$$X_\tau = \int_0^\tau u_s\, dB_s = \int_0^T u_s\, \mathbf 1_{(s\le\tau)}\, dB_s \quad \text{a.s.}$$
194 10 Stochastic Integral

Proof First, we observe that, by Proposition 10.2.17, if $F \in \mathcal F_t$ then
$$\mathbf 1_F \int_t^T u_s\, dB_s = \int_t^T \mathbf 1_F\, u_s\, dB_s \quad \text{a.s.} \tag{10.2.20}$$
The measurability condition on F is essential because it ensures that the integral on the right-hand side of (10.2.20) is well-defined, the integrand being progressively measurable on $[t, T]$.
Now we recall the notation (10.1.13), $t_{n,k} := \frac{kT}{2^n}$, for the dyadic numbers of $[0,T]$, and we use the usual discretization of $\tau$:
$$\tau_n = \sum_{k=1}^{2^n} t_{n,k}\, \mathbf 1_{F_{n,k}}$$
with
$$F_{n,1} = \left(0 \le \tau \le \tfrac{T}{2^n}\right), \qquad F_{n,k} = \left(t_{n,k-1} < \tau \le t_{n,k}\right), \quad k = 2, \dots, 2^n.$$
We note that $(F_{n,k})_{k=1,\dots,2^n}$ forms a partition of $\Omega$ with $F_{n,k} \in \mathcal F_{t_{n,k}}$, and $(\tau_n)_{n\in\mathbb N}$ is a decreasing sequence of stopping times that converges to $\tau$. By continuity, we have $X_{\tau_n} \to X_\tau$. Moreover, setting
$$Y = \int_0^T u_s\, \mathbf 1_{(s\le\tau)}\, dB_s, \qquad Y_n = \int_0^T u_s\, \mathbf 1_{(s\le\tau_n)}\, dB_s,$$
using Itô's isometry it is easy to prove that $Y_n \to Y$ in $L^2(\Omega, P)$ and therefore also almost surely (at least along a subsequence, which suffices here).
To prove the thesis, i.e., that $X_\tau = Y$ a.s., it is sufficient to verify that $X_{\tau_n} = Y_n$ a.s. for each $n \in \mathbb N$. Now, on $F_{n,k}$ we have
$$X_{\tau_n} = X_{t_{n,k}} = \int_0^T u_s\, dB_s - \int_{t_{n,k}}^T u_s\, dB_s,$$
and therefore
$$X_{\tau_n} = \int_0^T u_s\, dB_s - \sum_{k=1}^{2^n} \mathbf 1_{F_{n,k}} \int_{t_{n,k}}^T u_s\, dB_s. \tag{10.2.21}$$
On the other hand,
$$Y_n = \int_0^T u_s \left(1 - \mathbf 1_{(s>\tau_n)}\right) dB_s = \int_0^T u_s\, dB_s - \sum_{k=1}^{2^n} \int_{t_{n,k}}^T u_s\, \mathbf 1_{F_{n,k}}\, dB_s =$$
(by (10.2.20), with probability one)
$$= \int_0^T u_s\, dB_s - \sum_{k=1}^{2^n} \mathbf 1_{F_{n,k}} \int_{t_{n,k}}^T u_s\, dB_s,$$
which, combined with (10.2.21), proves the thesis. ⨆


10.2.4 Integral in $L^2_{loc}$

Weakening the integrability condition on the integrand from $L^2$ to $L^2_{loc}$, some of the fundamental properties of the integral are lost, including the martingale property and Itô's isometry. However, we will prove that the integral is a local martingale and provide a "surrogate" for Itô's isometry, Lemma 10.2.25.
We recall that $u \in L^2_{loc}$ if it is progressively measurable and, for every $t > 0$,
$$A_t := \int_0^t u_s^2\, d\langle B\rangle_s < \infty \quad \text{a.s.} \tag{10.2.22}$$
The process A is continuous, adapted, and increasing; moreover, A is non-negative since $A_0 = 0$ (see Fig. 10.1).

Fig. 10.1 On the left: plot of a trajectory of a Brownian motion W. On the right: plot of the related trajectory of $A_t = \int_0^t W_s^2\, ds$, corresponding to the process in (10.2.22) with $u = W$ and B Brownian motion

Fig. 10.2 Plot of two trajectories of the process A in (10.2.22) and the corresponding stopping times $\tau_n$ and $\tau_{n+1}$ in (10.2.23)

Remark 10.2.20 ([!]) Note that the class $L^2$ depends on the fixed probability measure, as opposed to $L^2_{loc}$, which is invariant with respect to equivalent⁶ probability measures.
Let us fix $T > 0$ and consider the sequence of stopping times defined by
$$\tau_n = T \wedge \inf\{t \ge 0 \mid A_t \ge n\}, \quad n \in \mathbb N, \tag{10.2.23}$$
represented in Fig. 10.2. Due to the continuity of A, we have $\tau_n \nearrow T$ almost surely, and thus the sequence of events $F_n := (\tau_n = T)$ is such that $F_n \nearrow \Omega \setminus N$ with $P(N) = 0$. Truncating u at time $\tau_n$, we define the process
$$u_{n,t} := u_t\, \mathbf 1_{(t\le\tau_n)}, \quad t \in [0,T],$$
which is progressively measurable and such that
$$E\left[\int_0^t u_{n,s}^2\, d\langle B\rangle_s\right] = E\left[\int_0^{t\wedge\tau_n} u_s^2\, d\langle B\rangle_s\right] \le n, \quad t \in [0,T].$$
Thus, $u_n \in L^2$ and the corresponding integral
$$X_{n,t} := \int_0^t u_{n,s}\, dB_s = \int_0^{t\wedge\tau_n} u_s\, dB_s, \quad t \in [0,T], \tag{10.2.24}$$
belongs to $\mathcal M^{c,2}$ according to Theorem 10.2.15. Moreover, for every $n, h \in \mathbb N$, almost surely for every $t \in [0,T]$ we have
$$u_{n,t} = u_{n+h,t} = u_t \quad \text{on } F_n,$$

6 Equivalent measures have the same certain (and, therefore, also negligible) events.

and therefore the processes $(X_{n,t})_{t\in[0,T]}$ and $(X_{n+h,t})_{t\in[0,T]}$ are indistinguishable on $F_n$ thanks to Proposition 10.2.17. Hence, the following definition is well-posed:
Definition 10.2.21 (Stochastic Integral of Processes in $L^2_{loc}$) The stochastic integral of $u \in L^2_{loc}$ with respect to $B \in \mathcal M^{c,2}$ on $[0,T]$ is the continuous and adapted process $X = (X_t)_{t\in[0,T]}$ that on $F_n$ is indistinguishable from $X_n$ in (10.2.24) for every $n \in \mathbb N$. As usual, we write
$$X_t = \int_0^t u_s\, dB_s, \quad t \in [0,T]. \tag{10.2.25}$$
We will see later, in Proposition 10.2.26, that
$$\int_0^t u_s\, dB_s = \lim_{n\to\infty} \int_0^t u_{n,s}\, dB_s,$$
with convergence in probability.


Remark 10.2.22 As already observed earlier, the stochastic integral is defined as
an equivalence class of indistinguishable processes. The previous definition and
in particular the notation (10.2.25) are well-posed in the sense that if X and .X̄
denote respectively the stochastic integral processes of u with respect to B on the
intervals .[0, T ] and .[0, T̄ ] with .T ≤ T̄ then, by an approximation procedure starting
from simple processes, we get that X and .X̄|[0,T ] are indistinguishable processes.
Consequently, the Itô stochastic integral process of u with respect to B denoted by
ˆ t
Xt =
. us dBs , t ≥ 0.
0

is well-defined.
Proposition 10.2.19 has the following simple generalization.
Proposition 10.2.23 (Integral with Random Integration Endpoint) Let X be the stochastic integral process of $u \in L^2_{loc}$ with respect to $B \in \mathcal M^{c,2}$. Let $\tau$ be a stopping time such that $0 \le \tau \le T$ for some $T > 0$. Then $(u_t\, \mathbf 1_{(t\le\tau)})_{t\ge0} \in L^2_{loc}$ and
$$X_\tau = \int_0^\tau u_s\, dB_s = \int_0^T u_s\, \mathbf 1_{(s\le\tau)}\, dB_s \quad \text{a.s.}$$
Proof It is clear that $(u_t\, \mathbf 1_{(t\le\tau)})_{t\ge0} \in L^2_{loc}$. Let $(\tau_n)_{n\in\mathbb N}$ be the sequence of stopping times in (10.2.23). By definition, on the event $F_n = (\tau_n = T)$ we have
$$X_\tau = \int_0^\tau u_s\, \mathbf 1_{(s\le\tau_n)}\, dB_s =$$
(by Proposition 10.2.19, since $u_s\, \mathbf 1_{(s\le\tau_n)} \in L^2$)
$$= \int_0^T u_s\, \mathbf 1_{(s\le\tau_n)}\, \mathbf 1_{(s\le\tau)}\, dB_s =$$
(since $\tau_n = T \ge \tau$ on $F_n$)
$$= \int_0^T u_s\, \mathbf 1_{(s\le\tau)}\, dB_s.$$
The thesis follows from the arbitrariness of n. ⨆



Extending the class of integrands from $L^2$ to $L^2_{loc}$, we lose the martingale property; however, we have the following
Theorem 10.2.24 ([!]) Let
$$X_t = \int_0^t u_s\, dB_s, \qquad Y_t = \int_0^t v_s\, dB_s$$
with $u, v \in L^2_{loc}$ and $B \in \mathcal M^{c,2}$. Then:
(i) X is a continuous local martingale, i.e., $X \in \mathcal M^{c,loc}$, and
$$\tau_n := n \wedge \inf\{t \ge 0 \mid A_t \ge n\}, \quad n \in \mathbb N,$$
with A in (10.2.22), is a localizing sequence for X (cf. Definition 8.4.2);
(ii) the covariation process of X and Y is
$$\langle X, Y\rangle_t = \int_0^t u_s v_s\, d\langle B\rangle_s, \quad t \ge 0.$$

Proof By Proposition 10.2.23 (with the choice $\tau = t \wedge \tau_n$ and $T = t$), for every $t \ge 0$ we have
$$X_{t\wedge\tau_n} = \int_0^t u_s\, \mathbf 1_{(s\le\tau_n)}\, dB_s \quad \text{a.s.},$$
and therefore, by continuity, $X_{t\wedge\tau_n}$ is a version of the stochastic integral of the process $u_s\, \mathbf 1_{(s\le\tau_n)}$, which belongs to $L^2$. It follows that $X_{t\wedge\tau_n}$ is a continuous martingale and therefore X is a local martingale with localizing sequence $(\tau_n)_{n\in\mathbb N}$.
Now let $A_t = \int_0^t u_s v_s\, d\langle B\rangle_s$ and
$$\tau_n = n \wedge \inf\{t \ge 0 \mid \langle X\rangle_t + \langle Y\rangle_t \ge n\}, \quad n \in \mathbb N.$$
By Theorem 10.2.15 (cf. (10.2.11)) and the Cauchy-Schwarz inequality of Remark 9.5.2-(iii), the process
$$(XY - A)_{t\wedge\tau_n} = X_{t\wedge\tau_n} Y_{t\wedge\tau_n} - A_{t\wedge\tau_n} = X_{t\wedge\tau_n} Y_{t\wedge\tau_n} - \int_0^t u_s v_s\, \mathbf 1_{(s\le\tau_n)}\, d\langle B\rangle_s$$
is a martingale: it follows that $XY - A \in \mathcal M^{c,loc}$ with localizing sequence $(\tau_n)_{n\in\mathbb N}$ and therefore $A = \langle X, Y\rangle$. ⨆

For the stochastic integral of $u \in L^2_{loc}$ we no longer have a fundamental tool such as Itô's isometry: in many situations it can be conveniently replaced by the following lemma.
Lemma 10.2.25 ([!]) Let
$$X_t = \int_0^t u_s\, dB_s, \qquad \langle X\rangle_t = \int_0^t u_s^2\, d\langle B\rangle_s,$$
with $u \in L^2_{loc}$ and $B \in \mathcal M^{c,2}$. For every $t, \varepsilon, \delta > 0$ we have
$$P(|X_t| \ge \varepsilon) \le P(\langle X\rangle_t \ge \delta) + \frac{\delta}{\varepsilon^2}.$$

Proof Let
$$\tau_\delta = \inf\{s > 0 \mid \langle X\rangle_s \ge \delta\}, \quad \delta > 0.$$
Given $t, \varepsilon > 0$, we have
$$P(|X_t| \ge \varepsilon) = P\big((|X_t| \ge \varepsilon) \cap (\tau_\delta \le t)\big) + P\big((|X_t| \ge \varepsilon) \cap (\tau_\delta > t)\big) \le$$
(since $(\tau_\delta \le t) = (\langle X\rangle_t \ge \delta)$)
$$\le P(\langle X\rangle_t \ge \delta) + P\big((|X_t| \ge \varepsilon) \cap (\tau_\delta > t)\big),$$
and therefore it remains to prove that
$$P\big((|X_t| \ge \varepsilon) \cap (\tau_\delta > t)\big) \le \frac{\delta}{\varepsilon^2}.$$
Now we have
$$P\left(\left(\left|\int_0^t u_s\, dB_s\right| \ge \varepsilon\right) \cap (t < \tau_\delta)\right) = P\left(\left(\left|\int_0^t u_s\, \mathbf 1_{(s<\tau_\delta)}\, dB_s\right| \ge \varepsilon\right) \cap (t < \tau_\delta)\right) \le P\left(\left|\int_0^t u_s\, \mathbf 1_{(s<\tau_\delta)}\, dB_s\right| \ge \varepsilon\right) \le$$
(by Chebyshev's inequality (3.1.3) in [113])
$$\le \frac{1}{\varepsilon^2}\, E\left[\left|\int_0^t u_s\, \mathbf 1_{(s<\tau_\delta)}\, dB_s\right|^2\right] =$$
(by Itô's isometry, since $u_s\, \mathbf 1_{(s<\tau_\delta)} \in L^2$)
$$= \frac{1}{\varepsilon^2}\, E\left[\int_0^t u_s^2\, \mathbf 1_{(s<\tau_\delta)}\, d\langle B\rangle_s\right] \le \frac{\delta}{\varepsilon^2}. \qquad ⨆$$
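The estimate of Lemma 10.2.25 is easy to check empirically. The sketch below is our own illustration (parameters and seed arbitrary): take $u = W$ and $B = W$, so that $X_t = \int_0^t W_s\, dW_s$ and $\langle X\rangle_t = \int_0^t W_s^2\, ds$, and compare the two sides of the inequality by Monte Carlo.

```python
import numpy as np

# Empirical check of P(|X_t| >= eps) <= P(<X>_t >= delta) + delta/eps^2
# for X_t = ∫_0^t W_s dW_s and <X>_t = ∫_0^t W_s^2 ds (left-endpoint sums).
rng = np.random.default_rng(1)
t, n_steps, n_paths = 1.0, 200, 10000
dt = t / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])

X = np.sum(W_left * dW, axis=1)        # X_t
QV = np.sum(W_left**2, axis=1) * dt    # <X>_t

eps, delta = 1.0, 0.5
lhs = np.mean(np.abs(X) >= eps)
rhs = np.mean(QV >= delta) + delta / eps**2
print(lhs, rhs)   # lhs does not exceed rhs
```

For these parameters the bound is far from sharp: the left-hand side is below 0.1 while the right-hand side exceeds 0.5, which is typical of Chebyshev-type estimates.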


10.2.5 Stochastic Integral as a Riemann-Stieltjes Integral

The following result shows that the stochastic integral of .u ∈ L2loc can also be
defined by approximation, as we did for .u ∈ L2 , provided that we use convergence
in probability instead of in .L2 (Ω, P )-norm.
Proposition 10.2.26 Let $u, u_n \in L^2_{loc}$, $n \in \mathbb N$, be such that
$$\int_0^t |u_{n,s} - u_s|^2\, d\langle B\rangle_s \xrightarrow[n\to\infty]{P} 0. \tag{10.2.26}$$
Then
$$\int_0^t u_{n,s}\, dB_s \xrightarrow[n\to\infty]{P} \int_0^t u_s\, dB_s.$$
Proof The thesis is an immediate consequence of Itô's isometry in the form of Lemma 10.2.25: for fixed $\varepsilon > 0$ and setting $\delta = \varepsilon^3$, we have
$$\lim_{n\to\infty} P\left(\left|\int_0^t (u_{n,s} - u_s)\, dB_s\right| \ge \varepsilon\right) \le \lim_{n\to\infty} P\left(\int_0^t |u_{n,s} - u_s|^2\, d\langle B\rangle_s \ge \delta\right) + \varepsilon = \varepsilon,$$
thanks to assumption (10.2.26). ⨆



As a simple application of Proposition 10.2.26, we prove that, in the case
where the integrand is a continuous process, the stochastic integral is indeed
the limit in probability of the Riemann-Stieltjes sums in which the integrand is
evaluated at the left endpoint of each interval of the partition: this is consistent
with the construction of the Itô integral, which crucially exploits the hypothesis

of progressive measurability of the integrand. The following result is also the basis
of the numerical approximation methods for the stochastic integral.
Corollary 10.2.27 ([!]) Let u be a continuous and adapted process, $B \in \mathcal M^{c,2}$, and $(\pi_n)_{n\in\mathbb N}$ a sequence of partitions of $[0,t]$, with $\pi_n = (t_{n,k})_{k=0,\dots,m_n}$, such that $\lim_{n\to\infty} |\pi_n| = 0$. Then
$$\sum_{k=1}^{m_n} u_{t_{n,k-1}} \left(B_{t_{n,k}} - B_{t_{n,k-1}}\right) \xrightarrow[n\to\infty]{P} \int_0^t u_s\, dB_s.$$
Proof Setting
$$u_{n,s} = \sum_{k=1}^{m_n} u_{t_{n,k-1}}\, \mathbf 1_{[t_{n,k-1}, t_{n,k})}(s),$$
we have that $u_n \in L^2_{loc}$ and
$$\sum_{k=1}^{m_n} u_{t_{n,k-1}} \left(B_{t_{n,k}} - B_{t_{n,k-1}}\right) = \int_0^t u_{n,s}\, dB_s.$$
Moreover, by the continuity of u and the dominated convergence theorem, we have
$$\lim_{n\to\infty} \int_0^t |u_{n,s} - u_s|^2\, d\langle B\rangle_s = 0 \quad \text{a.s.}$$
The thesis follows from Proposition 10.2.26. ⨆
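Corollary 10.2.27 also makes visible why the left endpoint matters. In the sketch below (our own illustration, with arbitrary step count and seed), left-endpoint sums for $u = B = W$ reproduce the Itô value $\int_0^t W_s\, dW_s = (W_t^2 - t)/2$ of Example 11.1.5-(ii), while right-endpoint sums converge to a limit larger by exactly $t = \langle W\rangle_t$.

```python
import numpy as np

# Left- vs right-endpoint Riemann-Stieltjes sums for ∫_0^t W_s dW_s on one path.
rng = np.random.default_rng(2)
t, n_steps = 1.0, 200000
dt = t / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])   # W_{t_0}, ..., W_{t_n}

left_sum = np.sum(W[:-1] * dW)    # Itô (left-endpoint) sums
right_sum = np.sum(W[1:] * dW)    # right-endpoint sums
ito_value = (W[-1]**2 - t) / 2    # exact Itô integral
print(left_sum, ito_value, right_sum - left_sum)  # the last gap is close to t
```

The gap between the two evaluation points is $\sum_k (\Delta W_k)^2 \approx t$, so the choice of evaluation point changes the limit: this has no analogue in classical Riemann-Stieltjes integration.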



A useful consequence of Corollary 10.2.27 is the following
Corollary 10.2.28 ([!]) Assume that, for $i = 1, 2$, the process $B^i \in \mathcal M^{c,2}$ and the continuous adapted process $u^i$ are defined on $(\Omega^i, \mathcal F^i, P^i)$. Moreover, let
$$X_t^i = \int_0^t u_s^i\, dB_s^i.$$
If $(u^1, B^1) \stackrel{d}{=} (u^2, B^2)$ (i.e. $(u^1, B^1)$ and $(u^2, B^2)$ are equal in law), then we also have $(u^1, B^1, X^1) \stackrel{d}{=} (u^2, B^2, X^2)$.
A similar result holds under much more general assumptions: in this regard, see, for example, Exercise IV.5.16 in [123].

10.3 Integral with Respect to Continuous Semimartingales

In the previous sections, we assumed that the integrator process B is a continuous square-integrable martingale. Now we extend the definition of the stochastic integral to the case where the integrator, here denoted by S, is a continuous semimartingale: precisely, by Definition 9.3.1, S is an adapted and continuous process of the form
$$S = A + B$$
where $A \in BV$ is such that $A_0 = 0$ and $B \in \mathcal M^{c,loc}$. We use the notation
$$\int_0^t u_r\, dS_r$$
to indicate the stochastic integral of the process u with respect to S: it is defined as the sum
$$\int_0^t u_r\, dS_r := \int_0^t u_r\, dA_r + \int_0^t u_r\, dB_r,$$
where the two integrals on the right-hand side have the meaning that we now explain.
Let $\mu_A$ be the Lebesgue-Stieltjes measure⁷ associated with A, defined path by path: we denote by
$$\int_0^t u_r\, dA_r := \int_{[0,t]} u_r\, \mu_A(dr)$$
the corresponding Lebesgue-Stieltjes integral. In order for this integral to be well-defined, we require that $u \in L^2_{S,loc}$ according to the following
Definition 10.3.1 $L^2_{S,loc}$ is the class of progressively measurable processes u such that
$$\int_{[0,t]} |u_r|\, |\mu_A|(dr) + \int_0^t u_r^2\, d\langle B\rangle_r < \infty \quad \text{a.s.}$$
for every $t \ge 0$.

7 According to Definition 9.2.1, .μA is a signed measure.



As for the integral with respect to $B \in \mathcal M^{c,loc}$, one can use a localization procedure entirely analogous⁸ to that of Sect. 10.2.4. In conclusion, recalling Definition 9.5.3 of quadratic variation of a semimartingale, we have the following
Proposition 10.3.2 Let $S = A + B$ be a continuous semimartingale and $u \in L^2_{S,loc}$. The stochastic integral process
$$X_t := \int_0^t u_r\, dS_r = \int_0^t u_r\, dA_r + \int_0^t u_r\, dB_r, \quad t \ge 0,$$
is a continuous semimartingale with quadratic variation process
$$\langle X\rangle_t = \int_0^t u_r^2\, d\langle B\rangle_r, \quad t \ge 0. \tag{10.3.1}$$

⁸ Let $(\tau_n)_{n\in\mathbb N}$ be a localizing sequence for B: as in Remark 8.4.6-(iv) we can assume $|B_{t\wedge\tau_n}| \le n$, so that $B_n := (B_{t\wedge\tau_n})_{t\ge0} \in \mathcal M^{c,2}$. If $u \in L^2_{S,loc}$ then
$$\int_0^t u_r^2\, d\langle B_n\rangle_r \le \int_0^t u_r^2\, d\langle B\rangle_r < \infty \quad \text{a.s.},$$
and therefore $u \in L^2_{B_n,loc}$ and the integral
$$Y_{n,t} := \int_0^t u_r\, dB_{n,r}$$
is well-defined. On the event $F_{n,T} := (T \le \tau_n)$ we have a.s.
$$\sup_{0\le t\le T} |Y_{n,t} - Y_{m,t}| = 0, \quad m \ge n.$$
This is true if u is simple and in general it can be proved by approximation, as in Proposition 10.2.17. Since $F_{n,T} \nearrow F_T$ with $P(F_T) = 1$, we define the integral
$$Y_t = \int_0^t u_r\, dB_r, \quad 0 \le t \le T,$$
as the equivalence class of continuous and adapted processes that, for each $n \in \mathbb N$, are indistinguishable from $(Y_{n,t})_{t\in[0,T]}$ on $F_{n,T}$. If Y and $\bar Y$ indicate respectively the stochastic integral processes of u on the intervals $[0,T]$ and $[0,\bar T]$ with $T \le \bar T$, then Y and $\bar Y|_{[0,T]}$ are indistinguishable on $[0,T]$. Therefore, the Itô stochastic integral process of $u \in L^2_{S,loc}$ with respect to $B \in \mathcal M^{c,loc}$ is well-defined:
$$Y_t = \int_0^t u_r\, dB_r, \quad t \ge 0.$$
We have $Y \in \mathcal M^{c,loc}$ with quadratic variation process
$$\langle Y\rangle_t = \int_0^t u_r^2\, d\langle B\rangle_r, \quad t \ge 0,$$
and a localizing sequence for Y is given by $\bar\tau_n = \tau_n \wedge \tau_n'$ where $\tau_n' = \inf\{t \ge 0 \mid \langle Y\rangle_t \ge n\}$.

In the next section, we deal with the particular case where .At = t and B is a
Brownian motion.

10.4 Scalar Itô Processes

An Itô process is a specific type of continuous semimartingale, which can be expressed as the sum of a Lebesgue integral and a stochastic integral. In this section, W denotes a real Brownian motion.
Definition 10.4.1 (Itô Process [!]) An Itô process is a process of the form
$$X_t = X_0 + \int_0^t u_s\, ds + \int_0^t v_s\, dW_s, \tag{10.4.1}$$
where:
(i) $X_0 \in m\mathcal F_0$;
(ii) $u \in L^1_{loc}$, that is, u is progressively measurable and such that
$$\int_0^t |u_s|\, ds < \infty \quad \text{a.s.}$$
for any $t \ge 0$;
(iii) $v \in L^2_{loc}$, that is, v is progressively measurable and such that⁹
$$\int_0^t |v_s|^2\, ds < \infty \quad \text{a.s.}$$
for any $t \ge 0$.
Notation 10.4.2 (Differential Notation [!]) To indicate the Itô process in (10.4.1), the so-called "differential notation" is often used:
$$dX_t = u_t\, dt + v_t\, dW_t. \tag{10.4.2}$$

This notation, in addition to being more compact, has the merit of evoking the expressions of classical differential calculus. In rigorous terms, $dX_t$ is neither a "derivative" nor a "differential" of the process X: these terms have not been defined. Rather, $dX_t$ is a symbol that holds significance solely within the context of expression (10.4.2); such an expression, in turn, is shorthand whose precise meaning is given by the integral equation (10.4.1). When we talk about stochastic differential calculus, we refer to this type of symbolic calculation whose true meaning is

9 Remember that .〈W 〉s = s.



given by the corresponding integral expressions: therefore, it is actually a stochastic


integral calculus.
The process in (10.4.1) is a continuous semimartingale and can therefore act as an integrator itself; in fact, we have $X = A + M$ where:
– the process
$$A_t := \int_0^t u_s\, ds$$
is continuous, adapted, and of bounded variation according to Example 9.1.2-(iv), and is called the drift of X;
– the stochastic integral process
$$M_t := X_0 + \int_0^t v_s\, dW_s$$
is a continuous local martingale and is called the diffusive part or diffusion of X.
By formula (10.3.1), the quadratic variation process of X is
$$\langle X\rangle_t = \int_0^t v_s^2\, ds,$$
or, in differential notation,
$$d\langle X\rangle_t = v_t^2\, dt.$$

Remark 10.4.3 ([!]) The representation of an Itô process is unique in the following sense: if X is the process in (10.4.2) and we also have
$$dX_t = u_t'\, dt + v_t'\, dW_t,$$
with $u' \in L^1_{loc}$ and $v' \in L^2_{loc}$, then
$$P\big(v = v' \text{ a.e.}\big) = P\big(u = u' \text{ a.e.}\big) = 1.$$
In particular, if $u, u', v, v'$ are continuous, then u is indistinguishable from $u'$ and v is indistinguishable from $v'$.
Indeed, the process
$$M_t := \int_0^t v_s\, dW_s - \int_0^t v_s'\, dW_s = \int_0^t u_s'\, ds - \int_0^t u_s\, ds$$
is a continuous local martingale of bounded variation which, by Theorem 9.3.6, is indistinguishable from the null process. Consider
$$\tau_n := n \wedge \inf\{t \ge 0 \mid A_t \ge n\}, \qquad A_t := \int_0^t (v_s - v_s')^2\, ds, \quad n \in \mathbb N,$$
the usual localizing sequence for M. Then we have
$$0 = E\left[\left(\int_0^{\tau_n} (v_s - v_s')\, dW_s\right)^2\right] = E\left[\left(\int_0^n (v_s - v_s')\, \mathbf 1_{[0,\tau_n]}(s)\, dW_s\right)^2\right] = E\left[\int_0^n (v_s - v_s')^2\, \mathbf 1_{[0,\tau_n]}(s)\, ds\right],$$
where the second and third equalities are due respectively to Proposition 10.2.23 and Itô's isometry. Taking the limit as $n \to \infty$, by Beppo Levi's theorem, we have
$$E\left[\int_0^\infty (v_s - v_s')^2\, ds\right] = 0,$$
and therefore $P(v = v' \text{ a.e.}) = 1$. On the other hand, by Proposition B.3.2 in [113], we also have that
$$P\big(u = u' \text{ a.e.}\big) = 1.$$

10.5 Key Ideas to Remember

We summarize the contents of the chapter and provide a roadmap for reading,
glossing over technical and secondary aspects. As usual, if you have any doubt
about what the following succinct statements mean, please review the corresponding
section.
• Section 10.1: when approaching these topics for the first time, it is preferable
to select some content and postpone the general treatment and in-depth studies
to a later time. In particular, it is best to first consider only the case where the
integrator is a Brownian motion. As for the integrand, the crucial assumption is
that it is a progressively measurable process; the construction of the Brownian
integral takes place in three steps, gradually widening the class of integrands:
(1) the definition of the integral of simple processes is explicit: it is a Riemann
sum of Brownian increments. In this case, three fundamental properties of
the integral are directly proven:

(i) it is a continuous martingale;
(ii) Itô's isometry;
(iii) there is an explicit expression for the quadratic variation process;
(2) the stochastic integral extends by density to integrands in .L2 . The three
fundamental properties remain valid;
(3) with a localization procedure using stopping times (which stop the quadratic
variation process when it exceeds some level), the stochastic integral extends
to integrands in the much wider class .L2loc . In this case, the first two
fundamental properties are lost, or rather, they remain valid in a weakened
form.
• Section 10.2: the construction of the stochastic integral extends to the case where
the integrator process is in .M c,2 and essentially analogous properties to those
of the Brownian integral hold. The integration endpoint can also be random,
provided it is a stopping time (see Proposition 10.2.23).
• Section 10.3: we further extend the definition of the stochastic integral to the case
where the integrator is a continuous semimartingale.
• Section 10.4: an Itô process is a particular continuous semimartingale that is the
sum of a Lebesgue integral with an integrand in .L1loc (drift term) and a Brownian
integral with an integrand in .L2loc (diffusive part): in differential notation, it is
written as .dXt = ut dt + vt dWt . The decomposition of an Itô process into drift
and diffusive parts is unique and the quadratic variation process is .d〈X〉t = vt2 dt.
Main notations used or introduced in this chapter:

Symbol : Description : Page
$\int_0^t u_s\, dB_s$ : Stochastic integral with integrand u and integrator B : 175
$L^2$ : Progressively measurable processes in $L^2(\Omega \times [0,T])$ : 176
$L^2_{loc}$ : Progressively measurable processes in $L^2([0,T])$ a.s. : 182
$\mathcal M^{c,2}$ : Space of continuous martingales X, with $X_t \in L^2(\Omega,P)$ for any t : 183
$\mu_{\langle B\rangle}$ : Lebesgue-Stieltjes measure of the increasing process $\langle B\rangle$ : 184
$\int_a^b f(t)\, d\langle B\rangle_t$ : Lebesgue-Stieltjes integral with respect to the increasing process $\langle B\rangle$ : 183
$L^2_B$ : Progressively measurable processes in $L^2(\Omega \times [0,T], P \otimes \mu_{\langle B\rangle})$ : 184
$L^2_{B,loc}$ : Progressively measurable processes in $L^2([0,T], \mu_{\langle B\rangle})$ a.s. : 184
$\mathcal M^{c,2}_T$ : Continuous square-integrable martingales : 190
$\|X\|_T = \sqrt{E\big[X_T^2\big]}$ : Norm in $\mathcal M^{c,2}_T$ : 190
$L^1_{loc}$ : Progressively measurable processes in $L^1([0,T])$ a.s. : 204
Chapter 11
Itô's Formula

To put meaning in one's life may end in madness,
But life without meaning is the torture
Of restlessness and vague desire-
It is a boat longing for the sea and yet afraid.
Edgar Lee Masters

Itô's formula is the most important tool in stochastic differential calculus. In this
chapter, we present several versions that provide the general rules of stochastic
calculus and generalize the analogous deterministic formula of Theorem 9.1.6 for
the Lebesgue-Stieltjes integral.

11.1 Itô’s Formula for Continuous Semimartingales

Although the case of semimartingales is very general, we immediately give this version of Itô's formula because it has the advantage of having a compact expression and an intuitive proof. Recall that a continuous semimartingale is an adapted and continuous process of the form $X = A + M$ with $A \in BV$ such that $A_0 = 0$ and $M \in \mathcal M^{c,loc}$, that is, M is a continuous local martingale according to Definition 8.4.2.
We denote by $\langle X\rangle$ the quadratic variation process of X: by Theorem 9.4.1, we have $\langle X\rangle \equiv \langle M\rangle$, where $\langle M\rangle$ is the unique continuous increasing process such that $\langle M\rangle_0 = 0$ and $M^2 - \langle M\rangle$ is a local martingale. For example, if X is a Brownian motion then $A \equiv 0$ and the quadratic variation process is deterministic: $\langle X\rangle_t = t$ for $t \ge 0$. More generally, if X is an Itô process of the form $dX_t = u_t\, dt + v_t\, dW_t$ (cf. Definition 10.4.1) then $d\langle X\rangle_t = v_t^2\, dt$.
Theorem 11.1.1 (Itô's Formula [!!!]) Let X be a continuous real semimartingale and $F \in C^2(\mathbb R)$. Then almost surely, for every $t \ge 0$ we have
$$F(X_t) = F(X_0) + \int_0^t F'(X_s)\, dX_s + \frac12 \int_0^t F''(X_s)\, d\langle X\rangle_s \tag{11.1.1}$$

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 209
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_11

or, in differential notation,
$$dF(X_t) = F'(X_t)\, dX_t + \frac12 F''(X_t)\, d\langle X\rangle_t. \tag{11.1.2}$$
Idea of the Proof Given a partition $\pi = \{t_0, \dots, t_N\}$ of $[0,t]$, we write the difference $F(X_t) - F(X_0)$ as a telescoping sum and then expand it in a Taylor series up to the second order: we obtain
$$F(X_t) - F(X_0) = \sum_{k=1}^N \left(F(X_{t_k}) - F(X_{t_{k-1}})\right) = \sum_{k=1}^N F'(X_{t_{k-1}})\left(X_{t_k} - X_{t_{k-1}}\right) + \frac12 \sum_{k=1}^N F''(X_{t_{k-1}})\left(X_{t_k} - X_{t_{k-1}}\right)^2 + \text{“remainder”}.$$
Finally, we prove that, in an appropriate sense, the limits
$$\sum_{k=1}^N F'(X_{t_{k-1}})\left(X_{t_k} - X_{t_{k-1}}\right) \longrightarrow \int_0^t F'(X_s)\, dX_s, \qquad \sum_{k=1}^N F''(X_{t_{k-1}})\left(X_{t_k} - X_{t_{k-1}}\right)^2 \longrightarrow \int_0^t F''(X_s)\, d\langle X\rangle_s$$
exist as $|\pi| \to 0$, and that the remainder term is negligible. The detailed proof, which involves more technical intricacies, is presented in Sect. 11.3.
Remark 11.1.2 Compared to the deterministic version (9.1.3), in Itô's formula (11.1.2) an additional second-order term appears, which comes from the quadratic variation of X: the factor $\frac12$ in front of it is the coefficient of the Taylor series expansion of F.
Likewise, we establish a more comprehensive version of Itô’s formula.
Theorem 11.1.3 (Itô's Formula) Let X be a continuous real semimartingale and $F = F(t,x) \in C^{1,2}(\mathbb R_{\ge0} \times \mathbb R)$. Then almost surely, for every $t \ge 0$ we have
$$F(t, X_t) = F(0, X_0) + \int_0^t (\partial_t F)(s, X_s)\, ds + \int_0^t (\partial_x F)(s, X_s)\, dX_s + \frac12 \int_0^t (\partial_{xx} F)(s, X_s)\, d\langle X\rangle_s$$
or, in differential notation,
$$dF(t, X_t) = (\partial_t F)(t, X_t)\, dt + (\partial_x F)(t, X_t)\, dX_t + \frac12\, (\partial_{xx} F)(t, X_t)\, d\langle X\rangle_t.$$

11.1.1 Itô’s Formula for Brownian Motion

We consider Itô’s formula for a real Brownian motion W and delve into several
illustrative examples. Recall that the quadratic variation process of W is simply
.〈W 〉t = t.

Corollary 11.1.4 (Itô's Formula for Brownian Motion) For every $F = F(t,x) \in C^{1,2}(\mathbb R_{\ge0} \times \mathbb R)$ we have
$$F(t, W_t) = F(0, W_0) + \int_0^t (\partial_t F)(s, W_s)\, ds + \int_0^t (\partial_x F)(s, W_s)\, dW_s + \frac12 \int_0^t (\partial_{xx} F)(s, W_s)\, ds$$
or, in differential notation,
$$dF(t, W_t) = \left(\partial_t F + \frac12\, \partial_{xx} F\right)(t, W_t)\, dt + (\partial_x F)(t, W_t)\, dW_t.$$
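Before the worked examples, here is a quick pathwise sanity check of the formula (our own sketch; the test function $F(t,x) = t x^2$, the step count and the seed are arbitrary choices). Since $\partial_t F = x^2$, $\partial_x F = 2tx$ and $\partial_{xx} F = 2t$, the formula predicts $t\, W_t^2 = \int_0^t W_s^2\, ds + \int_0^t 2 s W_s\, dW_s + \frac{t^2}{2}$, and both sides can be computed along a simulated trajectory.

```python
import numpy as np

# Pathwise check of Itô's formula for F(t,x) = t*x^2 along a simulated path:
#   t W_t^2 ≈ ∫_0^t W_s^2 ds + ∫_0^t 2 s W_s dW_s + t^2/2
rng = np.random.default_rng(3)
t, n_steps = 1.0, 200000
dt = t / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])
s = np.arange(n_steps) * dt                   # left endpoints

lhs = t * W[-1]**2
rhs = np.sum(W[:-1]**2) * dt + np.sum(2 * s * W[:-1] * dW) + t**2 / 2
print(lhs, rhs)   # agreement up to the discretization error
```

All integrals are approximated by left-endpoint sums, consistent with Corollary 10.2.27; the two sides differ only by terms vanishing as the mesh goes to zero.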

Example 11.1.5
(i) If $F(t,x) = f(t)x$, with $f \in C^1(\mathbb R)$, we have
$$\partial_t F(t,x) = f'(t)x, \qquad \partial_x F(t,x) = f(t), \qquad \partial_{xx} F(t,x) = 0.$$
Then we have
$$f(t)\, W_t = \int_0^t f'(s)\, W_s\, ds + \int_0^t f(s)\, dW_s,$$
which corresponds to the deterministic integration by parts formula of Example 9.1.8-(ii). In differential form, we equivalently have
$$d(f(t)\, W_t) = f'(t)\, W_t\, dt + f(t)\, dW_t,$$
which resembles the usual product rule;
(ii) if $F(t,x) = x^2$ we have
$$\partial_t F(t,x) = 0, \qquad \partial_x F(t,x) = 2x, \qquad \partial_{xx} F(t,x) = 2,$$
and therefore
$$W_t^2 = 2\int_0^t W_s\, dW_s + t$$
or, in differential form,
$$dW_t^2 = 2\, W_t\, dW_t + dt;$$
(iii) if $F(t,x) = e^{at + \sigma x}$, with $a, \sigma \in \mathbb R$, we have
$$\partial_t F(t,x) = a\, F(t,x), \qquad \partial_x F(t,x) = \sigma\, F(t,x), \qquad \partial_{xx} F(t,x) = \sigma^2 F(t,x),$$
and therefore, setting $X_t = e^{at + \sigma W_t}$, we obtain
$$X_t = 1 + a \int_0^t X_s\, ds + \sigma \int_0^t X_s\, dW_s + \frac{\sigma^2}{2} \int_0^t X_s\, ds$$
or, in differential form,
$$dX_t = \left(a + \frac{\sigma^2}{2}\right) X_t\, dt + \sigma\, X_t\, dW_t.$$
With the choice $a = -\frac{\sigma^2}{2}$, the drift of the process vanishes and we obtain
$$X_t = 1 + \sigma \int_0^t X_s\, dW_s,$$
which is a continuous martingale: specifically, $X_t = e^{\sigma W_t - \frac{\sigma^2}{2} t}$ is the exponential martingale introduced in Proposition 4.4.1.
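The identity in point (ii) can be verified pathwise on a simulated trajectory (a minimal sketch of ours; step count and seed are arbitrary), with the Itô integral approximated by left-endpoint sums:

```python
import numpy as np

# Pathwise check of W_t^2 = 2 ∫_0^t W_s dW_s + t (left-endpoint sums).
rng = np.random.default_rng(4)
t, n_steps = 1.0, 100000
dt = t / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])

ito_int = np.sum(W[:-1] * dW)        # ∫_0^t W_s dW_s
lhs, rhs = W[-1]**2, 2 * ito_int + t
print(lhs, rhs)
```

The discrepancy is $\sum_k (\Delta W_k)^2 - t$, which vanishes as the mesh goes to zero: exactly the quadratic-variation correction term of Itô's formula.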
Remark 11.1.6 ([!]) Itô's formula shows that every stochastic process of the form $X_t = F(t, W_t)$, with F sufficiently regular, is an Itô process according to Definition 10.4.1: in particular, X is a semimartingale, and Itô's formula provides the explicit expression of the decomposition (unique up to indistinguishable processes) of X into the sum $X = A + M$, where the process of bounded variation
$$A_t := \int_0^t \left(\partial_t F + \frac12\, \partial_{xx} F\right)(s, W_s)\, ds$$
is the drift of X and the local martingale¹
$$M_t := X_0 + \int_0^t (\partial_x F)(s, W_s)\, dW_s$$
is the diffusive part of X.
Note that if F solves the heat equation
$$\partial_t F(t,x) + \frac12\, \partial_{xx} F(t,x) = 0, \quad t > 0,\ x \in \mathbb R, \tag{11.1.3}$$
then the drift of X vanishes and therefore X is a local martingale. Conversely, if X is a local martingale then by Remark 10.4.3 we have that
$$\left(\partial_t F + \frac12\, \partial_{xx} F\right)(t, W_t) = 0 \tag{11.1.4}$$
in the sense of indistinguishability, and this implies² that F solves the heat equation (11.1.3).

11.1.2 Itô’s Formula for Itô Processes

Let X be an Itô process of the form
$$dX_t = \mu_t\, dt + \sigma_t\, dW_t \tag{11.1.5}$$
with $\mu \in L^1_{loc}$ and $\sigma \in L^2_{loc}$. In Sect. 10.4 we saw that X is a continuous semimartingale with quadratic variation process
$$\langle X\rangle_t = \int_0^t \sigma_s^2\, ds,$$
that is, $d\langle X\rangle_t = \sigma_t^2\, dt$. Hence we have the following further version of Itô's formula.

¹ We find here the result of Theorem 4.4.3, proven in the context of Markov process theory!
² The stochastic equation (11.1.4) is equivalent to the deterministic equation (11.1.3): just observe that if f is a continuous function such that $f(W_t) = 0$ a.s. for some $t > 0$, then $f \equiv 0$. In fact, if it were $f(\bar x) > 0$ for some $\bar x \in \mathbb R$, then we would also have $f(x) > 0$ for $|x - \bar x| < r$ with $r > 0$ sufficiently small; this leads to a contradiction since, the Gaussian density being strictly positive, we would have
$$0 < E\left[f(W_t)\, \mathbf 1_{(|W_t - \bar x| < r)}\right] = 0.$$

Corollary 11.1.7 (Itô's Formula for Itô Processes) Let X be the Itô process in (11.1.5). For each $F = F(t,x) \in C^{1,2}(\mathbb R_{\ge0} \times \mathbb R)$ we have
$$F(t, X_t) = F(0, X_0) + \int_0^t (\partial_t F)(s, X_s)\, ds + \int_0^t (\partial_x F)(s, X_s)\, dX_s + \frac12 \int_0^t (\partial_{xx} F)(s, X_s)\, \sigma_s^2\, ds \tag{11.1.6}$$
or equivalently
$$dF(t, X_t) = \left(\partial_t F + \mu_t\, \partial_x F + \frac{\sigma_t^2}{2}\, \partial_{xx} F\right)(t, X_t)\, dt + \sigma_t\, (\partial_x F)(t, X_t)\, dW_t.$$

Example 11.1.8 ([!!]) Let us calculate the stochastic differential of the process
$$Y_t = e^{\,t \int_0^t W_s\, dW_s}.$$
First of all, we notice that we cannot use Itô's formula for Brownian motion from Corollary 11.1.4 because $Y_t$ is not a function of $W_t$ alone but depends on $(W_s)_{s\in[0,t]}$, that is, on the entire trajectory of W in the interval $[0,t]$. The general criterion for applying Itô's formula correctly is to first analyze how $Y_t$ depends on the variable t, distinguishing the "deterministic" from the "stochastic" dependence: in this example, we highlight in bold the deterministic dependence
$$t \mapsto \exp\left(\boldsymbol t \int_0^t W_s\, dW_s\right)$$
and the stochastic dependence
$$t \mapsto \exp\left(t \int_0^{\boldsymbol t} W_s\, dW_s\right)$$
to establish that
$$Y_t = F(t, X_t), \qquad F(t,x) = e^{tx}, \qquad X_t = \int_0^t W_s\, dW_s,$$
and therefore $dX_t = W_t\, dW_t$ and $d\langle X\rangle_t = W_t^2\, dt$. Then we can apply Itô's formula (11.1.6): since
$$\partial_t F(t,x) = x\, F(t,x), \qquad \partial_x F(t,x) = t\, F(t,x), \qquad \partial_{xx} F(t,x) = t^2 F(t,x),$$
we get
$$dY_t = \left(X_t + \frac{(t W_t)^2}{2}\right) Y_t\, dt + t\, W_t\, Y_t\, dW_t.$$

Example 11.1.9 ([!]) Consider an Itô process with deterministic coefficients
$$X_t = x + \int_0^t \mu(s)\, ds + \int_0^t \sigma(s)\, dW_s$$
with $x \in \mathbb R$, $\mu \in L^1_{loc}(\mathbb R_{\ge0})$ and $\sigma \in L^2_{loc}(\mathbb R_{\ge0})$. As an application of Itô's formula (11.1.6), we prove that
$$X_t \sim N_{m(t), C(t)}, \qquad m(t) := x + \int_0^t \mu(s)\, ds, \qquad C(t) := \int_0^t \sigma^2(s)\, ds,$$
for every $t \ge 0$. In fact, we can easily calculate the characteristic function of X: first, for every $\eta \in \mathbb R$ we have
$$d e^{i\eta X_t} = e^{i\eta X_t}\left(i\eta\, dX_t - \frac{\eta^2}{2}\, d\langle X\rangle_t\right) = e^{i\eta X_t}\left(a(t,\eta)\, dt + i\eta\sigma(t)\, dW_t\right), \qquad a(t,\eta) := i\eta\mu(t) - \frac{\eta^2\sigma^2(t)}{2}.$$
Taking expectations, and since the expectation of the stochastic integral is null, we have
$$\varphi_{X_t}(\eta) = e^{i\eta x} + E\left[\int_0^t a(s,\eta)\, e^{i\eta X_s}\, ds\right] = e^{i\eta x} + \int_0^t a(s,\eta)\, \varphi_{X_s}(\eta)\, ds;$$
equivalently, $t \mapsto \varphi_{X_t}(\eta)$ solves the Cauchy problem
$$\begin{cases} \dfrac{d}{dt}\varphi_{X_t}(\eta) = a(t,\eta)\, \varphi_{X_t}(\eta), \\ \varphi_{X_0}(\eta) = e^{i\eta x}, \end{cases}$$
so that
$$\varphi_{X_t}(\eta) = e^{i\eta m(t) - \frac{\eta^2}{2} C(t)},$$
and this proves the thesis.
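A Monte Carlo sketch of this example, with the illustrative (not from the text) choices $x = 0$, $\mu(s) = \cos s$ and $\sigma(s) = 1 + s$, for which $m(1) = \sin 1$ and $C(1) = ((1+1)^3 - 1)/3 = 7/3$:

```python
import numpy as np

# Euler simulation of X_t = ∫_0^t mu(s) ds + ∫_0^t sigma(s) dW_s and
# comparison of the empirical mean/variance with m(t), C(t).
rng = np.random.default_rng(5)
t, n_steps, n_paths = 1.0, 400, 10000
dt = t / n_steps
s = np.arange(n_steps) * dt                  # left endpoints of the subintervals
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
X = np.sum(np.cos(s) * dt + (1.0 + s) * dW, axis=1)

m, C = np.sin(t), ((1.0 + t)**3 - 1.0) / 3.0
print(X.mean(), m, X.var(), C)
```

The empirical mean and variance match $m(t)$ and $C(t)$ up to Monte Carlo and discretization error; a normality test on the sample would confirm the Gaussian law as well.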



Example 11.1.10 ([!]) Given
$$X_t := \int_0^t W_s\, ds \tag{11.1.7}$$
we have $X_t \sim N_{0,\, t^3/3}$. In fact, by Itô's formula, we have
$$d(t\, W_t) = t\, dW_t + W_t\, dt,$$
that is,
$$X_t = t\, W_t - \int_0^t s\, dW_s = \int_0^t (t - s)\, dW_s.$$
We note that the expression of X in (11.1.7) is that of an Itô process, while
$$\int_0^t (t-s)\, dW_s$$
is not written in the form of an Itô process: to circumvent this problem, we define the Itô process
$$Y_t^{(a)} := \int_0^t (a - s)\, dW_s$$
depending on the parameter $a \in \mathbb R$. We know that
$$Y_t^{(a)} \sim N_{0,\ \frac{t^3}{3} + at(a-t)}$$
and the thesis follows from the fact that $X_t = Y_t^{(t)}$.
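Both claims of this example, the variance $t^3/3$ and the pathwise identity $X_t = t\, W_t - \int_0^t s\, dW_s$, lend themselves to a quick simulation check (our own sketch; parameters and seed are arbitrary):

```python
import numpy as np

# X_t = ∫_0^t W_s ds: check Var(X_t) ≈ t^3/3 and X_t ≈ t W_t - ∫_0^t s dW_s.
rng = np.random.default_rng(6)
t, n_steps, n_paths = 1.0, 200, 10000
dt = t / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])
s = np.arange(n_steps) * dt                          # left endpoints

X = np.sum(W_left, axis=1) * dt                      # ∫_0^t W_s ds
Y = t * W[:, -1] - np.sum(s * dW, axis=1)            # t W_t - ∫_0^t s dW_s
print(X.var(), t**3 / 3, np.max(np.abs(X - Y)))
```

On the discrete grid the two expressions coincide up to a summation-by-parts boundary term of order $\Delta t$, and the sample variance is close to $t^3/3 \approx 0.333$.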

11.2 Some Consequences of Itô’s Formula

11.2.1 Burkholder-Davis-Gundy Inequalities

We prove some classical inequalities that are a basic tool in the study of martingales
and stochastic differential equations.
Theorem 11.2.1 (Burkholder-Davis-Gundy [!]) Let X be a continuous local martingale such that $X_0 = 0$ a.s. and $\tau$ an a.s. finite stopping time (i.e., such that $\tau < \infty$ a.s.). For every $p > 0$ there exist two positive constants $c_p, C_p$ such that
$$c_p\, E\left[\langle X\rangle_\tau^{p/2}\right] \le E\left[\sup_{t\in[0,\tau]} |X_t|^p\right] \le C_p\, E\left[\langle X\rangle_\tau^{p/2}\right]. \tag{11.2.1}$$
In (11.2.1), $\langle X\rangle$ denotes the quadratic variation process of X.
Proof We only prove the case $p \geq 2$, in which it is possible to give an elementary proof based on Itô's formula. For the general case see, for example, Proposition 3.26 in [67]. The case $p = 2$ follows from Itô's isometry (9.4.1) and therefore it is sufficient to consider $p > 2$.

We begin by proving the second inequality. It is not restrictive to assume $E\big[\langle X\rangle_\tau^{p/2}\big] > 0$, otherwise there is nothing to prove. Let

$$\bar X_\tau = \sup_{t\in[0,\tau]}|X_t|$$

and assume for the moment that $\bar X_\tau \leq n$ a.s. for some $n \in \mathbb{N}$. Then, by Doob's maximal inequality, Corollary 8.1.3, we have

$$E\big[\bar X_\tau^p\big] \leq c_p\,E\big[|X_\tau|^p\big] =$$

(by Itô's formula, noting that the function $x \mapsto |x|^p$ is of class $C^2$ since $p \geq 2$)

$$= c_p\,E\Big[\int_0^\tau p|X_t|^{p-1}\,dX_t\Big] + \frac{c_p}{2}\,E\Big[\int_0^\tau p(p-1)|X_t|^{p-2}\,d\langle X\rangle_t\Big] =$$

(since the first term is null, because the stochastic integral is a martingale under the boundedness assumption on $\bar X_\tau$)

$$= c_p'\,E\Big[\int_0^\tau |X_t|^{p-2}\,d\langle X\rangle_t\Big] \leq c_p'\,E\Big[\int_0^\tau \bar X_\tau^{p-2}\,d\langle X\rangle_t\Big] = c_p'\,E\big[\bar X_\tau^{p-2}\langle X\rangle_\tau\big] \leq$$

(by Hölder's inequality with exponents $\frac{p}{p-2}$ and $\frac{p}{2}$)

$$\leq c_p'\,E\big[\bar X_\tau^{p}\big]^{\frac{p-2}{p}}\,E\big[\langle X\rangle_\tau^{p/2}\big]^{\frac{2}{p}}$$

and from this inequality the thesis easily follows. To remove the boundedness assumption, it is sufficient to apply the result just proved to the stopping time $\tau_n = \inf\{t \geq 0 \mid |X_t| \geq n\} \wedge \tau$ and then take the limit as $n \to \infty$ using Beppo Levi's theorem.
Let us now prove the first inequality: with the usual localization argument based on Beppo Levi's theorem, it is not restrictive to assume that $\tau$, $\bar X_\tau$ and $\langle X\rangle_\tau$ are bounded by a positive constant. We also assume $E\big[\bar X_\tau^p\big] > 0$, otherwise there is nothing to prove. Let $r = \frac{p}{2} > 1$ and $A = \langle X\rangle$. By the deterministic Itô formula, Theorem 9.1.6, and formula (9.1.4), we have

$$dA_t^r = rA_t^{r-1}\,dA_t, \qquad dA_t^r = d\big(A_tA_t^{r-1}\big) = A_t\,dA_t^{r-1} + A_t^{r-1}\,dA_t,$$

and inserting the first into the second equality we get

$$dA_t^r = A_t\,dA_t^{r-1} + \frac{1}{r}\,dA_t^r,$$

that is,

$$(r-1)A_\tau^r = r\int_0^\tau A_t\,dA_t^{r-1}.$$

Since also

$$A_\tau^r = A_\tau\int_0^\tau dA_t^{r-1} = \int_0^\tau A_\tau\,dA_t^{r-1},$$

we finally obtain

$$A_\tau^r = r\int_0^\tau (A_\tau - A_t)\,dA_t^{r-1}.$$

Then we have

$$E\big[A_\tau^r\big] = rE\Big[\int_0^\tau (A_\tau - A_t)\,dA_t^{r-1}\Big] =$$

(by Proposition 9.2.3 and since $A_t = E[A_t \mid \mathcal{F}_t]$)

$$= rE\Big[\int_0^\tau E[A_\tau - A_t \mid \mathcal{F}_t]\,dA_t^{r-1}\Big] =$$

(by (9.4.1) and (1.4.3) (see also Remark 9.4.4), remembering the notation $A = \langle X\rangle$)

$$= rE\Big[\int_0^\tau E\big[X_\tau^2 - X_t^2 \mid \mathcal{F}_t\big]\,d\langle X\rangle_t^{r-1}\Big] \leq rE\Big[\int_0^\tau E\big[\bar X_\tau^2 \mid \mathcal{F}_t\big]\,d\langle X\rangle_t^{r-1}\Big] =$$

(again by Proposition 9.2.3)

$$= rE\Big[\bar X_\tau^2\int_0^\tau d\langle X\rangle_t^{r-1}\Big] = rE\big[\bar X_\tau^2\langle X\rangle_\tau^{r-1}\big].$$

To conclude, just apply Hölder's inequality with exponents $r$, $\frac{r}{r-1}$ and finally divide by $E\big[\langle X\rangle_\tau^r\big]^{\frac{r-1}{r}}$. □

We have the following immediate

Corollary 11.2.2 ([!]) Let $\sigma \in L^2$ and let W be a real Brownian motion. For every $p \geq 2$ and $T > 0$ we have

$$E\Big[\sup_{0\leq t\leq T}\Big|\int_0^t \sigma_s\,dW_s\Big|^p\Big] \leq c_p\,T^{\frac{p-2}{2}}\,E\Big[\int_0^T |\sigma_s|^p\,ds\Big] \tag{11.2.2}$$

where $c_p$ is a positive constant that depends only on p.

Proof It is enough³ to consider $p > 2$. Applying the Burkholder-Davis-Gundy inequality to the continuous martingale

$$X_t = \int_0^t \sigma_s\,dW_s,$$

we obtain

$$E\Big[\sup_{0\leq t\leq T}|X_t|^p\Big] \leq c_p\,E\big[\langle X\rangle_T^{p/2}\big] = c_p\,E\Big[\Big(\int_0^T \sigma_t^2\,dt\Big)^{p/2}\Big].$$

The thesis follows by applying Hölder's inequality with exponents $\frac{p}{2}$ and $\frac{p}{p-2}$. □

Remark 11.2.3 Assume $p > 4$ and

$$X_t := \int_0^t \sigma_s\,dW_s \qquad\text{with}\qquad E\Big[\int_0^T |\sigma_s|^p\,ds\Big] < \infty.$$

Combining estimate (11.2.2) with Kolmogorov's continuity theorem, we have that the integral process X admits a version with $\alpha$-Hölder continuous trajectories for every $\alpha \in [0, \frac{1}{2} - \frac{2}{p}[$.

³ The case $p = 2$ corresponds to Itô's isometry.



11.2.2 Quadratic Variation Process

We prove formula (9.4.2) that we left pending.


Proposition 11.2.4 Let X be a continuous local martingale with quadratic variation process $\langle X\rangle$. We have

$$\langle X\rangle_t = \lim_{n\to\infty}\sum_{k=1}^{2^n}\Big(X_{\frac{tk}{2^n}} - X_{\frac{t(k-1)}{2^n}}\Big)^2, \qquad t \geq 0,$$

in probability. Moreover, if $S = A + X$ is a continuous semimartingale, with $A \in BV$ and $X \in \mathcal{M}^{c,loc}$, we have

$$\lim_{n\to\infty}\sum_{k=1}^{2^n}\Big(S_{\frac{tk}{2^n}} - S_{\frac{t(k-1)}{2^n}}\Big)^2 = \langle X\rangle_t, \qquad t \geq 0, \tag{11.2.3}$$

in probability.
Proof As usual, we denote by $t_{n,k} = \frac{tk}{2^n}$, $k = 0,\dots,2^n$, the dyadic rationals of the interval $[0,t]$. We first assume that X is a bounded continuous local martingale, $|X| \leq K$ with K a positive constant. Given $n \in \mathbb{N}$ and $k \in \{1,\dots,2^n\}$, we consider the process

$$Y_s := X_s - X_{t_{n,k-1}}, \qquad s \geq t_{n,k-1},$$

and observe that $\langle Y\rangle_s = \langle X\rangle_s - \langle X\rangle_{t_{n,k-1}}$: indeed, it is enough to observe that

$$Y_s^2 - \big(\langle X\rangle_s - \langle X\rangle_{t_{n,k-1}}\big) = X_s^2 - \langle X\rangle_s + M_s, \qquad M_s := -2X_sX_{t_{n,k-1}} + X_{t_{n,k-1}}^2 + \langle X\rangle_{t_{n,k-1}},$$

and it is easily verified that $(M_s)_{s\geq t_{n,k-1}}$ is a martingale. Applying Itô's formula, we have

$$dY_s^2 = 2Y_s\,dY_s + d\langle Y\rangle_s$$

and, in integral form over $[t_{n,k-1}, t_{n,k}]$,

$$\big(X_{t_{n,k}} - X_{t_{n,k-1}}\big)^2 = 2\int_{t_{n,k-1}}^{t_{n,k}}\big(X_s - X_{t_{n,k-1}}\big)\,dY_s + \langle X\rangle_{t_{n,k}} - \langle X\rangle_{t_{n,k-1}},$$

that is,

$$\big(X_{t_{n,k}} - X_{t_{n,k-1}}\big)^2 - \big(\langle X\rangle_{t_{n,k}} - \langle X\rangle_{t_{n,k-1}}\big) = 2\int_{t_{n,k-1}}^{t_{n,k}}\big(X_s - X_{t_{n,k-1}}\big)\,dY_s.$$

Summing over k, we obtain

$$R_n := \sum_{k=1}^{2^n}\big(X_{t_{n,k}} - X_{t_{n,k-1}}\big)^2 - \langle X\rangle_t = 2\sum_{k=1}^{2^n}\int_{t_{n,k-1}}^{t_{n,k}}\big(X_s - X_{t_{n,k-1}}\big)\,dY_s.$$

Thanks to the Itô isometry in the form (10.2.12) and (10.2.13) (recall also Theorem 10.2.15), and since $d\langle Y\rangle_s = d\langle X\rangle_s$, we have

$$E\big[R_n^2\big] = 4\sum_{k=1}^{2^n}E\Big[\int_{t_{n,k-1}}^{t_{n,k}}\big(X_s - X_{t_{n,k-1}}\big)^2\,d\langle X\rangle_s\Big] = 4E\Big[\int_0^t\sum_{k=1}^{2^n}\big(X_s - X_{t_{n,k-1}}\big)^2\,\mathbb{1}_{[t_{n,k-1},t_{n,k}]}(s)\,d\langle X\rangle_s\Big]$$

and, taking the limit, by the dominated convergence theorem we have $\lim_{n\to\infty}E\big[R_n^2\big] = 0$. Therefore, in this particular case, we prove convergence in $L^2$ norm, which obviously implies convergence in probability.

To remove the boundedness assumption on X, it is sufficient to use a localization argument, proving the thesis for the bounded martingale $X_{t\wedge\tau_n}$, with

$$\tau_n = t \wedge \inf\{s \geq 0 \mid |X_s| \geq n,\ \langle X\rangle_s \geq n,\ V_s(A) \geq n\}, \qquad n \in \mathbb{N},$$

and then letting n tend to infinity: with this procedure, we prove convergence in probability. The proof of (11.2.3) is similar and is omitted. □
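Proposition 11.2.4 is easy to observe numerically. The sketch below (discretization parameters are arbitrary assumptions) takes $X = W$ a real Brownian motion, for which $\langle W\rangle_t = t$, and shows that the dyadic sums of squared increments concentrate around t as n grows:

```python
import numpy as np

# Illustration of Proposition 11.2.4 for X = W, where <W>_t = t: a fine path
# is simulated once and coarser dyadic grids are obtained by subsampling it.
rng = np.random.default_rng(2)
t, n_max, n_paths = 1.0, 12, 2000
N = 2**n_max
dW = rng.standard_normal((n_paths, N)) * np.sqrt(t / N)
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

def dyadic_qv(W, n):
    """Sum of squared increments of W over the dyadic grid with 2^n intervals."""
    idx = np.arange(0, 2**n_max + 1, 2**(n_max - n))
    incr = np.diff(W[:, idx], axis=1)
    return np.sum(incr**2, axis=1)

# mean absolute deviation of the dyadic sums from <W>_t = t
err = [np.mean(np.abs(dyadic_qv(W, n) - t)) for n in (4, 8, 12)]
print(err)   # decreasing toward 0
```

The decreasing errors reflect the $L^2$ convergence proved above (the variance of the dyadic sum is of order $2t^2/2^n$ for Brownian motion).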

11.3 Proof of Itô’s Formula

We prove Theorem 11.1.1. Let $X = A + M$ be a continuous real-valued semimartingale, where A is an adapted, continuous, and locally of bounded variation process and $M \in \mathcal{M}^{c,loc}$. In Theorem 9.4.1 we defined the quadratic variation process $\langle M\rangle$ as the unique (up to indistinguishability) adapted, continuous, increasing process such that $\langle M\rangle_0 = 0$ and $M^2 - \langle M\rangle \in \mathcal{M}^{c,loc}$. Moreover, if M is square-integrable, i.e., $M \in \mathcal{M}^{c,2}$, then we have the important identities

$$E\big[(M_t - M_s)^2 \mid \mathcal{F}_s\big] = E\big[M_t^2 - M_s^2 \mid \mathcal{F}_s\big] \tag{11.3.1}$$

$$= E\big[\langle M\rangle_t - \langle M\rangle_s \mid \mathcal{F}_s\big], \qquad 0 \leq s \leq t. \tag{11.3.2}$$

Even though it is a calculation we have already done, it is useful to recall that (11.3.1) simply comes from

$$E\big[(M_t - M_s)^2 \mid \mathcal{F}_s\big] = E\big[M_t^2 - 2M_tM_s + M_s^2 \mid \mathcal{F}_s\big] = E\big[M_t^2 \mid \mathcal{F}_s\big] - 2M_sE[M_t \mid \mathcal{F}_s] + M_s^2 =$$

(by the martingale property of M)

$$= E\big[M_t^2 \mid \mathcal{F}_s\big] - M_s^2.$$

Instead, (11.3.2) is equivalent to the martingale property of $M^2 - \langle M\rangle$. The proof of Itô's formula is essentially based on these two identities. Another ingredient is the uniform estimate (9.6.3) of the $L^2$ norm of the quadratic variation of M on the dyadics.

We divide the proof of Theorem 11.1.1 into four steps.
First Step Consider the continuous semimartingale $X = A + M$. Since (11.1.1) is an equality of continuous processes, it is sufficient to prove that they are modifications: in other words, we can prove the thesis for a fixed $t > 0$. We set

$$\tau_n = t \wedge \inf\{s \geq 0 \mid |X_s| \geq n,\ \langle X\rangle_s \geq n,\ V_s(A) \geq n\}, \qquad n \in \mathbb{N},$$

where $V_s(A)$ denotes the first variation process of A on $[0,s]$ (cf. Definition 9.1.1). By continuity, $\tau_n \nearrow \infty$ a.s. and therefore it is enough to prove Itô's formula for $X_{t\wedge\tau_n}$ for each $n \in \mathbb{N}$: equivalently, it is enough to prove, for each fixed $\bar N \in \mathbb{N}$, that (11.1.1) holds in the case where the processes $|X|, |M|, A, \langle X\rangle$ and $V(A)$ are bounded by $\bar N$. In this case, it is not restrictive to assume that the function F has compact support, possibly modifying it outside $[-\bar N, \bar N]$. At first, we also assume that $F \in C^3(\mathbb{R})$.

We use the notation (8.1.1) for the dyadics

$$D(t) = \big\{t_{n,k} = \tfrac{tk}{2^n} \mid k = 0,\dots,2^n,\ n \in \mathbb{N}\big\}$$

of $[0,t]$ and denote by $\Delta_{n,k}Y = Y_{t_{n,k}} - Y_{t_{n,k-1}}$ the increment of a generic process Y. Moreover, let $\mathcal{F}_{n,k} := \mathcal{F}_{t_{n,k}}$ and

$$\delta_n(Y) = \sup_{\substack{s,r\in D(t)\\ |s-r| < \frac{1}{2^n}}}|Y_s - Y_r|, \qquad n \in \mathbb{N}.$$
Expanding in Taylor series up to the second order with Lagrange remainder, we obtain

$$F(X_t) - F(X_0) = \sum_{k=1}^{2^n}\big(F(X_{t_{n,k}}) - F(X_{t_{n,k-1}})\big) = \sum_{k=1}^{2^n}F'(X_{t_{n,k-1}})\Delta_{n,k}X + \frac12\sum_{k=1}^{2^n}F''(X_{t_{n,k-1}})\big(\Delta_{n,k}X\big)^2 + R_n \tag{11.3.3}$$

with

$$|R_n| \leq \|F'''\|_\infty\sum_{k=1}^{2^n}\big|\Delta_{n,k}X\big|^3. \tag{11.3.4}$$

In the next two steps, we estimate the individual terms in (11.3.3) to show that they converge to the corresponding terms in (11.1.1) and that $R_n \to 0$ as $n \to \infty$.
Second Step Regarding the first sum in (11.3.3), we have

$$\sum_{k=1}^{2^n}F'(X_{t_{n,k-1}})\Delta_{n,k}X = I_n^{1,A} + I_n^{1,M}$$

where, by Proposition 9.1.3,

$$I_n^{1,A} := \sum_{k=1}^{2^n}F'(X_{t_{n,k-1}})\Delta_{n,k}A \xrightarrow[n\to\infty]{} \int_0^t F'(X_s)\,dA_s \tag{11.3.5}$$

with the integral understood in the Riemann-Stieltjes sense (or Lebesgue-Stieltjes, by Proposition 9.2.2), and

$$I_n^{1,M} := \sum_{k=1}^{2^n}F'(X_{t_{n,k-1}})\Delta_{n,k}M \xrightarrow[n\to\infty]{} \int_0^t F'(X_s)\,dM_s$$

in probability, by Corollary 10.2.27.


Third Step Regarding the second sum in (11.3.3), we have

$$\sum_{k=1}^{2^n}F''(X_{t_{n,k-1}})(\Delta_{n,k}X)^2 = I_n^{2,A} + 2I_n^{2,AM} + I_n^{2,M}$$

where

$$I_n^{2,A} := \sum_{k=1}^{2^n}F''(X_{t_{n,k-1}})(\Delta_{n,k}A)^2, \qquad I_n^{2,AM} := \sum_{k=1}^{2^n}F''(X_{t_{n,k-1}})(\Delta_{n,k}A)(\Delta_{n,k}M),$$

$$I_n^{2,M} := \sum_{k=1}^{2^n}F''(X_{t_{n,k-1}})(\Delta_{n,k}M)^2.$$

Now we have

$$|I_n^{2,A}| \leq \|F''\|_\infty\,\delta_n(A)\,V_t(A) \leq \bar N\|F''\|_\infty\,\delta_n(A) \xrightarrow[n\to\infty]{} 0 \quad\text{a.s.}$$

by the uniform continuity of the trajectories of A on $[0,t]$. A similar result holds for $I_n^{2,AM}$. Recalling that by definition $\langle X\rangle = \langle M\rangle$, it remains to prove that

$$I_n^{2,M} \xrightarrow[n\to\infty]{} \int_0^t F''(X_s)\,d\langle M\rangle_s.$$

Since, analogously to (11.3.5), we almost surely have

$$\sum_{k=1}^{2^n}F''(X_{t_{n,k-1}})\Delta_{n,k}\langle M\rangle \xrightarrow[n\to\infty]{} \int_0^t F''(X_s)\,d\langle M\rangle_s,$$

we prove that

$$\sum_{k=1}^{2^n}F''(X_{t_{n,k-1}})\big((\Delta_{n,k}M)^2 - \Delta_{n,k}\langle M\rangle\big) \xrightarrow[n\to\infty]{} 0$$

in $L^2(\Omega, P)$ norm. Setting $G_{n,k} = F''(X_{t_{n,k-1}})\big((\Delta_{n,k}M)^2 - \Delta_{n,k}\langle M\rangle\big)$ and expanding the square of the sum, we have

$$E\Big[\Big(\sum_{k=1}^{2^n}G_{n,k}\Big)^2\Big] = E\Big[\sum_{k=1}^{2^n}G_{n,k}^2\Big]$$

since the double products cancel out: in fact, if $h < k$, we have

$$E\big[G_{n,h}G_{n,k}\big] = E\Big[G_{n,h}F''(X_{t_{n,k-1}})E\big[(\Delta_{n,k}M)^2 - \Delta_{n,k}\langle M\rangle \mid \mathcal{F}_{n,k-1}\big]\Big] = 0$$

due to (11.3.2). Now, by the elementary inequality $(x+y)^2 \leq 2x^2 + 2y^2$, we have

$$E\Big[\sum_{k=1}^{2^n}G_{n,k}^2\Big] \leq 2\|F''\|_\infty^2\,E\Big[\sum_{k=1}^{2^n}(\Delta_{n,k}M)^4 + (\Delta_{n,k}\langle M\rangle)^2\Big] \leq 2\|F''\|_\infty^2\,E\Big[\delta_n^2(M)\sum_{k=1}^{2^n}(\Delta_{n,k}M)^2 + \delta_n(\langle M\rangle)\,V_t(\langle M\rangle)\Big] \leq$$

(applying Hölder's inequality to the first term)

$$\leq 2\|F''\|_\infty^2\left(E\big[\delta_n^4(M)\big]^{\frac12}\,E\Big[\Big(\sum_{k=1}^{2^n}(\Delta_{n,k}M)^2\Big)^2\Big]^{\frac12} + \bar N\,E\big[\delta_n(\langle M\rangle)\big]\right) \xrightarrow[n\to\infty]{} 0$$

since:
• $\delta_n(M) \leq 2\bar N$ and $\delta_n(M) \xrightarrow[n\to\infty]{} 0$ almost everywhere by the uniform continuity of M on $[0,t]$: consequently, $E\big[\delta_n^4(M)\big] \to 0$ by the dominated convergence theorem. Similarly, $E\big[\delta_n(\langle M\rangle)\big] \xrightarrow[n\to\infty]{} 0$;
• $\displaystyle\sup_{n\in\mathbb{N}}E\Big[\Big(\sum_{k=1}^{2^n}(\Delta_{n,k}M)^2\Big)^2\Big] \leq 16\bar N^4$ by estimate (9.6.3).

Based on (11.3.4), the proof that

$$\lim_{n\to\infty}E\big[|R_n|^2\big] = 0$$

is entirely analogous.
Fourth Step We conclude the proof by removing the additional regularity assumption on F. Given $F \in C^2(\mathbb{R})$ with compact support, consider a sequence $(F_n)_{n\in\mathbb{N}}$ of $C^3$ functions that converge uniformly to F together with their first and second derivatives. We apply Itô's formula to $F_n$ and let n tend to infinity: we have $F_n(X_s) \xrightarrow[n\to\infty]{} F(X_s)$ for every $s \in [0,t]$. By the dominated convergence theorem, we have a.s.

$$\lim_{n\to\infty}\int_0^t\big(F_n'(X_s) - F'(X_s)\big)\,dA_s = \lim_{n\to\infty}\int_0^t\big(F_n''(X_s) - F''(X_s)\big)\,d\langle X\rangle_s = 0$$

and, by Itô's isometry,

$$\lim_{n\to\infty}E\Big[\Big(\int_0^t\big(F_n'(X_s) - F'(X_s)\big)\,dM_s\Big)^2\Big] = \lim_{n\to\infty}E\Big[\int_0^t\big(F_n'(X_s) - F'(X_s)\big)^2\,d\langle M\rangle_s\Big] = 0. \qquad\square$$

11.4 Key Ideas to Remember

We outline the chapter’s main findings and essential concepts, omitting technical
details. As usual, if you have any doubt about what the following succinct statements
mean, please review the corresponding section.
• Section 11.3: the significance of the quadratic variation process becomes apparent in the outline of the proof of Itô's formula: in particular, it introduces an additional term that modifies the usual rules of deterministic integral calculus. Itô's formula provides the Doob decomposition of a process that is a sufficiently regular function of a continuous semimartingale, giving the expressions of the drift and of the diffusive part.
• Sections 11.1.1 and 11.1.2: the heat operator appears in the drift term of Itô's formula for Brownian motion: a process of the form $X_t = F(t, W_t)$ is a (local) martingale if and only if the function F is a solution of the heat equation. An application of Itô's formula shows that Itô processes with deterministic coefficients have normal distribution.
• Section 11.2: the Burkholder-Davis-Gundy inequality generalizes the Itô isometry and provides a comparison between the $L^p$ norm of a continuous local martingale X and the $L^{p/2}$ norm of the related quadratic variation process $\langle X\rangle$.
Main notations used or introduced in this chapter:

Symbol                    Description                                 Page
$\mathcal{M}^{c,2}$       Continuous square-integrable martingales    141
$\mathcal{M}^{c,loc}$     Continuous local martingales                143
$\langle X\rangle$        Quadratic variation process                 165
Chapter 12
Multidimensional Stochastic Calculus

Tu, tu non mi basti mai davvero non mi basti mai tu, tu dolce
terra mia dove non sono stato mai.1
Lucio Dalla

In this chapter, we extend the definitions and results of the previous chapters to
the multidimensional case. We do not introduce any really new concepts; however,
some results, such as Itô’s formula, become technically more complicated and for
this reason, some formal rules introduced in Sect. 12.3 can be useful for practical
calculations.

12.1 Multidimensional Brownian Motion

Definition 12.1.1 (d-Dimensional Brownian Motion) Let $W = (W_t^1,\dots,W_t^d)_{t\geq0}$ be a stochastic process with values in $\mathbb{R}^d$ defined on a filtered probability space $(\Omega,\mathcal{F},P,\mathcal{F}_t)$. We say that W is a d-dimensional Brownian motion if it satisfies the following properties:
(i) $W_0 = 0$ a.s.;
(ii) W is a.s. continuous;
(iii) W is adapted;
(iv) $W_t - W_s$ is independent of $\mathcal{F}_s$ for every $t \geq s \geq 0$;
(v) $W_t - W_s \sim \mathcal{N}_{0,(t-s)I}$ for every $t \geq s \geq 0$, where I denotes the $d\times d$ identity matrix.

1 You, you’re never enough for me


truly, you’re never enough for me
you, you sweet land of mine
where I have never been before.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 227
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_12

A multidimensional Brownian motion is a vector of independent real Brownian motions: indeed, we have

Proposition 12.1.2 If $W = (W^1,\dots,W^d)$ is a d-dimensional Brownian motion on $(\Omega,\mathcal{F},P,\mathcal{F}_t)$ then:
(i) any component $W^i$, for $i = 1,\dots,d$, is a real Brownian motion on $(\Omega,\mathcal{F},P,\mathcal{F}_t)$;
(ii) $W_t^i - W_s^i$ and $W_t^j - W_s^j$ are independent random variables for every $i \neq j$ and $t \geq s \geq 0$;
(iii) the covariation matrix of W is $\langle W\rangle_t = tI$ or, in differential notation,

$$d\langle W^i, W^j\rangle_t = \delta_{ij}\,dt \tag{12.1.1}$$

where $\delta_{ij}$ is the Kronecker delta

$$\delta_{ij} = \begin{cases}1 & \text{if } i = j,\\ 0 & \text{if } i \neq j;\end{cases}$$

(iv) if A is an orthogonal $d\times d$ matrix, then the process defined by $B_t := AW_t$ is still a d-dimensional Brownian motion. If instead A is a generic $N\times d$ matrix, then B satisfies properties (i), (ii), (iii) and (iv) of Definition 12.1.1 and $B_t - B_s \sim \mathcal{N}_{0,(t-s)C}$ for every $0 \leq s \leq t$, where $C = AA^*$. The covariation matrix of B coincides with its covariance matrix: $\langle B\rangle_t = \operatorname{cov}(B_t) = tC$. We say that B is an N-dimensional correlated Brownian motion.

Proof Properties (i) and (ii) follow from the fact that, for $t > s \geq 0$, the increment $W_t - W_s$ has Gaussian density

$$\frac{1}{(2\pi(t-s))^{\frac{d}{2}}}\,e^{-\frac{|x|^2}{2(t-s)}} = \prod_{i=1}^d\frac{1}{\sqrt{2\pi(t-s)}}\,e^{-\frac{x_i^2}{2(t-s)}}, \qquad x \in \mathbb{R}^d,$$

which is the product of standard one-dimensional Gaussian densities: in particular, independence follows from Theorem 2.3.23 in [113].

As for (iii), by point (i) we have $\langle W^i\rangle_t = \langle W^i, W^i\rangle_t = t$ for each $i = 1,\dots,d$. For $i \neq j$ it is a simple exercise² to prove that $W^iW^j$ is a martingale and therefore $\langle W^i, W^j\rangle_t = 0$.

² For $t \geq s \geq 0$, we have

$$E\big[W_t^iW_t^j \mid \mathcal{F}_s\big] = E\big[(W_t^i - W_s^i)W_t^j \mid \mathcal{F}_s\big] + W_s^iE\big[W_t^j \mid \mathcal{F}_s\big] = W_s^iW_s^j$$

since

$$E\big[(W_t^i - W_s^i)W_t^j \mid \mathcal{F}_s\big] = E\big[(W_t^i - W_s^i)(W_t^j - W_s^j) \mid \mathcal{F}_s\big] + W_s^jE\big[W_t^i - W_s^i \mid \mathcal{F}_s\big] = E\big[(W_t^i - W_s^i)(W_t^j - W_s^j)\big] = 0$$

by the independence of increments.

Point (iv) is a simple check based on Proposition 2.5.15 in [113]. ⨆



Example 12.1.3 ([!]) Let W be a two-dimensional Brownian motion. Setting

$$A = \begin{pmatrix}1 & 0\\ \rho & \sqrt{1-\rho^2}\end{pmatrix}$$

with $\rho \in [-1,1]$, we have

$$C = AA^* = \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}.$$

The two-dimensional correlated Brownian motion $B := AW$ is such that

$$B_t^1 = W_t^1, \qquad B_t^2 = \rho W_t^1 + \sqrt{1-\rho^2}\,W_t^2$$

are scalar Brownian motions and

$$\operatorname{cov}(B_t^1, B_t^2) = \langle B^1, B^2\rangle_t = \rho t.$$
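The construction in Example 12.1.3 is easy to verify numerically; in the sketch below the value of $\rho$ and the sample size are illustrative assumptions:

```python
import numpy as np

# Numerical check of Example 12.1.3: for B = A W with
# A = [[1, 0], [rho, sqrt(1 - rho^2)]], both components of B_t have
# variance t and cov(B^1_t, B^2_t) = rho * t.
rng = np.random.default_rng(3)
rho, t, n_paths = 0.7, 1.0, 200_000

W_t = rng.standard_normal((n_paths, 2)) * np.sqrt(t)     # W_t ~ N(0, t I)
B1_t = W_t[:, 0]
B2_t = rho * W_t[:, 0] + np.sqrt(1 - rho**2) * W_t[:, 1]

cov_emp = np.mean(B1_t * B2_t)
print(B1_t.var(), B2_t.var())    # both close to t
print(cov_emp, rho * t)          # close to rho * t
```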

In this section, we briefly show how to define the stochastic integral of multidimensional processes, focusing in particular on Brownian motion and Itô processes. For simplicity, we only deal with the case where the integrator is in $\mathcal{M}^{c,2}$, even though all the results extend directly to integrators that are continuous semimartingales. Hereafter, d and N denote two natural numbers.
Definition 12.1.4 Let $B = (B^1,\dots,B^d) \in \mathcal{M}^{c,2}$ be a d-dimensional process. Consider a process $u = (u^{ij})$ with values in the space of matrices of dimension $N\times d$. We write $u \in L^2_B$ (or simply $u \in L^2$) if $u^{ij} \in L^2_{B^j}$ for each $i = 1,\dots,N$ and $j = 1,\dots,d$. The class $L^2_{loc} \equiv L^2_{B,loc}$ is defined in an analogous way. The stochastic integral of u with respect to B is the N-dimensional process, defined component by component as

$$\int_0^t u_s\,dB_s := \Big(\sum_{j=1}^d\int_0^t u_s^{ij}\,dB_s^j\Big)_{i=1,\dots,N}$$

for $t \geq 0$.
Theorem 12.1.5 ([!]) Let

$$X_t = \int_0^t u_s\,dB_s^1, \qquad Y_t = \int_0^t v_s\,dB_s^2,$$

with $B^1, B^2$ one-dimensional processes in $\mathcal{M}^{c,2}$ and $u, v$ one-dimensional processes in $L^2_{B^1,loc}$ and $L^2_{B^2,loc}$, respectively. Then:
(i)

$$\langle X, Y\rangle_t = \int_0^t u_sv_s\,d\langle B^1, B^2\rangle_s; \tag{12.1.2}$$

(ii) if $u \in L^2_{B^1}$ and $v \in L^2_{B^2}$, then the following version of Itô's isometry holds:

$$E\Big[\int_t^T u_s\,dB_s^1\int_t^T v_s\,dB_s^2 \,\Big|\, \mathcal{F}_t\Big] = E\Big[\int_t^T u_sv_s\,d\langle B^1, B^2\rangle_s \,\Big|\, \mathcal{F}_t\Big], \qquad 0 \leq t \leq T. \tag{12.1.3}$$

Proof When u and v are indicator processes, (12.1.3) is proven by repeating the proof of Theorem 10.2.7-(ii) with the only difference that, instead of (10.2.6), we use (9.5.2) in the form

$$E\big[(B_T^1 - B_t^1)(B_T^2 - B_t^2) \mid \mathcal{F}_t\big] = E\big[\langle B^1, B^2\rangle_T - \langle B^1, B^2\rangle_t \mid \mathcal{F}_t\big], \qquad 0 \leq t \leq T.$$

The proof of (12.1.2) is completely analogous to the case where $B^1 = B^2$. □



Corollary 12.1.6 If $W = (W^1,\dots,W^d)$ is a d-dimensional Brownian motion (cf. Definition 12.1.1) on $(\Omega,\mathcal{F},P,\mathcal{F}_t)$, then for each $u, v \in L^2_W$ we have

$$E\Big[\int_t^T u_s\,dW_s^i\int_t^T v_s\,dW_s^j \,\Big|\, \mathcal{F}_t\Big] = \delta_{ij}\,E\Big[\int_t^T u_sv_s\,ds \,\Big|\, \mathcal{F}_t\Big], \qquad 0 \leq t \leq T,\ i,j = 1,\dots,d. \tag{12.1.4}$$

Proof Equation (12.1.4) follows directly from (12.1.3) and point (iii) of Proposition 12.1.2. □

Remark 12.1.7 The components of the covariation matrix (cf. Definition 9.5.3) of the integral process

$$X_t = \int_0^t u_s\,dB_s$$

are

$$\langle X^i, X^j\rangle_t = \Big\langle\sum_{h=1}^d\int_0^t u_s^{ih}\,dB_s^h,\ \sum_{k=1}^d\int_0^t u_s^{jk}\,dB_s^k\Big\rangle =$$

(by (12.1.2))

$$= \sum_{h,k=1}^d\int_0^t u_s^{ih}u_s^{jk}\,d\langle B^h, B^k\rangle_s \tag{12.1.5}$$

for $i,j = 1,\dots,N$.

12.2 Multidimensional Itô Processes

Definition 12.2.1 (Itô Process [!]) Let W be a d-dimensional Brownian motion. An N-dimensional Itô process is a process of the form

$$X_t = X_0 + \int_0^t u_s\,ds + \int_0^t v_s\,dW_s \tag{12.2.1}$$

where:
(i) $X_0 \in m\mathcal{F}_0$ is an N-dimensional random variable;
(ii) u is an N-dimensional process in $L^1_{loc}$, i.e., u is progressively measurable and such that, for every $t \geq 0$,

$$\int_0^t |u_s|\,ds < \infty \quad\text{a.s.};$$

(iii) v is a process in $L^2_{loc}$ with values in the space of $N\times d$ matrices, i.e., v is progressively measurable and such that, for every $t \geq 0$,

$$\int_0^t |v_s|^2\,ds < \infty \quad\text{a.s.},$$

where $|v|$ denotes the Hilbert-Schmidt norm of the matrix v, i.e., the Euclidean norm in $\mathbb{R}^{N\times d}$, defined by

$$|v|^2 = \sum_{i=1}^N\sum_{j=1}^d(v^{ij})^2.$$

In differential notation, we write

$$dX_t = u_t\,dt + v_t\,dW_t.$$

Combining (12.1.5) with the fact that $\langle W\rangle_t = tI$, we obtain the following

Proposition 12.2.2 Let X be the Itô process in (12.2.1). The covariation matrix of X is

$$\langle X\rangle_t = \int_0^t v_sv_s^*\,ds, \qquad t \geq 0,$$

or, in differential notation,

$$d\langle X^i, X^j\rangle_t = C_t^{ij}\,dt, \qquad C^{ij} := (vv^*)^{ij} = \sum_{k=1}^d v^{ik}v^{jk}. \tag{12.2.2}$$

Proposition 12.2.3 (Itô Isometry) For every $N\times d$ matrix-valued process $v \in L^2$ and d-dimensional Brownian motion W, we have

$$E\Big[\Big|\int_0^t v_s\,dW_s\Big|^2\Big] = E\Big[\int_0^t |v_s|^2\,ds\Big].$$

Proof We have

$$E\Big[\Big|\int_0^t v_s\,dW_s\Big|^2\Big] = \sum_{i=1}^N E\Big[\Big(\sum_{j=1}^d\int_0^t v_s^{ij}\,dW_s^j\Big)^2\Big] =$$

(by (12.1.4))

$$= \sum_{i=1}^N\sum_{j=1}^d E\Big[\Big(\int_0^t v_s^{ij}\,dW_s^j\Big)^2\Big] =$$

(by the scalar Itô isometry)

$$= \sum_{i=1}^N\sum_{j=1}^d E\Big[\int_0^t(v_s^{ij})^2\,ds\Big]. \qquad\square$$
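In the simplest case of a constant matrix v (chosen arbitrarily for this sketch), the integral reduces to $vW_t$ and the isometry of Proposition 12.2.3 reads $E[|vW_t|^2] = t\,|v|^2$, which is easy to check by Monte Carlo:

```python
import numpy as np

# Check of Proposition 12.2.3 for a constant N x d matrix v: then
# int_0^t v dW_s = v W_t and E[|v W_t|^2] = t * |v|_HS^2.
rng = np.random.default_rng(4)
t, n_paths = 1.5, 200_000
v = np.array([[1.0, -0.5, 2.0],
              [0.3,  1.2, 0.0]])                        # N = 2, d = 3

W_t = rng.standard_normal((n_paths, 3)) * np.sqrt(t)    # W_t ~ N(0, t I)
lhs = np.mean(np.sum((W_t @ v.T)**2, axis=1))           # E[|v W_t|^2]
rhs = t * np.sum(v**2)                                  # t * |v|_HS^2
print(lhs, rhs)
```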



Example 12.2.4 In the simplest case where u, v are constants, we have

$$X_t = X_0 + ut + vW_t,$$

that is, X is a correlated Brownian motion with drift.



12.3 Multidimensional Itô’s Formula

Theorem 12.3.1 (Itô's Formula for Continuous Semimartingales) Let $X = (X^1,\dots,X^d)$ be a continuous d-dimensional semimartingale and $F = F(t,x) \in C^{1,2}(\mathbb{R}_{\geq0}\times\mathbb{R}^d)$. Then almost surely, for every $t \geq 0$, we have

$$F(t,X_t) = F(0,X_0) + \int_0^t(\partial_tF)(s,X_s)\,ds + \sum_{j=1}^d\int_0^t(\partial_{x_j}F)(s,X_s)\,dX_s^j + \frac12\sum_{i,j=1}^d\int_0^t(\partial_{x_ix_j}F)(s,X_s)\,d\langle X^i,X^j\rangle_s$$

or, in differential notation,

$$dF(t,X_t) = \partial_tF(t,X_t)\,dt + \sum_{j=1}^d(\partial_{x_j}F)(t,X_t)\,dX_t^j + \frac12\sum_{i,j=1}^d(\partial_{x_ix_j}F)(t,X_t)\,d\langle X^i,X^j\rangle_t.$$

Below we examine two particularly important cases, in which we use the expressions (12.1.1) and (12.2.2) of the covariations $\langle X^i, X^j\rangle$:
(i) if W is a d-dimensional Brownian motion (cf. Definition 12.1.1) we have

$$d\langle W^i, W^j\rangle_t = \delta_{ij}\,dt \tag{12.3.1}$$

where $\delta_{ij}$ is the Kronecker delta;
(ii) if X is an Itô process of the form

$$dX_t = \mu_t\,dt + \sigma_t\,dW_t \tag{12.3.2}$$

where $\mu$ is an N-dimensional process in $L^1_{loc}$ and $\sigma$ is an $N\times d$ matrix in $L^2_{loc}$, then

$$d\langle X^i, X^j\rangle_t = C_t^{ij}\,dt, \qquad C^{ij} = (\sigma\sigma^*)^{ij}, \tag{12.3.3}$$

that is, recalling the notation $\langle X\rangle$ for the covariation matrix of X (cf. Definition 9.5.3),

$$d\langle X\rangle_t = C_t\,dt.$$

Corollary 12.3.2 (Itô's Formula for Brownian Motion) Let W be a d-dimensional Brownian motion. For every $F = F(t,x) \in C^{1,2}(\mathbb{R}_{\geq0}\times\mathbb{R}^d)$ we have

$$F(t,W_t) = F(0,0) + \int_0^t(\partial_tF)(s,W_s)\,ds + \sum_{j=1}^d\int_0^t(\partial_{x_j}F)(s,W_s)\,dW_s^j + \frac12\int_0^t(\Delta F)(s,W_s)\,ds$$

where $\Delta$ is the Laplace operator in $\mathbb{R}^d$:

$$\Delta = \sum_{j=1}^d\partial_{x_jx_j}.$$

In differential notation, we have

$$dF(t,W_t) = \Big(\partial_tF + \frac12\Delta F\Big)(t,W_t)\,dt + (\nabla_xF)(t,W_t)\,dW_t,$$

where $\nabla_x = (\partial_{x_1},\dots,\partial_{x_d})$ denotes the spatial gradient.
Example 12.3.3 (Quadratic Martingale) Let us compute the stochastic differential of $|W_t|^2$, where W is an N-dimensional Brownian motion. In this case

$$F(x) = |x|^2 = x_1^2 + \cdots + x_N^2, \qquad \partial_{x_i}F(x) = 2x_i, \qquad \partial_{x_ix_j}F(x) = 2\delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta. Therefore, we have

$$d|W_t|^2 = N\,dt + 2W_t\,dW_t = N\,dt + 2\sum_{i=1}^N W_t^i\,dW_t^i.$$

It follows that the process $X_t = |W_t|^2 - Nt$ is a martingale.
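A short simulation (parameters are illustrative assumptions) makes the martingale property of Example 12.3.3 visible: the sample mean of $X_s = |W_s|^2 - Ns$ stays at its initial value 0 along the whole time grid.

```python
import numpy as np

# Numerical sketch of Example 12.3.3: for an N-dimensional Brownian motion,
# E[|W_s|^2] = N s, so the expectation of X_s = |W_s|^2 - N s is 0 at every
# grid time, consistent with the martingale property of X.
rng = np.random.default_rng(5)
N, n_steps, n_paths, t = 3, 50, 50_000, 2.0
dt = t / n_steps

dW = rng.standard_normal((n_paths, N, n_steps)) * np.sqrt(dt)
W = np.cumsum(dW, axis=2)                       # paths on the grid dt, ..., t
sq = np.sum(W**2, axis=1)                       # |W_s|^2 along each path
times = dt * np.arange(1, n_steps + 1)
X_mean = np.mean(sq - N * times, axis=0)        # sample mean of X_s on the grid
print(np.max(np.abs(X_mean)))                   # uniformly close to 0
```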


Corollary 12.3.4 (Itô's Formula for Itô Processes [!]) Let X be an Itô process in $\mathbb{R}^N$ of the form (12.3.2). For every $F = F(t,x) \in C^{1,2}(\mathbb{R}_{\geq0}\times\mathbb{R}^N)$ we have

$$F(t,X_t) = F(0,X_0) + \int_0^t(\partial_tF)(s,X_s)\,ds + \sum_{j=1}^N\int_0^t(\partial_{x_j}F)(s,X_s)\,dX_s^j + \frac12\sum_{i,j=1}^N\int_0^t(\partial_{x_ix_j}F)(s,X_s)\,C_s^{ij}\,ds$$

where $C = \sigma\sigma^*$. In differential notation, we have

$$dF(t,X_t) = \Big(\partial_tF + \frac12\sum_{i,j=1}^N C_t^{ij}\,\partial_{x_ix_j}F + \sum_{j=1}^N\mu_t^j\,\partial_{x_j}F\Big)(t,X_t)\,dt + \sum_{j=1}^N\sum_{k=1}^d\sigma_t^{jk}\,(\partial_{x_j}F)(t,X_t)\,dW_t^k.$$

Example 12.3.5 (Exponential Martingale) Let

$$dY_t = \sigma_t\,dW_t$$

with $\sigma$ of dimension $N\times d$ and W a d-dimensional Brownian motion. Recall that the covariation matrix of Y is $d\langle Y\rangle_t = \sigma_t\sigma_t^*\,dt$. Given $\eta \in \mathbb{R}^N$, let

$$M_t^\eta = \exp\Big(\langle\eta, Y_t\rangle - \frac12\langle\langle Y\rangle_t\,\eta,\eta\rangle\Big) = \exp\Big(\langle\eta, Y_t\rangle - \frac12\int_0^t|\sigma_s^*\eta|^2\,ds\Big).$$

We apply Itô's formula with $F(x) = e^{\langle x,\eta\rangle}$ and

$$dX_t = dY_t - \frac12\sigma_t\sigma_t^*\eta\,dt.$$

We have $M_t^\eta = F(X_t)$ and

$$\partial_{x_i}F(x) = \eta_iF(x), \qquad \partial_{x_ix_j}F(x) = \eta_i\eta_jF(x),$$

so that

$$dM_t^\eta = M_t^\eta\Big(\langle\eta, dX_t\rangle + \frac12\langle\sigma_t\sigma_t^*\eta,\eta\rangle\,dt\Big) = M_t^\eta\langle\eta, dY_t\rangle = M_t^\eta\sum_{i=1}^N\sum_{j=1}^d\eta_i\sigma_t^{ij}\,dW_t^j.$$

In particular, it follows that $M^\eta$ is a positive local martingale (and therefore a supermartingale, by Remark 8.4.6-(vi)).
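With a constant matrix $\sigma$ (a simplifying assumption of this sketch, with arbitrary numerical values), $Y_t = \sigma W_t$ and the martingale property of Example 12.3.5 implies $E[M_t^\eta] = M_0^\eta = 1$, which can be checked directly:

```python
import numpy as np

# Check of Example 12.3.5 with constant sigma: for Y_t = sigma W_t,
# M_t = exp(<eta, Y_t> - t |sigma^* eta|^2 / 2) satisfies E[M_t] = 1.
rng = np.random.default_rng(8)
t, n_paths = 1.0, 500_000
sigma = np.array([[0.5, 0.2],
                  [0.1, 0.8]])
eta = np.array([0.7, -0.4])

W_t = rng.standard_normal((n_paths, 2)) * np.sqrt(t)    # W_t ~ N(0, t I)
Y_t = W_t @ sigma.T
M_t = np.exp(Y_t @ eta - 0.5 * t * np.sum((sigma.T @ eta)**2))
print(M_t.mean())   # close to 1
```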
Proposition 4.4.2 has the following multidimensional generalization: we consider the exponential martingale

$$M_t^\eta := e^{i\langle\eta, W_t\rangle + \frac{|\eta|^2}{2}t}, \qquad t \geq 0,\ \eta \in \mathbb{R}^d, \tag{12.3.4}$$

where i is the imaginary unit and W is a d-dimensional Brownian motion.



Proposition 12.3.6 Let W be a d-dimensional, continuous, and adapted process on


the space (Ω, F , P , Ft ) and such that W0 = 0 a.s. If for every η ∈ Rd the process
M η in (12.3.4) is a martingale, then W is a Brownian motion.
Remark 12.3.7 (Formal Rules for Covariations [!]) Let X be the Itô process in (12.3.2) with components

$$dX_t^i = \mu_t^i\,dt + \sum_{k=1}^d\sigma_t^{ik}\,dW_t^k, \qquad i = 1,\dots,N. \tag{12.3.5}$$

To determine the coefficients of the second derivatives in Itô's formula, we need to calculate the covariation matrix $\langle X\rangle = \big(\langle X^i, X^j\rangle\big)$, which we know to be given by $d\langle X\rangle_t = \sigma_t\sigma_t^*\,dt$ by (12.3.3). From a practical standpoint, the calculation of $\sigma\sigma^*$ can be cumbersome and it is therefore preferable to use the following rule of thumb: we write

$$d\langle X^i, X^j\rangle = dX^i * dX^j$$

and calculate the product "$*$" on the right-hand side as a product of the "polynomials" $dX^i$ in (12.3.5) according to the following calculation rules

$$dt*dt = dt*dW_t^i = dW_t^i*dt = 0, \qquad dW_t^i*dW_t^j = \delta_{ij}\,dt, \tag{12.3.6}$$

where $\delta_{ij}$ is the Kronecker delta.


Example 12.3.8 Suppose $N = d = 2$ in (12.3.5) and let us calculate the stochastic differential of the product $Z_t = X_t^1X_t^2$. We have $Z_t = F(X_t)$ where $F(x_1,x_2) = x_1x_2$ and

$$\partial_{x_1}F(x) = x_2, \quad \partial_{x_2}F(x) = x_1, \quad \partial_{x_1x_1}F(x) = \partial_{x_2x_2}F(x) = 0, \quad \partial_{x_1x_2}F(x) = \partial_{x_2x_1}F(x) = 1.$$

Consequently,

$$d(X_t^1X_t^2) = X_t^1\,dX_t^2 + X_t^2\,dX_t^1 + d\langle X^1, X^2\rangle_t = X_t^1\,dX_t^2 + X_t^2\,dX_t^1 + \big(\sigma_t^{11}\sigma_t^{21} + \sigma_t^{12}\sigma_t^{22}\big)dt.$$

Moreover, regarding the quadratic variation of $X^1$, we have

$$d\langle X^1\rangle_t = \big((\sigma_t^{11})^2 + (\sigma_t^{12})^2\big)dt.$$
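The covariation term of Example 12.3.8 can be observed numerically. In the sketch below, $X = \sigma W$ with a constant matrix $\sigma$ (an illustrative special case with arbitrary entries): the sums of products of increments of $X^1$ and $X^2$ concentrate around $\langle X^1, X^2\rangle_t = (\sigma^{11}\sigma^{21} + \sigma^{12}\sigma^{22})\,t$.

```python
import numpy as np

# Numerical check of the covariation formula in Example 12.3.8 for
# X = sigma W with constant 2x2 sigma.
rng = np.random.default_rng(6)
sigma = np.array([[1.0, 0.4],
                  [-0.3, 2.0]])
t, n_steps, n_paths = 1.0, 1000, 2000
dt = t / n_steps

dW = rng.standard_normal((n_paths, 2, n_steps)) * np.sqrt(dt)
dX = np.einsum('ij,pjn->pin', sigma, dW)             # increments of X = sigma W
cross = np.sum(dX[:, 0, :] * dX[:, 1, :], axis=1)    # sum of dX^1 * dX^2 per path

exact = (sigma[0, 0] * sigma[1, 0] + sigma[0, 1] * sigma[1, 1]) * t
print(cross.mean(), exact)   # close to each other
print(cross.std())           # small: the sums concentrate around <X^1, X^2>_t
```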

Example 12.3.9 Let us calculate the stochastic differential of the process

$$Y_t = e^{tW_t^1}\int_0^t W_s^2\,dW_s^1$$

where $(W^1, W^2)$ is a standard two-dimensional Brownian motion. Proceeding as in Example 11.1.8, we identify the function $F = F(t,x_1,x_2) = e^{tx_1}x_2$ and the Itô process

$$dX_t^1 = dW_t^1, \qquad dX_t^2 = W_t^2\,dW_t^1,$$

in order to apply Itô's formula. We have

$$\partial_tF = x_1F, \quad \partial_{x_1}F = tF, \quad \partial_{x_2}F = e^{tx_1}, \quad \partial_{x_1x_1}F = t^2F, \quad \partial_{x_1x_2}F = te^{tx_1}, \quad \partial_{x_2x_2}F = 0,$$

and, by the formal rules (12.3.6) for the calculation of covariation processes,

$$d\langle X^1\rangle_t = dt, \qquad d\langle X^1, X^2\rangle_t = W_t^2\,dt.$$

Consequently,

$$dY_t = W_t^1Y_t\,dt + tY_t\,dW_t^1 + e^{tW_t^1}W_t^2\,dW_t^1 + \frac12\big(t^2Y_t + 2te^{tW_t^1}W_t^2\big)dt.$$

Finally, we give the multidimensional version of Corollary 11.2.2 on the $L^p$ estimates for the stochastic integral. We omit the proof, which is similar to the scalar case.

Corollary 12.3.10 ([!]) Let $\sigma \in L^2$ be an $N\times d$-dimensional matrix process and W a d-dimensional Brownian motion. For every $p \geq 2$ and $T > 0$ we have

$$E\Big[\sup_{0\leq t\leq T}\Big|\int_0^t\sigma_s\,dW_s\Big|^p\Big] \leq c\,T^{\frac{p-2}{2}}\,E\Big[\int_0^T|\sigma_s|^p\,ds\Big] \tag{12.3.7}$$

where $|\sigma|$ indicates the Hilbert-Schmidt norm³ of $\sigma$ and c is a positive constant that depends only on p, N, and d.

³ That is, the Euclidean norm in $\mathbb{R}^{N\times d}$.



12.4 Lévy's Characterization and Correlated Brownian Motion

We recall expression (12.3.1) of the covariations of a standard Brownian motion W.

Theorem 12.4.1 (Lévy's Characterization of a Brownian Motion) Let X be a d-dimensional process defined on the space $(\Omega,\mathcal{F},P,(\mathcal{F}_t))$ and such that $X_0 = 0$ a.s. Then X is a Brownian motion if and only if X is a continuous local martingale such that

$$\langle X^i, X^j\rangle_t = \delta_{ij}t, \qquad t \geq 0. \tag{12.4.1}$$

Proof We use Proposition 12.3.6 and verify that, for every $\eta \in \mathbb{R}^d$, the exponential process

$$M_t^\eta := e^{i\langle\eta, X_t\rangle + \frac{|\eta|^2}{2}t}$$

is a martingale. By Itô's formula we have

$$dM_t^\eta = M_t^\eta\Big(\frac{|\eta|^2}{2}\,dt + i\langle\eta, dX_t\rangle - \frac12\sum_{i,j=1}^d\eta_i\eta_j\,d\langle X^i, X^j\rangle_t\Big) =$$

(by assumption (12.4.1))

$$= iM_t^\eta\langle\eta, dX_t\rangle$$

and therefore, by Theorem 10.2.24, $M^\eta$ is a continuous local martingale. On the other hand, $M^\eta$ is also a true martingale, being a bounded process; hence the thesis. □


Corollary 12.4.2 Let $\alpha = (\alpha^1,\dots,\alpha^d)$ be a d-dimensional progressively measurable process such that $|\alpha_t| = 1$ for $t \geq 0$ almost surely. For every d-dimensional Brownian motion W, the process

$$B_t := \int_0^t\alpha_s\,dW_s$$

is a real Brownian motion.

Proof By Theorem 10.2.15, B is a continuous martingale and by assumption

$$\langle B\rangle_t = \int_0^t|\alpha_s|^2\,ds = t.$$

The thesis follows from Theorem 12.4.1. □
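Corollary 12.4.2 can be illustrated numerically. In the sketch below, the integrand $\alpha_s = (\cos s, \sin s)$ is an assumption chosen because $|\alpha_s| = 1$; the resulting process should then be distributed like a real Brownian motion, so in particular $B_t \sim \mathcal{N}_{0,t}$:

```python
import numpy as np

# Illustration of Corollary 12.4.2 with alpha_s = (cos s, sin s):
# B_t = int_0^t cos(s) dW^1_s + int_0^t sin(s) dW^2_s should have
# mean 0 and variance t.
rng = np.random.default_rng(7)
t, n_steps, n_paths = 2.0, 400, 20_000
dt = t / n_steps
s = np.linspace(0.0, t, n_steps + 1)[:-1]     # left endpoints of the grid

dW = rng.standard_normal((n_paths, 2, n_steps)) * np.sqrt(dt)
B_t = dW[:, 0, :] @ np.cos(s) + dW[:, 1, :] @ np.sin(s)

print(B_t.mean())        # close to 0
print(B_t.var(), t)      # close to t
```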




Definition 12.4.3 (Correlated Brownian Motion) Let $\alpha$ be a progressively measurable process with values in the space of matrices of dimension $N\times d$, whose rows $\alpha^i$ are such that $|\alpha_t^i| = 1$ for $t \geq 0$ almost surely. Given a standard d-dimensional Brownian motion W, the process

$$B_t := \int_0^t\alpha_s\,dW_s$$

is called correlated Brownian motion.

By Corollary 12.4.2, each component of B is a real Brownian motion and by (12.3.3) we have

$$\langle B^i, B^j\rangle_t = \int_0^t\rho_s^{ij}\,ds$$

where $\rho_t = \alpha_t\alpha_t^*$ is called the correlation matrix of B. Moreover, we have


$$\operatorname{cov}(B_t) = \int_0^t E[\rho_s]\,ds,$$

since

$$\operatorname{cov}(B_t^i, B_t^j) = E\big[B_t^iB_t^j\big] = E\Big[\sum_{k=1}^d\int_0^t\alpha_s^{ik}\,dW_s^k\sum_{h=1}^d\int_0^t\alpha_s^{jh}\,dW_s^h\Big] =$$

(by the Itô isometry, Proposition 12.2.3)

$$= E\Big[\int_0^t\sum_{k=1}^d\alpha_s^{ik}\alpha_s^{jk}\,ds\Big] = \int_0^t E\big[\rho_s^{ij}\big]\,ds.$$

When $\alpha$ is orthogonal, we have $N = d$, $\alpha^* = \alpha^{-1}$ and therefore $\alpha^i\cdot\alpha^j = \delta_{ij}$ for each pair of rows: in this particular case, B is also a standard d-dimensional Brownian motion according to Definition 12.1.1.
Example 12.4.4 (Itô’s Formula for Correlated Brownian Motion [!]) In some
applications, it is natural to use Itô processes defined in terms of a correlated
Brownian motion .dBt = αt dWt as in Definition 12.4.3. For example, in the
Black&Scholes financial model [19], the stochastic dynamics governing the prices
of N risky assets can be described by the following equations

dSti = μit Sti dt + σti Sti dBti ,


. i = 1, . . . , N, (12.4.2)
240 12 Multidimensional Stochastic Calculus

or alternatively by


d
ij j
dSti = μit Sti dt +
. vt Sti dWt , i = 1, . . . , N, (12.4.3)
j =1

where W is a standard d-dimensional Brownian motion. In (12.4.3), the dynamics


of the i-th asset explicitly involves all Brownian motions .W 1 , . . . , W d and the
diffusion coefficients .v ij incorporate the correlations between the different assets.
The dynamics described in Eq. (12.4.2) may offer greater convenience, as the i-
th asset depends only on the real Brownian motion .B i : the coefficient .σ i , usually
called volatility, is an indicator of the “riskiness” of the i-th asset; the dependence
between the different assets is implicit in B through the correlation matrix .ρ = αα ∗ ,
for which .d〈B〉t = ρt dt. In this context, it is often preferred to assign the
dynamics (12.4.2) instead of (12.4.3), to keep separate the volatility structures of
individual securities from that of correlation.
In the case of correlated Brownian motion, the formal calculation rules of
Remark 12.3.7 change to

j ij
dt ∗ dt = dt ∗ dBti = dBti ∗ dt = 0,
. dBti ∗ dBt = ϱt dt. (12.4.4)

For example, let us assume the dynamics (12.4.2) with .N = 2 and let B be two-
dimensional Brownian motion defined as in Example 12.1.3, with correlation matrix
⎛ ⎞

. , ϱ ∈ [−1, 1].
ϱ1

Then we have

.  d(S_t^1/S_t^2) = (1/S_t^2) dS_t^1 − (S_t^1/(S_t^2)^2) dS_t^2 − (1/(S_t^2)^2) d⟨S^1, S^2⟩_t + (S_t^1/(S_t^2)^3) d⟨S^2⟩_t

.               = (S_t^1/S_t^2) ( μ_t^1 − μ_t^2 − ϱ σ_t^1 σ_t^2 + (σ_t^2)^2 ) dt + (S_t^1/S_t^2) ( σ_t^1 dB_t^1 − σ_t^2 dB_t^2 ).
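The construction of a correlated Brownian motion .B = αW with .ρ = αα ∗ can be illustrated numerically. The following sketch is not from the text: it builds a two-dimensional correlated Brownian motion from independent increments via the Cholesky factor of a hypothetical correlation matrix and checks the empirical quadratic covariation .d⟨B^1, B^2⟩_t = ϱ dt.

```python
import numpy as np

# Two-dimensional correlated Brownian motion B = alpha W, with
# rho = alpha alpha^T and constant correlation rho12 (illustrative value).
rng = np.random.default_rng(0)
rho12 = 0.6
rho = np.array([[1.0, rho12], [rho12, 1.0]])
alpha = np.linalg.cholesky(rho)          # lower-triangular, rho = alpha @ alpha.T

T, n = 1.0, 200_000
dt = T / n
dW = rng.standard_normal((n, 2)) * np.sqrt(dt)   # independent increments of W
dB = dW @ alpha.T                                # correlated increments of B

# Empirical covariation <B^1, B^2>_T should be close to rho12 * T
cov_emp = np.sum(dB[:, 0] * dB[:, 1])
print(cov_emp)   # ≈ 0.6
```

The sum of products of increments approximates the covariation .⟨B^1, B^2⟩_T = ϱ T, consistent with the formal rule (12.4.4).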

12.5 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while disregarding the more technical
or secondary matters. As usual, if you have any doubt about what the following
succinct statements mean, please review the corresponding section.

• Sections 12.1, 12.2, and 12.3: these sections contain the multidimensional
extension of the main concepts of stochastic integration. Since several technical
and non-substantial complications arise, the rules of thumb of Remark 12.3.7
come in handy when applying Itô’s formula.
• Section 12.4: a classical result by Lévy provides a characterization of a Brownian
motion in terms of the martingale property and the expression of the covariation
matrix. In certain applications, such as in finance (see Example 12.4.4), it’s
common to employ correlated Brownian motion and the associated Itô’s formula.
Symbols introduced in this chapter:

Symbol                                                              Description                            Page
. ∫_0^t u_s dB_s := ( Σ_{j=1}^d ∫_0^t u_s^{ij} dB_s^j )_{i=1,…,N}   Multidimensional stochastic integral   229
. Δ = Σ_{j=1}^d ∂_{x_j x_j}                                         Laplace operator in R^d                234
Chapter 13
Changes of Measure and Martingale
Representation

It has been suggested that an army of monkeys might be trained


to pound typewriters at random in the hope that ultimately great
works of literature would be produced. Using a coin for the
same purpose may save feeding and training expenses and free
the monkeys for other monkey business.
William Feller

In this chapter, we present two classic results:


• Girsanov Theorem 13.3.3, which states that the process obtained by adding a
drift to a Brownian motion is still a Brownian motion under a new probability
measure;
• the martingale representation Theorem 13.5.1, according to which every local
martingale with respect to the Brownian filtration admits a representation in terms
of a stochastic integral and consequently has a continuous version.
These results can be combined to examine the effect of a change of probability
measure on the expression of the drift of an Itô process. In the treatment of these
problems, a central role is played by exponential martingales.

13.1 Change of Measure and Itô Processes

Consider a d-dimensional Brownian motion W on a filtered space .(Ω, F, P , Ft ) and


a d-dimensional process .λ ∈ L²_loc. Applying Itô’s formula to the exponential process

.  M_t^λ := exp( − ∫_0^t λ_s dW_s − ½ ∫_0^t |λ_s|² ds ) ,     t ∈ [0, T ],          (13.1.1)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 243
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_13

we obtain

.dMtλ = −Mtλ λt dWt . (13.1.2)

Thus, .M λ is a local martingale, sometimes called exponential martingale. Being


positive, .M λ is a super-martingale (cf. Remark 8.4.6-(vi)) and in particular

.  E[ M_t^λ ] ≤ M_0^λ = 1 ,     t ∈ [0, T ].

Furthermore, .M λ is a true martingale on .[0, T ] if and only if .E[ M_T^λ ] = 1.
Exponential martingales have an interesting connection with changes of proba-
bility measure. Recall that two probabilities .P , Q on a measurable space .(Ω, F )
are said to be equivalent if they have the same certain and negligible events: in
this case we write .Q ∼ P . By the Radon-Nikodym Theorem B.1.3 in [113], for
each probability Q, equivalent to P , there is a random variable Z that is a.s. strictly
positive and such that
.  Q(A) = ∫_A Z dP ,     A ∈ F ;

in particular, we have .E^P[Z] = 1. Z is called the Radon-Nikodym derivative of Q
with respect to P and is denoted by the symbol .Z = dQ/dP . Note that it is equivalent
to assign .Q ∼ P or a strictly positive r.v. Z such that .E^P[Z] = 1.
The following theorem states that there is a one-to-one correspondence between
the measures Q, equivalent to P , and the processes .λ ∈ L2loc such that .M λ is a
martingale. Moreover, a change of probability measure corresponds to a change of
drift of the Brownian motion (and the related Itô processes).
Theorem 13.1.1 (Changes of Measure and Drift [!!]) Let .W = (Wt )t∈[0,T ] be
a d-dimensional Brownian motion on the space .(Ω, F , P ) equipped with the
standard Brownian filtration1 .F W . We have:
(i) if Q is a probability measure equivalent to P then there exists .λ ∈ L2loc such
that
.  dQ/dP = M_T^λ                                                            (13.1.3)

where .M λ is the exponential martingale in (13.1.1);


(ii) conversely, if .λ ∈ L2loc is such that .M λ is a true martingale then (13.1.3) defines
a probability measure .Q ∼ P .

1 The filtration obtained by completing the filtration generated by W so that it satisfies the usual

conditions.

Moreover, if .Q ∼ P :
(a) almost surely we have
.  M_t^λ = E^P [ dQ/dP | F_t^W ] ,     t ∈ [0, T ];                         (13.1.4)

(b) the process


.  W_t^λ := W_t + ∫_0^t λ_s ds                                              (13.1.5)

is a Brownian motion on .(Ω, F , Q, FtW );


(c) if X is an Itô process of the form

dXt = bt dt + σt dWt
. (13.1.6)

with .b ∈ L1loc and .σ ∈ L2loc , then

.  dX_t = (b_t − σ_t λ_t) dt + σ_t dW_t^λ .                                 (13.1.7)

We will prove Theorem 13.1.1 in Sect. 13.5.1, as a corollary of the two main
results of this chapter, Girsanov theorem and the Brownian martingale representa-
tion theorem.
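The correspondence between changes of drift and changes of measure can be checked by simulation in the simplest case of a constant .λ. The following sketch is not from the text; the values of .λ and T are arbitrary. It reweights samples of .W_T by .M_T^λ and verifies that .W_T^λ = W_T + λT has the N(0, T) moments under Q, using .E^Q[f] = E^P[M_T^λ f].

```python
import numpy as np

# Monte Carlo check of the change of drift/measure for constant lambda:
# under Q with dQ/dP = M_T = exp(-lam*W_T - lam^2*T/2), the shifted
# variable W_T + lam*T is again N(0, T).  (lam, T are illustrative choices.)
rng = np.random.default_rng(1)
lam, T, n = 0.5, 1.0, 1_000_000

W_T = rng.standard_normal(n) * np.sqrt(T)        # W_T under P
M_T = np.exp(-lam * W_T - 0.5 * lam**2 * T)      # Radon-Nikodym weights
W_lam = W_T + lam * T                            # candidate Q-Brownian motion at T

mean_Q = np.mean(M_T * W_lam)    # estimates E^Q[W_T^lam],     target 0
var_Q = np.mean(M_T * W_lam**2)  # estimates E^Q[(W_T^lam)^2], target T
print(mean_Q, var_Q)
```

Note that the weights average to 1, reflecting .E^P[M_T^λ] = 1, i.e. that .M^λ is a true martingale for bounded .λ.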

13.1.1 An Application: Risk-Neutral Valuation of Financial


Derivatives

In some applications, we are interested in replacing the drift .bt of an Itô process of
the form (13.1.6) with a suitable drift .r_t ∈ L¹_loc. Theorem 13.1.1 states that this is
possible by changing the probability measure, provided that there exists a process
.λ ∈ L²_loc such that .r_t = b_t − σ_t λ_t and .M^λ in (13.1.1) is a martingale. In this section,
we present a specific application in the field of mathematical finance.
In the one-dimensional Black&Scholes model [19] of Example 12.4.4, the price
S of a risky asset has the following stochastic dynamics

.  dS_t = μ S_t dt + σ S_t dW_t ,                                           (13.1.8)

where W is a real Brownian motion on .(Ω, F , P , Ft ) and .μ, σ are two real
parameters called expected return rate and volatility, respectively. We assume .σ > 0
in order not to cancel the random effect of the Brownian motion that describes the

riskiness2 of the asset. Moreover, it is reasonable to assume .μ > r where r denotes


the risk-free interest rate:3 this is economically motivated by the fact that investors,
to take on the risk of investing in the asset S, expect a return rate .μ > r, more
remunerative than the bank account. In financial jargon, P is called the “real-world
measure” because the dynamics (13.1.8) under the measure P intends to describe
the real evolution of the risky asset: precisely, the parameters .μ, σ of the model
are those that could be estimated by means of econometric methods applied to real
data, such as a historical series of stock prices. This statistical estimation is typically
conducted with the intention of predicting the future price trend based on past data.
In mathematical finance, starting from model (13.1.8), another probability
measure Q is introduced as in Theorem 13.1.1 with .λ equal to the constant process

μ−r
λ=
. ∈ R+ . (13.1.9)
σ
The choice of .λ is such that the dynamics of S becomes

.  dS_t = r S_t dt + σ S_t dW_t^λ ,

thus formally analogous4 to (13.1.8) but with the expected return rate equal to the
risk-free rate. The measure Q does not intend to describe the real dynamics of the
stock: Q is called “risk-neutral measure” or also “martingale measure” because the
process .S̃_t := e^{−rt} S_t of the discounted asset price5 is a Q-martingale6 and, in
particular, we have

.  S_0 = e^{−rT} E^Q [ S_T ] .                                              (13.1.10)

Formula (13.1.10) is a risk-neutral valuation formula, according to which the


current price .S0 is fair in the sense that it is equal to the expected value of the
discounted future price.
The measure Q is employed to assess specific financial instruments known as
derivatives, whose value is determined at a future time T based on .ST : precisely, a

2 If .σ = 0, (13.1.8) reduces to an ordinary differential equation

.dSt = μSt dt

with deterministic solution .St = S0 eμt : the latter is called a compound capitalization formula with
interest rate .μ.
3 The interest rate paid by the bank account which is assumed to be the risk-free investment of

reference.
4 .W_t^λ = W_t + λ t is a real Brownian motion under the measure Q.
5 The discount factor .e−rt eliminates the “time value” of prices.
6 As opposed to the real measure P under which, being .μ > r, the discounted price is a

sub-martingale: this describes the expectation of a higher return compared to a bank account,
considering the riskiness of the asset.

“payoff function” .ϕ is given and the random variable .ϕ(ST ) represents the value of
the derivative at time T . For consistency with formula (13.1.10), the (discounted)
expected value in the risk-neutral measure

e−rT E Q [ϕ(ST )]
. (13.1.11)

is called “risk-neutral price”, at the initial time, of the derivative with payoff .ϕ. The
expected value in (13.1.11) can be calculated explicitly using the fact that .ST has
log-normal distribution, returning the famous Black&Scholes formula.
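Formula (13.1.11) can be checked numerically: under Q, .S_T = S_0 exp((r − σ²/2)T + σ W_T^λ) is log-normal, so the risk-neutral price of the call payoff .ϕ(S_T) = (S_T − K)^+ can be computed both by Monte Carlo and by the closed Black&Scholes formula. The parameter values in the sketch below are arbitrary illustrations.

```python
import math
import numpy as np

# Risk-neutral valuation of a call option (illustrative parameters).
rng = np.random.default_rng(2)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n = 1_000_000

# Monte Carlo under Q: S_T is log-normal
Z = rng.standard_normal(n)
S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * Z)
mc_price = math.exp(-r * T) * np.mean(np.maximum(S_T - K, 0.0))

# Closed Black&Scholes formula for comparison (Phi = standard normal CDF)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
bs_price = S0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)
print(mc_price, bs_price)   # the two values should be close
```

The Monte Carlo estimate converges to the closed formula at rate .O(n^{−1/2}); the drift .μ plays no role, since pricing takes place entirely under Q.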
The parameter .λ in (13.1.9) is called “market price of risk” because it is defined
as the ratio between the return differential .μ − r required to assume the risk of
investing in S and the volatility .σ that measures the riskiness of S.
Unlike P , the measure Q does not have a “statistical” purpose and does not
reflect the actual probabilities of events; rather, it is an artificial measure under
which all market prices (of the bank account, of the stock S and of the derivative
.ϕ(ST )) are deemed fair: the purposes of Q are mainly the valuation of derivatives

and the study of some fundamental properties of financial models, such as absence
of arbitrage and completeness. For a full treatment of these topics, we refer, for
example, to [111, 112] and [115].

13.2 Integrability of Exponential Martingales

In this section, we give some conditions on the process .λ that guarantee that the
exponential martingale (13.1.1) is a true martingale.
Proposition 13.2.1 Assume that

.  ∫_0^T |λ_t|² dt ≤ κ     a.s.                                             (13.2.1)

for a certain constant .κ. Then the exponential martingale .M λ in (13.1.1) is a true
martingale and
.  E[ sup_{0≤t≤T} (M_t^λ)^p ] < ∞ ,     p ≥ 1.

We prove Proposition 13.2.1 at the end of the section.


Notation 13.2.2 For every process X, we set

.  X̄_T := sup_{0≤t≤T} |X_t| .

Consider the integral process

.  Y_t := ∫_0^t λ_s dW_s ,     t ∈ [0, T ],                                 (13.2.2)

where the Brownian motion W and .λ ∈ L2loc are both d-dimensional processes.7
Under condition (13.2.1), the Burkholder-Davis-Gundy inequality provides the
following summability estimate for Y : for every .p > 0, we have
.  E[ Ȳ_T^p ] ≤ c E[ ⟨Y⟩_T^{p/2} ] ≤ c κ^{p/2} .

In fact, a stronger, exponential-type integrability estimate holds; to prove it we need


the following
Lemma 13.2.3 For every continuous non-negative super-martingale .Z =
(Zt )t∈[0,T ] , we have
.  P( sup_{0≤t≤T} Z_t ≥ ε ) ≤ E[Z_0] / ε ,     ε > 0.

Proof Fix .ε > 0, and let

.  τ := inf{ t ≥ 0 | Z_t ≥ ε } ∧ T .

Then .τ is a bounded stopping time and by the optional sampling Theorem 8.1.6, we
have

.  E[ Z_0 ] ≥ E[ Z_τ ] ≥ E[ Z_τ 1_{(Z̄_T ≥ ε)} ] ≥ ε P( Z̄_T ≥ ε ).        ⨆



Proposition 13.2.4 (Exponential Integrability) Let Y be the stochastic integral
in (13.2.2) with .λ ∈ L2 satisfying the condition (13.2.1). Then we have

.  P( Ȳ_T ≥ ϵ ) ≤ 2 e^{−ϵ²/(2κ)} ,     ϵ > 0,                              (13.2.3)

7 Then, more explicitly,

.  Y_t = Σ_{j=1}^d ∫_0^t λ_s^j dW_s^j .

We note that .M_t^λ = exp( −Y_t − ½ ⟨Y⟩_t ).

and consequently there exists .α = α(κ) > 0 such that

.  E[ e^{α Ȳ_T²} ] < ∞ .                                                   (13.2.4)

Proof For every .α > 0, the process

.  Z_t^α = e^{α Y_t − (α²/2) ⟨Y⟩_t} ,

is a continuous, positive supermartingale. Furthermore, under the condition (13.2.1),
for every .ϵ > 0 and .t ∈ [0, T ], we have

.  (Y_t ≥ ϵ) = ( e^{α Y_t} ≥ e^{α ϵ} ) ⊆ ( Z_t^α ≥ e^{α ϵ − α² κ/2} ).

Hence

.  P( sup_{0≤t≤T} Y_t ≥ ϵ ) ≤ P( sup_{0≤t≤T} Z_t^α ≥ e^{α ϵ − α² κ/2} ) ≤ e^{−α ϵ + α² κ/2}

by Lemma 13.2.3, since .E[ Z_0^α ] = 1. Choosing .α = ϵ/κ in order to minimize the
last term, we get

.  P( sup_{0≤t≤T} Y_t ≥ ϵ ) ≤ e^{−ϵ²/(2κ)} .

An analogous estimate holds for .−Y and this proves (13.2.3). Finally, (13.2.4) is an
immediate consequence of (13.2.3), Proposition 3.1.6 in [113] and Example 3.1.7
in [113]. ⨆
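The Gaussian-type tail bound (13.2.3) can be tested empirically in the simplest case .λ ≡ 1, .d = 1, where .Y = W and .κ = T. The following sketch (grid sizes are arbitrary choices, not from the text) compares the empirical tail of the discretized running maximum .Ȳ_T = sup|W_t| with the bound .2e^{−ϵ²/(2κ)}.

```python
import numpy as np

# Empirical check of P( sup_{t<=T} |W_t| >= eps ) <= 2*exp(-eps^2/(2T)),
# i.e. (13.2.3) with lambda = 1, so that Y = W and kappa = T.
rng = np.random.default_rng(3)
T, n_steps, n_paths = 1.0, 1000, 20_000
dt = T / n_steps

dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
W = np.cumsum(dW, axis=1)                # Brownian paths on the grid
Y_bar = np.max(np.abs(W), axis=1)        # discretized running maximum

for eps in (1.5, 2.0, 2.5):
    tail = np.mean(Y_bar >= eps)
    bound = 2.0 * np.exp(-eps**2 / (2.0 * T))
    print(eps, tail, bound)              # tail stays below the bound
```

The discretized maximum slightly underestimates the true supremum, so the empirical tail sits comfortably below the theoretical bound.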

Remark 13.2.5 Proposition 13.2.4 extends to the case where .λ is an .N × d-
dimensional process: in this case we have

.  P( Ȳ_T ≥ ϵ ) ≤ 2N e^{−ϵ²/(2κN)} ,     ϵ > 0,                            (13.2.5)

and there exists .α = α(κ, N) > 0 such that

.  E[ e^{α Ȳ_T²} ] < ∞ .

Indeed, it is enough to note that

.  ( Ȳ_T ≥ ϵ ) ⊆ ( Ȳ_T^j ≥ ϵ/√N )

for at least one component .Y^j , with .j ∈ {1, . . . , N}, of Y . Therefore, we have

.  P( Ȳ_T ≥ ϵ ) ≤ Σ_{j=1}^N P( Ȳ_T^j ≥ ϵ/√N )

and the thesis follows.


Proof of Proposition 13.2.1 For every .ε > 0, by (13.2.3) we have

.  P( sup_{0≤t≤T} M_t^λ ≥ ε ) ≤ P( sup_{0≤t≤T} e^{|Y_t|} ≥ ε ) = P( Ȳ_T ≥ log ε ) ≤ 2 e^{−(log ε)²/(2κ)} ,

and consequently, by Proposition 3.1.6 in [113], we have

.  E[ sup_{0≤t≤T} (M_t^λ)^p ] = p ∫_0^∞ ε^{p−1} P( sup_{0≤t≤T} M_t^λ ≥ ε ) dε < ∞.          (13.2.6)

In particular for .p = 2 we have

.  E[ ∫_0^T λ_t² (M_t^λ)² dt ] ≤ E[ sup_{0≤t≤T} (M_t^λ)² ∫_0^T λ_t² dt ] ≤

(by assumption (13.2.1))

.  ≤ κ E[ sup_{0≤t≤T} (M_t^λ)² ] < ∞

by (13.2.6). Therefore .λM^λ ∈ L² and from (13.1.2) it follows that .M^λ is a
martingale. ⨆

A more general condition that guarantees the martingale property for the
exponential process .M λ is given by the following classical result by Novikov [100].
Theorem 13.2.6 (Novikov’s Condition) If .λ ∈ L2loc is such that
.  E[ exp( ½ ∫_0^T |λ_s|² ds ) ] < ∞                                        (13.2.7)

then the process .M λ in (13.1.1) is a martingale.



Remark 13.2.7 Condition (13.2.7) is sharp in the sense that, for every .0 < α < 1/2,
there exists a process .λ ∈ L²_loc that satisfies

.  E[ exp( α ∫_0^T |λ_s|² ds ) ] < ∞

and is such that .M λ in (13.1.1) is not a martingale: for details see Chapter 6 in [90].

13.3 Girsanov Theorem

Let W be a d-dimensional Brownian motion on the space .(Ω, F , P , Ft ). In


Sect. 13.2 we have provided sufficient conditions on .λ ∈ L2loc for the exponential
process
.  M_t^λ := exp( − ∫_0^t λ_s dW_s − ½ ∫_0^t |λ_s|² ds ) ,     t ∈ [0, T ],          (13.3.1)

to be a true martingale and thus in particular .E[ M_T^λ ] = 1: in this case

.  Q(A) := ∫_A M_T^λ dP ,     A ∈ F ,

is a probability measure on .(Ω, F ) with Radon-Nikodym derivative

.  dQ/dP = M_T^λ .                                                          (13.3.2)
The proof of the following lemma is based on the Bayes’ formula of Theorem 4.2.14
in [113]: for every .X ∈ L1 (Ω, Q) we have
.  E^Q [ X | F_t ] = E^P [ X M_T^λ | F_t ] / E^P [ M_T^λ | F_t ] ,     t ∈ [0, T ].        (13.3.3)

Lemma 13.3.1 Suppose that .M λ in (13.3.1) is a P -martingale and let Q be the


probability measure in (13.3.2). A process .X = (Xt )t∈[0,T ] is a Q-martingale if and
only if .(Xt Mtλ )t∈[0,T ] is a P -martingale.
Proof Since .M λ is adapted and strictly positive, it is clear that X is adapted if and
only if .XM λ is. Moreover, we have

.  E^Q [ |X_t| ] = E^P [ |X_t| M_T^λ ] = E^P [ E^P [ |X_t| M_T^λ | F_t ] ] =

(since X is adapted and .M λ is a P -martingale)

.  = E^P [ |X_t| E^P [ M_T^λ | F_t ] ] = E^P [ |X_t| M_t^λ ] ,

and thus .X_t ∈ L¹(Ω, Q) if and only if .X_t M_t^λ ∈ L¹(Ω, P ). Similarly, for .s ≤ t we
have

.  E^P [ X_t M_T^λ | F_s ] = E^P [ E^P [ X_t M_T^λ | F_t ] | F_s ] = E^P [ X_t M_t^λ | F_s ] .

Then from (13.3.3) with .X = X_t we have

.  E^Q [ X_t | F_s ] = E^P [ X_t M_T^λ | F_s ] / E^P [ M_T^λ | F_s ] = E^P [ X_t M_t^λ | F_s ] / M_s^λ ,

which proves the thesis. ⨆



Remark 13.3.2 Under the assumptions of Lemma 13.3.1, the process

.  (M_t^λ)^{−1} = exp( ∫_0^t λ_s dW_s + ½ ∫_0^t |λ_s|² ds )

is a Q-martingale since .M^λ (M^λ)^{−1} is obviously a P -martingale. Moreover, for
every absolutely integrable random variable X, we have

.  E^P [X] = E^P [ X (M_T^λ)^{−1} M_T^λ ] = E^Q [ X (M_T^λ)^{−1} ]

and therefore

.  dP/dQ = (M_T^λ)^{−1} .

In particular, .P , Q are equivalent measures, in the sense that they have the same
certain and negligible events, since they have mutually strictly positive densities.
A Brownian motion is a martingale and therefore is a “driftless process”:
Girsanov theorem states that if a drift is added to a Brownian motion, this process
is still a Brownian motion with respect to a new probability measure. To understand
this result, which at first glance seems a bit strange, it is helpful to keep in mind
the elementary Example 1.4.8 at the end of which we observed that the martingale
property is not a property of the paths of the process, but rather depends on the
probability measure under consideration.

Theorem 13.3.3 (Girsanov [!!]) If W is a Brownian motion and .M λ in (13.3.1) is


a martingale on the space .(Ω, F , P , Ft ), then the process
.  W_t^λ := W_t + ∫_0^t λ_s ds ,     t ∈ [0, T ],

is a Brownian motion on .(Ω, F , Q, F_t) with .dQ/dP = M_T^λ.

Proof By Proposition 12.3.6 on the characterization of a Brownian motion, it is
sufficient to show that, for every .η ∈ R^d, the process

.  X_t^η := e^{iη·W_t^λ + (|η|²/2) t} ,     t ∈ [0, T ],

is a Q-martingale (i.e., a martingale under the measure Q): equivalently, by
Lemma 13.3.1, we prove that the process

.  X_t^η M_t^λ = exp( iη·W_t + i ∫_0^t η·λ_s ds + (|η|²/2) t − ∫_0^t λ_s dW_s − ½ ∫_0^t |λ_s|² ds )

.            = exp( − ∫_0^t (λ_s − iη) dW_s − ½ ∫_0^t Σ_{j=1}^d (λ_s^j − iη_j)² ds )

is a P -martingale. Under the boundedness condition (13.2.1), the thesis follows
from Proposition 13.2.1, which also holds for complex-valued processes and in
particular for .λ − iη.
For the general case we use a localization argument: we consider the sequence
of stopping times

.  τ_n = inf{ t ≥ 0 | ∫_0^t |λ_s|² ds ≥ n } ∧ T ,     n ∈ N.

By Proposition 13.2.1, the process .(X_{t∧τ_n}^η M_{t∧τ_n}^λ) is a P -martingale and

.  E^P [ X_{t∧τ_n}^η M_{t∧τ_n}^λ | F_s ] = X_{s∧τ_n}^η M_{s∧τ_n}^λ ,     s ≤ t,  n ∈ N.

Hence, to prove that .X^η M^λ is a martingale, it is sufficient to show that
.(X_{t∧τ_n}^η M_{t∧τ_n}^λ) converges to .X_t^η M_t^λ in .L¹-norm as n tends to infinity. Since

.  lim_{n→∞} X_{t∧τ_n}^η = X_t^η     a.s.

and .|X_{t∧τ_n}^η| ≤ e^{|η|² T/2}, it is enough to prove that

.  lim_{n→∞} M_{t∧τ_n}^λ = M_t^λ     in L¹(Ω, P ).

Let

.  M_{n,t} := min{ M_{t∧τ_n}^λ , M_t^λ } ;

we have .0 ≤ M_{n,t} ≤ M_t^λ and by the dominated convergence theorem

.  lim_{n→∞} E[ M_{n,t} ] = E[ M_t^λ ] .

On the other hand

.  E[ |M_t^λ − M_{t∧τ_n}^λ| ] = E[ M_t^λ − M_{n,t} ] + E[ M_{t∧τ_n}^λ − M_{n,t} ] =

(since .E[ M_t^λ ] = E[ M_{t∧τ_n}^λ ] = 1)

.  = 2 E[ M_t^λ − M_{n,t} ] −→ 0     as n → ∞,

which proves the thesis. ⨆


13.4 Approximation by Exponential Martingales

Another reason for interest in exponential martingales is the fact that they are
a useful approximation tool. Hereafter, W is a Brownian motion on the space
.(Ω, F , P ) equipped with the standard Brownian filtration .F
W : the choice of

this particular filtration is crucial for the validity of the following results. The
next theorem is the main ingredient in the proof of the Brownian martingale
representation theorem that we will present in Sect. 13.5.

The proofs of this section are a bit technical and can be skipped at a first
reading.

Theorem 13.4.1 The space of linear combinations of random variables of the form
.  M_T^λ = exp( − ∫_0^T λ(t) dW_t − ½ ∫_0^T λ(t)² dt ) ,

with .λ deterministic function in .L∞ ([0, T ]), is dense in .L2 (Ω, FTW ).

The proof of Theorem 13.4.1 is based on the following


Lemma 13.4.2 Let .(tn )n∈N be a sequence dense in .[0, T ]. The family of random
variables of the form

ϕ(Wt1 , . . . , Wtn ),
. ϕ ∈ C0∞ (Rn ), n ∈ N,

is dense in .L2 (Ω, FTW ).


Proof The discrete filtration defined by

Gn := σ (Wt1 , . . . , Wtn ),
. n ∈ N,

is such that .σ (Gn , n ∈ N) = GTW where .G W denotes the filtration generated by


Brownian motion. Given .X ∈ L2 (Ω, FTW ), later we will prove that
.  lim_{n→∞} E[ |X − X_n|² ] = 0 ,     X_n := E[ X | G_n ] ,  n ∈ N.        (13.4.1)

Since .Xn ∈ mGn , by Doob’s Theorem 2.3.3 in [113] we have

Xn = ϕn (Wt1 , . . . , Wtn )
.

for some measurable function .ϕ_n that is square-integrable with respect to the law
.μ_{W_{t₁},…,W_{t_n}} : by density, .ϕ_n can be approximated in .L² by a sequence .(ϕ_{n,k})_{k∈N}
in .C_0^∞(R^n) and we also have

.  lim_{k→∞} ϕ_{n,k}(W_{t₁}, . . . , W_{t_n}) = X_n     in L²(Ω, P ),

which proves the thesis.


It remains to prove (13.4.1). By Doob’s maximal inequality (8.1.3) we have
.  E[ sup_{n∈N} X_n² ] ≤ 4 E[ X² ] < ∞.                                     (13.4.2)

Then, by Theorem 8.2.2 on the convergence of discrete martingales, there exists the
a.s. pointwise limit

.  M := lim_{n→∞} X_n .

Moreover, since

.  (X_n − M)² ≤ 2(X_n² + M²) ≤ 4 sup_{n∈N} X_n² ,

by (13.4.2) and the dominated convergence theorem, we also have

.  lim_{n→∞} X_n = M     in L²(Ω, P ).

Setting .M_n = E[ M | G_n ], we have

.  E[ (X_n − M_n)² ] = E[ (X_n − E[M | G_n])² ] = E[ (E[ X_n − M | G_n ])² ] ≤

(by Jensen’s inequality)

.  ≤ E[ (X_n − M)² ] −→ 0     as n → ∞.
To conclude, let us prove that .M = E[ X | F_T^W ] = X, so that .M = X almost surely.
First, .M ∈ mG_T^W ⊆ mF_T^W ; then, fixed .n̄ ∈ N, for .Z ∈ bG_n̄ and .n ≥ n̄ we have

.  E[ Z(M − X) ] = E[ Z E[ M − X | G_n ] ] = E[ Z(M_n − X_n) ] −→ 0     as n → ∞,        (13.4.3)

by the .L²-convergence of .X_n − M_n proven above. Since the elements of .F_T^W and .G_T^W
differ only for negligible events, it follows that .M = E[ X | F_T^W ]. ⨆

Proof of Theorem 13.4.1 It is sufficient to prove that if .X ∈ L2 (Ω, FTW ) and, for
every .λ ∈ L∞ ([0, T ]),
.  ⟨X, M_T^λ⟩_{L²(Ω)} = E[ X M_T^λ ] = 0                                    (13.4.4)

then .X = 0 almost surely.


From (13.4.4), choosing a piecewise constant .λ, we have
⎾ ⏋
F (η) := E Xeη1 Wt1 +···+ηn Wtn = 0,
. η ∈ Rn , t1 , . . . , tn ∈ [0, T ],

and the analytic extension of F to .Cn , by the theorem of analytic continuation, is


identically zero. Then, for every .ϕ ∈ C0∞ (Rn ), by Theorem 2.5.6 in [113] on Fourier
inversion, we have

.  E[ X ϕ(W_{t₁}, . . . , W_{t_n}) ] = E[ X (2π)^{−n} ∫_{R^n} e^{−i(η₁ W_{t₁} + ··· + η_n W_{t_n})} ϕ̂(η) dη ]

.  = (2π)^{−n} ∫_{R^n} ϕ̂(η) E[ e^{−i(η₁ W_{t₁} + ··· + η_n W_{t_n})} X ] dη = 0,

and the thesis follows from Lemma 13.4.2. ⨆




13.5 Representation of Brownian Martingales

The Brownian stochastic integral constructed in Chap. 10 is a continuous local


martingale. The following result shows that, conversely, every local martingale
with respect to the standard Brownian filtration .F W admits a representation as a
stochastic integral.
Theorem 13.5.1 (Representation of Brownian Martingales [!!!]) Let W be a
Brownian motion on the space .(Ω, F , P ) equipped with the standard Brownian
filtration .F W . If .X = (Xt )t∈[0,T ] is a càdlàg version of a local martingale on
.(Ω, F , P , F^W) then there exists a unique .u ∈ L²_loc such that

.  X_t = X_0 + ∫_0^t u_s dW_s ,     t ∈ [0, T ].                            (13.5.1)

In particular, X is an a.s. continuous process.


Remark 13.5.2 Theorem 13.5.1 strengthens the result proven in Sect. 8.2 as it
states that every Brownian local martingale admits a continuous modification, not
just a càdlàg one.
Before presenting the proof of Theorem 13.5.1, we preface it with the following
proposition, grounded on the approximation results elaborated in Sect. 13.4.
Proposition 13.5.3 ([!]) For every random variable .X ∈ L2 (Ω, FTW ) there exists
a unique .u ∈ L2 such that
.  X = E[X] + ∫_0^T u_t dW_t .                                              (13.5.2)

Proof We restrict our attention to the one-dimensional case for simplicity. As for
uniqueness, if .u, v ∈ L2 satisfy (13.5.2), then
.  ∫_0^T (u_t − v_t) dW_t = 0

and from Itô’s isometry it follows that .P (u = v a.e. on [0, T ]) = 1 (cf.


Remark 10.2.18).
As for the existence part, the proof is straightforward if X is of the form
.  X = M_T^λ := exp( − ∫_0^T λ(t) dW_t − ½ ∫_0^T λ(t)² dt )                 (13.5.3)

with .λ ∈ L^∞([0, T ]) deterministic function. Indeed, by Itô’s formula we have

.  X = 1 − ∫_0^T λ(t) M_t^λ dW_t

with .λM^λ ∈ L² by Proposition 13.2.1 and therefore, in particular, .E[X] =
E[ M_T^λ ] = 1 by the martingale property.

In general, according to Theorem 13.4.1 every .X ∈ L2 (Ω, FTW ) is approximated


in .L2 by a sequence .(Xn )n∈N of linear combinations of variables of the form
(13.5.3) for which
ˆ T
Xn = E [Xn ] +
. un,t dWt (13.5.4)
0

with .un ∈ L2 . By Itô’s isometry we have


⎾ ⏋ ⎾ˆ T ⏋
.E (Xn − Xm ) = (E [Xn − Xm ]) + E (un,t − um,t ) dt ,
2 2 2
0

and thus .(un )n∈N is a Cauchy sequence in .L2 . The thesis follows by taking the limit
in (13.5.4). ⨆
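A concrete instance of Proposition 13.5.3: for .X = W_T², Itô’s formula gives .W_T² = T + ∫_0^T 2W_t dW_t, so .E[X] = T and .u_t = 2W_t. The following sketch (step size is an arbitrary choice, not from the text) verifies this representation pathwise with a left-point (Itô) Riemann sum.

```python
import numpy as np

# Pathwise check of the representation W_T^2 = T + int_0^T 2 W_t dW_t.
rng = np.random.default_rng(4)
T, n = 1.0, 100_000
dt = T / n

dW = rng.standard_normal(n) * np.sqrt(dt)
W = np.concatenate(([0.0], np.cumsum(dW)))   # W at grid points t_0, ..., t_n

ito_sum = np.sum(2.0 * W[:-1] * dW)          # left-point (Ito) sum of 2 W dW
lhs = W[-1]**2 - T                           # X - E[X]
print(lhs, ito_sum)                          # agree up to O(sqrt(dt))
```

Using the left endpoint in the Riemann sum is essential: a midpoint or right-point rule would converge to a different (Stratonovich-type) integral.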

Proof of Theorem 13.5.1 The uniqueness of u follows from the uniqueness of the
representation of an Itô process (cf. Remark 10.4.3).
As for the existence, let us first consider the case where X is a martingale such
that .X_T ∈ L²(Ω, P ). By Proposition 13.5.3, there exists .u ∈ L² such that

.  X_T = E[X_T] + ∫_0^T u_t dW_t ,

from which (13.5.1) follows, simply by applying the conditional expectation to


.FtW for every .t ∈ [0, T ]. In particular, we have demonstrated that X possesses
a continuous modification.
Now we remove the assumption .X_T ∈ L²(Ω, P ) and prove that every .F^W -
martingale X admits a continuous modification. Since .X_T ∈ L¹(Ω, P ) and
.L²(Ω, P ) is dense in .L¹(Ω, P ), there exists a sequence .(Y_n)_{n∈N} of random
variables in .L²(Ω, P ) such that

.  E[ |Y_n − X_T| ] ≤ 1/2^n ,     n ∈ N.
By the previous point, the sequence of martingales

.  X_{n,t} := E[ Y_n | F_t^W ] ,     t ∈ [0, T ],
admits a continuous modification and by Doob’s maximal inequality, Theorem 8.1.2,
we have

.  P( sup_{t∈[0,T]} |X_{n,t} − X_t| ≥ 1/k ) ≤ k E[ |X_{n,T} − X_T| ] ≤ k/2^n ,     k, n ∈ N.

From Borel-Cantelli’s Lemma 1.3.28 in [113], it follows that, almost surely,


.(Xn )n∈N converges uniformly on .[0, T ] to the martingale X, which is therefore
a.s. continuous.
If X is a local martingale, consider a localizing sequence .(τn )n∈N : the process
.Xt∧τn − X0 is a martingale and, as we have just proved, admits a continuous

modification. Since

.  X_t 1_{(τ_n ≥ T)} = X_{t∧τ_n} 1_{(τ_n ≥ T)} ,     t ∈ [0, T ],  n ∈ N,         (13.5.5)

we deduce that also X admits a continuous modification.


Finally, we prove (13.5.1) under the assumption that X is a continuous local
martingale. By Remark 8.4.6, there exists a localizing sequence .(τn )n∈N such that
.Xt∧τn − X0 is a continuous and bounded martingale for every .n ∈ N. Then there

exists a sequence .(un )n∈N in .L2 such that


.  X_{t∧τ_n} = X_0 + ∫_0^t u_{n,s} dW_s ,     t ∈ [0, T ].                         (13.5.6)

By (13.5.5) and Proposition 10.2.26, we can take the limit in (13.5.6) to conclude
the proof. ⨆

13.5.1 Proof of Theorem 13.1.1

By the Brownian martingale representation Theorem 13.5.1, there exists .u ∈ L2loc


such that the process M in (13.1.4) admits the representation

.  M_t = 1 + ∫_0^t u_s dW_s ,     t ∈ [0, T ].

Note that .λ_t := −u_t/M_t belongs to .L²_loc since M is an adapted, continuous, and
strictly positive process. Consequently, we have

.  M_t = 1 − ∫_0^t M_s λ_s dW_s ,     t ∈ [0, T ],

that is, M solves a linear stochastic differential equation of which the exponential
martingale .M λ in (13.1.1) is the unique8 solution. Hence .M = M λ in the sense of
indistinguishability.
By construction, M is a martingale and therefore, by Girsanov Theorem 13.3.3,
.W^λ in (13.1.5) is a Brownian motion on .(Ω, F , Q, F_t^W). Finally, we have

.  dX_t = b_t dt + σ_t dW_t =

(by (13.1.5))

.  = b_t dt + σ_t (dW_t^λ − λ_t dt)

from which (13.1.7) follows.

13.6 Key Ideas to Remember

We summarize the main findings and key concepts of the chapter, omitting technical
details. As usual, if you have any doubt about what the following succinct statements
mean, please review the corresponding section.
• Section 13.1: the exponential martingale .M λ in (13.1.1), with .λ ∈ L2loc , is the
main tool used throughout the chapter. If .M λ is a true martingale, then it can
be used as a density (or Radon-Nikodym derivative) to define a measure Q
equivalent to the initially considered measure P . The process .W λ in (13.1.5),
obtained by adding a drift .λ to a Brownian motion, is a Brownian motion under
the new measure Q. The idea is that there is a correspondence between changes in
drift of a Brownian motion (and related Itô processes) and changes in probability
measure: the drift coefficient .λ acts as the exponent of the martingale .M λ , which
is the Radon-Nikodym derivative of the change of measure.
• Section 13.1.1: the results on changes in drift and measure (often referred to
as “Girsanov’s change of measure” in financial jargon) are pivotal in modern
financial derivatives valuation theory. It is worth noting that a Girsanov’s change
of measure alters the drift term of an Itô process while leaving the diffusion
coefficient unchanged.
• Section 13.2: we provide sufficient conditions on the process .λ for .M λ to be a
true martingale. Novikov’s condition is a classic condition that is often used in
probability theory and mathematical finance.
• Section 13.3: the proof of Girsanov theorem is a relatively direct consequence
of Proposition 4.4.2 that characterizes Brownian motion in terms of exponential
martingales.

8 The fact that .M λ is a solution is a simple check with Itô’s formula. For uniqueness, it is not
difficult to adapt the proof of Theorem 17.1.1 that we will prove later.

• Sections 13.4 and 13.5: the proof of the Brownian martingale representation
theorem is quite challenging and is based on a density result of exponential mar-
tingales in the space .L2 (Ω, FTW ) where .F W indicates the standard Brownian
filtration (which satisfies the usual conditions). A significant corollary is the fact
that every local Brownian martingale admits a continuous modification.
Main notations used or introduced in this chapter:

Symbol                          Description                                                     Page
.M^λ                            Exponential martingale, solution of .dM_t^λ = −M_t^λ λ_t dW_t   244
.Q ∼ P                          Equivalence between measures P and Q                            244
.dQ/dP                          Radon-Nikodym derivative of Q with respect to P                 244
.X̄_T = sup_{0≤t≤T} |X_t|        Maximum process                                                 247
.W^λ                            Brownian motion with drift .λ                                   253
.G^W                            Filtration generated by Brownian motion                         255
.F^W                            Standard Brownian filtration                                    257
Chapter 14
Stochastic Differential Equations

It seems fair to say that all differential equations are better


models of the world when a stochastic term is added and that
their classical analysis is useful only if it is stable in an
appropriate sense to such perturbations.
David Mumford

Starting from this chapter, we begin the study of Stochastic Differential Equations,
hereafter abbreviated as SDEs. As anticipated in Sect. 2.6, such equations were
originally introduced for the construction of continuous Markov processes or
diffusions. Over time, SDEs have become increasingly important in stochastic
modeling across a wide range of fields. SDEs generalize deterministic differential
equations by incorporating a random perturbation factor, which allows them to
model systems that are subject to uncertainty. In addition, SDEs can be used to
construct explicit examples of continuous semimartingales.
In this chapter, we introduce the notion of solution to an SDE and the related
problems of existence and uniqueness. These problems have a dual formulation,
in a weak and strong sense. We give a very particular existence and uniqueness
result from which some peculiarities of SDEs compared to the usual deterministic
equations can be deduced, including the so-called “regularization by noise” effect.
We see that it is possible to transfer the study of an SDE to a canonical setting
and analyze the relationship between weak and strong solvability. Finally, we prove
some preliminary estimates of continuous dependence and integrability of solutions.
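Numerically, an SDE .dX_t = b(t, X_t)dt + σ(t, X_t)dW_t is typically approximated by the Euler–Maruyama scheme .X_{t_{k+1}} = X_{t_k} + b(t_k, X_{t_k})Δt + σ(t_k, X_{t_k})ΔW_k. The sketch below (not from the text; coefficients are chosen so that the exact solution is known, and all parameter values are illustrative) compares the scheme with the explicit geometric Brownian motion solution driven by the same Brownian path.

```python
import numpy as np

# Euler-Maruyama for dX_t = mu*X_t dt + sig*X_t dW_t (geometric Brownian
# motion), compared with the exact solution
#   X_T = X_0 * exp((mu - sig^2/2) T + sig W_T)
# driven by the same Brownian increments.
rng = np.random.default_rng(5)
mu, sig, X0, T, n = 0.1, 0.2, 1.0, 1.0, 10_000
dt = T / n

dW = rng.standard_normal(n) * np.sqrt(dt)
X = X0
for k in range(n):                       # b(t, x) = mu*x, sigma(t, x) = sig*x
    X = X + mu * X * dt + sig * X * dW[k]

X_exact = X0 * np.exp((mu - 0.5 * sig**2) * T + sig * np.sum(dW))
print(X, X_exact)                        # close for small dt
```

For multiplicative noise the scheme has strong order 1/2, so the pathwise error shrinks like .√Δt as the grid is refined.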

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 263
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_14

14.1 Solving SDEs: Concepts of Existence and Uniqueness

Hereafter, N, d ∈ ℕ and 0 ≤ t0 < T are fixed constants. An SDE is an expression of the form

    dX_t = b(t, X_t) dt + σ(t, X_t) dW_t                                (14.1.1)

where W is a d-dimensional Brownian motion and

    b = b(t, x) : ]t0, T[ × ℝ^N ⟶ ℝ^N,
    σ = σ(t, x) : ]t0, T[ × ℝ^N ⟶ ℝ^{N×d},                              (14.1.2)

are measurable functions:¹ b is called the drift coefficient and σ the diffusion coefficient of the SDE. In (14.1.2), ℝ^{N×d} denotes the space of matrices of dimension N × d. To simplify the presentation, we will always assume the following

Assumption 14.1.1 The functions b, σ are measurable and locally bounded in x uniformly in t (in short, we write b, σ ∈ L^∞_loc(]t0, T[ × ℝ^N)): precisely, for each n ∈ ℕ there is a constant κ_n such that

    |b(t, x)| + |σ(t, x)| ≤ κ_n,    t ∈ ]t0, T[,  |x| ≤ n.

Remark 14.1.2 Though it introduces slightly denser notation, we opt for a general initial time t0 rather than strictly setting it to zero. We anticipate that this approach will enhance comprehension of the theory of “strong solutions” discussed in Chap. 17, along with pivotal results like the flow property of solutions and parameter dependence estimates. Beginning from Chap. 18, for the sake of simplicity, we will revert to setting t0 = 0.
Before giving the definition of solution to an SDE, it is necessary to properly set
the problem through the following
Definition 14.1.3 (Set-up) A set-up (W, F_t) on [t0, T] consists of:
• a filtered probability space (Ω, F, P, (F_t)_{t∈[t0,T]});
• a d-dimensional Brownian motion² W = (W_t)_{t∈[t0,T]} on (Ω, F, P, F_t), starting at time t0.

¹ More generally, it is possible to study equations whose coefficients depend stochastically on the time variable. This type of equation arises, for example, in the study of optimal control problems and stochastic filtering. We will restrict our attention to deterministic coefficients. We refer, for example, to [77] and [66] for a general treatment.

Remark 14.1.4 We explicitly note that F_{t0} is independent of W_t for t ≥ t0 and therefore also of the standard Brownian filtration (F_t^W)_{t∈[t0,T]} that verifies the usual conditions.
Definition 14.1.5 (Solution of an SDE) A solution of the SDE with coefficients b, σ on the set-up (W, F_t) is an N-dimensional process X = (X_t)_{t∈[t0,T]} defined on the same space as W and such that:

(i) X is continuous and adapted, i.e., X_t ∈ mF_t for every t ∈ [t0, T];
(ii) almost surely we have³

    X_t = X_{t0} + ∫_{t0}^t b(s, X_s) ds + ∫_{t0}^t σ(s, X_s) dW_s,   t ∈ [t0, T].   (14.1.4)

² On the probability space (Ω, F, P, (F_t)_{t∈[t0,T]}), we say that W = (W_t)_{t∈[t0,T]} is a Brownian motion starting at time t0 if:
(i) W_{t0} = 0 a.s.;
(ii) W is a.s. continuous;
(iii) W is adapted to (F_t)_{t∈[t0,T]};
(iv) W_t − W_s is independent of F_s for every t0 ≤ s ≤ t ≤ T;
(v) W_t − W_s ∼ N_{0,(t−s)I} for every t0 ≤ s ≤ t ≤ T, where I denotes the d × d identity matrix.

For example, let B = (B_t)_{t≥0} be a standard Brownian motion on (Ω, F, P, (F_t)_{t≥0}); then W_t := B_t − B_{t0} is a Brownian motion starting at time t0 on (Ω, F, P) with respect to the filtration (F_t)_{t≥t0} or even with respect to the standard filtration defined by

    F_t^W := σ(G_t^W ∪ N),   G_t^W := σ(W_s, t0 ≤ s ≤ t),   t0 ≤ t ≤ T.

Note that there is a strict inclusion F_t^W ⊂ F_t^B in the case t0 > 0. Moreover, since the stochastic integral depends only on the Brownian increments (cf. Corollary 10.2.27), we have a.s.

    ∫_{t0}^t u_s dB_s = ∫_{t0}^t u_s dW_s,   t ≥ t0.

³ That is, there exists a version of the stochastic integral

    t ⟼ ∫_{t0}^t σ(s, X_s) dW_s

such that (14.1.4) holds for every t ∈ [t0, T] almost surely. We explicitly note that, under the local boundedness Assumption 14.1.1, we have

    ∫_{t0}^T |b(t, X_t)| dt + ∫_{t0}^T |σ(t, X_t)|² dt < ∞   a.s.   (14.1.3)

and therefore the integrals in (14.1.4) are well defined.



To indicate that X is a solution of the SDE with coefficients b, σ on the set-up (W, F_t) we write

    X ∈ SDE(b, σ, W, F_t).

It is customary to associate an SDE with an “initial condition” that can be assigned pointwise through a random variable Z ∈ mF_{t0} if the set-up (W, F_t) has been previously fixed or, as we will see later, in law through a distribution μ0 on ℝ^N.
Definition 14.1.6 (Strong Solution of an SDE) Given a set-up (W, F_t) and an initial datum Z ∈ mF_{t0}, we denote by

    F^{Z,W} = (F_t^{Z,W})_{t∈[t0,T]}

the filtration generated by W and Z, completed so that it satisfies the usual conditions.⁴ We say that a solution X ∈ SDE(b, σ, W, F_t), such that X_{t0} = Z, is a strong solution if it is adapted to the filtration F^{Z,W}.
Remark 14.1.7 ([!]) Strong solutions are characterized by the property of being adapted to the filtration F^{Z,W}: since F^{Z,W} is the smallest⁵ filtration with respect to which a solution of the SDE can be defined, this measurability condition is the most restrictive possible.
If the initial datum is deterministic, i.e., Z ∈ ℝ^N, then a strong solution is adapted to the standard Brownian filtration F^W. This means that, through the SDE, a process (the solution) X is associated with W and X is a “functional” of W, meaning that X_t can be expressed as a function of the process (W_s)_{s∈[t0,t]}.
This remark is relevant since in various applications, such as in signal theory, W
represents a set of observed data that are used as “input” for a dynamical system
(formalized by the SDE) that produces the solution X as “output”: in this case,
it is crucial that the output can be represented as a function of the input data. In
other fields, such as mathematical finance, it may be sufficient to consider a weaker
notion of solution, in particular if one is only interested in applications where what
is relevant is the law of the solution.
Example 14.1.8 When the coefficients b = b(t) and σ = σ(t) of the SDE (14.1.1) are deterministic L^∞ functions of the time variable only, the solution of the corresponding SDE is the Itô process

    X_t = Z + ∫_{t0}^t b(s) ds + ∫_{t0}^t σ(s) dW_s.

4 By Theorem 6.2.22 and the independence of Z from .F W (cf. Remark 14.1.4), W is a Brownian
motion also with respect to .F Z,W .
5 The smallest filtration verifying the usual conditions.

We recall from Example 11.1.9 that if also the initial datum is deterministic, then X_t is a Gaussian process.
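As a numerical illustration of Example 14.1.8 (our own sketch, not part of the text: the choices b(s) = sin s, σ ≡ 1, Z = 0 are illustrative), a Monte Carlo simulation confirms that X_T is Gaussian with mean Z + ∫_0^T b(s) ds and variance ∫_0^T σ²(s) ds:

```python
import numpy as np

# Monte Carlo sketch: X_t = Z + int_0^t b(s) ds + int_0^t sigma(s) dW_s
# with deterministic b(s) = sin(s), sigma(s) = 1, Z = 0 (illustrative choices).
# Theory: X_T ~ N(int_0^T b(s) ds, int_0^T sigma^2(s) ds) = N(1 - cos(T), T).
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 250, 40_000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
drift = (np.sin(t[:-1]) * dt).sum()     # deterministic: same for every path
stoch = dW.sum(axis=1)                  # int_0^T sigma(s) dW_s with sigma = 1
X_T = drift + stoch

print(X_T.mean(), X_T.var())   # should be close to 1 - cos(1) and to T = 1
```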

Next, we present two formulations of the problem concerning the existence of


solutions to an SDE.
Definition 14.1.9 (Solvability of an SDE) We say that the SDE with coefficients b, σ is solvable
• in the weak sense, if for every distribution μ0 on ℝ^N there exist a set-up (W, F_t) and a solution X ∈ SDE(b, σ, W, F_t) such that X_{t0} ∼ μ0;
• in the strong sense, if for every set-up (W, F_t) and Z ∈ mF_{t0} there exists a strong solution X ∈ SDE(b, σ, W, F_t^{Z,W}) such that X_{t0} = Z a.s.
Although it may seem counter-intuitive, it is possible for a process to satisfy an equation of the type

    X_t = x + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s

with deterministic initial datum x ∈ ℝ^N, and not be adapted to F^W: in other words, in some instances, a solution X needs to possess additional randomness beyond that induced by the Brownian motion with respect to which the SDE is formulated. A famous example is due to Tanaka [139] (see also [154]): here we describe the general idea and refer to Section 9.2.1 in [112] or Example 3.5, Chapter 5 in [67] for details.
Example 14.1.10 (Tanaka [!]) Consider the scalar (i.e., with N = d = 1) SDE

    dX_t = σ(X_t) dW_t                                                  (14.1.5)

with null drift and initial datum, b = Z = 0, and diffusion coefficient

    σ(x) = sgn(x) :=  1 if x ≥ 0,   −1 if x < 0.

To prove that the SDE (14.1.5) is solvable in the weak sense, consider a Brownian motion X defined on the space (Ω, F, P, F^X). The process

    W_t := ∫_0^t σ(X_s) dX_s                                            (14.1.6)

is a continuous martingale with quadratic variation ⟨W⟩_t = t and consequently, by Theorem 12.4.1, it is also a Brownian motion on (Ω, F, P, F^X). Since σ² ≡ 1, from the definition dW_t = σ(X_t) dX_t we obtain

    dX_t = σ²(X_t) dX_t = σ(X_t) dW_t,

which means that X is a solution of the SDE (14.1.5) with respect to W, i.e., X ∈ SDE(0, σ, W, F^X), with null initial datum. The crucial point is that it can be proved⁶ that W, defined by (14.1.6), is adapted to the standard filtration F^{|X|} of the absolute value process |X|: if X were adapted to F^W then it would also be adapted to F^{|X|}, which is absurd. This example may seem a bit pathological because the coefficient σ is a discontinuous function: more recently, Barlow [7] has shown that for every α < 1/2 there exists an α-Hölder continuous function σ, bounded above and below by positive constants, such that the SDE (14.1.5) is solvable in the weak sense but not in the strong sense.
In conclusion, an SDE can be solvable in the weak sense without being solvable
in the strong sense: weak solvability is less restrictive because it gives the freedom
to choose the space, the Brownian motion, and the filtration with respect to which
to write the SDE. On the contrary, strong solutions are constrained to be adapted to
the standard filtration .F Z,W of the initial datum Z and the Brownian motion W .
Just like for existence, there exist different notions of uniqueness for the solution
of an SDE.
Definition 14.1.11 (Uniqueness for an SDE) We say that for the SDE with coefficients b, σ there is uniqueness
• in the strong sense, if X ∈ SDE(b, σ, W, F_t) and Y ∈ SDE(b, σ, W, G_t) with X_{t0} = Y_{t0} a.s. imply that X and Y are indistinguishable processes;
• in the weak sense (or in law), if X ∈ SDE(b, σ, W, F_t) and Y ∈ SDE(b, σ, B, G_t), with X_{t0} and Y_{t0} equal in law, imply that (X, W) and (Y, B) are equal in law or, equivalently, that (X, W) and (Y, B) have the same finite-dimensional distributions.

In the definition of strong uniqueness, the two processes X and Y are defined
on the same probability space .(Ω, F , P ) and are solutions of the SDE on the
setups .(W, Ft ) and .(W, Gt ), respectively: here W is a Brownian motion with
respect to both filtrations .(Ft ) and .(Gt ) which can be different. Strong uniqueness
is also known as “pathwise uniqueness”. In the definition of uniqueness in law, the
processes X and Y can be solutions on different set-ups .(W, Ft ) and .(B, Gt ), even
defined on different probability spaces.
Example 14.1.12 ([!]) For the SDE in Example 14.1.10, there is weak but not
strong uniqueness. In fact, every solution X of the SDE (14.1.5) is a local martingale
with .〈X〉t = t and therefore, by Lévy’s characterization Theorem 12.4.1, X is a
Brownian motion: hence there is uniqueness in law.
On the other hand, if X is the weak solution constructed in Example 14.1.10, we
can verify that also .−X is a solution of the SDE and therefore there is no strong

6 Here the Meyer-Tanaka formula is used: see, for example, Section 5.3.2 in [112] or Section 2.11
in [37].

uniqueness: in fact, since σ(−x) = −σ(x) if x ≠ 0, we have

    ∫_0^t σ(−X_s) dW_s = −∫_0^t σ(X_s) dW_s + 2 ∫_0^t 1_{(X_s=0)} dW_s
                       = −∫_0^t σ(X_s) dW_s   a.s.

since, by Itô's isometry,

    E[ ( ∫_0^t 1_{(X_s=0)} dW_s )² ] = E[ ∫_0^t 1_{(X_s=0)} ds ] = 0.

Here we used the fact that P(X_s = 0) = 0 for every s ≥ 0 since X is a Brownian motion.
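The mechanics of Tanaka's example can be sketched numerically (our own illustration): building W from a simulated Brownian path X via (14.1.6), the increments satisfy ΔW = sgn(X)ΔX, so (ΔW)² = (ΔX)² pathwise and ⟨W⟩_t = t, while replacing X by −X leaves the increments of W unchanged off the set {X = 0} — exactly the computation above.

```python
import numpy as np

# Discrete sketch of Tanaka's construction: X is a Brownian path and
# W_t = int_0^t sgn(X_s) dX_s is approximated by sums of sgn(X_k) * dX_k.
rng = np.random.default_rng(1)
n, T = 100_000, 1.0
dX = rng.normal(0.0, np.sqrt(T / n), size=n)
X = np.concatenate(([0.0], np.cumsum(dX)))

sgn = np.where(X[:-1] >= 0, 1.0, -1.0)   # sgn(x) = 1 for x >= 0, -1 for x < 0
dW = sgn * dX

# (dW)^2 = (dX)^2 pathwise, so the quadratic variations coincide exactly
# and <W>_T approximates T, consistent with Levy's characterization.
print((dW**2).sum())   # close to T = 1

# Replacing X by -X only changes sgn on {X = 0}: off that (negligible) set
# the increments of W are identical, so -X solves the same SDE.
dW_neg = np.where(-X[:-1] >= 0, 1.0, -1.0) * (-dX)
mask = X[:-1] != 0.0
print(np.allclose(dW_neg[mask], dW[mask]))   # True
```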
Remark 14.1.13 ([!]) Theorem 14.3.6, by Yamada and Watanabe, states that if an
SDE is solvable in the strong sense then it is also solvable in the weak sense.
Furthermore, strong uniqueness implies uniqueness in law: while this result may
seem intuitive, its proof is not straightforward; indeed, strong uniqueness pertains
to solutions defined on the same space, whereas proving weak uniqueness requires
dealing with solutions that may be defined on different spaces. Finally, we also have
that if for an SDE there is strong uniqueness then every solution is a strong solution.
Remark 14.1.14 Recently, a further notion of uniqueness for SDEs, called “path-
by-path uniqueness”, has also been studied: see in this regard [31, 48] and [130].

14.2 Weak Existence and Uniqueness via Girsanov Theorem

There are many ways to prove weak existence and uniqueness for an SDE. In this
section, we examine a very particular technique that exploits the results on changes
of measure of Chap. 13. The following remarkable Theorem 14.2.3 is an example
of the so-called “regularizing effect of Brownian motion”, whereby weak existence
and uniqueness for an SDE are obtained under minimal regularity assumptions on
the drift coefficient, which is here assumed to be only measurable and bounded.
Under such assumptions, the corresponding ordinary differential equation (without
the Brownian part) does not generally have a unique solution as shown by the well-
known
Example 14.2.1 (Peano's Brush) The SDE (14.1.1) with b(t, x) = |x|^α, σ = 0 and null initial datum reduces to the Volterra integral equation

    X_t = ∫_0^t |X_s|^α ds.                                             (14.2.1)

Equation (14.2.1) has the null function as its unique solution if α ≥ 1, while if α ∈ ]0, 1[ there are infinitely many solutions of the form

    X_t = 0                   if 0 ≤ t ≤ s,
    X_t = ((t − s)/β)^β       if s ≤ t ≤ T,

where β = 1/(1 − α) and s ∈ [0, T].
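As a quick check (our own numerical sketch; α = 1/2, hence β = 2, and the branching time s = 0.3 are illustrative choices), the delayed branch ((t − s)/β)^β indeed satisfies the Volterra equation (14.2.1), as does the null function trivially:

```python
import numpy as np

# Numerical check of Peano's brush for alpha = 1/2 (so beta = 1/(1-alpha) = 2):
# X_t = ((t - s)/2)^2 for t >= s, X_t = 0 before, solves X_t = int_0^t |X_r|^alpha dr.
# (The null function solves it trivially: the integrand is identically 0.)
alpha, beta, s, T, n = 0.5, 2.0, 0.3, 1.0, 200_000
t = np.linspace(0.0, T, n + 1)
dt = T / n

X = (np.clip(t - s, 0.0, None) / beta) ** beta    # the delayed-branch solution
f = np.abs(X) ** alpha                            # integrand |X_r|^alpha
rhs = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2) * dt))  # trapezoid

print(np.max(np.abs(X - rhs)))   # tiny: X solves the integral equation
```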
A similar phenomenon also occurs in the stochastic case.
Example 14.2.2 (Itô and Watanabe [64] [!]) The SDE

    dX_t = 3 X_t^{1/3} dt + 3 X_t^{2/3} dW_t,    X_0 = 0,

has infinitely many strong solutions of the form

    X_t^{(a)} = 0        for 0 ≤ t < τ_a,
    X_t^{(a)} = W_t³     for t ≥ τ_a,

where a ∈ [0, +∞] and τ_a = inf{t ≥ a | W_t = 0}. For a = +∞ and a = 0, we have the solutions X_t^{(+∞)} ≡ 0 and X_t^{(0)} = W_t³, respectively.
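Itô's formula explains why W³ solves the equation: d(W_t³) = 3W_t² dW_t + 3W_t dt = 3(W_t³)^{2/3} dW_t + 3(W_t³)^{1/3} dt, with signed cube roots. A pathwise discrete sketch of this identity (our own illustration):

```python
import numpy as np

# Pathwise sketch that X = W^3 satisfies dX = 3 X^(1/3) dt + 3 X^(2/3) dW:
# (W + dW)^3 - W^3 = 3 W^2 dW + 3 W (dW)^2 + (dW)^3, and replacing (dW)^2
# by dt leaves a residual that vanishes as the step size shrinks.
rng = np.random.default_rng(2)
n, T = 200_000, 1.0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

X = W ** 3
cbrt = np.cbrt(X[:-1])                     # signed cube root: X^(1/3) = W
incr = 3 * cbrt * dt + 3 * cbrt**2 * dW    # 3 X^(1/3) dt + 3 X^(2/3) dW
residual = X[-1] - incr.sum()              # W_T^3 minus the integrated SDE terms

print(abs(residual))   # small, of order sqrt(dt)
```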
In light of the previous examples, the following result is quite surprising and documents the regularizing effect of Brownian motion.

Theorem 14.2.3 (Zvonkin [154], Veretennikov [144]) Suppose that the coefficient

    b : ]0, T[ × ℝ^d ⟶ ℝ^d

is a Borel-measurable and bounded function. Then the SDE

    dX_t = b(t, X_t) dt + dW_t                                          (14.2.2)

is solvable in the weak sense and the solution is unique in law.


Proof
Existence Let μ0 be a distribution on ℝ^d and let X be a d-dimensional Brownian motion with initial value X_0 ∼ μ0 (cf. Exercise 8.4.5) defined on the space (Ω, F, P, F_t). By the boundedness of b and Proposition 13.2.1, we have that

    M_t := exp( ∫_0^t b(s, X_s) dX_s − ½ ∫_0^t |b(s, X_s)|² ds ),   t ∈ [0, T],   (14.2.3)

is a martingale. Then, by Theorem 13.1.1, the process

    W_t := X_t − X_0 − ∫_0^t b(s, X_s) ds                               (14.2.4)

is a standard Brownian motion under the measure Q defined by dQ/dP = M_T. Formula (14.2.4) shows that X is a weak solution of the SDE (14.2.2) under the measure Q. Moreover,

    Q(X_0 ∈ H) = E^P[1_H(X_0) M_T] = E^P[ 1_H(X_0) E^P[M_T | F_0] ] = P(X_0 ∈ H)

by the martingale property of the process M (note that E^P[M_T | F_0] = M_0 = 1), and therefore X_0 ∼ μ0 under Q.


Uniqueness Let X^{(i)}, i = 1, 2, be solutions of the SDE (14.2.2) on the set-ups (W^{(i)}, F_t^{(i)}) defined on the spaces (Ω_i, F^{(i)}, P_i), respectively. Assume that X_0^{(1)} and X_0^{(2)} are equal in law. Again, by the boundedness of b and Proposition 13.2.1, the processes

    M_t^{(i)} := exp( −∫_0^t b(s, X_s^{(i)}) dW_s^{(i)} − ½ ∫_0^t |b(s, X_s^{(i)})|² ds ),   t ∈ [0, T],   (14.2.5)

are martingales. From Theorem 13.1.1 it follows that

    X_t^{(i)} = X_0^{(i)} + ∫_0^t b(s, X_s^{(i)}) ds + W_t^{(i)}        (14.2.6)

are Brownian motions respectively on the spaces (Ω_i, F^{(i)}, Q_i, F_t^{(i)}) where dQ_i/dP_i = M_T^{(i)}. Therefore, the law of X^{(1)} in Q_1 is equal to the law of X^{(2)} in Q_2: from (14.2.5), (14.2.6), and Corollary 10.2.28, it follows that the law of (X^{(1)}, W^{(1)}, M^{(1)}) in Q_1 is equal to the law of (X^{(2)}, W^{(2)}, M^{(2)}) in Q_2. Finally, for every 0 ≤ t_1 < · · · < t_n ≤ T and H ∈ B_{2nd} we have

    P_1((X_{t_1}^{(1)}, W_{t_1}^{(1)}, …, X_{t_n}^{(1)}, W_{t_n}^{(1)}) ∈ H)
        = ∫_{Ω_1} 1_H(X_{t_1}^{(1)}, W_{t_1}^{(1)}, …, X_{t_n}^{(1)}, W_{t_n}^{(1)}) dQ_1 / M_T^{(1)}
        = ∫_{Ω_2} 1_H(X_{t_1}^{(2)}, W_{t_1}^{(2)}, …, X_{t_n}^{(2)}, W_{t_n}^{(2)}) dQ_2 / M_T^{(2)}
        = P_2((X_{t_1}^{(2)}, W_{t_1}^{(2)}, …, X_{t_n}^{(2)}, W_{t_n}^{(2)}) ∈ H),

which proves the thesis. □





Remark 14.2.4 Theorem 14.2.3 can be extended in various directions. Using the Novikov condition (Theorem 13.2.6) to prove that the process in (14.2.3) is a martingale, one proves the existence of a weak solution of the SDE (14.2.2) under the more general assumption of linear growth in x (in addition to measurability) of the coefficient b: for more details see, for example, Proposition 5.3.6 in [67].
In Sect. 18.4 we will prove a “strong version” of Theorem 14.2.3, under the more restrictive assumption that b = b(t, x) is a bounded and Hölder continuous function in the variable x, uniformly in t.
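The normalization that makes Q a probability measure can be checked by simulation: for bounded b, the exponential functional in (14.2.3) satisfies E[M_T] = 1. A Monte Carlo sketch (ours; the bounded drift b(x) = cos x is an illustrative choice):

```python
import numpy as np

# Monte Carlo sketch: for bounded b, the exponential functional (14.2.3),
# M_T = exp( int b(X) dX - 1/2 int b(X)^2 dt ), satisfies E[M_T] = 1,
# so dQ = M_T dP defines a probability measure (Girsanov).
# Illustrative choice: b(x) = cos(x), X a Brownian motion, T = 1.
rng = np.random.default_rng(3)
n_paths, n_steps, T = 100_000, 200, 1.0
dt = T / n_steps

X = np.zeros(n_paths)
log_M = np.zeros(n_paths)
for _ in range(n_steps):
    dX = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    b = np.cos(X)
    log_M += b * dX - 0.5 * b**2 * dt    # Ito sums for the exponent of M_T
    X += dX

M_T = np.exp(log_M)
print(M_T.mean())   # close to 1
```

Note that the discrete-time M is itself an exact martingale (each factor exp(bΔX − ½b²Δt) has conditional mean 1 for Gaussian ΔX), so only Monte Carlo error remains.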

14.3 Weak vs Strong Solutions: The Yamada-Watanabe Theorem

We examine the relationship between strong and weak solvability. For simplicity, we assume t0 = 0 and, given N, d ∈ ℕ and T > 0, we consider an SDE with coefficients

    b = b(t, x) : ]0, T[ × ℝ^N ⟶ ℝ^N,
    σ = σ(t, x) : ]0, T[ × ℝ^N ⟶ ℝ^{N×d}.

Furthermore, we let μ0 be a distribution on ℝ^N that we will use as the initial condition.

Since the results of this section are rather technical, on a first reading it is recommended to read the statements and skip the proofs.

Definition 14.3.1 (Weak Solution of an SDE) The SDE with coefficients b, σ and initial law μ0 is solvable in the weak sense if there exist a set-up (W, F_t) and a solution X ∈ SDE(b, σ, W, F_t) such that X_0 ∼ μ0. In this case, almost surely

    X_t = X_0 + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s,   t ∈ [0, T],   (14.3.1)

and we say that the pair (X, W) is a weak solution of the SDE with coefficients b, σ and initial law μ0.
Remark 14.3.2 ([!]) To prove that an SDE is solvable in the weak sense, it is
necessary to construct not only the process X but also the set-up .(W, Ft ) on which
the SDE is written: for this reason, the weak solution is typically referred to as the
pair .(X, W ), not just the process X.
We now see that it is always possible to transfer the problem of weak solvability
of an SDE to a “canonical setting”.

Notation 14.3.3 Given n ∈ ℕ, we denote by

    Ω_n = C([0, T]; ℝ^n)

the space of continuous n-dimensional trajectories, equipped with the filtration (G_t^n)_{t∈[0,T]} generated by the identity process

    X_t(w) := w(t),   w ∈ Ω_n,  t ∈ [0, T],

and with the Borel σ-algebra⁷ G_T^n.


Remark 14.3.4 If the process (X, W), defined on the space (Ω, F, P), is a solution of the SDE (14.3.1), then its law μ_{X,W} is the distribution on Ω_{N+d} = Ω_N × Ω_d defined by

    μ_{X,W}(H) = P((X, W) ∈ H),   H ∈ G_T^{N+d}.

Hereafter, we will repeatedly use the fact that Ω_{N+d} is a Polish space on which, thanks to Theorem 4.3.2 in [113], it is possible to define a regular version of the conditional probability. The following lemma is a crucial ingredient in all subsequent analysis.
Lemma 14.3.5 (Transfer of Solutions [!]) If (X, W) is a weak solution of the SDE with coefficients b, σ and initial law μ0 on the space (Ω, F, P), then the canonical process (𝐗, 𝐖) defined by

    𝐗_t(x, w) := x(t),   𝐖_t(x, w) := w(t),   (x, w) ∈ Ω_{N+d},  t ∈ [0, T],

is a weak solution of the SDE with coefficients b, σ and initial law μ0 on the space (Ω_{N+d}, G_T^{N+d}, μ_{X,W}).

Proof We have the scheme

    (Ω, F, P) --(X,W)--> (Ω_{N+d}, G_T^{N+d}, μ_{X,W}) --(𝐗,𝐖)--> (Ω_{N+d}, G_T^{N+d})

and, by construction, (X, W) and (𝐗, 𝐖) are equal in law. The fact that 𝐖 is a Brownian motion is a consequence⁸ of the equality in law of (X, W) and (𝐗, 𝐖). Suppose for the moment

7 We saw in Proposition 3.2.1 that, in the space of continuous trajectories, the .σ -algebra generated

by cylinders (or, equivalently, by the identity process) coincides with the Borel .σ -algebra.
8 In particular, it is sufficient to show the independence of the increments using the characteristic

function: for details, see for example Lemma IV.1.2 in [63].



that the initial law is μ0 = δ_{x0} for some x0 ∈ ℝ^N and therefore 𝐗_0 = x0 almost surely. Letting

    J_t := ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s,
    𝐉_t := ∫_0^t b(s, 𝐗_s) ds + ∫_0^t σ(s, 𝐗_s) d𝐖_s,

we have that (X, W, J) and (𝐗, 𝐖, 𝐉) are equal in law by Corollary 10.2.28. Therefore, 𝐗 − x0 − 𝐉 is indistinguishable from the null process, and this proves the thesis.
The case where the initial datum 𝐗_0 is random can be handled by conditioning on 𝐗_0. Precisely, to lighten the notation, let 𝐏 := μ_{X,W}: by Theorem 4.3.2 in [113], there exists a regular version

    𝐏(· | 𝐗_0) = ( 𝐏_{x,w}(· | 𝐗_0) )_{(x,w)∈Ω_{N+d}}

of the conditional probability of 𝐏 given 𝐗_0. For 𝐏-almost every (x, w) ∈ Ω_{N+d}, under the measure 𝐏_{x,w}(· | 𝐗_0), the process (𝐗, 𝐖) has the same law as (X̂, W), where (X̂, W) is the solution of the SDE with coefficients b, σ and initial datum X̂_0 = x(0). Then, by what has been proven previously, for 𝐏-almost every (x, w) ∈ Ω_{N+d}, under the measure 𝐏_{x,w}(· | 𝐗_0), the process (𝐗, 𝐖) is a solution of the SDE with coefficients b, σ and initial datum x(0). To conclude, it is sufficient to observe that, for

    Z := sup_{t∈[0,T]} | 𝐗_t − 𝐗_0 − ∫_0^t b(s, 𝐗_s) ds − ∫_0^t σ(s, 𝐗_s) d𝐖_s |,

by the law of total probability, we have E[Z] = E[E[Z | 𝐗_0]] = 0. □



The following result establishes the relationships between solvability and unique-
ness for an SDE in the weak and strong sense, according to Definitions 14.1.9
and 14.1.11.
Theorem 14.3.6 (Yamada and Watanabe [149] [!])
(i) Strong solvability implies weak solvability;
(ii) strong uniqueness implies weak uniqueness;
(iii) weak solvability and strong uniqueness together imply strong solvability.
Proof We provide a detailed outline of the proof, and direct readers to Chapter 8 in
[136] for a comprehensive treatment.9
(i) In order to infer weak solvability from strong solvability, we only have to construct a set-up. More precisely, given a distribution μ0 on ℝ^N, we consider the canonical space ℝ^N × Ω_d equipped with the product measure μ0 ⊗ μ_W, where μ_W is the law of the d-dimensional Brownian motion, and with the filtration (G_t)_{t∈[0,T]} generated by the identity process

    (Z, 𝐖) : ℝ^N × Ω_d ⟶ ℝ^N × Ω_d,   Z(z, w) = z,   𝐖_t(z, w) = w(t),   t ∈ [0, T].

Then Z ∼ μ0 is G_0-measurable and 𝐖 is a Brownian motion (with respect to G_t). Hence, by the hypothesis of strong solvability, there exists a solution X related to the set-up (𝐖, G_t) and such that X_0 = Z ∼ μ0.

⁹ Further reference sources are Theorem 21.14 and Lemma 21.17 in [66] and Section V.17 in [124].
(ii) We omit the case where the initial datum is random: this can be treated in a completely analogous way to the second part of the proof of Lemma 14.3.5 (for details, see, for example, Proposition IX.1.4 in [123]).
We thus consider two solutions X^i ∈ SDE(b, σ, W^i, F_t^i) such that X_0^i = x ∈ ℝ^N almost surely, for i = 1, 2. We prove that the hypothesis of strong uniqueness implies that (X^1, W^1) and (X^2, W^2) are equal in law. The problem is that the solutions X^1 and X^2 are generally defined on different sample spaces: so the idea is to construct versions of X^1 and X^2 that are solutions of the SDE on the same space and with respect to the same Brownian motion. To this end, we construct a canonical space on which three processes are defined: a Brownian motion and the versions of X^1 and X^2.
By Theorem 4.3.4 in [113] (and Remark 4.3.5 in [113]), there exists a regular version

    μ_{X^i|W^i} = ( μ_{X^i|W^i}(·; w) )_{w∈Ω_d}

of the law of X^i conditioned on W^i: for each w ∈ Ω_d, μ_{X^i|W^i}(·; w) is a distribution on the Borel σ-algebra G_T^N of Ω_N and we have¹⁰

    ∫_A μ_{X^i|W^i}(H; w) μ_W(dw) = E[ E[1_H(X^i) | W^i] 1_A(W^i) ]
                                  = μ_{X^i,W^i}(H × A),   (H, A) ∈ G_T^N × G_T^d.   (14.3.2)

Now, on Ω_N × Ω_N × Ω_d we define the probability measure¹¹

    𝐏(H × K × A) := ∫_A μ_{X^1|W^1}(H; w) μ_{X^2|W^2}(K; w) μ_W(dw),
                    (H, K, A) ∈ G_T^N × G_T^N × G_T^d,                  (14.3.3)

¹⁰ Here μ_W ≡ μ_{W^i}, i = 1, 2, is the Wiener measure on Ω_d.
¹¹ 𝐏 extends to the product σ-algebra G_T^N ⊗ G_T^N ⊗ G_T^d = G_T^{2N+d}.

and denote by (𝐗^1, 𝐗^2, 𝐖) the canonical process on such space. Taking respectively H = Ω_N or K = Ω_N in (14.3.3), by (14.3.2) we have

    (𝐗^i, 𝐖) = (X^i, W^i) in law,   i = 1, 2;                           (14.3.4)

we deduce in particular that 𝐖 is a Brownian motion under the measure 𝐏 and, as in the proof of Lemma 14.3.5, (𝐗^1, 𝐖) and (𝐗^2, 𝐖) are both solutions of the SDE with coefficients b, σ and with initial datum x. By strong uniqueness, 𝐗^1 and 𝐗^2 are indistinguishable under the measure 𝐏 and therefore

    (X^1, W^1), (𝐗^1, 𝐖), (𝐗^2, 𝐖), and (X^2, W^2) are all equal in law.

(iii) Again, we consider only the case of a deterministic initial datum. Let X ∈ SDE(b, σ, W, F_t) be a solution with initial datum X_0 = x ∈ ℝ^N a.s. We apply the construction of point (ii) with X^1 = X^2 = X, that is, we construct on the space Ω_N × Ω_N × Ω_d the measure 𝐏 as in (14.3.3) and the canonical process (𝐗^1, 𝐗^2, 𝐖), where 𝐗^1, 𝐗^2 are equal in law to X and are solutions of the SDE with respect to the Brownian motion 𝐖.
We consider the conditional probability 𝐏(· | 𝐖) = (𝐏_w(· | 𝐖))_{w∈Ω_d} and the related conditional laws

    μ_{𝐗^i|𝐖}(H) = 𝐏(𝐗^i ∈ H | 𝐖),   H ∈ G_T^N,  i = 1, 2,

noting that μ_{𝐗^i|𝐖} = μ_{X|W} by (14.3.4). We have¹² that the random variables 𝐗^1 and 𝐗^2 are simultaneously equal a.s. and independent in 𝐏_w(· | 𝐖) for almost every w ∈ Ω_d, and therefore¹³ 𝐗^1 and 𝐗^2 have a Dirac delta distribution under 𝐏_w(· | 𝐖). In other terms, for almost every w ∈ Ω_d we have μ_{X|W}(·; w) = μ_{𝐗^i|𝐖}(·; w) = δ_{F(w)} for some measurable map F from Ω_d

¹² Indeed, by the strong uniqueness hypothesis, we have 𝐏(𝐗^1 = 𝐗^2) = 1 so that

    E[ 𝐏(𝐗^1 = 𝐗^2 | 𝐖) ] = 𝐏(𝐗^1 = 𝐗^2) = 1

and since 𝐏(𝐗^1 = 𝐗^2 | 𝐖) ≤ 1, we also have 𝐏_w(𝐗^1 = 𝐗^2 | 𝐖) = 1 for almost every w ∈ Ω_d. Moreover, from definition (14.3.3) of 𝐏, it is not difficult to verify that the joint conditional law of 𝐗^1, 𝐗^2 is the product of the marginals

    μ_{𝐗^1,𝐗^2|𝐖}(H × K) = 𝐏( (𝐗^1, 𝐗^2) ∈ H × K | 𝐖 ) = μ_{X|W}(H) μ_{X|W}(K)
                          = μ_{𝐗^1|𝐖}(H) μ_{𝐗^2|𝐖}(K),   H, K ∈ G_T^N,

from which the independence follows for almost every w ∈ Ω_d.
¹³ As an exercise, prove that if X, Y are real random variables on a space (Ω, F, P) that are equal a.s. and independent, then X ∼ δ_{x0} for some x0 ∈ ℝ. Prove that an analogous result holds for X, Y with values in the space Ω_n.

to Ω_N, and therefore X = F(W) a.s. To conclude, it is necessary to show that X is adapted to the standard Brownian filtration F^W: for the proof of this fact, based on the properties of the regular version of conditional probability, we refer¹⁴ to Problem 3.21 on page 310 in [67]. □


Remark 14.3.7 ([!]) In Remark 14.1.7 we pointed out that strong solutions differ from weak ones by the property of being adapted to the standard Brownian filtration (assuming for simplicity that the initial datum is deterministic). This measurability property is well expressed by the functional dependence X = F(W) shown in the previous proof: in particular, a strong solution (X, W) can be defined on the canonical space Ω_d. On the contrary, Lemma 14.3.5 shows that it is possible to “transport” every weak solution to the canonical space Ω_N × Ω_d. This means that weak solutions generally require a richer sample space, in which the trajectories of a solution (that are elements of Ω_N) are not necessarily functionals of the Brownian trajectories (that are elements of Ω_d): this is the case of Tanaka's Example 14.1.10.

14.4 Standard Assumptions and Preliminary Estimates

In this section, we introduce additional assumptions on the coefficients that enable us to obtain useful estimates for the solutions of SDEs.

Definition 14.4.1 (Standard Assumptions) The coefficients b, σ satisfy the standard assumptions on ]t0, T[ if there exist two positive constants c1, c2 such that

    |b(t, x)| + |σ(t, x)| ≤ c1 (1 + |x|),                               (14.4.1)
    |b(t, x) − b(t, y)| + |σ(t, x) − σ(t, y)| ≤ c2 |x − y|,             (14.4.2)

for every t ∈ ]t0, T[ and x, y ∈ ℝ^N.

Conditions (14.4.1) and (14.4.2) are, respectively, linear growth and global Lipschitz continuity in x, uniformly in t ∈ ]t0, T[. We note that, under Assumption 14.1.1, (14.4.2) implies (14.4.1). In some results, we will weaken (14.4.2) by requiring only local Lipschitz continuity in x.
Example 14.4.2 (Geometric Brownian Motion) Consider the SDE with linear coefficients

    dX_t = μ X_t dt + σ X_t dW_t                                        (14.4.3)

where μ, σ are real parameters. In this case, b(t, x) = μx and σ(t, x) = σx, so the standard assumptions are obviously satisfied. As in Example 11.1.5-(iii), a direct application of Itô's formula shows that

    X_t = X_0 exp( (μ − σ²/2) t + σ W_t )

is a solution of (14.4.3). The process X, known as geometric Brownian motion, is used to represent the dynamics of a risky financial asset price in the classical Black-Scholes model [19]. The model generalizes to the case of time-dependent coefficients μ = μ(t), σ = σ(t) ∈ L^∞(ℝ_{≥0}): also in this case, it is easy to determine the explicit expression of the solution.

¹⁴ In fact, in [67] more is proved (see also Remark 2 on page 310 in [123]): highlighting the dependence on the initial datum x ∈ ℝ^N, the function F = F(x, w) is jointly measurable and, for Z ∈ mF_0, X = F(Z, W) is a strong solution of the SDE with random initial datum X_0 = Z.
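A numerical sketch (ours; the parameter values are illustrative) compares the closed-form geometric Brownian motion above with an Euler-Maruyama discretization of (14.4.3) driven by the same Brownian increments:

```python
import numpy as np

# Sketch: exact GBM solution X_t = X_0 exp((mu - sigma^2/2) t + sigma W_t)
# versus Euler-Maruyama X_{k+1} = X_k (1 + mu dt + sigma dW_k) on the same
# Brownian path (illustrative parameters, not from the text).
rng = np.random.default_rng(4)
mu, sigma, X0, T, n = 0.05, 0.2, 1.0, 1.0, 10_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))
t = np.linspace(0.0, T, n + 1)

X_exact = X0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)

X_euler = np.empty(n + 1)
X_euler[0] = X0
for k in range(n):
    X_euler[k + 1] = X_euler[k] * (1.0 + mu * dt + sigma * dW[k])

print(abs(X_exact[-1] - X_euler[-1]))   # small strong error, shrinking with dt
```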
Several constants are introduced in the estimates that we prove in this section.
Since it is essential to keep track of them, we introduce the following
Convention 14.4.3 To indicate that a constant c depends solely and exclusively on
the values of the parameters .α1 , . . . , αn , we will write .c = c(α1 , . . . , αn ).
Lemma 14.4.4 ([!]) Let X, Y be adapted and a.s. continuous processes and p ≥ 2. Then:
• if b, σ satisfy the linear growth condition (14.4.1), there exists a positive constant c̄1 = c̄1(T, d, N, p, c1) such that

    E[ sup_{t0≤t≤t1} | ∫_{t0}^t b(s, X_s) ds + ∫_{t0}^t σ(s, X_s) dW_s |^p ]
        ≤ c̄1 (t1 − t0)^{(p−2)/2} ∫_{t0}^{t1} ( 1 + E[ sup_{t0≤r≤s} |X_r|^p ] ) ds   (14.4.4)

for every t1 ∈ ]t0, T[;
• if b, σ satisfy the global Lipschitz condition (14.4.2), there exists a positive constant c̄2 = c̄2(T, d, N, p, c2) such that

    E[ sup_{t0≤t≤t1} | ∫_{t0}^t (b(s, X_s) − b(s, Y_s)) ds + ∫_{t0}^t (σ(s, X_s) − σ(s, Y_s)) dW_s |^p ]
        ≤ c̄2 (t1 − t0)^{(p−2)/2} ∫_{t0}^{t1} E[ sup_{t0≤r≤s} |X_r − Y_r|^p ] ds   (14.4.5)

for every t1 ∈ ]t0, T[.

Proof We recall the elementary inequality

    |x1 + · · · + xn|^p ≤ n^{p−1} ( |x1|^p + · · · + |xn|^p ),   x1, …, xn ∈ ℝ^N,  n ∈ ℕ.   (14.4.6)

By Hölder's inequality, we have

$$E\bigg[\sup_{t_0\le t\le t_1}\bigg|\int_{t_0}^{t}b(s,X_s)\,ds\bigg|^p\bigg]\le(t_1-t_0)^{p-1}\,E\bigg[\int_{t_0}^{t_1}|b(s,X_s)|^p\,ds\bigg]\le$$

(by (14.4.1))

$$\le(t_1-t_0)^{p-1}c_1^p\int_{t_0}^{t_1}E\big[(1+|X_s|)^p\big]ds\le$$

(by (14.4.6))

$$\le 2^{p-1}(t_1-t_0)^{p-1}c_1^p\int_{t_0}^{t_1}\big(1+E\big[|X_s|^p\big]\big)ds\le 2^{p-1}(t_1-t_0)^{p-1}c_1^p\int_{t_0}^{t_1}\bigg(1+E\Big[\sup_{t_0\le r\le s}|X_r|^p\Big]\bigg)ds.$$

Similarly, by the Burkholder-Davis-Gundy inequality, in the version of Corollary 12.3.10, there exists a constant $c=c(d,N,p)$ such that

$$E\bigg[\sup_{t_0\le t\le t_1}\bigg|\int_{t_0}^{t}\sigma(s,X_s)\,dW_s\bigg|^p\bigg]\le c\,(t_1-t_0)^{\frac{p-2}{2}}\,E\bigg[\int_{t_0}^{t_1}|\sigma(s,X_s)|^p\,ds\bigg]\le$$

(proceeding as for the previous estimate)

$$\le c\,(t_1-t_0)^{\frac{p-2}{2}}\,2^{p-1}c_1^p\int_{t_0}^{t_1}\bigg(1+E\Big[\sup_{t_0\le r\le s}|X_r|^p\Big]\bigg)ds.$$

This proves (14.4.4).


Again, by Hölder's inequality, we have

$$E\bigg[\sup_{t_0\le t\le t_1}\bigg|\int_{t_0}^{t}\big(b(s,X_s)-b(s,Y_s)\big)\,ds\bigg|^p\bigg]\le(t_1-t_0)^{p-1}\,E\bigg[\int_{t_0}^{t_1}|b(s,X_s)-b(s,Y_s)|^p\,ds\bigg]\le$$

(by (14.4.2))

$$\le(t_1-t_0)^{p-1}c_2^p\int_{t_0}^{t_1}E\big[|X_s-Y_s|^p\big]ds\le(t_1-t_0)^{p-1}c_2^p\int_{t_0}^{t_1}E\Big[\sup_{t_0\le r\le s}|X_r-Y_r|^p\Big]ds.$$

Similarly, by Corollary 12.3.10, we have

$$E\bigg[\sup_{t_0\le t\le t_1}\bigg|\int_{t_0}^{t}\big(\sigma(s,X_s)-\sigma(s,Y_s)\big)\,dW_s\bigg|^p\bigg]\le c_p\,(t_1-t_0)^{\frac{p-2}{2}}\,E\bigg[\int_{t_0}^{t_1}|\sigma(s,X_s)-\sigma(s,Y_s)|^p\,ds\bigg]\le$$

(proceeding as for the previous estimate, by (14.4.2))

$$\le c_p\,(t_1-t_0)^{\frac{p-2}{2}}c_2^p\int_{t_0}^{t_1}E\Big[\sup_{t_0\le r\le s}|X_r-Y_r|^p\Big]ds.$$

This proves (14.4.5). $\square$


14.5 Some A Priori Estimates

In this section, we prove some polynomial and exponential integrability estimates


for the solutions of SDEs whose coefficients satisfy the linear growth assumption
(14.4.1). We use the term “a priori” estimates because condition (14.4.1) alone is
not enough to ensure the existence of a solution: existence is therefore implicitly
assumed as a hypothesis. The following estimates have considerable theoretical
importance (for example, for the proof of Feynman-Kac’s Theorem 15.4.4) and
practical applications (for example, for the results of continuous dependence on
parameters of Sect. 17.4 and the study of the convergence of numerical approxima-
tion schemes for SDEs). On the other hand, the proofs of this section, technical and
not very informative, can be skipped at first reading.
To lighten the notation, in this section we assume $t_0=0$ and for each stochastic process X we set

$$\bar X_t:=\sup_{0\le s\le t}|X_s|.$$

Hereafter, we will repeatedly use the following classic

Lemma 14.5.1 (Grönwall) Consider $v\in L^1([0,T])$ such that

$$v(t)\le a+b\int_0^t v(s)\,ds,\qquad t\in[0,T],$$

where a and b are non-negative real numbers. Then we have

$$v(t)\le a\,e^{bt},\qquad t\in[0,T].$$

In Grönwall's lemma, the integrability assumption on v is necessary: a counterexample is given by $v(t)=0$ for $t=0$ and $v(t)=\frac{1}{t}$ for $t>0$, with $a=0$ and $b=1$. If we add the assumptions $v\ge0$ and $a=0$ to the hypotheses of Grönwall's lemma, then $v\equiv0$.
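Grönwall's lemma is easy to explore numerically: building v on a grid so that the integral inequality holds with equality (the extremal case) yields the discrete analogue $v(t_k)=a(1+bh)^k\le a\,e^{bt_k}$. A minimal sketch (plain Python; all parameter values are our arbitrary choices):

```python
# Numerical illustration of Gronwall's lemma: if v(t) <= a + b * int_0^t v(s) ds,
# then v(t) <= a * exp(b t).  We build v on a grid so that the integral
# inequality holds with equality (the extremal case) and check the bound.
import math

a, b, T, n = 2.0, 1.5, 1.0, 10_000
h = T / n

v = [a]  # v(0) = a
integral = 0.0
for _ in range(n):
    integral += v[-1] * h          # left-endpoint Riemann sum of int_0^t v
    v.append(a + b * integral)     # equality in the Gronwall hypothesis

# The discrete solution satisfies v(t_k) = a (1 + b h)^k <= a e^{b t_k}.
for k in (0, n // 2, n):
    t = k * h
    assert v[k] <= a * math.exp(b * t) + 1e-9

print(f"v(T) = {v[n]:.4f},  bound a*e^(bT) = {a * math.exp(b * T):.4f}")
```

As the grid is refined, the discrete extremal solution converges to $a\,e^{bt}$, showing that the exponential bound is sharp.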


Theorem 14.5.2 (A Priori $L^p$ Estimates) Let $X=(X_t)_{t\in[0,T]}$ be a solution of the SDE

$$dX_t=b(t,X_t)\,dt+\sigma(t,X_t)\,dW_t,$$

with $b,\sigma$ satisfying the linear growth assumption (14.4.1). Then for every $T>0$ and $p\ge2$ there exists a positive constant $c=c(T,p,d,N,c_1)$ such that

$$E\Big[\sup_{0\le t\le T}|X_t|^p\Big]\le c\,\big(1+E\big[|X_0|^p\big]\big). \tag{14.5.1}$$

Proof It is not restrictive to assume $E[|X_0|^p]<\infty$, otherwise the thesis is obvious. The general idea of the proof is simple: from estimate (14.4.4) we have

$$v(t):=E\big[\bar X_t^p\big]\le 2^{p-1}\bigg(E\big[|X_0|^p\big]+\bar c_1\int_0^t\Big(1+E\big[\bar X_s^p\big]\Big)ds\bigg),\qquad t\in[0,T],$$

or equivalently

$$v(t)\le c\bigg(1+E\big[|X_0|^p\big]+\int_0^t v(s)\,ds\bigg),\qquad t\in[0,T],$$

and therefore the thesis would follow directly from Grönwall's lemma.
As a matter of fact, to apply Grönwall's lemma, it is necessary to know a priori that $v\in L^1([0,T])$ (based on what has been proven so far, we do not even know whether v is a continuous function). For this reason, it is necessary to proceed more carefully using

a technical localization argument. Let

$$\tau_n=\inf\{t\in[0,T]\mid |X_t|\ge n\},\qquad n\in\mathbb{N},$$

with the convention $\inf\emptyset=T$. Being X a.s. continuous, $(\tau_n)$ is an increasing sequence of stopping times such that $\tau_n\nearrow T$ a.s. With $b_n,\sigma_n$ as in (17.1.2), we have

$$X_{t\wedge\tau_n}=X_0+\int_0^{t\wedge\tau_n}b(s,X_s)\,ds+\int_0^{t\wedge\tau_n}\sigma(s,X_s)\,dW_s=X_0+\int_0^{t}b_n(s,X_{s\wedge\tau_n})\,ds+\int_0^{t}\sigma_n(s,X_{s\wedge\tau_n})\,dW_s.$$

The coefficients $b_n=b_n(t,x)$ and $\sigma_n=\sigma_n(t,x)$, although stochastic, satisfy the linear growth condition (14.4.1) with the same constant $c_1$: the proof of estimate (14.4.4) can be repeated in a substantially identical way to the case of deterministic $b,\sigma$, to obtain

$$v_n(t_1):=E\Big[\sup_{0\le t\le t_1}|X_{t\wedge\tau_n}|^p\Big]\le 2^{p-1}\bigg(E\big[|X_0|^p\big]+\bar c_1\int_0^{t_1}\Big(1+\underbrace{E\Big[\sup_{0\le r\le s}|X_{r\wedge\tau_n}|^p\Big]}_{=v_n(s)}\Big)ds\bigg),\qquad t_1\in[0,T],$$

or equivalently

$$v_n(t_1)\le c\bigg(1+E\big[|X_0|^p\big]+\int_0^{t_1}v_n(s)\,ds\bigg),\qquad t_1\in[0,T],$$

with c a positive constant that depends only on $T,p,d,N,c_1$ and not on n.
We observe that $v_n$ is a measurable and bounded function since $|X_{t\wedge\tau_n}|\le|X_0|\mathbb{1}_{(|X_0|\ge n)}+n\mathbb{1}_{(|X_0|<n)}$ and therefore $v_n(t)\le E\big[(|X_0|+n)^p\big]<+\infty$: then by Grönwall's lemma we have

$$E\Big[\sup_{0\le t\le T}|X_{t\wedge\tau_n}|^p\Big]=v_n(T)\le c\,e^{cT}\big(1+E\big[|X_0|^p\big]\big),$$

and taking the limit as n goes to infinity, we get (14.5.1) by Beppo Levi's theorem. $\square$
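Estimates of the type (14.5.1) can be made concrete in the simplest case $b=0$, $\sigma=1$, where $X=W$ is a Brownian motion and Doob's maximal inequality gives the explicit bound $E[\sup_{0\le t\le T}|W_t|^2]\le 4E[|W_T|^2]=4T$. A small Monte Carlo sketch (plain Python; seed and sample sizes are our arbitrary choices):

```python
# Monte Carlo check of a sup-moment bound in the simplest case X = W (b = 0, sigma = 1):
# Doob's L^2 maximal inequality gives  E[ sup_{t<=T} |W_t|^2 ] <= 4 E[|W_T|^2] = 4T.
import math
import random

random.seed(0)
T, n_steps, n_paths = 1.0, 200, 4000
dt = T / n_steps
sq = math.sqrt(dt)

est = 0.0
for _ in range(n_paths):
    w, m = 0.0, 0.0
    for _ in range(n_steps):
        w += sq * random.gauss(0.0, 1.0)
        m = max(m, abs(w))           # running maximum of |W|
    est += m * m
est /= n_paths                        # estimate of E[ sup |W_t|^2 ]

print(f"E[sup|W|^2] ~ {est:.3f}  (Doob bound: {4 * T})")
assert T * 0.8 <= est <= 4 * T        # between E[|W_T|^2] (up to error) and the bound
```

The estimate lies strictly between $E[|W_T|^2]=T$ and the Doob bound $4T$, illustrating the finiteness of the sup-moment that (14.5.1) guarantees in general.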


If the diffusive coefficient $\sigma$ is bounded, a stronger integrability estimate than that of Theorem 14.5.2 holds.

Theorem 14.5.3 (A Priori Exponential Estimate) Let $X=(X_t)_{t\in[0,T]}$ be the solution of the SDE

$$dX_t=b(t,X_t)\,dt+\sigma(t,X_t)\,dW_t,$$

with b satisfying the linear growth assumption (14.4.1) and $\sigma$ bounded by a constant $\kappa$, i.e., $|\sigma(t,x)|\le\kappa$ for $(t,x)\in[0,T]\times\mathbb{R}^N$. Then there exist two positive constants $\alpha$ and c, depending only on $T,\kappa,c_1$ and N, such that

$$E\big[e^{\alpha\bar X_T^2}\big]\le c\,E\big[e^{c|X_0|^2}\big],\qquad\bar X_T:=\sup_{0\le t\le T}|X_t|.$$

Proof Let

$$\bar M_T=\sup_{0\le t\le T}\bigg|\int_0^t\sigma(s,X_s)\,dW_s\bigg|.$$

Given $\delta>0$, almost surely on $(\bar M_T<\delta)$ we have

$$|X_t|<|X_0|+c_1\int_0^t(1+\bar X_s)\,ds+\delta,\qquad t\in[0,T],$$

so that, by Grönwall's lemma,

$$\bar X_T<(|X_0|+c_1T+\delta)\,e^{c_1T}.$$

Consequently

$$\big(\bar X_T\ge(|X_0|+c_1T+\delta)e^{c_1T}\big)\subseteq\big(\bar M_T\ge\delta\big)$$

and by Proposition 13.2.4 (and estimate (13.2.5)) there exists a positive constant c, depending only on $N,\kappa$ and T, such that¹⁶

$$P\big(\bar X_T\ge(|X_0|+c_1T+\delta)e^{c_1T}\mid X_0\big)\le c\,e^{-\frac{\delta^2}{c}}. \tag{14.5.2}$$

Let $\lambda=(|X_0|+c_1T+\delta)e^{c_1T}$ and observe that

$$\delta=\lambda e^{-c_1T}-|X_0|-c_1T\ge\frac{\lambda}{2}\,e^{-c_1T}\qquad\text{if }\lambda\ge\bar a|X_0|+\bar b, \tag{14.5.3}$$

¹⁶ Provided that we switch to the canonical setting by means of Lemma 14.3.5 (this is not restrictive since the thesis depends only on the law of X), a regular version of the conditional probability exists and estimate (14.5.2) holds pointwise as a consequence of Proposition 13.2.4.

with $\bar a:=2e^{c_1T}$ and $\bar b:=2c_1Te^{c_1T}$. So, combining (14.5.2) and (14.5.3), we have

$$P\big(\bar X_T\ge\lambda\mid X_0\big)\le c\,e^{-\bar c\lambda^2},\qquad\lambda\ge\bar a|X_0|+\bar b, \tag{14.5.4}$$

with $c,\bar c$ positive constants depending only on $T,\kappa,c_1$ and N. Now we apply Proposition 3.1.6 in [113] with $f(\lambda)=e^{\alpha\lambda^2}$, where the constant $\alpha>0$ will be determined later: we have

$$E\big[e^{\alpha\bar X_T^2}\mid X_0\big]=1+2\alpha\int_0^\infty\lambda e^{\alpha\lambda^2}\,P\big(\bar X_T\ge\lambda\mid X_0\big)\,d\lambda\le$$

(by (14.5.4))

$$\le 1+2\alpha\int_0^{\bar a|X_0|+\bar b}\lambda e^{\alpha\lambda^2}\,d\lambda+2\alpha c\int_{\bar a|X_0|+\bar b}^{+\infty}\lambda e^{\lambda^2(\alpha-\bar c)}\,d\lambda.$$

The thesis follows by setting $\alpha=\frac{\bar c}{2}$ and applying the expected value. $\square$

14.6 Key Ideas to Remember

We summarize the most significant findings of the chapter and the fundamental
concepts to be retained from an initial reading, while disregarding the more technical
or secondary matters. As usual, if you have any doubt about what the following
succinct statements mean, please review the corresponding section.
• Section 14.1: we introduce the concepts of solution of an SDE on a set-up $(W,\mathcal F_t)$ and of solvability of an SDE in the strong sense (i.e., with solutions adapted to the filtration generated by the initial datum and by W) and in the weak sense: in the latter case, not having the set-up fixed a priori, a solution is constituted by the pair $(X,W)$.
• Section 14.2: thanks to the regularizing effect of the Brownian motion and in contrast with what happens in the deterministic case, we can have existence and uniqueness of the solution of an SDE with a strongly irregular drift coefficient, even one that is only measurable and bounded.
• Section 14.3: the solution transfer technique allows us to set the problem of solvability of an SDE in the canonical space of continuous trajectories: this is particularly useful for the study of weak solutions. The Yamada-Watanabe Theorem clarifies the relationship between the concepts of solvability in a weak and strong sense:
(i) if an SDE is solvable in the strong sense, then it is also solvable in the weak sense;
(ii) if for an SDE there is uniqueness in the strong sense, then there is also uniqueness in the weak sense;
(iii) if for an SDE there is solvability in the weak sense and uniqueness in the strong sense, then there is solvability in the strong sense.
• Sections 14.4 and 14.5: under the "standard assumptions" of linear growth and Lipschitz continuity of the coefficients, we prove some integrability estimates that will be crucial in the study of strong solutions.
Main notations used or introduced in this chapter:

| Symbol | Description | Page |
| $(W,\mathcal F_t)$ | Set-up | 264 |
| $W=(W_t)_{t\in[t_0,T]}$ | Brownian motion with initial point $t_0$ | 264 |
| $\mathcal F^W$ | Standard Brownian filtration | 264 |
| $X\in\mathrm{SDE}(b,\sigma,W,\mathcal F_t)$ | X is the solution of the SDE with coefficients $b,\sigma$ related to $(W,\mathcal F_t)$ | 265 |
| $\mathcal F^{Z,W}$ | (Completed) filtration generated by $Z\in m\mathcal F_{t_0}$ and W | 266 |
| $\Omega^n=C([0,T];\mathbb R^n)$ | Space of continuous n-dimensional trajectories | 273 |
| $X_t(w)=w(t)$ | Identity process on $\Omega^n$ | 273 |
| $(\mathcal G^n_t)_{t\in[0,T]}$ | Filtration on $\Omega^n$ generated by the identity process | 273 |
Chapter 15
Feynman-Kac Formulas

I may never find all the answers


I may never understand why
I may never prove
What I know to be true
But I know that I still have to try
Dream Theater, The spirit carries on

Consider the SDE

$$dX_t=b(t,X_t)\,dt+\sigma(t,X_t)\,dW_t \tag{15.0.1}$$

where W is a d-dimensional Brownian motion and

$$b=b(t,x):\,]0,T[\,\times\mathbb R^N\longrightarrow\mathbb R^N,\qquad\sigma=\sigma(t,x):\,]0,T[\,\times\mathbb R^N\longrightarrow\mathbb R^{N\times d}.$$

If there exists a solution $X^{t,x}=(X_s^{t,x})_{s\in[t,T]}$ to (15.0.1) with initial datum $(t,x)$, then by Itô's formula, for any suitably smooth function u we have

$$u(s,X_s^{t,x})=u(t,x)+\int_t^s(\partial_r+\mathcal A_r)u(r,X_r^{t,x})\,dr+\int_t^s\nabla u(r,X_r^{t,x})\,\sigma(r,X_r^{t,x})\,dW_r,\qquad s\in[t,T], \tag{15.0.2}$$

where

$$\mathcal A_t:=\frac12\sum_{i,j=1}^Nc_{ij}(t,x)\,\partial_{x_ix_j}+\sum_{j=1}^Nb_j(t,x)\,\partial_{x_j},\qquad c:=\sigma\sigma^{*}, \tag{15.0.3}$$

is the so-called characteristic operator of the SDE (15.0.1) (see Definition 15.1.1).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 287
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_15

The Feynman-Kac formulas offer a probabilistic framework for expressing solutions to partial differential equations (abbreviated as PDEs) that involve the operator $\mathcal A_t$. To fix ideas, suppose there exists a classical solution to the backward Cauchy problem

$$\begin{cases}(\partial_t+\mathcal A_t)u(t,x)=0,&(t,x)\in\,]0,T[\,\times\mathbb R^N,\\ u(T,x)=\varphi(x),&x\in\mathbb R^N.\end{cases} \tag{15.0.4}$$

Then, (15.0.2) reduces to

$$u(s,X_s^{t,x})=u(t,x)+\int_t^s\nabla u(r,X_r^{t,x})\,\sigma(r,X_r^{t,x})\,dW_r,\qquad s\in[t,T],$$

and therefore the process $s\mapsto u(s,X_s^{t,x})$ is a local martingale: moreover, if $(u(s,X_s^{t,x}))_{s\in[t,T]}$ is a true martingale, by taking the expectation and using the final condition $u(T,\cdot)=\varphi$, we obtain

$$u(t,x)=E\big[u(T,X_T^{t,x})\big]=E\big[\varphi(X_T^{t,x})\big]. \tag{15.0.5}$$

Formula (15.0.5) provides a representation of the solution of (15.0.4) in terms of the final datum $\varphi$: from an application standpoint, this formula can be readily implemented using Monte Carlo methods for the numerical approximation of the solution; from a theoretical perspective, Eq. (15.0.5) provides a uniqueness result for the solution of problem (15.0.4).
In this chapter, we examine various variants and generalizations of formula (15.0.5), valid for second-order partial differential operators of elliptic and parabolic type.
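As a concrete illustration of the Monte Carlo use of (15.0.5), take $\mathcal A_t=\frac12\partial_{xx}$, so that $X_T^{t,x}=x+W_{T-t}$ for a real Brownian motion W, and $\varphi(x)=x^2$: then $u(t,x)=E[\varphi(X_T^{t,x})]=x^2+(T-t)$ is a classical solution of (15.0.4). A minimal sketch (plain Python; seed and sample size are our arbitrary choices):

```python
# Monte Carlo evaluation of u(t,x) = E[phi(X_T^{t,x})] for the backward heat
# equation (A_t = 1/2 d^2/dx^2, so X_T^{t,x} = x + W_{T-t}) with phi(x) = x^2.
# The exact solution is u(t,x) = x^2 + (T - t).
import math
import random

random.seed(42)
T, t, x = 1.0, 0.25, 1.0
n_samples = 20_000
s = math.sqrt(T - t)                 # standard deviation of W_{T-t}

mc = sum((x + s * random.gauss(0.0, 1.0)) ** 2 for _ in range(n_samples)) / n_samples
exact = x * x + (T - t)

print(f"Monte Carlo: {mc:.4f},  exact: {exact:.4f}")
assert abs(mc - exact) < 0.1         # statistical error is O(1/sqrt(n_samples))
```

Here no discretization of the SDE is needed because $X_T^{t,x}$ can be sampled exactly; in general an Euler scheme replaces the exact sampling step.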

15.1 Characteristic Operator of an SDE

Consider an SDE of the form (15.0.1) with coefficients $b,\sigma\in L^\infty_{\mathrm{loc}}$ that satisfy the linear growth assumption (14.4.1). Suppose there exists a solution $X^{t,x}=(X_s^{t,x})_{s\in[t,T]}$ with initial datum $(t,x)$. Then, given a function $\psi=\psi(x)\in bC^2(\mathbb R^N)$ (i.e., $\psi$ has continuous and bounded derivatives up to the second order), by Itô's formula we have

$$E\bigg[\frac{\psi(X_s^{t,x})-\psi(x)}{s-t}\bigg]=E\bigg[\frac{1}{s-t}\int_t^s\mathcal A_r\psi(X_r^{t,x})\,dr+\frac{1}{s-t}\int_t^s\nabla\psi(X_r^{t,x})\,\sigma(r,X_r^{t,x})\,dW_r\bigg]=$$

(since $|\nabla\psi(X_r^{t,x})\sigma(r,X_r^{t,x})|\le c(1+|X_r^{t,x}|)\in L^2$ by the a priori integrability estimates of Theorem 14.5.2)

$$=E\bigg[\frac{1}{s-t}\int_t^s\mathcal A_r\psi(X_r^{t,x})\,dr\bigg]\xrightarrow[\,s-t\to0^+\,]{}\mathcal A_t\psi(x),$$

where we used the dominated convergence theorem and the estimates of Theorem 14.5.2 to evaluate the limit: thus, we have

$$\frac{d}{ds}E\big[\psi(X_s^{t,x})\big]\Big|_{s=t}=\mathcal A_t\psi(x). \tag{15.1.1}$$

This serves as the motivation for the following definition, which mirrors formula (2.5.5) for Markov processes.

Definition 15.1.1 (Characteristic Operator of an SDE) The operator $\mathcal A_t$ in (15.0.3) is called the characteristic operator of the SDE (15.0.1).
Remark 15.1.2 ([!]) Given $m\in\mathbb R^N$, consider the functions

$$\psi_i(x):=x_i,\qquad\psi_{ij}(x):=(x_i-m_i)(x_j-m_j),\qquad x\in\mathbb R^N,\ i,j=1,\dots,N,$$

and observe that

$$\mathcal A_t\psi_i(x)=b_i(t,x),\qquad\mathcal A_t\psi_{ij}(x)=c_{ij}(t,x)+b_i(t,x)(x_j-m_j)+b_j(t,x)(x_i-m_i).$$

Formula (15.1.1) is valid with $\psi=\psi_i$ and $\psi=\psi_{ij}$: this can be proved using the same arguments as above, since the linear growth hypothesis on the coefficients $b,\sigma$ and the $L^p$ estimates of Theorem 14.5.2 justify the convergence and the martingale property of the stochastic integrals. Thus, we have

$$\frac{d}{ds}E\big[X_s^{t,x}\big]\Big|_{s=t}=b(t,x), \tag{15.1.2}$$

$$\frac{d}{ds}E\big[(X_s^{t,x}-m)_i(X_s^{t,x}-m)_j\big]\Big|_{s=t}=c_{ij}(t,x)+b_i(t,x)(x_j-m_j)+b_j(t,x)(x_i-m_i),$$

and in particular, for $m=x$,

$$\frac{d}{ds}E\big[(X_s^{t,x}-x)_i(X_s^{t,x}-x)_j\big]\Big|_{s=t}=c_{ij}(t,x). \tag{15.1.3}$$

Based on formulas (15.1.2) and (15.1.3), the coefficients $b_i(t,x)$ and $c_{ij}(t,x)$ represent the infinitesimal increments of the expectation and covariance matrix of $X^{t,x}$, in agreement with Remark 2.5.8.
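Formulas (15.1.2) and (15.1.3) can be checked on an SDE with explicitly known moments; here we take (our choice) the Ornstein-Uhlenbeck equation $dX_t=-X_t\,dt+dW_t$, for which $b(x)=-x$ and $c=\sigma\sigma^*=1$, and differentiate the exact moments numerically:

```python
# Check of (15.1.2)-(15.1.3) for the Ornstein-Uhlenbeck SDE dX = -X dt + dW
# (b(x) = -x, c = sigma sigma* = 1), using the exact moments
#   E[X_s^{t,x}] = x e^{-(s-t)},
#   E[(X_s^{t,x} - x)^2] = Var + (E[X_s^{t,x}] - x)^2,  Var = (1 - e^{-2(s-t)}) / 2,
# and a forward finite difference in h = s - t.
import math

x, h = 1.7, 1e-6

mean = lambda h: x * math.exp(-h)
second = lambda h: (1 - math.exp(-2 * h)) / 2 + (mean(h) - x) ** 2

d_mean = (mean(h) - mean(0.0)) / h      # should approach b(x) = -x
d_cov = (second(h) - second(0.0)) / h   # should approach c = 1

assert abs(d_mean - (-x)) < 1e-4
assert abs(d_cov - 1.0) < 1e-4
print(f"d/ds E[X_s] at s=t: {d_mean:.6f} (b(x) = {-x}),  d/ds E[(X_s-x)^2]: {d_cov:.6f} (c = 1)")
```

The first derivative recovers the drift and the second recovers the diffusion coefficient, exactly as the two formulas predict.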

Remark 15.1.3 Given $u\in C^{1,2}(\mathbb R^{N+1})$, by Itô's formula, the process

$$M_s:=u(s,X_s^{t,x})-\int_t^s(\partial_r+\mathcal A_r)u(r,X_r^{t,x})\,dr,\qquad s\ge t,$$

is a local martingale: this result is similar to Theorem 2.5.13 and shows how to "compensate" the process $s\mapsto u(s,X_s^{t,x})$ to obtain a (local) martingale. These similarities between Markov processes and solutions of SDEs are not coincidental: we will prove later (see Theorems 17.3.1 and 18.2.3) that, under suitable assumptions on the coefficients, the solution of an SDE is a diffusion.

15.2 Exit Time from a Bounded Domain

In this section, we provide some simple conditions that ensure that the first exit time of the solution of the SDE (15.0.1) from a bounded domain¹ D of $\mathbb R^N$ is absolutely integrable and therefore a.s. finite. We make the following

Assumption 15.2.1
(i) The coefficients of the SDE (15.0.1) are measurable and locally bounded, $b,\sigma\in L^\infty_{\mathrm{loc}}([0,+\infty[\,\times\mathbb R^N)$;
(ii) for every $t\ge0$ and $x\in D$ there exists a solution $X^{t,x}$ of (15.0.1) with initial condition $X_t^{t,x}=x$, on a set-up $(W,\mathcal F_t)$.

We denote by $\tau_{t,x}$ the first exit time of $X^{t,x}$ from D,

$$\tau_{t,x}=\inf\{s\ge t\mid X_s^{t,x}\notin D\},$$

and for simplicity, we write $X^{0,x}=X^x$ and $\tau_{0,x}=\tau_x$.


Proposition 15.2.2 Assume that there exists a function $f\in C^2(\mathbb R^N)$, non-negative on D and such that

$$\mathcal A_tf(x)\le-1,\qquad t\ge0,\ x\in D. \tag{15.2.1}$$

Then $E[\tau_x]$ is finite for every $x\in D$. In particular, such a function exists if for some $\lambda>0$ and $i\in\{1,\dots,N\}$ we have²

$$c_{ii}(t,x)\ge\lambda,\qquad t\ge0,\ x\in D. \tag{15.2.2}$$

¹ Open and connected set.

² Formula (15.2.2) is a non-total degeneracy condition on the matrix $(c_{ij})$ of the second-order coefficients of the characteristic operator $\mathcal A_t$ in (15.0.3): it is obviously satisfied if $(c_{ij})$ is uniformly positive definite.

Proof For a fixed time t, by Itô's formula, we have

$$f\big(X_{t\wedge\tau_x}^x\big)=f(x)+\int_0^{t\wedge\tau_x}\mathcal A_sf(X_s^x)\,ds+\int_0^{t\wedge\tau_x}\nabla f(X_s^x)\,\sigma(s,X_s^x)\,dW_s.$$

Since $\nabla f$ and $\sigma(s,\cdot)$ are bounded on D for $s\le t$, the stochastic integral has zero expectation and by (15.2.1) we have

$$E\big[f\big(X_{t\wedge\tau_x}^x\big)\big]\le f(x)-E[t\wedge\tau_x];$$

thus, since $f\ge0$,

$$E[t\wedge\tau_x]\le f(x).$$

Finally, taking the limit as $t\to\infty$, by Beppo Levi's theorem, we obtain

$$E[\tau_x]\le f(x).$$

Now suppose that (15.2.2) holds and consider only the case $i=1$: then it is enough to set

$$f(x)=\alpha\big(e^{\beta R}-e^{\beta x_1}\big),$$

where $\alpha,\beta$ are suitable positive constants and R is large enough so that D is included in the Euclidean ball of radius R centered at the origin. Indeed, f is non-negative on D and we have

$$\mathcal A_tf(x)=-\alpha e^{\beta x_1}\Big(\frac12c_{11}(t,x)\beta^2+b_1(t,x)\beta\Big)\le-\alpha\beta e^{-\beta R}\Big(\frac{\lambda\beta}{2}-\|b\|_{L^\infty(D)}\Big),$$

hence the thesis by choosing $\alpha,\beta$ suitably large. $\square$
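The construction used in the proof can be verified numerically. The sketch below (plain Python; the operator, domain and constants are our arbitrary choices consistent with the argument above) checks that $f(x)=\alpha(e^{\beta R}-e^{\beta x_1})$ is non-negative and satisfies $\mathcal A f\le-1$ on D:

```python
# Check of Proposition 15.2.2's construction: f(x) = alpha (e^{beta R} - e^{beta x1})
# satisfies A f <= -1 on D when c11 >= lambda > 0.  We take (our arbitrary choice)
#   A = 1/2 d^2/dx1^2 + b1(x1) d/dx1  with c11 = 1, b1(x1) = sin(x1), D = ]-1, 1[,
# so  A f(x) = -alpha e^{beta x1} (beta^2 / 2 + b1(x1) beta).
import math

R, lam, b_sup = 1.0, 1.0, 1.0       # D inside ball of radius R, c11 >= lam, |b1| <= b_sup
beta = 4.0                          # beta large enough: lam * beta / 2 - b_sup > 0
alpha = 15.0                        # alpha large enough: alpha*beta*e^{-beta R}*(lam*beta/2 - b_sup) >= 1

def Af(x1: float) -> float:
    return -alpha * math.exp(beta * x1) * (beta**2 / 2 + math.sin(x1) * beta)

# f >= 0 and A f <= -1 on a fine grid of D
for k in range(2001):
    x1 = -R + k * (2 * R / 2000)
    f = alpha * (math.exp(beta * R) - math.exp(beta * x1))
    assert f >= 0.0
    assert Af(x1) <= -1.0

print("A f <= -1 and f >= 0 on D: construction verified")
```

The worst case occurs at the left end of the interval, where $e^{\beta x_1}$ is smallest, exactly as the bound $-\alpha\beta e^{-\beta R}(\lambda\beta/2-\|b\|_{L^\infty(D)})$ in the proof suggests.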



Remark 15.2.3 It is easy to determine a condition on the first-order terms, analogous to that of Proposition 15.2.2: if there exist $\lambda>0$ and $i\in\{1,\dots,N\}$ such that $b_i(t,\cdot)\ge\lambda$ or $b_i(t,\cdot)\le-\lambda$ on D for every $t\ge0$, then $E[\tau_x]$ is finite. In fact, suppose for example that $b_1(t,x)\ge\lambda$: then applying Itô's formula to the function $f(x)=x_1$ we have

$$\big(X_{t\wedge\tau_x}^x\big)_1=x_1+\int_0^{t\wedge\tau_x}b_1(s,X_s^x)\,ds+\sum_{i=1}^d\int_0^{t\wedge\tau_x}\sigma_{1i}(s,X_s^x)\,dW_s^i,$$

and in expectation

$$E\Big[\big(X_{t\wedge\tau_x}^x\big)_1\Big]\ge x_1+\lambda\,E[t\wedge\tau_x],$$

which proves the claim in the limit as $t\to\infty$.

15.3 The Autonomous Case: The Dirichlet Problem

In this section, we consider the case where the coefficients $b=b(x)$ and $\sigma=\sigma(x)$ of the SDE (15.0.1) are independent of time and therefore denote $\mathcal A_t$ in (15.0.3) simply as $\mathcal A$. In many respects, this condition is not restrictive, since even problems with time dependence can be treated in this context by inserting time among the state variables as in the following Example 15.3.7. In addition to Assumption 15.2.1, we assume that $E[\tau_x]$ is finite for every $x\in D$, where D is a bounded domain.
The following result provides a representation formula (and, consequently, a uniqueness result) for the classical solutions of the Dirichlet problem for the elliptic-parabolic operator $\mathcal A$:

$$\begin{cases}\mathcal A u-au=f,&\text{in }D,\\ u|_{\partial D}=\varphi,\end{cases} \tag{15.3.1}$$

where $f,a,\varphi$ are given functions. As previously stated, formula (15.3.2) serves as the foundation for Monte Carlo-type methods used in the numerical approximation of solutions to the Dirichlet problem (15.3.1).
Theorem 15.3.1 (Feynman-Kac Formula [!!]) Let $f\in L^\infty(D)$, $\varphi\in C(\partial D)$ and $a\in C(D)$ such that $a\ge0$. If $u\in C^2(D)\cap C(\bar D)$ is a solution of the Dirichlet problem (15.3.1) then for every $x\in D$ we have

$$u(x)=E\bigg[e^{-\int_0^{\tau_x}a(X_t^x)\,dt}\,\varphi\big(X_{\tau_x}^x\big)-\int_0^{\tau_x}e^{-\int_0^ta(X_s^x)\,ds}f(X_t^x)\,dt\bigg]. \tag{15.3.2}$$

Proof For $\varepsilon>0$ sufficiently small, let $D_\varepsilon$ be a domain such that

$$x\in D_\varepsilon,\qquad\bar D_\varepsilon\subseteq D,\qquad\mathrm{dist}(\partial D_\varepsilon,\partial D)\le\varepsilon.$$

Let $\tau_\varepsilon$ be the exit time of $X^x$ from $D_\varepsilon$ and observe that, being $X^x$ continuous (Fig. 15.1),

$$\lim_{\varepsilon\to0^+}\tau_\varepsilon=\tau_x.$$

Fig. 15.1 The domain of a Dirichlet problem and two trajectories of the corresponding solution of the associated SDE

Let

$$Z_t=e^{-\int_0^ta(X_s^x)\,ds},$$

and note that, by hypothesis, $Z_t\in\,]0,1]$. Moreover, if $u_\varepsilon\in C_0^2(\mathbb R^N)$ is such that $u_\varepsilon=u$ on $D_\varepsilon$, by Itô's formula we have

$$d\big(Z_tu_\varepsilon(X_t^x)\big)=Z_t\big((\mathcal A u_\varepsilon-au_\varepsilon)(X_t^x)\,dt+\nabla u_\varepsilon(X_t^x)\,\sigma(X_t^x)\,dW_t\big)$$

so that

$$Z_{\tau_\varepsilon}u\big(X_{\tau_\varepsilon}^x\big)=u(x)+\int_0^{\tau_\varepsilon}Z_tf(X_t^x)\,dt+\int_0^{\tau_\varepsilon}Z_t\nabla u(X_t^x)\,\sigma(X_t^x)\,dW_t.$$

Since $\nabla u$ and $\sigma$ are bounded on D, in expectation we obtain

$$u(x)=E\bigg[Z_{\tau_\varepsilon}u\big(X_{\tau_\varepsilon}^x\big)-\int_0^{\tau_\varepsilon}Z_tf(X_t^x)\,dt\bigg].$$

Letting $\varepsilon\to0^+$, we get the thesis by dominated convergence: indeed, recalling that $Z_t\in\,]0,1]$, we have

$$\big|Z_{\tau_\varepsilon}u\big(X_{\tau_\varepsilon}^x\big)\big|\le\|u\|_{L^\infty(D)},\qquad\bigg|\int_0^{\tau_\varepsilon}Z_tf(X_t^x)\,dt\bigg|\le\tau_x\|f\|_{L^\infty(D)},$$

and, by hypothesis, $\tau_x$ is absolutely integrable. $\square$


Remark 15.3.2 The assumption $a\ge0$ in Theorem 15.3.1 is essential: for example, the function

$$u(x,y)=\sin x\,\sin y$$

is a solution to the Dirichlet problem

$$\begin{cases}\frac12\Delta u+u=0,&\text{in }D=\,]0,2\pi[\,\times\,]0,2\pi[\,,\\ u|_{\partial D}=0,\end{cases}$$

but does not satisfy (15.3.2).


Remark 15.3.3 (Maximum Principle) Under the assumptions of Theorem 15.3.1, from formula (15.3.2) it follows that if $f\ge0$ then

$$u(x)\le E\Big[e^{-\int_0^{\tau_x}a(X_t^x)\,dt}\varphi\big(X_{\tau_x}^x\big)\Big]\le\max_{\partial D}\varphi.$$

Moreover, when $f=a=0$, the following "maximum principle" holds:

$$\min_{\partial D}u\le u(x)\le\max_{\partial D}u.$$

Existence results for problem (15.3.1) are well known in the uniformly elliptic case: we recall the following classical theorem (see, for example, Theorem 6.13 in [53]).

Theorem 15.3.4 Under the following assumptions:
(i) $\mathcal A$ in (15.0.3) is a uniformly elliptic operator, i.e., there exists a constant $\lambda>0$ such that

$$\sum_{i,j=1}^Nc_{ij}(x)\xi_i\xi_j\ge\lambda|\xi|^2,\qquad x\in D,\ \xi\in\mathbb R^N;$$

(ii) the coefficients are Hölder continuous functions, $c_{ij},b_j,a,f\in C^\alpha(D)$; moreover, the functions $c_{ij},b_j,f$ are bounded and $a\ge0$;
(iii) for each $y\in\partial D$ there exists³ a Euclidean ball B contained in the complement of D and such that $y\in\bar B$;
(iv) $\varphi\in C(\partial D)$;
there exists a classical solution $u\in C^{2+\alpha}(D)\cap C(\bar D)$ of problem (15.3.1).
Now let us consider some significant examples.

Example 15.3.5 (Expectation of the Exit Time) If the problem

$$\begin{cases}\mathcal A u=-1,&\text{in }D,\\ u|_{\partial D}=0,\end{cases}$$

has a solution, then by (15.3.2) we have $u(x)=E[\tau_x]$.
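In the simplest one-dimensional case, this can be checked by simulation: for $\mathcal A=\frac12\frac{d^2}{dx^2}$ on $D=\,]-1,1[$, the problem above has the explicit solution $u(x)=1-x^2$, so $E[\tau_x]=1-x^2$. A minimal Monte Carlo sketch (plain Python; step size, seed and sample size are our arbitrary choices, and the Euler discretization of the exit time introduces a small bias):

```python
# Monte Carlo check of Example 15.3.5 for A = 1/2 d^2/dx^2 on D = ]-1, 1[:
# the solution of  A u = -1, u = 0 on the boundary, is u(x) = 1 - x^2,
# so E[tau_x] = 1 - x^2.  (Euler time-stepping adds an O(sqrt(dt)) bias.)
import math
import random

random.seed(7)
x0, dt, n_paths = 0.5, 1e-3, 1500
sq = math.sqrt(dt)

mean_tau = 0.0
for _ in range(n_paths):
    x, t = x0, 0.0
    while abs(x) < 1.0:              # run each Brownian path until it leaves D
        x += sq * random.gauss(0.0, 1.0)
        t += dt
    mean_tau += t
mean_tau /= n_paths

exact = 1.0 - x0 * x0                # = 0.75
print(f"E[tau] ~ {mean_tau:.3f},  exact {exact}")
assert abs(mean_tau - exact) < 0.2
```

More refined exit-time schemes (e.g., with boundary-crossing corrections) reduce the discretization bias, but this naive version already matches the PDE solution closely.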

³ This is a regularity condition for the boundary of D, which is satisfied if, for example, $\partial D$ is a $C^2$-manifold.

Example 15.3.6 (Poisson Kernel) In the case $a=f=0$, (15.3.2) is equivalent to a mean value formula. More precisely, let $\mu_x$ denote the distribution of the random variable $X_{\tau_x}^x$: then $\mu_x$ is a probability measure on $\partial D$ and by (15.3.2) we have

$$u(x)=E\big[u(X_{\tau_x}^x)\big]=\int_{\partial D}u(y)\,\mu_x(dy).$$

The law $\mu_x$ is usually called the harmonic measure of $\mathcal A$ on $\partial D$. If $X^x$ is a Brownian motion with initial point $x\in\mathbb R^N$, then $\mathcal A=\frac12\Delta$ and, when $D=B(0,R)$ is the Euclidean ball of radius R, $\mu_x$ has a density (with respect to the surface measure) whose explicit expression is known: it corresponds to the so-called Poisson kernel

$$\frac{1}{R\,\omega_N}\,\frac{R^2-|x|^2}{|x-y|^N},$$

where $\omega_N$ denotes the measure of the unit spherical surface in $\mathbb R^N$.
Example 15.3.7 (Heat Equation) Let W be a real Brownian motion. The process $X_t=(W_t,-t)$ is the solution of the SDE

$$\begin{cases}dX_t^1=dW_t,\\ dX_t^2=-dt,\end{cases}$$

and the corresponding characteristic operator

$$\mathcal A=\frac12\partial_{x_1x_1}-\partial_{x_2}$$

is the heat operator in $\mathbb R^2$.


Let us consider formula (15.3.2) on a rectangular domain

$$D=\,]a_1,b_1[\,\times\,]a_2,b_2[\,.$$

Examining the explicit expression of the trajectories of X (see Fig. 15.2), it is clear that the value $u(\bar x_1,\bar x_2)$ of a solution of the heat equation depends only on the values of u on the part of the boundary of D contained in $\{x_2<\bar x_2\}$. In general, the value of u in D depends only on the values of u on the parabolic boundary of D, defined by

$$\partial_pD=\partial D\setminus\big(\,]a_1,b_1[\,\times\{b_2\}\big).$$

This fact is consistent with the results on the Cauchy-Dirichlet problem of Sect. 20.1.1.

Fig. 15.2 Two paths traced by the solution of the SDE associated to a Cauchy-Dirichlet problem defined on a rectangular domain

Example 15.3.8 (Method of Characteristics) If $\sigma=0$ the characteristic operator is the first-order differential operator

$$\mathcal A=\sum_{i=1}^Nb_i(x)\,\partial_{x_i}.$$

The corresponding SDE is actually deterministic and reduces to

$$X_t^x=x+\int_0^tb(X_s^x)\,ds,$$

that is, X is an integral curve of the vector field b:

$$\frac{d}{dt}X_t=b(X_t).$$

If the exit time of X from D is finite (cf. Remark 15.2.3) then we have the representation

$$u(x)=e^{-\int_0^{\tau_x}a(X_t^x)\,dt}\,\varphi\big(X_{\tau_x}^x\big)-\int_0^{\tau_x}e^{-\int_0^ta(X_s^x)\,ds}f(X_t^x)\,dt \tag{15.3.3}$$

for the solution of the problem

$$\begin{cases}\langle b,\nabla u\rangle-au=f,&\text{in }D,\\ u|_{\partial D}=\varphi.\end{cases}$$

Equation (15.3.3) is a particular case of the classic method of characteristics for the solution of first-order PDEs: for a full description of this method, we refer, for example, to Chapter 3.2 in [41].

As a particular example, let us consider the Cauchy problem in $\mathbb R^2$

$$\begin{cases}\partial_{x_1}u(x_1,x_2)-x_1\partial_{x_2}u(x_1,x_2)=0,&(x_1,x_2)\in D:=\mathbb R\times\,]0,+\infty[,\\ u(x_1,0)=\varphi(x_1),&x_1\in\mathbb R.\end{cases} \tag{15.3.4}$$

In this case, $b(x_1,x_2)=(1,-x_1)$ and the corresponding "SDE" is

$$\begin{cases}\frac{d}{dt}X_{1,t}=1,\\ \frac{d}{dt}X_{2,t}=-X_{1,t}.\end{cases}$$

Imposing the initial condition $X_0=x\equiv(x_1,x_2)\in D$, we determine the solution

$$X_t^x=\big(X_{1,t}^x,X_{2,t}^x\big)=\Big(x_1+t,\ x_2-tx_1-\frac{t^2}{2}\Big).$$

Imposing $X_{2,t}^x=0$, we find the exit time from D of the trajectory $X^x$:

$$\tau_x=\sqrt{x_1^2+2x_2}-x_1.$$

Then $X_{\tau_x}^x=\big(\sqrt{x_1^2+2x_2},\,0\big)$ is the exit point and, based on formula (15.3.3), the solution of the problem (15.3.4) is

$$u(x_1,x_2)=\varphi\big(X_{\tau_x}^x\big)=\varphi\Big(\sqrt{x_1^2+2x_2}\Big).$$

Note that, as in Example 2.5.12, the solution u inherits the regularity properties of $\varphi$ and therefore, in general, the differential equation in (15.3.4) has to be understood in a distributional sense. From a probabilistic perspective, the transition law of the process X (which in this case is a Dirac delta, i.e., $X_t^x\sim\delta_{(x_1+t,\,x_2-tx_1-t^2/2)}$) is the fundamental solution of the Cauchy problem (15.3.4).
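The computations of this example can be verified mechanically. The following sketch (plain Python; the test point is an arbitrary choice of ours) checks that the characteristic through $(x_1,x_2)$ leaves D exactly at $\tau_x=\sqrt{x_1^2+2x_2}-x_1$ and that $u(x_1,x_2)=\varphi(\sqrt{x_1^2+2x_2})$ solves the PDE, by finite differences, for the smooth test choice $\varphi=\cos$:

```python
# Check of the worked example (15.3.4): the characteristic through (x1, x2) is
# X_t = (x1 + t, x2 - t*x1 - t^2/2), it leaves D = R x ]0, +inf[ at
# tau = sqrt(x1^2 + 2*x2) - x1, and u(x1, x2) = phi(sqrt(x1^2 + 2*x2)).
import math

x1, x2 = -0.3, 2.0
tau = math.sqrt(x1 * x1 + 2 * x2) - x1

X1 = lambda t: x1 + t
X2 = lambda t: x2 - t * x1 - t * t / 2

assert abs(X2(tau)) < 1e-12                              # X2 vanishes exactly at tau
assert abs(X1(tau) - math.sqrt(x1 * x1 + 2 * x2)) < 1e-12  # exit point
assert X2(tau / 2) > 0.0                                 # still inside D before tau

# u(x1, x2) = phi(sqrt(x1^2 + 2 x2)) satisfies d_x1 u - x1 d_x2 u = 0
# (checked by central finite differences for phi = cos):
u = lambda a, b: math.cos(math.sqrt(a * a + 2 * b))
h = 1e-6
du1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)
du2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
assert abs(du1 - x1 * du2) < 1e-6
print("characteristics check passed: tau =", round(tau, 6))
```

The same pattern (integrate the characteristic ODE, read the datum at the exit point) works for any first-order problem of the form treated in Example 15.3.8.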

15.4 The Evolutionary Case: The Cauchy Problem

Theorem 15.3.1 also has a parabolic counterpart, with an entirely analogous proof. Precisely, given the bounded domain D, we consider the cylinder

$$D_T=\,]0,T[\,\times D$$

and we denote by

$$\partial_pD_T:=\partial D_T\setminus(\{0\}\times D)$$

the so-called parabolic boundary of $D_T$. The following theorem provides a representation formula for the classical solutions of the Cauchy-Dirichlet problem

$$\begin{cases}\mathcal A_tu-au+\partial_tu=f,&\text{in }D_T,\\ u|_{\partial_pD_T}=\varphi,\end{cases} \tag{15.4.1}$$

where $f,a,\varphi$ are given functions.


Theorem 15.4.1 (Feynman-Kac Formula [!]) Let $f\in L^\infty(D_T)$, $\varphi\in C(\partial_pD_T)$ and $a\in C(D_T)$ such that $a_0:=\inf a$ is finite. Under Assumption 15.2.1, if $u\in C^2(D_T)\cap C(D_T\cup\partial_pD_T)$ is a solution of problem (15.4.1) then, for any $(t,x)\in D_T$, we have

$$u(t,x)=E\bigg[e^{-\int_t^{T\wedge\tau_{t,x}}a(s,X_s^{t,x})\,ds}\,\varphi\big(T\wedge\tau_{t,x},X_{T\wedge\tau_{t,x}}^{t,x}\big)-\int_t^{T\wedge\tau_{t,x}}e^{-\int_t^sa(r,X_r^{t,x})\,dr}f(s,X_s^{t,x})\,ds\bigg]. \tag{15.4.2}$$

Remark 15.4.2 (Maximum Principle) Under the hypotheses of Theorem 15.4.1 and assuming $f=a=0$, from formula (15.4.2) we deduce the following "maximum principle":

$$\min_{\partial_pD_T}u\le u(t,x)\le\max_{\partial_pD_T}u,$$

which we will recover, by analytical means, in Sect. 20.1.1.


We now prove a representation formula for the classical solution of the backward Cauchy problem

$$\begin{cases}\mathcal A_tu-au+\partial_tu=f,&\text{in }[0,T[\,\times\mathbb R^N,\\ u(T,\cdot)=\varphi,&\text{in }\mathbb R^N,\end{cases} \tag{15.4.3}$$

where $\mathcal A_t$ is the characteristic operator in (15.0.3) and $f,a,\varphi$ are given functions. Chapter 20 is dedicated to a concise presentation of the main existence and uniqueness results for problem (15.4.3) in the case of uniformly parabolic operators with Hölder continuous and bounded coefficients.
Since problem (15.4.3) is posed on an unbounded domain, it is necessary to introduce appropriate assumptions on the behavior at infinity of the coefficients.

Assumption 15.4.3
(i) The coefficients $b=b(t,x)$ and $\sigma=\sigma(t,x)$ are measurable functions, with at most linear growth in x uniformly in $t\in[0,T[$;
(ii) $a\in C([0,T[\,\times\mathbb R^N)$ with $\inf a=:a_0>-\infty$.

Theorem 15.4.4 (Feynman-Kac Formula [!!]) Suppose there exists a solution $u\in C^2([0,T[\,\times\mathbb R^N)\cap C([0,T]\times\mathbb R^N)$ of the Cauchy problem (15.4.3). Take Assumption 15.4.3 and at least one of the following conditions as given:
(1) there exist two positive constants M, p such that

$$|u(t,x)|+|f(t,x)|\le M(1+|x|^p),\qquad(t,x)\in[0,T[\,\times\mathbb R^N; \tag{15.4.4}$$

(2) the matrix $\sigma$ is bounded and there exist two positive constants M and $\alpha$, with $\alpha$ sufficiently small, such that

$$|u(t,x)|+|f(t,x)|\le Me^{\alpha|x|^2},\qquad(t,x)\in[0,T[\,\times\mathbb R^N. \tag{15.4.5}$$

If the SDE (15.0.1) has a solution $X^{t,x}$ with initial datum $(t,x)\in[0,T[\,\times\mathbb R^N$, then the representation formula

$$u(t,x)=E\bigg[e^{-\int_t^Ta(s,X_s^{t,x})\,ds}\,\varphi\big(X_T^{t,x}\big)-\int_t^Te^{-\int_t^sa(r,X_r^{t,x})\,dr}f(s,X_s^{t,x})\,ds\bigg] \tag{15.4.6}$$

holds.

Proof Fix $(t,x)\in[0,T[\,\times\mathbb R^N$ and, for simplicity, let $X=X^{t,x}$. If $\tau_R$ denotes the exit time of X from the Euclidean ball of radius R, by Theorem 15.4.1 we have

$$u(t,x)=E\bigg[e^{-\int_t^{T\wedge\tau_R}a(s,X_s)\,ds}\,u\big(T\wedge\tau_R,X_{T\wedge\tau_R}\big)-\int_t^{T\wedge\tau_R}e^{-\int_t^sa(r,X_r)\,dr}f(s,X_s)\,ds\bigg]. \tag{15.4.7}$$

Since

$$\lim_{R\to\infty}T\wedge\tau_R=T,$$

the thesis follows by taking the limit in R in (15.4.7) thanks to the dominated convergence theorem. In fact, we have pointwise convergence of the integrands and, moreover, under condition (1), we have

$$e^{-\int_t^{T\wedge\tau_R}a(s,X_s)\,ds}\,\big|u\big(T\wedge\tau_R,X_{T\wedge\tau_R}\big)\big|\le Me^{|a_0|T}\big(1+\bar X_T^p\big),$$

$$\bigg|\int_t^{T\wedge\tau_R}e^{-\int_t^sa(r,X_r)\,dr}f(s,X_s)\,ds\bigg|\le Te^{|a_0|T}M\big(1+\bar X_T^p\big),$$

where

$$\bar X_T=\sup_{0\le t\le T}|X_t|$$

is absolutely integrable thanks to the a priori estimates of Theorem 14.5.2. Under condition (2), we proceed in a similar way using the exponential integrability estimate of Theorem 14.5.3. $\square$

Remark 15.4.5 From the representation formula (15.4.6) it follows, in particular, that the solution of the Cauchy problem is unique. As we will see in Sect. 20.1, the growth conditions (15.4.4) and (15.4.5) are necessary in order to select one among the infinitely many solutions that exist in general.

15.5 Key Ideas to Remember

This chapter introduces several types of representation formulas for solutions to


(Cauchy or Cauchy-Dirichlet) problems involving the characteristic operator of an
SDE. Unsurprisingly, the definition of the characteristic operator closely mirrors
that introduced in the context of Markov process theory. If you have any doubts
about the following concise statements, please refer back to the relevant section.
• Section 15.1: we define the characteristic operator of an SDE: its coefficients represent the infinitesimal increments of the expectation and covariance matrix of the solution to the associated SDE.
• Section 15.2: as a preliminary result, we provide simple conditions that ensure that the exit time of the solution of an SDE from a bounded domain is a.s. finite or even absolutely integrable.
• Section 15.3: using Itô's formula, it is almost immediate to obtain representation formulas for the classical solutions (assuming they exist) of the Dirichlet problem in terms of the expected value of the solution of the associated SDE. These formulas, known as Feynman-Kac formulas, have considerable theoretical and practical importance, which is illustrated with numerous examples.
• Section 15.4: we present the parabolic version of the Feynman-Kac formulas.

Main notations used or introduced in this chapter:

| Symbol | Description | Page |
| $\mathcal A_t$ | Characteristic operator of an SDE | 287 |
| $\tau_{t,x}$ | First exit time | 290 |
| $\partial_pD$ | Parabolic boundary of the cylinder D | 298 |
| $\bar X_T=\sup_{0\le t\le T}|X_t|$ | Maximum process | 300 |
Chapter 16
Linear Equations

Tant que nous sommes agités, nous pouvons être calmes1


Julien Green

In this chapter, we consider stochastic differential equations of the form

$$dX_t=(BX_t+b)\,dt+\sigma\,dW_t \tag{16.0.1}$$

where $B\in\mathbb R^{N\times N}$, $b\in\mathbb R^N$, $\sigma\in\mathbb R^{N\times d}$, and W is a d-dimensional Brownian motion. Equation (16.0.1) is a particular case of (14.1.1) with coefficients $b(t,x)=Bx+b$ and $\sigma(t,x)=\sigma$ that are linear functions of the variable x (in fact, the diffusion coefficient is even constant), and therefore we say that (16.0.1) is a linear SDE. In this chapter, we exhibit the explicit expression of the solution and study the properties of its transition law, with particular attention to the absolutely continuous case, providing conditions for the existence of the transition density.

16.1 Solution and Transition Law of a Linear SDE

The following theorem provides the explicit expression of the solution of a linear
SDE.

1 As long as we are restless, we can be calm.


Theorem 16.1.1 The solution $X^x = (X_t^x)_{t\ge 0}$ of (16.0.1) with initial datum $X_0^x = x \in \mathbb{R}^N$ is given by
$$X_t^x = e^{tB}\left( x + \int_0^t e^{-sB} b\, ds + \int_0^t e^{-sB}\sigma\, dW_s \right). \tag{16.1.1}$$
The solution $X^x$ is a Gaussian process: in particular, $X_t^x \sim N_{m_t(x),\,C_t}$ where
$$m_t(x) = e^{tB}\left( x + \int_0^t e^{-sB} b\, ds \right), \qquad C_t = \int_0^t e^{sB}\sigma \left(e^{sB}\sigma\right)^* ds.$$

Proof To prove that $X^x$ in (16.1.1) solves the SDE (16.0.1), it is sufficient to apply Itô's formula using the expression $X_t^x = e^{tB} Y_t^x$ where
$$dY_t^x = e^{-tB} b\, dt + e^{-tB}\sigma\, dW_t, \qquad Y_0^x = x.$$
We now recall that, since $Y^x$ is an Itô process with deterministic coefficients, by the multidimensional version of Example 11.1.9, we have
$$Y_t^x \sim N_{\mu_t(x),\,\mathcal{C}_t}, \qquad \mu_t(x) = x + \int_0^t e^{-sB} b\, ds, \qquad \mathcal{C}_t = \int_0^t e^{-sB}\sigma\sigma^* \left(e^{-sB}\right)^* ds. \tag{16.1.2}$$
The thesis follows easily from the fact that $X^x$ is a linear transformation of $Y^x$. □

Remark 16.1.2 ([!]) The process
$$T \mapsto X_T^{t,x} := X_{T-t}^x, \qquad T \ge t,$$
solves the SDE (16.0.1) with initial datum $(t,x)$. If the covariance matrix $C_{T-t}$ is positive definite, then the random variable $X_T^{t,x}$ is absolutely continuous with Gaussian density $\Gamma(t,x;T,\cdot)$ given by
$$\Gamma(t,x;T,y) = \frac{1}{\sqrt{(2\pi)^N \det C_{T-t}}}\exp\left(-\frac{1}{2}\left\langle C_{T-t}^{-1}\big(y - m_{T-t}(x)\big),\, y - m_{T-t}(x)\right\rangle\right).$$
By Remark 2.5.10,² $\Gamma$ is a transition density of $X$ in (16.0.1) and is the fundamental solution of the backward Kolmogorov operator $\mathcal{A}_t + \partial_t$ where
$$\mathcal{A}_t = \frac{1}{2}\sum_{i,j=1}^{N} c_{ij}\,\partial_{x_i x_j} + \langle Bx + b, \nabla\rangle, \qquad c := \sigma\sigma^*, \tag{16.1.3}$$
is the characteristic operator of $X$.

2 See also Theorem 17.3.1.



Example 16.1.3 (Langevin Equation [!]) Consider the SDE in $\mathbb{R}^2$
$$\begin{cases} dV_t = dW_t, \\ dX_t = V_t\, dt, \end{cases}$$
which is the simplified version of the Langevin equation [86] used in physics to describe the random motion of a particle in phase space: $V_t$ and $X_t$ represent the velocity and position of the particle at time $t$, respectively. Paul Langevin was the first, in 1908, to apply Newton's laws to the random Brownian motion studied by Einstein a few years earlier. Lemons [88] provides an interesting account of the approaches of Einstein and Langevin.
Referring to the general notation (16.0.1), we have $d = 1$, $N = 2$ and
$$B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad \sigma = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \tag{16.1.4}$$

Since $B^2 = 0$, the matrix $B$ is nilpotent and
$$e^{tB} = I + tB = \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}.$$
Moreover, setting $z = (v,x)$, we have
$$m_t(z) = e^{tB} z = (v,\, x + tv),$$
and
$$C_t = \int_0^t e^{sB}\sigma\sigma^* e^{sB^*}\, ds = \int_0^t \begin{pmatrix} 1 & 0 \\ s & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix} ds = \begin{pmatrix} t & \frac{t^2}{2} \\ \frac{t^2}{2} & \frac{t^3}{3} \end{pmatrix}. \tag{16.1.5}$$
Note that $C_t$ is positive definite for every $t > 0$ and therefore $(V,X)$ has transition density
$$\Gamma(t,z;T,\zeta) = \frac{\sqrt{3}}{\pi (T-t)^2}\exp\left(-\frac{1}{2}\left\langle C_{T-t}^{-1}\big(\zeta - e^{(T-t)B} z\big),\, \zeta - e^{(T-t)B} z\right\rangle\right) \tag{16.1.6}$$
for $t < T$ and $z = (v,x),\ \zeta = (\eta,\xi) \in \mathbb{R}^2$, where
$$C_t^{-1} = \begin{pmatrix} \frac{4}{t} & -\frac{6}{t^2} \\ -\frac{6}{t^2} & \frac{12}{t^3} \end{pmatrix}.$$
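As a quick sanity check (a sketch of ours, not part of the text), the covariance matrix (16.1.5) and the inverse above can be verified numerically: the following plain-Python snippet approximates $C_t$ by midpoint quadrature of $e^{sB}\sigma\sigma^* e^{sB^*}$, whose integrand for the Langevin model is the matrix with entries $1, s, s, s^2$, and checks that $C_t\, C_t^{-1} = I_2$.

```python
# Numerical check of the Langevin covariance matrix (16.1.5) and its inverse:
# C_t is approximated by a midpoint Riemann sum of e^{sB} sigma sigma* e^{sB*}.
t = 2.0
n = 20_000
h = t / n

# exact formulas from the text
C = [[t, t**2 / 2], [t**2 / 2, t**3 / 3]]
C_inv = [[4 / t, -6 / t**2], [-6 / t**2, 12 / t**3]]

# quadrature: e^{sB} sigma = (1, s)^T, so the integrand is [[1, s], [s, s^2]]
num = [[0.0, 0.0], [0.0, 0.0]]
for k in range(n):
    s = (k + 0.5) * h  # midpoint rule
    num[0][0] += h
    num[0][1] += s * h
    num[1][0] += s * h
    num[1][1] += s * s * h

err_quad = max(abs(num[i][j] - C[i][j]) for i in range(2) for j in range(2))

# C * C^{-1} should be the 2x2 identity
prod = [[sum(C[i][k] * C_inv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
err_id = max(abs(prod[i][j] - (1.0 if i == j else 0.0))
             for i in range(2) for j in range(2))
```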

Moreover, $(t,v,x) \mapsto \Gamma(t,v,x;T,\eta,\xi)$ is a fundamental solution of the backward Kolmogorov operator
$$\frac{1}{2}\partial_{vv} + v\,\partial_x + \partial_t \tag{16.1.7}$$
and $(T,\eta,\xi) \mapsto \Gamma(t,v,x;T,\eta,\xi)$ is a fundamental solution of the forward Kolmogorov operator
$$\frac{1}{2}\partial_{\eta\eta} - \eta\,\partial_\xi - \partial_T. \tag{16.1.8}$$
The operators in (16.1.7) and (16.1.8) are not uniformly parabolic because the matrix of the second-order part
$$\sigma\sigma^* = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$
is degenerate; nonetheless, like the classical heat operator, they have a Gaussian fundamental solution. Kolmogorov [70] was the first to exhibit the explicit expression (16.1.6) of the fundamental solution of (16.1.7) (see also the introduction of Hörmander's work [62]). In mathematical finance, the backward operator (16.1.7) is employed to evaluate some complex derivative instruments, notably including the so-called Asian options (see, for example, [8] and [112]).
Example 16.1.4 ([!]) In Example 16.1.3 we proved that, setting
$$X_t := \int_0^t W_s\, ds,$$
the pair $(W,X)$ has a two-dimensional normal distribution with covariance matrix given in (16.1.5). It follows in particular that $X_t \sim N_{0,\, t^3/3}$, confirming what we had already observed in Example 11.1.10.
Let us prove that $X$ is not a Markov process. In Theorem 17.3.1 we will see that the pair $(W,X)$, being a solution of the Langevin SDE, is a Markov process: Theorem 17.3.1 does not apply to $X$, which is an Itô process but not a solution of an SDE of the form (17.0.1). In fact, we have
$$E[X_T \mid \mathcal{F}_t] = X_t + E\left[\int_t^T W_s\, ds \,\Big|\, \mathcal{F}_t\right] = X_t + (T-t)W_t \tag{16.1.9}$$
since, by Itô's formula,
$$d(tW_t) = W_t\, dt + t\, dW_t,$$
namely
$$T W_T = t W_t + \int_t^T W_s\, ds + \int_t^T s\, dW_s,$$
from which
$$E[T W_T \mid \mathcal{F}_t] = t W_t + E\left[\int_t^T W_s\, ds \,\Big|\, \mathcal{F}_t\right] + E\left[\int_t^T s\, dW_s \,\Big|\, \mathcal{F}_t\right]$$
and therefore
$$E\left[\int_t^T W_s\, ds \,\Big|\, \mathcal{F}_t\right] = (T-t)W_t.$$
By (16.1.9), $E[X_T \mid \mathcal{F}_t]$ is a function not only of $X_t$ but also of $W_t$: incidentally, this is a further confirmation of the Markov property of the pair $(W,X)$. If $X$ were a Markov process, then we should have³
$$E[X_T \mid X_t] = E[X_T \mid \mathcal{F}_t], \qquad t \le T, \tag{16.1.10}$$
which combined with (16.1.9) would imply $W_t = f(X_t)$ a.s. for some $f \in m\mathcal{B}$. However, this is absurd: in fact, if $W_t = f(X_t)$ a.s. then $\mu_{W_t|X_t} = \delta_{f(X_t)}$, and this contrasts with the fact that $(W_t, X_t)$ has a two-dimensional Gaussian density.
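A Monte Carlo sanity check of these moments (a sketch of ours; the grid and sample sizes are arbitrary): simulating $X_t = \int_0^t W_s\, ds$ by a left-point Riemann sum, the sample variance of $X_1$ should be close to $t^3/3 = 1/3$, and the sample covariance of $(W_1, X_1)$ close to $t^2/2 = 1/2$, in agreement with (16.1.5).

```python
import random

# Monte Carlo check of the moments of integrated Brownian motion.
random.seed(0)
n_paths, n_steps, t = 10_000, 100, 1.0
dt = t / n_steps
sq = dt ** 0.5

w_end, x_end = [], []
for _ in range(n_paths):
    w, x = 0.0, 0.0
    for _ in range(n_steps):
        x += w * dt          # left-point Riemann sum for X = \int W ds
        w += sq * random.gauss(0.0, 1.0)
    w_end.append(w)
    x_end.append(x)

mean_x = sum(x_end) / n_paths
var_x = sum((v - mean_x) ** 2 for v in x_end) / n_paths        # ~ t^3/3
cov_wx = (sum(wi * xi for wi, xi in zip(w_end, x_end)) / n_paths
          - (sum(w_end) / n_paths) * mean_x)                   # ~ t^2/2
```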
Remark 16.1.5 The results of this section extend to the case of linear SDEs of the type
$$dX_t = \big(b(t) + B(t)X_t\big)\,dt + \sigma(t)\, dW_t$$
where the coefficients $B$, $b$ and $\sigma$ are measurable and bounded functions of time. In this case, the matrix exponential $e^{tB}$ in the expression of the solution provided by Theorem 16.1.1 is replaced by the solution $\Phi(t)$ of the matrix Cauchy problem
$$\begin{cases} \Phi'(t) = B(t)\,\Phi(t), \\ \Phi(0) = I_N, \end{cases}$$
where $I_N$ denotes the $N \times N$ identity matrix.

3 Formula (16.1.10) must be interpreted according to Convention 4.2.5 in [113].



16.2 Controllability of Linear Systems and Absolute Continuity

We have seen that the solution $X$ of the linear SDE (16.0.1) has a multi-normal transition law. Clearly, the case in which $X$ admits a transition density, so that the related Kolmogorov equations have a fundamental solution, is of particular interest. In this section, we see that the non-degeneracy of the covariance matrix of $X_t$,
$$C_t := \operatorname{cov}(X_t) = \int_0^t G_s G_s^*\, ds, \qquad G_t := e^{tB}\sigma, \tag{16.2.1}$$
can be characterized in terms of the controllability of a system within the framework of optimal control theory (see, for example, [87] and [151]). We begin by introducing the following

Definition 16.2.1 The pair $(B,\sigma)$ is controllable on $[0,T]$ if for every $x, y \in \mathbb{R}^N$ there exists a function $v \in C([0,T]; \mathbb{R}^d)$ such that the solution $\gamma \in C^1([0,T]; \mathbb{R}^N)$ of the problem
$$\begin{cases} \gamma'(t) = B\gamma(t) + \sigma v(t), & 0 < t < T, \\ \gamma(0) = x, \end{cases} \tag{16.2.2}$$
verifies the final condition $\gamma(T) = y$. We say that $v$ is a control for $(B,\sigma)$ on $[0,T]$.

Theorem 16.2.2 ([!]) The matrix $C_T$ in (16.2.1) is positive definite if and only if $(B,\sigma)$ is controllable on $[0,T]$.

Proof We preliminarily observe that $C_t = e^{tB}\,\mathcal{C}_t\, e^{tB^*}$, where
$$\mathcal{C}_t = \int_0^t G_{-s} G_{-s}^*\, ds$$
is the covariance matrix in (16.1.2). Clearly, $C_T > 0$ if and only if $\mathcal{C}_T > 0$.
We suppose $\mathcal{C}_T > 0$ and prove that $(B,\sigma)$ is controllable on $[0,T]$. Consider the solution
$$\gamma(t) = e^{tB}\left( x + \int_0^t G_{-s}\, v(s)\, ds \right), \qquad t \in [0,T],$$
of the Cauchy problem (16.2.2). Given $y \in \mathbb{R}^N$, we have $\gamma(T) = y$ if and only if
$$\int_0^T G_{-s}\, v(s)\, ds = z := e^{-TB} y - x. \tag{16.2.3}$$

Then it is easy to verify that a control is given explicitly by
$$v(s) = G_{-s}^*\,\mathcal{C}_T^{-1} z, \qquad s \in [0,T]. \tag{16.2.4}$$
Conversely, assume that $(B,\sigma)$ is controllable on $[0,T]$ and suppose, for contradiction, that $\mathcal{C}_T$ is degenerate, i.e., there exists $w \in \mathbb{R}^N \setminus \{0\}$ such that
$$\langle \mathcal{C}_T w, w\rangle = 0.$$
Equivalently, we have
$$\int_0^T |w^* G_{-s}|^2\, ds = 0,$$
so that $w^* G_{-s} = 0$ for every $s \in [0,T]$ and therefore also
$$w^* \int_0^T G_{-s}\, v(s)\, ds = 0.$$
This contradicts (16.2.3) (it suffices to choose $y$ so that $w^* z \neq 0$), hence the controllability hypothesis, and concludes the proof. □

Remark 16.2.3 The control $v$ in (16.2.4) is optimal in the sense that it minimizes the "cost functional"
$$U(v) := \|v\|_{L^2([0,T])}^2 = \int_0^T |v(t)|^2\, dt.$$
This is a consequence of the Lagrange-Ljusternik theorem (cf., for example, [137]), which is the functional extension of the classical Lagrange multipliers theorem. More precisely, to minimize the functional $U$ under the constraint (16.2.3), we consider the Lagrange functional
$$L(v,\lambda) = \|v\|_{L^2([0,T])}^2 - \lambda^*\left( \int_0^T G_{-t}\, v(t)\, dt - z \right),$$
where $\lambda \in \mathbb{R}^N$ is the Lagrange multiplier. Differentiating $L$ in the Fréchet sense, we impose that $v$ is a critical point of $L$ and obtain
$$\partial_v L(u) = 2\int_0^T v(t)^* u(t)\, dt - \lambda^* \int_0^T G_{-t}\, u(t)\, dt = 0, \qquad u \in L^2([0,T]).$$
Then we find $v(s) = \frac{1}{2} G_{-s}^* \lambda$ with $\lambda$ determined by the constraint (16.2.3), that is $\lambda = 2\,\mathcal{C}_T^{-1} z$, in agreement with (16.2.4).

Fig. 16.1 Plot of the optimal trajectory $\gamma(t) = \big(6(t - t^2),\, 3t^2 - 2t^3\big)$, solution of problem (16.2.5) with initial condition $\gamma(0) = (0,0)$ and final condition $\gamma(1) = (0,1)$

Example 16.2.4 Let us resume Example 16.1.3 with the matrices $B, \sigma$ as in (16.1.4). In this case, the control $v = v(t)$ is real-valued and problem (16.2.2) becomes
$$\begin{cases} \gamma_1'(t) = v(t), \\ \gamma_2'(t) = \gamma_1(t), \\ \gamma(0) = (x_1, x_2). \end{cases} \tag{16.2.5}$$
The control acts directly only on the first component of $\gamma$ but also affects the second component $\gamma_2$ through the second equation: by Theorem 16.2.2, $(B,\sigma)$ is controllable on $[0,T]$ for every $T > 0$, with a control given explicitly by formula (16.2.4) (see Fig. 16.1).
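Formula (16.2.4) can be made fully explicit in this example (a computation of ours, consistent with the caption of Fig. 16.1): for $x = (0,0)$, $y = (0,1)$ and $T = 1$ one has $G_{-s} = e^{-sB}\sigma = (1, -s)^T$, $\mathcal{C}_1$ with rows $(1, -1/2)$ and $(-1/2, 1/3)$, $z = e^{-B}y - x = (0,1)$, hence $v(s) = G_{-s}^*\,\mathcal{C}_1^{-1} z = 6 - 12s$. Integrating (16.2.5) with this control recovers $\gamma(t) = (6(t - t^2),\, 3t^2 - 2t^3)$:

```python
# Optimal control for the Langevin system (16.2.5): steer gamma from
# x = (0, 0) at t = 0 to y = (0, 1) at T = 1 with the explicit control
# v(s) = 6 - 12 s obtained from (16.2.4).
def v(s):
    return 6.0 - 12.0 * s

n = 100_000
h = 1.0 / n
g1, g2 = 0.0, 0.0                         # gamma(0) = (0, 0)
for k in range(n):
    s = k * h
    g1, g2 = g1 + h * v(s), g2 + h * g1   # explicit Euler for (16.2.5)

# at t = 1 the optimal trajectory reaches gamma(1) = (0, 1)
```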

16.3 Kalman Rank Condition

We provide a further operational criterion to verify the non-degeneracy of the covariance matrix $C_t$.

Theorem 16.3.1 (Kalman Rank Condition) The matrix $C_T$ in (16.2.1) is positive definite for $T > 0$ if and only if the pair $(B,\sigma)$ satisfies the following Kalman condition: the matrix of dimension $N \times (Nd)$, defined in blocks by
$$\big( \sigma \quad B\sigma \quad B^2\sigma \quad \cdots \quad B^{N-1}\sigma \big), \tag{16.3.1}$$
has maximum rank, equal to $N$.

Proof Denote by
$$p(\lambda) := \det(\lambda I_N - B) = \lambda^N + a_1\lambda^{N-1} + \cdots + a_{N-1}\lambda + a_N$$

the characteristic polynomial of the matrix $B$: by the Cayley-Hamilton theorem, we have $p(B) = 0$. It follows that every power $B^k$, with $k \ge N$, is a linear combination of $I_N, B, \ldots, B^{N-1}$.
Now the matrix (16.3.1) does not have maximum rank if and only if there exists $w \in \mathbb{R}^N \setminus \{0\}$ such that
$$w^*\sigma = w^* B\sigma = \cdots = w^* B^{N-1}\sigma = 0. \tag{16.3.2}$$
Therefore, if the matrix (16.3.1) does not have maximum rank, by (16.3.2) and the Cayley-Hamilton theorem, we have
$$w^* B^k \sigma = 0, \qquad k \in \mathbb{N}_0,$$
from which also
$$w^* e^{tB}\sigma = 0, \qquad t \ge 0.$$
Consequently,
$$\langle C_T w, w\rangle = \int_0^T |w^* e^{tB}\sigma|^2\, dt = 0, \tag{16.3.3}$$
and $C_T$ is degenerate for every $T > 0$.
Conversely, if $C_T$ is degenerate then there exists $w \in \mathbb{R}^N \setminus \{0\}$ for which (16.3.3) holds and therefore
$$f(t) := w^* e^{tB}\sigma = 0, \qquad t \in [0,T].$$
By differentiating, we obtain
$$0 = \frac{d^k}{dt^k} f(t)\Big|_{t=0} = w^* B^k \sigma, \qquad k \in \mathbb{N}_0,$$
and therefore, by (16.3.2), the matrix (16.3.1) does not have maximum rank. □

Remark 16.3.2 Since the Kalman condition does not depend on $T$, the matrix $C_T$ is positive definite for some $T > 0$ if and only if it is for every $T > 0$.

Example 16.3.3 In Example 16.1.3, we have
$$\sigma = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad B\sigma = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
and thus $(\sigma \;\; B\sigma)$ is the $2\times 2$ identity matrix, which obviously satisfies the Kalman condition.
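The Kalman condition is easy to check by machine; the following sketch (helper names are ours) builds the block matrix (16.3.1) and computes its rank by exact Gaussian elimination over the rationals. For the Langevin pair the rank is $2$; replacing $B$ by the zero matrix makes the system degenerate, since the noise then never propagates to the second component.

```python
from fractions import Fraction

def mat_mul(A, B):
    # product of two matrices given as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rank(M):
    """Rank by Gaussian elimination over exact rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r, rows, cols = 0, len(M), len(M[0])
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def kalman_rank(B, sigma):
    """Rank of the block matrix (sigma  B sigma ... B^{N-1} sigma), cf. (16.3.1)."""
    N = len(B)
    block = sigma
    K = [row[:] for row in sigma]
    for _ in range(N - 1):
        block = mat_mul(B, block)
        for i in range(N):
            K[i] += block[i]
    return rank(K)

# Langevin example (16.1.4): the Kalman matrix is the 2x2 identity, rank 2
B_lang = [[0, 0], [1, 0]]
sigma_lang = [[1], [0]]

# degenerate example: with B = 0 the noise never reaches the 2nd component
B_zero = [[0, 0], [0, 0]]
```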

16.4 Hörmander’s Condition

The non-degeneracy of the covariance matrix of a linear SDE can also be characterized in terms of a well-known condition in the context of partial differential equations. Consider the linear SDE (16.0.1) under the assumption that $\sigma$ has rank $d$: then, up to a linear transformation, it is not restrictive to assume
$$\sigma = \begin{pmatrix} I_d \\ 0 \end{pmatrix}.$$
The corresponding backward Kolmogorov operator is
$$\mathcal{K} = \frac{1}{2}\Delta_d + \langle b + Bx, \nabla\rangle + \partial_t, \qquad (t,x) \in \mathbb{R}^{N+1}, \tag{16.4.1}$$
where $\Delta_d$ denotes the Laplace operator in the first $d$ variables $x_1, \ldots, x_d$.
where .Δd denotes the Laplace operator in the first d variables .x1 , . . . , xd .
By convention, we identify a first-order differential operator on $\mathbb{R}^N$ of the type
$$Z := \sum_{i=1}^{N} \alpha_i(x)\,\partial_{x_i}$$
with the vector field of its coefficients, and therefore also write
$$Z(x) = \big(\alpha_1(x), \ldots, \alpha_N(x)\big), \qquad x \in \mathbb{R}^N.$$
The commutator of two vector fields $Z$ and $U$, with
$$U = \sum_{i=1}^{N} \beta_i\,\partial_{x_i},$$
is defined by
$$[Z, U] = ZU - UZ = \sum_{i=1}^{N} (Z\beta_i - U\alpha_i)\,\partial_{x_i}.$$
Hörmander's theorem [62] (see also Stroock [133] for a more recent treatment) stands as a remarkably broad result. Here, we revisit a specific version pertinent to the operator $\mathcal{K}$ in (16.4.1): this theorem states that $\mathcal{K}$ has a smooth fundamental solution if and only if, at every point $x \in \mathbb{R}^N$, the first-order operators (vector fields)
$$\partial_{x_1}, \ldots, \partial_{x_d}, \qquad Y := \langle Bx, \nabla\rangle,$$
16.4 Hörmander’s Condition 313

together with their commutators of any order, span $\mathbb{R}^N$. This is the so-called Hörmander condition. Note that $\partial_{x_1}, \ldots, \partial_{x_d}$ are the derivatives that appear in the second-order part of $\mathcal{K}$, corresponding to the directions of Brownian diffusion, while $Y$ is the drift of the operator: therefore, essentially, the existence of the fundamental solution is equivalent to the fact that $\mathbb{R}^N$ is spanned at every point by the directional derivatives that appear in $\mathcal{K}$ as second derivatives and as drift, together with their commutators of any order.

Example 16.4.1
(i) If $d = N$ then $\mathcal{K}$ is a uniformly parabolic operator and Hörmander's condition is obviously satisfied, without resorting to the drift and commutators, since $\partial_{x_1}, \ldots, \partial_{x_N}$ form the canonical basis of $\mathbb{R}^N$.
(ii) In the case of the Langevin operator of Example 16.1.3, we have $Y = x_1\partial_{x_2}$. Thus $\partial_{x_1} = (1,0)$ together with the commutator
$$[\partial_{x_1}, Y] = \partial_{x_2} = (0,1)$$
form the canonical basis of $\mathbb{R}^2$ and Hörmander's condition is satisfied.
(iii) Consider the Kolmogorov operator
$$\mathcal{K} = \frac{1}{2}\partial_{x_1 x_1} + x_1\partial_{x_2} + x_2\partial_{x_3} + \partial_t, \qquad (x_1, x_2, x_3) \in \mathbb{R}^3.$$
Here $N = 3$, $d = 1$ and $Y = x_1\partial_{x_2} + x_2\partial_{x_3}$: also in this case Hörmander's condition is satisfied since
$$\partial_{x_1}, \qquad [\partial_{x_1}, Y] = \partial_{x_2}, \qquad [[\partial_{x_1}, Y], Y] = \partial_{x_3}$$
form a basis of $\mathbb{R}^3$. This example can be considered a generalization of the Langevin model in which, in addition to position and velocity, a third stochastic process is introduced that represents the acceleration of a particle and is defined as a real Brownian motion.
Theorem 16.4.2 The Kalman and Hörmander conditions are equivalent.

Proof It is sufficient to note that, for $i = 1, \ldots, d$,
$$[\partial_{x_i}, Y] = \sum_{k=1}^{N} b_{ki}\,\partial_{x_k}$$
is the $i$-th column of the matrix $B$. Moreover, $[[\partial_{x_i}, Y], Y]$ is the $i$-th column of the matrix $B^2$, and an analogous representation holds for higher-order commutators.
On the other hand, for $k = 1, \ldots, N$, the block $B^k\sigma$ in the Kalman matrix (16.3.1) is the $N \times d$ matrix whose columns are the first $d$ columns of $B^k$. □


Building upon the research in [34, 85, 106, 119] and [114], a theory analogous to the classical treatment of uniformly parabolic equations has been developed for Kolmogorov equations with variable coefficients of the type $\partial_t + \mathcal{A}_t$, with $\mathcal{A}_t$ as in (16.1.3) and $\sigma = \sigma(t,x)$.

16.5 Examples and Applications

Linear SDEs are the basis of many important stochastic models: here we briefly
present some examples.
Example 16.5.1 (Vasicek Model) One of the simplest and most famous stochastic models for the evolution of interest rates (also called short rates or short-term rates) was proposed by Vasicek [143]:
$$dr_t = \kappa(\theta - r_t)\,dt + \sigma\, dW_t.$$
Here $W$ is a real Brownian motion, $\sigma$ represents the volatility of the rate, and the parameters $\kappa, \theta$ are called respectively the "speed of mean reversion" and the "long-term mean level". The particular form of the drift $\kappa(\theta - r_t)$, with $\kappa > 0$, is designed to capture the so-called "mean reversion" property, an essential characteristic of the interest rate that distinguishes it from other financial prices: unlike stock prices, for example, interest rates cannot rise indefinitely. This is because at very high levels they would hinder economic activity, leading to a decrease in interest rates. Consequently, interest rates move in a bounded range, showing a tendency to return to a long-term value, represented by the parameter $\theta$ in the model. As soon as $r_t$ exceeds the level $\theta$, the drift becomes negative and "pushes" $r_t$ downwards; on the contrary, if $r_t < \theta$, the drift is positive and tends to make $r_t$ grow towards $\theta$. The fact that $r_t$ has a normal distribution makes the model very simple to use and allows for explicit formulas for more complex financial instruments, such as interest rate derivatives. Among various resources, [18] stands out as an excellent introductory text for interest rate modeling (Fig. 16.2).
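A hedged numerical illustration (our own sketch): an Euler-Maruyama simulation of the Vasicek dynamics can be checked against the exact first two moments, which follow from Theorem 16.1.1 in dimension $N = d = 1$: $E[r_t] = \theta + (r_0 - \theta)e^{-\kappa t}$ and $\operatorname{var}(r_t) = \frac{\sigma^2}{2\kappa}\big(1 - e^{-2\kappa t}\big)$.

```python
import math
import random

# Euler-Maruyama simulation of the Vasicek model; since the SDE is linear,
# the exact mean and variance are available as checks.
random.seed(1)
kappa, theta, sigma, r0 = 1.0, 0.05, 0.08, 0.02
t, n_steps, n_paths = 2.0, 200, 5_000
dt = t / n_steps
sq = math.sqrt(dt)

ends = []
for _ in range(n_paths):
    r = r0
    for _ in range(n_steps):
        r += kappa * (theta - r) * dt + sigma * sq * random.gauss(0.0, 1.0)
    ends.append(r)

mean_mc = sum(ends) / n_paths
var_mc = sum((x - mean_mc) ** 2 for x in ends) / n_paths

mean_exact = theta + (r0 - theta) * math.exp(-kappa * t)
var_exact = sigma ** 2 * (1 - math.exp(-2 * kappa * t)) / (2 * kappa)
```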

Example 16.5.2 (Brownian Bridge) Given $b \in \mathbb{R}$, consider the one-dimensional SDE
$$dB_t = \frac{b - B_t}{1 - t}\,dt + dW_t$$
with solution
$$B_t = B_0(1-t) + bt + (1-t)\int_0^t \frac{dW_s}{1-s}, \qquad 0 \le t < 1.$$

Fig. 16.2 Plot of two trajectories of the Vasicek process with parameters $\kappa = 1$, $X_0 = \theta = 5\%$ and $\sigma = 8\%$

We have
$$E[B_t] = B_0(1-t) + bt,$$
and, by Itô's isometry,
$$\operatorname{var}(B_t) = (1-t)^2 \int_0^t \frac{ds}{(1-s)^2} = t(1-t),$$
so that
$$\lim_{t\to 1^-} E[B_t] = b, \qquad \lim_{t\to 1^-} \operatorname{var}(B_t) = 0.$$
Let us prove that $B_t$ converges to $b$ as $t \to 1^-$ in $L^2$ norm:
$$E\big[(B_t - b)^2\big] = (1-t)^2(b - B_0)^2 - 2(1-t)^2(b - B_0)\underbrace{E\left[\int_0^t \frac{dW_s}{1-s}\right]}_{=0} + (1-t)^2\, E\left[\left(\int_0^t \frac{dW_s}{1-s}\right)^2\right]$$
$$= (1-t)^2\left( (b - B_0)^2 + \int_0^t \frac{ds}{(1-s)^2} \right) = (1-t)^2\left( (b - B_0)^2 + \frac{1}{1-t} - 1 \right) \xrightarrow[t\to 1^-]{} 0.$$
The Brownian bridge is useful for modeling a system that starts at some level $B_0$ and is expected to reach the level $b$ at some future time, for example $t = 1$.

Fig. 16.3 Plot of four trajectories of a Brownian bridge

In Fig. 16.3, four trajectories of a Brownian bridge $B$ with initial value $B_0 = 0$ and $B_1 = 1$ are shown.
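The bridge SDE can be simulated directly (a sketch of ours, stopping the Euler scheme strictly before $t = 1$, where the drift blows up); at $t = \tfrac12$ the sample mean and variance should approximate $B_0(1-t) + bt = \tfrac12$ and $t(1-t) = \tfrac14$.

```python
import math
import random

# Euler-Maruyama simulation of dB_t = (b - B_t)/(1 - t) dt + dW_t,
# evolved up to t = 1/2 and checked against the exact moments.
random.seed(2)
b, B0 = 1.0, 0.0
n_steps, n_paths = 200, 20_000
dt = 1.0 / n_steps
sq = math.sqrt(dt)

mid = []
for _ in range(n_paths):
    x, t = B0, 0.0
    for _ in range(n_steps // 2):          # evolve up to t = 1/2
        x += (b - x) / (1.0 - t) * dt + sq * random.gauss(0.0, 1.0)
        t += dt
    mid.append(x)

mean_mid = sum(v for v in mid) / n_paths               # exact: 1/2
var_mid = sum((v - mean_mid) ** 2 for v in mid) / n_paths   # exact: 1/4
```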

Example 16.5.3 (Ornstein-Uhlenbeck [104]) The following system of equations for the motion of a particle extends the Langevin model by introducing an additional friction term:
$$\begin{cases} dX_t^1 = -\mu X_t^1\, dt + \eta\, dW_t, \\ dX_t^2 = X_t^1\, dt. \end{cases}$$
Here $W$ is a real Brownian motion, and $\mu$ and $\eta$ are the positive parameters of friction and diffusion. In matrix form,
$$dX_t = BX_t\, dt + \sigma\, dW_t$$
with
$$B = \begin{pmatrix} -\mu & 0 \\ 1 & 0 \end{pmatrix}, \qquad \sigma = \begin{pmatrix} \eta \\ 0 \end{pmatrix}.$$
The validity of the Kalman condition is easily verified. Moreover, we have
$$B^n = \begin{pmatrix} (-\mu)^n & 0 \\ (-\mu)^{n-1} & 0 \end{pmatrix}, \qquad n \in \mathbb{N},$$
and
$$e^{tB} = I + \sum_{n=1}^{\infty} \frac{(tB)^n}{n!} = \begin{pmatrix} e^{-\mu t} & 0 \\ \frac{1 - e^{-\mu t}}{\mu} & 1 \end{pmatrix}.$$

The solution $X_t$ with initial datum $(x_1, x_2) \in \mathbb{R}^2$ is a two-dimensional Gaussian process with
$$E[X_t] = e^{tB} x = \begin{pmatrix} x_1 e^{-\mu t} \\ x_2 + \frac{x_1}{\mu}\big(1 - e^{-\mu t}\big) \end{pmatrix}$$
and
$$C_t = \int_0^t e^{sB}\sigma\sigma^* e^{sB^*}\, ds = \eta^2 \int_0^t \begin{pmatrix} e^{-2\mu s} & \frac{e^{-\mu s} - e^{-2\mu s}}{\mu} \\ \frac{e^{-\mu s} - e^{-2\mu s}}{\mu} & \left(\frac{1 - e^{-\mu s}}{\mu}\right)^2 \end{pmatrix} ds$$
$$= \eta^2 \begin{pmatrix} \frac{1}{2\mu}\big(1 - e^{-2\mu t}\big) & \frac{1}{2\mu^2}\big(1 - 2e^{-\mu t} + e^{-2\mu t}\big) \\ \frac{1}{2\mu^2}\big(1 - 2e^{-\mu t} + e^{-2\mu t}\big) & \frac{1}{\mu^3}\left(\mu t + 2e^{-\mu t} - \frac{e^{-2\mu t}}{2} - \frac{3}{2}\right) \end{pmatrix}.$$
Next, we present two examples of very popular SDEs frequently used in mathematical finance. Although not linear SDEs of the form (16.0.1), these equations have an "affine structure" (in the sense of [36]) that allows one to derive the expression of their CHF and density in terms of special functions.

Example 16.5.4 (CIR Model) The Cox-Ingersoll-Ross (CIR) model [29] is a variant of the Vasicek model of Example 16.5.1 in which the diffusion coefficient is a square-root function: this implies that, unlike in the Vasicek model, the solution (the interest rate) takes non-negative values. Specifically, we consider the following stochastic dynamics
$$dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\, dW_t \tag{16.5.1}$$
where $\kappa, \theta, \sigma$ are positive parameters and $W$ is a real Brownian motion. Using Itô's formula, we determine the CHF $\varphi_{X_t}$ of $X_t$: first, we have
$$de^{i\eta X_t} = i\eta\, e^{i\eta X_t}\, dX_t - \frac{\eta^2}{2}\, e^{i\eta X_t}\, d\langle X\rangle_t = e^{i\eta X_t}\left( i\eta\kappa(\theta - X_t) - \frac{(\eta\sigma)^2}{2} X_t \right) dt + i\eta\sigma\, e^{i\eta X_t}\sqrt{X_t}\, dW_t =$$

(putting $a(\eta) = i\eta\kappa\theta$, $b(\eta) = -i\eta\kappa - \frac{(\eta\sigma)^2}{2}$ and $c(\eta, X_t) = i\eta\sigma\, e^{i\eta X_t}\sqrt{X_t}$)
$$= e^{i\eta X_t}\big( a(\eta) + b(\eta) X_t \big)\, dt + c(\eta, X_t)\, dW_t =$$
(exploiting the fact that $X_t\, e^{i\eta X_t} = -i\,\partial_\eta e^{i\eta X_t}$)
$$= \big( a(\eta) - i\, b(\eta)\,\partial_\eta \big) e^{i\eta X_t}\, dt + c(\eta, X_t)\, dW_t.$$
Taking the expected value and assuming $X_0 = x$, we have
$$\varphi_{X_t}(\eta) = e^{i\eta x} + \int_0^t \big( a(\eta) - i\, b(\eta)\,\partial_\eta \big)\varphi_{X_s}(\eta)\, ds.$$
Equivalently, the function $u(t,\eta) := \varphi_{X_t}(\eta)$ satisfies the following Cauchy problem for a first-order partial differential equation:
$$\begin{cases} \partial_t u(t,\eta) = \big( a(\eta) - i\, b(\eta)\,\partial_\eta \big) u(t,\eta), & t > 0,\ \eta \in \mathbb{R}, \\ u(0,\eta) = e^{i\eta x}. \end{cases}$$
This problem is solved using the method of characteristics of Example 15.3.8: setting
$$d(t) := \frac{2\kappa}{(1 - e^{-\kappa t})\,\sigma^2}, \qquad \lambda(t) := 2x\, e^{-\kappa t}\, d(t),$$
we obtain (Fig. 16.4)
$$\varphi_{X_t}(\eta) = \left( \frac{d(t)}{d(t) - i\eta} \right)^{\frac{2\kappa\theta}{\sigma^2}} \exp\left( \frac{i\eta\,\lambda(t)/2}{d(t) - i\eta} \right).$$

Fig. 16.4 Plot of four trajectories of the CIR process with parameters $X_0 = \theta = 5\%$, $\kappa = 1$ and $\sigma = 20\%$
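A simulation sketch of ours for (16.5.1), using the "full truncation" Euler scheme (the square root is applied to the positive part of the state, a standard fix that is not discussed in the text): since the model is affine, the exact mean $E[X_t] = \theta + (x - \theta)e^{-\kappa t}$ is available as a check.

```python
import math
import random

# Full-truncation Euler scheme for the CIR dynamics (16.5.1); the scheme
# keeps the argument of the square root non-negative.
random.seed(3)
kappa, theta, sigma, x0 = 1.0, 0.05, 0.20, 0.05
t, n_steps, n_paths = 1.0, 200, 10_000
dt = t / n_steps
sq = math.sqrt(dt)

ends = []
for _ in range(n_paths):
    x = x0
    for _ in range(n_steps):
        xp = max(x, 0.0)   # positive part, used in drift and diffusion
        x += (kappa * (theta - xp) * dt
              + sigma * math.sqrt(xp) * sq * random.gauss(0.0, 1.0))
    ends.append(x)

mean_mc = sum(ends) / n_paths
mean_exact = theta + (x0 - theta) * math.exp(-kappa * t)   # = theta here
```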

Example 16.5.5 (CEV Model) The constant elasticity of variance (CEV) model has its origins in physics and was introduced into mathematical finance by Cox [27, 28] to describe the dynamics of the price of a risky asset: the CEV equation is of the form
$$dX_t = \sigma X_t^{\beta}\, dW_t, \tag{16.5.2}$$
with parameters $\sigma > 0$, $0 < \beta < 1$ and initial condition $X_0 = x \ge 0$.
We illustrate its peculiar characteristics here following the presentation in [105] (see also [32] and [33]): it is possible to construct a weak solution of (16.5.2) starting from the Kolmogorov equation, expressing the transition density⁴ of the solution in terms of special functions. The process $X$ has distinct properties in the two cases $\beta < \frac{1}{2}$ and $\beta \ge \frac{1}{2}$. To describe these properties, we first introduce the functions
$$\Gamma_\pm(t,x;T,y) = \frac{x^{\frac{1}{2}}\, y^{\frac{1}{2}-2\beta}}{(1-\beta)\,\sigma^2 (T-t)}\, \exp\left( -\frac{x^{2(1-\beta)} + y^{2(1-\beta)}}{2(1-\beta)^2\sigma^2(T-t)} \right) I_{\pm\frac{1}{2(1-\beta)}}\left( \frac{(xy)^{1-\beta}}{(1-\beta)^2\sigma^2(T-t)} \right),$$
where $I_\nu(x)$ is the modified Bessel function of the first kind, defined by
$$I_\nu(x) = \left( \frac{x}{2} \right)^{\nu} \sum_{k=0}^{\infty} \frac{x^{2k}}{2^{2k}\, k!\, \Gamma_E(\nu + k + 1)},$$
and $\Gamma_E$ denotes the Euler Gamma function. Both $\Gamma_+$ and $\Gamma_-$ are fundamental solutions of $\partial_t + \mathcal{A}$, where $\mathcal{A}$ is the characteristic operator of $X$:
$$\mathcal{A} = \frac{\sigma^2 x^{2\beta}}{2}\,\partial_{xx}.$$
Precisely, we have
$$(\partial_t + \mathcal{A})\,\Gamma_\pm(t,x;T,y) = 0 \qquad \text{on } ]0,T[\,\times\mathbb{R}_{>0},$$

⁴ The transition density is constructed from the transformation
$$Y_t = \frac{X_t^{2(1-\beta)}}{\sigma^2(1-\beta)^2}$$
which leads (16.5.2) to the Bessel equation
$$dY_t = \delta\, dt + 2\sqrt{Y_t}\, dW_t \tag{16.5.3}$$
with $\delta = \frac{1-2\beta}{1-\beta}$. Equation (16.5.3) is a particular case of (16.5.1).

and
$$\lim_{\substack{(t,x)\to(T,x_0)\\ t<T}} \int_{\mathbb{R}_{>0}} \Gamma_\pm(t,x;T,y)\,\varphi(y)\, dy = \varphi(x_0), \qquad x_0 \in \mathbb{R}_{\ge 0},$$
for every continuous and bounded function $\varphi$.
The process $X$ is non-negative and can take the value $0$. If $\beta \ge \frac{1}{2}$ we say that $0$ is an "absorbing" state since, denoting by $\tau_x := \inf\{t \mid X_t = 0\}$ the first time at which $X$ reaches $0$ starting from $X_0 = x$, we have $X_t \equiv 0$ for $t \ge \tau_x$. The transition law of $X$ is
$$p(t,x;T,H) = (1-a)\,\delta_0(H) + \int_H \Gamma_+(t,x;T,y)\, dy, \qquad H \in \mathcal{B},$$
where
$$a := \int_0^{+\infty} \Gamma_+(t,x;T,y)\, dy < 1.$$
On the other hand, if $\beta < \frac{1}{2}$ then $X$ reaches $0$ but is "reflected": in this case $\Gamma_-$ has integral equal to one on $\mathbb{R}_{>0}$ and is the transition density of $X$.
In [33] and [61] it is proven that $X$ is a strictly local martingale, and for this reason it is not a good model for the price of a risky asset because it creates "arbitrage opportunities": in fact, if $\beta < \frac{1}{2}$, buying the asset at time $\tau_x$ at zero cost yields a certain gain, since the price later becomes positive. For this reason, in the CEV model introduced by Cox [27], the price is defined as the process obtained by stopping the solution $X$ at time $\tau_x$, that is,
$$S_t := X_{t\wedge\tau_x}, \qquad t \ge 0.$$
In the financial interpretation, $\tau_x$ represents the default time of the risky asset. Delbaen and Shirakawa [33] show that $S$ is a non-negative martingale for every $0 < \beta < 1$. The unstopped process $X$ is instead used as a model for the dynamics of interest rates and of the volatility (or risk index, positive by definition) of financial assets, as in the famous CIR [29] and Heston [60] models. The CEV model (and its stochastic volatility counterpart, the popular SABR model [58] used in interest rate modeling) is an interesting example of a degenerate model, because the infinitesimal generator is not uniformly elliptic and the law of the price process is not absolutely continuous with respect to the Lebesgue measure.

16.6 Key Ideas to Remember

We highlight the key outcomes of the chapter and the fundamental concepts to
remember from an initial reading, omitting the more technical or peripheral matters.
If any of the following brief statements are unclear, please refer back to the relevant
section for clarification.
• Section 16.1: linear SDEs have explicit Gaussian solutions. A particularly interesting example is provided by the Langevin kinetic model, whose solution admits a density although the diffusion coefficient of the SDE is degenerate.
• Sections 16.2, 16.3, and 16.4: the study of the absolute continuity of the solution of a linear SDE opens up interesting links with the theories of optimal control and PDEs. The fact that the covariance matrix of the solution of a linear SDE is positive definite is equivalent to the controllability of an appropriate linear system: in this regard, the Kalman condition provides a simple operational criterion. There is a further equivalence with the Hörmander condition, which is well known in the context of PDE theory.
• Section 16.5: linear SDEs are the basis of classic stochastic models and find wide-ranging applications in various fields. In this section we presented numerous examples of linear and non-linear SDEs used in mathematical finance and beyond.
Chapter 17
Strong Solutions

I spend many hours wandering the streets of Palermo, drinking


strong black coffee and wondering what’s wrong with me. I’ve
made it - I’m the world’s number one tennis player, yet I feel
empty.
Andre Agassi [1]

We present classical results regarding strong existence and pathwise uniqueness for SDEs. We maintain the general notations introduced in Chap. 14 and focus on the SDE
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\, dW_t \tag{17.0.1}$$
where $W$ is a $d$-dimensional Brownian motion and the coefficients
$$b = b(t,x) : \,]t_0,T[\,\times\mathbb{R}^N \longrightarrow \mathbb{R}^N, \qquad \sigma = \sigma(t,x) : \,]t_0,T[\,\times\mathbb{R}^N \longrightarrow \mathbb{R}^{N\times d}, \tag{17.0.2}$$
satisfy the standard assumptions of Definition 14.4.1 for regularity (local Lipschitz continuity) and linear growth. Here $N, d \in \mathbb{N}$ and $0 \le t_0 < T$ are fixed. We prove the following results:
• Theorem 17.1.1 on strong uniqueness;
• Theorem 17.2.1 on strong solvability and the flow property;
• Theorem 17.3.1 on the Markov property;
• Theorem 17.4.1 and Corollary 17.4.2 on estimates of dependence on the initial
datum, regularity of trajectories, Feller property, and strong Markov property.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 323
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_17

17.1 Uniqueness

Theorem 17.1.1 (Strong Uniqueness) Assume the following hypothesis of local Lipschitz continuity in $x$, uniform in $t$: for every $n \in \mathbb{N}$ there exists a constant $\kappa_n$ such that
$$|b(t,x) - b(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le \kappa_n |x - y|, \tag{17.1.1}$$
for every $t \in [t_0, T]$ and $x, y \in \mathbb{R}^N$ such that $|x|, |y| \le n$. Then for the SDE (17.0.1) with initial datum $Z$ there is strong uniqueness according to Definition 14.1.11.

Proof Let $X, Y$ be two solutions of the SDE (17.0.1) with initial datum $Z$, i.e. $X \in \mathrm{SDE}(b,\sigma,W,\mathcal{F}_t)$ and $Y \in \mathrm{SDE}(b,\sigma,W,\mathcal{G}_t)$. We use a localization argument¹ and set
$$\tau_n = \inf\{t \in [t_0, T] \mid |X_t| \vee |Y_t| \ge n\}, \qquad n \in \mathbb{N},$$
with the convention $\inf\emptyset = T$. Note that $\tau_n = t_0$ on $(|Z| > n) \in \mathcal{F}_{t_0} \cap \mathcal{G}_{t_0}$. Since by hypothesis $X, Y$ are adapted and a.s. continuous, $(\tau_n)$ is an increasing sequence of stopping times² with values in $[t_0, T]$, such that $\tau_n \nearrow T$ a.s. We set
$$b_n(t,x) = b(t,x)\,\mathbb{1}_{[t_0,\tau_n]}(t), \qquad \sigma_n(t,x) = \sigma(t,x)\,\mathbb{1}_{[t_0,\tau_n]}(t), \qquad n \in \mathbb{N}. \tag{17.1.2}$$
The processes $X_{t\wedge\tau_n}$, $Y_{t\wedge\tau_n}$ satisfy almost surely the equation
$$X_{t\wedge\tau_n} - Y_{t\wedge\tau_n} = \int_{t_0}^{t\wedge\tau_n} \big(b(s,X_s) - b(s,Y_s)\big)\, ds + \int_{t_0}^{t\wedge\tau_n} \big(\sigma(s,X_s) - \sigma(s,Y_s)\big)\, dW_s$$
$$= \int_{t_0}^{t} \big(b_n(s,X_{s\wedge\tau_n}) - b_n(s,Y_{s\wedge\tau_n})\big)\, ds + \int_{t_0}^{t} \big(\sigma_n(s,X_{s\wedge\tau_n}) - \sigma_n(s,Y_{s\wedge\tau_n})\big)\, dW_s. \tag{17.1.3}$$
Moreover, we have
$$\big|b_n(s,X_{s\wedge\tau_n}) - b_n(s,Y_{s\wedge\tau_n})\big| = \big|b_n(s,X_{s\wedge\tau_n}) - b_n(s,Y_{s\wedge\tau_n})\big|\,\mathbb{1}_{(|Z|\le n)} \le$$

¹ The localization argument is necessary even under the hypothesis of global Lipschitz continuity, because the idea is to apply Grönwall's lemma to the function
$$v(t) = E\left[ \sup_{t_0\le s\le t} |X_s - Y_s|^2 \right]$$
under the assumption that $v$ is bounded.
² With respect to the filtration defined by $\mathcal{F}_t \vee \mathcal{G}_t := \sigma(\mathcal{F}_t \cup \mathcal{G}_t)$.
t t t t

(since $|X_{s\wedge\tau_n}|, |Y_{s\wedge\tau_n}| \le n$ on $(|Z| \le n)$ for $s \in [t_0, T]$)
$$\le \kappa_n\, \big|X_{s\wedge\tau_n} - Y_{s\wedge\tau_n}\big| \tag{17.1.4}$$
and a similar estimate is obtained with $\sigma_n$ instead of $b_n$. Now let
$$v_n(t) = E\left[ \sup_{t_0\le s\le t} \big|X_{s\wedge\tau_n} - Y_{s\wedge\tau_n}\big|^2 \right], \qquad t \in [t_0, T].$$
From (17.1.3) and (17.1.4), proceeding exactly as in the proof of estimate (14.4.5) with $p = 2$, we obtain
$$v_n(t) \le \bar{c}\int_{t_0}^t v_n(s)\, ds, \qquad t \in [t_0, T],$$
for a positive constant $\bar{c} = \bar{c}(T,d,N,\kappa_n)$. Since $X$ and $Y$ are a.s. continuous and adapted (and therefore progressively measurable), Fubini's theorem ensures that $v_n$ is a measurable function on $[t_0, T]$, that is, $v_n \in m\mathcal{B}$. Moreover, $v_n$ is bounded, precisely $|v_n| \le 4n^2$, by construction. From Grönwall's lemma, we obtain that $v_n \equiv 0$ and therefore
$$E\left[ \sup_{t_0\le t\le T} \big|X_{t\wedge\tau_n} - Y_{t\wedge\tau_n}\big|^2 \right] = v_n(T) = 0.$$
Taking the limit as $n \to \infty$, by Beppo Levi's theorem, $X$ and $Y$ are indistinguishable on $[t_0, T]$. □

In the one-dimensional case, the following stronger result holds, which we report without proof (see, for example, Theorem 5.3.3 in [37] or Proposition 5.2.13 in [67]).

Theorem 17.1.2 (Yamada and Watanabe [149]) In the case $N = d = 1$, there is strong uniqueness for the SDE (17.0.1) under the following conditions:
$$|b(t,x) - b(t,y)| \le k(|x - y|), \qquad |\sigma(t,x) - \sigma(t,y)| \le h(|x - y|), \qquad t \ge 0,\ x, y \in \mathbb{R}, \tag{17.1.5}$$
where
(i) $h$ is a strictly increasing function such that $h(0) = 0$ and for every $\varepsilon > 0$
$$\int_0^\varepsilon \frac{1}{h^2(s)}\, ds = \infty; \tag{17.1.6}$$

(ii) $k$ is a strictly increasing, concave function such that $k(0) = 0$ and for every $\varepsilon > 0$
$$\int_0^\varepsilon \frac{1}{k(s)}\, ds = \infty.$$

17.2 Existence

We are interested in studying solvability in the strong sense, which, as seen in Sect. 14.1, requires that the solution be adapted to the standard filtration of the Brownian motion and the initial datum. As stated³ in [124], the point where Itô's original theory of strong solutions of SDEs proves to be truly effective is the theory of flows, which plays an important role in many applications: in this regard, we indicate [82] as a reference monograph (additional valuable resources include [12, 47] and [51]).

Theorem 17.2.1 (Strong Solvability and Flow Property [!]) Suppose that the coefficients $b, \sigma$ satisfy the standard assumptions⁴ (14.4.1) and (14.4.2) on $]t_0,T[\,\times\mathbb{R}^N$. Given a set-up $(W, \mathcal{F}_t)$, we have:
(i) for every $x \in \mathbb{R}^N$, there exists a strong solution $X^{t_0,x} \in \mathrm{SDE}(b,\sigma,W,\mathcal{F}^W)$ with initial datum $X_{t_0}^{t_0,x} = x$. Moreover, for every $t \in [t_0, T]$ we have
$$(x,\omega) \longmapsto \psi_{t_0,t}(x,\omega) := X_t^{t_0,x}(\omega) \in m\big(\mathcal{B}_N \otimes \mathcal{F}_t^W\big); \tag{17.2.1}$$
(ii) for every $Z \in m\mathcal{F}_{t_0}$ the process $X^{t_0,Z}$ defined by
$$X_t^{t_0,Z}(\omega) := \psi_{t_0,t}\big(Z(\omega), \omega\big), \qquad \omega \in \Omega,\ t \in [t_0, T], \tag{17.2.2}$$
is a strong solution of the SDE (17.0.1) (i.e. $X^{t_0,Z} \in \mathrm{SDE}(b,\sigma,W,\mathcal{F}^{Z,W})$) with initial datum $X_{t_0}^{t_0,Z} = Z$;
(iii) the flow property holds: for every $t \in [t_0,T[$, the processes $X^{t_0,Z}$ and $X^{t,X_t^{t_0,Z}}$ are indistinguishable on $[t,T]$, that is, almost surely
$$X_s^{t_0,Z} = X_s^{t,X_t^{t_0,Z}} \qquad \text{for every } s \in [t,T]. \tag{17.2.3}$$

³ [124] page 136: "Where the 'strong' or 'pathwise' approach of Itô's original theory of SDEs really comes into its own is in the theory of flows. Flows are now very big business; and the martingale-problem approach, for all that it has other interesting things to say, cannot deal with them in any natural way."
⁴ Actually, using a localization argument as in the proof of Theorem 17.1.1, it is sufficient to assume hypothesis (17.1.1) (local Lipschitz continuity) instead of (14.4.2).
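Before turning to the proof, the successive-approximation scheme (17.2.4) used there can be illustrated pathwise (a sketch of ours, with Euler-discretized integrals): for the scalar SDE $dX_t = -X_t\, dt + dW_t$ on one fixed simulated Brownian path, the sup-distance between consecutive Picard iterates decays at the factorial rate suggested by estimate (17.2.5).

```python
import math
import random

# Pathwise Picard iteration for dX = -X dt + dW: since sigma is constant,
# the stochastic integral contributes W_t to every iterate, and only the
# drift integral is recomputed at each step of the recursion.
random.seed(4)
n, T = 1_000, 1.0
dt = T / n
W = [0.0]
for _ in range(n):
    W.append(W[-1] + math.sqrt(dt) * random.gauss(0.0, 1.0))

x0 = 1.0
X = [x0] * (n + 1)                     # X^{(0)} identically x0
sup_dists = []
for _ in range(8):                     # iterates X^{(1)}, ..., X^{(8)}
    X_new = [x0]
    integral = 0.0
    for k in range(n):
        integral += -X[k] * dt         # drift integral, left-point rule
        X_new.append(x0 + integral + W[k + 1])
    sup_dists.append(max(abs(a - b) for a, b in zip(X, X_new)))
    X = X_new

# sup_dists is non-increasing and decays roughly like T^n / n!
```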



Proof We divide the proof into several steps.


(1) We prove the existence of the solution of (17.0.1) on $[t_0,T]$ with deterministic initial datum $X_{t_0}=x\in\mathbb{R}^N$. We use the method of successive approximations and recursively define the sequence of Itô processes

$$X^{(0)}_t \equiv x,\qquad X^{(n)}_t = x + \int_{t_0}^t b\big(s,X^{(n-1)}_s\big)\,ds + \int_{t_0}^t \sigma\big(s,X^{(n-1)}_s\big)\,dW_s,\qquad n\in\mathbb{N},\tag{17.2.4}$$

for $t\in[t_0,T]$. The sequence is well defined and $X^{(n)}$ is adapted to $\mathcal{F}^W$ and a.s. continuous for every n. Moreover, an inductive argument⁵ in n shows that $X^{(n)}_t = X^{(n)}_t(x,\omega)\in m\big(\mathcal{B}_N\otimes\mathcal{F}^W_t\big)$ for every $n\ge0$ and $t\in[t_0,T]$.

We prove by induction the estimate

$$E\Big[\sup_{t_0\le t\le t_1}\big|X^{(n)}_t - X^{(n-1)}_t\big|^2\Big] \le \frac{c^n(t_1-t_0)^n}{n!},\qquad t_1\in\,]t_0,T[,\ n\in\mathbb{N},\tag{17.2.5}$$

with $c = c(T,d,N,x,c_1,c_2)>0$, where $c_1,c_2$ are the constants of the standard assumptions on the coefficients. Let $n=1$: by (14.4.4) we have

$$E\Big[\sup_{t_0\le t\le t_1}\big|X^{(1)}_t - X^{(0)}_t\big|^2\Big] = E\Big[\sup_{t_0\le t\le t_1}\Big|\int_{t_0}^t b(s,x)\,ds + \int_{t_0}^t\sigma(s,x)\,dW_s\Big|^2\Big] \le \bar c_1\,(1+|x|^2)\,(t_1-t_0).$$

Supposing (17.2.5) true for n, let us prove it for $n+1$: we have

$$E\Big[\sup_{t_0\le t\le t_1}\big|X^{(n+1)}_t - X^{(n)}_t\big|^2\Big] = E\Big[\sup_{t_0\le t\le t_1}\Big|\int_{t_0}^t\big(b(s,X^{(n)}_s) - b(s,X^{(n-1)}_s)\big)\,ds + \int_{t_0}^t\big(\sigma(s,X^{(n)}_s)-\sigma(s,X^{(n-1)}_s)\big)\,dW_s\Big|^2\Big] \le$$

(by (14.4.5))

$$\le \bar c_2\int_{t_0}^{t_1} E\Big[\sup_{t_0\le r\le s}\big|X^{(n)}_r - X^{(n-1)}_r\big|^2\Big]\,ds \le$$
⁵ Measurability in $(x,\omega)$ is obvious for $n=0$. Assuming the thesis true for $n-1$, it is sufficient to approximate the integrand in (17.2.4) with simple processes and use Corollary 10.2.27, remembering that convergence in probability preserves measurability.
328 17 Strong Solutions

(by the inductive hypothesis, with $c=\bar c_2\vee\bar c_1(1+|x|^2)$)

$$\le c^{n+1}\int_{t_0}^{t_1}\frac{(s-t_0)^n}{n!}\,ds$$

and this proves (17.2.5).


Combining Markov’s inequality with (17.2.5) we obtain

$$P\Big(\sup_{t_0\le t\le T}\big|X^{(n)}_t - X^{(n-1)}_t\big| \ge \frac1{2^n}\Big) \le 2^{2n}\,E\Big[\sup_{t_0\le t\le T}\big|X^{(n)}_t-X^{(n-1)}_t\big|^2\Big] \le \frac{(4cT)^n}{n!},\qquad n\in\mathbb{N}.$$

Then, by the Borel-Cantelli Lemma 1.3.28 in [113], we have

$$P\Big(\sup_{t_0\le t\le T}\big|X^{(n)}_t - X^{(n-1)}_t\big|\ge\frac1{2^n}\ \text{i.o.}\Big) = 0,$$

that is, for almost every $\omega\in\Omega$ there exists $n_\omega\in\mathbb{N}$ such that

$$\sup_{t_0\le t\le T}\big|X^{(n)}_t(\omega) - X^{(n-1)}_t(\omega)\big| \le \frac1{2^n},\qquad n\ge n_\omega.$$

Since

$$X^{(n)}_t = x + \sum_{k=1}^n\big(X^{(k)}_t - X^{(k-1)}_t\big),$$

it follows that, almost surely, $X^{(n)}_t$ converges uniformly in $t\in[t_0,T]$ as $n\to+\infty$ to a limit that we denote by $X_t$: to express this fact, in symbols we write $X^{(n)}_t\Rightarrow X_t$ a.s. Note that $X=(X_t)_{t\in[t_0,T]}$ is a.s. continuous (thanks to the uniform convergence) and adapted to $\mathcal{F}^W$; moreover, $X_t = X_t(x,\omega)\in m\big(\mathcal{B}_N\otimes\mathcal{F}^W_t\big)$ for each $t\in[t_0,T]$ because this measurability property holds for $X^{(n)}_t$ for each $n\in\mathbb{N}$.

By (14.4.1) and since X is a.s. continuous, it is clear that condition (14.1.3) is satisfied. To verify that, almost surely, we have

$$X_t = x + \int_{t_0}^t b(s,X_s)\,ds + \int_{t_0}^t\sigma(s,X_s)\,dW_s,\qquad t\in[t_0,T],$$

it is sufficient to observe that:

• by the Lipschitz property of b and $\sigma$, uniform in t, it follows that $b(t,X^{(n)}_t)\Rightarrow b(t,X_t)$ and $\sigma(t,X^{(n)}_t)\Rightarrow\sigma(t,X_t)$ a.s., and therefore

$$\lim_{n\to+\infty}\int_{t_0}^t b\big(s,X^{(n)}_s\big)\,ds = \int_{t_0}^t b(s,X_s)\,ds\quad\text{a.s.},$$
$$\lim_{n\to+\infty}\int_{t_0}^t\big|\sigma\big(s,X^{(n)}_s\big)-\sigma(s,X_s)\big|^2\,ds = 0\quad\text{a.s.};\tag{17.2.6}$$

• by Proposition 10.2.26, (17.2.6) implies that

$$\lim_{n\to+\infty}\int_{t_0}^t\sigma\big(s,X^{(n)}_s\big)\,dW_s = \int_{t_0}^t\sigma(s,X_s)\,dW_s\quad\text{a.s.}$$

This concludes the proof of existence in the case of a deterministic initial datum.
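The successive-approximation scheme (17.2.4) can also be run numerically. The sketch below is only an illustration, not part of the text: the time grid, the drift $b(t,x)=-x$ and the constant $\sigma=0.5$ are ad-hoc choices. With additive noise the stochastic integral is the same at every iteration, so on a fixed Brownian path the Picard iteration reduces to a pathwise ODE iteration, and the sup-distance between consecutive iterates decays factorially, mirroring the $c^n/n!$ bound in (17.2.5).

```python
import math
import random

random.seed(0)

# Fixed time grid on [0, 1] and one fixed Brownian path (its increments).
n_steps = 200
dt = 1.0 / n_steps
dW = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

# Ad-hoc Lipschitz coefficients: drift b(t, x) = -x, additive noise sigma = 0.5.
def b(t, x):
    return -x

sigma = 0.5
x0 = 1.0

def picard_step(X):
    """One iteration of (17.2.4): plug the previous path X into the integrals."""
    Y = [x0]
    for k in range(n_steps):
        Y.append(Y[k] + b(k * dt, X[k]) * dt + sigma * dW[k])
    return Y

X = [x0] * (n_steps + 1)            # X^(0) = x, the constant path
gaps = []                           # sup-distance between consecutive iterates
for _ in range(6):
    X_new = picard_step(X)
    gaps.append(max(abs(u - v) for u, v in zip(X_new, X)))
    X = X_new

print(gaps)  # rapidly decreasing, as the c^n/n! estimate (17.2.5) predicts
```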
(2) Now consider the case of a random initial datum $Z\in m\mathcal{F}_{t_0}$. Let $f=f(x,\omega)$ be the function on $\mathbb{R}^N\times\Omega$ defined by

$$f(x,\cdot) := \sup_{t_0\le t\le T}\Big|X^{t_0,x}_t - x - \int_{t_0}^t b\big(s,X^{t_0,x}_s\big)\,ds - \int_{t_0}^t\sigma\big(s,X^{t_0,x}_s\big)\,dW_s\Big|.$$

Note that $f\in m\big(\mathcal{B}_N\otimes\mathcal{F}^W_T\big)$ since $X^{t_0,\cdot}_t\in m\big(\mathcal{B}_N\otimes\mathcal{F}^W_t\big)$ for each $t\in[t_0,T]$. Moreover, for each $x\in\mathbb{R}^N$ we have $f(x,\cdot)=0$ a.s. and therefore also $F(x) := E[f(x,\cdot)]=0$. Then we have

$$0 = F(Z) = E[f(x,\cdot)]\big|_{x=Z} =$$

(by the freezing lemma in Theorem 4.2.10 in [113], since $Z\in m\mathcal{F}_{t_0}$, $f\in m\big(\mathcal{B}_N\otimes\mathcal{F}^W_T\big)$ with $\mathcal{F}_{t_0}$ and $\mathcal{F}^W_T$ independent $\sigma$-algebras by Remark 14.1.4, and $f\ge0$)

$$= E\big[f(Z,\cdot)\mid\mathcal{F}_{t_0}\big].$$

Taking the expectation, we also have

$$E[f(Z,\cdot)] = 0$$

and therefore $X^{t_0,Z}$ in (17.2.2) is a solution of the SDE (17.0.1); actually, $X^{t_0,Z}$ is a strong solution because it is clearly adapted to $\mathcal{F}^{Z,W}$.

(3) For $t_0\le t\le s\le T$, with equalities holding almost surely, we have

$$X^{t_0,Z}_s = Z + \int_{t_0}^s b\big(r,X^{t_0,Z}_r\big)\,dr + \int_{t_0}^s\sigma\big(r,X^{t_0,Z}_r\big)\,dW_r$$
$$= Z + \int_{t_0}^t b\big(r,X^{t_0,Z}_r\big)\,dr + \int_{t_0}^t\sigma\big(r,X^{t_0,Z}_r\big)\,dW_r + \int_t^s b\big(r,X^{t_0,Z}_r\big)\,dr + \int_t^s\sigma\big(r,X^{t_0,Z}_r\big)\,dW_r$$
$$= X^{t_0,Z}_t + \int_t^s b\big(r,X^{t_0,Z}_r\big)\,dr + \int_t^s\sigma\big(r,X^{t_0,Z}_r\big)\,dW_r,$$

that is, $X^{t_0,Z}$ is a solution on $[t,T]$ of the SDE (17.0.1) with initial datum $X^{t_0,Z}_t$. On the other hand, as proven in point (2), $X^{t,X^{t_0,Z}_t}$ is also a solution of the same SDE. By uniqueness, the processes $X^{t_0,Z}$ and $X^{t,X^{t_0,Z}_t}$ are indistinguishable on $[t,T]$. This proves (17.2.3) and concludes the proof of the theorem. □


17.3 Markov Property

In this section we show that, under suitable assumptions, the solution of an SDE is a
continuous Markov process (i.e., a diffusion). Hereafter, we will refer systematically
to the results of Sect. 2.5 concerning the characteristic operator of a Markov process.
Theorem 17.3.1 (Markov Property [!]) Assume that the coefficients $b,\sigma$ satisfy conditions (14.4.1) and (17.1.1) of linear growth and local Lipschitz continuity. If $X\in\mathrm{SDE}(b,\sigma,W,\mathcal{F}_t)$ then X is a Markov process with transition law p where, for every $t_0\le t\le s\le T$ and $x\in\mathbb{R}^N$, $p=p(t,x;s,\cdot)$ is the law of the random variable $X^{t,x}_s$, that is, of the solution of the SDE with initial condition x at time t, evaluated at time s. Moreover, the characteristic operator of X is

$$A_t = \frac12\sum_{i,j=1}^N c_{ij}(t,x)\,\partial_{x_ix_j} + \sum_{i=1}^N b_i(t,x)\,\partial_{x_i},\qquad c_{ij} := (\sigma\sigma^*)_{ij}.\tag{17.3.1}$$

Proof We observe that p is a transition law according to Definition 2.1.1. Indeed, we have:

(i) for every $x\in\mathbb{R}^N$, by definition, $p(t,x;s,\cdot)$ is a distribution such that $p(t,x;t,\cdot)=\delta_x$;
(ii) for every $H\in\mathcal{B}_N$,

$$x\longmapsto p(t,x;s,H) = E\big[\mathbf{1}_H\big(X^{t,x}_s\big)\big]\in m\mathcal{B}_N$$

thanks to the measurability property (17.2.1) and Fubini’s theorem.

We prove that p is a transition law for X: according to Definition 2.1.1, we have to verify that

$$p(t,X_t;s,H) = P(X_s\in H\mid X_t),\qquad t_0\le t\le s\le T,\ H\in\mathcal{B}_N.$$

Since, by uniqueness, X is indistinguishable from the solution $X^{t_0,X_{t_0}}\in\mathrm{SDE}\big(b,\sigma,W,\mathcal{F}^{X_{t_0},W}_t\big)$ constructed in Theorem 17.2.1, from the flow property (17.2.3) we have, almost surely,

$$X_s = X^{t,X_t}_s\qquad\text{for every } s\in[t,T].$$

Therefore, we have

$$P(X_s\in H\mid X_t) \equiv E[\mathbf{1}_H(X_s)\mid X_t] = E\big[\mathbf{1}_H\big(X^{t,X_t}_s\big)\,\big|\,X_t\big] =$$

(by (4.2.7) in [113] of the freezing lemma, being $X_t\in m\mathcal{F}_t$ and therefore, by Remark 14.1.4, independent of $\mathcal{F}^W_s$, and $(x,\omega)\mapsto\mathbf{1}_H\big(X^{t,x}_s(\omega)\big)\in m\big(\mathcal{B}_N\otimes\mathcal{F}^W_s\big)$ thanks to (17.2.1))

$$= E\big[\mathbf{1}_H\big(X^{t,x}_s\big)\big]\big|_{x=X_t} = p(t,X_t;s,H).$$

On the other hand, it is enough to repeat the previous steps, conditioning on $\mathcal{F}_t$ instead of $X_t$, to prove the Markov property

$$p(t,X_t;s,H) = P(X_s\in H\mid\mathcal{F}_t),\qquad 0\le t_0\le t\le s\le T,\ H\in\mathcal{B}_N.$$

Finally, the fact that $A_t$ is the characteristic operator of X has been proved in Sect. 15.1 (in particular, compare (15.1.1) with definition (2.5.5)). □

Remark 17.3.2 Under the assumptions of Theorem 17.3.1, by the Markov property we have

$$E[\varphi(X_T)\mid\mathcal{F}_t] = u(t,X_t),\qquad\varphi\in b\mathcal{B},$$

where

$$u(t,x) := \int_{\mathbb{R}^N} p(t,x;T,dy)\,\varphi(y).$$

We recall that, by the results of Sects. 2.5.3 and 2.5.2, the transition law p is a solution of the Kolmogorov backward and forward equations, given respectively by

$$(\partial_t + A_t)\,p(t,x;s,dy) = 0,\qquad \big(\partial_s - A^*_s\big)\,p(t,x;s,dy) = 0,\qquad t_0\le t<s\le T,$$

where $A^*_s$ indicates the adjoint of the operator $A_s$ in (17.3.1), acting in the forward variable y.

17.3.1 Forward Kolmogorov Equation

The forward Kolmogorov equation of a diffusion X can be derived by a direct application of Itô’s formula. Under the assumptions of Theorem 17.3.1, we denote by $X^{t,x}$ the solution of the SDE (17.0.1) with initial condition $X^{t,x}_t=x$. Given a test function $\varphi\in C^\infty_0(\mathbb{R}\times\mathbb{R}^N)$, with compact support contained in $]t,T[\times\mathbb{R}^N$, by Itô’s formula we have

$$0 = \varphi\big(T,X^{t,x}_T\big) - \varphi(t,x) = \int_t^T(\partial_s + A_s)\varphi\big(s,X^{t,x}_s\big)\,ds + \int_t^T\nabla\varphi\big(s,X^{t,x}_s\big)\,\sigma\big(s,X^{t,x}_s\big)\,dW_s$$

where $A_t$ is the characteristic operator in (17.3.1). Taking the expectation and applying Fubini’s theorem, we obtain

$$0 = E\Big[\int_t^T(\partial_s+A_s)\varphi\big(s,X^{t,x}_s\big)\,ds\Big] = \int_t^T E\big[(\partial_s+A_s)\varphi\big(s,X^{t,x}_s\big)\big]\,ds = \int_t^T\int_{\mathbb{R}^N}(\partial_s+A_s)\varphi(s,y)\,p(t,x;s,dy)\,ds\tag{17.3.2}$$

where $p(t,x;s,dy)$ denotes the law of the random variable $X^{t,x}_s$ which, by Theorem 17.3.1, is the transition law of the Markov process X.

By (17.3.2), for every $t\ge0$ we have

$$\iint_{\mathbb{R}^{N+1}}(\partial_s+A_s)\varphi(s,y)\,p(t,x;s,dy)\,ds = 0,\qquad\varphi\in C^\infty_0\big(]t,+\infty[\times\mathbb{R}^N\big),$$

and thus we recover the result of Sect. 2.5.3 according to which p is a distributional solution of the forward Kolmogorov equation

$$\big(\partial_s - A^*_s\big)\,p(t,x;s,\cdot) = 0,\qquad s>t.\tag{17.3.3}$$

In particular, if p is absolutely continuous with density $\Gamma$, that is,

$$p(t,x;s,H) = \int_H\Gamma(t,x;s,y)\,dy,\qquad H\in\mathcal{B}_N,$$

then $\Gamma(t,x;s,y)$ is a distributional solution of (17.3.3), that is,

$$\iint_{\mathbb{R}^{N+1}}\Gamma(t,x;s,y)\,(\partial_s+A_s)\varphi(s,y)\,dy\,ds = 0,\qquad\varphi\in C^\infty_0\big(]t,+\infty[\times\mathbb{R}^N\big),$$

and we say that $(s,y)\mapsto\Gamma(t,x;s,y)$ is the fundamental solution of the forward operator $\partial_s-A^*_s$ with pole at $(t,x)$.
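As a concrete sanity check of (17.3.3) in the simplest toy case (an illustration, not part of the text: $N=1$, $b\equiv0$, $\sigma\equiv1$, so that $A_t=\tfrac12\partial_{xx}$ coincides with its adjoint and $\Gamma(t,x;s,y)=G(s-t,x-y)$ is the heat kernel), one can verify by finite differences that the Gaussian kernel satisfies the forward equation $(\partial_s-\tfrac12\partial_{yy})\Gamma=0$:

```python
import math

def G(t, x):
    """Standard 1-dimensional Gaussian density with variance t."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

t, x = 0.0, 0.3   # pole (t, x) of the fundamental solution
s, y = 1.0, 0.7   # point where the forward PDE is tested
h = 1e-3          # finite-difference step

def gamma(s_, y_):
    return G(s_ - t, y_ - x)

d_s = (gamma(s + h, y) - gamma(s - h, y)) / (2.0 * h)                  # central difference for d/ds
d_yy = (gamma(s, y + h) - 2.0 * gamma(s, y) + gamma(s, y - h)) / h**2  # second difference in y

residual = d_s - 0.5 * d_yy   # (d/ds - A*)Gamma, vanishes up to O(h^2)
print(residual)
```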

17.4 Continuous Dependence on Parameters

Theorem 17.4.1 (Continuous Dependence Estimates on Parameters) Under the standard assumptions (14.4.1) and (14.4.2), let $X^{t_0,Z_0}$ and $X^{t_1,Z_1}$ be solutions of the SDE (17.0.1) with initial data $(t_0,Z_0)$ and $(t_1,Z_1)$, respectively, with $0\le t_0\le t_1\le t_2\le T$. For every $p\ge2$ there exists a positive constant $c=c(T,d,N,p,c_1,c_2)$ such that

$$E\Big[\sup_{t_2\le t,s\le T}\big|X^{t_0,Z_0}_t - X^{t_1,Z_1}_s\big|^p\Big] \le c\,E\big[|Z_0-Z_1|^p\big] + c\big(1+E\big[|Z_1|^p\big]\big)\Big(|t_1-t_0|^{\frac p2} + |T-t_2|^{\frac p2}\Big).\tag{17.4.1}$$
Proof By the elementary inequality (14.4.6) we have

$$E\Big[\sup_{t_2\le t,s\le T}\big|X^{t_0,Z_0}_t - X^{t_1,Z_1}_s\big|^p\Big] \le 3^{p-1}E\Big[\sup_{t_2\le t\le T}\big|X^{t_0,Z_0}_t - X^{t_0,Z_1}_t\big|^p\Big] + 3^{p-1}E\Big[\sup_{t_2\le t\le T}\big|X^{t_0,Z_1}_t - X^{t_1,Z_1}_t\big|^p\Big] + 3^{p-1}E\Big[\sup_{t_2\le t,s\le T}\big|X^{t_1,Z_1}_t - X^{t_1,Z_1}_s\big|^p\Big].\tag{17.4.2}$$

Again by (14.4.6) and (14.4.5) we have

$$v(t) := E\Big[\sup_{t_0\le s\le t}\big|X^{t_0,Z_0}_s - X^{t_0,Z_1}_s\big|^p\Big] \le 2^{p-1}E\big[|Z_0-Z_1|^p\big] + 2^{p-1}\bar c_2\,T^{\frac{p-2}2}\int_{t_0}^t v(s)\,ds,$$

and, by Grönwall’s lemma,

$$E\Big[\sup_{t_2\le t\le T}\big|X^{t_0,Z_0}_t - X^{t_0,Z_1}_t\big|^p\Big] \le v(T) \le c\,E\big[|Z_0-Z_1|^p\big]\tag{17.4.3}$$

with c depending only on p, T and $c_2$.

On the other hand, by the flow property we have

$$E\Big[\sup_{t_2\le t\le T}\big|X^{t_0,Z_1}_t - X^{t_1,Z_1}_t\big|^p\Big] = E\Big[\sup_{t_2\le t\le T}\big|X^{t_1,X^{t_0,Z_1}_{t_1}}_t - X^{t_1,Z_1}_t\big|^p\Big] \le$$

(by (17.4.3))

$$\le c\,E\Big[\big|X^{t_0,Z_1}_{t_1} - Z_1\big|^p\Big] \le$$

(by (14.4.4))

$$\le c\,\bar c_1\,|t_1-t_0|^{\frac{p-2}2}\int_{t_0}^{t_1}\Big(1 + E\Big[\sup_{t_0\le r\le s}\big|X^{t_0,Z_1}_r\big|^p\Big]\Big)\,ds \le$$

(by the $L^p$ estimate (14.5.1), for a new constant $c = c(T,d,N,p,c_1,c_2)$)

$$\le c\,\big(1 + E\big[|Z_1|^p\big]\big)\,|t_1-t_0|^{\frac p2}.$$

We estimate the last term of (17.4.2) using a completely analogous approach, which concludes the proof. □

Corollary 17.4.2 (Feller and Strong Markov Properties) Under the standard assumptions (14.4.1)–(14.4.2) and the usual conditions on the filtration, every $X\in\mathrm{SDE}(b,\sigma,W,\mathcal{F}_t)$ is a Feller process and satisfies the strong Markov property.

Proof By Theorem 17.3.1, X is a Markov process with transition law $p=p(t,x;T,\cdot)$ where, for every $t,T\ge0$ with $t\le T$ and $x\in\mathbb{R}^N$, $p(t,x;T,\cdot)$ is the law of the r.v. $X^{t,x}_T$. By (17.4.1) and Kolmogorov’s continuity theorem (in the multidimensional version of Theorem 3.3.4), the process $(t,x,T)\mapsto X^{t,x}_T$ admits a modification $\widetilde X^{t,x}_T$ with locally $\alpha$-Hölder continuous trajectories for every $\alpha\in[0,1[$ with respect to the so-called “parabolic” distance: precisely, for every $\alpha\in[0,1[$, $n\in\mathbb{N}$ and $\omega\in\Omega$ there exists $c_{\alpha,n,\omega}>0$ such that

$$\big|\widetilde X^{t,x}_r(\omega) - \widetilde X^{s,y}_u(\omega)\big| \le c_{\alpha,n,\omega}\Big(|x-y| + |t-s|^{\frac12} + |r-u|^{\frac12}\Big)^{\alpha},$$

for every $t,s,r,u\in[0,T]$ such that $t\le r$, $s\le u$, and for every $x,y\in\mathbb{R}^N$ such that $|x|,|y|\le n$. Consequently, for every $\varphi\in bC(\mathbb{R}^N)$ and $h>0$, the function

$$(t,x)\longmapsto\int_{\mathbb{R}^N}p(t,x;t+h,dy)\,\varphi(y) = E\big[\varphi\big(\widetilde X^{t,x}_{t+h}\big)\big]$$

is continuous thanks to the dominated convergence theorem, and this proves the Feller property. The strong Markov property follows from Theorem 7.1.2. □
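The moment estimate behind Kolmogorov’s continuity theorem can be spot-checked by Monte Carlo in the simplest case (an illustration only, not from the text): for Brownian motion and $p=4$ the constant is explicit, $E|W_t-W_s|^4=3(t-s)^2$, so increasing the time lag by a factor 4 multiplies the fourth moment by 16.

```python
import math
import random

random.seed(2)

def fourth_moment(lag, n_samples):
    """Monte Carlo estimate of E|W_{t+lag} - W_t|^4 for Brownian motion."""
    s = math.sqrt(lag)   # the increment is N(0, lag)
    return sum(abs(random.gauss(0.0, s)) ** 4 for _ in range(n_samples)) / n_samples

m_small = fourth_moment(0.1, 200_000)   # exact value: 3 * 0.1**2 = 0.03
m_large = fourth_moment(0.4, 200_000)   # exact value: 3 * 0.4**2 = 0.48

print(m_small, m_large, m_large / m_small)  # ratio close to (0.4 / 0.1)**2 = 16
```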

Chapter 18
Weak Solutions

If someone were to ask me, as a philosopher, what should be


learned in high school, I would answer: “first of all, only
‘useless’ things, ancient Greek, Latin, pure mathematics, and
philosophy. Everything that is useless in life”. The beauty is that
by doing so, at the age of 18, you have a wealth of useless
knowledge with which you can do everything. While with useful
knowledge you can only do small things.
Agnes Heller

In this chapter, we present weak existence and uniqueness results for SDEs with coefficients

$$b = b(t,x):\ ]0,T[\times\mathbb{R}^N\longrightarrow\mathbb{R}^N,\qquad \sigma = \sigma(t,x):\ ]0,T[\times\mathbb{R}^N\longrightarrow\mathbb{R}^{N\times d},\tag{18.0.1}$$

where $N,d\in\mathbb{N}$ and $T>0$ are fixed. To this end, we describe what is known as
where .N, d ∈ N and .T > 0 are fixed. To this end, we describe what is known as
the “martingale problem” due to Stroock and Varadhan [136]: this problem pertains
to the construction of a distribution with respect to which the canonical process

X is a semimartingale with drift $b(t,X_t)$ and covariance matrix $(\sigma\sigma^*)(t,X_t)$. The

solution to the martingale problem, if it exists, is the law of the solution of the
corresponding SDE: in fact, the martingale problem turns out to be equivalent to the
weak solvability problem.
The analytical results on the fundamental solution of parabolic PDEs (cf.
Chap. 20) provide a solution to the martingale problem under Hölder regularity
and uniform ellipticity assumptions on the coefficients. Under these assumptions,
we prove existence and uniqueness in the weak sense for SDEs, along with
strong Markov, Feller, and other regularity properties of the trajectories of the
solution. We also showcase broader findings from prominent mathematicians,
including Skorokhod, Stroock, Varadhan, Krylov, Veretennikov and Zvonkin. In
the last section, we prove a “regularization by noise” result that guarantees strong
uniqueness for SDEs with bounded Hölder drift.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 337
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_18

The results of this chapter mark the endpoint of the study of construction methods
for diffusions, whose historical motivations had been illustrated in Sect. 2.6.

18.1 The Stroock-Varadhan Martingale Problem

Assume that the SDE with coefficients $b,\sigma$ admits a weak solution $(X,W)$ and denote as usual by $\mu^{X,W}$ its law. By Lemma 14.3.5, the canonical process $(\mathbf X,\mathbf W)$ is also a solution of the SDE with coefficients $b,\sigma$ on the space $\big(\Omega_{N+d},\mathcal{G}^{N+d}_T,\mu^{X,W}\big)$ and consequently,¹ for each $i,j=1,\dots,N$, the processes

$$M^i_t := X^i_t - \int_0^t b_i(s,X_s)\,ds,\tag{18.1.1}$$
$$M^{ij}_t := M^i_tM^j_t - \int_0^t c_{ij}(s,X_s)\,ds,\qquad (c_{ij}) := \sigma\sigma^*,\tag{18.1.2}$$

are local martingales with respect to the filtration $\big(\mathcal{G}^{N+d}_t\big)_{t\in[0,T]}$ generated by $(\mathbf X,\mathbf W)$.

Note that the Brownian motion $\mathbf W$ does not appear in the definitions (18.1.1) and (18.1.2) and, still denoting by $\mathbf X$ the identity process on $\Omega_N$, one can verify that the processes formally defined as in (18.1.1) and (18.1.2) are local martingales on the space $\big(\Omega_N,\mathcal{G}^N_T,\mu^X\big)$. This motivates the following
Definition 18.1.1 (Martingale Problem) A solution to the martingale problem for $b,\sigma$ is a probability measure on the canonical space $\big(\Omega_N,\mathcal{G}^N_T\big)$ such that the processes $M^i,M^{ij}$ in (18.1.1) and (18.1.2) are local martingales with respect to the filtration $\mathcal{G}^N_t$ generated by the identity process $\mathbf X$.

Remark 18.1.2 ([!!]) It is worth emphasizing that the martingale condition on the processes in (18.1.1) and (18.1.2) basically means that $\mathbf X$ is a semimartingale with drift $b(t,X_t)$ and covariation matrix $\mathcal{C}_t := \big(c_{ij}(t,X_t)\big)$.
If .(X, W ) is a solution of the SDE with coefficients .b, σ then .μX is a solution
of the martingale problem for .b, σ . We now show a result in the opposite direction

¹ Formula (18.1.1) follows from the fact that

$$M_t = X_0 + \int_0^t\sigma(s,X_s)\,dW_s;$$

then

$$\langle M^i,M^j\rangle_t = \int_0^t c_{ij}(s,X_s)\,ds$$

is the covariation process of M, leading to formula (18.1.2).



that allows us to conclude that the martingale problem and the weak solvability of
an SDE are equivalent.
Theorem 18.1.3 (Stroock and Varadhan) If $\mu$ is a solution to the martingale problem for $b,\sigma$, then there exists a weak solution to the SDE with coefficients $b,\sigma$ and initial law $\mu_0$ defined by

$$\mu_0(H) := \mu(X_0\in H),\qquad H\in\mathcal{B}_N.$$

Proof We provide the proof only in the scalar case $N=d=1$ and refer, for example, to Section 5.4.B in [67] for the general case. The fact that $\mu$ is a solution to the martingale problem for $b,\sigma$ means that the process defined on $\big(\Omega_N,\mathcal{G}^N_T,\mu\big)$ as in (18.1.1), that is,

$$M_t = X_t - \int_0^t b(s,X_s)\,ds,\tag{18.1.3}$$

is a local martingale with quadratic variation process $d\langle M\rangle_t = \sigma^2(t,X_t)\,dt$.


If $\sigma(t,x)\neq0$ for every $(t,x)$, the proof is very simple: in fact, the process

$$B_t := \int_0^t \frac{1}{\sigma(s,X_s)}\,dM_s\tag{18.1.4}$$

is a local martingale with quadratic variation

$$\langle B\rangle_t = \int_0^t\frac{1}{\sigma^2(s,X_s)}\,d\langle M\rangle_s = t.$$

Then, by Lévy’s characterization Theorem 12.4.1, $B$ is a Brownian motion and, being $dB_t = \sigma^{-1}(t,X_t)\,dM_t = \sigma^{-1}(t,X_t)\big(dX_t - b(t,X_t)\,dt\big)$, we have

$$\int_0^t\sigma(s,X_s)\,dB_s = X_t - X_0 - \int_0^t b(s,X_s)\,ds,$$

that is, $(X,B)$ is a solution to the SDE with coefficients $b,\sigma$. Note that the solution $(X,B)$ is defined on the space $\big(\Omega_N,\mathcal{G}^N_T,\mu\big)$.
In the general case where $\sigma$ can vanish, consider the space $\big(\Omega_{N+d},\mathcal{G}^{N+d}_T,\mu\otimes\mu^W\big)$ where $\mu^W$ is the Wiener measure and the canonical process $(\mathbf X,\mathbf W)$ is such that $\mathbf W$ is a real Brownian motion (we recall that we are dealing only with the case $N=d=1$). Let $J_t = \mathbf{1}_{(\sigma(t,X_t)\neq0)}$ and

$$B_t = \int_0^t\frac{J_s}{\sigma(s,X_s)}\,dM_s + \int_0^t(1-J_s)\,dW_s.$$

Again, $B$ is a real Brownian motion since it is a local martingale with quadratic variation

$$d\langle B\rangle_t = \frac{J_t}{\sigma^2(t,X_t)}\,d\langle M\rangle_t + (1-J_t)\,d\langle W\rangle_t + 2\,\frac{J_t(1-J_t)}{\sigma(t,X_t)}\,d\langle M,W\rangle_t = dt.$$

Furthermore, since $(1-J_t)\,\sigma(t,X_t)=0$, we have

$$\int_0^t\sigma(s,X_s)\,dB_s = \int_0^t J_s\,dM_s = M_t - M_0 + \int_0^t(J_s-1)\,dM_s = X_t - X_0 - \int_0^t b(s,X_s)\,ds$$

where in the last step we used the fact that, by the Itô isometry,

$$E\bigg[\Big(\int_0^t(J_s-1)\,dM_s\Big)^2\bigg] = E\bigg[\int_0^t(J_s-1)^2\,\sigma^2(s,X_s)\,ds\bigg] = 0.$$

□
Remark 18.1.4 It is interesting to note in the previous proof that if $\sigma\neq0$, i.e., in the non-degenerate case, the Brownian motion $B$ is constructed as a functional of $X$ and therefore the space $\Omega_N$ suffices to “support” the solution $(X,B)$ of the SDE. On the contrary, in the degenerate case where $\sigma$ can vanish, the Brownian motion $\mathbf W$ comes into play to “guarantee sufficient randomness” to the system, and it is therefore necessary to define the solution on the enlarged space $\Omega_{N+d}$. This further clarifies the difference between weak and strong solutions illustrated earlier in Remarks 14.1.7 and 14.3.7.
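In the non-degenerate case, the construction of $B$ in (18.1.4) is completely explicit and can be mimicked on a discretized path (a numerical illustration only, not part of the text: Euler scheme and ad-hoc coefficients $b(t,x)=-x$, $\sigma(t,x)=1+\tfrac12\cos x$, which is bounded away from zero). Simulating X from a Brownian path and rebuilding the increments $dB=\sigma(t,X_t)^{-1}(dX_t-b(t,X_t)\,dt)$, the discrete quadratic variation of B over $[0,1]$ comes out close to 1, consistently with Lévy’s characterization.

```python
import math
import random

random.seed(1)

n = 1000
dt = 1.0 / n

def b(t, x):
    return -x

def sigma(t, x):
    return 1.0 + 0.5 * math.cos(x)   # uniformly non-degenerate: sigma >= 0.5

# Euler path of dX = b dt + sigma dW, driven by Brownian increments dW.
dW = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
X = [0.5]
for k in range(n):
    t = k * dt
    X.append(X[k] + b(t, X[k]) * dt + sigma(t, X[k]) * dW[k])

# Rebuild the Brownian increments as in (18.1.4): dB = (dX - b dt) / sigma.
dB = [(X[k + 1] - X[k] - b(k * dt, X[k]) * dt) / sigma(k * dt, X[k])
      for k in range(n)]

qv = sum(d * d for d in dB)   # discrete quadratic variation of B on [0, 1]
print(qv)                     # close to t = 1
```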
Remark 18.1.5 Stroock and Varadhan (cf. Theorem 6.2.3 in [136]) prove that, for the martingale problem, the equality of marginal distributions implies the equality of finite-dimensional distributions and therefore uniqueness in law. Precisely, suppose that $b,\sigma$ are measurable and bounded functions: if for every $t\in[0,T]$, $x\in\mathbb{R}^N$ and $\varphi\in bC(\mathbb{R}^N)$ we have

$$E^{\mu_1}[\varphi(X_t)] = E^{\mu_2}[\varphi(X_t)]$$

where $\mu_1,\mu_2$ are solutions of the martingale problem for $b,\sigma$ with initial law $\delta_x$, then there exists at most one solution of the martingale problem for $b,\sigma$ with initial law $\delta_x$. Hereafter, we will not use this result but will adopt a more analytical approach to prove weak uniqueness using existence theorems for the Kolmogorov equation associated with the SDE.

18.2 Equations with Hölder Coefficients

We consider an SDE with coefficients $b,\sigma$ as in (18.0.1) and define the diffusion matrix

$$\mathcal{C} = (c_{ij}) := \sigma\sigma^*.$$

To specify the regularity conditions on the coefficients, we introduce the following

Notation 18.2.1 $bC^\alpha_T$ denotes the space of bounded, continuous functions on $]0,T[\times\mathbb{R}^n$ that are uniformly Hölder continuous in x with exponent $\alpha\in\,]0,1]$. On $bC^\alpha_T$, we consider the norm

$$[g]_\alpha := \sup_{]0,T[\times\mathbb{R}^n}|g| + \sup_{\substack{0<t<T\\x\neq y}}\frac{|g(t,x)-g(t,y)|}{|x-y|^\alpha}.\tag{18.2.1}$$

The elements of $bC^\alpha_T$ are continuous functions in $(t,x)$, Hölder continuous in the spatial variable x uniformly with respect to the time variable t. In fact, the continuity condition in t can be omitted and is assumed only to simplify the presentation.
In this section, we prove a weak existence and uniqueness result for SDEs under the following

Assumption 18.2.2
(i) $c_{ij}, b_i\in bC^\alpha_T$ for some $\alpha\in\,]0,1]$ and for each $i,j=1,\dots,N$;
(ii) the diffusion matrix $\mathcal{C}$ is uniformly positive definite: there exists a positive constant $\lambda_0$ such that

$$\frac{1}{\lambda_0}|\eta|^2 \le \langle\mathcal{C}(t,x)\eta,\eta\rangle \le \lambda_0|\eta|^2,\qquad (t,x)\in\,]0,T[\times\mathbb{R}^N,\ \eta\in\mathbb{R}^N.\tag{18.2.2}$$
Theorem 18.2.3 ([!!]) Under Assumption 18.2.2, for every distribution $\mu_0$ on $\mathbb{R}^N$ the SDE

$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dW_t\tag{18.2.3}$$

admits a weak solution $(X,W)$ with initial law $\mu_0$, unique in law. Moreover:

(i) X is a Feller and strong Markov process with characteristic operator

$$A_t := \frac12\sum_{i,j=1}^N c_{ij}(t,x)\,\partial_{x_ix_j} + \sum_{i=1}^N b_i(t,x)\,\partial_{x_i},\qquad (t,x)\in\,]0,T[\times\mathbb{R}^N;$$

(ii) X has a transition density $\Gamma(t,x;s,y)$ which is the fundamental solution² of $\partial_t+A_t$;
(iii) X admits a modification with $\beta$-Hölder continuous trajectories for every $\beta<\frac12$.

The proof of Theorem 18.2.3 is based on the existence results for the fundamental solution of parabolic PDEs in Theorem 18.2.6 below.
Notation 18.2.4 We denote by $C^{1,2}(]0,T[\times\mathbb{R}^N)$ the space of functions defined on $]0,T[\times\mathbb{R}^N$ that are continuously differentiable with respect to t and twice continuously differentiable with respect to x.

Definition 18.2.5 (Backward Cauchy Problem) A classical solution of the backward Cauchy problem for the operator $\partial_t+A_t$ on $]0,s[\times\mathbb{R}^N$ is a function $u\in C^{1,2}(]0,s[\times\mathbb{R}^N)\cap C(]0,s]\times\mathbb{R}^N)$ such that

$$\begin{cases}\partial_tu(t,x) + A_tu(t,x) = 0, & (t,x)\in\,]0,s[\times\mathbb{R}^N,\\ u(s,x) = \varphi(x), & x\in\mathbb{R}^N,\end{cases}\tag{18.2.4}$$

where $\varphi\in C(\mathbb{R}^N)$ is the assigned final datum.

Section 20.3 is dedicated to the rather long and involved proof of the following result.³
Theorem 18.2.6 (Levi [89], Friedman [49]) Under Assumption 18.2.2, there exists a continuous function $\Gamma = \Gamma(t,x;s,y)$, defined for $0<t<s\le T$ and $x,y\in\mathbb{R}^N$, such that:

(i) for every $s\in\,]0,T]$ and for every $\varphi\in bC(\mathbb{R}^N)$ the function defined by

$$u(t,x) = \int_{\mathbb{R}^N}\Gamma(t,x;s,y)\,\varphi(y)\,dy,\qquad (t,x)\in\,]0,s[\times\mathbb{R}^N,\tag{18.2.5}$$

and by $u(s,\cdot)=\varphi$, is a classical solution of the backward Cauchy problem on $]0,s[\times\mathbb{R}^N$ with final datum $\varphi$. We say that $\Gamma$ is the fundamental solution of the operator $\partial_t+A_t$ on $]0,T[\times\mathbb{R}^N$;

(ii) the function

$$p(t,x;s,H) := \int_H\Gamma(t,x;s,y)\,dy,\qquad 0<t<s\le T,\ x\in\mathbb{R}^N,\ H\in\mathcal{B}_N,$$

is a transition law,⁴ enjoys the Feller property (cf. Definitions 2.1.1 and 2.1.10) and satisfies the Chapman-Kolmogorov equation (2.4.4);

(iii) for every $(s,y)\in\,]0,T]\times\mathbb{R}^N$, we have $\Gamma(\cdot,\cdot;s,y)\in C^{1,2}(]0,s[\times\mathbb{R}^N)$ and the following Gaussian estimates hold: there exist two positive constants $\lambda,c$, depending only on $T,N,\alpha,\lambda_0,[c_{ij}]_\alpha$ and $[b_i]_\alpha$, for which we have

$$\frac1c\,G\big(\lambda^{-1}(s-t),\,x-y\big) \le \Gamma(t,x;s,y) \le c\,G\big(\lambda(s-t),\,x-y\big),\tag{18.2.6}$$
$$\big|\partial_{x_i}\Gamma(t,x;s,y)\big| \le \frac{c}{\sqrt{s-t}}\,G\big(\lambda(s-t),\,x-y\big),$$
$$\big|\partial_{x_ix_j}\Gamma(t,x;s,y)\big| + \big|\partial_t\Gamma(t,x;s,y)\big| \le \frac{c}{s-t}\,G\big(\lambda(s-t),\,x-y\big)$$

for every $(t,x)\in\,]0,s[\times\mathbb{R}^N$, where G denotes the standard N-dimensional Gaussian function

$$G(t,x) = \frac{1}{(2\pi t)^{N/2}}\,e^{-\frac{|x|^2}{2t}},\qquad t>0,\ x\in\mathbb{R}^N.$$

² See Theorem 18.2.6 for the definition of fundamental solution.
³ In Sect. 20.3 we will prove an equivalent result, Theorem 20.2.5, which is the forward version of Theorem 18.2.6.
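Property (ii) can be checked by hand in the constant-coefficient toy case (an illustration, not from the text: $N=1$, $b\equiv0$, $\sigma\equiv1$), where $\Gamma(t,x;s,y)=G(s-t,x-y)$ and the Chapman-Kolmogorov identity $\Gamma(t,x;u,y)=\int\Gamma(t,x;s,z)\,\Gamma(s,z;u,y)\,dz$ becomes a Gaussian convolution that can be verified numerically:

```python
import math

def G(t, x):
    """Standard 1-dimensional Gaussian density with variance t."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

t, s, u = 0.0, 0.4, 1.0   # intermediate time s between t and u
x, y = -0.2, 0.9

# Trapezoidal rule for the convolution over a window wide enough
# for the Gaussian tails to be negligible.
h, L = 0.01, 8.0
n = int(2 * L / h)
conv = 0.0
for k in range(n + 1):
    z = -L + k * h
    w = 0.5 if k in (0, n) else 1.0
    conv += w * G(s - t, x - z) * G(u - s, z - y) * h

exact = G(u - t, x - y)   # Chapman-Kolmogorov: the convolution collapses
print(conv, exact)
```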

Proof of Theorem 18.2.3 It is a matter of combining Theorem 18.2.6 with a series of results proven earlier. We examine existence and uniqueness separately.

Weak Solvability Let $\Gamma$ be the fundamental solution on $]0,T[\times\mathbb{R}^N$ of the operator $\partial_t+A_t$ as in Theorem 18.2.6. Due to the properties of $\Gamma$, in particular Theorem 18.2.6-(ii), and the multidimensional version of Theorem 2.4.4, there exists a Markov process $X=(X_t)_{t\in[0,T]}$ that has transition density $\Gamma$ and is such that $X_0\sim\mu_0$. By Proposition 2.2.6, the identity process $\mathbf X$ is a Markov process on the canonical space $\big(\Omega_N,\mathcal{G}^N_T,\mu^X\big)$ equipped with the filtration $\big(\mathcal{G}^N_t\big)_{t\in[0,T]}$ generated by $\mathbf X$.

We show that the law $\mu^X$ of the process X solves the martingale problem for $b,\sigma$ and therefore, by Theorem 18.1.3, the SDE is solvable in the weak sense. We consider the functions

$$\psi_i(x) = x_i,\qquad \psi_{ij}(x)=x_ix_j,\qquad x\in\mathbb{R}^N,\ i,j=1,\dots,N,$$

for which we have

$$A_t\psi_i(x) = b_i(t,x),\qquad A_t\psi_{ij}(x) = c_{ij}(t,x) + b_i(t,x)\,x_j + b_j(t,x)\,x_i.$$

⁴ In particular, according to Definition 2.1.1 of transition law, there exists

$$p(s,x;s,\cdot) := \lim_{t\to s^-} p(t,x;s,\cdot) = \delta_x$$

with the limit understood in the sense of weak convergence.



We observe that the boundedness of the coefficients and the Gaussian upper bound in (18.2.6) guarantee that $A_t\psi_i(X_t), A_t\psi_{ij}(X_t)\in L^1([0,T]\times\Omega_N)$: then from Theorem 2.5.13 it follows that the processes

$$M^i_t := X^i_t - \int_0^t b_i(s,X_s)\,ds,$$
$$Z^{ij}_t := X^i_tX^j_t - \int_0^t\big(c_{ij}(s,X_s)+b_i(s,X_s)X^j_s+b_j(s,X_s)X^i_s\big)\,ds$$

are continuous martingales. To conclude, we prove that $M^{ij}$ in (18.1.2) is indistinguishable from $Z^{ij}$, or, equivalently, that the process

$$Y^{ij}_t := M^{ij}_t - Z^{ij}_t = \int_0^t\big(b_i(s,X_s)(X^j_s - X^j_t) + b_j(s,X_s)(X^i_s - X^i_t)\big)\,ds + \int_0^t b_i(s,X_s)\,ds\int_0^t b_j(s,X_s)\,ds$$

is null. First of all, we have the equality

$$Y^{ij}_t = \int_0^t b_i(s,X_s)\big(M^j_s - M^j_t\big)\,ds + \int_0^t b_j(s,X_s)\big(M^i_s - M^i_t\big)\,ds$$

by the fact that

$$\int_0^t b_i(s,X_s)\big(M^j_s-M^j_t\big)\,ds = \int_0^t b_i(s,X_s)\big(X^j_s-X^j_t\big)\,ds - \int_0^t b_i(s,X_s)\Big(\int_0^s b_j(r,X_r)\,dr\Big)ds + \int_0^t b_i(s,X_s)\,ds\int_0^t b_j(s,X_s)\,ds$$

and, by integration by parts, we have

$$\int_0^t b_i(s,X_s)\Big(\int_0^s b_j(r,X_r)\,dr\Big)ds = \int_0^t b_i(s,X_s)\,ds\int_0^t b_j(s,X_s)\,ds - \int_0^t b_j(s,X_s)\Big(\int_0^s b_i(r,X_r)\,dr\Big)ds.$$

Moreover, we observe that

$$\int_0^t b_i(s,X_s)\big(M^j_s-M^j_t\big)\,ds = -\int_0^t\Big(\int_0^s b_i(r,X_r)\,dr\Big)dM^j_s\tag{18.2.7}$$

which is equivalent to the expression obtained from Itô’s formula

$$d\Big(M^j_t\int_0^t b_i(s,X_s)\,ds\Big) = M^j_t\,b_i(t,X_t)\,dt + \Big(\int_0^t b_i(s,X_s)\,ds\Big)dM^j_t.$$

Formula (18.2.7) is an equality between a BV process and a continuous local martingale: by Theorem 9.3.6, both processes are null, so that $Y^{ij}=0$. Then, by Theorem 18.1.3, $\mathbf X$ is a solution⁵ of the SDE with coefficients $b,\sigma$ and initial law $\mu_0$ with respect to a Brownian motion $\mathbf W$.

Uniqueness in Law and Main Properties Let us prove that if $(X,W)$ is a weak solution of the SDE (18.2.3) on $[0,T]$ then X is a Markov process. Fixed $\varphi\in bC(\mathbb{R}^N)$, we consider the⁶ solution u in (18.2.5) of the backward Cauchy problem (18.2.4). Note that u is a bounded function since, by the Gaussian estimate (18.2.6), we have

$$|u(t,x)| \le c\,\|\varphi\|_{L^\infty(\mathbb{R}^N)}\int_{\mathbb{R}^N} G\big(\lambda(s-t),\,x-y\big)\,dy = c\,\|\varphi\|_{L^\infty(\mathbb{R}^N)},\qquad (t,x)\in\,]0,s]\times\mathbb{R}^N.\tag{18.2.8}$$

By Itô’s formula, $u(t,X_t)$ is a local martingale and a bounded process by (18.2.8): therefore, $u(t,X_t)$ is a true martingale (cf. Remark 8.4.6-(v)) and we have

$$\varphi(X_s) = u(t,X_t) + \int_t^s \nabla_x u(r,X_r)\,\sigma(r,X_r)\,dW_r.\tag{18.2.9}$$

Conditioning (18.2.9) on $\mathcal{F}_t$, we obtain

$$E[\varphi(X_s)\mid\mathcal{F}_t] = u(t,X_t) = \int_{\mathbb{R}^N}\Gamma(t,X_t;s,y)\,\varphi(y)\,dy.$$

Given the arbitrariness of $\varphi$, it follows that X is a Markov process with transition density $\Gamma$: by Theorem 18.2.6-(ii), X is a Feller process and therefore also enjoys the strong Markov property by Theorem 7.1.2.

By Kolmogorov’s continuity Theorem 3.3.4, the process X admits a modification with $\beta$-Hölder continuous trajectories for every $\beta<\frac12$: indeed, for every $0\le t<s\le T$ and $p>0$, the following integral estimate holds

$$E\big[|X_t-X_s|^p\big] = E\Big[E\big[|X_t-X_s|^p\mid X_t\big]\Big] = E\Big[\int_{\mathbb{R}^N}|X_t-y|^p\,\Gamma(t,X_t;s,y)\,dy\Big] \le$$

(by the Gaussian upper bound in (18.2.6))

$$\le c\,E\Big[\int_{\mathbb{R}^N}|X_t-y|^p\,G\big(\lambda(s-t),\,X_t-y\big)\,dy\Big] \le c\,(s-t)^{\frac p2}$$

where the last step is justified by the change of variable $z=\frac{X_t-y}{\sqrt{s-t}}$.

Finally, if $(X^i,W^i)$ for $i=1,2$ are weak solutions of the SDE (18.2.3) on $[0,T]$ then, as just shown, $\Gamma$ is a transition law for both $X^1$ and $X^2$. Therefore, if $X^1_0\stackrel{d}{=}X^2_0$, i.e., if $X^1,X^2$ have the same initial law, then by Proposition 2.4.1 $X^1,X^2$ are equal in law.

To conclude, we observe that, under the assumption of uniform ellipticity (18.2.2), $W^i$ is a functional of $X^i$: to fix ideas, in the case $d=1$, from the SDE we obtain the explicit expression of such a functional as in (18.1.3) and (18.1.4). Then, from Corollary 10.2.28 we have the equality in law of $(X^1,W^1)$ and $(X^2,W^2)$. □

⁵ Possibly extending the canonical space to support also the Brownian motion $\mathbf W$ with respect to which the SDE is written, as in the proof of Theorem 18.1.3 and in the subsequent Remark 18.1.4.
⁶ As we will see in Chap. 20, in general the Cauchy problem (18.2.4) admits more than one solution.
Remark 18.2.7 The latter part of the proof of Theorem 18.2.3 demonstrates a
duality result, namely, that the existence of a fundamental solution of .∂t +At implies
the uniqueness in law of the solution .(X, W ) of the stochastic differential equation.

18.3 Other Results for the Martingale Problem

We present an existence and uniqueness result for weak solutions under significantly
broader assumptions than those of Theorem 18.2.3.
Theorem 18.3.1 (Skorokhod [131], Stroock and Varadhan [136], Krylov [74, 75] [!!]) Let $\mu_0$ be a distribution on $\mathbb{R}^N$. Suppose that

(i) the coefficients $b,\sigma$ are bounded, Borel measurable functions

and at least one of the following assumptions holds:

(ii) $b(t,\cdot),\sigma(t,\cdot)$ are continuous functions for every $t\in[0,T]$;
(iii) condition (18.2.2) of uniform ellipticity holds.

Then there exists a weak solution $(X,W)$ of the SDE

$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dW_t\tag{18.3.1}$$

on $[0,T]$ with initial law $\mu_0$. Moreover, if both assumptions (ii) and (iii) hold, then there is also uniqueness in the weak sense.
As for Theorem 18.2.3, the proof of weak solvability hinges on the martingale problem, and therefore consists in the construction of the law of the solution. However, in the proof of Theorem 18.2.3, this probability distribution is defined by the fundamental solution of the backward Kolmogorov equation, whose existence is ensured by the classical results of the theory of PDEs. Conversely, Skorokhod’s approach to proving Theorem 18.3.1 is more akin to the method employed in establishing the existence of strong solutions. It involves temporal discretization or smoothing of the coefficients to render the equation solvable, followed by taking the limit: the method of successive approximations employed in Theorem 17.2.1 is supplanted by an argument based on relative compactness, or tightness (cf. Section 3.3.2 in [113]), in the space of distributions, from which the weak convergence to a law is deduced; the latter is finally shown to be a solution to the martingale problem. For the details of the proof of Theorem 18.3.1, we refer to Section 2.6 in [77] and to Theorems 6.1.7 and 7.2.1 in [136].
Remark 18.3.2 (Bibliographic Note) The literature on the martingale problem is
vast. We only mention some of the most recent contributions in which equations that
do not satisfy the uniform ellipticity condition (18.2.2) are considered: [11, 44, 46,
95, 140] and [30].

18.4 Strong Uniqueness Through Regularization by Noise

The main result of the section is the following theorem that provides an example of
“regularization by noise”: it extends the results of Sect. 14.2 to the case of strong
solutions.
Theorem 18.4.1 (Zvonkin [154], Veretennikov [144] [!!]) Assume the following
hypotheses:
(i) the drift coefficient is bounded and Hölder continuous, .b ∈ bCTα for some
.α ∈ ]0, 1];

(ii) the diffusion coefficient is bounded and Lipschitz continuous, .σ ∈ bCT1 ;


(iii) condition (18.2.2) of uniform ellipticity holds.
Then for the SDE

.dXt = b(t, Xt )dt + σ (t, Xt )dWt          (18.4.1)

there is existence and uniqueness in the strong sense.


Remark 18.4.2 ([!]) Theorem 18.4.1 illustrates the regularizing effect of noise,
i.e., the diffusive part of the SDE: in the case of zero diffusion .σ , the classic
Example 14.2.1 by Peano shows that the Hölder continuity of the drift b is not
sufficient to guarantee the uniqueness of the solution.
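The failure of uniqueness in Peano's example can be checked directly. The following Python sketch (our own illustration, not part of the original text) verifies by finite differences that both .x ≡ 0 and .x(t) = t 2 /4 solve .ẋ = √|x| with .x(0) = 0; the test points and step size are arbitrary choices.

```python
import math

def f(x):
    # right-hand side of the Peano-type ODE x'(t) = sqrt(|x(t)|)
    return math.sqrt(abs(x))

def residual(x_of_t, t, h=1e-6):
    # |central finite difference of x at t minus f(x(t))|: small iff x solves the ODE at t
    return abs((x_of_t(t + h) - x_of_t(t - h)) / (2 * h) - f(x_of_t(t)))

zero = lambda t: 0.0           # the trivial solution through x(0) = 0
parab = lambda t: t * t / 4.0  # a second solution through the same initial datum

# both candidates satisfy the equation at several times: uniqueness fails
max_res_zero = max(residual(zero, t) for t in (0.5, 1.0, 2.0))
max_res_parab = max(residual(parab, t) for t in (0.5, 1.0, 2.0))
print(max_res_zero, max_res_parab)
```

Both residuals are numerically zero, exhibiting two distinct solutions of the same Cauchy problem.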
First, Zvonkin [154] proved the existence and uniqueness in the strong sense for
SDEs in one dimension with .b ∈ L∞ (]0, T [×R) and .σ = 1; Veretennikov [144]
extended this result to the multidimensional case. Krylov and Röckner [78] showed
that there is existence and uniqueness in the strong sense if .b ∈ Lploc with .p > N , and
Zhang [153] dealt with the case where the diffusion coefficient is not constant. In the
recent work [23], Champagnat and Jabin study the existence and strong uniqueness

for SDEs with irregular coefficients, without assuming the uniform ellipticity of
the diffusion matrix, starting from suitable .Lp estimates for the solutions of the
associated Fokker-Planck equation. Finally, we point out the recent results in [57]
on the approximation of solutions, under minimal regularity assumptions.
For the proof of Theorem 18.4.1 we follow Fedrizzi and Flandoli [43] who use
the so-called Itô-Tanaka trick and the following
Proposition 18.4.3 Under the assumptions of Theorem 18.2.6, let .𝚪 be the fundamental solution of the Kolmogorov operator .∂t + At on .]0, T [×RN . For every .λ ≥ 1, the vector-valued function in .RN

.uλ (t, x) := ∫ₜᵀ ∫_{RN} e−λ(s−t) 𝚪(t, x; s, y)b(s, y) dy ds,    (t, x) ∈ ]0, T ] × RN ,

is a classical solution to the Cauchy problem



.(∂t + At )u = λu − b,    in ]0, T [×RN ,
.u(T , ·) = 0,    in RN .

Moreover, there exists a constant .c > 0, which depends only on .N, λ0 , T and the
norms .[bi ]α and .[cij ]α in (18.2.1), such that

.|uλ (t, x) − uλ (t, y)| ≤ (c/√λ) |x − y|,
.|∇x uλ (t, x) − ∇x uλ (t, y)| ≤ c|x − y|,          (18.4.2)

for every .t ∈ ]0, T [ and .x, y ∈ RN , where .∇x = (∂x1 , . . . , ∂xN ).


Proposition 18.4.3 is a consequence of some estimates obtained in the proof
of Theorem 18.2.6 on the existence of the fundamental solution: we refer to
Sect. 20.3.5 for the proof and details.
Proof of Theorem 18.4.1 The existence of a weak solution is a consequence of
Theorem 18.2.3: therefore, it remains to prove the strong uniqueness from which
the thesis will follow thanks to the Yamada-Watanabe Theorem 14.3.6.
First, we present the Itô-Tanaka trick to transform the SDE into a new equation
with a more regular drift. Let .(X, W ) be a solution of (18.4.1). By Proposition 18.4.3
and Itô’s formula, we have7

7 Here

.(∇x uλ · σ )ij = Σ_{k=1}^{N} (∇x uλ )ik σkj ,    i = 1, . . . , N, j = 1, . . . , d.

.duλ (t, Xt ) = (∂t + At )uλ (t, Xt )dt + (∇x uλ · σ )(t, Xt )dWt
             = (λuλ (t, Xt ) − b(t, Xt ))dt + (∇x uλ · σ )(t, Xt )dWt

or equivalently
.∫₀ᵗ b(s, Xs )ds = uλ (0, X0 ) − uλ (t, Xt ) + λ ∫₀ᵗ uλ (s, Xs )ds + ∫₀ᵗ (∇x uλ · σ )(s, Xs )dWs .          (18.4.3)

Inserting (18.4.3) into (18.4.1), we obtain


.Xt = X0 + uλ (0, X0 ) − uλ (t, Xt ) + λ ∫₀ᵗ uλ (s, Xs )ds + ∫₀ᵗ σ (s, Xs )dWs + ∫₀ᵗ (∇x uλ · σ )(s, Xs )dWs .          (18.4.4)

In this way, the drift coefficient b is replaced by the more regular function .uλ : at
this point, with some small adjustments, one can proceed as in the case of Lipschitz
coefficients, using Grönwall’s lemma to prove uniqueness. In fact, let .X' be another
solution of the SDE (18.4.1) related to the same Brownian motion W and let .Z :=
X − X' . Writing also .X' as in (18.4.4) and subtracting the two equations, we obtain
.Zt = −uλ (t, Xt ) + uλ (t, Xt' ) + λ ∫₀ᵗ (uλ (s, Xs ) − uλ (s, Xs' ))ds
     + ∫₀ᵗ (σ (s, Xs ) − σ (s, Xs' )) dWs
     + ∫₀ᵗ ((∇x uλ · σ )(s, Xs ) − (∇x uλ · σ )(s, Xs' )) dWs .

By the elementary inequality (14.4.6) and the Jensen and Burkholder inequalities (12.3.7), we have

.(1/4) E[|Zt |2 ] ≤ E[|uλ (t, Xt ) − uλ (t, Xt' )|2 ]
     + λ2 T E[∫₀ᵗ |uλ (s, Xs ) − uλ (s, Xs' )|2 ds]
     + E[∫₀ᵗ |σ (s, Xs ) − σ (s, Xs' )|2 ds]
     + E[∫₀ᵗ |(∇x uλ · σ )(s, Xs ) − (∇x uλ · σ )(s, Xs' )|2 ds] ≤

(by the estimates (18.4.2) of Proposition 18.4.3 with .λ ≥ 1 and the Lipschitz assumption on .σ )

. ≤ (c/λ) E[|Zt |2 ] + c(1 + λ) ∫₀ᵗ E[|Zs |2 ] ds,

for some positive constant c that depends only on .N, λ0 , T and the norms .[b]α and .[σ ]1 . In other words, we have

.(1/4 − c/λ) E[|Zt |2 ] ≤ c(1 + λ) ∫₀ᵗ E[|Zs |2 ] ds.

Then, choosing .λ suitably large, we get


.E[|Zt |2 ] ≤ c̄ ∫₀ᵗ E[|Zs |2 ] ds,    t ∈ [0, T ],

for a suitable positive constant .c̄. The thesis follows from Grönwall’s lemma. ⨆

Remark 18.4.4 Formula (18.4.4) can be used as in the proof of Theorem 17.4.1
to obtain the continuous dependence estimate (17.4.1) on the parameters. As a
consequence of Kolmogorov’s continuity Theorem 3.3.4, under the assumptions
of Theorem 18.4.1 the solution of the SDE (18.4.1) with initial datum x at
time t, admits a modification .(t, x, s) |→ Xst,x with locally .α-Hölder continuous
trajectories for every .α ∈ [0, 1[ with respect to the “parabolic” distance: precisely,
for every .α ∈ [0, 1[, .n ∈ N and .ω ∈ Ω there exists .cα,n,ω > 0 such that
.|Xs1t1,x (ω) − Xs2t2,y (ω)| ≤ cα,n,ω (|x − y| + |t1 − t2 |1/2 + |s1 − s2 |1/2 )α ,          (18.4.5)
for every .t1 , t2 , s1 , s2 ∈ [0, T ] such that .t1 ≤ s1 , .t2 ≤ s2 , and for every .x, y ∈ RN
such that .|x|, |y| ≤ n.

18.5 Key Ideas to Remember

We summarize the most relevant results of the chapter. As usual, if you have
any doubt about what the following succinct statements mean, please review the
corresponding section.
• Section 18.1: through the Stroock-Varadhan martingale problem, the study of
weak solvability of an SDE is reduced to the construction of a distribution (the
law of the solution) on the canonical space that makes the processes in (18.1.1)
and (18.1.2) martingales.

• Sections 18.2 and 18.3: we exploit the analytical results on the existence of
the fundamental solution of uniformly parabolic PDEs to solve the martingale
problem. As a consequence, we prove existence, weak uniqueness, and Markov
properties for SDEs with Hölder and bounded coefficients. The assumptions
are further weakened in Theorem 18.3.1 whose proof is based on properties of
relative compactness in the space of distributions.
• Section 18.4: we establish a “regularization by noise” result, ensuring strong
uniqueness for SDEs with Hölder continuous and bounded drift, under a uniform
ellipticity condition.
Main notations used or introduced in this chapter:

Symbol                      Description                                                          Page
.Ωn = C([0, T ]; Rn )       Space of continuous n-dimensional trajectories                       273
.Xt (w) = w(t)              Identity process on .Ωn                                              273
.(Gtn )t∈[0,T ]             Filtration on .Ωn generated by the identity process                  273
.bCTα                       Continuous, bounded, and uniformly Hölder continuous functions in x  341
.[g]α                       Norm in .bCTα                                                        341
.C 1,2 (]0, T [×RN )        Functions continuously differentiable w.r.t. t and twice             342
                            continuously differentiable w.r.t. x
.At                         Characteristic operator                                              341
.𝚪(t, x; s, y)              Fundamental solution                                                 342
.G(t, x)                    Standard N -dimensional Gaussian                                     343
Chapter 19
Complements

The day a man realizes he cannot know everything is a day of


mourning. Then comes the day when he is brushed by the
suspicion that he will not be able to know many things; and
finally that autumn afternoon when it will seem to him that he
has never known too well what he thought he knew.
Julien Green

We offer a concise and relaxed exploration of various paths that the theory of
stochastic differential equations has taken. At the end of each section, we include a
bibliography, directing interested readers to further literature on the specific topics
discussed.

19.1 Markovian Projection and Gyöngy’s Lemma

Consider an Itô process of the form

.dXt = ut dt + vt dBt          (19.1.1)

where u is an N -dimensional process in .L1loc , v is an .N × d-dimensional process in .L2loc , and B is a d-dimensional Brownian motion on the space .(Ω, F , P , Ft ). In general, at any time t, .Xt may depend on the .σ -algebra .Ft (of information up to
time t) in an extremely complicated way through the coefficients .ut and .vt . In this
section, we present a result, known as Gyöngy’s lemma, according to which there

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 353
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1_19

Fig. 19.1 Plot of the trajectories .t |→ Wt (ω) (solid line) and .t |→ W̃t (ω) (dashed line) of the processes of Remark 19.1.1, related to two outcomes .ω = ω1 (in black) and .ω = ω2 (in gray)

exists a diffusion Y , solution of an SDE of the type

.dYt = b(t, Yt )dt + σ (t, Yt )dWt ,          (19.1.2)

which “mimics” X in the sense that it has the same marginal distributions, i.e., .Yt and .Xt are equal in distribution for each t. This result can be useful when one is interested in the law of .Xt for a fixed time t and not in the entire law of the process X. Since the coefficients .b = b(t, y) and .σ = σ (t, y) in (19.1.2) are deterministic functions, by the results of the previous chapters, Y is a Markov process, sometimes called the Markovian projection of X.
Remark 19.1.1 Processes with the same one-dimensional distributions can have very distinct properties: for example, we saw in Remark 4.1.5 that a Brownian motion W has the same one-dimensional distributions as the process .W̃t := √t W1 . However, despite this equivalence, the two processes have different laws, and their trajectories exhibit entirely different properties, as illustrated in Fig. 19.1.
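This difference in pathwise behavior is easy to observe numerically. The following Python sketch (our own hedged illustration, not from the text) samples both processes and checks that the marginal variances at a fixed time agree, while the increments of .W̃ over disjoint intervals are perfectly correlated, unlike those of W ; sample sizes and tolerances are arbitrary choices.

```python
import math
import random

random.seed(0)
n = 200_000
t1, t2 = 0.5, 1.0

w_t2, wt_t2 = [], []   # marginal samples of W_{t2} and Wtilde_{t2}
incr_w, incr_wt = [], []  # pairs of increments over [0,t1] and [t1,t2]
for _ in range(n):
    w1 = random.gauss(0.0, math.sqrt(t1))            # W_{t1}
    w2 = w1 + random.gauss(0.0, math.sqrt(t2 - t1))  # W_{t2}: independent increment
    g = random.gauss(0.0, 1.0)                       # W_1 driving Wtilde_t = sqrt(t) W_1
    w_t2.append(w2)
    wt_t2.append(math.sqrt(t2) * g)
    incr_w.append((w1, w2 - w1))
    incr_wt.append((math.sqrt(t1) * g, (math.sqrt(t2) - math.sqrt(t1)) * g))

def corr(pairs):
    # sample correlation of the two coordinates
    m = len(pairs)
    mx = sum(p[0] for p in pairs) / m
    my = sum(p[1] for p in pairs) / m
    cov = sum((p[0] - mx) * (p[1] - my) for p in pairs) / m
    vx = sum((p[0] - mx) ** 2 for p in pairs) / m
    vy = sum((p[1] - my) ** 2 for p in pairs) / m
    return cov / math.sqrt(vx * vy)

var_w = sum(x * x for x in w_t2) / n    # both close to t2: same marginals
var_wt = sum(x * x for x in wt_t2) / n
rho_w = corr(incr_w)    # close to 0: W has independent increments
rho_wt = corr(incr_wt)  # equal to 1: all increments of Wtilde are multiples of W_1
print(var_w, var_wt, rho_w, rho_wt)
```

The identical marginal variances and the degenerate increment correlation of .W̃ make the point of the remark quantitative.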
Theorem 19.1.2 (Gyöngy [56]) Let X be an Itô process of the form (19.1.1) with
the coefficients .u, v being progressively measurable, bounded, and satisfying the
uniform ellipticity condition

.〈vt vt∗ η, η〉 ≥ λ|η|2 ,    t ∈ [0, T ], η ∈ RN ,

for some positive constant .λ. There exist two bounded and measurable functions

b : [0, T ] × RN −→ RN ,
. σ : [0, T ] × RN −→ RN×N ,

such that, setting .C = σ σ ∗ , we have1

.b(t, Xt ) = E [ut | Xt ] ,    C (t, Xt ) = E [vt vt∗ | Xt ]          (19.1.3)

1 Formula (19.1.3) means that .b(t, ·) and .(σ σ ∗ )(t, ·) are respectively versions of the conditional
expectation functions of .ut and .vt vt∗ given .Xt , according to Definition 4.2.16 in [113].

and the SDE (19.1.2) with coefficients .b, σ admits a weak solution .(Y, W ) such that .Yt and .Xt are equal in distribution for every .t ∈ [0, T ].

Proof We only give a sketch of the proof. Let b and .C = (cij ) be versions of the conditional expectation functions of .ut and .vt vt∗ given .Xt respectively, as in (19.1.3). Moreover, let .σ = C 1/2 be the positive definite square root of the positive definite matrix .C : the complete proof in [56] uses a regularization argument on the coefficients that allows one to reduce to the case where .bi (t, ·) and .cij (t, ·) are at least Hölder continuous functions, so as to satisfy the hypotheses of Theorem 18.2.6 for the existence of a fundamental solution of the characteristic operator .At + ∂t where

.At := (1/2) Σ_{i,j=1}^{N} cij (t, x)∂xi xj + Σ_{i=1}^{N} bi (t, x)∂xi .

Hence, fixing .s ∈ ]0, T ] and .ϕ ∈ C0∞ (RN ), consider the classical, bounded solution f of the backward Cauchy problem

.∂t f (t, x) + At f (t, x) = 0,    (t, x) ∈ ]0, s[×RN ,
.f (s, x) = ϕ(x),    x ∈ RN .

By Itô’s formula, we have

.f (s, Xs ) = f (0, X0 ) + (1/2) Σ_{i,j=1}^{N} ∫₀ˢ (vt vt∗ )ij ∂xi xj f (t, Xt )dt
     + ∫₀ˢ (ut ∇x f (t, Xt ) + ∂t f (t, Xt )) dt + ∫₀ˢ ∇x f (t, Xt )vt dBt          (19.1.4)

and taking the expectation2

.E [f (s, Xs )] = E [f (0, X0 )] + (1/2) Σ_{i,j=1}^{N} ∫₀ˢ E [(vt vt∗ )ij ∂xi xj f (t, Xt )] dt
     + ∫₀ˢ E [ut ∇x f (t, Xt ) + ∂t f (t, Xt )] dt =

2 Here we use a technical argument that relies on the analytical results of Chap. 20: the estimate
of Corollary 20.2.7 guarantees that .∇x f (t, Xt )vt ∈ L2 and therefore the stochastic integral in
(19.1.4) has zero expectation.

(by the properties of conditional expectation)

. = E [f (0, X0 )] + (1/2) Σ_{i,j=1}^{N} ∫₀ˢ E [E [(vt vt∗ )ij | Xt ] ∂xi xj f (t, Xt )] dt
     + ∫₀ˢ E [E [ut | Xt ] ∇x f (t, Xt ) + ∂t f (t, Xt )] dt =

(by (19.1.3))

. = E [f (0, X0 )] + ∫₀ˢ E [(At f + ∂t f )(t, Xt )] dt =

(being f a solution of the Cauchy problem)

. = E [f (0, X0 )] . (19.1.5)

On the other hand, by Theorem 18.3.1 there exists a weak solution .(Y, W ) of the SDE (19.1.2) with initial law equal to the law of .X0 . By Itô’s formula, the process .f (t, Yt ) is a martingale3 and therefore, by (19.1.5), we have

.E [ϕ(Ys )] = E [f (s, Ys )] = E [f (0, Y0 )] = E [f (0, X0 )] = E [f (s, Xs )] = E [ϕ(Xs )]

so that .Ys and .Xs are equal in distribution, given the arbitrariness of .ϕ. ⨆

Remark 19.1.3 (Bibliographic Note) Markovian projection methods are widely
used in mathematical finance for the calibration of local-stochastic volatility and
interest rates models: in this regard, see, for example, [3], [83] and Section 11.5
in [55]. A version of Gyöngy’s Theorem 19.1.2 that relaxes the hypotheses on the
coefficients has been more recently proven by Brunick and Shreve [22].
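To make (19.1.3) concrete, consider the following toy example of ours (not taken from [56]): .Xt = vBt , where v is drawn once at time 0 from .{1, 2} with equal probability. Then .vt vt∗ = v 2 and the projected diffusion coefficient is .c(t, x) = E[v 2 | Xt = x], which Bayes’ rule gives in closed form as a ratio of Gaussian densities. The hedged Python sketch below compares this closed form with a Monte Carlo estimate obtained by binning samples near x; the sample size, bin width, and tolerances are arbitrary choices.

```python
import math
import random

random.seed(1)
t, x0, half_width, n = 1.0, 1.0, 0.05, 400_000

def gauss_pdf(x, var):
    # centered Gaussian density with the given variance
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# closed form for c(t, x) = E[v^2 | X_t = x]: given v, X_t ~ N(0, v^2 t),
# so the posterior weights of v = 1 and v = 2 are Gaussian densities
num = 1.0 * gauss_pdf(x0, t) + 4.0 * gauss_pdf(x0, 4.0 * t)
den = gauss_pdf(x0, t) + gauss_pdf(x0, 4.0 * t)
c_exact = num / den

# Monte Carlo: sample (v, B_t), keep the paths with X_t near x0, average v^2
tot, hits = 0.0, 0
for _ in range(n):
    v = 1.0 if random.random() < 0.5 else 2.0
    x = v * random.gauss(0.0, math.sqrt(t))
    if abs(x - x0) < half_width:
        tot += v * v
        hits += 1
c_mc = tot / hits
print(c_exact, c_mc)
```

The mimicking diffusion of Theorem 19.1.2 would then use .σ (t, x) = √c(t, x); here we only verify the conditional-expectation coefficient itself.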

19.2 Backward Stochastic Differential Equations

In the previous chapters, we examined SDEs with an assigned initial datum.


However, in some applications, for example in stochastic optimal control theory or
mathematical finance, problems arise where it is natural to assign a final condition:

3 Precisely, by Itô’s formula the process .f (t, Yt ) is a local martingale, but it is also a true martingale by the boundedness of the function f .

in this case, we speak of backward SDEs (or BSDEs). The most elementary
example is

dYt = 0,
. (19.2.1)
YT = η.

If the datum .η ∈ RN is not random, (19.2.1) is a simple ordinary differential


equation (ODE) with constant solution .Y ≡ η. The situation is completely different
if we set the problem in a space .(Ω, F , P ) on which a Brownian motion W is
defined with standard filtration .F W and assume .η ∈ mFTW : in fact, to remain
within the classical Itô calculus, we would like the solution Y to be an adapted
process and therefore the constant solution equal to .η is not acceptable. The first
problem is therefore to correctly formulate the concept of a solution to a BSDE.
For each .η ∈ L2 (Ω, FTW , P ), the adapted process that best (in .L2 norm)
approximates the constant process equal to .η is
.Yt := E [η | FtW ] ,    t ∈ [0, T ].          (19.2.2)

From this perspective, the process Y in (19.2.2) is the natural candidate to be a


solution to the BSDE (19.2.1). Clearly, it is not necessarily the case that Y in
(19.2.2) verifies the equation .dYt = 0. Indeed, since Y is a .F W -square-integrable
martingale, by the martingale representation Theorem 13.5.1 there exists a unique
.Z ∈ L2 such that

.Yt = Y0 + ∫₀ᵗ Zs dWs = (Y0 + ∫₀ᵀ Zs dWs ) − ∫ₜᵀ Zs dWs = η − ∫ₜᵀ Zs dWs ,

where we used .Y0 + ∫₀ᵀ Zs dWs = YT = η.

This means that Y verifies the forward SDE



.dYt = Zt dWt ,
.Y0 = η − ∫₀ᵀ Zs dWs .          (19.2.3)

Although it may not seem obvious, it is not difficult to prove that .(Y, Z) is the only
pair of processes in .L2 that satisfies (19.2.3): in fact, if (19.2.3) were also satisfied
by .(Y ' , Z ' ) ∈ L2 , then, setting .A = Y − Y ' and .B = Z − Z ' , we would have

dAt = Bt dWt ,
.
AT = 0.

By Itô’s formula, we have

.dAt2 = 2At dAt + d〈A〉t

and therefore, since .AT = 0 and .d〈A〉t = Bt2 dt,

.At2 = − ∫ₜᵀ 2As dAs − ∫ₜᵀ Bs2 ds

and

.E [At2 + ∫ₜᵀ Bs2 ds] = −E [∫ₜᵀ 2As dAs] = 0

where the last equality is due to the fact that A, and therefore also the stochastic
integral, is a martingale. Based on what has just been proven, the following
definition is well posed.
Definition 19.2.1 Let W be a Brownian motion on the space .(Ω, F , P ) endowed with the standard filtration .F W . We say that the pair .(Y, Z) ∈ L2 , the unique solution of the SDE (19.2.3), is the adapted solution of the BSDE (19.2.1) with final datum .η ∈ L2 (Ω, FTW , P ).
Note that by definition we have

dYt = Zt dWt ,
.
YT = η.
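As a worked example of ours (not from the text): for .η = WT2 the adapted solution is .Yt = E[WT2 | FtW ] = Wt2 + (T − t) with .Zt = 2Wt , since .WT2 = T + ∫₀ᵀ 2Ws dWs by the martingale representation. The hedged Python sketch below checks this identity along one discretized path; the step size and tolerance are arbitrary choices.

```python
import math
import random

random.seed(2)
T, n = 1.0, 20_000
dt = T / n
sq = math.sqrt(dt)

w = 0.0
ito_sum = 0.0  # Euler (left-endpoint) approximation of the Ito integral of 2 W_s
for _ in range(n):
    dw = random.gauss(0.0, sq)
    ito_sum += 2.0 * w * dw  # Z_s = 2 W_s, evaluated at the left endpoint
    w += dw

lhs = w * w        # eta = W_T^2
rhs = T + ito_sum  # Y_0 + integral of Z dW, with Y_0 = E[W_T^2] = T
err = abs(lhs - rhs)
print(lhs, rhs, err)
```

The discrepancy reduces to .|Σk (ΔWk )2 − T |, which vanishes as the mesh goes to zero.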

In a similar way, more general backward equations of the form



dYt = f (t, Yt , Zt )dt + Zt dWt ,
.
YT = η,

are studied. Under standard Lipschitz assumptions on the coefficient .f = f (t, y, z)


in the variables .(y, z), it is possible to prove the existence and uniqueness of the
adapted solution .(Y, Z): see, for example, Theorem 4.2, Chapter 1 in [93].
Often a BSDE is coupled with a forward SDE of the type

dXt = b(t, Xt )dt + σ (t, Xt )dWt .


.

Given .u = u(t, x) ∈ C 1,2 ([0, T [×RN ), applying Itô’s formula to .Yt := u(t, Xt ) we
obtain

dYt = (∂t + At )u(t, Xt )dt + Zt dWt


.

where .At is the characteristic operator of X and

.Zt := (∇x u)(t, Xt )σ (t, Xt ).



In particular, if u is a solution of the quasi-linear Cauchy problem



(∂t + At )u(t, x) = f (t, x, u(t, x), ∇x u(t, x)σ (t, x)) (t, x) ∈ [0, T [×RN ,
.
u(T , x) = ϕ(x) x ∈ RN ,
(19.2.4)
then .(X, Y, Z) solves the forward-backward system of equations (FBSDE)


.dXt = b(t, Xt )dt + σ (t, Xt )dWt ,
.dYt = f (t, Xt , Yt , Zt )dt + Zt dWt ,          (19.2.5)
.YT = ϕ(XT ).

Under appropriate assumptions that guarantee the existence of a solution4 of the


problem (19.2.4), by construction we have

.u(t, x) = Ytt,x (19.2.6)

where .Y t,x is the solution of the FBSDE (19.2.5) with initial datum .Xt = x.
Formula (19.2.6) is a nonlinear Feynman-Kac formula that generalizes the classical
representation formula of Sect. 15.4.
Remark 19.2.2 (Bibliographic Note) The main motivation for the study of
BSDEs comes from the theory of optimal stochastic control, starting from the works
[17] and [15]; some applications to mathematical finance are discussed in [39]. The
earliest results about existence and the nonlinear Feynman-Kac representation come
from [109], [117], and [2]. We point to the following books as essential references
for the theory of backward equations: Ma and Yong [93], Yong and Zhou [150],
Pardoux and Rascanu [110], and Zhang [152].

19.3 Filtering and Stochastic Heat Equation

In this section, we outline some basic ideas of the theory of stochastic filtering and,
in a simple and explicit case, introduce the notion of stochastic partial differential
equation (abbreviated as SPDE), which intervenes naturally in this type of problems.
Given .(W, B) a standard two-dimensional Brownian motion, we consider the
process

.Xtσ := σ Wt + √(1 − σ 2 ) Bt ,    σ ∈ [0, 1].

4 Since it is a non-linear problem, the solution u is understood in a generalized sense, for example
as a “viscosity solution” (see, for example, Theorem 2.1, Chap. 8 in [93]).

Suppose that .Xσ represents a signal that is transmitted but not observable with
precision due to some disturbance in the transmission: precisely, we assume that
we can observe precisely .Wt , called the observation process, while the Brownian
motion .Bt represents the noise in the transmission.
It is easy to verify that .Xσ is a real Brownian motion for every .σ ∈ [0, 1]. The problem of stochastic filtering consists in obtaining the best estimate of the signal .Xσ based on the observation W : in fact, it is not difficult to prove that

.μ_{Xtσ |FtW} = N_{σ Wt ,(1−σ 2 )t}          (19.3.1)

where .μ_{Xtσ |FtW} denotes the conditional law of .Xtσ given the .σ -algebra .FtW of observations on W up to time t (here .F W is the standard filtration of W ). To prove (19.3.1) it is enough to calculate the conditional characteristic function

.ϕ_{Xtσ |FtW}(η) = E [e^{iηXtσ} | FtW ] = e^{iησ Wt} E [e^{iη√(1−σ 2 ) Bt} | FtW ] =

(by independence of W and B)

. = e^{iησ Wt} E [e^{iη√(1−σ 2 ) Bt}]

which proves (19.3.1). We observe that in particular:


• when there is no noise, .σ = 1, we have .Xtσ = Wt and .μ_{Xtσ |FtW} = δWt , that is, the conditional law degenerates into a Dirac distribution;
• when there is no observation, .σ = 0, then .Xtσ = Bt and the conditional law is obviously .μ_{Xtσ |FtW} = N0,t with Gaussian density

.𝚪(t, x) = (1/√(2π t)) e^{−x 2 /2t} ,    t > 0, x ∈ R.          (19.3.2)

If .0 ≤ σ < 1 then .Xtσ has the following conditional density given .FtW :

.pt (x) = 𝚪((1 − σ 2 )t, x − σ Wt ),    t > 0, x ∈ R.          (19.3.3)

If .σ > 0, clearly the conditional density .pt (x) is a stochastic process: from a
practical standpoint, having the observation of .Wt available and inserting it into
(19.3.3), we obtain the expression of the law of .Xtσ estimated (or “filtered”) based
on such observation. Note that .pt (x) is a Gaussian function with stochastic drift,
dependent on the observation, and variance proportional to .1 − σ 2 . Figure 19.2
represents the plot of a simulation of the stochastic Gaussian density .pt (x).
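The conditional law (19.3.1) can also be checked by brute force. The following hedged Python sketch (our own illustration, not from the text) simulates many pairs .(Wt , Bt ), keeps the outcomes where .Wt falls near an observed value w, and compares the empirical mean and variance of .Xtσ on that event with .σ w and .(1 − σ 2 )t; the sample size, bin width, and tolerances are arbitrary choices.

```python
import math
import random

random.seed(3)
t, sigma, w_obs, half_width, n = 1.0, 0.6, 0.5, 0.02, 400_000

vals = []
for _ in range(n):
    w = random.gauss(0.0, math.sqrt(t))  # observation W_t
    b = random.gauss(0.0, math.sqrt(t))  # independent noise B_t
    x = sigma * w + math.sqrt(1.0 - sigma ** 2) * b  # signal X_t^sigma
    if abs(w - w_obs) < half_width:      # condition on W_t close to w_obs
        vals.append(x)

m = sum(vals) / len(vals)
v = sum((x - m) ** 2 for x in vals) / len(vals)
# (19.3.1) predicts X_t | W_t = w_obs ~ N(sigma * w_obs, (1 - sigma^2) t)
print(m, sigma * w_obs, v, (1 - sigma ** 2) * t)
```

The empirical statistics of the binned samples approximate the Gaussian parameters predicted by (19.3.1).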
Fig. 19.2 Plot of a simulation of the fundamental solution .pt (x) of the stochastic heat equation

In analogy with the unconditioned case examined in Sects. 2.5.3 and 17.3.1, .pt (x) is a solution of the Kolmogorov forward (Fokker-Planck) equation, which in this case is an SPDE: in fact, recalling the expression (19.3.3) of .pt (x) in terms of .𝚪 = 𝚪(s, y) in (19.3.2), by Itô’s formula we have


.

.dpt (x) = (1 − σ 2 )(∂s 𝚪)((1 − σ 2 )t, x − σ Wt )dt − σ (∂y 𝚪)((1 − σ 2 )t, x − σ Wt )dWt + (σ 2 /2)(∂yy 𝚪)((1 − σ 2 )t, x − σ Wt )dt =

(since .𝚪 solves the forward heat equation .∂s 𝚪(s, y) = (1/2) ∂yy 𝚪(s, y))

. = (1/2)(∂yy 𝚪)((1 − σ 2 )t, x − σ Wt )dt − σ (∂y 𝚪)((1 − σ 2 )t, x − σ Wt )dWt
  = (1/2) ∂xx pt (x)dt − σ ∂x pt (x)dWt .
In other words, the conditional density .pt (x) is the fundamental solution of the
stochastic heat equation

.dpt (x) = (1/2) ∂xx pt (x)dt − σ ∂x pt (x)dWt
which, in the case .σ = 0 where the observation is null, degenerates into the classical
heat equation.
Remark 19.3.1 (Bibliographic Note) Among the numerous monographs on the
theory of SPDEs, we particularly mention the books by Rozovskii [125], Kunita
[82], Prévôt and Röckner [120], Kotelenez [73], Chow [24], Liu and Röckner [91],
Lototsky and Rozovskii [92], and Pardoux [108]. For the study of stochastic filtering

problems, we refer, for example, to Fujisaki et al. [52], Pardoux [107], Fristedt et al.
[51], Elworthy et al. [40]. In [146] and [81], alternative approaches to the derivation
of filtering SPDEs are proposed, based on arguments similar to those used for the
proof of Feynman-Kac formulas.

19.4 Backward Stochastic Integral and Krylov’s SPDE

In this section, we present an interesting result according to which, for a fixed time
T , the solution of an SDE seen as a stochastic process varying over time and initial
datum, i.e., .(t, x) |→ XTt,x in the usual notations of Chap. 15, solves a stochastic
partial differential equation (SPDE) involving the characteristic operator of the SDE.
The statement of this result and the formulation of the Krylov’s backward SPDE (so
called in Section 1.2.3 in [126]) requires the introduction of the backward stochastic
integral in which the temporal structure of information, Brownian motion, and the
related filtration are inverted.
Let W be a d-dimensional Brownian motion on the space .(Ω, F , P ). For
.t ∈ [0, T ] we consider the (completed) .σ -algebra of the increments of a Brownian

motion between t and T , defined by

Fˆt := σ (Gˆt ∪ N ),
. Gˆt := σ (Ws − Wt , s ∈ [t, T ]). (19.4.1)

Clearly, (19.4.1) defines a decreasing family of .σ -algebras; we say that

F→t := FˆT −t , t ∈ [0, T ],

is the backward Brownian filtration. It is straightforward to verify that the process

.W→t := WT − WT −t ,    t ∈ [0, T ],

is a Brownian motion on .(Ω, F , P , F→t ). The backward stochastic integral is


defined as
.∫ₜˢ ur ⋆ dWr := ∫_{T −s}^{T −t} uT −r dW→r ,    0 ≤ t ≤ s ≤ T ,          (19.4.2)

under the assumptions on u for which the right-hand side of (19.4.2) is defined in
the usual sense of Itô, i.e.,
→ -progressively measurable (thus .ut ∈ mFˆt for every .t ∈ [0, T ]);
(i) .t |→ uT −t is .F
(ii) .u ∈ L2 ([0, T ]) a.s.

For practical purposes, according to Corollary 10.2.27, if u is continuous then the backward integral is the limit in probability

.∫ₜˢ ur ⋆ dWr := lim_{|π |→0+} Σ_{k=1}^{n} utk (Wtk − Wtk−1 )          (19.4.3)

where .π = {t = t0 < t1 < · · · < tn = s} denotes a partition of .[t, s]: note, in particular, that unlike the usual Itô integral, the coefficient u in the sum in (19.4.3) is evaluated at the right endpoint of each interval of the partition
and .utk ∈ mFˆtk by hypothesis.
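As a hedged numerical illustration of (19.4.3) (our own example, not from the text): for the backward-adapted integrand .ur = WT − Wr , the right-endpoint sums telescope against the quadratic variation, and one finds the closed form .∫₀ᵀ (WT − Wr ) ⋆ dWr = (WT2 − T )/2. The Python sketch below compares the right-endpoint Riemann sum with this limit along one simulated path; the step size and tolerance are arbitrary choices.

```python
import math
import random

random.seed(4)
T, n = 1.0, 20_000
dt = T / n

w = [0.0]  # one Brownian path on a uniform grid of [0, T]
for _ in range(n):
    w.append(w[-1] + random.gauss(0.0, math.sqrt(dt)))
wT = w[-1]

# right-endpoint Riemann sums as in (19.4.3); the integrand u_r = W_T - W_r
# is measurable w.r.t. the increments of W after time r, as required
s = sum((wT - w[k]) * (w[k] - w[k - 1]) for k in range(1, n + 1))

exact = (wT * wT - T) / 2.0  # closed-form limit of the right-endpoint sums
print(s, exact, abs(s - exact))
```

The error again reduces to .(T − Σk (ΔWk )2 )/2 and disappears with the mesh.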
An N-dimensional backward Itô process is a process of the form
.Xt = XT + ∫ₜᵀ us ds + ∫ₜᵀ vs ⋆ dWs ,    t ∈ [0, T ],

also written in differential form as

. − dXt = ut dt + vt ⋆ dWt . (19.4.4)

We state the backward version of Itô’s formula.


Theorem 19.4.1 (Backward Itô’s Formula) Let .F = F (t, x) ∈ C 1,2 ([0, T ] ×
RN ) and let X be the process in (19.4.4). We have

.−dF (t, Xt ) = ((∂t F )(t, Xt ) + (1/2) Σ_{i,j=1}^{N} (vt vt∗ )ij (∂xi xj F )(t, Xt ) + ut (∇x F )(t, Xt )) dt
     + Σ_{i=1}^{N} Σ_{j=1}^{d} (vt )ij (∂xi F )(t, Xt ) ⋆ dWtj .          (19.4.5)

The main result of the section is the following


Theorem 19.4.2 (Krylov’s Backward SPDE) Assume that .b, σ ∈ bC 3 ([0, T ] ×
RN ) and let .s |→ Xst,x be the solution of the SDE

.dXst,x = b(s, Xst,x )ds + σ (s, Xst,x )dWs          (19.4.6)

with initial condition .Xtt,x = x. Then the process .(t, x) |→ XTt,x solves the backward
SPDE

.−dXTt,x = At XTt,x dt + (∇x XTt,x )σ (t, x) ⋆ dWt ,
.XTT ,x = x,          (19.4.7)

where

.At = (1/2) Σ_{i,j=1}^{N} cij (t, x)∂xj xi + Σ_{i=1}^{N} bi (t, x)∂xi ,    (cij ) := σ σ ∗ ,

is the characteristic operator of the SDE (19.4.6). The explicit expressions of the drift coefficient and the diffusion term in (19.4.7) are

.At XTt,x = (1/2) Σ_{i,j=1}^{N} cij (t, x)∂xj xi XTt,x + Σ_{i=1}^{N} bi (t, x)∂xi XTt,x ,

.(∇x XTt,x )σ (t, x) ⋆ dWt = Σ_{i=1}^{N} Σ_{j=1}^{d} σij (t, x)∂xi XTt,x ⋆ dWtj .

Proof We only give a sketch of the proof and refer to Proposition 5.3 in [126]
for the details. To simplify the presentation, we only treat the one-dimensional and
autonomous case, following the approach proposed in [145]. First of all, thanks
to Kolmogorov’s continuity theorem 3.3.4 and the .Lp estimates of dependence on
the initial datum, extending the results of Corollary 17.4.2, we have that, up to
modifications, .x |→ XTt,x is sufficiently regular to support the derivatives that appear
in the SPDE in the classical sense. We use the second-order Taylor expansion for functions of class .C 2 (R):

.f (δ) − f (0) = δf ' (0) + (δ 2 /2) f '' (λδ),    λ ∈ [0, 1].          (19.4.8)
Given a partition of .[t, T ], we have
.XTt,x − x = XTt,x − XTT ,x = Σ_{k=1}^{n} (XTtk−1 ,x − XTtk ,x ) =

(by the flow property of Theorem 17.2.1)

. = Σ_{k=1}^{n} (XT^{tk , Xtk^{tk−1 ,x}} − XTtk ,x ) =

(by (19.4.8) with .f (δ) = XTtk ,x+δ and .δ = Δk X := Xtk^{tk−1 ,x} − x)

. = Σ_{k=1}^{n} (Δk X ∂x XTtk ,x + ((Δk X)2 /2) ∂xx XTtk ,x+λk Δk X )          (19.4.9)

with .λk = λk (ω) ∈ [0, 1]. Now we have


.Δk X = Xtk^{tk−1 ,x} − x = ∫_{tk−1}^{tk} b(Xs^{tk−1 ,x}) ds + ∫_{tk−1}^{tk} σ (Xs^{tk−1 ,x}) dWs .

Therefore, setting

.Δk t = tk − tk−1 ,    Δk W = Wtk − Wtk−1 ,    Δ̃k X = b(x)Δk t + σ (x)Δk W,

we have

.Δk X − Δ̃k X = ∫_{tk−1}^{tk} (b(Xs^{tk−1 ,x}) − b(x)) ds + ∫_{tk−1}^{tk} (σ (Xs^{tk−1 ,x}) − σ (x)) dWs = O(Δk t),

.∂xx XT^{tk ,x+λk Δk X} − ∂xx XTtk ,x = O(Δk t),

in .L2 (Ω, P ) norm or, more precisely,


.E [|Δk X − Δ̃k X|2 + |∂xx XT^{tk ,x+λk Δk X} − ∂xx XTtk ,x |2 ] ≤ c(1 + |x|2 )(Δk t)2

with the constant c depending only on T and the Lipschitz constants of b and .σ .
Hence, from (19.4.9) we obtain
.XTt,x − x = Σ_{k=1}^{n} (Δ̃k X ∂x XTtk ,x + ((Δ̃k X)2 /2) ∂xx XTtk ,x + O(Δk t)).

Note that .∂x XTtk ,x , ∂xx XTtk ,x ∈ mFˆtk ; therefore, by (19.4.3), letting the partition
mesh go to zero, we have


.Σ_{k=1}^{n} Δ̃k X ∂x XTtk ,x −→ ∫ₜᵀ b(x)∂x XTs,x ds + ∫ₜᵀ σ (x)∂x XTs,x ⋆ dWs ,

.Σ_{k=1}^{n} (Δ̃k X)2 ∂xx XTtk ,x −→ ∫ₜᵀ σ 2 (x)∂xx XTs,x ds,

in norm .L2 (Ω, P ) and this concludes the proof. ⨆




Let us prove an interesting invariance property of Krylov’s SPDE.


Corollary 19.4.3 Given .F ∈ bC 2 (RN ) and X as in (19.4.6), let .VTt,x = F (XTt,x ). Then .VTt,x also satisfies the SPDE (19.4.7):

.−dVTt,x = At VTt,x dt + (∇x VTt,x )σ (t, x) ⋆ dWt .

Proof To fix the ideas, let us start with the one-dimensional case: by the backward
SPDE (19.4.7) and Itô’s formula (19.4.5), we have

.−dF (XTt,x ) = ((σ 2 (t, x)/2) F '' (XTt,x )(∂x XTt,x )2 + (σ 2 (t, x)/2) F ' (XTt,x )∂xx XTt,x
     + b(t, x)F ' (XTt,x )∂x XTt,x ) dt + σ (t, x)F ' (XTt,x )∂x XTt,x ⋆ dWt =

(since .∂x VTt,x = F ' (XTt,x )∂x XTt,x and .∂xx VTt,x = F '' (XTt,x )(∂x XTt,x )2 + F ' (XTt,x )∂xx XTt,x )

. = ((σ 2 (t, x)/2) ∂xx VTt,x + b(t, x)∂x VTt,x ) dt + σ (t, x)∂x VTt,x ⋆ dWt

which proves the thesis. The multidimensional case is analogous: first of all

.∂xh VTt,x = (∇x F )(XTt,x )∂xh XTt,x ,
.∂xh xk VTt,x = Σ_{i,j=1}^{N} (∂xi xj F )(XTt,x )(∂xh XTt,x )i (∂xk XTt,x )j + (∇x F )(XTt,x )(∂xh xk XTt,x ),          (19.4.10)

and by (19.4.7) and (19.4.5)

.−dF (XTt,x ) = (1/2) Σ_{i,j=1}^{N} ((∇x XTt,x )σ (t, x)((∇x XTt,x )σ (t, x))∗ )ij (∂xi xj F )(XTt,x )dt
     + (At XTt,x )(∇x F )(XTt,x )dt + (∇x F )(XTt,x )(∇x XTt,x )σ (t, x) ⋆ dWt =

(by (19.4.10))

. = At VTt,x dt + (∇x VTt,x )σ (t, x) ⋆ dWt .

which proves the thesis. ⨆




Remark 19.4.4 (Bibliographic Note) The assumptions regarding the coefficients in Theorem 19.4.2 can be relaxed: in [126], Theorem 5.1, it is shown that if .b, σ ∈ bC 1 ([0, T ] × RN ) then .(t, x) |→ XTt,x is a distributional solution of equation (19.4.7).
The material in this section is based on the original works of Krylov in [76], [79],
[80] and Veretennikov [145]. The monographs [82] and [126] are now considered
classical references for the study of SPDEs and stochastic flows.
Chapter 20
A Primer on Parabolic PDEs

Lo so
Del mondo e anche del resto
Lo so
Che tutto va in rovina
Ma di mattina
Quando la gente dorme
Col suo normale malumore
Mi può bastare un niente
Forse un piccolo bagliore
Un’aria già vissuta
Un paesaggio o che ne so

E sto bene
Io sto bene come uno quando sogna
Non lo so se mi conviene
Ma sto bene, che vergogna1
Giorgio Gaber

1 I know

About the world and everything else


I know
That everything is crumbling
But in the morning
When people are asleep
With their usual bad mood
It can be enough for me
Maybe a small gleam
An already experienced air
A landscape or whatever
And I’m fine
I’m fine like someone when they dream
I don’t know if it’s good for me
But I’m fine, what a shame.


We provide a concise overview of fundamental results concerning the existence and


uniqueness of solutions to the Cauchy problem for parabolic partial differential
equations. The monographs by Friedman [49], Ladyzhenskaia et al. [84], Oleinik
and Radkevic [103], although somewhat dated, are classic reference texts for a more
complete and in-depth treatment.
We consider a second-order partial differential operator of the form

.L := (1/2) Σ_{i,j=1}^{N} cij (t, x)∂xi xj + Σ_{j=1}^{N} bj (t, x)∂xj + a(t, x) − ∂t          (20.0.1)

defined for .(t, x) belonging to the strip

ST := ]0, T [×RN ,
. (20.0.2)

where .T > 0 is fixed. We assume that the matrix of coefficients .(cij ) is symmetric
and positive semidefinite: in this case, we say that .L is a forward parabolic
operator.
The interest in this type of operators is due to the fact that, as seen previously in
Sects. 2.5 and 15.1,

A_t := (1/2) Σ_{i,j=1}^N c_ij(t, x) ∂_{x_i x_j} + Σ_{j=1}^N β_j(t, x) ∂_{x_j},   (c_ij) := σσ*,    (20.0.3)

is the characteristic operator of the SDE

.dXt = β(t, Xt )dt + σ (t, Xt )dWt (20.0.4)

and the related forward Kolmogorov operator .L = At∗ − ∂t is, at least formally, of
the form (20.0.1) with


b_j := −β_j + Σ_{i=1}^N ∂_{x_i} c_ij,   a := −Σ_{i=1}^N ∂_{x_i} β_i + (1/2) Σ_{i,j=1}^N ∂_{x_i x_j} c_ij.    (20.0.5)
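The adjoint computation behind (20.0.5) is easy to check symbolically. The following sketch is not from the book: it verifies the one-dimensional case with sympy, under the assumption that the drift β and the diffusion coefficient c = σ² are smooth generic functions.

```python
# Hypothetical check (not part of the book) of (20.0.5) for N = 1.
# The formal adjoint of A_t = (1/2) c d_xx + beta d_x acting on u is
# A_t* u = (1/2) d_xx(c u) - d_x(beta u); we compare its coefficients
# with b = -beta + d_x c and a = -d_x beta + (1/2) d_xx c.
import sympy as sp

t, x = sp.symbols('t x')
u = sp.Function('u')(t, x)
beta = sp.Function('beta')(t, x)   # drift of the SDE (assumed smooth)
c = sp.Function('c')(t, x)         # c = sigma^2, diffusion coefficient

adjoint = sp.diff(c*u, x, 2)/2 - sp.diff(beta*u, x)

b = -beta + sp.diff(c, x)                      # b_j in (20.0.5)
a = -sp.diff(beta, x) + sp.diff(c, x, 2)/2     # a in (20.0.5)
predicted = c*sp.diff(u, x, 2)/2 + b*sp.diff(u, x) + a*u

assert sp.expand(adjoint - predicted) == 0
```

The check is purely formal: sympy applies the product rule and the two expressions cancel term by term.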

Note that in a forward operator, the time derivative appears with a negative sign:
as already mentioned in Sect. 2.5.2, this type of operators typically intervene in
physics in the description of phenomena that evolve over time, such as heat diffusion
in a body. On the other hand, every forward operator can be transformed into a
parabolic backward2 operator with the simple change of variables .s = T − t: it
follows that all the results we prove in this chapter for forward operators admit an
analogous backward formulation.

2 In which the time derivative appears with a positive sign.



20.1 Uniqueness: The Maximum Principle

We study the uniqueness of the solution to the Cauchy problem



L u(t, x) = 0,   (t, x) ∈ S_T,
u(0, x) = ϕ(x),   x ∈ R^N,    (20.1.1)

for the operator .L in (20.0.1). A classic example due to Tychonoff [141] shows
that the problem (20.1.1) for the heat operator admits infinitely many solutions: in fact, in
addition to the identically zero solution, also the functions of the type

u_α(t, x) := Σ_{k=0}^∞ ( 2^k x^{2k} / (2k)! ) ∂_t^k e^{−t^{−α}},   α > 1,    (20.1.2)

are classical solutions to the Cauchy problem



(1/2) ∂_xx u_α − ∂_t u_α = 0,   in R_{>0} × R,
u_α(0, ·) = 0,   in R.

However, the solutions in (20.1.2) are in some sense “pathological”: they oscillate,
changing sign infinitely many times and have very rapid growth as .|x| → ∞.
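The telescoping that makes a series of this type a formal solution can be checked on a truncation. The sketch below is not from the book (α = 2 and truncation order K = 3 are illustrative choices; the factor 2^k matches the 1/2 in front of ∂_xx): the heat residual of the partial sum reduces to the single last term, which is why the full series formally solves the equation.

```python
# Hypothetical check (not part of the book): for the truncated series
# u_K = sum_{k<=K} 2^k x^(2k)/(2k)! * d_t^k g(t), g(t) = exp(-t^(-alpha)),
# the residual (1/2) d_xx u_K - d_t u_K telescopes to one term.
import sympy as sp

t, x = sp.symbols('t x', positive=True)
alpha, K = 2, 3
g = sp.exp(-t**(-alpha))
uK = sum(2**k*sp.diff(g, t, k)*x**(2*k)/sp.factorial(2*k) for k in range(K + 1))

residual = sp.diff(uK, x, 2)/2 - sp.diff(uK, t)
# only the truncation term survives
expected = -2**K*sp.diff(g, t, K + 1)*x**(2*K)/sp.factorial(2*K)
assert sp.expand(residual - expected) == 0
```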
In light of Tychonoff’s example, the study of the uniqueness of the solution of
the problem (20.1.1) consists in determining suitable classes of functions, called
uniqueness classes for .L , within which the solution, if it exists, is unique. In this
section, we assume the following minimal hypotheses on the coefficients of .L in
(20.0.1):
Assumption 20.1.1
(i) For each .i, j = 1, . . . , N , the coefficients .cij , bi and a are real-valued
measurable functions;
(ii) the matrix .C (t, x) := (cij (t, x)) is symmetric and positive semidefinite for
every .(t, x) ∈ ST ;
(iii) the coefficient a is upper bounded: there exists .a0 ∈ R such that

a(t, x) ≤ a_0,   (t, x) ∈ S_T.

We will prove that a uniqueness class is given by functions that do not grow too
rapidly to infinity in the sense that they satisfy the estimate
|u(t, x)| ≤ C e^{C|x|^2},   (t, x) ∈ S_T,    (20.1.3)

for some positive constant C. This result, contained in Theorem 20.1.8, is proven
under very general conditions, namely Assumption 20.1.1 and the following
Assumption 20.1.2 There exists a constant M such that

|c_ij(t, x)| ≤ M,   |b_i(t, x)| ≤ M(1 + |x|),   |a(t, x)| ≤ M(1 + |x|^2),   (t, x) ∈ S_T,  i, j = 1, . . . , N.

It is possible to determine another uniqueness class by imposing other growth


conditions on the coefficients.
Assumption 20.1.3 There exists a constant M such that

|c_ij(t, x)| ≤ M(1 + |x|^2),   |b_i(t, x)| ≤ M(1 + |x|),   |a(t, x)| ≤ M,   (t, x) ∈ S_T,  i, j = 1, . . . , N.

Theorem 20.1.10 shows that, under Assumptions 20.1.1 and 20.1.3, a uniqueness
class is given by functions with at most polynomial growth, which satisfy an
estimate of the type

|u(t, x)| ≤ C(1 + |x|^p),   (t, x) ∈ S_T,    (20.1.4)

for some positive constants C and p.


We explicitly note that the previous assumptions are so weak that they do not
generally guarantee the existence of the solution.

20.1.1 Cauchy-Dirichlet Problem

In this section we study the operator .L in (20.0.1) on a “cylinder” of the form

D_T = ]0, T[ × D

where D is a bounded domain (open and connected set) of .RN . We denote by .∂D
the boundary of D and say that

∂_p D_T := ({0} × D) ∪ ([0, T[ × ∂D)

is the parabolic boundary of D_T; the set {0} × D is called the base and [0, T[ × ∂D the lateral boundary. As before, C^{1,2}(D_T) is the space of functions
defined on .DT , that are continuously differentiable with respect to t and twice
continuously differentiable with respect to x.

Definition 20.1.4 (Cauchy-Dirichlet Problem) A classical solution of the


Cauchy-Dirichlet problem for .L on .DT is a function .u ∈ C 1,2 (DT ) ∩ C(DT ∪
∂p DT ) such that

L u = f,   in D_T,
u = ϕ,   on ∂_p D_T,    (20.1.5)

where .f ∈ C(DT ) and .ϕ ∈ C(∂p DT ) are given functions, called respectively


inhomogeneous term and boundary datum of the problem.
The main result of the section, which subsequently leads to the uniqueness of the
classical solution to problem (20.1.5) (cf. Corollary 20.1.6), is the following
Theorem 20.1.5 (Weak Maximum Principle) Under Assumption 20.1.1, if .u ∈
C 1,2 (DT ) ∩ C(DT ∪ ∂p DT ) is such that .L u ≥ 0 in .DT and .u ≤ 0 on .∂p DT , then
.u ≤ 0 on .DT .

Proof First, we observe that it is not restrictive to take .a0 < 0 in Assumption 20.1.1.
If it were not, it would be enough to prove the thesis for the function

u_λ(t, x) := e^{−λt} u(t, x)    (20.1.6)

which satisfies

L u_λ − λ u_λ = e^{−λt} L u,    (20.1.7)

choosing .λ > a0 .
Now we proceed by contradiction. If the thesis were false, there would exist a point (t, x) ∈ D_T such that u(t, x) > 0: in fact, we can also assume that

u(t, x) = max_{[0,t]×D̄} u.

It follows that

H u(t, x) := (∂_{x_i x_j} u(t, x))_{i,j} ≤ 0,   ∂_{x_j} u(t, x) = 0,   ∂_t u(t, x) ≥ 0,

for every .j = 1, . . . , N. Then there exists a symmetric and positive semi-definite


matrix .M = (mij ) such that
H u(t, x) = −M^2 = −( Σ_{h=1}^N m_ih m_jh )_{i,j}

and therefore

L u(t, x) = −(1/2) Σ_{i,j=1}^N c_ij(t, x) Σ_{h=1}^N m_ih m_jh + Σ_{j=1}^N b_j(t, x) ∂_{x_j} u(t, x) + a(t, x) u(t, x) − ∂_t u(t, x)

= −(1/2) Σ_{h=1}^N Σ_{i,j=1}^N c_ij(t, x) m_ih m_jh + a(t, x) u(t, x) − ∂_t u(t, x)

(the double sum is ≥ 0 since C = (c_ij) ≥ 0)

≤ a(t, x) u(t, x) < 0,

and this contradicts the assumption .L u ≥ 0 in .DT . ⨆



Corollary 20.1.6 (Comparison Principle) Under Assumption 20.1.1, let .u, v ∈
C 1,2 (DT ) ∩ C(DT ∪ ∂p DT ) be such that .L u ≤ L v in .DT and .u ≥ v on .∂p DT .
Then .u ≥ v in .DT . In particular, if it exists, the classical solution of the Cauchy-
Dirichlet problem (20.1.5) is unique.
Proof It is enough to apply the weak maximum principle to the function .v − u. ⨆

The following useful result provides an estimate of the maximum of the solution
of the Cauchy-Dirichlet problem (20.1.5) in terms of f and the boundary datum .ϕ.
Theorem 20.1.7 If the operator .L satisfies Assumption 20.1.1 then for every .u ∈
C 1,2 (DT ) ∩ C(DT ∪ ∂p DT ) we have
sup_{D_T} |u| ≤ e^{a_0^+ T} ( sup_{∂_p D_T} |u| + T sup_{D_T} |L u| ),   a_0^+ := max{0, a_0}.    (20.1.8)

Proof Consider first the case .a0 ≤ 0 and therefore .a0+ = 0. Suppose that u and .L u
are bounded respectively on .∂p DT and .DT , otherwise there is nothing to prove.
Letting

w(t) := sup_{∂_p D_T} |u| + t sup_{D_T} |L u|,   t ∈ [0, T],

we have

L w = a w − sup_{D_T} |L u| ≤ L u,   L(−w) = −a w + sup_{D_T} |L u| ≥ L u,

and .−w ≤ u ≤ w on .∂p DT . Then estimate (20.1.8) follows from the comparison
principle, Corollary 20.1.6.

Let now .a0 > 0. Consider .uλ in (20.1.6) with .λ = a0 : as just proved, we have

sup_{D_T} |u_λ| ≤ sup_{∂_p D_T} |u_λ| + T sup_{D_T} |(L − a_0)u_λ|.

Then, since .a0 > 0, we obtain

e^{−a_0 T} sup_{D_T} |u| ≤ sup_{(t,x)∈D_T} |e^{−a_0 t} u(t, x)| ≤ sup_{∂_p D_T} |u_λ| + T sup_{D_T} |(L − a_0)u_λ| ≤

(by (20.1.7))

≤ sup_{(t,x)∈∂_p D_T} |e^{−a_0 t} u(t, x)| + T sup_{(t,x)∈D_T} |e^{−a_0 t} L u(t, x)| ≤

(since a_0 > 0)

≤ sup_{∂_p D_T} |u| + T sup_{D_T} |L u|,

which proves the thesis. ⨆


20.1.2 Cauchy Problem

We establish analogous results to those presented in the preceding section for the
Cauchy problem (20.1.1).
Theorem 20.1.8 (Weak Maximum Principle) Let Assumptions 20.1.1 and 20.1.2
be in force. If .u ∈ C 1,2 (ST ) ∩ C([0, T [×RN ) is such that

L u ≤ 0,   in S_T,
u(0, ·) ≥ 0,   in R^N,    (20.1.9)

and verifies the estimate


u(t, x) ≥ −C e^{C|x|^2},   (t, x) ∈ [0, T[ × R^N,    (20.1.10)

for a positive constant C, then .u ≥ 0 in .[0, T [×RN . Consequently, there exists


at most one classical solution of the Cauchy problem (20.1.1) that verifies the
exponential growth estimate (20.1.3).

We explicitly note that Assumptions 20.1.1 and 20.1.2 are very mild, so as to
include, for example, the case when .L is a first-order operator. We first prove the
following
Lemma 20.1.9 Under Assumption 20.1.1, if .u ∈ C 1,2 (ST ) ∩ C([0, T [×RN )
verifies (20.1.9) and is such that

lim inf_{|x|→∞} inf_{t∈]0,T[} u(t, x) ≥ 0,    (20.1.11)

then .u ≥ 0 on .[0, T [×RN .


Proof As in the proof of Theorem 20.1.5, it is not restrictive to assume .a0 < 0 so
that, for every .ε > 0, we have

L(u + ε) ≤ 0,   in S_T,
u(0, ·) + ε > 0,   in R^N.

Fix .(t0 , x0 ) ∈ ST . Thanks to condition (20.1.11), there exists .R > |x0 | such that

u(t, x) + ε > 0,
. t ∈ ]0, T [, |x| = R,

and from the weak maximum principle of Theorem 20.1.5, applied on the cylinder

D_T = ]0, T[ × {|x| < R},

it follows that u(t_0, x_0) + ε ≥ 0. Given the arbitrariness of ε, we also have u(t_0, x_0) ≥ 0. ⨆

Proof of Theorem 20.1.8 We prove that .u ≥ 0 on a strip .ST0 with .T0 > 0 that
depends only on the constant M of Assumption 20.1.2 and on the constant C in
(20.1.10): if necessary, we just need to apply this result repeatedly to prove the
thesis on the strip .ST .
First of all, to understand the general idea, we give the proof in the particular
case of the heat operator

L = (1/2) Δ − ∂_t.

Fix γ > C, let T_0 = 1/(4γ), and consider the function
v(t, x) := (1 − 2γt)^{−N/2} exp( γ|x|^2 / (1 − 2γt) ),   (t, x) ∈ [0, T_0[ × R^N,

such that

L v(t, x) = 0   and   v(t, x) ≥ e^{γ|x|^2}.

From Lemma 20.1.9 we deduce that .u + εv ≥ 0 for every .ε > 0, which proves the
thesis.
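The barrier property of v can be verified symbolically. The sketch below is not from the book: it checks, in dimension N = 1 and for the illustrative value γ = 1/2, that v indeed satisfies L v = 0 for the heat operator.

```python
# Hypothetical check (not part of the book): for N = 1 the function
# v(t,x) = (1 - 2*gamma*t)^(-1/2) * exp(gamma*x^2/(1 - 2*gamma*t))
# satisfies (1/2) v_xx - v_t = 0 on [0, T_0[ with T_0 = 1/(4*gamma).
import sympy as sp

t, x = sp.symbols('t x')
gamma = sp.Rational(1, 2)     # illustrative value of gamma > C
v = (1 - 2*gamma*t)**sp.Rational(-1, 2)*sp.exp(gamma*x**2/(1 - 2*gamma*t))
Lv = sp.diff(v, x, 2)/2 - sp.diff(v, t)
assert sp.simplify(Lv) == 0
```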
The general case is only technically more complicated and exploits Assumption 20.1.2 on the coefficients of the operator. Fix γ > C and two constants α, β ∈ R, to be determined later, and consider the function

v(t, x) = exp( γ|x|^2 / (1 − αt) + βt ),   0 ≤ t ≤ 1/(2α),  x ∈ R^N.

Since

(L v)/v = ( 2γ^2 / (1 − αt)^2 ) 〈C x, x〉 + ( γ / (1 − αt) ) tr C + ( 2γ / (1 − αt) ) Σ_{i=1}^N b_i x_i + a − ( αγ|x|^2 / (1 − αt)^2 ) − β,

by Assumption 20.1.2 it is possible to choose .α, β large enough so that

(L v)/v ≤ 0.    (20.1.12)

Letting w := u/v, by condition (20.1.10), we have

lim inf_{|x|→∞} inf_{0≤t≤1/(2α)} w(t, x) ≥ 0,

and w satisfies the equation

(1/2) Σ_{i,j=1}^N c_ij ∂_{x_i x_j} w + Σ_{i=1}^N b̂_i ∂_{x_i} w + â w − ∂_t w = (L u)/v ≤ 0,

where


b̂_i = b_i + Σ_{j=1}^N c_ij (∂_{x_j} v)/v,   â = (L v)/v.

Since .â ≤ 0 by (20.1.12), we can apply Lemma 20.1.9 to conclude that w (and thus
also u) is non-negative. ⨆


Theorem 20.1.10 (Weak Maximum Principle) Assume Assumptions 20.1.1


and 20.1.3. If .u ∈ C 1,2 (ST ) ∩ C([0, T [×RN ) satisfies (20.1.9) and the estimate

u(t, x) ≥ −C(1 + |x|^p),   (t, x) ∈ [0, T[ × R^N,    (20.1.13)

for some positive constants C and p, then .u ≥ 0 in .[0, T [×RN . Consequently, there
exists at most one classical solution of the Cauchy problem (20.1.1) that satisfies the
polynomial growth estimate (20.1.4) at infinity.
Proof We only prove the case .a0 < 0. Consider the function
v(t, x) = e^{αt} (κt + |x|^2)^q

and verify that for every .q > 0 it is possible to choose .α, κ such that .L v < 0 on
ST . Then for .p < 2q and for every .ε > 0 we have .L (u + εv) < 0 on .ST and,
.

thanks to condition (20.1.13), we can apply Lemma 20.1.9 to deduce that .u+εv ≥ 0
on .ST . The thesis follows from the arbitrariness of .ε. ⨆

We now prove the analogue of Theorem 20.1.7: the following result provides
estimates, in .L∞ norm, of the dependence of the solution in terms of the initial
datum and the inhomogeneous term. These estimates play a crucial role, for
example, in the proof of the stability of numerical schemes.
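As an illustration of this role, here is a sketch (not from the book) of a discrete analogue for the explicit Euler scheme for u_t = ½u_xx + f on a periodic one-dimensional grid: under the CFL condition Δt ≤ Δx², the update is a convex combination of neighboring values plus the source term, so the discrete solution obeys sup|u^n| ≤ sup|u^0| + t_n sup|f|, mirroring the estimate of Theorem 20.1.11 in the case a = 0.

```python
# Hypothetical sketch (not part of the book): discrete L^infty stability of
# the explicit Euler scheme for u_t = (1/2) u_xx + f on a periodic 1d grid.
# With lam = dt/(2*dx^2) <= 1/2 the weight 1 - 2*lam is nonnegative, hence
#     max|u^n| <= max|u^0| + n*dt*max|f|.
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 400
dx, dt = 0.05, 0.001          # dt/dx^2 = 0.4 <= 1, CFL satisfied
lam = 0.5*dt/dx**2            # = 0.2, weight of each neighbor
u = rng.uniform(-1.0, 1.0, N) # bounded random initial datum
f = rng.uniform(-1.0, 1.0, (M, N))
sup_u0, sup_f = np.abs(u).max(), np.abs(f).max()

for n in range(M):
    u = (1 - 2*lam)*u + lam*(np.roll(u, 1) + np.roll(u, -1)) + dt*f[n]
    assert np.abs(u).max() <= sup_u0 + (n + 1)*dt*sup_f + 1e-12
```

The per-step assertion is exactly the discrete maximum principle; violating the CFL condition breaks the convex-combination structure and, with it, the bound.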
Theorem 20.1.11 If the operator .L satisfies Assumptions 20.1.1 and 20.1.2, then
for every .u ∈ C 1,2 (ST ) ∩ C([0, T [×RN ) that satisfies the exponential growth
estimate (20.1.3) we have
sup_{[0,T[×R^N} |u| ≤ e^{a_0^+ T} ( sup_{R^N} |u(0, ·)| + T sup_{S_T} |L u| ),   a_0^+ := max{0, a_0}.

Proof If a_0 < 0 then, letting

w_± := sup_{R^N} |u(0, ·)| + t sup_{S_T} |L u| ± u,

we have

L w_± = a ( sup_{R^N} |u(0, ·)| + t sup_{S_T} |L u| ) − sup_{S_T} |L u| ± L u ≤ 0,   in S_T,
w_±(0, ·) ≥ 0,   in R^N,

and clearly w_± satisfies the estimate (20.1.10). It follows from Theorem 20.1.8 that w_± ≥ 0 in S_T and this proves the thesis. On the other hand, if a_0 ≥ 0 then it is enough to proceed as in the proof of Theorem 20.1.7. ⨆




20.2 Existence: The Fundamental Solution

In this section, we give an existence result of classical solutions of the Cauchy


problem for the operator .L in (20.0.1). The central concept in this regard is that
of fundamental solution.
Definition 20.2.1 (Fundamental Solution) A fundamental solution for the opera-
tor .L on .ST ≡ ]0, T [×RN is a function .𝚪 = 𝚪(t0 , x0 ; t, x), with .0 ≤ t0 < t < T
and .x0 , x ∈ RN , such that for every .ϕ ∈ bC(RN ) the function defined by
u(t, x) = ∫_{R^N} ϕ(x_0) 𝚪(t_0, x_0; t, x) dx_0,   t_0 < t < T,  x ∈ R^N,    (20.2.1)

and by u(t_0, ·) = ϕ, is a classical solution (i.e., u ∈ C^{1,2}(]t_0, T[ × R^N) ∩ C([t_0, T[ × R^N)) of the Cauchy problem

L u = 0,   in ]t_0, T[ × R^N,
u(t_0, ·) = ϕ,   in R^N.    (20.2.2)

A well-known technique for proving the existence of the fundamental solution is


the parametrix method introduced by E.E. Levi in [89] and then developed by many
other authors.3 It is a fairly long and complex constructive procedure based on the
following4 Assumption 20.2.2 on the operator .L . We recall the definition of the
space .bCTα with the norm defined in (18.2.1): in particular, we emphasize that the
functions in .bCTα are Hölder continuous solely with respect to the spatial variables.
Assumption 20.2.2
(i) .cij , bi , a ∈ bCTα for some .α ∈ ]0, 1] and for each .i, j = 1, . . . , N;
(ii) the matrix .C := (cij )1≤i,j ≤N is symmetric and satisfies the following uniform
parabolicity condition: there exists a constant .λ0 > 1 such that

(1/λ_0) |η|^2 ≤ 〈C(t, x) η, η〉 ≤ λ_0 |η|^2,   (t, x) ∈ S_T,  η ∈ R^N.    (20.2.3)

For convenience, we assume .λ0 large enough so that .[cij ]α , [bi ]α , [a]α ≤ λ0
for each .i, j = 1, . . . , N .

3 See, for example, the works of Pogorzelski [118] and Aronson [5] on the construction of the
fundamental solution. The book by Friedman [50] is still a classic reference text for the parametrix
method and the main source that inspired our presentation.
4 It is possible to assume slightly weaker hypotheses: in this regard, see Section 6.4 in [50]. In

particular, the continuity condition in time is only for convenience: the results of this section extend
without difficulty to the case of coefficients that are measurable in t; in this case, the PDE is
understood in an integro-differential sense, as in (20.2.5).

Remark 20.2.3 Let

A := (1/2) Σ_{i,j=1}^N c_ij(t, x) ∂_{x_i x_j} + Σ_{j=1}^N b_j(t, x) ∂_{x_j} + a(t, x)    (20.2.4)

so that .L = A − ∂t . Under Assumption 20.2.2, the following statements are


equivalent:
(i) u ∈ C^{1,2}(]t_0, T[ × R^N) is a classical solution of the equation L u = 0 on ]t_0, T[ × R^N;
(ii) u ∈ C(]t_0, T[ × R^N) is twice continuously differentiable with respect to x and satisfies the integro-differential equation

u(t, x) = u(t_1, x) + ∫_{t_1}^t A u(s, x) ds,   t_0 < t_1 < t < T,  x ∈ R^N.    (20.2.5)
In the following theorem, we consider the Cauchy problem with inhomogeneous
term f satisfying growth and local Hölder continuity conditions.
Assumption 20.2.4 .f ∈ C(]t0 , T [×RN ) and there exists .β > 0 such that:
(i)

|f(t, x)| ≤ c_1 e^{c_2|x|^2} / (t − t_0)^{1−β},   (t, x) ∈ ]t_0, T[ × R^N,    (20.2.6)

where c_1, c_2 are positive constants with c_2 < 1/(4λ_0(T − t_0));
(ii) for every n ∈ N, there exists a constant κ_n such that

|f(t, x) − f(t, y)| ≤ κ_n |x − y|^β / (t − t_0)^{1−β/2},   t_0 < t < T,  |x|, |y| ≤ n.    (20.2.7)

The main result of the chapter is the following


Theorem 20.2.5 (Fundamental Solution) Under Assumption 20.2.2, there
exists a fundamental solution .𝚪 for .A − ∂t in .ST . Moreover:
(i) 𝚪 = 𝚪(t_0, x_0; t, x) is a continuous function of (t_0, x_0, t, x) for 0 ≤ t_0 < t < T and x, x_0 ∈ R^N. For every (t_0, x_0) ∈ [0, T[ × R^N, 𝚪(t_0, x_0; ·, ·) ∈ C^{1,2}(]t_0, T[ × R^N) and the following Gaussian estimates hold: for every λ > λ_0, where λ_0 is the constant of Assumption 20.2.2, there exists a positive constant c = c(T, N, λ, λ_0, α) such that

𝚪(t_0, x_0; t, x) ≤ c G(λ(t − t_0), x − x_0),    (20.2.8)
|∂_{x_i} 𝚪(t_0, x_0; t, x)| ≤ ( c / √(t − t_0) ) G(λ(t − t_0), x − x_0),    (20.2.9)
|∂_{x_i x_j} 𝚪(t_0, x_0; t, x)| + |∂_t 𝚪(t_0, x_0; t, x)| ≤ ( c / (t − t_0) ) G(λ(t − t_0), x − x_0)    (20.2.10)

for every (t, x) ∈ ]t_0, T[ × R^N, where G is the Gaussian function in (20.3.1). Furthermore, there exist two positive constants λ̄, c̄, only dependent on T, N, λ_0, α, such that

𝚪(t_0, x_0; t, x) ≥ c̄ G(λ̄(t − t_0), x − x_0)    (20.2.11)

for every (t, x) ∈ ]t_0, T[ × R^N;


(ii) for every f satisfying Assumption 20.2.4 and ϕ ∈ bC(R^N), the function defined by

u(t, x) = ∫_{R^N} ϕ(x_0) 𝚪(t_0, x_0; t, x) dx_0 − ∫_{t_0}^t ∫_{R^N} f(s, y) 𝚪(s, y; t, x) dy ds,   t_0 < t < T,  x ∈ R^N,    (20.2.12)

and by u(t_0, ·) = ϕ, is a classical solution of the Cauchy problem

L u = f,   in ]t_0, T[ × R^N,
u(t_0, ·) = ϕ,   in R^N.    (20.2.13)
Formula (20.2.12) is usually called5 Duhamel’s formula;


(iii) the Chapman-Kolmogorov equation holds:

𝚪(t_0, x_0; t, x) = ∫_{R^N} 𝚪(t_0, x_0; s, y) 𝚪(s, y; t, x) dy,   0 ≤ t_0 < s < t < T,  x, x_0 ∈ R^N;

5 Duhamel’s formula can be interpreted as a “forward version” of the Feynman-Kac formula


(15.4.6).

(iv) if the coefficient a is constant, then

∫_{R^N} 𝚪(t_0, x_0; t, x) dx_0 = e^{a(t−t_0)},   t ∈ ]t_0, T[,  x ∈ R^N,    (20.2.14)

and in particular, if a ≡ 0, then 𝚪(t_0, ·; t, x) is a density.


The proof of Theorem 20.2.5 is deferred to Sect. 20.3 along with several
preliminary results.
Notation 20.2.6 Let α ∈ ]0, 1]. We denote by bC^α(R^N) the space of bounded, α-Hölder continuous functions on R^N, equipped with the norm

‖ϕ‖_{bC^α(R^N)} := sup_{R^N} |ϕ| + sup_{x≠y} |ϕ(x) − ϕ(y)| / |x − y|^α.

The following result shows that estimate (20.2.10) can be refined, in the sense that the non-integrable singularity 1/(t − t_0) can be replaced by an integrable one when the initial datum is Hölder continuous.
Corollary 20.2.7 Under the assumptions of Theorem 20.2.5, consider the solution
u in (20.2.12) of the Cauchy problem (20.2.13) with .a = f = 0. If .ϕ ∈
bC δ (RN ) for some .δ > 0, then there exists a constant c, which depends only on
.T , N, δ, α, λ0 , [cij ]α and .[bi ]α , such that

|D_x^k u(t, x)| ≤ ( c / (t − t_0)^{(k−δ)/2} ) ‖ϕ‖_{bC^δ(R^N)},   t > t_0,  x ∈ R^N,  k = 0, 1, 2,    (20.2.15)

where .Dxk denotes a derivative of order k in the variables .x1 , . . . , xN .


Proof We give the proof for .k = 2 as the other cases are analogous and simpler.
Since
∫_{R^N} 𝚪(t_0, x_0; t, x) dx_0 = 1,   t_0 < t,  x ∈ R^N,

we have

0 = ∂_{x_i x_j} ∫_{R^N} 𝚪(t_0, x_0; t, x) dx_0 = ∫_{R^N} ∂_{x_i x_j} 𝚪(t_0, x_0; t, x) dx_0.

Hence

|∂_{x_i x_j} u(t, x)| = | ∫_{R^N} ∂_{x_i x_j} 𝚪(t_0, x_0; t, x) (ϕ(x_0) − ϕ(x)) dx_0 | ≤    (20.2.16)

(by the triangle inequality and the Gaussian estimate (20.2.10))


≤ ( c / (t − t_0) ) ∫_{R^N} G(λ(t − t_0), x − x_0) |ϕ(x_0) − ϕ(x)| dx_0 ≤    (20.2.17)

(by the Hölder assumption on .ϕ)


≤ ( c ‖ϕ‖_{bC^δ(R^N)} / (t − t_0)^{1−δ/2} ) ∫_{R^N} ( |x − x_0| / √(t − t_0) )^δ G(λ(t − t_0), x − x_0) dx_0    (20.2.18)

and the conclusion follows thanks to the elementary estimates of Lemma 20.3.4. ⨆

20.3 The Parametrix Method

This section is dedicated to the proof of Theorem 20.2.5. We consider .L in (20.0.1)


and assume that it verifies Assumption 20.2.2. The main idea of the parametrix
method is to construct a fundamental solution through successive approximations;
the first approximation term is referred to as the “parametrix”, which is essentially
the Gaussian fundamental solution of a heat operator obtained from .L by freezing
the coefficients in the spatial variables, while leaving the time variable free.
Notation 20.3.1 Given a constant .N × N, symmetric and positive definite matrix
C, we set

G(C, x) = ( 1 / √((2π)^N det C) ) e^{−(1/2)〈C^{−1}x, x〉},   x ∈ R^N.

Notice that

(1/2) Σ_{i,j=1}^N C_ij ∂_{x_i x_j} G(tC, x) = ∂_t G(tC, x),   t > 0,  x ∈ R^N.

When C is the identity matrix, .C = IN , for simplicity we write

G(t, x) ≡ G(t I_N, x) = (2πt)^{−N/2} e^{−|x|^2/(2t)},   t > 0,  x ∈ R^N,    (20.3.1)

to indicate the usual standard Gaussian function, solution of the heat equation (1/2) ΔG(t, x) = ∂_t G(t, x).
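The identity in Notation 20.3.1 can be verified symbolically. The sketch below is not from the book: it checks the case N = 2 with the illustrative positive definite matrix C = [[2, 1], [1, 2]].

```python
# Hypothetical check (not part of the book) of the identity
#   (1/2) sum_ij C_ij d_{x_i x_j} G(tC, x) = d_t G(tC, x)
# for N = 2 and a fixed symmetric positive definite matrix C.
import sympy as sp

t, x1, x2 = sp.symbols('t x1 x2', positive=True)
xs = [x1, x2]
C = sp.Matrix([[2, 1], [1, 2]])
X = sp.Matrix(xs)
quad = (X.T*(t*C).inv()*X)[0, 0]
G = sp.exp(-quad/2)/sp.sqrt((2*sp.pi)**2*(t*C).det())

lhs = sum(C[i, j]*sp.diff(G, xs[i], xs[j]) for i in range(2) for j in range(2))/2
assert sp.simplify(lhs - sp.diff(G, t)) == 0
```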

Given y ∈ R^N, we define the operator L_y by evaluating the coefficients of L at y and discarding the terms of order lower than the second:

L_y := (1/2) Σ_{i,j=1}^N c_ij(t, y) ∂_{x_i x_j} − ∂_t.

Operator .Ly acts in the variables .(t, x) and has coefficients that depend only on the
time variable t, since y is fixed. Thanks to Assumption 20.2.2 and in particular to
the fact that the matrix .C = (cij ) is uniformly positive definite, we have that the
fundamental solution of .Ly has the following explicit expression
𝚪_y(t_0, x_0; t, x) = G(C_{t_0,t}(y), x − x_0),   C_{t_0,t}(y) := ∫_{t_0}^t C(s, y) ds,    (20.3.2)

for 0 ≤ t_0 < t < T and x_0, x ∈ R^N.
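That 𝚪_y solves the frozen-coefficient equation can be checked symbolically in dimension one. In the sketch below (not from the book) the coefficient c(s) = 2 + sin(s) is an arbitrary illustrative choice, bounded away from zero.

```python
# Hypothetical check (not part of the book): for N = 1 and a frozen,
# time-dependent coefficient c(s), the kernel
#   Gamma_y = G(C_{t0,t}, x - x0),  C_{t0,t} = int_{t0}^t c(s) ds,
# satisfies (1/2) c(t) d_xx Gamma_y - d_t Gamma_y = 0.
import sympy as sp

t0, t, x, x0, s = sp.symbols('t0 t x x0 s', positive=True)
c = 2 + sp.sin(s)                      # illustrative coefficient, c >= 1
Cint = sp.integrate(c, (s, t0, t))     # C_{t0,t}
Gam = sp.exp(-(x - x0)**2/(2*Cint))/sp.sqrt(2*sp.pi*Cint)

residual = c.subs(s, t)*sp.diff(Gam, x, 2)/2 - sp.diff(Gam, t)
assert sp.simplify(residual) == 0
```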


We define the parametrix for L as

P(t_0, x_0; t, x) := 𝚪_{x_0}(t_0, x_0; t, x),   0 ≤ t_0 < t < T,  x_0, x ∈ R^N.    (20.3.3)

According to the parametrix method, the fundamental solution of L is sought in the form

𝚪(t_0, x_0; t, x) = P(t_0, x_0; t, x) + ∫_{t_0}^t ∫_{R^N} Ф(t_0, x_0; s, y) P(s, y; t, x) dy ds    (20.3.4)

where Ф is an unknown function to be determined by imposing that6 L 𝚪(t_0, x_0; t, x) = 0. Formally, from (20.3.4) we have7

L 𝚪(t_0, x_0; t, x) = L P(t_0, x_0; t, x) + ∫_{t_0}^t ∫_{R^N} Ф(t_0, x_0; s, y) L P(s, y; t, x) dy ds − Ф(t_0, x_0; t, x)    (20.3.5)

6 Remember that .L acts in the variables .(t, x).


7 The last term on the right-hand side of (20.3.5) comes from applying ∂_t to the upper integration limit of the outer integral in (20.3.4): we obtain

∫_{R^N} Ф(t_0, x_0; t, y) P(t, y; t, x) dy = Ф(t_0, x_0; t, x)

since formally P(t, y; t, x) dy = δ_x(dy), where δ_x denotes the Dirac delta centered at x.

which gives the equation for Ф:

Ф(t_0, x_0; t, x) = L P(t_0, x_0; t, x) + ∫_{t_0}^t ∫_{R^N} Ф(t_0, x_0; s, y) L P(s, y; t, x) dy ds    (20.3.6)

for 0 ≤ t_0 < t < T and x_0, x ∈ R^N. By successive approximations, we obtain

Ф(t_0, x_0; t, x) = Σ_{k=1}^∞ (L P)_k(t_0, x_0; t, x)    (20.3.7)

where

(L P)_1(t_0, x_0; t, x) = L P(t_0, x_0; t, x),
(L P)_{k+1}(t_0, x_0; t, x) = ∫_{t_0}^t ∫_{R^N} (L P)_k(t_0, x_0; s, y) L P(s, y; t, x) dy ds,   k ∈ N.    (20.3.8)
In Sect. 20.3.2 we prove the following

Proposition 20.3.2 The series in (20.3.7) converges and defines Ф = Ф(t_0, x_0; t, x), which is a continuous function of (t_0, x_0, t, x) for 0 ≤ t_0 < t < T and x, x_0 ∈ R^N, and solves equation (20.3.6). Moreover, for every λ > λ_0 there exists a positive constant c = c(T, N, λ, λ_0) such that

|Ф(t_0, x_0; t, x)| ≤ ( c / (t − t_0)^{1−α/2} ) G(λ(t − t_0), x − x_0),    (20.3.9)

|Ф(t_0, x_0; t, x) − Ф(t_0, x_0; t, y)| ≤ ( c |x − y|^{α/2} / (t − t_0)^{1−α/4} ) ( G(λ(t − t_0), x − x_0) + G(λ(t − t_0), y − x_0) )    (20.3.10)

for every 0 ≤ t_0 < t < T and x, y, x_0 ∈ R^N.

20.3.1 Gaussian Estimates

In this section, we prove some preliminary estimates for Gaussian kernels.


Notation 20.3.3 We adopt Convention 14.4.3 to denote the dependence of con-
stants. Moreover, for the sake of convenience, as we need to establish several
estimates, we will use the symbol c to represent a generic constant whose value

may vary from one line to another. When necessary, we will explicitly state the
quantities on which c depends.
Lemma 20.3.4 Let .G be the Gaussian function in (20.3.1). For every .p > 0 and
λ1 > λ0 there exists a constant .c = c(p, N, λ1 , λ0 ) such that
( |x| / √t )^p G(λ_0 t, x) ≤ c G(λ_1 t, x),   t > 0,  x ∈ R^N.
t
|x|
Proof For simplicity, let .z = √
t
, we have

⎛ ⎞ ⎛ ⎞N
zp z2 λ1
z G(λ0 t, x) =
.
p
N
exp − = g(z)G(λ1 t, x)
(2π λ0 t) 2 2λ0 λ0

where
κz2 1 1
g(z) := zp e−
. 2 , κ= − > 0, z ∈ R+ ,
λ0 λ1
/ ( p )p
p
reaches the global maximum in .z0 = κ where we have .g(z0 ) = eκ
2 . ⨆
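A quick numeric sanity check of the lemma (not from the book) in dimension N = 1, with the explicit constant c = (λ_1/λ_0)^{1/2} (p/(eκ))^{p/2} read off from the proof:

```python
# Hypothetical check (not part of the book) of Lemma 20.3.4 for N = 1:
#   (|x|/sqrt(t))^p G(lam0*t, x) <= c G(lam1*t, x),
# with c = (lam1/lam0)^(1/2) * (p/(e*kappa))^(p/2), kappa = 1/lam0 - 1/lam1.
import numpy as np

def G(t, x):                          # 1d Gaussian kernel (20.3.1)
    return np.exp(-x**2/(2*t))/np.sqrt(2*np.pi*t)

lam0, lam1, p = 2.0, 3.0, 4.0
kappa = 1/lam0 - 1/lam1
c = np.sqrt(lam1/lam0)*(p/(np.e*kappa))**(p/2)

rng = np.random.default_rng(1)
ts = rng.uniform(0.01, 5.0, 1000)
xs = rng.uniform(-10.0, 10.0, 1000)
lhs = (np.abs(xs)/np.sqrt(ts))**p * G(lam0*ts, xs)
rhs = c*G(lam1*ts, xs)
assert np.all(lhs <= rhs*(1 + 1e-12) + 1e-300)
```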

Lemma 20.3.5 Consider L in (20.0.1) and assume that it verifies Assumption 20.2.2. For G and 𝚪_y, defined respectively in (20.3.1) and (20.3.2), we have

(1/λ_0^N) G( (t − t_0)/λ_0, x − x_0 ) ≤ 𝚪_y(t_0, x_0; t, x) ≤ λ_0^N G( λ_0(t − t_0), x − x_0 )    (20.3.11)

for every 0 ≤ t_0 < t < T and x, x_0, y ∈ R^N, where λ_0 is the constant of Assumption 20.2.2. Moreover, for every λ > λ_0 there exists a positive constant c = c(T, N, λ, λ_0) such that

|∂_{x_i} 𝚪_y(t_0, x_0; t, x)| ≤ ( c / √(t − t_0) ) G(λ(t − t_0), x − x_0),    (20.3.12)
|∂_{x_i x_j} 𝚪_y(t_0, x_0; t, x)| ≤ ( c / (t − t_0) ) G(λ(t − t_0), x − x_0),    (20.3.13)
|∂_{x_i x_j x_k} 𝚪_y(t_0, x_0; t, x)| ≤ ( c / (t − t_0)^{3/2} ) G(λ(t − t_0), x − x_0),    (20.3.14)
|𝚪_y(t_0, x_0; t, x) − 𝚪_η(t_0, x_0; t, x)| ≤ c |y − η|^α G(λ(t − t_0), x − x_0),    (20.3.15)
|∂_{x_i} 𝚪_y(t_0, x_0; t, x) − ∂_{x_i} 𝚪_η(t_0, x_0; t, x)| ≤ ( c |y − η|^α / √(t − t_0) ) G(λ(t − t_0), x − x_0),    (20.3.16)
|∂_{x_i x_j} 𝚪_y(t_0, x_0; t, x) − ∂_{x_i x_j} 𝚪_η(t_0, x_0; t, x)| ≤ ( c |y − η|^α / (t − t_0) ) G(λ(t − t_0), x − x_0)    (20.3.17)

for every 0 ≤ t_0 < t < T, x, x_0, y, η ∈ R^N and i, j, k = 1, . . . , N.


Proof By the definition of .Ct0 ,t (y) in (20.3.2) and by the hypothesis of uniform
parabolicity (20.2.3) we have

t − t0
. |y0 |2 ≤ 〈Ct0 ,t (y)y0 , y0 〉 ≤ λ0 (t − t0 )|y0 |2 ; (20.3.18)
λ0

consequently, we have

|y0 |2 λ0 |y0 |2
. ≤ 〈Ct−1
0 ,t
(y)y0 , y0 〉 ≤ (20.3.19)
λ0 (t − t0 ) t − t0

and also
⎛ ⎞N
t − t0
. ≤ det Ct0 ,t (y) ≤ λN
0 (t − t0 ) .
N
(20.3.20)
λ0

Formula (20.3.19) follows from the fact that if .A, B are symmetric and positive
definite matrices, then the inequality between quadratic forms .A ≤ B (i.e.,
.〈Ay0 , y0 〉 ≤ 〈By0 , y0 〉 for every .y0 ∈ R ) implies .B
N −1 ≤ A−1 . Formula

(20.3.20) follows from the fact that the minimum and maximum eigenvalue of a
symmetric matrix C are respectively . min 〈Cy0 , y0 〉 and . max 〈Cy0 , y0 〉 =: ‖C‖
|y0 |=1 |y0 |=1
where .‖C‖ is the spectral norm of C. We note that (20.3.18)-(20.3.19) can be
rewritten respectively in the form

t − t0 1 λ0
. ≤ ‖Ct0 ,t (y)‖ ≤ λ0 (t − t0 ), ≤ ‖Ct−1
0 ,t
(y)‖ ≤ .
λ0 λ0 (t − t0 ) t − t0
(20.3.21)

Estimates (20.3.11) then follow directly from the definition of .𝚪y (t0 , x0 ; t, x).
As for (20.3.12), letting .∇x = (∂x1 , . . . , ∂xN ), we have
| |
. |∇x 𝚪y (t0 , x0 ; t, x)| = |C −1 (y)(x − x0 )|𝚪y (t0 , x0 ; t, x)
t0 ,t

≤ ‖Ct−1
0 ,t
(y)‖ |x − x0 |𝚪y (t0 , x0 ; t, x) ≤

(by the second estimate in (20.3.21))

≤ ( λ_0 / √(t − t_0) ) ( |x − x_0| / √(t − t_0) ) 𝚪_y(t_0, x_0; t, x) ≤

(by (20.3.11) and Lemma 20.3.4)

≤ ( c / √(t − t_0) ) G(λ(t − t_0), x − x_0).

Formulas (20.3.13) and (20.3.14) can be proven in a completely analogous way. Using the explicit expression of 𝚪_y, (20.3.15) is a direct consequence of the following estimates:

| 1/√(det C_{t_0,t}(y)) − 1/√(det C_{t_0,t}(η)) | ≤ c |y − η|^α / √(det C_{t_0,t}(y)),    (20.3.22)

| e^{−½〈C_{t_0,t}^{−1}(y)x, x〉} − e^{−½〈C_{t_0,t}^{−1}(η)x, x〉} | ≤ c |y − η|^α e^{−|x|^2/(2λ(t−t_0))}.    (20.3.23)

Regarding (20.3.22), we have

| 1/√(det C_{t_0,t}(y)) − 1/√(det C_{t_0,t}(η)) |
= | det C_{t_0,t}(y) − det C_{t_0,t}(η) | / ( √(det C_{t_0,t}(y)) √(det C_{t_0,t}(η)) ( √(det C_{t_0,t}(y)) + √(det C_{t_0,t}(η)) ) ) ≤

(by (20.3.20))

≤ ( λ_0^N / √(det C_{t_0,t}(y)) ) | det C_{t_0,t}(y) − det C_{t_0,t}(η) | / (t − t_0)^N

= ( λ_0^N / √(det C_{t_0,t}(y)) ) | det( C_{t_0,t}(y)/(t − t_0) ) − det( C_{t_0,t}(η)/(t − t_0) ) | ≤

(since |det A − det B| ≤ c‖A − B‖, where ‖·‖ indicates the spectral norm and c is a constant that depends only on ‖A‖, ‖B‖ and the dimension of the matrices)

≤ ( c / √(det C_{t_0,t}(y)) ) ‖ ( C_{t_0,t}(y) − C_{t_0,t}(η) ) / (t − t_0) ‖

and (20.3.22) follows from Assumption 20.2.2, in particular from the Hölder condition on the coefficients c_ij. Regarding (20.3.23), by the mean value theorem and (20.3.19) we have

| e^{−½〈C_{t_0,t}^{−1}(y)x, x〉} − e^{−½〈C_{t_0,t}^{−1}(η)x, x〉} | ≤ | 〈C_{t_0,t}^{−1}(y)x, x〉 − 〈C_{t_0,t}^{−1}(η)x, x〉 | e^{−|x|^2/(2λ_0(t−t_0))}
≤ ‖C_{t_0,t}^{−1}(y) − C_{t_0,t}^{−1}(η)‖ |x|^2 e^{−|x|^2/(2λ_0(t−t_0))} ≤

(by the identity A^{−1} − B^{−1} = A^{−1}(B − A)B^{−1})

≤ c ‖C_{t_0,t}^{−1}(y)‖ ‖C_{t_0,t}(y) − C_{t_0,t}(η)‖ ‖C_{t_0,t}^{−1}(η)‖ |x|^2 e^{−|x|^2/(2λ_0(t−t_0))} ≤

(by (20.3.21))

≤ c ‖ ( C_{t_0,t}(y) − C_{t_0,t}(η) ) / (t − t_0) ‖ ( |x|^2 / (t − t_0) ) e^{−|x|^2/(2λ_0(t−t_0))} ≤

(by the assumption of Hölder continuity of the coefficients c_ij and by Lemma 20.3.4)

≤ c |y − η|^α e^{−|x|^2/(2λ(t−t_0))}

and this is sufficient to prove (20.3.23) and therefore (20.3.15).

The proof of the estimates (20.3.16) and (20.3.17) is analogous: for example, we have

|∇_x 𝚪_y(t_0, x_0; t, x) − ∇_x 𝚪_η(t_0, x_0; t, x)|
= | C_{t_0,t}^{−1}(y)(x − x_0) 𝚪_y(t_0, x_0; t, x) − C_{t_0,t}^{−1}(η)(x − x_0) 𝚪_η(t_0, x_0; t, x) |
≤ | ( C_{t_0,t}^{−1}(y) − C_{t_0,t}^{−1}(η) )(x − x_0) | 𝚪_y(t_0, x_0; t, x) + | C_{t_0,t}^{−1}(η)(x − x_0) | | 𝚪_y(t_0, x_0; t, x) − 𝚪_η(t_0, x_0; t, x) |

and the proof of (20.3.16) and (20.3.17) follows a similar line of reasoning as used previously. ⨆


20.3.2 Proof of Proposition 20.3.2

Lemma 20.3.5 enables us to estimate the terms (L P)_k in (20.3.8) of the parametrix expansion.

Lemma 20.3.6 For every λ > λ_0 there exists a positive constant c = c(T, N, λ, λ_0) such that

|(L P)_k(t_0, x_0; t, x)| ≤ ( m_k / (t − t_0)^{1−αk/2} ) G(λ(t − t_0), x − x_0)    (20.3.24)

for every k ∈ N, 0 ≤ t_0 < t < T and x, x_0 ∈ R^N, where

m_k = ( c 𝚪_E(α/2) )^k / 𝚪_E(αk/2)

and 𝚪_E denotes the Euler Gamma function.


Proof First, we observe that by Assumption 20.2.2 we have

|c_ij(t, x) − c_ij(t, x_0)| ≤ λ_0 |x − x_0|^α,   0 ≤ t < T,  x, x_0 ∈ R^N,  i, j = 1, . . . , N.    (20.3.25)

For k = 1 we have

|L P(t_0, x_0; t, x)| = |(L − L_{x_0}) P(t_0, x_0; t, x)|
≤ (1/2) Σ_{i,j=1}^N | ( c_ij(t, x) − c_ij(t, x_0) ) ∂_{x_i x_j} 𝚪_{x_0}(t_0, x_0; t, x) |
+ Σ_{i=1}^N | b_i(t, x) ∂_{x_i} 𝚪_{x_0}(t_0, x_0; t, x) | + |a(t, x)| 𝚪_{x_0}(t_0, x_0; t, x).

The first term is the most delicate: by the estimates (20.3.25) and (20.3.13), for λ' = (λ_0 + λ)/2 we have

| ( c_ij(t, x) − c_ij(t, x_0) ) ∂_{x_i x_j} 𝚪_{x_0}(t_0, x_0; t, x) | ≤ c ( |x − x_0|^α / (t − t_0) ) G(λ'(t − t_0), x − x_0) ≤

(by Lemma 20.3.4)

≤ ( c / (t − t_0)^{1−α/2} ) G(λ(t − t_0), x − x_0).

The other terms are easily estimated using the boundedness hypothesis on the coefficients and estimate (20.3.12) of the first derivatives:

| b_i(t, x) ∂_{x_i} 𝚪_{x_0}(t_0, x_0; t, x) | + |a(t, x)| 𝚪_{x_0}(t_0, x_0; t, x) ≤ c ( 1/√(t − t_0) + 1 ) G(λ(t − t_0), x − x_0).

This is sufficient to prove (20.3.24) in the case .k = 1.


Now we proceed by induction and, assuming the thesis is true for k, we prove it for k + 1:

|(L P)_{k+1}(t_0, x_0; t, x)| ≤ ∫_{t_0}^t ∫_{R^N} |(L P)_k(t_0, x_0; s, y)| |L P(s, y; t, x)| dy ds
≤ ∫_{t_0}^t ( m_k m_1 / ( (s − t_0)^{1−αk/2} (t − s)^{1−α/2} ) ) ∫_{R^N} G(λ(s − t_0), y − x_0) G(λ(t − s), x − y) dy ds =

(by the Chapman-Kolmogorov equation (2.4.4))

= G(λ(t − t_0), x − x_0) ∫_{t_0}^t ( m_k m_1 / ( (s − t_0)^{1−αk/2} (t − s)^{1−α/2} ) ) ds

and the thesis follows from the properties of Euler's Gamma function. ⨆
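The decay of the constants m_k is what drives the convergence of the parametrix series. A quick numeric look (not from the book; α = 1 and c = 1 are illustrative values) shows the typical behavior: the terms first grow and then collapse faster than any geometric sequence, consistent with the series Σ m_k r^{k−1} having infinite radius of convergence.

```python
# Hypothetical illustration (not part of the book): the constants
# m_k = (c*Gamma_E(alpha/2))^k / Gamma_E(alpha*k/2) of Lemma 20.3.6
# eventually decay super-exponentially, since the Gamma function in the
# denominator dominates any geometric factor.
import math

alpha, c = 1.0, 1.0
m = [(c*math.gamma(alpha/2))**k / math.gamma(alpha*k/2) for k in range(1, 101)]
# the terms grow at first, then the denominator wins
assert max(m) > 1.0 and m[-1] < 1e-30
```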

Remark 20.3.7 The Chapman-Kolmogorov equation is a crucial tool in the parametrix method: it is proved by a direct calculation or, alternatively, as a consequence of the uniqueness result of Theorem 20.1.8. In fact, for $t_0 < s < t < T$ and $x, x_0, y \in \mathbb{R}^N$, we have that the functions $u_1(t,x) := G(t-t_0, x-x_0)$ and

$$u_2(t,x) := \int_{\mathbb{R}^N} G(s-t_0, y-x_0)\, G(t-s, x-y)\, dy$$

are both bounded solutions of the Cauchy problem

$$\begin{cases} \frac{1}{2}\Delta u - \partial_t u = 0 & \text{in } ]s,T[\times\mathbb{R}^N,\\ u(s,y) = G(s-t_0, y-x_0) & \text{for } y \in \mathbb{R}^N, \end{cases}$$

and therefore they are equal.
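The semigroup identity of the Remark can be verified numerically in dimension $N = 1$; the following sketch (not from the text) approximates the convolution with a midpoint rule:

```python
from math import exp, pi, sqrt

def G(t, x):
    # One-dimensional heat kernel: G(t, x) = (2*pi*t)**(-1/2) * exp(-x**2/(2t))
    return exp(-x * x / (2 * t)) / sqrt(2 * pi * t)

# Chapman-Kolmogorov: integral of G(s-t0, y-x0)*G(t-s, x-y) dy equals G(t-t0, x-x0)
t0, s, t, x0, x = 0.0, 0.3, 1.0, -0.5, 1.2
h, L = 1e-3, 30.0                      # midpoint rule with step h on [-L, L]
n = int(2 * L / h)
lhs = sum(G(s - t0, (-L + (i + 0.5) * h) - x0) * G(t - s, x - (-L + (i + 0.5) * h))
          for i in range(n)) * h
assert abs(lhs - G(t - t0, x - x0)) < 1e-6
```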


392 20 A Primer on Parabolic PDEs

Lemma 20.3.8 Let $\kappa > 0$. Given $\kappa_1 \in\, ]0,\kappa[$ there exists a positive constant $c$ such that

$$e^{-\kappa \frac{|\eta-x_0|^2}{t}} \le c\, e^{-\kappa_1 \frac{|y-x_0|^2}{t}} \tag{20.3.26}$$

for every $t > 0$ and $x_0, y, \eta \in \mathbb{R}^N$ such that $|y-\eta|^2 \le t$.


Proof First of all, for every $\varepsilon > 0$ and $a, b \in \mathbb{R}$, the elementary inequalities

$$2|ab| \le \varepsilon a^2 + \frac{b^2}{\varepsilon}, \qquad (a+b)^2 \le (1+\varepsilon)\, a^2 + \Bigl(1+\frac{1}{\varepsilon}\Bigr) b^2$$

hold. Formula (20.3.26) follows from the fact that

$$\kappa_1 \frac{|y-x_0|^2}{t} - \kappa \frac{|\eta-x_0|^2}{t} \le \kappa_1 \Bigl(1+\frac{1}{\varepsilon}\Bigr) \frac{|y-\eta|^2}{t} + \frac{\bigl((1+\varepsilon)\kappa_1 - \kappa\bigr)\, |\eta-x_0|^2}{t} \le$$

(since $|y-\eta|^2 \le t$ by hypothesis and for $\varepsilon$ sufficiently small, being $\kappa_1 < \kappa$)

$$\le \kappa_1 \Bigl(1+\frac{1}{\varepsilon}\Bigr).$$
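Both elementary inequalities can be spot-checked numerically (a quick sketch, not from the text):

```python
import random

random.seed(0)
violation = 0.0
for _ in range(10_000):
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    eps = random.uniform(0.01, 10.0)
    # 2|ab| <= eps*a^2 + b^2/eps  (from (sqrt(eps)*|a| - |b|/sqrt(eps))^2 >= 0)
    # (a+b)^2 <= (1+eps)*a^2 + (1+1/eps)*b^2  (expand and apply the first inequality)
    violation = max(violation,
                    2 * abs(a * b) - (eps * a * a + b * b / eps),
                    (a + b) ** 2 - ((1 + eps) * a * a + (1 + 1 / eps) * b * b))
assert violation <= 1e-9  # never violated beyond floating-point noise
```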



Proof of Proposition 20.3.2 For every $\lambda > \lambda_0$ we have

$$|\Phi(t_0,x_0;t,x)| \le \sum_{k=1}^{\infty} |(\mathcal{L} P)_k(t_0,x_0;t,x)| \le$$

(by estimate (20.3.24))

$$\le \sum_{k=1}^{\infty} \frac{m_k}{(t-t_0)^{1-\frac{\alpha k}{2}}}\, G\bigl(\lambda(t-t_0), x-x_0\bigr) \le \frac{c}{(t-t_0)^{1-\frac{\alpha}{2}}}\, G\bigl(\lambda(t-t_0), x-x_0\bigr)$$

with $c = c(T,N,\lambda,\lambda_0)$ a positive constant, since the power series $\sum_{k=1}^{\infty} m_k r^{k-1}$ has infinite convergence radius. This proves (20.3.9). The convergence of the series is

uniform in $(t_0,x_0,t,x)$ if $t-t_0 \ge \delta > 0$, for every $\delta > 0$ sufficiently small, and consequently $\Phi(t_0,x_0;t,x)$ is a continuous function of $(t_0,x_0,t,x)$ for $0 \le t_0 < t < T$ and $x, x_0 \in \mathbb{R}^N$. Moreover, exchanging the signs of series and integral, we have

$$\begin{aligned}
\int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, \mathcal{L} P(s,y;t,x)\, dy\, ds &= \sum_{k=1}^{\infty} \int_{t_0}^{t} \int_{\mathbb{R}^N} (\mathcal{L} P)_k(t_0,x_0;s,y)\, \mathcal{L} P(s,y;t,x)\, dy\, ds\\
&= \sum_{k=2}^{\infty} (\mathcal{L} P)_k(t_0,x_0;t,x)\\
&= \Phi(t_0,x_0;t,x) - \mathcal{L} P(t_0,x_0;t,x)
\end{aligned}$$

and therefore $\Phi$ solves equation (20.3.6).


As for (20.3.10), we first prove the estimate

$$|\mathcal{L} P(t_0,x_0;t,x) - \mathcal{L} P(t_0,x_0;t,y)| \le \frac{c\, |x-y|^{\alpha/2}}{(t-t_0)^{1-\alpha/4}} \bigl( G(\lambda(t-t_0), x-x_0) + G(\lambda(t-t_0), y-x_0) \bigr) \tag{20.3.27}$$

for every $\lambda > \lambda_0$, $0 \le t_0 < t < T$ and $x, y, x_0 \in \mathbb{R}^N$, with $c = c(T,N,\lambda,\lambda_0) > 0$. Now, if $|x-y|^2 > t-t_0$ then (20.3.27) follows directly from (20.3.24) with $k = 1$. To study the case $|x-y|^2 \le t-t_0$, we observe that

$$\mathcal{L} P(t_0,x_0;t,x) - \mathcal{L} P(t_0,x_0;t,y) = (\mathcal{L} - \mathcal{L}_{x_0}) P(t_0,x_0;t,x) - (\mathcal{L} - \mathcal{L}_{x_0}) P(t_0,x_0;t,y) = F_1 + F_2$$

where

$$\begin{aligned}
F_1 &= \frac{1}{2} \sum_{i,j=1}^{N} \Bigl( \bigl(c_{ij}(t,x) - c_{ij}(t,x_0)\bigr)\, \partial_{x_i x_j} P(t_0,x_0;t,x) - \bigl(c_{ij}(t,y) - c_{ij}(t,x_0)\bigr)\, \partial_{y_i y_j} P(t_0,x_0;t,y) \Bigr)\\
&= \underbrace{\frac{1}{2} \sum_{i,j=1}^{N} \bigl(c_{ij}(t,x) - c_{ij}(t,y)\bigr)\, \partial_{x_i x_j} P(t_0,x_0;t,x)}_{=:G_1}\\
&\quad + \underbrace{\frac{1}{2} \sum_{i,j=1}^{N} \bigl(c_{ij}(t,y) - c_{ij}(t,x_0)\bigr) \bigl( \partial_{x_i x_j} P(t_0,x_0;t,x) - \partial_{y_i y_j} P(t_0,x_0;t,y) \bigr)}_{=:G_2},\\
F_2 &= \sum_{j=1}^{N} \bigl( b_j(t,x)\, \partial_{x_j} P(t_0,x_0;t,x) - b_j(t,y)\, \partial_{y_j} P(t_0,x_0;t,y) \bigr) + a(t,x)\, P(t_0,x_0;t,x) - a(t,y)\, P(t_0,x_0;t,y).
\end{aligned}$$

Due to the Hölder continuity assumption on the coefficients and the Gaussian estimate (20.3.13), under the condition $|x-y|^2 \le t-t_0$, we have

$$|G_1| \le \frac{c\, |x-y|^{\alpha}}{t-t_0}\, G\bigl(\lambda(t-t_0), x-x_0\bigr) \le \frac{c\, |x-y|^{\frac{\alpha}{2}}}{(t-t_0)^{1-\frac{\alpha}{4}}}\, G\bigl(\lambda(t-t_0), x-x_0\bigr).$$

Regarding $G_2$, we still use the Hölder continuity of the coefficients and combine the mean value theorem (with $\eta$ belonging to the segment with endpoints $x, y$) with the Gaussian estimate (20.3.14) of the third derivatives: we obtain

$$|G_2| \le \frac{c\, |x-y|}{(t-t_0)^{\frac{3}{2}}}\, |y-x_0|^{\alpha}\, G\Bigl(\frac{\lambda+\lambda_0}{2}(t-t_0),\, \eta-x_0\Bigr) \le$$

(since $|x-y|^2 \le t-t_0$ and by Lemma 20.3.8)

$$\le \frac{c\, |x-y|^{\frac{\alpha}{2}}}{(t-t_0)^{1+\frac{\alpha}{4}}}\, |y-x_0|^{\alpha}\, G\Bigl(\frac{\lambda+\lambda_0}{2}(t-t_0),\, y-x_0\Bigr) \le$$

(by Lemma 20.3.4)

$$\le \frac{c\, |x-y|^{\frac{\alpha}{2}}}{(t-t_0)^{1-\frac{\alpha}{4}}}\, G\bigl(\lambda(t-t_0), y-x_0\bigr).$$

A similar estimate holds for .F2 , which can be proved using the Hölder continuity of
the coefficients .bj and a. This concludes the proof of (20.3.27).
We now prove (20.3.10) using the fact that $\Phi$ solves equation (20.3.6), so we have

$$\begin{aligned}
\Phi(t_0,x_0;t,x) - \Phi(t_0,x_0;t,y) &= \mathcal{L} P(t_0,x_0;t,x) - \mathcal{L} P(t_0,x_0;t,y)\\
&\quad + \underbrace{\int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,\eta) \bigl( \mathcal{L} P(s,\eta;t,x) - \mathcal{L} P(s,\eta;t,y) \bigr)\, d\eta\, ds}_{=:I(t_0,x_0;t,x,y)}.
\end{aligned}$$

Thanks to (20.3.27), it is sufficient to estimate the term $I(t_0,x_0;t,x,y)$: again by the estimates (20.3.9) and (20.3.27) we obtain

$$\begin{aligned}
|I(t_0,x_0;t,x,y)| \le \int_{t_0}^{t} &\frac{c\, |x-y|^{\alpha/2}}{(s-t_0)^{1-\frac{\alpha}{2}} (t-s)^{1-\frac{\alpha}{4}}}\\
&\times \int_{\mathbb{R}^N} G\bigl(\lambda(s-t_0), \eta-x_0\bigr) \bigl( G(\lambda(t-s), x-\eta) + G(\lambda(t-s), y-\eta) \bigr)\, d\eta\, ds =
\end{aligned}$$

(by the Chapman-Kolmogorov equation)

$$\begin{aligned}
&= \int_{t_0}^{t} \frac{c\, |x-y|^{\alpha/2}}{(s-t_0)^{1-\frac{\alpha}{2}} (t-s)^{1-\frac{\alpha}{4}}}\, ds\, \bigl( G(\lambda(t-t_0), x-x_0) + G(\lambda(t-t_0), y-x_0) \bigr)\\
&= \frac{c\, |x-y|^{\alpha/2}}{(t-t_0)^{1-\frac{3\alpha}{4}}}\, \bigl( G(\lambda(t-t_0), x-x_0) + G(\lambda(t-t_0), y-x_0) \bigr)
\end{aligned}$$

given the general formula

$$\int_{t_0}^{t} \frac{1}{(s-t_0)^{\beta} (t-s)^{\gamma}}\, ds = \frac{\Gamma_E(1-\beta)\, \Gamma_E(1-\gamma)}{\Gamma_E(2-\beta-\gamma)}\, (t-t_0)^{1-\beta-\gamma} \tag{20.3.28}$$

valid for every $\beta, \gamma < 1$. ⨆
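Formula (20.3.28) is the Euler Beta integral after the substitution $s = t_0 + (t-t_0)\sigma$; it can be checked numerically (a sketch, not from the text) with the standard-library Gamma function:

```python
from math import gamma

def beta_integral(t0, t, beta, gam, n=200_000):
    # Midpoint rule for  integral_{t0}^{t} (s - t0)**(-beta) * (t - s)**(-gam) ds
    h = (t - t0) / n
    total = 0.0
    for i in range(n):
        s = t0 + (i + 0.5) * h
        total += (s - t0) ** (-beta) * (t - s) ** (-gam)
    return total * h

def closed_form(t0, t, beta, gam):
    # Gamma_E(1-beta) * Gamma_E(1-gam) / Gamma_E(2-beta-gam) * (t - t0)**(1-beta-gam)
    return gamma(1 - beta) * gamma(1 - gam) / gamma(2 - beta - gam) * (t - t0) ** (1 - beta - gam)

# Mildly singular exponents so the midpoint rule stays accurate near the endpoints
for beta, gam in [(0.5, 0.5), (0.3, 0.4)]:
    assert abs(beta_integral(0.0, 2.0, beta, gam) - closed_form(0.0, 2.0, beta, gam)) < 1e-2
```

For $\beta = \gamma = \tfrac{1}{2}$ and $t - t_0 = 2$ the closed form reduces to $\Gamma_E(\tfrac12)^2 = \pi$.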


20.3.3 Potential Estimates

Let Assumption 20.2.2 be in force and recall definition (20.3.3) of the parametrix. In this section, we consider the so-called potential

$$V_f(t,x) := \int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, P(s,y;t,x)\, dy\, ds, \qquad (t,x) \in\, ]t_0,T[\times\mathbb{R}^N, \tag{20.3.29}$$

where $f \in C(]t_0,T[\times\mathbb{R}^N)$ satisfies Assumption 20.2.4 on growth and local Hölder continuity. The main result of this section is the following.
Proposition 20.3.9 Definition (20.3.29) is well-posed and $V_f \in C(]t_0,T[\times\mathbb{R}^N)$. Moreover, for every $i,j = 1,\dots,N$ the following derivatives exist and are continuous on $]t_0,T[\times\mathbb{R}^N$:

$$\partial_{x_i} V_f(t,x) = \int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, \partial_{x_i} P(s,y;t,x)\, dy\, ds, \tag{20.3.30}$$

$$\partial_{x_i x_j} V_f(t,x) = \int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy\, ds, \tag{20.3.31}$$

$$\partial_t V_f(t,x) = f(t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, \partial_t P(s,y;t,x)\, dy\, ds. \tag{20.3.32}$$

Proof Let

$$I(s;t,x) := \int_{\mathbb{R}^N} f(s,y)\, \Gamma_y(s,y;t,x)\, dy, \qquad t_0 \le s < t < T,\ x \in \mathbb{R}^N,$$

so that

$$V_f(t,x) = \int_{t_0}^{t} I(s;t,x)\, ds.$$

By estimate (20.3.11) and assumption (20.2.6), we have

$$|I(s;t,x)| \le \frac{c_1 \lambda^N}{(s-t_0)^{1-\beta}\, (2\pi\lambda_0(t-s))^{\frac{N}{2}}} \int_{\mathbb{R}^N} e^{c_2|y|^2}\, e^{-\frac{|x-y|^2}{2\lambda_0(t-s)}}\, dy =$$

(by the change of variables $z = \frac{x-y}{\sqrt{2\lambda_0(t-s)}}$ and setting $c_0 = c_1 \lambda^N \pi^{-N/2}$)

$$= \frac{c_0}{(s-t_0)^{1-\beta}} \int_{\mathbb{R}^N} e^{c_2 |x - z\sqrt{2\lambda_0(t-s)}|^2 - |z|^2}\, dz \le$$

(setting $\kappa = 1 - 4c_2\lambda_0 T > 0$ by hypothesis)

$$\le \frac{c_0}{(s-t_0)^{1-\beta}}\, e^{2c_2|x|^2} \int_{\mathbb{R}^N} e^{-\kappa|z|^2}\, dz \le \frac{c\, e^{2c_2|x|^2}}{(s-t_0)^{1-\beta}} \tag{20.3.33}$$

for some suitable positive constant $c = c(\lambda_0, T, N, c_1, c_2)$. It follows that the function $V_f \in C(]t_0,T[\times\mathbb{R}^N)$ is well-defined and

$$|V_f(t,x)| \le c\,(t-t_0)^{\beta}\, e^{2c_2|x|^2}, \qquad t_0 < t < T,\ x \in \mathbb{R}^N, \tag{20.3.34}$$

with $\beta > 0$.

Proof of (20.3.30) For $t_0 \le s < t < T$ we have

$$|\partial_{x_i} I(s;t,x)| = \Bigl| \int_{\mathbb{R}^N} f(s,y)\, \partial_{x_i} P(s,y;t,x)\, dy \Bigr| \le$$

(proceeding as in the proof of (20.3.33), using estimate (20.3.12))

$$\le \frac{c\, e^{2c_2|x|^2}}{(s-t_0)^{1-\beta} \sqrt{t-s}}.$$

This is sufficient to prove (20.3.30) and moreover, by (20.3.28), we have

$$|\partial_{x_i} V_f(t,x)| \le \frac{c\, e^{2c_2|x|^2}}{(t-t_0)^{\frac{1}{2}-\beta}}, \qquad t_0 < t < T,\ x \in \mathbb{R}^N.$$

Proof of (20.3.31) The proof of the existence of the second order derivatives is more involved, since repeating the previous argument using estimate (20.3.13) would result in a singular term of the type $\frac{1}{t-s}$, which is not integrable on the interval $[t_0,t]$. Proceeding carefully, it is possible to prove more precise and uniform estimates on $]t_0,T[\times D_n$ for each fixed $n \in \mathbb{N}$, where $D_n := \{|x| \le n\}$.

Assume $x \in D_n$. First of all, for each $s < t$ we have

$$\partial_{x_i x_j} I(s;t,x) = \int_{\mathbb{R}^N} f(s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy = J(s;t,x) + H(s;t,x)$$

where

$$J(s;t,x) = \int_{D_{n+1}} f(s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy, \qquad H(s;t,x) = \int_{\mathbb{R}^N \setminus D_{n+1}} f(s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy.$$

Decompose $J$ into the sum of three terms, $J = J_1 + J_2 + J_3$, where⁸

$$\begin{aligned}
J_1(s;t,x) &= \int_{D_{n+1}} \bigl( f(s,y) - f(s,x) \bigr)\, \partial_{x_i x_j} \Gamma_y(s,y;t,x)\, dy,\\
J_2(s;t,x) &= f(s,x) \int_{D_{n+1}} \Bigl( \partial_{x_i x_j} \Gamma_y(s,y;t,x) - \bigl( \partial_{x_i x_j} \Gamma_\eta(s,y;t,x) \bigr)\big|_{\eta=x} \Bigr)\, dy,\\
J_3(s;t,x) &= f(s,x) \int_{D_{n+1}} \bigl( \partial_{x_i x_j} \Gamma_\eta(s,y;t,x) \bigr)\big|_{\eta=x}\, dy.
\end{aligned}$$

⁸ For clarity, the term $\bigl( \partial_{x_i x_j} \Gamma_\eta(s,y;t,x) \bigr)\big|_{\eta=x}$ is obtained by first applying the derivatives $\partial_{x_i x_j}$ to $\Gamma_\eta(s,y;t,x)$, keeping $\eta$ fixed, and then evaluating the result at $\eta = x$. Note that, under Assumption 20.2.2, $\Gamma_\eta(s,y;t,x)$ as a function of $\eta$ is not differentiable.

By the local Hölder continuity of $f$, being $x, y \in D_{n+1}$, and estimate (20.3.13), we have

$$|J_1(s;t,x)| \le \frac{c}{(s-t_0)^{1-\frac{\beta}{2}}} \int_{D_{n+1}} \frac{|x-y|^{\beta}}{t-s}\, G\bigl(\lambda(t-s), x-y\bigr)\, dy \le$$

(by Lemma 20.3.4)

$$\le \frac{c}{(s-t_0)^{1-\frac{\beta}{2}} (t-s)^{1-\frac{\beta}{2}}} \int_{D_{n+1}} G\bigl(2\lambda(t-s), x-y\bigr)\, dy \le \frac{c}{(s-t_0)^{1-\frac{\beta}{2}} (t-s)^{1-\frac{\beta}{2}}},$$

with $c$ a positive constant that depends on $\kappa_n$ in (20.2.7), as well as on $T, N, \lambda$ and $\lambda_0$. Proceeding in a similar way, using (20.3.17) and (20.2.6), we have

$$|J_2(s;t,x)| \le \frac{c\, e^{c_2|x|^2}}{(s-t_0)^{1-\beta}} \int_{D_{n+1}} \frac{|y-x|^{\alpha}}{t-s}\, G\bigl(\lambda(t-s), x-y\bigr)\, dy \le \frac{c\, e^{c_2|x|^2}}{(s-t_0)^{1-\beta} (t-s)^{1-\frac{\alpha}{2}}}.$$

Now, we notice that

$$\partial_{x_i} \Gamma_\eta(s,y;t,x) = -\partial_{y_i} \Gamma_\eta(s,y;t,x)$$

and therefore

$$\int_{D_{n+1}} \bigl( \partial_{x_i x_j} \Gamma_\eta(s,y;t,x) \bigr)\big|_{\eta=x}\, dy = -\int_{D_{n+1}} \bigl( \partial_{y_i x_j} \Gamma_\eta(s,y;t,x) \bigr)\big|_{\eta=x}\, dy =$$

(by the divergence theorem, indicating with $\nu$ the external normal to $D_{n+1}$ and with $d\sigma(y)$ the surface measure on the boundary $\partial D_{n+1}$)

$$= -\int_{\partial D_{n+1}} \bigl( \partial_{x_j} \Gamma_\eta(s,y;t,x) \bigr)\big|_{\eta=x}\, \nu_i(y)\, d\sigma(y)$$

from which, again by (20.3.12) and (20.2.6), we obtain

$$|J_3(s;t,x)| \le \frac{c\, e^{c_2|x|^2}}{(s-t_0)^{1-\beta}} \int_{\partial D_{n+1}} \frac{1}{\sqrt{t-s}}\, G\bigl(\lambda(t-s), x-y\bigr)\, d\sigma(y) \le \frac{c\, e^{c_2|x|^2}}{(s-t_0)^{1-\beta} \sqrt{t-s}}.$$

Finally, by (20.3.13) we have

$$|H(s;t,x)| \le \int_{\mathbb{R}^N \setminus D_{n+1}} |f(s,y)|\, \frac{c}{t-s}\, G\bigl(\lambda(t-s), x-y\bigr)\, dy \le$$

(being $|x-y| \ge 1$, since $|y| \ge n+1$ and $|x| \le n$)

$$\le c \int_{\mathbb{R}^N \setminus D_{n+1}} |f(s,y)|\, \frac{|x-y|^2}{t-s}\, G\bigl(\lambda(t-s), x-y\bigr)\, dy \le$$

(by Lemma 20.3.4, with $\lambda' > \lambda$, and the assumption (20.2.6) on the growth of $f$)

$$\le \frac{c}{(s-t_0)^{1-\beta}} \int_{\mathbb{R}^N} e^{c_2|y|^2}\, G\bigl(\lambda'(t-s), x-y\bigr)\, dy \le \frac{c\, e^{c|x|^2}}{(s-t_0)^{1-\beta}}$$

with $c > 0$, remembering that $c_2 < \frac{1}{4\lambda_0 T}$ by assumption and choosing $\lambda' - \lambda_0$ sufficiently small. In conclusion, we have proved that, for every $t_0 \le s < t < T$ and $x \in D_n$, with $n \in \mathbb{N}$ fixed, there exists a constant $c$ such that

$$|\partial_{x_i x_j} I(s;t,x)| = \Bigl| \int_{\mathbb{R}^N} f(s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy \Bigr| \le \frac{c}{(s-t_0)^{1-\frac{\beta}{2}} (t-s)^{1-\frac{\gamma}{2}}} \tag{20.3.35}$$

where $\gamma = \alpha \wedge \beta$, from which also

$$|\partial_{x_i x_j} V_f(t,x)| \le \frac{c}{(t-t_0)^{1-\frac{\beta}{2}-\frac{\gamma}{2}}}$$

thanks to (20.3.28). This concludes the proof of formula (20.3.31).


Proof of (20.3.32) First, we observe that

$$|\partial_t I(s;t,x)| = \Bigl| \int_{\mathbb{R}^N} f(s,y)\, \partial_t \Gamma_y(s,y;t,x)\, dy \Bigr| =$$

(since $\Gamma_y$ is the fundamental solution of $\mathcal{L}_y$)

$$= \Bigl| \int_{\mathbb{R}^N} f(s,y)\, \frac{1}{2} \sum_{i,j=1}^{N} c_{ij}(t,y)\, \partial_{x_i x_j} \Gamma_y(s,y;t,x)\, dy \Bigr| \le$$

(proceeding as in the proof of (20.3.35) and using the boundedness assumption on the coefficients)

$$\le \frac{c}{(s-t_0)^{1-\beta} (t-s)^{1-\frac{\gamma}{2}}} \tag{20.3.36}$$

for every .t0 ≤ s < t < T and .x ∈ Dn , with .n ∈ N fixed. Now, we have
$$\frac{V_f(t+h,x) - V_f(t,x)}{h} = \int_{t_0}^{t} \frac{I(s;t+h,x) - I(s;t,x)}{h}\, ds + \frac{1}{h} \int_{t}^{t+h} I(s;t+h,x)\, ds =: I_1(t,x) + I_2(t,x).$$

By the mean value theorem, there exists $\hat{t}_s \in [t,t+h]$ such that

$$I_1(t,x) = \int_{t_0}^{t} \partial_t I(s;\hat{t}_s,x)\, ds \xrightarrow[h\to 0]{} \int_{t_0}^{t} \partial_t I(s;t,x)\, ds$$

by the dominated convergence theorem, thanks to estimate (20.3.36). As for $I_2$, we have

$$I_2(t,x) - f(t,x) = \frac{1}{h} \int_{t}^{t+h} \bigl( I(s;t+h,x) - f(s,x) \bigr)\, ds + \frac{1}{h} \int_{t}^{t+h} \bigl( f(s,x) - f(t,x) \bigr)\, ds$$

where the second integral on the right-hand side tends to zero as $h \to 0$ since $f$ is continuous, while to estimate the first integral we assume $x \in D_n$ and proceed as in the proof of (20.3.31): specifically, we write

$$\begin{aligned}
\frac{1}{h} \int_{t}^{t+h} \bigl( I(s;t+h,x) - f(s,x) \bigr)\, ds &= \underbrace{\frac{1}{h} \int_{t}^{t+h} \int_{D_{n+1}} \bigl( f(s,y) - f(s,x) \bigr)\, \Gamma_y(s,y;t+h,x)\, dy\, ds}_{=:J_1(t,x)}\\
&\quad + \underbrace{\frac{1}{h} \int_{t}^{t+h} \int_{\mathbb{R}^N \setminus D_{n+1}} \bigl( f(s,y) - f(s,x) \bigr)\, \Gamma_y(s,y;t+h,x)\, dy\, ds}_{=:J_2(t,x)}.
\end{aligned}$$

Assuming $h > 0$ for simplicity: by the Hölder continuity of $f$ and estimate (20.3.11) of $\Gamma_y$, we have

$$|J_1(t,x)| \le \frac{\lambda^N \kappa_{n+1}}{h} \int_{t}^{t+h} \int_{D_{n+1}} |x-y|^{\beta}\, G\bigl(\lambda_0(t+h-s), x-y\bigr)\, dy\, ds \le$$

(by Lemma 20.3.4)

$$\le \frac{c}{h} \int_{t}^{t+h} (t+h-s)^{\frac{\beta}{2}} \underbrace{\int_{D_{n+1}} G\bigl(\lambda(t+h-s), x-y\bigr)\, dy}_{\le\, 1}\, ds \xrightarrow[h\to 0^+]{} 0.$$

On the other hand, thanks to the growth assumption (20.2.6) on $f$ and (20.3.11), it can be readily proved that

$$|J_2(t,x)| \le \frac{c}{h} \int_{t}^{t+h} \int_{|x-y|>1} e^{c_2|y|^2}\, G\bigl(\lambda_0(t+h-s), x-y\bigr)\, dy\, ds \xrightarrow[h\to 0^+]{} 0.$$

This is enough to conclude the proof of the proposition.



20.3.4 Proof of Theorem 20.2.5

We divide the proof into several steps.


Step 1 By construction and the properties of $\Phi$ in Proposition 20.3.2, $\Gamma = \Gamma(t_0,x_0;t,x)$ in (20.3.4) is a continuous function of $(t_0,x_0,t,x)$ for $0 \le t_0 < t < T$ and $x, x_0 \in \mathbb{R}^N$. We show that $\Gamma$ is a solution of $\mathcal{L}$. Thanks to the estimates of $\Phi$ in Proposition 20.3.2, applying Proposition 20.3.9 we obtain

$$\begin{aligned}
\partial_{x_i} \Gamma(t_0,x_0;t,x) &= \partial_{x_i} P(t_0,x_0;t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, \partial_{x_i} P(s,y;t,x)\, dy\, ds,\\
\partial_{x_i x_j} \Gamma(t_0,x_0;t,x) &= \partial_{x_i x_j} P(t_0,x_0;t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy\, ds,\\
\partial_t \Gamma(t_0,x_0;t,x) &= \partial_t P(t_0,x_0;t,x) + \Phi(t_0,x_0;t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, \partial_t P(s,y;t,x)\, dy\, ds,
\end{aligned}$$

for $t_0 < t < T$, $x, x_0 \in \mathbb{R}^N$. Then we have

$$\mathcal{L} \Gamma(t_0,x_0;t,x) = \mathcal{L} P(t_0,x_0;t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, \mathcal{L} P(s,y;t,x)\, dy\, ds - \Phi(t_0,x_0;t,x)$$

from which we deduce that

$$\mathcal{L} \Gamma(t_0,x_0;t,x) = 0, \qquad 0 \le t_0 < t < T,\ x, x_0 \in \mathbb{R}^N, \tag{20.3.37}$$

since, by Proposition 20.3.2, $\Phi$ solves Eq. (20.3.6).


Step 2 We prove the upper Gaussian estimate (20.2.8). By using the definition (20.3.4) of $\Gamma$, we have

$$|\Gamma(t_0,x_0;t,x)| \le P(t_0,x_0;t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} |\Phi(t_0,x_0;s,y)|\, P(s,y;t,x)\, dy\, ds \le$$

(by (20.3.9) and (20.3.11))

$$\le \lambda^N G\bigl(\lambda(t-t_0), x-x_0\bigr) + \int_{t_0}^{t} \frac{c}{(s-t_0)^{1-\frac{\alpha}{2}}} \int_{\mathbb{R}^N} G\bigl(\lambda(s-t_0), y-x_0\bigr)\, G\bigl(\lambda(t-s), x-y\bigr)\, dy\, ds =$$

(by the Chapman-Kolmogorov equation)

$$\le \lambda^N G\bigl(\lambda(t-t_0), x-x_0\bigr) + c\,(t-t_0)^{\frac{\alpha}{2}}\, G\bigl(\lambda(t-t_0), x-x_0\bigr) \tag{20.3.38}$$

and this proves, in particular, the upper bound (20.2.8). Formula (20.2.9) is proven in a completely analogous way.
Now, we prove (20.2.10). By repeating the proof of (20.3.35) with $\Phi(t_0,x_0;s,y)$ in place of $f(s,y)$ and using the estimates from Proposition 20.3.2, we establish the existence of a positive constant $c = c(T,N,\lambda,\lambda_0)$ such that

$$\Bigl| \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy \Bigr| \le \frac{c}{(s-t_0)^{1-\frac{\alpha}{4}} (t-s)^{1-\frac{\alpha}{4}}}\, G\bigl(\lambda(t-t_0), x-x_0\bigr), \qquad t_0 \le s < t < T,\ x, x_0 \in \mathbb{R}^N. \tag{20.3.39}$$

Hence, by (20.3.4) and (20.3.31), we have

$$|\partial_{x_i x_j} \Gamma(t_0,x_0;t,x)| \le |\partial_{x_i x_j} P(t_0,x_0;t,x)| + \Bigl| \int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, \partial_{x_i x_j} P(s,y;t,x)\, dy\, ds \Bigr| \le$$

(by (20.3.13) and (20.3.39))

$$\le c \left( \frac{1}{t-t_0} + \frac{1}{(t-t_0)^{1-\frac{\alpha}{2}}} \right) G\bigl(\lambda(t-t_0), x-x_0\bigr).$$

Step 3 We prove that $\Gamma$ is a fundamental solution of $\mathcal{L}$. Given $\varphi \in bC(\mathbb{R}^N)$, consider the function $u$ in (20.2.1). Thanks to the estimates (20.2.8)–(20.2.10) we have

$$\mathcal{L} u(t,x) = \int_{\mathbb{R}^N} \varphi(\xi)\, \mathcal{L} \Gamma(t_0,\xi;t,x)\, d\xi = 0, \qquad 0 \le t_0 < t < T,\ x \in \mathbb{R}^N,$$

by (20.3.37). As for the initial datum, we have

$$u(t,x) = \underbrace{\int_{\mathbb{R}^N} \varphi(\xi)\, P(t_0,\xi;t,x)\, d\xi}_{J(t,x)} + \underbrace{\int_{\mathbb{R}^N} \varphi(\xi) \int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,\xi;s,y)\, P(s,y;t,x)\, dy\, ds\, d\xi}_{H(t,x)}.$$

Now, for a fixed $x_0 \in \mathbb{R}^N$,

$$J(t,x) = \underbrace{\int_{\mathbb{R}^N} \varphi(\xi) \bigl( \Gamma_\xi(t_0,\xi;t,x) - \Gamma_{x_0}(t_0,\xi;t,x) \bigr)\, d\xi}_{J_1(t,x)} + \int_{\mathbb{R}^N} \varphi(\xi)\, \Gamma_{x_0}(t_0,\xi;t,x)\, d\xi$$

and, by (20.3.15), we have

$$|J_1(t,x)| \le c \int_{\mathbb{R}^N} |\varphi(\xi)|\, |\xi-x_0|^{\alpha}\, G\bigl(\lambda(t-t_0), x-\xi\bigr)\, d\xi \xrightarrow[(t,x)\to(t_0,x_0)]{} 0,$$

$$\int_{\mathbb{R}^N} \varphi(\xi)\, \Gamma_{x_0}(t_0,\xi;t,x)\, d\xi \xrightarrow[(t,x)\to(t_0,x_0)]{} \varphi(x_0).$$

Here we use the limit argument of Example 3.3.3 in [113]: in probabilistic terms, this corresponds to the weak convergence of the normal distribution to the Dirac delta as the variance tends to zero. On the other hand, by (20.3.38),

$$|H(t,x)| \le c\,(t-t_0)^{\frac{\alpha}{2}} \int_{\mathbb{R}^N} |\varphi(x_0)|\, G\bigl(\lambda(t-t_0), x-x_0\bigr)\, dx_0 \xrightarrow[(t,x)\to(t_0,\bar{x})]{} 0.$$

This proves that $u \in C([t_0,T[\times\mathbb{R}^N)$ and is therefore a classical solution of the Cauchy problem (20.2.2).
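The approximate-identity limit invoked above can be checked numerically in one dimension (a sketch, not from the text; the test function below is an arbitrary bounded continuous choice): as the variance shrinks, the Gaussian average converges to the pointwise value.

```python
from math import exp, pi, sqrt, cos

def G(t, x):
    # One-dimensional heat kernel
    return exp(-x * x / (2 * t)) / sqrt(2 * pi * t)

def phi(xi):
    # An illustrative bounded continuous test function
    return cos(xi) + 0.5 * xi / (1 + xi * xi)

x0, h, L = 0.7, 1e-3, 20.0
n = int(2 * L / h)

def smoothed(t):
    # Midpoint-rule approximation of  integral phi(xi) G(t, x0 - xi) d(xi)  on [x0-L, x0+L]
    return sum(phi(x0 - L + (i + 0.5) * h) * G(t, L - (i + 0.5) * h) for i in range(n)) * h

# As t -> 0+, the Gaussian concentrates at x0 and the average tends to phi(x0)
errs = [abs(smoothed(t) - phi(x0)) for t in (1e-1, 1e-2, 1e-3)]
assert errs[0] > errs[1] > errs[2] and errs[2] < 2e-3
```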
Step 4 We prove that $u$ in (20.2.12) is a classical solution of the non-homogeneous Cauchy problem (20.2.13). We use the definition of $\Gamma$ in (20.3.4) and focus on the term

$$\begin{aligned}
\int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, \Gamma(s,y;t,x)\, dy\, ds &= \int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, P(s,y;t,x)\, dy\, ds\\
&\quad + \int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y) \int_{s}^{t} \int_{\mathbb{R}^N} \Phi(s,y;\tau,\eta)\, P(\tau,\eta;t,x)\, d\eta\, d\tau\, dy\, ds =
\end{aligned}$$

(using notation (20.3.29), setting $\Phi(s,y;\tau,\eta) = 0$ for $\tau \le s$ and exchanging the order of integration in the last integral)

$$= V_f(t,x) + V_F(t,x)$$

where

$$F(\tau,\eta) := \int_{t_0}^{\tau} \int_{\mathbb{R}^N} f(s,y)\, \Phi(s,y;\tau,\eta)\, dy\, ds.$$

We will soon prove that $F$ satisfies Assumption 20.2.4 and it is therefore possible to apply Proposition 20.3.9 to $V_f$ and $V_F$: we obtain

$$\begin{aligned}
\mathcal{L} \bigl( V_f(t,x) + V_F(t,x) \bigr) &= -f(t,x) - F(t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} \bigl( f(s,y) + F(s,y) \bigr)\, \mathcal{L} P(s,y;t,x)\, dy\, ds\\
&= -f(t,x) + \int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, I(s,y;t,x)\, dy\, ds
\end{aligned}$$

where

$$I(s,y;t,x) := -\Phi(s,y;t,x) + \mathcal{L} P(s,y;t,x) + \int_{s}^{t} \int_{\mathbb{R}^N} \Phi(s,y;\tau,\eta)\, \mathcal{L} P(\tau,\eta;t,x)\, d\eta\, d\tau \equiv 0$$

by (20.3.6). This proves that

$$\mathcal{L} u(t,x) = f(t,x), \qquad 0 \le t_0 < t < T,\ x \in \mathbb{R}^N.$$

Let us verify that $F$ satisfies Assumption 20.2.4: by (20.3.9), the hypotheses on $f$ and (20.3.28), we have

$$|F(\tau,\eta)| \le \int_{t_0}^{\tau} \int_{\mathbb{R}^N} \frac{c\, e^{c_2|y|^2}}{(s-t_0)^{1-\frac{\beta}{2}} (\tau-s)^{1-\frac{\alpha}{2}}}\, G\bigl(\lambda(\tau-s), \eta-y\bigr)\, dy\, ds \le \frac{c\, e^{c|\eta|^2}}{(\tau-t_0)^{1-\frac{\alpha+\beta}{2}}}.$$

Moreover, by (20.3.10) we have

$$\begin{aligned}
|F(\tau,\eta) - F(\tau,\eta')| &\le c\, |\eta-\eta'|^{\frac{\alpha}{2}} \int_{t_0}^{\tau} \int_{\mathbb{R}^N} \frac{e^{c_2|y|^2}}{(s-t_0)^{1-\frac{\beta}{2}} (\tau-s)^{1-\frac{\alpha}{4}}} \bigl( G(\lambda(\tau-s), \eta-y) + G(\lambda(\tau-s), \eta'-y) \bigr)\, dy\, ds\\
&\le \frac{c\, |\eta-\eta'|^{\frac{\alpha}{2}} \bigl( e^{c|\eta|^2} + e^{c|\eta'|^2} \bigr)}{(\tau-t_0)^{1-\frac{\alpha+2\beta}{4}}}.
\end{aligned}$$

Finally, using the upper bound (20.2.8) of $\Gamma$ and proceeding as in the proof of estimate (20.3.34), we have that

$$\int_{t_0}^{t} \int_{\mathbb{R}^N} f(s,y)\, \Gamma(s,y;t,x)\, dy\, ds \xrightarrow[(t,x)\to(t_0,\bar{x})]{} 0,$$

for every $\bar{x} \in \mathbb{R}^N$. This concludes the proof that $u$ in (20.2.12) is a classical solution of the non-homogeneous Cauchy problem (20.2.13).
Step 5 The Chapman-Kolmogorov equation and formula (20.2.14) can be proved
as in Remark 20.3.7, as a consequence of the uniqueness result of Theorem 20.1.8.

In particular, as shown in the previous points, if $a$ is constant, the functions

$$u_1(t,x) := e^{a(t-t_0)}, \qquad u_2(t,x) := \int_{\mathbb{R}^N} \Gamma(t_0,x_0;t,x)\, dx_0$$

are both bounded solutions (thanks to estimate (20.3.38)) of the Cauchy problem

$$\begin{cases} \mathcal{L} u = 0 & \text{in } ]t_0,T[\times\mathbb{R}^N,\\ u(t_0,\cdot) = 1 & \text{in } \mathbb{R}^N, \end{cases}$$

and therefore coincide.


Step 6 As a last step, we prove the lower bound of $\Gamma$ in (20.2.11). This is a non-trivial result, for which we adapt a technique introduced by D.G. Aronson that exploits some classical estimates of J. Nash: for further details, we also refer to Section 2 in [42]. Here, instead of Nash's estimates, we use other estimates derived directly from the parametrix method.

First, we prove that $\Gamma \ge 0$: for the sake of contradiction, if $\Gamma(t_0,x_0;t_1,x_1) < 0$ for certain $x_0, x_1 \in \mathbb{R}^N$ and $0 \le t_0 < t_1 < T$, then by continuity we would have

$$\Gamma(t_0,y;t_1,x_1) < 0, \qquad |y-x_0| < r,$$

for a suitable $r > 0$. Consider $\varphi \in bC(\mathbb{R}^N)$ such that $\varphi(y) > 0$ for $|y-x_0| < r$ and $\varphi(y) \equiv 0$ for $|y-x_0| \ge r$: the function

$$u(t,x) := \int_{\mathbb{R}^N} \varphi(y)\, \Gamma(t_0,y;t,x)\, dy, \qquad t \in\, ]t_0,T[,\ x \in \mathbb{R}^N,$$

is bounded thanks to estimate (20.3.38) of $\Gamma$, satisfies $u(t_1,x_1) < 0$, and is a classical solution of the Cauchy problem (20.2.2). This contradicts the maximum principle, Theorem 20.1.8.
Now we observe that for every $\lambda > 1$ we have

$$G(\lambda t, x) \le G\Bigl(\frac{t}{\lambda}, x\Bigr)$$

if $|x| < c_\lambda \sqrt{t}$, where $c_\lambda = \sqrt{\frac{2\lambda N}{\lambda^2-1} \log\lambda}$. Then, by definition (20.3.4) we have

$$\begin{aligned}
\Gamma(t_0,x_0;t,x) &\ge P(t_0,x_0;t,x) - \Bigl| \int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, P(s,y;t,x)\, dy\, ds \Bigr|\\
&\ge \frac{1}{\lambda^N}\, G\Bigl(\frac{t-t_0}{\lambda}, x-x_0\Bigr) - c\,(t-t_0)^{\frac{\alpha}{2}}\, G\bigl(\lambda(t-t_0), x-x_0\bigr) \ge
\end{aligned}$$


(if .|x − x0 | ≤ cλ t − t0 )
⎛ ⎞ ⎛t − t ⎞
−N α 0
. ≥ λ − c(t − t0 ) G
2 , x − x0
λ
⎛ ⎞
1 t − t0
≥ NG , x − x0 (20.3.40)
2λ λ

( )− 2
if .0 < t − t0 ≤ Tλ := 2cλN α ∧ T .
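The Gaussian comparison $G(\lambda t, x) \le G(t/\lambda, x)$ for $|x| < c_\lambda\sqrt{t}$ used above can be verified numerically (a sketch, not from the text; the dimension and $\lambda$ below are illustrative): equating the two kernels shows the crossover happens exactly at $|x|^2 = \frac{2\lambda N t \log\lambda}{\lambda^2-1} = c_\lambda^2 t$.

```python
from math import exp, log, pi, sqrt

N = 3  # illustrative dimension

def G(t, x2):
    # N-dimensional heat kernel evaluated at squared radius x2 = |x|^2
    return (2 * pi * t) ** (-N / 2) * exp(-x2 / (2 * t))

lam = 2.5
c_lam = sqrt(2 * lam * N * log(lam) / (lam ** 2 - 1))

for t in (0.01, 0.5, 3.0):
    r2_max = c_lam ** 2 * t          # equality threshold for the squared radius
    for frac in (0.0, 0.25, 0.5, 0.98):
        x2 = frac * r2_max           # strictly below the threshold
        assert G(lam * t, x2) <= G(t / lam, x2) + 1e-15
```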
Given $x, x_0 \in \mathbb{R}^N$ and $0 \le t_0 < t < T$, let $m \in \mathbb{N}$ be the integer part of

$$\max\Bigl\{ \frac{4|x-x_0|^2}{c_\lambda^2 (t-t_0)},\ \frac{T}{T_\lambda} \Bigr\}.$$

We set

$$t_k = t_0 + k\, \frac{t-t_0}{m+1}, \qquad x_k = x_0 + k\, \frac{x-x_0}{m+1}, \qquad k = 1,\dots,m,$$

and observe that, thanks to the choice of $m$, we have

$$t_{k+1} - t_k = \frac{t-t_0}{m+1} \le \frac{T}{m+1} \le T_\lambda. \tag{20.3.41}$$

Moreover, if $y_k \in D(x_k,r) := \{ y \in \mathbb{R}^N \mid |x_k - y| < r \}$ for each $k = 1,\dots,m$ then, choosing $r = \frac{c_\lambda}{4} \sqrt{\frac{t-t_0}{m+1}}$, we have

$$|y_{k+1} - y_k| \le 2r + |x_{k+1} - x_k| = 2r + \frac{|x-x_0|}{m+1} \le 2r + \frac{c_\lambda}{2} \sqrt{\frac{t-t_0}{m+1}} = c_\lambda \sqrt{\frac{t-t_0}{m+1}} \tag{20.3.42}$$

$$= c_\lambda \sqrt{t_{k+1} - t_k}. \tag{20.3.43}$$

Applying the Chapman-Kolmogorov equation repeatedly, we have

$$\Gamma(t_0,x_0;t,x) = \int_{\mathbb{R}^{Nm}} \Gamma(t_0,x_0;t_1,y_1) \prod_{k=1}^{m-1} \Gamma(t_k,y_k;t_{k+1},y_{k+1})\, \Gamma(t_m,y_m;t,x)\, dy_1 \cdots dy_m \ge$$

(using the fact that $\Gamma \ge 0$)

$$\ge \int_{\mathbb{R}^{Nm}} \Gamma(t_0,x_0;t_1,y_1) \prod_{k=1}^{m-1} \mathbb{1}_{D(x_k,r)}(y_k)\, \Gamma(t_k,y_k;t_{k+1},y_{k+1})\, \mathbb{1}_{D(x_m,r)}(y_m)\, \Gamma(t_m,y_m;t,x)\, dy_1 \cdots dy_m \ge$$

(since, by (20.3.41) and (20.3.43), estimate (20.3.40) holds)

$$\begin{aligned}
\ge \frac{1}{(2\lambda^N)^{m+1}} \int_{\mathbb{R}^{Nm}} &G\Bigl(\frac{t-t_0}{\lambda(m+1)}, y_1-x_0\Bigr) \prod_{k=1}^{m-1} \mathbb{1}_{D(x_k,r)}(y_k)\, G\Bigl(\frac{t-t_0}{\lambda(m+1)}, y_{k+1}-y_k\Bigr)\\
&\times \mathbb{1}_{D(x_m,r)}(y_m)\, G\Bigl(\frac{t-t_0}{\lambda(m+1)}, x-y_m\Bigr)\, dy_1 \cdots dy_m \ge
\end{aligned}$$

(denoting by $\omega_N$ the volume of the unit ball in $\mathbb{R}^N$, by (20.3.42))

$$\ge \frac{1}{(2\lambda^N)^{m+1}} \left( \frac{\lambda(m+1)}{2\pi(t-t_0)} \right)^{\frac{N}{2}(m+1)} \exp\Bigl( -\frac{\lambda c_\lambda^2}{2}(m+1) \Bigr) \bigl( \omega_N\, r^N \bigr)^m.$$

It follows that there exists a constant $c = c(N,T,\alpha,\lambda,\lambda_0)$ such that

$$\Gamma(t_0,x_0;t,x) \ge \frac{1}{c\,(t-t_0)^{\frac{N}{2}}}\, e^{-cm}$$

and, by the choice of $m$, this is enough to prove the thesis and conclude the proof of Theorem 20.2.5.

20.3.5 Proof of Proposition 18.4.3

For consistency with the notation of this chapter, we state and prove Proposition 18.4.3 in its forward version.

Proposition 20.3.10 Under Assumption 20.2.2, let $\Gamma$ be the fundamental solution of the operator $\mathcal{A} - \partial_t$ on $S_T$, with $\mathcal{A}$ as in (20.2.4). For every $\lambda \ge 1$, the vector-valued function

$$u_\lambda(t,x) := \int_0^t e^{-\lambda(t-t_0)} \int_{\mathbb{R}^N} b(t_0,x_0)\, \Gamma(t_0,x_0;t,x)\, dx_0\, dt_0, \qquad (t,x) \in [0,T[\times\mathbb{R}^N,$$

is a classical solution of the Cauchy problem

$$\begin{cases} (\partial_t + \mathcal{A}_t) u = \lambda u - b & \text{in } S_T,\\ u(0,\cdot) = 0 & \text{in } \mathbb{R}^N. \end{cases}$$

Moreover, there exists a constant $c > 0$, which depends only on $N, \lambda_0$ and $T$, such that

$$|u_\lambda(t,x) - u_\lambda(t,y)| \le \frac{c}{\sqrt{\lambda}}\, |x-y|, \tag{20.3.44}$$

$$|\nabla_x u_\lambda(t,x) - \nabla_x u_\lambda(t,y)| \le c\, |x-y|, \tag{20.3.45}$$

for every $t \in\, ]0,T[$ and $x, y \in \mathbb{R}^N$, where $\nabla_x = (\partial_{x_1},\dots,\partial_{x_N})$.


Proof We use the representation (20.3.4) of the fundamental solution provided by the parametrix method:

$$u_\lambda(t,x) = \int_0^t e^{-\lambda(t-t_0)} \bigl( I_b(t_0;t,x) + J_b(t_0;t,x) \bigr)\, dt_0$$

where

$$I_b(t_0;t,x) := \int_{\mathbb{R}^N} b(t_0,x_0)\, P(t_0,x_0;t,x)\, dx_0,$$

$$J_b(t_0;t,x) := \int_{\mathbb{R}^N} b(t_0,x_0) \underbrace{\int_{t_0}^{t} \int_{\mathbb{R}^N} \Phi(t_0,x_0;s,y)\, P(s,y;t,x)\, dy\, ds}_{=:R(t_0,x_0;t,x)}\, dx_0, \tag{20.3.46}$$

with $\Phi$ defined in (20.3.7). Since $b$ is bounded, by (20.3.12) we have

$$|I_b(t_0;t,x) - I_b(t_0;t,y)| \le c\, \frac{|x-y|}{\sqrt{t-t_0}}, \qquad x, y \in \mathbb{R}^N.$$

A similar result holds for $J_b$: in fact, by (20.3.9), the mean value theorem and estimate (20.3.12), for $\lambda_1 > \lambda_0$ we have

$$|R(t_0,x_0;t,x) - R(t_0,x_0;t,y)| \le c \int_{t_0}^{t} \frac{|x-y|}{(s-t_0)^{1-\frac{\alpha}{2}} \sqrt{t-s}} \int_{\mathbb{R}^N} G\bigl(\lambda_1(t-s), \bar{x}-\eta\bigr)\, G\bigl(\lambda_1(s-t_0), \eta-x_0\bigr)\, d\eta\, ds =$$

(integrating and using (20.3.28))

$$= c\, \frac{|x-y|}{(t-t_0)^{\frac{1-\alpha}{2}}}\, G\bigl(\lambda_1(t-t_0), \bar{x}-x_0\bigr), \tag{20.3.47}$$

where $\bar{x}$ belongs to the segment with endpoints $x$ and $y$. Plugging estimate (20.3.47) into (20.3.46) and being $b$ bounded, we obtain

$$|J_b(t_0;t,x) - J_b(t_0;t,y)| \le c\, \frac{|x-y|}{(t-t_0)^{\frac{1-\alpha}{2}}}, \qquad x, y \in \mathbb{R}^N.$$

Hence, we have

$$|u_\lambda(t,x) - u_\lambda(t,y)| \le c\, |x-y| \int_0^t \frac{e^{-\lambda(t-t_0)}}{\sqrt{t-t_0}}\, dt_0$$

which yields (20.3.44). The proof of (20.3.45) is analogous and is based on the arguments also used in the proof of Proposition 20.3.9. ⨅

20.4 Key Ideas to Remember

The chapter is structured into two parts, focusing on the study of uniqueness and
existence for the parabolic Cauchy problem, respectively.
• Section 20.1: uniqueness is proven under very general assumptions (cf. Assumption 20.1.1, (20.1.2), and 20.1.3). The main results are the maximum and comparison principles. Uniqueness classes for the Cauchy problem are given by functions that do not grow too rapidly at infinity.
• Sections 20.2 and 20.3: we present the classic parametrix method for the
construction of the fundamental solution of a uniformly parabolic operator with
bounded coefficients that are Hölder continuous in the spatial variable. This
is a fairly long and complex technique based on suitable estimates involving
Gaussian functions and on the study of singular integrals. The fundamental Theorem 20.2.5 provides, in addition to existence and the property of being a density, also a comparison between the fundamental solution and the Gaussian function, the Chapman-Kolmogorov property, and Duhamel's formula for the solution of the non-homogeneous Cauchy problem.
Main notations used or introduced in this chapter:

Symbol | Description | Page
$\mathcal{L}$ | Forward parabolic operator | 370
$S_T := ]0,T[\times\mathbb{R}^N$ | Strip in $\mathbb{R}^{N+1}$ | 370
$\Gamma$ | Fundamental solution | 379
$G$ | Standard Gaussian function | 383
$P$ | Parametrix | 384
$bC_T^\alpha$ | Bounded, uniformly $\alpha$-Hölder continuous (w.r.t. $x$) functions on $S_T$ | 341
$[g]_\alpha$ | Norm in $bC_T^\alpha$ | 341
$bC^\alpha(\mathbb{R}^N)$ | Bounded, $\alpha$-Hölder continuous functions on $\mathbb{R}^N$ | 382
$\|\varphi\|_{bC^\alpha(\mathbb{R}^N)}$ | Norm in $bC^\alpha(\mathbb{R}^N)$ | 382
References

1. Agassi, A.: Open: An Autobiography. Einaudi (2011)


2. Antonelli, F.: Backward-forward stochastic differential equations. Ann. Appl. Probab. 3, 777–
793 (1993)
3. Antonov, A., Misirpashaev, T., Piterbarg, V.: Markovian projection on a Heston model. J.
Comput. Finance 13, 23–47 (2009)
4. Applebaum, D.: Lévy Processes and Stochastic Calculus, vol. 93 of Cambridge Studies in
Advanced Mathematics. Cambridge University Press, Cambridge (2004)
5. Aronson, D.G.: The fundamental solution of a linear parabolic equation containing a small
parameter. Illinois J. Math. 3, 580–619 (1959)
6. Baldi, P.: Stochastic Calculus. Universitext, Springer, Cham (2017). An introduction through
theory and exercises
7. Barlow, M.T.: One-dimensional stochastic differential equations with no strong solution. J.
London Math. Soc. (2) 26, 335–347 (1982)
8. Barucci, E., Polidoro, S., Vespri, V.: Some results on partial differential equations and Asian
options. Math. Models Methods Appl. Sci. 11, 475–497 (2001)
9. Bass, R.F.: Stochastic Processes, vol. 33 of Cambridge Series in Statistical and Probabilistic
Mathematics. Cambridge University Press, Cambridge (2011)
10. Bass, R.F.: Real Analysis for Graduate Students (2013). Available at http://bass.math.uconn.edu/real.html
11. Bass, R.F., Perkins, E.: A new technique for proving uniqueness for martingale problems.
Astérisque, 47–53 (2009), (2010)
12. Baudoin, F.: An Introduction to the Geometry of Stochastic Flows. Imperial College Press,
London (2004)
13. Baudoin, F.: Diffusion Processes and Stochastic Calculus. EMS Textbooks in Mathematics.
European Mathematical Society (EMS), Zürich (2014)
14. Beiglböck, M., Schachermayer, W., Veliyev, B.: A short proof of the Doob-Meyer theorem.
Stoch. Process. Appl. 122, 1204–1209 (2012)
15. Bensoussan, A.: Stochastic maximum principle for distributed parameter systems. J. Franklin
Inst. 315, 387–406 (1983)
16. Billingsley, P.: Convergence of Probability Measures. Wiley Series in Probability and
Statistics: Probability and Statistics, second edn. John Wiley & Sons, New York (1999). A
Wiley-Interscience Publication
17. Bismut, J.-M.: Théorie probabiliste du contrôle des diffusions. Mem. Amer. Math. Soc. 4,
xiii+130 (1976)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 413
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1

18. Bjork, T.: Arbitrage Theory in Continuous Time, 2nd edn. Oxford University Press, Oxford
(2004)
19. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81,
637–654 (1973)
20. Blumenthal, R.M., Getoor, R.K.: Markov Processes and Potential Theory. Pure and Applied
Mathematics, vol. 29. Academic Press, New York-London (1968)
21. Brémaud, P.: Point Processes and Queues. Springer, New York (1981). Martingale dynamics,
Springer Series in Statistics
22. Brunick, G., Shreve, S.: Mimicking an Itô process by a solution of a stochastic differential
equation. Ann. Appl. Probab. 23, 1584–1628 (2013)
23. Champagnat, N., Jabin, P.-E.: Strong solutions to stochastic differential equations with rough
coefficients. Ann. Probab. 46, 1498–1541 (2018)
24. Chow, P.-L.: Stochastic Partial Differential Equations, second edn. Advances in Applied
Mathematics. CRC Press, Boca Raton, FL (2015)
25. Chung, K.L., Doob, J.L.: Fields, optionality and measurability. Amer. J. Math. 87, 397–424
(1965)
26. Courrège, P.: Générateur infinitésimal d'un semi-groupe de convolution sur $\mathbb{R}^n$, et formule de Lévy-Khinchine. Bull. Sci. Math. (2) 88, 3–30 (1964)
27. Cox, J.C.: Notes on Option Pricing I: Constant Elasticity of Variance Diffusion. Working
Paper, Stanford University, Stanford CA (1975)
28. Cox, J.C.: The constant elasticity of variance option pricing model. J. Portfolio Manag. 23,
15–17 (1997)
29. Cox, J.C., Ingersoll, J.E., Ross, S.A.: The relation between forward prices and futures prices.
J. Financ. Econ. 9, 321–346 (1981)
30. Criens, D., Pfaffelhuber, P., Schmidt, T.: The martingale problem method revisited. Electron.
J. Probab. 28 (2023), 1–46
31. Davie, A.M.: Uniqueness of solutions of stochastic differential equations. Int. Math. Res. Not.
IMRN 2007, Art. ID rnm124, 26 (2007)
32. Davydov, D., Linetsky, V.: Pricing and hedging path-dependent options under the CEV
process. Manag. Sci. 47, 949–965 (2001)
33. Delbaen, F., Shirakawa, H.: A note on option pricing for the constant elasticity of variance
model. Asia-Pac. Financ. Mark. 9, 85–99 (2002)
34. Di Francesco, M., Pascucci, A.: On a class of degenerate parabolic equations of Kolmogorov
type. AMRX Appl. Math. Res. Express 3, 77–116 (2005)
35. Doob, J.L.: Stochastic Processes. John Wiley & Sons/Chapman & Hall, New York/London
(1953)
36. Duffie, D., Filipović, D., Schachermayer, W.: Affine processes and applications in finance.
Ann. Appl. Probab. 13, 984–1053 (2003)
37. Durrett, R.: Stochastic Calculus. Probability and Stochastics Series. CRC Press, Boca Raton,
FL (1996). A practical introduction
38. Durrett, R.: Probability: Theory and Examples, vol. 49 of Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press, Cambridge (2019). Available at
https://services.math.duke.edu/~rtd/PTE/pte.html
39. El Karoui, N., Peng, S., Quenez, M.C.: Backward stochastic differential equations in finance.
Math. Finance 7, 1–71 (1997)
40. Elworthy, K.D., Le Jan, Y., Li, X.-M.: The Geometry of Filtering. Frontiers in Mathematics.
Birkhäuser Verlag, Basel (2010)
41. Evans, L.C.: Partial Differential Equations, second edn., vol. 19 of Graduate Studies in
Mathematics. American Mathematical Society, Providence, RI (2010)
42. Fabes, E.B., Stroock, D.W.: A new proof of Moser’s parabolic Harnack inequality using the
old ideas of Nash. Arch. Rational Mech. Anal. 96, 327–338 (1986)
43. Fedrizzi, E., Flandoli, F.: Pathwise uniqueness and continuous dependence of SDEs with non-
regular drift. Stochastics 83, 241–257 (2011)

44. Feehan, P.M.N., Pop, C.A.: On the martingale problem for degenerate-parabolic partial
differential operators with unbounded coefficients and a mimicking theorem for Itô processes.
Trans. Amer. Math. Soc. 367, 7565–7593 (2015)
45. Feller, W.: Zur Theorie der stochastischen Prozesse. Math. Ann. 113, 113–160 (1937)
46. Figalli, A.: Existence and uniqueness of martingale solutions for SDEs with rough or
degenerate coefficients. J. Funct. Anal. 254, 109–153 (2008)
47. Flandoli, F.: Regularity Theory and Stochastic Flows for Parabolic SPDEs, vol. 9 of
Stochastics Monographs. Gordon and Breach Science Publishers, Yverdon (1995)
48. Flandoli, F.: Random Perturbation of PDEs and Fluid Dynamic Models, vol. 2015 of Lecture
Notes in Mathematics. Springer, Heidelberg (2011). Lectures from the 40th Probability
Summer School held in Saint-Flour, 2010, École d’Été de Probabilités de Saint-Flour. [Saint-
Flour Probability Summer School]
49. Friedman, A.: Partial Differential Equations of Parabolic Type. Prentice-Hall, Englewood
Cliffs, NJ (1964)
50. Friedman, A.: Stochastic Differential Equations and Applications. Dover Publications,
Mineola, NY (2006). Two volumes bound as one, Reprint of the 1975 and 1976 original
published in two volumes
51. Fristedt, B., Jain, N., Krylov, N.: Filtering and Prediction: A Primer, vol. 38 of Student
Mathematical Library. American Mathematical Society, Providence, RI (2007)
52. Fujisaki, M., Kallianpur, G., Kunita, H.: Stochastic differential equations for the non linear
filtering problem. Osaka J. Math. 9, 19–40 (1972)
53. Gilbarg, D., Trudinger, N.S.: Elliptic Partial Differential Equations of Second Order,
second edn., vol. 224 of Grundlehren der mathematischen Wissenschaften [Fundamental
Principles of Mathematical Sciences]. Springer, Berlin (1983)
54. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). Available at
http://www.deeplearningbook.org
55. Guyon, J., Henry-Labordère, P.: Nonlinear Option Pricing. Chapman & Hall/CRC Financial
Mathematics Series. CRC Press, Boca Raton, FL (2014)
56. Gyöngy, I.: Mimicking the one-dimensional marginal distributions of processes having an Itô
differential. Probab. Theory Relat. Fields 71, 501–516 (1986)
57. Gyöngy, I., Krylov, N.V.: Existence of strong solutions for Itô’s stochastic equations via
approximations: revisited. Stoch. Partial Differ. Equations Anal. Comput. 10, 693–719 (2022)
58. Hagan, P.S., Kumar, D., Lesniewski, A., Woodward, D.E.: Managing smile risk. Wilmott
Magazine, September, 84–108 (2002)
59. Halmos, P.R.: Measure Theory. D. Van Nostrand Company, New York, NY (1950)
60. Heston, S.: A closed-form solution for options with stochastic volatility with applications to
bond and currency options. Rev. Financ. Stud. 6, 327–343 (1993)
61. Heston, S.L., Loewenstein, M., Willard, G.A.: Options and bubbles. Rev. Financ. Stud. 20(2),
359–390 (2007)
62. Hörmander, L.: Hypoelliptic second order differential equations. Acta Math. 119, 147–171
(1967)
63. Ikeda, N., Watanabe, S.: Stochastic Differential Equations and Diffusion Processes, vol. 24
of North-Holland Mathematical Library. North-Holland Publishing Co./Kodansha, Amster-
dam/Tokyo (1981)
64. Itô, K., Watanabe, S.: Introduction to stochastic differential equations. In: Proceedings of the
International Symposium on Stochastic Differential Equations (Res. Inst. Math. Sci., Kyoto
Univ., Kyoto, 1976), pp. i–xxx. Wiley, New York (1978)
65. Jacod, J., Shiryaev, A.N.: Limit Theorems for Stochastic Processes, second edn., vol. 288 of
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical
Sciences]. Springer, Berlin (2003)
66. Kallenberg, O.: Foundations of Modern Probability, second edn. Probability and its Applica-
tions (New York). Springer, New York (2002)
67. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus, second edn., vol. 113
of Graduate Texts in Mathematics. Springer, New York (1991)
68. Klenke, A.: Probability Theory, second edn. Universitext. Springer, London (2014). A
comprehensive course
69. Kolmogorov, A.N.: Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung.
Math. Ann. 104, 415–458 (1931)
70. Kolmogorov, A.N.: Selected Works of A. N. Kolmogorov. Vol. III. Kluwer Academic
Publishers Group, Dordrecht (1993). Edited by A. N. Shiryayev
71. Kolokoltsov, V.N.: Markov Processes, Semigroups and Generators, vol. 38 of De Gruyter
Studies in Mathematics. Walter de Gruyter & Co., Berlin (2011)
72. Komlós, J.: A generalization of a problem of Steinhaus. Acta Math. Acad. Sci. Hungar. 18,
217–229 (1967)
73. Kotelenez, P.: Stochastic Ordinary and Stochastic Partial Differential Equations, vol. 58 of
Stochastic Modelling and Applied Probability. Springer, New York (2008). Transition from
microscopic to macroscopic equations
74. Krylov, N.V.: Itô’s stochastic integral equations. Teor. Verojatnost. i Primenen 14, 340–348
(1969)
75. Krylov, N.V.: Correction to the paper “Itô’s stochastic integral equations” (Teor. Verojatnost.
i Primenen. 14, 340–348 (1969)). Teor. Verojatnost. i Primenen. 17, 392–393 (1972)
76. Krylov, N.V.: The selection of a Markov process from a Markov system of processes, and
the construction of quasidiffusion processes. Izv. Akad. Nauk SSSR Ser. Mat. 37, 691–708
(1973)
77. Krylov, N.V.: Controlled Diffusion Processes, vol. 14 of Stochastic Modelling and Applied
Probability. Springer, Berlin (2009). Translated from the 1977 Russian original by A. B.
Aries, Reprint of the 1980 edition
78. Krylov, N.V., Röckner, M.: Strong solutions of stochastic equations with singular time
dependent drift. Probab. Theory Related Fields 131, 154–196 (2005)
79. Krylov, N.V., Rozovsky, B.L.: On the first integrals and Liouville equations for diffusion
processes. In: Stochastic Differential Systems (Visegrád, 1980), pp. 117–125, vol. 36 of
Lecture Notes in Control and Information Sci. Springer, Berlin (1981)
80. Krylov, N.V., Rozovsky, B.L.: Characteristics of second-order degenerate parabolic Itô
equations. Trudy Sem. Petrovsk. 8, 153–168 (1982)
81. Krylov, N.V., Zatezalo, A.: A direct approach to deriving filtering equations for diffusion
processes. Appl. Math. Optim. 42, 315–332 (2000)
82. Kunita, H.: Stochastic Flows and Stochastic Differential Equations, vol. 24 of Cambridge
Studies in Advanced Mathematics. Cambridge University Press, Cambridge (1997). Reprint
of the 1990 original
83. Lacker, D., Shkolnikov, M., Zhang, J.: Inverting the Markovian projection, with an application
to local stochastic volatility models. Ann. Probab. 48, 2189–2211 (2020)
84. Ladyzhenskaia, O.A., Solonnikov, V.A., Ural’tseva, N.N.: Linear and Quasilinear Equations
of Parabolic Type. Translations of Mathematical Monographs, vol. 23. American Mathemat-
ical Society, Providence, RI (1968). Translated from the Russian by S. Smith
85. Lanconelli, E., Polidoro, S.: On a class of hypoelliptic evolution operators. Rend. Sem. Mat.
Univ. Politec. Torino 52, 29–63 (1994)
86. Langevin, P.: Sur la théorie du mouvement Brownien. C.R. Acad. Sci. Paris 146, 530–532
(1908)
87. Lee, E.B., Markus, L.: Foundations of Optimal Control Theory, second edn. Robert E. Krieger
Publishing Co., Melbourne, FL (1986)
88. Lemons, D.S.: An Introduction to Stochastic Processes in Physics. Johns Hopkins University
Press, Baltimore, MD (2002). Containing “On the theory of Brownian motion” by Paul
Langevin, translated by Anthony Gythiel
89. Levi, E.E.: Sulle equazioni lineari totalmente ellittiche alle derivate parziali. Rend. Circ. Mat.
Palermo 24, 275–317 (1907)
90. Liptser, R.S., Shiryaev, A.N.: Statistics of Random Processes. I, expanded edn., vol. 5 of
Applications of Mathematics (New York). Springer, Berlin (2001). General theory, Translated
from the 1974 Russian original by A. B. Aries, Stochastic Modelling and Applied Probability
91. Liu, W., Röckner, M.: Stochastic Partial Differential Equations: An Introduction. Universitext.
Springer, Cham (2015)
92. Lototsky, S.V., Rozovsky, B.L.: Stochastic Partial Differential Equations. Universitext.
Springer, Cham (2017)
93. Ma, J., Yong, J.: Forward-backward Stochastic Differential Equations and Their Applications,
vol. 1702 of Lecture Notes in Mathematics. Springer, Berlin (1999)
94. Mazliak, L., Shafer, G.: The Splendors and Miseries of Martingales - Their History from the
Casino to Mathematics. Trends in the History of Science. Birkhäuser, Cham (2022)
95. Menozzi, S.: Parametrix techniques and martingale problems for some degenerate Kol-
mogorov equations. Electron. Commun. Probab. 16, 234–250 (2011)
96. Meyer, P.-A.: Probability and Potentials. Blaisdell Publishing Co. Ginn, Waltham (1966)
97. Meyer, P.A.: Stochastic processes from 1950 to the present. J. Électron. Hist. Probab. Stat. 5,
42 (2009). Translated from the French [MR1796860] by Jeanine Sedjro
98. Mörters, P., Peres, Y.: Brownian Motion, vol. 30 of Cambridge Series in Statistical and Prob-
abilistic Mathematics. Cambridge University Press, Cambridge (2010). With an appendix by
Oded Schramm and Wendelin Werner
99. Mumford, D.: The dawning of the age of stochasticity. Atti Accad. Naz. Lincei Cl. Sci. Fis.
Mat. Natur. Rend. Lincei (9) Mat. Appl. 11, 107–125 (2000). Mathematics towards the third
millennium (Rome, 1999)
100. Novikov, A.A.: A certain identity for stochastic integrals. Teor. Verojatnost. i Primenen. 17,
761–765 (1972)
101. Nualart, D.: The Malliavin Calculus and Related Topics, second edn. Probability and its
Applications (New York). Springer, Berlin (2006)
102. Øksendal, B.: Stochastic Differential Equations, fifth edn. Universitext. Springer, Berlin
(1998). An introduction with applications
103. Oleinik, O.A., Radkevic, E.V.: Second Order Equations with Nonnegative Characteristic
Form. Plenum Press, New York (1973). Translated from the Russian by Paul C. Fife
104. Ornstein, L.S., Uhlenbeck, G.E.: On the theory of the Brownian motion. Phys. Rev. 36, 823–
841 (1930)
105. Pagliarani, S., Pascucci, A.: The exact Taylor formula of the implied volatility. Finance Stoch.
21, 661–718 (2017)
106. Pagliarani, S., Pascucci, A., Pignotti, M.: Intrinsic Taylor formula for Kolmogorov-type
homogeneous groups. J. Math. Anal. Appl. 435, 1054–1087 (2016)
107. Pardoux, E.: Stochastic partial differential equations and filtering of diffusion processes.
Stochastics 3, 127–167 (1979)
108. Pardoux, E.: Stochastic Partial Differential Equations. SpringerBriefs in Mathematics.
Springer, Cham (2021). An introduction
109. Pardoux, E., Peng, S.G.: Adapted solution of a backward stochastic differential equation. Syst.
Control Lett. 14, 55–61 (1990)
110. Pardoux, E., Rascanu, A.: Stochastic Differential Equations, Backward SDEs, Partial Differ-
ential Equations, vol. 69 of Stochastic Modelling and Applied Probability. Springer, Cham
(2014)
111. Pascucci, A.: Calcolo stocastico per la finanza, vol. 33 of Unitext. Springer, Milano (2008)
112. Pascucci, A.: PDE and Martingale Methods in Option Pricing, vol. 2 of Bocconi & Springer
Series. Springer/Bocconi University Press, Milan (2011)
113. Pascucci, A.: Probability Theory. Volume 1 - Random Variables and Distributions. Unitext.
Springer, Milan (2024)
114. Pascucci, A., Pesce, A.: Sobolev embeddings for kinetic Fokker-Planck equations. J. Funct.
Anal. 286, Paper No. 110344, 40 (2024)
115. Pascucci, A., Runggaldier, W.J.: Financial Mathematics, vol. 59 of Unitext. Springer, Milan
(2012). Theory and problems for multi-period models, Translated and extended version of
the 2009 Italian original
116. Paulos, J.A.: A Mathematician Reads the Newspaper. Basic Books, New York (2013).
Paperback edition of the 1995 original with a new preface
117. Peng, S.G.: A nonlinear Feynman-Kac formula and applications. In: Control Theory, Stochas-
tic Analysis and Applications (Hangzhou, 1991), pp. 173–184. World Sci. Publ., River Edge,
NJ (1991)
118. Pogorzelski, W.: Étude de la solution fondamentale de l’équation parabolique. Ricerche Mat.
5, 25–57 (1956)
119. Polidoro, S.: Uniqueness and representation theorems for solutions of Kolmogorov-Fokker-
Planck equations. Rend. Mat. Appl. (7) 15, 535–560 (1995)
120. Prévôt, C., Röckner, M.: A Concise Course on Stochastic Partial Differential Equations,
vol. 1905 of Lecture Notes in Mathematics. Springer, Berlin (2007)
121. Protter, P.E.: Stochastic Integration and Differential Equations, second edn., vol. 21 of
Stochastic Modelling and Applied Probability. Springer, Berlin (2005). Version 2.1, Corrected
third printing
122. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press
(2006). Available at http://www.gaussianprocess.org/gpml/
123. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion, third edn., vol. 293 of
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical
Sciences]. Springer, Berlin (1999)
124. Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes, and Martingales. Vol. 2,
Cambridge Mathematical Library. Cambridge University Press, Cambridge (2000). Itô
calculus, Reprint of the second (1994) edition
125. Rozovsky, B.L.: Stochastic Evolution Systems, vol. 35 of Mathematics and its Applications
(Soviet Series). Kluwer Academic Publishers Group, Dordrecht (1990). Linear theory and
applications to nonlinear filtering, Translated from the Russian by A. Yarkho
126. Rozovsky, B.L., Lototsky, S.V.: Stochastic Evolution Systems, vol. 89 of Probability Theory
and Stochastic Modelling. Springer, Cham (2018). Linear theory and applications to non-
linear filtering
127. Salsburg, D.: The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth
Century. Henry Holt and Company (2002)
128. Schilling, R.L.: Sobolev embedding for stochastic processes. Expo. Math. 18, 239–242 (2000)
129. Schilling, R.L.: Brownian Motion—A Guide to Random Processes and Stochastic Calculus.
De Gruyter Textbook, De Gruyter, Berlin (2021). With a chapter on simulation by Björn
Böttcher, Third edition [of 2962168]
130. Shaposhnikov, A., Wresch, L.: Pathwise vs. path-by-path uniqueness, preprint,
arXiv:2001.02869 (2020)
131. Skorokhod, A.V.: Studies in the Theory of Random Processes. Dover Publications, Mineola,
NY (2017). Translated from the Russian by Scripta Technica, Inc., Reprint of the 1965 edition
132. Stroock, D.W.: Markov Processes from K. Itô’s Perspective, vol. 155 of Annals of Mathemat-
ics Studies. Princeton University Press, Princeton, NJ (2003)
133. Stroock, D.W.: Partial Differential Equations for Probabilists, vol. 112 of Cambridge Studies
in Advanced Mathematics. Cambridge University Press, Cambridge (2012). Paperback
edition of the 2008 original
134. Stroock, D.W., Varadhan, S.R.S.: Diffusion processes with continuous coefficients. I. Comm.
Pure Appl. Math. 22, 345–400 (1969)
135. Stroock, D.W., Varadhan, S.R.S.: Diffusion processes with continuous coefficients. II. Comm.
Pure Appl. Math. 22, 479–530 (1969)
136. Stroock, D.W., Varadhan, S.R.S.: Multidimensional Diffusion Processes. Classics in Mathe-
matics. Springer, Berlin (2006). Reprint of the 1997 edition
137. Struwe, M.: Variational Methods, fourth edn., vol. 34 of Ergebnisse der Mathematik und ihrer
Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics
and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer, Berlin
(2008). Applications to nonlinear partial differential equations and Hamiltonian systems
138. Taira, K.: Semigroups, Boundary Value Problems and Markov Processes, second edn.
Springer Monographs in Mathematics. Springer, Heidelberg (2014)
139. Tanaka, H.: Note on continuous additive functionals of the 1-dimensional Brownian path. Z.
Wahrscheinlichkeitstheorie Verw. Gebiete 1, 251–257 (1962/1963)
140. Trevisan, D.: Well-posedness of multidimensional diffusion processes with weakly differen-
tiable coefficients. Electron. J. Probab. 21, Paper No. 22, 41 (2016)
141. Tychonoff, A.: Théorèmes d’unicité pour l’equation de la chaleur. Math. Sbornik 42, 199–216
(1935)
142. van Casteren, J.A.: Markov Processes, Feller Semigroups and Evolution Equations, vol. 12
of Series on Concrete and Applicable Mathematics. World Scientific Publishing, Hackensack
(2011)
143. Vasicek, O.: An equilibrium characterization of the term structure. J. Financ. Econ. 5, 177–
188 (1977)
144. Veretennikov, A.Y.: Strong solutions and explicit formulas for solutions of stochastic integral
equations. Mat. Sb. (N.S.) 111(153), 434–452, 480 (1980)
145. Veretennikov, A.Y.: “Inverse diffusion” and direct derivation of stochastic Liouville equations.
Mat. Zametki 33, 773–779 (1983)
146. Veretennikov, A.Y.: On backward filtering equations for SDE systems (direct approach). In:
Stochastic Partial Differential Equations (Edinburgh, 1994), pp. 304–311, vol. 216 of London
Math. Soc. Lecture Note Ser. Cambridge Univ. Press, Cambridge (1995)
147. Vespri, V.: Le anime della matematica. Da Pitagora alle intelligenze artificiali. Diarkos
editore, Santarcangelo di Romagna (2023)
148. Williams, D.: Probability with Martingales. Cambridge Mathematical Textbooks. Cambridge
University Press, Cambridge (1991)
149. Yamada, T., Watanabe, S.: On the uniqueness of solutions of stochastic differential equations.
J. Math. Kyoto Univ. 11, 155–167 (1971)
150. Yong, J., Zhou, X.Y.: Stochastic Controls, vol. 43 of Applications of Mathematics (New
York). Springer, New York (1999). Hamiltonian systems and HJB equations
151. Zabczyk, J.: Mathematical Control Theory—An Introduction, Systems & Control: Founda-
tions & Applications. Birkhäuser/Springer, Cham (2020). Second edition [of 2348543]
152. Zhang, J.: Backward Stochastic Differential Equations, vol. 86 of Probability Theory and
Stochastic Modelling. Springer, New York (2017). From linear to fully nonlinear theory
153. Zhang, X.: Stochastic homeomorphism flows of SDEs with singular drifts and Sobolev
diffusion coefficients. Electron. J. Probab. 16, 38, 1096–1116 (2011)
154. Zvonkin, A.K.: A transformation of the phase space of a diffusion process that will remove
the drift. Mat. Sb. (N.S.) 93(135), 129–149, 152 (1974)
Index
Symbols
L^2, 176
L^2_{B,loc}, 184
L^2_B, 184
L^2_{S,loc}, 202
L^2_{loc}, 195
F^X, 111
F_∞, 109
F_τ, 119
G^X, 14
σ-algebra
  completion, 11
bC^α_T, 341
M^{c,2}, 141
M^{c,loc}, 143

A
Almost everywhere (a.e.), xix
Almost surely (a.s.), xix
A priori estimate
  L^p, 281
  exponential, 283
Arg max, xii
Aronson, D.G., 406
Assumptions
  standard for SDE, 277

B
Bachelier, L., 71
Black&Scholes, 247
Blumenthal, O., 114, 117
Boundary
  parabolic, 298
Bounded variation (BV), 152
Brownian bridge, 314
Brownian motion, 41, 71
  canonical, 117
  correlated, 228, 238, 239
  with drift, 161
  Feller property, 75
  finite-dimensional densities, 76
  geometric, 277
  Lévy characterization, 238
  Markov property, 75
  multidimensional, 227
  with random initial value, 144
Burkholder-Davis-Gundy, 216, 219

C
Càdlàg, 88
Canonical version
  of a continuous process, 63
  of a Markov process, 33
  of a process, 12
Cauchy-Schwarz, 167
Change of drift, 244
Chapman-Kolmogorov, 37, 381, 391
Characteristic exponent, 116
Characteristics, 296
Coefficient
  diffusion, 205, 264
  drift, 264
Commutator, 312
Completion, 11
Condition
  Hörmander, 312
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
A. Pascucci, Probability Theory II, La Matematica per il 3+2 166,
https://doi.org/10.1007/978-3-031-63193-1
  Kalman, 310
  Novikov, 250
Constant elasticity of variance, 319
Continuity in mean, 182
Continuous dependence on parameters, 333
Controllability, 308
Control theory, 308
Courrège, P., 45
Covariation process, 166, 178, 186, 198
Cox-Ingersoll-Ross (CIR), 317

D
Decomposition
  Doob, 17, 165
Delta
  Kronecker, 228
Density
  transition, 28
Differential notation, 156, 204
Diffusion, 55
Dirichlet problem, 292
Distribution
  of a stochastic process, 4
  transition, 25
    homogeneous, 27
Doob, J.L., 17, 103, 135, 165
Drift, 205, 264
  change of, 244
Duhamel, J.M.C., 381
Durrett, R., 71
Dyadic
  partition, 67
  rationals, 67, 133

E
Einstein, A., 71, 305
Enlargement
  filtrations, 110
  standard, 111
Equation
  Chapman-Kolmogorov, 37
  Fokker-Planck, 51, 52, 360
  heat, 295
    backward, 50
    forward, 49
  Kolmogorov
    backward, 47
    forward, 49, 51, 52, 332
  Langevin, 305
  stochastic differential, 263
  Volterra, 269
Estimates
  Gaussian, 381, 385
  potential, 395
Euler Gamma, 390
Exit time, 290

F
Feller, W., 29, 124
Feynman-Kac, 287, 292, 298, 299
  non-linear, 359
Filtering, 359
Filtration, 14
  Brownian
    backward, 362
  complete, 107
  enlargement, 110
  G^X, 14
  generated, 14
  right-continuous, 107
  standard, 111
  usual conditions, 107
Finite-dimensional
  cylinder, 3
  distributions, 4
Flandoli, F., 348
Fokker-Planck, 51
Formula
  Black&Scholes, 247
  Duhamel, 381
  Feynman-Kac, 287, 292, 298, 299
    non-linear, 359
  Itô, 209, 210
    backward, 363
    for Brownian motion, 211
    for Brownian motion correlated, 239
    for continuous semimartingales, 233
    deterministic, 156
    for Itô processes, 214, 234
  Lévy-Khintchine, 116
Forward-backward system of equations (FBSDE), 359
Friedman, A., 342
Function
  of bounded variation, 152
  BV, 152
  càdlàg, 88
  Euler Gamma, 390
  Gaussian, 383
    standard, 383
  indicator, xi
G
Girsanov, I.V., 251, 253
Grönwall, T.H., 281
Gyöngy, I., 353

H
Hörmander, L., 306, 312, 313
Hilbert-Schmidt, 231

I
Independent increments, 34
Inequality
  Burkholder-Davis-Gundy, 216, 219
  Doob's maximal, 103, 104, 135
Infinitesimal generator, 42
Inhomogeneous term, 373
Integral
  Itô, 175
  Lebesgue-Stieltjes, 158
  Riemann-Stieltjes, 151, 154, 200
Intensity, 86
  stochastic, 90
Isometry
  Itô, 178, 186, 199, 232
Itô
  formula, 209, 210
    for Brownian motion, 211, 239
    for continuous semimartingales, 233
    for Itô processes, 214, 234
  integral, 175
  isometry, 178, 186, 199, 232
  process, 204
    multidimensional, 231
  process with deterministic coefficients, 215
Itô-Tanaka, 348

K
Kalman, R.E., 310
Kernel
  Poisson, 295
Kolmogorov, A.N., 11, 23, 64, 306
Kolmogorov equation
  backward, 47
  forward, 51, 332
Komlós, J., 168
Kronecker, L., 228
Krylov, N.V., 346, 363

L
Langevin, P., 305
Laplace, P.S., 129
Law
  0-1 of Blumenthal, 114, 117
  of a continuous process, 63
  iterated logarithm, 74
  of a stochastic process, 4
  transition, 25
    Gaussian, 28, 40
    homogeneous, 27
    linear SDE, 303
    Poisson, 27, 39
Lebesgue-Stieltjes, 158
Lemma
  Grönwall, 281
  Gyöngy, 353
  Komlós, 168
  upcrossing, 105
Levi, E.E., 342, 379
Lévy, P., 114, 238
Lévy-Khintchine, 116
Linear system
  controllability, 308

M
Markov
  process, 30
    finite-dimensional laws, 36
  property, 30
    extended, 32
Markov, A., 25, 30, 123, 330
Markovian projection, 354
Martingale, 15, 338
  Brownian, 77, 257
  càdlàg, 139
  discrete, 15
  exponential, 77, 212, 235, 244
  local, 143
  problem, 338
  quadratic, 77, 234
  stopped, 143
  sub-, 17
  super-, 17
  uniformly square-integrable, 146
Matrix
  covariation, 166, 232
Mean reversion, 314
Measurability
  progressive, 118
Measure
  harmonic, 295
  Lebesgue-Stieltjes, 158
  Wiener, 76
Mesh, 152
Method
  characteristics, 296
  parametrix, 379, 383
Model
  CEV, 319
  CIR, 317
  Vasicek, 314
Modification, 8
Mumford, D.B., vii

N
Nash, J., 406
Norm
  Hilbert-Schmidt, 231
  spectral, 387
Novikov, A., 250

O
Operator
  adjoint, 52
  characteristic, 42
    of an SDE, 288
  elliptic-parabolic, 46
  Kolmogorov
    backward, 342
    forward, 370
  Laplace, 49, 129
  local, 44
  parabolic, 371
  pseudo-differential, 116
    symbol, 116
  translation, 128
Option, 245
  Asian, 306
Optional sampling, 102, 137, 148
Ornstein-Uhlenbeck, 316

P
Parabolic
  boundary, 298, 372
  distance, 335
  PDE, 369
Parametrix, 379, 383
Partial differential equation (PDE), 288
  parabolic, 369
Partition, 151
  dyadic, 133
Peano's brush, 269
Poisson, 27, 84, 295
  characteristic exponent, 87
  kernel, 295
  transition law, 39
Positive part, xii
Potential, 395
Predictable, 17
Principle
  comparison, 374
  Duhamel, 381
  maximum, 44, 294, 298, 371, 373
    weak, 375, 378
  reflection, 126
Problem
  Cauchy, 371
    backward, 342
    classical solution, 342
    quasi-linear, 359
  Cauchy-Dirichlet, 372
  Dirichlet, 292
  martingale, 338
Processes
  absolutely integrable, 15
  adapted, 14
  Brownian motion, 71
  BV, 160
  càdlàg, 88
  canonical version, 12, 14, 63
  CEV, 319
  CIR, 317
  continuous, 59
    canonical version, 63
    law, 63
  covariation, 166, 178, 186, 198
  diffusion, 55
  equal in law, 8
  Feller, 29
  Gaussian, 5, 12
  increasing, 160
  indistinguishable, 9
  Itô, 204
    backward, 363
    with deterministic coefficients, 215
    multidimensional, 231
  Lévy, 114
  Markov, 25, 30, 306
  martingale, 15
  maximum, 127
  measurable, 8
  modifications, 8
  Poisson, 40, 84, 88
    compensated, 90
    compound, 87
    with stochastic intensity, 90
  predictable, 17
  progressively measurable, 118
  quadratic variation, 165, 205, 209, 220
  reflected, 126
  simple, 177, 189
  square root, 317
  stochastic, 1–3
    discrete, 2
    law, 4
    real, 2
  stopped, 100
  with independent increments, 34
Progressively measurable, 118
Property
  Feller, 29, 124
    for SDE, 334
    strong, 41
  flow, 326, 364
  Markov, 30
    extended, 32
    for SDE, 330
    strong, 123
  martingale, 15
  semigroup, 41
  strong Markov, 124
    homogeneous case, 129
    for SDE, 334
Pseudo-differential operator, 116

Q
Quadratic variation, 165, 205, 209, 220

R
Random variable (r.v.), xix
Reflected process, 126
Reflection principle, 126
Regularization by noise, 347
Representation of Brownian martingales, 257
Riemann-Stieltjes, 151, 154
Risk-neutral valuation, 245

S
Scalar product, xii
Semigroup, 41
Semimartingale, 161, 202
  BV, 163
  continuous
    uniqueness of decomposition, 164
Set-up, 264
Shreve, S., 356
Skorokhod, A.V., 63, 346
Solution
  distributional, 52
  fundamental, 50, 333, 342, 379
  of an SDE, 265
  strong (of an SDE), 266
  transfer of, 273
  weak (of an SDE), 272
Solvability of an SDE, 267
Space
  of trajectories, 1
  Polish, 61
  probability
    complete, 9
  Skorokhod, 63
  trajectories, 2
  Wiener, 76
Stochastic differential equation (SDE), 263
  backward, 356
  forward-backward, 359
  linear, 303
  solution, 265
  solvability, 267
  standard assumptions, 277
  strong solution, 266
  uniqueness, 268
  weak solution, 272
Stochastic partial differential equation (SPDE), 359
  heat, 359
  Krylov's, 363
Stopping time, 107
  discrete, 97
Stroock, D.W., 337, 339
Sub-martingale, 17
Super-martingale, 17
Symbol
  Kronecker, 228
  of an operator, 116

T
Tanaka, H., 267, 268
Tanaka's example, 267, 268
Theorem
  Courrège, 45
  Doob's decomposition, 17, 165
  Girsanov, 251, 253
  Kolmogorov's continuity, 64, 65
  Kolmogorov's extension, 11, 22
  Lévy's characterization, 238
  optional sampling, 102, 137, 148
  representation of Brownian martingales, 257
  Skorokhod, 346
  Stroock and Varadhan, 339
  Yamada-Watanabe, 274, 325
Time
  exit, 98, 108, 290
    from a closed set, 109
    from an open set, 108
  stopping, 107
    discrete, 97
Trajectory, 2, 4
Transfer of solutions, 273
Tychonoff, A.N., 371

U
Uniqueness
  class, 371
  for an SDE, 268
  strong for SDE, 324
Upcrossing, 105
Usual conditions, 107

V
Varadhan, S.R.S., 337, 339
Variation
  first, 152
  quadratic, 161
Vasicek, O., 314
Vector field, 312
Veretennikov, A.Y., 270, 347
Version
  canonical
    of a continuous process, 63
    of a Markov process, 33
    of a process, 12
  continuous, 60
Vespri, V., vii

W
Watanabe, S., 274
Wiener, N., 71, 76

Y
Yamada, T., 274

Z
Zvonkin, A.K., 270, 347