1
Introduction to Arbitrage Pricing Theory
For reference, this chapter reviews selected results from stochastic calculus
and from the modern theory of asset pricing. The material in this chapter
is well covered in existing literature, so we keep the chapter brief and the
mathematical treatment informal. For a more rigorous treatment we refer
to Duffie [2001] or Musiela and Rutkowski [1997]. Most of the necessary
mathematical foundation for the theory is available in Karatzas and Shreve
[1997], Oksendal [1992], and Protter [2005].
The treatment in this chapter focuses on asset pricing in general; we shall
specialize it to interest rate securities in Chapter 4. Chapter 5 introduces
fixed income markets in detail.
1.1 The Setup
Unless otherwise noted, in this book we shall always consider an economy
with continuous and frictionless trading taking place inside a finite horizon
[0,T]. We assume the existence of traded dividend-free assets with prices
characterized by a p-dimensional vector-valued stochastic process X(t) =
(X_1(t),...,X_p(t))^T. Uncertainty and information arrival is modeled by a
probability space (Ω, F, P), with Ω being a sample space with outcome
elements ω; F being a σ-algebra on Ω; and P being a probability measure
on the measure space (Ω, F). Information is revealed over time according
to a filtration {F_t, t ∈ [0,T]}, a family of sub-σ-algebras of F satisfying
F_s ⊆ F_t whenever s ≤ t. We can loosely think of F_t as the information
available at time t. We assume that the process X(t) is adapted to {F_t}, i.e.
that X(t) is fully observable at time t. For technical reasons, we require that
the filtration satisfies the "usual conditions"¹. Let E^P(·) be the expectation
operator for the measure P; when conditioning on information at time t, we
will use the notation E_t^P(·) = E^P(·|F_t).
¹To satisfy the "usual conditions", F_t must be right-continuous for all t, and
F_0 must contain all the null-sets of F, i.e. all subsets of sets of zero P-probability.
In all of the models in this book, we specialize the abstract setup above to
the situation where information is generated by a d-dimensional vector-valued
Brownian motion (or Wiener process) W(t) = (W_1(t),...,W_d(t))^T, where
W_i is independent of W_j for i ≠ j. Brownian motions are treated in detail
in Karatzas and Shreve [1997]; here, we just recall that a scalar Brownian
motion W_i is a continuous stochastic process starting at 0 (i.e. W_i(0) = 0),
having independent Gaussian increments: W_i(t) − W_i(s) ~ N(0, t − s), t > s.
The filtration we consider is normally the one generated by W,
F_t = σ{ W(u), 0 ≤ u ≤ t }, possibly augmented to satisfy the usual conditions.
We will generally assume that the price vector X(t) is described by a vector-
valued Ito process:
X(t) = X(0) + ∫_0^t μ(s,ω) ds + ∫_0^t σ(s,ω) dW(s),   (1.1)
or, in differential notation,
dX(t) = μ(t,ω) dt + σ(t,ω) dW(t),   (1.2)
where μ : R × Ω → R^p and σ : R × Ω → R^{p×d} are processes of dimension p
and p × d, respectively. We assume that both μ and σ are adapted to {F_t}
and are in L¹ and L², respectively, in the sense that for all t ∈ [0,T],
∫_0^t |μ(s,ω)| ds < ∞,   (1.3)
∫_0^t |σ(s,ω)|² ds < ∞,   (1.4)
almost surely². In (1.4), we have defined
|σ(t,ω)|² = tr( σ(t,ω) σ(t,ω)^T ).   (1.5)
We notice that the sample paths of X generated by (1.1) are almost surely
continuous, with no jumps in asset prices.
A technical treatment of Ito processes and the Ito integral with respect
to Brownian motion can be found in Karatzas and Shreve [1997]. For our
needs, it suffices to think of the Ito integral as
∫_0^t φ(u,ω) dW(u) = lim_{n→∞} Σ_{i=1}^n φ( (i−1)δ, ω ) [ W(iδ) − W((i−1)δ) ],   (1.6)
²An event holds "almost surely" (often abbreviated "a.s.") if the
probability of the event is one.
where δ ≜ t/n. We note that the integrand φ is here always evaluated at
the left end of each interval [(i−1)δ, iδ]. Other choices are possible³, but, as we
shall see, the “non-anticipative” structure of the Ito integral gives rise to
a number of useful results and makes it particularly useful as a model of
trading gains (see Section 1.2).
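As a concrete, purely illustrative sanity check of the left-point convention in (1.6), the short Python sketch below (our own, not part of the text) approximates ∫_0^t W(u) dW(u) by the sum in (1.6) and compares it to the closed-form value (W(t)² − t)/2, which follows from Ito's lemma (Theorem 1.1.5 below); all variable names are ours.

```python
import numpy as np

# Approximate the Ito integral  int_0^t W(u) dW(u)  by the left-point sum in (1.6)
# and compare with the closed-form value (W(t)^2 - t)/2 given by Ito's lemma.
rng = np.random.default_rng(0)
t, n = 1.0, 200_000
delta = t / n

dW = rng.normal(0.0, np.sqrt(delta), size=n)   # increments W(i*delta) - W((i-1)*delta)
W = np.concatenate(([0.0], np.cumsum(dW)))     # Brownian path sampled on the grid

ito_sum = np.sum(W[:-1] * dW)                  # integrand evaluated at the LEFT point
closed_form = 0.5 * (W[-1] ** 2 - t)

print(ito_sum, closed_form)                    # the two numbers agree up to discretization error
```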
We list a few relevant definitions and results below.
Definition 1.1.1 (Martingale). Let Y(t) be an adapted vector-valued pro-
cess with E^P(|Y(t)|) < ∞ for all t ∈ [0,T]. We say that Y(t) is a martingale
under measure P if for all s,t ∈ [0,T] with t < s,
E_t^P( Y(s) ) = Y(t),  a.s.
If we replace the equality sign in this equation with ≤ or ≥, Y(t) is said
to be a supermartingale or a submartingale, respectively.
Definition 1.1.2 (Space H²). Let |σ(t,ω)|² be as defined in (1.5). We
say that σ is in H², if for all t ∈ [0,T] we have
E^P ( ∫_0^t |σ(s,ω)|² ds ) < ∞.
The importance of Definition 1.1.2 becomes clear from the following
result:
Theorem 1.1.3 (Properties of the Ito Integral). Define I(t) =
∫_0^t σ(s,ω) dW(s) and assume that σ is in H². Then
1. I(t) is F_t-measurable.
2. I(t) is a continuous martingale. In particular, E^P(I(t)) = 0 for all
t ∈ [0,T].
3. E^P( |I(t)|² ) = E^P( ∫_0^t |σ(s,ω)|² ds ) < ∞.
4. E^P( I(t) I(s)^T ) = E^P( ∫_0^{min(t,s)} σ(u,ω) σ(u,ω)^T du ).
A proof of Theorem 1.1.3 can be found in, e.g., Karatzas and Shreve
[1997]. The equality in the third item of Theorem 1.1.3 is known as the Ito
isometry. Due to the inequality in the third item, we say that the martingale
defined by the Ito integral, I(t), is a square-integrable martingale.
While it is common in applied work to simply assume that Ito integrals
are martingales, without technical regularity conditions on o(t,w) (such
as the H² restriction in Theorem 1.1.3), we should note that Ito integrals
involving general processes in L² can, in fact, only be guaranteed to be local
martingales. A process X is said to be a local martingale if there exists a
sequence of stopping times⁴ {τ_n}_{n≥1}, with τ_n → ∞ as n → ∞, such that
X(min(t, τ_n)), t ≥ 0, is a martingale for all n. In other words, all "driftless"
Ito processes of the type
dY(t) = σ(t,ω) dW(t)   (1.7)
are local martingales, but not necessarily martingales. Interestingly, a con-
verse result holds as well; all local martingales adapted to the filtration
generated by the Brownian motion W can be represented as Ito processes
of the form (1.7):
³The Stratonovich stochastic integral evaluates φ at the mid-point of each
interval.
Theorem 1.1.4 (Martingale Representation Theorem). If Y is a
local martingale adapted to the filtration generated by a Brownian motion W,
then there exists a process σ such that (1.7) holds. If Y is a square-integrable
martingale, then σ is in H².
The proof of Theorem 1.1.4 can be found in Karatzas and Shreve [1997].
In the manipulation of functionals of Ito processes, the key result is a
famous result by K. Ito:
Theorem 1.1.5 (Ito's Lemma). Let f(t,x), x = (x_1,...,x_p)^T, denote a
continuous function, f : [0,T] × R^p → R, with continuous partial derivatives
∂f/∂t = f_t, ∂f/∂x_i = f_{x_i}, ∂²f/∂x_i∂x_j = f_{x_i x_j}. Let X(t) be given by the
Ito process (1.2) and define a scalar process Y(t) = f(t, X(t)). Then Y(t) is
an Ito process with stochastic differential
dY(t) = f_t(t, X(t)) dt + f_x(t, X(t)) μ(t,ω) dt + f_x(t, X(t)) σ(t,ω) dW(t)
      + (1/2) Σ_{i=1}^p Σ_{j=1}^p f_{x_i x_j}(t, X(t)) ( σ(t,ω) σ(t,ω)^T )_{i,j} dt,
where f_x = ( f_{x_1}, ..., f_{x_p} ).
For easy reference, the result below lists Ito's lemma for the special case
where p = d = 1.
Corollary 1.1.6. For the case p = d = 1, Ito's lemma becomes
dY(t) = ( f_t(t, X(t)) + f_x(t, X(t)) μ(t,ω) + (1/2) f_{xx}(t, X(t)) σ(t,ω)² ) dt
      + f_x(t, X(t)) σ(t,ω) dW(t).
Ito's lemma can be motivated heuristically from a Taylor expansion. For
instance, for the scalar case in Corollary 1.1.6, we write informally
f(t+dt, X(t+dt)) = f(t, X(t)) + f_t dt + f_x dX(t) + (1/2) f_{xx} ( dX(t) )² + ...   (1.8)
Here, we have
( dX(t) )² = μ(t,ω)² (dt)² + σ(t,ω)² ( dW(t) )² + 2 μ(t,ω) σ(t,ω) dt dW(t).
As shown earlier, ( dW(t) )² = dt in quadratic mean, whereas all other terms
in the expression for ( dX(t) )² are of order O(dt^{3/2}) or higher and can be
neglected for small dt. In the limit, we therefore have ( dX(t) )² = σ(t,ω)² dt,
which can be inserted into (1.8). The result in Corollary 1.1.6 then emerges.
⁴Recall that a stopping time τ is simply a random time adapted to the given
filtration, in the sense that the event {τ ≤ t} belongs to F_t.
Remark 1.1.7. The quantity ( dX(t) )² discussed above is the differential of
the quadratic variation of X(t), often denoted by ⟨X(t), X(t)⟩. That is,
d⟨X(t), X(t)⟩ = ( dX(t) )²,  ⟨X(t), X(t)⟩ = ∫_0^t ( dX(u) )².
For two different (scalar) Ito processes X(t) and Y(t), we may equivalently
define the quadratic covariation process ⟨X(t), Y(t)⟩ by
d⟨X(t), Y(t)⟩ = dX(t) dY(t).
Sometimes we also write d⟨X(t), Y(t)⟩ = ⟨dX(t), dY(t)⟩. If X(t) is a p-
dimensional process and Y(t) is a q-dimensional process, the quadratic
covariation ⟨X(t), Y(t)^T⟩ is a (p × q)-dimensional matrix process whose
(i,j)-th element is ⟨X_i(t), Y_j(t)⟩, i = 1,...,p, j = 1,...,q.
The so-called Tanaka extension (see Karatzas and Shreve [1997]) extends
Ito’s lemma to continuous but non-differentiable functions. At points where
the function has a kink, the Tanaka extension (loosely speaking) justifies
using the Heaviside (step-) function for the first-order derivative and the
Dirac delta function for the second-order derivative. An application of the
Tanaka extension can be found in Section 1.9.2 and in Chapter 7, along with
further discussion and references.
1.2 Trading Gains and Arbitrage
Working in the setting of Section 1.1 with assets driven by Ito processes,
we now consider an investor engaging in a trading strategy involving the p
assets X_1,...,X_p. Let the trading strategy be characterized by a predictable⁵
adapted process φ(t,ω) = (φ_1(t,ω),...,φ_p(t,ω))^T, with φ_i(t,ω) denoting
the holdings at time t in the i-th asset X_i. The value π(t) of the trading
strategy at time t is thus (dropping the dependence on ω in the notation)
π(t) = φ(t)^T X(t).   (1.9)
⁵A predictable process is one where we, loosely speaking, can "foretell" the
value of the process at time t, given all information available up to, but not
including, time t. All adapted continuous processes are thus predictable. For a
technical definition of predictable processes, see Karatzas and Shreve [1997].
The gain from trading over a small time interval [t, t+δ) is (approximately)
φ(t)^T [ X(t+δ) − X(t) ], suggesting (compare to (1.6)) that the Ito integral
∫_0^t φ(s)^T dX(s) = ∫_0^t φ(s)^T μ(s) ds + ∫_0^t φ(s)^T σ(s) dW(s)
is a proper model for trading gains over [0,t]. An investment strategy is said
to be self-financing if, for any t ∈ [0,T],
π(t) − π(0) = ∫_0^t φ(s)^T dX(s).   (1.10)
This relationship simply expresses that changes in portfolio value are solely
caused by trading gains or losses, with no funds being added or withdrawn.
Self-financing trading strategies allow investors to turn a certain initial
investment π(0) into stochastic future wealth π(t). Under natural assump-
tions on possible trading strategies (e.g., that there is finite supply of all
assets) we would expect that there should be limitations to the profits that
self-financing strategies can create. Most notably, it should be impossible to
create “something for nothing”, that is, to turn a zero initial investment into
future wealth that is certain to be non-negative and may be positive with
non-zero probability. To express this formally, we introduce the concept of
an arbitrage opportunity:
Definition 1.2.1 (Arbitrage). An arbitrage opportunity is a self-financing
strategy φ for which π(0) = 0 and, for some t ∈ [0,T],
π(t) ≥ 0  a.s., and  P( π(t) > 0 ) > 0,   (1.11)
with π given in (1.9).
In economic equilibrium, arbitrage strategies cannot exist and preclud-
ing (1.11) constitutes a fundamental consistency requirement on the asset
processes.
1.3 Equivalent Martingale Measures and Arbitrage
We turn to the question of characterizing the conditions under which the
trading economy is free of arbitrage opportunities. A concise way to state
these conditions involves equivalent martingale measures, a concept we
shall work our way up to in a number of steps. First, we recall that two
probability measures P and P̃ on the same measure space (Ω, F) are said to
be equivalent if P(A) = 0 ⟺ P̃(A) = 0, ∀A ∈ F; that is, the two measures
have the same null-sets. An important result from measure theory states
that equivalent measures are uniquely associated through a quantity known
as a Radon-Nikodym derivative:
Theorem 1.3.1 (Radon-Nikodym Theorem). Let P and P̃ be equivalent
probability measures on the common measure space (Ω, F). There exists a
unique (a.s.) non-negative random variable R with E^P(R) = 1, such that
P̃(A) = E^P( R 1_{A} ),  for all A ∈ F.
For a proof of Theorem 1.3.1, see e.g. Billingsley [1995]. The random
variable R in the theorem is known as a Radon-Nikodym derivative and
is denoted dP̃/dP. In the theorem we have used an indicator 1_{A}; this
quantity is 1 if the event A comes true, 0 if not.
For later use, we associate any probability measure P̃ with a density
process
ς(t) = E_t^P ( dP̃/dP ),  ∀t ∈ [0,T].   (1.12)
Clearly, ς(t) is a P-martingale with ς(0) = 1 and ς(t) = E_t^P( ς(T) ). A simple
conditioning exercise demonstrates that for any F_T-measurable random
variable Y(T), with R = dP̃/dP,
E^P̃( Y(T) | F_t ) = ς(t)^{−1} E^P( R Y(T) | F_t )
  = ς(t)^{−1} E^P( E^P( R | F_T ) Y(T) | F_t )
  = E_t^P( ς(T) Y(T) ) / ς(t).   (1.13)
We shall use this result on numerous occasions in this book.
We now introduce the important concept of a deflator, a strictly
positive Ito process used to normalize the asset prices. Let the defla-
tor be denoted D(t), and define the normalized asset process X^D(t) =
(X_1(t)/D(t),...,X_p(t)/D(t))^T. We say that a measure Q^D is an equivalent
martingale measure induced by D if X^D(t) is a martingale with respect to
Q^D. If Q^D is a martingale measure, we say that a self-financing trading
strategy is permissible if
∫_0^t φ(s)^T dX^D(s)
is a martingale. For the Ito setup discussed earlier, a permissible strategy⁶
is obtained by, say, requiring that φ(t)^T σ(t) is in H²; see Theorem 1.1.3. An
application of Ito's lemma combined with (1.9)-(1.10) implies that π(t)/D(t)
is a Q^D-martingale when the trading strategy is permissible.
⁶The technical restriction on trading positions imposed by only considering
permissible trading strategies rules out certain pathological strategies, such as the
doubling strategy considered in Harrison and Kreps [1979]. A realistic resource-
constrained economy will always bound the size of the positions one can take in
an asset, sufficing to ensure that predictable trading strategies are permissible.
For permissible trading strategies, the importance of equivalent martin-
gale measures follows from the following theorem:
Theorem 1.3.2 (Sufficient Condition for No-Arbitrage). Restrict at-
tention to permissible trading strategies. If there is a deflator D such that
the deflated asset price process allows for an equivalent martingale measure,
then there is no arbitrage.
For a proof we refer to Musiela and Rutkowski [1997]. We note that
Theorem 1.3.2 only provides sufficient conditions for the absence of arbitrage,
and known (and rather technical) counterexamples demonstrate that the
existence of an equivalent martingale measure does not follow from the
absence of arbitrage in a setting with permissible trading strategies. A body
of results known as the fundamental theorem of arbitrage establishes the
conditions under which the existence of an equivalent martingale measure is
also a necessary condition for the absence of arbitrage. The results are rather
technical, but generally state that absence of arbitrage and the existence
of an equivalent martingale measure are “nearly” equivalent concepts. The
exact notion of “nearly” equivalent is discussed in Duffie [2001] as well as
in the authoritative reference⁷ Delbaen and Schachermayer [1994]. For our
purposes in this book, we ignore many of these technicalities and often
simply treat the absence of arbitrage and the existence of a martingale
measure as equivalent concepts.
Finally, if the deflator is one of the p assets, we call the deflator a
numeraire. Let us, say, assume that X_1 is strictly positive and can be used
as a numeraire. Also assume that a deflator D has been identified such
that Theorem 1.3.2 holds. As X_1(t)/D(t) is a Q^D-martingale, we can use
the Radon-Nikodym theorem to define a new measure Q^{X_1} by the density
ς(t) = ( X_1(t)/D(t) ) / ( X_1(0)/D(0) ). For an F_T-measurable variable Y(T),
we then have, from (1.13),
X_1(0) E^{Q^{X_1}} ( Y(T)/X_1(T) ) = D(0) E^{Q^D} ( Y(T)/D(T) ).   (1.14)
In particular, if Y(t)/D(t) is a Q^D-martingale, Y(t)/X_1(t) must also be a
Q^{X_1}-martingale. In practice, it normally suffices to only consider deflators
from the set of available numeraires.
Remark 1.3.3. Some sources define 1/D(t) (rather than D(t)) as the deflator.
The convention used in this book is more natural for our applications.
"In a nutshell, Delbaen and Schachermayer [1994] show that absence of arbitrage
implies only the existence of a local martingale measure.
1.4 Derivative Security Pricing and Complete Markets
A T-maturity derivative security (also known as a contingent claim) pays
out at time T an F_T-measurable random variable V(T), and makes no
payments before T. We assume that V(T) has finite variance, and say that
the derivative security is attainable (or sometimes redundant) if there exists
a permissible trading strategy φ such that V(T) = φ(T)^T X(T) = π(T) a.s.
The trading strategy is said to replicate the derivative security. Importantly,
the absence of arbitrage dictates that the time 0 price of an attainable
derivative security V(0) must be equal to the cost of setting up the self-
financing strategy, i.e. V(0) = π(0). More generally, V(t) = π(t), t ∈ [0,T].
This observation is the foundation of arbitrage pricing and allows us to
price derivative securities as expectations under an equivalent martingale
measure. Specifically, consider a deflator D and assume the existence of
an equivalent martingale measure Q? induced by D; the existence of Q?
guarantees that there are no arbitrages in the market, by Theorem 1.3.2.
Now, from the martingale property of π(t)/D(t) in the measure Q^D and the
relation V(t) = π(t) it immediately follows that
V(t)/D(t) = E_t^{Q^D} ( V(T)/D(T) ),
or
V(t) = D(t) E_t^{Q^D} ( V(T)/D(T) ).   (1.15)
If all finite-variance F_T-measurable random variables can be replicated, the
market is said to be complete. In a complete market, all derivatives are
“spanned” and hence have unique prices. Interestingly, a similar uniqueness
result holds for equivalent martingale measures:
Theorem 1.4.1. In the absence of arbitrage, a market is complete if and
only if there exists a deflator inducing a unique martingale measure.
From (1.14) it follows that the martingale measures induced by all
numeraires must then be unique as well.
In practical applications, we shall often manipulate the choice of nu-
meraire asset to simplify computations. The following result is useful for
this:
Theorem 1.4.2 (Change of Numeraire). Consider two numeraires N(t)
and M(t), inducing equivalent martingale measures QN and Q™, respectively.
If the market is complete, then the density of the Radon-Nikodym derivative
relating the two measures is uniquely given by
ς(t) = E_t^{Q^N} ( dQ^M / dQ^N ) = ( M(t)/M(0) ) / ( N(t)/N(0) ).
Proof. As the market is complete, all derivatives prices are unique. Consider
an integrable F_T-measurable payout V(T) = Y(T) M(T), with time t price
V(t). From Theorem 1.4.1 and (1.15) we must have
V(t) = N(t) E_t^{Q^N} ( Y(T) M(T) / N(T) ) = M(t) E_t^{Q^M} ( Y(T) ),
or
E_t^{Q^M} ( Y(T) ) = E_t^{Q^N} ( Y(T) ( M(T)/N(T) ) / ( M(t)/N(t) ) ).
Comparison with (1.13), and the fact that the density must be scaled to
equal 1 at time 0, reveals that the Radon-Nikodym derivative for the measure
shift is characterized by the density in the theorem. □
1.5 Girsanov’s Theorem
The last two sections have demonstrated a close link between the concept
of arbitrage and the existence and uniqueness of equivalent martingale
measures. In this section, we consider i) the conditions on the asset prices
that allow for an equivalent martingale measure; and ii) the effect on asset
dynamics from a change of probability measure. We consider two measures
P and P(θ) related by a density ς^θ(t) = E_t^P( dP(θ)/dP ), where ς^θ(t) is an
exponential martingale given by the Ito process
dς^θ(t)/ς^θ(t) = −θ(t)^T dW(t),
where W(t) is a d-dimensional P-Brownian motion. The d-dimensional
process θ is known as the market price of risk. By an application of Ito's
lemma, we can write
ς^θ(t) = exp( −∫_0^t θ(s)^T dW(s) − (1/2) ∫_0^t θ(s)^T θ(s) ds )
       ≜ ℰ( −∫_0^t θ(s)^T dW(s) ),   (1.16)
where ℰ(·) is the Doleans exponential. An often-quoted sufficient condition on
θ(t) for (1.16) to define a proper martingale (and not just a local martingale)
is the Novikov condition
E^P [ exp( (1/2) ∫_0^T θ(s)^T θ(s) ds ) ] < ∞.   (1.17)
The Novikov condition can often be difficult to verify in practical applications.
Armed with the notation above, we are now ready to state the main
result of this section.
Theorem 1.5.1 (Girsanov's Theorem). Suppose that ς^θ(t) defined in
(1.16) is a martingale. Then for all t ∈ [0,T]
W^θ(t) = W(t) + ∫_0^t θ(s) ds
is a Brownian motion under the measure P(θ).
To discuss a strategy to prove Girsanov's theorem, assume for simplicity
that the dimension of the Brownian motion is d = 1. One way to construct a
proof for Theorem 1.5.1 is to demonstrate that the joint moment-generating
function (mgf)⁸ (under P(θ)) of the increments
W^θ(t_1), W^θ(t_2) − W^θ(t_1), ..., W^θ(t_n) − W^θ(t_{n−1}),  0 < t_1 < t_2 < ... < t_n ≤ T,
is that of n independent zero-mean Gaussian variables with variances t_i − t_{i−1}:
E^{P(θ)} [ exp( Σ_{i=1}^n a_i ( W^θ(t_i) − W^θ(t_{i−1}) ) ) ] = Π_{i=1}^n exp( a_i² ( t_i − t_{i−1} )/2 ),
where we have defined t_0 = 0. While carrying out such a proof is not difficult,
we here merely justify the final result by examining the case n = 1 only.
Specifically, we consider
E^{P(θ)} [ exp( a W^θ(t) ) ],
where a ∈ R and t > 0. Shifting probability measure, we get
E^{P(θ)} [ exp( a W^θ(t) ) ] = E^P [ exp( a W(t) + a ∫_0^t θ(s) ds ) ς^θ(t) ]
  = E^P [ exp( a W(t) + a ∫_0^t θ(s) ds ) ℰ( −∫_0^t θ(s) dW(s) ) ]
  = e^{a²t/2} E^P [ exp( ∫_0^t ( a − θ(s) ) dW(s) − (1/2) ∫_0^t ( a − θ(s) )² ds ) ]
  = e^{a²t/2} E^P [ ℰ( ∫_0^t ( a − θ(s) ) dW(s) ) ]
  = e^{a²t/2},
as desired. In the last step, we used the fact that the Doleans exponential is
a martingale with initial value 1.
⁸Recall that the moment-generating function of a random variable Y in some
measure P is defined as the expectation E^P( exp(aY) ), a ∈ R. Unlike the charac-
teristic function, the moment-generating function is not always well-defined for all
values of the argument a.
Girsanov’s theorem implies that we can shift probability measure to
transform an Ito process with a given drift to an Ito process with nearly
arbitrary drift. Specifically, we notice that our asset price process (under P)
dX(t) = μ(t) dt + σ(t) dW(t)
can be written
dX(t) = ( μ(t) − σ(t) θ(t) ) dt + σ(t) dW^θ(t),
where W^θ(t) is a Brownian motion under the measure P(θ). This process will
be driftless provided that θ satisfies the "spanning condition" μ(t) = σ(t) θ(t)
for all t ∈ [0,T]. This gives us a convenient way to check for the existence of
equivalent martingale measures:
Corollary 1.5.2. For a given numeraire D, assume that the deflated asset
process satisfies
dX^D(t) = μ^D(t) dt + σ^D(t) dW(t),
where σ^D(t) is sufficiently regular to make ∫_0^t σ^D(s) dW(s) a martingale.
Assume also that there exists a θ such that the density ς^θ is a martingale
and (a.s.)
σ^D(t) θ(t) = μ^D(t),  t ∈ [0,T];   (1.18)
then D induces an equivalent martingale measure and there is no arbitrage.
Equation (1.18) is a system of linear equations and we can use rank results
from linear algebra to determine the circumstances under which (1.18) will
have solutions (no arbitrage) and when these are unique (complete market).
For instance, a necessary condition for the market to be complete is that
rank(σ) = d. Further results along these lines can be found in Musiela and
Rutkowski [1997] and Duffie [2001].
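To make (1.18) concrete, the sketch below (our own illustration, not from the text) sets up a small example with p = 3 deflated assets driven by d = 2 Brownian motions and solves σ^D θ = μ^D by least squares, checking both solvability (no arbitrage) and the rank condition for completeness; all numbers are made up.

```python
import numpy as np

# Toy example of the spanning condition (1.18): sigma_D @ theta = mu_D.
# p = 3 deflated assets, d = 2 Brownian drivers; parameter values are illustrative only.
sigma_D = np.array([[0.20, 0.00],
                    [0.10, 0.15],
                    [0.05, 0.30]])          # p x d volatility matrix
theta_true = np.array([0.4, 0.1])           # market price of risk used to build a consistent drift
mu_D = sigma_D @ theta_true                 # drifts consistent with no arbitrage

theta, _, rank, _ = np.linalg.lstsq(sigma_D, mu_D, rcond=None)
print("theta:", theta)                                           # recovers theta_true
print("system solvable (no arbitrage):", np.allclose(sigma_D @ theta, mu_D))
print("rank(sigma_D) == d (complete market):", rank == sigma_D.shape[1])
```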
We conclude this section by noting that while a change of probability
measure affects the drift μ of an Ito process, it does not change the diffusion
coefficient σ. This is sometimes known as the diffusion invariance principle.
1.6 Stochastic Differential Equations
So far we have defined the asset process vector to be an Ito process with
general measurable coefficients μ(t,ω) and σ(t,ω). In virtually all applica-
tions, however, we restrict our attention to the case where these coefficients
are deterministic functions of time and the state of the asset process⁹. In
other words, we consider a stochastic differential equation (SDE) of the form
dX(t) = μ(t, X(t)) dt + σ(t, X(t)) dW(t),  X(0) = X_0,   (1.19)
with μ : [0,T] × R^p → R^p; σ : [0,T] × R^p → R^{p×d}; and X_0 an initial condition.
A strong solution¹⁰ to (1.19) is an Ito process
X(t) = X_0 + ∫_0^t μ(s, X(s)) ds + ∫_0^t σ(s, X(s)) dW(s).
A number of restrictions on μ and σ are needed to ensure that the solution
to (1.19) exists and is unique. A standard result is listed below.
Theorem 1.6.1. In (1.19) assume that there exists a constant K such that
for all t ∈ [0,T] and all x, y ∈ R^p,
|μ(t,x) − μ(t,y)| + |σ(t,x) − σ(t,y)| ≤ K |x − y|,   (Lipschitz condition),
|μ(t,x)|² + |σ(t,x)|² ≤ K² ( 1 + |x|² ),   (growth condition).
Then there exists a unique solution to (1.19).
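Although (1.19) rarely admits a closed-form solution, it is straightforward to simulate. The sketch below (ours, not part of the text) applies a plain Euler scheme to a scalar SDE; the drift and volatility functions are arbitrary illustrative choices satisfying the conditions above.

```python
import numpy as np

# Euler discretization of a scalar SDE dX = mu(t,X) dt + sigma(t,X) dW, X(0) = x0.
# The particular mu and sigma below are illustrative choices only.
def euler_path(x0, T, n_steps, mu, sigma, rng):
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        t = i * dt
        dW = rng.normal(0.0, np.sqrt(dt))
        x[i + 1] = x[i] + mu(t, x[i]) * dt + sigma(t, x[i]) * dW
    return x

rng = np.random.default_rng(2)
path = euler_path(x0=1.0, T=1.0, n_steps=250,
                  mu=lambda t, x: 0.05 * x,        # linear drift
                  sigma=lambda t, x: 0.2 * x,      # linear volatility (geometric Brownian motion)
                  rng=rng)
print(path[-1])
```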
We notice that the dynamics of (1.19) do not depend on the past evolution
of X(t) beyond the state of X at time t. This lack of path-dependence
suggests that X is a Markov process. We formalize this as follows.
Definition 1.6.2 (Markov Process). The R^p-valued stochastic process
X(t) is called a Markov process if, for all s,t ∈ [0,T] with t ≤ s, the
distribution of X(s) conditional on F_t coincides with its distribution
conditional on X(t) alone.
Consider now a derivative security whose time t value is a smooth function
V(t) = V(t, X(t)) of time and the asset vector X(t) in (1.19). An application
of Ito's lemma (Theorem 1.1.5) gives
dV(t) = V_t(t) dt + Σ_{i=1}^p V_{x_i}(t) μ_i(t) dt
      + (1/2) Σ_{i=1}^p Σ_{j=1}^p V_{x_i x_j}(t) Σ_{i,j}(t) dt + Σ_{i=1}^p V_{x_i}(t) σ_i(t) dW(t),   (1.24)
where σ_i is the i-th row of the p × d matrix σ and Σ_{i,j} is the (i,j)-th element
of σσ^T. We recall that subscripts like V_{x_i} denote partial differentiation, see
Theorem 1.1.5.
If V(t) can be replicated by a self-financing trading strategy φ in the p
assets, we must also have, from (1.10),
dV(t) = φ(t)^T dX(t) = Σ_{i=1}^p φ_i(t) μ_i(t) dt + Σ_{i=1}^p φ_i(t) σ_i(t) dW(t).   (1.25)
Comparing terms in (1.24) and (1.25) we see that both equations will hold,
provided that for all t ∈ [0,T]
φ_i(t) = ∂V(t, X(t)) / ∂x_i,   (1.26)
and
V_t(t, X(t)) + (1/2) Σ_{i=1}^p Σ_{j=1}^p V_{x_i x_j}(t, X(t)) Σ_{i,j}(t) = 0.   (1.27)
To the extent that the system above allows for a solution (it may not if the
market is not complete), from (1.26) we see that the trading strategy that
replicates the derivative V holds ∂V(t, X(t))/∂x_i units of asset X_i at time
t. The quantity ∂V/∂x_i is often known as the delta with respect to X_i¹¹.
Note that, from (1.9) and (1.26) we have that
V(t, X(t)) = Σ_{i=1}^p ( ∂V(t, X(t))/∂x_i ) X_i(t).   (1.28)
¹¹Note that taking a position in V and following a trading strategy with
φ_i = −∂V/∂x_i, i = 1,...,p, will effectively remove any exposure to V (as we
simultaneously take a long position in V and, through a trading strategy, a short
position in V). This strategy is known as a delta hedge.
Besides identifying an explicit replication strategy, the arguments above
have also produced (1.27), a partial differential equation (PDE) for the
value function V(t,x). The PDE is a second-order parabolic equation in p
spatial variables, with known terminal condition V(T,x) = g(x) (a so-called
Cauchy problem). Solving this PDE provides an alternative way to price
the derivative, as compared to the purely probabilistic expectations-based
methods outlined earlier (see (1.15)). We shall investigate the link between
expectations and PDEs in more detail in Section 1.8.
Inspection of the valuation PDE (1.27) reveals that the drifts μ_i of the
asset price SDE (1.19) are notably absent, making the price of the derivative
security independent of drifts. This is typical of derivatives in complete
markets and follows from the fact that derivatives can be priced preference-
free, by arbitrage arguments. In contrast, for the elements of the fundamental
asset price vector, risk-averse investors would demand that assets with high
volatilities |σ_i| be rewarded with higher drifts (more precisely, higher rates
of return) as compensation for the additional uncertainty.
1.8 Kolmogorov’s Equations and the Feynman-Kac
Theorem
In earlier sections, we have seen that derivatives prices can be expressed as
expectations under certain probability measures or as solutions to PDEs. This
hints at a deeper connection between expectations and PDEs, a connection
we shall explore in this section. As part of this exploration, we list results
for transition densities that will be useful later in model calibration.
As in Section 1.6, we consider a Markov vector SDE of the type (see
(1.19))
dX(t) = μ(t, X(t)) dt + σ(t, X(t)) dW(t),  X(0) = X_0,   (1.29)
where the coefficients are assumed smooth enough to allow for a unique
solution (see Theorem 1.6.1). Now define a functional
u(t,x) = E^P ( g(X(T)) | X(t) = x ),
for a function g : R^p → R. Under regularity conditions on g, it is easy to
see that the process u(t, X(t)), being a conditional expectation, must be a
martingale. Proceeding informally, an application of Ito's lemma gives, for
t ∈ [0,T) (suppressing dependence on X(t)),
du(t) = u_t(t) dt + Σ_{i=1}^p u_{x_i}(t) μ_i(t) dt
      + (1/2) Σ_{i=1}^p Σ_{j=1}^p u_{x_i x_j}(t) Σ_{i,j}(t) dt + O( dW(t) ),
where as before Σ_{i,j} is the (i,j)-th element of σσ^T. From earlier results, we
know that for u(t, X(t)) to be a martingale, the term multiplying dt in the
equation above must be zero. Defining the operator
A = Σ_{i=1}^p μ_i ∂/∂x_i + (1/2) Σ_{i=1}^p Σ_{j=1}^p Σ_{i,j} ∂²/∂x_i ∂x_j,
we deduce that u(t,x) satisfies the PDE
∂u(t,x)/∂t + Au(t,x) = 0,   (1.30)
with terminal condition u(T,x) = g(x). The equation above is known as the
Kolmogorov backward equation for the SDE (1.29). The operator A is known
as the generator or infinitesimal operator of the SDE, and can be identified
as
Au(t,x) = lim_{h↓0} ( E^P( u(t+h, X(t+h)) | X(t) = x ) − u(t,x) ) / h.
In arriving at (1.30) we made several implicit assumptions, most notably
that the function u(t,x) exists and is twice differentiable. Sufficient conditions
for the validity of (1.30) can be found in Karatzas and Shreve [1997], for
instance. A relevant result is listed below.
Theorem 1.8.1. Let the process X(t) be given by the SDE (1.29), where the
coefficients μ and σ are continuous in x and satisfy the Lipschitz and growth
conditions of Theorem 1.6.1. Consider a continuous function g(x) that is
either non-negative or satisfies a polynomial growth condition, meaning
that for some positive constants K and q
g(x) ≤ K ( 1 + |x|^q ),  x ∈ R^p.
If u(t,x) solves (1.30) with boundary condition u(T,x) = g(x), and u(t,x)
satisfies a polynomial growth condition in x, then
u(t,x) = E^P ( g(X(T)) | X(t) = x ),  t ∈ [0,T].   (1.31)
Conditions required to ensure existence of a solution to (1.30) are more
involved, and we just refer to Karatzas and Shreve [1997] and the references
therein.
A family of functions g of particular importance to many of our applica-
tions is
g(x) = e^{i k^T x},  k ∈ R^p,
where i = √(−1) is the imaginary unit. In this case u(t,x) becomes the
characteristic function of X(T), conditional on X(t) = x. We refer to any
standard statistics textbook (e.g. Ochi [1990]) for the many useful properties
of characteristic functions.
For the Markov process X(t) in (1.29), let us now introduce a transition
density, given heuristically by
p(t,x; s,y) dy = P( X(s) ∈ [y, y+dy] | X(t) = x ),  0 ≤ t ≤ s ≤ T,
so that, for instance, the expectation (1.31) can be written
u(t,x) = ∫_{R^p} g(y) p(t,x; T,y) dy.   (1.32)
Often, however, we are interested in how the density of X evolves forward
to time s > t from a known state at time t, rather than
vice-versa. For this, we first define an operator A* by
A* f(s,y) = − Σ_{i=1}^p ∂{ μ_i(s,y) f(s,y) } / ∂y_i
          + (1/2) Σ_{i=1}^p Σ_{j=1}^p ∂²{ Σ_{i,j}(s,y) f(s,y) } / ∂y_i ∂y_j.
In the transition density p(t,x; s,y), now consider (t,x) fixed and let A*
operate on the resulting function of s and y. Under additional regularity
conditions, we then have the forward Kolmogorov equation
−∂p(t,x; s,y)/∂s + A* p(t,x; s,y) = 0,  (t,x) fixed,   (1.33)
subject to the boundary condition p(t,x; t,y) = δ(x − y).
The forward Kolmogorov equation is sometimes known as the Fokker-
Planck equation. We stress that the backward equation is more general than
the forward equation, in the sense that the former holds for general terminal
conditions g(x), whereas the latter only holds for δ-type initial conditions.
We round off this section by a useful extension to the Kolmogorov
backward equation. Specifically, consider extending the PDE (1.30) to
∂u(t,x)/∂t + Au(t,x) + h(t,x) = r(t,x) u(t,x),   (1.34)
where h, r : [0,T] × R^p → R. Given the boundary condition u(T,x) = g(x),
the Feynman-Kac solution to (1.34), should it exist, is given by
u(t,x) = E^P ( w(t,T) g(X(T)) + ∫_t^T w(t,s) h(s, X(s)) ds | X(t) = x ),   (1.35)
where
w(t,s) = exp( −∫_t^s r(u, X(u)) du ),  t ≤ s ≤ T.
The result is easily understood from an application of Ito’s lemma, similar
to the one used above to motivate the backward Kolmogorov equation.
Sufficient regularity conditions for the Feynman-Kac result to hold are
identical to those of Theorem 1.8.1, supplemented with the requirement
that r be nonnegative and continuous in x; and the requirement that h be
continuous in x and either be nonnegative or satisfy a polynomial growth
requirement in x. See Duffie [2001] for further details about the often delicate
regularity issues surrounding the Feynman-Kac result.
For later use, let us finally note that when g(x) = δ(x − y) and h(t,x) = 0,
u(t,x) in (1.35) will equal
G(t,x; T,y) ≜ E^P ( e^{−∫_t^T r(u, X(u)) du} δ( X(T) − y ) | X(t) = x ).
The function G is known as a state-price density or as an Arrow-Debreu
security price function. In particular, notice that for an arbitrary g(x), we
then have
E^P ( e^{−∫_t^T r(u, X(u)) du} g(X(T)) | X(t) = x ) = ∫ G(t,x; T,y) g(y) dy.   (1.36)
Comparison with (1.32) shows that the state-price density is, essentially,
equivalent to a Green’s function with built-in discounting.
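As a simple numerical illustration of (1.35) (our own sketch, not part of the text), the code below estimates u(t,x) for a one-dimensional example with h ≡ 0 and a constant short rate r by simulating paths of X and averaging the discounted payoff. The dynamics and payoff g(x) = x are chosen so that the exact answer equals the starting value, which serves as a check.

```python
import numpy as np

# Monte Carlo estimate of the Feynman-Kac representation (1.35) with h = 0 and
# constant r, for dX = r X dt + sigma X dW and payoff g(x) = x.  In that case the
# discounted expectation equals the starting value x0, which we use as a check.
rng = np.random.default_rng(3)
x0, r, sigma, T = 100.0, 0.03, 0.2, 1.0
n_paths, n_steps = 100_000, 50
dt = T / n_steps

X = np.full(n_paths, x0)
for _ in range(n_steps):                       # Euler scheme for the SDE (1.29)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X = X + r * X * dt + sigma * X * dW

u_estimate = np.exp(-r * T) * np.mean(X)       # w(t,T) g(X(T)) averaged over paths
print(u_estimate, "vs exact", x0)
```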
1.9 Black-Scholes and Extensions
In reviews of asset pricing theory, a discussion of the seminal Black-Scholes-
Merton model (sometimes just known as the Black-Scholes model) of Black
and Scholes [1973] and Merton [1973] is nearly mandatory. As the Black-
Scholes-Merton (BSM) model constitutes a well-behaved setting in which
to tie elements of previous sections together, our text is no exception. To
provide a smoother transition to material that follows, we do, however,
extend the usual analysis to include a simple case of stochastic interest rates.22 1 Introduction to Arbitrage Pricing Theory
1.9.1 Basics
In the basic BSM economy, two assets are traded: a money market account
β and a stock S. In previous notations, X(t) = (β(t), S(t))^T and p = 2. The
money market account value is 1 at time 0 and accrues risk-free interest
at a continuously compounded, non-negative rate of r, initially assumed
constant. The dynamics for β are thus given by an ordinary differential
equation (ODE)
dβ(t)/β(t) = r dt,  β(0) = 1,
implying that simply β(t) = e^{rt}.
The stock dynamics are assumed to follow a geometric Brownian motion
(GBM) under measure P:
dS(t)/S(t) = μ dt + σ dW(t),   (1.37)
where W is a Brownian motion of dimension d = 1, and μ and σ are
constants.
Taking first a probabilistic approach, we notice that β is positive and can
be used as a numeraire. Let S^β(t) = S(t)/β(t) be the stock price deflated
by β. By Ito's lemma,
dS^β(t)/S^β(t) = (μ − r) dt + σ dW(t).
Applying Girsanov's theorem (see Theorem 1.5.1) and Corollary 1.5.2, we
see that if σ ≠ 0, β will induce a unique equivalent martingale measure,
with the measure shift characterized by the density process¹²
dς(t)/ς(t) = −θ dW(t),  θ = (μ − r)/σ.
Clearly, ς(t) defines an exponential martingale. The probability measure
induced by the money market account β is called the risk-neutral martingale
measure and is traditionally denoted Q. Under Q, W^θ(t) = W(t) + θt is a
Brownian motion, and
dS^β(t)/S^β(t) = σ dW^θ(t),
dS(t)/S(t) = r dt + σ dW^θ(t),   (1.38)
or, from (1.21),
S(T) = S(t) e^{(r − σ²/2)(T−t) + σ( W^θ(T) − W^θ(t) )},  t ∈ [0,T].   (1.39)
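For intuition, the short sketch below (ours) samples S(T) from the exact solution (1.39) and verifies numerically that the deflated price S(T)/β(T) has Q-expectation equal to S(t)/β(t), i.e. that S^β is a Q-martingale.

```python
import numpy as np

# Sample S(T) from (1.39) and check the martingale property of the deflated stock:
# E^Q[ S(T)/beta(T) ] = S(t)/beta(t)  with beta(u) = exp(r*u).
rng = np.random.default_rng(4)
S_t, r, sigma, t, T = 100.0, 0.03, 0.2, 0.0, 2.0
n_paths = 500_000

Z = rng.standard_normal(n_paths)                       # W^theta(T) - W^theta(t) ~ N(0, T - t)
S_T = S_t * np.exp((r - 0.5 * sigma**2) * (T - t) + sigma * np.sqrt(T - t) * Z)

lhs = np.mean(S_T * np.exp(-r * T))                    # E^Q[ S(T)/beta(T) ]
rhs = S_t * np.exp(-r * t)                             # S(t)/beta(t)
print(lhs, rhs)                                        # agree up to Monte Carlo noise
```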
We note that under Q, the drift of the stock process is replaced by
the risk-free interest rate r. That is, under Q agents in the economy will
appear to be indifferent ("neutral") to the risk of the stock, content with an
average growth rate of the stock equal to that of the money market account.
¹²The reader may recognize the market price of risk θ as the Sharpe ratio of the
stock S, a measure of how well the risk of the stock (represented by σ) is compensated
by excess return (represented by μ − r).
Before proceeding with the BSM analysis, we wish to emphasize that
the drift restriction imposed on the stock in the risk-neutral measure Q is a
general result. In a larger setting with a p-dimensional vector asset process
X, if the Q-dynamics of the components of X are all of the form
dX_i(t) = r X_i(t) dt + O( dW(t) ),  i = 1,...,p,
there is no arbitrage. This result holds unchanged if the interest rate is
random (see Section 1.9.3).
Returning to the BSM setting, we note that the risk-neutral measure
is unique, whereby the market is complete and all derivative securities on
S (and β) are attainable. Let us consider a few such securities. First, we
consider a security paying at time T $1 for certain. Such a security is a
discount bond and we shall denote its time t price by P(t,T), t ∈ [0,T]. If
the interest rate is positive, we would expect P(t,T) ≤ 1 as a reflection of
the time value of money, with equality only holding for t = T. Application
of the basic derivative pricing equation (1.15) immediately gives
P(t,T) = β(t) E_t^Q ( 1/β(T) ) = E_t^Q ( e^{−r(T−t)} ) = e^{−r(T−t)}.
This result is trivial, as it is easily seen that the amount e^{−r(T−t)} invested
in the money market account at time t will grow to exactly $1 at time T.
Second, consider a derivative V paying V(T) = S(T) − K at time T,
with K being an arbitrary constant. Proceeding as above, at time t ≤ T we get
V(t) = β(t) E_t^Q ( ( S(T) − K )/β(T) ) = S(t) − K e^{−r(T−t)},
where we have used that S(t)/β(t) is a Q-martingale; this is the familiar
value of a forward contract with delivery price K.
Third, consider a European call option paying c(T) = ( S(T) − K )^+ at
time T, where K > 0 is the strike. From (1.15),
c(t) = β(t) E_t^Q ( ( S(T) − K )^+ / β(T) ) = e^{−r(T−t)} E_t^Q ( ( S(T) − K )^+ ).   (1.41)
As S(T) in (1.39) is log-normal conditional on F_t, the expectation can be
computed by integration against the Gaussian density,
c(t) = e^{−r(T−t)} ∫_{−∞}^{∞} ( S(t) e^{(r − σ²/2)(T−t) + σ√(T−t) z} − K )^+ φ(z) dz,   (1.42)
where φ(·) is the standard Gaussian density. Carrying out the integration
yields the celebrated call option pricing formula.
Theorem 1.9.1 (Black-Scholes Formula). In the BSM economy, the time t
price of a European call option with strike K and maturity T is
c(t) = S(t) Φ(d_+) − K e^{−r(T−t)} Φ(d_−),   (1.43)
d_± = ( ln( S(t)/K ) + ( r ± σ²/2 )(T−t) ) / ( σ √(T−t) ),
where Φ(·) is the standard Gaussian cumulative distribution function.
If S(t) = K, the call option (and the corresponding put option, paying
( K − S(T) )^+ at time T) is said to be at-the-money (ATM). If S(t) > K, the
call option is in-the-money (ITM) and the put option is out-of-the-money
(OTM). If S(t) < K, the call is OTM and the put is ITM. The ATM, ITM,
and OTM monikers are sometimes used to refer to the ordering of the forward
value E_t( S(T) ) = S(t) e^{r(T−t)} (for a T-maturity option), rather than the
spot S(t), relative to the strike K.
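For convenience, a direct Python implementation of the Black-Scholes formula (1.43) is given below (our own sketch); it also checks the result against a Monte Carlo evaluation of the expectation (1.41).

```python
import numpy as np
from scipy.stats import norm

def black_scholes_call(S, K, r, sigma, tau):
    """Black-Scholes call price (1.43) with time to maturity tau = T - t."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

# Monte Carlo check of (1.41): discounted expectation of (S(T) - K)^+ under Q.
rng = np.random.default_rng(5)
S0, K, r, sigma, tau = 100.0, 105.0, 0.03, 0.2, 1.0
Z = rng.standard_normal(1_000_000)
S_T = S0 * np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * Z)
mc_price = np.exp(-r * tau) * np.mean(np.maximum(S_T - K, 0.0))

print(black_scholes_call(S0, K, r, sigma, tau), mc_price)
```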
In deriving (1.43), the choice of β as numeraire was arbitrary. If we
instead use S (which is also strictly positive) as numeraire, we can write
c(t) = S(t) E_t^{Q^S} ( ( S(T) − K )^+ / S(T) ) = S(t) E_t^{Q^S} ( ( 1 − K/S(T) )^+ ),   (1.44)
where Q^S is the martingale measure induced by S. To identify the measure
shift involved in moving from P to Q^S, consider that β^S(t) = β(t)/S(t) must
be a martingale in Q^S. By Ito's lemma, in measure P we have
dβ^S(t)/β^S(t) = ( r − μ + σ² ) dt − σ dW(t),
such that dW^S(t) = ( (μ − r)/σ − σ ) dt + dW(t) is a Brownian motion under
Q^S. Application of Ito's lemma on 1/S(t) yields, after a few rearrangements,
dS(t)^{−1}/S(t)^{−1} = −r dt − σ dW^S(t),
which is a geometric Brownian motion as before. Evaluation of the expectation
(1.44) can be verified to recover the BSM formula (1.43).
Our derivation of the BSM formula was so far entirely probabilistic.
Writing c(t) = c(t, β, S), the arguments in Section 1.7 allow us to write c as
a solution to the PDE (see (1.27))
∂c/∂t + (1/2) σ² S² ∂²c/∂S² = 0,   (1.45)
subject to the boundary condition c(T, β, S) = ( S − K )^+. From (1.28) we also
have that the replication positions in β and S are ∂c/∂β and ∂c/∂S, respectively.
That is,
c(t, β, S) = ( ∂c/∂β ) β + ( ∂c/∂S ) S.   (1.46)
As β is deterministic, we can actually eliminate c-dependence on this
variable by a change of variables ĉ(t, S) = c(t, β, S). By the chain rule
∂ĉ/∂t = ∂c/∂t + ( ∂c/∂β )( dβ/dt ) = ∂c/∂t + ( ∂c/∂β ) r β = ∂c/∂t + r ĉ − r S ∂ĉ/∂S,
where the last equality follows from (1.46). Inserting this into (1.45) yields
the original Black-Scholes PDE
∂ĉ/∂t + r S ∂ĉ/∂S + (1/2) σ² S² ∂²ĉ/∂S² = r ĉ,   (1.47)
with ĉ(T, S) = ( S − K )^+. We can solve this equation by classical methods
(see Lipton [2001] for several techniques), or we can use the Feynman-Kac
result to write it as an expectation. We leave it as an exercise to the reader
to verify that Feynman-Kac leads to the same expectation as derived earlier
by probabilistic means (see (1.41)).
A final note: the derivation of the Black-Scholes PDE above was somewhat
non-standard due to the initial assumption of the option price being a function
of the deterministic numeraire β. A more conventional (but entirely equivalent)
argument sets up a portfolio of the call option and a position in the stock, and
demonstrates that the stock position can be set such that the total portfolio
growth is deterministic (risk-free) on [t, t+dt]. Equating the portfolio growth
rate to the risk-free rate then yields the Black-Scholes PDE (1.47). See Hull
[2006] for details of this approach.
1.9.2 Alternative Derivation
We have already demonstrated several different ways of proving the BSM
call pricing formula, but as shown in Andreasen et al. [1998] there are many
more. One particularly enlightening proof is based on the concept of local
time and shall briefly be discussed in this section. The proof, which borrows26 1 Introduction to Arbitrage Pricing ‘Theory
from the results in Carr and Jarrow [1990], will also allow us to demonstrate
the Tanaka extension of Ito’s lemma, mentioned earlier in Section 1.1.
As above, we assume that the stock price process is as in (1.38), and
define the forward stock price F(t) = S(t)/P(t,T). Clearly,
dF(t)/F(t) = σ dW^θ(t),  t ≤ T,   (1.48)
so that F(t) is a Q-martingale. Now define the process I(t) = ( F(t) − K )^+.
Loosely speaking, the first derivative of I with respect to F can be interpreted
as the Heaviside step function 1_{F(t)>K}, and the second derivative can be
interpreted as the Dirac delta function, δ( F(t) − K ). As I is clearly not
twice differentiable, Ito's lemma formally does not apply, but the Tanaka
extension nevertheless gives us permission to write
dI(t) = 1_{F(t)>K} dF(t) + (1/2) σ² F(t)² δ( F(t) − K ) dt
      = 1_{F(t)>K} σ F(t) dW^θ(t) + (1/2) σ² K² δ( F(t) − K ) dt.
In integrated form,
I(T) = I(t) + ∫_t^T 1_{F(u)>K} σ F(u) dW^θ(u) + (1/2) σ² K² ∫_t^T δ( F(u) − K ) du.
The second integral in this expression is a random variable known as the local
time of F spent at the level K on the interval [t,T]. Taking expectations, it
follows that
E_t^Q ( I(T) ) = I(t) + (1/2) σ² K² ∫_t^T E_t^Q ( δ( F(u) − K ) ) du.
Here, if p(t,y; u,x) is the density of F(u) given F(t) = y, u > t, then
obviously
E_t^Q ( δ( F(u) − K ) ) = p(t, F(t); u, K).
By the definition of F(T) we have F(T) = S(T), such that I(T) =
( S(T) − K )^+. From (1.41), we may therefore write the time t European call
option price as
c(t) = P(t,T) E_t^Q ( I(T) )
     = ( S(t) − K P(t,T) )^+ + ( P(t,T) σ² K² / 2 ) ∫_t^T p(t, F(t); u, K) du.   (1.49)
A
The formula (1.49) decomposes the call option into a sum of two terms, the
intrinsic value and the time value, respectively. The time value can be made
more explicit by observing from the representation (1.39) that¹³
p(t, F(t); u, K) = φ( d(u) ) / ( K σ √(u − t) ),
d(u) ≜ ( ln( F(t)/K ) − (1/2) σ² (u − t) ) / ( σ √(u − t) ).
In other words, we have arrived at the following result.
Proposition 1.9.3. The European call option price c(t) on the process
(1.38) can be written as
c(t) = ( S(t) − K P(t,T) )^+ + ( P(t,T) σ K / 2 ) ∫_t^T ( φ( d(u) ) / √(u − t) ) du,   (1.50)
where φ(x) is the Gaussian density.
Explicit evaluation of the integral in (1.50) can be verified to produce the
BSM formula in Theorem 1.9.1. We leave this as an exercise to the reader.
¹³This also follows directly from the fact that F(u) is a log-normal random
variable with moments given by (1.22) and (1.23).
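As a numerical version of that exercise (our own sketch), the snippet below evaluates the time-value integral in (1.50) by quadrature and compares the result to the Black-Scholes price from (1.43); the parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Numerically evaluate the local-time representation (1.50) and compare with (1.43).
S, K, r, sigma, t, T = 100.0, 105.0, 0.03, 0.2, 0.0, 1.0
P = np.exp(-r * (T - t))                    # discount bond P(t,T) for constant r
F = S / P                                   # forward stock price

def d(u):
    return (np.log(F / K) - 0.5 * sigma**2 * (u - t)) / (sigma * np.sqrt(u - t))

time_value, _ = quad(lambda u: norm.pdf(d(u)) / np.sqrt(u - t), t, T)
c_local_time = max(S - K * P, 0.0) + 0.5 * P * sigma * K * time_value

d1 = (np.log(S / (K * P)) + 0.5 * sigma**2 * (T - t)) / (sigma * np.sqrt(T - t))
d2 = d1 - sigma * np.sqrt(T - t)
c_black_scholes = S * norm.cdf(d1) - K * P * norm.cdf(d2)

print(c_local_time, c_black_scholes)        # the two prices agree
```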
1.9.3 Extensions
Deterministic Parameters and Dividends
In our basic BSM setup, consider now first a simple extension to a deter-
ministic interest rate r(t) and a deterministic volatility σ(t). Carrying out
the analysis as before, we see that discount bond prices now become
P(t,T) = e^{−∫_t^T r(s) ds}.   (1.51)
The BSM call pricing formula (1.43) holds unchanged provided P(t,T) is
changed according to (1.51), and we redefine
d_± = ( ln( S(t)/K ) + ∫_t^T ( r(s) ± σ(s)²/2 ) ds ) / √( ∫_t^T σ(s)² ds ).
Let us further assume that the stock pays dividends at a deterministic
rate of q(t). Our framework so far, however, has assumed that assets pay
no cash over [0,T]. To salvage the situation, consider a fictitious asset S*
obtained by reinvesting all dividends into the stock S itself. It is easily seen
that
S*(t) = S(t) e^{∫_0^t q(s) ds},
and clearly S*(t) satisfies the requirements of generating no cash flows on
[0,T]. Stating the call option payout as
c(T) = ( S(T) − K )^+ = ( S*(T) e^{−∫_0^T q(s) ds} − K )^+
and performing the pricing analysis of Section 1.9.1 on S*(t), rather than
S(t), results in a dividend-extended BSM call option formula:
c(t) = S(t) e^{−∫_t^T q(s) ds} Φ(d_+) − K P(t,T) Φ(d_−),
d_± ≜ ( ln( S(t)/K ) + ∫_t^T ( r(s) − q(s) ± σ(s)²/2 ) ds ) / √( ∫_t^T σ(s)² ds ).
When the stock pays a dividend rate of q(t), note that the risk-neutral
process for S(t) is
dS(t)/S(t) = ( r(t) − q(t) ) dt + σ(t) dW^θ(t),
which extends (1.38). Note that for the special case where r(t) = q(t), S(t)
becomes a martingale and the call option price formula simplifies to
c(t) = P(t,T) ( S(t) Φ(d_+) − K Φ(d_−) ),   (1.52)
where now
d_± ≜ ( ln( S(t)/K ) ± (1/2) ∫_t^T σ(s)² ds ) / √( ∫_t^T σ(s)² ds ).
Remark 1.9.4. The martingale call formula (1.52) typically emerges when
pricing options on futures and forward prices (see (1.48)) and is often called
the Black formula, in honor of the work in Black [1976].
Stochastic Interest Rates
We now get even more ambitious and wish to consider call option pricing in
the case where the interest rate r is stochastic. The money market account
β becomes
β(t) = e^{∫_0^t r(s) ds},
and is now assumed an F_t-measurable random variable. Proceeding as in
Section 1.9.1, we find that under the risk-neutral measure Q, the call option
price expression is (assuming that the stock pays no dividends)
c(t) = β(t) E_t^Q ( ( S(T) − K )^+ / β(T) ) = E_t^Q ( e^{−∫_t^T r(s) ds} ( S(T) − K )^+ ).   (1.53)
In (1.53), we emphasize that the numeraire no longer can be pulled out
from the expectation. Still, to simplify call option computations, it would
be convenient to somehow remove the term exp( −∫_t^T r(s) ds ) from the
expectation in (1.53). By substituting 1 for ( S(T) − K )^+ in the expression
above, we first notice that
P(t,T) = E_t^Q ( e^{−∫_t^T r(s) ds} ).
This inspires us to perform a new measure shift, where we use the discount
bond P(t,T), rather than β(t), as our numeraire. Let the martingale measure
induced by P(t,T) be denoted Q^T, often termed the T-forward measure. By
the standard result (1.15) we have
c(t) = P(t,T) E_t^{Q^T} ( P(T,T)^{−1} ( S(T) − K )^+ )
     = P(t,T) E_t^{Q^T} ( ( S(T) − K )^+ ),
where we have used that P(T,T) = 1. From Theorem 1.4.2, Q^T and Q are
related by the density
ς(t) = E_t^Q ( dQ^T/dQ ) = ( P(t,T)/P(0,T) ) / ( β(t)/β(0) ).   (1.54)
To proceed, we need to add more structure to the model by making
assumptions about the stochastic process for P(t,T). We shall spend con-
siderable effort in subsequent chapters on this issue, but for this initial
application we simply assume that P(t,T) has Q dynamics
dP(t,T)/P(t,T) = r(t) dt − σ_P(t,T) dW_P(t),   (1.55)
where σ_P(t,T) is deterministic and W_P(t) is a Brownian motion correlated
to the stock Brownian motion. Notice that the drift of P(t,T) under Q is not
freely specifiable and must be equal to the risk-free rate; see the discussion
following (1.38). For clarity, let the stock Brownian motion be renamed
W_S(t), and assume that the correlation between W_P(t) and W_S(t) is a
constant ρ. In the setting of vector-valued Brownian motion with independent
components used in earlier sections, we can introduce correlation by writing
W(t) = ( W_1(t), W_2(t) )^T and setting, say,
W_P(t) = W_1(t),
W_S(t) = ρ W_1(t) + √(1 − ρ²) W_2(t).
The filtration {F_t} of our extended BSM setting is the one generated by the
2-dimensional W(t).
Under Q^T, the deflated process S^T(t) = S(t)/P(t,T) is a martingale. An
application of Ito's lemma combined with the diffusion invariance principle
shows that the Q^T process for S^T(t) is
dS^T(t)/S^T(t) = σ_P(t,T) dW_1(t) + σ(t) ( ρ dW_1(t) + √(1 − ρ²) dW_2(t) ),   (1.56)
where σ(t) as before is the deterministic volatility of the stock S. We recognize
S^T(t) as a drift-free geometric Brownian motion with instantaneous variance
of
σ_P(t,T)² + σ(t)² + 2 ρ σ(t) σ_P(t,T).
Exploiting the convenient fact that S^T(T) = S(T) and c(T) = ( S^T(T) − K )^+
(as P(T,T) = 1), we get
c(t) = P(t,T) E_t^{Q^T} ( ( S^T(T) − K )^+ )
     = P(t,T) ∫_{−∞}^{∞} ( S^T(t) e^{−v(t,T)/2 + √(v(t,T)) z} − K )^+ φ(z) dz,   (1.57)
where we have defined the "term", or total, variance
v(t,T) ≜ ∫_t^T ( σ_P(s,T)² + σ(s)² + 2 ρ σ(s) σ_P(s,T) ) ds.   (1.58)
Completing the integration (compare with (1.42)) and using S^T(t) =
S(t)/P(t,T), we arrive at a modified BSM-type call option formula:
Proposition 1.9.5. Consider a BSM economy with stochastic interest rates
evolving according to (1.55). Define the term variance v(t,T) as in (1.58). Then
the T-maturity European call option price is
c(t) = S(t) Φ(d_+) − K P(t,T) Φ(d_−),
d_± = ( ln( S(t)/( K P(t,T) ) ) ± (1/2) v(t,T) ) / √( v(t,T) ).
Proposition 1.9.5 was originally derived in Merton [1973], using PDE methods.
Extensions to dividend-paying stocks are straightforward and follow the
arguments shown above for deterministic parameters and dividends.
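The sketch below (our own) evaluates Proposition 1.9.5 for constant bond and stock volatilities; with σ_P = 0 it collapses to the Black-Scholes price, which serves as a sanity check. All parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def merton_call(S, K, P, sigma_S, sigma_P, rho, tau):
    """Call price of Proposition 1.9.5 with constant volatilities over [t, T], tau = T - t."""
    v = (sigma_P**2 + sigma_S**2 + 2.0 * rho * sigma_S * sigma_P) * tau   # term variance (1.58)
    d_plus = (np.log(S / (K * P)) + 0.5 * v) / np.sqrt(v)
    d_minus = d_plus - np.sqrt(v)
    return S * norm.cdf(d_plus) - K * P * norm.cdf(d_minus)

S, K, r, tau = 100.0, 105.0, 0.03, 1.0
P = np.exp(-r * tau)                                  # reference discount bond value
print(merton_call(S, K, P, sigma_S=0.2, sigma_P=0.05, rho=-0.3, tau=tau))
print(merton_call(S, K, P, sigma_S=0.2, sigma_P=0.0,  rho=0.0,  tau=tau))  # = Black-Scholes price
```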
1.10 Options with Early Exercise Rights
In our previous definition of a contingent claim, we assumed that the claim
involved a single F_T-measurable payout at time T. In reality, a number
of derivative contracts may have intermediate cash payments from, say,
scheduled coupons or through “rebates” for barrier-style options. Mostly,
such complications are straightforwardly incorporated; see for instance
Section 2.7.3. Of particular interest from a theoretical perspective are the
claims that allow the holder to accelerate payments through early exercise.
Derivative securities with early exercise are characterized by an adapted
payout process U(t), payable to the option holder at a stopping time (or
exercise policy) τ ≤ T, chosen by the holder. If early exercise can take
place at any time in some interval, we say that the derivative security is an
American option; if exercise can only take place on a discrete set of dates,
we say that it is a Bermudan option.1.10 Options with Early Exercise Rights 31
Let the allowed (and deterministic) set of exercise dates larger than
or equal to t be denoted D(t), and suppose that we are given at time 0
a particular exercise policy τ taking values in D(0), as well as a pricing
numeraire N inducing a unique martingale measure Q^N. Let V_τ(0) be the
time 0 value of a derivative security that pays U(τ). Under some technical
conditions on U(t), we can write for the value of the derivative security
V_τ(0) = E^{Q^N} ( U(τ)/N(τ) ),   (1.59)
where we have assumed, with no loss of generality, that N(0) = 1. Let T(t)
be the time t set of (future) stopping times taking values in D(t). In the
absence of arbitrage, the time 0 value of a security with early exercise into
U must then be given by the optimal stopping problem
V(0) = sup_{τ∈T(0)} V_τ(0) = sup_{τ∈T(0)} E^{Q^N} ( U(τ)/N(τ) ),   (1.60)
reflecting the fact that a rational investor would choose an exercise policy
to optimize the value of his claim.
We can extend (1.60) to future times t by
V(t) = N(t) sup_{τ∈T(t)} E_t^{Q^N} ( U(τ)/N(τ) ),   (1.61)
where sup_{τ∈T(t)} E_t^{Q^N} ( U(τ)/N(τ) ) is known as the Snell envelope of U/N
under Q^N. The process V(t) must here be interpreted as the value of the
option with early exercise, conditional on exercise not having taken place
before time t. To make this explicit, let τ* ∈ T(0) be the optimal exercise
policy, as seen from time 0. We can then write, for 0 ≤ t ≤ T,
V(0) = E^{Q^N} ( U(τ*)/N(τ*) ) ≥ E^{Q^N} ( V(t)/N(t) ),   (1.62)
where the inequality holds because every stopping time in T(t) also belongs
to T(0); the same argument, conditioned on F_s for s ≤ t, establishes that
V(t)/N(t) is a supermartingale under Q^N. This result
also follows directly from known properties of the Snell envelope; see, e.g.,
Musiela and Rutkowski [1997].
For later use, focus now on the Bermudan case and assume that D(0) =
{T_1, T_2, ..., T_B}, where T_1 > 0 and T_B = T. For t ≤ T_{i+1}, define H_i(t) as
the time t value of the Bermudan option when exercise is restricted to the
dates D(T_{i+1}) = {T_{i+1}, T_{i+2}, ..., T_B}. That is,
H_i(t) = N(t) E_t^{Q^N} ( V(T_{i+1})/N(T_{i+1}) ),  i = 0, 1, ..., B−1.
At time T_i, H_i(T_i) can be interpreted as the hold value of the Bermudan
option, that is, the value of the Bermudan option if not exercised at time T_i.
If an optimal exercise policy is followed, clearly we must have at time T_i
V(T_i) = max( U(T_i), H_i(T_i) ),  i = 1, ..., B,   (1.63)
such that
H_i(t) = N(t) E_t^{Q^N} ( max( U(T_{i+1}), H_{i+1}(T_{i+1}) ) / N(T_{i+1}) ),  i = 0, 1, ..., B−1.   (1.64)
Starting with the terminal condition H_B(T) = 0, (1.64) defines a useful
iteration backwards in time for the value V(0) = Ho(0). We shall use this
later for the purposes of designing valuation algorithms in Chapter 18, and
for computing price sensitivities (deltas) in Chapter 24.
We note that the idea behind (1.63) is often known as dynamic program-
ming or the Bellman principle. Loosely speaking, we here work “from the
back” to price the Bermudan option. As we shall see later (in Chapter 2),
this idea is particularly well-suited for numerical methods that proceed
backwards in time, such as finite difference methods.
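To illustrate the backward iteration (1.63)-(1.64), the sketch below (ours, not the book's algorithm) prices a Bermudan put in a simple binomial tree of the stock, using the money market account as numeraire; at each exercise date the roll-back (hold) value is compared with the exercise value, exactly as in (1.63). All parameters are illustrative.

```python
import numpy as np

# Bermudan put priced by dynamic programming (1.63)-(1.64) on a binomial tree.
S0, K, r, sigma, T = 100.0, 100.0, 0.03, 0.2, 1.0
n_steps = 200
exercise_dates = {50, 100, 150, 200}            # time-step indices forming D(0)

dt = T / n_steps
u = np.exp(sigma * np.sqrt(dt))                 # up factor
d = 1.0 / u
p = (np.exp(r * dt) - d) / (u - d)              # risk-neutral up probability
disc = np.exp(-r * dt)

# Terminal stock prices and option values V(T) = U(T) = (K - S)^+.
S = S0 * u ** np.arange(n_steps, -1, -1) * d ** np.arange(0, n_steps + 1)
V = np.maximum(K - S, 0.0)

for step in range(n_steps - 1, -1, -1):
    S = S0 * u ** np.arange(step, -1, -1) * d ** np.arange(0, step + 1)
    V = disc * (p * V[:-1] + (1.0 - p) * V[1:])        # hold value H_i(t)
    if step in exercise_dates:
        V = np.maximum(V, K - S)                        # V = max(U, H), cf. (1.63)

print(V[0])                                             # Bermudan put value V(0)
```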
1.10.1 The Markovian Case
We now specialize to the Markovian case where U(t) = g(t, x(t)), where
g : [0,T] × R^n → R is continuous and
dx(t) = μ(t, x(t)) dt + σ(t, x(t)) dW(t)   (1.65)
is an n-dimensional Markovian process, where μ and σ satisfy the regularity
conditions of Theorem 1.6.1. The n-dimensional process¹⁶ x(t) here defines
the state of the exercise value U(t), so we say that x(t) is a state variable
process. For concreteness let our numeraire N(t) be the money market
account
N(t) = β(t) = e^{∫_0^t r(u, x(u)) du},
where the short interest rate r : [0,T] × R^n → R is here assumed a function
of time and the state variable vector x. In (1.65), W(t) is understood to be
a d-dimensional Brownian motion in the risk-neutral measure Q.
Writing V(t) = V(t, x(t)), we have from (1.61)
V(t,x) = sup_{τ∈T(t)} E^Q ( e^{−∫_t^τ r(u, x(u)) du} g(τ, x(τ)) | x(t) = x ).   (1.66)
¹⁶Note that x(t) is an abstract construct, and does not necessarily coincide with
any asset price process.
For dates t € D(0), clearly V(t,x) > g(t,x), with equality holding only when
time t exercise is optimal. This leads us to define the concept of an exercise
region as
X = { (t,x) ∈ D(0) × R^n : V(t,x) = g(t,x) }.
Similarly, we define the complement of X,
C = { (t,x) ∈ [0,T] × R^n : (t,x) ∉ X },
to be the continuation region, i.e. the region where we wait (either because
exercise is not optimal or because it is not allowed, t ∉ D(0)) rather than
exercise the option.
For Markovian systems, rather than solving the optimization problem
(1.66) directly, it is often particularly convenient to invoke the Bellman
principle. Extending the ideas presented earlier, let us, somewhat loosely,
state the Bellman principle as follows: for any t € D(0),
V(t,x) = lim_{Δ↓0} max( g(t,x), E_t^Q ( e^{−∫_t^{t+Δ} r(u, x(u)) du} V(t+Δ, x(t+Δ)) ) ).   (1.67)
Again, this simply says that the option value at time t is the maximum of
the exercise value and the hold value, that is, the present value of continuing
to hold on to the option for a small period of time. As we have seen above,
for a Bermudan option, (1.67) also holds for finite Δ (namely up to the next
exercise date).
The Bellman principle provides us with a link between present (time
t) and future (time t + A) option values that we can often exploit in a
numerical scheme. For this, however, we need further characterization of
V(t,x) in the continuation region. By earlier arguments, we realize that
V(t,x)/β(t) must be a Q-martingale on the continuation region. Assuming
sufficient smoothness for an application of Ito's lemma, this leads to a PDE
formulation, to hold for (t,x) ∈ C,
J V(t,x) = 0,   (1.68)
where
J V(t,x) ≜ ∂V(t,x)/∂t + Σ_{i=1}^n μ_i(t,x) ∂V(t,x)/∂x_i
  + (1/2) Σ_{i=1}^n Σ_{j=1}^n ( σ(t,x) σ(t,x)^T )_{i,j} ∂²V(t,x)/∂x_i ∂x_j − r(t,x) V(t,x).
Assume first that our option is of the Bermudan type, and let T_i and
T_{i+1} be subsequent exercise dates in the exercise schedule. For any function
f of time, define f(t±) to be the limits lim_{ε↓0} f(t ± ε), and assume that
V(T_{i+1}−, x) is known for all x. As all values of t ∈ (T_i, T_{i+1}) by definition
must be in the continuation region, we can use (1.68) to solve for V(T_i+, x).
Applying the Bellman principle (1.67) at time T_i then leads to the condition
V(T_i−, x) = max( g(T_i, x), V(T_i+, x) ).
In PDE parlance, this is a so-called jump condition which is straightforward
to incorporate into a numerical solution; see Section 2.7.4 for details.
For American-style options, (1.68) continues to apply on C. The Bellman
principle here leads to the characterization that
J V(t,x) ≤ 0,
for (t,x) ∈ X, i.e. we exercise when the rate of return from holding the
option strictly fails to match r(t,x). The American option pricing problem is
often conveniently summarized in a variational inequality, to hold on X ∪ C,
V(t,x) ≥ g(t,x),  J V(t,x) ≤ 0,  ( V(t,x) − g(t,x) ) J V(t,x) = 0,   (1.69)
and subject to the boundary condition V(T,x) = g(T,x). The first of these
three conditions expresses that the option is always worth at least its exercise
value; the second expresses the supermartingale property of V(t,x); and the
third implies (after a little thought) that J V(t,x) = 0 on C and J V(t,x) ≤ 0
on X. The system (1.69) is discussed more carefully in Duffie [2001], where
additional discussion of regularity issues may also be found.
1.10.2 Some General Bounds
In many cases of practical interest, solving PDEs and/or variational in-
equalities is not computationally feasible. In such situations, we may be
interested in at least bounding the value of an option with early exercise
rights. Providing a lower bound is straightforward: postulate an exercise
policy τ and compute the price V_τ(0) by direct methods. From (1.60), clearly
this provides a lower bound
V_τ(0) ≤ V(0).   (1.70)
The closer the postulated exercise policy τ is to the optimal exercise policy
τ*, the tighter this bound will be. We shall later study a number of numerical
techniques to generate good exercise strategies for fixed income options with
early exercise rights, see Chapter 18.
To produce an upper bound, we can rely on duality results established
in Rogers [2001], Haugh and Kogan [2004] and Andersen and Broadie
[2004]. Let K denote the space of adapted martingales M for which
sup, (0,7) E2” |M(r)| < 00. For a martingale M € K, we then write
cap 20% (UD
ro (xa)
U
= an (U(r) 7
ero] (xa + MO) i)
vio
= Pe Qn U(r) i
=M(0)+ sup E Go M(n)).1.10 Options with Early Exercise Rights 35
In the second equality, we have relied on the optional sampling theorem, a
result that states that the martingale property is satisfied up to a bounded
random stopping time, i.e. that E®’(M(r)) = M(0); see Karatzas and
Shreve [1997] for details. We now turn the above result into an upper bound
by forming a pathwise maximum at all possible future exercise dates D(0):
vrs a (mi)
” U(t)
U(t) and A(t) > 0. In
conclusion, we have arrived at a dual formulation of the option price
¥¢0) = jn (100) + 82" (a (a3 - mi) } . (7)
and have demonstrated that the infimum is attained when the martingale
M is set equal to the martingale component of the deflated price process36 1 Introduction to Arbitrage Pricing ‘Vheory
V(t)/N(t). In practice, we are obviously not privy to V(t)/N(t) (which is a
quantity that we are trying to estimate), but we are nevertheless provided
with a strategy to make the upper bound (1.71) tight: use a martingale that
is “close” to the martingale component of the true deflated option price
process. In Chapter 18 we shall demonstrate how to make this strategy
operational.
1.10.3 Early Exercise Premia
We finish our discussion of options with early exercise rights by listing some
known results for puts and calls, including an interesting decomposition
of American and Bermudan option prices into the sum of a European
option price and an early exercise premium. For convenience, we work in
a Markovian setting where the single state variable, denoted S(t), follows
one-dimensional GBMD. Specifically, we assume that
dS(t)/S(t) = (r — q)dt +a dW (2), (1.73)
with W(t) being a one-dimensional Brownian motion in the risk-neutral
measure, i.e. the measure induced by the money market account 8(t) =e”.
For simplicity we assume that the interest rate r, the dividend yield g, and
the volatility o are all constants; the extension to time-dependent parameters
is straightforward.
Let c(t), Ca(t), and Ca(t) be the time t European, American, and
Bermudan prices of the call option with terminal maturity T, conditional
on no exercise prior to time t. While obviously c(t) < Cg(t) < Ca(t), in
some cases these inequalities are equalities, as the following straightforward
lemma shows.
Lemma 1.10.2. Suppose that r > 0 and q <0 in (1.73). It is then never
optimal to exercise a call option early, and
e(t) = Ca(t) = Ca(t).
Proof. Notice that, by Jensen’s inequality,
ft) =e -PEP ((S(T) — K)*)
+ . +
> en(T-) ((e (S(T) -K) ) 7 (e890 7 ent) .
It is therefore clear that ifr >0 and g < 0, then for any value of T — t,
e(t) > (S@) - K)*,
ive. the European call option price dominates the exercise value. As the
hold value of Ainerican and Bermudan options must be at least as large
as the European option price, it follows that the option to exercise early is
worthless. O1.10 Options with Early Exercise Rights 37
Remark 1.10.3. For the put option, early exercise is never optimal if r < 0
and q > 0. As this situation rarely happens in practice, American put options
nearly always trade at a premium to their European counterparts.
Lemma 1.10.2 demonstrates the well-known fact that American or Bermu-
dan call options on stocks that pay no dividends (q = 0) should never be
ed early. On the other hand, if the stock does pay dividends, for an
American call option there will, at time t, be a critical value of the stock,
Sa(t), at which the value of the stream of dividends paid by the stock will
compensate for the cost of accelerating the payment of the strike K. In other
words, an American option should be exercised at time t, provided that
S(t) > Sa(t). The deterministic curve S(t) is known as the early exercise
boundary and marks the boundary between the exercise and continuation
regions, ¥ and C. Writing C(t) = Ca(t, S(t)), we formally have
Sa(t) = inf {s = Ca(t,$) =(S- K)*} , t 0 and the high
contact condition states that the delta equals —1 at the exercise boundary.
Establishing the boundary S'i4(t) will virtually always require numerical
methods, although asymptotic results are known for t close to T (see for
instance Lipton {2001]). One simple result is listed below.
Lemma 1.10.6. Assume that r > 0 and q > 0, such that the early exercise
boundary exists for the American call option. The exercise boundary just
prior to maturity is then
lim $4(T —¢) = K max (. r) .
210 q1.10 Options with Early Exercise Rights 39
Proof. An informal proof of Lemma 1.10.6 proceeds as follows. At time
T — dt, assume that S(T — dt) > K; otherwise it clearly makes no sense to
exercise the option. If we exercise the option, we receive S(T — dt) — K at
time T — dt. On the other hand, if we postpone exercise, at time T — dt our
hold value is
eT CER y (S(T) — K) = S(T — dtje""" — Ker
= S(T —dt)- K — S(T —t)qdt + Krdt.
Clearly, we should then only exercise if
S(T — dt) — K > S(T ~ dt) — K — S(T — dt)qdt + Krat
or if
S(T —dt)q> Kr.
5B
Notice that since clearly S4(T) = K, the call option exercise boundary
will have a discontinuity at time T, if g s4(y}(S(Q) — K).
In the continuation region, C'4(t, 5) satisfies the PDE (1.47), ie.
aCalt, 8) g2CaltrS) | 12 g20°Calt,S)
apt Sag + 9S co
Inserting this into (1.79) we get, after a few rearrangements,
eCate)
=rCa(t, S).
dCa(t) = rCa(t) dt + 1siyes,(y)(r — 2) S(t) = aw (t)
+ ls@rsawny {((r — DS) - Cat ) a + oS(t) dw? (t)}
=rCalt)at + lyseesncoylr— 991) 220 away
+ Ieswrsaiy {(rK — ¢S(t)) dt + oS(t) dw? (t)}
Setting y(t) = Ca(t)/B(t), it follows from Ito’s lemma that
OC a(t
dy(t) =e" Lisaycsa(y}(t — 2)3(t) 40 awe)
+ lisyesrye™ {(rK — gS(t)) dt + S(t) dw? (t)} .
Integrating and taking expectations leads to
T
EP (¥(T)) = u(t) +f OME (1ys(ay>satw)} (TK — 9S(u))) du
Applying the definition of y(t) and the fact that y(T) = e~"7 (S(T) — K)+
proves (1.76). The explicit form of the early exercise premium in (1.78)
follows from the properties of GBMD. 0
Remark 1.10.8. Combining results from Lemma 1.10.6 and Proposition
1.10.4, it follows that E(t) > 0, so Ca(t) > c(t) as expected.
The integral representation of the American call option in Proposition
1.10.7 forms the basis for a number of proposed computational methods for
American option pricing. Loosely speaking, these methods are based on the
idea of iteratively estimating the exercise boundary S(t), often working
backwards from ¢ = T, after which an application of Proposition 1.10.7 will
yield the American option price. A representative example of these methods
can be found in Ju [1998]. See Chiarella et al. [2004] for a survey of the
literature, and Section 19.7.3 for an application in interest rate derivative
pricing.
For a Bermudan option, an integral representation such as that in Propo-
sition 1.10.7 is not possible. Nevertheless, it is still possible to break the1.10 Options with Early Exercise Rights 41
Bermudan call option into the sum of a European option and an early
exercise premium. To show this, assume that the allowed exercise dates are
D(0) = {T1,T2,...,T pz}, and let Sg(T;) be the exercise level above which
the Bermudan option should be exercised at time T;, i = 1,...,B. Notice
that if at time J; we have S(T;) > Sg(T;), then Cg will jump down in value
when time progresses past time Tj, as a reflection of the missed exercise
opportunity. Indeed, in the earlier notation of hold and exercise values, we
have
Cp(Ti) = max (U(T;), H(Ti)),
Cn(Ti+) = H(T),
which makes the jump in value evident. Given the existence of these jumps,
we may write
dCp(t) = rCp(t) dt + dM(t)
B
+ > Lps¢r,)>sp(72)}9(Ti — t) (H (Ti) — U(T)) dt,
i=l
where H(T;) = Cg(T;+) is the hold value at time T;, U(T;) = S(T;) — K,
and M(t) is a martingale,
OCz(t)
as"
Deflating Cg by the money market account and forming expectations, we
get, since c(t) = e779 E2(CR(T)),
Cal) = ct) + Soe PRP (1p scr) >sacn} (UT) ~ H(Ti))) -
Ti>t
dM(t) =
a) S(t) dw? (2).
As H(T;) must be less than the exercise value U(T;) whenever $(T;) > Sp(Ti)
we can simplify this expression to the following result. that we, in Section
18.2.3, call the marginal exercise value decomposition.
Proposition 1.10.9. The Bermudan option price Cg(t) satisfies
Ca(t) =clt)+En(t), t R are sufficiently regular
to make (2.1) and (2.2) meaningful (see Chapter 1).
The terminal value problem above is, as discussed earlier, a Cauchy
problem to be solved for V(t,x) on (t,x) € {0,T) x B. In many cases of
practical interest, further boundary conditions are applied in the spatial
(x) domain. If such boundary conditions are expressed directly in terms of
V (rather than its derivatives) we have a Dirichlet boundary problem. For
instance, a so-called up-and-out barrier option will pay out g(x(T)) at time
T if and only if x(t) stays strictly below a contractually specified barrier
level H at all times t < T. If, on the other hand, x(t) touches H at any time
during the life of the contract, it will expire worthless (or “knock out”). In
this case, the PDE is only to be solved on (t,x) € [0,T) x (BN (—00, H))
and is subject to the Dirichlet boundary condition
V(t,H) =0, t€ (0,7),
which expresses that the option has no value for z > H. We note that it is
not uncommon to encounter options where the spatial domain boundaries
are functions of time, a situation we shall deal with in Section 2.7.1. Also,
as we shall see shortly, sometimes boundary conditions are conveniently
expressed in terms of derivatives of V.
For numerical solution of the PDE (2.1), we often need to assume that
the domain of the state variable z is finite, even in situations where (2.1) is
supposed to hold for an infinite domain. Suitable truncation of the domain
can often be done probabilistically, based on a confidence interval for x(T).
To illustrate the procedure, consider the Black-Scholes PDE (1.47) applied to
a call option with strike K. A common first step is to use the transformation
zx = In S, such that the PDE has constant coefficients,
av 15\8 1 40°V _
a+ 0) 4 3 rv =0, (2.3)2.2 Finite Difference Discretization 45
with terminal value (for a call option) V(T, x) = (e*—K)*. The domain of
z is here the entire real line, B = R. We know (from (1.39)) that
2(T) = 2(0) + rte? T +0(W(T)-W(0)), (2.4)
2
which is a Gaussian random variable with mean % = x(0) + (r — 40)T and
variance o?T. Consider now replacing the domain (—oo, 00) with the finite
interval [= — ao VT, + ac VT] for some positive constant a. The likelihood
of z(L) falling outside of this interval is easily seen to be 26(—a) (where, as
always, ®(z) is the standard Gaussian cumulative distribution function). If,
say, we set a to 4, 26(—a) = 6.3 x 1075, which is an insignificant probability
for most applications. Larger (smaller) values of a will make the truncation
error smaller (larger) and will ultimately require more (less) effort in a
numerical scheme. We recommend values of a somewhere between 3 and
5 for most applications. For the Black-Scholes case, a rigorous estimate of
the error imposed by domain truncation is given in Kangro and Nicolaides
(2000).
In many cases of practical interest, it is not possible to write down
an exact confidence interval for «(T). In such cases, one instead may use
an approximate confidence interval, found by, for instance, using “average”
values for (t,x) and o(t, x). High precision in these estimates is typically
not needed.
2.2 Finite Difference Discretization
In order to solve the PDE (2.1) numerically, we now wish to discretize it on
the rectangular domain (t,.:) € [0,7] x [M, M], where M7 and M are finite
constants, possibly found by a truncation procedure such as the one outlined
above. We first introduce two equidistant! grids {t;}.9 and (as}jco where
t, =iT/n £ iA, i =0,1,...,n, anda; = M+j(M-M)/(m+1) = M+jA,,
j=0,1,...,m+1. The terminal value V(T, x) = g(x) is imposed at tn = T,
and spatial boundary conditions are imposed at t and £41.
2.2.1 Discretization in x-Direction. Dirichlet Boundary
Conditions
We first focus on the spatial operator £ and restrict x to take values in
the interior of the spatial grid x € {x;}7%,. Consider replacing the first-
and second-order partial derivatives with first- and second-order difference
operators:
1Non-equidistant grids are often required in practice and will be covered in
Section 2.4.46 2 Finite Difference Methods
V(t, aj41) — V(t, 27-1)
5,V(t, xj) © Cy nn (2.5)
Vit. Vi ~1) — 2V(E,
SnaV (t,0;) & Ve tesa) + V(b 25-1) = V(b 25) es 1) = 2V (tj) (2.6)
3
These operators are accurate to second order. Formally”,
Lemma 2.2.1.
OV (t,x
5,V(t,25) = Wiis) +0(42),
PV (ta
SneV (t, 05) = oVieee) +0(42).
Proof. A Taylor expansion of V(t, x) around the point x = x; gives
OV (t, 25
V(tuatj41) = V(t.2,) + Oe
PV(t,a;) , 1.3 BV (t,25)
+5 5A + Eat aa +0 (42),
and
OV(t, 25
V(tyay-1) = V(t) — 1
OV(t,2;) 1 ,30°V(t, 23) 4
+5ap Mea) _ 7 ag PVE) 5 o (a8).
Insertion of these expressions into (2.5) and (2.6) gives the desired result.
a
In other words, if we introduce the discrete operator
L = ult,2)b_ + 50(t,2)*6ex —r(t,2),
we have, for x € {x;}71,
LV (t,x) = LV(t,2) +O (A2).
With attention restricted to values on the grid {2;}72,, we can view Lasa
matrix, once we specify the side boundary conditions at xp and 241. For
the Dirichlet case, assume for instance that
V(20,t) =F (t,%0), V(amsrst) =F (t,m41),
?Recall that a function f(h) is of order O(e(h)) if [f(h)|/le(h)| is bounded
from above by a positive constant in the limit h > 0.2.2 Finite Difference Discretization AT
for given functions f,f : (0,7) x R — R. With? w(t) 4
(V(t,21),---,V(t,em))" and, for j = 1, ,
o(t) & —o(t,ay)?AZ? — r(t, x5), (2.7)
1 1
ug(t) © sm(t,aj)Az? + got ei) AS*, (2.8)
Ut) & —Sultsay)g" + Solty2y)?Az?, (2.9)
we can write 7
LV(t) = A(t) V(t) + QU), (2.10)
where A is a tri-diagonal matrix
ex(t) u(t) 0 0 0 0
l(t) co(t) w(t) 0 0 0
0 I(t) s(t) us(t) 0 0
A(ty=| 2 9 ut ca(t) ua(t) 0 (2.11)
: 7 : . 7 . 0
000. bmualt) nes) taf)
0 0 0 0 0 Im(t) em(t)
and Q(t) is a vector containing boundary values
L(t)F (t,o)
0
X= :
_0
Um (t)F (t, mst)
As discussed earlier, sometimes one or both of the functions f and f are
explicitly imposed as part of the option specification (as is the case for a
knock-out options). In other cases, asymptotics may be necessary to establish
these functions. For instance, for the case of a simple call option on a stock
paying no dividends, we can set
F(t,2) =e — Ket,
f(t,x) =0,
where we, as before, have set « = In S (S being the stock price) and assumed
that the strike K is positive. The result for f is obvious; the result for
F follows from the fact that a deep in-the-money call option will almost
certainly pay at maturity the stock (the present value of which is just S = e*)
minus the strike (the present value of which is Ke7"(?-)).
3For clarity, this chapter uses boldface type for all vectors and matrices.48 2 Finite Difference Methods
2.2.2 Other Boundary Conditions
Deriving asymptotic Dirichlet conditions can be quite involved for compli-
cated option payouts and is often inconvenient in implementations. Rather
than having to perform an asymptotic analysis for each and every type of
option payout, it would be preferable to have a general-purpose mechanism
for specifying the boundary condition. One common idea involves making
assumptions on the form of the functional dependency between V and x
at the grid boundaries, often from specification of relationships between
spatial derivatives. For instance, if we impose the condition that the second
derivative of V is zero at the upper boundary (2.41) — that is, V is a linear
function of « — we can write (effectively using a downward discretization of
the second derivative)
V(t, ¢m41) + V(t,em—1) — 2V(t, 2m) _ 0
a2 ~
ie
=> V(t, tm+i) = 2V(t,2m) — V(t, em~1)-
A similar assumption at the lower spatial boundary yields
V(t, 20) = 2V(t,21) — V(t»).
For PDEs discretized in the logarithm of some asset, it may be more
natural to assume that V(t,x) « e* at the boundaries; equivalently, we
can assume that OV/Ox = 0°V/dx? at the boundary. When discretized in
downward fashion at the upper boundary (241), this implies that.
V(tytmar) —V (ttm) _ V(t,tme1) + V(t, ¢imai) = 2V (tem)
Ay a
or (assuming that A, # 1)
1
V(t,m41) = V(t tm—1) A =a+ V(t tm) A xT
Similarly,
a 2+ Ar _ . 1
V(t, 20) = V(t.) 7 +a, Vibe)as
Common for both methods above — and for the Dirichlet specification
discussed earlier — is that they give rise to boundary specifications through
simple linear systems of the general form
V(t, @m41) = kmn(t)V (tim) + km—1(t)V (ts @m—1) + f (t)@m41), (2.12)
V(t, 20) = ki(t)V(t, v1) + ko(t)V(t, a2) + f(t, 20). (2.13)
This boundary specification can be captured in the matrix system (2.10) by
simply rewriting a few components of A(t); specifically, we must set2.2 Finite Difference Discretization 49
* = 1(t, 0m) + Fm(t)Um(t)s
1 1
Im(t) = =5u(t,tm)As* + 50(t, tm) Az? + hm—a(t}um(t),
ex(t) = -o(t,2;)?AZ? — r(t,25) + k(t),
em(t) = —9(t, tm)”
1
a(t) = Zultsor)Ag! + 5o(te)?Ag? + halt lt.
All other components of A remain as in (2.11); note that A remains tri-
diagonal.
An alternative approach to specification of boundary conditions in the
a-domain involves using the PDE itself to determine the boundary condi-
tions, through replacement of all central difference operators with one-sided
differences at the boundaries. Section [Link] contains a detailed example
of this idea; ultimately, this approach leads to boundary conditions that can
also be written in the form (2.12)-(2.13).
2.2.3 Time-Discretization
To simplify notation, assume for now that Q(t) = 0 for all t, as will be
the case if, say, we use the linear or linear-exponential boundary conditions
outlined earlier. On the spatial grid, our original PDE can be written
“sO = —A(t)V(t) + O (A?)
which, ignoring the error term*, defines a system of coupled ordinary differ-
ential equations (ODEs).
A number of methods are available for the numerical solution of coupled
ODEs; see, e.g., Press et al. [1992]. We here only consider basic two-level
time-stepping schemes, where grid computations at time ¢; involve only
PDE values at times t; and t;41. Focusing the attention on a particular
bucket [t;,t;41], the choice for the finite difference approximation of OV/At
is obvious:
OV _ Vitiss)- Viti)
ot At .
Not so obvious, however, is to which time in the interval [t;,¢;41] we should
associate this derivative. To be general, consider picking a time t}+1(0) €
(ti, tis], given by
41 (6) = (1 — O)tin + ti, (2.14)
where € (0, 1} is a parameter. We then write
BV (6) | Vitus) = Vite)
at A, 7
‘Note that the error term O(A2) is here to be interpreted as an m-dimensional
vector. We will use such short-hand notation throughout this chapter.50 2 Finite Difference Methods
By a Taylor expansion, it is easy to see that this expression is first-order
accurate in the time step when 6 # 3, and second-order accurate when
6 = }. Written compactly,
av (18 tigi) — V(ti
we") = We ts +1 fo4}0 (4) +0 (42). (2.15)
This result on the convergence order is intuitive since only in the case 9 = 2
is the difference coefficient precisely central; for all other cases, the tite
ence coefficient is either predominantly backward i in time or predominantly
forward in time.
The time-discretization technique introduced above i is known as a theta
scheme. The special cases of @ = 1, 6 = 0, and 6 = 4 are known as the fully
implicit scheme, the fully explicit scheme, and the Crank-Nicolson scheme,
respectively. In light of the convergence result (2.15), one may wonder
why anything other than the Crank-Nicolson scheme is ever used. The CN
method is, indeed, often the method of choice, but there are situations where
a straight application of the Crank-Nicolson scheme can lead to oscillations
in the numerical solution or its spatial derivatives. Judicial application of
the fully implicit method can often alleviate these problems, as we shall
discuss later. The fully explicit method should never be used due to poor
convergence and stability properties (see Section 2.3), but has nevertheless
managed to survive in a surprisingly large number of finance texts and
papers.
2.2.4 Finite Difference Scheme
We now proceed to combine the discretizations (2.10) and (2.15) into a
complete finite difference scheme. First, we expand
A (HB) V (ti°2(0)) = 8A (117(8)) Viti)
+ (1-8) A(47*(6)) V(tier) + Lgoey} (Ae) + 0 (42),
such that our PDE can be represented as
Vitis) — Viti)
——A + 149 44}0(A1) + O (A?)
=A (G8) V (EP @) +0 (42)
= — OA (t)**(8)) V(t.) — (1 = 0) A (47*(6)) V(tins)
+ Logg} (Ar) + O (4?) +0 (A2).
Multiplying through with A, gives rise to the complete finite difference
representation of the PDE solution at times t; and t;41:2.2 Finite Difference Discretization 51
Proposition 2.2.2. On the grid {xj}, the solution to (2.1) at times ty
and ti41 is characterized by
(I-6A,A (t)*1(8))) V(ti) = (I+ (1 - @) ALA (#177(0))) V(tign) + ei,
(2.16)
ith
where I is the m x m identity matrix, and e;** is an error term
1 = AcO (AZ) + 1f943}0 (Az) +0 (42). (2.17)
Let V(ti,2) denote the approximation to the true solution V(t),
obtained by using (2.16) without the error term. Defining
Vi) = (Pea)... Pam),
we have
(1-04, A (t**1(0))) V(t,) = (T+ (1 — 8) ALA (t°47(0))) V(tign). (2.18)
For a known value of Vitins)s (2.18) defines a simple linear system of equa-
tions that can be solved for Viti) by standard methods. Simplifying matters
is the fact that the matrix (I-9A,A(t{*?(0))) is tri-diagonal, allowing us to
solve (2.18) in only O(m) operations; see Press et al. [1992] for an algorithm®.
Starting from the prescribed terminal condition V(tn,2)) = g(x), J =
1,...,m, we can now use (2.18) to iteratively step backward in time until
we ultimately recover V(0). This procedure is known as backward induction.
Proposition 2.2.3. The theta scheme (2.18) recovers V(0) in O(mn) op-
erations. If the scheme converges, the error on V(0) compared to the exact
solution V(0) is of order
0 (A2) + 1¢543}0 (Ar) +0 (42).
Proof. The backward induction algorithm requires the solution of n tri-
diagonal systems, one per time step, for a total computational cost of O(mn).
The local truncation error on V(t;) is e't?, making the global truncation
error after n time steps of order ne'+!. Combining (2.17) with the fact: that
n= T/A; = O(A;') gives the order result listed in the proposition. O
The special case of an explicit scheme (@ = 0) provides us with a direct
expression for V(t,,2;) in terms of V(ty41,2j-1), V(lig1, 2), and V(ti21, 2541),
a scheme that is easily visualized as a “trinomial tree”. The intuitive nature of the
explicit scheme coupled with the fact that no matrix equation must be solved may
explain the popularity of this scheme in the finance literature, despite its poor
numerical qualities (see Section 2.3). We stress that the workload of the explicit
scheme is still O(m) per time step, as is the case for all theta schemes.52 2 Finite Difference Methods
It follows from Proposition 2.2.3 that the Crank-Nicolson scheme is
second-order convergent in the time step, and all other theta schemes are
first-order convergent in the time step. All theta-schemes are second-order
convergent in the spatial step Az.
In deriving (2.18), we assumed earlier that the boundary vector was zero,
Q(t) = 0. Including a non-zero boundary vector into the scheme is, however,
straightforward and results in a time-stepping scheme of the form
(I-0A,A (t#*1())) V(t) = (I+ (1 — 8) ALA (t3*1(0))) V (tig)
+ (1~ 0)2%ti41) + OME). (2.19)
Again, this system is easily solved for Viti) by a standard tri-diagonal
equation solver.
As a final point, we stress that the finite difference scheme above ulti-
mately yields a full vector of values Vo) at time 0, with one element per
value of z;, j =1,...,m. In general, we are mainly interested in V(0, x(0)),
where «(0) is the known value of x at time 0. There is no need to include x(0)
in the grid, as we can simply employ an interpolator (e.g., a cubic spline) on
this vector V(0) to compute V(0, x(0)). Clearly, such an interpolator should
be at least second-order accurate to avoid interfering with the overall O(A?)
convergence of the finite difference scheme. Assuming the interpolator is
sufficiently smooth, we can also use it to compute various partial derivatives
with respect to x that we may be interested in. Alternatively, these can
be computed by the same type of finite difference coefficients discussed in
Section 2.2.1. The derivative OV(0,x(0))/Ot — the time decay — can be
picked up from the grid in the same fashion.
Remark 2.2.4. The scheme (2.18) may, without affecting convergence order,
be replaced with
(I-OA,A(t:)) V(ti) = (+ 0 - A) ALA (t41)) Vig).
2.3 Stability
2.3.1 Matrix Methods
Ignoring the contributions from boundary conditions, the finite difference
scheme developed in the previous section can be rewritten
V(t) = BEV (tay), (2.20)
where
Byt? S (1-04, A (ti*7(0)))~ * (E+ (1-8) ALA (ti*4(0))).2.3 Stability 53
That is, for any 0S k Miltaje i,
L
where H;(t;,) and w are the amplification factor (discrete Fourier transform)
and wave number for the /-th mode, respectively. Notice that i here denotes
©The spectral norm of a matrix C is defined as the largest absolute eigenvalue
of (C’C)'/?. ‘Phe infinity norm is defined as max; }); |Ci,y|-
7In the application to PDEs with non-constant coefficients, it may help to
think of the von Neumann analysis as being applied to the PDE locally with
“frozen” coefficients, followed by an examination of the worst case among all frozen
coefficients.54 2 Finite Difference Methods
the imaginary unit, i? = —1, with k (momentarily) having taken the role of
the time index in the finite difference grid. For the constant coefficient case,
a key fact for our PDE problem is that
Hi(te) = Hiltes)&r"s
where & is a mode-specific amplification factor independent of time. To
determine how a solution is propagated back through the finite difference
grid, it thus suffices to consider a test function of the form
(ty sj) = E(w) Rete (2.22)
According to the Von Neumann criterion, stability of (2.20) requires that the
modulus of the amplification factor E(w) is less or equal to one, independent
of the wave number:
Vw : |E(w)| <1. (2.23)
This criterion is natural and merely expresses that all eigenmodes should be
dampened, and not exponentially amplified, by the finite difference scheme.
Turning to our system (2.20), assume for simplicity that r(t,«) =0. A
positive interest rate (we will nearly always have r(t,%) > 0) introduces
some extra dampening through discounting effects and will, if anything, lead
to better stability properties than the case of zero interest rates. Writing
v([Link]) = veg, O(tET (8), ay) = oe, and w(t{**(9),23) = jue, the von
Neumann analysis gives the following result:
Proposition 2.3.1. Define a = A,/(Az)*. For (2.20) with r(t,x) = 0, the
von Neumann stability criterion is
2
Chg
; (2.24)
mle
Nok g + hg At + [125 Az = 08.5
to hold for all k =0,1,...,n—1, j= 1,2,....m.
Proof. Define sj, = o7,; + Arus,j- A local application of (2.20) gives
ad _ ad
Uk j-1 (-Sa,) + vg (1 + aO0% 5) + Ue 544 (-¥st,) =
a(1—@) _ 3 1-@
deans (7s) +ups1(l—a(L- op) Hee et (7)
with a defined above. Inserting (2.22) and rearranging (using Euler’s formulas
for sin and cos) yields
1— (1 —A)aoZ (1 — coswAg) + i(1 — AoAg py, SinwAy
1+ dao ;(1 — coswAz) — Bas j1x,j 8inwAy .
Ew) =2.3 Stability 55
Note that € is a function of k and j, due to the non-constant PDE parameters.
As discussed earlier (see also Mitchell and Griffiths [1980]), we expect. the
system to be stable if the criterion (2.23) holds for all k and j in the grid.
Computing the modulus of € and requiring that it does not exceed one leads,
after straightforward manipulations, to the stability criterion
Ve: 2aog,; + (26 1)0? fof, + ui AZ + coswAs (uz gAz — o%,;)] 2 0.
As coswA, € [-1, 1], this expression can be simplified to (2.24). 0
From (2.24) we can immediately conclude that the finite difference scheme
is always stable if 3 <@ <1, irrespective of the magnitudes of A, and A;.
For 3 < @ <1, we therefore say that the theta scheme is absolutely stable,
or simply A-stable. Both the fully implicit (@ = 1) and the Crank-Nicolson
(6 = 4) finite difference schemes are thus A-stable. For the explicit scheme
(@ = 0), however, stability is conditional, requiring
2 2 4 2 2 2 2 4
Fok g 2 Thy + MRA + [Meg AE — oh |
For small drifts, this expression amounts to the restriction oR, < a2 /A,
which can be quite onerous, often requiring the (laborious) use of thousands
of time steps in the finite difference grid. We shall not consider fully explicit
methods any further in this book.
Returning to the case } < @ <1, let us introduce a stronger definition
of stability. A time-stepping method is said to be strongly A-stable if the
modulus of the amplification factor € is strictly below 1 for any value of the
time step, including the limit® A, + oo. From (2.24), we see that if A, + 00
(which implies a — 00), then the modulus of the amplification factor could
reach 1 in the special case of @ = 1/2. In other words, the Crank-Nicolson
scheme is not strongly A-stable. For large time steps, harmonics in the Crank-
Nicolson finite difference solution will effectively not be dampened from
one time step to the next, opening up the possibility that unwanted high-
frequency oscillations can creep into the numerical solution. In practice, this
is primarily a problem if high-frequency eigenmodes have high amplification
factors, as can happen if there is an outright discontinuity in the terminal
value function g. The problem is especially noticeable if the discontinuity in
the value function is “close” in both time and space to t = 0 and x = (0)
(as would be the case for a short-dated option with a discontinuity close to
the starting value of x). Oscillations can be prevented by setting the time
step smaller than twice the maximum stable explicit time step (see Tavella
and Randall [2000]), but this can often be computationally expensive. We
shall deal with other methods to suppress oscillations in Section 2.5.
We conclude this section by noting a deep connection between the
stability of a finite difference scheme and its convergence to the true solution
if further |é| approaches zero for At —> 0, the scheme is said to be L-stable.56 2 Finite Difference Methods
of the PDE as A; > 0 and A, — 0. First, we define a finite difference
scheme to be consistent if local (Taylor) truncation errors approach zero
for A, + 0 and A, — 0. All the schemes we have encountered so far are
consistent. Further, define a finite difference scheme to be convergent if the
difference between the numerical solution and the exact PDE solution at
a fixed point in the domain converges to zero uniformly as A; > 0 and
A, —> 0 (not necessarily independently of each other). We then have
Theorem 2.3.2 (Lax Equivalence Theorem). For a well-posed® linear
terminal value PDE, a consistent 2-level finite difference scheme is convergent
if and only if it is stable.
A more precise statement of the above result, as well as a proof, can be
found in Mitchell and Griffiths [1980].
2.4 Non-Equidistant Discretization
In practice, we often wish to align the finite difference grid to particular
dates (e.g., those on which a coupon or a dividend is paid) and particular
values of x (e.g., those on which strikes and barriers are positioned). Also,
for numerical reasons we may want to make certain important parts of the
finite difference grid more densely spaced to concentrate computational effort
on domains of particular importance to the solution of the PDE. To do so,
we will now relax our earlier assumption of equidistant discretization in
time and space. Doing so for the time domain is actually trivial and merely
requires us to replace A; in (2.18) with Ay; © ti41 — ti, where the spacing
of the time grid {t;}/-p is now no longer constant. The backward induction
algorithm can proceed as before. We note that the ability to freely select
the time grid will allow us to line up perfectly with dates that carry high
significance for the product in question (e.g. dates on which cash flows take
place, see Section 2.7.3) or to, say, use coarser time steps for the part of the
finite difference grid that is far in the future. For an adaptive algorithm to
automatically select the time-step, see d’Halluin et: al. [2001].
For the spatial step, we have a number of options to induce non-
equidistant spacing. One method involves a non-linear change of variables
y = h(x) in the PDE, followed by a regular equidistant discretization in
the new variable y. This maps into a non-equidistant discretization in x
which, provided that h(-) is chosen carefully, will have the desired geometry.
Discussion of this method along with guidelines for choosing h(-) can be
found in Chapter 5 of Tavella and Randall [2000]. We will here pursue a more
direct alternative, where we simply introduce an irregular grid {aj }7™"
*Well-posed means that the PDE we are solving has a unique solution that
depends continuously on the problem data (PDE coefficients, domain, boundary
conditions, etc.)2.4 Non-Equidistant Discretization 57
and redefine the finite difference operators (2.5)~(2.6) to achieve maximum
precision. For this, define
+ a ~ =
Ar; = tj41- 23, AL; = Uj—1s
and set
_ Vtej41) — Vas)
t,2j) —
6fV (taj) = ar (ty
25
ro Vv,
» OV (te) =
By a Taylor expansion, we get
OV (t,x 1&V(t,
82V 623) = WE), SO at,
1PV(t.e 2
pein) 2) (43,)"+0((45,)"), (2.25)
_ OV(t,2;) 10?V(t,23) ._
45 V bas) = Oe aa,
10°V(t,2;) ,,— \2 ~ 3
+ gar (425) +0((43,)°). (2.26)
Maximum accuracy on the first-order derivative approximation is achieved
by selecting a weighted combination of (2.25)-(2.26) such that the terms of
order O(A?,) and O(A; ,) cancel. That is, we set
52V (t, 23) = Ae sty 15) A acvee xj) (2.27)
V(t, 25) = eV (td V(t, 9 .
At, +45; c a 7 x ‘J
24+ — \2
— W(t 05) |g ( (Ady) ey + (Aes) 455
Ox At; +45;
which is second-order accurate, in the sense that reducing both A}; and
Az, by a factor of k will reduce the error by a factor of k?. To estimate the
derivative 0°V(t,2r,)/Ox? we set
5¢V (t,x) - Jz V(t,25)
SeeV (t, 2j) = 2
nai 3 (AE; + 425)
2 — \2 3 — \3
_ PV (23) , 9 ( (Az) — (Aey) , (Azs) + (Asy)
Ox? At, +47; At ,+ Az;
(2.28)
which is only first-order accurate, unless At, = Az. Despite this, the
global discretization error will typically remain second-order in the spatial
step, even for a non-equidistant grid. A proof of this perhaps somewhat58 2 Finite Difference Methods
surprising result can be found in the monograph Axelsson and Barker [1991]
on finite element methods.
Development of a theta scheme around the definitions (2.27) and (2.28)
proceeds in the same way as in Section 2.2. The resulting time-stepping
scheme is identical to (2.18), after a modification of the matrix A. Specifically,
we must simply redefine the c-, u-, and l-arrays in (2.7)-(2.9) as follows:
= olt, 25)" — v(t, 23), (2.29)
A; 42; 7
A.
uj(t) © q(t, 05) + alt, 5), (2.30)
(45; +475) 42; (43+ 47; ) Az;
At 1
L(t) £4 -————) ry 2
0 * aay ae) + GE aay an
(2.31)
For an example where having a non-equidistant grid is essential to the
numerical performance of the scheme, see Section 9.4.3.
2.5 Smoothing and Continuity Correction
2.5.1 Crank-Nicolson Oscillation Remedies
As discussed earlier, for discontinuous terminal conditions, the Crank-
Nicolson scheme may exhibit localized oscillations if the time step is too
coarse relative to the spatial step. Depending on the timing and spatial
position of the discontinuities, these spurious oscillations may negatively
affect the computed option value or, more likely, its first (“delta”) or second
(“gamma”) x-derivatives. Further, in the presence of discontinuous terminal
conditions, the expected O(A?) convergence order of the Crank-Nicolson
scheme may not be realized. While O(A?) convergence is possible without
spurious oscillations in some multi-level time-stepping schemes, there is
evidence that these schemes are less robust than the Crank-Nicolson scheme
for many financially relevant problems, see, e.g., Windcliff et al. [2001]. For-
tunately, it is relatively easy to remedy the problems in the Crank-Nicolson
scheme. Specifically, a theoretical result by Rannacher [1984] shows that
second-order convergence can be achieved for the Crank-Nicolson scheme,
provided that two simple algorithm modifications are taken:
© The discontinuous terminal payout is least-squares (L?) projected onto
the space of linear Lagrange basis functions!°.
1Recall that the linear Lagrange basis functions (also called “hat” functions)
are simply small triangles given by lj(x) = l{z,_1<2<2j}* a jo
+12, H},
where the level H (the digital strike) is located between nodes x, and60 2 Finite Difference Methods
x41. For nodes 2;, 7 > k+ 1, clearly V(T,2;) = 1; for nodes x;, j < k,
V(T, xj) = 0. The smoothing algorithm will have effect only at xg or Ze41,
and will set either V(T,z,) or V(T, 2441) to a value somewhere between 0
and 1, depending on which of xg or xg41 is closest to H. If H happens to
be exactly midway between a, or £441, the continuity correction is seen to
have no effect whatsoever.
The digital option example above gives rise to a method listed in Tavella
and Randall [2000] (see also Cheuk and Vorst {1996]). Here, we simply arrange
the spatial grid such that the x-values where the payoff (or its derivatives)
is discontinuous are exactly midway between grid nodes. If necessary, we
can use a scheme with non-equidistant grid spacing to accomplish this (see
Section 2.4). Our example above shows that aligning the grid in this way
will, in a loose sense, make the payoff smooth.
For digital options, the grid shifting technique can be very efficient,
and such “locking” of the location of strikes and barriers relative to the
spatial grid can often reduce odd-even effects even better than the continuity
correction discussed earlier. To demonstrate, consider the concrete task of
using a finite difference grid to price a digital call option on a stock S$ in the
Black-Scholes model. In this case, we conveniently have a theoretical option
price to compare against, since it is easily shown that the time 0 value V(0)
must be
vo) = In(S(0)/H) + (r 0/2)
“TQ (S(T) > H)=e77® (
(2.33)
For our numerical work, we discretize the asset equidistantly in log-space
(i.e., we work with the PDE (2.3)) and determine the spatial grid boundaries
by probabilistic means using a multiplier of a = 4.5, see Section 2.1. Spatial
boundary conditions are 9V/Ox = 0?V/Ax?, implemented as described
in Section 2.2.2. In one experiment, we apply a straight Crank-Nicolson
approach, with no attempt to regularize the payoff condition. In a second
experiment, we combine Crank-Nicolson with Rannacher stepping and also
nudge the entire spatial grid upwards until the log-barrier In(H) is located
exactly half-way between two spatial grid points. Numerical results are
shown in Figure 2.1.
As Figure 2.1 shows, a naive Crank-Nicolson implementation is plagued
by severe odd-even effects and very slow convergence — 100’s of spatial
steps appear to be necessary before acceptable levels of the option price
are reached. On the other hand, grid shifting combined with Rannacher
stepping results in a perfectly smooth!! convergence profile, and 5-digit
price precision is here reached in less than 30 steps.
11 can be verified that the convergence order in m is, as expected, close to 2
in this experiment.2.6 Convection-Dominated PDEs 61
Fig. 2.1. 3 Year Digital Option Price
— Grid Shifting
0.30 —* Straight Crank-Nicolson
0.25 + +
5 1S 25 35 45,
Grid Points in Asset Direction (m )
Notes: Finite difference estimates for the Black-Scholes price of a 3 year digital
option with a strike of H = 100. The initial asset price is S(0) = 100, the interest
rate is r = 0, and the volatility is o = 20%. ‘lime stepping is performed with an
equidistant grid containing n = 50 points. Spatial discretization in log-space is
equidistant, as described in the main text; the number of grid points (m) is as
listed on the a-axis of the figure. The “Straight Crank-Nicolson” graph shows the
convergence profile for a pure Crank-Nicolson finite difference grid. ‘The “Grid
Shifting” graph shows the convergence profile for a Crank-Nicolson finite difference
grid with Rannacher stepping and a shift of the spatial grid to center In(H)
midway between two grid points. From (2.33), the theoretical value of the option
is 0.4312451.
2.6 Convection-Dominated PDEs
Recall from Section 2.3 that stability of the explicit finite difference scheme
requires that (omitting grid subscripts on y and o)
247 2 A 2 Ad 2 A2 4
a, ° Dott pw? + [pA? - 04.
As discussed, this condition can be violated if A; is too large relative to
A,. However, for fixed A; and A, we notice that instability can also be
triggered if the absolute value of the drift jz is raised to be sufficiently large
relative to the diffusion coefficient .
While theta schemes with @ > 1/2 are always stable, large drifts in the
PDE can nevertheless cause spurious oscillations and an overall deterioration
in numerical performance of these schemes. PDEs for which this effect62 2 Finite Difference Methods
occurs are said to be convection-dominated. To quantify matters, assume
for simplicity that the finite difference grid is equidistant in the x-direction,
and consider the matrix A in (2.11) with tri-diagonal coefficients c, u, and 1
given by (2.7)-(2.9). As discussed in e.g. d’Halluin et al. [2005], spurious
oscillations can occur when, for some t and some j, either uj(t) < 0 or
1;(t) < 0. From (2.8) and (2.9), to avoid spurious oscillations we would thus
need
alts)? > |u(t,«;)[Ae. (2.34)
Intuitively, in convection-dominated systems, the central difference coefficient
6, and d;¢ used to discretize the PDE can no longer fully contain the large
expected up- or downward trend of the underlying process for 2; as a result,
spurious oscillations can occur.
2.6.1 Upwinding
There are a number of well-established techniques to deal with convection-
dominated PDEs. First, we can obviously attempt to lower A, such that
(2.34) is satisfied. This, however, may not be practical from a computational
standpoint (and may require that A; is lowered as well to avoid spurious
oscillations originating from the time-stepping scheme). An alternative is
to modify the first-order discrete operator 6, such that it points in the
direction of the large absolute drift. For instance, we can simply elect to
use a suitably oriented one-sided difference, rather than a central difference,
whenever (2.34) is violated. This procedure is known as upstream differencing
or upwinding. To formalize the idea, introduce a new first-order difference
operator 5% given as
3 (V(t, 541) —V(tyaj-1)) Az, [a(t ary) Ae S a(t, 25)?,
52V(t, 23) = 4 (V(t, 23) — V(t, 23-1) Az!, w(t, 2;)A, < —o(t, x;)?,
(V(t, c541) — V(t,2j)) Apt, u(t, x;)Ag > a(t, x3).
Using 65% instead of 5, modifies the matrix A in (2.11). Specifically,
if u(t,xj)A, < —a(t, xj)? we replace (2.7)-(2.9) with:
ej(t) = w(t, 2s) AZ! - oft,25)?Az? —r(t, a5), (2.35)
1
u;(t) = 50 (tay) Az? (2.36)
-1,1 2 A-2
y(t) = —wlt,25) Ag! + 5o(t,25)7Az?. (2.37)
And when p(t,2;)Az > o(t,x;)?, we use
oj(t) = —n(t, xj)! — o(t, 3)? Az? — r(t, a5), (2.38)
-1,1 -
uj(t) = y(t, 23)AZ* + 37a) Az”, (2.39)
L(t) = sat )2Az?, (2.40)2.7 Option Examples 63
For non-equidistant grids, a similar modification to (2.29)-(2.31) is required.
We omit the straightforward details.
Let us try to gain some further understanding of the upwind algorithm.
Comparison of (2.35)-(2.40) with (2.7)-(2.9), shows that upwinding amounts
to using a regular central difference operator 6, on a PDE with a diffusion
coefficient modified to be o(t,z) + /|u(t,x)|Ax. The numerical scheme in
effect introduces enough artificial diffusion into the PDE to satisfy (2.34).
Doing so, however, comes at a cost: the convergence order of the scheme will
be reduced to O(A,) if one-sided differencing ends up being activated in a
significant part of the grid. We note that higher-order upwinding schemes
are possible if the finite difference operator 5% is allowed to act on more than
three neighboring points. For such schemes, the matrix A will no longer be
tri-diagonal.
2.6.2 Other Techniques
As discussed earlier, upwinding amounts to adding numerical diffusion at
nodes where the scheme is convection dominated. Alternatively, we can
increase a(t, x) directly, to o(t,x) + € where ¢ is chosen to be large enough
for the scheme to satisfy (2.34). By solving the resulting PDE for different
values of ¢, it may be possible to determine how the error associated with €
scales in e. This, in turn, will allow us to extrapolate to the limit ¢ = 0. See
p. 135 of Tavella and Randall [2000] for an example.
The upwinding scheme presented in Section 2.6.1 switches abruptly from
central differencing to one-sided differencing when the condition (2.34) is
violated. In some schemes, the switch from central to one-sided differencing
is made smooth by using a weighted average of a one-sided and a central
difference operator. The weight on the central difference is close to one when
o(t,x)? > |u(t,x)|A2, but decreases smoothly to zero as o(t, 2)? /|u(t, x)|
tends to zero. While it is unclear whether a smooth transition to upwinding
is truly important (the convergence order is typically not improved over
straight upwinding), Duffy [2000] suggests that the class of exponentially
fitted schemes (see Duffy {2000} and Stoyan [1979]) may be quite robust in
derivatives pricing applications.
In some finance applications, multi-dimensional PDEs might arise where
o(t,x) =0 for one of the underlying variables; see for instance Section 2.7.5.
While upwinding techniques still apply here, we note that specialized methods
exist with better (O(A?)) convergence, should they become necessary. See, for
instance, d’Halluin et al. (2005] for details on the so-called semi-Lagrangian
methods.
2.7 Option Examples
In our discussion so far, we have assumed that options are characterized
by a single terminal payoff function g(x) and a set of spatial boundary64 2 Finite Difference Methods
conditions determining the option price at the boundaries of the x-domain.
In reality, many options are more complicated than this and may involve
early exercise decisions, pre-maturity cash flows, path dependency, and more.
In this section, we provide some relatively straightforward examples of such
complications and how to modify the basic finite difference algorithm to deal
with them. More examples will be provided later, in the context of specific
fixed income securities.
2.7.1 Continuous Barrier Options
We have already touched upon the concept of an up-and-out knock-out
option, an option that expires worthless if the x-process ever rises above a
critical level H. As we described, we here must simply solve the PDE (2.1)
on a domain [M, H], where M represents the lowest attainable value of the
process x(t) on [0,7]. The boundary condition at the upper boundary is
then dictated to be V(t, H) = 0, ie. of the Dirichlet type. We can generalize
this to allow both “up” and “down” type barriers, and to perhaps give a
non-zero payout (a “rebate”) at the time the barrier(s) are hit (provided this
happens before the option maturity). Specifically, if we have a lower barrier
at H, an upper barrier of H, a time-dependent lower rebate function of f(t),
and a time-dependent upper rebate function of f(t), we must dimension our
spatial grid {2j}5 to have ro =H, tm41 = H, and we then simply impose
the Dirichlet boundary conditions V(t, co) = f(t) and V(t,¢m+1) = F(t).
See (2.10) and the definition of 9 for the algorithm required to incorporate
such Dirichlet boundary conditions into the finite difference scheme.
In practice, barrier options sometimes involve time-dependent barriers,
possibly with discontinuities. For instance, step-up and step-down barrier
options will have piecewise flat barriers that increase (step-up) or decrease
(step-down) at discrete points in time. Extension of the finite difference algo-
rithm to cover step-up and step-down options is relatively straightforward.
As an illustration, consider a zero-rebate up-and-out single-barrier option
where the (upper) barrier is flat, except for a discontinuous change at time
T* H*. We set the z-domain of our finite difference grid to
a € [M,H], with M a probabilistic lower limit, as defined above; accord-
ingly, our spatial grid would be {aj }7"G', where x = M and tm4i = H. In
preparation for the shift in barrier levels at time T*, we make sure that one
level in the spatial grid — say 2,41, k < m, — is set exactly at the level H*.
Similarly, we make sure that one level in the time grid is set exactly to T*.
Starting at time T, we then iterate backwards in time by repeated solution
of m-dimensional tri-diagonal systems of equations, at each step integrating
a prescribed rebate function by supplying the Dirichlet boundary condition
V(t,’m+1) = 0. The moment we hit T*, the PDE now only applies to the
smaller region (M, H*], covered by the reduced spatial grid {«;}4*} with2.7 Option Examples 65
p41 = H*. From T* back to time 0, the backward induction algorithm
then involves only k-dimensional tri-diagonal systems of equations, with
the Dirichlet boundary condition V(t, r4.41) = 0. Spatial nodes above p41
correspond to zero option value and can be ignored”. Modification of the
algorithm outlined above to handle more than two barrier discontinuities is
straightforward.
We can extend our definition of barrier options even further by making
the topology of “alive” and “dead” regions more complicated. At time t,
assume for instance that the PDE applies in an “alive” region of x € L(t) and
a rebate function R(t, x) that applies in the “dead” region D(t) = B\L(t).
Assume that we discretize the problem on a single rectangular fit finite difference
grid spanning the spatial domain {M,M], where M and M are set such
that the alive regions are covered, up to probabilistic limits (if necessary).
Given option values at time t;41, we then only need to run the basic matrix
equation (2.18) for values in our grid {«;} that lie inside L(t;). This requires
scaling down the dimension of the matrix A as needed, and providing the
relevant boundary conditions (given through R(t;,7;)) at the boundary (or
boundaries) of L(t;). The parts of the spatial grid that lie outside of L(t,)
can be directly filled in with values provided by the rebate function R. Notice
that, if possible, the spatial grid should be set such that the boundaries
of L(t;) are contained in the mesh; this will likely require us to use the
techniques outlined in Section 2.4.
If the alive region has the simple form L(t) = [a(t), A(t)] for smooth
deterministic functions a and £, an alternative to the scheme above is to
introduce a time-dependent transformation that straightens out the barriers,
allowing us to return to the standard finite difference setup where the PDE
applies to a single rectangular (t,2) domain. One possible transformation
involves using a spatial variable of
x — a(t)
B(t) — a(t)’
which transforms the curved x-barriers a(t) and A(t) into flat y-barriers at
y = 0 and y = 1, respectively. The linearity of the transformation (2.41)
makes it easy to work with; see Tavella and Randall [2000] for details and
a discussion of extensions to multi-dimensional PDEs and to barriers with
discontinuities.
y=y(t,2) = (2.41)
12 An obvious twist to the algorithm involves using different spatial grids over
[0,7] and [2", 7], allowing for more flexibility in node placement. In this case,
values computed by backward induction must, at time 7, be interpolated from one
z-grid to another. The interpolation rule should be at least third-order accurate;
see the discussion in Section 2.7.3.66 2 Finite Difference Methods
2.7.2 Discrete Barrier Options
The barrier options considered in Section 2.7.1 are continuously monitored,
in the sense that the barrier condition is observed for all times in a given
interval. In practice, monitoring the barrier condition continuously can be
impractical, and it may instead only be imposed on a discrete set of dates
T, < Tz <...< Tx, with Te 0. For the sake of concreteness,
let us consider a discretely monitored up-and-out option with a constant
barrier H. For a continuously monitored up-and-out barrier option it would
suffice to solve the PDE on a domain x € [M, H], where M is a probabilistic
lower limit. This is, however, no longer the case for a discretely monitored
option where we need to allow the value function to “diffuse” above the
barrier levels between dates in the monitoring set {Ti }{<,. To allow for this,
we discretize the PDE on a larger domain x € [M,M], M > H. We can
determine M probabilistically by determining a confidence interval for how
far above the barrier x(t) can rise between monitoring dates. For instance,
for the Black-Scholes PDE (2.3), assume that max;=9,...,.« (Tk —Tk-1) = Ar.
Conditioned on x(t) = H, the probability that x(t + Ar) exceeds
La =H+ (« - 5”) Ar +a0V/ Ar
is (—a). As in Section 2.1, we recommend setting 17 = xq for values
of a somewhere between 3 and 5. To properly capture diffusion between
barrier observation dates, we should also dimension the time grid of the
finite difference scheme such that multiple time steps (at least two or three,
say) are taken between observation dates. All observation dates {Tk}{Q,
should obviously be contained in the time grid.
Between barrier observation dates, we solve our PDE by the standard
finite difference algorithm outlined in Section 2.2.4, as always imposing
cither an asymptotic Dirichlet condition at « = MW or a condition on the
a-derivatives of the value function. At each barrier observation time T,,, we
must impose a barrier jump condition
V(Te-,2) =V(Tet,2) ecu, REL Ke, (2.42)
where the notation T,+ was introduced in Section 1.10.1 to denote the limit
T, +e for ¢ | 0. This merely states that all values V(Th,a:) are zero for
a > H, consistent with the definition of an up-and-out option. In our finite
difference scheme, we incorporate this jump condition by simply interpreting
the vector V(T,) as found by regular backward induction as V(7,+) and
then replacing
a ~ A T
V(T.+) = (Vile)... Fn(Te))
with2.7 Option Examples 67
~ ~~ ~ T
V(Tk-) = (Tet oycrrys+s Pm(Te+)Ux,, <1)
before continuing the algorithm backwards from Tp.
The jump condition (2.42) will generally produce a discontinuity in V
as a function of x, around the barrier level H. If we use Crank-Nicolson
time-stepping, it will then be prudent to employ a fully implicit scheme for
the first few backwards time steps (Rannacher stepping) past each barrier
observation date T,. As discussed in Section 2.5, ideally this should be
combined with a smoothing algorithm acting on V(Z_—) or, perhaps more
conveniently, a shift of the spatial grid such that A lies exactly mid-way
between two spatial nodes in the grid.
We round off by noting that the discussion above for an up-and-out
option easily extends to more complicated discrete barrier options, including
those with time-varying barrier levels and rebates. For instance, assume that
an option involves upper and lower time-varying barriers of H(t) and H(t),
respectively, as well as a time- and state-dependent rebate of R(t, x). In this
case, we simply replace the jump condition (2.42) with
V(i-.2) = V (Tet. 2) ry cxeH(t)}
+ R(T, ©) (eesti + leecner.))) ,
and otherwise proceed as above. We note that time-dependent barriers
will typically require flexibility in setting the spatial grid, as there are now
multiple critical x-levels to consider. The discretization in Section 2.4 can
obviously assist with this.
2.7.3 Coupon-Paying Securities and Dividends
Many fixed-income securities are coupon-bearing and involve periodic trans-
fer of a cash amount between the buyer and the seller. This can easily be
incorporated into a finite difference grid, through a jump condition. Specifi-
cally, consider a security that pays its owner a single cash amount of p(T*, x)
at time T* R. We
dimension our time grid such that T* is contained in the grid, and then
apply at time T* the condition
V(T*~,2) = V(T*+,2) + r(T*, (2.43)
This simply expresses that V will decrease by an amount p immediately
after p is paid (and thereby no longer contained in V). In a finite difference
algorithm, (2.43) is incorporated by replacing V(T*+), as found by regular
backward induction, with
oa a a T
vrr-) = (Aas) 4 p(T*, 1) y-.-; Vm (T*+) +(T",2m))68 2 Finite Difference Methods
before continuing the algorithm backwards from T*. Extensions to multiple
coupons are trivial.
In some cases a derivative security does not itself pay coupons, but is
written on a security that does. This involves no particular complications,
except for the case where payments may affect the state variable underlying
the PDE. For instance, consider the classical case of a stock paying a
dividend: at the time of the dividend payment, the stock jumps down by
an amount equal to the dividend payment. For a model that uses the stock
price (or a transformation of the stock price) as the state variable x, a
dividend payment at time T* would thus be associated with a discontinuity
in the state variable, 2(T*+) = x(T*—) ~ d(T*,x(T*—)), where d is the
magnitude of the jump!’. As long as the dividend-payment does not come
as a surprise (ie., at a random time), it must already be contained into the
option price at T*~, and will have no price effect as we move forward from
T*~ to T*+. We can express this continuity restriction through yet another
jump condition
V(T*-,2) =V(T"+,2-d(T*,2)). (2.44)
See Wilmott et al. [1993] for more discussion. Implementation of (2.44) in
a finite difference grid proceeds as follows. First, we use regular backward
induction to establish
V(T*+) = (ar ys Pm(4))"
= (V(r +21), PT*+,2%m)) .
Then we write
V(T*-) = (¥ (T*+,2, —d(T*,21)),...,0 (T* 4.2m — a(T*,2%m)))
The values V;(T*+, R,
finite difference grids are ideal for pricing of such securities. Let us first
consider a Bermudan option with exercise opportunities restricted to the
finite set {Z,}{,. The Bellman principle (1.67) in Section 1.10 can, as
shown there, be expressed as a simple jump condition
V(Tp—,) = max (V (Th+,2),h(Tk,2)), k=1,...,K, (2.45)
which can be incorporated into a finite difference solver precisely the same
way as in previous sections. The condition (2.45) will result in a kink in
the value function around the level of x at which we shift from the hold
region into the exercise region. If Crank-Nicolson time-stepping is used, one
should ideally apply smoothing on the finite difference value vector V(T*—),
particularly around the kink.
If exercise can take place continuously (that is, American-style) on a
given time interval, a crude way to incorporate this into a finite difference
grid is by simply applying (2.45) to every point in the time grid of the
finite difference scheme. By not specifically imposing the partial differential
inequalities (see Section 1.10.1), this algorithm, however, will generally only
be accurate to first order in the time step, even if a Crank-Nicolson scheme
is used; see Carverhill and Clewlow {1990} for a proof. As American-style
exercise is rarely used in fixed income markets, we shall not pursue this issue
further but just point out that a number of schemes exist to restore second-
order time convergence to finite difference pricing of American options, see,
e.g., Forsyth and Vetzal [2002].
2.7.5 Path-Dependent Options
Finite difference methods are normally limited to Markovian problems
where dynamics are characterized by SDEs and where payouts are simple
deterministic functions of the underlying state variables. A number of options,
however, have terminal time T payouts that depend not only on the state
of x at time T, but on the entire path {x(t), t € (0, 7]}. In general, such
options must be priced by Monte Carlo methods (see Chapter 3), but
exceptions exist. Indeed, barrier and American options can be considered
path-dependent options, yet, as we have seen, can still be priced in a finite
difference grid. Even stronger path-dependence can sometimes be handled,
through the introduction of new state variables to the PDE.
To give an example, consider a path-dependent contract whe:
terminal payout at time T can be written as
> the
441f h represents the value of a derivative security that has no closed-form pricing
formula, it may be necessary to estimate this function by backward induction in the
finite difference grid itself. Such a “preprocessing” step is typically straightforward
to execute.70 2 Finite Difference Methods
V(T) = 9 (a(T),1(7)), (2.46)
where J is a path integral of the type
t
I(t) = [ h(a(s)) ds, (2.47)
0
for some deterministic function h. For instance, if h(x) = x, we say that the
option is a continuously sampled Asian option.
For the payout (2.46) we have V(t) = V(t, x(t), I(t)) where x(t) satisfies
the SDE (2.2) and
dI(t) = h(a(t)) dt, I(0) =0.
From the backward Kolmogorov equation, it follows that V(t, z, I) solves
x + ult, at + So(t,2) Se V + h(a) ie =r(t2)V, (248)
subject to the terminal condition V(T,2z,I) = aan There are several
complications with this PDE. First, it involves two spatial variables, x and
I, requiring the use of a two-dimensional PDE solver. Second, the PDE
contains no second-order derivative in the variable J, i.e. it is convection
dominated in the I-direction. We have discussed methods to handle the
latter issue in Section 2.6.1 and will turn to address the former in Section
2.9. Another complication is the fact that the term h(x) multiplying 0V/OI
may be of a different order of magnitude than the other coefficients in (2.48),
increasing the difficulty of solving the equation numerically. We refer to
Zvan et al. [1998] for a more detailed discussion of PDEs of the type (2.48).
In practice, it is rare that a continuous-time integral such as (2.47) is
used in an option payout. Instead, one normally samples the function h(x(t))
only on a discrete set of dates, i.e. we replace I(T) with
n
UL) = So h(a(T)) (Ts -T-1),
i=l
where Typ < T; <... < Tj, is a discrete schedule, with Ty = 0 and T, = T.
Informally, we now have
aI(t) = 6(T; - t)-h(a(Ti)) (Ti-Ti-1), 1(0) = 0, (2.49)
where 6(-) is the Dirac delta function. In a PDE setting, we incorporate a
process such as (2.49) through appropriate jump conditions, writing
V(Tj-,2, 1) = V(Tit, 2,1 + h(x) (T; -Ty-1))- (2.50)
In the same fashion as for discrete dividends (Section 2.7.3), the jump
condition enforces continuity of the option price across the dates where [2.7 Option Examples 7
gets updated. The condition is applied at each date in the discrete schedule,
i =1,...,n; in between schedule dates (where now dI(t) = 0), we solve the
etal av av eV
= t. = V,
ae tolisalSe + Solty LE = r(t,aNv,
which has no term involving J. When the J-direction is discretized in, say,
my different values, the solution scheme thus involves solving m, different
one-dimensional PDEs backward in time; the solutions of these my PDEs
exchange information with each other at each date in the schedule, in
accordance with (2.50). As was the case for cash dividends, implementation
of (2.50) will normally require support from an interpolation scheme, to
align the (x-dependent) jumps in J with the knots of the discretized I-grid
used in the finite difference scheme. See, e.g., Zvan et al. [1999] or Wilmott
et al. [1993] for further details. An application of this idea in the context of
interest rate derivatives is given in Section 18.4.5.
On rare occasions — basically when the homogeneity condition
V(nz,XI,t) = X7V(z,1,t), A,n > 0, holds — it is possible to make a
change of variables or a change of probability measure that will reduce
(2.48) or its discrete-time version to a one-dimensional PDE; see e.g. Rogers
and Shi [1995] or Andreasen [1998] for the case of various Asian options.
Section 18.4.5 demonstrates one such method, sometimes called the method
of similarity reduction, for pricing of “weakly path-dependent” securities,
including certain callable interest rate derivatives where the notional accretes
at a stochastic coupon rate (see Section 5.14.5 for definitions).
2.7.6 Multiple Exercise Rights
Certain financial products with early exercise rights allow the holder to
exercise more than once. Such “multi-exercise” options are relatively rare,
but the so-called chooser cap (also known as a flezi-cap) is occasionally
traded and constitutes a good example for describing how to handle multi-
exercise options in a PDE setting. Let there be given a set of L possible
exercise dates, T; < Tz <... < T, and assume that we have the right to
exercise no more than | times, with | < L. Provided that we exercise at time
T,, in a chooser cap we are paid’® ($(T;) — K)+, where S(-) is some interest
rate index and K is the strike. Clearly, we would never exercise at time T;
unless S(7;) > K, but how much larger than K the rate S(T;) needs to be
to trigger optimal exercise is not obvious, and must at least depend on i)
how many of our / exercise opportunities we have already used up at time
T;; and ii) how much value is lost by using (rather than postponing) one of
the remaining exercise opportunities.
18We have ignored a day count scaling constant in the payout. Also, in most
cases payment takes place at time 141, rather than at ‘/j; such a payment delay
can be handled by a discount operation.72 2 Finite Difference Methods
While the question of how to exercise optimally on a chooser cap may
appear quite complex, it is surprisingly easy to implement in a finite difference
setting by combining techniques from Sections 2.7.4 and 2.7.5 above. The
key to the method is to introduce an additional state variable J to keep track
of how many exercise opportunities are left. Assume that all interest rates
are functions of a Markov state variable x(-), and let therefore V(t, x, I)
denote the value of the chooser cap at time ¢, given x(t) = x and given that
there are still J exercise opportunities left. Notice that the variable J can
only take 1 +1 distinct values: 0,1,...,0; notice also that V(t,x,0) =0 for
all t and 2, since J = 0 corresponds to the situation where there are no
exercise opportunities left. Additionally, at the terminal time T;, we clearly
have
V(Tr, 2,1) =(S(TL,e) — K)*, 1 =1,2,.0.51, (2.51)
where we have written S(T;) = S(T,,x) to emphasize the deterministic
dependence of S on the state variabl
For given dynamics of x(t), starting with the terminal conditions in
(2.51), we may roll the / different value functions V(-,x, J), J = 1,2,...,l,
back through time in standard finite difference manner. At each time T;,
i=1,...,L—1, jump conditions similar to (2.45) must be applied, for all
1=1,2...,
V(Ti-,@, I) = max (V(Tj+, 2, 1), V(Ti+, x, I ~ 1) + (S(Tj, x) — K)*).
Notice that these conditions simply express that exercise is optimal only if
the exercise value (the cap payout plus the value of a chooser cap with one
less exercise opportunity) exceeds the hold value (the non-exercised chooser
cap). Once we have rolled all the way back to t = 0, the chooser cap value
at time t = 0 may be identified as V(0) = V(0, x(0), 1).
We should note that the “chooser” or “flexi” feature can be added to
securities other than caps (and floors). For instance, in Section 19.5 we study
the so-called flexi-swap, another security with multiple embedded exercise
rights.
2.8 Special Issues
In this section, we briefly show a few techniques that may come in handy
for certain applications.
2.8.1 Mesh Refinements for Multiple Events
As discussed in Section 2.1, the domain of the state variable x is often
determined as an exact or approximate confidence interval for the random
variable x(T’), where T is the final time of interest for a particular valuation
problem we want to solve. Given the number of desired spatial steps in the2.8 Special Issues 73
scheme, the discretization step in z-direction is then obtained by dividing
the size of the confidence interval by the number of steps. Similarly, the
discretization step in t-direction is typically obtained by dividing T by the
number of desired time steps. This is a standard procedure for building a
simple rectangular mesh, and it works well if the derivative we wish to value
does not have any “interesting” features between the valuation time 0 and
the final time T (e.g., for a simple European option). However, as should be
evident from the examples in Section 2.7, many real-life derivative securities
are characterized by a multitude of events during their lifetimes, all of which
must be adequately captured in the PDE scheme. It is not hard to see that
a grid dimensioning scheme based solely on the last event date may yield
inappropriate mesh resolution at earlier dates.
To make the discussion above concrete, let us consider the example of
a Bermudan option (see Section 2.7.4) with two exercise dates, T; and T2.
Assume that 0 < Ty < To, ie. that the first exercise date is much closer to
the valuation date than the second (and last) one. Also assume that there
is a decent chance that the option actually will be exercised at time Tj,
making it important to capture to good precision the value of the option
expiring at T;. Now, if we build our mesh based only on the distribution of
the state variable z(T>) at time To, there would typically be too few t-points
in the interval [0,7]. Also, the x-direction discretization step would be too
large compared to the range of possible values of the state variable x(T,) at
time T), i.e. the z-grid would be too coarse for the process x(-) on the time
interval (0,7;]. Both issues would typically lead to a large discretization
error in the finite difference stepping of the option over the time period
{0,T;], leading to problems with accuracy in values and risk sensitivities.
The issue of the sparsity of the time grid is fairly easy to deal with, as
we are free to add extra points to the time grid before time T;. This by
itself, however, will not solve precision problems, as the space step remains
large. Any proper solution should, of course, come in the form of refining
both the t- and a-grids at the same time.
One possible way of refining the «-discretization is to abandon the usage
of a single rectangular (t,x)-domain, and instead link together different
equidistant rectangular meshes for different periods in the life of the deriva-
tive. These mesh “blocks” would generally increase in spatial width with time
and would connect to each other via an interpolation scheme. To be more
specific, let us assume, as in Section 2.1, that the state variable x(-) is the
logarithm of the stock in the Black-Scholes model and is given by (2.4), with
the PDE to solve given by (2.3). We extend our simple two-period example
above to a derivative with K times of interest, 0 < T; <... < Tx; these
times could be specified as an additional input into valuation, or derived
from the trade description (e.g. they could represent the exercise dates for
a Bermudan option, or the knock-out dates for the discretely-monitored
barrier option of Section 2.7.2). Suppose we are given values of m and n,
and now wish to construct the mesh for the time period [T},_1, Th], by using74 2 Finite Difference Methods
the same time and space steps A¥, AK as would be used in the standard
scheme of Section 2.1 for a derivative security with the terminal payoff at
Tk. That is, having fixed the cutoff a we would set
Ab =T/n, Ak = 2a0V/Tk/(m +1). (2.52)
Then the rectangular, equidistant mesh for the time period [T,~1,T7p] is
given by
Cee aR, Ta HAAR, oh = hag HAE,
(2.53)
where |-| denotes the integer part of a real number and (see (2.4))
hin = (0) + (" - 3”) Th — 00 Tp (2.54)
Note that in reality we would want to make sure that the point TJ), is also
in the mesh for the time period [T},1,7;], even though for simplicity of
notations we did not reflect it in (2.53). It is also useful to note that the
total number of time points is not going to be n, but is actually equal to
K
So [= Tea/Te) n},
k=l
which scales linearly with n. Clearly, if exactly n points were required, a
simple adjustment to the definition of the time step in (2.52) could be
applied.
With a mesh as defined above, when arriving at time Tj, in a backward
induction scheme the solution V(Tj,+) would be discretized on the z-grid
att me a . To solve the PDE backwards over the time period [Ty~1,Tx],
we would need to resample it on the different x-grid {e*}7™5'. As with
interpolation across dividends (Section 2.7.3), simple cubic ” interpolation
would be a good choice here. Specifically, one would fit a cubic spline to
the values V(Tk,2}""), 7 = 0,...,m+ 1, and then calculate V(Tp,2*),
j =0,...,m-+1, by valuing the spline at the required grid points.
The “interpolated mesh” scheme above is rather intuitive and straight-
forward, but it does suffer from the need to do interpolation work that could
slow down the PDE (especially in dimensions higher than 1 and/or for a
large number of interface points K). Also, it is not entirely clear how inter-
polation will affect stability and convergence properties of the PDE. Finally,
linking the interface mesh geometry to the trade specifics (such as exercise
dates) may not be ideal from the point of view of designing an efficient
valuation flow in a risk management system. These considerations lead us
to an alternative approach that relies on non-equidistant discretization as
developed in Section 2.4. The idea of this method is to use non-uniform2.8 Special Issues 75
discretization to concentrate more points, both in time and space, around
the initial point t = 0, = 2(0). Clearly many ways of achieving this are
possible — below we present a simple scheme we have used with good results.
We define K, the user input, to be the number of spatial refinement.
levels (with K = 2 or 3 typically used), and r, another user input, to be a
time scaling constant (typically r = 4). If T is the final horizon for valuation,
we then introduce times
0=%)
k=1 k=L
where
anink ak + jak, 2h = okt 4 7k,
‘6 And, as advised earlier, adding trade event dates that fall into this period —
although we do not reflect this in our notations for simplicity.76 2 Finite Difference Methods
and
m= |= | Re |= = that | _
ar At
This distribution of space points results in an z-grid that is more dense
around the point 2 = (0) than at the edges. It is worth noting that with
only one refinement level K = 1, the standard rectangular uniform mesh
sized by the terminal distribution of the state variable is recovered.
2.8.2 Analytics at the Last Time Step
In cases where the dynamics of underlying PDE variables are tractable,
one naturally wonders whether finite difference methods could somehow be
improved by incorporating analytical results into the scheme. Here, and in
the next section, we discuss two simple ideas.
Suppose that we are faced with the problem of pricing a contingent
claim with terminal boundary condition g(a(T)), where 2(¢) is a Markovian
process with known Arrow-Debreu state prices:
G(t,2;s,y) = E2 (6 (2(s) — y) en FE nwa) de} p(y = 2) , s>t
Assume also that the claim in question involves a jump condition at time
0 < T* < T (but no jump conditions between J* and T). If our finite
difference grid is {xj}, we can now use a series of m + 2 outright
convolutions to compute
V(I*,2;) = [erate g(y)dy, 7 =0,1,...,.m+1. (2.55)
If we are lucky (i.e., if both g and G are sufficiently simple), then the integral
on the right-hand side may be known in closed form for all values of x;. If
not, we can always perform a series of numerical integrations, the total cost
of which is typically!” O(m?), i.e. more expensive than the typical O(m) cost
of a single time step in a finite difference method. There are several reasons
why we may want to perform the numerical integrations nevertheless. First,
the convolution expression (2.55) is exact, as it is based on the true transition
density. Second, if the gap between T* and T is large, an ordinary finite
difference grid would need to roll back from T to T* using multiple time
steps n*, at a total cost of O(n*m); if n* is of the same magnitude as m, the
computational effort of the convolution scheme would be comparable to that
of a finite difference grid. Third, for discontinuous payouts, the integration
in (2.55) will have a naturally smoothing effect, similar to (but often better
than) the continuity correction method of Section 2.5.2. The smoothing
'TThere are exceptions. For instance, if fast Fourier transform (FFT) methods
are applicable, the cost may be reduced to O(m In(m)). See Section 8.4 for details.2.8 Special Issues 17
effect is discussed in more detail in Section 23.2.4 and is also demonstrated
below, in Figure 2.2, where we have continued our investigation of the 3
year digital option considered earlier in Section 2.5.3. Since the model used
in Figure 2.2 is ordinary Black-Scholes and g(a) = 1¢2>1}, the integrals in
(2.55) can here be computed in closed form from (2.33).
Fig. 2.2. 3 Year Digital Option Price
—— Analytical Smoothing
0.30 3 Straight Crank-Nicolson
0.25 + —_—_——.
5 15 25 35 45
Grid Points in Asset Direction (m )
Notes: Finite difference estimates for the Black-Scholes price of a 3 year digital
option. All contract and model parameters are as in Figure 2.1. Time stepping is
performed with an equidistant grid containing n = 50 points. Spatial discretization
in log-space is equidistant, as described in the main text; the number of grid points
(m) is as listed on the a-axis of the figure. The “Straight Crank-Nicolson” graph
shows the convergence profile for a pure Crank-Nicolson finite difference grid. The
“Analytical Smoothing” graph shows the convergence profile for a Crank-Nicolson
finite difference grid starting at 7 = 2.5 years, with the terminal boundary
condition set equal to a 0.5 year digital option price (as in (2.55)). The theoretical
value of the option is 0.4312451.
In principle, we could continue rolling back from T* (through, possi-
bly, jump conditions at earlier times) by performing convolutions, rather
than solving finite difference grids. In practice, this rarely leads to improve-
ments over a finite difference grid, unless the densities and payoffs are quite
simple!§. Moreover, in many cases we may not have exact Arrow-Debreu
18For simple densities (especially Gaussian), special-purpose methods exist to
compute convolutions rapidly, typically involving payoff approximations through
piecewise polynomials or other simple functions. We do not cover these methods in73 2 Finite Difference Methods
prices, only approximate ones based on, say, a small-time expansion (see,
e.g., Section [Link]). In this case, a one-time convolution may be safe
especially if 7 —T* is small — whereas repeated convolutions may lead to
unacceptable biases.
2.8.3 Analytics at the First Time Step
The idea in Section 2.8.2 of replacing the finite difference stepping with
analytical integration is even easier to apply over the first, rather than
the last, time step. Suppose 7 is the first “interesting” time for a given
derivative security, ie. there might be a jump condition at time T* but none
over the time interval {0,7*]. Then, rather than stepping the finite difference
scheme from T* to 0, we can perform a single integration to calculate the
value V(0,x(0)) of the derivative at time zero from the discretized values
{V(T",x5)}""5' of the derivative at time T* (using the same notations as
in Section 2.8.2),
0);T*,y) V(T*,y) dy,
V(0,x(0)) = [cu
R
where V(T*,y) is interpolated (using cubic splines, say) from the values
{V(T*,a5)}"4" on the grid. If the integral is computed numerically — as is
most often the case — the numerical cost is often comparable with that of
the finite difference stepping because only one value V(0,2(0)) is required
at time 0, not the whole slice.
While there are typically no numerical cost savings that arise from
using integration over the first time step, there are accuracy and stability
considerations that favor this approach. We have already seen in Section
2.8.1 that the standard discretization of a PDE often leads to insufficient
fidelity in resolving any features of the payoff that are close to today, and
numerical integration can be of considerable help in this regard. Moreover,
as we discuss in much detail later in Chapter 23, an integration scheme
typically allows us to treat discontinuities in the value V(T*, a) arising from
the jump condition at time 7* explicitly. If the discontinuity is introduced
at the value of the state variable x*, then the integration scheme can (and
should) explicitly take this information into account. For example we would
write
voo,x(0) = [7 G (0,2(0);T*, y) V> (Ty) dy
+ [60.2079 Hv) dy
this book except for a brief mention in Section 11.A. For a representative example
see Hu et al. [2006].2.9 Multi-Dimensional PDEs: Problem Formulation 79
and calculate V~ (T*,y) by interpolating the grid values in the time inter-
val (—co,*), and V+(T*,y) by interpolating the grid values in (z*,0o),
separately.
The usefulness of the method is only limited by the availability of the
closed-form expression for the time 0 Arrow-Debreu prices G(0,x(0);T”*,-).
For some models this is not an issue; for most others, sufficiently close
approximations could be obtained in a small-time limit (see e.g. Section
[Link] for a typical approach) that can be useful for times T* that are not
too large. By a change of measure, we see that.
V(0,2(0)) =E (e ha “(T+ 2(T")))
= P(,T*)E™ (V(T*,(T*))),
where ET” is the expected value operator under the T*-forward measure Q™”;
so we really only need the expression for the density (rather than Arrow-
Debreu security prices) of c(T*) under Q?”, either exact or approximate.
Finally, we note that while the integration over the first time step can be
seen to offer similar advantages to those of the methods in Section 2.8.1, the
two approaches are not substitutes for each other, but are complementary.
We typically recommend using direct integration over the time step [0, 7},
where T* is the smaller of the time of the first jump condition or the limit
of applicability of the approximation to the density of «(T*), and then (if
needed) use the methods in Section 2.8.1 over the time interval [T*,T), with
T being the final maturity of the option in question.
2.9 Multi-Dimensional PDEs: Problem Formulation
We now turn our attention to the numerical solution of multi-dimensional
terminal value problems. Let the spatial variable x be p-dimensional, x =
(t1,-.-,2p)", and consider the PDE
P wv i PV
Lunt 5 TFL lb Aga — Mhz) =0, (2.56)
h=1l=1
av
ot
where spp(t,) > 0 and spu(t,x) = si,n(t,2) for h,l = 1,...,p. The PDE is
assumed subject to the terminal value condition V(T,r) = g(x), g: R? > R.
From the results in Chapter 1, we recognize that the PDE provides the
solution to the expectation
'°9One of the functions V~(7",y), V* (1, y) is often known analytically and for
all values of y (rather than sampled on the grid); this is for instance the case for
the Bermudan options of Section 2.7.4. ‘Che integration algorithm should obviously
take advantage of this.80 2 Finite Difference Methods
V(t) = By (ero dg (2(7)) [x(¢) = 2),
where the components of x(t) satisfy risk-neutral SDEs of the type
dap,(t) = up (t, x(t) dt + on (t,a(t)) dW(t), h=1,...,p. (2.57)
Here W(t) is a d-dimensional Brownian motion, pn : [0,T] x R? > R,
h=1,...,p, are (scalar) drifts, and op : (0,T] x R? 3 RI*4,h=1,...,p,
are d-dimensional (row vector) diffusion coefficients. The PDE coefficients sp,
in (2.56) represent the instantaneous covariance matrix for the components
of 2(-), Le., Spu(t,2) = on(t,x)or(t, 2)". We assume enough regularity on
Hn; Th, 7, and g to ensure that (2.56) has a unique solution.
For the purpose of solving (2.56) numerically, we assume that the PDE is
to be solved on a (finite) spatial domain in x, « € [M,,My]x...x(M,, Mp),
where Mn,Mn, h=1,...,p, are constants either dictated by the contract
at hand (barrier options) or found by a suitable probabilistic truncation (see
Section 2.1).
2.10 Two-Dimensional PDE with No Mixed
Derivatives
To illustrate the construction of finite difference discretization of (2.56), we
start out with the simple case where p = d = 2 and there are no mixed
partial derivatives in the PDE: s),9(t,z) = so,1(t,c) = 0 for all t and a.
Probabilistically, the absence of mixed derivatives corresponds to the case
where the stochastic process increments dz ;(t) and dx2(t) are independent.
Defining 7,(t,2)? = sa,n(t, x), h = 1,2, the PDE to be solved now becomes
ov
at (Li + L2)V =0, (2.58)
where
Oo 1
Lr = walt, 2a "as Flt x) aa au 5r(te), h=1,2.
Notice that we have divided the term r(t, x) into equal pieces in £; and Lo.
To discretize (2.58) in x, introduce grids x, € {xi'}M$' and x2 €
{a }n25'. To simplify notation, assume these grids are equidistant such
that 2}! = M, + j1Ay and 2? = My + jog. Let Vj,,5,(t) £ V(t, 27, 222).
We define discrete central difference operators as before
Vij ttsja (t) = Vir 1,50 (t)
[Link] salt) = Be
f t
Sx2Vjnjn(t) = tee Ben2.10 ‘Two-Dimensional PDE with No Mixed Derivatives 81
and
[Link](t) — [Link](t) + Vin —ayjo(t
bese, Vjngp(t) = (int 152 (t) ‘ug ja ranlt)
i
joa t) ~ Vjcsjo(t) + Viacio—1(6)
BnaeaVj() = dee — Tag 4 Mae,
These operators, in turn, give rise to the discrete operators
7 4 1 2 1
Ln = Unt, 2)52, + 5In(t@)"bx,2, — 572), h=1,2,
where « is constrained to take values in the spatial grid. A Taylor expansion
shows that this operator is second-order accurate (compare to Lemma 2.2.1),
(Li + Lo)V(t,2) = (£1 + £2) V(t.) + O (A? + 43).
2.10.1 Theta Method
Turning to a theta-style time discretization, consider first proceeding exactly
as in Section 2.2.3. Assuming equidistant time spacing A;, we get for the
period [t:, tis]
( 0; (A + 2,)) Vj sia(ta)
= (141 =0)4: (Er + £2) Visa (ten) + ef",
where
et =O (4: (43 + AB + Log} Ae + 4?) ,
and where it is understood that Ly and Lo are to be evaluated at (t,x) =
(t)7 (0), ct 23?) with t!71 (0) defined as in (2.14). If Vj, 5, (t) = V(t,24', 3?)
is a finite difference approximation to V},,;.(t), we thus get the scheme
(1-04 (Er + £2) Piaslte) = (14 0-4 (G+ 22) Fh saltina)
(2.59)
to be solved for the mymy interior points Vj, j)(ti), jr = 1,--.,ma, de =
1,...,me, given the values of Vine (t.4.1), and given appropriate boundary
conditions at 7; = 0, j1 =m, +1, jo =0, and jg =m24+1.
The scheme (2.59) represents a system of linear equations in m,mz un-
knowns {¥},,;,(ti)}. When written out as a matrix equation (which requires
us to arrange the various Voasjo (ta) in some order in a (7m ,mg)-dimensional
vector), the matrix to be inverted is sparse but, unfortunately, no longer
tri-diagonal. Solution of the system of equations by standard methods (e.g.,
Gauss-Jordan elimination or LU decomposition) is out of the question due82 2 Finite Difference Methods
to the size of the matrix?°. We can proceed in two ways: either we use a
specialized sparse-matrix solver; or we attempt to redo the discretization
(2.59) to make it computationally efficient. We personally prefer the second
approach and shall outline one method in the next section. As for the first
approach, we simply note that a good iterative sparse solver should be
able to solve (2.59) in order O((mmz)°/4) operations. See Saad [2003] for
concrete algorithms.
2.10.2 The Alternating Direction Implicit (ADI) Method
The ADI method is an example of a so-called operator splitting method,
where the simultaneous application of two operators (here £; and Lo) is
split into two sequential operator applications. To illustrate the idea, set
9 = 3 (Crank-Nicolson scheme) in (2.59) and approximate
(1 - sau (E4 &)) ns ( 7 jac) (: 7 pada) , (2.60)
(1454 (é:+2)) x (1+ 54:21) (1+ 542). (2.61)
It is easy to see*! (and to verify, by a Taylor expansion) that the operators on
the right-hand sides of these approximations have the same order truncation
error as do the left-hand sides, namely O(A;(A? + A3 + A?)). To the order of
our original scheme, no accuracy is gained or lost in using the right-hand sides
of (2.60)-(2.61). What is gained, however, is a considerable improvement in
computational efficiency, originating in the fact that the resulting scheme
1 > 1 a\s
(: - pati) (: = jada) Vj, 50 (ta)
1 = 1 a\s
= (14 54d) (14+ $42) Fatt) (269)
can be split into the system
lye L,s\s5
( - jae) Ving = ( + jae) Vin cin (tetas (2.63)
l,a\s lis
( - a) Vir jo (ti) = ( + ye) Upasins (2.64)
?°Recall that the solution of a general linear system with mim2 unknowns is an
O(m?m3) operation. For, say, m1 and mz in the order of 100, this would involve
around 1,000,000 times more work than what is required for a one-dimensional
(tri-diagonal) scheme (O(m)).
°1To those versed in operator notation, we notice that the right- and left-hand
sides both approximate, to identical order, exp(+0.54:(£1 + L2)).2.10 Two-Dimensional PDE with No Mixed Derivatives 83
where we have introduced an intermediate value Uj,,5.. The advantage of
this decomposition is the fact that in each of (2.63) and (2.64), there is only
one operator on the left-hand side, leading to simple tri-diagonal equation
systems. To formalize this, first define
UP = (Uigias Ur,iay-- Umacia) +
Then, for a fixed value of jg we can write for the first step
t, i: Y
(raat (ise ‘)) ue? = Me (2344) , (2.65)
where A? is an (m; x m;)-dimensional tri- “diagonal matrix of the same
form as (2, 11) (to get Aj’, basically freeze r2 = x3? and substitute 4, and
‘1 for 4 and o in the definition of the one- dimensional matrix A). The
my-dimensional vector M3? has components Me, vip J = 1,..-,™1, given by
tir tti l,o\s5
ME, (A344) = (2 + 54:L2) ee
V5, jo —1(tiga) + 5. jaVind-sa(tint)
= Five
1
+ (1- Jone) Masaltonds (2.68)
where we have defined
7 batt > tiga tt
sha Ba (nm (44 at 0f) + Ante (HES ot of
A tigr tt Pa turtti jf 5
sno # Sf (m (Hatt a att) + 545r tt tad ok 7
For known values of 7(t:41), (2.65) defines a simple tri-diagonal equation
system which can be solved for Uj? in O(m) operations. Repeating the
procedure above for jg = 1,...,mg allows us to find U;,;, for all jy =
1,..-,™1, jo =1,..., me, at a total computational cost of O(m mz).
Turning to the second step of (2.63)-(2.64), we first fix j; and define
VP) = (Pra, FralOs-+sFramal)
In the same fashion as earlier, we can then write
(1-54 (#3 *)) VR (t;) = Me (= *) : (2.67)
where A} is an (mg X mg)-dimensional tri-diagonal matrix and where the
right-hand side vector now has components84 2 Finite Difference Methods
ty. t. 1 ra .
Mi, (4) 7 ( + pe) Uji J2=
For brevity we omit writing out the Mi, (which will be similar to (2.66)),
but just notice that the right-hand side of (2.67) is known after the first step
of the ADI algorithm (above) is complete. For a given value of #1, we can
solve the tri-diagonal system (2.67) for ve (t;) in O(mg) operations. Looping
over all m, different values of j;, the full matrix of time t; values Vinsie (ta),
ji=1l,...,11, jz =1,..-, ma, can then be found at a total computational
cost of O(inymg).
The scheme outlined above is known as the Peaceman-Rachford scheme.
As is the case for all ADI schemes, the scheme works by alternating the
directions that are treated fully implicitly in the finite difference grid: in the
first step, the 21-direction is fully implicit and the «2-direction is fully explicit,
and in the second step the order is reversed. In effect, both spatial variables
end up being discretized “semi-implicitly”, i.e. similar to a Crank-Nicolson
scheme, resulting in convergence order is O(A? + A3 + A?). We emphasize,
however, that whereas a direct application of the Crank-Nicolson scheme
will involve (if an efficient sparse-matrix solver is used) a computational cost
of O((mimz)*/4) per time step, the computational cost of the Peaceman-
Rachford ADI scheme is only O(m1mg). A (tedious) von Neumann analysis
reveals that the scheme is A-stable, but, like the Crank-Nicolson scheme,
not strongly A-stable.
While the Peaceman-Rachford scheme is a classical example of an ADI
scheme, there are many others. For instance, consider a theta-version of the
Douglas-Rachford scheme:
+) Ma.
Vusaltest), — (2.68)
(i - 0A:L2) Vyasa (ta) = Usa gs — OAL Vj, ia (tina), (2.69)
(Q 7 0A:L:) Uys = (a + (1-9) AL + AL,
where we understand that in Lt and Lo the PDE coefficients are to be
evaluated at time ti*1(6). Again, notice how the scheme consists of two
steps, each involving the solution of tri-diagonal sets of equations along
only one of the 2}- or x2-directions. The computational cost thus remains
at O(m,mg). It can be shown that the convergence order of this scheme
is O(A} + A3 + Lyoz4y4e + A?) and it is A-stable for @ > 5, and strongly
A-stable for @ > 4. By elimination of U;,,;, we note that the unsplit version
of the Douglas-Rachford scheme is
(a - 04,£1) (a - 0ArL2) Vpn ja (ta)
= ((1- 04:21) ( 0A £2) + AL + A.L2) Va rin(tist)-
It is not difficult to see that this approximates (2.59) to second order.2.11 ‘Two-Dimensional PDE with Mixed Derivatives 85
2.10.3 Boundary Conditions and Other Issues
The fact that ADI schemes reduce to solving sequences of matrix systems
identical to the ones arising in the one-dimensional case is convenient, in
the sense that many of the issues we have encountered for one-dimensional
finite difference grids (oscillations, stability, convection dominance, etc.)
and their remedies (smoothing, non-equidistant discretization, upwinding,
etc.) carry over to the ADI setting with only minor modifications. Con-
sider for instance the issue of applying spatial boundary conditions along
the edges of the (:c},22) domain, which we have so far not discussed. As
for the one-dimensional PDEs, the most convenient way to express such
boundary conditions is typically by imposing conditions on derivatives, like
OV (t, 29, 2?)/Ox? = V(t, 29, e3?)/Ax, and so forth. For the Peaceman-
Rachford scheme, say, such conditions can be incorporated directly into (2.65)
and (2.67) by altering the matrices Aj? and Aj}, as well as the boundary
elements of Mj’ and M3, in the manner outlined in Section 2.2.1. If instead
we wish to impose Dirichlet boundary conditions, we need to add corrective
terms to the tri-diagonal systems, as in (2.19). To complete the first part of
the split scheme, this then requires us to establish what boundary terms are
needed for the intermediate quantity Uj,,;., ie. we must define U,,9 and
U5, ,mz+1 for jy = 1,..., m1, as well as Uo,;, and Um, 41,3 for jg = 1,...,me.
While U;,,;, is a purely mathematic construct, sometimes it is adequate to
think of Uj,,,, as a proxy for V;,;, evaluated at ¢}+!(@), which obviously
makes determination of boundary conditions straightforward. For maximum.
precision, however, we should use the ADI equations themselves to express
the boundary conditions of U directly in terms of boundary conditions for
V(t,) and V(t,41). Here, the Douglas-Rachford scheme is particularly easy
to deal with, as a rearrangement of (2.69) directly relates U,, j, to Vas ga (ts)
and Vj,.y,(te41)s
Unga = (1 AL 2) Vy, salts) + OAL utes)
The Peaceman-Rachford scheme requires some further manipulations to
express U in terms of V(t;) and V(t,41); see Mitchell and Griffiths [1980]
for the details.
2.11 Two-Dimensional PDE with Mixed Derivatives
Consider now the case where the 2-dimensional PDE (2.58) has a mixed
partial derivative,
Vv
My (Cr+ Le Lia) V =, (2.70)
where £; and £o are as in (2.58), and where86 2 Finite Difference Methods
2 2
Lir= sialtit)a a olt,z)n (t.e)ralt") aS aay (2.71)
The quantity p(t, ) is the instantaneous correlation between the processes
x(t) and x9(t) in (2.57), ie. p(t,) € [-1,1]-
The presence of £;,2 prevents a direct application of the ADI methods
in Section 2.10.2, since the mixed operator £;,2 is not amenable to operator
splitting. We shall demonstrate two ways to overcome this problem: a)
orthogonalization of the PDE; and b) predictor-corrector schemes.
2.11.1 Orthogonalization of the PDE
The idea here is to introduce new variables y;(t,21,22) and yo(t, 21, 22)
such that the PDE loses its mixed derivative term when stated in terms of
these variables. To demonstrate this idea, assume first that p(t,x), y(t, x),
and y(t, x) are all functions of time only and independent of x. Then define,
say,
w(t, 1,02) = 21, (2.72)
y2(t)
mn(t)
where we must assume that y(t) 4 0 for all t.
yo(t, 21, £2) = —p(t) 21 +22 Sa(t)t1 +22, (2.73)
Lemma 2.11.1. Consider the PDE (2.70) subject to the terminal value
condition V(T, 2) = g(x). Define y = (yi,y2)" and u(t,y) = V(t,2). With
the variable change defined in (2.72)-(2.73), v satisfies
wvioiy Ov oy, Ov 1g Oy
at Fao aT tHE + ond) ag
1 2 207u
+3 (1 ~ p(t)?) y(t) ag r(t,yn,y2—a(t)y1) = 0, (2.74)
Y2
where
Hi (t,y) © mn (t, 21,22) = ma (tyr, y2 — alt), (2.75)
da(t
Blt) & On, + alten (t,22,20) + Ha (ts 21,22)
da(t
= By, + a(Qut(ty) +ue(tave—alOnn). (276)
The equation (2.74) is subject to the terminal value condition v(T, 41, y2) =
9(21, 02) = 9(y1,y2 — a(T)y)-
Proof. While the result can be established by the usual mechanics of ordinary
calculus, we will take the opportunity to show how stochastic calculus can2.11 ‘I'wo-Dimensional PDE with Mixed Derivatives 87
also conveniently prove results of this type. Going back to the processes
underlying the PDE (see (2.57)), we write
dxy(t) = p(t, x) dt + y1(t) dWi(t), (2.77)
dea(t) = pir(t,x) dt + y2(t) (0 (t) dWy(t) + VT — ple)? dWa(t )) (2.78)
for independent scalar Brownian motions W(t) and W(t); this is easily seen
to generate the correct correlation p(t) between x, and x. An application
of Ito’s lemma then shows that the processes for y; and yo are
dyi(t) = dara (ty = palt, 2) dt + ne) dWi(t),
dy(t) = 20
ay(t) dt + a(t) (t,2) dt + a(t)yn(t) dW (t)
+ nalts) dt + alt) (old) aWi(t) + VT pO? aWa(t))
7 (= ay(t) + a(t)un(t,2) + molt, )) at
+ a(t) V1 — p(t)? dWo(t)
With the definitions (2.75)-(2.76), this becomes simply
dys(t) = wy (t y(t) dt + n(t) dWi(t), (2.79)
dya(t) = u3 (t,y(t)) dt + y(t) /1 — p(t)? dWalt). (2.80)
Equations (2.79)-(2.80) define a Markov SDE in y;(t) and yo(t) where,
importantly, the Brownian motions on y(t) and yo(t) are now independent.
Writing V(t,x) = v(t,y), it then follows immediately from the backward
Kolmogorov equation (see Section 1.8) that v satisfies the PDE (2.74). 0
Through the chosen transformation (2.72)-(2.73), our original PDE has
now been put into a form where we can immediately apply the ADI schemes
outlined in Section 2.10.2.
In performing the orthogonalization of the PDE in Lemma 2.11.1 we
relied on p(t,x), y1(t,), and 72(t,r) all being independent of x. This
can often be relaxed. Consider for instance the case where p(t,x) = p(t),
yi(t,c) = y(t, 21), and y(t, x) = yo(t, x2); here the correlation p is still
assumed deterministic, but we now allow for some (though not full) x-
dependence in 7 and 72. Assuming that 7; (t,71) > 0 and y(t,x2) > 0 we
can introduce new variables
2 (t.21) = | Ea (2.81)
29 (ty) = / sana" (2.82)
Applying Ito’s lemma to (2.77)-(2.78) we see that88 2 Finite Difference Methods
_ On(tti) 1 a(t, x)
data = (~ [adn t een
1 dy (t,21)
2 an dt+dW,(t) (2.83)
and
_ Oy2(t, 22) 1 H(t, x)
dealtm) = (- ot Aol, ea)" a 7y2(t, 2)
~; me) dt -+ p(t) dWy(t) + JT =p dWe(t). (2.84)
2
As we assumed that 7 (t,21) > 0 and 72(t, 2) > 0, the functions z; and
zq are increasing in «, and 22, respectively, and are thereby invertible. As
such, we can rewrite (2.83)-(2.84) in the more appealing form
dzy(t, 1) = yj (t, 21, 22) dt + dW, (t),
dza(t,c1) = 43 (t, 21, 22) dt + p(t) dWi(t) + V1 — p(t)? dWa(t).
Through the transformation (2.81)-(2.82), we have reduced our original
system to one where the coefficients on W1(t) and W(t) are no longer
state-dependent, similar to the case that lead to Lemma 2.11.1. We can
now proceed with another variable transformation, as in (2.72)-(2.73), to
orthogonalize the system and prepare it for an application of the ADI
method.
While the orthogonalization method outlined here can be very effective on
a range of practical problems, it suffers from a few drawbacks. Most obviously,
the method is not completely general and requires a certain structure on the
parameters of the PDE. Another drawback is that the introduction of a time-
dependent transformation on one or more variables (Lemma 2.11.1) often
makes the alignment of the finite difference grid along (time-independent)
critical level points in x-space impossible. Also, the introduction of terms
like y;da(t)/dt in the drift of yo (see (2.76)) can be problematic, particularly
if the functions 7,(¢) and y(t) are not smooth. For instance, it is not
unlikely that y,da(t)/dt will locally be of such magnitude that upwinding
will be necessary to prevent oscillations; see Section 2.6.1. Further, we
note that inversion of the transformations (2.81)-(2.82) will not always be
possible to perform analytically and may require numerical (root-search)
work, complicating the scheme and potentially slowing it down. Finally,
as we shall highlight in future chapters, maintaining the “continuity” of a
numerical scheme with respect to input parameters is of critical importance
for the smoothness of risk sensitivities. Such continuity is difficult to ensure
if complicated transformations are applied to model variables. So, in the
end, we recommend formulating the PDEs in terms of financially meaningful
variables, avoiding excessive transformations, and relying on methods such
as developed in the next section when dealing with mixed derivatives and
other numerical complications.2.11 ‘I'wo-Dimensional PDE with Mixed Derivatives 89
2.11.2 Predictor-Corrector Scheme
In this section we shall consider a completely general method for handling
mixed derivatives in two-dimensional PDEs. While a bit slower than the
method outlined in Section 2.11.1, it does not involve any variable transfor-
mations and, by extension, does not suffer from the drawbacks associated
with such transformations. As a first step, consider the discretization of the
mixed derivative 0?V/dx,0x2. There are a few possibilities (see Mitchell
and Griffiths [1980]), but we shall just use
Se122 Vir sie (t) = Sirs 5ca Vina (€)
_ Vatiieti(t) — Visine) = Vinita) + Vinnie lt)
ia (2.85)
Extensions to non-equidistant grids follow directly from (2.27) and the
relation x, 02Vj; jo (t) = bx, 5x2 Vj1,j.(t). As we have not encountered mixed
difference operators before, for completeness we show the following lemma.
Lemma 2.11.2. For the discrete operator (2.85) we have
V(t, 24', a)
2 2
Bm, Be2 +0 (Aj + A3).
Sxyz2V ju ja(t) =
Proof. A Taylor expansion of V(t, ) around the point « = (x}',2?)" gives
snag a ta i
clasps ala Ata.
Vj —1jeti(t) = Vj ie (t) — ae “a a 1 oe
1, 4 BV (1, #8V
BA ang + 2 aaa *
A little thought then shows that
Vin etottlt) — Virtig—1) — Vir~tieta(t) + Vj 1-1)
= 4A, Ao +0 (A}Ap + A143),
Vv
0x1 0x990 2 Finite Difference Methods
as error terms of order Af, A$, and A?A3 will cancel. The result follows.
Oo
Equipped with (2.85), we can approximate the operator £;,9 in (2.71) as
LiaVi elt) £0 (tata) n (tat 2) ae (tat 2) dna Vi salt)
which is accurate to order O(A? + A3). The first easy way to modify our ADI
scheme to incorporate Lis is to treat the mixed derivative fully explicitly.
In the Douglas-Rachford scheme (2.68)-(2.69), for instance, we thus modify
the right-hand side of the first step as follows:
(1 = 04:£1) [Link] = (14-0) AL, + ArLa + Ara) Be (tins)s
(2.86)
(1 = @A:L2) Belts) = Vine — 0AeE2V, jatar: (2.87)
The addition of Lio this way clearly preserves the ADI structure of the
scheme which will continue to involve only sequences of tri-diagonal linear
equations. However, having, in effect, only a one-sided time-differencing of
the mixed derivative teri will lower the convergence order of the time step
to O(A,), irrespective of the choice of 6.
To change the time at which the mixed operator Ly, is evaluated,
consider using a predictor-corrector scheme, where the results of (2.86)—
(2.87) are re-used in a one-time”? iteration. Specifically, we write, for some
A € [0,1],
Predictor:
(1-44:L21) Uf, = (14 (1 OAL + Le + ALi 2) Vin (tins),
(2.88)
(16422) UP), = 00, - PAL, joltiar). (2-89)
Corrector:
(a - 9ArL:) 2) = (1 + (1- OAL, + ALo
+ (1-NAL a), Tr iv(tvt) +ALL 2,,, (2.90)
(1 = 0AcL2) Vy gots) = 24, — OALE2V, 9, (to41)- (2.91)
*2We can run the iteration more than once if desired, but a single iteration will
normally suffice.2.12 PDEs of Arbitrary Order 91
Notice how the Douglas-Rachford scheme is first run once, in (2.88)—
(2.89), to yield a first guess (a “predictor” ), ue), for the time t; value
Vj, .j2(t,). In a second run of the Douglas-Rachford scheme, in (2.90)-(2.91),
this guess is used as a “corrector” to affect the time at wet Li, gis evaluated,
by applying this operator to (1 — A), jo(tis1) + AU?,; when 4 = 4 we
effectively center the time-differencing of the mixed term. The scheme now
relies on three intermediate variables, U\),,, US)... and 2...
The combined predictor-corrector scheme above (in a slightly less general
form, with A; = Ag) was suggested by Craig and Sneyd [1988]. It can be
shown that the scheme has convergence order
0 ((Ar + 40)? + 1yozg} Ae + Leg} de + 42),
so second order convergence in the time domain is still achievable by setting
6 == }. The scheme will be A-stable for @ > 3 and 4 » ins) V(tisa),
h=2 h=1l=h4+1
(1 7 6A,£2) U® =U —GA,LP (tr),
1-0A,L,) U®) =U?) — GAL V (tis).
P Pp
Corrector:
( -9A,£;) 20
= 7 >
=A; (@" +(1-AL14+ 0 Lr
h=2
P P P P
+1-)Y ins) (tina) FADS SO La,
h=ll=htl h=1l=h41
(2 - 0A.£2) Z% = ZY ~ 9A,LoV (tis),
(1-0A:L,) P(t) = 2° — AZ, P (ti41).
With mp, points in the xp-direction, h = 1,...,p, the computational
cost of the predictor-corrector scheme is O([],_, hn). For p < 3, sufficient
conditions for A-stability are 9 > 3 and 3 < A < 8. For p > 4, sufficient
conditions are 9 < 3 and
p-1
d ov,
Yn-# 4
ohva NO):
where N(0,1) is a standard Gaussian distribution and-4 denotes convergence
in distribution'. Purther, if we define
"Recall that a sequence of variables X, with cumulative distribution func-
tions F, converge in distribution to a random vi ble X with distribution F if
limta-soo Fa(w) = F(e) for all x € R at which F(2) is continuous.3.1 Fundamentals 97
then also _
Yn-bu
Sn/Vn
Define the Gaussian percentile u, as ®(u,) = 1 — y, where @ is the
Gaussian cumulative distribution function. From Theorem 3.1.2, and from
the definition of convergence in distribution (see footnote 1), the probability
that the confidence interval
[V(t) - wyy2° 8n/ V2, V(t) + uy/2 > Sn/Vr] (3.4)
fails to include the true value V(t) approaches y for large n. Here
4N(0,1).
with the quantity s,/./n known as the standard error. For given y, the
rate at which the confidence interval for V(t) contracts is O(n~#). This
is relatively slow: to reduce the width of the interval by a factor of 2, n
must increase by a factor of 4. On the other hand, we notice that the
(asymptotic) convergence rate only depends on n, not on the specifics of
the g;’s. In particular, if g(T) = g(X(T)) where X is p-dimensional, the
asymptotic convergence rate is independent of p. As we shall see shortly, in
most applications the work required to generate samples of g(X(T)) is (at
most) linear in p.
3.1.1 Generation of Random Samples
At the most basic level, the Monte Carlo method requires the ability to
draw independent realizations of a scalar random variable Z with a specified
cumulative distribution function F(z) = P(Z < z), where P is a probability
measure. On a computer, the starting point for this exercise is normally a
pseudo-random number generator, a software program that will generate a
sequence of numbers uniformly distributed on (0, 1] (i.e. from 2/(0, 1). Press
et al. [1992] list a number of generators producing sequences of uniform
numbers uj, u2,.-. from iterative relationships of the form
Tizi = (al; +c) mod(m),
itn = Tiga /m.
The externally specified starting point Ip is the seed of the random number
generator. In this so-called general linear congruential generator, the choice
of the multiplier a, the modulus m, and the increment ¢ must be done