chap1a
where A is an m × n matrix, then the tools of linear algebra can be brought to bear to
analyze this system, and the highly developed software of numerical linear algebra can be
employed to find the actual solution or an approximation to it.
The situation is usually considerably more complicated when the model leads to m
nonlinear equations in n unknowns, written conveniently as

F (x) = 0

or, in the case of a single equation in one unknown (m = n = 1),

f (x) = 0.
In order to have a concrete case in mind let us consider the following example from the
world of equity options:
A “European call” is an option which gives its holder (owner) the right to buy an asset
for a specified amount $K at a specified time T . K is known as the strike price and T is
the time of maturity of the option. If at time T the asset can be bought at a cheaper price
than $K then the option will not be exercised. But if the asset trades for more than $K
then the holder will exercise the option. The option confers a right but not an obligation
and hence has value. The writer (seller) of the option, on the other hand, is obligated to
sell the asset on demand at time T for $K regardless of its value at that time. This creates
risk for which the writer must be compensated by charging for the option.
Pricing options is one of the central topics of computational finance and will dominate
this course. For the above European call there is the famous Black-Scholes formula which
explicitly gives its price
C = SN (d1 ) − Ke^(−rT) N (d2 )
where S is the price of the asset at the time t = 0 when the option is sold. Here
d1 = (ln(S/K) + (r + σ²/2)T ) / (σ√T)

d2 = (ln(S/K) + (r − σ²/2)T ) / (σ√T)

N (x) = (1/√(2π)) ∫_{−∞}^{x} e^(−s²/2) ds.
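As a concrete illustration, the formula is easy to evaluate in a few lines of code. The sketch below is in Python; the cumulative normal N is expressed through the error function, and the sample parameters at the end are made up for illustration only.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution N(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(S / K) + (r + sigma ** 2 / 2.0) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

# Illustrative parameters: S = 100, K = 100, r = 5%, T = 1 year, sigma = 20%.
price = bs_call(100.0, 100.0, 0.05, 1.0, 0.2)
```

The identity N(x) = (1 + erf(x/√2))/2 avoids numerical quadrature for the integral defining N.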
The parameter r is the riskless interest rate at which money can be invested today with
payout at time T and is generally assumed known. The second parameter σ is the so-called
volatility of the asset and is a measure of the day-to-day changes in the value S of the
asset. There is a great deal of uncertainty about how to choose σ because it is not an observable
quantity. What is observable is the market price C of the option. So one possible approach
to determining σ is to find that value of σ which for given K, r, T and S produces a value
C which is identical to the quoted market price. This value of σ is called the “implied
volatility.” Mathematically, we need to solve the problem

f (σ) = C − SN (d1 (σ)) + Ke^(−rT) N (d2 (σ)) = 0

where K, r, T , S and C are given. It is clear that this is a highly nonlinear equation in
the unknown σ which can only be solved numerically (or approximately). Implied volatility
calculations are said to take place around the clock in the financial derivative industry.
With this example in the background let us now consider feasible numerical methods
for solving the general problem
f (x) = 0.
Suppose we wish to ensure that |z − x∗ | < 10^−6 . This is guaranteed if

(b − a)/2^n < 10^−6

or

n > (ln(b − a) + 6 ln 10) / ln 2.
For example, if b − a = 1 then
n > 19.93
so that as many as 20 function evaluations may be required to achieve the stated accuracy.
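This count is easy to confirm with a direct implementation of the bisection method. The sketch below uses the test function x² − 2 on [1, 2], chosen only so that b − a = 1.

```python
import math

def bisect(f, a, b, tol=1e-6):
    """Bisection: halve [a, b] while keeping a sign change, until the
    midpoint is guaranteed to be within tol of a root."""
    if f(a) * f(b) >= 0:
        raise ValueError("f must change sign on [a, b]")
    n = 0
    while (b - a) / 2.0 > tol:
        m = (a + b) / 2.0
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
        n += 1
    return (a + b) / 2.0, n

# Root of x^2 - 2 on [1, 2]: about 20 halvings for 1e-6 accuracy.
root, n = bisect(lambda x: x * x - 2.0, 1.0, 2.0)
```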
Suppose we wish to apply the bisection method to the implied volatility calculation.
By inspection d1 and d2 are continuous functions of σ and since N is a continuous function
of x it follows that f is a continuous function of σ. Moreover, as σ → ∞ we have
d1 (σ) → ∞ and d2 (σ) → −∞, so that N (d1 ) → 1 and N (d2 ) → 0 and hence

lim_{σ→∞} f (σ) = C − S.
This is a negative quantity since if the call were more expensive than the asset one may as
well buy the asset itself.
The condition as σ → 0 is a little trickier. We note that
lim_{σ→0} d1 (σ) = lim_{σ→0} d2 (σ) = lim_{σ→0} (ln(S/K) + rT ) / (σ√T).
For a correctly priced call this quantity is also positive, for if

S − Ke^(−rT) > 0

and

C < S − Ke^(−rT)
then an investor can sell short the asset for $S and buy the call for C. The value of the
contract at time T is then
(S − C)erT
which would exceed the strike price K required to repurchase the asset sold short before.
Hence

lim_{σ→0} f (σ) > 0 and lim_{σ→∞} f (σ) < 0

so that f changes sign on (0, ∞) and the bisection method can be applied to solve
f (σ) = 0 for the implied volatility.
We shall rewrite the equation generically in the form of a so-called fixed point equation

x = g(x).

A solution x∗ of this equation satisfies x∗ = g(x∗ ) and is called a fixed point of g. One
such reformulation, available for any f , is

x = g(x) ≡ x − αf (x)

for any non-zero scalar α. More commonly, a g is obtained by solving f (x) = 0 for x in terms of a
function of x. For example,
f (x) ≡ x − x2 = 0

leads to

x = x2

or

x = √x.
An approximate solution of the fixed point equation is obtained by simple substitution
or, what is the same, a fixed point iteration. We assume that we have an initial guess x0
and compute iteratively the sequence {xn } from

xn+1 = g(xn ),   n = 0, 1, 2, . . .
If, for example, we choose x0 = 1 + ε and apply the iteration to

x = x2

then xn → ∞ as n → ∞ no matter how small ε > 0 is chosen. The desirable property for a fixed
point x∗ is that it be a point of attraction which is defined as follows.
Definition: Let x∗ be a fixed point of g. x∗ is a point of attraction of the fixed point
iteration if there is a neighborhood N (x∗ , δ) of x∗ (i.e., an interval of radius δ around x∗ )
such that for any x0 ∈ N (x∗ , δ) the fixed point iteration converges to x∗ .
The discussion above shows that x∗ = 1 is not a point of attraction of
g(x) = x2 .
A moment’s reflection will show that the fixed point x∗ = 0 is a point of attraction. Similarly,
x∗ = 1 is a point of attraction of the alternate fixed point equation
x = g(x) = √x
but x∗ = 0 is not. In many applications one has a reasonable idea of x∗ and it is important
to have a fixed point formulation for f (x) = 0 for which x∗ is a point of attraction so that
a good initial guess will lead to convergence. This raises the question of what property of g
makes a fixed point a point of attraction. We have the following theoretical criterion.
Theorem: Let x∗ be a fixed point of the function g. Suppose that g is continuously
differentiable in a neighborhood of x∗ and that

|g ′ (x∗ )| < 1.

Then x∗ is a point of attraction: for any x0 sufficiently close to x∗ there is a constant
c < 1 such that

|xn+1 − x∗ | ≤ c|xn − x∗ |

which guarantees that xn → x∗ as n → ∞.
If we examine the two fixed point equations

x = g1 (x) = x2

and

x = g2 (x) = √x

associated with

f (x) = x − x2 = 0

then g1′ (x) = 2x and g2′ (x) = 1/(2√x), so |g1′ (1)| = 2 > 1 while |g2′ (1)| = 1/2 < 1.
The criterion confirms our earlier observation that x∗ = 1 attracts the iteration for g2
but not for g1 .
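These observations are easy to reproduce numerically. In the sketch below the starting values 0.5 and 1.0001 are chosen only for illustration.

```python
def fixed_point(g, x0, n_iter=60):
    """Simple substitution: iterate x_{n+1} = g(x_n), return the last iterate."""
    x = x0
    for _ in range(n_iter):
        x = g(x)
    return x

# g2(x) = sqrt(x): x* = 1 is a point of attraction.
near_one = fixed_point(lambda x: x ** 0.5, 0.5)

# g1(x) = x^2: starting just above the fixed point x* = 1, the iterates
# blow up (they overflow to floating-point infinity).
diverged = fixed_point(lambda x: x * x, 1.0001)
```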
Newton’s method
For the method of bisection we only required a continuous function f and an interval
over which f changes sign. The algorithm itself asks for the evaluation of f at given points
and does not demand that f be given analytically. For example, the evaluation of f may
involve a table look-up or f may require the solution of a differential equation whose solution
depends on the independent variable x. Hence f may be very general and complicated.
If f is given analytically and f ′ is continuous then we can solve
f (x) = 0
by a fixed point method called Newton’s method (sometimes the Newton-Raphson method)
which is in general much more efficient than bisection. The idea is as follows: Given an
initial guess x0 then for n = 0, 1, 2, . . . we linearize f around xn and find xn+1 as the solution
of the linear problem. The linearization is obtained from the first two terms of the Taylor
expansion of f , i.e. the linearization of f around xn is

Ln x = f (xn ) + f ′ (xn )(x − xn )

so that xn+1 is the solution of Ln x = 0, or

xn+1 = xn − f (xn )/f ′ (xn ),   n = 0, 1, 2, . . .
There is, of course, a natural geometric interpretation of this method. The equation
y = Ln x
is the tangent to f at xn and xn+1 is the point where the tangent crosses the x-axis. The
hope is that xn+1 is a better approximation to x∗ than the preceding iterate xn .
Newton’s method has two exceedingly important properties. Under mild conditions we
are guaranteed convergence from a good initial x0 and once xn is sufficiently close to x∗ the
convergence is very rapid. These properties are easy to establish in view of our discussion
of fixed point iterations.
We observe that Newton’s method is a fixed point iteration for
x = g(x)
where
g(x) = x − f (x)/f ′ (x).
Let x∗ be a root of f and suppose, as is usually the case, that f ′′ exists and that f ′ (x∗ ) ≠ 0.
Then it follows that

g ′ (x∗ ) = 1 − (f ′ (x∗ )² − f (x∗ )f ′′ (x∗ )) / f ′ (x∗ )² = 0

since f (x∗ ) = 0.
Hence x∗ is a point of attraction and Newton’s method will converge from a good initial
guess. Moreover, it follows from the identity
g(y) = g(x) + g ′ (x)(y − x) + (1/2) g ′′ (ξ)(y − x)²

for some ξ between x and y that

|xn+1 − x∗ | = |g(xn ) − g(x∗ )| = |g(x∗ ) + g ′ (x∗ )(xn − x∗ ) + (1/2) g ′′ (ξ)(xn − x∗ )² − g(x∗ )|

and hence, since g ′ (x∗ ) = 0,

|xn+1 − x∗ | ≤ K|xn − x∗ |²
for some constant K related to g ′′ (x). Thus, if in the iteration the error |xn − x∗ | is 10^−2
then the next iterate gives an error of order 10^−4 . This convergence is called quadratic
and usually ensures that only two or three iterations are needed to have an acceptable
approximation to x∗ .
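A minimal implementation makes the quadratic behavior visible. In the sketch below, f(x) = x² − 2 and x0 = 1.5 are illustrative choices.

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Root of x^2 - 2 from x0 = 1.5: the error roughly squares each step,
# so only a handful of iterations are needed.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5)
```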
The dominant draw-back to using Newton’s method is that a good initial guess is
required since convergence to a point of attraction is a local property. If one were to provide,
say, a Newton’s method based program for solving the implied volatility problem one would
have to guard against a bad choice of the initial guess σ0 , unless, of course, one can establish
that Newton’s method will automatically converge. This will require additional conditions
on f .
Theorem: Let f be defined on the interval [a, b] with a root x∗ , and suppose that

f ′ (x) > 0

and

f ′′ (x) ≥ 0

on [a, b]. Then for any initial guess x0 ∈ (x∗ , b] Newton’s method converges monotonically
to x∗ .
Since f is increasing and convex, the tangent at x0 lies on or below the graph of f ,
so its root x1 satisfies x∗ ≤ x1 < x0 because the tangent has positive slope. The same
observation applies to all subsequent tangents. As a consequence we generate a decreasing
sequence of numbers {xn } which is bounded below by x∗ . Hence the sequence must converge and from
xn+1 = xn − f (xn )/f ′ (xn )

it follows that xn must converge to a root of f , hence to x∗ . Similar arguments are used if
f is decreasing or convex downward. The geometry tells us whether monotone convergence
can be guaranteed.
Let us look at an application of this result to the implied volatility calculation

f (σ) = C − SN (d1 (σ)) + Ke^(−rT) N (d2 (σ)) = 0

where

d1 (σ) = A/σ + bσ,    d2 (σ) = d1 (σ) − 2bσ

with

A = (ln(S/K) + rT )/√T    and    b = √T /2.
Our discussion of the problem in connection with the bisection method already established
that under reasonable assumptions we may assume that

f (σ) = 0

has a solution. To apply the theorem we need the derivatives of f . Now

N ′ (d2 ) = (1/√(2π)) e^(−d2²/2)

and if we substitute d2 = d1 − 2bσ then simple algebra leads to

N ′ (d2 ) = N ′ (d1 ) (S/K) e^(rT)

so that

f ′ (σ) = −2bSN ′ (d1 ) < 0.
Furthermore,

f ′′ (σ) = −(2bS/√(2π)) (−d1 d1′ ) e^(−d1²/2).

But

d1′ = −(1/σ) d2

so that

f ′′ (σ) = −(2bS/(σ√(2π))) (d1 d2 ) e^(−d1²/2).
From the definition of d1 and d2 we see that
d1 d2 = (A/σ + bσ)(A/σ − bσ).
If we set
σ0 = √(|A/b|)
then f ′′ is negative for σ < σ0 and positive for all σ > σ0 . A look at successive tangents
to the graph of f shows that starting from σ0 Newton’s method will converge. If f (σ0 ) > 0
we obtain a monotone increasing sequence, if f (σ0 ) < 0 we generate a monotone decreasing
sequence. Hence this choice of initial value is sufficient to guarantee monotone convergence
as long as the call is correctly priced so that f (σ) = 0 has a solution. While in general
quadratic convergence only sets in close to the correct volatility, in practice this approach is
quite efficient compared to the bisection method.
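Putting the pieces together gives a complete implied volatility solver. The sketch below uses synthetic market data: a call is priced at a known volatility with the Black-Scholes formula and the volatility is then recovered from the price. It assumes S ≠ Ke^(−rT), so that σ0 > 0.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call (used here to fake market data)."""
    d1 = (math.log(S / K) + (r + sigma ** 2 / 2.0) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(C, S, K, r, T, tol=1e-10, max_iter=100):
    """Newton's method for f(sigma) = C - S N(d1) + K e^{-rT} N(d2) = 0,
    started at sigma0 = sqrt(|A/b|), where f'' changes sign.
    Assumes A != 0, i.e. S != K e^{-rT}."""
    A = (math.log(S / K) + r * T) / math.sqrt(T)
    b = math.sqrt(T) / 2.0
    sigma = math.sqrt(abs(A / b))
    for _ in range(max_iter):
        d1 = A / sigma + b * sigma
        d2 = d1 - 2.0 * b * sigma
        f = C - S * norm_cdf(d1) + K * math.exp(-r * T) * norm_cdf(d2)
        # f'(sigma) = -2 b S N'(d1), i.e. minus the Black-Scholes vega
        fp = -2.0 * b * S * math.exp(-d1 * d1 / 2.0) / math.sqrt(2.0 * math.pi)
        step = f / fp
        sigma -= step
        if abs(step) < tol:
            break
    return sigma

# Price a call at sigma = 0.25, then recover the volatility from the price.
C = bs_call(100.0, 95.0, 0.05, 0.5, 0.25)
sigma = implied_vol(C, 100.0, 95.0, 0.05, 0.5)
```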
A few final comments: The strengths and weaknesses of Newton’s method carry
over to the solution of the system

F (x) = 0

where the derivative f ′ (xn ) is replaced by the Jacobian matrix F ′ (xn ), i.e., formally

xn+1 = xn − F ′ (xn )−1 F (xn ).

In practice one solves the linear system

F ′ (xn )δ = −F (xn )

for δ and sets

xn+1 = xn + δ

in order to avoid the inverse of F ′ (x). The multi-dimensional Newton method will again
converge for a good initial guess as long as F ′ (x∗ ) is non-singular, and convergence close to
the solution remains quadratic. In general, the choice of the initial condition is more critical
than in the scalar case and has led to some sophisticated methods for choosing x0 .
Lack of convergence of Newton’s method for a good initial guess is usually due to the
incorrect calculation of F ′ (x). In addition, F may not be explicitly given but only in terms of
an input-output algorithm (given x one can calculate F (x)) so that F ′ is not calculable. One
can avoid F ′ (x) altogether if its entries are replaced by difference quotients. For example,
the jth column of F ′ (x) can be approximated by
(F (x + hêj ) − F (x)) / h
for small h where êj is the jth unit vector. By definition the limit of this difference quotient
as h → 0 is the jth column of F ′ (x). This discrete approximation to Newton’s method and
many variations thereof are closely related to secant methods and interpolation methods
which are discussed in texts on the numerical solution of nonlinear systems.
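The finite-difference variant is easy to sketch. The code below assumes a small dense system, forward differences with a fixed h, and a hand-rolled Gaussian elimination in place of a library solver; the 2×2 test system at the end is purely illustrative.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def discrete_newton(F, x0, h=1e-7, tol=1e-10, max_iter=50):
    """Newton's method with the Jacobian replaced column by column by
    the difference quotient (F(x + h e_j) - F(x)) / h."""
    n = len(x0)
    x = list(x0)
    for _ in range(max_iter):
        Fx = F(x)
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):
            xh = list(x)
            xh[j] += h
            Fxh = F(xh)
            for i in range(n):
                J[i][j] = (Fxh[i] - Fx[i]) / h
        # Solve F'(x) delta = -F(x) rather than forming the inverse.
        delta = solve(J, [-v for v in Fx])
        x = [xi + di for xi, di in zip(x, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return x

# 2x2 test system: x^2 + y^2 = 4 and x = y, with root (sqrt(2), sqrt(2)).
root = discrete_newton(lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]],
                       [1.0, 1.0])
```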