
MATH 20602

Numerical Analysis 1

Simon L. Cotter

These notes and other resources are available on Blackboard.

These notes are largely based on those inherited from Dr Martin Lotz.

Contents

1 Introduction
1.1 Computational Complexity
1.2 Accuracy

2 Interpolation
2.1 Lagrange Interpolation
2.2 Interpolation Error
2.3 Newton’s divided differences
2.4 Convergence
2.5 An alternative form

3 Integration and Quadrature
3.1 The Trapezium Rule
3.2 Simpson’s Rule
3.3 The Runge phenomenon revisited
3.4 Composite integration rules

4 Numerical Linear Algebra
4.1 The Jacobi and Gauss-Seidel methods
4.2 Vector Norms
4.3 Matrix Norms
4.4 Convergence of Iterative Algorithms
4.5 Gershgorin’s circles
4.6 The Condition Number

5 Non-linear Equations
5.1 The bisection method
5.2 Newton’s method
5.3 Fixed point iterations
5.4 Rates of convergence
5.5 Newton’s method in the complex plane


1 Introduction
Video 1.1
“Since none of the numbers which we take out from logarithmic and trigonometric tables admit of absolute precision, but are all to a certain extent approximate only, the results of all calculations performed by the aid of these numbers can only be approximately true.”— C.F. Gauss, Theoria motus corporum coelestium in sectionibus conicis solem ambientium, 1809
Classical mathematical analysis owes its existence to the need to model the natural world.
The study of functions and their properties, of differentiation and integration, has its origins
in the attempt to describe how things move and behave. With the rise of technology it became
increasingly important to get actual numbers out of formulae and equations. This is where
numerical analysis comes into the scene: to develop methods to make mathematical models
based on continuous mathematics effective.
In practice, one often cannot simply plug numbers into formulae and get all the exact results. Most problems require an infinite number of steps to solve, but one only has a finite amount of time available; most numerical data also require an infinite amount of storage (just try to store π on a computer!), but a piece of paper or a computer only has so much space. These are some of the reasons that lead us to work with approximations.¹

¹ In discrete mathematics and combinatorics, approximation also becomes a necessity, albeit for a different reason, namely computational complexity. Many combinatorial problems are classified as NP-hard, which makes them computationally intractable.
An algorithm is a sequence of instructions to be carried out by a computer (machine or
human), in order to solve a problem. There are two guiding principles to keep in mind when
designing and analysing numerical algorithms.

1. Computational complexity: algorithms should be fast;

2. Accuracy: solutions should be good.

The first aspect is due to limited time; the second due to limited space. In what follows, we
discuss these two aspects in some more detail.

1.1 Computational Complexity


Video 1.5
An important consideration in the design of numerical algorithms is efficiency; we would
like to perform computations as fast as possible. Considerable speed-ups are possible by
clever algorithm design that aims to reduce the number of arithmetic operations needed to
perform a task. The measure of “computation time” we use is the number of basic (floating
point) arithmetic operations (+, −, ×, /) needed to solve a problem, as a function of the input
size. The input size will typically be the number of values we need to specify the problem.
Example 1. (Horner’s Algorithm) Take, for example, the problem of evaluating a polynomial

p_n(x) = a_0 + a_1 x + a_2 x² + · · · + a_n x^n

for some x ∈ R and given a_0, . . . , a_n. A naïve strategy would be as follows:

1. Compute x, x², . . . , x^n,

2. Multiply a_k x^k for k = 1, . . . , n,

3. Add up all the terms.

If each of the x^k is computed individually from scratch, the overall number of multiplications is n(n + 1)/2. This can be improved to 2n − 1 multiplications by computing the powers x^k, 1 ≤ k ≤ n, iteratively. An even smarter way, that also uses less intermediate storage, can be derived by observing that the polynomial can be written in the following form:

p_n(x) = a_0 + x(a_1 + a_2 x + a_3 x² + · · · + a_n x^{n−1})
       = a_0 + x p_{n−1}(x).

The polynomial in brackets has degree n − 1, and once we have evaluated it, we only need one additional multiplication to have the value of p_n(x). In the same way, p_{n−1}(x) can be written as p_{n−1}(x) = a_1 + x p_{n−2}(x) for a polynomial p_{n−2}(x) of degree n − 2, and so on. This suggests the possibility of recursion, leading to Horner’s Algorithm. This algorithm computes a sequence of numbers

b_n = a_n,
b_{n−1} = a_{n−1} + x · b_n,
⋮
b_0 = a_0 + x · b_1,

where b_0 turns out to be the value of the polynomial p_n evaluated at x. In practice, one would not compute a sequence but overwrite the value of a single variable at each step. The following MATLAB and Python code illustrates how the algorithm can be implemented. Note that MATLAB encodes the coefficients a_0, . . . , a_n as a vector with entries a(1), . . . , a(n + 1).

MATLAB:

function p = horner(a, x)
% Evaluate a(1) + a(2)*x + ... + a(m)*x^(m-1) by Horner's method.
m = length(a);
p = a(m);
for k = m-1:-1:1
    p = a(k) + x*p;
end

Python:

def horner(polynomial, x):
    # Coefficients are listed from the highest degree down to the constant.
    result = 0
    for coefficient in polynomial:
        result = result * x + coefficient
    return result

This algorithm only requires n multiplications. Horner’s Method is the standard way of
evaluating polynomials on computers.
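As a quick sanity check (a hypothetical usage example, not part of the original notes), the Python version above, fed coefficients from the highest degree down, evaluates p(x) = 3x² + 2x + 1 at x = 2:

print(horner([3, 2, 1], 2))   # (3*2 + 2)*2 + 1 = 17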

Here we are less concerned with the precise numbers, but with the order of magnitude. Thus we will not care so much whether a computation procedure uses 1.5n² operations (n is the input size) or 20n², but we will care whether the algorithm needs n³ as opposed to n log(n) arithmetic operations to solve a problem. Video 1.2
To study conveniently the performance of algorithms, we use the big-O notation. Given two functions f and g taking integer arguments n, we say that

f(n) ∈ O(g(n)) or f(n) = O(g(n)),

if there exists a constant C > 0 and an integer n_0 > 0, such that |f(n)| < C · |g(n)| for all n > n_0. For example, n log(n) = O(n²) and n³ + 10^{308} n ∈ O(n³).


Example 2. Consider the problem of multiplying a matrix and a vector:

Ax = b,

where A is an n × n matrix, and x and b are n-vectors. Normally, the number of multiplications needed is n², and the number of additions is n(n − 1) (verify this!). However, there are some matrices, for example the one with the n-th roots of unity as entries, a_{jk} = e^{2πi jk/n}, for which there are algorithms (in this case, the Fast Fourier Transform) that can compute the product Ax in O(n log n) operations. This example is of great practical importance, but will not be discussed further at the moment.
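As a numerical illustration (a sketch assuming NumPy, not part of the original notes): NumPy's np.fft.fft uses the opposite sign convention in the exponent, so the product with the roots-of-unity matrix above corresponds to n * np.fft.ifft(x).

import numpy as np

n = 512
jj, kk = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = np.exp(2j * np.pi * jj * kk / n)   # entries are n-th roots of unity
x = np.random.randn(n)

naive = A @ x                # O(n^2) arithmetic operations
fast = n * np.fft.ifft(x)    # the same product in O(n log n)
print(np.allclose(naive, fast))   # True, up to rounding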

An interesting and challenging field is algebraic complexity theory, which deals with lower bounds on the number of arithmetic operations needed to perform certain computational tasks. It also asks questions such as whether Horner’s method and other algorithms are optimal, that is, cannot be improved upon.

1.2 Accuracy

In the early 19th century, C.F. Gauss, one of the most influential mathematicians of all time and a pioneer of numerical analysis, developed the method of least squares in order to predict the reappearance of the recently discovered asteroid Ceres. He was well aware of the limitations of numerical computing, as the quote at the beginning of this lecture indicates.

1.2.1 Measuring errors

To measure the quality of approximations, we use the concept of relative error. Given a quantity x and a computed approximation x̂, the absolute error is given by

E_abs(x̂) = |x − x̂|,

while the relative error is given as

E_rel(x̂) = |x − x̂| / |x|.

The benefit of working with relative errors is clear: they are scale invariant. On the other hand, absolute error can sometimes be meaningless. For example, an error of one hour is irrelevant when estimating the age of the Tyrannosaurus Rex at Manchester Museum, but it is crucial when determining the time of a lecture. This is because, for the former, one hour corresponds to a relative error of the order 10^{−11}, while for the latter it is of the order 10^{−1}.

1.2.2 Floating point and significant figures


Video 1.3
Nowadays, the established way of representing real numbers on computers is using floating-point arithmetic. In the double precision version of the IEEE standard for floating-point arithmetic, a number is represented using 64 bits.² A number is written

x = ±(1 + f) × 2^e,


where f is a fraction in [0, 1), represented using 52 bits, and e is the exponent, using 11 bits (what is the remaining 64th bit used for?). Two things are worth noticing about this representation: there are largest possible numbers, and there are gaps between representable numbers. The largest and smallest numbers representable in this form are of the order of ±10^{308}, enough for most practical purposes. A bigger concern are the gaps, which mean that the results of many computations almost always have to be rounded to the closest floating-point number.

² A bit is a binary digit, that is, either 0 or 1.
Throughout this course, when going through calculations without using a computer, we will usually use the terminology of significant figures (s.f.) and work with 4 significant figures in base 10. For example, in base 10, √3 equals 1.732 to 4 significant figures. To count the number of significant figures in a given number, start with the first non-zero digit from the left and, moving to the right, count all the digits thereafter, counting final zeros if they are to the right of the decimal point. For example, 1.2048, 12.040, 0.012048, 0.0012040 and 1204.0 all have 5 significant figures (s.f.). In rounding or truncation of a number to n s.f., the original is replaced by the closest number with n s.f. An approximation x̂ of a number x is said to be correct to n significant figures if both x̂ and x round to the same n s.f. number.³

³ This definition is not without problems; see for example the discussion in Section 1.2 of Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM 2002.
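For experimenting with this rule, here is a small helper (an illustrative sketch added here, not part of the notes) that rounds a number to n significant figures via scientific-notation formatting:

def round_sf(x, n=4):
    # Round x to n significant figures in base 10; note Python's
    # formatting breaks ties to even rather than always rounding up.
    return float(f"{x:.{n - 1}e}")

import math
print(round_sf(math.sqrt(3)))   # 1.732
print(round_sf(1203970, 5))     # 1204000.0, i.e. 1.2040e6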
Remark 1.1. Note that final zeros to the left of the decimal point may or may not be significant: the number 1204000 has at least 4 significant figures, but without any more information there is no way of knowing whether any more figures are significant. When 1203970 is rounded to 5 significant figures to give 1204000, an explanation that the result has 5 significant figures is required. This could be made clear by writing it in scientific notation: 1.2040 × 10^6. In some cases we also have to agree whether to round up or round down: for example, 1.25 could equal 1.2 or 1.3 to two significant figures. If we agree on rounding up, then to say that a = 1.2048 to 5 s.f. means that the exact value of a satisfies 1.20475 ≤ a < 1.20485.
Example 3. Suppose we want to find the solutions to the quadratic equation Video 1.6

ax² + bx + c = 0.

The two solutions to this problem are given by

x_1 = (−b + √(b² − 4ac)) / (2a),   x_2 = (−b − √(b² − 4ac)) / (2a).   (1.1)

In principle, to find x_1 and x_2 one only needs to evaluate the expressions for given a, b, c. Assume, however, that we are only allowed to compute to four significant figures, and consider the particular equation

x² + 39.7x + 0.13 = 0.

Using formula (1.1), we have, always rounding to four significant figures,

a = 1.000,  b = 39.70,  c = 0.1300,
b² = 1576.09 = 1576 (to 4 s.f.),  4ac = 0.5200 (to 4 s.f.),
b² − 4ac = 1575.48 = 1575 (to 4 s.f.),  √(b² − 4ac) = 39.69 (to 4 s.f.).

Hence, the computed solutions (to 4 significant figures) are given by

x_1 = −0.005000,  x_2 = −39.70.


More accurate solutions, however, are

x_1 = −0.0032748...,   x_2 = −39.6967...

The computed solution x̄_1 = −0.005000 is completely wrong, at least if we look at the relative error:

|x̄_1 − x_1| / |x_1| = 0.5268.

While the accuracy can be improved by increasing the number of significant figures during the calculation, such effects happen all the time in scientific computing, and the possibility of such effects has to be taken into account when designing numerical algorithms.
Note that it makes sense, as in the above example, to look at errors in a relative sense. An error of one mile is negligible when dealing with astronomical distances, but not so when measuring the length of a race track.
By analysing what causes the error it is sometimes possible to modify the method of calculation in order to improve the result. In the present example, the problems are being caused by the fact that b ≈ √(b² − 4ac), and therefore

(−b + √(b² − 4ac)) / (2a) = (−39.70 + 39.69) / 2

exhibits what is called “catastrophic cancellation.” A way out is provided by the observation that the two solutions are related by

x_1 · x_2 = c / a.   (1.2)

When b > 0, the computation of x_2 according to (1.1) shouldn’t cause any problems, and in our case we get −39.70, which is accurate to four significant figures. We can then use (1.2) to derive x_1 = c/(a x_2) = −0.003275, also accurate to 4 s.f.
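The effect is easy to reproduce in code. The following sketch (not from the notes) crudely models 4 s.f. arithmetic by rounding after every operation and compares the naïve root with the stabilised one:

import math

def sf4(x):
    # Keep 4 significant figures (a crude model of limited precision).
    return float(f"{x:.3e}")

a, b, c = 1.0, 39.7, 0.13
root = sf4(math.sqrt(sf4(sf4(b * b) - sf4(4 * a * c))))   # 39.69
x1_naive = sf4((-b + root) / (2 * a))   # -0.005: catastrophic cancellation
x2 = sf4((-b - root) / (2 * a))         # -39.70: harmless
x1_stable = sf4(c / (a * x2))           # -0.003275: accurate to 4 s.f.
print(x1_naive, x2, x1_stable)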

1.2.3 Sources of errors

As we have seen, one can sometimes get around numerical catastrophes by choosing a clever
method for solving a problem, rather than increasing precision. So far we have considered
errors introduced due to rounding operations. There are other sources of errors:

1. Overflow;

2. Errors in the model;

3. Human or measurement errors;

4. Truncation or approximation errors.

The first is rarely an issue, as we can represent numbers of order 10^{308} on a computer.
The second and third are important factors, but fall outside the scope of this lecture. The
fourth has to do with the fact that many computations are done approximately rather than
exactly. For computing the exponential, for example, we might use a method that gives the
approximation
e^x ≈ 1 + x + x²/2.
As it turns out, many practical methods give approximations to the “true” solution. End week 1


2 Interpolation
Video 2.1
How do we represent a function on a computer? If f is a polynomial of degree n,

f(x) = p_n(x) = a_0 + a_1 x + · · · + a_n x^n,

then we only need to store the n + 1 coefficients a_0, . . . , a_n. In fact, one can approximate an arbitrary continuous function on a bounded interval by a polynomial. Recall that C^k([a, b]) is the set of functions that are k-times continuously differentiable on [a, b].
Theorem 2.1 (Weierstrass). For any f ∈ C([0, 1]) and any ε > 0 there exists a polynomial p(x) such that

max_{0≤x≤1} |f(x) − p(x)| ≤ ε.

Given pairs (x_j, y_j) ∈ R², 0 ≤ j ≤ n, with distinct x_j, the interpolation problem consists of finding a polynomial p of lowest possible degree such that

p(x_j) = y_j,   0 ≤ j ≤ n.   (2.1)

Figure 1: The interpolation problem.

Example 4. Let h = 1/n, x_0 = 0, and x_i = ih for 1 ≤ i ≤ n. The x_i subdivide the interval [0, 1] into segments of equal length h. Now let y_i = ih/2 for 0 ≤ i ≤ n. Then the points (x_i, y_i) all lie on the line p_1(x) = x/2, as is easily verified. It is also easy to see that p_1 is the unique polynomial of degree at most 1 that goes through these points. In fact, we will see that it is the unique polynomial of degree at most n that passes through these points!

We will first describe the method of Lagrange interpolation, which also helps to establish
the existence and uniqueness of an interpolation polynomial satisfying (2.1). We then discuss
the quality of approximating polynomials by interpolation, the question of convergence, as
well as other methods such as Newton interpolation.


2.1 Lagrange Interpolation


Video 2.2
The next lemma shows that it is indeed possible to find a polynomial of degree at most n satisfying (2.1). We denote by P_n the set of all polynomials of degree at most n. Note that this also includes polynomials of degree smaller than n, and in particular constants, since we allow coefficients such as a_n in the representation a_0 + a_1 x + · · · + a_n x^n to be zero.
Lemma 2.1. Let x_0, x_1, . . . , x_n be distinct real numbers. Then there exist polynomials L_k ∈ P_n such that

L_k(x_j) = 1 if j = k,   L_k(x_j) = 0 if j ≠ k.

Moreover, the polynomial

p_n(x) = Σ_{k=0}^{n} L_k(x) y_k

is in P_n and satisfies p_n(x_j) = y_j for 0 ≤ j ≤ n.

Proof. Clearly, if L_k exists, then it is a polynomial of degree n with n roots at x_j for j ≠ k. Hence, it may be factorized as

L_k(x) = C_k Π_{j≠k} (x − x_j) = C_k (x − x_0) · · · (x − x_{k−1})(x − x_{k+1}) · · · (x − x_n),

for some constant C_k. To determine C_k, set x = x_k. Then L_k(x_k) = 1 = C_k Π_{j≠k} (x_k − x_j), and therefore

C_k = 1 / Π_{j≠k} (x_k − x_j).

Note that we assumed the x_j to be distinct, otherwise we would have to divide by zero and cause a disaster. We therefore obtain the representation

L_k(x) = Π_{j≠k} (x − x_j) / Π_{j≠k} (x_k − x_j).

These L_k exist in P_n, so this proves the first claim. Now set

p_n(x) := Σ_{k=0}^{n} y_k L_k(x).

Then p_n(x_j) = Σ_{k=0}^{n} y_k L_k(x_j) = y_j L_j(x_j) = y_j. Since p_n(x) is a linear combination of the various L_k, it lives in P_n. This completes the proof.


Video 2.3
We have shown the existence of an interpolating polynomial. We next show that this
polynomial is uniquely determined. The important ingredient is the Fundamental Theorem
of Algebra, a version of which states that

A polynomial of degree n with complex coefficients has exactly n complex roots.

Theorem 2.2 (Lagrange Interpolation Theorem). Let n ≥ 0. Let x_j, 0 ≤ j ≤ n, be distinct real numbers and let y_j, 0 ≤ j ≤ n, be any real numbers. Then there exists a unique p_n(x) ∈ P_n such that

p_n(x_j) = y_j,   0 ≤ j ≤ n.   (2.2)


Proof. The case n = 0 is clear, so let us assume n ≥ 1. In Lemma 2.1 we constructed a polynomial p_n(x) of degree at most n satisfying the conditions (2.2), proving the existence part. For the uniqueness, assume that we have two such polynomials p_n(x) and q_n(x) of degree at most n satisfying the interpolating property (2.2). The goal is to show that they are the same. By assumption, the difference p_n(x) − q_n(x) is a polynomial of degree at most n that takes on the value p_n(x_j) − q_n(x_j) = y_j − y_j = 0 at the n + 1 distinct x_j, 0 ≤ j ≤ n. By the Fundamental Theorem of Algebra, a non-zero polynomial of degree n can have no more than n distinct real roots, from which it follows that p_n(x) − q_n(x) ≡ 0, or p_n(x) = q_n(x).

Definition 2.1. Given n + 1 distinct real numbers x_j, 0 ≤ j ≤ n, and n + 1 real numbers y_j, 0 ≤ j ≤ n, the polynomial

p_n(x) = Σ_{k=0}^{n} L_k(x) y_k   (2.3)

is called the Lagrange interpolation polynomial of degree n corresponding to the data points (x_j, y_j), 0 ≤ j ≤ n. If the y_k are the values of a function f, that is, if f(x_k) = y_k, 0 ≤ k ≤ n, then p_n(x) is called the Lagrange interpolation polynomial associated with f and x_0, . . . , x_n.
Remark 2.1. Note that the interpolation polynomial is uniquely determined, but that the polynomial can be written in different ways. The term Lagrange interpolation polynomial thus refers to the particular form (2.3) of this polynomial. For example, the two expressions

q_2(x) = x²,   p_2(x) = x(x − 1)/2 + x(x + 1)/2,

define the same polynomial (as can be verified by multiplying out the terms on the right), and thus both represent the unique polynomial interpolating the points (x_0, y_0) = (−1, 1), (x_1, y_1) = (0, 0), (x_2, y_2) = (1, 1), but only p_2(x) is in the Lagrange form.

(*) A different take on the uniqueness problem can be arrived at by translating the problem into a linear algebra one. For this, note that if p_n(x) = a_0 + a_1 x + · · · + a_n x^n, then the polynomial evaluation problem at the x_j, 0 ≤ j ≤ n, can be written as a matrix-vector product:

⎡ y_0 ⎤   ⎡ 1  x_0  · · ·  x_0^n ⎤ ⎡ a_0 ⎤
⎢ y_1 ⎥   ⎢ 1  x_1  · · ·  x_1^n ⎥ ⎢ a_1 ⎥
⎢  ⋮  ⎥ = ⎢ ⋮    ⋮    ⋱     ⋮   ⎥ ⎢  ⋮  ⎥
⎣ y_n ⎦   ⎣ 1  x_n  · · ·  x_n^n ⎦ ⎣ a_n ⎦

or y = X a. If the matrix X is invertible, then the interpolating polynomial is uniquely determined by the coefficient vector a = X^{−1} y. The matrix X is invertible if and only if det(X) ≠ 0. The determinant of X is the well-known Vandermonde determinant:

det(X) = Π_{j>i} (x_j − x_i).

Clearly, this determinant is different from zero if and only if the x_j are all distinct, which shows the importance of this assumption.


Example 5. Consider the function f(x) = e^x on the interval [−1, 1], with interpolation points x_0 = −1, x_1 = 0, x_2 = 1. The Lagrange basis functions are Video 2.2

L_0(x) = (x − x_1)(x − x_2) / ((x_0 − x_1)(x_0 − x_2)) = x(x − 1)/2,
L_1(x) = 1 − x²,
L_2(x) = x(x + 1)/2.

The Lagrange interpolation polynomial is therefore given by

p_2(x) = (1/2) x(x − 1) e^{−1} + (1 − x²) e^0 + (1/2) x(x + 1) e^1
       = 1 + x sinh(1) + x² (cosh(1) − 1).

Figure 2: Lagrange interpolation of e^x at −1, 0, 1.
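The example is easy to check numerically. The following short sketch (assuming NumPy; not part of the original notes) builds the Lagrange form directly and compares it with the closed form above:

import numpy as np

def lagrange_interp(xs, ys, x):
    # Evaluate the Lagrange interpolation polynomial at x (a sketch).
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        L = 1.0
        for j, xj in enumerate(xs):
            if j != k:
                L *= (x - xj) / (xk - xj)   # Lagrange basis L_k(x)
        total += yk * L
    return total

xs = [-1.0, 0.0, 1.0]
ys = np.exp(xs)
x = 0.3
p2 = lagrange_interp(xs, ys, x)
closed = 1 + x * np.sinh(1) + x**2 * (np.cosh(1) - 1)
print(np.isclose(p2, closed))   # True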

2.2 Interpolation Error


Video 2.4
If the data points (x j , y j ) come from a function f (x), that is, if f (x j ) = y j , then the Lagrange
interpolating polynomial can look very different from the original function. It is therefore
of interest to have some control over the interpolation error f (x) − p n (x). Clearly, without
any further assumption on f the difference can be arbitrary. We will therefore restrict to
functions f that are sufficiently smooth, as quantified by belonging to some class C k ([a, b])
for sufficiently large k.
Example 6. All polynomials belong to C^k([a, b]) for all bounded intervals [a, b] and any integer k ≥ 0. However, f(x) = 1/x ∉ C([0, 1]), as f(x) → ∞ for x → 0 and the function is therefore not continuous there.

Now that we have established the existence and uniqueness of a function’s interpolation polynomial, we would like to know how well it approximates that function.


Theorem 2.3. Let n ≥ 0 and assume f ∈ C^{n+1}([a, b]). Let p_n(x) ∈ P_n be the Lagrange interpolation polynomial associated with f and distinct x_j, 0 ≤ j ≤ n. Then for every x ∈ [a, b] there exists ξ = ξ(x) ∈ (a, b) such that

f(x) − p_n(x) = (f^{(n+1)}(ξ) / (n + 1)!) π_{n+1}(x),   (2.4)

where

π_{n+1}(x) = (x − x_0) · · · (x − x_n).

For the proof of Theorem 2.3 we need the following consequence of Rolle’s Theorem. Video 2.5
Lemma 2.2. Let g ∈ C^m([a, b]), and suppose g vanishes at m + 1 points x_0, . . . , x_m. Then there exists ξ ∈ (a, b) such that the m-th derivative g^{(m)} satisfies g^{(m)}(ξ) = 0.

Proof. By Rolle’s Theorem, for any two x_i, x_j there exists a point in between where g′ vanishes, therefore g′ vanishes at (at least) m points. Repeating this argument, it follows that g^{(m)} vanishes at some point ξ ∈ (a, b).

Proof of Theorem 2.3. Assume x ≠ x_j for 0 ≤ j ≤ n (otherwise the theorem is clearly true). Define the function Video 2.6

φ(t) = f(t) − p_n(t) − ((f(x) − p_n(x)) / π_{n+1}(x)) π_{n+1}(t).

This function vanishes at n + 2 distinct points, namely t = x_j, 0 ≤ j ≤ n, and t = x. Assume n > 0 (the case n = 0 is left as an exercise). By Lemma 2.2, the function φ^{(n+1)} has a zero ξ ∈ (a, b), while the (n + 1)-th derivative of p_n vanishes (since p_n is a polynomial of degree n). We therefore have

0 = φ^{(n+1)}(ξ) = f^{(n+1)}(ξ) − ((f(x) − p_n(x)) / π_{n+1}(x)) (n + 1)!,

from which we get

f(x) − p_n(x) = (f^{(n+1)}(ξ) / (n + 1)!) π_{n+1}(x).

This completes the proof.

Theorem 2.3 contains an unspecified number ξ. Even though we can’t find this location in practice, the situation is not too bad as we can sometimes bound the (n + 1)-th derivative of f on the interval [a, b].
Corollary 2.1. Under the conditions as in Theorem 2.3,

|f(x) − p_n(x)| ≤ (M_{n+1} / (n + 1)!) |π_{n+1}(x)|,

where

M_{n+1} = max_{a≤x≤b} |f^{(n+1)}(x)|.

Proof. By assumption we have the bound

|f^{(n+1)}(ξ)| ≤ M_{n+1},

so that

|f(x) − p_n(x)| = |f^{(n+1)}(ξ) π_{n+1}(x)| / (n + 1)! ≤ M_{n+1} |π_{n+1}(x)| / (n + 1)!.

This completes the proof. End week 2


Example 7. Suppose we would like to approximate f(x) = e^x by an interpolating polynomial p_1 ∈ P_1 at points x_0, x_1 ∈ [0, 1] that are separated by a distance h = x_1 − x_0. What h should we choose to achieve Video 3.1

|p_1(x) − e^x| ≤ 10^{−5},   x_0 ≤ x ≤ x_1?

From Corollary 2.1, we get

|p_1(x) − e^x| ≤ M_2 |π_2(x)| / 2,

where M_2 = max_{x_0≤x≤x_1} |f^{(2)}(x)| ≤ e (because x_0, x_1 ∈ [0, 1] and f^{(2)}(x) = e^x) and π_2(x) = (x − x_0)(x − x_1). To find the maximum of |π_2(x)|, first write

x = x_0 + θh,   x_1 = x_0 + h,

for θ ∈ [0, 1]. Then

|π_2(x)| = θh(h − θh) = h² θ(1 − θ).

By taking derivatives with respect to θ we find that the maximum is attained at θ = 1/2. Hence,

|π_2(x)| ≤ h² (1/2)(1 − 1/2) = h²/4.

We conclude that

|p_1(x) − e^x| ≤ h² e / 8.

In order to achieve that this falls below 10^{−5}, we require that h ≤ √(8 · 10^{−5} / e) = 5.425 · 10^{−3} to 4 s.f. This gives information on how small the spacing of points needs to be for linear interpolation to guarantee a certain accuracy.
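A quick numerical sanity check of this bound (a sketch assuming NumPy, not part of the notes): with h = 5.425 · 10^{−3}, the worst-case linear interpolation error of e^x on one subinterval of [0, 1] should indeed sit just below 10^{−5}.

import numpy as np

h = 5.425e-3
x0, x1 = 0.5, 0.5 + h                  # any subinterval of [0, 1]
x = np.linspace(x0, x1, 1001)
# Linear interpolant of e^x through (x0, e^{x0}) and (x1, e^{x1}):
p1 = np.exp(x0) + (np.exp(x1) - np.exp(x0)) * (x - x0) / h
print(np.max(np.abs(p1 - np.exp(x))))  # roughly 6e-6, below 1e-5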

While the interpolation polynomial of degree at most n for a function f and n + 1 points x_0, . . . , x_n is unique, it can appear in different forms. The one we have seen so far is the Lagrange form, where the polynomial is given as a linear combination of the Lagrange basis functions:

p(x) = Σ_{k=0}^{n} L_k(x) f(x_k),

or some modifications of this form, such as the barycentric form (see Section 2.5). A different approach to constructing the interpolation polynomial is based on Newton’s divided differences.

2.3 Newton’s divided differences


Video 3.2
A convenient way of representing an interpolation polynomial is as

p(x) = a_0 + a_1(x − x_0) + · · · + a_n(x − x_0) · · · (x − x_{n−1}).   (2.5)

Provided we have the coefficients a_0, . . . , a_n, evaluating the polynomial only requires n multiplications using Horner’s Method. Moreover, it is easy to add new points: if x_{n+1} is added, the coefficients a_0, . . . , a_n don’t need to be changed.
Example 8. Let x_0 = −1, x_1 = 0, x_2 = 1 and x_3 = 2. Then the polynomial p_3(x) = x³ can be written in the form (2.5) as


A pleasant feature of the form (2.5) is that the coefficients a_0, . . . , a_n can be computed easily using divided differences. The divided differences associated with the function f and distinct x_0, . . . , x_n ∈ R are defined recursively as

f[x_i] := f(x_i),
f[x_i, x_{i+1}] := (f[x_{i+1}] − f[x_i]) / (x_{i+1} − x_i),
f[x_i, x_{i+1}, . . . , x_{i+k}] := (f[x_{i+1}, x_{i+2}, . . . , x_{i+k}] − f[x_i, x_{i+1}, . . . , x_{i+k−1}]) / (x_{i+k} − x_i).

The divided differences can be computed from a divided difference table, where we move from one column to the next by applying the rules above (here we use the shorthand f_i := f(x_i)):

0  x_0  f_0
                 f[x_0, x_1]
1  x_1  f_1                   f[x_0, x_1, x_2]
                 f[x_1, x_2]                     f[x_0, x_1, x_2, x_3]
2  x_2  f_2                   f[x_1, x_2, x_3]
                 f[x_2, x_3]
3  x_3  f_3

From this table we also see that adding a new pair (x_{n+1}, f_{n+1}) would require an update of the table that takes O(n) operations.
Theorem 2.4. Let x_0, . . . , x_n be distinct points. Then the interpolation polynomial for f at points x_i, . . . , x_{i+k} is given by

p_{i,k}(x) = f[x_i] + f[x_i, x_{i+1}](x − x_i) + f[x_i, x_{i+1}, x_{i+2}](x − x_i)(x − x_{i+1}) + · · ·
           + f[x_i, . . . , x_{i+k}](x − x_i) · · · (x − x_{i+k−1}).

In particular, the coefficients in Equation (2.5) are given by the divided differences

a_k = f[x_0, . . . , x_k],

and the interpolation polynomial p_n(x) can therefore be written as

p_n(x) = p_{0,n}(x) = f[x_0] + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1) + · · ·
       + f[x_0, . . . , x_n](x − x_0) · · · (x − x_{n−1}).

Before going into the proof, observe that the divided difference f[x_0, . . . , x_n] is the highest order coefficient, that is, the coefficient of x^n, of the interpolation polynomial p_n(x). This observation is crucial in the proof.

Proof. ∗ The proof is by induction on k. For the case k = 0 we have p_{i,0}(x) = f[x_i] = f(x_i), which is the unique interpolation polynomial of degree 0 at (x_i, f(x_i)), so the claim is true in this case. Assume the statement holds for k > 0, which means that the interpolation polynomial p_{i,k}(x) for the pairs (x_i, f(x_i)), . . . , (x_{i+k}, f(x_{i+k})) is given as in the theorem. We can now choose a value a_{k+1} such that

p_{i,k+1}(x) = p_{i,k}(x) + a_{k+1}(x − x_i) · · · (x − x_{i+k})   (2.6)

interpolates f at x_i, . . . , x_{i+k+1}. In fact, note that p_{i,k+1}(x_j) = f(x_j) for i ≤ j ≤ i + k, so that we only require a_{k+1} to be chosen so that p_{i,k+1}(x_{i+k+1}) = f(x_{i+k+1}). Moreover, note that a_{k+1} is the coefficient of highest order of p_{i,k+1}(x), that is, we can write

p_{i,k+1}(x) = a_{k+1} x^{k+1} + lower order terms.

The only thing that needs to be shown is that a_{k+1} = f[x_i, . . . , x_{i+k+1}]. For this, we define a new polynomial q(x), show that it coincides with p_{i,k+1}(x) by being the unique interpolation polynomial of f at x_i, . . . , x_{i+k+1}, and then show that the highest order coefficient of q(x) is precisely f[x_i, . . . , x_{i+k+1}]. Define

q(x) = ((x − x_i) p_{i+1,k}(x) − (x − x_{i+k+1}) p_{i,k}(x)) / (x_{i+k+1} − x_i).   (2.7)

This polynomial has degree ≤ k + 1, just like p_{i,k+1}(x). Moreover:

q(x_i) = p_{i,k}(x_i) = f(x_i),
q(x_{i+k+1}) = p_{i+1,k}(x_{i+k+1}) = f(x_{i+k+1}),
q(x_j) = ((x_j − x_i) f(x_j) − (x_j − x_{i+k+1}) f(x_j)) / (x_{i+k+1} − x_i) = f(x_j),   i + 1 ≤ j ≤ i + k.

This means that q(x) also interpolates f at x_i, . . . , x_{i+k+1}, and by the uniqueness of the interpolation polynomial, must equal p_{i,k+1}(x). Let’s now compare the coefficients of x^{k+1} in both polynomials. The coefficient of x^{k+1} in p_{i,k+1} is a_{k+1}, as can be seen from (2.6). By the induction hypothesis, the polynomials p_{i+1,k}(x) and p_{i,k}(x) have the form

p_{i+1,k}(x) = f[x_{i+1}, . . . , x_{i+k+1}] x^k + lower order terms,
p_{i,k}(x) = f[x_i, . . . , x_{i+k}] x^k + lower order terms.

By plugging into (2.7), we see that the coefficient of x^{k+1} in q(x) is

(f[x_{i+1}, . . . , x_{i+k+1}] − f[x_i, . . . , x_{i+k}]) / (x_{i+k+1} − x_i) = f[x_i, . . . , x_{i+k+1}].

This coefficient has to equal a_{k+1}, and the claim follows.

Example 9. Let’s find the divided difference form of a cubic interpolation polynomial for the points Video 3.2

(−1, 1), (0, 1), (3, 181), (−2, −39).

The divided difference table would look like

j   x_j   f_j   f[x_j, x_{j+1}]            f[x_j, x_{j+1}, x_{j+2}]     f[x_0, x_1, x_2, x_3]
0   −1    1
                (1 − 1)/(0 − (−1)) = 0
1   0     1                                (60 − 0)/(3 − (−1)) = 15
                (181 − 1)/(3 − 0) = 60                                  (8 − 15)/(−2 − (−1)) = 7      (2.8)
2   3     181                              (44 − 60)/(−2 − 0) = 8
                (−39 − 181)/(−2 − 3) = 44
3   −2    −39

The coefficients a_j = f[x_0, . . . , x_j] are given by the upper diagonal, and the interpolation polynomial is thus

p_3(x) = a_0 + a_1(x − x_0) + a_2(x − x_0)(x − x_1) + a_3(x − x_0)(x − x_1)(x − x_2)
       = 1 + 15x(x + 1) + 7x(x + 1)(x − 3).

Now suppose we add another data point (4, 801). This amounts to adding only one new term to the polynomial. The new coefficient a_4 = f[x_0, . . . , x_4] is calculated by adding a new line at the bottom of Table (2.8) as follows:

j = 4,  x_4 = 4,  f_4 = 801,  f[x_3, x_4] = 140,  f[x_2, x_3, x_4] = 96,  f[x_1, . . . , x_4] = 22,  f[x_0, . . . , x_4] = a_4 = 3.

The updated polynomial is therefore

p_4(x) = 1 + 15x(x + 1) + 7x(x + 1)(x − 3) + 3x(x + 1)(x − 3)(x + 2).

Evaluating this polynomial can be done conveniently using Horner’s method,

1 + x(0 + (x + 1)(15 + (x − 3)(7 + 3(x + 2)))),

using only four multiplications.
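The table computation is mechanical and easy to code. Below is a short sketch (not from the notes, no external libraries) that computes the Newton coefficients a_j = f[x_0, . . . , x_j] in place and evaluates the Newton form by Horner-style nesting:

def divided_differences(xs, fs):
    # Return the Newton coefficients a_j = f[x_0, ..., x_j] (a sketch).
    coeffs = list(fs)
    n = len(xs)
    for k in range(1, n):                  # column k of the table
        for i in range(n - 1, k - 1, -1):  # update in place, bottom up
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - k])
    return coeffs

def newton_eval(xs, coeffs, x):
    # Evaluate the Newton form with Horner-style nesting.
    result = coeffs[-1]
    for a, xk in zip(coeffs[-2::-1], xs[-2::-1]):
        result = a + (x - xk) * result
    return result

xs = [-1, 0, 3, -2]
fs = [1, 1, 181, -39]
print(divided_differences(xs, fs))                       # [1, 0.0, 15.0, 7.0]
print(newton_eval(xs, divided_differences(xs, fs), 3))   # 181.0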

∗ Another thing to notice is that the order of the x_i plays a role in assembling the Newton interpolation polynomial, while the order did not play a role in Lagrange interpolation. Recall the characterisation of the interpolation polynomial in terms of the Vandermonde matrix from Week 2. The coefficients a_i of the Newton divided difference form can also be derived as the solution of a system of linear equations, this time in convenient triangular form:

⎡ f_0 ⎤   ⎡ 1      0                0                     · · ·  0                     ⎤ ⎡ a_0 ⎤
⎢ f_1 ⎥   ⎢ 1   x_1 − x_0           0                     · · ·  0                     ⎥ ⎢ a_1 ⎥
⎢ f_2 ⎥ = ⎢ 1   x_2 − x_0   (x_2 − x_0)(x_2 − x_1)        · · ·  0                     ⎥ ⎢ a_2 ⎥
⎢  ⋮  ⎥   ⎢ ⋮      ⋮                ⋮                      ⋱     ⋮                     ⎥ ⎢  ⋮  ⎥
⎣ f_n ⎦   ⎣ 1   x_n − x_0   (x_n − x_0)(x_n − x_1)        · · ·  Π_{j<n}(x_n − x_j)    ⎦ ⎣ a_n ⎦

2.4 Convergence
Video 3.3
For a given set of points x_0, . . . , x_n and function f, we have a bound on the interpolation error. Is it possible to make the error smaller by adding more interpolation points, or by modifying the distribution of these points? The answer to this question can depend on two things: the class of functions considered, and the spacing of the points. Let p_n(x) denote the Lagrange interpolation polynomial of degree n for f at the points x_0, . . . , x_n. The question we ask is whether

max_{a≤x≤b} |p_n(x) − f(x)| → 0 as n → ∞.

Perhaps surprisingly, the answer is negative, as the following famous example, known as the
Runge Phenomenon, shows.
Example 10. Consider the interval [a, b] and let

x_j = a + (j/n)(b − a),   0 ≤ j ≤ n,


be n + 1 uniformly spaced points on [a, b]. Consider the function

f(x) = 1 / (1 + 25x²)

on the interval [−1, 1]. This function is smooth and it appears unlikely to cause any trouble. However, when interpolating at various equispaced points for increasing n, we see that the interpolation error seems to increase. The reason for this phenomenon lies in the behaviour of the complex function z ↦ 1/(1 + z²).

[Figure: interpolants of 1/(1 + 25x²) at equispaced points on [−1, 1] for increasing n.]

The problem is not due to the interpolation method, but has to do with the spacing of the
points.
Example 11. Let us revisit the function 1/(1 + 25x²) and try to interpolate it at Chebyshev points:

x_j = cos(jπ/n),   0 ≤ j ≤ n.
Calculating the interpolation error for this example shows a completely different result to the
previous example. In fact, plotting the error and comparing it with the case of equispaced
points shows that choosing the interpolation points in a clever way can be of huge benefit.
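The comparison is easy to reproduce. The sketch below (not from the notes; it assumes SciPy's BarycentricInterpolator, which implements the barycentric formula discussed in Section 2.5) prints the maximum interpolation error for both node sets:

import numpy as np
from scipy.interpolate import BarycentricInterpolator

f = lambda x: 1 / (1 + 25 * x**2)
xx = np.linspace(-1, 1, 2001)   # fine grid for measuring the error

for n in (5, 10, 15, 20):
    equi = np.linspace(-1, 1, n + 1)                 # equispaced nodes
    cheb = np.cos(np.arange(n + 1) * np.pi / n)      # Chebyshev points
    err_e = np.max(np.abs(BarycentricInterpolator(equi, f(equi))(xx) - f(xx)))
    err_c = np.max(np.abs(BarycentricInterpolator(cheb, f(cheb))(xx) - f(xx)))
    print(n, err_e, err_c)   # equispaced error grows; Chebyshev error shrinks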

Figure 3: Interpolation error for equispaced and Chebyshev points.


It can be shown (see Part A of problem sheet 3) that the interpolation error at Chebyshev points in the interval [−1, 1] can be bounded as

|f(x) − p_n(x)| ≤ M_{n+1} / (2^n (n + 1)!).

This is entirely due to the behaviour of the polynomial π_{n+1}(x) at these points.
To summarize, we have the following two observations:

1. To estimate the difference |f(x) − p_n(x)| we need assumptions on the function f, for example, that it is sufficiently smooth.

2. The location of the interpolation points x_j, 0 ≤ j ≤ n, is crucial. Equispaced points can lead to unpleasant results!

2.5 An alternative form

The representation as a Lagrange interpolation polynomial,

p_n(x) = Σ_{k=0}^{n} L_k(x) f(x_k),

has some drawbacks. On the one hand, it requires O(n²) operations to evaluate. Besides this, adding new interpolation points requires the recalculation of the Lagrange basis polynomials L_k(x). Both of these problems can be remedied by rewriting the Lagrange interpolation formula.
Provided x ≠ x_j for 0 ≤ j ≤ n, the Lagrange interpolation polynomial can be written as

p(x) = ( Σ_{k=0}^{n} (w_k / (x − x_k)) f(x_k) ) / ( Σ_{k=0}^{n} w_k / (x − x_k) ),   (2.9)

where w_k = 1 / Π_{j≠k} (x_k − x_j) are called the barycentric weights. Once the weights have been computed, the evaluation only takes O(n) operations, and updating it with new weights is also only O(n) operations. To derive this formula, define L(x) = Π_{k=0}^{n} (x − x_k) and note that p(x) = L(x) Σ_{k=0}^{n} (w_k / (x − x_k)) f(x_k). Noting also that 1 = Σ_{k=0}^{n} L_k(x) = L(x) Σ_{k=0}^{n} w_k / (x − x_k) and dividing by this “intelligent one”, Equation (2.9) follows. Finally, it can be shown that the problem of computing the barycentric Lagrange interpolation is numerically stable at points such as Chebyshev points.
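A direct transcription of formula (2.9) into Python (a sketch added here, not from the notes; it assumes the evaluation point does not coincide with a node):

import numpy as np

def barycentric(xs, fs, x):
    # Evaluate the interpolant via the barycentric formula (2.9).
    xs = np.asarray(xs, dtype=float)
    # Weights w_k = 1 / prod_{j != k} (x_k - x_j): computed once, O(n^2).
    w = np.array([1.0 / np.prod(xk - np.delete(xs, k))
                  for k, xk in enumerate(xs)])
    t = w / (x - xs)                 # O(n) per evaluation point
    return np.dot(t, fs) / np.sum(t)

xs = [-1.0, 0.0, 1.0]
fs = np.exp(xs)
print(barycentric(xs, fs, 0.3))      # matches p_2 from Example 5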


3 Integration and Quadrature


Video 3.4
We are interested in the problem of computing an integral

∫_a^b f(x) dx.

If possible, one can compute the antiderivative F(x) (the function such that F′(x) = f(x)) and obtain the integral as F(b) − F(a). However, it is not always possible to compute the antiderivative, as in the cases

∫_0^1 e^{x²} dx,   ∫_0^π cos(x²) dx.

More prominently, the standard normal (or Gaussian) probability distribution amounts to evaluating the integral

(1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx,

which is not possible in closed form. Even if it is possible in principle, evaluating the antiderivative may not be numerically the best thing to do. The problem is then to approximate such integrals numerically as well as possible.

3.1 The Trapezium Rule

The Trapezium Rule seeks to approximate the integral, interpreted as the area under the
curve, by the area of a trapezium defined by the graph of the function.

[Figure: the one-step Trapezium Rule.]


Suppose we want to approximate the integral between a and b, and let h = b − a. Then the trapezium approximation is given by

∫_a^b f(x) dx ≈ I(f) = (h/2)(f(a) + f(b)),

as can be verified easily. The Trapezium Rule may be interpreted as integrating the linear interpolant of f at the points x_0 = a and x_1 = b. The linear interpolant is given as

p_1(x) = ((x − b)/(a − b)) f(a) + ((x − a)/(b − a)) f(b).

Integrating this function gives rise to the representation as the area of a trapezium:

∫_a^b p_1(x) dx = (h/2)(f(a) + f(b)).

Using the interpolation error, we can derive the integration error for the Trapezium Rule.
Theorem 3.1. Given f ∈ C²([a, b]), we claim that Video 3.5

∫_a^b f(x) dx = ∫_a^b p_1(x) dx − (1/12) h³ f′′(ξ),

for some ξ ∈ (a, b).

Proof. To derive this, recall (from Theorem 2.3) that the interpolation error is given by

f(x) = p_1(x) + ((x − a)(x − b) / 2!) f′′(η(x)),

for some η(x) ∈ (a, b). We can therefore write the integral as

∫_a^b f(x) dx = ∫_a^b p_1(x) dx + (1/2) ∫_a^b (x − a)(x − b) f′′(η(x)) dx.

By the Integral Mean Value Theorem, there exists a ξ ∈ (a, b) such that

∫_a^b (x − a)(x − b) f′′(η(x)) dx = f′′(ξ) ∫_a^b (x − a)(x − b) dx.

Using integration by parts, we get

∫_a^b (x − a)(x − b) dx = [ (x − a)(x − b)²/2 ]_a^b − (1/2) ∫_a^b (x − b)² dx = (1/6)(a − b)³ = −(1/6) h³.

For the whole expression we therefore get

∫_a^b f(x) dx = ∫_a^b p_1(x) dx − (1/12) h³ f′′(ξ),

as claimed.


Example 12. We compute the integral Video 3.5

I = ∫_1^2 1/(1 + x) dx.

The antiderivative is ln(1 + x), so the exact result is ln(1.5) ≈ 0.4055 to 4 s.f. for this integral. Using the Trapezium Rule we get

I ≈ ((2 − 1)/2)(f(1) + f(2)) = (1/2)(1/2 + 1/3) = 5/12 ≈ 0.4167,

to four significant figures.
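In code, the one-step rule is a one-liner; a sketch (not from the notes):

def trapezium(f, a, b):
    # One-step trapezium rule for the integral of f over [a, b].
    return (b - a) * (f(a) + f(b)) / 2

print(trapezium(lambda x: 1 / (1 + x), 1, 2))   # 0.41666..., vs ln(1.5) = 0.4055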

The trapezium rule is an example of a quadrature rule. End week 3


Definition 3.1. A quadrature rule seeks to approximate the value of a definite integral by a weighted sum of function values, Video 4.1

∫_a^b f(x) dx ≈ Σ_{k=0}^{n} w_k f(x_k),

where the x_k are the quadrature nodes and the w_k are the quadrature weights.

3.2 Simpson’s Rule

Simpson’s rule uses three points, x_0 = a, x_2 = b, and x_1 = (a + b)/2, to approximate the integral. Let h = (b − a)/2. Then Simpson’s Rule gives

∫_a^b f(x) dx ≈ I_2(f) = (h/3)(f(x_0) + 4f(x_1) + f(x_2)).

Example 13. For the integrand f(x) = 1/(1 + x) from 1 to 2, Simpson’s rule gives the approximation

I_2(f) = (1/6)(1/2 + 8/5 + 1/3) ≈ 0.4056.

This is much closer to the true value 0.4055 than the trapezium rule approximation.
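Again a direct transcription into code (a sketch, not from the notes):

def simpson(f, a, b):
    # One-step Simpson's rule with midpoint x1 = (a + b)/2.
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f((a + b) / 2) + f(b))

print(simpson(lambda x: 1 / (1 + x), 1, 2))   # 0.40555..., close to ln(1.5)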
Example 14. For the function f(x) = 3x² − x + 1 and interval [0, 1] (that is, h = 0.5), Simpson’s rule gives the approximation Video 4.3

I_2(f) = (1/6)(1 + 4(3/4 − 1/2 + 1) + 3) = 3/2.

The antiderivative of this polynomial is x³ − x²/2 + x, so the true integral is 1 − 1/2 + 1 = 3/2. In this case, Simpson’s rule gives the exact value of the integral! As we will see, this is the case for any quadratic polynomial.

Simpson’s rule is a special case of a Newton-Cotes quadrature rule.

Definition 3.2. A Newton-Cotes scheme of order n uses the Lagrange basis functions to construct the quadrature weights. Given nodes x_k = a + kh, 0 ≤ k ≤ n, where h = (b − a)/n, the Video 4.2
exact integral ∫_a^b f dx is approximated by integrating the Lagrange interpolation polynomial p_n of degree n, associated with these points. Recall that

p_n(x) = Σ_{k=0}^{n} L_k(x) f(x_k).


Then

∫_a^b f(x) dx ≈ I_n(f) := ∫_a^b p_n(x) dx = Σ_{k=0}^{n} w_k f(x_k),

where w_k = ∫_a^b L_k(x) dx.

We now show that Simpson’s rule is indeed a Newton-Cotes rule of order 2. Let x_0 = a, x_2 = b and x_1 = (a + b)/2. Define h := x_1 − x_0 = (b − a)/2. The quadratic interpolation polynomial is given by

p_2(x) = ((x − x_1)(x − x_2) / ((x_0 − x_1)(x_0 − x_2))) f(x_0) + ((x − x_0)(x − x_2) / ((x_1 − x_0)(x_1 − x_2))) f(x_1) + ((x − x_1)(x − x_0) / ((x_2 − x_0)(x_2 − x_1))) f(x_2).

We claim that

I_2(f) = ∫_a^b p_2(x) dx = (h/3)(f(x_0) + 4f(x_1) + f(x_2)).

To show this, we make use of the identities x_1 = x_0 + h, x_2 = x_0 + 2h, to get the representation (we use f_i := f(x_i) for brevity)

p_2(x) = (f_0 / (2h²))(x − x_1)(x − x_2) + (f_1 / (−h²))(x − x_0)(x − x_2) + (f_2 / (2h²))(x − x_1)(x − x_0).

Using integration by parts or otherwise, we can evaluate the integrals of each term:

∫_{x_0}^{x_2} (x − x_1)(x − x_2) dx = (2/3) h³,   ∫_{x_0}^{x_2} (x − x_0)(x − x_2) dx = −(4/3) h³,   ∫_{x_0}^{x_2} (x − x_0)(x − x_1) dx = (2/3) h³.

Altogether, we get

∫_{x_0}^{x_2} p_2(x) dx = (h/3)(f_0 + 4f_1 + f_2).

This shows the claim. As with the Trapezium rule, we can bound the error for Simpson’s rule.
Theorem 3.2. Let f ∈ C⁴([a, b]), h = (b − a)/2 and x_0 = a, x_1 = x_0 + h, x_2 = b. Then there exists Video 4.3
ξ ∈ (a, b) such that the integration error is

E(f) := ∫_a^b f(x) dx − (h/3)(f(x_0) + 4f(x_1) + f(x_2)) = −(h⁵/90) f^{(4)}(ξ).

Note that, in some places in the literature, the bound is written in terms of (b − a) as

E(f) = −((b − a)⁵ / 2880) f^{(4)}(ξ).

The two versions are equivalent, noting that h = (b − a)/2.

Proof. (*) The proof is based on Chapter 7 of Süli and Mayers, An Introduction to Numerical Analysis. Consider the change of variable

x(τ) = x_1 + hτ,   τ ∈ [−1, 1].

Define F(τ) = f(x(τ)). Then dx = h dτ, and

∫_a^b f(x) dx = h ∫_{−1}^{1} F(τ) dτ.

The integration error is written as

∫_a^b f(x) dx − (h/3)(f_0 + 4f_1 + f_2) = h ( ∫_{−1}^{1} F(τ) dτ − (1/3)(F(−1) + 4F(0) + F(1)) ).

Now define

G(t) = ∫_{−t}^{t} F(τ) dτ − (t/3)(F(−t) + 4F(0) + F(t))

for t ∈ [−1, 1]. In particular, h G(1) is the integration error we are trying to estimate. Consider the function

H(t) = G(t) − t⁵ G(1).

Since H(0) = H(1) = 0, by Rolle’s Theorem there exists ξ_1 ∈ (0, 1) such that H′(ξ_1) = 0. Since also H′(0) = 0 (exercise), there exists ξ_2 ∈ (0, 1) such that H′′(ξ_2) = 0. Since also H′′(0) = 0 (exercise), we apply Rolle’s Theorem to find that there exists μ ∈ (0, 1) such that

H′′′(μ) = 0.

Note that the third derivative of G is given by G′′′(t) = −(t/3)(F′′′(t) − F′′′(−t)), from which it follows that

H′′′(μ) = −(μ/3)(F′′′(μ) − F′′′(−μ)) − 60μ² G(1) = 0.

We can rewrite this equation as

−(2/3) μ² × (F′′′(μ) − F′′′(−μ)) / (μ − (−μ)) = (2/3) μ² × 90 G(1).

Since μ ≠ 0, we can divide both sides by 2μ²/3. Then by the Mean Value Theorem, there exists ξ ∈ (−μ, μ) such that

90 G(1) = −F^{(4)}(ξ),

from which we get for the error (after multiplying by h),

h G(1) = −(h/90) F^{(4)}(ξ).

Now note that, using again the substitution x(t) = x_1 + ht as we made at the beginning,

F^{(4)}(t) = (d⁴/dt⁴) f(x(t)) = (d⁴/dt⁴) f(x_1 + ht) = h⁴ f^{(4)}(x).

This completes the proof.

From this we derive the error bound Video 4.3

E_2(f) = | ∫_a^b f(x) dx − I_2(f) | ≤ (1/90) h⁵ M_4,

where M_4 is an upper bound on the absolute value of the fourth derivative of f on the interval [a, b].


[Figure: the one-step Trapezium Rule and the one-step Simpson Rule.]

3.3 The Runge phenomenon revisited


Video 4.4
So far we have seen numerical integration methods that rely on linear interpolation (trapezium rule) and quadratic interpolation (Simpson’s rule), with absolute error bounds:

E_1(f) ≤ (h³/12) M_2,   E_2(f) ≤ (h⁵/90) M_4,

where E_1(f) is the absolute error for the trapezium rule, E_2(f) the absolute error for Simpson’s rule, h is the distance between two nodes, and M_k the maximum absolute value of the k-th derivative of f. In particular, it follows that the trapezium rule has error 0 for polynomials of degree at most one (since p_1′′ = 0, so M_2 = 0), and Simpson’s rule for polynomials of degree at most three (since p_3^{(4)} = 0, so M_4 = 0). One may wonder whether increasing the degree of the interpolating polynomial necessarily decreases the integration error.
Example 15. Consider the infamous function

f(x) = 1 / (1 + x²)

on the interval [−5, 5]. The integral of this function is given by

∫_{−5}^{5} 1/(1 + x²) dx = arctan(x) |_{−5}^{5} ≈ 2.7468.

Now let’s compute the Newton-Cotes quadrature

I_n(f) = ∫_{−5}^{5} p_n(x) dx,

where p_n(x) is the interpolation polynomial at n + 1 equispaced points between −5 and 5. Figure 4 shows the absolute error for n from 1 to 15.
It turns out that in some cases (such as I_12(f) = −0.31294), the numerical approximation to the integral is negative, which is absurd for a strictly positive f. The reason is that some of the weights in the quadrature rule turn out to be negative.

As this example shows, increasing the degree may not always be an effective choice, and we have to think of other ways to increase the precision of numerical integration.


Figure 4: Integration error for Newton-Cotes rules.

3.4 Composite integration rules


Video 4.5
3.4.1 Composite trapezium rule

The trapezium rule uses only two points to approximate an integral, certainly not enough for most applications. There are different ways to make use of more points and function values in order to increase precision. One way, as we have just seen with the Newton-Cotes scheme and Simpson’s rule, is to use higher-order interpolants. A different approach is to subdivide the interval into smaller intervals and use lower-order schemes, like the trapezium rule, on these smaller intervals. For this, we subdivide the integral:

∫_a^b f(x) dx = Σ_{j=0}^{n−1} ∫_{x_j}^{x_{j+1}} f(x) dx,

where x_0 = a, x_j = a + jh for 0 ≤ j ≤ n, and h = (b − a)/n. The composite trapezium rule approximates each integral in the sum via the trapezium rule:

∫_a^b f(x) dx ≈ (h/2)(f(x_0) + f(x_1)) + (h/2)(f(x_1) + f(x_2)) + · · · + (h/2)(f(x_{n−1}) + f(x_n))
            = h ( (1/2) f(x_0) + f(x_1) + · · · + f(x_{n−1}) + (1/2) f(x_n) ).
Example 16. Let’s look again at the function f(x) = 1/(1 + x) and apply the composite trapezium rule with h = 0.1 on the interval [1, 2] (that is, with n = 10). Then Video 4.6

∫_1^2 1/(1 + x) dx ≈ 0.1 ( 1/4 + 1/2.1 + · · · + 1/2.9 + 1/6 ) = 0.4056.

Recall that the exact integral was 0.4055 to 4 s.f., and that Simpson’s rule also gave an approximation of 0.4056.
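The composite rule is a natural candidate for a short function; a sketch (not from the notes):

def composite_trapezium(f, a, b, n):
    # Composite trapezium rule with n subintervals of width h = (b - a)/n.
    h = (b - a) / n
    xs = [a + j * h for j in range(n + 1)]
    return h * (f(xs[0]) / 2 + sum(f(x) for x in xs[1:-1]) + f(xs[-1]) / 2)

print(composite_trapezium(lambda x: 1 / (1 + x), 1, 2, 10))   # 0.40558..., 0.4056 to 4 s.f.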
Theorem 3.3. If f ∈ C²([a, b]) and a = x_0 < · · · < x_n = b, then there exists μ ∈ (a, b) such that

∫_a^b f(x) dx = h ( (1/2) f(x_0) + f(x_1) + · · · + f(x_{n−1}) + (1/2) f(x_n) ) − (1/12) h²(b − a) f′′(μ).


In particular, if M_2 = max_{a≤x≤b} |f′′(x)|, then the absolute error is bounded by

(1/12) h²(b − a) M_2.

Proof. Recall from the error analysis of the trapezium rule (Theorem 3.1) that, for some points ξ_j ∈ (x_j, x_{j+1}), we have

∫_a^b f(x) dx = Σ_{j=0}^{n−1} ( (h/2)(f(x_j) + f(x_{j+1})) − (1/12) h³ f′′(ξ_j) )
            = h ( (1/2) f(x_0) + f(x_1) + · · · + f(x_{n−1}) + (1/2) f(x_n) ) − (1/12) h³ Σ_{j=0}^{n−1} f′′(ξ_j).

Clearly the values f′′(ξ_j) lie between the minimum and maximum of f′′ on the interval (a, b), and so their average is also bounded by

min_{x∈[a,b]} f′′(x) ≤ (1/n) Σ_{j=0}^{n−1} f′′(ξ_j) ≤ max_{x∈[a,b]} f′′(x).

Since the function f′′ is continuous on [a, b], it assumes every value between the minimum and the maximum, and in particular also the value given by the average above (this is the statement of the Intermediate Value Theorem). In other words, there exists μ ∈ (a, b) such that the average above is attained:

(1/n) Σ_{j=0}^{n−1} f′′(ξ_j) = f′′(μ).

Therefore we can write the error term as

−(1/12) h³ Σ_{j=0}^{n−1} f′′(ξ_j) = −(1/12) h²(nh) f′′(μ) = −(1/12) h²(b − a) f′′(μ),

where we used that h = (b − a)/n. This is the claimed expression for the error.

Example 17. Consider the function f(x) = e^{−x}/x and the integral Video 4.7

∫_1^2 (e^{−x}/x) dx.

What choice of parameter h will ensure that the approximation error of the composite trapezium rule will be below 10^{−5}? Let M_2 denote an upper bound on the second derivative of f(x). The approximation error for the composite trapezium rule with step length h is bounded by

E(f) ≤ (h²/12)(b − a) M_2.

We can find M_2 by calculating the derivatives of f:

f′(x) = −e^{−x} (1/x + 1/x²),
f′′(x) = e^{−x} (1/x + 2/x² + 2/x³).

The second derivative f′′(x) is decreasing with x, so its maximum on [1, 2] is attained at x = 1, i.e. M_2 = f′′(1) ≈ 1.8394. In the interval [1, 2] we therefore have the bound

E(f) ≤ (h²/12) × 1.8394 × (2 − 1) = 0.1533 × h².

We find that taking 124 steps or more will guarantee an error below 10^{−5}. For example, with h = 0.005 (this corresponds to taking n = 200 steps), the error is bounded by 3.83 × 10^{−6}. End week 4
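Checking this numerically with the composite_trapezium sketch from Example 16 (again a sketch, not from the notes; a very fine grid serves as the reference value):

import math

f = lambda x: math.exp(-x) / x
ref = composite_trapezium(f, 1, 2, 200_000)           # fine-grid reference
print(abs(composite_trapezium(f, 1, 2, 200) - ref))   # about 1.3e-6 < 3.83e-6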

3.4.2 Composite Simpson’s rule


Video 5.1
To derive the composite version of Simpson’s rule, we subdivide the interval [a, b] into 2m intervals and set h = (b − a)/(2m), and x_j = a + jh for 0 ≤ j ≤ 2m. Then

∫_a^b f(x) dx = Σ_{j=1}^{m} ∫_{x_{2j−2}}^{x_{2j}} f(x) dx.

Applying Simpson’s rule to each of the integrals in the sum, we arrive at the expression

(h/3) ( f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + · · · + 4f(x_{2m−3}) + 2f(x_{2m−2}) + 4f(x_{2m−1}) + f(x_{2m}) ),

where the coefficients of the f(x_i) alternate between 4 and 2 for 1 ≤ i ≤ 2m − 1.
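A sketch of the composite rule in code (not from the notes), with the alternating 4/2 weights:

def composite_simpson(f, a, b, m):
    # Composite Simpson's rule on 2m subintervals, h = (b - a) / (2m).
    n = 2 * m
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 == 1 else 2) * f(a + j * h)
    return h / 3 * total

print(composite_simpson(lambda x: 1 / (1 + x), 1, 2, 5))   # 0.405465..., ln(1.5) to 6+ figures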
Theorem 3.4. If f ∈ C⁴([a, b]) and a = x_0 < · · · < x_n = b with n = 2m, then there exists μ ∈ (a, b) such that Video 5.2

∫_a^b f(x) dx = (h/3) ( f(x_0) + 4f(x_1) + 2f(x_2) + · · · + 4f(x_{n−1}) + f(x_n) ) − (1/180) h⁴(b − a) f^{(4)}(μ).

In particular, if M_4 = max_{a≤x≤b} |f^{(4)}(x)|, then the absolute error is bounded by

(1/180) h⁴(b − a) M_4.

Proof. The proof is very similar to the proof of Theorem 3.3, using the previous bounds for Simpson’s rule.

Example 18. Having an error of order h² means that, every time we halve the stepsize (or, Video 5.3
equivalently, double the number of points n), the error decreases by a factor of 4. This can be written as E(f) = O(n^{−2}). Looking at the example function f(x) = 1/(1 + x) and applying the composite Trapezium rule, we get the relationship shown below between the logarithm of the number of points, log n, and the logarithm of the error. The fitted line has a slope of −1.9935 ≈ −2, as expected from the theory.

Summarizing, we have seen the following integration schemes with their corresponding error bounds:

Trapezium:             (1/12)(b − a)³ M_2
Composite Trapezium:   (1/12) h²(b − a) M_2
Simpson:               (1/2880)(b − a)⁵ M_4
Composite Simpson:     (1/180) h⁴(b − a) M_4


[Figure: log-log plot of the error of the composite trapezium rule against the number of steps; the fitted slope is approximately −2.]

Note that we expressed the error bound for Simpson’s rule in terms of (b − a) rather than
h = (b − a)/2. The h in the bounds for the composite rules corresponds to the distance be-
tween any two consecutive nodes, x j +1 − x j .
We conclude the section with a definition of the order of precision of a quadrature rule.
Definition 3.3. A quadrature rule I(f) has degree of precision k if it evaluates polynomials of degree at most k exactly. That is,

\[ I(x^j) = \int_a^b x^j\,dx = \frac{1}{j+1}\big(b^{j+1} - a^{j+1}\big), \quad \text{for } 0 \le j \le k. \]

For example, it is easy to show that the Trapezium rule has degree of precision 1 (it eval-
uates 1 and x exactly), while Simpson’s rule has degree of precision 3 (rather than 2 as ex-
pected!). In general, Newton-Cotes quadrature of degree n has degree of precision n, if n is
odd, and n + 1 if n is even.
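The claim about Simpson's rule is easy to confirm: applied to x³ on [0, 1] (a quick sanity check of our own), it reproduces the exact integral 1/4.

f = @(x) x.^3;
a = 0; b = 1; h = (b - a)/2;
S = h/3*(f(a) + 4*f((a + b)/2) + f(b))   % gives 0.2500, exact for cubics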

4 Numerical Linear Algebra


Video 5.4
Problems in numerical analysis can often be formulated in terms of linear algebra. For ex-
ample, the discretization of partial differential equations leads to problems involving large
systems of linear equations. The basic problem in linear algebra is to solve a system of linear
equations
Ax = b, (4.1)

where

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \]


is an m × n matrix with real entries, and

\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \]

are vectors. We will often deal with the case m = n (a square matrix A).
There are two main classes of methods for solving such systems.

1. Direct methods attempt to solve (4.1) using a finite number of operations. An example
is the well-known Gaussian elimination algorithm.

2. Iterative methods generate a sequence x⁰, x¹, … of vectors in the hope that xᵏ converges (in a sense to be made precise) to a solution x of (4.1) as k → ∞.

Direct methods generally work well for dense matrices and moderately large n. Iterative methods work well for sparse matrices, that is, matrices with few non-zero entries a_{ij}, and large n.
Example 19. Consider the ordinary differential equation

−u xx = f (x),

with boundary conditions u(0) = u(1) = 0, where u is a twice differentiable function on [0, 1],
and u xx = ∂2 u/∂x 2 denotes the second derivative in x. We can discretize the interval [0, 1] by
setting ∆x = 1/(n + 1), x j = j ∆x, and denoting

u j := u(x j ), f j := f (x j ), for j = 0, . . . , n.

The second derivative can be approximated by the finite difference

\[ u_{xx}(x_j) \approx \frac{u_{j-1} - 2u_j + u_{j+1}}{(\Delta x)^2}. \]

At any one point x_j, the differential equation thus translates to

\[ -\frac{u_{j-1} - 2u_j + u_{j+1}}{(\Delta x)^2} = f_j. \]

Making use of the boundary conditions u(0) = u(1) = 0, we get the system of equations

\[ -\frac{1}{(\Delta x)^2} \begin{pmatrix} -2 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & -2 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -2 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 1 & -2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_{n-1} \\ u_n \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{n-1} \\ f_n \end{pmatrix}. \]

The matrix is very sparse: it has only 3n − 2 non-zero entries out of n² possible! This form is typical of matrices arising from partial differential equations, and is well suited to iterative methods that exploit the specific structure of the matrix.
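In MATLAB such a matrix is best assembled in sparse form. The following sketch (our own illustration) sets up and solves the discretized system for the hypothetical choice f(x) = 1, whose exact solution is u(x) = x(1 − x)/2:

n = 100; dx = 1/(n + 1);
e = ones(n, 1);
T = spdiags([e -2*e e], -1:1, n, n);   % sparse tridiagonal matrix
A = -T/dx^2;                           % matrix of the linear system
f = ones(n, 1);                        % right-hand side f(x_j) = 1
u = A\f;                               % approximate solution at x_1,...,x_n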


4.1 The Jacobi and Gauss-Seidel methods


Video 5.5
In the following, we assume our matrices to be square (m = n). We will also use the following bit of notation: upper indices identify individual vectors in a sequence of vectors, for example x⁰, x¹, …, xⁱ. Individual entries of a vector x are denoted by x_i, so, for example, the i-th entry of the k-th vector is written x_i^{(k)} or simply x_i^k.
A template for solving a linear system iteratively can be derived as follows. Write the
matrix A as a sum A = A 1 + A 2 , where A 1 and A 2 are somewhat “simpler” to handle than the
original matrix. Then the system of equations Ax = b can be written as

A 1 x = −A 2 x + b.

This motivates the following approach: start with a vector x 0 and successively compute x k+1
from x k by solving the system
A 1 x k+1 = −A 2 x k + b. (4.2)
Note that after the k-th step, the right-hand side is known, while the unknown to be found is
the vector x k+1 on the left-hand side.

4.1.1 Jacobi’s Method

Decompose the matrix A as

\[ A = L + D + U, \]

where

\[ L = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ a_{21} & 0 & \cdots & 0 & 0 \\ a_{31} & a_{32} & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n(n-1)} & 0 \end{pmatrix}, \qquad U = \begin{pmatrix} 0 & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & 0 & a_{23} & \cdots & a_{2n} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{(n-1)n} \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} \]

are the lower and upper triangular parts, and

\[ D = \mathrm{diag}(a_{11}, \ldots, a_{nn}) := \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix} \]

is the diagonal part. Jacobi's method chooses A₁ = D and A₂ = L + U. The corresponding iteration (4.2) then becomes Dxᵏ⁺¹ = −(L + U)xᵏ + b. Once we know xᵏ, this is a particularly simple system of equations, since the matrix on the left is diagonal! Solving for xᵏ⁺¹,

\[ x^{k+1} = D^{-1}\big(b - (L+U)x^k\big). \tag{4.3} \]

Note that since D is diagonal, it is easy to invert: just take reciprocals of the individual entries.
Example 20. For a concrete example, take the following matrix with its decomposition into diagonal and off-diagonal parts:

\[ A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} + \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}. \]


Since

\[ \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]

we get the iteration scheme

\[ x^{k+1} = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} x^k + \frac{1}{2}b. \]
We can also write the iteration (4.3) in terms of individual entries. If we denote Video 6.1

\[ x^k := \begin{pmatrix} x_1^{(k)} \\ \vdots \\ x_n^{(k)} \end{pmatrix}, \]

i.e., we write x_i^{(k)} for the i-th entry of the k-th iterate, then the iteration (4.3) becomes

\[ x_i^{(k+1)} = \frac{1}{a_{ii}}\Big( b_i - \sum_{j \neq i} a_{ij} x_j^{(k)} \Big) \tag{4.4} \]

for 1 ≤ i ≤ n.
Let’s try this out with b = (1, 1)⊤, to see if we get a solution. Let x⁰ = 0 to start with. Then Video 5.5

\[ x^1 = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} 0 + \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}, \]
\[ x^2 = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} = \begin{pmatrix} 3/4 \\ 3/4 \end{pmatrix}, \]
\[ x^3 = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} \begin{pmatrix} 3/4 \\ 3/4 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} = \begin{pmatrix} 7/8 \\ 7/8 \end{pmatrix}. \]

We see a pattern emerging: in fact, one can show (exercise, by induction) that in this example

\[ x^k = \begin{pmatrix} 1 - 2^{-k} \\ 1 - 2^{-k} \end{pmatrix}. \]

In particular, as k → ∞ the vectors x k approach (1, 1)⊤ , which is easily verified to be a solution
of Ax = b. End week 5
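The Jacobi iteration takes only a few lines of MATLAB; the sketch below (our own code) reproduces the iterates computed above:

A = [2 -1; -1 2]; b = [1; 1];
D = diag(diag(A));              % diagonal part of A
E = A - D;                      % off-diagonal part L + U
x = zeros(2, 1);                % starting vector x^0 = 0
for k = 1:3
    x = D\(b - E*x);            % Jacobi update (4.3)
end
x                               % after 3 steps: (7/8, 7/8)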
We saw that a general approach to finding an approximate solution to a system of linear
equations
Ax = b
is to generate a sequence of vectors x (k) for k ≥ 0 by some procedure

x (k+1) = T x (k) + c,

in the hope that the sequence approaches a solution. In the case of the Jacobi method, we
had the iteration
x (k+1) = D −1 (b − (L +U )x (k) ),
with L, D, and U the lower, diagonal, and upper triangular part of A. That is,

\[ T = -D^{-1}(L+U), \qquad c = D^{-1}b. \]

Next, we discuss a refinement of this method, and will also address the issue of convergence.


4.1.2 Gauss-Seidel
Video 6.2
In the Gauss-Seidel method, we use a different decomposition, leading to the following sys-
tem
(D + L)x k+1 = −U x k + b. (4.5)
Though the matrix on the left-hand side is not diagonal (as it was in the Jacobi method), the system is still easily solved for xᵏ⁺¹ when xᵏ is given, by forward substitution. To derive the entry-wise formula for this method, we take a closer look at (4.5):

\[ \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \\ \vdots \\ x_n^{(k+1)} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} - \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{pmatrix}. \]

Writing out the equations, we get

\[ a_{11} x_1^{(k+1)} = b_1 - \big( a_{12} x_2^{(k)} + \cdots + a_{1n} x_n^{(k)} \big) \]
\[ a_{21} x_1^{(k+1)} + a_{22} x_2^{(k+1)} = b_2 - \big( a_{23} x_3^{(k)} + \cdots + a_{2n} x_n^{(k)} \big) \]
\[ \vdots \]
\[ a_{i1} x_1^{(k+1)} + \cdots + a_{ii} x_i^{(k+1)} = b_i - \big( a_{i,i+1} x_{i+1}^{(k)} + \cdots + a_{in} x_n^{(k)} \big). \]

Rearranging this, we get the formula

\[ x_i^{(k+1)} = \frac{1}{a_{ii}}\Big( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Big) \]

for the (k + 1)-th iterate of x_i. Note that in order to compute the (k + 1)-th iterate of x_i, we already use values of the (k + 1)-th iterate of x_j for j < i. This differs from the Jacobi form, where we only use the k-th iterate. Both methods have their advantages and disadvantages. While Gauss-Seidel may require less storage (we can overwrite each x_i^{(k)} by x_i^{(k+1)}, as we don't need the old value subsequently), Jacobi's method is more easily parallelized (each x_i^{(k+1)} can be computed by a different processor).
Example 21. Consider a simple system of the form

\[ \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]

Note that this is the kind of system that arises in the discretization of a differential equation. Although for matrices of this size we can easily solve the system directly, we will illustrate the use of the Gauss-Seidel method. The Gauss-Seidel iteration has the form

\[ \begin{pmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix} x^{k+1} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} x^k + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]


If we choose the starting point x⁰ = (0, 0, 0)⊤, then

\[ \begin{pmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix} x^1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} x^0 + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]

The system is easily solved to find x¹ = (1/2, 3/4, 7/8)⊤. Continuing this process we obtain x², x³, …, until we are satisfied with the accuracy of the solution.
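Since D + L is lower triangular, each Gauss-Seidel step is a forward substitution, which MATLAB's backslash operator performs automatically. A sketch (our own) for this example:

A = [2 -1 0; -1 2 -1; 0 -1 2]; b = [1; 1; 1];
DL = tril(A);                   % D + L, the lower triangular part
U  = triu(A, 1);                % U, the strictly upper triangular part
x = zeros(3, 1);                % starting point x^0
for k = 1:30
    x = DL\(b - U*x);           % Gauss-Seidel update (4.5)
end
x                               % approaches the solution (3/2, 2, 3/2)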

4.2 Vector Norms


Video 6.3
We have seen in previous examples that the sequence of vectors x k generated by the Jacobi
method “approaches” the solution of the system of equations Ax = b as we keep going. In
order to make this type of convergence precise, we need to be able to measure distances
between vectors and matrices.
Definition 4.1. A vector norm on ℝⁿ is a real-valued function ∥·∥ that satisfies the following conditions:

1. For all x ∈ ℝⁿ, we have ∥x∥ ≥ 0, and ∥x∥ = 0 if and only if x = 0.

2. For all α ∈ ℝ, we have ∥αx∥ = |α| ∥x∥.

3. For all x, y ∈ ℝⁿ, we have ∥x + y∥ ≤ ∥x∥ + ∥y∥ (triangle inequality).

Example 22. The typical examples are the following:

1. The 2-norm
\[ \|x\|_2 = \Big( \sum_{i=1}^n x_i^2 \Big)^{1/2} = (x^\top x)^{1/2}. \]
This is just the usual notion of Euclidean length.

2. The 1-norm
\[ \|x\|_1 = \sum_{i=1}^n |x_i|. \]

3. The ∞-norm
\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i|. \]

Example 23. Let us apply the norms above to the vector

\[ x = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}. \]

The different vector norms are ∥x∥₂ = √3, ∥x∥₁ = 3, ∥x∥∞ = 1.
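In MATLAB all three are computed by the built-in norm function:

x = [1; -1; 1];
norm(x, 2)                      % 2-norm: sqrt(3)
norm(x, 1)                      % 1-norm: 3
norm(x, Inf)                    % infinity-norm: 1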

A convenient way to visualize these norms is via their "unit circles": the sets

\[ \{x \in \mathbb{R}^2 : \|x\|_p = 1\} \]

for p = 2, 1, ∞ are, respectively, a circle, a diamond, and a square.


Now that we have defined a way of measuring distances between vectors, we can talk
about convergence.
Definition 4.2. A sequence of vectors xᵏ ∈ ℝⁿ, k = 0, 1, 2, …, converges to x ∈ ℝⁿ with respect to a norm ∥·∥ if for all ε > 0 there exists an N > 0 such that, for all k ≥ N, we have Video 6.4

\[ \|x^k - x\| < \varepsilon. \]

In words: we can get arbitrarily close to x by choosing k sufficiently large.

We sometimes write

\[ \lim_{k \to \infty} x^k = x \quad \text{or} \quad x^k \to x \]

to indicate that a sequence x⁰, x¹, … converges to a vector x. If we want to indicate the particular norm with respect to which convergence is measured, we sometimes write

\[ x^k \xrightarrow{1} x, \qquad x^k \xrightarrow{\infty} x, \qquad x^k \xrightarrow{2} x \]

to indicate convergence with respect to the 1-, ∞-, and 2-norms, respectively.
The following lemma implies that, for the purpose of convergence, it doesn’t matter
whether we take the ∞- or the 2-norm.
Lemma 4.1. For x ∈ ℝⁿ, Video 6.5

\[ \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty. \]

Proof. Let M := ∥x∥∞ = max_{1≤i≤n} |x_i|. Note that

\[ \|x\|_2 = M \cdot \Big( \sum_{i=1}^n \frac{x_i^2}{M^2} \Big)^{1/2} \le M \sqrt{n} = \sqrt{n}\,\|x\|_\infty, \]

because x_i²/M² ≤ 1 for all i. This shows the second inequality. For the first one, note that there is at least one value of i such that M = |x_i|. It follows that

\[ \|x\|_2 = M \cdot \Big( \sum_{i=1}^n \frac{x_i^2}{M^2} \Big)^{1/2} \ge M = \|x\|_\infty. \]

This completes the proof.

A similar relationship can be shown between the 1-norm and the 2-norm, and also be-
tween the 1-norm and the ∞-norm.


Corollary 4.1. Convergence in the 2-norm is equivalent to convergence in the ∞-norm:

\[ x^k \xrightarrow{2} x \iff x^k \xrightarrow{\infty} x. \]

In words: if xᵏ → x with respect to the ∞-norm, then xᵏ → x with respect to the 2-norm, and vice versa.

Proof. Suppose that xᵏ → x with respect to the 2-norm, and let ε > 0. Since xᵏ converges with respect to the 2-norm, there exists N > 0 such that for all k > N, ∥xᵏ − x∥₂ < ε. Since ∥xᵏ − x∥∞ ≤ ∥xᵏ − x∥₂, we also get convergence with respect to the ∞-norm. Now suppose conversely that xᵏ converges with respect to the ∞-norm. Then given ε > 0, for ε′ = ε/√n there exists N > 0 such that ∥xᵏ − x∥∞ < ε′ for k > N. But since ∥xᵏ − x∥₂ ≤ √n ∥xᵏ − x∥∞ < √n ε′ = ε, it follows that xᵏ also converges with respect to the 2-norm.

The benefit of this type of result is that some norms are easier to compute than others.
Even if we are interested in measuring convergence with respect to the 2-norm, it may be
quicker to show that a sequence converges with respect to the ∞-norm, and once this is
shown, convergence in the 2-norm follows automatically by the above corollary.

4.3 Matrix Norms


Video 6.6
To study the convergence of iterative methods we also need norms for matrices. Recall that
the Jacobi and Gauss-Seidel methods generate a sequence x k of vectors by the scheme

x k+1 = T x k + c,

for some matrix T . The hope is that this sequence will converge to a solution vector x such
that x = T x + c. Given such an x, we can subtract x from both sides of the iteration to obtain

x k+1 − x = T x k + c − x = T (x k − x).

That is, the difference xᵏ⁺¹ − x arises from the previous difference xᵏ − x by multiplication by T. For convergence, we want ∥xᵏ − x∥ to become smaller as k increases; in other words, we want multiplication by T to reduce the norm of a vector. In order to quantify the effect of a linear transformation T on the norm of a vector, we introduce the concept of a matrix norm.
Definition 4.3. A matrix norm is a non-negative function ∥·∥ on the set of real n × n matrices
such that, for every n × n matrix A,

1. ∥A∥ ≥ 0, with ∥A∥ = 0 if and only if A = 0.

2. For all α ∈ R, we have ∥αA∥ = |α| ∥A∥.


3. For all n × n matrices A, B we have ∥A + B ∥ ≤ ∥A∥ + ∥B ∥.

4. For all n × n matrices A, B we have ∥AB ∥ ≤ ∥A∥ ∥B ∥.

Note that properties 1–3 just state that a matrix norm is also a vector norm, if we think of
the matrix as a vector. Property 4 of the definition is about the “matrix-ness” of a matrix. The
most useful class of matrix norms are the operator norms induced by a vector norm.

Department of Mathematics 35 University of Manchester


MATH20602: 4 NUMERICAL LINEAR ALGEBRA S.L. COTTER

Example 24. If we treat a matrix as a column vector of n² entries, then the 2-norm is called the Frobenius norm of the matrix,

\[ \|A\|_F = \Big( \sum_{i,j=1}^n a_{ij}^2 \Big)^{1/2}. \]

Properties 1–3 are clearly satisfied, since this is just the 2-norm of the matrix considered as a vector. Property 4 can be verified using the Cauchy-Schwarz inequality, and is left as an exercise. End week 6
The most important matrix norms are the operator norms associated with certain vector
norms, which measure the extent to which a vector x is “stretched” by the matrix A with
respect to a given norm.
Definition 4.4. Given a vector norm ∥·∥, the corresponding operator norm of an n × n matrix A is defined as Video 7.1

\[ \|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \max_{\|x\|=1} \|Ax\|. \]
Remark 4.1. To see the second equality, note that for x ≠ 0 we can write

\[ \frac{\|Ax\|}{\|x\|} = \Big\| A \frac{x}{\|x\|} \Big\| = \|Ay\|, \]

with y = x/∥x∥, where we used Property 2 of the definition of a vector norm. The vector y = x/∥x∥ has norm ∥y∥ = ∥x/∥x∥∥ = 1, so for every x ≠ 0 there is a vector y with ∥y∥ = 1 such that ∥Ax∥/∥x∥ = ∥Ay∥. In particular, maximizing the left-hand side over x ≠ 0 gives the same result as maximizing the right-hand side over y with ∥y∥ = 1.

First, we have to verify that the operator norm is indeed a matrix norm.
Theorem 4.1. The operator norm corresponding to a vector norm ∥·∥ is a matrix norm. Video 7.2

Proof. Properties 1–3 are easy to verify from the corresponding properties of the vector norms. For example, ∥A∥ ≥ 0 because it is defined as a maximum of non-negative quantities. To show property 4, namely

\[ \|AB\| \le \|A\| \|B\| \]

for n × n matrices A and B, we first note that for any y ∈ ℝⁿ with y ≠ 0,

\[ \frac{\|Ay\|}{\|y\|} \le \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \|A\|, \]

and therefore ∥Ay∥ ≤ ∥A∥ ∥y∥ (trivially also for y = 0). Now let y = Bx for some x with ∥x∥ = 1. Then

\[ \|ABx\| \le \|A\| \|Bx\| \le \|A\| \|B\|. \]

As this inequality holds for all unit-norm x, it also holds for the vector that maximises ∥ABx∥, and therefore we get

\[ \|AB\| = \max_{\|x\|=1} \|ABx\| \le \|A\| \|B\|. \]

This completes the proof.


Example 25. Consider the matrix Video 7.1

\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}. \]

The matrix norm ∥A∥₂ gives the maximum length ∥Ax∥₂ among all x with ∥x∥₂ = 1. If we draw the circle C = {x : ∥x∥₂ = 1} and the ellipse E = {Ax : ∥x∥₂ = 1}, then ∥A∥₂ is the length of the semi-major axis of the ellipse.

[Figure: the unit circle C and its image E under A, an ellipse.]
End week 7
Even though the operator norms with respect to the various vector norms are of immense
importance in the analysis of numerical methods, they are hard to compute or even estimate
from their definition alone. It is therefore useful to have alternative characterizations of these
norms. The first of these characterizations is concerned with the norms ∥·∥1 and ∥·∥∞ , and
provides an easy criterion to compute these.
Lemma 4.2. For an n × n matrix A, the operator norms with respect to the 1-norm and the ∞-norm are given by Video 8.2

\[ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}| \quad \text{(maximum absolute column sum)}, \]
\[ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}| \quad \text{(maximum absolute row sum)}. \]

Proof. (*) We will prove this for the ∞-norm. We first show the inequality ∥A∥∞ ≤ max_{1≤i≤n} Σⱼ |a_{ij}|. Let x be a vector with ∥x∥∞ = 1, so that every entry satisfies |x_i| ≤ 1. It follows that Video 8.3

\[ \|Ax\|_\infty = \max_{1 \le i \le n} \Big| \sum_{j=1}^n a_{ij} x_j \Big| \le \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij} x_j| \le \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|, \]

where we wrote out the matrix-vector product, used the triangle inequality for the absolute value, and the fact that |x_j| ≤ 1 for 1 ≤ j ≤ n. Since this holds for arbitrary x with ∥x∥∞ = 1, it also holds for the vector that attains max_{∥x∥∞=1} ∥Ax∥∞ = ∥A∥∞, which shows that ∥A∥∞ ≤ max_{1≤i≤n} Σⱼ |a_{ij}|.

In order to show the other direction, ∥A∥∞ ≥ max_{1≤i≤n} Σⱼ |a_{ij}|, let i′ be the value of the index i at which the maximum of the sum is attained:

\[ \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}| = \sum_{j=1}^n |a_{i'j}|. \]


Now choose y to be the vector with entries y_j = 1 if a_{i′j} ≥ 0 and y_j = −1 if a_{i′j} < 0. This vector satisfies ∥y∥∞ = 1 and, moreover,

\[ \sum_{j=1}^n y_j a_{i'j} = \sum_{j=1}^n |a_{i'j}|, \]

by the choice of the y_j. We therefore have

\[ \|A\|_\infty = \max_{\|x\|_\infty = 1} \|Ax\|_\infty \ge \|Ay\|_\infty \ge \sum_{j=1}^n |a_{i'j}| = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|. \]

This finishes the proof.

Example 26. Consider the matrix Video 8.2

\[ A = \begin{pmatrix} -7 & 3 & -1 \\ 2 & 4 & 5 \\ -4 & 6 & 0 \end{pmatrix}. \]

The operator norms with respect to the 1- and ∞-norms are

\[ \|A\|_1 = \max\{13, 13, 6\} = 13, \qquad \|A\|_\infty = \max\{11, 11, 10\} = 11. \]
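Lemma 4.2 turns these operator norms into simple column and row sums, which is also what MATLAB's norm computes:

A = [-7 3 -1; 2 4 5; -4 6 0];
max(sum(abs(A), 1))             % maximum absolute column sum: 13
max(sum(abs(A), 2))             % maximum absolute row sum: 11
norm(A, 1), norm(A, Inf)        % the built-in equivalents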

How do we characterize the matrix 2-norm ∥A∥₂? The answer is in terms of the eigenvalues of A. Recall that a (possibly complex) number λ is an eigenvalue of A, with associated eigenvector u ≠ 0, if Video 8.4

\[ Au = \lambda u. \]

Definition 4.5. The spectral radius of A is defined as

\[ \rho(A) = \max\{ |\lambda| : \lambda \text{ an eigenvalue of } A \}. \]

Theorem 4.2. For an n × n matrix A we have

\[ \|A\|_2 = \sqrt{\rho(A^\top A)}. \]

Proof. (*) Note that for a vector x, ∥Ax∥₂² = x⊤A⊤Ax. We can therefore express the squared 2-norm of A as

\[ \|A\|_2^2 = \max_{\|x\|_2 = 1} \|Ax\|_2^2 = \max_{\|x\|_2 = 1} x^\top A^\top A x. \]

As a continuous function over a compact set, f(x) = x⊤A⊤Ax attains its maximum over the unit sphere {x : ∥x∥₂ = 1} at some x = u, say. Using a Lagrange multiplier, there exists a parameter λ such that

\[ \nabla f(u) = 2\lambda u. \tag{4.6} \]

To compute the gradient ∇f(x), set B = A⊤A, so that

\[ f(x) = x^\top B x = \sum_{i,j=1}^n b_{ij} x_i x_j = \sum_{i=1}^n b_{ii} x_i^2 + 2 \sum_{i < j} b_{ij} x_i x_j, \]

where the last equality follows from the symmetry of B (that is, b_{ij} = b_{ji}). Then

\[ \frac{\partial f}{\partial x_k} = 2 b_{kk} x_k + 2 \sum_{i \neq k} b_{ki} x_i = 2 \sum_{i=1}^n b_{ki} x_i. \]


But this expression is just twice the k-th row of Bx, so that

\[ \nabla f(x) = 2Bx = 2A^\top A x. \]

Using this in Equation (4.6), we find

\[ A^\top A u = \lambda u, \]

so that λ is an eigenvalue of A⊤A. Using that u⊤u = ∥u∥₂² = 1, we also have

\[ u^\top A^\top A u = \lambda u^\top u = \lambda, \]

and since u was a maximizer of the left-hand function, λ is the maximal eigenvalue of A⊤A. As the eigenvalues of A⊤A are non-negative, this means λ = ρ(A⊤A). Summarizing, we have ∥A∥₂² = ρ(A⊤A).

Example 27. Let

\[ A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ -1 & 1 & 1 \end{pmatrix}. \]

The eigenvalues are the roots of the characteristic polynomial

\[ p(\lambda) = \det(A - \lambda I) = \det \begin{pmatrix} 1-\lambda & 0 & 2 \\ 0 & 1-\lambda & -1 \\ -1 & 1 & 1-\lambda \end{pmatrix}. \]

Evaluating this determinant, we get the equation

\[ (1-\lambda)(\lambda^2 - 2\lambda + 4) = 0. \]

The solutions are given by λ₁ = 1 and λ₂,₃ = 1 ± i√3, so |λ₂,₃| = 2. The spectral radius of A is therefore

\[ \rho(A) = \max\{1, 2, 2\} = 2. \]
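Both the spectral radius and the characterization of Theorem 4.2 are easy to verify numerically with eig (a quick check of our own):

A = [1 0 2; 0 1 -1; -1 1 1];
rho = max(abs(eig(A)))          % spectral radius: 2
sqrt(max(eig(A'*A)))            % 2-norm via Theorem 4.2
norm(A, 2)                      % agrees with the previous line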

We introduced the spectral radius of a matrix, ρ(A), as the maximum absolute value of an eigenvalue of A, and characterized the 2-norm of A as

\[ \|A\|_2 = \sqrt{\rho(A^\top A)}. \]

Note that the matrix A⊤A is symmetric, and therefore has real (in fact, non-negative) eigenvalues.
For symmetric matrices A, i.e. matrices such that A ⊤ = A, the situation is simpler: the Video 8.5
2-norm is just the spectral radius.
Lemma 4.3. If A is symmetric, then ∥A∥2 = ρ(A).

Proof. Let λ be an eigenvalue of A with corresponding eigenvector u, so that

Au = λu.

Then

\[ A^\top A u = A^\top \lambda u = \lambda A^\top u = \lambda A u = \lambda^2 u. \]

It follows that λ² is an eigenvalue of A⊤A with corresponding eigenvector u. Since A is symmetric, it is diagonalizable, so every eigenvalue of A⊤A = A² arises in this way. In particular,

\[ \|A\|_2^2 = \rho(A^\top A) = \max\{\lambda^2 : \lambda \text{ an eigenvalue of } A\} = \rho(A)^2. \]

Taking square roots on both sides, the claim follows.


Example 28. We compute the eigenvalues, and thus the spectral radius and the 2-norm, of the finite difference matrix

\[ A = \begin{pmatrix} 2 & -1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & 0 & \cdots & -1 & 2 \end{pmatrix}. \]

Let h = 1/(n + 1). We first claim that the vectors u_k, 1 ≤ k ≤ n, defined by

\[ u_k = \begin{pmatrix} \sin(k\pi h) \\ \vdots \\ \sin(nk\pi h) \end{pmatrix} \]

are the eigenvectors of A, with corresponding eigenvalues

\[ \lambda_k = 2(1 - \cos(k\pi h)). \]

This can be verified by checking that Au_k = λ_k u_k. In fact, for 2 ≤ j ≤ n − 1, the j-th entry of Au_k is given by

\[ 2\sin(jk\pi h) - \sin((j-1)k\pi h) - \sin((j+1)k\pi h). \]

Using the trigonometric identity sin(x + y) = sin(x)cos(y) + cos(x)sin(y), we can write this as

\[ 2\sin(jk\pi h) - \big(\cos(k\pi h)\sin(jk\pi h) - \cos(jk\pi h)\sin(k\pi h)\big) - \big(\cos(k\pi h)\sin(jk\pi h) + \cos(jk\pi h)\sin(k\pi h)\big) = 2(1 - \cos(k\pi h)) \sin(jk\pi h). \]

Now sin(jkπh) is just the j-th entry of u_k as defined above, so the coefficient in front must equal the corresponding eigenvalue. The argument for the first and last entries (j = 1 and j = n) is similar.

The spectral radius is the maximum modulus of such an eigenvalue,

\[ \rho(A) = \max_{1 \le k \le n} |\lambda_k| = 2\Big(1 - \cos\Big(\frac{n\pi}{n+1}\Big)\Big). \]

As the matrix A is symmetric, this is also equal to the matrix 2-norm of A:

\[ \|A\|_2 = 2\Big(1 - \cos\Big(\frac{n\pi}{n+1}\Big)\Big). \]

4.4 Convergence of Iterative Algorithms


Video 8.6
In this section we focus on algorithms that attempt to solve a system of equations (Video 8.1)
Ax = b (4.7)
by starting with some vector x 0 and then successively computing a sequence x k , k ≥ 1, by
means of a rule
x k+1 = T x k + c (4.8)
for some matrix T and vector c. The hope is that the resulting sequence will converge to a
solution x of Ax = b.


Example 29. The Jacobi and Gauss-Seidel methods fall into this framework. Recall the de-
composition
A = L + D +U ,
where L is the lower triangular, D the diagonal, and U the upper triangular part. Then the
Jacobi method corresponds to the choice

\[ T = T_J = -D^{-1}(L+U), \qquad c = D^{-1}b, \]

while the Gauss-Seidel method corresponds to

\[ T = T_{GS} = -(L+D)^{-1}U, \qquad c = (L+D)^{-1}b. \]
Lemma 4.4. Let T and c be the matrix and vector in the iteration scheme (4.8) corresponding to either the Jacobi method or the Gauss-Seidel method, and assume that D and L + D are invertible.
Then x is a solution of the system of equations (4.7) if and only if x is a fixed point of the
iteration (4.8), that is,
x = T x + c.

Proof. We write down the proof for the case of Jacobi's method, the Gauss-Seidel case being similar. We have

\[ Ax = b \iff (L + D + U)x = b \iff Dx = -(L+U)x + b \iff x = -D^{-1}(L+U)x + D^{-1}b \iff x = Tx + c. \]

This shows the claim.

The problem of solving Ax = b is thus reduced to the problem of finding a fixed point to
an iteration scheme. The following important result shows how to bound the distance of an
iterate x k from the solution x in terms of the operator norm of T and an initial distance of x 0 .
Theorem 4.3. Let x be a solution of Ax = b, and let xᵏ, for k ≥ 0, be a sequence of vectors such that

\[ x^{k+1} = T x^k + c \]

for an n × n matrix T and a vector c ∈ ℝⁿ. Then, for any vector norm ∥·∥ and associated matrix norm, we have

\[ \|x^{k+1} - x\| \le \|T\|^{k+1} \|x^0 - x\| \]

for all k ≥ 0.

Proof. We prove this by induction on k. Recall that for every vector y, we have ∥Ty∥ ≤ ∥T∥ ∥y∥. Subtracting the identity x = Tx + c from xᵏ⁺¹ = Txᵏ + c and taking norms, we get

\[ \|x^{k+1} - x\| = \|T(x^k - x)\| \le \|T\| \|x^k - x\|. \tag{4.9} \]

Setting k = 0 gives the claim of the theorem for this case. If we assume that the claim holds for k − 1, k ≥ 1, then

\[ \|x^k - x\| \le \|T\|^k \|x^0 - x\| \]

by this assumption, and plugging this into (4.9) finishes the proof.


Corollary 4.2. Assume that, in addition to the assumptions of Theorem 4.3, we have ∥T∥ < 1. Then the sequence xᵏ, k ≥ 0, converges to a fixed point x with x = Tx + c, with respect to the chosen norm ∥·∥.

Proof. Assume x⁰ ≠ x (otherwise there is nothing to prove) and let ε > 0. Since ∥T∥ < 1, we have ∥T∥ᵏ → 0 as k → ∞. In particular, there exists an integer N > 1 such that for all k > N,

\[ \|T\|^k < \frac{\varepsilon}{\|x^0 - x\|}. \]

It follows that for k > N we have ∥xᵏ − x∥ < ε, which completes the convergence proof.

Recall that for the Gauss-Seidel and Jacobi methods, a fixed point of x = T x + c was the
same as a solution of Ax = b. It follows that the Gauss-Seidel and Jacobi methods converge
to a solution (with respect to some norm) provided that ∥T ∥ < 1. Note also that either one
of ∥T ∥∞ < 1 or ∥T ∥2 < 1 will imply convergence with respect to both the 2-norm and the
∞-norm. The reason is the equivalence of norms (Lemma 4.1)
\[ \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty, \]

which implies that if the sequence x k , k ≥ 0, converges to x with respect to one of these
norms, it also converges with respect to the other one. Such an equivalence can also be shown
between the 2- and the 1-norm.
So far we have seen that the condition ∥T∥ < 1 ensures that an iterative scheme of the Video 8.7
form (4.8) converges to a vector x such that x = Tx + c as k → ∞. The converse is not true: there are examples for which ∥T∥ ≥ 1 in some norm but the iteration (4.8) converges nevertheless.
Example 30. Recall the finite difference matrix

\[ A = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 2 \end{pmatrix} \]

and apply the Jacobi method to compute a solution of Ax = b. The Jacobi method computes the sequence xᵏ⁺¹ = Txᵏ + c, and for this particular A we have c = ½b and

\[ T = T_J = -D^{-1}(L+U) = \frac{1}{2} \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}. \]

We have ∥T∥∞ = 1, so the convergence criterion doesn't apply for this norm. However, one can show that all the eigenvalues satisfy |λ| < 1. Since the matrix T is symmetric, we have

\[ \|T\|_2 = \rho(T) < 1, \]

where ρ(T) denotes the spectral radius. It follows that the iteration (4.8) does converge with respect to the 2-norm, and therefore also with respect to the ∞-norm, despite having ∥T∥∞ = 1.


It turns out that the spectral radius gives rise to a necessary and sufficient condition for
convergence.
Theorem 4.4. The iterates x k of (4.8) converge to a solution x of x = T x + c for all starting
points x 0 if and only if ρ(T ) < 1.

Proof. (*) Let x⁰ be any starting point, and define, for all k ≥ 0,

\[ z^k = x^k - x. \]

Then zᵏ⁺¹ = Tzᵏ, as is easily verified. The convergence of the sequence xᵏ to x is then equivalent to the convergence of zᵏ to 0.

Assume T has n eigenvalues λ_k (possibly 0), 1 ≤ k ≤ n. We will only prove the claim for the case where the eigenvectors u_k form a basis of ℝⁿ (equivalently, that T is diagonalizable), and mention below how the general case can be deduced. We can write

\[ z^0 = \sum_{j=1}^n \alpha_j u_j \tag{4.10} \]

for some coefficients α_j. For the iterate we get

\[ z^{k+1} = T z^k = T^{k+1} z^0 = T^{k+1} \Big( \sum_{j=1}^n \alpha_j u_j \Big) = \sum_{j=1}^n \alpha_j T^{k+1} u_j = \sum_{j=1}^n \alpha_j \lambda_j^{k+1} u_j. \]

Now suppose ρ(T) < 1. Then |λ_j| < 1 for all eigenvalues λ_j, and therefore λ_j^{k+1} → 0 as k → ∞. Hence zᵏ⁺¹ → 0 as k → ∞, and xᵏ⁺¹ → x. If, on the other hand, ρ(T) ≥ 1, then there exists an index j such that |λ_j| ≥ 1. If we choose a starting point x⁰ such that the coefficient α_j in (4.10) is not zero, then |α_j λ_j^{k+1}| ≥ |α_j| > 0 for all k, and we deduce that zᵏ⁺¹ does not converge to zero.
If T is not diagonalizable, then we still have the Jordan normal form J = P⁻¹TP, where P is an invertible matrix and J consists of Jordan blocks

\[ \begin{pmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & 0 & \cdots & \lambda_i \end{pmatrix} \]

on the diagonal for each eigenvalue λ_i. Rather than considering a basis of eigenvectors, we take one consisting of generalized eigenvectors, that is, solutions u of the equation

\[ (T - \lambda_i I)^k u = 0, \]

where k ≤ m and m is the algebraic multiplicity of λ_i.


End week 8


4.5 Gershgorin’s circles


Video 9.1
So far we have seen that an iterative method xᵏ⁺¹ = Txᵏ + c converges to a fixed point x = Tx + c if and only if the spectral radius satisfies ρ(T) < 1. Since the eigenvalues are in general not easy to compute, the question is whether there is a convenient way to estimate ρ(T). One way to bound the size of the eigenvalues is by means of Gershgorin's Circle Theorem. Recall that eigenvalues of a matrix A can be complex numbers.
Theorem 4.5. Every eigenvalue of an n × n matrix A lies in one of the circles C₁, …, Cₙ in the complex plane, where C_i has its centre at the diagonal entry a_{ii}, and radius

\[ r_i = \sum_{j \neq i} |a_{ij}|. \]

Example 31. Consider the matrix

\[ A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 8 \end{pmatrix}. \]

The centres are given by 2, 4, 8, and the radii by r₁ = 1, r₂ = 2, r₃ = 1.

Figure 5: Gershgorin’s circles.

Proof. Let λ be an eigenvalue of A, with associated eigenvector u, so that Au = λu. The i-th row of this equation is

\[ \lambda u_i = \sum_{j=1}^n a_{ij} u_j. \]

Bringing a_{ii}u_i to the left and dividing by u_i, this implies the inequality

\[ |\lambda - a_{ii}| \le \sum_{j \neq i} |a_{ij}| \frac{|u_j|}{|u_i|}. \]

If the index i is chosen such that u_i is the component of u with largest absolute value (note u_i ≠ 0, since u ≠ 0), then the right-hand side is bounded by r_i, and we get

\[ |\lambda - a_{ii}| \le r_i, \]

which implies that λ lies in the circle of radius r_i around a_{ii}.
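For the matrix of Example 31 the circle data and the eigenvalues are quickly compared (a check of our own):

A = [2 -1 0; -1 4 -1; 0 -1 8];
centres = diag(A);
radii = sum(abs(A), 2) - abs(centres);   % r_i = sum of off-diagonal |a_ij|
lambda = eig(A)                          % each eigenvalue lies in some circle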


Gershgorin’s Theorem has implications on the convergence of Jacobi’s method. To state Video 9.2
these implications, we need a definition.
Definition 4.6. A matrix A is called diagonally dominant if for all indices i we have

\[ |a_{ii}| > r_i = \sum_{j \neq i} |a_{ij}|. \]

Corollary 4.3. Let A be diagonally dominant. Then the Jacobi method converges to a solution
of the system Ax = b for any starting point x 0 .

Proof. We need to show that if A is diagonally dominant, then ρ(T_J) < 1, where T_J = −D⁻¹(L + U) is the iteration matrix of Jacobi's method. The i-th row of T_J is

\[ -\frac{1}{a_{ii}} \begin{pmatrix} a_{i1} & \cdots & a_{i,i-1} & 0 & a_{i,i+1} & \cdots & a_{in} \end{pmatrix}. \]

By Gershgorin's Theorem, all the eigenvalues of T_J lie in a circle around 0 of radius

\[ \frac{r_i}{|a_{ii}|} = \frac{1}{|a_{ii}|} \sum_{j \neq i} |a_{ij}| \]

for some i. It follows that if A is diagonally dominant, then r_i/|a_{ii}| < 1 for every i, and therefore |λ| < 1 for all eigenvalues λ of T_J. Thus ρ(T_J) < 1, and so Jacobi's method converges for any x⁰.

4.6 The Condition Number


Video 9.3
In this section we discuss the sensitivity of a system of equations Ax = b to perturbations
in the data. This sensitivity is quantified by the notion of condition number. We begin by
illustrating the problem with a small example.
Example 32. Consider the system of equations with

\[ A = \begin{pmatrix} \varepsilon & 1 \\ 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 + \delta \\ 1 \end{pmatrix}, \]

where 0 < ε, δ ≪ 1. The solution of Ax = b is

\[ x = \begin{pmatrix} \delta/\varepsilon \\ 1 \end{pmatrix}. \]

We can think of δ as representing the effect of rounding error. Thus δ = 0 would give us an exact solution, while if δ is small and ε ≪ δ, then the change in x due to δ ≠ 0 can be large!

The following definition is deliberately vague, and will be made more precise in light of
the condition number.
Definition 4.7. A system of equations Ax = b is called ill-conditioned, if small changes in the
system cause large changes in the solution.

To measure the sensitivity of a solution with respect to perturbations in the system, we introduce the condition number of a matrix.
Definition 4.8. Let ∥·∥ be a matrix norm and A an invertible matrix. The condition number of A is defined as

\[ \mathrm{cond}(A) = \|A\| \cdot \|A^{-1}\|. \]


We write cond1 (A), cond2 (A), cond∞ (A) for the condition number with respect to the 1,
2, and ∞ norms.
Let x be the true solution of a system of equations Ax = b, and let x c = x + ∆x be the Video 9.4
solution of a perturbed system
A(x + ∆x) = b + ∆b, (4.11)
where ∆b is a perturbation of b. We are interested in bounding the relative error in the solution, ∥∆x∥/∥x∥, in terms of the relative error in b, which is ∥∆b∥/∥b∥. We have

\[ \Delta b = A(x + \Delta x) - b = A \Delta x, \]

from which we get ∆x = A⁻¹∆b and ∥∆x∥ = ∥A⁻¹∆b∥ ≤ ∥A⁻¹∥ · ∥∆b∥. On the other hand, ∥b∥ = ∥Ax∥ ≤ ∥A∥ ∥x∥, and combining these estimates, we get

\[ \frac{\|\Delta x\|}{\|x\|} \le \|A\| \|A^{-1}\| \frac{\|\Delta b\|}{\|b\|} = \mathrm{cond}(A) \cdot \frac{\|\Delta b\|}{\|b\|}. \tag{4.12} \]

The condition number therefore bounds the relative error in the solution in terms of the rela-
tive error in b. We can also derive a similar bound for perturbations ∆A in the matrix A. Note
that a small condition number is a good thing, as it implies a small error.
The above analysis can also be rephrased in terms of the residual of a computed solution.
Suppose we have A and b exactly, but solving the system Ax = b by a computational method
gives a computed solution x c = x + ∆x that has an error. We don’t necessarily know the error,
but we have access to the residual
r = Ax c − b.
We can rewrite this equation as in (4.11), with r instead of ∆b, so that we can interpret the
residual as a perturbation of b. The condition number bound (4.12) therefore implies

\[ \frac{\|\Delta x\|}{\|x\|} \le \mathrm{cond}(A) \cdot \frac{\|r\|}{\|b\|}. \]

We now turn to some examples of condition numbers. Video 9.5

Example 33. Let 0 < ε ≪ 1, and let

\[ A = \begin{pmatrix} \varepsilon & 1 \\ 0 & 1 \end{pmatrix}. \]

The inverse is given by

\[ A^{-1} = \frac{1}{\varepsilon} \begin{pmatrix} 1 & -1 \\ 0 & \varepsilon \end{pmatrix}. \]

The condition numbers with respect to the 1- and ∞-norms are

\[ \mathrm{cond}_1(A) = \frac{2(1+\varepsilon)}{\varepsilon}, \qquad \mathrm{cond}_\infty(A) = \frac{2(1+\varepsilon)}{\varepsilon}. \]

Since ε is small, the condition numbers are large, and therefore we cannot guarantee small errors.
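MATLAB's cond function confirms the growth of these condition numbers as ε shrinks; a quick check of our own, with the hypothetical value ε = 10⁻⁸:

ep = 1e-8;
A = [ep 1; 0 1];
cond(A, 1)                      % approximately 2(1 + ep)/ep = 2e8
cond(A, Inf)                    % the same value in this example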


Example 34. A well-known example is the Hilbert matrix. Let Hₙ be the n × n matrix with entries

\[ h_{ij} = \frac{1}{i + j - 1} \]

for 1 ≤ i, j ≤ n. This matrix is symmetric (Hₙ⊤ = Hₙ) and positive definite (x⊤Hₙx > 0 for all x ≠ 0). For example, for n = 3 the matrix looks as follows:

\[ H_3 = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}. \]

Examples such as the Hilbert matrix are not common in applications, but they serve as a reminder that one should keep an eye on the conditioning of a matrix.

    n            5         10         15         20
    cond₂(Hₙ)    4.8×10⁵   1.6×10¹³   6.1×10²⁰   2.5×10²⁸

[Figure 6: Condition number of the Hilbert matrix — log₁₀ of cond₂(H) against n.]

It can be shown that the condition number of the Hilbert matrix is asymptotically

\[ \mathrm{cond}_2(H_n) \sim \frac{(\sqrt{2}+1)^{4n+4}}{2^{15/4} \sqrt{\pi n}} \]

as n → ∞. To see the effect that this conditioning has on solving systems of equations, let's look at a system

\[ H_n x = b, \]


with entries b_i = Σⱼ₌₁ⁿ 1/(i + j − 1). The system is constructed so that the solution is x = (1, …, 1)⊤. For n = 20, solving the system using Matlab gives a computed solution x + ∆x which differs considerably from x. The relative error is

\[ \frac{\|\Delta x\|_2}{\|x\|_2} \approx 44.9844. \]

This means the computed solution is useless.
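The whole experiment is only a few lines in MATLAB, using the built-in hilb function (the exact error depends on the solver and machine, but it is always large):

n = 20; H = hilb(n);            % built-in Hilbert matrix
x = ones(n, 1); b = H*x;        % right-hand side with known solution
xc = H\b;                       % computed solution
norm(xc - x)/norm(x)            % large relative error, despite exact data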


Example 35. An important example is the condition number of the omnipresent finite difference matrix

\[ A = \begin{pmatrix} -2 & 1 & 0 & \cdots & 0 & 0 \\ 1 & -2 & 1 & \cdots & 0 & 0 \\ 0 & 1 & -2 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -2 & 1 \\ 0 & 0 & 0 & \cdots & 1 & -2 \end{pmatrix}. \]
It can be shown that the condition number of this matrix is asymptotically

\[ \mathrm{cond}_2(A) \sim \frac{4}{\pi^2 h^2}, \]

where h = 1/(n + 1). It follows that the condition number increases with the number of discretisation steps n.
Example 36. What is the condition number of a random matrix? If we generate random 100×
100 matrices with normally distributed entries and look at the frequency of the logarithm of
the condition number, then we get the following:

[Figure 7: Distribution of the log of the condition number, log(cond₂(A)), for random 100 × 100 matrices — frequency against log(cond₂(A)).]


It should be noted that a random matrix is not the same as “any old matrix”, and equally
not the same as a typical matrix arising in applications, so one should be careful in interpret-
ing statements about random matrices!

Computing the condition number can be difficult, as it involves computing the inverse
of a matrix. In many cases one can find good bounds on the condition number, which can,
for example, be used to tell whether a problem is ill-conditioned.
Example 37. Consider the matrix

\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix}, \qquad A^{-1} = 10^4 \begin{pmatrix} 1.0001 & -1 \\ -1 & 1 \end{pmatrix}. \]

The condition number with respect to the ∞-norm is given by cond∞(A) ≈ 4 × 10⁴. We would like to find an estimate for this condition number without having to invert the matrix A. To do this, note that for any x and b = Ax we have Video 9.6

\[ Ax = b \;\Rightarrow\; x = A^{-1}b \;\Rightarrow\; \|x\| \le \|A^{-1}\| \|b\|, \]

and we have the lower bound

\[ \|A^{-1}\| \ge \frac{\|x\|}{\|b\|}. \]

Choosing x = (−1, 1)⊤ in our case, we get b = (0, 0.0001)⊤ and the estimate

\[ \mathrm{cond}_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty \ge \|A\|_\infty \frac{\|x\|_\infty}{\|b\|_\infty} \approx 2 \times 10^4. \]

This estimate is of the right order of magnitude (in particular, it shows that the condition number is large), and no inversion was necessary.

To summarize:

• A small condition number is a good thing, as small changes in the data lead to small
changes in the solution.

• Condition numbers may depend on the problem from which the matrix arises, and can
be very large.

• A large condition number indicates that the matrix is “close” to being singular.

Condition numbers also play a role in the convergence analysis of iterative matrix algo-
rithms. We will not discuss this aspect here and refer to more advanced lectures on numerical
linear algebra and matrix analysis. End week 9

5 Non-linear Equations
Given a function f : R → R, we would like to find a solution to the equation Video 10.1

f (x) = 0. (5.1)


[Figure: a continuous function f(x) on [0, 1] with a sign change, and hence a root.]

For example, if f is a polynomial of degree 2, we can write down the solutions in closed form
(though, as seen in Section 1, this by no means solves the problem from a numerical point of
view!). In general, we will encounter functions for which a closed form does not exist, or it is
not convenient to write down or evaluate. The best way to deal with (5.1) is then to find an
approximate solution using an iterative method. Here we will discuss two methods:

• The bisection method.

• Newton’s method.

The bisection method only requires that f be continuous, while Newton’s method requires
differentiability too, but is faster.

5.1 The bisection method


Video 10.2
Let f : ℝ → ℝ be a continuous function on an interval [a, b], with a < b. Assume that f(a)f(b) < 0, that is, the function values at the end points have different signs. By the intermediate value theorem there exists an x with a < x < b such that f(x) = 0.

The most direct method of finding such a root x is by "divide and conquer": determine which half of the interval [a, b] contains x, shrink the interval to that half, and repeat until the boundary points are sufficiently close to x. This approach is called the bisection method.

To be more precise, starting with [a, b] such that f(a)f(b) < 0, we construct a sequence of decreasing sub-intervals [aₙ, bₙ], n ≥ 1, each containing x. At each step, we calculate the midpoint pₙ = (aₙ + bₙ)/2 and evaluate f(pₙ). If f(pₙ)f(aₙ) < 0, we set [aₙ₊₁, bₙ₊₁] = [aₙ, pₙ]; otherwise [aₙ₊₁, bₙ₊₁] = [pₙ, bₙ]. We stop when bₙ − aₙ is below some predefined tolerance TOL, for example 10⁻⁴, and return the value pₙ.
In Matlab, this algorithm could be programmed as follows:


[Figure 8: The bisection method — panels show iterations 0–3, each halving the interval containing the root.]

while (b - a >= TOL)
    p = (a + b)/2;       % calculate midpoint
    if f(a)*f(p) < 0     % change of sign detected in [a, p]
        b = p;           % set right boundary to p
    else
        a = p;           % set left boundary to p
    end
end
x = p;                   % computed solution

Example 38. Let’s look at the polynomial x⁶ − x − 1 on the interval [1, 2] with tolerance TOL = 0.2 (that is, we stop when we have located a sub-interval of length ≤ 0.2 containing the root x). Note that no closed-form solution exists for general polynomials of degree ≥ 5. The bisection method is best carried out in the form of a table. At each step the midpoint pₙ is obtained, and this serves as the next left or right boundary, depending on whether f(aₙ)f(pₙ) < 0 or not.

    n   aₙ      f(aₙ)       bₙ     f(bₙ)    pₙ      f(pₙ)
    1   1       −1          2      61       1.5     8.8906
    2   1       −1          1.5    8.8906   1.25    1.5647
    3   1       −1          1.25   1.5647   1.125   −0.097713
    4   1.125   −0.097713   1.25   1.5647   1.1875

We see that |b₄ − a₄| = 0.125 < TOL, so we stop there and declare the solution to be p₄ = 1.1875.

The following result shows that the bisection method indeed approximates a zero of f to
arbitrary precision.


Lemma 5.1. Let f : ℝ → ℝ be a continuous function on an interval [a, b], and let pₙ, n ≥ 1, be the sequence of midpoints generated by the bisection method on f. Let x be such that f(x) = 0. Then Video 10.3

\[ |p_n - x| \le \frac{1}{2^n} |b - a|. \]

In particular, pₙ → x as n → ∞.

A convergence of this form is called linear.

Proof. Let x ∈ [a, b] be such that f(x) = 0. Since pₙ is the midpoint of [aₙ, bₙ] and x ∈ [aₙ, bₙ], we have

\[ |p_n - x| \le \frac{1}{2} |b_n - a_n|. \]

By bisection, each interval has half the length of the preceding one:

\[ |b_n - a_n| = \frac{1}{2} |b_{n-1} - a_{n-1}|. \]

Therefore,

\[ |p_n - x| \le \frac{1}{2} |b_n - a_n| = \frac{1}{2^2} |b_{n-1} - a_{n-1}| = \cdots = \frac{1}{2^n} |b_1 - a_1| = \frac{1}{2^n} |b - a|. \]

This completes the proof.

5.2 Newton’s method


Video 10.4
If the function f : ℝ → ℝ is differentiable, and we are able to compute f′(x), then we can (under certain conditions) find a root of f much more quickly by Newton's method. The idea behind Newton's method is to approximate f at a point xₙ by its tangent line, and calculate the next iterate xₙ₊₁ as the root of this tangent line.

Given a point xₙ with function value f(xₙ), we need to find the root of the tangent line to the function at (xₙ, f(xₙ)):

\[ y = f'(x_n)(x - x_n) + f(x_n) = 0. \]

Solving this for x, we get

\[ x = x_n - \frac{f(x_n)}{f'(x_n)}, \]

which is well-defined provided f′(xₙ) ≠ 0. Formally, Newton's method is as follows:

• Start with x₁ ∈ [a, b] such that f′(x₁) ≠ 0.


[Figure 9: Newton's method — panels show iterations 0–3, each following the tangent line to its root.]

• At each step, compute a new iterate xₙ₊₁ from xₙ as follows:

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]

• Stop if |xₙ₊₁ − xₙ| < TOL for some predefined tolerance.

Example 39. Consider again the function f(x) = x⁶ − x − 1. The derivative is f′(x) = 6x⁵ − 1. We apply Newton's method using a tolerance TOL = 0.001. We get the sequence:

\[ x_1 = 1, \quad x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = 1.2, \quad x_3 = x_2 - \frac{f(x_2)}{f'(x_2)} = 1.1436, \quad x_4 = 1.1349, \quad x_5 = 1.1347. \]

The difference |x₅ − x₄| is below the given tolerance, so we stop and declare x₅ to be our solution. We can already see that in just four iterations we get a far better approximation than by using the bisection method.
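As a sketch, Newton's method can be coded in the same style as the bisection loop above (our own illustration):

f  = @(x) x.^6 - x - 1;
df = @(x) 6*x.^5 - 1;
x = 1; TOL = 0.001; dx = Inf;
while abs(dx) >= TOL
    dx = -f(x)/df(x);           % Newton step
    x = x + dx;                 % update the iterate
end
x                               % approximately 1.1347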

We will see that the error of Newton's method is bounded as

\[ |x_{n+1} - x| \le k |x_n - x|^2 \]

Department of Mathematics 53 University of Manchester


MATH20602: 5 NON-LINEAR EQUATIONS S.L. COTTER

for a constant k, provided we start “sufficiently close” to x. This will be shown using the
theory of fixed point iterations, discussed in the next section.
Newton’s method is not without difficulties. One can easily come up with starting points
where the method does not converge. One example is when f ′ (x 1 ) ≈ 0, in which case the tan-
gent line at (x 1 , f (x 1 )) is almost horizontal and takes us far away from the solution. Another
issue is where the iteration oscillates between two values, as in the following example.

[Figure 10: Newton's method fails — for f(x) = x³ − 2x + 2 the iterates oscillate between two values.]

5.3 Fixed point iterations


Video 10.5
A root of a function f : ℝ → ℝ is a number x ∈ ℝ such that f(x) = 0. A fixed point is a root of a function of the form f(x) = g(x) − x.

Definition 5.1. A fixed point of a function g : ℝ → ℝ is a number x such that g(x) = x.

In Newton's method we have

\[ g(x) = x - \frac{f(x)}{f'(x)}, \]

where x is a fixed point of g if and only if x is a root of f. A feature of the fixed point formulation is that we may use the function g to generate a sequence.
We construct fixed point iterations in the hope that they will converge to a fixed point
of g . We will study conditions under which this happens.
Example 40. Let f(x) = x³ + 4x² − 10. There are several ways to rephrase the problem f(x) = 0 Video 10.6
as a fixed point problem g(x) = x.

1. Let g₁(x) = x − x³ − 4x² + 10. Then g₁(x) = x if and only if f(x) = 0.


2. Let g₂(x) = ½(10 − x³)^{1/2} and suppose x ≥ 0. Then g₂(x) = x ⇔ x² = ¼(10 − x³) ⇔ f(x) = 0.

3. Let g₃(x) = (10/(4 + x))^{1/2}. Then (if x ≥ 0) it is also not difficult to verify that g₃(x) = x is equivalent to f(x) = 0.

Example 41. We briefly discuss a more intriguing example, the logistic map

g (x) = r x(1 − x)

with a parameter r ∈ [0, 4]. Whether the iteration x n+1 = g (x n ) converges to a fixed point, and
how it converges, depends on the value of r . Three examples are shown below.

[Figure: trajectories of the fixed point iteration for the logistic map with r = 2.8, r = 3.5, and r = 3.8.]

If we record the movement of x n for r ranging between 0 and 4, the following bifurcation
diagram emerges:

[Figure: bifurcation diagram of the logistic map x = rx(1 − x), plotting the long-run values of x against r ∈ [0, 4].]

Department of Mathematics 55 University of Manchester


MATH20602: 5 NON-LINEAR EQUATIONS S.L. COTTER

It turns out that for small values of r we have convergence (which, incidentally, does not depend on the starting value). For values slightly above 3, the iterates oscillate between two, and then four, values, while for larger r we see "chaotic" behaviour. In the latter region, the trajectory of xₙ is also highly sensitive to perturbations of the initial value x₁. The precise behaviour of such iterations is studied in dynamical systems.

Given a fixed point problem, the all-important question is: when does the iteration xₙ₊₁ = g(xₙ) converge? The following theorem gives an answer to this question. Video 10.7
Theorem 5.1 (Fixed point theorem). Let g be a smooth function on [a, b]. Suppose that

1. g (x) ∈ [a, b] for x ∈ [a, b], and

2. |g ′ (x)| < 1 for x ∈ [a, b].

Then there exists a unique fixed point x = g(x) in [a, b], and the sequence {xₙ} defined by xₙ₊₁ = g(xₙ) converges to x. Moreover,

\[ |x_{n+1} - x| \le \lambda^n |x_1 - x| \]

for some λ < 1.

Proof. Let f(x) = g(x) − x. Then by (1), f(a) = g(a) − a ≥ 0 and f(b) = g(b) − b ≤ 0. By the Video 11.1
intermediate value theorem, there exists an x ∈ [a, b] such that f(x) = 0. Hence there exists x ∈ [a, b] such that g(x) = x, showing the existence of a fixed point.

Next, consider xₙ₊₁ = g(xₙ) for n ≥ 1, and let x = g(x) be a fixed point. Then

\[ x_{n+1} - x = g(x_n) - g(x). \]

Assume without loss of generality that xₙ > x. By the mean value theorem there exists ξ ∈ (x, xₙ) such that

\[ g'(\xi) = \frac{g(x_n) - g(x)}{x_n - x}, \]

and hence

\[ x_{n+1} - x = g'(\xi)(x_n - x). \]

Since ξ ∈ (a, b), assumption (2) gives |g′(ξ)| ≤ λ for some λ < 1. Hence,

\[ |x_{n+1} - x| \le \lambda |x_n - x| \le \cdots \le \lambda^n |x_1 - x|. \]

This proves the convergence. To show uniqueness, suppose x, y are two distinct fixed points of g with x < y. By the mean value theorem and assumption (2), there exists ξ ∈ (x, y) such that

\[ \left| \frac{g(x) - g(y)}{x - y} \right| = |g'(\xi)| < 1. \]

But since both x and y are fixed points, we have

\[ \frac{g(x) - g(y)}{x - y} = \frac{x - y}{x - y} = 1, \]

so we have a contradiction, and the fixed point is unique.


Example 42. Let’s look at the functions from Example 40 to see for which one we have convergence. Video 10.7

1. g₁(x) = x − x³ − 4x² + 10 on [a, b] = [1, 2]. Note that g₁(1) = 6 ∉ [1, 2], therefore assumption (1) is violated.

2. g₂(x) = ½(10 − x³)^{1/2}. The derivative is given by

\[ g_2'(x) = \frac{-3x^2}{4(10 - x^3)^{1/2}}, \]

and therefore g₂′(2) ≈ −2.12. Condition (2) fails.

3. The third formulation is

\[ g_3(x) = \left( \frac{10}{4 + x} \right)^{1/2}. \]

The derivative is given by

\[ g_3'(x) = \frac{-5}{\sqrt{10}\,(4 + x)^{3/2}}, \]

and therefore the function is strictly decreasing on [1, 2]. Since g₃(2) = √(5/3) and g₃(1) = √2 are both in [1, 2], condition (1) is satisfied. Furthermore, |g₃′(x)| ≤ 1/(5√2) < 1 for x ∈ [1, 2], so condition (2) is also satisfied. It follows that the iteration xₙ₊₁ = g₃(xₙ) converges to a fixed point of g₃. We can try this out:

\[ x_1 = 1.5, \quad x_2 = 1.3484, \quad x_3 = 1.3674, \quad x_4 = 1.3650, \ldots \]
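The corresponding fixed point iteration is a one-line loop in MATLAB (our own sketch):

g3 = @(x) sqrt(10./(4 + x));
x = 1.5;                        % starting point
for n = 1:20
    x = g3(x);                  % fixed point iteration
end
x                               % approximately 1.3652, a root of f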


End week 10
Example 43. We can apply the fixed point theorem to Newton's method. Let Video 11.2

\[ g(x) = x - \frac{f(x)}{f'(x)}. \]

Then

\[ g'(x) = 1 - \frac{f'(x)}{f'(x)} + \frac{f(x)f''(x)}{f'(x)^2} = \frac{f(x)f''(x)}{f'(x)^2}. \]

Let α be a root of f such that f(α) = 0 and f′(α) ≠ 0. Then

\[ g'(\alpha) = 0. \]

Hence |g′(α)| < 1 at the fixed point. Now let ε > 0 be sufficiently small and set a = α − ε, b = α + ε. Then, by continuity,

\[ |g'(x)| < \frac{1}{2} \]

for x ∈ [a, b], and (2) holds. Furthermore, by the mean value theorem,

\[ |g(x) - \alpha| = |g'(\xi)| |x - \alpha| \le \frac{1}{2} |x - \alpha| < \varepsilon \]

for x ∈ [a, b] and some ξ between x and α. Hence g(x) ∈ [a, b] and (1) holds. It follows that in a small enough neighbourhood of a root of f(x), Newton's method converges to that root (provided f′(x) ≠ 0 at that root).

Note that the argument with ε illustrates a key aspect of Newton's method: it only converges when the initial guess x₁ is close enough to a root of f. What "close enough" means is often not so clear. In the next section we derive a stronger result for Newton's method, namely that it converges quadratically if the starting point is close enough.


5.4 Rates of convergence


Video 11.3
The speed of iterative numerical methods is characterized by the rate of convergence.
Definition 5.3. The sequence xₙ, n ≥ 0, converges to α with order one, or linearly, if

\[ |x_{n+1} - \alpha| \le k |x_n - \alpha| \]

for some 0 < k < 1. The sequence converges with order r, r ≥ 2, if

\[ |x_{n+1} - \alpha| \le k |x_n - \alpha|^r \]

with k > 0. If the sequence converges with order r = 2, it is said to converge quadratically.
Example 44. Consider the sequence xₙ = 1/2^{rⁿ} for r > 1. Then xₙ → 0 as n → ∞. Note that

\[ x_{n+1} = \frac{1}{2^{r^{n+1}}} = \frac{1}{(2^{r^n})^r} = \left( \frac{1}{2^{r^n}} \right)^r = x_n^r, \]

and therefore |xₙ₊₁ − 0| ≤ 1 · |xₙ − 0|^r. We have convergence of order r.

For example, if a sequence converges quadratically and |xₙ − α| ≤ 0.1, then in the next step we have |xₙ₊₁ − α| ≤ k · 0.01. We would like to show that Newton's method converges quadratically to a root of a function f if we start the iteration sufficiently close to that root.

Theorem 5.2. Let g be sufficiently smooth in a neighbourhood of a fixed point α. The fixed Video 11.4
point iteration xₙ₊₁ = g(xₙ) converges quadratically to α if g′(α) = 0 and the starting point x₁ is sufficiently close to α.

Again, sufficiently close means that there exists an interval [a, b] for which this holds.

Proof. (*) Consider the Taylor expansion around α:

\[ g(x) = g(\alpha) + g'(\alpha)(x - \alpha) + \frac{1}{2} g''(\alpha)(x - \alpha)^2 + R, \]

where R is a remainder term of order O((x − α)³). Assume g′(α) = 0. Then

\[ g(x) - g(\alpha) = \frac{1}{2} g''(\alpha)(x - \alpha)^2 + R. \]

Suppose that |x − α| < ε. Since R is proportional to (x − α)³, we can also write (assuming g″(α) ≠ 0; otherwise the convergence is even faster)

\[ g(x) - g(\alpha) = \frac{1}{2} g''(\alpha)(x - \alpha)^2 \left( 1 + \frac{2}{g''(\alpha)} R_1 \right), \]

where R₁ = R/(x − α)² = O(x − α) ≤ Cε for a constant C. Taking absolute values, we get

\[ |g(x) - g(\alpha)| \le k |x - \alpha|^2 \]

for a constant k. Set x = xₙ, so that xₙ₊₁ = g(xₙ) and α = g(α). Then g(xₙ) − g(α) = xₙ₊₁ − α and

\[ |x_{n+1} - \alpha| \le k |x_n - \alpha|^2. \]

This shows quadratic convergence.


Corollary 5.1. Newton’s method converges quadratically if we start sufficiently close to a root.

In summary, we have the following points worth noting about the bisection method and
Newton’s method.

• The bisection method requires that f is continuous on [a, b], and that f (a) f (b) < 0.

• Newton’s method requires that f is continuous and differentiable, and moreover re-
quires a good starting point x 1 .

• The bisection method converges linearly, while Newton’s method converges quadrati-
cally.

• There is no obvious generalization of the bisection method to higher dimensions, while Newton's method generalizes easily.

5.5 Newton’s method in the complex plane


Video 11.5
We present another example illustrating the intricate behaviour of fixed point iterations, this
time over the complex numbers.
Example 45. Consider the function

\[ f(z) = z^3 - 1. \]

This function has exactly three roots, the third roots of unity

\[ z_k = e^{2\pi i k / 3} \]

for k = 0, 1, 2. As in the real case, Newton's method

\[ z_{n+1} = z_n - \frac{f(z_n)}{f'(z_n)} \]

converges to one of these roots of unity if we start close enough. But what happens at the boundaries? The following picture illustrates the behaviour of Newton's method for this function in the complex plane, where each colour indicates the root to which a given starting value converges.

[Figure: each starting point in the complex plane coloured by the root to which Newton's method converges; the basin boundaries form a fractal.]


If we look at the rate of convergence instead, a similarly intricate picture emerges near the basin boundaries.

End week 11
