

Finite Difference Schemes and Partial Differential Equations
Second Edition

John C. Strikwerda
University of Wisconsin–Madison
Madison, Wisconsin

Society for Industrial and Applied Mathematics


Philadelphia

Copyright © 2004 by the Society for Industrial and Applied Mathematics.

This SIAM edition is a second edition of the work first published by Wadsworth
& Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1989.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book
may be reproduced, stored, or transmitted in any manner without the written
permission of the publisher. For information, write to the Society for Industrial
and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA
19104-2688.

Library of Congress Cataloging-in-Publication Data

Strikwerda, John C., 1947-


Finite difference schemes and partial differential equations / John C.
Strikwerda. — 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-89871-567-9
1. Differential equations, Partial—Numerical solutions. 2. Finite differences.
I. Title.

QA374.S88 2004
518’.64—dc22
2004048714

SIAM is a registered trademark.
Contents

Preface to the Second Edition ix


Preface to the First Edition xi

1 Hyperbolic Partial Differential Equations 1


1.1 Overview of Hyperbolic Partial Differential Equations 1
1.2 Boundary Conditions 9
1.3 Introduction to Finite Difference Schemes 16
1.4 Convergence and Consistency 23
1.5 Stability 28
1.6 The Courant–Friedrichs–Lewy Condition 34

2 Analysis of Finite Difference Schemes 37


2.1 Fourier Analysis 37
2.2 Von Neumann Analysis 47
2.3 Comments on Instability and Stability 58

3 Order of Accuracy of Finite Difference Schemes 61


3.1 Order of Accuracy 61
3.2 Stability of the Lax–Wendroff and Crank–Nicolson Schemes 76
3.3 Difference Notation and the Difference Calculus 78
3.4 Boundary Conditions for Finite Difference Schemes 85
3.5 Solving Tridiagonal Systems 88

4 Stability for Multistep Schemes 95


4.1 Stability for the Leapfrog Scheme 95
4.2 Stability for General Multistep Schemes 103
4.3 The Theory of Schur and von Neumann Polynomials 108
4.4 The Algorithm for Schur and von Neumann Polynomials 117


5 Dissipation and Dispersion 121


5.1 Dissipation 121
5.2 Dispersion 125
5.3 Group Velocity and the Propagation of Wave Packets 130

6 Parabolic Partial Differential Equations 137


6.1 Overview of Parabolic Partial Differential Equations 137
6.2 Parabolic Systems and Boundary Conditions 143
6.3 Finite Difference Schemes for Parabolic Equations 145
6.4 The Convection-Diffusion Equation 157
6.5 Variable Coefficients 163

7 Systems of Partial Differential Equations in Higher Dimensions 165
7.1 Stability of Finite Difference Schemes for Systems of Equations 165
7.2 Finite Difference Schemes in Two and Three Dimensions 168
7.3 The Alternating Direction Implicit Method 172

8 Second-Order Equations 187


8.1 Second-Order Time-Dependent Equations 187
8.2 Finite Difference Schemes for Second-Order Equations 193
8.3 Boundary Conditions for Second-Order Equations 199
8.4 Second-Order Equations in Two and Three Dimensions 202

9 Analysis of Well-Posed and Stable Problems 205


9.1 The Theory of Well-Posed Initial Value Problems 205
9.2 Well-Posed Systems of Equations 213
9.3 Estimates for Inhomogeneous Problems 223
9.4 The Kreiss Matrix Theorem 225

10 Convergence Estimates for Initial Value Problems 235


10.1 Convergence Estimates for Smooth Initial Functions 235
10.2 Related Topics 248
10.3 Convergence Estimates for Nonsmooth Initial Functions 252
10.4 Convergence Estimates for Parabolic Differential Equations 259
10.5 The Lax–Richtmyer Equivalence Theorem 262
10.6 Analysis of Multistep Schemes 267
10.7 Convergence Estimates for Second-Order Differential Equations 270

11 Well-Posed and Stable Initial-Boundary Value Problems 275
11.1 Preliminaries 275
11.2 Analysis of Boundary Conditions for the Leapfrog Scheme 281
11.3 The General Analysis of Boundary Conditions 288
11.4 Initial-Boundary Value Problems for Partial Differential Equations 300
11.5 The Matrix Method for Analyzing Stability 307

12 Elliptic Partial Differential Equations and Difference Schemes 311
12.1 Overview of Elliptic Partial Differential Equations 311
12.2 Regularity Estimates for Elliptic Equations 315
12.3 Maximum Principles 317
12.4 Boundary Conditions for Elliptic Equations 322
12.5 Finite Difference Schemes for Poisson’s Equation 325
12.6 Polar Coordinates 333
12.7 Coordinate Changes and Finite Differences 335

13 Linear Iterative Methods 339


13.1 Solving Finite Difference Schemes for Laplace’s Equation in a Rectangle 339
13.2 Eigenvalues of the Discrete Laplacian 342
13.3 Analysis of the Jacobi and Gauss–Seidel Methods 345
13.4 Convergence Analysis of Point SOR 351
13.5 Consistently Ordered Matrices 357
13.6 Linear Iterative Methods for Symmetric, Positive Definite Matrices 362
13.7 The Neumann Boundary Value Problem 365

14 The Method of Steepest Descent and the Conjugate Gradient Method 373
14.1 The Method of Steepest Descent 373
14.2 The Conjugate Gradient Method 377
14.3 Implementing the Conjugate Gradient Method 384
14.4 A Convergence Estimate for the Conjugate Gradient Method 387
14.5 The Preconditioned Conjugate Gradient Method 390

A Matrix and Vector Analysis 399


A.1 Vector and Matrix Norms 399
A.2 Analytic Functions of Matrices 406

B A Survey of Real Analysis 413


B.1 Topological Concepts 413
B.2 Measure Theory 413
B.3 Measurable Functions 414
B.4 Lebesgue Integration 415
B.5 Function Spaces 417

C A Survey of Results from Complex Analysis 419


C.1 Basic Definitions 419
C.2 Complex Integration 420
C.3 A Phragmen–Lindelöf Theorem 422
C.4 A Result for Parabolic Systems 424

References 427
Index 431

Preface to the Second Edition

I am extremely gratified by the wide acceptance of the first edition of this textbook. It
confirms that there was a need for a textbook to cover the basic theory of finite difference
schemes for partial differential equations, and I am pleased that this textbook filled some
of that need.
I am very appreciative that SIAM has agreed to publish this second edition of the text. Many users of this textbook are members of SIAM, and I appreciate the opportunity to serve that community with this improved text.
This second edition incorporates a number of changes, a few of which appeared in
later printings of the first edition. An important modification is the inclusion of the notion
of a stability domain in the definition of stability. The incompleteness of the original
definition was pointed out to me by Prof. Ole Hald. In some printings of the first edition the
basic definition was modified, but now the notion of a stability domain is more prevalent
throughout the text.
A significant change is the inclusion of many more figures in the text. This has made it
easier to illustrate several important concepts and makes the material more understandable.
There are also more tables of computational results that illustrate the properties of finite
difference schemes.
There are a few small changes to the layout requested by SIAM. Among these are that the end-of-proof mark has been changed to an open box, □, rather than the filled-in box used in the first edition.
I did not add new chapters to the second edition because that would have made the
text too long and because there are many other texts and research monographs that discuss
material beyond the scope of this text.
I offer my thanks to the many students who have taken my course using the textbook.
They have encouraged me and given a great many suggestions that have improved the
exposition. To them goes much of the credit for finding the typographical errors and
mistakes that appeared in the first edition’s text and exercises.
My special thanks is given to those former students, John Knox, Young Lee, Dongho
Shin, and Suzan Stodder, for their many thoughtful suggestions.

John C. Strikwerda
March 2004


Preface to the First Edition

This text presents the basic theory of finite difference schemes applied to the numerical
solution of partial differential equations. It is designed to be used as an introductory graduate
text for students in applied mathematics, engineering, and the sciences, and with that in
mind, presents the theory of finite difference schemes in a way that is both rigorous and
accessible to the typical graduate student in the course. The two aims of the text are
to present the basic material necessary to do scientific computation with finite difference
schemes and to present the basic theory for understanding these methods.
The text was developed for two courses: a basic introduction to finite difference
schemes for partial differential equations and an upper level graduate course on the theory
related to initial value problems. Because students in these courses have diverse back-
grounds in mathematics, the text presumes knowledge only through advanced calculus,
although some mathematical maturity is required for the more advanced topics. Students
taking an introduction to finite difference schemes are often acquainted with partial differ-
ential equations, but many have not had a formal course on the subject. For this reason,
much of the necessary theory of partial differential equations is developed in the text.
The chief motivation for this text was the desire to present the material on time-
dependent equations, Chapters 1 through 11, in a unified way that was accessible to students
who would use the material in scientific and engineering studies. Chapters 1 through 11
contain much that is not in any other textbook, but more important, the unified treatment,
using Fourier analysis, emphasizes that one can study finite difference schemes using a
few powerful ideas to understand most of their properties. The material on elliptic partial
differential equations, Chapters 12, 13, and 14, is intended to be only an introduction; it
should enable students to progress to more advanced texts and implement the basic methods
knowledgably.
Several distinctive features of this textbook are:
• The fundamental concepts of convergence, consistency, and stability play an impor-
tant role from the beginning.
• The concept of order of accuracy of a finite difference scheme is carefully presented
with a single basic method of determining the order of accuracy of a scheme.
• Convergence proofs are given relating the order of accuracy of the scheme to that of
the solution. A complete proof of the Lax–Richtmyer equivalence theorem, for the
simple case of constant coefficient equations, is presented using methods accessible
to most students in the course.
• Fourier analysis is used throughout the text to give a unified treatment of many of the
important ideas.
• The basic theory of well-posed initial value problems is presented.
• The basic theory of well-posed initial-boundary value problems is presented for both
partial differential equations and finite difference schemes.
A suggested one-semester introductory course can cover most of the material in Chap-
ters 1, 2, 3, 5, 6, 7, 12, 13, and 14 and parts of Chapters 4 and 10. A more advanced course
could concentrate on Chapters 9, 10, and 11.


In many textbooks on finite difference schemes, the discussion of the von Neumann
stability condition does not make it clear when one may use the restricted condition and
when one must use the general condition. In this text, theorems showing when the restricted
condition may be used are stated and proved. The treatment given here was motivated by
discussions with engineers and engineering students who were using the restricted condition
when the more general condition was called for.
The treatment of accuracy of finite difference schemes is new and is an attempt to make
the method for analyzing accuracy a rigorous procedure, rather than a grab-bag of quite
different methods. This treatment is a result of queries from students who used textbook
methods but were confused because they employed the wrong “trick” at the wrong time.
Because many applications involve inhomogeneous equations, I have included the forcing
function in the analysis of accuracy.
The convergence results of Chapter 10 are unique to this textbook. Both students
and practicing computational engineers are often puzzled about why second-order accurate
schemes do not always produce solutions that are accurate of second order. Indeed, some
texts give students the impression that solutions to finite difference schemes are always
computed with the accuracy of the scheme. The important results in Chapter 10 show
how the order of accuracy of the scheme is related to the accuracy of the solution and the
smoothness of the solution.
The material on Schur and von Neumann polynomials in Chapter 4 also appears in a
textbook for the first time. Tony Chan deserves credit for calling my attention to Miller’s
method, which should be more widely known. The analysis of stability for multilevel,
higher order accurate schemes is not practical without methods such as Miller’s.
There are two topics that, regretfully, have been omitted from this text due to lim-
itations of time and space. These are nonlinear hyperbolic equations and the multigrid
methods for elliptic equations. Also, it would have been nice to include more material
on variable grids, grid generation techniques, and other topics related to actual scientific
computing. But I have decided to leave these embellishments to others or to later editions.
The numbering of theorems, lemmas, and corollaries is done as a group. That is, the corollary after Theorem 2.2.1 is numbered 2.2.2 and the next theorem is Theorem 2.2.3. The end of each proof is marked with the symbol □, and the end of each example is marked with a similar symbol.
Many students have offered comments on the course notes from which this book
evolved and they have improved the material immensely. Special thanks go to Scott
Markel, Naomi Decker, Bruce Wade, and Poon Fung for detecting many typographical
errors. I also acknowledge the reviewers, William Coughran, AT&T Bell Laboratories;
Max Gunzberger, Carnegie-Mellon University; Joseph Oliger, Stanford University; Nick
Trefethen, Massachusetts Institute of Technology; and Bruce Wade, Cornell University, for
their helpful comments.

John C. Strikwerda
April 1989

Chapter 1

Hyperbolic Partial Differential Equations

We begin our study of finite difference methods for partial differential equations by con-
sidering the important class of partial differential equations called hyperbolic equations. In
later chapters we consider other classes of partial differential equations, especially parabolic
and elliptic equations. For each of these classes of equations we consider prototypical equa-
tions, with which we illustrate the important concepts and distinguishing features associated
with each class. The reader is referred to other textbooks on partial differential equations
for alternate approaches, e.g., Folland [18], Garabedian [22], and Weinberger [68]. After
introducing each class of differential equations we consider finite difference methods for
the numerical solution of equations in the class.
We begin this chapter by considering the simplest hyperbolic equation and then extend
our discussion to include hyperbolic systems of equations and equations with variable
coefficients. After the basic concepts have been introduced, we begin our discussion of finite
difference schemes. The important concepts of convergence, consistency, and stability are
presented and shown to be related by the Lax–Richtmyer equivalence theorem. The chapter
concludes with a discussion of the Courant–Friedrichs–Lewy condition and related topics.

1.1 Overview of Hyperbolic Partial Differential Equations


The One-Way Wave Equation
The prototype for all hyperbolic partial differential equations is the one-way wave equation:

ut + aux = 0, (1.1.1)

where a is a constant, t represents time, and x represents the spatial variable. The
subscript denotes differentiation, i.e., ut = ∂u/∂t. We give u(t, x) at the initial time,
which we always take to be 0—i.e., u(0, x) is required to be equal to a given function
u0 (x) for all real numbers x —and we wish to determine the values of u(t, x) for positive
values of t. This is called an initial value problem.
By inspection we observe that the solution of (1.1.1) is

u(t, x) = u0 (x − at). (1.1.2)

(Actually, we know only that this is a solution; we prove later that this is the unique solution.)


The formula (1.1.2) tells us several things. First, the solution at any time t0 is a
copy of the original function, but shifted to the right, if a is positive, or to the left, if a is
negative, by an amount |a|t0 . Another way to say this is that the solution at (t, x) depends
only on the value of ξ = x − at. The lines in the (t, x) plane on which x − at is constant
are called characteristics. The parameter a has dimensions of distance divided by time
and is called the speed of propagation along the characteristic. Thus the solution of the
one-way wave equation (1.1.1) can be regarded as a wave that propagates with speed a
without change of shape, as illustrated in Figure 1.1.

Figure 1.1. The solution of the one-way wave equation is a shift.

Second, whereas equation (1.1.1) appears to make sense only if u is differentiable, the solution formula (1.1.2) requires no differentiability of u0. In general, we allow for
discontinuous solutions for hyperbolic problems. An example of a discontinuous solution
is a shock wave, which is a feature of solutions of nonlinear hyperbolic equations.
To illustrate further the concept of characteristics, consider the more general hyperbolic equation

$$u_t + a u_x + b u = f(t, x), \qquad u(0, x) = u_0(x), \tag{1.1.3}$$
where a and b are constants. Based on our preceding observations we change variables
from (t, x) to (τ, ξ ), where τ and ξ are defined by

τ = t, ξ = x−at.

The inverse transformation is then

t = τ, x = ξ + aτ,

and we define ũ(τ, ξ ) = u(t, x), where (τ, ξ ) and (t, x) are related by the preceding
relations. (Both u and ũ represent the same function, but the tilde is needed to distinguish

between the two coordinate systems for the independent variables.) Equation (1.1.3) then becomes

$$\frac{\partial \tilde{u}}{\partial \tau} = u_t \frac{\partial t}{\partial \tau} + u_x \frac{\partial x}{\partial \tau} = u_t + a u_x = -bu + f(\tau, \xi + a\tau).$$

So we have

$$\frac{\partial \tilde{u}}{\partial \tau} = -b\tilde{u} + f(\tau, \xi + a\tau).$$

This is an ordinary differential equation in τ and the solution is

$$\tilde{u}(\tau, \xi) = u_0(\xi)\, e^{-b\tau} + \int_0^{\tau} f(\sigma, \xi + a\sigma)\, e^{-b(\tau - \sigma)} \, d\sigma.$$

Returning to the original variables, we obtain the representation for the solution of equation (1.1.3) as

$$u(t, x) = u_0(x - at)\, e^{-bt} + \int_0^{t} f\big(s, x - a(t - s)\big)\, e^{-b(t - s)} \, ds. \tag{1.1.4}$$

We see from (1.1.4) that u(t, x) depends only on values of (t′, x′) such that x′ − at′ = x − at, i.e., only on the values of u and f on the characteristic through (t, x) for 0 ≤ t′ ≤ t.
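A short computation can make formula (1.1.4) concrete. The following sketch (in Python; the choices a = 2, b = 1, u0(x) = e^{−x²}, and f(t, x) = sin x are illustrative assumptions, not from the text) evaluates (1.1.4) by the trapezoidal rule along the characteristic and checks that the result satisfies the differential equation:

import numpy as np

a, b = 2.0, 1.0
u0 = lambda x: np.exp(-x**2)      # illustrative initial data
f = lambda t, x: np.sin(x)        # illustrative forcing function

def u_formula(t, x, n=4000):
    """Evaluate (1.1.4), integrating along the characteristic."""
    s = np.linspace(0.0, t, n)
    g = f(s, x - a*(t - s)) * np.exp(-b*(t - s))
    ds = s[1] - s[0]
    integral = ds * (g.sum() - 0.5*(g[0] + g[-1]))   # trapezoidal rule
    return u0(x - a*t)*np.exp(-b*t) + integral

# The residual u_t + a u_x + b u - f, estimated by centered differences,
# should be near zero at any point (t, x):
t0, x0, d = 1.0, 0.5, 1e-5
ut = (u_formula(t0 + d, x0) - u_formula(t0 - d, x0)) / (2*d)
ux = (u_formula(t0, x0 + d) - u_formula(t0, x0 - d)) / (2*d)
print(ut + a*ux + b*u_formula(t0, x0) - f(t0, x0))   # approximately 0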
This method of solution of (1.1.3) is easily extended to nonlinear equations of the
form
ut + aux = f (t, x, u). (1.1.5)
See Exercises 1.1.5, 1.1.4, and 1.1.6 for more on nonlinear equations of this form.

Systems of Hyperbolic Equations


We now examine systems of hyperbolic equations with constant coefficients in one space
dimension. The variable u is now a vector of dimension d.
Definition 1.1.1. A system of the form

ut + Aux + Bu = F (t, x) (1.1.6)

is hyperbolic if the matrix A is diagonalizable with real eigenvalues.


By saying that the matrix A is diagonalizable, we mean that there is a nonsingular matrix P such that P AP⁻¹ is a diagonal matrix, that is,

$$P A P^{-1} = \begin{pmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_d \end{pmatrix} = \Lambda.$$

The eigenvalues a_i of A are the characteristic speeds of the system. Under the change of variables w = P u we have, in the case B = 0,

$$w_t + \Lambda w_x = P F(t, x) = \tilde{F}(t, x)$$

or

$$w^i_t + a_i w^i_x = \tilde{f}^i(t, x),$$
which is the form of equation (1.1.3). Thus, when matrix B is zero, the one-dimensional
hyperbolic system (1.1.6) reduces to a set of independent scalar hyperbolic equations. If
B is not zero, then in general the resulting system of equations is coupled together, but
only in the undifferentiated terms. The effect of the lower order term, Bu, is to cause
growth, decay, or oscillations in the solution, but it does not alter the primary feature of the
propagation of the solution along the characteristics. The definition of hyperbolic systems
in more than one space dimension is given in Chapter 9.

Example 1.1.1. As an example of a hyperbolic system, we consider the system

$$u_t + 2u_x + v_x = 0,$$
$$v_t + u_x + 2v_x = 0,$$

which can be written as

$$\begin{pmatrix} u \\ v \end{pmatrix}_t + \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}_x = 0.$$

As initial data we take

$$u(0, x) = u_0(x) = \begin{cases} 1 & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1, \end{cases} \qquad v(0, x) = 0.$$

By adding and subtracting the two equations, the system can be rewritten as

$$(u + v)_t + 3(u + v)_x = 0,$$
$$(u - v)_t + (u - v)_x = 0$$

or

$$w^1_t + 3w^1_x = 0, \quad w^1(0, x) = u_0(x),$$
$$w^2_t + w^2_x = 0, \quad w^2(0, x) = u_0(x).$$

The matrix P for this transformation is $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. The solution is, therefore,

$$w^1(t, x) = w^1_0(x - 3t),$$
$$w^2(t, x) = w^2_0(x - t)$$

or

$$u(t, x) = \tfrac{1}{2}(w^1 + w^2) = \tfrac{1}{2}\left[u_0(x - 3t) + u_0(x - t)\right],$$
$$v(t, x) = \tfrac{1}{2}(w^1 - w^2) = \tfrac{1}{2}\left[u_0(x - 3t) - u_0(x - t)\right].$$

These formulas show that the solution consists of two independent parts, one propagating with speed 3 and one with speed 1.
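These formulas are easy to evaluate directly. The following sketch (not from the text) computes u and v at a given time and confirms that the square pulse splits into two parts traveling with speeds 3 and 1:

import numpy as np

u0 = lambda x: np.where(np.abs(x) <= 1.0, 1.0, 0.0)

def solution(t, x):
    u = 0.5*(u0(x - 3*t) + u0(x - t))
    v = 0.5*(u0(x - 3*t) - u0(x - t))
    return u, v

x = np.linspace(-2.0, 8.0, 201)
u, v = solution(2.0, x)
# at t = 2 the two half-height pulses in u are centered at x = 6 and x = 2
print(x[u > 0.25].min(), x[u > 0.25].max())   # about 1 and 7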

Equations with Variable Coefficients


We now examine equations for which the characteristic speed is a function of t and x.
Consider the equation
ut + a(t, x)ux = 0 (1.1.7)
with initial condition u(0, x) = u0 (x), which has the variable speed of propagation a(t, x).
If, as we did after equation (1.1.3), we change variables to τ and ξ, where τ = t and ξ
is as yet undetermined, we have

$$\frac{\partial \tilde{u}}{\partial \tau} = u_t \frac{\partial t}{\partial \tau} + u_x \frac{\partial x}{\partial \tau} = u_t + \frac{\partial x}{\partial \tau}\, u_x.$$

In analogy with the constant coefficient case, we set

$$\frac{dx}{d\tau} = a(t, x) = a(\tau, x).$$

This is an ordinary differential equation for x giving the speed along the characteristic
through the point (τ, x) as a(τ, x). We set the initial value for the characteristic curve
through (τ, x) to be ξ. Thus the equation (1.1.7) is equivalent to the system of ordinary
differential equations
$$\frac{d\tilde{u}}{d\tau} = 0, \quad \tilde{u}(0, \xi) = u_0(\xi), \qquad \frac{dx}{d\tau} = a(\tau, x), \quad x(0) = \xi. \tag{1.1.8}$$

As we see from the first equation in (1.1.8), u is constant along each characteristic curve,
but the characteristic determined by the second equation need not be a straight line. We
now present an example to illustrate these ideas.

Example 1.1.2. Consider the equation

$$u_t + x u_x = 0, \qquad u(0, x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Corresponding to the system (1.1.8) we have the equations

$$\frac{d\tilde{u}}{d\tau} = 0, \qquad \frac{dx}{d\tau} = x, \quad x(0) = \xi.$$

The general solution of the differential equation for x(τ) is x(τ) = ce^τ. Because we specify that ξ is defined by x(0) = ξ, we have x(τ) = ξe^τ, or ξ = xe^{−t}. The equation for ũ shows that ũ is independent of τ, so by the condition at τ equal to zero we have that ũ(τ, ξ) = u0(ξ). Thus

$$u(t, x) = \tilde{u}(\tau, \xi) = u_0(\xi) = u_0(x e^{-t}).$$

So we have, for t > 0,

$$u(t, x) = \begin{cases} 1 & \text{if } 0 \le x \le e^t, \\ 0 & \text{otherwise.} \end{cases}$$
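When no closed form for the characteristics is available, the system (1.1.8) can be integrated numerically. The following sketch (not from the text) traces the characteristic through (t, x) backward to time 0 with a few Runge–Kutta steps to find ξ, then sets u(t, x) = u0(ξ); it recovers the solution of Example 1.1.2:

import numpy as np

a = lambda t, x: x                 # the characteristic speed of the example
u0 = lambda x: np.where((0.0 <= x) & (x <= 1.0), 1.0, 0.0)

def trace_back(t, x, nsteps=100):
    """Solve dx/dtau = a(tau, x) from tau = t back to tau = 0 by RK4."""
    dt = -t / nsteps
    tau = t
    for _ in range(nsteps):
        k1 = a(tau, x)
        k2 = a(tau + dt/2, x + dt*k1/2)
        k3 = a(tau + dt/2, x + dt*k2/2)
        k4 = a(tau + dt, x + dt*k3)
        x = x + dt*(k1 + 2*k2 + 2*k3 + k4)/6
        tau += dt
    return x

t, x = 1.0, np.linspace(-1.0, 4.0, 6)
print(u0(trace_back(t, x)))    # numerical characteristics
print(u0(x*np.exp(-t)))        # exact xi = x e^{-t} from the example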

As for equations with constant coefficients, these methods apply to nonlinear equa-
tions of the form
ut + a(t, x)ux = f (t, x, u), (1.1.9)
as shown in Exercise 1.1.9. Equations for which the characteristic speeds depend on u,
i.e., with characteristic speed a(t, x, u), require special care, since the characteristic curves
may intersect.

Systems with Variable Coefficients


For systems of hyperbolic equations in one space variable with variable coefficients, we
require uniform diagonalizability. (See Appendix A for a discussion of matrix norms.)
Definition 1.1.2. The system

$$u_t + A(t, x)\, u_x + B(t, x)\, u = F(t, x) \tag{1.1.10}$$

with u(0, x) = u0(x) is hyperbolic if there is a matrix function P(t, x) such that

$$P(t, x)\, A(t, x)\, P^{-1}(t, x) = \Lambda(t, x) = \begin{pmatrix} a_1(t, x) & & 0 \\ & \ddots & \\ 0 & & a_d(t, x) \end{pmatrix}$$

is diagonal with real eigenvalues and the matrix norms of P(t, x) and P⁻¹(t, x) are bounded in x and t for x ∈ R, t ≥ 0.

The characteristic curves for system (1.1.10) are the solutions to the differential equations

$$\frac{dx^i}{dt} = a_i(t, x), \quad x^i(0) = \xi^i.$$

Setting v = P(t, x)u, we obtain the system for v:

$$v_t + \Lambda v_x = P(t, x)\, F(t, x) + G(t, x)\, v,$$

where

$$G = (P_t + \Lambda P_x - P B)\, P^{-1}.$$

In terms of directional derivatives this system is equivalent to

$$\left.\frac{dv^i}{dt}\right|_{\text{along } x^i} = \tilde{f}^i(t, x) + \sum_{j=1}^{d} g^i_j(t, x)\, v^j.$$

This formula is not a practical method of solution for most problems because the ordinary
differential equations are often quite difficult to solve, but the formula does show the
importance of characteristics for these systems.

Exercises
1.1.1. Consider the initial value problem for the equation u_t + au_x = f(t, x) with u(0, x) = 0 and

$$f(t, x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

Assume that a is positive. Show that the solution is given by

$$u(t, x) = \begin{cases} 0 & \text{if } x \le 0, \\ x/a & \text{if } x \ge 0 \text{ and } x - at \le 0, \\ t & \text{if } x \ge 0 \text{ and } x - at \ge 0. \end{cases}$$

1.1.2. Consider the initial value problem for the equation u_t + au_x = f(t, x) with u(0, x) = 0 and

$$f(t, x) = \begin{cases} 1 & \text{if } -1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Assume that a is positive. Show that the solution is given by

$$u(t, x) = \begin{cases} (x + 1)/a & \text{if } -1 \le x \le 1 \text{ and } x - at \le -1, \\ t & \text{if } -1 \le x \le 1 \text{ and } -1 \le x - at, \\ 2/a & \text{if } x \ge 1 \text{ and } x - at \le -1, \\ (1 - x + at)/a & \text{if } x \ge 1 \text{ and } -1 \le x - at \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

1.1.3. Solve the initial value problem for

$$u_t + \frac{1}{1 + \frac{1}{2}\cos x}\, u_x = 0.$$

Show that the solution is given by u(t, x) = u0(ξ), where ξ is the unique solution of

$$\xi + \tfrac{1}{2}\sin\xi = x + \tfrac{1}{2}\sin x - t.$$

1.1.4. Show that the initial value problem for (1.1.5) is equivalent to the family of initial value problems for the ordinary differential equations

$$\frac{d\tilde{u}}{d\tau} = f(\tau, \xi + a\tau, \tilde{u})$$

with ũ(0, ξ) = u0(ξ). Show that the solution of (1.1.5), u(t, x), is given by u(t, x) = ũ(t, x − at).

1.1.5. Use the results of Exercise 1.1.4 to show that the solution of the initial value problem for

$$u_t + u_x = -\sin^2 u$$

is given by

$$u(t, x) = \tan^{-1}\left(\frac{\tan[u_0(x - t)]}{1 + t\tan[u_0(x - t)]}\right).$$

An equivalent formula for the solution is

$$u(t, x) = \cot^{-1}\left(\cot[u_0(x - t)] + t\right).$$

1.1.6. Show that all solutions to

$$u_t + a u_x = 1 + u^2$$

become unbounded in finite time. That is, u(t, x) tends to infinity for some x as t approaches some value t*, where t* is finite.

1.1.7. Show that the initial value problem for the equation

$$u_t + (1 + x^2)\, u_x = 0$$

is not well defined. Hint: Consider the region covered by the characteristics originating on the x-axis.
1.1.8. Obtain the solution of the system

$$u_t + u_x + v_x = 0, \quad u(0, x) = u_0(x),$$
$$v_t + u_x - v_x = 0, \quad v(0, x) = v_0(x).$$

1.1.9. Show that the initial value problem for (1.1.9) is equivalent to the family of initial value problems for the system of ordinary differential equations

$$\frac{d\tilde{u}}{d\tau} = f(\tau, x(\tau), \tilde{u}), \quad \tilde{u}(0, \xi) = u_0(\xi),$$
$$\frac{dx}{d\tau} = a(\tau, x(\tau)), \quad x(0) = \xi.$$

The solution to (1.1.9) is given by u(t, x(ξ)) = ũ(t, ξ).

1.2 Boundary Conditions


We now consider hyperbolic partial differential equations on a finite interval rather than on
the whole real line. Most applications of partial differential equations involve domains with
boundaries, and it is important to specify data correctly at these locations. The conditions
relating the solution of the differential equation to data at a boundary are called boundary
conditions. A more complete discussion of the theory of boundary conditions for time-
dependent partial differential equations is given in Chapter 11. The problem of determining
a solution to a differential equation when both initial data and boundary data are present
is called an initial-boundary value problem. In this section we restrict the discussion to
initial-boundary value problems for hyperbolic equations in one space variable.
The discussion of initial-boundary value problems serves to illustrate again the im-
portance of the concept of characteristics. Consider the simple equation

ut + aux = 0 with 0 ≤ x ≤ 1, t ≥ 0. (1.2.1)

If a is positive the characteristics in this region propagate from the left to the right, as shown
in Figure 1.2. By examining the characteristics in Figure 1.2, we see that the solution must
be specified on the boundary at x equal to 0, in addition to the initial data, in order to be
defined for all time. Moreover, no data can be supplied at the other boundary or the solution
will be overdetermined.
If we specify initial data u(0, x) = u0 (x) and boundary data u(t, 0) = g(t), then
the solution is given by

$$u(t, x) = \begin{cases} u_0(x - at) & \text{if } x - at > 0, \\ g(t - a^{-1}x) & \text{if } x - at < 0. \end{cases}$$

Along the characteristic given by x − at = 0, there will be a jump discontinuity in u if u0(0) is not equal to g(0). If a is negative, the roles of the two boundaries are reversed.
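The following sketch (not from the text) evaluates this formula for a > 0; the particular u0 and g are illustrative assumptions:

import numpy as np

a = 2.0
u0 = lambda x: np.sin(np.pi * x)   # illustrative initial data
g = lambda t: t                    # illustrative boundary data at x = 0

def u(t, x):
    # initial data carried along characteristics when x - at > 0,
    # boundary data from x = 0 otherwise
    return np.where(x - a*t > 0, u0(x - a*t), g(t - x/a))

print(u(0.1, np.linspace(0.0, 1.0, 6)))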
Figure 1.2. Characteristics for equation (1.2.1).

Now consider the hyperbolic system

$$\begin{pmatrix} u^1 \\ u^2 \end{pmatrix}_t + \begin{pmatrix} a & b \\ b & a \end{pmatrix} \begin{pmatrix} u^1 \\ u^2 \end{pmatrix}_x = 0 \tag{1.2.2}$$

on the interval 0 ≤ x ≤ 1. The eigenvalues, or characteristic speeds, of the system are easily seen to be a + b and a − b. We consider only the cases where a and b are positive. If we have 0 < b < a, then both characteristic families propagate to the right, as shown in Figure 1.3. This means that the entire solution, both components u1 and u2, must be specified at x equal to 0, and no data should be specified at x equal to 1. Notice that the slope of the characteristic in these figures is the inverse of the speed. Thus the characteristics with the slower speed have the greater slope.
The most interesting case is where 0 < a < b, since then the characteristic families propagate in opposite directions (see the right-hand side in Figure 1.3). If system (1.2.2) is put into the form (1.1.6), it is

$$\begin{pmatrix} u^1 + u^2 \\ u^1 - u^2 \end{pmatrix}_t + \begin{pmatrix} a + b & 0 \\ 0 & a - b \end{pmatrix} \begin{pmatrix} u^1 + u^2 \\ u^1 - u^2 \end{pmatrix}_x = 0. \tag{1.2.3}$$

Certainly one way to determine the solution uniquely is to specify u1 + u2 at x equal to 0 and specify u1 − u2 at x equal to 1. However, there are other possible boundary conditions; for example, any of the form

$$u^1 + u^2 = \alpha_0(u^1 - u^2) + \beta_0(t) \quad \text{at } x = 0,$$
$$u^1 - u^2 = \alpha_1(u^1 + u^2) + \beta_1(t) \quad \text{at } x = 1, \tag{1.2.4}$$
will determine the solution. The coefficients α0 and α1 may be functions of t or constants.
As examples, we have that the boundary conditions

$$u^1(t, 0) = \beta_0(t), \qquad u^2(t, 1) = \beta_1(t)$$

Figure 1.3. Characteristics for system (1.2.3).

can be put in the form

$$u^1(t, 0) + u^2(t, 0) = -(u^1(t, 0) - u^2(t, 0)) + 2\beta_0(t),$$
$$u^1(t, 1) + u^2(t, 1) = u^1(t, 1) - u^2(t, 1) + 2\beta_1(t),$$

which are equivalent to the conditions in (1.2.4) with α0 and α1 equal to −1 and 1, respectively.
Boundary conditions that determine a unique solution are said to be well-posed. For
the system (1.2.2) the boundary conditions are well-posed if and only if they are equivalent
to (1.2.4). The boundary conditions (1.2.4) express the value of the characteristic variable
on the incoming characteristic in terms of the outgoing characteristic variable and the data.
By incoming characteristic we mean a characteristic that enters the domain at the boundary
under consideration; an outgoing characteristic is one that leaves the domain. We see then
that specifying u1 or u2 at x equal to 0 is well-posed, and specifying u1 or u2 at x
equal to 1 is also well-posed. However, specifying u1 − u2 at x equal to 0 is ill-posed,
as is specifying u1 + u2 at x equal to 1.
For a hyperbolic initial-boundary value problem to be well-posed, the number of
boundary conditions must be equal to the number of incoming characteristics. The pro-
cedure for determining whether or not an initial-boundary value problem is well-posed is
given in Chapter 11.

Example 1.2.1. To illustrate how the solution to a hyperbolic system is determined by both the initial and boundary conditions, we consider as an example the system

$$\begin{pmatrix} u^1 \\ u^2 \end{pmatrix}_t + \begin{pmatrix} \tfrac{1}{2} & \tfrac{3}{2} \\ \tfrac{3}{2} & \tfrac{1}{2} \end{pmatrix} \begin{pmatrix} u^1 \\ u^2 \end{pmatrix}_x = 0 \tag{1.2.5}$$

on the interval [0, 1] with the initial conditions

$$u^1(0, x) = 0 \quad \text{and} \quad u^2(0, x) = x.$$

The eigenvalues of the matrix in (1.2.5) are 2 and −1, so this system requires one boundary condition on each boundary. We take boundary conditions

$$u^1(t, 0) = t \quad \text{and} \quad u^1(t, 1) = 0.$$

The two families of characteristic curves are given by

$$x - 2t = \xi_1 \quad \text{and} \quad x + t = \xi_2,$$

where different values of ξ1 and ξ2 give the different characteristic curves. The characteristics are displayed in Figure 1.4.

The system (1.2.5) can be rewritten as

$$\begin{pmatrix} u^1 + u^2 \\ u^1 - u^2 \end{pmatrix}_t + \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} u^1 + u^2 \\ u^1 - u^2 \end{pmatrix}_x = 0 \tag{1.2.6}$$

and this shows that the characteristic variables w1 and w2 are

$$w^1 = u^1 + u^2 \quad \text{and} \quad w^2 = u^1 - u^2.$$

The inverse relations are

$$u^1 = \frac{w^1 + w^2}{2} \quad \text{and} \quad u^2 = \frac{w^1 - w^2}{2}.$$

The equations satisfied by w1 and w2 are

$$w^1_t + 2 w^1_x = 0 \quad \text{and} \quad w^2_t - w^2_x = 0.$$

The initial conditions for w1 and w2 are

$$w^1(0, x) = x \quad \text{and} \quad w^2(0, x) = -x.$$

In the characteristic variables the boundary conditions are

$$w^1(t, 0) = -w^2(t, 0) + 2t \quad \text{and} \quad w^2(t, 1) = -w^1(t, 1). \tag{1.2.7}$$

Figure 1.4. Characteristics for Example 1.2.1.



We now use this data to determine the solution in the interior. In region 1 of Figure 1.4, the solution is determined by the initial conditions. Thus, using the characteristics and the initial data we obtain

$$w^1(t, x) = w^1(0, x - 2t) = x - 2t,$$
$$w^2(t, x) = w^2(0, x + t) = -(x + t) = -x - t.$$

Using the inverse relations, we have

$$u^1(t, x) = \frac{w^1(t, x) + w^2(t, x)}{2} = -\frac{3}{2}t,$$
$$u^2(t, x) = \frac{w^1(t, x) - w^2(t, x)}{2} = x - \frac{1}{2}t.$$

In region 2, the values of w1 are determined since the characteristics for w1 enter from region 1. Thus, the formula for w1 is the same for regions 1 and 2:

$$w^1(t, x) = x - 2t.$$

The values of w2 in region 2 are determined by the values from the characteristics emanating from the boundary at x = 1. The boundary condition there is (from (1.2.7))

$$w^2(t, 1) = -w^1(t, 1) = -(1 - 2t) = -1 + 2t,$$

and extending to the interior we have

$$w^2(t, x) = w^2(x + t - 1, 1) = -1 + 2(x + t - 1) = -3 + 2x + 2t.$$

Thus in region 2

$$u^1(t, x) = \frac{(x - 2t) + (-3 + 2x + 2t)}{2} = -\frac{3}{2} + \frac{3}{2}x,$$
$$u^2(t, x) = \frac{(x - 2t) - (-3 + 2x + 2t)}{2} = \frac{3}{2} - \frac{1}{2}x - 2t.$$

Notice that both u1 and u2 are continuous along the line x + t = 1 between regions 1 and 2.

In region 3, the values of w2 are the same as in region 1:

$$w^2(t, x) = -x - t.$$

The boundary condition at x = 0 from (1.2.7) is w^1(t, 0) = −w^2(t, 0) + 2t. Thus at x = 0,

$$w^1(t, 0) = -w^2(t, 0) + 2t = 3t.$$

Extending this into the interior along the characteristics gives

$$w^1(t, x) = w^1\left(t - \tfrac{1}{2}x,\, 0\right) = 3\left(t - \tfrac{1}{2}x\right) = -\tfrac{3}{2}x + 3t.$$

Thus, from the inverse equations, in region 3

$$u^1(t, x) = \frac{-\tfrac{3}{2}x + 3t + (-x - t)}{2} = -\tfrac{5}{4}x + t,$$
$$u^2(t, x) = \frac{-\tfrac{3}{2}x + 3t - (-x - t)}{2} = -\tfrac{1}{4}x + 2t.$$

In region 4, the values of w1 are determined by the characteristics from region 3, and the values of w2 are determined by the characteristics from region 2. Thus

$$w^1(t, x) = -\tfrac{3}{2}x + 3t, \qquad w^2(t, x) = -3 + 2x + 2t,$$

and so

$$u^1(t, x) = -\tfrac{3}{2} + \tfrac{1}{4}x + \tfrac{5}{2}t,$$
$$u^2(t, x) = \tfrac{3}{2} - \tfrac{7}{4}x + \tfrac{1}{2}t.$$

Similar analysis can determine the solution in all the regions for all t.
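The same bookkeeping can be automated. The following sketch (not from the text) evaluates the solution of Example 1.2.1 at any point by tracing each characteristic variable back to its data, applying the boundary conditions (1.2.7) at each reflection:

def w1(t, x):
    # w1 travels with speed 2; it comes from the initial data if
    # x - 2t >= 0 and from the boundary x = 0 otherwise.
    if x - 2*t >= 0:
        return x - 2*t                 # w1(0, x) = x
    s = t - x/2                        # time the characteristic left x = 0
    return -w2(s, 0.0) + 2*s           # boundary condition at x = 0

def w2(t, x):
    # w2 travels with speed -1; it comes from the initial data if
    # x + t <= 1 and from the boundary x = 1 otherwise.
    if x + t <= 1:
        return -(x + t)                # w2(0, x) = -x
    s = x + t - 1                      # time the characteristic left x = 1
    return -w1(s, 1.0)                 # boundary condition at x = 1

def u(t, x):
    return (w1(t, x) + w2(t, x))/2, (w1(t, x) - w2(t, x))/2

print(u(0.1, 0.3))    # region 1: (-3t/2, x - t/2) = (-0.15, 0.25)
print(u(0.25, 0.75))  # on x + t = 1: regions 1 and 2 agree, (-0.375, 0.625)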

Periodic Problems
Besides the initial value problem on the whole real line R, we can also consider periodic
problems on an interval. For example, consider the one-way wave equation (1.1.1) on the
interval [0, 1], where the solution satisfies
u(t, 0) = u(t, 1) (1.2.8)
for all nonnegative values of t. Condition (1.2.8) is sometimes called the periodic boundary
condition, but strictly speaking it is not a boundary condition, since for periodic problems
there are no boundaries.
A periodic problem for a function u(t, x) with x in the interval [0, 1] is equivalent to one on the real line satisfying u(t, x) = u(t, x + ℓ) for every integer ℓ. Thus, the function u(t, x) is determined by its values of x in any interval of length 1, such as [−1/2, 1/2].
A periodic problem may also be regarded as being defined on a circle that is coordi-
natized by an interval with endpoints being identified. In this view, there is a boundary in
the coordinate system but not in the problem itself.

Exercises
1.2.1. Consider system (1.2.2) on the interval [0, 1], with a equal to 0 and b equal to
1 and with the boundary conditions u1 equal to 0 at the left and u1 equal to 1
at the right boundary. Show that if the initial data are given by u1 (0, x) = x and
u2 (0, x) = 1, then the solution is u1 (t, x) = x and u2 (t, x) = 1 − t for all (t, x)
with 0 ≤ x ≤ 1 and 0 ≤ t.

1.2.2. Consider system (1.2.2) on the interval [0, 1], with a equal to 0 and b equal to 1 and with the boundary conditions u1 equal to 0 at the left and u1 equal to 1 + t at the right boundary. Show that if the initial data are given by u1(0, x) = x and u2(0, x) = 1, then for 0 ≤ x + t ≤ 3 the solution is given by

$$\begin{pmatrix} u^1(t, x) \\ u^2(t, x) \end{pmatrix} = \begin{cases} \begin{pmatrix} x \\ 1 - t \end{pmatrix} & \text{if } 0 \le t < 1 - x, \\[6pt] \begin{pmatrix} 2x + t - 1 \\ 2 - x - 2t \end{pmatrix} & \text{if } 1 - x \le t < 1 + x, \\[6pt] \begin{pmatrix} 3x \\ 3(1 - t) \end{pmatrix} & \text{if } 1 + x \le t < 3 - x. \end{cases}$$

1.2.3. Consider system (1.2.2) on the interval [0, 1], with a equal to 0 and b equal to 1 and with the boundary conditions u1 equal to 0 at both the left and the right boundaries. Show that if the initial data are given by u1(0, x) = x and u2(0, x) = 1, then for 0 ≤ t ≤ 1 the solution is given by

$$\begin{pmatrix} u^1(t, x) \\ u^2(t, x) \end{pmatrix} = \begin{cases} \begin{pmatrix} x \\ 1 - t \end{pmatrix} & \text{if } 0 \le x < 1 - t, \\[6pt] \begin{pmatrix} x - 1 \\ 2 - t \end{pmatrix} & \text{if } 1 - t \le x < 1. \end{cases}$$

1.2.4. Show that the initial-boundary value problem of Exercise 1.2.3 has the solution for 1 ≤ t ≤ 2 given by

$$\begin{pmatrix} u^1(t, x) \\ u^2(t, x) \end{pmatrix} = \begin{cases} \begin{pmatrix} x \\ 3 - t \end{pmatrix} & \text{if } 0 \le x < t - 1, \\[6pt] \begin{pmatrix} x - 1 \\ 2 - t \end{pmatrix} & \text{if } t - 1 < x < 1. \end{cases}$$

1.2.5. Consider system (1.2.2) on the interval [0, 1], with a equal to 1 and b equal to 2 and with the boundary conditions u1 equal to 0 at the left and u1 equal to 1 at the right boundary. Show that if the initial data are given by u1(0, x) = x and u2(0, x) = 1, then for 0 ≤ t ≤ 1 + (1/3)x the solution is given by

$$\begin{pmatrix} u^1(t, x) \\ u^2(t, x) \end{pmatrix} = \begin{cases} \begin{pmatrix} x - t \\ 1 - 2t \end{pmatrix} & \text{if } 0 \le t \le \min(\tfrac{1}{3}x,\, 1 - x), \\[6pt] \begin{pmatrix} \tfrac{2}{3}x \\ 1 - \tfrac{1}{3}x - t \end{pmatrix} & \text{if } \tfrac{1}{3}x \le t \le 1 - x, \\[6pt] \begin{pmatrix} 2x - 1 \\ 2 - x - 3t \end{pmatrix} & \text{if } 1 - x \le t \le \tfrac{1}{3}x, \\[6pt] \begin{pmatrix} t + \tfrac{5}{3}x - 1 \\ 2 - 2t - \tfrac{4}{3}x \end{pmatrix} & \text{if } \max(\tfrac{1}{3}x,\, 1 - x) \le t \le \min(\tfrac{4}{3} - x,\, 1 + \tfrac{1}{3}x), \\[6pt] \begin{pmatrix} \tfrac{2}{3}x + \tfrac{1}{3} \\ \tfrac{2}{3} - \tfrac{1}{3}x - t \end{pmatrix} & \text{if } \tfrac{4}{3} - x \le t \le 1 + \tfrac{1}{3}x. \end{cases}$$

1.3 Introduction to Finite Difference Schemes


We begin our discussion of finite difference schemes by defining a grid of points in the (t, x) plane. Let h and k be positive numbers; then the grid will be the points (t_n, x_m) = (nk, mh) for arbitrary integers n and m, as displayed in Figure 1.5. For a function v defined on the grid we write $v^n_m$ for the value of v at the grid point (t_n, x_m). We also use the notation $u^n_m$ for u(t_n, x_m) when u is defined for continuously varying (t, x). The set of points (t_n, x_m) for a fixed value of n is called grid level n. We are interested in grids with small values of h and k. In many texts the quantities that we call h and k are represented by Δx and Δt, respectively.

Figure 1.5. The finite difference grid.



The basic idea of finite difference schemes is to replace derivatives by finite differences. This can be done in many ways; as two examples we have

$$\frac{\partial u}{\partial t}(t_n, x_m) \approx \frac{u(t_n + k, x_m) - u(t_n, x_m)}{k} \quad \text{or} \quad \frac{\partial u}{\partial t}(t_n, x_m) \approx \frac{u(t_n + k, x_m) - u(t_n - k, x_m)}{2k}.$$

That these are valid approximations is seen from the formulas

$$\frac{\partial u}{\partial t}(t, x) = \lim_{\varepsilon \to 0} \frac{u(t + \varepsilon, x) - u(t, x)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{u(t + \varepsilon, x) - u(t - \varepsilon, x)}{2\varepsilon},$$

relating the derivative to the values of u. Similar formulas approximate derivatives with respect to x.
Using these approximations we obtain the following five finite difference schemes for equation (1.1.1). Many other schemes are presented later.

$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+1}^n - v_m^n}{h} = 0, \tag{1.3.1}$$

$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_m^n - v_{m-1}^n}{h} = 0, \tag{1.3.2}$$

$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+1}^n - v_{m-1}^n}{2h} = 0, \tag{1.3.3}$$

$$\frac{v_m^{n+1} - v_m^{n-1}}{2k} + a\,\frac{v_{m+1}^n - v_{m-1}^n}{2h} = 0, \tag{1.3.4}$$

$$\frac{v_m^{n+1} - \tfrac{1}{2}\left(v_{m+1}^n + v_{m-1}^n\right)}{k} + a\,\frac{v_{m+1}^n - v_{m-1}^n}{2h} = 0. \tag{1.3.5}$$
We refer to scheme (1.3.1) as the forward-time forward-space scheme because forward
difference approximations are used for both the time and space derivatives. Similarly,
(1.3.2) and (1.3.3) are referred to as the forward-time backward-space scheme and forward-
time central-space scheme, respectively. The scheme (1.3.4) is called the leapfrog scheme
and (1.3.5) is called the Lax–Friedrichs scheme.
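Written as update formulas, the five schemes are one line each. The following sketch (not from the text) states them with NumPy on a periodic grid, where np.roll(v, -1) and np.roll(v, 1) play the roles of v at m+1 and m−1, and lam denotes λ = k/h:

import numpy as np

def ftfs(v, a, lam):            # (1.3.1) forward-time forward-space
    return v - a*lam*(np.roll(v, -1) - v)

def ftbs(v, a, lam):            # (1.3.2) forward-time backward-space
    return v - a*lam*(v - np.roll(v, 1))

def ftcs(v, a, lam):            # (1.3.3) forward-time central-space
    return v - 0.5*a*lam*(np.roll(v, -1) - np.roll(v, 1))

def leapfrog(vold, v, a, lam):  # (1.3.4); needs two time levels
    return vold - a*lam*(np.roll(v, -1) - np.roll(v, 1))

def lax_friedrichs(v, a, lam):  # (1.3.5)
    return (0.5*(np.roll(v, -1) + np.roll(v, 1))
            - 0.5*a*lam*(np.roll(v, -1) - np.roll(v, 1)))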
The method of deriving these five schemes is very simple. This is one of the sig-
nificant features of the general method of finite differences, namely, that it is very easy to
derive finite difference schemes for partial differential equations. However, the analysis of

finite difference schemes to determine if they are useful approximations to the differential
equation requires some powerful mathematical tools. Moreover, to develop very efficient
and accurate schemes requires more work than went into obtaining the schemes (1.3.1)–
(1.3.5). Nonetheless, the finite difference method is notable for the great variety of schemes
that can be used to approximate a given partial differential equation.
Given this short list of schemes, we are naturally led to the question of which of them
are useful and which are not, as indeed some are not. This is a basic question, and we spend
some time and care in answering it. In fact, the question can be answered on several levels.
We first answer it on the most primitive level, determining which schemes have solutions
that approximate solutions of the differential equation at all. Later, we determine which
schemes are more accurate than others and also investigate the efficiency of the various
schemes.
Each of the schemes (1.3.1)–(1.3.5) can be written expressing $v^{n+1}_m$ as a linear combination of values of v at levels n and n − 1. For example, scheme (1.3.1) can be written as

$$v^{n+1}_m = (1 + a\lambda)\, v^n_m - a\lambda\, v^n_{m+1},$$

where λ = k/h. The quantity λ will appear often in the study of schemes for hyperbolic equations and will always be equal to k/h. Those schemes that involve v at only two levels, e.g., n + 1 and n, are called one-step schemes. Of the schemes just listed all except the leapfrog scheme (1.3.4) are one-step schemes. Given the initial data $v^0_m$, a one-step scheme can be used to evaluate $v^n_m$ for all positive values of n.
The leapfrog scheme (1.3.4) is an example of a multistep scheme. For a multistep scheme it is not sufficient to specify the values of $v^0_m$ in order to determine $v^n_m$ for all positive values of n. To specify completely the means of computing a solution to a multistep scheme, either we must specify v on enough time levels so that the scheme can be employed or we must specify a procedure for computing the values of v on these initial time levels. For example, to use the leapfrog scheme we could specify the values of $v^0_m$ and $v^1_m$ for all m, or we could specify that scheme (1.3.1) would be used to compute the values of $v^1_m$ from the values $v^0_m$. In either case the leapfrog scheme (1.3.4) would be used to compute $v^n_m$ for n greater than 1.
When we refer to the leapfrog scheme we do not always distinguish between these
two ways of initializing the computation. As we show in Section 4.1, many of the properties
of the leapfrog scheme are independent of the method used to initialize the solution. Since
the usual practice is to use a one-step scheme to initialize the first time level, we usually
assume that the initialization is done in this way. This is illustrated in Example 1.3.2. The
subject of how to initialize multistep schemes in general is considered in more detail in
Section 4.1.
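The following sketch (not from the text) shows this initialization: one step of the forward-time central-space scheme (1.3.3) supplies the first time level, after which the leapfrog scheme takes over. The boundaries are treated as periodic (via np.roll) purely for brevity:

import numpy as np

a, lam = 1.0, 0.8
x = np.linspace(-1.0, 3.0, 41)
vold = np.maximum(0.0, 1.0 - np.abs(x))     # v^0 from illustrative data
# one step of (1.3.3) gives v^1:
v = vold - 0.5*a*lam*(np.roll(vold, -1) - np.roll(vold, 1))
for n in range(2, 21):                      # (1.3.4) for n >= 2
    vold, v = v, vold - a*lam*(np.roll(v, -1) - np.roll(v, 1))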

Example 1.3.1. Before we proceed with the analysis of finite difference schemes, we
present the results of some computations using two of the schemes just presented. We use
the initial-boundary value problem

ut + ux = 0 on −2 ≤ x ≤ 3, 0 ≤ t

with initial data

$$u_0(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{if } |x| \ge 1. \end{cases}$$

On the boundary at x equal to −2, we specify that u is zero.


The first computation uses the Lax–Friedrichs scheme (1.3.5) with λ = 0.8 and h equal to 0.1. At the right-hand boundary we use the condition $v^{n+1}_M = v^{n+1}_{M-1}$, where x_M = 3. For our initial data we take $v^0_m = u_0(x_m)$. The computation proceeds using the formula

$$v^{n+1}_m = \tfrac{1}{2}\left(v^n_{m+1} + v^n_{m-1}\right) - \tfrac{1}{2}\lambda\left(v^n_{m+1} - v^n_{m-1}\right)$$

to find the values of $v^{n+1}_m$ for all values except those at the endpoints of the interval. A

graph of the solution at t = 1.6 is shown in Figure 1.6. In the figure the exact solution to
the differential equation is given by the solid line and the solution of the scheme is shown
as the curve with the circles. The figure shows that the finite difference scheme computes a
reasonable solution, except that the computed solution does not maintain the sharp corners
of the exact solution. A smaller value of h, with the same value of λ, improves the shape
of the computed solution.

Figure 1.6. A solution of the Lax–Friedrichs scheme, λ = 0.8.

A similar calculation but using λ = 1.6 is shown in Figure 1.7 at t = 0.8. The figure
shows that for this case the computed solution is not well behaved. As the computation
proceeds for larger values of t, the behavior becomes worse. Also, if the grid spacing is
decreased, with λ fixed at 1.6, the behavior does not get better and in fact becomes worse.
The explanation for this behavior is given in the next chapter.

Figure 1.7. A solution of the Lax–Friedrichs scheme, λ = 1.6.

Figure 1.8. A solution computed with leapfrog scheme, λ = 0.8.

Example 1.3.2. The leapfrog scheme (1.3.4) with λ = 0.8 gives much better results than
does the Lax–Friedrichs scheme for the same initial-boundary value problem in Example
1.3.1. The computational results are displayed in Figure 1.8. Notice that the resolution of
the peak in the solution is much better in Figure 1.8 than in Figure 1.6. The leapfrog scheme
has a less smooth solution than does the Lax–Friedrichs; however the small oscillations do
not detract significantly from the accuracy. In Section 5.1 we discuss methods of removing these oscillations. At the right-hand boundary, $v^{n+1}_M$ is computed as it was for the Lax–Friedrichs scheme.

As discussed before, the leapfrog scheme requires that another scheme be used to
calculate the values at the time level with n equal to 1. For the calculations shown in
Figure 1.8, the forward-time central-space scheme (1.3.3) was used.

Computer Implementation of Finite Difference Schemes


To implement any of the finite difference schemes (1.3.1)–(1.3.5) or similar finite difference schemes in a computer program, values of the solution $v^n_m$ should not be stored beyond the time steps in which they are needed. A simple way to do this is to use two one-dimensional arrays vold and vnew, each of which is indexed by the spatial grid indices. The values of vnew(m) and vold(m) correspond to $v^{n+1}_m$ and $v^n_m$, respectively. For each value of n, vnew, corresponding to $v^{n+1}$, is computed using vold, corresponding to $v^n$. After vnew has been computed for all m, then vold must be reset to vnew, and the time step is incremented to the next value. For the leapfrog scheme the array vnew can be used to store both $v^{n-1}$ and $v^{n+1}$.
Any values of the solution that are to be saved or plotted may be written to a file as
they are computed. It is not advisable to save past values beyond the time they are needed
in the computation.
A more convenient way to store the solution for schemes (1.3.1)–(1.3.5) is to use a
two-dimensional array, such as v(nmod,m), where nmod is equal to n modulo 2. The
values of v(0, · ) are used to compute the values of v(1, · ), which are used to compute
v(0, · ), and so on. This method avoids the need to reset arrays such as vold, which was
set equal to vnew in the method described previously.
Here is a sample of pseudocode for the Lax–Friedrichs scheme.

now = 0
new = 1
time = 0
n_time = 0
loop on m from 0 to M            ! Set initial data.
    v(now,m) = u0(x(m))
end of loop on m
loop for time < TIME_MAX
    time = time + k              ! This is the time being computed.
    n_time = n_time + 1
    v(new,0) = beta(time)        ! Set the boundary value.
    loop on m from 1 to M-1
        v(new,m) = (v(now,m-1) + v(now,m+1))/2
                   - a*lambda*(v(now,m+1) - v(now,m-1))/2
    end of loop on m
    v(new,M) = v(new,M-1)        ! Apply boundary condition.
    now = new                    ! Reset for the next time step.
    new = mod(n_time + 1, 2)     ! Alternate between the two levels.
end of loop on time

For periodic problems on the interval [0, 1] with h = 1/M and grid points xm =
mh, it is useful to store values at x0 and at xM , even though these values represent the
same point in the periodic problem.
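For comparison, the following sketch (not from the text) is a NumPy version of the same computation, run on the problem of Example 1.3.1 and checked against the exact shifted solution:

import numpy as np

a, lam, h = 1.0, 0.8, 0.1
k = lam*h
x = np.arange(-2.0, 3.0 + h/2, h)
v = np.maximum(0.0, 1.0 - np.abs(x))     # v^0 = u0(x)

t = 0.0
while t < 1.6 - k/2:
    t += k
    vnew = np.empty_like(v)
    vnew[1:-1] = (0.5*(v[2:] + v[:-2])
                  - 0.5*a*lam*(v[2:] - v[:-2]))
    vnew[0] = 0.0                        # boundary value at x = -2
    vnew[-1] = vnew[-2]                  # outflow condition at x = 3
    v = vnew

exact = np.maximum(0.0, 1.0 - np.abs(x - a*t))
print(np.abs(v - exact).max())           # the error at t = 1.6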

Exercises
1.3.1. For values of x in the interval [−1, 3] and t in [0, 2.4], solve the one-way wave
equation
ut + ux = 0,
with the initial data

$$u(0, x) = \begin{cases} \cos^2 \pi x & \text{if } |x| \le \tfrac{1}{2}, \\ 0 & \text{otherwise,} \end{cases}$$

and the boundary data u(t, −1) = 0.


Use the following four schemes for h = 1/10, 1/20, and 1/40.
(a) Forward-time backward-space scheme (1.3.2) with λ = 0.8.
(b) Forward-time central-space scheme (1.3.3) with λ = 0.8.
(c) Lax–Friedrichs scheme (1.3.5) with λ = 0.8 and 1.6.
(d) Leapfrog scheme (1.3.4) with λ = 0.8.
For schemes (b), (c), and (d), at the right boundary use the condition $v^{n+1}_M = v^{n+1}_{M-1}$, where x_M = 3. For scheme (d) use scheme (b) to compute the solution at n = 1.
For each scheme determine whether the scheme is a useful or useless scheme. For the purposes of this exercise only, a scheme will be useless if $|v^n_m|$ is greater than 5 for any value of m and n. It will be regarded as a useful scheme if the solution
looks like a reasonable approximation to the solution of the differential equations.
Graph or plot several solutions at the last time they were computed. What do you
notice about the “blow-up time” for the useless schemes as the mesh size decreases?
Is there a pattern to these solutions? For the useful cases, how does the error decrease
as the mesh decreases; i.e., as h decreases by one-half, by how much does the error
decrease?
1.3.2. Solve the system

$$u_t + \tfrac{1}{3}(t - 2)u_x + \tfrac{2}{3}(t + 1)w_x + \tfrac{1}{3}u = 0,$$
$$w_t + \tfrac{1}{3}(t + 1)u_x + \tfrac{1}{3}(2t - 1)w_x - \tfrac{1}{3}w = 0$$

by the Lax–Friedrichs scheme; i.e., each time derivative is approximated as it is for the scalar equation and the spatial derivatives are approximated by central differences. The initial values are

$$u(0, x) = \max(0, 1 - |x|), \qquad w(0, x) = \max(0, 1 - 2|x|).$$
Consider values of x in [−3, 3] and t in [0, 2]. Take h equal to 1/20 and λ equal
to 1/2. At each boundary set u = 0, and set w equal to the newly computed value
one grid point in from the boundary. Describe the solution behavior for t in the
range [1.5, 2]. You may find it convenient to plot the solution. Solve the system in
the form given; do not attempt to diagonalize it.
1.3.3. Solve the system
$$u_t + \tfrac{1}{3}(t-2)u_x + \tfrac{2}{3}(t+1)w_x = 0,$$
$$w_t + \tfrac{1}{3}(t+1)u_x + \tfrac{1}{3}(2t-1)w_x = 0$$
by the Lax–Friedrichs scheme as in Exercise 1.3.2, using the same initial data. An
examination of the computed solution should show how to obtain the analytical
solution to this problem.
1.3.4. Numerically solve the equation in Exercise 1.1.5 using the initial data and intervals
of Exercise 1.3.1. Use the leapfrog scheme with λ = 0.5 and h = 1/10, 1/20, and
1/40. Use the forward-time central-space scheme to compute the first time step.
The boundary condition at x = −1 is u(t, −1) = 0.
1.4 Convergence and Consistency
The most basic property that a scheme must have in order to be useful is that its solutions
approximate the solution of the corresponding partial differential equation and that the
approximation improves as the grid spacings, h and k, tend to zero. We call such a
scheme a convergent scheme, but before formally defining this concept it is appropriate to
extend our discussion to a wider class of partial differential equations than the hyperbolic
equations. We consider linear partial differential equations of the form
$$P(\partial_t, \partial_x)u = f(t, x),$$
which are of first order in the derivative with respect to t. We also assume for such equations
or systems of equations that the specification of initial data, $u(0, x)$, completely determines
a unique solution. More is said about this in Chapter 9. The real variable x ranges over
the whole real line or an interval. Examples of equations that are first order in time are the
one-way wave equation (1.1.1) and the following three equations:
$$u_t - bu_{xx} + au_x = 0,$$
$$u_t - cu_{txx} + bu_{xxxx} = 0, \qquad (1.4.1)$$
$$u_t + cu_{tx} + au_x = 0.$$
Definition 1.4.1. A one-step finite difference scheme approximating a partial differential
equation is a convergent scheme if for any solution to the partial differential equation,
$u(t, x)$, and solutions to the finite difference scheme, $v_m^n$, such that $v_m^0$ converges to
$u_0(x)$ as $mh$ converges to x, then $v_m^n$ converges to $u(t, x)$ as $(nk, mh)$ converges to
$(t, x)$ as h, k converge to 0.

This definition is not complete until we clarify the nature of the convergence of $v_m^n$,
defined on the grid, to $u(t, x)$ defined for continuously varying $(t, x)$. We discuss this
convergence completely in Chapter 10. For multistep schemes the definition assumes that
some initializing procedure is used to compute the first several time levels necessary to
employ the multistep scheme. For the case that the data are specified on these first time
levels, the definition is altered to require $v_m^j$ for $0 \le j \le J$ to converge to $u_0(x_m)$.
As illustrated by Figures 1.6 and 1.8, the Lax–Friedrichs scheme and the leapfrog
scheme with λ equal to 0.8 are convergent schemes. These figures show that the solution
of the difference scheme is a reasonable approximation to the solution of the differential
equation. As h and k are decreased, the solutions of the schemes become better ap-
proximations. The Lax–Friedrichs scheme with λ = 1.6 is not convergent. As h and k
decrease, with λ equal to 1.6, the solution of the scheme does not approach the solution
of the differential equation in any sense. As can be seen in Figure 1.7, the behavior of a
nonconvergent scheme can be quite poor.
The convergence of the Lax–Friedrichs scheme is also illustrated in Figure 1.9, which
shows a portion of Figure 1.6 along with the results for h = 1/20 and h = 1/40. The three
plots show that as h gets smaller, with λ = 0.8, the solution of the finite difference scheme
approaches the solution of the differential equation.
Proving that a given scheme is convergent is not easy in general, if attempted in a
direct manner. However, there are two related concepts that are easy to check: consistency
and stability. First, we define consistency.
Figure 1.9. Lax–Friedrichs scheme convergence.
Definition 1.4.2. Given a partial differential equation, $Pu = f$, and a finite difference
scheme, $P_{k,h}v = f$, we say that the finite difference scheme is consistent with the partial
differential equation if for any smooth function $\phi(t, x)$
$$P\phi - P_{k,h}\phi \to 0 \quad \text{as } k, h \to 0,$$
the convergence being pointwise convergence at each point $(t, x)$.
For some schemes we may have to restrict the manner in which k and h tend to zero
in order for it to be consistent (see Example 1.4.2). When we refer to a smooth function we
mean one that is sufficiently differentiable for the context.
Also, note that the difference operator $P_{k,h}$ when applied to a function of $(t, x)$ does
not need to be restricted to grid points. Thus, a forward difference in x applied at a point
$(t, x)$ is
$$\frac{\phi(t, x+h) - \phi(t, x)}{h}.$$
We demonstrate the use of this definition and the notation by presenting two examples,
showing that two of the schemes in the above list are consistent with the equation (1.1.1).
Example 1.4.1. The Forward-Time Forward-Space Scheme. For the one-way wave
equation (1.1.1), the operator P is $\frac{\partial}{\partial t} + a\frac{\partial}{\partial x}$, so that
$$P\phi = \phi_t + a\phi_x.$$
For the forward-time forward-space scheme (1.3.1), the difference operator $P_{k,h}$ is given by
$$P_{k,h}\phi = \frac{\phi_m^{n+1} - \phi_m^n}{k} + a\,\frac{\phi_{m+1}^n - \phi_m^n}{h},$$
where
$$\phi_m^n = \phi(nk, mh).$$
We begin with the Taylor series of the function $\phi$ in t and x about $(t_n, x_m)$. We have
that
$$\phi_m^{n+1} = \phi_m^n + k\phi_t + \tfrac{1}{2}k^2\phi_{tt} + O(k^3),$$
$$\phi_{m+1}^n = \phi_m^n + h\phi_x + \tfrac{1}{2}h^2\phi_{xx} + O(h^3),$$
where the derivatives on the right-hand side are all evaluated at $(t_n, x_m)$, and so
$$P_{k,h}\phi = \phi_t + a\phi_x + \tfrac{1}{2}k\phi_{tt} + \tfrac{1}{2}ah\phi_{xx} + O(k^2) + O(h^2).$$
Thus
$$P\phi - P_{k,h}\phi = -\tfrac{1}{2}k\phi_{tt} - \tfrac{1}{2}ah\phi_{xx} + O(k^2) + O(h^2) \to 0 \quad \text{as } (k, h) \to 0.$$
Therefore, this scheme is consistent.
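The Taylor expansions in this example can be checked by symbolic computation. The following sketch uses SymPy to expand $P_{k,h}\phi - P\phi$ for the forward-time forward-space scheme; the symbol names are local to the snippet.

import sympy as sp

t, x, k, h, a = sp.symbols('t x k h a')
phi = sp.Function('phi')

# P_{k,h} phi: the forward-time forward-space difference operator at (t, x)
P_kh = (phi(t + k, x) - phi(t, x)) / k + a * (phi(t, x + h) - phi(t, x)) / h
# P phi = phi_t + a * phi_x
P = sp.diff(phi(t, x), t) + a * sp.diff(phi(t, x), x)

# Expand the truncation error in powers of k and then of h
err = (P_kh - P).series(k, 0, 2).removeO().series(h, 0, 2).removeO()
print(sp.simplify(sp.expand(err)))
# The leading terms are k*phi_tt/2 + a*h*phi_xx/2, which vanish as k, h -> 0,
# confirming consistency.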
When analyzing consistency it is convenient to use the "big oh" and "little oh" notation,
as we have done in the preceding example. In general, if F and G are functions of
some parameter $\alpha$, we write
$$F = O(G) \quad \text{as } \alpha \to 0,$$
if
$$\left|\frac{F}{G}\right| \le K$$
for some constant K and all $\alpha$ sufficiently small. We write
$$F = o(G) \quad \text{as } \alpha \to 0,$$
if $F/G$ converges to zero as $\alpha$ tends to zero. In particular, a quantity is $O(h^r)$ if it is
bounded by a constant multiple of $h^r$ for small h. A quantity is $o(1)$ if it converges to
zero at an unspecified rate.
Example 1.4.2. The Lax–Friedrichs Scheme. For the Lax–Friedrichs scheme the difference
operator is given by
$$P_{k,h}\phi = \frac{\phi_m^{n+1} - \tfrac{1}{2}\left(\phi_{m+1}^n + \phi_{m-1}^n\right)}{k} + a\,\frac{\phi_{m+1}^n - \phi_{m-1}^n}{2h}.$$
We use the Taylor series
$$\phi_{m\pm1}^n = \phi_m^n \pm h\phi_x + \tfrac{1}{2}h^2\phi_{xx} \pm \tfrac{1}{6}h^3\phi_{xxx} + O(h^4),$$
where, as before, the derivatives are evaluated at $(t_n, x_m)$, and we have
$$\tfrac{1}{2}\left(\phi_{m+1}^n + \phi_{m-1}^n\right) = \phi_m^n + \tfrac{1}{2}h^2\phi_{xx} + O(h^4)$$
and
$$\frac{\phi_{m+1}^n - \phi_{m-1}^n}{2h} = \phi_x + \tfrac{1}{6}h^2\phi_{xxx} + O(h^4).$$
Substituting these expressions in the scheme, we obtain
$$P_{k,h}\phi = \phi_t + a\phi_x + \tfrac{1}{2}k\phi_{tt} - \tfrac{1}{2}k^{-1}h^2\phi_{xx} + \tfrac{1}{6}ah^2\phi_{xxx} + O\left(h^4 + k^{-1}h^4 + k^2\right).$$
So $P_{k,h}\phi - P\phi \to 0$ as $h, k \to 0$; i.e., it is consistent, as long as $k^{-1}h^2$ also tends
to 0.

Consistency implies that the solution of the partial differential equation, if it is smooth,
is an approximate solution of the finite difference scheme. Similarly, convergence means
that a solution of the finite difference scheme approximates a solution of the partial differ-
ential equation. It is natural to consider whether consistency is sufficient for a scheme to
be convergent. Consistency is certainly necessary for convergence, but as the following
example shows, a scheme may be consistent but not convergent.
Example 1.4.3. Consider the partial differential equation $u_t + u_x = 0$ with the forward-time
forward-space scheme (1.3.1):
$$\frac{v_m^{n+1} - v_m^n}{k} + \frac{v_{m+1}^n - v_m^n}{h} = 0.$$
The scheme may be rewritten as
$$v_m^{n+1} = v_m^n - \frac{k}{h}\left(v_{m+1}^n - v_m^n\right) = (1 + \lambda)v_m^n - \lambda v_{m+1}^n, \qquad (1.4.2)$$
where we have set $\lambda = k/h$ as usual. In Example 1.4.1 this scheme was shown to be
consistent. As initial conditions for the differential equation we take
$$u_0(x) = \begin{cases} 1 & \text{if } -1 \le x \le 0, \\ 0 & \text{elsewhere.} \end{cases}$$
The solution of the partial differential equation is a shift of $u_0$ to the right by t. In particular,
for t greater than 0, there are positive values of x for which $u(t, x)$ is nonzero. This is
illustrated in Figure 1.10.

Figure 1.10. Consistency does not imply convergence.

For the difference scheme take the initial data
$$v_m^0 = \begin{cases} 1 & \text{if } -1 \le mh \le 0, \\ 0 & \text{elsewhere.} \end{cases}$$
As equation (1.4.2) shows, the solution of the difference scheme at $(t_n, x_m)$ depends only
on $x_{m'}$ for $m' \ge m$ at previous times. Thus we conclude that $v_m^n$ is always 0 for points
$x_m$ to the right of 0, that is,
$$v_m^n = 0 \quad \text{for } m > 0, \; n \ge 0.$$
Therefore, $v_m^n$ cannot converge to $u(t, x)$, since for positive t and x, the function u is
not identically zero, yet $v_m^n$ is zero.
Notice that we conclude that the scheme is nonconvergent without specifying the
type of convergence, but clearly, a sequence of functions that are all zero—i.e., the $v_m^n$ for
$m > 0$—cannot converge, under any reasonable definition of convergence, to the nonzero
function u.
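This conclusion is easy to observe in a direct computation. The following sketch (with illustrative grid parameters) applies the update (1.4.2) to the step-function data above and confirms that the computed solution remains exactly zero for x > 0, where the true solution is nonzero.

import numpy as np

lam = 0.8                       # lambda = k/h, held fixed (illustrative)
h = 1.0 / 40
k = lam * h
x = np.arange(-2.0, 3.0 + h / 2, h)
v = np.where((x >= -1.0) & (x <= 0.0), 1.0, 0.0)   # v_m^0

T = 1.0
for n in range(int(round(T / k))):
    # v_m^{n+1} = (1 + lam) v_m^n - lam v_{m+1}^n   (equation (1.4.2))
    v[:-1] = (1.0 + lam) * v[:-1] - lam * v[1:]

print(np.max(np.abs(v[x > 0.0])))   # prints 0.0: no signal moves to the right
# The exact solution u(1, x) = u0(x - 1) equals 1 on (0, 1], so the computed
# solution cannot converge to it, even though the scheme is consistent.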

Exercises
1.4.1. Show that the forward-time central-space scheme (1.3.3) is consistent with equation
(1.1.1).
1.4.2. Show that the leapfrog scheme (1.3.4) is consistent with the one-way wave equation
(1.1.1).
1.4.3. Show that the following scheme is consistent with the one-way wave equation (1.1.5):
$$\frac{v_m^{n+1} - v_m^n}{k} + \frac{a}{2}\left(\frac{v_{m+1}^{n+1} - v_m^{n+1}}{h} + \frac{v_m^n - v_{m-1}^n}{h}\right) = f_m^n. \qquad (1.4.3)$$

1.4.4. Show that the following scheme is consistent with the equation $u_t + cu_{tx} + au_x = f$:
$$\frac{v_m^{n+1} - v_m^n}{k} + c\,\frac{v_{m+1}^{n+1} - v_{m-1}^{n+1} - v_{m+1}^n + v_{m-1}^n}{2kh} + a\,\frac{v_{m+1}^n - v_{m-1}^n}{2h} = f_m^n.$$

1.4.5. Interpret the results of Exercise 1.3.1 in light of the definition of convergence. Based
on the cases run in that exercise, decide which of the schemes are convergent.

1.5 Stability
Example 1.4.3 shows that a scheme must satisfy other conditions besides consistency before
we can conclude that it is convergent. The important property that is required is stability.
To introduce this concept we note that, if a scheme is convergent, as $v_m^n$ converges to
$u(t, x)$, then certainly $v_m^n$ is bounded in some sense. This is the essence of stability. The
following definition of stability is for the homogeneous initial value problem, that is, one
in which the right-hand-side function f is 0.
Before giving the definition of stability we need to define a stability region. For many
schemes there are restrictions on the way that h and k should be chosen so that the scheme
is stable, and therefore useful in computation. A stability region is any bounded nonempty
region of the first quadrant of $\mathbb{R}^2$ that has the origin as an accumulation point. That is, a
stability region must contain a sequence $(k_\nu, h_\nu)$ that converges to the origin as $\nu$ tends to
infinity. A common example is a region of the form $\{(k, h) : 0 < k \le ch \le C\}$ for some
positive constants c and C. An example of a stability region is displayed in Figure 1.11.
Figure 1.11. Stability region.
Definition 1.5.1. A finite difference scheme $P_{k,h}v_m^n = 0$ for a first-order equation is
stable in a stability region $\Lambda$ if there is an integer J such that for any positive time T,
there is a constant $C_T$ such that
$$h\sum_{m=-\infty}^{\infty}|v_m^n|^2 \le C_T\,h\sum_{j=0}^{J}\sum_{m=-\infty}^{\infty}|v_m^j|^2 \qquad (1.5.1)$$
for $0 \le nk \le T$, with $(k, h) \in \Lambda$.
Before proceeding with our discussion of stability, we introduce some notation that
will be of use in understanding inequality (1.5.1). We first introduce the notation
$$\|w\|_h = \left(h\sum_{m=-\infty}^{\infty}|w_m|^2\right)^{1/2} \qquad (1.5.2)$$
for any grid function w. The quantity $\|w\|_h$ is called the $L^2$ norm of the grid function
w and is a measure of the size of the solution (see Appendix B for a discussion of function
norms). In many problems the $L^2$ norm is a measure of a physically significant quantity
such as the energy of the system. With this notation the inequality (1.5.1) can be written as
$$\|v^n\|_h \le C_T\left(\sum_{j=0}^{J}\|v^j\|_h^2\right)^{1/2},$$
which is equivalent to
$$\|v^n\|_h \le C_T^*\sum_{j=0}^{J}\|v^j\|_h \qquad (1.5.3)$$
for some constant $C_T^*$. Inequalities (1.5.1) and (1.5.3) express the idea that the norm of the
solution at any time t, with 0 ≤ t ≤ T , is limited in the amount of growth that can occur.
The growth is at most a constant multiple of the sum of the norms of the solution on the
first J + 1 steps.
We may take J equal to zero for one-step schemes and also for multistep schemes
incorporating an initializing procedure for computing the solution for the first several time
steps, as discussed earlier in this section. We include the possibility of J being positive to
include multistep schemes with data specified on the first J + 1 levels. It will be shown
that the stability of a multistep scheme is not dependent on the method of initialization.
To demonstrate whether or not the estimate (1.5.1) holds for a particular scheme can
be quite formidable unless we use methods from Fourier analysis, which is discussed in
the next chapter. In Section 2.2 a relatively simple procedure, von Neumann analysis, is
presented for determining the stability of difference schemes.
For certain rather simple schemes we can determine sufficient conditions that ensure
that the scheme is stable. This is done by establishing the stability estimate (1.5.1) directly.
Example 1.5.1. We will prove a sufficient condition for stability for the forward-time
forward-space scheme (1.3.1) by considering schemes of the form
$$v_m^{n+1} = \alpha v_m^n + \beta v_{m+1}^n,$$
of which the forward-time forward-space scheme is a special case. We will show that the
scheme is stable if $|\alpha| + |\beta| \le 1$. The analysis is similar for the forward-time backward-space
scheme (1.3.2). We have
$$\sum_{m=-\infty}^{\infty}|v_m^{n+1}|^2 = \sum_{m=-\infty}^{\infty}|\alpha v_m^n + \beta v_{m+1}^n|^2$$
$$\le \sum_{m=-\infty}^{\infty}\left(|\alpha|^2|v_m^n|^2 + 2|\alpha||\beta||v_m^n||v_{m+1}^n| + |\beta|^2|v_{m+1}^n|^2\right)$$
$$\le \sum_{m=-\infty}^{\infty}\left(|\alpha|^2|v_m^n|^2 + |\alpha||\beta|\left(|v_m^n|^2 + |v_{m+1}^n|^2\right) + |\beta|^2|v_{m+1}^n|^2\right),$$
where we have used the inequality $2xy \le x^2 + y^2$. The sum can be split over the terms
with index m and those with index m + 1, and the index can be shifted so that all terms
have the index m:
$$= \sum_{m=-\infty}^{\infty}\left(|\alpha|^2|v_m^n|^2 + |\alpha||\beta||v_m^n|^2\right) + \sum_{m=-\infty}^{\infty}\left(|\alpha||\beta||v_{m+1}^n|^2 + |\beta|^2|v_{m+1}^n|^2\right)$$
$$= \sum_{m=-\infty}^{\infty}\left(|\alpha|^2|v_m^n|^2 + |\alpha||\beta||v_m^n|^2\right) + \sum_{m=-\infty}^{\infty}\left(|\alpha||\beta||v_m^n|^2 + |\beta|^2|v_m^n|^2\right)$$
$$= \left(|\alpha|^2 + 2|\alpha||\beta| + |\beta|^2\right)\sum_{m=-\infty}^{\infty}|v_m^n|^2 = \left(|\alpha| + |\beta|\right)^2\sum_{m=-\infty}^{\infty}|v_m^n|^2.$$
This shows that we have the relation
$$\sum_{m=-\infty}^{\infty}|v_m^{n+1}|^2 \le \left(|\alpha| + |\beta|\right)^2\sum_{m=-\infty}^{\infty}|v_m^n|^2,$$
and since this applies for all n, we have that
$$\sum_{m=-\infty}^{\infty}|v_m^n|^2 \le \left(|\alpha| + |\beta|\right)^{2n}\sum_{m=-\infty}^{\infty}|v_m^0|^2.$$
If |α| + |β| is at most 1 in magnitude, then the scheme will be stable. Thus, schemes of
the form given above are stable if |α| + |β| ≤ 1.
For the forward-time forward-space scheme (1.3.1) the condition |α| + |β| ≤ 1 is
that |1 + aλ| + |aλ| is at most 1. Thus we see that this scheme is stable if −1 ≤ aλ ≤ 0.
In Section 2.2 we show that this is also a necessary condition.
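The norm estimate just derived can be checked numerically. The sketch below uses random initial data on a periodic grid, for which the same splitting and shifting argument applies, with the illustrative choice $a\lambda = -0.6$, so that $\alpha = 0.4$ and $\beta = 0.6$.

import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.4, 0.6          # forward-time forward-space with a*lambda = -0.6
v = rng.standard_normal(200)
norm0 = np.sqrt(np.sum(v ** 2))

for n in range(1, 101):
    # v_m^{n+1} = alpha * v_m^n + beta * v_{m+1}^n, taken periodic in m
    v = alpha * v + beta * np.roll(v, -1)
    bound = (abs(alpha) + abs(beta)) ** n * norm0
    assert np.sqrt(np.sum(v ** 2)) <= bound + 1e-12

print("the bound (|alpha| + |beta|)^n held for all 100 steps")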

The concept of stability for finite difference schemes is closely related to the concept
of well-posedness for initial value problems for partial differential equations. As before,
we restrict our discussion to equations P u = f that are of first order with respect to
differentiation in time.
Definition 1.5.2. The initial value problem for the first-order partial differential equation
$Pu = 0$ is well-posed if for any time $T \ge 0$, there is a constant $C_T$ such that any solution
$u(t, x)$ satisfies
$$\int_{-\infty}^{\infty}|u(t, x)|^2\,dx \le C_T\int_{-\infty}^{\infty}|u(0, x)|^2\,dx \qquad (1.5.4)$$
for $0 \le t \le T$.
A discussion of the concept of a well-posed initial value problem is given in Chapter 9.
It is shown that only well-posed initial value problems can be used to model the evolution of
physical processes. The methods of Fourier analysis that are introduced in the next chapter
will be useful in the study of well-posed initial value problems.
In Chapter 9 we discuss stability and well-posedness for the inhomogeneous problems,
$P_{k,h}v = f$ and $Pu = f$, respectively. As we show, the inhomogeneous equations
can be treated using the estimates (1.5.1) and (1.5.4) by use of Duhamel's principle. Thus
a scheme is stable for the equation $P_{k,h}v = f$ if it is stable for the equation $P_{k,h}v = 0$.
The Lax–Richtmyer Equivalence Theorem

The importance of the concepts of consistency and stability is seen in the Lax–Richtmyer
equivalence theorem, which is the fundamental theorem in the theory of finite difference
schemes for initial value problems.

Theorem 1.5.1. The Lax–Richtmyer Equivalence Theorem. A consistent finite difference
scheme for a partial differential equation for which the initial value problem is well-posed
is convergent if and only if it is stable.
A proof of this theorem is given in Chapter 10. The Lax–Richtmyer equivalence
theorem is a very useful theorem, since it provides a simple characterization of convergent
schemes. As discussed earlier, determining whether a scheme is convergent or nonconver-
gent can be difficult if we attempt to verify Definition 1.4.1 in a rather direct way. However,
the determination of the consistency of a scheme is quite simple, as we have seen, and de-
termining the stability of a scheme is also quite easy, as we show in Section 2.2. Thus
the more difficult result—convergence—is replaced by the equivalent and easily verifiable
conditions of consistency and stability. It is also significant that the determination of the
consistency and stability of schemes involves essentially algebraic manipulations. A com-
puterized symbolic manipulation language can be useful in determining consistency and
stability. By contrast, a direct proof of convergence would rely on concepts in analysis.
Such a proof would have to begin by considering any solution u of the differential equation
and then it would have to be shown that given any ε, there exist h and k small enough
that the solution of the scheme is within ε of u. The Lax–Richtmyer theorem allows us
to dispense with all this analysis.
The preceding discussion of Theorem 1.5.1 has focused on the half of the theorem
that states that consistency and stability imply convergence. The theorem is useful in the
other direction also. It states that we should not consider any unstable schemes, since none
of these will be convergent. Thus the class of reasonable schemes is precisely delimited as
those that are consistent and stable; no other schemes are worthy of consideration.
The Lax–Richtmyer equivalence theorem is an example of the best type of mathemat-
ical theorem. It relates an important concept that is difficult to establish directly with other
concepts that are relatively easy to verify and establishes this relationship very precisely.
Notice that if we had only the half of the theorem that showed that consistency and stability
implied convergence, then it would be conceivable that there were unstable schemes that
were also convergent. If we had only the other half of the theorem, stating that a consistent
convergent scheme is stable, then we would not know if a stable consistent scheme
is convergent. The usefulness of the Lax–Richtmyer theorem arises both from the ease of
verifying consistency and stability and from the precise relationship established between
these concepts and the concept of convergence.

Exercises
1.5.1. Show that schemes of the form
$$v_m^{n+1} = \alpha v_{m+1}^n + \beta v_{m-1}^n$$
are stable if $|\alpha| + |\beta|$ is less than or equal to 1. Conclude that the Lax–Friedrichs
scheme (1.3.5) is stable if $|a\lambda|$ is less than or equal to 1.

1.5.2. By multiplying the leapfrog scheme (1.3.4) by $v_m^{n+1} + v_m^{n-1}$ and summing over all
values of m, obtain the relation
$$\sum_{m=-\infty}^{\infty}\left(|v_m^{n+1}|^2 + |v_m^n|^2 + a\lambda\left(v_m^{n+1}v_{m+1}^n - v_{m+1}^{n+1}v_m^n\right)\right)$$
$$= \sum_{m=-\infty}^{\infty}\left(|v_m^n|^2 + |v_m^{n-1}|^2 + a\lambda\left(v_m^n v_{m+1}^{n-1} - v_{m+1}^n v_m^{n-1}\right)\right).$$
Show that the leapfrog scheme is stable for $|a\lambda| < 1$.
1.5.3. By multiplying scheme (1.4.3), with $f_m^n$ equal to 0, by $v_m^{n+1} + v_m^n$ and summing
over all values of m, obtain the relation
$$\sum_{m=-\infty}^{\infty}\left[\left(1 - \frac{a\lambda}{2}\right)|v_m^{n+1}|^2 + \frac{a\lambda}{2}\,v_m^{n+1}v_{m+1}^{n+1}\right] = \sum_{m=-\infty}^{\infty}\left[\left(1 - \frac{a\lambda}{2}\right)|v_m^n|^2 + \frac{a\lambda}{2}\,v_m^n v_{m+1}^n\right].$$
Conclude that the scheme is stable for $a\lambda < 1$.
1.5.4. By multiplying scheme (1.4.3), with $f_m^n$ equal to 0, by $v_{m+1}^{n+1} + v_{m-1}^n$ and summing
over all values of m, obtain the relation
$$\sum_{m=-\infty}^{\infty}\left[\frac{a\lambda}{2}|v_m^{n+1}|^2 + \left(1 - \frac{a\lambda}{2}\right)v_m^{n+1}v_{m+1}^{n+1}\right] = \sum_{m=-\infty}^{\infty}\left[\frac{a\lambda}{2}|v_m^n|^2 + \left(1 - \frac{a\lambda}{2}\right)v_m^n v_{m+1}^n\right].$$
Conclude that the scheme is stable for $a\lambda > 1$.
1.6 The Courant–Friedrichs–Lewy Condition

The condition that the magnitude of $a\lambda$ be at most 1 is the stability condition for many
finite difference schemes for hyperbolic systems in one space dimension when $\lambda$ is a
constant. This has been the stability condition for the Lax–Friedrichs scheme (1.3.5)
(see Exercise 1.5.1) and for the forward-time forward-space scheme (1.3.1) when a is
negative and the forward-time backward-space scheme (1.3.2) when a is positive (see
Example 1.5.1). We now show that this condition is a necessary condition for stability for
many explicit schemes for the equation (1.1.1).
An explicit finite difference scheme is any scheme that can be written in the form
$$v_m^{n+1} = \text{a finite sum of } v_{m'}^{n'} \text{ with } n' \le n.$$
All the schemes we considered so far are explicit; we examine implicit (i.e., nonexplicit)
schemes later. We now prove the following result, which covers all the one-step schemes
we have discussed.
Theorem 1.6.1. For an explicit scheme for the hyperbolic equation (1.1.1) of the form
$v_m^{n+1} = \alpha v_{m-1}^n + \beta v_m^n + \gamma v_{m+1}^n$ with $k/h = \lambda$ held constant, a necessary condition for
stability is the Courant–Friedrichs–Lewy (CFL) condition,
$$|a\lambda| \le 1.$$
For systems of equations for which v is a vector and $\alpha$, $\beta$, and $\gamma$ are matrices, we must
have $|a_i\lambda| \le 1$ for all eigenvalues $a_i$ of the matrix A.

Proof. First consider the case of a single equation. If $|a\lambda| > 1$, then by considering
the point $(t, x) = (1, 0)$ we see that the solution to the partial differential equation depends
on the values of $u_0(x)$ at $x = -a$. But the finite difference scheme will have $v_0^n$ depend
on $v_m^0$ only for $|m| \le n$, by the form of the scheme. This situation is illustrated in
Figure 1.12. Since $h = \lambda^{-1}k$, we have $|m|h \le \lambda^{-1}kn = \lambda^{-1}$, since $kn = 1$. So $v_0^n$
depends on x only for $|x| \le \lambda^{-1} < |a|$. Thus $v_0^n$ cannot converge to $u(1, 0)$ as $h \to 0$.
This proves the theorem in this case.
For the case of a system of equations, we have that $u(1, x)$ depends on $u_0(x)$ for x
in the interval $[-a, a]$, where a is the maximum magnitude of the characteristic speeds
$a_i$. If $|a_i\lambda| > 1$ for some characteristic speed $a_i$, then we can take initial data that are
zero in $[-\lambda^{-1}, \lambda^{-1}]$ but not zero near $a_i$. Then $u(1, x)$ will not be zero, in general, and
yet $v_0^n$ with $nk = 1$ will be zero. Thus $v^n$ cannot converge to $u(1, \cdot)$, and the theorem
is proved.
A similar argument can be used to show that there is no explicit, consistent scheme for
hyperbolic partial differential equations that is stable for all values of λ (with λ constant
as h, k → 0 ). We obtain the following theorem, first proved by Courant, Friedrichs, and
Lewy [11].

Theorem 1.6.2. There are no explicit, unconditionally stable, consistent finite difference
schemes for hyperbolic systems of partial differential equations.
Figure 1.12. The grid for an unstable scheme.
The numerical speed of propagation for a scheme of the form considered in Theorem
1.6.1 is $h/k = \lambda^{-1}$, since information can propagate one grid spacing in one time step.
The CFL condition can be rewritten as
$$\lambda^{-1} \ge |a|,$$
which can be interpreted as stating that the numerical speed of propagation must be greater
than or equal to the speed of propagation of the differential equation. This is the basic idea
of these theorems. If the numerical scheme cannot propagate the solution at least as fast as
the solution of the differential equation, then the solution of the scheme cannot converge to
the solution of the partial differential equation.
We now present two implicit schemes for the one-way wave equation (1.1.1). These
schemes are consistent and stable for all values of $\lambda$ and thus illustrate that Theorem 1.6.2
does not extend to implicit schemes. The two schemes are the backward-time central-space
scheme
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+1}^{n+1} - v_{m-1}^{n+1}}{2h} = 0 \qquad (1.6.1)$$
and the backward-time backward-space scheme
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_m^{n+1} - v_{m-1}^{n+1}}{h} = 0 \qquad (1.6.2)$$
for a positive. We are not concerned at this point with how to solve for the values $v_m^{n+1}$
given the values at time level n; this topic is considered in Section 3.5. It is easy to check
that both of these schemes are consistent schemes for (1.1.1). In Section 2.2 we show that
the scheme (1.6.1) is stable for all values of a and $\lambda$.
Example 1.6.1. We now show that the backward-time backward-space scheme (1.6.2) is
stable when a is positive and $\lambda$ is any positive number. This shows that Theorem 1.6.2
does not extend to implicit schemes.
We first write the scheme (1.6.2) as
$$(1 + a\lambda)v_m^{n+1} = v_m^n + a\lambda\,v_{m-1}^{n+1}.$$
If we take the square of both sides, we obtain
$$(1 + a\lambda)^2|v_m^{n+1}|^2 \le |v_m^n|^2 + 2a\lambda|v_m^n|\,|v_{m-1}^{n+1}| + (a\lambda)^2|v_{m-1}^{n+1}|^2$$
$$\le (1 + a\lambda)|v_m^n|^2 + \left(a\lambda + (a\lambda)^2\right)|v_{m-1}^{n+1}|^2.$$
Taking the sum over all values of m, we obtain
$$(1 + a\lambda)^2\sum_{m=-\infty}^{\infty}|v_m^{n+1}|^2 \le (1 + a\lambda)\sum_{m=-\infty}^{\infty}|v_m^n|^2 + \left(a\lambda + (a\lambda)^2\right)\sum_{m=-\infty}^{\infty}|v_m^{n+1}|^2.$$
Subtracting the last expression on the right-hand side from the left-hand side gives the
estimate
$$\sum_{m=-\infty}^{\infty}|v_m^{n+1}|^2 \le \sum_{m=-\infty}^{\infty}|v_m^n|^2,$$
showing that the scheme is stable for every value of $\lambda$ when a is positive.
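Although (1.6.2) is implicit, for positive a it is simple to implement: as the first equation of this example shows, $v_m^{n+1}$ depends only on $v_m^n$ and the already-computed $v_{m-1}^{n+1}$, so one sweep in increasing m computes the new time level once a value is supplied at the left boundary. A minimal sketch with illustrative data follows; note that $\lambda$ is taken far beyond the explicit CFL limit.

import numpy as np

a = 1.0
lam = 4.0                             # far beyond the explicit CFL limit |a*lambda| <= 1
M = 100
h = 1.0 / M
k = lam * h
x = np.linspace(0.0, 1.0, M + 1)
v = np.exp(-100 * (x - 0.3) ** 2)     # sample initial data (placeholder)
print("initial l2 norm:", np.sqrt(h * np.sum(v ** 2)))

for n in range(25):
    vnew = np.empty_like(v)
    vnew[0] = 0.0                     # boundary value at x = 0
    for m in range(1, M + 1):
        # (1 + a*lam) v_m^{n+1} = v_m^n + a*lam * v_{m-1}^{n+1}
        vnew[m] = (v[m] + a * lam * vnew[m - 1]) / (1.0 + a * lam)
    v = vnew

print("final l2 norm:  ", np.sqrt(h * np.sum(v ** 2)))   # the norm does not grow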

We point out that even though we can choose λ arbitrarily large for scheme (1.6.2)
and still have a stable scheme, the solution will not be accurate unless λ is restricted to
reasonable values. We discuss the accuracy of solutions in Chapter 3, and in Section 5.2
we show that there are advantages to choosing |aλ| small.

Exercises
1.6.1. Show that the following modified Lax–Friedrichs scheme for the one-way wave
equation, $u_t + au_x = f$, given by
$$v_m^{n+1} = \tfrac{1}{2}\left(v_{m+1}^n + v_{m-1}^n\right) - \frac{a\lambda}{1 + (a\lambda)^2}\left(v_{m+1}^n - v_{m-1}^n\right) + kf_m^n$$
is stable for all values of $\lambda$. Discuss the relation of this explicit and unconditionally
stable scheme to Theorem 1.6.2.

1.6.2. Modify the proof of Theorem 1.6.1 to cover the leapfrog scheme.

1.6.3. Show that schemes of the form
$$\alpha v_{m+1}^{n+1} + \beta v_{m-1}^{n+1} = v_m^n$$
are stable if $|\alpha| - |\beta|$ is greater than or equal to 1. Conclude that the reverse
Lax–Friedrichs scheme,
$$\frac{\tfrac{1}{2}\left(v_{m+1}^{n+1} + v_{m-1}^{n+1}\right) - v_m^n}{k} + a\,\frac{v_{m+1}^{n+1} - v_{m-1}^{n+1}}{2h} = 0,$$
is stable if $|a\lambda|$ is greater than or equal to 1.
Chapter 2

Analysis of Finite Difference Schemes
In this chapter we present and develop the basic properties of Fourier analysis, which is
an important tool for analyzing finite difference schemes and their solutions. In this and
subsequent chapters this tool is used to study many important properties of finite difference
schemes and their solutions. We use Fourier analysis throughout this text to study both
finite difference schemes and partial differential equations.

2.1 Fourier Analysis

The tool that we will use most extensively in our study of stability and well-posedness is
Fourier analysis. We will use Fourier analysis on both the real line R and on the grid of
integers Z or hZ, which is defined by $h\mathbb{Z} = \{hm : m \in \mathbb{Z}\}$. For a function $u(x)$ defined
on the real line R, its Fourier transform $\hat{u}(\omega)$ is defined by
$$\hat{u}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-i\omega x}u(x)\,dx. \qquad (2.1.1)$$
The Fourier transform of u is a function of the real variable $\omega$ and is uniquely defined
by u. The function $\hat{u}$ is an alternative representation of the function u. Information about
certain properties of u can be inferred from the properties of $\hat{u}$. For example, the rate at
which $\hat{u}$ decays for large values of $\omega$ is related to the number of derivatives that u has.
The Fourier inversion formula, given by
$$u(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{i\omega x}\hat{u}(\omega)\,d\omega, \qquad (2.1.2)$$
shows how u can be recovered from $\hat{u}$. The Fourier inversion formula expresses the
function u as a superposition of waves, given by $e^{i\omega x}$, with different amplitudes $\hat{u}(\omega)$.
We will postpone for now the discussion of what conditions $u(x)$ must satisfy so that
(2.1.1) and (2.1.2) are well defined. Notice that $\hat{u}(\omega)$ may be complex valued even if $u(x)$
is real valued.
Example 2.1.1. As an example of the Fourier transform, consider the function
$$u(x) = \begin{cases} e^{-x} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$
We have that
$$\hat{u}(\omega) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty}e^{-i\omega x}e^{-x}\,dx = \frac{1}{\sqrt{2\pi}}\,\frac{1}{1 + i\omega}.$$
The validation of the formula (2.1.2) requires the use of the residue calculus; see
Appendix C.
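This transform pair can also be checked by numerical quadrature. The sketch below approximates the integral in (2.1.1) with the trapezoidal rule on a truncated interval; the interval length and step size are illustrative choices.

import numpy as np

x = np.linspace(0.0, 40.0, 40001)    # u(x) = 0 for x < 0, and e^{-40} is negligible
dx = x[1] - x[0]
u = np.exp(-x)

for omega in (0.0, 1.0, 5.0):
    f = np.exp(-1j * omega * x) * u
    # trapezoidal rule approximation of (2.1.1)
    uhat = (0.5 * (f[0] + f[-1]) + f[1:-1].sum()) * dx / np.sqrt(2 * np.pi)
    exact = 1.0 / (np.sqrt(2 * np.pi) * (1.0 + 1j * omega))
    print(omega, abs(uhat - exact))  # differences at quadrature-error level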
In a similar fashion, if v is a grid function defined for all integers m, its Fourier
transform is given by
$$\hat{v}(\xi) = \frac{1}{\sqrt{2\pi}}\sum_{m=-\infty}^{\infty}e^{-im\xi}v_m \qquad (2.1.3)$$
for $\xi \in [-\pi, \pi]$, and $\hat{v}(-\pi) = \hat{v}(\pi)$. The Fourier inversion formula is given by
$$v_m = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}e^{im\xi}\hat{v}(\xi)\,d\xi. \qquad (2.1.4)$$

Fourier analysis on the integers Z is the same as the study of Fourier series representa-
tions of functions defined on an interval. From the perspective of Fourier series one usually
starts with a function v̂(ξ ) defined on the interval [−π, π ] and shows that it can be
represented as a series such as (2.1.3) with coefficients vm given by (2.1.4). In our study
of finite difference schemes it is more natural to start with the grid functions vm and
regard the formula (2.1.4) as a representation of the grid function. The two approaches
are mathematically equivalent. The Fourier inversion formula (2.1.4) has an interpretation,
analogous to (2.1.2), as expressing v as a superposition of waves.
If the spacing between the grid points is h, we can change variables and define the
transform by
$$\hat{v}(\xi) = \frac{1}{\sqrt{2\pi}}\sum_{m=-\infty}^{\infty}e^{-imh\xi}v_m\,h \qquad (2.1.5)$$
for $\xi \in [-\pi/h, \pi/h]$, and then the inversion formula is
$$v_m = \frac{1}{\sqrt{2\pi}}\int_{-\pi/h}^{\pi/h}e^{imh\xi}\hat{v}(\xi)\,d\xi. \qquad (2.1.6)$$

An important consequence of the preceding definitions is that the $L^2$ norm of u,
which is
$$\|u\|_2 = \left(\int_{-\infty}^{\infty}|u(x)|^2\,dx\right)^{1/2},$$
is the same as the $L^2$ norm of $\hat{u}(\omega)$, i.e.,
$$\int_{-\infty}^{\infty}|u(x)|^2\,dx = \int_{-\infty}^{\infty}|\hat{u}(\omega)|^2\,d\omega. \qquad (2.1.7)$$
(See Appendix B for a discussion of function norms.) Also, for the discrete transform we
have equality for the $L^2$ norm of v, as defined in (1.5.2), and the $L^2$ norm of $\hat{v}$, i.e.,
$$\|\hat{v}\|_h^2 = \int_{-\pi/h}^{\pi/h}|\hat{v}(\xi)|^2\,d\xi = \sum_{m=-\infty}^{\infty}|v_m|^2\,h = \|v\|_h^2. \qquad (2.1.8)$$

The relations (2.1.7) and (2.1.8) are called Parseval’s relations. Using Parseval’s relations
one can show that the Fourier transform is defined for all functions in L2 (R) and L2 (hZ).
For proofs of Parseval’s relation for functions in L2 (R) the reader is referred to texts on
Fourier analysis, such as Titchmarsh [61] and Goldberg [23]. Other applications of Fourier
analysis are discussed in the book by Körner [31].
We can give an indication of the proof for Parseval's relation for functions in $L^2(h\mathbb{Z})$
quite easily. Starting with the left-hand side of equation (2.1.8) and using the definition of
the transform, we have
$$\|\hat{v}\|_h^2 = \int_{-\pi/h}^{\pi/h}|\hat{v}(\xi)|^2\,d\xi = \int_{-\pi/h}^{\pi/h}\overline{\hat{v}(\xi)}\,\frac{1}{\sqrt{2\pi}}\sum_{m=-\infty}^{\infty}e^{-imh\xi}v_m\,h\,d\xi$$
$$= \sum_{m=-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\int_{-\pi/h}^{\pi/h}e^{-imh\xi}\,\overline{\hat{v}(\xi)}\,d\xi\;v_m\,h$$
$$= \sum_{m=-\infty}^{\infty}\overline{\left(\frac{1}{\sqrt{2\pi}}\int_{-\pi/h}^{\pi/h}e^{imh\xi}\hat{v}(\xi)\,d\xi\right)}\,v_m\,h$$
$$= \sum_{m=-\infty}^{\infty}\overline{v_m}\,v_m\,h = \|v\|_h^2.$$
The only step in this derivation that needs justification is the interchange of the integration
and summation operations. This is not difficult, and readers familiar with real analysis can
easily fill in the details.
Parseval’s relation will be used extensively in our study of stability. It allows us to
replace the stability estimates (1.5.1) and (1.5.3) by the equivalent inequality


J
v̂ n h ≤ CT∗ v̂ j h ,
j =0

for the transform of the grid function. In the next section we study the stability of schemes
by examining the effect of the scheme on the transform of the solution.
It should also be pointed out that there is not a relation equivalent to Parseval’s relation
if the norm is the maximum norm (see Exercise 2.1.7). Because there is no such relation,
the Lax–Richtmyer theorem is not valid in the maximum norm, at least in a straightforward
way, as is shown in Section 10.5 (see Exercise 10.5.2).
Figures 2.1 and 2.2 show two examples of functions and their Fourier transforms.
In Figure 2.1 the function $e^{-|x|}$ is displayed with its transform. The function is the
one that has the sharp peak at x = 0; the transform is the smooth function and is given in
Exercise 2.1.1. Because of the discontinuity in the derivative of the function, the Fourier
transform decays more slowly than the exponential; see Exercise 2.1.3.
In Figure 2.2 the function $e^{-x^2}$ is displayed with its transform, which is also
given in Exercise 2.1.1. The function has the narrower graph; the transform has the wider
graph. The transform has the same basic shape, being proportional to $e^{-x^2/4}$, but
is wider. In general, functions with a narrow spike such as the function shown here have
wider transforms, and vice versa.
Figure 2.1. The function $e^{-|x|}$ and its Fourier transform.
We now present some examples of functions and their Fourier transforms.
Example 2.1.2. We take the grid function given by
$$v_m = \begin{cases} 1 & \text{if } |x_m| < 1, \\ \tfrac{1}{2} & \text{if } |x_m| = 1, \\ 0 & \text{if } |x_m| > 1 \end{cases}$$
on a grid with spacing h.
Figure 2.2. The function $e^{-x^2}$ and its Fourier transform.
For the case where $h = M^{-1}$ for some integer M, we have by (2.1.5)
$$\hat{v}(\xi) = \frac{h}{\sqrt{2\pi}}\left(\tfrac{1}{2}e^{iMh\xi} + \tfrac{1}{2}e^{-iMh\xi}\right) + \frac{h}{\sqrt{2\pi}}\sum_{m=-(M-1)}^{M-1}e^{-imh\xi}$$
$$= \frac{h}{\sqrt{2\pi}}\cos\xi + \frac{h}{\sqrt{2\pi}}\,\frac{\sin\left(M - \tfrac{1}{2}\right)h\xi}{\sin\tfrac{1}{2}h\xi}$$
$$= \frac{h}{\sqrt{2\pi}}\cos\xi + \frac{h}{\sqrt{2\pi}}\,\frac{\sin Mh\xi\,\cos\tfrac{1}{2}h\xi - \cos Mh\xi\,\sin\tfrac{1}{2}h\xi}{\sin\tfrac{1}{2}h\xi}$$
$$= \frac{h}{\sqrt{2\pi}}\sin\xi\,\cot\tfrac{1}{2}h\xi.$$
Parseval's relation then asserts that
$$2 - \frac{h}{2} = 2h\left(\frac{1}{2}\right)^2 + h\sum_{m=-(M-1)}^{M-1}1 = \frac{h^2}{2\pi}\int_{-\pi/h}^{\pi/h}\sin^2\xi\,\cot^2\tfrac{1}{2}h\xi\,d\xi.$$
This result can also be verified by direct evaluation of the integral using contour
integration.
Example 2.1.3. For our second example we take the grid function given by
$$v_m = e^{-\alpha|m|h}$$
for any positive constant $\alpha$. We have for the transform
$$\hat{v}(\xi) = \frac{1}{\sqrt{2\pi}}\sum_{m=-\infty}^{\infty}e^{-imh\xi}e^{-\alpha|m|h}\,h$$
$$= \frac{h}{\sqrt{2\pi}}\left(1 + \sum_{m=1}^{\infty}e^{-imh\xi}e^{-\alpha|m|h} + \sum_{m=-1}^{-\infty}e^{-imh\xi}e^{-\alpha|m|h}\right)$$
$$= \frac{h}{\sqrt{2\pi}}\left(1 + \frac{e^{-(\alpha-i\xi)h}}{1 - e^{-(\alpha-i\xi)h}} + \frac{e^{-(\alpha+i\xi)h}}{1 - e^{-(\alpha+i\xi)h}}\right)$$
$$= \frac{h}{\sqrt{2\pi}}\,\frac{1 - e^{-2\alpha h}}{1 - 2e^{-\alpha h}\cos h\xi + e^{-2\alpha h}}.$$
By Parseval's relation we have that
$$\|v\|_h^2 = h\,\frac{1 + e^{-2\alpha h}}{1 - e^{-2\alpha h}},$$
$$\|\hat{v}\|_h^2 = \frac{h^2}{2\pi}\int_{-\pi/h}^{\pi/h}\left(\frac{1 - e^{-2\alpha h}}{1 - 2e^{-\alpha h}\cos h\xi + e^{-2\alpha h}}\right)^2 d\xi.$$
This result can also be verified by direct evaluation of the integral using contour
integration.
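Both sides of Parseval's relation in this example can be evaluated numerically, truncating the grid sum and discretizing the integral. A sketch, with illustrative values of $\alpha$ and h:

import numpy as np

alpha, h = 1.0, 0.1
m = np.arange(-2000, 2001)                 # truncation of the infinite grid sum
v = np.exp(-alpha * np.abs(m) * h)
lhs = h * np.sum(v ** 2)                   # ||v||_h^2

xi = np.linspace(-np.pi / h, np.pi / h, 20001)
vhat = (h / np.sqrt(2 * np.pi)) * (1 - np.exp(-2 * alpha * h)) / (
    1 - 2 * np.exp(-alpha * h) * np.cos(h * xi) + np.exp(-2 * alpha * h))
dxi = xi[1] - xi[0]
# trapezoidal rule for ||vhat||_h^2
rhs = (0.5 * (vhat[0] ** 2 + vhat[-1] ** 2) + np.sum(vhat[1:-1] ** 2)) * dxi

closed_form = h * (1 + np.exp(-2 * alpha * h)) / (1 - np.exp(-2 * alpha * h))
print(lhs, rhs, closed_form)               # all three values agree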

Fourier Analysis and Partial Differential Equations

We conclude this section by using the tools of Fourier analysis to study partial differential
equations. In the next sections we use similar tools to study the stability of finite difference
schemes. If we differentiate the Fourier inversion formula (2.1.2) we obtain
$$\frac{\partial u}{\partial x}(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{i\omega x}\,i\omega\,\hat{u}(\omega)\,d\omega,$$
and from this we conclude by (2.1.1) that the Fourier transform of the derivative of $u(x)$
is $i\omega\,\hat{u}(\omega)$, i.e.,
$$\widehat{\frac{\partial u}{\partial x}}(\omega) = i\omega\,\hat{u}(\omega). \qquad (2.1.9)$$
The relation (2.1.9) shows the real power of the Fourier transform: under the trans-
form the operation of differentiation is converted into the operation of multiplication. The
2.1 Fourier Analysis 43
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

coupling of calculus, i.e., differentiation, with algebra, i.e., multiplication, gives us ma-
chinery to solve more easily many difficult problems in the theory of differential equations
and difference schemes. The important results of the next section on stability and those of
Chapter 9 on well-posedness use the Fourier transform to reduce questions about schemes
and differential equations to questions in algebra; for example, we show in Section 4.3 that
a multistep scheme is stable if the roots of a certain polynomial are all inside the unit circle.
An important consequence of (2.1.9) is that, by Parseval's relations, $u(x)$ has $L^2$
integrable derivatives of order through r if and only if
$$\int_{-\infty}^{\infty}\left(1 + |\omega|^2\right)^r|\hat{u}(\omega)|^2\,d\omega < \infty.$$
This is because
$$\int_{-\infty}^{\infty}\left|\frac{\partial^r u(x)}{\partial x^r}\right|^2 dx = \int_{-\infty}^{\infty}|\omega|^{2r}|\hat{u}(\omega)|^2\,d\omega.$$
We define the space of functions $H^r$, for each nonnegative value of r, as the set of
functions in $L^2(\mathbb{R})$ such that the norm
$$\|u\|_{H^r} = \left(\int_{-\infty}^{\infty}\left(1 + |\omega|^2\right)^r|\hat{u}(\omega)|^2\,d\omega\right)^{1/2}$$
is finite. Notice that the norm on $H^0$ is the same as the $L^2$ norm.
We also define the expression $\|D^r u\|$ by
$$\|D^r u\|^2 = \int_{-\infty}^{\infty}\left|\frac{\partial^r}{\partial x^r}u(x)\right|^2 dx = \int_{-\infty}^{\infty}|\omega|^{2r}|\hat{u}(\omega)|^2\,d\omega,$$
where the integral over x is defined only when r is an integer, but we define $\|D^r u\|$ by
the last integral when r is not an integer.
We now apply Fourier analysis to the initial value problem for the one-way wave
equation (1.1.1). We begin by transforming only in the spatial variable. We obtain for
$\hat{u}(t, \omega)$ the equation
$$\hat{u}_t = -ia\omega\,\hat{u}, \qquad (2.1.10)$$
which is an ordinary differential equation in t. This equation is easily solved and, using
the initial data, the solution is
$$\hat{u}(t, \omega) = e^{-ia\omega t}\,\hat{u}_0(\omega).$$
We now show that the initial value problem for (1.1.1) is well-posed according to
Definition 1.5.2. By the use of Parseval's relation and this last relationship, we immediately
obtain, using $|e^{-ia\omega t}| = 1$,
$$\int_{-\infty}^{\infty}|u(t, x)|^2\,dx = \int_{-\infty}^{\infty}|\hat{u}(t, \omega)|^2\,d\omega = \int_{-\infty}^{\infty}|e^{-ia\omega t}\hat{u}_0(\omega)|^2\,d\omega = \int_{-\infty}^{\infty}|\hat{u}_0(\omega)|^2\,d\omega = \int_{-\infty}^{\infty}|u_0(x)|^2\,dx. \qquad (2.1.11)$$
The equality of the first and last integrals in (2.1.11) can be easily established by the solution
formula (1.1.2); however, the method of (2.1.11) can be used to prove results for more
general partial differential equations. A more general discussion of well-posed initial value
problems occurs in Chapter 9. Much of the analysis uses the same ideas used in (2.1.11),
which are to switch to the equation for the Fourier transform, obtain some estimates for
the norm of the transform, and then use Parseval’s relation to obtain information about the
solution of the partial differential equation.

The Fourier Transform in Higher Dimensions

The Fourier transform is defined for higher dimensions by the formula
$$\hat{u}(\omega) = \frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}e^{-i\omega\cdot x}u(x)\,dx, \qquad (2.1.12)$$
where both x and $\omega$ are variables in $\mathbb{R}^N$. The inner product $\omega \cdot x$ is the usual inner
product in $\mathbb{R}^N$. The inversion formula is given by
$$u(x) = \frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}e^{i\omega\cdot x}\hat{u}(\omega)\,d\omega.$$
Similar formulas hold for the discrete transforms; they are
$$\hat{v}(\xi) = \frac{1}{(2\pi)^{N/2}}\sum_{m\in\mathbb{Z}^N}e^{-ihm\cdot\xi}v_m\,h^N$$
for $\xi \in [-\pi/h, \pi/h]^N$, and the inversion formula is
$$v_m = \frac{1}{(2\pi)^{N/2}}\int_{[-\pi/h,\pi/h]^N}e^{ihm\cdot\xi}\hat{v}(\xi)\,d\xi.$$
Parseval's relation also holds for higher dimensions.
Almost all the techniques we use for one-dimensional problems carry over to higher
dimensions without much difficulty. We restrict much of our analysis to the one-dimensional
case for simplicity, leaving the higher dimensional cases to the exercises.
Exercises

2.1.1. Check the following list of transforms and determine for which values of r they are
in $H^r$:
$$\begin{aligned}
u(x) &= e^{-|x|}, & \hat{u}(\omega) &= \sqrt{\tfrac{2}{\pi}}\,\frac{1}{1+\omega^2},\\
u(x) &= e^{-ax^2/2}, & \hat{u}(\omega) &= \frac{1}{\sqrt{a}}\,e^{-\omega^2/2a},\\
u(x) &= x\,e^{-|x|}, & \hat{u}(\omega) &= -2i\sqrt{\tfrac{2}{\pi}}\,\frac{\omega}{(1+\omega^2)^2},\\
u(x) &= |x|\,e^{-|x|}, & \hat{u}(\omega) &= \sqrt{\tfrac{2}{\pi}}\,\frac{1-\omega^2}{(1+\omega^2)^2},\\
u(x) &= \begin{cases} x^\alpha e^{-x} & \text{if } x \ge 0,\\ 0 & \text{if } x < 0, \end{cases} & \hat{u}(\omega) &= \frac{1}{\sqrt{2\pi}}\,\frac{\Gamma(1+\alpha)}{(1+i\omega)^{1+\alpha}}.
\end{aligned}$$

2.1.2. Show that if u is in $H^r$, then $\frac{du}{dx}$ is in $H^{r-1}$.

2.1.3. Show that if $\hat{u}(\omega)$ satisfies the estimate
$$|\hat{u}(\omega)| \le Ce^{-|\omega|},$$
then $u(x)$ is an infinitely differentiable function.
2.1.4. Use an argument similar to that used in (2.1.11) to show that the initial value problem
for the equation $u_t = u_{xxx}$ is well-posed.

2.1.5. Use an argument similar to that used in (2.1.11) to show that the initial value problem
for the equation $u_t + u_x + bu = 0$ is well-posed.

2.1.6. Show that if the function $u(x)$ is in $L^2(\mathbb{R})$ and its transform satisfies
$$|\hat{u}(\omega)| \le \frac{C}{1 + \omega^4}$$
for some constant C, then the first and second derivatives of u exist and are bounded
functions.

2.1.7. Show that if $u(x)$ is in $L^1(\mathbb{R})$, then $\hat{u}(\omega)$ is a continuous function on R. Show that
$\|\hat{u}\|_\infty \le (2\pi)^{-1/2}\|u\|_1$. (See Appendix B for the notation.) Prove an equivalent
relation for grid functions.
2.1.8. The Schwartz class $\mathcal{S}$ is defined as the set of all $C^\infty$ functions f on R such that
for each pair of integers $(\alpha, \beta)$ the function $(1 + |x|^\alpha)\left(\frac{d}{dx}\right)^\beta f(x)$ is bounded.
Show that the Fourier transform of a function in $\mathcal{S}$ is also in $\mathcal{S}$.
2.1.9. Finite Fourier Transforms. For a function $v_m$ defined on the integers, $m = 0, 1, \ldots, M - 1$, we can define the Fourier transform as
$$\hat{v}_\ell = \sum_{m=0}^{M-1}e^{-2i\pi\ell m/M}v_m \quad \text{for } \ell = 0, \ldots, M - 1.$$
For this transform prove the Fourier inversion formula
$$v_m = \frac{1}{M}\sum_{\ell=0}^{M-1}e^{2i\pi\ell m/M}\hat{v}_\ell,$$
and the Parseval's relation
$$\sum_{m=0}^{M-1}|v_m|^2 = \frac{1}{M}\sum_{\ell=0}^{M-1}|\hat{v}_\ell|^2.$$
Note that $v_m$ and $\hat{v}_\ell$ can be defined for all integers by making them periodic with
period M.
2.1.10. Finite Fourier Transforms. If M is an even integer, one can define the cosine and
sine transforms of a function $v_m$ defined for the integers $m = 0, 1, \ldots, M - 1$ by
defining
$$\hat{v}_\ell^c = \sum_{m=0}^{M-1}\cos\frac{2\pi\ell m}{M}\,v_m \quad \text{for } \ell = 0, \ldots, \frac{M}{2},$$
$$\hat{v}_\ell^s = \sum_{m=0}^{M-1}\sin\frac{2\pi\ell m}{M}\,v_m \quad \text{for } \ell = 1, \ldots, \frac{M}{2} - 1.$$
Show that $\hat{v}_\ell$ as defined in Exercise 2.1.9 satisfies
$$\hat{v}_\ell = \hat{v}_\ell^c - i\hat{v}_\ell^s \quad \text{for } \ell = 0, \ldots, \frac{M}{2},$$
$$\hat{v}_\ell = \hat{v}_{M-\ell}^c + i\hat{v}_{M-\ell}^s \quad \text{for } \ell = \frac{M}{2} + 1, \ldots, M - 1,$$
and then show that
$$v_m = \frac{1}{M}\left(\hat{v}_0^c + (-1)^m\hat{v}_{M/2}^c\right) + \frac{2}{M}\sum_{\ell=1}^{M/2-1}\left(\cos\frac{2\pi\ell m}{M}\,\hat{v}_\ell^c + \sin\frac{2\pi\ell m}{M}\,\hat{v}_\ell^s\right)$$
for $m = 0, \ldots, M - 1$.
2.1.11. Use the multidimensional Fourier transform (2.1.12) to prove that the initial value
problem for the equation
$$u_t + au_x + bu_y = 0$$
is well-posed. (See (2.1.11).)

2.1.12. Prove the "uncertainty principle" inequality:
$$\int_{-\infty}^{\infty}|f(x)|^2\,dx\int_{-\infty}^{\infty}|\hat{f}(\omega)|^2\,d\omega \le 4\int_{-\infty}^{\infty}x^2|f(x)|^2\,dx\int_{-\infty}^{\infty}\omega^2|\hat{f}(\omega)|^2\,d\omega.$$
Deduce that both the function and its transform cannot be concentrated at the origin.

2.2 Von Neumann Analysis

An important application of Fourier analysis is the von Neumann analysis of stability of
finite difference schemes. With the use of Fourier analysis we can give necessary and
sufficient conditions for the stability of finite difference schemes. The resulting method is
easier to apply and is more generally applicable than are the methods used in the examples
at the end of Chapter 1.
We illustrate the method by considering a particular example and then discussing
the method in general. Through the use of the Fourier transform the determination of the
stability of a scheme is reduced to relatively simple algebraic considerations. We begin by
studying the forward-time backward-space scheme
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_m^n - v_{m-1}^n}{h} = 0, \qquad (2.2.1)$$
which can be rewritten as
$$v_m^{n+1} = (1 - a\lambda)v_m^n + a\lambda\,v_{m-1}^n, \qquad (2.2.2)$$
where $\lambda = k/h$. Using the Fourier inversion formula (2.1.6) for $v^n$, we have
$$v_m^n = \frac{1}{\sqrt{2\pi}}\int_{-\pi/h}^{\pi/h}e^{imh\xi}\,\hat{v}^n(\xi)\,d\xi,$$
and substituting this in (2.2.2) for $v_m^n$ and $v_{m-1}^n$, we obtain
$$v_m^{n+1} = \frac{1}{\sqrt{2\pi}}\int_{-\pi/h}^{\pi/h}e^{imh\xi}\left[(1 - a\lambda) + a\lambda e^{-ih\xi}\right]\hat{v}^n(\xi)\,d\xi. \qquad (2.2.3)$$
Comparing this formula with the Fourier inversion formula for $v^{n+1}$,
$$v_m^{n+1} = \frac{1}{\sqrt{2\pi}}\int_{-\pi/h}^{\pi/h}e^{imh\xi}\,\hat{v}^{n+1}(\xi)\,d\xi,$$
and using the fact that the Fourier transform is unique, we deduce that the integrand of
(2.2.3) is the same as that in the inversion formula. We then have that
$$\hat{v}^{n+1}(\xi) = \left[(1 - a\lambda) + a\lambda e^{-ih\xi}\right]\hat{v}^n(\xi) = g(h\xi)\,\hat{v}^n(\xi), \qquad (2.2.4)$$
where
$$g(h\xi) = (1 - a\lambda) + a\lambda e^{-ih\xi}.$$
The formula (2.2.4) shows that advancing the solution of the scheme by one time step is
equivalent to multiplying the Fourier transform of the solution by the amplification factor
$g(h\xi)$. The amplification factor is so called because its magnitude is the amount that the
amplitude of each frequency in the solution, given by $\hat{v}^n(\xi)$, is amplified in advancing the
solution one time step. From (2.2.4) we obtain the important formula
$$\hat{v}^n(\xi) = g(h\xi)^n\,\hat{v}^0(\xi). \qquad (2.2.5)$$
Note that the superscript on $\hat{v}$ is an index of the time level, while on g it is a power.
By means of the Fourier transform every one-step scheme can be put in the form
(2.2.5), and this provides a standard method for studying the wide variety of schemes. All
the information about a scheme is contained in its amplification factor, and we show how to
extract important information from it. In particular, the stability and accuracy of schemes
is easy to determine from the amplification factor.
We now use formula (2.2.5) to study the stability of scheme (2.2.1). This analysis is
analogous to that displayed in equation (2.1.11) to study the well-posedness of the initial
value problem for equation (2.1.10). By Parseval’s relation, (2.1.8), and (2.2.5),
$$h\sum_{m=-\infty}^{\infty}|v_m^n|^2 = \int_{-\pi/h}^{\pi/h}|\hat{v}^n(\xi)|^2\,d\xi = \int_{-\pi/h}^{\pi/h}|g(h\xi)|^{2n}|\hat{v}^0(\xi)|^2\,d\xi.$$
Thus we see that the stability inequality (1.5.1) will hold, with J = 0, if $|g(h\xi)|^{2n}$ is
suitably bounded. We now evaluate $|g(h\xi)|$. Setting $\theta = h\xi$, we have
$$g(\theta) = (1 - a\lambda) + a\lambda e^{-i\theta} = (1 - a\lambda) + a\lambda\cos\theta - ia\lambda\sin\theta.$$
To evaluate $|g(\theta)|^2$ we add the squares of the real and imaginary parts. We also make use
of the half-angle formulas for the sine and cosine functions. These are
$$1 - \cos\varphi = 2\sin^2\tfrac{1}{2}\varphi \quad \text{and} \quad \sin\varphi = 2\sin\tfrac{1}{2}\varphi\,\cos\tfrac{1}{2}\varphi.$$
We then have
$$|g(\theta)|^2 = (1 - a\lambda + a\lambda\cos\theta)^2 + a^2\lambda^2\sin^2\theta$$
$$= \left(1 - 2a\lambda\sin^2\tfrac{1}{2}\theta\right)^2 + 4a^2\lambda^2\sin^2\tfrac{1}{2}\theta\,\cos^2\tfrac{1}{2}\theta \qquad (2.2.6)$$
$$= 1 - 4a\lambda\sin^2\tfrac{1}{2}\theta + 4a^2\lambda^2\sin^4\tfrac{1}{2}\theta + 4a^2\lambda^2\sin^2\tfrac{1}{2}\theta\,\cos^2\tfrac{1}{2}\theta$$
$$= 1 - 4a\lambda(1 - a\lambda)\sin^2\tfrac{1}{2}\theta.$$
We see from this last expression that $|g(\theta)|$ is bounded by 1 if $0 \le a\lambda \le 1$; thus by
(2.2.5),
$$h\sum_{m=-\infty}^{\infty}|v_m^n|^2 \le \int_{-\pi/h}^{\pi/h}|\hat{v}^0(\xi)|^2\,d\xi = h\sum_{m=-\infty}^{\infty}|v_m^0|^2,$$
and the scheme is stable by Definition 1.5.1.
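The conclusion drawn from (2.2.6) is easy to corroborate by sampling the amplification factor. A short sketch comparing a stable and an unstable choice of $a\lambda$:

import numpy as np

theta = np.linspace(-np.pi, np.pi, 200001)

def g(a_lam):
    # amplification factor of the forward-time backward-space scheme
    return (1 - a_lam) + a_lam * np.exp(-1j * theta)

for a_lam in (0.8, 1.6):
    print(a_lam, np.abs(g(a_lam)).max())
# a*lambda = 0.8 gives max |g| = 1 (stable); a*lambda = 1.6 gives max |g| = 2.2,
# matching |g|^2 = 1 - 4*a*lambda*(1 - a*lambda)*sin^2(theta/2) at theta = pi.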
Figure 2.3. The image of $g(\theta)$ for the forward-time backward-space scheme.
Figure 2.3 shows the set of points marked out by $g(\theta)$ as $\theta$ varies for the case
with $a\lambda = 0.8$. These points lie within the unit circle because the scheme is stable. By
consistency we must always have $g(0) = 1$.
However, if $a\lambda$ is not in the interval [0, 1] and $\lambda$ is fixed as h and k tend to zero,
then $|g(\theta)|$ is greater than 1 for some values of $\theta$, and the scheme is unstable, as we
show next.
The Stability Condition

The exact condition for stability of constant coefficient one-step schemes is given in the
next theorem. Although in the example we have just considered, the amplification factor
g was a function only of $\theta = h\xi$, in general g will also depend on h and k. Also,
we have considered schemes only for equation (1.1.1), and yet our definition of stability,
Definition 1.5.1, applies to more general partial differential equations that are first order in
the differentiation with respect to time. To allow for more general equations, we have to
allow the magnitude of the amplification factor to exceed 1 by a small amount.

Theorem 2.2.1. A one-step finite difference scheme (with constant coefficients) is stable in
a stability region $\Lambda$ if and only if there is a constant K (independent of $\theta$, k, and h)
such that
$$|g(\theta, k, h)| \le 1 + Kk \qquad (2.2.7)$$
with $(k, h) \in \Lambda$. If $g(\theta, k, h)$ is independent of h and k, the stability condition (2.2.7)
can be replaced with the restricted stability condition
$$|g(\theta)| \le 1. \qquad (2.2.8)$$

This theorem shows that to determine the stability of a finite difference scheme we
need to consider only the amplification factor g(hξ ). This observation is due to von Neu-
mann, and because of that, this analysis is usually called von Neumann analysis.
Before proceeding with the proof of this theorem, we consider some examples that
use the special condition (2.2.8).

Example 2.2.1. We consider the forward-time forward-space scheme (1.3.1), for which
$$g(h\xi) = 1 + a\lambda - a\lambda e^{ih\xi},$$
where a is positive and $\lambda$ is constant. This formula is obtained in the same fashion as
(2.2.4); we have that
$$|g|^2 = 1 + 4a\lambda(1 + a\lambda)\sin^2\tfrac{1}{2}\theta.$$
If $\lambda$ is constant, then we may use the restricted stability condition (2.2.8), and we see that
|g| is greater than 1 for $\theta$ not equal to 0, and therefore this scheme is unstable. Recall
that by Example 1.4.3 we know that this scheme is not convergent.
If a is negative, then the forward-time forward-space scheme is stable for $-1 \le a\lambda \le 0$.

We needn’t write out the integrals and obtain expressions such as (2.2.3) to obtain the
n in the scheme
amplification factor g. A simpler and equivalent procedure is to replace vm
n
by g e imθ for each value of n and m. The resulting equation can then be solved for the
amplification factor.
Example 2.2.2. We use the forward-time central-space scheme (1.3.3),
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+1}^n - v_{m-1}^n}{2h} = 0,$$
to illustrate this procedure. Replacing $v_m^n$ by $g^n e^{im\theta}$, the preceding expression is transformed
to
$$\frac{g^{n+1}e^{im\theta} - g^n e^{im\theta}}{k} + a\,\frac{g^n e^{i(m+1)\theta} - g^n e^{i(m-1)\theta}}{2h} = g^n e^{im\theta}\left(\frac{g - 1}{k} + a\,\frac{e^{i\theta} - e^{-i\theta}}{2h}\right) = 0,$$
which gives the amplification factor as
$$g = 1 - ia\lambda\sin\theta$$
with $\lambda = k/h$. This method of obtaining the amplification factor is certainly easier than
the earlier analysis.
If $\lambda$ is constant, then g is independent of h and k and
$$|g(\theta)|^2 = 1 + a^2\lambda^2\sin^2\theta.$$
Since $|g(\theta)|$ is greater than 1 for $\theta$ not equal to 0 or $\pi$, by Theorem 2.2.1 this scheme
is unstable.
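This substitution procedure is mechanical enough to automate. The following sketch uses SymPy to recover the amplification factor of the forward-time central-space scheme; the names are local to the snippet.

import sympy as sp

g, theta, lam, a, k, h, m, n = sp.symbols('g theta lambda a k h m n')
v = lambda nn, mm: g**nn * sp.exp(sp.I * mm * theta)   # v_m^n -> g^n e^{im theta}

scheme = (v(n + 1, m) - v(n, m)) / k + a * (v(n, m + 1) - v(n, m - 1)) / (2 * h)
reduced = sp.simplify(scheme / v(n, m))                # divide out g^n e^{im theta}
g_sol = sp.solve(sp.Eq(reduced, 0), g)[0].subs(k, lam * h)
print(sp.simplify(g_sol.rewrite(sp.sin)))              # 1 - I*a*lambda*sin(theta)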

The determination of the amplification factor by replacing $v_m^n$ by $g^n e^{im\theta}$ is not to
be regarded as merely looking for solutions of the difference scheme that have the form
$v_m^n = g^n e^{im\theta}$. The replacement of $v_m^n$ by $g^n e^{im\theta}$ is a shortcut in the method used at the
beginning of the section, in which we proved that all solutions of the one-step difference
scheme were given by formula (2.2.5), and this proof gave the form of the amplification
factor. That same procedure can be applied to any one-step scheme to determine the form
of the amplification factor. A rearrangement of the manipulations used to determine the
amplification factor shows that the two procedures are equivalent in determining the form
of the amplification factor.

Example 2.2.3. As an example of a scheme that requires the more general condition (2.2.7),
we consider the modified Lax–Friedrichs scheme for
$$u_t + au_x - u = 0, \qquad (2.2.9)$$
given by
$$\frac{v_m^{n+1} - \tfrac{1}{2}\left(v_{m+1}^n + v_{m-1}^n\right)}{k} + a\,\frac{v_{m+1}^n - v_{m-1}^n}{2h} - v_m^n = 0.$$
This scheme has the amplification factor
$$g(\theta, k, h) = \cos\theta - ia\lambda\sin\theta + k$$
and
$$|g|^2 = (\cos\theta + k)^2 + a^2\lambda^2\sin^2\theta \le (1 + k)^2$$
if $|a\lambda| \le 1$. Notice that since (2.2.9) has solutions that grow with t (see Section 1.1), any
consistent, stable scheme for (2.2.9) must have |g| larger than 1 for some values of $\theta$.

As the examples show, the amplification factor $g(\theta, k, h)$ is an algebraic function
of $e^{i\theta}$, and it is a continuous function of all of its arguments. We will always assume that
$g(\theta, k, h)$ is a smooth function of all of its arguments.

Proof of Theorem 2.2.1. We have, by Parseval's relation and the definition of g,
that
$$\|v^n\|_h^2 = \int_{-\pi/h}^{\pi/h}|g(h\xi, k, h)|^{2n}|\hat{v}^0(\xi)|^2\,d\xi.$$
If $|g(h\xi, k, h)| \le 1 + Kk$ for $(k, h) \in \Lambda$, we have
$$\|v^n\|_h^2 \le (1 + Kk)^{2n}\int_{-\pi/h}^{\pi/h}|\hat{v}^0(\xi)|^2\,d\xi = (1 + Kk)^{2n}\|v^0\|_h^2.$$
Now $n \le T/k$, so
$$(1 + Kk)^n \le (1 + Kk)^{T/k} \le e^{KT}.$$
Therefore, $\|v^n\|_h \le e^{KT}\|v^0\|_h$, which is (1.5.1), and thus the scheme is stable in $\Lambda$.
We now prove that if inequality (2.2.7) cannot be satisfied for $(k, h) \in \Lambda$ for any
value of K, then the scheme is not stable in $\Lambda$. To do this we show that we can achieve any
amount of growth in the solution; i.e., we show that the stability inequality (1.5.1) cannot
hold.
If for some positive value C there is an interval of $\theta$'s, $\theta \in [\theta_1, \theta_2]$, and $(k, h) \in \Lambda$
with $|g(\theta, k, h)| > 1 + Ck$, then we construct a function $v_m^0$ as
$$\hat{v}^0(\xi) = \begin{cases} 0 & \text{if } h\xi \notin [\theta_1, \theta_2], \\ \sqrt{h(\theta_2 - \theta_1)^{-1}} & \text{if } h\xi \in [\theta_1, \theta_2]. \end{cases}$$
Notice that $\|\hat v^0\|_h$ is equal to 1. Then
$$\|v^n\|_h^2 = \int_{-\pi/h}^{\pi/h} |g(h\xi, k, h)|^{2n}\, |\hat v^0(\xi)|^2\, d\xi = \int_{\theta_1/h}^{\theta_2/h} |g(h\xi, k, h)|^{2n}\,\frac{h}{\theta_2 - \theta_1}\, d\xi \ge (1 + Ck)^{2n} \ge \frac{1}{2}e^{2TC}\,\|v^0\|_h^2$$
for n near T/k. This shows the scheme to be unstable if C can be arbitrarily large. Thus the scheme is unstable if there is no region in which g(θ, k, h) can be bounded as in (2.2.7).
The proof of condition (2.2.8) is very easy and is similar to the proof of Theorem 2.2.3, so we omit the proof here.

Corollary 2.2.2. If a scheme as in Theorem 2.2.1 is modified so that the modifications result only in the addition to the amplification factor of terms that are O(k) uniformly in ξ, then the modified scheme is stable if and only if the original scheme is stable.

Proof. If g is the amplification factor for the scheme and satisfies $|g| \le 1 + Kk$, then the amplification factor of the modified scheme, $g'$, satisfies
$$|g'| = |g + O(k)| \le 1 + Kk + Ck = 1 + K'k.$$
Hence the modified scheme is stable if the original scheme is stable, and vice versa.
The use of Theorem 2.2.1 and Corollary 2.2.2 allows one to determine the stability
of all the schemes we have discussed so far, with the exception of the leapfrog scheme,
which is not a one-step scheme. Stability for the leapfrog scheme and other multistep
schemes is discussed in Chapter 4.
The following theorem shows how to reduce further algebraic manipulation in eval-
uating |g| and determining the stability of a scheme.

Theorem 2.2.3. A consistent one-step scheme for the equation
$$u_t + au_x + bu = 0$$
is stable if and only if it is stable for this equation when b is equal to 0. Moreover, when $k = \lambda h$ and λ is a constant, the stability condition on $g(h\xi, k, h)$ is
$$|g(\theta, 0, 0)| \le 1. \qquad (2.2.10)$$


Proof. Because of consistency it is easy to see that the lower order term bu contributes to the expression for g only terms that are proportional to k. By Corollary 2.2.2 the removal of these terms does not affect the stability of the scheme.
Using the Taylor series in k and h, we must have
$$g(\theta, k, h) = g(\theta, 0, 0) + O(h) + O(k),$$
and if $h = \lambda^{-1}k$, then the terms that are O(h) are also O(k). Moreover, since θ is restricted to the compact set $[-\pi, \pi]$, the O(k) terms are uniformly bounded. Thus by Corollary 2.2.2 the stability condition is
$$|g(\theta, 0, 0)| \le 1 + Kk$$
for some constant K. But the left-hand side of this relation is independent of k, and the inequality must hold for all small positive values of k. We have, therefore, that the preceding estimate holds if and only if
$$|g(\theta, 0, 0)| \le 1.$$
This same reasoning proves the last assertion of Theorem 2.2.1.
Because of Theorem 2.2.3 we usually write g as a function only of hξ, i.e., g(hξ), and do not display the dependence on h and k. It is important to realize that the stability condition (2.2.10) cannot be used in all cases.
Note that the stability condition (2.2.7) is equivalent to
$$|g|^2 \le 1 + K'k \qquad (2.2.11)$$
for some constant K'. If (2.2.7) holds, then
$$|g|^2 \le 1 + 2Kk + K^2k^2 \le 1 + \left(2K + K^2k_0\right)k.$$
Similarly, if (2.2.11) holds, then
$$|g| \le \left(1 + K'k\right)^{1/2} \le 1 + \tfrac{1}{2}K'k.$$
For many schemes it is easier to work with $|g|^2$ rather than with |g| itself.
We now present several examples to illustrate the various ideas discussed in this
section.

Example 2.2.4. We perform von Neumann analysis for the Lax–Friedrichs scheme of Example 2.2.3. The scheme is stable if and only if the scheme is stable without the undifferentiated term. For this case
$$g(\theta) = \cos\theta - ia\lambda\sin\theta,$$
and
$$|g|^2 = \cos^2\theta + a^2\lambda^2\sin^2\theta. \qquad (2.2.12)$$
We see that $|g(\theta)|$ is less than or equal to 1 if and only if $|a\lambda| \le 1$. Thus the Lax–Friedrichs scheme with λ constant is stable if and only if $|a\lambda| \le 1$. As shown in Example 1.4.2, the Lax–Friedrichs scheme is consistent only if $k^{-1}h^2$ tends to zero with k and h. We have that $k^{-1}h^2 = k\lambda^{-2}$, and thus if λ is constant and $|a\lambda| \le 1$, the scheme is both stable and consistent.
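The conclusion is easy to confirm numerically; the sketch below (ours, with arbitrarily chosen sample values of aλ) scans $|g|^2$ from (2.2.12) over a grid of θ values:

```python
import numpy as np

# Hedged numerical check of (2.2.12): |g|^2 = cos^2(theta) + (a*lam)^2 sin^2(theta).
# The values of a*lambda sampled below are illustrative, not from the text.
theta = np.linspace(-np.pi, np.pi, 1001)
for alam in (0.5, 1.0, 1.6):
    g2 = np.cos(theta)**2 + alam**2 * np.sin(theta)**2
    print(f"a*lambda = {alam}: max |g|^2 = {g2.max():.3f}")
# The maximum of |g|^2 is at most 1 exactly when |a*lambda| <= 1,
# in agreement with the stability condition for the Lax-Friedrichs scheme.
```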
Figure 2.4. The image of g(θ) for the Lax–Friedrichs scheme.

Figure 2.4 displays the set of points of g(θ ) for the Lax–Friedrichs scheme. Compare
this with the set shown in Figure 2.3. Notice that the graph of g(θ ) touches the unit circle
in two places. It touches at both 1 and −1, because g(0) = 1 and g(π) = −1 for the Lax–Friedrichs scheme.

Example 2.2.5. Some schemes are easier to understand or implement if they are written as two separate steps. We now give an example of this, using the forward-time central-space scheme with a smoothing operator for the one-way wave equation. The scheme is
$$\tilde v_m^{n+1} = v_m^n - \tfrac{1}{2}a\lambda\left(v_{m+1}^n - v_{m-1}^n\right) + kf_m^n,$$
$$v_m^{n+1} = \tfrac{1}{4}\left(\tilde v_{m+1}^{n+1} + 2\tilde v_m^{n+1} + \tilde v_{m-1}^{n+1}\right). \qquad (2.2.13)$$
To apply von Neumann analysis to this scheme, we could eliminate all reference to the intermediate quantity ṽ, obtaining an equation for $v_m^{n+1}$ in terms of the values $v_j^n$ for j ranging from m − 2 to m + 2. We use an equivalent and simpler procedure, which is to replace all occurrences of $\tilde v_m^{n+1}$ by $\tilde g\, g^n e^{im\theta}$ as well as the usual replacement of $v_m^n$ by $g^n e^{im\theta}$. Notice that we also ignore the $f_m^n$ term in the stability analysis. We obtain
$$\tilde g = 1 - ia\lambda\sin\theta$$
and
$$g = \tfrac{1}{2}(1 + \cos\theta)\,\tilde g = \tilde g\cos^2\tfrac{1}{2}\theta.$$
We then obtain
$$|g|^2 = |\tilde g|^2\cos^4\tfrac{1}{2}\theta = \left(1 + a^2\lambda^2\sin^2\theta\right)\cos^4\tfrac{1}{2}\theta.$$
If we take λ to be constant, then the stability requirement is that g have magnitude at most 1. For stability we must satisfy
$$\left(1 + a^2\lambda^2\sin^2\theta\right)\cos^4\tfrac{1}{2}\theta \le 1$$
or
$$\left(1 + 4a^2\lambda^2\sin^2\tfrac{1}{2}\theta\cos^2\tfrac{1}{2}\theta\right)\cos^4\tfrac{1}{2}\theta \le 1,$$
which is equivalent to
$$4a^2\lambda^2\sin^2\tfrac{1}{2}\theta\cos^2\tfrac{1}{2}\theta\cos^4\tfrac{1}{2}\theta \le 1 - \cos^4\tfrac{1}{2}\theta = \left(1 - \cos^2\tfrac{1}{2}\theta\right)\left(1 + \cos^2\tfrac{1}{2}\theta\right) = \sin^2\tfrac{1}{2}\theta\left(1 + \cos^2\tfrac{1}{2}\theta\right).$$
Canceling the common nonnegative factor of $\sin^2\tfrac{1}{2}\theta$, we obtain the condition
$$4a^2\lambda^2\cos^6\tfrac{1}{2}\theta \le 1 + \cos^2\tfrac{1}{2}\theta,$$
which must hold for all values of θ. We first consider the particular case of θ equal to 0, obtaining the necessary condition that
$$a^2\lambda^2 \le \tfrac{1}{2}. \qquad (2.2.14)$$
We now show that this condition is also sufficient, i.e., that θ equal to 0 is the "worst case." Assuming that (2.2.14) holds, and using the fact that $\cos^2\tfrac{1}{2}\theta$ is at most 1, we have
$$4a^2\lambda^2\cos^6\tfrac{1}{2}\theta \le 2\cos^2\tfrac{1}{2}\theta \le 1 + \cos^2\tfrac{1}{2}\theta.$$
Thus the forward-time central-space scheme with the smoother (2.2.13) is stable if and only if
$$|a\lambda| \le \frac{1}{\sqrt{2}}.$$
This scheme is not recommended for use in actual computation. For example, it requires
more work per time step than does the Lax–Friedrichs scheme, and the time-step limitation
is more severe. The forward-time central-space scheme (1.3.3), without the smoother, is
unstable; see Example 2.2.2.
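Although the scheme is not recommended, the two-step substitution used above illustrates a generally useful computational pattern. A sketch (ours; the sampled values of aλ are illustrative) that forms g̃ and g exactly as in Example 2.2.5 and locates the stability boundary:

```python
import numpy as np

# Sketch of the two-step amplification factor of (2.2.13); the sampled
# values of a*lambda are illustrative assumptions.
theta = np.linspace(-np.pi, np.pi, 2001)
for alam in (0.70, 1/np.sqrt(2), 0.72):
    g_tilde = 1.0 - 1j * alam * np.sin(theta)   # first (transport) step
    g = g_tilde * np.cos(theta / 2)**2          # smoothing step
    print(f"a*lambda = {alam:.4f}: max |g| = {np.abs(g).max():.6f}")
# max |g| stays at or below 1 up to a*lambda = 1/sqrt(2) and exceeds 1
# beyond it, matching the stability bound derived in Example 2.2.5.
```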
Example 2.2.6. An interesting example of the relation between consistency and stability is a scheme for the equation
$$u_t + au_{xxx} = f \qquad (2.2.15)$$
obtained by applying the ideas of the Lax–Friedrichs scheme (1.3.5). The scheme is
$$v_m^{n+1} = \tfrac{1}{2}\left(v_{m+1}^n + v_{m-1}^n\right) - \tfrac{1}{2}akh^{-3}\left(v_{m+2}^n - 2v_{m+1}^n + 2v_{m-1}^n - v_{m-2}^n\right) + kf_m^n. \qquad (2.2.16)$$
This scheme is consistent with equation (2.2.15) if $k^{-1}h^2$ tends to zero as h and k tend to zero; see Exercise 2.2.3. This is similar to the result for the Lax–Friedrichs scheme as discussed in Example 1.4.2.
The amplification factor for the scheme (2.2.16) is
$$g(\theta) = \cos\theta + 4akh^{-3}\, i\sin\theta\sin^2\tfrac{1}{2}\theta,$$
and it is easily shown (see Exercise 2.2.3) that the scheme is stable only if
$$4|a|kh^{-3}$$
is bounded.
The consistency condition, that $k^{-1}h^2$ tend to zero, and the stability condition, that $4akh^{-3}$ be bounded as k and h tend to zero, cannot both be satisfied. Thus, this scheme is not a convergent scheme, since it cannot be both consistent and stable.
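The incompatibility of the two conditions can be made concrete with a few numbers. In the sketch below (ours), the refinement path k = 0.8h keeps the scheme consistent but destroys stability, while k = h³ does the reverse:

```python
# Illustrative check (not from the text) that the consistency and stability
# requirements of scheme (2.2.16) pull in opposite directions.
a = 1.0
for h in (0.1, 0.05, 0.025, 0.0125):
    k_lin, k_cub = 0.8 * h, h**3          # two sample refinement paths
    print(f"h={h:7.4f}  k=0.8h: 4|a|k/h^3={4*a*k_lin/h**3:10.1f}   "
          f"k=h^3: h^2/k={h**2/k_cub:8.1f}")
# With k proportional to h the stability quantity grows without bound;
# with k = h^3 the consistency quantity h^2/k = 1/h grows instead.
```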

Exercises
2.2.1. Show that the backward-time central-space scheme (1.6.1) is consistent with equation
(1.1.1) and is unconditionally stable.
2.2.2. Show that if one takes $\lambda = k^{1/2}$, i.e., $k = h^2$, then the forward-time central-space scheme (1.3.3) is stable and consistent with equation (1.1.1). (See Example 2.2.2.)
2.2.3. Verify the consistency and stability conditions of scheme (2.2.16) as given in Exam-
ple 2.2.6.
2.2.4. Show that the box scheme
$$\frac{1}{2k}\left[\left(v_m^{n+1} + v_{m+1}^{n+1}\right) - \left(v_m^n + v_{m+1}^n\right)\right] + \frac{a}{2h}\left[\left(v_{m+1}^{n+1} - v_m^{n+1}\right) + \left(v_{m+1}^n - v_m^n\right)\right] = f_m^n$$
is consistent with the one-way wave equation $u_t + au_x = f$ and is stable for all values of λ.
2.2.5. Show that the scheme
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+2}^n - 3v_{m+1}^n + 3v_m^n - v_{m-1}^n}{h^3} = f_m^n$$
is consistent with the equation (2.2.15) and, if $\nu = kh^{-3}$ is constant, then it is stable when $0 \le a\nu \le 1/4$.
2.2.6. Determine the stability of the following scheme, sometimes called the Euler backward scheme, for $u_t + au_x = f$:
$$v_m^{n+1/2} = v_m^n - \frac{a\lambda}{2}\left(v_{m+1}^n - v_{m-1}^n\right) + kf_m^n,$$
$$v_m^{n+1} = v_m^n - \frac{a\lambda}{2}\left(v_{m+1}^{n+1/2} - v_{m-1}^{n+1/2}\right) + kf_m^{n+1}.$$
The variable $v^{n+1/2}$ is a temporary variable, as is ṽ in Example 2.2.5.


2.2.7. Using von Neumann analysis, show that the reverse Lax–Friedrichs scheme of
Exercise 1.6.3 is stable for |aλ| greater than or equal to 1.

2.3 Comments on Instability and Stability


An examination of the solutions of unstable finite difference schemes shows that instability is related to high-frequency oscillations. An example is seen in Figure 1.7 for the Lax–Friedrichs scheme applied to the one-way wave equation with aλ equal to 1.6. The amplification factor for this scheme has magnitude given by $|g(\theta)|^2 = \cos^2\theta + a^2\lambda^2\sin^2\theta$. The maximum value of $|g(\theta)|$ is attained at θ equal to π/2, where |g| is 1.6. An examination of the ratios of the norms $\|v^{n+1}\|_h / \|v^n\|_h$ or of the ratio of the maximum magnitudes of $v^n$ shows that these ratios are close to 1.6.
Moreover, the pattern of the instability in Figure 1.7 shows the strong presence of the frequency $h^{-1}\pi/2$ associated with θ equal to π/2. Notice that θ equal to π/2 represents waves such as $\tilde v_m = \varepsilon\sin\tfrac{1}{2}m\pi$, which have a wavelength of 4h on a finite difference grid. The forward-time central-space scheme (1.3.3) shows a similar pattern, since it also has the maximum of $|g(\theta)|$ attained at θ equal to π/2.
The forward-time forward-space scheme (1.3.1) is unstable for a positive, and it attains the maximum value of $|g(\theta)|$ at θ equal to π; see Example 2.2.1. The pattern of the instability associated with this scheme is different from that associated with the two other schemes just mentioned; see Exercise 2.3.1. The instability is represented by disturbances of the form $\tilde v_m = \varepsilon(-1)^m = \varepsilon\cos m\pi$, with a wavelength on the grid of 2h.
Instability is thus seen to be the rapid growth of high-frequency modes in the solution of the finite difference scheme. It follows, then, that instability is evident sooner with initial data that contains larger amplitudes for its high frequencies. Based on the properties of the Fourier transform in Section 2.1, we conclude that instability will be evident sooner with initial data that is not smooth. This is indeed the case, as is easily demonstrated (see Exercise 2.3.2).
An important point that is related to the previous discussion is that instability is
essentially a local phenomenon. This can be seen somewhat in Figure 1.7, where the
oscillations arise at the points where the derivative of the solution is discontinuous. Of
course, the oscillations caused by the instability propagate to other regions, which can
ultimately make the disturbance seem to be global in extent.
The proof that instability is first seen at points of discontinuity requires a good under-
standing of Fourier analysis. It is also somewhat difficult to define the problem correctly.
Since this topic is not germane to our goal of understanding convergent and stable schemes,
we will not pursue it.
Understanding the nature of instabilities can help distinguish between the effects of
a programming error and the instability of a finite difference scheme. The effects of a
programming error can be quite global and not confined to regions in which there is a
discontinuity. The effects of instability will be oscillatory and will be most noticeable in
regions where the solution is least smooth.
Stability Conditions for Variable Coefficients


The analysis of stability as done in the previous section does not apply directly to problems with variable coefficients. Nonetheless, the stability conditions obtained for constant coefficient schemes can be used to give stability conditions for the same scheme applied to equations with variable coefficients. For example, the Lax–Friedrichs scheme applied to $u_t + a(t, x)u_x = 0$ is
$$v_m^{n+1} = \tfrac{1}{2}\left(v_{m+1}^n + v_{m-1}^n\right) - \tfrac{1}{2}a(t_n, x_m)\,\lambda\left(v_{m+1}^n - v_{m-1}^n\right). \qquad (2.3.1)$$
The stability condition for this scheme is that $|a(t_n, x_m)|\lambda \le 1$ be satisfied for all values of $(t_n, x_m)$ in the domain of computation.
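A sketch of how (2.3.1) and the frozen-coefficient check might look in code (ours; the periodic boundary and the particular coefficient, taken from Exercise 2.3.3 below, are assumptions made for illustration):

```python
import numpy as np

def lax_friedrichs_step(v, a_vals, lam):
    """One step of scheme (2.3.1) on a periodic grid; a_vals holds
    a(t_n, x_m) at the grid points.  Periodicity is assumed here only
    to keep the sketch free of boundary conditions."""
    vp, vm = np.roll(v, -1), np.roll(v, 1)          # v_{m+1}, v_{m-1}
    return 0.5 * (vp + vm) - 0.5 * a_vals * lam * (vp - vm)

# Frozen-coefficient stability check: |a(t_n, x_m)| * lambda <= 1 everywhere.
x = np.linspace(-1.0, 3.0, 41)
a_vals = 1.0 + 0.25 * (3.0 - x) * (1.0 + x)   # coefficient of Exercise 2.3.3
lam = 0.8
print("stable everywhere:", np.all(np.abs(a_vals) * lam <= 1.0))  # False here
```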
The general procedure is that one considers each of the frozen coefficient problems
arising from the scheme. The frozen coefficient problems are the constant coefficient
problems obtained by fixing the coefficients at their values attained at each point in the
domain of the computation. If each frozen coefficient problem is stable, then the variable
coefficient problem is also stable. The proof of this result is beyond the scope of this text;
the interested reader may wish to refer to the works of Kreiss [32], Lax and Nirenberg [36],
Michelson [41], Shintani and Toemeda [56], Yamaguti and Nogi [70], and Wade [67].
If the stability condition as obtained from the frozen coefficient problems is violated
in a small region, the instability phenomena that arise will originate in that area and will
not grow outside that area; see Exercise 2.3.3.

Numerical Stability and Dynamic Stability


The term stability is used in a number of contexts in applied mathematics and engineering,
and it is important to distinguish between the several uses of this term. The stability
of Definition 1.5.1 can be called the numerical stability of finite difference schemes. In
applied mathematics it is common to study dynamic stability, which refers to the property
of a system in which small variations from a reference state will decay, or at least not grow,
with time. Dynamic stability refers to the behavior of solutions as time extends to infinity,
whereas the numerical stability of a scheme always refers to the behavior of solutions over
a finite interval of time as the grid is refined.
To compare these two concepts, consider the equation

$$u_t + au_x + bu = 0 \qquad (2.3.2)$$

for x in R and t > 0. If the value of b is positive, then the equation can be said to be
dynamically stable since any solution will decay as t increases. If b is negative, then
it is dynamically unstable, since solutions grow without bound as t increases. (See the
discussion relating to equation (1.1.3) to verify these assertions.) For a finite difference
scheme for (2.3.2), the numerical stability is independent of the value of b, as shown by
Theorem 2.2.3. One can use any convergent scheme to compute solutions to (2.3.2) for
any value of b; however, a numerically unstable scheme applied to a dynamically stable
equation will not compute convergent solutions.
Exercises
2.3.1. Use the unstable forward-time forward-space scheme (1.3.1) for $u_t + u_x = 0$ with the initial data
$$u_0(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{otherwise} \end{cases}$$
on the interval [−1, 3] for 0 ≤ t ≤ 1. Use a grid spacing of 0.1 and λ equal to 0.8. Demonstrate that the instability grows by approximately |g(π)| per time step. Comment on the appearance of the graph of $v_m^n$ as a function of m. Use the boundary condition u(t, −1) = 0 at the left boundary and use $v_M^{n+1} = v_{M-1}^{n+1}$ at the right boundary.
2.3.2. Use the unstable forward-time central-space scheme (1.3.3) for $u_t + u_x = 0$ with the following two sets of initial data on the interval [−1, 3] for 0 ≤ t ≤ 4:
$$\text{(a)}\quad u_0(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{otherwise} \end{cases}$$
$$\text{(b)}\quad u_0(x) = \sin x.$$
Use a grid spacing of 0.1 and λ equal to 0.8. Demonstrate that the instability is evident sooner with the less smooth initial data (a) than it is for the smooth data (b). Show that the growth in the instability for each case is approximately |g(π/2)|. For (a) use the boundary condition u(t, −1) = 0, and for (b) use the boundary condition u(t, −1) = −sin(1 + t). Use $v_M^{n+1} = v_{M-1}^{n+1}$ at the right boundary.
2.3.3. Solve the initial value problem for the equation
$$u_t + \left(1 + \tfrac{1}{4}(3 - x)(1 + x)\right)u_x = 0$$
on the interval [−1, 3] with the Lax–Friedrichs scheme (2.3.1) with λ equal to 0.8. Demonstrate that the instability phenomena occur where |a(t, x)λ| is greater than 1 and where there are discontinuities in the solution. Use the same initial data as in Exercise 2.3.1. Specify the solution to be 0 at both boundaries. Compute up to the time of 0.2 and use successively smaller values of h to show the location of the instability.

Chapter 3

Order of Accuracy of Finite Difference Schemes

In this chapter we study schemes based on how accurately they approximate partial differential equations. We present the Lax–Wendroff and Crank–Nicolson schemes, both of which are second-order accurate schemes. A convenient method for deriving higher order accurate schemes, as well as a convenient notation, is provided by the symbolic difference calculus. We also discuss the effect of boundary conditions on the stability of schemes. The chapter closes by presenting the Thomas algorithm for solving the tridiagonal systems that arise from implicit schemes.

3.1 Order of Accuracy


In the previous two chapters we classified schemes as acceptable or not acceptable only
on the basis of whether or not they are convergent. This, via the Lax–Richtmyer equiva-
lence theorem, led us to consider stability and consistency. However, different convergent
schemes may differ considerably in how well their solutions approximate the solution of the
differential equation. This may be seen by comparing Figures 1.3.6 and 1.3.8, which show
solutions computed with the Lax–Friedrichs and leapfrog schemes. Both of these schemes
are convergent for λ equal to 0.8, yet the leapfrog scheme has a solution that is closer to the
solution of the differential equation than does the Lax–Friedrichs scheme. In this section
we define the order of accuracy of a scheme, which can be regarded as an extension of the
definition of consistency. The leapfrog scheme has a higher order of accuracy than does
the Lax–Friedrichs scheme, and thus, in general, its solutions will be more accurate than
those of the Lax–Friedrichs scheme. The proof that schemes with higher order of accuracy
generally produce more accurate solutions is in Chapter 10.
Before defining the order of accuracy of a scheme, we introduce two schemes, which,
as we will show, are more accurate than most of the schemes we have presented so far. We
will also have to pay more attention to the way the forcing function, f (t, x), is incorporated
into the scheme.

The Lax–Wendroff Scheme


To derive the Lax–Wendroff scheme [37] for the one-way wave equation, we begin by using
the Taylor series in time for u(t + k, x), where u is a solution to the inhomogeneous

one-way wave equation (1.1.1),
$$u(t + k, x) = u(t, x) + ku_t(t, x) + \frac{k^2}{2}u_{tt}(t, x) + O(k^3).$$
We now use the differential equation that u satisfies,
$$u_t = -au_x + f,$$
and the relation
$$u_{tt} = -au_{tx} + f_t = a^2u_{xx} - af_x + f_t$$
to obtain
$$u(t + k, x) = u(t, x) - ak\,u_x(t, x) + \frac{a^2k^2}{2}u_{xx}(t, x) + kf - \frac{ak^2}{2}f_x + \frac{k^2}{2}f_t + O(k^3).$$
Replacing the derivatives in x by second-order accurate differences and $f_t$ by a forward difference, we obtain
$$\begin{aligned} u(t + k, x) = {} & u(t, x) - ak\,\frac{u(t, x + h) - u(t, x - h)}{2h} \\ & + \frac{a^2k^2}{2}\,\frac{u(t, x + h) - 2u(t, x) + u(t, x - h)}{h^2} \\ & + \frac{k}{2}\left[f(t + k, x) + f(t, x)\right] - \frac{ak^2}{2}\,\frac{f(t, x + h) - f(t, x - h)}{2h} \\ & + O(kh^2) + O(k^3). \end{aligned}$$
This gives the Lax–Wendroff scheme
$$v_m^{n+1} = v_m^n - \frac{a\lambda}{2}\left(v_{m+1}^n - v_{m-1}^n\right) + \frac{a^2\lambda^2}{2}\left(v_{m+1}^n - 2v_m^n + v_{m-1}^n\right) + \frac{k}{2}\left(f_m^{n+1} + f_m^n\right) - \frac{ak\lambda}{4}\left(f_{m+1}^n - f_{m-1}^n\right), \qquad (3.1.1)$$
or, equivalently,
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+1}^n - v_{m-1}^n}{2h} - \frac{a^2k}{2}\,\frac{v_{m+1}^n - 2v_m^n + v_{m-1}^n}{h^2} = \frac{1}{2}\left(f_m^{n+1} + f_m^n\right) - \frac{ak}{4h}\left(f_{m+1}^n - f_{m-1}^n\right). \qquad (3.1.2)$$
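A direct implementation of the scheme is short. The sketch below (ours, not the book's) advances (3.1.1) with f = 0 on a periodic grid; the small final error for smooth data reflects the second-order accuracy established later in this section:

```python
import numpy as np

def lax_wendroff_step(v, alam):
    """One step of the Lax-Wendroff scheme (3.1.1) with f = 0 on a
    periodic grid; alam is the product a*lambda = a*k/h."""
    vp, vm = np.roll(v, -1), np.roll(v, 1)   # v_{m+1}^n and v_{m-1}^n
    return v - 0.5 * alam * (vp - vm) + 0.5 * alam**2 * (vp - 2 * v + vm)

# Advect a smooth profile once around the periodic domain (a = 1).
m = 200
x = np.linspace(-1.0, 1.0, m, endpoint=False)
v = np.sin(2 * np.pi * x)
alam, h = 0.8, x[1] - x[0]           # so k = 0.8 * h
steps = round(2.0 / (alam * h))      # time to traverse one period
for _ in range(steps):
    v = lax_wendroff_step(v, alam)
print(np.max(np.abs(v - np.sin(2 * np.pi * x))))  # small, O(h^2)
```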
Figure 3.1 shows a comparison of the Lax–Wendroff and Lax–Friedrichs schemes for the computation used in Example 1.3.1. The solution for the Lax–Wendroff
scheme is shown with circles; it is the one that has the greater maximum. In general, the
solution of the Lax–Wendroff scheme is closer to the exact solution, which is also shown.
Notice that the solution to the Lax–Wendroff scheme goes below the x-axis, while the
solution of the Lax–Friedrichs scheme is always on or above the axis.
Figure 3.1. Comparison of the Lax–Wendroff and Lax–Friedrichs schemes.

The Crank–Nicolson Scheme


The Crank–Nicolson scheme is obtained by differencing the one-way wave equation (1.1.1) about the point (t + k/2, x) to obtain second-order accuracy. We begin with the formula
$$u_t\left(t + \tfrac{1}{2}k, x\right) = \frac{u(t + k, x) - u(t, x)}{k} + O(k^2).$$
We also use the relation
$$\begin{aligned} u_x\left(t + \tfrac{1}{2}k, x\right) &= \frac{u_x(t + k, x) + u_x(t, x)}{2} + O(k^2) \\ &= \frac{1}{2}\left[\frac{u(t + k, x + h) - u(t + k, x - h)}{2h} + \frac{u(t, x + h) - u(t, x - h)}{2h}\right] + O(k^2) + O(h^2). \end{aligned}$$
Using these approximations for $u_t + au_x = f$ about (t + k/2, x), we obtain
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+1}^{n+1} - v_{m-1}^{n+1} + v_{m+1}^n - v_{m-1}^n}{4h} = \frac{f_m^{n+1} + f_m^n}{2} \qquad (3.1.3)$$
or, equivalently,
$$\frac{a\lambda}{4}v_{m+1}^{n+1} + v_m^{n+1} - \frac{a\lambda}{4}v_{m-1}^{n+1} = -\frac{a\lambda}{4}v_{m+1}^n + v_m^n + \frac{a\lambda}{4}v_{m-1}^n + \frac{k}{2}\left(f_m^{n+1} + f_m^n\right). \qquad (3.1.4)$$
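Each step of (3.1.4) requires solving a linear system for the values at level n + 1. The sketch below (ours) sets up that system on a periodic grid; a dense solve is used purely for brevity, whereas in practice one uses the tridiagonal (Thomas) algorithm presented in Section 3.5:

```python
import numpy as np

def crank_nicolson_step(v, alam):
    """One step of (3.1.4) with f = 0 on a periodic grid of M points.
    A dense solve keeps the sketch short; the Thomas algorithm of
    Section 3.5 is the practical choice for nonperiodic boundaries."""
    M = len(v)
    A = np.eye(M)
    c = alam / 4.0
    for m in range(M):
        A[m, (m + 1) % M] += c      # coefficient of v_{m+1}^{n+1}
        A[m, (m - 1) % M] -= c      # coefficient of v_{m-1}^{n+1}
    rhs = v - c * (np.roll(v, -1) - np.roll(v, 1))
    return np.linalg.solve(A, rhs)
```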

A comparison of the Crank–Nicolson scheme and the backward-time central-space scheme (1.6.1) is given in Figure 3.2.
Figure 3.2. Comparison of two implicit schemes.

The exact solution is a sawtooth curve and is shown in the figure. The solution of the Crank–Nicolson scheme is shown with circles and is the solution closer to the exact solution. The solution of the backward-time central-space scheme is shown with squares marking the discrete points. In general, the Crank–Nicolson scheme has more accurate solutions than does the backward-time central-space scheme.
As we see from these two schemes that we have derived, a scheme for the partial differential equation $Pu = f$ can be written in general as $P_{k,h}v = R_{k,h}f$ in a natural way, where each expression $P_{k,h}v$ and $R_{k,h}f$ evaluated at a grid point $(t_n, x_m)$ involves only a finite sum of terms involving $v_{m'}^{n'}$ or $f_{m'}^{n'}$, respectively. We are now able to give our first definition of the order of accuracy of a scheme.
Definition 3.1.1. A scheme $P_{k,h}v = R_{k,h}f$ that is consistent with the differential equation $Pu = f$ is accurate of order p in time and order q in space if for any smooth function φ(t, x),
$$P_{k,h}\phi - R_{k,h}P\phi = O(k^p) + O(h^q). \qquad (3.1.5)$$
We say that such a scheme is accurate of order (p, q).

If we compare this definition with Definition 1.4.2, we see that consistency requires only that $P_{k,h}\phi - P\phi$ be o(1), whereas Definition 3.1.1 takes into consideration the more detailed information on this convergence. The operator $R_{k,h}$ is required to be an approximation of the identity operator by the requirement that $P_{k,h}$ be consistent with P. The quantity $P_{k,h}\phi - R_{k,h}P\phi$ is called the truncation error of the scheme.

Example 3.1.1. We illustrate the use of this definition by showing that the Lax–Wendroff scheme (3.1.2) is accurate of order (2, 2). We have, from (3.1.2),
$$P_{k,h}\phi = \frac{\phi_m^{n+1} - \phi_m^n}{k} + a\,\frac{\phi_{m+1}^n - \phi_{m-1}^n}{2h} - \frac{a^2k}{2}\,\frac{\phi_{m+1}^n - 2\phi_m^n + \phi_{m-1}^n}{h^2} \qquad (3.1.6)$$
and
$$R_{k,h}f = \frac{1}{2}\left(f_m^{n+1} + f_m^n\right) - \frac{ak}{4h}\left(f_{m+1}^n - f_{m-1}^n\right). \qquad (3.1.7)$$
As before, we use the Taylor series on (3.1.6) evaluated at $(t_n, x_m)$ to obtain
$$P_{k,h}\phi = \phi_t + \frac{k}{2}\phi_{tt} + a\phi_x - \frac{a^2k}{2}\phi_{xx} + O(k^2) + O(h^2). \qquad (3.1.8)$$
For a smooth function f(t, x), (3.1.7) becomes
$$R_{k,h}f = f + \frac{k}{2}f_t - \frac{ak}{2}f_x + O(k^2) + O(h^2),$$
and if $f = \phi_t + a\phi_x = P\phi$, this is
$$\begin{aligned} R_{k,h}P\phi &= \phi_t + a\phi_x + \frac{k}{2}\phi_{tt} + \frac{ak}{2}\phi_{xt} - \frac{ak}{2}\phi_{xt} - \frac{a^2k}{2}\phi_{xx} + O(k^2) + O(h^2) \\ &= \phi_t + a\phi_x + \frac{k}{2}\phi_{tt} - \frac{a^2k}{2}\phi_{xx} + O(k^2) + O(h^2), \end{aligned}$$
which agrees with (3.1.8) to $O(k^2) + O(h^2)$. Hence the Lax–Wendroff scheme (3.1.2) is accurate of order (2, 2).

We also see from this analysis that the Lax–Wendroff scheme with $R_{k,h}f_m^n = f_m^n$, i.e.,
$$v_m^{n+1} = v_m^n - \frac{a\lambda}{2}\left(v_{m+1}^n - v_{m-1}^n\right) + \frac{a^2\lambda^2}{2}\left(v_{m+1}^n - 2v_m^n + v_{m-1}^n\right) + kf_m^n, \qquad (3.1.9)$$
is accurate of order (1, 2).
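The Taylor series verification can be mimicked numerically: evaluate $P_{k,h}\phi - R_{k,h}P\phi$ for a specific smooth φ and watch it decrease under refinement. The sketch below (ours; the test function and evaluation point are arbitrary choices) does this for (3.1.6) and (3.1.7):

```python
import numpy as np

# Numerical check (ours) of the Lax-Wendroff truncation error, using the
# smooth test function phi(t, x) = sin(x + 2t), for which f = phi_t + a*phi_x.
a, lam = 1.0, 0.8
phi = lambda t, x: np.sin(x + 2.0 * t)
f   = lambda t, x: (2.0 + a) * np.cos(x + 2.0 * t)

for h in (0.1, 0.05, 0.025):
    k, t, x = lam * h, 0.3, 0.7            # sample evaluation point
    # P_{k,h} phi, from (3.1.6):
    Pphi = ((phi(t + k, x) - phi(t, x)) / k
            + a * (phi(t, x + h) - phi(t, x - h)) / (2 * h)
            - 0.5 * a**2 * k * (phi(t, x + h) - 2 * phi(t, x) + phi(t, x - h)) / h**2)
    # R_{k,h} f, from (3.1.7):
    Rf = 0.5 * (f(t + k, x) + f(t, x)) - a * k / (4 * h) * (f(t, x + h) - f(t, x - h))
    print(f"h = {h}: truncation error = {abs(Pphi - Rf):.2e}")
# The values decrease by roughly a factor of 4 when h is halved,
# consistent with second-order accuracy.
```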


Notice that to determine the order of accuracy we use the form (3.1.2) of the Lax–Wendroff scheme rather than (3.1.1), which is derived from (3.1.2) by multiplying by k and rearranging the terms. Without an appropriate normalization, in this case demanding that $P_{k,h}u$ be consistent with Pu, we can get incorrect results by multiplying the scheme by powers of k or h. An equivalent normalization is that $R_{k,h}$ applied to the function that is 1 everywhere gives the result 1, i.e.,
$$R_{k,h}1 = 1. \qquad (3.1.10)$$

Definition 3.1.1 is not completely satisfactory. For example, it cannot be applied to the Lax–Friedrichs scheme, which contains the term $k^{-1}h^2\phi_{xx}$ in the Taylor series expansion of $P_{k,h}\phi$. We therefore give the following definition, which is more generally applicable. We assume that the time step is chosen as a function of the space step, i.e., $k = \Lambda(h)$, where Λ is a smooth function of h and Λ(0) = 0.

Definition 3.1.2. A scheme $P_{k,h}v = R_{k,h}f$ with $k = \Lambda(h)$ that is consistent with the differential equation $Pu = f$ is accurate of order r if for any smooth function φ(t, x),
$$P_{k,h}\phi - R_{k,h}P\phi = O(h^r).$$


If we take $\Lambda(h) = \lambda h$, then the Lax–Friedrichs scheme (1.3.5) is consistent with the one-way wave equation according to Definition 3.1.2.

Symbols of Difference Schemes


Another useful way of checking for the accuracy of a scheme is by comparing the symbols
of the difference scheme to the symbol of the differential operator. Using the symbol is
often a more convenient method than that given in Definitions 3.1.1 and 3.1.2.

Definition 3.1.3. The symbol $p_{k,h}(s, \xi)$ of a difference operator $P_{k,h}$ is defined by
$$P_{k,h}\left(e^{skn}e^{imh\xi}\right) = p_{k,h}(s, \xi)\,e^{skn}e^{imh\xi}.$$
That is, the symbol is the quantity multiplying the grid function $e^{skn}e^{imh\xi}$ after operating on this function with the difference operator.
As an example, for the Lax–Wendroff operator we have
$$p_{k,h}(s, \xi) = \frac{e^{sk} - 1}{k} + \frac{ia}{h}\sin h\xi + \frac{2a^2k}{h^2}\sin^2\tfrac{1}{2}h\xi$$
and
$$r_{k,h}(s, \xi) = \frac{1}{2}\left(e^{sk} + 1\right) - \frac{iak}{2h}\sin h\xi.$$
The normalization (3.1.10) means
$$r_{k,h}(0, 0) = 1.$$

Definition 3.1.4. The symbol p(s, ξ) of the differential operator P is defined by
$$P\left(e^{st}e^{i\xi x}\right) = p(s, \xi)\,e^{st}e^{i\xi x}.$$
That is, the symbol is the quantity multiplying the function $e^{st}e^{i\xi x}$ after operating on this function with the differential operator.
In checking the accuracy of a scheme by using Taylor series and Definition 3.1.1, it is
seen that the derivatives of φ serve primarily as arbitrary coefficients for the polynomials
in h and k. The powers of the dual variables s and ξ can also serve as the coefficients
of h and k in the definition of accuracy, as the following theorem states.

Theorem 3.1.1. A scheme $P_{k,h}v = R_{k,h}f$ that is consistent with $Pu = f$ is accurate of order (p, q) if and only if for each value of s and ξ,
$$p_{k,h}(s, \xi) - r_{k,h}(s, \xi)\,p(s, \xi) = O(k^p) + O(h^q), \qquad (3.1.11)$$
or equivalently,
$$\frac{p_{k,h}(s, \xi)}{r_{k,h}(s, \xi)} - p(s, \xi) = O(k^p) + O(h^q). \qquad (3.1.12)$$
Proof. By consistency we have for each smooth function φ that
$$P_{k,h}\phi - P\phi$$
tends to zero as h and k tend to zero; see Definition 1.4.2. Taking
$$\phi(t, x) = e^{st}e^{i\xi x},$$
we have that
$$p_{k,h}(s, \xi) - p(s, \xi) = o(1) \qquad (3.1.13)$$
for each (s, ξ).
From Definition 3.1.1 for the order of accuracy and using this same function for φ(t, x), we have, by the definition of the symbol, that
$$p_{k,h}(s, \xi) - r_{k,h}(s, \xi)\,p(s, \xi) = O(k^p) + O(h^q),$$
which is (3.1.11). Hence from (3.1.13) and (3.1.11) we have that
$$r_{k,h}(s, \xi) = 1 + o(1), \qquad (3.1.14)$$
and by dividing (3.1.11) by $r_{k,h}(s, \xi)$, we obtain (3.1.12).
To show that (3.1.12) implies (3.1.5), we again have by consistency that (3.1.14) holds, and hence (3.1.11) holds also. To obtain the Taylor series expansion for $P_{k,h}\phi$, we note that if
$$p_{k,h}(s, \xi) = \sum_{\ell, j \ge 0} A_{\ell,j}(k, h)\, s^{\ell}(i\xi)^j,$$
then
$$P_{k,h}\phi = \sum_{\ell, j \ge 0} A_{\ell,j}(k, h)\,\frac{\partial^{\ell+j}\phi}{\partial t^{\ell}\,\partial x^j}.$$
Therefore, (3.1.5) follows from (3.1.12).

Corollary 3.1.2. A scheme $P_{k,h}v = R_{k,h}f$ with $k = \Lambda(h)$ that is consistent with $Pu = f$ is accurate of order r if and only if for each value of s and ξ,
$$\frac{p_{k,h}(s, \xi)}{r_{k,h}(s, \xi)} - p(s, \xi) = O(h^r). \qquad (3.1.15)$$

In practice, the form (3.1.11) is often more convenient than is (3.1.12) or (3.1.15) for
showing the order of accuracy.
In Chapter 10 we show that if a scheme is accurate of order r, then the finite difference
solution converges to the solution of the differential equation with the same order, provided
that the initial data are sufficiently smooth.
Example 3.1.2. As an example of using Theorem 3.1.1, we prove that the Crank–Nicolson scheme (3.1.3) is accurate of order (2, 2). From (3.1.3) we have that
$$p_{k,h}(s, \xi) = \frac{e^{sk} - 1}{k} + ia\,\frac{e^{sk} + 1}{2}\,\frac{\sin h\xi}{h}$$
and
$$r_{k,h}(s, \xi) = \frac{e^{sk} + 1}{2}.$$
The left-hand side of (3.1.11) for this case is
$$\frac{e^{sk} - 1}{k} + ia\,\frac{e^{sk} + 1}{2}\,\frac{\sin h\xi}{h} - \frac{e^{sk} + 1}{2}(s + ia\xi). \qquad (3.1.16)$$
We could use Taylor series expansions on this expression, but the work is reduced if we first multiply (3.1.16) by $e^{-sk/2}$. Since $e^{-sk/2}$ is O(1), multiplying by it will not affect the determination of accuracy. We then have
$$\frac{e^{sk/2} - e^{-sk/2}}{k} + ia\,\frac{e^{sk/2} + e^{-sk/2}}{2}\,\frac{\sin h\xi}{h} - \frac{e^{sk/2} + e^{-sk/2}}{2}(s + ia\xi). \qquad (3.1.17)$$
The Taylor series expansions of the different expressions are then
$$\frac{e^{sk/2} - e^{-sk/2}}{k} = s + \frac{s^3k^2}{24} + O\left(k^4\right),$$
$$\frac{e^{sk/2} + e^{-sk/2}}{2} = 1 + \frac{s^2k^2}{8} + O\left(k^4\right),$$
and
$$\frac{\sin h\xi}{h} = \xi - \frac{\xi^3h^2}{6} + O\left(h^4\right).$$
Substituting these expansions in (3.1.17) we obtain
$$\begin{aligned} s + ia\xi &+ \frac{s^3k^2}{24} + ia\,\frac{s^2\xi k^2}{8} - ia\,\frac{\xi^3h^2}{6} - \left(1 + \frac{s^2k^2}{8}\right)(s + ia\xi) + O\left(k^4 + h^4 + k^2h^2\right) \\ &= -\frac{k^2s^3}{12} - ia\,\frac{\xi^3h^2}{6} + O\left(k^4 + h^4 + k^2h^2\right) \\ &= O\left(k^2\right) + O\left(h^2\right). \end{aligned}$$
Thus, the Crank–Nicolson scheme is accurate of order (2, 2).


Using Taylor series expansions directly on (3.1.16) instead of (3.1.17) would have
resulted in terms of order h and k in the expansion. These terms would have all canceled
out, giving the same order of accuracy. Working with equation (3.1.17) greatly reduces the
amount of algebraic manipulation that must be done to check the order of accuracy. Similar
techniques can be used on other schemes.
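These expansions can also be delegated to a computer algebra system. A sketch (ours) using sympy to expand the left-hand side of (3.1.16) for the Crank–Nicolson scheme:

```python
import sympy as sp

s, xi, a, k, h = sp.symbols('s xi a k h')
p_kh = (sp.exp(s*k) - 1)/k + sp.I*a*(sp.exp(s*k) + 1)/2*sp.sin(h*xi)/h
r_kh = (sp.exp(s*k) + 1)/2
p = s + sp.I*a*xi

# Expand (3.1.16) in k and then in h, keeping terms below third order.
expr = p_kh - r_kh*p
expanded = expr.series(k, 0, 3).removeO().series(h, 0, 3).removeO()
print(sp.expand(expanded))
# Every surviving term carries a factor k**2, h**2, or k*h**2,
# confirming that the scheme is accurate of order (2, 2).
```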

Order of Accuracy for Homogeneous Equations


For many initial value problems one is concerned only with the homogeneous equation
P u = 0 rather than the inhomogeneous equation P u = f. In this case one can determine
the order of accuracy without explicit knowledge of the operator Rk,h . We now show how
this is done. It is important to make sure that our treatment of this topic applies to schemes
for systems of differential equations as well as to single equations.
We begin by extending the set of symbols we have been using. Thus far we have con-
sidered symbols of finite difference schemes and symbols of partial differential operators,
but we will find it convenient to extend the class of symbols.

Definition 3.1.5. A symbol a(s, ξ) is an infinitely differentiable function defined for complex values of s with Re s ≥ c for some constant c and for all real values of ξ.

This definition includes as symbols not only the symbols of differential operators and finite difference operators, but also many other functions. Symbols of differential operators are polynomials in s and ξ, and symbols of difference operators are polynomials in $e^{ks}$ with coefficients that are either polynomials or rational functions of $e^{ih\xi}$.

Definition 3.1.6. A symbol a(s, ξ) is congruent to zero modulo a symbol p(s, ξ), written
$$a(s, \xi) \equiv 0 \mod p(s, \xi),$$
if there is a symbol b(s, ξ) such that
$$a(s, \xi) = b(s, \xi)\,p(s, \xi).$$
We also write
$$a(s, \xi) \equiv c(s, \xi) \mod p(s, \xi)$$
if
$$a(s, \xi) - c(s, \xi) \equiv 0 \mod p(s, \xi).$$
We can now define the order of accuracy for homogeneous equations.

Theorem 3.1.3. A scheme $P_{k,h}v = 0$, with $k = \Lambda(h)$, that is consistent with the equation Pu = 0 is accurate of order r if
$$p_{k,h}(s, \xi) \equiv O(h^r) \mod p(s, \xi). \qquad (3.1.18)$$


Proof. By Definition 3.1.6 the relation (3.1.18) holds if and only if there is a symbol $\tilde r_{k,h}(s, \xi)$ such that
$$p_{k,h}(s, \xi) - \tilde r_{k,h}(s, \xi)\,p(s, \xi) = O(h^r).$$
Since p(s, ξ) is a linear polynomial in s with coefficients that are polynomials in ξ and since $p_{k,h}(s, \xi)$ is essentially a polynomial in $e^{sk}$ with coefficients that are rational functions of $e^{ih\xi}$, it is not difficult to show that there is a symbol $r_{k,h}(s, \xi)$ such that
$$r_{k,h}(s, \xi) \equiv \tilde r_{k,h}(s, \xi) + O(h^r)$$
and $r_{k,h}(s, \xi)$ is a polynomial in $e^{sk}$ with coefficients that are rational functions of $e^{ih\xi}$. The replacement of $\tilde r_{k,h}(s, \xi)$ by $r_{k,h}(s, \xi)$ is not strictly necessary for the proof, but it is important from the point of view of constructing an actual difference operator $R_{k,h}$ whose symbol is $r_{k,h}(s, \xi)$ and that can actually be used in computation.
If we wish to use the Taylor series method of Definition 3.1.1 for checking the accuracy of homogeneous equations, then we can proceed in a way analogous to Definition 3.1.6 and Theorem 3.1.3. Equivalently, we can show that if
$$P_{k,h}\phi = O(h^r)$$
for each formal solution to Pφ = 0, then the scheme is accurate of order r. By saying a formal solution, we emphasize that we do not require knowledge of the existence of solutions or of the smoothness of the solution; we merely use the relation Pφ = 0 in evaluating $P_{k,h}\phi$. As an example, for the Lax–Wendroff scheme for the homogeneous equation (1.1.1), we have
$$\begin{aligned} \phi(t + k, x) &= \phi(t, x) + k\phi_t(t, x) + \frac{k^2}{2}\phi_{tt}(t, x) + O(k^3) \\ &= \phi(t, x) + k\left[-a\phi_x(t, x)\right] + \frac{k^2}{2}a^2\phi_{xx} + O(k^3) \\ &= \phi(t, x) - \frac{a\lambda}{2}\left[\phi(t, x + h) - \phi(t, x - h)\right] \\ &\quad + \frac{a^2\lambda^2}{2}\left[\phi(t, x + h) - 2\phi(t, x) + \phi(t, x - h)\right] + O(k^3) + O(kh^2). \end{aligned}$$
Using this last expression in the formula for $P_{k,h}$ given by (3.1.6) we see that the Lax–Wendroff scheme is second-order accurate. In this derivation we have used the relations
$$\phi_t = -a\phi_x$$
and
$$\phi_{tt} = -a\phi_{xt} = a^2\phi_{xx}.$$
From the preceding expression we obtain the scheme (3.1.1) without the terms involving f.
As is seen in Chapter 10, even for the homogeneous initial value problem it is important to know that the symbol $r_{k,h}(s, \xi)$ exists in order to prove that the proper order of convergence is attained.
We use symbols to prove the following theorem, proved by Harten, Hyman, and Lax [29], about schemes for the one-way wave equation and other hyperbolic equations.

Theorem 3.1.4. An explicit one-step scheme for hyperbolic equations that has the form
$$v_m^{n+1} = \sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\,v_{m+\ell}^n \qquad (3.1.19)$$
for homogeneous problems can be at most first-order accurate if all the coefficients $\alpha_{\ell}$ are nonnegative, except for the trivial schemes for the one-way wave equation with $a\lambda = \ell$, where ℓ is an integer, given by
$$v_m^{n+1} = v_{m-\ell}^n. \qquad (3.1.20)$$

Proof. We prove the theorem only for the one-way wave equation (1.1.1). As shown in the discussion of Section 1.1, this is sufficient for the general case. The symbol of the scheme (3.1.19) is
$$p_{k,h}(s, \xi) = \frac{e^{sk} - \sum\alpha_{\ell}\,e^{i\ell h\xi}}{k}.$$
If we allow for a right-hand-side symbol $r_{k,h}(s, \xi) = 1 + O(k) + O(h)$, the accuracy of the scheme is determined by considering the expression
$$\frac{e^{sk} - \sum\alpha_{\ell}\,e^{i\ell h\xi}}{k} - \left(1 + O(k) + O(h)\right)(s + ia\xi).$$
If this expression is to be bounded as k tends to 0, we must have that this expression is finite when s and ξ are 0. This implies that
$$\sum_{\ell=-\infty}^{\infty}\alpha_{\ell} = 1. \qquad (3.1.21)$$

The terms in s to the first power agree, and the coefficients of $ks^2$ will cancel only if
$$r_{k,h} = 1 + \tfrac{1}{2}sk + O(\xi k) + O(h).$$
The only occurrence of terms with the monomial $s\xi k$ appears in the product of $r_{k,h}$ with $s + ia\xi$, and these will cancel only if $r_{k,h} = 1 + \tfrac{1}{2}k(s - ia\xi) + O(h)$. Moreover, the term O(h) must actually be $O(h^2)$, since there is no term of the form sh coming from the symbol of the scheme. The terms to the first power of ξ are
$$-i\,\frac{h}{k}\sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\,\ell - ia,$$
and this expression must be zero if the scheme is to be first-order accurate. This gives the relation
$$\sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\,\ell = -a\lambda. \qquad (3.1.22)$$
Next consider the terms that are the coefficients of $\xi^2$. They are
$$-\frac{h^2}{2k}\sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\,\ell^2 + \frac{ka^2}{2}.$$
To have second-order accuracy this expression must also be zero, giving
$$\sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\,\ell^2 = a^2\lambda^2. \qquad (3.1.23)$$

We now use the Cauchy–Schwarz inequality on these three relations (3.1.21), (3.1.22), and (3.1.23) for the coefficients of the scheme. We have, starting with (3.1.22),
$$|a\lambda| = \left|\sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\,\ell\right| = \left|\sum_{\ell=-\infty}^{\infty}\sqrt{\alpha_{\ell}}\,\sqrt{\alpha_{\ell}}\,\ell\right| \le \left(\sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\right)^{1/2}\left(\sum_{\ell=-\infty}^{\infty}\alpha_{\ell}\,\ell^2\right)^{1/2} = |a\lambda|.$$
Since the first and last expressions in this string of inequalities and equalities are the same, it follows that all the expressions are equal. However, the Cauchy–Schwarz inequality is an equality only if all the terms with the same index are proportional. This means there must be a constant c such that
$$\sqrt{\alpha_{\ell}}\,\ell = c\,\sqrt{\alpha_{\ell}} \quad \text{for all } \ell,$$
and this implies that at most one $\alpha_{\ell}$ is nonzero. It is then easy to check that the only way these relations can be satisfied is if aλ is an integer, and the resulting schemes are the trivial schemes (3.1.20). This proves the theorem.
An examination of equations (3.1.21), (3.1.22), and (3.1.23) shows that the Lax–Wendroff scheme is the explicit one-step second-order accurate scheme that uses the fewest grid points. (See Exercise 3.1.1.)
One consequence of this theorem is that schemes such as we are discussing that are
more than first-order accurate will have oscillatory solutions. For example, as shown in
Figure 3.1 the solution to the Lax–Wendroff scheme goes below the x-axis. This is the
result of some of the coefficients in the scheme (the α$ ) being negative. The Lax–Friedrichs
scheme has all coefficients nonnegative (when |aλ| ≤ 1 ) and it has a positive solution as
illustrated in Figure 3.1.
Schemes of the form (3.1.19) for which all the coefficients are nonnegative are called
monotone schemes. Monotone schemes have the property that the maximum of the solution
does not increase with time and, similarly, the minimum does not decrease. The theorem
says that monotone schemes can be at most first-order accurate.

Order of Accuracy of the Solution


We have spent some time on rigorously defining the order of accuracy of finite difference schemes, and the importance of this concept is that it relates directly to the accuracy of the solutions that are computed using these schemes. The order of accuracy of the solution of a finite difference scheme is a quantity that can be determined by computation. For our purposes here and in the exercises, it is sufficient to define the order of accuracy of the solution of a finite difference scheme as follows. If we have an initial value problem for a partial differential equation with solution u(t, x) and a finite difference scheme, we use the initial data of the differential equation evaluated at the grid points as initial data for the scheme, i.e., $v_m^0 = u(0, x_m)$. We also assume that the time step is a function of the space step, i.e., $k = \Lambda(h)$. We then determine the error at time $t_n = nk$ by
$$\mathrm{Error}(t_n) = \|u(t_n, \cdot) - v^n\|_h = \left(h\sum_m |u(t_n, x_m) - v_m^n|^2\right)^{1/2}, \qquad (3.1.24)$$
where the sum is over all grid points. The order of accuracy of the solution is defined to be that number r, if it exists, such that
$$\mathrm{Error}(t_n) = O(h^r).$$
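A sketch (ours) of how such a computation is organized, applying the forward-time backward-space scheme to $u_t + u_x = 0$ with smooth periodic data and estimating r from successive refinements (the final time and data here are illustrative choices, not those of the tables below):

```python
import numpy as np

# Sketch (ours) of measuring the order of accuracy of the solution via (3.1.24)
# for the forward-time backward-space scheme applied to u_t + u_x = 0 with
# u_0(x) = sin(2*pi*x), periodic on [-1, 1]; lambda = 0.9 as in Table 3.1.1.
lam, T = 0.9, 1.0
prev_err = None
for m in (20, 40, 80, 160):
    h = 2.0 / m
    k = lam * h
    x = np.linspace(-1.0, 1.0, m, endpoint=False)
    v = np.sin(2 * np.pi * x)
    n_steps = int(round(T / k))
    for _ in range(n_steps):
        v = v - lam * (v - np.roll(v, 1))        # forward-time backward-space
    exact = np.sin(2 * np.pi * (x - n_steps * k))
    err = np.sqrt(h * np.sum((exact - v)**2))    # the norm in (3.1.24)
    msg = f"h = 1/{m // 2}   Error = {err:.3e}"
    if prev_err is not None:
        msg += f"   Order = {np.log2(prev_err / err):.2f}"
    print(msg)
    prev_err = err
# The computed orders tend to 1, the order of accuracy of this scheme.
```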

In Chapter 10 it is shown that for smooth initial data, the order of accuracy of the
solution is equal to the order of accuracy of the scheme. Moreover, for those cases in which
the data are not smooth enough for the accuracy of the solution to equal that of the scheme,
it is shown how the order of accuracy of the solution depends on both the order of accuracy
of the scheme and the smoothness of the initial data.
Table 3.1.1 displays results of several computations illustrating the order of accuracy
of solutions of several schemes.
The schemes are applied to a periodic computation to remove all effects of boundary conditions. The value of λ was 0.9 for all computations. Columns 2 and 4 show the error as measured by (3.1.24) for the initial value problem for the one-way wave equation with a = 1 and initial data
$$u_0(x) = \sin(2\pi x) \quad \text{for } -1 \le x \le 1.$$


The error in the solution was measured at time 5.4. The first time step for the leapfrog
scheme was computed with the forward-time central-space scheme.
Notice that the order of the error for the first-order accurate, forward-time backward-
space scheme tends to 1 and that for the second-order accurate leapfrog scheme tends to 2.

Table 3.1.1
Comparison of order of accuracy of solutions.

             Forward-time backward-space      Leapfrog scheme
    h           Error         Order          Error         Order
   1/10        6.584e-1                     5.945e-1
   1/20        4.133e-1       0.672         1.320e-1       2.17
   1/40        2.339e-1       0.821         3.188e-2       2.05
   1/80        1.247e-1       0.907         7.937e-3       2.01
   1/160       6.445e-2       0.953         1.652e-3       2.26

The order of accuracy of the solution, as given here, is dependent on the initial data for the scheme and on the norm. For example, if the error is measured as the maximum value of $|u(t_n, x_m) - v_m^n|$, then the order of accuracy of the solution can be different than, and usually not more than, the order obtained by the preceding definition. This topic is discussed in more detail in Chapter 10.

Table 3.1.2
Comparison of order of accuracy of solutions.

                 Lax–Wendroff                 Lax–Friedrichs
    h           Error         Order          Error         Order
   1/10        1.021e-1                     2.676e-1
   1/20        4.604e-2       1.149         1.791e-1       0.579
   1/40        2.385e-2       0.949         1.120e-1       0.677
   1/80        1.215e-2       0.974         6.718e-2       0.738
   1/160       6.155e-3       0.981         3.992e-2       0.751

Table 3.1.2 displays results of several computations with a solution that is not smooth enough to give the solution the same order of accuracy as that of the scheme. Columns 2 and 4 show the error as measured by (3.1.24) for the initial value problem for the one-way wave equation with a = 1 and initial data
$$u_0(x) = \begin{cases} 1 - 2|x| & \text{if } |x| \le 1/2, \\ 0 & \text{otherwise.} \end{cases}$$

The value of λ was 0.9 for all computations and the error in the solution was measured at time 5.4. For this exact solution, the Lax–Wendroff scheme has solutions that converge with an order of accuracy 1, while the Lax–Friedrichs scheme has solutions with order of accuracy 0.75. Convergence estimates proved in Chapter 10 give the rate of convergence of solutions if the initial data are not smooth.

Exercises
3.1.1. Using equations (3.1.21), (3.1.22), and (3.1.23), show that the Lax–Wendroff scheme
is the only explicit one-step second-order accurate scheme that uses only the grid
points xm−1 , xm , and xm+1 to compute the solution at xm for the next time step.

3.1.2. Solve $u_t + u_x = 0$, $-1 \le x \le 1$, $0 \le t \le 1.2$ with $u(0, x) = \sin 2\pi x$ and periodicity, i.e., $u(t, 1) = u(t, -1)$. Use two methods:
(a) Forward-time backward-space with λ = 0.8.
(b) Lax–Wendroff with λ = 0.8.
Demonstrate the first-order accuracy of the solution of (a) and the second-order accuracy of the solution of (b) using h = 1/10, 1/20, 1/40, and 1/80. Measure the error in the $L^2$ norm (3.1.24) and the maximum norm. (In the error computation, do not sum both grid points at x = −1 and x = 1 as separate points.)

3.1.3. Solve the equation of Exercise 1.1.5,
$$u_t + u_x = -\sin^2 u,$$
with the scheme (3.1.9), treating the $-\sin^2 u$ term as f(t, x). Show that the scheme is first-order accurate. The exact solution is given in Exercise 1.1.5. Use a smooth function, such as sin(x − t), as initial data and boundary data.

3.1.4. Modify the scheme of Exercise 3.1.3 to be second-order accurate and explicit. There are several ways to do this. One way uses
$$\sin^2\left(v_m^{n+1}\right) = \sin^2\left(v_m^n\right) + \sin\left(2v_m^n\right)\left(v_m^{n+1} - v_m^n\right) + O\left(k^2\right).$$
Another way is to evaluate explicitly the $f_t$ term in the derivation of the Lax–Wendroff scheme and eliminate all derivatives with respect to t using the differential equation.

3.1.5. Determine the order of accuracy of the Euler backward scheme in Exercise 2.2.6.

3.1.6. Show that the scheme discussed in Example 2.2.6 has the symbol
$$\frac{e^{sk} - \cos h\xi}{k} + 4ai\,\frac{\sin^2\tfrac{1}{2}h\xi\,\sin h\xi}{h^3}$$
and discuss the accuracy of the scheme.


3.2 Stability of the Lax–Wendroff and Crank–Nicolson Schemes
In this section we demonstrate the stability of the Lax–Wendroff and Crank–Nicolson schemes. The stability analysis of the Lax–Wendroff scheme is informative because similar steps can be used to show the stability of other schemes. From (3.1.1) the Lax–Wendroff scheme for the one-way wave equation is
$$v_m^{n+1} = v_m^n - \frac{a\lambda}{2}\left(v_{m+1}^n - v_{m-1}^n\right) + \frac{a^2\lambda^2}{2}\left(v_{m+1}^n - 2v_m^n + v_{m-1}^n\right).$$
Notice that we set f = 0 as required to obtain the amplification factor. We substitute $g^{n'}e^{im'\theta}$ for $v_{m'}^{n'}$ and then cancel the factor of $g^n e^{im\theta}$, obtaining the following equation for the amplification factor:
$$\begin{aligned} g(\theta) &= 1 - \frac{a\lambda}{2}\left(e^{i\theta} - e^{-i\theta}\right) + \frac{a^2\lambda^2}{2}\left(e^{i\theta} - 2 + e^{-i\theta}\right) \\ &= 1 - ia\lambda\sin\theta - a^2\lambda^2(1 - \cos\theta) \\ &= 1 - 2a^2\lambda^2\sin^2\tfrac{1}{2}\theta - ia\lambda\sin\theta. \end{aligned}$$
To compute the magnitude of g(θ) we compute $|g(\theta)|^2$ by summing the squares of the real and imaginary parts:
$$|g(\theta)|^2 = \left(1 - 2a^2\lambda^2\sin^2\tfrac{1}{2}\theta\right)^2 + (a\lambda\sin\theta)^2. \qquad (3.2.1)$$

To work with these two terms we use the half-angle formula on the imaginary part, obtaining
$$\begin{aligned} |g(\theta)|^2 &= \left(1 - 2a^2\lambda^2\sin^2\tfrac{1}{2}\theta\right)^2 + \left(2a\lambda\sin\tfrac{1}{2}\theta\cos\tfrac{1}{2}\theta\right)^2 \\ &= 1 - 4a^2\lambda^2\sin^2\tfrac{1}{2}\theta + 4a^4\lambda^4\sin^4\tfrac{1}{2}\theta + 4a^2\lambda^2\sin^2\tfrac{1}{2}\theta\cos^2\tfrac{1}{2}\theta. \end{aligned}$$
Notice that two terms have $a^2\lambda^2$ as a factor and one has $a^4\lambda^4$ as a factor. We combine the two terms with $a^2\lambda^2$ first, and then factor the common factors as follows:
$$\begin{aligned} |g(\theta)|^2 &= 1 - 4a^2\lambda^2\sin^2\tfrac{1}{2}\theta\left(1 - \cos^2\tfrac{1}{2}\theta\right) + 4a^4\lambda^4\sin^4\tfrac{1}{2}\theta \\ &= 1 - 4a^2\lambda^2\sin^4\tfrac{1}{2}\theta + 4a^4\lambda^4\sin^4\tfrac{1}{2}\theta \qquad (3.2.2) \\ &= 1 - 4a^2\lambda^2\left(1 - a^2\lambda^2\right)\sin^4\tfrac{1}{2}\theta. \end{aligned}$$

From this form for $|g(\theta)|^2$ we can see that it is less than or equal to 1 only if the quantity to the right of the first minus sign is nonnegative. All the factors except $1 - a^2\lambda^2$ are certainly nonnegative. To ensure that $|g(\theta)|^2$ is always at most 1, we must have this quantity nonnegative; i.e., the Lax–Wendroff scheme is stable if and only if $|a\lambda| \le 1$.
For the Crank–Nicolson scheme from (3.1.3) we have
$$\frac{v_m^{n+1} - v_m^n}{k} + a\,\frac{v_{m+1}^{n+1} - v_{m-1}^{n+1} + v_{m+1}^n - v_{m-1}^n}{4h} = 0,$$
where we have set f = 0 as required in obtaining the amplification factor. Substituting $g^{n'}e^{im'\theta}$ for $v_{m'}^{n'}$ and then canceling, we obtain
$$\frac{g - 1}{k} + a\,\frac{ge^{i\theta} - ge^{-i\theta} + e^{i\theta} - e^{-i\theta}}{4h} = 0,$$
or
$$g - 1 + a\lambda\,\frac{g + 1}{2}\,i\sin\theta = 0,$$
from which we obtain the following expression for the amplification factor:
$$g(\theta) = \frac{1 - i\tfrac{1}{2}a\lambda\sin\theta}{1 + i\tfrac{1}{2}a\lambda\sin\theta}.$$
As the ratio of a complex number and its conjugate we have immediately that $|g(\theta)| = 1$. Alternatively,
$$|g(\theta)|^2 = \frac{1 + \left(\tfrac{1}{2}a\lambda\sin\theta\right)^2}{1 + \left(\tfrac{1}{2}a\lambda\sin\theta\right)^2} = 1.$$
This scheme is stable for any value of λ; it is unconditionally stable.
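Both conclusions are easy to corroborate numerically (our sketch; the sampled values of aλ are arbitrary):

```python
import numpy as np

# Cross-check (ours) of the amplification factors derived above.
theta = np.linspace(-np.pi, np.pi, 1001)
for alam in (0.5, 1.0, 1.1):
    g_lw = 1 - 2*alam**2*np.sin(theta/2)**2 - 1j*alam*np.sin(theta)
    g_cn = (1 - 0.5j*alam*np.sin(theta)) / (1 + 0.5j*alam*np.sin(theta))
    print(f"a*lambda={alam}: max|g_LW|={np.abs(g_lw).max():.4f}, "
          f"max|g_CN|={np.abs(g_cn).max():.4f}")
# Lax-Wendroff stays within the unit circle only for |a*lambda| <= 1;
# Crank-Nicolson has |g| = 1 for every theta and every lambda.
```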

Exercises
3.2.1. Show that the (forward-backward) MacCormack scheme
$$\tilde v_m^{n+1} = v_m^n - a\lambda\left(v_{m+1}^n - v_m^n\right) + kf_m^n,$$
$$v_m^{n+1} = \tfrac{1}{2}\left(v_m^n + \tilde v_m^{n+1} - a\lambda\left(\tilde v_m^{n+1} - \tilde v_{m-1}^{n+1}\right) + kf_m^{n+1}\right)$$
is a second-order accurate scheme for the one-way wave equation (1.1.1). Show that for f = 0 it is identical to the Lax–Wendroff scheme (3.1.1).
3.2.2. Show that the backward-time central-space scheme (1.6.1) is unconditionally stable.
3.2.3. Show that the box scheme
$$\frac{1}{2k}\left[\left(v_m^{n+1} + v_{m+1}^{n+1}\right) - \left(v_m^n + v_{m+1}^n\right)\right] + \frac{a}{2h}\left[\left(v_{m+1}^{n+1} - v_m^{n+1}\right) + \left(v_{m+1}^n - v_m^n\right)\right] = \frac{1}{4}\left(f_{m+1}^{n+1} + f_m^{n+1} + f_{m+1}^n + f_m^n\right) \qquad (3.2.3)$$
is an approximation to the one-way wave equation $u_t + au_x = f$ that is accurate of order (2, 2) and is stable for all values of λ.
3.2.4. Using the box scheme (3.2.3), solve the one-way wave equation
$$u_t + u_x = \sin(x - t)$$
on the interval [0, 1] for 0 ≤ t ≤ 1.2 with $u(0, x) = \sin x$ and with $u(t, 0) = -(1 + t)\sin t$ as the boundary condition.
Demonstrate the second-order accuracy of the solution using λ = 1.2 and h = 1/10, 1/20, 1/40, and 1/80. Measure the error in the $L^2$ norm (3.1.24) and the maximum norm. To implement the box scheme note that $v_0^{n+1}$ is given by the boundary data, and then each value of $v_{m+1}^{n+1}$ can be determined from $v_m^{n+1}$ and the other values.
3.2.5. Show that the following modified box scheme for $u_t + au_x = f$ is accurate of order (2, 4) and is unconditionally stable. The scheme is
$$\begin{aligned} \frac{1}{16}&\left(-v_{m+2}^{n+1} + 9v_{m+1}^{n+1} + 9v_m^{n+1} - v_{m-1}^{n+1}\right) + \frac{a\lambda}{48}\left(-v_{m+2}^{n+1} + 27v_{m+1}^{n+1} - 27v_m^{n+1} + v_{m-1}^{n+1}\right) \\ = \frac{1}{16}&\left(-v_{m+2}^n + 9v_{m+1}^n + 9v_m^n - v_{m-1}^n\right) - \frac{a\lambda}{48}\left(-v_{m+2}^n + 27v_{m+1}^n - 27v_m^n + v_{m-1}^n\right) \\ &+ \frac{k}{32}\left(-f_{m+2}^{n+1} + 9f_{m+1}^{n+1} + 9f_m^{n+1} - f_{m-1}^{n+1} - f_{m+2}^n + 9f_{m+1}^n + 9f_m^n - f_{m-1}^n\right). \end{aligned}$$

3.3 Difference Notation and the Difference Calculus


To assist in our analysis and discussion of schemes we introduce some notation for finite
differences. The forward and backward difference operators are defined by

$$\delta_+ v_m = \frac{v_{m+1} - v_m}{h} \qquad (3.3.1)$$
and
$$\delta_- v_m = \frac{v_m - v_{m-1}}{h}, \qquad (3.3.2)$$
respectively. We will occasionally use the notation $\delta_{x+}$ and $\delta_{x-}$ for these operators and define
$$\delta_{t+} v_m^n = \frac{v_m^{n+1} - v_m^n}{k}$$
for the forward difference in t; we similarly define $\delta_{t-}$.


The central (first) difference operator $\delta_0$ or $\delta_{x0}$ is defined by
$$\delta_0 v_m = \frac{1}{2}\left(\delta_+ v_m + \delta_- v_m\right) = \frac{v_{m+1} - v_{m-1}}{2h}$$
or, more succinctly,
$$\delta_0 = \tfrac{1}{2}\left(\delta_+ + \delta_-\right).$$
The central second difference operator is $\delta_+\delta_-$, which we also denote by $\delta^2$. We have
$$\delta^2 v_m = \frac{v_{m+1} - 2v_m + v_{m-1}}{h^2},$$
and also
$$\delta^2 = \left(\delta_+ - \delta_-\right)/h.$$
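In code these operators are one-liners. The sketch below (ours) applies them to sin x and confirms the expected first- and second-order convergence:

```python
import numpy as np

# Sketch (ours): the operators of (3.3.1)-(3.3.2) and the central operators,
# applied to u(x) = sin(x) and compared with the exact derivatives.
u, du = np.sin, np.cos
d2u = lambda x: -np.sin(x)
x = 1.0
for h in (0.1, 0.05, 0.025):
    d_plus = (u(x + h) - u(x)) / h                     # delta_+, first order
    d_zero = (u(x + h) - u(x - h)) / (2 * h)           # delta_0, second order
    d_sq   = (u(x + h) - 2*u(x) + u(x - h)) / h**2     # delta^2, second order
    print(f"h={h}: |d+ err|={abs(d_plus - du(x)):.2e}, "
          f"|d0 err|={abs(d_zero - du(x)):.2e}, "
          f"|d2 err|={abs(d_sq - d2u(x)):.2e}")
# The delta_+ error halves with h; the delta_0 and delta^2 errors quarter.
```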

We now demonstrate the use of this notation in deriving fourth-order accurate approximations to the first and second derivative operators. By Taylor series we have
$$\delta_0 u = \frac{du}{dx} + \frac{h^2}{6}\frac{d^3u}{dx^3} + O(h^4) = \left(1 + \frac{h^2}{6}\frac{d^2}{dx^2}\right)\frac{du}{dx} + O(h^4) = \left(1 + \frac{h^2}{6}\delta^2\right)\frac{du}{dx} + O(h^4), \qquad (3.3.3)$$
where we have used
$$\frac{d^2f}{dx^2} = \delta^2f + O(h^2).$$
We may rewrite the formula (3.3.3) for $\delta_0 u$ as
$$\frac{du}{dx} = \left(1 + \frac{h^2}{6}\delta^2\right)^{-1}\delta_0 u + O(h^4). \qquad (3.3.4)$$
The inverse of the operator $1 + \frac{h^2}{6}\delta^2$ is used only in a symbolic sense. In practice, the inverse is always eliminated by operating on both sides of the expression with $1 + \frac{h^2}{6}\delta^2$. Applying this formula to the simple equation
$$\frac{du}{dx} = f, \qquad (3.3.5)$$
we have the equation
$$\left(1 + \frac{h^2}{6}\delta^2\right)^{-1}\delta_0 u(x_m) = f(x_m)$$
to fourth order. From this we have
$$\delta_0 u(x_m) = \left(1 + \frac{h^2}{6}\delta^2\right)f(x_m)$$
or
$$\frac{v_{m+1} - v_{m-1}}{2h} = f_m + \frac{1}{6}\left(f_{m+1} - 2f_m + f_{m-1}\right) = \frac{1}{6}\left(f_{m+1} + 4f_m + f_{m-1}\right).$$
Notice that replacing the right-hand side with only $f_m$ gives a second-order accurate formula. This formula will be used in Chapter 4.
Another fourth-order difference formula may be derived by using the formula
$$\delta_0 u = \frac{du}{dx} + \frac{h^2}{6}\delta^2\delta_0 u + O(h^4),$$
which may be rewritten as
$$\left(1 - \frac{h^2}{6}\delta^2\right)\delta_0 u = \frac{du}{dx} + O(h^4).$$
Applied to (3.3.5) we obtain the fourth-order approximation
$$\left(1 - \frac{h^2}{6}\delta^2\right)\delta_0 v_m = f_m$$
or
$$\frac{-v_{m+2} + 8v_{m+1} - 8v_{m-1} + v_{m-2}}{12h} = f_m.$$
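A refinement test (ours) of this five-point formula, using $e^x$ as the test function:

```python
import numpy as np

# Verification (ours) of the fourth-order five-point formula derived above:
# u'(x) ~ (-u(x+2h) + 8u(x+h) - 8u(x-h) + u(x-2h)) / (12h).
u, du = np.exp, np.exp        # test function with known derivative
x = 0.5
for h in (0.1, 0.05, 0.025):
    approx = (-u(x + 2*h) + 8*u(x + h) - 8*u(x - h) + u(x - 2*h)) / (12 * h)
    print(f"h={h}: error = {abs(approx - du(x)):.3e}")
# Each halving of h reduces the error by about a factor of 16 = 2^4.
```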
For the second-order derivative we have the two formulas
$$\frac{d^2}{dx^2} = \left(1 + \frac{h^2}{12}\delta^2\right)^{-1}\delta^2 + O(h^4) \qquad (3.3.6)$$
and
$$\frac{d^2}{dx^2} = \left(1 - \frac{h^2}{12}\delta^2\right)\delta^2 + O(h^4). \qquad (3.3.7)$$

It is of some use to develop the formalism relating differences to derivatives. Let $\partial = \partial_x = \frac{d}{dx}$. Then by Taylor series,
$$u(x + h) = \sum_{j=0}^{\infty}\frac{h^j}{j!}\,\partial^j u(x) = e^{h\partial}u(x). \qquad (3.3.8)$$
This formalism may be regarded as a purely symbolic operation for obtaining difference equations. If we adopt this view, then we should always check the accuracy of the formulas by the methods of Section 3.1. We may also regard this formalism as a shorthand notation for general Taylor series methods. For example, we can write out the expressions in (3.3.3) without writing down the symbol u. If we use this shorthand notation properly, the results will be consistent with the methods of Section 3.1, and there is no need to perform additional checks on the accuracy of schemes derived by this formalism. Therefore, we may express formulas (3.3.1) and (3.3.2) as
$$\delta_+ = \frac{e^{h\partial} - 1}{h} \qquad (3.3.9)$$
and
$$\delta_- = \frac{1 - e^{-h\partial}}{h}. \qquad (3.3.10)$$
Also,
$$\delta_0 = \frac{1}{2}\left(\delta_+ + \delta_-\right) = \frac{e^{h\partial} - e^{-h\partial}}{2h} = \frac{\sinh h\partial}{h} \qquad (3.3.11)$$
and
$$\delta^2 = \delta_+\delta_- = h^{-2}\left(e^{h\partial} - 1\right)\left(1 - e^{-h\partial}\right) = \left(h^{-1}\left(e^{h\partial/2} - e^{-h\partial/2}\right)\right)^2 = \left(\frac{\sinh\tfrac{1}{2}h\partial}{\tfrac{1}{2}h}\right)^2. \qquad (3.3.12)$$
Notice that to obtain the symbols of these operators according to Definitions 3.1.3
and 3.1.4 we need only replace ∂ by iξ.
We may generalize formula (3.3.4) as follows. From (3.3.11) we have
sinh h∂ sinh h∂
δ0 = = ∂ (3.3.13)
h h∂
and from (3.3.12) we have
hδ = 2 sinh 12 h∂,
where δ is defined by this relation. Thus

h∂ = 2 sinh−1 21 hδ

or
sinh−1 21 hδ
∂= 1
(3.3.14)
2h
and so, from (3.3.13),
sinh[2 sinh−1 ( 12 hδ)]
δ0 = ∂
2 sinh−1 21 hδ
82 Chapter 3. Order of Accuracy of Finite Difference Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

or
2 sinh−1 ( 12 hδ)
∂= δ0
sinh[2 sinh−1 ( 12 hδ)]
$ (3.3.15)
 2 %−1/2
hδ sinh−1 21 hδ
= 1+ 1
δ0 .
2 2 hδ

One may use the expression (3.3.15) to substitute for the derivatives with respect to x
in differential equations and similarly use the square of (3.3.14) to substitute for the second
derivative. By expanding the Taylor series to high enough powers of h, approximations
to any order of accuracy can be obtained.
It is important to realize that not all schemes arise by a straightforward application
of these formulas. The Lax–Wendroff scheme is a good example of a scheme relying on
clever manipulations to obtain second-order accuracy in time, even though the scheme is
a one-step scheme. Other examples of higher order accuracy schemes using similar ideas
are given in Chapter 4.

Derivation of Schemes Using the Symbolic Calculus


To illustrate the use of the symbolic calculus, we derive several higher order accurate
schemes.

Example 3.3.1. We first derive a (4, 4) scheme for the one-way wave equation. The starting
point for the derivation is the Taylor series expansion for a solution of ut + aux = f,

m − um
un+2 2k 2
n−2
=ut + uttt + O(k 4 )
4k 3

2k 2 2
= 1+ δ ut + O(k 4 )
3 t

2k 2 2
= 1+ δt (−aux + f ) + O(k 4 )
3
 
2k 2 2 h2 δ 2
=− 1+ δt a 1 − δ0 u
3 6

2k 2 2
+ 1+ δt f + O(k 4 ) + O(h4 )
3
  n+1
h2 δ 2 2um − unm + 2un−1
=−a 1− δ0 m
6 3
 n+1
2fm − fmn + 2fmn−1
+ + O(k 4 ) + O(h4 ).
3
3.3 The Difference Calculus 83
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

This gives the (4, 4) scheme


n+2 − v n−2   n+1 n + 2v n−1
vm m h2 δ 2 2vm − vm 2f n+1 − fmn + 2fmn−1
+a 1− δ0 m
= m .
4k 6 3 3
In Chapter 4 we present methods to show that this scheme is stable for

3 1
|aλ| < √ √ ≈ 0.128825
4 (1 + 6)( 6 − 3/2)1/2
(see Exercises 4.2.1 and 4.4.5).
Example 3.3.2. As a second example we derive a scheme that is a hybrid between the Lax–
Wendroff scheme (3.1.2) and the Crank–Nicolson scheme (3.1.3) for the one-way wave
equation. We begin by considering u(tn+1/3 , x):
un+1 − un e2k∂t /3 − e−k∂t /3 n+1/3 n+1/3 k n+1/3
= u = ut + utt + O(k 2 )
k k 6

n+1/3 k  2 n+1/3 n+1/3 n+1/3



= ut + a uxx + (ft − afx ) + O(k 2 ),
6
and using the relation ϕ n+1/3 = (ϕ n+1 + 2ϕ n )/3 + O(k 2 ), we obtain

n+1 − v n  n+1 + 2v n  n+1 + 2v n


vm m vm k vm
+ aδ0 m
− a2δ2 m
k 3 6 3
 (3.3.16)
f n+1 + fmn ak fmn+1 + 2fmn
= m − δ0 .
2 6 3
This scheme is a (2, 2) scheme and is stable for |aλ| ≤ 3. See Exercise 3.3.7.
Example 3.3.3. For our last example we derive an implicit (2, 2) scheme for the one-way
wave equation. We have from (3.3.10) that
∂t v n+1 = −k −1 ln(1 − kδt− )v n+1
and by (3.3.8),

∂t un+2/3 = e−k∂t /3 ∂t un+1



1
= − 1 − kδt− k −1 ln(1 − kδt− )un+1 + O(k 2 )
3
 
1 1 2
= 1 − kδt− δt− + kδt− un+1 + O(k 2 )
3 2

1 2
= δt− + kδt− un+1 + O(k 2 )
6

7un+1 − 8un + un−1


= + O(k 2 ).
6k
84 Chapter 3. Order of Accuracy of Finite Difference Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

n+2/3
Using this relation with ux = (2un+1
x + unx )/3 + O(k 2 ) we obtain

n+1 − 8v n + v n−1  n+1 + v n


7vm m m 2vm n+2/3
+ aδ0 m
= fm .
6k 3

In Example 4.3.1 it is shown that this scheme is unconditionally stable.

Exercises
3.3.1. Derive (3.3.6) and (3.3.7).
3.3.2. Obtain (3.3.4) directly from (3.3.15).
 2
3.3.3. Obtain (3.3.7) from ∂ 2 = h42 sinh−1 21 hδ , which is equivalent to (3.3.14).

3.3.4. Determine the stability and accuracy of the following scheme, a modification of the
Lax–Wendroff scheme, for ut + aux = f. For the stability analysis, but not the
accuracy analysis, assume that λ is a constant:

 −1 
1 h2 2 1 2 2 n
n+1
vm = n
vm − ak 1 − δ δ0 vm − a kδ vm
n
2 6 2
 −1
k ak 2 h2 2
+ (fmn+1 + fmn )− 1− δ δ0 fmn .
2 4 6

3.3.5. Show that the scheme for ut + aux = f given by

 h2 δ 2  n
n+1
vm =vm
n
− ak 1 − δ0 vm
6
$  %
a2k2 4 1
+ + a λ δ vm −
2 2 2 n
+ a λ δ0 vm
2 2 2 n
2 3 3

k ak 2
+ (fmn+1 + fmn ) − δ0 fmn
2 2

is accurate of order (2, 4) and stable if


√ 1/2
17 − 1
|aλ| ≤ ≈ 0.721469.
6

Note that O(kh2 ) ≤ O(k 2 ) + O(h4 ). Hint: The computation of |g|2 can be done
similarly to that of the Lax–Wendroff scheme.
3.4 Boundary Conditions 85
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

3.3.6. Show that the improved Crank–Nicolson scheme for ut + aux = f,

n+1 − v n  −1  n+1 n
vm m h2 v + vm f n+1 + fmn
+ a 1 + δ2 δ0 m = m ,
k 6 2 2

is accurate of order (2, 4) and is unconditionally stable. The scheme may also be
written as

1 n+1 2 1 n+1 aλ n+1


v + v n+1 + vm−1 + (v − vm−1
n+1
)
6 m+1 3 m 6 4 m+1

1 n 2 n 1 n aλ n
= vm+1 + vm + vm−1 − (v − vm−1
n
)
6 3 6 4 m+1

k n+1
+ (f + 4fmn+1 + fm−1
n+1
+ fm+1
n
+ 4fmn + fm−1
n
).
12 m+1

3.3.7. Show that the scheme derived in Example 3.3.2 is stable for |aλ| ≤ 3.
3.3.8. Use the relationship ∂ = h−1 ln(1 + hδ+ ) from (3.3.9) to derive the second-order
accurate one-sided approximation

du −3u(x0 ) + 4u(x1 ) − u(x2 )


(x0 ) ≈ .
dx 2h

3.4 Boundary Conditions for Finite Difference Schemes


In solving initial-boundary value problems such as (1.2.1) by finite difference schemes, we
must use the boundary conditions required by the partial differential equation in order to
determine the finite difference solution. Many schemes also require additional boundary
conditions, called numerical boundary conditions, to determine the solution uniquely. We
introduce our study of numerical boundary conditions by considering the Lax–Wendroff
scheme applied to the initial-boundary value problem (1.2.1). In Chapter 11 we discuss the
theory of boundary conditions in more detail.
When we use the Lax–Wendroff scheme on equation (1.2.1), the scheme can be
applied only at the interior grid points and not at the boundary points. This is because
the scheme requires grid points to the left and right of (tn , xm ) when computing vm n+1 ,

and at the boundaries either xm−1 or xm+1 is not a grid point. Assuming that a is
positive, the value of v0n is supplied by the boundary data as required by the differential
equation. At xM , where xM is the last grid point on the right, we must use some means
n+1
other than the scheme to compute vM . This additional condition is called a numerical
boundary condition. Numerical boundary conditions should be some form of extrapolation
that determines the solution on the boundary in terms of the solution in the interior. For
86 Chapter 3. Order of Accuracy of Finite Difference Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

example, each of the following are numerical boundary conditions for (1.2.1):

n+1
vM = vM−1
n+1
, (3.4.1a)
n+1
vM = 2vM−1
n+1
− vM−2
n+1
, (3.4.1b)
n+1
vM = vM−1
n
, (3.4.1c)
n+1
vM = 2vM−1
n
− vM−2
n−1
. (3.4.1d)

Formulas (3.4.1a) and (3.4.1b) are simple extrapolations of the solution at interior grid points
to the boundary. Formulas (3.4.1c) and (3.4.1d) are sometimes called quasi-characteristic
extrapolation, since the extrapolation is done from points near the characteristics.
Numerical boundary conditions often take the form of one-sided differences of the
partial differential equation. For example, rather than formulas (3.4.1) we might use

n+1
vM = vM
n
− aλ(vM
n
− vM−1
n
). (3.4.2)

However, we can easily see that (3.4.2) is the result of using the Lax–Wendroff scheme at
n+1 n
vM where vM+1 is determined by

n
vM+1 = 2vM
n
− vM−1
n
,

which is essentially (3.4.1b). This example also illustrates the use of extra points beyond
the boundary to aid in the determination of the boundary values.
It is often easier to use extrapolation formulas such as (3.4.1) than to use extra points
or one-sided differences. Moreover, the extrapolations can give as accurate answers as the
other methods. The one-sided differences and extra points are occasionally justified by ad
hoc physical arguments, which can be more confusing than useful.
There is one difficulty with numerical boundary conditions, which we do not have
space to discuss in detail in this chapter, namely, that the numerical boundary condition cou-
pled with a particular scheme can be unstable. This topic is discussed further in Chapter 11.
For example, (3.4.1a) and (3.4.1b) together with the leapfrog scheme are unstable, whereas
(3.4.1c) and (3.4.1d) are stable. For the Crank–Nicolson scheme, conditions (3.4.1c) and
(3.4.1d) are unstable when aλ is larger than 2, but (3.4.1a) and (3.4.1b) are stable. The
proofs that these boundary conditions are stable or unstable, as the case may be, are given
in Chapter 11.
The analysis of the stability of a problem involving both initial data and boundary
conditions is done by considering the several parts. First, the scheme must be stable for the
initial value problem considered on an unbounded domain. This is done with von Neumann
analysis. The stability of the boundary conditions is done for each boundary separately.
Conditions at one boundary cannot have a significantly ameliorating effect on an unstable
boundary condition at the other boundary. As the preceding examples show, a boundary
condition may be stable or unstable depending on the scheme with which it is being used.
3.4 Boundary Conditions 87
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

1 1

0.5 0.5

0 0

-0.5 -0.5

-1 -1

0 0.25 0.5 0.75 1 0 0.25 0.5 0.75 1

0.5

-0.5

-1

0 0.25 0.5 0.75 1

Figure 3.3. Unstable boundary condition for the leapfrog scheme.

Example 3.4.1. An example of an unstable boundary condition is shown in Figure 3.3.


The leapfrog scheme is used with equation (1.2.1), with a equal to 1. The grid spacing
is 0.02 and λ is equal to 0.9. At the left boundary, where x equals 0, u is specified to
be the exact solution sin 2π(x − t). The Lax–Friedrichs scheme is used for the first time
step. At the right boundary, where x is 1, (3.4.1a) is used. The three plots in the figure
show the effect at the times 0.9, 1.8, and 2.7. The growth arising from an unstable boundary
condition is not as dramatic as that arising from using an unstable scheme. The growth
may be O(n) for an unstable boundary condition, whereas it is exponential in n for an
unstable scheme.
Figure 3.3 illustrates one additional difficulty with unstable boundary conditions:
that the oscillations that are the result of the instability may not stay in the vicinity of the
boundary. In the first plot the oscillations are spread throughout the interval, and in the plot
at time 1.8, in the upper right, the oscillations are concentrated near the other boundary.
This is due to the slow growth of the instability and the presence of the parasitic mode for
the leapfrog scheme that propagates errors in the opposite direction from the differential
equation. Parasitic modes are discussed in Chapters 4 and 5. After sufficient time, as shown
88 Chapter 3. Order of Accuracy of Finite Difference Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

in the third plot, the effect of the boundary instability is seen at that boundary. When the
effects of the boundary instability are observed far from the boundary, it can be difficult for
programmers to determine that the boundary condition is the source of the oscillations.

In practice, if we suspect that there is a numerical boundary condition instability, the


easiest thing to do is to change to a different form of extrapolation to eliminate it. There
is an analytical means of checking for these instabilities, but the algebraic manipulations
are often quite involved, as will be seen in Chapter 11. If a computer program using a
finite difference scheme is being used to solve a system of equations, it is usually easier
to implement other boundary conditions than it is to analyze the original conditions to
determine their stability.
One final comment should be made on this topic. In solving initial-boundary value
problems by finite differences, it is best to distinguish clearly between those boundary con-
ditions required by the partial differential equation and the numerical boundary conditions.
By making this distinction, we can avoid solving overdetermined or underdetermined partial
differential equation initial-boundary value problems.

Exercise
3.4.1. Solve the initial-boundary value problem (1.2.1) with the leapfrog scheme and the
following boundary conditions. Use a = 1. Only (d) should give good results.
Why?
(a) At x = 0, specify u(t, 0); at x = 1, use boundary condition (3.4.1b).
(b) At x = 0, specify u(t, 0); at x = 1, specify u(t, 1) = 0.
(c) At x = 0, use boundary condition (3.4.1b); at x = 1, use (3.4.1c).
(d) At x = 0, specify u(t, 0); at x = 1, use boundary condition (3.4.1c).

3.5 Solving Tridiagonal Systems


To use the Crank–Nicolson scheme and many other implicit schemes such as (1.6.1), we
must know how to solve tridiagonal systems of linear equations. We now present a con-
venient algorithm, called the Thomas algorithm, to solve tridiagonal systems that arise in
finite difference schemes. This is the algorithm used to compute the solutions displayed in
Figure 3.2.
Consider the system of equations

ai wi−1 + bi wi + ci wi+1 = di , i = 1, . . . , m − 1, (3.5.1)

with the boundary conditions

w0 = β0 and wm = βm . (3.5.2)

We will solve this system by Gaussian elimination without partial pivoting. It reduces
to this: We want to replace (3.5.1) by relationships of the form

wi = pi+1 wi+1 + qi+1 , i = 0, 1, 2, . . . , m − 1, (3.5.3)


3.5 Solving Tridiagonal Systems 89
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where the values of pi+1 and qi+1 are to be determined. For (3.5.3) to be consistent
with (3.5.1), we substitute (3.5.3) into (3.5.1) for wi−1 and examine the resulting relation
between wi and wi+1:

ai (pi wi + qi ) + bi wi + ci wi+1 = di

or
wi = −(ai pi + bi )−1 ci wi+1 + (ai pi + bi )−1 (di − ai qi ).
Comparing this expression with (3.5.3) we must have
pi+1 = −(ai pi + bi )−1 ci ,
(3.5.4)
qi+1 = (ai pi + bi )−1 (di − ai qi ),

for consistency of the formulas. Thus if we know p1 and q1 , then we can use (3.5.4)
to compute pi and qi for i greater than 1. The values of p1 and q1 are obtained
from the boundary condition (3.5.2) at i = 0. At i equal to 0 we have the two formulas
w0 = p1 w1 + q1 and w0 = β0 . These conditions are consistent if p1 = 0 and q1 = β0 .
With these initial values for p1 and q1 , formulas (3.5.4) then give all the values of pi and
qi up to i equal to m. To get the values of wi we use (3.5.3) starting with wm , which
is given.
We now consider other boundary conditions. If we have

w0 = w1 + β0 ,

then we set p1 = 1 and q1 = β0 . If we have the boundary conditions

wm = wm−1 + βm ,

then the relation


wm−1 = pm wm + qm
also holds, and we combine these two relations to obtain

wm = (1 − pm )−1 (qm + βm ).

If pm = 1, then wm cannot be defined, and the system with this boundary condition is
singular.
In general, the values of p1 and q1 are determined by the boundary condition at i
equal to 0, and the value of wm is determined by the boundary condition at i equal to m,
together with the relation (3.5.3) if necessary.
For the Thomas algorithm to be well-conditioned, we should have

|pi | ≤ 1. (3.5.5)

This is equivalent to having the multipliers in Gaussian elimination be at most 1 in mag-


nitude. From (3.5.3) we see that the error in wi+1 is multiplied by pi+1 to contribute to
90 Chapter 3. Order of Accuracy of Finite Difference Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

the error in wi . If (3.5.5) is violated for several values of i, then there will be an increase
in the error. This error growth is due to ill-conditioning in the Thomas algorithm, and
using Gaussian elimination with partial pivoting should remove this error magnification.
Condition (3.5.5) has nothing to do with the stability or instability of the scheme.
The condition (3.5.5) should be checked when using the Thomas algorithm. Here are
two special cases where (3.5.5) holds.
1. Diagonal dominance, i.e., |ai | + |ci | ≤ |bi |.
2. 0 ≤ −ci ≤ bi , 0 ≤ ai with 0 ≤ p1 ≤ 1, or 0 ≤ −ai , 0 ≤ ci ≤ bi with −1 ≤
p1 ≤ 0.
The formulas for tridiagonal systems can be extended to block tridiagonal systems in
which the ai , bi , and ci are square matrices and the unknown wi are vectors. In this case
the pi are also matrices and the qi are vectors. The method also extends to pentadiagonal
systems.
Here is a sample of pseudocode for the Thomas algorithm for the Crank–Nicolson
scheme. The function Data refers to the boundary data that must be supplied as part of the
scheme. The boundary condition at the right end of the grid is (3.4.1c). This code must be
included in a loop over all time steps.

# Set the parameters.


aa = -a*lambda/4
bb = 1
cc = -aa

# Set the first elements of the p and q arrays.


p(1) = 0.
q(1) = Data(time)
# Compute the p and q arrays recursively.
loop on m from 1 to M-1
dd = v(m) - a*lambda*( v(m+1) - v(m-1))/2
denom = (aa* p(m) + bb )
p(m+1) = -cc / denom
q(m+1) = (dd - q(m)*aa ) /denom
end of loop on m
# Apply the boundary condition at the last point.
v(M) = v(M-1)
# Compute all interior values.
loop on m from M-1 to 0
v(m) = p(m+1)*v(m+1) + q(m+1)
end of loop on m

Periodic Tridiagonal Systems


If we use the Crank–Nicolson scheme or a similar scheme to solve a problem with periodic
solutions, then we obtain periodic tridiagonal systems. These can be solved by an extension
of the previous algorithm.
3.5 Solving Tridiagonal Systems 91
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Consider the system

ai wi−1 + bi wi + ci wi+1 = di , i = 1, . . . , m, (3.5.6)

with w0 = wm and wm+1 = w1 . This periodic system can be solved as follows. Solve
three systems as for the nonperiodic case, each for i = 1, . . . , m:

ai xi−1 + bi xi + ci xi+1 = di

with x0 = 0 and xm+1 = 0,

ai yi−1 + bi yi + ci yi+1 = 0

with y0 = 1 and ym+1 = 0, and

ai zi−1 + bi zi + ci zi+1 = 0

with z0 = 0 and zm+1 = 1.


Since these systems have the same matrix but different data, they use the same pi’s
but different qi’s. (For the last of these systems, qi = 0. )
Then we construct wi as

wi = xi + ryi + szi .

It is easy to see that wi satisfies (3.5.6) for i = 1, . . . , m. We choose r and s to guarantee


the periodicity. The relationship w0 = wm becomes

r = ry0 = xm + rym + szm

and wm+1 = w1 becomes

s = szm+1 = x1 + ry1 + sz1 .

These are two equations in the two unknowns r and s. The solution is

xm (1 − z1 ) + x1 zm
r= ,
D
xm y1 + x1 (1 − ym )
s= ,
D
with
D = (1 − ym )(1 − z1 ) − y1 zm .
These formulas for solving periodic tridiagonal systems as well as the formula in Exercise
3.5.8 are special cases of the Sherman–Morrison formula for computing the inverse of a
matrix given the inverse of a rank 1 modification of the matrix (see Exercise 3.5.10).
92 Chapter 3. Order of Accuracy of Finite Difference Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Exercises
3.5.1. Solve ut + ux = 0 on −1 ≤ x ≤ 1 for 0 ≤ t ≤ 1 with the Crank–Nicolson scheme
using the Thomas algorithm. For initial data and boundary data at x equal to −1,
use the exact solution u(t, x) = sin π(x − t). Use λ = 1.0 and h = 1/10, 1/20,
and 1/40. For the numerical boundary condition use
n+1
vM − vM
n
+ λ(vM
n+1
− vM−1
n+1
) = 0,

where xM = 1. Comment on the accuracy of the method.


Note: When programming the method it is easiest to first debug your program
n+1
using the boundary condition vM = vM−1
n . After you are sure the program works
with this condition, you can then change to another boundary condition.
3.5.2. Solve ut + ux + u = 0 on −1 ≤ x ≤ 1 for 0 ≤ t ≤ 1 with the Crank–Nicolson
scheme using the Thomas algorithm. For initial data and boundary data at x equal
to −1, use the two exact solutions:
(a) u(t, x) = e−t sin π(x − t),
(b) u(t, x) = max(0, e−t cos π(x − t)).
Use λ = 1.0 and h = 1/10, 1/20, and 1/40. Be sure that the undifferenti-
ated term is treated accurately. For the numerical boundary condition use each of
the following two methods:
n+1
(a) vM − vM
n
+ λ(vM
n+1
− vM−1
n+1
) + kvM
n+1
=0
and
n+1
(b) vM = 2vM−1
n+1
− vM−2
n+1
,

where M is the grid index corresponding to x equal to 1. Comment on the accuracy


of the methods. See the note in Exercise 3.5.1.
3.5.3. Solve ut + ux − u = 0 on −1 ≤ x ≤ 1 for 0 ≤ t ≤ 1 with the Crank–Nicolson
scheme using the Thomas algorithm. For initial data take

u(0, x) = cos2 π x if |x| ≤ 1/2,


0 otherwise,

and for boundary data take u(t, −1) = 0. Use λ = 1.0 and h = 1/10, 1/20, and
1/40. Be sure that the undifferentiated term is treated accurately. For the numerical
boundary condition use each of the two methods
n+1
(a) vM − vM
n
+ λ(vM
n+1
− vM−1
n+1
) − kvM
n+1
=0
and
n+1
(b) vM = 2vM−1
n+1
− vM−2
n+1
.

Comment on the accuracy of the methods. See the note in Exercise 3.5.1.
3.5 Solving Tridiagonal Systems 93
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

3.5.4. Solve ut + ux − u = −t sin π(x − t) on −1 ≤ x ≤ 1 for 0 ≤ t ≤ 1.2 with the


Crank–Nicolson scheme using the Thomas algorithm. For initial data and boundary
data at x = −1 use the exact solution u(t, x) = (1 + t) sin π(x − t). Use λ = 1.0
and h = 1/10, 1/20, and 1/40. Be sure that the undifferentiated term is treated
accurately. For the numerical boundary condition at xM = 1 use
n+1
vM − vM
n
+ λ(vM
n+1
− vM−1
n+1
) − kvM
n+1
= kf (tn+1 , xM ).

Comment on the accuracy of the method. See the note in Exercise 3.5.1.
3.5.5. Show that the condition (3.5.5) is violated for the Crank–Nicolson scheme (3.1.4)
when p1 = 0 and aλ > 4.
3.5.6. Show that the second-order differential equation
d 2u du
a(x) 2
+ b(x) + c(x)u = d(x)
dx dx
for α ≤ x ≤ β with u(α) = A and u(β) = B can be solved by an algorithm
similar to the Thomas algorithm. Set
du
= p(x)u + q(x)
dx
and determine equations for p(x) and q(x). Discuss how p ≥ 0 is the analogue
to (3.5.5).
3.5.7. Repeat some of the calculations of Exercise 3.5.2 with the (2, 4) accurate scheme
of Exercise 3.3.6, modified to include the undifferentiated term. Can you attain a
benefit from the fourth-order accuracy?
3.5.8. Show that the following algorithm also solves the periodic tridiagonal system (3.5.6).
1. Solve ai xi−1 + bi xi + ci xi+1 = di , i = 1, . . . , m, with x0 = σ x1 and
xm+1 = σ xm , where σ = sign(a1 b1 ).
2. Solve ai yi−1 + bi yi + ci yi+1 = 0, i = 1, . . . , m, with y0 = σy1 + 1 and
ym = σym+1 + 1.
x0 −xm
3. The solution wi is then obtained as wi = xi − ryi , where r = y0 −ym .
3.5.9. In the algorithm of Exercise 3.5.8, why shouldn’t we take σ = 1 when a1 and b1
have opposite signs?
3.5.10. Verify the following formula, called the Sherman–Morrison formula, for a linear
system of equations with matrix A.
If Ay = b and Az = u, then (A + uv T )x = b has the solution
vT y
x=y− z.
(1 + v T z)

This formula is useful for computing the solution x of (A + uv T )x = b if we


have a convenient method of solving equations of the form Ay = b.
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Chapter 4

Stability for Multistep Schemes

In Section 2.2 we gave necessary and sufficient conditions for the stability of one-step
schemes, and in this chapter we extend this analysis to multistep schemes. In the first
section of this chapter we examine the leapfrog scheme and give necessary and sufficient
conditions for the stability of this scheme. In the second section we present the stability
analysis for general multistep schemes. In the last section we present the theory of Schur
and von Neumann polynomials, which provides an algorithm for determining the stability
criteria for multistep schemes.

4.1 Stability for the Leapfrog Scheme


We begin by analyzing the stability of the leapfrog scheme (1.3.4) for the one-way wave
equation (1.1.1), which is
n+1 − v n−1
vm v n − vm−1
n
m
+ a m+1 = 0.
2k 2h
The previous analysis of Chapter 2 covered only the case of one-step schemes. The leapfrog
scheme is representative of schemes using more levels than the two required by one-step
schemes. The stencil of the leapfrog scheme is displayed in Figure 4.1.
By using the Fourier inversion formula for v n−1 , v n , and v n+1 (see (2.1.4)), we
obtain the equation
 π/ h  
eimhξ v̂ n+1 (ξ ) + 2iaλ sin(hξ ) v̂ n (ξ ) − v̂ n−1 (ξ ) dξ = 0
−π/ h

in a manner similar to the method used to obtain equation (2.2.3). By the uniqueness of
the Fourier transform, we conclude that the integrand in this integral must be zero for each
value of n, giving the relationship

v̂ n+1 (ξ ) + 2iaλ sin(hξ ) v̂ n (ξ ) − v̂ n−1 (ξ ) = 0. (4.1.1)

To solve this three-term recurrence relation in v̂ n , we set v̂ n = g n , where the superscript


on v̂ is an index and that on g represents the power. We then obtain, after canceling
g n−1 ,
g 2 + 2iaλ sin(hξ )g − 1 = 0.

95
96 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

t, n

Leapfrog Scheme

One-Step Scheme

x, m

Figure 4.1. Leapfrog stencil.

There are two roots to this quadratic equation, given by


&
g± = −iaλ sin(hξ ) ± 1 − (aλ)2 sin2 (hξ ). (4.1.2)

When g+ and g− are not equal, the solution for v̂ n in (4.1.1) is given by

v̂ n (ξ ) = A+ (ξ )g+ (hξ )n + A− (ξ )g− (hξ )n , (4.1.3)

where the functions A+ (ξ ) and A− (ξ ) are determined by the initial conditions. As we


will see, the term with g+ contains most of the accurate portion of the solution. To
emphasize the special nature of g+ and make some of the formulas nicer, we rewrite the
above expression as
!
g− (hξ )n − g+ (hξ )n
v̂ n (ξ ) = A(ξ )g+ (hξ )n + B(ξ ) (4.1.4)
g− (hξ ) − g+ (hξ )

for functions A(ξ ) and B(ξ ), which are determined by the initial conditions. When g+
and g− are equal, then the solution can be written

v̂ n (ξ ) = A(ξ )g(hξ )n + B(ξ )ng(hξ )n−1 , (4.1.5)


4.1 Stability for the Leapfrog Scheme 97
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where g+ = g− = g. The functions A(ξ ) and B(ξ ) are related to v̂ 0 (ξ ) and v̂ 1 (ξ ) by

A(ξ ) = v̂ 0 (ξ )

and
B(ξ ) = v̂ 1 (ξ ) − v̂ 0 (ξ )g+ (hξ ). (4.1.6)

We now consider the stability of the leapfrog scheme using Definition 1.5.1 with J
equal to 1. We first consider the case where g+ and g− are not equal, and we choose the
initial data v 0 and v 1 so that B(ξ ) is identically zero. Then from (4.1.4) we have

|v̂ n (ξ )| = |A(ξ )| |g+ (hξ )|n .

As with the one-step schemes, we see that it is necessary that g+ (hξ ) satisfies the inequality

|g+ (hξ )| ≤ 1 + Kk,

just as for the amplification factor of a one-step scheme. Obviously, from (4.1.3) g− (hξ )
must also satisfy such an estimate. If we take λ to be a constant, then we may employ the
restricted condition
|g± (hξ )| ≤ 1
to determine the stability. From (4.1.2) with |aλ| ≤ 1 we have that

|g+ |2 = |g− |2 = 1 − (aλ)2 sin2 θ + (aλ sin θ)2 = 1.

If |aλ| is greater than 1, then for θ equal to π/2 we have from (4.1.2)

|g− (π/2)| = |aλ| + (aλ)2 − 1 > 1,

which shows that the scheme is unstable in this case. From this we see that the stability
condition is |aλ| ≤ 1, except that we must also examine what happens when g+ and g−
are equal.
It is easy to see from (4.1.2) that g+ can be equal to g− only when |aλ sin θ | = 1.
Since we know already that |aλ| must be at most 1, we need consider only |aλ| ≤ 1. But
then g+ = g− only when |aλ| = 1 and θ = ± π/2, and we then have g+ = g− = ± i.
The solution for v̂ n is then
 π  π  π
v̂ n ± =A ± (∓i)n + B ± n(∓i)n−1 ,
2h 2h 2h

and when these values of B are nonzero, v̂ n will grow linearly in n.


Since v̂ n for θ equal to ± π/2 behaves this way—i.e., has a growth that is linear
in n —we can show that there are solutions to the finite difference scheme whose norm
grows very nearly linearly in n, and therefore the leapfrog scheme is unstable if |aλ| = 1;
see Exercise 4.1.5. Hence the leapfrog scheme is stable only if |aλ| is strictly less than 1.
98 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

u 1

-1

-2
-1 -0.5 0 0.5 1
x

Figure 4.2. Leapfrog instability for aλ = 1.

Thus the necessary and sufficient condition for the stability of the leapfrog scheme (1.3.4)
is
|aλ| < 1.

The instability that occurs for |aλ| = 1 is much milder than that which occurs for
|aλ| > 1; nonetheless, it is an instability. Figure 4.2 displays the solution of the leapfrog
scheme applied to the one-way wave equation (1.1.1) with a = 1 and λ = 1 for a periodic
problem on the interval [−1, 1]. Also displayed is the initial solution at t = 0. The solution
is computed with a grid of h = 1/10 and is shown after 100 time steps. The first time
step was computed using the forward central scheme. Obviously, the solution is growing
slowly.

Initializing the Leapfrog Scheme


The leapfrog scheme and other three-level schemes require a one-step scheme to get started.
We can use any one-step scheme, even an unstable scheme, to initialize a multistep scheme.
A consistent unstable scheme that is used for only the first several time steps will produce
a small growth in the solution. This growth is small because of consistency. The stability
of the leapfrog scheme or other multistep schemes will keep this small initial growth from
being amplified. Also, as is shown in Chapter 10, if λ is a constant, then the initialization
scheme can be accurate of order one less than that of the scheme without degrading the
overall accuracy of the scheme. Thus our use of the forward-time central-space scheme
(1.3.3) to initialize the leapfrog scheme does not affect the stability or accuracy of the
leapfrog scheme, as shown in Figure 1.3.8.
4.1 Stability for the Leapfrog Scheme 99
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Parasitic Modes
We now take a closer look at the solution of the leapfrog scheme. As (4.1.3) shows, the
solution to the leapfrog scheme consists of two parts, one associated with g+ and the other
with g− . The two amplification factors are distinguished by g(0)+ = 1 and g− (0) = −1.
We are interested in how these two parts behave and how they contribute to the total solution.
For definiteness we take the case where for the first time step the forward-time central-space
scheme is used. That is, v̂ 1 is given by
v̂ 1 (ξ ) = (1 − iaλ sin hξ ) v̂ 0 (ξ ).
Using this relation and the expansions
g+ (hξ ) = 1 − iaλ sin hξ − 12 a 2 λ2 sin2 hξ + O(hξ )4 ,

g− (hξ ) = −1 − iaλ sin hξ + 12 a 2 λ2 sin2 hξ + O(hξ )4 ,


we have, from (4.1.6), that
 
B(ξ ) = 1 2 2 2
2 a λ sin hξ + O(hξ )4 v̂ 0 (ξ ).

This formula shows that B(ξ ) is small, i.e., O(hξ )2 for |hξ | small. Thus, for these values
of hξ, the scheme behaves like a one-step scheme with amplification factor g+ . For larger
values of |hξ |, the magnitude of B(ξ ) need not be small.
The portion of the solution associated with g− is called the parasitic mode. Since
at ξ equal to 0 the value of g− is −1, we see that this parasitic mode oscillates rapidly
in time. As is shown in Chapter 5, the parasitic mode also travels in the wrong direction.
That is, when a is positive, the parasitic mode travels to the left.

Example 4.1.1. An interesting way to see the parasitic mode and also to illustrate the effect
of inconsistent boundary conditions is shown in Figure 4.3. The figures show the solution
computed by the leapfrog scheme with initial data as a pulse given by

vm = cos π xm if |xm | ≤ 2 ,
2 1
0
0 otherwise.
The value of a is 1, λ is 0.9, and x is in the interval [−1, 1]. At both boundaries the
values of v n are fixed at zero. At the right boundary this is inconsistent with the differential
equation (see Section 1.2). This inconsistency will serve our purpose of generating a
substantial parasitic mode in the solution.
The top left plot in Figure 4.3 shows the solution at t equal to 0.45, with the pulse
moving to the right, and the top right plot shows the solution at t equal to 1.80 and moving
to the left. The inconsistent boundary condition has generated a solution having a significant
parasitic mode, as indicated by the oscillatory nature of the pulse and its “wrong” direction
of travel. The bottom plot shows the solution at t equal to 3.6 with the original pulse shape
nearly restored. The parasitic mode has been converted to the nonparasitic mode by the
boundary condition at the left endpoint of the interval. The scheme was initialized using the
forward-time central-space scheme (1.3.3), but the phenomena displayed in these figures
are not dependent on the initialization.
100 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

1 1

0.5 0.5

0 0

-0.5 -0.5
t = 0.45 t = 1.80
-1 -1

-1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1

0.5

-0.5
t = 3.60
-1

-1 -0.5 0 0.5 1

Figure 4.3. Leapfrog parasitic mode.

In any calculation with multistep schemes, as opposed to one-step schemes, there will
be parasitic modes. These parasitic modes usually cause only minor difficulty, but in some
cases the effects they cause must be reduced or removed. We can reduce the effect of the
parasitic modes by the use of dissipation, which is discussed in the next chapter.

Example 4.1.2. As a further illustration of the stability of multistep schemes, we consider


the (2, 4) leapfrog scheme
n+1 − v n−1 
vm m h2 δ 2
+a 1− n
δ0 vm = fmn , (4.1.7)
2k 6
which uses the fourth-order difference formula (3.3.7). The equation for the amplification
factor is 
2 21
g + 2g iaλ 1 + sin θ sin θ − 1 = 0
2
3 2
or 
4 − cos θ
g + 2g iaλ
2
sin θ − 1 = 0,
3
4.1 Stability for the Leapfrog Scheme 101
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

and the amplification factors are


'
 2
4 − cos θ 4 − cos θ
g± = −iaλ sin θ ± 1 − a 2 λ2 sin2 θ.
3 3

The condition that g+ and g− have magnitude at most 1 and that they not be equal is
easily seen to be that 
4 − cos θ
|aλ| | sin θ | < 1 (4.1.8)
3
for all values of θ. To determine the extrema of (4 − cos θ) sin θ, we have at the extrema
d
(4 − cos θ) sin θ = 1 + 4 cos θ − 2 cos2 θ = 0.

This quadratic equation in cos θ has one root for real θ, given by

3
cos θ = 1 − .
2
Substituting this in (4.1.8), we obtain that the necessary and sufficient condition for stability
is that  
1 −1 √ 3 −1/2
|aλ| < 1 + √ 6− . (4.1.9)
6 2

The value of the right-hand side of (4.1.9) is approximately 0.7208 and, because this
constraint is more severe than that for the usual (2, 2) leapfrog scheme (1.3.4), we might
judge this scheme to be less efficient in some sense. However, quite the opposite is true.
Because the scheme (4.1.7) is fourth-order accurate in space but only second-order accurate
in time, we should take |aλ| smaller than the limit given by (4.1.9)—for example, 0.25—
to improve the temporal accuracy. As a consequence of the fourth-order accuracy we can
either take the spatial grid spacing larger for the (2, 4) scheme (4.1.7) than we would for
the (2, 2) scheme (1.3.4) without sacrificing accuracy in the solution, or we can use the
same spatial grid spacing with (4.1.7) as we would for (1.3.4) and use the smaller time step
to attain higher accuracy without much more effort. Either way it is seen that the constraint
(4.1.9) is not a severe limitation on the scheme.

Exercises
4.1.1. Show that the implicit (2, 4) leapfrog scheme
n+1 − v n−1  −1
vm m h2 2
+a 1+ δ n
δ0 vm = fmn
2k 6
for the one-way wave equation with λ constant is stable if and only if
1
|aλ| < √ .
3
102 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

4.1.2. Show that the (2, 2) leapfrog scheme for ut + auxxx = f (see (2.2.15)) given by

n+1 − v n−1
vm m
+ aδ 2 δ0 vm
n
= fmn ,
2k

with ν = k/ h3 constant, is stable if and only if

2
|aν| < .
33/2

4.1.3. Show that the leapfrog scheme

n+1 − v n−1 
vm m h2 2 h4 4
+ a 1 − δ + δ δ0 vmn
= fmn
2k 6 30

for the one-way wave equation is accurate of order (2, 6) and, if λ is constant, is
stable if and only if

3
|aλ| < .
[2( 25 )1/3 − 1]1/2 [( 52 )2/3 + 3( 52 )1/3 + 1]

Hint: The critical value of θ occurs when cos θ is equal to 1 − ( 52 )1/3 .

4.1.4. Show that the (2, ∞) leapfrog scheme for the one-way wave equation
$  2 %−1/2
n+1 − v n−1
vm m hδ sinh−1 21 hδ
+a 1+ 1
n
δ0 vm = fmn
2k 2 2 hδ

is stable, if λ is constant, if and only if |aλ| < 1/π ; see equation (3.3.15).

4.1.5. This exercise deals with the construction of solutions to the leapfrog scheme that
have nearly linear growth in n when aλ is 1. Consider the initial data given by
v 0 = 0 and  π
1
vm = sin m for |m| < 4M
2
for a positive integer M. Show that for odd values of n the solution is
 π
n
vm = (−1)(n−1)/2 n sin m for |m| ≤ 4M − n.
2

Conclude that v n  grows at least linearly in n. Hint: You need only show that
v n  ≥ Cnv 1  for some large values of n, such as n = 2M. You do not need to
explicitly compute v n .
4.2 Stability for General Multistep Schemes 103
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

4.2 Stability for General Multistep Schemes


We now discuss the stability conditions for a general multistep scheme. As for one-step
schemes, we assume that the differential equation is of first order in the differentiation with
respect to t.
The stability of the multistep scheme

Pk,h v = Rk,h f (4.2.1)

is determined by considering the roots of the amplification polynomial G(g, θ) given by



ln g
G(g, θ) = k pk,h , θh−1
k

or, equivalently,
 
G esk , hξ = k pk,h (s, ξ ) .

Alternatively, G can be obtained by requiring that

n
vm = g n eimθ (4.2.2)

is a solution to the equation (4.2.1) with f = 0. G(g, θ ) is the polynomial of which g must
be a root so that (4.2.2) can be a solution of (4.2.1). We assume that the scheme involves
σ + 1 time levels, so that G is a polynomial of order σ. Note that J in Definition 1.5.1
will be taken to be σ.
Since we are primarily concerned with the roots of this polynomial, there is no diffi-
culty in dealing with a scalar multiple of G(g, θ) rather than G(g, θ) itself. However, the
relationship between G(g, θ ) and the symbol p(s, ξ ) is important in proving convergence
results for multistep schemes in Chapter 10.

Example 4.2.1. Consider the multistep scheme for the one-way wave equation given by

n+1 − 4v n + v n−1
3vm v n+1 − vm−1
n+1
m m
+ a m+1 = fmn+1 . (4.2.3)
2k 2h

For this scheme the amplification polynomial is

G(g, θ ) = 12 (3 + 2iaλ sin θ)g 2 − 2g + 12 .

The analysis of the stability of this scheme is not as easy as that of the leapfrog
scheme and is most easily done with the methods of the next section, in which we present
a general method for analyzing the stability of multistep schemes. This scheme is accurate
of order (2, 2) and unconditionally stable; see Exercise 4.4.3 and Example 4.2.2.
104 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

For a one-step scheme G(g, θ) is a linear polynomial in g, and the general solution
of the homogeneous difference equation is given by (2.2.5). For a multistep scheme in
which G is a polynomial of degree σ, there are two cases to consider. First, if G has
distinct roots, gν (θ ), the general solution to the homogeneous difference scheme is given
by

σ
v̂ n (ξ ) = gν (hξ )n Aν (ξ ).
ν=1

The coefficients Aν (ξ ) are determined by the data on the time levels for n from 0 to
σ − 1. If the roots gν (hξ ) are bounded away from each other, independently of k and h,
then the values of Aν are bounded by the sum

−1
σ
C |v j (ξ )| (4.2.4)
j =0

for some constant C (see Exercise 4.2.2). As with one-step schemes, it is then easily shown
that the stability condition is

|gν (hξ )| ≤ 1 + Kk for ν = 1, . . . , σ,

for each root of G. In the cases when G(g, θ) is independent of k and h, the restricted
condition
|gν (hξ )| ≤ 1 for ν = 1, . . . , σ (4.2.5)
holds.
We now consider the situation in which G(g, θ) has multiple roots. For simplicity
we assume that the restricted condition (4.2.5) can be used. (The general case is handled
in the exercises.) Suppose that g1 (θ0 ) is a multiple root of the amplification polynomial
G at θ0 ; then the function
 
n
v̂m = g1 (θ0 )n B0 + ng1 (θ0 )n−1 B1 eimθ0

is a solution of the difference equation for any values of B0 and B1 . If B0 equals 0, then
the magnitude of v̂m n is

n |g1 (θ0 )|n−1 |B1 |. (4.2.6)


If |g1 (θ0 )| is less than 1, then this quantity is bounded by a multiple of
 −1
|g1 (θ0 )| log |g1 (θ0 )|−1 |B1 | (4.2.7)

(see Exercise 4.2.3). However, if |g1 (θ0 )| is equal to 1, then the quantity (4.2.6) cannot be
bounded independently of n. As in the proof of Theorem 2.2.1, we can construct a solution
to the finite difference scheme that is not bounded, as required by Definition 1.5.1, the
definition of stability. We state this result in the next theorem. A root that is not a multiple
root is called a simple root.
4.2 Stability for General Multistep Schemes 105
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Theorem 4.2.1. If the amplification polynomial G(g, θ) is explicitly independent of h


and k, then the necessary and sufficient condition for the finite difference scheme to be
stable is that all roots, gν (θ ), satisfy the following conditions:
(a) |gν (θ )| ≤ 1, and
(b) if |gν (θ )| = 1, then gν (θ ) must be a simple root.

Notice in particular that there is at most one root g0 (θ ) such that g0 (0) = 1. There
is always one root with g0 (0) = 1 by consistency. For completeness we state the general
theorem as well.

Theorem 4.2.2. A finite difference scheme for a scalar equation is stable if and only if all the
roots, gν (θ ), of the amplification polynomial G(θ, k, h) satisfy the following conditions.
(a) There is a constant K such that |gν | ≤ 1 + Kk.
(b) There are positive constants c0 and c1 such that if c0 ≤ |gν | ≤ 1 + Kk, then gν
is a simple root, and for any other root gµ the relation
|gν − gµ | ≥ c1
holds for h and k sufficiently small.

The proofs of these theorems are similar to that of Theorem 2.2.1 and are left as
exercises.
It is useful to consider the behavior of the roots g± (θ ) for the leapfrog scheme
in terms of Theorem 4.2.1. Figure 4.4 illustrates the behavior of g+ (θ ) and g− (θ ) as
functions of θ.
We first discuss the stable case, shown in the figure at the left. For θ = 0, the value
of g+ (θ ) is 1, and g− (θ ) is −1 . As θ increases from 0 to π/2, g+ (θ ) moves from 1 to
point A, and as θ goes from π/2 to π, the value of g+ (θ ) goes from point A back to
1. As θ continues from π to 2π, g+ (θ ) travels from 1 to B and back to 1. The values
of g− (θ ) are the reflection of g+ (θ ) in the imaginary axis.
The unstable case is illustrated on the right-hand side of Figure 4.4. Let θ0 be the
smallest positive value of θ for which g+ (θ ) and g− (θ ) are equal. As θ increases from
0 to θ0 , the values of g+ (θ ) traverse from 1 to −i. Similarly, g− (θ ) traverses from −1
to −i. For θ between θ0 and π/2, the double root at −i splits into two roots, both on
the imaginary axis, one inside the unit circle and one outside. At π/2, they are at points
A and A . (Since point A is outside the unit circle, the scheme is unstable.) As θ takes
on values from π/2 to 2π, the values of g± (θ ) travel from A and A , back to 1 and
−1, up to points B and B  , and back to 1 and −1.

Example 4.2.2. We can show that scheme (4.2.3) is unconditionally stable using several
tricks. The polynomial equation for g(θ ) can be written

(3 + 2iaλ sin θ)g 2 − 4g + 1 = 0.


It is more convenient to solve for g(θ )−1 , for which the equation is

g −2 − 4g −1 + (3 + 2iaλ sin θ) = 0
106 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

B
B’ B

B

-1 1 -1 1

A’

A A
A

Figure 4.4. The amplification factors for the leapfrog scheme.

and the roots are


−1

g± =2∓ 1 − 2iaλ sin θ. (4.2.8)

It is difficult to evaluate |g(θ )−1 | because the quantity under the square root operator is
complex. However, set g(θ )−1 = X + iY, where X and Y are real, and we have by
(4.2.8)

X − 2 + iY = ∓ 1 − 2iaλ sin θ,

so, by squaring each side of this equation, we obtain

(X − 2)2 − Y 2 + 2i(X − 2)Y = 1 − 2iaλ sin θ.

Thus we see that the values of X and Y are restricted to the hyperbola given by the real
part of this equation, i.e.,
(X − 2)2 − Y 2 = 1. (4.2.9)

This hyperbola, along with the unit circle, is shown in Figure 4.5. The two branches
of the hyperbola correspond to the two roots of the polynomial. The branch on the left
corresponds to g+ (θ ) with g+ (0)−1 = 1, and the branch on the right corresponds to
g− (θ ) with g− (0)−1 = 3. In particular, since they are on separate branches, the two roots
do not ever coalesce. As seen in the figure, the points (X, Y ) are outside the unit circle
except for the value of (1, 0). Thus the scheme is unconditionally stable. To show this
analytically, we use (4.2.9) to eliminate Y 2 in our evaluation of X2 + Y 2 as follows.
Recall that since g(θ )−1 = X + iY, we need X 2 + Y 2 to be greater than or equal to 1:

X2 + Y 2 = X2 + (X − 2)2 − 1 = 2X2 − 4X + 3 = 1 + 2(X − 1)2 ≥ 1.

Thus this scheme is unconditionally stable.


4.2 Stability for General Multistep Schemes 107
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

1 3

Figure 4.5. Display for showing the stability of the scheme (4.2.3).

This example shows that many different approaches can be used to show stability of
schemes. However, it also shows that it would be nice to have a method that is of quite
general applicability, as is presented in the next section.

Exercises
4.2.1. Show that the amplification polynomial for the scheme derived in Example 3.3.1 is
 
G(z, θ) = z4 + 43 iβ 2z3 − z2 + 2z − 1, (4.2.10)

where
β = aλ(1 + 2
3 sin2 21 θ) sin θ.
Show that the stability for this scheme can be analyzed by the following method.
(a) Substituting z = eiψ , obtain an equation for ψ in the form

F (ψ) = β.

(b) Show that for real values of β near 0 there are four real roots of the equation
F (ψ) = β. Conclude that the scheme is stable for aλ sufficiently small.
(c) Show that the stability limit for aλ is determined by that value of β given by

F (ψ) = β and F  (ψ) = 0.

Determine the stability limit for this scheme.


108 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

4.2.2. Verify that the values of Aν (ξ ) in equation (4.2.4) can be bounded by the quantity
(4.2.4) if the roots gν (θ ) are bounded away from each other. Hint: The matrix to
be inverted is a Vandermonde matrix.
4.2.3. Verify the estimate (4.2.7).
4.2.4. Prove Theorem 4.2.1.
4.2.5. Prove Theorem 4.2.2.

4.3 The Theory of Schur and von Neumann Polynomials


The application of Theorem 4.2.1 to a particular scheme requires us to determine the location
of roots of amplification polynomials, and in this section we present an algorithm for
checking the roots of such polynomials. We first present some examples on which to apply
the theory. At the end of this section we determine the stability conditions for each of these
schemes.

Example 4.3.1. The second-order accurate scheme for the one-way wave equation (1.1.1)
n+1 − 8v n + v n−1  n+1 n
7vm m m 2vm + vm n+2/3
+ aδ0 = fm , (4.3.1)
6k 3
which was derived in Example 3.3.3, has the amplification polynomial

G(z) = (7 + 4iβ) z2 − (8 − 2iβ) z + 1, (4.3.2)

where β = aλ sin θ.
Example 4.3.2. The second scheme we consider, also for (1.1.1), is a (3, 4) accurate
scheme

n+1 − 21v n − 3v n−1 + v n−2


23vm m m m
24k
(4.3.3)
 −1  n+1 + v n  n+1 − v n !
h2 vm k2a2 2 vm n+1/2
+ 1 + δ2 aδ0 m
+ δ m
= fm .
6 2 8 k
This scheme has the amplification polynomial

(23 − 12α + 12iβ)z3 − (21 − 12α − 12iβ)z2 − 3z + 1, (4.3.4)

where
a 2 λ2 sin2 21 θ
α=
1− 2
3 sin2 21 θ
and
aλ sin θ
β= . (4.3.5)
1− 2
3 sin2 21 θ
4.3 Schur and von Neumann Polynomials 109
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

This scheme can be derived by considering third-order approximations about time


level n + 1/2.
Example 4.3.3. The third example is the (4, 4) accurate scheme for the one-way wave
equation (1.1.1):

n+2 − v n−2  −1  n+1 n + 2v n−1


vm m h2 2vm − vm
+ a 1 + δ2 δ0 m
4k 6 3
(4.3.6)
2fmn+1 − fmn + 2fmn−1
= ,
3
which has the amplification polynomial
4  
z4 + iβ 2z3 − z2 + 2z − 1, (4.3.7)
3
where β is as in (4.3.5). The derivation of this scheme is similar to the derivation in
Example 3.3.1.

A direct determination of the roots of these polynomials is a formidable task. Fortu-


nately, there is a well-developed theory and algorithm for checking whether these polynomi-
als satisfy the conditions of Theorem 4.2.1. We begin with some definitions and notations.
These definitions and the following discussion are based on the paper of Miller [43]. Let
ϕ(z) be a polynomial of degree d,

d
ϕd (z) = ad zd + · · · + a0 = a$ z $ .
$=0

We say that ϕ is of exact degree d if ad is not zero.


Definition 4.3.1. The polynomial ϕ is a Schur polynomial if all its roots, rν , satisfy
|rν | < 1.

Definition 4.3.2. The polynomial ϕ is a von Neumann polynomial if all its roots, rν ,
satisfy
|rν | ≤ 1.

Definition 4.3.3. The polynomial ϕ is a simple von Neumann polynomial if ϕ is a von


Neumann polynomial and its roots on the unit circle are simple roots.
Definition 4.3.4. The polynomial ϕ is a conservative polynomial if all its roots lie on
the unit circle, i.e., |rν | = 1 for all roots rν .

For any polynomial ϕ of exact degree d we define the polynomial ϕ ∗ by



d

ϕ (z) = ād−$ z$ , (4.3.8)
$=0
110 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where the bar on the coefficients of ϕ denotes the complex conjugate. Note that

ϕ ∗ (z) = ϕ(z̄−1 )zd . (4.3.9)

Finally, for a polynomial ϕd (z) of degree d we define recursively the polynomial


ϕd∗ (0)ϕd (z) − ϕd (0)ϕd∗ (z)
ϕd−1 (z) = . (4.3.10)
z
It is easy to see that the degree of ϕd−1 is less than that of ϕd . The next two theorems
give recursive tests for Schur polynomials and simple von Neumann polynomials.

Theorem 4.3.1. ϕd is a Schur polynomial of exact degree d if and only if ϕd−1 is a


Schur polynomial of exact degree d − 1 and |ϕd (0)| < |ϕd∗ (0)|.

Theorem 4.3.2. ϕd is a simple von Neumann polynomial if and only if either


(a) |ϕd (0)| < |ϕd∗ (0)| and ϕd−1 is a simple von Neumann polynomial or
(b) ϕd−1 is identically zero and ϕd is a Schur polynomial.
The proofs of these theorems depend on Rouché’s theorem from complex analysis.

Theorem 4.3.3. Rouché’s Theorem. Let the functions ϕ and ψ be analytic within and
on a simple closed curve C, and suppose

|ϕ(z) − ψ(z)| < |ϕ(z)| (4.3.11)

on the curve C. Then ϕ and ψ have the same number of zeros in the interior of C.
The proof of Rouché’s theorem rests on the observation that the number of zeros of
ϕ inside the curve C is equal to the number of times the image of C under ϕ winds
around the origin. Inequality (4.3.11) constrains the image of C under ψ to wind around
the origin the same number of times as ϕ does. Rouché’s theorem has been called the
“walk-the-dog theorem” to emphasize the geometric nature of the theorem. The “dog,”
ψ(z), must go around the origin exactly as many times as its “master,” ϕ(z), as long as
the “leash,” ϕ(z) − ψ(z), is shorter than the distance of the master from the origin. The
proof of Rouché’s theorem is given in standard introductory texts on complex analysis.
Proof of Theorem 4.3.1. First assume that |ϕd (0)| < |ϕd∗ (0)| and that ϕd−1 is
a Schur polynomial of degree d − 1. If we let ψ(z) = zϕd−1 (z)/ϕd∗ (0), we have, by the
definition of ϕd−1 ,


ϕd∗ (0)ϕd (z) − ϕd (0)ϕd∗ (z)
|ϕd (z) − ψ(z)| = ϕd (z) −
ϕd∗ (0)

ϕd (0)
= ∗ ϕd∗ (z) < |ϕd∗ (z)|.
ϕd (0)
On the unit circle we also have that

|ϕd∗ (z)| = |ϕd (z̄−1 )| = |ϕd (z)|,


4.3 Schur and von Neumann Polynomials 111

since z̄−1 = z on the unit circle. Thus, by Rouché’s theorem, ϕd has as many zeros inside
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

the unit circle as does zϕd−1 . Hence ϕd is a Schur polynomial.


Now assume that ϕd is a Schur polynomial of degree d. Then the product of the roots
of ϕd is a0 /ad , and this quantity must have a magnitude less than 1. This is equivalent to
|ϕd (0)| < |ϕd∗ (0)|. Rouché’s theorem then shows that zϕd−1 also is a Schur polynomial;
hence ϕd−1 is a Schur polynomial.
Proof of Theorem 4.3.2. We begin the proof by observing that a von Neumann
polynomial can be written as
(
$
ϕd (z) = (z − αν ) ϕ̃d−$ (z), (4.3.12)
ν=1
where |αν | = 1 for 1 ≤ ν ≤ $ and ϕ̃d−$ (z) is a Schur polynomial or a constant. (In
case $ = 0, the theorem follows from Theorem 4.3.1.)

Lemma 4.3.4. If ϕd (z) is of the form (4.3.12), then


(
$
ϕd−1 (z) = (z − αν ) ϕ̃d−$−1 (z). (4.3.13)
ν=1

Proof. We use the form (4.3.9) to prove the lemma and note that ᾱ = α −1 . We
have
$ 
( 
1 1
ϕd∗ (z) = ϕd (z̄−1 )zd =z d
− ᾱν ϕ̃ d−$
z z
ν=1
(
$

= (1 − zᾱν ) ϕ̃d−$ (z)
ν=1
(
$ (
$

= (−ᾱν ) (z − αν ) ϕ̃d−$ (z).
ν=1 ν=1
Note that
(
$
ϕd (0) = (−αν ) ϕ̃d−$ (0)
ν=1
and
(
$ (
$
ϕd∗ (0) = (−ᾱν ) ∗
(−αν ) ϕ̃d−$ ∗
(0) = ϕ̃d−$ (0).
ν=1 ν=1
Note also that
(
$ (
$ (
$
ϕd (0)ϕd∗ (z) = (−αν ) ϕ̃d−$ (0) (−ᾱν ) ∗
(z − αν ) ϕ̃d−$ (z)
ν=1 ν=1 ν=1

(
$

= ϕ̃d−$ (0) (z − αν ) ϕ̃d−$ (z) .
ν=1
112 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

With this we compute the numerator of (4.3.10) as

ϕd∗ (0)ϕd (z) − ϕd (0)ϕd∗ (z)

(
$ (
$
∗ ∗
= ϕ̃d−$ (0) (z − αν ) ϕ̃d−$ (z) − ϕ̃d−$ (0) (z − αν ) ϕ̃d−$ (z)
ν=1 ν=1

(
$
 ∗ 

= (z − αν ) ϕ̃d−$ (0)ϕ̃d−$ (z) − ϕ̃d−$ (0)ϕ̃d−$ (z)
ν=1

and the form (4.3.13) easily follows.


The lemma proves Theorem 4.3.2 in the case when ϕd−1 is not identically zero. We
also see that ϕd−1 is identically zero only if all the roots of ϕd lie on the unit circle, i.e.,
if ϕd is a conservative polynomial.
When ϕd is a conservative polynomial of degree d, we consider the polynomials

ϕdε (z) = ϕd (z) + εzϕd (z) (4.3.14)

for small positive values of ε. We use the following lemma to give information on the roots
of ϕdε (z). The lemma is for greater generality than is needed here because it is also used in
Chapter 8.

Lemma 4.3.5. If r is a root of ϕ(z) on the unit circle of multiplicity m, then the polynomial
ϕ ε (z) = ϕ(z) + εzϕ  (z) has a root satisfying

r ε = r(1 + ε)−m + O(ε 2 ).

Proof. We solve the equation

ϕ ε (z(1 + δ)) = 0

for δ as a function of ε. Using the Taylor series on ϕ and ϕ  , we have

ϕ(r(1 + δ)) + εr(1 + δ)ϕ  (r(1 + δ))


1 m 1
= ϕ (r)(rδ)m + εr(1 + δ) ϕ m (r)(rδ)(m−1) + O(δ)m+1
m! (m − 1)!
1 m  
= ϕ (r)r m δ m−1 δ + mε + O(δ)2 .
m!

From this last equation, we see that there are m − 1 roots with δ = 0, and one root with
δ = −mε + O(ε)2 . Thus the root not on the unit circle is of the form

r L = r(1 − mε) + O(ε 2 ) = r(1 + ε)−m + O(ε 2 ).


4.3 Schur and von Neumann Polynomials 113
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

By the lemma for m = 1, we have that the simple roots rνε of ϕdε are given by

rνε = rν (1 + ε)−1 + O(ε 2 ).

Thus, if ϕd is a conservative and simple von Neumann polynomial, then for small positive
ε
values of ε, ϕdε is a Schur polynomial. Theorem 4.3.1 then implies that ϕd−1 is also a
Schur polynomial,

Lemma 4.3.6. If ϕd (z) is a conservative polynomial of degree d, and ϕdε (z) is defined
by (4.3.14), then
ε
ϕd−1 (z) = ε(2 + dε)ϕd∗ (0)ϕd (z). (4.3.15)

Proof. We begin with several formulas. First, since ϕd−1 (z) is identically zero, we
have
ϕd∗ (0)ϕd (z) = ϕd (0)ϕd∗ (z),
and by differentiating this relation, we have

ϕd∗ (0)ϕd (z) = ϕd (0)ϕd∗  (z).

Next, we compute (zϕ  )∗ (z). We have


d
(zϕ  )∗ (z) = (d − $)ād−$ z$
$=0

d
= dϕ ∗ − $ād−$ z$
$=0
= dϕ ∗ (z) − zϕ ∗ (z).

So we have  
ϕdε∗ (z) = ϕd∗ (z) + ε dϕ ∗ (z) − zϕ ∗ (z) ,
and so
ϕdε∗ (0) = ϕd∗ (0)(1 + εd) and ϕdε (0) = ϕd (0).
Putting these formulas together to compute the numerator of (4.3.10), we have

ϕdε∗ (0)ϕdε (z) − ϕdε (0)ϕdε∗ (z)


  
= ϕd∗ (0)(1 + εd)(ϕd (z) + εzϕd (z)) − ϕd (0) ϕd∗ (z) + ε dϕ ∗ (z) − zϕ ∗ (z)
 
= εz ϕd∗ (0)(1 + εd)ϕd (z) + ϕd (0)ϕ ∗ (z)
 
= εz ϕd∗ (0)(1 + εd)ϕd (z) + ϕd∗ (0)ϕ  (z) = εz(2 + εd)ϕd∗ (0)ϕd (z).

This immediately leads to (4.3.15).


We are now in a position to prove Theorem 4.3.2 in the case when ϕd−1 is zero.
114 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

If ϕd−1 is zero, then ϕd (z) is a conservative polynomial and ϕdε is a Schur poly-
nomial for positive ε. Thus, by Theorem 4.3.1, |ϕdε (0)| < |ϕdε ∗ (0)| and ϕd−1
ε is a Schur
ε
polynomial. Moreover, by Lemma 4.3.6, the roots of ϕd−1 are the same as those of ϕd−1  .

Thus ϕd−1 is a Schur polynomial of degree d.
The argument in the other direction follows easily.
For completeness we state the following three theorems without proof.

Theorem 4.3.7. ϕd is a von Neumann polynomial of degree d if and only if either ϕd−1 is
a von Neumann polynomial of degree d − 1 and |ϕd (0)| < |ϕd∗ (0)| or ϕd−1 is identically
zero and ϕd is a von Neumann polynomial.

Theorem 4.3.8. ϕd is a conservative polynomial if and only if ϕd−1 is identically zero


and ϕd is a von Neumann polynomial.

Theorem 4.3.9. ϕd is a simple conservative polynomial if and only if ϕd−1 is identically


zero and ϕd is a Schur polynomial.
We now apply this theory to the examples given at the beginning of this section. In
applying this theory it is very helpful to use a computerized symbol manipulation language
to assist in the algebraic transformations.

Example 4.3.1, continued. We analyze the scheme (4.3.1) using Theorem 4.3.2 and begin
by setting
ϕ2 (z) = (7 + 4iβ)z2 − (8 − 2iβ)z + 1,
which is polynomial (4.3.2). The scheme will be stable precisely when ϕ2 (z) is a simple
von Neumann polynomial. We make repeated use of Theorem 4.3.2. We first check that
|ϕ2∗ (0)| = |7 − 4iβ| > 1 = |ϕ2 (0)| and then, using (4.3.10), we obtain

ϕ1 (z) = 4(12 + 4β 2 )z − 4(12 − 2β 2 − 12iβ).

ϕ1 is a simple von Neumann polynomial if and only if


 2  2
12 + 4β 2 ≥ 12 − 2β 2 + 122 β 2

and this inequality always holds, with equality only if β is zero. Thus the scheme (4.3.1)
is unconditionally stable.
Example 4.3.2, continued. For scheme (4.3.3) with ϕ3 equal to the amplification poly-
nomial (4.3.4), we have that

|ϕ3∗ (0)|2 − |ϕ3 (0)|2 = 24(2 − α)(11 − 6α) + 122 β 2 ,

and this expression is nonnegative for 0 ≤ α ≤ 11/6. (Since α, given by (4.3.6), depends
on θ and vanishes for θ equal to zero, we need not consider the case of α greater than 2,
nor need we consider negative α.) Again, we make repeated use of Theorem 4.3.2.
4.3 Schur and von Neumann Polynomials 115
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

The polynomial ϕ2 , after dividing by 24, is

 
ϕ2 (z) = (11 − 6α)(2 − α) + 6β 2 z2
 
− 2 (2 − α)(5 − 3α) − 3β 2 − (11 − 6α)iβ z

− (2 − α − 2iβ).
We have
 
|ϕ2∗ (0)|2 − |ϕ2 (0)|2 = 4(5 − 3α) 3(2 − α)3 + β 2 (13 − 6α) + 36β 4 .

Thus for stability we must have


5 11
0 ≤ α ≤ < .
3 6
This places a more severe requirement on α. Finally,
 
ϕ1 (z) = 120 − 252α + 198α 2 − 69α 3 + 9α 4 + (18α 2 − 69α + 65)β 2 + 9β 4 z

+ 9β 4 + 6(5 − 3α)iβ 3 + (3α − 5)β 2


 
− 18α 3 + 102α 2 + 192α − 120 iβ

− 9α 4 + 69α 3 − 198α 2 + 252α − 120.

We have that the one root of ϕ1 is within or on the unit circle when
|ϕ1∗ (0)|2 − |ϕ1 (0)|2
is nonnegative. This quantity is
 
12β 4 (5 − 3α) 6β 2 + (11 − 6α)(2 − α)

and is nonnegative when α is at most 5/3. Thus the stability condition for (4.3.3) is

sin2 21 θ 5
α = |aλ|2 ≤ .
1− 2
3 sin2 21 θ 3

The maximum value of the left-hand side of this inequality is achieved at θ equal to π.
Thus the scheme (4.3.3) is stable if and only if

5
|aλ| ≤ .
3
Notice that (4.3.3) is an implicit scheme but is not unconditionally stable.
116 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Example 4.3.3, continued. To complete our final example we consider the implicit (4, 4)
scheme (4.3.6) with amplification polynomial ϕ4 (z) given by (4.3.7). Using (4.3.10) we
find that ϕ3 (z) is identically zero, so by Theorem 4.3.2, ϕ4 (z) is a simple von Neumann
polynomial if and only if
ψ3 (z) = 3/4ϕ4 (z) = 3z3 + iβ(6z2 − 2z + 2)
is a Schur polynomial. By checking that |ψ3 (0)| < |ψ3∗ (0)|, i.e., |2β| < 3, we see that
ψ3 is a Schur polynomial only if |β| < 3/2.
Proceeding with the algorithm given by Theorem 4.3.2 and (4.3.10), we have
ψ2 (z) = (9 − 4β 2 )z2 + (4β 2 + 18iβ)z − 12β 2 − 6iβ.
This is a Schur polynomial only if
(9 − 4β 2 )2 > (12β 2 )2 + (6β)2 ,
which is equivalent to √
2 9( 41 − 3)
β < ,
64
and is a more severe restriction than that |β| be less than 3/2. We next obtain
 
ψ1 (z) = 81 − 108β 2 − 128β 4 z + 32β 4 − 264iβ 3 + 144β 2 + 162iβ.

The one root of ψ1 is inside the unit circle only if


 2  2  2
81 − 108β 2 − 128β 4 − 32β 4 + 144β 2 − 264β 3 − 162β

is nonnegative. This expression factors as


   
3 9 − 4β 2 3 − 16β 2 80β 4 − 72β 2 + 81 .

The last factor is always positive, and we deduce that ψ1 is a Schur polynomial for

2 3 9( 41 − 3)
β < < .
16 64
We obtain that the stability condition for the scheme (4.3.6) is

|aλ sin θ | 3
|β| = < .
1 − 3 sin 2 θ
2 2 1 4
The maximum of |β| as a function of θ occurs when cos θ is −1/2. Thus the scheme is
stable when
|aλ| < 1/4.
Notice that when |aλ| is 1/4, the polynomial ϕ4 (z) has a double root on the unit circle.
Since ϕ3 (z) vanishes identically, we have that ϕ4 (z) is a conservative polynomial; that is,
all the amplification factors of the scheme (4.3.6) satisfy
|gν (θ )| ≡ 1.
Even though this scheme is implicit, it is not unconditionally stable.
4.4 The Polynomial Algorithm 117
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Exercises
4.3.1. Show that if ϕd−1 (z), as defined by (4.3.10), is identically zero, then if α is a root
of ϕd (z), so is ᾱ −1 .

4.3.2. Verify the accuracy of the schemes (4.3.1), (4.3.3), and (4.3.6).

4.3.3. Verify that formula (4.3.15) follows from (4.3.10) for the polynomials (4.3.14).

4.3.4. A Hurwitz polynomial is a polynomial f (z), all of whose roots are in the left
complex half-plane, i.e., Re z < 0. If f has the coefficients aj we define


d

f (z) = ā$ (−z)$ .
$=0

Given a polynomial f0 we recursively define

fd (z)fd∗ (−1) − fd∗ (z)fd (−1)


fd−1 = .
z+1

Prove that fd is a Hurwitz polynomial of exact degree d if and only if fd−1


is a Hurwitz polynomial of exact degree d − 1 and |fd (−1)| < |fd∗ (−1)|. Hint:
|fd (−1)| is a constant multiple of the distance of the roots from −1. The proof is
similar to that of Theorem 4.3.1.

4.4 The Algorithm for Schur and von Neumann


Polynomials
We now incorporate the preceding theorems into an algorithm for determining the conditions
under which a polynomial is a von Neumann polynomial. In Chapter 8 these results are
extended to include von Neumann polynomials of higher order. Since the algorithm is
easier to state for the more general order, we give it in the greater generality. The von
Neumann polynomials defined in this chapter are von Neumann polynomials of order 1,
and Schur polynomials are von Neumann polynomials of order 0.
We start with a polynomial ϕd (z) of exact degree d, which might depend on several
parameters, and set the initial von Neumann order equal to 0.

While the degree, d, of ϕd (z) is greater than 0, perform steps 1 through 4.


1. Construct ϕd∗ (z) according to either (4.3.8) or (4.3.9).
2. Define cd = |ϕd∗ (0)|2 − |ϕd (0)|2 .
3. Construct the polynomial ψ(z) = (ϕd∗ (0)ϕd (z) − ϕd (0)ϕd∗ (z))/z
according to (4.3.10).
4.1. If ψ(z) is identically 0, then increase the von Neumann order by 1
and set ϕd−1 (z) to be ϕd (z).
118 Chapter 4. Stability for Multistep Schemes
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

4.2. Otherwise, if the coefficient of degree d − 1 in ψ(z) is 0, then the


polynomial is not a von Neumann polynomial of any order.
The algorithm terminates.
4.3. Otherwise, set ϕd−1 (z) to be ψ(z).
At the end of this algorithm, if the polynomial has not been rejected by step 4.2,
then the polynomial is a von Neumann polynomial of the resulting order provided that all
of the parameters cd , for d from the initial degree down to 1, satisfy the appropriate
inequalities. The quantities cd must be nonnegative if the polynomial ϕd is to be a von
Neumann polynomial and the cd must be positive if ϕd is to be a Schur polynomial. The
conditions on cd provide the stability conditions.
For first-order partial differential equations, the amplification polynomial must be a
von Neumann polynomial of order 1 for the scheme to be stable. For second-order partial
differential equations, as discussed in Chapter 8, the amplification polynomial must be a
von Neumann polynomial of order 2.

Exercises
4.4.1. Determine if these polynomials are Schur polynomials, von Neumann polynomials,
or neither. Use of a programming language is recommended for the polynomials of
higher degree.
(a) 2z3 + z2 + z + 1.
(b) 2z4 + z3 + z2 + 2z + 1.
(c) z6 + z5 − z − 1.
(d) z8 + z7 + z4 + z + 1.
(e) z8 + z5 + z + 1.
4.4.2. Use the methods of this section to show that the leapfrog scheme (1.3.4) is stable if
and only if |aλ| is less than 1.
4.4.3. Using the methods of this section, verify that the scheme (4.2.3) is unconditionally
stable.
4.4.4. Show that the modified leapfrog scheme
n+1 − v n−1  n+1 + 4v n + v n−1
vm m vm
+ aδ0 m m
= fmn
2k 6

is stable if and only if |aλ| < 3.
4.4.5. Show that the explicit (4, 4) scheme for (1.1.1) derived in Example 3.3.1,

n+2 − v n−2   n+1 n + 2v n−1


vm m h2 2 2vm − vm
+ a 1 − δ δ0 m
4k 6 3

2 fmn+1 − fmn + 2 fmn−1


= ,
3
4.4 The Polynomial Algorithm 119
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

is stable for √
3 1
|aλ| < √ √ .
4 (1 + 6)( 6 − 3/2)1/2
Hint: The amplification polynomial for this scheme is very similar to (4.3.7).
4.4.6. Show that the following scheme for ut + aux = f is accurate of order (3, 4) and
is unstable for all values of λ:

n+1 − 18v n + 9v n−1 − 2v n−2  −1


11vm m m m h2
+ a 1 + δ2 n+1
δ0 vm = fmn+1 .
6k 6

4.4.7. Show that the scheme


n+2 + v n+1 − v n−1 − v n−2 
vm m m m h2 δ 2
+a 1− n
δ0 vm = fmn
6k 6

is a (2, 4) accurate scheme for ut + aux = f and is stable for


 √
69 − 11 33
|aλ| < √ √ .
8( 6 + 1)( 6 − 3/2)1/2

4.4.8. Show that the scheme

n+2 + v n+1 − v n−1 − v n−2  −1


vm m m m h2 δ 2 v n+1 + vm
n−1
+a 1+ δ0 m = fmn
6k 6 2

is a (2, 4) accurate scheme for ut + aux = f and is stable for

1  1/3 
|aλ| < 2 −1 .
3
 2
Hint: The real root of 1 − 15x + 3x 2 − x 3 is 21/3 − 1 .
4.4.9. Show that the scheme

n+1 + 3v n − 6v n−1 + v n−2  −1


2vm m m m h2 δ 2
+a 1+ n
δ0 vm =0
6k 6

is a (3, 4) accurate scheme for the one-way wave equation ut + aux = 0 and is
unstable for all values of λ.
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Chapter 5

Dissipation and Dispersion

In this chapter we study two important topics in the study of finite difference schemes for
hyperbolic equations, dissipation and dispersion. Schemes that have dissipation damp out
high-frequency waves that can make the computed solution more oscillatory than desired.
Dispersion refers to the fact that finite difference schemes move different frequencies at
different speeds. This causes the computed solution to spread out as time progresses. Both
dissipation and dispersion are important properties to consider in selecting schemes for
computation.
The third section of this chapter deals with the propagation of wave packets, which
are highly oscillatory waves contained in a short range. The propagation of packets depends
on both the phase velocity due to dispersion and the group velocity of the packet.

5.1 Dissipation
In Section 1.3 we noted that the leapfrog scheme was more accurate than the Lax–Friedrichs
scheme, as illustrated by Figures 1.3.6 and 1.3.8. However, the solution computed with
the leapfrog scheme contains small oscillations that detract from the appearance of the
solution. In this section we discuss a way of removing, or at least reducing, the amplitude
of this “noise.” For many calculations, especially for nonlinear equations, these small-
amplitude, high-frequency oscillations can have a significant role in reducing the accuracy
of the solution.
To consider the method of propagation of these oscillations, consider the leapfrog
scheme (1.3.4) with initial data given by
0
vm = (−1)m η and 1
vm = (−1)m+1 η for m ∈ Z, (5.1.1)

where η is some small parameter. It is easy to check that the solution for all time steps is
n
vm = (−1)m+n η for m ∈ Z. (5.1.2)

This formula shows that the leapfrog scheme (1.3.4) propagates the initial disturbances
without damping them. A more striking illustration of propagation without damping was
seen in Figure 4.3, in which the solution of the leapfrog scheme in the third plot was
“reconstructed” as the reflection of the solution in the second plot. This propagation without
damping is a consequence of the amplification factors g+ (θ ) and g− (θ ) having magnitude
equal to 1. (See formula (4.1.2).)

121
122 Chapter 5. Dissipation and Dispersion
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Next consider the Lax–Wendroff scheme (3.1.2) with |aλ| less than 1 and with data
(5.1.1) for n = 0. Because of the repetition of the data, the solution is
n
vm = (1 − 2a 2 λ2 )n (−1)m+n .

Since |1 − 2a 2 λ2 | is less than 1, the oscillation decreases in magnitude each step. This
decreasing of high-frequency oscillations is called dissipation.
The definition of dissipation is usually given under the assumption that the lower order
terms have been removed. This is also assumed in the definition of dissipation given next.
Also note that we include both one-step schemes and multistep schemes in the definition.

Definition 5.1.1. A scheme is dissipative of order 2r if there exists a positive constant


c, independent of h and k, such that each amplification factor gν (θ ) satisfies

|gν (θ )| ≤ 1 − c (sin 12 θ)2r . (5.1.3)

A scheme that is dissipative of order 2r is also said to have dissipation of order 2r.
Similar to the observation at the end of Section 2.2, we note that (5.1.3) is equivalent to

|gν (θ )|2 ≤ 1 − c (sin 12 θ)2r (5.1.4)

for some constant c .


The Lax–Wendroff scheme satisfies
 
|g(θ )|2 = 1 − 4a 2 λ2 1 − a 2 λ2 (sin 12 θ)4

(see Section 3.2) and so is dissipative of order 4 for 0 < |aλ| < 1. In fact, for θ = π, we
have g(θ ) = 1 − 2a 2 λ2 , as our example showed.
The forward-time backward-space scheme (1.3.2) satisfies

|g(θ )|2 = 1 − 4aλ (1 − aλ) (sin 12 θ)2

(see (2.2.6)) and so is dissipative of order 2 for 0 < aλ < 1.


For the most general case, in which we cannot use the restricted stability condition
(2.2.8) but must use the general condition of (2.2.7), the estimate (5.1.3) must be replaced
by  
|gν (θ )| ≤ 1 − c(sin 12 θ)2r (1 + Kk) , (5.1.5)

and similarly for (5.1.4).


Often the definition of dissipation is given by replacing the quantity (sin θ)2r in
(5.1.3) by |θ |2r , with θ restricted in magnitude to less than π. The definitions are equiv-
alent; we prefer the form (5.1.3), since that is the form that actually occurs in evaluating
|g(θ )| for most schemes.
The leapfrog scheme (1.3.4) and the Crank–Nicolson scheme (3.1.3) are called strictly
nondissipative schemes because their amplification factors are identically 1 in magnitude.
5.1 Dissipation 123
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

The Lax–Friedrichs scheme (1.3.5) and the backward-time central-space scheme (1.6.1)
and the (2, 2) implicit scheme (4.2.3) are nondissipative but not strictly so. For example,
the Lax–Friedrichs scheme has

|g(θ )|2 = cos2 θ + a 2 λ2 sin2 θ

(see (2.2.12)), and since |g(π )| = 1, this scheme is not dissipative.


For the Lax–Friedrichs and backward-time central-space schemes, |g(θ )| is less than
1 for most values of θ. These schemes reduce the magnitude of most frequencies but not
the highest frequency on the grid.

Adding Dissipation to Schemes


Dissipation can be added to any nondissipative scheme, as we will show, and this provides
us with some control over the properties of the scheme. In adding dissipation to a nondis-
sipative scheme, we must be careful not to affect the order of accuracy adversely. For
example, the modified leapfrog scheme
n+1 − v n−1  4
vm m ε 1
+ aδ0 vm
n
+ hδ n−1
vm = fmn (5.1.6)
2k 2k 2

is a second-order accurate scheme for ut + aux = f for small values of ε. Notice that
(sin 12 θ )2r is the symbol of ( 12 ihδ)2r.
The amplification factors are
&
g± (θ ) = −iaλ sin θ ± 1 − a 2 λ2 sin2 θ − ε sin4 21 θ.

If ε is small enough, then the scheme is stable and dissipative of order 4 (see Exercise
5.1.2) and satisfies
|g± (θ )|2 = 1 − ε sin4 21 θ .
Similarly, the Crank–Nicolson scheme (3.1.3) can be modified as
 4
n+1 − v n
vm m a  n+1  ε 1 1  n+1 
+ δ0 vm + vm
n
+ hδ n
vm = fm + fmn . (5.1.7)
k 2 k 2 2

This scheme is second-order accurate and dissipative of order 4 for small values of ε.
Figures 5.1 and 5.2 show the effect of adding dissipation to the leapfrog scheme. The
solution is the propagation of a simple piecewise linear pulse. Notice that the dissipation
removes most of the oscillations to the left of the pulse. It does not remove the larger oscil-
lation behind the pulse. This oscillation is inherent in higher order schemes, as discussed
after Theorem 3.1.4.
To show that any scheme can have dissipation added to it, we consider the amplifi-
cation polynomial and modify it as in formula (4.3.14). To be more precise, the scheme
corresponding to
Gε (g, θ ) = G(g, θ) + ε(sin 12 θ)2r gG (g, θ ) (5.1.8)
124 Chapter 5. Dissipation and Dispersion
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

0.5

-2 -1 0 1 2 3

Figure 5.1. The leapfrog scheme with no dissipation added.

0.5

-2 -1 0 1 2 3

Figure 5.2. The leapfrog scheme with dissipation of ε = 0.5.

will have all roots inside the unit circle except at θ equal to 0. ( G (g, θ ) is the derivative
of G with respect to g.) Another choice for a dissipative scheme is
 
Gε (g, θ ) = G(g, θ) + ε(sin 12 θ)2r gG (g, θ ) − dG(g, θ ) , (5.1.9)

where d is the degree of G(g, θ). The preceding general procedures are not always advis-
able to use, but they do give one guidance in adding dissipation to a scheme
(see Exercises 5.1.3 and 5.1.4).
5.2 Dispersion 125
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

If we use the methods of Section 4.3, then we can determine if the scheme is
dissipative by checking if the amplification polynomial is a Schur polynomial for val-
ues of θ other than θ equal to 0. For a dissipative scheme the amplification polynomial
is a Schur polynomial when θ is not equal to zero.

Exercises
5.1.1. Show that the scheme (5.1.7) is dissipative of order 4 and stable if 0 < ε < 2.
5.1.2. Show that the modified leapfrog scheme (5.1.6) is stable for ε satisfying

0 < ε ≤ 1 if 0 < a 2 λ2 ≤ 1
2

and  
0 < ε ≤ 4a 2 λ2 1 − a 2 λ2 if 1
2 ≤ a 2 λ2 < 1.
Note that these limits are not sharp. It is possible to choose L larger than these limits
and still have the scheme be stable.
5.1.3. Construct the modified scheme corresponding to formula (5.1.8) using the multistep
scheme (4.2.3). Compare this scheme with
n+1 − 4v n + v n−1  2r
3vm m m ε i
+ aδ0 vm
n+1
= hδ vmn−1
.
2k 2k 2

5.1.4. Construct the leapfrog scheme with added dissipation using the method given by
formula (5.1.9). Compare this scheme with the scheme (5.1.6).
5.1.5. Construct the Crank–Nicolson scheme with added dissipation using the method given
by formulas (5.1.8) and (5.1.9). Compare these schemes with each other and with
the scheme (5.1.7).
5.1.6. Show that the scheme of Exercise 3.3.5 is dissipative of order 6 for
 √17 − 1 1/2
0 < |aλ| < .
6

5.1.7. Show that the scheme (3.3.16) is dissipative of order 4 if 0 < |aλ| < 3.

5.2 Dispersion
To introduce the idea of dispersion we look again at equation (1.1.1) and notice that we can
write the solution as
 ∞
1
u(t, x) = √ eiωx e−iωat û0 (ω) dω. (5.2.1)
2π −∞
From this we conclude that the Fourier transform of the solution satisfies

û(t + k, ω) = e−iωak û(t, ω). (5.2.2)


126 Chapter 5. Dissipation and Dispersion
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

If we consider a one-step finite difference scheme, we have, from (2.2.5), that

v̂ n+1 = g(hξ )v̂ n , (5.2.3)

and by comparing (5.2.2) and (5.2.3) we see that we should expect that g(hξ ) will be a
good approximation to e−iξ ak .
To emphasize the similarity between e−iξ ak and g(hξ ), we write

g(hξ ) = |g(hξ )|e−iξ α(hξ )k . (5.2.4)

The quantity α(hξ ) is called the phase speed and is the speed at which waves of frequency
ξ are propagated by the finite difference scheme. If α(hξ ) were equal to a for all ξ,
then waves would propagate with the correct speed, but this is not the case for any finite
difference scheme except in trivial cases. The speed α(hξ ) is only an approximation to a.
The phenomenon of waves of different frequencies traveling with different speeds is
called dispersion. In studying dispersion for finite difference schemes it is convenient to
define the phase error as a − α(hξ ).
The effect of dispersion can be seen in the distortion of the shape of the solution of a
finite difference scheme. Consider the solution as displayed in Figure 1.3.6. For the partial
differential equation the shape is preserved; it is only translated. For the finite difference
scheme the shape is not preserved because the different frequencies that make up the initial
solution are propagated with different speeds.
From (5.2.4) we obtain two useful formulas for α(θ). First, by using θ rather than
ξ, we have
g(θ ) = |g(θ )|e−iθα(θ)λ .

By considering the real and imaginary parts of g(θ ) we have

Im g(θ )
tan(α(θ )λθ ) = − . (5.2.5)
Re g(θ )

Also, if |g(θ )| = 1, then we have the formula

sin(α(θ )λθ ) = −Im g(θ ). (5.2.6)

Example 5.2.1. We consider the Lax–Wendroff scheme to study its dispersion. We have

g = 1 − 2(aλ)2 sin2 21 hξ − iaλ sin hξ

and so by (5.2.5)
aλ sin hξ
tan [α(hξ )ξ k] = . (5.2.7)
1 − 2(aλ)2 sin2 21 hξ
5.2 Dispersion 127
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Since this formula does not give too much insight into the behavior of α(hξ ), we
study the Taylor series of α(hξ ) around ξ = 0. We use the formulas
 
sin x = x 1 − 16 x 2 + O(x 4 ) ,
 
tan x = x 1 + 13 x 2 + O(x 4 ) ,
 
tan−1 y = y 1 − 13 y 2 + O(y 4 ) .

Using the series for sin x in (5.2.7) we obtain, after some work,
   
tan [α(hξ )ξ k] = aλhξ 1 − (hξ )2 16 − 12 (aλ)2 + O(hξ )4 .

From the formula for tan−1 y we obtain


   
α(hξ ) = a 1 − 16 (hξ )2 1 − (aλ)2 + O(hξ )4 . (5.2.8)

We see that for hξ small and |aλ| < 1, α(hξ ) is less than a. Also, we see that if |aλ|
is close to 1, then the dispersion will be less.
Because α(hξ ) is less than a for smaller values of ξ, solutions to the Lax–Wendroff
finite difference scheme will move slower than the true solution. This is displayed in
Figure 3.1, in which the solution of both schemes trails the true solution.
To deduce the behavior of α(hξ ) for larger values of ξ, we refer to the formula for
g and (5.2.7). We find that for ξ equal to h−1 π, g has the value 1 − 2a 2 λ2 . If a 2 λ2
is greater than 1/2, then g is negative, and so α(π )h−1 π k is equal to π. Thus α(π )
is λ−1 . However, if a 2 λ2 is less than 1/2, then g is positive and so α(π ) is 0. By
consistency, α(hξ ) will always be close to a for small values of ξ, and, in particular,
α(0) is equal to a. This is proved in Chapter 10.

For the leapfrog scheme and other multistep schemes, the phase error is defined by
considering the one amplification factor g0 (hξ ) for which

g0 (0) = 1.

As we showed in Section 4.2, there is only one such amplification factor. For the leapfrog
scheme (1.3.4) we see by (5.2.6) that the phase speed is given by

sin[α(θ )λθ ] = aλ sin θ,

or, equivalently,

sin[kα(hξ )ξ ] = aλ sin hξ. (5.2.9)

The expansion of α(hξ ) for small values of ξ results in the same formula as for the
Lax–Wendroff scheme up to O(hξ )4 ; see Exercise 5.2.1. Note also that α(π ) is 0 for the
leapfrog scheme.
128 Chapter 5. Dissipation and Dispersion
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

One can also study the propagation velocities of parasitic modes for multistep schemes.
We consider the parasitic mode for the leapfrog scheme as an example. Since g− (0) = −1,
it is best to write the equation for the phase velocity as

g− (hξ ) = −|g− (hξ )|e−iξ α− (hξ )k . (5.2.10)

It is seen that
sin[α− (θ )λθ ] = −aλ sin θ
and thus α− (θ ) = −α(θ ). In particular, the parasitic mode moves in the opposite direction
to the correct direction of propagation. Moreover, since α(π ) = α− (π ) = 0, the highest
frequencies, which contain no accurate information, do not propagate away.
Figure 5.3 shows the phase velocity of the leapfrog scheme, α(θ), as a function of
θ for a = 1 and λ = 0.95. Notice that the phase speed is close to 1 for smaller values of
θ, but drops off to 0 for larger values of θ.

λ = 0.95

0.8

0.6
α

0.4

0.2

0
0 1 2 3
θ

Figure 5.3. Phase velocity for the leapfrog scheme.

As a general principle, for hyperbolic partial differential equations it is best to take


|aλ| close to the stability limit to keep the dispersion and dissipation small. If we are
interested in a particular frequency, say ξ0 , then we should choose h so that hξ0 is much
less than π to get accurate results, both in the speed of the wave (dispersion) and in the
amplitude (dissipation). For the leapfrog and Lax–Wendroff schemes for (1.1.1) with aλ
equal to 1, the schemes have no dispersion error. These are exceptional cases, and this does
not happen for variable coefficient equations or nontrivial systems.
Using the fact that the coefficients of the scheme are real, it is easy to show that the
phase error is an even function of hξ. Thus the phase error always has even order. If a

scheme has order of accuracy r, then phase error is of order O(hr ), where r  is r if r
is even, and r  is r + 1 if r is odd.
5.2 Dispersion 129
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

In choosing a scheme for particular applications, the amount of dissipation and dis-
persion can be used to choose between schemes; see Durran [15].

Exercises
5.2.1. Show that the formula (5.2.8) is also true for the leapfrog scheme, where α(hξ ) is
given by (5.2.9).
5.2.2. For the backward-time central-space scheme (1.6.1), show that the phase speed is
given by
tan [kα(hξ )ξ ] = aλ sin hξ
and satisfies
!
(hξ )2    4
α(hξ ) = a 1 − 1 + 2a λ + O hξ
2 2
.
6

5.2.3. Show that the phase speed for the Crank–Nicolson scheme (3.1.3) is given by
 
tan 12 kα(hξ )ξ = 12 aλ sin hξ

and satisfies
 !
(hξ )2 1 2 2  4
α(hξ ) = a 1 − 1 + a λ + O hξ .
6 2

5.2.4. Show that for the multistep scheme (4.2.3), the amplification factor g+ (θ ) satisfying
g+ (0) = 1 can be expanded as


g+ (θ )−1 = 2 − 1 − 2iaλ sin θ

1 i
= 1 + iaλ sin θ − (aλ sin θ)2 − (aλ sin θ)3 + O(θ )4
2 2
and thus  
tan [kα(hξ )ξ ] = aλ sin hξ 1 + O(hξ )4 ,
and conclude that α(hξ ) is the same as for the scheme of Exercise 5.2.2 to within
O(hξ )4 .
5.2.5. Show that the (2, 4) Crank–Nicolson scheme
  n+1 n
n+1 − v n
vm m h2 2 −1 vm + vm
+a 1+ δ δ0 =0 (5.2.11)
k 6 2
has phase speed given by
!  
1 3 aλ sin hξ 1
tan kα(hξ )ξ = = akξ 1 + O(hξ )4
2 2 2 + cos hξ 2
130 Chapter 5. Dissipation and Dispersion
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

and thus ! 
1
α(hξ ) = a 1 − a 2 (kξ )2 + O(kξ )4 1 + O(hξ )4 .
3

5.2.6. Show that the phase speed for the (2, 4) Lax–Wendroff scheme of Exercise 5.1.6
satisfies  
akξ 1 + O(hξ )4
tan kα(hξ )ξ =  
1 − 12 a 2 (kξ )2 1 + O(hξ )4
and therefore ! 
1
α(hξ ) = a 1 + a 2 (kξ )2 1 + O(hξ )4 .
6

5.3 Group Velocity and the Propagation of Wave Packets


The study of wave packets introduces another interesting aspect of dispersion, that of group
velocity. For other discussions of group velocity see Trefethen [62] and Vichnevetsky and
Bowles [65]. We consider the one-way wave equation (1.1.1) with initial data of the form

u(0, x) = eiξ0 x p(x), (5.3.1)

where p(x) is a relatively smooth function decaying rapidly with |x|. The solution for
the initial condition (5.3.1) is

u(t, x) = eiξ0 (x−at) p(x − at), (5.3.2)

since the solution is a simple translation by at.


We call a function of the form (5.3.2) a wave packet. We refer to the function p(x)
as the envelope of the packet and the frequency ξ0 as the frequency of the wave packet.
As a particular case, we will use the function

cos ξ0 x cos2 21 π x if |x| ≤ 1,


u(0, x) = (5.3.3)
0 otherwise,

in our numerical illustrations. In this case we use cos ξ0 x, the real part of eiξ0 x , instead
of eiξ0 x itself. Figure 5.4 displays the wave packet

cos(3π x) e−x .
2

The wave packet is a highly oscillatory function with a limited range.


For a finite difference scheme we know that dispersion will cause the pure wave with
frequency ξ0 to travel with the phase velocity α(hξ0 ), but it is not evident what will be
the speed or speeds associated to a wave packet.
We show that for many schemes, the wave of the packet moves with the phase speed
and the envelope moves with a second speed, at least to a error that is O(h). As we will
5.3 Group Velocity and Wave Packets 131
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Figure 5.4. A wave packet.

show, a strictly nondissipative finite difference scheme with a wave packet as initial data
will have a solution that is approximately

v ∗ (t, x) = eiξ0 (x−α(hξ0 )t) p(x − γ (hξ0 )t), (5.3.4)

where α(hξ0 ) is the phase velocity of the scheme and γ (hξ0 ) is the group velocity.
The group velocity is given by

d(θ α(θ ))
γ (θ ) = = α(θ) + θα  (θ ). (5.3.5)

Notice that since α(hξ ) tends to a as h tends to zero, we have that γ (hξ ) also tends to
a; that is, as h tends to zero the function v ∗ , which approximates vm
n , tends to the exact

solution (5.3.2).

Example 5.3.1. The concept of group velocity is illustrated in Figure 5.5. The computation
uses the leapfrog scheme to solve the one-way wave equation ut + ux = 0 on the interval
[−2, 2] with periodicity. The grid spacing is 0.05 with λ = 0.95. The initial condition is
(5.3.3) with ξ0 equal to 5π.
Figure 5.5 displays the computed solution at time 19.95 with a solid line connecting
the data points marked with dots. Also shown is the graph of the envelope, not the solution
itself, at time 19.95. It is immediately seen that the envelope of the computed solution does
not correspond to that of the exact solution, so the wave packet has traveled with a speed
132 Chapter 5. Dissipation and Dispersion
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

1
⇓ ↓

0.5

-0.5

-1
-2 -1 0 1 2

Figure 5.5. The propagation of a wave packet.

that is less than the true envelope speed of 1. The group velocity for ξ0 equal to 5π is
approximately 0.9545, and a wave packet traveling with this velocity would be centered at
−0.9568. This location is marked by the double arrow in the figure. It is seen that this is a
very good approximation to the center of the computed solution wave packet.
The single arrow in the figure marks the location where the center of the wave packet
would be if it traveled with the phase velocity. The peak of the wave under the arrow has
traveled with the phase velocity. Originally this peak was at the center of the wave packet,
but it is no longer because the group velocity is less than the phase velocity.
Finally, the graph of v ∗ is shown as the solid line without points. It is seen that the
approximation of vm n by v ∗ (t , x ) is a good approximation, better than the approximation
n m
of u(t, x) by vm .n

We now justify the claim that v ∗ approximates the solution of the scheme. The
initial data (5.3.1) has the Fourier transform p̂(ξ − ξ0 ), and thus the solution of the finite
5.3 Group Velocity and Wave Packets 133
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

difference scheme is
 π/ h
1
n
vm =√ eiξ xm e−iξ α(hξ )tn p̂h (ξ − ξ0 ) dξ, (5.3.6)
2π −π/ h

where ph,m is the discrete function given by evaluating p at the grid points xm . We show
in Section 10.2 that p̂h (ξ ) is approximately p̂(ξ ) for |ξ | much less than h−1 π. For now
we may disregard the distinction between p̂(ξ − ξ0 ) and p̂h (ξ − ξ0 ).
We change the variable of integration in (5.3.6) by replacing ξ by ω + ξ0 . We obtain,
since all the functions are periodic with period 2π h−1 ,
 π/ h
1
vm = √
n
ei(ω+ξ0 )xm e−i(ω+ξ0 )α(hω+hξ0 )tn p̂h (ω) dω, (5.3.7)
2π −π/ h
and define ṽ(t, x) by replacing p̂h (ω) by p̂(ω) and by extending the range of integration
to the whole real line, obtaining
 ∞
1
ṽ(t, x) = √ ei(ω+ξ0 )x e−i(ω+ξ0 )α(hω+hξ0 )t p̂(ω) dω. (5.3.8)
2π −∞
It can be shown by the methods of Chapter 10 that the replacement of p̂h (ω) by
p̂(ω) and the extension of the limits of integration to (−∞, ∞) do not cause significant
errors in the approximation; see Exercise 10.2.6.
We write ṽ(t, x) as
   ∞
1
ṽ(t, x) = eiξ0 x−α(hξ0 )t √ eiω(x−γ̃ t) p̂(ω) dω, (5.3.9)
2π −∞
with
(ω + ξ0 )α(hω + hξ0 ) − ξ0 α(hξ0 )
γ̃ (hω) =
ω
(ϕ + θ0 )α(ϕ + θ0 ) − θ0 α(θ0 )
=
ϕ
with θ0 = hξ0 and ϕ = hω. The use of Taylor series on α(ϕ) about θ0 results in
d(θ α(θ )) 1 d 2 θ α(θ )
γ̃ (ϕ) = + ϕ
dθ θ=θ0 2 dθ 2 θ=θ ∗
for some value of θ ∗ between θ0 and θ0 + ϕ. Writing this in terms of ω and ξ0 , we
have

d(ξ α(hξ )) 1 d 2 (θ α(θ ))


γ̃ (hω) = + h ω
dξ ξ =ξ0 2 dθ 2 θ=hξ ∗

for some value of ξ ∗ between ξ0 and ξ0 + ω. We see that γ̃ (hω) is equal to the group
velocity γ (hξ0 ) to within an error on the order of h. Rewriting (5.3.9) we have
   ∞  
iξ0 x−α(hξ0 )t 1 iω x−γ (hξ0 )t ihtr(ω)
ṽ(t, x) = e √ e e p̂(ω) dω, (5.3.10)
2π −∞
134 Chapter 5. Dissipation and Dispersion
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where we have written r(ω) for the term (1/2) ω2 d 2 θ α(θ )/dθ 2 .
If we replace the factor eihtr(ω) by 1 in the expression for ṽ(t, x), we obtain v ∗ (t, x):
   ∞  
∗ iξ0 x−α(hξ0 )t 1 iω x−γ (hξ0 )t
v (t, x) = e √ e p̂(ω) dω
2π −∞
(5.3.11)
   
= eiξ0 x−α(hξ0 )t
p x − γ (hξ0 )t .

Since eihtr(ω) = 1 + O(hω), it is shown in Section 10.2 that the difference between
ṽ(t, x) and v ∗ (t, x) is also O(h), provided that p(x) is smooth enough (see Exercise
10.2.6). This shows (up to the details deferred to Chapter 10) that the solution to the finite
difference scheme
 is approximated
 to within O(h) by v ∗ (t, x). If ξ0 is such that the
quantity ξ0 a − α(hξ0 ) t is O(1), then vm n also differs from the exact solution, u(t, x),

by O(1), as we discussed earlier in the analysis of the phase error. In this case the
approximation of vm n by v ∗ (t, x) can be much better than the approximation of u(t, x)
n
by vm when t is large. This is well illustrated in Figure 5.5.
Group velocity can be used to explain some rather striking behavior of schemes (see
Exercise 5.3.9). Even in the presence of dissipation, the propagation of waves is governed
by the phase and group velocity, as Exercise 5.3.9 demonstrates. Group velocity has been
used by Trefethen to explain instability caused by boundary conditions; see [63].

Exercises
5.3.1. Show that the group velocity for the Lax–Wendroff scheme is given by

1 − 2(1 − a 2 λ2 ) sin2 21 hξ
γ (hξ ) = a .
1 − 4a 2 λ2 (1 − a 2 λ2 ) sin4 21 hξ

5.3.2. Show that the group velocity for the leapfrog scheme (1.3.4) is given by
cos hξ
γ (hξ ) = a &
1 − a 2 λ2 sin2 hξ .

Compare the phase speed and group velocity at ξ = h−1 π.


5.3.3. Show that the group velocity for the Crank–Nicolson scheme (3.1.3) is given by
cos θ
γ (θ) = a .
1+ 1 2 2 2
4 a λ sin θ

5.3.4. Show that the group velocity for the box scheme (3.2.3) is given by

sec2 21 θ
γ (θ) = a .
1 + a 2 λ2 tan2 21 θ
5.3 Group Velocity and Wave Packets 135
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

5.3.5. Repeat the calculation of Example 5.3.1 using the leapfrog scheme but on the
interval [−1, 9] for 0 ≤ t ≤ 7.5. Specify the solution at the left boundary to be
0, and at the right boundary use quasi-characteristic extrapolation (3.4.1). Demon-
strate that the wave packet moves with the group velocity and that the high-frequency
mode travels with the phase velocity. Show that the conclusions of Example 5.3.1
are valid in this case also. Study the effect of small amounts of dissipation using the
scheme (5.1.6).
5.3.6. Repeat the calculation of Example 5.3.1 using the Lax–Wendroff scheme. Demon-
strate that the wave packet moves with the group velocity and that the high-frequency
mode travels with the phase velocity. In this exercise you will, of course, also see
the effect of dissipation on the solution.
5.3.7. Repeat the calculation of Example 5.3.1 using the Crank–Nicolson scheme but
using ξ0 equal to 3π and λ equal to 1. Demonstrate that the wave packet moves
with the group velocity and that the high-frequency mode travels with the phase ve-
locity. Note that the Crank–Nicolson scheme is highly dispersive; i.e., because the
phase speed is not as good an approximation to a as it is for the leapfrog scheme,
the wave packet will be significantly distorted. This exercise will require you to
solve a periodic tridiagonal system; see Section 3.5.
5.3.8. Solve the one-way wave equation ut + ux = 0 on the interval [−1, 9] for 0 ≤ t ≤
7.5 for the initial data (5.3.3) with ξ equal to 8π. Use the Crank–Nicolson scheme
with grid spacing of 0.025 and λ equal to 1. For the boundary condition at x = 9,
use the quasi-characteristic boundary condition (3.4.1). Demonstrate that the wave
packet moves with the group velocity and that the high-frequency mode travels with
the phase velocity. Note that the Crank–Nicolson scheme is highly dispersive; i.e.,
because the phase speed is not as good an approximation to a as it is for the leapfrog
scheme, the wave packet will be significantly distorted.
5.3.9. Solve the one-way wave equation ut + ux = 0 on the interval [−3, 3] for 0 ≤
t ≤ 1.45. Use the leapfrog scheme with grid spacing of 0.1 and λ equal to 0.9.
For initial data use the wave packet (5.3.3) with ξ equal to 9π. To compute the
values of the solution at the first step, use two different methods to initialize, the
Lax–Friedrichs scheme and the forward-time central-space scheme. The difference
between the two solutions is quite dramatic. Explain the difference by considering
the amplitude associated with the parasitic mode for each case and noting that for
the Lax–Friedrichs scheme g(π ) = −1, whereas for the forward-time central-space
scheme g(π ) = 1. Also, for the leapfrog scheme γ (π ) is −1, and for the parasitic
mode γ− (π ) is 1.
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Chapter 6

Parabolic Partial Differential


Equations

6.1 Overview of Parabolic Partial Differential Equations


The simplest parabolic equation is the one-dimensional heat equation

ut = buxx , (6.1.1)

where b is a positive number. This equation arises in the study of heat transfer, in which
case the function u(t, x) gives the temperature at time t and location x resulting from the
initial temperature distribution. Equations similar to (6.1.1) arise in many other applications,
including viscous fluid flow and diffusion processes. As for the one-way wave equation
(1.1.1), we are interested in the initial value problem for the heat equation (6.1.1); i.e.,
we wish to determine the solution u(t, x) for t positive, given the initial condition that
u(0, x) = u0 (x) for some function u0 .
We can obtain a formula for the solution to (6.1.1) by using the Fourier transform of
(6.1.1) in space to obtain the equation

ût = −bω2 û.

Using the initial values, this equation has the solution

û(t, ω) = e−bω t û0 (ω),


2

and thus by the Fourier inversion formula

 ∞
1
eiωx e−bω t û0 (ω) dω.
2
u(t, x) = √ (6.1.2)
2π −∞

Formula (6.1.2) shows that u at time t is obtained from u0 by damping the high-frequency
modes of u0 . It also shows why the solution operator for a parabolic equation is called a
dissipative operator, since all high frequencies are dissipated.

137
138 Chapter 6. Parabolic Partial Differential Equations
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

A second formula can be obtained by using the definition of û0 in (6.1.2) and inter-
changing the order of integration. We have

 ∞  ∞ !
1 1
eiωx e−bω e−iωy u0 (y) dy dω
2t
u(t, x) = √ √
2π −∞ 2π −∞
 ∞   ∞
1 1
eiω(x−y) e−bω t dω u0 (y) dy
2
=√ √
4π −∞ π −∞
 ∞
1
e−(x−y)
2 /4bt
=√ u0 (y) dy.
4πbt −∞

(See Exercise 6.1.6 for the evaluation of the integral in the parentheses.)
The formula
 ∞
1
e−(x−y) /4bt u0 (y) dy
2
u(t, x) = √ (6.1.3)
4π bt −∞

expresses u(t, x) as a weighted average of u0 . For small t the weighting function mul-
tiplying u0 (y) is very peaked about y = x, whereas for larger t the weighting function
is much wider.
There are several important things to learn from the representations (6.1.2) and (6.1.3).
First, the solution to (6.1.1) is an infinitely differentiable function of t and x for any positive
value of t. This is easily seen by differentiating the representation (6.1.2), obtaining
 ∞
∂ $+m u(t, x) 1
eiωx (iω)m (−bω2 )$ e−bω t û0 (ω) dω.
2
$ m
=√
∂t ∂x 2π −∞

Since the quantity (iω)m (−bω2 )$ e−bω


2t
is in L2 (R) for positive values of t, we obtain,
by the Cauchy–Schwarz inequality,

∂ $+m u(t, x)  ∞
1
|ω|m (bω2 )$ e−bω t |û0 (ω)| dω
2
≤ √
∂t $ ∂x m 2π −∞
 ∞ 1/2  ∞ 1/2
1
|ω|2m b2$ ω4$ e−2bω t dω
2
≤ √ |û0 (ω)|2 dω
2π −∞ −∞

≤ Ct,$,m u0  (6.1.4)

for some constant Ct,$,m . Notice that the derivatives of the solution at a point, for t
positive, are bounded by the norm of u0 .
Second, we see from (6.1.3) that if u0 is nonnegative and not identically zero, then
u(t, x) will be positive for all (t, x) with t positive. This is in accord with our physical
6.1 Overview of Parabolic Equations 139
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

intuition that heat will distribute itself rapidly and that temperatures colder than the initial
temperatures will not occur in an isolated system.
Figure 6.1 shows the solution of the heat equation

ut = uxx

on the real line with initial condition


1 if |x| ≤ 1,
u0 (x) =
0 if |x| > 1.

The solution is shown at the initial time and at times 0.02, 0.10, 0.50, and 1.50. The
solution becomes successively more spread out as time increases. The exact solution is
given by the formula
  
1 1−x 1+x
u(t, x) = erf √ + erf √ . (6.1.5)
2 4t 4t

The function erf () is the error function defined as


 x
2
e−t dt .
2
erf (x) = √
π 0

As t gets very large the argument of the error functions in (6.1.5) get smaller, for x fixed.
Thus, for each value of x, the value of u(t, x) tends to zero as t gets larger.

Figure 6.1. The solution of the heat equation.

In the remainder of this section we discuss two topics, the convection-diffusion equa-
tion and Fokker–Planck equations. These topics are not essential for the material that
follows but are included to give the reader a better understanding of parabolic equations
and how they arise in applications.
140 Chapter 6. Parabolic Partial Differential Equations
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

The Convection-Diffusion Equation


We briefly consider the convection-diffusion equation

ut + aux = buxx , (6.1.6)

which obviously has similarities to hyperbolic and parabolic equations. We study it further
in Section 6.4. To solve (6.1.6) let y = x − at and set

w(t, y) = u(t, y + at).

Then
wt = ut + aux = buxx
and
wy = ux , wyy = uxx ,
so the function w(t, y) satisfies the differential equation

wt = bwyy . (6.1.7)

Since u(t, x) = w(t, x − at), we see that the solution of (6.1.6), when examined from a
coordinate system moving with speed a, is (6.1.7). Thus the solution of (6.1.6) travels
with speed a (convection) and is dissipated with strength b (diffusion).

Fokker–Planck Equations
Many of the applications of parabolic equations arise as macroscopic descriptions of pro-
cesses whose microscopic description is essentially probabilistic. Heat flow is related to
the random motion of electrons and atoms; viscous fluid forces are related to molecular
forces. Parabolic equations are also used in economic models to give a macroeconomic
description of market behavior.
We present a simple illustration of the relation between random processes and para-
bolic equations. The resulting equation is an example of a Fokker–Planck equation.
Consider a discrete process with states identified by xi = ηi, where i varies over
all the integers and where η is some positive number. Suppose that transitions occur only
between neighboring states at the discrete times tn = τ n, n = 0, 1, 2, . . . . Let pin be the
probability that a transition from state i to i + 1 occurs in one time unit starting at tn ,
and let qin be the probability that a transition from state i to i − 1 occurs in one time
unit starting at tn . One may think of this process as describing a collection of objects
such as atoms, electrons, insects, or computer jobs, which change their position or state
every τ time units. Those at xi will move to xi+1 with probability pin and to xi−1 with
probability qin ; they will stay put with probability 1 − pin − qin . Let uni be the probability
density function at time tn ; i.e., uni is the probability that an object is at xi at time tn .
Then we have the relationship
 
un+1
i = pi−1
n
uni−1 + qi+1
n
uni+1 + 1 − pin − qin uni ; (6.1.8)
6.1 Overview of Parabolic Equations 141
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

i.e., at time tn+1 = tn + τ, an object could be at xi only if it came from xi−1 , xi+1 ,
or xi at the time tn ; this formula expresses the effect on un+1
i of each possibility. This
equation is called the Chapman–Kolmogorov equation for the process.
Now we will take the limit as τ and η tend to zero, but we first rewrite the preceding
equation as
  n  n  n
un+1
i − uni = 1
2
n
pi−1 − qi−1
n
ui−1 − 12 pi+1 − qi+1
n
ui+1
 n  n    n  n 
+ 12 pi−1 + qi−1
n
ui−1 − 2 pin + qin uni + pi+1 + qi+1
n
ui+1 .

We define functions u(t, x), c(t, x), and d(t, x) as limits as η and τ tend to zero,
given by
 b
uni → u(t, x) dx for t = nτ,
a≤xi ≤b a

pin − qin pin + qin 2


η → c(t, x), and η → d(t, x).
τ 2τ
We will assume that these limits exist in an appropriate sense. We then obtain

∂u ∂ ∂2
=− [c(t, x)u] + 2 [d(t, x)u] . (6.1.9)
∂t ∂x ∂x
This is called the Fokker–Planck equation for the continuous process, which is the limit of
the discrete process.
From (6.1.9) we see that c(t, x) being positive corresponds to having pin greater than
n
qi , which means objects will tend to move to the right. Similarly, c(t, x) being negative
means objects will tend to move to the left. Also, a larger value of d(t, x) corresponds to
larger values of pin + qin , which is the probability that an object will move. The solution
u(t, x) of (6.1.9) is a probability density function and satisfies
 ∞
u(t, x) dx = 1.
−∞

Fokker–Planck equations are applicable to many physical systems for which there is an
underlying randomness. Specific examples of Fokker–Planck equations are the equation
for the probability distribution for velocity in one-dimensional Brownian motion, which is

ut = (vu)v + uvv , (6.1.10)

and the energy in three-dimensional Brownian motion, which is


 !
3
wt = 2 e − w + 2(ew)ee . (6.1.11)
2 e

In (6.1.10) the probability density function u is a function of the velocity v and the time t.
)b
The probability that a particle at time t has a velocity between a and b is a u(t, v) dv.
142 Chapter 6. Parabolic Partial Differential Equations
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

It is worth noting that often a discrete process involving many states or objects can be
better modeled by an accurate difference scheme for the differential equation describing the
limiting continuous process than by approximations of the discrete process itself. That is,
we can approximate a discrete process involving a great many states either by simulating the
discrete process using (6.1.8) with fewer states, so as to make it computationally feasible,
or by considering the limiting continuous process described by (6.1.9). The continuous
process may have to be further approximated by a numerical method. As an example, we
could study heat flow by examining the motion of the molecules in a solid. The limiting
continuous process would be that described by the heat equation, or some variation of it.
For most applications the numerical solution of the heat equation is more accurate, and
certainly more efficient, than a simulation of the molecular motion.

Exercises
6.1.1. Show that

−(x − y)2
u(t, x) = exp
4b(t + τ )

is a solution to the heat equation (6.1.1) for any values of y and τ.

6.1.2. Show that the solution of


1 if |x| ≤ a,
ut = buxx with u0 (x) =
0 if |x| > a

is given by
  
1 a−x a+x
u(t, x) = erf √ + erf √ ,
2 4bt 4bt
where erf( ) is the error function as in (6.1.5).

6.1.3. Determine the behavior of the quantity Ct,$,m in (6.1.4) as a function of t. In


particular, show that Ct,$,m is unbounded as t approaches 0. Also determine the
behavior of the bounds on the L2 norm of the derivatives of u as t approaches 0.

6.1.4. Use the representation (6.1.3) to verify the following estimates on the norms of
u(t, x):

u(t, ·)1 ≤ u0 1 ,

u(t, ·)∞ ≤ u0 ∞ .

Show that if u0 is nonnegative, then

u(t, ·)1 = u0 1 .


6.2 Parabolic Systems and Boundary Conditions 143
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

6.1.5. Show that if the three functions u1 (t, v1 ), u2 (t, v2 ), and u3 (t, v3 ) each satisfy the
one-dimensional equation (6.1.10), then the probability density function w(t, e)
satisfies equation (6.1.11), where
 
e = 21 v12 + v22 + v32

and
 E 
w(t, e) de = u1 (t, v1 )u2 (t, v2 )u3 (t, v3 ) dv1 dv2 dv3 .
0 v12 +v22 +v32 ≤2E

6.1.6. Evaluate the integral 


1 ∞
eiω(x−y) e−bω t dω,
2

π −∞
which appears in the derivation of equation (6.1.3), by considering the function
 ∞
1
eiωα e−ω dω.
2
F (α) = √
π −∞

Hint: Show that F (0) = 1 and that F  (α) = −(1/2) αF (α).

6.2 Parabolic Systems and Boundary Conditions


We now discuss general parabolic equations in one space dimension. A more complete
discussion is contained in Chapter 9. We consider parabolic systems in which the
solution u is a vector function with d components. A system of the form

ut = Buxx + Aux + Cu + F (t, x) (6.2.1)

is parabolic, or Petrovskii parabolic, if the eigenvalues of B all have positive real parts. A
common special case is when B is positive definite, but in general the eigenvalues need
not be real, nor does B have to be diagonalizable. Notice that no restrictions are placed
on the matrices A and C.

Theorem 6.2.1. The initial value problem for the system (6.2.1) is well posed in the sense
of Definition 1.5.2, and actually a stronger estimate holds. For each T > 0 there is a
constant CT such that
 ∞  t ∞
|u(t, x)|2 dx + |ux (s, x)|2 dx ds
−∞ 0 −∞
  t (6.2.2)
∞ ∞
≤ CT |u(0, x)|2 dx + |F (s, x)|2 dx ds
−∞ 0 −∞

for 0 ≤ t ≤ T .
Note that estimate (6.2.2) is stronger than estimate (1.5.4) for hyperbolic systems, since it gives a bound on the derivative of u with respect to x in addition to a bound for u. The bound on $u_x$ in (6.2.2) implies that the solution to the system (6.2.1) is infinitely differentiable for positive t.

Proof. We prove Theorem 6.2.1 only for the case when the equation is homogeneous, i.e., when F(t, x) is zero. We begin by considering the Fourier transform of equation (6.2.1), which is
$$\hat{u}_t = \left( -\omega^2 B + i\omega A + C \right) \hat{u}. \qquad (6.2.3)$$
For large values of $|\omega|$ the eigenvalues of the matrix $-\omega^2 B + i\omega A + C$ must have real parts that are less than $-b\omega^2$ for some positive value b. Indeed,
$$-\omega^2 B + i\omega A + C = -\omega^2 \left( B - i\omega^{-1} A - \omega^{-2} C \right),$$
and because the eigenvalues of a matrix are continuous functions of the matrix, the eigenvalues of $B - i\omega^{-1} A - \omega^{-2} C$ must be close to those of B itself. Considering all values of $\omega$ we deduce that the eigenvalues have real parts bounded by $a - b\omega^2$ for some positive value of b and some value a. The solution of the differential equation (6.2.3) is given by
$$\hat{u}(t, \omega) = e^{(-\omega^2 B + i\omega A + C)t}\, \hat{u}_0(\omega).$$
Using results on the matrix norm (see Appendix A, Proposition A.2.4), we have
$$|\hat{u}(t, \omega)| \le K e^{(a - b\omega^2)t}\, |\hat{u}_0(\omega)|.$$
From this we easily obtain
$$\int_{-\infty}^{\infty} |\hat{u}(t, \omega)|^2\, d\omega \le K_t \int_{-\infty}^{\infty} |\hat{u}_0(\omega)|^2\, d\omega$$
and
$$\int_0^t \int_{-\infty}^{\infty} \omega^2 |\hat{u}(s, \omega)|^2\, d\omega\, ds \le K_t \int_{-\infty}^{\infty} |\hat{u}_0(\omega)|^2\, d\omega,$$
from which (6.2.2) easily follows by Parseval's relation.

Boundary Conditions for Parabolic Systems

A parabolic system such as (6.2.1) with d equations and defined on a finite interval requires d boundary conditions at each boundary. The most commonly occurring boundary conditions for parabolic systems involve both the unknown functions and their first derivatives with respect to x. The general form of such boundary conditions is
$$T_0 u = b_0, \qquad T_1 \frac{\partial u}{\partial x} + T_2 u = b_1, \qquad (6.2.4)$$
where $T_0$ is a $d_0 \times d$ matrix and $T_1$ and $T_2$ are $(d - d_0) \times d$ matrices. We may assume that $T_1$ has full row rank if it is nonzero. Boundary conditions are said to be well-posed if the solution of the differential equation depends continuously on the boundary data. The theory of well-posed boundary conditions is discussed in Chapter 11. The requirement for the boundary conditions to be well-posed is that the $d \times d$ matrix
$$T = \begin{pmatrix} T_0 \\ T_1 B^{-1/2} \end{pmatrix}, \qquad (6.2.5)$$
consisting of the $d_0$ rows of $T_0$ and the $d - d_0$ rows of $T_1 B^{-1/2}$, is invertible. The matrix $B^{-1/2}$ is that square root of $B^{-1}$ whose eigenvalues all have positive real part (see Appendix A). The matrix $T_2$ is a lower order term and does not affect the well-posedness.

Two important boundary conditions are when $T_0$ is the identity matrix, which is called the Dirichlet boundary condition for the system, and when $T_1$ is the identity matrix with $T_2$ being zero, which is called the Neumann boundary condition. These are easily seen to satisfy the condition that (6.2.5) be nonsingular.

Exercises

6.2.1. Prove the estimate (6.2.2) for the scalar equation (6.1.1) by the energy method; i.e., multiply (6.1.1) by u(t, x) and integrate by parts in t and x.

6.2.2. Prove the estimate (6.2.2) for the scalar equation (6.1.1) from the Fourier representation.

6.2.3. Modify the proof of the estimate (6.2.2) given in the text to include the case in which F(t, x) is not zero.

6.2.4. Prove estimate (6.2.2) by the energy method for the system
$$\begin{pmatrix} u^1 \\ u^2 \end{pmatrix}_t = \begin{pmatrix} 1 & 4 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} u^1 \\ u^2 \end{pmatrix}_{xx}.$$

6.3 Finite Difference Schemes for Parabolic Equations

In this section we begin our study of finite difference schemes for parabolic equations. The definitions of convergence, consistency, stability, and accuracy of finite difference schemes given in Sections 1.4, 1.5, and 3.1 were given in sufficient generality that they apply to schemes for parabolic equations. The methods we use to study the schemes are also much the same.

We begin by considering the forward-time central-space scheme for the heat equation (6.1.1):
$$\frac{v_m^{n+1} - v_m^n}{k} = b\, \frac{v_{m+1}^n - 2v_m^n + v_{m-1}^n}{h^2} \qquad (6.3.1)$$
or
$$v_m^{n+1} = (1 - 2b\mu)\, v_m^n + b\mu \left( v_{m+1}^n + v_{m-1}^n \right),$$
where $\mu = k h^{-2}$. The parameter µ plays a role for parabolic equations similar to the role of λ for hyperbolic equations. The scheme (6.3.1) is easily seen to be first-order accurate in time and second-order in space. The stability analysis is similar to what we did for hyperbolic equations, i.e., replace $v_m^n$ by $g^n e^{im\theta}$. The amplification factor for the scheme (6.3.1) satisfies
$$\frac{g(\theta) - 1}{k} = b\, \frac{e^{i\theta} - 2 + e^{-i\theta}}{h^2}$$
or
$$g(\theta) = 1 + b \frac{k}{h^2} \left( e^{i\theta} + e^{-i\theta} - 2 \right),$$
and finally,
$$g(\theta) = 1 - 4b\mu \sin^2 \tfrac{1}{2}\theta.$$

Since $g(\theta)$ is a real quantity, the condition $|g(\theta)| \le 1$ is equivalent to
$$-1 \le g(\theta) \le 1 \quad\text{or}\quad 4b\mu \sin^2 \tfrac{1}{2}\theta \le 2,$$
which is true for all θ if and only if
$$b\mu \le \tfrac{1}{2}. \qquad (6.3.2)$$

Scheme (6.3.1) is dissipative of order 2 as long as bµ is strictly less than 1/2 and positive.
Therefore, we usually take bµ < 1/2 so that the scheme will be dissipative. Dissipativity
is a desirable property for schemes for parabolic equations to have, since then the finite
difference solution will become smoother in time, as does the solution of the differential
equation. As we will show later, dissipative schemes for (6.1.1) satisfy estimates analogous
to (6.2.2) and are often more accurate for nonsmooth initial data. See Section 10.4 and
Exercises 6.3.10 and 6.3.11.
The stability condition (6.3.2) means that the time step k is at most $(2b)^{-1}h^2$, so that when the spatial resolution is increased by halving h, the time step k must be reduced to one-fourth of its value. This restriction on k can be quite severe for practical computation, and other schemes are usually more efficient. Notice that even though the scheme is accurate of order (1, 2), because of the stability condition the scheme (6.3.1) is second-order accurate if µ is constant.
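
As a concrete illustration, here is a minimal sketch in Python of the scheme (6.3.1); it is not from the text, and the interval, initial function, and the choice bµ = 0.4 are assumptions made only for illustration. For this initial data the exact solution is known, so the error can be checked directly.

import numpy as np

# Forward-time central-space scheme (6.3.1) for u_t = b u_xx on [0, 1]
# with homogeneous Dirichlet boundary conditions; a sketch.
b = 1.0
M = 50
h = 1.0 / M
mu = 0.4 / b                  # so that b*mu = 0.4 < 1/2, satisfying (6.3.2)
k = mu * h**2

x = np.linspace(0.0, 1.0, M + 1)
v = np.sin(np.pi * x)         # assumed initial data u_0(x) = sin(pi x)

t = 0.0
while t < 0.1:
    v[1:-1] = (1 - 2*b*mu) * v[1:-1] + b*mu * (v[2:] + v[:-2])
    t += k

# The exact solution for this data is exp(-b pi^2 t) sin(pi x).
error = np.max(np.abs(v - np.exp(-b * np.pi**2 * t) * np.sin(np.pi * x)))
print(error)

Taking bµ larger than 1/2 in this sketch produces the rapidly growing, highest-frequency oscillations characteristic of instability.
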
We now list some other schemes and their properties. We will give the schemes for the inhomogeneous heat equation
$$u_t = b u_{xx} + f(t, x),$$
and we will assume that b is positive.
The Backward-Time Central-Space Scheme

The backward-time central-space scheme is
$$\frac{v_m^{n+1} - v_m^n}{k} = b\, \frac{v_{m+1}^{n+1} - 2v_m^{n+1} + v_{m-1}^{n+1}}{h^2} + f_m^{n+1}. \qquad (6.3.3)$$
The amplification factor is
$$g(\theta) = \frac{1}{1 + 4b\mu \sin^2 \tfrac{1}{2}\theta}.$$
This scheme is implicit and unconditionally stable. It is accurate of order (1, 2) and is dissipative when µ is bounded away from 0.

The Crank–Nicolson Scheme

The Crank–Nicolson scheme (see [12]) is given by
$$\frac{v_m^{n+1} - v_m^n}{k} = \frac{1}{2} b\, \frac{v_{m+1}^{n+1} - 2v_m^{n+1} + v_{m-1}^{n+1}}{h^2} + \frac{1}{2} b\, \frac{v_{m+1}^n - 2v_m^n + v_{m-1}^n}{h^2} + \frac{1}{2}\left( f_m^{n+1} + f_m^n \right). \qquad (6.3.4)$$
The amplification factor is
$$g(\theta) = \frac{1 - 2b\mu \sin^2 \tfrac{1}{2}\theta}{1 + 2b\mu \sin^2 \tfrac{1}{2}\theta}.$$
The Crank–Nicolson scheme is implicit, unconditionally stable, and second-order accurate, i.e., accurate of order (2, 2). It is dissipative of order 2 if µ is constant, but not dissipative if λ is constant. Even though the Crank–Nicolson scheme (6.3.4) is second-order accurate, whereas the scheme (6.3.3) is only first-order accurate, with nonsmooth initial data and with λ held constant, the dissipative scheme (6.3.3) may be more accurate than the Crank–Nicolson scheme, which is not dissipative when λ is constant (also see Exercises 6.3.10 and 6.3.11). This is discussed further and illustrated in Section 10.4.
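
The difference in dissipativity between (6.3.3) and (6.3.4) is easy to see numerically. The following sketch, an illustration not taken from the text, evaluates the two amplification factors at the highest frequency θ = π for several assumed values of bµ.

import numpy as np

# Amplification factors at theta = pi for the backward-time central-space
# scheme and the Crank-Nicolson scheme.  As b*mu grows, the backward-time
# factor tends to 0 while the Crank-Nicolson factor tends to -1, so with
# large time steps Crank-Nicolson barely damps the highest frequencies.
theta = np.pi
for bmu in [0.5, 5.0, 50.0]:
    s = 4 * bmu * np.sin(theta / 2)**2
    g_bt = 1.0 / (1.0 + s)                  # scheme (6.3.3)
    g_cn = (1.0 - s / 2) / (1.0 + s / 2)    # scheme (6.3.4)
    print(bmu, g_bt, g_cn)
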

The Leapfrog Scheme

The leapfrog scheme is
$$\frac{v_m^{n+1} - v_m^{n-1}}{2k} = b\, \frac{v_{m+1}^n - 2v_m^n + v_{m-1}^n}{h^2} + f_m^n, \qquad (6.3.5)$$
and this scheme is unstable for all values of µ. The amplification polynomial is
$$g^2 + 8 g\, b\mu \sin^2 \tfrac{1}{2}\theta - 1 = 0$$
(see Section 4.2), so the amplification factors are
$$g_\pm(\theta) = -4b\mu \sin^2 \tfrac{1}{2}\theta \pm \sqrt{\left( 4b\mu \sin^2 \tfrac{1}{2}\theta \right)^2 + 1}.$$
Because the quantity inside the square root is greater than 1 for most values of θ, the scheme is unstable.
The Du Fort–Frankel Scheme

The Du Fort–Frankel scheme may be viewed as a modification of the leapfrog scheme. It is
$$\frac{v_m^{n+1} - v_m^{n-1}}{2k} = b\, \frac{v_{m+1}^n - \left( v_m^{n+1} + v_m^{n-1} \right) + v_{m-1}^n}{h^2} + f_m^n. \qquad (6.3.6)$$
This scheme is explicit and yet unconditionally stable. The order of accuracy is given by $O(h^2) + O(k^2) + O(k^2 h^{-2})$. The scheme is nondissipative, and this limits its usefulness.

The Du Fort–Frankel scheme is distinctive in that it is both explicit and unconditionally stable. It can be rewritten as
$$(1 + 2b\mu)\, v_m^{n+1} = 2b\mu \left( v_{m+1}^n + v_{m-1}^n \right) + (1 - 2b\mu)\, v_m^{n-1}.$$
To determine the stability we must solve for the roots of the amplification polynomial equation (see Section 4.2):
$$(1 + 2b\mu)\, g^2 - 4b\mu \cos\theta\, g - (1 - 2b\mu) = 0.$$
The two solutions of this equation are
$$g_\pm(\theta) = \frac{2b\mu \cos\theta \pm \sqrt{1 - 4b^2\mu^2 \sin^2\theta}}{1 + 2b\mu}.$$
If $1 - 4b^2\mu^2 \sin^2\theta$ is nonnegative, then we have
$$|g_\pm(\theta)| \le \frac{2b\mu |\cos\theta| + \sqrt{1 - 4b^2\mu^2 \sin^2\theta}}{1 + 2b\mu} \le \frac{2b\mu + 1}{1 + 2b\mu} = 1,$$
and if $1 - 4b^2\mu^2 \sin^2\theta$ is negative, then
$$|g_\pm(\theta)|^2 = \frac{(2b\mu \cos\theta)^2 + 4b^2\mu^2 \sin^2\theta - 1}{(1 + 2b\mu)^2} = \frac{4b^2\mu^2 - 1}{4b^2\mu^2 + 4b\mu + 1} < 1.$$
Thus for any value of µ or θ, we have that both $g_+$ and $g_-$ are bounded by 1 in magnitude. Moreover, when $g_+$ and $g_-$ are equal, they both have magnitude less than 1, and so this introduces no constraint on the stability (see Section 4.2). Thus the scheme is stable for all values of µ.
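
This bound on the roots is easy to verify numerically. The short sketch below, an illustration not taken from the text, computes the roots of the amplification polynomial over a range of assumed values of bµ and θ.

import numpy as np

# Roots of the Du Fort-Frankel amplification polynomial
#   (1 + 2 b*mu) g^2 - 4 b*mu cos(theta) g - (1 - 2 b*mu) = 0.
# The largest magnitude over theta is 1 for every value of b*mu,
# attained at theta = 0, where g = 1.
for bmu in [0.1, 1.0, 10.0, 100.0]:
    worst = 0.0
    for theta in np.linspace(0.0, np.pi, 1001):
        r = np.roots([1 + 2*bmu, -4*bmu*np.cos(theta), -(1 - 2*bmu)])
        worst = max(worst, np.abs(r).max())
    print(bmu, worst)
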
Even though the Du Fort–Frankel scheme is both explicit and unconditionally stable,
it is consistent only if k/h tends to zero with h and k (see Exercise 6.3.2). Theorem
1.6.2, which states that there are no explicit unconditionally stable schemes for hyperbolic
equations, does not extend directly to parabolic equations. However, the proper analogue
of the results of Section 1.6 for parabolic equations is the following theorem.

Theorem 6.3.1. An explicit, consistent scheme for the parabolic system (6.2.1) is conver-
gent only if k/h tends to zero as k and h tend to zero.
The proof of this theorem is similar in spirit to that of Theorem 1.6.2. It does
require one result that is beyond this text: If u(t, x) is a solution to (6.2.1) and u is zero
for positive x when t is between 0 and 1, then u is identically zero for negative x as
well (see Proposition C.4.1). The proof of Theorem 6.3.1 for the special case of equation
(6.1.1) is left as an exercise (see Exercise 6.3.3).

Lower Order Terms and Stability

For schemes for hyperbolic equations, we have Corollary 2.2.2 and Theorem 2.2.3, showing that lower order terms can be ignored in determining stability. These results do not always apply directly to parabolic equations because the contribution to the amplification factor from first derivative terms is often $O(k^{1/2})$. For example, for the forward-time central-space scheme for $u_t = bu_{xx} - au_x + cu$ we have
$$g = 1 - 4b\mu \sin^2 \tfrac{1}{2}\theta - ia\lambda \sin\theta + ck.$$
For the stability analysis, the term ck can be dropped by Corollary 2.2.2. However, for the first derivative term $\lambda = k^{1/2}\mu^{1/2}$, and if µ is fixed, Corollary 2.2.2 cannot be applied. Nonetheless, we have
$$|g(\theta)|^2 = \left( 1 - 4b\mu \sin^2 \tfrac{1}{2}\theta \right)^2 + a^2 k\mu \sin^2\theta,$$
and since the first derivative term gives an O(k) contribution to $|g|^2$, it does not affect the stability. Similar results hold for other schemes, in particular the Crank–Nicolson scheme (6.3.4) and the backward-time central-space scheme (6.3.3).

Dissipativity and Smoothness of Solutions

We now show that a dissipative one-step scheme for a parabolic equation has solutions that become smoother as t increases, provided µ is constant.

Theorem 6.3.2. A one-step scheme, consistent with (6.1.1), that is dissipative of order 2 with µ constant satisfies
$$\|v^{n+1}\|_h^2 + ck \sum_{\nu=1}^{n} \|\delta_+ v^\nu\|_h^2 \le \|v^0\|_h^2 \qquad (6.3.7)$$
for all initial data $v^0$ and $n \ge 0$.

Proof. Let $c_0$ be such that
$$|g(h\xi)|^2 \le 1 - c_0 \sin^2 \tfrac{1}{2}h\xi.$$
Then by
$$\hat{v}^{\nu+1}(\xi) = g(h\xi)\, \hat{v}^\nu(\xi),$$
we have
$$|\hat{v}^{\nu+1}(\xi)|^2 = |g(h\xi)|^2 |\hat{v}^\nu(\xi)|^2 \le |\hat{v}^\nu(\xi)|^2 - c_0 \left( \sin \tfrac{1}{2}h\xi \right)^2 |\hat{v}^\nu(\xi)|^2.$$
By adding this inequality for $\nu = 0, \dots, n$, we obtain, using $\mu = k h^{-2}$,
$$|\hat{v}^{n+1}(\xi)|^2 + \mu^{-1} c_0 k \sum_{\nu=0}^{n} \left| h^{-1} \sin \tfrac{1}{2}h\xi\; \hat{v}^\nu(\xi) \right|^2 \le |\hat{v}^0(\xi)|^2.$$
Since
$$\left| \frac{2 \sin \tfrac{1}{2}h\xi}{h}\, \hat{v}^\nu(\xi) \right| = \left| \frac{e^{ih\xi} - 1}{h}\, \hat{v}^\nu(\xi) \right| = |\widehat{\delta_+ v}^\nu(\xi)|,$$
we have
$$|\hat{v}^{n+1}(\xi)|^2 + ck \sum_{\nu=0}^{n} |\widehat{\delta_+ v}^\nu(\xi)|^2 \le |\hat{v}^0(\xi)|^2,$$
and integrating over ξ, by Parseval's relation, we obtain
$$\|v^{n+1}\|_h^2 + ck \sum_{\nu=0}^{n} \|\delta_+ v^\nu\|_h^2 \le \|v^0\|_h^2,$$
which is inequality (6.3.7).

We now use Theorem 6.3.2 to show that the solutions become smoother with time, i.e., that the norms of the high-order differences are bounded and in fact tend to zero at a rate that is faster than that of the norm of u. Since $|g| \le 1$, we have
$$\|v^{\nu+1}\|_h \le \|v^\nu\|_h.$$
In addition, since $\delta_+ v$ is also a solution of the scheme, we have
$$\|\delta_+ v^{\nu+1}\|_h \le \|\delta_+ v^\nu\|_h;$$
i.e., the solution and its differences decrease in norm as time increases. Therefore, from (6.3.7),
$$\|v^{n+1}\|_h^2 + ct\, \|\delta_+ v^n\|_h^2 \le \|v^0\|_h^2,$$
which shows for $nk = t > 0$ that $\|\delta_+ v^n\|_h$ is bounded. In fact, we have
$$\|\delta_+ v^n\|_h^2 \le C t^{-1} \|v^0\|_h^2,$$
which shows that the norm of the difference $\delta_+ v^n$ decays to zero as t increases.
Since $\delta_+ v^n$ also satisfies the difference equation, we find for $nk = t > 0$ and any integer r that $\|\delta_+^r v^n\|_h$ is bounded. Thus, the solution of the difference scheme, as is true for the solution to the differential equation, becomes smoother as t increases.

The preceding argument can be modified to show that if $v_m^n$ converges to $u(t_n, x_m)$ with order of accuracy p, then $\delta_+^r v^n$ also converges to $\delta_+^r u(t_n, x)$ with order of accuracy p. Thus, if $D^r$ is a difference approximation to $\partial_x^r$ with order of accuracy p, then $D^r v^n$ converges to $\partial_x^r u(t_n, \cdot)$ with order of accuracy p. These results hold if the scheme is dissipative; similar statements do not hold if the scheme is nondissipative (see Exercises 6.3.10 and 6.3.11).

[Figure 6.2. Solution with nondissipative Crank–Nicolson scheme.]

Figure 6.2 shows the solution of the Crank–Nicolson scheme applied to the initial
value problem for the heat equation with b equal to 1 and initial data shown in the figure.
The exact solution is given in Exercise 6.3.11. The solution used k = h = 1/40. The small
oscillations at the location of the discontinuities in the initial solution do not get smoothed
out as k and h decrease if they remain equal. This is a result of the Crank–Nicolson
scheme being nondissipative if λ remains constant.
Boundary Conditions for Parabolic Difference Schemes

Since a parabolic equation requires one boundary condition at each boundary point, there is less need for numerical boundary conditions for difference schemes for parabolic equations than there is for schemes for hyperbolic equations.

There is no difficulty implementing the Dirichlet boundary condition; the values of the solution are specified at the grid points at the ends of the interval.

There is more variability in implementing the Neumann boundary condition. A common method is to approximate the derivative at the endpoint by the one-sided approximation
$$\frac{\partial u}{\partial x}(t_n, x_0) \approx \frac{v_1^n - v_0^n}{h}.$$
This approximation is only first-order accurate and will degrade the accuracy of second-order accurate schemes, such as the Crank–Nicolson scheme (6.3.4) and the forward-time central-space scheme (6.3.1) (which is second-order accurate under the stability condition (6.3.2)). A better approximation is the second-order accurate one-sided approximation (see Exercise 3.3.8)
$$\frac{\partial u}{\partial x}(t_n, x_0) \approx \frac{-3v_0^n + 4v_1^n - v_2^n}{2h},$$
which maintains the second-order accuracy of these schemes.

We can also use the second-order approximation
$$\frac{\partial u}{\partial x}(t_n, x_0) \approx \frac{v_1^n - v_{-1}^n}{2h}$$
together with the scheme applied at $x_0$ to eliminate the value of $v_{-1}^n$. As an example, this boundary condition, together with the forward-time central-space scheme (6.3.1), gives the formula
$$v_0^{n+1} = (1 - 2b\mu)\, v_0^n + 2b\mu\, v_1^n. \qquad (6.3.8)$$
The overall method is then second-order accurate.
Here is a sample of code, written here in Python, for the Thomas algorithm applied to the Crank–Nicolson scheme for the heat equation (6.1.1) with the boundary conditions
$$u(t, 0) = f(t) \quad\text{and}\quad u_x(t, 1) = 0.$$

import numpy as np

# b, mu, M, tmax, and the boundary function f must be initialized.
# The grid numbering starts at 0 and goes to M; the array v must hold
# the initial data at the M+1 grid points.
h = 1.0 / M
k = mu * h**2
r = 0.5 * b * mu          # off-diagonal coefficient of the implicit system
# Interior Crank-Nicolson equation at each time step:
#   -r*v[m-1] + (1 + b*mu)*v[m] - r*v[m+1] = d,
# where d = v[m] + r*(v[m+1] - 2*v[m] + v[m-1]) at the old time level.
p = np.zeros(M + 1)
q = np.zeros(M + 1)

t = 0.0
while t < tmax:
    # Dirichlet boundary condition: v[0] = f(t + k) at the new level.
    p[1] = 0.0
    q[1] = f(t + k)
    # Forward sweep of the Thomas algorithm: eliminate so that
    # v[m] = p[m+1]*v[m+1] + q[m+1] at the new time level.
    for m in range(1, M):
        d = v[m] + r * (v[m + 1] - 2.0 * v[m] + v[m - 1])
        denom = (1.0 + b * mu) - r * p[m]
        p[m + 1] = r / denom
        q[m + 1] = (d + r * q[m]) / denom
    # Neumann boundary condition: apply the scheme at x_M with the
    # reflection v[M+1] = v[M-1] at both time levels, then eliminate
    # v[M-1] at the new level using v[M-1] = p[M]*v[M] + q[M].
    data = (1.0 - b * mu) * v[M] + b * mu * v[M - 1]
    v[M] = (data + b * mu * q[M]) / (1.0 + (1.0 - p[M]) * b * mu)
    # Back substitution.
    for m in range(M - 1, 0, -1):
        v[m] = p[m + 1] * v[m + 1] + q[m + 1]
    v[0] = f(t + k)
    t += k
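
Note that each time step of this implicit scheme requires only O(M) operations, one forward sweep and one back substitution, so the unconditional stability of the Crank–Nicolson scheme is obtained at essentially the cost per step of an explicit scheme.
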

Analysis of an Explicit Scheme

The scheme
$$v_m^{n+1} = e^{-2b\mu}\, v_m^n + \tfrac{1}{2}\left( 1 - e^{-2b\mu} \right)\left( v_{m+1}^n + v_{m-1}^n \right) \qquad (6.3.9)$$
is sometimes advocated as an unconditionally stable scheme for (6.1.1). This scheme has been derived in various ways. Each derivation attempts to show that this scheme has better properties than does (6.3.1). As we will show, however, for this scheme to be accurate it must be less efficient than (6.3.1). Since (6.3.1) is generally regarded as being not very efficient, due to the stability constraint on the time step, the use of scheme (6.3.9) is rarely justified. Scheme (6.3.9) is indeed unconditionally stable, but as we will show, it is not convergent unless µ tends to zero with h. Thus it is less efficient than the forward-time central-space scheme (6.3.1), and perhaps less accurate. Notice that the requirement that µ tends to zero with h is more restrictive than the requirement of Theorem 6.3.1 that λ must tend to zero with h.

To study the scheme (6.3.9) define $\mu'$ by
$$e^{-2b\mu} = 1 - 2b\mu'.$$
Then the solution $v_m^n$ to (6.3.9) is also the solution to (6.3.1) with an effective time step $k' = \mu' h^2$. Thus, since (6.3.1) is accurate of order (1, 2), we have
$$v_m^n - u(nk', x_m) = O(k') + O(h^2).$$
The solution to (6.3.9) is convergent only if
$$v_m^n - u(nk, x_m)$$
tends to zero as h and k tend to zero. Thus, to be convergent we must have
$$u(nk', x_m) - u(nk, x_m)$$
tend to zero as h and k tend to zero with nk fixed, and therefore $n(k - k')$ must tend to zero for $nk = t$ fixed. We then have
$$n(k - k') = t\left( 1 - \frac{k'}{k} \right) = t\left( 1 - \frac{\mu'}{\mu} \right).$$
Thus $1 - \mu'/\mu$ must tend to zero for the scheme to be convergent. But
$$1 - \frac{\mu'}{\mu} = \frac{e^{-2b\mu} - (1 - 2b\mu)}{2b\mu} = O(b\mu)$$
as µ tends to zero. This shows that scheme (6.3.9) is convergent only if µ tends to zero with h and k. This makes this scheme less efficient than the standard forward central scheme (6.3.1). In fact, for explicit second-order accurate schemes, the forward central scheme is essentially optimal.
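
The rate at which $1 - \mu'/\mu$ vanishes is easily checked. The short computation below, an illustration not taken from the text, confirms the O(bµ) behavior for a few assumed values of bµ.

import numpy as np

# Check that 1 - mu'/mu = (e^{-2 b mu} - (1 - 2 b mu)) / (2 b mu) = O(b mu).
for bmu in [0.4, 0.1, 0.01, 0.001]:
    ratio = (np.exp(-2 * bmu) - (1 - 2 * bmu)) / (2 * bmu)
    print(bmu, ratio, ratio / bmu)   # ratio/bmu tends to 1 as bmu -> 0
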

Exercises

6.3.1. Justify the claims about the stability and accuracy of schemes (6.3.3), (6.3.4), and (6.3.5).

6.3.2. Show that if $\lambda = k/h$ is a constant, then the Du Fort–Frankel scheme is consistent with
$$b\lambda^2 u_{tt} + u_t = b u_{xx} + f(t, x).$$

6.3.3. Prove Theorem 6.3.1 for the equation (6.1.1). Hint: If $u_0$ is nonnegative and not identically zero, then u(t, x) will be positive for all x when t is positive.

6.3.4. Show that scheme (6.3.9) with µ held constant as h and k tend to zero is consistent with
$$u_t = b' u_{xx},$$
where $b'$ is defined by
$$e^{-2b\mu} = 1 - 2b'\mu.$$

6.3.5. Show that a scheme for (6.1.1) of the form
$$v_m^{n+1} = \alpha v_m^n + \frac{1 - \alpha}{2}\left( v_{m+1}^n + v_{m-1}^n \right),$$
with α constant as h and k tend to zero, is consistent with the heat equation (6.1.1) only if
$$\alpha = 1 - 2b\mu.$$
6.3.6. Consider the following two schemes for (6.1.1):
$$\tilde{v}_m^{n+1/2} = v_m^n + \tfrac{1}{2} k b\, \delta^2 v_m^n, \qquad v_m^{n+1} = v_m^n + k b\, \delta^2 \tilde{v}_m^{n+1/2},$$
and
$$\bar{v}_m^{n+1} = v_m^n + k b\, \delta^2 v_m^n, \qquad v_m^{n+1} = v_m^n + \tfrac{1}{2} k b \left( \delta^2 \bar{v}_m^{n+1} + \delta^2 v_m^n \right).$$

(a) Show that the two schemes are in fact two different implementations of the same scheme.
(b) Show that this scheme is accurate of order (2, 2).
(c) Show that this scheme is stable if bµ ≤ 1/2, and show that $1/2 \le g \le 1$.
(d) Discuss the advantages and disadvantages of this scheme compared to the forward-time central-space scheme (6.3.1).
6.3.7. Show that the scheme
$$\tilde{v}_m^{n+1/4} = v_m^n + \tfrac{1}{4} k b\, \delta^2 v_m^n, \qquad v_m^{n+1} = v_m^n + k b\, \delta^2 \tilde{v}_m^{n+1/4}$$
for (6.1.1) is stable for bµ ≤ 1 and accurate of order (1, 2), and show that it is accurate of order 2 if µ is constant. Show also that $0 \le g \le 1$. Compare this scheme with (6.3.1) in terms of accuracy and efficiency. (Notice that this scheme requires more work per time step than does (6.3.1) but allows for a larger time step.)
6.3.8. Show that the scheme
$$\tilde{v}_m^{n+1/8} = v_m^n + \tfrac{1}{8} k b\, \delta^2 v_m^n, \qquad v_m^{n+1} = v_m^n + k b\, \delta^2 \tilde{v}_m^{n+1/8}$$
for (6.1.1) is stable for bµ ≤ 2 and accurate of order (1, 2), and show that it is accurate of order 2 if µ is constant. Compare it with the scheme (6.3.1) in terms of accuracy and efficiency.
6.3.9. Consider a scheme for (6.1.1) of the form
$$v_m^{n+1} = (1 - 2\alpha_1 - 2\alpha_2)\, v_m^n + \alpha_1 \left( v_{m+1}^n + v_{m-1}^n \right) + \alpha_2 \left( v_{m+2}^n + v_{m-2}^n \right).$$
Show that when µ is constant, as k and h tend to zero, the scheme is inconsistent unless
$$\alpha_1 + 4\alpha_2 = b\mu.$$
Show that the scheme is fourth-order accurate in x if $\alpha_2 = -\alpha_1/16$.
6.3.10. Solve the initial-boundary value problem for (6.1.1) on $-1 \le x \le 1$ with initial data given by
$$u_0(x) = \begin{cases} 1 & \text{if } |x| < \tfrac{1}{2}, \\ \tfrac{1}{2} & \text{if } |x| = \tfrac{1}{2}, \\ 0 & \text{if } |x| > \tfrac{1}{2}. \end{cases}$$
Solve up to t = 1/2. The boundary data and the exact solution are given by
$$u(t, x) = \frac{1}{2} + 2 \sum_{\ell=0}^{\infty} (-1)^\ell\, \frac{\cos \pi(2\ell + 1)x}{\pi(2\ell + 1)}\; e^{-\pi^2 (2\ell + 1)^2 t}.$$
Use the Crank–Nicolson scheme (6.3.4) with h = 1/10, 1/20, 1/40. Compare
the accuracy and efficiency when λ = 1 and also when µ = 10.
Demonstrate by the computations that when λ is constant, the error in the
solution does not decrease when measured in the supremum norm, but it does
decrease in the L2 norm.
6.3.11. Solve the initial-boundary value problem for $u_t = u_{xx}$ on $-1 \le x \le 1$ for $0 \le t \le 0.5$ with initial data given by
$$u_0(x) = \begin{cases} 1 - |x| & \text{for } |x| < \tfrac{1}{2}, \\ \tfrac{1}{4} & \text{for } |x| = \tfrac{1}{2}, \\ 0 & \text{for } |x| > \tfrac{1}{2}. \end{cases}$$
Use the boundary conditions
$$u(t, -1) = u^*(t, -1) \quad\text{and}\quad u_x(t, 1) = 0,$$
where $u^*(t, x)$ is the exact solution given by
$$u^*(t, x) = \frac{3}{8} + \sum_{\ell=0}^{\infty} \left( \frac{(-1)^\ell}{\pi(2\ell + 1)} + \frac{2}{\pi^2 (2\ell + 1)^2} \right) \cos \pi(2\ell + 1)x\; e^{-\pi^2 (2\ell + 1)^2 t} + \sum_{m=0}^{\infty} \frac{\cos 2\pi(2m + 1)x}{\pi^2 (2m + 1)^2}\; e^{-4\pi^2 (2m + 1)^2 t}.$$
Consider three schemes:
(a) The explicit forward-time central-space scheme with µ = 0.4.
(b) The Crank–Nicolson scheme with λ = 1.
(c) The Crank–Nicolson scheme with µ = 5.
For the boundary condition at $x_M = 1$, use the scheme applied at $x_M$ and set $v_{M+1}^n = v_{M-1}^n$ to eliminate the values at $x_{M+1}$ for all values of n.
For each scheme compute solutions for h = 1/10, 1/20, 1/40, and 1/80.
Compare the accuracy and efficiency for these schemes.
6.3.12.
(a) Show that the scheme for (6.1.1) given by
$$\left( 1 - \frac{kb}{2} \delta^2 \right) \frac{v_m^{n+1} - v_m^n}{k} = b\, \delta^2 v_m^n$$
is the Crank–Nicolson scheme (6.3.4).
(b) Show that the implicit scheme
$$\left( 1 - \frac{kb}{2} \delta^2 \right) \frac{v_m^{n+1} - v_m^n}{k} = b \left( 1 - \frac{h^2}{12} \delta^2 \right) \delta^2 v_m^n$$
is accurate of order (2, 4) and stable if bµ ≤ 3/2.


6.3.13. Maximum Norm Stability. Show that the forward-time central-space scheme satisfies the estimate
$$\|v^{n+1}\|_\infty \le \|v^n\|_\infty$$
for all solutions if and only if 2bµ ≤ 1.

6.3.14. Maximum Norm Stability. Show that the Crank–Nicolson scheme satisfies the estimate
$$\|v^{n+1}\|_\infty \le \|v^n\|_\infty$$
for all solutions if bµ ≤ 1. Hint: Show that if $v_{m'}^{n+1}$ is the largest value of $v_m^{n+1}$, then
$$v_{m'}^{n+1} \le -\frac{b\mu}{2}\, v_{m'-1}^{n+1} + (1 + b\mu)\, v_{m'}^{n+1} - \frac{b\mu}{2}\, v_{m'+1}^{n+1} \le \|v^n\|_\infty.$$

6.4 The Convection-Diffusion Equation

We now consider finite difference schemes for the convection-diffusion equation
$$u_t + a u_x = b u_{xx}, \qquad (6.4.1)$$
which is discussed briefly in Section 6.1. We begin our discussion of this equation by considering the forward-time central-space scheme,
$$\frac{v_m^{n+1} - v_m^n}{k} + a\, \frac{v_{m+1}^n - v_{m-1}^n}{2h} = b\, \frac{v_{m+1}^n - 2v_m^n + v_{m-1}^n}{h^2}. \qquad (6.4.2)$$
This is obviously first-order accurate in time, second-order accurate in space, and second-order accurate overall because of the stability requirement
$$b\mu \le \tfrac{1}{2},$$
as shown in Section 6.3. The scheme is equivalent to
$$v_m^{n+1} = (1 - 2b\mu)\, v_m^n + b\mu (1 - \alpha)\, v_{m+1}^n + b\mu (1 + \alpha)\, v_{m-1}^n, \qquad (6.4.3)$$
where
$$\mu = \frac{k}{h^2} \quad\text{and}\quad \alpha = \frac{ha}{2b} = \frac{a\lambda}{2b\mu}.$$
For convenience we assume that a is positive; the case when a is negative is very similar. Of course, b must be positive.

Based on the discussion in Section 6.1, we see that one property of the parabolic differential equation (6.4.1) is that
$$\sup_x |u(t, x)| \le \sup_x |u(t', x)| \quad\text{if}\quad t > t'.$$
That is, the maximum value of |u(t, x)| will not increase as t increases. From (6.4.3) we see that the scheme will have a similar property if and only if
$$\alpha \le 1. \qquad (6.4.4)$$

That is, if this condition on α is satisfied as well as the stability condition, then from (6.4.3) we have
$$|v_m^{n+1}| \le (1 - 2b\mu)\, |v_m^n| + b\mu (1 - \alpha)\, |v_{m+1}^n| + b\mu (1 + \alpha)\, |v_{m-1}^n| \le \max_m |v_m^n|,$$
and thus
$$\max_m |v_m^{n+1}| \le \max_m |v_m^n|. \qquad (6.4.5)$$

If α is larger than 1, then inequality (6.4.5) will not be satisfied in general. For example, if the initial solution is given by
$$v_m^0 = 1 \quad\text{for}\quad m \le 0 \qquad\text{and}\qquad v_m^0 = -1 \quad\text{for}\quad m > 0,$$
then the solution at the first time step with m equal to 0 is given by
$$v_0^1 = 1 - 2b\mu(1 - \alpha) = 1 + 2b\mu(\alpha - 1).$$
We can show that for α greater than 1, the solution will be oscillatory. We now discuss the interpretation of these oscillations and what can be done to avoid them.
The condition (6.4.4) can be rewritten as
$$h \le \frac{2b}{a}, \qquad (6.4.6)$$
[Figure 6.3. The solution of the convection-diffusion equation (central).]

which is a restriction on the spatial mesh spacing. The quantity a/b corresponds to the
Reynolds number in fluid flow or the Peclet number in heat flow, and the quantity α, or
twice α, is often called the cell Reynolds number or cell Peclet number of the scheme.
Condition (6.4.4) or (6.4.6) is a condition on the mesh spacing that must be satisfied in order
for the solution to the scheme to behave qualitatively like that of the parabolic differential
equation. Notice that it is not a stability condition, since stability only deals with the
limit as h and k tend to zero, and (6.4.6) is always satisfied for h small enough. The
oscillations that occur when (6.4.6) is violated are not the result of instability. They do not
grow excessively; they are only the result of inadequate resolution.
Figure 6.3 shows two numerical solutions and the exact solution for the convection-diffusion equation (6.4.1) with a = 10 and b = 0.1 at time t = 0.8. The scheme (6.4.2) uses µ = 1, and the two numerical solutions use values for h of 1/20 and 1/30. For h equal to 1/20 the value of α is greater than 1, and so the solution is oscillatory. For h equal to 1/30 the value of α is less than 1, and so the solution is nonoscillatory. The exact solution is the smooth curve that is the lower of the two. The initial condition is also shown. It is the "tent function" between −0.5 and 0.5.
[Figure 6.4. The solution of the convection-diffusion equation.]

One way of avoiding the restriction (6.4.6) is to use upwind differencing of the convection term. The scheme is then
$$\frac{v_m^{n+1} - v_m^n}{k} + a\, \frac{v_m^n - v_{m-1}^n}{h} = b\, \frac{v_{m+1}^n - 2v_m^n + v_{m-1}^n}{h^2} \qquad (6.4.7)$$
or
$$v_m^{n+1} = \left[ 1 - 2b\mu(1 + \alpha) \right] v_m^n + b\mu\, v_{m+1}^n + b\mu (1 + 2\alpha)\, v_{m-1}^n.$$
If $1 - 2b\mu(1 + \alpha)$ is positive, this scheme satisfies (6.4.5), as may easily be seen. The oscillations have been eliminated at the expense of being only first-order accurate in space. The condition that $1 - 2b\mu(1 + \alpha)$ be nonnegative is
$$2b\mu + a\lambda \le 1.$$
When b is small and a is large, this condition is less restrictive than (6.4.4).
Notice, however, that (6.4.7) can be rewritten as
$$\frac{v_m^{n+1} - v_m^n}{k} + a\, \frac{v_{m+1}^n - v_{m-1}^n}{2h} = \left( b + \frac{ah}{2} \right) \frac{v_{m+1}^n - 2v_m^n + v_{m-1}^n}{h^2}.$$
Thus (6.4.7) is equivalent to solving (6.4.1) by (6.4.2) after replacing b by the larger value $b' = b(1 + \alpha)$. The term $b\alpha u_{xx}$ can be regarded as the artificial viscosity that has been added to (6.4.1) to make (6.4.2) have nonoscillatory solutions.
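
The following sketch, an illustration not taken from the text, compares the central scheme (6.4.2) with the upwind scheme (6.4.7); the parameter values, domain, and initial data are assumptions chosen so that the coarse grid has α > 1 and the fine grid has α < 1.

import numpy as np

# Central (6.4.2) versus upwind (6.4.7) differencing for
# u_t + a u_x = b u_xx with tent-function initial data on [-1, 10].
a, b, mu, tmax = 10.0, 0.1, 1.0, 0.8

def solve(h, upwind):
    k = mu * h * h
    lam = k / h
    x = np.arange(-1.0, 10.0 + h / 2, h)
    v = np.maximum(0.0, 1.0 - 2.0 * np.abs(x))       # tent initial data
    for _ in range(int(round(tmax / k))):
        w = v.copy()
        if upwind:
            conv = a * lam * (w[1:-1] - w[:-2])      # one-sided, a > 0
        else:
            conv = 0.5 * a * lam * (w[2:] - w[:-2])  # centered
        v[1:-1] = w[1:-1] - conv + b * mu * (w[2:] - 2*w[1:-1] + w[:-2])
        v[0] = v[-1] = 0.0                           # Dirichlet boundaries
    return v

for h in [1.0 / 20, 1.0 / 60]:
    alpha = h * a / (2.0 * b)
    v = solve(h, upwind=False)
    # A negative minimum signals the oscillations that occur when alpha > 1;
    # rerunning with upwind=True removes them at the cost of extra smearing.
    print(h, alpha, v.min())
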
There have been many discussions about the consequences of using an upwind scheme such as (6.4.7) instead of a scheme like (6.4.2). Many of these discussions address only imprecisely stated questions and draw ambiguous conclusions. Let us restrict ourselves to an example to consider these two schemes.

Example 6.4.1. Consider (6.4.1) with b = 0.1, a = 10, and choose a grid spacing h = 0.04 so that α has the value 2. Scheme (6.4.2) will have oscillations and will not be a good approximation to the true solution. Solving by scheme (6.4.7) is equivalent to solving (6.4.1) with b replaced by $b'$, which has the value 0.3, a 200% change in the value of b. Is the solution to (6.4.7) a good approximation to the solution of (6.4.1)? If it is, then presumably replacing b by zero (only a 100% change from the true value) and using a scheme for hyperbolic equations will also give a good solution.

If α is larger than 1, then none of these schemes will give a good approximation to the solution of (6.4.1). We must then ask what we hope to learn or need to learn by solving the problem. If we need only qualitative information on the general form of the solution, then perhaps (6.4.7) is good enough. The solution of (6.4.7) will have the same qualitative properties as the solution of (6.4.1), but the solution of (6.4.7) will overly smooth any gradients in the solution. In this case, however, the solution to the hyperbolic equation, obtained by setting b to 0, should also be a good approximation to the true solution. However, if we need to know precise information about the solution of (6.4.1), then neither (6.4.2) nor (6.4.7) will be adequate if α is too large. We are forced to make h smaller or to try other methods, such as perturbation methods, to extract the necessary information. A good feature of (6.4.2) is that the oscillations are an indicator of the scheme's inability to resolve the gradients in the solution. Scheme (6.4.7) has no such indicator.

There is no answer to the question of which scheme is better, but it is to be hoped that this discussion clarifies some of the questions that should be asked when solving a parabolic differential equation like (6.4.1).

Figure 6.4 shows four numerical solutions and the exact solution for the same
equation as in Figure 6.3. The lowest curve is the solution to the upwind scheme with
h = 1/20. The curve above that is the upwind solution with h = 1/50. The curve in
the middle is the exact solution. The highest curve is the central differencing scheme with
h = 1/30, and the curve below that is the solution with central differencing and h = 1/50.
Notice that the central scheme with h = 1/50 is more accurate than the upwind
scheme for h = 1/50.
To draw reasonable conclusions from this discussion, it must be remembered that
most real applications involving equations like (6.4.1) are for more complex systems than
the constant coefficient equation. The conclusions we should draw are these. First, there
is a grid spacing limitation. If the grid spacing is too coarse, then the scheme will not
compute a qualitatively correct solution. Second, if we need precise information about the
solution and it is not cost-effective to use a small grid spacing, then other methods should
be investigated to obtain this information.
In recent years a number of methods have been advocated for increasing local grid
refinement only in those places where the solution is changing rapidly. Equation (6.4.1) is
often used as a test problem for such methods.
More information on the numerical solution of the convection-diffusion equation can
be found in the book by Morton [44].

Exercises

6.4.1. Show that scheme (6.4.2) satisfies the condition $|g| \le 1$ if and only if $k \le 2b/a^2$. Discuss this condition in relation to the condition (6.4.6).

6.4.2. Show that scheme (6.4.2) has phase speed given by
$$\tan \alpha(\xi) k \xi = \frac{a\lambda \sin h\xi}{1 - 4b\mu \sin^2 \tfrac{1}{2}h\xi}$$
and
$$\alpha(\xi) = a \left[ 1 - h^2 \xi^2 \left( \frac{1}{6} - b\mu + \frac{a^2\lambda^2}{3} \right) + O(h\xi)^4 \right].$$

6.4.3. Consider the following scheme for equation (6.4.1), which is derived in the same way as was the Lax–Wendroff scheme of Section 3.1:
$$v_m^{n+1} = v_m^n - ka\, \delta_0 v_m^n + kb\, \delta^2 v_m^n + \frac{k^2}{2} \left( a^2 \delta^2 v_m^n - 2ab\, \delta^2 \delta_0 v_m^n + b^2 \delta^4 v_m^n \right).$$
Show that this scheme is stable if bµ ≤ 1/2. Also show that
$$|g|^2 \le 1 + K k^2$$
if bµ ≤ 1/2.
6.4.4. Consider the nonlinear equation
$$u_t + \tfrac{1}{2}(u^2)_x = b u_{xx}$$
on an interval such as $-1 \le x \le 1$. This equation has as a solution the function
$$u(t, x) = a - c \tanh\!\left( \frac{c}{2b}(x - at) \right),$$
which represents a "front" moving to the right with speed a. The front has an increase in u of 2c as it moves past any point, and the average value of u is a. Consider
only positive values for a and c. Based on the analysis of the convection-diffusion equation, it seems likely that there will be resolution restrictions on the grid spacing, h, which place upper bounds on the quantities
$$\frac{hc}{2b} \quad\text{or}\quad \frac{ha}{2b}$$
in order to get a "qualitatively correct" solution. Notice that the maximum magnitude of the slope of the front divided by the total change in u is c/4b.

Using any one scheme, investigate this equation and one of these resolution conditions. Justify your conclusions with a few well-chosen calculations. Fix values of a and c and vary b, or fix a and b and vary c, or fix a, b, and c and vary the grid spacing and time step.

6.5 Variable Coefficients

In many applications the diffusivity b is a function of t or x, or even a function of u itself. The equation is frequently of the form
$$u_t = \left( b(t, x)\, u_x \right)_x. \qquad (6.5.1)$$
For such equations the difference schemes must be chosen to maintain consistency. For example, a forward-time central-space scheme for (6.5.1) is
$$\frac{v_m^{n+1} - v_m^n}{k} = \frac{b(t_n, x_{m+1/2}) \left( v_{m+1}^n - v_m^n \right) - b(t_n, x_{m-1/2}) \left( v_m^n - v_{m-1}^n \right)}{h^2}.$$
This scheme is consistent and is stable if
$$b(t, x)\, \mu \le \tfrac{1}{2}$$
for all values of (t, x) in the domain of computation.
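
As a concrete illustration, here is a minimal sketch of this scheme; it is not from the text, and the coefficient function b(t, x) below is an assumption chosen only so that the stability bound can be verified explicitly.

import numpy as np

# Forward-time central-space scheme for u_t = (b(t,x) u_x)_x, with the
# coefficient evaluated at the half-integer points x_{m +- 1/2}.
def b(t, x):
    return 1.0 + 0.5 * np.sin(np.pi * x)     # assumed; 0.5 <= b <= 1.5

M = 100
h = 1.0 / M
k = 0.4 * h**2 / 1.5        # then b(t,x)*mu <= 0.4 < 1/2 everywhere
x = np.linspace(0.0, 1.0, M + 1)
v = np.sin(np.pi * x)       # initial data; homogeneous Dirichlet boundaries

t = 0.0
while t < 0.1:
    bp = b(t, x[1:-1] + h / 2)               # b(t_n, x_{m+1/2})
    bm = b(t, x[1:-1] - h / 2)               # b(t_n, x_{m-1/2})
    v[1:-1] += (k / h**2) * (bp * (v[2:] - v[1:-1]) - bm * (v[1:-1] - v[:-2]))
    t += k
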
Chapter 7

Systems of Partial Differential Equations in Higher Dimensions

In this chapter we show how to extend the results of the previous chapters to systems of
equations and to equations and systems in two and three spatial dimensions. The concepts
of convergence, consistency, and stability carry over without change, as does the definition
of order of accuracy; the main change has to do with the increase in complexity of the
schemes, especially for implicit schemes for systems of equations. There are many schemes
for multidimensional systems that arise from various applications, such as aircraft flow
analysis in aeronautical engineering, numerical weather prediction in meteorology, and oil
reservoir simulation in geology. We are not able to present particular schemes for these
applications, but the ideas that we introduce are useful in each of these areas.
We begin by discussing stability for systems of equations, both for hyperbolic and
parabolic systems, and then discuss equations and systems in two and three dimensions.
In Section 7.3 we introduce the alternating direction implicit method, which is among the
most useful of the methods for multidimensional problems.

7.1 Stability of Finite Difference Schemes for Systems of Equations
We have discussed the one-way wave equation (1.1.1) and the heat equation (6.1.1) quite extensively, and we now show how much of what we had to say carries over to systems of the form
$$u_t + A u_x = 0 \qquad (7.1.1)$$
and
$$u_t = B u_{xx}, \qquad (7.1.2)$$
where u is a vector of functions of dimension d and A and B are $d \times d$ matrices. For system (7.1.1) to be hyperbolic the matrix A must be diagonalizable with real eigenvalues (see Section 1.1), and for (7.1.2) to be parabolic all the eigenvalues of the matrix B must have positive real part (see Section 6.2). In Chapter 9 there is a more general discussion of well-posed systems of equations. Almost all of what we have done for scalar equations extends readily to systems of equations. For example, the derivations of the Lax–Wendroff
scheme for the one-way wave equation and the Crank–Nicolson schemes for both the one-way wave equation and the heat equation require no change when applied to systems other than replacing a with A or b with B, respectively. The main difference is in the test for stability.

In testing the stability of one-step schemes for systems we obtain not a scalar amplification factor, but an amplification matrix G. The amplification matrix is obtained by making the substitution of $G^n e^{im\theta}$ for $v_m^n$. The condition for stability is that for each T > 0, there is a constant $C_T$ such that for $0 \le nk \le T$, we have
$$\|G^n\| \le C_T. \qquad (7.1.3)$$

One great simplification to help analyze (7.1.3) for hyperbolic systems occurs when the scheme has G as a polynomial or rational function of the matrix A (e.g., the Lax–Wendroff or Crank–Nicolson scheme). Then the same matrix that diagonalizes matrix A in (7.1.1) diagonalizes G, and the stability of the scheme depends only on the stability of the scalar equations
$$w_t + a_i w_x = 0,$$
where $a_i$ is an eigenvalue of A. For the Lax–Wendroff scheme, the stability condition for (7.1.1) is $|a_i \lambda| \le 1$ for $i = 1, \dots, d$.
Similar methods can be applied to parabolic systems, especially for dissipative schemes with µ constant. The matrix that transforms the matrix B to upper triangular form can also be used to convert G to upper triangular form. The methods of Chapter 9 can be used to obtain estimates of the powers of G.

For each of these cases, if U is the matrix that transforms G to upper triangular form, such that $G = U \tilde{G} U^{-1}$, then $G^n = U \tilde{G}^n U^{-1}$ and so
$$\|G^n\| \le \|U\|\, \|U^{-1}\|\, \|\tilde{G}^n\|.$$
This implies that estimate (7.1.3) will be satisfied for G if a similar estimate holds for $\tilde{G}$.

For general schemes the situation is not as nice. A necessary condition for stability is
$$|g_\nu| \le 1 + Kk \qquad (7.1.4)$$
for each eigenvalue $g_\nu$ of G, but this is not sufficient in general.

Example 7.1.1. A somewhat artificial example in which the condition (7.1.4) is not sufficient for stability is obtained by considering the system
$$u^1_t = 0, \qquad u^2_t = 0$$
with the first-order accurate scheme
$$v_m^{1,n+1} = v_m^{1,n} - c \left( v_{m+1}^{2,n} - 2v_m^{2,n} + v_{m-1}^{2,n} \right), \qquad v_m^{2,n+1} = v_m^{2,n}.$$
The amplification matrix is
$$G = \begin{pmatrix} 1 & 4c \sin^2 \tfrac{1}{2}\theta \\ 0 & 1 \end{pmatrix}$$
and the eigenvalues are both 1. However,
$$G^n = \begin{pmatrix} 1 & 4nc \sin^2 \tfrac{1}{2}\theta \\ 0 & 1 \end{pmatrix}$$
and the norm of $G^n$ for θ equal to π grows with n. Because of this growth we conclude that this scheme is unstable.

Fortunately, the straightforward extensions of schemes for single equations to systems of equations usually result in stable schemes.

As for single equations, it can be shown that lower order terms do not affect the
stability of systems. This is proved in Exercise 7.1.5.

Multistep Schemes as Systems

We can analyze the stability of multistep schemes by converting them to the form of a system. For example, a scheme that can be transformed to the form
$$\hat{v}^{n+1}(\xi) = \sum_{\nu=0}^{K} a_\nu(\xi)\, \hat{v}^{n-\nu}(\xi)$$
can be written as
$$\hat{V}^{n+1}(\xi) = G(h\xi)\, \hat{V}^n(\xi),$$
where $\hat{V}^n(\xi)$ is the column vector $\left( \hat{v}^n(\xi), \hat{v}^{n-1}(\xi), \dots, \hat{v}^{n-K}(\xi) \right)^T$. The matrix $G(h\xi)$ is the companion matrix of the polynomial with coefficients $-a_\nu(\xi)$, given by
$$G(h\xi) = \begin{pmatrix} a_0 & a_1 & \cdots & a_{K-1} & a_K \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I & 0 \end{pmatrix}.$$
To determine the stability of scalar finite difference schemes, the methods of Section 4.3
are usually easier to apply than the verification of estimate (7.1.3). For multistep schemes
applied to systems, there is no good analogue of the theory of Schur polynomials, and so
the conversion to a system is often the best way to analyze the stability of schemes.
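
As an illustration of this conversion (a sketch, not from the text), consider the scalar leapfrog scheme for the one-way wave equation, for which $\hat{v}^{n+1} = -2ia\lambda \sin\theta\, \hat{v}^n + \hat{v}^{n-1}$; here K = 1, so the companion matrix is 2 × 2 and its powers can be examined directly.

import numpy as np

# Companion matrix for the scalar leapfrog scheme:
# a_0 = -2 i a*lambda sin(theta), a_1 = 1.  The powers of G stay
# bounded when |a*lambda| < 1 and grow when |a*lambda| > 1.
for alam in [0.9, 1.1]:
    worst = 0.0
    for theta in np.linspace(0.0, np.pi, 201):
        G = np.array([[-2j * alam * np.sin(theta), 1.0],
                      [1.0, 0.0]])
        worst = max(worst, np.linalg.norm(np.linalg.matrix_power(G, 200), 2))
    print(alam, worst)
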
Exercises

7.1.1. Prove that condition (7.1.4) is a necessary condition for stability of a system.

7.1.2. Prove that a scheme for a parabolic system is stable if the amplification matrix G is upper triangular and is dissipative of order 2. That is, there are constants c and C such that for µ constant
$$|g_{ii}(\theta)| \le 1 - c \sin^2 \tfrac{1}{2}\theta,$$
and moreover, for j > i,
$$|g_{ij}(\theta)| \le C \sin^2 \tfrac{1}{2}\theta$$
and $g_{ij}(\theta) = 0$ for j < i. You may wish to use techniques from Sections 6.3 and 9.2.

7.1.3. Analyze the stability of the leapfrog scheme (1.3.4) as a system. Show that this analysis gives the same conclusion as obtained in Section 4.1.

7.1.4. Show that the Lax–Friedrichs scheme applied to the system
$$\begin{pmatrix} u \\ v \end{pmatrix}_t + \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}_x = 0$$
is unstable. The scheme is the same as (1.3.5) with the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ replacing a. (This equation is a weakly hyperbolic system; see Section 9.2.)

7.1.5. Use the matrix factorization (A.2.3) of Appendix A to prove the extension of Corollary 2.2.2 to systems of equations. Let G be the amplification factor of a stable scheme, with $\|G^n\| \le C_T$ for $0 \le nk \le T$. Also, let $\tilde{G}$ be the amplification factor of a scheme with $\|G - \tilde{G}\| \le c_0 k$. Assuming that $\|\tilde{G}\| \le c_1$ with $c_1 \ge 1$, use (A.2.3) to establish by induction that
$$\|\tilde{G}^n\| \le C_T (1 + k c_0 c_1)^n.$$

7.2 Finite Difference Schemes in Two and Three Dimensions
In this section we consider finite difference schemes in two and three spatial dimensions.
The basic definitions of convergence, consistency, and stability given in Sections 1.4 and
1.5 readily extend to two and three dimensions. One difficulty associated with schemes
in more than one dimension is that the von Neumann stability analysis can become quite
formidable, as we illustrate in the next two examples.
Example 7.2.1. We begin by considering the standard leapfrog scheme for the system
$$u_t + A u_x + B u_y = 0, \qquad (7.2.1)$$
where A and B are $d \times d$ matrices. The scheme may be written
$$\frac{v_{\ell,m}^{n+1} - v_{\ell,m}^{n-1}}{2k} + A \delta_{0x} v_{\ell,m}^n + B \delta_{0y} v_{\ell,m}^n = 0. \qquad (7.2.2)$$
The Fourier transform of the solution, $\hat{v}^n(\xi) = \hat{v}^n(\xi_1, \xi_2)$, satisfies the recursion relation
$$\hat{v}^{n+1}(\xi) + 2i \left( \lambda_1 A \sin h_1\xi_1 + \lambda_2 B \sin h_2\xi_2 \right) \hat{v}^n(\xi) - \hat{v}^{n-1}(\xi) = 0, \qquad (7.2.3)$$
where $\lambda_1 = k/h_1$ and $\lambda_2 = k/h_2$.

The stability of this scheme can be analyzed using the methods introduced in the previous section. The scheme can be written as a one-step scheme and the condition (7.1.3) has to be checked. However, it is difficult to obtain reasonable conditions without making some assumptions about the matrices A and B.

The most common assumption is that A and B are simultaneously diagonalizable. This means that there exists a matrix P such that both $PAP^{-1}$ and $PBP^{-1}$ are diagonal matrices. In actual practice, this condition is rarely met, but it does give insight nonetheless. If we set $w = P\hat{v}$, then relation (7.2.3) can be reduced to the form
$$w_\nu^{n+1}(\xi) + 2i \left( \lambda_1 \alpha_\nu \sin\theta_1 + \lambda_2 \beta_\nu \sin\theta_2 \right) w_\nu^n(\xi) - w_\nu^{n-1}(\xi) = 0 \qquad (7.2.4)$$
for $\nu = 1, \dots, d$, where $\alpha_\nu$ and $\beta_\nu$ are the νth diagonal entries in $PAP^{-1}$ and $PBP^{-1}$, respectively. Analyzing scheme (7.2.4) is similar to the analysis done on the one-dimensional scalar leapfrog scheme in Section 4.1. We conclude that scheme (7.2.2) is stable if and only if
$$\lambda_1 |\alpha_\nu| + \lambda_2 |\beta_\nu| < 1$$
for all values of ν.
Example 7.2.2. A modification of the leapfrog scheme (7.2.2) allowing larger time steps has been given by Abarbanel and Gottlieb [1]. The scheme is
$$\frac{v_{\ell,m}^{n+1} - v_{\ell,m}^{n-1}}{2k} + A \delta_{0x} \frac{v_{\ell,m+1}^n + v_{\ell,m-1}^n}{2} + B \delta_{0y} \frac{v_{\ell+1,m}^n + v_{\ell-1,m}^n}{2} = 0. \qquad (7.2.5)$$
Assuming that A and B are simultaneously diagonalizable, the stability analysis leads easily to the condition that
$$|\lambda_1 \alpha_\nu \sin\theta_1 \cos\theta_2 + \lambda_2 \beta_\nu \sin\theta_2 \cos\theta_1| < 1$$
must hold for all values of $\theta_1$, $\theta_2$, and ν. We have, using the Cauchy–Schwarz inequality,
$$\begin{aligned} |\lambda_1 \alpha_\nu \sin\theta_1 \cos\theta_2 + \lambda_2 \beta_\nu \sin\theta_2 \cos\theta_1| &\le \max\{\lambda_1 |\alpha_\nu|, \lambda_2 |\beta_\nu|\} \left( |\sin\theta_1||\cos\theta_2| + |\sin\theta_2||\cos\theta_1| \right) \\ &\le \max\{\lambda_1 |\alpha_\nu|, \lambda_2 |\beta_\nu|\} \left( \sin^2\theta_1 + \cos^2\theta_1 \right)^{1/2} \left( \cos^2\theta_2 + \sin^2\theta_2 \right)^{1/2} \\ &= \max\{\lambda_1 |\alpha_\nu|, \lambda_2 |\beta_\nu|\}. \end{aligned}$$
Thus we see that the two conditions $\lambda_1 |\alpha_\nu| < 1$ and $\lambda_2 |\beta_\nu| < 1$ for all values of ν are sufficient for stability, and it is easy to see, by taking appropriate choices of $\theta_1$ and $\theta_2$, that these conditions are also necessary. Thus the modified scheme (7.2.5) allows for a much larger time step than does the standard leapfrog (7.2.2). The extra computation per time step required by (7.2.5) is more than offset by the larger time step, making it more efficient than the standard scheme (7.2.2) (see [1]).

We can obtain a general formula for the stability condition for the schemes (7.2.2) and (7.2.5) without the assumption of simultaneous diagonalizability as follows. Because the system (7.2.1) is hyperbolic, the matrix function $A\xi_1 + B\xi_2$ is uniformly diagonalizable with real eigenvalues; see Chapter 9. This means there is a matrix $Y(\xi_1, \xi_2)$ such that the norms of $Y(\xi)$ and $Y(\xi)^{-1}$ are uniformly bounded and
$$Y(\xi) \left( A\xi_1 + B\xi_2 \right) Y(\xi)^{-1} = D(\xi),$$
where $D(\xi)$ is a diagonal matrix with real eigenvalues given by $D_i(\xi)$. By multiplying equation (7.2.3) by $Y = Y(\lambda_1 \sin\theta_1, \lambda_2 \sin\theta_2)$ and setting $\hat{w} = Y\hat{v}$, we obtain
$$\hat{w}^{n+1} + 2i D(\lambda_1 \sin\theta_1, \lambda_2 \sin\theta_2)\, \hat{w}^n - \hat{w}^{n-1} = 0.$$
Because D is diagonal, this system is composed of d simple scalar equations. The stability condition is then easily seen to be
$$\max_{1 \le i \le d}\; \max_{\theta_1, \theta_2}\; |D_i(\lambda_1 \sin\theta_1, \lambda_2 \sin\theta_2)| < 1. \qquad (7.2.6)$$
The scheme is stable for all values of $\lambda_1$ and $\lambda_2$ that satisfy this inequality. Of course, one must determine the functions $D_i(\xi)$, the eigenvalues of $A\xi_1 + B\xi_2$, to explicitly determine the stability condition. In some cases this can be done; in other cases it is a formidable task.

This same method can be applied to the analysis of other schemes, such as the Lax–Wendroff and Crank–Nicolson schemes, for the system (7.2.1).

Time Split Schemes

Time splitting is a general method for reducing multidimensional problems to a sequence of one-dimensional problems (see, e.g., Yanenko [72]). Consider an equation of the form
$$u_t + A_1 u + A_2 u = 0, \qquad (7.2.7)$$
where $A_1$ and $A_2$ are linear operators, such as $A_1 = A\,\partial/\partial x$ and $A_2 = B\,\partial/\partial y$. The operators $A_1$ and $A_2$ need not be associated with a particular dimension, but this is the most usual case. To advance the solution of (7.2.7) from a time $t_0$ to the time $t_0 + k$, we approximate (7.2.7) with the equations
$$u_t + 2A_1 u = 0 \quad\text{for}\quad t_0 \le t \le t_0 + \tfrac{1}{2}k \qquad (7.2.7a)$$
and
$$u_t + 2A_2 u = 0 \quad\text{for}\quad t_0 + \tfrac{1}{2}k \le t \le t_0 + k. \qquad (7.2.7b)$$
That is, each of the operators acts with twice its usual effect for half of the time.

We then use one-step finite difference schemes to approximate (7.2.7a) and (7.2.7b). If we use second-order accurate schemes to approximate both (7.2.7a) and (7.2.7b), then the overall scheme will be second-order accurate only if the order of the splitting is reversed on alternate time steps (see Gottlieb [25] and Strang [58]).

Stability for time split schemes does not necessarily follow from the stability of each of the steps unless the amplification factors commute with each other.

A significant difficulty associated with time split schemes is in determining the appropriate boundary conditions for each of the steps. Improper boundary conditions can seriously degrade the accuracy of the solution. A method for deriving boundary conditions for time split schemes has been given by LeVeque and Oliger [39].
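
The effect of reversing the order of the splitting is easy to demonstrate with matrix exponentials. The sketch below, an illustration with assumed, noncommuting random matrices, solves each split piece exactly, alternates the order on successive steps, and exhibits second-order convergence.

import numpy as np
from scipy.linalg import expm

# Time splitting for u_t + A1 u + A2 u = 0 with exact solves of each piece.
# Alternating the order of the two half-problems on successive steps is
# equivalent to Strang splitting and gives second-order accuracy.
rng = np.random.default_rng(0)
A1 = rng.standard_normal((4, 4))
A2 = rng.standard_normal((4, 4))
u0 = rng.standard_normal(4)
T = 1.0

for nsteps in [20, 40, 80]:
    k = T / nsteps
    S1, S2 = expm(-k * A1), expm(-k * A2)
    u = u0.copy()
    for n in range(nsteps):
        u = S1 @ (S2 @ u) if n % 2 == 0 else S2 @ (S1 @ u)
    err = np.linalg.norm(u - expm(-T * (A1 + A2)) @ u0)
    print(nsteps, err)    # the error drops by about 4 when k is halved
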

Example 7.2.3. A time split scheme popular in computational fluid dynamics is the time split MacCormack scheme [41]; see also Exercise 3.2.1. For system (7.2.1) with $\Delta x = \Delta y = h$, the forward-backward MacCormack scheme is
$$\tilde{v}_{\ell,m}^{n+1/2} = v_{\ell,m}^n - A\lambda \left( v_{\ell+1,m}^n - v_{\ell,m}^n \right),$$
$$v_{\ell,m}^{n+1/2} = \tfrac{1}{2}\left[ v_{\ell,m}^n + \tilde{v}_{\ell,m}^{n+1/2} - A\lambda \left( \tilde{v}_{\ell,m}^{n+1/2} - \tilde{v}_{\ell-1,m}^{n+1/2} \right) \right],$$
$$\tilde{v}_{\ell,m}^{n+1} = v_{\ell,m}^{n+1/2} - B\lambda \left( v_{\ell,m+1}^{n+1/2} - v_{\ell,m}^{n+1/2} \right),$$
$$v_{\ell,m}^{n+1} = \tfrac{1}{2}\left[ v_{\ell,m}^{n+1/2} + \tilde{v}_{\ell,m}^{n+1} - B\lambda \left( \tilde{v}_{\ell,m}^{n+1} - \tilde{v}_{\ell,m-1}^{n+1} \right) \right].$$
An advantage of this scheme is that each of the four stages is very easy to program, making it suitable for use on high-speed vector or parallel processors.

Exercises

7.2.1. Show that the Lax–Friedrichs scheme
$$\frac{v_{\ell,m}^{n+1} - \tfrac{1}{4}\left( v_{\ell+1,m}^n + v_{\ell-1,m}^n + v_{\ell,m+1}^n + v_{\ell,m-1}^n \right)}{k} + a \delta_{0x} v_{\ell,m}^n + b \delta_{0y} v_{\ell,m}^n = 0$$
for the equation $u_t + au_x + bu_y = 0$, with $\Delta x = \Delta y = h$, is stable if and only if $(|a|^2 + |b|^2)\lambda^2 \le 1/2$.

7.2.2. Show that the scheme
$$\frac{v_{\ell,m}^{n+1} - \tfrac{1}{4}\left( v_{\ell+1,m+1}^n + v_{\ell-1,m+1}^n + v_{\ell+1,m-1}^n + v_{\ell-1,m-1}^n \right)}{k} + a \delta_{0x} v_{\ell,m}^n + b \delta_{0y} v_{\ell,m}^n = 0$$
for the equation $u_t + au_x + bu_y = 0$, with $\Delta x = \Delta y = h$, is stable if and only if $(|a| + |b|)\lambda \le 1$.
7.2.3. Show that the two-dimensional Du Fort–Frankel scheme for the equation $u_t = b(u_{xx} + u_{yy}) + f$ given by
$$\frac{v_{\ell,m}^{n+1} - v_{\ell,m}^{n-1}}{2k} = b\, \frac{v_{\ell+1,m}^n + v_{\ell-1,m}^n + v_{\ell,m+1}^n + v_{\ell,m-1}^n - 2\left( v_{\ell,m}^{n+1} + v_{\ell,m}^{n-1} \right)}{h^2} + f_{\ell,m}^n,$$
where $\Delta x = \Delta y = h$, is unconditionally stable.
7.2.4. Show that the scheme given by the two-step algorithm
$$\begin{aligned} \tilde{v}_{\ell+1/2,m+1/2}^{n+1/2} = {}& \tfrac{1}{4}\left( v_{\ell+1,m+1}^n + v_{\ell,m+1}^n + v_{\ell+1,m}^n + v_{\ell,m}^n \right) \\ & - \frac{a\lambda}{4}\left( v_{\ell+1,m+1}^n + v_{\ell+1,m}^n - v_{\ell,m+1}^n - v_{\ell,m}^n \right) \\ & - \frac{b\lambda}{4}\left( v_{\ell+1,m+1}^n + v_{\ell,m+1}^n - v_{\ell+1,m}^n - v_{\ell,m}^n \right), \end{aligned}$$
$$\begin{aligned} v_{\ell,m}^{n+1} = {}& v_{\ell,m}^n - \frac{a\lambda}{2}\left( \tilde{v}_{\ell+1/2,m+1/2}^{n+1/2} + \tilde{v}_{\ell+1/2,m-1/2}^{n+1/2} - \tilde{v}_{\ell-1/2,m+1/2}^{n+1/2} - \tilde{v}_{\ell-1/2,m-1/2}^{n+1/2} \right) \\ & - \frac{b\lambda}{2}\left( \tilde{v}_{\ell+1/2,m+1/2}^{n+1/2} + \tilde{v}_{\ell-1/2,m+1/2}^{n+1/2} - \tilde{v}_{\ell+1/2,m-1/2}^{n+1/2} - \tilde{v}_{\ell-1/2,m-1/2}^{n+1/2} \right) \end{aligned}$$
for the equation $u_t + au_x + bu_y = 0$, with $\Delta x = \Delta y = h$, is second-order accurate and stable if and only if $(|a|^2 + |b|^2)\lambda^2 \le 1$.

7.2.5. Using the formula (7.2.6) find the stability condition for the leapfrog schemes (7.2.2) and (7.2.5) when
$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

7.2.6. Prove that if the two steps of a time split scheme have amplification factors that
commute with each other, then the time split scheme is stable if each step is stable.

7.3 The Alternating Direction Implicit Method

In this section we examine a very powerful method that is especially useful for solving parabolic equations on rectangular domains. It is also useful for equations of other types and on more general domains, although it can then become quite complex. This method is called the alternating direction implicit, or ADI, method. We discuss the derivation of the name and show that the method can be applied quite generally.

We begin by defining a parabolic equation in two spatial dimensions. The general definition is given in Section 9.2. The equation in two spatial variables,
$$u_t = b_{11} u_{xx} + 2b_{12} u_{xy} + b_{22} u_{yy}, \qquad (7.3.1)$$
is parabolic if
$$b_{11},\, b_{22} > 0 \quad\text{and}\quad b_{12}^2 < b_{11} b_{22}.$$
The most common example is the two-dimensional heat equation
$$u_t = b(u_{xx} + u_{yy}),$$
which governs heat flow in two dimensions. Parabolic equations arise in many other two- and three-dimensional problems, including the study of flows in porous media and modeling of economic processes. We introduce the ADI method using the two-dimensional heat equation as our primary example.

The ADI Method on a Square


Consider
ut = b1 uxx + b2 uyy (7.3.2)
on a square. Note that there is no term with a mixed derivative; i.e., there is no uxy term.
The ADI method applies most simply to parabolic equations of the form (7.3.2). Later in
this section we consider how to modify the basic method in order to include the mixed
derivative terms.
If we were to use a scheme similar to the Crank–Nicolson scheme for (7.3.2), with
discretization of both spatial derivatives, the scheme would be unconditionally stable, but
the matrix to be inverted at each time step would be much more difficult to invert than
were the tridiagonal matrices encountered in one-dimensional problems. The ADI method,
which we now derive, is a way of reducing two-dimensional problems to a succession of
many one-dimensional problems.
Let A1 and A2 be linear operators, which can be quite general, but for convenience
think of
A1 u = b1 uxx ,
A2 u = b2 uyy .
In general we assume that we have convenient methods to solve the equations
wt = A1 w
and
wt = A2 w
by the Crank–Nicolson scheme or a similar implicit scheme. The ADI method gives us a
way to use these schemes to solve the combined equation
ut = A1 u + A2 u (7.3.3)
using the methods for the simpler “one-dimensional” problems.
We begin by using the same idea as used in the Crank–Nicolson scheme, that of cen-
tering the difference scheme about t = (n + 1/2) k. By the Taylor series, (7.3.3) becomes
\[
\frac{u^{n+1} - u^n}{k} = \frac{1}{2}\left(A_1 u^{n+1} + A_1 u^n\right) + \frac{1}{2}\left(A_2 u^{n+1} + A_2 u^n\right) + O\!\left(k^2\right)
\]

or
\[
\left(I - \frac{k}{2}A_1 - \frac{k}{2}A_2\right) u^{n+1} = \left(I + \frac{k}{2}A_1 + \frac{k}{2}A_2\right) u^n + O\!\left(k^3\right). \tag{7.3.4}
\]
As noted before, if we discretize the operators A1 and A2 with respect to the spatial
variable at this stage, then the matrix corresponding to the left-hand side will be difficult to
invert. We now note the formula
(1 ± a1 ) (1 ± a2 ) = 1 ± a1 ± a2 + a1 a2
and, based on this, add $k^2 A_1 A_2 u^{n+1}/4$ to both sides of equation (7.3.4) and then write it
as
 
\[
\left(I - \frac{k}{2}A_1 - \frac{k}{2}A_2 + \frac{k^2}{4}A_1 A_2\right) u^{n+1}
= \left(I + \frac{k}{2}A_1 + \frac{k}{2}A_2 + \frac{k^2}{4}A_1 A_2\right) u^n
+ \frac{k^2}{4}A_1 A_2\left(u^{n+1} - u^n\right) + O\!\left(k^3\right). \tag{7.3.5}
\]
The two matrix sums can be factored as
   
\[
\left(I - \frac{k}{2}A_1\right)\left(I - \frac{k}{2}A_2\right) u^{n+1}
= \left(I + \frac{k}{2}A_1\right)\left(I + \frac{k}{2}A_2\right) u^n
+ \frac{k^2}{4}A_1 A_2\left(u^{n+1} - u^n\right) + O\!\left(k^3\right).
\]
Consider first the second term on the right-hand side. We have
un+1 = un + O(k),
so with the k 2 factor this second term is O(k 3 ), which is the same order as the errors
already introduced. So we have
     
\[
\left(I - \frac{k}{2}A_1\right)\left(I - \frac{k}{2}A_2\right) u^{n+1}
= \left(I + \frac{k}{2}A_1\right)\left(I + \frac{k}{2}A_2\right) u^n + O\!\left(k^3\right). \tag{7.3.6}
\]
If we discretize this equation, then we have a more convenient method. In the case when $A_1 u = b_1 u_{xx}$ and $A_2 u = b_2 u_{yy}$, the matrices corresponding to $I - \frac{k}{2}A_{ih}$ will be tridiagonal, and the corresponding systems can be solved conveniently with the Thomas algorithm; see Section 3.5. Let $A_{1h}$ and $A_{2h}$ be second-order approximations to $A_1$ and $A_2$, respectively. We obtain
   
\[
\left(I - \frac{k}{2}A_{1h}\right)\left(I - \frac{k}{2}A_{2h}\right) u^{n+1}
= \left(I + \frac{k}{2}A_{1h}\right)\left(I + \frac{k}{2}A_{2h}\right) u^n
+ O\!\left(k^3\right) + O\!\left(kh^2\right),
\]

and from this we have the ADI scheme


   
\[
\left(I - \frac{k}{2}A_{1h}\right)\left(I - \frac{k}{2}A_{2h}\right) v^{n+1}
= \left(I + \frac{k}{2}A_{1h}\right)\left(I + \frac{k}{2}A_{2h}\right) v^n. \tag{7.3.7}
\]

The Peaceman–Rachford Algorithm


To solve (7.3.7) Peaceman and Rachford [50] used
\[
\left(I - \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2} = \left(I + \frac{k}{2}A_{2h}\right) v^n, \tag{7.3.8a}
\]
\[
\left(I - \frac{k}{2}A_{2h}\right) v^{n+1} = \left(I + \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2}. \tag{7.3.8b}
\]

Formulas (7.3.8) explain the origin of the name alternating direction implicit method. The
two steps alternate which direction is implicit and which is explicit. More generally, the term
ADI applies to any method that involves the reduction of the problem to one-dimensional
implicit problems by factoring the scheme.
We now show that formulas (7.3.8) are equivalent to formula (7.3.7). If we start with equation (7.3.8b), operate with $\left(I - \frac{k}{2}A_{1h}\right)$, and then use (7.3.8a), we obtain
\[
\left(I - \frac{k}{2}A_{1h}\right)\left(I - \frac{k}{2}A_{2h}\right) v^{n+1}
= \left(I - \frac{k}{2}A_{1h}\right)\left(I + \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2}
\]
\[
= \left(I + \frac{k}{2}A_{1h}\right)\left(I - \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2}
= \left(I + \frac{k}{2}A_{1h}\right)\left(I + \frac{k}{2}A_{2h}\right) v^n.
\]

Notice that the equivalence of (7.3.8) to (7.3.7) does not require that the operators A1h and
A2h commute with each other. Some ADI methods similar to (7.3.8) require that some of
the operators commute with each other; see Exercise 7.3.14.
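This equivalence is easy to confirm numerically. The following Python sketch (our own illustration; the matrix sizes, seed, and names are arbitrary choices) uses random dense matrices as stand-ins for the discretized operators; they do not commute, yet the two-step form (7.3.8) reproduces the factored form (7.3.7) to rounding error.

import numpy as np

rng = np.random.default_rng(0)
N, k = 20, 0.01
A1h = rng.standard_normal((N, N))   # random stand-ins for the discretized
A2h = rng.standard_normal((N, N))   # operators; A1h @ A2h != A2h @ A1h
I = np.eye(N)
vn = rng.standard_normal(N)

# One-step factored form (7.3.7)
rhs = (I + k/2*A1h) @ (I + k/2*A2h) @ vn
v_factored = np.linalg.solve((I - k/2*A1h) @ (I - k/2*A2h), rhs)

# Two-step Peaceman-Rachford form (7.3.8)
v_tilde = np.linalg.solve(I - k/2*A1h, (I + k/2*A2h) @ vn)
v_two_step = np.linalg.solve(I - k/2*A2h, (I + k/2*A1h) @ v_tilde)

print(abs(v_factored - v_two_step).max())   # on the order of 1e-16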
The D’Yakonov scheme for (7.3.7) is
\[
\left(I - \frac{k}{2}A_{1h}\right)\bar v^{\,n+1/2} = \left(I + \frac{k}{2}A_{1h}\right)\left(I + \frac{k}{2}A_{2h}\right) v^n,
\]
\[
\left(I - \frac{k}{2}A_{2h}\right) v^{n+1} = \bar v^{\,n+1/2}. \tag{7.3.9}
\]

The variables ṽ n+1/2 in (7.3.8) and v̄ n+1/2 in (7.3.9) should be thought of as inter-
mediate or temporary variables in the calculation and not as approximations to u(t, x) at
any time t. By consistency such variables in ADI methods are usually first-order approxi-
mations to the solution, but this is of little significance.

The Douglas–Rachford Method


Other ADI schemes can be derived starting with other basic schemes. Starting with the
backward-time central-space scheme for (7.3.3), we have
 
\[
\left(I - kA_1 - kA_2\right) u^{n+1} = u^n + O\!\left(k^2\right)
\]

or
\[
\left(I - kA_1 - kA_2 + k^2 A_1 A_2\right) u^{n+1}
= u^n + k^2 A_1 A_2\, u^n + k^2 A_1 A_2\left(u^{n+1} - u^n\right) + O\!\left(k^2\right),
\]

which gives the scheme


 
\[
\left(I - kA_{1h}\right)\left(I - kA_{2h}\right) v^{n+1} = \left(I + k^2 A_{1h} A_{2h}\right) v^n.
\]

The Douglas–Rachford method [13] for this scheme is


\[
\left(I - kA_{1h}\right)\tilde v^{\,n+1/2} = \left(I + kA_{2h}\right) v^n,
\]
\[
\left(I - kA_{2h}\right) v^{n+1} = \tilde v^{\,n+1/2} - kA_{2h} v^n. \tag{7.3.10}
\]
If the operators $A_i$ are approximated to second-order accuracy, the scheme is first-order accurate in time and second-order accurate in space.

Boundary Conditions for ADI Schemes


ADI schemes require values of the intermediate variables on the boundary. If we consider
the case of Dirichlet boundary conditions, i.e., u(t, x, y) specified on the boundary, then
values of ṽ n+1/2 on the boundaries are obtained by using the two steps of the scheme with
v n and v n+1 specified to obtain ṽ n+1/2 .
For example, consider the Peaceman–Rachford method with
\[
A_1 = b_1\frac{\partial^2}{\partial x^2}, \qquad A_2 = b_2\frac{\partial^2}{\partial y^2},
\]
and u = β(t, x, y) on the boundary of the unit square. For step (7.3.8a), ṽ n+1/2 is needed
at x = 0 and x = 1. By adding the two parts of (7.3.8), we have
 
\[
\tilde v^{\,n+1/2} = \frac{1}{2}\left(I + \frac{k}{2}A_{2h}\right)\beta^n + \frac{1}{2}\left(I - \frac{k}{2}A_{2h}\right)\beta^{n+1}, \tag{7.3.11}
\]
which can be used to compute ṽ n+1/2 along the boundaries at x = 0 and x = 1. Thus
ṽ n+1/2 is determined where needed.
For the Douglas–Rachford method, the second equation gives
\[
\tilde v^{\,n+1/2} = \left(I - kA_{2h}\right)\beta^{n+1} + kA_{2h}\,\beta^n.
\]
Note again that ṽ n+1/2 need not be a good approximation to the solution at the intermediate
time level with t = (n + 1/2) k.
The boundary condition
\[
\tilde v^{\,n+1/2}_{\ell,m} = \beta^{n+1/2}_{\ell,m} = \beta(t_{n+1/2}, x_\ell, y_m) \tag{7.3.12}
\]
is very easy to implement but is only first-order accurate. If this condition is used with the
Peaceman–Rachford method or similar second-order methods, the overall accuracy will be
only first order.

Stability for ADI Methods


The Peaceman–Rachford and Douglas–Rachford methods are unconditionally stable, as
is easily seen by von Neumann analysis for two dimensions. As an example we show
the stability of the Douglas–Rachford method applied to the two-dimensional heat
equation (7.3.2).
Replacing $v^n_{\ell,m}$ by $g^n e^{i\ell\theta} e^{im\phi}$ and $\tilde v^{\,n+1/2}_{\ell,m}$ by $\tilde g\, g^n e^{i\ell\theta} e^{im\phi}$, we obtain
\[
\left(1 + 4b_1\mu_x \sin^2\tfrac{1}{2}\theta\right)\tilde g = 1 - 4b_2\mu_y \sin^2\tfrac{1}{2}\phi,
\]
\[
\left(1 + 4b_2\mu_y \sin^2\tfrac{1}{2}\phi\right) g = \tilde g + 4b_2\mu_y \sin^2\tfrac{1}{2}\phi.
\]
Thus
\[
g = \frac{1 + 16 b_1 b_2 \mu_x \mu_y \sin^2\tfrac{1}{2}\theta\, \sin^2\tfrac{1}{2}\phi}{\left(1 + 4b_1\mu_x \sin^2\tfrac{1}{2}\theta\right)\left(1 + 4b_2\mu_y \sin^2\tfrac{1}{2}\phi\right)} \le 1.
\]
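A short numerical scan confirms this bound over all frequencies; a minimal Python sketch (the coefficient and grid-ratio values are arbitrary illustrative choices, deliberately far beyond any explicit-scheme limit):

import numpy as np

b1, b2, mu_x, mu_y = 1.0, 2.5, 10.0, 40.0      # arbitrary positive values
theta, phi = np.meshgrid(np.linspace(-np.pi, np.pi, 201),
                         np.linspace(-np.pi, np.pi, 201))
sx = 4*b1*mu_x*np.sin(theta/2)**2              # 4 b1 mu_x sin^2(theta/2)
sy = 4*b2*mu_y*np.sin(phi/2)**2                # 4 b2 mu_y sin^2(phi/2)
g = (1 + sx*sy)/((1 + sx)*(1 + sy))
print(g.max())                                 # never exceeds 1

Since $(1 + s_x)(1 + s_y) = 1 + s_x + s_y + s_x s_y \ge 1 + s_x s_y$ for nonnegative $s_x$ and $s_y$, the bound $g \le 1$ holds for any positive $b_1$, $b_2$, $\mu_x$, and $\mu_y$; this is the unconditional stability.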

Implementing ADI Methods


To implement ADI methods on a rectangular domain, we begin with a grid consisting of points $(x_\ell, y_m)$, given by $x_\ell = \ell\,\Delta x$ and $y_m = m\,\Delta y$ for $\ell = 0, \dots, L$ and $m = 0, \dots, M$, respectively. We illustrate the implementation using the Peaceman–Rachford
algorithm (7.3.8). The numerical method is most conveniently programmed using two
two-dimensional arrays, one for the values of v and one for ṽ. In addition to these two
two-dimensional arrays, two one-dimensional arrays are needed to store the variables p
and q used in the Thomas algorithm as given in Section 3.5.
Figure 7.1 shows a grid and illustrates the sets of grid points on which ṽ and v are
solved. Notice that the values of v on the left- and right-side boundaries are not obtained
as part of the algorithm and must be specifically assigned.
As formulas (7.3.8) show, the computation of v n+1 from v n involves two distinct
stages: the first to compute ṽ from v and the second to compute v from ṽ. We have
dropped the superscripts on ṽ and v, since in the programming of the method there is no
index corresponding to n on these arrays.
The difference equations for $\tilde v$ corresponding to (7.3.8a) are
\[
-\frac{b_1\mu_x}{2}\tilde v_{\ell-1,m} + (1 + b_1\mu_x)\tilde v_{\ell,m} - \frac{b_1\mu_x}{2}\tilde v_{\ell+1,m}
= \frac{b_2\mu_y}{2} v_{\ell,m-1} + (1 - b_2\mu_y) v_{\ell,m} + \frac{b_2\mu_y}{2} v_{\ell,m+1} \tag{7.3.13}
\]
for $\ell = 1, \dots, L-1$ and for $m = 1, \dots, M-1$.

This system of equations consists of M − 1 tridiagonal systems of equations, one for each
value of m. For each value of m, the Thomas algorithm can be used to solve for the values
of $\tilde v_{\ell,m}$. The standard way to implement the method uses a loop on m, and within each


Figure 7.1. Grid for ADI.

loop the Thomas algorithm is used to solve for the values of $\tilde v_{\ell,m}$ for $\ell = 0, \dots, L$. The boundary values of $\tilde v$ are given by (7.3.11), or in this specific case,

\[
\tilde v_{0,m} = \frac{b_2\mu_y}{4}\beta^n_{0,m-1} + \frac{1 - b_2\mu_y}{2}\beta^n_{0,m} + \frac{b_2\mu_y}{4}\beta^n_{0,m+1}
- \frac{b_2\mu_y}{4}\beta^{n+1}_{0,m-1} + \frac{1 + b_2\mu_y}{2}\beta^{n+1}_{0,m} - \frac{b_2\mu_y}{4}\beta^{n+1}_{0,m+1},
\]

and similarly for $\tilde v_{L,m}$. Notice that this formula gives $\tilde v_{0,m}$ only for values of m from 1 to M − 1. These boundary conditions and the equations (7.3.13) completely determine $\tilde v_{\ell,m}$ for $\ell = 0, \dots, L$ and $m = 1, \dots, M-1$. Values of $\tilde v_{\ell,0}$ and $\tilde v_{\ell,M}$ are not determined by these formulas, and, as we shall see, these values are not needed at all.
Having computed ṽ, the second stage of the computation uses (7.3.8b), and the
difference equations are

\[
-\frac{b_2\mu_y}{2} v_{\ell,m-1} + (1 + b_2\mu_y) v_{\ell,m} - \frac{b_2\mu_y}{2} v_{\ell,m+1}
= \frac{b_1\mu_x}{2}\tilde v_{\ell-1,m} + (1 - b_1\mu_x)\tilde v_{\ell,m} + \frac{b_1\mu_x}{2}\tilde v_{\ell+1,m} \tag{7.3.14}
\]
for $\ell = 1, \dots, L-1$ and for $m = 1, \dots, M-1$.

Similar to (7.3.13), this is a system of L − 1 tridiagonal systems of equations, one tridiagonal system for each value of $\ell$. The boundary conditions for v are the specification of the exact values of the solution at time level n + 1, i.e., for t = (n + 1)k. Again, the standard implementation uses a loop on $\ell$, within which the Thomas algorithm is used to solve for the values of $v_{\ell,m}$ for $m = 0, \dots, M$.
It is important to notice that in equation (7.3.14) the required values of $\tilde v$ are precisely the values computed by (7.3.13). In particular, there is no need to assign values to $\tilde v_{\ell,0}$ or $\tilde v_{\ell,M}$ for any values of $\ell$. It is also important to realize that the boundary values $v_{0,m}$ and $v_{L,m}$ are not needed in the solution of (7.3.14), but these values must be updated as part of the solution process.
A useful suggestion for implementing ADI methods is first to use the very simple boundary condition (7.3.12) rather than more complex formulas such as (7.3.11). After the program is found to be free of programming errors, then more complex and more accurate boundary conditions such as (7.3.11) can be implemented.
Sample pseudocode for the Peaceman–Rachford ADI method is given below. The
variable w is used for ṽ. The grid lines are numbered 0 to L in the x direction and 0 to M
in the y direction. The lines relating to the boundary conditions and the boundary data are
not specified completely. Also, note that different arrays for the x and y directions could
be used for the Thomas algorithm arrays p and q, and the arrays for the p values could then
be computed only once for more computational efficiency.
The quantities halfbmux and halfbmuy are the two products (1/2) b1 µx and
(1/2) b2 µy , respectively.

! Loop on time
while time < tstop
time = time + time step
! Consider each grid line in y
loop on m from 1 to M-1
! Do the Thomas algorithm for this grid line. First loop
p(1) = 0.
q(1) = Boundary Data at x(0) and y(m)
loop on el from 1 to L-1
dd = v(el,m)
+ halfbmuy*( v(el,m-1) - 2*v(el,m) + v(el,m+1))
denom = 1 + halfbmux*(2 - p(el))
p(el+1) = halfbmux/denom
q(el+1) = ( dd + halfbmux*q(el))/ denom
end of loop on el
! Second loop for the Thomas algorithm
w(L,m) = Boundary Data at x(L) and y(m)
loop on el from L-1 to 0
w( el, m) = p(el+1)*w( el+1, m) + q(el+1)
end of loop on el
end of loop on m
! This completes the first half of ADI
180 Chapter 7. Systems of Partial Differential Equations in Higher Dimensions
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

! Consider each grid line in x


loop on el from 1 to L-1
! Do the Thomas algorithm for this grid line. First loop
p(1) = 0.
q(1) = Boundary Data at x(el) and y(0)
loop on m from 1 to M-1
dd = w(el,m)
+ halfbmux*( w(el-1,m) - 2*w(el,m) + w(el+1,m))
denom = 1 + halfbmuy*(2 - p(m))
p(m+1) = halfbmuy/denom
q(m+1) = ( dd + halfbmuy*q(m))/ denom
end of loop on m
! Second loop for the Thomas algorithm
v(el,M) = Boundary Data at x(el) and y(M)
loop on m from M-1 to 0
v( el, m) = p(m+1)*v( el, m+1) + q(m+1)
end of loop on m
end of loop on el
! Set the other boundary values
loop on m from 0 to M
v(0,m) = Boundary Data at x(0) and y(m)
v(L,m) = Boundary Data at x(L) and y(m)
end of loop on m
end of loop on time
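For readers who want something executable, here is a minimal NumPy translation of one Peaceman–Rachford step. The function names, the hand-coded Thomas solver, and the convention that the Dirichlet data are supplied as full-grid arrays beta_n and beta_np1 are our own choices; this is a sketch under those assumptions, not a verbatim transcription.

import numpy as np

def thomas(a, b, c, d):
    """Solve the tridiagonal system with subdiagonal a, diagonal b,
    superdiagonal c, and right-hand side d (all of length n)."""
    n = len(d)
    p, q = np.zeros(n), np.zeros(n)
    p[0], q[0] = -c[0]/b[0], d[0]/b[0]
    for i in range(1, n):
        denom = b[i] + a[i]*p[i-1]
        p[i] = -c[i]/denom
        q[i] = (d[i] - a[i]*q[i-1])/denom
    x = np.zeros(n)
    x[-1] = q[-1]
    for i in range(n-2, -1, -1):
        x[i] = p[i]*x[i+1] + q[i]
    return x

def pr_step(v, beta_n, beta_np1, b1, b2, mu_x, mu_y):
    """One Peaceman-Rachford step (7.3.8) for u_t = b1 u_xx + b2 u_yy.
    v[l, m] holds time level n; beta_n and beta_np1 hold the Dirichlet
    data (stored as full-grid arrays) at levels n and n+1."""
    L, M = v.shape[0] - 1, v.shape[1] - 1
    hx, hy = 0.5*b1*mu_x, 0.5*b2*mu_y
    w = v.copy()                      # w plays the role of v-tilde

    # First half (7.3.13): implicit in x, one tridiagonal solve per interior m.
    lo = np.full(L-1, -hx); di = np.full(L-1, 1 + 2*hx); up = np.full(L-1, -hx)
    for m in range(1, M):
        d = v[1:L, m] + hy*(v[1:L, m-1] - 2*v[1:L, m] + v[1:L, m+1])
        for l in (0, L):              # boundary values of v-tilde from (7.3.11)
            w[l, m] = (0.5*(beta_n[l, m] + beta_np1[l, m])
                       + 0.5*hy*(beta_n[l, m-1] - 2*beta_n[l, m] + beta_n[l, m+1])
                       - 0.5*hy*(beta_np1[l, m-1] - 2*beta_np1[l, m] + beta_np1[l, m+1]))
        d[0] += hx*w[0, m]            # move known boundary terms to the right side
        d[-1] += hx*w[L, m]
        w[1:L, m] = thomas(lo, di, up, d)

    # Second half (7.3.14): implicit in y, one tridiagonal solve per interior l.
    v_new = beta_np1.copy()           # boundary values at the new time level
    lo = np.full(M-1, -hy); di = np.full(M-1, 1 + 2*hy); up = np.full(M-1, -hy)
    for l in range(1, L):
        d = w[l, 1:M] + hx*(w[l-1, 1:M] - 2*w[l, 1:M] + w[l+1, 1:M])
        d[0] += hy*beta_np1[l, 0]
        d[-1] += hy*beta_np1[l, M]
        v_new[l, 1:M] = thomas(lo, di, up, d)
    return v_new

As in the pseudocode, the values ṽ at m = 0 and m = M are never referenced, and the boundary columns of v_new come directly from the Dirichlet data.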

The Mitchell–Fairweather Scheme


The Mitchell–Fairweather scheme [44] is an ADI scheme for (7.3.2) that is second-order
accurate in time and fourth-order accurate in space. The fourth-order accurate formula
(3.3.6),
\[
\left(1 + \frac{h^2}{12}\delta^2\right)^{-1}\delta^2 u = \frac{d^2 u}{dx^2} + O\!\left(h^4\right),
\]
is used rather than the second-order approximation. We consider (7.3.6):
\[
\left(I - \frac{k}{2}A_1\right)\left(I - \frac{k}{2}A_2\right) u^{n+1} = \left(I + \frac{k}{2}A_1\right)\left(I + \frac{k}{2}A_2\right) u^n + O\!\left(k^3\right),
\]

where $A_1$ and $A_2$ are the second derivative operators; then we multiply both sides by
\[
\left(1 + \frac{h^2}{12}\delta_x^2\right)\left(1 + \frac{h^2}{12}\delta_y^2\right)
\]
and replace
\[
\left(1 + \frac{h^2}{12}\delta_x^2\right)\frac{\partial^2}{\partial x^2}
\]

by $\delta_x^2 + O\!\left(h^4\right)$. Similar changes are made for the derivatives with respect to y. The result is
 
\[
\left(1 + \frac{h^2}{12}\delta_x^2 - \frac{k}{2}b_1\delta_x^2\right)\left(1 + \frac{h^2}{12}\delta_y^2 - \frac{k}{2}b_2\delta_y^2\right) u^{n+1}
= \left(1 + \frac{h^2}{12}\delta_x^2 + \frac{k}{2}b_1\delta_x^2\right)\left(1 + \frac{h^2}{12}\delta_y^2 + \frac{k}{2}b_2\delta_y^2\right) u^n + O\!\left(k^3\right) + O\!\left(kh^4\right).
\]

We obtain the Mitchell–Fairweather scheme, which is similar to the Peaceman–Rachford


scheme:
 !  !
\[
\left[1 - \frac{1}{2}\left(b_1\mu_x - \frac{1}{6}\right)h^2\delta_x^2\right] v^{n+1/2}
= \left[1 + \frac{1}{2}\left(b_2\mu_y + \frac{1}{6}\right)h^2\delta_y^2\right] v^n,
\]
\[
\left[1 - \frac{1}{2}\left(b_2\mu_y - \frac{1}{6}\right)h^2\delta_y^2\right] v^{n+1}
= \left[1 + \frac{1}{2}\left(b_1\mu_x + \frac{1}{6}\right)h^2\delta_x^2\right] v^{n+1/2}. \tag{7.3.15}
\]

This scheme is second order in time and fourth order in space and is not much more work
than the Peaceman–Rachford method.
As an example, suppose the Peaceman–Rachford scheme is used with grid spacings $h_1$ and $k_1$, and the Mitchell–Fairweather scheme is used with grid spacings $h_2$ and $k_2$. The amount of work for the schemes is proportional to $k_1^{-1}h_1^{-2}$ and $k_2^{-1}h_2^{-2}$, respectively, whereas the accuracy is $O(k_1^2) + O(h_1^2)$ for the Peaceman–Rachford scheme and $O(k_2^2) + O(h_2^4)$ for the Mitchell–Fairweather scheme. It is usually not difficult to choose the grid parameters so that the Mitchell–Fairweather scheme requires less work and gives more accuracy than the Peaceman–Rachford method (see Exercises 7.3.8, 7.3.10, and 7.3.12).
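A back-of-the-envelope computation makes the trade-off concrete; in the sketch below all error constants are taken to be 1, which is of course an assumption.

# Work to reach a target error eps, with error constants set to 1.
# Both schemes: work ~ 1/(k h^2); balance k ~ eps^(1/2) in both cases,
# and h ~ eps^(1/2) (second order) or h ~ eps^(1/4) (fourth order).
def work(eps, fourth_order_in_space):
    k = eps**0.5
    h = eps**0.25 if fourth_order_in_space else eps**0.5
    return 1.0/(k*h*h)

for eps in (1e-4, 1e-6):
    print(eps, work(eps, False), work(eps, True))
# At eps = 1e-6 the Peaceman-Rachford cost is ~1e9 units while the
# Mitchell-Fairweather cost is ~1e6, a factor of one thousand.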
Boundary conditions for the Mitchell–Fairweather scheme are obtained as for the other ADI schemes. We can eliminate the terms containing $\delta_x^2 v^{n+1/2}$ from (7.3.15) by multiplying the first equation by $b_1\mu_x + 1/6$ and the second by $b_1\mu_x - 1/6$ and then adding. In this way we obtain for Dirichlet boundary conditions the following condition for $v^{n+1/2}$:

  !
\[
v^{n+1/2} = \frac{1}{2b_1\mu_x}\left\{\left(b_1\mu_x + \frac{1}{6}\right)\left[1 + \frac{1}{2}\left(b_2\mu_y + \frac{1}{6}\right)h^2\delta_y^2\right]\beta^n
+ \left(b_1\mu_x - \frac{1}{6}\right)\left[1 - \frac{1}{2}\left(b_2\mu_y - \frac{1}{6}\right)h^2\delta_y^2\right]\beta^{n+1}\right\}. \tag{7.3.16}
\]
6 2 6

ADI with Mixed Derivative Terms


The ADI methods that have been discussed for equation (7.3.2) can be extended to include
the general equation (7.3.1), with the mixed derivative term as well. We confine our
discussion to the Peaceman–Rachford method for simplicity. One scheme that can be

used is
\[
\left(1 - \frac{k}{2}b_{11}\delta_x^2\right) v^{n+1/2} = \left(1 + \frac{k}{2}b_{22}\delta_y^2\right) v^n + k b_{12}\,\delta_{0x}\delta_{0y} v^n,
\]
\[
\left(1 - \frac{k}{2}b_{22}\delta_y^2\right) v^{n+1} = \left(1 + \frac{k}{2}b_{11}\delta_x^2\right) v^{n+1/2} + k b_{12}\,\delta_{0x}\delta_{0y} v^{n+1/2}, \tag{7.3.17}
\]

which is only first-order accurate in time. Beam and Warming [5] have shown that no ADI
scheme involving only time levels n and n + 1, such as (7.3.17), can be second-order
accurate unless b12 is zero. A simple modification to (7.3.17) that is second-order accurate
in time is
 
\[
\left(1 - \frac{k}{2}b_{11}\delta_x^2\right) v^{n+1/2} = \left(1 + \frac{k}{2}b_{22}\delta_y^2\right) v^n + k b_{12}\,\delta_{0x}\delta_{0y}\tilde v^{\,n+1/2},
\]
\[
\left(1 - \frac{k}{2}b_{22}\delta_y^2\right) v^{n+1} = \left(1 + \frac{k}{2}b_{11}\delta_x^2\right) v^{n+1/2} + k b_{12}\,\delta_{0x}\delta_{0y}\tilde v^{\,n+1/2}, \tag{7.3.18}
\]

where
\[
\tilde v^{\,n+1/2} = \tfrac{3}{2} v^n - \tfrac{1}{2} v^{n-1}.
\]

This scheme requires extra storage to compute ṽ n+1/2 , but it is relatively easy to implement.
The boundary condition
 
\[
v^{n+1/2} = \frac{1}{2}\left(1 + \frac{k}{2}b_{22}\delta_y^2\right)\beta^n + \frac{1}{2}\left(1 - \frac{k}{2}b_{22}\delta_y^2\right)\beta^{n+1} \tag{7.3.19}
\]

can be used with this scheme without loss of the second-order accuracy (see Exercise
7.3.13). Notice that this is essentially the same formula as (7.3.11).

Exercises
7.3.1. Show that the inhomogeneous equation

ut = A1 u + A2 u + f

corresponding to equation (7.3.3) can be approximated to second-order accuracy by

 
\[
\left(I - \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2} = \left(I + \frac{k}{2}A_{2h}\right) v^n + \frac{k}{2} f^{n+1/2},
\]
\[
\left(I - \frac{k}{2}A_{2h}\right) v^{n+1} = \left(I + \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2} + \frac{k}{2} f^{n+1/2},
\]

where A1h and A2h are as in scheme (7.3.7). Also show that the scheme
 
\[
\left(I - \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2} = \left(I + \frac{k}{2}A_{2h}\right) v^n + \frac{k}{2} f^n,
\]
\[
\left(I - \frac{k}{2}A_{2h}\right) v^{n+1} = \left(I + \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/2} + \frac{k}{2} f^{n+1}
\]

is second-order accurate.
7.3.2. Consider the system in one space dimension
    
\[
\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}_t
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}_{xx}
+ \begin{pmatrix} 0 & -4 \\ 4 & 0 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}.
\]

Discuss the efficiency of the ADI method using


 
\[
A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\frac{\partial^2}{\partial x^2}
\quad\text{and}\quad
A_2 = \begin{pmatrix} 0 & -4 \\ 4 & 0 \end{pmatrix}
\]

compared with using a Crank–Nicolson scheme with a block tridiagonal system.


Solve this system using one of these methods on the interval −1 ≤ x ≤ 1 for
0 ≤ t ≤ 1 with the exact solution

u1 = e3t sin x cosh 2x,

u2 = e3t cos x sinh 2x

with Dirichlet boundary conditions at x = −1 and x = 1.


7.3.3. Apply the Peaceman–Rachford method to the hyperbolic equation

ut + ux + 2uy = 0

on the square −1 ≤ x ≤ 1, −1 ≤ y ≤ 1 for 0 ≤ t ≤ 1. Specify the exact solution along the sides with y = −1 and x = −1. Apply the extrapolation conditions $v^{n+1}_{L,m} = v^n_{L-1,m}$ and $v^{n+1}_{\ell,M} = v^n_{\ell,M-1}$ along the sides $x_L = 1$ and $y_M = 1$, respectively. Use the exact solution

u(t, x, y) = u0 (x − t, y − 2t)

with
\[
u_0(x, y) = \begin{cases} (1 - 2|x|)(1 - 2|y|) & \text{if } |x| \le \tfrac{1}{2} \text{ and } |y| \le \tfrac{1}{2}, \\ 0 & \text{otherwise} \end{cases}
\]
for initial and boundary data.

7.3.4. Show that scheme (7.3.18) for the parabolic equation (7.3.1) with the mixed deriva-
tive terms is second-order accurate and unconditionally stable. Hint: It may be
simpler to use the methods of Section 4.3 rather than to solve explicitly for the two
roots of the amplification polynomial.
7.3.5. Show that (7.3.10) is equivalent to the formula preceding it.
7.3.6. Derive boundary condition (7.3.16) for the Mitchell–Fairweather scheme.
7.3.7. Use the Peaceman–Rachford ADI method to solve

ut = 2uxx + uyy

on the unit square for 0 ≤ t ≤ 1. The initial and boundary data should be taken
from the exact solution

u = exp(1.68t) sin[1.2(x − y)] cosh(x + 2y).

Use Δx = Δy = Δt = 1/10, 1/20, and 1/40. Demonstrate the second-order accuracy.
7.3.8. Solve the same problem as in Exercise 7.3.7 but by the Mitchell–Fairweather ADI
method. Use Δx = Δy = 1/10 and Δt = 1/30. Compare this case with the use of the Peaceman–Rachford method with Δx = Δy = Δt = 1/20.
7.3.9. Use the Peaceman–Rachford ADI method to solve

ut = uxx + 2uyy

on the unit square for 0 ≤ t ≤ 1. The initial and boundary data should be taken
from the exact solution

u = exp(1.5t) sin(x − 0.5y) cosh(x + y).

Use Δx = Δy = Δt = 1/10, 1/20, and 1/40. Demonstrate the second-order accuracy.
7.3.10. Solve the same problem as in Exercise 7.3.9 but by the Mitchell–Fairweather ADI
method. Use Δx = Δy = 1/10 and Δt = 1/30. Compare this case with the use of the Peaceman–Rachford method with Δx = Δy = Δt = 1/20.
7.3.11. Use the Peaceman–Rachford ADI method to solve

ut = uxx + 2uyy

on the unit square for 0 ≤ t ≤ 1. Take initial and boundary data from the exact
solution
u = exp(0.75t) sin(2x − y) cosh[1.5(x + y)].
Use Δx = Δy = Δt = 1/10, 1/20, and 1/40. Demonstrate the second-order
accuracy.

7.3.12. Solve the same problem as in Exercise 7.3.11, but by the Mitchell–Fairweather ADI
method. Use Δx = Δy = 1/10 and Δt = 1/30. Compare this case with the use of the Peaceman–Rachford method with Δx = Δy = Δt = 1/20.
7.3.13. Use the scheme (7.3.18) with boundary conditions (7.3.19) to compute an approx-
imation to the parabolic equation (7.3.1), with the mixed derivative term. Let
the coefficients have the values b11 = 1, b12 = 0.5, and b22 = 1 on the square
−1 ≤ x ≤ 1, −1 ≤ y ≤ 1, for 0 ≤ t ≤ 1. Use the exact solution

e2t sin(x + y) cosh(x + y)

with Dirichlet boundary conditions.


7.3.14. Show that the three-dimensional ADI method for

ut = A1 u + A2 u + A3 u

given by
\[
\left(I - \frac{k}{2}A_{1h}\right)\tilde v^{\,n+1/3} = \left(I + \frac{k}{2}A_{3h}\right) v^n,
\]
\[
\left(I - \frac{k}{2}A_{2h}\right)\tilde v^{\,n+2/3} = \left(I + \frac{k}{2}A_{2h}\right)\tilde v^{\,n+1/3},
\]
\[
\left(I - \frac{k}{2}A_{3h}\right) v^{n+1} = \left(I + \frac{k}{2}A_{1h}\right)\tilde v^{\,n+2/3}
\]
is equivalent to
\[
\left(I - \frac{k}{2}A_{1h}\right)\left(I - \frac{k}{2}A_{2h}\right)\left(I - \frac{k}{2}A_{3h}\right) v^{n+1}
= \left(I + \frac{k}{2}A_{1h}\right)\left(I + \frac{k}{2}A_{2h}\right)\left(I + \frac{k}{2}A_{3h}\right) v^n
\]

if the operators A1h and A2h commute.



Chapter 8

Second-Order Equations

In this chapter we study partial differential equations that are of second order in the time
derivatives and show that the methods introduced in previous chapters can easily be applied
to the equations treated here. As will be seen, no significantly new ideas are needed here,
although the definition of stability has to take into account the extra time derivative.

8.1 Second-Order Time-Dependent Equations


We begin with the second-order wave equation in one space dimension, which is

utt − a 2 uxx = 0, (8.1.1)


where a is a nonnegative real number. Initial value problems for equations such as (8.1.1),
which are second order in time, require two functions for initial data; typically these are
u(0, x) and ut (0, x). If
u(0, x) = u0 (x) and ut (0, x) = u1 (x), (8.1.2)
then the exact solution of (8.1.1) may be written as
\[
u(t, x) = \frac{1}{2}\left[u_0(x - at) + u_0(x + at)\right] + \frac{1}{2a}\int_{x-at}^{x+at} u_1(y)\, dy. \tag{8.1.3}
\]

This formula shows that there are two characteristic speeds, a and −a, associated with
equation (8.1.1).
In terms of the Fourier transform the solution may be written as
\[
\hat u(t, \omega) = \hat u_0(\omega)\cos a\omega t + \hat u_1(\omega)\,\frac{\sin a\omega t}{a\omega}
= \hat u_+(\omega) e^{ia\omega t} + \hat u_-(\omega) e^{-ia\omega t} \tag{8.1.4}
\]
or
\[
u(t, x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ix\omega}\left[\hat u_0(\omega)\cos a\omega t + \hat u_1(\omega)\,\frac{\sin a\omega t}{a\omega}\right] d\omega
\]
\[
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\hat u_+(\omega) e^{i\omega(x+at)} + \hat u_-(\omega) e^{i\omega(x-at)}\right] d\omega \tag{8.1.5}
\]
\[
= u_+(x + at) + u_-(x - at).
\]


These formulas for the solution show that in general the solution of the wave equation
(8.1.1) consists of two waves, one moving to the left and one moving to the right.
Figures 8.1 and 8.2 show the solutions to two initial value problems for the wave
equation.

Example 8.1.1. Figure 8.1 shows the solution to the initial value problem for the wave
equation (8.1.1) with a = 1 and with initial data

\[
u_0(x) = \begin{cases} \cos(\pi x/2) & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1, \end{cases}
\qquad u_1(x) = 0.
\]

Initially the shape gets wider and shorter, but it ultimately splits into two separate
pulses, one moving to the right and one to the left. The solution is

\[
u(t, x) = \frac{1}{2}\left[\cos\frac{\pi}{2}(x - t) + \cos\frac{\pi}{2}(x + t)\right].
\]

Example 8.1.2. Figure 8.2 shows the solution to the initial value problem for the wave
equation (8.1.1) with a = 1 and with initial data

\[
u_0(x) = 0, \qquad u_1(x) = \begin{cases} 1 & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1. \end{cases} \tag{8.1.6}
\]

The initial state is zero, but the initial derivative is nonzero. The solution grows and
spreads. The solution is given by the integral in (8.1.3); i.e.,
\[
u(t, x) = \frac{1}{2}\int_{x-t}^{x+t} u_1(y)\, dy = \frac{1}{2}\,\mathrm{length}\!\left([x - t, x + t] \cap [-1, 1]\right),
\]

where u1 is as in (8.1.6). The value of u(t, x) is one-half the length of the intersection of
the interval of width 2t centered on x and the interval [−1, 1].
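This geometric description is a one-line computation; a small Python check (the function name is our own):

def u(t, x):
    # half the length of the overlap of [x - t, x + t] with [-1, 1]
    return 0.5*max(0.0, min(x + t, 1.0) - max(x - t, -1.0))

print(u(0.5, 0.0))   # 0.5: the interval [-0.5, 0.5] lies inside [-1, 1]
print(u(3.0, 0.0))   # 1.0: the overlap is all of [-1, 1]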

It is appropriate at this point to discuss the origin of the names hyperbolic and
parabolic as applied to the systems treated in Chapters 1 and 6. The second-order equa-
tion (8.1.1) was originally called hyperbolic because of the similarity of its symbol to the
equation of a hyperbola. If we set ω = iη, the symbol of (8.1.1) is s 2 − a 2 η2 , and the
equations
s 2 − a 2 η2 = constant
are hyperbolas in the (s, η) plane. Similarly, the symbol of the heat equation (6.1.1) is
s − bη2 , and this is related to the equation of a parabola. The symbols of second-order
elliptic equations are likewise related to equations of ellipses. Even though these names
are based only on this formal similarity, they have persisted.

Figure 8.1. A solution of the wave equation (profiles at t = 0.0, 0.4, 0.8, and 3.0).

Figure 8.2. A solution of the wave equation (profiles at t = 0.4, 0.8, 1.5, and 3.0).



As mathematicians studied other equations and systems of equations they extended


the names to cover those systems that shared certain important features with the original
hyperbolic, parabolic, and elliptic equations. The essential feature of a hyperbolic system is
that the solution propagates with certain finite speeds. For a parabolic equation the essential
feature is that the solution becomes smoother than its initial data. The essential feature of
elliptic systems, which we discuss in Chapter 12, is that the solution is more differentiable
than the data.
The general second-order hyperbolic equation in one space dimension is

utt + 2butx = a 2 uxx + cux + dut + eu + f (t, x), (8.1.7)

where b2 < a 2 . The initial value problem for (8.1.7) is well-posed in the sense that for
0 ≤ t ≤ T there is a constant CT depending on the equation and on T , but not on the
particular solution, such that
\[
\int_{-\infty}^{\infty} |u_t(t,x)|^2 + |u_x(t,x)|^2 + |u(t,x)|^2\, dx \tag{8.1.8}
\]
\[
\le C_T\left[\int_{-\infty}^{\infty} |u_t(0,x)|^2 + |u_x(0,x)|^2 + |u(0,x)|^2\, dx
+ \int_0^t\!\int_{-\infty}^{\infty} |f(\tau,x)|^2\, dx\, d\tau\right].
\]

The estimate (8.1.8) can be established by use of either the Fourier transform or the
energy method (see Exercises 8.1.3 and 8.1.4).

The Euler–Bernoulli Equation


The second-order equation
utt = −b2 uxxxx (8.1.9)
is called the Euler–Bernoulli beam equation. It models the vertical motion of a thin, hor-
izontal beam with small displacements from rest. Using the Fourier transform we easily
obtain the solution in either of the two forms
\[
u(t, x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega x}\left[\hat u_0(\omega)\cos b\omega^2 t + \hat u_1(\omega)\,\frac{\sin b\omega^2 t}{b\omega^2}\right] d\omega \tag{8.1.10}
\]
\[
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[e^{i\omega(x+b\omega t)}\hat u_+(\omega) + e^{i\omega(x-b\omega t)}\hat u_-(\omega)\right] d\omega,
\]
where u0 and u1 are the initial data as given by (8.1.2). From the second of these
formulas we see that the frequency ω propagates with speeds ± bω. Because the speed
of propagation depends on the frequency, the equation is said to be dispersive. The idea of
dispersion was applied in Section 5.2 to finite difference schemes for hyperbolic systems,
but it is applicable to study any wave phenomena. From (8.1.10) we see that the phase
velocity is bω or −bω, and the group velocity is twice the phase velocity.
The Euler–Bernoulli equation (8.1.9) is neither hyperbolic nor parabolic. As (8.1.10)
shows, the solution does not become smoother as t increases, as do solutions of parabolic

equations, nor does it have a finite speed of propagation, as do the solutions of hyperbolic equations. Another feature of the Euler–Bernoulli equation is that lower order terms can adversely affect the well-posedness of the initial value problem; see Example 9.1.2.
Another equation that models the motion of a beam is the Rayleigh equation

utt − c2 uttxx = −b2 uxxxx . (8.1.11)

From the formula


\[
\hat u_{tt}(t, \omega) = -\frac{b^2\omega^4}{1 + c^2\omega^2}\,\hat u(t, \omega),
\]
we see that for small ω the solution to (8.1.11) behaves in a manner similar to the solution
of the Euler–Bernoulli equation (8.1.9), and for large ω the solution to (8.1.11) behaves
like the solution of the wave equation (8.1.1) with speed b/c. In particular, both the phase
velocity and group velocity are bounded.

Exercises
8.1.1. Write (8.1.1) as a first-order hyperbolic system with the two variables u1 = ux
and u2 = ut . Compare the solution of this system as given by the formulas of
Chapter 1 with the formulas (8.1.3) and (8.1.4).

8.1.2. Find an explicit relationship between the pair of functions u0 and u1 in the formula
(8.1.3) and the pair of functions u+ and u− in formula (8.1.5). Hint: Use the
antiderivative of u1 .

8.1.3. Prove (8.1.8) using Fourier transform methods.

8.1.4. Prove (8.1.8) by multiplying (8.1.7) with u(t, x) and integrating by parts. This
method is often called the energy method.

8.1.5. Show that the initial value problem for the two-dimensional wave equation utt =
uxx + uyy is well-posed.

8.1.6. Show that the Euler–Bernoulli equation (8.1.9) satisfies


\[
\int_{-\infty}^{\infty} u_t(t,x)^2 + b^2 u_{xx}(t,x)^2\, dx = \int_{-\infty}^{\infty} u_t(0,x)^2 + b^2 u_{xx}(0,x)^2\, dx
\]

by the energy method and by utilizing the Fourier transform.

8.1.7. Show that the initial value problem for the Rayleigh equation (8.1.11) is well-posed.

8.1.8. The Schrödinger equation is ut = ibuxx . Show that the real and imaginary parts of
the solution of the Schrödinger equation each satisfy the Euler–Bernoulli equation.

8.1.9. Show that the solution to the initial value problem
\[
u_{tt} + 2b\,u_{tx} = a^2 u_{xx} - cu \tag{8.1.12}
\]
with $0 \le b < a$, $0 < c$, and initial data (8.1.2) is given by
\[
u(t,x) = \frac{1}{2}\left(1 + \frac{b}{\sqrt{a^2 + b^2}}\right) u_0(x + a_+ t)
+ \frac{1}{2}\left(1 - \frac{b}{\sqrt{a^2 + b^2}}\right) u_0(x - a_- t)
\]
\[
- \frac{\tilde c}{4}\int_{x - a_- t}^{x + a_+ t}
\frac{J_1\!\left(\tilde c\left[a^2 t^2 + 2bt(x - y) - (x - y)^2\right]^{1/2}\right)}
{\left[a^2 t^2 + 2bt(x - y) - (x - y)^2\right]^{1/2}}
\left[(a^2 + 2b^2)t - b(x - y)\right] u_0(y)\, dy
\]
\[
+ \frac{1}{2\sqrt{a^2 + b^2}}\int_{x - a_- t}^{x + a_+ t}
J_0\!\left(\tilde c\left[a^2 t^2 + 2bt(x - y) - (x - y)^2\right]^{1/2}\right) u_1(y)\, dy,
\]
where $\tilde c$ is $c/(a^2 + b^2)$ and $a_+$ and $-a_-$ are the roots of $\eta^2 + 2b\eta - a^2 = 0$ with $-a_- \le a_+$. The functions $J_0(\xi)$ and $J_1(\xi)$ are the Bessel functions of order 0 and 1, respectively. They satisfy the system of ordinary differential equations
\[
J_0'(\xi) = -J_1(\xi), \qquad J_1'(\xi) = J_0(\xi) - \xi^{-1} J_1(\xi)
\]
with $J_0(0) = 1$ and $J_1(0) = 0$. Hint: Let $K(u_1)$ be the last integral in the above expression. Show that $K(u_1)$ is a solution. Then show that the general solution is
\[
\frac{\partial K(u_0)}{\partial t} + 2b\,\frac{\partial K(u_0)}{\partial x} + K(u_1).
\]
8.1.10. Show that the solution to the initial value problem
\[
u_{tt} + 2b\,u_{tx} = a^2 u_{xx} + cu \tag{8.1.13}
\]
with $0 \le b < a$, $0 < c$, and initial data (8.1.2) is given by
\[
u(t,x) = \frac{1}{2}\left(1 + \frac{b}{\sqrt{a^2 + b^2}}\right) u_0(x + a_+ t)
+ \frac{1}{2}\left(1 - \frac{b}{\sqrt{a^2 + b^2}}\right) u_0(x - a_- t)
\]
\[
- \frac{\tilde c}{4}\int_{x - a_- t}^{x + a_+ t}
\frac{I_1\!\left(\tilde c\left[a^2 t^2 + 2bt(x - y) - (x - y)^2\right]^{1/2}\right)}
{\left[a^2 t^2 + 2bt(x - y) - (x - y)^2\right]^{1/2}}
\left[(a^2 + 2b^2)t - b(x - y)\right] u_0(y)\, dy
\]
\[
+ \frac{1}{2\sqrt{a^2 + b^2}}\int_{x - a_- t}^{x + a_+ t}
I_0\!\left(\tilde c\left[a^2 t^2 + 2bt(x - y) - (x - y)^2\right]^{1/2}\right) u_1(y)\, dy,
\]
where $\tilde c$ is $c/(a^2 + b^2)$ and $a_+$ and $-a_-$ are the roots of $\eta^2 + 2b\eta - a^2 = 0$ with $-a_- \le a_+$. The functions $I_0(\xi)$ and $I_1(\xi)$ are the modified Bessel functions of the first kind of order 0 and 1, respectively. They satisfy the system of ordinary differential equations
\[
I_0'(\xi) = I_1(\xi) \qquad\text{and}\qquad I_1'(\xi) = I_0(\xi) - \xi^{-1} I_1(\xi)
\]
with $I_0(0) = 1$ and $I_1(0) = 0$.


8.1.11. Show that the general second-order hyperbolic equation (8.1.7) can be reduced to
either of the equations (8.1.12) or (8.1.13) by setting

u(t, x) = eαt eβx v(t, x)

when the parameters α and β are chosen suitably.

8.2 Finite Difference Schemes for Second-Order Equations
The definitions of convergence, consistency, and order of accuracy for finite difference
schemes as given in Chapters 1 and 3 hold without modification for second-order equations.
The stability definition, however, must be altered slightly. In place of Definition 1.5.1 we
require the following definition.

Definition 8.2.1. A finite difference scheme $P_{k,h} v^n_m = 0$ for an equation that is second-order in t is stable in a stability region Λ if there is an integer J and for any positive time T there is a constant $C_T$ such that
\[
h\sum_{m=-\infty}^{\infty} \left|v^n_m\right|^2 \le \left(1 + n^2\right) C_T\, h \sum_{j=0}^{J}\sum_{m=-\infty}^{\infty} \left|v^j_m\right|^2 \tag{8.2.1}
\]
for all solutions $v^n_m$ and for $0 \le nk \le T$ with $(k, h) \in \Lambda$.
 
The extra factor of 1 + n2 in (8.2.1) is the only change required by the second-
order equation and reflects the linear growth in t allowed by these equations. In the von
Neumann analysis of schemes for second-order equations, Definition 8.2.1 requires that the
amplification factors gν (there will always be at least two) satisfy

|gν | ≤ 1 + Kk

and permits two such amplification factors to coalesce near the unit circle. If there are no
lower order terms, then the stability condition is |gν | ≤ 1 with double roots on the unit
circle permitted. The integer J in Definition 8.2.1 must always be at most 1, since data
must always be given at two time levels for second-order equations.

We now state the stability condition for second-order differential equations. No-
tice how it differs from the conditions of Theorem 2.2.1 for one-step schemes and from
Theorem 4.2.1 for multistep schemes for first-order differential equations.

Theorem 8.2.1. If the amplification polynomial G(g, θ) for a second-order time-


dependent equation is explicitly independent of h and k, then the necessary and suffi-
cient condition for the finite difference scheme to be stable is that all roots, gν (θ ), satisfy
the following conditions:
(a) |gν (θ )| ≤ 1, and
(b) if |gν (θ )| = 1, then gν (θ ) must be at most a double root.

The necessity of including the factor of 1 + n2 in the estimate (8.2.1) can be seen by
considering the function u(t, x) = t, which is a solution of the equations (8.1.1), (8.1.9),
(8.1.11) and all other second-order equations without lower order terms. Most schemes for
these equations will compute this solution exactly, i.e., $v^n_m = nk$. This is represented by the amplification factor $g_0(\xi)$ at ξ equal to 0, which is a double root. That is,
\[
v^n_m = 0 \cdot g_0(0)^n + nk\, g_0(0)^{n-1}.
\]

From this we observe that without the factor of 1 + n2 in the estimate (8.2.1), all consistent
schemes for a second-order equation would be “unstable.” (The fact that the function
u(t, x), which is everywhere equal to t, is not in L2(R) as a function of x is not
important to the argument. One can approximate u(t, x) by functions that are in L2(R)
as functions of x, and the argument will proceed to the same conclusion.) This point
about the factor of 1 + n2 is not made by Richtmyer and Morton [52]. They reduced all
second-order equations to first-order equations and used the definition corresponding to
Definition 1.5.1.
The Lax–Richtmyer equivalence theorem for second-order equations can be proved
using the methods of Section 10.7.

Example 8.2.1. The first scheme we consider is the standard second-order accurate scheme
for (8.1.1),
n+1 − 2v n + v n−1
vm m m v n − 2vm
2 m+1
n + vn
m−1
= a . (8.2.2)
k2 h2
We now show that this scheme is stable for aλ ≤ 1 (we take a to be nonnegative). As
in the von Neumann analysis in Chapter 2, we have that the equation for the amplification
factors is
\[
g - 2 + g^{-1} = -4a^2\lambda^2 \sin^2\tfrac{1}{2}\theta,
\]
or
\[
\left(g^{1/2} - g^{-1/2}\right)^2 = \left(\pm\, 2ia\lambda \sin\tfrac{1}{2}\theta\right)^2.
\]
Hence
\[
g^{1/2} - g^{-1/2} = \pm\, 2ia\lambda \sin\tfrac{1}{2}\theta
\]
and so
\[
g \pm 2ia\lambda \sin\tfrac{1}{2}\theta\, g^{1/2} - 1 = 0,
\]

which is a quadratic equation for $g^{1/2}$. We then have the roots
\[
g_\pm^{1/2} = \pm\left(ia\lambda\sin\tfrac{1}{2}\theta \pm \sqrt{1 - a^2\lambda^2\sin^2\tfrac{1}{2}\theta}\,\right)
\]
or
\[
g_\pm = \left(\sqrt{1 - a^2\lambda^2\sin^2\tfrac{1}{2}\theta} \pm ia\lambda\sin\tfrac{1}{2}\theta\right)^2.
\]

It is easily seen that |g| is bounded by 1 if and only if aλ is at most 1. When θ is equal
to 0, then g+ and g− are equal; this also occurs if aλ is 1 and θ is π. Recall that the
solution of the difference scheme is given by
\[
\hat v^n = A_+(\xi)\, g_+^n + A_-(\xi)\, g_-^n
\]
when $g_+ \ne g_-$ and
\[
\hat v^n = A(\xi)\, g^n + B(\xi)\, n\, g^{n-1}
\]
when g+ = g− = g. Because linear growth in n is permitted by Definition 8.2.1, the
scheme is stable even when the roots are equal. Thus the scheme is stable if and only if
aλ ≤ 1.
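The root condition is also easy to verify numerically. Multiplying the amplification equation by g gives $g^2 - \left(2 - 4a^2\lambda^2\sin^2\tfrac{1}{2}\theta\right)g + 1 = 0$, and a scan of the roots over θ (a sketch of our own; the number of sample points is arbitrary) shows the threshold at aλ = 1:

import numpy as np

def max_amplification(a_lambda, n=401):
    worst = 0.0
    for theta in np.linspace(-np.pi, np.pi, n):
        b = 2 - 4*(a_lambda*np.sin(theta/2))**2
        worst = max(worst, np.abs(np.roots([1.0, -b, 1.0])).max())
    return worst

print(max_amplification(0.9))   # 1.0: both roots stay on the unit circle
print(max_amplification(1.1))   # about 2.4: one root leaves the unit disk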

At this point it is worthwhile to compare the analysis and conclusions given here
with those of Section 4.1 for the leapfrog scheme for first-order hyperbolic equations. The
analysis for the two cases is similar, but the coalescence of the two amplification factors
would usually take aλ to be strictly less than 1 to avoid the linear growth of the wave with
θ = π. However, the presence of the high-frequency oscillation, that with θ = π, does
not affect the convergence of the scheme (8.2.2) as it would for the leapfrog scheme for the
one-way wave equation because the extra initial data for (8.2.2) restricts the amplitude of
high frequencies that are growing linearly.

Example 8.2.2. For the Euler–Bernoulli equation (8.1.9), the simplest scheme is the second-
order accurate scheme
\[
\frac{v^{n+1}_m - 2v^n_m + v^{n-1}_m}{k^2}
= -b^2\,\frac{v^n_{m+2} - 4v^n_{m+1} + 6v^n_m - 4v^n_{m-1} + v^n_{m-2}}{h^4}
= -b^2\,\delta^4 v^n_m. \tag{8.2.3}
\]
The equation for the amplification factors is
\[
g - 2 + g^{-1} = -16 b^2\mu^2 \sin^4\tfrac{1}{2}\theta,
\]

where $\mu = k/h^2$. The stability analysis is almost exactly like that of scheme (8.2.2) for the wave equation, and it is easy to see that scheme (8.2.3) is stable if and only if
\[
2b\mu \sin^2\tfrac{1}{2}\theta \le 1,
\]
which requires that
\[
b\mu \le \tfrac{1}{2}.
\]

Higher Order Accurate Schemes for Second-Order Equations


We now present two (2, 4) accurate schemes for the wave equation (8.1.1). The first is


\[
\frac{v^{n+1}_m - 2v^n_m + v^{n-1}_m}{k^2}
= a^2\,\frac{-v^n_{m+2} + 16v^n_{m+1} - 30v^n_m + 16v^n_{m-1} - v^n_{m-2}}{12h^2}
= a^2\left(1 - \frac{h^2}{12}\delta^2\right)\delta^2 v^n_m. \tag{8.2.4}
\]

The equation for the amplification factors is
\[
g - 2 + g^{-1} = a^2\lambda^2\,\frac{-2\cos 2\theta + 32\cos\theta - 30}{12}
= -\frac{4}{3}\,a^2\lambda^2\sin^2\tfrac{1}{2}\theta\left(3 + \sin^2\tfrac{1}{2}\theta\right)
\]
or
\[
g^{1/2} - g^{-1/2} = \pm\, 2i\left[\frac{a^2\lambda^2\sin^2\tfrac{1}{2}\theta\left(3 + \sin^2\tfrac{1}{2}\theta\right)}{3}\right]^{1/2}.
\]

As in the previous analyses, the scheme is stable, i.e., $|g_\pm| \le 1$, if and only if
\[
\frac{a^2\lambda^2\sin^2\tfrac{1}{2}\theta\left(3 + \sin^2\tfrac{1}{2}\theta\right)}{3} \le 1.
\]
Obviously the maximum of the left-hand side of this inequality occurs when $\sin^2\tfrac{1}{2}\theta$ is 1, and so we obtain the stability condition
\[
a\lambda \le \frac{\sqrt{3}}{2} \approx 0.8660
\]
for the (2, 4) scheme (8.2.4).
An implicit (2, 4) scheme for the wave equation (8.1.1) is given by
\[
\delta_t^2 v^n_m = a^2\left(1 + \frac{h^2}{12}\delta_x^2\right)^{-1}\delta_x^2 v^n_m
\]
or
\[
v^{n+1}_{m+1} + 10v^{n+1}_m + v^{n+1}_{m-1}
- 2\left(v^n_{m+1} + 10v^n_m + v^n_{m-1}\right)
+ v^{n-1}_{m+1} + 10v^{n-1}_m + v^{n-1}_{m-1}
= 12a^2\lambda^2\left(v^n_{m+1} - 2v^n_m + v^n_{m-1}\right).
\]

Although this scheme requires the solution of a tridiagonal system of equations at each step, it has the advantage over the scheme (8.2.4) of having a narrower stencil; i.e., it does not use $v^n_{m\pm 2}$ to compute $v^{n+1}_m$. The narrower stencil makes it easier to implement boundary conditions.
This scheme, though implicit, is not unconditionally stable. We have
\[
\left(g^{1/2} - g^{-1/2}\right)^2 = -\frac{4a^2\lambda^2\sin^2\tfrac{1}{2}\theta}{1 - \tfrac{1}{3}\sin^2\tfrac{1}{2}\theta},
\]
and thus for stability we must enforce
\[
\frac{a^2\lambda^2\sin^2\tfrac{1}{2}\theta}{1 - \tfrac{1}{3}\sin^2\tfrac{1}{2}\theta} \le 1.
\]
The maximum of the left-hand side occurs at $\sin\tfrac{1}{2}\theta = 1$, and thus
\[
a\lambda \le \sqrt{\tfrac{2}{3}} \approx 0.8165
\]

is the stability condition. As for the previous scheme, this is not a serious restriction, since
k should be small compared with h to achieve good accuracy and efficiency with a (2, 4)
accurate scheme. (See the discussion at the end of Section 4.1 on higher order accurate
schemes.)

Computing the First Time Step


All the schemes for equations that are second order in time require some means of computing
the solution on the first time step after the initial time level. Perhaps the simplest procedure
is to use the Taylor series expansion

\[
u(k, x) = u(0, x) + k\,u_t(0, x) + \tfrac{1}{2}k^2 u_{tt}(0, x) + O(k^3).
\]

The values of u(0, x) and ut (0, x) are given data and, by using the differential equation,
utt (0, x) can be expressed as a derivative of u with respect to x, e.g., as a 2 uxx (0, x) for
(8.1.1) or −b2 uxxxx (0, x) for (8.1.9). Using a finite difference approximation, we easily
obtain an expression for $v^1_m$ that is of the same order of accuracy as the rest of the scheme.

For example, for (8.1.1) we have


\[
v^1_m = v^0_m + k\,(u_t)_m + \tfrac{1}{2}a^2 k^2\,\delta^2 v^0_m. \tag{8.2.5}
\]

As with initializing multistep methods for first-order equations (see Section 4.1), the
initialization method has no effect on the stability of the overall method. If we regard
formula (8.2.5) as an approximation to ut (0, x), i.e., in the form
\[
u_t(0, x_m) = \frac{v^1_m - v^0_m}{k} - \frac{1}{2}a^2 k\,\delta^2 v^0_m + O(k^2),
\]
then the approximation must be of at least the same order of accuracy as the scheme in order
not to degrade the accuracy of the overall method. These results are proved in Section 10.7.
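The pieces above assemble into a complete solver. Here is a minimal Python sketch for the wave equation (8.1.1) using the scheme (8.2.2) with the first step (8.2.5); the test solution $u(t,x) = \cos\pi x\,\cos\pi t$ and all parameter choices are our own.

import numpy as np

a, lam, M = 1.0, 0.9, 100          # a*lam <= 1, within the stability limit
h = 1.0/M
k = lam*h
x = np.linspace(0.0, 1.0, M + 1)
exact = lambda t: np.cos(np.pi*x)*np.cos(np.pi*t)   # solves u_tt = u_xx

v_old = exact(0.0)                  # v^0 from u_0; here u_1 = u_t(0, x) = 0
v = v_old.copy()                    # first step by (8.2.5), with u_1 = 0
v[1:-1] = v_old[1:-1] + 0.5*(a*lam)**2*(v_old[2:] - 2*v_old[1:-1] + v_old[:-2])
v[0], v[-1] = exact(k)[0], exact(k)[-1]

t = k
while t < 1.0 - 1e-12:              # march with the scheme (8.2.2)
    t += k
    v_new = np.empty_like(v)
    v_new[1:-1] = (2*v[1:-1] - v_old[1:-1]
                   + (a*lam)**2*(v[2:] - 2*v[1:-1] + v[:-2]))
    v_new[0], v_new[-1] = exact(t)[0], exact(t)[-1]
    v_old, v = v, v_new

print(np.abs(v - exact(t)).max())   # decreases as O(h^2) under refinement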

Von Neumann Polynomials and Stability


We can modify the algorithms for von Neumann and Schur polynomials, as discussed in
Section 4.3, to test for the stability of second-order schemes. We first extend the definition
of von Neumann polynomials.

Definition 8.2.2. The polynomial ϕ is a von Neumann polynomial of order q if all of


its roots, rν , satisfy the following conditions:
(a) |rν | ≤ 1, and
(b) the roots with |rν | = 1 have multiplicity at most q.
A von Neumann polynomial of order 0 is defined to be a Schur polynomial.

Comparing this definition with Definition 4.3.3, we see that a simple von Neumann
polynomial is a von Neumann polynomial of order 1. We then have the following general-
ization of Theorems 4.3.1 and 4.3.2.

Theorem 8.2.2. A polynomial ϕd of exact degree d is a von Neumann polynomial of


order q if and only if either
(a) |ϕd (0)| < |ϕd∗ (0)| and ϕd−1 is a von Neumann polynomial of order q or
(b) ϕd−1 is identically zero and ϕ′d is a von Neumann polynomial of order q − 1.

The proof of this theorem is similar to the proofs of Theorems 4.3.1 and 4.3.2 and is
left as an exercise. We note that if ϕd is a von Neumann polynomial of order 0 and degree
1 or more, then it is impossible for ϕd−1 to be identically zero.
Theorem 8.2.2 can be used to analyze the stability of schemes for second-order
equations. If G(g, θ ) is the amplification polynomial of a finite difference scheme for
a second-order equation for which the restricted condition |gν | ≤ 1 can be employed, then
the scheme is stable if and only if G(g, θ) is a von Neumann polynomial of order 2.
The algorithm given in Section 4.4 can be applied to von Neumann polynomials of
any degree or order.
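Theorem 8.2.2 translates directly into a recursive test. The Python sketch below is our own (the coefficient convention, with ϕ given in ascending order c_0, …, c_d and ϕ* having the reversed conjugated coefficients as in Section 4.3, and the tolerance are choices we have made):

import numpy as np

TOL = 1e-12

def trim(c):
    c = np.asarray(c, dtype=complex)
    while len(c) > 1 and abs(c[-1]) < TOL:   # drop zero leading coefficients
        c = c[:-1]
    return c

def is_von_neumann(c, q):
    c = trim(c)
    if len(c) == 1:                          # a nonzero constant has no roots
        return abs(c[0]) > TOL
    c_star = np.conj(c[::-1])                # coefficients of phi*
    phi_dm1 = trim((c_star[0]*c - c[0]*c_star)[1:])   # phi_{d-1}
    if np.all(np.abs(phi_dm1) < TOL):        # case (b) of Theorem 8.2.2
        if q == 0:
            return False                     # impossible for a Schur polynomial
        deriv = c[1:]*np.arange(1, len(c))   # coefficients of phi_d'
        return is_von_neumann(deriv, q - 1)
    if abs(c[0]) < abs(c_star[0]):           # case (a)
        return is_von_neumann(phi_dm1, q)
    return False

# (z - 1)^2 has a double root on the unit circle:
print(is_von_neumann([1, -2, 1], 2))   # True: von Neumann of order 2
print(is_von_neumann([1, -2, 1], 1))   # False: not of order 1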

Exercises
8.2.1. Show that the implicit scheme for (8.1.1) given by
 
\[
\delta_t^2 v^n_m = \tfrac{1}{4}a^2\,\delta_x^2\left(v^{n+1}_m + 2v^n_m + v^{n-1}_m\right)
\]

is a second-order accurate scheme and is unconditionally stable.


8.2.2. Show that the scheme
\[
\frac{v^{n+2}_m - 2v^n_m + v^{n-2}_m}{4k^2}
- a^2\left(1 + \frac{h^2}{12}\delta_x^2\right)^{-1}\delta_x^2 v^n_m
- \frac{a^2}{48}\left(1 + \frac{h^2}{12}\delta_x^2\right)^{-1}\delta_x^2
\left(v^{n+2}_m + 12v^{n+1}_m - 26v^n_m + 12v^{n-1}_m + v^{n-2}_m\right) = 0
\]

is a (4, 4) scheme for the wave equation (8.1.1) and use Theorem 8.2.2 to show that
it is stable if and only if aλ < 1.

8.2.3. Prove Theorem 8.2.2.


8.2.4. Show that the scheme for the wave equation (8.1.1),
\[
\delta_t^2 v^n_m = a^2\,\delta_x^2 v^n_m + \varepsilon k^{-2}\left(\frac{h\delta_x}{2}\right)^4 v^{n-1}_m,
\]

is dissipative for small positive values of the parameter ε. Show that the scheme is
second-order accurate when λ is constant.
8.2.5. Show that the implicit scheme

\[
\delta_t^2 v^n_m + 2c\,\delta_{t0}\delta_{x0} v^n_m = a^2\,\delta_x^2 v^n_m \tag{8.2.6}
\]
for the equation
\[
u_{tt} + 2c\,u_{tx} = a^2 u_{xx} \tag{8.2.7}
\]
is second-order accurate and stable for aλ ≤ 1.
8.2.6. Use scheme (8.2.6) to obtain approximate solutions to equation (8.2.7) on the interval
−1 ≤ x ≤ 1 for 0 ≤ t ≤ 1. As initial data, take

\[
u_0(x) = \cos \pi x \quad\text{and}\quad u_1(x) = c\pi\,\sin \pi x.
\]
For boundary data use the exact solution
\[
u(t, x) = \tfrac{1}{2}\left[\cos \pi(x - \eta_+ t) + \cos \pi(x - \eta_- t)\right],
\]
where $\eta_\pm = c \pm \sqrt{c^2 + a^2}$. Take c equal to 0.5 and a equal to 1. Use grid spacings
of 1/10, 1/20, and 1/40 and λ equal to 1. Demonstrate the second-order accuracy
of the scheme.

8.3 Boundary Conditions for Second-Order Equations


The second-order wave equation (8.1.1) on an interval, say, 0 ≤ x ≤ 1, requires one bound-
ary condition at each end. This is easily seen by relating (8.1.1) to a first-order system (see
Exercise 8.1.1). The two most common boundary conditions are to specify the value of the
solution at the boundary, the Dirichlet boundary condition, and to specify the first derivative
with respect to x at the boundary, the Neumann boundary condition.
For all the schemes for the wave equation (8.1.1) other than (8.2.4), the boundary
conditions where the value of u is prescribed on the boundary present no problem. If
the derivative of u with respect to x is specified, then several options are available. For
example, suppose the boundary condition at x equal to 0 is

ux (t, 0) = 0.
200 Chapter 8. Second-Order Equations
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

For the finite difference scheme at m equal to 0, we can use either

\[
v^{n+1}_0 = \frac{4v^{n+1}_1 - v^{n+1}_2}{3} \tag{8.3.1}
\]
or
\[
v^{n+1}_0 = 2v^n_0 - v^{n-1}_0 - 2a^2\lambda^2\left(v^n_0 - v^n_1\right). \tag{8.3.2}
\]
Formula (8.3.1) is from the second-order accurate one-sided approximation
\[
u_x(0) = \frac{4u(h) - 3u(0) - u(2h)}{2h} + O\!\left(h^2\right)
\]
(see Exercise 3.3.8). Formula (8.3.2) arises from employing scheme (8.2.2) at m equal to 0 and then eliminating the value of $v^n_{-1}$ by using
\[
\frac{v^n_1 - v^n_{-1}}{2h} = 0,
\]
which is the central difference approximation to the first derivative. Other boundary conditions are also possible.
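In a program the two options differ only in the boundary line executed each time step. A short Python sketch (array names are ours: v_new, v, and v_old hold time levels n + 1, n, and n − 1, and al2 stands for (aλ)²):

# Option (8.3.1): second-order one-sided extrapolation at the new level
v_new[0] = (4.0*v_new[1] - v_new[2])/3.0

# Option (8.3.2): scheme (8.2.2) at m = 0 with the ghost value
# at m = -1 replaced by the value at m = 1
v_new[0] = 2.0*v[0] - v_old[0] - 2.0*al2*(v[0] - v[1])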
The use of first-order accurate boundary conditions, such as

\[
\frac{v^n_1 - v^n_0}{h} = 0, \tag{8.3.3}
\]
degrades the overall accuracy of the second-order accurate scheme.
Scheme (8.2.4), which has a wider stencil, requires a numerical boundary condition
at the grid point next to the boundary since the scheme cannot be applied there. Various
conditions can be used. Using the second-order accurate scheme (8.2.2) at these points can degrade the accuracy. If the value on the boundary is specified, then the value next to the boundary can be determined by interpolation, e.g.,
\[
v^{n+1}_1 = \tfrac{1}{4}\left(v^{n+1}_0 + 6v^{n+1}_2 - 4v^{n+1}_3 + v^{n+1}_4\right),
\]
which is obtained from
\[
h^4\,\delta_+^4 v^{n+1}_0 = 0.
\]
The scheme (8.2.4) with derivative boundary conditions is rather unwieldy.
For the Euler–Bernoulli scheme (8.2.3) and similar schemes for the Rayleigh equation,
the boundary conditions can be obtained by standard methods, but now there are two
boundary conditions required by the differential equation. For example, if the beam is held
fixed and clamped at x equal to 0, the boundary conditions would be

u(t, 0) = ux (t, 0) = 0.

If the beam is fixed in place but allowed to pivot, the boundary conditions are

u(t, 0) = uxx (t, 0) = 0, (8.3.4)



and if the end of the beam is free to move, the conditions are

uxx (t, 0) = uxxx (t, 0) = 0. (8.3.5)

The implementation of these boundary conditions with a finite difference scheme can
be done in several ways. The use of second-order accurate formulas applied at x equal
to 0 can be used to give boundary conditions for the scheme (8.2.3) and schemes for the
Rayleigh equation. For example, for boundary conditions (8.3.4), v0n+1 is prescribed, and
at m = 1, we can use the scheme (8.2.3) with the formula
n
v−1 − 2v0n + v1n = 0 (8.3.6)
n . Similarly for boundary conditions (8.3.5), we can use the scheme (8.2.3)
to eliminate v−1
with the condition (8.3.6) and

\[
-v^n_{-2} + 2v^n_{-1} - 2v^n_1 + v^n_2 = 0, \tag{8.3.7}
\]
which is the second-order accurate formula for $2h^3\delta_0\delta^2 v^n_0 = 0$, to eliminate $v^n_{-2}$ and $v^n_{-1}$ from (8.2.3) applied at m equal to 0 and m equal to 1. In the actual computer implementation we can either have variables $v^n_{-2}$ and $v^n_{-1}$ and use (8.3.6) and (8.3.7) to define their values, or we can eliminate these variables from the difference formula (8.2.3), obtaining formulas to calculate $v^{n+1}_0$ and $v^{n+1}_1$. The two approaches are equivalent.

Exercises
8.3.1. Use the scheme (8.2.2) to obtain approximate solutions to the wave equation utt =
uxx on the interval 0 ≤ x ≤ 1 for 0 ≤ t ≤ 1. For initial data and Dirichlet boundary
data at x equal to 1, use the exact solution

u(t, x) = cos(x + t) + cos(x − t),

and at x equal to 0, use the Neumann condition $u_x = 0$. Implement the boundary conditions (8.3.1) and (8.3.2) as well as the first-order accurate boundary approximation $\delta_+ u_0 = 0$.
Use grid spacings of 1/10, 1/20, and 1/40 and λ equal to 1. Demon-
strate the second-order accuracy of the solution with boundary conditions (8.3.1)
and (8.3.2) and the first-order accuracy when boundary condition (8.3.3) is used.
8.3.2. Use the scheme (8.2.2) to obtain approximate solutions to the wave equation utt =
uxx on the interval −2 ≤ x ≤ 2 for 0 ≤ t ≤ 3.8. For initial data use
\[
u_0(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1, \end{cases}
\qquad u_1(x) = 0,
\]
and at x equal to −2, use the Dirichlet condition u = 0. At the boundary at x equal to 2, use the Neumann condition $u_x = 0$.
Use grid spacings of 1/10, 1/20, 1/40, and 1/80 and λ equal to 0.95. Comment on the accuracy of the solution. The exact solution, when extended to the whole real line, is symmetric around the point x = 2 for all values of t.

8.4 Second-Order Equations in Two and Three Dimensions
The extension of most of the results of the previous sections to higher dimensions is straight-
forward. As noted in Section 7.2, the stability conditions usually become more severe. For
example, the wave equation in two spatial dimensions is
 
\[
u_{tt} = a^2\left(u_{xx} + u_{yy}\right) \tag{8.4.1}
\]

and the simplest scheme for this equation is


 
\[
\delta_t^2 v^n_{\ell,m} = a^2\left(\delta_x^2 v^n_{\ell,m} + \delta_y^2 v^n_{\ell,m}\right). \tag{8.4.2}
\]

The stability condition for this scheme when Δx = Δy = h is
\[
a\lambda \le \frac{1}{\sqrt{2}}. \tag{8.4.3}
\]
As for the leapfrog scheme (7.2.5), this can be improved to

aλ ≤ 1 (8.4.4)

for the scheme
\[
\delta_t^2 v^n_{\ell,m} = \tfrac{1}{4}a^2\left[\delta_x^2\left(v^n_{\ell,m+1} + 2v^n_{\ell,m} + v^n_{\ell,m-1}\right)
+ \delta_y^2\left(v^n_{\ell+1,m} + 2v^n_{\ell,m} + v^n_{\ell-1,m}\right)\right]. \tag{8.4.5}
\]
It is also possible to develop ADI schemes for (8.4.1). One possible scheme is
\[
\left(1 - \tfrac{1}{4}k^2 a^2\delta_x^2\right)\tilde v^{\,n+1/2}_{\ell,m} = \left(1 + \tfrac{1}{4}k^2 a^2\delta_y^2\right) v^n_{\ell,m},
\]
\[
\left(1 - \tfrac{1}{4}k^2 a^2\delta_y^2\right)\tilde v^{\,n+1}_{\ell,m} = \left(1 + \tfrac{1}{4}k^2 a^2\delta_x^2\right)\tilde v^{\,n+1/2}_{\ell,m}, \tag{8.4.6}
\]
\[
v^{n+1}_{\ell,m} = 2\tilde v^{\,n+1}_{\ell,m} - v^{n-1}_{\ell,m},
\]

which is second-order accurate and unconditionally stable (see Exercise 8.4.2). This scheme
is implemented in a fashion similar to the ADI schemes of Section 7.3. Other ADI schemes
for the two-dimensional wave equation are discussed by Fairweather and Mitchell [17].

Dispersion for Schemes in Higher Dimensions


It is interesting to analyze the dispersion of the scheme (8.4.2) from the formula for the
amplification factors. The amplification factors are

\[
g_\pm = \left[\left(1 - a^2\lambda^2\left(\sin^2\tfrac{1}{2}\theta + \sin^2\tfrac{1}{2}\phi\right)\right)^{1/2}
\pm\, ia\lambda\left(\sin^2\tfrac{1}{2}\theta + \sin^2\tfrac{1}{2}\phi\right)^{1/2}\right]^2.
\]
(sin2 21 θ + sin2 21 φ) ± iaλ(sin2 21 θ + sin2 21 φ)1/2 .

Comparing this with


\[
e^{ia\left(\xi_1^2 + \xi_2^2\right)^{1/2} k},
\]
we have that the phase velocity satisfies
\[
\sin\!\left(\tfrac{1}{2}\,\alpha(\xi_1, \xi_2)\, k\left(\xi_1^2 + \xi_2^2\right)^{1/2}\right)
= a\lambda\left(\sin^2\tfrac{1}{2}h\xi_1 + \sin^2\tfrac{1}{2}h\xi_2\right)^{1/2}.
\]
2 α(ξ1 , ξ2 )k ξ12 + ξ22 = aλ sin2 21 hξ1 + sin2 21 hξ2 .

It is important to note that the phase error is not independent of direction. We have
\[
\alpha(\xi_1, \xi_2) = a\left[1 - \frac{h^2|\xi|^2}{24}\left(\cos^4\beta + \sin^4\beta - a^2\lambda^2\right)\right] + O\!\left((h|\xi|)^4\right), \tag{8.4.7}
\]
where $|\xi| = \left(\xi_1^2 + \xi_2^2\right)^{1/2}$ and $\tan\beta = \xi_1/\xi_2$. This formula shows that the phase error
depends on the direction of propagation of the wave, where (cos β, sin β) is the unit vector
in the direction of propagation. For most computations it is difficult to notice the distortion
caused by the dependence of the dispersion on the direction of propagation unless the grid
is quite coarse.
Figure 8.3 shows the solution of the scheme (8.4.5), with initial and boundary data
taken from
u(t, x, y) = cos(3t)J0 (3r)

Figure 8.3. A solution of the wave equation.



with $r = \sqrt{(x - 1/2)^2 + (y - 1/2)^2}$. The solution used h = 0.1 for both spatial directions, λ = 0.9, and the first time step was computed using a formula analogous to (8.2.5). The solution is shown for time 12.0. The computed solution is seen to be radially symmetric about the center of the wave, and there is very little distortion due to dispersion.

Exercises
8.4.1. Verify stability condition (8.4.3) for scheme (8.4.2).
8.4.2. Verify stability condition (8.4.4) for scheme (8.4.5).
8.4.3. Verify that the scheme given by (8.4.6) is a second-order accurate and
unconditionally stable approximation to the wave equation (8.4.1).
8.4.4. Show that the hyperbolic equation

utt = a11 uxx + 2a12 uxy + a22 uyy

with $a_{12}^2 < a_{11}a_{22}$ and $a_{11}, a_{22} > 0$, can be approximated by the ADI scheme
\[
\left(1 - \frac{k^2}{4}a_{11}\delta_x^2\right)\tilde v^{\,n+1/2}_{\ell,m} = \left(1 + \frac{k^2}{4}a_{22}\delta_y^2\right) v^n_{\ell,m} + \frac{k^2}{2}a_{12}\,\delta_{0x}\delta_{0y} v^n_{\ell,m},
\]
\[
\left(1 - \frac{k^2}{4}a_{22}\delta_y^2\right)\tilde v^{\,n+1}_{\ell,m} = \left(1 + \frac{k^2}{4}a_{11}\delta_x^2\right)\tilde v^{\,n+1/2}_{\ell,m} + \frac{k^2}{2}a_{12}\,\delta_{0x}\delta_{0y} v^n_{\ell,m},
\]
\[
v^{n+1}_{\ell,m} = 2\tilde v^{\,n+1}_{\ell,m} - v^{n-1}_{\ell,m}.
\]

Show that this scheme is unconditionally stable and is second-order accurate.


8.4.5. Verify formula (8.4.7) for the phase velocity.

Chapter 9

Analysis of Well-Posed
and Stable Problems

In this chapter we examine initial value problems for partial differential equations and finite
difference schemes from a more general perspective than in the previous chapters. We begin
by examining the concept of a well-posed initial value problem, first for a single partial
differential equation and then for a system of equations. These results are used in Chapter
10 as part of the proofs of the convergence theorems. The analysis used to study initial
value problems for partial differential equations is analogous to the von Neumann analysis
presented in Section 2.2. The concept of a well-posed initial value problem is important
in scientific modeling and in understanding finite difference schemes used in scientific
calculations. The concept of well-posedness was the topic of the important lectures of
Hadamard in 1921 [27]. As we will see, the analysis of this section gives another example
of the power and usefulness of Fourier analysis.
A central result for the general study of stability of finite difference schemes is the
Kreiss matrix theorem. This result is of importance in proving stability results for equations
with variable coefficients (see Wade [67] and Kreiss [32]) and for systems whose stability
cannot be verified by the methods of Section 7.1. The last section of this chapter contains
a proof and discussion of the Kreiss matrix theorem.

9.1 The Theory of Well-Posed Initial Value Problems


We begin by considering conditions under which initial value problems for partial differen-
tial equations are well-posed. This study can be motivated by the question of why certain
equations, such as the wave equation (8.1.1) and the heat equation (6.1.1), arise frequently
in applied mathematics and others, such as
utt = ux , (9.1.1)
do not arise in governing the time evolution of physical systems.
For a partial differential equation to model the time evolution of a well-behaved
physical process, there are several properties it should have. An important condition is
that the solution should depend on the initial data in a continuous way. In particular, small
errors such as those due to experimental error and interpolation of data should lead to small
changes in the solution. The norms used to define “small” errors must also be reasonable.
For example, a condition that the third derivative of measurement errors be small is an
unreasonable demand because there is no way to either check this condition or enforce it
for practical problems.

205
206 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

For linear problems such as we are concerned with here, the continuity condition is
satisfied if the solutions to the partial differential equation satisfy

u(t, ·) ≤ Ct u(0, ·) (9.1.2)

for some norm such as the L2 norm, L1 norm, or L∞ norm and a constant Ct independent
of the solution. Because of the linearity of the equations, we have by (9.1.2) that two
different initial functions u1 (0, x) and u2 (0, x) give different solutions whose difference
is bounded by their initial difference, i.e.,

u1 (t, ·) − u2 (t, ·) ≤ Ct u1 (0, ·) − u2 (0, ·).

This estimate expresses the notion that small changes in the initial data will result in small
changes in the solution at later times.

Definition 9.1.1. The initial value problem for a first-order equation is well-posed if for
each positive t there is a constant Ct such that the inequality (9.1.2) holds for all initial
data u(0, ·).
Unless otherwise specified, we take the norm in estimate (9.1.2) to be the L2 norm.
By using the L2 norm we can use Fourier analysis to give necessary and sufficient condi-
tions for initial value problems to be well-posed. With the L1 norm or L∞ norm it is often
easy to get necessary conditions or sufficient conditions but harder to obtain conditions that
are both necessary and sufficient. The main reason for this is that there is no relation like
Parseval’s relation for norms other than the L2 norm. We also say that an equation is
well-posed, by which we mean that the initial value problem for the equation is well-posed.
A second important property for a partial differential equation to have as a model of a
physical system is that the qualitative behavior of the solution be unaffected by the addition
of or changes in lower order terms and by sufficiently small changes in the coefficients.
This condition is not always met, but it does serve as a guide to the most “robust” systems
and types of equations.
This last property, which we refer to as robustness, is important because almost all
derivations of equations to model physical processes make some assumptions that certain
effects are not important to understanding the physical process being studied. Statements
such as, “assume that the temperature of the body is constant,” “we may ignore gravitational
forces,” and “consider a homogeneous body” can be made because it is assumed that
small variations in some quantities may be ignored without affecting the conclusions of the
analysis.
This robustness property is also important when we consider numerical methods for
solving the equations that model a physical system. Finite difference schemes and other
numerical methods may be regarded as perturbations, or approximations, of the equations
similar to modification of the equations by adding lower order terms. If the equation is not
robust, then the construction of difference schemes for the equation will be more difficult.
We begin our analysis by considering a general linear partial differential equation
with constant coefficients that is first order in the time differentiation. Examples of such
equations are the one-way wave equation (1.1.1), the three equations (1.4.1), and the heat
9.1 Well-Posed Initial Value Problems 207
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

equation (6.1.1). We assume that the Fourier transform is well defined for u(t, ·) for all t
and for u(0, ·).
Any equation of first order in the time derivative can be put in the form

ût (t, ω) = q(ω)û(t, ω) (9.1.3)

after applying the Fourier transform in space. The initial value problem for this equation
has the solution
û(t, ω) = eq(ω)t û0 (ω). (9.1.4)

Theorem 9.1.1. The necessary and sufficient condition for equation (9.1.3) to be well-
posed, that is, to satisfy the basic estimate (9.1.2), is that there is a constant q̄ such that

Re q(ω) ≤ q̄ (9.1.5)

for all real values of ω.

Proof. If the function q(ω) satisfies (9.1.5) for some constant q̄, then from (9.1.4)

|û(t, ω)| ≤ eq̄t |û0 (ω)|,

and we obtain estimate (9.1.2) by Parseval’s relation. However, if q(ω) does not satisfy
(9.1.5), then by choosing û0 (ω) appropriately we can have

u(t, ·)2 > Cu0 2

for any large constant C and some function u0 . This construction is similar to that used
in the proof of Theorem 2.2.1. This proves the theorem.
As the proof shows, if inequality (9.1.5) is violated, then some small errors of high
frequency, i.e., large |ω|, can cause the solution to be vastly different from the true solution
without the errors. Therefore, the condition (9.1.5), which is the necessary and sufficient
condition for the estimate (9.1.2) to hold for an equation of first order in time, is an analytical
consequence of the requirement of continuity.
For many single equations of first order in the time derivative, the robustness condition
is also satisfied. For example, the equations

ut = aux + cu,

ut = uxx + cux ,

ut − utxx = aux + cu,

ut + utx = buxx + cux

all satisfy the condition (9.1.5) regardless of the value of c, although the value of q̄ may
depend on c (but not on ω).
208 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Example 9.1.1. An example of an equation violating the robustness condition is

ut = uxxx + cuxx ,

for which q(ω) satisfies (9.1.5) for c nonnegative but not if c is negative. Thus, this
equation with c equal to zero does not satisfy the robustness condition, although it does
for c positive.

Higher Order Equations


We next consider equations of higher order in the time derivative. For these equations the
symbol p(s, ω) is a polynomial in s. If the roots of the symbol are q1 (ω), . . . , qr (ω),
then any function of the form
eqv (ω)t eiωx ψ(ω)

is a solution of the partial differential equation. Based on our previous arguments, we see
that a necessary condition for the initial value problem to be well-posed is that each root
qv (ω) satisfies the estimate (9.1.5).
We would have to properly define a well-posed initial value problem for higher order
equations if we were to pursue this discussion. The definition would have to take account of
the additional initial data required by higher order equations. Rather than develop a general
theory we consider several typical cases. First we consider the second-order equation of
the form
utt = R(∂x )u (9.1.6)

whose symbol is
p(s, ω) = s 2 − r(ω). (9.1.7)

For the condition (9.1.5) to be satisfied for both roots of (9.1.7), which are

q± (ω) = ± (r(ω))1/2 , (9.1.8)

we see that r(ω) must be close to or on the negative real axis. Examples are given by the
wave equation (8.1.1) and the Euler–Bernoulli equation (8.1.9).

Example 9.1.2. Lower order terms can affect the well-posedness of the problem, as can be
seen by the equation
utt = −b2 uxxxx + cuxxx .

We have that
r(ω) = −b2 ω4 − icω3
9.1 Well-Posed Initial Value Problems 209
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

and so
 1/2
q± = ± −b2 ω4 − icω3

ic 1/2
= ±ibω2 1 + 2
b ω
!
1 ic
= ±ibω 1 +
2
+ O(ω−2 )
2 b2 ω
 cω 
= ± ibω2 − + O(1) .
2b
Hence if c is nonzero, each root violates (9.1.5) for either positive or negative values of ω.
Thus the Euler–Bernoulli equation (8.1.9) is not robust, although the wave equation (8.1.1)
is robust (see Exercise 9.1.3).

For completeness we also give the definition of a well-posed initial value problem for
equation (9.1.6), which is second order in the differentiation in time. Let 2ρ be the degree
of the polynomial r(ω), which is the symbol of R(∂x ).

Definition 9.1.2. The initial value problem for the second-order equation (9.1.6) is
well-posed if for each t > 0 there is a constant Ct such that for all solutions u
 
u(t, ·)H ρ + ut (t, ·)H 0 ≤ Ct u(0, ·)H ρ + ut (0, ·)H 0 . (9.1.9)

Condition (9.1.5) is necessary and sufficient for the initial value problem to be well-
posed. This result is stated in the following theorem. The theorem applies to a more general
class of equations (see Exercises 9.1.3 and 9.1.5), and it is stated so as to apply to this more
general case.

Theorem 9.1.2. A necessary and sufficient condition for the initial value problem for an
equation of second order in the time differentiation to be well-posed is that there exists a
constant q̄ such that (9.1.5) holds for each root of the symbol.

Proof. We give the proof only for equations in the form (9.1.6). It extends without
difficulty to more general equations; see Exercise 9.1.5. The necessity of condition (9.1.5)
is clear from our earlier arguments.
To show the sufficiency, let q+ (ω) and q− (ω) be the two roots of the symbol (9.1.7).
If these roots are not equal, we have that the solution satisfies

û(t, ω) = A(ω)eq+ (ω)t + B(ω)eq− (ω)t

for some functions A(ω) and B(ω). These functions are determined by the two relations

û(0, ω) = A(ω) + B(ω)


210 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

and
ût (0, ω) = A(ω)q+ (ω) + B(ω)q− (ω).
Therefore,
ût (0, ω) − q− (ω)û(0, ω)
A(ω) =
q+ (ω) − q− (ω)
and
ût (0, ω) − q+ (ω)û(0, ω)
B(ω) = .
q− (ω) − q+ (ω)

Now q+ (ω) is equal to q− (ω) only when r(ω) is zero (see (9.1.8)), and since r(ω) is
a polynomial in the one variable ω, this can occur only for |ω| less than some value c0 .
Consider first the case with |ω| greater than 2c0 . We then have

|û(t, ω)| ≤ (|A(ω)| + |B(ω)|) eq̄t


!
|q+ (ω)| + |q− (ω)| 2|ût (0, ω)|
≤ |û(0, ω)| + eq̄t .
|q+ (ω) − q− (ω)| |q+ (ω) − q− (ω)|

Since |ω| is greater than 2c0 , there is a constant C such that the preceding estimate
implies
   
1 + |ω|ρ |û(t, ω)| ≤ C (1 + |ω|ρ ) |û(0, ω)| + |ût (0, ω)| eq̄t .

We also have that

|ût (t, ω)| ≤ [|A(ω)q+ (ω)| + |B(ω)q− (ω)|] eq̄t


 
≤ C (1 + |ω|ρ )|û(0, ω)| + |ût (0, ω)| eq̄t .

For |ω| less than 2c0 we write the equation for û(t, ω) as
$ % $ %
eq+ (ω)t + eq− (ω)t eq+ (ω)t − eq− (ω)t
û(t, ω) = C(ω) + D(ω) .
2 q+ (ω) − q− (ω)

We have that
C(ω) = û(0, ω)
and 
q+ (ω) + q− (ω)
D(ω) = −û(0, ω) + ût (0, ω).
2
The function
eq+ (ω)t − eq− (ω)t
q+ (ω) − q− (ω)
9.1 Well-Posed Initial Value Problems 211
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

is uniformly bounded by Cteq̄t for some constant C. Thus for |ω| less than 2c0 ,
   
1 + |ω|ρ |û(t, ω)| ≤ C(1 + t) (1 + |ω|ρ ) |û(0, ω)| + |ût (0, ω)| eq̄t
and |ût (t, ω)| is also bounded by the same quantity.
Combining the estimates for |ω| both greater than and less than 2c0 , we obtain
 2
1 + |ω|ρ |û(t, ω)|2 + |ût (t, ω)|2
 
≤ Ct (1 + |ω|ρ )2 |û(0, ω)|2 + |ût (0, ω)|2 ,

which, by Parseval’s relation, gives (9.1.9).

Example 9.1.3. The equation (9.1.1) is ill-posed since the roots of its symbol are
1+i
q± (ω) = ± √ |ω|1/2
2
and (9.1.5) is not satisfied.

It should be pointed out that equations for which the initial value problem is ill-posed
can arise in applications. They will not, however, describe the time evolution of a system.

Example 9.1.4. As an example of how equation (9.1.1) can arise in an application, suppose
for the heat equation (6.1.1), i.e.,
ut = buxx , (9.1.10)
with b positive, that we have both u(t, 0) and ux (t, 0) at the boundary x = 0 for all
positive time, i.e., t > 0, and we wished to know the initial data u(0, x) for all positive x.
This problem requires the solution of an initial value problem with an equation like (9.1.1),
but where x is the time-like variable and t is the spatial variable. Because the problem
is ill-posed, we know before starting to calculate that we cannot hope to get “the” solution.
At best, we can hope for a reasonable estimate of a solution. To make the problem into a
well-posed problem we might solve
buxx = ut + εutt
with ε positive, rather than attempting to solve the true equation with ε zero. For the
boundary condition at t = 0 and positive x, one could take the equation (9.1.10).

Based on the previous discussion, it is easy to see that any equation of the form
 ν

u = R(∂x )u
∂t
for ν greater than 2 is ill-posed unless R (∂x ) is a constant. If r(ω), the symbol of R(∂x ),
grows with ω, then at least one of the νth roots of r(ω) must violate (9.1.5). This shows
that equations of order greater than 2 must be of very special form if they are to have
well-posed initial value problems. Since higher order equations have more possibilities for
lower order terms, the class of reasonable equations is further restricted.
212 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Example 9.1.5. The initial value problem for the third-order equation
 
(∂t − a∂x ) ∂t2 − ∂x2 u = 0

is well-posed. However, the equation


 
(∂t − a∂x ) ∂t2 − ∂x2 u + 2∂x2 u = 0 (9.1.11)

is well-posed only if a is not equal to ±1. The symbol of this equation is


 
(s − iaω) s 2 + ω2 − 2ω2 = 0.

For |a|  = 1 we set s = iaω + ε, where ε is o(ω), and obtain


 
ε ε2 − 2iaωε + (1 − a 2 )ω2 − 2ω2 = 0.

Since ε is small compared with |ω| for large ω, we have ε(1 − a 2 ) ≈ 2 or ε =


2/(1 − a 2 ) + O(ω−1 ).
So one root is
2
s = iaω + + O(ω−1 )
1 − a2
and, similarly, the other roots are
1
s = iω − + O(ω−1 ),
1−a
1
s = −iω − + O(ω−1 ).
1+a

However, if a is 1 we have

s = iω ± −iω + O(1)

for two of the roots, and therefore the initial value problem for (9.1.11) is
ill-posed.

Exercises
9.1.1. Show that equations (1.4.1) can all be put in the form (9.1.3). Determine conditions
on the coefficients of these equations so that they are well-posed.
9.1.2. Show that if the operator R(∂x ) in the second-order equation (9.1.6) is of odd
order—i.e., the polynomial r(ω) has an odd number of roots—then the equation
(9.1.6) is not well-posed.
9.2 Well-Posed Systems 213
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

9.1.3. Show that if


utt = R(∂x )u

is well-posed, then so is
utt + 2a ut = R(∂x )u. (9.1.12)

9.1.4. Show that the following two equations are well-posed.


(a) utt + 2autxxx + buxxxx = 0 for all real a and b, a  = 0.
(b) utt + 2autxx + b2 uxxxx = 0 if a ≤ 0.
9.1.5. Verify that the proof of Theorem 9.1.2 applies to equations of the form of (9.1.12).

9.2 Well-Posed Systems of Equations


We next consider the well-posedness of initial value problems for systems of equations.
We restrict our discussion to systems that are of first order in the time differentiation. We
consider linear systems with constant coefficients and require that after application of the
Fourier transform, the system can be put in the form

ût = Q(ω)û, (9.2.1)

where û is a vector function of dimension d and Q is a d × d matrix function of ω.


We also consider systems in N space dimensions, so that ω is in R N . The concepts of
this section together with those of the previous section can be used to study well-posed
initial value problems of systems of higher order and mixed order in the time derivative.
Our discussion follows that of the important paper by Kreiss on well-posedness [33].
The solution to (9.2.1) is

û(t, ω) = etQ(ω) û0 (ω),

and in place of Theorem 9.1.1 we have Theorem 9.2.1.

Theorem 9.2.1. The necessary and sufficient condition for system (9.2.1) to be well-posed
is that for each nonnegative t, there is a constant Ct such that

eQ(ω)t  ≤ Ct (9.2.2)

for all ω ∈ R N . A necessary condition for (9.2.2) to hold is that (9.1.5) hold for each
eigenvalue of Q(ω).

The proof of this theorem is similar to that in the scalar case and is not given. Matrix
exponentials are not as easy to analyze as scalar exponentials, and there are no simple
conditions such as (9.1.5), which guarantee that (9.2.2) follows. We develop some tools
and apply them in several particular cases.
214 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

To analyze the norm of the exponential of a matrix we use the following lemma.

Lemma 9.2.2. Let U be an upper triangular square matrix of dimension d and let

ū = max Re uii
1≤i≤d

and
u∗ = max |uij |.
j >i

Then there is a constant Cd , independent of U, such that


 
etU  ≤ Cd et ū 1 + (tu∗ )d−1 .

Proof. To facilitate the proof we introduce polynomials mk (τ ) defined by


k−1
m0 (τ ) ≡ 1 and mk (τ ) = m$ (τ ), mk (0) = 0
$=0

for k greater than 0. The prime on mk (τ ) denotes differentiation with respect to τ. The
first few polynomials are shown here:
1 2
m0 (t) = 1, m1 (t) = t, m2 (t) = t + t,
2
1 3 1 4 1 3 3 2
m3 (t) = t + t 2 + t, m4 (t) = t + t + t + t,
6 24 2 2
1 5 1 4
m5 (t) = t + t + t 3 + 2 t 2 + t,
120 6
1 6 1 5 5 4 5 3 5 2
m6 (t) = t + t + t + t + t + t.
720 24 12 3 2

Let E(t) be the matrix etU and denote the elements of E(t) by eij (t). We will
prove by induction that  
|eij (t)| ≤ et ū mj −i tu∗ (9.2.3)
for j ≥ i.
The assertion (9.2.3) holds for j = i, since eii (t) = etuii . Assuming that (9.2.3)
holds for j − i < k, we prove that it holds for j − i = k. From the definition of E(t)
we have that
E  (t) = UE(t)
or

eij (t) = ui$ e$j (t).
i≤$≤j
9.2 Well-Posed Systems 215
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Therefore,
d  −tuii 
e eij (t) = ui$ e$j (t)e−tuii
dt
i<$≤j

and hence
 t
e−tuii eij (t) = ui$ e$j (τ )e−τ uii dτ.
0 i<$≤j

Using (9.2.3) for j − $ less than j − i, we have


 t
−tuii
|e eij (t)| ≤ u∗ mj −$ (τ u∗ )|eτ (ū−uii ) | dτ
0 i<$≤j
 t
≤ u∗ mj −$ (τ u∗ ) dτ
0 i<$≤j



t  
=u mj −i (τ u∗ ) dτ = mj −i tu∗ ,
0

by the defining equation for mk (τ ). This proves (9.2.3).


Depending on the matrix norm being used, there is a constant Cd such that
 
E(t) ≤ Cd max |eij (t)| ≤ Cd et ū 1 + md−1 (tu∗ ) ,

where we have used that mk (t) ≤ mk+1 (t) for positive k. Since md−1 (tu∗ ) is of degree
d − 1, the lemma follows.
We use this lemma to study the matrix exponential in (9.2.2). By Schur’s lemma (see
Appendix A), there is a unitary matrix function O(ω) such that O(ω)Q(ω)O(ω)−1 is
upper triangular. Let
Q̃(ω) = O(ω)Q(ω)O(ω)−1 .
Then, using the $2 norm for matrices,
 
etQ(ω)  = et Q̃(ω)  ≤ Cd et q̄(ω) 1 + |tq ∗ (ω)|d−1 , (9.2.4)

where
q̄(ω) = max Re qv (ω)
1≤v≤d

and
q ∗ (ω) = max |Q̃ij (ω)|,
j >i

similar to the definition of u∗ in Lemma 9.2.2. Moreover, since the diagonal elements
of et Q̃(ω) are etqν (ω) , where qν (ω) is an eigenvalue of Q(ω), we see that a necessary
condition for (9.2.2) to hold is that (9.1.5) hold for each eigenvalue of Q(ω). We also see
216 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

that sufficient conditions for (9.2.2) to hold must usually involve some information about
the off-diagonal elements of Q̃(ω).
We now give general definitions of parabolic and hyperbolic systems in N dimen-
sions.

Definition 9.2.1. The system



N
∂ 2u N
∂u
ut = Bj1 j2 + Cj + Du, (9.2.5)
∂xj1 ∂xj2 ∂xj
j1 ,j2 =1 j =1

for which

N
N
Q(ω) = − Bj1 j2 ωj1 ωj2 + i Cj ωj + D,
j1 ,j2 =1 j =1

is parabolic if the eigenvalues, qv (ω), of Q(ω) satisfy


Re qv (ω) ≤ a − b|ω|2 (9.2.6)
for some constant a and positive constant b.

For a parabolic system we have, in the notation of Lemma 9.2.2, that


q̄(ω) ≤ a − b|ω|2
for some positive constant b. The quantity q ∗ (ω) is bounded by a constant multiple of
1 + |ω|2 , in general. Thus from (9.2.4)
2
 
etQ(ω)  ≤ C et (a−bω ) 1 + (1 + |ω|2 )d−1 t d−1 ≤ Ct ,
where Ct is independent of ω.

Example 9.2.1. As an example of a parabolic system consider the system


 1   1   1   1
u 1 0 u 0 1 u 1 0 u
= + + .
u2 t 0 1 u2 xx 1 0 u2 xy 0 1 u2 yy
The matrix Q(ω) is 
ω12 + ω22 ω1 ω2
− .
ω1 ω2 ω12 + ω22
The eigenvalues of Q(ω) are easily found as the roots of the characteristic equation:

q + ω12 + ω22 ω1 ω2
det = (q + ω12 + ω22 )2 − ω12 ω22 = 0 .
ω12 ω2 q + ω12 + ω22
Thus the eigenvalues are
q± = −(ω12 + ω22 ) ± ω1 ω2 = −(ω12 ± ω1 ω2 + ω22 )

≤ − 12 (ω12 + ω22 ),

and so this system is parabolic.


9.2 Well-Posed Systems 217
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

We see that condition (9.2.6), which is only on the eigenvalues of Q(ω), is sufficient
to assure that the system is well-posed. No other assumptions on the matrix Q(ω) are
needed.
We show later, in Theorem 9.2.4, that the lower order terms, those involving the Cj
and D, do not affect the well-posed nature of the system and so can be ignored in applying
Definition 9.2.1.
We see from (9.2.4) that if q̄(ω) is bounded below for large ω, then the system will
not be well-posed unless q ∗ (ω) is zero. This is why hyperbolic systems are required to be
diagonalizable.

Definition 9.2.2. The system


N
∂u
ut = Aj + Bu (9.2.7)
∂xj
j =1

with

N
Q(ω) = i Aj ωj + B
j =1

is hyperbolic if the eigenvalues of Q(ω), qv (ω), satisfy

Re qv (ω) ≤ c (9.2.8)

for some constant c, and if Q(ω) is uniformly diagonalizable for large ω, i.e., for each
ω with |ω| greater than some value K, there is a matrix M(ω) such that

M(ω)Q(ω)M −1 (ω)

is diagonal and the norms of M(ω) and M(ω)−1 are bounded independently of ω.

The conditions for a hyperbolic system are precisely those needed to make the ex-
pression in (9.2.4) bounded; that is, q̄(ω) is bounded by (9.2.8) and q ∗ (ω) can be taken to
be zero, since Q(ω) is diagonalizable. Note, however, that M(ω) need not be unitary, as
was O(ω) in deriving (9.2.4), but M(ω) and M(ω)−1 need to be bounded in norm. As
with parabolic systems the lower order term, in this case Bu, does not effect the well-posed
nature of the hyperbolic system (9.2.7).

Example 9.2.2. As an example of a hyperbolic system consider the shallow water equations
linearized around a constant velocity field (a, b):
ut + aux + buy + hx = 0,
vt + avx + bvy + hy = 0, (9.2.9)
ht + ahx + bhy + ux + vy = 0.
218 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

For this system we have


   
a 0 1 b 0 0
Q(ω) = −i 0 a 0 ω1 − i 0 b 1 ω2
1 0 a 0 1 b
 
aω1 + bω2 0 ω1
= −i 0 aω1 + bω2 ω2 .
ω1 ω2 aω1 + bω2
The three eigenvalues of Q(ω) are easily found to be
&
q0 (ω) = −i(aω1 + bω2 ), q± (ω) = −i(aω1 + bω2 ) ± i ω12 + ω22 .

Since the eigenvalues are purely imaginary, the system is hyperbolic.

Lower Order Terms


We next show that lower order terms do not affect the well-posedness of hyperbolic and
parabolic systems. We begin with a theorem applicable to the hyperbolic systems and to
the undifferentiated term in parabolic systems.

Theorem 9.2.3. If the system


ût = Q(ω)û (9.2.10)
is well-posed and Q0 (ω) is bounded independently of ω, then the system

ût = (Q(ω) + Q0 (ω)) û (9.2.11)

is also well-posed.

Proof. Let c0 be a constant such that

Q0 (ω) ≤ c0

and assume that Ct as defined by (9.2.2) is a nondecreasing function of t. From (9.2.11)


we have
 
e−Q(ω)t û(t, ω) = e−Q(ω)t Q0 (ω)û(t, ω)
t
and so
 t
û(t, ω) = eQ(ω)t û0 (ω) + eQ(ω)(t−τ ) Q0 (ω)û(τ, ω) dτ.
0

Therefore, by the well-posedness of (9.2.10),


 t
|û(t, ω)| ≤ Ct |û0 (ω)| + c0 Ct |û(τ, ω)| dτ, (9.2.12)
0
9.2 Well-Posed Systems 219
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where we have used our assumption that Ct is a nondecreasing function of t. If we define


the function U (t, ω) by  t
U (t, ω) = |û(τ, ω)| dτ,
0
then (9.2.12) can be written
d
U (t, ω) ≤ Ct |û0 (ω)| + c0 Ct U (t, ω).
dt
We then obtain, using a method similar to that used for obtaining (9.2.12),
ec0 CT t − 1
U (t, ω) ≤ |û0 (ω)|
c0
for 0 ≤ t ≤ T . Substituting this inequality in (9.2.12), we have that |û(t, ω)| is bounded
by CT ec0 CT t |û0 (ω)| for 0 ≤ t ≤ T . Taking T equal to t, we have
|û(t, ω)| ≤ Ct ec0 Ct t |û0 (ω)| = Ct∗ |û0 (ω)|.
Since  
Q(ω)+Q0 (ω) t
û(t, ω) = e û0 (ω)
and û0 (ω) is an arbitrary value for each ω, we have
 
 (Q(ω)+Q0 (ω))t 
e  ≤ Ct∗ ,

which shows that (9.2.11) is well-posed.


Theorem 9.2.3 shows that the matrix B, the lower order term in the hyperbolic system
(9.2.7), does not affect the well-posedness of the system (9.2.7). If the matrix B is zero,
then the constant c in (9.2.8) can be taken to be zero, and the constant K in Definition
9.2.2 can also be taken to be zero. These last results follow from the observation that if B
is zero, then Q(ω) is a homogeneous matrix function of ω, i.e., Q(αω) = αQ(ω) for
any real number α.
Theorem 9.2.3 also shows that the matrix D in the parabolic system (9.2.5) does not
affect the well-posedness. We next show that the first-derivative terms, the Cj in (9.2.5),
also do not affect the well-posed nature of the system. We actually prove a more general
theorem.

Theorem 9.2.4. If the system


ût = Q(ω) u
satisfies
eQ(ω)t  ≤ Kt e−b|ω|
ρt

for some positive constants b and ρ, with Kt independent of ω, and if Q0 (ω) satisfies
Q0 (ω) ≤ c0 |ω|σ
with σ < ρ, then the system
ût = [Q(ω) + Q0 (ω)] û
is also well-posed.
220 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Proof. We prove this theorem dealing directly with the exponential of the matrix
(Q(ω) + Q0 (ω)) t rather than the functions û(t, ω), as was done in the proof of Theorem
9.2.3. Let
E(t, ω) = e(Q(ω)+Q0 (ω))t .
Then E(t, ω) satisfies the ordinary differential equation
d
E(t, ω) = Q(ω)E(t, ω) + Q0 (ω)E(t, ω)
dt
with E(0, ω) = I. Thus
d  
e−Q(ω)t E(t, ω) = e−Q(ω)t Q0 (ω)E(t, ω),
dt
and so we have the representation
 t
E(t, ω) = e Q(ω)t
+ eQ(ω)(t−τ ) Q0 (ω)E(τ, ω) dτ.
0

Therefore, using the estimates on eQ(ω)t and Q0 (ω),


 t
E(t, ω) ≤ Kt e−b|ω| t + c0 Kt |ω|σ e−b|ω| (t−τ ) E(τ, ω) dτ.
ρ ρ
(9.2.13)
0

If we define the function F (t) by


 t ρτ
F (t) = eb|ω| E(τ, ω) dτ,
0

then (9.2.13) can be rewritten as


d
F (t) ≤ Kt + c0 Kt |ω|σ F (t),
dt
from which we obtain
ec0 KT |ω| t − 1
σ

F (t) ≤ for 0 ≤ t ≤ T ,
c0 |ω|σ
where we have assumed, without loss of generality, that Kt is a nondecreasing function
of t. Applying this in (9.2.13) with t = T , we obtain

E(t, ω) ≤ Kt e−(b|ω|


ρ −c K |ω|σ )t
0 t ≤ Kt∗ ,
since ρ is greater than σ. This proves the theorem.
We next show that the function Ct in (9.1.2) can always be chosen to be an exponential
function of t.

Lemma 9.2.5. The function Ct in (9.1.2) and (9.2.2) can always be taken in the form
Keat
for some constants K and a.
9.2 Well-Posed Systems 221
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Proof. We have
eQ(ω)t  ≤ K for 0 ≤ t ≤ 1,
where
K = max Ct .
0≤t≤1

For t larger that 1, set


t = n + t ,
where 0 ≤ t  < 1. Then  n 
eQ(ω)t = eQ(ω) eQ(ω)t ,

and so

eQ(ω)t  ≤ eQ(ω) n eQ(ω)t  ≤ K n+1 ≤ K eat ,
where K = ea . This proves the lemma.
The value of this lemma is that it shows that expressions such as

Ct ec0 Ct t ,

which was obtained in the proof of Theorem 9.2.3, are unnecessarily pessimistic when
the growth of Ct with t is not specified. The actual growth will never be worse than
exponential growth in t.

Weakly Hyperbolic Systems


It is also worthwhile to consider the consequences of relaxing the definition of a well-posed
initial value problem for systems. For example, the system
 1   1
u 1 1 u
= (9.2.14)
u2 t 0 1 u2 x

is similar to a hyperbolic system since the eigenvalues of the symbol are purely imaginary;
however, the symbol 
1 1
Q(ω) = iω
0 1
is not diagonalizable. Such a system is sometimes called a weakly hyperbolic system. It is
easy to see that the solution to (9.2.14) is

u1 (t, x) = u1 (0, x + t) + tu2x (0, x + t),

u2 (t, x) = u2 (0, x + t).

As these equations show, the solution depends on the first derivative of the initial data as
well as the data itself, i.e.,
 
u(1) (t, ·) + u(2) (t, ·) ≤ C u(1) (0, ·) + u2 (0, ·) + u2x (0, ·) . (9.2.15)
222 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

This weaker estimate is not serious in itself; the difficulty with such a system is that the
addition of lower order terms will make the system ill-posed. In particular, the system
u1t = u1x + u2x ,

u2t = u2x + εu1

is ill-posed for any nonzero value of ε. The eigenvalues of the system are
q± = iω ± (iεω)1/2 ,
and it is easily seen that condition (9.2.2) is not satisfied. In particular, the estimate (9.2.15)
shows that this system is not well-posed in the sense of Definition 1.5.2 or estimate (9.1.2).
As this example shows, Theorem 9.2.3 does not extend to the situation where weaker
estimates such as (9.2.15) hold. This example was used by Kreiss [33] to demonstrate
the effect of variable coefficients on the well-posed nature of systems. Examples such as
this show that estimate (9.1.2) is that which best embodies the notion of a well-behaved
process and which also leads to a reasonable mathematical theory of well-posed initial value
problems.
Systems of partial differential equations that describe physical systems are always
approximations based on assuming that certain effects are negligible. Systems such as
(9.2.14), which are not well-behaved under small effects such as variable coefficients and
lower order terms, cannot be useful in modeling initial value problems for physical systems.

Exercises
9.2.1. Show that the wave equation in two spatial dimensions utt = uxx + uyy can be
transformed to the system
u1t = u2x + u3y ,

u2t = u1x ,

u3t = u1y .

Show that the system is hyperbolic.


9.2.2. Show that the system
ut = ux − vy ,

vt = uy + vx
is hyperbolic.
9.2.3. Show that the system
 1    
u 2 −1 u1 0 1 u1
= +
u2 t 1 0 u2 xx
−1 2 u2 yy

is parabolic.
9.3 Inhomogeneous Problems 223
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

9.2.4. Show that the system


ut + vtxx = ux ,

utx + vt = vxx

can be put into the form (9.2.1). Determine if the system is well-posed.

9.3 Estimates for Inhomogeneous Problems


We now consider the inhomogeneous initial value problem, P u = f, to estimate the so-
lution at time t in terms of the initial data and the data f (t, x). We consider a single
partial differential equation, P u = f, with constant coefficients, that is first order in the
derivative with respect to t. Under the Fourier transform it may be written as

ût (t, ω) = q(ω)û(t, ω) + r(ω)fˆ(t, ω), (9.3.1)

where the factor of r(ω) arises from normalizing the equation so that the coefficient of ût
is 1 (see Exercise 9.3.1). We also require that there are constants q̄ and C1 such that the
well-posedness estimate (9.1.5) holds, i.e.,

Re q(ω) ≤ q̄

and also
|r(ω)| ≤ C1 . (9.3.2)

The solution of (9.3.1) can be written as


 t
û(t, ω) = eq(ω)t û0 (ω) + r(ω) eq(ω)(t−s) fˆ(s, ω) ds. (9.3.3)
0

From this we easily obtain, from (9.3.1) and (9.3.2),


 t !
|û(t, ω)|2 ≤ Ce2q̄t |û0 (ω)|2 + |fˆ(s, ω)|2 ds ,
0

and hence
 t !
u(t, ·)2 ≤ Ce2q̄t u0 2 + f (s, ·)2 ds . (9.3.4)
0

For a stable finite difference scheme an analogous estimate holds. We prove it now
for one-step schemes. All the one-step schemes we have considered may be written as

n (ξ ),
v̂ n+1 (ξ ) = g(hξ )v̂ n (ξ ) + k F (9.3.5)
224 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where g is the amplification factor and


 
n (ξ )| ≤ C |fˆn (ξ )| + |fˆn+1 (ξ )| .
|F (9.3.6)

The solution to (9.3.5) can be written as (see Exercise 9.3.2)


n−1
v̂ n (ξ ) = g(hξ )n v̂ 0 (ξ ) + k $ (ξ ).
g(hξ )n−$ F (9.3.7)
$=0

We then have by the von Neumann stability estimate (2.2.7) that


 n−1 2 

$ (ξ ) 
|v̂ n (ξ )|2 ≤ 2 |g(hξ )|2n | v̂ 0 (ξ )|2 + k g(hξ )n−$ F

$=0
$ %

n−1
n−1
≤ 2 |g(hξ )| | v̂ (ξ )| + k
2n 0 2
|g(hξ )| 2(n−$)
k $ (ξ )|2
|F
$=0 $=0
$ %

n−1
n−1
≤2 (1 + Kk) | v̂ (ξ )| + k
2n 0 2
(1 + Kk) 2(n−$)
k $
|F (ξ )| 2

$=0 $=0
$ %

n−1
≤ 2(1 + Kk) 2n
|v̂ (ξ )| + (kn)k
0 2 $ (ξ )|2
|F
$=0
$ %

n−1
≤ CT |v̂ (ξ )| + k
0 2 $ (ξ )|2 .
|F
$=0

Then, by Parseval’s relation and (9.3.6),


 

n
v  ≤ CT
n 2
v  + k0 2
f 
$ 2
. (9.3.8)
$=0

Estimates (9.3.4) and (9.3.8) show that for both the well-posed partial differential
equation and the stable finite difference scheme, the solution depends continuously on the
data. For example, consider the two initial value problems

(1)
P u(1) = f (1) , u(1) (0, x) = u0 (x),
(2)
P u(2) = f (2) , u(2) (0, x) = u0 (x).

The difference of the solutions can be estimated in terms of the difference in the data, i.e.,
  t
(1) (2) 2
u (t, ·) − u (t, ·) ≤ CT u0 − u0  +
(1) (2) 2
f (s, ·) − f (s, ·) ds .
(1) (2) 2
0
9.4 The Kreiss Matrix Theorem 225
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

It is essential that a differential equation describing the evolution of a physical system


satisfy such an estimate. This estimate expresses the idea that small changes in the data
result in small changes in the solution at the later times. (See the discussion at the beginning
of section 9.1.) In particular, it shows that errors in the data will be magnified by only a
fixed amount, determined by CT . For finite difference schemes the estimate shows that the
round-off error, which is inherent in all computation, will grow at a limited rate. It is for
this reason that the effects of round-off error are not a major concern in the study of finite
difference schemes for partial differential equations.

Duhamel’s Principle
Solution formulas (9.3.3) and (9.3.7) express Duhamel’s principle, which states that the
solution to an inhomogeneous initial value problem can be regarded as the superposition of
solutions to homogeneous initial value problems. For (9.3.3), consider the homogeneous
initial value problems for (9.3.1) with solutions u(t, ω; s) that have initial data prescribed
at t = s given by
û(s, ω; s) = r(ω)fˆ(s, ω).
Then (9.3.3) can be written as
 t
û(t, ω) = eq(ω)t û0 (ω) + û(t, ω; s) ds.
0

Similarly, for (9.3.7) we have


n−1
v̂ n = g n v̂ 0 + v̂ n,$ ,
$=0

where v̂ n,$ is the solution to the homogeneous initial value problem starting at time level
$ with
$ .
v̂ $,$ = k F
The well-posedness and stability estimates for the inhomogeneous initial value problems
are direct consequences of the estimates for the homogeneous problems.

Exercises
9.3.1. Show that equations (1.1.3) and (1.4.1) may be put in the form (9.3.1).
9.3.2. Show that the general solution to the one-step scheme (9.3.5) is given by (9.3.7).

9.4 The Kreiss Matrix Theorem


In Section 7.1 the stability condition for a one-step scheme for a system was shown to be
as follows: For each T > 0, there is a constant CT such that

G(θ, k, h)n  ≤ CT (9.4.1)


226 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

for all n with 0 ≤ nk ≤ T . As for the constants in the estimates for well-posed initial
value problems, we can always take the constant CT in (9.4.1) to have the form

CT = Keat = Keakn

(see Lemma 9.2.5). Thus estimate (9.4.1) is equivalent to

G̃(θ, k, h)n  ≤ K (9.4.2)

for all n where G̃ = e−ak G.


The Kreiss matrix theorem gives several equivalent characterizations of families of
matrices satisfying conditions such as (9.4.2). The theorem considers a family, or set, F, of
M × M matrices. In the context of finite difference schemes, the matrices would depend
continuously on the parameters θ, k, and h; however, the theorem can be stated as a result
in matrix theory without referring to our intended applications.

Theorem 9.4.1. The Kreiss Matrix Theorem. For a family F of M × M matrices, the
following statements are equivalent.
A: There exists a positive constant Ca such that for all A ∈ F and each nonnegative
integer n,
An  ≤ Ca .

R: There exists a positive constant Cr such that for all A ∈ F and all complex numbers
z with |z| > 1,
(zI − A)−1  ≤ Cr (|z| − 1)−1 . (9.4.3)

S: There exist positive constants Cs and Cb such that for each A ∈ F there is a
nonsingular hermitian matrix S such that B = SAS −1 is upper triangular and

S, S −1  ≤ Cs , (9.4.4a)

|Bii | ≤ 1, (9.4.4b)
|Bij | ≤ Cb min[1 − |Bii |, 1 − |Bjj |] (9.4.4c)
for i < j.
H: There exists a positive constant Ch such that for each A ∈ F there is a hermitian
matrix H such that
Ch−1 I ≤ H ≤ Ch I, (9.4.5a)
A∗ H A ≤ H. (9.4.5b)

N: There exist constants Cn and cn such that for each A ∈ F there is a hermitian
matrix N such that
Cn−1 I ≤ N ≤ Cn I,
Re (N (I − zA)) ≥ cn (1 − |z|)I
for all complex numbers z with |z| ≤ 1.
9.4 The Kreiss Matrix Theorem 227
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

U : There exists a positive constant Cω such that for each A ∈ F there is a hermitian
matrix U such that
Cω−1 I ≤ U ≤ Cω I,
|(UAn x, x)|
sup ≤ 1.
x=0 (Ux, x)

The original Kreiss matrix theorem (Kreiss [32]) contained only the first four condi-
tions, A, R, S, and H. The condition U was proved equivalent to the original four condi-
tions by Tadmor [59] and condition N was added by Strikwerda and Wade [58]. LeVeque
and Trefethen [39] showed that condition R implies condition A with Ca ≤ eMCr and
that this is the best possible bound on Ca in terms of Cr .
In some applications it is important to know when the matrices H, N, and U can be
constructed to be (locally) continuous functions of the elements of F. Although this result
can be established for some special families, it has not been established in general.
Proof. We will prove that these conditions are all equivalent by showing that each
condition implies the next one, in the given order, and finally that condition U implies
condition A.
We first show that condition A implies condition R. We have that


(zI − A)−1 = z−(j +1) Aj
j =0

for large values of z. Thus




(zI − A)−1  ≤ |z|−(j +1) Ca ≤ Ca (|z| − 1)−1 ,
j =0

which is condition R. The expression (zI − A)−1 is called the resolvent of A; it is


an analytic matrix-valued function of z, and condition R is often called the resolvent
condition.
The proof that condition R implies condition S is the most difficult portion of the
proof, and we postpone this until the end.
We next show that condition S implies condition H. We construct the matrix H as
S ∗ D 2 S, where the matrix D is a diagonal matrix whose j th entry is ε M−j , where ε is
a positive parameter to be chosen later. Condition (9.4.5b) is then seen to be

A∗ S ∗ D 2 SA ≤ S ∗ D 2 S

or
B ∗ D 2 B ≤ D 2.
Finally, if we set B̃ = DBD −1 , this condition is

B̃ ∗ B̃ ≤ I
228 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

or, equivalently,
|B̃x|2 ≤ |x|2 (9.4.6)

for any vector x in C M . Since B is an upper triangular matrix, we have that the elements
of B̃ are given by
B̃ij = Bij ε j −i .

Thus, by the Cauchy–Schwarz inequality,


M
M
M
|B̃x|2 = |(B̃x)i |2 = | Bij ε j −i xj |2
i=1 i=1 j =i
  

M M M
≤  |Bij |ε j −i |xj |2   |Bij |ε j −i  .
i=1 j =i j =i

Now we consider each portion of this sum, beginning with the sum over j. Using estimate
(9.4.4c), we obtain


M
M
j −i
|Bij |ε = |Bii | + |Bij |ε j −i
j =i j =i+1


M
≤ |Bii | + Cb (1 − |Bii |) ε j −i (9.4.7)
j =i+1

ε
≤ |Bii | + Cb (1 − |Bii |) ≤ 1
1−ε

if ε is chosen so that

Cb ε
≤ 1. (9.4.8)
1−ε
With this choice of ε we have


M
M
M
j
j −i
|B̃x| ≤
2
|Bij |ε |xj | =
2
|xj |
2
|Bij |ε j −i .
i=1 j =i j =1 i=1

We next employ an argument similar to (9.4.7). Considering the sum over j we obtain


j
Cb ε  
|Bij |ε j −i ≤ |Bjj | + 1 − |Bjj | ≤ 1.
1−ε
i=1
9.4 The Kreiss Matrix Theorem 229
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Thus if ε is chosen to satisfy (9.4.8), then (9.4.6) holds, which is equivalent to condition
(9.4.5b). The choice of ε is seen to depend only on Cb . Thus

H = S ∗ D 2 S ≤ Cs2 ε −2M

and, similarly, H ≥ Cs−2 ε 2M, which establishes condition H.


To prove condition N, we start with

0 ≤ (I − zA)∗ H (I − zA)

= H − 2 Re (H zA) + |z|2 A∗ H A

= 2 Re [H (I − zA)] − H + |z|2 A∗ H A

≤ 2 Re [H (I − zA)] + (|z|2 − 1)H.

Thus if |z| ≤ 1 and using the bounds on H,

Cn−1 (1 − |z|)I ≤ H (1 − |z|) ≤ H (1 − |z|2 ) ≤ 2 Re [H (I − zA)]

and condition N holds with N equal to H.


The proof that condition N implies condition U is similar to the proof of the
Halmos inequality given by Pearcy [50]. We begin with two relationships for all com-
plex numbers z,
(
n
1 − zn = (1 − ζk z),
k=1

1 (
n n
1= (1 − ζk z), (9.4.9)
n
j =1 k=1
k=j

where the ζk are the nth roots of unity.


As purely algebraic relationships, these relationships hold also when z is replaced
by a matrix A. For any vector x and complex number γ , with |γ | = 1, we define

(
n
xj = (1 − ζk γ A)x.
k=1
k=j

By (9.4.9) we have that


1
n
x= xj .
n
j =1
230 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Condition N with z = γ ζj implies that

1
n
 
0≤ Re N (I − ζj γ A)xj , xj
n
j =1

1
n
 
= Re N (I − γ n An )x, xj
n
j =1
 
= Re N (I − γ n An )x, x .

By choosing γ so that

Re (γ n N An x, x) = |(N An x, x)|,

we obtain
|(N An x, x)| ≤ (N x, x).
Thus condition N is satisfied with U equal to N and Cω equal to Cn .
The last implication is that condition U implies condition A. We use the following
relations: For a Hermitian matrix S we have

sup |(Sx, x)| = S


|x|=1

and for any matrix B


1 i
B= (B + B ∗ ) − (B − B ∗ ),
2 2
so
1   
(B + B ∗ ) + 1 (B − B ∗ ) ≤ 2 sup |(Bx, x)|.
B ≤
2 2 |x|=1

Since the matrix U is positive definite and hermitian, it has a positive definite and
hermitian square root T , with both T  and T −1  bounded by Cω . Thus we have
1/2

An  = T −1 T An T −1 T  ≤ T −1  T  T An T −1 

≤ 2T −1  T  sup |(T An T −1 x, x)|


|x|=1

= 2T −1  T  sup |(U An T −1 x, T −1 x)|


|x|=1

≤ 2T −1  T  sup |(U T −1 x, T −1 x)|


|x|=1

= 2T −1  T  ≤ 2Cω .
9.4 The Kreiss Matrix Theorem 231
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

It remains to prove that condition R implies condition S. We begin this portion of


the proof by assuming that the matrix A is upper triangular. This is permissible by Schur’s
lemma (Proposition A.5 in Appendix A). We may also assume that the diagonal elements
of A are ordered so that
|aii | ≥ |ajj | for i < j ; (9.4.10)
i.e., the eigenvalues are in order of decreasing magnitude. The resolvent of A, which is
Rz (A) = (zI − A)−1 , is also an upper triangular matrix. We also note that any element of
a matrix is bounded by the norm of the matrix.
The ith diagonal element of Rz (A) is (z − aii )−1 , and this is bounded by
Cr (|z| − 1)−1 . It is then immediate that |aii | ≤ 1. We now proceed to construct the matri-
ces B and S recursively, one diagonal at a time, for each diagonal above the main diagonal
of the matrices.
Let rij denote the elements of the matrix Rz (A). Since Rz (A) is the inverse of
zI − A, we obtain for j greater than i


j
[(zI − A)Rz (A)]ij = (z − aii )rij − aik rkj = 0. (9.4.11)
k=i+1

For j equal to i + 1 we have

(z − aii )rij − aij rjj = 0.

Therefore, since rjj = (z − ajj )−1 ,

aij
rij = .
(z − aii )(z − ajj )

Since |rij | ≤ Cr (|z| − 1)−1 , we have

|(z − aii )(z − ajj )|


|aij | ≤ Cr (9.4.12)
|z| − 1

for all z such that |z| > 1. If the eigenvalue aii has magnitude of 1/2 or less, then from
(9.4.12), with z equal to 5/2, we obtain

(|z| + 12 )2
|aij | ≤ Cr ≤ 6 Cr . (9.4.13)
|z| − 1

(Recall that by (9.4.10), |ajj | also has magnitude less than 1/2. )
If |aii | is greater than 1/2 in magnitude, then we set z = t (āii )−1 in (9.4.12), where
t is real and greater than 1, and then take the limit as t approaches 1. We obtain

(1 − |aii |2 ) |1 − āii ajj |


|aij | ≤ Cr ≤ 4 Cr |1 − āii ajj |. (9.4.14)
(1 − |aii |)|aii |
232 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

As in the proof given by Morton and Schecter [45] (see also Richtmyer and Morton [52]),
we have


|1 − āii ajj | = 1 − |aii |2 + āii (aii − ajj )

≤ (1 + |aii |)(1 − |aii |) + |aii | |aii − ajj |

≤ (1 + 2|aii |) max(1 − |aii |, |aii − ajj |)

≤ 3 max(1 − |aii |, |aii − ajj |).

Combining this estimate with (9.4.13) and (9.4.14), we obtain

|aij | ≤ 12 Cr max(1 − |aii |, |aii − ajj |).

If the maximum of 1 − |aii | and |aii − ajj | is the latter quantity, we consider
the matrix S (i,j ) , which is the identity matrix except that the entry in location (i, j ) is
aij (aii − ajj )−1 . The matrix

S (i,j ) A(S (i,j ) )−1 = A(i,j )

has a zero in location (i, j ). Moreover, the elements of A(i,j ) differ from those of A only
in the locations (i  , j  ) with i  ≤ i and j  ≥ j. Taking the product of all S (i,j ) formed
in this way—call it S—we have that the matrix

à = SAS −1

satisfies
|ãij | ≤ 12 Cr (1 − |ãii |), (9.4.15)
which is (9.4.4c) for j = i + 1. We also have that the norm of S is at most

(1 + 12 Cr )M−1 ,

since each S (i,j ) has norm bounded by 1 + 12 Cr . The matrix à satisfies the resolvent
condition (9.4.3) but with the constant C̃r equal to S S −1 Cr .
We continue for j = i + $ with $ > 1, assuming that (9.4.15) is satisfied for all
j − i less than $. From (9.4.11) we have
j −1

0 = (z − aii )rij − aik rkj − aij rjj .
k=i+1

So, since rjj is (z − ajj )−1 ,

j −1

aij = rij (z − aii ) (z − ajj ) − aik rkj (z − ajj ),
k=i+1
9.4 The Kreiss Matrix Theorem 233
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

and so by (9.4.15)
|(z − aii )(z − ajj )| |z − ajj |(1 − |aii |)
|aij | ≤ Cr + 12 Cr .
|z| − 1 |z| − 1
We may proceed as in the case with j = i + 1. Notice that the constant Cr may
increase at each step of the proof. However, it can always be chosen to depend only on
the value of Cr at the earlier step and on M. This recursive alteration of A terminates
in at most M − 1 steps, and thus the proof is complete, with B being the result of the
modifications to A and S being the product of all the S (i,j ) matrices. This proves that
condition R implies condition S.
Since we have shown that each condition implies the succeeding one, the proof of
the Kreiss matrix theorem is complete.
The Kreiss matrix theorem is of theoretical importance because it relates the usual
concept of stability, condition A, with equivalent conditions that may be useful in different
contexts. It is of limited practical use in determining stability because the verification of
any of the conditions is usually quite as difficult as verifying condition A itself.
It is also notable that the only portion of the theorem that depends on the finite
dimensionality of the linear operators is that involving condition S. The conditions H, N,
and U are all equivalent for families of operators on Hilbert spaces. Condition H states
that in the norm  · H , defined by

xH = (x, H x)1/2 ,

the operator A is a contraction, i.e.,

AxH ≤ xH .

If this condition is satisfied, we say that A is equivalent to a contraction. It is easy to


see that if A is equivalent to a contraction, with H satisfying (9.4.5a), then A is power
bounded in the original norm; i.e., condition A holds. However, Foquel [19] has shown that
there exist power-bounded operators on an infinite-dimensional Hilbert space that are not
equivalent to a contraction. Thus condition S is an essential part of the finite-dimensional
Kreiss matrix theorem.

Exercises
9.4.1. Determine the resolvent for the M × M matrix
 
0 a 0 0 ··· 0
0 0 a 0 ··· 0
 
0 0 0 a ··· 0
A=  ... .. .. .. .. 

 . . . . 
0 0 0 0 ··· a
0 0 0 0 ··· 0
for a real number a. Determine constants Ca and Cr for conditions A and R in
Theorem 9.4.1.
234 Chapter 9. Analysis of Well-Posed and Stable Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

9.4.2. By considering the contour integral



1
zn (zI − A)−1 dz,
2π i ?

where ? is the circle |z| = 1 + (n + 1)−1 , show that if A satisfies the resolvent
condition (9.4.3), then
An  ≤ C ∗ (n + 1)
for some constant C ∗ depending only on Cr and not on A.
9.4.3. Two other conditions that are equivalent to those of the Kreiss matrix theorem are:
M 1 : There is a positive constant C1 such that for each N ≥ 0, each real
value of θ, and each A in F,
 
 1 N 
 
 An einθ  ≤ C1 .
N + 1 
n=0

M 2 : There is a positive constant C2 such that for each N ≥ 0, each real


value of θ, and each A in F,
 
 2 N 
n  n inθ 
 
 1− A e  ≤ C2 .
N + 1 N 
n=0

Prove directly that condition A implies condition M1 , condition M1 im-


plies condition M2 , and condition M2 implies condition R. Hint: To prove that
condition M2 implies R, consider the sum



N
r N−1 (N − n)An einθ .
N=1 n=0

9.4.4. Show that condition M2 , of Exercise 9.4.3, is equivalent to condition R for oper-
ators on a Hilbert or Banach space. To prove that condition R implies condition
M2 , consider the contour integral
3  2
1 zN/2 − z−N/2
(zI − A)−1 dz,
2πi ? z1/2 − z−1/2

where ? is the circle |z| = 1 + N −1 . Prove that


 N/2 2  
1 z − z−N/2
dθ ≤ 4eN 1 + O(N −1 ) .
2π z1/2 − z−1/2
?
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Chapter 10

Convergence Estimates for


InitialValue Problems

In this chapter we prove estimates for the convergence of solutions of finite difference
schemes. The concept of the order of accuracy of a scheme was presented in Chapter 3,
where it was stated that the order of accuracy of the scheme was related to the order of
accuracy of the solution. Here we provide the proofs for this assertion. We also give a
proof of the Lax–Richtmyer theorem (Theorem 1.5.1).
Estimates are given that show the rate at which the discrete solutions of finite dif-
ference schemes converge to the solution of the differential equation. For simplicity, we
restrict ourselves at first to one-step schemes. Multistep schemes are considered in Section
10.6. Equations second order in the time derivative are considered in Section 10.7. We
also consider only scalar equations; the extension of these results to systems of equations
is straightforward and is left to the exercises.
Only constant coefficient equations are considered in this text. The theorems we give
can be extended to cover equations with variable coefficients, but the extension requires
techniques beyond the scope of this text. Estimates for variable coefficient equations are
proved by Peetre and Thomée [51] and by Wade [67]. Convergence estimates similar to
what we prove here, but using different norms, are given in the lecture notes of Brenner,
Thomée, and Wahlbin [6].
For simplicity we consider only one-dimensional problems. The theorems for higher
dimensional cases are given in the exercises.

10.1 Convergence Estimates for Smooth Initial Functions


We begin by addressing the problem of how to compare discrete functions defined on the
mesh hZ with functions defined on the real line. The truncation operator maps functions
on the real line to functions on the grid, and the interpolation operator maps functions on
the grid to functions defined on the real line. Both operators are defined in terms of the
Fourier transform as defined in Chapter 2.

Definition 10.1.1. The truncation operator T maps functions in L2 (R) to functions in


L2 (hZ). Given u ∈ L2 (R), we have

 ∞
1
u(x) = √ eixξ û(ξ ) dξ,
2π −∞

235
236 Chapter 10. Convergence Estimates for Initial Value Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

and Tu is defined as
 π/ h
1
Tum = √ eimhξ û(ξ ) dξ
2π −π/ h

for each grid point mh ∈ hZ. Alternatively, the Fourier transform of Tu is given by

4 ) = û(ξ ) for |ξ | ≤ π .
Tu(ξ
h

Definition 10.1.2. The interpolation operator S maps functions in L2 (hZ) to functions


on L2 (R). Given v ∈ L2 (hZ), we have
 π/ h
1
vm = √ eimhξ v̂(ξ ) dξ
2π −π/ h

and Sv(x) is defined as


 π/ h
1
Sv(x) = √ eixξ v̂(ξ ) dξ (10.1.1)
2π −π/ h

for each x ∈ R. Alternatively, the Fourier transform of Sv is given by


4 )= v̂(ξ ) if |ξ | ≤ π/ h,
Sv(ξ
0 if |ξ | > π/ h.

Both of the operators T and S depend on the parameter h. We do not explicitly


show this in the notation in order to keep our notation simple. The operator S is called the
cardinal spline operator; see [55].

Example 10.1.1. Consider the function and transform

$$ u(x) = \frac{1}{1 + x^2}, \qquad \hat u(\xi) = \sqrt{\frac{\pi}{2}}\, e^{-|\xi|}. $$

For this function the truncation operator gives

$$ Tu_m = \frac{1}{2} \int_{-\pi/h}^{\pi/h} e^{imh\xi} e^{-|\xi|}\, d\xi = \frac{1 - (-1)^m e^{-\pi/h}}{1 + (hm)^2}. $$

The function u(x) and the discrete function Tu_m are shown in Figure 10.1 for the case of h = 1. The difference between Tu_m and u(x_m) is especially noticeable at x = 0. For smaller values of h this difference is harder to see.


Figure 10.1. An example of the truncation operator.
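
The closed form for Tu_m in Example 10.1.1 makes the truncation operator easy to tabulate. The following minimal Python sketch (the function name and the values of h are our own choices for illustration) compares Tu_m with the point values u(mh):

```python
import numpy as np

# Truncation operator for u(x) = 1/(1 + x^2), using the closed form
# Tu_m = (1 - (-1)^m exp(-pi/h)) / (1 + (h m)^2) from Example 10.1.1.
def truncation(m, h):
    return (1.0 - (-1.0) ** m * np.exp(-np.pi / h)) / (1.0 + (h * m) ** 2)

for h in (1.0, 0.5, 0.1):
    m = np.arange(-5, 6)
    diff = np.abs(truncation(m, h) - 1.0 / (1.0 + (m * h) ** 2))
    # The difference is largest at x = 0 and shrinks rapidly with h.
    print(f"h = {h:4.2f}   max |Tu_m - u(mh)| = {diff.max():.2e}")
```

For h = 1 the difference at m = 0 is e^{−π} ≈ 0.043, which is the discrepancy visible in Figure 10.1.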

Another formula for the interpolation operator can be developed by substituting the formula (2.1.3) for the transform in the integral (10.1.1). We obtain

$$ \begin{aligned}
Sv(x) &= \frac{1}{\sqrt{2\pi}} \int_{-\pi/h}^{\pi/h} e^{ix\xi}\, \hat v(\xi)\, d\xi \\
&= \frac{h}{2\pi} \int_{-\pi/h}^{\pi/h} e^{ix\xi} \sum_{m=-\infty}^{\infty} e^{-imh\xi} v_m\, d\xi \\
&= \frac{h}{2\pi} \sum_{m=-\infty}^{\infty} v_m \int_{-\pi/h}^{\pi/h} e^{ix\xi} e^{-imh\xi}\, d\xi \\
&= \sum_{m=-\infty}^{\infty} v_m\, \frac{\sin((x - mh)\pi/h)}{(x - mh)\pi/h}.
\end{aligned} $$

This formula shows that the interpolant is a superposition of functions using the "sinc" function sin(t)/t. The interpolation is called cardinal interpolation and is a limit of polynomial spline interpolation as the degree increases; see [55].
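
The sinc series gives a direct way to evaluate Sv at arbitrary points. The sketch below (our own illustration; the infinite sum is truncated to the samples supplied, which is adequate for rapidly decaying grid functions) uses numpy's np.sinc, where np.sinc(t) = sin(πt)/(πt):

```python
import numpy as np

def interpolate(v, h, x):
    """Cardinal (sinc) interpolation Sv(x) of grid values v[j] at (j - J)*h."""
    J = len(v) // 2
    m = np.arange(len(v)) - J
    # sin((x - mh)pi/h) / ((x - mh)pi/h) equals np.sinc((x - mh)/h).
    return np.array([np.sum(v * np.sinc((xi - m * h) / h)) for xi in x])

h = 0.5
m = np.arange(-40, 41)
v = np.exp(-(m * h) ** 2)            # samples of a smooth, decaying function
x = np.linspace(-2.0, 2.0, 9)
print(np.abs(interpolate(v, h, x) - np.exp(-x ** 2)).max())  # small
```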
We now consider the numerical solution of partial differential equations by stable one-step schemes. We consider the differential equation in the form of (9.1.3), i.e., in the form

$$ \hat u_t = q(\omega)\, \hat u. \qquad (10.1.2) $$

(Second-order equations are considered in Section 10.7.) If the partial differential equation has initial function u_0(x), we take as initial function for the scheme

$$ v_m^0 = (Tu_0)_m. $$

Although this is not what is done in practice, it is the initial function for the scheme that
gives the simplest estimates. Later we consider the effect of using other initial functions
for the scheme.
We also need to refine the concept of order of accuracy as given in Section 3.1. The next definition of the order of accuracy of a finite difference scheme also takes account of the power of the dual variable ξ in the approximation of e^{kq(ξ)} by the amplification factor. This is needed to quantify the idea of how much smoothness is required of the initial function so that the order of accuracy of the solutions of the scheme is equal to the order of accuracy of the scheme. In Theorem 10.1.1 we show that this definition is consistent with Definition 3.1.2.

Definition 10.1.3. A one-step scheme for a first-order system in the form (10.1.2) with k = Λ(h) is accurate of order [r, ρ] if there is a constant C such that for |hξ| ≤ π

$$ \left| \frac{e^{kq(\xi)} - g(h\xi, k, h)}{k} \right| \le C h^r (1 + |\xi|)^{\rho}. \qquad (10.1.3) $$

Note that square brackets are used in Definition 10.1.3 to distinguish this order of
accuracy, i.e., [r, ρ], from the parentheses used in Definition 3.1.1, i.e., (p, q).

Theorem 10.1.1. If a one-step finite difference scheme for a well-posed initial value problem is accurate of order r according to Definition 3.1.2, then there is a nonnegative integer ρ such that the scheme is accurate of order [r, ρ] according to Definition 10.1.3.

Example 10.1.2. We illustrate the use of Definition 10.1.3 using the Lax–Wendroff and Lax–Friedrichs schemes for the one-way wave equation (1.1.1). For each case we take λ constant, i.e., k = λh, and since there are no lower order terms, we may replace the factor (1 + |ξ|)^ρ by |ξ|^ρ. We have

$$ e^{kq(\xi)} = e^{-ia\lambda\theta} = 1 - ia\lambda\theta - \frac{a^2\lambda^2\theta^2}{2} + i\, \frac{a^3\lambda^3\theta^3}{6} + O(\theta^4). $$

For the Lax–Wendroff scheme the amplification factor is

$$ \begin{aligned}
g(\theta)_{LW} &= 1 - ia\lambda \sin\theta - 2a^2\lambda^2 \sin^2 \tfrac{1}{2}\theta \\
&= 1 - ia\lambda\theta - \frac{a^2\lambda^2\theta^2}{2} + i\, \frac{a\lambda\theta^3}{6} + O(\theta^4),
\end{aligned} $$

whereas for the Lax–Friedrichs scheme it is

$$ g(\theta)_{LF} = \cos\theta - ia\lambda \sin\theta = 1 - ia\lambda\theta - \frac{\theta^2}{2} + O(\theta^3). $$

For the Lax–Wendroff scheme,

$$ k^{-1} \left| e^{-ia\lambda\theta} - g(\theta)_{LW} \right| \le k^{-1} C_1 |\theta|^3 = \lambda^{-1} C_1 h^2 |\xi|^3, $$

showing that the scheme is accurate of order [2, 3]. For the Lax–Friedrichs scheme,

$$ k^{-1} \left| e^{-ia\lambda\theta} - g(\theta)_{LF} \right| \le k^{-1} C_2 |\theta|^2 = \lambda^{-1} C_2 h |\xi|^2, $$

showing that the scheme is accurate of order [1, 2].
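
These symbol estimates can be checked numerically. The sketch below (our own; a and λ are arbitrary choices with |aλ| < 1) evaluates |e^{−iaλθ} − g(θ)_{LW}|/|θ|³, which by the expansions above should remain bounded, approaching aλ(1 − a²λ²)/6 as θ → 0:

```python
import numpy as np

a, lam = 1.0, 0.8                      # any choice with |a*lam| < 1
theta = np.logspace(-1, -4, 4)

g_lw = 1 - 1j * a * lam * np.sin(theta) \
         - 2 * (a * lam) ** 2 * np.sin(theta / 2) ** 2
ratio = np.abs(np.exp(-1j * a * lam * theta) - g_lw) / theta ** 3

print(ratio)                                # bounded: order [2, 3]
print(a * lam * (1 - (a * lam) ** 2) / 6)   # limit predicted by the expansions
```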

We postpone the proof of Theorem 10.1.1 until after we state and prove the main results
of this section. For now we merely point out that for schemes for hyperbolic equations with
λ constant, ρ is usually equal to r + 1, and for parabolic equations with µ constant, ρ
is often r + 2; however, these relationships do not hold in general (see Exercises 10.1.4
and 10.1.7).
Notice also that if a scheme is accurate of order [r, ρ], then it is also accurate of
order [r − 1, ρ − 1] (see Exercise 10.1.6). Finally, we note that Theorem 10.1.1 requires
the initial value problem for the differential equation to be well-posed but does not require
the scheme to be stable. This last observation is important in proving convergence estimates
for multistep schemes that are initialized with unstable schemes (see Section 10.6).
We now state the main result of this section.

Theorem 10.1.2. If the initial value problem for a partial differential equation of the form (10.1.2), for which the initial value problem is well-posed, is approximated by a stable one-step finite difference scheme that is accurate of order [r, ρ] with r ≤ ρ, and the initial function is Tu_0, where u_0 is the initial function for the differential equation, then for each time T there exists a constant C_T such that

$$ \| u(t_n, \cdot) - Sv^n \| \le C_T h^r \| u_0 \|_{H^\rho} \qquad (10.1.4) $$

holds for all initial data u_0 and for each t_n = nk with 0 ≤ t_n ≤ T and (h, k) in Λ.
Before beginning the proof of this theorem we make several observations. First notice that to get the optimal accuracy we must have sufficiently smooth functions. If the initial function is not in H^ρ (the space H^ρ is defined in Section 2.1), then the order of convergence in h will be less than r, as we show in the next section. Second, the choice of the initial function for the scheme, Tu_0, is not natural in actual computation, nor is the comparison of u with Sv. Later we examine the consequences of using u_0(mh) instead of (Tu_0)_m as the initial function and also of comparing u(t_n, x_m) with v_m^n.

For simplicity of exposition we make two assumptions to reduce the technical details. We assume that q̄ in (9.1.5) is zero and that the restricted stability condition (2.2.8) is applicable; i.e., we assume

$$ |e^{tq(\xi)}| \le 1 \quad \text{and} \quad |g(h\xi)| \le 1. \qquad (10.1.5) $$

The proof without these assumptions is left as an exercise (see Exercise 10.1.11).

Proof of Theorem 10.1.2. To begin, we seek an estimate of the difference between Sv^n(x) and u(t_n, x). We have by the definition of the amplification factor in Section 2.2 that

$$ v_m^n = \frac{1}{\sqrt{2\pi}} \int_{-\pi/h}^{\pi/h} e^{imh\xi}\, g(h\xi)^n\, \hat u_0(\xi)\, d\xi, $$

and so

$$ Sv^n(x) = \frac{1}{\sqrt{2\pi}} \int_{-\pi/h}^{\pi/h} e^{ix\xi}\, g(h\xi)^n\, \hat u_0(\xi)\, d\xi. $$

Also, by equation (9.1.4),

$$ u(t_n, x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ix\xi}\, e^{q(\xi)t_n}\, \hat u_0(\xi)\, d\xi. $$

The formula for Sv^n differs from that for u(t_n, x) in only two respects: There is g(hξ)^n in place of e^{q(ξ)t_n}, and the interval of integration is [−π/h, π/h] instead of (−∞, ∞). We have, therefore,

$$ u(t_n, x) - Sv^n(x) = \frac{1}{\sqrt{2\pi}} \int_{-\pi/h}^{\pi/h} e^{ix\xi} \left( e^{q(\xi)t_n} - g(h\xi)^n \right) \hat u_0(\xi)\, d\xi + \frac{1}{\sqrt{2\pi}} \int_{|\xi|>\pi/h} e^{ix\xi}\, e^{q(\xi)t_n}\, \hat u_0(\xi)\, d\xi. $$

By Parseval's relation, it follows that

$$ \int_{-\infty}^{\infty} \left| u(t_n, x) - Sv^n(x) \right|^2 dx = \int_{-\pi/h}^{\pi/h} \left| e^{q(\xi)t_n} - g(h\xi)^n \right|^2 \left| \hat u_0(\xi) \right|^2 d\xi + \int_{|\xi|>\pi/h} \left| e^{q(\xi)t_n} \right|^2 \left| \hat u_0(\xi) \right|^2 d\xi. \qquad (10.1.6) $$

The first term on the right-hand side of (10.1.6) measures the error due to the finite difference scheme and the second term, as we show, is related to the smoothness of the function u_0.
We estimate the first term on the right-hand side of (10.1.6) as follows. Let

$$ z = e^{q(\xi)k}. $$

Then, since t_n = nk, we have that

$$ e^{q(\xi)t_n} - g(h\xi)^n = z^n - g^n = (z - g) \sum_{j=0}^{n-1} z^{n-j-1} g^j. $$

Since |z| ≤ 1 and |g| ≤ 1 by stability, it follows that

$$ |z^n - g^n| \le |z - g|\, n $$

or

$$ \left| e^{q(\xi)t_n} - g(h\xi)^n \right| \le n \left| e^{q(\xi)k} - g(h\xi) \right|. \qquad (10.1.7) $$

Another estimate that will be useful is

$$ \left| e^{q(\xi)t_n} - g(h\xi)^n \right| \le 2, \qquad (10.1.8) $$

which is trivial since |z| and |g| are both at most 1, by (10.1.5).
We now use estimate (10.1.3) together with (10.1.7) in the first integral on the right-hand side of (10.1.6):

$$ \begin{aligned}
\int_{-\pi/h}^{\pi/h} \left| e^{q(\xi)t_n} - g(h\xi)^n \right|^2 \left| \hat u_0(\xi) \right|^2 d\xi
&\le n^2 \int_{-\pi/h}^{\pi/h} \left| e^{q(\xi)k} - g(h\xi) \right|^2 \left| \hat u_0(\xi) \right|^2 d\xi \\
&\le n^2 C^2 k^2 h^{2r} \int_{-\pi/h}^{\pi/h} (1 + |\xi|)^{2\rho} \left| \hat u_0(\xi) \right|^2 d\xi \\
&\le C^2 t_n^2 h^{2r} \| u_0 \|_{H^\rho}^2, \qquad (10.1.9)
\end{aligned} $$

where ‖u_0‖_{H^ρ} is as defined in Section 2.1.


We now need only estimate the last term in (10.1.6). To do this we note that the exponential factor is bounded by (10.1.5), and in the range of the integral we have 1 ≤ |ξ|h/π, so

$$ \begin{aligned}
\int_{|\xi|>\pi/h} \left| e^{q(\xi)t_n}\, \hat u_0(\xi) \right|^2 d\xi
&\le \int_{|\xi|>\pi/h} \left| \hat u_0(\xi) \right|^2 d\xi \\
&\le \left( \frac{h}{\pi} \right)^{2\rho} \int_{|\xi|>\pi/h} |\xi|^{2\rho} \left| \hat u_0(\xi) \right|^2 d\xi \le C_2 h^{2\rho} \| u_0 \|_{H^\rho}^2. \qquad (10.1.10)
\end{aligned} $$

Combining (10.1.6), (10.1.9), and (10.1.10) we obtain the basic estimate

$$ \| u(t_n, \cdot) - Sv^n \| \le C(t) h^r \| u_0 \|_{H^\rho}, \qquad (10.1.11) $$

which implies (10.1.4) and proves Theorem 10.1.2.


As we remarked earlier, the choice of initial function for the scheme and the comparison of u with Sv^n in (10.1.4) are not natural in a computational setting. We now consider the consequences of using u_0(mh) instead of (Tu_0)_m as the initial function.

The Evaluation Operator

Given a continuous initial function u_0(x) for a partial differential equation, it is natural to take the values u_0(mh) as the discrete initial function for the scheme. Indeed, this is what has been done in each of the computational examples in this text and in the exercises. This is a mapping taking functions defined on the real line R to functions defined on the grid hZ; we call it the evaluation operator and use the symbol E. Thus for a function u(x) the evaluation operator is defined by

$$ Eu_m = u(mh). $$

Notice that the evaluation operator E cannot be defined for all functions in L²(R), since functions in L²(R) are equivalent if they differ only on a set of measure zero (see Appendix B). Since the grid hZ is a set of measure zero, the evaluation of L²(R) functions is not well defined. As we will see, the evaluation operator can be defined for functions that have some degree of smoothness.
Our first goal is to find the discrete Fourier transform of Eu. We have

$$ Eu_m = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{imh\xi}\, \hat u(\xi)\, d\xi. $$

To get this in the form of the Fourier inversion formula for a discrete function, the integral must be over the interval [−π/h, π/h]; we can do this using the periodicity of e^{imhξ}:

$$ \begin{aligned}
\int_{-\infty}^{\infty} e^{imh\xi}\, \hat u(\xi)\, d\xi
&= \sum_{\ell=-\infty}^{\infty} \int_{(2\pi\ell-\pi)h^{-1}}^{(2\pi\ell+\pi)h^{-1}} e^{imh\xi}\, \hat u(\xi)\, d\xi \\
&= \sum_{\ell=-\infty}^{\infty} \int_{-\pi/h}^{\pi/h} e^{imh\xi}\, \hat u(\xi + 2\pi\ell h^{-1})\, d\xi \\
&= \int_{-\pi/h}^{\pi/h} e^{imh\xi} \sum_{\ell=-\infty}^{\infty} \hat u(\xi + 2\pi\ell h^{-1})\, d\xi.
\end{aligned} $$

We conclude that

$$ \widehat{Eu}(\xi) = \sum_{\ell=-\infty}^{\infty} \hat u(\xi + 2\pi\ell h^{-1}). \qquad (10.1.12) $$

Formula (10.1.12) illustrates the idea of aliasing of Fourier modes. In sampling the
function u at the discrete points xm , we are unable to distinguish frequencies that differ
by multiples of 2π/h.
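
Aliasing is easy to exhibit on a grid: two exponentials whose frequencies differ by 2π/h are indistinguishable at the grid points. A small sketch (our own):

```python
import numpy as np

h = 0.25
m = np.arange(-8, 9)
xi = 3.0                        # any frequency
xi_alias = xi + 2 * np.pi / h   # shifted by one period of the dual variable

# exp(i*xi_alias*m*h) = exp(i*xi*m*h) * exp(2*pi*i*m): identical on the grid.
print(np.abs(np.exp(1j * xi * m * h) - np.exp(1j * xi_alias * m * h)).max())
```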
We now wish to compare Eu with Tu; both of these operations take functions on R
to functions on hZ. Observe that the Fourier transform of Tu is just the term in (10.1.12)

with ℓ equal to 0. We have, therefore, for some positive values of σ,

$$ \begin{aligned}
\left| \widehat{Eu}(\xi) - \widehat{Tu}(\xi) \right|^2
&= \Bigl| \sum_{\ell=-\infty}^{\infty}{}'\, \hat u(\xi + 2\pi\ell h^{-1}) \Bigr|^2 \\
&\le \left( \sum_{\ell=-\infty}^{\infty}{}' \left| \hat u(\xi + 2\pi\ell h^{-1}) \right|^2 \left| \xi + 2\pi\ell h^{-1} \right|^{2\sigma} \right)
\left( \sum_{\ell=-\infty}^{\infty}{}' \left| \xi + 2\pi\ell h^{-1} \right|^{-2\sigma} \right),
\end{aligned} $$

where the prime on the Σ means the term for ℓ equal to 0 is not taken in the sum. Since |ξ| ≤ πh^{−1}, the second factor can be bounded by

$$ 2 \sum_{\ell=1}^{\infty} \left( (2\ell - 1)\pi h^{-1} \right)^{-2\sigma} = 2 \left( \frac{h}{\pi} \right)^{2\sigma} \sum_{\ell=1}^{\infty} \frac{1}{(2\ell - 1)^{2\sigma}} = h^{2\sigma} C(\sigma)^2, $$

which is finite when σ > 1/2.


We then have

$$ \begin{aligned}
\| Eu - Tu \|_h^2
&\le C(\sigma)^2 h^{2\sigma} \int_{-\pi/h}^{\pi/h} \sum_{\ell=-\infty}^{\infty}{}' \left| \hat u(\xi + 2\pi\ell h^{-1}) \right|^2 \left| \xi + 2\pi\ell h^{-1} \right|^{2\sigma} d\xi \\
&= C(\sigma)^2 h^{2\sigma} \int_{|\xi| \ge \pi/h} \left| \hat u(\xi) \right|^2 |\xi|^{2\sigma}\, d\xi \\
&\le C(\sigma)^2 h^{2\sigma} \| D^\sigma u \|^2.
\end{aligned} $$

We collect this result in a theorem.

Theorem 10.1.3. If D^σ u exists for σ > 1/2, then

$$ \| Eu - Tu \|_h \le C(\sigma) h^{\sigma} \| D^\sigma u \|. $$

Theorem 10.1.3 shows that if a function has "more than half a derivative," i.e., is in H^σ for σ > 1/2, then the evaluation operator can be defined. Later we show that it can also be defined for a special class of functions that are in H^σ for σ less than 1/2 but not in H^{1/2}.
We now consider a stable finite difference approximation for the partial differential equation (10.1.2) with initial function v_m^0 for the scheme equal to u_0(mh). In addition, consider the finite difference solution w_m^n with w_m^0 = (Tu_0)_m. We wish to estimate the difference between Eu_m^n, which is u(nk, mh), and v_m^n. We have

$$ \| Eu^n - v^n \|_h \le \| Eu^n - Tu^n \|_h + \| Tu^n - w^n \|_h + \| w^n - v^n \|_h. \qquad (10.1.13) $$

By Theorem 10.1.3 we can estimate the first term on the right-hand side using σ equal to r. We have

$$ \| Eu^n - Tu^n \|_h \le C(r) h^r \| D^r u(t_n, \cdot) \| \le C_T C(r) h^r \| D^r u(t_0, \cdot) \| $$

because the initial value problem for the partial differential equation is well-posed.
We can estimate the last term using Theorem 10.1.3 and the stability estimate (1.5.1). We have, for nk equal to t,

$$ \| w^n - v^n \|_h \le C_t \| w^0 - v^0 \|_h = C_t \| Tu_0 - Eu_0 \|_h \le C_t C(r) h^r \| D^r u_0 \|. $$

We now consider the middle term on the right-hand side of (10.1.13). First we use Parseval's relation for discrete functions and then Theorem 10.1.2:

$$ \begin{aligned}
\| Tu^n - w^n \|_h^2 &= \int_{-\pi/h}^{\pi/h} \left| \hat u(nk, \xi) - \hat w^n(\xi) \right|^2 d\xi \\
&= \int_{-\infty}^{\infty} \left| \hat u(nk, \xi) - \widehat{Sw^n}(\xi) \right|^2 d\xi - \int_{|\xi| \ge \pi/h} \left| \hat u(nk, \xi) \right|^2 d\xi \\
&\le \| u(t_n, \cdot) - Sw^n \|^2 \le C(t_n) h^{2r} \| u_0 \|_{H^\rho}^2
\end{aligned} $$

by (10.1.11). In this way we pass from a norm on the grid to one on the line.
Combining this with Theorem 10.1.2, we obtain the next theorem.

Theorem 10.1.4. If the initial value problem for a partial differential equation of the form (10.1.2), for which the initial value problem is well-posed, is approximated by a stable one-step finite difference scheme that is accurate of order [r, ρ] with ρ > 1/2 and r ≤ ρ, and the initial function v_m^0 is equal to u_0(mh), where u_0 is in H^ρ, then for each positive time T, there is a constant C_T such that

$$ \| Eu(t_n, \cdot) - v^n \|_h \le C_T h^r \| u_0 \|_{H^\rho} $$

for each t_n = nk with 0 ≤ t_n ≤ T and (h, k) in Λ.

Table 10.1.1 shows the result of the periodic initial value problem

$$ u_t + u_x = 0 $$

on the interval [−1, 1] with initial data u_0(x) = sin 2πx using the Lax–Wendroff scheme with λ = 0.9 and the error at time 2.7. As shown, for this infinitely differentiable solution, the convergence rate is order 2 in both the L2 and maximum norms; see Exercise 10.1.10.

Table 10.1.1
Second-order convergence for the Lax–Wendroff scheme.

                 L2 convergence          L∞ convergence
    h            Error       Rate        Error       Rate
    1/10         1.982e-1                1.969e-1
    1/20         5.239e-2    1.92        5.223e-2    1.91
    1/40         1.324e-2    1.98        1.323e-2    1.98
    1/80         3.334e-3    1.99        3.334e-3    1.99
    1/160        8.536e-4    1.97        8.543e-4    1.96
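
The entries of Table 10.1.1 can be reproduced with a short program. The following sketch (our own implementation; periodic indexing is handled with np.roll) runs the Lax–Wendroff scheme with λ = 0.9 to t = 2.7 and computes the discrete L2 error against the exact solution sin 2π(x − t); the printed errors and rates should be close to those in the table:

```python
import numpy as np

def lax_wendroff_error(h, lam=0.9, a=1.0, T=2.7):
    k = lam * h
    x = np.arange(-1.0, 1.0, h)              # periodic grid on [-1, 1)
    v = np.sin(2 * np.pi * x)
    n = int(round(T / k))
    for _ in range(n):
        vp, vm = np.roll(v, -1), np.roll(v, 1)
        v = v - 0.5 * a * lam * (vp - vm) \
              + 0.5 * (a * lam) ** 2 * (vp - 2 * v + vm)
    err = v - np.sin(2 * np.pi * (x - a * n * k))
    return np.sqrt(h * np.sum(err ** 2)), np.abs(err).max()

prev = None
for h in (1/10, 1/20, 1/40, 1/80, 1/160):
    l2, linf = lax_wendroff_error(h)
    rate = "" if prev is None else f"{np.log2(prev / l2):5.2f}"
    print(f"h = 1/{round(1/h):<4d} L2 error = {l2:.3e}  rate = {rate}")
    prev = l2
```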

Similar results are displayed in Table 3.1.1 for the forward-time central-space and
leapfrog schemes. Results for multistep schemes, such as the leapfrog scheme, are discussed
in Section 10.6.
We now complete this section by proving Theorem 10.1.1.

Proof of Theorem 10.1.1. This proof is in two parts. In the first part we obtain an estimate of the form

$$ \frac{e^{kq(\xi)} - g(h\xi, k, h)}{k} = O(h^r) \qquad (10.1.14) $$

for each value of ξ, and then in the second part we prove the existence of the value of ρ.
Since the differential equation is first order in the time differentiation, its symbol p(s, ξ) is linear in s and may be written as

$$ p(s, \xi) = q_0(\xi) s + q_1(\xi). $$

The value of q(ξ) in (10.1.2) is then

$$ q(\xi) = -q_0(\xi)^{-1} q_1(\xi). $$

By Definition 3.1.2 with φ = e^{ixξ}e^{q(ξ)t} or Corollary 3.1.2 with s equal to q(ξ), we have that

$$ p_{k,h}(q(\xi), \xi) - r_{k,h}(q(\xi), \xi)\, p(q(\xi), \xi) = O(h^r). $$

Since q(ξ) is the unique root of p(s, ξ), we have p(q(ξ), ξ) = 0, and we obtain

$$ p_{k,h}(q(\xi), \xi) = O(h^r). $$

Moreover, g(hξ) is the solution to

$$ p_{k,h}\left( k^{-1} \ln g(h\xi), \xi \right) = k^{-1} G(g, h\xi) = 0, \qquad (10.1.15) $$

where G(g, hξ) is the amplification polynomial defined in Section 4.2. Thus

$$ p_{k,h}\left( k^{-1} \ln g(h\xi), \xi \right) - p_{k,h}(q(\xi), \xi) = O(h^r). $$

By the implicit function theorem (see [3] or [8]), this last relation implies that

$$ k^{-1} \ln g(h\xi) - q(\xi) = O(h^r), $$

and this gives the formula

$$ g(h\xi) = e^{kq(\xi)} e^{O(kh^r)}, $$

from which we obtain (10.1.14).
We next use the well-posed nature of the initial value problem and the accuracy of the scheme to prove that a value of ρ exists. We begin with the Taylor series with remainder for the exponential function for complex variables in the form

$$ e^z = \sum_{j=0}^{J-1} \frac{z^j}{j!} + \frac{z^J}{(J-1)!} \int_0^1 (1-t)^{J-1} e^{tz}\, dt. $$

From this we obtain

$$ e^{kq(\xi)} = \sum_{j=0}^{J-1} \frac{k^j q(\xi)^j}{j!} + \frac{k^J q(\xi)^J}{(J-1)!} \int_0^1 (1-t)^{J-1} e^{tkq(\xi)}\, dt. $$

We also have that g(θ) is a ratio of finite trigonometric sums and thus is infinitely differentiable for real values of θ. We can expand g(θ) as a Taylor series in θ as

$$ g(\theta) = \sum_{i=0}^{I-1} \frac{\theta^i}{i!} g^{(i)}(0) + \frac{\theta^I}{I!} g^{(I)}(\theta') $$

or

$$ g(h\xi) = \sum_{i=0}^{I-1} \frac{h^i \xi^i}{i!} g^{(i)}(0) + \frac{h^I \xi^I}{I!} g^{(I)}(\theta'). $$

Substituting these expansions for e^{kq(ξ)} and g(hξ) into (10.1.14), we obtain

$$ \begin{aligned}
\frac{e^{kq(\xi)} - g(h\xi, k, h)}{k}
&= \sum_{j=0}^{J-1} \frac{1}{j!} k^{j-1} q(\xi)^j - \sum_{i=0}^{I-1} \frac{k^{-1} h^i \xi^i}{i!} g^{(i)}(0) \\
&\quad + \frac{k^{J-1} q(\xi)^J}{(J-1)!} \int_0^1 (1-t)^{J-1} e^{tkq(\xi)}\, dt - \frac{k^{-1} h^I \xi^I}{I!} g^{(I)}(\theta') = O(h^r). \qquad (10.1.16)
\end{aligned} $$

Now, recalling that k = Λ(h), we can choose J and I large enough so that the two sums combine to be O(h^r). Similarly, k^{J−1}, which is Λ(h)^{J−1}, and k^{−1}h^I, which is Λ(h)^{−1}h^I, can be made to be O(h^r). The value of g^{(I)}(θ′) is then bounded by some constant, and the integral is also bounded independently of ξ because the equation is well-posed. Finally, q(ξ) is a rational function of ξ, and so the growth of (10.1.16) in ξ is at most polynomial. Thus (10.1.16) is bounded by h^r times a constant multiple of (1 + |ξ|)^ρ for some nonnegative integer ρ.

Exercises

10.1.1. Show that the forward-time forward-space scheme (1.3.1) for the one-way wave equation is accurate of order [1, 2].

10.1.2. Show that the box scheme (3.2.3) for the one-way wave equation is accurate of order [2, 3].

10.1.3. Show that the forward-time central-space scheme for the one-way wave equation is accurate of order [1, 2].

10.1.4. Show that the forward-time central-space scheme (6.3.1) for the heat equation (6.1.1) is accurate of order [2, 4] when µ is a constant.

10.1.5. Show that the backward-time central-space implicit scheme (6.3.3) for the heat equation is accurate of order [1, 4] if λ is constant and of order [2, 4] if µ is constant.

10.1.6. Show that if a scheme is accurate of order [r, ρ], according to Definition 10.1.3, then it is accurate also of order [r − 1, ρ − 1].

10.1.7. Show that the Lax–Wendroff-like scheme

$$ \frac{v_m^{n+1} - v_m^n}{k} + a \delta_0 v_m^n - \frac{a^2 k}{2} \delta^2 v_m^n + v_m^n = 0 $$

for the equation

$$ u_t + a u_x + u = 0 $$

is accurate of order [1, 1] when λ is held constant.
10.1.8. Show that the scheme

$$ (1 - \delta^2)\, \frac{v_m^{n+1} - v_m^n}{k} = a\, \delta_0\, \frac{v_m^{n+1} + v_m^n}{2} $$

for the equation

$$ u_t - u_{txx} = a u_x \qquad (10.1.17) $$

is unconditionally stable and accurate of order [2, 3] if λ is constant.

10.1.9. Show that the scheme

$$ (1 - \delta^2)\, \frac{v_m^{n+1} - v_m^n}{k} = a\, \delta_0 v_m^n $$

for (10.1.17) is unconditionally stable and accurate of order [1, 0] if λ is a constant.


10.1.10. Show that if a stable one-step scheme for u_t + au_x = 0 is accurate of order [r, ρ], then

$$ |v_m^n - u(t_n, mh)| \le C(t_n)\, h^r \int_{-\infty}^{\infty} |\xi|^{\rho}\, |\hat u_0(\xi)|\, d\xi, \qquad (10.1.18) $$

where t_n = nk and when the initial function is Tu_0. To obtain estimates of convergence in the L∞ norm, use the simple inequality

$$ |f(x)| \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |\hat f(\xi)|\, d\xi $$

in place of Parseval's inequality. Hint: The proof is similar to those in the text for the L2 results. Note that there is no Parseval's relation for the L∞ norm and that the interpolation operator S is not needed, since v_m^n and u(t_n, mh) can be compared directly.

10.1.11. Prove Theorem 10.1.2 under the more general assumptions that

$$ |e^{tq(\xi)}| \le C_t \quad \text{and} \quad |g(h\xi)| \le 1 + Kk $$

rather than the special case given by (10.1.5).


10.1.12. Prove that the estimate (10.1.4) holds for N-dimensional space.

10.1.13. Prove that for functions in L²(R^N), the evaluation operator and truncation operator satisfy

$$ \| Eu - Tu \|_h \le C(\sigma) h^{\sigma} \| D^\sigma u \| $$

for σ greater than N/2.

10.1.14. Use the matrix identity

$$ Z^n - G^n = \sum_{j=0}^{n-1} Z^{n-1-j} (Z - G) G^j $$

to prove Theorem 10.1.2 for systems of partial differential equations. (See Exercise A.2.9.)

10.2 Related Topics


In this section we consider two topics that are related to those covered in the previous
section. First, we prove the assertions made in Section 5.3 about the group velocity of wave
packets. Second, we present the Poisson summation formula, which is useful in many areas
of applied mathematics. The study of these topics serves to give insight into the techniques
of Fourier analysis and the ideas that we use in this chapter.

Group Velocity and Wave Packets


We now consider the estimate of the finite difference solution of a wave packet, as given in Section 5.3. Recall that we chose for our initial function (5.3.1) evaluated at the grid points, i.e.,

$$ v_m^0 = (Eu_0)_m = e^{i\xi_0 x_m} (Ep)_m. $$

In Section 5.3 we referred to the function Ep as p_h.
The claim in Section 5.3 was that the function v*(t_n, x) given in (5.3.4), i.e.,

$$ v^*(t, x) = e^{i\xi_0 (x - \alpha(h\xi_0) t)}\, p(x - \gamma(h\xi_0) t), \qquad (10.2.1) $$

is a good approximation to v_m^n and, for large values of ξ_0, this is a better approximation to v^n than is u(t_n, ·). We now justify this claim.
We begin with an estimate of ‖v*(t_n, ·) − Sv^n‖:

$$ \| v^*(t_n, \cdot) - Sv^n \| \le \| v^*(t_n, \cdot) - \tilde v(t_n, \cdot) \| + \| \tilde v(t_n, \cdot) - Sv^n \|, $$


where ṽ is defined by (5.3.8), i.e.,

$$ \tilde v(t, x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i(\omega + \xi_0)x}\, e^{-i(\omega + \xi_0)\alpha(h\omega + h\xi_0)t}\, \hat p(\omega)\, d\omega. \qquad (10.2.2) $$

First, we estimate the difference between ṽ and Sv^n:

$$ \begin{aligned}
\| \tilde v(t_n, \cdot) - Sv^n \|^2 &= \int_{-\pi/h}^{\pi/h} |\hat p(\omega) - \hat p_h(\omega)|^2\, d\omega + \int_{h|\xi| \ge \pi} |\hat p(\omega)|^2\, d\omega \\
&= \| p - SEp \|^2 \le \left( c_r h^r \| p \|_{H^r} \right)^2 \qquad (10.2.3)
\end{aligned} $$

for any r greater than 1/2 (see Exercise 10.2.6). This justifies the claim, given after the definition of ṽ in (5.3.8), that the replacement of v by ṽ is not a significant error.
We next examine the difference between ṽ and v*. We have

$$ \| v^*(t_n, \cdot) - \tilde v(t_n, \cdot) \|^2 = \int_{-\infty}^{\infty} |e^{ihr(\omega)t_n} - 1|^2\, |\hat p(\omega)|^2\, d\omega $$

from the formulas (5.3.10) and (5.3.11). Moreover, since

$$ r(\omega) = \frac{1}{2}\, \omega^2 \left. \frac{d^2\, \theta\alpha(\theta)}{d\theta^2} \right|_{\theta = \theta^*} $$

for some value of θ* and |θ| ≤ π, we have that

$$ |r(\omega)| \le c |\omega|^2 $$

for some constant c. Thus

$$ \| v^*(t_n, \cdot) - \tilde v(t_n, \cdot) \|^2 \le \int_{-\infty}^{\infty} c^2 t_n^2 h^2 |\omega|^4\, |\hat p(\omega)|^2\, d\omega \le c^2 t_n^2 h^2 \left( \| p \|_{H^2} \right)^2. $$

Combining this estimate with (10.2.3) we obtain

$$ \| v^*(t_n, \cdot) - Sv^n \| \le C(t_n)\, h\, \| p \|_{H^2}. \qquad (10.2.4) $$

This estimate shows that v* is a good approximation to the solution of the finite difference scheme.
We now can explain how the first-order approximation in (10.2.4) can be a better approximation than the higher order approximation (10.1.4) and the related estimates. The explanation is that the estimate (10.2.4) is essentially independent of ξ_0. In the general estimate (10.1.4), if the initial function u_0 is a wave packet such as (5.3.1), then ‖(d/dx)^ρ u_0‖ contains terms proportional to |ξ_0|^ρ ‖p‖. Therefore, the quantity h^r ‖u_0‖_{H^ρ} will not be small if hξ_0 is not small, whereas h‖p‖_{H²} can be small independently of ξ_0.

The Poisson Summation Formula


A very interesting formula related to the topics of this chapter is the Poisson summation
formula. This formula is useful in many areas of applied mathematics, although we will
not need to make explicit use of it. The formula is obtained by considering the evaluation
operator applied to a function and evaluating the Fourier transform of the discrete function
in two ways. We have formula (10.1.12); also, from the definition of the discrete Fourier
transform (2.1.5),


$$ \widehat{Eu}(\xi) = \frac{1}{\sqrt{2\pi}} \sum_{m=-\infty}^{\infty} e^{-imh\xi}\, u(mh)\, h $$

for the Fourier transform of Eu. Equating these two expressions, we have the Poisson summation formula

$$ \frac{1}{\sqrt{2\pi}} \sum_{m=-\infty}^{\infty} e^{-imh\xi}\, u(mh)\, h = \sum_{\ell=-\infty}^{\infty} \hat u(\xi + 2\pi\ell h^{-1}), \qquad (10.2.5) $$

which relates a sum of the function values to a sum of the Fourier transform values. This formula is valid whenever both infinite summations are convergent.
One use of the Poisson summation formula arises when u is a slowly decreasing,
smooth function, making the series on the left-hand side of (10.2.5) converge slowly. Then
the transform û may be a more rapidly decreasing function, making the series on the right-
hand side converge rapidly, and vice versa. Another use of the summation formula is when
one of the two series is more amenable to obtaining an explicit formula for the summation.
Often, quite difficult sums can be explicitly evaluated in this way.

Example 10.2.1. We illustrate the use of the Poisson summation formula with the function u(x) = e^{−ax²/2}. The Fourier transform of this function is û(ω) = a^{−1/2} e^{−ω²/2a}, and applying the Poisson summation formula (10.2.5) with ξ equal to 0 and h equal to 1 gives

$$ \frac{1}{\sqrt{2\pi}} \sum_{m=-\infty}^{\infty} e^{-am^2/2} = \frac{1}{\sqrt{a}} \sum_{\ell=-\infty}^{\infty} e^{-(2\ell\pi)^2/2a} $$

or, if we set b = a/2π,

$$ \sqrt{b} \sum_{m=-\infty}^{\infty} e^{-b\pi m^2} = \sum_{\ell=-\infty}^{\infty} e^{-\pi\ell^2/b}. $$

If b is small, then the sum on the left-hand side will converge slowly, but the sum on the right-hand side will converge rapidly. Thus the quantity represented by these sums can be evaluated very efficiently for all values of the parameter b. Notice that this same formula can be obtained from (10.2.5) by setting h equal to √(2π) and ξ equal to 0.
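
This transformation law is easy to confirm numerically; a sketch (our own):

```python
import numpy as np

def theta_sums(b, terms=60):
    m = np.arange(-terms, terms + 1)
    lhs = np.sqrt(b) * np.sum(np.exp(-b * np.pi * m ** 2))
    rhs = np.sum(np.exp(-np.pi * m ** 2 / b))
    return lhs, rhs

for b in (0.1, 1.0, 10.0):
    lhs, rhs = theta_sums(b)
    print(f"b = {b:5.1f}   lhs = {lhs:.12f}   rhs = {rhs:.12f}")
```

For small b the left-hand sum needs many terms while the right-hand sum converges after one or two, which is exactly the computational advantage described above.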

Example 10.2.2. We illustrate a second use of the Poisson summation formula with the function

$$ u(x) = \begin{cases} 1 - |x|/a & \text{if } |x| \le a, \\ 0 & \text{if } |x| > a. \end{cases} $$

The transform of u is

$$ \hat u(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-i\omega x} \left( 1 - \frac{|x|}{a} \right) dx = \sqrt{\frac{2}{\pi}} \int_0^a \cos\omega x \left( 1 - \frac{x}{a} \right) dx = \sqrt{\frac{2}{\pi}}\, \frac{2 \sin^2 \tfrac{1}{2} a\omega}{a\omega^2}. $$

We apply the Poisson summation formula (10.2.5) with ξ equal to 0 and h equal to 1, obtaining

$$ \frac{1}{\sqrt{2\pi}} \sum_{|n| \le a} \left( 1 - \frac{|n|}{a} \right) = \sqrt{\frac{2}{\pi}} \sum_{\ell=-\infty}^{\infty} \frac{2 \sin^2 a\ell\pi}{4 a\ell^2\pi^2} $$

or

$$ \sum_{|n| \le a} \left( 1 - \frac{|n|}{a} \right) = \sum_{\ell=-\infty}^{\infty} \frac{\sin^2 a\ell\pi}{a\pi^2\ell^2} = a + 2 \sum_{\ell=1}^{\infty} \frac{\sin^2 a\ell\pi}{a\pi^2\ell^2}. $$

We can then obtain the explicit representation for this last sum:

$$ \sum_{\ell=1}^{\infty} \frac{\sin^2 a\ell\pi}{\ell^2} = \frac{\pi^2}{2} \left[ (a - \lfloor a \rfloor)\left( 1 - (a - \lfloor a \rfloor) \right) \right], $$

where ⌊a⌋ is the greatest integer not larger than a. Thus for 0 ≤ a ≤ 1,

$$ \sum_{\ell=1}^{\infty} \frac{\sin^2 a\ell\pi}{\ell^2} = \frac{a(1 - a)\pi^2}{2}. $$

The Poisson summation formula can be used in a similar fashion to evaluate many
other sums.
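
For instance, the final identity of Example 10.2.2 can be checked directly (our own sketch; the truncation at 2e5 terms leaves a tail of order 1e-6):

```python
import numpy as np

a = 0.3
ell = np.arange(1, 200001)
partial = np.sum(np.sin(a * ell * np.pi) ** 2 / ell ** 2)
print(partial, a * (1 - a) * np.pi ** 2 / 2)   # agree to about the tail size
```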

Exercises
10.2.1. Repeat the computations of Example 5.3.1 and verify the estimate (10.2.4) that v*(t_n, ·) is a better approximation to v^n than v^n is to u(t_n, ·). Compute the norm of the difference between v^n and both v*(t_n, ·) and u(t_n, ·).

10.2.2. Use the Poisson summation formula to evaluate

$$ \sum_{n=0}^{\infty} \frac{1}{n^2 + 1} \quad \text{and} \quad \sum_{n=0}^{\infty} \frac{(-1)^n}{n^2 + 1}. $$

Hint: Consider the Fourier transform of e^{−|x|}.



10.2.3. Use the Poisson summation formula to verify the formula

$$ \sqrt{b} \sum_{m=-\infty}^{\infty} (-1)^m e^{-b\pi m^2} = \sum_{\ell=-\infty}^{\infty} e^{-\pi(\ell + 1/2)^2/b}. $$

10.2.4. Given the relation

$$ \int_{-\infty}^{\infty} \frac{\cos\omega x}{\cosh x}\, dx = \frac{\pi}{\cosh\left( \frac{\pi}{2}\omega \right)}, $$

use the Poisson summation formula to develop an efficient algorithm for computing the value of the function

$$ C(a) = \sum_{n=0}^{\infty} \frac{1}{\cosh(an)} $$

for 0 < a < ∞. Demonstrate your algorithm with a computer program. Hint:

$$ \int_{-\infty}^{\infty} \frac{e^{i\omega x}}{\cosh x}\, dx = \int_{-\infty}^{\infty} \frac{\cos\omega x}{\cosh x}\, dx. $$

10.2.5. Use the function

$$ u(x) = \begin{cases} 1 & \text{if } |x| < 1, \\ \tfrac{1}{2} & \text{if } |x| = 1, \\ 0 & \text{if } |x| > 1 \end{cases} $$

and the Poisson summation formula to prove the relation

$$ \sum_{\ell=-\infty}^{\infty} \frac{\sin 2\pi\ell a}{2\pi\ell} = \begin{cases} \lfloor a \rfloor + \tfrac{1}{2} & \text{if } a \text{ is not an integer,} \\ a & \text{if } a \text{ is an integer.} \end{cases} $$

In the sum the term for ℓ = 0 is evaluated to be a.


10.2.6. Verify the estimate

$$ \| p - SEp \| \le c_r h^r \| p \|_{H^r} $$

that was used in equation (10.2.3).

10.3 Convergence Estimates for Nonsmooth Initial Functions

The convergence estimates of Section 10.1 are valid only if the initial function is sufficiently smooth. Since many applications involve initial functions that are not as smooth as required for the general estimates of Theorem 10.1.2, it is important to obtain estimates for the case when the initial functions are not smooth. The estimates of this section give the convergence rate for the solutions of finite difference schemes with order of accuracy [r, ρ] when the initial function, u_0, has fewer than ρ derivatives in L²(R), i.e., when ‖u_0‖_{H^ρ} is infinite.

The estimates given here are for general one-step schemes and first-order equations of the form (10.1.2) satisfying the restricted stability and well-posedness conditions (10.1.5). In the next section we show that for parabolic equations and dissipative schemes we can improve on the results of this section. Results for second-order equations are given in Section 10.7.
We now modify the proof of Theorem 10.1.2 under the assumption that ‖u_0‖_{H^ρ} is infinite and ‖u_0‖_{H^σ} is finite for some σ less than ρ. Notice that the critical estimate in that proof is given by the estimate (10.1.9). We begin by splitting the first integral in (10.1.6) into an integral with |ξ| less than η and one with |ξ| greater than η, where η is chosen as πh^{−α} with α positive and less than 1.
We then have

$$ \begin{aligned}
\| u(t_n, \cdot) - Sv^n \|^2
&= \int_{-\pi/h}^{\pi/h} \left| e^{q(\xi)nk} - g(h\xi)^n \right|^2 |\hat u_0(\xi)|^2\, d\xi + \int_{|\xi|>\pi/h} \left| e^{q(\xi)nk} \right|^2 |\hat u_0(\xi)|^2\, d\xi \\
&= \int_{-\eta}^{\eta} \left| e^{q(\xi)nk} - g(h\xi)^n \right|^2 |\hat u_0(\xi)|^2\, d\xi
+ \int_{\eta \le |\xi| \le \pi/h} \left| e^{q(\xi)nk} - g(h\xi)^n \right|^2 |\hat u_0(\xi)|^2\, d\xi \\
&\quad + \int_{|\xi|>\pi/h} \left| e^{q(\xi)nk} \right|^2 |\hat u_0(\xi)|^2\, d\xi.
\end{aligned} $$

In this last expression we use (10.1.7) and (10.1.3) on the first term, (10.1.8) on the second, and (10.1.5) on the third to obtain

$$ \begin{aligned}
\| u(t_n, \cdot) - Sv^n \|^2
&\le n^2 C^2 k^2 h^{2r} \int_{-\eta}^{\eta} (1 + |\xi|)^{2\rho} |\hat u_0(\xi)|^2\, d\xi + 4 \int_{|\xi| \ge \eta} |\hat u_0(\xi)|^2\, d\xi \\
&\le n^2 C^2 k^2 h^{2r} (1 + \eta)^{2(\rho - \sigma)} \int_{-\eta}^{\eta} (1 + |\xi|)^{2\sigma} |\hat u_0(\xi)|^2\, d\xi
+ 4 \eta^{-2\sigma} \int_{|\xi| \ge \eta} |\xi|^{2\sigma} |\hat u_0(\xi)|^2\, d\xi \\
&= t^2 C^2 h^{2r - 2(\rho - \sigma)\alpha} (h^{\alpha} + \pi)^{2(\rho - \sigma)} \int_{-\eta}^{\eta} (1 + |\xi|)^{2\sigma} |\hat u_0(\xi)|^2\, d\xi
+ 4 \pi^{-2\sigma} h^{2\sigma\alpha} \int_{|\xi| \ge \eta} |\xi|^{2\sigma} |\hat u_0(\xi)|^2\, d\xi.
\end{aligned} $$

If we choose α equal to r/ρ, then both terms have h to the power 2σr/ρ; this estimate gives the following theorem.

Theorem 10.3.1. If a stable one-step finite difference scheme is accurate of order [r, ρ], with r ≤ ρ, the initial function to the partial differential equation is u_0 with ‖D^σ u_0‖ finite and σ less than ρ, and the initial function v_m^0 is Tu_0, then the solution v^n to the finite difference scheme satisfies

$$ \| u(t_n, \cdot) - Sv^n \| \le C_2 h^{\beta} \| u_0 \|_{H^\sigma}, \qquad (10.3.1) $$

where

$$ \beta = \sigma r / \rho. $$

If σ is greater than 1/2 and the initial function is either Eu_0 or Tu_0, then in addition

$$ \| Eu^n - v^n \|_h \le C_1 h^{\beta} \| u_0 \|_{H^\sigma}. \qquad (10.3.2) $$

Convergence Estimates for Piecewise Smooth Initial Functions


The convergence estimates of Theorem 10.3.1 often cannot be conveniently applied to many functions that are useful in actual computations. For example, the function

$$ u(x) = \begin{cases} 1 & \text{if } |x| < 1, \\ \tfrac{1}{2} & \text{if } |x| = 1, \\ 0 & \text{if } |x| > 1 \end{cases} \qquad (10.3.3) $$

is in H^σ for each σ less than 1/2, but not in H^{1/2} (see Example 10.3.1). Similarly the functions

$$ u(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{if } |x| \ge 1 \end{cases} \qquad (10.3.4) $$

and

$$ u(x) = \begin{cases} \cos^2 \tfrac{1}{2}\pi x & \text{if } |x| \le 1, \\ 0 & \text{if } |x| \ge 1 \end{cases} \qquad (10.3.5) $$

are in H^σ for σ less than 3/2 and 5/2, respectively, but not in H^{3/2} or H^{5/2}, respectively (see Exercises 10.3.1 and 10.3.2). Because the function (10.3.4) is almost but not quite in H^{3/2}, it is difficult to see what value of β should be used in the estimate (10.3.1). We now show that we can take σ equal to the limiting value if the estimate is modified appropriately.
Each of the functions (10.3.3), (10.3.4), and (10.3.5) satisfies the relation

$$ \| u \|_{H^\sigma} \le \frac{C(u)}{\sqrt{\sigma_0 - \sigma}}, \qquad (10.3.6) $$

where σ_0 is 1/2, 3/2, and 5/2 for the three functions (10.3.3), (10.3.4), and (10.3.5), respectively, and C(u) depends on u but not on σ. We demonstrate this only for the function given by (10.3.3). The demonstration for the other two functions is left as exercises (Exercises 10.3.1 and 10.3.2).

Example 10.3.1. For the function (10.3.3) we have

$$ \hat u(\omega) = \sqrt{\frac{2}{\pi}}\, \frac{\sin\omega}{\omega}, $$

so, for σ with 0 ≤ σ < 1/2,

$$ \begin{aligned}
\| u \|_{H^\sigma}^2 &= \int_{-\infty}^{\infty} \frac{2}{\pi} \frac{\sin^2\omega}{\omega^2} (1 + \omega^2)^{\sigma}\, d\omega
\le C \int_0^{\infty} \frac{d\omega}{(1 + \omega^2)^{1 - \sigma}} \\
&= \frac{C}{2} \int_0^{\infty} \frac{1}{(1 + \xi)^{1 - \sigma}} \frac{d\xi}{\xi^{1/2}}
\le \frac{C}{2} \left( \int_0^1 \frac{d\xi}{\xi^{1/2}} + \int_1^{\infty} \frac{d\xi}{(1 + \xi)^{3/2 - \sigma}} \right) \\
&= \frac{C}{2} \left( 2 + \frac{2^{-(1/2 - \sigma)}}{\tfrac{1}{2} - \sigma} \right) \le C^* \frac{1}{\tfrac{1}{2} - \sigma}.
\end{aligned} $$

Thus (10.3.6) is demonstrated for this function, and we also see that function (10.3.3) is in H^σ for each σ less than 1/2.
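
The blow-up rate 1/(1/2 − σ) of ‖u‖²_{H^σ} can be observed numerically. In the sketch below (our own; the cutoff Ω = 2000 and the tail estimate, which replaces sin²ω by its mean 1/2 and (1 + ω²)^σ by ω^{2σ}, are illustrative choices), the product (1/2 − σ)‖u‖²_{H^σ} tends toward a constant as σ → 1/2:

```python
import numpy as np

def hs_norm_sq(sigma, omega_max=2000.0, n=2_000_000):
    w = np.linspace(1e-6, omega_max, n)
    dw = w[1] - w[0]
    f = (2 / np.pi) * (np.sin(w) / w) ** 2 * (1 + w ** 2) ** sigma
    head = 2 * np.sum(f) * dw          # factor 2 for both signs of omega
    # Approximate tail beyond omega_max: sin^2 -> 1/2, (1+w^2)^sigma -> w^(2 sigma).
    tail = (2 / np.pi) * omega_max ** (2 * sigma - 1) / (1 - 2 * sigma)
    return head + tail

for sigma in (0.30, 0.40, 0.45, 0.48):
    print(sigma, (0.5 - sigma) * hs_norm_sq(sigma))
```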

When the initial function satisfies (10.3.6) we apply Theorem 10.3.1 with σ equal to σ_0 − |ln h|^{−1}. First notice that h^{−(r/ρ)|ln h|^{−1}} = e^{−(r/ρ) \ln h\, |\ln h|^{−1}} = e^{r/ρ}, and so for β = σr/ρ and β_0 = σ_0 r/ρ,

$$ h^{\beta} = h^{\beta_0} h^{-(r/\rho)|\ln h|^{-1}} = e^{r/\rho} h^{\beta_0}. $$

In this way we obtain, from (10.3.1),

$$ \| u(t_n, \cdot) - Sv^n \| \le C_2\, e^{r/\rho}\, h^{\beta_0} |\ln h|^{1/2}\, C(u_0). $$

We state this result formally as a corollary to Theorem 10.3.1.

We state this result formally as a corollary to Theorem 10.3.1.

Corollary 10.3.2. If the initial function u0 satisfies (10.3.6), then estimate (10.3.1) in
Theorem 10.3.1 may be replaced by

u(tn , ·) − Sv n  ≤ C2 hβ0 | ln h|1/2 C(u0 ), (10.3.7)

where β0 = σ0 r/ρ.

In computations to check the order of accuracy of solutions, the factor of |ln h|^{1/2} in the estimate (10.3.7) is difficult to verify. For the order of accuracy we usually obtain only the exponent β_0, and the factor involving ln h is not noticed (see Exercise 10.3.6).

Evaluation of Piecewise Smooth Functions


In many practical computations the initial function is piecewise differentiable with several jump discontinuities. As an example, the function (10.3.3) is not in H^{1/2}, and the evaluation operator is not defined for all functions in H^{1/2}. However, as we will show, the evaluation operator can be extended to function (10.3.3) and to many other functions of common occurrence. In this section we show how the results of Theorem 10.1.3 can be extended to cover these functions.
We begin by considering the function

$$ \gamma(x) = \begin{cases} 0 & \text{if } x = 0, \\ \tfrac{1}{2} e^{-x} & \text{if } x > 0, \\ -\tfrac{1}{2} e^{-|x|} & \text{if } x < 0. \end{cases} $$

We will show that even though it is not in H^{1/2}, the evaluation operator can still be applied to it. The Fourier transform of γ is

$$ \hat\gamma(\omega) = \frac{1}{\sqrt{2\pi}}\, \frac{-i\omega}{1 + \omega^2}, $$

and by Definition 10.1.1 and (10.1.12),



$$ \begin{aligned}
\widehat{E\gamma}(\xi) - \widehat{T\gamma}(\xi)
&= \sum_{\ell=-\infty}^{\infty}{}'\, \hat\gamma(\xi + 2\pi\ell h^{-1}) \\
&= \frac{-i}{\sqrt{2\pi}} \sum_{\ell=1}^{\infty} \left( \frac{\xi + 2\pi\ell h^{-1}}{1 + (\xi + 2\pi\ell h^{-1})^2} + \frac{\xi - 2\pi\ell h^{-1}}{1 + (\xi - 2\pi\ell h^{-1})^2} \right) \\
&= \frac{-i}{\sqrt{2\pi}} \sum_{\ell=1}^{\infty} \frac{2\xi \left( 1 + \xi^2 - (2\pi\ell h^{-1})^2 \right)}{\left( 1 + (\xi + 2\pi\ell h^{-1})^2 \right)\left( 1 + (\xi - 2\pi\ell h^{-1})^2 \right)},
\end{aligned} $$

where the prime on the first sum means that the term for ℓ equal to 0 is not taken in the sum. Therefore, for ξ satisfying |ξ| ≤ πh^{−1},

$$ \left| \widehat{E\gamma}(\xi) - \widehat{T\gamma}(\xi) \right| \le |\xi| \sqrt{\frac{2}{\pi}} \sum_{\ell=1}^{\infty} \frac{h^2 (1 + 5\ell^2)}{\pi^2 (2\ell - 1)^4} \le C h^2 |\xi|, $$

and hence

$$ \| E\gamma - T\gamma \|_h^2 \le C^2 h^4 \int_{-\pi/h}^{\pi/h} |\xi|^2\, d\xi = C^2 h^4\, \frac{2}{3} \left( \frac{\pi}{h} \right)^3 = C_0^2 h. $$

So for the function γ we have

$$ \| E\gamma - T\gamma \|_h \le C_0 h^{1/2}. $$

Notice that γ is not in H^{1/2}, so this estimate for γ is stronger than the general result of Theorem 10.1.3. Also notice that this estimate is valid for translates of the function γ.
Now we extend this result to any function u(x) that is a piecewise differentiable function except for a finite number of jump discontinuities. That is, there are a finite number of points x_1 < x_2 < ··· < x_L such that in each interval (x_i, x_{i+1}) the function u(x) is differentiable. Let [u](x_i) be the jump in u at x_i, i.e.,

$$ [u](x_i) = \lim_{\varepsilon \to 0^+} \left[ u(x_i + \varepsilon) - u(x_i - \varepsilon) \right]. $$

(The notation ε → 0^+ means that ε is restricted to positive values as it tends to 0.) We take for u(x_i) the average of the values on either side, i.e.,

$$ u(x_i) = \tfrac{1}{2} \lim_{\varepsilon \to 0^+} \left[ u(x_i + \varepsilon) + u(x_i - \varepsilon) \right]. $$

Finally, we assume that

$$ \int_{|x|>K} |u(x)|^2 + |Du(x)|^2\, dx \qquad (10.3.8) $$

is finite, where K is larger than |x_1| and |x_L|.
We now consider the function

$$ u_1(x) = \sum_{i=1}^{L} [u](x_i)\, \gamma(x - x_i) $$

and the function u_2(x) = u(x) − u_1(x). Notice that u_1 has precisely the same jump discontinuities as does u; therefore, u_2 is continuous and in H¹; see Exercise 10.3.3. (Recall that u_0 is piecewise differentiable.) We have

$$ \| Eu_1 - Tu_1 \|_h \le \sum_{i=1}^{L} |[u](x_i)|\, \| E\gamma(\cdot - x_i) - T\gamma(\cdot - x_i) \|_h \le \sum_{i=1}^{L} |[u](x_i)|\, h^{1/2} C_0. $$

Using this result for u_1 and Theorem 10.1.3 on u_2, we obtain

$$ \| Eu - Tu \|_h \le \| Eu_2 - Tu_2 \|_h + \| Eu_1 - Tu_1 \|_h \le C(1) h \| Du_2 \| + \sum_{i=1}^{L} |[u](x_i)|\, h^{1/2} C_0 \le C^*(u)\, h^{1/2}. $$

We state this result as a corollary to Theorem 10.3.1.

Corollary 10.3.3. If the initial function u_0 of Theorem 10.3.1 is a piecewise differentiable function except for a finite number of jump discontinuities and if (10.3.8) is finite for u_0, then the estimate (10.3.2) can be replaced by

$$ \| Eu^n - v^n \|_h \le |\ln h|^{1/2} h^{\beta} C(u_0), $$

where β = r/(2ρ) and C(u_0) depends only on u_0.

Exercise 10.3.6 provides demonstrations of the estimates of Theorem 10.3.1 and Corollary 10.3.3.
Table 10.3.1 shows the result of the periodic initial value problem for the same equation but with the initial data

$$ u_0(x) = \begin{cases} 1 - 2|x| & \text{if } |x| \le \tfrac{1}{2}, \\ 0 & \text{otherwise} \end{cases} $$

using the Lax–Wendroff scheme for λ = 0.9 and the error measured at time 2.7. As shown, for this solution, for which σ = 3/2, the convergence rates are order 1 for the L2 convergence and order 2/3 for max-norm convergence (see Exercise 10.3.4).
The results of Table 3.1.2 also illustrate these ideas. There the value of σ is 3/2; the rate of convergence for the Lax–Wendroff scheme is 1 and for the Lax–Friedrichs scheme it is 3/4.

Table 10.3.1
Convergence for the Lax–Wendroff scheme.

                 L2 convergence          L∞ convergence
    h            Error       Rate        Error       Rate
    1/10         6.603e-2                1.269e-1
    1/20         3.347e-2    0.980       8.607e-2    0.560
    1/40         1.671e-2    1.002       5.477e-2    0.652
    1/80         8.452e-3    0.984       3.485e-2    0.652
    1/160        4.295e-3    0.977       2.209e-2    0.658

Exercises

10.3.1. Verify that the relation (10.3.6) holds for the function (10.3.4) with σ_0 equal to 3/2.

10.3.2. Verify that the relation (10.3.6) holds for the function (10.3.5) with σ_0 equal to 5/2.

10.3.3. Show that the function u_2 used in the proof of Corollary 10.3.3 is in H¹.

10.3.4. Show that if a stable one-step scheme for u_t + au_x = 0 is accurate of order [r, ρ], then

$$ |v_m^n - u(t_n, mh)| \le C(t_n)\, h^{\beta} \int_{-\infty}^{\infty} |\xi|^{\sigma}\, |\hat u_0(\xi)|\, d\xi, $$

where t_n = nk and β = rσ/ρ, when the initial function is Tu_0. See Exercise 10.1.10.

10.3.5. Show that if r > ρ, then the value of β in estimate (10.3.1) must be σ.

10.3.6. Solve the initial value problems for the one-way wave equation u_t + u_x = 0 on the interval [−1, 1] with periodic boundary conditions up to t = 0.96. Use the Lax–Wendroff scheme with λ = 0.8 and grid spacing h = 1/10, 1/20, 1/40, and 1/80. Use the following initial functions:

(a) u_0(x) = 1 if |x| < 1/2, 1/2 if |x| = 1/2, 0 otherwise;

(b) u_0(x) = cos(πx);

(c) u_0(x) = cos²(πx) if |x| ≤ 1/2, 0 otherwise.

Determine the rates of convergence of the numerical solution to the exact solution as a function of h. Compare the actual rates of convergence with those obtained in this section. Discuss the results. Note that the convergence rate obtained for (a) can be quite sensitive to the method of programming.
10.3.7. Repeat the computations of the previous exercise using the Lax–Friedrichs scheme.

10.4 Convergence Estimates for Parabolic Differential Equations
The estimates of Theorems 10.1.1 and 10.3.1 are fairly sharp for hyperbolic equations
but unnecessarily pessimistic for many schemes for parabolic equations. Because of the
smoothing inherent in parabolic equations, it is reasonable to believe that nonsmooth initial
functions should not seriously degrade the rate of convergence of the finite difference
solution to the solution of the differential equations. Indeed, for dissipative schemes the
convergence rates are much better than those given by Theorem 10.3.1 and Corollaries
10.3.2 and 10.3.3.

Theorem 10.4.1. If a one-step scheme that approximates an initial value problem for a parabolic equation is accurate of order [r, ρ], for ρ ≥ r + 2, and dissipative of order 2, with µ a constant and with µ = kh^{−2}, then for each time T there is a constant C_T such that for any t with nk = t ≤ T and (h, k) in Λ,

$$ \| u(t, \cdot) - Sv^n \| \le C_T \left( 1 + t^{-(\rho-1)/2} \right) h^r \| u_0 \| \qquad (10.4.1) $$

and

$$ \| Eu^n - v^n \|_h \le C_T \left( 1 + t^{-(\rho-1)/2} \right) h^r \| u_0 \|. \qquad (10.4.2) $$

Notice that these estimates require only that u_0 be in L²(R), which, for our purposes, places no requirement at all on the smoothness of u_0.

Proof. To obtain the sharper estimates (10.4.1) and (10.4.2), we begin with sharper estimates than we used in the proof of Theorem 10.3.1. In place of (10.1.5) we use

$$ |e^{tq(\xi)}| \le e^{Kt} e^{-c\xi^2 t}, \qquad (10.4.3) $$

which holds for parabolic equations (see Definition 9.2.1) and, since µ is constant and the scheme is dissipative,

$$ |g(h\xi)| \le \left( 1 - c_0 \sin^2 \tfrac{1}{2} h\xi \right)(1 + kK) \le e^{Kk} e^{-c\xi^2 k}. \qquad (10.4.4) $$

The values of c and K can be taken to be the same in both (10.4.3) and (10.4.4). In (10.4.4) the value of |hξ| is at most π.
Using these estimates rather than (10.1.5), we obtain in place of (10.1.7)

$$ \begin{aligned}
|e^{q(\xi)t} - g(h\xi)^n| &\le n\, |e^{q(\xi)k} - g(h\xi)|\, \max\left( |e^{q(\xi)k}|, |g(h\xi)| \right)^{n-1} \\
&\le n\, |e^{q(\xi)k} - g(h\xi)|\, e^{Kt} e^{-c\xi^2 (t - k)} \\
&\le n\, e^{c\pi^2\mu}\, |e^{q(\xi)k} - g(h\xi)|\, e^{Kt} e^{-c\xi^2 t},
\end{aligned} $$

where we have used that e^{cξ²k} ≤ e^{cπ²µ} for |hξ| at most π. Then in place of (10.1.9) we have

$$ \int_{-\pi/h}^{\pi/h} |e^{q(\xi)t} - g(h\xi)^n|^2\, |\hat u_0(\xi)|^2\, d\xi \le C\, n^2 k^2 h^{2r} \int_{-\pi/h}^{\pi/h} (1 + |\xi|)^{2\rho} e^{-c\xi^2 t}\, |\hat u_0(\xi)|^2\, d\xi. $$

The expression (1 + |ξ|)^{2ρ} e^{−cξ²t} is bounded by a constant, depending on ρ and c, times (1 + t^{−1/2})^{2ρ} or, equivalently, (1 + t^{−ρ/2})² (see Exercise 10.4.3). Thus the preceding integral is bounded by

$$ C'\, t^2 \left( 1 + t^{-\rho/2} \right)^2 h^{2r} \int_{-\pi/h}^{\pi/h} |\hat u_0(\xi)|^2\, d\xi \le C' \left( 1 + t^{-(\rho-1)/2} \right)^2 h^{2r} \int_{-\pi/h}^{\pi/h} |\hat u_0(\xi)|^2\, d\xi. $$

In place of (10.1.10) we have the estimate

$$ \begin{aligned}
\int_{|\xi| \ge \pi/h} |e^{q(\xi)t}\, \hat u_0(\xi)|^2\, d\xi
&\le e^{2Kt} \left( \frac{h}{\pi} \right)^{2r} \int_{|\xi| \ge \pi/h} |\xi|^{2r} e^{-c\xi^2 t}\, |\hat u_0(\xi)|^2\, d\xi \\
&\le C\, e^{2Kt} \left( \frac{h}{\pi} \right)^{2r} t^{-r} \int_{|\xi| \ge \pi/h} |\hat u_0(\xi)|^2\, d\xi
\end{aligned} $$

for some constant C. Combining these estimates and using the relation ρ ≥ r + 2, we have (10.4.1). The estimate (10.4.2) follows easily using the methods of Section 10.1.

Example 10.4.1. Figure 10.2 shows the initial condition and solution for the equation

$$ u_t = u_{xx} \quad \text{for } -1 \le x \le 1 $$

at time t = 0.052. The scheme is the forward-time central-space scheme with h = 0.1 and µ = 0.4. Because the scheme is dissipative, the solution is very smooth after only 13 time steps. The solution is also very accurate. The exact solution, which supplies the boundary values, is

$$ u^*(t, x) = \frac{1}{4} + 8 \sum_{k=1}^{\infty} \frac{\sin^2\left( \frac{k\pi}{4} \right)}{(k\pi)^2}\, e^{-t\pi^2 k^2} \cos(k\pi x). $$


Figure 10.2. A smooth solution of a dissipative scheme.
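
The computation behind Figure 10.2 can be reproduced as follows. The cosine coefficients of u* at t = 0 are those of the hat function 1 − 2|x| on |x| ≤ 1/2, so the sketch below (our own reconstruction of the example, including that identification of the initial data) runs 13 forward-time central-space steps with boundary values taken from the series:

```python
import numpy as np

h, mu = 0.1, 0.4
k = mu * h ** 2                          # 13 steps of size k reach t = 0.052
x = np.arange(-1.0, 1.0 + h / 2, h)

def exact(t, xx, terms=400):
    s = 0.25 * np.ones_like(np.asarray(xx, dtype=float))
    for j in range(1, terms + 1):
        s = s + 8 * np.sin(j * np.pi / 4) ** 2 / (j * np.pi) ** 2 \
              * np.exp(-t * (j * np.pi) ** 2) * np.cos(j * np.pi * xx)
    return s

v = np.maximum(0.0, 1.0 - 2.0 * np.abs(x))        # hat-function initial data
for n in range(13):
    v[1:-1] += mu * (v[2:] - 2 * v[1:-1] + v[:-2])
    t = (n + 1) * k
    v[0], v[-1] = exact(t, x[0]), exact(t, x[-1])  # boundary values from u*

print(np.abs(v - exact(13 * k, x)).max())          # small; cf. Figure 10.2
```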

Exercises

10.4.1. Solve the initial value problems for the heat equation u_t = u_{xx} on the interval [−1, 1] with periodic boundary conditions up to t = 1. Use the explicit forward-time central-space scheme with µ = 0.4 and grid spacing h = 1/10, 1/20, 1/40, and 1/80, with the following initial functions:

(a) u_0(x) = 1 if |x| < 1/2, 1/2 if |x| = 1/2, 0 otherwise;

(b) u_0(x) = cos(πx).

The exact solution to (a) is

$$ u(t, x) = \frac{1}{2} + \frac{2}{\pi} \sum_{k=0}^{\infty} e^{-t(2k+1)^2\pi^2}\, \frac{(-1)^k}{2k + 1} \cos[(2k + 1)\pi x]. $$

For t = 1 only a few terms are needed to give seven-place accuracy. Show that the solution is computed to second-order accuracy in the L2 norm.

10.4.2. Solve the initial boundary value problem for u_t = u_{xx} on −1 ≤ x ≤ 1 for 0 ≤ t ≤ 1/2 with initial function given by the function of part (a) of Exercise 10.4.1 and using Dirichlet boundary conditions. The exact solution, from which the boundary values can be obtained, is the same as in Exercise 10.4.1. Use the Crank–Nicolson scheme with h = 1/10, 1/20, and 1/40. Compare the convergence behavior in the L2 norm and the L∞ norm for the case in which λ = 1 with the case in which µ = 10.

10.4.3. Show that the expression (1 + |ξ|)^{2ρ} e^{−cξ²t} is bounded by a constant, depending on ρ and c, times (1 + t^{−1/2})^{2ρ} and that this is equivalent to (1 + t^{−ρ/2})².

10.5 The Lax–Richtmyer Equivalence Theorem


In this section we prove the Lax–Richtmyer theorem, Theorem 1.5.1, for one-step schemes; the extension to multistep schemes is discussed in Section 10.6. The definition of convergence given in Section 1.4 is not complete, since the nature of the convergence of the functions is not specified. We now make the idea of a convergent scheme precise using the interpolation operator S as defined in Definition 10.1.2.

Definition 10.5.1. A finite difference scheme approximating the homogeneous initial value problem for a partial differential equation is a convergent scheme if Sv^n converges to u(t_n, ·) in L²(R), where t_n = nk, for every solution u to the differential equation and every set of solutions to the difference scheme v, depending on h and k, for which Sv^0 converges to u(0, ·) in L²(R) as h and k tend to 0 in the stability region Λ.

The study of inhomogeneous initial value problems is easily done using the results
for homogeneous problems and Duhamel’s principle, as described in Section 9.3. We now
restate Theorem 1.5.1 for one-step schemes.

Theorem 10.5.1. The Lax–Richtmyer Equivalence Theorem. A consistent one-step scheme for a well-posed initial value problem for a partial differential equation is convergent if and only if it is stable.

The proof of this theorem is somewhat similar to the proof of Theorem 10.1.2; however, here we have much less information about the scheme than we did in Section 10.1. For example, we do not even assume that the order of accuracy is O(h^α) for any positive α. However, since we are making so few assumptions about the scheme, we are also able to obtain the equivalence of convergence and stability.

Proof. We first prove that stability of the scheme implies the convergence of the scheme. Later we show that an unstable scheme is nonconvergent, which completes the proof of the theorem.
We prove that stability implies convergence without making the special assumptions (10.1.5); rather we assume only that there is a constant C_T such that

$$ |e^{q(\xi)t}| \le C_T \quad \text{and} \quad |g(h\xi)^n| \le C_T \qquad (10.5.1) $$

for 0 ≤ t ≤ T, 0 ≤ nk ≤ T, and (h, k) in Λ.


For the first part of the proof we assume that the initial function for the scheme is
Tu0 . Then we have 
u0 − Sv 0 2 = |û0 (ξ )|2 dξ,
|ξ |>π/h
which converges to zero as h tends to zero by the Lebesgue dominated convergence theorem
(see Proposition B.4.2 in Appendix B). We use consistency to obtain the estimate

ekq(ξ ) − g(hξ )
= o(1) in h and k. (10.5.2)
k
The meaning of the notation o(1) is that for each ξ the left-hand side of (10.5.2) tends to
zero as h and k tend to zero. The estimate (10.5.2) is obtained as was (10.1.3), with the
replacement of O(hr ) by o(1).
We now consider the L2 norm of u(t_n, ·) − Sv^n. As in (10.1.6) we have the relation

$$ \int_{-\infty}^{\infty} \left| u(t_n, x) - Sv^n(x) \right|^2 dx = \int_{-\pi/h}^{\pi/h} \left| e^{q(\xi)t_n} - g(h\xi)^n \right|^2 |\hat u_0(\xi)|^2\, d\xi + \int_{|\xi|>\pi/h} \left| e^{q(\xi)t_n} \right|^2 |\hat u_0(\xi)|^2\, d\xi. \qquad (10.5.3) $$

We consider the right-hand side of (10.5.3) as one integral over R, with the specification of the integrand given piecewise. That is, the integrand is the function

$$ \phi_h(\xi) = \begin{cases} \left| e^{q(\xi)t_n} - g(h\xi)^n \right|^2 |\hat u_0(\xi)|^2 & \text{if } |\xi| \le \dfrac{\pi}{h}, \\[4pt] \left| e^{q(\xi)t_n}\, \hat u_0(\xi) \right|^2 & \text{if } |\xi| > \dfrac{\pi}{h}. \end{cases} $$

Furthermore, for each ξ, when h is small enough, i.e., small enough that |ξ| < πh^{−1}, the integrand is given as in the first piece. When h and k are both in Λ, the expression e^{q(ξ)t_n} − g(hξ)^n satisfies

$$ \left| e^{q(\xi)t_n} - g(h\xi)^n \right| \le n \left| e^{q(\xi)k} - g(h\xi) \right| C_T, $$

which is essentially the same as (10.1.7) except it uses the more general estimates (10.5.1) rather than (10.1.5). By (10.5.2) we have the estimate

$$ \left| e^{q(\xi)t_n} - g(h\xi)^n \right| \le nk\, C_T\, o(1) \le o(1). $$



We conclude that the integrand on the right-hand side of (10.5.3) converges to zero for each value of ξ as h and k tend to zero in Λ. We thus have the set of functions φ_h that are in L¹(R) and tend to zero at each point as h and k tend to zero in Λ.
Before we can conclude that the norms of these functions tend to zero, we need one more piece of information. This is given by observing that

$$ \left| e^{q(\xi)t_n} - g(h\xi)^n \right|^2 |\hat u_0(\xi)|^2 \le (2C_T)^2 |\hat u_0(\xi)|^2. $$

This shows that the functions φ_h are bounded uniformly by a function in L¹(R), namely, 4C_T² |û_0(ξ)|². By the Lebesgue dominated convergence theorem (see Appendix B), we conclude that

$$ \int_{-\infty}^{\infty} \phi_h(\xi)\, d\xi = \int_{-\infty}^{\infty} \left| \hat u(t_n, \xi) - \widehat{Sv^n}(\xi) \right|^2 d\xi $$

converges to zero as h and k tend to zero in Λ, and thus the scheme is convergent.
We now briefly consider the case where v^0 is not equal to Tu_0. First, define the grid function w^n, which is the solution to the difference scheme with initial function Tu_0. We then have

$$ \| u(t_n, \cdot) - Sv^n \| \le \| u(t_n, \cdot) - Sw^n \| + \| Sw^n - Sv^n \|. $$

The norm of u(t_n, ·) − Sw^n converges to zero by our previous result. We have, by the definition of S and by stability,

$$ \begin{aligned}
\| Sw^n - Sv^n \| &= \| S(w^n - v^n) \| = \| w^n - v^n \|_h \\
&\le C_T \| w^0 - v^0 \|_h = C_T \| Tu_0 - v^0 \|_h \\
&\le C_T \| u_0 - Sv^0 \|,
\end{aligned} $$

which converges to zero by assumption. This concludes the first part of the proof, showing that a stable scheme is convergent.
We now prove that a consistent one-step scheme is nonconvergent if it is unstable. The proof consists of constructing a function u_0(x) such that the scheme with initial function Tu_0 does not converge to the solution of the partial differential equation. The function u_0(x) is constructed as the sum of functions w_M(x) determined as follows.
If the scheme is unstable, then by Theorem 2.2.1 the estimate

$$ |g(h\xi, k, h)| \le 1 + Ck $$

does not hold for any constant C and sufficiently small h and k. Thus for any positive integer M, there are values of ξ_M, k_M, and h_M such that

$$ |g(h_M \xi_M, k_M, h_M)| \ge 1 + M k_M \qquad (10.5.4) $$

and |h_M ξ_M| ≤ π. Since g(hξ, k, h) is a continuous function, there is a positive number η_M such that

$$ |g(h_M \xi, k_M, h_M)| \ge 1 + \tfrac{1}{2} M k_M \qquad (10.5.5) $$

for |ξ − ξ_M| ≤ η_M, and moreover we may choose η_M to satisfy

$$ \eta_M \le M^{-2} $$

and choose h_M and k_M less than h_{M−1} and k_{M−1}, respectively. We now need a crucial result, which relies on the consistency of the scheme.

Lemma 10.5.2. If the finite difference scheme is consistent, then the intervals I_M = [ξ_M − η_M, ξ_M + η_M] can be chosen to be disjoint.

Proof. We prove this lemma by induction on M. For M = 1, there is only one interval and the assertion is trivial.
Suppose that for some M the interval I_M cannot be chosen as disjoint from I_N with N less than M. Let

$$ J = \bigcup_{N < M} I_N, $$

and by our supposition, for any h and k less than h_{M−1} and k_{M−1}, respectively, the estimate

$$ |g(h\xi, k, h)| \le 1 + Mk \qquad (10.5.6) $$

holds for ξ ∉ J. From consistency of the scheme to the equation, it follows from (10.5.2) that

$$ \left| \frac{g(h\xi, k, h) - e^{q(\xi)k}}{k} \right| \le C(\xi) \qquad (10.5.7) $$

for each ξ as h and k tend to zero. Since J is a compact set, being the union of a finite number of closed intervals,

$$ \sup_{\xi \in J} C(\xi) = C^* $$

exists and is finite. From (10.5.1) we also have that

$$ |e^{q(\xi)k}| = |e^{q(\xi)t_n}|^{1/n} \le C_T^{1/n} \le 1 + Kk $$

for some value of K. These estimates imply, by (10.5.6) for ξ ∉ J and (10.5.7) for ξ ∈ J, that

$$ |g(h\xi, k, h)| \le 1 + \max(M, C^* + K)\, k $$

for h < h_{M−1} and k < k_{M−1}, which contradicts our assumption that the scheme is unstable. Therefore, our supposition must be false, and there is a ξ_M ∉ J such that (10.5.4) holds for some h_M and k_M small enough. Since J is a closed set and g is continuous, there is an interval [ξ_M − η_M, ξ_M + η_M] disjoint from J such that (10.5.5) holds. This proves the lemma.
We now continue our construction of the functions w_M(x). Define the positive numbers α_M by α_M² η_M = M^{−2} and then define the function w_M by

$$ \hat w_M(\xi) = \begin{cases} \alpha_M & \text{if } |\xi - \xi_M| \le \eta_M, \\ 0 & \text{otherwise.} \end{cases} $$
We define our initial function as the sum of the functions w_M. Let u_0(x) = Σ_{M=1}^∞ w_M(x); we will show that u_0 is in L²(R). Because the intervals I_M are disjoint, we have that

$$ \int_{-\infty}^{\infty} |u_0(x)|^2\, dx = \int_{-\infty}^{\infty} |\hat u_0(\xi)|^2\, d\xi = \sum_{M=1}^{\infty} \int_{-\infty}^{\infty} |\hat w_M(\xi)|^2\, d\xi = \sum_{M=1}^{\infty} 2 \alpha_M^2 \eta_M = 2 \sum_{M=1}^{\infty} M^{-2} = \frac{\pi^2}{3}, $$

which shows that u_0 is in L²(R).


We now show that the solution of the scheme applied to Tu_0 does not converge. Let v_m^n be the solution to the scheme with this initial function. Given a time T, choose a time level n and a value of M such that

$$ \frac{T}{2} \le n k_M \le T \quad \text{and} \quad \frac{C_T - 1}{M} \le \frac{T}{8}, \qquad (10.5.8) $$

where C_T is the constant bounding e^{q(ξ)t} in (10.5.1). We then have

$$ \| Sv^n - u(t_n, \cdot) \|^2 \ge \| v^n - Tu(t_n, \cdot) \|_h^2 = \int_{-\pi/h}^{\pi/h} |g(h\xi)^n - e^{q(\xi)nk}|^2\, |\hat u_0(\xi)|^2\, d\xi. $$

For h = h_M and ξ in I_M, we have the estimate

$$ |g(h\xi)^n - e^{q(\xi)kn}| \ge |g(h\xi)|^n - C_T \ge \left( 1 + \tfrac{1}{2} M k_M \right)^n - C_T. $$

Thus

$$ \begin{aligned}
\| Sv^n - u(t_n, \cdot) \|^2 &\ge \int_{|\xi - \xi_M| \le \eta_M} |g(h\xi)^n - e^{q(\xi)nk}|^2\, |\hat u_0(\xi)|^2\, d\xi \\
&\ge \left[ \left( 1 + \tfrac{1}{2} M k_M \right)^n - C_T \right]^2 \alpha_M^2\, 2\eta_M
= 2 \left[ \frac{\left( 1 + \tfrac{1}{2} M k_M \right)^n - C_T}{M} \right]^2.
\end{aligned} $$

We estimate this last expression using the inequality (1 + x)^n ≥ 1 + nx for positive x. We then have, by (10.5.8),

$$ \| Sv^n - u(t_n, \cdot) \|^2 \ge 2 \left( \frac{1}{2} n k_M - \frac{C_T - 1}{M} \right)^2 \ge \frac{T^2}{32} > 0. $$

Thus Sv^n does not converge to u(t_n, ·), hence the scheme is nonconvergent. This completes the proof of the Lax–Richtmyer equivalence theorem.
Notice that the proof we have given shows only that Sv n does not converge to
u(tn , ·); in fact the norm of v n must become unbounded as n increases.
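This unbounded growth is easy to observe numerically. The following sketch is an illustration only; the scheme, grid parameters, and initial function are arbitrary choices, not taken from the text. It applies the forward-time central-space scheme to $u_t = u_x$, which is unstable for every fixed $\lambda = k/h$, and prints the growing discrete $L^2$ norm.

```python
# Illustrative sketch only: forward-time central-space for u_t = u_x is
# unstable for any fixed lambda = k/h, so ||v^n||_h grows without bound.
import numpy as np

h = 0.01
lam = 0.9                        # lambda = k/h, held constant as h -> 0
k = lam * h
x = np.arange(0.0, 1.0, h)
v = np.sin(2 * np.pi * x)        # smooth periodic initial function

for n in range(1, 201):
    # v^{n+1}_m = v^n_m + (lambda/2)(v^n_{m+1} - v^n_{m-1}), periodic in m
    v = v + 0.5 * lam * (np.roll(v, -1) - np.roll(v, 1))
    if n % 50 == 0:
        norm = np.sqrt(h * np.sum(v**2))     # discrete L2 norm
        print(f"n = {n:4d}, t = {n*k:.3f}, ||v^n||_h = {norm:.3e}")
# The printed norms grow geometrically, consistent with |g(h xi)| > 1.
```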
Exercises
10.5.1. State and prove the Lax–Richtmyer theorem for the inhomogeneous initial value problem. Hint: Use Duhamel's principle; see Section 9.3.

10.5.2. Using the computation of Exercise 10.4.2, show that in the $L^\infty$ norm the solution computed with $\lambda = 1$ does not converge to the exact solution. This is a demonstration that the Lax–Richtmyer theorem does not hold in the $L^\infty$ norm when the initial function for the scheme is $Eu_0$.
10.6 Analysis of Multistep Schemes

In this section we extend the results of the previous sections to multistep schemes. The primary estimate for Section 10.1 is (10.1.3), and for Section 10.5 it is the similar estimate (10.5.2). We will show that the convergence estimates for multistep schemes follow from these results for one-step schemes. For a multistep scheme there is not a unique amplification factor, and thus estimates (10.1.3) and (10.5.2) cannot be used without some clarification. We restrict the discussion to schemes for single equations; the results for systems may be proved in a similar fashion.

The first stage in the reduction of a multistep scheme to a one-step scheme is to distinguish a special amplification factor.
Theorem 10.6.1. If a multistep scheme is accurate of order $r$ as an approximation to a partial differential equation in the form (10.1.2), then there is a unique amplification factor $g_0(h\xi)$ defined for $|h\xi| \le \theta_0$ for some positive value $\theta_0$ such that

$$g_0(h\xi) = 1 + kq(\xi) + o(k) \qquad (10.6.1)$$

as $h$ and $k$ tend to zero. Moreover, there exists a nonnegative integer $\rho$ such that

$$\left|\frac{e^{kq(\xi)} - g_0(h\xi, k, h)}{k}\right| \le Ch^r(1 + |\xi|)^\rho.$$

If $g_0$ satisfies this last estimate, the scheme is said to be accurate of order $[r, \rho]$.
Example 10.6.1. As illustrations of this theorem we consider the leapfrog scheme for the one-way wave equation and the Du Fort–Frankel scheme for the heat equation. By the formula for the amplification factor for the leapfrog scheme (see (4.1.2)),

$$g_0(h\xi) = g_+(h\xi) = -ia\lambda\sin h\xi + \sqrt{1 - a^2\lambda^2\sin^2 h\xi} = 1 + k(-ia\xi) + O(k^2) + O(h^2). \qquad (10.6.2)$$

For the Du Fort–Frankel scheme (6.3.6) for the heat equation (6.1.1), we have

$$g_0(h\xi) = g_+(h\xi) = \frac{2b\mu\cos h\xi + \sqrt{1 - 4b^2\mu^2\sin^2 h\xi}}{1 + 2b\mu} = 1 + k(-b\xi^2) + O(k^2) \qquad (10.6.3)$$

when $\mu$ is a constant. In (10.6.2) we can take $\theta_0$, the limit of the range where $g_0$ is defined, equal to $\pi$, since $|a\lambda| < 1$; a similar statement can be made for (10.6.3) if $b\mu < 1$. However, if $b\mu \ge 1$, then $\theta_0$ should be chosen so that $\sin\theta_0 < (b\mu)^{-1}$. This is necessary to define $g_0$ uniquely.
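The expansions (10.6.2) and (10.6.3) can be checked directly. The following sketch evaluates the closed-form roots and verifies that $(g_0 - (1 + kq(\xi)))/k$ tends to zero as $h$ and $k$ tend to zero; all parameter values are arbitrary test choices.

```python
# Sketch verifying (10.6.2) and (10.6.3) numerically; parameters are test values.
import numpy as np

a, lam, xi = 1.0, 0.8, 3.0           # leapfrog: k = lam*h, |a*lam| < 1
b, mu = 1.0, 0.4                     # Du Fort-Frankel: k = mu*h^2, b*mu < 1

for h in [1e-2, 1e-3, 1e-4]:
    k = lam * h
    s = a * lam * np.sin(h * xi)
    g0 = -1j * s + np.sqrt(1 - s**2)                       # (10.6.2)
    print(f"leapfrog h={h:.0e}: |g0-(1-i*a*k*xi)|/k = "
          f"{abs(g0 - (1 - 1j*a*k*xi))/k:.2e}")

for h in [1e-2, 1e-3, 1e-4]:
    k = mu * h**2
    g0 = (2*b*mu*np.cos(h*xi)
          + np.sqrt(1 - 4*b**2*mu**2*np.sin(h*xi)**2)) / (1 + 2*b*mu)  # (10.6.3)
    print(f"DuFort-Frankel h={h:.0e}: |g0-(1-b*k*xi^2)|/k = "
          f"{abs(g0 - (1 - b*k*xi**2))/k:.2e}")
# The printed ratios tend to zero, i.e., g0 = 1 + k q(xi) + o(k).
```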
Proof. This proof is similar to the first part of the proof of Theorem 10.1.1. If we set $g(h\xi) = 1 + kq(\xi) + o(k)$, then

$$k^{-1}\ln g(h\xi) = q(\xi) + o(1).$$

Since there is at most one root of $G(z, h\xi)$ such that $z$ is 1 when $h\xi$ is zero, the implicit function theorem guarantees the existence of a root $g_0(h\xi)$ of $G(g, h\xi) = 0$ such that $k^{-1}\ln g_0(h\xi) = q(\xi) + o(1)$. This is equivalent to (10.6.1). The existence of $\rho$ is established essentially as in the proof of Theorem 10.1.1.
The amplification factor $g_0(h\xi)$ may not represent a one-step scheme, e.g., (10.6.2) or (10.6.3), but nonetheless it can be used to generate the sequence of functions

$$\hat w^n(\xi) = g_0(h\xi)^n\,\hat v^0(\xi). \qquad (10.6.4)$$

For $|h\xi|$ greater than $\theta_0$ we can set $g_0(h\xi)$ equal to zero. We will call the method in (10.6.4) for generating the functions $\hat w^n(\xi)$ a pseudoscheme. An important observation is that the results of the previous sections apply to one-step pseudoschemes as well as to actual schemes. For multistep schemes, the methods of Section 10.1, such as Definition 10.1.3, apply for $g_0(h\xi)$.
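As a concrete illustration, a pseudoscheme is trivial to implement with the discrete Fourier transform: one multiplies the transform of the initial grid function by $g_0(h\xi)^n$. The sketch below does this for the leapfrog root (10.6.2) on a periodic grid; the grid, data, and parameter values are arbitrary choices made here for illustration.

```python
# Sketch of the pseudoscheme (10.6.4) using the leapfrog g_0 of (10.6.2).
import numpy as np

a, lam, h = 1.0, 0.8, 1.0 / 128
k, M = lam * h, 128
x = np.arange(M) * h
v0 = np.exp(-100 * (x - 0.5)**2)        # initial grid function (test choice)

xi = 2 * np.pi * np.fft.fftfreq(M, d=h) # dual variable, |h*xi| <= pi
s = a * lam * np.sin(h * xi)
g0 = -1j * s + np.sqrt(1 - s**2)        # one-step pseudoscheme factor

n = 64                                  # number of (pseudo)time steps
wn = np.real(np.fft.ifft(g0**n * np.fft.fft(v0)))   # (10.6.4)
print("t_n =", n * k, " max|w^n| =", wn.max())
# wn approximates the solution at t_n = n*k with no multistep start-up needed.
```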
Now consider a multistep scheme for an initial value problem. Let $J + 1$ be the number of initial time levels that must be specified to determine the solution. That is, assume that $v^0, v^1, \ldots, v^J$ must be specified before $v^n$ for $n > J$ can be computed by the scheme.

Let $w$ be the function generated by (10.6.4). By the results of Section 10.1 applied to the pseudoscheme (10.6.4), we have that

$$\|u(t_n, \cdot) - Sw^n\| = O(h^r).$$

Now consider the norm of the difference between $u(t_n, \cdot)$ and $Sv^n$:

$$\|u(t_n, \cdot) - Sv^n\| \le \|u(t_n, \cdot) - Sw^n\| + \|Sw^n - Sv^n\| = \|u(t_n, \cdot) - Sw^n\| + \|w^n - v^n\|_h.$$
We have by the stability definition, Definition 1.5.1, that

$$\|w^n - v^n\|_h \le C_T\sum_{j=0}^J\|w^j - v^j\|_h = C_T\sum_{j=1}^J\|g_0^j\hat v^0 - \hat v^j\|_h.$$

We thus see that to obtain the optimal accuracy of $r$, we must have that $v^j$ approximates $w^j$ to within $O(h^r)$. But

$$\hat w^j(\xi) = g_0(h\xi)^j\hat v^0 = e^{kjq(\xi)}\hat v^0(\xi) + O(kh^r) = e^{q(\xi)t_j}\hat u_0(\xi) + O(kh^r).$$

Thus

$$\hat v^j(\xi) - \hat w^j(\xi) = \hat v^j(\xi) - e^{q(\xi)t_j}\hat u_0(\xi) + O(kh^r).$$

To obtain the optimal order of accuracy for the scheme, the initial functions for the multistep scheme must satisfy

$$v^j = u(t_j, \cdot) + O(h^r). \qquad (10.6.5)$$
If the initial time levels are initialized using a one-step scheme with accuracy $r'$ and amplification factor $\tilde g(h\xi)$, we have $\hat v^j(\xi) = \tilde g(h\xi)^j\hat v^0$, and requirement (10.6.5) becomes

$$\tilde g(h\xi)^j\hat v^0(\xi) - e^{q(\xi)t_j}\hat u_0(\xi) = O(h^r).$$

Since

$$\tilde g(h\xi) = e^{q(\xi)k} + O(kh^{r'}),$$

we see that $kh^{r'}$ should be $O(h^r)$. Thus the initializing scheme may have order of accuracy less than $r$ and not degrade the overall accuracy. All that is required is that $kh^{r'}$ be $O(h^r)$. Notice also that the initializing scheme need not be stable.

Theorem 10.6.2. If the initialization of a multistep scheme uses schemes of order of accuracy $r'$ to compute the initial solution values $v^j$ for $j$ from 1 to $J$ such that $kh^{r'}$ is $O(h^r)$, and the initial data is in $H^\rho$, where $[r, \rho]$ is the order of accuracy of the multistep scheme, then the order of accuracy of the solution is $r$.
In particular, the leapfrog scheme may be initialized with the forward-time central-space scheme, which is first-order accurate, and still be second-order accurate overall. Similarly, the Du Fort–Frankel scheme with $\mu$ constant may be initialized using $v_m^1$ equal to $v_m^0$, a scheme accurate of order 0, and the overall scheme will be second-order accurate.

If the initial data is not in $H^\rho$, the smoothness of the solution is given by the results of Sections 10.3 and 10.4.
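A computation along these lines confirms the claim for the leapfrog scheme. In the sketch below, the periodic domain, the exact solution $\sin(2\pi(x + t))$, and the parameter values are all arbitrary test choices: leapfrog for $u_t = u_x$ is started with one forward-time central-space step, and halving $h$ still reduces the error by about a factor of 4.

```python
# Sketch: a first-order start does not degrade leapfrog's second-order accuracy.
import numpy as np

def solve(M, lam=0.8, T=1.0):
    h = 1.0 / M; k = lam * h
    x = np.arange(M) * h
    v0 = np.sin(2 * np.pi * x)
    # forward-time central-space first step, accurate of order 1
    v1 = v0 + 0.5 * lam * (np.roll(v0, -1) - np.roll(v0, 1))
    nsteps = int(round(T / k))
    for _ in range(nsteps - 1):
        # leapfrog: v^{n+1} = v^{n-1} + lam*(v^n_{m+1} - v^n_{m-1})
        v0, v1 = v1, v0 + lam * (np.roll(v1, -1) - np.roll(v1, 1))
    err = v1 - np.sin(2 * np.pi * (x + nsteps * k))
    return np.sqrt(h * np.sum(err**2))

for M in [64, 128, 256]:
    print(M, f"{solve(M):.3e}")
# Each doubling of M reduces the error by about 4: second-order accuracy.
```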
Exercises
10.6.1. Show that the leapfrog scheme (1.3.4) is accurate of order $[2, 3]$.

10.6.2. Show that the Du Fort–Frankel scheme (6.3.6) is accurate of order $[2, 4]$.

10.6.3. Show that for the implicit multistep scheme (4.2.3)

$$g_0(\theta) = \frac{1}{2 - \sqrt{1 - 2ia\lambda\sin\theta}}$$

and show that this scheme is accurate of order $[2, 3]$.

10.6.4. Solve the heat equation with the Du Fort–Frankel scheme using the data of Exercise 6.3.10.

(a) Show that the initialization $v_m^1 = v_m^0$ gives second-order accurate solutions if $\mu$ is constant.

(b) Show that if $k = h^{3/2}$, then the initialization in (a) is accurate of order less than 2.

10.6.5. Repeat Exercise 10.3.6 using the leapfrog scheme, using the forward-time central-space scheme to compute the first time step.
10.7 Convergence Estimates for Second-Order Differential Equations

The proofs of the convergence estimates for schemes approximating second-order equations are similar to those for first-order equations. We give only a brief discussion, with emphasis on the points of difference between the two types of equations.

The class of equations we consider is that for which the equations can be put in the form

$$\hat u_{tt} + 2a(\xi)\hat u_t = b(\xi)\hat u. \qquad (10.7.1)$$

We assume also that the initial value problem for (10.7.1) is well-posed. That is, we assume that the two zeros of

$$q^2 + 2a(\xi)q - b(\xi) = 0$$

satisfy the estimate

$$\operatorname{Re} q_\pm(\xi) \le \bar q,$$

as discussed in Section 9.1. We have

$$q_\pm(\xi) = -a(\xi) \pm \sqrt{a(\xi)^2 + b(\xi)}.$$

A further technical assumption that we need is that there is a constant $c_0$ such that

$$|q_\pm(\xi)| \le c_0\,|q_+(\xi) - q_-(\xi)| \qquad (10.7.2)$$
for all real values of $\xi$. Inequality (10.7.2) is satisfied for most second-order equations arising in applications (see Exercise 10.7.2).

We also define the number $\chi$ as the largest value such that

$$c|\xi|^\chi \le |q_\pm(\xi)| \qquad (10.7.3)$$

for large values of $|\xi|$ and some positive constant $c$.
We initially restrict ourselves to two-step schemes, which have two amplification factors, denoted by $g_+(h\xi)$ and $g_-(h\xi)$. These correspond to the two roots $q_+(\xi)$ and $q_-(\xi)$ of the symbol.

Corresponding to Definition 10.1.3 and Theorem 10.1.1, we have the next definition and theorem.

Definition 10.7.1. A two-step scheme for a second-order equation of the form (10.7.1), with $k$ a function of $h$, is accurate of order $[r, \rho]$ if there is a constant $C$ such that for $|h\xi| \le \pi$

$$\left|\frac{e^{kq_\pm(\xi)} - g_\pm(h\xi, k, h)}{k}\right| \le Ch^r(1 + |\xi|)^\rho. \qquad (10.7.4)$$
Theorem 10.7.1. If a two-step finite difference scheme for a second-order equation with a well-posed initial value problem is accurate of order $r$ according to Definition 3.1.1, then there is a nonnegative integer $\rho$ such that the scheme is accurate of order $[r, \rho]$ according to Definition 10.7.1. Moreover, if $\chi$ is defined by (10.7.3), then

$$\left|\frac{e^{kq_\pm(\xi)} - g_\pm(h\xi, k, h)}{kq_\pm(\xi)}\right| \le ch^r(1 + |\xi|)^{\rho-\chi}. \qquad (10.7.5)$$

We do not prove Theorem 10.7.1, since the proof parallels that of Theorem 10.1.1. The main distinction between the proofs is that for second-order equations there are the two roots $q_\pm(\xi)$ instead of only the one root as for first-order equations.
The solution to the initial value problem (10.7.1) with initial functions

$$\hat u(0, \xi) = \hat u_0(\xi) \quad\text{and}\quad \hat u_t(0, \xi) = \hat u_1(\xi)$$

may be written as

$$\hat u(t, \xi) = \frac{-q_-(\xi)\hat u_0(\xi) + \hat u_1(\xi)}{q_+(\xi) - q_-(\xi)}\,e^{q_+(\xi)t} + \frac{q_+(\xi)\hat u_0(\xi) - \hat u_1(\xi)}{q_+(\xi) - q_-(\xi)}\,e^{q_-(\xi)t}. \qquad (10.7.6)$$

For the two-step finite difference scheme, we consider a special solution of the scheme, which we denote by $w$. We choose $\hat w^0(\xi)$ equal to $\widehat{Tu_0}(\xi)$ and $\hat w^1(\xi)$ so that the solution is

$$\hat w^n(\xi) = \frac{-q_-(\xi)\hat u_0(\xi) + \hat u_1(\xi)}{q_+(\xi) - q_-(\xi)}\,g_+(\xi)^n + \frac{q_+(\xi)\hat u_0(\xi) - \hat u_1(\xi)}{q_+(\xi) - q_-(\xi)}\,g_-(\xi)^n \qquad (10.7.7)$$

for $h|\xi| \le \pi$. This special choice of initial function, as in Section 10.1, is convenient in order to obtain the simplest convergence estimate.
Theorem 10.7.2. If the initial value problem for a second-order partial differential equation of the form (10.7.1), for which the initial value problem is well-posed, is approximated by a stable two-step finite difference scheme with the solution (10.7.7), then for each time $T$ there is a constant $C_T$ such that

$$\|u(t_n, \cdot) - Sw^n\| \le C_T h^r\left(\|u_0\|_{H^\rho} + \|u_1\|_{H^{\rho-\chi}}\right) \qquad (10.7.8)$$

for $t_n = nk$ with $0 \le t_n \le T$.
Proof. This proof is similar in spirit to that of Theorem 10.1.2. We have, for $h|\xi| \le \pi$,

$$\hat u(t_n, \xi) - \hat w^n(\xi) = \left(e^{q_+(\xi)t_n} - g_+(\xi)^n\right)A_+(\xi) + \left(e^{q_-(\xi)t_n} - g_-(\xi)^n\right)A_-(\xi),$$

where $A_\pm(\xi)$ are the coefficients of $e^{q_\pm(\xi)t_n}$ and $g_\pm(\xi)^n$ in (10.7.6) and (10.7.7). As with (10.1.7) we have

$$|e^{q_\pm(\xi)t_n} - g_\pm(h\xi)^n| \le nC_T|e^{q_\pm(\xi)k} - g_\pm(h\xi)|.$$

We then have by (10.7.4)

$$|e^{q_+(\xi)k} - g_+(h\xi)|\,|A_+(\xi)| \le \frac{|q_-(\xi)|}{|q_+(\xi) - q_-(\xi)|}\,|e^{q_+(\xi)k} - g_+(h\xi)|\,|\hat u_0(\xi)| + \frac{|q_+(\xi)|}{|q_+(\xi) - q_-(\xi)|}\,\frac{|e^{q_+(\xi)k} - g_+(h\xi)|}{|q_+(\xi)|}\,|\hat u_1(\xi)|.$$

By (10.7.5), this estimate becomes

$$|e^{q_+(\xi)k} - g_+(h\xi)|\,|A_+(\xi)| \le Ckh^r\left[(1 + |\xi|)^\rho|\hat u_0(\xi)| + (1 + |\xi|)^{\rho-\chi}|\hat u_1(\xi)|\right].$$

A similar estimate holds for $A_-(\xi)$. For $|\xi| > h^{-1}\pi$, we have

$$\hat u(t_n, \xi) = \frac{q_+(\xi)e^{q_-(\xi)t} - q_-(\xi)e^{q_+(\xi)t}}{q_+(\xi) - q_-(\xi)}\,\hat u_0(\xi) + \frac{e^{q_+(\xi)t} - e^{q_-(\xi)t}}{q_+(\xi) - q_-(\xi)}\,\hat u_1(\xi),$$

from which we obtain the estimate

$$|\hat u(t_n, \xi)| \le Ch^r\left(|\xi|^r|\hat u_0(\xi)| + |\xi|^{r-\chi}|\hat u_1(\xi)|\right)$$

for $r$ less than $\rho$. These estimates for $h|\xi| \le \pi$ and $h|\xi| > \pi$ give estimate (10.7.8).
Before considering solutions to the finite difference scheme other than those of the form (10.7.7), we note that

$$\hat u(k, \xi) - \widehat{Sw}^1(\xi) = O(kh^r),$$

where $w^1$ is given by (10.7.7).
Now let $v$ be any solution to a two-step finite difference scheme approximating the second-order equation (10.7.1). We let $w$ be the particular solution given by (10.7.7), and then we have

$$\|u(t_n, \cdot) - Sv^n\| \le \|u(t_n, \cdot) - Sw^n\| + \|Sw^n - Sv^n\|.$$

The first term on the right-hand side is estimated using Theorem 10.7.2, and the second term is estimated using the stability estimate (8.2.1). We have

$$\|Sw^n - Sv^n\| = \|w^n - v^n\|_h \le (1 + n)C_T\sum_{j=0}^1\|w^j - v^j\|_h.$$

We see that if $\|w^j - v^j\|_h$ is of the order of $kh^r$, then we have the estimate $\|u(t_n, \cdot) - Sv^n\| = O(h^r)$.
These observations give us the next theorem.

Theorem 10.7.3. If the initial value problem for a well-posed second-order partial differential equation of the form (10.7.1) is approximated by a stable finite difference scheme that is accurate of order $[r, \rho]$ and the initial functions are accurate of order $r$, then the solution $v$ satisfies

$$\|u(t_n, \cdot) - Sv^n\| \le C_T h^r\left(\|u_0\|_{H^\rho} + \|u_1\|_{H^{\rho-\chi}}\right).$$

The extension to general multistep schemes for second-order differential equations is similar to that for first-order equations in Section 10.6; see Exercise 10.7.6.

If the initial data is not sufficiently smooth, then results similar to Theorem 10.3.1 hold; in particular, we have the following theorem.
Theorem 10.7.4. If a stable multistep finite difference scheme for a second-order equation is accurate of order $[r, \rho]$, with $r \le \rho$, and the initial functions to the partial differential equation are $u_0$ and $u_1$ with $\|D^{\sigma_0}u_0\|$ and $\|D^{\sigma_1}u_1\|$ finite and $\sigma_0 \le \rho$ and $\sigma_1 + \chi \le \rho$, then the solution $v^n$ to the finite difference scheme satisfies

$$\|u(t_n, \cdot) - Sv^n\| \le C_2 h^\beta\left(\|u_0\|_{H^{\sigma_0}} + \|u_1\|_{H^{\sigma_1}}\right), \qquad (10.7.9)$$

where

$$\beta = \frac{r}{\rho}\min(\sigma_0,\ \sigma_1 + \chi).$$

The proof proceeds similarly to that of Theorem 10.3.1, with the main difference being the role of $u_1$. The proof is left to Exercise 10.7.5.
Exercises
10.7.1. Show that the scheme (8.2.2) for the second-order wave equation is accurate of order $[2, 3]$ with $\chi$ equal to 1.

10.7.2. Show that the estimate (10.7.2) holds for the three second-order equations (8.1.1), (8.1.9), and (8.1.11).

10.7.3. Solve the wave equation $u_{tt} = u_{xx}$ for $x \in [0, 1]$ with the scheme (8.2.2), using the initial conditions

$$u(0, x) = \sin x \quad\text{and}\quad u_t(0, x) = \cos x.$$

Obtain the boundary values from the exact solution $u(t, x) = \sin(x + t)$. Demonstrate by computation that the initialization

$$v_m^1 = u(0, x_m) + ku_t(0, x_m)$$

results in a first-order accurate solution, but the initialization (8.2.5) gives a second-order accurate solution. (A sketch of this computation follows these exercises.)

10.7.4. Prove the Lax–Richtmyer theorem for second-order equations under the restrictions $|e^{tq_\pm(\xi)}| \le 1$ and $|g_\pm(\theta)| \le 1$.

10.7.5. Prove Theorem 10.7.4 and verify the conclusion with computations using piecewise smooth functions.

10.7.6. Extend Theorem 10.7.2 to cover multistep schemes for second-order equations.
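A minimal sketch of the computation called for in Exercise 10.7.3 follows. It assumes that (8.2.2) denotes the standard central-time central-space scheme for $u_{tt} = u_{xx}$ and that (8.2.5) is the Taylor-corrected initialization $v_m^1 = u_0(x_m) + ku_1(x_m) + \tfrac{k^2}{2}u_{xx}(0, x_m)$; both are assumptions about equation numbers outside this excerpt, and the grid parameters are arbitrary test choices.

```python
# Sketch for Exercise 10.7.3: first-order vs. Taylor-corrected initialization.
import numpy as np

def solve(M, corrected, lam=0.9, T=1.0):
    h = 1.0 / M; k = lam * h
    x = np.linspace(0.0, 1.0, M + 1)
    v0 = np.sin(x)                          # u(0, x)
    v1 = v0 + k * np.cos(x)                 # first-order start
    if corrected:
        v1 += 0.5 * k**2 * (-np.sin(x))     # add (k^2/2) u_xx(0, x)
    n = 1
    while (n + 1) * k <= T + 1e-12:
        v2 = np.empty_like(v1)
        # central-time central-space scheme for u_tt = u_xx (assumed (8.2.2))
        v2[1:-1] = 2*v1[1:-1] - v0[1:-1] + lam**2*(v1[2:] - 2*v1[1:-1] + v1[:-2])
        t2 = (n + 1) * k
        v2[0], v2[-1] = np.sin(t2), np.sin(1.0 + t2)   # exact boundary data
        v0, v1, n = v1, v2, n + 1
    return np.max(np.abs(v1 - np.sin(x + n * k)))

for M in [50, 100, 200]:
    print(M, f"{solve(M, False):.2e}", f"{solve(M, True):.2e}")
# The uncorrected start converges at first order, the corrected one at second.
```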
Chapter 11

Well-Posed and Stable Initial-Boundary Value Problems
In this chapter we present the theory pertaining to the well-posedness of boundary conditions for partial differential equations and stability of boundary conditions for finite difference schemes. We begin the chapter by reducing the general initial-boundary value problem to a special form in which the only nonzero data are those associated with the boundary conditions. Then, after introducing the Laplace transform, we give a rather general discussion of the basic formulation of the analysis of boundary conditions. We introduce the basic ideas of the analysis of boundary conditions for finite difference schemes in Section 11.2 by considering the leapfrog scheme with four boundary conditions and then present the more general theory in Section 11.3. Section 11.4 deals with the theory of initial-boundary value problems for hyperbolic and parabolic partial differential equations. The chapter concludes by presenting the matrix method for analyzing the stability of finite difference initial-boundary value problems.
11.1 Preliminaries

Consider an initial-boundary value problem for either a partial differential equation or a finite difference scheme

$$Pu = f \qquad (11.1.1)$$

on a domain $U$ in $R^n$ with initial function

$$u(0, x) = u_0(x) \qquad (11.1.2)$$

and boundary conditions

$$Bu = \beta \quad\text{on } \partial U. \qquad (11.1.3)$$

We first assume that there is an extension of equation (11.1.1) and the initial data (11.1.2) to all of $R^n$ and that the resulting initial value problem is well-posed in the case of the differential equation or stable in the case of the difference scheme.

Let $w$ be the solution to equation (11.1.1) on $R^n$ satisfying the initial condition (11.1.2), suitably extended to all of $R^n$. Writing the solution $u$ to (11.1.1) as $w + u'$, we obtain an initial-boundary value problem for $u'$ on the domain $U$ similar to the original problem for $u$, except that the data $f$ of (11.1.1) and the initial data $u_0$ for (11.1.2) are equal to zero. The only nonzero data are the boundary data in (11.1.3).
We now make a further modification, which simplifies the analysis. This is to extend the time interval from $(0, \infty)$ to $(-\infty, \infty)$. This extension allows for a convenient use of the Laplace transform in the analysis of the boundary condition.

The next simplification depends on the idea that well-posedness of boundary conditions of partial differential equations is essentially a local property. That is, we need consider only the differential equation and boundary condition at each boundary point, and if it is well-posed at each of these points, then the global problem is well-posed. The proof of this result is beyond this text, but this principle is extremely useful. For a general domain $U$ with smooth boundary, the analysis of the initial-boundary value problem at a boundary point $x_0$ at time $t_0$ is reduced to considering the differential equation with the values of the coefficients fixed at $(t_0, x_0)$ and also the boundary conditions with their coefficients evaluated at $(t_0, x_0)$. The domain $U$ can be replaced by the half-space formed by the tangent plane to the boundary of $U$ at $x_0$ and the interior normal at $x_0$. In this way the general analysis of initial-boundary value problems can be reduced to the analysis of constant coefficient equations on half-spaces. If each of these frozen coefficient problems is well-posed, then the original problem is well-posed.

Similar results hold for finite difference schemes, although the theory is not as complete as it is for partial differential equations. We consider only one-dimensional problems for difference schemes, and in this case the stability of an initial-boundary value problem can be analyzed by considering the pure initial value problem and the two initial-boundary value problems arising from the two endpoints. As with the well-posedness of partial differential equations, the stability of the initial-boundary value problem can be determined by examining only the frozen coefficient problems.
The Laplace Transform

The Laplace transform is employed with an independent variable, such as time, for which the directionality is important. For a function $u(t)$ defined for $t \in R$, the Laplace transform of $u$ is a function $\tilde u$ of a complex variable $s = \eta + i\tau$ defined as follows.

Definition 11.1.1. The Laplace transform $\tilde u(s)$ is equal to the Fourier transform of $e^{-\eta t}u(t)$ with dual variable $\tau$, i.e.,

$$\tilde u(s) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-(\eta + i\tau)t}u(t)\,dt,$$

where $s = \eta + i\tau$. (Note that most definitions of the Laplace transform omit the factor of $(2\pi)^{-1/2}$; we include it for symmetry with the Fourier transform.)

Based on the Fourier transform, we have the Laplace inversion formula

$$u(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{(\eta + i\tau)t}\tilde u(\eta + i\tau)\,d\tau = \frac{1}{\sqrt{2\pi}\,i}\int_{\eta - i\infty}^{\eta + i\infty} e^{st}\tilde u(s)\,ds. \qquad (11.1.4)$$
An important result for the Laplace transform is that $\tilde u(s)$ is an analytic function of the complex variable $s$; see Appendix C. Because we are interested in the forward direction of time, the functions of interest to us will be analytic for $\eta$ positive. The discrete Laplace transform is defined in a similar manner.

Definition 11.1.2. The Laplace transform of a discrete function $v^n$ on a grid with spacing $k$ is defined by

$$\tilde v(s) = \frac{1}{\sqrt{2\pi}}\,k\sum_{n=-\infty}^\infty e^{-(\eta + i\tau)nk}v^n.$$

Usually we set $z = e^{(\eta + i\tau)k}$ and, with an abuse of notation, set

$$\tilde v(z) = \frac{1}{\sqrt{2\pi}}\,k\sum_{n=-\infty}^\infty z^{-n}v^n.$$

We have the inversion formula

$$v^n = \frac{1}{\sqrt{2\pi}}\int_{-\pi/k}^{\pi/k} e^{snk}\tilde v(s)\,d\tau = \frac{1}{\sqrt{2\pi}\,ik}\oint_{|z| = e^{\eta k}} z^{n-1}\tilde v(z)\,dz. \qquad (11.1.5)$$

There should be no confusion about the use of either $s$ or $z$ as the Laplace transform dual variable. Notice that the relation $\eta \ge 0$ is equivalent to $|z| \ge 1$.
Example 11.1.1. As an example of the Laplace transform of a function of $t$, consider

$$u(t) = \begin{cases} 1 & \text{if } t > a, \\ 0 & \text{otherwise.} \end{cases}$$

We have

$$\tilde u(s) = \frac{1}{\sqrt{2\pi}}\int_a^\infty e^{-st}\,dt = \frac{1}{\sqrt{2\pi}}\,\frac{e^{-as}}{s}.$$

To check the inversion formula we have the integral

$$u(t) = \frac{1}{2\pi i}\int_{\eta - i\infty}^{\eta + i\infty}\frac{e^{s(t-a)}}{s}\,ds. \qquad (11.1.6)$$

The path of integration is from $\eta - i\infty$ to $\eta + i\infty$, where the value of $\eta$ is positive. In general, the value of $\eta$ must be such that $\tilde u(s)$ is bounded for $\operatorname{Re} s > \eta$. Since $\tilde u(s)$ is an analytic function, we can evaluate the integral with contour integration; see Appendix C. For $t$ less than $a$, consider the integral over the closed curve $\Gamma$ given by the two sections

$$\Gamma = \begin{cases} \eta + i\gamma & \text{for } -R \le \gamma \le R, \\ \eta + Re^{i\theta} & \text{for } -\pi/2 \le \theta \le \pi/2. \end{cases}$$

The integrand of (11.1.6) tends to 0 along the half-circle portion of $\Gamma$ as $R$ tends to infinity. So, by taking the limit as $R$ tends to infinity, and since the integrand has no poles inside $\Gamma$,

$$0 = \oint_\Gamma \frac{e^{s(t-a)}}{s}\,ds = \int_{\eta - i\infty}^{\eta + i\infty}\frac{e^{s(t-a)}}{s}\,ds,$$

and thus $u(t)$ is 0 for $t$ less than $a$.

For the case with $t$ greater than $a$ we use the curve

$$\Gamma = \begin{cases} \eta + i\gamma & \text{for } -R \le \gamma \le R, \\ \eta - Re^{i\theta} & \text{for } -\pi/2 \le \theta \le \pi/2. \end{cases}$$

As before, the integrand of (11.1.6) tends to 0 along the half-circle portion of $\Gamma$ as $R$ tends to infinity, but in this case there is a pole of the integrand inside $\Gamma$. Thus

$$1 = \frac{1}{2\pi i}\oint_\Gamma \frac{e^{s(t-a)}}{s}\,ds = \frac{1}{2\pi i}\int_{\eta - i\infty}^{\eta + i\infty}\frac{e^{s(t-a)}}{s}\,ds,$$

and thus $u(t)$ is 1 for $t$ greater than $a$. The case $t$ equal to $a$ requires more careful analysis; it can be shown that $u(a) = 1/2$.
Example 11.1.2. As an example of the Laplace transform for a discrete function, consider

$$v^n = \begin{cases} \alpha^n & \text{if } n \ge N, \\ 0 & \text{otherwise} \end{cases}$$

for any nonzero value of $\alpha$. The transform is

$$\tilde v(z) = \frac{1}{\sqrt{2\pi}}\,k\sum_{n=N}^\infty z^{-n}\alpha^n = \frac{1}{\sqrt{2\pi}}\,k\left(\frac{\alpha}{z}\right)^N\frac{z}{z - \alpha}$$

for $z$ with $|z| > |\alpha|$. The inversion formula (11.1.5) gives

$$v^n = \frac{1}{2\pi i}\oint_{|z| = e^{\eta k}}\frac{\alpha^N}{z^{N-n}}\,\frac{1}{z - \alpha}\,dz.$$

The path of integration is chosen to enclose both the origin and $\alpha$. For $n$ greater than or equal to $N$, there is only the one pole at $\alpha$, and the contour integral gives $v^n = \alpha^n$. For $n$ less than $N$ there is a pole at 0 in addition to the one at $\alpha$. To evaluate the residue at 0 we use the expansion

$$\frac{\alpha^N}{z^{N-n}}\,\frac{1}{z - \alpha} = -\frac{\alpha^{N-1}}{z^{N-n}}\,\frac{1}{1 - z/\alpha} = -\alpha^{N-1}\sum_{k=0}^\infty\frac{z^{k-N+n}}{\alpha^k}.$$

The residue at 0 is seen, by taking the term with $k = N - n - 1$, to be $-\alpha^n$; the residue at $\alpha$ is $\alpha^n$. The sum of the two residues is 0, and thus $v^n$ is 0 for $n$ less than $N$.
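The closed form in this example is easy to confirm numerically. In the sketch below, $\alpha$, $N$, $k$, and $z$ are arbitrary test values with $|z| > |\alpha|$; a truncated version of the defining sum matches the closed-form expression to rounding error.

```python
# Sketch checking the closed form of the transform in Example 11.1.2.
import numpy as np

alpha, N, k = 0.7, 5, 0.1
z = 1.2 * np.exp(0.3j)                   # test point with |z| > |alpha|

direct = k / np.sqrt(2*np.pi) * sum(z**(-n) * alpha**n for n in range(N, 2000))
closed = k / np.sqrt(2*np.pi) * (alpha / z)**N * z / (z - alpha)
print(abs(direct - closed))              # ~ 1e-16 up to truncation error
```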
From Parseval's relations for the Fourier transform, we have equality of the norm of the function and its transform,

$$\|u\|_\eta^2 = \int_{-\infty}^\infty e^{-2\eta t}|u(t)|^2\,dt = \int_{-\infty}^\infty|\tilde u(\eta + i\tau)|^2\,d\tau \qquad (11.1.7)$$

and

$$\|v\|_{\eta,k}^2 = k\sum_{n=-\infty}^\infty e^{-2\eta kn}|v^n|^2 = \int_{-\pi/k}^{\pi/k}|\tilde v(\eta + i\tau)|^2\,d\tau \qquad (11.1.8)$$

or, equivalently,

$$\|v\|_{\eta,k}^2 = k\sum_{n=-\infty}^\infty|z|^{-2n}|v^n|^2 = k^{-1}\oint_{|z| = e^{\eta k}}|\tilde v(z)|^2\,d\theta, \qquad (11.1.9)$$

where $z = e^{\eta k}e^{i\theta}$, i.e., $\theta = \tau k$. The subscript $\eta$ on the norm identifies $\eta$ as a parameter. By choosing $\eta$ to be positive we are specifying that we are considering the initial-boundary problem for $t$ in the positive direction.
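The discrete Parseval relation (11.1.8) can also be verified numerically. The sketch below uses a finitely supported sequence so that both sides are computable; the sequence, $\eta$, $k$, and the quadrature grid are arbitrary test choices.

```python
# Sketch verifying Parseval's relation (11.1.8) for a finitely supported v^n.
import numpy as np

rng = np.random.default_rng(0)
N, k, eta = 64, 0.05, 0.3
v = rng.standard_normal(N)               # v^n = 0 outside 0 <= n < N
n = np.arange(N)

lhs = k * np.sum(np.exp(-2 * eta * k * n) * v**2)

def vtilde(tau):
    return k / np.sqrt(2 * np.pi) * np.sum(np.exp(-(eta + 1j * tau) * k * n) * v)

taus = np.linspace(-np.pi / k, np.pi / k, 20001)
vals = np.abs(np.array([vtilde(t) for t in taus]))**2
dtau = taus[1] - taus[0]
rhs = dtau * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # trapezoid rule
print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}")   # the two values agree closely
```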
When we consider both time and space dimensions, we have the norms

$$\|u\|_\eta^2 = \int_{-\infty}^\infty\int_0^\infty\int_{-\infty}^\infty e^{-2\eta t}|u(t, x, y)|^2\,dt\,dx\,dy = \int_{-\infty}^\infty\int_0^\infty\int_{-\infty}^\infty|\hat u(\eta + i\tau, x, \omega)|^2\,d\tau\,dx\,d\omega,$$

where $\hat u$ is the transform in both $t$ and $y$. We also use the norm symbol with single bars for the norm over the boundary; for example,

$$|\beta|_\eta^2 = \int_{-\infty}^\infty\int_{-\infty}^\infty e^{-2\eta t}|\beta(t, y)|^2\,dt\,dy = \int_{-\infty}^\infty\int_{-\infty}^\infty|\hat\beta(\eta + i\tau, \omega)|^2\,d\tau\,d\omega.$$

The estimates for well-posed initial-boundary value problems are of the form

$$\|u\|_\eta^2 + |u|_\eta^2 \le C(\eta)\left(|\beta|_\eta^2 + \|f\|_\eta^2 + \|u_0\|^2\right),$$

showing that the norms of the solution in the interior and on the boundary are bounded by the norms of the data $\beta$ on the boundary as in (11.1.3), the data $f$ as in (11.1.1), and the initial data $u_0$ as in (11.1.2). By the process given earlier, the general problem can be reduced to the case in which the only nonzero data are the boundary data $\beta$. The estimate relating the norms of the solution to the boundary data can be used to give the general estimate, but these arguments are beyond this text.
A General Analysis of Boundary Conditions

Before delving into the particular details of the analysis of boundary conditions, it will be helpful to make some general comments. The purpose of these comments is to illuminate the basic ideas of these theories.

Each boundary value problem, when transformed under the Fourier and Laplace transforms, gives rise to a set of linear equations, one equation for each boundary condition. The unknowns to be determined by these equations characterize the solution in the interior of the domain. The boundary value problem is well-posed or stable if and only if this linear system can be solved and if the solution can be bounded appropriately by the boundary data.

To emphasize the basic ideas and to illustrate the approach, we consider first a simple problem in linear algebra. Given a system of linear equations

$$Ax = b,$$

it is a standard result that there is a unique solution to this system if and only if there are no nontrivial solutions to the system

$$A\zeta = 0.$$

If the only solution to this homogeneous equation is the trivial solution, then there is a constant, namely $\|A^{-1}\|$, such that

$$\|x\| \le \|A^{-1}\|\,\|b\|.$$

In the theory for boundary conditions we wish to know if there is a constant such that the solution is bounded in terms of the boundary data. To do this, we need examine only homogeneous equations, as with the simple case just discussed. Many of the theorems in the theory of boundary conditions state that if there are no nontrivial solutions to a certain class of problems, then there is a constant by which the solution to the boundary value problem is bounded by the boundary data.

We now consider a set of linear equations

$$A(\rho)x(\rho) = b(\rho), \qquad (11.1.10)$$

where the matrix $A(\rho)$ and data $b(\rho)$ depend continuously on a parameter $\rho$, which is an element of an open set, say $\rho \in (0, 1)$. There is a solution to equation (11.1.10) for each value of $\rho$ if and only if there are no nontrivial solutions to the homogeneous problems. However, if we desire the bound on $x(\rho)$ to be independent of $\rho$, we must also consider the homogeneous problem for $\rho$ equal to 0 and to 1. Assuming that $A(\rho)$ is defined for $\rho$ in $[0, 1]$, if there are no nontrivial solutions to

$$A(\rho)\zeta = 0$$

for $\rho$ in the closed set $[0, 1]$, then there is a constant $C$, independent of $\rho$, such that

$$\|x(\rho)\| \le C\|b(\rho)\|$$

for $\rho \in (0, 1)$. For boundary value problems the parameters, such as $(s, \omega)$ or $z$, are in open sets, e.g., $\operatorname{Re} s > 0$ or $|z| > 1$, and it is to be determined whether the solution can be bounded by the boundary data uniformly, i.e., independently of the parameters. It is rarely found in practical applications that a boundary condition is ill-posed or unstable for values of the parameters in the interior of the parameter set; the nontrivial solutions to the homogeneous equations usually occur at the boundary of the parameter set. This may present some difficulties, as we will see, but a consideration of the basic ideas will give guidance toward handling the difficulties.
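A toy instance of (11.1.10) makes the point concrete. In the sketch below, the matrix is an arbitrary illustrative choice: it is invertible for every $\rho$ in the open interval but singular at the endpoint $\rho = 0$, so no bound uniform in $\rho$ exists.

```python
# Toy instance of (11.1.10): A(rho) singular at the parameter-set boundary.
import numpy as np

def A(rho):
    return np.array([[1.0, 1.0],
                     [1.0, 1.0 + rho]])   # det A(rho) = rho

b = np.array([1.0, 0.0])
for rho in [1e-1, 1e-3, 1e-5]:
    x = np.linalg.solve(A(rho), b)
    print(f"rho = {rho:.0e}, ||x(rho)|| = {np.linalg.norm(x):.2e}")
# ||x(rho)|| grows like 1/rho: the homogeneous problem A(0) zeta = 0 has a
# nontrivial solution, exactly the situation the theory must exclude.
```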
In the next two sections we examine the stability of boundary conditions for finite difference schemes, and in Section 11.4 we examine the well-posedness of boundary conditions for partial differential equations.
Exercises

11.1.1. Compute the Laplace transform for the function

$$u(t) = \begin{cases} e^{\alpha t} & \text{if } t \ge 0, \\ 0 & \text{if } t < 0. \end{cases}$$

Verify the Laplace inversion formula (11.1.4) for this function.

11.1.2. Compute the Laplace transform for the function

$$v^n = \begin{cases} n & \text{if } n \ge 0, \\ 0 & \text{if } n < 0. \end{cases}$$

Verify the Laplace inversion formula (11.1.5) for this function.

11.1.3. Show that if two discrete functions $a_m$ and $b_m$ are related by $a_m = b_{m+1}$, then the Laplace transforms satisfy $\tilde a(z) = z\tilde b(z)$.

11.1.4. Verify Parseval's relations (11.1.7), (11.1.8), and (11.1.9) for the Laplace transform by using Parseval's relations for the Fourier transform.
11.2 Analysis of Boundary Conditions for the Leapfrog Scheme

We begin our analysis of boundary conditions for finite difference schemes by considering the leapfrog scheme for $u_t - au_x = 0$, with $a$ positive, written as

$$v_m^{n+1} = v_m^{n-1} + a\lambda\left(v_{m+1}^n - v_{m-1}^n\right) \qquad (11.2.1)$$

on the region $R^+$, which is the semi-infinite interval $[0, \infty)$, for $-\infty < t < \infty$. Notice that we have changed the sign of the propagation speed from the one-way wave equation (1.1.1) considered in most other chapters. The differential equation requires no boundary condition, but the scheme requires a numerical boundary condition at $x = 0$. We will examine in detail four boundary conditions for this scheme. This analysis will serve to motivate the more general discussion of the next section.
The first two boundary conditions are extrapolations that determine $v_0^n$ from values of $v_m^n$ for $m$ positive, and the second two are one-sided difference approximations to the differential equation. These boundary conditions are

$$v_0^{n+1} = v_1^{n+1} + \beta^{n+1}, \qquad (11.2.2a)$$
$$v_0^{n+1} = v_1^n + \beta^{n+1}, \qquad (11.2.2b)$$
$$v_0^{n+1} = v_0^{n-1} + 2a\lambda\left(v_1^n - v_0^n\right) + \beta^{n+1}, \qquad (11.2.2c)$$
$$v_0^{n+1} = v_0^n + a\lambda\left(v_1^n - v_0^n\right) + \beta^{n+1}, \qquad (11.2.2d)$$

where, again, the function $\beta^{n+1}$ is the result of subtracting solutions so that the initial function is zero.
We begin by transforming scheme (11.2.1) via the Laplace transform in the time variable to form the resolvent equation (see Exercise 11.1.3)

$$\left(z - \frac{1}{z}\right)\tilde v_m = a\lambda\left(\tilde v_{m+1} - \tilde v_{m-1}\right). \qquad (11.2.3)$$

We wish to obtain solutions to the resolvent equation that are in $L^2(hZ^+)$ as functions of $x_m$. The general solution to (11.2.3) is obtained as follows. Replacing $\tilde v_m$ by $\kappa^m$ for $m \ge 0$, we obtain the equation

$$z - \frac{1}{z} = a\lambda\left(\kappa - \frac{1}{\kappa}\right) \qquad (11.2.4)$$

for $\kappa$ as a function of $z$. Equation (11.2.4) has in general two roots $\kappa_-(z)$ and $\kappa_+(z)$, which are continuous functions of $z$. The general solution of (11.2.3) is then given by

$$\tilde v_m = A\kappa_-(z)^m + B\kappa_+(z)^m$$

when $\kappa_-$ and $\kappa_+$ are distinct.

The first significant result is that for $|z| > 1$, one of the roots, which we denote by $\kappa_-(z)$, satisfies $|\kappa_-(z)| < 1$, and the other root, denoted by $\kappa_+(z)$, satisfies $|\kappa_+(z)| > 1$. In particular, this means that the two roots do not cross the unit circle for $z$ larger than 1 in magnitude. This result is a direct consequence of the stability of the scheme. The general result is stated in Theorem 11.3.1. We could verify this result by directly solving equation (11.2.4), essentially a quadratic in $\kappa$, for the two roots $\kappa_-$ and $\kappa_+$. We will, however, regard equation (11.2.4) as implicitly defining the two functions and avoid explicitly determining $\kappa_-(z)$ and $\kappa_+(z)$. As we will see, there are only a few facts we need regarding these functions, and the information we need can be determined while avoiding the algebra involved in explicitly computing $\kappa_-(z)$ and $\kappa_+(z)$. We use this same approach on more difficult problems in the next section.

Because we are interested only in those solutions of the resolvent equation that are in $L^2(hZ^+)$ when $|z| > 1$, the general form of $\tilde v_m$ is

$$\tilde v_m = A(z)\kappa_-(z)^m, \qquad (11.2.5)$$

where $|\kappa_-(z)| < 1$ for $|z| > 1$.
The coefficient $A(z)$ is to be determined by the transform of the boundary function, $\tilde\beta(z)$. Substituting from (11.2.5) into the boundary conditions, we obtain

$$A(z)\left[1 - \kappa_-(z)\right] = \tilde\beta(z), \qquad (11.2.6a)$$
$$A(z)\left[z - \kappa_-(z)\right] = z\tilde\beta(z), \qquad (11.2.6b)$$
$$A(z)\left\{z - z^{-1} - 2a\lambda[\kappa_-(z) - 1]\right\} = z\tilde\beta(z), \qquad (11.2.6c)$$
$$A(z)\left\{z - 1 - a\lambda[\kappa_-(z) - 1]\right\} = z\tilde\beta(z) \qquad (11.2.6d)$$

for each of the boundary conditions (11.2.2a), (11.2.2b), (11.2.2c), and (11.2.2d), respectively.
The norm of the solution $\tilde v_m$ in $L^2(hZ^+)$ is given by

$$\|\tilde v(z)\|^2 = h\sum_{m=0}^\infty|\tilde v_m(z)|^2 = h|A(z)|^2\sum_{m=0}^\infty|\kappa_-(z)|^{2m} = \frac{h|A(z)|^2}{1 - |\kappa_-(z)|^2}.$$

In terms of the function $v_m^n$ the norm is

$$\|v\|_{\eta,h}^2 = \int_{-\pi/k}^{\pi/k}\frac{h|A(e^{sk})|^2}{1 - |\kappa_-(e^{sk})|^2}\,d\tau,$$

where $s = \eta + i\tau$. For simplicity, we use only the subscript $h$ rather than both $h$ and $k$ to denote the norm involving both $x$ and $t$.
To obtain an estimate of the form

$$\|v\|_{\eta,h}^2 \le C|\beta|_{\eta,h}^2, \qquad (11.2.7)$$

we must substitute the expression giving $A(z)$ as a function of $\tilde\beta$. For the first two boundary conditions, i.e., (11.2.2a) and (11.2.2b), we have, from (11.2.6a) and (11.2.6b),

$$\|v\|_{\eta,h}^2 = \int_{-\pi/k}^{\pi/k}\frac{|\tilde\beta(e^{sk})|^2}{|1 - \kappa_-(e^{sk})|^2}\,\frac{h}{1 - |\kappa_-(e^{sk})|^2}\,d\tau \qquad (11.2.8a)$$

and

$$\|v\|_{\eta,h}^2 = \int_{-\pi/k}^{\pi/k}\frac{|z|^2|\tilde\beta(e^{sk})|^2}{|e^{sk} - \kappa_-(e^{sk})|^2}\,\frac{h}{1 - |\kappa_-(e^{sk})|^2}\,d\tau, \qquad (11.2.8b)$$

respectively.
These equations show that we must obtain some lower bound on $|1 - \kappa_-|$ for (11.2.8a) and on $|e^{sk} - \kappa_-|$ for (11.2.8b). Because we choose $\eta$ positive, we have that $|z| > 1$ and, by Theorem 11.3.1, $|\kappa_-(z)| < 1$; therefore, neither of the expressions $|1 - \kappa_-(z)|$ nor $|z - \kappa_-(z)|$ is zero, but, as $k$ tends to zero, $z$, which is $e^{sk}$, approaches arbitrarily close to the unit circle. Moreover, because $\kappa_-(z)$ is a continuous, even analytic, function of $z$, we can examine the behavior of $\kappa_-(z)$ for $k$ equal to 0, i.e., for $|z| = 1$. The behavior for $k$ positive but near to 0 can then be determined by methods such as Taylor series.

This analysis reduces to checking for nontrivial solutions of the form (11.2.5) that solve the homogeneous boundary condition. Thus we must check whether there is a $\kappa_-(z)$ such that

$$A(z)\left[1 - \kappa_-(z)\right] = 0 \qquad (11.2.9a)$$

or

$$A(z)\left[z - \kappa_-(z)\right] = 0 \qquad (11.2.9b)$$

for the two boundary conditions (11.2.2a) and (11.2.2b), respectively.

To analyze boundary conditions (11.2.2a)–(11.2.2d), we first set $\kappa = 1$ in (11.2.4), and we easily find that if $\kappa = 1$, then either $z = 1$ or $z = -1$. Conversely, if $z = 1$ or $z = -1$, then $\kappa = 1$ is a root. This shows us that for $z$ equal to 1, either $\kappa_-(1) = 1$ and $\kappa_+(1) = -1$ or, alternatively, $\kappa_-(1) = -1$ and $\kappa_+(1) = 1$. To determine if the first of these cases holds, i.e., if $\kappa_-(1) = 1$, we consider $z = 1 + \varepsilon$ and let $\kappa = 1 + \delta$ for small values of $\varepsilon$ and $\delta$. If for $\varepsilon > 0$ we find that $\delta < 0$, then $\kappa_-(1)$ is 1, but if instead for $\varepsilon > 0$ we find that $\delta > 0$, then it is $\kappa_+(1)$ that is 1, and so $\kappa_-(1)$ is $-1$.

κ- κ- κ+
-1 1 -1 1
κ+

z -1 z 1

Figure 11.1. Behavior of κ− and κ+ as functions of z.

Substituting $z = 1 + \varepsilon$ and $\kappa = 1 + \delta$ in (11.2.4), we obtain

$$z - \frac{1}{z} = 2\varepsilon + O(\varepsilon^2) = a\lambda\left(\kappa - \frac{1}{\kappa}\right) = a\lambda\left[2\delta + O(\delta^2)\right].$$

Since $a\lambda$ is positive, we see that $\varepsilon > 0$ implies $\delta > 0$; thus it is $\kappa_+(1)$ that is 1, and by default $\kappa_-(1)$ is $-1$, as represented in the right-hand image of Figure 11.1.

Similarly, for $z = -(1 + \varepsilon)$ and $\kappa = 1 + \delta$, we find

$$-2\varepsilon + O(\varepsilon^2) = a\lambda\left[2\delta + O(\delta^2)\right],$$

and $\varepsilon > 0$ implies $\delta < 0$. Thus $\kappa_-(-1) = 1$, as depicted in the left-hand image of Figure 11.1. Notice that for $z$ near $-1$, we have that $1 - \kappa_-(z) = -\delta = O(\varepsilon) = O(|z| - 1) = O(k\eta)$. Thus for boundary condition (11.2.2a) we have

$$|1 - \kappa_-(z)| \ge c\eta k, \qquad (11.2.10a)$$
and this is the best possible estimate for the denominator of (11.2.8a), being achieved at $\tau = \pm\pi/k$, i.e., for $z$ near $-1$.
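The perturbation arguments above can be checked by tracking the roots of (11.2.4) numerically. In the sketch below, $a\lambda = 0.95$ is an arbitrary test value; $\kappa$ satisfies the quadratic $a\lambda\kappa^2 - (z - 1/z)\kappa - a\lambda = 0$, and $\kappa_-$ is identified as the root of smaller magnitude for $z$ just outside the unit circle.

```python
# Sketch tracking kappa_-(z) from (11.2.4) for z just outside the unit circle.
import numpy as np

alam = 0.95                                   # a*lambda, a test value

def kappa_minus(z):
    # roots of a*lam*kappa^2 - (z - 1/z)*kappa - a*lam = 0
    roots = np.roots([alam, -(z - 1/z), -alam])
    return roots[np.argmin(np.abs(roots))]    # the root inside the unit circle

for z0 in [1.0, -1.0]:
    z = z0 * (1 + 1e-4)                       # approach the circle from outside
    print(f"z near {z0:+.0f}: kappa_-(z) = {kappa_minus(z):.4f}")
# Prints kappa_-(z) near -1 for z near +1 and near +1 for z near -1,
# consistent with the analysis and with Figure 11.1.
```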
For boundary condition (11.2.2b) we consider the quantity $z - \kappa_-(z)$. To see if this quantity can be zero or close to zero, we substitute $\kappa = z$ in (11.2.4), obtaining

$$z - \frac{1}{z} = a\lambda\left(z - \frac{1}{z}\right).$$

Since $a\lambda$ is less than 1, this equation is satisfied only if $z - 1/z$ is zero, i.e., only if $z = 1$ or $z = -1$. As we showed in the preceding analysis, $\kappa_-(1) = -1$ and $\kappa_-(-1) = 1$. Therefore, it cannot be true that $|z - \kappa_-(z)|$ is zero for $|z| \ge 1$. Hence there is a constant $c$, independent of $k$, such that

$$|z - \kappa_-(z)| \ge c. \qquad (11.2.10b)$$

From these estimates we see from (11.2.8a) and (11.2.8b) that the dependence of the solution on the data is given by

$$\|v\|_{\eta,h}^2 \le \frac{1}{c^2k^2}\int_{-\pi/k}^{\pi/k}|\tilde\beta|^2\frac{h}{1 - |\kappa_-|^2}\,d\tau \quad\text{for (11.2.2a)} \qquad (11.2.11a)$$

and

$$\|v\|_{\eta,h}^2 \le c^{-2}\int_{-\pi/k}^{\pi/k}|\tilde\beta|^2\frac{h}{1 - |\kappa_-|^2}\,d\tau \quad\text{for (11.2.2b).} \qquad (11.2.11b)$$
It remains to estimate the term $h/(1 - |\kappa_-(z)|^2)$ in the two expressions (11.2.11a) and (11.2.11b). For general schemes, as we will show in Lemma 11.3.2, we have that

$$1 - |\kappa_-(z)| \ge c_0\eta k \qquad (11.2.12)$$

for some constant $c_0$. We now show this for the particular case of the leapfrog scheme. We set $z = e^{sk} = e^{i\tau}\left(1 + \eta k + O(\eta k)^2\right)$ and consider two cases: either $|\kappa_-(z)| = 1$ for $k\eta$ equal to 0, or $|\kappa_-(z)| < 1$ for $k\eta$ equal to 0. We need to consider how $|\kappa_-(z)|$ depends on $\eta k$. In the first case set $\kappa_-(z) = e^{i\varphi}(1 - \delta)$, and then, from equation (11.2.4),

$$2i\sin\tau + 2\eta k\cos\tau + O(\eta k)^2 = a\lambda\left[2i\sin\varphi + 2\delta\cos\varphi + O(\delta^2)\right]. \qquad (11.2.13)$$

We obtain that $\sin\tau = a\lambda\sin\varphi$, and so $|\sin\tau| \le a\lambda$, from which we conclude that

$$|\cos\tau| \ge \sqrt{1 - (a\lambda)^2}.$$

Thus we obtain for $\delta$ from (11.2.13)

$$\delta = \frac{\cos\tau}{\cos\varphi}\,\eta k + O(\eta k)^2 \ge \sqrt{1 - (a\lambda)^2}\,\eta k + O(\eta k)^2.$$

For $|\sin\tau|$ greater than $a\lambda$, the value of $|\kappa_-(z)|$ is strictly less than 1. Therefore, for $\eta$ positive and $k$ in some range $0 < k \le k_0(\eta)$, it follows that (11.2.12) holds; thus, since $\lambda$ is constant,

$$\frac{h}{1 - |\kappa_-(z)|^2} \le \frac{h}{1 - |\kappa_-(z)|} \le \frac{c}{\eta}.$$
From (11.2.11a) we obtain the estimate

$$\|v\|_{\eta,h}^2 \le \frac{k^{-2}}{\eta}\,c^*|\beta|_{\eta,h}^2 \qquad (11.2.14a)$$

for boundary condition (11.2.2a), where $c^*$ is some constant, and from (11.2.11b)

$$\|v\|_{\eta,h}^2 \le \frac{c^*}{\eta}\,|\beta|_{\eta,h}^2 \qquad (11.2.14b)$$

for boundary condition (11.2.2b), for some other value of $c^*$.

Estimate (11.2.14b) is of the form of (11.2.7) and shows that boundary condition (11.2.2b) is stable. However, because the estimate (11.2.10a) and, therefore, the estimate (11.2.14a) are the best possible estimates, boundary condition (11.2.2a) is unstable. By considering when the estimate (11.2.10a) is achieved, we can choose $v$ and $\beta$ so that

$$\|v\|_{\eta,h}^2 \ge ck^{-2}|\beta|_{\eta,h}^2.$$

Thus for particular small data, i.e., small $|\beta|_{\eta,h}$, we can have $\|v\|_{\eta,h}$ arbitrarily large, and thus boundary condition (11.2.2a) is unstable.
We complete this section by analyzing boundary conditions (11.2.2c) and (11.2.2d). We have that $\kappa_-(z)$ is given by equation (11.2.4) for $|z| \ge 1$ and, as before, $|\kappa_-(z)| < 1$ for $z$ outside the unit circle. Boundary condition (11.2.2c), by (11.2.5), gives the equation

$$z - z^{-1} = 2a\lambda(\kappa_- - 1) \qquad (11.2.9c)$$

as the equation to be solved if there is to be a nontrivial solution to the homogeneous boundary value problem. (The numbering of this last equation is chosen to show the relationship to (11.2.9a) and (11.2.9b).) From (11.2.4) and (11.2.9c) we obtain

$$z - \frac{1}{z} = 2a\lambda(\kappa_- - 1) = a\lambda\left(\kappa_- - \frac{1}{\kappa_-}\right).$$

Since $a\lambda$ is not zero, we have that $\kappa_- = 1$ is the only solution to this equation. We have already determined from equation (11.2.4) that $\kappa_-$ is equal to 1 only when $z$ is $-1$. We see that $z$ equal to $-1$ and $\kappa_-$ equal to 1 satisfies equation (11.2.9c), and thus the boundary condition (11.2.2c) is unstable.

Boundary condition (11.2.2d) gives the equation

$$z - 1 = a\lambda(\kappa_- - 1) \qquad (11.2.9d)$$

to be solved for a solution to the homogeneous initial-boundary value problem. Dividing equation (11.2.4) by (11.2.9d), we obtain, for $z$ and $\kappa_-$ not equal to 1,

$$\frac{z + 1}{z} = \frac{z - 1/z}{z - 1} = \frac{a\lambda(\kappa_- - 1/\kappa_-)}{a\lambda(\kappa_- - 1)} = \frac{\kappa_- + 1}{\kappa_-},$$

which implies that $z$ equals $\kappa_-$. However, our analysis for boundary conditions (11.2.2a) and (11.2.2b) showed that $\kappa_-(1)$ is not equal to 1 when $z$ is 1, nor is $\kappa_-(z)$ equal to $z$ for any other $z$. Thus there is no solution to (11.2.9d), and therefore boundary condition (11.2.2d) is stable.
[Three panels showing the computed solution (dots connected by lines) and the exact solution (solid line) at successive times.]

Figure 11.2. Unstable boundary condition for the leapfrog scheme.
Example 11.2.1. The conclusions of this section are illustrated in plots of the solution of the one-way wave equation computed with the leapfrog scheme and boundary condition

$$v_M^{n+1} = 2v_{M-1}^{n+1} - v_{M-2}^{n+1},$$

which is similar to (11.2.2a) and is unstable; see Exercise 11.2.3. Figure 11.2 shows the results of using the leapfrog scheme (11.2.1) with $a$ equal to 1 on the interval $[0, 1]$, with the solution specified at the left-hand endpoint. The exact solution of the differential equation is $u(t, x) = \sin 2\pi(x + t)$. The solution of the finite difference scheme is plotted with a line connecting the dots at the grid points, and the exact solution is plotted with a solid line in the figure. The exact solution was also used to initialize the first time step. The value of $h$ is 0.02 and $\lambda$ is 0.95. The upper left plot in Figure 11.2 shows the solution for boundary condition (11.2.2a) at time 1.33. At this time, there is some inaccuracy near the boundary opposite to that where the numerical boundary condition is applied. This is evidence of the parasitic mode that propagates in the direction opposite to that of the true solution. The upper right graph in Figure 11.2 shows the result of the computation at time 1.90. In addition to the inaccuracy at the left boundary, there are small oscillations in the solution at the right boundary. In the lower plot in Figure 11.2, which shows the solution at time 2.47, it is seen that the solution using the unstable boundary condition has become very oscillatory. Within a few more time steps the solution becomes much worse. Similar results are shown in Figure 3.3 of Chapter 3.

By comparison, the use of the more accurate boundary condition (11.2.15b), which is similar to (11.2.2b), will produce very accurate solutions (see Exercise 11.2.3).
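A sketch reproducing this computation is given below, using the parameters stated in the example ($h = 0.02$, $\lambda = 0.95$, $a = 1$); the implementation details, such as the array layout and how the boundary values are applied, are our own choices.

```python
# Sketch of Example 11.2.1: leapfrog with the unstable extrapolation at x = 1.
import numpy as np

h, lam = 0.02, 0.95
k = lam * h                                  # k = 0.019
M = round(1.0 / h)                           # 50 intervals, 51 grid points
x = np.linspace(0.0, 1.0, M + 1)
exact = lambda t: np.sin(2 * np.pi * (x + t))

v0, v1 = exact(0.0), exact(k)                # exact data for the first time step
for n in range(2, 131):                      # march to t = 130*k = 2.47
    v2 = np.empty_like(v1)
    v2[1:-1] = v0[1:-1] + lam * (v1[2:] - v1[:-2])   # leapfrog (11.2.1), a = 1
    v2[0] = np.sin(2 * np.pi * n * k)                # exact solution at x = 0
    v2[-1] = 2 * v2[-2] - v2[-3]                     # unstable extrapolation
    v0, v1 = v1, v2
    if n in (70, 100, 130):                  # t = 1.33, 1.90, 2.47
        print(f"t = {n*k:.2f}: max|v - u| = {np.max(np.abs(v1 - exact(n*k))):.2e}")
# The error grows from a small boundary oscillation into large oscillations,
# as in Figure 11.2.
```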
Exercises

11.2.1. Show that the leapfrog scheme (11.2.1) with the boundary condition

$$v_0^{n+1}(1 + a\lambda) - a\lambda v_1^{n+1} - v_0^n = \beta^{n+1}$$

is stable.

11.2.2. Show that the leapfrog scheme (11.2.1) with the boundary condition

$$\frac{1}{2}\left(v_1^{n+1} + v_0^{n+1} - v_1^n - v_0^n\right) = \frac{a\lambda}{2}\left(v_1^{n+1} - v_0^{n+1} + v_1^n - v_0^n\right) + \beta^{n+1}$$

is stable.

11.2.3. Based on the results for (11.2.2a) and (11.2.2b), conclude that for the leapfrog scheme the boundary condition

$$v_0^{n+1} = 2v_1^{n+1} - v_2^{n+1} + \beta^{n+1} \qquad (11.2.15a)$$

is unstable and that the boundary condition

$$v_0^{n+1} = 2v_1^n - v_2^{n-1} + \beta^{n+1} \qquad (11.2.15b)$$

is stable.

11.2.4. Repeat the computations given in Example 11.2.1 and verify the results. Also use the boundary condition (11.2.15b) and comment on the improvement this boundary condition gives.
11.3 The General Analysis of Boundary Conditions

In this section we present the general method for checking the stability of boundary conditions for finite difference schemes. These results were developed in the papers of Gustafsson, Kreiss, and Sundström [26] and Osher [47], [48], and we will refer to them as the GKSO theory. In these papers the method is developed for hyperbolic equations and systems, but the method is applicable, with some minor changes, to more general time-dependent equations. For simplicity we restrict our discussion to hyperbolic equations for now. See the book by Gustafsson, Kreiss, and Oliger [25] for another presentation of this theory.
We consider a scheme defined for all time and for $x$ on the half-space $R^+$, with the boundary at 0. Let the scheme be

$$P_{k,h}v_m^n = R_{k,h}f_m^n. \qquad (11.3.1)$$

We assume the scheme is stable for the initial value problem and consistent with a hyperbolic equation or system of partial differential equations. We also assume that there are no lower order terms for the scheme, so that the restrictive von Neumann stability condition holds for this scheme. The boundary conditions will be written

$$B_{k,h}v_0^n = \beta(t_n). \qquad (11.3.2)$$

We assume that system (11.3.1) contains $d$ equations and that each $v_m^n$ is a vector of dimension $d$. As discussed at the beginning of Section 11.1, we need consider only the homogeneous version of (11.3.1). The definition of a stable finite difference scheme for a hyperbolic initial-boundary value problem is one in which the following estimate holds:

$$\eta\|v\|_{\eta,h}^2 + |v|_{\eta,h}^2 \le C\left(\eta^{-1}\|f\|_{\eta,h}^2 + |\beta|_{\eta,h}^2\right),$$

where the norms with double bars refer to functions defined for $x$ in $R^+$ and $t$ in $R$, and the single-bar norms refer to functions of $t$ defined only on the boundary.

The general method begins by transforming in $t$ with the Laplace transform to give the resolvent equation, which we will write as

$$\tilde P_{k,h}(z)\tilde v_m(z) = 0. \qquad (11.3.3)$$

The general solution of the resolvent equation (11.3.3) is obtained by considering particular solutions of the form

$$\tilde v_m(z) = A(z)\kappa^m,$$

where $A(z)$ is a vector of dimension $d$. Substituting this form of solution in (11.3.3), we obtain

$$\tilde P_{k,h}(z)A(z)\kappa^m = k^{-1}\tilde p(z, \kappa)A(z)\kappa^m.$$

The matrix function $\tilde p(z, \kappa)$ is related to the symbol of $P_{k,h}$ as defined in Section 3.1 and to the amplification polynomial defined in Section 4.2 by the relations

$$\tilde p(e^{sk}, e^{ih\xi}) = kp_{k,h}(s, \xi)$$

and

$$\tilde p(g, e^{i\theta}) = G(g, \theta).$$

We see that there will be solutions of the particular form only if

$$\det\tilde p(z, \kappa) = 0, \qquad (11.3.4)$$

where we regard this as an equation for $\kappa$ as a function of $z$. The vector $A(z)$ is a null vector of $\tilde p(z, \kappa)$. Our first important result is the following theorem.
Theorem 11.3.1. If scheme (11.3.1) is stable, then there are integers $K_-$ and $K_+$ such that the roots $\kappa(z)$ of equation (11.3.4) separate into two groups, one with $K_-$ roots and one with $K_+$ roots. The group of roots denoted by $\kappa_{-,\nu}(z)$ satisfies

$$|\kappa_{-,\nu}(z)| < 1 \quad\text{for } |z| > 1 \text{ and } \nu = 1, \ldots, K_-,$$

and the group of roots denoted by $\kappa_{+,\nu}(z)$ satisfies

$$|\kappa_{+,\nu}(z)| > 1 \quad\text{for } |z| > 1 \text{ and } \nu = 1, \ldots, K_+.$$

Proof. The proof depends on the relations between $\tilde p$ and the amplification polynomial. If some $\kappa$ assumed the magnitude 1 when the magnitude of $z$ was larger than 1, then we may write $\kappa = e^{i\theta}$ for some real value of $\theta$, and we have

$$G(z, \theta) = \tilde p(z, \kappa) = 0.$$

But if the scheme is stable, then $z$, regarded as a function of $\theta$, must satisfy the von Neumann condition, that is, $|z| \le 1$. This contradiction shows that for $|z|$ larger than 1, the value of $|\kappa|$ cannot be 1. Thus the roots split into two groups, those less than 1 in magnitude and those greater than 1 in magnitude. This proves the theorem.
As an extension of Theorem 11.3.1 we prove the following lemma, which is important in proving (11.2.12) for general schemes for hyperbolic equations.

Lemma 11.3.2. If $\kappa(z)$ is a root of equation (11.3.4) with $|\kappa(z)| = 1$ for $|z| = 1$, then there is a constant $C$ such that

$$\bigl|\,|\kappa(z)| - 1\,\bigr| \ge C(|z| - 1).$$

Proof. The proof depends on the observation that the roots of the amplification polynomial $G(g, \theta)$ that are on the unit circle are simple and that $\kappa$ is an analytic function of $z$. Moreover, since $|\kappa|$ is not 1 for $|z|$ larger than 1, it follows from the Taylor series expansion of $\kappa$ as a function of $z$ that the estimate of the lemma must hold for some constant.
By Theorem 11.3.1, $K_-$ is independent of $z$, and we may write the general solution in $L^2(hZ^+)$ of the resolvent equation as

$$\tilde v_m(z) = \sum_{\nu=1}^{K_-}\alpha_\nu(z)A_\nu(z)\kappa_{-,\nu}^m \qquad (11.3.5)$$

in the case when all the $\kappa_{-,\nu}$ are distinct. The vectors $A_\nu(z)$ are particular null vectors of $\tilde p(z, \kappa_{-,\nu})$, and the $\alpha_\nu$ are arbitrary scalar coefficients. If the $\kappa_{-,\nu}$ are not distinct, then the preceding representation will have to be altered to account for the multiplicity of the root. Since this occurs infrequently, we omit the details of the construction here (see Example 11.3.2). Note that the functions $\kappa_{-,\nu}(z)$ are distinguished by the property that they are less than 1 in magnitude for $z$ outside the unit circle, but they are also defined by continuity for $z$ on the unit circle. When $z$ is on the unit circle, we must take some care to distinguish between the functions $\kappa_{-,\nu}$ and those functions $\kappa_{+,\nu}$ that may also have magnitude 1.
Definition 11.3.1. An admissible solution to the resolvent equation is a solution that is in $L^2(hZ^+)$ in the case when $|z|$ is larger than 1 and, when $|z|$ is equal to 1, is the limit of admissible solutions with $z$ greater than 1 in magnitude. That is, $v(z)$ is an admissible solution if $|z|$ is larger than 1 and $v(z)$ is in $L^2(hZ^+)$; or, if $|z|$ is equal to 1, then

$$v(z) = \lim_{\varepsilon\to 0^+} v(z(1 + \varepsilon)),$$

where $v(z(1 + \varepsilon))$ is in $L^2(hZ^+)$ for each positive value of $\varepsilon$.

Admissible solutions will have the form (11.3.5) when the roots $\kappa_{-,\nu}$ are distinct. It is easily seen that the set of admissible solutions is a vector space of dimension $K_-$.
The number of boundary conditions necessary for stability must be precisely $K_-$. If we substitute expression (11.3.5) into the transformed boundary conditions, obtained from (11.3.2) by applying the Laplace transform,

$$\tilde B\tilde v_0(z) = \tilde\beta(z),$$

we obtain $K_-$ equations for the $K_-$ coefficients $\alpha_\nu$. This equation is of the form (11.1.10). As discussed in Section 11.1, the coefficients $\alpha_\nu(z)$ can be determined by these equations, and the solution can be bounded independently of $z$ only if there are no nontrivial solutions to the homogeneous equation for $z$ satisfying $|z| \ge 1$.

Thus the check for stability of the boundary conditions reduces to checking that there are no admissible solutions to the resolvent equation that also satisfy the homogeneous boundary conditions,

$$\tilde B\tilde v_0(z) = 0. \qquad (11.3.6)$$

The basic result is given by the following theorem.

Theorem 11.3.3. The initial-boundary value problem for the stable scheme (11.3.1) for
a hyperbolic equation with boundary conditions (11.3.2) is stable if and only if there are
no nontrivial admissible solutions of the resolvent equation that satisfy the homogeneous
boundary conditions (11.3.6).

The proof of this theorem is not given here. In the generality given by Gustaffson,
Kreiss, and Sundström [26], it applies to schemes for hyperbolic equations with variable
coefficients and uses techniques beyond those of this text.
We also state the corresponding theorem for schemes for parabolic equations. We
restrict ourselves to the case when the finite difference scheme requires no numerical bound-
ary conditions, i.e., when the finite difference scheme requires as many boundary conditions
as does the differential equation.
292 Chapter 11. Well-Posed and Stable Initial-Boundary Value Problems
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Theorem 11.3.4. If the initial-boundary value problem for the stable scheme (11.3.1) to-
gether with boundary conditions (11.3.2) approximates a well-posed initial-boundary value
problem for a parabolic differential equation and the number of boundary conditions re-
quired by the scheme is equal to the number required by the differential equation, then the
initial-boundary value problem is stable if and only if there are no admissible solutions
of the resolvent equation that satisfy the homogeneous boundary conditions (11.3.6) for
|z| ≥ 1 except for z = 1.

The main difference between these theorems is that in Theorem 11.3.4, there is no
need to check for admissible solutions in the case when z is 1. The reason for this is that
the assumption that the differential problem is well-posed removes the need to check at z
equal to 1. There may be solutions to the resolvent equation with z equal to 1 and κ−
on the unit circle, but these do not cause instability because of the well-posedness of the
initial-boundary value problem for the partial differential equation.
We now illustrate Theorems 11.3.3 and 11.3.4 by applying them to several schemes
and boundary conditions.

Example 11.3.1. Our first example is for the Crank–Nicolson scheme (3.1.3) for the one-
way wave equation and the quasi-characteristic extrapolation boundary condition (3.4.1),
or, equivalently, the scheme for ut = aux , with a positive, given by

aλ n+1 aλ n+1 aλ n aλ n
− v + vm
n+1
+ v = v + vm
n
− v (11.3.7)
4 m+1 4 m−1 4 m+1 4 m−1
with boundary condition (11.2.2b).
Corresponding to equation (11.3.4) we obtain

z−1 aλ 1
= κ− . (11.3.8)
z+1 4 κ

This equation is equivalent to a quadratic equation in κ and we see that if κ(z) is a root,
then so is −1/κ(z). Thus, there is one root inside the unit circle and one outside, and by
Theorem 11.3.1 they remain separated for z outside the unit circle. Thus the functions
κ− (z) and κ+ (z) are well defined. An alternate way of deducing that K− and K+ are
both 1, one that can apply in more general cases (e.g., Example 11.3.2 and Exercise 11.3.5),
is to examine the roots for z near −1 where the left-hand side becomes infinite. If we set
z = −(1 + ε), then from (11.3.8) we have that one root satisfies

2/ε ≈ (aλ/4) κ

and is therefore outside the unit circle, and the other satisfies

2/ε ≈ −(aλ/4)(1/κ)

and is therefore inside the unit circle. Thus K_− and K_+ are both 1.
The boundary condition (11.3.6) resulting from the substitution ṽ_m = κ_−(z)^m is the equation

z − κ_−(z) = 0.
Since z is restricted to |z| ≥ 1 and κ_− is restricted by |κ_−(z)| ≤ 1, the only way that this equation can be satisfied is if z = κ_−(z) = e^{iθ} for some real value of θ. Substituting this relation in equation (11.3.8), we obtain

(e^{iθ} − 1)/(e^{iθ} + 1) = (aλ/4)(e^{iθ} − e^{−iθ})

or

tan(θ/2) = (aλ/2) sin θ = aλ sin(θ/2) cos(θ/2).

This equation is satisfied if either sin(θ/2) is zero or cos²(θ/2) = (aλ)^{−1}.
We first check the possibility that sin(θ/2) is zero. This is equivalent to showing that κ_−(1) is 1. Notice that for z equal to 1, there is a root κ of (11.3.8) equal to 1, but it must be determined whether this root is κ_−(1) or κ_+(1). As done earlier in analyzing boundary conditions (11.2.2a) and (11.2.2b) for the leapfrog scheme (see estimates (11.2.10a) and (11.2.10b)), we set z = 1 + ε and κ = 1 + δ. We easily obtain from (11.3.8):

ε/2 + O(ε²) = (aλ/2)[δ + O(δ²)],

and thus we see that it is κ+ (1) that is 1, and not κ− (1). Thus there is no difficulty with
the case when z is 1.
We next consider the situation with

cos²(θ/2) = (aλ)^{−1}.      (11.3.9)

We see immediately that if aλ is less than 1, then the boundary condition is stable, since
this equation cannot be satisfied for real values of θ. For aλ equal to 1, (11.3.9) holds
only for θ equal to 0, and as we have already shown, this is not an admissible solution. If
aλ is greater than 1, then we set

z = e^{iθ} (1 + ε)/(1 − ε)   and   κ = e^{iθ}(1 + δ),

where we have chosen the form of z to facilitate the algebraic manipulations. Substituting
these expressions into (11.3.8), we obtain

i tan(θ/2) · (1 − iε cot(θ/2)) / (1 + iε tan(θ/2)) + O(ε²) = (aλ/2)[i sin θ + δ cos θ − (δ²/2)e^{−iθ} + O(δ³)],

and hence to within O(ε²) and O(δ²),

ε(1 + tan²(θ/2)) = δ (aλ/2) cos θ.

Thus, if cos θ is positive, then it is κ+ (z) that is equal to z, and if cos θ is negative, then
κ− (z) is equal to z and the scheme with the boundary condition is unstable. The condition
that cos θ is negative is equivalent to the condition that cos²(θ/2) is less than 1/2, and
so by (11.3.9) we see that the scheme is unstable for aλ larger than 2. When aλ is equal
to 2, then both κ− (z) and κ+ (z) are equal to z, and thus this case is also unstable.
We conclude that the Crank–Nicolson scheme (11.3.7) with boundary condition
(11.2.2b) is stable for aλ less than 2 and unstable if aλ is greater than or equal
to 2.
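
This threshold is easy to check numerically. The sketch below is our own illustration (not from the text): it solves the quadratic in κ equivalent to (11.3.8) for z placed just outside the unit circle at the angle θ satisfying (11.3.9), and measures how close κ_−(z) comes to z. For aλ greater than 2 the distance is O(ε), so that κ_−(z) = z and the boundary condition is unstable; for aλ between 1 and 2 the root equal to z is a κ_+ root and the distance stays of order one.

    import numpy as np

    def kappa_minus(z, al):
        # (11.3.8) is equivalent to kappa^2 - w*kappa - 1 = 0 with
        # w = 4(z - 1)/((z + 1)*al); the product of the roots is -1,
        # so for |z| > 1 exactly one root is inside the unit circle.
        w = 4 * (z - 1) / ((z + 1) * al)
        r = np.sqrt(w * w + 4 + 0j)
        return min((w + r) / 2, (w - r) / 2, key=abs)

    eps = 1e-6
    for al in (1.5, 3.0):                          # a*lambda below and above 2
        theta = 2 * np.arccos(np.sqrt(1 / al))     # angle from (11.3.9)
        z = np.exp(1j * theta) * (1 + eps)         # just outside |z| = 1
        print(al, abs(kappa_minus(z, al) - z))     # O(eps) only for al > 2
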
Example 11.3.2. Our next example is for the (2, 4) leapfrog scheme

(v_m^{n+1} − v_m^{n−1}) / (2k) = (1 − (h²/6)δ²) δ₀ v_m^n      (11.3.10)

for u_t = u_x on x ≥ 0. This is the same scheme as (4.1.7). Because this scheme involves v_{m−1}^n and v_{m−2}^n to compute v_m^{n+1}, it requires two boundary conditions. This can also be seen from the equation for the roots of p̃(z, κ), which is equivalent to

z − z^{−1} = (λ/6)(κ − 1/κ)[8 − (κ + 1/κ)].      (11.3.11)


For z very large we see that there are two roots satisfying

z ≈ (λ/6)(−κ²)

so that K_+ is 2. There are two roots satisfying

z ≈ (λ/6) κ^{−2}

so that K− is 2, and this must be the number of boundary conditions. For our boundary
conditions at m equal to 0 and 1, we take the quasi-characteristic extrapolations

v_0^{n+1} = 2v_1^n − v_2^{n−1}      (11.3.12a)

and

v_1^{n+1} = 2v_2^n − v_3^{n−1}.      (11.3.12b)

Recall that the stability condition for scheme (11.3.10) is

λ < λ̄ = (1 + 1/√6)^{−1} (√6 − 3/2)^{−1/2}      (11.3.13)

as shown in Example 4.1.2. (See (4.1.9).)
The general admissible solution to the resolvent equation for the scheme (11.3.10) is


ṽ_m = α₁(z) κ_{−,1}(z)^m + α₂(z) κ_{−,2}(z)^m      (11.3.14)

when the two roots are not equal and is of the form


ṽ_m = α(z) κ_{−,1}(z)^m + α′(z) m κ_{−,1}(z)^{m−1}      (11.3.15)

when κ_{−,1} equals κ_{−,2}.
Applying the boundary conditions (11.3.12) to the solution (11.3.14), we obtain the
equations

(z − κ_{−,1}(z))² α₁ + (z − κ_{−,2}(z))² α₂ = 0

and

(z − κ_{−,1}(z))² κ_{−,1} α₁ + (z − κ_{−,2}(z))² κ_{−,2} α₂ = 0.
There will be a nontrivial solution to this system of equations for α1 and α2 only if the
determinant of the system is zero. The determinant is

det [ (z − κ_{−,1})²           (z − κ_{−,2})²          ]
    [ (z − κ_{−,1})² κ_{−,1}   (z − κ_{−,2})² κ_{−,2}  ]  =  (z − κ_{−,1})² (z − κ_{−,2})² (κ_{−,2} − κ_{−,1}).

Since we have assumed that the values of the κ− are distinct, we see that the only way that
the determinant can vanish is when at least one of the two functions κ−,1 or κ−,2 is equal
to z. We may assume that κ−,1 , which we will now denote as κ− (z), is equal to z and,
as we have discussed before, this can only happen when both z and κ− are on the unit
circle, i.e., when z = κ− (z) = eiθ for some real value of θ. Substituting this relation in
(11.3.11), we see that several cases are possible. Either z = κ_− = 1, z = κ_− = −1, or

1 = (λ/6)[8 − (κ_− + 1/κ_−)] = (λ/3)(4 − cos θ),

which is equivalent to λ and θ being related by

cos θ = 4 − 3/λ.      (11.3.16)

It is not hard to show that the first two cases are not possible, i.e., that κ_−(1) ≠ 1 and κ_−(−1) ≠ −1. This is left as an exercise (see Exercise 11.3.6). In the third case, the
scheme is unstable if the value of θ is determined by (11.3.16). Since the scheme itself is
stable for 0 ≤ λ < λ̄ (see (11.3.13)), instability can only occur for 3/5 ≤ λ < λ̄. Notice
that in this case, cos θ is negative. For λ in this range we check whether it is a κ_− root or a κ_+ root that is equal to z. As before, we set

z = e^{iθ}(1 + ε)   and   κ = e^{iθ}(1 + δ)

and we obtain the equation for δ from (11.3.11) as

δ = ε (3/λ) (1 + 4 cos θ − 2 cos²θ) / cos θ + O(ε²).
Since cos θ is negative, we have that the root κ will be inside the unit circle when z is
outside, i.e., δ < 0 when ε > 0, only when

1 + 4 cos θ − 2 cos²θ > 0.

This relation is true when |cos θ| is less than the magnitude of the value determined by making this inequality an equality. Setting the inequality to equality gives a quadratic equation, only one of whose roots has magnitude less than 1. When the value of cos θ from this equation is set equal to the expression (11.3.16), we have

cos θ₀ = 1 − √6/2 = 4 − 3/λ₀.

We conclude that the boundary condition for this scheme is stable when | cos θ| is greater
than | cos θ0 |, or, equivalently, when λ is less than

λ₀ = (1 + 1/√6)^{−1}.

For λ greater than λ₀, the value of cos θ as given by (11.3.16) is smaller in magnitude
than cos θ₀, and the scheme with the boundary conditions (11.3.12) is unstable. As in
the previous cases, when λ is equal to λ0 , then two roots, one a κ− root and one a κ+
root, are equal and equal to z. It remains to check that there are no additional admissible
solutions of the form (11.3.15) that satisfy the homogeneous boundary conditions. This is
left as an exercise (see Exercise 11.3.7). Thus the scheme is unstable for λ equal to λ0 .
We conclude that scheme (11.3.10) with boundary conditions (11.3.12) is stable only for

λ < λ₀ = (1 + 1/√6)^{−1} < λ̄.

Since λ0 ≈ 0.7101 and λ̄ ≈ 0.7287, the boundary conditions exclude a rather small range
of values for λ.
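
The sign analysis above can be confirmed by computing the roots of (11.3.11) directly. In the sketch below (our own construction; the function name is hypothetical), z is placed just outside the unit circle at the angle given by (11.3.16), and the magnitude of the root of (11.3.11) that continues from z is reported: it exceeds 1 for λ below λ₀, so the root is a κ_+ root, and is less than 1 for λ above λ₀.

    import numpy as np

    def continued_root_size(lam, eps=1e-3):
        theta = np.arccos(4 - 3 / lam)       # angle from (11.3.16)
        z = np.exp(1j * theta) * (1 + eps)
        # Multiplying (11.3.11) by kappa^2 gives the quartic
        # kappa^4 - 8 kappa^3 + (6/lam)(z - 1/z) kappa^2 + 8 kappa - 1 = 0.
        roots = np.roots([1, -8, (6 / lam) * (z - 1 / z), 8, -1])
        kappa = roots[np.argmin(np.abs(roots - z))]   # root continuing from z
        return abs(kappa)

    print(continued_root_size(0.70))   # > 1: a kappa_+ root equals z; stable
    print(continued_root_size(0.72))   # < 1: a kappa_- root equals z; unstable
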
Figure 11.3 displays the result of computations with the (2, 4) leapfrog scheme with
the boundary conditions (11.3.12) applied on the right-hand side. The exact solution, which
gives the initial condition and left boundary data, is

u(t, x) = sin(2π(x − t))

and is displayed in the figure as the curve without dots. The top left part of the figure shows
the computation with h = 0.1 and λ = 0.7 at time 14. Although the solution is not very
accurate, it is apparently stable. The top right part displays the computation with h = 0.1
and λ = 0.72 at time 4.032. The solution is becoming quite poor due to the instability.
The lower part of the figure shows the computation with h = 0.025 and λ = 0.7 at time
14 displaying that the solution is quite accurate for smaller values of h for this value
of λ.
Figure 11.3. Stable and unstable boundary conditions.
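
The computations shown in Figure 11.3 can be reproduced along the following lines. This is our own sketch of such a program, not the code used for the figure: it advances the (2, 4) leapfrog scheme for u_t + u_x = 0, supplies the exact solution at the two leftmost grid points, and applies the mirror images of the extrapolations (11.3.12) at the right boundary.

    import numpy as np

    def exact(t, x):
        return np.sin(2 * np.pi * (x - t))

    def leapfrog24(h, lam, T):
        # (2, 4) leapfrog scheme for u_t + u_x = 0 on [0, 1]; lam = k/h.
        k = lam * h
        x = np.arange(0.0, 1.0 + h / 2, h)
        M = len(x) - 1
        vold, v, t = exact(0.0, x), exact(k, x), k   # two starting levels
        for n in range(1, int(round(T / k))):
            t += k
            vnew = np.empty_like(v)
            # fourth-order central difference approximation to u_x
            ux = (8 * (v[3:-1] - v[1:-3]) - (v[4:] - v[:-4])) / (12 * h)
            vnew[2:-2] = vold[2:-2] - 2 * k * ux
            vnew[0], vnew[1] = exact(t, x[0]), exact(t, x[1])  # inflow data
            vnew[M] = 2 * v[M - 1] - vold[M - 2]      # mirror of (11.3.12a)
            vnew[M - 1] = 2 * v[M - 2] - vold[M - 3]  # mirror of (11.3.12b)
            vold, v = v, vnew
        return x, v, t

    for lam in (0.70, 0.72):
        x, v, t = leapfrog24(0.1, lam, 4.0)
        print(lam, t, np.max(np.abs(v - exact(t, x))))
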
Example 11.3.3. We consider the heat equation (6.1.1) with the Neumann boundary con-
dition ux = 0 at x = 0. The scheme is the Crank–Nicolson scheme

v_m^{n+1} − v_m^n = (bµ/2) δ²(v_m^{n+1} + v_m^n),

and the boundary condition to implement the Neumann condition is

(3v_0^{n+1} − 4v_1^{n+1} + v_2^{n+1}) / (2h) = 0.

The equation relating z and κ is

(z − 1)/(z + 1) = (bµ/2)(κ − 2 + 1/κ),

and the boundary condition yields the relation

0 = 3 − 4κ_− + κ_−² = (1 − κ_−)(3 − κ_−).

From the boundary condition, we see that the only possible solution is with κ− equal
to 1, and the relation between z and κ implies that κ is 1 only when z is 1. Since
the differential equation with the boundary condition is well-posed by Theorem 11.3.4,
there is nothing further to check. The finite difference equation and boundary condition
are stable.
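
This conclusion can also be checked numerically: the sketch below (ours, with an assumed value of bµ) computes κ_−(z) for z on the unit circle from the quadratic form of the relation between z and κ and confirms that κ_− stays away from the values 1 and 3 required by the boundary relation, except near z = 1.

    import numpy as np

    bmu = 1.0                                   # assumed value of b*mu
    theta = np.linspace(0.05, np.pi, 200)       # stay away from z = 1
    z = np.exp(1j * theta) * (1 + 1e-8)         # just outside the unit circle
    w = (2 / bmu) * (z - 1) / (z + 1)
    # The relation between z and kappa is equivalent to the quadratic
    # kappa^2 - (2 + w) kappa + 1 = 0, whose roots have product 1.
    disc = np.sqrt((2 + w) ** 2 - 4)
    k1, k2 = (2 + w + disc) / 2, (2 + w - disc) / 2
    km = np.where(np.abs(k1) <= np.abs(k2), k1, k2)
    print(np.abs(km - 1).min(), np.abs(km - 3).min())  # both bounded away from 0
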
Exercises
11.3.1. Show that the scheme

(3v_m^{n+1} − 4v_m^n + v_m^{n−1}) / (2k) = (v_{m+1}^{n+1} − v_{m−1}^{n+1}) / (2h)

for the equation u_t = u_x on x ≥ 0 with the boundary condition

(v_0^{n+1} − v_0^n) / k = (v_1^n − v_0^n) / h

is stable only if λ < 5/3. (See also (4.2.3) and Exercise 4.4.3.) Hint: The critical values are

κ_− = −(√(1 + 4(λ − 1)²) − 1) / (2λ)   and   z = −(2λ − 3 + √(1 + 4(λ − 1)²)) / 2.

(See also Exercise 11.3.12.)
11.3.2. Show that the scheme

(3v_m^{n+1} − 4v_m^n + v_m^{n−1}) / (2k) = (v_{m+1}^{n+1} − v_{m−1}^{n+1}) / (2h)

for the equation u_t = u_x on x ≥ 0 with the boundary condition

(v_0^{n+1} − v_0^n) / k = (v_1^{n+1} − v_0^{n+1}) / h

is unconditionally stable.
11.3.3. Show for the Lax–Wendroff and Crank–Nicolson schemes for the one-way wave
equation ut + ux = 0 on x ≥ 0, for which the data should be specified at x = 0,
that extrapolation of the solution given by either (11.2.2a) or (11.2.2b) is unstable.
11.3.4. Show that the Crank–Nicolson scheme discussed in Example 11.3.1 with boundary
condition (11.2.2a) is stable.
11.3.5. Show that the (2, 4) Crank–Nicolson scheme

(v_m^{n+1} − v_m^n) / k = (1 + (h²/6)δ²)^{−1} δ₀ (v_m^{n+1} + v_m^n) / 2

for the equation u_t = u_x on x ≥ 0 with the boundary condition

v_0^{n+1} = v_1^n

is stable only for λ < 2.
11.3.6. Show for the (2, 4) leapfrog scheme in Example 11.3.2 that κ− (1) is not equal to
1 and κ− (−1) is not equal to −1.
11.3.7. Show for the (2, 4) leapfrog scheme in Example 11.3.2 that there are no admissible
solutions of the form (11.3.15) that satisfy the homogeneous boundary conditions.
11.3.8. Show that the Du Fort–Frankel scheme (6.3.6) for the heat equation (6.1.1) with
the boundary condition

(3v_0^{n+1} − 4v_1^{n+1} + v_2^{n+1}) / (2h) = 0

is a stable approximation to the heat equation with the Neumann boundary condition.
11.3.9. Show that when the Crank–Nicolson scheme for the heat equation (6.1.1) with the
Neumann boundary condition is approximated by the boundary condition

δ_+ ((v_0^{n+1} + v_0^n) / 2) = 0

the initial-boundary value problem is unstable.
11.3.10. Show that the (4, 4) scheme for u_t − u_x = 0,

(v_m^{n+2} − v_m^{n−2}) / (4k) − (1 + (h²/6)δ²)^{−1} δ₀ (2v_m^{n+1} − v_m^n + 2v_m^{n−1}) / 3 = 0,
which was discussed in Example 4.3.3, is unstable with the boundary condition

v_0^{n+1} = 3v_1^{n+1} − 3v_2^{n+1} + v_3^{n+1}.

Hint: Show that κ_−(−1) = 1.
11.3.11. Show that the (4, 4) scheme of Exercise 11.3.10 is stable with the boundary
condition
v_0^{n+1} = 3v_1^n − 3v_2^{n−1} + v_3^{n−2}.

11.3.12. Demonstrate with a computer program the instability of the boundary condition
given in Exercise 11.3.1 using the data u(t, x) = | sin(x + t)| on the interval [0, 1]
for t between 0 and 17, using λ equal to 1.7, and the stability of the boundary
condition when λ is 1.6 and t between 0 and 16. Use grid spacings of 1/10,
1/20, and 1/40. The boundary condition at x equal to 1 should be that the exact
solution is specified.
11.3.13. Demonstrate the instability discussed in Example 11.3.2 with numerical compu-
tations.
11.4 Initial-Boundary Value Problems for Partial Differential Equations
In this section we present the method of determining if boundary conditions for initial-
boundary value problems are well-posed. We give illustrations of the method using several
examples, but we do not give complete proofs of the results.
We present the theory using as an example the parabolic equation

ut = b(uxx + uyy ) + f (t, x, y) (11.4.1)

on the region {(t, x, y) : t, y ∈ R, x ∈ R⁺}. At the boundary we consider the boundary condition

u_x + αu_y = β(t, y).      (11.4.2)
We assume that the constants b and α are complex numbers and hence that u is a complex-
valued function. By considering the real and imaginary parts of u, we can replace (11.4.1)
and (11.4.2) by an equivalent system of two equations involving two real-valued functions.
For (11.4.1) to be parabolic we must require that the real part of b be positive.
We begin our analysis by taking the Fourier transform in the variable y and the
Laplace transform in t. We obtain the equation

û_{xx} = (b^{−1}s + ω²)û − b^{−1}f̂      (11.4.3)

for the transform û(s, x, ω) of u. The general solution of (11.4.3) that is in L2 (R+ ) as a
function of x is

û = û_0(s, ω) e^{−κx} + (1/(2κb)) ∫_x^∞ e^{(x−z)κ} f̂(s, z, ω) dz + (1/(2κb)) ∫_0^x e^{−(x−z)κ} f̂(s, z, ω) dz,      (11.4.4)

where

κ = (b^{−1}s + ω²)^{1/2}   and   Re κ > 0.
Recall that the real part of s, i.e., η, is positive.
The function û0 (s, ω) is determined by boundary condition (11.4.2), which, after
transforming, is
û_x + iαω û = β̂(s, ω).
Substituting (11.4.4) in this boundary condition, we have

(−κ + iαω) [ û_0 − (1/(2κ)) ∫_0^∞ e^{−zκ} f̂(s, z, ω) dz ] = β̂(s, ω).      (11.4.5)


This is a linear equation for the unknown û0 much like (11.1.10), where here the parameter
ρ varies over the set {(s, ω) : Re s > 0, ω ∈ R}. We see that we can solve for û0 only
if the quantity −κ + iαω is not zero. Moreover, we can get a uniform estimate for û0 in
terms of β̂ only if −κ + iαω is bounded away from zero.
Let us now determine when −κ + iαω is zero. This occurs when

κ = (b^{−1}s + ω²)^{1/2} = iαω,

and since ω can be either positive or negative, we lose no information if we square both sides of this relation, obtaining

s = −b(α² + 1)ω².
This equation can be satisfied for (s, ω) with Re s ≥ 0 and ω real only if
Re b(α² + 1) ≤ 0.
If this is satisfied, then the solution to (11.4.5) cannot be uniformly bounded by the data
in this case. We conclude that the requirement for the boundary condition (11.4.2) to be
well-posed for equation (11.4.1) is
Re b(α² + 1) > 0.      (11.4.6)
There are several things we should point out about this example that apply to more
general problems. First, notice that the function f (t, x, y) does not play a role in deciding
whether or not an estimate exists. Also, if condition (11.4.6) is satisfied, then an estimate
relating u to β and f can be obtained.
In general, for a partial differential equation of the form
ut = P (∂x , ∂y )u + f (t, x, y) (11.4.7)
for x ∈ R+ and y ∈ Rd with boundary conditions
Bu = β(t, y) (11.4.8)
on the boundary given by x equal to zero, the procedure to determine the well-posedness
of the boundary conditions is as follows.
First, consider the resolvent equation
[s − P (∂x , iω)] û = 0, (11.4.9)
which is an ordinary differential equation for û as a function of x. The parameter s is
restricted so that Re s > 0 and ω ∈ R d . The boundary condition for û is
B û = 0. (11.4.10)
Both the resolvent equation (11.4.9) and the boundary condition (11.4.10) are obtained by
applying the Laplace transform in t and the Fourier transform in y to the homogeneous
equation corresponding to (11.4.7) and the homogeneous boundary conditions (11.4.8).
Definition 11.4.1. An admissible solution to the resolvent equation (11.4.9) is a solution
that is in L2 (R+ ) as a function in x in the case when Re s is positive, and, when Re s = 0,
an admissible solution is the limit of admissible solutions with Re s positive. That is,
û(s, x, ω) is an admissible solution if Re s is positive and û(s, x, ω) is in L2 (R+ ). Or,
if Re s is equal to 0, then
û(s, x, ω) = lim_{ε→0⁺} û(s + ε, x, ω),
where û(s + ε, x, ω) is an admissible solution for each positive value of ε.
Theorem 11.4.1. The initial-boundary value problem for differential equation (11.4.7) with
boundary condition (11.4.8) is well-posed if and only if there are no nontrivial admissible
solutions to the resolvent equation (11.4.9) that satisfy the homogeneous boundary condition
(11.4.10).
Theorem 11.4.1 deals with the strongest notion of a well-posed initial-boundary value
problem. The estimates that characterize the well-posedness involve estimates of the so-
lution in the interior of the domain and also L2 estimates of the solution on the boundary
in terms of L2 estimates of the boundary data. For the proof of Theorem 11.4.1, see [9],
[25], or [34].
If we modify the requirement to allow other norms of the solution and data on the
boundary, then some initial-boundary value problems that are ill-posed under Theorem
11.4.1 are well-posed in a weaker sense. This weaker form of the well-posed estimate
occurs frequently for hyperbolic systems. Based on the work of Kreiss [34] and [35], we
have the following theorem.
Theorem 11.4.2. If a nontrivial admissible solution û(s0 , x, ω0 ) to the hyperbolic system
(11.4.9) with Re s₀ = 0 and |s₀|² + ω₀² ≠ 0 satisfies the homogeneous boundary condition
(11.4.10) but there is a constant c such that

‖B û(s₀ + ε, 0, ω₀)‖ ≥ c ε ‖û(s₀, 0, ω₀)‖

for ε sufficiently small and positive and there are no nontrivial admissible solutions with
Re s > 0 satisfying the homogeneous boundary conditions, then the initial-boundary value
problem is weakly well-posed.
The following example illustrates the use of the two theorems.
Example 11.4.1. We consider the hyperbolic system

( u¹ )     ( −1  0 ) ( u¹ )     ( 0  1 ) ( u¹ )
( u² )_t = (  0  1 ) ( u² )_x + ( 1  0 ) ( u² )_y      (11.4.11)

on the domain R⁺ × R, with boundary condition

u¹ + au² = β(t, y)      (11.4.12)

on x = 0.
The resolvent equation is equivalent to

( û¹ )     ( −s    iω ) ( û¹ )
( û² )_x = ( −iω    s ) ( û² )

and the general admissible solution is

( û¹ )     ( −iω   )
( û² ) = α ( κ − s ) e^{−κx},      (11.4.13)

where κ = √(s² + ω²), with the convention that Re κ ≥ 0.
Substituting the admissible solution (11.4.13) into the homogeneous boundary con-
dition, we have
[−iω + a(κ − s)] α = 0,
which has a nontrivial solution for α only if

iω = a(κ − s) = a(√(s² + ω²) − s).      (11.4.14)

If there is a solution to (11.4.14) with ω ∈ R and Re s ≥ 0, then the initial-boundary
value problem consisting of the equation (11.4.11) and boundary condition (11.4.12) is ill-
posed; otherwise it is well-posed.
To examine (11.4.14), set s equal to ζ|ω| and obtain, after dividing by |ω|,

±i = a(√(ζ² + 1) − ζ)

or, after multiplying by √(ζ² + 1) + ζ,

√(ζ² + 1) + ζ = ±ia.      (11.4.15)

The mapping taking ζ to w = √(ζ² + 1) + ζ maps the plane given by Re ζ ≥ 0
onto the domain D = {w : Re w ≥ 0 and |w| ≥ 1}. Therefore, there is a solution to
(11.4.15) with Re ζ nonnegative if and only if |a| ≥ 1. Conversely, there is no solution
to (11.4.15) with Re ζ nonnegative if |a| is less than 1.
We conclude that the initial-boundary value problem for (11.4.11) with the boundary
condition (11.4.12) is well-posed in the strong sense only if |a| is less than 1.
If a is 1 or −1, then the initial-boundary value problem is well-posed in the weaker
sense, as we now show. For a equal to 1, we have an admissible solution satisfying the
homogeneous boundary condition when (s0 , ω0 ) is equal to (i, −1) by (11.4.14). All
other solutions are proportional to this solution. In this case, we have from (11.4.13) that

û(s₀ + ε, x, ω₀) = α (i, κ − i − ε)^T e^{−κx},

where κ = ((i + ε)² + 1)^{1/2} = √(2iε) + O(ε). Notice that ‖û(s₀, 0, ω₀)‖ = √2 |α|. Sub-
stituting this function in the boundary condition with a equal to 1, we have by (11.4.12)
that

|û¹ + û²| = |α| |i + κ − i − ε| = |α| |κ − ε| = |α| (√(2ε) + O(ε)).

By Theorem 11.4.2 boundary condition (11.4.12) with a = 1 is well-posed in the weaker sense; a similar analysis holds for a equal to −1.
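
The role of |a| can be tested numerically. In the sketch below (our construction), the candidate w = ±ia is inverted through ζ = (w − 1/w)/2, and the scaled relation (11.4.14) is evaluated with the principal branch of the square root, whose real part is nonnegative as the convention requires; a root with Re ζ ≥ 0 is found exactly when |a| ≥ 1.

    import numpy as np

    def has_admissible_root(a):
        # Invert w = sqrt(zeta^2 + 1) + zeta = +/- ia via zeta = (w - 1/w)/2,
        # then test +/- i = a (sqrt(zeta^2 + 1) - zeta) with Re zeta >= 0.
        for w in (1j * a, -1j * a):
            zeta = (w - 1 / w) / 2
            rhs = a * (np.sqrt(zeta**2 + 1 + 0j) - zeta)  # principal branch
            if (np.isclose(rhs, 1j) or np.isclose(rhs, -1j)) and zeta.real >= 0:
                return True
        return False

    for a in (0.5, 0.9, 1.0, 2.0):
        print(a, has_admissible_root(a))   # False, False, True, True
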
Example 11.4.2. For our second example we use the system

u_t + au_x + bu_y + h_x = 0,
v_t + av_x + bv_y + h_y = 0,      (11.4.16)
h_t + ah_x + bh_y + u_x + v_y = 0,

which is obtained by linearizing the shallow water equations around a constant flow;
see [46].
We consider this system on the domain {(x, y) : x ≥ 0, y ∈ R} and consider the
case when the coefficient a satisfies 0 < a < 1. The resolvent equation for this system
can be written as
a û_x + ĥ_x + s′û = 0,
a v̂_x + s′v̂ + iωĥ = 0,      (11.4.17)
a ĥ_x + û_x + iωv̂ + s′ĥ = 0,

where s′ = s + ibω. We first determine κ so that there are solutions to (11.4.17) of the form

(û, v̂, ĥ)^T = (û_0(s, ω), v̂_0(s, ω), ĥ_0(s, ω))^T e^{−κx}
with the real part of κ being positive. The equation for κ is

0 = det [ s′ − aκ   0         −κ      ]
        [ 0         s′ − aκ   iω      ]
        [ −κ        iω        s′ − aκ ]

  = (s′ − aκ)[(s′ − aκ)² + ω² − κ²].
Thus the values of κ with real part positive are

κ₀ = a^{−1}s′   and   κ₁ = (−as′ + √(s′² + (1 − a²)ω²)) / (1 − a²).      (11.4.18)

These two roots are distinct as long as s′ is not equal to |aω|.
The general form of an admissible solution when κ0 and κ1 are not equal is

(û, v̂, ĥ)^T = A₀ (iω, κ₀, 0)^T e^{−κ₀x} + A₁ (κ₁, −iω, s′ − aκ₁)^T e^{−κ₁x}.      (11.4.19)

Since there are two values of κ that have positive real parts, there must be two
boundary conditions. For this example we consider the case where both u and v are
specified. The homogeneous boundary condition corresponding to this is û = 0 and v̂ = 0
at x equal to 0.
From (11.4.19) we obtain that the homogeneous boundary conditions are satisfied only
if the equations
A0 iω + A1 κ1 = 0,
A0 κ0 − A1 iω = 0
are satisfied. There is a nontrivial solution for A0 and A1 only if

κ₁ κ₀ = ω²,

which is equivalent to

s′(−as′ + √(s′² + (1 − a²)ω²)) = a(1 − a²)ω².

Rearranging this last equation we have

s′ √(s′² + (1 − a²)ω²) = a(s′² + (1 − a²)ω²).

This relation is satisfied either if s′² + (1 − a²)ω² is zero or if

s′² = a²(s′² + (1 − a²)ω²).      (11.4.20)

The expression s′² + (1 − a²)ω² is zero when s′ is ±i(1 − a²)^{1/2}ω. We will consider this possibility first.
Since for this case we have an admissible solution that satisfies the homogeneous
boundary condition, the initial-boundary value problem is not well-posed in the stronger sense of Theorem 11.4.1. We now show that it is well-posed in the weaker sense of Theorem 11.4.2. We take s₀ = i(1 − a²)^{1/2} and ω₀ = 1; the other possibilities are equivalent.
We have for this choice of (s0 , ω0 ) that the admissible solution satisfying the homo-
geneous boundary condition is

a (iω₀, κ₀, 0)^T e^{−κ₀x} + (1 − a²)^{1/2} (κ₁, −iω₀, s₀ − aκ₁)^T e^{−κ₁x}.

Note that κ₀ = ia^{−1}(1 − a²)^{1/2} and κ₁ = −ia(1 − a²)^{−1/2}.
Replacing s0 by s0 + ε and ω0 by 1 in (11.4.18) we have

κ₀ = ia^{−1}(1 − a²)^{1/2} + a^{−1}ε

and

κ₁ = −ia(1 − a²)^{−1/2} + √(2iε) (1 − a²)^{−3/4} + O(ε).
The boundary condition for the admissible solution at (s, ω) equal to (s₀ + ε, ω₀) is

(û(s₀ + ε, 0, ω₀), v̂(s₀ + ε, 0, ω₀))^T = (aiω₀ + (1 − a²)^{1/2}κ₁, aκ₀ − (1 − a²)^{1/2}iω₀)^T
    = (√(2iε) (1 − a²)^{−1/4} + O(ε), ε)^T
    = √ε (√(2i) (1 − a²)^{−1/4} + O(√ε), √ε)^T.
The norm of the vector (û, v̂, ĥ)^T therefore satisfies the condition of Theorem 11.4.2, and so this boundary condition is well-posed in the weak sense, except that we must still check the admissible solutions for (11.4.20).
Rearranging (11.4.20), we see that it is satisfied only when s′ is equal to |aω|. (Recall that Re s is nonnegative.) For these values of s′ and ω, the values of κ₀ and κ₁ are equal, as noted before. Thus the admissible solutions are not of the form (11.4.19) but
rather

(û, v̂, ĥ)^T = B₀ (0, −1, iσa)^T e^{−κx} + (B₁ + xB₀) (iω, κ, 0)^T e^{−κx},

where σ = sign(ω) and κ = |ω|. It is easy to check that the only admissible solution with
û and v̂ equal to zero is the trivial solution, i.e., with B0 and B1 equal to zero.
We conclude that the boundary condition specifying both u and v for the system (11.4.16) is well-posed in the weak sense.
Exercises
11.4.1. Show that the initial-boundary value problem for the system

u_t = u_{xx} + u_{yy} + v_{xy},
v_t = v_{xx} + v_{yy}

for x > 0, y ∈ R with boundary conditions

u_x + c₁v_y = β₁,
v_x + c₂u_y = β₂

at x = 0 is well-posed if and only if

c₂(c₁ − 1/2) < 1.
11.4.2. Show that the boundary condition for system (11.4.16) where u + a^{−1}h and v are
specified at x = 0 is well-posed when 0 < a < 1. Show that specifying u and
h is an ill-posed boundary condition.
11.4.3. For the system (11.4.16) when −1 < a < 0, show that one boundary condition is
needed and that specifying h is well-posed and specifying v is ill-posed.
11.4.4. Verify for a parabolic system of the form (6.2.1) that the boundary condition of the
form (6.2.4) is well-posed if and only if the matrix T given in (6.2.5) is nonsingular.
11.4.5. Consider the parabolic system

u_t = u_{xx} + v_{xx},
v_t = v_{xx}

on x ≥ 0, −∞ < t < ∞, with the boundary conditions

u_x + a v_x = β₁(t),
bu + v = β₀(t).

Using both the method discussed in Exercise 11.4.4 and the method discussed in this section, show that this initial-boundary value problem is ill-posed if and only if b(a − 1/2) = 1. Demonstrate this behavior with a computer program.
11.4.6. Show for the hyperbolic system (1.2.2) with 0 < a < b that the boundary condi-
tions
c01 u + c02 v = β0 (t) at x = 0,
c11 u + c12 v = β1 (t) at x = 1
are well-posed if and only if they are equivalent to (1.2.4).
11.5 The Matrix Method for Analyzing Stability
Another method that is frequently used to analyze stability of finite difference schemes is
the matrix method. The method considers the total initial-boundary value problem together,
not separating the initial value problem from the boundary conditions as we have done in
Sections 11.2 and 11.3. Because of this it is more difficult to make conclusions about the
results of the matrix method. We introduce the method with an example.
Example 11.5.1. We illustrate the matrix method and its deficiencies by applying it to
the forward-time backward-space scheme (1.3.2) for the one-way wave equation (1.1.1)
on the unit interval. We assume the characteristic speed a is positive and hence that v_0^n is specified. Considering the unknowns v_m^n for m from 1 to M as the components of a vector V^n, we can write the scheme as

V^{n+1} = CV^n + b^n,      (11.5.1)

where the matrix C has the form

C = [ 1 − aλ   0        ⋯        0      ]
    [ aλ       1 − aλ   ⋱        ⋮      ]
    [ 0        aλ       1 − aλ          ]
    [ ⋮        ⋱        ⋱        0      ]
    [ 0        ⋯        aλ       1 − aλ ]

and b^n = (aλ v_0^n, 0, . . . , 0)^T. The solution of this equation can be written as

V^n = C^n V^0 + Σ_{j=0}^{n} C^{n−j} b^j.

The superscript on V and b is the index for the time level, whereas the superscript on
C indicates the multiplicative power. If λ is a constant and the matrix norms ‖C^j‖ are bounded uniformly for 0 ≤ nk ≤ T, we obtain

h‖V^n‖² ≤ C_T ( h‖V^0‖² + k Σ_{j=0}^{n} |v_0^j|² ),
where ‖V‖ = ( Σ_{m=1}^{M} |v_m|² )^{1/2} and v_0^j is the boundary data at m = 0. This is precisely
the estimate we need to demonstrate the stability of the initial-boundary value problem,
analogous to (1.5.1) with the addition of the boundary data. Moreover, it is not difficult to
see that the boundedness of ‖C^n‖ for 0 ≤ nk ≤ T is necessary and sufficient to obtain the preceding estimate. We will show that the powers of the matrix C are bounded for 0 ≤ aλ ≤ 1, which agrees with our earlier results that the scheme, together with this boundary condition, is stable. To do this we first obtain relations between several matrix norms.
The reader may wish to consult Appendix A for the definitions of the norms.
Lemma 11.5.1. For an M × M matrix A,

(1/√M) ‖A‖₁ ≤ ‖A‖₂ ≤ (‖A‖₁ ‖A‖_∞)^{1/2}.
Proof. We prove the right inequality first. It is easy to show that

ρ(B) ≤ ‖B‖

for any norm; therefore,

‖A‖₂² = ρ(A*A) ≤ ‖A*A‖₁ ≤ ‖A*‖₁ ‖A‖₁ = ‖A‖_∞ ‖A‖₁.

For the left inequality we use the fact that v1 ≤ Mv2 by the Cauchy–Schwarz
inequality; also, if v2 ≤ 1, then v2 ≤ v1 . Hence, by the definition of the norms,

A2 = sup Av2 ≥ sup Av2


v2 ≤1 v1 ≤1

1 1
≥ √ sup Av1 = √ A1 .
M v1 =1 M
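
The lemma is easy to test numerically; the following sketch (ours) checks both inequalities for a random matrix.

    import numpy as np

    rng = np.random.default_rng(1)
    M = 50
    A = rng.standard_normal((M, M))
    n1 = np.linalg.norm(A, 1)            # maximum column sum
    n2 = np.linalg.norm(A, 2)            # largest singular value
    ninf = np.linalg.norm(A, np.inf)     # maximum row sum
    print(n1 / np.sqrt(M) <= n2 <= np.sqrt(n1 * ninf))   # True
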
We now consider the matrices C^n. The element of C^n on the j th lower diagonal is

C(n, j) (1 − aλ)^{n−j} (aλ)^j,

where we take the binomial coefficient C(n, j) equal to zero if j is greater than n. Thus, by Proposition A.1.6,

‖C^n‖₁ = Σ_{j=0}^{M−1} C(n, j) |1 − aλ|^{n−j} |aλ|^j ≤ (|1 − aλ| + |aλ|)^n,

where we have equality only if n is less than or equal to M. Lemma 11.5.1 gives us

(1/√M) (|1 − aλ| + |aλ|)^n ≤ ‖C^n‖₂

for n ≤ M and, since we also have ‖C^n‖_∞ = ‖C^n‖₁,

‖C^n‖₂ ≤ (|1 − aλ| + |aλ|)^n
for all n. If |aλ| is a constant, these two inequalities show that a necessary and sufficient
condition for the stability of the finite difference initial value problem is that 0 ≤ aλ ≤ 1,
which agrees with the GKSO method given in Section 11.3.
Although the matrix method incorporates both the initial conditions and the boundary
conditions into its analysis, this advantage is offset by the difficulty of analyzing, in general,
matrices such as C and proving the estimates on its powers. This arises because the order
of the matrix increases as h decreases and yet the estimates must be independent of h.
A common misuse of the matrix method is to determine the conditions on C such
that C n tends to zero or is bounded as n increases without bound and to regard this as
the finite difference stability condition. It is a well-known theorem of matrix analysis that
powers of C ultimately tend to zero if and only if the eigenvalues of C have modulus less
than 1 (see Exercise A.2.12 in Appendix A). Also, ‖C^n‖ is bounded if the eigenvalues of C
are at most 1 in magnitude and those eigenvalues of magnitude 1 are semisimple; i.e., they
correspond to trivial Jordan blocks. For the particular example we have been considering,
the matrices C n tend to 0 if
|1 − aλ| < 1
and the matrices are bounded if we allow, in addition, that aλ = 0. Thus the condition
0 ≤ aλ < 2
is the necessary and sufficient condition that C n  is bounded independent of n. The
explanation of the discrepancy between this result and the CFL condition is that for aλ
larger than 1 but less than 2, the norms of C n will increase initially and then ultimately
decay; however, there is no bound on ‖C^n‖ that is independent of M, which is equivalent
to a bound independent of k or h.
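
This transient growth is easy to observe. The sketch below (our construction) forms C for aλ = 1.5 and records the largest value of ‖C^n‖₂ over many steps; the peak grows rapidly with M even though the eigenvalue 1 − aλ = −0.5 is well inside the unit circle.

    import numpy as np

    def max_power_norm(al, M, N):
        # C from (11.5.1): 1 - a*lambda on the diagonal, a*lambda below it.
        C = (1 - al) * np.eye(M) + al * np.eye(M, k=-1)
        A, peak = np.eye(M), 0.0
        for n in range(N):
            A = A @ C
            peak = max(peak, np.linalg.norm(A, 2))
        return peak

    for M in (10, 20, 40):
        print(M, max_power_norm(1.5, M, 4 * M))   # grows without bound in M
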
Another difficulty with the matrix method is that if we determine by this method that a
scheme with boundary conditions is unstable, there is no easy way of determining whether
the instability is due to the scheme itself or to the boundary conditions. Von Neumann
analysis determines the stability of the scheme alone and is easier to perform than the
matrix method. Although the GKSO analysis of Section 11.3 for the boundary conditions
can be somewhat difficult, it is usually easier than the matrix method. Thus the separation
of the stability analysis into the consideration of the two parts by themselves is, in general,
easier and more informative than is the matrix method.
The analysis for Example 11.5.1 works because the matrix C satisfies ‖C‖₁ = 1 for 0 ≤ aλ ≤ 1, which is related to all the coefficients of the scheme being positive. By Theorem 3.1.4 the matrix method will have ‖C‖₁ greater than 1 for any scheme for hyperbolic equations that is more than first-order accurate; see Exercise 11.5.2. This means that the matrix method is more difficult to apply for higher order schemes.
Exercises
11.5.1. Using the method of Section 11.3, show that the forward-time backward-space
scheme for the one-way wave equation (1.1.1) on the unit interval is stable with the
solution specified at the left endpoint and the scheme being applied up to the right
endpoint.
11.5.2. Use Theorem 3.1.4 to show that for any scheme for hyperbolic equations that is
more than first-order accurate, other than the trivial cases given in the theorem,
matrix C, as in (11.5.1), satisfies ‖C‖₁ > 1.
11.5.3. Determine conditions under which a matrix of the form

C = [ 0    β                           ]
    [ α    0     β                     ]
    [      α     0      β              ]
    [            ⋱      ⋱       ⋱      ]
    [                   α       0    β ]
    [                       α − β   2β ]

satisfies ‖C‖₁ ≤ 1. Apply this result to the stability of the Lax–Friedrichs scheme (1.3.5) for the one-way wave equation on the unit interval with the solution specified at the left endpoint and the quasi-characteristic extrapolation (3.4.1) at the right endpoint.
11.5.4. Show by using the matrix method that the forward-time central-space scheme for the
heat equation (6.1.1) with the Neumann condition approximated by (6.3.8) is stable
for bµ ≤ 1/2. Also show that this scheme with the Dirichlet boundary condition,
where the solution is specified at the endpoints, is stable for bµ ≤ 1/2.
Chapter 12

Elliptic Partial Differential Equations and Difference Schemes
This chapter is the first of three chapters dealing with elliptic partial differential equations
and finite difference schemes. We start with a survey of important properties of elliptic
equations and the effects caused by boundary conditions. Then we show that finite difference
schemes have properties analogous to those of the differential equations. The following
two chapters are devoted to methods for obtaining the solution of the finite difference
schemes.

12.1 Overview of Elliptic Partial Differential Equations
The archetypal elliptic equation in two spatial dimensions is Poisson’s equation

uxx + uyy = f (x, y) (12.1.1)

in a domain U as illustrated in Figure 12.1. The Laplacian operator is the operator on the
left-hand side of (12.1.1), and we will denote it by ∇ 2 , i.e.,

∇² = ∂²/∂x² + ∂²/∂y².

In polar coordinates the Laplacian is given by the formula

∇²u = (1/r) ∂/∂r (r ∂u/∂r) + (1/r²) ∂²u/∂θ².
The homogeneous equation corresponding to (12.1.1) is called Laplace’s equation, i.e.,

∇ 2 u = 0. (12.1.2)

The solutions of Laplace’s equation are called harmonic functions and are intimately con-
nected with the area of mathematics called complex analysis (see Ahlfors [2]).
To determine completely the solution of (12.1.1) it is necessary to specify a boundary
condition on the solution. Two common boundary conditions for (12.1.1) are the Dirichlet
condition, in which the values of the solution are specified on the boundary, i.e.,

u = b1 on ∂U, (12.1.3)

[Figure 12.1: a domain U on which ∇²u = f.]

and the Neumann condition, in which the values of the normal derivative are specified on the boundary, i.e.,

∂u/∂n = b₂ on ∂U,      (12.1.4)
where ∂U refers to the boundary of U. Only one boundary condition can be specified at
each point of the boundary, perhaps with (12.1.3) specified on one portion of the bound-
ary and (12.1.4) specified on the remaining portion. Elliptic partial differential equations
together with boundary conditions are called boundary value problems.
To gain a physically intuitive understanding of (12.1.1), we may regard it as describing
the steady-state temperature distribution of an object occupying the domain U. The solu-
tion u(x, y) would represent the steady temperature of the domain U with heat sources and
sinks given by f (x, y). The Dirichlet boundary condition (12.1.3) would represent spec-
ified temperatures on the boundary and the Neumann boundary condition (12.1.4) would
represent a specified heat flux. In particular, the Neumann boundary condition (12.1.4) with
b2 equal to zero would represent a perfectly insulated boundary.
An important observation concerning (12.1.1) with the Neumann condition (12.1.4)
specified on the boundary is that for a solution to exist, the data must satisfy the constraint

∫_U f = ∫_{∂U} b₂.      (12.1.5)


If the data do not satisfy this constraint, then there is no solution. This relationship is called
an integrability condition and is easily proved by the divergence theorem as follows. We
have, from equation (12.1.1), the divergence theorem, and (12.1.4), that

∫_U f = ∫_U ∇²u = ∫_{∂U} n · ∇u = ∫_{∂U} ∂u/∂n = ∫_{∂U} b₂.


The vector n is the outer unit normal vector to the boundary ∂U. The integrability condition
(12.1.5) has the physical interpretation that the heat sources in the region must balance with
the heat flux on the boundary for a steady temperature to exist. Also, the solution to
(12.1.1) with (12.1.4) is determined only up to an arbitrary constant. This has the physical
interpretation that the average temperature of a body cannot be determined from the heat
fluxes on the boundary and heat sources and sinks alone.
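
The identity behind the integrability condition is easy to illustrate numerically. In the sketch below (ours), u = x²y² on the unit square, so f = ∇²u = 2x² + 2y², and the midpoint rule shows that the integral of f over the square equals the integral of ∂u/∂n over the boundary; both approach 4/3.

    import numpy as np

    n = 400
    s = (np.arange(n) + 0.5) / n             # midpoint-rule nodes on [0, 1]
    X, Y = np.meshgrid(s, s)
    interior = np.sum(2 * X**2 + 2 * Y**2) / n**2

    # du/dn on the four sides: 2y^2 on x = 1 and 2x^2 on y = 1; the normal
    # derivative vanishes on x = 0 and on y = 0.
    boundary = np.sum(2 * s**2) / n + np.sum(2 * s**2) / n
    print(interior, boundary)                # both approximately 4/3
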
Although Poisson’s equation (12.1.1) is the most common elliptic equation, there are
many other elliptic equations that occur in applications. We now define elliptic equations
more generally.

Definition 12.1.1. The general (quasi-linear) second-order elliptic equation in two di-
mensions is an equation that may be written as

a(x, y)uxx + 2b(x, y)uxy + c(x, y)uyy + d(x, y, u, ux , uy ) = f (x, y) (12.1.6)

where a, c > 0 and b² < ac.
Notice that the definition requires that the quadratic form

a(x, y)ξ 2 + 2b(x, y)ξ η + c(x, y)η2

be positive for all nonzero values of (ξ, η) for all values of (x, y) in U.

Other Elliptic Equations
We shall be primarily concerned with second-order elliptic equations, but there are elliptic
equations of any even order. In addition, elliptic equations in three dimensions are very
important in many applications. The biharmonic equation in two space dimensions is

∇ 4 u = ∇ 2 (∇ 2 u) = uxxxx + 2uxxyy + uyyyy = f. (12.1.7)

There are also elliptic systems such as the Cauchy–Riemann equations

ux − vy = 0,
(12.1.8)
uy + vx = 0

and the steady Stokes equations

∇ 2 u − px = f1 ,

∇ 2 v − py = f2 , (12.1.9)

ux + vy = 0.

The steady Stokes equations describe the steady motion of an incompressible highly
viscous fluid. The velocity field is given by the velocity components (u, v), and the
function p gives the pressure field. The biharmonic equation (12.1.7) is used to describe
the vertical displacement of a flexible, thin, nearly horizontal plate, subjected to small
vertical displacements and stresses on the boundary.

The essential property of these equations and systems is that the solutions are more
differentiable than the data. For example, the solution, u, of (12.1.1) has two more deriva-
tives than does f. Similarly, the solution, u, to the biharmonic equation (12.1.7) has four
more derivatives than does the function f. In particular, the solutions to Laplace’s equa-
tion (12.1.2) and the Cauchy–Riemann equations (12.1.8) are infinitely differentiable. This
property—that the solution is more differentiable than the data and that this gain in differ-
entiability of the solution is equal to the order of the differential operator—characterizes an
equation or system of equations as elliptic. (For systems such as (12.1.9) some care has to be
taken in appropriately defining the order; see Douglis and Nirenberg [14].) The ellipticity
of an equation is often expressed in terms of regularity estimates, as we demonstrate in the
next section.
As will be shown in the discussion of regularity estimates, the ellipticity of a single
equation depends on the nonvanishing of the symbol of the differential operator. More
precisely, if P is a differential operator of order 2m, then the operator is elliptic if there
is a constant c0 such that the symbol of P , denoted by p(x, ξ ), satisfies

|p(x, ξ )| ≥ c0 |ξ |2m (12.1.10)

for values of |ξ | sufficiently large. The symbol of a differential operator is defined as in
Definition 3.1.4, but for elliptic equations the factor of e^{st} is not required, since elliptic
equations do not depend on time.
We point out that equations such as
uxx − uyy = 0
do not have the property that its solutions are more differentiable than the data. This equation
is the wave equation, discussed in Chapter 8, and it has discontinuous functions in its class
of solutions. It does not satisfy the ellipticity requirement (12.1.10).

Exercises
12.1.1. Verify that if (u, v) is a solution of the Cauchy–Riemann equations (12.1.8), then
u and v also satisfy Laplace’s equation (12.1.2).
12.1.2. Show that second-order elliptic equations according to Definition 12.1.1 satisfy the
condition (12.1.10).
12.1.3. Show that the biharmonic operator in (12.1.7) satisfies the condition (12.1.10).
12.1.4. Show that the elliptic equation with constant coefficients auxx + 2buxy + cuyy +
d1 ux + d2 uy + eu = f can be transformed to an equation of the form
vξ ξ + vηη + γ v = g(ξ, η),
where γ is 1 or −1, using a linear change of coordinates, i.e., (ξ, η) = (x, y)M
for some matrix M, and where
v(ξ, η) = Au(x, y) e^{αξ+βη}
for some constants A, α, and β.
12.2 Regularity Estimates for Elliptic Equations
In this section we prove estimates that show how the smoothness of the solutions of el-
liptic equations depends on the data. We prove these estimates only for equations with
constant coefficients; similar estimates hold for equations with variable coefficients but the
techniques used to prove these estimates are beyond this text. We begin with the constant
coefficient equation

auxx + 2buxy + cuyy + d1 ux + d2 uy + eu = f (12.2.1)

for (x, y) ∈ R 2 , and we study this equation using the Fourier transform. We have the
Fourier transform of the solution


û(ξ₁, ξ₂) = (1/2π) ∫_{R²} e^{−i(xξ₁+yξ₂)} u(x, y) dx dy

and the Fourier inversion formula

u(x, y) = (1/2π) ∫_{R²} e^{i(xξ₁+yξ₂)} û(ξ₁, ξ₂) dξ₁ dξ₂

as given in Section 2.1. There is also Parseval's relation,

∫_{R²} |u(x, y)|² dx dy = ∫_{R²} |û(ξ₁, ξ₂)|² dξ₁ dξ₂.
Also note that for nonnegative integers r and s,

∫_{R²} |∂^{r+s}u/∂x^r ∂y^s|² dx dy = ∫_{R²} |ξ₁^r ξ₂^s û(ξ₁, ξ₂)|² dξ₁ dξ₂
    ≤ ∫_{R²} (ξ₁² + ξ₂²)^{r+s} |û(ξ₁, ξ₂)|² dξ₁ dξ₂.      (12.2.2)
Applying the transform to (12.2.1), we obtain

(aξ₁² + 2bξ₁ξ₂ + cξ₂² − id₁ξ₁ − id₂ξ₂ − e) û = −f̂

or

û(ξ₁, ξ₂) = −f̂(ξ₁, ξ₂) / (aξ₁² + 2bξ₁ξ₂ + cξ₂² − id₁ξ₁ − id₂ξ₂ − e).      (12.2.3)
By the requirements that b² < ac and a and c be positive, according to Definition 12.1.1, we have

aξ₁² + 2bξ₁ξ₂ + cξ₂² ≥ c₀(ξ₁² + ξ₂²)      (12.2.4)

for some constant c₀, and hence when |ξ|² = ξ₁² + ξ₂² ≥ C₀² for some value C₀,

|aξ₁² + 2bξ₁ξ₂ + cξ₂² − id₁ξ₁ − id₂ξ₂ − e| ≥ c₁(ξ₁² + ξ₂²)      (12.2.5)
for some positive constant c1 . Therefore, from (12.2.3) there is a constant C1 such that

|û(ξ₁, ξ₂)| ≤ C₁ |f̂(ξ₁, ξ₂)| / (ξ₁² + ξ₂²)

for ξ₁² + ξ₂² ≥ C₀². Then by Parseval's relation and (12.2.2),

∫_{R²} |∂_x^{s₁} ∂_y^{s₂} u(x, y)|² dx dy = ∫_{R²} |ξ₁^{s₁} ξ₂^{s₂} û(ξ₁, ξ₂)|² dξ₁ dξ₂

  = ∫_{|ξ|≤C₀} |ξ₁^{s₁} ξ₂^{s₂} û|² dξ₁ dξ₂ + ∫_{|ξ|>C₀} |ξ₁^{s₁} ξ₂^{s₂} û|² dξ₁ dξ₂

  ≤ ∫_{|ξ|≤C₀} |ξ₁^{s₁} ξ₂^{s₂} û|² dξ₁ dξ₂ + C₁² ∫_{|ξ|≥C₀} (ξ₁² + ξ₂²)^{s₁+s₂−2} |f̂|² dξ₁ dξ₂

  ≤ C₀^{2(s₁+s₂)} ∫_{R²} |û|² dξ₁ dξ₂ + C₁² ∫_{R²} (ξ₁² + ξ₂²)^{s₁+s₂−2} |f̂|² dξ₁ dξ₂.
If we use the norms defined by

‖u‖²_s = Σ_{s₁+s₂≤s} ‖∂_x^{s₁} ∂_y^{s₂} u‖²


‖u‖²_{s+2} ≤ C_s (‖f‖²_s + ‖u‖²₀).      (12.2.6)

Estimate (12.2.6) is called a regularity estimate. It states that if a solution to (12.2.1) exists in L², i.e., if ‖u‖₀ is finite and the function f has all derivatives of order through s in L²(R²), then the function u has s + 2 derivatives in L²(R²).
Notice that the relation (12.2.4) is essential to proving the regularity estimate (12.2.6).
A curve on which aξ12 + 2bξ1 ξ2 + cξ22 is constant is an ellipse in the (ξ1 , ξ2 ) plane. This
is the historical reason for the name elliptic, although now the name is applied to more
general equations. (See the discussion of the origin of the names hyperbolic and parabolic
in Section 8.1.)
The property that characterizes an elliptic equation is that the solution of the equation
is more differentiable than the data and that the increase in differentiability of the solution
is equal to the order of the equation. For a second-order equation the property of ellipticity
is expressed by the regularity estimate (12.2.6). The biharmonic equation and other fourth-
order elliptic equations satisfy analogous estimates, showing that the solution has derivatives
of order 4 more than the data (see Exercise 12.2.2). Elliptic systems, such as the Stokes
equations, satisfy regularity estimates, but the concept of order must be generalized; see
Douglis and Nirenberg [14].
If equation (12.2.1) holds on a bounded domain U in R 2 , we can easily obtain an
interior regularity estimate on a subdomain U1 whose boundary is contained in the interior
of U. The interior regularity estimate is

‖u‖²_{s+2,U₁} ≤ C_s(U, U₁) (‖f‖²_{s,U} + ‖u‖²_{0,U}).      (12.2.7)

This has the same interpretation as (12.2.6), but it gives estimates only in the interior of the domain. Norms such as ‖f‖_{s,U} are defined as in Section 2.1 for integer values of s, but the integration is only over the domain U.
The estimates (12.2.6) and (12.2.7) also hold if the coefficients of (12.2.1) are func-
tions of (x, y), as long as a constant c0 can be found so that (12.2.4) holds for all (x, y).
More sophisticated techniques than those used here must be employed to obtain the esti-
mates when the coefficients are variable. The theory of pseudodifferential operators has
been developed to extend the techniques used here to the situation when the coefficients
are not constant; see, for example, Taylor [61].
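
For constant coefficients the best constant c₀ in (12.2.4) is the smallest eigenvalue of the symmetric coefficient matrix with rows (a, b) and (b, c). The sketch below (ours, with assumed coefficient values) verifies this by minimizing the quadratic form over the unit circle.

    import numpy as np

    a, b, c = 2.0, 1.0, 3.0     # assumed values; elliptic since b^2 < ac
    c0 = np.linalg.eigvalsh([[a, b], [b, c]]).min()

    t = np.linspace(0, 2 * np.pi, 2000)
    xi1, xi2 = np.cos(t), np.sin(t)
    form = a * xi1**2 + 2 * b * xi1 * xi2 + c * xi2**2
    print(c0, form.min())       # the minimum of the form on |xi| = 1 is c0
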
Exercises
12.2.1. Prove relation (12.2.4) for equation (12.2.1) from Definition 12.1.1.
12.2.2. For a fourth-order elliptic equation of the form

au_{xxxx} + 2bu_{xxyy} + cu_{yyyy} = f

with a and c positive and with b > −√(ac), prove the regularity estimate

‖u‖_{s+4} ≤ C_s (‖f‖_s + ‖u‖₀).

12.2.3. Prove (12.2.7) by considering the function φu, where φ(x, y) is a smooth cutoff
function such that φ is 1 on U1 and 0 on the boundary of U. Hint: The function
φu can be extended to all R 2 by setting it to zero off of U and φu satisfies a
differential equation similar to (12.2.1), but where the right-hand side depends on
u and its first derivatives.

12.3 Maximum Principles
Maximum principles are a very useful set of tools for the study of second-order elliptic
equations. The usefulness of maximum principles is restricted to second-order equations
because the second derivatives of a function give information on the function at extrema.
The next two theorems are expressions of maximum principles.
Theorem 12.3.1. Let L be a second-order elliptic operator defined by Lφ = aφxx +
2bφxy + cφyy ; i.e., the coefficients a and c are positive and b satisfies b2 < ac. If a
function u satisfies Lu ≥ 0 in a bounded domain U, then the maximum value of u in
U is on the boundary of U.
This theorem can be regarded as an extension to two dimensions of the following
result: If a function of one variable has a positive second derivative on a closed interval,
then that function must achieve its maximum value at the ends of the interval. Figure 12.2
shows a cartoon illustrating the idea that if a second-order differential elliptic operator is
positive, then the surface shape is somewhat upwardly concave and the maxima occur at
the boundary. On the other hand, if the operator is negative, then the minima occur at
the boundaries.
Figure 12.2. A cartoon illustrating the maximum principle.
Theorem 12.3.2. If the elliptic equation

auxx + 2buxy + cuyy + d1 ux + d2 uy + eu = 0

holds in a domain U, with a and c positive and e nonpositive, then the solution u(x, y)
cannot have a positive local maximum or a negative local minimum in the interior of U.
We prove both of these theorems under the assumption that u is in C 3 . We prove
Theorem 12.3.1 only in the case when Lu is positive, and we prove Theorem 12.3.2 only
in the case when e is negative. The proofs for the general case, when Lu is nonnegative
and e is nonpositive, require a more careful analysis; see Garabedian [23].
Proof of Theorem 12.3.1. If u is any C 3 function with a local maximum at
(x0 , y0 ), then the gradient of u is zero at (x0 , y0 ), i.e.,

ux (x0 , y0 ) = uy (x0 , y0 ) = 0.

See the illustration in Figure 12.3. By the Taylor series expansion about (x₀, y₀), we have that

u(x₀ + Δx, y₀ + Δy) = u(x₀, y₀) + (1/2)(Δx² u⁰_{xx} + 2ΔxΔy u⁰_{xy} + Δy² u⁰_{yy}) + O((max(Δx, Δy))³).
We use the superscript 0 to indicate that the functions are evaluated at (x₀, y₀). Since u(x₀ + Δx, y₀ + Δy) ≤ u(x₀, y₀) for all sufficiently small values of Δx and Δy, it follows that

Δx² u⁰_{xx} + 2ΔxΔy u⁰_{xy} + Δy² u⁰_{yy} ≤ 0.

Since this expression is homogeneous of degree 2 in Δx and Δy, we have

α² u⁰_{xx} + 2αβ u⁰_{xy} + β² u⁰_{yy} ≤ 0      (12.3.1)

for all real values of α and β.
Figure 12.3. An interior maximum.
We now prove Theorem 12.3.1 for the case when Lu > 0. Applying (12.3.1) twice, first with α = √(a⁰) and β = b⁰/√(a⁰) and then with α = 0 and β² = c⁰ − (b⁰)²/a⁰, we have

Lu = a⁰u⁰_{xx} + 2b⁰u⁰_{xy} + c⁰u⁰_{yy}
   = [ (√(a⁰))² u⁰_{xx} + 2√(a⁰)(b⁰/√(a⁰)) u⁰_{xy} + (b⁰/√(a⁰))² u⁰_{yy} ] + (c⁰ − (b⁰)²/a⁰) u⁰_{yy}
   ≤ 0.

Since this inequality contradicts the assumption that Lu > 0, Theorem 12.3.1 is
proved.
Proof of Theorem 12.3.2. We prove the theorem only for the case when e(x, y)
is strictly negative. The case when e(x, y) is zero at a maximum requires a more careful
analysis, and we will omit it; see Garabedian [23].
We first conclude from Theorem 12.3.1 that if u has a maximum at (x0 , y0 ), then
Lu cannot be positive there. Thus we have

−Lu(x0 , y0 ) = e(x0 , y0 )u(x0 , y0 ) ≥ 0.

Since e(x0 , y0 ) is negative, it follows that u(x0 , y0 ) is not positive at an interior local
maximum. Similarly, by considering −u(x, y), we can show that u is not negative at a
local minimum.
The maximum principle applied to Laplace’s equation (12.1.2) on a domain has the
physical interpretation that for a steady temperature distribution, both the hottest and coldest
temperatures occur at the boundary of the region. This means that harmonic functions,
solutions of Laplace’s equation, have their maximum and minimum values on the boundary
of any domain. Figure 12.4 displays a portion of a surface plot of the harmonic function
x 2 − y 2 illustrating the locations of the maximum and minimum values. For any domain
the highest and lowest points of the surface above the domain must occur on the boundary
of the domain. Figure 12.5 is a contour plot of the same function shown in Figure 12.4. The
maximum principle can be used to prove the uniqueness of the solution to many elliptic
equations.
Figure 12.4. A surface plot of a harmonic function.
Figure 12.5. A contour plot of a harmonic function.
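
The maximum principle for the harmonic function of Figures 12.4 and 12.5 can be checked directly; in the following sketch (ours) the extreme values of u = x² − y² over the closed unit square occur only at boundary grid points.

    import numpy as np

    s = np.linspace(0, 1, 201)
    X, Y = np.meshgrid(s, s)
    U = X**2 - Y**2                  # a harmonic function
    inner = U[1:-1, 1:-1]            # values at interior grid points
    print(U.max(), inner.max())      # the maximum is attained on the boundary
    print(U.min(), inner.min())      # likewise for the minimum
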
Example 12.3.1. As an example of the application of maximum principles consider the equation

u_{xx} + u_{yy} − u = f      (12.3.2)
in a domain U with Dirichlet boundary conditions. Assume that there are two solutions u
and v to (12.3.2) and assume that u is greater than v somewhere in U. Set w = u − v;
then w satisfies (12.3.2) except with f equal to zero, and w is zero on the boundary.
Since w is positive somewhere in U and is zero on ∂U, w must have a positive interior
local maximum. But this contradicts Theorem 12.3.2, and thus equation (12.3.2) has at
most one solution. In fact, (12.3.2) does have a solution if U has a smooth boundary, but
we will not prove this.
Example 12.3.2. As an example of an equation with a nonunique solution, consider

    uxx + uyy + 2π²u = 0                                                     (12.3.3)

on the unit square with u equal to zero on the boundary. It is easily checked that

    u = A sin πx sin πy

is a solution for any value of A. Also, equation (12.3.3) with u equal to 1 everywhere on
the boundary has no solution.

Exercises
12.3.1. Show that the equation
    uxx + uyy − e^u = f
on a domain U with u equal to zero on the boundary has a unique solution, if a
solution exists. Hint: For two functions u and v, the function (e^u − e^v)/(u − v)
is positive.

12.3.2. Show that if u satisfies the elliptic equation

auxx + 2buxy + cuyy = 0

on a domain, where the coefficients a, b, and c are constant, then the quantity
u_x² + u_y² takes its maximum on the boundary of the domain.

12.3.3. Prove that if u satisfies the elliptic equation of Exercise 12.3.2 on a domain and
P is a positive definite matrix, then the function

    f(x, y) = ½ ∇u(x, y)^T P ∇u(x, y)

takes its maximum on the boundary of the domain. ( ∇u is the gradient vector of
u with components ∂u/∂x and ∂u/∂y. )

12.3.4. Show by example that if ∇⁴u = 0 in a domain U, then u can have an interior
maximum or minimum. Hint: Consider quadratic functions of (x, y).

12.3.5. Prove that there are no closed contours in the contour plot of a harmonic function.

12.4 Boundary Conditions for Elliptic Equations


We restrict our discussion of boundary conditions to second-order equations and to the
Dirichlet condition (12.1.3), the Neumann condition (12.1.4), and the more general Robin
condition
    ∂u/∂n + αu = b.                                                          (12.4.1)
The existence and uniqueness of the solutions of the general second-order elliptic equation
(12.1.6) given boundary conditions of the form (12.1.3), (12.1.4), and (12.4.1) depend on
“global” constraints, such as the integrability condition (12.1.5). For certain equations,
especially (12.1.1), on domains with smooth boundaries, the existence and uniqueness
questions have been answered. With the Dirichlet boundary condition (12.1.3), Poisson’s
equation (12.1.1) has a unique solution, and with the Neumann condition (12.1.4) there
is a unique solution, up to an additive constant, if and only if the integrability condition
(12.1.5) is satisfied. See Section 13.7 for more on solving boundary value problems with
the Neumann boundary condition.
General statements can be made about the local behavior of solutions to (12.1.6) given
the different types of boundary conditions. If a Dirichlet boundary condition is enforced
along a smooth portion of the boundary, then the normal derivative at the boundary will
be as well behaved as the derivative of the boundary data function in the direction of the
boundary. If the boundary data function is discontinuous, then the normal derivative will
be unbounded at discontinuities.

Example 12.4.1. As an example of this last statement, consider Laplace’s equation in the
upper half-plane, i.e., y > 0, with

    u(x, 0) = 0  if x > 0,
    u(x, 0) = 1  if x < 0.

A solution is

    u(x, y) = (1/π) tan⁻¹(y/x) = θ/π,

where θ is the angle in radians measured from the x-axis. The normal derivative of the
solution along the boundary is the derivative with respect to y. We have

    ∂u(x, y)/∂y = (1/π) (1/x) / (1 + (y/x)²) = x / [π(x² + y²)].

At the boundary we have the normal derivative

    uy(x, 0) = 1/(πx),
which is unbounded at the point of discontinuity in the boundary data function. The normal
derivative is unbounded at the point where the tangential derivative of the boundary data is
unbounded.

Notice also that in the interior of the upper half-plane, the solution and its derivatives
are all well-defined and smooth functions. The behavior of the solution near a point on the
boundary is primarily influenced by the boundary condition and data near that point. The
conditions and data at other boundary points have less effect the farther away they are.

If either the Neumann or Robin conditions are enforced along a smooth boundary,
then the solution will be differentiable up to the boundary and the first derivatives will be
as well behaved as the boundary data function.

Example 12.4.2. As an example, again on the upper half-plane, consider the boundary data

    ∂u/∂y (x, 0) = 0          if x > 0,
    ∂u/∂y (x, 0) = |x|^{1/2}  if x < 0.

Laplace’s equation has the solution

    u(x, y) = −(2/3) r^{3/2} cos(3θ/2)

using the polar coordinates of (x, y). The derivatives are given by

    ( ∂u/∂x, ∂u/∂y ) = ( −r^{1/2} cos(θ/2), r^{1/2} sin(θ/2) )

and we see that the tangential derivative, i.e., ∂u/∂x, has the same qualitative behavior as
the normal derivative.

A serious difficulty occurs at points on the boundary where the boundary conditions
change from Dirichlet to Neumann or Robin type.

Example 12.4.3. Consider Laplace’s equation in the upper half-plane with the boundary
conditions
u(x, 0) = 0 for x > 0 and uy (x, 0) = 0 for x < 0.

This problem has as a solution

    u(x, y) = r^{1/2} sin(θ/2).                                              (12.4.2)

Note that u and its first derivatives are in L2 (U) for any bounded domain U in the upper
half-plane whose boundary includes a portion of the real axis around zero. This function
u, however, does not have second derivatives in L2 (U) because of their growth near the
origin. The first derivatives are also unbounded, but are in L2 (U).

Example 12.4.4. Similar difficulties arise at reentrant corners. Consider the domain
containing the points (r, θ), in polar coordinates, with 0 < r ≤ r0 and 0 < θ < 3π/2;
see Figure 12.6. Laplace’s equation with the solution equal to zero on the two rays given
by θ = 0 and θ = 3π/2 has the solution
    u(x, y) = r^{2/3} sin(2θ/3).                                             (12.4.3)
Again, this function is in L2 (U), as are its first derivatives, but its second derivatives
are not.
Notice that the difficulty has nothing to do with the data at the reentrant corner. In
this case the data near the corner is identically zero, but the derivatives are not bounded at
the corner. See Exercise 12.4.2 for reentrant corners with different angles.


Figure 12.6. A region with a reentrant corner.

In summary, the solutions of elliptic equations with any of the boundary conditions,
Dirichlet, Neumann, or Robin, will be well behaved near smooth portions of the boundary.
At boundary points where either the boundary conditions change type or the boundary is
not smooth, difficulties in the form of singularities in the solution’s derivatives can occur.
An appreciation for these difficulties is important to understanding the numerical methods
for elliptic equations.

Exercises
12.4.1. Find a function of the form r^α sin αθ, with α taking the least possible positive
value, that is a solution to Laplace’s equation in the region 0 < r < 1 and
0 < θ < 3π/2 and that satisfies the given boundary conditions.
    (a) u = 0 on θ = 0, 0 < r < 1,
        u = r^α on θ = 3π/2, 0 < r < 1.
    (b) u = 0 on θ = 0, 0 < r < 1,
        ∂u/∂n = 0 on θ = 3π/2, 0 < r < 1.
Compare the behavior of these functions with (12.4.2) and (12.4.3).

12.4.2. Find solutions comparable to that of Example 12.4.4 on domains with 0 < θ < θ0
for 0 < θ0 < 2π. Show that the radial first derivative is unbounded when π < θ0 .

12.5 Finite Difference Schemes for Poisson’s Equation


We begin our discussion of difference schemes for elliptic equations by considering
Poisson’s equation (12.1.1) in the unit square. In general, one has grid spacings Δx and Δy
in the two directions. For simplicity, we will usually restrict our discussion to the case of
equal grid spacing in the x and y directions, and denote the grid spacing as h. We use
the index ℓ for the x direction and the index m for the y direction. A schematic of the
grid is shown in Figure 12.7. The standard central difference approximations for the second
derivatives lead to the difference formula
    δ²_x v_{ℓ,m} + δ²_y v_{ℓ,m} = f
or, equivalently,

    (1/h²) ( v_{ℓ+1,m} + v_{ℓ−1,m} + v_{ℓ,m+1} + v_{ℓ,m−1} − 4v_{ℓ,m} ) = f_{ℓ,m}.     (12.5.1)

The difference operator on the left-hand side of (12.5.1) is called the five-point (discrete)
Laplacian. We will use the symbol ∇²_h to refer to the discrete five-point Laplacian, i.e.,
∇²_h = δ²_x + δ²_y.


Figure 12.7. The five grid points involved in the five-point Laplacian.
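Before turning to the analysis, it may help to see the scheme in code. The following is a
minimal NumPy sketch of applying (12.5.1) to a grid function stored as a two-dimensional
array; the function name and array layout are our own choices, not notation from the text.

    import numpy as np

    def five_point_laplacian(v, h):
        """Apply the five-point discrete Laplacian (12.5.1) at all interior
        points of the grid function v (boundary values are stored in v too)."""
        return (v[2:, 1:-1] + v[:-2, 1:-1] + v[1:-1, 2:] + v[1:-1, :-2]
                - 4.0 * v[1:-1, 1:-1]) / h**2

    # The scheme differences quadratics exactly: applied to the harmonic
    # polynomial x^2 - y^2 it returns zero up to rounding error.
    N = 10
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    print(np.abs(five_point_laplacian(X**2 - Y**2, h)).max())  # ~1e-12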

We begin our study of finite difference schemes for elliptic equations by obtaining
error estimates for the solution to (12.5.1). In the next two chapters we consider methods
for solving equations such as (12.5.1).

The Discrete Maximum Principle


We can prove a maximum principle for the discrete five-point Laplacian that is analogous
to that for the differential equation.

Theorem 12.5.1. Discrete Maximum Principle. If ∇²_h v ≥ 0 on a region, then the
maximum value of v on this region is attained on the boundary. Similarly, if ∇²_h v ≤ 0,
then the minimum value of v is attained on the boundary.

Proof. We prove the principle only for the case when Δx = Δy. The condition
∇²_h v ≥ 0 is equivalent to

    v_{ℓ,m} ≤ ¼ ( v_{ℓ+1,m} + v_{ℓ−1,m} + v_{ℓ,m+1} + v_{ℓ,m−1} );

i.e., v_{ℓ,m} in the interior of the region is less than or equal to an average of its four nearest
neighbors. This easily leads to the conclusion that an interior point can be a (local) maximum
only if its four neighbors also have this same maximum value and that the inequality
is actually an equality. This argument then implies that at all grid points, including the
boundary points, v must have the same value. So either there is not a maximum value in
the interior or all points have the same value. This proves the principle when ∇²_h v ≥ 0.
When ∇²_h v ≤ 0, by considering ∇²_h(−v) ≥ 0, this case reduces to the previous case.
This completes the proof of the discrete maximum principle.
The maximum norm on a region U is defined by

    ‖v‖_∞ = ‖v‖_{∞,U} = max |v_{ℓ,m}|,

where the maximum is taken over all points in the region.


The chief tool in our error estimates is the following theorem.

Theorem 12.5.2. If v_{ℓ,m} is a discrete function defined on a grid on the unit square with
v_{ℓ,m} = 0 on the boundary, then

    ‖v‖_∞ ≤ (1/8) ‖∇²_h v‖_∞.                                                (12.5.2)

Proof. Define the function f_{ℓ,m} in the interior of the unit square by

    ∇²_h v_{ℓ,m} = f_{ℓ,m}.

Then, obviously,
    −‖f‖_∞ ≤ ∇²_h v ≤ ‖f‖_∞.                                                 (12.5.3)

To prove the theorem we consider the function

    w_{ℓ,m} = ¼ [ (x_ℓ − ½)² + (y_m − ½)² ]

and note that w is nonnegative and

    ∇²_h w = 1.

Thus from (12.5.3) we have

    ∇²_h ( v − ‖f‖_∞ w ) ≤ 0.

By the discrete maximum principle and this inequality, the function v − ‖f‖_∞ w has its
minimum on the boundary of the square, i.e.,

    −‖f‖_∞ ‖w‖_{∞,∂} ≤ v_{ℓ,m} − ‖f‖_∞ w_{ℓ,m} ≤ v_{ℓ,m},

where ‖w‖_{∞,∂} is the maximum value of |w_{ℓ,m}| for grid points on the boundary of the
square.
Similarly, from (12.5.3) we also obtain

    ∇²_h ( v + ‖f‖_∞ w ) ≥ 0,

and by the maximum principle,

    v_{ℓ,m} ≤ v_{ℓ,m} + ‖f‖_∞ w_{ℓ,m} ≤ ‖f‖_∞ ‖w‖_{∞,∂}.

The value of ‖w‖_{∞,∂} is 1/8, and so the preceding two inequalities for v_{ℓ,m} give

    ‖v‖_∞ ≤ (1/8) ‖f‖_∞ = (1/8) ‖∇²_h v‖_∞,

which proves the theorem.
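The two facts this proof rests on — that ∇²_h w = 1 exactly and that ‖w‖_{∞,∂} = 1/8 —
are easy to confirm numerically. A short sketch, our own illustration assuming the same
unit-square grid:

    import numpy as np

    N = 16
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    w = 0.25 * ((X - 0.5)**2 + (Y - 0.5)**2)   # the comparison function w

    lap_w = (w[2:, 1:-1] + w[:-2, 1:-1] + w[1:-1, 2:] + w[1:-1, :-2]
             - 4.0 * w[1:-1, 1:-1]) / h**2
    print(lap_w.min(), lap_w.max())  # both 1.0: the discrete Laplacian of w is 1
    print(w.max())                   # 0.125 = 1/8, attained at the four corners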


Theorem 12.5.2 leads to the error estimate in the maximum norm for the solution of
(12.1.1) as approximated by (12.5.1).

Theorem 12.5.3. Let u(x, y) be the solution to ∇²u = f on the unit square with Dirichlet
boundary conditions and let v_{ℓ,m} be the solution to ∇²_h v = f with v_{ℓ,m} = u(x_ℓ, y_m) on
the boundary. Then

    ‖u − v‖_∞ ≤ Ch² ‖∂⁴u‖_∞,

where ‖∂⁴u‖_∞ is the maximum magnitude of all the fourth derivatives of u over the
interior of the square.

Proof. By using the Taylor series for the central difference approximation to the
second derivative (see Section 3.3), we have that

    ∇²_h u = f + O(h²),

where the O(h²) terms are bounded by

    Ch² ‖∂⁴u‖_∞

for some constant C. Thus

    ‖∇²_h (u − v)‖_∞ ≤ Ch² ‖∂⁴u‖_∞

and u − v is zero on the boundary. Together with Theorem 12.5.2, this estimate proves
the theorem.

The Nine-Point Laplacian


Another very useful scheme for Poisson’s equation (12.1.1) is the fourth-order accurate
nine-point Laplacian. To derive this scheme we approximate (12.1.1) by

    ( 1 + (Δx²/12) δ²_x )⁻¹ δ²_x u + ( 1 + (Δy²/12) δ²_y )⁻¹ δ²_y u = f + O(Δ⁴),

which gives

    ( 1 + (Δy²/12) δ²_y ) δ²_x u + ( 1 + (Δx²/12) δ²_x ) δ²_y u
        = ( 1 + (Δx²/12) δ²_x ) ( 1 + (Δy²/12) δ²_y ) f + O(Δ⁴)
        = [ 1 + (1/12)( Δx² δ²_x + Δy² δ²_y ) ] f + O(Δ⁴).

Rearranging this expression we have the fourth-order accurate scheme

    ∇²_h v + (1/12)( Δx² + Δy² ) δ²_x δ²_y v = f + (1/12)( Δx² δ²_x + Δy² δ²_y ) f.

In the case with Δx = Δy = h, this scheme can be written

    (1/6) ( v_{ℓ+1,m+1} + v_{ℓ+1,m−1} + v_{ℓ−1,m+1} + v_{ℓ−1,m−1} )
      + (2/3) ( v_{ℓ+1,m} + v_{ℓ−1,m} + v_{ℓ,m+1} + v_{ℓ,m−1} ) − (10/3) v_{ℓ,m}       (12.5.4)
      = (h²/12) ( f_{ℓ+1,m} + f_{ℓ−1,m} + f_{ℓ,m+1} + f_{ℓ,m−1} + 8f_{ℓ,m} ).

Table 12.5.1
Comparison of second-order and fourth-order schemes.

            Second-order            Fourth-order
    h       Error       Order       Error       Order
    0.100   2.79e-5                 9.40e-9
    0.050   7.01e-6     1.99        5.85e-10    4.01
    0.025   1.75e-6     2.00        3.66e-11    4.00

This scheme is due to Rosser [54], and it satisfies maximum principles and error estimates
similar to the standard five-point Laplacian; see Exercise 12.5.6.
Table 12.5.1 shows the results of computations employing both the second-order
accurate five-point Laplacian (12.5.1) and the fourth-order accurate nine-point Laplacian
(12.5.4) applied to Poisson’s equation on the unit square. (The results were computed
using a preconditioned conjugate gradient method, which is discussed in Section 14.5.)
The exact solution is given by u = cos x sin y and f = −2 cos x sin y. The second and
fourth columns give the errors for the two methods, measured in the L² norm, of the
difference between the finite difference solution and the solution of the differential equation.
The third and fifth columns display the order of accuracy for each method, as computed
from the approximation that
    e(h) = c h^r,
where e(h) is the error. Thus

    r = log( e(h1)/e(h2) ) / log( h1/h2 )

for two successive errors due to grid spacings h1 and h2. The five-point Laplacian (12.5.1)
is obviously second-order accurate, and the nine-point formula (12.5.4) is obviously fourth-order
accurate. Notice that the error for the fourth-order scheme with h equal to 1/10 is
much smaller than that of the second-order method with h equal to 1/40. The gain in
accuracy of the nine-point formula is significant compared to the slight increase in work
that it requires.
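The order computation in the table is easy to reproduce. A short sketch using the
second-order errors from Table 12.5.1 (the variable names are our own):

    import math

    errors = [(0.100, 2.79e-5), (0.050, 7.01e-6), (0.025, 1.75e-6)]

    for (h1, e1), (h2, e2) in zip(errors, errors[1:]):
        r = math.log(e1 / e2) / math.log(h1 / h2)
        print(f"h = {h1} -> {h2}: observed order r = {r:.2f}")  # 1.99, 2.00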
Schemes for the general second-order elliptic equation (12.2.1) need not satisfy a
maximum principle. For example, if the mixed derivative term in (12.2.1) is approximated
by

    δ_{0x} δ_{0y} v,

then the resulting scheme does not satisfy an obvious maximum principle. If the coefficient
b(x, y) is positive, then the second-order accurate approximation

    (∂/∂x) b (∂/∂y) ≈ ½ b ( δ_{x+} δ_{y+} + δ_{x−} δ_{y−} )

will satisfy a maximum principle if b is not too large compared with the coefficients a
and c.
and c. Schemes that do not satisfy a maximum principle may have solutions and satisfy
error estimates such as in Theorem 12.5.3; however, the proofs are not as simple as those
just given. We do not need a maximum principle to hold in order to use the scheme.

Regularity Estimates for Schemes


We can prove discrete regularity estimates for schemes for elliptic equations, as is done for
the differential equation. For example, the scheme

    a δ²_x v + 2b δ_{0x} δ_{0y} v + c δ²_y v + d1 δ_{0x} v + d2 δ_{0y} v + e v = f

has the symbol

    p(ξ1, ξ2) = −(4/h²) [ a sin²(½hξ1) + 2b sin(½hξ1) sin(½hξ2) cos(½hξ1) cos(½hξ2)
                          + c sin²(½hξ2) ]
                + i d1 (sin hξ1)/h + i d2 (sin hξ2)/h + e.

The analogue of the estimate (12.2.5) for the symbol is, with θ = hξ1 and φ = hξ2,

    | (4/h²) [ a sin²(½θ) + 2b sin(½θ) sin(½φ) cos(½θ) cos(½φ) + c sin²(½φ) ]
      − i d1 (sin θ)/h − i d2 (sin φ)/h − e |
    ≥ 4 c0 h⁻² ( sin²(½θ) + sin²(½φ) ),

which holds for some positive constant c0, when h is small enough and when θ² + φ² ≥
h² C0² for some value C0.
The interior regularity estimate that follows from this estimate is

    ‖u‖²_{h,s+2,U1} ≤ Cs ( ‖f‖²_{h,s,U} + ‖u‖²_{h,0,U} ).                    (12.5.5)

The discrete interior regularity estimate can be used to prove the following result.

Theorem 12.5.4. If the elliptic equation Lu = f is approximated by the scheme
L_h v = R_h f on a domain U such that

    ‖L_h u − R_h Lu‖_{h,s−2,U} ≤ c0 h^r ‖u‖_s                                (12.5.6)

and
    ‖u − v‖_{h,0,U} ≤ c1 h^r ‖u‖_s                                           (12.5.7)

and U1 is contained in U, then

    ‖δ_+^s u − δ_+^s v‖_{h,0,U1} ≤ c2 h^r ‖u‖_s,

where c2 depends on the distance between U1 and ∂U.



Proof. The discrete function u − v satisfies the scheme

    L_h (u − v) = L_h u − R_h f = L_h u − R_h Lu

and, by (12.5.5),

    ‖u − v‖²_{h,s,U1} ≤ C_{s−2} ( ‖L_h u − R_h Lu‖²_{h,s−2,U} + ‖u − v‖²_{h,0,U} )
                      ≤ C_{s−2} h^{2r} ‖u‖²_s,

from which the theorem follows.
This theorem shows that if the function f is smooth enough, then the divided dif-
ferences of v approximate the divided differences of u to the same order that v itself
approximates u. This is demonstrated in Exercises 13.5.3 and 13.5.4.
At first look, this is a very surprising result. Notice that in general, if a discrete
function v_{ℓ,m} is an approximation to u(x, y) of order h^r, then

    ( v_{ℓ+1,m} − v_{ℓ−1,m} ) / 2h = ∂u(x_ℓ, y_m)/∂x + O(h^{r−1})

because the divided difference divides the error by a factor of h. However, Theorem 12.5.4
implies that when v_{ℓ,m} and u(x, y) are solutions of elliptic equations, then the error term
can be O(h^r) rather than O(h^{r−1}). The reason is that for elliptic equations the error is
smooth, and a divided difference of the error is again a smooth function.
We look more closely at how Theorem 12.5.4 can be used to obtain approximations
to derivatives of solutions of elliptic equations that are of the same order of accuracy as the
solution itself. Suppose the solutions of a fourth-order accurate scheme satisfy estimates
(12.5.6) and (12.5.7) with r equal to 4. Then

    ‖ ( 1 − (h²/12) δ²_x ) δ²_x v − ∂²_x u ‖_{h,0,U1} = O(h⁴)

and

    ‖ δ²_x v − ( 1 + (h²/12) δ²_x ) ∂²_x u ‖_{h,0,U1} = O(h⁴)

(see formulas (3.3.6) and (3.3.7)). Note, however, that

    ‖ δ²_x v − ∂²_x u ‖_{h,0,U1} = O(h²),

since δ²_x is only a second-order accurate approximation to ∂²_x. These results also apply to
equations with variable coefficients; see Frank [21] or Bube and Strikwerda [7].
By comparison, such results do not hold for solutions of hyperbolic problems. If
v^n_m is a solution to a second-order accurate scheme for a hyperbolic equation, such as
the one-way wave equation (1.1.1), then, in general, δ0 v^n_m is only a first-order accurate
approximation to the first derivative of u.


Also, as was discussed in Section 12.4, the solutions of elliptic equations have, in
general, a loss of regularity at certain boundary points. The implication for finite difference
solutions is that the errors will be largest at these points, such as reentrant corners. Shown
in Figure 12.8 is a contour plot of the error for the solution of Laplace’s equation with
the boundary data given by the exact solution u(r, θ) = r^{2/3} sin(2θ/3). As the contour plot
shows, the error is concentrated at the reentrant corner.

Figure 12.8. Contour plot of the error at a reentrant corner.

Exercises
12.5.1. Prove the discrete maximum principle on the unit square for the case with Δx not
equal to Δy.

12.5.2. Show that on a domain that is contained in a square of side d the analogue of the
estimate (12.5.2) is
    ‖v‖_∞ ≤ (d²/8) ‖∇²_h v‖_∞.

12.5.3. Prove the equivalent of Theorems 12.5.1, 12.5.2, and 12.5.3 for the nine-point
scheme (12.5.4).

12.5.4. Consider the domain given by −1 < x < 1, −1 < y < 1, except for 0 < x < 1,
−1 < y < 0, i.e., the points in quadrants 1, 2, and 3 with |x| and |y| less than 1.
For this domain prove the estimate
    ‖v‖_∞ ≤ (5/2) ‖∇²_h v‖_∞
corresponding to Theorem 12.5.2. Hint: Consider w = (x + 1/2)² + (y − 1/2)².

12.5.5. Using the results of Exercise 12.5.4, prove the analogue of Theorem 12.5.3 for the
domain of Exercise 12.5.4. Why is this theorem nearly useless for computation?
Hint: See Section 12.4.
12.5.6. Show that the nine-point Laplacian (12.5.4) satisfies a discrete maximum principle.
12.5.7. Show that the “diagonal” five-point Laplacian scheme given by

    (1/2h²) ( v_{ℓ−1,m−1} + v_{ℓ+1,m−1} + v_{ℓ−1,m+1} + v_{ℓ+1,m+1} − 4v_{ℓ,m} ) = f_{ℓ,m}

does not satisfy a regularity estimate by showing that the symbol of the scheme
p(ξ1, ξ2) vanishes for ξ1 and ξ2 equal to π/h. (The vanishing of the symbol is
a reflection of the fact that this scheme decomposes into two separate schemes, one
for grid points with ℓ + m being even and the other for ℓ + m being odd.)

12.6 Polar Coordinates


Many applications involving elliptic equations are for domains on which it is natural to use
polar coordinates, and so we now examine the effects of using polar coordinates. Consider
Poisson’s equation on the unit disk

    (1/r) ∂/∂r ( r ∂u/∂r ) + (1/r²) ∂²u/∂θ² = f(r, θ)                        (12.6.1)

with 0 ≤ r ≤ 1 and 0 ≤ θ ≤ 2π. We use a grid as shown in Figure 12.9 with r_i = iΔr
and θ_j = jΔθ. We approximate the equation by

    (1/r_i) [ r_{i+1/2} (u_{i+1,j} − u_{i,j})/Δr − r_{i−1/2} (u_{i,j} − u_{i−1,j})/Δr ] / Δr
      + (1/r_i²) ( u_{i,j+1} − 2u_{i,j} + u_{i,j−1} ) / Δθ² = f_{i,j},       (12.6.2)

where u_{i,j} and f_{i,j} are the grid functions at (r_i, θ_j) = (iΔr, jΔθ). The grid functions are
periodic in j with period J = 2π/Δθ, and u_{0,j} is independent of the value of j.
The main new feature of polar coordinates is the condition that must be imposed at the
origin. It is important to realize that any difficulties that arise at the origin are only a result
of the choice of coordinate system and are not reflected in the continuous function u(r, θ ).
To derive our condition at the origin we integrate equation (12.6.1) over a disk D of
radius ε, obtaining

    ∫_D f r dr dθ = ∫_D [ (1/r) ∂/∂r ( r ∂u/∂r ) + (1/r²) ∂²u/∂θ² ] r dr dθ
                  = ∫_0^{2π} ε ∂u/∂r dθ.

Figure 12.9. A polar grid for a circular domain.

Now choose ε equal to Δr/2 and approximate this relation by

    f(0) π (Δr/2)² = Σ_{j=1}^{J} [ (u_{1,j} − u_0)/Δr ] (Δr/2) Δθ.

Since u_{0,j} is independent of j — call this value u_0 — we have

    u_0 = (1/J) Σ_{j=1}^{J} u_{1,j} − f(0) (Δr/2)².                          (12.6.3)

Using this formula preserves the second-order accuracy of scheme (12.6.2).


For parabolic and hyperbolic equations on a disk, a procedure analogous to the one
that gave rise to (12.6.3) can be used to give accurate difference formulas at the origin.
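As a sketch of how (12.6.3) enters a computation, the following hypothetical helper (the
names are ours) produces the origin value from the ring of values at r = Δr. The quick
check uses u = r², for which f = ∇²u = 4 and the exact center value is 0.

    import numpy as np

    def origin_value(u_ring, f0, dr):
        """Formula (12.6.3): the average of u_{1j} over the first ring,
        minus f(0) * (dr/2)**2."""
        return np.mean(u_ring) - f0 * (dr / 2.0)**2

    dr = 0.1
    J = 16
    print(origin_value(np.full(J, dr**2), 4.0, dr))  # 0.0 exactly for u = r^2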

Exercises
12.6.1. Show that the discrete maximum principle holds for finite difference scheme (12.6.2)
on a disk with formula (12.6.3) used at the origin.
12.6.2. If we denote the finite difference operator on the left-hand side of (12.6.2) by ∇̃²_h,
show that the estimate
    ‖v‖_∞ ≤ ¼ ‖∇̃²_h v‖_∞
holds for a disk of radius 1, where the formula (12.6.3) is used at the origin.

12.7 Coordinate Changes and Finite Differences


Frequently partial differential equations must be solved on domains that are not rectangles,
disks, or other nice shapes. Sometimes it is possible to change coordinates so that a con-
venient coordinate system can be used. To illustrate the techniques and the difficulties, we
will work through a relatively simple example. It is not hard to come up with much more
difficult examples.

Figure 12.10. The trapezoidal region and grid.

We consider Poisson’s equation on the trapezoidal domain given by 0 ≤ x ≤ 1 and
0 ≤ y ≤ (1 + x)/2 and shown in Figure 12.10. We take the new coordinate system

    ξ = x,    η = 2y/(1 + x)

so that (ξ, η) in the unit square maps one-to-one with (x, y) in the trapezoid. To change
coordinates we use the differentiation formulas

    ∂/∂x = ∂/∂ξ − [ 2y/(1 + x)² ] ∂/∂η = ∂/∂ξ − [ η/(1 + ξ) ] ∂/∂η,
    ∂/∂y = [ 2/(1 + x) ] ∂/∂η = [ 2/(1 + ξ) ] ∂/∂η.

Using these relations, Poisson’s equation (12.1.1) becomes

    u_ξξ − [ η/(1 + ξ) ] u_ξη − ( [ η/(1 + ξ) ] u_η )_ξ
      + [ η/(1 + ξ)² ] ( η u_η )_η + [ 4/(1 + ξ)² ] u_ηη = f(ξ, η).          (12.7.1)

If we were to discretize (12.7.1) in this form using second-order accurate central
differences, the resulting matrix would not be symmetric. Since the iterative solution
methods we will study in the next two chapters will work if the matrix is symmetric, we
will show how to modify the equation (12.7.1) to obtain a symmetric, positive definite
matrix. To do this we must get the equation in divergence form, i.e., in the form

    Σ_{i,j} ∂/∂x_i ( a_{ij} ∂u/∂x_j ) = f̃,

where (x1, x2) = (ξ, η) and (a_{ij}) is a symmetric matrix at each point (x1, x2). (See
Exercise 12.7.3.) If we multiply (12.7.1) by (1 + ξ), we can collect terms such that
(12.7.1) is equivalent to

    ( (1 + ξ) u_ξ )_ξ − ( η u_ξ )_η − ( η u_η )_ξ + ( [ (4 + η²)/(1 + ξ) ] u_η )_η = (1 + ξ) f.
This equation may be discretized on a uniform grid in ξ and η as

    [ (1 + ξ_{i+1/2})(u_{i+1,j} − u_{i,j}) − (1 + ξ_{i−1/2})(u_{i,j} − u_{i−1,j}) ] / Δξ²
    − [ η_{j+1}(u_{i+1,j+1} − u_{i−1,j+1}) − η_{j−1}(u_{i+1,j−1} − u_{i−1,j−1}) ] / (4ΔξΔη)
    − [ η_j (u_{i+1,j+1} − u_{i+1,j−1}) − η_j (u_{i−1,j+1} − u_{i−1,j−1}) ] / (4ΔξΔη)   (12.7.2)
    + [ (4 + η²_{j+1/2})(u_{i,j+1} − u_{i,j}) − (4 + η²_{j−1/2})(u_{i,j} − u_{i,j−1}) ] / [ (1 + ξ_i) Δη² ]
    = (1 + ξ_i) f_{i,j}.

We now show that the matrix corresponding to this discretization is symmetric. The
matrix is symmetric if the coefficient of u_{ℓ,m} in the equation at grid point (i, j) is the
same as the coefficient of u_{i,j} at (ℓ, m). We check that the coefficient of u_{i+1,j+1} in the
equation at (i, j) is −( η_{j+1} + η_j )/(4ΔξΔη), and this is also the coefficient of u_{i,j} in the
equation at (i + 1, j + 1). The same is true for all the other nonzero coefficients. Thus
the matrix is symmetric.
To show that the matrix of the equations in (12.7.2) is negative definite, we consider
the operator on the left-hand side of (12.7.2) applied to a grid function φ_{i,j} that is zero on
the boundaries of the unit (ξ, η) square. Multiplying the operator applied to φ at (i, j)
by φ_{i,j} and summing over all (i, j) gives a long expression that we will consider in three
parts. Denote these sums by Σ_ξξ, Σ_ηη, and Σ_ξη. The terms from the second difference
in ξ are

    Σ_ξξ = Δξ⁻² Σ φ_{i,j} [ (1 + ξ_{i+1/2})(φ_{i+1,j} − φ_{i,j}) − (1 + ξ_{i−1/2})(φ_{i,j} − φ_{i−1,j}) ]
         = −Δξ⁻² Σ (1 + ξ_{i+1/2}) ( φ_{i+1,j} − φ_{i,j} )²

by summation by parts. Similarly, the terms from the second differences in η are

    Σ_ηη = −Δη⁻² Σ [ (4 + η²_{j+1/2})/(1 + ξ_i) ] ( φ_{i,j+1} − φ_{i,j} )².

The sums are over all interior (i, j) values. The sums from the mixed differences are also
treated by summation by parts and become

    Σ_ξη = (2ΔξΔη)⁻¹ Σ η_j ( φ_{i+1,j} − φ_{i−1,j} )( φ_{i,j+1} − φ_{i,j−1} ).

To show that the matrix is negative definite, we must show that

    −( Σ_ξξ + Σ_ξη + Σ_ηη ) ≥ −C ( Σ_ξξ + Σ_ηη ) ≥ 0

for some positive number C. This is easily done using the inequalities

    −η_j ( φ_{i+1,j} − φ_{i−1,j} )( φ_{i,j+1} − φ_{i,j−1} ) / (2ΔξΔη)
      ≤ a (1 + ξ_i) [ ( φ_{i+1,j} − φ_{i−1,j} )/(2Δξ) ]²
        + (1/a) [ η_j²/(1 + ξ_i) ] [ ( φ_{i,j+1} − φ_{i,j−1} )/(2Δη) ]²
      ≤ (a/2) (1 + ξ_i) [ ( (φ_{i+1,j} − φ_{i,j})/Δξ )² + ( (φ_{i,j} − φ_{i−1,j})/Δξ )² ]
        + (1/2a) [ η_j²/(1 + ξ_i) ] [ ( (φ_{i,j+1} − φ_{i,j})/Δη )² + ( (φ_{i,j} − φ_{i,j−1})/Δη )² ].

Therefore,

    −( Σ_ξξ + Σ_ξη + Σ_ηη ) ≥ (1 − a) Δξ⁻² Σ (1 + ξ_{i+1/2}) ( φ_{i+1,j} − φ_{i,j} )²
        + Δη⁻² Σ [ ( 4 − η²_{j+1/2}/a + η²_{j+1/2} )/(1 + ξ_i) ] ( φ_{i,j+1} − φ_{i,j} )².

If we choose a = 1/2, say, then both sums are nonnegative. Thus the system of difference
equations (12.7.2) has a matrix that is symmetric and negative definite.
This system of equations can be solved by the methods discussed in Chapters 13
and 14.

Exercises
12.7.1. Consider Poisson’s equation (12.1.1) on the domain given by 0 ≤ x ≤ 1 and
0 ≤ y ≤ H(x). Change coordinates to (ξ, η) given by ξ = x and η = y/H(x).
Write the scheme in a form that gives a positive definite and symmetric matrix.

12.7.2. Consider Poisson’s equation in polar coordinates (12.6.1) on the domain given
by 0 ≤ r ≤ s(θ) and 0 ≤ θ ≤ 2π. Change coordinates to (ρ, φ) given by
ρ = r/s(θ) and φ = θ. Write the scheme in a form that gives a positive definite
and symmetric matrix.
12.7.3. Show that by multiplying an elliptic equation of the form

    a11(x, y) u_xx + 2a12(x, y) u_xy + a22(x, y) u_yy = f(x, y)

by a function φ, the resulting equation can be put in divergence form,

    Σ_{i,j} ∂/∂x_i ( ã_{ij} ∂u/∂x_j ) = f̃,

with (x1, x2) = (x, y) if and only if the coefficients satisfy

    ∂/∂x [ ( a11 (∂a12/∂x + ∂a22/∂y) − a12 (∂a11/∂x + ∂a12/∂y) ) / ( a11 a22 − a12² ) ]
    + ∂/∂y [ ( a12 (∂a12/∂x + ∂a22/∂y) − a22 (∂a11/∂x + ∂a12/∂y) ) / ( a11 a22 − a12² ) ] = 0.

Hint: Obtain equations that φ and its first derivatives must satisfy; then use the
identity

    ∂/∂x ( (1/φ) ∂φ/∂y ) = ∂/∂y ( (1/φ) ∂φ/∂x ).

Conclude that the equation e^{xy} u_xx + u_yy = f(x, y) cannot be put in divergence
form. This equation is used in Example 13.7.2 in Chapter 13.

Chapter 13

Linear Iterative Methods

In this chapter we consider the class of iterative methods known as linear methods, con-
centrating primarily on the class of methods related to successive overrelaxation. These
methods are relatively easy to implement and require minimal computer storage and, for
these reasons, are very widely used in the numerical solution of elliptic equations.

13.1 Solving Finite Difference Schemes for Laplace’s Equation in a Rectangle
We begin by considering methods for solving Laplace’s equation (12.1.2) in a rectangular
domain. The basic method can be extended to solve general elliptic equations such as
(12.1.6) on general regions, as discussed in Section 12.7.
Consider Laplace’s equation (12.1.2) on the unit square with Dirichlet boundary
conditions (12.1.3). For the finite difference scheme we use the standard second-order
accurate five-point Laplacian with equal grid spacing in the x and y directions. This has
the finite difference formula

    v_{ℓ+1,m} + v_{ℓ−1,m} + v_{ℓ,m+1} + v_{ℓ,m−1} − 4v_{ℓ,m} = 0             (13.1.1)

for all interior points (x_ℓ, y_m). For the Dirichlet boundary condition (12.1.3) we assume
that the values of v_{ℓ,m} in (13.1.1) are given when (x_ℓ, y_m) is a boundary point. The
Neumann boundary condition is considered in Section 13.7.
Equations (13.1.1) comprise a system of linear equations for the interior values of
v_{ℓ,m} with the boundary values of v_{ℓ,m} prescribed. These equations can be written in the
standard matrix notation

    Ax = b,                                                                  (13.1.2)

where the vector x consists of the interior values v_{ℓ,m} and b is composed from the values
of v_{ℓ,m} on the boundary, i.e., the known values.
We could solve (13.1.2) by standard methods for systems of linear equations, such
as Gaussian elimination. However, the matrix A in (13.1.2) is a very sparse matrix and is
often quite large. For example, if the grid spacing in the unit square is N⁻¹, then A is
an (N − 1)² × (N − 1)² matrix, and each row contains at most five nonzero elements. If
N is taken to be about 40, then only about 0.3% of the elements are nonzero. Gaussian
elimination is not efficient for such sparse matrices, and so direct methods such as Gaussian
elimination are not often used to solve (13.1.1). Instead, iterative methods are usually
employed. Because matrix A has a well-defined structure, due to the finite difference


scheme, using a good iterative method is usually more efficient than the use of general
sparse matrix methods for Gaussian elimination.

The Jacobi Method


The first iterative method we will consider is the Jacobi algorithm. It is given by the formula

    v^{k+1}_{ℓ,m} = ¼ ( v^k_{ℓ+1,m} + v^k_{ℓ−1,m} + v^k_{ℓ,m+1} + v^k_{ℓ,m−1} )      (13.1.3)

for all interior points. This formula describes how we proceed from an initial approximation
v⁰_{ℓ,m} to successive approximations v^k_{ℓ,m}. Given the values of v^k_{ℓ,m} for all grid points,
equation (13.1.3) shows how to compute v^{k+1}_{ℓ,m} at each interior grid point. Having computed
v^{k+1} for all the grid points, the iterative process can be continued to compute v^{k+2}, and so
on. Of course, throughout the computation the values of v^k_{ℓ,m} on the boundary all remain
at their prescribed values.
The Jacobi algorithm (13.1.3) converges as k increases, and we stop the iterations
when some criterion is satisfied. For example, one criterion would be to stop when the maximum
value of |v^{k+1}_{ℓ,m} − v^k_{ℓ,m}| taken over all values of (ℓ, m) is less than some prescribed
tolerance.
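A minimal implementation of the Jacobi iteration with this stopping criterion might look
as follows; this is a NumPy sketch with our own names. Note that it keeps two copies of
the solution, a point taken up with the Gauss–Seidel method below.

    import numpy as np

    def jacobi(v, tol=1e-6, max_iter=100_000):
        """Jacobi iteration (13.1.3) for Laplace's equation.  The array v
        carries the Dirichlet data in its boundary entries, which are never
        changed.  Returns the approximate solution and the iteration count."""
        for k in range(1, max_iter + 1):
            v_new = v.copy()
            v_new[1:-1, 1:-1] = 0.25 * (v[2:, 1:-1] + v[:-2, 1:-1]
                                        + v[1:-1, 2:] + v[1:-1, :-2])
            if np.abs(v_new - v).max() < tol:
                return v_new, k
            v = v_new
        return v, max_iter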

The Gauss–Seidel Method


The Jacobi algorithm has been described as the slowest of all converging methods; it
certainly is not hard to improve on it. A method that converges twice as fast as (13.1.3) is
the Gauss–Seidel algorithm, given by
 
k+1
v$,m = 14 v$+1,m
k
+ v$−1,m
k+1
+ v$,m+1
k
+ v$,m−1
k+1
. (13.1.4)

In this formula we see that if we proceed through the grid of points in the natural order, then
we do not need to keep two copies of the solution, one for the “old” values at iteration k and
another for “new” values at iteration k + 1. In (13.1.4) we can use immediate replacement;
k+1 k
i.e., when v$,m is computed it can be stored in the location where v$,m was stored. Thus
(13.1.4) uses less storage than (13.1.3) and, as shown in Section 13.3, it is twice as fast.
The natural order of progressing through the grid points is also called the lexicographic
order. It is the order we obtain in programming using two nested loops, the inner loop being
on $ and the outer loop being on m.

The SOR Method


A method that improves on (13.1.4) is successive overrelaxation (SOR), given by

    v^{k+1}_{ℓ,m} = v^k_{ℓ,m}
        + ω [ ¼ ( v^k_{ℓ+1,m} + v^{k+1}_{ℓ−1,m} + v^k_{ℓ,m+1} + v^{k+1}_{ℓ,m−1} ) − v^k_{ℓ,m} ].   (13.1.5)

If the parameter ω is chosen properly, then (13.1.5) can be very much faster than
(13.1.4). Notice that when ω is equal to 1, then SOR reduces to the Gauss–Seidel algorithm.
SOR also uses immediate replacement.
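A sketch of point SOR with immediate replacement is given below (our own code, not from
the text); setting omega = 1 recovers the Gauss–Seidel method (13.1.4). The loops follow
the natural (lexicographic) order, so v[l-1, m] and v[l, m-1] already hold iteration
k + 1 values when v[l, m] is updated.

    def sor(v, omega, tol=1e-6, max_iter=100_000):
        """Point SOR (13.1.5) for Laplace's equation on a square grid.
        Boundary entries of the NumPy array v hold the Dirichlet data."""
        N = v.shape[0] - 1
        for k in range(1, max_iter + 1):
            change = 0.0
            for m in range(1, N):            # outer loop on m
                for l in range(1, N):        # inner loop on l
                    gs = 0.25 * (v[l + 1, m] + v[l - 1, m]
                                 + v[l, m + 1] + v[l, m - 1])
                    delta = omega * (gs - v[l, m])
                    v[l, m] += delta         # immediate replacement
                    change = max(change, abs(delta))
            if change < tol:
                return v, k
        return v, max_iter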

In the next sections we analyze each of the preceding methods to determine their
relative rates of convergence. We also present other versions of SOR.

Analysis of General Linear Iterative Methods


There is an extensive literature on iterative methods for solving linear systems of equations,
and we give only an introduction to these methods. More exhaustive discussions are con-
tained in the books by Young [73], Varga [65], Wachpress [67], and Hageman and Young
[29]. The Jacobi, Gauss–Seidel, and SOR methods are particular cases of the general class
of methods called linear iterative methods. The general linear iterative method for solving
a linear system
Ax = b (13.1.6)
involves decomposing the matrix A by writing it as
A=B −C (13.1.7)
and then iteratively solving the system of equations
Bx k+1 = Cx k + b. (13.1.8)
Of course, we wish to choose B so that (13.1.8) can be easily solved. As we will
show, the Jacobi, Gauss–Seidel, and SOR methods are different ways of splitting the linear
system for the five-point Laplacian. Since the exact solution satisfies (13.1.6), we obtain
from (13.1.8), the equation for the error,
Bek+1 = Cek
or, equivalently,
ek+1 = B −1 Cek . (13.1.9)
The matrix B −1 C is called the iteration matrix for the algorithm.
A necessary and sufficient condition for the error given by (13.1.9) to converge to
zero is that all the eigenvalues of B −1 C are less than 1 in magnitude. For a matrix M, its
spectral radius ρ(M) is defined by
ρ(M) = max |λi |,
i
where the λi are the eigenvalues of M; see Appendix A. Thus (13.1.8) is a convergent
method if and only if  
ρ B −1 C < 1.
 
The quantity ρ B −1 C is a measure of the error reduction per step of the iteration. Fur-
thermore, the speed of convergence of the method is dependent on the size of ρ(B −1 C).
If we have two different splittings of A,
A = B1 − C1
= B2 − C2
and    
ρ B2−1 C2 < ρ B1−1 C1 ,
then the second method, with the smaller spectral radius, converges faster than does the
first.

Exercises
13.1.1. For a linear system of the form (A1 + A2)x = b, consider the iterative method

    (I + µA1) x̃ = (I − µA2) x^k + µb,
    (I + µA2) x^{k+1} = (I − µA1) x̃ + µb,                                   (13.1.10)

where µ is a parameter. Show that this iterative method can be put in the form
(13.1.8) and determine the iteration matrix for the method. (This method is based
on the ADI method discussed in Section 7.3.)
13.1.2. Show for the system

    x_j − x_{j+1} = b_j   for j = 1, . . . , K − 1,
    x_K = b_K

that the iterative method

    x_j^{k+1} = x_{j+1}^k + b_j   for j = 1, . . . , K − 1,
    x_K^{k+1} = b_K

converges in K steps. Show also that ρ(B⁻¹C) is zero.


13.1.3. Prove that a linear iterative method converges in a finite number of steps if and only
if ρ(B⁻¹C) = 0.

13.2 Eigenvalues of the Discrete Laplacian


In the analysis of the numerical methods introduced in the previous section we will require
the eigenvalues of the discrete Laplacian operator. In this section we will derive formulas
for these eigenvalues.
The equation for the eigenvalues of the discrete Laplacian is

    ∇²_h v_{ℓ,m} = −λ v_{ℓ,m},

where v_{ℓ,m} is a grid function that is identically zero on the boundary of the region, but is not
identically zero. We will determine the eigenvalues and eigenvectors for a rectangular
grid for a region with 0 < x < X and 0 < y < Y. We have Δx = X/L, Δy = Y/M,
and

    ( v_{ℓ−1,m} − 2v_{ℓ,m} + v_{ℓ+1,m} )/Δx²
      + ( v_{ℓ,m−1} − 2v_{ℓ,m} + v_{ℓ,m+1} )/Δy² = −λ v_{ℓ,m}                (13.2.1)

for all interior points and v_{ℓ,m} equal to zero on the boundaries.
It is important to make a distinction between the eigenvector v̄, which has unknowns
corresponding to the (L − 1)(M − 1) interior grid points, and the grid function v, which

has (L + 1)(M + 1) values corresponding to both the interior and boundary points. Because
we specify that the boundary values of v are zero, we can write the simple formula
(13.2.1). The equations for v̄ are different than (13.2.1) if (ℓ, m) is next to a boundary, in
which case at least one of the terms on the left-hand side of (13.2.1) would not be present.
We begin by looking for solutions of the form

    v_{ℓ,m} = A(ℓ) B(m),                                                     (13.2.2)

where A(·) and B(·) are functions of one integer variable. (Note that it is not clear a priori
that we can obtain such solutions.) By substituting the relation (13.2.2) in (13.2.1) and then
dividing by A(ℓ)B(m), we obtain the equation

    [ A(ℓ−1) − 2A(ℓ) + A(ℓ+1) ] / ( Δx² A(ℓ) )
      + [ B(m−1) − 2B(m) + B(m+1) ] / ( Δy² B(m) ) = −λ.

In this relation we see that we have an expression that depends on ℓ and one that depends
on m, and their sum is a value independent of both ℓ and m. This can occur only if both
of these expressions are actually constant. That is, we have

    A(ℓ−1) − 2A(ℓ) + A(ℓ+1) = −2(1 − α) A(ℓ),
    B(m−1) − 2B(m) + B(m+1) = −2(1 − β) B(m)                                 (13.2.3)

for some complex values α and β related by

    λ = 2(1 − α)/Δx² + 2(1 − β)/Δy².

Since the equation for B(·) is similar to that for A(·), we consider only the equation
for A(·). To solve the equation for A(·), a recurrence relation, we substitute

    A(ℓ) = ζ^ℓ

in the first equation in (13.2.3). We obtain the quadratic equation

    ζ² − 2αζ + 1 = 0

for the two values of ζ. The two roots are

    ζ± = α ± √(α² − 1).

Note that ζ− = 1/ζ+. Thus the general solution for A(·) is of the form

    A(ℓ) = A+ ζ+^ℓ + A− ζ−^ℓ

for some constants A+ and A−.



To determine A+ and A− and also α, we consider the boundary conditions for
A(·). These are

    A(0) = 0   and   A(L) = 0.

The condition A(0) = 0 is satisfied if A+ + A− = 0, so

    A(ℓ) = A+ ( ζ+^ℓ − ζ−^ℓ ).                                               (13.2.4)

Note that we cannot determine a value for A+ since the equation for A(·) in (13.2.3) is a
homogeneous equation.
The boundary condition A(L) = 0 is equivalent to

    A(L) = A+ ( ζ+^L − ζ−^L ) = 0

or

    ( ζ+ / ζ− )^L = 1.

Also, since ζ− = 1/ζ+, we have

    ζ+^{2L} = 1.

Thus ζ+ (and ζ−) is a 2Lth root of unity, i.e.,

    ζ+ = e^{iπa/L},                                                          (13.2.5)

for some integer a ranging from 0 to 2L − 1. Since ζ− = 1/ζ+ we can restrict a so
that 0 < a < L, and (13.2.4) gives all of the L − 1 nontrivial solutions of the equation for
A(·) in (13.2.3).
Moreover,

    α = ( 1 + ζ+² ) / ( 2ζ+ ) = cos(πa/L)   for some integer a, 0 < a < L.

Similarly,

    β = cos(πb/M)   for some integer b, 0 < b < M.

From equation (13.2.3), we have that the eigenvalue corresponding to the pair of
integers (a, b) is

    λ_{a,b} = 2 [ 1 − cos(πa/L) ] / Δx² + 2 [ 1 − cos(πb/M) ] / Δy²
            = 4 sin²( πa/(2L) ) / Δx² + 4 sin²( πb/(2M) ) / Δy²              (13.2.6)

for integers (a, b) with 0 < a < L and 0 < b < M. Moreover, also from (13.2.2) and
(13.2.4), we obtain that the corresponding eigenvector is given by

    v̄^{a,b}_{ℓ,m} = sin( aℓπ/L ) sin( bmπ/M ).                              (13.2.7)

So there are (L − 1)(M − 1) eigenvalues, and each corresponds to a distinct eigenvector.
This shows that the discrete Laplacian has a complete set of eigenvalues and
eigenvectors given by (13.2.6) and (13.2.7), respectively.
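Formulas (13.2.6) and (13.2.7) are easy to verify numerically. The sketch below (our own
illustration) applies the difference operator of (13.2.1) to the eigenvector and checks that
the result is −λ_{a,b} times the eigenvector.

    import numpy as np

    L, M = 8, 6
    X = Y = 1.0
    dx, dy = X / L, Y / M
    a, b = 3, 2

    l = np.arange(L + 1)[:, None]
    m = np.arange(M + 1)[None, :]
    v = np.sin(a * l * np.pi / L) * np.sin(b * m * np.pi / M)  # (13.2.7)

    lap_v = ((v[:-2, 1:-1] - 2 * v[1:-1, 1:-1] + v[2:, 1:-1]) / dx**2
             + (v[1:-1, :-2] - 2 * v[1:-1, 1:-1] + v[1:-1, 2:]) / dy**2)

    lam = (4 * np.sin(a * np.pi / (2 * L))**2 / dx**2
           + 4 * np.sin(b * np.pi / (2 * M))**2 / dy**2)       # (13.2.6)

    print(np.abs(lap_v + lam * v[1:-1, 1:-1]).max())           # ~1e-11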

13.3 Analysis of the Jacobi and Gauss–Seidel Methods


In this section we analyze the Jacobi and Gauss–Seidel methods for the five-point Laplacian.
For simplicity of exposition, we restrict the discussion to a square with the same spacing
in both directions, Δx = Δy = h, and with N grid intervals in each direction.
To analyze the Jacobi and Gauss–Seidel methods, we rewrite (13.1.1) as

    v_{ℓ,m} − ¼ v_{ℓ−1,m} − ¼ v_{ℓ,m−1} − ¼ v_{ℓ+1,m} − ¼ v_{ℓ,m+1} = 0      (13.3.1)

for all interior points. If this were written in the form (13.1.2), then all values of v_{ℓ±1,m}
and v_{ℓ,m±1} that correspond to boundary points would have to be placed on the right-hand
side of the equation. For example, if (ℓ, m − 1) is a boundary grid point, then instead of
(13.3.1) we have

    v_{ℓ,m} − ¼ v_{ℓ−1,m} − ¼ v_{ℓ+1,m} − ¼ v_{ℓ,m+1} = ¼ v_{ℓ,m−1}.

Using the natural ordering of the grid points, we can write (13.3.1) as

    Ax = b

with

    A = I − L − U,

where L is a lower triangular matrix and U is an upper triangular matrix.
It is important to realize that the vector x is indexed with pairs of indices corresponding
to the grid points v_{ℓ,m}, and the matrix A is indexed with pairs of pairs. In
particular,

    A_{(ℓ,m),(ℓ,m)} = 1,   A_{(ℓ,m),(ℓ+1,m)} = −¼,   A_{(ℓ,m),(ℓ−1,m)} = −¼,
    A_{(ℓ,m),(ℓ,m+1)} = −¼,   A_{(ℓ,m),(ℓ,m−1)} = −¼

when these elements are defined. All other elements are 0. If the grid spacing is given by
h = 1/N, then the matrices have order K equal to (N − 1)².
We now consider the splittings corresponding to the two methods that we are studying
in this section. Notice that the B matrix multiplies the unknowns of iteration k + 1 and
the C matrix multiplies those of index k.
For the Jacobi method we see that

    B = I   and   C = L + U.                                                 (13.3.2)

The splitting for the Gauss–Seidel method depends on the order of the unknowns.
We take the order in which the unknowns are updated to be the same as that used in the
vector x. With this proviso, the Gauss–Seidel method has the splitting

    B = I − L   and   C = U.                                                 (13.3.3)

The matrix decomposition (13.3.2) for the Jacobi method is a restatement of (13.1.3),
which shows that the updated variables, those multiplied by B, are only the diagonal

elements. The variables evaluated at step k in formula (13.1.3) are those corresponding
to the off-diagonal elements of the matrix. Similarly, the decomposition (13.3.3) for the
Gauss–Seidel method is a restatement of (13.1.4) in which the variables evaluated at step
k + 1 are those corresponding to the elements of the matrix on the diagonal and below.
Notice that the matrix B, being a lower triangular matrix, is easy to invert.
It is important to realize that in the actual implementation of these methods in a
computer program, we do not store the matrices A, B, and C. They are all quite sparse,
and it is very inefficient to store them as matrices. The matrices are useful in the analysis,
but the implementation can be done without explicit reference to them. That is, a computer
implementation should not have an (N − 1)² × (N − 1)² array for storage of these matrices.
Instead the implementation should use a form such as (13.1.4), in which only the current
values of v^k_{ℓ,m} are stored. There is no reason to store other arrays.

Analysis of the Jacobi Method


To determine the spectral radius of the iteration matrix for each of these methods applied
to the five-point Laplacian, we first find the eigenvalues and eigenvectors of the iteration
matrix for the Jacobi method (13.1.3). That is, we must find a vector v̄ and value µ such
that
µv̄ = (L + U )v̄.
If we represent v̄ as a grid function with indices from 0 to N in each direction, with the
indices 0 and N corresponding to the boundaries, we have
 
µv$,m = 1
4 v$−1,m + v$,m−1 + v$+1,m + v$,m+1 (13.3.4)

for all interior points and v$,m equal to zero on the boundaries.
As mentioned after equation (13.2.1), it is important to make a distinction between the
eigenvector v̄, which has unknowns corresponding to the (N − 1)2 interior grid points,
and the grid function v, which has (N + 1)2 values corresponding to both the interior and
boundary points. Because we specify that the boundary values of v are zero, we can write
the simple formula (13.3.4). The equations for v̄ are different than (13.3.4) if ($, m) is
next to a boundary, in which case at least one of the terms on the right-hand side of (13.3.4)
would not be present. The use of the grid function v in place of the eigenvector v̄ allows
for a simpler way to write the equations.
Since L + U is an (N − 1)2 × (N − 1)2 matrix, there should be (N − 1)2 eigen-
values and eigenvectors.
Comparing equation (13.3.4) with (13.2.1), we see that the eigenvalues of the Jacobi
method are related to those of the Laplacian by

µ = 1 − 14 h2 λ.

So the eigenvalues are


  !
1 aπ bπ
µ a,b
= cos + cos (13.3.5)
2 N N

for 1 ≤ a, b ≤ N − 1. By equation (13.2.7) the eigenvectors are given by

    v^{a,b}_{ℓ,m} = sin( aℓπ/N ) sin( bmπ/N ).                               (13.3.6)

This gives all (N − 1)² eigenvalues and eigenvectors for the Jacobi iteration matrix. See
also Exercise 13.3.1.
From the formula (13.3.5), we see that

    ρ(B⁻¹C) = ρ(L + U) = cos(π/N) = µ^{1,1}.

Since ρ(L + U) is less than 1, the Jacobi method will converge; however, since ρ(L + U)
is very close to 1, i.e.,

    cos(π/N) ≈ 1 − π²/(2N²),

we see that the convergence will be slow.
The relationship µ^{N−a,N−b} = −µ^{a,b} shows that the nonzero eigenvalues occur in
pairs and that if µ is an eigenvalue, then −µ is also an eigenvalue. Notice also that the
eigenvalues µ^{a,N−a} for a between 1 and N − 1 are all equal to 0, and these are the only
eigenvalues equal to 0. Thus there are N − 1 eigenvalues of L + U that are zero, and
consequently there are (N − 1)(N − 2) nonzero eigenvalues.

Analysis of the Gauss–Seidel Method


We now consider the Gauss–Seidel method. An eigenvector v̄ of the iteration matrix
(I − L)⁻¹U with eigenvalue λ satisfies

    λ (I − L) v̄ = U v̄,

or, for the grid function v_{ℓ,m}, we have

    λ v_{ℓ,m} = ¼ ( λ v_{ℓ−1,m} + λ v_{ℓ,m−1} + v_{ℓ+1,m} + v_{ℓ,m+1} )      (13.3.7)

for all interior points and v_{ℓ,m} equal to zero on the boundaries. Notice that the coefficient
λ in (13.3.7) multiplies only the variables with superscript k + 1 in the formula (13.1.4).
In the form (13.3.7) the formula is rather intractable; however, there is a substitution that
reduces the analysis of this case to that of the Jacobi method. If we set

    v_{ℓ,m} = λ^{(ℓ+m)/2} u_{ℓ,m}                                            (13.3.8)

for each nonzero eigenvalue λ, we obtain, after dividing by λ^{(ℓ+m+1)/2},

    λ^{1/2} u_{ℓ,m} = ¼ ( u_{ℓ−1,m} + u_{ℓ,m−1} + u_{ℓ+1,m} + u_{ℓ,m+1} ).   (13.3.9)

By comparing (13.3.9) with (13.3.4), we see that the nonzero eigenvalues λ of the
Gauss–Seidel method are related to the eigenvalues µ of the Jacobi method by

    λ^{a,b} = ( µ^{a,b} )² = ¼ [ cos( aπ/N ) + cos( bπ/N ) ]².               (13.3.10)

In particular,

    ρ[ (I − L)⁻¹ U ] = ρ(L + U)²,

which shows that the Gauss–Seidel method converges twice as fast as the Jacobi method
for the five-point Laplacian.
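The practical consequence is easy to quantify. The following sketch (ours, using the
spectral radii just derived) estimates the number of iterations each method needs to reduce
the error by a factor of 10; both counts grow like N², with Gauss–Seidel needing half as
many.

    import math

    for N in (20, 40, 80):
        rho_jacobi = math.cos(math.pi / N)    # spectral radius, Jacobi
        rho_gs = rho_jacobi**2                # spectral radius, Gauss-Seidel
        steps = lambda rho: math.ceil(math.log(10.0) / -math.log(rho))
        print(N, steps(rho_jacobi), steps(rho_gs))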
The eigenvalues of the Gauss–Seidel iteration matrix from equation (13.3.10) give
only (N − 1)(N − 2)/2 eigenvalues corresponding to the (N − 1)(N − 2) nonzero eigenvalues
of the Jacobi iteration matrix. An examination of the corresponding eigenvectors
for the Gauss–Seidel method shows that they are given by

    v^{a,b}_{ℓ,m} = ( µ^{a,b} )^{ℓ+m} sin( aℓπ/N ) sin( bmπ/N )

with v^{N−a,N−b}_{ℓ,m} = v^{a,b}_{ℓ,m}. All other eigenvalues are zero, and they are not
semisimple. (See Appendix A for the definition of a semisimple eigenvalue.)
An alternative way to describe the preceding analysis is to consider the equation

    det[ λI − (I − L)⁻¹ U ] = 0

for the eigenvalues of the Gauss–Seidel iteration matrix. We have the relationship

    0 = det[ λI − (I − L)⁻¹ U ] = det( λI − λL − U ) det( I − L )⁻¹.

The value of det(I − L)⁻¹ is 1, since L is strictly lower triangular. We next transform the
matrix λI − λL − U by a similarity transformation using the diagonal matrix S, where
the (ℓ, m)th entry on the diagonal is λ^{(ℓ+m)/2}. (Recall that the rows and columns of L,
U, and S are indexed by the ordered pairs of integers corresponding to the grid indices.)
We then have

    S⁻¹ ( λI − λL − U ) S = λI − λ^{1/2} (L + U)

corresponding to (13.3.9). Thus

    det( λI − λL − U ) = det[ λI − λ^{1/2} (L + U) ]
        = λ^{(N−1)²/2} det[ λ^{1/2} I − (L + U) ]
        = λ^{(N−1)²/2} ∏_{1≤a,b≤N−1} ( λ^{1/2} − µ^{a,b} )
        = λ^{N(N−1)/2} ∏_{2≤a+b≤N−1} [ λ − ( µ^{a,b} )² ],
where in the last product we used the facts that µ^{a,N−a} is zero for each a and
µ^{N−a,N−b} = −µ^{a,b}. This last formula confirms our previous conclusion that there are
N(N − 1)/2 zero eigenvalues and shows that (13.3.10) gives the (N − 1)(N − 2)/2 nonzero
eigenvalues.
An examination of why the substitution (13.3.8) works shows that the updating of
values in the Gauss–Seidel method can be organized either in the standard lexicographic
order or in the order of increasing values of ℓ + m. When one updates a value at a grid
point with indices (ℓ, m), the computation involves only points of lower value for the sum
of the indices, the points with “new” values, and points of larger value for the sum of the
indices, the points with “old” values.
The Jacobi method can also be regarded as solving the heat equation

    u_t = u_xx + u_yy

until a steady-state solution is reached, using forward-time central-space differencing and
Δt = ¼h². In general it seems that finding steady-state solutions by solving the corresponding
time-dependent equations is less efficient than using special methods for the steady-state
equations. The Gauss–Seidel method can be regarded as a finite difference approximation
for the time-dependent evolution of the equation

    u_t = u_xx + u_yy − ε( u_xt + u_yt ),

where Δt = ½h² and ε = ¼h. The equation should be discretized about (t, x, y) equal
to ( (n + ½)Δt, ℓh, mh ) to obtain (13.1.4).

Methods for Diagonally Dominant Matrices


We now state and prove a theorem about the Gauss–Seidel and Jacobi methods for the
class of diagonally dominant matrices. Many schemes for second-order elliptic equations,
including the five-point Laplacian, give rise to diagonally dominant matrices.

Definition 13.3.1. A matrix is diagonally dominant if

    Σ_{j≠i} |a_{ij}| ≤ |a_{ii}|                                              (13.3.11)

for each value of i. A row is strictly diagonally dominant if the inequality in (13.3.11)
is a strict inequality, and a matrix is strictly diagonally dominant if each row is strictly
diagonally dominant.

By a permutation of a matrix A, we mean a simultaneous permutation of the rows
and columns of the matrix; i.e., a_{ij} is replaced by a_{σ(i),σ(j)} for some permutation σ.

Definition 13.3.2. A matrix is reducible if there is a permutation σ under which A has
the structure

    ( A1   O  )
    ( A12  A2 ),                                                             (13.3.12)

where A1 and A2 are square matrices. A matrix is irreducible if it is not reducible.

For an arbitrary matrix A the Jacobi iterative method for equation (13.1.2) is

    x^{k+1} = D⁻¹ ( (D − A) x^k + b )
            = ( I − D⁻¹A ) x^k + D⁻¹ b,                                      (13.3.13)

where D is the diagonal matrix with the same diagonal elements as A. If A is written as

    A = D − L − U,

where L and U are strictly lower and upper triangular matrices, respectively, then the
Gauss–Seidel method for (13.1.2) is

    (D − L) x^{k+1} = U x^k + b.                                             (13.3.14)

Notice that the diagonal dominance of a matrix is preserved if the rows and columns
of the matrix are permuted simultaneously. The Gauss–Seidel method is dependent on the
permutations of the matrix, whereas the Jacobi method is not, and a matrix is reducible if
in using the Jacobi method it is possible to have certain components of x^k be zero for all
values of k while x⁰ is not identically zero (see Exercises 13.3.4 and 13.3.5).

Theorem 13.3.1. If A is an irreducible matrix that is diagonally dominant, with at least


one row being strictly diagonally dominant, then the Jacobi and Gauss–Seidel methods are
convergent.

Proof. We prove the theorem only for the Gauss–Seidel method; the proof for the Jacobi method is easier. Our proof is based on that of James [31]. We begin by assuming that there is an eigenvalue λ of the iteration matrix that satisfies |λ| ≥ 1. Let x be an eigenvector of the iteration matrix with eigenvalue λ, and we normalize x so that ‖x‖_∞ is 1.
Let x_i be a component of x with |x_i| equal to 1; then we have the series of inequalities


    |λ||a_{ii}||x_i| = | λ ∑_{j=1}^{i−1} a_{ij}x_j + ∑_{j=i+1}^{n} a_{ij}x_j |

                     ≤ |λ| ∑_{j=1}^{i−1} |a_{ij}||x_j| + ∑_{j=i+1}^{n} |a_{ij}||x_j|
                                                         (13.3.15)
                     ≤ |λ| ∑_{j≠i} |a_{ij}||x_j|

                     ≤ |λ| ∑_{j≠i} |a_{ij}||x_i| ≤ |λ||a_{ii}||x_i|.

Since the first and last expressions are the same, each inequality in the preceding sequence must be an equality. This implies that for each j, either |x_j| is 1 or a_{ij} is zero. This conclusion follows for each i with |x_i| equal to 1.

If we permute the indices of A so that the components with |x_j| equal to 1 are placed first and the others, for which |a_{ij}| is zero, are last, then the structure of A is of form (13.3.12). Since A is irreducible, we conclude that |x_j| is 1 for each value of j.
By choosing a row that is strictly diagonally dominant, the last inequality of (13.3.15) is then a strict inequality, which leads to a contradiction. This implies that the assumption that λ satisfies |λ| ≥ 1 is false. Therefore, |λ| is less than 1 for the iteration matrix, and the Gauss–Seidel method is convergent.

Exercises
13.3.1. Verify by direct substitution that the eigenvalues and eigenvectors for the Jacobi
iteration matrix are given by (13.3.5) and (13.3.6), respectively.
13.3.2. Determine the eigenvalues of the Jacobi iteration matrix when applied to the "diagonal" five-point Laplacian scheme given by

    (1/(2h²)) ( v_{ℓ−1,m−1} + v_{ℓ+1,m−1} + v_{ℓ−1,m+1} + v_{ℓ+1,m+1} − 4v_{ℓ,m} ) = f_{ℓ,m}    (13.3.16)

on a uniform grid with Δx = Δy = h. Hint: The eigenvectors for this Jacobi method are the same as for the Jacobi method for the usual five-point Laplacian. The eigenvalues, however, are different.
13.3.3. Verify that zero is not a semisimple eigenvalue of the iteration matrix for the Gauss–Seidel method for the five-point Laplacian on the unit square.
13.3.4. Show that the Jacobi method (13.3.13) is not affected by a simultaneous reordering of the rows and columns of a matrix, whereas the Gauss–Seidel method (13.3.14) is affected. Note that such a permutation is equivalent to applying a similarity transformation using a permutation matrix P to the matrix A, resulting in the matrix PAP^{-1}.
13.3.5. Show that a matrix is reducible if, in using the Jacobi method, it is possible to have certain components of x^k be zero for all values of k while x^0 is not identically zero (see Exercise 13.3.4).
13.3.6. Show that the matrix for the five-point Laplacian on the unit square is irreducible.

13.4 Convergence Analysis of Point SOR


We now analyze the convergence of SOR for the five-point Laplacian as given by (13.1.5). We have to determine the splitting matrices B and C. As before, B multiplies the unknowns at iteration k + 1 and C multiplies those at iteration k. We also have the condition that A = B − C = I − L − U. After rearranging the formula (13.1.5) and dividing by ω, we obtain that the splitting is given by

    B = (1/ω) I − L,    C = ((1 − ω)/ω) I + U.

By the same reasoning used with the other methods, from (13.1.5) we obtain that the eigenvalues λ are given as the solutions to

    ω^{-1}(λ + ω − 1) v_{ℓ,m} = (1/4)( λv_{ℓ−1,m} + λv_{ℓ,m−1} + v_{ℓ+1,m} + v_{ℓ,m+1} )    (13.4.1)

for interior grid points, with v_{ℓ,m} = 0 on the boundary. We use the substitution (13.3.8) for the nonzero eigenvalues, which we used on (13.3.7), obtaining

    ((λ + ω − 1)/(ωλ^{1/2})) u_{ℓ,m} = (1/4)( u_{ℓ−1,m} + u_{ℓ,m−1} + u_{ℓ+1,m} + u_{ℓ,m+1} ).
From this relation we see that the nonzero eigenvalues for SOR are related to those of the Jacobi method by

    (λ + ω − 1)/(ωλ^{1/2}) = µ

for each eigenvalue µ of the Jacobi iteration matrix. We rewrite this relationship as

    λ − λ^{1/2}ωµ + ω − 1 = 0,                       (13.4.2)

which is a quadratic equation in λ^{1/2}.


First note that the iteration matrix for SOR is nonsingular for ω not equal to 1. We have

    det B^{-1}C = det(ω^{-1}I − L)^{-1} det[ω^{-1}(1 − ω)I + U]
                = ω^K [ω^{-1}(1 − ω)]^K = (1 − ω)^K,

where K is (N − 1)², the order of the matrix. In particular, zero is not an eigenvalue of the iteration matrix for SOR when ω is not equal to 1.
Equation (13.4.2) relates each eigenvalue of the Jacobi iteration matrix to two eigenvalues of the SOR iteration matrix. Since µ_{a,b} = −µ_{N−a,N−b} and there is an ambiguity in the sign of λ^{1/2}, there is actually a one-to-one correspondence between the pair of nonzero eigenvalues {µ_{a,b}, µ_{N−a,N−b}} of the Jacobi iteration matrix and the pair of solutions of equation (13.4.2) with µ equal to µ_{a,b}. For µ_{a,b} equal to zero, there corresponds the one eigenvalue λ equal to 1 − ω. Thus equation (13.4.2) determines the (N − 1)² eigenvalues of the SOR iteration matrix from the (N − 1)² eigenvalues of the Jacobi iteration matrix.
Since we wish to have both roots of (13.4.2) less than 1 in magnitude and the product
of the roots is ω − 1, we see that a necessary condition for the convergence of SOR is

|ω − 1| < 1,
or, equivalently,
0 < ω < 2. (13.4.3)

This same conclusion is reached for the N − 1 eigenvalues corresponding to µ_{a,b} = 0.


Solving (13.4.2), we obtain

    λ^{1/2} = (1/2)( ωµ + √(ω²µ² − 4(ω − 1)) ).      (13.4.4)

We choose the nonnegative square root when the square root is real in this formula so that when ω equals 1, then λ = µ² for positive µ and λ is zero for negative µ. This correspondence is somewhat arbitrary, but since SOR reduces to the Gauss–Seidel method for ω equal to 1, it is useful to relate the eigenvalues in this way.
We now assume, without loss of generality, that µ is positive, and we wish to find the value of ω that minimizes the magnitude of λ^{1/2} when λ^{1/2} is real, i.e., when the quantity inside the square root in (13.4.4) satisfies

    ω²µ² − 4ω + 4 = ( ωµ − 2/µ )² − 4( 1/µ² − 1 ) ≥ 0.

To determine how λ^{1/2} varies as a function of ω, we take the derivative of (13.4.4):

    ∂λ^{1/2}/∂ω = (1/2) µ + (1/2)(ωµ² − 2)(ω²µ² − 4ω + 4)^{-1/2}

                = (µ/2) [ 1 − (2/µ − ωµ) / √( (2/µ − ωµ)² − 4(µ^{-2} − 1) ) ] < 0.

Since this derivative is negative, we see that to decrease the size of λ^{1/2} we must increase ω. The maximum value of ω for which λ^{1/2} is real is the root of

    ω²µ² − 4ω + 4 = 0                                (13.4.5)

that satisfies (13.4.3).


When µ is negative and λ^{1/2} is real, then λ^{1/2} is less than the value of λ^{1/2} corresponding to |µ| and thus does not affect the spectral radius of the iteration matrix. Since we are ultimately concerned with determining the spectral radius of the iteration matrix, we need not consider this case in detail.
Now consider the case when λ^{1/2} is complex. Notice that since the polynomial in (13.4.2) has real coefficients, the two values of λ corresponding to µ and −µ are complex conjugates of each other. The magnitude of λ can be computed from (13.4.4) as follows:

    |λ| = |λ^{1/2}|² = (1/4)( ω²µ² + 4(ω − 1) − ω²µ² ) = ω − 1.

From this relationship we see that to decrease |λ| we must decrease ω. The minimum value of ω for which λ^{1/2} is complex is again the root of (13.4.5) satisfying (13.4.3).
We now consider the eigenvalues λ(µ_{a,b}) of the SOR iteration matrix for all eigenvalues µ_{a,b} of L + U. The spectral radius of the SOR iteration matrix is the maximum magnitude of all the λ(µ_{a,b}). We wish to choose ω in order to minimize the spectral radius.
First, consider ω very close to 2, so close that

    ω²(µ_{a,b})² − 4ω + 4

is negative for all eigenvalues µ_{a,b}. By our previous discussion, all the λ corresponding to nonzero values of µ_{a,b} are complex with magnitude equal to ω − 1. Those λ corresponding to µ_{a,b} that are equal to zero have the value −(ω − 1), which means all eigenvalues have the same magnitude. The spectral radius is therefore ω − 1. As we decrease ω, we will reach some value ω* at which some λ(µ_{a,b}) that is complex will become real. It is easy to see that this must happen for µ_{a,b} equal to µ̄, the largest eigenvalue of L + U in magnitude. For ω less than ω*, the spectral radius will now increase because ∂λ^{1/2}/∂ω is negative for λ corresponding to µ̄. Thus the optimal choice for ω is ω*, where ω* satisfies

    ω*²µ̄² − 4ω* + 4 = 0

and (13.4.3), which gives the optimal value as

    ω* = 2 / ( 1 + √(1 − µ̄²) ).                     (13.4.6)

Since for Laplace’s equation µ̄ = cos π/N, the value of ω∗ for Laplace’s equation
is
2
ω∗ =
1 + sin π/N
and the spectral radius is

1 − sin π/N 2π
ρ ∗ = ω∗ − 1 = ≈1− .
1 + sin π/N N

The behavior of the spectral radius as a function of ω is displayed in Figure 13.1 for N = 10. The optimal value of ω is the lowest point on the graph. For ω larger than ω*, the spectral radius is seen to be the linear relation ρ = ω − 1. Otherwise, the spectral radius is obtained from (13.4.4) for µ = µ̄.

Figure 13.1. The spectral radius ρ as a function of ω for N = 10.
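The curve in Figure 13.1 is easy to regenerate from (13.4.4) and (13.4.6). The following sketch (the function name is mine) evaluates the spectral radius of point SOR as a function of ω for real Jacobi eigenvalues with largest magnitude µ̄:

import numpy as np

def sor_spectral_radius(omega, mu_bar):
    # Spectral radius of point SOR when the Jacobi eigenvalues are real
    # with largest magnitude mu_bar; see (13.4.2)-(13.4.4).
    disc = omega**2 * mu_bar**2 - 4.0 * (omega - 1.0)
    if disc < 0.0:
        return omega - 1.0          # complex roots: |lambda| = omega - 1
    sqrt_lam = 0.5 * (omega * mu_bar + np.sqrt(disc))   # (13.4.4)
    return max(sqrt_lam**2, abs(omega - 1.0))

N = 10
mu_bar = np.cos(np.pi / N)          # largest Jacobi eigenvalue, Laplace's equation
omega_star = 2.0 / (1.0 + np.sqrt(1.0 - mu_bar**2))     # (13.4.6)
print(omega_star, sor_spectral_radius(omega_star, mu_bar))
for omega in np.linspace(0.1, 1.9, 10):
    print(omega, sor_spectral_radius(omega, mu_bar))    # traces Figure 13.1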

It is also useful to consider the behavior of all of the eigenvalues of the iteration matrix for SOR as functions of ω as ω increases from 1. For ω equal to 1, there are N(N − 1)/2 eigenvalues that are 0, and the rest are real and located between 0 and 1, given by (13.3.10). As ω is taken to be larger than 1, these eigenvalues between 0 and 1 all decrease in magnitude. Of the eigenvalues that are 0 for ω equal to 1, N − 1 of them become negative for ω larger than 1 and have the value 1 − ω, and the rest become positive and increase as ω increases. When an eigenvalue from the group that is decreasing with ω coalesces with an eigenvalue from the group that is increasing with ω, they become a pair of complex conjugates of magnitude ω − 1. The optimal value of ω is that value where only two eigenvalues in the interval (0, 1) are real and are equal to each other. This value is given by (13.4.6).

Figure 13.2. Eigenvalues for ω = 1.25, 1.35, 1.56, 1.60 with N = 11.

This is illustrated in the plots in Figure 13.2 that show the eigenvalues λ_{a,b} as functions of ω for N equal to 11. There are 100 eigenvalues in all. For ω equal to 1.25, the figure in the upper left shows four real eigenvalues larger than ω − 1 and four positive real eigenvalues less than this value. In addition, there is the real eigenvalue −ω + 1 of multiplicity 10. The other 82 eigenvalues are complex. The arrows at the top and bottom of the circles show the direction that the eigenvalues move as ω increases. The eigenvalues on the positive real axis move toward the circle of radius ω − 1 as ω increases. The plot at the upper right shows the positions of the eigenvalues for ω equal to 1.35. For this case, there are only two pairs of positive real eigenvalues. As ω increases through the values 1.25 to 1.35, the magnitude of the largest positive eigenvalue decreases and one pair of real eigenvalues becomes complex. At 1.56, which is just slightly less than ω*, there remain only two positive eigenvalues, and they are very close to ω − 1. For ω equal to 1.60, shown in the lower right plot, all eigenvalues are of magnitude ω − 1.
Because of the relationship µ_{a,b} = µ_{b,a}, which holds if Δx = Δy, the set of eigenvalues has fewer than (N − 1)(N − 2) + 1 elements. If N is even, the set of eigenvalues has N(N − 2)/4 + 1 elements, and if N is odd, the set of eigenvalues has (N − 1)²/4 + 1 elements.
We now examine how the number of iterations for an iterative method to achieve a certain error tolerance is related to the spectral radius. Suppose an iterative method has spectral radius ρ and we wish to know how many iterations, I, it will take to reduce the norm of the error to a certain multiple, ε, of the initial error. From (13.1.9) we see that we must have

    ρ^I ≈ ε

or

    I ≈ (− log ε)/(− log ρ).

If ρ is close to 1, then

    I ≈ (log ε^{-1})/(1 − ρ).

So, for the Gauss–Seidel method, from ρ = (cos(π/N))² ≈ 1 − π²/N²,

    I ≈ (N²/π²) log ε^{-1},

and for SOR with ω = ω*,

    I ≈ (N/(2π)) log ε^{-1}.

These formulas show that for the Gauss–Seidel and Jacobi methods, the number of iterations is proportional to N², whereas for SOR it is proportional to N. This is why SOR is a dramatic improvement in efficiency over the Gauss–Seidel method for even small values of N.
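As a quick check of these estimates, the following sketch tabulates the predicted iteration counts for ε = 10⁻⁷; the counts grow like N² for Gauss–Seidel and like N for SOR, as claimed. The choice of ε and of the values of N is arbitrary.

import numpy as np

eps = 1.0e-7
for N in (10, 20, 40, 80):
    rho_gs = np.cos(np.pi / N) ** 2                               # Gauss-Seidel
    rho_sor = (1 - np.sin(np.pi / N)) / (1 + np.sin(np.pi / N))   # SOR at omega*
    I_gs = np.log(eps) / np.log(rho_gs)
    I_sor = np.log(eps) / np.log(rho_sor)
    print(N, int(I_gs), int(I_sor))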

Exercise
13.4.1. Determine the optimal value of ω for SOR applied to the “diagonal” five-point
Laplacian (13.3.16).

13.5 Consistently Ordered Matrices


In relating the eigenvalues of the Gauss–Seidel and SOR methods to the eigenvalues of the Jacobi method, we made use of the fact that if α is an eigenvalue of λL + U, then αλ^{-1/2} is an eigenvalue of L + U. (See the discussion relating to (13.3.7) and (13.4.1).) If L + U has this special property, it is said to be consistently ordered.

Definition 13.5.1. A matrix of the form I − L − U is consistently ordered if whenever α is an eigenvalue of λL + U, then αλ^{-1/2} is an eigenvalue of L + U.

An examination of our analysis shows that we have proved that if I − L − U is


consistently ordered, then the Gauss–Seidel method will converge if and only if the Jacobi
method converges, and the Gauss–Seidel method will converge twice as fast. We have also
shown that SOR will converge under these conditions, and the optimal value of ω is given
by (13.4.6). The reader should check that in deriving (13.4.6) we used nothing special about
the matrix I − L − U other than that it was consistently ordered and that its eigenvalues
are real (see Exercise 13.5.6). Thus we have proved the following theorem.

Theorem 13.5.1. If the matrix A, which is equal to I − L − U, is consistently ordered and has real eigenvalues, then the SOR method, given by

    (ω^{-1}I − L)x^{k+1} = (ω^{-1}(1 − ω)I + U)x^k + b,

converges to the solution of Ax = b for all ω in the interval (0, 2) if and only if the Jacobi method converges. Moreover, the optimal value of ω is given by formula (13.4.6), where µ̄ is the eigenvalue of L + U with largest magnitude.

In case the matrix I − L − U is consistently ordered but has complex eigenvalues, we can determine those values of ω for which the SOR method converges, but it is more difficult to determine the optimal value of ω.

Theorem 13.5.2. If the matrix A, given by I − L − U, is consistently ordered, then the SOR method converges for those values of ω in the interval (0, 2) that satisfy

    (Re µ_i)² + ( (ω/(2 − ω)) Im µ_i )² < 1          (13.5.1)

for each eigenvalue µ_i of L + U. In particular, if |Re µ_i| < 1 for each µ_i, then there is an interval (0, ω̄) of values of ω for which SOR converges.
Proof. Let τ_i = λ_i^{1/2}. Then equation (13.4.2) can be written

    ζ_i = (1/2)( τ_i − (1 − ω)/τ_i ) = ωµ_i/2.

We consider the mapping of the complex plane that takes the complex variable τ to ζ = (1/2)(τ − (1 − ω)/τ). This mapping takes circles in the complex τ plane to ellipses in the
complex ζ plane. The circle |τ| = |1 − ω|^{1/2} is mapped to the degenerate ellipse given by

    Re ζ = 0,  |Im ζ| ≤ √(1 − ω)

when 0 < ω < 1 and

    Im ζ = 0,  |Re ζ| < √(ω − 1)

when 1 < ω < 2. In either case the annulus |ω − 1|^{1/2} ≤ |τ| < 1 is mapped onto the ellipse

    ( (Re ζ)/(ω/2) )² + ( (Im ζ)/(1 − ω/2) )² < 1.   (13.5.2)
For each value of ζ we obtain two roots; if τ_1 is one root, then τ_2 = (ω − 1)/τ_1 is the other root. It is therefore necessary that |ω − 1| = |τ_1τ_2| be less than 1. We also see that one root, say τ_1, must satisfy |ω − 1|^{1/2} ≤ |τ_1| < 1. If we set ζ = ωµ_i/2 in (13.5.2) we obtain (13.5.1), which proves the first assertion of the theorem. We also see that if |Re µ_i| is less than 1 for all µ_i, then there are values of ω near 0 that satisfy (13.5.1). This proves the theorem.

Estimating the Optimal Value of ω


SOR often converges when I − L − U is not consistently ordered, for example, when used on more general elliptic equations with variable coefficients. Even though formula (13.4.6) is not valid, we often find that the optimal ω is close to 2. In fact, the relation

    ω* = 2/(1 + Ch)                                  (13.5.3)

is often nearly true, where h is some measure of the grid spacing and C is some constant. This formula is computationally very useful and can be employed as follows. First, for a coarse grid we find a good estimate for ω*, the optimal ω, by experimentation, i.e., by making several calculations with different values of ω. Given this ω* and h, we can determine C and then use (13.5.3) to estimate ω* for smaller values of h. This formula can considerably reduce computational effort.
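A sketch of this calibration procedure; the coarse-grid value ω* = 1.53 at h = 0.1 is a made-up number standing in for the result of the experiments described above.

def calibrate_C(omega_star, h):
    # Solve omega* = 2/(1 + C h) from (13.5.3) for C.
    return (2.0 / omega_star - 1.0) / h

def predict_omega(C, h):
    return 2.0 / (1.0 + C * h)

C = calibrate_C(1.53, 0.1)      # hypothetical coarse-grid experiment
print(predict_omega(C, 0.05), predict_omega(C, 0.025))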
Garabedian [22] showed that the optimal value of ω for Poisson's equation on a domain other than the square can be approximated by

    ω* ≈ 2 / ( 1 + k_1 h/√2 ),

where h is the mesh width and k_1 is the first eigenvalue of the Laplacian, i.e., the least positive value k_1 such that

    ∇²u + k_1²u = 0

has a nontrivial solution with u equal to zero on the boundary. He also pointed out that the value of k_1 can be estimated from below by the Faber–Krahn inequality

    k_1 ≥ k_1* (π/A)^{1/2},
where A is the area of the domain and k_1* is the first eigenvalue for a circle of radius 1. The constant k_1* is the first zero of the Bessel function J_0 and is approximately 2.4. Because the Faber–Krahn inequality is sharp for circular domains and less sharp for elongated and nonconvex regions, we can estimate k_1 as a multiple of k_1*(π/A)^{1/2}, the multiplying factor being determined by experiment. In ways similar to this, we can usually estimate the optimal value of ω quite well in situations for which it cannot be explicitly determined.
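As an illustration, here is a small function assuming the Garabedian formula together with the Faber–Krahn lower bound; the function name and the default multiplying factor of 1 are my choices.

import math

def estimate_omega(area, h, factor=1.0):
    # k1 >= k1* sqrt(pi/area) with k1* ~ 2.4 (first zero of J0);
    # 'factor' is the experimentally determined multiple discussed above.
    k1 = factor * 2.4 * math.sqrt(math.pi / area)
    return 2.0 / (1.0 + k1 * h / math.sqrt(2.0))

print(estimate_omega(area=1.0, h=0.05))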
In estimating the optimal value of ω, it is important to realize that it is better to overestimate ω* than it is to underestimate it. This is because, as shown in Figure 13.1, for ω larger than ω* the spectral radius varies linearly with ω, whereas the derivative with respect to ω of λ(µ̄) for ω less than ω*, as given in (13.4.4), is infinite at the optimal value of ω.

Variations of SOR
There are several variations of SOR. The one we have considered is often called point SOR with natural ordering. One variation is to use a different ordering of the points. If we update all the points with ℓ + m equal to an even number, followed by an update of all those with ℓ + m equal to an odd number, we have point SOR with checkerboard ordering.
We can also do one iteration of point SOR with natural ordering followed by one
iteration of point SOR with reverse natural ordering. This is called symmetric SOR, or
SSOR.
Line SOR, or LSOR, updates one line of grid points at a time. The formula is

    ṽ_{ℓ−1,m} − 4ṽ_{ℓ,m} + ṽ_{ℓ+1,m} = −ω ( v^k_{ℓ−1,m} + v^k_{ℓ+1,m} + v^{k+1}_{ℓ,m−1} + v^k_{ℓ,m+1} − 4v^k_{ℓ,m} ),
                                                     (13.5.4)
    v^{k+1}_{ℓ,m} = v^k_{ℓ,m} + ṽ_{ℓ,m}
when taking the lines in the usual order. LSOR requires that a tridiagonal system be solved for each grid line. This extra work is offset by a smaller spectral radius of the iterative method. Generally it is considered to be faster than point SOR by a factor of √2; see Exercise 13.5.7.
In general, line, or block, SOR is derived by writing the system (13.1.6) as

    − ∑_{m<j} L_{jm}x_m + D_j x_j − ∑_{m>j} U_{jm}x_m = b_j,

where each x_j is a vector consisting of a subset of all the components of x. The coefficients L_{jm}, U_{jm}, and D_j are matrices of the appropriate sizes. In the usual case x_j is the set of unknowns associated with the jth grid line. The line Jacobi method is given by

    x_j^{k+1} = D_j^{-1} ( b_j + ∑_{m<j} L_{jm}x_m^k + ∑_{m>j} U_{jm}x_m^k )

and the LSOR is given by

    x_j^{k+1} = x_j^k + ωD_j^{-1} ( b_j + ∑_{m<j} L_{jm}x_m^{k+1} + ∑_{m>j} U_{jm}x_m^k − D_j x_j^k ),

from which we obtain (13.5.4) for the special case of the five-point Laplacian.
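A sketch of one LSOR sweep (13.5.4) for the five-point Laplacian, using the Thomas algorithm for the tridiagonal solves. The array layout (lines of constant m stored as the rows of v, Dirichlet data held in the border entries of v and never updated) and the −h²f term, which extends (13.5.4) from Laplace's equation to Poisson's equation, are my assumptions.

import numpy as np

def tridiag_solve(a, b, c, d):
    # Thomas algorithm: a subdiagonal, b diagonal, c superdiagonal, d right side.
    n = len(d)
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def lsor_sweep(v, f, h, omega):
    # One LSOR sweep (13.5.4); the line m - 1 already holds "new" values.
    M, N = v.shape
    n = N - 2                              # interior unknowns per line
    for m in range(1, M - 1):
        a = np.ones(n); b = -4.0 * np.ones(n); c = np.ones(n)
        a[0] = 0.0; c[-1] = 0.0            # vtilde = 0 on the boundary
        resid = (v[m, 0:n] + v[m, 2:n + 2] + v[m - 1, 1:n + 1]
                 + v[m + 1, 1:n + 1] - 4.0 * v[m, 1:n + 1]
                 - h * h * f[m, 1:n + 1])
        vtilde = tridiag_solve(a, b, c, -omega * resid)
        v[m, 1:n + 1] += vtilde
    return v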

It is easy to implement a symmetric LSOR method, in which the lines are swept in
the opposite order during each successive iteration. As with point SOR, symmetric LSOR
has a better convergence rate with almost no extra work.
One case where LSOR is useful is in the solution of elliptic equations on domains with polar coordinate systems (r, θ); see Section 12.6 and Exercise 12.7.2. Each "line" consists of the grid points with a fixed value of r. At the center we use formula (12.6.3). The periodic tridiagonal system for each line can be solved by the methods of Section 3.5 (see also Exercise 3.5.8). We first update all the points other than the origin; then (12.6.3) can be used to compute the new value at the origin. In the SOR iterations it appears to be best to proceed from the boundary of the disk in toward the center. The equation to update the center value using (12.6.3) is

    u_0^{k+1} = u_0^k + ω ( (1/J) ∑_{j=1}^{J} u_{1j}^{k+1} − u_0^k − (Δr/2)² f(0) ).

Implementing SOR Methods


The implementation of SOR methods is quite straightforward, but there are some small details that should be mentioned. The SOR methods are usually terminated when the change in the solution is sufficiently small. One usually sets a tolerance and proceeds until the changes are smaller than that tolerance. Rather than using formula (13.1.5) it is better to use the two-step procedure

    ṽ^k_{ℓ,m} = (1/4)( v^k_{ℓ+1,m} + v^{k+1}_{ℓ−1,m} + v^k_{ℓ,m+1} + v^{k+1}_{ℓ,m−1} ) − v^k_{ℓ,m},
                                                     (13.5.5)
    v^{k+1}_{ℓ,m} = v^k_{ℓ,m} + ω ṽ^k_{ℓ,m},

where ṽ^k_{ℓ,m} is used to measure the change in the solution per iteration.
Of course, since SOR uses immediate replacement, in the computer implementation there is no need to index the solution by the index k. Also, the temporary variable ṽ_{ℓ,m} is not stored as an array; it need only be a scalar. Both steps of (13.5.5) are computed at each grid point before proceeding to the next point. The two-step procedure (13.5.5) is less sensitive to loss of significance than is the procedure of first using (13.1.5) and then determining the change by computing the difference between the successive values of v_{ℓ,m}. The line SOR (13.5.4) is given as a two-step procedure for the same reason. For more details on the implementation, the reader is referred to Hageman and Young [29].
Here is a section of pseudocode illustrating how to implement the SOR method.
Notice that it requires only the one two-dimensional array v.

Initialize solution
while change > tolerance
    change = 0
    loop on ℓ
        loop on m
            change_pt = [v(ℓ-1,m) + v(ℓ+1,m) + v(ℓ,m-1) + v(ℓ,m+1) - 4v(ℓ,m)]/4
            v(ℓ,m) = v(ℓ,m) + omega*change_pt
            change = change + change_pt^2
        end of loop on m
    end of loop on ℓ
    change = sqrt(change*h^2)
end of while loop
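A direct NumPy translation of the pseudocode is sketched below. The −h²f term is added so that the same loop handles Poisson's equation, and the interior loop bounds assume the boundary values are stored in the border of the array v; both are my assumptions.

import numpy as np

def sor(v, f, h, omega, tol):
    # Point SOR for the five-point Laplacian in the two-step form (13.5.5).
    M, N = v.shape
    change = tol + 1.0
    while change > tol:
        change = 0.0
        for l in range(1, M - 1):
            for m in range(1, N - 1):
                change_pt = (v[l - 1, m] + v[l + 1, m] + v[l, m - 1]
                             + v[l, m + 1] - 4.0 * v[l, m]
                             - h * h * f[l, m]) / 4.0
                v[l, m] += omega * change_pt
                change += change_pt ** 2
        change = np.sqrt(change * h * h)   # the norm (13.5.6)
    return v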

The changes in the solution can be measured by the L2 norm of ṽ^k, either with or without the factor of ω. The L2 norm preferred by the author is

    ‖ṽ^k‖ = ( ∑_{ℓ,m} |ṽ^k_{ℓ,m}|² h² )^{1/2}.       (13.5.6)

The factor of h in the measurement of the norm causes the stopping tolerance to be relatively independent of the grid size. The results given in the examples in this book use the norm (13.5.6).
In checking for the optimal value of ω for a SOR method, it is often found that the
optimal value of ω to achieve convergence for a given tolerance in the norm (13.5.6) is
close to, but not the same as, that given by formula (13.4.6). One reason for this discrepancy
is that the convergence criteria are different; i.e., the use of (13.5.6) is not a measurement of
the spectral radius that was used in deriving (13.4.6). This discrepancy is of little concern,
since formulas such as (13.4.6) and (13.5.3) can be used to give nearly optimal values
for ω.
For Poisson problems, the values of f (x, y) at grid points should be computed
once and stored in an array, rather than be computed as needed. For standard computers,
accessing an array element is much faster than a function call and its related computation.
For other information on these and other iterative methods, see the compendium of
numerical methods by Barrett et al. [4].

Exercises
13.5.1. Using the point SOR method, solve Poisson's equation

    u_xx + u_yy = −2 cos x sin y

on the unit square. The boundary conditions and exact solution are given by the formula u = cos x sin y. Use the standard five-point difference scheme with h = Δx = Δy = 0.1, 0.05, and 0.025. The initial iterate should be zero in the interior of the square. Comment on the accuracy of the scheme and the efficiency of the method. Use ω = 2/(1 + πh). Stop the iterations when the changes in the solution as measured in the L2 norm (13.5.6) are less than 10⁻⁷. Note: For some computers the value of 10⁻⁷ will be too small unless double-precision variables are used.

13.5.2. Solve the same problem as in Exercise 13.5.1 but use the fourth-order accurate
finite difference scheme (12.5.4). Comment on the efficiency and accuracy of the
two methods. Even though the matrix for this scheme is not consistently ordered,
the SOR method will converge, as is shown in the next section. A good estimate
for the optimal value of ω is 2/(1 + π h).
13.5.3. Use the results of Exercise 13.5.1 to show that the values of δ_{0x}v and δ_x²v, where v is the computed solution, are second-order approximations to the corresponding derivatives.
13.5.4. Use the results of Exercise 13.5.2 to show that the values of the approximations
to the first and second derivatives given by (3.3.3) and (3.3.7) give fourth-order
approximations to the corresponding solutions.
13.5.5. Solve the same equation as in Exercise 13.5.1 but on the trapezoidal domain dis-
cussed in Section 12.7.
13.5.6. Prove Theorem 13.5.1.
13.5.7. Determine the formula for the optimal value of ω as a function of the grid spacing
for LSOR on the unit square in the case of equal spacing in both directions. Hint:
You will have to use the fact that the natural ordering of the lines is a consistent
ordering and also that the eigenvectors for the line Jacobi method are the same as
for the point Jacobi method. The eigenvalues, however, are different.
13.5.8. Suppose matrix A, given by I − L − U, is consistently ordered and L + U is skew with eigenvalues µ_j. (A skew matrix S is one for which S^T = −S.) Show that SOR is convergent if and only if ω is in the interval (0, 2(1 + β̄)^{-1}), where β̄ = max |µ_j|, and the optimal value of ω is given by

    ω* = 2 / ( 1 + (1 + β̄²)^{1/2} ).

Notice that ω* is less than 1.
13.5.9. Show that the fourth-order accurate finite difference scheme (12.5.4) is not consis-
tently ordered with the natural ordering of points. Also show that it is consistently
ordered for LSOR.
13.5.10. Show that the optimal value of ω for point SOR with the checkerboard ordering
applied to the five-point Laplacian on the unit square is given by formula (13.4.6).
Hint: Show that the checkerboard ordering is a consistent ordering.

13.6 Linear Iterative Methods for Symmetric, Positive Definite Matrices
We can also analyze linear iterative methods when the matrix A is symmetric and positive
definite. The methods of this section can be applied to many schemes that are not consis-
tently ordered and thus cannot be analyzed by the methods of the previous section. For

example, the fourth-order accurate nine-point scheme (12.5.4) is not consistently ordered
for point SOR, but the matrix is symmetric and positive definite (see Exercise 13.6.3). On
the one hand, the method of analysis of this section requires less detailed understanding of
the matrix than is required to establish the consistent ordering of A; on the other hand, it
is not apparent how to determine the optimal value of ω.
It should be pointed out that one need not write out the scheme in matrix form to determine if the matrix is symmetric. The matrix A representing the scheme is symmetric when the coefficient multiplying v_{ℓ′,m′} in the scheme applied at grid point (ℓ, m) is the same as the coefficient multiplying v_{ℓ,m} in the scheme applied at grid point (ℓ′, m′), for each of the unknown grid function values.
The main result for symmetric, positive definite matrices is the following theorem.

Theorem 13.6.1. If A is symmetric and positive definite, then the iterative method (13.1.8) based on the splitting (13.1.7) is convergent if

    Re B > (1/2) A                                   (13.6.1)

or, equivalently, if B^T + C is symmetric and positive definite, i.e.,

    B^T + C > 0.                                     (13.6.2)

Proof. We first establish that the two conditions in the conclusion are equivalent. The matrix Re B is (B + B^T)/2, and thus (13.6.1) is equivalent to

    B^T + B − A > 0.                                 (13.6.3)

The defining relation of the splitting (13.1.7) shows that this is equivalent to (13.6.2) and that B^T + C is symmetric.
We now begin the proof. We measure the error in the norm induced by A, i.e., ‖x‖_A = (x, Ax)^{1/2}. In this norm we have the relation

    ‖e^{k+1}‖_A = ‖B^{-1}Ce^k‖_A ≤ ‖B^{-1}C‖_A ‖e^k‖_A

(see Appendix A). If the norm of B^{-1}C is less than 1, then the error will decrease at each iteration and the method will converge. We have that the norm of B^{-1}C is given by

    ‖B^{-1}C‖²_A = sup_{x≠0} (B^{-1}Cx, AB^{-1}Cx)/(x, Ax) = sup_{x≠0} (x, C^TB^{-T}AB^{-1}Cx)/(x, Ax).

Thus the condition ‖B^{-1}C‖_A < 1 is equivalent to C^TB^{-T}AB^{-1}C < A, and we consider now the matrix C^TB^{-T}AB^{-1}C. We have, using relation (13.1.7) to eliminate C,

    C^TB^{-T}AB^{-1}C = (I − AB^{-T})A(I − B^{-1}A)
                      = A − (AB^{-T}A + AB^{-1}A − AB^{-T}AB^{-1}A).
Thus we see that C^TB^{-T}AB^{-1}C < A if and only if

    AB^{-T}A + AB^{-1}A − AB^{-T}AB^{-1}A > 0.       (13.6.4)

But this last expression can be factored as

    AB^{-T}(B + B^T − A)B^{-1}A

or

    (B^{-1}A)^T (B + B^T − A) B^{-1}A.

Thus (13.6.4) is true if and only if (13.6.3) is true, and this implies that ‖B^{-1}C‖_A is less than 1 and so the method is convergent. This proves the theorem.

Example 13.6.1. As our first application of Theorem 13.6.1 we consider SOR for a symmetric matrix A of the form

    A = I − L − L^T.                                 (13.6.5)

Note that L need not be the lower triangular part of A, although in most applications it is. We have the splitting

    B = (1/ω) I − L,    C = ((1 − ω)/ω) I + L^T,

and the condition (13.6.2) is

    B^T + C = ((2 − ω)/ω) I > 0.

We conclude that SOR for the matrix (13.6.5) will converge for ω in the interval (0, 2) if the matrix A is positive definite.
This result applies to the fourth-order accurate nine-point scheme (12.5.4), which is not consistently ordered; see Exercise 13.6.3.
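The conclusion of Example 13.6.1 can be checked numerically. The sketch below builds the SOR splitting for a small symmetric positive definite matrix of the form (13.6.5) and verifies both that B^T + C is positive definite and that the iteration matrix B^{-1}C has spectral radius less than 1; the test matrix is an arbitrary choice of mine.

import numpy as np

def sor_splitting(A, omega):
    # B, C with A = B - C, for A = I - L - L^T as in (13.6.5).
    K = len(A)
    L = -np.tril(A, -1)
    B = np.eye(K) / omega - L
    C = (1.0 - omega) / omega * np.eye(K) + L.T
    return B, C

A = np.array([[1.0, -0.4, 0.0],
              [-0.4, 1.0, -0.4],
              [0.0, -0.4, 1.0]])           # symmetric positive definite
for omega in (0.5, 1.0, 1.5, 1.9):
    B, C = sor_splitting(A, omega)
    pos_def = np.all(np.linalg.eigvalsh(B.T + C) > 0)   # condition (13.6.2)
    rho = max(abs(np.linalg.eigvals(np.linalg.solve(B, C))))
    print(omega, pos_def, rho < 1.0)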
Example 13.6.2. For our second application we consider SSOR for a matrix in the form (13.6.5). For SSOR the splitting is

    B = (ω/(2 − ω)) ( (1/ω)I − L ) ( (1/ω)I − L^T ),
                                                     (13.6.6)
    C = (ω/(2 − ω)) ( ((1 − ω)/ω)I + L ) ( ((1 − ω)/ω)I + L^T )

(see Exercise 13.6.1). In this case both B and C are symmetric and

    B + C = ω(2 − ω)^{-1} [ ω^{-2}(2 − 2ω + ω²)I − L − L^T + 2LL^T ]

          = ω(2 − ω)^{-1} [ ((2 − ω)²/(2ω²)) I + ( (1/√2)I − √2 L )( (1/√2)I − √2 L )^T ]

          = ((2 − ω)/(2ω)) I + (ω/(2 − ω)) ( (1/√2)I − √2 L )( (1/√2)I − √2 L )^T,

which is positive definite if and only if 0 < ω < 2.



As we see from these two examples, this analysis shows rather easily that the iterative
methods will converge for ω between 0 and 2, but it does not give an indication of the
optimal value of ω. The method used to prove Theorem 13.6.1 can be refined to give
estimates of the optimal ω, but we will not pursue this topic. Formula (13.5.3) and the
discussion of that formula should suffice for most applications.

Exercises
13.6.1. Verify that the matrices in (13.6.6) define the splitting for SSOR.
13.6.2. Consider the iterative method (13.1.10) based on the ADI method and assume that
the matrices A1 and A2 are symmetric. Use Theorem 13.6.1 to determine the
values of µ for which the iterative method will converge.
13.6.3. Show that the matrix arising from the fourth-order accurate scheme (12.5.4) is positive definite when written in the form

    (10/3) v_{ℓ,m} − (2/3)( v_{ℓ+1,m} + v_{ℓ−1,m} + v_{ℓ,m+1} + v_{ℓ,m−1} )
        − (1/6)( v_{ℓ+1,m+1} + v_{ℓ+1,m−1} + v_{ℓ−1,m+1} + v_{ℓ−1,m−1} )
    = −(h²/12)( f_{ℓ+1,m} + f_{ℓ−1,m} + f_{ℓ,m+1} + f_{ℓ,m−1} + 8f_{ℓ,m} ).

13.7 The Neumann Boundary Value Problem


In this section we examine second-order elliptic equations with the Neumann boundary condition (12.1.4). More specifically, we confine ourselves to equations of the form

    a(x,y)u_xx + 2b(x,y)u_xy + c(x,y)u_yy + d_1(x,y)u_x + d_2(x,y)u_y = f(x,y)    (13.7.1)

on a domain U with the boundary condition

    ∂u/∂n = b(x,y)  on ∂U,                           (13.7.2)
which is the same as (12.1.4). Notice that equation (13.7.1) depends on u only through its derivatives. As opposed to the Dirichlet boundary value problem for equation (13.7.1), the solution to (13.7.1) and (13.7.2) is not unique. Indeed, if u is any solution to (13.7.1) and (13.7.2), then for any constant c the function u_c given by u_c(x, y) = u(x, y) + c is also a solution. The solution of this boundary value problem is unique to within the additive constant; that is, any two solutions differ by a constant (see Exercise 13.7.2). (The nonuniqueness of the solution of elliptic equations can occur for any type of boundary condition; see Example 12.3.2.)

In addition to the solution not being unique, a solution may not exist unless the data, f in (13.7.1) and b in (13.7.2), satisfy a linear constraint. For many applications, especially symmetric problems, we can easily determine the constraint to be satisfied, but for some problems it may be quite difficult to determine this constraint. For Poisson's equation (12.1.1) with the Neumann boundary condition (13.7.2), the constraint on the data is equation (12.1.5).
As an example of an equation for which it is difficult to determine the constraint, we have

    u_xx + e^{xy} u_yy = f

with the Neumann boundary condition (see Exercise 12.7.3). The solutions of this boundary value problem are unique to within an additive constant, and numerical evidence confirms that there is a constraint on the data.
The nonuniqueness of the solution of the differential equation boundary value problem and the possible nonexistence of a solution cause some difficulties in obtaining the numerical solution. A careful examination of the difficulties leads to effective strategies to surmount them.
We now consider using a finite difference scheme to obtain an approximate solution of the Neumann problem. As an example, we consider solving the Neumann problem for the Laplacian on the unit square. Either the five-point Laplacian (12.5.1) or the nine-point Laplacian (12.5.4) might be used to approximate the differential equation. For the boundary condition, suitable approximations are

    ∂u/∂x (0, y_m) ≈ (−3v_{0,m} + 4v_{1,m} − v_{2,m})/(2Δx) = b(0, y_m)    (13.7.3)

or

    ∂u/∂x (0, y_m) ≈ (v_{1,m} − v_{0,m})/Δx = b(0, y_m).                   (13.7.4)

The approximation (13.7.3) is second-order accurate, whereas (13.7.4) is first-order accurate. For each of these methods we obtain one equation for each unknown v_{ℓ,m}, 0 ≤ ℓ, m ≤ N. The linear system can be written as

    Ax = b                                           (13.7.5)

as for the Dirichlet boundary conditions, except in this case the vector of unknowns, x, also contains the components of v_{ℓ,m} on the boundary. Thus K, the order of the system (13.7.5), is (N + 1)².
The nonuniqueness of the solution of the Neumann problem for (13.7.1) implies that the matrix A in (13.7.5) is singular or nearly singular. Because the solution of (13.7.1) with the Neumann boundary conditions is unique only up to a constant, most difference schemes for (13.7.1) and the boundary conditions will also have solutions that are unique only to within an additive constant. That is, if x is a solution to (13.7.5), then

    A(x + αx_0) = b

is also true, where x_0 is the vector all of whose components are 1 and α is any real number. Comparing this equation with (13.7.5), we see that x_0 is a null vector of A, i.e., Ax_0 = 0.

We will assume that the null space of the matrix A is one-dimensional. (The null
space of a matrix is the linear subspace of vectors z such that Az is the zero vector.) The
matrix A is said to have a (column) rank deficiency of 1. This is a reasonable assumption,
since the null space of the differential operator is also one-dimensional.
A fundamental result of linear algebra is that the row rank of a matrix is equal to its column rank. Thus there is a nonzero vector y_0 such that y_0^T A is the zero vector. The vector y_0 represents the constraint that the data in (13.7.5) must satisfy in order for a solution to exist. We have

    0 = (y_0^T A)x = y_0^T(Ax) = y_0^T b             (13.7.6)

if a solution x exists for (13.7.5). If A is symmetric, then y_0 may be taken to be x_0.


There are two problems concerning constraint (13.7.6). The first is that we may not know the constraint vector y_0, and the second is that the constraint (13.7.6) may not be satisfied exactly for the known or given data, either because of errors in the physical data or through truncation errors. One solution to these difficulties is to use only simple boundary condition discretizations that maintain the symmetry of A, when that is possible. Unfortunately, this usually results in only first-order accurate boundary conditions (see Exercise 13.7.1).
If we delete one equation from the linear system (13.7.5) and arbitrarily fix one component of x, then the resulting system will usually be nonsingular. However, the accuracy of the solution will depend on which equation is deleted.
An approach that does not single out any particular equation or variable is to use the concept of a factor space. We consider two vectors v_1 and v_2 to be equivalent if their difference, v_1 − v_2, is a multiple of the null vector x_0. We consider equation (13.7.5) for solutions in the resulting factor space, which we denote by R^K/⟨x_0⟩. If we consider the data in the factor space R^K/⟨y_0⟩, then the system is nonsingular. If we do not know y_0, we can consider the data in R^K/⟨x_0⟩, and the system will be nonsingular as long as y_0^T x_0 is nonzero (see Exercise 13.7.3). We will assume that y_0^T x_0 is nonzero for each system we discuss.
This abstract reasoning is useful only if it leads to a useful and convenient algorithm. In this case it does, as we now illustrate. The norm of a vector x in R^K/⟨x_0⟩, where x_0 is the vector with all components equal to 1, is given by

    ‖x‖ = ( ∑_{ν=1}^{K} (x_ν − x̄)² )^{1/2},

where x̄ is the average of the components x_ν. The equation being solved is no longer (13.7.5), but rather

    Ax = b − γx_0,                                   (13.7.7)

where γ is the average of b − Ax, i.e., the average residual. When a solution to (13.7.7) is obtained, the value of γ is an indication of how closely the data vector b satisfies the constraint. A nonzero value of γ can be due either to errors in the data or to the truncation errors implicit in the use of finite difference schemes.

We now give formulas for using this method on an elliptic equation. First, we write the finite difference equation at each grid point (ℓ, m) in the form

    v_{ℓ,m} − ∑ L_{(ℓ,m)(ℓ′,m′)} v_{ℓ′,m′} − ∑ U_{(ℓ,m)(ℓ′,m′)} v_{ℓ′,m′} = b_{ℓ,m},

where L and U refer to the lower and upper triangular parts of the matrix. One sweep of SOR applied to this system may be described as follows. At each grid point (ℓ, m), the value of r^k_{ℓ,m}, the update, is computed:

    r^k_{ℓ,m} = ∑ L_{(ℓ,m)(ℓ′,m′)} v^{k+1}_{ℓ′,m′} + ∑ U_{(ℓ,m)(ℓ′,m′)} v^k_{ℓ′,m′} − v^k_{ℓ,m} + b_{ℓ,m}.

The value of v^{k+1}_{ℓ,m} is obtained as

    v^{k+1}_{ℓ,m} = v^k_{ℓ,m} + ωr^k_{ℓ,m}.

The iteration continues until the updates are essentially constant, independent of (ℓ, m), i.e., until ‖r − r̄‖, the norm of the update in the factor space, is sufficiently small. To make the method efficient requires a convenient means of computing the average of the update and computing ‖r − r̄‖.
and computing r − r̄.
We now show how to compute both the average of the update and the norm of the update in the factor space. The algorithm for computing the averages and norms is due to West [70], who introduced it as an efficient means of computing averages and variances of statistical quantities. First, the variables r̄^{k+1}_0 and v̄^{k+1}_0, which will accumulate the average values of the update and v, respectively, are set to zero along with the variables R^k_0 and V^k_0, which will accumulate the norms of these quantities. It is also convenient to use the variable J to count the total number of points that have been updated.
At each grid point the accumulators of the norms are computed as

    R^{k+1}_{J+1} = R^{k+1}_J + (r^{k+1}_{ℓ,m} − r̄^{k+1}_J)² J/(J + 1),

    V^{k+1}_{J+1} = V^{k+1}_J + (v^{k+1}_{ℓ,m} − v̄^{k+1}_J)² J/(J + 1),

and then the averages are computed:

    r̄^{k+1}_{J+1} = r̄^{k+1}_J + (r^{k+1}_{ℓ,m} − r̄^{k+1}_J)/(J + 1),

    v̄^{k+1}_{J+1} = v̄^{k+1}_J + (v^{k+1}_{ℓ,m} − v̄^{k+1}_J)/(J + 1).

The value of J is then incremented by 1, and the computation proceeds to the next grid point.
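A sketch of one accumulation step of West's algorithm as it would be called once per grid point inside the SOR sweep; the function name and the sample update values are mine, and dx, dy stand for Δx, Δy.

def west_update(mean, accum, value, count):
    # Fold one new value into the running average and the running sum of
    # squared deviations over `count` previous values (West's algorithm).
    accum += (value - mean) ** 2 * count / (count + 1)
    mean += (value - mean) / (count + 1)
    return mean, accum, count + 1

# Running average and factor-space norm of the updates over one sweep:
mean_r, accum_R, J = 0.0, 0.0, 0
for r in (0.3, -0.1, 0.4, 0.2):     # stand-ins for the updates r at each grid point
    mean_r, accum_R, J = west_update(mean_r, accum_R, r, J)
# The factor-space norm of the update is then (accum_R * dx * dy) ** 0.5.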

At the completion of one SOR sweep, the value of J will be equal to the total number of grid points at which values have been updated, which is K. The value of r̄^{k+1}_K will be equal to the average update, and v̄^{k+1}_K will be the average value of v^{k+1}. The norms

    ‖v^{k+1} − v̄^{k+1}‖ = ( ∑_{ℓ,m} (v^{k+1}_{ℓ,m} − v̄^{k+1})² ΔxΔy )^{1/2}

and

    ‖r^{k+1} − r̄^{k+1}‖ = ( ∑_{ℓ,m} (r^{k+1}_{ℓ,m} − r̄^{k+1})² ΔxΔy )^{1/2}

will be equal to (V^{k+1}_J ΔxΔy)^{1/2} and (R^{k+1}_J ΔxΔy)^{1/2}, respectively. The SOR iterations can be stopped when ‖r^{k+1} − r̄^{k+1}‖ is sufficiently small.

Example 13.7.1. We show results of using the factor space method and the method in which a specified variable is fixed in Table 13.7.1. The equation being solved is Poisson's equation

    u_xx + u_yy = −5 sin(x + 2y)

on the unit square with the normal derivative data being consistent with the solution

    u(x, y) = sin(x + 2y) + C.                       (13.7.8)

The five-point Laplacian was used, and the boundary conditions were approximated by the second-order approximation (13.7.3).
The finite difference grid used equal grid spacing in each direction. The three different grid spacings are displayed in the first column of the table. The next columns show the number of iterations required to obtain a converged solution and the error in the solutions.

Table 13.7.1
Comparison of using factor space or fixing the center value.

              Factor method          Fixed center value
    h       Iterations   Error*    Iterations   Error*    Error**
    0.100       55       3.40–3        95       4.47–3    2.38–2
    0.050       93       9.38–4       241       1.13–3    7.70–3
    0.025      200       2.47–5       958       2.87–4    2.37–3

    * In the factor space L2 norm.
    ** In the usual L2 norm.

For each method the initial iterate was the grid function which was identically zero. Each method was terminated when the appropriate norm of the change in the solution was less than 10⁻⁷. This convergence criterion was sufficient to produce results for which the error was primarily due to the truncation error. For the factor space method, the iteration parameter ω was chosen as 2/(1 + πh/√2), since π is the smallest eigenvalue for the Laplacian on the square with Neumann boundary conditions.
For the fixed-value method, the value of ω was 2/(1 + h) for h equal to 1/10, it was 2/(1 + 1.1h) for h equal to 1/20, and it was 2/(1 + 2h) for h equal to 1/40. These values give convergence but are not optimal. For this method, the exact value of the solution was fixed at the center point of the square; the constant in (13.7.8) was chosen so that u(1/2, 1/2) was zero.
The solutions show the second-order accuracy of the finite difference methods when measured in the factor space norm. Notice that the error in the factor space norm is significantly smaller than in the usual L2 norm.
Example 13.7.2. Table 13.7.2 shows the results of using the factor space method on the equation

    e^{xy} u_xx + u_yy = f                           (13.7.9)

on the unit square with Neumann boundary data. The values of f and the boundary data are determined by the exact solution

    u(x, y) = e^{−xy}.

The last column gives the average update for the last iteration. It can be seen that the average update is quite small compared with the error. The results clearly show that the solution is second-order accurate.
This example is interesting because the integrability constraint is unknown. The integrability condition is discussed in Section 12.1 and is a linear relationship involving the boundary data and the data f in equation (13.7.9). The integrability condition must be satisfied for a solution to exist.
In spite of not knowing the integrability condition, the solution can be computed. The integrability constraint for this equation is difficult to obtain because this equation cannot be put into divergence form; see Exercise 12.7.3.

Table 13.7.2
The factor space method for a nonsymmetric equation.

    h       Iterations   Error     r̄
    0.100       60       2.05–4    1.13–5
    0.050      103       5.07–5    1.56–6
    0.025      233       1.26–5    1.98–7

If a nonzero constant, say 1, is added to the value of f in (13.7.9), then the integra-
bility condition is not satisfied. This method will compute a solution in the factor space,
but the value of the average update, corresponding to γ in (13.7.7), will not be small, since
the constraint is not close to being satisfied.

Exercises
13.7.1. Show that the five-point Laplacian and first-order accurate boundary condition
(13.7.4) on the unit square give a symmetric matrix if the equations are scaled
properly.
13.7.2. Using the maximum principle, show that equation (13.7.1) with boundary condition
(13.7.2) has a unique solution to within an additive constant.
13.7.3. Consider a K × K matrix A that is singular with rank deficiency 1 and with a left null vector y_0 and right null vector x_0. Show that when considered as a linear mapping from the factor space R^K/⟨x_0⟩ to the factor space R^K/⟨x_0⟩, A is nonsingular if and only if the inner product of x_0 and y_0 is nonzero.
13.7.4. Solve Poisson’s equation

uxx + uyy = −2π 2 cos π x cos πy

on the unit square with the Neumann boundary condition

∂u
= 0.
∂n
The exact solution is u(x, y) = cos π x cos πy. Use both the first-order accurate
approximation (13.7.4) and the second-order accurate approximation (13.7.3) to
approximate the boundary conditions. Use equal grid spacing for both directions,

and use grid spacings of 1/10, 1/20, and 1/40. Use ω = 2/(1 + π h/ 2).
13.7.5. Consider the Jacobi iteration given by the five-point Laplacian on the unit square given by

    v^{k+1}_{ℓ,m} = (1/4)( v^k_{ℓ+1,m} + v^k_{ℓ−1,m} + v^k_{ℓ,m+1} + v^k_{ℓ,m−1} )

for ℓ = 0, . . . , N and m = 0, . . . , N with grid spacing h equal to N^{-1}. The boundary conditions are used to eliminate the variables v_{ℓ,m} with ℓ or m less than 0 or greater than N, with the relations

    v_{−1,m} = v_{1,m}        for m = 0, . . . , N,
    v_{ℓ,−1} = v_{ℓ,1}        for ℓ = 0, . . . , N,
    v_{N+1,m} = v_{N−1,m}     for m = 0, . . . , N,
    v_{ℓ,N+1} = v_{ℓ,N−1}     for ℓ = 0, . . . , N.

Show that the eigenvalues are given by

    µ_{a,b} = (1/2) ( cos(aπ/N) + cos(bπ/N) )
for 0 ≤ a, b ≤ N, and the corresponding eigenvectors are

    v^{a,b}_{ℓ,m} = cos(aℓπ/N) cos(bmπ/N).

Show that the Jacobi method will not converge in the factor space R^K/⟨x_0⟩ in which x_0 is the vector with all components equal to 1. Show also that the Gauss–Seidel method will converge. This result does not contradict Theorem 13.5.1, since the Jacobi method in the factor space is not the true Jacobi method.
13.7.6. Show that the optimal value of ω for point SOR applied to the equations in Exercise 13.7.5 in the factor space is

    ω* = 2 / ( 1 + sin(π/2N) √(1 + cos²(π/2N)) )
       ≈ 2 / ( 1 + πh/√2 ).

13.7.7. Verify that the following algorithm can be used to compute norms and vector products in the factor space R^K/⟨x_0⟩, where x_0 is the vector with all components equal to 1:
Given vectors x and y in R^K, let σ_K and τ_K denote the factor space norms of x and y, respectively, and their inner product will be denoted by π_K. The quantities x̄_K and ȳ_K are the averages of x and y, respectively.
The algorithm is: Set σ_0 = 0, τ_0 = 0, π_0 = 0, x̄_0 = 0, ȳ_0 = 0. Then for k from 0 to K − 1, compute the quantities

    x̄_{k+1} = x̄_k + (x_{k+1} − x̄_k)/(k + 1),     ȳ_{k+1} = ȳ_k + (y_{k+1} − ȳ_k)/(k + 1),
    σ_{k+1} = σ_k + (x_{k+1} − x̄_k)² k/(k + 1),   τ_{k+1} = τ_k + (y_{k+1} − ȳ_k)² k/(k + 1),
    π_{k+1} = π_k + (x_{k+1} − x̄_k)(y_{k+1} − ȳ_k) k/(k + 1).

Then, at the conclusion of the algorithm,

    x̄_K = ∑_{j=1}^{K} x_j / K = x̄,    ȳ_K = ∑_{j=1}^{K} y_j / K = ȳ,
    σ_K = ∑_{j=1}^{K} (x_j − x̄)²,     τ_K = ∑_{j=1}^{K} (y_j − ȳ)²,
    π_K = ∑_{j=1}^{K} (x_j − x̄)(y_j − ȳ).

Chapter 14

The Method of Steepest Descent and the Conjugate Gradient Method

In this chapter we consider a class of methods for solving linear systems of equations when
the matrix of coefficients is both symmetric and positive definite. (See Appendix A for the
definitions of these terms.) Although we are interested primarily in the application of these
methods to the solution of difference schemes for elliptic equations, these methods can be
applied to any symmetric and positive definite system. We begin by discussing the method
of steepest descent and then the conjugate gradient method, which can be regarded as an
acceleration of the method of steepest descent. Our approach to the conjugate gradient
method is based on that of Concus, Golub, and O’Leary in [10]. There have been many
variations and extensions of the conjugate gradient method that cannot be discussed here.
A good reference for these additional topics is the book by Hageman and Young [28] and
the compendium of iterative methods by Barrett et al. [4].

14.1 The Method of Steepest Descent


We consider a system of linear equations

    Ax = b,                                          (14.1.1)

where matrix A is symmetric and positive definite. As in the previous chapter we will let K be the order of the matrix. Consider also the function F(y) defined by

    F(y) = (1/2) (y − x, A(y − x)),                  (14.1.2)

where x is the solution to (14.1.1) and (·,·) is the usual inner product on R^K. Obviously, the function F has a unique minimum at y equal to x, the solution of (14.1.1). Similarly, the function E given by

    E(y) = F(y) − F(0) = (1/2)(y, Ay) − (y, b)       (14.1.3)

has a unique minimum at the solution of (14.1.1). Both the method of steepest descent and the conjugate gradient method are iterative methods that reduce the value of E at each step until a vector y is obtained for which E(y) is minimal or nearly minimal. In many


applications the function E(y) represents a quantity of significance, such as the energy of the system. In these cases the solution of (14.1.1) is the state of minimum energy.
We first consider the method of steepest descent. The gradient of the function E(y) is the vector

    G(y) = Ay − b = −r,                              (14.1.4)

where r is called the residual (see Exercise 14.1.1). Since the gradient of a function points in the direction of steepest ascent, to decrease the value of the function it is advantageous to go in the direction opposite of the gradient, which is the direction of steepest descent. The method of steepest descent, starting from an initial vector x^0, is given by

    x^{k+1} = x^k + α_k r^k,                         (14.1.5)

where

    r^k = b − Ax^k

and α_k is some parameter.
The notation we will use is that lowercase Roman letters will denote vectors and have superscripts, and Greek letters will denote scalar quantities and have subscripts. The norm of a vector v will be denoted by |v|, where |v| = (v, v)^{1/2}.
The parameter α_k in (14.1.5) will be chosen so that E(x^{k+1}) is minimal. We have

    E(x^{k+1}) = E(x^k + α_k r^k)
               = (1/2)(x^k, Ax^k) + α_k(r^k, Ax^k) + (1/2)α_k²(r^k, Ar^k) − (x^k, b) − α_k(r^k, b)
               = E(x^k) − α_k(r^k, r^k) + (1/2)α_k²(r^k, Ar^k).

This expression is a quadratic function in α_k and has a minimum for some value of α_k. We find the minimum as follows. Setting ∂E/∂α_k = 0, we find that α_k given by

    α_k = (r^k, r^k)/(r^k, Ar^k) = |r^k|²/(r^k, Ar^k)    (14.1.6)

is the value at which E(x^{k+1}) is minimal.


We now derive some consequences of this choice of α_k. From (14.1.5) we have that

    r^{k+1} = r^k − α_k Ar^k

and so from (14.1.6),

    (r^{k+1}, r^k) = (r^k, r^k) − α_k(r^k, Ar^k) = 0,    (14.1.7)

showing that consecutive residuals are orthogonal. For this optimal choice of α_k we have

    E(x^{k+1}) = E(x^k) − (1/2) |r^k|⁴/(r^k, Ar^k),      (14.1.8)

showing that E(x^k) will decrease as k increases until the residual is zero. Notice also from the definitions of E(x^k) and r^k that

    E(x^k) = (1/2)(A^{-1}r^k, r^k) − F(0),

and hence (14.1.8) is equivalent to

    (A^{-1}r^{k+1}, r^{k+1}) = (A^{-1}r^k, r^k) − |r^k|⁴/(r^k, Ar^k).

We now collect the formulas for steepest descent:

    x^{k+1} = x^k + α_k r^k,                         (14.1.9a)
    r^{k+1} = r^k − α_k Ar^k,                        (14.1.9b)
    α_k = |r^k|²/(r^k, Ar^k).                        (14.1.9c)
Notice that to implement the method, we need only one matrix multiplication per step; also,
there is no necessity for storing the matrix A. Often A is quite sparse, as in solving linear
systems arising from elliptic systems of finite difference equations, and we need only a
means to generate the vector Ar given the vector r. Formula (14.1.9b) should be used
instead of the formula r^k = b − Ax^k to compute the residual vectors r^k. When using
the finite precision of a computer, there is a loss of significant digits when the residual is
computed as b − Ax^k, since the two vectors b and Ax^k will be nearly the same when k
is not too small. The formula (14.1.9b) avoids this problem.
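To make these remarks concrete, the following is a minimal sketch of (14.1.9) in Python
with NumPy. It is not part of the original text; the function names and interface are
illustrative assumptions, with apply_A standing for any routine that produces the product
Ar from r without storing A.

import numpy as np

def steepest_descent(apply_A, b, x0, tol=1e-8, max_iter=10000):
    # Sketch of (14.1.9); apply_A(v) returns the product A v, so the
    # matrix A itself need never be stored.
    x = x0.copy()
    r = b - apply_A(x)                       # r^0 = b - A x^0, computed once
    for _ in range(max_iter):
        Ar = apply_A(r)                      # the one matrix product per step
        alpha = np.dot(r, r) / np.dot(r, Ar)     # (14.1.9c)
        x = x + alpha * r                    # (14.1.9a)
        r = r - alpha * Ar                   # (14.1.9b), not b - A x^{k+1}
        if np.linalg.norm(r) < tol:
            break
    return x

Note that the residual is updated by (14.1.9b) inside the loop, in keeping with the remark
above about loss of significant digits.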
Although our derivation of the steepest descent method relied on matrix A being
both symmetric and positive definite, we can apply the algorithm (14.1.9) in case A is not
symmetric. The following theorem gives conditions under which the method will converge.
Theorem 14.1.1. If A is a positive definite matrix for which A^T A^{-1} is also positive
definite, then the algorithm given by (14.1.9) converges to the unique solution of (14.1.1)
for any initial iterate x^0.

Proof. First note that if A is positive definite, then A^{-1} is also positive definite,
and if A^T A^{-1} is positive definite, we have that there are constants c_0 and c_1 such that

c_0 (x, A^{-1}x) ≤ (x, A^T A^{-1}x)    (14.1.10)

and
c_1 (x, Ax) ≤ (x, x)    (14.1.11)

for all vectors x (see Exercise 14.1.2). We now consider (r^{k+1}, A^{-1}r^{k+1}), where
r^0 = b − Ax^0 and r^{k+1} depends on r^k by (14.1.9b). We have

(r^{k+1}, A^{-1}r^{k+1}) = (r^k, A^{-1}r^k) − α_k (r^k, r^k) − α_k (Ar^k, A^{-1}r^k) + α_k^2 (r^k, Ar^k)
                       = (r^k, A^{-1}r^k) − α_k (r^k, A^T A^{-1}r^k)
by the definition of α_k. Now using (14.1.11) we have

α_k = (r^k, r^k)/(r^k, Ar^k) ≥ c_1

and thus, by (14.1.10),

(r^{k+1}, A^{-1}r^{k+1}) ≤ (r^k, A^{-1}r^k)(1 − c_0 c_1) for k ≥ 0.    (14.1.12)

Notice that 1 − c_0 c_1 is nonnegative, since A^{-1} is positive definite. Therefore,

(r^k, A^{-1}r^k) ≤ (r^0, A^{-1}r^0)(1 − c_0 c_1)^k

and thus (r^k, A^{-1}r^k) tends to zero.
But r^k, given by (14.1.9b), is b − Ax^k, as can be shown by induction. Since A^{-1}
is positive definite, we have that the vectors r^k converge to zero, and because

x^k = A^{-1}(b − r^k)

it follows that the vectors x^k converge to A^{-1}b, which is the unique solution
of (14.1.1).
Corollary 14.1.2. If A is symmetric and positive definite, then the steepest descent method
converges.
The estimate (14.1.12) shows that if the product c_0 c_1 can be taken to be close to
1, then the method of steepest descent will converge quite rapidly. As can be seen from
Exercise 14.1.2, one way of having c_0 c_1 close to 1 is if A is close to being a multiple
of the identity matrix. However, steepest descent can often be quite slow, and this usually
occurs because the residuals oscillate. That is, in spite of (14.1.7), we can have r^{k+2} be in
essentially the same direction as r^k or −r^k.
Because the method of steepest descent is often quite slow, we consider several means
to accelerate it. One method that accelerates steepest descent is the conjugate gradient
method, which is the subject of the next several sections.
Exercises

14.1.1. Using the relation

E(y + z) = E(y) + (z, Ay − b) + O(|z|^2)

for the function E(y) given by (14.1.3), verify that the gradient of the function
E(y) is G(y) = Ay − b, as asserted in (14.1.4).
14.1.2. Show that the constants c_0 and c_1 of (14.1.10) and (14.1.11) can be taken to be

c_0 = λ_3/λ_2 and c_1 = 1/λ_1,

where λ_1 and λ_2 are the greatest eigenvalues and λ_3 is the least eigenvalue of

(1/2)(A + A^T), (1/2)(A^{-1} + A^{-T}), and (1/2)(A^{-1}A^T + AA^{-T}),

respectively.
14.1.3. Consider the matrix

A = [ 1  b ]
    [ 0  2 ].

(a) Show that A is positive definite if |b| < 2√2.
(b) Show that A^T A^{-1} is positive definite if |b| < 4/3.

14.1.4. Show that if |b| < 2, then |r^{k+2}| < |r^k| when the steepest descent algorithm is
applied to the matrix of Exercise 14.1.3. Conclude that the method converges when
|b| < 2. Hint: Show that if r^k = (x, y)^T, then

r^{k+1} = [y(x − by)/(x^2 + bxy + 2y^2)] (y, −x)^T.
14.1.5. Discuss the relationship between the example of Exercise 14.1.4 and Theorem
14.1.1 when 4/3 ≤ |b| < 2.
14.1.6. Show that the method of steepest descent applied to the matrix of Exercise 14.1.3
does not converge for |b| ≥ 2. Hint: Consider r^0 = (α, 1)^T, where α^2 + α − 1 = 0.

14.1.7. Prove the Cauchy–Schwarz inequality for a symmetric, positive definite matrix A:

(x, Ay) ≤ (x, Ax)^{1/2} (y, Ay)^{1/2}.

Hint: Consider (αx − βy, A(αx − βy)).
14.2 The Conjugate Gradient Method
The conjugate gradient method can be viewed as an acceleration of steepest descent. We
begin our derivation of the method by writing

x^{k+1} = x^k + α_k (r^k + γ_k (x^k − x^{k−1}))

for some scalar parameters α_k and γ_k. This formula shows that the new change in position,
x^{k+1} − x^k, is a linear combination of the steepest descent direction and the previous change
in position x^k − x^{k−1}. We rewrite the preceding formula as

x^{k+1} = x^k + α_k p^k,

where
p^k = r^k + γ_k (x^k − x^{k−1}) = r^k + γ_k α_{k−1} p^{k−1} = r^k + β_{k−1} p^{k−1}.
Combining these formulas we have

x^{k+1} = x^k + α_k p^k,    (14.2.1a)
r^{k+1} = r^k − α_k Ap^k,    (14.2.1b)
p^{k+1} = r^{k+1} + β_k p^k,    (14.2.1c)

where the parameters α_k and β_k are to be determined. The vector p^k is called the search
direction for the kth iteration.
We now wish to determine the parameters α_k and β_k and also determine what p^0
should be so that (14.2.1) converges as rapidly as possible. As with steepest descent, we
wish to choose x^{k+1} so that E(x^{k+1}) is minimal. To begin we assume that p^k is known,
and we choose α_k so that E(x^{k+1}) is minimized. We have

E(x^{k+1}) = (1/2)(x^k, Ax^k) + α_k (p^k, Ax^k) + (1/2)α_k^2 (p^k, Ap^k) − (x^k, b) − α_k (p^k, b)
           = E(x^k) − α_k (p^k, r^k) + (1/2)α_k^2 (p^k, Ap^k).

By considering the derivative of E(x^{k+1}) with respect to α_k, we obtain that

α_k = (p^k, r^k)/(p^k, Ap^k) for k ≥ 0    (14.2.2)

is the optimal value of α_k. Using this value of α_k we have

E(x^{k+1}) = E(x^k) − (1/2)(p^k, r^k)^2/(p^k, Ap^k).
We first consider the case k = 0, where we have complete freedom to choose p^0. From
this formula we see that r^0 is a good choice for p^0, since it will make E(x^1) less than E(x^0).
From now on we will assume p^0 = r^0; later on we will see other advantages to this choice.
Next we use (14.2.2) with (14.2.1b) to give

(p^k, r^{k+1}) = (p^k, r^k) − α_k (p^k, Ap^k) = 0.
Then using this relation with (14.2.1c), we have

(p^{k+1}, r^{k+1}) = (r^{k+1}, r^{k+1}) + β_k (p^k, r^{k+1}) = |r^{k+1}|^2 for k ≥ 0.

Then by our choice of p^0 we have

(p^k, r^k) = |r^k|^2 for k ≥ 0.

This pattern of alternatively using (14.2.1b) and (14.2.1c) will be used repeatedly in our
analysis of the conjugate gradient method.
With this last relation and (14.2.2) we have that

|r k |2
αk =  , (14.2.3)
p k , Ap k

which is a convenient formula for computing αk . We also have that


    1 |r k |4
E x k+1 = E x k −  k .
2 p , Ap k
 
This formula shows that p^k should be chosen so that (p^k, Ap^k) is minimal, since that
will minimize E(x^{k+1}) given x^k. By (14.2.1c) we see that β_{k−1} should be chosen to
minimize (p^k, Ap^k) given p^{k−1}. We have

(p^k, Ap^k) = (r^k, Ar^k) + 2β_{k−1} (r^k, Ap^{k−1}) + β_{k−1}^2 (p^{k−1}, Ap^{k−1}),

and so the optimal choice of β_{k−1} is

β_{k−1} = −(r^k, Ap^{k−1})/(p^{k−1}, Ap^{k−1}) for k ≥ 1

or, equivalently,

β_k = −(r^{k+1}, Ap^k)/(p^k, Ap^k) for k ≥ 0.    (14.2.4)
Our first conclusion results from using this formula for β_k with (14.2.1c). We have

(p^{k+1}, Ap^k) = (r^{k+1}, Ap^k) + β_k (p^k, Ap^k) = 0

and so we obtain the important result that

(p^{k+1}, Ap^k) = 0 for k ≥ 0,    (14.2.5)
which we describe by saying that consecutive search directions are conjugate. Using
(14.2.5) with (14.2.1c), we find

(p^k, Ap^k) = (r^k, Ap^k) + β_{k−1} (p^{k−1}, Ap^k) = (r^k, Ap^k),

which we use with (14.2.1b) and (14.2.3) to obtain

(r^{k+1}, r^k) = (r^k, r^k) − α_k (Ap^k, r^k)
             = (r^k, r^k) − α_k (p^k, Ap^k)    (14.2.6)
             = 0.
We now obtain a more convenient formula for β_k than (14.2.4). First, by (14.2.1b)
and (14.2.6),

(r^{k+1}, r^{k+1}) = (r^{k+1}, r^k) − α_k (r^{k+1}, Ap^k) = −α_k (r^{k+1}, Ap^k),

so by (14.2.4) our formula for β_k is

β_k = (1/α_k) |r^{k+1}|^2/(p^k, Ap^k) = |r^{k+1}|^2/|r^k|^2.
We now collect the formulas for the conjugate gradient method:

p^0 = r^0 = b − Ax^0,    (14.2.7a)
x^{k+1} = x^k + α_k p^k,    (14.2.7b)
r^{k+1} = r^k − α_k Ap^k,    (14.2.7c)
p^{k+1} = r^{k+1} + β_k p^k,    (14.2.7d)
α_k = |r^k|^2/(p^k, Ap^k),    (14.2.7e)
β_k = |r^{k+1}|^2/|r^k|^2.    (14.2.7f)
The implementation of these formulas in a computer program is discussed in the next
section. We conclude this section with some basic observations about the algorithm.
We see from formulas (14.2.7) that if β_k is small, i.e., if |r^{k+1}| is much less than |r^k|,
then p^{k+1} is essentially r^{k+1} and the conjugate gradient method is close to the steepest
descent method. If |r^{k+1}| is not much less than |r^k|, then the new search direction, p^{k+1},
will not be close to the local steepest descent direction, r^{k+1}. Notice that the vectors
r^k as defined by (14.2.7c) are equal to the residual b − Ax^k for all values of k; see
Exercise 14.2.1.
Next we prove a very interesting and significant result about the residuals and search
directions for the conjugate gradient method.

Theorem 14.2.1. For the conjugate gradient method (14.2.7), the residuals and search
directions satisfy the relations

(r^k, r^j) = (p^k, Ap^j) = 0 for k ≠ j.    (14.2.8)
Proof. We prove this result by induction. First notice that

(r^0, r^1) = 0

and
(p^0, Ap^1) = 0

by (14.2.6) and (14.2.5).
Next, assume that

(r^ℓ, r^j) = (p^ℓ, Ap^j) = 0 for 0 ≤ j < ℓ ≤ k.

We wish to show that this holds for all j and ℓ with 0 ≤ j < ℓ ≤ k + 1 as well. First,
by (14.2.6) and (14.2.5), we take the case with j equal to k and ℓ equal to k + 1:

(r^{k+1}, r^k) = (p^{k+1}, Ap^k) = 0.
Now assume that j is less than k. By (14.2.7c) and (14.2.7d) we have

(r^{k+1}, r^j) = (r^k, r^j) − α_k (Ap^k, r^j)
             = −α_k (Ap^k, p^j − β_{j−1} p^{j−1})
             = 0

since (p^k, Ap^j) and (p^k, Ap^{j−1}) are zero by our induction hypothesis. Also, for j less
than k,

(p^{k+1}, Ap^j) = (r^{k+1}, Ap^j) + β_k (p^k, Ap^j)
              = (r^{k+1}, (r^j − r^{j+1}) α_j^{-1})
              = 0

by the result just proved. This completes the proof of the theorem.
Theorem 14.2.1 has the following immediate corollary.

Corollary 14.2.2. If the matrix A is a K × K symmetric positive definite matrix, then
the conjugate gradient algorithm converges in at most K steps.

Proof. By Theorem 14.2.1 all the residuals are mutually orthogonal, by (14.2.8).
Thus r^K is orthogonal to r^k for k = 0, ..., K − 1. Since the dimension of the space is
K, r^K must be zero, and so the method must have converged within K steps.

This corollary is not often of practical importance, since for an elliptic difference
equation on the square, e.g., the five-point Laplacian (12.5.1) with grid spacing Δx =
Δy = 1/N, the vectors have dimension K = (N − 1)^2, which is quite large. However, it
does turn out that often the conjugate gradient method is essentially converged in far fewer
than K steps. When viewed as an iterative method, it is very effective, the number of
iteration steps being on the order of N (i.e., K^{1/2}) for elliptic difference equations. This
is proved in Section 14.4.
We have derived the conjugate gradient method by minimizing the quadratic func-
tional E(y) or F(y). Notice that by (14.1.2)

F(y) = (1/2)(y − x, A(y − x)) = (1/2)(−A^{-1}r, −r) = (1/2)(r, A^{-1}r).

Thus, the conjugate gradient method minimizes the functional (r, A^{-1}r) at each step in
the search direction.
Example 14.2.1. Table 14.2.1 displays the results of computations using both the SOR and
conjugate gradient methods to solve for the solution of the five-point Laplacian on the unit
square with Dirichlet boundary conditions. The exact solution of the partial differential
equation was u = e^x sin y for 0 ≤ x, y ≤ 1. The finite difference grid used equal grid
spacing in each direction. The three different grid spacings are displayed in the first column
of the table.
For both methods the initial iterate was the grid function that was equal to the exact
solution on the boundary and was zero in the interior of the square. The SOR method was
terminated when the L2 norm of the changes to the solution, given by

( Σ_{ℓ,m} ω^2 | (1/4)(v^n_{ℓ+1,m} + v^{n+1}_{ℓ−1,m} + v^n_{ℓ,m+1} + v^{n+1}_{ℓ,m−1}) − v^n_{ℓ,m} |^2 h^2 )^{1/2},

was less than the tolerance of 10^{-7}. The sum is over all interior grid points. The value for
ω was 2(1 + πh)^{-1} for each case.
The conjugate gradient method was also terminated when the norm of the updates
to the solution was less than the tolerance of 10^{-7}. The norm of the updates is given
by hα_k|p^k|. For each method the number of iterations and the norm of the error are given
for the three values of h equal to 1/10, 1/20, and 1/40. In addition, the norms of the
residuals are displayed for the conjugate gradient method.

Table 14.2.1
Comparison of SOR and conjugate gradient methods.

                 SOR                    Conjugate gradient
h        Iterations   Error     Iterations   Error     Residual
0.100        31       5.52–5        27       5.51–5     1.91–8
0.050        64       1.38–5        54       1.39–5     3.19–8
0.025       122       3.21–6       107       3.48–6     2.59–8
Table 14.2.1 clearly shows for both methods that the number of iterates is proportional
to h^{-1}. The table also demonstrates the second-order accuracy of the five-point Laplacian
finite difference scheme. Decreasing the tolerance from 10^{-7} to 10^{-8} and 10^{-9} decreased
the residuals for the conjugate gradient method but did not decrease the errors. This shows
that the error given is primarily due to the truncation error inherent in the finite difference
scheme and is not the error due to the iterative method.
The error shown for h equal to 1/40 for the SOR method actually increased as the
tolerance was reduced from 10^{-7} to 10^{-8}. The error shown in the table is due to the
fortuitous circumstance that the iterate at which the method was stopped was closer to the
solution of the differential equation than it was to the true solution of the difference scheme.
When the tolerance was reduced to 10^{-8}, the error was essentially that of the conjugate
gradient method.
In doing computations to demonstrate the order of accuracy of schemes and the speed
of iterative methods, we must be careful to distinguish between errors due to the use of
finite difference schemes, i.e., truncation errors, and errors due to the iterative method. The
results shown in Table 14.2.1 were done in double precision to remove the arithmetic errors
due to the finite precision of the computer. Double-precision calculations are often not
needed in practical computations because the arithmetic errors are usually much smaller
than the uncertainty of the data.
Since the conjugate gradient method is more expensive per step than SOR in terms of
both storage and computation time, Table 14.2.1 shows that SOR is more efficient than the
conjugate gradient method for this problem. A major advantage of the conjugate gradient
method is that it can be easily modified to a preconditioned conjugate gradient method, as is
shown in Section 14.5. A second advantage of the conjugate gradient method is that it does
not require the user to specify any parameters, such as the iteration parameter ω required
by SOR methods.
Exercises

14.2.1. Prove by induction that the vectors r^k as defined by (14.2.7) are equal to the
residual b − Ax^k for each k.
14.2.2. A skew matrix A is one for which A^T = −A. Show that the following algorithm
converges when A is skew and nonsingular:

p^0 = −Ar^0,
x^{k+1} = x^k + α_k p^k,
r^{k+1} = r^k − α_k Ap^k,
p^{k+1} = −Ar^{k+1} + β_k p^k,
α_k = |Ar^k|^2/|Ap^k|^2,
β_k = |Ar^{k+1}|^2/|Ar^k|^2.

Hint: Show that α_k and β_{k−1} minimize |r^{k+1}|^2 at each step. Also show that
(r^{k+1}, Ap^k) = (Ar^{k+1}, Ar^k) = 0 for all k.

14.2.3. Show that if A is skew but singular, then with the algorithm given in Exercise
14.2.2, the vectors x^k converge to a vector x^* and the vectors r^k converge to
r^*, such that
r^* = b − Ax^*
and r^* is a null vector of A. Hint: Show that |r^k| converges and that |Ar^k| ≤
‖A‖ |r^k − r^{k+1}| and |r^{k+1} − r^k|^2 = |r^k|^2 − |r^{k+1}|^2.
14.3 Implementing the Conjugate Gradient Method

We now discuss how to implement the conjugate gradient method using the five-point
Laplacian on a uniform grid as an illustration. We begin by considering (14.2.7) and see
that four vectors of dimension K are required. These are x^k, r^k, p^k, and an additional
vector q^k, which is used to store the values of Ap^k.
We start with an initial iterate x^0 and then compute r^0 = b − Ax^0, q^0 = Ar^0, and
α_0 with p^0 = r^0. Then (14.2.7) becomes

x^{k+1} = x^k + α_k p^k,    (14.3.1a)
r^{k+1} = r^k − α_k q^k,    (14.3.1b)
p^{k+1} = r^{k+1} + β_k p^k,    (14.3.1c)
q^{k+1} = Ar^{k+1} + β_k q^k,    (14.3.1d)
α_k = |r^k|^2/(p^k, q^k),    (14.3.1e)
β_k = |r^{k+1}|^2/|r^k|^2.    (14.3.1f)
One can avoid using the vectors q^k if Ap^k is computed twice, once for (14.3.1b) and once
for evaluating α_k.
We now show what these formulas become for the example of solving Poisson's
equation on the unit square with equal spacing in both directions. The vectors will now be
indexed by their grid point indices, and we denote the components of the vector x by the
grid function v_{ℓ,m}. The equations to solve are

−v_{ℓ+1,m} − v_{ℓ−1,m} − v_{ℓ,m+1} − v_{ℓ,m−1} + 4v_{ℓ,m} = −h^2 f_{ℓ,m},    (14.3.2)

which forms the system of equations Ax = b. Notice that A is positive definite and
symmetric and that the vector b contains both the values h^2 f_{ℓ,m} and the values of the
solution on the boundary.
First, v^0_{ℓ,m} is given and then r^0_{ℓ,m} is computed in the interior as

r^0_{ℓ,m} = −h^2 f_{ℓ,m} + v^0_{ℓ+1,m} + v^0_{ℓ−1,m} + v^0_{ℓ,m+1} + v^0_{ℓ,m−1} − 4v^0_{ℓ,m},
p^0_{ℓ,m} = r^0_{ℓ,m},
with |r^0|^2 also being computed. Then q^0_{ℓ,m} is computed as

q^0_{ℓ,m} = 4r^0_{ℓ,m} − r^0_{ℓ+1,m} − r^0_{ℓ−1,m} − r^0_{ℓ,m+1} − r^0_{ℓ,m−1}    (14.3.3)

and the inner product (p^0, q^0) is also computed to evaluate α_0 as |r^0|^2/(p^0, q^0). Note
that for Dirichlet boundary data, r^k, p^k, and q^k should be zero on the boundary.
Now begins the main computation loop. First v and r are updated by

v^{k+1}_{ℓ,m} = v^k_{ℓ,m} + α_k p^k_{ℓ,m},
r^{k+1}_{ℓ,m} = r^k_{ℓ,m} − α_k q^k_{ℓ,m},

with |r^{k+1}|^2 also being computed. Using |r^{k+1}|^2, β_k is computed; then p and q are
updated by

p^{k+1}_{ℓ,m} = r^{k+1}_{ℓ,m} + β_k p^k_{ℓ,m},
q^{k+1}_{ℓ,m} = 4r^{k+1}_{ℓ,m} − r^{k+1}_{ℓ+1,m} − r^{k+1}_{ℓ−1,m} − r^{k+1}_{ℓ,m+1} − r^{k+1}_{ℓ,m−1} + β_k q^k_{ℓ,m},    (14.3.4)

and the inner product (p^{k+1}, q^{k+1}) is computed by accumulating the products p^{k+1}_{ℓ,m} q^{k+1}_{ℓ,m}.
Finally, α_{k+1} is computed as the ratio |r^{k+1}|^2/(p^{k+1}, q^{k+1}) and k is incremented.
It is important to notice that in the computer code there is no need to use variables
indexed by the iteration counter k. The values of α_k and β_k are not required beyond the
kth iteration, and thus the implementation should use only variables α and β.
A trick can be used to reduce the code that initializes p^0 and q^0. After v^0 and r^0
have been computed, set β equal to zero and then use the code for formulas (14.3.4) to
compute p^0 and q^0. This avoids using separate code for (14.3.3) and (14.3.4).
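The following Python sketch, again not from the original text, puts the pieces of this
section together for the five-point Laplacian, including the β = 0 initialization trick; the
array layout and names are our assumptions, with v carrying the boundary data on its
boundary ring and r, p, and q zero on the boundary.

import numpy as np

def apply_laplacian(w):
    # (A w) at interior points for the five-point Laplacian (14.3.2);
    # the boundary ring of w enters only through the stencil.
    q = np.zeros_like(w)
    q[1:-1, 1:-1] = (4.0 * w[1:-1, 1:-1] - w[2:, 1:-1] - w[:-2, 1:-1]
                     - w[1:-1, 2:] - w[1:-1, :-2])
    return q

def cg_poisson(f, v, h, tol=1e-7, max_iter=10000):
    # Sketch of (14.3.1)-(14.3.4); v holds the Dirichlet data on its
    # boundary ring and the initial iterate in its interior.
    r = np.zeros_like(v)
    r[1:-1, 1:-1] = -h * h * f[1:-1, 1:-1] - apply_laplacian(v)[1:-1, 1:-1]
    p = np.zeros_like(v)
    q = np.zeros_like(v)
    beta = 0.0                   # the trick: a first pass of (14.3.4)
    rr = np.sum(r * r)           # builds p^0 = r^0 and q^0 = A r^0
    for _ in range(max_iter):
        p = r + beta * p                        # (14.3.4)
        q = apply_laplacian(r) + beta * q       # one stencil sweep per step
        alpha = rr / np.sum(p * q)              # (14.3.1e)
        v = v + alpha * p
        r = r - alpha * q
        rr_new = np.sum(r * r)
        if h * abs(alpha) * np.sqrt(np.sum(p * p)) < tol:   # norm of update
            break
        beta = rr_new / rr                      # (14.3.1f)
        rr = rr_new
    return v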
The conjugate gradient method is terminated when either α_k |p^k| or |r^k| is suffi-
ciently small. For most systems these two quantities are good indicators of how close the
current iterate x^k is to the true solution. As with the general linear methods, e.g., SOR, the
method should be continued until the error in the iteration is comparable to the truncation
error in the numerical method. There is no reason to solve the linear system exactly when
there is intrinsic truncation error due to using finite difference methods.
It should also be pointed out that it is not wise to compute the residual r^k as b − Ax^k;
the formula (14.3.1b) should be used instead. Although r^k is mathematically equivalent
to b − Ax^k, when using the finite precision of a computer there is a loss of significant digits
when the residual is computed as b − Ax^k, since the two vectors b and Ax^k will be nearly
the same and much larger than r^k when k is not too small. The formula (14.3.1b) avoids
this problem.
In those cases where the matrix A is ill conditioned, there will usually be a significant
difference between the computed vector r^k and the true residual for large values of k.
Nonetheless, the method, as given by (14.3.1), will converge to machine precision even in
the presence of these rounding errors. Of course, one must not set the convergence criteria
smaller than what can be obtained with the machine arithmetic.
Exercises

14.3.1. Use the conjugate gradient method to solve Poisson's equation

u_xx + u_yy = −4 cos(x + y) sin(x − y)

on the unit square. The boundary conditions and exact solution are given by the for-
mula u = cos(x + y) sin(x − y). Use the standard five-point difference scheme
with h = Δx = Δy = 0.1, 0.05, and 0.025. The initial iterate should be zero in
the interior of the square. Comment on the accuracy of the scheme and the effi-
ciency of the method. Stop the iterative method when the L2 norm of the change
is less than 10^{-6}.

14.3.2. Use the conjugate gradient method to solve Poisson's equation

u_xx + u_yy = −2 cos x sin y

on the unit square. The boundary conditions and exact solution are given by
the formula u = cos x sin y. Use the standard five-point difference scheme with
h = Δx = Δy = 0.1, 0.05, and 0.025. The initial iterate should be zero in the
interior of the square. Comment on the accuracy of the scheme and the efficiency of
the method. Stop the iterative method when the L2 norm of the change is less than
10^{-6}. Compare with the results of the SOR method applied to this same equation
(see Exercise 13.5.1).
14.4 A Convergence Estimate for the Conjugate Gradient Method

Theorem 14.2.1 shows that the conjugate gradient method will converge in at most K
steps if A is a K × K matrix. However, we will now prove an estimate on the rate of
convergence of the method that shows that the method is often essentially converged after
far fewer than K steps.

Theorem 14.4.1. If A is a symmetric positive definite matrix whose eigenvalues lie in the
interval [a, b], with 0 < a, then the error vector e^k for the conjugate gradient method
satisfies

(e^k, Ae^k)^{1/2} ≤ 2 [(√b − √a)/(√b + √a)]^k (e^0, Ae^0)^{1/2}.    (14.4.1)
Proof. We begin with the observation based on (14.2.7b) and (14.2.7c) that the resid-
ual after k steps of the conjugate gradient method can be expressed as a linear combination
of the set of vectors {A^j r^0} for j from 0 to k. We express this observation as

r^k = R_k(A) r^0,    (14.4.2)

where R_k(λ) is a polynomial in λ of exact degree k (see Exercise 14.4.1). The coefficients
of the polynomial R_k(λ) depend on the initial residual r^0. We will also make use of the
observation that
R_k(0) = 1    (14.4.3)

for all nonnegative integers k. (If A = 0, then by (14.2.7c), r^k = r^0.)
The error e^k on the kth step of the conjugate gradient method is related to the residual
by
r^k = Ae^k.    (14.4.4)

Since the matrix A commutes with R_k(A), a polynomial in A, we have by (14.4.2) that

A(e^k − R_k(A)e^0) = 0,

and since A is nonsingular we have

e^k = R_k(A)e^0.    (14.4.5)

We now use Theorem 14.2.1 to establish that

(e^k, Ae^k) = (Q_k(A)e^0, Ae^k)    (14.4.6)
for any polynomial Q_k(λ) of degree k satisfying Q_k(0) = 1. Relation (14.4.6) is proved
as follows. Using (14.4.5) and Theorem 14.2.1, we have

(e^k, Ae^k) = (e^k, r^k) = (e^k + Σ_{j=0}^{k−1} γ_j r^j, r^k)

for any choice of the coefficients γ_j. But we then have, by (14.4.4) and (14.4.5), that

e^k + Σ_{j=0}^{k−1} γ_j r^j = [R_k(A) + Σ_{j=0}^{k−1} γ_j A R_j(A)] e^0 = Q_k(A) e^0,

where it is easy to see that, by appropriate choice of the γ_j, Q_k(λ) can be any polynomial
of degree k satisfying Q_k(0) = 1. This establishes (14.4.6).
We now use the Cauchy–Schwarz inequality for positive definite matrices (see Exer-
cise 14.1.7) to obtain

(e^k, Ae^k) = (Q_k(A)e^0, Ae^k) ≤ (Q_k(A)e^0, AQ_k(A)e^0)^{1/2} (e^k, Ae^k)^{1/2},

from which we obtain

(e^k, Ae^k) ≤ (Q_k(A)e^0, AQ_k(A)e^0).    (14.4.7)
We now wish to choose Q_k(A) so that the right-hand side of (14.4.7) is as small as possible,
or nearly so. We will actually only estimate the minimum value of the right-hand side. We
begin by using the spectral mapping theorem (see Appendix A). Since the eigenvalues of
A are in the interval [a, b], we have that

(Q_k(A)e^0, AQ_k(A)e^0) ≤ max_{a≤λ≤b} |Q_k(λ)|^2 (e^0, Ae^0).    (14.4.8)

We will choose the polynomial Q_k(λ) so that |Q_k(λ)| is quite small for λ in [a, b]. Recall
that Q_k(0) is 1. Based on an understanding of the properties of orthogonal polynomials,
we choose

Q_k(λ) = T_k((b + a − 2λ)/(b − a)) / T_k((b + a)/(b − a)),
where T_k(µ) is the Tchebyshev polynomial of degree k given by

T_k(µ) = cos(k cos^{-1} µ) if |µ| ≤ 1,
T_k(µ) = [sign(µ)]^k cosh(k cosh^{-1} |µ|) if |µ| ≥ 1.    (14.4.9)

See Exercise 14.4.2. Notice that Q_k(0) is 1.
For λ in the interval [a, b], the value of |b + a − 2λ|/(b − a) is bounded by 1, and
|T_k(µ)| for µ ∈ [−1, 1] is at most 1; therefore, we have

max_{a≤λ≤b} |Q_k(λ)| ≤ [T_k((b + a)/(b − a))]^{-1} = [cosh(k cosh^{-1}((b + a)/(b − a)))]^{-1}.

As k increases, the value of cosh{k cosh^{-1}[(b + a)/(b − a)]} also increases, showing
that (e^k, Ae^k) decreases with k. To obtain a more useful estimate of this quantity, we set
(b + a)/(b − a) = cosh σ = (e^σ + e^{−σ})/2.

Solving this equation for e^σ, we have

e^σ = (√b + √a)/(√b − √a).

(There should be no cause for confusion between e^σ, which is the exponential of σ, and
e^k, which is the kth error vector.)
We then obtain

cosh kσ = (e^{kσ} + e^{−kσ})/2 = (1/2)[(√b + √a)/(√b − √a)]^k [1 + ((√b − √a)/(√b + √a))^{2k}]
        ≥ (1/2)[(√b + √a)/(√b − √a)]^k.
Thus we have

max_{a≤λ≤b} |Q_k(λ)| ≤ 2[(√b − √a)/(√b + √a)]^k.

This estimate with (14.4.8) gives (14.4.1), which proves Theorem 14.4.1.

Theorem 14.4.1 shows that the conjugate gradient method converges faster when the
eigenvalues of A are clustered together in the sense that a/b is close to 1. Notice also that
the estimate (14.4.1) is independent of simple scaling of the matrix A; i.e., the estimate
is the same for Ax = b and αAx = αb for any positive number α. For the five-point
Laplacian on the unit square, the value of (√b − √a)/(√b + √a) is 1 − O(h), as
with SOR, and indeed the two methods are about equal in terms of the number of iterations
required for a solution, as shown in Table 14.2.1 (see Exercise 14.4.3).
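To get a feel for the estimate, the following Python sketch, our illustration rather than
part of the text, computes the smallest k for which the bound (14.4.1) guarantees a given
reduction. It is an upper bound only, and actual iteration counts, as in Table 14.2.1, are
typically smaller.

import numpy as np

def cg_bound_iterations(a, b, tol):
    # Smallest k with 2*((sqrt(b)-sqrt(a))/(sqrt(b)+sqrt(a)))**k <= tol,
    # from the error bound (14.4.1).
    rho = (np.sqrt(b) - np.sqrt(a)) / (np.sqrt(b) + np.sqrt(a))
    return int(np.ceil(np.log(tol / 2.0) / np.log(rho)))

# With sqrt(a/b) roughly pi*h/2 (Exercise 14.4.3), the count grows like 1/h:
for h in [0.1, 0.05, 0.025]:
    s = np.pi * h / 2.0
    print(h, cg_bound_iterations(s**2, 1.0, 1.0e-7))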
Exercises

14.4.1. Using induction on k, verify relation (14.4.2). You may wish to also show that
p^k can be expressed as a polynomial in A multiplying r^0.

14.4.2. Verify that the Tchebyshev polynomials T_k(µ) given by (14.4.9) are indeed poly-
nomials of degree k. Hint: Use the formula

cos(k + 1)θ = − cos(k − 1)θ + 2 cos kθ cos θ

and a similar formula for cosh(k + 1)θ to establish a recurrence relation between
the T_k(µ).

14.4.3. For the five-point Laplacian on the unit square with equal spacing in each direction,
show that √(a/b) is approximately πh/2.
14.5 The Preconditioned Conjugate Gradient Method

A technique resulting in further acceleration of the conjugate gradient method is the pre-
conditioned conjugate gradient method. We first discuss this method in some generality
and then examine the particular case of preconditioning with SSOR.
The basic idea of the preconditioned conjugate gradient method is to replace the
system
Ax = b

by
(B^{-1}AB^{-T})(B^T x) = B^{-1} b,

where B^{-1}AB^{-T} is a matrix for which the conjugate gradient method converges faster
than it does with A itself. The matrix B is chosen so that computing B^{-T}y and B^{-1}y are
easy operations to perform. Note that B^{-1}AB^{-T} is symmetric and positive definite when
A is.
According to Theorem 14.4.1, to get faster convergence, we wish to have the eigen-
values of B^{-1}AB^{-T} more clustered together than are those of A. Since A is symmetric
and positive definite, there is a matrix C so that A = CC^T, and B is usually chosen to
approximate C in some sense. Note also that B need only approximate a multiple of C,
so that B^{-1}AB^{-T} is closer to being a multiple of the identity than is A itself.
Consider now the conjugate gradient method applied to

Ãx̃ = b̃,

where
Ã = B^{-1}AB^{-T}, x̃ = B^T x, b̃ = B^{-1} b.
We have from (14.2.7) that

x̃^{k+1} = x̃^k + α_k p̃^k,
r̃^{k+1} = r̃^k − α_k Ãp̃^k,    (14.5.1)
p̃^{k+1} = r̃^{k+1} + β_k p̃^k,

where α_k = |r̃^k|^2/(p̃^k, Ãp̃^k) and β_k = |r̃^{k+1}|^2/|r̃^k|^2.
Now let us rewrite (14.5.1) in terms of the original variables x rather than x̃. Using
x^k = B^{-T}x̃^k, p^k = B^{-T}p̃^k, and r^k = B r̃^k, we have

x^{k+1} = x^k + α_k p^k,
r^{k+1} = r^k − α_k Ap^k,
p^{k+1} = M^{-1}r^{k+1} + β_k p^k,

where M = BB^T and

α_k = (r^k, M^{-1}r^k)/(p^k, Ap^k),  β_k = (r^{k+1}, M^{-1}r^{k+1})/(r^k, M^{-1}r^k).
We see that the effect of the preconditioning is to alter the equation for updating the search
direction p^{k+1} and to alter the definitions of α_k and β_k. For the method to be effective,
we must easily be able to solve

r = Mz = BB^T z

for z. A common choice of B is to take B = L̃, where L̃ is an approximate lower
triangular factor of A in the sense that

A = L̃L̃^T + N,

where N is small in some sense.
Preconditioning by SSOR

We now consider SSOR and show how it can be used as a preconditioner for the conjugate
gradient method. We assume that A can be written in the form

A = I − L − L^T.

Notice that the matrix A in (14.3.2) is actually in the form 4(I − L − L^T), but the scalar
multiple does not affect the conclusions. SSOR is a two-step process given by

v^{k+1/2} = v^k + ω(Lv^{k+1/2} + L^T v^k − v^k + b),
v^{k+1} = v^{k+1/2} + ω(Lv^{k+1/2} + L^T v^{k+1} − v^{k+1/2} + b).    (14.5.2)
We wish to rewrite this in the form M(v^{k+1} − v^k) = r^k. Notice that we can express v^{k+1} −
v^k as a linear function of r^k in this way because the construction of v^{k+1} is linear, and if
r^k were zero, then the update v^{k+1} − v^k would also be zero. It remains to determine the
matrix M and to determine if it has the form B̃B̃^T.
We rewrite the first step as

v^{k+1/2} − v^k − ωL(v^{k+1/2} − v^k) = ω(Lv^k + L^T v^k − v^k + b) = ωr^k.

We can therefore write

v^{k+1/2} = v^k + (I − ωL)^{-1} ωr^k.    (14.5.3)
The second step of (14.5.2) can be rewritten as

(I − ωL^T)v^{k+1} = ((1 − ω)I + ωL)v^{k+1/2} + ωb,

and substituting from (14.5.3) we have

(I − ωL^T)v^{k+1} = [(1 − ω)I + ωL]v^k + [(1 − ω)I + ωL](I − ωL)^{-1}ωr^k + ωb

or

(I − ωL^T)(v^{k+1} − v^k)
    = (−ωI + ωL + ωL^T)v^k + ωb + [(1 − ω)I + ωL](I − ωL)^{-1}ωr^k
    = ωr^k + [(1 − ω)I + ωL](I − ωL)^{-1}ωr^k
    = (I − ωL)^{-1}[I − ωL + (1 − ω)I + ωL]ωr^k
    = (I − ωL)^{-1}(2 − ω)ωr^k.

We thus have

[1/(ω(2 − ω))] (I − ωL)(I − ωL^T)(v^{k+1} − v^k) = r^k.    (14.5.4)

If we compare expression (14.5.4) with the identity

A(v − v^k) = r^k,

where v is the exact solution, we see that SSOR can be viewed as an iterative method that
approximates A by the matrix in (14.5.4). Since the matrix in (14.5.4) is in the form BB^T,
it is natural to employ the preconditioned conjugate gradient method with
B = (ω(2 − ω))^{-1/2}(I − ωL).
It is important to note that if we are going to use SSOR alone to solve the problem,
we would use (14.5.2) with immediate replacement. Formula (14.5.4) is important only
when using SSOR as a preconditioner.
We now apply this preconditioning matrix to Laplace's equation in a square. We have

x^{k+1} = x^k + α_k p^k,
r^{k+1} = r^k − α_k Ap^k,    (14.5.5)

and
p^{k+1} = z^{k+1} + β_k p^k,

where z^{k+1} is computed using (14.5.4). The computation of z^{k+1} is implemented as
follows:

z̃^{k+1}_{ℓ,m} = (1/4)ω(z̃^{k+1}_{ℓ,m−1} + z̃^{k+1}_{ℓ−1,m}) + ω(2 − ω) r^{k+1}_{ℓ,m},
z^{k+1}_{ℓ,m} = (1/4)ω(z^{k+1}_{ℓ,m+1} + z^{k+1}_{ℓ+1,m}) + z̃^{k+1}_{ℓ,m}    (14.5.6)

for all interior points, with z̃ and z being zero on the boundaries. Notice that the first of
these relations should be executed in the order of increasing indices, and the second should
be done in the order of decreasing indices.
Notice that the quantities z and z̃ can occupy the same storage locations. The
parameters for the preconditioned method are computed by the formulas

α_k = (r^k, z^k)/(p^k, Ap^k),
β_k = (r^{k+1}, z^{k+1})/(r^k, z^k).

The method of (14.5.6) is a method for solving

(I − ωL) z̃^{k+1} = ω(2 − ω) r^{k+1},
(I − ωL^T) z^{k+1} = z̃^{k+1}.

Other ways of computing z^{k+1} can also be used. Notice that we have scaled z̃ to avoid
taking the square root of ω(2 − ω). We can also dispense with the factor ω(2 − ω), since
it represents only a scaling factor.
To implement the preconditioning requires two more loops than does the regular
conjugate gradient method. The additional loops, given by (14.5.6), are very simple, and
the slight extra effort is more than justified by the substantial increase in speed of the
preconditioned method.
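A sketch of the two loops of (14.5.6) in Python follows; it is our illustration, with the
sweep bounds assuming r is a grid array that is zero on its boundary ring.

import numpy as np

def ssor_solve(r, omega):
    # Sketch of (14.5.6): solve M z = r for the SSOR preconditioner,
    # M = (omega*(2-omega))**-1 * (I - omega*L)(I - omega*L^T).
    zt = np.zeros_like(r)          # z-tilde, the forward-sweep variable
    z = np.zeros_like(r)
    n, m = r.shape
    for l in range(1, n - 1):      # first sweep: increasing indices
        for k in range(1, m - 1):
            zt[l, k] = (0.25 * omega * (zt[l, k - 1] + zt[l - 1, k])
                        + omega * (2.0 - omega) * r[l, k])
    for l in range(n - 2, 0, -1):  # second sweep: decreasing indices
        for k in range(m - 2, 0, -1):
            z[l, k] = 0.25 * omega * (z[l, k + 1] + z[l + 1, k]) + zt[l, k]
    return z

In a production code the two arrays zt and z would share storage, as noted above.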
We now collect the formulas for implementing the preconditioned conjugate gradient
method. To initialize the preconditioned conjugate gradient method, we use

p^0 = z^0 = M^{-1} r^0,

as we see from the relations between p^0, p̃^0, r^0, and r̃^0. The formulas are:

x^{k+1} = x^k + α_k p^k,    (14.5.7a)
r^{k+1} = r^k − α_k q^k,    (14.5.7b)
z^{k+1} = M^{-1} r^{k+1},    (14.5.7c)
p^{k+1} = z^{k+1} + β_k p^k,    (14.5.7d)
q^{k+1} = Az^{k+1} + β_k q^k,    (14.5.7e)
α_k = (r^k, z^k)/(p^k, q^k),    (14.5.7f)
β_k = (r^{k+1}, z^{k+1})/(r^k, z^k).    (14.5.7g)
As with (14.3.1), we can avoid using the vectors q^k if Ap^k is computed twice, once for
(14.5.7b) and once for evaluating α_k in (14.5.7f).
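A Python sketch of (14.5.7) along these lines is given below; it is our illustration, not
part of the text. Here solve_M stands for any preconditioner solve, such as the SSOR
sweeps of (14.5.6), vectors are one-dimensional arrays, and Ap^k is formed directly in
place of the vector q^k.

import numpy as np

def preconditioned_cg(apply_A, solve_M, b, x0, tol=1e-8, max_iter=1000):
    # Sketch of (14.5.7); solve_M(r) returns M^{-1} r.
    x = x0.copy()
    r = b - apply_A(x)
    z = solve_M(r)                     # z^0 = M^{-1} r^0
    p = z.copy()                       # p^0 = z^0
    rz = np.dot(r, z)
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / np.dot(p, Ap)     # (14.5.7f)
        x = x + alpha * p              # (14.5.7a)
        r = r - alpha * Ap             # (14.5.7b)
        if np.linalg.norm(r) < tol:
            break
        z = solve_M(r)                 # (14.5.7c)
        rz_new = np.dot(r, z)
        beta = rz_new / rz             # (14.5.7g)
        p = z + beta * p               # (14.5.7d)
        rz = rz_new
    return x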
The preconditioned conjugate gradient method can be significantly faster than the
conjugate gradient method. As we can see, it requires only minor modifications to a con-
jugate gradient method to implement a preconditioned conjugate gradient method. The
choice of ω in the SSOR preconditioner is not as critical as it is in the SSOR method
itself. The spectral radius for the preconditioned conjugate gradient method with the SSOR
preconditioner is 1 − O(N^{-1/2}). This is illustrated in Table 14.5.1.
Example 14.5.1. Table 14.5.1 shows the results of solving Poisson's equation using the point
SOR method, the conjugate gradient method, and the preconditioned conjugate gradient
method, with SSOR as the preconditioner. The exact solution that was calculated was
u = cos x sin y for 0 ≤ x, y ≤ 1. The finite difference grid used equal grid spacing in
each direction. The three different grid spacings are displayed in the first column of the
table. The next columns show the numbers of iterations required to obtain a converged
solution.

Table 14.5.1
Comparison of the speeds of SOR, the conjugate gradient method,
and the preconditioned conjugate gradient method.

h        SOR    C.G.   P.C.G.
0.100     33     26     12
0.050     60     52     16
0.025    115    103     22

For each method the initial iterate was the grid function that was equal to the exact
solution on the boundary and was zero in the interior of the square. Each method was
terminated when the L2 norm of the change in the solution was less than 10^{-7}. This
convergence criterion was sufficient to produce results for which the error was primarily
due to the truncation error. For both the SOR method and the SSOR preconditioner, the
value of ω was 2(1 + πh)^{-1}. The table shows that the number of iterations for the first two
methods is roughly proportional to h^{-1}, whereas for the preconditioned conjugate gradient
method, the number of iterations is proportional to h^{-1/2}. These results are similar to those
of Table 14.2.1.
Since the work in one iteration of the SOR method is less than that in one iteration
of the other two methods, it is not appropriate to judge the methods solely on the number
of iterations. The conjugate gradient method involves roughly twice as much work per
iteration as does point SOR, and the preconditioned conjugate gradient method involves
three to four times as much work as SOR. Thus the preconditioned conjugate gradient
method is faster than SOR for h equal to 1/40, but probably not for the grid spacing
of 1/10. Of course, for even smaller values of the grid spacing h, the preconditioned
conjugate gradient method would be even faster relative to SOR. In terms of computer
storage, the SOR method requires much less storage than the other two methods, but this
is not a significant concern in many scientific computations.
Formulas (14.5.7) show that five vectors are required to implement the preconditioned
conjugate gradient method, as opposed to only four vectors for the conjugate gradient
method. One way of using only four vectors for the preconditioned conjugate gradient
method is to work with r̃ = B^{-1}r rather than r; see Eisenstat [16]. We then obtain the
algorithm

x^{k+1} = x^k + α_k p^k,    (14.5.8a)
r̃^{k+1} = r̃^k − α_k B^{-1}q^k,    (14.5.8b)
p^{k+1} = B^{-T}r̃^{k+1} + β_k p^k,    (14.5.8c)
q^{k+1} = Ap^{k+1},    (14.5.8d)
α_k = |r̃^k|^2/(p^k, q^k),    (14.5.8e)
β_k = |r̃^{k+1}|^2/|r̃^k|^2.    (14.5.8f)

The results of the calculations of the vectors B^{-1}q^k and B^{-T}r̃^{k+1} are stored in the
vector q.
Preconditioning by Approximate Cholesky Factorization

Other preconditioning matrices for the conjugate gradient method can be obtained by ap-
proximating A as L̃L̃^T for a convenient form of L̃. A factorization of a matrix as LL^T,
where L is a lower triangular matrix, is called a Cholesky factorization; thus the product
L̃L̃^T is called an approximate Cholesky factorization of A. As an example of an approx-
imate Cholesky factorization for the matrix of the five-point Laplacian, consider a matrix
L̃ of the form

(L̃v)_{ℓ,m} = av_{ℓ,m} + bv_{ℓ−1,m} + cv_{ℓ,m−1},    (14.5.9)

where a, b, and c are constants. It is easy to see then that

(L̃^T v)_{ℓ,m} = av_{ℓ,m} + bv_{ℓ+1,m} + cv_{ℓ,m+1}    (14.5.10)

if we use the natural ordering of the components v_{ℓ,m} in the vector v. We then have, by
(14.5.9) and (14.5.10),

(L̃L̃^T v)_{ℓ,m} = a(L̃^T v)_{ℓ,m} + b(L̃^T v)_{ℓ−1,m} + c(L̃^T v)_{ℓ,m−1}
    = a(av_{ℓ,m} + bv_{ℓ+1,m} + cv_{ℓ,m+1})
    + b(av_{ℓ−1,m} + bv_{ℓ,m} + cv_{ℓ−1,m+1})
    + c(av_{ℓ,m−1} + bv_{ℓ+1,m−1} + cv_{ℓ,m})
    = (a^2 + b^2 + c^2)v_{ℓ,m} + abv_{ℓ+1,m} + acv_{ℓ,m+1}
    + abv_{ℓ−1,m} + acv_{ℓ,m−1} + bcv_{ℓ−1,m+1} + bcv_{ℓ+1,m−1}.

To have L̃L̃^T approximate A, where A corresponds to the five-point Laplacian, we may
set
a^2 + b^2 + c^2 = 1,
ab = ac = −1/4.

The two terms in L̃L̃^T v with subscripts (ℓ − 1, m + 1) and (ℓ + 1, m − 1) do not match
terms in the five-point Laplacian. They represent the error in the approximation of A
by L̃L̃^T.
Solving the equations for a, b, and c, we have

a = (1/2)√(2 + √2),
b = c = −1/(4a).

To implement this method it is often convenient to approximate A by L̃DL̃^T, where D is
a diagonal matrix. For our particular choice of L̃, D is just a^2 times the identity. Using
this choice of L̃, the preconditioned conjugate gradient method is (14.5.5), with (14.5.6)
replaced by

ẑ^{k+1}_{ℓ,m} = (ẑ^{k+1}_{ℓ,m−1} + ẑ^{k+1}_{ℓ−1,m} + 4r^{k+1}_{ℓ,m}) d,
z^{k+1}_{ℓ,m} = (z^{k+1}_{ℓ,m+1} + z^{k+1}_{ℓ+1,m}) d + ẑ^{k+1}_{ℓ,m},    (14.5.11)
where
d = (2 + √2)^{-1}.

The temporary variable ẑ is defined by

ẑ = [2/(2 + √2)] L̃^T z.
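The two sweeps of (14.5.11) have the same shape as the SSOR sweeps; a Python sketch,
with names of our choosing and r assumed zero on its boundary ring, is:

import numpy as np

def cholesky_solve(r):
    # Sketch of (14.5.11): forward and backward sweeps for the
    # approximate Cholesky preconditioner of the five-point Laplacian.
    d = 1.0 / (2.0 + np.sqrt(2.0))
    zh = np.zeros_like(r)          # the temporary variable z-hat
    z = np.zeros_like(r)
    n, m = r.shape
    for l in range(1, n - 1):
        for k in range(1, m - 1):
            zh[l, k] = (zh[l, k - 1] + zh[l - 1, k] + 4.0 * r[l, k]) * d
    for l in range(n - 2, 0, -1):
        for k in range(m - 2, 0, -1):
            z[l, k] = (z[l, k + 1] + z[l + 1, k]) * d + zh[l, k]
    return z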
We can try more sophisticated choices for the matrix L̃. For the discrete Laplacian on
a square, the preceding methods all do quite well. For matrices arising from other problems
we may have to work quite hard to get a good preconditioning matrix.
Example 14.5.2. To solve the difference equations for the fourth-order accurate Poisson's
equation (12.5.4), we can use preconditioning based on the five-point Laplacian. This is a
simple way to accelerate the solution procedure, and it does not affect the accuracy of the
scheme. In fact, using SSOR based on the nine-point Laplacian as the preconditioner with
the nine-point Laplacian does not give a significant improvement over using SSOR based
on the five-point Laplacian as the preconditioner. This is illustrated in Table 14.5.2.

Table 14.5.2
Comparison of three preconditioning methods for the nine-point Laplacian.

h        None   Cholesky   Five-point   Nine-point
0.100     28       16          18           16
0.050     57       28          25           23
0.025    112       52          34           32

Table 14.5.2 displays the results of solving Laplace's equation using the nine-point
Laplacian with the conjugate gradient method and with three different preconditioning
methods. The three preconditioning methods are the approximate Cholesky factorization
(14.5.11) for the five-point Laplacian, the SSOR preconditioning using the five-point Lapla-
cian, and the SSOR preconditioning using the nine-point Laplacian. The exact solution that
was calculated was u = e^{3x} sin 3y for 0 ≤ x, y ≤ 1. The finite difference grid used equal
grid spacing in each direction. The three different grid spacings are displayed in the first
column of the table. The next columns show the number of iterations required to obtain a
converged solution. For each method the initial iterate was the grid function that was equal
to the exact solution on the boundary and was zero in the interior of the square.
Each method was terminated when the L2 norm of the change in the solution was less
than 10^{-10}. This convergence criterion was sufficient to produce results for which the error
was primarily due to the truncation error, similar to the results shown in Table 12.5.1. For
both of the SSOR preconditioners, the value of ω was 2(1 + πh)^{-1}. The table shows that
the number of iterations for the last two methods is roughly proportional to h^{-1/2}. There is
not a significant difference between the last two methods, but the nine-point preconditioner
is better, as would be expected. The approximate Cholesky method based on the five-point
scheme is not as good as the other two methods, but it still offers a significant improvement
over the basic conjugate gradient method.
As Example 14.5.2 illustrates, a preconditioner based on the five-point Laplacian
may be a good preconditioner for the nine-point Laplacian. The reason for this is that they
are both related to the same partial differential equation. In general, there is a trade-off
between the effort it takes to find a better preconditioner and the perhaps marginal increase
in performance.
Currently, there is an extensive literature on preconditioning methods. For equations
other than those discussed here, one should check the literature to see what methods other
researchers have employed.
Exercises

14.5.1. Repeat the calculations of Exercise 14.3.1, but using the preconditioned conjugate
gradient method with the SSOR preconditioning. Comment on the efficiency of
the method and observe that the number of iterations increases as O(N^{1/2}).

14.5.2. Repeat the calculations of Exercise 14.3.2, but using the preconditioned conjugate
gradient method with the SSOR preconditioning. Comment on the efficiency of
the method and observe that the number of iterations increases as O(N^{1/2}).

14.5.3. Repeat the calculations of Exercise 14.3.1, but using the preconditioned conjugate
gradient method with the approximate Cholesky factorization as the precondition-
ing. Comment on the efficiency of the method and observe that the number of
iterations increases as O(N^{1/2}).

14.5.4. Repeat the calculations of Exercise 14.3.2, but using the preconditioned conjugate
gradient method with the approximate Cholesky factorization as the precondition-
ing. Comment on the efficiency of the method and observe that the number of
iterations increases as O(N^{1/2}).
Appendix A

Matrix and Vector Analysis

In this appendix we collect results about matrices and vectors that are used throughout the
text. Since many of the applications of linear algebra in the text use vectors with complex
components, we concern ourselves primarily with this case. We denote the set of complex
numbers by C. The proofs of many of the results stated here are included for completeness.
A.1 Vector and Matrix Norms

We may consider a vector, v, as an element of C^M, i.e., v = (v_1, ..., v_M), where v_j,
the jth component of v, is a complex number. Norms are real-valued functions on vector
spaces that provide a notion of the length of a vector. There are three norms on C^M that
we use. The most common norms are the ℓ_2, or Euclidean, norm,

|v|_2 = ( Σ_{j=1}^M |v_j|^2 )^{1/2},

the ℓ_1 norm,

|v|_1 = Σ_{j=1}^M |v_j|,

and the ℓ_∞, or maximum, norm,

|v|_∞ = max_{1≤j≤M} |v_j|.
Each of these norms satisfies three important properties.

Proposition A.1.1. Each of the norms just given satisfies the following three conditions:
(a) |v| ≥ 0, with equality if and only if v = 0.
(b) |v + w| ≤ |v| + |w|.
(c) |αv| = |α| |v| for α ∈ C.

The proof is easy and is omitted.

The three properties in Proposition A.1.1 are those that define a norm. In property (c)
the expression |α| is the absolute value of the complex number α, and since the absolute
value is a norm on C, the use of the same symbol for absolute value of a number and norm

399
400 Appendix A. Matrix and Vector Analysis
Downloaded 03/05/15 to 132.248.181.171. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

of a vector should cause no difficulty. We write expressions such as (C M , | · |1 ) when we


wish to signify which norm is being considered.
The following relations between the norms given earlier are easily proved:

|v|1 ≤ M 1/2 |v|2 ,


(A.1.1)
|v|2 ≤ M 1/2 |v|∞ .

We let ej denote the vector whose components are all zero except for the j th component,
which is 1.

Matrices and Matrix Norms

An M × N matrix may be defined as a linear map from C^N to C^M. The (i, j)th com-
ponent of a matrix A will be written as A_{ij} or a_{ij}, where a_{ij} is defined as the ith
component of Ae_j. The transpose of an M × N matrix A is the N × M matrix A^*,
defined by
(A^*)_{ij} = ā_{ji},

where the bar denotes the complex conjugate.
If we consider both C^N and C^M with norms, then we define the norm of an M × N
matrix A by

‖A‖ = sup_{|v|=1} |Av| = sup_{v≠0} |Av|/|v|,    (A.1.2)

where, of course, the expression |Av| refers to the norm on C^M and |v| refers to the norm
on C^N. The equivalence of the two expressions in (A.1.2) follows from the linearity of
A and property (c) of Proposition A.1.1. The matrix norm defined in this way satisfies the
properties of Proposition A.1.1 and thus is a norm on the vector space of M × N matrices.
We collect the important properties of matrix norms in the following proposition.
Proposition A.1.2. Let A and B be M × N matrices and let D be an N × P matrix.
Then the following five conditions are satisfied:
(a) ‖A‖ ≥ 0, with equality if and only if A = 0.
(b) ‖A + B‖ ≤ ‖A‖ + ‖B‖.
(c) ‖αA‖ ≤ |α| ‖A‖ for α ∈ C.
(d) |Av| ≤ ‖A‖ |v| for all v ∈ C^N.
(e) ‖AD‖ ≤ ‖A‖ ‖D‖.

The proofs of these results follow immediately from the definition of the matrix norm
in (A.1.2).
An important consequence of inequality (e) in Proposition A.1.2 is that for a square
matrix A, i.e., one for which M = N, we have

‖A^n‖ ≤ ‖A‖^n.    (A.1.3)
Scalar Products

The ℓ_2 norm has several useful properties not shared by the other two vector norms just
given. These properties are a consequence of the ℓ_2 norm having an associated scalar
product on C^M given by

⟨v, w⟩ = Σ_{j=1}^M v̄_j w_j.    (A.1.4)

We have
|v|_2 = ⟨v, v⟩^{1/2}

and
4 Re ⟨v, w⟩ = |v + w|_2^2 − |v − w|_2^2.

We also have
⟨v, Aw⟩ = ⟨A^* v, w⟩

for the transpose matrix of A. Vectors v and w are said to be orthogonal if

⟨v, w⟩ = 0.

There is a difference in terminology that should be pointed out here. The scalar
product ⟨·, ·⟩ is sometimes called a hermitian product, whereas the term scalar product
is used to describe a product like (A.1.4) but without the conjugate on the v_j. Also, the
transpose is often called the conjugate transpose or adjoint.
Unitary Matrices

One of the most useful properties of the ℓ_2 norm is that there is a large class of matrices
that leave the norm invariant.

Proposition A.1.3. For a square N × N matrix U, the following statements are equiv-
alent:
(a) U^*U = I.
(b) |Uv|_2 = |v|_2 for all v ∈ C^N.

Matrices satisfying the conditions of Proposition A.1.3 are said to be unitary. Unitary
matrices whose elements are all real numbers are called orthogonal matrices.

Proof of Proposition A.1.3. In terms of the components of the matrix U, condition
(a) is

Σ_{i=1}^N ū_{ij} u_{ik} = 1 if j = k, and 0 if j ≠ k.

This shows that the columns of U are vectors of unit norm that are orthogonal to each other.
To prove condition (b) from condition (a), we have

|Uv|^2 = Σ_{i=1}^N ( Σ_{j=1}^N u_{ij} v_j ) ( Σ_{k=1}^N ū_{ik} v̄_k )
       = Σ_{j=1}^N Σ_{k=1}^N v_j v̄_k Σ_{i=1}^N u_{ij} ū_{ik}
       = Σ_{j=1}^N |v_j|^2 = |v|^2,

using the orthogonality of the columns of U.
To show that condition (b) implies condition (a), we first take v = e_j, the vector whose
only nonzero component is a 1 for the jth component. Then

1 = |e_j|^2 = |Ue_j|^2 = Σ_{i=1}^N ū_{ij} u_{ij}

for each j. Second, let v = e_j + αe_k for j ≠ k and α a complex number of absolute
value 1; then

2 = |v|^2 = |Uv|^2 = Σ_{i=1}^N |u_{ij} + αu_{ik}|^2 = 2 + 2 Re ( α Σ_{i=1}^N ū_{ij} u_{ik} ).

This implies that

Re ( α Σ_{i=1}^N ū_{ij} u_{ik} ) = 0

for all values of α, which means that the complex number given by the summation must
be zero. This proves that condition (a) follows from condition (b).
For both the ℓ_1 and ℓ_∞ norms, the class of matrices leaving the norm invariant is
much smaller.

Proposition A.1.4. If P is a matrix such that

|Pv| = |v|

for all vectors v, where the norm is either the ℓ_1 or ℓ_∞ norm, then P is a complex
permutation matrix; i.e., there is only one nonzero element in each row and column of P,
and each nonzero element has magnitude 1.
Proof. We give the proof only for the ℓ_1 norm; the proof for the ℓ_∞ norm is
similar. First, let v = e_j; then

1 = |Pe_j| = Σ_{i=1}^N |p_{ij}|,

so each column of P has norm 1. Next let

v = e_j + αe_k,

where |α| = 1 and j ≠ k. Then

|v| = 2 = |Pv| = Σ_{i=1}^N |p_{ij} + αp_{ik}| ≤ Σ_{i=1}^N (|p_{ij}| + |p_{ik}|) = 2.

Thus the preceding inequality must be an equality, i.e., |p_{ij} + αp_{ik}| = |p_{ij}| + |p_{ik}| for
each value of i. But this can be true for α equal to both 1 and −1 only if either p_{ij} or p_{ik}
is zero. Therefore, we conclude that p_{ij} = 0 if p_{ik} ≠ 0, and vice versa. Since each column
has norm 1, each column has at least one nonzero element, and the proposition is proved.
The ℓ_2 norm is similar to the norm on the space L_2(hZ) defined in Chapter 1. We
use the notation L_2 to refer to norms in which the grid parameter h is used, and use the
notation ℓ_2 when h is not used.
Eigenvalues

Associated with every square N × N matrix A are numbers called eigenvalues. An
eigenvalue λ is characterized by having A − λI be a singular matrix. An eigenvector
associated with the eigenvalue λ is a nontrivial vector v such that (A − λI)v = 0.
A generalized eigenvector is a nonzero vector such that (A − λI)^k v = 0 for some
integer k. An eigenvalue λ is a simple eigenvalue if any two of its eigenvectors are
multiples of each other. An eigenvalue λ is a semisimple eigenvalue if the only vectors
satisfying (A − λI)^k v = 0 are actual eigenvectors, i.e., satisfy (A − λI)v = 0. We denote
the set of eigenvalues of a matrix A by σ(A).
An important result using unitary matrices is Schur's lemma.
Proposition A.1.5. Schur's Lemma. For each N × N matrix A, there exists a unitary
matrix U such that
U^*AU = T

is an upper triangular matrix, i.e.,

T_{ij} = 0 if i > j.
Proof. Let v_1 be an eigenvector of A with unit ℓ_2 norm and with eigenvalue t_{11},
i.e.,
Av_1 = t_{11} v_1.

The matrix
A^{(2)} = (I − v_1 v_1^*) A

maps the space of vectors orthogonal to v_1 into itself. By considering A^{(2)} on the subspace
of vectors orthogonal to v_1, we see that A^{(2)} has an eigenvalue t_{22} and eigenvector v_2
of unit norm such that v_2 is orthogonal to v_1, i.e.,

A^{(2)} v_2 = t_{22} v_2

or
Av_2 = t_{22} v_2 + t_{12} v_1,

where
t_{12} = v_1^* A v_2 = ⟨v_1, Av_2⟩.

By setting
A^{(3)} = (I − v_2 v_2^*) A^{(2)},

we may continue the process, obtaining vectors v_j with

Av_j = Σ_{i=1}^j t_{ij} v_i.

Defining the matrix U as that whose ith column is the vector v_i, we have

AU = UT,

where T_{ij} = t_{ij}, and so T is upper triangular and U is unitary.
Formulas for Matrix Norms

We now prove some formulas for explicitly evaluating and estimating the matrix norms.
First, we define the spectral radius of a square matrix A to be the largest of the magnitudes
of the eigenvalues, i.e.,

$$\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|.$$

The estimate ρ(A) ≤ ‖A‖ holds for any matrix norm; see Exercise A.2.1.

Proposition A.1.6. If A maps (C^M, |·|_p) to (C^N, |·|_p), then

$$\|A\| = \max_{1 \le j \le M} \sum_{i=1}^{N} |a_{ij}| \quad \text{if } p = 1,$$

$$\|A\| = \max_{1 \le i \le N} \sum_{j=1}^{M} |a_{ij}| \quad \text{if } p = \infty,$$

and

$$\|A\| = \rho(A^* A)^{1/2} \quad \text{if } p = 2.$$

Proof. For p equal to 1, we have

$$|Av|_1 = \sum_{i=1}^{N} \Big| \sum_{j=1}^{M} a_{ij} v_j \Big| \le \sum_{j=1}^{M} \sum_{i=1}^{N} |a_{ij}|\, |v_j| \le \Big( \max_{j} \sum_{i=1}^{N} |a_{ij}| \Big) |v|_1.$$

This shows that ‖A‖ is at most equal to the quantity given in the proposition. We can
prove that equality holds by choosing v = e_k, where

$$\sum_{i=1}^{N} |a_{ik}| = \max_{j} \sum_{i=1}^{N} |a_{ij}|.$$

We see that

$$|A e_k|_1 = \sum_{i=1}^{N} |a_{ik}|.$$

Thus the proposition is proved for p equal to 1.


For p infinite, we have

$$|Av|_\infty = \max_{i} \Big| \sum_{j=1}^{M} a_{ij} v_j \Big| \le \max_{i} \sum_{j=1}^{M} |a_{ij}|\, |v_j| \le \Big( \max_{i} \sum_{j=1}^{M} |a_{ij}| \Big) \max_{j} |v_j|,$$

which shows that ‖A‖ is bounded above by the expression in the proposition. To show
that equality is obtained, we choose k such that

$$\sum_{j=1}^{M} |a_{kj}| = \max_{i} \sum_{j=1}^{M} |a_{ij}|$$

and set

$$v_j = \begin{cases} 0 & \text{if } a_{kj} = 0, \\ \overline{a_{kj}} / |a_{kj}| & \text{otherwise.} \end{cases}$$

It is easy to check that

$$|Av|_\infty = \Big( \sum_{j=1}^{M} |a_{kj}| \Big) |v|_\infty,$$

which proves the proposition in this case.



For the case p equal to 2, we have

$$|Av|^2 = \langle Av, Av \rangle = \langle v, A^* A v \rangle,$$

and by Schur's lemma there is a unitary matrix U such that

$$U^* A^* A U = D$$

is upper triangular. But D* = D, and thus D is diagonal. Therefore, setting w = U*v,

$$0 \le |Av|^2 = \langle v, U D U^* v \rangle = \langle U^* v, D U^* v \rangle = \langle w, Dw \rangle;$$

as v ranges over all vectors, so does w, and hence ⟨w, Dw⟩ ≥ 0 for all vectors w.

For a diagonal matrix with d_i being the ith element on the main diagonal, we have

$$\langle w, Dw \rangle = \sum_i d_i |w_i|^2,$$

and so we see that each d_i is nonnegative; moreover,

$$|Av|^2 \le \big( \max_i d_i \big) |w|^2,$$

and since

$$|w| = |U^* v| = |v|,$$

we have that

$$\|A\|^2 \le \max_i d_i.$$

The d_i are the eigenvalues of A*A; moreover, by choosing w to be an eigenvector
of D whose eigenvalue has maximum magnitude, the proposition is easily proved.
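As a quick numerical illustration of these formulas (a sketch of ours, not from the text, assuming NumPy is available), the script below evaluates the three expressions of Proposition A.1.6 and compares them with the operator norms computed by numpy.linalg.norm:

```python
import numpy as np

# A random complex matrix A mapping C^M to C^N.
rng = np.random.default_rng(0)
N, M = 4, 3
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

# The three formulas of Proposition A.1.6.
norm_1 = np.abs(A).sum(axis=0).max()        # maximum column sum, p = 1
norm_inf = np.abs(A).sum(axis=1).max()      # maximum row sum, p = infinity
norm_2 = np.sqrt(np.linalg.eigvalsh(A.conj().T @ A).max())  # rho(A*A)^(1/2)

# Compare with NumPy's built-in operator norms.
print(np.isclose(norm_1, np.linalg.norm(A, 1)))         # True
print(np.isclose(norm_inf, np.linalg.norm(A, np.inf)))  # True
print(np.isclose(norm_2, np.linalg.norm(A, 2)))         # True
```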

A.2 Analytic Functions of Matrices

We also need to define analytic functions of square matrices. (See Appendix C.) For an
analytic function with power series expansion

$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$$

around a point z_0 in the complex plane, we define f(A) for a square matrix A as

$$f(A) = \sum_{n=0}^{\infty} a_n (A - z_0 I)^n.$$

In particular,

$$e^A = \sum_{n=0}^{\infty} \frac{1}{n!} A^n.$$

The convergence of this series is proved in a manner similar to proving that e^z exists for
all complex numbers z. We have

$$\|e^A\| \le \sum_{n=0}^{\infty} \frac{1}{n!} \|A^n\| \le \sum_{n=0}^{\infty} \frac{1}{n!} \|A\|^n = e^{\|A\|}.$$

Thus the exponential of a matrix is always defined.
We can also define e^{tA} as the unique solution to the matrix differential equation

$$\frac{dX}{dt} = AX, \qquad X(0) = I. \tag{A.2.1}$$

This equation can also be viewed as a linear ordinary differential equation in the vector
space of matrices. Because linear systems of ordinary differential equations have unique
solutions, e^{tA} is the unique solution to (A.2.1) and to the equation

$$\frac{dX}{dt} = XA, \qquad X(0) = I.$$

It is important to realize that in general

$$e^B e^A \ne e^A e^B \ne e^{A+B}.$$

If A and B commute, i.e., AB = BA, then

$$e^A e^B = e^{A+B}.$$

Another useful formula is

$$e^{S^{-1} A S} = S^{-1} e^A S$$

for any invertible matrix S.
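These identities are easy to observe numerically. The sketch below (our illustration, not from the text; it assumes NumPy and SciPy are available) contrasts a noncommuting pair with a commuting one and checks the similarity formula:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[1.0, 2.0], [0.0, 1.0]])

# A and B do not commute, so exp(A) exp(B) differs from exp(A + B).
print(np.allclose(expm(A) @ expm(B), expm(A + B)))          # False

# A commutes with 2A, so the identity does hold for that pair.
print(np.allclose(expm(A) @ expm(2.0 * A), expm(3.0 * A)))  # True

# The similarity formula e^{S^{-1} A S} = S^{-1} e^A S.
S = np.array([[2.0, 1.0], [1.0, 1.0]])
Sinv = np.linalg.inv(S)
print(np.allclose(expm(Sinv @ A @ S), Sinv @ expm(A) @ S))  # True
```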

The Spectral Mapping Theorem

The spectral mapping theorem for matrices is the statement that if f is an analytic function
defined on a set containing σ(A), then

$$f(\sigma(A)) = \sigma(f(A)). \tag{A.2.2}$$

This result is an immediate consequence of the observation that if λ is an eigenvalue of
A, then f(λ) is an eigenvalue of f(A).
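For example, relation (A.2.2) with f(z) = e^z can be observed numerically (a sketch of ours, assuming NumPy and SciPy):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# The eigenvalues of e^A are the exponentials of the eigenvalues of A.
lhs = np.sort_complex(np.linalg.eigvals(expm(A)))
rhs = np.sort_complex(np.exp(np.linalg.eigvals(A)))
print(np.allclose(lhs, rhs))   # True
```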

Square Roots of Matrices

In our discussion of boundary conditions for parabolic systems we need the following result.

Proposition A.2.1. Let B be an N × N matrix whose eigenvalues λ_ν, ν = 1, ..., N,
satisfy

$$\operatorname{Re} \lambda_\nu > 0.$$

Then there exists a unique N × N matrix C such that

$$C^2 = B$$

and whose eigenvalues μ_ν, ν = 1, ..., N, satisfy

$$\operatorname{Re} \mu_\nu > 0.$$

Proof. Let O be a unitary matrix such that B̃ = OBO* is an upper triangular
matrix with elements (b̃_ij). The upper triangular matrix C̃ = (c̃_ij) is defined by

$$\tilde{C}^2 = \tilde{B}.$$

This means that the diagonal elements of C̃ satisfy

$$\tilde{c}_{ii}^2 = \tilde{b}_{ii},$$

and we choose c̃_ii with real part positive. This is possible, since none of the b̃_ii are
negative real numbers. The off-diagonal elements for j > i satisfy

$$\tilde{c}_{ij} (\tilde{c}_{ii} + \tilde{c}_{jj}) + \sum_{k=i+1}^{j-1} \tilde{c}_{ik} \tilde{c}_{kj} = \tilde{b}_{ij}.$$

These equations uniquely determine the c̃_ij, since c̃_ii + c̃_jj is nonzero. Then

$$C = O^* \tilde{C} O$$

is the matrix whose existence is asserted in the proposition.
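The proof is constructive and translates directly into an algorithm: reduce B to Schur form, take the square roots of the diagonal entries in the right half-plane, and solve for the off-diagonal entries one superdiagonal at a time. The sketch below (our own illustration, assuming SciPy; the function name is ours) follows this recipe:

```python
import numpy as np
from scipy.linalg import schur

def sqrt_right_half_plane(B):
    # Complex Schur form: B = Z T Z^*, with T upper triangular.
    T, Z = schur(np.asarray(B, dtype=complex), output='complex')
    n = T.shape[0]
    C = np.zeros_like(T)
    # Diagonal entries: square roots with positive real part.
    for i in range(n):
        C[i, i] = np.sqrt(T[i, i])   # principal branch has Re >= 0
    # Off-diagonal entries from
    # C[i,j](C[i,i] + C[j,j]) + sum_{k=i+1}^{j-1} C[i,k] C[k,j] = T[i,j],
    # computed one superdiagonal at a time.
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            s = T[i, j] - C[i, i + 1:j] @ C[i + 1:j, j]
            C[i, j] = s / (C[i, i] + C[j, j])
    return Z @ C @ Z.conj().T

B = np.array([[4.0, 1.0], [0.5, 9.0]])   # eigenvalues in the right half-plane
C = sqrt_right_half_plane(B)
print(np.allclose(C @ C, B))             # True
```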

Positive Definite, Hermitian, and Symmetric Matrices

We say a matrix A is positive semidefinite and write A ≥ 0 if ⟨v, Av⟩ ≥ 0 for all vectors
v. Using this notion, matrices can be given a partial ordering. We say A ≥ B if A − B ≥ 0.
The usual rules for an ordering relation hold for this ordering of matrices. For example,
if A ≥ 0, then αA ≥ 0 for any positive real number α, and A ≥ 0 and B ≥ 0 imply
A + B ≥ 0. We also define A ≤ B if B ≥ A.
We say a matrix is positive definite and write A > 0 if A ≥ εI for some positive
number ε. A matrix A is said to be negative definite or negative semidefinite if −A is
positive definite or positive semidefinite, respectively.
An important class of matrices are those such that A* = A. These matrices are called
hermitian matrices. For any matrix A we define its real part Re A as ½(A* + A); Re A
is a hermitian matrix. If all the components of a hermitian matrix are real numbers, then
the matrix is called a symmetric matrix.

Proposition A.2.2. If Re A ≤ cI, then ‖e^{At}‖₂ ≤ e^{ct}.

Proof. Let Z(t) = e^{At}. Then dZ/dt = AZ, and so

$$\frac{d}{dt} \big( Z^*(t) Z(t) \big) = \Big( \frac{dZ}{dt} \Big)^* Z + Z^* \frac{dZ}{dt} = Z^* A^* Z + Z^* A Z = 2\, Z^* (\operatorname{Re} A)\, Z \le 2c\, Z^*(t) Z(t).$$

This implies

$$\frac{d}{dt} \big[ e^{-2ct} Z^*(t) Z(t) \big] \le 0.$$

By integrating this inequality, we obtain

$$e^{-2ct} Z^*(t) Z(t) - I \le 0$$

or, equivalently,

$$Z^*(t) Z(t) \le e^{2ct} I.$$

Therefore,

$$\|Z(t)\|_2 \le e^{ct}.$$
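Numerically (a sketch of ours, assuming NumPy and SciPy), the smallest admissible c is the largest eigenvalue of the hermitian part (A + A*)/2, and the bound can be observed directly:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
# Smallest c with Re A <= cI: the largest eigenvalue of the hermitian part.
c = np.linalg.eigvalsh((A + A.conj().T) / 2.0).max()

for t in [0.1, 0.5, 1.0, 2.0]:
    print(np.linalg.norm(expm(t * A), 2) <= np.exp(c * t) + 1e-12)  # True
```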

Proposition A.2.3. If all the eigenvalues of a matrix A have positive real part, then there
exists a matrix S such that

$$\operatorname{Re} S A S^{-1} \ge 0.$$

Proof. By Schur's lemma,

$$U A U^* = T$$

is upper triangular for a suitable unitary matrix U. Let D(δ) be a diagonal matrix with
D_ii = δ^i for i = 1, ..., M. Then D⁻¹TD has elements

$$\delta^{-i} t_{ij} \delta^{j} = t_{ij} \delta^{j-i}.$$

Therefore,

$$\operatorname{Re} \langle v, D^{-1} T D v \rangle = \operatorname{Re} \sum_{i=1}^{M} t_{ii} |v_i|^2 + O(\delta),$$

and for δ small enough

$$\operatorname{Re} \langle v, D^{-1} T D v \rangle > 0,$$

since each eigenvalue satisfies Re t_ii > 0. Then

$$S = D^{-1} U$$

is the desired matrix.
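Again the proof is an algorithm. The sketch below (ours, assuming SciPy; the function name and the choice δ = 0.05 are ours) forms S = D(δ)⁻¹U and checks that the hermitian part of SAS⁻¹ is positive definite:

```python
import numpy as np
from scipy.linalg import schur

def scaling_matrix(A, delta=0.05):
    # Schur form A = Z T Z^*, so U = Z^* satisfies U A U^* = T.
    T, Z = schur(np.asarray(A, dtype=complex), output='complex')
    n = T.shape[0]
    D = np.diag(delta ** np.arange(1, n + 1))   # D_ii = delta^i
    return np.linalg.inv(D) @ Z.conj().T        # S = D^{-1} U

A = np.array([[2.0, 30.0], [0.0, 1.0]])   # eigenvalues 2 and 1
S = scaling_matrix(A)
M = S @ A @ np.linalg.inv(S)
print(np.linalg.eigvalsh((M + M.conj().T) / 2.0))  # all positive for small delta
```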

Proposition A.2.4. If the eigenvalues of A satisfy Re λ(A) > 0, then there are positive
constants C₀ and ε such that

$$\|e^{-tA}\| \le C_0 e^{-\varepsilon t}.$$

Proof. By Proposition A.2.3 (and its proof), there is a matrix S such that Ã = SAS⁻¹
satisfies Re Ã ≥ εI for some positive ε. By Proposition A.2.2 applied to −Ã,

$$\|e^{-t\tilde{A}}\| \le e^{-\varepsilon t}.$$

Since e^{-tA} = S⁻¹ e^{-tÃ} S, we have

$$\|e^{-tA}\| \le \|S^{-1}\|\, \|S\|\, e^{-\varepsilon t},$$

which proves the proposition.

Exercises

A.2.1. Show that

$$\rho(A) \le \|A\|$$

for a square matrix from (C^N, |·|) to itself for any vector norm.

A.2.2. If A is an N × M matrix considered as a map from (C^N, |·|₁) to (C^M, |·|_∞),
show that

$$\|A\| = \sup_{i,j} |a_{ij}|.$$

A.2.3. If B is an N × M matrix considered as a map from (C^N, |·|_∞) to (C^M, |·|₁),
show that

$$\|B\| \le \sum_{i=1}^{M} \sum_{j=1}^{N} |b_{ij}|.$$

Show that equality holds in this estimate only if

$$b_{ij}\, b_{k\ell}\, \overline{b_{i\ell}\, b_{kj}}$$

is nonnegative for all indices (i, j) and (k, ℓ).

A.2.4. Show that

$$e^{tA} e^{tB} = e^{t(A+B)}$$

for all t if and only if A and B commute. Hint: Take the second derivative of
each side.

A.2.5. Show that

$$\sin\left( t \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right) = \sinh t \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$

A.2.6. The trace of a square N × N matrix is defined by

$$\operatorname{Tr}(A) = \sum_{i=1}^{N} a_{ii}.$$

It is easy to show that

$$\operatorname{Tr}(C A C^{-1}) = \operatorname{Tr}(A)$$

for any invertible matrix C. Show that

$$\det e^A = e^{\operatorname{Tr}(A)},$$

and thus e^A is invertible for every matrix A. Hint: Use Schur's lemma.

A.2.7. Show that if S satisfies S* = −S, then e^S is unitary.

A.2.8. Show that there are 2^N solutions to X² = B if B is a nonsingular N × N matrix.
Show that there are no solutions to

$$X^2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Show that there are infinitely many solutions to

$$X^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

A.2.9. Verify the matrix factorization formula

$$A^n - B^n = \sum_{\ell=0}^{n-1} A^{n-1-\ell} (A - B) B^{\ell}. \tag{A.2.3}$$

A.2.10. Show that if A is an N × N matrix and B is an M × M matrix with Re λ_i(A) +
Re λ_j(B) > 0 for every i = 1, ..., N and j = 1, ..., M, then the solution to

$$AX + XB = C$$

is given by

$$X = \int_0^{\infty} e^{-At} C e^{-Bt}\, dt.$$

A.2.11. Prove the spectral mapping theorem given by relation (A.2.2).

A.2.12. Use the relation ρ(Aⁿ) = ρ(A)ⁿ, derived from the spectral mapping theorem, to
prove that

$$\lim_{n \to \infty} \|A^n\| = 0 \quad \text{if and only if} \quad \rho(A) < 1.$$

Appendix B

A Survey of Real Analysis

This appendix is a survey of some basic concepts of real analysis. The selection of topics
is based upon the demands of the text and is not intended to be exhaustive.

B.1 Topological Concepts


One of the most basic concepts of analysis is that of an open set. A set O in C^n is an
open set if for each point x₀ in O there is a positive real number ε such that the set
{x : |x − x₀| < ε} is contained in O. The norm |·| on C^n may be any of the vector
space norms discussed in Appendix A.
A set F is closed if its complement, written ∼F, is open. A compact set is any
set K such that if K is contained in the union of a collection of open sets, then there is a
finite subcollection of these open sets whose union contains K. In C^n compact sets are
sets that are closed and bounded.
Several important properties of open sets are that the union of any collection of open
sets is open, and the intersection of a finite number of open sets is also open. The empty
set is open by definition.
A function f from C^n to C^m is continuous at a point x₀ if for each positive number
ε there is a number δ such that |f(x) − f(x₀)| < ε whenever |x − x₀| < δ. A function
is continuous if it is continuous at each point in its domain. A continuous function may
also be characterized as one such that f⁻¹(O) is an open set for each open set O in C^m,
where f⁻¹(O) = {y : f(y) ∈ O}.

B.2 Measure Theory


If {f_n}_{n=0}^∞ is a sequence of continuous functions on C^n such that for each x in C^n the
sequence {f_n(x)}_{n=0}^∞ converges, it need not be that the function f, given by f(x) =
lim_{n→∞} f_n(x), is continuous. The function f(x) is called the pointwise limit of the
sequence {f_n}_{n=0}^∞.
It is useful to consider a class of functions that contains the continuous functions but
also contains pointwise limits of sequences from the class. A very useful class of functions
that has this property is the class of measurable functions.
Before defining a measurable function we must define a measurable set. To do this
we begin with the class of Borel sets. The collection of Borel sets is the collection B of sets
containing all the open and closed sets and also containing any countable union or countable
intersection of sets in B. Note that if the sets A and B are in B, then A∖B, which is
{x : x ∈ A and x ∉ B}, is also in B. The set of Borel sets is an example of a σ-algebra.
A measure is a function that assigns to sets a real number or infinity. The measure of
a set generalizes the notion of the length, area, or volume of the set. For convenience, we
restrict our discussion at this point to the real line. On the real line a Borel measure is a
function µ defined for each interval; the value of µ for an interval (a, b) will be written
as µ(a, b). Except for trivial cases, a measure cannot be defined for all subsets of the real
line; it can, however, be defined for all Borel sets. The basic property satisfied by a measure
is that it be countably additive. This means that if {M_i}_{i=1}^∞ is a countable collection of
disjoint Borel sets, then

$$\mu\Big( \bigcup_{i=1}^{\infty} M_i \Big) = \sum_{i=1}^{\infty} \mu(M_i). \tag{B.2.1}$$

As a consequence of the countable additivity (B.2.1), it follows that if {M_i}_{i=1}^∞ is a collection
of Borel sets that satisfy M_{i+1} ⊆ M_i for each i, and µ(M₁) is finite, then the measure of the
intersection M = ∩_{i=1}^∞ M_i is given by

$$\mu(M) = \lim_{i \to \infty} \mu(M_i).$$

It can be shown that the countable additivity condition (B.2.1) completely determines the
(Borel) measure if µ(a, b) is defined for each open interval (a, b).
The usual measure on the real line is defined by µ(a, b) = b − a. Lebesgue measure
is the completion of this Borel measure; the completion is that if Z is any subset of a Borel
set A and µ(A) is zero, then µ(Z) is defined to be zero also. The σ -algebra formed
from the Borel sets and these sets of measure zero is the collection of Lebesgue measurable
sets. Unless we explicitly state otherwise, we restrict our discussion to Lebesgue measure
in the rest of this appendix.
If F is a monotone increasing function on R, then one can define the measure µF
by
µF (a, b) = F (b) − F (a).
This is an example of a Stieltjes measure. If F is continuous and strictly monotone, then
the completion of µF determines the same collection of measurable sets as does Lebesgue
measure.

B.3 Measurable Functions


A measurable function on R is a function f such that f⁻¹(a, b) is a measurable set for
each open interval (a, b). This definition is easily seen to be an extension of the concept of
a continuous function; in particular, each continuous function is measurable. As with con-
tinuous functions, the sum and product of measurable functions are also measurable. Among
the important properties of measurable functions is that pointwise limits of a sequence of
measurable functions are measurable. That is, if {f_n}_{n=1}^∞ is a sequence of measurable
functions and f is the pointwise limit of the sequence, i.e.,

$$f(x) = \lim_{n \to \infty} f_n(x), \tag{B.3.1}$$

for each x, then f is also a measurable function. Similarly, the function formed by taking
the pointwise supremum, sup_n f_n(x), is also measurable.
Measurable functions need be defined only to within sets of measure zero. If f and
g are measurable functions, but they differ only on a set of measure zero, then they are
equivalent for most purposes in the theory. Similarly, if the limit (B.3.1) holds for all x
except for x in a set of measure zero, then the function f is still a measurable function.
The convergence (B.3.1) is said to be convergence almost everywhere (a.e.) if it holds for
all x except for those in a set of measure zero.

B.4 Lebesgue Integration

One of the most powerful uses of measurable functions is in the definition of Lebesgue
integration. For any set A the characteristic function of A, χ_A, is defined by

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$

The function χ_A is a measurable function only if A is a measurable set. A simple function
φ is one that can be represented as

$$\varphi = \sum_{i=1}^{N} \alpha_i \chi_{A_i} \tag{B.4.1}$$

for a finite number of measurable sets A_i and real numbers α_i. The representation is not
unique. For each simple function φ represented as in (B.4.1), the integral of φ is defined
by

$$\int \varphi = \sum_{i=1}^{N} \alpha_i \mu(A_i)$$

whenever the sum is defined. (The sum is not defined if there are sets A_i and A_j that have
infinite measure and the corresponding α_i and α_j have opposite signs.) It is straightfor-
ward to show that the definition of the integral is independent of the representation of the
simple function.
For any nonnegative measurable function f, the integral of f over R is defined by

$$\int f = \sup_{0 \le \varphi \le f} \int \varphi,$$

where the supremum is over simple nonnegative functions. For any measurable function
f, the integral is defined by

$$\int f = \int f_+ - \int f_-,$$

where f = f₊ − f₋ and f₊ and f₋ are nonnegative measurable functions. The integral
of a measurable function f over a measurable set A is defined by

$$\int_A f = \int f \chi_A.$$

A function f is said to be integrable if ∫f is defined.
This definition of the integral gives the same value as the Riemann integral when f
is a continuous function and the set A is a finite interval; thus we may write

$$\int_{(a,b)} f = \int_a^b f(x)\, dx.$$

The choice of notation for the integral in formulas is arbitrary and will depend on the
particular application.
The basic result relating the integral of a limit of a sequence of measurable functions
to the sequence of integrals is Fatou's lemma. Fatou's lemma is easily proved using the
basic definitions.

Proposition B.4.1. Fatou's Lemma. If {f_n}_{n=1}^∞ is a sequence of nonnegative integrable
functions that converges almost everywhere to a measurable function f, then

$$\int f \le \liminf_{n \to \infty} \int f_n.$$
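The inequality can be strict. A standard example (ours, not from the text): on R with Lebesgue measure, let f_n = n χ_{(0,1/n)}. Then f_n → 0 almost everywhere while ∫f_n = 1 for every n, so

$$\int f = 0 < 1 = \liminf_{n \to \infty} \int f_n.$$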

In Chapter 10 we require the Lebesgue dominated convergence theorem, which relies
on Fatou's lemma.

Proposition B.4.2. Lebesgue Dominated Convergence Theorem. If {f_n}_{n=1}^∞ is a
sequence of integrable functions that converges almost everywhere to a function f, and if
there is an integrable function F such that |f_n| ≤ F for all n, then

$$\int f = \lim_{n \to \infty} \int f_n.$$

The proof depends on Fatou's lemma applied to the function sequences {F + f_n}_{n=1}^∞
and {F − f_n}_{n=1}^∞.
Functions that take on complex values are measurable if both the real and imaginary
parts are measurable; a similar statement is true for the integrals of such functions.
On R^n, Lebesgue measure is defined by starting with Cartesian products of intervals
and defining the measure as the usual volume of the region.

B.5 Function Spaces

One advantage of Lebesgue integration over Riemann integration, one that is very important
in the application of this text, concerns the set of functions L²(R) and the Fourier transform.
The space L²(R) consists of those functions such that

$$\|f\|_{L^2} = \Big( \int |f|^2 \Big)^{1/2}$$

is finite. The quantity ‖f‖_{L²} is a norm, which satisfies the properties of Proposition
A.1.1, on the vector space L²(R). Actually, L²(R) is composed of equivalence classes of
functions. Two functions f₁ and f₂ are equivalent if ‖f₁ − f₂‖ is zero.
The Fourier transform of a function f in L²(R) is given by

$$\hat{f}(\omega) = \lim_{K \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-K}^{K} e^{-i\omega x} f(x)\, dx.$$

The Fourier transform, as the pointwise limit of the continuous functions

$$\frac{1}{\sqrt{2\pi}} \int_{-K}^{K} e^{-i\omega x} f(x)\, dx,$$

is a measurable function and, moreover, is also in L²(R). By Parseval's relation and
the Fourier inversion formula (see Chapter 2), the Fourier transform is a one-to-one and
onto mapping of L²(R) to itself. This statement does not hold if we consider Riemann
integration in place of Lebesgue integration.
Other spaces of functions of some interest in the text are L¹(R) and L^∞(R). The
norms are defined by

$$\|f\|_1 = \int |f|$$

for L¹(R) and

$$\|f\|_\infty = \operatorname{ess\,sup}_x |f(x)|$$

for L^∞(R). The essential supremum of |f|, written ess sup_x |f(x)|, is the infimum of the
supremum of |g(x)| for all measurable functions g that are equal to f almost everywhere.

Appendix C

A Survey of Results from Complex Analysis

This appendix gives the basic concepts of complex analysis and a few of the principal results
that we need in the text.

C.1 Basic Definitions

A function f is an analytic function in a domain U in the complex domain C if at each
point of U, f has a power series expansion with a nonzero radius of convergence. An
equivalent definition is that f has a derivative, defined by

$$f'(z) = \lim_{\varepsilon \to 0} \frac{f(z + \varepsilon) - f(z)}{\varepsilon}, \tag{C.1.1}$$

at each point z in U. The derivative defined by (C.1.1) must be independent of the way the
complex parameter ε tends to zero. If f is written as u + iv, for real functions u and
v, and if ε in (C.1.1) is taken alternatively to be real and purely imaginary, we conclude
that f′ is well defined if and only if u and v satisfy the Cauchy–Riemann equations

$$\frac{\partial u}{\partial x}(x, y) = \frac{\partial v}{\partial y}(x, y), \qquad \frac{\partial u}{\partial y}(x, y) = -\frac{\partial v}{\partial x}(x, y),$$

where z = x + iy. The Cauchy–Riemann equations imply that u and v are harmonic
functions (see Chapter 12). Examples of analytic functions are polynomials, the trigono-
metric functions, the exponential function, and functions built up from them. For example,
compositions of analytic functions are also analytic functions. Specific examples of analytic
functions are the functions given by the expressions

$$\sin z, \qquad \frac{\cos z}{1 + z^3}, \qquad \sqrt[5]{\ln(1 + 2e^z)}.$$

These functions are analytic functions in any region in which they are single valued and
finite. In particular, since ln re^{iθ} = ln r + iθ, the logarithm is analytic only in regions
that exclude the origin and for which the value of θ can be well defined. The formula
e^{iz} = cos z + i sin z relating the exponential function with the sine and cosine functions is
a basic result of great significance.


C.2 Complex Integration

Integrals of analytic functions along a curve are defined using either Riemann sums or
Lebesgue integration along the curve. The most important result concerning integrals of
analytic functions is that if Γ is a closed curve in a domain U and f is analytic in U,
then the integral of f along Γ is zero, i.e.,

$$\int_\Gamma f = 0.$$

This result is called Cauchy's theorem. An equivalent formulation of Cauchy's theorem is
that if Γ₁ and Γ₂ are two curves with the same endpoints, then

$$\int_{\Gamma_1} f = \int_{\Gamma_2} f$$

if the function f is analytic in the region bounded by the two curves. In particular, for n
not equal to −1, we have

$$\int_a^b z^n\, dz = \frac{b^{n+1} - a^{n+1}}{n + 1}$$

for any two complex numbers a and b.
for any two complex numbers a and b.
If f is analytic in a neighborhood of a point z₀, except at z₀ itself, and f can be
expanded as

$$f(z) = \sum_{\ell=-L}^{\infty} a_\ell (z - z_0)^\ell,$$

then the residue of f at z₀ is the coefficient a₋₁. As a consequence of Cauchy's theorem,
we have

$$\int_\Gamma f(z)\, dz = 2\pi i\, a_{-1} \tag{C.2.1}$$

for any curve Γ that winds once around z₀ in a counterclockwise direction. This result is
proved by replacing Γ by a circle around z₀ with small radius. Using the power series of
f, we may explicitly evaluate the integral.
In the special case when L in the expansion above is 1, the residue is determined by

$$a_{-1} = \lim_{z \to z_0} (z - z_0) f(z).$$

In this case f is said to have a simple pole at z₀.


Formula (C.2.1) is the basis of the calculus of residues to evaluate integrals. The
method is defined by the next proposition.

Proposition C.2.1. If f is analytic in the domain U bounded by the simple closed curve
?, except for a finite set of points z1 , . . . , zN at which f has residues r1 , . . . , rN , then

N
f (z) dz = 2π i r$ ,
? $=1

where the curve ? is taken in the counterclockwise direction.



Example C.2.1. As an example of Proposition C.2.1, we use it to compute the Fourier
transform of the function given by u(x) = (x² + 1)⁻¹. By formula (2.1.1) the Fourier
transform is

$$\hat{u}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega x} \frac{1}{x^2 + 1}\, dx.$$

To evaluate this integral, first note that the function u(z) = (z² + 1)⁻¹ is analytic in the
whole complex plane except at the two points i and −i. We first evaluate û(ω) for ω
positive. We consider the family of curves Γ_R given by the interval [−R, R] on the real
axis and the arc in the lower half-plane given by Re^{iθ} for θ in the interval [π, 2π]. The
residue of e^{−iωz}(z² + 1)⁻¹ at −i is given by

$$\lim_{z \to -i} \frac{(z + i)\, e^{-i\omega z}}{z^2 + 1} = \lim_{z \to -i} \frac{e^{-i\omega z}}{z - i} = \frac{e^{-\omega}}{-2i}.$$

When R is larger than 1, then Proposition C.2.1 states that

$$\int_{\Gamma_R} \frac{e^{-i\omega z}}{z^2 + 1}\, dz = 2\pi i\, \frac{e^{-\omega}}{-2i} = -\pi e^{-\omega}.$$

Moreover, in the limit as R tends to infinity, the value of the integral over the arc tends to
zero, as seen by the estimate

$$\left| \int_{\text{arc}} \frac{e^{-i\omega z}}{z^2 + 1}\, dz \right| \le \int_{\pi}^{2\pi} \frac{e^{-\omega R |\sin\theta|}}{R^2 - 1}\, R\, d\theta.$$

Since the integrand is bounded and tends to zero pointwise as R tends to infinity, the
Lebesgue dominated convergence theorem (see Appendix B) shows that in the limit the
integral over the arc is zero. Therefore, we obtain

$$\int_{\infty}^{-\infty} \frac{e^{-i\omega z}}{z^2 + 1}\, dz = -\pi e^{-\omega}.$$

By reversing the direction of the integration on the real line and dividing by the factor of
√(2π), we obtain the Fourier transform for ω positive. A similar analysis, but using an arc
in the upper half-plane, gives the value of û for negative values of ω. The final result is

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega x} \frac{1}{x^2 + 1}\, dx = \sqrt{\frac{\pi}{2}}\, e^{-|\omega|}.$$

See Exercise 2.1.1.
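The result is easy to corroborate numerically. The sketch below (ours, assuming SciPy; it uses the evenness of u to write the transform as a cosine integral) compares a quadrature of the transform with √(π/2) e^{−|ω|}:

```python
import numpy as np
from scipy.integrate import quad

def u_hat(omega):
    # u(x) = 1/(x^2 + 1) is even, so its Fourier transform is a cosine integral.
    val, _ = quad(lambda x: 1.0 / (x**2 + 1.0), 0, np.inf,
                  weight='cos', wvar=omega)
    return 2.0 * val / np.sqrt(2.0 * np.pi)

for omega in [0.5, 1.0, 2.0]:
    exact = np.sqrt(np.pi / 2.0) * np.exp(-abs(omega))
    print(abs(u_hat(omega) - exact) < 1e-8)   # True
```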

A special case of Proposition C.2.1 is the Cauchy integral formula

$$f(z) = \frac{1}{2\pi i} \int_\Gamma \frac{f(\zeta)}{\zeta - z}\, d\zeta \tag{C.2.2}$$

for any closed curve Γ winding once around the point z. If Γ is a circle of radius r,
formula (C.2.2) is equivalent to the formula

$$f(z) = \frac{1}{2\pi} \int_0^{2\pi} f(z + re^{i\theta})\, d\theta. \tag{C.2.3}$$

From (C.2.3) we obtain the result

$$|f(z)| \le \max_{|z - \zeta| = r} |f(\zeta)|,$$

with equality only if f is a constant. This result can be easily extended to prove the
following maximum principle.

Proposition C.2.2. The Maximum Principle. If f is analytic in a bounded set U, then
|f| attains its maximum on the boundary of U.

By applying Proposition C.2.2 to the analytic function e^{f(z)}, we can conclude that
the real part of f must also attain its maximum value on the boundary. This leads to an
alternate proof of Theorem 12.3.2 for the special case of Laplace's equation.
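Formula (C.2.3) is also easy to check numerically: averaging samples on the circle (the trapezoidal rule, which converges rapidly for periodic integrands) reproduces f(z). A sketch of ours, assuming NumPy:

```python
import numpy as np

def circle_average(f, z, r, m=64):
    # Approximates (1/2pi) * integral of f(z + r e^{i theta}) over [0, 2pi].
    theta = 2.0 * np.pi * np.arange(m) / m
    return np.mean(f(z + r * np.exp(1j * theta)))

z0, r = 0.3 + 0.2j, 1.5
print(circle_average(np.exp, z0, r))   # agrees with np.exp(z0)
print(np.exp(z0))
```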

C.3 A Phragmen–Lindelöf Theorem

The next two results are needed in Chapter 6 to prove Theorem 6.3.1. The first of these
is an example of a class of theorems called Phragmen–Lindelöf theorems. The proofs of
these results are an excellent illustration of the power of the methods of complex analysis.
A Phragmen–Lindelöf theorem states that if an analytic function satisfies a weak bound in
some unbounded domain and a stronger bound on the boundary, then the function actually
satisfies the stronger bound throughout the region.

Proposition C.3.1. If f is an analytic function in the quadrant Q₁ given by Re z ≥ 0
and Im z ≥ 0 and there are constants K and d such that

$$|f(z)| \le K e^{d|z|} \quad \text{for } z \in Q_1$$

and

$$|f(z)| \le K e^{-|z|^2} \quad \text{for both } \operatorname{Re} z = 0 \text{ and } \operatorname{Im} z = 0,$$

then, in fact,

$$|f(z)| \le K e^{-|z|^2} \quad \text{for all } z \in Q_1.$$

Proof. We begin by considering the function

$$h(z) = \left( 1 + \frac{z^2 e^{i\phi}}{n} \right)^n \exp\left( -\varepsilon \left( \frac{z}{\sqrt{i}} \right)^{\alpha} \right) f(z),$$

where ε is any positive number, α is between 1 and 2, φ is an arbitrary real number, and
n is any positive integer. The square root of i is taken to be (1 + i)/√2. For the first part
of the proof, the parameters ε, φ, and n are fixed; later we will vary them as appropriate.
We first use the estimate

$$|h(z)| \le \left( 1 + \frac{|z|^2}{n} \right)^n \exp\left( -\varepsilon |z|^{\alpha} \cos \alpha\Big( \theta - \frac{\pi}{4} \Big) \right) |f(z)|,$$

where z = |z| e^{iθ}, together with the estimate

$$\left( 1 + \frac{x}{n} \right)^n \le e^x$$

to conclude that on the boundary of Q₁

$$|h(z)| \le e^{|z|^2} \exp\left( -\varepsilon |z|^{\alpha} \cos \frac{\alpha\pi}{4} \right) K e^{-|z|^2} \le K,$$

and in the interior of Q₁

$$|h(z)| \le C_n |z|^{2n} \exp\left( -\varepsilon |z|^{\alpha} \cos \frac{\alpha\pi}{4} \right) K e^{d|z|},$$

where C_n is some constant depending only on n. If |z| is taken large enough, say |z| = R,
then on this arc |h(z)| ≤ K, since the first exponential factor ultimately suppresses the
growth of the other factors.
Thus h(z) is bounded by K on the boundary of the subdomain of Q₁, whose
boundary consists of the real and imaginary axes and the circular arc |z| = R. By the max-
imum principle, Proposition C.2.2, h is bounded by K in the interior as well. Moreover,
since the value of R was arbitrary, h is bounded by K in all of Q₁.
We now fix the value of z and vary ε and n. We have

$$\left( 1 + \frac{z^2 e^{i\phi}}{n} \right)^n f(z) = \exp\left( \varepsilon \left( \frac{z}{\sqrt{i}} \right)^{\alpha} \right) h(z),$$

and by the estimate on h,

$$\left| \left( 1 + \frac{z^2 e^{i\phi}}{n} \right)^n f(z) \right| \le K \exp\big( \varepsilon |z|^{\alpha} \big).$$

Taking the limit as ε tends to zero, we obtain

$$\left| \left( 1 + \frac{z^2 e^{i\phi}}{n} \right)^n f(z) \right| \le K.$$

Next we take the limit as n tends to infinity, obtaining

$$\left| e^{z^2 e^{i\phi}} f(z) \right| \le K$$

or

$$|f(z)| \le K e^{-\operatorname{Re}(z^2 e^{i\phi})}.$$

This estimate holds for all values of φ, and by choosing φ so that Re(z²e^{iφ}) is equal
to |z|², we have proved the proposition.

C.4 A Result for Parabolic Systems

Proposition C.3.1 applied to parabolic systems of equations gives the next proposition about
parabolic systems.

Proposition C.4.1. If u(t, x) is a solution to the parabolic system

$$u_t = B u_{xx} + A u_x + C u$$

and both u(0, x) and u(T, x) are zero for x > 0, then u(t, x) is identically zero.

Proof. We begin by considering the Fourier transform of u(0, x), which is

$$\hat{u}(0, \omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-i\omega x} u(0, x)\, dx,$$

since u(0, x) is zero for x positive. If we set ω = α + iβ, where α and β are real, we
have

$$\hat{u}(0, \omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-i\alpha x} e^{-|x|\beta} u(0, x)\, dx.$$

We see that each component of û(0, ω) is an analytic function of ω for Im ω > 0.
Moreover, we can estimate the vector norm of û by

$$\|\hat{u}(0, \omega)\| \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-|x|\beta}\, \|u(0, x)\|\, dx \le \|u(0, \cdot)\| \frac{1}{\sqrt{2\pi}} \left( \int_{-\infty}^{0} e^{-2\beta|x|}\, dx \right)^{1/2} = \|u(0, \cdot)\| \frac{1}{\sqrt{4\pi\beta}},$$

so û(0, ω) is bounded for Im ω = β ≥ β₀ > 0. Note that ‖û(0, ω)‖ denotes the vector
space norm of the function û evaluated at (0, ω), and ‖u(0, ·)‖ denotes the L² norm of
the function u(0, x). By the assumptions of the theorem, an estimate of the same form as
the preceding also applies to û(T, ω).
We apply these estimates to û(½T, ω). We have

$$\hat{u}(\tfrac{1}{2}T, \omega) = \exp\big( (-\omega^2 B + i\omega A + C)\tfrac{1}{2}T \big)\, \hat{u}(0, \omega) = \exp\big( -(-\omega^2 B + i\omega A + C)\tfrac{1}{2}T \big)\, \hat{u}(T, \omega). \tag{C.4.1}$$

From the first of these relations we conclude for ω = α + iβ₀ that

$$\|\hat{u}(\tfrac{1}{2}T, \omega)\| \le K_1 e^{-c_1 \alpha^2}$$

for some positive constants K₁ and c₁. Using the second representation for û(½T, ω),
with ω = iβ for β ≥ β₀, we have that

$$\|\hat{u}(\tfrac{1}{2}T, \omega)\| \le K_2 e^{-c_2 \beta^2}$$

for some positive constants K₂ and c₂. Using both representations (C.4.1) shows that

$$\|\hat{u}(\tfrac{1}{2}T, \omega)\| \le K_3 e^{d|\omega|}$$

for some constants K₃ and d.
Proposition C.3.1, with some adjustment, shows that each component of û(½T, ω)
satisfies

$$|\hat{u}_\ell(\tfrac{1}{2}T, \omega)| \le K e^{-c|\omega|^2}$$

for all ω with Im ω ≥ β₀. We now use the Fourier inversion formula (2.1.2) and this
estimate on û(½T, ω) to show that u(½T, x) is zero. We have

$$u(\tfrac{1}{2}T, x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty + i\beta}^{\infty + i\beta} e^{i\omega x}\, \hat{u}(\tfrac{1}{2}T, \omega)\, d\omega$$

for any β ≥ β₀. (We may replace the path of integration from the real line to the line given
by Im z = β because of Cauchy's theorem.) Therefore,

$$\|u(\tfrac{1}{2}T, x)\| \le C \int_{-\infty}^{\infty} e^{\beta|x|} e^{-c(\alpha^2 + \beta^2)}\, d\alpha = C' e^{\beta|x|} e^{-c\beta^2}.$$

By taking β arbitrarily large, we conclude that u(½T, x) is zero for all x, both positive
and negative. By representations (C.4.1) we conclude that u(0, x) is also zero, and hence
that u(t, x) is zero.

Exercises

C.4.1. Show that if |f(z)| is bounded on the boundary of the quadrant Q₁, as defined in
Proposition C.3.1, and if |f(z)| ≤ C(1 + |z|^m) in the quadrant Q₁ for some value
of m, then in fact |f(z)| is bounded in Q₁.

C.4.2. Show that Proposition C.4.1 can be extended to R^n, where u(0, x) and u(T, x)
are zero for x in the half-space x₁ ≥ 0.

C.4.3. Use the calculus of residues to verify the formulas given in Exercise 10.2.4. Hint:
Consider the integral over the real line and the line Im z = π.

C.4.4. Use the calculus of residues to show that

$$\int_0^{\infty} \frac{x^{\alpha}}{x^2 + 1}\, dx = \frac{\pi}{2 \cos\frac{1}{2}\alpha\pi}$$

for 0 ≤ α < 1. Hint: Consider curves similar to those used in Example C.2.1 and
use the relation x^α = e^{iπα}|x|^α for x negative when z^α is defined in the upper
half-plane.


Index

ADI methods, 172–185, 342
    boundary conditions for, 176
    implementation of, 177–180
    with mixed derivatives, 181
    for second-order equations, 202
    stability, 177
admissible solution, 291, 301
amplification factor, 48
    for multistep scheme, 97, 267
    of second-order equations, 193, 271
amplification matrix, 166
amplification polynomial, 103, 123, 245, 289
analytic function, 419
artificial viscosity, 161

backward-time backward-space scheme, 35
backward-time central-space scheme
    for the heat equation, 147
    for hyperbolic equations, 35, 57
biharmonic equation, 313
block tridiagonal systems, 90
boundary conditions, 275–310
    for ADI schemes, 176
    analysis of, 279
    for elliptic equations, 322
    for finite difference schemes, 85, 281
    for parabolic equations, 152
box scheme, 57, 77
    modified, 78
Brownian motion, 141

Cauchy–Riemann equations, 313
Cauchy–Schwarz inequality, 377
cell Peclet number, 159
cell Reynolds number, 159
CFL condition, 34
Chapman–Kolmogorov equation, 141
characteristics, 2
    incoming, 11
    outgoing, 11
    for systems, 9
    for variable coefficients, 5
checkerboard ordering, 359
Cholesky factorization, 395
    preconditioning, 395
conjugate gradient method, 377
    convergence estimate, 387
    implementation, 384
conjugate search directions, 380
conjugate transpose, 401
conservative polynomial, 109
consistent scheme, 25
consistently ordered matrix, 357
continuity of the solution on the data, 205
convection-diffusion equation, 140, 157
convergence estimates
    for nonsmooth initial functions, 252–258
    for parabolic equations, 259–261
    for smooth functions, 235–246
convergent scheme, 23, 262
coordinate changes and schemes, 335
Courant–Friedrichs–Lewy condition, 34
Crank–Nicolson scheme for heat equation, 147
    nondissipative, 151
Crank–Nicolson scheme for hyperbolic equations, 63
    adding dissipation to, 123
    boundary conditions, 86, 292
    modified, 85
    order of accuracy, 68
    solution of, 88
    stability of, 77


diagonalizable matrix, 3
diagonally dominant matrices, 349
difference calculus, 78–82
Dirichlet boundary condition, 145, 152, 176, 199, 311
dispersion, 125
    in higher dimensions, 202
dispersive equation, 190
dissipation, 122
    adding to nondissipative schemes, 123
    convergence estimates, 259
    for parabolic schemes, 146
    and smoothness of the solution, 149
Douglas–Rachford method, 175
Du Fort–Frankel scheme, 148, 268
Duhamel's principle, 32, 225, 262
D'Yakonov scheme, 175
dynamic stability, 59

efficiency of higher order schemes, 101, 181
eigenvalue of a matrix, 403
    semisimple, 403
eigenvector, 403
    generalized, 403
elliptic equations
    boundary conditions, 322
    differentiability of the solution, 314
    discontinuous boundary data, 322
    regularity estimates, 315
energy method, 145, 191
envelope of a wave packet, 130
Euler backward scheme, 57, 75
Euler–Bernoulli equation, 190
    scheme for, 195, 200
evaluation operator, 242
    and piecewise smooth functions, 256–258
explicit schemes, definition, 34
exponential of a matrix, 214, 406

Faber–Krahn inequality, 358
factor space, 367
finite difference grid, 16
finite Fourier transform, 46
five-point (discrete) Laplacian, 325
Fokker–Planck equations, 140
forward-time backward-space scheme, 17, 47
forward-time central-space scheme, 17
    for hyperbolic equations, 17, 51
    and smoothing, 55
    for the heat equation, 145
forward-time forward-space scheme, 17, 27
Fourier analysis
    differentiability of functions, 42
    on the integers Z, 38
    on the real line, 37
Fourier inversion formula
    on the grid, 38
    on the integers, 38
    multidimensional, 44
    on the real line, 37
Fourier series, 38
Fourier transform
    of derivatives, 42
    in higher dimensions, 44
    on the integers, 38
    on the real line, 37
fourth-order accurate approximations
    of first derivative, 79, 80
    of second derivative, 80
fourth-order accurate nine-point Laplacian, 328
frozen coefficient problems, 59, 276
function spaces, L¹(R), L²(R), L^∞(R), 417

Gauss–Seidel algorithm, 340
    analysis of, 347
    and diagonally dominant matrices, 349–351
    iteration matrix for, 345
Gaussian elimination, 88, 89, 339
grid, 16
group velocity, 130, 190, 248
Gustafsson–Kreiss–Sundström–Osher (GKSO) method, 288, 309

H^r, 43, 239, 243
harmonic functions, 311, 319, 419
heat equation, 137
hermitian matrix, 226, 230, 408
Hurwitz polynomial, 117
hyperbolic equation, 1
    differentiability of solutions, 2
    with variable coefficients, 5

hyperbolic systems, 3, 166, 217
    weakly, 221

implementation of iterative methods, 346
implicit schemes, 34
    solution of, 88
initial value problem
    analysis of, 205–225
    for heat equation, 137
    for one-way wave equation, 1
    for second-order equations, 187
initial-boundary value problems, 275–310
    for hyperbolic schemes, 291
    for parabolic schemes, 292
    for partial differential equations, 300
initialization
    for leapfrog scheme, 18
    of multistep schemes, 18, 98, 269
    of schemes for second-order equations, 197
integrability condition, 312, 370
interior regularity estimate
    for finite difference schemes, 330
    for partial differential equations, 315
interpolation operator, 236
irreducible matrix, 349
iteration matrix, 341

Jacobi method, 340
    analysis of, 345, 346
    for diagonally dominant matrices, 349
    iteration matrix for, 345
    line Jacobi method, 359

Kreiss matrix theorem, 225–233

L² norm
    on grid, 29, 39
    on real line, 38
Laplace transform, 276, 291
    of a discrete function, 277
Laplace's equation, 311
Laplacian operator, 311
Lax–Friedrichs scheme, 17
    stability, 51
Lax–Richtmyer equivalence theorem, 32–33
    proof, 262–266
    for second-order equations, 194
Lax–Wendroff scheme, 61, 70
    dispersion, 126
    dissipation, 122
    modified, 84
    for parabolic equations, 162
    smallest stencil, 72
    stability of, 76
leapfrog scheme for heat equation, 147
leapfrog scheme for hyperbolic equations, 17, 195, 267
    (2,4) explicit, 100
    (2,4) implicit, 101
    adding dissipation to, 123
    dispersion of, 127
    initialization, 98
    parasitic mode, 99
    stability of, 95–97
    for u_t + au_xxx = f, 102
Lebesgue dominated convergence theorem, 264, 416
Lebesgue integration, 415
Lebesgue measure, 414
lexicographic order, 340
linear iterative methods, 341
lower order terms
    and stability, 53, 149
    and well-posedness of systems, 218

MacCormack scheme, 77
    time split, 171
matrix method for analyzing stability, 307
matrix norms, 400
    formulas for, 404
maximum principle
    for analytic functions, 422
    for the discrete five-point Laplacian, 326
    for elliptic equations, 317
measurable function, 414
measurable set, 414
Mitchell–Fairweather scheme, 180
monotone schemes, 73
multistep schemes, 18, 30, 95
    convergence, 24, 267–269
    dispersion of, 127
    initialization and order of accuracy, 269
    as systems, 167

Neumann boundary condition, 145, 152, 199, 312, 365–370
norms
    H^r, 43, 239, 243
    L², 39
    for discrete functions, 29
    in the factor space, 367
    for vectors, 399
numerical boundary condition, 85, 281–288

one-way wave equation, 1
order of accuracy
    for homogeneous equations, 69
    and initialization of multistep schemes, 269
    for multistep schemes, 267
    of a scheme, 64
    and smoothness of parabolic equations, 149
    of the solution, 73
    using symbols, 66

parabolic equations, 137
    lower-order terms and stability of, 149
    schemes for, 145
parabolic systems, 143, 216
parasitic mode, 99
    dispersion, 128
Parseval's relations, 39
Peaceman–Rachford algorithm, 175, 181
    boundary conditions, 176
periodic problems, 14
periodic tridiagonal systems, 91
phase error, 126, 203
    for multistep schemes, 127
Poisson summation formula, 250
Poisson's equation, 311
polar coordinates, 333
positive definite matrix for elliptic schemes, 336
    iterative method for, 362
preconditioned conjugate gradient method, 390
    implementation, 393
pseudoscheme, 268

quasi-characteristic extrapolation, 86, 282, 292, 294

Rayleigh equation, 191, 200
reducible matrix, 349
reentrant corners, 324, 331
regularity estimates, 314, 315, 330
residual, 374
resolvent condition
    for finite difference equations, 227, 289
    for partial differential equations, 301
restricted stability condition, 50
reverse Lax–Friedrichs scheme, 36, 58
Riemann integral, 416
Robin condition, 322
robustness, 206
Rouché's theorem, 110

scalar product, 401
Schrödinger equation, 191
Schur polynomial, 108, 109, 125, 198
Schur's lemma, 403
Schwartz class, 46
search direction, 378
second-order equations, 187
    convergence estimates for, 270
    stability of, 193
semisimple eigenvalue, 404
Sherman–Morrison formula, 91
simple root, 104
simultaneously diagonalizable, 169
SOR, 340
    analysis of, 351
    efficiency of, 356
    estimating the parameter, 358
    implementation, 360
    line, 359
    and Neumann boundary condition, 368
    symmetric, 359
    symmetric positive definite matrices, 364
spectral radius, 341, 404
SSOR, 359, 391
    preconditioner, 391
stability
    for ADI methods, 177
    condition, general, 50
    definition, 28

stability (continued)
    for initial-boundary value problems, 288
    and lower-order terms, 53
    for multistep schemes, 105
    region, 29
    for systems of equations, 165
    and variable coefficients, 59
    and von Neumann polynomial, 108
steepest descent method, 373
    implementation, 375
Stokes equations, 313
strictly nondissipative schemes, 122
successive-over-relaxation (SOR), 340
symbol
    of a differential operator, 69, 314
    of a finite difference scheme, 69
symbolic calculus, 81
symmetric matrix, 408
    for elliptic schemes, 336
symmetric successive-over-relaxation (SSOR), 391

Tchebyshev polynomial, 389
Thomas algorithm, 88, 174, 177
time split schemes, 170
tridiagonal systems, 88
truncation error, 64
truncation operator, 235

unitary matrix, 401
upwind differencing, 160

variable coefficients, 59, 163, 205, 235, 291, 315, 331
    effect on well-posedness, 222
von Neumann analysis
    for first-order equations, 47
    for second-order equations, 193
von Neumann polynomial, 108, 109, 198

wave equation, 187
    in two dimensions, 202
wave packet, 130, 248
    frequency of, 130
well-posedness
    for initial value problem, 31, 206
    initial-boundary value problem, 279
    for second-order equations, 190
West's algorithm, 368
