AN INTRODUCTION TO
NUMERICAL
LINEAR
ALGEBRA

Problems involving linear algebra arise in many contexts of scientific computation, either directly or through the replacement of continuous systems by discrete approximations. This introduction covers the practice of matrix algebra and manipulation, and the theory and practice of direct and iterative methods for solving linear simultaneous algebraic equations, inverting matrices, and determining the latent roots and vectors of matrices. Special attention is given to the important problem of error analysis, and numerous examples illustrate the procedures recommended in various circumstances. Emphasis is on the choice, and the reasons for the choice, of selected numerical methods, and although this is conditioned by the digital computer there are no details of programming or coding for any machine or in any language. It is essentially a book on the 'analysis' aspect of the numerical analysis of linear algebra.
MONOGRAPHS ON NUMERICAL ANALYSIS
General Editors
E. T. GOODWIN, L. FOX

MONOGRAPHS ON NUMERICAL ANALYSIS

Already published

THE NUMERICAL SOLUTION OF TWO-POINT
BOUNDARY PROBLEMS IN ORDINARY DIFFERENTIAL EQUATIONS
By L. FOX, 1957

THE ALGEBRAIC EIGENVALUE PROBLEM
By J. H. WILKINSON, 1965

AN INTRODUCTION TO
NUMERICAL
LINEAR ALGEBRA
BY
L. FOX, M.A., D.Sc.
DIRECTOR, UNIVERSITY COMPUTING LABORATORY, AND
PROFESSOR OF NUMERICAL ANALYSIS,
OXFORD

CLARENDON PRESS · OXFORD

Oxford University Press, Ely House, London W.1
GLASGOW NEW YORK TORONTO MELBOURNE WELLINGTON
KUALA LUMPUR HONG KONG TOKYO

© Oxford University Press, 1964

FIRST PUBLISHED 1964
REPRINTED LITHOGRAPHICALLY IN GREAT BRITAIN
FROM CORRECTED SHEETS OF THE FIRST EDITION
BY WILLIAM CLOWES AND SONS, LIMITED
LONDON AND BECCLES
1967

Preface
This series of Monographs on Numerical Analysis owes its existence
to the late Professor D. R. Hartree who, defying the walrus, thought
that the time had come to talk of one thing at a time, at least in this
field. Indeed the various areas of Numerical Analysis have expanded
so rapidly that it is now virtually impossible to write a single book
which gives more than a very elementary introduction in all fields.
We even need a variety of books on each single topic, as in other
branches of science and mathematics, to meet the various require-
ments of the undergraduate, the research student, and those who
spend their working life in solving numerical problems in specific
contexts.
Numerical analysis was introduced in 1959 into the Oxford under-
graduate mathematical syllabus, and it seemed to me preferable to
talk about numerical linear algebra in the first place and to leave
for subsequent courses the theory and practice of approximation
and its applications to the solution of differential and integral equa-
tions in which, of course, linear algebra plays a large part. The material
of this book is therefore based on this first set of lectures, and I
generally cover some two-thirds of it in about 28 lectures, treating
less thoroughly Chapters 4, 5 and 7 and the later parts of Chapters
8, 10 and 11.
I had considerable difficulty with Chapter 2. Instead of introducing
linear equations via vector spaces, linear transformations and matrices
I started with linear equations and tried to show how the algebra of
matrix manipulation ‘hangs together’ and simplifies not only our
notation but the proofs of our numerical operations. Though this
does not give a beautiful mathematical theory I made the deliberate
choice for three reasons. First, the Oxford undergraduates learn the
mathematical theory from other lecturers. Second, I think that with
that theory they do not easily acquire the facility with matrix
manipulation which they will need in numerical work and even for
the study of more advanced theoretical texts. Third, the Director of
a Computing Laboratory must also consider the engineers and scien-
tists who use the computer to solve numerical problems, and in my
experience these workers have some antipathy to and even fear of
words like 'space', 'rank' and even 'matrix' which, they feel, represent
strange and impractical mathematical abstractions. And yet the
use and manipulation of matrices, in the elementary form given here
and which is sufficient for many practical purposes, is really very
easy! This is true also of ‘norms’, introduced in an elementary way
in this Chapter and which are so valuable for measuring the conver-
gence of series and iterative processes.
And of course mathematical rank and matrix singularity have less
importance in practical work. Here the data is rarely exact, and
instead of matrix A we have to consider the matrix A + δA, where
we may know only upper and lower bounds to the elements of δA.
Even if A is exact (in a ‘mathematical’ problem) our numerical
methods involve arithmetic which is rarely exact. We must face the
fact, and numerical analysts do not apologise for it, that the question
of error analysis is profoundly important and that for this purpose
we must investigate very closely the details of the arithmetic. The
present tendency of error analysis, in all branches of numerical analysis,
largely refrains from following through the effects on the solution of
each individual error, but accepts these errors and tries to determine
what problem we have actually solved. Our methods are then evalu-
ated according to our ability to perform the appropriate analysis and
to the size of the upper bounds of the perturbations.
This is considered in Chapter 6. Chapters 3 and 4 study various
direct processes for solving linear equations based on elimination and
triangular decomposition, and the close relations which exist between
the various methods. Many of these, together with the orthogonalisa-
tion methods of Chapter 5, might well be discarded for practical
purposes, but they have some mathematical interest and considerable
literature, and I thought it desirable to collect in one place a summary
of the relevant facts. In Chapter 7 I consider very briefly the work and
storage requirements for some of the methods, with particular
reference to automatic digital computers. Chapter 8 gives an intro-
duction to a class of iterative methods for solving linear equations,
whose recent developments, particularly for the large sparse matrices
relevant to elliptic differential equations, have been brilliantly
expounded by R. S. Varga in his 'Matrix Iterative Analysis' (Prentice-
Hall, 1962).
Chapters 9 and 10 discuss the determination of the latent roots
and vectors of general matrices, both by iterative methods and by
the search for similarity transformations of various kinds, associated
with the names of Jacobi, Givens, Householder, Lanczos, Rutishauser,
Francis and others. These have been further developed by J. H.
Wilkinson, together with a systematic error analysis whose general
features are indicated in Chapter 11, and which are described in
comprehensive detail in his forthcoming 'The algebraic eigenvalue
problem’ (Oxford, this series, in press).
There are, of course, some omissions which will displease many
students and teachers. For example I have said very little about
computing machines, and have not made detailed distinction between
the error analysis of ‘fixed-point’ and ‘floating-point’ machine
arithmetic. Coding and programming are mentioned only briefly
in the introductory chapter, with no details of languages like
FORTRAN or ALGOL. My personal opinion is that these things,
while relatively easy to learn and master, take much space to describe
and the mathematical undergraduate needs essentially the principles
expressed in a book which is reasonably short and correspondingly
inexpensive. Those who teach ALGOL, moreover, can easily use as
exercises the algorithms of this book, all of which, I hope, are ex-
pressed unambiguously in the language of English and of standard
mathematical notation. In a few cases the algorithmic language
would simplify the description, and in these cases it is interesting to
note that hand computation is relatively tedious; the method of § 30
in Chapter 4 is one example of this. There is, of course, some advantage
in using digital computers at the undergraduate stage, and I hope to
introduce this at Oxford when we acquire facilities which are not
completely saturated by the demands of research.
With regard to notation I have used the prime rather than the super-
script T to denote matrix transposition, and usually capital letters de-
note matrices and lower-case letters denote vectors, in ordinary italic
type. Exceptions are the row or column vectors of a matrix, usually
denoted respectively by R_i(A) and C_i(A), and I fear that consistency
lapses for the residual vector, sometimes called r and sometimes R,
I suspect for personal historical reasons. All my matrices, incidentally,
have distinct latent roots (which word I use consistently instead of
eigenvalues) and consequently a full set of independent latent vectors,
with obvious simplifications in the theory and no considerable
restriction in practice.
Most of the material is already published in learned journals, and
most books on numerical analysis have some account of parts of it.
Few similar books, however, are available in English. Predecessors not
mentioned in the text include P. S. Dwyer's 'Linear Computations'
(Wiley, 1951), written before the advent of the digital computer
and the advances in error analysis, and E. Bodewig's 'Matrix Calculus'
(North-Holland Publishing Company, Amsterdam, 1956) which has
more and deeper theoretical treatment but perhaps fewer practical
details. More advanced books include those of Varga, the imminent
treatise of Wilkinson, and the latter’s just published ‘Rounding
errors in algebraic processes’ (HMSO, 1963), and I hope that my
readers will be able subsequently to benefit more easily from these
learned works.
It is a pleasure to record my debt to Dr. E. T. Goodwin, who read
the proofs and made several valuable suggestions; to Professor A. H.
Taub, who invited me to Illinois for a sabbatical semester in which I
found time to write several chapters; to the Clarendon Press, who made
a special and successful effort to produce this book in time for the
1964 examinations; and above all to Dr. J. H. Wilkinson, who read
all the first draft, made important criticisms and suggestions, and
from whom I have learnt much.
L. Fox
Oxford, January 1964

Contents
1. INTRODUCTION
Numerical analysis
Computer arithmetic
Simple error analysis
Computing machines, programming and coding
Checking
Additional notes
2. MATRIX ALGEBRA
Introduction
Linear equations. General considerations
Homogeneous equations
Linear equations and matrices
Matrix addition and multiplication
Inversion and solution. The unit matrix
Transposition and symmetry. Inversion of products
Some special matrices
Triangular matrices. The decomposition theorem
The determinant
Cofactors and the inverse matrix
Determinants of special matrices
Partitioned matrices
Latent roots and vectors
Similarity transformations
Orthogonality
Symmetry. Rayleigh's Principle. Hermitian matrices
Limits, series and norms
Numerical methods
Additional notes and bibliography
3. ELIMINATION METHODS OF GAUSS, JORDAN AND AITKEN
Introduction
Calculation of the inverse
Matrix equivalent of elimination
The method of Aitken
The symmetric case
The symmetric, positive-definite case
Exact and approximate solutions. Integer coefficients
Determination of rank
Complete pivoting
Compatibility of linear equations
Note on comparison of methods
Additional notes and bibliography
4. COMPACT ELIMINATION METHODS OF DOOLITTLE, CROUT, BANACHIEWICZ AND CHOLESKY
Introduction
The method of Doolittle
Connexion with decomposition
The method of Crout
Symmetric case
The methods of Banachiewicz and Cholesky
Inversion. Connexion with Doolittle and Crout
Inversion. Symmetric case
Connexion with Jordan and Aitken
Row interchanges
Operations with complex matrices
Additional notes and bibliography
5. ORTHOGONALISATION METHODS
Introduction
Symmetric case
Unsymmetric case
Matrix orthogonalisation
Additional notes and bibliography
6. CONDITION, ACCURACY AND PRECISION
Introduction
Symptoms, causes and effects of ill-conditioning
Measure of condition
Exact and approximate data
Mathematical problems. Correction to approximate solution
Mathematical problems. Correction to the inverse
Physical problems. Error analysis
Relative precision of components of solution
Additional notes and bibliography
7. COMPARISON OF METHODS. MEASURE OF WORK
Introduction
Gauss elimination
Jordan elimination
Matrix decomposition
Aitken elimination
8. ITERATIVE AND GRADIENT METHODS
Introduction
General nature of iteration
Jacobi and Gauss-Seidel iteration
Acceleration of convergence
Labour and accuracy
Consistent ordering
Gradient methods
Symmetric positive-definite case
A finite iterative process
Additional notes and bibliography
9. ITERATIVE METHODS FOR LATENT ROOTS AND VECTORS
Introduction
Direct iteration
Acceleration of convergence
Other roots and vectors. Inverse iteration
Matrix deflation
Connexion with similarity transformation
Additional notes and bibliography
10. TRANSFORMATION METHODS FOR LATENT ROOTS AND VECTORS
Introduction
Method of Jacobi, symmetric matrices
Method of Givens, symmetric matrices
Method of Householder, symmetric matrices
Example of Givens and Householder
Uniqueness of triple-diagonal form
Method of Lanczos, symmetric matrices
Method of Lanczos, unsymmetric matrices
Vectors of triple-diagonal matrices
Other similarity transformations. The L-R method
The Q-R method
Reduction to Hessenberg form
Roots and vectors of Hessenberg matrix
Additional notes and bibliography
11. NOTES ON ERROR ANALYSIS FOR LATENT ROOTS AND VECTORS
Introduction
Ill-conditioning
Corrections to approximate roots and vectors
General perturbation analysis
Deflation perturbation
Additional notes and bibliography

INDEX

Note. † indicates that there is a further mention of the section in Additional notes and bibliography, given at the end of each Chapter.

1
Introduction
Numerical analysis
1. This book is concerned with topics in the field of linear algebra, in
particular with the solution of linear equations and the inversion of
matrices, and the determination of the latent roots and vectors of
matrices. Before embarking on our exposition it is desirable to make
some introductory remarks on the nature and general aims of numer-
ical analysis, and on the computing equipment which will enable us,
without undue fatigue and in reasonable time, to obtain numerical
answers to our problems.
The numerical answer is our aim. The roots of the quadratic equation x^2 + 2bx + c = 0 are

x_1, x_2 = -b ± (b^2 - c)^{1/2},   (1)

but we are concerned with the evaluation of x_1 and x_2 for given
numerical values of b and c. We might, as here, have a 'closed ex-
pression’ for the answer, in which we merely have to substitute the
given numbers, the data of the problem. More commonly there is no
simple formula, but there may be an algorithm, represented by an
ordered sequence of numerical operations, additions, subtractions,
multiplications and divisions, which is known to give the required
result. The construction of such algorithms is one of the research
activities of numerical analysis.
2. But we must be careful with the phrase ‘required result’. An
answer is rarely obtainable exactly as an integer or the ratio of two
integers. Even for a simple problem like that represented by equation
(1) we shall have to compute an irrational number, or non-terminating
decimal, for most values of b and c. For example if b = 1 and c = -1
the required roots are -1 ± √2, and if we want this as a single
number we have to specify in advance the precision of our result, that
is the number of figures which we should like to have correct. In the
decimal scale the number √2 is 1.41421356..., and if we specify a
precision of p decimals we have to round the number appropriately,
and in such a way that the error committed is as small as possible.
To do this we truncate the number to the precision required, in-
creasing by unity the last digit retained if the first neglected digit is2 INTRODUCTION
5, 6, 7, 8 or 9. We thereby ensure that the maximum error committed
is not more than five units in the first neglected place, or half a unit in
the last figure given, or 0.5 × 10^{-p}. To three decimals √2 = 1.414, to
seven decimals it is 1.4142136, and so on.
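This rounding rule is easy to mechanise; the following is a minimal sketch in Python (the function name is ours, for illustration only).

```python
# Round x to p decimals: truncate, adding a unit in the last retained
# place when the first neglected digit is 5, 6, 7, 8 or 9 (round half up).
def round_to_decimals(x, p):
    shifted = x * 10**p
    # int() truncates toward zero, so adding 0.5 to the magnitude rounds half up
    return int(shifted + (0.5 if shifted >= 0 else -0.5)) / 10**p

assert round_to_decimals(1.41421356, 3) == 1.414
assert round_to_decimals(1.41421356, 7) == 1.4142136
```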
3. Even if the computation can in theory be performed with exact
integers, moreover, we shall find that our computing machine cannot
usually handle the large numbers involved in the arithmetic processes.
For example, if we are solving simultaneous linear algebraic equations
in n unknowns, in which the coefficients and right-hand sides are given
as p-figure integers, an exact process could give the results as the
ratios of two integers, both of which would contain np digits. In the
more practicable methods which we discuss in this book the integers
might contain p × 2^{n-1} digits. If n is 20, which is by no means large in
practical problems, and p is say four, this number is of the order of
2 × 10^6, and no computing machine can store numbers of this size
without complicating prohibitively the task of 'programming' and
increasing prohibitively the time of operation.
If the coefficients are given as rational fractions, or as irrational
numbers like e, π, √2 or sin 0.72 (radians), we shall have to
round them to a given number of digits. The problem we are solving is
then not quite the original problem, and one of our tasks will be to
decide how many figures we need to keep in the original data, and also
in the process of the computation, to obtain the required precision in
the results.
4. Problems in which the data are known exactly, either as integers,
rational or irrational numbers, I call mathematical. The author of such
a problem has a perfect right to ask for any degree of precision which
he needs for his purpose. On the other hand most problems with a
scientific context will involve data obtained as a result of measurement,
in some degree inaccurate, and our task now is to decide the worth-
while precision of the answers. Such problems are called physical, and
it is self-deceptive to quote as answers more digits than those which
remain unchanged however the data is varied within its limits of
'tolerance'. The 'required result' now becomes the 'meaningful result',
and our methods should decide this for us.

As a trivial example, if we are asked to compute sin x, and a
measurement of x gives the value x = (1/4)π ± 0.005, we see that there is
a range of values of the answer, from about 0.7036 to 0.7106, and a
quoted result of 0.7071 has a possible error of ±0.0035. It would clearly
be stupid to quote more than three decimals in the result. We shall see
later that the precision of the answer compared with that of the data
varies considerably with the problem, and in complicated algorithms
our work of determining this might be formidable and challenging.
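A few lines of Python (purely illustrative) confirm the quoted range:

```python
import math

# x is measured as (1/4)*pi with a tolerance of 0.005 radians
x, tol = math.pi / 4, 0.005
lo, hi = math.sin(x - tol), math.sin(x + tol)
print(lo, hi)          # about 0.7036 and 0.7106
print((hi - lo) / 2)   # about 0.0035, the possible error in the quoted 0.7071
```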
5. We note also that we would often prefer to use an algorithm,
rather than evaluate a closed solution, even when the latter exists. In
the field of differential equations, for example, a first-order equation (2)
may possess a closed solution (3), in which A is an arbitrary constant
to be fixed by the specification of y for a particular value of x.

Now this is a useful formula for the computation of y for one or two
particular values of x. But it is quite common to want a graph, or
preferably a table of values of y for a set of (usually) equidistant values
of x over a lengthy range. The calculation of the expression (3) is then
not trivial, involving the evaluations of a square root, an inverse
tangent, and an exponential function, in addition to one division and
several multiplications. In the computation of these elementary
functions, moreover, we shall either have to use some form of series or
to interpolate in mathematical tables, and the whole operation is
somewhat lengthy. We have numerical methods for solving such
problems, though they belong to a field outside our present interest,
which perform much less arithmetic and which produce successive
values in the table without ever knowing the closed solution (3).
6. The closed solution, of course, is extremely valuable for many
purposes, but unfortunately it can rarely be obtained in terms of the
so-called 'elementary' functions. For example an apparently innocent
change in (2), to the form (4), produces a more formidable-looking
solution (5) which can hardly be called a solution at all, since we have
no analytical methods for evaluating the indefinite integral it contains
in terms of elementary functions, and some numerical process has to
be used for this purpose. We might just as well use our algorithmic
numerical method for the equation (4) without recourse to (5), and in
fact the extra numerical work in (4) compared with that of (2) is
almost negligible.
7. Again, however, we should not ignore the possibility of obtaining
a closed solution, and it is very important that we should understand
the mathematics and mathematical methods for our problems, as well
as the numerical analysis and possible algorithms. In particular we
should try to decide in advance whether our given problem really has a
solution, that is whether there is an existence theorem for it. With the
development of automatic computing machines the mathematical
analysis is increasingly important, and it should never be thought that
the machine will do the mathematics for us.
Our algorithm may sometimes decide for us whether or not our
problem has a solution, or at least a unique solution. For example it is
usually the case that a set of simultaneous linear algebraic equations
has a unique solution when the number of equations is equal to the
number of unknowns. But it is clear that the equations

x + y = 3
2x + 2y = 6   (6)

do not define a unique solution, the second equation being effectively a
restatement of the first. If in the second of (6) the right-hand side were
a number other than six it is clear, moreover, that the equations would
have no solution at all. This is less obvious with the equations
x + y + z = α
x − y − z = β   (7)
2x + 4y + 4z = γ

which have no unique solution for any α, β and γ, and no solution at
all unless γ = 3α − β.
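The singularity and consistency of (7) can be checked numerically. The sketch below assumes Python with numpy and uses the coefficients as printed above:

```python
import numpy as np

# Coefficient matrix of equations (7): the third row is 3*(row 1) - (row 2),
# so the matrix is singular and no right-hand side gives a unique solution.
A = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0, -1.0],
              [2.0,  4.0,  4.0]])
print(np.linalg.matrix_rank(A))               # 2, not 3

alpha, beta = 5.0, 2.0
b = np.array([alpha, beta, 3 * alpha - beta])  # gamma = 3*alpha - beta
aug = np.column_stack([A, b])
print(np.linalg.matrix_rank(aug))             # still 2: equations consistent
```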
With many equations, and with more digits in the coefficients, we
may have some trouble in this context, and the necessity for rounding
may produce a solution from our computing machine when in fact no
solution exists. We shall give examples of this in a later chapter and
show how our algorithm can help to decide the questions. In other
fields, notably in the solution of differential equations, our algorithm
may be less valuable in the determination of existence, and mathe-
matical analysis is essential.INTRODUCTION 5
8. Summarizing, we can say that numerical analysis is concerned
with the production of numerical solutions to scientific and mathe-
matical problems. Our aim is to find methods which are economic in
time, which produce the results to the accuracy requested in mathe-
matical problems, and which tell us how many figures are worth
quoting in physical problems. To the numerical analysis we should add
any mathematical knowledge we have or can find about the existence
of solutions, and in some sense our methods, like those of mathematics
itself, should be elegant!
As a rather trivial example of elegance we might consider the
formula (1) for the solution of quadratic equations. If b^2 - c is
reasonably small, and we compute its square root to a given number of
decimal places, the formula gives roughly the same number of correct
digits in both roots. But if c is small, so that (b^2 - c)^{1/2} = b + ε, where ε
is small, then x_1 = -2b - ε, x_2 = ε, and x_1 is given accurately with
many more digits than x_2. To avoid computing the square root to more
figures we use our mathematics to note that x_1 x_2 = c, so that x_2 = c/x_1
and can be computed from this formula with a relative accuracy
similar to that of x_1.
The loss of significant digits in subtracting large numbers is a
common phenomenon, and we use all possible methods to avoid or
mitigate the consequences thereof.
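A minimal sketch of this remedy in Python, assuming real roots (the function name is ours):

```python
import math

# Roots of x^2 + 2bx + c = 0, assuming real roots (b*b >= c).
def quadratic_roots(b, c):
    s = math.sqrt(b * b - c)
    x1 = -b - s if b >= 0 else -b + s   # the root free of cancellation
    x2 = c / x1                         # the other root, from x1 * x2 = c
    return x1, x2

# With b = 1000, c = 1 the naive form -b + sqrt(b*b - c) loses most of its
# figures to cancellation; c / x1 retains full relative accuracy.
print(quadratic_roots(1000.0, 1.0))
```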
Computer arithmetic
9. There are two methods in common use for operating with
numbers in a computing machine. In both cases the numbers are
stored in registers of fixed length, so that we can retain only p digits
say, in any given number, and a number containing more than p
digits must be truncated or, with extra effort, stored in two or more
such registers. In what follows we assume that we are working in the
common decimal system.
With ‘single-length’ arithmetic, with p digits, we have either the
fixed-point or the floating-point method of operation. In the fixed-point
method it is customary to limit the size of numbers which may occur
to the range —1 to +1, and any number outside this range must be
scaled appropriately by dividing by a power of 10. The programmer
must take definite steps to keep track of these scale factors so that the
correct result can finally be obtained.
Since our machine can only store digits we must turn the positive
and negative signs into quasi-digital form, and this we do with the
convention that all positive numbers have their first digit zero. The
decimal point will normally be thought to follow this digit, so that in
a four-digit register we can effectively store three figures. The number
0.924 will actually appear in that form, and the largest positive
number we can store is 0.999, the integer after the decimal point
being 10^p - 1 in a (p+1)-digit register machine.

For a negative number x we store the complement 10 - |x|, so that
the first digit is always 9, and the number -0.924 appears as 9.076.
All negative numbers have nine as the first digit, and the largest
negative number we can store is 9.000, which is -1 in the 'signed'
convention, the 'fractional part' representing the integer 10^p.
It is easy to see that addition and subtraction, using the complements
of negative numbers, will always give the correct answers in the 'signed'
convention provided that the result is in the allowed range. In fact in a
sequence of such operations the intermediate results are allowed to ex-
ceed the range. For example 0.126 - 0.125 = 0.126 + 9.875 = 10.001.
The first digit is 'lost' and we are left with 0.001, the true result. Again,
0.125 - 0.126 = 0.125 + 9.874 = 9.999 = -0.001, again correct. The
sum 0.986 + 0.125 = 1.111 cannot be allowed, however, and we would
have to store this in the rounded form 0.111 × 10^1, remembering the
power of 10 involved. But

0.986 + 0.125 - 0.389 = 0.986 + 0.125 + 9.611 = 10.722 = 0.722,

and this is correct.
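A toy model of this complement arithmetic (an illustration in Python, with names of our own choosing) reproduces the register calculations above:

```python
P = 3                  # three stored figures after the sign digit
MOD = 10 ** (P + 1)    # a four-digit register wraps modulo 10**4

def store(x):          # store(0.924) -> 0924, store(-0.924) -> 9076
    return round(x * 10**P) % MOD

def value(r):          # read a register back as a signed fraction
    return (r - MOD if r >= MOD // 2 else r) / 10**P

# 0.126 - 0.125: the carry out of the register is 'lost', leaving 0.001
print(value((store(0.126) + store(-0.125)) % MOD))                 # 0.001
# 0.986 + 0.125 - 0.389: correct although an intermediate sum exceeds 1
print(value((store(0.986) + store(0.125) + store(-0.389)) % MOD))  # 0.722
```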
When we multiply together two permissible numbers the result is
certain to be within range. But the exact product of two numbers of p
digits has 2p digits, and we need two registers to store it exactly, a
so-called 'double-length' accumulator. If we have to round it to single
length we commit an error of maximum amount 0.5 × 10^{-p}. The
division a/b is out of range if a > b, but otherwise we can perform the
calculation. In a 'single-length' register the stored result will have a
maximum error of 0.5 × 10^{-p}, unless the resulting decimal number
terminates in at most p digits.
10. In the floating-point system our numbers can be of almost any
size, and we store them in the form 10^a × b, making space in our
register for both a and b. This representation is not unique, but we
standardize by choosing b in the range 0.1 ≤ |b| < 1. For example the
number 1562 is stored as 0.1562 × 10^4, and 0.001562 is given as
0.1562 × 10^{-2}. Both a and b can be negative, and are stored with the
signed convention, though a is always an integer and we can forget
about the decimal point in its register.
Here the user is not worried by scaling problems and the machine
automatically keeps track of the relevant powers of ten. 'Overflow' of
the accumulator is now almost solely restricted to the case of division
by zero, and otherwise the size of allowable numbers is governed by
the size of the register we allow for the representation of the exponent
a. We shall mention some other relevant facts about arithmetic in the
appropriate contexts.
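As an illustrative sketch, the normalised form 10^a × b can be produced as follows (the helper name is ours):

```python
import math

# Express x as 10**a * b with 0.1 <= |b| < 1, as in the examples above.
def normalise(x):
    if x == 0:
        return 0, 0.0                       # zero needs its own convention
    a = math.floor(math.log10(abs(x))) + 1  # exponent that makes |b| < 1
    return a, x / 10**a

print(normalise(1562.0))     # (4, 0.1562)
print(normalise(0.001562))   # (-2, 0.1562)
print(normalise(-0.924))     # (0, -0.924)
```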
Simple error analysis
11. The fixed-point and floating-point representations introduce the
ideas of decimal places and significant figures. Both the numbers 0.9246
and 0.0002 have four decimal places and would be stored in this form
in the fixed-point method. The first number, however, has four
significant figures whereas the second has only one significant figure.
The point about the word 'significant' is that, if these numbers were
obtained as a result of rounding with a possible maximum error of
half a unit in the last place retained, each has a possible absolute error
of ±0.00005, but the former has a much smaller relative error. It is
correct to approximately one part in 20,000, while the number 0.0002
is correct only to one part in 4.
In the floating-point representation these numbers are stored re-
spectively as 0.9246 × 10^0 and 0.2000 × 10^{-3}. Here the number of
non-zero digits in the fractional part represents the number of signifi-
cant figures present, the three zeros in the second example being
inserted to fill up the register. If we had more significant information
about this value, for example that it was 0.0002329..., or 0.0002000
where the last three zeros are known to be correct, we could store it in
a floating-point form like 0.2329 × 10^{-3} with a small relative error,
whereas the rounded fixed-point number 0.0002 has a small absolute
error but a large relative error. This, incidentally, does not imply that
the floating-point representation is superior. There are many factors
involved, some of which we shall mention later. We note immediately,
however, that in an addition like 0.9246 × 10^0 + 0.2329 × 10^{-3} we
have first to express the smaller number in the rounded form
0.0002 × 10^0 in order to add it to the first, and we have had to
discard its last three digits.
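The effect of this alignment can be imitated by rounding to four significant figures (a rough Python sketch; the helper name is ours):

```python
# Keep four significant figures in the fraction of the normalised form.
def round_sig(x, figs=4):
    a, b = 0, x
    while abs(b) >= 1:        b /= 10; a += 1
    while 0 < abs(b) < 0.1:   b *= 10; a -= 1
    return round(b, figs) * 10**a

x, y = 0.9246, 0.0002329      # 0.9246 x 10^0 and 0.2329 x 10^-3
print(round_sig(x + y))       # 0.9248: only the digits '0.0002' of y survive
```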
12, We shall need rules for assessing both types of error in simple
operations, so that we can extend them to complicated situations.8 INTRODUCTION
Consider first the case of absolute error. If x is the true value, and
x + δx an approximation, the absolute error of x + δx is just δx. In
general, for instance after rounding, the error will normally have equal
possibility of being positive or negative.

We can then assert quite obviously that the maximum absolute
error in a sequence of additions or subtractions

x = ±a ± b ± c ± d ...   (8)

is just the arithmetic sum of the individual absolute errors, given by

|δx| = |δa| + |δb| + ... .   (9)

For a product ab we actually form (a + δa)(b + δb), and the
maximum absolute error is

|b δa| + |a δb| + |δa δb|,   (10)

the last term usually being negligible, a quantity of 'second order', in
relation to the others.
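Rules (9) and (10) in executable form, as an illustration (the function names are ours):

```python
# Maximum absolute error of a sum or difference: add the individual errors.
def sum_error(*errors):
    return sum(abs(e) for e in errors)

# Maximum absolute error of a product (a + da)(b + db), second-order term included.
def product_error(a, da, b, db):
    return abs(b * da) + abs(a * db) + abs(da * db)

print(sum_error(0.0005, 0.0005, 0.0005))         # 0.0015
print(product_error(22.5, 0.05, 0.833, 0.0005))  # about 0.053
```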
13. In fact we can use the differential calculus and say that, if

y = f(x_1, x_2, ...),   (11)

and x_1, x_2, ... have absolute errors |δx_1|, |δx_2|, ..., then that of y is

|δy| = |∂f/∂x_1| |δx_1| + |∂f/∂x_2| |δx_2| + ...,   (12)

provided that the individual absolute errors are sufficiently small. For
example, if y = sin x, then δy = cos x δx, and the absolute error in y
is not greater than that in x.

Again, if y = x^p, we have

|δy| = p x^{p-1} |δx|,   (13)

and the ratio |δy/δx| will depend both on x and on p. If p and x both
exceed unity the error in y is greater than that in x, but if x > 1 and
p < 1, so that we are taking a fractional power, then |δy| < |δx|. The
statement in many books that we cannot get a result with more
correct figures than are contained in the data is clearly false. For
example if

y = 2^{0.01},   (14)

and all we know about the '2' is that it is correctly rounded, we can
certainly quote y = 1.007 with a maximum absolute error of 0.003, or
maximum relative error of 1 in 300.
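A small Python check of the example (14), illustrative only:

```python
# y = x**p with p = 0.01, x = 2 correctly rounded, so |dx| <= 0.5.
p, x, dx = 0.01, 2.0, 0.5
y = x**p
dy = p * x**(p - 1) * dx          # formula (13)
print(round(y, 3), round(dy, 4))  # 1.007 and about 0.0025, within the 0.003 quoted
```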
14. We shall in fact more often be concerned with relative error, the
dimensionless quantity |δx/x|. The relative error of a sum or difference
has no simple expression, but corresponding to (8) and (9) we have the
rule that if

x = a^{±1} b^{±1} c^{±1} ...,   (15)

then

|δx/x| = |δa/a| + |δb/b| + ...,   (16)

that is the maximum relative error of the result is the sum of the
individual relative errors. This is proved immediately by taking the
logarithmic derivative of (15).

This result will give us valuable information about the number of
meaningful figures in the number x derived from an operation like (15).
For example, if

x = (0.833 × 22.5)/0.225,   (17)

and all we know about the factors is that they are correctly rounded,
to how many digits can we reasonably quote the result? From (16) we
have

|δx/x| = 0.0005/0.833 + 0.05/22.5 + 0.0005/0.225 = 0.0050   (18)

to sufficient accuracy. The error in x is therefore one part in two
hundred. From (17) our estimate of x is 83.3, and this therefore has a
possible error of 0.4, and only two significant figures of x are worth-
while.
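The computation (17)-(18) can be confirmed in a few lines of Python (illustrative):

```python
# Relative errors of the correctly rounded factors and divisor simply add.
x = 0.833 * 22.5 / 0.225
rel = 0.0005 / 0.833 + 0.05 / 22.5 + 0.0005 / 0.225
print(x)         # 83.3 (to working accuracy)
print(rel)       # about 0.0050, one part in two hundred
print(x * rel)   # about 0.42, so only two figures of x are worthwhile
```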
Computing machines, programming and coding
15. There are about five steps in any computational task, though
some of them may not always be needed. They are as follows:
(i) Expressing the scientific problem as a mathematical problem.
(ii) Finding the 'best' numerical method for solving the mathematical problem.
(iii) Expressing this method in algorithmic form, that is as a sequence of numerical operations, recordings, and so on.
(iv) Turning this sequence into the language of the machine.
(v) Performing the computation.
These items are in some sense independent of the nature of our
computing equipment, but the latter will influence our choice in (ii),
and to some extent the work of (iv).

Before about 1950 most computation was carried out on a desk
machine which can perform arithmetic and store a few numbers. For
example, if we want to compute ab, we can put a in the setting register,
tap out the number b on the multiplication register, and obtain the
result in the product register. The numbers a, b and ab are all visible—
they are stored in the machine. In a long computation, however, in
spite of various tricks that we can play, we shall have to record with
pen and ink the results of many such intermediate calculations. We
use an auxiliary storage medium, in this case the 'registers' on a sheet
of paper.
The ‘best’ numerical method is then to some extent conditioned by
our desire to avoid overmuch recording, which is tedious and error-
provoking. On the other hand our auxiliary store is unlimited, and the
point of this remark will become apparent later.
In item (iii) for our desk-machine work we write down, in consider-
able detail, the precise nature and order of the operations we wish to
perform, and possibly present it to an assistant who will then perform
item (v). In other words we give him a programme of instructions. The
language we use in (iv) is the national tongue, with words and mathe-
matical symbols. Our helper then operates the machine as in (v),
records the intermediate results and produces the final answer,
recorded on his sheet of paper.
16. The modern high-speed electronic digital computer (the machine)
differs in several important respects, but our use of it has analogies
with that of the desk machine, and we can use the same type of
vocabulary. First, the machine has large storage capacity, its arith-
metic speeds are very great, and intermediate calculations can be
transferred to the registers of the machine rapidly and accurately.
The registers in the machine are numbered, are given addresses, and we
can ask the machine to put a number in a particular register, or to
fetch it from that register, just as we used to ask our assistant to copy
a number into a particular location on the computing sheet, or to
fetch a particular number and perform some numerical operation with it.
Second, the machine has an arithmetic unit, the operative part of
which is the accumulator, corresponding directly to the product
register of our desk machine. There is, however, one significant
difference. In the desk machine our registers have a fixed length, that
is we can store numbers to a certain precision, but it is perfectly
possible, and indeed easier, to perform our arithmetic with fewer
digits. In the electronic machine our registers can store a fixed number
of digits (the word length of the machine) and there is no economy of