AN INTRODUCTION TO NUMERICAL LINEAR ALGEBRA

Problems involving linear algebra arise in many contexts of scientific computation, either directly or through the replacement of continuous systems by discrete approximations. This introduction covers the practice of matrix algebra and manipulation, and the theory and practice of direct and iterative methods for solving linear simultaneous algebraic equations, inverting matrices, and determining the latent roots and vectors of matrices. Special attention is given to the important problem of error analysis, and numerous examples illustrate the procedures recommended in various circumstances. Emphasis is on the choice of, and the reasons for the choice of, selected numerical methods, and although this is conditioned by the digital computer there are no details of programming or coding for any machine or in any language. It is essentially a book on the 'numerical analysis' aspect of linear algebra.

MONOGRAPHS ON NUMERICAL ANALYSIS
General Editors: E. T. GOODWIN, L. FOX

Already published
THE NUMERICAL SOLUTION OF TWO-POINT BOUNDARY PROBLEMS IN ORDINARY DIFFERENTIAL EQUATIONS. By L. FOX. 1957
THE ALGEBRAIC EIGENVALUE PROBLEM. By J. H. WILKINSON. 1965

AN INTRODUCTION TO NUMERICAL LINEAR ALGEBRA
BY L. FOX, M.A., D.Sc.
DIRECTOR, UNIVERSITY COMPUTING LABORATORY, AND PROFESSOR OF NUMERICAL ANALYSIS, OXFORD

CLARENDON PRESS, OXFORD
Oxford University Press, Ely House, London W.1
GLASGOW NEW YORK TORONTO MELBOURNE WELLINGTON KUALA LUMPUR HONG KONG TOKYO
© Oxford University Press, 1964
FIRST PUBLISHED 1964. REPRINTED LITHOGRAPHICALLY IN GREAT BRITAIN FROM CORRECTED SHEETS OF THE FIRST EDITION BY WILLIAM CLOWES AND SONS, LIMITED, LONDON AND BECCLES, 1967

Preface

This series of Monographs on Numerical Analysis owes its existence to the late Professor D. R. Hartree who, defying the walrus, thought that the time had come to talk of one thing at a time, at least in this field. Indeed the various areas of Numerical Analysis have expanded so rapidly that it is now virtually impossible to write a single book which gives more than a very elementary introduction in all fields. We even need a variety of books on each single topic, as in other branches of science and mathematics, to meet the various requirements of the undergraduate, the research student, and those who spend their working life in solving numerical problems in specific contexts.

Numerical analysis was introduced in 1959 into the Oxford undergraduate mathematical syllabus, and it seemed to me preferable to talk about numerical linear algebra in the first place and to leave for subsequent courses the theory and practice of approximation and its applications to the solution of differential and integral equations in which, of course, linear algebra plays a large part. The material of this book is therefore based on this first set of lectures, and I generally cover some two-thirds of it in about 28 lectures, treating less thoroughly Chapters 4, 5 and 7 and the later parts of Chapters 8, 10 and 11.

I had considerable difficulty with Chapter 2. Instead of introducing linear equations via vector spaces, linear transformations and matrices I started with linear equations and tried to show how the algebra of matrix manipulation 'hangs together' and simplifies not only our notation but the proofs of our numerical operations. Though this does not give a beautiful mathematical theory I made the deliberate choice for three reasons.
First, the Oxford undergraduates learn the mathematical theory from other lecturers. Second, I think that with that theory they do not easily acquire the facility with matrix manipulation which they will need in numerical work and even for the study of more advanced theoretical texts. Third, the Director of a Computing Laboratory must also consider the engineers and scientists who use the computer to solve numerical problems, and in my experience these workers have some antipathy to and even fear of words like 'space', 'rank' and even 'matrix' which, they feel, represent strange and impractical mathematical abstractions. And yet the use and manipulation of matrices, in the elementary form given here and which is sufficient for many practical purposes, is really very easy! This is true also of 'norms', introduced in an elementary way in this Chapter and which are so valuable for measuring the convergence of series and iterative processes. And of course mathematical rank and matrix singularity have less importance in practical work. Here the data is rarely exact, and instead of matrix A we have to consider the matrix A + δA, where we may know only upper and lower bounds to the elements of δA. Even if A is exact (in a 'mathematical' problem) our numerical methods involve arithmetic which is rarely exact.

We must face the fact, and numerical analysts do not apologise for it, that the question of error analysis is profoundly important and that for this purpose we must investigate very closely the details of the arithmetic. The present tendency of error analysis, in all branches of numerical analysis, largely refrains from following through the effects on the solution of each individual error, but accepts these errors and tries to determine what problem we have actually solved. Our methods are then evaluated according to our ability to perform the appropriate analysis and to the size of the upper bounds of the perturbations. This is considered in Chapter 6.

Chapters 3 and 4 study various direct processes for solving linear equations based on elimination and triangular decomposition, and the close relations which exist between the various methods. Many of these, together with the orthogonalisation methods of Chapter 5, might well be discarded for practical purposes, but they have some mathematical interest and considerable literature, and I thought it desirable to collect in one place a summary of the relevant facts. In Chapter 7 I consider very briefly the work and storage requirements for some of the methods, with particular reference to automatic digital computers. Chapter 8 gives an introduction to a class of iterative methods for solving linear equations, whose recent developments, particularly for the large sparse matrices relevant to elliptic differential equations, have been brilliantly expounded by R. S. Varga in his 'Matrix Iterative Analysis' (Prentice-Hall, 1962).

Chapters 9 and 10 discuss the determination of the latent roots and vectors of general matrices, both by iterative methods and by the search for similarity transformations of various kinds, associated with the names of Jacobi, Givens, Householder, Lanczos, Rutishauser, Francis and others. These have been further developed by J. H. Wilkinson, together with a systematic error analysis whose general features are indicated in Chapter 11, and which are described in comprehensive detail in his forthcoming 'The algebraic eigenvalue problem' (Oxford, this series, in press).
There are, of course, some omissions which will displease many students and teachers. For example I have said very little about computing machines, and have not made detailed distinction between the error analysis of 'fixed-point' and 'floating-point' machine arithmetic. Coding and programming are mentioned only briefly in the introductory chapter, with no details of languages like FORTRAN or ALGOL. My personal opinion is that these things, while relatively easy to learn and master, take much space to describe, and the mathematical undergraduate needs essentially the principles expressed in a book which is reasonably short and correspondingly inexpensive. Those who teach ALGOL, moreover, can easily use as exercises the algorithms of this book, all of which, I hope, are expressed unambiguously in the language of English and of standard mathematical notation. In a few cases the algorithmic language would simplify the description, and in these cases it is interesting to note that hand computation is relatively tedious; the method of § 30 in Chapter 4 is one example of this. There is, of course, some advantage in using digital computers at the undergraduate stage, and I hope to introduce this at Oxford when we acquire facilities which are not completely saturated by the demands of research.

With regard to notation I have used the prime rather than the superscript T to denote matrix transposition, and usually capital letters denote matrices and lower-case letters denote vectors, in ordinary italic type. Exceptions are the row or column vectors of a matrix, usually denoted respectively by Rᵢ(A) and Cᵢ(A), and I fear that consistency lapses for the residual vector, sometimes called r and sometimes R, I suspect for personal historical reasons. All my matrices, incidentally, have distinct latent roots (which word I use consistently instead of eigenvalues) and consequently a full set of independent latent vectors, with obvious simplifications in the theory and no considerable restriction in practice.

Most of the material is already published in learned journals, and most books on numerical analysis have some account of parts of it. Few similar books, however, are available in English. Predecessors not mentioned in the text include P. S. Dwyer's 'Linear Computations' (Wiley, 1951), written before the advent of the digital computer and the advances in error analysis, and E. Bodewig's 'Matrix Calculus' (North-Holland Publishing Company, Amsterdam, 1956) which has more and deeper theoretical treatment but perhaps fewer practical details. More advanced books include those of Varga, the imminent treatise of Wilkinson, and the latter's just published 'Rounding errors in algebraic processes' (HMSO, 1963), and I hope that my readers will be able subsequently to benefit more easily from these learned works.

It is a pleasure to record my debt to Dr. E. T. Goodwin, who read the proofs and made several valuable suggestions; to Professor A. H. Taub, who invited me to Illinois for a sabbatical semester in which I found time to write several chapters; to the Clarendon Press, who made a special and successful effort to produce this book in time for the 1964 examinations; and above all to Dr. J. H. Wilkinson, who read all the first draft, made important criticisms and suggestions, and from whom I have learnt much.

L. Fox
Oxford, January 1964

Contents

1. INTRODUCTION
Numerical analysis
Computer arithmetic
Simple error analysis
Computing machines, programming and coding
Checking
Additional notes
2. MATRIX ALGEBRA
Introduction
Linear equations. General considerations
Homogeneous equations
Linear equations and matrices
Matrix addition and multiplication
Inversion and solution. The unit matrix
Transposition and symmetry. Inversion of products
Some special matrices
Triangular matrices. The decomposition theorem
The determinant
Cofactors and the inverse matrix
Determinants of special matrices
Partitioned matrices
Latent roots and vectors
Similarity transformations
Orthogonality
Symmetry. Rayleigh's Principle. Hermitian matrices
Limits, series and norms
Numerical methods
Additional notes and bibliography

3. ELIMINATION METHODS OF GAUSS, JORDAN AND AITKEN
Introduction
Calculation of the inverse
Matrix equivalent of elimination
The method of Aitken
The symmetric case
The symmetric, positive-definite case
Exact and approximate solutions. Integer coefficients
Determination of rank
Complete pivoting
Compatibility of linear equations
Note on comparison of methods
Additional notes and bibliography

4. COMPACT ELIMINATION METHODS OF DOOLITTLE, CROUT, BANACHIEWICZ AND CHOLESKY
Introduction
The method of Doolittle
Connexion with decomposition
The method of Crout
Symmetric case
The methods of Banachiewicz and Cholesky
Inversion. Connexion with Doolittle and Crout
Inversion. Symmetric case
Connexion with Jordan and Aitken
Row interchanges
Operations with complex matrices
Additional notes and bibliography

5. ORTHOGONALISATION METHODS
Introduction
Symmetric case
Unsymmetric case
Matrix orthogonalisation
Additional notes and bibliography

6. CONDITION, ACCURACY AND PRECISION
Introduction
Symptoms, causes and effects of ill-conditioning
Measure of condition
Exact and approximate data
Mathematical problems. Correction to approximate solution
Mathematical problems. Correction to the inverse
Physical problems. Error analysis
Relative precision of components of solution
Additional notes and bibliography

7. COMPARISON OF METHODS. MEASURE OF WORK
Introduction
Gauss elimination
Jordan elimination
Matrix decomposition
Aitken elimination

8. ITERATIVE AND GRADIENT METHODS
Introduction
General nature of iteration
Jacobi and Gauss-Seidel iteration
Acceleration of convergence
Labour and accuracy
Consistent ordering
Gradient methods
Symmetric positive-definite case
A finite iterative process
Additional notes and bibliography

9. ITERATIVE METHODS FOR LATENT ROOTS AND VECTORS
Introduction
Direct iteration
Acceleration of convergence
Other roots and vectors. Inverse iteration
Matrix deflation
Connexion with similarity transformation
Additional notes and bibliography

10. TRANSFORMATION METHODS FOR LATENT ROOTS AND VECTORS
Introduction
Method of Jacobi, symmetric matrices
Method of Givens, symmetric matrices
Method of Householder, symmetric matrices
Example of Givens and Householder
Uniqueness of triple-diagonal form
Method of Lanczos, symmetric matrices
Method of Lanczos, unsymmetric matrices
Vectors of triple-diagonal matrices
Other similarity transformations
The L-R method
The Q-R method
Reduction to Hessenberg form
Roots and vectors of Hessenberg matrix
Additional notes and bibliography

11. NOTES ON ERROR ANALYSIS FOR LATENT ROOTS AND VECTORS
Introduction
Ill-conditioning
Corrections to approximate roots and vectors
General perturbation analysis
Deflation perturbation
Additional notes and bibliography

INDEX

Note. † indicates that there is a further mention of the section in Additional notes and bibliography, given at the end of each Chapter.

1 Introduction

Numerical analysis

1. This book is concerned with topics in the field of linear algebra, in particular with the solution of linear equations and the inversion of matrices, and the determination of the latent roots and vectors of matrices. Before embarking on our exposition it is desirable to make some introductory remarks on the nature and general aims of numerical analysis, and on the computing equipment which will enable us, without undue fatigue and in reasonable time, to obtain numerical answers to our problems.

The numerical answer is our aim. The roots of the quadratic equation x² + 2bx + c = 0 are

x₁, x₂ = −b ± (b² − c)^½,   (1)

but we are concerned with the evaluation of x₁ and x₂ for given numerical values of b and c. We might, as here, have a 'closed expression' for the answer, in which we merely have to substitute the given numbers, the data of the problem. More commonly there is no simple formula, but there may be an algorithm, represented by an ordered sequence of numerical operations, additions, subtractions, multiplications and divisions, which is known to give the required result. The construction of such algorithms is one of the research activities of numerical analysis.

2. But we must be careful with the phrase 'required result'. An answer is rarely obtainable exactly as an integer or the ratio of two integers. Even for a simple problem like that represented by equation (1) we shall have to compute an irrational number, or non-terminating decimal, for most values of b and c. For example if b = 1 and c = −1 the required roots are −1 ± √2, and if we want this as a single number we have to specify in advance the precision of our result, that is the number of figures which we should like to have correct. In the decimal scale the number √2 is 1.41421356..., and if we specify a precision of p decimals we have to round the number appropriately, and in such a way that the error committed is as small as possible. To do this we truncate the number to the precision required, increasing by unity the last digit retained if the first neglected digit is 5, 6, 7, 8 or 9. We thereby ensure that the maximum error committed is not more than five units in the first neglected place, or half a unit in the last figure given, or 0.5 × 10⁻ᵖ. To three decimals √2 = 1.414, to seven decimals it is 1.4142136, and so on.
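This rounding rule is easy to check mechanically. The sketch below (Python is used here, and in the other sketches in this section, purely as executable illustration; the book itself prescribes no language, and the function name and use of the standard decimal module are the editor's choices) rounds √2 to three and to seven decimals with the half-up convention just described.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_decimals(value: str, p: int) -> Decimal:
    """Round a decimal string to p places, increasing the last retained
    digit by one when the first neglected digit is 5, 6, 7, 8 or 9."""
    exponent = Decimal(1).scaleb(-p)          # the quantum 10**(-p)
    return Decimal(value).quantize(exponent, rounding=ROUND_HALF_UP)

# sqrt(2) = 1.41421356..., rounded as in the text
print(round_to_decimals("1.41421356", 3))     # 1.414
print(round_to_decimals("1.41421356", 7))     # 1.4142136
# The committed error is at most half a unit in the last place, 0.5 x 10**(-p).
```

Working with decimal strings rather than binary floats keeps the example faithful to the decimal arithmetic assumed throughout this chapter.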
3. Even if the computation can in theory be performed with exact integers, moreover, we shall find that our computing machine cannot usually handle the large numbers involved in the arithmetic processes. For example, if we are solving simultaneous linear algebraic equations in n unknowns, in which the coefficients and right-hand sides are given as p-figure integers, an exact process could give the results as the ratios of two integers, both of which would contain np digits. In the more practicable methods which we discuss in this book the integers might contain p × 2ⁿ⁻¹ digits. If n is 20, which is by no means large in practical problems, and p is say four, this number is of the order of 2 × 10⁶, and no computing machine can store numbers of this size without complicating prohibitively the task of 'programming' and increasing prohibitively the time of operation.

If the coefficients are given as rational fractions, or as irrational numbers like e, π, √2 or sin 0.72 (radians), we shall have to round them to a given number of digits. The problem we are solving is then not quite the original problem, and one of our tasks will be to decide how many figures we need to keep in the original data, and also in the process of the computation, to obtain the required precision in the results.

4. Problems in which the data are known exactly, either as integers, rational or irrational numbers, I call mathematical. The author of such a problem has a perfect right to ask for any degree of precision which he needs for his purpose. On the other hand most problems with a scientific context will involve data obtained as a result of measurement, in some degree inaccurate, and our task now is to decide the worthwhile precision of the answers. Such problems are called physical, and it is self-deceptive to quote as answers more digits than those which remain unchanged however the data is varied within its limits of 'tolerance'. The 'required result' now becomes the 'meaningful result', and our methods should decide this for us.

As a trivial example, if we are asked to compute sin x, and a measurement of x gives the value x = ¼π ± 0.005, we see that there is a range of values of the answer, from about 0.7036 to 0.7106, and a quoted result of 0.7071 has a possible error of ±0.0035. It would clearly be stupid to quote more than three decimals in the result. We shall see later that the precision of the answer compared with that of the data varies considerably with the problem, and in complicated algorithms our work of determining this might be formidable and challenging.
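The numbers quoted for this example can be reproduced directly; the following minimal computation (the variable names are, of course, the editor's) shows how the tolerance in x limits the meaningful precision of sin x.

```python
import math

# x measured as pi/4 with a tolerance of +-0.005 (radians)
x, tol = math.pi / 4, 0.005

# sin is increasing near pi/4, so the extreme data give the extreme answers
lo, hi = math.sin(x - tol), math.sin(x + tol)
print(f"range: {lo:.4f} to {hi:.4f}")                       # 0.7036 to 0.7106
print(f"possible error: {max(hi - 0.7071, 0.7071 - lo):.4f}")  # about 0.0035
# Only three decimals of the quoted result 0.7071 are meaningful.
```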
5. We note also that we would often prefer to use an algorithm, rather than evaluate a closed solution, even when the latter exists. In the field of differential equations, for example, the first-order equation (2) has a closed solution of the form (3), where A is an arbitrary constant to be fixed by the specification of y for a particular value of x. Now this is a useful formula for the computation of y for one or two particular values of x. But it is quite common to want a graph, or preferably a table of values of y for a set of (usually) equidistant values of x over a lengthy range. The calculation of the expression (3) is then not trivial, involving the evaluations of a square root, an inverse tangent, and an exponential function, in addition to one division and several multiplications. In the computation of these elementary functions, moreover, we shall either have to use some form of series or to interpolate in mathematical tables, and the whole operation is somewhat lengthy. We have numerical methods for solving such problems, though they belong to a field outside our present interest, which perform much less arithmetic and which produce successive values in the table without ever knowing the closed solution (3).

6. The closed solution, of course, is extremely valuable for many purposes, but unfortunately it can rarely be obtained in terms of the so-called 'elementary' functions. For example an apparently innocent change in (2), to the form (4), produces the more formidable-looking solution (5). This can hardly be called a solution at all, since we have no analytical methods for evaluating the indefinite integral in terms of elementary functions, and some numerical process has to be used for this purpose. We might just as well use our algorithmic numerical method for the equation (4) without recourse to (5), and in fact the extra numerical work in (4) compared with that of (2) is almost negligible.

7. Again, however, we should not ignore the possibility of obtaining a closed solution, and it is very important that we should understand the mathematics and mathematical methods for our problems, as well as the numerical analysis and possible algorithms. In particular we should try to decide in advance whether our given problem really has a solution, that is whether there is an existence theorem for it. With the development of automatic computing machines the mathematical analysis is increasingly important, and it should never be thought that the machine will do the mathematics for us.

Our algorithm may sometimes decide for us whether or not our problem has a solution, or at least a unique solution. For example it is usually the case that a set of simultaneous linear algebraic equations has a unique solution when the number of equations is equal to the number of unknowns. But it is clear that the equations

x + y = 3
2x + 2y = 6   (6)

do not define a unique solution, the second equation being effectively a restatement of the first. If in the second of (6) the right-hand side were a number other than six it is clear, moreover, that the equations would have no solution at all. This is less obvious with the equations

x + y + z = α
x − y − z = β   (7)
2x + 4y + 4z = γ

which have no unique solution for any α, β and γ, and no solution at all unless γ = 3α − β. With many equations, and with more digits in the coefficients, we may have some trouble in this context, and the necessity for rounding may produce a solution from our computing machine when in fact no solution exists. We shall give examples of this in a later chapter and show how our algorithm can help to decide the questions. In other fields, notably in the solution of differential equations, our algorithm may be less valuable in the determination of existence, and mathematical analysis is essential.
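As a sketch of how such a question can be put to a machine, the fragment below applies the standard rank test to the system (7): solutions exist exactly when appending the right-hand side to the coefficient matrix does not increase the rank. This uses the numpy library for illustration only; it is not the book's method (the book develops its own elimination algorithms in Chapter 3), and, as the text warns, exact rank tests are themselves fragile once rounding enters.

```python
import numpy as np

# System (7): the rows are linearly dependent, since
# 3*(row 1) - (row 2) = (row 3).
A = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0, -1.0],
              [2.0,  4.0,  4.0]])

print(np.linalg.matrix_rank(A))   # 2: never a unique solution, whatever the data

for alpha, beta, gamma in [(1.0, 2.0, 1.0),    # gamma = 3*alpha - beta: consistent
                           (1.0, 2.0, 0.0)]:   # otherwise: no solution at all
    rhs = np.array([alpha, beta, gamma])
    aug_rank = np.linalg.matrix_rank(np.column_stack([A, rhs]))
    print("solutions exist" if aug_rank == 2 else "no solution")
```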
8. Summarizing, we can say that numerical analysis is concerned with the production of numerical solutions to scientific and mathematical problems. Our aim is to find methods which are economic in time, which produce the results to the accuracy requested in mathematical problems, and which tell us how many figures are worth quoting in physical problems. To the numerical analysis we should add any mathematical knowledge we have or can find about the existence of solutions, and in some sense our methods, like those of mathematics itself, should be elegant!

As a rather trivial example of elegance we might consider the formula (1) for the solution of quadratic equations. If b² − c is reasonably small, and we compute its square root to a given number of decimal places, the formula gives roughly the same number of correct digits in both roots. But if c is small, so that (b² − c)^½ = b + ε, where ε is small, then x₁ = −2b − ε, x₂ = ε, and x₁ is given accurately with many more digits than x₂. To avoid computing the square root to more figures we use our mathematics to note that x₁x₂ = c, so that x₂ = c/x₁ and can be computed from this formula with a relative accuracy similar to that of x₁. The loss of significant digits in subtracting large numbers is a common phenomenon, and we use all possible methods to avoid or mitigate the consequences thereof.

Computer arithmetic

9. There are two methods in common use for operating with numbers in a computing machine. In both cases the numbers are stored in registers of fixed length, so that we can retain only p digits, say, in any given number, and a number containing more than p digits must be truncated or, with extra effort, stored in two or more such registers. In what follows we assume that we are working in the common decimal system. With 'single-length' arithmetic, with p digits, we have either the fixed-point or the floating-point method of operation.

In the fixed-point method it is customary to limit the size of numbers which may occur to the range −1 to +1, and any number outside this range must be scaled appropriately by dividing by a power of 10. The programmer must take definite steps to keep track of these scale factors so that the correct result can finally be obtained. Since our machine can only store digits we must turn the positive and negative signs into quasi-digital form, and this we do with the convention that all positive numbers have their first digit zero. The decimal point will normally be thought to follow this digit, so that in a four-digit register we can effectively store three figures. The number 0.924 will actually appear in that form, and the largest positive number we can store is 0.999, the integer after the decimal point being 10ᵖ − 1 in a (p+1)-digit register machine. For a negative number x we store the complement 10 − |x|, so that the first digit is always 9, and the number −0.924 appears as 9.076. All negative numbers have nine as the first digit, and the largest negative number we can store is 9.000, which is −1 in the 'signed' convention.

It is easy to see that addition and subtraction, using the complements of negative numbers, will always give the correct answers in the 'signed' convention provided that the result is in the allowed range. In fact in a sequence of such operations the intermediate results are allowed to exceed the range. For example 0.126 − 0.125 = 0.126 + 9.875 = 10.001. The first digit is 'lost' and we are left with 0.001, the true result. Again, 0.125 − 0.126 = 0.125 + 9.874 = 9.999 = −0.001, again correct. The sum 0.986 + 0.125 = 1.111 cannot be allowed, however, and we would have to store this in the rounded form 0.111 × 10¹, remembering the power of 10 involved. But 0.986 + 0.125 − 0.389 = 0.986 + 0.125 + 9.611 = 10.722 = 0.722, and this is correct.

When we multiply together two permissible numbers the result is certain to be within range. But the exact product of two numbers of p digits has 2p digits, and we need two registers to store it exactly, a so-called 'double-length' accumulator. If we have to round it to single length we commit an error of maximum amount 0.5 × 10⁻ᵖ. The division a/b is out of range if a > b, but otherwise we can perform the calculation. In a 'single-length' register the stored result will have a maximum error of 0.5 × 10⁻ᵖ, unless the resulting decimal number terminates in at most p digits.
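The complement convention is easy to simulate. The sketch below models the four-digit register (p = 3) described above and reproduces the worked sums; the representation choices (an integer register wrapping modulo 10.000, a leading digit of 9 signalling a negative value) follow the text, but all names and the modelling details are the editor's, and real machines of the period differed in detail.

```python
# A (p+1)-digit register holds p fractional digits plus a sign digit;
# a negative x is stored as the complement 10 - |x|.
P = 3                       # three fractional digits
MOD = 10 ** (P + 1)         # the register wraps modulo 10.000

def store(x: float) -> int:
    units = round(abs(x) * 10**P)
    return units % MOD if x >= 0 else (MOD - units) % MOD

def value(r: int) -> float:
    # a leading digit of 9 marks a negative number
    return (r - MOD) / 10**P if r >= 9 * 10**P else r / 10**P

def add(r1: int, r2: int) -> int:
    return (r1 + r2) % MOD   # an overflow digit is simply 'lost'

# 0.126 - 0.125 = 0.126 + 9.875 -> 10.001; first digit lost, leaving 0.001
print(value(add(store(0.126), store(-0.125))))                      #  0.001
print(value(add(store(0.125), store(-0.126))))                      # -0.001
print(value(add(add(store(0.986), store(0.125)), store(-0.389))))   #  0.722
```

Note that the intermediate result 0.986 + 0.125 = 1.111 exceeds the permitted range, yet the final answer 0.722 is still correct, exactly as the text asserts.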
10. In the floating-point system our numbers can be of almost any size, and we store them in the form 10ᵃ × b, making space in our register for both a and b. This representation is not unique, but we standardize by choosing b in the range 0.1 ≤ |b| < 1. For example the number 1562 is stored as 0.1562 × 10⁴, and 0.001562 is given as 0.1562 × 10⁻². Both a and b can be negative, and are stored with the signed convention, though a is always an integer and we can forget about the decimal point in its register. Here the user is not worried by scaling problems and the machine automatically keeps track of the relevant powers of ten. 'Overflow' of the accumulator is now almost solely restricted to the case of division by zero, and otherwise the size of allowable numbers is governed by the size of the register we allow for the representation of the exponent a. We shall mention some other relevant facts about arithmetic in the appropriate contexts.

Simple error analysis

11. The fixed-point and floating-point representations introduce the ideas of decimal places and significant figures. Both the numbers 0.9246 and 0.0002 have four decimal places and would be stored in this form in the fixed-point method. The first number, however, has four significant figures whereas the second has only one significant figure. The point about the word 'significant' is that, if these numbers were obtained as a result of rounding with a possible maximum error of half a unit in the last place retained, each has a possible absolute error of ±0.00005, but the former has a much smaller relative error. It is correct to approximately one part in 20,000, while the number 0.0002 is correct only to one part in 4.

In the floating-point representation these numbers are stored respectively as 0.9246 × 10⁰ and 0.2000 × 10⁻³. Here the number of non-zero digits in the fractional part represents the number of significant figures present, the three zeros in the second example being inserted to fill up the register. If we had more significant information about this value, for example that it was 0.0002329..., or 0.0002000 where the last three zeros are known to be correct, we could store it in a floating-point form like 0.2329 × 10⁻³ with a small relative error, whereas the rounded fixed-point number 0.0002 has a small absolute error but a large relative error. This, incidentally, does not imply that the floating-point representation is superior. There are many factors involved, some of which we shall mention later. We note immediately, however, that in an addition like 0.9246 × 10⁰ + 0.2329 × 10⁻³ we have first to express the smaller number in the rounded form 0.0002 × 10⁰ in order to add it to the first, and we have had to discard its last three digits.
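A short sketch of the normalization 10ᵃ × b with 0.1 ≤ |b| < 1, and of the alignment loss in the addition just described, may make this concrete. The decimal working of the text is imitated with ordinary rounding; a real binary machine would differ in detail, and the function is the editor's illustration rather than anything the book specifies.

```python
import math

def normalize(x: float) -> tuple[int, float]:
    """Represent nonzero x as 10**a * b with 0.1 <= |b| < 1."""
    a = math.floor(math.log10(abs(x))) + 1
    return a, x / 10**a

for x in (1562.0, 0.001562):
    a, b = normalize(x)
    print(f"{x} = {b:.4f} x 10^{a}")   # 0.1562 x 10^4 and 0.1562 x 10^-2

# Adding 0.9246 x 10^0 and 0.2329 x 10^-3 in four-digit working: the smaller
# number must first be aligned and rounded to 0.0002 x 10^0, discarding
# its last three digits.
aligned = round(0.2329e-3, 4)          # 0.0002
print(f"{0.9246 + aligned:.4f}")       # 0.9248
```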
12. We shall need rules for assessing both types of error in simple operations, so that we can extend them to complicated situations. Consider first the case of absolute error. If x is the true value, and x + δx an approximation, the absolute error of x + δx is just δx. In general, for instance after rounding, the error will normally have equal possibility of being positive or negative. We can then assert quite obviously that the maximum absolute error in a sequence of additions or subtractions

x = ±a ± b ± c ± d ...   (8)

is just the arithmetic sum of the individual absolute errors, given by

|δx| = |δa| + |δb| + ....   (9)

For a product ab we actually form (a + δa)(b + δb), and the absolute error is

|b δa| + |a δb| + |δa δb|,   (10)

the last term usually being negligible, a quantity of 'second order', in relation to the others.

13. In fact we can use the differential calculus and say that, if

y = f(x₁, x₂,...),   (11)

and x₁, x₂,... have absolute errors |δx₁|, |δx₂|,..., then that of y is

|δy| = |∂f/∂x₁| |δx₁| + |∂f/∂x₂| |δx₂| + ...,   (12)

provided that the individual absolute errors are sufficiently small. For example, if y = sin x, then δy = cos x δx, and the absolute error in y is not greater than that in x. Again, if y = xᵖ, we have

|δy| = p xᵖ⁻¹ |δx|,   (13)

and the ratio |δy/δx| will depend both on x and on p. If p and x both exceed unity the error in y is greater than that in x, but if x > 1 and p < 1, so that we are taking a fractional power, then |δy| < |δx|. The statement in many books that we cannot get a result with more correct figures than are contained in the data is clearly false. For example if

y = 2^0.01,   (14)

and all we know about the '2' is that it is correctly rounded, we can certainly quote y = 1.007 with a maximum absolute error of 0.003, or maximum relative error of 1 in 300.

14. We shall in fact more often be concerned with relative error, the dimensionless quantity |δx/x|. The relative error of a sum or difference has no simple expression, but corresponding to (8) and (9) we have the rule that if

x = a^(±1) b^(±1) c^(±1) ...,   (15)

then

|δx/x| = |δa/a| + |δb/b| + ...,   (16)

that is the maximum relative error of the result is the sum of the individual relative errors. This is proved immediately by taking the logarithmic derivative of (15). This result will give us valuable information about the number of meaningful figures in the number x derived from an operation like (15). For example, if

x = (0.833 × 22.5)/0.225,   (17)

and all we know about the factors is that they are correctly rounded, to how many digits can we reasonably quote the result? From (16) we have

|δx/x| = 0.0005/0.833 + 0.05/22.5 + 0.0005/0.225 = 0.0050   (18)

to sufficient accuracy. The error in x is therefore one part in two hundred. From (17) our estimate of x is 83.3, and this therefore has a possible error of 0.4, and only two significant figures of x are worthwhile.
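Rules (12) and (16) can be checked directly on the two worked examples of this section; the sketch below does so, with function and variable names of the editor's choosing.

```python
import math

# Rule (12), specialised to y = x**p: |dy| <= |p * x**(p-1)| * |dx|.
def power_error(x: float, p: float, dx: float) -> float:
    return abs(p * x**(p - 1)) * dx

# Example (14): y = 2**0.01 with the '2' correctly rounded, so |dx| = 0.5.
y = 2 ** 0.01
print(f"y = {y:.3f}, max absolute error {power_error(2, 0.01, 0.5):.3f}")
# y = 1.007, error 0.003: more correct figures than the single figure of data.

# Rule (16): relative errors add for products and quotients.
factors = [(0.833, 0.0005), (22.5, 0.05), (0.225, 0.0005)]
rel = sum(d / v for v, d in factors)
x = 0.833 * 22.5 / 0.225
print(f"x = {x:.1f}, relative error {rel:.4f}")   # x = 83.3, about 0.0050
```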
We use an auciliary storage medium, in this case the ‘registers’ on a sheet of paper. The ‘best’ numerical method is then to some extent conditioned by our desire to avoid overmuch recording, which is tedious and error- provoking. On the other hand our auxiliary store is unlimited, and the point of this remark will become apparent later. In item (iii) for our desk-machine work we write down, in consider- able detail, the precise nature and order of the operations we wish to perform, and possibly present it to an assistant who will then perform item (v). In other words we give him a programme of instructions. The language we use in (iv) is the national tongue, with words and mathe- matical symbols. Our helper then operates the machine as in (v), records the intermediate results and produces the final answer, recorded on his sheet of paper. 16. The modern high-speed electronic digital computer (the machine) differs in several important respects, but our use of it has analogies with that of the desk machine, and we can use the same type of vocabulary. First, the machine has large storage capacity, its arith- metic speeds are very great, and intermediate calculations can be transferred to the registers of the machine rapidly and accurately. ‘The registers in the machine are numbered, are given addreasea, and we can ask the machine to put a number in a particular register, or to fetch it from that register, just as wo used to ask our assistant to copy a number into a particular location on the computing sheet, or to take ich a particular number and perform some numerical operation with Second, the machine has an arithmetic unit, the operative part of which is the accumulator, corresponding directly to the product register of our desk machine. There is, however, one significant difference. In the desk machine our registers have a fixed length, that is we can store numbers to a certain precision, but it is perfectly Possible, and indeed easier, to perform our arithmetic with fewer digits. In the electronic machine our registers can store a fixed number of digits (the word length of the machine) and there is no economy of