Maths All Notes
Numerical Analysis
These notes were originally prepared during Fall 2014 for Math Numerical Analysis. In writing these
notes, it was not my intention to add to the glut of Numerical Analysis texts; they were designed to
complement the course text, Numerical Analysis, Ninth edition, by Burden and Faires. As such, these
notes follow the conventions of that text fairly closely. If you are at all serious about pursuing the
study of Numerical Analysis, you should consider acquiring that text, or one of a number of other
fine texts by, e.g., Atkinson or Cheney & Kincaid.
Special thanks go to the students of the 2013 batch, who suffered through early versions of these notes,
which were riddled with (more) errors; I hope these notes now contain fewer errors.
Working additional homework questions and example problems will help you learn the material.
CHAPTER 1 (4 LECTURES)
FLOATING POINT ARITHMETIC AND ERRORS
1. Numerical analysis
Numerical analysis, area of mathematics and computer science that creates, analyzes, and imple-
ments algorithms for obtaining numerical solutions to problems involving continuous variables. Such
problems arise throughout the natural sciences, social sciences, engineering, medicine, and business.
Since the mid 20th century, the growth in power and availability of digital computers has led to an
increasing use of realistic mathematical models in science and engineering, and numerical analysis
of increasing sophistication is needed to solve these more detailed models of the world. The formal
academic area of numerical analysis ranges from quite theoretical mathematical studies to computer
science issues. A major advantage of numerical techniques is that a numerical answer can be obtained
even when a problem has no analytical solution. The result of a numerical computation is, in general,
an approximation, but it can be made as accurate as desired; for example, we can compute approximate
values of √2, π, etc.
With the increasing availability of computers, the new discipline of scientific computing, or com-
putational science, emerged during the 1980s and 1990s. The discipline combines numerical analysis,
symbolic mathematical computations, computer graphics, and other areas of computer science to make
it easier to set up, solve, and interpret complicated mathematical models of the real world.
1.1. Common perspectives in numerical analysis. Numerical analysis is concerned with all as-
pects of the numerical solution of a problem, from the theoretical development and understanding of
numerical methods to their practical implementation as reliable and efficient computer programs. Most
numerical analysts specialize in small subfields, but they share some common concerns, perspectives,
and mathematical methods of analysis. These include the following:
• When presented with a problem that cannot be solved directly, they try to replace it with a
“nearby problem” that can be solved more easily. Examples are the use of interpolation in
developing numerical integration methods and root-finding methods.
• There is widespread use of the language and results of linear algebra, real analysis, and func-
tional analysis (with its simplifying notation of norms, vector spaces, and operators).
• There is a fundamental concern with error, its size, and its analytic form. When approximating
a problem, it is prudent to understand the nature of the error in the computed solution.
Moreover, understanding the form of the error allows creation of extrapolation processes to
improve the convergence behaviour of the numerical method.
• Numerical analysts are concerned with stability, a concept referring to the sensitivity of the
solution of a problem to small changes in the data or the parameters of the problem. Numerical
methods for solving problems should be no more sensitive to changes in the data than the
original problem to be solved. Moreover, the formulation of the original problem should be
stable or well-conditioned.
In this chapter, we introduce and discuss some basic concepts of scientific computing. We begin
with floating-point representation, and then discuss the most fundamental source of imperfection in
numerical computing, namely roundoff errors. We also discuss sources of error and the stability of
numerical algorithms.
This is an infinite series, but a computer uses a finite amount of memory to represent numbers. Thus
only a finite number of digits may be used to represent any number, no matter what representation
method is used.
For example, we can chop the infinite decimal representation of 8/3 after 4 digits:
8/3 = (2/10 + 6/10^2 + 6/10^3 + 6/10^4) × 10^1 = 0.2666 × 10^1.
Generalizing this, we say that the number has n decimal digits, and we call n the precision.
For each real number x, we associate a floating point representation denoted by fl(x), given by
fl(x) = ±(0.a_1 a_2 … a_n)_β × β^e,
where the β-based fraction is called the mantissa (all a_i are integer digits) and e is known as the
exponent. This representation is called the β-based floating point representation of x, and we take
base β = 10 in this course.
For example,
42.965 = 4 × 10^1 + 2 × 10^0 + 9 × 10^{−1} + 6 × 10^{−2} + 5 × 10^{−3} = 0.42965 × 10^2,
−0.00234 = −0.234 × 10^{−2}.
The number 0 is written as 0.00…0 × 10^e. Likewise, we can use the binary number system: any real
x can be written
x = ±q × 2^m
with 1/2 ≤ q < 1 and some integer m. Both q and m are expressed in terms of binary digits. For
example,
(1001.1101)_2 = 1 × 2^3 + 1 × 2^0 + 1 × 2^{−1} + 1 × 2^{−2} + 1 × 2^{−4} = (9.8125)_{10}.
Remark 2.1. The above representation is not unique.
For example, 0.2666 × 10^1 = 0.02666 × 10^2, etc.
Definition 2.1 (Normal form). A non-zero floating-point number is in normal form if the value of the
mantissa lies in (−1, −0.1] or [0.1, 1).
Therefore, we normalize the representation by requiring a_1 ≠ 0. Not only is the precision limited to a
finite number of digits, but the range of the exponent is also restricted: there are integers m and M
such that −m ≤ e ≤ M.
Definition 2.2 (Overflow and underflow). An overflow occurs when a number is too large to fit
into the floating point system in use, i.e., e > M. An underflow occurs when a number is too small,
i.e., e < −m. When overflow occurs in the course of a calculation, it is generally fatal. Underflow
is non-fatal: the system usually sets the number to 0 and continues. (Matlab does this, quietly.)
2.1. Rounding and chopping. Let x be any real number and fl(x) its machine approximation.
There are two ways to do the “cutting” to store a real number
x = ±(0.a_1 a_2 … a_n a_{n+1} …) × 10^e, a_1 ≠ 0.
(1) Chopping: we ignore the digits after a_n and write
fl(x) = ±(0.a_1 a_2 … a_n) × 10^e.
(2) Rounding: rounding is defined by
fl(x) = ±(0.a_1 a_2 … a_n) × 10^e, 0 ≤ a_{n+1} < 5 (rounding down),
fl(x) = ±[(0.a_1 a_2 … a_n) + (0.00…01)] × 10^e, 5 ≤ a_{n+1} < 10 (rounding up).
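Both cuttings are easy to experiment with. Below is a minimal Python sketch; the helpers fl_chop and fl_round are illustrative names (not from any library), assuming base β = 10 and a normalized mantissa in [0.1, 1):

import math

# Illustrative n-digit decimal chopping and rounding (hypothetical helpers).
def fl_chop(x, n):
    if x == 0.0:
        return 0.0
    e = math.floor(math.log10(abs(x))) + 1          # exponent so |x|/10^e lies in [0.1, 1)
    m = x / 10**e                                    # normalized mantissa
    return math.trunc(m * 10**n) / 10**n * 10**e    # drop digits beyond the n-th

def fl_round(x, n):
    if x == 0.0:
        return 0.0
    e = math.floor(math.log10(abs(x))) + 1
    m = x / 10**e
    return round(m * 10**n) / 10**n * 10**e         # round the n-th digit

print(fl_chop(6/7, 2), fl_round(6/7, 2))            # 0.85 and 0.86, as in Example 1 below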
Example 1. With n = 2 digits,
fl(6/7) = 0.86 × 10^0 (rounding), fl(6/7) = 0.85 × 10^0 (chopping).
Example 2. Find the largest interval in which fl(x) must lie to approximate √2 with relative error
at most 10^{−5}.
Sol. We require
|√2 − fl(x)|/√2 ≤ 10^{−5}.
Therefore
|√2 − fl(x)| ≤ √2 · 10^{−5},
−√2 · 10^{−5} ≤ √2 − fl(x) ≤ √2 · 10^{−5},
√2 − √2 · 10^{−5} ≤ fl(x) ≤ √2 + √2 · 10^{−5}.
Hence the interval (in decimals) is [1.4141994…, 1.4142277…].
Therefore, for chopping,
|x − fl(x)| = (Σ_{i=n+1}^{∞} a_i/10^i) × 10^e
≤ (Σ_{i=n+1}^{∞} (10 − 1)/10^i) × 10^e
= (10 − 1)[1/10^{n+1} + 1/10^{n+2} + …] × 10^e
= (10 − 1) · (1/10^{n+1})/(1 − 1/10) × 10^e
= 10^{e−n}.
Hence the absolute error satisfies
E_a = |x − fl(x)| ≤ 10^{e−n}.
Now
|x| = (0.a_1 a_2 … a_n …)_{10} × 10^e ≥ 0.1 × 10^e = (1/10) × 10^e.
Therefore the relative error bound is
E_r = |x − fl(x)|/|x| ≤ 10^{e−n}/10^{e−1} = 10^{1−n}.
For rounding,
fl(x) = (0.a_1 a_2 … a_n)_{10} × 10^e = (Σ_{i=1}^{n} a_i/10^i) × 10^e, 0 ≤ a_{n+1} < 5,
fl(x) = (0.a_1 a_2 … a_{n−1}[a_n + 1])_{10} × 10^e = (1/10^n + Σ_{i=1}^{n} a_i/10^i) × 10^e, 5 ≤ a_{n+1} < 10.
In the rounding-down case,
|x − fl(x)| = (Σ_{i=n+1}^{∞} a_i/10^i) × 10^e
= [a_{n+1}/10^{n+1} + Σ_{i=n+2}^{∞} a_i/10^i] × 10^e
≤ [(10/2 − 1)/10^{n+1} + Σ_{i=n+2}^{∞} (10 − 1)/10^i] × 10^e
= [(10/2 − 1)/10^{n+1} + 1/10^{n+1}] × 10^e
= (1/2) · 10^{e−n}.
Thus, for example, with x = 5/7 and y = 1/3 in five-digit arithmetic,
x ⊕ y = fl(fl(x) + fl(y)) = fl(0.71428 × 10^0 + 0.33333 × 10^0) = fl(1.04761 × 10^0) = 0.10476 × 10^1.
Let the errors in the two components be δx1 and δx2, respectively, and let the error in the sum
X = x1 + x2 be δX. Then
X + δX = (x1 + δx1) + (x2 + δx2) = (x1 + x2) + (δx1 + δx2),
so that
|δX| = |δx1 + δx2| ≤ |δx1| + |δx2|.
Dividing by X we get
|δX/X| ≤ |δx1/X| + |δx2/X|,
which bounds the relative error. Therefore, if two numbers are added, the magnitude of the absolute
error in the result is at most the sum of the magnitudes of the absolute errors of the components. The
same result holds if we replace addition with subtraction.
Sol. Let x1/x2 = 7.342/0.241 = 30.4647.
Here the errors satisfy |δx1| = |δx2| ≤ (1/2) × 10^{−3} = 0.0005, for rounding.
Therefore the relative error satisfies
E_r ≤ 0.0005/7.342 + 0.0005/0.241 = 0.0021.
The absolute error satisfies
E_a ≤ 0.0021 × (x1/x2) = 0.0021 × 30.4647 = 0.0639.
Hence the true value of 7.342/0.241 lies between 30.4647 − 0.0639 = 30.4008 and 30.4647 + 0.0639 = 30.5286.
Example 8. The error in the measurement of the area of a circle is not allowed to exceed 0.5%. How
accurately should the radius be measured?
Sol. The area of the circle is A = πr².
∴ ∂A/∂r = 2πr.
The percentage error in A is (δA/A) × 100 = 0.5, so
δA = (0.5/100) A = (1/200)πr².
The percentage error in r is
(δr/r) × 100 = (δA/(∂A/∂r)) × (100/r) = ((1/200)πr²/(2πr)) × (100/r) = (r/400) × (100/r) = 0.25.
Thus the radius must be measured with a percentage error of at most 0.25%.
6. Loss of significance
Roundoff errors are inevitable and difficult to control. Other types of errors which occur in com-
putation may be under our control. The subject of numerical analysis is largely preoccupied with
understanding and controlling errors of various kinds.
One of the most common error-producing calculations involves the cancellation of significant digits due
to the subtraction of nearly equal numbers (or the addition of one very large number and one very
small number, or the multiplication of a small number by a quite large one).
The phenomenon can be illustrated with the following examples.
Example 9. If x = 0.3721478693 and y = 0.3720230572. What is the relative error in the computation
of x − y using five decimal digits of accuracy?
Sol. We can compute x − y with ten decimal digits of accuracy and take this as the ‘exact’ value:
x − y = 0.0001248121.
Both x and y will be rounded to five digits before subtraction. Thus
fl(x) = 0.37215, fl(y) = 0.37202,
and fl(x) − fl(y) = 0.00013. The relative error is therefore
|0.0001248121 − 0.00013|/0.0001248121 ≈ 0.0416,
i.e., about 4%: a large relative error caused by cancellation.
Example 10. Use four-digit rounding arithmetic to approximate the roots of 1.002x² + 11.01x + 0.01265 = 0.
Sol. The quadratic formula states that the roots of ax² + bx + c = 0 are
x_{1,2} = (−b ± √(b² − 4ac))/(2a).
Using the above formula, the roots of the given equation 1.002x² + 11.01x + 0.01265 = 0 are
approximately (in long format)
x1 = −0.00114907565991, x2 = −10.98687487643590.
We use four-digit rounding arithmetic to find approximations to the roots, written x*_1 and x*_2.
These approximations are given by
x*_{1,2} = (−11.01 ± √((−11.01)² − 4 · 1.002 · 0.01265))/(2 · 1.002)
= (−11.01 ± √(121.2 − 0.05070))/2.004
= (−11.01 ± 11.00)/2.004.
Therefore we find the first root:
x∗1 = −0.004990,
which has the absolute error |x1 − x∗1 | = 0.00384095 and relative error |x1 − x∗1 |/|x1 | = 3.34265968
(very high).
We find the second root,
x*_2 = (−11.01 − 11.00)/2.004 = −10.98,
which has absolute error
|x2 − x*_2| = 0.006874876,
and relative error
|x2 − x*_2|/|x2| = 0.000626127.
The quadratic formula thus encounters, in the calculation of the first root, the subtraction of nearly
equal numbers, which causes the loss of significance. Therefore, we use the alternative quadratic
formula, obtained by rationalizing the expression, to calculate x1; the approximation is given by
x*_1 = −2c/(b + √(b² − 4ac)) = −0.001149,
which has relative error
|x1 − x*_1|/|x1| = 6.584 × 10^{−5}.
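The same cancellation is easy to demonstrate in code. The following Python sketch uses standard double precision rather than four-digit arithmetic (so the effect is milder, but the rationalized formula remains the accurate one for the small root when b > 0):

import math

# Naive vs. rationalized quadratic formula for ax^2 + bx + c = 0 (assumes real roots).
def roots_naive(a, b, c):
    d = math.sqrt(b*b - 4*a*c)
    return (-b + d) / (2*a), (-b - d) / (2*a)   # first root subtracts nearly equal numbers

def roots_stable(a, b, c):
    d = math.sqrt(b*b - 4*a*c)
    return -2*c / (b + d), (-b - d) / (2*a)     # rationalized formula for the small root

print(roots_naive(1.002, 11.01, 0.01265))
print(roots_stable(1.002, 11.01, 0.01265))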
Example 11. The quadratic formula is used for computing the roots of the equation ax² + bx + c = 0,
a ≠ 0:
x = (−b ± √(b² − 4ac))/(2a).
Consider the equation x² + 62.10x + 1 = 0 and discuss the numerical results.
Sol. Using the quadratic formula and 8-digit rounding arithmetic, we obtain the two roots
x1 = −0.01610723, x2 = −62.08390.
We use these values as “exact values”. Now we perform the calculations with 4-digit rounding
arithmetic. We have
√(b² − 4ac) = √(62.10² − 4.000) = √(3856 − 4.000) = 62.06,
and
fl(x1) = (−62.10 + 62.06)/2.000 = −0.02000,
a very poor approximation to x1 = −0.01611, caused again by cancellation; the rationalized formula
gives fl(x1) = −2c/(b + √(b² − 4ac)) = −2.000/(62.10 + 62.06) = −0.01610, which is accurate.
The relative condition number of evaluating f at x compares the relative change in the function value
with the relative change in the argument:
κ = |(f(x) − f(x*))/f(x)| / |(x − x*)/x| ≈ |x f′(x)/f(x)|.
For example, if f(x) = 10/(1 − x²), then the condition number is
κ = |x f′(x)/f(x)| = 2x²/|1 − x²|.
The condition number is quite large for |x| ≈ 1; therefore, the function is ill-conditioned near these points.
Example 14. Compute and interpret the condition number for
(a) f (x) = sin x for x = 0.51π.
(b) f (x) = tan x for x = 1.7.
Sol. (a) The condition number is given by
κ = |x f′(x)/f(x)|.
For x = 0.51π: f′(x) = cos(0.51π) = −0.03141 and f(x) = sin(0.51π) = 0.99951, so
κ = |0.51π × (−0.03141)/0.99951| = 0.05035.
Since the condition number is < 1, we conclude that the relative error is attenuated.
(b) f(x) = tan x, f(1.7) = −7.6966, f′(x) = 1/cos²x, f′(1.7) = 1/cos²(1.7) = 60.2377, so
κ = |1.7 × 60.2377/(−7.6966)| = 13.305.
Thus, the function is ill-conditioned.
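The condition number κ = |x f′(x)/f(x)| can also be estimated numerically. A small sketch, using a central finite difference for f′ (the step h = 10^{−6} is an arbitrary choice):

import math

def condition_number(f, x, h=1e-6):
    # kappa = |x f'(x) / f(x)|, with f'(x) approximated by a central difference
    fprime = (f(x + h) - f(x - h)) / (2*h)
    return abs(x * fprime / f(x))

print(condition_number(math.sin, 0.51*math.pi))   # about 0.05: well-conditioned
print(condition_number(math.tan, 1.7))            # about 13.3: ill-conditioned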
In the following we study an example of how to create a stable algorithm.
7.1. Creating Algorithms. Another theme that occurs repeatedly in numerical analysis is the dis-
tinction between numerical algorithms that are stable and those that are not. Informally speaking, a
numerical process is unstable if small errors made at one stage of the process are magnified and prop-
agated in subsequent stages and seriously degrade the accuracy of the overall calculation.
An algorithm can be thought of as a sequence of problems, i.e. a sequence of function evaluations.
In this case we consider the algorithm for evaluating f (x) to consist of the evaluation of the sequence
x1 , x2 , · · · , xn . We are concerned with the condition of each of the functions f1 (x1 ), f2 (x2 ), · · · , fn−1 (xn−1 )
where f (x) = fi (xi ) for all i. An algorithm is unstable if any fi is ill-conditioned, i.e. if any fi (xi ) has
condition much worse than f (x).
Example 15. Write an algorithm to calculate the expression f(x) = √(x + 1) − √x when x is quite
large. By considering the condition number κ of the subproblems of evaluating the function, show that
such a function evaluation is not stable. Suggest a modification which makes it stable.
Sol. Consider
f(x) = √(x + 1) − √x,
so that there is potential loss of significance when x is large. Taking x = 12345 as an example, one
possible algorithm is
x0 := x = 12345
x1 := x0 + 1
x2 := √x1
x3 := √x0
f(x) := x4 := x2 − x3.
The loss of significance occurs with the final subtraction. We can rewrite the last step in the form
f3(x3) = x2 − x3 to show how the final answer depends on x3. As f3′(x3) = −1, we have the condition
κ(x3) = |x3 f3′(x3)/f3(x3)| = |x3/(x2 − x3)|,
from which we find κ(x3) ≈ 2.2 × 10^4 when x = 12345. Note that this is the condition of a subproblem
arrived at during the algorithm. To find an alternative algorithm we write
f(x) = (√(x + 1) − √x) · (√(x + 1) + √x)/(√(x + 1) + √x) = 1/(√(x + 1) + √x).
This suggests the algorithm
x0 := x = 12345
x1 := x0 + 1
x2 := √x1
x3 := √x0
x4 := x2 + x3
f(x) := x5 := 1/x4.
In this case f3(x3) = 1/(x2 + x3), giving a condition for the subproblem of
κ(x3) = |x3 f3′(x3)/f3(x3)| = |x3/(x2 + x3)|,
which is approximately 0.5 when x = 12345, and indeed in any case where x is much larger than 1.
Thus the first algorithm is unstable and the second is stable for large values of x. In general such
analyses are not usually so straightforward but, in principle, stability can be analysed by examining
the condition of a sequence of subproblems.
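The two algorithms can also be compared directly in double precision. A minimal sketch (for very large x the unstable form loses nearly all of its significant digits):

import math

# Unstable vs. stable evaluation of f(x) = sqrt(x+1) - sqrt(x) for large x.
for x in (12345.0, 1.0e15):
    unstable = math.sqrt(x + 1) - math.sqrt(x)          # subtraction of nearly equal numbers
    stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))    # rationalized form, no cancellation
    print(x, unstable, stable)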
Example 16. Write an algorithm to calculate the expression f (x) = sin(a + x) − sin a, when x =
0.0001. By considering the condition number κ of the subproblem of evaluating the function, show that
such a function evaluation is not stable. Suggest a modification which makes it stable.
Sol. Let x = 0.0001
x0 = 0.0001
x1 = a + x0
x2 = sin x1
x3 = sin a
x4 = x2 − x3 .
Now, to check the effect of x3 on the final subtraction, we consider the function f3(x3) = x2 − x3,
for which
κ(x3) = |x3 f3′(x3)/f3(x3)| = |x3/(x2 − x3)|.
We obtain a very large condition number, which shows that the last step is not stable.
Now we modify the above algorithm. We write the equivalent form
f (x) = sin(a + x) − sin a = 2 sin(x/2) cos(a + x/2).
The modified algorithm is the following
x0 = 0.0001
x1 = x0 /2
x2 = sin x1
x3 = cos(a + x1 )
x4 = 2x2 x3 .
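A quick numerical check of both algorithms (the value a = 1.0 is an arbitrary illustrative choice; the text leaves a general):

import math

# Unstable vs. stable evaluation of f(x) = sin(a + x) - sin(a) for small x.
a, x = 1.0, 0.0001
unstable = math.sin(a + x) - math.sin(a)               # cancellation of nearly equal values
stable = 2.0 * math.sin(x / 2) * math.cos(a + x / 2)   # identity form used above
print(unstable, stable)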
Exercises
(1) Compute the absolute error and relative error in approximations of x by x*.
a. x = π, x* = 22/7  b. x = √2, x* = 1.414  c. x = 8!, x* = 39900.
(2) Find the largest interval in which x* must lie to approximate x with relative error at most 10^{−4}.
Explain the difficulty in using the right-hand fraction to evaluate this expression when x is
close to zero. Give a way to avoid this problem and be as precise as possible.
(12) a. Consider the stability (by calculating the condition number) of √(1 + x) − 1 when x is near
0. Rewrite the expression to rid it of subtractive cancellation.
b. Rewrite e^x − cos x to be stable when x is near 0.
(13) Suppose that a function f (x) = ln(x + 1) − ln(x), is computed by the following algorithm for
large values of x using six digit rounding arithmetic
x0 : = x = 12345
x1 : = x0 + 1
x2 : = ln x1
x3 : = ln x0
f (x) := x4 : = x2 − x3 .
By considering the condition κ(x3 ) of the subproblem of evaluating the function, show that
such a function evaluation is not stable. Also propose the modification of function evaluation
so that algorithm will become stable.
(14) Assume 3-digit mantissa with rounding
a. Evaluate y = x³ − 3x² + 4x + 0.21 for x = 2.73.
b. Evaluate y = [(x − 3)x + 4]x + 0.21 for x = 2.73.
Compare and discuss the errors obtained in part (a) and (b).
(15) a. How many multiplications and additions are required to determine a sum of the form
Σ_{i=1}^{n} Σ_{j=1}^{i} a_i b_j ?
b. Modify the sum in part (a) to an equivalent form that reduces the number of computations.
(16) Let P(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0 be a polynomial, and let x0 be given. Construct
an algorithm to evaluate P(x0) using nested multiplication.
(17) Construct an algorithm that has as input an integer n ≥ 1, numbers x0 , x1 , · · · , xn , and a
number x, and that produces as output the product (x − x0)(x − x1) · · · (x − xn).
Bibliography
[Burden] Richard L. Burden, J. Douglas Faires and Annette Burden, “Numerical Analysis,” Cengage
Learning, 10th edition, 2015.
[Atkinson] K. Atkinson and W. Han, “Elementary Numerical Analysis,” John Wiley and Sons, 3rd
edition, 2004.
CHAPTER 2 (8 LECTURES)
ROOTS OF NON-LINEAR EQUATIONS IN ONE VARIABLE
1. Introduction
Finding one or more root (or zero) of the equation
f (x) = 0
is one of the more commonly occurring problems of applied mathematics. In most cases explicit
solutions are not available and we must be satisfied with being able to find a root to any specified
degree of accuracy. The numerical procedures for finding the roots are called iterative methods. These
problems arise in a variety of applications.
The growth of a population can often be modeled over short periods of time by assuming that the
population grows continuously with time at a rate proportional to the number present at that time.
Suppose that N (t) denotes the number in the population at time t and λ denotes the constant birth
rate of the population. Then the population satisfies the differential equation
dN(t)/dt = λN(t),
whose solution is N (t) = N0 eλt , where N0 denotes the initial population.
This exponential model is valid only when the population is isolated, with no immigration. If
immigration is permitted at a constant rate I, then the differential equation becomes
dN(t)/dt = λN(t) + I,
whose solution is
N(t) = N0 e^{λt} + (I/λ)(e^{λt} − 1).
Suppose a certain population contains N (0) = 1000000 individuals initially, that 424000 individuals
immigrate into the community in the first year, and that N (1) = 1564000 individuals are present at
the end of one year. To determine the birth rate of this population, we need to find λ in the equation
1564000 = 1000000 e^λ + (424000/λ)(e^λ − 1).
It is not possible to solve explicitly for λ in this equation, but numerical methods discussed in this
chapter can be used to approximate solutions of equations of this type to an arbitrarily high accuracy.
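For instance, writing F(λ) = 1000000 e^λ + (424000/λ)(e^λ − 1) − 1564000, the birth rate is a root of F(λ) = 0. A minimal Python sketch using the bisection idea introduced later in this chapter (the bracket [0.1, 0.2] is an assumption, checked by the sign test in the code):

import math

def F(L):
    return 1_000_000 * math.exp(L) + 424_000 / L * (math.exp(L) - 1) - 1_564_000

a, b = 0.1, 0.2
assert F(a) * F(b) < 0        # the sign change confirms the assumed bracket
for _ in range(50):           # each step halves the interval
    c = a + (b - a) / 2
    if F(a) * F(c) < 0:
        b = c
    else:
        a = c
print(c)                      # approximately 0.101, i.e. about a 10.1% birth rate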
Definition 1.1 (Simple and multiple root). A zero (root) has a “multiplicity”, which refers to the
number of times its associated factor appears in the equation. A root having multiplicity one is
called a simple root. For example, f(x) = (x − 1)(x − 2) has simple roots at x = 1 and x = 2, but
g(x) = (x − 1)² has a root of multiplicity 2 at x = 1, which is therefore not a simple root.
A root with multiplicity m ≥ 2 is called a multiple (or repeated) root. For example, in the equation
(x − 1)² = 0, x = 1 is a multiple (double) root.
If a polynomial has a multiple root, its derivative also shares that root.
Let α be a root of the equation f (x) = 0, and imagine writing it in the factored form
f (x) = (x − α)m φ(x)
with some integer m ≥ 1 and some continuous function φ(x) for which φ(α) ≠ 0. Then we say that α
is a root of f (x) of multiplicity m.
Now we study some iterative methods to solve the non-linear equations.
Example 1. The sum of two numbers is 20. If each number is added to its square root, then the
product of the resulting sums is 155.55. Perform five iterations of bisection method to determine the
two numbers. 1
Sol. Let x and y be the two numbers. Then,
x + y = 20.
Now x is added to √x and y is added to √y. The product of these sums is
(x + √x)(y + √y) = 155.55.
∴ (x + √x)(20 − x + √(20 − x)) = 155.55.
Writing the above equation as a root-finding problem,
f(x) = (x + √x)(20 − x + √(20 − x)) − 155.55 = 0.
As f(6)f(7) < 0, there is a root in the interval (6, 7).
Performing five iterations of the bisection method gives the approximate root 6.53125.
1Choice of initial approximations: Initial approximations to the root are often known from the physical significance of
the problem. Graphical methods are used to find the zero of f (x) = 0 and any value in the neighborhood of root can be
taken as initial approximation.
If the given equation f(x) = 0 can be written as f1(x) = f2(x), then the point of intersection of the graphs
y = f1(x) and y = f2(x) gives the root of the equation. Any value in the neighborhood of this point can be taken as
the initial approximation.
Definition 2.1 (Convergence). A sequence {xn} is said to converge to a point α with order p if
there exists a constant c such that
lim_{n→∞} |x_{n+1} − α|/|x_n − α|^p = c.
The constant c is known as the asymptotic error constant. If we write e_n = |x_n − α|, where e_n
denotes the absolute error at the n-th iteration, then in the limiting case
e_{n+1} = c e_n^p.
Two cases are given special attention.
(i) If p = 1 (and c < 1), the sequence is linearly convergent.
(ii) If p = 2, the sequence is quadratically convergent.
Definition 2.2. Let {βn} be a sequence which converges to zero and let {xn} be any sequence. If there
exists a constant c > 0 and an integer N > 0 such that
|xn − α| ≤ c|βn |, ∀n ≥ N,
then we say that {xn } converges to α with rate O(βn ). We write
xn = α + O(βn ).
Example: Define two sequences for n ≥ 1,
x_n = (n + 1)/n², and y_n = (n + 2)/n³.
Both sequences have limit 0, but the sequence {yn} converges to this limit much faster than the
sequence {xn}. Now, with βn = 1/n and β̃n = 1/n²,
|x_n − 0| = (n + 1)/n² ≤ (n + n)/n² = 2(1/n) = 2βn
and
|y_n − 0| = (n + 2)/n³ ≤ (n + 2n)/n³ = 3(1/n²) = 3β̃n.
Hence the rate of convergence of {xn } to zero is similar to the convergence of {1/n} to zero, whereas
{yn } converges to zero at a rate similar to the more rapidly convergent sequence {1/n2 }. We express
this by writing
xn = 0 + O(βn ) and yn = 0 + O(β̃n ).
2.2. Convergence analysis. Now we analyze the convergence of the iterations generated by the
bisection method.
Theorem 2.3. Suppose that f ∈ C[a, b] and f (a) · f (b) < 0. Then the bisection method generates a
sequence {cn } approximating a zero α of f with linear convergence.
Proof. Let [a1 , b1 ], [a2 , b2 ], · · · , [an , bn ], · · · , denote the successive intervals produced by the bisection
algorithm. Thus
a = a1 ≤ a2 ≤ · · · ≤ b1 = b
b = b1 ≥ b2 ≥ · · · ≥ a1 = a.
This implies {an } and {bn } are monotonic and bounded and hence convergent.
Since
b1 − a1 = b − a,
b2 − a2 = (1/2)(b1 − a1) = (1/2)(b − a),
........................
bn − an = (1/2^{n−1})(b − a). (2.1)
Hence
lim_{n→∞} (bn − an) = 0.
Here b − a denotes the length of the original interval with which we started. Taking limits,
lim_{n→∞} an = lim_{n→∞} bn = α (say).
Since f is continuous function, therefore
lim_{n→∞} f(an) = f(lim_{n→∞} an) = f(α).
The bisection method ensures that
f (an )f (bn ) ≤ 0
which implies
lim_{n→∞} f(an)f(bn) = f²(α) ≤ 0
⟹ f(α) = 0.
Thus the common limit of {an} and {bn} is a zero of f in [a, b].
Since the root α lies in either the interval [an, cn] or [cn, bn],
|α − cn| < cn − an = bn − cn = (1/2)(bn − an).
Combining with (2.1), we obtain the further bound
e_n = |α − cn| < (1/2^n)(b − a).
Therefore
e_{n+1} < (1/2^{n+1})(b − a),
∴ e_{n+1} < (1/2) e_n.
This shows that the iterates cn converge to α as n → ∞. By the definition of convergence, we can say
that the bisection method converges linearly with rate 1/2.
Illustrations: 1. Since the method brackets the root, it is guaranteed to converge; however, convergence
can be very slow.
2. Computing cn: it might happen that at a certain iteration n, computing cn = (an + bn)/2 gives an
overflow. It is better to compute cn as
cn = an + (bn − an)/2.
3. Stopping Criteria: Since this is an iterative method, we must determine some stopping criteria
that will allow the iteration to stop. We can use the following criteria to stop in term of absolute error
and relative error
|c_{n+1} − c_n| ≤ ε,
|c_{n+1} − c_n|/|c_{n+1}| ≤ ε, provided c_{n+1} ≠ 0.
The criterion |f(cn)| ≤ ε can be misleading, since it is possible for |f(cn)| to be very small even if cn
is not close to the root.
Let us now find the minimum number of iterations N needed with the bisection method to achieve a
certain desired accuracy ε. The interval length after N iterations is (b − a)/2^N, so to obtain an
accuracy of ε we must have (b − a)/2^N ≤ ε. That is,
2^{−N}(b − a) ≤ ε,
or
N ≥ (log(b − a) − log ε)/log 2.
Note the number N depends only on the initial interval [a, b] bracketing the root.
4. If a function just touches the x-axis, for example f(x) = x², then we cannot find a and b such that
f(a)f(b) < 0, even though x = 0 is a root of f(x) = 0.
5. For functions with a singularity at which the sign reverses, the bisection method may converge on
the singularity. An example is f(x) = 1/x: we can choose a and b such that f(a)f(b) < 0, but the
function is not continuous, so the theorem guaranteeing a root is not applicable.
Example 2. Use the bisection method to find solutions accurate to within 10−2 for x3 −7x2 +14x−6 = 0
on [0, 1].
Sol. Number of iterations
log(1 − 0) − log(10−2 )
N≥ = 6.6439.
log 2
Thus, a minimum of 7 iterations will be needed to obtain the desired accuracy using the bisection
method. This yields the mid-points cn and values f(cn) generated in the sketch below.
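A minimal Python sketch of this computation (a plain implementation, not the text's pseudocode):

# Bisection for f(x) = x^3 - 7x^2 + 14x - 6 on [0, 1] with tolerance 1e-2.
def f(x):
    return x**3 - 7*x**2 + 14*x - 6

a, b = 0.0, 1.0
while (b - a) / 2 > 1e-2:
    c = a + (b - a) / 2          # midpoint, written to avoid overflow
    if f(a) * f(c) < 0:
        b = c
    else:
        a = c
    print(c, f(c))               # mid-points c_n and values f(c_n)
# the final midpoint approximates the root 2 - sqrt(2) = 0.5857... within the tolerance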
Fixed-point iterations: We now consider solving an equation x = g(x) for a root α by the iteration
xn+1 = g(xn ), n ≥ 0,
with x0 as an initial guess to α.
Each solution of x = g(x) is called a fixed point of g.
2. x = 3/x.
3. x = (1/2)(x + 3/x).
Let x0 = 2.
Now √3 = 1.73205, and it is clear that the third choice is correct, but why are the others not working?
Which choice of g works will be answered by the convergence result below (which requires |g′(α)| < 1
and a ≤ g(x) ≤ b, ∀x ∈ [a, b], in a neighborhood of the root α).
Lemma 3.1. Let g(x) be a continuous function on [a, b] and assume that a ≤ g(x) ≤ b, ∀x ∈ [a, b].
Then x = g(x) has at least one solution in [a, b].
Proof. Let g be a continuous function on [a, b] with a ≤ g(x) ≤ b, ∀x ∈ [a, b].
Consider φ(x) = g(x) − x.
If g(a) = a or g(b) = b then the proof is trivial, so we assume a ≠ g(a) and b ≠ g(b).
Since a ≤ g(x) ≤ b, this gives g(a) > a and g(b) < b.
Now
φ(a) = g(a) − a > 0
and
φ(b) = g(b) − b < 0.
Now φ is continuous and φ(a)φ(b) < 0, therefore by Intermediate Value Theorem φ has at least one
zero in [a, b], i.e. there exists some α s.t.
g(α) = α, α ∈ [a, b].
Graphically, the roots are the intersection points of y = x & y = g(x) as shown in the Figure.
Theorem 3.2 (Contraction Mapping Theorem). Let g and g′ be continuous functions on [a, b], and
assume that g satisfies a ≤ g(x) ≤ b, ∀x ∈ [a, b]. Furthermore, assume that there is a positive constant
λ < 1 with
|g′(x)| ≤ λ, ∀x ∈ (a, b).
Then
1. x = g(x) has a unique solution α in the interval [a, b].
2. The iterates x_{n+1} = g(x_n), n ≥ 0, converge to α for any choice of x0 ∈ [a, b].
3. |α − x_n| ≤ (λ^n/(1 − λ)) |x_1 − x_0|, n ≥ 0.
4. Convergence is linear.
Proof. Let g and g′ be continuous on [a, b], and assume a ≤ g(x) ≤ b, ∀x ∈ [a, b]. By the
previous Lemma, there exists at least one solution to x = g(x).
By the Mean Value Theorem, for any x, y ∈ [a, b] there exists a point c between x and y such that
g(x) − g(y) = g′(c)(x − y),
whence
|g(x) − g(y)| ≤ λ|x − y|, 0 < λ < 1, ∀x, y ∈ [a, b].
1. Suppose x = g(x) has two solutions, say α and β, in [a, b]; then α = g(α) and β = g(β). Now
|α − β| = |g(α) − g(β)| ≤ λ|α − β|
⟹ (1 − λ)|α − β| ≤ 0
⟹ α = β, since 0 < λ < 1.
Therefore x = g(x) has a unique solution in [a, b], which we call α.
2. To check the convergence of the iterates {xn}, we observe that they all remain in [a, b]: if
xn ∈ [a, b], then x_{n+1} = g(x_n) ∈ [a, b].
Now
|α − x_{n+1}| = |g(α) − g(x_n)| = |g′(c_n)||α − x_n| (3.1)
for some c_n between α and x_n. Hence
|α − x_{n+1}| ≤ λ|α − x_n| ≤ λ²|α − x_{n−1}| ≤ · · · ≤ λ^{n+1}|α − x_0|.
As n → ∞, λ^n → 0, which implies x_n → α. Also
|α − x_n| ≤ λ^n |α − x_0|. (3.2)
3. To find the bound: since
|α − x_0| = |α − x_1 + x_1 − x_0| ≤ |α − x_1| + |x_1 − x_0| ≤ λ|α − x_0| + |x_1 − x_0|,
we get
(1 − λ)|α − x_0| ≤ |x_1 − x_0| ⟹ |α − x_0| ≤ |x_1 − x_0|/(1 − λ).
Therefore, using (3.2),
|α − x_n| ≤ λ^n |α − x_0| ≤ (λ^n/(1 − λ)) |x_1 − x_0|.
4. Now x_n → α implies c_n → α in (3.1). Hence
lim_{n→∞} |α − x_{n+1}|/|α − x_n| = |g′(α)|.
If |g′(α)| < 1, this formula shows that the iterates are linearly convergent with rate (asymptotic error
constant) |g′(α)|. If in addition g′(α) ≠ 0, the formula proves that convergence is exactly linear, with
no higher order of convergence possible.
Illustrations: 1. In practice, it may be difficult to find an interval [a, b] on which the condition
a ≤ g(x) ≤ b is satisfied. On the other hand, if |g′(α)| > 1, then the iteration x_{n+1} = g(x_n) will
not converge to α.
When |g 0 (α)| = 1, no conclusion can be drawn and even if convergence occur, the method would be
far too slow for the iteration method to be practical.
2. If
|α − x_n| ≤ (λ^n/(1 − λ)) |x_1 − x_0| < ε,
where ε is the desired accuracy, this bound can be used to find the number of iterations needed to
achieve the accuracy ε.
Also from part 2, |α − xn | ≤ λn |α − x0 | ≤ λn max{x0 − a, b − x0 } < ε, can be used to find the number
of iterations.
3. The possible behavior of the fixed-point iterates {xn} is shown in Figure 3 for various values of
g′(α). To see the convergence, consider the case of x1 = g(x0), the height of y = g(x) at x0. We bring
the number x1 back to the x-axis by using the line y = x and the height y = x1. We continue this with
each iterate, obtaining a stair-step behavior when g′(α) > 0. When g′(α) < 0, the iterates oscillate
around the fixed point α, as can be seen in the Figure. In the first figure (top) the iterations are
monotonically convergent, in the second oscillatory convergent, in the third divergent, and in the last
oscillatory divergent.
Theorem 3.3. Let α be a root of x = g(x), and let g(x) be p times continuously differentiable for all
x ∈ [α − δ, α + δ], with g(x) ∈ [α − δ, α + δ], for some p ≥ 2. Furthermore, assume that
g′(α) = g″(α) = · · · = g^{(p−1)}(α) = 0, g^{(p)}(α) ≠ 0. (3.3)
Then, for x0 sufficiently close to α, the iterates x_{n+1} = g(x_n) converge to α with order p.
Proof. Let g(x) be p times continuously differentiable for all x ∈ [α − δ, α + δ], with
g(x) ∈ [α − δ, α + δ], satisfying the conditions in equation (3.3) stated above.
Now expand g(xn) in a Taylor polynomial about α:
x_{n+1} = g(x_n) = g(α + x_n − α)
= g(α) + (x_n − α)g′(α) + · · · + ((x_n − α)^{p−1}/(p − 1)!) g^{(p−1)}(α) + ((x_n − α)^p/p!) g^{(p)}(ξ_n),
Hence, by the Contraction Mapping Theorem, the sequence {xn} defined above converges to the
unique solution of the given equation. Starting with x0 = 0.5, we can compute the solution as follows.
x1 = 0.303571429
x2 = 0.28971083
x3 = 0.289188016.
Therefore root correct to three decimals is 0.289.
Example 5. The equation e^x = 4x² has a root in [4, 5]. Show that we cannot find that root using
x = g(x) = (1/2)e^{x/2} in the fixed-point iteration method. Can you find another iterative formula
which will locate that root? If yes, find the third iterate with x0 = 4.5. Also find the error bound.
Sol. Here g(x) = (1/2)e^{x/2}, and g′(x) = (1/4)e^{x/2} > 1 for all x ∈ (4, 5); therefore, the fixed-point
iteration fails to converge to the root in [4, 5].
Now consider x = g(x) = ln(4x²), for which |g′(x)| = 2/x < 1 for all x ∈ (4, 5).
Also 4 ≤ g(x) ≤ 5, so the fixed-point iteration converges to the root in [4, 5].
Using the fixed-point iteration method with x0 = 4.5 gives the iterations as
x1 = g(x0 ) = ln(4 × 4.52 ) = 4.3944
x2 = 4.3469
x3 = 4.3253.
Now λ = max_{4≤x≤5} |g′(x)| = |g′(4)| = 0.5.
We have the error bound
|α − x3| ≤ (λ³/(1 − λ)) |x1 − x0| = (0.5³/(1 − 0.5)) |4.3944 − 4.5| = 0.0264.
Example 6. The equation x3 + 4x2 − 10 = 0 has a unique root in [1, 2]. Write the fixed-point
representations which converge to unique solution.
Sol. We discuss the possibilities for choosing g(x).
(1) x = g1(x) = x − x³ − 4x² + 10.
For g1(x) = x − x³ − 4x² + 10, we have g1(1) = 6 and g1(2) = −12, so g1 does not map [1, 2] into
itself. Moreover, g1′(x) = 1 − 3x² − 8x, so |g1′(x)| > 1 for all x in [1, 2]. Although the Convergence
Theorem does not guarantee that the method must fail for this choice of g, there is no reason
to expect convergence.
(2)
x3 = 10 − 4x2
x2 = 10/x − 4x
x = [(10/x) − 4x]1/2 = g2 (x)
With g2 (x) = [10/x − 4x]1/2 , we can see that g2 does not map [1, 2] into [1, 2], and the sequence
{xn }∞
n=0 is not defined when p0 = 1.5. Moreover, there is no interval containing α ≈ 1.365
such that |g20 (x)|< 1, because |g20 (α)|≈ 3.4. There is no reason to expect that this method will
converge.
(3)
4x2 = 10 − x3
1
x = (10 − x3 )1/2 = g3 (x)
2
For the function g3 (x) = 21 (10 − x3 )1/2 , we have
3
g30 (x) = − x2 (10 − x3 )1/2 < 0 on[1, 2],
4
so g3 is strictly decreasing on [1, 2]. However, |g30 (2)|≈ 2.12, so the condition |g30 (x)|≤ λ < 1
fails on [1, 2]. A closer examination of the sequence {xn }∞ n=0 starting with x0 = 1.5 shows that
it suffices to consider the interval [1, 1.5] instead of [1, 2]. On this interval it is still true that
g3′(x) < 0 and g3 is strictly decreasing, but, additionally,
1 < 1.28 ≈ g3(1.5) ≤ g3(x) ≤ g3(1) = 1.5
for all x ∈ [1, 1.5]. This shows that g3 maps the interval [1, 1.5] into itself. It is also true that
|g3′(x)| ≤ |g3′(1.5)| ≈ 0.66 on this interval, so the Convergence Theorem confirms the convergence.
(4) From x³ + 4x² = 10, x²(x + 4) = 10, so
x = (10/(x + 4))^{1/2} = g4(x).
For g4 we have
|g4′(x)| = 5/(√10 (4 + x)^{3/2}) ≤ 5/(√10 · 5^{3/2}) < 0.15, for all x ∈ [1, 2].
The bound on the magnitude of g4′(x) is much smaller than the bound (found in (3)) on the
magnitude of g3′(x), which explains the more rapid convergence using g4.
(5) The sequence defined by
x3 + 4x2 − 10
g5 (x) = x −
3x2 + 8x
converges much more rapidly than our other choices. In the next sections (Newton’s Method)
we will see where this choice came from and why it is so effective.
Starting with x0 = 1.5, the following table shows some of the iterates.
n    (1)      (2)            (3)           (4)           (5)
0    1.5      1.5            1.5           1.5           1.5
1    −0.875   0.8165         1.286953768   1.348399725   1.373333333
2    6.732    2.9969         1.402540804   1.367376372   1.365262015
3    −4.697   (−8.65)^{1/2}  1.345458374   1.364957015   1.365230014
4                            1.375170253   1.365264748   1.365230013
5                            1.360094193   1.365225594
6                            1.367846968   1.365230576
Example 7. Use a fixed-point method to determine a solution to within 10−4 for x = tan x, for x in
[4, 5].
Sol. Using g(x) = tan x and x0 = 4 gives x1 = g(x0 ) = tan 4 = 1.158, which is not in the interval [4, 5].
So we need a different fixed-point function.
If we note that x = tan x implies
1/x = 1/tan x
⟹ x = x − 1/x + 1/tan x,
then, starting with x0 = 4 and taking
g(x) = x − 1/x + 1/tan x,
we obtain x1 = 4.61369, x2 = 4.49596, x3 = 4.49341, x4 = 4.49341.
As x3 and x4 agree to five decimals, it is reasonable to assume that these values are sufficiently accurate.
Example 8. The iterates xn+1 = 2 − (1 + c)xn + cx3n will converge to α = 1 for some values of constant
c (provided that x0 is sufficiently close to α). Find the values of c for which convergence occurs? For
what values of c, if any, convergence is quadratic?
Sol. This is the fixed-point iteration
x_{n+1} = g(x_n)
with
g(x) = 2 − (1 + c)x + cx³.
Note that g(1) = 2 − (1 + c) + c = 1, so α = 1 is a fixed point for every c. Since g′(x) = −(1 + c) + 3cx²,
we have g′(1) = 2c − 1. The iteration converges for x0 sufficiently close to 1 when |g′(1)| = |2c − 1| < 1,
i.e., for 0 < c < 1, and the convergence is quadratic when g′(1) = 0, i.e., for c = 1/2.
Illustrations:
1. Stopping Criterion: We can use the following stopping criteria
|xn − xn−1| < ε, or |xn − xn−1|/|xn| < ε,
where ε is the given accuracy.
2. We can combine the secant method with the bisection method and bracket the root, i.e., we choose
initial approximations x0 and x1 in such a manner that f (x0 )f (x1 ) < 0. At each stage we bracket the
root. The method is known as ‘Method of False Position’ or ‘Regula Falsi Method’.
Example 10. Apply the secant method to find the root of the equation e^x = 3x with relative error
< 0.5%.
Sol. Let f(x) = e^x − 3x = 0.
The successive iterations of the secant method are given by
x_{n+1} = x_n − f(x_n)(x_n − x_{n−1})/(f(x_n) − f(x_{n−1})), n = 1, 2, …
We take initial guesses x0 = −1.1 and x1 = −1, and let en denote the relative error at the n-th step:
x2 = 0.2709, e1 = |(x2 − x1)/x2| × 100% = 469.09%,
x3 = 0.4917, e2 = |(x3 − x2)/x3| × 100% = 44.9%,
x4 = 0.5961, e3 = |(x4 − x3)/x4| × 100% = 17.51%,
x5 = 0.6170, e4 = |(x5 − x4)/x5| × 100% = 3.4%,
x6 = 0.6190, e5 = |(x6 − x5)/x6| × 100% = 0.32%.
We obtain error less than 0.5% and accept x6 = 0.6190 as root with prescribed accuracy.
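A minimal secant-method sketch reproducing this computation (stopping on the same relative-error test):

import math

# Secant method for f(x) = e^x - 3x, stopping when the relative error < 0.5%.
def f(x):
    return math.exp(x) - 3.0 * x

x0, x1 = -1.1, -1.0
while True:
    x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
    rel = abs((x2 - x1) / x2) * 100       # relative error in percent
    print(round(x2, 4), round(rel, 2))
    if rel < 0.5:
        break
    x0, x1 = x1, x2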
Example 11. Let f ∈ C²[a, b]. If α is a simple root of f(x) = 0, then show that the sequence {xn}
generated by the secant method has order of convergence 1.618.
Sol. We assume that α is a simple root of f(x) = 0, so f(α) = 0 and f′(α) ≠ 0.
Let xn = α + en, where en is the error at the n-th step.
An iterative method is said to have order of convergence p if
|x_{n+1} − α| ≈ C |x_n − α|^p,
or equivalently
|e_{n+1}| = C|e_n|^p.
The successive secant iterations are given by
x_{n+1} = x_n − f(x_n)(x_n − x_{n−1})/(f(x_n) − f(x_{n−1})), n = 1, 2, …
The error equation is
e_{n+1} = e_n − (e_n − e_{n−1}) f(α + e_n)/(f(α + e_n) − f(α + e_{n−1})).
Expanding f(α + e_n) and f(α + e_{n−1}) in Taylor series about α (and using f(α) = 0), the error
equation becomes
e_{n+1} = e_n − (e_n − e_{n−1})[e_n f′(α) + (1/2)e_n² f″(α) + …] / ((e_n − e_{n−1})[f′(α) + (1/2)(e_n + e_{n−1}) f″(α) + …])
= e_n − [e_n + (1/2)e_n² (f″(α)/f′(α)) + …][1 + (1/2)(e_{n−1} + e_n)(f″(α)/f′(α)) + …]^{−1}
= e_n − [e_n + (1/2)e_n² (f″(α)/f′(α)) + …][1 − (1/2)(e_{n−1} + e_n)(f″(α)/f′(α)) + …]
= (1/2)(f″(α)/f′(α)) e_n e_{n−1} + O(e_n² e_{n−1} + e_n e_{n−1}²).
Therefore
e_{n+1} ≈ A e_n e_{n−1},
where the constant A = (1/2) f″(α)/f′(α).
This relation is called the error equation. Now, by the definition of the order of convergence, we expect
a relation of the following type:
e_{n+1} = C e_n^p.
Shifting the index down by one, e_n = C e_{n−1}^p, so e_{n−1} = C^{−1/p} e_n^{1/p}. Hence
C e_n^p = A e_n C^{−1/p} e_n^{1/p}
⟹ e_n^p = A C^{−(1+1/p)} e_n^{1+1/p}.
Comparing the powers of e_n on both sides, we get
p = 1 + 1/p,
which gives two values of p; one is p = (1 + √5)/2 ≈ 1.618 and the other is negative (which we
discard, as an order of convergence is non-negative).
Therefore, the order of convergence of the secant method is about 1.618: superlinear, but less than 2.
4.2. Newton's Method. Let f(x) = 0 be the given non-linear equation.
Let the tangent line at the point (x0, f(x0)) on the curve y = f(x) intersect the x-axis at (x1, 0). The
equation of the tangent is given by
y − f(x0) = f′(x0)(x − x0),
where the number f′(x0) gives the slope of the tangent at x0. At x = x1, y = 0, so
0 − f(x0) = f′(x0)(x1 − x0),
x1 = x0 − f(x0)/f′(x0).
Here x0 is an approximation of the root.
This is called Newton's method, and the successive iterations are given by
x_{n+1} = x_n − f(x_n)/f′(x_n), n = 0, 1, …
The method can be obtained directly from the secant method by taking limit xn−1 → xn . In the
limiting case the chord joining the points (xn−1 , f (xn−1 )) and (xn , f (xn )) becomes the tangent at
(xn , f (xn )).
In this case problem of finding the root of the equation is equivalent to finding the point of intersection
of the tangent to the curve y = f (x) at point (xn , f (xn )) with the x-axis.
Example 12. Use Newton's Method to compute √2.
Sol. This number satisfies f(x) = 0 where f(x) = x² − 2.
Since f′(x) = 2x, it follows that in Newton's Method we can obtain the next iterate from the previous
iterate xn by
x_{n+1} = x_n − (x_n² − 2)/(2x_n) = x_n/2 + 1/x_n.
Starting with x0 = 1, we obtain
x1 = 1/2 + 1/1 = 1.5,
x2 = 1.5/2 + 1/1.5 = 1.41666667,
x3 = 1.41421569,
x4 = 1.41421356,
x5 = 1.41421356.
Since the fourth and fifth iterates agree to eight decimal places, we conclude that 1.41421356 solves
f(x) = 0 correctly to at least eight decimal places.
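A minimal Python sketch of this computation:

# Newton's method for f(x) = x^2 - 2, i.e. x_{n+1} = x_n/2 + 1/x_n.
x = 1.0
for n in range(5):
    x = x / 2 + 1 / x
    print(n + 1, x)   # 1.5, 1.41666667, 1.41421569, 1.41421356, 1.41421356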
• If f(x) is twice continuously differentiable on the closed finite interval [a, b] and the following
conditions are satisfied:
(i) f(a) f(b) < 0;
(ii) f′(x) ≠ 0, ∀x ∈ [a, b];
(iii) either f″(x) ≥ 0 or f″(x) ≤ 0, ∀x ∈ [a, b];
(iv) the tangent to the curve at either endpoint intersects the x-axis within the interval [a, b];
in other words, at the endpoints a, b,
|f(a)|/|f′(a)| < b − a, |f(b)|/|f′(b)| < b − a.
Then the Newton’s method converges to the unique solution α of f (x) = 0 in [a, b] for any
choice of x0 ∈ [a, b].
Conditions (i) and (ii) guarantee that there is one and only one solution in [a, b]. Condition
(iii) states that the graph of f (x) is either concave from above or concave from below, and
furthermore together with condition (ii) implies that f 0 (x) is monotone on [a, b].
The following example shows that choice of initial guess is very important for convergence.
Example 13. Use Newton's Method to find a non-zero solution of x = 2 sin x.
Sol. Let f(x) = x − 2 sin x.
Then f′(x) = 1 − 2 cos x, and f(1)f(2) < 0, so a root lies in (1, 2).
Newton's iterations are given by
x_{n+1} = x_n − f(x_n)/f′(x_n) = x_n − (x_n − 2 sin x_n)/(1 − 2 cos x_n) = 2(sin x_n − x_n cos x_n)/(1 − 2 cos x_n), n ≥ 0.
Let x0 = 1.1. The next six estimates, to 3 decimal places, are:
x1 = 8.453, x2 = 5.256, x3 = 203.384, x4 = 118.019, x5 = −87.471, x6 = −203.637.
Therefore the iterations diverge.
Note that choosing x0 = π/3 ≈ 1.0472 leads to immediate disaster, since then 1 − 2 cos x0 = 0 and
therefore x1 does not exist. The trouble with x0 = 1.1 was caused by the fact that f′(x0) ≈ 0.
Let's see whether we can do better. Draw the curves y = x and y = 2 sin x; a quick sketch shows that
they meet a bit past π/2. If we take x0 = 1.5, the next five estimates are
x1 = 2.076558, x2 = 1.910507, x3 = 1.895622, x4 = 1.895494, x5 = 1.895494.
Figure 7. One more example of where Newton’s method will not work.
Example 14. Find, correct to 5 decimal places, the x-coordinate of the point on the curve y = ln x
which is closest to the origin. Use the Newton’s Method.
Sol. Let (x, ln x) be a general point on the curve, and let S(x) be the square of the distance from
(x, ln x) to the origin. Then
S(x) = x2 + ln2 x.
We want to minimize the distance. This is equivalent to minimizing the square of the distance. Now
the minimization process takes the usual route. Note that S(x) is only defined when x > 0. We have
S′(x) = 2x + (2 ln x)/x = (2/x)(x² + ln x).
Our problem thus comes down to solving the equation S 0 (x) = 0. We can use the Newton’s Method
directly on S 0 (x), but calculations are more pleasant if we observe that S 0 (x) = 0 is equivalent to
x2 + ln x = 0.
Let f(x) = x² + ln x. Then f′(x) = 2x + 1/x, and we get the recurrence relation
x_{k+1} = x_k − (x_k² + ln x_k)/(2x_k + 1/x_k), k = 0, 1, …
We need to find a suitable starting point x0 . Experimentation with a calculator suggests that we take
x0 = 0.65.
Then x1 = 0.6529181 and x2 = 0.65291864.
Since x1 agrees with x2 to 5 decimal places, we conclude that, to 5 places, the minimum distance
occurs at x ≈ 0.65292.
4.3. Convergence Analysis.
Theorem 4.1. Let f ∈ C²[a, b]. If α is a simple root of f(x) = 0 (so f′(α) ≠ 0), then Newton's method
generates a sequence {xn} converging quadratically to the root α for any initial approximation x0
sufficiently near α.
Proof. The proof is based on analyzing Newton's method as the fixed point iteration scheme
x_{n+1} = g(x_n), n ≥ 0,
with
g(x) = x − f(x)/f′(x).
We first find an interval [α − δ, α + δ] that g maps into itself and on which |g′(x)| ≤ λ for some
λ ∈ (0, 1).
Since f′ is continuous and f′(α) ≠ 0, and a continuous function that is non-zero at a point remains
non-zero in a neighborhood of that point, g is defined and continuous in a neighborhood of α. In that
neighborhood,
g′(x) = 1 − (f′(x)f′(x) − f(x)f″(x))/f′(x)² = f(x)f″(x)/[f′(x)]². (4.1)
Now, since f(α) = 0,
g′(α) = f(α)f″(α)/[f′(α)]² = 0.
Since g′ is continuous and g′(α) = 0, for any λ with 0 < λ < 1 there exists a number δ > 0 such that
|g′(x)| ≤ λ, ∀x ∈ [α − δ, α + δ].
Now we will show that g maps [α − δ, α + δ] into [α − δ, α + δ].
If x ∈ [α − δ, α + δ], the Mean Value Theorem implies that for some number c between x and α,
|g(x) − α| = |g(x) − g(α)| = |g 0 (c)| |x − α| ≤ λ|x − α| < |x − α|.
It follows that if |x − α| < δ =⇒ |g(x) − α| < δ.
Hence, g maps [α − δ, α + δ] into [α − δ, α + δ].
All the hypotheses of the Fixed-Point Convergence Theorem (Contraction Mapping) are now satisfied,
so the sequence {xn} converges to the root α. Further, from Eq. (4.1), one can show
g″(α) = f″(α)/f′(α),
which proves that the convergence is of second order provided f″(α) ≠ 0.
Remark 4.1. Newton’s method converges at least quadratically. If g 00 (α) = 0, then higher order
convergence is expected.
Example 15. The function f (x) = sin x has a zero on the interval (3, 4), namely, x = π. Perform
three iterations of Newton’s method to approximate this zero, using x0 = 4. Determine the absolute
error in each of the computed approximations. What is the apparent order of convergence?
Sol. Consider f (x) = sin x. In the interval (3, 4), f has a zero α = π.
Also, f 0 (x) = cos x.
Newton's iterations are given by
x_{n+1} = x_n − f(x_n)/f′(x_n), n ≥ 0.
With x0 = 4, we have
x1 = x0 − sin 4/cos 4 = 2.8422,
x2 = x1 − sin 2.8422/cos 2.8422 = 3.1509,
x3 = x2 − sin 3.1509/cos 3.1509 = 3.1416.
The absolute errors are:
e0 = |x0 − α| = 0.8584, e1 = |x1 − α| = 0.2994, e2 = |x2 − α| = 0.0093, e3 = |x3 − α| = 2.6876 × 10^{−7}.
Estimating the order from p ≈ ln(e3/e2)/ln(e2/e1) ≈ 3, the apparent order of convergence is cubic
rather than the usual quadratic; this is because f″(π) = −sin π = 0.
4.4. Newton’s method for multiple roots. Let α be a root of f (x) = 0 with multiplicity m. In
this case we can write
f (x) = (x − α)m φ(x).
In this case
f (α) = f 0 (α) = · · · = f (m−1) (α) = 0, f (m) (α) 6= 0.
Recall that we can regard Newton's method as a fixed point method:
x_{n+1} = g(x_n), g(x) = x − f(x)/f′(x).
Then we substitute f(x) = (x − α)^m φ(x) to obtain
g(x) = x − (x − α)^m φ(x)/[m(x − α)^{m−1} φ(x) + (x − α)^m φ′(x)]
= x − (x − α)φ(x)/[mφ(x) + (x − α)φ′(x)].
Therefore we obtain
g′(α) = 1 − 1/m.
For m > 1, this is nonzero, and therefore Newton’s method is only linearly convergent.
There are ways of improving the speed of convergence of Newton's method, creating a modified method
that is again quadratically convergent. In particular, consider the fixed point iteration formula
x_{n+1} = g(x_n), g(x) = x − m f(x)/f′(x),
in which we assume the multiplicity m of the root α being sought is known. Then, modifying the above
argument on the convergence of Newton's method, we obtain
g′(α) = 1 − m(1/m) = 0,
and the iteration method is quadratically convergent. But most of the time we do not know the
multiplicity.
One method of handling the problem of multiple roots of a function f is to define
μ(x) = f(x)/f′(x).
If α is a zero of f of multiplicity m with f(x) = (x − α)^m φ(x), then
μ(x) = (x − α)^m φ(x)/[m(x − α)^{m−1} φ(x) + (x − α)^m φ′(x)]
= (x − α) φ(x)/[mφ(x) + (x − α)φ′(x)],
so μ has a simple zero at α, and Newton's method applied to μ(x) = 0 converges quadratically.
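The difference in speed is easy to see numerically. A small Python sketch comparing plain Newton with the modified iteration on the double root of f(x) = (x − 1)², where m = 2 is known:

# Plain Newton vs. modified Newton (m = 2) on f(x) = (x-1)^2, double root at 1.
def f(x):
    return (x - 1)**2

def df(x):
    return 2 * (x - 1)

x, y, m = 2.0, 2.0, 2
for n in range(6):
    x = x - f(x) / df(x)            # plain Newton: the error only halves each step
    if df(y) != 0.0:                # guard: once y hits the root, f'(y) = 0
        y = y - m * f(y) / df(y)    # modified Newton: lands on the root in one step here
    print(n + 1, x, y)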
Example 18. The function f(x) = tan(πx) − 6 has a zero at (arctan 6)/π ≈ 0.447431543. Use eight
iterations of each of the following methods to approximate this root. Which method is most successful
and why?
a. Bisection method in interval [0,1].
b. Secant method with x0 = 0 and x1 = 0.48.
c. Newton’s method with x0 = 0.4.
Sol. It is important to note that f has several roots on the interval [0, 5] (plot the function to see this).
a. Since f has several roots in [0, 5], the bisection method could converge to a different root on that
interval, so it is better to choose the interval [0, 1]. In that case we have the following results; after
8 iterations the answer is 0.447265625.
n   a          b           c
0   0          1           0.5
1   0          0.5         0.25
2   0.25       0.5         0.375
3   0.375      0.5         0.4375
4   0.4375     0.5         0.46875
5   0.4375     0.46875     0.453125
6   0.4375     0.453125    0.4453125
7   0.4453125  0.453125    0.44921875
8   0.4453125  0.44921875  0.447265625
Exercises
(1) Use the bisection method to find solutions accurate to within 10−3 for the following problems.
a. x − 2^{−x} = 0 for 0 ≤ x ≤ 1  b. e^x − x² + 3x − 2 = 0 for 0 ≤ x ≤ 1
c. x + 1 − 2 sin(πx) = 0 for 0 ≤ x ≤ 0.5 and 0.5 ≤ x ≤ 1.
(2) Find an approximation to ∛25 correct to within 10^{−3} using the bisection algorithm.
(3) Find a bound for the number of iterations needed to achieve an approximation by bisection
method with accuracy 10−2 to the solution of x3 − x − 1 = 0 lying in the interval [1, 2]. Find
an approximation to the root with this degree of accuracy.
(4) Sketch the graphs of y = x and y = 2 sin x. Use the bisection method to find an approximation
to within 10−3 to the first positive value of x with x = 2 sin x.
(5) Let f (x) = (x + 2)(x + 1)2 x(x − 1)3 (x − 2). To which zero of f does the bisection method
converge when applied on the following intervals?
a. [-1.5, 2.5] b. [-0.5, 2.4] c. [-0.5, 3] d. [-3, -0.5].
(6) The function defined by f (x) = sin(πx) has zeros at every integer. Show that when −1 < a < 0
and 2 < b < 3, the bisection method converges to
a. 0, if a + b < 2 b. 2, if a + b > 2 c. 1, if a + b = 2.
(7) For each of the following equations, use the given interval or determine an interval [a, b] on
which fixed-point iteration will converge. Estimate the number of iterations necessary to obtain
approximations accurate to within 10−2 , and perform the calculations.
a. 2 + sin x − x = 0, use [2, 3]  b. x³ − 2x − 5 = 0, use [2, 3]  c. 3x² − e^x = 0  d. x − cos x = 0.
(8) Use the fixed-point iteration method to find smallest and second smallest positive roots of the
equation tan x = 4x, correct to 4 decimal places.
(9) Show that g(x) = π + 0.5 sin(x/2) has a unique fixed point on [0, 2π]. Use fixed-point iteration
to find an approximation to the fixed point that is accurate to within 10−2 . Also estimate the
number of iterations required to achieve 10−2 accuracy, and compare this theoretical estimate
to the number actually needed.
(10) Find all the zeros of f (x) = x2 + 10 cos x by using the fixed-point iteration method for an
appropriate iteration function g. Find the zeros accurate to within 10−2 .
(11) What is the order of convergence of the iteration
x_{n+1} = x_n(x_n² + 3a)/(3x_n² + a)
as it converges to the fixed point α = √a?
(12) Let A be a given positive constant and g(x) = 2x − Ax².
a. Show that if fixed-point iteration converges to a nonzero limit, then the limit is α = 1/A,
so the inverse of a number can be found using only multiplications and subtractions.
b. Find an interval about 1/A for which fixed-point iteration converges, provided x0 is in that
interval.
(13) Consider the root-finding problem f (x) = 0 with root α, with f 0 (x) 6= 0. Convert it to the
fixed-point problem
x = x + cf (x) = g(x)
with c a nonzero constant. How should c be chosen to ensure rapid convergence of
xn+1 = xn + cf (xn )
to α (provided that x0 is chosen sufficiently close to α)? Apply your way of choosing c to the
root-finding problem x3 − 5 = 0.
(14) Show that if A is any positive number, then the sequence defined by
x_n = (1/2)x_{n−1} + A/(2x_{n−1}), for n ≥ 1,
converges to √A whenever x0 > 0. What happens if x0 < 0?
(15) Use secant method to find solutions accurate to within 10−3 for the following problems.
a. −x³ − cos x = 0 with x0 = −1 and x1 = 0.
b. x − cos x = 0, x ∈ [0, π/2].
c. e^x + 2^{−x} + 2 cos x − 6 = 0, x ∈ [1, 2].
(16) Use Newton’s method to find solutions accurate to within 10−3 to the following problems.
a. x − e^{−x} = 0 for 0 ≤ x ≤ 1.
b. 2x cos 2x − (x − 2)² = 0 for 2 ≤ x ≤ 3 and 3 ≤ x ≤ 4.
(17) Use Newton’s method to approximate the positive root of 2 cos x = x4 correct to six decimal
places.
(18) A calculator is defective: it can only add, subtract, and multiply. Use the equation 1/x = 1.732,
the Newton’s Method, and the defective calculator to find 1/1.732 correct to 4 decimal places.
(19) The fourth degree polynomial f (x) = 230x4 + 18x3 + 9x2 − 221x − 9 = 0 has two real zeros, one
in [−1, 0] and other in [0, 1]. Attempt to approximate these zeros to within 10−6 using Secant
and Newton’s methods.
(20) Use Newton’s method to solve the equation
1/2 + x²/4 − x sin x − (1/2)cos 2x = 0
with x0 = π/2. Iterate using Newton's method until an accuracy of 10^{−5} is obtained. Explain
why the result seems unusual for Newton’s method. Also, solve the equation with x0 = 5π and
x0 = 10π.
(21) Find all positive roots of the equation
10 ∫₀ˣ e^{−t²} dt = 1.
(23) Apply the Newton’s method with x0 = 0.8 to the equation f (x) = x3 − x2 − x + 1 = 0, and
verify that the convergence is only of first-order. Further show that root α = 1 has multiplicity
2 and then apply the modified Newton’s method with m = 2 and verify that the convergence
is of second-order.
(24) Use Newton’s method to approximate, to within 10−4 , the value of x that produces the point
on the graph of y = x2 that is closest to (1, 0).
(25) Use Newton’s method and the modified Newton’s method to find a solution of
cos(x + √2) + x(x/2 + √2) = 0, for −2 ≤ x ≤ −1,
accurate to within 10−3 .
(26) The circle below has radius 1, and the longer circular arc joining A and B is twice as long as
the chord AB. Find the length of the chord AB, correct to four decimal places. Use Newton’s
method.
(27) A particle starts at rest on a smooth inclined plane whose angle θ is changing at a constant rate
dθ/dt = ω < 0.
At the end of t seconds, the position of the object is given by
x(t) = −(g/(2ω²)) ((e^{ωt} − e^{−ωt})/2 − sin ωt).
Suppose the particle has moved 1.7 ft in 1 s. Find, to within 10^{−5}, the rate ω at which θ
changes. Assume that g = 32.17 ft/s².
(28) An object falling vertically through the air is subjected to viscous resistance as well as to the
force of gravity. Assume that an object with mass m is dropped from a height s0 and that the
height of the object after t seconds is
s(t) = s0 − (mg/k)t + (m²g/k²)(1 − e^{−kt/m}),
where g = 32.17 ft/s2 and k represents the coefficient of air resistance in lb-s/ft. Suppose
s0 = 300 ft, m = 0.25 lb, and k = 0.1 lb-s/ft. Find, to within 0.01 s, the time it takes this
quarter-pounder to hit the ground.
(29) It costs a firm C(q) dollars to produce q grams per day of a certain chemical, where
C(q) = 1000 + 2q + 3q^{2/3}.
The firm can sell any amount of the chemical at $4 a gram. Find the break-even point of the
firm, that is, how much it should produce per day in order to have neither a profit nor a loss.
Use the Newton’s method and give the answer to the nearest gram.
Appendix A. Algorithms
Algorithm (Bisection):
To determine a root of f (x) = 0 that is accurate within a specified tolerance value ε, given values a
and b such that f (a) f (b) < 0.
Repeat:
  Compute c = a + (b − a)/2.
  If f(a) f(c) < 0, then set b = c; otherwise set a = c.
Until |a − b| ≤ ε (tolerance value).
Print the root as c.
Algorithm (Fixed-point):
To find the fixed point of g in an interval [a, b], given the equation x = g(x) with an initial guess
x0 ∈ [a, b]
1. n = 1.
2. xn = g(xn−1 ).
3. If
|xn − xn−1| < ε or |xn − xn−1|/|xn| < ε,
then go to step 5.
4. n → n + 1; go to 2.
5. End of Procedure.
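A matching Python sketch of the fixed-point iteration, with the same two-part stopping test (illustrative only):

import math

def fixed_point(g, x0, eps=1e-6, max_n=100):
    x_prev = x0
    for n in range(1, max_n + 1):
        x = g(x_prev)                          # step 2: x_n = g(x_{n-1})
        if abs(x - x_prev) < eps or abs(x - x_prev) < eps * abs(x):
            return x                           # step 3: stop
        x_prev = x                             # step 4: n -> n + 1
    raise RuntimeError("no convergence in max_n steps")

# Example: x = e^(-x) has the fixed point 0.567143...
print(fixed_point(lambda x: math.exp(-x), 0.5))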
Algorithm (Secant):
1. Give inputs and take two initial guesses x0 and x1 .
2. Start iterations: with f0 = f (x0 ) and f1 = f (x1 ), compute
x2 = x1 − f1 (x1 − x0 )/(f1 − f0 ).
3. If |xn − xn−1 | < ε or |xn − xn−1 |/|xn | < ε, then stop and print the root.
4. Repeat the iterations (step 2). Also check if the number of iterations has exceeded the maximum
number of iterations.
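A Python sketch of the secant iteration above (illustrative only):

def secant(f, x0, x1, eps=1e-6, max_n=100):
    f0, f1 = f(x0), f(x1)
    for _ in range(max_n):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)       # step 2
        if abs(x2 - x1) < eps or abs(x2 - x1) < eps * abs(x2):
            return x2                              # step 3
        x0, f0, x1, f1 = x1, f1, x2, f(x2)         # step 4
    raise RuntimeError("maximum number of iterations exceeded")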
Bibliography
[Burden] Richard L. Burden, J. Douglas Faires and Annette Burden, “Numerical Analysis,” Cengage
Learning, 10th edition, 2015.
[Atkinson] K. Atkinson and W. Han, “Elementary Numerical Analysis,” John Wiley and Sons, 3rd edition, 2004.
CHAPTER 3 (4 LECTURES)
DIRECT METHODS FOR SOLVING LINEAR SYSTEMS
1. Introduction
Systems of simultaneous linear equations are associated with many problems in engineering and science, as well as with applications to the social sciences and the quantitative study of business and economic problems. These problems occur in a wide variety of disciplines, directly in real-world problems as well as in the solution process for other problems.
The principal objective of this chapter is to discuss the numerical aspects of solving a linear system of equations having the form
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
. . . . . . . . . . . . . . . . . . . . . . . . (1.1)
an1 x1 + an2 x2 + · · · + ann xn = bn .
This is a linear system of n equations in n unknowns x1 , x2 , . . . , xn . This system can simply be written in the matrix equation form Ax = b:
[ a11 a12 · · · a1n ] [ x1 ]   [ b1 ]
[ a21 a22 · · · a2n ] [ x2 ] = [ b2 ]   (1.2)
[  ·    ·        ·  ] [ ·  ]   [ ·  ]
[ an1 an2 · · · ann ] [ xn ]   [ bn ]
This equation has a unique solution x = A^{−1}b when the coefficient matrix A is non-singular. Unless otherwise stated, we shall assume that this is the case. If A^{−1} is already available, then x = A^{−1}b provides a good method of computing the solution x.
If A−1 is not available, then in general A−1 should not be computed solely for the purpose of obtaining
x. More efficient numerical procedures will be developed in this chapter. We study broadly two
categories Direct and Iterative methods. We start with direct method to solve the linear system in
this chapter.
2. Gaussian Elimination
Direct methods, which are techniques that give a solution in a fixed number of steps, subject only to round-off errors, are considered in this chapter. Gaussian elimination is the principal tool in the direct solution of system (1.2). The method is named after Carl Friedrich Gauss (1777-1855).
To solve a larger system of linear equations, we consider the following n × n system
a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1 (E1 )
a21 x1 + a22 x2 + a23 x3 + · · · + a2n xn = b2 (E2 )
............................................................
ai1 x1 + ai2 x2 + ai3 x3 + · · · + ain xn = bi (Ei )
............................................................
an1 x1 + an2 x2 + an3 x3 + · · · + ann xn = bn (En ).
Here Ei denotes the i-th row of the coefficient matrix A, i = 1, 2, · · · , n.
Let a11 ≠ 0 and eliminate x1 from E2 , E3 , · · · , En .
Define multipliers mi1 = ai1 /a11 , for each i = 2, 3, · · · , n.
We replace each Ei by Ei − mi1 E1 and bi by bi − mi1 b1 ; i = 2, 3, · · · , n.
Partial Pivoting: In the elimination process, we divide by aii at each stage, assuming aii ≠ 0. These elements are known as pivot elements. If at any stage of elimination one of the pivots becomes small (or zero), we bring another element into the pivot position by interchanging rows. This process is called Gauss elimination with partial pivoting.
Example 1. Solve the following system of equations using Gauss elimination with four-digit arithmetic. This system has the exact solution (known from other sources!) x1 = 2.6, x2 = −3.8, x3 = −5.0:
6x1 + 2x2 + 2x3 = −2
2x1 + 0.6667x2 + 0.3333x3 = 1
x1 + 2x2 − x3 = 0.
Sol. Eliminating x1 (multipliers m21 = 0.3333, m31 = 0.1667) leaves E2 = [0.0, 0.0001000, −0.3333 | 1.667] and E3 = [0.0, 1.667, −1.333 | 0.3334]. The multiplier is m32 = 1.667/0.0001 = 16670 and we write E3 as E3 − 16670E2 :
[ 6.000  2.000      2.000   | −2.000  ]
[ 0.0    0.0001000  −0.3333 |  1.667  ]
[ 0.0    0.0        5555    | −27790  ]
Using back substitution, we obtain
x3 = −5.003
x2 = 0.0
x1 = 1.335.
We observe that the computed solution is not compatible with the exact solution.
The difficulty lies in a22 . This coefficient is very small (almost zero): it arises from cancellation, so it carries essentially total relative error, and this error was carried through every computation involving it. To avoid this, we interchange the second and third rows and then continue the elimination.
In this case (after interchanging) the multiplier is m32 = 0.0001/1.667 = 0.00005999 and we write the new E3 as E3 − 0.00005999E2 :
[ 6.000  2.000  2.000   | −2.000 ]
[ 0.0    1.667  −1.333  |  0.3334 ]
[ 0.0    0.0    −0.3332 |  1.667  ]
Using back substitution, we obtain
x3 = −5.003
x2 = −3.801
x1 = 2.602.
We see that after partial pivoting, we get the desired solution.
Example 2. Given the linear system
x1 − x2 + αx3 = −2,
−x1 + 2x2 − αx3 = 3,
αx1 + x2 + x3 = 2.
a. Find value(s) of α for which the system has no solutions.
b. Find value(s) of α for which the system has an infinite number of solutions.
c. Assuming a unique solution exists for a given α, find the solution.
Sol. The augmented matrix is given by
[ 1  −1  α  | −2 ]
[ −1  2  −α |  3 ]
[ α   1   1 |  2 ]
Multipliers are m21 = −1 and m31 = α. Performing E2 → E2 + E1 and E3 → E3 − αE1 , we obtain
[ 1  −1     α        | −2       ]
[ 0   1     0        |  1       ]
[ 0  1 + α  1 − α^2  | 2(1 + α) ]
The multiplier is m32 = 1 + α and we perform E3 → E3 − m32 E2 :
[ 1  −1  α        | −2    ]
[ 0   1  0        |  1    ]
[ 0   0  1 − α^2  | 1 + α ]
a. If α = 1, then the last row of the reduced augmented matrix reads 0 · x3 = 2, so the system has no solution.
b. If α = −1, then the last row reads 0 · x3 = 0, so the system has infinitely many solutions.
c. If α ≠ ±1, then the system has a unique solution:
x3 = 1/(1 − α), x2 = 1, x1 = −1/(1 − α).
Complete Pivoting: In the first stage of elimination, we search for the largest element in magnitude in the entire matrix and bring it to the position of the first pivot. We repeat the same process at every step of elimination. This process requires interchanges of both rows and columns.
Scaled Partial Pivoting: In this approach, the algorithm selects as pivot, at each stage of elimination, the entry that is largest relative to the size of its row. At the beginning, a scale factor must be computed for each equation in the system. We define
si = max_{1≤j≤n} |aij | (1 ≤ i ≤ n).
These numbers are recorded in the scale vector s = [s1 , s2 , · · · , sn ]. Note that the scale vector does not change throughout the procedure.
In starting the forward elimination process, we do not arbitrarily use the first equation as the pivot equation. Instead, we use the equation for which the ratio |ai1 |/si is greatest. We repeat the process at the later stages of elimination, keeping the same scale factors.
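The selection rule is easy to express in code. A short illustrative numpy sketch, applied to the coefficient matrix of Example 4 below:

import numpy as np

A = np.array([[2.11, -4.21, 0.921],
              [4.01, 10.2, -1.12],
              [1.09, 0.987, 0.832]])
s = np.max(np.abs(A), axis=1)       # scale factors s_i = max_j |a_ij|
ratios = np.abs(A[:, 0]) / s        # |a_i1| / s_i for each row
pivot_row = int(np.argmax(ratios))  # -> 2, i.e. interchange rows 1 and 3
print(s, ratios, pivot_row)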
Example 4. Solve the system
2.11x1 − 4.21x2 + 0.921x3 = 2.01
4.01x1 + 10.2x2 − 1.12x3 = −3.09
1.09x1 + 0.987x2 + 0.832x3 = 4.21
by using scaled partial pivoting.
Sol. The augmented matrix is
[ 2.11  −4.21  0.921 |  2.01 ]
[ 4.01  10.2   −1.12 | −3.09 ]
[ 1.09  0.987  0.832 |  4.21 ]
The scale factors are s1 = 4.21, s2 = 10.2, and s3 = 1.09. We need to pick the largest of (2.11/4.21 = 0.501, 4.01/10.2 = 0.393, 1.09/1.09 = 1), which is the third entry, so we interchange row 1 and row 3 and interchange s1 and s3 to get
[ 1.09  0.987  0.832 |  4.21 ]
[ 4.01  10.2   −1.12 | −3.09 ]
[ 2.11  −4.21  0.921 |  2.01 ]
Performing E2 → E2 − 3.68E1 and E3 → E3 − 1.94E1 , we obtain
[ 1.09  0.987  0.832  |  4.21 ]
[ 0     6.57   −4.18  | −18.6 ]
[ 0     −6.12  −0.689 | −6.16 ]
Now comparing (6.57/10.2 = 0.644, 6.12/4.21 = 1.45), the second ratio is larger, so we interchange row 2 and row 3 and interchange the scale factors accordingly:
[ 1.09  0.987  0.832  |  4.21 ]
[ 0     −6.12  −0.689 | −6.16 ]
[ 0     6.57   −4.18  | −18.6 ]
Performing E3 → E3 + 1.07E2 , we get
[ 1.09  0.987  0.832  |  4.21 ]
[ 0     −6.12  −0.689 | −6.16 ]
[ 0     0      −4.92  | −25.2 ]
Backward substitution gives x3 = 5.12, x2 = 0.43, x1 = −0.436.
Example 5. Solve the system
3x1 − 13x2 + 9x3 + 3x4 = −19
−6x1 + 4x2 + x3 − 18x4 = −34
6x1 − 2x2 + 2x3 + 4x4 = 16
12x1 − 8x2 + 6x3 + 10x4 = 26
by hand using scaled partial pivoting. Justify all row interchanges and write out the transformed matrix
after you finish working on each column.
Sol. The augmented matrix is
[ 3   −13  9  3   | −19 ]
[ −6  4    1  −18 | −34 ]
[ 6   −2   2  4   |  16 ]
[ 12  −8   6  10  |  26 ]
and the scale factors are s1 = 13, s2 = 18, s3 = 6, and s4 = 12. We need to pick the largest of (3/13, 6/18, 6/6, 12/12), which is the third entry, so we interchange row 1 and row 3 and interchange s1 and s3 to get
[ 6   −2   2  4   |  16 ]
[ −6  4    1  −18 | −34 ]
[ 3   −13  9  3   | −19 ]
[ 12  −8   6  10  |  26 ]
with s1 = 6, s2 = 18, s3 = 13, s4 = 12. Performing E2 → E2 − (−6/6)E1 , E3 → E3 − (3/6)E1 , and E4 → E4 − (12/6)E1 , we obtain
[ 6  −2   2  4   |  16 ]
[ 0  2    3  −14 | −18 ]
[ 0  −12  8  1   | −27 ]
[ 0  −4   2  2   | −6  ]
Comparing (|a22 |/s2 = 2/18, |a32 |/s3 = 12/13, |a42 |/s4 = 4/12), the largest corresponds to row 3, so we interchange row 2 and row 3 and interchange s2 and s3 to get
[ 6  −2   2  4   |  16 ]
[ 0  −12  8  1   | −27 ]
[ 0  2    3  −14 | −18 ]
[ 0  −4   2  2   | −6  ]
with s1 = 6, s2 = 13, s3 = 18, s4 = 12. Performing E3 → E3 − (−2/12)E2 and E4 → E4 − (4/12)E2 (multipliers −1/6 and 1/3), we get
[ 6  −2   2  4      |  16    ]
[ 0  −12  8  1      | −27    ]
[ 0  0    13/3  −83/6 | −45/2 ]
[ 0  0    −2/3  5/3   |  3    ]
Comparing (|a33 |/s3 = (13/3)/18, |a43 |/s4 = (2/3)/12), the largest is the first, so we do not interchange rows. Performing E4 → E4 − (−2/13)E3 , we get the final reduced matrix
[ 6  −2   2  4       |  16    ]
[ 0  −12  8  1       | −27    ]
[ 0  0    13/3  −83/6 | −45/2 ]
[ 0  0    0     −6/13 | −6/13 ]
Backward substitution gives x1 = 3, x2 = 1, x3 = −2, x4 = 1.
Example 6. Solve this system of linear equations:
0.0001x + y = 1
x+y =2
using no pivoting, partial pivoting, and scaled partial pivoting. Carry at most five significant digits
of precision (rounding) to see how finite precision computations and roundoff errors can affect the
calculations.
Sol. By direct substitution, it is easy to verify that the true solution is x = 1.0001 and y = 0.99990 to
five significant digits.
For no pivoting, the first equation in the original system is the pivot equation, and the multiplier is
1/0.0001 = 10000. The new system of equations is
0.0001x + y = 1
9999y = 9998
We obtain y = 9998/9999 ≈ 0.99990 and x = 1. Notice that we have lost the last significant digit in
the correct value of x.
We repeat the solution process using partial pivoting for the original system. We see that the second entry in the first column is larger, so the second equation is used as the pivot equation. We interchange the two equations, obtaining
x + y = 2
0.0001x + y = 1.
Eliminating x from the second equation (multiplier 0.0001) gives 0.99990y = 0.99980, so y = 0.99990 and x = 2 − y = 1.0001, which agrees with the true solution to five significant digits.
2.1. Operation Counts. We count the number of operations required to solve the system Ax = b.
Both the amount of time required to complete the calculations and the subsequent round-off error
depend on the number of floating-point arithmetic operations needed to solve a routine problem.
In general, the amount of time required to perform a multiplication or division on a computer is
approximately the same and is considerably greater than that required to perform an addition or
subtraction. The actual differences in execution time, however, depend on the particular computing
system.
To demonstrate the counting operations for a given method, we will count the operations required to
solve a typical linear system of n equations in n unknowns using Gauss elimination Algorithm. We will
keep the count of the additions/subtractions separate from the count of the multiplications/divisions
because of the time differential.
The first step is to calculate the multipliers. Then the replacement of the equation Ei by (Ei − mij Ej ) requires that mij be multiplied by each term in Ej and that each term of the resulting equation be subtracted from the corresponding term in Ei . Totalling these operations over steps 1, 2, · · · , n − 1 gives the counts in going from A to U recorded in Example 8 below.
3. The LU Factorization:
When we use matrix multiplication, another meaning can be given to the Gauss elimination. The
matrix A can be factored into the product of the two triangular matrices.
Let Ax = b be the system to be solved, where A is the n × n coefficient matrix. The linear system can be reduced to the upper triangular system U x = g with
U = [ u11  u12  · · ·  u1n ]
    [ 0    u22  · · ·  u2n ]
    [ ·    ·           ·   ]
    [ 0    0    · · ·  unn ]
Here the uij are the entries produced by the elimination. Introduce an auxiliary lower triangular matrix L based on the multipliers mij as follows:
L = [ 1    0    0    · · ·  0 ]
    [ m21  1    0    · · ·  0 ]
    [ m31  m32  1    · · ·  0 ]
    [ ·    ·               ·  ]
    [ mn1  mn2  · · ·  mn,n−1  1 ]
Theorem 3.1. Let A be a non-singular matrix and let L and U be defined as above. If U is produced
without pivoting then
LU = A.
This is called LU factorization of A.
We can use Gaussian elimination to solve a system by LU decomposition. Suppose that A has been
factored into the triangular form A = LU , where L is lower triangular and U is upper triangular. Then
we can solve for x more easily by using a two-step process.
First we let y = U x and solve the lower triangular system Ly = b for y. Once y is known, the upper triangular system U x = y provides the solution x. One can check that the total number of operations is the same as for Gauss elimination.
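A minimal Python sketch of the two-step solve, assuming L has been built with unit diagonal as above (illustrative only):

import numpy as np

def lu_solve(L, U, b):
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                          # forward: solve L y = b (l_ii = 1)
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):              # backward: solve U x = y
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Check against Example 7 below: the solution is (3, 4, -2)
L = np.array([[1.0, 0, 0], [3, 1, 0], [2, 1, 1]])
U = np.array([[1.0, 2, 4], [0, 2, 2], [0, 0, 3]])
print(lu_solve(L, U, np.array([3.0, 13.0, 4.0])))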
Example 7. We are required to solve the following system of linear equations using LU decomposition.
x1 + 2x2 + 4x3 = 3
3x1 + 8x2 + 14x3 = 13
2x1 + 6x2 + 13x3 = 4.
(a) Find the matrices L and U using Gauss elimination.
(b) Using those values of L and U , solve the system of equations.
Sol. We first apply the Gaussian elimination on the matrix A and collect the multipliers m21 , m31 ,
and m32 .
We have
A = [ 1  2  4  ]
    [ 3  8  14 ]
    [ 2  6  13 ]
Multipliers are m21 = 3, m31 = 2. Performing E2 → E2 − 3E1 and E3 → E3 − 2E1 gives
[ 1  2  4 ]
[ 0  2  2 ]
[ 0  2  5 ]
The multiplier is m32 = 2/2 = 1 and we perform E3 → E3 − E2 :
[ 1  2  4 ]
[ 0  2  2 ]
[ 0  0  3 ]
We observe that m21 = 3, m31 = 2, and m32 = 1. Therefore,
A = [ 1  0  0 ] [ 1  2  4 ]
    [ 3  1  0 ] [ 0  2  2 ] = LU.
    [ 2  1  1 ] [ 0  0  3 ]
Therefore,
Ax = b =⇒ LU x = b.
Assuming U x = y, we obtain,
Ly = b
i.e.
[ 1  0  0 ] [ y1 ]   [ 3  ]
[ 3  1  0 ] [ y2 ] = [ 13 ]
[ 2  1  1 ] [ y3 ]   [ 4  ]
Using forward substitution, we obtain y1 = 3, y2 = 4, and y3 = −6. Now
U x = y =⇒ [ 1  2  4 ] [ x1 ]   [ 3  ]
           [ 0  2  2 ] [ x2 ] = [ 4  ]
           [ 0  0  3 ] [ x3 ]   [ −6 ]
Now, using the backward substitution process, we obtain the final solution as x3 = −2, x2 = 4, and
x1 = 3.
Example 8. (a) Determine the LU factorization for the matrix A in the linear system Ax = B, where
A = [ 1   1   0   3  ]       B = [ 1  ]
    [ 2   1   −1  1  ]           [ 1  ]   (3.1)
    [ 3   −1  −1  2  ]           [ −3 ]
    [ −1  2   3   −1 ]           [ 4  ]
b. Show that solving Ly = b, where L is a lower-triangular matrix with lii = 1 for all i, requires
(1/2)n^2 − (1/2)n multiplications/divisions and (1/2)n^2 − (1/2)n additions/subtractions.
c. Show that solving Ax = b by first factoring A into A = LU and then solving Ly = b and U x = y
requires the same number of operations as the Gaussian elimination algorithm.
Sol. a. We have already counted the arithmetic operations in detail for Gauss elimination. Here we record the same counts for the LU factorization.
The total number of additions/subtractions in going from A to U is n(n − 1)(2n − 1)/6 = (1/3)n^3 − (1/2)n^2 + (1/6)n, and the total number of multiplications/divisions is n(n − 1)/2 + n(n − 1)(2n − 1)/6 = n(n^2 − 1)/3 = (1/3)n^3 − (1/3)n.
These counts remain the same for factorizing the matrix A into L and U.
b. Solving Ly = b, where L is a lower-triangular matrix with lii = 1 for all i, requires 0 + 1 + · · · + (n − 1) = n(n − 1)/2 additions/subtractions and 0 + 1 + · · · + (n − 1) = n(n − 1)/2 multiplications/divisions. These operations are counted in the same manner as for back substitution; since lii is always 1, one division is saved at each step.
c. Solving U x = y by back substitution requires the usual back-substitution counts, and solving Ly = b requires the counts from part b. Adding these to the factorization counts of part a gives the same totals as the Gaussian elimination algorithm.
Exercises
(1) Use Gaussian elimination with backward substitution and two-digit rounding arithmetic to
solve the following linear systems. Do not reorder the equations. (The exact solution to each
system is x1 = −1, x2 = 1, x3 = 3.)
a.
−x1 + 4x2 + x3 = 8
(5/3)x1 + (2/3)x2 + (2/3)x3 = 1
2x1 + x2 + 4x3 = 11.
b.
4x1 + 2x2 − x3 = −5
(1/9)x1 + (1/9)x2 − (1/3)x3 = −1
x1 + 4x2 + 2x3 = 9.
(2) Using four-digit arithmetic, solve the following system of equations by Gaussian elimination with and without partial pivoting.
This system has the exact solution, rounded to four places, x1 = 0.2245, x2 = 0.2814, x3 = 0.3279. Compare your answers!
(3) Use the Gaussian elimination algorithm to solve the following linear systems, if possible, and
determine whether row interchanges are necessary:
a.
x1 − x2 + 3x3 = 2
3x1 − 3x2 + x3 = −1
x1 + x2 = 3.
b.
2x1 − x2 + x3 − x4 = 6
x2 − x3 + x4 = 5
x4 = 5
x3 − x4 = 3.
(4) Use Gaussian elimination and three-digit chopping arithmetic to solve the following linear
system, and compare the approximations to the actual solution [0, 10, 1/7]T .
3.03x1 − 12.1x2 + 14x3 = −119
−3.03x1 + 12.1x2 − 7x3 = 120
6.11x1 − 14.2x2 + 21x3 = −139.
(5) Repeat the above exercise using Gaussian elimination with partial and scaled partial pivoting
and three-digit rounding arithmetic.
(6) Suppose that
2x1 + x2 + 3x3 = 1
4x1 + 6x2 + 8x3 = 5
6x1 + αx2 + 10x3 = 5,
with |α| < 10. For which of the following values of α will there be no row interchange required
when solving this system using scaled partial pivoting?
a. α = 6 b. α = 9 c. α = −3.
(7) Modify the LU Factorization Algorithm so that it can be used to solve a linear system, and
then solve the following linear systems.
2x1 − x2 + x3 = −1
3x1 + 3x2 + 9x3 = 0
3x1 + 3x2 + 5x3 = 4.
Bibliography
[Burden] Richard L. Burden, J. Douglas Faires and Annette Burden, “Numerical Analysis,” Cengage
Learning, 10th edition, 2015.
[Atkinson] K. Atkinson and W. Han, “Elementary Numerical Analysis,” John Wiley and Sons, 3rd edition, 2004.
Appendix A. Algorithms
Algorithm (Gauss Elimination)
(1) Give inputs matrix A and right hand vector b and compute order N = max(size(A)).
(2) Perform Gaussian elimination
for j = 2 : N
for i = j : N
Calculate multipliers: m = A(i, j − 1)/A(j − 1, j − 1);
Replace entries: A(i, :) = A(i, :) − A(j − 1, :) ∗ m;
Replace b: b(i) = b(i) − m ∗ b(j − 1);
end for i
end for j.
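The same algorithm in Python, mirroring the loop indices of the pseudocode above (an illustrative sketch, no pivoting):

import numpy as np

def gauss_eliminate(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    N = len(b)
    for j in range(1, N):                        # j = 2 : N, 0-based here
        for i in range(j, N):
            m = A[i, j - 1] / A[j - 1, j - 1]    # multiplier
            A[i, :] -= m * A[j - 1, :]           # replace entries
            b[i] -= m * b[j - 1]                 # replace b
    x = np.zeros(N)
    for i in range(N - 1, -1, -1):               # back substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x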
ITERATIVE TECHNIQUES IN MATRIX ALGEBRA
1. Introduction
In this chapter we will study iterative techniques to solve linear systems. An initial approximation
(or approximations) will be found, and new approximations are then determined based on how well
the previous approximations satisfied the equation. The objective is to find a way to minimize the
difference between the approximations and the exact solution. To discuss iterative methods for solving
linear systems, we first need to determine a way to measure the distance between n-dimensional column
vectors. This will permit us to determine whether a sequence of vectors converges to a solution of the
system. In actuality, this measure is also needed when the solution is obtained by the direct methods
presented in Chapter 3. Those methods required a large number of arithmetic operations, and using
finite-digit arithmetic leads only to an approximation to an actual solution of the system. We end the chapter by presenting a way to find the dominant eigenvalue and an associated eigenvector. The dominant eigenvalue plays an important role in the convergence of any iterative method.
1.1. Norms of Vectors and Matrices.
1.2. Vector Norms. Let Rn denote the set of all n−dimensional column vectors with real-number
components. To define a distance in Rn we use the notion of a norm, which is the generalization of
the absolute value on R, the set of real numbers.
Definition 1.1. A vector norm on Rn is a function, ‖·‖, from Rn into R with the following properties:
(1) ‖x‖ ≥ 0 for all x ∈ Rn .
(2) ‖x‖ = 0 if and only if x = 0.
(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ Rn (triangle inequality).
(4) ‖αx‖ = |α| ‖x‖ for all x ∈ Rn and α ∈ R.
Definition 1.2. The l2 and l∞ norms for the vector x = (x1 , x2 , . . . , xn )t are defined by
‖x‖2 = ( Σ_{i=1}^{n} |xi |^2 )^{1/2} and ‖x‖∞ = max_{1≤i≤n} |xi |.
Note that each of these norms reduces to the absolute value in the case n = 1.
The l2 norm is called the Euclidean norm of the vector x because it represents the usual notion of
distance from the origin in case x is in R1 = R, R2 , or R3 . For example, the l2 norm of the vector
x = (x1 , x2 , x3 )t gives the length of the straight line joining the points (0, 0, 0) and (x1 , x2 , x3 ).
Example 1. Determine the l2 norm and the l∞ norm of the vector x = (−1, 1, −2)t .
Sol. The vector x = (−1, 1, −2)t in R3 has norms
‖x‖2 = √((−1)^2 + (1)^2 + (−2)^2) = √6
and
‖x‖∞ = max{|−1|, |1|, |−2|} = 2.
Definition 1.3 (Distance between vectors in Rn ). If x = (x1 , x2 , · · · , xn )t and y = (y1 , y2 , · · · , yn )t are vectors in Rn , the l2 and l∞ distances between x and y are defined by
‖x − y‖2 = ( Σ_{i=1}^{n} (xi − yi )^2 )^{1/2} and ‖x − y‖∞ = max_{1≤i≤n} |xi − yi |.
Definition 1.4 (Matrix Norm). A matrix norm on the set of all n × n matrices is a real-valued function, ‖·‖, defined on this set, satisfying for all n × n matrices A and B and all real numbers α:
(1) ‖A‖ ≥ 0,
(2) ‖A‖ = 0 if and only if A is O, the matrix with all 0 entries,
(3) ‖αA‖ = |α| ‖A‖,
(4) ‖A + B‖ ≤ ‖A‖ + ‖B‖,
(5) ‖AB‖ ≤ ‖A‖ ‖B‖.
If ‖·‖ is a vector norm on Rn , then
‖A‖ = max_{‖x‖=1} ‖Ax‖
is a matrix norm.
The matrix norms we will consider have the forms
‖A‖∞ = max_{‖x‖∞ =1} ‖Ax‖∞ and ‖A‖2 = max_{‖x‖2 =1} ‖Ax‖2 .
∴ lim_{k→∞} A^k = 0, which implies the matrix A is convergent.
Note that the convergent matrix A in this example has spectral radius ρ(A) = 1/2, because 1/2 is the only eigenvalue of A. This illustrates an important connection that exists between the spectral radius of a matrix and the convergence of the matrix, as detailed in the following result.
Theorem 1.5. A is a convergent matrix if and only if ρ(A) < 1.
The proof of this theorem can be found in advanced texts on numerical analysis.
2. Iterative Methods
The linear system Ax = b may have a large order. For such systems, Gauss elimination is often too expensive in either computation time or computer memory requirements, or both.
In an iterative method, a sequence of progressively better iterates is produced to approximate the solution.
Jacobi and Gauss-Seidel Method: We start with an example. Let us consider a system of equations
9x1 + x2 + x3 = 10
2x1 + 10x2 + 3x3 = 19
3x1 + 4x2 + 11x3 = 0.
One class of iterative methods for solving this system is as follows. We write
x1 = (1/9)(10 − x2 − x3 )
x2 = (1/10)(19 − 2x1 − 3x3 )
x3 = (1/11)(0 − 3x1 − 4x2 ).
Let x^(0) = [x1^(0) x2^(0) x3^(0)] be an initial approximation of the solution x. Then define the iteration
x1^(k+1) = (1/9)(10 − x2^(k) − x3^(k) )
x2^(k+1) = (1/10)(19 − 2x1^(k) − 3x3^(k) )
x3^(k+1) = (1/11)(0 − 3x1^(k) − 4x2^(k) ), k = 0, 1, 2, . . . .
This is called the Jacobi method, or the method of simultaneous replacements. The method is named after the German mathematician Carl Gustav Jacob Jacobi. We start with [0 0 0] and obtain
x1^(1) = 1.1111, x2^(1) = 1.900, x3^(1) = 0.0,
x1^(2) = 0.9000, x2^(2) = 1.6778, x3^(2) = −0.9939,
etc.
Another approach to solving the same system is the following:
x1^(k+1) = (1/9)(10 − x2^(k) − x3^(k) )
x2^(k+1) = (1/10)(19 − 2x1^(k+1) − 3x3^(k) )
x3^(k+1) = (1/11)(0 − 3x1^(k+1) − 4x2^(k+1) ), k = 0, 1, 2, . . . .
This method is called the Gauss-Seidel method, or the method of successive replacements. It is named after the German mathematicians Carl Friedrich Gauss and Philipp Ludwig von Seidel. Starting with [0 0 0], we obtain
x1^(1) = 1.1111, x2^(1) = 1.6778, x3^(1) = −0.9131,
x1^(2) = 1.0262, x2^(2) = 1.9687, x3^(2) = −0.9588.
The Jacobi iterative method is obtained by solving the i-th equation for xi to obtain (provided aii ≠ 0)
xi = (1/aii) [ Σ_{j=1, j≠i}^{n} (−aij xj ) + bi ].
For each k ≥ 0, generate the components xi^(k+1) from x^(k) as
xi^(k+1) = (1/aii) [ Σ_{j=1, j≠i}^{n} (−aij xj^(k) ) + bi ].
To write this scheme in matrix form, we take
aii xi^(k+1) = Σ_{j=1, j≠i}^{n} (−aij xj^(k) ) + bi .
Splitting A = D + L + U, where D, L and U are the diagonal, strictly lower triangular and strictly upper triangular parts of A, respectively, the Jacobi iteration reads
D x^(k+1) = −(L + U) x^(k) + b, i.e. x^(k+1) = Tj x^(k) + B, k = 0, 1, 2, · · · ,
with iteration matrix Tj = −D^{−1}(L + U) and B = D^{−1} b.
Similarly, the Gauss-Seidel iteration reads
(D + L) x^(k+1) = −U x^(k) + b, i.e. x^(k+1) = Tg x^(k) + B,
with iteration matrix Tg = −(D + L)^{−1} U and B = (D + L)^{−1} b.
Stopping Criterion: Since these techniques are iterative, we require a stopping criterion. If ε is the desired accuracy, then we can use
‖x^(k) − x^(k−1)‖∞ / ‖x^(k)‖∞ < ε.
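The following Python sketch implements the Gauss-Seidel sweep together with this stopping criterion; using x_old in place of x for the already-updated components would give the Jacobi method instead (illustrative only):

import numpy as np

def gauss_seidel(A, b, x0, eps=1e-5, max_k=100):
    x = x0.astype(float).copy()
    for k in range(max_k):
        x_old = x.copy()
        for i in range(len(b)):
            # newest values for j < i, previous values for j > i
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x_old[i + 1:]) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < eps * np.linalg.norm(x, np.inf):
            return x
    return x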
Example 5. Use the Gauss-Seidel method to approximate the solution of the following system:
4x1 + x2 − x3 = 3
2x1 + 7x2 + x3 = 19
x1 − 3x2 + 12x3 = 31.
Continue the iterations until two successive approximations are identical when rounded to three signif-
icant digits.
Sol. To begin, write the system in the form
x1 = (1/4)(3 − x2 + x3 )
x2 = (1/7)(19 − 2x1 − x3 )
x3 = (1/12)(31 − x1 + 3x2 ).
As
|a11 | = 4 > |a12 | + |a13 | = 2,
|a22 | = 7 > |a21 | + |a23 | = 3,
|a33 | = 12 > |a31 | + |a32 | = 4,
the coefficient matrix is strictly diagonally dominant. Therefore the Gauss-Seidel iterations will converge.
Starting with the vector x^(0) = [0, 0, 0]^t , the first approximation is
x1^(1) = 0.7500, x2^(1) = 2.5000, x3^(1) = 3.1458.
Similarly
x(2) = [0.9115, 2.0045, 3.0085]t
2.1. Convergence analysis of iterative methods. To study the convergence of general iteration
techniques, we need to analyze the formula
x^(k+1) = T x^(k) + B, for each k = 0, 1, · · · ,
where x^(0) is arbitrary. The next lemma and theorem provide the key for this study.
Lemma 2.1. If the spectral radius ρ(T ) < 1, then (I − T )^{−1} exists, and
(I − T )^{−1} = I + T + T^2 + · · · = Σ_{k=0}^{∞} T^k .
Theorem 2.2 (Necessary and sufficient condition). A necessary and sufficient condition for the convergence of an iterative method is that the eigenvalues of the iteration matrix T satisfy ρ(T ) < 1.
Proof. Suppose first that ρ(T ) < 1. The iterates of the method satisfy
x − x^(k) = T (x − x^(k−1) ) = T^2 (x − x^(k−2) ) = · · · = T^k (x − x^(0) ).
Let z = x − x^(0) . Since ρ(T ) < 1, T^k z → 0 as k → ∞ for every z, and hence x^(k) → x.
Conversely, if the iteration converges for every starting vector x^(0) , then T^k z → 0 for every vector z, which forces ρ(T ) < 1.
Theorem 2.3. If A is strictly diagonally dominant in Ax = b, then these iterative methods always converge for any initial starting vector.
Proof. The Jacobi iterations are given by
x^(k+1) = Tj x^(k) + B.
The method is convergent iff ρ(Tj ) < 1. Now, since Tj = −D^{−1}(L + U),
‖Tj ‖∞ = max_{1≤i≤n} Σ_{j≠i} |aij | / |aii | < 1
by strict diagonal dominance. Since ρ(Tj ) ≤ ‖Tj ‖∞ < 1, this shows the convergence of the Jacobi method.
Further, we prove the convergence of the Gauss-Seidel method. The Gauss-Seidel iterations are given by
x^(k+1) = −(D + L)^{−1} U x^(k) + (D + L)^{−1} b = Tg x^(k) + B.
Let λ be an eigenvalue of the iteration matrix Tg and x a corresponding eigenvector. Then
Tg x = λx
−(D + L)^{−1} U x = λx
−U x = λ(D + L)x.
Componentwise,
− Σ_{j=i+1}^{n} aij xj = λ Σ_{j=1}^{i} aij xj , i = 1, 2, . . . , n,
that is,
λ aii xi = −λ Σ_{j=1}^{i−1} aij xj − Σ_{j=i+1}^{n} aij xj .
Choose i so that |xi | = ‖x‖∞ = 1. Taking absolute values,
|λ| |aii | ≤ |λ| Σ_{j=1}^{i−1} |aij | + Σ_{j=i+1}^{n} |aij |,
so that
|λ| ≤ Σ_{j=i+1}^{n} |aij | / ( |aii | − Σ_{j=1}^{i−1} |aij | ) < 1,
since, by strict diagonal dominance, |aii | − Σ_{j<i} |aij | > Σ_{j>i} |aij |. Hence ρ(Tg ) < 1 and the Gauss-Seidel iterations converge.
Eigenvalues are −1/2, −1/2, 0. Thus ρ(Tg ) = 1/2 < 1. The spectral radius of the Jacobi iteration matrix is greater than one, while that of the Gauss-Seidel iteration matrix is less than one. Therefore the Gauss-Seidel iterations converge.
The choice of relaxation factor ω is not necessarily easy, and depends upon the properties of the
coefficient matrix. If A is a symmetric and positive definite matrix and 0 < ω < 2, then the SOR
method converges for any choice of initial approximate vector x(0) .
Important Note: If a matrix A is symmetric, it is positive definite if and only if all its leading principal submatrices (minors) have positive determinants.
Example 7. Consider a linear system Ax = b, where
A = [ 3   −1  1  ]       b = [ −1 ]
    [ −1  3   −1 ]           [ 7  ]
    [ 1   −1  3  ]           [ −7 ]
a. Check, that the SOR method with value ω = 1.25 of the relaxation parameter can be used to solve
this system.
b. Compute the first iteration by the SOR method starting at the point x(0) = (0, 0, 0)t .
Sol. a. Let us verify the sufficient condition for using the SOR method. We have to check whether the matrix A is symmetric positive definite. A is symmetric since A = A^T, so let us check positive definiteness:
det(3) = 3 > 0, det [ 3 −1; −1 3 ] = 8 > 0, det(A) = 20 > 0.
All leading principal minors are positive and so the matrix A is positive definite. We know, that for
symmetric positive definite matrices the SOR method converges for values of the relaxation parameter
ω from the interval 0 < ω < 2.
Therefore the SOR method with value ω = 1.25 can be used to solve this system.
b. The iterations of the SOR method are easier to compute by components than in vector form. Write the system as equations and write down the Gauss-Seidel iterations:
x1^(k+1) = (−1 + x2^(k) − x3^(k) )/3
x2^(k+1) = (7 + x1^(k+1) + x3^(k) )/3
x3^(k+1) = (−7 − x1^(k+1) + x2^(k+1) )/3.
Now multiply the right-hand side by the parameter ω and add to it the vector x^(k) from the previous iteration multiplied by the factor (1 − ω):
x1^(k+1) = (1 − ω)x1^(k) + ω(−1 + x2^(k) − x3^(k) )/3
x2^(k+1) = (1 − ω)x2^(k) + ω(7 + x1^(k+1) + x3^(k) )/3
x3^(k+1) = (1 − ω)x3^(k) + ω(−7 − x1^(k+1) + x2^(k+1) )/3.
For k = 0:
x1^(1) = (1 − 1.25) · 0 + 1.25 · (−1 + 0 − 0)/3 = −0.41667
x2^(1) = (1 − 1.25) · 0 + 1.25 · (7 − 0.41667 + 0)/3 = 2.7431
x3^(1) = (1 − 1.25) · 0 + 1.25 · (−7 + 0.41667 + 2.7431)/3 = −1.6001.
The next three iterations are
x(2) = (1.4972, 2.1880, −2.2288)t ,
x(3) = (1.0494, 1.8782, −2.0141)t ,
x(4) = (0.9428, 2.0007, −1.9723)t .
The exact solution is x = (1, 2, −2)t .
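A Python sketch of this component-wise SOR sweep (illustrative; ω = 1 recovers Gauss-Seidel):

import numpy as np

def sor(A, b, x0, omega, n_iter=4):
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        for i in range(len(b)):
            # Gauss-Seidel value, then blend with the previous iterate
            gs = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
            x[i] = (1 - omega) * x[i] + omega * gs
    return x

A = np.array([[3.0, -1, 1], [-1, 3, -1], [1, -1, 3]])
b = np.array([-1.0, 7, -7])
print(sor(A, b, np.zeros(3), omega=1.25))   # x^(4) ~ (0.9428, 2.0007, -1.9723)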
Condition Numbers: The inequalities in the above theorem imply that ‖A^{−1}‖ and ‖A‖·‖A^{−1}‖ provide an indication of the connection between the residual vector and the accuracy of the approximation. In general, the relative error ‖x − x̃‖/‖x‖ is of most interest, and this error is bounded by the product of ‖A‖·‖A^{−1}‖ with the relative residual for this approximation, ‖r‖/‖b‖. Any convenient norm can be used; the only requirement is that it be used consistently throughout.
Definition 4.3. The condition number of the nonsingular matrix A relative to a norm ‖·‖ is
K(A) = ‖A‖·‖A^{−1}‖.
With this notation, the inequalities in the above theorem become
‖x − x̃‖ ≤ K(A) ‖r‖/‖A‖
and
‖x − x̃‖/‖x‖ ≤ K(A) ‖r‖/‖b‖.
For any nonsingular matrix A and natural norm ‖·‖,
1 = ‖I‖ = ‖A · A^{−1}‖ ≤ ‖A‖·‖A^{−1}‖ = K(A).
A matrix A is well-conditioned if K(A) is close to 1, and is ill-conditioned when K(A) is significantly greater than 1. Conditioning in this context refers to the relative security that a small residual vector implies a correspondingly accurate approximate solution. When K(A) is very large, the solution of Ax = b will be very sensitive to relatively small changes in b; equivalently, in terms of the residual, a relatively small residual may well correspond to a relatively large error in x̃ as compared with x. These comments are also valid when the changes are made to A rather than to b.
Example 9. Suppose x̄ = (0.98, 1.1)^t is an approximate solution for the linear system Ax = b, where
A = [ 3.9  1.6 ]       b = [ 5.5 ]
    [ 6.8  2.9 ]           [ 9.7 ]
Find a bound for the relative error ‖x − x̄‖/‖x‖.
Sol. The residual is given by
r = b − Ax̄ = [ 5.5 ] − [ 3.9  1.6 ] [ 0.98 ] = [ −0.0820 ]
              [ 9.7 ]   [ 6.8  2.9 ] [ 1.1  ]   [ −0.1540 ]
The bound for the relative error is (for the infinity norm)
‖x − x̄‖/‖x‖ ≤ ‖A‖ ‖A^{−1}‖ ‖r‖/‖b‖.
Also det(A) = 0.43, so
A^{−1} = (1/0.43) [ 2.9 −1.6; −6.8 3.9 ] = [ 6.7442 −3.7209; −15.8140 9.0698 ],
‖A‖∞ = 9.7, ‖A^{−1}‖∞ = 24.8837, ‖r‖∞ = 0.1540, ‖b‖∞ = 9.7.
∴ ‖x − x̄‖/‖x‖ ≤ ‖A‖ ‖A^{−1}‖ ‖r‖/‖b‖ = 3.8321.
Example 10. Determine the condition number of the matrix
A = [ 1       2 ]
    [ 1.0001  2 ]
Sol. We saw in a previous example that the very poor approximation (3, −0.0001)^t to the exact solution (1, 1)^t had a residual vector with small norm, so we should expect the condition number of A to be large. We have ‖A‖∞ = max{|1| + |2|, |1.0001| + |2|} = 3.0001, which would not be considered large. However,
A^{−1} = [ −10000  10000 ]
         [ 5000.5  −5000 ],   so ‖A^{−1}‖∞ = 20000,
and for the infinity norm, K(A) = (20000)(3.0001) = 60002. The size of the condition number for this example should certainly keep us from making hasty accuracy decisions based on the residual of an approximation.
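This computation is easy to verify numerically; a quick illustrative numpy check for the matrix of Example 10:

import numpy as np

A = np.array([[1.0, 2.0], [1.0001, 2.0]])
K = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)
print(K)   # 60002.0, matching the hand computation above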
Example 11. Find the condition number K(A) of the matrix
A = [ 1  c ]
    [ c  1 ],   |c| ≠ 1.
When does A become ill-conditioned? What does this say about the linear system Ax = b? How is K(A) related to det(A)?
Sol. The matrix A is well-conditioned if K(A) is near 1. K(A) with respect to the norm ‖·‖∞ is given by
K(A) = ‖A‖∞ ‖A^{−1}‖∞ .
Here det(A) = 1 − c^2 and adj(A) = [ 1 −c; −c 1 ], so
A^{−1} = (1/(1 − c^2)) [ 1 −c; −c 1 ].
Thus ‖A‖∞ = 1 + |c| and ‖A^{−1}‖∞ = (1 + |c|)/|1 − c^2| .
Hence the condition number is K(A) = (1 + |c|)^2 / |1 − c^2| .
Thus A is ill-conditioned when |c| is near 1. When the condition number is large, the solution of the system Ax = b is sensitive to small changes in A. Here, as det(A) = 1 − c^2 approaches zero, the condition number of A becomes very large.
4.1. The Residual Correction Method. A further use of this error estimation procedure is to define an iterative method for improving the computed value x. Let x^(0) be the initial computed value for x, generally obtained by Gaussian elimination. Define
r^(0) = b − Ax^(0) = A(x − x^(0) ).
Then
Ae^(0) = r^(0) , where e^(0) = x − x^(0) .
Solving by Gaussian elimination, we obtain an approximate value of e^(0) . Using it, we define an improved approximation
x^(1) = x^(0) + e^(0) .
Now we repeat the entire process, calculating
r^(1) = b − Ax^(1) ,
x^(2) = x^(1) + e^(1) ,
where e^(1) is the approximate solution of
Ae^(1) = r^(1) , e^(1) = x − x^(1) .
Continue this process until there is no further decrease in the size of the error vector.
For example, use a computer with four-digit floating-point decimal arithmetic with rounding, and
use Gaussian elimination with pivoting. The system to be solved is
x1 + 0.5x2 + 0.3333x3 = 1
0.5x1 + 0.3333x2 + 0.25x3 = 0
0.3333x1 + 0.25x2 + 0.2x3 = 0
Then
x(0) = [8.968, −35.77, 29.77]t
r(0) = [−0.005341, −0.004359, −0.0005344]t
e(0) = [0.09216, −0.5442, 0.5239]t
x(1) = [9.060, −36.31, 30.29]t
r(1) = [−0.0006570, −0.0003770, −0.0001980]t
e(1) = [0.001707, −0.01300, 0.01241]t
x(2) = [9.062, −36.32, 30.30]t .
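A minimal numpy sketch of the residual correction loop (illustrative; in exact arithmetic one step would already be exact, so the benefit appears only in finite-precision computations):

import numpy as np

def residual_correction(A, b, x, steps=2):
    for _ in range(steps):
        r = b - A @ x                 # residual r^(k) = b - A x^(k)
        e = np.linalg.solve(A, r)     # approximate error from A e^(k) = r^(k)
        x = x + e                     # improved approximation x^(k+1)
    return x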
5. The Power Method
The power method is an iterative technique for approximating the dominant eigenvalue - that is, the eigenvalue with largest magnitude. By modifying the method, it can be used to determine other eigenvalues. One useful feature of the power method is that it produces not only the eigenvalue but also an associated eigenvector.
To apply the power method, we assume that n × n matrix A has n eigenvalues λ1 , λ2 , · · · , λn (which
we don’t know) with associated eigenvectors v (1) , v (2) , · · · , v (n) . We say matrix A is diagonalizable.
We write
Av (i) = λi v (i) , i = 1, 2, · · · , n.
We assume that these eigenvalues are ordered so that λ1 is the dominant eigenvalue (with correspond-
ing eigenvector v (1) ).
From linear algebra, if A is diagonalizable, then it has n linearly independent eigenvectors v (1) , v (2) , · · · , v (n) .
An n×n matrix need not have n linearly independent eigenvectors. When it does not the Power method
may still be successful, but it is not guaranteed to be.
As the n eigenvectors v (1) , v (2) , · · · , v (n) are linearly independent, they must form a basis for Rn .
We select an arbitrary nonzero starting vector x(0) and express it as a linear combination of basis
vectors as
x^(0) = c1 v^(1) + c2 v^(2) + · · · + cn v^(n) .
We assume that c1 ≠ 0. (If c1 = 0, the power method may not converge, and a different x^(0) must be used as an initial approximation.)
Then we repeatedly carry out matrix-vector multiplication, using the matrix A to produce a sequence
of vectors. Specifically, we have
x(1) = Ax(0)
x(2) = Ax(1) = A2 x(0)
..
.
x(k) = Ax(k−1) = Ak x(0) .
In general, we have
x(k) = Ak x(0) , k = 1, 2, 3, · · ·
Substituting the value of x^(0) , we obtain
x^(k) = A^k x^(0)
      = c1 A^k v^(1) + c2 A^k v^(2) + · · · + cn A^k v^(n)
      = c1 λ1^k v^(1) + c2 λ2^k v^(2) + · · · + cn λn^k v^(n)
      = λ1^k [ c1 v^(1) + c2 (λ2 /λ1 )^k v^(2) + · · · + cn (λn /λ1 )^k v^(n) ].
Now, from our original assumption that λ1 is larger in absolute value than the other eigenvalues, each of the fractions
|λ2 /λ1 |, |λ3 /λ1 |, · · · , |λn /λ1 | < 1.
Therefore each of the factors (λ2 /λ1 )^k , (λ3 /λ1 )^k , · · · , (λn /λ1 )^k must approach 0 as k approaches infinity. This implies the approximation
A^k x^(0) ≈ λ1^k c1 v^(1) , c1 ≠ 0.
Since v^(1) is a dominant eigenvector, any scalar multiple of v^(1) is also a dominant eigenvector. Thus we have shown that A^k x^(0) approaches a multiple of the dominant eigenvector of A.
The entries of A^k x^(0) may grow with k, so we scale the powers A^k x^(0) in an appropriate manner to ensure that the limit is finite and nonzero. The scaling begins by choosing the initial guess x^(0) to be a unit vector relative to the maximum norm, that is ‖x^(0)‖∞ = 1. Then we compute y^(1) = Ax^(0) and take the next approximation as
x^(1) = y^(1) / ‖y^(1)‖∞ .
We repeat the procedure and stop using the following stopping criterion:
‖x^(k) − x^(k−1)‖∞ / ‖x^(k)‖∞ < ε,
where ε is the desired accuracy.
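A Python sketch of the scaled power method with this stopping criterion (illustrative; scaling by the entry of largest modulus, rather than the norm itself, preserves the sign of the eigenvalue):

import numpy as np

def power_method(A, x0, eps=1e-6, max_k=100):
    x = x0 / np.linalg.norm(x0, np.inf)
    mu = 0.0
    for _ in range(max_k):
        y = A @ x
        mu = y[np.argmax(np.abs(y))]      # scaling factor -> dominant eigenvalue
        x_new = y / mu
        if np.linalg.norm(x_new - x, np.inf) < eps * np.linalg.norm(x_new, np.inf):
            return mu, x_new
        x = x_new
    return mu, x

# Matrix of Example 12 below; the dominant eigenvalue is 3
A = np.array([[1.0, 2, 0], [-2, 1, 2], [1, 3, 1]])
print(power_method(A, np.ones(3))[0])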
Example 12. Calculate four iterations of the power method with scaling to approximate a dominant
eigenvector of the matrix
A = [ 1   2  0 ]
    [ −2  1  2 ]
    [ 1   3  1 ]
Sol. Using x^(0) = [1, 1, 1]^T as the initial approximation, we obtain
y^(1) = Ax^(0) = [3, 1, 5]^T ,
and by scaling we obtain the approximation
x^(1) = (1/5)[3, 1, 5]^T = [0.60, 0.20, 1.00]^T .
Similarly we get
y^(2) = Ax^(1) = [1.00, 1.00, 2.20]^T = 2.20 [0.45, 0.45, 1.00]^T = 2.20 x^(2) ,
y^(3) = Ax^(2) = [1.35, 1.55, 2.80]^T = 2.8 [0.48, 0.55, 1.00]^T = 2.8 x^(3) ,
y^(4) = Ax^(3) = 3.1 [0.51, 0.51, 1.00]^T ,
etc.
After four iterations, we observe that the dominant eigenvector is approximately
x = [0.51, 0.51, 1.00]^T .
The scaling factors approach the dominant eigenvalue, λ ≈ 3.1.
Remark 5.1. The power method is useful for computing eigenvalues, but it gives only the dominant eigenvalue. To find other eigenvalues we use properties of the matrix, such as: the sum of all eigenvalues equals the trace of the matrix. Also, if λ is an eigenvalue of A, then λ^{−1} is an eigenvalue of A^{−1}. Hence the reciprocal of the smallest (in magnitude) eigenvalue of A is the dominant eigenvalue of A^{−1}.
5.1. Inverse Power method. The Inverse Power method is a modification of the Power method that
is used to determine the eigenvalue of A that is closest to a specified number σ.
We consider A − σI then its eigenvalues are λ1 − σ, λ2 − σ, · · · , λn − σ, where λ1 , λ2 , · · · , λn are the
eigenvalues of A.
Now the eigenvalues of (A − σI)^{−1} are 1/(λ1 − σ), 1/(λ2 − σ), · · · , 1/(λn − σ).
The eigenvalue of the original matrix A that is closest to σ corresponds to the eigenvalue of largest magnitude of the shifted and inverted matrix (A − σI)^{−1}.
To find the eigenvalue closest to σ, we apply the power method to obtain the dominant eigenvalue µ of (A − σI)^{−1}. Then we recover the eigenvalue λ of the original problem by λ = 1/µ + σ. This method is called shift-and-invert. At each step we need y = (A − σI)^{−1} x, which we obtain by solving (A − σI)y = x; we need not compute the inverse of the matrix.
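A Python sketch of the shift-and-invert iteration, solving a linear system at each step rather than forming the inverse (illustrative only):

import numpy as np

def inverse_power(A, sigma, x0, n_iter=10):
    n = len(x0)
    x = x0 / np.linalg.norm(x0, np.inf)
    mu = 1.0
    for _ in range(n_iter):
        y = np.linalg.solve(A - sigma * np.eye(n), x)   # (A - sigma I) y = x
        mu = y[np.argmax(np.abs(y))]     # dominant eigenvalue of (A - sigma I)^(-1)
        x = y / mu
    return 1.0 / mu + sigma, x           # eigenvalue of A nearest sigma

# Example 13 below: the eigenvalue of A nearest 19/3 is 6
A = np.array([[-4.0, 14, 0], [-5, 13, 0], [-1, 0, 2]])
print(inverse_power(A, 19/3, np.ones(3))[0])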
Example 13. Apply the inverse power method with x(0) = [1, 1, 1]T to the matrix
A = [ −4  14  0 ]
    [ −5  13  0 ]
    [ −1  0   2 ]
with σ = 19/3.
Sol. For the inverse power method, we consider
A − (19/3) I = [ −31/3  14    0     ]
               [ −5     20/3  0     ]
               [ −1     0     −13/3 ]
Starting with x^(0) = [1, 1, 1]^T , the relation y^(1) = (A − σI)^{−1} x^(0) gives (A − σI)y^(1) = x^(0) , that is,
[ −31/3  14    0     ] [ a ]   [ 1 ]
[ −5     20/3  0     ] [ b ] = [ 1 ]
[ −1     0     −13/3 ] [ c ]   [ 1 ]
Solving the above system by Gauss elimination (LU decomposition), we get a = −6.6, b = −4.8, and c = 1.2923.
Therefore y^(1) = (−6.6, −4.8, 1.2923)^T . We normalize it by taking −6.6 as the scale factor: x^(1) = (1/(−6.6)) y^(1) = (1, 0.7272, −0.1958)^T .
Therefore the first approximation to the eigenvalue of A near 19/3 is 1/(−6.6) + 19/3 = 6.1818.
Repeating the above procedure, we can obtain the eigenvalue (which is 6).
Important Remark: Although the power method worked well in these examples, we must say something about the cases in which the power method may fail. There are basically three such cases:
1. Using the power method when A is not diagonalizable. Recall that A has n linearly independent eigenvectors if and only if A is diagonalizable. Of course, it is not easy to tell just by looking at A whether it is diagonalizable.
2. Using the power method when A does not have a dominant eigenvalue, or when the dominant eigenvalue is such that |λ1 | = |λ2 |.
3. If the entries of A contain significant error, the powers A^k will have significant roundoff error in their entries.
Exercises
(1) Find the l∞ and l2 norms of the vectors.
a. x = (3, −4, 0, 3/2)^t
b. x = (sin k, cos k, 2^k )^t for a fixed positive integer k.
(2) Find the l∞ norm of the matrix:
[ 4   −1  7 ]
[ −1  4   0 ]
[ −7  0   4 ]
(3) The following linear system Ax = b have x as the actual solution and x̄ as an approximate
solution. Compute kx − x̄k∞ and kAx̄ − bk∞ . Also compute kAk∞ .
x1 + 2x2 + 3x3 = 1
2x1 + 3x2 + 4x3 = −1
3x1 + 4x2 + 6x3 = 2,
x = (0, −7, 5)t
x̄ = (−0.2, −7.5, 5.4)t .
(4) Find the first two iterations of Jacobi and Gauss-Seidel using x(0) = 0:
4.63x1 − 1.21x2 + 3.22x3 = 2.22
−3.07x1 + 5.48x2 + 2.11x3 = −3.17
1.26x1 + 3.11x2 + 4.57x3 = 5.11.
(5) The linear system
x1 − x3 = 0.2
−(1/2)x1 + x2 − (1/4)x3 = −1.425
x1 − (1/2)x2 + x3 = 2
has the solution (0.9, −0.8, 0.7)^T .
a. Is the coefficient matrix strictly diagonally dominant?
b. Compute the spectral radius of the Gauss-Seidel iteration matrix.
c. Perform four iterations of the Gauss-Seidel iterative method to approximate the solution.
d. What happens in part (c) when the first equation in the system is changed to x1 −2x3 = 0.2?
(6) Show that Gauss-Seidel method does not converge for the following system of equations
2x1 + 3x2 + x3 = −1
3x1 + 2x2 + 2x3 = 1
x1 + 2x2 + 2x3 = 1.
(7) Find the first two iterations of the SOR method with ω = 1.1 for the following linear systems,
using x(0) = 0 :
4x1 + x2 − x3 = 5
−x1 + 3x2 + x3 = −4
2x1 + 2x2 + 5x3 = 1.
has solution (1, 1)^t . Use four-digit rounding arithmetic to find the solution of the perturbed system
[ 1         2 ] [ x1 ]   [ 3.00001 ]
[ 1.000011  2 ] [ x2 ] = [ 3.00003 ]
Is the matrix A ill-conditioned?
(11) Determine the largest eigenvalue and the corresponding eigenvector of the following matrix, correct to three decimals, using the power method with x^(0) = (−1, 2, 1)^t :
[ 1   −1  0  ]
[ −2  4   −2 ]
[ 0   −1  2  ]
(12) Use the inverse power method to approximate the smallest (in magnitude) eigenvalue of the matrix until a tolerance of 10^{−2} is achieved, with x^(0) = (1, −1, 2)^t :
[ 2  1  1 ]
[ 1  2  1 ]
[ 1  1  2 ]
(13) Find the eigenvalue of the matrix nearest to 3:
[ 2   −1  0  ]
[ −1  2   −1 ]
[ 0   −1  2  ]
using the inverse power method.
Bibliography
[Burden] Richard L. Burden, J. Douglas Faires and Annette Burden, “Numerical Analysis,” Cengage
Learning, 10th edition, 2015.
[Atkinson] K. Atkinson and W. Han, “Elementary Numerical Analysis,” John Wiley and Sons, 3rd edition, 2004.
Appendix A. Algorithms
Algorithm (Gauss-Seidel):
(1) Input matrix A = [aij ], vector b, initial guess XO = x^(0) , tolerance TOL, maximum number of iterations N
(2) Set k = 1
(3) while (k ≤ N ) do step 4-7
(4) For i = 1, 2, · · · , n set
xi = (1/aii) [ − Σ_{j=1}^{i−1} (aij xj ) − Σ_{j=i+1}^{n} (aij XOj ) + bi ]
(5) If ‖x − XO‖ < TOL, then output x and stop
(6) Set k = k + 1
(7) Set XO = x.
INTERPOLATION AND APPROXIMATIONS
1. Introduction
Polynomials are used as the basic means of approximation in nearly all areas of numerical analysis.
They are used in the solution of equations and in the approximation of functions, of integrals and
derivatives, of solutions of integral and differential equations, etc. Polynomials have simple structure,
which makes it easy to construct effective approximations and then make use of them. For this reason,
the representation and evaluation of polynomials is a basic topic in numerical analysis. We discuss this
topic in the present chapter in the context of polynomial interpolation, the simplest and certainly the
most widely used technique for obtaining polynomial approximations.
Definition 1.1 (Polynomial). A polynomial Pn (x) of degree ≤ n is, by definition, a function of the form
Pn (x) = a0 + a1 x + a2 x^2 + · · · + an x^n (1.1)
with certain coefficients a0 , a1 , · · · , an . This polynomial has (exact) degree n in case its leading coefficient an is nonzero.
The power form (1.1) is the standard way to specify a polynomial in mathematical discussions. It is
a very convenient form for differentiating or integrating a polynomial. But, in various specific contexts,
other forms are more convenient. For example, the following shifted power form may be helpful.
P (x) = a0 + a1 (x − c) + a2 (x − c)^2 + · · · + an (x − c)^n . (1.2)
It is good practice to employ the shifted power form with the center c chosen somewhere in the interval
[a, b] when interested in a polynomial on that interval.
Definition 1.2 (Newton form). A further generalization of the shifted power form is the following Newton form
P (x) = a0 + a1 (x − c1 ) + a2 (x − c1 )(x − c2 ) + · · · + an (x − c1 )(x − c2 ) · · · (x − cn ).
This form plays a major role in the construction of an interpolating polynomial. It reduces to the shifted power form if the centers c1 , · · · , cn all equal c, and to the power form if the centers c1 , · · · , cn all equal zero.
2. Lagrange Interpolation
In this chapter, we consider the interpolation problem. Suppose we do not know the function f , but only some information (data) about it. We try to compute a function g that approximates f .
2.1. Polynomial Interpolation. The polynomial interpolation problem, also called Lagrange interpolation, can be described as follows: given (n + 1) data points (xi , yi ), i = 0, 1, · · · , n, find a polynomial P of lowest possible degree such that
yi = P (xi ), i = 0, 1, · · · , n.
Such a polynomial is said to interpolate the data. Here yi may be the value of some unknown function f at xi , i.e. yi = f (xi ).
One reason for considering the class of polynomials in the approximation of functions is that they uniformly approximate continuous functions.
Theorem 2.1 (Weierstrass Approximation Theorem). Suppose that f is defined and continuous on
[a, b]. For any ε > 0, there exists a polynomial P (x) defined on [a, b] with the property that
|f (x) − P (x)| < ε, ∀x ∈ [a, b].
Another reason for considering the class of polynomials in approximation of functions is that the
derivatives and indefinite integrals of a polynomial are easy to compute.
Theorem 2.2 (Existence and Uniqueness). Given a real-valued function f (x) and n + 1 distinct points
x0 , x1 , · · · , xn , there exists a unique polynomial Pn (x) of degree ≤ n which interpolates the unknown
f (x) at points x0 , x1 , · · · , xn .
Proof. Existence: Let x0 , x1 , · · · , xn be the given n + 1 distinct data points. We prove the result by mathematical induction.
The theorem clearly holds for n = 0: only one data point is given and we can take the constant polynomial P0 (x) = f (x0 ) for all x.
Assume that the theorem holds for n ≤ k, i.e. there is a polynomial Pk of degree ≤ k such that Pk (xi ) = f (xi ) for 0 ≤ i ≤ k.
Now we construct a polynomial of degree at most k + 1 to interpolate (xi , f (xi )), 0 ≤ i ≤ k + 1. Let
Pk+1 (x) = Pk (x) + c(x − x0 )(x − x1 ) · · · (x − xk ).
Requiring Pk+1 (xk+1 ) = f (xk+1 ) at x = xk+1 gives
c = (f (xk+1 ) − Pk (xk+1 )) / ((xk+1 − x0 )(xk+1 − x1 ) · · · (xk+1 − xk )).
Since the xi are distinct, the polynomial Pk+1 (x) is well-defined and the degree of Pk+1 is ≤ k + 1. Now
Pk+1 (xi ) = Pk (xi ) = f (xi ), 0 ≤ i ≤ k, and Pk+1 (xk+1 ) = f (xk+1 ).
The above two relations imply
Pk+1 (xi ) = f (xi ), 0 ≤ i ≤ k + 1.
Therefore Pk+1 (x) interpolates f (x) at all k + 2 nodal points. By mathematical induction, the result is true for all n.
Uniqueness: Suppose there are two such polynomials Pn and Qn with
Pn (xi ) = f (xi ) and Qn (xi ) = f (xi ), 0 ≤ i ≤ n.
Define
Sn (x) = Pn (x) − Qn (x).
Since both Pn and Qn have degree ≤ n, the degree of Sn is also ≤ n. Also
Sn (xi ) = Pn (xi ) − Qn (xi ) = f (xi ) − f (xi ) = 0, 0 ≤ i ≤ n.
This implies Sn has at least n + 1 zeros, which is not possible, as the degree of Sn is at most n, unless
Sn (x) = 0 for all x, i.e. Pn (x) = Qn (x).
solved in order to determine all li (x)’s. Fortunately there is a shortcut. An obvious way of constructing
polynomials li (x) of degree n that satisfy the condition is the following:
li (x) = ((x − x0 )(x − x1 ) · · · (x − xi−1 )(x − xi+1 ) · · · (x − xn )) / ((xi − x0 )(xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn )).
The uniqueness of the interpolating polynomial of degree ≤ n given n + 1 distinct interpolation points implies that the polynomials li (x) given by the above relation are the only polynomials of degree n with this property. Note that the denominator does not vanish since we assume that all interpolation points are distinct.
We can write the formula for li (x) in a compact form using product notation:
li (x) = W (x) / ((x − xi ) W ′(xi )), i = 0, 1, · · · , n,
where
W (x) = (x − x0 ) · · · (x − xi−1 )(x − xi )(x − xi+1 ) · · · (x − xn ),
∴ W ′(xi ) = (xi − x0 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn ).
The Lagrange interpolating polynomial can then be written as
Pn (x) = Σ_{i=0}^{n} f (xi ) Π_{j=0, j≠i}^{n} (x − xj )/(xi − xj ).
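A direct Python evaluation of this formula (illustrative; it costs O(n^2) operations per evaluation point):

def lagrange_eval(xs, ys, x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)   # build l_i(x)
        total += yi * li                     # accumulate f(x_i) l_i(x)
    return total

# Data of Example 1 below, evaluated at x = 2:
print(lagrange_eval([0, 1, 3, 5], [1, 2, 6, 7], 2.0))   # 3.95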
Example 1. Given the following four data points, find a polynomial in Lagrange form to interpolate the data.
xi : 0 1 3 5
yi : 1 2 6 7
Sol. The Lagrange functions are given by
l0 (x) = (x − 1)(x − 3)(x − 5) / ((0 − 1)(0 − 3)(0 − 5)) = −(1/15)(x − 1)(x − 3)(x − 5),
l1 (x) = (x − 0)(x − 3)(x − 5) / ((1 − 0)(1 − 3)(1 − 5)) = (1/8) x(x − 3)(x − 5),
l2 (x) = (x − 0)(x − 1)(x − 5) / ((3 − 0)(3 − 1)(3 − 5)) = −(1/12) x(x − 1)(x − 5),
l3 (x) = (x − 0)(x − 1)(x − 3) / ((5 − 0)(5 − 1)(5 − 3)) = (1/40) x(x − 1)(x − 3).
The interpolating polynomial in the Lagrange form is
P3 (x) = l0 (x) + 2l1 (x) + 6l2 (x) + 7l3 (x).
Example 2. Let f (x) = √(x − x^2) and P2 (x) be the interpolation polynomial on x0 = 0, x1 and x2 = 1. Find the largest value of x1 in (0, 1) for which f (0.5) − P2 (0.5) = −0.25.
Sol. If f (x) = √(x − x^2), then our nodes are [x0 , x1 , x2 ] = [0, x1 , 1] with f (x0 ) = 0, f (x1 ) = √(x1 − x1^2) and f (x2 ) = 0. Therefore
l0 (x) = (x − x1 )(x − x2 ) / ((x0 − x1 )(x0 − x2 )) = (x − x1 )(x − 1)/x1 ,
l1 (x) = (x − x0 )(x − x2 ) / ((x1 − x0 )(x1 − x2 )) = x(x − 1)/(x1 (x1 − 1)),
l2 (x) = (x − x0 )(x − x1 ) / ((x2 − x0 )(x2 − x1 )) = x(x − x1 )/(1 − x1 ).
Corollary 2.5. If |f^(n+1) (ξ)| ≤ M , then we can obtain a bound on the error:
|f (x) − P (x)| ≤ (M/(n + 1)!) max_{x∈[a,b]} |(x − x0 ) · · · (x − xn )|.
The next example illustrates how the error formula can be used to prepare a table of data that ensures the interpolation error stays within a specified bound.
Example 3. Suppose a table is to be prepared for the function f (x) = ex , for x in [0, 1]. Assume
the number of decimal places to be given per entry is d ≥ 8 and that the difference between adjacent
x−values, the step size, is h. What step size h will ensure that linear interpolation gives an absolute
error of at most 10−6 for all x in [0, 1]?
Sol. Let x0 , x1 , . . . be the numbers at which f is evaluated, let x be in [0, 1], and suppose i satisfies xi ≤ x ≤ xi+1 . The error in linear interpolation is
|f (x) − P (x)| = |(f ″(ξ)/2)(x − xi )(x − xi+1 )| = (|f ″(ξ)|/2) |x − xi | |x − xi+1 |.
The step size is h, so xi = ih, xi+1 = (i + 1)h, and
|f (x) − P (x)| ≤ (1/2)|f ″(ξ)| |(x − ih)(x − (i + 1)h)|.
Hence
|f (x) − P (x)| ≤ (1/2) max_{ξ∈[0,1]} e^ξ max_{xi ≤x≤xi+1} |(x − ih)(x − (i + 1)h)|
             ≤ (e/2) max_{xi ≤x≤xi+1} |(x − ih)(x − (i + 1)h)|.
Consider the function g(x) = (x − ih)(x − (i + 1)h), for ih ≤ x ≤ (i + 1)h. Because
g′(x) = (x − (i + 1)h) + (x − ih) = 2(x − ih − h/2),
the only critical point of g is at x = ih + h/2, with g(ih + h/2) = (h/2)(−h/2) = −h^2/4. Since g(ih) = 0 and g((i + 1)h) = 0, the maximum value of |g(x)| in [ih, (i + 1)h] must occur at the critical point, which implies that
|f (x) − P (x)| ≤ (e/2) max_{xi ≤x≤xi+1} |g(x)| ≤ (e/2) · (h^2/4) = e h^2/8.
Consequently, to ensure that the error in linear interpolation is bounded by 10^{−6}, it is sufficient for h to be chosen so that
e h^2/8 ≤ 10^{−6} .
This implies that h < 1.72 × 10^{−3} .
Because n = (1 − 0)/h must be an integer, a reasonable choice for the step size is h = 0.001.
Example 4. Determine the step size h that can be used in the tabulation of a function f (x), a ≤ x ≤ b,
at equally spaced nodal points so that the truncation error of the quadratic interpolation is less than ε.
Sol. Let xi−1 , xi , xi+1 be three equispaced points with spacing h. The truncation error of quadratic interpolation is given by
|f (x) − P2 (x)| ≤ (M/3!) max_{a≤x≤b} |(x − xi−1 )(x − xi )(x − xi+1 )|,
where M = max_{a≤x≤b} |f^(3) (x)|.
To simplify the calculation, let x − xi = th. Then
x − xi−1 = x − (xi − h) = (t + 1)h and x − xi+1 = x − (xi + h) = (t − 1)h,
∴ |(x − xi−1 )(x − xi )(x − xi+1 )| = h^3 |t(t + 1)(t − 1)| = g(t), say.
Now g(t) attains its extreme values where dg/dt = 0, which gives t = ±1/√3; at the end points of the interval g vanishes. For both values t = ±1/√3 we obtain
max_{xi−1 ≤x≤xi+1} |g(t)| = h^3 · 2/(3√3).
The truncation error requirement |f (x) − P2 (x)| < ε then becomes
(h^3/(9√3)) M < ε
=⇒ h < [ 9√3 ε/M ]^{1/3} .
3. Neville’s Method
Neville’s method can be applied in the situation that we want to interpolate f (x) at a given point
x = p with increasingly higher order Lagrange interpolation polynomials.
For concreteness, consider three distinct points x0 , x1 , and x2 at which we can evaluate f (x) exactly: f (x0 ), f (x1 ), f (x2 ). From each of these three points we can construct an order-zero (constant) "polynomial" to approximate f (p):
f (p) ≈ P0 (p) = f (x0 ) (3.1)
f (p) ≈ P1 (p) = f (x1 ) (3.2)
f (p) ≈ P2 (p) = f (x2 ) (3.3)
Of course this isn't a very good approximation, so we turn to first-order Lagrange polynomials:
f (p) ≈ P0,1 (p) = ((p − x1 )/(x0 − x1 )) f (x0 ) + ((p − x0 )/(x1 − x0 )) f (x1 ),
f (p) ≈ P1,2 (p) = ((p − x2 )/(x1 − x2 )) f (x1 ) + ((p − x1 )/(x2 − x1 )) f (x2 ).
There is also P0,2 , but we won’t concern ourselves with that one.
If we note that f (xi ) = Pi (x), we find
P0,1 (p) = ((p − x1 )/(x0 − x1 )) P0 (p) + ((p − x0 )/(x1 − x0 )) P1 (p)
        = ((p − x1 )P0 (p) − (p − x0 )P1 (p)) / (x0 − x1 ),
and similarly
P1,2 (p) = ((p − x2 )P1 (p) − (p − x1 )P2 (p)) / (x1 − x2 ).
In general, we want to multiply Pi (x) by (x − xj ) where j ≠ i (i.e., xj is a point that is NOT interpolated by Pi (x)). We take the difference of two such products and divide by the difference between the added points.
The result is a polynomial of one degree higher than either of the two used to construct it, and it interpolates all the points of the two constructing polynomials combined. This idea can be extended to construct the second-degree polynomial P0,1,2 :
P0,1,2 (p) = ((p − x2 )P0,1 (p) − (p − x0 )P1,2 (p)) / (x0 − x2 ).
A little algebra shows that
P0,1,2 (p) = ((p − x1 )(p − x2 )/((x0 − x1 )(x0 − x2 ))) f (x0 ) + ((p − x0 )(p − x2 )/((x1 − x0 )(x1 − x2 ))) f (x1 ) + ((p − x0 )(p − x1 )/((x2 − x0 )(x2 − x1 ))) f (x2 ),
which is just the second-degree Lagrange polynomial interpolating the points x0 , x1 , x2 . This shouldn't surprise you, since it is the unique polynomial of degree ≤ 2 interpolating these three points.
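Neville's recursion organizes naturally into a triangular tableau. The following Python sketch (illustrative) computes the highest-order interpolated value at p:

def neville(xs, ys, p):
    n = len(xs)
    Q = [[y] * n for y in ys]            # Q[i][0] = f(x_i)
    for j in range(1, n):
        for i in range(j, n):
            Q[i][j] = ((p - xs[i - j]) * Q[i][j - 1]
                       - (p - xs[i]) * Q[i - 1][j - 1]) / (xs[i] - xs[i - j])
    return Q[n - 1][n - 1]

# Data of the 1/x example below, evaluated at p = 3:
print(neville([2, 2.5, 4], [0.5, 0.4, 0.25], 3.0))   # 0.325, close to 1/3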
Example: We are given the function f (x) = 1/x. We want to approximate the value f (3). First we evaluate the function at three points:
xi : 2 2.5 4
f (xi ) : 0.5 0.4 0.25
We can first make three separate zero-order approximations
f (3) ≈ P0 (3) = f (x0 ) = 0.5
Sol.
P2,3 (0.4) = ((0.4 − 0.75)P2 − (0.4 − 0.5)P3 ) / (0.5 − 0.75) = 2.4
=⇒ P2 = 4
P1,2 (0.4) = ((0.4 − 0.5)P1 − (0.4 − 0.25)P2 ) / (0.25 − 0.5) = ((−0.1)(2) − (0.15)(4)) / (−0.25) = 3.2
P0,1,2 (0.4) = ((0.4 − 0.5)P0,1 − (0.4 − 0)P1,2 (0.4)) / (0 − 0.5) = ((−0.1)(2.6) − (0.4)(3.2)) / (−0.5) = 3.08.
Example 6. In Neville’s method, suppose xi = i, for i = 0, 1, 2, 3 and it is known that P0,1 (x) =
x + 1, P1,2 (x) = 3x − 1, and P1,2,3 (1.5) = 4. Find P2,3 (1.5) and P0,1,2,3 (1.5).
Sol. Here x0 = 0, x1 = 1, x2 = 2, x3 = 3.
P0,1,2 (x) = ((x − x2 )P0,1 (x) − (x − x0 )P1,2 (x)) / (x0 − x2 ) = ((x − 2)(x + 1) − x(3x − 1)) / (−2) = x^2 + 1.
We can write the Newton divided difference formula in the following fashion (and we will prove it in the next theorem):
Pn (x) = f (x0 ) + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ) + · · · + f [x0 , x1 , · · · , xn ](x − x0 )(x − x1 ) · · · (x − xn−1 )
       = f (x0 ) + Σ_{i=1}^{n} f [x0 , x1 , · · · , xi ] Π_{j=0}^{i−1} (x − xj ).
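The divided-difference coefficients can be computed in place, and the Newton form evaluated by nested multiplication; a short illustrative Python sketch:

def newton_coeffs(xs, ys):
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):                        # order of the divided difference
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef                                  # f[x0], f[x0,x1], ..., f[x0,...,xn]

def newton_eval(xs, coef, x):
    p = coef[-1]
    for k in range(len(coef) - 2, -1, -1):       # nested (Horner-like) evaluation
        p = p * (x - xs[k]) + coef[k]
    return p

# Data of Example 7 below: coefficients 1, 1, 1/3, -17/120
print(newton_coeffs([0, 1, 3, 5], [1, 2, 6, 7]))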
We can also construct the Newton interpolating polynomial as given in the next result.
Theorem 4.1. The unique polynomial of degree ≤ n that passes through (x0 , y0 ), (x1 , y1 ), · · · , (xn , yn )
is given by
Pn (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ) + · · · +
f [x0 , · · · , xn ](x − x0 )(x − x1 ) · · · (x − xn−1 )
Proof. We prove it by induction. The unique polynomial of degree 0 that passes through (x0 , y0 ) is
obviously
P0 (x) = y0 = f [x0 ].
Suppose that the polynomial Pk (x) of degree ≤ k that passes through (x0 , y0 ), (x1 , y1 ), · · · , (xk , yk ) is
Pk (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ) + · · · + f [x0 , · · · , xk ](x − x0 )(x − x1 ) · · · (x − xk−1 ).
Write Pk+1 (x), the unique polynomial of degree ≤ k + 1 that passes through (x0 , y0 ), (x1 , y1 ), · · · , (xk , yk ), (xk+1 , yk+1 ), as
Pk+1 (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ) + · · · + f [x0 , · · · , xk ](x − x0 )(x − x1 ) · · · (x − xk−1 ) + C(x − x0 )(x − x1 ) · · · (x − xk−1 )(x − xk ).
We only need to show that
C = f [x0 , x1 , · · · , xk , xk+1 ].
For this, let Qk (x) be the unique polynomial of degree ≤ k that passes through (x1 , y1 ), · · · , (xk , yk )
(xk+1 , yk+1 ). Define
x − x0
R(x) = Pk (x) + [Qk (x) − Pk (x)]
xk+1 − x0
Then,
• R(x) is a polynomial of degree k + 1.
• R(x0 ) = Pk (x0 ) = y0 ,
xi − x0
R(xi ) = Pk (xi ) + (Qk (xi ) − Pk (xi )) = Pk (xi ) = yi , i = 1, · · · , k,
xk+1 − x0
R(xk+1 ) = Qk (xk+1 ) = yk+1 .
INTERPOLATION AND APPROXIMATIONS 11
Example 7. Given the following four data points, find a polynomial in Newton form to interpolate them.
x_i   0   1   3   5
y_i   1   2   6   7
Sol.
P_3(x) = f(x_0) + (x - 0)f[0, 1] + (x - 0)(x - 1)f[0, 1, 3] + (x - 0)(x - 1)(x - 3)f[0, 1, 3, 5]
       = 1 + x + \frac{1}{3} x(x - 1) - \frac{17}{120} x(x - 1)(x - 3).
Note that the x_i can be re-ordered, but they must be distinct. When the order of the x_i is changed, one obtains the same polynomial, but written in a different form.
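As a complement to the formula above, here is a short Python sketch of the divided-difference computation and the nested (Horner-like) evaluation of the Newton form; the function names are illustrative, not from the notes.

```python
def divided_differences(xs, ys):
    """Return the coefficients f[x0], f[x0,x1], ..., f[x0,...,xn]
    of the Newton form, computed in place from a copy of ys."""
    n = len(xs)
    coef = list(ys)
    for j in range(1, n):
        for k in range(n - 1, j - 1, -1):
            coef[k] = (coef[k] - coef[k - 1]) / (xs[k] - xs[k - j])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form by nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

# Data of Example 7: coefficients 1, 1, 1/3, -17/120.
xs, ys = [0, 1, 3, 5], [1, 2, 6, 7]
print(divided_differences(xs, ys))
```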
Theorem 4.2. Let f ∈ C^n[a, b] and let x_0, \cdots, x_n be distinct numbers in [a, b]. Then there exists ξ ∈ (a, b) such that
f[x_0, x_1, x_2, \cdots, x_n] = \frac{f^{(n)}(ξ)}{n!}.
Proof. Let
P_n(x) = f(x_0) + \sum_{k=1}^{n} f[x_0, x_1, \cdots, x_k](x - x_0)(x - x_1) \cdots (x - x_{k-1})
be the interpolating polynomial of f in Newton's form. Define
g(x) = f(x) - P_n(x).
Since P_n(x_i) = f(x_i) for i = 0, 1, \cdots, n, the function g has n + 1 distinct zeros in [a, b]. By the generalized Rolle's Theorem there exists ξ ∈ (a, b) such that
g^{(n)}(ξ) = f^{(n)}(ξ) - P_n^{(n)}(ξ) = 0.
Here
P_n^{(n)}(x) = n! f[x_0, x_1, \cdots, x_n].
Therefore
f[x_0, x_1, \cdots, x_n] = \frac{f^{(n)}(ξ)}{n!}.
Example 9. Using the following table for tan x, approximate its value at 0.71 using Newton interpolation.
x       0.70    0.72    0.74    0.76    0.78
tan x   0.8423  0.8771  0.9131  0.9505  0.9893
Sol. As the point x = 0.71 lies near the beginning of the table, we use Newton's forward interpolation. The forward differences are
Δf: 0.0348, 0.0360, 0.0374, 0.0388;  Δ²f: 0.0012, 0.0014, 0.0014;  Δ³f: 0.0002, 0.0000;  Δ⁴f: -0.0002.
Here x_0 = 0.70, h = 0.02, and x = 0.71 = x_0 + sh gives s = 0.5. The Newton forward difference polynomial is given by
P_4(x) = f(x_0) + s Δf(x_0) + \frac{s(s-1)}{2!} Δ^2 f(x_0) + \frac{s(s-1)(s-2)}{3!} Δ^3 f(x_0) + \frac{s(s-1)(s-2)(s-3)}{4!} Δ^4 f(x_0).
Substituting the values from the table (the first entry of each difference column), we obtain
tan(0.71) ≈ P_4(0.71) = 0.8596.
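A hedged Python sketch of Newton forward-difference interpolation follows; the helper names are my own, and the tan x values are generated with the math library rather than read from a table.

```python
import math

def forward_difference_table(ys):
    """Return the forward-difference columns [Δ^0 f, Δ^1 f, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

def newton_forward(x0, h, ys, x):
    """Newton forward-difference interpolation at x = x0 + s*h."""
    s = (x - x0) / h
    table = forward_difference_table(ys)
    value, binom = 0.0, 1.0
    for k in range(len(ys)):
        value += binom * table[k][0]
        binom *= (s - k) / (k + 1)  # builds s(s-1)...(s-k)/(k+1)!
    return value

# tan x at 0.70, 0.72, ..., 0.78, as in the table above.
ys = [math.tan(0.70 + 0.02 * i) for i in range(5)]
print(newton_forward(0.70, 0.02, ys, 0.71))  # ≈ 0.8596
```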
Example 10. Show that the cubic polynomials
P(x) = 3 - 2(x + 1) + 0(x + 1)(x) + (x + 1)(x)(x - 1)
and
Q(x) = -1 + 4(x + 2) - 3(x + 2)(x + 1) + (x + 2)(x + 1)(x)
both interpolate the data
x      -2   -1   0   1   2
f(x)   -1    3   1  -1   3
Why does this not violate the uniqueness property of interpolating polynomials?
Sol. The forward differences are Δf: 4, -2, -2, 4; Δ²f: -6, 0, 6; Δ³f: 6, 6; Δ⁴f: 0. In the formulation of P(x) the second node x_0 = -1 is taken as the initial point, while in the formulation of Q(x) the first node x_0 = -2 is taken as the initial point.
Alternatively (without drawing the table), P(-2) = Q(-2) = -1, P(-1) = Q(-1) = 3, P(0) = Q(0) = 1, P(1) = Q(1) = -1, P(2) = Q(2) = 3.
Therefore both cubic polynomials interpolate the given data. The interpolating polynomial is unique, but the form in which a polynomial is written is not: if P(x) and Q(x) are expanded, they are identical.
5. Least Squares Approximation
Example 11. Fit a straight line y = a + bx, in the least squares sense, to the given five data points.
Sol. The normal equations for the line y = a + bx are
5a + b \sum_{i=1}^{5} x_i = \sum_{i=1}^{5} f(x_i),  a \sum_{i=1}^{5} x_i + b \sum_{i=1}^{5} x_i^2 = \sum_{i=1}^{5} x_i f(x_i).
From the data, we have \sum x_i = 3, \sum x_i^2 = 2.2, \sum f(x_i) = 3.748, and \sum x_i f(x_i) = 2.5224. Therefore
5a + 3b = 3.748,  3a + 2.2b = 2.5224.
The solution of this system is a = 0.3392 and b = 0.684. The required approximation is y = 0.3392 + 0.684x.
Least square error = \sum_{i=1}^{5} [f(x_i) - (0.3392 + 0.684 x_i)]^2 = 0.00245.
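The normal equations above can be solved mechanically. The following Python sketch (illustrative names; it assumes the raw data points are available, which this example only summarizes through their sums) fits y = a + bx by least squares using Cramer's rule on the 2×2 system.

```python
def least_squares_line(xs, ys):
    """Solve the normal equations n*a + b*Σx = Σy and
    a*Σx + b*Σx² = Σxy for the line y = a + b*x."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b
```

Plugging in the summary values of this example by hand (n = 5, Σx = 3, Σx² = 2.2, Σy = 3.748, Σxy = 2.5224) gives the same a = 0.3392 and b = 0.684.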
Example 12. Find the least square approximation of second degree for the discrete data
x −2 −1 0 1 2
f (x) 15 1 1 3 19
Sol. We fit the parabola y = a + bx + cx^2. The normal equations are
\sum_{i=1}^{5} f(x_i) = 5a + b \sum_{i=1}^{5} x_i + c \sum_{i=1}^{5} x_i^2
\sum_{i=1}^{5} x_i f(x_i) = a \sum_{i=1}^{5} x_i + b \sum_{i=1}^{5} x_i^2 + c \sum_{i=1}^{5} x_i^3
\sum_{i=1}^{5} x_i^2 f(x_i) = a \sum_{i=1}^{5} x_i^2 + b \sum_{i=1}^{5} x_i^3 + c \sum_{i=1}^{5} x_i^4.
We have \sum x_i = 0, \sum f(x_i) = 39, \sum x_i f(x_i) = 10, \sum x_i^2 = 10, \sum x_i^3 = 0, \sum x_i^4 = 34, and \sum x_i^2 f(x_i) = 140.
From the given data,
5a + 10c = 39
10b = 10
10a + 34c = 140.
The solution of this system is a = -\frac{37}{35}, b = 1, and c = \frac{31}{7}.
The required approximation is y = \frac{1}{35}(-37 + 35x + 155x^2).
Example 13. Use the method of least squares to fit the curve f(x) = c_0 x + c_1/\sqrt{x}. Also find the least square error.
Example 14. Obtain the least squares fit of the form y = ab^x to the following data.
x 1 2 3 4 5 6 7 8
f (x) 1.0 1.2 1.8 2.5 3.6 4.7 6.6 9.1
Sol. The curve y = ab^x takes the form Y = A + Bx after taking logarithms, where Y = log y, A = log a, and B = log b. Hence the normal equations are given by
\sum_{i=1}^{8} Y_i = 8A + B \sum_{i=1}^{8} x_i
\sum_{i=1}^{8} x_i Y_i = A \sum_{i=1}^{8} x_i + B \sum_{i=1}^{8} x_i^2.
From the data, we form the following table.
x   y     Y = log y   xY        x²
1   1.0   0.0000      0.0000    1
2   1.2   0.0792      0.1584    4
3   1.8   0.2553      0.7659    9
4   2.5   0.3979      1.5916    16
5   3.6   0.5563      2.7815    25
6   4.7   0.6721      4.0326    36
7   6.6   0.8195      5.7365    49
8   9.1   0.9590      7.6720    64
Σ   36    30.5   3.7393      22.7385   204
Putting in the values, we obtain
8A + 36B = 3.7393,  36A + 204B = 22.7385
=⇒ A = -0.1660, B = 0.1408
=⇒ a = 10^A = 0.68, b = 10^B = 1.38.
The required curve is y = (0.68)(1.38)^x.
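The log-transform fit is easy to automate. The Python sketch below (function name is my own) reproduces the computation of this example; base-10 logarithms are used, matching Y = log y above.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a*b**x by least squares on Y = log10(y) = A + B*x."""
    n = len(xs)
    Ys = [math.log10(y) for y in ys]
    sx, sY = sum(xs), sum(Ys)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * Y for x, Y in zip(xs, Ys))
    det = n * sxx - sx * sx
    A = (sY * sxx - sx * sxY) / det
    B = (n * sxY - sx * sY) / det
    return 10 ** A, 10 ** B  # a, b

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1]
print(fit_exponential(xs, ys))  # ≈ (0.68, 1.38)
```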
Remark 5.1. If the data values are large, we can make them small by shifting the origin and scaling appropriately.
Example 15. Show that the line of fit to the following data is given by y = 0.7x + 11.28.
x 0 5 10 15 20 25
y 12 15 17 22 24 30
Sol. Here n = 6. We fit a line of the form y = A + Bx.
Let u = \frac{x - 15}{5}, v = y - 20, and fit a line of the form v = a + bu.
x    y    u    v    uv   u²
0    12   -3   -8   24   9
5    15   -2   -5   10   4
10   17   -1   -3   3    1
15   22   0    2    0    0
20   24   1    4    4    1
25   30   2    10   20   4
Σ              -3   0    61   19
The normal equations are
0 = 6a - 3b
61 = -3a + 19b.
Solving, a = 1.7428 and b = 3.4857.
Therefore the equation of the line is v = 1.7428 + 3.4857u.
Changing back to the original variables, we obtain
y - 20 = 1.7428 + 3.4857 \left( \frac{x - 15}{5} \right)
=⇒ y = 11.2857 + 0.6971x.
Exercises
(1) Find the unique polynomial P (x) of degree 2 or less such that
P (1) = 1, P (3) = 27, P (4) = 64
using Lagrange interpolation. Evaluate P (1.05).
(2) For the given functions f(x), let x_0 = 1, x_1 = 1.25, and x_2 = 1.6. Construct Lagrange interpolation polynomials of degree at most one and at most two to approximate f(1.4), and find the absolute error.
a. f(x) = sin πx   b. f(x) = \sqrt[3]{x - 1}   c. f(x) = log_{10}(3x - 1)   d. f(x) = e^{2x} - x.
(3) Let P3 (x) be the Lagrange interpolating polynomial for the data (0, 0), (0.5, y), (1, 3) and (2, 2).
Find y if the coefficient of x3 in P3 (x) is 6.
(4) Let f (x) = ln(1 + x), x0 = 1, x1 = 1.1. Use Lagrange linear interpolation to find the
approximate value of f (1.04) and obtain a bound on the truncation error.
(5) Construct the Lagrange interpolating polynomials for the following functions, and find a bound
for the absolute error on the interval [x0 , xn ].
a. f (x) = e2x cos 3x, x0 = 0, x1 = 0.3, x2 = 0.6, n = 2.
b. f (x) = sin(ln x), x0 = 2.0, x1 = 2.4, x2 = 2.6, n = 2.
c. f (x) = cos x + sin x, x0 = 0, x1 = 0.25, x2 = 0.5, x3 = 1.0, n = 3.
(6) Use the following values and four-digit rounding arithmetic to construct a third degree Lagrange
polynomial approximation to f (1.09). The function being approximated is f (x) = log10 (tan x).
Use this knowledge to find a bound for the error in the approximation.
f (1.00) = 0.1924, f (1.05) = 0.2414, f (1.10) = 0.2933, f (1.15) = 0.3492.
(7) Use the Lagrange interpolating polynomial of degree three or less and four-digit chopping
arithmetic to approximate cos 0.750 using the following values. Find an error bound for the
approximation.
cos 0.698 = 0.7661, cos 0.733 = 0.7432, cos 0.768 = 0.7193, cos 0.803 = 0.6946.
The actual value of cos 0.750 is 0.7317 (to four decimal places). Explain the discrepancy between
the actual error and the error bound.
(8) Determine the spacing h in a table of equally spaced values of the function f(x) = \sqrt{x} between 1 and 2, so that interpolation with a quadratic polynomial will yield an accuracy of 5 × 10^{-8}.
(9) It is suspected that the high amounts of tannin in mature oak leaves inhibit the growth of the
winter moth larvae that extensively damage these trees in certain years. The following table
lists the average weight of two samples of larvae at times in the first 28 days after birth. The
first sample was reared on young oak leaves, whereas the second sample was reared on mature
leaves from the same tree.
a. Use Lagrange interpolation to approximate the average weight curve for each sample.
b. Find an approximate maximum average weight for each sample by determining the maximum
of the interpolating polynomial.
Day                           0     6      10     13     17     20     28
Sample 1 average weight (mg)  6.67  17.33  42.67  37.33  30.10  29.31  28.74
Sample 2 average weight (mg)  6.67  16.11  18.89  15.00  10.56  9.44   8.89
(10) Use Neville’s method to obtain the approximations for Lagrange interpolating polynomials of
degrees one, two, and three to approximate each of the following:
a. f (8.4) if f (8.1) = 16.94410, f (8.3) = 17.56492, f (8.6) = 18.50515, f (8.7) = 18.82091
b. f(-1/3) if f(-0.75) = -0.07181250, f(-0.5) = -0.02475000, f(-0.25) = 0.33493750, f(0) = 1.10100000.
(11) Use Neville's method to approximate \sqrt{3} with the following functions and values.
a. f(x) = 3^x and the values x_0 = -2, x_1 = -1, x_2 = 0, x_3 = 1, and x_4 = 2.
b. f(x) = \sqrt{x} and the values x_0 = 0, x_1 = 1, x_2 = 2, x_3 = 4, and x_4 = 5.
c. Compare the accuracy of the approximation in parts (a) and (b).
(12) Let P3 (x) be the interpolating polynomial for the data (0, 0), (0.5, y), (1, 3), and (2, 2). Use
Neville’s method to find y if P3 (1.5) = 0.
(13) Neville’s Algorithm is used to approximate f (0) using f (−2), f (−1), f (1), and f (2). Suppose
f (−1) was understated by 2 and f (1) was overstated by 3. Determine the error in the original
calculation of the value of the interpolating polynomial to approximate f (0).
(14) If linear interpolation is used to interpolate the error function
f(x) = \frac{2}{\sqrt{π}} \int_0^x e^{-t^2} dt,
show that the error of linear interpolation using data (x_0, f_0) and (x_1, f_1) cannot exceed (x_1 - x_0)^2 / (2\sqrt{2πe}).
(15) Using Newton divided difference interpolation, construct interpolating polynomials of degree
one, two, and three for the following data. Approximate the specified value using each of the
polynomials.
f (0.43) if f (0) = 1, f (0.25) = 1.64872, f (0.5) = 2.71828, f (0.75) = 4.4816.
(16) Show that the polynomial interpolating (in Newton form) the following data has degree 3.
x −2 −1 0 1 2 3
f (x) 1 4 11 16 13 −4
(17) Let f (x) = ex , show that f [x0 , x1 , . . . , xm ] > 0 for all values of m and all distinct equally spaced
nodes {x0 < x1 < · · · < xm }.
(18) Show that the interpolating polynomial for f (x) = xn+1 at n + 1 nodal points x0 , x1 , · · · , xn is
given by
xn+1 − (x − x0 )(x − x1 ) · · · (x − xn ).
(19) The following data are given for a polynomial P (x) of unknown degree
x 0 1 2 3
f (x) 4 9 15 18
Determine the coefficient of x3 in P (x) if all fourth-order forward differences are 1.
(20) Let i0 , i1 , · · · , in be a rearrangement of the integers 0, 1, · · · , n. Show that f [xi0 , xi1 , · · · , xin ] =
f [x0 , x1 , · · · , xn ].
(21) Let f (x) = 1/(1 + x) and let x0 = 0, x1 = 1, x2 = 2. Calculate the divided differences f [x0 , x1 ]
and f [x0 , x1 , x2 ]. Using these divided differences, give the quadratic polynomial P2 (x) that
interpolates f (x) at the given node points {x0 , x1 , x2 }. Graph the error f (x) − P2 (x) on the
interval [0, 2].
(22) Construct the interpolating polynomial that fits the following data using Newton forward and backward difference interpolation. Hence find the values of f(x) at x = 0.15 and 0.45.
x      0      0.1     0.2     0.3     0.4     0.5
f(x)   -1.5   -1.27   -0.98   -0.63   -0.22   0.25
(23) For a function f, the forward divided differences are given by
x_0 = 0.0   f[x_0]
x_1 = 0.4   f[x_1]        f[x_0, x_1]        f[x_0, x_1, x_2] = \frac{50}{7}
x_2 = 0.7   f[x_2] = 6    f[x_1, x_2] = 10
Determine the missing entries in the table.
(24) A fourth-degree polynomial P (x) satisfies ∆4 P (0) = 24, ∆3 P (0) = 6, and ∆2 P (0) = 0, where
∆P (x) = P (x + 1) − P (x). Compute ∆2 P (10).
(25) Show that
f[x_0, x_1, x_2, \cdots, x_n, x] = \frac{f^{(n+1)}(ξ(x))}{(n + 1)!}.
(26) Use the method of least squares to fit the linear and quadratic polynomial to the following
data.
x −2 −1 0 1 2
f (x) 15 1 1 3 19
(27) By the method of least square fit a curve of the form y = axb to the following data.
x 2 3 4 5
y 27.8 62.1 110 161
(28) Use the method of least squares to fit a curve y = c_0/x + c_1\sqrt{x} to the following data.
x   0.1   0.2   0.4   0.5   1   2
y   21    11    7     6     5   6
(29) An experiment with a periodic process provided the following data:
t°   0      50     100    150    200
y    0.754  1.762  2.041  1.412  0.303
Estimate the parameters a and b in the model y = a + b sin t, using the least squares approximation.
Appendix A. Algorithms
Algorithm (Lagrange Interpolation):
• Read the degree n of the polynomial P_n(x).
• Read the values x(i) and y(i) = f(x_i), i = 0, 1, . . . , n.
• Read the point of interpolation p.
• Calculate the Lagrange fundamental polynomials l_i(p) using the following loop:
  for i = 0 to n
    l(i) = 1.0
    for j = 0 to n
      if j ≠ i
        l(i) = l(i) * (p - x(j)) / (x(i) - x(j))
    end j
  end i
• Calculate the approximate value of the function at x = p using the following loop:
  sum = 0.0
  for i = 0 to n
    sum = sum + l(i) * y(i)
  end i
• Print sum.
Algorithm (Newton Divided-Difference Interpolation):
Given n distinct interpolation points x_1, \cdots, x_n and the values y_i = f(x_i) at these points, the following algorithm computes the (lower-triangular) matrix D of divided differences:
D = zeros(n, n);
for i = 1 : n
  D(i, 1) = y(i);
end
for j = 2 : n
  for k = j : n
    D(k, j) = (D(k, j-1) - D(k-1, j-1)) / (x(k) - x(k-j+1));
  end
end
Bibliography
[Burden] Richard L. Burden, J. Douglas Faires and Annette Burden, “Numerical Analysis,” Cengage
Learning, 10th edition, 2015.
[Atkinson] K. Atkinson and W. Han, “Elementary Numerical Analysis,” John Wiley and Sons, 3rd edition, 2004.
CHAPTER 6 (4 LECTURES)
NUMERICAL INTEGRATION
1. Introduction
The general problem is to find the approximate value of the integral of a given function f (x) over
an interval [a, b]. Thus
I = \int_a^b f(x) dx.    (1.1)
The problem can be solved using the Fundamental Theorem of Calculus by finding an anti-derivative F of f, that is, F'(x) = f(x), and then
\int_a^b f(x) dx = F(b) - F(a).
But finding an anti-derivative is not an easy task in general. Hence, it is certainly not a good approach for numerical computations.
In this chapter we'll study methods for finding integration rules. We'll also consider composite versions of these rules and the errors associated with them. Replacing f by its Lagrange interpolating polynomial at nodes x_0, x_1, \cdots, x_n gives a quadrature formula of the form
\int_a^b f(x) dx ≈ \sum_{i=0}^{n} λ_i f(x_i),
where
λ_i = \int_a^b l_i(x) dx.
We can also use Newton divided difference interpolation to approximate the function f (x).
3. Newton-Cotes Formula
Let all nodes be equally spaced with spacing h = \frac{b - a}{n}. The number h is also called the step length. Let x_0 = a and x_n = b; then x_i = a + ih, i = 0, 1, \cdots, n.
The general quadrature formula is given by
\int_a^b f(x) dx = \sum_{i=0}^{n} λ_i f(x_i) + E(f).
This formula is called a Newton-Cotes formula when all points are equally spaced. We now derive rules by taking first- and second-degree interpolating polynomials.
3.1. Trapezoidal Rule. We derive the trapezoidal rule for approximating \int_a^b f(x) dx using the linear Lagrange polynomial. Let x_0 = a, x_1 = b, and h = b - a. Then
\int_{a=x_0}^{b=x_1} f(x) dx = \int_{x_0}^{x_1} P_1(x) dx + E(f),
\int_{x_0}^{x_1} P_1(x) dx = \int_{x_0}^{x_1} [l_0(x)f(x_0) + l_1(x)f(x_1)] dx
= f(x_0) \int_{x_0}^{x_1} \frac{x - x_1}{x_0 - x_1} dx + f(x_1) \int_{x_0}^{x_1} \frac{x - x_0}{x_1 - x_0} dx
= f(x_0) \left[ \frac{(x - x_1)^2}{2(x_0 - x_1)} \right]_{x_0}^{x_1} + f(x_1) \left[ \frac{(x - x_0)^2}{2(x_1 - x_0)} \right]_{x_0}^{x_1}
= \frac{x_1 - x_0}{2} [f(x_0) + f(x_1)]
= \frac{h}{2} [f(a) + f(b)].
The error term is
E(f) = \int_{x_0}^{x_1} \frac{f^{(2)}(ξ)}{2!} (x - x_0)(x - x_1) dx.
Since (x - x_0)(x - x_1) does not change sign in [x_0, x_1], by the Weighted Mean-Value Theorem there exists a point c ∈ (x_0, x_1) such that
E(f) = \frac{f^{(2)}(c)}{2} \int_{x_0}^{x_1} (x - x_0)(x - x_1) dx = -\frac{h^3}{12} f^{(2)}(c).
Hence
\int_a^b f(x) dx = \frac{h}{2} [f(a) + f(b)] - \frac{h^3}{12} f''(c).
Geometrically, it is the area of the trapezium (trapezoid) with width h and parallel sides f(a) and f(b).
3.2. Simpson's Rule. We take the second-degree Lagrange interpolating polynomial with n = 2, x_0 = a, x_1 = \frac{a + b}{2}, x_2 = b, h = (b - a)/2.
\int_{a=x_0}^{b=x_2} f(x) dx = \int_{x_0}^{x_2} P_2(x) dx + E(f),
\int_{x_0}^{x_2} P_2(x) dx = \int_{x_0}^{x_2} [l_0(x)f(x_0) + l_1(x)f(x_1) + l_2(x)f(x_2)] dx = λ_0 f(x_0) + λ_1 f(x_1) + λ_2 f(x_2).
The values of the multipliers λ_0, λ_1, and λ_2 are given by
λ_0 = \int_{x_0}^{x_2} \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} dx.
To simplify this integral, we substitute x = x_0 + ht, dx = h dt, and change the limits to 0 and 2 accordingly:
λ_0 = h \int_0^2 \frac{(t - 1)(t - 2)}{(0 - 1)(0 - 2)} dt = \frac{h}{3}.
Similarly
λ_1 = \int_{x_0}^{x_2} \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} dx = h \int_0^2 \frac{(t - 0)(t - 2)}{(1 - 0)(1 - 2)} dt = \frac{4h}{3}
and
λ_2 = \int_{x_0}^{x_2} \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} dx = h \int_0^2 \frac{(t - 0)(t - 1)}{(2 - 0)(2 - 1)} dt = \frac{h}{3}.
Since (x - x_0)(x - x_1)(x - x_2) changes its sign in the interval [x_0, x_2], we cannot apply the Weighted Mean-Value Theorem as we did for the trapezoidal rule. Also,
\int_{x_0}^{x_2} (x - x_0)(x - x_1)(x - x_2) dx = 0.
We can add an interpolation point without changing the integral of the interpolating polynomial, leaving the error unchanged. We can therefore do our error analysis of Simpson's rule with any single point added; since adding any point in [a, b] does not affect the area, we simply double the midpoint, so that our node set is {x_0 = a, x_1 = (a + b)/2, x_1 = (a + b)/2, x_2 = b}. We can now examine the error of the corresponding interpolating polynomial:
E(f) = \frac{1}{4!} \int_{x_0}^{x_2} f^{(4)}(ξ)(x - x_0)(x - x_1)^2(x - x_2) dx.
Now the product (x - x_0)(x - x_1)^2(x - x_2) does not change sign in [x_0, x_2], so by the Weighted Mean-Value Theorem there exists a point c ∈ (x_0, x_2) such that
E(f) = \frac{f^{(4)}(c)}{24} \int_{x_0}^{x_2} (x - x_0)(x - x_1)^2(x - x_2) dx = -\frac{f^{(4)}(c)}{2880}(x_2 - x_0)^5 = -\frac{h^5}{90} f^{(4)}(c).
Hence
\int_a^b f(x) dx = \frac{h}{3} \left[ f(a) + 4f\left( \frac{a + b}{2} \right) + f(b) \right] - \frac{h^5}{90} f^{(4)}(c).
This rule is called Simpson's \frac{1}{3} rule.
Similarly, by taking the third-degree Lagrange interpolating polynomial with four nodes a = x_0, x_1, x_2, x_3 = b and h = \frac{b - a}{3}, we get the next integration formula, known as Simpson's \frac{3}{8} rule:
\int_a^b f(x) dx = \frac{3h}{8} [f(x_0) + 3f(x_1) + 3f(x_2) + f(x_3)] - \frac{3}{80} h^5 f^{(4)}(c).
Definition 3.1. The degree of accuracy, or precision, of a quadrature formula is the largest positive integer n such that the formula is exact for x^k, for each k = 0, 1, \cdots, n.
In other words, an integration method of the form
\int_a^b f(x) dx = \sum_{i=0}^{n} λ_i f(x_i) + \frac{1}{(n + 1)!} \int_a^b f^{(n+1)}(ξ) \prod_{i=0}^{n} (x - x_i) dx
is said to be of order n if it provides exact results for all polynomials of degree ≤ n, i.e., the error term vanishes for all polynomials of degree ≤ n.
The trapezoidal rule has degree of precision one and Simpson's rule has three.
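Both basic rules are one-liners in code. The Python sketch below (illustrative names) applies them to the integrand of the next example, so the numbers can be checked against the worked solution there.

```python
import math

def trapezoid(f, a, b):
    """Single-interval trapezoidal rule: (h/2)[f(a) + f(b)]."""
    return (b - a) / 2 * (f(a) + f(b))

def simpson(f, a, b):
    """Simpson's 1/3 rule: (h/3)[f(a) + 4 f((a+b)/2) + f(b)], h = (b-a)/2."""
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f((a + b) / 2) + f(b))

f = lambda x: 1 / (1 + x)
print(trapezoid(f, 0, 1), simpson(f, 0, 1), math.log(2))
# 0.75, 0.69444..., 0.69314...
```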
Example 1. Find the value of the integral
I = \int_0^1 \frac{dx}{1 + x}
using the trapezoidal and Simpson's rules. Also obtain a bound on the errors. Compare with the exact value.
Sol. Here f(x) = \frac{1}{1 + x}, a = 0, b = 1, h = b - a = 1. By the trapezoidal rule,
I_T = \frac{h}{2}[f(a) + f(b)] = \frac{1}{2}\left[1 + \frac{1}{2}\right] = 0.75,
with error bound \frac{h^3}{12} \max_{[0,1]} |f''(x)| = \frac{2}{12} ≈ 0.1667. By Simpson's rule, with h = 1/2,
I_S = \frac{h}{3}\left[f(0) + 4f\left(\frac{1}{2}\right) + f(1)\right] = \frac{1}{6}\left[1 + \frac{8}{3} + \frac{1}{2}\right] = 0.69444,
with error bound \frac{h^5}{90} \max_{[0,1]} |f^{(4)}(x)| = \frac{24}{2880} ≈ 0.0083. The exact value is ln 2 = 0.693147, so the actual errors, 0.05685 and 0.00130, lie well within the bounds.
Example 2. Determine α_1, α_2, α_3 so that the formula
\int_0^1 \frac{f(x)}{\sqrt{x(1 - x)}} dx ≈ α_1 f(0) + α_2 f(1/2) + α_3 f(1)
is exact for all polynomials of degree ≤ 2, and use it to approximate
I = \int_0^1 \frac{dx}{\sqrt{x - x^3}}.
Sol. Making the formula exact for f(x) = 1, x, x^2 gives
α_1 + α_2 + α_3 = π
\frac{1}{2}α_2 + α_3 = \frac{π}{2}
\frac{1}{4}α_2 + α_3 = \frac{3π}{8}.
By solving these equations, we obtain α_1 = π/4, α_2 = π/2, α_3 = π/4. Hence
\int_0^1 \frac{f(x)}{\sqrt{x(1 - x)}} dx ≈ \frac{π}{4}[f(0) + 2f(1/2) + f(1)].
Now
I = \int_0^1 \frac{dx}{\sqrt{x - x^3}} = \int_0^1 \frac{dx}{\sqrt{1 + x}\sqrt{x(1 - x)}},
so here f(x) = 1/\sqrt{1 + x}. Using the above formula, we obtain
I ≈ \frac{π}{4}\left[1 + \frac{2\sqrt{2}}{\sqrt{3}} + \frac{\sqrt{2}}{2}\right] = 2.62331.
The exact value of the given integral is I = 2.6220575.
4. Composite Integration
As the order of an integration method is increased, the order of the derivative involved in the error term also increases. Therefore, we can use a higher-order method only if the integrand is differentiable up to the required degree. Alternatively, we can apply lower-order methods by dividing the whole interval into subintervals and then using a Newton-Cotes or Gauss quadrature method on each subinterval separately.
Composite Trapezoidal Method: We divide the interval [a, b] into N subintervals with step size h = \frac{b - a}{N}, taking nodal points a = x_0 < x_1 < \cdots < x_N = b, where x_i = x_0 + ih, i = 1, 2, \cdots, N - 1. Now
I = \int_a^b f(x) dx = \int_{x_0}^{x_1} f(x) dx + \int_{x_1}^{x_2} f(x) dx + \cdots + \int_{x_{N-1}}^{x_N} f(x) dx.
Using the trapezoidal rule for each of the integrals on the right side, we obtain
I = \frac{h}{2} [(f(x_0) + f(x_1)) + (f(x_1) + f(x_2)) + \cdots + (f(x_{N-1}) + f(x_N))] - \frac{h^3}{12} [f^{(2)}(ξ_1) + f^{(2)}(ξ_2) + \cdots + f^{(2)}(ξ_N)]
= \frac{h}{2} \left[ f(x_0) + f(x_N) + 2\sum_{i=1}^{N-1} f(x_i) \right] - \frac{h^3}{12} \sum_{i=1}^{N} f^{(2)}(ξ_i).
This formula is the composite trapezoidal rule, where x_{i-1} ≤ ξ_i ≤ x_i, i = 1, 2, \cdots, N.
The error associated with this approximation is
E(f) = -\frac{h^3}{12} \sum_{i=1}^{N} f^{(2)}(ξ_i).
If f ∈ C^2[a, b], the Extreme Value Theorem implies that f^{(2)} assumes its maximum and minimum in [a, b]. Since
\min_{x∈[a,b]} f^{(2)}(x) ≤ f^{(2)}(ξ_i) ≤ \max_{x∈[a,b]} f^{(2)}(x),
on summing we have
N \min_{x∈[a,b]} f^{(2)}(x) ≤ \sum_{i=1}^{N} f^{(2)}(ξ_i) ≤ N \max_{x∈[a,b]} f^{(2)}(x)
and
\min_{x∈[a,b]} f^{(2)}(x) ≤ \frac{1}{N} \sum_{i=1}^{N} f^{(2)}(ξ_i) ≤ \max_{x∈[a,b]} f^{(2)}(x).
By the Intermediate Value Theorem, there is a c ∈ (a, b) such that
f^{(2)}(c) = \frac{1}{N} \sum_{i=1}^{N} f^{(2)}(ξ_i).
Therefore
E(f) = -\frac{h^3}{12} N f^{(2)}(c),
or, since h = (b - a)/N,
E(f) = -\frac{(b - a)}{12} h^2 f^{(2)}(c).
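A minimal Python sketch of the composite trapezoidal rule, assuming the uniform nodes x_i = a + ih used above (the function name is my own):

```python
def composite_trapezoid(f, a, b, N):
    """Composite trapezoidal rule with N subintervals of width h."""
    h = (b - a) / N
    s = f(a) + f(b) + 2 * sum(f(a + i * h) for i in range(1, N))
    return h / 2 * s
```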
Composite Simpson's Method: Simpson's rule requires three abscissas, so choose an even integer N (producing an odd number of nodes) with h = \frac{b - a}{N}. As before, we write
I = \int_a^b f(x) dx = \int_{x_0}^{x_2} f(x) dx + \int_{x_2}^{x_4} f(x) dx + \cdots + \int_{x_{N-2}}^{x_N} f(x) dx.
Using Simpson's rule for each of the integrals on the right side, we obtain
I = \frac{h}{3} [(f(x_0) + 4f(x_1) + f(x_2)) + (f(x_2) + 4f(x_3) + f(x_4)) + \cdots + (f(x_{N-2}) + 4f(x_{N-1}) + f(x_N))] - \frac{h^5}{90} [f^{(4)}(ξ_1) + f^{(4)}(ξ_2) + \cdots + f^{(4)}(ξ_{N/2})]
= \frac{h}{3} \left[ f(x_0) + 2\sum_{i=1}^{N/2-1} f(x_{2i}) + 4\sum_{i=1}^{N/2} f(x_{2i-1}) + f(x_N) \right] - \frac{h^5}{90} \sum_{i=1}^{N/2} f^{(4)}(ξ_i).
This formula is called the composite Simpson's rule. The error in the integration rule is given by
E(f) = -\frac{h^5}{90} \sum_{i=1}^{N/2} f^{(4)}(ξ_i).
If f ∈ C^4[a, b], the Extreme Value Theorem implies that f^{(4)} assumes its maximum and minimum in [a, b]. Since
\min_{x∈[a,b]} f^{(4)}(x) ≤ f^{(4)}(ξ_i) ≤ \max_{x∈[a,b]} f^{(4)}(x),
on summing we have
\frac{N}{2} \min_{x∈[a,b]} f^{(4)}(x) ≤ \sum_{i=1}^{N/2} f^{(4)}(ξ_i) ≤ \frac{N}{2} \max_{x∈[a,b]} f^{(4)}(x)
and
\min_{x∈[a,b]} f^{(4)}(x) ≤ \frac{2}{N} \sum_{i=1}^{N/2} f^{(4)}(ξ_i) ≤ \max_{x∈[a,b]} f^{(4)}(x).
By the Intermediate Value Theorem, there is a c ∈ (a, b) such that
f^{(4)}(c) = \frac{2}{N} \sum_{i=1}^{N/2} f^{(4)}(ξ_i).
Therefore
E(f) = -\frac{h^5}{180} N f^{(4)}(c),
or, since h = (b - a)/N,
E(f) = -\frac{(b - a)}{180} h^4 f^{(4)}(c).
Example 4. Determine the number of subintervals N required to approximate \int_0^π \sin x \, dx with truncation error less than 2 × 10^{-5} using the composite Simpson's rule.
Sol. Since |f^{(4)}(x)| = |\sin x| ≤ 1, the error bound gives
|E(f)| ≤ \frac{π}{180} h^4 = \frac{π^5}{180 N^4} < 2 × 10^{-5},
so the composite Simpson's rule requires only N ≥ 18. The composite Simpson's rule with N = 18 gives
\int_0^π \sin x \, dx ≈ \frac{π}{54} \left[ 2\sum_{i=1}^{8} \sin\left( \frac{iπ}{9} \right) + 4\sum_{i=1}^{9} \sin\left( \frac{(2i - 1)π}{18} \right) \right] = 2.0000104.
This is accurate to within about 10^{-5}, because the true value is -\cos(π) + \cos(0) = 2.
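The same verification can be done in code. This Python sketch of the composite Simpson's rule (illustrative names) reproduces the value 2.0000104 for N = 18:

```python
import math

def composite_simpson(f, a, b, N):
    """Composite Simpson's rule; N must be even."""
    if N % 2 != 0:
        raise ValueError("N must be even")
    h = (b - a) / N
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, N, 2))   # odd nodes
    s += 2 * sum(f(a + i * h) for i in range(2, N, 2))   # even interior nodes
    return h / 3 * s

print(composite_simpson(math.sin, 0, math.pi, 18))  # ≈ 2.0000104
```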
Example 5. The area A inside the closed curve y^2 + x^2 = \cos x is given by
A = 4 \int_0^α (\cos x - x^2)^{1/2} dx,
where α is the positive root of \cos x - x^2 = 0. (a) Compute α correct to three decimal places by Newton's method. (b) Approximate A using the composite trapezoidal method.
Sol. (a) Applying Newton's method to
f(x) = \cos x - x^2 = 0,
we obtain the iteration scheme
x_{k+1} = x_k + \frac{\cos x_k - x_k^2}{\sin x_k + 2x_k},  k = 0, 1, 2, \cdots
Starting with x_0 = 0.5, we obtain
x_1 = 0.5 + \frac{0.62758}{1.47942} = 0.92420
x_2 = 0.92420 - \frac{0.25169}{2.64655} = 0.82911
x_3 = 0.82911 - \frac{0.011882}{2.39554} = 0.82414
x_4 = 0.82414 - \frac{0.000033}{2.38226} = 0.82413.
Hence the value of α correct to three decimals is 0.824.
(b) Substituting the value of α, we obtain
A = 4 \int_0^{0.824} (\cos x - x^2)^{1/2} dx.
Using the composite trapezoidal method with h = 0.824, 0.412, and 0.206 respectively, we obtain the following approximations of the area A:
A = \frac{4(0.824)}{2} [1 + 0.017753] = 1.67725
A = \frac{4(0.412)}{2} [1 + 2(0.864047) + 0.017753] = 2.262578
A = \frac{4(0.206)}{2} [1 + 2(0.967688 + 0.864047 + 0.658115) + 0.017753] = 2.470951.
5. Gauss Quadrature
In a numerical integration method, if both the nodes x_i and the multipliers λ_i are unknown, the method is called Gaussian quadrature. We can obtain the unknowns by making the method exact for polynomials of degree as high as possible. The formulas are derived for the interval [-1, 1]; any interval [a, b] can be transformed to [-1, 1] by the transformation x = At + B, which gives a = -A + B and b = A + B, and after solving we get
x = \frac{b - a}{2} t + \frac{b + a}{2}.
The error in such a formula is
E(f) = \frac{1}{(n + 1)!} \int_{-1}^{1} f^{(n+1)}(ξ(x))(x - x_0) \cdots (x - x_n) dx = \frac{f^{(n+1)}(c)}{(n + 1)!} C,
where
C = \int_{-1}^{1} (x - x_0) \cdots (x - x_n) dx.
We can compute the value of C by putting f(x) = x^{n+1}, to obtain
\int_a^b x^{n+1} dx = \sum_{i=0}^{n} λ_i x_i^{n+1} + \frac{C}{(n + 1)!} (n + 1)!
=⇒ C = \int_a^b x^{n+1} dx - \sum_{i=0}^{n} λ_i x_i^{n+1}.
The number C is called the error constant. With this notation, we can write the error term as
E(f) = \frac{C}{(n + 1)!} f^{(n+1)}(c).
Gauss-Legendre Integration Methods: The technique we have described can be used to determine the nodes and coefficients for formulas that give exact results for higher-degree polynomials.
One-point formula: The formula is given by
\int_{-1}^{1} f(x) dx = λ_0 f(x_0).
The method has two unknowns, λ_0 and x_0. Making the method exact for f(x) = 1, x, we obtain
f(x) = 1:  \int_{-1}^{1} dx = 2 = λ_0
f(x) = x:  \int_{-1}^{1} x dx = 0 = λ_0 x_0 =⇒ x_0 = 0.
Therefore the one-point formula is given by
\int_{-1}^{1} f(x) dx = 2f(0).
The error in the approximation is given by
E(f) = \frac{C}{2!} f''(ξ),
where the error constant C is given by
C = \int_{-1}^{1} x^2 dx - 2 \cdot (0)^2 = \frac{2}{3}.
Hence
E(f) = \frac{1}{3} f''(ξ),  -1 < ξ < 1.
Two-point formula:
\int_{-1}^{1} f(x) dx = λ_0 f(x_0) + λ_1 f(x_1).
The method has four unknowns. Making the method exact for f(x) = 1, x, x^2, x^3, we obtain
f(x) = 1:  \int_{-1}^{1} dx = 2 = λ_0 + λ_1    (5.1)
f(x) = x:  \int_{-1}^{1} x dx = 0 = λ_0 x_0 + λ_1 x_1    (5.2)
f(x) = x^2:  \int_{-1}^{1} x^2 dx = \frac{2}{3} = λ_0 x_0^2 + λ_1 x_1^2    (5.3)
f(x) = x^3:  \int_{-1}^{1} x^3 dx = 0 = λ_0 x_0^3 + λ_1 x_1^3    (5.4)
Solving these equations gives λ_0 = λ_1 = 1 and x_0 = -\frac{1}{\sqrt{3}}, x_1 = \frac{1}{\sqrt{3}}, so the two-point formula is
\int_{-1}^{1} f(x) dx = f\left( -\frac{1}{\sqrt{3}} \right) + f\left( \frac{1}{\sqrt{3}} \right).
Three-point formula:
\int_{-1}^{1} f(x) dx = λ_0 f(x_0) + λ_1 f(x_1) + λ_2 f(x_2).
The method has six unknowns. Making the method exact for f(x) = 1, x, x^2, x^3, x^4, x^5, we obtain
f(x) = 1:  2 = λ_0 + λ_1 + λ_2
f(x) = x:  0 = λ_0 x_0 + λ_1 x_1 + λ_2 x_2
f(x) = x^2:  \frac{2}{3} = λ_0 x_0^2 + λ_1 x_1^2 + λ_2 x_2^2
f(x) = x^3:  0 = λ_0 x_0^3 + λ_1 x_1^3 + λ_2 x_2^3
f(x) = x^4:  \frac{2}{5} = λ_0 x_0^4 + λ_1 x_1^4 + λ_2 x_2^4
f(x) = x^5:  0 = λ_0 x_0^5 + λ_1 x_1^5 + λ_2 x_2^5
By solving these equations, we obtain λ_0 = λ_2 = \frac{5}{9}, λ_1 = \frac{8}{9}, and x_0 = -\sqrt{3/5}, x_1 = 0, x_2 = \sqrt{3/5}. Hence the three-point formula is
\int_{-1}^{1} f(x) dx = \frac{1}{9} \left[ 5f\left( -\sqrt{\frac{3}{5}} \right) + 8f(0) + 5f\left( \sqrt{\frac{3}{5}} \right) \right].
The error in the three-point formula is
E(f) = \frac{1}{15750} f^{(6)}(ξ),  -1 < ξ < 1.
Note: The Legendre polynomial P_n(x) is a monic polynomial of degree n. The first few Legendre polynomials are
P_0(x) = 1,
P_1(x) = x,
P_2(x) = x^2 - \frac{1}{3},
P_3(x) = x^3 - \frac{3}{5} x.
The nodes in the Gauss-Legendre rules are the roots of these polynomials.
Example 6. Evaluate
I = \int_1^2 \frac{2x}{1 + x^4} dx
using the Gauss-Legendre 1- and 2-point formulas. Also compare with the exact value.
Sol. First we transform the interval [1, 2] to [-1, 1] by taking x = \frac{t + 3}{2}, dx = \frac{dt}{2}:
I = \int_1^2 \frac{2x}{1 + x^4} dx = \int_{-1}^{1} \frac{8(t + 3)}{16 + (t + 3)^4} dt.
Let
f(t) = \frac{8(t + 3)}{16 + (t + 3)^4}.
By the 1-point formula,
I ≈ 2f(0) = 0.4948.
By the 2-point formula,
I ≈ f\left( -\frac{1}{\sqrt{3}} \right) + f\left( \frac{1}{\sqrt{3}} \right) = 0.5434.
The exact value of the integral is
I = \int_1^2 \frac{2x}{1 + x^4} dx = \tan^{-1} 4 - \frac{π}{4} = 0.5404.
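The interval transformation and the 2- and 3-point weights combine into a short routine. The Python sketch below uses my own names, and the Jacobian factor (b - a)/2 is applied explicitly rather than absorbed into the transformed integrand as in the worked solution; it reproduces the 2-point value 0.5434.

```python
import math

def gauss_legendre(f, a, b, n):
    """2- or 3-point Gauss-Legendre rule on [a, b], using the
    transformation x = ((b-a)*t + (b+a))/2."""
    if n == 2:
        nodes, weights = [-1 / math.sqrt(3), 1 / math.sqrt(3)], [1.0, 1.0]
    elif n == 3:
        r = math.sqrt(3 / 5)
        nodes, weights = [-r, 0.0, r], [5 / 9, 8 / 9, 5 / 9]
    else:
        raise ValueError("only n = 2 or 3 implemented here")
    half, mid = (b - a) / 2, (b + a) / 2
    return half * sum(w * f(half * t + mid) for t, w in zip(nodes, weights))

f = lambda x: 2 * x / (1 + x ** 4)
print(gauss_legendre(f, 1, 2, 2))  # ≈ 0.5434
print(gauss_legendre(f, 1, 2, 3))
```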
Example 7. Evaluate
I = \int_{-1}^{1} (1 - x^2)^{3/2} \cos x \, dx
using the Gauss-Legendre 3-point formula.
Sol. Using the Gauss-Legendre 3-point formula with f(x) = (1 - x^2)^{3/2} \cos x, we obtain
I ≈ \frac{1}{9} \left[ 5f\left( -\sqrt{\frac{3}{5}} \right) + 8f(0) + 5f\left( \sqrt{\frac{3}{5}} \right) \right]
= \frac{1}{9} \left[ 5\left( \frac{2}{5} \right)^{3/2} \cos\sqrt{\frac{3}{5}} + 8 + 5\left( \frac{2}{5} \right)^{3/2} \cos\sqrt{\frac{3}{5}} \right]
= 1.08979.
Example 8. Evaluate
I = \int_0^1 \frac{dx}{1 + x}
by subdividing the interval [0, 1] into two equal parts and then using the Gauss-Legendre three-point formula on each part.
Sol. Recall
\int_{-1}^{1} f(x) dx ≈ \frac{1}{9} \left[ 5f\left( -\sqrt{\frac{3}{5}} \right) + 8f(0) + 5f\left( \sqrt{\frac{3}{5}} \right) \right].
Let
I = \int_0^1 \frac{dx}{1 + x} = \int_0^{1/2} \frac{dx}{1 + x} + \int_{1/2}^{1} \frac{dx}{1 + x} = I_1 + I_2.
Now substitute x = \frac{t + 1}{4} and x = \frac{z + 3}{4} in I_1 and I_2, respectively, to change the limits to [-1, 1]. We have dx = dt/4 and dx = dz/4 for the integrals I_1 and I_2, respectively. Therefore
I_1 = \int_{-1}^{1} \frac{dt}{t + 5} = \frac{1}{9} \left[ \frac{5}{5 - \sqrt{3/5}} + \frac{8}{5} + \frac{5}{5 + \sqrt{3/5}} \right] = 0.405464,
I_2 = \int_{-1}^{1} \frac{dz}{z + 7} = \frac{1}{9} \left[ \frac{5}{7 - \sqrt{3/5}} + \frac{8}{7} + \frac{5}{7 + \sqrt{3/5}} \right] = 0.287682.
Hence
I = I_1 + I_2 = 0.405464 + 0.287682 = 0.693146.
Exercises
(1) Given
I = \int_0^1 x e^x dx.
Approximate the value of I using the trapezoidal and Simpson's one-third methods. Also obtain the error bounds and compare with the exact value of the integral.
(2) Evaluate
I = \int_0^1 \frac{dx}{1 + x^2}
using the trapezoidal and Simpson's rules with 4 and 6 subintervals. Compare with the exact value of the integral.
(3) Approximate the following integrals using the trapezoidal and Simpson formulas.
a. I = \int_{-0.25}^{0.25} (\cos x)^2 dx   b. I = \int_e^{e+1} \frac{1}{x \ln x} dx.
Find a bound for the error using the error formula, and compare this to the actual error.
(4) The quadrature formula \int_0^2 f(x) dx = c_0 f(0) + c_1 f(1) + c_2 f(2) is exact for all polynomials of degree less than or equal to 2. Determine c_0, c_1, and c_2.
(14) Compute by Gaussian quadrature with n = 2 and compare with the exact value of the integral:
\int_3^{3.5} \frac{x}{\sqrt{x^2 - 4}} dx.
(15) Evaluate
I = \int_0^1 \frac{\sin x}{2 + x} dx
by subdividing the interval [0, 1] into two equal parts and then using Gaussian quadrature with n = 2.
(16) Determine the coefficients in the formula
\int_0^{2h} x^{-1/2} f(x) dx = (2h)^{1/2} [A_0 f(0) + A_1 f(h) + A_2 f(2h)] + R
and calculate the remainder R when f^{(3)}(x) is constant.
(17) Consider approximating integrals of the form
I = \int_0^1 \sqrt{x} f(x) dx
in which f(x) has several continuous derivatives on [0, 1].
(a) Find a formula
\int_0^1 \sqrt{x} f(x) dx ≈ w_1 f(x_1) = I_1
which is exact if f(x) is any linear polynomial.
(b) To find a formula
\int_0^1 \sqrt{x} f(x) dx ≈ w_1 f(x_1) + w_2 f(x_2) = I_2
which is exact for all polynomials of degree ≤ 3, set up a system of four equations with unknowns w_1, w_2, x_1, x_2. Verify that
x_1 = \frac{1}{9} \left( 5 + 2\sqrt{\frac{10}{7}} \right),  x_2 = \frac{1}{9} \left( 5 - 2\sqrt{\frac{10}{7}} \right),
w_1 = \frac{1}{15} \left( 5 + \sqrt{\frac{7}{10}} \right),  w_2 = \frac{2}{3} - w_1
is a solution of the system.
(c) Apply I_1 and I_2 to the evaluation of
I = \int_0^1 \sqrt{x} e^{-x} dx = 0.37894469164.
Appendix A. Algorithms
Algorithm (Composite Trapezoidal Method):
Step 1: Inputs: function f(x); end points a and b; number of subintervals N.
Step 2: Set h = (b - a)/N.
Step 3: Set sum = 0.
Step 4: For i = 1 to N - 1
Step 5:   Set x = a + h * i
Step 6:   Set sum = sum + 2 * f(x)
        end
Step 7: Set sum = sum + f(a) + f(b).
Step 8: Set sum = sum * h/2.
Step 9: Output sum.
Bibliography
[Burden] Richard L. Burden, J. Douglas Faires and Annette Burden, “Numerical Analysis,” Cengage
Learning, 10th edition, 2015.
[Atkinson] K. Atkinson and W. Han, “Elementary Numerical Analysis,” John Wiley and Sons, 3rd edition, 2004.
CHAPTER 7 (4 LECTURES)
INITIAL-VALUE PROBLEMS FOR ORDINARY DIFFERENTIAL EQUATIONS
1. Introduction
Differential equations are used to model problems in science and engineering that involve the change
of some variable with respect to another. Most of these problems require the solution of an initial-value
problem, that is, the solution to a differential equation that satisfies a given initial condition.
In common real-life situations, the differential equation that models the problem is too complicated to
solve exactly, and one of two approaches is taken to approximate the solution. The first approach is to
modify the problem by simplifying the differential equation to one that can be solved exactly and then
use the solution of the simplified equation to approximate the solution to the original problem. The
other approach, which we will examine in this chapter, uses methods for approximating the solution of
the original problem. This is the approach that is most commonly taken because the approximation
methods give more accurate results and realistic error information.
In this chapter, we discuss numerical methods for solving ordinary differential equations with initial conditions, i.e., initial-value problems (IVP) of the form
\frac{dy}{dt} = f(t, y),  t ∈ R,  y(t_0) = y_0    (1.1)
where y is a function of t, f is a function of t and y, and t_0 is the initial value. The numerical values of y(t) on an interval containing t_0 are to be determined.
We divide the domain [a, b] into subintervals by
a = t_0 < t_1 < \cdots < t_N = b.
These points are called mesh points or grid points. Let the spacing be equal, say h; then the uniform mesh points are given by t_i = t_0 + ih, i = 0, 1, 2, ... The set of values y_0, y_1, \cdots, y_N is the numerical solution of the initial-value problem (IVP).
2.1. Picard method. This method is also known as the method of successive approximations. We consider the following IVP:
\frac{dy}{dt} = f(t, y),  t ∈ R,  y(t_0) = y_0.
Let f(t, y) be a continuous function on the given domain. The initial value problem is then equivalent to the integral equation
y(t) = y_0 + \int_{t_0}^{t} f(s, y(s)) ds.
We can compute the solution y(t) at any time t by integrating this equation. Note that y itself appears inside the integral through f(s, y(s)); therefore we take an approximation of y(t) to start the procedure. The successive approximations to the solution are given by
y_0(t) = y_0,  y_{k+1}(t) = y_0 + \int_{t_0}^{t} f(s, y_k(s)) ds,  k = 0, 1, 2, \cdots
Equivalently, by a solution of this equation is meant a continuous function φ which approximates y(t), i.e.,
φ_0(t) = y_0,  φ_{k+1}(t) = y_0 + \int_{t_0}^{t} f(s, φ_k(s)) ds,  k = 0, 1, 2, \cdots
For example, consider the IVP y' = ty, y(0) = 1, so that f(t, y) = ty, t_0 = 0, and y_0 = 1. Thus
φ_1(t) = 1 + \int_0^t s \, ds = 1 + \frac{t^2}{2},
φ_2(t) = 1 + \int_0^t s\left( 1 + \frac{s^2}{2} \right) ds = 1 + \frac{t^2}{2} + \frac{t^4}{2 \cdot 4},
and it may be established by induction that
φ_k(t) = 1 + \frac{t^2}{2} + \frac{1}{2!}\left( \frac{t^2}{2} \right)^2 + \cdots + \frac{1}{k!}\left( \frac{t^2}{2} \right)^k.
We recognize φ_k(t) as the partial sum of the series expansion of the function φ(t) = e^{t^2/2}. We know that this series converges for all t, which means that φ_k(t) → φ(t) as k → ∞, for all t ∈ R.
Indeed, φ is a solution of the given initial value problem.
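Because each Picard iterate is a symbolic integral, the iteration is easy to automate with a computer algebra system. A sketch using the sympy library follows (an assumption on my part: the notes themselves prescribe no software, and the function name is illustrative).

```python
import sympy as sp

t, s = sp.symbols("t s")

def picard(f, y0, t0, n):
    """Successive approximations y_{k+1}(t) = y0 + ∫_{t0}^{t} f(s, y_k(s)) ds."""
    phi = sp.Integer(y0)
    for _ in range(n):
        phi = y0 + sp.integrate(f(s, phi.subs(t, s)), (s, t0, t))
    return sp.expand(phi)

# y' = t*y, y(0) = 1: iterates are partial sums of exp(t^2/2)
print(picard(lambda s, y: s * y, 1, 0, 3))  # 1 + t**2/2 + t**4/8 + t**6/48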
2.2. Taylor's Series method. Consider the one-dimensional initial value problem
y' = f(t, y),  y(t_0) = y_0,
where f is a function of two variables t and y, and (t_0, y_0) is a known point on the solution curve. If the existence of all higher-order derivatives of y is assumed at some point t = t_i, then by Taylor series the value of y at the neighboring point t_i + h can be written as
y(t_i + h) = y(t_i) + h y'(t_i) + \frac{h^2}{2!} y''(t_i) + \frac{h^3}{3!} y'''(t_i) + \cdots + \frac{h^p}{p!} y^{(p)}(t_i) + O(h^{p+1}).
Since y_i is known at t_i, y' at t_i can be found by computing f(t_i, y_i). Similarly, higher derivatives of y at t_i can be computed by making use of the relation y' = f(t, y). Hence the value of y at the neighboring point t_i + h can be obtained by summing the above series. If the series is truncated after the p-th derivative term, the resulting formula is called the Taylor series approximation to y of order p, and its error is of order p + 1.
Example 5. Given the IVP y' = x^2 y - 1, y(0) = 1, find y at x = 0.1 and x = 0.2 by the Taylor series method of order 4 with step size h = 0.1.
Sol. From the given IVP,
y' = x^2 y - 1,  y'' = 2xy + x^2 y',  y''' = 2y + 4xy' + x^2 y'',  y^{(4)} = 6y' + 6xy'' + x^2 y'''.
At x_0 = 0, y_0 = 1: y' = -1, y'' = 0, y''' = 2, y^{(4)} = -6, so
y(0.1) = 1 + 0.1(-1) + \frac{(0.1)^3}{6}(2) + \frac{(0.1)^4}{24}(-6) = 0.900308.
Repeating the computation at x_1 = 0.1, y_1 = 0.900308 gives y' = -0.990997, y'' = 0.170152, y''' = 1.405919, y^{(4)} = -5.829832, and hence
y(0.2) = 0.900308 + 0.1(-0.990997) + \frac{(0.1)^2}{2}(0.170152) + \frac{(0.1)^3}{6}(1.405919) + \frac{(0.1)^4}{24}(-5.829832) = 0.802269.
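Since the derivatives above are explicit formulas in x and y, the order-4 Taylor step can be coded directly. A minimal Python sketch for this particular IVP (names are illustrative):

```python
def taylor4_step(x, y, h):
    """One step of the order-4 Taylor method for y' = x^2*y - 1,
    using the derivative formulas derived above."""
    d1 = x ** 2 * y - 1
    d2 = 2 * x * y + x ** 2 * d1
    d3 = 2 * y + 4 * x * d1 + x ** 2 * d2
    d4 = 6 * d1 + 6 * x * d2 + x ** 2 * d3
    return y + h * d1 + h**2 / 2 * d2 + h**3 / 6 * d3 + h**4 / 24 * d4

x, y, h = 0.0, 1.0, 0.1
for _ in range(2):
    y = taylor4_step(x, y, h)
    x += h
    print(x, y)  # y(0.1) ≈ 0.900308, y(0.2) ≈ 0.802269
```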
3.1. Euler's Method: The Euler method is named after the Swiss mathematician Leonhard Euler (1707-1783). It is one of the simplest methods for solving the IVP. Consider the IVP given in Eqs. (3.1)-(3.2). We approximate the derivative \frac{dy}{dt} by assuming that all nodes t_i are equally spaced with spacing h, so that t_{i+1} = t_i + h. By the definition of the derivative,
y'(t_0) ≈ \frac{1}{h} [y(t_0 + h) - y(t_0)].
Applying this approximation to the given IVP at the point t = t_0, where y'(t_0) = f(t_0, y_0), gives
\frac{1}{h} [y(t_1) - y(t_0)] ≈ f(t_0, y_0)
=⇒ y(t_1) ≈ y(t_0) + h f(t_0, y_0).
In general, we write
t_{i+1} = t_i + h
y_{i+1} = y_i + h f(t_i, y_i),
where y_i ≈ y(t_i). This method is called Euler's method.
Alternatively, we can derive this method from a Taylor series. We write
y(t_{i+1}) = y(t_i + h) = y(t_i) + h y'(t_i) + \frac{h^2}{2!} y''(t_i) + \cdots
If we truncate the series after the term h y'(t_i), we obtain
y(t_{i+1}) ≈ y(t_i) + h y'(t_i) = y(t_i) + h f(t_i, y(t_i))
=⇒ y_{i+1} = y_i + h f(t_i, y_i).
If the truncation error has leading term h^{p+1}, then the order of the numerical method is p. Therefore, Euler's method is a first-order method.
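A minimal Python sketch of Euler's method follows (illustrative names); applied to the IVP of Example 9 below, it reproduces y(1.1) ≈ 0.271828.

```python
import math

def euler(f, t0, y0, h, n):
    """Euler's method y_{i+1} = y_i + h f(t_i, y_i); returns the final value."""
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

# IVP of Example 9 below: y' = (2/t) y + t^2 e^t, y(1) = 0
f = lambda t, y: 2 * y / t + t ** 2 * math.exp(t)
print(euler(f, 1.0, 0.0, 0.1, 1))  # ≈ 0.271828, i.e. y(1.1)
```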
3.2. The Improved or Modified Euler's method. We write the integral form of the IVP as
\frac{dy}{dt} = f(t, y) ⟺ y(t) = y(t_0) + \int_{t_0}^{t} f(s, y(s)) ds.
Approximating the integral over [t_0, t_1] by the trapezium rule gives
y(t_1) ≈ y(t_0) + \frac{h}{2} [f(t_0, y(t_0)) + f(t_0 + h, y(t_1))],  t_1 = t_0 + h.
Using Euler's method to approximate y(t_1) ≈ y(t_0) + h f(t_0, y(t_0)) inside the bracket, we obtain
y(t_1) = y(t_0) + \frac{h}{2} [f(t_0, y(t_0)) + f(t_1, y(t_0) + h f(t_0, y(t_0)))].
Example 6. Use Euler's method with h = 0.1 to approximate at t = 0.2 the solution of
y' = -2y + 2 - e^{-4t},  y(0) = 1.
Sol. Here f(t, y) = -2y + 2 - e^{-4t}, t_0 = 0, y_0 = 1.
t_1 = 0.1,  y_1 = y_0 + h f(0, 1) = 1 + 0.1(-2 + 2 - 1) = 0.9
t_2 = t_0 + 2h = 0 + 2 × 0.1 = 0.2
y_2 = y_1 + h f(0.1, 0.9) = 0.9 + 0.1(-2 × 0.9 + 2 - e^{-4(0.1)}) = 0.9 + 0.1(-0.47032) = 0.852967
∴ y_2 = y(0.2) = 0.852967.
Example 7. For the IVP y' = t + \sqrt{y}, y(0) = 1, calculate y in the interval [0, 0.6] with h = 0.2 using the modified Euler's method.
Sol. Here
f(t, y) = t + \sqrt{y},  t_0 = 0,  y_0 = 1,  h = 0.2,  t_1 = 0.2,
K_1 = h f(t_0, y_0) = 0.2(1) = 0.2
K_2 = h f(t_1, y_0 + K_1) = h f(0.2, 1.2) = 0.2591
y_1 = y(0.2) = y_0 + \frac{K_1 + K_2}{2} = 1.22955.
Similarly we can compute the solutions at the other points.
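The modified Euler step is equally short in code. The following Python sketch (illustrative names) reproduces the first step of Example 7.

```python
import math

def modified_euler(f, t0, y0, h, n):
    """Modified Euler: K1 = h f(t_i, y_i), K2 = h f(t_{i+1}, y_i + K1),
    y_{i+1} = y_i + (K1 + K2)/2."""
    t, y = t0, y0
    for _ in range(n):
        K1 = h * f(t, y)
        K2 = h * f(t + h, y + K1)
        t, y = t + h, y + (K1 + K2) / 2
    return y

f = lambda t, y: t + math.sqrt(y)
print(modified_euler(f, 0.0, 1.0, 0.2, 1))  # ≈ 1.22955, i.e. y(0.2)
```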
Example 8. Show that the following initial-value problem has a unique solution:
y' = t^{-2}(\sin 2t - 2ty),  1 ≤ t ≤ 2,  y(1) = 2.
Find y(1.1) and y(1.2) with step size h = 0.1 using the modified Euler's method.
Sol. Write
y' = t^{-2}(\sin 2t - 2ty) = f(t, y).
Holding t as a constant,
|f(t, y_1) - f(t, y_2)| = |t^{-2}(\sin 2t - 2ty_1) - t^{-2}(\sin 2t - 2ty_2)| = \frac{2}{|t|} |y_1 - y_2| ≤ 2|y_1 - y_2|.
Thus f satisfies a Lipschitz condition in the variable y with Lipschitz constant L = 2. Additionally, f(t, y) is continuous for 1 ≤ t ≤ 2 and -∞ < y < ∞, so the existence and uniqueness theorem implies that a unique solution exists for this initial-value problem.
Now we apply the modified Euler's method to find the solution. Here t_0 = 1, y_0 = 2, h = 0.1, t_1 = 1.1.
K_1 = h f(t_0, y_0) = h f(1, 2) = -0.309072
K_2 = h f(t_1, y_0 + K_1) = h f(1.1, 1.690928) = -0.24062
y_1 = y(1.1) = y_0 + \frac{1}{2}(K_1 + K_2) = 1.725152.
Now with y_1 = 1.725152, h = 0.1, t_2 = 1.2:
K_1 = -0.24684
K_2 = -0.19947
y_2 = y(1.2) = 1.50199.
Example 9. Given the initial-value problem
y' = \frac{2}{t} y + t^2 e^t,  1 ≤ t ≤ 2,  y(1) = 0,
(i) use Euler's method with h = 0.1 to approximate the solution in the interval [1, 1.6];
(ii) use the answers generated in part (i) and linear interpolation to approximate y at t = 1.04 and t = 1.55.
Sol. Given the initial-value problem
y' = \frac{2}{t} y + t^2 e^t = f(t, y),  t_0 = 1.0,  y(t_0) = 0.0,  h = 0.1.
By Euler's method, the approximations at successive time-levels are given by
y(t_{i+1}) = y(t_i) + h f(t_i, y(t_i)).
∴ y(t_1) = y(1.1) = 0.0 + 0.1 \left[ \frac{2}{1.0}(0.0) + (1.0)^2 e^{1.0} \right] = 0.271828,  t_1 = 1.1
y(t_2) = y(1.2) = 0.271828 + 0.1 \left[ \frac{2}{1.1}(0.271828) + (1.1)^2 e^{1.1} \right] = 0.684756,  t_2 = 1.2
y(t_3) = y(1.3) = 0.684756 + 0.1 \left[ \frac{2}{1.2}(0.684756) + (1.2)^2 e^{1.2} \right] = 1.27698,  t_3 = 1.3
Similarly,
t_4 = 1.4,  y(t_4) = y(1.4) = 2.09355
t_5 = 1.5,  y(t_5) = y(1.5) = 3.18745
t_6 = 1.6,  y(t_6) = y(1.6) = 4.62082.
Now, using linear interpolation, the approximate values of y can be found as follows:
y(1.04) ≈ \frac{1.04 - 1.1}{1.0 - 1.1} y(1.0) + \frac{1.04 - 1.0}{1.1 - 1.0} y(1.1) = 0.10873120,
y(1.55) ≈ \frac{1.55 - 1.6}{1.5 - 1.6} y(1.5) + \frac{1.55 - 1.5}{1.6 - 1.5} y(1.6) = 3.90413500.
3.3. Runge-Kutta Methods: These are among the most important methods for solving IVPs. The techniques were developed around 1900 by the German mathematicians C. Runge and M. W. Kutta. If we apply Taylor's Theorem directly, we require higher-order derivatives of the solution. The class of Runge-Kutta methods does not involve higher-order derivatives, which is the advantage of this class. Euler's method is an example of a first-order Runge-Kutta method, and the modified Euler's method is an example of a second-order Runge-Kutta method.
Third-order Runge-Kutta methods: As with the modified Euler method, using Simpson's rule to approximate the integral, we obtain the following Runge-Kutta method of order three:
t_{i+1} = t_i + h
K_1 = h f(t_i, y_i)
K_2 = h f(t_i + h/2, y_i + K_1/2)
K_3 = h f(t_i + h, y_i - K_1 + 2K_2)
y_{i+1} = y_i + \frac{1}{6}(K_1 + 4K_2 + K_3).
There are different Runge-Kutta methods of order three. A commonly used one is Heun's method, given by
t_{i+1} = t_i + h
y_{i+1} = y_i + \frac{h}{4} \left[ f(t_i, y_i) + 3f\left( t_i + \frac{2h}{3}, y_i + \frac{2h}{3} f\left( t_i + \frac{h}{3}, y_i + \frac{h}{3} f(t_i, y_i) \right) \right) \right].
Runge-Kutta methods of order three are not generally used. The most common Runge-Kutta method in use is of order four, given by the following.
Fourth-order Runge-Kutta method:
t_{i+1} = t_i + h
K_1 = h f(t_i, y_i)
K_2 = h f(t_i + h/2, y_i + K_1/2)
K_3 = h f(t_i + h/2, y_i + K_2/2)
K_4 = h f(t_i + h, y_i + K_3)
y_{i+1} = y_i + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) + O(h^5).
The local truncation error of a Runge-Kutta method is the error that arises in each step because of the truncated Taylor series. This error is inevitable. The fourth-order Runge-Kutta method has a local truncation error of O(h^5).
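The classical fourth-order scheme translates line by line into code. The Python sketch below (illustrative names) reproduces the first step of Example 10.

```python
def rk4(f, t0, y0, h, n):
    """Classical fourth-order Runge-Kutta method; returns the final value."""
    t, y = t0, y0
    for _ in range(n):
        K1 = h * f(t, y)
        K2 = h * f(t + h / 2, y + K1 / 2)
        K3 = h * f(t + h / 2, y + K2 / 2)
        K4 = h * f(t + h, y + K3)
        t, y = t + h, y + (K1 + 2 * K2 + 2 * K3 + K4) / 6
    return y

f = lambda t, y: (y ** 2 - t ** 2) / (y ** 2 + t ** 2)
print(rk4(f, 0.0, 1.0, 0.2, 1))  # ≈ 1.196, i.e. y(0.2) of Example 10
```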
Example 10. Using the fourth-order Runge-Kutta method, solve \frac{dy}{dt} = \frac{y^2 - t^2}{y^2 + t^2} with y(0) = 1 at t = 0.2 and 0.4.
Sol.
f(t, y) = \frac{y^2 - t^2}{y^2 + t^2},  t_0 = 0,  y_0 = 1,  h = 0.2
K_1 = h f(t_0, y_0) = 0.2 f(0, 1) = 0.200
K_2 = h f(t_0 + \frac{h}{2}, y_0 + \frac{K_1}{2}) = 0.2 f(0.1, 1.1) = 0.19672
K_3 = h f(t_0 + \frac{h}{2}, y_0 + \frac{K_2}{2}) = 0.2 f(0.1, 1.09836) = 0.1967
K_4 = h f(t_0 + h, y_0 + K_3) = 0.2 f(0.2, 1.1967) = 0.1891
y_1 = y_0 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 1 + 0.19599 = 1.196
∴ y(0.2) = 1.196.
Now
t_1 = t_0 + h = 0.2
K_1 = h f(t_1, y_1) = 0.1891
K_2 = h f(t_1 + \frac{h}{2}, y_1 + \frac{K_1}{2}) = 0.2 f(0.3, 1.2906) = 0.1795
K_3 = h f(t_1 + \frac{h}{2}, y_1 + \frac{K_2}{2}) = 0.2 f(0.3, 1.2858) = 0.1793
K_4 = h f(t_1 + h, y_1 + K_3) = 0.2 f(0.4, 1.3753) = 0.1688
y_2 = y(0.4) = y_1 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 1.196 + 0.1792 = 1.3752.
Example 12. Using the fourth-order Runge-Kutta method with h = 0.3, find y(0.3) and z(0.3) for the system
\frac{dy}{dx} = 1 + xz,  \frac{dz}{dx} = -xy,  y(0) = 0,  z(0) = 1.
Sol. Given
\frac{dy}{dx} = 1 + xz = f(x, y, z),  \frac{dz}{dx} = -xy = g(x, y, z),
x_0 = 0,  y_0 = 0,  z_0 = 1,  h = 0.3.
K_1 = h f(x_0, y_0, z_0) = 0.3 f(0, 0, 1) = 0.3
L_1 = h g(x_0, y_0, z_0) = 0.3 g(0, 0, 1) = 0
K_2 = h f(x_0 + \frac{h}{2}, y_0 + \frac{K_1}{2}, z_0 + \frac{L_1}{2}) = 0.3 f(0.15, 0.15, 1) = 0.345
L_2 = h g(x_0 + \frac{h}{2}, y_0 + \frac{K_1}{2}, z_0 + \frac{L_1}{2}) = -0.00675
K_3 = h f(x_0 + \frac{h}{2}, y_0 + \frac{K_2}{2}, z_0 + \frac{L_2}{2}) = 0.34485
L_3 = h g(x_0 + \frac{h}{2}, y_0 + \frac{K_2}{2}, z_0 + \frac{L_2}{2}) = -0.00776
K_4 = h f(x_0 + h, y_0 + K_3, z_0 + L_3) = 0.38930
L_4 = h g(x_0 + h, y_0 + K_3, z_0 + L_3) = -0.03104.
Hence
y_1 = y(0.3) = y_0 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 0.34483
z_1 = z(0.3) = z_0 + \frac{1}{6}(L_1 + 2L_2 + 2L_3 + L_4) = 0.98999.
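For systems, the same K's are computed for each equation in lock-step. A Python sketch for a pair of first-order equations follows (my own names); applied to the system above, it reproduces (y(0.3), z(0.3)) ≈ (0.34483, 0.98999).

```python
def rk4_system(f, g, x0, y0, z0, h, n):
    """RK4 for the pair y' = f(x, y, z), z' = g(x, y, z)."""
    x, y, z = x0, y0, z0
    for _ in range(n):
        K1, L1 = h * f(x, y, z), h * g(x, y, z)
        K2 = h * f(x + h / 2, y + K1 / 2, z + L1 / 2)
        L2 = h * g(x + h / 2, y + K1 / 2, z + L1 / 2)
        K3 = h * f(x + h / 2, y + K2 / 2, z + L2 / 2)
        L3 = h * g(x + h / 2, y + K2 / 2, z + L2 / 2)
        K4 = h * f(x + h, y + K3, z + L3)
        L4 = h * g(x + h, y + K3, z + L3)
        x += h
        y += (K1 + 2 * K2 + 2 * K3 + K4) / 6
        z += (L1 + 2 * L2 + 2 * L3 + L4) / 6
    return y, z

f = lambda x, y, z: 1 + x * z
g = lambda x, y, z: -x * y
print(rk4_system(f, g, 0.0, 0.0, 1.0, 0.3, 1))  # ≈ (0.34483, 0.98999)
```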
Example 13. Consider the following Lotka-Volterra system, in which u is the number of prey and v is the number of predators:
\frac{du}{dt} = 2u - uv,  u(0) = 1.5
\frac{dv}{dt} = -9v + 3uv,  v(0) = 1.5.
Use the fourth-order Runge-Kutta method with step size h = 0.2 to approximate the solution at t = 0.2.
Sol.
\frac{du}{dt} = 2u - uv = f(t, u, v)
\frac{dv}{dt} = -9v + 3uv = g(t, u, v),
u_0 = 1.5,  v_0 = 1.5,  h = 0.2.
K_1 = h f(t_0, u_0, v_0) = 0.15
L_1 = h g(t_0, u_0, v_0) = -1.35
K_2 = h f(t_0 + \frac{h}{2}, u_0 + \frac{K_1}{2}, v_0 + \frac{L_1}{2}) = 0.370125
L_2 = h g(t_0 + \frac{h}{2}, u_0 + \frac{K_1}{2}, v_0 + \frac{L_1}{2}) = -0.7054
K_3 = h f(t_0 + \frac{h}{2}, u_0 + \frac{K_2}{2}, v_0 + \frac{L_2}{2}) = 0.2874
L_3 = h g(t_0 + \frac{h}{2}, u_0 + \frac{K_2}{2}, v_0 + \frac{L_2}{2}) = -0.9052
K_4 = h f(t_0 + h, u_0 + K_3, v_0 + L_3) = 0.5023
L_4 = h g(t_0 + h, u_0 + K_3, v_0 + L_3) = -0.4348.
Therefore
u(0.2) ≈ 1.5 + \frac{1}{6}(0.15 + 2 × 0.370125 + 2 × 0.2874 + 0.5023) = 1.8279
v(0.2) ≈ 1.5 + \frac{1}{6}(-1.35 - 2 × 0.7054 - 2 × 0.9052 - 0.4348) = 0.6657.
Example 14. Solve by the fourth-order Runge-Kutta method for x = 0.2:
\frac{d^2 y}{dx^2} = x\left( \frac{dy}{dx} \right)^2 - y^2,  y(0) = 1,  y'(0) = 0.
Sol. Let
\frac{dy}{dx} = z = f(x, y, z).
Therefore
\frac{dz}{dx} = xz^2 - y^2 = g(x, y, z).
Now
x_0 = 0,  y_0 = 1,  z_0 = 0,  h = 0.2
K_1 = h f(x_0, y_0, z_0) = 0.0
L_1 = h g(x_0, y_0, z_0) = -0.2
K_2 = h f(x_0 + \frac{h}{2}, y_0 + \frac{K_1}{2}, z_0 + \frac{L_1}{2}) = -0.02
L_2 = h g(x_0 + \frac{h}{2}, y_0 + \frac{K_1}{2}, z_0 + \frac{L_1}{2}) = -0.1998
K_3 = h f(x_0 + \frac{h}{2}, y_0 + \frac{K_2}{2}, z_0 + \frac{L_2}{2}) = -0.02
L_3 = h g(x_0 + \frac{h}{2}, y_0 + \frac{K_2}{2}, z_0 + \frac{L_2}{2}) = -0.1958
K_4 = h f(x_0 + h, y_0 + K_3, z_0 + L_3) = -0.0392
L_4 = h g(x_0 + h, y_0 + K_3, z_0 + L_3) = -0.1905.
Hence
y_1 = y(0.2) = y_0 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 0.9801
z_1 = y'(0.2) = z_0 + \frac{1}{6}(L_1 + 2L_2 + 2L_3 + L_4) = -0.1970.
Example 15. The motion of a swinging pendulum is described by the second-order differential equation
\frac{d^2 θ}{dt^2} + \frac{g}{L} \sin θ = 0,  θ(0) = \frac{π}{6},  θ'(0) = 0,
where θ is the angle with the vertical at time t, the length of the pendulum is L = 2 ft, and g = 32.17 ft/s². With h = 0.1 s, find the angle θ at t = 0.1 using the fourth-order Runge-Kutta method.
Sol. First we convert the given second-order initial value problem into a pair of simultaneous first-order initial value problems. Setting \frac{dθ}{dt} = y, we obtain the system
\frac{dθ}{dt} = y = f(t, θ, y),  θ(0) = π/6
\frac{dy}{dt} = -\frac{g}{L} \sin θ = g(t, θ, y),  y(0) = 0.
Here t_0 = 0, θ_0 = π/6, and y_0 = 0. By the fourth-order Runge-Kutta method with h = 0.1:
K_1 = h f(t_0, θ_0, y_0) = 0.00000000
L_1 = h g(t_0, θ_0, y_0) = -0.80425000
K_2 = h f(t_0 + 0.5h, θ_0 + 0.5K_1, y_0 + 0.5L_1) = -0.04021250
L_2 = h g(t_0 + 0.5h, θ_0 + 0.5K_1, y_0 + 0.5L_1) = -0.80425000
K_3 = h f(t_0 + 0.5h, θ_0 + 0.5K_2, y_0 + 0.5L_2) = -0.04021250
L_3 = h g(t_0 + 0.5h, θ_0 + 0.5K_2, y_0 + 0.5L_2) = -0.77608129
K_4 = h f(t_0 + h, θ_0 + K_3, y_0 + L_3) = -0.07760813
L_4 = h g(t_0 + h, θ_0 + K_3, y_0 + L_3) = -0.74759884.
θ_1 = θ_0 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 0.48385575.
Therefore, θ(0.1) ≈ θ_1 = 0.48385575.
Exercises
(1) Show that each of the following initial-value problems (IVP) has a unique solution, and find the solution.
a. y' = y \cos t,  0 ≤ t ≤ 1,  y(0) = 1.   b. y' = \frac{2}{t} y + t^2 e^t,  1 ≤ t ≤ 2,  y(1) = 0.
(2) Apply Picard's method to generate y_0(t), y_1(t), y_2(t), and y_3(t) for the initial-value problem
y' = -y + t + 1,  0 ≤ t ≤ 1,  y(0) = 1.
(3) Consider the following initial-value problem:
x' = t(x + t) - 2,  x(0) = 2.
Use the Euler method with step size h = 0.2 to compute x(0.6).
(4) Given the initial-value problem
y' = \frac{1}{t^2} - \frac{y}{t} - y^2,  1 ≤ t ≤ 2,  y(1) = -1,
with exact solution y(t) = -\frac{1}{t}:
a. Use Euler's method with h = 0.05 to approximate the solution, and compare it with the actual values of y.
b. Use the answers generated in part (a) and linear interpolation to approximate the following values of y, and compare them to the actual values.
i. y(1.052)   ii. y(1.555)   iii. y(1.978).
(5) Solve the following IVP by the second-order Runge-Kutta method:
y' = -y + 2\cos t,  y(0) = 1.
Compute y(0.2), y(0.4), and y(0.6) with mesh length 0.2.
(6) Compute solutions to the following problems with a second-order Taylor method. Use step size h = 0.2.
a. y' = (\cos y)^2,  0 ≤ x ≤ 1,  y(0) = 0.   b. y' = \frac{20}{1 + 19e^{-x/4}},  0 ≤ x ≤ 1,  y(0) = 1.
(7) A projectile of mass m = 0.11 kg shot vertically upward with initial velocity v(0) = 8 m/s is
slowed due to the force of gravity, Fg = −mg, and due to air resistance, Fr = −kv|v|, where
g = 9.8 m/s² and k = 0.002 kg/m. The differential equation for the velocity v is given by
mv' = -mg - kv|v|.
a. Find the velocity after 0.1, 0.2, · · · , 1.0 s.
b. To the nearest tenth of a second, determine when the projectile reaches its maximum height
and begins falling.
(8) Use the fourth-order Runge-Kutta method to solve the IVP at x = 0.8 for
\frac{dy}{dx} = \sqrt{x + y},  y(0.4) = 0.41,
with step length h = 0.2.
(9) Water flows from an inverted conical tank with circular orifice at the rate
\frac{dx}{dt} = -0.6πr^2 \sqrt{2g} \frac{\sqrt{x}}{A(x)},
where r is the radius of the orifice, x is the height of the liquid level from the vertex of the
cone, and A(x) is the area of the cross section of the tank x units above the orifice. Suppose
r = 0.1 ft, g = 32.1 ft/s2 , and the tank has an initial water level of 8 ft and initial volume of
512(π/3) ft3 . Use the Runge-Kutta method of order four to find the following.
a. The water level after 10 min with h = 20 s.
b. When the tank will be empty, to within 1 min.
(10) The following system represents a much simplified model of nerve cells:
\frac{dx}{dt} = x + y - x^3,  x(0) = 0.5
\frac{dy}{dt} = -\frac{x}{2},  y(0) = 0.1,
where x(t) represents the voltage across the boundary of the nerve cell and y(t) is the permeability of the cell wall at time t. Solve this system using the fourth-order Runge-Kutta method to generate the profile up to t = 0.2 with step size 0.1.
(11) Use the Runge-Kutta method of order four to solve
y'' - 3y' + 2y = 6e^{-t},  0 ≤ t ≤ 1,  y(0) = y'(0) = 2,
for t = 0.2 with step size 0.2.
Bibliography
[Burden] Richard L. Burden, J. Douglas Faires and Annette Burden, “Numerical Analysis,” Cengage
Learning, 10th edition, 2015.
[Atkinson] K. Atkinson and W. Han, “Elementary Numerical Analysis,” John Wiley and Sons, 3rd edition, 2004.
Appendix A. Algorithms
Algorithm for the second-order Runge-Kutta method:
for i = 0, 1, 2, ... do
  t_{i+1} = t_i + h = t_0 + (i + 1)h
  K_1 = h f(t_i, y_i)
  K_2 = h f(t_{i+1}, y_i + K_1)
  y_{i+1} = y_i + \frac{1}{2}(K_1 + K_2)
end for
Algorithm for the fourth-order Runge-Kutta method:
for i = 0, 1, 2, ... do
  t_{i+1} = t_i + h
  K_1 = h f(t_i, y_i)
  K_2 = h f(t_i + h/2, y_i + K_1/2)
  K_3 = h f(t_i + h/2, y_i + K_2/2)
  K_4 = h f(t_{i+1}, y_i + K_3)
  y_{i+1} = y_i + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4)
end for