Calculus UCD
MATH10400
Dr Richard Smith
Acknowledgements
I am most grateful to Dr Michael Mackey, who lectured this module in
2015/16 and 2016/17, and who very kindly provided me with full access
to the content that he created.
All figures are the author’s.
Contents
0 Programme overview
0.1 Programme outline
0.2 Assessment and grading
0.3 Continuous assessment schedules
0.4 Discussion boards and MathJax
0.5 Any other business
1 Preliminaries
1.1 Numbers
1.2 Sets
1.3 Functions
1.4 Variables
1.5 Algebraic manipulation
1.6 Powers (or Laws of Indices)
1.7 Fractions
1.8 Solving equations
1.9 Graphing functions
3 Differentiation
3.1 Rates of change
3.2 Differentiation from First Principles
3.3 Rules for differentiating
3.4 The Chain Rule
6 Integration
6.1 Indefinite integrals
6.2 Riemann sums and definite integrals
6.3 The Fundamental Theorem of Calculus
Programme overview
• Vector geometry 1
Vectors in n-dimensional Euclidean space, vector arithmetic, scalar
products, the Cauchy-Schwarz inequality, angles between vectors,
and the action of matrices on vectors.
• Vector geometry 2
Orthonormal lists of vectors, orthonormal bases of Rⁿ and coordinate systems.
• Matrices 2
Quadratic forms and matrix norms.
• Differentiation
Rates of change, differentiation from first principles, relationship with continuity, the power, product, quotient and chain rules, derivatives of polynomials, trigonometric, exponential and logarithmic functions, and composites thereof.
• Integration
Indefinite integrals as antiderivatives, standard examples, Riemann sums, definite integrals and area, the Fundamental Theorem of Calculus.
• Methods of integration
Integration by substitution and integration by parts.
• Numerical techniques
Solving equations numerically, the bisection and Newton-Raphson
methods, numerical integration, the trapezoidal rule and Simpson’s
rule.
© 2023 Michael Mackey and Richard Smith. These notes are for personal use only and should not be circulated.
Online material
1. UCD Mathematics Moodle
All module material will be made available on vector.ucd.ie/moodle.
Once at the site, please log in using your UCD Connect credentials and then enrol in both modules (the enrolment key for both is
‘ucdprofcert2022’).
3. Continuous assessment
Continuous assessment comes in two forms: written homework and WeBWorK. Both will be issued and managed online – see Section 0.2 for more details. The full schedule of issue dates and assessment deadlines is given in Section 0.3.
4. Discussion boards
Students can post queries and discuss topics via the weekly discussion boards – see Section 0.4 for more information.
pandemic. However, the intention this year is to resume the normal state
of affairs, which is for the exams to be held in person. Online exams are
problematic because they make it very difficult to protect the integrity of
assessment; unfortunately, plagiarism has been committed during the running
of online exams in this programme.
The two final 2-hour written exams will take place from 10am – 12 noon, and 2pm – 4pm, on Friday 19 August 2022, in room H2.38 SCH, UCD Science Centre (Hub), Belfield Campus, University College Dublin (building 64, square 6D on the UCD map). No alternative exam date will be offered.
When travel to Dublin for the final exams is not possible, examination in appropriate third party centres may be facilitated. Such arrangements will need to be made well in advance of the exam and cannot be guaranteed. Contact Laura Barnes ([email protected]) by Friday 3 June (week 3) to enquire.
Grading
You will receive a mark out of 30 for your continuous assessment, which
will be converted to a letter grade according to the University’s Standard
Conversion Grade Scale (see under Mark to Grade Conversion Scales).
Likewise, you will receive a mark out of 70 for your final exam which will
be converted into a letter grade in the same manner. These two letter
grades will be combined to make an overall module grade; the precise
mechanism by which this will be achieved can be seen under Module
Grade Calculation Points.
use MathJax in the discussion boards are given in Appendix A.1 (of either
module).
You’re free to use MathJax to post mathematical content. Alternatively
you can post such content by writing it by hand and scanning it to a pdf
(see above) or by using a suitable pdf annotator, and then attaching the
pdf file to your post. This option may be preferable if you want to write
a lot of mathematical content.
If you wish to withdraw from the programme, then please note that it is essential to do so by Friday 5 August (week 12), to ensure you do not have a failing grade recorded against your name on the University’s system. Since this programme is only one trimester long, it is not possible to get a refund upon withdrawal; see point 1.7 of UCD’s Refunds Policy.
Extenuating circumstances
The University has an Extenuating Circumstances Policy. The University defines extenuating circumstances to be ‘serious unforeseen circumstances beyond your control which prevented you from meeting the requirements of your programme’. Note the following footnote on page 2 of the Guidance Notes for Students:
You can apply for extenuating circumstances online. For more details,
please contact Laura Barnes ([email protected]).
Chapter 1
Preliminaries
1.1 Numbers
Where to begin a maths course? Numbers might be a good choice. They are at least familiar to us, but questions such as ‘how many numbers are there?’ are more subtle than they first appear, while the question of ‘what is a number?’ borders on philosophy. In this module we will sidestep such questions by referring to Kronecker’s quotation and accepting that some numbers at least are given to us. We can then construct and distinguish between different types of number: integers, fractions, negative numbers, irrational numbers, complex numbers, prime numbers, transcendental numbers, algebraic numbers, quaternions, and many more, and proceed to do things with these numbers, such as add them, or multiply
reference later. Don’t worry about all the details as you read through, but do try to gauge for yourself how much of it is familiar. Also, while prior knowledge can be helpful, you should always be open to seeing new light through old windows.
[Margin note: Leopold Kronecker (1823 – 1891): ‘God made the integers, all else is the work of man.’ Image source: Wikipedia.]
Number systems
The counting numbers or natural numbers are 1, 2, 3, 4, . . . . We give the set of all such numbers the name N for short. Whenever we add or multiply two natural numbers we get another natural number.
[Margin note: That’s ‘N’ for Natural. And yes – that strange font is used deliberately: x, n, N can all represent different things at different times, but N, in this typeface, always means the natural numbers.]
It is hard to imagine how many natural numbers there are. The number 1 followed by one hundred zeros (more easily written as 10¹⁰⁰) is quite a big natural number – the number of atoms in the universe is estimated to be much less. And yet, the number of ways of ordering a class of 70 students is much greater, so extraordinarily large numbers can crop up in common situations.
[Margin note: The number 10¹⁰⁰ is known as a googol.]
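As a quick sanity check of the claim above, here is a minimal Python sketch (an illustration only, not part of the module material) comparing a googol with 70!, the number of ways of ordering 70 students. The variable names are illustrative.

```python
import math

googol = 10 ** 100                 # 1 followed by one hundred zeros
orderings = math.factorial(70)     # ways of ordering a class of 70 students

# Python integers have arbitrary precision, so this comparison is exact.
print(orderings > googol)          # True
print(math.factorial(69) > googol) # False: 70 is the first class size that beats a googol
```

Because the integers are exact, no rounding is involved in either comparison.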
If one wants to subtract natural numbers, one has to allow for 0 and negative numbers. We get the set of integers . . . , −3, −2, −1, 0, 1, 2, 3, 4, . . . . The short name for the integers is Z, coming from the German word ‘Zahlen’.
We can add, multiply and subtract any two integers to get another integer. Multiplying two negative numbers gives a positive number: −2 × −3 = +6 = 6. Multiplying a positive by a negative gives a negative: 5 × −3 = −15.
The rules above were not chosen by anyone – they are an inevitable
consequence of the axioms (fundamental properties) of numbers.
Dividing one integer by another may not give an integer so we have to
consider the fractions. The fractions, or quotient numbers, also called
the rational numbers (from ‘ratio’) are numbers which can be written as
one integer divided by another (non-zero) integer. For example
1/2, −3/2, 12/1 = 12, 31415927/10000000 = 3.1415927.
The shorthand name for the set of rational numbers is Q (for ‘quotient’). The number 6.3567 is in Q because it can be written as 63567/10000. Similarly, every number your calculator displays is a rational number. However, your calculator lies! If you ask your calculator for √2 it will happily display the rational number 1.414213562, but this is only an approximation.
The amazing fact is, as proven by the School of Pythagoras, √2 is not a rational number, and so can never be fully displayed by a calculator. Likewise, π is not a rational number, and π ≠ 3.141592654.
[Margin note: Pythagoras of Samos (c. 570 BC – c. 495 BC). Though hugely influential, many results traditionally attributed to him probably originated earlier or were discovered by members of his school. Image source: Totally History.]
There are many ways to write each element of Q, for example, 3/7 = −6/−14 = 9/21 and so on. However it is always possible to ‘cancel’ into the top and bottom (read more about that in Section 1.7) to write a rational in its lowest form, i.e. write it using the smallest possible integers. Notice when this is done, at least one of the integers will be odd.
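The idea of a lowest form can be tried out with Python's standard `fractions.Fraction` type, which stores every rational number in lowest form. This is only a sketch for experimentation; the variable names are illustrative.

```python
from fractions import Fraction

# Three ways of writing the same rational number; Fraction reduces each
# one to its lowest form automatically.
a = Fraction(3, 7)
b = Fraction(-6, -14)
c = Fraction(9, 21)
print(a == b == c)   # True
print(b)             # 3/7

# A calculator-style decimal is itself a rational number, but its square
# is not exactly 2: it only approximates the square root of 2.
approx_root_2 = Fraction("1.414213562")
print(approx_root_2 * approx_root_2 == 2)   # False
```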
The set of natural numbers N is a subset of the set of integers Z, and
likewise Z is a subset of Q. We write
N ⊆ Z ⊆ Q.
1.2 Sets
Set notation
A collection of objects is called a set. For example, N and Z and Q and R are sets of numbers. A set may be given explicitly as in
A = {Curly, Moe, Larry},
[Margin note: Should you need them, I have written some more detailed notes on sets for undergraduates.]
this is the case with the real numbers, which we think of as increasing
from left to right.
Interval notation
Because we deal with subsets of the real line so often, we have a special notation to refer to intervals (that is, segments) of the line. For example, the real numbers between, and including, 3 and 7 can be represented by [3, 7] for short, while the same interval of numbers, but excluding the two endpoints, is written (3, 7). For reasons that are not obvious, the first is called the closed interval from 3 to 7, and the second the open interval from 3 to 7.
[Margin note: The notation (3, 7) looks worryingly like that of an ordered pair of numbers used, for instance, to represent a point in the 2-dimensional Cartesian plane (see Section 1.9). This is a feature of mathematics: sometimes different things are represented using the same notation. However, usually the correct interpretation]
Definition 1.2. Given a, b ∈ R, a < b, we have the bounded intervals
(a, ∞) = {x ∈ R : a < x},
[a, ∞) = {x ∈ R : a ≤ x},
(−∞, a) = {x ∈ R : x < a},
(−∞, a] = {x ∈ R : x ≤ a}.
Although it may not look like a big deal at first glance, including or
excluding the endpoints a and b is an important distinction; for example,
the closed interval [3, 7] has a biggest and smallest number, but the open
interval (3, 7) has neither!
1.3 Functions
A function is a relationship between two sets. This, admittedly, is terribly vague. The idea is that a function assigns to each element of a given input set a unique element of another output set – a function takes something as input and produces output. In this most general of descriptions, something like a radio or a mathematician
coffee (input) −→ mathematician −→ theorem (output),
is a function, but of course, we will most often work with functions whose input and outputs are numbers
number (input) −→ square −→ number × number (output).
{z
input output
Indeed, if we are dealing with a very big or infinite set (e.g. the natural numbers) then this is the only way one can define a function, e.g. f(x) = x², or the piecewise-defined function
T(x) = 0 if x ≤ 12000,
T(x) = (0.27)(x − 12000) if 12000 < x ≤ 25000,
T(x) = (0.27)(13000) + (0.45)(x − 25000) if x > 25000,
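A piecewise-defined function translates naturally into a chain of conditions. The following Python sketch implements T as written above; the function name `tax` is an assumption made for illustration (the text only calls the function T).

```python
def tax(x: float) -> float:
    """The piecewise-defined function T(x) from the text."""
    if x <= 12000:
        return 0.0
    if x <= 25000:
        return 0.27 * (x - 12000)
    # Above 25000: the full middle band (13000 wide) plus the top band.
    return 0.27 * 13000 + 0.45 * (x - 25000)


for amount in (10000, 12000, 20000, 25000, 30000):
    print(amount, round(tax(amount), 2))
```

Note that the three pieces agree at the break points 12000 and 25000, so the function has no jumps.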
Why did we use such a vague definition of what a function is? Well, it is not enough to consider only functions which send one number to another number, like f(x) = x². We might want to send a pair of numbers to a number – this is a function of two variables. But such functions are already very familiar to you. Consider addition, which takes two numbers and returns one number:
(x, y) (input) −→ add −→ x + y (output).
[Margin note: Sometimes we even need to send functions to other functions. . .]
In linear algebra, you will learn about matrices – these can be regarded as functions which take an input list of numbers (a vector) and produce an output vector.
[Margin note: See MATH10390 Chapters 1 and 2, and Section 2.4 in particular.]
1.4 Variables
The relationship between numbers and variables (xs and ys) is like that
between nouns and pronouns in language. As mentioned above, the
easiest way to define a function on numbers is to make use of variables.
The following example illustrates the concept.
Example 1.3.
Number / Proper Noun                          | Variable / Pronoun
When you meet Moe, poke Moe in the eye.       | When you meet a stooge, poke him in the eye.
When you meet Curly, poke Curly in the eye.   |
When you meet Larry, poke Larry in the eye.   |
f(1) = 5                                      | f(x) = 2x + 3
f(2) = 7                                      |
f(2.35) = 7.7                                 |
f(11) = 25                                    |
...                                           |
The lower case letters ‘x’ and ‘y’ are often used for variables. Functions
are usually denoted by f or g and sets by upper case A, B or C . This
is just a rule of thumb, but it explains why you often see f(x), but rarely
see x(f).
1.5 Algebraic manipulation
Fact 1.4. First multiply and divide, then add and subtract.
(3 + 4)² = (3 + 4)(3 + 4)
= (3 + 4)3 + (3 + 4)4
= 3·3 + 4·3 + 3·4 + 4·4
= 3² + 3·4 + 3·4 + 4²
= 3² + 2·3·4 + 4²
= 9 + 24 + 16 = 49.
[Margin note: Here, 4 · 3 and 3 · 4 mean 4 × 3 and 3 × 4, respectively!]
Solution.
(x + y)² = (x + y)(x + y)
= x(x + y) + y(x + y)
= x·x + x·y + y·x + y·y
= x² + xy + xy + y²
= x² + 2xy + y².
[Figure: a square of side x + y, with regions labelled x² and xy.]
4. (a − b)(a² + ab + b²) and
A great advantage of this system is that two very neat formulae hold.
would be absurd.
How can we define the powers of x, x^q for q ∈ Q, in such a way that Fact 1.9 still holds? Well, let’s begin with x^0. What should this mean? If the rule (R1) is to hold then x^0 · x^1 = x^1. In other words x^0 · x = x. But what multiplies x to give x? The number 1 of course.
Now for negative integers. What does x^(−n) mean for n ∈ N? For rule (R1) to hold, we have
x^(−n) · x^n = x^(−n+n) = x^0 = 1.
[Margin note: …ting 0^0 = 1 are reasonably strong as doing so yields several benefits. We will accept that 0^0 = 1 in this programme.]
x^(−n) = 1/x^n.
So for example, x^(−1) = 1/x, x^(−2) = 1/x^2, 10^(−6) = 1/1,000,000 and so on.
(x^(1/2))^2 = x^((1/2)·2) = x^1 = x,
As well as square roots we can take cube roots. The cube root of x, ∛x, is a number such that its cube is x, e.g. ∛8 = 2, ∛(−8) = −2. Unlike square roots, we can take the cube root of a negative number and there is always exactly one valid cube root.
Similarly we can define the nth root of a real number x to be that number (call it ⁿ√x) so that when we raise it to the power of n we get x. The existence and uniqueness of an nth root depends on whether n is odd or even. If n is even then the nth root ⁿ√x will exist only if x ≥ 0, and for x > 0 there will be two solutions, a positive and a negative one (we will always take the positive one to be our value for ⁿ√x). If n is odd then there will always be a unique nth root of x for any x ∈ R.
Returning to the question of what is meant by x^(1/n) for n ∈ N, we see that if our two rules are to hold
(x^(1/n))^n = x^((1/n)·n) = x^1 = x,
then it makes sense to write x^(1/n) = ⁿ√x.
Now we can make sense of x^q for q ∈ Q. If q ∈ Q then q = m/n for some m, n in Z, n > 0, and where m and n are in their lowest form.
Example 1.13.
(−8)^(2/3) = (∛(−8))^2 = (−2)^2 = 4,
There is a lot more to the business of raising one number to the power of another. It is covered further in Appendix B.2. There, we will give a meaning to the expression x^y for any y ∈ R, not just for y ∈ Q.
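The odd/even distinction for nth roots can be made concrete in code. The sketch below is an illustration under stated assumptions: the helper names `nth_root` and `rational_power` are hypothetical, and `rational_power` assumes that x^(m/n) is computed as the mth power of the nth root, as in Example 1.13. It works with floating-point numbers, so results are approximate.

```python
def nth_root(x: float, n: int) -> float:
    """Real nth root of x, following the odd/even rules in the text."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if n % 2 == 0:
        if x < 0:
            raise ValueError("an even root of a negative number is not real")
        return x ** (1.0 / n)          # the positive root, by convention
    # Odd n: the root of a negative number is minus the root of |x|.
    root = abs(x) ** (1.0 / n)
    return root if x >= 0 else -root


def rational_power(x: float, m: int, n: int) -> float:
    """x^(m/n), computed as (nth root of x) raised to the power m."""
    return nth_root(x, n) ** m


print(rational_power(-8, 2, 3))   # approximately 4, as in Example 1.13
```

Python's built-in expression `(-8) ** (2/3)` does not follow this convention: it returns a complex number, which is why the sketch handles the sign of x explicitly.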
1.7 Fractions
This is something we all learn in primary school, but it is surprising how
often mistakes are made. It is no harm to recall the rules.
1. Multiplying
Multiply the numerators (top) and the denominators (bottom):
a/b · c/d = ac/bd.
For example, 12/29 · 2/3 = 24/87 = 8/29.
2. Dividing
Invert and multiply:
a/b ÷ c/d = a/b · d/c = ad/bc.
For example,
(11/28) / (3/7) = 11/28 · 7/3 = 77/84 = 11/12.
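The two rules above can be checked mechanically. In this sketch the helper names `multiply` and `divide` are illustrative; each applies the stated rule to numerators and denominators, and Python's standard `Fraction` type then cancels the result to lowest form.

```python
from fractions import Fraction


def multiply(a: int, b: int, c: int, d: int) -> Fraction:
    """a/b times c/d: multiply the tops and multiply the bottoms."""
    return Fraction(a * c, b * d)


def divide(a: int, b: int, c: int, d: int) -> Fraction:
    """a/b divided by c/d: invert the second fraction and multiply."""
    return Fraction(a * d, b * c)


print(multiply(12, 29, 2, 3))   # 8/29
print(divide(11, 28, 3, 7))     # 11/12
```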
Exercise 1.14. Show that 1/(x − 1) − 1/(x + 1) = 2/(x² − 1). Write down 1/9 − 1/11.
Fact 1.15. Solving an equation means to find all values of the un-
known for which the equation is true.
Linear equations
A linear equation is one of the form ax + b = 0, where a and b are given numbers with a ≠ 0, and x is the unknown we are solving for, e.g. 11x − 3 = 0. Subtracting the number b from both sides we get ax = −b and now dividing both sides by a, we get the solution x = −b/a.
[Margin note: Solving several linear equations at once, with many unknowns, is the subject of MATH10390 Chapter 3.]
Quadratic equations
f(x) = a₀ + a₁x + a₂x² + a₃x³ + · · · + aₙxⁿ,
ax² + bx + c = 0,
It appears that a quadratic equation has two solutions according to the above formula but, if the quantity inside the square root b² − 4ac (the so-called discriminant) is zero, then the two solutions coincide and the quadratic has only one root, while if b² − 4ac is a negative number then the quadratic equation has no solutions (since we can’t take the square root of a negative number).
[Margin note: Taking the square roots of negative numbers would lead us to complex numbers, which have a vast and hugely applicable theory, but it will not be considered in this programme.]
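The three cases for the discriminant can be sketched as a small Python function. It assumes the usual quadratic formula x = (−b ± √(b² − 4ac)) / 2a; the function name `solve_quadratic` is illustrative, and only real solutions are returned, in line with the text.

```python
import math


def solve_quadratic(a: float, b: float, c: float) -> list:
    """Real solutions of a*x^2 + b*x + c = 0 (a != 0), in increasing order."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real solutions
    if discriminant == 0:
        return [-b / (2 * a)]          # the two solutions coincide
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


print(solve_quadratic(1, 5, -2250))   # [-50.0, 45.0]
print(solve_quadratic(1, 2, 1))       # [-1.0]
print(solve_quadratic(1, 0, 1))       # []
```

The first call solves x² + 5x − 2250 = 0, whose discriminant is 9025 = 95².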
1.9 Graphing functions
Example 1.20.
(the plane of this sheet of paper). The horizontal line is called the x-axis and the vertical line the y-axis. Any point in the plane is specified by two numbers, the x-coordinate of the point (how far to the right of the y-axis the point is), and the y-coordinate (the height of the point above the x-axis). The two numbers are then written as a pair, (x, y). The x-
[Margin note: Cartesian coordinates … nowadays that is difficult to appreciate how revolutionary their introduction was. Image source: Wikipedia]
[Figure: the points (3, 2) and (−2, −1) plotted in the Cartesian plane.]
A plane with this system of labelling is called the Cartesian plane, or simply R², since it consists of all pairs of elements of R. That is, R² = {(x, y) : x, y ∈ R}. The point of intersection of the two axes, (0, 0), often written simply (and abusing notation slightly) as 0, is called the origin.
[Margin note: See MATH10390 Section 2.1 for further discussion.]
[Margin note: Abusing mathematical notation is usually]
the points and then ‘filling in’) we get the graph in Fig. 1.4.
[Figure 1.4: graph passing through the points (−3, −3), (0, 3), (2, 7) and (4, 11).]
[Margin note: There is an interactive GeoGebra applet, written by Dr Anthony Brown, that you can use to draw graphs of linear functions.]
[Figure: graph of a quadratic function.]
[Margin note: There is another applet by Dr Anthony Brown that treats quadratic functions.]
The points where the graph of a function f touches or crosses the x-axis
are called the roots of f. In other words, the roots of f are the values of
x for which f(x) = 0.
Example 1.28. From Figure 1.6, we see that the roots of the quadratic x² − 1 are −1 and 1.
Example 1.29. A rectangular field has one of its sides 5 metres longer than another. The area of the field is 2250 m². What are the dimensions of the field?
Solution. Let x be the length of the short side. Then the long side has length x + 5 and so the area of the field is x(x + 5) = x² + 5x. Hence, x² + 5x = 2250, or equivalently, x² + 5x − 2250 = 0. Now we have to solve this equation for x, i.e. find a root of the quadratic f(x) = x² + 5x − 2250.
For this quadratic, a = 1, b = 5, c = −2250, and so the roots are given by
x = (−b ± √(b² − 4ac)) / 2a = (−5 ± √(5² − 4(1)(−2250))) / 2
= ½(−5 ± √9025)
= ½(−5 ± 95),
Exercise 1.30.
1. Solve 2x² − 5x − 12 = 0.
2. Solve x(x + 1) = 2.
Exercise 1.31.
(a) (x³)(x⁴)/x¹⁵ (b) ⁴√((x²)³) (c) 256^(−1/4).
Trigonometric Functions
We start by defining the sine and cosine functions. For this, draw a circle in the Cartesian plane, whose centre is at the origin and which has a radius of 1. See the left-hand picture in Figure 1.7. Given an angle θ, consider the point (x, y) that makes an angle θ, measured anticlockwise, with the positive part of the x-axis. We define cos θ = x, and sin θ = y.
[Margin note: We usually put parentheses around the parameter of a function – f(x). However, for named functions this is somewhat optional, e.g. sin x and sin(x) are both used.]
[Figure 1.7: sine and cosine. Left: the point (x, y) on the unit circle at angle θ. Right: the point (x, −y) = (cos(−θ), sin(−θ)).]
In this way, the sin and cos functions make sense for every angle. For example, what is cos 180°? Well, the point on the unit circle which makes a 180° angle with the positive x-axis is the point (−1, 0). Thus, cos 180° = −1 (and sin 180° = 0).
Now consider the right-hand picture in Figure 1.7. Everything is as in the
left-hand picture, except that the angle θ is measured clockwise, which
we interpret as −θ. If we do this, then the corresponding point equals
(x, −y), from which we infer that cos(−θ) = x = cos θ and sin(−θ) =
−y = − sin θ.
Remarks 1.32. Apart from the sine and cosine functions, the most commonly met trigonometric function is tangent. It is defined by tan θ = sin θ / cos θ. With the notation of Figure 1.7, tan θ = y/x is the steepness, or slope, of the line determined by the angle θ. Notice that tan θ is not defined at places where cos θ is zero.
The notation sin² θ means (sin θ)² (i.e. calculate the sine of θ and then square the answer) and is not to be confused with sin θ² (i.e. square θ and calculate sine of the answer). The following formulae are often useful when dealing with trigonometric functions.
If you look again at the right-angled triangle in Figure 1.7, and apply the Theorem of Pythagoras, you will conclude that cos² θ + sin² θ = 1, for any angle θ. This is a very useful trigonometric identity, and we’ll present it here with another couple of identities which will be useful later on. We won’t prove them.
1. cos² x + sin² x = 1,
Measuring angles
Fact 1.34.
Measuring an angle by dividing a circle into 360 parts and calling each part one degree is actually rather arbitrary. Beyond historical precedent, there is no natural reason for doing so: why not 100 parts, or 29 parts? However, there is in fact a very natural way of measuring angles. This is done as follows.
[Margin note: Nobody knows the exact origin of the partition of the circle into 360 degrees. It is certainly ancient: it is seen in an-]
[Figure: a circle of radius r in which an arc of length r marks out an angle of 1 radian.]
[Margin note: Alternatively, x radians is the angle we get by walking an arc of length x around the circumference of the unit circle]
How big is a radian? Well, if we walk around the whole circle, we will have travelled a distance 2πr, that is 2π times a distance r. Hence there are 2π radians in a complete revolution, or 360° = 2π radians. It is important to know the sine and cosine of a small number of fundamental angles, given by the following table.
θ     | 0° ≡ 0     | 30° ≡ π/6    | 45° ≡ π/4       | 60° ≡ π/3    | 90° ≡ π/2
sin θ | (√0)/2 = 0 | (√1)/2 = 1/2 | (√2)/2 = 1/√2   | (√3)/2       | (√4)/2 = 1
cos θ | (√4)/2 = 1 | (√3)/2       | (√2)/2 = 1/√2   | (√1)/2 = 1/2 | (√0)/2 = 0
[Margin note: The pattern of numbers under the square root signs is a handy way of remembering the values.]
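The pattern in the table above, with 0, 1, 2, 3, 4 under the square root for sine and the reverse for cosine, can be compared against Python's standard `math` library. This is a sketch only; the helper name `fundamental_values` is illustrative, and the comparison allows for floating-point rounding.

```python
import math


def fundamental_values():
    """Rows (degrees, sin, cos) built from the sqrt(k)/2 pattern in the table."""
    rows = []
    for k, degrees in enumerate([0, 30, 45, 60, 90]):
        rows.append((degrees, math.sqrt(k) / 2, math.sqrt(4 - k) / 2))
    return rows


for degrees, sine, cosine in fundamental_values():
    theta = math.radians(degrees)   # convert degrees to radians
    # Compare the table entries with the library values, up to rounding.
    assert math.isclose(math.sin(theta), sine, abs_tol=1e-12)
    assert math.isclose(math.cos(theta), cosine, abs_tol=1e-12)
    print(degrees, round(sine, 4), round(cosine, 4))
```

An absolute tolerance is used because the library value of cos(π/2) is a tiny non-zero number rather than exactly 0.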
Notice that sin θ = sin(θ + 2π) and cos θ = cos(θ + 2π). We will now
make graphs of the sine and cosine functions.
[Figure: graphs of the sine and cosine functions, with the x-axis marked from −2π to 2π.]
Yes! They are the same shape, but one is a horizontal translate of the other. More specifically, cos x = sin(x + π/2). Notice also that, since adding a semi-circle (i.e. π radians) to an angle gives us the point on the perimeter which is diametrically opposed, we also have sin(x + π) = − sin x and cos(x + π) = − cos x.
[Margin note: In this programme, we will use radians by default. If you use your calculator to compute sines and cosines etc. it is highly advisable to set it to radian mode, by default!]
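The translation identities above can be spot-checked numerically. The sketch below uses Python's `math` module, which works in radians; the function name `check_shift_identities` and the sample points are illustrative choices. A numerical check at a few points is evidence, not a proof.

```python
import math


def check_shift_identities(x: float, tol: float = 1e-12) -> bool:
    """Check the quarter-turn, half-turn and full-turn identities at x."""
    return (
        math.isclose(math.cos(x), math.sin(x + math.pi / 2), abs_tol=tol)
        and math.isclose(math.sin(x + math.pi), -math.sin(x), abs_tol=tol)
        and math.isclose(math.cos(x + math.pi), -math.cos(x), abs_tol=tol)
        and math.isclose(math.sin(x + 2 * math.pi), math.sin(x), abs_tol=tol)
        and math.isclose(math.cos(x + 2 * math.pi), math.cos(x), abs_tol=tol)
    )


samples = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
print(all(check_shift_identities(x) for x in samples))   # True
```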
Notice also that the graph of cosine has mirror symmetry in the y-axis (if you reflect it in the y-axis, it doesn’t change). This is to be expected, since cos(−x) = cos x, as we saw above.
[Margin note: A function that has this property is called even. Other examples include x² − 1 (see Figure 1.26) and x⁴.]
Also true is the fact that the graph of sine has a rotation symmetry, in the sense that if you were to rotate it about the origin through an angle of π radians (180°), then it would look the same. This is, ahem, reflected by the fact, again seen above, that sin(−x) = − sin x.
[Margin note: A function that has this property is called odd. Other examples include x, x³ and tan x.]
Chapter 2
Functions and Limits
2.1 Functions
Domain and codomain
Let us recall Definition 1.21 and examine again what a function is. Suppose we call our function f (as we often do!), our set of inputs A and our set of outputs B. We write the phrase ‘f is a function from the set A to the set B’ more compactly as f : A → B. The input set is known as the domain and the output set as the codomain. Quite often for us, these are both the set R of real numbers and we have f : R → R. By a real-valued function, we mean one which maps into R, that is, the codomain is R (or a subset of R).
The rule for getting from the domain to the codomain is usually given
[Margin note: John von Neumann]
f : R → R
x ↦ x³ + 7.
Here is another function, but there is a problem with its definition. Can
you see what it is?
g : R → R, g(x) = 1/(x − 3).
Look again at Definition 1.21 and notice that a function must ‘map each element’ of the domain to the codomain. Here, g fails this requirement because the rule g(x) = 1/(x − 3) cannot be evaluated when the input is 3. There are two ways around this. One is to change our formula so that the function is properly defined on all of the given domain, the other is to change our domain so that the given rule is valid. So both of these functions g₁ and g₂ are well-defined:
g₁ : R → R, g₁(x) = 1/(x − 3) if x ≠ 3, and g₁(x) = 2021 if x = 3,
and
g₂ : R \ {3} → R, g₂(x) = 1/(x − 3).
[Margin note: Division by zero!]
If a formula for a function is given, without the domain being specified,
then you should assume the domain to be all the numbers for which the
formula makes sense. This is called the natural domain of the function.
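The two repairs of g can be mirrored in code. In this Python sketch, `g1` changes the formula at the problem point, while `g2` restricts the domain and rejects inputs outside it; raising `ValueError` is one illustrative way of modelling "not in the domain", not the only one.

```python
def g1(x: float) -> float:
    """Defined on all of R: the rule is changed at the single point x = 3."""
    if x == 3:
        return 2021.0
    return 1.0 / (x - 3)


def g2(x: float) -> float:
    """Defined on R with 3 removed: inputs outside the domain are rejected."""
    if x == 3:
        raise ValueError("3 is not in the domain of g2")
    return 1.0 / (x - 3)


print(g1(3), g1(4))   # 2021.0 1.0
print(g2(5))          # 0.5
```

Without either repair, evaluating 1 / (x − 3) at x = 3 in Python raises `ZeroDivisionError`, which is the computational counterpart of the rule failing to be defined there.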
Example 2.1.
Fact 2.4. The modulus function has the following properties for all
x, y ∈ R.
1. |x| ≥ 0,
2. |x| = |−x|,
Unique output
Look once again at Definition 1.21 and consider the requirement that each input value is mapped to a unique element of the output set. This phrase rules out any ambiguity in the value of a function, which is good, but it means you have to be careful when defining a function.
For example, consider the function defined through f(x) = √x on its natural domain R+. What is f(4)? We cannot say that f(4) is ‘2 and −2’ because a pair of numbers is not a unique element of the codomain R. We get around this by always taking √x to be the positive square root and allowing for this when we solve equations. Thus the solution to the equation x² = 4 is not x = √4, it is x = ±√4, and since √4 = 2, the
Algebra of functions
The algebra of numbers, that is, how to add, subtract, multiply and divide numbers, is well-known to us. Perhaps surprisingly, we can do these operations on real-valued functions. The operations are inherited from those on numbers applied pointwise. That is, if f and g are functions then we add them to get the function f + g. What is the function f + g? Well, its value at a number x is defined to be (f + g)(x) := f(x) + g(x), and similarly,
[Margin note: Much of MATH10390 concerns performing arithmetic on objects (matrices, vectors etc.) that are not numbers.]
[Margin note: The notation ‘:=’ means … distinction, other times confusing. You can ignore it if you prefer.]
One has to be a little careful with the domains: (f + g)(x) only makes sense for values of x which are in the domain of both f and g, and the same is true for the other combinations of f and g. In addition, f/g is not defined at any points x for which g(x) = 0.
Example 2.6. Let f(x) = x² and g(x) = e^x. Write down formulae for (f + g)(x), (fg)(x), (f/g)(x) and (g/f)(x). What are the natural domains?
Solution. We have (f + g)(x) = x² + e^x, (fg)(x) = x² e^x, (f/g)(x) = x²/e^x, (g/f)(x) = e^x/x².
Both f and g have R as their (natural) domain, thus the domain of both f + g and fg is R. Since e^x is never zero, the domain of f/g is also R, but g/f is not defined at 0 because the denominator is f(0) = 0, so the domain of g/f is R \ {0}.
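Pointwise operations can be sketched in Python by building new functions out of old ones. The helper names `add`, `mul` and `div` are illustrative; `div` raises `ValueError` where the denominator function vanishes, as one way of modelling a point outside the domain.

```python
import math


def add(f, g):
    """The function f + g, defined pointwise."""
    return lambda x: f(x) + g(x)


def mul(f, g):
    """The function fg, defined pointwise."""
    return lambda x: f(x) * g(x)


def div(f, g):
    """The function f/g, undefined wherever g(x) = 0."""
    def quotient(x):
        if g(x) == 0:
            raise ValueError("f/g is not defined here: the denominator is zero")
        return f(x) / g(x)
    return quotient


f = lambda x: x ** 2   # f(x) = x^2
g = math.exp           # g(x) = e^x

print(add(f, g)(0))    # 1.0
print(div(f, g)(0))    # 0.0
```

Calling `div(g, f)(0)` raises the error, matching the observation that g/f has domain R \ {0}.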
Composition
There is one other very important way in which functions can be combined
other than the algebraic operations mentioned above, and it is when we
use the output value of one function as the input value of another. This
is known as a chain or composition of functions. The notation f ◦ g
represents the function ‘f after g’:
(f ◦ g)(x) := f(g(x)).
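As a small numerical illustration of composition, the following Python sketch builds f ◦ g and g ◦ f for f(x) = 1 − x² and g(x) = sin x, the functions used in Example 2.7. The helper name `compose` is illustrative.

```python
import math


def compose(f, g):
    """The function 'f after g': x is sent to f(g(x))."""
    return lambda x: f(g(x))


f = lambda x: 1 - x ** 2
g = math.sin

f_after_g = compose(f, g)   # x -> 1 - (sin x)^2
g_after_f = compose(g, f)   # x -> sin(1 - x^2)

print(f_after_g(0.0))                   # 1.0
print(g_after_f(0.0) == math.sin(1))    # True
```

At x = 0 the two compositions already give different values, so the order of composition matters.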
Example 2.7. Let $f(x) = 1 - x^2$ and $g(x) = \sin x$. Write down formulae
for f(g(x)) and g(f(x)).
Solution. We have $f(g(x)) = 1 - \sin^2 x$ and $g(f(x)) = \sin(1 - x^2)$.
Since $\sin^2 x + \cos^2 x = 1$ (Lemma 1.33 (1)), we can simplify: $f(g(x)) = \cos^2 x$.
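As a quick numerical illustration of composition, here is a short Python sketch using the functions of Example 2.7. The helper name `compose` is an assumption made for this illustration only.

```python
import math


def f(x):
    return 1 - x ** 2


def g(x):
    return math.sin(x)


def compose(outer, inner):
    # (outer ∘ inner)(x) := outer(inner(x))
    return lambda x: outer(inner(x))


f_after_g = compose(f, g)   # x -> 1 - sin^2 x = cos^2 x
g_after_f = compose(g, f)   # x -> sin(1 - x^2)

x = 0.5
print(f_after_g(x), math.cos(x) ** 2)  # these two agree
print(g_after_f(x))                    # a different number
```

At x = 0.5 the two compositions give different values, so the order of composition matters.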
Notice that f ◦ g and g ◦ f are different functions! The act of taking
compositions is non-commutative: the order in which the composition is
made matters.

Margin note: In MATH10390 Section 1.2 we discover another non-commutative
operation, namely matrix multiplication. Contrast these operations with
addition or multiplication of numbers, which is commutative: for all
numbers a and b, a + b = b + a and ab = ba.

Surjectivity
The function f : R → R, $f(x) = x^2$ is well-defined; the formula can be
calculated for every element of the domain R. Notice there are elements
of the codomain which are never output by the function: the square of
a real number is never negative. The set of values which the function
outputs is called the range or image of the function. The range of
$f(x) = x^2$ is $\mathbb{R}^+ = [0, \infty)$ (see Example 2.1 (2)). The range of a function is a
subset of the codomain.
Example 2.9.
We can always ensure that a function is surjective by ‘shrinking’ its
codomain to its range.

Margin note: Given this ability to shrink the codomain, the idea of being
surjective may seem redundant, but there are reasons for having it around.

Example 2.10. The cosine function cos : R → [−1, 1] is surjective.
Injectivity
While a function cannot map an element of the domain to two different
elements of the codomain (see the subsection on unique output above),
it is perfectly allowable for a function to map two different elements of
the domain to the same element of the codomain. For example, consider
g : R → R, $g(x) = x^2 - 6x$, where both g(1) = −5 and g(5) = −5.
A function f is said to be injective or one-to-one if we cannot get the
same output from two different inputs. In other words, equal output
implies equal input.
Example 2.12.
4. The sine and cosine functions are not injective on R, e.g. sin 0 =
0 = sin π.
Example 2.13.
Inverse Functions
Suppose we have a function f : A → B. If we can find a function
g : B → A which ‘undoes’ the action of f, then we call g an inverse
function of f.
and
f(g(y)) = y for all y ∈ B,
then we call g an inverse function for f.
Theorem 2.17. A function f has an inverse if, and only if, it is bijective.
and
$$g(f(y)) = g(y^3) = \sqrt[3]{y^3} = y.$$
Example 2.19. Let f : R → (0, ∞) be given by $f(x) = e^x$. Then f has
the inverse function g : (0, ∞) → R, g(y) = log y, since $e^{\log x} = x$ and
$\log(e^x) = x$.

Margin note: Throughout this programme, log denotes natural logarithm, that
is, ln, or ‘log to base e’, i.e. $\log_e$. This is standard practice in
mathematics. However, in some other contexts (in particular, some
calculators) log can mean the common logarithm or ‘log to base 10’, i.e.
$\log_{10}$, so do be careful! To see the difference between $\log_{10}$ and log,
take a look at this interactive GeoGebra applet, written by Dr Anthony
Brown. The default setting (a = 10) displays $\log_{10}$, while setting a = 2.7
yields the (approximate) graph of log. This is because e ≈ 2.7 (to
one decimal place).

Figure 2.3: The graphs of f and g in Example 2.19 (two panels: $y = e^x$ and $y = \log x$)
Pick any number on your calculator. Press the ‘exp’ button (it appears
as ‘ex ’ on some calculators). Then press the ‘ln’ (or ‘loge ’) button. You
should end up with the number you started with. Ta da!
Remarks 2.20. A function may not have an inverse, but if a function
has one, then it is unique: it does not have two different inverses.
Hereafter, if a function has an inverse, we will refer to it as the
inverse.

Margin note: It might look odd that this remark is being made, but if you
look back at Definition 2.16, g simply fulfils some requirements. The
fulfilling of requirements does not imply uniqueness in general: the shoes
I am wearing fit my feet, but I have other pairs that do so too. See
MATH10390 Proposition 1.28 for another example of this point.

Example 2.21. Let f : [1, 4] → [5, 14] be given by f(x) = 3x + 2. Then
f is bijective and its inverse function g : [5, 14] → [1, 4] is found by
solving y = 3x + 2 for x as a function of y. Subtracting 2 from both
sides and then dividing by 3, we find $x = \frac{1}{3}(y - 2)$. Thus $g(y) = \frac{y-2}{3}$
is the inverse function of f.
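The two ‘undoing’ conditions for an inverse can be spot-checked numerically. The Python sketch below uses f and g from Example 2.21; the sample points are arbitrary choices made for illustration, and a finite check like this is supporting evidence rather than a proof.

```python
def f(x):
    # f : [1, 4] -> [5, 14]
    return 3 * x + 2


def g(y):
    # candidate inverse g : [5, 14] -> [1, 4]
    return (y - 2) / 3


x_samples = [1, 1.5, 2, 3.25, 4]     # points of the domain [1, 4]
y_samples = [5, 6.5, 8, 11.75, 14]   # points of the codomain [5, 14]

g_undoes_f = all(abs(g(f(x)) - x) < 1e-12 for x in x_samples)
f_undoes_g = all(abs(f(g(y)) - y) < 1e-12 for y in y_samples)

print(g_undoes_f, f_undoes_g)
```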
The notation $f^{-1}$ is often used for the inverse function of f (when there
is an inverse function). This terminology is slightly unfortunate because
with functions $f^{-1}$ is not the same thing as $\frac{1}{f}$ (whereas, with numbers,
$x^{-1} = \frac{1}{x}$).
Linear functions
Linear functions are those of the form f(x) = mx + c where m and c are
given real numbers. The natural domain of a linear function is all of R.
For example, f(x) = 3x + 2 and g(x) = −17x + π are linear functions.
Linear functions are so called because their graph is a straight line. For
the linear function f(x) = mx + c, the number m gives the slope or
steepness of the line: for each unit of increase on the horizontal x-axis,
the line rises by m on the vertical y-axis (or drops, if m is negative). The
number c is the y-intercept, that is, where the line crosses the y-axis.
See Example 1.25.

Margin note: A stricter definition of linear function is not just that the
graph is a straight line, but that it also passes through the origin (in
other words, it has the form f(x) = mx, where c = 0) but we will use the
more general notion.

Linear functions are injective (if m ≠ 0) and their inverses can be found
quite easily – see Example 2.21. If m = 0, then we get a constant
function, f(x) = c, such as f(x) = 17. The graph of a constant function is
a horizontal straight line (slope is zero).
While linear functions are very simple, they are important because most
functions that we need to deal with can be approximated by linear func-
tions. This fact lies at the heart of calculus. If you zoom in far enough
on the graph of any ‘smooth’ function then the graph looks like it is a
straight line.
As x gets closer to 1, either from the left or the right hand side, f(x)
appears to get closer to 2 (and as we see below, it really does do this).
We say that the limit of f(x) as x approaches 1 is 2. We can express this
using notation as
$$\lim_{x\to 1} f(x) = 2.$$

Margin note: In mathematics, appearances can deceive. The table suggests
certain behaviour, but offers no proof.

Definition 2.23. If $f(x) \to \ell$ when $x \to a$ then we say that the limit
of f(x) as x approaches a is $\ell$, and we write
$$\lim_{x\to a} f(x) = \ell.$$

Margin note: There is a more rigorous definition of limit, which belongs to
a course on so-called Mathematical Analysis, but this definition will suit
our purposes.
Example 2.25.
In these two examples, it makes sense to talk about a ‘left hand limit’
or a ‘right hand limit’. The functions f and g did not have limits
because their left and right hand limits did not agree.
Example 2.26. The function h : R \ {0} → [−1, 1], given by
$h(x) = \sin\frac{1}{x}$, has neither left nor right hand limits at 0, and in particular
has no limit at 0.
If a limit exists at a given point, and the function is defined at that point,
the limit may not equal the value of the function there.
2.2. Limits of functions 49
One polynomial whose limits are easy to calculate is the identity function,
f(x) = x. Indeed, $\lim_{x\to a} f(x) = \lim_{x\to a} x = a$. This is really saying
nothing more than the tautology ‘if x tends to a then x tends to a’.
Another limit, even easier to calculate, is that of the constant function
f(x) = k, where k is some constant. We simply have $\lim_{x\to a} f(x) = k$.
The following rules of limits are very useful – we’ll state them in the form
of a theorem. These rules form what is known as the algebra of limits.
Solution. We use the rules of Theorem 2.29, together with the limits of
constant functions and the identity function. We have that
$$\begin{aligned}
\lim_{x\to 2}(5x^2 + 3) &= \lim_{x\to 2} 5x^2 + \lim_{x\to 2} 3 && \text{rule (2)} \\
&= 5 \lim_{x\to 2} x^2 + 3 && \text{rule (1)} \\
&= 5 \Big(\lim_{x\to 2} x\Big)^2 + 3 && \text{rule (3)} \\
&= 5 \cdot 2^2 + 3 = 23.
\end{aligned}$$

Margin note: In this solution, we’re really applying the rules in Theorem
2.29 in reverse. If we followed Theorem 2.29 to the letter, we could start
at the last line and work backwards. In practice though, this is rarely
done.
So, for polynomials, we can calculate the limit by just substituting the
limit point into the function. This does not work for every function as
we’ve already seen.
Next, let’s look at limits of rational functions.
The following result follows from Theorems 2.31 and 2.29 (4).
Corollary 2.33. If p(x) and q(x) are polynomials and q(a) ≠ 0 then
$$\lim_{x\to a} \frac{p(x)}{q(x)} = \frac{p(a)}{q(a)}.$$
Example 2.34.
1. $\lim_{x\to 1} \frac{x+2}{x-2} = \frac{3}{-1} = -3$
2. $\lim_{x\to -2} \frac{x+2}{x-2} = \frac{0}{-4} = 0$
3. $\lim_{x\to -3} \frac{x-3}{x+3}$ – no limit! (The denominator tends to 0 while the numerator tends to −6.)
4. $\lim_{x\to 2} 7 = 7$
5. $\lim_{x\to 2} h = h$
6. $\lim_{h\to 2} x = x$
7. $\lim_{x\to -6} \frac{x^2-36}{x+6} = \lim_{x\to -6} \frac{(x-6)(x+6)}{x+6} = \lim_{x\to -6} (x-6) = -12$
8. $\lim_{h\to 3} \frac{h^2-7h+12}{h-3} = \lim_{h\to 3} \frac{(h-4)(h-3)}{h-3} = 3 - 4 = -1$
9. $\lim_{y\to 2} \frac{y^3-8}{y-2} = \lim_{y\to 2} \frac{(y-2)(y^2+2y+4)}{y-2} = 12$
10. $\lim_{h\to 0} \frac{12h+6h^2+h^3}{h} = \lim_{h\to 0} (12+6h+h^2) = 12$
11. $\lim_{h\to 0} \frac{3x^2h+3xh^2+h^3}{h} = \lim_{h\to 0} (3x^2+3xh+h^2) = 3x^2$.

Margin note: In parts 7 – 11, we have cancelled the denominator with a
factor in the numerator. We can do this because, in all cases, these
factors are never zero. E.g. in part 7, the factor x + 6 is never zero
because x → −6 means that x tends to −6, but never equals −6. Likewise for
the other parts. See the argument after Definition 2.23.
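The cancellation argument can be supported numerically by evaluating the uncancelled quotient at points that approach the limit point without ever reaching it. This is only a Python sketch giving supporting evidence (not proof); the helper name `two_sided_values` is an assumption made for this illustration.

```python
def two_sided_values(func, a, steps=6):
    """Evaluate func just left and right of a, at distances 10^-1, ..., 10^-steps.

    The point a itself is never used, mirroring the fact that x tends to a
    but never equals a.
    """
    rows = []
    for k in range(1, steps + 1):
        h = 10.0 ** (-k)
        rows.append((h, func(a - h), func(a + h)))
    return rows


def part7(x):
    # Example 2.34 (7), before cancelling the factor x + 6
    return (x ** 2 - 36) / (x + 6)


def part9(y):
    # Example 2.34 (9), before cancelling the factor y - 2
    return (y ** 3 - 8) / (y - 2)


for name, func, a in [("part 7", part7, -6.0), ("part 9", part9, 2.0)]:
    print(name)
    for h, left, right in two_sided_values(func, a):
        print(f"  h = {h:.0e}: left = {left:.6f}, right = {right:.6f}")
```

The printed values settle near −12 for part 7 and near 12 for part 9, in line with the limits found by cancelling.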
A useful trick for limits which involve $\sqrt{a} - b$ is to multiply above and
below by the surd conjugate $\sqrt{a} + b$.

Margin note: We can eliminate the square root signs in this way:
$(\sqrt{a} - b)(\sqrt{a} + b) = a - b^2$. See Exercise 1.8 (3).

Example 2.35.
$$\begin{aligned}
\lim_{x\to 9} \frac{\sqrt{x-5}-2}{x-9} &= \lim_{x\to 9} \frac{\sqrt{x-5}-2}{x-9} \cdot \frac{\sqrt{x-5}+2}{\sqrt{x-5}+2} \\
&= \lim_{x\to 9} \frac{(\sqrt{x-5}-2)(\sqrt{x-5}+2)}{(x-9)(\sqrt{x-5}+2)} \\
&= \lim_{x\to 9} \frac{(\sqrt{x-5})^2 - 2^2}{(x-9)(\sqrt{x-5}+2)} \\
&= \lim_{x\to 9} \frac{x-9}{(x-9)(\sqrt{x-5}+2)} \\
&= \lim_{x\to 9} \frac{1}{\sqrt{x-5}+2} = \frac{1}{\sqrt{9-5}+2} = \frac{1}{4}.
\end{aligned}$$
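A short numerical check of Example 2.35 is sketched below in Python. The function names `original` and `simplified` are assumptions for this illustration; the first is the quotient as given, the second is the expression obtained after multiplying by the surd conjugate and cancelling.

```python
import math


def original(x):
    # undefined at x = 9, where it takes the form 0/0
    return (math.sqrt(x - 5) - 2) / (x - 9)


def simplified(x):
    # obtained via the surd conjugate; defined at x = 9
    return 1 / (math.sqrt(x - 5) + 2)


for h in [1e-1, 1e-3, 1e-5]:
    print(h, original(9 - h), original(9 + h))

print(simplified(9))  # the limit value, 1/4
```

Both one-sided sequences of values of `original` approach 0.25, which is the value of `simplified` at 9.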
Limits at infinity
The limits that we considered in the previous subsection were, loosely
speaking, questions of the form ‘what happens to f(x) as x gets closer
to the number a’. We can also ask questions of the form ‘what happens
to the function f(x) as x gets arbitrarily large and positive (or the same,
but negative)’. The relevant notation is
$$\lim_{x\to\infty} f(x) \quad\text{or}\quad \lim_{x\to-\infty} f(x),$$
respectively.
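Example 2.36.
4. $\lim_{x\to-\infty} e^{x^2} = \infty$.
5. $\lim_{x\to\infty} e^{-x^2} = 0$.
6. $\lim_{x\to\infty} \frac{x^3}{2x^3+9} = \lim_{x\to\infty} \frac{1}{2+\frac{9}{x^3}} = \frac{1}{2}$.

Margin note: … because the meaning of ‘infinity’ can be ambiguous.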
Trigonometric limits
The basic trigonometric functions obey the same trivial limit formulae
that the polynomials do (Theorem 2.31), namely
$$\lim_{x\to a} \sin x = \sin a, \qquad \lim_{x\to a} \cos x = \cos a,$$
as does the tangent function where it is defined (i.e. whenever cos x ≠ 0).

Margin note: Functions which obey the limit formula $\lim_{x\to a} f(x) = f(a)$
are considered ‘well-behaved’ – see Section 2.3.
We won’t prove this result, but see Exercise 2.38. If you want supporting
evidence (but not proof), choose a number very close to zero, e.g. 0.001,
and use a calculator (in radian mode) to find its sine. You should get
sin 0.001 ≈ 0.0009999998333, and thus $\frac{\sin 0.001}{0.001} \approx 0.999999833$, which is
awfully close to 1.

Margin note: These are just approximations of the true values in question.
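The calculator experiment can be repeated for several small inputs at once. The Python sketch below is supporting evidence only (not a proof); the function name `ratio` is an assumption for this illustration, and `math.sin` works in radians.

```python
import math


def ratio(h):
    # sin(h) / h for h != 0, with h measured in radians
    return math.sin(h) / h


for k in range(1, 7):
    h = 10.0 ** (-k)
    print(f"h = {h:.0e}: sin(h)/h = {ratio(h):.12f}")
```

The printed ratios creep up towards 1 as h shrinks.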
Exercise 2.38. Draw the unit circle and mark a (small) angle x. Mark
the point corresponding to the angle x on the unit circle and drop a
perpendicular to the horizontal axis (as in Figure 1.7). The length of
the perpendicular is sin x. Because we measure angle using radians,
the angle x is the length of an arc of the circle. Mark this arc.
Compare the length of this arc x, with the length of the perpendicular
sin x. What happens to these lengths as the angle gets smaller?
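Example 2.40.
1. $\lim_{x\to 0} \frac{\sin 3x}{x} = \lim_{x\to 0} 3 \cdot \frac{\sin 3x}{3x} = 3 \cdot 1 = 3$.
2. $\lim_{x\to 0} \frac{\tan 4x}{\sin 5x} = \lim_{x\to 0} \frac{\tan 4x}{4x} \cdot \frac{5x}{\sin 5x} \cdot \frac{4x}{5x} = 1 \cdot 1 \cdot \frac{4}{5} = \frac{4}{5}$.
3. $\lim_{x\to 0} \frac{\sin x^3}{x^4} = \lim_{x\to 0} \frac{\sin x^3}{x^3} \cdot \frac{1}{x} = 1 \cdot \lim_{x\to 0} \frac{1}{x}$, which does not exist.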
Exercise 2.41.
1. What is $\lim_{x\to 0} \frac{\sin^2 x}{x}$?
2. Show that $\lim_{x\to 0} \frac{1-\cos x}{x} = 0$.
2.3 Continuity
Limits – an interpretation
One way of interpreting the notion of a limit in mathematics is as a way
to see what you cannot look at directly. When you calculate $\lim_{x\to a} f(x)$
you are in some sense ‘predicting’ what f(a) might be, by looking only
at the values of f(x) for x near a. It’s as if someone puts their thumb over
the graph and asks you what’s underneath.
Sometimes, you cannot tell what the function should be at the limit point
(the limit does not exist). For example, given the signum function (Figure
2.4 (2)), is g(0) equal to 1 or −1?
Other times, you can make an informed guess (the limit $\lim_{x\to a} f(x)$ exists),
but your guess is wrong ($\lim_{x\to a} f(x) \neq f(a)$). For example, if shown the
δ function of Example 2.27 (and Figure 2.6), but the value at 0 covered
up, what would you expect the value to be? Presumably 0?
Finally, there is the case when the function behaves ‘as expected’ – you
can make an informed prediction (the limit $\lim_{x\to a} f(x)$ exists), then the
graph is uncovered, and you see your prediction was correct ($\lim_{x\to a} f(x) = f(a)$).
Functions that behave this way are said to be continuous.
Continuous functions
Example 2.43.
1. Continuous functions
All polynomials are continuous functions (by Theorem 2.31). Rational
functions are continuous except at points where the denominator is zero
(at these points the functions are undefined). Sine and cosine are
continuous functions. Tangent is continuous except at points where the
cosine is zero.
2. Non-continuous functions
Functions that are not continuous include the signum function
and the function $h(x) = \sin\frac{1}{x}$ (Examples 2.25 (2) and 2.26) which
are both discontinuous at 0. The floor function (or ‘round down’
function)
is not continuous at any integer point. Can you imagine what its graph
looks like?
Example 2.44. Find the value or values of k ∈ R for which the function
$$f(x) = \begin{cases} 2x + k, & x \le -1 \\ x^2 + 1, & x > -1, \end{cases}$$
is continuous.
Solution. The two pieces of this piecewise defined function are each
continuous, but f could fail to be continuous at the point where the
two pieces meet, namely at x = −1.
The left hand limit as we approach −1 is 2(−1) + k, the right hand limit
is $(-1)^2 + 1 = 2$. These agree, and we have a limit, when −2 + k = 2,
that is, k = 4. The limit (from both sides) is then 2 which equals
f(−1). So f is continuous when k = 4.
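The matching of the two pieces at x = −1 can be checked numerically. In the Python sketch below, the helper names `make_f` and `one_sided_gap` are assumptions made for this illustration; the gap is the difference between values taken just left and just right of −1.

```python
def make_f(k):
    # the piecewise function of Example 2.44 for a given value of k
    def f(x):
        return 2 * x + k if x <= -1 else x ** 2 + 1
    return f


def one_sided_gap(k, h=1e-8):
    # difference between values just left and just right of x = -1
    f = make_f(k)
    return abs(f(-1 - h) - f(-1 + h))


for k in [0, 2, 4, 6]:
    print(k, one_sided_gap(k))
```

Only k = 4 gives a gap that is essentially zero; for the other values of k the graph jumps at −1.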
Differentiation
Consider the straight line in Figure 3.1 and the angle θ that it makes
with the x-axis. Given any two points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ on
the line, the slope of that line, tan θ, is also given by
$$\frac{y_2 - y_1}{x_2 - x_1},$$
that is, the vertical difference divided by the horizontal difference.

Margin note: In MATH10390 the coordinates of points in $\mathbb{R}^2$ are typically
labelled $(x_1, x_2)$ and $(y_1, y_2)$, etc.

Figure 3.1: $\tan\theta = \frac{y_2 - y_1}{x_2 - x_1}$
For the straight line, f(x) = mx + c, the slope turns out to be m. Notice
that the slope does not depend on the constant c. We may use the
phrase ‘rate of change’ instead of ‘slope’ because, if the slope is m, then
by moving 1 unit along the x-axis, the value of the function changes by
m, i.e.
Figure 3.2: Lines having positive, negative and zero slopes, respectively
Thus linear functions change value in a very rigid way – the rate of
change is constant (no matter where we are on the line, if we move one
unit to the right on the line then we increase our height by m). Most
functions change values in more exciting ways.
The Calculus
Calculus is the study of how mathematical functions change their values.
The formal study of calculus began in the late 17th century. Isaac New-
ton developed this new science of change to deal with the motion of the
heavenly bodies under the action of gravity. Newton’s great work, Prin-
cipia Mathematica, was published in 1687. Around the same time, and
independently, Gottfried Leibniz studied derivatives and integrals in a
more abstract sense. Argument about who was first to come up with the
important ideas was a divisive topic among academics at the time.
3.1. Rates of change 59
It is interesting to muse on the fact that while Newton would have
considered himself primarily a physicist, his works on optics and gravity,
while great breakthroughs at the time, have been superseded by newer
and better theories, for example, Einstein’s General Theory of Relativity.
However, relativity, and all of modern science, relies on Newton’s
indelible mathematical contribution: the calculus.
Rates of change
For a function whose graph is a straight line, the slope of the line is a
measure of how quickly the values of the function change. For a more
general function, how do we measure the rate of change of the function?
Essentially, given a point on the x-axis we want to measure how steep
the graph of a function f is at that point. This is done according to a
simple strategy. We simply pick the straight line which best describes
the function at that point and say that the rate of change of the function
at that point is just the slope of this ‘special’ line. The special line which
best approximates the function at a point, a, is called the tangent line
to the function at a or ‘the linear approximation of the function’. For
example, Figure 3.3 shows the tangent line to a function at the point
a = 3.
In the pictures, we have marked points P = (a, f(a)) and Q = (x, f(x))
on each graph. We notice that as the point Q gets closer to the point
P, the solid secant line from P to Q gets closer to the dotted tangent
line. In particular, the slope of the secant line from P to Q gets closer to
the slope of the tangent line. We obtain the slope of the tangent line by
taking the limit of the slope of the secant line as x → a.
3.2. Differentiation from First Principles 61
$$f'(a) = \text{Slope of tangent line} = \text{limit of slope of secant line } PQ = \lim_{x\to a} \frac{f(x) - f(a)}{x - a}.$$

Margin note: There is an interactive GeoGebra applet, written by Dr John
Sheekey, that you can use to see how the secant line approaches the
tangent line as h → 0.

Often, a + h is used instead of x, and with this notation we consider
the limit as h → 0 instead of as x → a. This gives us the following
reformulation of Definition 3.2, which is of central importance.
Example 3.4. Find the slope of the tangent line to the function f
defined by $f(x) = x^2$ at the point a. In other words, find the derivative
of $f(x) = x^2$ at a.
$$\begin{aligned}
f'(a) &= \lim_{h\to 0} \frac{(a+h)^2 - a^2}{h} \\
&= \lim_{h\to 0} \frac{a^2 + 2ah + h^2 - a^2}{h} && \text{Example 1.7} \\
&= \lim_{h\to 0} \frac{2ah + h^2}{h} \\
&= \lim_{h\to 0} (2a + h) = 2a.
\end{aligned}$$
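The secant slopes in this calculation can be tabulated numerically. The Python sketch below uses the hypothetical helper name `difference_quotient` and the arbitrary choice a = 3; it illustrates the limit but does not replace the algebra.

```python
def difference_quotient(f, a, h):
    # slope of the secant line through (a, f(a)) and (a + h, f(a + h))
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


a = 3.0
for k in range(1, 6):
    h = 10.0 ** (-k)
    print(f"h = {h:.0e}: secant slope = {difference_quotient(square, a, h):.6f}")
```

For f(x) = x² the secant slope is 2a + h, so with a = 3 the printed slopes approach 2a = 6 as h shrinks.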
Solution.
$$\begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} \\
&= \lim_{h\to 0} \frac{\sin(x+h) - \sin x}{h} \\
&= \lim_{h\to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} && \text{Lemma 1.33 (3)} \\
&= \lim_{h\to 0} \frac{\sin x(\cos h - 1) + \cos x \sin h}{h} \\
&= \lim_{h\to 0} \left( \sin x \, \frac{\cos h - 1}{h} + \cos x \, \frac{\sin h}{h} \right) \\
&= \sin x \lim_{h\to 0} \frac{\cos h - 1}{h} + \cos x \lim_{h\to 0} \frac{\sin h}{h} && \text{Theorem 2.29 (1)} \\
&= (\sin x) \cdot 0 + (\cos x) \cdot 1 \\
&= \cos x.
\end{aligned}$$

Margin note: … steps: why does each line follow from the previous one? What
fact or result was used?
$$\lim_{x\to a} \big(f(x) - f(a)\big) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a}\,(x - a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} \cdot \lim_{x\to a} (x - a) = f'(a) \cdot 0 = 0.$$
So, if f is differentiable at a, then it follows that $\lim_{x\to a} f(x) = f(a)$, which
is just saying that f is continuous at a.
The converse statement is not true. A function can be continuous but not
differentiable.
Example 3.8. The absolute value function f(x) = |x| (see Definition
a variable.
There’s a lot in this theorem so let’s break it down. Theorem 3.14 (1) tells
us, for example, that
$$\frac{d}{dx}(100 \sin x) = 100\,\frac{d}{dx} \sin x = 100 \cos x.$$
3.3. Rules for differentiating 67
= (1 − x) cos x − sin x.
$$\begin{aligned}
(uv)'(x) &= \lim_{h\to 0} \frac{(uv)(x+h) - (uv)(x)}{h} \\
&= \lim_{h\to 0} \frac{u(x+h)v(x+h) - u(x)v(x)}{h} \\
&= \lim_{h\to 0} \frac{u(x+h)v(x+h) - u(x+h)v(x) + u(x+h)v(x) - u(x)v(x)}{h} \\
&= \lim_{h\to 0} u(x+h)\,\frac{v(x+h) - v(x)}{h} + \lim_{h\to 0} v(x)\,\frac{u(x+h) - u(x)}{h} \\
&= u(x)v'(x) + v(x)u'(x).
\end{aligned}$$

Margin note: On the third line, we subtract and then add back a particular
term. This term has been chosen so that we can take advantage of the limit
definitions of $u'(x)$ and $v'(x)$ in the remainder of the proof.
Next, the quotient rule, which allows us to find derivatives such as $\frac{d}{dx}\frac{\sin x}{x^2}$.

Margin note: The derivative of a quotient is not the quotient of the
derivatives!

Example 3.17. Differentiate $f(x) = \frac{\sin x}{x^2}$.
Solution. Setting u(x) = sin x and $v(x) = x^2$, and applying the quotient rule
$$\left(\frac{u}{v}\right)' = \frac{v \cdot u' - u \cdot v'}{v^2},$$
we find
$$\frac{d}{dx}\frac{\sin x}{x^2} = \frac{x^2 \cos x - (\sin x)(2x)}{x^4} = \frac{\cos x}{x^2} - 2\,\frac{\sin x}{x^3}.$$
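The result of Example 3.17 can be sanity-checked against a numerical slope. The Python sketch below uses a symmetric difference quotient, which is a standard numerical approximation and not the method of the notes; the helper name `central_difference` and the test point x = 1.3 are assumptions made for illustration.

```python
import math


def f(x):
    return math.sin(x) / x ** 2


def f_prime(x):
    # the derivative found with the quotient rule in Example 3.17
    return math.cos(x) / x ** 2 - 2 * math.sin(x) / x ** 3


def central_difference(func, x, h=1e-6):
    # symmetric secant slope, a numerical stand-in for the derivative
    return (func(x + h) - func(x - h)) / (2 * h)


x = 1.3
print(f_prime(x), central_difference(f, x))
```

The two printed numbers agree to several decimal places.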
The proof of the quotient rule is not very different from the product rule
– which gives you the perfect opportunity to practice your skills!
Solution.
1. $\frac{d}{dx}(3x^2 - 4e^x + \sin x + 2) = 6x - 4e^x + \cos x$.
$f'(x) = \frac{7}{6}\sqrt[6]{x} + \frac{1}{2(\sqrt{x})^3}$.
$$\begin{aligned}
f'(x) &= \cos x \cdot \frac{d}{dx}(e^x + 2) + (e^x + 2)\,\frac{d}{dx}\cos x \\
&= \cos x \cdot e^x + (e^x + 2)(-\sin x) \\
&= e^x(\cos x - \sin x) - 2 \sin x.
\end{aligned}$$
The chain rule looks quite natural in Leibniz notation, where it takes the
form of a ‘cancellation’.
Theorem 3.23 (Chain rule (Leibniz form)). Suppose f and g are
differentiable functions. Then setting u = g(x) we have
$$\frac{d}{dx}(f \circ g)(x) = \frac{d}{dx} f(u) = \frac{df}{du} \cdot \frac{du}{dx}.$$
We’ll leave the proof to Appendix B.1. The chain rule tells us that to
differentiate f ◦ g we treat the inner function g as a variable; we write
(f ◦ g)(x) as f(u), where u = g(x). We then want $\frac{d}{dx} f(u)$ but since we
can’t differentiate a function of one variable (here u) with respect to a
different variable (here x), we actually find $\frac{d}{du} f(u)$ and then make up for
the change of variable by multiplying by $\frac{du}{dx}$ which we find by writing u
$$\frac{d}{dx} \sin x^2 = 2x \cos x^2.$$
dx
You could have avoided using the chain rule in the previous example by
expanding (x 2 + 3)7 into a (lengthy) polynomial and then differentiating
that polynomial using the power rule, but it would be much more work.
With a little practice, using the chain rule becomes second nature. Given
an expression like (x 2 + 3)7 to differentiate, you just mentally put your
thumb over the inside function x 2 + 3 and say to yourself ‘ok, when I
differentiate something to the power of 7, I get 7 times that thing to the
power of 6. Then I have to multiply by the derivative of what’s under my
thumb. . .’
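Applying the ‘thumb’ recipe to $(x^2 + 3)^7$ gives $7(x^2 + 3)^6 \cdot 2x$. The Python sketch below compares that formula with a numerical slope; the symmetric difference quotient, the helper name `central_difference` and the test point x = 0.5 are assumptions made for this illustration.

```python
def f(x):
    return (x ** 2 + 3) ** 7


def f_prime_chain_rule(x):
    # outer derivative 7 * (thing)^6, times the derivative 2x of what is under the thumb
    return 7 * (x ** 2 + 3) ** 6 * (2 * x)


def central_difference(func, x, h=1e-6):
    # symmetric secant slope, a numerical stand-in for the derivative
    return (func(x + h) - func(x - h)) / (2 * h)


x = 0.5
exact = f_prime_chain_rule(x)
approx = central_difference(f, x)
print(exact, approx, abs(exact - approx) / abs(exact))
```

The relative difference printed at the end is tiny, which supports the chain rule formula without having to expand the polynomial.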
Here are some more examples of its use.
Example 3.26.
1. Find $\frac{d}{dx} e^{3x}$.
2. Find $\frac{d}{dx} \sin^2 x$.
Solution.
1. Let u(x) = 3x. Then $\frac{d}{dx} e^u = \frac{d}{du} e^u \cdot \frac{du}{dx} = e^u \cdot 3 = 3e^{3x}$.
2. Recall $\sin^2 x$ means $(\sin x)^2$. We’ll interpret this as $u^2$ where u =
sin x and apply the chain rule to get $\frac{d}{dx} u^2 = 2u \frac{du}{dx} = 2 \sin x \cos x$.

Margin note: This is much quicker than the hinted method in Exercise 3.21 (4).
Example 3.27. What is $\frac{d}{dx} \cos x$?
because $\frac{d}{dx}(x + \frac{\pi}{2}) = 1$. This can be simplified to give
$$\cos(x + \tfrac{\pi}{2}) = \sin(x + \tfrac{\pi}{2} + \tfrac{\pi}{2}) = \sin(x + \pi) = -\sin x.$$
3.4. The Chain Rule 73
$$\frac{d}{dx} \cos x = -\sin x.$$
$$\frac{d}{dx} e^{-x^2} = \frac{d}{dx} e^u = \frac{d}{du} e^u \cdot \frac{du}{dx} = e^u(-2x) = -2xe^{-x^2}.$$
The chain rule can be applied more than once to deal with longer chains
of functions.
Solution.
$$\frac{d}{dx} f(x) = \frac{d}{dx}\big(x e^{\tan x}\big)$$
First, we use the product rule:
$$= e^{\tan x}\,\frac{d}{dx} x + x\,\frac{d}{dx} e^{\tan x} = e^{\tan x} + x\,\frac{d}{dx} e^{\tan x}.$$
Second the chain rule on $e^{\tan x}$:
$$= e^{\tan x} + x e^{\tan x} \cdot \frac{d}{dx} \tan x$$
Finally, we apply the quotient rule on $\tan x = \frac{\sin x}{\cos x}$, which you will
have done in Exercise 3.19, to give
$$\frac{d}{dx} f(x) = e^{\tan x}\left(1 + \frac{x}{\cos^2 x}\right).$$
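A multi-step derivative like this one is easy to get slightly wrong, so a numerical cross-check is worthwhile. The Python sketch below compares the final formula with a symmetric difference quotient; the helper name `central_difference` and the test point x = 0.4 are assumptions made for this illustration.

```python
import math


def f(x):
    return x * math.exp(math.tan(x))


def f_prime(x):
    # the result of the product, chain and quotient rules above
    return math.exp(math.tan(x)) * (1 + x / math.cos(x) ** 2)


def central_difference(func, x, h=1e-6):
    # symmetric secant slope, a numerical stand-in for the derivative
    return (func(x + h) - func(x - h)) / (2 * h)


x = 0.4
print(f_prime(x), central_difference(f, x))
```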
Here’s a challenge. You have the tools necessary to answer this, but do
you have enough paper and perseverance?
x = exp(log x),
to obtain
$$\begin{aligned}
1 &= \frac{d}{dx}\big(\exp(\underbrace{\log x}_{u})\big) = \frac{d}{dx} \exp(u) \\
&= \frac{d}{du} \exp u \cdot \frac{d}{dx} u && \text{chain rule} \\
&= \exp u \cdot \frac{d}{dx} \log x \\
&= \exp(\log x) \cdot \frac{d}{dx} \log x \\
&= x\,\frac{d}{dx} \log x.
\end{aligned}$$

Margin note: In this argument, we are implicitly assuming that the
derivative of log x exists. Strictly speaking, we should not do this,
because it has not been shown to exist. However, the argument is sufficient
for our purposes! See Appendix B.2 for a more mathematically robust
treatment of exp and log.

$$\frac{d}{dx} \log x = \frac{1}{x}.$$
Chapter 4
Notice that if the graph of the function was readily available then the
fact that the function is decreasing at 1 would be easy to see. Indeed,
we have the graph of this function in Figure 4.1.
78 More about the Derivative
Exercise 4.3. Find the critical points of the function given in the
previous example, $f(x) = 2x^3 - 7x^2 + 3x + 4$.
Critical points are values of x at which f(x) has a horizontal (flat) tangent
line and they are related to local maxima and minima by the following
result.

Margin note: Local maximum of f at a means f(a) is greater than f(x) for all
x near a, but the function might have greater values ‘far’ from a. The same
applies to local minimum. In Fig. 4.1, we see a local max near 0, but f has
greater values near 4.

Theorem 4.4. If f has a local maximum or local minimum at the point
c, and f is differentiable at c, then c is a critical point of f, i.e. $f'(c) = 0$.
If you want to know why this is true, see the proof in Appendix B.1.
Beware that the converse to this theorem is not true!
imum value of this function on this interval. The maximum value is
not 3: we cannot take x = 1 since 1 is not in the open interval (0, 1).
We can get the value 2.99 from the function (at x = 0.99) but this is
not the greatest value, since we can also get 2.999. We see that no
number smaller than 3 is the greatest value, and the values do not
reach 3, so we conclude the function has no greatest value.

Margin note: In Example 4.6, 3 is what is called an upper bound for f(x),
meaning f(x) ≤ 3 for all values x in the domain (0, 1). Of course, π and 100
are also upper bounds, but 3 is special because it is the least upper bound
of f(x). However, it is not a value of f(x) and so cannot be considered
the greatest value!

If we have a closed interval (endpoints included), do you think we will
always have a maximum and minimum?
Solution. Again the function has no maximum value. Unlike the pre-
vious example, this function does not even have an upper bound,
meaning it takes arbitrarily large values: $f(\frac{1}{2}) = 2$, f(0.98) = 50,
f(0.999) = 1000, . . .
To distinguish from local maxima and minima, we use the term global
maximum/minimum for the maximum/minimum value of a function over a
specified interval or domain. Corollary 4.11 provides us with the possible
locations of global max/min values. To pin them down, we have to locate
the critical points and do a few calculations.
Example 4.12. Find the maximum and minimum values of the function
$g(x) = 2x^3 - 9x^2 + 12x - 4$ on the interval [1, 4].
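The recipe of locating the critical points and comparing a few values can be sketched in Python as follows. The critical points 1 and 2 come from factorising $g'(x) = 6x^2 - 18x + 12 = 6(x - 1)(x - 2)$, a calculation done here by hand; the variable names are assumptions made for this illustration.

```python
def g(x):
    return 2 * x ** 3 - 9 * x ** 2 + 12 * x - 4


# g'(x) = 6x^2 - 18x + 12 = 6(x - 1)(x - 2), so g'(x) = 0 at x = 1 and x = 2
critical_points = [1, 2]
endpoints = [1, 4]

# candidates for the global max/min: endpoints plus critical points in [1, 4]
candidates = sorted(set(critical_points + endpoints))
values = {x: g(x) for x in candidates}

maximum_at = max(values, key=values.get)
minimum_at = min(values, key=values.get)

print(values)                          # {1: 1, 2: 0, 4: 28}
print(maximum_at, values[maximum_at])  # 4 28
print(minimum_at, values[minimum_at])  # 2 0
```

So on [1, 4] the maximum value is 28, attained at the endpoint 4, and the minimum value is 0, attained at the critical point 2.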
Finding the maxima and minima of functions is one of the most important
applications of differential calculus.
Theorem 4.13. Suppose c is a critical point of the twice differentiable
function f. If $f''(c) > 0$ then c is a local minimum of the function, and
if $f''(c) < 0$ then c is a local maximum of the function.

Margin note: Warning!! This classification theorem does not help to classify
critical points at which the second derivative is zero. These could still
be local maxima, local minima, or neither.

Example 4.14. For the function $g(x) = 2x^3 - 9x^2 + 12x - 4$ given in
Example 4.12, we have $g''(x) = 12x - 18$. At the critical point 1, we
have $g''(1) = -6 < 0$, so this critical point is a local maximum. At the
critical point 2, we have $g''(2) = 6 > 0$ so this critical point is a local
minimum.
Of course, this agrees with what we see in the graph.
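The classification in Theorem 4.13 is mechanical enough to write as a small Python sketch. The function name `classify` and the label ‘inconclusive’ (for the case where the second derivative is zero, on which the theorem is silent) are assumptions made for this illustration.

```python
def g_second(x):
    # second derivative of g(x) = 2x^3 - 9x^2 + 12x - 4
    return 12 * x - 18


def classify(second_derivative_value):
    # second derivative test at a critical point
    if second_derivative_value > 0:
        return "local minimum"
    if second_derivative_value < 0:
        return "local maximum"
    return "inconclusive"


for c in [1, 2]:
    print(c, g_second(c), classify(g_second(c)))
```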
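Example 4.15. Find and classify any critical points of $\varphi(x) = e^{-x^2}$.
$$\begin{aligned}
\varphi''(x) &= \frac{d}{dx}\left(-2xe^{-x^2}\right) \\
&= -2\left( x\,\frac{d}{dx} e^{-x^2} + e^{-x^2}\,\frac{d}{dx} x \right) \\
&= -2\left(-2x^2 e^{-x^2} + 1 \cdot e^{-x^2}\right) \\
&= -2(1 - 2x^2)e^{-x^2}.
\end{aligned}$$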
Proof. The Max/Min Theorem (4.8) implies that f has a (global) maximum
at some point c ∈ [a, b] and a minimum at some point d ∈ [a, b]. Now
from Theorem 4.10 each of c and d is either an endpoint of [a, b] or it is
an element of (a, b) at which $f' = 0$. If either of c and d is a point at
which $f' = 0$ then the proof is finished.
So suppose that both c and d are endpoints. This means that f attains
its maximum at a or b. In fact, our hypothesis f(a) = f(b) implies that
it achieves its maximum at both a and b. Also f achieves its minimum
at a and b. That is, the maximum and minimum for f on [a, b] are the
same. Of course, this means that f is a constant function and so $f' = 0$
everywhere in (a, b). This finishes the proof.
We actually calculated $f'$ in Example 3.16 and found $f'(x) = (1 - x) \cos x - \sin x$.
So now we know $(1 - c) \cos c - \sin c = f'(c) = 0$
and hence (1 − c) cos c = sin c. Divide both sides by cos c to get
tan c = 1 − c.

Margin note: In our solution, we divide by cos c. How do we know that cos c
is not zero?

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
The Mean Value Theorem states that there is a point c between a and
b such that the tangent line at c is parallel to the secant line joining
(a, f(a)) to (b, f(b)). See Figure 4.5. The Mean Value Theorem is
essentially just a ‘rotation’ of Rolle’s theorem. Indeed when f(b) = f(a) it
says exactly the same thing as Rolle’s theorem. We’ll leave the proof to
Appendix B.1.

Margin note: The Mean Value Theorem provides a legal justification for
average speed cameras! If timed photos show a car entering and exiting the
4km Dublin Port Tunnel in less than a 3-minute interval, and hence with an
average speed through the tunnel of more than 80km/h, then the MVT
guarantees that at some point inside the tunnel the car was travelling at
more than 80km/h and can be issued a speeding ticket (even though the car
was never observed breaking the speed limit).

Figure 4.5: The Mean Value Theorem
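To see the Mean Value Theorem at work on a concrete function, the Python sketch below searches for a point c where the tangent slope equals the secant slope. The example function $f(x) = x^3$ on [0, 2] is a hypothetical choice that does not appear in the notes, and the bisection search used by the hypothetical helper `mean_value_point` is just one simple way of locating c.

```python
import math


def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


def mean_value_point(f, f_prime, a, b, tolerance=1e-12):
    """Find c in (a, b) with f'(c) equal to the secant slope, by bisection."""
    slope = (f(b) - f(a)) / (b - a)

    def gap(x):
        return f_prime(x) - slope

    lo, hi = a, b
    if gap(lo) * gap(hi) > 0:
        raise ValueError("bisection needs f'(x) - slope to change sign on [a, b]")
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


c = mean_value_point(f, f_prime, 0.0, 2.0)
print(c, f_prime(c), (f(2.0) - f(0.0)) / 2.0)
```

Here the secant slope is 4, and solving $3c^2 = 4$ by hand gives $c = 2/\sqrt{3}$, which is what the search converges to.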
Proof. Let f satisfy the hypothesis, that is, $f'(x) = 0$ for all x ∈ I. Take
some point a ∈ I, which we fix for the rest of the proof. We show that f
© 2023 Michael Mackey and Richard Smith. These notes are for personal use only and should not be circulated.
Solution. Pick any number y > 0 and consider two functions on the
interval (0, ∞), namely f(x) = log(xy) and g(x) = log x. Here, y is
a fixed number (a constant), while x is the variable in our functions.
Now find $\frac{d}{dx} f$ and $\frac{d}{dx} g$.
[Margin note: This identity can also be established using $e^x e^y = e^{x+y}$ and the inverse relationship between $e^x$ and log x.]
By the chain rule (take u(x) = xy), $\frac{d}{dx} f(x) = \frac{d}{dx} \log(xy) = \frac{1}{xy} \frac{d}{dx}(xy) = \frac{1}{xy}\, y = \frac{1}{x}$. We also know that $\frac{d}{dx} g(x) = \frac{d}{dx} \log x = \frac{1}{x}$.
So f and g have the same derivative on the interval (0, ∞). By Corol-
lary 4.20, the two functions differ by a constant so there is a number
k such that f(x) = g(x) + k, or
log(xy) = log x + k,
for all x in the interval. What can the number k be? To decide, set
x = 1 which tells us that
log y = log 1 + k.
Since log 1 = 0, this gives k = log y, and therefore
log(xy) = log x + log y
for all x ∈ (0, ∞). This works for the number y we picked at the
start, but we could have picked any y ∈ (0, ∞) so in fact, for all
x, y ∈ (0, ∞),
log(xy) = log x + log y.
See if you can apply this method yourself in the following exercise.
$\log x^a = a \log x$,
$y = f'(a)x + c$,
$f(a) = af'(a) + c$,
If you prefer, you can consider this tangent line as the graph of the
straight line function
\[ L_a^f(x) = f(a) + f'(a)(x - a). \tag{4.1} \]
The function $L_a^f$ given in Equation 4.1 is the linearisation or linear
approximation of f at a. To emphasize, $L_a^f$ is the function whose graph is
the tangent line to f at a. We can use it to approximate f(x) when x is
close to a. The idea is that, because the tangent line is a good local
approximation to the graph of f, $L_a^f(x)$ is a good approximation to f(x).
Example 4.23. Find the linearisation of $f(x) = \sqrt{x}$ at a = 25, and
use it to approximate $\sqrt{26.3}$ and $\sqrt{24.8}$ without using a calculator.
Using a calculator, $\sqrt{26.3} \approx 5.12835\ldots$ and $\sqrt{24.8} \approx 4.979959\ldots$, so our
approximations are pretty close to the actual values! Indeed, Figure 4.6
(on the left) shows $f(x) = \sqrt{x}$ and the tangent line $L_{25}^f(x) = 2.5 + \frac{x}{10}$
as a dotted line. The two graphs are virtually indistinguishable in the
close-up version on the right.
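The comparison above can be reproduced numerically. The following is a minimal sketch, not part of the original notes: the helper name `linearise` is hypothetical, and it simply evaluates f(a) + f′(a)(x − a) for f(x) = √x at a = 25.

```python
import math


def linearise(f, df, a):
    """Return the function L(x) = f(a) + f'(a) * (x - a).

    `f` is the function, `df` its derivative and `a` the base point.
    (Hypothetical helper, introduced only for this illustration.)
    """
    fa, dfa = f(a), df(a)
    return lambda x: fa + dfa * (x - a)


# f(x) = sqrt(x) has derivative 1 / (2 sqrt(x)); linearise at a = 25.
L25 = linearise(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 25.0)

for x in (26.3, 24.8):
    approx, exact = L25(x), math.sqrt(x)
    print(f"x = {x}: linearisation {approx:.5f}, sqrt {exact:.5f}, "
          f"difference {approx - exact:.5f}")
```

The further x moves from 25, the larger the difference becomes, which is why the approximation is only recommended for x close to a.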
Solution. Since $\frac{d}{dx} \sin x = \cos x$ we find $L_0^{\sin}(x) = \sin(0) + (\cos 0)(x - 0)$.
But sin 0 = 0 and cos 0 = 1 so $L_0^{\sin}(x) = x$.
[Margin note: The fact that the linearisation of sin x at 0 is just x means that sin x behaves very much like x near the origin.]
The requirement to approximate $\sqrt{26.3}$ without using a calculator may
seem a little contrived, and indeed it is. Nevertheless, linearising a
function is used a lot in practice. The equations and functions that gov-
ern real-world problems can be incredibly complex and impossible to
solve exactly. Linearising is the approach used to make such problems
tractable, and knowing how to work with the linear equations that result
is a big motivation for studying linear algebra.
For example, the motion of a pendulum, which is along an arc of a circle,
is governed by an equation involving sin x. Solving the equation is not
possible, but if the pendulum only oscillates on a short arc, sin x can be
approximated very closely by x (Example 4.24) and with this the equa-
tions can be solved. This was first done by Galileo, and his work was the
\[ \frac{1}{f(x)} \frac{d}{dx} f(x) = \frac{2}{x}, \]
\[ \frac{d}{dx} f(x) = 2x. \]
OK, we already knew that $\frac{d}{dx} x^2 = 2x$, but the method can be more valuable
in other situations. Consider again Exercise 3.31.
Solution. First we take the log of f(x) and use rules of logarithms
(Example 4.21 and Exercise 4.22) to see that log f(x) equals
Implicit differentiation
Implicit differentiation is a generalisation of this idea where we want
to know the derivative of a function f(x), or any other quantity y that
varies with x, but we don’t have an explicit formula for the quantity. For
example, if you are told that $\sqrt{y + x} = e^x$ and asked to find $\frac{dy}{dx}$ then,
normally, you would try to write y explicitly as a function of x:
\[ y + x = (e^x)^2 = e^{2x}, \qquad y = e^{2x} - x, \]
and then differentiate:
\[ \frac{dy}{dx} = 2e^{2x} - 1. \]
Sometimes however, you cannot write y explicitly as a function of x, as
in the example
\[ y^5 - y^2 + y = x^2 + 1. \]
where y as a function of the variable x would involve solving a quintic
(as opposed to quadratic) polynomial which is generally impossible.
This doesn’t necessarily stop you from finding $\frac{dy}{dx}$ because we can still
differentiate both sides with respect to x:
\[ 5y^4 \frac{dy}{dx} - 2y \frac{dy}{dx} + \frac{dy}{dx} = 2x \]
\[ (5y^4 - 2y + 1) \frac{dy}{dx} = 2x \]
\[ \frac{dy}{dx} = \frac{2x}{5y^4 - 2y + 1}. \]
[Margin note: For more information on this remark, see the marginal note just before MATH10390 Proposition 5.9.]
We’ve found the derivative, albeit the formula is also implicit (involves x
and y, not just a formula in x).
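As a sanity check on the implicit formula, one can solve the equation for y numerically and compare a finite-difference slope with 2x/(5y⁴ − 2y + 1). This is only a sketch under stated assumptions: the function names are hypothetical, the equation is solved by interval halving rather than by any method from these notes, and the search interval [−10, 10] is an arbitrary choice that is adequate for x near 1.

```python
def solve_y(x, lo=-10.0, hi=10.0, tol=1e-13):
    """Solve y^5 - y^2 + y = x^2 + 1 for y by repeated interval halving.

    The left-hand side is increasing in y (its derivative 5y^4 - 2y + 1 is
    always positive), so each x gives exactly one y. The bracket [lo, hi]
    is an assumption that holds for moderate values of x.
    """
    target = x * x + 1

    def g(y):
        return y**5 - y**2 + y - target

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def implicit_slope(x, y):
    """dy/dx obtained by implicit differentiation."""
    return 2 * x / (5 * y**4 - 2 * y + 1)


x0, h = 1.0, 1e-5
y0 = solve_y(x0)
finite_difference = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
print("y at x = 1:", y0)
print("implicit formula:", implicit_slope(x0, y0))
print("finite difference:", finite_difference)
```

The two slopes should agree to several decimal places, even though we never wrote y as an explicit function of x.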
2. What is the slope of the tangent line to this circle at the point
(3, −4)?
Solution.
3. Since our line must be of the form $y = \frac{3}{4}x + c$ and passes through
(3, −4) we have $-4 = \frac{3}{4} \cdot 3 + c$ so $c = -\frac{25}{4}$ and the equation of
the line is $y = \frac{3}{4}x - \frac{25}{4}$.
4. At the point where the line crosses the horizontal axis, we have
y = 0 so $\frac{3}{4}x - \frac{25}{4} = 0$ and $x = \frac{25}{3} = 8\frac{1}{3}$.
5.1 Graphs
Whereas the graph of a function f : R → R involved marking a point
at height y = f(x) above each point x in the domain, R, viewed as a
horizontal line, for a function f : R² → R, our graph will involve marking
a point at height z = f(x, y) above each point in the domain R², which
you can view as a horizontal plane. Instead of being a line or curve
in two dimensions, our graph is more like a surface or terrain in three
dimensions. For example, $f(x, y) = \sin(x^2 y) \exp y$ is plotted in Figure 5.1.
It is worth remembering that you are already very familiar with functions
of two variables, namely addition and multiplication! The functions of
addition f : R2 → R, f(x, y) = x + y, and multiplication g : R2 → R,
g(x, y) = xy, are graphed in Figure 5.2.
Notice how the graph of the addition function is ‘flat’; it is a plane, which
is the two-variable analogue of a line. There is a reason for this: the
function f(x, y) = x + y is a linear function. It can be represented by the
1 × 2 matrix ( 1 1 ), in the sense that we can take advantage of
matrix multiplication and write
\[ f(x, y) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \]
[Margin note: See MATH10390 Sections 1.2 and 2.4.]
In linear algebra, the m × n matrices you study can be regarded as
functions from Rⁿ into Rᵐ. However, in the grand scheme of things, they
can be considered quite simple functions because they are linear. Here,
we will deal with non-linear functions, although given time constraints,
we’ll only deal with those which are real-valued, rather than the general
case which map into Rᵐ.
[Margin note: E.g. in MATH10390 Section 1.2, we saw that the 2 × 2 rotation matrix $R_\theta$ produces a rotation about the origin through the angle θ, which is a (linear) function from R² to R².]
respectively.
An element of R² can be written either as (x, y) or $\begin{pmatrix} x \\ y \end{pmatrix}$. The difference
in notation can be used to distinguish between the different ways of
looking at an element of R²:
We will wander freely between these two viewpoints and take advantage
of the different notation when suitable. Of course, the same goes for
elements of R³, and Rⁿ in general.
[Margin note: See MATH10390 Section 2.4.]
Recall that vectors can be added together and multiplied by scalars
‘entrywise’. Given $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in R² and λ ∈ R,
\[ x + y = (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2) \]
and
\[ \lambda x = \lambda(x_1, x_2) = (\lambda x_1, \lambda x_2). \]
[Margin note: See MATH10390 Section 2.2.]
The extension to Rⁿ is clear enough. The length or Euclidean norm of
the vector $x = (x_1, x_2)$ in R² is given by $\|x\| = \sqrt{x_1^2 + x_2^2}$. This formula
comes from the Theorem of Pythagoras.
[Margin note: Just about everything we say about functions of 2 variables in this chapter can be extended to n variables. We’ll stick to 2 variables just to make the notation easier.]
How does this help us? First, suppose we have a continuous function
g : R → R of one variable, e.g. g(x) = sin x, and define f : R2 → R
by f(x, y) = g(x). Then f is effectively a function of one variable, be-
cause it completely ignores the y-coordinate. The function is continuous
everywhere because, given (a, b) ∈ R2 ,
close to g(a) whenever x is close to (or equals) a. If (x, y) gets closer and
closer to (a, b) then, regardless of the approach (x, y) takes, x must get
closer and closer to (and may equal) a, and hence g(x) gets closer and
closer to g(a). Therefore lim(x,y)→(a,b) g(x) must equal limx→a g(x), namely
g(a). In a similar vein, the function h(x, y) = g(y), (x, y) ∈ R2 , again
effectively a function of one variable, will be continuous.
Because many functions of two (and more) variables that we will look
at are actually sums, products and compositions of functions that are
effectively functions of a single variable, the argument above, together
with Fact 5.1, allows us to decide on continuity quite easily.
\[ \lim_{(x,y) = (x,0) \to (0,0)} h(x, y) = \lim_{x \to 0} \frac{0}{x^4} = 0. \]
\[ \lim_{(x,y) = (x,kx) \to (0,0)} h(x, y) = \lim_{x \to 0} \frac{2kx^3}{x^4 + k^2x^2} = \lim_{x \to 0} \frac{2kx}{x^2 + k^2} = 0. \]
Since we have different limits along different paths, the limit does
not exist. The graph is shown in Figure 5.3.
\[ f(x, 7) = 21x^2. \]
This now is really a function just of one variable x. Let’s call this function
$f_7$, so $f_7(x) = 21x^2$. Of course, differentiating this function is no problem:
\[ f_7'(x) = 42x. \]
In particular, ${}_3f'(7) = 27$. This time we have fixed x and taken a derivative
with respect to y. This is the partial derivative of f with respect to y and
is written $\frac{\partial f}{\partial y}$. We have calculated $\frac{\partial f}{\partial y}(3, 7) = 27$.
For $f(x, y) = 3x^2 y$, on considering the y value fixed and
differentiating with respect to x, we have
\[ \frac{\partial f}{\partial x}(x, y) = 6xy, \]
and similarly, fixing x and differentiating with respect to y,
\[ \frac{\partial f}{\partial y}(x, y) = 3x^2. \]
We can define partial derivatives formally as follows.
\[ \frac{\partial f}{\partial x}(a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h}, \]
whenever this limit exists, while the partial derivative of f with
respect to y at (a, b) is given by
\[ \frac{\partial f}{\partial y}(a, b) = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h}, \]
Definition 5.4. The pair of functions $\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$ is called the gradient of
f, denoted by ∇f. It is a function from R² to R², or subsets thereof.
[Margin note: ∇f evaluated at a point yields a vector in R².]
In general, to find $\frac{\partial f}{\partial x}$, just differentiate with respect to x while treating y
as a constant. To find $\frac{\partial f}{\partial y}$, differentiate with respect to y while treating x
as a constant.
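The limit definitions can be explored numerically by choosing a small h. The sketch below is an illustration only: the function names are hypothetical, and it uses a symmetric difference quotient (f(a + h, b) − f(a − h, b))/(2h), which is a common numerical variant rather than the one-sided quotient in the definition.

```python
def partial_x(f, a, b, h=1e-6):
    """Approximate the partial derivative of f with respect to x at (a, b)."""
    return (f(a + h, b) - f(a - h, b)) / (2 * h)


def partial_y(f, a, b, h=1e-6):
    """Approximate the partial derivative of f with respect to y at (a, b)."""
    return (f(a, b + h) - f(a, b - h)) / (2 * h)


def f(x, y):
    return 3 * x**2 * y


# Compare with the exact formulas 6xy and 3x^2 at the point (3, 7).
print(partial_x(f, 3, 7), 6 * 3 * 7)
print(partial_y(f, 3, 7), 3 * 3**2)
```

In each call only one coordinate is varied, which is exactly the idea of treating the other variable as a constant.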
Example 5.5. Find $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and ∇f(2, 3) in the following cases.
1. $f(x, y) = x^2 y$
Solution.
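1. $\frac{\partial f}{\partial x} = 2xy$ and $\frac{\partial f}{\partial y} = x^2$. In particular, $\frac{\partial f}{\partial x}(2, 3) = 12$ and $\frac{\partial f}{\partial y}(2, 3) = 4$,
so ∇f(2, 3) = (12, 4).
2. $\frac{\partial f}{\partial x} = ye^{xy} \sin x + e^{xy} \cos x$, $\frac{\partial f}{\partial y} = xe^{xy} \sin x$, $\frac{\partial f}{\partial x}(2, 3) = e^6(3 \sin 2 + \cos 2)$
and $\frac{\partial f}{\partial y}(2, 3) = 2e^6 \sin 2$, so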
Exercise 5.6. Find the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, where
1. $f(x, y) = x^2 y^3$
2. $f(x, y) = (x^2 + y^2) \sin x$
3. $f(x, y) = \frac{1}{x^2 + y^2}$
4. $f(x, y) = e^{x^2 y^3}$.
\[ \frac{\partial f}{\partial x} = 0 \quad \text{and} \quad \frac{\partial f}{\partial y} = 0. \]
Solution. We have
\[ \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (4x - 4y + 4, \; 8y - 4x). \]
For critical points we solve the simultaneous linear equations
\[ 4x - 4y = -4 \]
\[ -4x + 8y = 0, \]
to find the solution (and only critical point) (x, y) = (−2, −1).
[Margin note: In Example 5.11, if you rewrote the function as $f(x, y) = (x - 2y)^2 + (x + 2)^2 - 4$ you would immediately see that the function has a local minimum when each of the squares is zero, i.e. when x = −2 and so y = −1. However, this ad hoc approach will only work under limited circumstances.]
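Example 5.12. Find the second order partial derivatives of the
function f : R² → R given by $f(x, y) = x^3 \sin y$.
\[ \frac{\partial f}{\partial x} = 3x^2 \sin y \quad \text{and} \quad \frac{\partial f}{\partial y} = x^3 \cos y. \]
Looking at $\frac{\partial f}{\partial x}$ in particular, we can find its own partial derivatives:
\[ \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = 6x \sin y \quad \text{and} \quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = 3x^2 \cos y. \]
We denote these by $\frac{\partial^2 f}{\partial x^2}$ and $\frac{\partial^2 f}{\partial y \partial x}$, respectively.
Similarly, by looking at $\frac{\partial f}{\partial y}$, we have
\[ \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = 3x^2 \cos y \quad \text{and} \quad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = -x^3 \sin y. \]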
A point to notice about the above example of second order partial deriva-
tives is that for the function above we had
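\[ \frac{\partial^2 f}{\partial x \partial y} = 3x^2 \cos y = \frac{\partial^2 f}{\partial y \partial x}. \]
In other words, the order of the differentiation did not matter. This is
not a fluke – for most functions f which have both second order mixed
partial derivatives, $\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$. Specifically,
Theorem 5.13. Suppose $\frac{\partial^2 f}{\partial x \partial y}$ and $\frac{\partial^2 f}{\partial y \partial x}$ exist and are continuous. Then
$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$.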
Theorem 5.13 tells us that the Hessian matrix is (in many cases) symmetric.
[Margin note: See MATH10390 Definition 1.21.]
[Margin note: The graph of a saddle point can look a lot like a horse’s saddle, hence the name.]
The classification of critical points in R² is almost identical in phrasing
to Theorem 4.13 except that instead of the usual second derivative at
the critical point $f''(c)$ we use the Hessian matrix at the critical point
Hf (a, b) and instead of positivity/negativity of the number $f''(c)$ we ask
for positive/negative definiteness of the matrix Hf (a, b).
[Margin note: See MATH10390 Definition 6.10. A real symmetric matrix is positive definite if all its eigenvalues are positive numbers and negative definite if the eigenvalues are all negative. Eigenvalues and eigenvectors of matrices will be covered in MATH10390 Chapter 5.]
Theorem 5.15. Suppose f has a critical point at (a, b). To classify the
critical point (a, b), consider the Hessian matrix Hf (a, b). If Hf (a, b) is
positive definite then (a, b) is a local minimum. If Hf (a, b) is negative
definite then (a, b) is a local maximum.
If one of the eigenvalues of Hf (a, b) is positive and the other negative
then (a, b) is a saddle point.
Notice that this classification theorem fails to classify the critical point
if either of the eigenvalues of Hf (a, b) is zero.
It turns out that there is a simplification of Theorem 5.15 in the case of
functions of two variables, which doesn’t mention matrices or eigenvalues.
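1. If $\frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 > 0$ at (a, b), then the critical point is:
a) a local maximum if $\frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2} < 0$ at (a, b), and
b) a local minimum if $\frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2} > 0$ at (a, b).
2. If $\frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 < 0$ at (a, b) then the critical point is a saddle
point.
If $\frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 = 0$ at (a, b), then we cannot use Theorem 5.16 to
classify the critical point.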
The reader can find in Appendix B a proof of the equivalence of Theorems
5.15 and 5.16 in the case of two variables.
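The two-variable test translates directly into a few comparisons. The following is a sketch only; the function name and the returned strings are hypothetical choices made for this illustration. The first call uses the second order partials read off from the gradient (4x − 4y + 4, 8y − 4x) of Example 5.11, and the second uses the multiplication function g(x, y) = xy, whose only critical point is the origin.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Classify a critical point from the second order partials there.

    fxx, fyy and fxy are the values of the second order partial derivatives
    at the critical point. (Hypothetical helper for illustration.)
    """
    d = fxx * fyy - fxy**2
    if d > 0:
        # When d > 0, fxx and fyy are non-zero and share the same sign.
        return "local maximum" if fxx < 0 else "local minimum"
    if d < 0:
        return "saddle point"
    return "inconclusive"


# Example 5.11: fxx = 4, fyy = 8, fxy = -4 at every point.
print(classify_critical_point(4, 8, -4))
# g(x, y) = xy: gxx = 0, gyy = 0, gxy = 1.
print(classify_critical_point(0, 0, 1))
```

The "inconclusive" branch corresponds to the case where the test cannot be used.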
We find that the eigenvalues of this matrix are $6 + 2\sqrt{5} \approx 10.47$ and
$6 - 2\sqrt{5} \approx 1.53$. Both are positive! This means Hf (−2, −1) is positive
definite and (−2, −1) is a local minimum by Theorem 5.15.
[Margin note: It would be a good idea to review this part of the example once you have covered MATH10390 Chapter 5.]
Now we want to find the minimum value of g(m, c) over all possible
values of (m, c) ∈ R². Since our region is the entire Cartesian plane,
there is no boundary as such, and we just have to look for critical
points by solving ∇g = (0, 0).
Now $\frac{\partial g}{\partial m} = -206 + 28m + 12c$ while $\frac{\partial g}{\partial c} = -98 + 12m + 6c$ so the
equations we have to solve are the linear system
\[ 28m + 12c = 206 \]
\[ 12m + 6c = 98. \]
Exercise 5.19. Use Theorem 5.16 to show that the critical point
$(m, c) = (\frac{5}{2}, \frac{34}{3})$ is indeed a local minimum.
Our example just has a data set of three points, but the same method
applies to find the line of best fit through any number of points in R2 .
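The critical point quoted in Exercise 5.19 can be checked by solving the two equations ∂g/∂m = 0 and ∂g/∂c = 0 exactly. The sketch below uses Cramer's rule with exact fractions; the helper name `solve_2x2` is hypothetical, and only the coefficients stated above are used (the underlying data points are not needed).

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*m + a12*c = b1 and a21*m + a22*c = b2 by Cramer's rule.

    Integer inputs are assumed so that the answer is an exact fraction.
    """
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    m = Fraction(b1 * a22 - a12 * b2, det)
    c = Fraction(a11 * b2 - b1 * a21, det)
    return m, c


# -206 + 28m + 12c = 0 and -98 + 12m + 6c = 0, rearranged.
m, c = solve_2x2(28, 12, 206, 12, 6, 98)
print(m, c)  # 5/2 34/3
```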
discrepancies $\sum_x d_x$ because this would encourage large negative
discrepancies. Minimising $\left|\sum_x d_x\right|$ would allow large positive and negative
discrepancies to cancel each other out. Minimising $\sum_x |d_x|$ is logically
sensible, but the problem is that the absolute value function is not
differentiable at 0 (see Example 3.8) which complicates the task of finding
minimum values. Therefore we choose to minimise $\sum_x d_x^2$.
Integration
Remarks 6.3. The symbol $\int$ is the integral sign and evolved from
an elongated S. For reasons that will become apparent later, the
letter S was used to represent ‘summation’. The function f to be
integrated in $\int f(x)\,dx$ is called the integrand and x is the variable
of integration.
In our first example, we have $\int 2x\,dx = \{x^2 + c : c \in \mathbb{R}\}$. However, this
is usually written more lazily as $\int 2x\,dx = x^2 + c$, where c is understood
to be an arbitrary constant, referred to as the constant of integration.
[Margin note: Recall from Definition 6.1 that the indefinite integral is officially a set of functions.]
Solution.
2. Since $\frac{d}{dx} \sin x = \cos x$, we have
\[ \int \cos x\,dx = \sin x + c. \]
f(x)                      $\int f(x)\,dx$
$x^r$ ($r \neq -1$)       $\frac{1}{r+1} x^{r+1}$
$x^{-1}$                  $\log |x|$
$\cos x$                  $\sin x$
$\sin x$                  $-\cos x$
$1/\cos^2 x$              $\tan x$
$e^x$                     $e^x$
(We have omitted the constant of integration in each formula.)
[Margin note: log |x|? Both log x and log(−x) have a derivative of $\frac{1}{x}$. However, for x ∈ R, only one of log x and log(−x) will be defined since the domain of log is the set of positive numbers. So, on the positive half line the integral of $\frac{1}{x}$ is log x while on the negative half line it is log(−x). These can be combined by saying $\int \frac{1}{x}\,dx = \log |x|$.]
These two linearity rules are easily derived from the facts that $(f + g)'(x) = f'(x) + g'(x)$
and $(kf)'(x) = kf'(x)$. In practical terms, overlooking the
constants of integration above means that we can apply Fact 6.6 without
paying attention to them, provided we remember to insert a final constant
of integration after using it (see examples of this below).
Unfortunately, and this is the difficulty with integration, there are no
simple analogues of the product, quotient or chain rules of differentiation
for integration. We will spend a good deal of time developing methods
to overcome the difficulty this entails, but there is no getting around the
fact that integration is harder than differentiation. We start though with
some simple examples which use the two rules which we know.
[Margin note: Differentiation is a more algorithmic process: follow the rules correctly and you will find the derivative. Integration is more of an art form!]
Example 6.7.
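1. $\int x^4 - 2x^3 + 3\,dx = \int x^4\,dx - 2\int x^3\,dx + 3\int 1\,dx$
4. $\int \frac{4 + \sqrt{t}}{t^3}\,dt = \int \left( \frac{4}{t^3} + \frac{t^{1/2}}{t^3} \right) dt = \int (4t^{-3} + t^{-5/2})\,dt$
$= 4 \cdot \left(-\tfrac{1}{2}\right) t^{-2} - \tfrac{2}{3} t^{-3/2} + c = -2t^{-2} - \tfrac{2}{3} t^{-3/2} + c.$
5. $\int u^{-2}(1 - u)\,du = \int (u^{-2} - u^{-1})\,du$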
$f(x_k)$. The width of the rectangle is denoted by $\Delta_k$. The area of the kth
rectangle is the product $f(x_k)\Delta_k$, and an approximation to the area we
want is obtained by adding up the areas of these rectangles. We see
that the area is approximated by
\[ \sum_{k=1}^{n} f(x_k)\Delta_k = f(x_1)\Delta_1 + f(x_2)\Delta_2 + \cdots + f(x_n)\Delta_n. \]
The value 20.9375 calculated in this example is actually the total area
of the six rectangles involved in our Riemann sum (shaded in grey in
Figure 6.1, on the left). It is an approximation of the desired area. If we
repeat the exercise with a finer partition we get a better approximation.
You can get a feel for this by considering Figure 6.1, where 6 and 12
strips, respectively, are used to approximate the area.
[Margin note: A finer partition just means that the maximum width of any rectangle is smaller.]
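The effect of a finer partition is easy to see by computing the sums. This sketch assumes that the strips have equal width, that each rectangle's height is taken at the midpoint of its strip, and that the function is f(x) = x² on [1, 4]. These are assumptions about the example, although they do reproduce the value 20.9375 for six strips. The function name is hypothetical.

```python
def midpoint_riemann_sum(f, a, b, n):
    """Sum of f(x_k) * width over n equal strips of [a, b].

    x_k is taken to be the midpoint of the kth strip (an assumption made
    for this illustration; other sample points are equally valid).
    """
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) * width for k in range(n))


def f(x):
    return x**2


print(midpoint_riemann_sum(f, 1, 4, 6))    # 20.9375
print(midpoint_riemann_sum(f, 1, 4, 12))   # 20.984375
print(midpoint_riemann_sum(f, 1, 4, 1000))
```

Doubling the number of strips moves the sum closer to the area it approximates.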
We can see how the grey area calculated by our Riemann sums should
limit to the desired area. This motivates the following definition.
Example 6.10. Find $\int_3^5 2x\,dx$ by considering areas.
We assumed so far that f ≥ 0, that is, its graph was on or above the
x-axis, but this was only for convenience. Even if f has negative values
on [a, b], we still define $\int_a^b f(x)\,dx$ to be the limit of the Riemann sums as
we take finer partitions, but negative values of f contribute negatively in
the Riemann sum. That is, area under the x-axis carries a negative sign.
Remarks 6.11. When f ≤ 0, the area between the graph of the
function and the x-axis is negative. Thus
\[ \int_{-1}^{1} x\,dx = 0, \]
because the area above and below the x-axis cancel out (see
Figure 6.4).
Nevertheless, the definite integral $\int_a^b f(x)\,dx$ does exist for a large class of
functions. One positive result, due to Riemann, is that it exists whenever
f is continuous.
Theorem 6.13. If f : [a, b] → R is continuous then $\int_a^b f(x)\,dx$ exists.
While it is helpful to know that the definite integral exists, this result does
not explain how to find $\int_a^b f(x)\,dx$ for a given function f. Neither does
it explain why we have used two similar notations $\int f(x)\,dx$ and $\int_a^b f(x)\,dx$
for two quite different concepts. However, the answer to both of these
questions is provided by the two parts of the Fundamental Theorem of
Calculus.
[Margin note: Constructing finer and finer Riemann sums, and trying to spot what their limit is, is not an appealing option!]
It turns out that this new function F is differentiable and its derivative is
exactly f.
This result is what links the definite and indefinite integrals. It tells
us that a continuous function has an antiderivative, and that that an-
tiderivative is given in terms of an area function (i.e. definite integral).
More precisely, an antiderivative (i.e. indefinite integral) of f(x) is given
by $F(x) = \int_a^x f(t)\,dt$.
Notice that F (a) = 0 because it represents the area of a line. Now this
F is not the only antiderivative of f, but we do know that any other
antiderivative G say differs from F by a constant, so there is a number c
such that G(x) = F (x) + c for all x in the interval. Thus
\[ G(b) - G(a) = (F(b) + c) - (F(a) + c) = F(b) - F(a) = F(b) = \int_a^b f(t)\,dt. \]
Definition 6.16. The notation $\big[F(x)\big]_a^b$ is used as shorthand for F (b) − F (a).
[Margin note: The notation $F(x)\big|_a^b$ is also used.]
This of course is the same answer we had in Example 6.10 but this method
applies where splitting into rectangles and triangles is not possible. For
example, we can now compute exactly the area in Example 6.8 without
having to use a Riemann sum approximation.
Example 6.18. What is the area under the graph of $f(x) = x^2$ between
a = 1 and b = 4?
Solution. We want $\int_1^4 f(x)\,dx$. We know that $F(x) = \frac{1}{3}x^3$ is an
antiderivative of f and so
\[ \int_1^4 x^2\,dx = \left[\tfrac{1}{3}x^3\right]_1^4 = \tfrac{1}{3}(4^3 - 1^3) = \tfrac{63}{3} = 21. \]
[Margin note: Compare this solution with our Riemann sum approximation in Example 6.8!]
The first two properties (linearity) come from linearity of the indefinite
integral and the close connection between the definite and indefinite
integral that the Fundamental Theorem of Calculus gives us. Parts (3)
and (5) should make sense to you geometrically in terms of areas, but
can be proven easily from Theorem 6.15. Part (4) is also easy to prove
from this theorem, if not so geometrically intuitive (‘backwards area is
negative!’).
We now have a very useful and flexible method of calculating areas of
shapes bounded by graphs of functions. We are limited though to the
functions which we can integrate (i.e. find an antiderivative of). To expand
the list of functions we can integrate, we will look at some methods of
integration in the next chapter, but we’ll finish this one with a few more
examples of computing definite integrals.
Example 6.20.
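1. $\int_0^4 \sqrt{t}\,dt = \int_0^4 t^{1/2}\,dt = \left[\tfrac{2}{3} t^{3/2}\right]_0^4 = \tfrac{2}{3} \cdot 4^{3/2} - \tfrac{2}{3} \cdot 0 = \tfrac{16}{3}$.
2. $\int_{-1}^{1} 3x^2 - x^3 + 1\,dx = \left[x^3 - \tfrac{1}{4}x^4 + x\right]_{-1}^{1} = (1 - \tfrac{1}{4} + 1) - (-1 - \tfrac{1}{4} - 1) = 4$.
3. $\int_0^{\pi} \sin t\,dt = \big[-\cos t\big]_0^{\pi} = -\cos \pi - (-\cos 0) = -(-1) - (-1) = 2$.
[Margin note: When integrating trigonometric functions, radians are used by default!]
4. $\int_2^3 -x^3 + 6x^2\,dx = \left[-\tfrac{1}{4}x^4 + 2x^3\right]_2^3 = \left(-\tfrac{81}{4} + 54\right) - (-4 + 16) = 21\tfrac{3}{4}$.
5. $\int_0^x e^t\,dt = \big[e^t\big]_0^x = e^x - e^0 = e^x - 1$.
6. $\int_0^x w^3\,dw = \left[\tfrac{1}{4}w^4\right]_0^x = \tfrac{1}{4}x^4$.
7. $\int_0^x s^3\,ds = \left[\tfrac{1}{4}s^4\right]_0^x = \tfrac{1}{4}x^4$.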
Chapter 7
Methods of Integration
not very appealing. How about making the same substitution we used
to differentiate, that is, let u = 3x + 1. Now our integral becomes
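\[ \int (3x + 1)^{100}\,dx = \int u^{100}\,dx. \]
Since $\frac{du}{dx} = 3$ we have $dx = \frac{1}{3}\,du$, so
\[ \int u^{100}\,dx = \tfrac{1}{3} \int u^{100}\,du = \tfrac{1}{3} \cdot \tfrac{1}{101} u^{101} + c = \tfrac{1}{303}(3x + 1)^{101} + c. \]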
Notice that at the end we replaced u by 3x + 1, to give the answer in
terms of the variable in which the original integral was phrased. It’s the
polite thing to do.
This form of integration, where the variable is changed to make the inte-
gral appear as a standard integral, is called integration by substitution.
It is the integral counterpart to the chain rule for differentiation.
Example 7.1.
1. Determine $\int \cos(3t + 2)\,dt$.
Solution. Since we know the standard integral $\int \cos u\,du$, we
will try the substitution u = 3t + 2:
\[ \int \cos(3t + 2)\,dt = \int \cos u\,dt = \tfrac{1}{3} \int \cos u\,du \qquad \left(\tfrac{du}{dt} = 3 \text{ implies } dt = \tfrac{du}{3}\right) \]
\[ = \tfrac{1}{3}(\sin u + c') = \tfrac{1}{3} \sin(3t + 2) + c. \]
[Margin note: Notice we’ve written c instead of $\frac{1}{3}c'$, simply to indicate that one third of an arbitrary constant is still just an arbitrary constant (albeit a different one).]
2. Determine $\int \frac{1}{3t + 6}\,dt$.
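Solution. We know $\int \frac{1}{u}\,du$, so let’s try the substitution u = 3t + 6
which makes our integral look like that standard integral:
\[ \int \frac{1}{3t + 6}\,dt = \int \frac{1}{u}\,dt = \tfrac{1}{3} \int \frac{1}{u}\,du \qquad \left(\tfrac{du}{dt} = 3 \text{ implies } du = 3\,dt\right) \]

\[ = \tfrac{1}{6}u^6 + c = \tfrac{1}{6}(x^3 + 1)^6 + c. \]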
Let’s examine that last one again. Why did u = x 3 + 1 work out nicely?
The reason lies behind the fact that substitution is like a reversal of
the chain rule. The chain rule says that $f'(g(x))g'(x)$ is the derivative of
f(g(x)). In terms of anti-derivatives, this says $\int f'(g(x)) \cdot \frac{dg}{dx}\,dx = f(g(x))$.
Example 7.4.
1. Evaluate $\int xe^{-x^2}\,dx$.
where $u = -x^2$. Thus $\frac{du}{dx} = -2x$, giving $x\,dx = -\frac{1}{2}\,du$, and
\[ \int xe^{-x^2}\,dx = -\tfrac{1}{2} \int e^u\,du = -\tfrac{1}{2}e^u + c = -\tfrac{1}{2}e^{-x^2} + c. \]
2. Evaluate $\int \cos x \sin x\,dx$.
\[ \int_1^2 \tfrac{1}{3} u^{-1/2}\,du = \tfrac{1}{3} \left[ 2u^{1/2} \right]_1^2 = \tfrac{2}{3}(\sqrt{2} - 1). \]
\[ \int \frac{d}{dx}\big(u(x)v(x)\big)\,dx = \int u'(x)v(x) + u(x)v'(x)\,dx, \]
and so, forgetting about constants of integration for the moment,
\[ u(x)v(x) = \int u'(x)v(x)\,dx + \int u(x)v'(x)\,dx. \]
Fact 7.5.
\[ \int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx. \]
= −x cos x + sin x + c.
Example 7.8. Determine $\int xe^{-7x}\,dx$.
Since u(x) = x, we get $u'(x) = 1$. We find $v(x) = \int v'(x)\,dx = \int e^{-7x}\,dx$
and the substitution w = −7x gives $v(x) = -\frac{1}{7}e^{-7x}$. So we get
\[ \int xe^{-7x}\,dx = -\tfrac{1}{7}xe^{-7x} + \tfrac{1}{7} \int e^{-7x}\,dx = -\tfrac{1}{7}xe^{-7x} - \tfrac{1}{49}e^{-7x} + c. \]
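As with any integral, the answer can be checked by differentiating it. The sketch below does this numerically for the antiderivative −(1/7)x e^(−7x) − (1/49)e^(−7x) found above; the names `F`, `integrand` and `derivative` are hypothetical, and the derivative is approximated by a symmetric difference quotient rather than computed by the rules of differentiation.

```python
import math


def F(x):
    """Candidate antiderivative of x * exp(-7x), with the constant c = 0."""
    return -x * math.exp(-7 * x) / 7 - math.exp(-7 * x) / 49


def integrand(x):
    return x * math.exp(-7 * x)


def derivative(g, x, h=1e-6):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in (0.0, 0.3, 1.0):
    print(x, derivative(F, x), integrand(x))
```

At each sample point the two printed values should agree to many decimal places.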
Solution. We write
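\[ \int x^n \log x\,dx = \int \underbrace{x^n}_{v'} \underbrace{\log x}_{u}\,dx = \underbrace{\tfrac{1}{n+1}x^{n+1}}_{v} \underbrace{\log x}_{u} - \int \underbrace{\tfrac{1}{n+1}x^{n+1}}_{v} \underbrace{\tfrac{1}{x}}_{u'}\,dx \]
\[ = \tfrac{1}{n+1}x^{n+1} \log x - \tfrac{1}{n+1} \int x^n\,dx \]
\[ = \tfrac{1}{n+1}x^{n+1} \log x - \tfrac{1}{(n+1)^2}x^{n+1} + c. \]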
Note that all of this works perfectly well when n = 0 (where we have
$x^n = 1$), giving $\int \log x\,dx = x \log x - x + c$. As with any integral, we can
check our answer by differentiating our solution to get the function we
were originally trying to integrate:
\[ \frac{d}{dx}(x \log x - x) = \left(\frac{d}{dx} x\right) \log x + x \cdot \frac{d}{dx} \log x - 1 = 1 \cdot \log x + x \cdot \frac{1}{x} - 1 = \log x. \]
You can add this one to our short list of standard integrals (Fact 6.5).
Remarks 7.10. This relates to Remarks 7.7. Probably the most well-
known rule of thumb for applying integration by parts is the so-called
LIATE rule. It states that the function that comes first in the following
list should be chosen as u:
L logarithmic functions
I inverse trigonometric functions
A algebraic functions
T trigonometric functions
E exponential functions.
$\int x^n \sin x\,dx$, $\int x^n \cos x\,dx$ and $\int x^n e^x\,dx$, where n ≥ 1 is an integer.
In the final example of the chapter, we just consider $\int x^2 \cos x\,dx$.
Example 7.11. Determine $\int x^2 \cos x\,dx$.
In general, the integral $\int x^n \cos x\,dx$ can be evaluated by performing
integration by parts n times (formally, by mathematical induction).
We finish with another remark about the difficulty of expressing certain
integrals. The bell curve $e^{-x^2}$ introduced in Example 3.28 was hailed
with much fanfare in the nearby marginal note as being one of the most
important functions in statistics. Hence it is rather embarrassing to note
that it is impossible to express the indefinite integral of this function in
terms of familiar functions. In fact, we just have to define a brand new
function. The error function erf : R → R is defined by
\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt. \]
[Margin note: We look at this problem again in Section 8.2.]
\[ \frac{d}{dx} \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} e^{-x^2}, \]
hence $\frac{\sqrt{\pi}}{2} \operatorname{erf}(x)$ is an anti-derivative of $e^{-x^2}$. The constant multiple $\frac{2}{\sqrt{\pi}}$ is
introduced into the definition of erf in order to ensure that $\lim_{x \to \infty} \operatorname{erf}(x) = 1$
(this is a consequence of a beautiful result in integration that, sadly,
we do not have time to cover!).
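Although erf has no formula in terms of familiar functions, it is easy to evaluate numerically. The sketch below compares Python's built-in `math.erf` with a direct approximation of the defining integral; the function name `erf_by_riemann_sum`, the use of midpoints and the number of strips are all choices made for this illustration.

```python
import math


def erf_by_riemann_sum(x, n=10_000):
    """Approximate (2 / sqrt(pi)) * integral of exp(-t^2) from 0 to x.

    Uses n equal strips with the integrand sampled at each midpoint.
    """
    width = x / n
    total = sum(math.exp(-((k + 0.5) * width) ** 2) for k in range(n)) * width
    return 2 / math.sqrt(math.pi) * total


for x in (0.5, 1.0, 2.0):
    print(x, erf_by_riemann_sum(x), math.erf(x))

# For large x the value is very close to the limiting value 1.
print(math.erf(10.0))
```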
Chapter 8
Numerical Techniques
the bisection method. For this, we first find two points where the graph
of $f$ is on opposite sides of the $x$-axis. That is, locate $x_1$ and $x_2$ such
that $f(x_1) < 0$ and $f(x_2) > 0$. E.g. for $f(x) = x - \cos x$ we have $f(0) = -1 < 0$
while $f(\frac{\pi}{2}) = \frac{\pi}{2} > 0$. The idea is that since $f(x_1)$ is negative
and $f(x_2)$ is positive, $f$ must have a root (a place where it crosses the
$x$-axis) somewhere between $x_1$ and $x_2$. If $|x_2 - x_1|$ is small enough then
we know the location of our root to desired accuracy, but otherwise, we
consider the point halfway between: $x_3 = \frac{1}{2}(x_1 + x_2)$. Now there are
three possibilities: (i) $f(x_3) = 0$ in which case we have found the root
(which would be lovely, but it almost never happens), (ii) $f(x_3) > 0$ or
(iii) $f(x_3) < 0$. In either of these latter two cases, $f(x_3)$ differs in sign
to either $f(x_1)$ or $f(x_2)$ and we can say whether the root lies between $x_1$
and $x_3$ or between $x_2$ and $x_3$. Thus we have located the root in a shorter
interval. We can keep repeating this until we have shortened sufficiently
the length of the interval in which the root resides, thereby finding the
root to the desired accuracy.
Margin note: That $f$ really has a root is a consequence of the so-called
Intermediate Value Theorem, which states that if $g : [a, b] \to \mathbb{R}$ is continuous
and $g(a) \le 0 \le g(b)$, then $g$ has a root in $[a, b]$, i.e. there exists $x \in [a, b]$
such that $g(x) = 0$. This theorem is fairly deep and its proof is beyond the
scope of this module. See Theorem B.3.
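The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the module: the function name `bisect`, the tolerance `tol` and the returned step count are all hypothetical choices made for this example.

```python
import math

def bisect(f, x1, x2, tol=1e-6):
    """Bisection method. Assumes f is continuous and that f(x1), f(x2)
    have opposite signs. Returns (approximate root, number of halvings)."""
    f1, f2 = f(x1), f(x2)
    if f1 * f2 > 0:
        raise ValueError("f(x1) and f(x2) must have opposite signs")
    steps = 0
    while abs(x2 - x1) > tol:
        x3 = 0.5 * (x1 + x2)          # the point halfway between
        f3 = f(x3)
        if f3 == 0:                   # case (i): we hit the root exactly
            return x3, steps
        if f1 * f3 < 0:               # sign change between x1 and x3
            x2, f2 = x3, f3
        else:                         # sign change between x3 and x2
            x1, f1 = x3, f3
        steps += 1
    return 0.5 * (x1 + x2), steps

root, steps = bisect(lambda x: x - math.cos(x), 0.0, math.pi / 2)
print(root, steps)
```

Each pass through the loop halves the length of the interval known to contain the root, so the number of steps needed grows only with the logarithm of the accuracy requested.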
[Figure: the bisection method, with $x_1 = 0$, $x_3$ and $x_2 = \frac{\pi}{2}$ marked on the $x$-axis.]
Margin note: The point $x_3$ looks like it is very close to the true
solution, but this is just dumb luck (and also depends on the scale of the
graph), and not really by virtue of the method!
Newton’s method
Newton’s method (or the Newton-Raphson method) usually provides very
fast convergence to the root of a differentiable function. The idea is very
simple. We take a starting point, say $x_1$, and instead of looking for the
root $r$ of $f$, we find the root of the linear approximation $L^f_{x_1}$ (which is
much easier to find), and name this $x_2$. Recall from Section 4.4 that the
linear approximation to $f$ at $a$ is given by
$$L^f_a(x) = f(a) + (x - a)f'(a).$$
The root of the linear function $L^f_{x_1}$ is the solution to $f(x_1) + (x - x_1)f'(x_1) = 0$:
$$\begin{aligned}
f(x_1) + (x - x_1)f'(x_1) &= 0 \\
(x - x_1)f'(x_1) &= -f(x_1) \\
x - x_1 &= -\frac{f(x_1)}{f'(x_1)} \\
x &= x_1 - \frac{f(x_1)}{f'(x_1)}.
\end{aligned}$$
Thus $x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$. Geometrically, all we’ve done is to calculate where
the tangent line to $f$ at $x_1$ crosses the $x$-axis. See Figure 8.3 (left hand
side).
[Figure 8.3: Newton’s method, showing $x_1$ and $x_2$ (left) and $x_1$, $x_2$ and $x_3$ (right).]
Repeating this process from $x_n$ gives
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
The right hand side of Figure 8.3 displays the next iteration, which produces $x_3$.
This iteration process continues until we are close enough to the root.
We can measure how close we are to the root by comparing $f(x_n)$ to 0.
That is, we will stop when $|f(x_n)|$ is less than some acceptable approximation
error. Of course, if $f'(x_n) = 0$ at any point then we are stymied
by a division by zero in the formula (what does this correspond to
geometrically?) and we have to start again from a different $x_1$. However, in
practice, this problem rarely occurs.
$$f'(x) = \frac{d}{dx}(x^3 + x^2 - x - 1) = 3x^2 + 2x - 1.$$
We apply Newton’s method by filling in the following table.
n   x_n            f(x_n)           f'(x_n)        f(x_n)/f'(x_n)      x_n − f(x_n)/f'(x_n)
1   3              32               32             1                   2
2   2              9                15             0.6                 1.4
3   1.4            2.304            7.68           0.3                 1.1
4   1.1            0.441            4.83           0.091304347         1.008695652
5   1.008695652    0.035085723      4.069792059    8.621011219×10^−3   1.000074641
6   1.000074641    2.985854×10^−4   4.000597145    7.4635208×10^−5     1.000000006
7   1.000000006    2.402×10^−8
Margin note: We start to use approximate values in rows 4 and beyond.
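The iteration $x_{n+1} = x_n - f(x_n)/f'(x_n)$ is easy to automate. The Python sketch below applies it to $f(x) = x^3 + x^2 - x - 1$ starting from $x_1 = 3$, stopping once $|f(x_n)|$ falls below a tolerance. The function name `newton_iterates` and the parameters `tol` and `max_iter` are assumptions made for this illustration.

```python
def newton_iterates(f, df, x1, tol=1e-7, max_iter=50):
    """Return the list [x1, x2, ...] of Newton iterates,
    stopping as soon as |f(x_n)| < tol (or after max_iter steps)."""
    xs = [x1]
    for _ in range(max_iter):
        x = xs[-1]
        if abs(f(x)) < tol:
            break
        slope = df(x)
        if slope == 0:
            # Horizontal tangent: the formula would divide by zero.
            raise ZeroDivisionError("f'(x_n) = 0: restart from a different x1")
        xs.append(x - f(x) / slope)
    return xs

f = lambda x: x**3 + x**2 - x - 1
df = lambda x: 3 * x**2 + 2 * x - 1

for n, x in enumerate(newton_iterates(f, df, 3.0), start=1):
    print(f"x{n} = {x:.10f}   f(x{n}) = {f(x):.3e}")
```

Because the computer carries more digits than a hand calculation, the later iterates may differ slightly in their final decimal places from values obtained with rounded intermediate results.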
How good is Newton’s method? Well, if you used bisection to find the
root in the above example then it would take you perhaps 30 iterations to
achieve the same level of accuracy achieved after 6 iterations of Newton’s
method, and 60 bisections to achieve the same accuracy as 7 iterations
of Newton. Roughly speaking, bisection halves the error at each step,
while Newton’s method squares the error, and if the error is small, then
this is much better.
In school, we’ve all learned to add, subtract, multiply and divide numbers
but you probably didn’t learn how to extract square roots using these
operations. Here’s one way to do it.
Example 8.4. Use Newton’s method to find a decimal expansion of
$\sqrt{2}$ correct to 4 decimal places.
Solution. We notice first that $\sqrt{2}$ is a root of the function $f(x) = x^2 - 2$.
We get $x_1 = 1$, $x_2 = \frac{3}{2}$, $x_3 = \frac{17}{12}$, $x_4 = \frac{577}{408}$, $x_5 = \frac{665857}{470832}$. Checking the
decimal expansions reveals that the first four decimal places have
started to repeat, hence we stop here and take $x = 1.4142$ as the
approximation of $\sqrt{2}$.
Margin note: In fact, the approximation $\frac{665857}{470832}$ of $\sqrt{2}$ is
correct to 11 decimal places.
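The fractions in this example can be reproduced exactly with Python's `fractions` module. The sketch below applies the Newton step $x \mapsto x - \frac{x^2 - 2}{2x}$ (using $f'(x) = 2x$) starting from $x_1 = 1$; the function name `newton_sqrt2` is a hypothetical name chosen for this illustration.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Exact Newton iterates for f(x) = x^2 - 2, starting from x1 = 1."""
    x = Fraction(1)
    xs = [x]
    for _ in range(steps):
        x = x - (x * x - 2) / (2 * x)   # x - f(x)/f'(x) with f'(x) = 2x
        xs.append(x)
    return xs

for n, x in enumerate(newton_sqrt2(4), start=1):
    print(f"x{n} = {x} ≈ {float(x):.12f}")
```

Working with exact fractions avoids rounding altogether, so the only error left is that of the method itself.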
Exercise 8.5.
and these involve a lot of work. Can we get better approximations to the
area without having to do so much extra work?
The idea is to approximate the given function f on [a, b] by another func-
tion g which we can integrate simply. The integral of g should then be
a good approximation to the integral of f.
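$\frac{1}{n}(b - a)$, for $n = 4$ and $n = 9$. We call $h$ the step size of the partition.
We let $x_k = a + kh$ for $k = 0, 1, 2, \ldots, n$ and $y_k = f(x_k)$. We construct $n$
trapezoids based on the subintervals. The area of the first trapezoid is
$\frac{1}{2}h(y_0 + y_1)$. The area of the second trapezoid is $\frac{1}{2}h(y_1 + y_2)$. The total
area of the $n$ trapezoids, $T_n$, is given by
$$\begin{aligned}
T_n &= \tfrac{1}{2}h(y_0 + y_1) + \tfrac{1}{2}h(y_1 + y_2) + \cdots + \tfrac{1}{2}h(y_{n-1} + y_n) \\
    &= \tfrac{1}{2}h(y_0 + 2y_1 + 2y_2 + \cdots + 2y_{n-1} + y_n).
\end{aligned}$$
[Figure: trapezoidal approximations on $[a, b]$ with $n = 4$ (left, $a = x_0, x_1, \ldots, x_4 = b$) and $n = 9$ (right, $a = x_0, \ldots, x_9 = b$).]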
Definition 8.6. The $n$th trapezoidal approximation to $\int_a^b f(x)\,dx$ is
given by
$$T_n = \tfrac{1}{2}h(y_0 + 2y_1 + 2y_2 + \cdots + 2y_{n-1} + y_n),$$
where $h = \frac{b-a}{n}$ is the step size, $x_k = a + kh$ for $k = 0, \ldots, n$ and $y_k = f(x_k)$.
Example 8.7. Estimate $\int_1^2 x^2\,dx$ with $n = 2$ subintervals and also with
$n = 4$ subintervals.
x_k             1     3/2    2
y_k = f(x_k)    1     9/4    4
which gives $T_2 = \frac{1}{4}(1 + 2 \cdot \frac{9}{4} + 4) = \frac{19}{8} = 2.375$.
If $n = 4$, we get a step size of $h = \frac{1}{4}(2 - 1) = \frac{1}{4}$, and the table becomes
x_k       1     5/4     3/2    7/4     2
f(x_k)    1     25/16   9/4    49/16   4
Thus $T_4 = \frac{1}{8}(1 + 2 \cdot \frac{25}{16} + 2 \cdot \frac{9}{4} + 2 \cdot \frac{49}{16} + 4) = \frac{75}{32} = 2.34375$.
In the example above one can of course calculate the integral exactly.
$$\int_1^2 x^2\,dx = \left[\frac{x^3}{3}\right]_1^2 = \frac{8}{3} - \frac{1}{3} = 2\tfrac{1}{3}.$$
Margin note: The example was chosen for the sake of simplicity.
As one might expect, we got a better estimate of the true value when
taking a smaller step size – that is, a higher value for n.
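The trapezoidal sums in Example 8.7 can be checked with a short Python sketch. Exact fractions are used so that the output matches the hand calculation; the function name `trapezoid` and the extra run with $n = 8$ are illustrative assumptions rather than part of the notes.

```python
from fractions import Fraction

def trapezoid(f, a, b, n):
    """n-th trapezoidal approximation T_n to the integral of f over [a, b]."""
    h = (b - a) / n                              # step size
    ys = [f(a + k * h) for k in range(n + 1)]    # y_k = f(x_k)
    return h * (ys[0] + 2 * sum(ys[1:-1]) + ys[-1]) / 2

square = lambda x: x * x
a, b = Fraction(1), Fraction(2)
exact = Fraction(7, 3)

for n in (2, 4, 8):
    t = trapezoid(square, a, b, n)
    print(f"T_{n} = {t} = {float(t):.6f},  error = {float(t - exact):.6f}")
```

Doubling $n$ should cut the error by roughly a factor of four, which is consistent with an error bound proportional to $h^2$.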
Example 8.9. Calculate $\int_0^2 (1 + x^3)\,dx$ to within an accuracy of $\frac{1}{2}$ using
the trapezoidal rule.
x_k             0     1/2    1     3/2     2
y_k = f(x_k)    1     9/8    2     35/8    9
This gives $T_4 = \frac{1}{4}(1 + \frac{9}{4} + 4 + \frac{35}{4} + 9) = \frac{25}{4}$ (exact calculation gives an
answer of 6).
Simpson’s rule
The trapezoidal rule approximates a function f by a series of straight
lines. We can make a better approximation by using sections of quadratic
curves instead of straight lines. This is the basis of Simpson’s rule.
Definition 8.10. The $n$th Simpson approximation to $\int_a^b f(x)\,dx$ is given
by
$$S_n = \tfrac{1}{3}h(y_0 + 4y_1 + 2y_2 + 4y_3 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n),$$
where $n$ is an even number, $h = \frac{b-a}{n}$ is the step size, $x_k = a + kh$ for
$k = 0, \ldots, n$ and $y_k = f(x_k)$.
We’ll leave the derivation of this rule to Appendix B.1, but it is no more
difficult to apply than the trapezoidal rule, and can save time because of
its improved accuracy, meaning we can choose smaller values for n.
Example 8.11. Use Simpson’s rule with $n = 4$ to estimate $\int_0^1 6x^5\,dx$.
What is the error estimate? What is the actual error?
x_k             0     1/4          1/2       3/4         1
y_k = f(x_k)    0     0.0058594    0.1875    1.423828    6
$$\begin{aligned}
S_4 &= \tfrac{1}{3}h(y_0 + 4y_1 + 2y_2 + 4y_3 + y_4) \\
    &\approx \tfrac{1}{12}(0 + 4(0.0058594) + 2(0.1875) + 4(1.423828) + 6) \\
    &\approx 1.0078125.
\end{aligned}$$
Now we must find the theoretical limit on the error $E_{S_4}$. We know
$|E_{S_4}| \le \frac{1}{180}(b - a)h^4 M$, where $M$ is the maximum of $|f^{(4)}|$ on $[a, b] = [0, 1]$.
For $f(x) = 6x^5$ we have $f'(x) = 30x^4$, $f''(x) = 120x^3$, $f^{(3)}(x) = 360x^2$,
$f^{(4)}(x) = 720x = |f^{(4)}(x)|$ on $[0, 1]$. So $M = 720$ and
$$|E_{S_4}| \le \tfrac{1}{180} \cdot (\tfrac{1}{4})^4 \cdot 720 = \tfrac{1}{64} = 0.015625.$$
The integral above can be evaluated exactly: $\int_0^1 6x^5\,dx = \left[x^6\right]_0^1 = 1$.
(Thus, the actual error, $0.0078125$, is quite a lot less than the theoretical bound
on the error.)
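Simpson's rule is just as easy to code as the trapezoidal rule. The Python sketch below recomputes $S_4$ for $\int_0^1 6x^5\,dx$ with exact fractions and compares the actual error with the bound $\frac{1}{180}(b-a)h^4M$. The function name `simpson` and the variable names are assumptions made for this illustration.

```python
from fractions import Fraction

def simpson(f, a, b, n):
    """n-th Simpson approximation S_n to the integral of f over [a, b]; n must be even."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]
    odd = sum(ys[1:-1:2])     # y_1, y_3, ..., y_{n-1}  (weight 4)
    even = sum(ys[2:-1:2])    # y_2, y_4, ..., y_{n-2}  (weight 2)
    return h * (ys[0] + 4 * odd + 2 * even + ys[-1]) / 3

f = lambda x: 6 * x**5
s4 = simpson(f, Fraction(0), Fraction(1), 4)
actual_error = abs(s4 - 1)                          # exact integral is 1
bound = Fraction(1, 180) * Fraction(1, 4)**4 * 720  # (1/180)(b-a)h^4 M with M = 720

print(f"S_4 = {s4} = {float(s4)}")
print(f"actual error = {float(actual_error)}, bound = {float(bound)}")
```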
it is, in a sense, the one to which all others gravitate. This statement is
made precise in a result called the Central Limit Theorem.
The point is that calculating definite integrals of $e^{-x^2}$ is crucially impor-
This means that the only way to calculate these definite integrals is nu-
merically, or to look up tables produced by others who have done the
calculations. We’ll finish the chapter by performing our own numerical
calculation.
Example 8.12. Find $\int_0^1 e^{-x^2}\,dx$ to within an accuracy of $\varepsilon = 10^{-4} = 0.0001$.
Solution. Let $\phi$ be as in Example 3.28. Using Fact 2.4, and the fact
that $|e^{-x^2}|, |x^n| \le 1$ whenever $x \in [0, 1]$ and $n \in \mathbb{Z}$ is non-negative,
we have $|\phi''(x)| \le 6$ and $|\phi''''(x)| \le 76$ for all $x \in [0, 1]$. When using
Simpson’s rule, to make $|E_{S_n}| < \varepsilon$, we need $\frac{1}{180}h^4 \cdot 76 \le \varepsilon$ so $h^4 \le \frac{180}{76}\varepsilon$
and thus $h \le 0.12405513$. Since $h = \frac{b-a}{n} = \frac{1}{n}$, we have to take $n > 8$.
If we use the trapezoidal rule, then to make $|E_{T_n}| < \varepsilon$, we would need
$\frac{1}{12}h^2 \cdot 6 \le \varepsilon$ and so $h^2 \le 2\varepsilon$, giving $h \le 0.0141421$ and $n > 70$.
Neither of these looks inviting to do by hand, so we’ll use a computer
to do the work. There is a perl script in the week 8 section, written
by Michael Mackey, that produces the following output:
Margin note: To run this script yourself, you will need Perl on your computer.
---
a = 0 b=1 n=10 Step size is h=0.1.
Trapezoidal sum with 10 steps is 0.746210796131749
Simpson sum with 10 steps is 0.746824948254443
---
a = 0 b=1 n=72 Step size is h=0.0138888888888889.
Trapezoidal sum with 72 steps is 0.746812305336648
Simpson sum with 72 steps is 0.746824133116616
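For readers without Perl, the calculation can be sketched in Python instead. The code below is not the script referred to above; it is an independent illustration, with the function names `trapezoid` and `simpson` and the output format chosen here as assumptions. It evaluates both rules for $e^{-x^2}$ on $[0, 1]$ with $n = 10$ and $n = 72$.

```python
import math

def trapezoid(f, a, b, n):
    """n-th trapezoidal approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]
    return h * (ys[0] + 2 * sum(ys[1:-1]) + ys[-1]) / 2

def simpson(f, a, b, n):
    """n-th Simpson approximation to the integral of f over [a, b]; n must be even."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]
    return h * (ys[0] + 4 * sum(ys[1:-1:2]) + 2 * sum(ys[2:-1:2]) + ys[-1]) / 3

f = lambda x: math.exp(-x * x)

for n in (10, 72):
    print(f"n={n}  step size h={1 / n}")
    print(f"  Trapezoidal sum: {trapezoid(f, 0.0, 1.0, n):.15f}")
    print(f"  Simpson sum:     {simpson(f, 0.0, 1.0, n):.15f}")
```

Up to floating-point rounding in the last digits, the printed sums should agree with the output listed above.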
Discussion board and WeBWorK guides
$$a^4 = 3$$
4. Surds
Expressions like √2 and ⁵√11 can be obtained by writing $\sqrt{2}$
or $\sqrt[5]{11}$, respectively (note the use of square brackets
as well as curly ones in the second example).
6. Standard functions
The commands $\sin$, $\cos$, $\tan$ and $\log$ produce the
standard functions sin, cos, tan and log. For example,
$\cos(x) = \frac{\sqrt{3}}{2}$
produces cos(x) = √3/2.
7. Summation and integration
Use the commands $\sum$ and $\int$, together with ^ and _ and
curly braces, to write expressions involving summation and integration.
For instance,
$\sum_{k=1}^n k = \frac{1}{2}n(n+1)$
produces ∑_{k=1}^{n} k = ½n(n + 1), and
$\int_1^2 x^2 dx = \frac{7}{3}$
gives ∫_1^2 x² dx = 7/3.
8. Greek characters and special symbols
The Greek letters α, β, θ and π etc can be expressed using $\alpha$,
$\beta$, $\theta$ and $\pi$, respectively. Symbols such as ℝ and
≈ require $\mathbb{R}$ and $\approx$, respectively.
9. Matrices
Alas, there is no quick way to write down matrices properly using
MathJax, because doing so requires a so-called ‘LaTeX environ-
ment’.
To begin, type \begin{pmatrix}. Then type in the entries of the
first row, separating each one by an ampersand & character. When
you reach end of the first row, type \\. Add the entries of the
second row as above, and repeat until you have reached the end
of the final row. To finish, type \end{pmatrix} (you do not need to
add \\ at the end of the final row).
Perhaps an example explains all of this best. Typing
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$
will produce the matrix
( 1 2 )
( 3 4 ).
The best way to learn this stuff is through practice and experimenta-
tion. You can do so by using this Live Demo. The examples above can
be adapted and combined in all sorts of ways to produce expressions
of greater complexity (though it should not be necessary to write enor-
mously complicated expressions!).
Be mindful when using the curly braces { and }. MathJax will com-
plain with error messages, or will not render your expression properly, if
they are missing or are in the wrong place. Every opening { requires a
corresponding closing } (correctly placed).
If you are in any doubt about how WeBWorK is going to interpret your
answer, press the ‘Preview Answers’ button.
http://webwork.maa.org/wiki/Available_Functions
Appendix B
Additional material (non-examinable)
so, using the algebra of limits, the proof will be complete if we can show
that $\lim_{h \to 0} \phi(h) = f'(g(x))$.
Set $k = g(x + h) - g(x)$. Whenever $k \ne 0$, we have
$$\phi(h) = \frac{f(g(x) + k) - f(g(x))}{k},$$
and $\phi(h) = f'(g(x))$ otherwise. As $h \to 0$, $g(x + h) \to g(x)$ because $g$
is continuous at $x$, being differentiable there. Consequently, $k \to 0$ as
$h \to 0$. Hence $\lim_{h \to 0} \phi(h) = f'(g(x))$ as required.
$$g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a},$$
and so
$$0 = g'(c) = f'(c) - \frac{f(b) - f(a)}{b - a},$$
which is just what we want.
which is equivalent to
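Lastly, $A$ has one positive and one negative eigenvalue if and only if
$$|a + d| < \sqrt{(a + d)^2 - 4(ad - b^2)},$$
that is,
$$ad - b^2 < 0. \tag{B.4}$$
$$a = \frac{\partial^2 f}{\partial x^2}, \quad d = \frac{\partial^2 f}{\partial y^2} \quad\text{and}\quad b = \frac{\partial^2 f}{\partial x\,\partial y},$$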
evaluated at the critical point, then A becomes the corresponding Hessian
matrix. By Theorem 5.15, the critical point is a local minimum if both
eigenvalues are positive, which is the case if and only if equation (B.2)
holds. Observe that with the given values of a, d and b, (B.2) holds if and
only if Theorem 5.16 1(b) holds. Likewise, by Theorem 5.15, the critical
point is a local maximum if both eigenvalues are negative, which is true
if and only if (B.3) and Theorem 5.16 1(a) hold. Finally, we have a saddle
point if one of the eigenvalues is positive and the other is negative, which
is true if and only if (B.4) and Theorem 5.16 2 hold.
[Figure B.1: sin and cos restricted to $[-\frac{\pi}{2}, \frac{\pi}{2}]$ and $[0, \pi]$, respectively.]
The functions $\sin : [-\frac{\pi}{2}, \frac{\pi}{2}] \to [-1, 1]$ and $\cos : [0, \pi] \to [-1, 1]$ are
bijections. The respective inverses are $\arcsin : [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$ and
$\arccos : [-1, 1] \to [0, \pi]$. Sometimes these functions are denoted in the
literature by $\sin^{-1}$ and $\cos^{-1}$; however, when one considers the notation
$\sin^n$ and $\cos^n$, $n \in \mathbb{N}$, which denotes sin and cos raised to the $n$th
(positive integer) power, it is clear how this alternative notation may cause
confusion, so we won’t use it here.
$$\cos(\arctan x) = \frac{1}{\sqrt{1 + \tan^2(\arctan x)}} = \frac{1}{\sqrt{1 + x^2}}. \tag{B.7}$$
There are many other such relationships (you may like to try to establish
some), but we focus on the three above because they will help us to
establish the differentiability properties of these inverse trigonometric
functions.
Concerning the differentiability properties, it will be useful to consider
the following theorem, which we will state without proof.
$$(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}.$$
Using this result, we can readily establish the derivatives of arcsin, arccos
and arctan.
Proposition B.2.
$$\frac{d}{dx}\arcsin x = \frac{1}{\sqrt{1 - x^2}}.$$
$$\frac{d}{dx}\arccos x = -\frac{1}{\sqrt{1 - x^2}}.$$
Proof. Assume first that $f(a) \le u \le f(b)$. Then apply Theorem B.3 to the
continuous function $g(x) = f(x) - u$, $x \in [a, b]$. If $f(a) \ge u \ge f(b)$, consider
$g(x) = u - f(x)$ instead.
$$\frac{d}{dx}\exp x = \exp x.$$
As is clear, in the treatment above we defined the logarithm first and then
constructed the exponential function. It is common to start by defining
exp first and then log as its inverse, and end up with the same functions.
Often, exp is defined in terms of a so-called infinite power series: given
$x \in \mathbb{R}$, we set
$$\exp x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \ldots$$
In this module, we have not come close to considering how we could
possibly add together infinitely many terms like this and end up with
anything meaningful, which is why this approach was avoided above.
Margin note: Amazingly, many functions familiar to us such as sin and cos
have similar power series representations.
Nevertheless, we do get something meaningful, and it is possible to prove
further that exp, so defined, has two properties:
1. $\exp 0 = 1$, and
2. $\frac{d}{dx}\exp x = \exp x$ for all $x \in \mathbb{R}$.
From this point, we can apply the method of Example 4.21 to the function
f(x) = exp(x + y) exp(−x) to establish Proposition B.6 (another example
of the power of the technique in Example 4.21), and from there reason
that exp : R → (0, ∞) is a bijection, define log as its inverse, and find its
derivative using Theorem B.1.
Exponentiation revisited
Back in Section 1.6, we looked at exponentiation and reached a point
where we could define $x^q$, where $q$ is a rational number (at least for
$x > 0$). No definition of $x^y$, where $y$ is an irrational number, was given.
Above, we gave a more robust treatment of the exponential function exp.
In this subsection, we revisit exponentiation and take advantage of exp
to yield a more robust and expansive definition of $x^y$. As well as this, we
consider again power functions, provide justification for Fact 3.13, and
introduce exponential functions and their derivatives.
We begin with a definition.
$$x^y \cdot x^z = x^{y+z} \quad\text{and}\quad (x^y)^z = x^{yz}.$$
Proof. We have
$$(x^y)^z = \exp(z \log(x^y)) = \exp(z \log(\exp(y \log x))) = \exp(zy \log x) = x^{yz}.$$
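$$\exp(n \log x) = \frac{1}{\exp((-n) \log x)} = \frac{1}{x^{-n}} = x^n,$$
so equation B.8 holds for all $n \in \mathbb{Z}$. Furthermore, given $n \in \mathbb{N}$, again by
Proposition B.6 we observe that
$$\left(\exp\left(\tfrac{1}{n} \log x\right)\right)^n = \exp\left(n \cdot \tfrac{1}{n} \log x\right) = \exp(\log x) = x,$$
and consequently, $\exp\left(\frac{1}{n} \log x\right) = \sqrt[n]{x}$. Finally, given a rational number
$q = \frac{m}{n}$ as written just before Definition 1.12, we have
$$x^q = (\sqrt[n]{x})^m = \left(\exp\left(\tfrac{1}{n} \log x\right)\right)^m = \exp\left(\tfrac{m}{n} \log x\right) = \exp(q \log x).$$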
Therefore, happily, Definition B.8 agrees perfectly with that which has
been established already. What Definition B.8 provides in addition is a
way of taking irrational exponents.
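To see the agreement concretely, here is a small Python sketch that computes $x^y$ as $\exp(y \log x)$ and compares it with roots computed in other ways. The function name `power` is a hypothetical name for this illustration; `math.exp`, `math.log` and `math.sqrt` are standard library functions, and all comparisons are subject to floating-point rounding.

```python
import math

def power(x, y):
    """Compute x^y for x > 0 via the definition exp(y log x)."""
    if x <= 0:
        raise ValueError("this definition requires x > 0")
    return math.exp(y * math.log(x))

# Rational exponents agree with familiar roots (up to rounding).
print(power(2, 0.5), math.sqrt(2))      # square root of 2
print(power(8, 1 / 3))                  # cube root of 8

# Irrational exponents now make sense too.
print(power(2, math.sqrt(2)))

# The law x^y * x^z = x^(y+z) holds numerically as well.
print(power(3, 1.5) * power(3, 2.5), power(3, 4.0))
```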
Moreover, we can use it to prove Fact 3.13 (which concerns the derivatives
of power functions), at least for $x > 0$. Power functions are those of the
form $f(x) = x^r$ where the number $r \in \mathbb{R}$ is fixed. Thus, $f(x) = x^2$ and
$g(x) = x^{-\frac{7}{3}}$ and $h(x) = x^{\frac{1}{3}}$, $r(x) = x^{-1} = \frac{1}{x}$ are all power functions. The
Using the chain rule and differentiability properties of exp, we see that
all exponential functions are differentiable:
$$\frac{d}{dx} a^x = \frac{d}{dx} \exp(x \log a) = \log a \exp(x \log a) = \log a \cdot a^x.$$
We can see that these functions have the important property that the
derivative is directly proportional to the function. These functions are
used in, for example, the understanding of half lives of radioactive sub-
stances and simple models of cooling.
The number exp 1, usually denoted by e, is called Euler’s constant. It is
an irrational number having approximate value 2.718281828459, and its
immense importance to mathematics is only marginally outweighed by
that of π. If we follow the definition of exponential functions as above,
then we get
$$e^x = \exp(x \log(e)) = \exp(x \log(\exp 1)) = \exp x.$$
Very often, as in these notes, it is more convenient and easy on the eye
to write $e^x$ instead of $\exp x$. In this case, the derivative is equal to the
original function.
Example B.10. Find the tangent plane to $g(x_1, x_2) = x_1^3 x_2^2$ at $a = (2, 2)$.
Solution. We have
$$\nabla g(x) = \left(\frac{\partial g}{\partial x_1}, \frac{\partial g}{\partial x_2}\right) = (3x_1^2 x_2^2,\; 2x_1^3 x_2),$$
[Figure: the unit semicircle over $[-1, 1]$, with $a$ and $b$ marked on the $x$-axis.]
Using Pythagoras, the equation for the semicircle is given by
$f(x) = \sqrt{1 - x^2}$, $x \in [-1, 1]$, so our problem boils down to finding the definite
integral
$$\int_a^b f(x)\,dx.$$
We require an inverse trigonometric function to evaluate this integral. In
the next example, you will see how a successful integration sometimes
involves the combination of a number of techniques.
$$\frac{du}{dx} = \frac{1}{\sqrt{1 - x^2}} = \frac{1}{\sqrt{1 - \sin^2 u}} = \frac{1}{\cos u},$$
and so
$$\int \sqrt{1 - x^2}\,dx = \int \cos^2 u\,du.$$
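$$\begin{aligned}
&= \tfrac{1}{2}u + \tfrac{1}{4}\sin 2u + c \\
&= \tfrac{1}{2}u + \tfrac{1}{2}\sin u \cos u + c \\
&= \tfrac{1}{2}\arcsin x + \tfrac{1}{2}x\cos(\arcsin x) + c \\
&= \tfrac{1}{2}\arcsin x + \tfrac{1}{2}x\sqrt{1 - x^2} + c.
\end{aligned}$$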
The solution above makes perfect sense from a geometric point of view.
Given an angle $\theta$ (in radians), the area of the circular sector having radius
1 and central angle $\theta$ is equal to $\frac{1}{2}\theta$.
[Figure: two copies of the unit semicircle with $x$ marked on the $x$-axis, showing a circular sector (left) and a shaded triangular region of height $\sqrt{1 - x^2}$ (right).]
In the first figure above, to the left, $x = \sin\theta$ and hence $\theta = \arcsin x$.
Therefore the area of the illustrated sector equals $\frac{1}{2}\arcsin x$. Meanwhile,
the area of the shaded triangular region on the right is $\frac{1}{2}x\sqrt{1 - x^2}$.
Combining the two regions gives the area sought above, between 0 and $x$.
of ‘successful’ outcomes (3) and the total number of possible outcomes (6)
and taking the ratio $\frac{3}{6}$, giving the probability of getting an even number
as $\frac{1}{2} = 0.5$. The expected value, essentially the long run average, when
we throw the die is obtained by taking each possible outcome, multiplying
it by the probability that this outcome occurs, and then summing.
For a fair die, each outcome has an equal probability $\frac{1}{6}$ of occurring, so we
get an expected value of $1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = 3\frac{1}{2}$.
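The counting argument for the fair die can be written out as a short Python sketch using exact fractions. The variable names `outcomes`, `p`, `prob_even` and `expected` are assumptions made for this illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)     # the six faces of the die
p = Fraction(1, 6)         # each face is equally likely

# Probability of an even number: add up the probabilities of 2, 4 and 6.
prob_even = sum(p for k in outcomes if k % 2 == 0)

# Expected value: each outcome times its probability, then summed.
expected = sum(k * p for k in outcomes)

print(prob_even, expected)
```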
However, if the experiment involves an interval of possible outcomes then
we can’t calculate probabilities just by counting. For example, suppose
a random number generator produces a number between 0 and 1. There
are an infinite number of possible outcomes. The probability of any one
of them occurring is zero, but we can measure the probability that the
outcome will lie in any given interval by measuring the length of that
interval: the probability that we get a number between $\frac{7}{10}$ and $\frac{8}{10}$,
for example, is $\frac{1}{10}$. The probability that the outcome lies between $a$ and $b$
can actually be written as $\int_a^b 1\,dx$, a definite integral which evaluates to
$b - a$. The constant integrand here, 1, reflects the fact that all outcomes
are equally likely.
But this is not always the case. Perhaps we have a weighted random
number generator with numbers close to zero more likely to occur. This
would be reflected by a different function in the integrand, for example
$\int_a^b (2 - 2x)\,dx$. In this context, the integrand is called a probability density
function. It should be a positive function and its integral over the interval
of possible outcomes should equal one (because the probability of some
number being generated equals 1, i.e. it is a certainty). You can check that
this is the case with our weighted random number generator: $2 - 2x \ge 0$
for $x \in [0, 1]$, and
$$\int_0^1 (2 - 2x)\,dx = \left[2x - x^2\right]_0^1 = (2 - 1) - (0 - 0) = 1.$$
The probability that this random number generator gives an answer between
$\frac{7}{10}$ and $\frac{8}{10}$ is
$$\int_{7/10}^{8/10} (2 - 2x)\,dx = \left[2x - x^2\right]_{7/10}^{8/10} = \left(\tfrac{8}{5} - \tfrac{64}{100}\right) - \left(\tfrac{7}{5} - \tfrac{49}{100}\right) = \tfrac{1}{20}.$$
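Probabilities for the weighted generator can be checked by evaluating the anti-derivative $2x - x^2$ at the endpoints. The Python sketch below does this with exact fractions; the function names `F` and `probability` are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

def F(x):
    """Anti-derivative 2x - x^2 of the density 2 - 2x."""
    return 2 * x - x * x

def probability(a, b):
    """Probability that the weighted generator returns a number in [a, b]."""
    return F(b) - F(a)

print(probability(Fraction(0), Fraction(1)))            # total probability
print(probability(Fraction(7, 10), Fraction(8, 10)))    # chance of landing in [0.7, 0.8]
```

Comparing the second value with the $\frac{1}{10}$ obtained for the fair generator shows how the weighting makes numbers near 1 less likely.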
So ‘on average’ we get $\frac{1}{2}$ from our fair random number generator.
For our weighted random number generator, the expected value would
be $\int_0^1 x(2 - 2x)\,dx$. Can you work this out?
[Figure B.6: moving the quadratic does not change the area underneath.]
$$= \tfrac{1}{3}h(2ah^2 + 6c). \tag{B.9}$$
Now use the fact that the quadratic passes through $(-h, y_0)$, $(0, y_1)$ and
$(h, y_2)$ to get
$$y_0 = ah^2 - bh + c \tag{B.10}$$
$$y_1 = c \tag{B.11}$$
$$y_2 = ah^2 + bh + c. \tag{B.12}$$
Adding (B.10) to (B.12) gives $y_0 + y_2 = 2ah^2 + 2c$. Now add four times
(B.11) to get $y_0 + 4y_1 + y_2 = 2ah^2 + 6c$. Compare this to (B.9). We see
that the area under the quadratic is $\frac{1}{3}h(y_0 + 4y_1 + y_2)$.
Now this is only an approximation to the area over the first two subintervals
$[x_0, x_1]$ and $[x_1, x_2]$. The area over the next two will be $\frac{1}{3}h(y_2 + 4y_3 + y_4)$.
We keep doing this up to the last pair of subintervals. Therefore the total
area $S_n$ over the $n = 2m$ subintervals will be
$$\begin{aligned}
S_n &= \tfrac{1}{3}h(y_0 + 4y_1 + y_2 + y_2 + 4y_3 + y_4 + y_4 + 4y_5 + \cdots + 4y_{2m-1} + y_{2m}) \\
    &= \tfrac{1}{3}h(y_0 + 4y_1 + 2y_2 + 4y_3 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n).
\end{aligned}$$