
Calculus (Online)

MATH10400

Dr Richard Smith

May 20, 2023

Drs Michael Mackey and Richard Smith copyright © 2022/23.

Acknowledgements
I am most grateful to Dr Michael Mackey, who lectured this module in
2015/16 and 2016/17, and who very kindly provided me with full access
to the content that he created.
All figures are the author’s.

Contents

0 Programme overview
0.1 Programme outline
0.2 Assessment and grading
0.3 Continuous assessment schedules
0.4 Discussion boards and MathJax
0.5 Any other business

1 Preliminaries
1.1 Numbers
1.2 Sets
1.3 Functions
1.4 Variables
1.5 Algebraic manipulation
1.6 Powers (or Laws of Indices)
1.7 Fractions
1.8 Solving equations
1.9 Graphing functions

2 Functions and Limits
2.1 Functions
2.2 Limits of functions
2.3 Continuity

3 Differentiation
3.1 Rates of change
3.2 Differentiation from First Principles
3.3 Rules for differentiating
3.4 The Chain Rule

4 More about the Derivative
4.1 Critical Points, local maxima and minima
4.2 Finding Maxima and Minima
4.3 The Mean Value Theorem
4.4 Linear Approximation
4.5 Logarithmic and Implicit Differentiation

5 Functions of Several Variables
5.1 Graphs
5.2 The vector spaces $\mathbb{R}^2$ and $\mathbb{R}^3$
5.3 Partial Derivatives
5.4 Critical Points
5.5 An application – Least Squares

6 Integration
6.1 Indefinite integrals
6.2 Riemann sums and definite integrals
6.3 The Fundamental Theorem of Calculus

7 Methods of Integration
7.1 Integration by Substitution
7.2 Integration by Parts

8 Numerical Techniques
8.1 Solving equations numerically
8.2 Integrating numerically

A Discussion board and WeBWorK guides
A.1 How to use the Moodle discussion boards
A.2 How to use WeBWorK

B Additional material (non-examinable)
B.1 Additional proofs
B.2 Additional concepts

Chapter 0

Programme overview

0.1 Programme outline


Linear Algebra (Online) and Calculus (Online) comprise the Professional
Certificate in Mathematics for Data Analytics and Statistics. The purpose
of these modules is to teach the student fundamental concepts and techniques
from linear algebra and calculus that are, for instance, necessary
for the study of multivariate statistics. While the material in the modules
will be quite generic in nature (and thus applicable to many other
fields), the reader will find in some of the appendices specific techniques
in multivariate statistics (e.g. MATH10390 Appendix B on Principal Component
Analysis), that draw together many of the topics covered in the
programme as a whole, either directly or indirectly.

[Margin note: It should be possible to link to anything in blue.]

MATH10390 Linear Algebra (Online) outline


• Matrices 1
Matrix arithmetic, determinants of n × n matrices and their com-
putation for small n, and the adjugate method of finding inverses.
Symmetric and orthogonal matrices.

• Vector geometry 1
Vectors in n-dimensional Euclidean space, vector arithmetic, scalar
products, the Cauchy-Schwarz inequality, angles between vectors,
and the action of matrices on vectors.


• Systems of linear equations


Solutions of systems of linear equations by Gaussian elimination,
connections between matrices and linear systems, including matrix
rank.

• Vector geometry 2
Orthonormal lists of vectors, orthonormal bases of $\mathbb{R}^n$ and coordinate
systems.

• Eigenvalues and eigenvectors of matrices


Eigenvalues and eigenvectors of n × n matrices, and their compu-
tation for n = 2 and n = 3. Symmetric matrices and orthonormal
bases of eigenvectors.

• Matrices 2
Quadratic forms and matrix norms.

MATH10400 Calculus (Online) outline


• Functions and Limits
Functions, domain, codomain, algebra of functions, injectivity, sur-
jectivity, inverses, limits, algebra of limits, continuity, polynomials,
rational functions, trigonometric, exponential and logarithmic func-
tions.

• Differentiation
Rates of change, differentiation from first principles, relationship
with continuity, the power, product, quotient and chain rules, deriva-
tives of polynomials, trigonometric, exponential and logarithmic func-
tions, and composites thereof.

• More about the derivative


Critical points, finding local maxima, minima and inflection points,
higher order derivatives, Rolle’s Theorem, the Mean Value Theorem,
linear approximation, and logarithmic and implicit differentiation.

• Functions of several variables


Functions on $\mathbb{R}^n$ (mostly n = 2) and their graphs, partial derivatives,
gradients, critical points and their classification, second order
partial derivatives, Hessian matrices, lines of best fit, least squares.

• Integration
Indefinite integrals as antiderivatives, standard examples, Riemann
sums, definite integrals and area, the Fundamental Theorem of Cal-
culus.

• Methods of integration
Integration by substitution and integration by parts.

• Numerical techniques
Solving equations numerically, the bisection and Newton-Raphson
methods, numerical integration, the trapezoidal rule and Simpson’s
rule.

Examples will be peppered throughout the two modules. While there


will be some theory, the emphasis will be on the introduction of ideas
and techniques. A small number of mathematical proofs are included, but
they will be relatively straightforward, and will be specific applications
of the techniques introduced during the course of the modules. Any of
the deeper, more involved proofs that would belong to more theoretical
courses on linear algebra or calculus will be confined to Appendix C.1
and Appendix B.1, respectively, should the reader be interested. These
appendices, replete with dark secrets and forbidden magic, will not be
examinable!

Lecturer – Dr Richard Smith


I am on the right. My email address is [email protected]
and the address of my website is https://maths.ucd.ie/~rsmith.
Please only use my email address in the event of an emergency! See
point 4 below under ‘Online material’, and Section 0.4 for details on how
to pose queries concerning the module.
My office is S1.71, first floor, Science Centre South. It is in building 11,
square 6D on the most recent version of the UCD map available.
In addition, the Mathematics and Statistics School Office is in G03,
ground floor, Science North, in building 65, square 6C on the map.

[Margin note: In keeping with ancient academic tradition, the photo is
comfortably out of date.]

© 2023 Michael Mackey and Richard Smith. These notes are for personal use only and should not be circulated.

Figure 0.1: School office: G03, ground floor, Science North

Online material
1. UCD Mathematics Moodle
All module material will be made available on

UCD Mathematics Moodle (https://vector.ucd.ie/moodle).

Once at the site, please log in using your UCD Connect creden-
tials and then enrol to both modules (the enrolment key for both is
‘ucdprofcert2022’).

2. Lecture notes and videos


The full set of lecture notes for each module will be made available
when the programme opens. At 9am (summer time, i.e. 08:00 GMT)
on each Monday of the first eight weeks of term, a set of short videos
covering the central topics from the notes in further detail will be
released. Note that this material is intended to be absorbed over
the 12-week summer teaching term; in particular, the assessment
will be spread across the 12 weeks. The compressed schedule is
designed to allow people to look ahead if they wish.

[Margin note: You will see a series of exercises in the notes
themselves. You do not need to submit solutions to these.]

3. Continuous assessment
Continuous assessment comes in two forms: written homework and
WeBWorK. Both will be issued and managed online – see Section
0.2 for more details. The full schedule of issue dates and assessment
deadlines is given in Section 0.3.

4. Discussion boards
Students can post queries and discuss topics via the weekly dis-
cussion boards – see Section 0.4 for more information.

5. News and announcements


I will make class announcements via the ‘MATH10390 announce-
ments’ and ‘MATH10400 announcements’ discussion boards at the
top of each site’s main page, respectively. These announcements
will be repeated in the ‘Latest news’ boxes to the right hand side.

Mobile access to online material


We recommend that you view UCD Mathematics Moodle via a web brow-
ser on a desktop or laptop computer, or tablet.
Moodle does have an app, available for both iOS and Android mobile
platforms. However, be warned that its functionality is limited: it does
not work with WeBWorK all that well, and there is no rendering of math-
ematical notation via MathJax (see Section 0.4). Given the app’s limita-
tions and given my strong doubts about whether smartphones can really
aid the acquisition of deep knowledge and understanding, I am simply
making you aware of the app rather than actively promoting it.
To gain access to UCD Mathematics Moodle via the app, please enter

vector.ucd.ie/moodle

when prompted for the URL, and then log in as usual.

0.2 Assessment and grading


The proportion of marks allocated to the various assessment components
will be the same for both modules. On the other hand, the modules’
assessment deadlines will be different – see Section 0.3 below.


Homework (20% of final mark)


Four homework sheets will be issued on Moodle over the course of the
module. Each is worth 5% of your final mark. You will receive a mark out
of 5 for each sheet. The only possible exception to this is MATH10390
Homework Sheet 4, where you may receive an additional 2 bonus marks,
owing to the length of the sheet.
Written solutions to the homework should be scanned to pdf files and
submitted online via Moodle. As well as ordinary scanners, there are
some apps (free or ‘freemium’) that use smartphone cameras to scan doc-
uments to pdf, such as CamScanner. Alternatively, with a suitable app,
you can write solutions directly onto a touchscreen device, such as a
tablet, and export to a pdf file.
Please ensure that your student number and solutions are clearly visible,
otherwise, you may lose marks! For each homework assignment, you
must submit exactly one pdf file containing your solutions and accept
the submission declaration before clicking the submit button (see the
end of Section 0.5). The maximum size of files to be uploaded is 10 MB,
which should be plenty.
Homework issue dates and submission deadlines are given in Section 0.3
below. Marks and feedback on submitted solutions will be provided on a
rolling basis.

WeBWorK (10% of final mark)


WeBWorK is an online homework system, again available via Moodle.
Five WeBWorK homework sets will be issued over the course of the mod-
ule. Each set is worth 2% of your final mark. You will receive a mark out
of 2 for each set.
Solutions are entered directly online. For advice on entering answers,
and comments on certain questions, please see the WeBWorK guide in
Appendix A.2 (of either module).
WeBWorK set issue dates and submission deadlines are given in Section
0.3 below. Answers to a given WeBWorK set will be released immediately
after the corresponding deadline.

Final exam (70% of final mark)


For the past two years exams for the two modules in this programme have
been conducted online. This was done out of necessity in response to the
pandemic. However, the intention this year is to resume the normal state
of affairs, which is for the exams to be held in person. Online exams are
problematic because they make it very difficult to protect the integrity of
assessment; unfortunately, plagiarism has been committed during the running
of online exams in this programme.
The two final 2-hour written exams will take place from 10am – 12 noon,
and 2pm – 4pm, on Friday 19 August 2022, in room H2.38 SCH, UCD Science
Centre (Hub), Belfield Campus, University College Dublin (building 64,
square 6D on the UCD map). No alternative exam date will be offered.
When travel to Dublin for the final exams is not possible, examination in
appropriate third party centres may be facilitated. Such arrangements
will need to be made well in advance of the exam and cannot be guaran-
teed. Contact Laura Barnes ([email protected]) by Friday 3
June (week 3) to enquire.

Grading
You will receive a mark out of 30 for your continuous assessment, which
will be converted to a letter grade according to the University’s Standard
Conversion Grade Scale (see under Mark to Grade Conversion Scales).
Likewise, you will receive a mark out of 70 for your final exam which will
be converted into a letter grade in the same manner. These two letter
grades will be combined to make an overall module grade; the precise
mechanism by which this will be achieved can be seen under Module
Grade Calculation Points.

0.3 Continuous assessment schedules


All MATH10390 continuous assessment issue dates and deadlines will fall
at 9am (summer time, i.e. 08:00 GMT) on Tuesdays. All MATH10400
issue dates and deadlines will fall at 9am on Wednesdays.
Two weeks are given to complete each WeBWorK set. There are no
‘overall submit’ buttons. To obtain full credit for a set, simply enter the
correct solutions online before the deadline.
The amount of time given to complete written homework assignments
varies. Early submission of written homework assignments is strongly
encouraged, but the formal deadlines are structured in a way that allows
students some flexibility in making their own work plan. The complete
schedules are given in the tables below.

[Margin note: The deadlines start to pile up towards the end of the
modules. Please be mindful of this!]


The WeBWorK deadlines are hard deadlines. Regarding homework deadlines,
if homework is submitted late, the mark it can earn will decrease linearly
with lateness, reaching zero at 48 hours after the deadline. For example,
a piece of work ordinarily worth 5% will receive 2.5% if submitted 24 hours
late, and 1.25% if submitted 36 hours late, and so on.
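
The late-submission rule above can be sketched in a few lines of Python. This is only an illustration of the linear rule as stated, not an official calculator: the function name `late_mark` and its parameter names are made up for this example.

```python
def late_mark(full_mark: float, hours_late: float, cutoff_hours: float = 48.0) -> float:
    """Mark awarded under a linear late penalty (illustrative sketch only).

    The mark falls in a straight line from full_mark at the deadline
    to zero at cutoff_hours after the deadline.
    """
    if hours_late <= 0:
        return full_mark          # on time: no penalty
    if hours_late >= cutoff_hours:
        return 0.0                # 48 hours late or more: nothing left
    return full_mark * (1 - hours_late / cutoff_hours)


print(late_mark(5, 24))  # 2.5
print(late_mark(5, 36))  # 1.25
```

The two printed values reproduce the worked example in the text (a 5% sheet submitted 24 and 36 hours late).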
WeBWorK marks and feedback will be returned immediately after the
deadlines, and homework marks and feedback will be returned within 10
working days of the deadlines. For this reason UCD’s Late Submission
of Coursework Policy will not apply to these modules (see point 6.1 in
the policy). Penalties for late submission may be waived if the student
has valid extenuating circumstances (see Section 0.5).

MATH10390 assessment schedule (9am Tuesdays)

Week  Date   Assignment issue         Assignment deadline
2     24-05  WeBWorK 1                -
3     31-05  Homework 1               -
4     07-06  WeBWorK 2                WeBWorK 1
5     14-06  Homework 2               -
6     21-06  WeBWorK 3, Homework 3    WeBWorK 2
7     -      -                        -
8     05-07  WeBWorK 4, Homework 4    WeBWorK 3, Homework 1
9     12-07  -                        Homework 2
10    19-07  WeBWorK 5                WeBWorK 4, Homework 3
11    -      -                        -
12    02-08  -                        WeBWorK 5, Homework 4

[Margin note: Again, while the videos are compressed into an 8-week
period, the continuous assessment is spread throughout the 12-week
summer teaching term.]

[Margin note: The homework deadlines for both modules are closely
aligned, with the exception of homework sheet 4.]

MATH10400 assessment schedule (9am Wednesdays)

Week  Date   Assignment issue         Assignment deadline
2     25-05  WeBWorK 1                -
3     01-06  Homework 1               -
4     08-06  WeBWorK 2                WeBWorK 1
5     15-06  Homework 2               -
6     22-06  WeBWorK 3, Homework 3    WeBWorK 2
7     -      -                        -
8     06-07  WeBWorK 4, Homework 4    WeBWorK 3, Homework 1
9     13-07  -                        Homework 2
10    20-07  WeBWorK 5                WeBWorK 4, Homework 3
11    27-07  -                        Homework 4
12    03-08  -                        WeBWorK 5

0.4 Discussion boards and MathJax


Weekly discussion boards
Each module will have its own set of discussion boards. Each week will
be given its own discussion board to keep conversations focussed. If you
have a query about the module or about its content, you are strongly
encouraged to post your query to the discussion boards. Please don’t
be afraid to ask questions!! From personal experience, I know that it can
feel daunting to ask questions (especially in an online environment), but
asking questions really is an excellent way of improving understanding!
Ordinarily, these boards will be monitored, and queries posted to them
will be answered, for up to two hours in the afternoon, Monday to Friday,
depending on the volume of queries. If I take leave at any point then I
will let you know via the ‘MATH10390 announcements’ and ‘MATH10400
announcements’ discussion boards and will arrange appropriate cover.
Please only contact me by email in the event of an emergency!
I will not monitor the boards at other times, or at weekends! Of course,
the online nature of these modules means that you can view module
content and work through assignments at any time, day or night. In
contrast, the lowly human behind it all (i.e. that person in the photo)
cannot be on hand at all times as well. Please do take this into account,
particularly when the Tuesday and Wednesday deadlines loom!

Posting mathematical content


The Moodle forums have a system called MathJax that allows people
to write mathematical notation directly into web pages, which enables
mathematical queries to be easily and clearly stated. Details of how to
use MathJax in the discussion boards are given in Appendix A.1 (of either
module).
You’re free to use MathJax to post mathematical content. Alternatively
you can post such content by writing it by hand and scanning it to a pdf
(see above) or by using a suitable pdf annotator, and then attaching the
pdf file to your post. This option may be preferable if you want to write
a lot of mathematical content.

0.5 Any other business


Suggested further reading
Regarding books, neither module formally follows a textbook. However, I
can suggest Anton, Rorres, Elementary Linear Algebra, Applications Ver-
sion, Chapters 1-3 and parts of Chapters 5 and 7 for MATH10390, and
Anton, Bivens, Davis, Calculus Early Transcendentals, parts of Chapters
0-5, 7 and 13 for MATH10400. It often helps to see concepts approached
from a second, slightly different perspective, and there are plenty of
exercises to practise on.

Calculators permissible in the final exams


There is of course a huge range of calculators available and it is unre-
alistic to provide an explicit list of those that will be permissible in the
final exams. Generally speaking, the calculators that are not permissi-
ble are programmable ones or ones that are capable of more advanced
built-in functionality. As an example, the Casio fx-83GT PLUS model is
permissible, but the fx-991ES PLUS is not. If in doubt, please ask me
on the discussion boards. Of course, use of smartphones in the exams is
completely banned!

Registration, fee payment and withdrawals


Please confirm your personal details (including email address and photo)
and pay your programme fees using UCD’s Student Information System.
As this is a one-semester programme, payment is required in full before
it starts on 16 May. For further assistance, please contact Laura Barnes
([email protected]) or see UCD’s guide to online registration
and fee payment. Further information on fee payment and deadlines can
be found on the UCD Fees office website.

If you wish to withdraw from the programme, then please note that it is
essential to do so by Friday 5 August (week 12), to ensure you do not
have a failing grade recorded against your name on the University’s sys-
tem. Since this programme is only one trimester long, it is not possible
to get a refund upon withdrawal; see point 1.7 of UCD’s Refunds Policy.

Extenuating circumstances
The University has an Extenuating Circumstances Policy. The Univer-
sity defines extenuating circumstances to be ‘serious unforeseen circum-
stances beyond your control which prevented you from meeting the re-
quirements of your programme’. Note the following footnote on page 2
of the Guidance Notes for Students:

Work commitments are not normally considered to be extenu-


ating circumstances. However a student on a part-time and/or
continuing professional education programme may have work-
related extenuating circumstances outside of the norm (e.g. a
work-related court case that they legally must attend) and in
these exceptional cases, they should consult the appropriate
programme/school office for advice.

You can apply for extenuating circumstances online. For more details,
please contact Laura Barnes ([email protected]).

UCD Student Code and plagiarism


Concerning conduct and plagiarism in particular, please see the Univer-
sity’s Student Code and specifically its Student Plagiarism Policy. In
addition to these documents, the School of Mathematics and Statistics
has its own Plagiarism Protocol. Please familiarise yourselves with the
second and third documents. The Library also has resources and advice
to help people avoid unintentional plagiarism.
In accordance with the School’s protocol (see §2.2), you will need to
accept the submission declaration before clicking the submit button. In
doing so, you acknowledge that you have neither given, sought, nor re-
ceived, aid in order to complete the assessment.

Chapter 1

Preliminaries

1.1 Numbers
Where to begin a maths course? Numbers might be a good choice. They
are at least familiar to us, but questions such as ‘how many numbers are
there?’ are more subtle than they first appear while the question of ‘what
is a number?’ borders on philosophy. In this module we will sidestep
such questions by referring to Kronecker’s quotation and accepting that
some numbers at least are given to us. We can then construct and distinguish
between different types of number: integers, fractions, negative
numbers, irrational numbers, complex numbers, prime numbers, transcendental
numbers, algebraic numbers, quaternions, and many more, and
proceed to do things with these numbers, such as add them, or multiply
them, or find our mobile numbers in the decimal expansion of π. Doing
things with numbers is where functions make their entrance.

[Margin note: Leopold Kronecker (1823 – 1891) was a German mathematician
whose primary contributions were in number theory and algebra. In a
critique of the emerging field of ‘set theory’, he is said to have uttered
‘God made the integers, all else is the work of man.’ Image source:
Wikipedia.]

The material in this chapter is a condensation of a lot of background
material that you may, or may not, be familiar with. It may prove a useful
reference later. Don’t worry about all the details as you read through,
but do try to gauge for yourself how much of it is familiar. Also, while
prior knowledge can be helpful, you should always be open to seeing
new light through old windows.

Number systems
The counting numbers or natural numbers are 1, 2, 3, 4, . . . . We give
the set of all such numbers the name $\mathbb{N}$ for short. Whenever we add or
multiply two natural numbers we get another natural number.

[Margin note: That’s ‘$\mathbb{N}$’ for Natural. And yes – that strange font is
used deliberately: $x$, $n$, $N$ can all represent different things at
different times, but $\mathbb{N}$, in this typeface, always means the natural
numbers.]

It is hard to imagine how many natural numbers there are. The number
1 followed by one hundred zeros (more easily written as $10^{100}$) is quite a
big natural number – the number of the atoms in the universe is estimated
to be much less. And yet, the number of ways of ordering a class of 70
students is much greater, so extraordinarily large numbers can crop up
in common situations.

[Margin note: The number $10^{100}$ is known as a googol.]
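
For readers who like to check such claims, here is a small Python sketch. It assumes only that the number of ways of ordering 70 students is 70! (70 factorial); the variable names are chosen for this example. Python integers have arbitrary precision, so the comparison is exact.

```python
import math

googol = 10 ** 100                 # 1 followed by one hundred zeros
orderings = math.factorial(70)     # ways of ordering a class of 70 students

print(len(str(googol)))            # 101 (a 1 and one hundred zeros)
print(orderings > googol)          # True
print(math.factorial(69) > googol) # False: 70 is the first class size to pass a googol
```

So 70! does exceed a googol (by a factor of roughly 1.2), and both are far beyond the usual estimates for the number of atoms in the universe.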
If one wants to subtract natural numbers, one has to allow for 0 and neg-
ative numbers. We get the set of integers . . . , −3, −2, −1, 0, 1, 2, 3, 4, . . . .
The short name for the integers is $\mathbb{Z}$, coming from the German word
‘Zahlen’.
We can add, multiply and subtract any two integers to get another inte-
ger. Multiplying two negative numbers gives a positive number: −2 ×
−3 = +6 = 6. Multiplying a positive by a negative gives a nega-
tive: 5 × −3 = −15.
The rules above were not chosen by anyone – they are an inevitable
consequence of the axioms (fundamental properties) of numbers.
Dividing one integer by another may not give an integer so we have to
consider the fractions. The fractions, or quotient numbers, also called
the rational numbers (from ‘ratio’) are numbers which can be written as
one integer divided by another (non-zero) integer. For example

\[ \frac{1}{2}, \quad -\frac{3}{2}, \quad \frac{12}{1} = 12, \quad \frac{31415927}{10000000} = 3.1415927, \]

are all rational numbers.

Warning 1.1. Division by zero is not allowed!

The shorthand name for the set of rational numbers is $\mathbb{Q}$ (for ‘quotient’).
The number 6.3567 is in $\mathbb{Q}$ because it can be written as $\frac{63567}{10000}$. Similarly,
every number your calculator displays is a rational number. However,
your calculator lies! If you ask your calculator for $\sqrt{2}$ it will happily
display the rational number 1.414213562, but this is only an approximation.
The amazing fact is, as proven by the School of Pythagoras, $\sqrt{2}$ is not
a rational number, and so can never be fully displayed by a calculator.
Likewise, $\pi$ is not a rational number, and $\pi \neq 3.141592654$.

[Margin note: Pythagoras of Samos (c. 570 BC – c. 495 BC). Though hugely
influential, many results traditionally attributed to him probably
originated earlier or were discovered by members of his school. Image
source: Totally History.]

There are many ways to write each element of $\mathbb{Q}$, for example,
$\frac{3}{7} = \frac{-6}{-14} = \frac{9}{21}$
and so on. However it is always possible to ‘cancel’ into the top and
bottom (read more about that in Section 1.7) to write a rational in its
lowest form, i.e. write it using the smallest possible integers. Notice
when this is done, at least one of the integers will be odd.
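
The remarks above about lowest form and calculator displays can be tried out with Python’s standard `fractions` module, which stores rationals exactly and always reduces them to lowest form. This is just an illustrative sketch; the variable names are chosen for this example.

```python
from fractions import Fraction

# Three ways of writing the same rational number; each is reduced to 3/7.
forms = [Fraction(3, 7), Fraction(-6, -14), Fraction(9, 21)]
print(forms[1])                    # 3/7
print(len(set(forms)))             # 1, i.e. they are all the same number

# A terminating decimal is a rational number.
x = Fraction("6.3567")
print(x)                           # 63567/10000

# In lowest form, at least one of numerator and denominator is odd.
print(x.numerator % 2 == 1 or x.denominator % 2 == 1)   # True

# The calculator's display for the square root of 2 is rational,
# so its square cannot be exactly 2.
approx = Fraction("1.414213562")
print(approx ** 2 == 2)            # False
```

The last line is the exact-arithmetic version of ‘your calculator lies’: the displayed value squares to something very close to, but not equal to, 2.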
The set of natural numbers $\mathbb{N}$ is a subset of the set of integers $\mathbb{Z}$, and
likewise $\mathbb{Z}$ is a subset of $\mathbb{Q}$. We write

\[ \mathbb{N} \subseteq \mathbb{Z} \subseteq \mathbb{Q}. \]

These numbers can be marked on a number line.

Figure 1.1: part of the number line
[Figure: the number line from −4 to 4, with the integers and the points
$\frac{3}{7}$, $\sqrt{2}$ and $\pi$ marked.]

By the famous Theorem of Pythagoras, you can actually use a so-called
‘straight edge and compass’ construction to mark $\sqrt{2}$ on the number
line with perfect accuracy. This means, as pointed out above, there are
numbers on the number line which are not rational numbers (with $\sqrt{2}$ and
$\pi$ being examples). The shorthand name for all numbers on the number
line is $\mathbb{R}$, and they are usually referred to as the real numbers.

[Margin note: Actually, in some sense, which can be defined precisely,
‘most’ numbers are irrational.]

1.2 Sets
Set notation
A collection of objects is called a set. For example, $\mathbb{N}$ and $\mathbb{Z}$ and $\mathbb{Q}$ and
$\mathbb{R}$ are sets of numbers. A set may be given explicitly as in

\[ A = \{\text{Curly}, \text{Moe}, \text{Larry}\}, \]

or (more frequently) as a collection of objects which conform to a certain
rule, e.g.

\[ A = \{x : x \text{ is a member of the three stooges}\}. \]

A more pertinent example of this difference is, say, $\mathbb{N} = \{1, 2, 3, \dots\}$ while

\[ \mathbb{Q} = \left\{ \frac{a}{b} : a, b \in \mathbb{Z} \text{ and } b \neq 0 \right\}. \]

[Margin note: Should you need them, I have written some more detailed
notes on sets for undergraduates.]

It doesn’t matter in what order the elements of the set are listed –
{Larry, Curly, Moe} is the same set as {Curly, Larry, Moe}. Also, rep-
etitions are redundant, so {Larry, Curly, Larry, Larry, Moe, Moe} is the
same as the sets above. Though the order does not matter, many sets do
have a natural order which can help us to understand how they work, and
this is the case with the real numbers, which we think of as increasing
from left to right.

We use the notation $x \in A$ to mean $x$ is an element (or member) of the
set $A$. For example $2694 \in \mathbb{N}$. The notation $x \notin A$ means $x$ is not an
element of the set $A$; so $\sqrt{2} \notin \mathbb{Q}$.
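
The points above about order, repetition and membership carry over directly to Python’s built-in `set` type, which can be a handy way to experiment. The sketch below is only illustrative; the name "Shemp" is used purely as an example of something that is not in the set.

```python
a = {"Curly", "Moe", "Larry"}
b = {"Larry", "Curly", "Moe"}                             # same elements, different order
c = {"Larry", "Curly", "Larry", "Larry", "Moe", "Moe"}    # with repetitions

print(a == b)          # True: order does not matter
print(a == c)          # True: repetitions are redundant
print(len(c))          # 3

print("Moe" in a)      # True:  Moe is an element of A
print("Shemp" in a)    # False: Shemp is not an element of A
```

Python’s `in` plays the role of ∈ here, and `not in` the role of ∉. Only finite sets can be written out this way; sets such as the natural numbers have to be described by a rule instead.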

Interval notation
Because we deal with subsets of the real line so often, we have a special
notation to refer to intervals (that is, segments) of the line. For example,
the real numbers between, and including, 3 and 7 can be represented by
$[3, 7]$ for short, while the same interval of numbers, but excluding the two
endpoints is written $(3, 7)$. For reasons that are not obvious, the first is
called the closed interval from 3 to 7, and the second the open interval
from 3 to 7.

[Margin note: The notation $(3, 7)$ looks worryingly like that of an ordered
pair of numbers used, for instance, to represent a point in the
2-dimensional Cartesian plane (see Section 1.9). This is a feature of
mathematics: sometimes different things are represented using the same
notation. However, usually the correct interpretation follows from the
wider context.]

Definition 1.2. Given $a, b \in \mathbb{R}$, $a < b$, we have the bounded intervals

\[
\begin{aligned}
(a, b) &= \{x \in \mathbb{R} : a < x < b\}, \\
[a, b] &= \{x \in \mathbb{R} : a \leq x \leq b\}, \\
[a, b) &= \{x \in \mathbb{R} : a \leq x < b\}, \\
(a, b] &= \{x \in \mathbb{R} : a < x \leq b\}.
\end{aligned}
\]

We also can write unbounded intervals:

\[
\begin{aligned}
(a, \infty) &= \{x \in \mathbb{R} : a < x\}, \\
[a, \infty) &= \{x \in \mathbb{R} : a \leq x\}, \\
(-\infty, a) &= \{x \in \mathbb{R} : x < a\}, \\
(-\infty, a] &= \{x \in \mathbb{R} : x \leq a\}.
\end{aligned}
\]

Although it may not look like a big deal at first glance, including or
excluding the endpoints a and b is an important distinction; for example,
the closed interval [3, 7] has a biggest and smallest number, but the open
interval (3, 7) has neither!

Notice that we never have a square bracket about −∞ or ∞ because
infinity is excluded from the real numbers.

1.3 Functions
A function is a relationship between two sets. This, admittedly, is terri-
bly vague. The idea is that a function assigns to each element of a given
input set a unique element of another output set – a function takes some-
thing as input and produces output. In this most general of descriptions,
something like a radio

    electromagnetic wave (input) −→ FM radio −→ music (output)

or a mathematician

    coffee (input) −→ mathematician −→ theorem (output),

is a function, but of course, we will most often work with functions whose
inputs and outputs are numbers

    number (input) −→ square −→ number × number (output).

A function from a set A to a set B can be given by explicitly providing
a unique element of B for each element of A. For example, if
A = {Curly, Moe, Larry} and B = {Chocolate, Vanilla} then we can define
a function f from A to B (we write f : A → B) by

f(Curly) = Chocolate, f(Moe) = Chocolate, f(Larry) = Vanilla.
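
A finite function of this kind can be sketched in Python as a dictionary, which pairs each element of A with exactly one element of B. The names `A`, `B` and `f` simply mirror the text, and `is_function_from_A_to_B` is a hypothetical helper name; the snippet is an illustration only.

```python
# The sets A and B from the text.
A = {"Curly", "Moe", "Larry"}
B = {"Chocolate", "Vanilla"}

# A dictionary assigns exactly one value to each key, so it models a
# function f : A -> B that is given by listing its values explicitly.
f = {"Curly": "Chocolate", "Moe": "Chocolate", "Larry": "Vanilla"}

# Every element of A has an output, and every output lies in B.
is_function_from_A_to_B = set(f) == A and set(f.values()) <= B
print(f["Moe"], is_function_from_A_to_B)
```

Note that two inputs (Curly and Moe) may share the same output; what a function forbids is one input having two outputs.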

However, it is much more common to define a function by providing a rule
for an arbitrary element (or ‘variable’) of the first set. For example, the
same function f might be defined by simply providing the rule

f(x) = favourite ice cream flavour of x.

Indeed, if we are dealing with a very big or infinite set (e.g. the natural
numbers) then this is the only way one can define a function, e.g. f(x) =
x^2, or the piecewise-defined function

T(x) = 0                                    if x ≤ 12000,
T(x) = (0.27)(x − 12000)                    if 12000 < x ≤ 25000,
T(x) = (0.27)(13000) + (0.45)(x − 25000)    if x > 25000,

which is a simple tax calculator.
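
The piecewise rule for T translates almost line by line into code. The following is a minimal sketch, assuming incomes are given as plain numbers; the function name `tax` is chosen here for illustration.

```python
def tax(x):
    """Tax T(x) due on an income x, following the piecewise rule above."""
    if x <= 12000:
        return 0.0
    if x <= 25000:
        return 0.27 * (x - 12000)
    # Above 25000: 27% on the band from 12000 to 25000, then 45% on the rest.
    return 0.27 * 13000 + 0.45 * (x - 25000)


for income in (10000, 20000, 30000):
    print(income, round(tax(income), 2))
```

Each branch corresponds to one piece of the domain, and exactly one branch applies to any given input, so the output is unique.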


Why did we use such a vague definition of what a function is? Well, it is
not enough to consider only functions which send one number to another
number, like f(x) = x^2. We might want to send a pair of numbers to
a number – this is a function of two variables. But such functions are
already very familiar to you. Consider addition, which takes two numbers
and returns one number:

    (x, y) (input) −→ add −→ x + y (output).

Margin note: Sometimes we even need to send functions to other
functions...

In linear algebra, you will learn about matrices – these can be regarded
as functions which take an input list of numbers (a vector) and produce
an output vector.

Margin note: See MATH10390 Chapters 1 and 2, and Section 2.4 in
particular.

1.4 Variables
The relationship between numbers and variables (xs and ys) is like that
between nouns and pronouns in language. As mentioned above, the
easiest way to define a function on numbers is to make use of variables.
The following example illustrates the concept.

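Example 1.3.

Number / Proper Noun                         | Variable / Pronoun
---------------------------------------------+-------------------------
When you meet Moe, poke Moe in the eye.      | When you meet a stooge,
When you meet Curly, poke Curly in the eye.  | poke him in the eye.
When you meet Larry, poke Larry in the eye.  |
---------------------------------------------+-------------------------
f(1) = 5                                     |
f(2) = 7                                     |
f(2.35) = 7.7                                | f(x) = 2x + 3
f(11) = 25                                   |
...                                          |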

The lower case letters ‘x’ and ‘y’ are often used for variables. Functions
are usually denoted by f or g and sets by upper case A, B or C . This
is just a rule of thumb, but it explains why you often see f(x), but rarely
see x(f).

1.5 Algebraic manipulation


There are four basic arithmetic operations – multiplication, division, ad-
dition and subtraction (although these can be boiled down to just multi-
plication and addition) and there is a precedence in the order in which
these are to be carried out.

Fact 1.4. First multiply and divide, then add and subtract.

For instance 6 + 2 × 3 + 4 equals 6 + (2 × 3) + 4 = 6 + 6 + 4 = 16.


To change the order in which operations are performed, brackets are
used:
(6 + 2) × 3 + 4 = 28,
(6 + 2) × (3 + 4) = 56,
6 + 2 × (3 + 4) = 20.
You can see why it is so important to be careful with brackets.
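
Most programming languages follow the same precedence rule, so the four calculations above can be checked directly. This is a small illustrative sketch; the list name `results` is chosen here.

```python
# Python applies * and / before + and -, exactly as in Fact 1.4;
# brackets override that order.
results = [
    6 + 2 * 3 + 4,        # multiplication first
    (6 + 2) * 3 + 4,      # brackets around the first sum
    (6 + 2) * (3 + 4),    # brackets around both sums
    6 + 2 * (3 + 4),      # brackets around the second sum
]
print(results)
```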
Brackets are especially important when dealing with variables. For ex-
ample, 3 multiplied by x + y is written 3(x + y) and not 3x + y, otherwise
we could be replacing 3(4 + 5) = 27 with 3 × 4 + 5 = 17.
We can get rid of the brackets (this is called expanding) by using the
distributive law:
3(x + y) = 3x + 3y.
For example, 3(4 + 5) = 3 × 4 + 3 × 5.

Margin note: We don’t always put a multiplication sign, preferring
3(4 + 5) to 3 × (4 + 5). With numbers, the multiplication sign may be
necessary – obviously we can’t abbreviate 3 × 4 to 34. Sometimes a point
is used instead of ×, so 3 × x may be written as 3 · x or simply 3x. Of
course, 3 · 4 is ambiguous (3 × 4 or 3.4 = 3 2/5), so be careful!

Fact 1.5. The general form of the distributive law is

x(y + z) = xy + xz and (x + y)z = xz + yz.

Example 1.6. Expand (3 + 4)^2.

Solution.

(3 + 4)^2 = (3 + 4)(3 + 4)
          = (3 + 4)3 + (3 + 4)4
          = 3 · 3 + 4 · 3 + 3 · 4 + 4 · 4
          = 3^2 + 3 · 4 + 3 · 4 + 4^2
          = 3^2 + 2 · 3 · 4 + 4^2
          = 9 + 24 + 16 = 49. □

Margin note: Here, 4 · 3 and 3 · 4 mean 4 × 3 and 3 × 4, respectively!

Example 1.7. Expand (x + y)^2.

Solution.

(x + y)^2 = (x + y)(x + y)
          = x(x + y) + y(x + y)
          = x · x + x · y + y · x + y · y
          = x^2 + xy + xy + y^2
          = x^2 + 2xy + y^2. □

Figure 1.2 illustrates the above expansion geometrically where a blue
square of side x, a red square of side y and two green rectangles of sides
of length x and y together form a large square of side x + y. Equating the
area of the large square with the sum of areas of the component parts
gives (x + y)^2 = x^2 + 2xy + y^2.

Figure 1.2: expanding (x + y)^2 geometrically (a square of side x + y
divided into pieces of area x^2, xy, xy and y^2)

Margin note: Can you draw the analogous decomposition of a cube that
provides the expansion of (x + y)^3?

Exercise 1.8. Expand the following expressions.

1. (x + y)^3,

2. (x − y)^2,

3. (x + y)(x − y),

4. (a − b)(a^2 + ab + b^2) and

5. (a − b)(a^(n−1) + a^(n−2) b + · · · + a b^(n−2) + b^(n−1)).

Margin note: Some exercises will be presented in mauve boxes. These
exercises will not be graded. Feel free to discuss them on the discussion
boards. The same applies to passages in the text or solutions of examples
which ask you to verify certain things.

1.6 Powers (or Laws of Indices)


We have just used the notation x^2 to mean x multiplied by itself (i.e.
x^2 = x · x). This is used simply as a matter of convenience – it is easier
to recognise 2^10 than 2 · 2 · 2 · 2 · 2 · 2 · 2 · 2 · 2 · 2. In general, for
any x ∈ R and n ∈ N, we have

x^n = x · x · · · x   (n times).

A great advantage of this system is that two very neat formulae hold.

Fact 1.9. For n, m ∈ N,

x^n · x^m = x^(n+m)   (R1)
(x^n)^m = x^(nm)      (R2)

Margin note: In the expression a^b, the number a is referred to as the
base and b as the exponent (or power).

We want to examine now what we should mean by x^a when a is not a
natural number. For example, x^(−1) can’t mean ‘x multiplied by itself −1
times’, and how can x^(1/2) mean ‘x multiplied by itself half a time’? This
would be absurd.
How can we define the powers of x, x^q for q ∈ Q, in such a way that Fact
1.9 still holds? Well, let’s begin with x^0. What should this mean? If the
rule (R1) is to hold then x^0 · x^1 = x^1. In other words x^0 · x = x. But
what multiplies x to give x? The number 1 of course.

Definition 1.10. For any number x ∈ R,

x^0 = 1.

Margin note: There is no universally accepted value of 0^0. Some people
think that it should remain undefined. However, the arguments in favour of
setting 0^0 = 1 are reasonably strong as doing so yields several benefits.
We will accept that 0^0 = 1 in this programme.

Now for negative integers. What does x^(−n) mean for n ∈ N? For rule (R1)
to hold, we have

x^(−n) · x^n = x^(−n+n) = x^0 = 1,

and hence the following definition makes sense.


Definition 1.11. Given x ≠ 0 and n ∈ N,

x^(−n) = 1/x^n.

So for example, x^(−1) = 1/x, x^(−2) = 1/x^2, 10^(−6) = 1/1,000,000
and so on.

What about powers which are fractions? Well, taking x^(1/2) as an example,
by (R2) we must have

(x^(1/2))^2 = x^((1/2)·2) = x^1 = x,

so x^(1/2) should be a number which, when squared, gives us x. This
suggests √x, but things get a little complicated because √x is undefined
if x < 0, and if x > 0 then there are two possibilities: √x and −√x. To
avoid any doubt, we set x^(1/2) = √x if x ≥ 0 and leave it undefined if
x < 0.

Margin note: If x < 0 then there is no real number a such that a^2 = x,
and if x > 0 then there are two such numbers.



As well as square roots we can take cube roots. The cube root of x, ∛x,
is a number such that its cube is x, e.g. ∛8 = 2, ∛(−8) = −2. Unlike
square roots, we can take the cube root of a negative number and there is
always exactly one valid cube root.


Similarly we can define the nth root of a real number x to be that number
(call it ⁿ√x) so that when we raise it to the power of n we get x. The
existence and uniqueness of an nth root depends on whether n is odd or
even. If n is even then the nth root ⁿ√x will exist only if x ≥ 0, and for
x > 0 there will be two solutions, a positive and a negative one (we will
always take the positive one to be our value for ⁿ√x). If n is odd then
there will always be a unique nth root of x for any x ∈ R.
Returning to the question of what is meant by x^(1/n) for n ∈ N, we see
that if our two rules are to hold

(x^(1/n))^n = x^((1/n)·n) = x^1 = x,

then it makes sense to write x^(1/n) = ⁿ√x.
Now we can make sense of x^q for q ∈ Q. If q ∈ Q then q = m/n for some
m, n in Z, n > 0, and where m and n are in their lowest form.

Definition 1.12. For any number x ∈ R and any q ∈ Q written as
above,

x^q = (x^(1/n))^m = (ⁿ√x)^m.

This is undefined if either x < 0 and n is even, or x = 0 and m < 0.

Example 1.13.

(−8)^(2/3) = (∛(−8))^2 = (−2)^2 = 4,

16^(−1/4) = (16^(1/4))^(−1) = 2^(−1) = 1/2,

27^(−4/3) = (27^(1/3))^(−4) = 3^(−4) = (3^4)^(−1) = 81^(−1) = 1/81.
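
Definition 1.12 can be sketched in code. The helper names `real_root` and `rational_power` below are hypothetical, and floating-point arithmetic means the results are approximations. The sketch is needed because Python's built-in power operator returns a complex number for an expression such as `(-8) ** (2/3)`, which is not the real value the definition gives.

```python
def real_root(x, n):
    """Real nth root of x (n a positive integer), or None if undefined."""
    if x >= 0:
        return x ** (1.0 / n)
    if n % 2 == 1:
        # Odd roots of negative numbers exist: take the root of |x|, negate.
        return -((-x) ** (1.0 / n))
    return None  # even root of a negative number


def rational_power(x, m, n):
    """x^(m/n) as in Definition 1.12, with m/n in lowest terms and n > 0."""
    root = real_root(x, n)
    if root is None or (x == 0 and m < 0):
        return None  # undefined
    return root ** m


print(rational_power(-8, 2, 3), rational_power(16, -1, 4), rational_power(27, -4, 3))
```

The order of operations mirrors the definition: first the nth root, then the mth power.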

There is a lot more to the business of raising one number to the power
of another. It is covered further in Appendix B.2. There, we will give a
meaning to the expression x^y for any y ∈ R, not just for y ∈ Q.

1.7 Fractions
This is something we all learn in primary school, but it is surprising how
often mistakes are made. It is no harm to recall the rules.

1. Multiplying
Multiply the numerators (top) and the denominators (bottom):

    a/b · c/d = (ac)/(bd).

For example, 12/29 · 2/3 = 24/87 = 8/29.

2. Dividing
Invert and multiply:

    a/b ÷ c/d = a/b · d/c = (ad)/(bc).

For example,

    (11/28) / (3/7) = 11/28 · 7/3 = 77/84 = 11/12.

3. Adding and subtracting
The rule here is simple – ensure that the denominators are the
same (‘common denominator’), then add/subtract the numerators.
For example, to find 2/3 + 3/4 we make the denominators of both
fractions equal to 12. Now we have 8/12 + 9/12 = (8 + 9)/12 = 17/12.
Note we can find a common denominator by multiplying the two
denominators together.
This works with letters (variables) as well as numbers (constants).

    1/x − 1/(x + 1) = (x + 1)/(x(x + 1)) − x/(x(x + 1))
                    = (x + 1 − x)/(x(x + 1)) = 1/(x(x + 1)).
4. Cancelling
This is a major source of mistakes. The idea is to simplify a fraction,
for example, by writing 28/12 = 7/3 (cancelling the ‘4’s). But what are
we really doing? Well, essentially we are doing the following

    28/12 = (4 · 7)/(4 · 3) = 4/4 · 7/3 = 1 · 7/3 = 7/3.

That is, we use the rule for multiplication, the fact that x/x = 1 for
any number x ≠ 0, and the fact that 1 · y = y for any number y.
However the following ‘cancellation’ of the 3s is wrong:

    (6 + 7)/9 = (2 + 7)/3. (!)

Clearly it is wrong for 13/9 is not equal to 9/3 = 3. Even so, many
students would mistakenly write

    (2x + y)/(3x) = (2 + y)/3,

which is also certainly incorrect (we have just pointed out that it is
wrong when x = 3 and y = 7). Notice why it is wrong – the rule
violated when we incorrectly write (2x + y)/(3x) = (2 + y)/3 is to say that
2x + y = x(2 + y).
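
The numerical examples in the four rules above can be checked with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the variable names are chosen here for illustration.

```python
from fractions import Fraction

product = Fraction(12, 29) * Fraction(2, 3)    # multiply tops and bottoms
quotient = Fraction(11, 28) / Fraction(3, 7)   # invert and multiply
total = Fraction(2, 3) + Fraction(3, 4)        # common denominator 12
print(product, quotient, total)

# The tempting 'cancellation' (6 + 7)/9 = (2 + 7)/3 is false.
bad_cancellation_holds = Fraction(6 + 7, 9) == Fraction(2 + 7, 3)
print(bad_cancellation_holds)
```

`Fraction` reduces results to lowest terms automatically, which is the legitimate form of cancelling: a common factor of the whole numerator and the whole denominator is removed.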

Exercise 1.14. Show that 1/(x − 1) − 1/(x + 1) = 2/(x^2 − 1). Write down
1/9 − 1/11.

1.8 Solving equations


An equation, that is a mathematical sentence of the form
‘left hand side’ = ‘right hand side’,
may be true (2 × 3 = 6, 0 = 0, 70 = 0) or false (5 − 2 = π). Usually an
equation contains an unknown (or unknowns), such as x^3 + 3x^2 = 7 − 3x,
and a frequent source of entertainment is to try to solve a given equation
involving an unknown. What does this mean?

Fact 1.15. Solving an equation means to find all values of the un-
known for which the equation is true.

For example, solving the equation 2x + 1 = 11 gives the solution x = 5.
This equation has only one solution but bear in mind that an equation
may have more than one solution (x^2 = 4 has the solutions x = 2 and
x = −2, and x^3 = x has three solutions, 1, −1 and 0) or may have no
solution (x = x + 1).
The general method of solving an equation is to apply the same
mathematical operations to both sides of the equation to isolate the
variable (or unknown) on its own, or at least to change the equation to one
whose solution we can easily find. The operations we use to simplify the
equation must be reversible so we may add (or subtract) the same number
to (from) both sides of the equation and we may multiply or divide both
sides of the equation by the same non-zero number. We cannot multiply
both sides of the equation by 0 as this will just lead to the trivial
equation 0 = 0 (we lose all information by doing this). Neither can we
divide both sides by zero because division by zero doesn’t make sense.
This division by zero problem crops up in other situations. Consider the
equation
(x − 4)(x + 3) = 9(x − 4).
One is tempted to divide both sides by x − 4 but the problem is that x − 4
may equal zero, in which case we can’t divide by x − 4. Obviously, either
x − 4 = 0 or x − 4 ≠ 0. In the latter case, we can divide both sides by
x − 4 to get x + 3 = 9 and the solution x = 6. In the former case, we
notice the equation is true when x − 4 = 0 because it reads 0 = 0. Thus,
the solutions are x = 4 and x = 6.
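
A quick way to see that dividing by x − 4 would lose a solution is to test candidate values directly. The sketch below scans a range of integers; this is only a sanity check, not a proof, and the names `is_solution` and `solutions` are chosen here for illustration.

```python
def is_solution(x):
    """Check whether x satisfies (x - 4)(x + 3) = 9(x - 4)."""
    return (x - 4) * (x + 3) == 9 * (x - 4)


# Dividing by x - 4 would have left only x + 3 = 9, i.e. x = 6.
solutions = [x for x in range(-20, 21) if is_solution(x)]
print(solutions)
```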

Linear equations
A linear equation is one of the form ax + b = 0, where a and b are
given numbers with a ≠ 0, and x is the unknown we are solving for,
e.g. 11x − 3 = 0. Subtracting the number b from both sides we get
ax = −b and now dividing both sides by a, we get the solution x = −b/a.
So 11x − 3 = 0 has the solution x = 3/11.

Margin note: Solving several linear equations at once, with many
unknowns, is the subject of MATH10390 Chapter 3.

Example 1.16. More generally, an equation like 2x + 3 = 13 − 3x can
be converted to a linear equation and solved. In this example add 3x
to both sides to get 5x + 3 = 13, subtract 3 from both sides giving
5x = 10, and divide by 5 to get the solution x = 2.
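
The recipe x = −b/a can be written as a one-line function. This is a minimal sketch, assuming integer coefficients so that exact fractions can be used; the name `solve_linear` is chosen here for illustration.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve ax + b = 0 for x, where a and b are integers and a is non-zero."""
    if a == 0:
        raise ValueError("a must be non-zero for a linear equation")
    return Fraction(-b, a)


print(solve_linear(11, -3))
# 2x + 3 = 13 - 3x rearranges to 5x - 10 = 0.
print(solve_linear(5, -10))
```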

Quadratic equations

Definition 1.17. A function of the form

f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · + a_n x^n,

where each a_i is a given real number with a_n ≠ 0 is called a
polynomial of degree n (the degree is the highest power of x that appears).

For example f(x) = 9x − 3x^2 + 99x^5 is a polynomial of degree 5. The
coefficient of x^2 is −3.

Definition 1.18. A quadratic function is a polynomial of degree 2.
That is, a function of the form f(x) = ax^2 + bx + c where a, b and c
are given numbers with a ≠ 0.

For example f(x) = 3x − x^2, g(x) = 8x^2 + 56x − 73 and h(x) = x^2 − 1 are
all quadratic functions. A quadratic equation is one of the form

ax^2 + bx + c = 0,

where a ≠ 0. How do we solve a quadratic equation? The following
formula might be familiar.

Fact 1.19 (The Quadratic Formula). If f(x) = ax^2 + bx + c, where
a ≠ 0, then f(x) = 0 when

x = (−b + √(b^2 − 4ac))/(2a) or x = (−b − √(b^2 − 4ac))/(2a).

It appears that a quadratic equation has two solutions according to the
above formula but, if the quantity inside the square root b^2 − 4ac (the
so-called discriminant) is zero, then the two solutions coincide and the
quadratic has only one root, while if b^2 − 4ac is a negative number then
the quadratic equation has no solutions (since we can’t take the square
root of a negative number).

Margin note: Taking the square roots of negative numbers would lead us
to complex numbers, which have a vast and hugely applicable theory, but it
will not be considered in this programme.

Example 1.20.

1. Solve the quadratic equation 2x^2 − 10x + 12 = 0.

Solution. Here a = 2, b = −10 and c = 12. Since b^2 − 4ac =
(−10)^2 − 4 · 2 · 12 = 100 − 96 = 4, which is greater than zero,
the quadratic has two solutions (or roots):

    x = (−(−10) + √4)/(2 · 2) and x = (−(−10) − √4)/(2 · 2),

i.e. x = (10 + 2)/4 = 3 and x = (10 − 2)/4 = 2. □

Margin note: You can (and should) verify that these are solutions by
putting x = 3 into the original equation and checking that it is then
true, and likewise for x = 2.

2. Solve the quadratic equation x^2 + 2x + 2 = 0.

Solution. Here a = 1, b = 2 and c = 2. This time b^2 − 4ac =
2^2 − 4 · 1 · 2 = 4 − 8 = −4, which is negative. We can’t take
the square root of a negative number (in this programme), so in
fact this equation has no solutions. □
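
The three cases of the discriminant can be sketched as a short function. It works in floating point, so the roots are approximations in general, and the name `solve_quadratic` is chosen here for illustration.

```python
import math


def solve_quadratic(a, b, c):
    """Real solutions of ax^2 + bx + c = 0 (a non-zero), in increasing order."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real solutions
    if discriminant == 0:
        return [-b / (2 * a)]          # the two roots coincide
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


print(solve_quadratic(2, -10, 12))   # the first equation of Example 1.20
print(solve_quadratic(1, 2, 2))      # the second equation of Example 1.20
```

Returning a list makes the number of solutions (two, one or none) visible in the result itself.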

1.9 Graphing functions


Further to our earlier discussion of functions in Section 1.3, we give a
more formal definition.

Definition 1.21. Let A and B be non-empty sets. A function f from A
to B is a rule which assigns to each element x of A a unique element
f(x) of B. We write this as f : A → B.

In this course we will mostly be studying functions for which A and B
are subsets of R.

Consider two number lines placed at right angles to one another, the
point of intersection being the 0 on each line. The two lines give a plane
(the plane of this sheet of paper). The horizontal line is called the x-axis
and the vertical line the y-axis. Any point in the plane is specified by
two numbers, the x-coordinate of the point (how far to the right of the
y-axis the point is), and the y-coordinate (the height of the point above
the x-axis). The two numbers are then written as a pair, (x, y). The
x-coordinate always goes first, the y-coordinate second. For example, in
Figure 1.3, we plot the points (3, 2) and (−2, −1).

Margin note: Cartesian products are named after René Descartes (1596 –
1650), who invented what we know as the Cartesian coordinate system. We
are so used to Cartesian coordinates nowadays that it is difficult to
appreciate how revolutionary their introduction was.

Figure 1.3: (a bit of) the Cartesian plane, with the points (3, 2) and
(−2, −1) marked

A plane with this system of labelling is called the Cartesian plane, or
simply R^2, since it consists of all pairs of elements of R. That is, R^2 =
{(x, y) : x, y ∈ R}. The point of intersection of the two axes, (0, 0), often
written simply (and abusing notation slightly) as 0, is called the origin.

Margin note: See MATH10390 Section 2.1 for further discussion.

Margin note: Abusing mathematical notation is usually strongly
discouraged as it creates ambiguity, which is poison for mathematics.
Occasionally, given a clear enough context, it is permitted.

Definition 1.22. Let f : R → R be a function. The subset of R^2 given
by

    {(x, f(x)) : x ∈ R},

is called the graph of f.

The above is a mathematical definition. We usually interpret the graph
of a function as being a ‘picture’ of the function in the Cartesian plane.
In many ways, the graph of a function is a nicer thing than the formula
which defines it. Often it is easier to gain an intuitive understanding
of how the function behaves (e.g. if it is going up (increasing) or down
(decreasing), or if it is nice and smooth or has corners and jumps) by
looking at its graph, than by studying the written rule that defines it.

Margin note: Though for a deeper understanding, and to verify one’s
intuition, studying the written rule is often the only way.

Example 1.23. Let f : R → R be the function f(x) = 2 for all x ∈
R. Such functions are usually called ‘constant’ functions, since their
value does not change when x changes. The graph of this function is
the set {(x, 2) : x ∈ R}. Drawing this set in R^2 (by marking some of
the points and then ‘filling in’) we get the graph in Figure 1.4.

Figure 1.4: the graph of f(x) = 2

Remarks 1.24. The graph of a constant function is a horizontal straight
line.

Example 1.25. Let f : R → R be given by f(x) = 2x + 3. This function
is a polynomial of degree 1. Its graph is {(x, 2x + 3) : x ∈ R}. Taking
some values for x, say −3, 0, 2 and 4, we find the corresponding
points of the graph which are (−3, −3), (0, 3), (2, 7) and (4, 11).
Plotting these points on the plane, we see they lie in a straight line. Any
other points of the graph will also lie in this line and so filling in we
get the picture in Figure 1.5.
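
The points used in Example 1.25 can be generated, and their collinearity checked, with a few lines of code. This is an illustrative sketch; the names `xs`, `points` and `slopes` are chosen here.

```python
def f(x):
    return 2 * x + 3


xs = [-3, 0, 2, 4]
points = [(x, f(x)) for x in xs]   # pairs (x, f(x)) belonging to the graph
print(points)

# Consecutive points have the same slope, so they all lie on one line.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]
print(slopes)
```

The common slope is the coefficient of x in the formula for f.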

Figure 1.5: the graph of f(x) = 2x + 3, with the points (−3, −3), (0, 3),
(2, 7) and (4, 11) marked

Margin note: There is an interactive GeoGebra applet, written by Dr
Anthony Brown, that you can use to draw graphs of linear functions.

Example 1.26. Let f : R → R be given by f(x) = x^2 − 1. The function
f is a polynomial of degree 2, whose graph is shown in Figure 1.6.

Figure 1.6: the graph of f(x) = x^2 − 1

Margin note: There is another applet by Dr Anthony Brown that treats
quadratic functions.

Remarks 1.27. Polynomials of degree two are usually called quadratic
functions. Their graph is always either ∪-shaped (when the coefficient
of x^2 is positive) or ∩-shaped (when the coefficient of x^2 is
negative). Technically, the proper name for this shape is a parabola.

The points where the graph of a function f touches or crosses the x-axis
are called the roots of f. In other words, the roots of f are the values of
x for which f(x) = 0.

Example 1.28. From Figure 1.6, we see that the roots of the quadratic
x^2 − 1 are −1 and 1.

Example 1.29. A rectangular field has one of its sides 5 metres longer
than another. The area of the field is 2250 m^2. What are the
dimensions of the field?

Solution. Let x be the length of the short side. Then the long side
has length x + 5 and so the area of the field is x(x + 5) = x^2 + 5x.
Hence, x^2 + 5x = 2250, or equivalently, x^2 + 5x − 2250 = 0. Now
we have to solve this equation for x, i.e. find a root of the quadratic
f(x) = x^2 + 5x − 2250.
For this quadratic, a = 1, b = 5, c = −2250, and so the roots are
given by

    x = (−b ± √(b^2 − 4ac))/(2a)
      = (−5 ± √(5^2 − 4(1)(−2250)))/2
      = (1/2)(−5 ± √9025)
      = (1/2)(−5 ± 95),

so x = (1/2)(−5 + 95) = 45 or x = (1/2)(−5 − 95) = −50. Since lengths are
always positive we disregard the negative answer, thus the short and
long sides of the field are 45 m and (45 + 5) = 50 m, respectively. □

Exercise 1.30.

1. Solve 2x^2 − 5x − 12 = 0.

2. Solve x(x + 1) = 2.

3. The long side of a rectangular field is twice as long as the short
side. If the field has an area of 1000 square metres, then how
long is the short side?

4. Find all solutions of the equation x = x + 3.

Exercise 1.31.

1. Expand (x − 2y)(2x^2 − 3xy + 2y^3).

2. Simplify the following expressions

(a) (x^3)(x^4)/x^15    (b) ⁴√((x^2)^3)    (c) 256^(−1/4).

3. On the Cartesian plane, plot three points of the graph of (i)
f(x) = x^3, (ii) g(x) = 1/x, (iii) h(x) = 2x^3 − 5x^2 + 7x − 4.

Trigonometric Functions
We start by defining the sine and cosine functions. For this, draw a circle
in the Cartesian plane, whose centre is at the origin and which has a
radius of 1. See the left-hand picture in Figure 1.7. Given an angle θ,
consider the point (x, y) on the circle that makes an angle θ, measured
anticlockwise, with the positive part of the x-axis. We define cos θ = x,
and sin θ = y.

Margin note: We usually put parentheses around the parameter of a
function – f(x). However, for named functions this is somewhat optional,
e.g. sin x and sin(x) are both used.

Figure 1.7: sine and cosine (left: the point (x, y) = (cos θ, sin θ) on the
unit circle; right: the point (x, −y) = (cos(−θ), sin(−θ)))

In this way, the sin and cos functions make sense for every angle. For
example, what is cos 180°? Well, the point on the unit circle which makes
a 180° angle with the positive x-axis is the point (−1, 0). Thus,
cos 180° = −1 (and sin 180° = 0).
Now consider the right-hand picture in Figure 1.7. Everything is as in the
left-hand picture, except that the angle θ is measured clockwise, which
we interpret as −θ. If we do this, then the corresponding point equals
(x, −y), from which we infer that cos(−θ) = x = cos θ and sin(−θ) =
−y = − sin θ.

Remarks 1.32. Apart from the sine and cosine functions, the most
commonly met trigonometric function is tangent. It is defined by
tan θ = sin θ / cos θ. With the notation of Figure 1.7, tan θ = y/x is the
steepness, or slope, of the line determined by the angle θ. Notice that
tan θ is not defined at places where cos θ is zero.

The notation sin^2 θ means (sin θ)^2 (i.e. calculate the sine of θ and then
square the answer) and is not to be confused with sin(θ^2) (i.e. square θ
and calculate sine of the answer). The following formulae are often useful
when dealing with trigonometric functions.
If you look again at the right-angled triangle in Figure 1.7, and apply
the Theorem of Pythagoras, you will conclude that cos^2 θ + sin^2 θ = 1,
for any angle θ. This is a very useful trigonometric identity, and we’ll
present it here with another couple of identities which will be useful later
on. We won’t prove them.
1.9. Graphing functions 33

Lemma 1.33. For all x, y ∈ R,

1. cos^2 x + sin^2 x = 1,

2. cos(x + y) = cos x cos y − sin x sin y,

3. sin(x + y) = sin x cos y + cos x sin y.
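
Although the identities of Lemma 1.33 are not proved here, they can at least be spot-checked numerically. The sketch below uses Python's standard `math` module, whose trigonometric functions take angles in radians; the helper name `check_identities` and the tolerance are assumptions made for illustration, and a numerical check at a few points is of course not a proof.

```python
import math


def check_identities(x, y, tol=1e-12):
    """Return True if the three identities of Lemma 1.33 hold at (x, y) up to tol."""
    pythagoras = math.cos(x) ** 2 + math.sin(x) ** 2
    cos_sum = math.cos(x) * math.cos(y) - math.sin(x) * math.sin(y)
    sin_sum = math.sin(x) * math.cos(y) + math.cos(x) * math.sin(y)
    return (abs(pythagoras - 1) < tol
            and abs(math.cos(x + y) - cos_sum) < tol
            and abs(math.sin(x + y) - sin_sum) < tol)


print(check_identities(0.3, 1.1), check_identities(-2.0, 5.5))
```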

Measuring angles

Fact 1.34.

1. Positive angles are measured anti-clockwise from the positive
x-axis; if an angle is negative it means it is measured clockwise
from the positive x-axis.

2. Adding 360° to an angle doesn’t actually change the point on
the circle corresponding to that angle. A consequence of this
is that sin θ = sin(θ + 360°) and cos θ = cos(θ + 360°) for any
angle θ.

Measuring an angle by dividing a circle into 360 parts and calling each
part one degree is actually rather arbitrary. Beyond historical precedent,
there is no natural reason for doing so: why not 100 parts, or 29 parts?
However, there is in fact a very natural way of measuring angles. This
is done as follows.

Margin note: Nobody knows the exact origin of the partition of the
circle into 360 degrees. It is certainly ancient: it is seen in ancient
Greek and Indian texts, and may go back further to the Babylonians.

Suppose we are standing on a circle of radius r. If you walk along the
perimeter of the circle for a distance r then you will have ‘traced out’ a
certain angle. We call this angle a radian – see Figure 1.8.

Figure 1.8: one radian (an arc of length r on a circle of radius r)

Margin note: Alternatively, x radians is the angle we get by walking an
arc of length x around the circumference of the unit circle.


How big is a radian? Well, if we walk around the whole circle, we will
have travelled a distance 2πr, that is 2π times a distance r. Hence there
are 2π radians in a complete revolution, or 360° = 2π radians. It is
important to know the sine and cosine of a small number of fundamental
angles, given by the following table.

θ     | 0° ≡ 0    | 30° ≡ π/6   | 45° ≡ π/4     | 60° ≡ π/3   | 90° ≡ π/2
------+-----------+-------------+---------------+-------------+----------
sin θ | √0/2 = 0  | √1/2 = 1/2  | √2/2 = 1/√2   | √3/2        | √4/2 = 1
cos θ | √4/2 = 1  | √3/2        | √2/2 = 1/√2   | √1/2 = 1/2  | √0/2 = 0

Margin note: The pattern of numbers under the square root signs is a
handy way of remembering the values.
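
The conversion between degrees and radians, and the square-root pattern in the table, can be checked numerically. This is a small sketch using the standard `math` module; the names `angles_deg`, `rows` and `pattern_holds` are chosen here, and the comparison uses a tolerance because the values are floating-point approximations.

```python
import math

angles_deg = [0, 30, 45, 60, 90]
rows = []
for k, deg in enumerate(angles_deg):
    theta = math.radians(deg)            # degrees -> radians (360 degrees = 2*pi)
    sin_pattern = math.sqrt(k) / 2       # sqrt(0)/2, sqrt(1)/2, ..., sqrt(4)/2
    cos_pattern = math.sqrt(4 - k) / 2   # the same values in reverse order
    rows.append((deg, theta, sin_pattern, cos_pattern))

# Compare the pattern with the library values of sine and cosine.
pattern_holds = all(
    math.isclose(math.sin(t), s, abs_tol=1e-12)
    and math.isclose(math.cos(t), c, abs_tol=1e-12)
    for _, t, s, c in rows
)
print(pattern_holds)
```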

Notice that sin θ = sin(θ + 2π) and cos θ = cos(θ + 2π). We will now
make graphs of the sine and cosine functions.

Figure 1.9: the sine function (plotted from −2π to 2π)

Margin note: There is an interactive GeoGebra applet, written by Dr
Anthony Brown, which clearly links the definition of sine and Figure 1.7
to its graph, as seen in Figure 1.9. There is a corresponding applet for
the cosine function.

Figure 1.10: the cosine function (plotted from −2π to 2π)

Yes! They are the same shape, but one is a horizontal translate of the
other. More specifically, cos x = sin(x + π/2). Notice also that, since
adding a semi-circle (i.e. π radians) to an angle gives us the point on the
perimeter which is diametrically opposed, we also have sin(x + π) = − sin x
and cos(x + π) = − cos x.

Margin note: In this programme, we will use radians by default. If you
use your calculator to compute sines and cosines etc. it is highly
advisable to set it to radian mode, by default!

Notice also that the graph of cosine has mirror symmetry in the y-axis
(if you reflect it in the y-axis, it doesn’t change). This is to be expected,
since cos(−x) = cos x, as we saw above.

Margin note: A function that has this property is called even. Other
examples include x^2 − 1 (see Figure 1.6) and x^4.

Also true is the fact that the graph of sine has a rotation symmetry, in
the sense that if you were to rotate it about the origin through an angle
of π radians (180°), then it would look the same. This is, ahem, reflected
by the fact, again seen above, that sin(−x) = − sin x.

Margin note: A function that has this property is called odd. Other
examples include x, x^3 and tan x.

The shape of these functions, often referred to as a sine wave, will be


familiar if you’ve ever seen alternating current displayed on an oscillo-
scope. This is because alternating current is produced by rotating mag-
nets and whenever there is circular motion, such as with magnetic coils,
or combustion engines, or planetary motion, trigonometric functions are
required.

Chapter 2

Functions and Limits

2.1 Functions
Domain and codomain
Let us recall Definition 1.21 and examine again what a function is.
Suppose we call our function f (as we often do!), our set of inputs A and
our set of outputs B. We write the phrase ‘f is a function from the set A
to the set B’ more compactly as f : A → B. The input set is known as the
domain and the output set as the codomain. Quite often for us, these are
both the set R of real numbers and we have f : R → R. By a real-valued
function, we mean one which maps into R, that is, the codomain is R (or
a subset of R).
The rule for getting from the domain to the codomain is usually given
by a formula (typically, an algebraic expression), so a function might be
fully specified in the following way:

    f : R → R
    f(x) = x^3 + 7.

The following variation of this is also used:

    f : R → R
    x ↦ x^3 + 7.

Margin note: John von Neumann (1903 – 1957): ‘If people do not believe
that mathematics is simple, it is only because they do not realize how
complicated life is.’

Here is another function, but there is a problem with its definition. Can
you see what it is?

    g : R → R
    g(x) = 1/(x − 3).
Look again at Definition 1.21 and notice that a function must ‘map each
element’ of the domain to the codomain. Here, g fails this requirement
because the rule g(x) = 1/(x − 3) cannot be evaluated when the input is
3 (division by zero!). There are two ways around this. One is to change
our formula so that the function is properly defined on all of the given
domain, the other is to change our domain so that the given rule is valid.
So both of these functions g1 and g2 are well-defined:

    g1 : R → R,  g1(x) = 1/(x − 3) if x ≠ 3,  g1(x) = 2021 if x = 3,

and

    g2 : R \ {3} → R,  g2(x) = 1/(x − 3).

If a formula for a function is given, without the domain being specified,
then you should assume the domain to be all the numbers for which the
formula makes sense. This is called the natural domain of the function.

Example 2.1.

1. The natural domain of $g(x) = \frac{1}{x-3}$ is R \ {3}.

2. The natural domain of $f(x) = \sqrt{x}$ is {x ∈ R : x ≥ 0}. This set
(the non-negative half line) is often referred to as R+.

[Margin note: Yes – one can extend the domain of $\sqrt{x}$ to include
negative numbers if we extend the codomain to complex numbers, but
we will stick to real numbers in this programme.]
Exercise 2.2. What is the natural domain of $h(x) = \frac{5}{x^2 + x - 6}$?

A piecewise defined function is one which has a different formula, or rule,


depending on what part of the domain the input is taken from.

Definition 2.3. The absolute value or modulus function is defined by

$$|x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0. \end{cases}$$

Thus, |7| = 7 because 7 > 0 while, as −7 < 0, we have |−7| = −(−7) = 7.



Fact 2.4. The modulus function has the following properties for all
x, y ∈ R.

1. |x| ≥ 0,

2. |x| = |−x|,

3. |xy| = |x| · |y|,

4. |x + y| ≤ |x| + |y|. (the triangle inequality)

We think of |x| as the magnitude of the number x, or the ‘distance’ be-


tween x and the origin 0. It follows that |x − y| is the distance between
the numbers x and y. The graph of |x| is shown in Figure 2.1. It is
notable for not being ‘smooth’ – it has a corner at the origin.

Figure 2.1: The graph of |x|

Unique output
Look once again at Definition 1.21 and consider the requirement that
each input value is mapped to a unique element of the output set. This
phrase rules out any ambiguity in the value of a function, which is good,
but it means you have to be careful when defining a function.

For example, consider the function defined through $f(x) = \sqrt{x}$ on its
natural domain R+. What is f(4)? We cannot say that f(4) is ‘2 and −2’
because a pair of numbers is not a unique element of the codomain R.
We get around this by always taking $\sqrt{x}$ to be the positive square root
and allowing for this when we solve equations. Thus the solution to the
equation $x^2 = 4$ is not $x = \sqrt{4}$, it is $x = \pm\sqrt{4}$, and since
$\sqrt{4} = 2$, the solutions are x = ±2.


Remarks 2.5. This unique output property of functions has a notable


geometric consequence for the graph of a function: if f : R → R then
no vertical line will hit the graph more than once.

Figure 2.2: neither of these plots is the graph of a function f : R → R

Algebra of functions
The algebra of numbers, that is, how to add, subtract, multiply and divide
numbers is well-known to us. Perhaps surprisingly, we can do these
operations on real-valued functions. The operations are inherited from
those on numbers applied pointwise. That is, if f and g are functions
then we add them to get the function f + g. What is the function f + g?
Well, its value at a number x is defined to be (f + g)(x) := f(x) + g(x),
and similarly,

$$(f - g)(x) := f(x) - g(x)$$

$$(fg)(x) := f(x)g(x)$$

$$\left(\frac{f}{g}\right)(x) := \frac{f(x)}{g(x)}.$$

One has to be a little careful with the domains: (f + g)(x) only makes
sense for values of x which are in the domain of both f and g, and the
same is true for the other combinations of f and g. In addition, $\frac{f}{g}$
is not defined at any points x for which g(x) = 0.

[Margin note: Much of MATH10390 concerns performing arithmetic on objects
(matrices, vectors etc.) that are not numbers.]

[Margin note: The notation ‘:=’ means ‘is defined to be’. Thus x := 5 means
‘x is defined to be 5’, which is a subtle distinction from the usual ‘x = 5’
used when x is calculated to be, or turns out to be. Sometimes it’s a useful
distinction, other times confusing. You can ignore it if you prefer.]

Example 2.6. Let $f(x) = x^2$ and $g(x) = e^x$. Write down formulae for
$(f + g)(x)$, $(fg)(x)$, $\frac{f}{g}(x)$ and $\frac{g}{f}(x)$. What are the
natural domains?

Solution. We have

$$(f + g)(x) = x^2 + e^x, \quad (fg)(x) = x^2 e^x, \quad \frac{f}{g}(x) = \frac{x^2}{e^x}, \quad \frac{g}{f}(x) = \frac{e^x}{x^2}.$$

Both f and g have R as their (natural) domain, thus the domain of
both f + g and fg is R. Since $e^x$ is never zero, the domain of $\frac{f}{g}$
is also R, but $\frac{g}{f}$ is not defined at 0 because the denominator is
f(0) = 0, so the domain of $\frac{g}{f}$ is R \ {0}. □
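
The pointwise operations above can be sketched in a few lines of Python. This is only an illustration: the helper names `add`, `mul` and `div` are our own and not part of the notes, and the quotient raises an error at points where the denominator function is zero, mirroring the restriction on the domain.

```python
import math

def add(f, g):
    """Return the pointwise sum f + g."""
    return lambda x: f(x) + g(x)

def mul(f, g):
    """Return the pointwise product fg."""
    return lambda x: f(x) * g(x)

def div(f, g):
    """Return the pointwise quotient f/g, undefined where g(x) == 0."""
    def quotient(x):
        if g(x) == 0:
            raise ValueError(f"quotient undefined at x = {x}")
        return f(x) / g(x)
    return quotient

# The functions of Example 2.6.
f = lambda x: x ** 2
g = math.exp

f_plus_g = add(f, g)
f_times_g = mul(f, g)
f_over_g = div(f, g)   # defined on all of R, since e^x is never zero
g_over_f = div(g, f)   # not defined at 0, since f(0) = 0

print(f_plus_g(1.0), f_times_g(1.0), f_over_g(1.0), g_over_f(1.0))
```

Calling `g_over_f(0.0)` raises `ValueError`, which corresponds to 0 being excluded from the natural domain of the quotient g/f.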

Composition
There is one other very important way in which functions can be combined
other than the algebraic operations mentioned above, and it is when we
use the output value of one function as the input value of another. This
is known as a chain or composition of functions. The notation f ◦ g
represents the function ‘f after g’:
(f ◦ g)(x) := f(g(x)).

Example 2.7. Let $f(x) = 1 - x^2$ and $g(x) = \sin x$. Write down formulae
for f(g(x)) and g(f(x)).

Solution. First of all

$$f(g(x)) = f(\sin x) = 1 - (\sin x)^2 = 1 - \sin^2 x.$$

Since $\sin^2 x + \cos^2 x = 1$ (Lemma 1.33 (1)), we can simplify:
$f(g(x)) = \cos^2 x$.

Second, $g(f(x)) = g(1 - x^2) = \sin(1 - x^2)$.
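
As a rough numerical illustration of Example 2.7, the sketch below builds both compositions and evaluates them at one point. The helper `compose` is a name we introduce here, not something from the notes.

```python
import math

def compose(f, g):
    """Return the function 'f after g', i.e. x -> f(g(x))."""
    return lambda x: f(g(x))

f = lambda x: 1 - x ** 2
g = math.sin

f_after_g = compose(f, g)   # 1 - sin^2 x, which equals cos^2 x
g_after_f = compose(g, f)   # sin(1 - x^2)

x = 0.5
print(f_after_g(x), math.cos(x) ** 2)   # these two agree
print(g_after_f(x))                     # a different value
```

Evaluating at a single point cannot prove that two functions are equal, but one point where the values differ is enough to show that two functions are different.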


Notice that f ◦ g and g ◦ f are different functions! The act of taking
compositions is non-commutative: the order in which the composition is
made matters.

[Margin note: In MATH10390 Section 1.2 we discover another non-commutative
operation, namely matrix multiplication. Contrast these operations with
addition or multiplication of numbers, which is commutative: for all
numbers a and b, a + b = b + a and ab = ba.]

Surjectivity

The function f : R → R, $f(x) = x^2$ is well-defined; the formula can be
calculated for every element of the domain R. Notice there are elements
of the codomain which are never output by the function: the square of
a real number is never negative. The set of values which the function
outputs is called the range or image of the function. The range of
$f(x) = x^2$ is R+ = [0, ∞) (see Example 2.1 (2)). The range of a function
is a subset of the codomain.


Definition 2.8. If the range of a function equals its codomain then we


say the function is surjective or onto, or that f is a surjection.

Example 2.9.

1. The function f : R → R given by $f(x) = x^3$ is surjective: its range


equals its codomain.

2. The cosine function cos : R → R is not surjective. Its codomain


is R but its range is the closed interval [−1, 1].

We can always ensure that a function is surjective by ‘shrinking’ its
codomain to its range.

[Margin note: Given this ability to shrink the codomain, the idea of being
surjective may seem redundant, but there are reasons for having it around.]

Example 2.10. The cosine function cos : R → [−1, 1] is surjective.

Injectivity
While a function cannot map an element of the domain to two different
elements of the codomain (see the subsection on unique output above),
it is perfectly allowable for a function to map two different elements of
the domain to the same element of the codomain. For example, consider
g : R → R, $g(x) = x^2 - 6x$, where both g(1) = −5 and g(5) = −5.
A function f is said to be injective or one-to-one if we cannot get the
same output from two different inputs. In other words, equal output
implies equal input.

Definition 2.11. We say the function f is injective or one-to-one, or


that f is an injection, if f(x) = f(y) implies x = y.

Example 2.12.

1. We saw above that the function g : R → R, $g(x) = x^2 - 6x$, is
not injective (since, for example, g(1) = g(5) and 1 ≠ 5).

2. The mapping from the set of UCD students to their respective


student numbers is injective.

3. The function $f(x) = x^n$ is injective on R if n is an odd integer,


but not if n is even (e.g. f(1) = 1 = f(−1)).

4. The sine and cosine functions are not injective on R, e.g. sin 0 =
0 = sin π.
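
A quick way to disprove injectivity is to search a sample of inputs for two that give the same output. The sketch below does this; the function name `find_collision` and the sample range are our own choices. Finding no collision on a finite sample does not prove that a function is injective, it only fails to disprove it.

```python
def find_collision(f, inputs):
    """Return a pair (a, b) of distinct inputs with f(a) == f(b), or None."""
    seen = {}  # maps each output value to the first input that produced it
    for x in inputs:
        y = f(x)
        if y in seen and seen[y] != x:
            return (seen[y], x)
        seen.setdefault(y, x)
    return None

g = lambda x: x ** 2 - 6 * x   # the function of Example 2.12 (1)
cube = lambda x: x ** 3        # an odd power, as in Example 2.12 (3)

sample = range(-10, 11)
print(find_collision(g, sample))     # a pair of inputs sharing an output
print(find_collision(cube, sample))  # None: no collision in this sample
```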

You can often ensure a function is injective by restricting it to a suitable


domain (in the process, you will lose a bit of the original function).

Example 2.13.

1. The function f : R → R, $f(x) = x^2$, is not injective (for example,


because f(−2) = f(2)).

2. The function g : R+ → R, $g(x) = x^2$, is injective.

Neither f nor g is surjective (because, for example, −3 is not output


by either of them.)

Definition 2.14. A function that is both injective and surjective is


called bijective, or a bijection.

Example 2.15. The functions f : R → R, $f(x) = x^3$ and h : R+ → R+,
$h(x) = x^2$, are both bijective.

Inverse Functions
Suppose we have a function f : A → B. If we can find a function
g : B → A which ‘undoes’ the action of f, then we call g an inverse
function of f.

Definition 2.16. Let f : A → B be a function between two sets. If


there is a function g : B → A with the properties that

g(f(x)) = x for all x ∈ A,


and
f(g(y)) = y for all y ∈ B,
then we call g an inverse function for f.

Functions do not always have inverses. For example, if f : A → B is not


injective then it cannot have an inverse g because if f(x1 ) = f(x2 ) = b
then should g(b) be x1 or x2 ? It’s rather like giving two students the same
student number and then trying to find the student name corresponding
to that number – confusion arises.
The next fact, which we won’t prove, is important enough to be called a
theorem.

Theorem 2.17. A function f has an inverse if, and only if, it is bijective.

Example 2.18. Let f : R → R be given by $f(x) = x^3$. Show that f has
the inverse function g : R → R, $g(x) = \sqrt[3]{x}$.

Solution. Verify that

$$f(g(x)) = f(\sqrt[3]{x}) = (\sqrt[3]{x})^3 = x,$$

and

$$g(f(y)) = g(y^3) = \sqrt[3]{y^3} = y.$$ □

Example 2.19. Let f : R → (0, ∞) be given by $f(x) = e^x$. Then f has
the inverse function g : (0, ∞) → R, g(y) = log y, since $e^{\log x} = x$ and
$\log(e^x) = x$.

[Margin note: Throughout this programme, log denotes natural logarithm,
that is, ln, or ‘log to base e’, i.e. $\log_e$. This is standard practice in
mathematics. However, in some other contexts (in particular, some
calculators) log can mean the common logarithm or ‘log to base 10’, i.e.
$\log_{10}$, so do be careful! To see the difference between $\log_{10}$ and
log, take a look at this interactive GeoGebra applet, written by Dr Anthony
Brown. The default setting (a = 10) displays $\log_{10}$, while setting
a = 2.7 yields the (approximate) graph of log. This is because e ≈ 2.7 (to
one decimal place).]

Figure 2.3: The graphs of f and g in Example 2.19 (panels: $e^x$ and $\log x$)

Pick any number on your calculator. Press the ‘exp’ button (it appears
as ‘ex ’ on some calculators). Then press the ‘ln’ (or ‘loge ’) button. You
should end up with the number you started with. Ta da!
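
The same calculator experiment can be tried in Python, as a sketch. Here `math.log` is the natural logarithm (base e) and `math.log10` is the common logarithm (base 10); the helper name `exp_then_log` is ours. Because of floating-point rounding, the round trip recovers the starting number only approximately.

```python
import math

def exp_then_log(x):
    """Apply exp and then the natural logarithm, as on the calculator."""
    return math.log(math.exp(x))

for x in [-2.0, 0.5, 3.0]:
    print(x, exp_then_log(x))

# The common logarithm is a different function from the natural logarithm.
print(math.log(100.0), math.log10(100.0))
```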
Remarks 2.20. A function may not have an inverse, but if a function
has one, then it is unique: it does not have two different inverses.
Hereafter, if a function has an inverse, we will refer to it as the
inverse.

[Margin note: It might look odd that this remark is being made, but if you
look back at Definition 2.16, g simply fulfils some requirements. The
fulfilling of requirements does not imply uniqueness in general: the shoes
I am wearing fit my feet, but I have other pairs that do so too. See
MATH10390 Proposition 1.28 for another example of this point.]

Example 2.21. Let f : [1, 4] → [5, 14] be given by f(x) = 3x + 2. Then
f is bijective and its inverse function g : [5, 14] → [1, 4] is found by
solving y = 3x + 2 for x as a function of y. Subtracting 2 from both
sides and then dividing by 3, we find $x = \frac{1}{3}(y - 2)$. Thus
$g(y) = \frac{y - 2}{3}$ is the inverse function of f.

The notation $f^{-1}$ is often used for the inverse function of f (when there
is an inverse function). This notation is slightly unfortunate because,
with functions, $f^{-1}$ is not the same thing as $\frac{1}{f}$ (whereas, with
numbers, $x^{-1} = \frac{1}{x}$).
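
The two conditions of Definition 2.16 can be spot-checked for the linear function f(x) = 3x + 2 on [1, 4] and its candidate inverse g(y) = (y − 2)/3 on [5, 14]. The sketch below uses exact rational arithmetic from Python's standard `fractions` module to avoid rounding; the sample points are arbitrary choices of ours, and checking finitely many points illustrates the definition rather than proving it.

```python
from fractions import Fraction

def f(x):
    """f : [1, 4] -> [5, 14], f(x) = 3x + 2."""
    return 3 * x + 2

def g(y):
    """Candidate inverse g : [5, 14] -> [1, 4], g(y) = (y - 2) / 3."""
    return (Fraction(y) - 2) / 3

xs = [Fraction(1), Fraction(5, 2), Fraction(4)]
ys = [Fraction(5), Fraction(19, 2), Fraction(14)]

print(all(g(f(x)) == x for x in xs))   # g undoes f
print(all(f(g(y)) == y for y in ys))   # f undoes g

# The inverse function is not the reciprocal: compare both at the input 5.
print(g(5), 1 / Fraction(f(5)))
```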

Linear functions
Linear functions are those of the form f(x) = mx + c where m and c are
given real numbers. The natural domain of a linear function is all of R.
For example, f(x) = 3x + 2 and g(x) = −17x + π are linear functions.
Linear functions are so called because their graph is a straight line. For
the linear function f(x) = mx + c, the number m gives the slope or
steepness of the line: for each unit of increase on the horizontal x-axis,
the line rises by m on the vertical y-axis (or drops, if m is negative). The
number c is the y-intercept, that is, where the line crosses the y-axis.
See Example 1.25.

[Margin note: A stricter definition of linear function is not just that the
graph is a straight line, but that it also passes through the origin (in
other words, it has the form f(x) = mx, where c = 0) but we will use the
more general notion.]

Linear functions are injective (if m ≠ 0) and their inverses can be found
quite easily – see Example 2.21. If m = 0, then we get a constant
function, f(x) = c, such as f(x) = 17. The graph of a constant function is
a horizontal straight line (slope is zero).
While linear functions are very simple, they are important because most
functions that we need to deal with can be approximated by linear func-
tions. This fact lies at the heart of calculus. If you zoom in far enough
on the graph of any ‘smooth’ function then the graph looks like it is a
straight line.


2.2 Limits of functions


Consider the function $f(x) = \frac{x^2 - 1}{x - 1}$. The domain of this
function is R \ {1}. The function is not defined at 1 because division by
zero is forbidden (in particular, the expression $\frac{0}{0}$ is
meaningless). However, a pattern emerges if one calculates some values of
the function close to 1.

x < 1                f(x)                 x > 1                f(x)
0.9                  1.9                  1.1                  2.1
0.99                 1.99                 1.01                 2.01
0.999999             1.999999             1.000001             2.000001
↑                    ↑                    ↑                    ↑
getting close to 1   getting close to 2   getting close to 1   getting close to 2
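
The table can be reproduced with a short Python sketch. The function name `f` matches the text; the formatting choices are ours, and the printed values are floating-point approximations.

```python
def f(x):
    """f(x) = (x^2 - 1) / (x - 1), undefined at x = 1."""
    return (x ** 2 - 1) / (x - 1)

left = [0.9, 0.99, 0.999999]    # approaching 1 from the left
right = [1.1, 1.01, 1.000001]   # approaching 1 from the right

for a, b in zip(left, right):
    print(f"{a:<10} {f(a):<12.6f} {b:<10} {f(b):<12.6f}")
```

Trying `f(1)` raises `ZeroDivisionError`, reflecting the fact that 1 is not in the domain of f.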

As x gets closer to 1, either from the left or the right hand side, f(x)
appears to get closer to 2 (and as we see below, it really does do this).
We say that the limit of f(x) as x approaches 1 is 2. We can express this
using notation as

$$\lim_{x \to 1} f(x) = 2.$$

[Margin note: In mathematics, appearances can deceive. The table suggests
certain behaviour, but offers no proof.]

Remarks 2.22. The notation x → 1 means the variable x is getting


closer to or tending towards or is approaching the fixed number 1,
without actually being equal to 1.

Definition 2.23. If f(x) → ℓ when x → a then we say that the limit
of f(x) as x approaches a is ℓ, and we write

$$\lim_{x \to a} f(x) = \ell.$$

[Margin note: There is a more rigorous definition of limit, which belongs
to a course on so-called Mathematical Analysis, but this definition will
suit our purposes.]

Deciding if a limit of a function exists at a point, and proceeding to find
the limit, by making a table of values as we did above is slow, error prone,
and unconvincing. A better way to determine $\lim_{x \to 1} \frac{x^2 - 1}{x - 1}$
is as follows. Notice that

$$f(x) = \frac{x^2 - 1}{x - 1} = \frac{(x - 1)(x + 1)}{x - 1} = \frac{x - 1}{x - 1}\,(x + 1).$$

While x → 1, it is not equal to 1, and so x − 1 ≠ 0. This means we can
cancel and write f(x) = x + 1. This ‘new’ formula for f(x) is only valid
when x ≠ 1, but that’s good enough to determine the limit – as x → 1, it
is clear that f(x) = x + 1 → 2. In other words $\lim_{x \to 1} f(x) = 2$.

Remarks 2.24. The existence of a limit of f(x) as x → a does not


depend on the value of f at a. It is not even necessary for f(a) to be
defined! Also, a function f has the limit ℓ at a only if f(x) gets close
to ℓ for all x sufficiently close to a. If it only works for some x close
to a (e.g. x close to a but to the right of a), that’s not good enough.

A function need not have a limit at every point.

Example 2.25.

1. The function $f(x) = \frac{1}{x + 2}$ (defined on R \ {−2}) has no limit as
x → −2, because we have f(x) → −∞ as x → −2 from the left,
and f(x) → ∞ as x → −2 from the right.

2. The function g : R → R, given by

$$g(x) = \begin{cases} 1 & x \geq 0 \\ -1 & x < 0, \end{cases}$$

has no limit at 0 because as x → 0 from the right g(x) → 1,
while as x → 0 from the left, g(x) → −1. This function is often
called the sign or the signum function.

[Margin note: To avoid confusion with the sine function.]

In these two examples, it makes sense to talk about a ‘left hand limit’
or a ‘right hand limit’. The functions f and g did not have limits
because their left and right hand limits did not agree.

Figure 2.4: Example 2.25 (1) and (2), respectively


Even stranger behaviour can occur.

Example 2.26. The function h : R \ {0} → [−1, 1], given by
$h(x) = \sin\left(\frac{1}{x}\right)$, has neither left nor right hand limits
at 0, and in particular has no limit at 0.

Figure 2.5: Example 2.26

[Margin note: The graph oscillates arbitrarily quickly as x approaches 0
(from either direction).]
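
To see the oscillation of Example 2.26 numerically, one can evaluate h at the points $x_n = 1/(\pi/2 + n\pi)$, which shrink towards 0 while h alternates between 1 and −1. This choice of sample points is ours, made for illustration; the printed values are floating-point approximations.

```python
import math

def h(x):
    """h(x) = sin(1/x), defined for x != 0."""
    return math.sin(1 / x)

for n in range(6):
    x_n = 1 / (math.pi / 2 + n * math.pi)   # x_n -> 0 as n grows
    print(f"x = {x_n:.5f}   h(x) = {h(x_n):+.6f}")
```

Since the values keep switching between (approximately) 1 and −1 however close x gets to 0, they cannot settle on a single number.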

If a limit exists at a given point, and the function is defined at that point,
the limit may not equal the value of the function there.

Example 2.27. Define the function δ : R → R by

$$\delta(x) = \begin{cases} 0 & x \neq 0 \\ 1 & x = 0. \end{cases}$$

Then limx→0 δ(x) = 0, but δ(0) = 1.

Figure 2.6: The function δ in Example 2.27

Rules for finding limits


We would like to have a set of tools that will enable us to find lim-
its of various functions more easily and systematically. Let’s start with
polynomials.

Definition 2.28. A function of the form

$$p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n,$$

where each $a_i$ is a given real number, with $a_n \neq 0$, is called a
polynomial of degree n. The degree of a polynomial is the highest power
of x that appears.

[Margin note: E.g. quadratic and cubic functions are polynomials of degree
2 and 3, respectively.]

One polynomial whose limits are easy to calculate is the identity func-
tion, f(x) = x. Indeed, limx→a f(x) = limx→a x = a. This is really saying
nothing more than the tautology ‘if x tends to a then x tends to a’.
Another limit, even easier to calculate, is that of the constant function
f(x) = k, where k is some constant. We simply have limx→a f(x) = k.
The following rules of limits are very useful – we’ll state them in the form
of a theorem. These rules form what is known as the algebra of limits.

Theorem 2.29. Suppose $\lim_{x \to a} f(x) = \ell$ and $\lim_{x \to a} g(x) = m$. Then

1. $\lim_{x \to a} k f(x) = k\ell$, where k is a constant,

2. $\lim_{x \to a} (f(x) + g(x)) = \ell + m$,

3. $\lim_{x \to a} f(x)g(x) = \ell m$,

4. if $m \neq 0$ then $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\ell}{m}$,

5. $\lim_{x \to a} (f(x))^b = \ell^b$ (if $\ell^b \in \mathbb{R}$).

[Margin note: Rule 1 is a special case of rule 3 – let g be the constant
function g(x) = m.]

This result allows us to calculate limits of sums, products, quotients and


powers of functions at a point a if we know the limit of the functions at
a. Since we know the limits of the identity polynomial f(x) = x, this
allows us to write down limits of polynomials, because polynomials are
just sums of multiples of powers of the identity function.


Example 2.30. Find $\lim_{x \to 2} (5x^2 + 3)$.

Solution. We use the rules in Theorem 2.29, together with the limits of
constant functions and the identity function. We have that

$$\begin{aligned}
\lim_{x \to 2} (5x^2 + 3) &= \lim_{x \to 2} 5x^2 + \lim_{x \to 2} 3 && \text{rule (2)} \\
&= 5 \lim_{x \to 2} x^2 + 3 && \text{rule (1)} \\
&= 5 \left( \lim_{x \to 2} x \right)^2 + 3 && \text{rule (3)} \\
&= 5 \cdot 2^2 + 3 = 23.
\end{aligned}$$

[Margin note: In this solution, we’re really applying the rules in Theorem
2.29 in reverse. If we followed Theorem 2.29 to the letter, we could start
at the last line and work backwards. In practice though, this is rarely
done.]


Concerning polynomials, we have a more general result.

Theorem 2.31. If p(x) is a polynomial then, given any a ∈ R, we have

$$\lim_{x \to a} p(x) = p(a).$$

So, for polynomials, we can calculate the limit by just substituting the
limit point into the function. This does not work for every function as
we’ve already seen.
Next, let’s look at limits of rational functions.

Definition 2.32. A function which is the quotient of two polynomials


is called a rational function.

The following result follows from Theorems 2.31 and 2.29 (4).

Corollary 2.33. If p(x) and q(x) are polynomials and q(a) ≠ 0 then

$$\lim_{x \to a} \frac{p(x)}{q(x)} = \frac{p(a)}{q(a)}.$$

If q(a) = 0 and p(a) ≠ 0 then we have behaviour like that in Example
2.25 (1): the limit does not exist. If q(a) = 0 and p(a) = 0 then we have to
apply a trick like factorising and cancelling, as we did when we looked
at the first function featured in the section: $f(x) = \frac{x^2 - 1}{x - 1}$.
Some more examples might help.

Example 2.34.

1. $\lim_{x \to 1} \frac{x + 2}{x - 2} = \frac{3}{-1} = -3$

2. $\lim_{x \to -2} \frac{x + 2}{x - 2} = \frac{0}{-4} = 0$

3. $\lim_{x \to -3} \frac{x}{x + 3} = \frac{-3}{0}$ – no limit!

4. $\lim_{x \to 2} 7 = 7$

5. $\lim_{x \to 2} h = h$

6. $\lim_{h \to 2} x = x$

7. $\lim_{x \to -6} \frac{x^2 - 36}{x + 6} = \lim_{x \to -6} \frac{(x - 6)(x + 6)}{x + 6} = \lim_{x \to -6} (x - 6) = -12$

8. $\lim_{h \to 3} \frac{h^2 - 7h + 12}{h - 3} = \lim_{h \to 3} \frac{(h - 4)(h - 3)}{h - 3} = 3 - 4 = -1$

9. $\lim_{y \to 2} \frac{y^3 - 8}{y - 2} = \lim_{y \to 2} \frac{(y - 2)(y^2 + 2y + 4)}{y - 2} = 12$

10. $\lim_{h \to 0} \frac{12h + 6h^2 + h^3}{h} = \lim_{h \to 0} (12 + 6h + h^2) = 12$

11. $\lim_{h \to 0} \frac{3x^2 h + 3xh^2 + h^3}{h} = \lim_{h \to 0} (3x^2 + 3xh + h^2) = 3x^2$.

[Margin note: In parts 7 – 11, we have cancelled the denominator with a
factor in the numerator. We can do this because, in all cases, these
factors are never zero. E.g. in part 7, the factor x + 6 is never zero
because x → −6 means that x tends to −6, but never equals −6. Likewise for
the other parts. See the argument after Definition 2.23.]
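
As a sanity check on parts 7 and 9 of Example 2.34, the sketch below evaluates each quotient at points just to the left and right of the limit point. The helper `approach` and its step sizes are our own; as the notes stress, such numerical evidence supports the algebraic answer but does not replace it.

```python
def approach(f, a, steps=6):
    """Evaluate f at a - 10^-k and a + 10^-k for k = 1..steps.

    Returns the final (left, right) pair of values, printing each pair
    along the way.
    """
    left = right = None
    for k in range(1, steps + 1):
        h = 10.0 ** (-k)
        left, right = f(a - h), f(a + h)
        print(f"h = {h:.0e}   left = {left:.6f}   right = {right:.6f}")
    return left, right

part7 = lambda x: (x ** 2 - 36) / (x + 6)   # expected limit -12 at x = -6
part9 = lambda y: (y ** 3 - 8) / (y - 2)    # expected limit 12 at y = 2

l7, r7 = approach(part7, -6)
l9, r9 = approach(part9, 2)
```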


A useful trick for limits which involve $\sqrt{a} - b$ is to multiply above
and below by the surd conjugate $\sqrt{a} + b$.

[Margin note: We can eliminate the square root signs in this way:
$(\sqrt{a} - b)(\sqrt{a} + b) = a - b^2$. See Exercise 1.8 (3).]

Example 2.35.

$$\begin{aligned}
\lim_{x \to 9} \frac{\sqrt{x - 5} - 2}{x - 9}
&= \lim_{x \to 9} \frac{\sqrt{x - 5} - 2}{x - 9} \cdot \frac{\sqrt{x - 5} + 2}{\sqrt{x - 5} + 2} \\
&= \lim_{x \to 9} \frac{(\sqrt{x - 5} - 2)(\sqrt{x - 5} + 2)}{(x - 9)(\sqrt{x - 5} + 2)} \\
&= \lim_{x \to 9} \frac{x - 5 - 2^2}{(x - 9)(\sqrt{x - 5} + 2)} \\
&= \lim_{x \to 9} \frac{x - 9}{(x - 9)(\sqrt{x - 5} + 2)} \\
&= \lim_{x \to 9} \frac{1}{\sqrt{x - 5} + 2} = \frac{1}{\sqrt{9 - 5} + 2} = \frac{1}{4}.
\end{aligned}$$
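
The effect of the conjugate trick in Example 2.35 can be seen numerically. In the sketch below (function names are ours), the original quotient cannot be evaluated at x = 9, while the rationalised form can, and the two agree at nearby points.

```python
import math

def original(x):
    """(sqrt(x - 5) - 2) / (x - 9): a 0/0 expression at x = 9."""
    return (math.sqrt(x - 5) - 2) / (x - 9)

def rationalised(x):
    """1 / (sqrt(x - 5) + 2): the form obtained after cancelling x - 9."""
    return 1 / (math.sqrt(x - 5) + 2)

for x in [9.1, 9.001, 8.999]:
    print(x, original(x), rationalised(x))

print(rationalised(9))   # substituting the limit point is now allowed
```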

Limits at infinity
The limits that we considered in the previous subsection were, loosely
speaking, questions of the form ‘what happens to f(x) as x gets closer
to the number a’. We can also ask questions of the form ‘what happens
to the function f(x) as x gets arbitrarily large and positive (or the same,
but negative)’. The relevant notation is
$$\lim_{x \to \infty} f(x) \quad \text{or} \quad \lim_{x \to -\infty} f(x),$$

respectively.

Example 2.36.

1. $\lim_{x \to \infty} \frac{1}{x} = 0$ and $\lim_{x \to -\infty} \frac{1}{x} = 0$.
‘The bigger x gets (positive or negative), the closer $\frac{1}{x}$ is to zero’.

2. $\lim_{x \to \infty} k = k$ (a constant stays the same, regardless of what x
does).

3. $\lim_{x \to \infty} x = \infty$. Writing ‘$\lim_{x \to a} f(x) = \infty$’ means
that, as x gets closer to a, f(x) just gets arbitrarily large and positive
(i.e. it has no upper bound). Similarly, $\lim_{x \to \infty} (1 - 2x^6) = -\infty$.

4. $\lim_{x \to -\infty} e^{x^2} = \infty$.

5. $\lim_{x \to \infty} e^{-x^2} = 0$.

6. $\lim_{x \to \infty} \frac{x^3}{2x^3 + 9} = \lim_{x \to \infty} \frac{1}{2 + \frac{9}{x^3}} = \frac{1}{2}$.

[Margin note: Writing infinities in equations should be done with care and
only in specific situations, because the meaning of ‘infinity’ can be
ambiguous.]

Example 2.36 (6) involves a rational function. In general, to calculate


the limit at infinity of a rational function, divide above and below by the
highest power of x appearing in the denominator ($x^3$ in the case above)
and then take the limit.
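
The recipe can be illustrated on the rational function of Example 2.36 (6). In this sketch the names `r` and `r_divided` are ours: the first is the original quotient, the second is the form obtained after dividing above and below by $x^3$.

```python
def r(x):
    """The rational function x^3 / (2x^3 + 9)."""
    return x ** 3 / (2 * x ** 3 + 9)

def r_divided(x):
    """The same function after dividing above and below by x^3 (x != 0)."""
    return 1 / (2 + 9 / x ** 3)

for x in [10.0, 1e3, 1e6]:
    print(x, r(x), r_divided(x))
```

In the second form it is easy to see why the values approach 1/2: the term $9/x^3$ shrinks towards 0 as x grows.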

Trigonometric limits
The basic trigonometric functions obey the same trivial limit formulae
that the polynomials do (Theorem 2.31), namely

$$\lim_{x \to a} \sin x = \sin a, \qquad \lim_{x \to a} \cos x = \cos a,$$

as does the tangent function where it is defined (i.e. whenever cos x ≠ 0).

[Margin note: Functions which obey the limit formula
$\lim_{x \to a} f(x) = f(a)$ are considered ‘well-behaved’ – see Section 2.3.]

In particular, $\lim_{x \to 0} \sin x = 0$ and $\lim_{x \to 0} \cos x = 1$, but the
following result states the most important trigonometric limit.

Theorem 2.37. $\lim_{x \to 0} \frac{\sin x}{x} = 1$.

[Margin note: This limit is only valid when measuring angles in radians,
not degrees!!]

We won’t prove this result, but see Exercise 2.38. If you want supporting
evidence (but not proof), choose a number very close to zero, e.g. 0.001,
and use a calculator (in radian mode) to find its sine. You should get
sin 0.001 ≈ 0.0009999998333, and thus $\frac{\sin 0.001}{0.001}$ ≈ 0.999999833,
which is awfully close to 1.

[Margin note: These are just approximations of the true values in question.]
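
The calculator experiment can be repeated in Python, where `math.sin` works in radians. The sketch also shows what goes wrong in degrees: the helper `ratio_degrees` (our name) converts the angle from degrees before taking the sine, and the ratio then approaches π/180 rather than 1.

```python
import math

def ratio(x):
    """sin(x) / x with x measured in radians."""
    return math.sin(x) / x

def ratio_degrees(x_deg):
    """sin(x) / x with x measured in degrees."""
    return math.sin(math.radians(x_deg)) / x_deg

for x in [0.1, 0.01, 0.001]:
    print(x, ratio(x), ratio_degrees(x))

print(math.pi / 180)   # the value the degree version approaches
```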

Exercise 2.38. Draw the unit circle and mark a (small) angle x. Mark
the point corresponding to the angle x on the unit circle and drop a
perpendicular to the horizontal axis (as in Figure 1.7). The length of
the perpendicular is sin x. Because we measure angle using radians,
the angle x is the length of an arc of the circle. Mark this arc.
Compare the length of this arc x, with the length of the perpendicular
sin x. What happens to these lengths as the angle gets smaller?

Example 2.39. $\lim_{x \to 0} \frac{\tan x}{x} = \lim_{x \to 0} \frac{\sin x}{x \cos x} = \lim_{x \to 0} \frac{\sin x}{x} \cdot \frac{1}{\cos x} = 1 \cdot 1 = 1$.

Example 2.40.

1. $\lim_{x \to 0} \frac{\sin 3x}{x} = \lim_{x \to 0} 3 \cdot \frac{\sin 3x}{3x} = 3 \cdot 1 = 3$.

2. $\lim_{x \to 0} \frac{\tan 4x}{\sin 5x} = \lim_{x \to 0} \frac{\tan 4x}{4x} \cdot \frac{5x}{\sin 5x} \cdot \frac{4x}{5x} = 1 \cdot 1 \cdot \frac{4}{5} = \frac{4}{5}$.

3. $\lim_{x \to 0} \frac{\sin x^3}{x^4} = \lim_{x \to 0} \frac{\sin x^3}{x^3} \cdot \frac{1}{x} = 1 \cdot \lim_{x \to 0} \frac{1}{x}$, which does not exist.

4. $\lim_{x \to 0} \frac{\cos(2x) - 1}{3x^2} = \lim_{x \to 0} \frac{\cos^2 x - \sin^2 x - 1}{3x^2} = \lim_{x \to 0} \frac{-2\sin^2 x}{3x^2} = -\frac{2}{3}$.

[Margin note: Using Lemma 1.33 (2) and Theorem 2.37.]

Exercise 2.41.

1. What is $\lim_{x \to 0} \frac{\sin^2 x}{x}$?

2. Show that $\lim_{x \to 0} \frac{1 - \cos x}{x} = 0$.

2.3 Continuity
Limits – an interpretation
One way of interpreting the notion of a limit in mathematics is as a way
to see what you cannot look at directly. When you calculate limx→a f(x)
you are in some sense ‘predicting’ what f(a) might be, by looking only
at the values of f(x) for x near a. It’s as if someone puts their thumb over
the graph and asks you what’s underneath.
Sometimes, you cannot tell what the function should be at the limit point
(the limit does not exist). For example, given the signum function (Figure
2.4 (2)), is g(0) equal to 1 or −1?
Other times, you can make an informed guess (the limit limx→a f(x) exists),
but your guess is wrong (limx→a f(x) ≠ f(a)). For example, if shown the
δ function of Example 2.27 (and Figure 2.6), but the value at 0 covered
up, what would you expect the value to be? Presumably 0?
Finally, there is the case when the function behaves ‘as expected’ – you
can make an informed prediction (the limit limx→a f(x) exists), then the
graph is uncovered, and you see your prediction was correct (limx→a f(x) =
f(a)). Functions that behave this way are said to be continuous.

Continuous functions

Definition 2.42. Let f(x) be defined on an open interval containing


a ∈ R. We say f is continuous at a if limx→a f(x) exists and equals
f(a). We say f is a continuous function if it is continuous at every
point of its domain.

Loosely speaking, a function is continuous if it does not have any jumps,


breaks or ‘discontinuity’ in its graph. A rule of thumb is that a function
is continuous if you can draw its graph without having to lift pen from
paper.
Sums, products and compositions of continuous functions are continuous.
A quotient of continuous functions is continuous at any point where the
denominator function is not zero.

Example 2.43.

1. Continuous functions
All polynomials are continuous functions (by Theorem 2.31). Ra-
tional functions are continuous except at points where the de-
nominator is zero (at these points the functions are undefined).
Sine and cosine are continuous functions. Tangent is continu-
ous except at points where the cosine is zero.

2. Non-continuous functions
Functions that are not continuous include the signum function
and the function $h(x) = \sin\frac{1}{x}$ (Examples 2.25 (2) and 2.26) which
are both discontinuous at 0. The floor function (or ‘round down’
function)

⌊x⌋ := the greatest integer less than or equal to x,

has a discontinuity at every integer. The graph of the floor
function looks like a staircase – see Figure 2.7.
The function

$$q(x) := \begin{cases} 1 & x \in \mathbb{Q} \\ 0 & x \notin \mathbb{Q}, \end{cases}$$

is not continuous at any point. Can you imagine what its graph
looks like?

[Margin note: Try not to spend too long imagining it – it is fairly
unimaginable!]

Example 2.44. Find the value or values of k ∈ R for which the function

$$f(x) = \begin{cases} 2x + k, & x \leq -1 \\ x^2 + 1, & x > -1, \end{cases}$$

is continuous.


Solution. The two pieces of this piecewise defined function are each
continuous, but f could fail to be continuous at the point where the
two pieces meet, namely at x = −1.
The left hand limit as we approach −1 is 2(−1) + k, the right hand limit
is $(-1)^2 + 1 = 2$. These agree, and we have a limit, when −2 + k = 2,
that is, k = 4. The limit (from both sides) is then 2 which equals
f(−1). So f is continuous when k = 4. □
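
The conclusion of Example 2.44 can be checked numerically by measuring the gap between the two pieces just either side of x = −1. The helper names `make_f` and `jump_at_minus_one`, and the step size, are our own choices for this sketch.

```python
def make_f(k):
    """Build the piecewise function of Example 2.44 for a given k."""
    def f(x):
        return 2 * x + k if x <= -1 else x ** 2 + 1
    return f

def jump_at_minus_one(k, h=1e-9):
    """Approximate (right hand value) - (left hand value) near x = -1."""
    f = make_f(k)
    return f(-1 + h) - f(-1 - h)

for k in [0, 2, 4, 6]:
    print(k, jump_at_minus_one(k))
```

Only k = 4 gives a jump of (approximately) zero; for other values of k the graph has a break at x = −1.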

Figure 2.7: The floor function of Example 2.43 (2)

Exercise 2.45. Consider the piecewise-defined function

$$f(x) = \begin{cases} kx + 2, & x \leq 1 \\ (kx)^2 - 10, & x > 1. \end{cases}$$

Sketch this function for k = 1. Is it continuous? Find any values of
k for which f is continuous.
Chapter 3

Differentiation

3.1 Rates of change


Slopes of straight lines
[Margin note: See also Remark 1.32.]
Definition 3.1. The slope of a straight line in R2 is the tangent tan θ
of the angle θ the line makes with the positive x-axis.

Consider the straight line in Figure 3.1 and the angle θ that it makes
with the x-axis. Given any two points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ on
the line, the slope of that line, tan θ, is also given by

$$\frac{y_2 - y_1}{x_2 - x_1},$$

that is, the vertical difference divided by the horizontal difference.

[Margin note: In MATH10390 the coordinates of points in $\mathbb{R}^2$ are
typically labelled $(x_1, x_2)$ and $(y_1, y_2)$, etc.]

Figure 3.1: $\tan\theta = \frac{y_2 - y_1}{x_2 - x_1}$

For the straight line, f(x) = mx + c, the slope turns out to be m. Notice
that the slope does not depend on the constant c. We may use the
phrase ‘rate of change’ instead of ‘slope’ because, if the slope is m, then
by moving 1 unit along the x-axis, the value of the function changes by
m, i.e.

f(a + 1) = m(a + 1) + c = ma + m + c = f(a) + m.
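
As a small check of these statements, the sketch below computes the slope from two points on the line f(x) = 3x + 2 (an example of our choosing) using the ‘vertical difference over horizontal difference’ formula. The function name `slope` is ours.

```python
def slope(p, q):
    """Slope of the line through p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

m, c = 3, 2
f = lambda x: m * x + c

# Any two distinct points on the line give the same slope, namely m.
print(slope((0, f(0)), (5, f(5))))
print(slope((-2, f(-2)), (7, f(7))))

# Moving one unit to the right changes the value by m, wherever we start.
print(f(4 + 1) - f(4))
```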

If the line is rising as we go left to right (that is, as x increases) then


its slope is positive, if it is falling then we have a negative slope. The
lines in Figure 3.2 have positive slope, negative slope, and slope zero,
respectively.

Figure 3.2: Lines having positive, negative and zero slopes, respectively

[Margin note: Isaac Newton (1643 – 1727). ‘I can calculate the motion of
heavenly bodies, but not the madness of people.’ Image source: Wikipedia.]

Thus linear functions change value in a very rigid way – the rate of
change is constant (no matter where we are on the line, if we move one
unit to the right on the line then we increase our height by m). Most
functions change values in more exciting ways.

The Calculus
Calculus is the study of how mathematical functions change their values.
The formal study of calculus began in the late 17th century. Isaac New-
ton developed this new science of change to deal with the motion of the
heavenly bodies under the action of gravity. Newton’s great work, Prin-
cipia Mathematica, was published in 1687. Around the same time, and
independently, Gottfried Leibniz studied derivatives and integrals in a
more abstract sense. Argument about who was first to come up with the
important ideas was a divisive topic among academics at the time.

It is interesting to muse on the fact that while Newton would have con-
sidered himself primarily a physicist, his works on optics and gravity,
while great breakthroughs at the time, have been superseded by newer
and better theories, for example, Einstein’s General Theory of Relativity.
However, relativity, and all of modern science, relies on Newton’s
indelible mathematical contribution: the calculus.

Rates of change
For a function whose graph is a straight line, the slope of the line is a
measure of how quickly the values of the function change. For a more
general function, how do we measure the rate of change of the function?
Essentially, given a point on the x-axis we want to measure how steep
the graph of a function f is at that point. This is done according to a
simple strategy. We simply pick the straight line which best describes
the function at that point and say that the rate of change of the function
at that point is just the slope of this ‘special’ line. The special line which
best approximates the function at a point, a, is called the tangent line
to the function at a or ‘the linear approximation of the function’. For
example, Figure 3.3 shows the tangent line to a function at the point
a = 3.

Figure 3.3: tangent line (labelled ‘tangent line at 3’ in the figure)

[Margin note: In many cases a tangent line touches the curve at a point, but does not ‘cross over’ the curve locally (though it can cross the curve somewhere else, as we can see in Figure 3.3). Sometimes a tangent line will cross over the curve locally – Remark 4.5.]

Definition 3.2. Let f : R → R be a function and let a ∈ R. The rate of change of f at the point a is the slope of the tangent line to f at a and is denoted by $f'(a)$. This is also called the derivative of f at a.
We call the function $f'$ the derivative of f.

© 2023 Michael Mackey and Richard Smith. These notes are for personal use only and should not be circulated.

3.2 Differentiation from First Principles


Now comes the tricky part. How do we find the rate of change of a
function f whose graph is not a straight line?
The problem is to find the slope of the tangent line to f at x = a. To find the (slope of the) tangent line, we need two points on the line. One point is easy: P = (a, f(a)) is a point on the tangent line – it is the point where the tangent line touches the graph of f. However, we don’t have a second point on the tangent line. We can, however, calculate other points on the graph of the function; consider the sequence of Figures 3.4 – 3.6.

Figure 3.4: differentiation from first principles (P = (3, f(3)), Q = (4, f(4)))

Figure 3.5: differentiation from first principles (P at 3, Q at $3\tfrac{1}{2}$)

In the pictures, we have marked points P = (a, f(a)) and Q = (x, f(x))
on each graph. We notice that as the point Q gets closer to the point
P, the solid secant line from P to Q gets closer to the dotted tangent
line. In particular, the slope of the secant line from P to Q gets closer to
the slope of the tangent line. We obtain the slope of the tangent line by
taking the limit of the slope of the secant line as x → a.

Figure 3.6: differentiation from first principles (P at 3, Q at $3\tfrac{1}{5}$)

Notice that as Q is a point on the graph, Q is of the form (x, f(x)), so


when we say Q is getting closer to P = (a, f(a)), it is equivalent (at least
for a continuous function) to saying x is getting closer to a.
The slope of the straight line between two points $(x_1, y_1)$ and $(x_2, y_2)$ is $(y_2 - y_1)/(x_2 - x_1)$, so the slope of the line between P = (a, f(a)) and Q = (x, f(x)) is just
\[ \frac{f(x) - f(a)}{x - a}. \]
Now $f'(a)$, the slope of the tangent line, is the limit of this quantity as x → a:
\[ f'(a) = \text{slope of tangent line} = \text{limit of slope of secant line } PQ = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}. \]

[Margin note: There is an interactive GeoGebra applet, written by Dr John Sheekey, that you can use to see how the secant line approaches the tangent line as h → 0.]

Often, a + h is used instead of x, and with this notation we consider the limit as h → 0 instead of as x → a. This gives us the following reformulation of Definition 3.2, which is of central importance.

Definition 3.3. The derivative of a function f is the function $f'$ whose value at a is
\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \]
If this limit exists we say f is differentiable at a. If f is differentiable at every point of its domain, then we say it is a differentiable function.

[Margin note: This is the most important definition in calculus!]

The process of finding the derivative of a function is called differentiation.


Learning how to differentiate functions is the focus of the rest of this
chapter and the next chapter.
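
To get a feel for Definition 3.3 before doing any algebra, it can help to compute a few secant slopes numerically. The following is only an illustrative sketch: the names `difference_quotient` and `square` are our own, and the choice a = 3 with shrinking values of h simply mirrors Figures 3.4 – 3.6. A computer cannot take the limit itself; it can only suggest what the limit might be.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def square(x):
    """The sample function f(x) = x^2 (an arbitrary choice for illustration)."""
    return x * x


# Q = (a + h, f(a + h)) slides towards P = (a, f(a)) as h shrinks.
a = 3.0
for h in (1.0, 0.5, 0.2, 0.01, 0.0001):
    print(f"h = {h:<7} secant slope = {difference_quotient(square, a, h)}")
```

The printed slopes settle down near 6 as h shrinks, which suggests (but does not prove) that the tangent line at a = 3 has slope 6.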


Example 3.4. Find the slope of the tangent line to the function f defined by $f(x) = x^2$ at the point a. In other words, find the derivative of $f(x) = x^2$ at a.

Solution. From Definition 3.3, we know that
\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \]
In this example, $f(x) = x^2$, so substituting into the limit formula, we get
\begin{align*}
f'(a) &= \lim_{h \to 0} \frac{(a + h)^2 - a^2}{h} \\
&= \lim_{h \to 0} \frac{a^2 + 2ah + h^2 - a^2}{h} && \text{Example 1.7} \\
&= \lim_{h \to 0} \frac{2ah + h^2}{h} \\
&= \lim_{h \to 0} (2a + h) = 2a. \qquad \square
\end{align*}

This method of finding the derivative is called differentiation from first


principles. It would be quite arduous to apply this method in all cases.
Later, we’ll provide some faster techniques for finding the derivative of a
function f.
Example 3.5. Differentiate f(x) = sin x from first principles and in so doing show that
\[ f'(x) = \cos x. \]

[Margin note: This example is quite tricky, so do not despair if you find the whole thing challenging. Concentrate initially on the individual steps: why does each line follow from the previous one? What fact or result was used?]

Solution.
\begin{align*}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h} \\
&= \lim_{h \to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} && \text{Lemma 1.33 (3)} \\
&= \lim_{h \to 0} \frac{\sin x(\cos h - 1) + \cos x \sin h}{h} \\
&= \lim_{h \to 0} \left( \sin x \, \frac{\cos h - 1}{h} + \cos x \, \frac{\sin h}{h} \right) \\
&= \sin x \lim_{h \to 0} \frac{\cos h - 1}{h} + \cos x \lim_{h \to 0} \frac{\sin h}{h} && \text{Theorem 2.29 (1)}
\end{align*}
Here we may use Exercise 2.41 (2) and Theorem 2.37:
\begin{align*}
&= (\sin x) \cdot 0 + (\cos x) \cdot 1 \\
&= \cos x. \qquad \square
\end{align*}
The graph of the function f(x) = mx + c is a straight line of slope m. The


tangent at any point on a straight line is the line itself. Hence the slope
of the tangent, i.e. the derivative, is the slope of the line itself, which is
m. The following exercise asks you to formalise this conclusion.

Exercise 3.6. Verify that if f(x) = mx + c then $f'(x) = m$ by differentiating from first principles. (In particular, this shows that the derivative of a constant function f(x) = c is zero.)

Differentiability and Continuity


If the function f is differentiable at the point a then we know $\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$ exists. Since $\lim_{x \to a} (x - a) = 0$, one of the rules of limits, namely Theorem 2.29 (4), implies that
\begin{align*}
\lim_{x \to a} \bigl( f(x) - f(a) \bigr) &= \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \, (x - a) \\
&= \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \lim_{x \to a} (x - a) = f'(a) \cdot 0 = 0.
\end{align*}
So, if f is differentiable at a, then it follows that $\lim_{x \to a} f(x) = f(a)$, which is just saying that f is continuous at a.

Theorem 3.7. A differentiable function is continuous.

The converse statement is not true. A function can be continuous but not
differentiable.

Example 3.8. The absolute value function f(x) = |x| (see Definition 2.3) is continuous but not differentiable at 0.

The technical reason that |x| is not differentiable at 0 is that we get different limits of $\frac{f(0 + h) - f(0)}{h}$ as h → 0 from the left and the right, and hence the limit does not exist. However, this non-differentiability should make good geometrical sense to you: the graph of |x| is V-shaped (Figure 2.1) and at the corner, there is no unique tangent line. If there is no unique tangent line, then there is no unique slope of the tangent line, and this means no derivative.
A rule of thumb is that the graph of a continuous function has no jumps or
breaks while, in addition, that of a differentiable function has no corners,
and is frequently described as being smooth.
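
The corner of |x| at 0 shows up clearly if we compute the difference quotient from each side. This is a small sketch under our own naming (`difference_quotient`, `right_slopes`, `left_slopes` are not from the notes); it uses Python's built-in `abs` for the absolute value function.

```python
def difference_quotient(f, a, h):
    """Secant slope (f(a + h) - f(a)) / h; h may be negative."""
    return (f(a + h) - f(a)) / h


steps = (0.1, 0.01, 0.001)

# Approach 0 from the right (h > 0) and from the left (h < 0).
right_slopes = [difference_quotient(abs, 0.0, h) for h in steps]
left_slopes = [difference_quotient(abs, 0.0, -h) for h in steps]

print("from the right:", right_slopes)
print("from the left: ", left_slopes)
```

Every slope computed from the right is 1 and every slope from the left is −1, so the two one-sided limits disagree and the limit in Definition 3.3 cannot exist at 0.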

Alternative notation for the derivative


So far we have used $f'$ to indicate the derivative function of the function f, and of course $f'(x)$ denotes the value of the derivative at an arbitrary point x. This notation is due to Lagrange. Newton actually used a dot, that is, $\dot{f}$, for the derivative, which is still used today, particularly when the variable is t (for time) instead of x.
Another very common notation for the derivative of a function f is $\frac{df}{dx}$, which may also be written as $\frac{d}{dx} f(x)$ and is due to Gottfried Leibniz, who developed differential calculus around the same time as Isaac Newton in the 17th century. Since we usually think of the values of f as being plotted on the vertical y-axis, and consequently often write y = f(x), we also often write $\frac{dy}{dx}$ interchangeably with $\frac{df}{dx}$ and $f'(x)$.
Leibniz’s notation might appear strange – exactly what dy and dx are, and why you cannot cancel the ds, requires some explaining. Let us just say that the notation reflects the formula for the derivative (Definition 3.3) where we divide the (infinitesimally) small change in the y-value by the infinitesimally small change in x-value that produces the change in y.

[Margin note: Mathematics is an intellectually rigorous discipline. If you find distinctly fishy the idea of trying to divide one ‘infinitesimally small’ quantity by another one (whatever they may be), to get a ‘real’ number, then you are in good company. Meanwhile, the definition of derivative in terms of a limit is perfectly sound.]

There are some subtle differences between the alternative notations – for example, the prime notation $f'$ makes sense without reference to the variable used in defining the function while Leibniz’s notation $\frac{df}{dx}$ requires reference to the variable, but for the most part the notations are freely interchangeable and we use both throughout.

Example 3.9. For the function f : R → R given by $f(x) = x^2$, we have seen $f'(x) = 2x$ and so $f'(3) = 6$. In Leibniz’s notation, this is written $\frac{df}{dx} = 2x$ and $\frac{df}{dx}(3) = 6$ or $\frac{df}{dx}\big|_{x=3} = 6$.
We have also seen that the derivative of sine is cosine. In Leibniz’s notation, we could equivalently write this as $\frac{d}{dx} \sin x = \cos x$ or $\frac{d}{d\theta} \sin \theta = \cos \theta$, or in terms of any other variable that is convenient. In the prime notation, we can just write $\sin' = \cos$ without mentioning a variable.

Exercise 3.10. Find the derivative of $f(x) = x^3$ from first principles. (Example 2.34 (11) should prove useful.)

Exercise 3.11. Find $f'(x)$ if $f(x) = \sqrt{x}$, from first principles. (Hint: use the trick introduced in Example 2.35.)

3.3 Rules for differentiating


Differentiating functions from first principles becomes tedious after a little while so it is useful to develop some rules which speed up the process. The first such is a rule that allows us to differentiate powers of x.

[Margin note: Differentiating from first principles is useful because it gives meaning to the process of differentiation.]

Lemma 3.12. Fix n ∈ N and let $f(x) = x^n$, x ∈ R. Then $f'(x) = nx^{n-1}$.

Proof. We use here a factorisation established in Exercise 1.8 (5):
\[ a^n - b^n = (a - b)(a^{n-1} + a^{n-2} b + \cdots + a b^{n-2} + b^{n-1}). \]
Because of this, we can write
\begin{align*}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{(x + h)^n - x^n}{h} \\
\intertext{and letting a = x + h and b = x}
&= \lim_{h \to 0} \frac{h \left( (x + h)^{n-1} + (x + h)^{n-2} x + \cdots + (x + h) x^{n-2} + x^{n-1} \right)}{h} \\
&= x^{n-1} + x^{n-2} x + \cdots + x (x^{n-2}) + x^{n-1} \\
&= nx^{n-1}.
\end{align*}


It is reassuring to see that if we set n = 2 in this lemma then it tells us that the derivative of $f(x) = x^2$ is $f'(x) = 2x$, which agrees with our differentiation from first principles in Example 3.4. Having proven the lemma, we can now say that $\frac{d}{dx} x^3 = 3x^2$ and $\frac{d}{dx} x^{100} = 100x^{99}$ without having to be dragged, kicking and screaming, towards first principles.
Actually, we can do even better. The next fact applies to arbitrary real powers of x, not just positive integers:
Fact 3.13. Suppose $f(x) = x^r$ for some r ∈ R. Then $f'(x) = rx^{r-1}$.

[Margin note: We have only defined $x^r$ when r is rational (Definition 1.12), but it is possible to define $x^r$ for real r – see Appendix B.2. We need to take care with the domain of $x^r$, e.g. $x^{1/2} = \sqrt{x}$ cannot have x < 0. Moreover, the resulting derivative $\frac{1}{2} x^{-1/2} = \frac{1}{2\sqrt{x}}$ cannot have x = 0. This rule, and others which follow, only apply where the function and its derivative are defined.]

This power rule for differentiation would quickly answer Exercise 3.11 (if we weren’t asked to differentiate from first principles) because it tells us that
\[ \frac{d}{dx} \sqrt{x} = \frac{d}{dx} x^{1/2} = \frac{1}{2} x^{-1/2} = \frac{1}{2\sqrt{x}}. \]
Other rules can also be proved from first principles. Most functions we meet are built up as sums, products, quotients and compositions (chains) of familiar elementary functions. These elementary functions include power functions $x^r$, trigonometric functions sin x, cos x, exponential functions such as $e^x$, as well as inverses of these. The rules of differentiation allow us to differentiate functions which are built out of these elementary functions. For example, we know the derivatives of $u(x) = x^2$ and v(x) = sin x, but what about the derivative of $(u + v)(x) = x^2 + \sin x$ or $(uv)(x) = x^2 \sin x$ or $\frac{u}{v}(x) = \frac{x^2}{\sin x}$?

[Margin note: The sum rule says ‘the derivative of a sum is the sum of the derivatives’.]

Theorem 3.14. Suppose u(x) and v(x) are differentiable functions. Then

1. $(ku)' = k \cdot u'$, for a constant k ∈ R, (scalar multiple rule)

2. $(u + v)' = u' + v'$, (sum rule)

3. $(uv)' = uv' + u'v$, (product rule)

4. $\left( \dfrac{u}{v} \right)' = \dfrac{vu' - uv'}{v^2}$. (quotient rule)

There’s a lot in this theorem so let’s break it down. Theorem 3.14 (1) tells us, for example, that
\[ \frac{d}{dx} (100 \sin x) = 100 \frac{d}{dx} \sin x = 100 \cos x. \]
Theorem 3.14 (2) tells us that
\[ \frac{d}{dx} (x^2 + \sin x) = \frac{d}{dx} x^2 + \frac{d}{dx} (\sin x) = 2x + \cos x. \]
These two parts follow from the corresponding rules of limits, Theorem 2.29. We’ll just prove the second, and you can do the first.

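Proof of Theorem 3.14 (2).
\begin{align*}
(u + v)'(x) &= \lim_{h \to 0} \frac{(u + v)(x + h) - (u + v)(x)}{h} \\
&= \lim_{h \to 0} \frac{u(x + h) + v(x + h) - u(x) - v(x)}{h} \\
&= \lim_{h \to 0} \left( \frac{u(x + h) - u(x)}{h} + \frac{v(x + h) - v(x)}{h} \right) \\
&= \lim_{h \to 0} \frac{u(x + h) - u(x)}{h} + \lim_{h \to 0} \frac{v(x + h) - v(x)}{h} && \text{Thm 2.29 (2)} \\
&= u'(x) + v'(x).
\end{align*}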

Remarks 3.15. Differentiation is an operation which takes one function f as input and produces another function $f'$ as output. So you can think of the act of differentiation itself as a function. Sometimes the notation D(f) is used instead of $f'$ or $\frac{df}{dx}$ to reflect this way of thinking.
Together, Theorem 3.14 (1) and (2) above state that differentiation is a linear function in the sense that you learn about in Linear Algebra (though it is out of the scope of MATH10390).

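The product rule allows us to compute, for example, $\frac{d}{dx} (x^2 \sin x)$, for it says
\[ \frac{d}{dx} ( \underbrace{x^2}_{u} \, \underbrace{\sin x}_{v} ) = \underbrace{x^2}_{u} \underbrace{\left( \frac{d}{dx} \sin x \right)}_{v'} + \underbrace{\left( \frac{d}{dx} x^2 \right)}_{u'} \underbrace{\sin x}_{v} = x^2 \cos x + 2x \sin x. \]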

Example 3.16. Calculate $\frac{d}{dx} (1 - x) \sin x$.

Solution. Using the product rule,
\begin{align*}
\frac{d}{dx} (1 - x) \sin x &= (1 - x) \frac{d}{dx} \sin x + \left( \frac{d}{dx} (1 - x) \right) \sin x \\
&= (1 - x) \cos x - \sin x. \qquad \square
\end{align*}

Let’s try to prove the product rule.

[Margin note: The derivative of a product is not the product of the derivatives!]

Proof of Theorem 3.14 (3).
\begin{align*}
(uv)'(x) &= \lim_{h \to 0} \frac{(uv)(x + h) - (uv)(x)}{h} \\
&= \lim_{h \to 0} \frac{u(x + h) v(x + h) - u(x) v(x)}{h} \\
&= \lim_{h \to 0} \frac{u(x + h) v(x + h) - u(x + h) v(x) + u(x + h) v(x) - u(x) v(x)}{h} \\
&= \lim_{h \to 0} u(x + h) \, \frac{v(x + h) - v(x)}{h} + \lim_{h \to 0} \frac{u(x + h) - u(x)}{h} \, v(x) \\
&= u(x) v'(x) + v(x) u'(x).
\end{align*}

[Margin note: On the third line, we subtract and then add back a particular term. This term has been chosen so that we can take advantage of the limit definitions of $u'(x)$ and $v'(x)$ in the remainder of the proof.]

Next, the quotient rule, which allows us to find derivatives such as $\frac{d}{dx} \frac{\sin x}{x^2}$.

[Margin note: The derivative of a quotient is not the quotient of the derivatives!]

Example 3.17. Differentiate $f(x) = \dfrac{\sin x}{x^2}$.

Solution. Setting u(x) = sin x and $v(x) = x^2$, and applying the quotient rule
\[ \left( \frac{u}{v} \right)' = \frac{v \cdot u' - u \cdot v'}{v^2}, \]
we find
\begin{align*}
\frac{d}{dx} \frac{\sin x}{x^2} &= \frac{x^2 \cos x - (\sin x)(2x)}{x^4} \\
&= \frac{\cos x}{x^2} - 2 \frac{\sin x}{x^3}. \qquad \square
\end{align*}
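
A quick numerical comparison can catch slips when applying the product and quotient rules. The sketch below compares the formulas obtained above for $x^2 \sin x$ and $\frac{\sin x}{x^2}$ with a numerical slope. Note the assumptions: `numerical_derivative` uses a symmetric (central) difference, which is not the one-sided quotient of Definition 3.3 but approximates the same limit; all function names and the sample point x = 1.2 are our own choices.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x) (an assumption, not Definition 3.3)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def product(x):
    return x**2 * math.sin(x)


def product_rule(x):
    # (uv)' = u v' + u' v with u = x^2, v = sin x
    return x**2 * math.cos(x) + 2 * x * math.sin(x)


def quotient(x):
    return math.sin(x) / x**2


def quotient_rule(x):
    # (u/v)' = (v u' - u v') / v^2 with u = sin x, v = x^2
    return (x**2 * math.cos(x) - math.sin(x) * 2 * x) / x**4


x = 1.2
print(product_rule(x), numerical_derivative(product, x))
print(quotient_rule(x), numerical_derivative(quotient, x))
```

Each pair of printed numbers should agree to several decimal places; a large disagreement would point to an algebra mistake.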

The proof of the quotient rule is not very different from the product rule
– which gives you the perfect opportunity to practice your skills!

Exercise 3.18. Prove the quotient rule for differentiation.



Exercise 3.19. Show that
\[ \frac{d}{dx} \tan x = \frac{1}{\cos^2 x}. \]

The derivative of exp x


A proper treatment of the exponential function and its inverse, the natural logarithm, is given in Appendix B.2. For the purposes of this subsection, it will suffice to state that the exponential function exp : R → R is given by $\exp x = e^x$, where e is a very special irrational number (known as Euler’s number) and is approximately 2.718281828459. The graph of exp x is given in Figure 2.3.
The exponential function is unique in the sense that it is equal to its own derivative,
\[ \frac{d}{dx} \exp x = \exp x, \]
and takes value 1 at 0. If a function f : R → R satisfies these two properties, then f = exp.

[Margin note: Leonhard Euler (1707 – 1783) was one of the greatest mathematicians of all time. Image source: Wikipedia.]

[Margin note: The only other functions which equal their derivatives are constant multiples of the exponential function, $Ce^x$, including (when C = 0) the zero function itself. Note that $e^{x+k} = e^k \cdot e^x$ is also a constant multiple of $e^x$.]

Example 3.20. Evaluate the following expressions.

1. $\frac{d}{dx} (3x^2 - 4e^x + \sin x + 2)$.

2. Find $f'(x)$, where $f(x) = \dfrac{x\sqrt{x}}{\sqrt[3]{x}} - \dfrac{1}{\sqrt{x}}$.

3. Find $f'(x)$, where $f(x) = (e^x + 2) \cos x$.

4. $\frac{d}{dx} e^{2x}$.

Solution.

1. $\frac{d}{dx} (3x^2 - 4e^x + \sin x + 2) = 6x - 4e^x + \cos x$.

2. Here, we use the laws of indices to write $f(x) = x^{7/6} - x^{-1/2}$ and then easily one has $f'(x) = \frac{7}{6} x^{1/6} + \frac{1}{2} x^{-3/2}$, or, in surd notation, $f'(x) = \frac{7}{6} \sqrt[6]{x} + \frac{1}{2(\sqrt{x})^3}$.

3. Using the product rule,
\begin{align*}
f'(x) &= \cos x \, \frac{d}{dx} (e^x + 2) + (e^x + 2) \frac{d}{dx} \cos x \\
&= \cos x \cdot e^x + (e^x + 2)(-\sin x) \\
&= e^x (\cos x - \sin x) - 2 \sin x.
\end{align*}

4. For this we use laws of indices to write $e^{2x} = (e^x)^2 = e^x \cdot e^x$. Now we use the product rule to write
\[ \frac{d}{dx} e^{2x} = \frac{d}{dx} (e^x e^x) = e^x \frac{d}{dx} e^x + \left( \frac{d}{dx} e^x \right) e^x = e^x \cdot e^x + e^x \cdot e^x = 2e^{2x}. \qquad \square \]

Exercise 3.21. Find $\frac{d}{dx}$ of each of the following expressions.

1. $1 + x + \frac{1}{2} x^2 + \frac{1}{6} x^3 + \frac{1}{4!} x^4$

2. $\dfrac{x + 1}{x^2 + 1}$

3. x cos x (see Example 3.27 for the derivative of cosine)

4. $\dfrac{x \sin x}{1 + x^2}$

5. $e^{3x}$. (Hint: $e^{3x} = e^{2x} \cdot e^x$, so use the product rule and Example 3.20 (4).)

[Margin note: The number 4!, pronounced ‘4 factorial’, is short for 4 · 3 · 2 · 1, i.e. 24. In general, given n ∈ N, n! = n(n − 1) · · · 3 · 2 · 1. Interestingly, n! is the number of ways of arranging n different objects.]

3.4 The Chain Rule


The last rule required to differentiate almost any (differentiable) function we meet is the chain rule, which tells us how to differentiate a chain or composition of functions. For example if f(x) = sin x and $g(x) = x^2$ then $(f \circ g)(x) = f(g(x)) = \sin(x^2)$. We want to find $(f \circ g)'(x)$, that is, in this case, $\frac{d}{dx} \sin(x^2)$. The tool that enables us to do that is the chain rule.

Theorem 3.22 (The Chain Rule). Suppose f and g are differentiable functions. Then
\[ (f \circ g)'(a) = f'(g(a)) \cdot g'(a). \]
That is, the derivative of f ◦ g at a is the derivative of f at g(a) multiplied by the derivative of g at a.

The chain rule looks quite natural in Leibniz notation, where it takes the form of a ‘cancellation’.

Theorem 3.23 (Chain rule (Leibniz form)). Suppose f and g are differentiable functions. Then setting u = g(x) we have
\[ \frac{d}{dx} (f \circ g)(x) = \frac{d}{dx} f(u) = \frac{df}{du} \cdot \frac{du}{dx}. \]

We’ll leave the proof to Appendix B.1. The chain rule tells us that to differentiate f ◦ g we treat the inner function g as a variable; we write (f ◦ g)(x) as f(u), where u = g(x). We then want $\frac{d}{dx} f(u)$ but since we can’t differentiate a function of one variable (here u) with respect to a different variable (here x), we actually find $\frac{d}{du} f(u)$ and then make up for the change of variable by multiplying by $\frac{du}{dx}$, which we find by writing u back as an expression in x, u(x).

Example 3.24. Find $\frac{d}{dx} \sin x^2$.

Solution. We write $u(x) = x^2$ so the problem becomes that of finding $\frac{d}{dx} \sin u$, which by the chain rule is $\frac{d}{du} \sin u \cdot \frac{du}{dx}$. Now the first of these derivatives is cos u, which we may write as $\cos x^2$. The second is $\frac{du}{dx} = \frac{d}{dx} u(x) = \frac{d}{dx} x^2 = 2x$. Hence
\[ \frac{d}{dx} \sin x^2 = 2x \cos x^2. \qquad \square \]

Example 3.25. What is the derivative of $(x^2 + 3)^7$?

Solution. This is f(u(x)), where $f(u) = u^7$ and $u(x) = x^2 + 3$. By the chain rule:
\[ \frac{d}{dx} \bigl( f(u(x)) \bigr) = \frac{df}{du} \cdot \frac{du}{dx} = 7u^6 \cdot 2x = 14x \cdot u^6 = 14x(x^2 + 3)^6. \qquad \square \]

You could have avoided using the chain rule in the previous example by expanding $(x^2 + 3)^7$ into a (lengthy) polynomial and then differentiating that polynomial using the power rule, but it would be much more work.
With a little practice, using the chain rule becomes second nature. Given an expression like $(x^2 + 3)^7$ to differentiate, you just mentally put your thumb over the inside function $x^2 + 3$ and say to yourself ‘ok, when I differentiate something to the power of 7, I get 7 times that thing to the power of 6. Then I have to multiply by the derivative of what’s under my thumb. . .’
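
The ‘thumb’ recipe can be written down quite literally: evaluate the outer derivative at the inner function, then multiply by the inner derivative. The sketch below does this for Examples 3.24 and 3.25 and compares with a numerical slope. The helper names (`chain_rule`, `numerical_derivative`, and so on) are hypothetical, the central difference is our own choice of approximation, and the sample point x = 0.5 is arbitrary.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x) (an assumption, used only as a check)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def chain_rule(f_prime, g, g_prime, x):
    """Theorem 3.22: (f o g)'(x) = f'(g(x)) * g'(x)."""
    return f_prime(g(x)) * g_prime(x)


def sin_of_square(x):
    return math.sin(x**2)            # Example 3.24


def seventh_power(x):
    return (x**2 + 3) ** 7           # Example 3.25


x = 0.5

# Outer function sin (derivative cos), inner function x^2 (derivative 2x).
exact_1 = chain_rule(math.cos, lambda t: t**2, lambda t: 2 * t, x)

# Outer function u^7 (derivative 7u^6), inner function x^2 + 3 (derivative 2x).
exact_2 = chain_rule(lambda u: 7 * u**6, lambda t: t**2 + 3, lambda t: 2 * t, x)

print(exact_1, numerical_derivative(sin_of_square, x))
print(exact_2, numerical_derivative(seventh_power, x))
```

In each printed pair the chain rule value and the numerical estimate should agree closely.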
Here are some more examples of its use.

Example 3.26.

1. Find $\frac{d}{dx} e^{3x}$.

2. Find $\frac{d}{dx} \sin^2 x$.

Solution.

1. Let u(x) = 3x. Then $\frac{d}{dx} e^{3x} = \frac{d}{dx} e^u = \frac{d}{du} e^u \cdot \frac{du}{dx} = e^u \cdot 3 = 3e^{3x}$.

[Margin note: This is much quicker than the hinted method in Exercise 3.21 (5).]

2. Recall $\sin^2 x$ means $(\sin x)^2$. We’ll interpret this as $u^2$ where u = sin x and apply the chain rule to get $\frac{d}{dx} u^2 = 2u \frac{du}{dx} = 2 \sin x \cos x$.

There is often more than one way to differentiate a given function.

Example 3.27. What is $\frac{d}{dx} \cos x$?

Solution. We can do this in the way we differentiated sin x, from first principles.
A second method is to use the fact remarked on at the end of Chapter 1, that sin and cos are horizontal translations of each other, specifically, $\cos x = \sin(x + \frac{\pi}{2})$. Thus $\frac{d}{dx} \cos x = \frac{d}{dx} \sin u$, where $u = x + \frac{\pi}{2}$, and so by the chain rule
\[ \frac{d}{dx} \cos x = \cos u \cdot \frac{du}{dx} = \cos u = \cos(x + \tfrac{\pi}{2}), \]
because $\frac{d}{dx} (x + \frac{\pi}{2}) = 1$. This can be simplified to give
\[ \cos(x + \tfrac{\pi}{2}) = \sin(x + \tfrac{\pi}{2} + \tfrac{\pi}{2}) = \sin(x + \pi) = -\sin x. \]
No matter which way we do it, we find
\[ \frac{d}{dx} \cos x = -\sin x. \qquad \square \]

Example 3.28. Let $\phi(x) = e^{-x^2}$. Calculate $\phi'(x)$.

Solution. Letting $u = -x^2$, we have
\[ \frac{d}{dx} e^{-x^2} = \frac{d}{dx} e^u = \frac{d}{du} e^u \cdot \frac{du}{dx} = e^u (-2x) = -2x e^{-x^2}. \qquad \square \]

Figure 3.7: The ‘bell curve’ $e^{-x^2}$

[Margin note: The function φ defined here by $\phi(x) = e^{-x^2}$ is one of the most important you will meet, particularly if you are interested in statistics or any type of data science. It is the basis of the normal distribution in statistics, popularly referred to as the bell curve because of the shape of its graph.]

The chain rule can be applied more than once to deal with longer chains
of functions.

Example 3.29. Find $f'(x)$, where f(x) = cos(log(1 + sin x)).

Solution. This is a ‘chain’ of three functions, f = u ◦ v ◦ w where u(x) = cos x, v(x) = log x and w(x) = 1 + sin x. The chain rule says
\[ (u \circ v \circ w)'(x) = u'(v(w(x))) \cdot v'(w(x)) \cdot w'(x), \]
and this gives $f'(x) = -\sin(\log(1 + \sin x)) \cdot \frac{1}{1 + \sin x} \cdot \cos x$. □

[Margin note: We’ve used here the derivative of log x, namely $\frac{1}{x}$, which is shown at the end of the chapter.]

And of course, several rules may need to be used together to find a


derivative.

Example 3.30. Differentiate $f(x) = x e^{\tan x}$.

Solution.
\[ \frac{d}{dx} f(x) = \frac{d}{dx} \left( x e^{\tan x} \right) \]
First, we use the product rule:
\begin{align*}
&= e^{\tan x} \frac{d}{dx} x + x \frac{d}{dx} e^{\tan x} \\
&= e^{\tan x} + x \frac{d}{dx} e^{\tan x}.
\end{align*}
Second, the chain rule on $e^{\tan x}$:
\[ = e^{\tan x} + x \left( e^{\tan x} \cdot \frac{d}{dx} \tan x \right) \]
Finally, we apply the quotient rule on $\tan x = \frac{\sin x}{\cos x}$, which you will have done in Exercise 3.19, to give
\[ \frac{d}{dx} f(x) = e^{\tan x} \left( 1 + x \, \frac{1}{\cos^2 x} \right). \qquad \square \]

Here’s a challenge. You have the tools necessary to answer this, but do
you have enough paper and perseverance?

(x 2 + 1)100 (3x 3 + x)47


Exercise 3.31. Differentiate f(x) = √ √ .
x 2 + 1 5x + 2
3

We will see the trick of logarithmic differentiation later which provides a


shortcut to answering this exercise.

The derivative of log x


Recall that the exponential and logarithm functions are inverses, meaning that exp(log x) = x and log(exp x) = x over the respective domains. Combined with the chain rule, and knowing the derivative of exp x, this fact allows us to find the derivative of log x.
Indeed, we can differentiate both sides of the identity
\[ x = \exp(\log x), \]
to obtain
\begin{align*}
1 &= \frac{d}{dx} \bigl( \exp(\underbrace{\log x}_{u}) \bigr) = \frac{d}{dx} \exp(u) \\
&= \frac{d}{du} \exp u \cdot \frac{d}{dx} u && \text{chain rule} \\
&= \exp u \cdot \frac{d}{dx} \log x \\
&= \exp(\log x) \cdot \frac{d}{dx} \log x \\
&= x \, \frac{d}{dx} \log x,
\end{align*}
and, on dividing both sides by x we have
\[ \frac{d}{dx} \log x = \frac{1}{x}. \]

[Margin note: In this argument, we are implicitly assuming that the derivative of log x exists. Strictly speaking, we should not do this, because it has not been shown to exist. However, the argument is sufficient for our purposes! See Appendix B.2 for a more mathematically robust treatment of exp and log.]
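
As a sanity check on this result, we can compare numerical slopes of the natural logarithm with 1/x at a few points. This is only a sketch: `numerical_derivative` is a hypothetical helper using a central difference (our assumption, not the notes’ definition), `math.log` is Python’s natural logarithm, and the sample points are arbitrary positive numbers.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x) (an assumption, used only as a check)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# log is only defined for x > 0, so we sample positive points.
for x in (0.5, 1.0, 2.0, 10.0):
    estimate = numerical_derivative(math.log, x)
    print(f"x = {x:<5} numerical slope = {estimate:.8f}   1/x = {1 / x:.8f}")
```

The two columns should match to many decimal places, which is consistent with the formula just derived.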

Remarks 3.32. Differentiation is an important skill. Like tying your


shoelaces, if you cannot do it reliably then you will eventually trip
up. Also like tying your shoelaces, you cannot master it by reading
about it. It is a skill that must be practised until it becomes second
nature.

Calvin and Hobbes by Bill Watterson

Chapter 4

More about the Derivative

4.1 Critical Points, local maxima and minima


As mentioned at the start of Chapter 3, and as seen in Figure 3.2 in particular, a straight line with positive slope is rising, or increasing, as x increases (that is, as we move from left to right), while a line with negative slope is falling, or decreasing. We can tell if a function is increasing or decreasing at a point a by looking at its tangent line and asking if that has a positive or negative slope. Remember the slope of the tangent line is just the derivative of f at a and so we can tell if a function f is increasing or decreasing at a by calculating whether $f'(a)$ is positive or negative.

[Margin note: Gottfried Wilhelm Leibniz (1646 – 1716). ‘It is rare to find learned men who are clean, do not stink and have a sense of humour.’ The Duchess of Orléans about G. Leibniz. Image source: Wikipedia.]

Example 4.1. Is $f(x) = 2x^3 - 7x^2 + 3x + 4$ increasing or decreasing at 1?

Solution. Since $f'(x) = 6x^2 - 14x + 3$, we find $f'(1) = -5$ is negative and so f(x) is decreasing at 1. □

Notice that if the graph of the function was readily available then the
fact that the function is decreasing at 1 would be easy to see. Indeed,
we have the graph of this function in Figure 4.1.
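
When no graph is to hand, the same test is easy to automate. The sketch below encodes the function from Example 4.1 together with its derivative and reports the sign of the derivative at a chosen point; the names `f`, `f_prime` and `trend_at` are our own, and the extra sample point 3 is an arbitrary illustration.

```python
def f(x):
    """The function from Example 4.1."""
    return 2 * x**3 - 7 * x**2 + 3 * x + 4


def f_prime(x):
    """Its derivative, found with the power, sum and scalar multiple rules."""
    return 6 * x**2 - 14 * x + 3


def trend_at(a):
    """Classify f at a by the sign of f'(a)."""
    slope = f_prime(a)
    if slope > 0:
        return "increasing"
    if slope < 0:
        return "decreasing"
    return "critical point"


for a in (1, 3):
    print(a, f_prime(a), trend_at(a))
```

At a = 1 the slope is −5, so the function is decreasing there, as found in the example; at a = 3 the slope is 15, so it is increasing.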

Figure 4.1: the function f in Example 4.1 is decreasing at 1

This technique can be used to find the intervals on which a function f is


increasing or decreasing, which is valuable information if one wants to
sketch the graph of a function.
So we can determine if a function is increasing or decreasing by calcu-
lating whether its derivative is positive (> 0) or negative (< 0). But what
if the derivative is zero?

Definition 4.2. Suppose f is differentiable at a point c and $f'(c) = 0$. Then we call c a critical point of f.

Exercise 4.3. Find the critical points of the function given in the previous example, $f(x) = 2x^3 - 7x^2 + 3x + 4$.

Critical points are values of x at which f(x) has a horizontal (flat) tangent line and they are related to local maxima and minima by the following result.

[Margin note: Local maximum of f at a means f(a) is greater than f(x) for all x near a, but the function might have greater values ‘far’ from a. The same applies to local minimum. In Fig. 4.1, we see a local max near 0, but f has greater values near 4.]

Theorem 4.4. If f has a local maximum or local minimum at the point c, and f is differentiable at c, then c is a critical point of f, i.e. $f'(c) = 0$.

If you want to know why this is true, see the proof in Appendix B.1.
Beware that the converse to this theorem is not true!

Remarks 4.5. Although a local maximum or local minimum of a differentiable function is always a critical point, a critical point does not have to be a local maximum or local minimum. You could have behaviour as in Figure 4.2, where a is a critical point but is neither a local maximum nor a local minimum. Notice the tangent line at a actually crosses the graph of the function. We call it a point of inflection.

Figure 4.2: a point of inflection (marked at the point (1, f(1)))

Nevertheless, at a critical point, very often, the graph of a function has either a local maximum and looks like ∩ or has a local minimum and looks like ∪.
For example, in Figure 4.1 above we see two critical points. One lies between 0 and 1 and looks like a local maximum, the other lies slightly to the right of 2 and looks like a local minimum.

[Margin note: You located these points exactly in Exercise 4.3.]

4.2 Finding Maxima and Minima


We often need to find the maximum or minimum value of a function on
its domain, or on some subset of its domain. For example, we may have
a function f which gives us the concentration of medicine in the blood
f(t) at time t (where t is number of hours after dosage), and we want to
know when the drug concentration is greatest and, indeed, what is that
maximum concentration, between time 0 and 6 hours.
Before considering such questions, we need to know that there is a so-
lution. After all, we do not want to waste time looking for a needle in a
haystack if there is no needle to be found. Consider the following.

Example 4.6. What is the maximum value of the function f(x) = x + 2 over the interval (0, 1)?

Solution. Actually, this is a trick question, because there is no maximum value of this function on this interval. The maximum value is not 3: we cannot take x = 1 since 1 is not in the open interval (0, 1). We can get the value 2.99 from the function (at x = 0.99) but this is not the greatest value, since we can also get 2.999. We see that no number smaller than 3 is the greatest value, and the values do not reach 3, so we conclude the function has no greatest value. □

[Margin note: In Example 4.6, 3 is what is called an upper bound for f(x), meaning f(x) ≤ 3 for all values x in the domain (0, 1). Of course, π and 100 are also upper bounds, but 3 is special because it is the least upper bound of f(x). However, it is not a value of f(x) and so cannot be considered the greatest value!]

If we have a closed interval (endpoints included), do you think we will always have a maximum and minimum?

Example 4.7. Let
\[ f(x) = \begin{cases} 1, & \text{if } x = 1 \\ \dfrac{1}{1 - x}, & \text{if } x < 1. \end{cases} \]
What is the maximum value of f on the closed interval [0, 1]?

Solution. Again the function has no maximum value. Unlike the previous example, this function does not even have an upper bound, meaning it takes arbitrarily large values: $f(\frac{1}{2}) = 2$, f(0.98) = 50, f(0.999) = 1000, . . . □

In the last example, although the function is defined at x = 1, the value


is not the limit of f(x) as x → 1. In other words, the function is not
continuous on the interval [0, 1].
If we rule out non-continuous functions, and intervals not containing their
endpoints, we do get what we want.

Theorem 4.8 (The Max-Min Theorem). A continuous function on a
closed interval [a, b] has a maximum and a minimum value on the
interval.

The Max-Min Theorem guarantees that there is a needle in the haystack:
provided the function is continuous and we restrict our search to (bounded)
closed intervals, there will be a maximum and minimum value to be
found. If the interval is not closed, or the function is not continuous, then
we may not have maximum and minimum values, as our examples have
already shown.

Example 4.9. Does the function $g(x) = 2x^3 - 9x^2 + 12x - 4$ have a
maximum value on the interval [1, 4]?

Solution. Yes. The function g is a polynomial and so continuous by
Example 2.43. The interval [1, 4] is closed. Therefore by the Max-Min
Theorem, g has a maximum value on the interval. □

Good. So at least we know there is a needle in the haystack: the
continuous function has a maximum value which is reached somewhere
in the closed interval. But what is this maximum value, and where in
the interval does it occur? A closed interval contains infinitely many
numbers, so we cannot evaluate our function at all of them by
hand in order to find which gives us the maximum (or minimum) value.
The next result, and its corollary, tell us we can throw away most of the
haystack. . .

Theorem 4.10. For a continuous function on a closed interval, the
maximum value (resp. minimum value) occurs either at a local maximum
(resp. local minimum) of the function, or at an endpoint of the
interval.

Since a local maximum/minimum of a differentiable function is always a
critical point (by Theorem 4.4), we obtain the following consequence.

Corollary 4.11. For a differentiable function on a closed interval, the
maximum value and minimum value occur either at a critical point of
the function or at an endpoint of the interval.

To distinguish from local maxima and minima, we use the term global
maximum/minimum for the maximum/minimum value of a function over a
specified interval or domain. Corollary 4.11 provides us with the possible
locations of global max/min values. To pin them down, we have to locate
the critical points and do a few calculations.

Example 4.12. Find the maximum and minimum values of the function
$g(x) = 2x^3 - 9x^2 + 12x - 4$ on the interval [1, 4].

Solution. First we find the critical points: $g'(x) = 6x^2 - 18x + 12$, so
we can solve $g'(x) = 0$ (a quadratic equation) to find that the critical
points are at 1 and 2. The endpoints of the interval are 1 and 4, so
the maximum and minimum must occur at either 1, 2 or 4. Computing
g at these points yields

$$g(1) = 1, \quad g(2) = 0 \quad\text{and}\quad g(4) = 28.$$

We conclude that the maximum value is 28 and occurs at x = 4 while
the minimum value is 0 and occurs at x = 2 (see Figure 4.3). □

Margin note. In Example 4.12, the critical points and interval endpoints
were integers by design. You can’t assume that the max/min values occur
at integers generally.
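The recipe used in Example 4.12 (find the critical points, add the endpoints, compare the values) is mechanical enough to sketch in Python. The helper names `quadratic_roots` and `closed_interval_extrema` below are hypothetical, chosen for this illustration only, and the sketch assumes we have already worked out by hand that the derivative is the quadratic $6x^2 - 18x + 12$.

```python
import math


def g(x):
    """The cubic from Example 4.12."""
    return 2 * x**3 - 9 * x**2 + 12 * x - 4


def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (empty list if there are none)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


def closed_interval_extrema(func, critical_points, left, right):
    """Corollary 4.11: compare func at the critical points and the endpoints."""
    candidates = {left, right}
    candidates.update(c for c in critical_points if left <= c <= right)
    values = {x: func(x) for x in candidates}
    x_max = max(values, key=values.get)
    x_min = min(values, key=values.get)
    return (x_max, values[x_max]), (x_min, values[x_min])


# g'(x) = 6x^2 - 18x + 12, so the critical points solve that quadratic.
critical = quadratic_roots(6, -18, 12)
maximum, minimum = closed_interval_extrema(g, critical, 1, 4)
print("critical points:", critical)
print("maximum (x, value):", maximum)
print("minimum (x, value):", minimum)
```

The point of the design is that only three candidate points are ever evaluated, however many numbers the interval contains.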

Figure 4.3: the graph of g in Example 4.12

Finding the maxima and minima of functions is one of the most important
applications of differential calculus.

Higher order derivatives


Once you have differentiated a function f to find $f' = \frac{df}{dx}$ you can, of course,
differentiate again to find $(f')'$, which is written $f''$ or
$\frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d^2 f}{dx^2}$, and
referred to as the second derivative of f. Why might we be interested in
this? Well, there are common applications. For example, if p(t) represents
position at time t then its derivative $p'(t)$ (or $\dot{p}(t)$) represents the rate of
change of position, that is, speed or velocity. The second derivative
$p''(t)$ represents the rate of change of velocity, namely the acceleration.

Margin note. I expect this was unintentional, but the following headline
alludes to a fourth derivative. . . "Decline in lending to firms continues:
Central Bank says pace of fall in availability of credit to businesses
growing at increased rate." Irish Times, April 2010

The second derivative can also tell us something about a critical point.
Suppose the function f has a critical point at c, i.e. $f'(c) = 0$. Suppose
further that $f'$ is differentiable at c and $f''(c) < 0$. Since the derivative of
$f'$ is negative at c, this means $f'$ is decreasing at c. But if the function $f'$
is zero at c and decreasing there, we conclude $f'$ must be positive to the
left of c and negative to the right of c. This suggests that, around c, the
graph of f looks like $\cap$. Thus f has a local maximum.
A similar argument leads to the conclusion that if c is a critical point and
$f''(c) > 0$, then c must be a local minimum. This is helpful for classifying
critical points.

Theorem 4.13. Suppose c is a critical point of the twice differentiable
function f. If $f''(c) > 0$ then c is a local minimum of the function, and
if $f''(c) < 0$ then c is a local maximum of the function.

Margin note. Warning!! This classification theorem does not help to
classify critical points at which the second derivative is zero. These
could still be local maxima, local minima, or neither.
Example 4.14. For the function $g(x) = 2x^3 - 9x^2 + 12x - 4$ given in
Example 4.12, we have $g''(x) = 12x - 18$. At the critical point 1, we
have $g''(1) = -6 < 0$, so this critical point is a local maximum. At the
critical point 2, we have $g''(2) = 6 > 0$ so this critical point is a local
minimum.
Of course, this agrees with what we see in the graph.
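The second derivative test is a three-way decision, which the following short Python sketch makes explicit. The function name `classify_critical_point` is our own invention, and the sketch assumes you have already checked that the point in question really is a critical point.

```python
def classify_critical_point(second_derivative_value):
    """Second derivative test from Theorem 4.13 (inconclusive when zero)."""
    if second_derivative_value > 0:
        return "local minimum"
    if second_derivative_value < 0:
        return "local maximum"
    return "inconclusive"


def g_second(x):
    """g''(x) for g(x) = 2x^3 - 9x^2 + 12x - 4."""
    return 12 * x - 18


for c in (1, 2):
    print(c, g_second(c), classify_critical_point(g_second(c)))

# The warning case: x^3, x^4 and -x^4 all have a critical point at 0 with
# second derivative 0 there, yet 0 is neither, a minimum, and a maximum
# respectively. The test cannot tell these apart.
print(classify_critical_point(0))
```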

Example 4.15. Find and classify any critical points of $\varphi(x) = e^{-x^2}$.

Solution. We found the derivative of φ already in Example 3.28:
$\varphi'(x) = -2xe^{-x^2}$. For critical points, we set this equal to 0 and
solve $-2xe^{-x^2} = 0$. Since the exponential function is never 0, we can
divide both sides by $-2e^{-x^2}$ leaving x = 0 as the only solution, and
hence 0 is the only critical point.

Now differentiate again, using the product and chain rule, to find

$$\begin{aligned}
\varphi''(x) &= \frac{d}{dx}\left(-2xe^{-x^2}\right) \\
&= -2\left(x\,\frac{d}{dx}\left(e^{-x^2}\right) + \left(\frac{d}{dx}x\right)e^{-x^2}\right) \\
&= -2\left(-2x^2e^{-x^2} + 1\cdot e^{-x^2}\right) \\
&= -2(1 - 2x^2)e^{-x^2}.
\end{aligned}$$

Therefore $\varphi''(0) = -2\cdot 1\cdot 1 = -2$ is negative, and Theorem 4.13
tells us that the critical point at 0 is a local maximum. Happily, our
conclusion agrees with Figure 3.7. □
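A formula for a second derivative is easy to get wrong by a sign, so it can be reassuring to compare it with a numerical estimate. The sketch below uses a central second difference, which is a standard numerical technique and not something these notes rely on. The step size `h` and the sample points are arbitrary choices.

```python
import math


def phi(x):
    return math.exp(-x * x)


def phi_second(x):
    """The formula found in Example 4.15."""
    return -2 * (1 - 2 * x * x) * math.exp(-x * x)


def second_difference(func, x, h=1e-4):
    """Central-difference estimate of the second derivative of func at x."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / (h * h)


for x in (0.0, 0.5, 1.0):
    print(x, phi_second(x), second_difference(phi, x))
```

The two columns should agree to several decimal places, and the value at 0 is negative, as the example found.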


4.3 The Mean Value Theorem


Rolle’s Theorem guarantees the existence of a critical point for a function
which takes the same value at two different points.

Theorem 4.16 (Rolle’s Theorem). Suppose f is continuous on the
closed interval [a, b], differentiable on the open interval (a, b), and
f(a) = f(b). Then there exists a point c ∈ (a, b) such that $f'(c) = 0$.

Proof. The Max-Min Theorem (4.8) implies that f has a (global) maximum
at some point c ∈ [a, b] and a minimum at some point d ∈ [a, b]. Now
from Theorem 4.10 each of c and d is either an endpoint of [a, b] or it is
an element of (a, b) at which $f' = 0$. If either of c and d is a point at
which $f' = 0$ then the proof is finished.
So suppose that both c and d are endpoints. This means that f attains
its maximum at a or b. In fact, our hypothesis f(a) = f(b) implies that
it achieves its maximum at both a and b. Likewise, f achieves its minimum
at both a and b. That is, the maximum and minimum values of f on [a, b]
are the same. Of course, this means that f is a constant function and so
$f' = 0$ everywhere in (a, b). This finishes the proof.

Figure 4.4: Rolle

Example 4.17. Show that there is a number c between 0 and 1, such
that tan c = 1 − c.

Solution. Consider the function f(x) = (1 − x) sin x, which is differentiable
because it is the product of two differentiable functions (1 − x
and sin x). Notice that f(0) = f(1) = 0. Therefore we can apply
Rolle’s Theorem to f to assert the existence of c ∈ (0, 1) such that
$f'(c) = 0$.
We actually calculated $f'$ in Example 3.16 and found
$f'(x) = (1 - x)\cos x - \sin x$. So now we know $(1 - c)\cos c - \sin c = f'(c) = 0$
and hence (1 − c) cos c = sin c. Divide both sides by cos c to get
tan c = 1 − c. □

Margin note. In our solution, we divide by cos c. How do we know that
cos c is not zero?
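Rolle’s Theorem tells us that c exists but not what it is. If you are curious about its value, a numerical search will find it. The sketch below uses bisection, which is a different technique from the one in the notes (it rests on the sign change of $f'$ between 0 and 1, not on Rolle’s Theorem), and the helper name `bisect` is our own.

```python
import math


def f_prime(x):
    """f'(x) for f(x) = (1 - x) sin x, as quoted from Example 3.16."""
    return (1 - x) * math.cos(x) - math.sin(x)


def bisect(func, left, right, tol=1e-12):
    """Bisection: assumes func is continuous and changes sign on [left, right]."""
    f_left = func(left)
    if f_left * func(right) > 0:
        raise ValueError("func must change sign on the interval")
    while right - left > tol:
        mid = (left + right) / 2
        f_mid = func(mid)
        if f_left * f_mid <= 0:
            right = mid
        else:
            left, f_left = mid, f_mid
    return (left + right) / 2


c = bisect(f_prime, 0.0, 1.0)
print("c ≈", c)
print("tan c ≈", math.tan(c), " 1 - c ≈", 1 - c)
```

The two numbers on the last line should agree to many decimal places.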

A variation of Rolle’s theorem is the Mean Value Theorem.

Theorem 4.18 (Mean Value Theorem). Suppose f is continuous on the
closed interval [a, b] and differentiable on (a, b). Then there exists a
point c ∈ (a, b) such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

The Mean Value Theorem states that there is a point c between a and
b such that the tangent line at c is parallel to the secant line joining
(a, f(a)) to (b, f(b)). See Figure 4.5. The Mean Value Theorem is
essentially just a ‘rotation’ of Rolle’s theorem. Indeed when f(b) = f(a) it
says exactly the same thing as Rolle’s theorem. We’ll leave the proof to
Appendix B.1.

Margin note. The Mean Value Theorem provides a legal justification for
average speed cameras! If timed photos show a car entering and exiting
the 4km Dublin Port Tunnel in less than a 3-minute interval, and
hence with an average speed through the tunnel of more than 80km/h,
then the MVT guarantees that at some point inside the tunnel the car was
travelling at more than 80km/h and can be issued a speeding ticket (even
though the car was never observed breaking the speed limit).

Figure 4.5: The Mean Value Theorem
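To make the Mean Value Theorem concrete, here is a small Python sketch that finds the promised point c for one particular function. The choice $f(x) = x^3$ on [0, 2] is ours, purely for illustration, and the search relies on $f'$ being continuous and increasing on that interval, which is not true of every function. The last lines redo the speed-camera arithmetic from the margin note.

```python
import math


def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


a, b = 0.0, 2.0
secant_slope = (f(b) - f(a)) / (b - a)

# Bisection on f'(x) - secant_slope. This relies on f' being continuous and
# increasing on [a, b], which holds for x^3 here but not for every function.
left, right = a, b
while right - left > 1e-12:
    mid = (left + right) / 2
    if f_prime(mid) < secant_slope:
        left = mid
    else:
        right = mid
c = (left + right) / 2

print("secant slope:", secant_slope)
print("c ≈", c, " exact value 2/sqrt(3) ≈", 2 / math.sqrt(3))

# Margin-note arithmetic: 4 km covered in exactly 3 minutes.
average_speed_kmh = 4 / (3 / 60)
print("average speed (km/h):", average_speed_kmh)
```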

We have some nice corollaries to the Mean Value Theorem.

Corollary 4.19. If $f'(x) = 0$ for every x in an (open) interval I then f
is a constant function.

Proof. Let f satisfy the hypothesis, that is, $f'(x) = 0$ for all x ∈ I. Take
some point a ∈ I, which we fix for the rest of the proof. We show that f
is constant by proving that f(b) = f(a) for all b ∈ I. Let b ∈ I be distinct
from a. If a < b, then by applying the Mean Value Theorem to f on the
interval [a, b], one gets a point c ∈ (a, b) ⊆ I for which

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

But whatever c is, $f'(c) = 0$ by hypothesis, hence f(b) = f(a). If instead
b < a then we reach the same conclusion by using the Mean Value
Theorem as above, interchanging the roles of a and b. As b was chosen
arbitrarily, this conclusion holds for all b ∈ I, hence f is constant.

Corollary 4.20. If f and g are two functions on an interval I for which
$f'(x) = g'(x)$ for all x ∈ I then there exists k ∈ R such that f(x) =
g(x) + k for all x ∈ I.

Proof. Apply Corollary 4.19 to the function h = f − g, which has zero
derivative at every point of I, and thus takes some constant value k.

This corollary provides a powerful technique for establishing mathematical
identities. What is an identity? Well, normally equations like $x^2 = 4$
have solutions: the values of x for which the equation is true (2 and −2
for this one). An equation which is true for all values of the variable(s)
is known as an identity, for example, x + x = 2x is an identity, so is
$\cos^2 x + \sin^2 x = 1$.

Margin note. Not related to the identity matrices in MATH10390 Chapter
1, or any other part of that module!

Example 4.21. Show that log(xy) = log x + log y for all x and y in
(0, ∞).

Solution. Pick any number y > 0 and consider two functions on the
interval (0, ∞), namely f(x) = log(xy) and g(x) = log x. Here, y is
a fixed number (a constant), while x is the variable in our functions.
Now find $\frac{d}{dx}f$ and $\frac{d}{dx}g$.
By the chain rule (take u(x) = xy),

$$\frac{d}{dx}f(x) = \frac{d}{dx}\log(xy) = \frac{1}{xy}\,\frac{d}{dx}(xy) = \frac{1}{xy}\,y = \frac{1}{x}.$$

We also know that $\frac{d}{dx}g(x) = \frac{d}{dx}\log x = \frac{1}{x}$.
So f and g have the same derivative on the interval (0, ∞). By Corollary
4.20, the two functions differ by a constant so there is a number
k such that f(x) = g(x) + k, or

$$\log(xy) = \log x + k,$$

for all x in the interval. What can the number k be? To decide, set
x = 1 which tells us that

$$\log y = \log 1 + k.$$

But log 1 = 0 so in fact k = log y, and we have

$$\log(xy) = \log x + \log y,$$

for all x ∈ (0, ∞). This works for the number y we picked at the
start, but we could have picked any y ∈ (0, ∞) so in fact, for all
x, y ∈ (0, ∞),

$$\log(xy) = \log x + \log y.$$ □

Margin note. This identity can also be established using $e^x e^y = e^{x+y}$
and the inverse relationship between $e^x$ and log x.
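The heart of the argument is that the difference log(xy) − log x does not depend on x. A quick numerical sanity check in Python (not a proof, and with an arbitrarily chosen value of y) might look like this. Here `math.log` is the natural logarithm, matching the notes’ log.

```python
import math

y = 3.7  # any fixed positive number; this particular value is arbitrary


def difference(x):
    """h(x) = f(x) - g(x) = log(xy) - log(x); Corollary 4.20 says it is constant."""
    return math.log(x * y) - math.log(x)


xs = [0.01, 0.5, 1.0, 2.0, 10.0, 1e6]
for x in xs:
    print(x, difference(x))
print("log y =", math.log(y))
```

Every line should show (up to rounding) the same number, namely log y.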

See if you can apply this method yourself in the following exercise.

Exercise 4.22. Verify the identity

$$\log x^a = a \log x,$$

for any numbers x > 0 and a ∈ R.

4.4 Linear Approximation


When we introduced differentiation from first principles, we approximated
the slope of the tangent line to f at a point a by using the slope
of the secant line joining two points on the graph: (a, f(a)) and (x, f(x)).
We are now going to turn this on its head: we will use points on the
tangent line to approximate points on the graph of the function.
Consider a differentiable function f and its tangent line at a, which we
will denote by $L_a^f$. We know the tangent line passes through the point
(a, f(a)) and we also know the slope of this tangent line: it is $f'(a)$.
Now every non-vertical line can be written in the form y = mx + c. Our
tangent line has slope $m = f'(a)$ and so takes the form

$$y = f'(a)x + c,$$

and we find c using the fact that when x = a, y is f(a):

$$f(a) = af'(a) + c,$$

so $c = f(a) - af'(a)$ and the line $L_a^f$ has the formula

$$y = f'(a)x + f(a) - af'(a) = f(a) + (x - a)f'(a).$$

If you prefer, you can consider this tangent line as the graph of the
straight line function

$$L_a^f(x) = f(a) + (x - a)f'(a). \tag{4.1}$$

The function $L_a^f$ given in Equation 4.1 is the linearisation or linear
approximation of f at a. To emphasize, $L_a^f$ is the function whose graph is
the tangent line to f at a. We can use it to approximate f(x) when x is
close to a. The idea is that, because the tangent line is a good local
approximation to the graph of f, $L_a^f(x)$ is a good approximation to f(x).

Example 4.23. Find the linearisation of $f(x) = \sqrt{x}$ at a = 25, and
use it to approximate $\sqrt{26.3}$ and $\sqrt{24.8}$ without using a calculator.

Solution. We use several steps.

1. The linear approximation of f at 25 is
   $L_{25}^f(x) = f(25) + f'(25)\cdot(x - 25)$.

2. $f(25) = \sqrt{25} = 5$.

3. $f'(x) = \frac{d}{dx}(\sqrt{x}) = \frac{1}{2\sqrt{x}}$ so $f'(25) = \frac{1}{2\sqrt{25}} = \frac{1}{10}$.

4. Thus $L_{25}^f(x) = 5 + \frac{1}{10}(x - 25) = 2.5 + \frac{x}{10}$ is the linearisation of f
   at x = 25.

5. Now $\sqrt{26.3} = f(26.3) \approx L_{25}^f(26.3) = 2.5 + \frac{26.3}{10} = 5.13$.

6. And $\sqrt{24.8} = f(24.8) \approx L_{25}^f(24.8) = 2.5 + \frac{24.8}{10} = 4.98$.

Using a calculator, $\sqrt{26.3} \approx 5.12835\ldots$ and $\sqrt{24.8} \approx 4.979959\ldots$, so our
approximations are pretty close to the actual values! Indeed, Figure 4.6
(on the left) shows $f(x) = \sqrt{x}$ and the tangent line $L_{25}^f(x) = 2.5 + \frac{x}{10}$
as a dotted line. The two graphs are virtually indistinguishable in the
close-up version on the right.
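Equation 4.1 translates almost word for word into code. The following Python sketch builds the linearisation of any function from its value and derivative at a. The helper name `linearise` is hypothetical, and the derivative has to be supplied by hand. It then repeats the square-root example and adds one point far from 25 to show the approximation degrading.

```python
import math


def linearise(f, f_prime, a):
    """Return the function L(x) = f(a) + (x - a) f'(a) from Equation 4.1."""
    fa, slope = f(a), f_prime(a)
    return lambda x: fa + (x - a) * slope


sqrt_lin = linearise(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 25)

for x in (26.3, 24.8, 36.0):
    approx, exact = sqrt_lin(x), math.sqrt(x)
    print(f"x = {x}: linearisation {approx:.5f}, sqrt {exact:.5f}, "
          f"error {abs(approx - exact):.5f}")
```

Near 25 the error is in the third decimal place, while at x = 36 it has grown to about 0.1, which is why the approximation is only advertised for x close to a.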
Figure 4.6: the linearisation of the square root function at 25

Example 4.24. Find the linearisation of the sine function at a = 0.

Solution. Since $\frac{d}{dx}\sin x = \cos x$ we find $L_0^{\sin}(x) = \sin(0) + (\cos 0)(x - 0)$.
But sin 0 = 0 and cos 0 = 1 so $L_0^{\sin}(x) = x$. □

Figure 4.7: the linearisation of the sine function at 0

Margin note. The fact that the linearisation of sin x at 0 is just x means
that sin x behaves very much like x near the origin.

The requirement to approximate $\sqrt{26.3}$ without using a calculator may
seem a little contrived, and indeed it is. Nevertheless, linearising a
function is used a lot in practice. The equations and functions that govern
real-world problems can be incredibly complex and impossible to
solve exactly. Linearising is the approach used to make such problems
tractable, and knowing how to work with the linear equations that result
is a big motivation for studying linear algebra.
For example, the motion of a pendulum, which is along an arc of a circle,
is governed by an equation involving sin x. Solving the equation is not
possible, but if the pendulum only oscillates on a short arc, sin x can be
approximated very closely by x (Example 4.24) and with this the equations
can be solved. This was first done by Galileo, and his work was the
scientific basis for high-accuracy timekeeping by pendulum clocks from
the 17th century until quartz (and later, atomic) time-keeping provided
a jump in accuracy in the 20th century.
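How short does the arc need to be for sin x ≈ x to be trustworthy? A few lines of Python give a feel for it. The angles below are arbitrary sample values of our own choosing, and x must be measured in radians for the approximation to make sense.

```python
import math


def relative_error(x):
    """Relative error made by replacing sin(x) with x (x in radians, x != 0)."""
    return abs(x - math.sin(x)) / abs(math.sin(x))


for degrees in (1, 5, 10, 20, 45, 90):
    x = math.radians(degrees)
    print(f"{degrees:>3} degrees = {x:.4f} rad: sin x = {math.sin(x):.4f}, "
          f"relative error = {relative_error(x):.2%}")
```

For swings of a few degrees the error is a small fraction of one percent, while at 90 degrees it is more than half the true value.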

4.5 Logarithmic and Implicit Differentiation


Logarithmic differentiation
Example 4.21 and Exercise 4.22 tell us that the logarithm transforms
products into sums, and powers into products. So if you have a function
f involving lots of products and powers, it might be easier to work with,
and differentiate, log f instead of f.
Here’s a simple example. Take $f(x) = x^2$. Then $\log f(x) = \log x^2 = 2\log x$.
Now differentiate both sides of $\log f(x) = 2\log x$ to find

$$\frac{1}{f(x)}\,\frac{d}{dx}f(x) = \frac{2}{x},$$

and multiply both sides by f(x) (i.e. $x^2$) to get

$$\frac{d}{dx}f(x) = 2x.$$

OK, we already knew that $\frac{d}{dx}x^2 = 2x$, but the method can be more
valuable in other situations. Consider again Exercise 3.31.

Example 4.25. Differentiate

$$f(x) = \frac{(x^2 + 1)^{100}(3x^3 + x)^{47}}{\sqrt[3]{x^2 + 1}\,\sqrt{5x + 2}}.$$

Solution. First we take the log of f(x) and use rules of logarithms
(Example 4.21 and Exercise 4.22) to see that log f(x) equals

$$100\log(x^2 + 1) + 47\log(3x^3 + x) - \tfrac{1}{3}\log(x^2 + 1) - \tfrac{1}{2}\log(5x + 2).$$

We differentiate both sides:

$$\frac{1}{f(x)}\,\frac{d}{dx}f(x) = \frac{200x}{x^2 + 1} + \frac{47(9x^2 + 1)}{3x^3 + x} - \frac{2x}{3(x^2 + 1)} - \frac{5/2}{5x + 2},$$

and hence $f'(x)$ is equal to

$$\frac{(x^2 + 1)^{100}(3x^3 + x)^{47}}{\sqrt[3]{x^2 + 1}\,\sqrt{5x + 2}}\left(\frac{200x}{x^2 + 1} + \frac{47(9x^2 + 1)}{3x^3 + x} - \frac{2x}{3(x^2 + 1)} - \frac{5/2}{5x + 2}\right).$$
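With a formula this long, a numerical cross-check is worthwhile. The Python sketch below evaluates the formula for $f'(x)$ obtained by logarithmic differentiation and compares it with a central difference quotient at one sample point. The point x = 1 and the step size are arbitrary choices of ours, and agreement at one point is evidence, not proof.

```python
import math


def f(x):
    numerator = (x**2 + 1) ** 100 * (3 * x**3 + x) ** 47
    denominator = (x**2 + 1) ** (1 / 3) * math.sqrt(5 * x + 2)
    return numerator / denominator


def log_derivative(x):
    """Right-hand side of (1/f) df/dx from Example 4.25."""
    return (200 * x / (x**2 + 1)
            + 47 * (9 * x**2 + 1) / (3 * x**3 + x)
            - 2 * x / (3 * (x**2 + 1))
            - 2.5 / (5 * x + 2))


def f_prime(x):
    """f'(x) = f(x) * (d/dx) log f(x)."""
    return f(x) * log_derivative(x)


def central_difference(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 1.0
numeric = central_difference(f, x0)
relative_gap = abs(numeric - f_prime(x0)) / abs(f_prime(x0))
print("formula:", f_prime(x0))
print("numeric:", numeric)
print("relative gap:", relative_gap)
```

The values themselves are astronomically large, so the comparison is made in relative terms.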


Implicit differentiation
Implicit differentiation is a generalisation of this idea where we want
to know the derivative of a function f(x), or any other quantity y that
varies with x, but we don’t have an explicit formula for the quantity. For
example, if you are told that $\sqrt{y + x} = e^x$ and asked to find $\frac{dy}{dx}$ then,
normally, you would try to write y explicitly as a function of x:

$$y + x = (e^x)^2 = e^{2x}, \qquad y = e^{2x} - x,$$

and then differentiate:

$$\frac{dy}{dx} = 2e^{2x} - 1.$$

Sometimes however, you cannot write y explicitly as a function of x, as
in the example

$$y^5 - y^2 + y = x^2 + 1,$$

where writing y as a function of the variable x would involve solving a
quintic (as opposed to quadratic) polynomial which is generally impossible.
This doesn’t necessarily stop you from finding $\frac{dy}{dx}$ because we can still
differentiate both sides with respect to x:

$$\begin{aligned}
5y^4\frac{dy}{dx} - 2y\frac{dy}{dx} + \frac{dy}{dx} &= 2x \\
\left(5y^4 - 2y + 1\right)\frac{dy}{dx} &= 2x \\
\frac{dy}{dx} &= \frac{2x}{5y^4 - 2y + 1}.
\end{aligned}$$

Margin note. For more information on this remark, see the marginal note
just before MATH10390 Proposition 5.9.

We’ve found the derivative, albeit the formula is also implicit (involves x
and y, not just a formula in x).
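Even though we cannot write y as a formula in x, a computer can still find y numerically for any given x, and that lets us test the implicit formula for the derivative. The sketch below uses Newton’s method, which is not covered in these notes, so treat it as a black box. The helper names `solve_y` and `dF_dy`, the starting guess and the number of steps are all our own assumptions.

```python
def F(x, y):
    """Left side minus right side of y^5 - y^2 + y = x^2 + 1."""
    return y**5 - y**2 + y - (x**2 + 1)


def dF_dy(y):
    """Derivative of the left side with respect to y; it is never zero."""
    return 5 * y**4 - 2 * y + 1


def solve_y(x, guess=1.0, steps=60):
    """Newton's method in y for a fixed x (an assumed numerical helper)."""
    y = guess
    for _ in range(steps):
        y -= F(x, y) / dF_dy(y)
    return y


def dy_dx(x, y):
    """The implicit formula dy/dx = 2x / (5y^4 - 2y + 1)."""
    return 2 * x / dF_dy(y)


x0 = 1.0
y0 = solve_y(x0)
h = 1e-6
numeric_slope = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
print("y at x = 1:", y0)
print("formula slope:", dy_dx(x0, y0))
print("numeric slope:", numeric_slope)
```

The formula slope and the numerically estimated slope should agree to about six decimal places.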

Example 4.26. Points $(x, y) \in \mathbb{R}^2$ which satisfy $x^2 + y^2 = 25$ form a
circle of radius 5.

1. Verify that (3, −4) is on this circle.

2. What is the slope of the tangent line to this circle at the point
   (3, −4)?

3. What is the equation of this tangent line?

4. Where does it cross the x-axis?

Solution.

1. Since $3^2 + (-4)^2 = 25$, (3, −4) is indeed on the circle.

2. We apply implicit differentiation to the equation $x^2 + y^2 = 25$ to
   get $2x + 2y\frac{dy}{dx} = 0$ and hence $\frac{dy}{dx} = -\frac{x}{y}$. Now at (x, y) = (3, −4),
   the slope of the tangent is $-\frac{3}{(-4)} = \frac{3}{4}$.

3. Since our line must be of the form $y = \frac{3}{4}x + c$ and passes through
   (3, −4) we have $-4 = \frac{3}{4}\cdot 3 + c$ so $c = -\frac{25}{4}$ and the equation of
   the line is $y = \frac{3}{4}x - \frac{25}{4}$.

4. At the point where the line crosses the horizontal axis, we have
   y = 0 so $\frac{3}{4}x - \frac{25}{4} = 0$ and $x = \frac{25}{3} = 8\frac{1}{3}$. □
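The four steps of Example 4.26 can be replayed with exact fractions, which avoids any rounding. This is a minimal sketch: the function name `circle_tangent` is our own, and it assumes the point really lies on the circle and has a non-zero y-coordinate, since otherwise the tangent is vertical.

```python
from fractions import Fraction


def circle_tangent(x0, y0):
    """Slope and intercept of the tangent to x^2 + y^2 = r^2 at (x0, y0), y0 != 0."""
    slope = Fraction(-x0, y0)      # dy/dx = -x/y from implicit differentiation
    intercept = y0 - slope * x0    # solve y0 = slope * x0 + c for c
    return slope, intercept


assert 3**2 + (-4)**2 == 25        # step 1: the point is on the circle

slope, intercept = circle_tangent(3, -4)
x_crossing = -intercept / slope    # set y = 0 in y = slope * x + intercept
print("slope:", slope)
print("intercept:", intercept)
print("crosses the x-axis at x =", x_crossing)
```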

Figure 4.8: The circle $x^2 + y^2 = 25$

Calvin and Hobbes by Bill Watterson


Chapter 5

Functions of Several Variables

In Chapter 1, when we introduced functions, we said a function takes
input and produces output, usually according to some rule. We’ve become
used to our function’s input being a number x, its output being a
(generally different) number y, or f(x), and the rule being some formula in x.
Thus we have something along the lines of $f : \mathbb{R} \to \mathbb{R}$, $f(x) = \sin(x^2 + 2)$.
In this chapter, we’ll take a look at the situation where the input consists
of not one, but an ordered pair (or perhaps an ordered triple. . . ) of
numbers. That is, the domain of the function will be a subset of $\mathbb{R}^2$, or $\mathbb{R}^3$, or
$\mathbb{R}^n$. In the case of a function $f : \mathbb{R}^2 \to \mathbb{R}$, we’ll typically write the input
as (x, y) and use z or f(x, y) for the output. Thus, we’ll have something
like $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = \sin(x^2 y)\exp y$.

5.1 Graphs
Whereas the graph of a function $f : \mathbb{R} \to \mathbb{R}$ involved marking a point
at height y = f(x) above each point x in the domain, R, viewed as a
horizontal line, for a function $f : \mathbb{R}^2 \to \mathbb{R}$, our graph will involve marking
a point at height z = f(x, y) above each point (x, y) in the domain $\mathbb{R}^2$, which
you can view as a horizontal plane. Instead of being a line or curve
in two dimensions, our graph is more like a surface or terrain in three
dimensions. For example, $f(x, y) = \sin(x^2 y)\exp y$ is plotted in Figure 5.1.

Figure 5.1: $f(x, y) = \sin(x^2 y)\exp y$ plotted as a 3D surface

Margin note. These days, plotting such things with computers is quite
easy, and doing so helps to put flesh on them. There are some free
graphing apps, such as Quick Graph for iOS, which enable the user to
quickly sketch quadratic forms (and more general functions) on $\mathbb{R}^2$, and
view them from different angles. If you have the chance, I encourage you
to try it out!
It is worth remembering that you are already very familiar with functions
of two variables, namely addition and multiplication! The functions of
addition $f : \mathbb{R}^2 \to \mathbb{R}$, f(x, y) = x + y, and multiplication $g : \mathbb{R}^2 \to \mathbb{R}$,
g(x, y) = xy, are graphed in Figure 5.2.

Figure 5.2: the graphs of addition and multiplication

Notice how the graph of the addition function is ‘flat’; it is a plane, which
is the two-variable analogue of a line. There is a reason for this: the
function f(x, y) = x + y is a linear function. It can be represented by the
1 × 2 matrix $\begin{pmatrix} 1 & 1 \end{pmatrix}$, in the sense that we can take advantage of
matrix multiplication and write

$$f(x, y) = \begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$

Margin note. See MATH10390 Sections 1.2 and 2.4.

In linear algebra, the m × n matrices you study can be regarded as
functions from $\mathbb{R}^n$ into $\mathbb{R}^m$. However, in the grand scheme of things, they
can be considered quite simple functions because they are linear. Here,
we will deal with non-linear functions, although given time constraints,
we’ll only deal with those which are real-valued, rather than the general
case of functions which map into $\mathbb{R}^m$.

Margin note. E.g. in MATH10390 Section 1.2, we saw that the 2 × 2
rotation matrix $R_\theta$ produces a rotation about the origin through the
angle θ, which is a (linear) function from $\mathbb{R}^2$ to $\mathbb{R}^2$.

5.2 The vector spaces R2 and R3


From MATH10390 Chapter 2, we are familiar with the set of ordered pairs
or ordered 2-tuples and the set of ordered 3-tuples

$$\mathbb{R}^2 = \{(x, y) : x, y \in \mathbb{R}\} \quad\text{and}\quad \mathbb{R}^3 = \{(x, y, z) : x, y, z \in \mathbb{R}\},$$

respectively.
An element of $\mathbb{R}^2$ can be written either as (x, y) or $\begin{pmatrix} x \\ y \end{pmatrix}$. The difference
in notation can be used to distinguish between the different ways of
looking at an element of $\mathbb{R}^2$:

• as a point, i.e. just a dot in the plane or

• as a vector, i.e. as an ‘arrow’ with length and direction, either in
  ‘row’ or column form.

We will wander freely between these two viewpoints and take advantage
of the different notation when suitable. Of course, the same goes for
elements of $\mathbb{R}^3$, and $\mathbb{R}^n$ in general.

Margin note. See MATH10390 Section 2.4.

Recall that vectors can be added together and multiplied by scalars
‘entrywise’. Given $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$ and $\lambda \in \mathbb{R}$,

$$x + y = (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2),$$

and

$$\lambda x = \lambda(x_1, x_2) = (\lambda x_1, \lambda x_2).$$

Margin note. See MATH10390 Section 2.2.

The extension to $\mathbb{R}^n$ is clear enough. The length or Euclidean norm of
the vector $x = (x_1, x_2)$ in $\mathbb{R}^2$ is given by $\|x\| = \sqrt{x_1^2 + x_2^2}$. This formula
comes from the Theorem of Pythagoras.

Margin note. Just about everything we say about functions of 2 variables
in this chapter can be extended to n variables. We’ll stick to 2 variables
just to make the notation easier.
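The entrywise operations and the Euclidean norm are easy to express in code, and doing so shows at once how they extend from $\mathbb{R}^2$ to $\mathbb{R}^n$. In this minimal sketch vectors are plain Python tuples, and the function names `add`, `scale` and `norm` are our own. It assumes the two vectors passed to `add` have the same length.

```python
import math


def add(u, v):
    """Entrywise sum of two vectors of the same length."""
    return tuple(a + b for a, b in zip(u, v))


def scale(lam, u):
    """Multiply every entry of u by the scalar lam."""
    return tuple(lam * a for a in u)


def norm(u):
    """Euclidean norm: the square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for a in u))


print(add((1, 2), (3, 4)))
print(scale(2, (1, 2)))
print(norm((3, 4)))
print(norm((1, 2, 2)))   # the same code works in R^3
```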


Limits and Continuity


The idea of a limit that we met for a 1-variable function in Chapter
2 carries over to the several variable setting, so we may talk about
$\lim_{(x,y)\to(a,b)} f(x, y)$ as the value that f(x, y) is close to when (x, y) is close
to (a, b).
There is a notable difference in a several variable limit. For $f : \mathbb{R} \to \mathbb{R}$,
when talking about $\lim_{x\to a} f(x)$, there isn’t much choice, or room, in how
x can approach a. Perhaps x is close and to the left of a, or close and
to the right of a. Recall the signum function (Example 2.25 (2)) which
has a different ‘left-hand-side’ limit and ‘right-hand-side’ limit at 0. But
in $\mathbb{R}^2$ (and beyond) there are many ways that (x, y) can approach (a, b),
e.g. from the left, right, top, bottom, along a line from the North-West,
along a wiggly curve, around a spiral, and so on. This generally makes
it harder to guarantee that a limit exists, because in order for a limit to
exist, we must get the same limit along every possible approach. The
good news is that the algebra of limits, Theorem 2.29, carries over to the
multivariate case verbatim, and this can often be used to overcome the
problem of establishing existence of limits.
Following the notion of limit, we have continuity, and just like in R, we
will say $f : \mathbb{R}^2 \to \mathbb{R}$ is continuous at $(a, b) \in \mathbb{R}^2$ if $\lim_{(x,y)\to(a,b)} f(x, y)$ exists
and equals f(a, b). You might imagine that it is difficult to decide if a
given function of several variables is continuous at a point, but because
of the algebra of limits, most of the functions we meet won’t present too
much difficulty. The following applies just as it did for one variable.

Fact 5.1. Sums, products and compositions of continuous functions
are continuous. A quotient of continuous functions is continuous at
any point where the denominator function is not zero.

How does this help us? First, suppose we have a continuous function
$g : \mathbb{R} \to \mathbb{R}$ of one variable, e.g. g(x) = sin x, and define $f : \mathbb{R}^2 \to \mathbb{R}$
by f(x, y) = g(x). Then f is effectively a function of one variable,
because it completely ignores the y-coordinate. The function is continuous
everywhere because, given $(a, b) \in \mathbb{R}^2$,

$$\lim_{(x,y)\to(a,b)} f(x, y) = \lim_{(x,y)\to(a,b)} g(x) = \lim_{x\to a} g(x) = g(a) = f(a, b).$$

Only the second equality above requires some justification. As g is
continuous, we know that $\lim_{x\to a} g(x)$ exists and equals g(a). Thus g(x) is
close to g(a) whenever x is close to (or equals) a. If (x, y) gets closer and
closer to (a, b) then, regardless of the approach (x, y) takes, x must get
closer and closer to (and may equal) a, and hence g(x) gets closer and
closer to g(a). Therefore $\lim_{(x,y)\to(a,b)} g(x)$ must equal $\lim_{x\to a} g(x)$, namely
g(a). In a similar vein, the function h(x, y) = g(y), $(x, y) \in \mathbb{R}^2$, again
effectively a function of one variable, will be continuous.
Because many functions of two (and more) variables that we will look
at are actually sums, products and compositions of functions that are
effectively functions of a single variable, the argument above, together
with Fact 5.1, allows us to decide on continuity quite easily.

Example 5.2. Is the two-variable function $g : \mathbb{R}^2 \to \mathbb{R}$, $g(x, y) = x^2 y$
continuous? What about $h(x, y) = \frac{2x^2 y}{x^4 + y^2}$?

Solution. The function g is continuous, because it can be seen as
the product of two one-variable functions $(x, y) \mapsto x^2$ and $(x, y) \mapsto y$,
both of which, being polynomials, are continuous.
The function h is continuous anywhere that the denominator is not
zero, because both its numerator and denominator are continuous
two-variable functions. The only place the denominator $x^4 + y^2$ is
zero, is at the origin (0, 0). The function is not continuous at (0, 0)
because it is not defined there. Moreover, it is not possible to extend
the definition of h to (0, 0) in any way, so as to make it continuous
there, because $\lim_{(x,y)\to(0,0)} h(x, y)$ does not exist.
If we approach the origin along the x-axis, that is (x, y) → (0, 0) along
y = 0, we have

$$\lim_{(x,y) = (x,0)\to(0,0)} h(x, y) = \lim_{x\to 0}\frac{0}{x^4} = 0.$$

In fact, if we approach along any straight line y = kx we get

$$\lim_{(x,y) = (x,kx)\to(0,0)} h(x, y) = \lim_{x\to 0}\frac{2kx^3}{x^4 + k^2x^2} = \lim_{x\to 0}\frac{2kx}{x^2 + k^2} = 0.$$

However, if we approach along the quadratic curve $y = x^2$, then we
find

$$\lim_{(x,y) = (x,x^2)\to(0,0)} h(x, y) = \lim_{x\to 0}\frac{2x^4}{x^4 + x^4} = 1.$$

Since we have different limits along different paths, the limit does
not exist. The graph is shown in Figure 5.3. □
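The path-dependence in Example 5.2 is easy to observe numerically. In the Python sketch below (the helper names `along_line` and `along_parabola` are ours), we evaluate h at points closer and closer to the origin, first along a straight line and then along the parabola $y = x^2$. This only illustrates the calculation above and is no substitute for it.

```python
def h(x, y):
    """h(x, y) = 2 x^2 y / (x^4 + y^2), undefined at the origin."""
    return 2 * x**2 * y / (x**4 + y**2)


def along_line(k, x):
    """Value of h at the point (x, kx) on the line y = kx."""
    return h(x, k * x)


def along_parabola(x):
    """Value of h at the point (x, x^2) on the curve y = x^2."""
    return h(x, x**2)


for x in (0.1, 0.01, 0.001):
    print(f"x = {x}: along y = 2x -> {along_line(2.0, x):.6f}, "
          f"along y = x^2 -> {along_parabola(x):.6f}")
```

The first column shrinks towards 0 while the second stays at 1, however close to the origin we go.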

Figure 5.3: h cannot be continuous at (0, 0)

5.3 Partial Derivatives


We want to start finding derivatives of functions of two variables.
Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = 3x^2 y$. If we fix the
y-value, say y = 7, then we have

$$f(x, 7) = 21x^2.$$

This now is really a function just of one variable x. Let’s call this function
$f_7$, so $f_7(x) = 21x^2$. Of course, differentiating this function is no problem:

$$f_7'(x) = 42x.$$

We have $f_7'(3) = 126$.

This is only a ‘partial derivative’ of f because only x is allowed to change
while the y-value stays the same (7 in this case). We call it the partial
derivative of f with respect to x and denote it by $\frac{\partial f}{\partial x}$. We have just
calculated that $\frac{\partial f}{\partial x}(3, 7) = 126$.
Similarly, we can look at what happens if we fix the x-value, say x = 3,
to get

$$f(3, y) = 27y.$$

Again this leaves us with a one variable function, let’s call it ${}_3f$ (apologies
for the ugly notation), which is given by ${}_3f(y) = 27y$. We can differentiate
this:

$${}_3f'(y) = 27.$$

Margin note. This notation is non-standard: it is being used to explain
the idea of partial derivatives and will vanish afterwards.

In particular, ${}_3f'(7) = 27$. This time we have fixed x and taken a derivative
with respect to y. This is the partial derivative of f with respect to y and
is written $\frac{\partial f}{\partial y}$. We have calculated $\frac{\partial f}{\partial y}(3, 7) = 27$.
For $f(x, y) = 3x^2 y$, on considering the y value fixed and differentiating
with respect to x, we have

$$\frac{\partial f}{\partial x}(x, y) = 6xy,$$

and similarly, fixing x and differentiating with respect to y,

$$\frac{\partial f}{\partial y}(x, y) = 3x^2.$$

We can define partial derivatives formally as follows.

Definition 5.3. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables. The
partial derivative of f with respect to x at $(a, b) \in \mathbb{R}^2$ is given by

$$\frac{\partial f}{\partial x}(a, b) = \lim_{h\to 0}\frac{f(a + h, b) - f(a, b)}{h},$$

whenever this limit exists, while the partial derivative of f with respect
to y at (a, b) is given by

$$\frac{\partial f}{\partial y}(a, b) = \lim_{h\to 0}\frac{f(a, b + h) - f(a, b)}{h},$$

(again, subject to the existence of the limit).

Definition 5.4. The pair of functions $\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$ is called the gradient of
f, denoted by ∇f. It is a function from $\mathbb{R}^2$ to $\mathbb{R}^2$, or subsets thereof.

Margin note. ∇f evaluated at a point yields a vector in $\mathbb{R}^2$.

In general, to find $\frac{\partial f}{\partial x}$, just differentiate with respect to x while treating y
as a constant. To find $\frac{\partial f}{\partial y}$, differentiate with respect to y while treating x
as a constant.
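Definition 5.3 can be imitated on a computer by using a small but non-zero h in place of the limit. The following Python sketch does exactly that. The step size `h = 1e-6` is an arbitrary choice, the answers are therefore only approximate, and the function names are our own.

```python
def partial_x(f, a, b, h=1e-6):
    """Difference quotient from Definition 5.3 with a small fixed h."""
    return (f(a + h, b) - f(a, b)) / h


def partial_y(f, a, b, h=1e-6):
    """As above, but nudging the second input and holding the first fixed."""
    return (f(a, b + h) - f(a, b)) / h


def gradient(f, a, b, h=1e-6):
    """Approximate gradient (Definition 5.4) as a pair of numbers."""
    return (partial_x(f, a, b, h), partial_y(f, a, b, h))


def f(x, y):
    return 3 * x**2 * y


print(gradient(f, 3.0, 7.0))   # compare with the exact values 126 and 27
```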
∂f ∂f
Example 5.5. Find ∂x ∂y
, and ∇f(2, 3) in the following cases.

1. f(x, y) = x 2 y

2. f(x, y) = exy sin x.

© 2023 Michael Mackey and Richard Smith. These notes are for personal use only and should not be circulated.
100 Functions of Several Variables

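Solution.

1. $\frac{\partial f}{\partial x} = 2xy$ and $\frac{\partial f}{\partial y} = x^2$. In particular, $\frac{\partial f}{\partial x}(2, 3) = 12$ and
$\frac{\partial f}{\partial y}(2, 3) = 4$, so $\nabla f(2, 3) = (12, 4)$.

2. $\frac{\partial f}{\partial x} = y e^{xy} \sin x + e^{xy} \cos x$, $\frac{\partial f}{\partial y} = x e^{xy} \sin x$,
$\frac{\partial f}{\partial x}(2, 3) = e^6 (3 \sin 2 + \cos 2)$ and $\frac{\partial f}{\partial y}(2, 3) = 2e^6 \sin 2$, so
$$\nabla f(2, 3) = (e^6 (3 \sin 2 + \cos 2),\ 2e^6 \sin 2). \qquad \square$$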

You should practise finding partial derivatives.

Exercise 5.6. Find the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, where

1. $f(x, y) = x^2 y^3$

2. $f(x, y) = (x^2 + y^2) \sin x$

3. $f(x, y) = \frac{1}{x^2 + y^2}$

4. $f(x, y) = e^{x^2 y^3}$.

Of course partial derivatives can be extended from functions of 2 variables
to functions of n variables. If f is a function of n variables $x_1, \ldots, x_n$ then
f has n first order partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$.

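Example 5.7. Let $g(x, y) = e^{x^2 + y} \sin y$. Calculate $\nabla g(x, y)$ and
evaluate $\nabla g(0, \pi)$.

Solution. First we find $\frac{\partial g}{\partial x} = 2x e^{x^2 + y} \sin y$ and
$\frac{\partial g}{\partial y} = e^{x^2 + y} \cos y + e^{x^2 + y} \sin y$, and so
$\nabla g(x, y) = e^{x^2 + y} (2x \sin y, \cos y + \sin y)$. Now, substituting x = 0 and y = π,
$$\nabla g(0, \pi) = e^{\pi} (2 \cdot 0 \cdot \sin \pi,\ \cos \pi + \sin \pi) = e^{\pi} (0, -1) = (0, -e^{\pi}).$$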




5.4 Critical Points


For one variable, a critical point is one where the derivative is zero. We
can generalise to two variables like so:

Definition 5.8. A point (a, b) ∈ R2 for which both partial derivatives


of f : R2 → R are zero is called a critical point of f.

We also have corresponding notions of local maxima and local minima:
(a, b) is a local maximum of $f : \mathbb{R}^2 \to \mathbb{R}$ if $f(a, b) \geq f(x, y)$ for
all (x, y) close to (a, b), and the reverse inequality for a local minimum. If
(a, b) is a local maximum of f(x, y), then of course a is a local maximum of
the one-variable function $f_b(x) := f(x, b)$ (where we have fixed y = b) and
it follows from Theorem 4.4 that $f_b'(a) = 0$, which is the same as saying
$\frac{\partial f}{\partial x}(a, b) = 0$. Similarly $\frac{\partial f}{\partial y}(a, b) = 0$. That is, both partial derivatives are
zero at the local maximum. The same applies at a local minimum. This
amounts to the following two variable analogue of Theorem 4.4.

Theorem 5.9. If f : R2 → R has a local maximum or local minimum at


the point (a, b), and the partial derivatives of f exist at (a, b), then
(a, b) is a critical point of f, i.e. ∇f(a, b) = (0, 0).

To find critical points of f, we solve ∇f(x, y) = (0, 0), or equivalently, the


simultaneous equations

$$\frac{\partial f}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} = 0.$$

Example 5.10. Find critical points of $f(x, y) = x^2 + y^2 + 2x - 4y + 3$.

Solution. We have
$$\frac{\partial f}{\partial x} = 2x + 2 \quad\text{and}\quad \frac{\partial f}{\partial y} = 2y - 4.$$
Hence $\nabla f = (2x + 2, 2y - 4)$. For critical points we solve 2x + 2 = 0
and 2y − 4 = 0 to get x = −1 and y = 2, that is, the critical point
(−1, 2). □

(In Examples 5.10 and 5.11, the simultaneous equations are linear systems, and
therefore relatively straightforward to solve, using e.g. techniques from
MATH10340 Chapter 3. In general however, the equations in need of solving may
not be linear.)

(Notice that in Example 5.10, we could have written $f(x, y) = (x + 1)^2 + (y - 2)^2 - 2$.
It is then obvious that f has a minimum when x = −1 and y = 2.)

Example 5.11. Find critical points of $f(x, y) = 2x^2 + 4y^2 - 4xy + 4x$.


Solution. We have
$$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) = (4x - 4y + 4,\ 8y - 4x).$$
For critical points we solve the simultaneous linear equations
$$\begin{aligned} 4x - 4y &= -4 \\ -4x + 8y &= 0, \end{aligned}$$
to find the solution (and only critical point) (x, y) = (−2, −1). □

(In Example 5.11, if you rewrote the function as $f(x, y) = (x - 2y)^2 + (x + 2)^2 - 4$
you would immediately see that the function has a local minimum when each
of the squares is zero, i.e. when x = −2 and so y = −1. However, this
ad hoc approach will only work under limited circumstances.)
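
When the equations ∇f = (0, 0) form a linear system, as in Examples 5.10 and 5.11, the solving step can be mechanised. The following is a minimal sketch using Cramer's rule with exact fractions; Cramer's rule and the helper name `solve_2x2` are choices made here for illustration and are not taken from the notes.

```python
from fractions import Fraction


def solve_2x2(a, b, e, c, d, f):
    """Solve a*x + b*y = e, c*x + d*y = f by Cramer's rule (exact arithmetic)."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("no unique solution")
    x = Fraction(e * d - b * f) / det
    y = Fraction(a * f - e * c) / det
    return x, y


# Example 5.10: 2x = -2 and 2y = 4
print(solve_2x2(2, 0, -2, 0, 2, 4))
# Example 5.11: 4x - 4y = -4 and -4x + 8y = 0
print(solve_2x2(4, -4, -4, -4, 8, 0))
```

The two calls reproduce the critical points (−1, 2) and (−2, −1). The approach does not apply when the equations are not linear.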

Second order partial derivatives


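A function $f : \mathbb{R}^2 \to \mathbb{R}$ has two partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, but it actually
has four second order partial derivatives:
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) \quad\text{and}\quad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right),$$
and the mixed second order partial derivatives
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) \quad\text{and}\quad \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right).$$
(For a function of n variables, $f : \mathbb{R}^n \to \mathbb{R}$, there are n partial derivatives
and $n^2$ second order partial derivatives.)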

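Example 5.12. Find the second order partial derivatives of the function
$f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = x^3 \sin y$.

Solution. First, we find the (first order) partial derivatives:
$$\frac{\partial f}{\partial x} = 3x^2 \sin y \quad\text{and}\quad \frac{\partial f}{\partial y} = x^3 \cos y.$$
Looking at $\frac{\partial f}{\partial x}$ in particular, we can find its own partial derivatives:
$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = 6x \sin y \quad\text{and}\quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = 3x^2 \cos y.$$
We denote these by $\frac{\partial^2 f}{\partial x^2}$ and $\frac{\partial^2 f}{\partial y \partial x}$, respectively.
Similarly, by looking at $\frac{\partial f}{\partial y}$, we have
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = 3x^2 \cos y \quad\text{and}\quad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = -x^3 \sin y.$$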


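A point to notice about the above example of second order partial derivatives
is that for the function above we had
$$\frac{\partial^2 f}{\partial x \partial y} = 3x^2 \cos y = \frac{\partial^2 f}{\partial y \partial x}.$$
In other words, the order of the differentiation did not matter. This is
not a fluke: for most functions f which have both second order mixed
partial derivatives, $\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$. Specifically,

Theorem 5.13. Suppose $\frac{\partial^2 f}{\partial x \partial y}$ and $\frac{\partial^2 f}{\partial y \partial x}$ exist and are continuous. Then
$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$.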

The square matrix of second order partial derivatives is used to classify


critical points.

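Definition 5.14. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables whose
second order partial derivatives exist. The Hessian matrix, $H_f$, is
the matrix of second order partial derivatives:
$$H_f := \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}.$$
Evaluated at a point $(a, b) \in \mathbb{R}^2$, the Hessian matrix $H_f(a, b)$ is a
matrix of numbers
$$H_f(a, b) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(a, b) & \frac{\partial^2 f}{\partial x \partial y}(a, b) \\ \frac{\partial^2 f}{\partial y \partial x}(a, b) & \frac{\partial^2 f}{\partial y^2}(a, b) \end{pmatrix} \in M_2(\mathbb{R}).$$

Theorem 5.13 tells us that the Hessian matrix is (in many cases) symmetric.
(See MATH10390 Definition 1.21.)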


Classifying critical points


Recall how we classified the critical points of a function $f : \mathbb{R} \to \mathbb{R}$ using
the second derivative of the function (Theorem 4.13). What about a critical
point of $f : \mathbb{R}^2 \to \mathbb{R}$? Just as with one variable, the critical point could be a
local maximum, a local minimum, or what we call in $\mathbb{R}^2$ a saddle point,
which means a maximum in one direction but a minimum in another.
Graphs of such critical points are shown, in that order, in Figure 5.4.

Figure 5.4: a local maximum, minimum and saddle point

(The graph of a saddle point can look a lot like a horse's saddle, hence the name.)

The classification of critical points in $\mathbb{R}^2$ is almost identical in phrasing
to Theorem 4.13 except that instead of the usual second derivative at
the critical point $f''(c)$ we use the Hessian matrix at the critical point
$H_f(a, b)$ and instead of positivity/negativity of the number $f''(c)$ we ask
for positive/negative definiteness of the matrix $H_f(a, b)$.

(See MATH10390 Definition 6.10. A real symmetric matrix is positive
definite if all its eigenvalues are positive numbers and negative definite
if the eigenvalues are all negative. Eigenvalues and eigenvectors of matrices
will be covered in MATH10390 Chapter 5.)

Theorem 5.15. Suppose f has a critical point at (a, b). To classify the
critical point (a, b), consider the Hessian matrix $H_f(a, b)$. If $H_f(a, b)$ is
positive definite then (a, b) is a local minimum. If $H_f(a, b)$ is negative
definite then (a, b) is a local maximum.
If one of the eigenvalues of $H_f(a, b)$ is positive and the other negative
then (a, b) is a saddle point.

Notice that this classification theorem fails to classify the critical point
if either of the eigenvalues of Hf (a, b) is zero.
It turns out that there is a simplification of Theorem 5.15 in the case of
functions of two variables, which doesn’t mention matrices or eigenvalues.

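Theorem 5.16. Suppose a function f of two variables has a critical
point at (a, b).

1. If $\frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 > 0$ at (a, b), then the critical point is:

   a) a local maximum if $\frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2} < 0$ at (a, b), and

   b) a local minimum if $\frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2} > 0$ at (a, b).

2. If $\frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 < 0$ at (a, b) then the critical point is a saddle
point.

If $\frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 = 0$ at (a, b), then we cannot use Theorem 5.16 to
classify the critical point.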
The reader can find in Appendix B a proof of the equivalence of Theorems
5.15 and 5.16 in the case of two variables.

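Example 5.17. Consider Example 5.11 again. We found that
$f(x, y) = 2x^2 + 4y^2 - 4xy + 4x$ had partial derivatives $\frac{\partial f}{\partial x} = 4x - 4y + 4$ and
$\frac{\partial f}{\partial y} = 8y - 4x$, and a critical point at (−2, −1), which we now classify.
We have $\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}(4x - 4y + 4) = 4$,
$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial x}(8y - 4x) = -4$ and
$\frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}(8y - 4x) = 8$. Since
$$\frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 = 4 \cdot 8 - (-4)^2 = 16 > 0 \quad\text{and}\quad \frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2} > 0,$$
f has a local minimum at (−2, −1), according to Theorem 5.16 (1b).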


Alternatively, we can use Theorem 5.15 to classify the critical point.
The Hessian is
$$H_f = \begin{pmatrix} 4 & -4 \\ -4 & 8 \end{pmatrix}.$$
In this example, the functions in the Hessian matrix are actually constants,
so it is the same at every point, but in particular
$$H_f(-2, -1) = \begin{pmatrix} 4 & -4 \\ -4 & 8 \end{pmatrix}.$$



We find that the eigenvalues of this matrix are $6 + 2\sqrt{5} \approx 10.47$ and
$6 - 2\sqrt{5} \approx 1.53$. Both are positive! This means $H_f(-2, -1)$ is positive
definite and (−2, −1) is a local minimum by Theorem 5.15.
(It would be a good idea to review this part of the example once you have
covered MATH10390 Chapter 5.)
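
Both classification routes can be sketched in a few lines of code. The block below assumes the three second order partial derivatives have already been evaluated at the critical point and that the mixed partials agree (Theorem 5.13); the function names are hypothetical, and the eigenvalue formula is the standard one for a symmetric 2 × 2 matrix rather than anything taken from the notes.

```python
import math


def classify(fxx, fyy, fxy):
    """Second derivative test of Theorem 5.16 at a critical point."""
    d = fxx * fyy - fxy**2
    if d > 0:
        # When d > 0, fxx and fyy necessarily share a sign, so testing fxx suffices.
        return "local minimum" if fxx > 0 else "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"


def eigenvalues_symmetric_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b**2)
    return mean - radius, mean + radius


# Example 5.17: Hessian entries 4, -4, -4, 8 at the critical point (-2, -1)
print(classify(4, 8, -4))
print(eigenvalues_symmetric_2x2(4, -4, 8))
```

For Example 5.17 the first call reports a local minimum and the second returns two positive eigenvalues, in agreement with both theorems.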

Global maxima and minima


Similar to the one variable case presented in Section 4.1, existence of
maxima/minima of a function $f : \mathbb{R}^2 \to \mathbb{R}$ depends on continuity of the function
and on properties of the domain that we’re considering. In one variable,
to ensure a max/min on an interval, we needed to include the endpoints
of that interval. In two variables, we typically want the maximum over
some two-dimensional domain (e.g. a disc or square in R2 , but often a
more exotic shape) and to guarantee a max/minimum we have to check
these boundary points of our domain. The problem is that while the
boundary of an interval is just two points (the endpoints), the boundary
of a disc is a circle, which includes an infinite number of points of R2 .
This is a problem since we can’t just check the value of the function at
each point of the boundary.
The technique that resolves this problem is the method of Lagrange Mul-
tipliers, but unfortunately we don’t have room to pursue it here.

5.5 An application – Least Squares


A common problem is to take a set of observed or experimental data and
try to extrapolate from it predicted data for some point in space or time
that is not covered by the observed data. Essentially, this means finding
a function which fits the data.
Typically, there might be some physical law, or theoretical model, which
covers the situation and gives us an idea of what sort of function we’re
after (linear function, quadratic, exponential, log, etc.). We’ll concentrate
for the moment on a linear model: that is, we think that the data should
form a straight line. In practice, because of measurement errors, or
because the model is an approximation that cannot take all factors into
account, the data will not sit exactly on a straight line. Nevertheless, we
still want to find the line of best fit.
Let’s explore this by looking at a simple example.

Example 5.18. The temperature in a room is modelled as a linear
function. At times 1, 2 and 3 minutes, we observe the temperature
at 14, 16 and 19 Celsius respectively. Thus we have three data points
(1, 14), (2, 16) and (3, 19). We can plot these three points in $\mathbb{R}^2$ and
notice they don't quite sit on a straight line (Figure 5.5 below).
So what line best fits this data set? Well, a straight line has the
form f(x) = mx + c. For a given choice of m and c, each data point
(x, y) has a discrepancy (vertical distance from the line) given by
$d_x = y - (mx + c)$. We'd like to know what choice of m and c best suits our
data. The method of linear least squares involves choosing m and c
so that the sum of squares of the discrepancies is minimised.
In our data set, $d_1 = 14 - (m + c)$, $d_2 = 16 - (2m + c)$ and
$d_3 = 19 - (3m + c)$. The sum of squares of these discrepancies is
$$d_1^2 + d_2^2 + d_3^2 = (14 - m - c)^2 + (16 - 2m - c)^2 + (19 - 3m - c)^2.$$
This gives us a function g of the two variables m and c. A little
arithmetic yields
$$g(m, c) = 813 - 206m - 98c + 14m^2 + 12mc + 3c^2.$$
Now we want to find the minimum value of g(m, c) over all possible
values of $(m, c) \in \mathbb{R}^2$. Since our region is the entire Cartesian plane,
there is no boundary as such, and we just have to look for critical
points by solving $\nabla g = (0, 0)$.
Now $\frac{\partial g}{\partial m} = -206 + 28m + 12c$ while $\frac{\partial g}{\partial c} = -98 + 12m + 6c$, so the
equations we have to solve are the linear system
$$\begin{aligned} 28m + 12c &= 206 \\ 12m + 6c &= 98. \end{aligned}$$
The solution to these equations is $(m, c) = \left(\frac{5}{2}, \frac{34}{3}\right)$. Hence our line of
best fit is $f(x) = \frac{5}{2}x + \frac{34}{3}$. This is drawn together with the data set in
Figure 5.5.

Exercise 5.19. Use Theorem 5.16 to show that the critical point
$(m, c) = \left(\frac{5}{2}, \frac{34}{3}\right)$ is indeed a local minimum.

Our example just has a data set of three points, but the same method
applies to find the line of best fit through any number of points in R2 .
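
For an arbitrary list of points, setting both partial derivatives of g(m, c) to zero again gives a pair of linear equations in m and c. The sketch below solves that pair exactly; the function name `line_of_best_fit` and the closed-form solution by Cramer's rule are illustrative choices, not something stated in the notes.

```python
from fractions import Fraction


def line_of_best_fit(points):
    """Return (m, c) minimising the sum of squared discrepancies y - (m*x + c)."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    # Setting both partial derivatives of g(m, c) to zero gives the linear system
    #   sxx*m + sx*c = sxy
    #   sx*m  + n*c  = sy
    det = n * sxx - sx**2
    if det == 0:
        raise ValueError("all x-values coincide; no unique line")
    m = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return m, c


m, c = line_of_best_fit([(1, 14), (2, 16), (3, 19)])
print(m, c)  # 5/2 34/3
```

For the three data points of Example 5.18 the system reads 14m + 6c = 103 and 6m + 3c = 49, which is the system solved there with both equations halved.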


Figure 5.5: The method of least squares

The method is known as least squares because we minimise the sum
of squares of the discrepancies, $\sum_x d_x^2$. We don't minimise the sum of
discrepancies $\sum_x d_x$ because this would encourage large negative
discrepancies. Minimising $\left|\sum_x d_x\right|$ would allow large positive and negative
discrepancies to cancel each other out. Minimising $\sum_x |d_x|$ is logically
sensible, but the problem is that the absolute value function is not
differentiable at 0 (see Example 3.8), which complicates the task of finding
minimum values. Therefore we choose to minimise $\sum_x d_x^2$.



Chapter 6

Integration

6.1 Indefinite integrals


We are now able to differentiate just about any function. Consider the
reverse problem. Given a function f, can we find a function F whose
derivative is f, that is, $F'(x) = f(x)$? Such a function F is called an
antiderivative of f, and the process of finding F is called integration of
f. For some functions it is easy to find an antiderivative. For example,
take $f : \mathbb{R} \to \mathbb{R}$, f(x) = 2x. What function must we differentiate to
get a derivative of 2x? Clearly, $F(x) = x^2$ is such a function. However,
$F(x) = x^2 + 1$ is also an antiderivative of f, as is $F(x) = x^2 - 23$. In fact,
any function of the form $F(x) = x^2 + c$, where c is a constant, will be an
antiderivative of f. All antiderivatives are of this form, by Corollary 4.20.

(Siméon Denis Poisson (1781 – 1840): 'Life is good for only two things,
discovering mathematics and teaching mathematics.')

Definition 6.1. A function F is called an antiderivative of the function
f if $F'(x) = f(x)$. The set of all antiderivatives of f is called the
indefinite integral of f with respect to x and is denoted by
$$\int f(x)\,dx.$$
(Mathematics is case sensitive: f and F represent different functions!)

Remarks 6.2. If the function f is defined on an interval then the
indefinite integral will be the set of all functions of the form F(x) + c,
where F is a fixed antiderivative of f and c is any constant. This is
a consequence of Corollary 4.20.
However, if f is defined on a set which is not an interval then it may
not happen that any two antiderivatives differ by a constant function.
(Can you think of an example of this behaviour?)
Throughout this chapter, we will take our functions to be defined on
an interval, so that any two functions in the indefinite integral will
differ only by a constant.

Remarks 6.3. The symbol $\int$ is the integral sign and evolved from
an elongated S. For reasons that will become apparent later, the
letter S was used to represent 'summation'. The function f to be
integrated in $\int f(x)\,dx$ is called the integrand and x is the variable
of integration.

In our first example, we have $\int 2x\,dx = \{x^2 + c : c \in \mathbb{R}\}$. (Recall from
Definition 6.1 that the indefinite integral is officially a set of functions.)
However, this is usually written more lazily as $\int 2x\,dx = x^2 + c$, where c is
understood to be an arbitrary constant, referred to as the constant of integration.

Example 6.4. Find the following indefinite integrals.

1. $\int x^3\,dx$

2. $\int \cos x\,dx$

3. $\int 1\,dx$.

Solution.

1. First, we need to find a particular antiderivative of $x^3$. The
derivative of $x^4$ is $4x^3$, but if we take $F(x) = \frac{1}{4}x^4$ then $F'(x) = x^3$.
So $F(x) = \frac{1}{4}x^4$ is an antiderivative of $x^3$ and
$$\int x^3\,dx = \tfrac{1}{4}x^4 + c.$$

2. Since $\frac{d}{dx} \sin x = \cos x$, we have
$$\int \cos x\,dx = \sin x + c.$$

3. The derivative of x is 1, hence the solution is x + c. □

Fact 6.5. Certainly, knowing derivatives of common functions helps
us when trying to find integrals of common functions. For example,
from the power rule for differentiating, $\frac{d}{dx} x^{r+1} = (r + 1)x^r$, we can
infer that $\int x^r\,dx = \frac{1}{r+1} x^{r+1} + c$, as long as $r \neq -1$. In particular,
$\int 1\,dx = x + c$. Other standard integrals are as follows.

| $f(x)$ | $\int f(x)\,dx$ |
| --- | --- |
| $x^r$ ($r \neq -1$) | $\frac{1}{r+1} x^{r+1}$ |
| $x^{-1}$ | $\log \lvert x \rvert$ |
| $\cos x$ | $\sin x$ |
| $\sin x$ | $-\cos x$ |
| $1/\cos^2 x$ | $\tan x$ |
| $e^x$ | $e^x$ |

(We have omitted the constant of integration in each formula.)

(Why $\log \lvert x \rvert$? Both $\log x$ and $\log(-x)$ have a derivative of $\frac{1}{x}$. However, for
$x \in \mathbb{R}$, only one of $\log x$ and $\log(-x)$ will be defined since the domain
of log is the set of positive numbers. So, on the positive half line the
integral of $\frac{1}{x}$ is $\log x$ while on the negative half line it is $\log(-x)$.
These can be combined by saying $\int \frac{1}{x}\,dx = \log \lvert x \rvert$.)

Properties of the indefinite integral


Just like the process of differentiating, that of integrating is linear (if we
overlook technicalities due to the non-uniqueness/constant of integra-
tion).

Fact 6.6. The equalities below hold if we overlook the constants of
integration.

1. $\int k f(x)\,dx = k \int f(x)\,dx$, where k is a constant.

2. $\int (f(x) + g(x))\,dx = \int f(x)\,dx + \int g(x)\,dx$.

These two linearity rules are easily derived from the facts that $(f + g)'(x) =
f'(x) + g'(x)$ and $(kf)'(x) = k f'(x)$. In practical terms, overlooking the
constants of integration above means that we can apply Fact 6.6 without
paying attention to them, provided we remember to insert a final constant
of integration after using it (see examples of this below).
Unfortunately, and this is the difficulty with integration, there are no
simple analogues of the product, quotient or chain rules of differentiation
for integration. (Differentiation is a more algorithmic process: follow the
rules correctly and you will find the derivative. Integration is more of an
art form!) We will spend a good deal of time developing methods
to overcome the difficulty this entails, but there is no getting around the
fact that integration is harder than differentiation. We start though with
some simple examples which use the two rules which we know.

Example 6.7.

1. $$\int (x^4 - 2x^3 + 3)\,dx = \int x^4\,dx - 2\int x^3\,dx + 3\int 1\,dx = \tfrac{1}{5}x^5 - \tfrac{1}{2}x^4 + 3x + c.$$

2. $$\int \left(5\cos x - \frac{3}{x} + x^{1/2}\right) dx = 5\int \cos x\,dx - 3\int \frac{1}{x}\,dx + \int x^{1/2}\,dx = 5\sin x - 3\log|x| + \tfrac{2}{3}x^{3/2} + c.$$

3. $$\int \left(\frac{1}{x^3} - \frac{1}{\sqrt[3]{x}} + 2e^x\right) dx = \int \left(x^{-3} - x^{-1/3} + 2e^x\right) dx = -\tfrac{1}{2}x^{-2} - \tfrac{3}{2}x^{2/3} + 2e^x + c.$$

4. $$\int \frac{4 + \sqrt{t}}{t^3}\,dt = \int \left(\frac{4}{t^3} + \frac{t^{1/2}}{t^3}\right) dt = \int \left(4t^{-3} + t^{-5/2}\right) dt = 4 \cdot \left(-\tfrac{1}{2}\right) t^{-2} - \tfrac{2}{3}t^{-3/2} + c = -2t^{-2} - \tfrac{2}{3}t^{-3/2} + c.$$

5. $$\int u^{-2}(1 - u)\,du = \int \left(u^{-2} - u^{-1}\right) du = -u^{-1} - \log|u| + c = -\frac{1}{u} - \log|u| + c.$$

(Notice that we combine several constants of integration into one, because
the sum of several constants is still a constant. The first example
suggests, correctly, that integrating any polynomial presents no difficulty.)

6.2 Riemann sums and definite integrals


Recall the summation notation: $\sum_{k=1}^{n} c_k = c_1 + c_2 + \cdots + c_n$. (See MATH10390
Section 1.2 for a refresher if necessary.) Let f be a
function on the closed interval [a, b]. For convenience, assume that it is
nowhere negative (so that its graph is on or above the x-axis). What is
the area between the (graph of the) function and the x-axis?
We can partition [a, b] into n sub-intervals (not necessarily equally
spaced) by choosing n − 1 points between a and b. In each sub-interval
choose a point $x_k$ and draw a rectangle over the sub-interval of height
$f(x_k)$. The width of the rectangle is denoted by $\Delta_k$. The area of the kth
rectangle is the product $f(x_k)\Delta_k$, and an approximation to the area we
want is obtained by adding up the areas of these rectangles. We see
that the area is approximated by
$$\sum_{k=1}^{n} f(x_k)\Delta_k = f(x_1)\Delta_1 + f(x_2)\Delta_2 + \cdots + f(x_n)\Delta_n.$$

Example 6.8. Approximate the area under $f(x) = x^2$ from a = 1 to
b = 4 by calculating a Riemann sum of 6 rectangles having equal
width $\frac{1}{2}$.

Solution. We'll take $x_k$ to be the mid-point of the base of the kth
rectangle. Then
$$\begin{aligned} \sum_{k=1}^{6} f(x_k)\Delta_k &= f(1.25) \cdot \tfrac{1}{2} + f(1.75) \cdot \tfrac{1}{2} + \cdots + f(3.75) \cdot \tfrac{1}{2} \\ &= \tfrac{1}{2}\left(1.25^2 + 1.75^2 + 2.25^2 + 2.75^2 + 3.25^2 + 3.75^2\right) \\ &= 20.9375. \qquad \square \end{aligned}$$

The value 20.9375 calculated in this example is actually the total area
of the six rectangles involved in our Riemann sum (shaded in grey in
Figure 6.1, on the left). It is an approximation of the desired area. If we
repeat the exercise with a finer partition we get a better approximation.
(A finer partition just means that the maximum width of any rectangle is smaller.)
You can get a feel for this by considering Figure 6.1, where 6 and 12
strips, respectively, are used to approximate the area.
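
The effect of refining the partition can also be explored numerically. The following is a minimal sketch of a midpoint Riemann sum with n rectangles of equal width; the function names and the particular values of n are illustrative choices.

```python
def midpoint_riemann_sum(f, a, b, n):
    """Riemann sum over n equal-width rectangles, sampling f at each midpoint."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) * width for k in range(n))


def square(x):
    return x**2


print(midpoint_riemann_sum(square, 1, 4, 6))  # 20.9375, as in Example 6.8
for n in (12, 100, 1000):
    # finer partitions: the sums should settle towards a single value
    print(n, midpoint_riemann_sum(square, 1, 4, n))
```

With 6 rectangles this reproduces the value from Example 6.8, and increasing n gives sums that change less and less.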

Figure 6.1: Riemann sums of 6 and 12 rectangles, respectively

If we used hundreds of rectangles we would see something like Figure 6.2.


Figure 6.2: a Riemann sum with hundreds of rectangles

We can see how the grey area calculated by our Riemann sums should
tend to the desired area as the rectangles get narrower. This motivates the
following definition.

Definition 6.9. The definite integral of f over [a, b] is defined to be
$$\int_a^b f(x)\,dx := \lim_{\text{rectangle width } \Delta_k \to 0} \sum_k f(x_k)\Delta_k,$$
when this limit exists. When it exists, the value of this limit may be
interpreted as an area.

(We have been a little vague here. There is a lot of detail in the
theory of Riemann sums and 'Riemann integration' that lies beyond
the module's scope.)

Example 6.10. Find $\int_3^5 2x\,dx$ by considering areas.

Solution. The definite integral is the area under the graph of 2x
between 3 and 5. This can be broken into the area of a rectangle
of width 2 and height 6, and a triangle of width 2 and height 4, see
Figure 6.3. Thus
$$\int_3^5 2x\,dx = \text{area}(\square) + \text{area}(\triangle) = 2 \cdot 6 + \tfrac{1}{2} \cdot 2 \cdot 4 = 16.$$

Figure 6.3: Finding definite integrals by considering areas

We assumed so far that $f \geq 0$, that is, its graph was on or above the
x-axis, but this was only for convenience. Even if f has negative values
on [a, b], we still define $\int_a^b f(x)\,dx$ to be the limit of the Riemann sums as
we take finer partitions, but negative values of f contribute negatively in
the Riemann sum. That is, area under the x-axis carries a negative sign.

Remarks 6.11. When $f \leq 0$, the area between the graph of the function
and the x-axis is counted as negative. Thus
$$\int_{-1}^{1} x\,dx = 0,$$
because the areas above and below the x-axis cancel out (see Figure 6.4).

Figure 6.4: Finding definite integrals by considering areas

Example 6.12. In Definition 6.9 we said 'when the limit exists'. It
doesn't always exist. For example, the function q(x) defined in
Example 2.43 (2) is an example where $\int_0^1 q(x)\,dx$ does not exist in the
sense defined above. You might think about why this is the case. What do
you think the area under the graph of this function should be?

Nevertheless, the definite integral $\int_a^b f(x)\,dx$ does exist for a large class of
functions. One positive result, due to Riemann, is that it exists whenever
f is continuous.

Theorem 6.13. If $f : [a, b] \to \mathbb{R}$ is continuous then $\int_a^b f(x)\,dx$ exists.


While it is helpful to know that the definite integral exists, this result does
not explain how to find $\int_a^b f(x)\,dx$ for a given function f. (Constructing finer
and finer Riemann sums, and trying to spot what their limit is, is not an
appealing option!) Neither does it explain why we have used two similar
notations $\int f(x)\,dx$ and $\int_a^b f(x)\,dx$
for two quite different concepts. However, the answer to both of these
questions is provided by the two parts of the Fundamental Theorem of
Calculus.

6.3 The Fundamental Theorem of Calculus


Given a continuous function f on an interval [a, b] we know that $\int_a^b f(t)\,dt$
exists. Its value is just a number, but by allowing the upper limit of the
integral to vary over [a, b], we can define a function F by
$$F(x) = \int_a^x f(t)\,dt.$$
It turns out that this new function F is differentiable and its derivative is
exactly f.

Theorem 6.14 (Fundamental Theorem of Calculus – Part 1). If
$f : [a, b] \to \mathbb{R}$ is continuous, then $F : [a, b] \to \mathbb{R}$ defined by
$$F(x) := \int_a^x f(t)\,dt,$$
is a differentiable function and
$$\frac{d}{dx} F(x) = \frac{d}{dx} \int_a^x f(t)\,dt = f(x). \tag{6.1}$$
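
Equation (6.1) can be illustrated numerically: approximate the area function F by a Riemann sum, then approximate its derivative by a difference quotient, and compare with f. The sketch below does this for f = cos on [0, 1]; the midpoint sum, the central difference quotient, and the values n = 2000 and h = 10⁻³ are all assumptions made for illustration, so the agreement is only approximate.

```python
import math


def area_function(f, a, x, n=2000):
    """Midpoint Riemann-sum approximation of F(x), the integral of f from a to x."""
    width = (x - a) / n
    return sum(f(a + (k + 0.5) * width) * width for k in range(n))


def derivative_of_area(f, a, x, h=1e-3):
    """Central difference quotient of the area function F at x."""
    return (area_function(f, a, x + h) - area_function(f, a, x - h)) / (2 * h)


x = 1.0
print(derivative_of_area(math.cos, 0.0, x))  # should be close to cos(1)
print(math.cos(x))
```

The two printed numbers should agree to several decimal places, as Theorem 6.14 predicts.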

This result is what links the definite and indefinite integrals. It tells
us that a continuous function has an antiderivative, and that that
antiderivative is given in terms of an area function (i.e. definite integral).
More precisely, an antiderivative (i.e. indefinite integral) of f(x) is given
by $F(x) = \int_a^x f(t)\,dt$.
Notice that F(a) = 0 because it represents the area of a line. Now this
F is not the only antiderivative of f, but we do know that any other
antiderivative G, say, differs from F by a constant, so there is a number c
such that G(x) = F(x) + c for all x in the interval. Thus
$$G(b) - G(a) = F(b) + c - (F(a) + c) = F(b) - F(a) = F(b) = \int_a^b f(t)\,dt.$$
So, no matter which antiderivative G of f we pick, G(b) − G(a) always
gives us the same answer, namely $\int_a^b f(t)\,dt$. This then is the second part
of the Fundamental Theorem of Calculus and tells us how to compute
definite integrals (and areas).

Theorem 6.15 (Fundamental Theorem of Calculus - Part 2). If f is
continuous on [a, b] and F is any antiderivative of f on [a, b], then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Definition 6.16. The notation $\big[F(x)\big]_a^b$ is used as shorthand for
F(b) − F(a). (The notation $F(x)\big|_a^b$ is also used.)

Let’s have another go at Example 6.10, this time using antiderivatives


(Theorem 6.15) instead of areas of rectangles and triangles.
Example 6.17. Find $\int_3^5 2x\,dx$ by using Theorem 6.15.

Solution. We know that $F(x) = x^2$ is an antiderivative of 2x. Therefore
to compute the definite integral $\int_3^5 2x\,dx$ we simply use Theorem
6.15 to write $\int_3^5 2x\,dx = F(5) - F(3)$, where $F(x) = x^2$, that is,
$$\int_3^5 2x\,dx = \big[x^2\big]_3^5 = 5^2 - 3^2 = 16. \qquad \square$$

This of course is the same answer we had in Example 6.10 but this method
applies where splitting into rectangles and triangles is not possible. For
example, we can now compute exactly the area in Example 6.8 without
having to use a Riemann sum approximation.


Example 6.18. What is the area under the graph of $f(x) = x^2$ between
a = 1 and b = 4?

Solution. We want $\int_1^4 f(x)\,dx$. We know that $F(x) = \frac{1}{3}x^3$ is an
antiderivative of f and so
$$\int_1^4 x^2\,dx = \big[\tfrac{1}{3}x^3\big]_1^4 = \tfrac{1}{3}(4^3 - 1^3) = \tfrac{63}{3} = 21. \qquad \square$$
(Compare this solution with our Riemann sum approximation in Example 6.8!)

Properties of the definite integral


The integral $\int_a^b f(x)\,dx$ is called the definite integral because the answer
F(b) − F(a) is just a number: the answer does not contain the variable
x. We call x a 'dummy' variable because $\int_a^b f(x)\,dx = \int_a^b f(u)\,du$. Both
are equal to F(b) − F(a) where F is any antiderivative of f.

Fact 6.19 (Properties of the Definite Integral).

1. $\int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx$ for any constant k,

2. $\int_a^b [f(x) \pm g(x)]\,dx = \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx$,

3. $\int_a^a f(x)\,dx = 0$,

4. $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$,

5. $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx$.

The first two properties (linearity) come from linearity of the indefinite
integral and the close connection between the definite and indefinite
integral that the Fundamental Theorem of Calculus gives us. Parts (3)
and (5) should make sense to you geometrically in terms of areas, but
can be proven easily from Theorem 6.15. Part (4) is also easy to prove
from this theorem, if not so geometrically intuitive (‘backwards area is
negative!’).
We now have a very useful and flexible method of calculating areas of
shapes bounded by graphs of functions. We are limited though to the
functions which we can integrate (i.e. find an antiderivative of). To expand
the list of functions we can integrate, we will look at some methods of
integration in the next chapter, but we’ll finish this one with a few more
examples of computing definite integrals.

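Example 6.20.

1. $\int_0^4 \sqrt{t}\,dt = \int_0^4 t^{1/2}\,dt = \big[\tfrac{2}{3}t^{3/2}\big]_0^4 = \tfrac{2}{3} \cdot 4^{3/2} - \tfrac{2}{3} \cdot 0 = \tfrac{16}{3}$.

2. $\int_{-1}^{1} (3x^2 - x^3 + 1)\,dx = \big[x^3 - \tfrac{1}{4}x^4 + x\big]_{-1}^{1} = (1 - \tfrac{1}{4} + 1) - (-1 - \tfrac{1}{4} - 1) = 4$.

3. $\int_0^{\pi} \sin t\,dt = \big[-\cos t\big]_0^{\pi} = -\cos \pi - (-\cos 0) = -(-1) - (-1) = 2$.
(When integrating trigonometric functions, radians are used by default!)

4. $\int_2^3 (-x^3 + 6x^2)\,dx = \big[-\tfrac{1}{4}x^4 + 2x^3\big]_2^3 = -\tfrac{81}{4} + 54 - (-4 + 16) = 21\tfrac{3}{4}$.

5. $\int_0^x e^t\,dt = \big[e^t\big]_0^x = e^x - e^0 = e^x - 1$.

6. $\int_0^x w^3\,dw = \big[\tfrac{1}{4}w^4\big]_0^x = \tfrac{1}{4}x^4$.

7. $\int_0^x s^3\,ds = \big[\tfrac{1}{4}s^4\big]_0^x = \tfrac{1}{4}x^4$.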

Remarks 6.21. These last two examples demonstrate that a definite


integral depends only on the limits of the integral and the function to
be integrated (the integrand) and not on the variable (but it is impor-
tant that the same variable is used in the formula for the function and
integration variable). They also show how a function can be defined
in terms of an integral.

Chapter 7

Methods of Integration

We said in Chapter 6 that integration is harder than differentiation. Why
is that? We can differentiate and integrate powers such as $x^2$ and other
standard functions like cos, sin and exp. The difference is that the product,
quotient and chain rules (together with logarithmic and implicit
differentiation, from time to time) allow us to differentiate essentially
anything that can be differentiated, and the outcome can be expressed in
terms of functions that are familiar to us. For example, these rules allow
us to find $\frac{d}{dx}(x^2 \cos x)$, $\frac{d}{dx} \frac{\cos x}{x^2}$, and $\frac{d}{dx} \cos(x^2)$ relatively easily.
However, while there are integral counterparts to the chain and product
rules (we will see them in this chapter), they are harder to use and not
as universally applicable. In contrast to the above, it is difficult to find
expressions for $\int x^2 \cos x\,dx$, $\int \frac{\cos x}{x^2}\,dx$ or $\int \cos(x^2)\,dx$. In fact, while it is
possible to evaluate the first integral in terms of functions we know (see
Example 7.11), it is impossible to express the other integrals in
terms of familiar functions. (Keep in mind the difference: while the indefinite
integrals do exist, in the sense that antiderivatives for the functions exist,
they are impossible to express in familiar terms.)
Nevertheless, in this section, we look at two methods that might help us
to integrate a given function, bearing in mind that there is no method or
set of methods that will work in all cases.

7.1 Integration by Substitution


The chain rule for differentiation allows us to find $\frac{d}{dx}(3x + 1)^{100}$ by making the substitution $u = 3x + 1$ and applying the chain rule: $\frac{d}{dx}u^{100} = \frac{d}{du}u^{100} \cdot \frac{du}{dx}$, both of which are simple derivatives.

What about the integral $\int (3x + 1)^{100}\,dx$? Of course, we could expand the integrand $(3x + 1)^{100}$ and integrate the resulting polynomial, but that’s not very appealing. How about making the same substitution we used to differentiate, that is, let $u = 3x + 1$. Now our integral becomes

$$\int (3x + 1)^{100}\,dx = \int u^{100}\,dx.$$

Margin note: The function that is to be integrated is called the integrand.

This looks slightly more promising, except that we cannot integrate a function of $u$ with respect to $x$. What we have to do is relate $du$ to $dx$. This is done by calculating $\frac{du}{dx}$. Here $u = 3x + 1$ gives $\frac{du}{dx} = 3$. Now, we do something that looks a little fishy, which is to ‘multiply both sides by $dx$’ to get $du = 3\,dx$, or $dx = \frac{1}{3}\,du$. This allows us to continue from above with

$$\int (3x + 1)^{100}\,dx = \int u^{100}\,dx \qquad (u = 3x + 1)$$
$$= \tfrac{1}{3}\int u^{100}\,du \qquad \text{(now a standard integral!)}$$
$$= \tfrac{1}{3}\cdot\tfrac{1}{101}u^{101} + c = \tfrac{1}{303}(3x + 1)^{101} + c.$$

Margin note: The terms $du$ and $dx$ are not real numbers, so this kind of manipulation is not valid in general. It should only be done within the context of integration by substitution.

Notice that at the end we replaced u by 3x + 1, to give the answer in
terms of the variable in which the original integral was phrased. It’s the
polite thing to do.
This form of integration, where the variable is changed to make the inte-
gral appear as a standard integral, is called integration by substitution.
It is the integral counterpart to the chain rule for differentiation.
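
As with any anti-derivative, the answer found by substitution can be sanity-checked by differentiating it. The sketch below does this numerically for the example above, using a symmetric difference quotient; the helper names (`central_difference`, `candidate`, `relative_gap`) and the step size are illustrative choices, not part of the module materials.

```python
def central_difference(F, x, h=1e-6):
    """Approximate F'(x) by the symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2 * h)


def integrand(x):
    return (3 * x + 1) ** 100


def candidate(x):
    # Anti-derivative found above via u = 3x + 1, taking c = 0.
    return (3 * x + 1) ** 101 / 303


def relative_gap(x):
    """Relative difference between candidate'(x) and the integrand at x."""
    target = integrand(x)
    return abs(central_difference(candidate, x) - target) / abs(target)


for x in (0.0, 0.01, 0.05):
    print(f"x = {x}: relative gap = {relative_gap(x):.2e}")
```

A small relative gap at several points is evidence (not proof) that the substitution was carried out correctly.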

Example 7.1.

1. Determine $\int \cos(3t + 2)\,dt$.

Solution. Since we know the standard integral $\int \cos u\,du$, we will try the substitution $u = 3t + 2$:

$$\int \cos(3t + 2)\,dt = \int \cos u\,dt$$
$$= \tfrac{1}{3}\int \cos u\,du \qquad \left(\tfrac{du}{dt} = 3 \text{ implies } dt = \tfrac{du}{3}\right)$$
$$= \tfrac{1}{3}(\sin u + c') = \tfrac{1}{3}\sin(3t + 2) + c. \qquad \square$$

Margin note: Notice we’ve written $c$ instead of $\frac{1}{3}c'$, simply to indicate that one third of an arbitrary constant is still just an arbitrary constant (albeit a different one).

2. Determine $\int \frac{1}{3t + 6}\,dt$.

Solution. We know $\int \frac{1}{u}\,du$, so let’s try the substitution $u = 3t + 6$, which makes our integral look like that standard integral:

$$\int \frac{1}{3t + 6}\,dt = \int \frac{1}{u}\,dt$$
$$= \tfrac{1}{3}\int \frac{1}{u}\,du \qquad \left(\tfrac{du}{dt} = 3 \text{ implies } du = 3\,dt\right)$$
$$= \tfrac{1}{3}(\log|u| + c) = \tfrac{1}{3}\log|3t + 6| + c'. \qquad \square$$

Unfortunately, integration by substitution may not be straightforward, because it may not be at all obvious what is the correct substitution to make.

Margin note: Sometimes integration by substitution can be fiendishly difficult, though we’ll steer clear of such examples in this module.

Example 7.2. Determine $\int 3x^2(x^3 + 1)^5\,dx$.

Solution. In this example, it is not so clear beforehand what we should substitute for. Should we let $u = 3x^2$, or $u = 3x^2(x^3 + 1)^5$, or $u = x^3 + 1$ or something else? It happens that the most fruitful substitution is $u = x^3 + 1$.

$$\int 3x^2(x^3 + 1)^5\,dx = \int 3x^2 \cdot u^5\,dx$$
$$= \int u^5\,du \qquad \left(\tfrac{du}{dx} = 3x^2 \text{ implies } du = 3x^2\,dx\right)$$
$$= \tfrac{1}{6}u^6 + c$$
$$= \tfrac{1}{6}(x^3 + 1)^6 + c. \qquad \square$$

Let’s examine that last one again. Why did $u = x^3 + 1$ work out nicely? The reason lies in the fact that substitution is like a reversal of the chain rule. The chain rule says that $f'(g(x))\,g'(x)$ is the derivative of $f(g(x))$. In terms of anti-derivatives, this says $\int f'(g(x)) \cdot \frac{dg}{dx}\,dx = f(g(x)) + c$.

Remarks 7.3. When deciding what substitution to make, keep an eye out for a function $u(x)$ appearing inside another function $f(u(x))$, and its derivative $u'(x)$ (or a scalar multiple thereof) appearing as a factor.


Example 7.4.

1. Evaluate $\int xe^{-x^2}\,dx$.

Solution. We see $-x^2$ appearing as the input to the exponential function, i.e. $e^{-x^2}$. This is multiplied by $x$, which is not the derivative of $-x^2$, but almost is, in the sense that $x = -\frac{1}{2}(-2x)$. This looks promising, so we try the substitution $u = -x^2$.

$$\int xe^{-x^2}\,dx = \int xe^{u}\,dx,$$

where $u = -x^2$. Thus $\frac{du}{dx} = -2x$, giving $x\,dx = -\frac{1}{2}\,du$, and

$$\int xe^{-x^2}\,dx = -\tfrac{1}{2}\int e^{u}\,du = -\tfrac{1}{2}e^{u} + c = -\tfrac{1}{2}e^{-x^2} + c. \qquad \square$$

2. Evaluate $\int \cos x \sin x\,dx$.

Solution. Here we’ll try the substitution $u = \sin x$ because we see its derivative is a factor in the integrand.

$$\int \cos x \sin x\,dx = \int u\,du = \tfrac{1}{2}u^2 + c = \tfrac{1}{2}\sin^2 x + c. \qquad \square$$

Margin note: Interestingly, you can also do this one with the substitution $u = \cos x$ to get the solution $-\frac{1}{2}\cos^2 x + c$. This appears to be a different answer, but the difference between the two solutions is a constant, so they are really the same as far as anti-derivatives are concerned.

Definite integration by Substitution
When calculating a definite integral which involves a substitution one should also substitute in the limits. For example,

$$\int_0^1 \frac{x^2}{\sqrt{x^3 + 1}}\,dx = \int_{x=0}^{x=1} \frac{1}{3\sqrt{u}}\,du,$$

where we have substituted $u = x^3 + 1$. Now make sure to change the limits of integration to the new variable. When $x = 0$, $u = 0^3 + 1 = 1$ and when $x = 1$, $u = 1^3 + 1 = 2$. So the definite integral is actually

$$\int_1^2 \tfrac{1}{3}u^{-1/2}\,du = \left[\tfrac{2}{3}u^{1/2}\right]_1^2 = \tfrac{2}{3}(\sqrt{2} - 1).$$
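
The change of limits can be checked numerically. The sketch below approximates both forms of the integral, the original one in $x$ over $[0, 1]$ and the substituted one in $u$ over $[1, 2]$, with a simple midpoint sum, and compares them with $\frac{2}{3}(\sqrt{2} - 1)$. The function `midpoint_sum` and the number of strips are illustrative assumptions.

```python
import math


def midpoint_sum(f, a, b, n=20_000):
    """Riemann sum for f on [a, b] using the midpoints of n equal strips."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


# Original integral in x, with x running from 0 to 1.
in_x = midpoint_sum(lambda x: x**2 / math.sqrt(x**3 + 1), 0.0, 1.0)
# Substituted integral in u = x^3 + 1, with u running from 1 to 2.
in_u = midpoint_sum(lambda u: 1 / (3 * math.sqrt(u)), 1.0, 2.0)
# Closed form obtained from the anti-derivative (2/3) u^(1/2).
closed_form = (2 / 3) * (math.sqrt(2) - 1)

print(in_x, in_u, closed_form)
```

All three numbers should agree to many decimal places; forgetting to change the limits (integrating in $u$ from 0 to 1) would give a visibly different value.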

7.2 Integration by Parts


The product rule for differentiation says that

$$\frac{d}{dx}\big(u(x)v(x)\big) = u'(x)v(x) + u(x)v'(x).$$

If we integrate both sides with respect to $x$, we have

$$\int \frac{d}{dx}\big(u(x)v(x)\big)\,dx = \int \big(u'(x)v(x) + u(x)v'(x)\big)\,dx,$$

and so, forgetting about constants of integration for the moment,

$$u(x)v(x) = \int u'(x)v(x)\,dx + \int u(x)v'(x)\,dx.$$

Fact 7.5.

$$\int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx,$$

which is more easily read as

$$\int uv' = uv - \int u'v.$$

This is the formula for integration by parts. It is the integral counterpart to the product rule for differentiation. It allows you to transform an integral which involves the product of two functions, by differentiating one of the functions and integrating the other, in the hope of making the remaining integral easier to calculate.

Margin note: The formula for integration by parts does not tell you which function to pick for $u$ and which to pick for $v'$! The formula changes the integrand by differentiating $u$ and integrating $v'$. We want this change to make the integrand simpler. The correct choice in Example 7.6 is to take $u(x) = x$ and $v'(x) = \sin x$, the reason being that $x$ simplifies when differentiated while $\sin x$ does not become any more complicated when it is integrated.

Example 7.6. Determine $\int x \sin x\,dx$.

Solution. Set $u = x$ and $v' = \sin x$. We have $u' = 1$, and $v' = \sin x$ implies $v = \int \sin x\,dx = -\cos x$:

$$\int x \sin x\,dx = x(-\cos x) - \int 1 \cdot (-\cos x)\,dx$$
$$= -x\cos x + \int \cos x\,dx$$
$$= -x\cos x + \sin x + c. \qquad \square$$

Remarks 7.7. A rule of thumb when trying to use integration by parts is to let $u$ be the ‘polynomial part’, whose degree drops when it is differentiated, and to let $v'$ be the trigonometric or exponential part, which doesn’t get significantly more complicated when integrated.

Example 7.8. Determine $\int xe^{-7x}\,dx$.

Solution. Having in mind Remarks 7.7, we write

$$\int xe^{-7x}\,dx = \int u(x)v'(x)\,dx \qquad \left(u(x) = x \text{ and } v'(x) = e^{-7x}\right)$$
$$= u(x)v(x) - \int u'(x)v(x)\,dx.$$

Since $u(x) = x$, we get $u'(x) = 1$. We find $v(x) = \int v'(x)\,dx = \int e^{-7x}\,dx$ and the substitution $w = -7x$ gives $v(x) = -\frac{1}{7}e^{-7x}$. So we get

$$\int xe^{-7x}\,dx = -\tfrac{1}{7}xe^{-7x} + \tfrac{1}{7}\int e^{-7x}\,dx$$
$$= -\tfrac{1}{7}xe^{-7x} - \tfrac{1}{49}e^{-7x} + c. \qquad \square$$

One standard function whose integral is conspicuously absent from our list of standard integrals (Fact 6.5) is $\log x$. What is $\int \log x\,dx$? Integration by parts helps us here. In fact, we can do a little bit more.
Example 7.9. Let $n > 0$ be an integer. Determine $\int x^n \log x\,dx$.

Solution. We write

$$\int x^n \log x\,dx = \int \underbrace{x^n}_{v'}\,\underbrace{\log x}_{u}\,dx$$
$$= \underbrace{\tfrac{1}{n+1}x^{n+1}}_{v}\,\underbrace{\log x}_{u} - \int \underbrace{\tfrac{1}{n+1}x^{n+1}}_{v}\,\underbrace{\tfrac{1}{x}}_{u'}\,dx$$
$$= \tfrac{1}{n+1}x^{n+1}\log x - \tfrac{1}{n+1}\int x^n\,dx$$
$$= \tfrac{1}{n+1}x^{n+1}\log x - \tfrac{1}{(n+1)^2}x^{n+1} + c. \qquad \square$$

Note that all of this works perfectly well when $n = 0$ (where we have $x^n = 1$), giving $\int \log x\,dx = x\log x - x + c$. As with any integral, we can check our answer by differentiating our solution to get the function we were originally trying to integrate:

$$\frac{d}{dx}(x\log x - x) = \left(\frac{d}{dx}x\right)\log x + x \cdot \frac{d}{dx}\log x - 1$$
$$= 1 \cdot \log x + x \cdot \frac{1}{x} - 1$$
$$= \log x.$$

You can add this one to our short list of standard integrals (Fact 6.5).

Remarks 7.10. This relates to Remarks 7.7. Probably the most well-
known rule of thumb for applying integration by parts is the so-called
LIATE rule. It states that the function that comes first in the following
list should be chosen as u:

L logarithmic functions

I inverse trigonometric functions

A algebraic functions (i.e. polynomials)

T trigonometric functions

E exponential functions.

Inverse trigonometric functions feature in Appendix B.2 and are not in the main notes. Hence, for us, perhaps the ‘LATE rule’ is a better (and more memorable) name. Again, this is a rule of thumb, and there are cases in which it fails.

Using integration by parts more than once


Sometimes it is necessary to use integration by parts repeatedly in order to evaluate certain integrals. Standard examples of this include $\int x^n \sin x\,dx$, $\int x^n \cos x\,dx$ and $\int x^n e^x\,dx$, where $n > 1$ is an integer. In the final example of the chapter, we just consider $\int x^2 \cos x\,dx$.
Example 7.11. Determine $\int x^2 \cos x\,dx$.

Solution. We will use integration by parts twice. Again having in mind Remarks 7.7, we use the technique once to get

$$\int x^2 \cos x\,dx = \int \underbrace{x^2}_{u}\,\underbrace{\cos x}_{v'}\,dx$$
$$= \underbrace{x^2}_{u}\,\underbrace{\sin x}_{v} - \int \underbrace{2x}_{u'}\,\underbrace{\sin x}_{v}\,dx$$
$$= x^2 \sin x - 2\int x \sin x\,dx.$$

At this point, we can apply integration by parts a second time to evaluate the integral that remains on the right hand side. Actually we did this earlier in Example 7.6, hence we can finish by writing

$$\int x^2 \cos x\,dx = x^2 \sin x + 2x\cos x - 2\sin x + c. \qquad \square$$

In general, the integral $\int x^n \cos x\,dx$ can be evaluated by performing integration by parts $n$ times (formally, by mathematical induction).
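
The repeated use of integration by parts can be phrased as a pair of reduction steps: with $u = x^n$ each time, $\int x^n \cos x\,dx = x^n \sin x - n\int x^{n-1}\sin x\,dx$ and $\int x^n \sin x\,dx = -x^n \cos x + n\int x^{n-1}\cos x\,dx$. The sketch below follows this recursion to evaluate an anti-derivative at a point; the function names `int_xn_cos` and `int_xn_sin` are hypothetical, and the constant of integration is taken to be 0.

```python
import math


def int_xn_cos(n, x):
    """Value at x of an anti-derivative of x**n * cos(x), with c = 0."""
    if n == 0:
        return math.sin(x)
    # u = x**n, v' = cos x, so u' = n x**(n-1) and v = sin x.
    return x**n * math.sin(x) - n * int_xn_sin(n - 1, x)


def int_xn_sin(n, x):
    """Value at x of an anti-derivative of x**n * sin(x), with c = 0."""
    if n == 0:
        return -math.cos(x)
    # u = x**n, v' = sin x, so u' = n x**(n-1) and v = -cos x.
    return -(x**n) * math.cos(x) + n * int_xn_cos(n - 1, x)


x = 1.3
by_recursion = int_xn_cos(2, x)
by_example = x**2 * math.sin(x) + 2 * x * math.cos(x) - 2 * math.sin(x)
print(by_recursion, by_example)
```

For $n = 2$ the recursion reproduces the answer of Example 7.11, and each recursive call corresponds to one application of integration by parts.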
We finish with another remark about the difficulty of expressing certain integrals. The bell curve $e^{-x^2}$ introduced in Example 3.28 was hailed with much fanfare in the nearby marginal note as being one of the most important functions in statistics. Hence it is rather embarrassing to note that it is impossible to express the indefinite integral of this function in terms of familiar functions. In fact, we just have to define a brand new function. The error function $\operatorname{erf} : \mathbb{R} \to \mathbb{R}$ is defined by

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt.$$

Margin note: We look at this problem again in Section 8.2.

Using Theorem 6.14, erf is differentiable and

$$\frac{d}{dx}\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}e^{-x^2},$$

hence $\frac{\sqrt{\pi}}{2}\operatorname{erf}(x)$ is an anti-derivative of $e^{-x^2}$. The constant multiple $\frac{2}{\sqrt{\pi}}$ is introduced into the definition of erf in order to ensure that $\lim_{x\to\infty}\operatorname{erf}(x) = 1$ (this is a consequence of a beautiful result in integration that, sadly, we do not have time to cover!).
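
Although erf has no formula in terms of familiar functions, it is available in most programming languages. As a small illustration, Python’s standard `math` module provides `math.erf`, so the anti-derivative $\frac{\sqrt{\pi}}{2}\operatorname{erf}(x)$ can be evaluated directly; the helper name `bell_area` below is an illustrative choice.

```python
import math


def bell_area(x):
    """Integral of exp(-t**2) for t from 0 to x, via (sqrt(pi)/2) * erf(x)."""
    return math.sqrt(math.pi) / 2 * math.erf(x)


print(bell_area(1.0))    # area under the bell curve between 0 and 1
print(math.erf(10.0))    # erf(x) approaches 1 as x grows
```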


Chapter 8

Numerical Techniques

8.1 Solving equations numerically


How do we solve equations? For a linear equation ax + b = 0, we just
subtract b from both sides, then divide both sides by a to find the solution
is x = −b/a. A quadratic equation can be solved exactly using the well-
known formula. In fact, there are formulae that give exact solutions of
cubic and quartic polynomial equations as well (though they are very
unpleasant to look at). However, for higher degree polynomial equations
it has been proved that, while solutions exist, we will never find formulae
that can describe them explicitly (see the marginal note alongside
MATH10390 Proposition 5.9). Thus, you might not be able to solve
the characteristic polynomial to find the eigenvalues of an n × n matrix,
where n > 5. Eigenvalue and eigenvector problems that arise in practice
often involve very large matrices, where n could be in the thousands or
millions or even greater. In these situations, the only sensible option is
not to find eigenvalues exactly, but to seek numerical approximations.
In fact, even quite simple equations involving non-polynomial functions
are generally impossible to solve explicitly, even when we know that
solutions exist. Again, we must try to find numerical approximations
instead. In this section, we introduce some basic methods for doing so.

Example 8.1. Consider the equation x = cos x. Does it have any


solutions? If we plot the functions x and cos x on the Cartesian plane
(Figure 8.1), we see that they do intersect, and where they intersect
represents a value of x for which x = cos x. So the equation has a
solution, but it turns out there is no way of writing down that number
explicitly.


Figure 8.1: The solution of x = cos x

We are led inevitably to giving an approximate value for the solution. For example, we can look at the graph we have drawn and say ‘the solution is about $\frac{3}{4}$’. However, that feels like pinning the tail on the donkey and is not very satisfactory. We want a method that is more reliable and can give us an approximation of the solution to any desired level of accuracy.

Margin note: Hitherto, in this programme I have stressed the writing down of numbers and solutions exactly, whenever possible. This was to emphasise that there is a difference between, say, $\sqrt{2}$, and numerical approximations obtained using e.g. a calculator. In this chapter we try to put quantitative bounds on the differences, or errors, between true solutions and their numerical approximations obtained using some approximation method. More precisely, given some error bound $\varepsilon > 0$, which will vary according to the application, I hope to run my chosen method long enough so that the difference between the true solution $a$ and the numerical output $x$ obtained from the method is at most $\varepsilon$, i.e. $|x - a| \le \varepsilon$. So while I may not hit the bullseye exactly, I can get as close as desired.

Remarks 8.2. The problem of finding a solution of an equation is equivalent to the problem of finding a root (or zero) of a function. The equivalence is seen by taking all terms of the equation to one side. For example, finding a solution to the equation $x = \cos x$ can be rephrased as finding a root of the function $f(x) = x - \cos x$.

The bisection method
One way of finding a root of a given continuous function f is to employ

the bisection method. For this, we first find two points where the graph of $f$ is on opposite sides of the x-axis. That is, locate $x_1$ and $x_2$ such that $f(x_1) < 0$ and $f(x_2) > 0$. E.g. for $f(x) = x - \cos x$ we have $f(0) = -1 < 0$ while $f(\frac{\pi}{2}) = \frac{\pi}{2} > 0$. The idea is that since $f(x_1)$ is negative and $f(x_2)$ is positive, $f$ must have a root (a place where it crosses the x-axis) somewhere between $x_1$ and $x_2$. If $|x_2 - x_1|$ is small enough then we know the location of our root to desired accuracy, but otherwise, we consider the point halfway between: $x_3 = \frac{1}{2}(x_1 + x_2)$. Now there are three possibilities: (i) $f(x_3) = 0$, in which case we have found the root (which would be lovely, but it almost never happens), (ii) $f(x_3) > 0$ or (iii) $f(x_3) < 0$. In either of these latter two cases, $f(x_3)$ differs in sign from either $f(x_1)$ or $f(x_2)$ and we can say whether the root lies between $x_1$ and $x_3$ or between $x_2$ and $x_3$. Thus we have located the root in a shorter interval. We can keep repeating this until we have shortened sufficiently the length of the interval in which the root resides, thereby finding the root to the desired accuracy.

Margin note: That $f$ really has a root is a consequence of the so-called Intermediate Value Theorem, which states that if $g : [a, b] \to \mathbb{R}$ is continuous and $g(a) \le 0 \le g(b)$, then $g$ has a root in $[a, b]$, i.e. there exists $x \in [a, b]$ such that $g(x) = 0$. This theorem is fairly deep and its proof is beyond the scope of this module. See Theorem B.3.
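
The procedure just described translates almost line by line into a short program. The following is a minimal sketch, not part of the module materials: the function name `bisect`, the tolerance `tol` and the choice to return the midpoint of the final interval are all illustrative assumptions.

```python
import math


def bisect(f, lo, hi, tol=1e-6):
    """Bisection: f(lo) and f(hi) must have opposite signs (and lo < hi).

    Returns (estimate, steps), where the estimate is the midpoint of a
    bracketing interval whose length is at most tol.
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    steps = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0:
            return mid, steps + 1      # landed exactly on the root
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid      # root lies between mid and hi
        else:
            hi = mid                   # root lies between lo and mid
        steps += 1
    return 0.5 * (lo + hi), steps


root, steps = bisect(lambda x: x - math.cos(x), 0.0, math.pi / 2)
print(root, steps)
```

Each pass through the loop halves the length of the interval containing the root, so the number of steps grows only logarithmically as the tolerance shrinks.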

Figure 8.2: applying the bisection method to x − cos x (the points $x_1 = 0$, $x_3$ and $x_2 = \frac{\pi}{2}$ are marked on the x-axis)

Margin note: The point $x_3$ looks like it is very close to the true solution, but this is just dumb luck (and also depends on the scale of the graph), and not really by virtue of the method!

The bisection method is pretty good but has a couple of drawbacks.


Firstly, we have to initially search for x1 and x2 such that f(x1 ) and f(x2 )
have opposite signs. A second problem is that we might have to do a lot
of bisections before we get sufficient accuracy. We can do better.

Newton’s method
Newton’s method (or the Newton-Raphson method) usually provides very fast convergence to the root of a differentiable function. The idea is very simple. We take a starting point, say $x_1$, and instead of looking for the root $r$ of $f$, we find the root of the linear approximation $L^f_{x_1}$ (which is much easier to find), and name this $x_2$. Recall from Section 4.4 that the linear approximation to $f$ at $a$ is given by

$$L^f_a(x) = f(a) + (x - a)f'(a).$$

The root of the linear function $L^f_{x_1}$ is the solution to $f(x_1) + (x - x_1)f'(x_1) = 0$:

$$f(x_1) + (x - x_1)f'(x_1) = 0$$
$$(x - x_1)f'(x_1) = -f(x_1)$$
$$x - x_1 = -\frac{f(x_1)}{f'(x_1)}$$
$$x = x_1 - \frac{f(x_1)}{f'(x_1)}.$$


Thus $x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$. Geometrically, all we’ve done is to calculate where the tangent line to $f$ at $x_1$ crosses the x-axis. See Figure 8.3 (left hand side).

Figure 8.3: first and second steps of Newton’s method

This is usually a better approximation to the true root of $f$ than the one we started with, so all we need to do is to repeat the procedure. That is, for each approximation $x_n$ of the root of $f$, we generate a better approximation $x_{n+1}$ by the formula

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

The right hand side of Figure 8.3 displays the next iteration, which produces $x_3$.

This iteration process continues until we are close enough to the root. We can measure how close we are to the root by comparing $f(x_n)$ to 0. That is, we will stop when $|f(x_n)|$ is less than some acceptable approximation error. Of course, if $f'(x_n) = 0$ at any point then we are stymied by a division by zero in the formula (what does this correspond to geometrically?) and we have to start again from a different $x_1$. However, in practice, this problem rarely occurs.

Example 8.3. Use Newton’s method to estimate a root of $f(x) = x^3 + x^2 - x - 1$, taking $x_1 = 3$ as your initial estimate. The final estimate $x$ should satisfy $|f(x)| < 0.00001$.

Solution. First we find the derivative of $f$, which is

$$f'(x) = \frac{d}{dx}(x^3 + x^2 - x - 1) = 3x^2 + 2x - 1.$$

We apply Newton’s method by filling in the following table.

n   x_n            f(x_n)          f'(x_n)       f(x_n)/f'(x_n)     x_n − f(x_n)/f'(x_n)
1   3              32              32            1                  2
2   2              9               15            0.6                1.4
3   1.4            2.304           7.68          0.3                1.1
4   1.1            0.441           4.83          0.091304347        1.008695652
5   1.008695652    0.035085723     4.069792059   8.621011219×10⁻³   1.000074641
6   1.000074641    2.985854×10⁻⁴   4.000597145   7.4635208×10⁻⁵     1.000000006
7   1.000000006    2.402×10⁻⁸

Margin note: We start to use approximate values in rows 4 and beyond.

At this point we stop because

$$|f(x_7)| = |2.402 \times 10^{-8}| = 0.00000002402 < 0.00001.$$

Our estimate of the root is thus $x_7 = 1.000000006$. □

Margin note: Actually, in this case, we picked a polynomial function where one can easily see that the correct answer is 1, since $f(1) = 0$, so it is quite a good estimate.
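
The table can be reproduced by a few lines of code. The sketch below is one possible implementation of the iteration $x_{n+1} = x_n - f(x_n)/f'(x_n)$ with the stopping rule $|f(x_n)| < 0.00001$ used in the example; the function name `newton` and the safety cap `max_steps` are illustrative assumptions.

```python
def newton(f, df, x, tol=1e-5, max_steps=50):
    """Newton iterates starting from x, stopping once |f(x)| < tol."""
    iterates = [x]
    for _ in range(max_steps):
        if abs(f(x)) < tol:
            break
        slope = df(x)
        if slope == 0:
            # Horizontal tangent line: it never meets the x-axis.
            raise ZeroDivisionError("f'(x) = 0: choose another starting point")
        x = x - f(x) / slope
        iterates.append(x)
    return iterates


def f(x):
    return x**3 + x**2 - x - 1


def df(x):
    return 3 * x**2 + 2 * x - 1


for n, x in enumerate(newton(f, df, 3.0), start=1):
    print(f"x{n} = {x:.10f}   f(x{n}) = {f(x):.3e}")
```

Because the computer carries more digits than the hand calculation, the later rows may differ from the table in the last decimal place or two.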

How good is Newton’s method? Well, if you used bisection to find the
root in the above example then it would take you perhaps 30 iterations to
achieve the same level of accuracy achieved after 6 iterations of Newton’s
method, and 60 bisections to achieve the same accuracy as 7 iterations
of Newton. Roughly speaking, bisection halves the error at each step,
while Newton’s method squares the error, and if the error is small, then
this is much better.
In school, we’ve all learned to add, subtract, multiply and divide numbers
but you probably didn’t learn how to extract square roots using these
operations. Here’s one way to do it.

Example 8.4. Use Newton’s method to find a decimal expansion of $\sqrt{2}$ correct to 4 decimal places.

Solution. We notice first that $\sqrt{2}$ is a root of the function $f(x) = x^2 - 2$. We take an initial guess of 1 for a root. Now

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - 2}{2x_n} = \frac{x_n^2 + 2}{2x_n}.$$

We get $x_1 = 1$, $x_2 = \frac{3}{2}$, $x_3 = \frac{17}{12}$, $x_4 = \frac{577}{408}$, $x_5 = \frac{665857}{470832}$. Checking the decimal expansions reveals that the first four decimal places have started to repeat, hence we stop here and take $x = 1.4142$ as the approximation of $\sqrt{2}$. □

Margin note: In fact, the approximation $\frac{665857}{470832}$ of $\sqrt{2}$ is correct to 11 decimal places.
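
The fractions in this example can be generated exactly, without any rounding, by carrying out the iteration $x_{n+1} = \frac{x_n^2 + 2}{2x_n}$ in rational arithmetic. The sketch below uses Python’s standard `fractions` module; the function name `sqrt2_iterates` is an illustrative choice.

```python
from fractions import Fraction


def sqrt2_iterates(steps):
    """Exact Newton iterates for f(x) = x^2 - 2, starting from x1 = 1."""
    x = Fraction(1)
    iterates = [x]
    for _ in range(steps):
        x = (x * x + 2) / (2 * x)
        iterates.append(x)
    return iterates


for n, x in enumerate(sqrt2_iterates(4), start=1):
    print(f"x{n} = {x} = {float(x):.12f}")
```

Printing the decimal values side by side makes it easy to see at which step the first four decimal places stop changing.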

Exercise 8.5.

1. Use Newton’s method to approximate a point x such that 7 sin x =


x + cos x.

2. Apply Newton’s method to find a root of f(x) = x 4 − 2x 2 + 3.


Take x0 = 1 and explain your answer.

8.2 Integrating numerically


We saw in Chapter 6 that $\int_a^b f(x)\,dx = F(b) - F(a)$, where $F$ is any anti-derivative of $f$, and that this quantity is the area under the graph of $f$ between $x = a$ and $x = b$ (where area beneath the x-axis counts as ‘negative’ area). In Chapter 7, we met methods of integration, that is, ways of finding an anti-derivative of a given function $f$. We mentioned that, while some integrals are standard, and some are tricky, there are many which are just plain impossible, such as the integral of the bell curve $e^{-x^2}$. It’s a little like trying to find a root of $f(x) = x^{11} - 73x^4 + 109$: you know the function has a root (why?), but there is no formula which will give it to you. The bisection method and Newton’s method are approximation techniques that allow you to get around the problem, if you can put up with some small error in your answer. We want a similar numerical workaround for finding definite integrals, particularly for those situations where finding an anti-derivative is intractable.

Margin note: By ‘impossible integral’ we mean one where anti-derivatives, while guaranteed to exist by Theorem 6.14, cannot be expressed in terms of standard functions.

In fact, we’ve seen one approach already. Riemann sums provide the obvious way of approximating an integral, by virtue of the fact that the integral is a limit of Riemann sums. We applied this technique in Example 6.8. However, we may have to pick very fine Riemann sums to get a good approximation to the area (think Figure 6.2 rather than Figure 6.1) and these involve a lot of work. Can we get better approximations to the area without having to do so much extra work?

The idea is to approximate the given function $f$ on $[a, b]$ by another function $g$ which we can integrate simply. The integral of $g$ should then be a good approximation to the integral of $f$.

The Trapezoidal Rule


We approximate $\int_a^b f(x)\,dx$, the area between the curve $f$ and the x-axis between $a$ and $b$, by partitioning the interval $[a, b]$ into subintervals and approximating $f$ by a straight line on each subinterval.

Margin note: If we insisted on horizontal lines then this would essentially be a Riemann sum approximation.

In Figure 8.4, we have split $[a, b]$ into $n$ subintervals of equal width $h = \frac{1}{n}(b - a)$, for $n = 4$ and $n = 9$. We call $h$ the step size of the partition. We let $x_k = a + kh$ for $k = 0, 1, 2, \dots, n$ and $y_k = f(x_k)$. We construct $n$ trapezoids based on the subintervals. The area of the first trapezoid is $\frac{1}{2}h(y_0 + y_1)$. The area of the second trapezoid is $\frac{1}{2}h(y_1 + y_2)$. The total area of the $n$ trapezoids, $T_n$, is given by

$$T_n = \tfrac{1}{2}h(y_0 + y_1) + \tfrac{1}{2}h(y_1 + y_2) + \cdots + \tfrac{1}{2}h(y_{n-1} + y_n)$$
$$= \tfrac{1}{2}h(y_0 + 2y_1 + 2y_2 + \cdots + 2y_{n-1} + y_n).$$

Figure 8.4: trapezoidal rule with 4 and 9 subintervals, respectively

Definition 8.6. The nth trapezoidal approximation to $\int_a^b f(x)\,dx$ is given by

$$T_n = \tfrac{1}{2}h(y_0 + 2y_1 + 2y_2 + \cdots + 2y_{n-1} + y_n),$$

where $h = \frac{1}{n}(b - a)$, $x_k = a + kh$ and $y_k = f(x_k)$, $0 \le k \le n$.


Example 8.7. Estimate $\int_1^2 x^2\,dx$ with $n = 2$ subintervals and also with $n = 4$ subintervals.

Solution. We start with $n = 2$, which gives a step size of $h = \frac{1}{2}(2 - 1) = \frac{1}{2}$. The points at which we calculate the value of the function are $x_0 = 1$, $x_1 = a + h = \frac{3}{2}$ and $x_2 = a + 2h = 2$. We get

x_k             1      3/2    2
y_k = f(x_k)    1      9/4    4

which gives $T_2 = \frac{1}{4}(1 + 2 \cdot \frac{9}{4} + 4) = \frac{19}{8} = 2.375$.

If $n = 4$, we get a step size of $h = \frac{1}{4}(2 - 1) = \frac{1}{4}$, and the table becomes

x_k             1      5/4      3/2    7/4      2
y_k = f(x_k)    1      25/16    9/4    49/16    4

Thus $T_4 = \frac{1}{8}(1 + 2 \cdot \frac{25}{16} + 2 \cdot \frac{9}{4} + 2 \cdot \frac{49}{16} + 4) = \frac{75}{32} = 2.34375$. □
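
For larger values of $n$ the arithmetic is better left to a computer. The sketch below is a direct transcription of Definition 8.6 (the function name `trapezoid` is an illustrative choice) and reproduces the two estimates of this example.

```python
def trapezoid(f, a, b, n):
    """n-th trapezoidal approximation T_n to the integral of f over [a, b]."""
    h = (b - a) / n                              # step size
    ys = [f(a + k * h) for k in range(n + 1)]    # y_0, y_1, ..., y_n
    return 0.5 * h * (ys[0] + 2 * sum(ys[1:-1]) + ys[-1])


def square(x):
    return x**2


print(trapezoid(square, 1, 2, 2))
print(trapezoid(square, 1, 2, 4))
```

Trying larger values of `n` shows the estimates creeping down towards the exact value of the integral.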

In the example above one can of course calculate the integral exactly.

$$\int_1^2 x^2\,dx = \left[\tfrac{1}{3}x^3\right]_1^2 = \tfrac{8}{3} - \tfrac{1}{3} = 2\tfrac{1}{3}.$$

Margin note: The example was chosen for the sake of simplicity.

As one might expect, we got a better estimate of the true value when taking a smaller step size (that is, a higher value for $n$).

Error estimate of the trapezoidal rule


One can in fact put a limit on the error for the trapezoidal rule. Let $E_{T_n} := \int_a^b f(x)\,dx - T_n$ be the error in the nth trapezoidal approximation. If $f$ is twice differentiable and $f''$ is continuous, and if $M$ is the maximum of $|f''|$ on $[a, b]$, then

$$|E_{T_n}| \le \tfrac{1}{12}(b - a)h^2 M.$$

Margin note: If $f''$ is continuous then so is $|f''|$, so the maximum $M$ will exist by Theorem 4.8.

Notice that the maximum possible error $|E_{T_n}|$ is proportional to the square of the step size $h$: if we halve $h$ (by doubling the number of subintervals), then the maximum possible error will be reduced by a factor of 4.

Remarks 8.8. If you want to calculate a definite integral to a given


accuracy then, using the above error estimate, you can decide how
large an n you need to take in order to guarantee such accuracy. For
large values of n, Tn is best calculated by computer.

Example 8.9. Calculate $\int_0^2 (1 + x^3)\,dx$ to within an accuracy of $\frac{1}{2}$ using the trapezoidal rule.

Solution. Again $|E_{T_n}| \le \frac{1}{12}(b - a)h^2 M$. Here, $\frac{1}{12}(b - a) = \frac{1}{6}$, while $M$ is the maximum value that $|f''(x)| = 6x$ takes on $[0, 2]$, which is 12. So $|E_{T_n}| \le \frac{1}{6}h^2 \cdot 12 = 2h^2$. So to guarantee accuracy to within $\frac{1}{2}$, we need $2h^2 \le \frac{1}{2}$, giving $h^2 \le \frac{1}{4}$ and $h \le \frac{1}{2}$. Thus, if we take $n = 4$ then $h = \frac{b - a}{n} = \frac{2}{4} = \frac{1}{2}$ will produce a sufficiently accurate result.

x_k             0      1/2    1      3/2     2
y_k = f(x_k)    1      9/8    2      35/8    9

This gives $T_4 = \frac{1}{4}(1 + \frac{9}{4} + 4 + \frac{35}{4} + 9) = \frac{25}{4}$ (exact calculation gives an answer of 6). □
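
The reasoning in this solution can be packaged as a small calculation. Since $h = \frac{b - a}{n}$, the requirement $\frac{1}{12}(b - a)h^2 M \le \varepsilon$ rearranges to $n^2 \ge \frac{(b - a)^3 M}{12\varepsilon}$. The sketch below returns the smallest such $n$; the function name `trapezoid_steps_needed` is hypothetical, and the bound $M$ on $|f''|$ must still be found by hand.

```python
import math


def trapezoid_steps_needed(a, b, M, eps):
    """Smallest n with (1/12)(b - a) h^2 M <= eps, where h = (b - a)/n.

    M is an upper bound for |f''| on [a, b], supplied by the user.
    """
    return max(1, math.ceil(math.sqrt((b - a) ** 3 * M / (12 * eps))))


# Example 8.9: [a, b] = [0, 2], M = 12, accuracy 1/2.
print(trapezoid_steps_needed(0, 2, 12, 0.5))
```

Bear in mind that this is a guarantee based on the worst case; the actual error is often considerably smaller than the bound.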

Simpson’s rule
The trapezoidal rule approximates a function f by a series of straight
lines. We can make a better approximation by using sections of quadratic
curves instead of straight lines. This is the basis of Simpson’s rule.

Definition 8.10. The nth Simpson approximation to $\int_a^b f(x)\,dx$ is given by

$$S_n = \tfrac{1}{3}h(y_0 + 4y_1 + 2y_2 + 4y_3 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n),$$

where $n$ is an even number, $h = \frac{b - a}{n}$ is the step size, $x_k = a + kh$ for $k = 0, \dots, n$ and $y_k = f(x_k)$.

We’ll leave the derivation of this rule to Appendix B.1, but it is no more
difficult to apply than the trapezoidal rule, and can save time because of
its improved accuracy, meaning we can choose smaller values for n.


Error estimate of Simpson’s rule


How much better is Simpson’s rule compared with the trapezoidal rule? Well, if we let $E_{S_n} = \int_a^b f(x)\,dx - S_n$ be the error, then we have the following error estimate.

If the fourth derivative, $f^{(4)}$, of $f$ is continuous and $M$ is the maximum of $|f^{(4)}|$ on $[a, b]$ then

$$|E_{S_n}| \le \tfrac{1}{180}(b - a)h^4 M.$$

Margin note: As above, the existence of $M$ is guaranteed by Theorem 4.8.

So if we make $h$ twice as small, by doubling $n$, then the size of the error reduces by a factor of 16. Compare this with the error estimate for the trapezoidal rule.

Example 8.11. Use Simpson’s rule with $n = 4$ to estimate $\int_0^1 6x^5\,dx$. What is the error estimate? What is the actual error?

Solution. In this example, $n = 4$, $a = x_0 = 0$ and $b = x_4 = 1$. The step size is $h = \frac{1}{n}(b - a) = \frac{1}{4}(1 - 0) = \frac{1}{4}$. We make a table (using some approximate values).

x_k             0      1/4          1/2       3/4         1
y_k = f(x_k)    0      0.0058594    0.1875    1.423828    6

Now Simpson’s rule gives

$$S_4 = \tfrac{1}{3}h(y_0 + 4y_1 + 2y_2 + 4y_3 + y_4)$$
$$\approx \tfrac{1}{12}\big(0 + 4(0.0058594) + 2(0.1875) + 4(1.423828) + 6\big)$$
$$\approx 1.0078125.$$

Now we must find the theoretical limit on the error $E_{S_4}$. We know $|E_{S_4}| \le \frac{1}{180}(b - a)h^4 M$, where $M$ is the maximum of $|f^{(4)}|$ on $[a, b] = [0, 1]$. For $f(x) = 6x^5$ we have $f'(x) = 30x^4$, $f''(x) = 120x^3$, $f^{(3)}(x) = 360x^2$, $f^{(4)}(x) = 720x = |f^{(4)}(x)|$ on $[0, 1]$. So $M = 720$ and

$$|E_{S_4}| \le \tfrac{1}{180} \cdot \left(\tfrac{1}{4}\right)^4 \cdot 720 = \tfrac{1}{64} = 0.015625.$$

The integral above can be evaluated exactly: $\int_0^1 6x^5\,dx = \left[x^6\right]_0^1 = 1$. Thus the actual error is $|1 - 1.0078125| = 0.0078125$, which is half the theoretical bound on the error. □
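
Simpson’s rule is just as easy to program as the trapezoidal rule. The sketch below transcribes Definition 8.10 (the function name `simpson` is an illustrative choice) and repeats the calculation of this example, comparing the actual error with the theoretical bound.

```python
def simpson(f, a, b, n):
    """n-th Simpson approximation S_n to the integral of f over [a, b]."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]    # y_0, y_1, ..., y_n
    odd = sum(ys[1:n:2])     # y_1 + y_3 + ... + y_{n-1}, each weighted by 4
    even = sum(ys[2:n:2])    # y_2 + y_4 + ... + y_{n-2}, each weighted by 2
    return h / 3 * (ys[0] + 4 * odd + 2 * even + ys[n])


def f(x):
    return 6 * x**5


s4 = simpson(f, 0, 1, 4)
actual_error = abs(1 - s4)           # the exact integral is 1
bound = (1 / 180) * (1 / 4) ** 4 * 720
print(s4, actual_error, bound)
```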

The normal distribution again


In Chapter 3 (Figure 3.7), we mentioned that the bell curve $e^{-x^2}$ was particularly important in statistics and data science generally. The reason? In probability theory, we integrate so-called probability density functions to calculate the probability of certain events taking place. The probability density function $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, a rescaled version of the bell curve, is the most important one because it is, in a sense, the one to which all others gravitate. This statement is made precise in a result called the Central Limit Theorem.

The point is that calculating definite integrals of $e^{-x^2}$ is crucially important when we have large amounts of data. However, we have already seen that $e^{-x^2}$ is one of those functions that are ‘impossible’ to integrate. This means that the only way to calculate these definite integrals is numerically, or to look up tables produced by others who have done the calculations. We’ll finish the chapter by performing our own numerical calculation.

Example 8.12. Find $\int_0^1 e^{-x^2}\,dx$ to within an accuracy of $\varepsilon = 10^{-4} = 0.0001$.

Solution. Let $\varphi$ be as in Example 3.28. Using Fact 2.4, and the fact that $|e^{-x^2}|, |x^n| \le 1$ whenever $x \in [0, 1]$ and $n \in \mathbb{Z}$ is non-negative, we have $|\varphi''(x)| \le 6$ and $|\varphi''''(x)| \le 76$ for all $x \in [0, 1]$. When using Simpson’s rule, to make $|E_{S_n}| < \varepsilon$, we need $\frac{1}{180}h^4 \cdot 76 \le \varepsilon$, so $h^4 \le \frac{180}{76}\varepsilon$ and thus $h \le 0.12405513$. Since $h = \frac{b - a}{n} = \frac{1}{n}$, we have to take $n > 8$. As Simpson’s rule needs an even number of subintervals, the smallest valid choice is $n = 10$. If we use the trapezoidal rule, then to make $|E_{T_n}| < \varepsilon$, we would need $\frac{1}{12}h^2 \cdot 6 \le \varepsilon$ and so $h^2 \le 2\varepsilon$, giving $h \le 0.0141421$ and $n > 70$.

Neither of these looks inviting to do by hand, so we’ll use a computer to do the work. There is a Perl script in the week 8 section, written by Michael Mackey (to run it yourself, you will need Perl on your computer), that produces the following output:

---
a = 0 b=1 n=10 Step size is h=0.1.
Trapezoidal sum with 10 steps is 0.746210796131749
Simpson sum with 10 steps is 0.746824948254443
---
a = 0 b=1 n=72 Step size is h=0.0138888888888889.
Trapezoidal sum with 72 steps is 0.746812305336648
Simpson sum with 72 steps is 0.746824133116616
□
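
For readers without Perl, the same numbers can be obtained in other languages. The following Python sketch is not the module’s script; it is an independent transcription of Definitions 8.6 and 8.10 (with illustrative function names) applied to $e^{-x^2}$ on $[0, 1]$, and it also prints the value $\frac{\sqrt{\pi}}{2}\operatorname{erf}(1)$ from the standard `math` module for comparison.

```python
import math


def trapezoid(f, a, b, n):
    """n-th trapezoidal approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]
    return 0.5 * h * (ys[0] + 2 * sum(ys[1:-1]) + ys[-1])


def simpson(f, a, b, n):
    """n-th Simpson approximation (n even) to the integral of f over [a, b]."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]
    return h / 3 * (ys[0] + 4 * sum(ys[1:n:2]) + 2 * sum(ys[2:n:2]) + ys[n])


def bell(x):
    return math.exp(-x * x)


for n in (10, 72):
    print(f"n = {n}: trapezoidal = {trapezoid(bell, 0, 1, n):.15f}, "
          f"Simpson = {simpson(bell, 0, 1, n):.15f}")

reference = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(f"reference value via erf: {reference:.15f}")
```

Comparing with the reference value shows that Simpson’s rule with $n = 10$ and the trapezoidal rule with $n = 72$ both land within the required $10^{-4}$, as the error estimates promised.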

© 2023 Michael Mackey and Richard Smith. These notes are for personal use only and should not be circulated.

Calvin and Hobbes by Bill Watterson


Appendix A

Discussion board and WeBWorK guides

A.1 How to use the Moodle discussion boards


Writing mathematics on the discussion boards
Evidently, written mathematics uses a host of symbols and notation that
is not available in ordinary word processing software. These days, the
majority of mathematicians use LaTeX (or its predecessor TeX) to typeset
mathematical documents. All of the material in this module was typeset
using LaTeX. However, learning LaTeX takes some time, and it was not
designed for direct use on the web. Instead, the more recent MathJax
system enables mathematics to be written directly into web pages using
simple LaTeX expressions.
The Moodle site is supported by MathJax, so users can write simple
mathematical expressions when posting to forums. We encourage users
to use MathJax when posting mathematical queries; the results look good
and clear, and only a minimal knowledge of LaTeX is required.
For the remainder of this section, we cover the basics of how to post sim-
ple mathematical expressions using MathJax. Note that when MathJax
is used on the discussion boards, typically it takes a few seconds for
mathematical expressions to be rendered properly!

1. Enclose mathematical notation using dollar signs


All mathematical notation should be enclosed by a pair of dollar
signs. For example, if you want to post the query ‘I don’t understand
why a⁴ = 3 in that example.’, you should write

I don’t understand why $a^4 = 3$ in that example.

If you want to post a mathematical expression on a separate line,
enclose it using a pair of double dollar signs. For example, writing

$$a^4 = 3$$

renders the expression

a⁴ = 3

on a separate line, as above.

2. Arithmetic and fractions


Expressions such as ‘x + y = 5’ or ‘x − 2y = 3.6’ can be rendered by
writing
$x + y = 5$ or $x - 2y = 3.6$,
respectively. You can write fractions either by using the division
sign or by using the \frac command, together with two pairs of
curly braces { and }. For instance,

$3/5$ and $\frac{3}{5}$

yield 3/5 and ⅗ (a stacked fraction), respectively.

3. Exponents/superscripts and subscripts


The characters ^ and _ are used to render exponents/superscripts
and subscripts, respectively. For example,

$x^5 + 3x^2 + 10 = 0$ and $x_1 + x_3 = 4$

yield x⁵ + 3x² + 10 = 0 and x₁ + x₃ = 4, respectively.

4. Surds
Expressions like √2 and ⁵√11 can be obtained by writing $\sqrt{2}$
or $\sqrt[5]{11}$, respectively (note the use of square brackets
as well as curly ones in the second example).

5. Use pairs of curly braces to nest expressions


For instance,
$x^{\frac{1}{2}} = e^{y^2}$
produces x^(1/2) = e^(y²), with 1/2 and y² set as exponents.

6. Standard functions
The commands $\sin$, $\cos$, $\tan$ and $\log$ produce the
standard functions sin, cos, tan and log. For example,
$\cos(x) = \frac{\sqrt{3}}{2}$

produces cos(x) = √3/2.
7. Summation and integration
Use the commands $\sum$ and $\int$, together with ^ and _ and
curly braces, to write expressions involving summation and integra-
tion. For instance,
$\sum_{k=1}^n k = \frac{1}{2}n(n+1)$
produces Σ_{k=1}^{n} k = ½ n(n + 1), and
$\int_1^2 x^2 dx = \frac{7}{3}$
gives ∫₁² x² dx = 7/3.
8. Greek characters and special symbols
The Greek letters α, β, θ and π, etc., can be expressed using $\alpha$,
$\beta$, $\theta$ and $\pi$, respectively. Symbols such as ℝ and
≈ require $\mathbb{R}$ and $\approx$, respectively.
9. Matrices
Alas, there is no quick way to write down matrices properly using
MathJax, because doing so requires a so-called ‘LaTeX environ-
ment’.
To begin, type \begin{pmatrix}. Then type in the entries of the
first row, separating each one by an ampersand & character. When
you reach the end of the first row, type \\. Add the entries of the
second row as above, and repeat until you have reached the end
of the final row. To finish, type \end{pmatrix} (you do not need to
add \\ at the end of the final row).
Perhaps an example explains all of this best. Typing
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$
will produce the matrix
( 1 2 )
( 3 4 ).


The best way to learn this stuff is through practice and experimenta-
tion. You can do so by using this Live Demo. The examples above can
be adapted and combined in all sorts of ways to produce expressions
of greater complexity (though it should not be necessary to write enor-
mously complicated expressions!).
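
As one illustration of combining the pieces above, the quadratic formula could be posted on its own line as follows. This is a sketch only: the \pm command (for ±) is standard LaTeX but is not among the commands listed in this section, so check the preview before posting.

```latex
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
```

It nests a \sqrt inside a \frac and uses curly braces to group the numerator and denominator.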

Be mindful when using the curly braces { and }. MathJax will com-
plain with error messages, or will not render your expression properly, if
they are missing or are in the wrong place. Every opening { requires a
corresponding closing } (correctly placed).

Finally, we repeat that when MathJax is used, typically it takes a few


seconds for mathematical expressions to be rendered properly. Also, the
system is not perfect. Sometimes it can stop working for reasons that
are inexplicable and beyond our control!

A.2 How to use WeBWorK


Submitting WeBWorK solutions
WeBWorK solutions must be entered entirely online. Many solutions are
numerical in nature, in which case entering a numerical solution usually
suffices. Occasionally, it may be necessary to enter more complicated
types of solutions, such as a polynomial like x 2 + 3. Such solutions
require a syntax that is similar to, but not the same as, MathJax above
(unfortunately, we are not able to do anything about this). Advice on this
syntax is given below.

If a given answer is not an integer (e.g. a fraction or irrational number like
√2), you can enter it either symbolically or by using a decimal expansion.
For example, you can enter the fraction ⅔ either as 2/3 or as 0.666667.
In the latter case, give your answer to at least 6 significant digits so
that WeBWorK does not misinterpret it. The number of digits provided
by most calculator displays should be sufficient.

Below are some examples of types of expressions that may come up in


WeBWorK for this module (and maybe other ones), together with typical
examples and how to enter them.

If you are in any doubt about how WeBWorK is going to interpret your
answer, press the ‘Preview Answers’ button.

Expression type          Example           Enter into WeBWorK as
Fractions                3/4               … or …
Powers, exponents        n⁵                …
                         4ᵏ                …
                         (−7)ⁿ             … not …
Polynomials              2x² + 15x − 4     … or …
Trig functions           cos x             …
Exponential functions    eˣ                …

WeBWorK will also interpret e.g. pi as π, pi/4 as π/4, cos(pi/4) as cos(π/4),
log(2) as log 2, e^4 as e⁴, and so on. See

http://webwork.maa.org/wiki/Available_Functions

for other ways of entering answers. Also, sometimes you have to be


careful when using brackets ( and ). Just as entering things in a cal-
culator in a different order will produce different answers, so WeBWorK
will interpret things in a different order, depending on where you put the
brackets.

Notes on specific MATH10400 WeBWorK questions


This subsection may be added to during the semester, should queries
arise.

Appendix B

Additional material (non-examinable)

B.1 Additional proofs


In this module, procedure is emphasised over theory. In other words, we
focus on presenting methods of doing things, rather than looking at why
those methods work in the first place. Those of you who want to delve
a little more deeply into why the mathematics in this module works are
invited to look at this section.

Missing proofs from Chapter 3


Proof of Theorem 3.22. The derivative of $f \circ g$ at $x$, assuming it exists, is
given by
\[
(f \circ g)'(x) = \lim_{h \to 0} \frac{(f \circ g)(x + h) - (f \circ g)(x)}{h}
               = \lim_{h \to 0} \frac{f(g(x + h)) - f(g(x))}{h}.
\]
Define a new function for $h \neq 0$ by
\[
\varphi(h) =
\begin{cases}
\dfrac{f(g(x + h)) - f(g(x))}{g(x + h) - g(x)} & \text{if } g(x + h) - g(x) \neq 0, \\[2ex]
f'(g(x)) & \text{if } g(x + h) - g(x) = 0,
\end{cases}
\]
and observe that
\[
\frac{f(g(x + h)) - f(g(x))}{h} = \varphi(h) \cdot \frac{g(x + h) - g(x)}{h},
\]
even if $g(x + h) - g(x) = 0$, because in that case both sides are 0. We
know that
\[
\lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = g'(x),
\]
so, using the algebra of limits, the proof will be complete if we can show
that $\lim_{h \to 0} \varphi(h) = f'(g(x))$.
Set $k = g(x + h) - g(x)$. Whenever $k \neq 0$, we have
\[
\varphi(h) = \frac{f(g(x) + k) - f(g(x))}{k},
\]
and $\varphi(h) = f'(g(x))$ otherwise. As $h \to 0$, $g(x + h) \to g(x)$ because $g$
is continuous at $x$, being differentiable there. Consequently, $k \to 0$ as
$h \to 0$. Hence $\lim_{h \to 0} \varphi(h) = f'(g(x))$ as required.

Missing proofs from Chapter 4


Proof of Theorem 4.4. We want to show that any local maximum or local
minimum $c$ of $f$ is a critical point, that is, $f'(c) = 0$, provided of course
that $f$ is differentiable at $c$.
Suppose then that $c$ is a local maximum of $f$ (the proof for a local minimum
is very similar so is not worth repeating). This means that $f(c) \geq f(x)$ for
all $x$ near $c$. Consider the derivative of $f$ at $c$, which is given by
\[
f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c}.
\]
This limit exists because we are told that $f$ is differentiable at $c$, and
existence of the limit means we get the same outcome no matter how $x$
approaches $c$. So let’s suppose $x$ gets closer to $c$ from the right hand
side, that is $x \to c$, but $x > c$ at all times. This means $x - c > 0$. But
once $x$ is close enough to $c$ we know $f(c) \geq f(x)$ and hence $f(x) - f(c) \leq 0$.
Putting these two facts together, we have
\[
\frac{f(x) - f(c)}{x - c} \leq 0,
\]
for $x$ sufficiently close and to the right of $c$. It follows that
\[
f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \leq 0.
\]
Now repeat the argument, this time with $x$ close to, but to the left of $c$,
to find
\[
f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \geq 0.
\]
We conclude that $f'(c) = 0$ as required.

Proof of Theorem 4.18. We consider the function
\[
g(x) := f(x) - \left(\frac{f(b) - f(a)}{b - a}\right)(x - a).
\]
This function is continuous on $[a, b]$ and differentiable on $(a, b)$ because
$f$ is. Notice that $g(a) = g(b)$, therefore we can apply Theorem 4.16 to
get a point $c \in (a, b)$ with $g'(c) = 0$. But differentiating $g$, we see that
\[
g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a},
\]
and so
\[
0 = g'(c) = f'(c) - \frac{f(b) - f(a)}{b - a},
\]
which is just what we want.

Missing proofs from Chapter 5


We give a proof of the equivalence between Theorems 5.15 and 5.16 in
the case of two variables.

Proof. First, we require an analysis of the signs of the eigenvalues of a
general $2 \times 2$ symmetric matrix. Let
\[
A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}
\]
be a general $2 \times 2$ symmetric matrix. According to MATH10390 Definition
5.8 and the succeeding remarks, the eigenvalues of $A$ are the roots of the
characteristic equation of $A$, which is
\[
0 = \det(A - \lambda I_2) = \det\begin{pmatrix} a - \lambda & b \\ b & d - \lambda \end{pmatrix}
  = (a - \lambda)(d - \lambda) - b^2 = \lambda^2 - (a + d)\lambda + ad - b^2.
\]
Using the quadratic formula, the solutions of the characteristic equation
(i.e. the eigenvalues of $A$) are
\[
\lambda = \tfrac{1}{2}\left(a + d \pm \sqrt{(a + d)^2 - 4(ad - b^2)}\right). \tag{B.1}
\]
By considering equation (B.1), we see that both eigenvalues of $A$ are
positive if and only if
\[
a + d > \sqrt{(a + d)^2 - 4(ad - b^2)}.
\]
Upon squaring both sides, we see that this is equivalent to
\[
a + d > 0 \quad\text{and}\quad ad - b^2 > 0.
\]
This, in turn, is equivalent to
\[
a, d > 0 \quad\text{and}\quad ad - b^2 > 0. \tag{B.2}
\]
By similar reasoning, both eigenvalues of $A$ are negative if and only if
\[
a + d < -\sqrt{(a + d)^2 - 4(ad - b^2)},
\]
which is equivalent to
\[
a, d < 0 \quad\text{and}\quad ad - b^2 > 0. \tag{B.3}
\]
Lastly, $A$ has one positive and one negative eigenvalue if and only if
\[
|a + d| < \sqrt{(a + d)^2 - 4(ad - b^2)},
\]
which, upon squaring both sides, is equivalent to
\[
ad - b^2 < 0. \tag{B.4}
\]
This completes our analysis of the signs of the eigenvalues of $A$.

According to Theorem 5.15, we need to check the signs of the eigenvalues
of the Hessian matrix at the critical point. If we let
\[
a = \frac{\partial^2 f}{\partial x^2}, \quad d = \frac{\partial^2 f}{\partial y^2} \quad\text{and}\quad b = \frac{\partial^2 f}{\partial x \partial y},
\]
evaluated at the critical point, then $A$ becomes the corresponding Hessian
matrix. By Theorem 5.15, the critical point is a local minimum if both
eigenvalues are positive, which is the case if and only if equation (B.2)
holds. Observe that with the given values of $a$, $d$ and $b$, (B.2) holds if and
only if Theorem 5.16 1(b) holds. Likewise, by Theorem 5.15, the critical
point is a local maximum if both eigenvalues are negative, which is true
if and only if (B.3) and Theorem 5.16 1(a) hold. Finally, we have a saddle
point if one of the eigenvalues is positive and the other is negative, which
is true if and only if (B.4) and Theorem 5.16 2 hold.
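
The equivalence can be spot-checked numerically. The sketch below is an illustration rather than part of the proof: it computes the eigenvalues from (B.1) and compares their signs with conditions (B.2) to (B.4). The function names and the test matrices are assumptions made here.

```python
import math


def eigenvalues(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]] via formula (B.1).

    The discriminant (a + d)^2 - 4(ad - b^2) is rewritten as the algebraically
    equal (a - d)^2 + 4b^2, which can never be negative in floating point.
    """
    root = math.sqrt((a - d) ** 2 + 4 * b * b)
    return (a + d - root) / 2, (a + d + root) / 2


def classify(a, b, d):
    """Sign pattern of the eigenvalues according to (B.2), (B.3) and (B.4)."""
    det = a * d - b * b
    if det > 0 and a > 0 and d > 0:
        return "both positive"      # condition (B.2)
    if det > 0 and a < 0 and d < 0:
        return "both negative"      # condition (B.3)
    if det < 0:
        return "opposite signs"     # condition (B.4)
    return "inconclusive"           # ad - b^2 = 0: a zero eigenvalue


for entries in [(2, 1, 3), (-2, 1, -3), (1, 2, 1)]:
    print(entries, eigenvalues(*entries), classify(*entries))
```

In each sample case the sign pattern reported by `classify` should match the signs of the two computed eigenvalues.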

B.2 Additional concepts


Inverse trigonometric functions and their derivatives
The inverse trigonometric functions are important and should belong to
a first calculus course, but were cut from this one due to lack of space.
We include a few notes on them in this subsection. The trigonometric
functions sin, cos and tan are not injections when considered on their
natural domains, but they do become injective when we shrink the do-
mains appropriately, and, with an appropriate choice of codomain, they
become bijections.

Figure B.1: sin and cos restricted to $[-\frac{\pi}{2}, \frac{\pi}{2}]$ and $[0, \pi]$, respectively

The functions $\sin : [-\frac{\pi}{2}, \frac{\pi}{2}] \to [-1, 1]$ and $\cos : [0, \pi] \to [-1, 1]$ are
bijections. The respective inverses are $\arcsin : [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$ and
$\arccos : [-1, 1] \to [0, \pi]$. Sometimes these functions are denoted in the
literature by $\sin^{-1}$ and $\cos^{-1}$; however, when one considers the notation
$\sin^n$ and $\cos^n$, $n \in \mathbb{N}$, which denotes sin and cos raised to the $n$th
(positive integer) power, it is clear how this alternative notation may cause
confusion, so we won’t use it here.

Figure B.2: the graphs of arcsin and arccos, respectively


Meanwhile, the restricted function $\tan : (-\frac{\pi}{2}, \frac{\pi}{2}) \to \mathbb{R}$ is a bijection,
and has inverse $\arctan : \mathbb{R} \to (-\frac{\pi}{2}, \frac{\pi}{2})$. In the figures below we have
added vertical and horizontal asymptotes to denote the fact that the left
and right limits of tan at $\frac{\pi}{2}$ and $-\frac{\pi}{2}$ are $\infty$ and $-\infty$, respectively, and
$\lim_{x \to \infty} \arctan x = \frac{\pi}{2}$ and $\lim_{x \to -\infty} \arctan x = -\frac{\pi}{2}$.

Figure B.3: the graphs of tan and arctan, respectively

Some important algebraic relationships exist between the trigonometric
functions and their inverses. For example, if $x \in [-1, 1]$ and $y = \arcsin x$,
then having in mind the identity $\cos^2 y + \sin^2 y = 1$, we see that
\[
\cos(\arcsin x) = \cos y = \sqrt{1 - \sin^2 y} = \sqrt{1 - x^2}. \tag{B.5}
\]
Since $y = \arcsin x \in [-\frac{\pi}{2}, \frac{\pi}{2}]$, we know that $\cos y \geq 0$, therefore taking
the positive square root above will yield the correct result. By very
similar reasoning, given $x \in [-1, 1]$, we have
\[
\sin(\arccos x) = \sqrt{1 - x^2}. \tag{B.6}
\]
If $x \in \mathbb{R}$ and $y = \arctan x$, then since
\[
1 + \tan^2 y = \frac{1}{\cos^2 y},
\]
and again noting that $\cos y$ is non-negative, it follows that
\[
\cos(\arctan x) = \frac{1}{\sqrt{1 + \tan^2(\arctan x)}} = \frac{1}{\sqrt{1 + x^2}}. \tag{B.7}
\]

There are many other such relationships (you may like to try to establish
some), but we focus on the three above because they will help us to
establish the differentiability properties of these inverse trigonometric
functions.
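
Identities (B.5), (B.6) and (B.7) are easy to sanity-check numerically. The following sketch is an illustration only; the sample points and the function name are choices made here. It measures the largest discrepancy between the two sides of each identity.

```python
import math


def max_identity_error(samples=201):
    """Largest gap between the two sides of (B.5), (B.6) and (B.7)."""
    worst = 0.0
    # (B.5) and (B.6) hold for x in [-1, 1].
    for i in range(samples):
        x = -1 + 2 * i / (samples - 1)
        root = math.sqrt(1 - x * x)
        worst = max(worst,
                    abs(math.cos(math.asin(x)) - root),
                    abs(math.sin(math.acos(x)) - root))
    # (B.7) holds for every real x; a few sample values are tested.
    for x in (-50.0, -3.0, -0.5, 0.0, 0.5, 3.0, 50.0):
        worst = max(worst,
                    abs(math.cos(math.atan(x)) - 1 / math.sqrt(1 + x * x)))
    return worst


print(max_identity_error())
```

The printed value should be at the level of floating-point rounding error.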
Concerning the differentiability properties, it will be useful to consider
the following theorem, which we will state without proof.

Theorem B.1. Suppose that $f$ has an inverse $f^{-1}$. If $f$ is differentiable
at $f^{-1}(x)$ and $f'(f^{-1}(x)) \neq 0$, then $f^{-1}$ is differentiable at $x$ and
\[
(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}.
\]

Using this result, we can readily establish the derivatives of arcsin, arccos
and arctan.

Proposition B.2.

1. arcsin is differentiable on $(-1, 1)$ and
\[
\frac{d}{dx} \arcsin x = \frac{1}{\sqrt{1 - x^2}}.
\]

2. arccos is differentiable on $(-1, 1)$ and
\[
\frac{d}{dx} \arccos x = -\frac{1}{\sqrt{1 - x^2}}.
\]

3. arctan is differentiable on $\mathbb{R}$ and
\[
\frac{d}{dx} \arctan x = \frac{1}{1 + x^2}.
\]

Proof. Let $f$ be the sine function restricted to $[-\frac{\pi}{2}, \frac{\pi}{2}]$. Given $x \in (-1, 1)$,
we see that $f^{-1}(x) = \arcsin x \in (-\frac{\pi}{2}, \frac{\pi}{2})$, and $f'(f^{-1}(x)) = \cos(\arcsin x) \neq 0$.
Consequently, using Theorem B.1 and equation (B.5) above, arcsin is
differentiable at $x$ and
\[
\frac{d}{dx} \arcsin x = \frac{1}{\cos(\arcsin x)} = \frac{1}{\sqrt{1 - x^2}}.
\]
We leave the rest of the proof to the reader. Equations (B.6) and (B.7)
help here.
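
The three formulas in Proposition B.2 can be checked numerically with a central difference quotient. This is a sketch under stated assumptions: the step size `h` and the sample points are arbitrary choices made here, and the check illustrates the formulas rather than proving them.

```python
import math


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric quotient (f(x+h) - f(x-h)) / 2h."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Each pair is (function, derivative claimed in Proposition B.2).
formulas = [
    (math.asin, lambda x: 1 / math.sqrt(1 - x * x)),
    (math.acos, lambda x: -1 / math.sqrt(1 - x * x)),
    (math.atan, lambda x: 1 / (1 + x * x)),
]


def max_derivative_error(points=(-0.9, -0.5, 0.0, 0.3, 0.8)):
    """Largest gap between the numerical and the claimed derivatives."""
    return max(abs(central_difference(func, x) - formula(x))
               for func, formula in formulas
               for x in points)


print(max_derivative_error())
```

The gap should be tiny compared with the derivative values themselves.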


Having these derivatives in mind, we can expand slightly the range of
functions we can integrate, not least because now we have antiderivatives
for
\[
\frac{1}{\sqrt{1 - x^2}} \quad\text{and}\quad \frac{1}{1 + x^2}.
\]
This will be covered in further detail below.

A more robust treatment of exp and log


The exponential and natural logarithm functions exp and log were intro-
duced in Example 2.19, but were not actually defined. Later, in Section
3.3, it was stated, without justification, that exp was differentiable and
that the derivative of exp was itself. In Section 3.4 we showed, more or
less, that the derivative of log was 1/x (really, we showed that if log is
differentiable, then its derivative must be 1/x, but not that log is differ-
entiable in the first place). All in all, the treatment of these functions to
date has been a little sketchy. In this subsection we aim to rectify this
deficiency as best we can. It will be necessary to introduce one more
theorem without proof, but nevertheless the foundations will be shored
up quite well.
In fact, we begin by considering the logarithm rather than the exponential
function. We want to define log in such a way that $\log 1 = 0$ and the
derivative of log is $1/x$. One common approach is to do this directly, using
the Fundamental Theorem of Calculus: we define the natural logarithm
for $x > 0$ using a definite integral
\[
\log x = \int_1^x \frac{1}{t}\,dt.
\]
By Theorem 6.14, log is differentiable and $\frac{d}{dx} \log x = 1/x$. Moreover, from
the definition it is clear that $\log 1 = 0$.
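
To see this definition in action, the integral can be approximated numerically and compared with a library logarithm. The sketch below assumes the composite Simpson rule from Chapter 8. The function name and the number of steps are choices made here, not part of the notes.

```python
import math


def log_by_integration(x, steps=1000):
    """Approximate log x as the integral of 1/t from 1 to x (Simpson's rule).

    steps must be even. For 0 < x < 1 the step h is negative, so the result
    is negative, as it should be.
    """
    if x <= 0:
        raise ValueError("log is only defined for x > 0")
    if steps % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of steps")
    h = (x - 1) / steps
    total = 1.0 + 1.0 / x                 # endpoint values 1/1 and 1/x
    for i in range(1, steps):
        weight = 4 if i % 2 else 2        # Simpson weights 4, 2, 4, ..., 4
        total += weight / (1 + i * h)
    return total * h / 3


for x in (0.5, 1, 2, 10):
    print(x, log_by_integration(x), math.log(x))
```

The two columns should agree to several decimal places, and the value at x = 1 is exactly 0 because the interval of integration has zero length.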
Our intention is for exp to be the inverse of log. To this end, we must
check that $\log : (0, \infty) \to \mathbb{R}$ is a bijection. Checking that log is injective
is simple. Indeed, given $0 < x < y$, we observe, using Theorem 4.18, that
there exists $c \in (x, y)$ such that
\[
\frac{\log y - \log x}{y - x} = \frac{1}{c} > 0,
\]
and thus $\log x < \log y$. This tells us that log is strictly increasing. In
particular, $\log x \neq \log y$, so it follows that log is injective. Showing that
log is surjective, that is, its range is $\mathbb{R}$, requires some extra muscle, which
we supply without proof.

Theorem B.3 (The Intermediate Value Theorem). Let $f : [a, b] \to \mathbb{R}$ be
continuous, with $f(a) \leq 0 \leq f(b)$. Then $f$ has a root in $[a, b]$, i.e. there
exists $x \in [a, b]$ such that $f(x) = 0$.

We have an immediate corollary.

Corollary B.4. Let $f : [a, b] \to \mathbb{R}$ be continuous and let $u$ be a number
between $f(a)$ and $f(b)$. Then $f(x) = u$ for some $x \in [a, b]$.

Proof. Assume first that $f(a) \leq u \leq f(b)$. Then apply Theorem B.3 to the
continuous function $g(x) = f(x) - u$, $x \in [a, b]$. If $f(a) \geq u \geq f(b)$, consider
$g(x) = u - f(x)$ instead.

Proposition B.5. The function $\log : (0, \infty) \to \mathbb{R}$ is surjective.

Proof. First of all, by repeating Example 4.21 enough times (strictly
speaking, using mathematical induction), we have that $\log(2^n) = n \log 2$
whenever $n \in \mathbb{N}$. Again, by Example 4.21, given $x > 0$,
\[
0 = \log 1 = \log(x \cdot x^{-1}) = \log x + \log(x^{-1}),
\]
thus $\log(x^{-1}) = -\log x$. Hence $\log(2^n) = n \log 2$ whenever $n \in \mathbb{Z}$.

Now let $u \in \mathbb{R}$. We show that $\log x = u$ for some $x \in (0, \infty)$. From above,
we know that $0 = \log 1 < \log 2$. This implies that we can pick $n \in \mathbb{N}$
large enough such that $|u| \leq n \log 2$, i.e. $-n \log 2 \leq u \leq n \log 2$. Using
Corollary B.4, there exists $x \in [2^{-n}, 2^n]$ such that $\log x = u$.

(Margin note: The Intermediate Value Theorem can be used to show that a host
of functions are surjective. Indeed, this should be the approach in a course
that places more emphasis on mathematical rigour, i.e. firm justification of
every statement.)

So we have established that $\log : (0, \infty) \to \mathbb{R}$ is bijective. According to
Theorem 2.17, log has an inverse function from $\mathbb{R}$ to $(0, \infty)$. We define
the exponential function $\exp : \mathbb{R} \to (0, \infty)$ to be this inverse. We spend the
rest of the subsection establishing the key properties of this function.
Straightaway, we have $\exp 0 = \exp(\log 1) = 1$. Example 4.21 allows us
to establish the fundamental exponentiation identity.

Proposition B.6. We have $\exp(x + y) = \exp x \exp y$ for all $x, y \in \mathbb{R}$.

Proof. Let $x, y \in \mathbb{R}$, and set $a = \exp x$, $b = \exp y > 0$. By Example 4.21,
we have
\[
\log(ab) = \log a + \log b = \log(\exp x) + \log(\exp y) = x + y,
\]
and applying exp to both sides yields
\[
\exp x \exp y = ab = \exp(\log(ab)) = \exp(x + y).
\]
In particular, since $\exp 0 = 1$, given $x \in \mathbb{R}$, we have $\exp x \exp(-x) =
\exp(x - x) = \exp 0 = 1$, meaning that
\[
\exp(-x) = \frac{1}{\exp x}.
\]
To prove the key differentiability property of exp, we use Theorem B.1.

Theorem B.7. The function exp is differentiable on $\mathbb{R}$ and
\[
\frac{d}{dx} \exp x = \exp x.
\]

Proof. Set $f(x) = \log x$, $x > 0$. Given $x \in \mathbb{R}$, $f$ is differentiable at
$f^{-1}(x) = \exp x > 0$, and $f'(\exp x) = 1/\exp x > 0$. Thus, by Theorem B.1,
\[
\frac{d}{dx} \exp x = \frac{1}{f'(f^{-1}(x))} = \frac{1}{f'(\exp x)} = \exp x.
\]
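
The construction of exp as the inverse of log can be imitated numerically. The sketch below follows the proof of Proposition B.5: it brackets the solution of log x = u between $2^{-n}$ and $2^n$ and then halves the bracket repeatedly. It is an illustration only. The bisection loop, the tolerance and the function name are assumptions made here, and the library function `math.log` stands in for log.

```python
import math


def exp_by_bisection(u, tolerance=1e-12):
    """Solve log x = u for x > 0 by bisection, i.e. compute exp u."""
    # Pick n with |u| <= n log 2, so that log(2^-n) <= u <= log(2^n).
    n = 1
    while n * math.log(2) < abs(u):
        n += 1
    low, high = 2.0 ** -n, 2.0 ** n
    # log is strictly increasing, so the solution stays inside [low, high].
    while high - low > tolerance * high:
        mid = (low + high) / 2
        if math.log(mid) < u:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(exp_by_bisection(1.0))    # an approximation to e
print(exp_by_bisection(0.5) * exp_by_bisection(1.0), exp_by_bisection(1.5))
```

The second line of output illustrates Proposition B.6: the two printed values should agree to many decimal places.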

As is clear, in the treatment above we defined the logarithm first and then
constructed the exponential function. It is common to start by defining
exp first and then log as its inverse, and end up with the same functions.
Often, exp is defined in terms of a so-called infinite power series: given
$x \in \mathbb{R}$, we set
\[
\exp x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots
\]
(Margin note: Amazingly, many functions familiar to us such as sin and cos
have similar power series representations.)

In this module, we have not come close to considering how we could
possibly add together infinitely many terms like this and end up with
anything meaningful, which is why this approach was avoided above.
Nevertheless, we do get something meaningful, and it is possible to prove
further that exp, so defined, has two properties:

1. $\exp 0 = 1$, and

2. $\frac{d}{dx} \exp x = \exp x$ for all $x \in \mathbb{R}$.

From this point, we can apply the method of Example 4.21 to the function
$f(x) = \exp(x + y) \exp(-x)$ to establish Proposition B.6 (another example
of the power of the technique in Example 4.21), and from there reason
that $\exp : \mathbb{R} \to (0, \infty)$ is a bijection, define log as its inverse, and find its
derivative using Theorem B.1.
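
Although the notes do not develop infinite series, it is easy to watch the partial sums of the power series settle down. The sketch below is an illustration only; the cut-off of 30 terms and the function name are choices made here. It adds the first few terms and compares the result with a library exponential.

```python
import math


def exp_partial_sum(x, terms=30):
    """Sum of the first `terms` terms x^n / n! of the power series for exp x."""
    total = 0.0
    term = 1.0                  # the n = 0 term, x^0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)     # turn x^n / n! into x^(n+1) / (n+1)!
    return total


for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, exp_partial_sum(x), math.exp(x))
```

For moderate values of x, thirty terms already agree with the library value to many decimal places. Larger |x| would need more terms.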

Exponentiation revisited
Back in Section 1.6, we looked at exponentiation and reached a point
where we could define $x^q$, where $q$ is a rational number (at least for
$x > 0$). No definition of $x^y$, where $y$ is an irrational number, was given.
Above, we gave a more robust treatment of the exponential function exp.
In this subsection, we revisit exponentiation and take advantage of exp
to yield a more robust and expansive definition of $x^y$. As well as this, we
consider again power functions, provide justification for Fact 3.13, and
introduce exponential functions and their derivatives.
We begin with a definition.

Definition B.8. Let $x > 0$ and $y \in \mathbb{R}$. Define
\[
x^y = \exp(y \log x).
\]

Evidently, if $y$ is rational, this definition of exponentiation has the
potential to conflict with the definition given in Section 1.6, so we have to
make sure that this does not happen. To this end, we begin by showing
that the definition obeys Fact 1.9.

Proposition B.9. Given $x > 0$ and $y, z \in \mathbb{R}$,
\[
x^y \cdot x^z = x^{y+z} \quad\text{and}\quad (x^y)^z = x^{yz}.
\]

Proof. We have
\[
x^y \cdot x^z = \exp(y \log x) \exp(z \log x) = \exp((y + z) \log x) = x^{y+z},
\]
by Proposition B.6, and
\[
(x^y)^z = \exp(z \log(x^y)) = \exp(z \log(\exp(y \log x))) = \exp(zy \log x) = x^{yz}.
\]

Now, let $x > 0$ and $n \in \mathbb{N}$. Notice that
\[
\exp(n \log x) = (\exp(\log x))^n = x^n, \tag{B.8}
\]
using (mathematical induction applied to) Proposition B.6. Moreover, if
$n \in \mathbb{Z}$ is negative, then
\[
\exp(n \log x) = \frac{1}{\exp((-n) \log x)} = \frac{1}{x^{-n}} = x^n,
\]
so equation (B.8) holds for all $n \in \mathbb{Z}$. Furthermore, given $n \in \mathbb{N}$, again by
Proposition B.6 we observe that
\[
\left(\exp\left(\tfrac{1}{n} \log x\right)\right)^n = \exp\left(n \cdot \tfrac{1}{n} \log x\right) = \exp(\log x) = x,
\]
and consequently, $\exp\left(\tfrac{1}{n} \log x\right) = \sqrt[n]{x}$. Finally, given a rational number
$q = \frac{m}{n}$ as written just before Definition 1.12, we have
\[
x^q = (\sqrt[n]{x})^m = \left(\exp\left(\tfrac{1}{n} \log x\right)\right)^m = \exp\left(\tfrac{m}{n} \log x\right) = \exp(q \log x).
\]
Therefore, happily, Definition B.8 agrees perfectly with that which has
been established already. What Definition B.8 provides in addition is a
way of taking irrational exponents.
Moreover, we can use it to prove Fact 3.13 (which concerns the derivatives
of power functions), at least for $x > 0$. Power functions are those of the
form $f(x) = x^r$ where the number $r \in \mathbb{R}$ is fixed. Thus, $f(x) = x^2$ and
$g(x) = x^{-7/3}$ and $h(x) = x^{1/3}$, $r(x) = x^{-1} = \frac{1}{x}$ are all power functions. The
power, or index, or exponent is a fixed number $r$, while the base $x$ varies.

Suppose we fix $r \in \mathbb{R}$ and consider the power function $x^r$, $x > 0$. Using
the chain rule and differentiability properties of exp and log,
\[
\frac{d}{dx} x^r = \frac{d}{dx} \exp(r \log x) = \exp(r \log x) \cdot \frac{r}{x} = r \exp((r - 1) \log x) = r x^{r-1}.
\]

We finish by using Definition B.8 to produce exponential functions.
Exponential functions are those of the form $f(x) = a^x = \exp(x \log a)$, where,
this time, the base $a > 0$ is fixed and the exponent $x$ varies. Hence
$f(x) = 2^x$, $f(x) = \left(\frac{7}{2}\right)^x$ and $f(x) = 10^x$ are examples of exponential functions.

Using the chain rule and differentiability properties of exp, we see that
all exponential functions are differentiable:
\[
\frac{d}{dx} a^x = \frac{d}{dx} \exp(x \log a) = \log a \exp(x \log a) = \log a \cdot a^x.
\]
We can see that these functions have the important property that the
derivative is directly proportional to the function. These functions are
used in, for example, the understanding of half lives of radioactive
substances and simple models of cooling.
The number $\exp 1$, usually denoted by $e$, is called Euler’s number. It is
an irrational number having approximate value 2.718281828459, and its
immense importance to mathematics is only marginally outweighed by
that of $\pi$. If we follow the definition of exponential functions as above,
then we get
\[
e^x = \exp(x \log(e)) = \exp(x \log(\exp 1)) = \exp x.
\]
Very often, as in these notes, it is more convenient and easy on the eye
to write $e^x$ instead of $\exp x$. In this case, the derivative is equal to the
original function.
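
Definition B.8 and the two derivative formulas in this subsection can be spot-checked numerically. The sketch below is an illustration under stated assumptions: `power` is a name chosen here for the map $(x, y) \mapsto \exp(y \log x)$, and derivatives are approximated by a central difference quotient with an arbitrary step size.

```python
import math


def power(x, y):
    """x^y for x > 0, computed as exp(y log x) following Definition B.8."""
    return math.exp(y * math.log(x))


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric quotient (f(x+h) - f(x-h)) / 2h."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Agreement with roots: 2^(1/2) is the square root of 2, 8^(1/3) is 2.
print(power(2, 0.5), math.sqrt(2))
print(power(8, 1 / 3))

# Power rule with an irrational exponent r: d/dx x^r = r x^(r-1).
r, x = math.pi, 1.7
print(central_difference(lambda t: power(t, r), x), r * power(x, r - 1))

# Exponential function with base a: d/dx a^x = log(a) a^x.
a, x = 3.5, 0.4
print(central_difference(lambda t: power(a, t), x), math.log(a) * power(a, x))
```

In each printed pair the two numbers should agree to several decimal places.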

Linear approximation of functions of two variables


Recall from Section 4.4 that the line which best approximates a function
$f : \mathbb{R} \to \mathbb{R}$ at the point $(a, f(a))$ is given by the graph of the linear function
\[
L^f_a(x) = f(a) + (x - a) f'(a).
\]
For a function $g : \mathbb{R}^2 \to \mathbb{R}$, the appropriate analogue of tangent line is
actually a tangent plane. For $a = (a_1, a_2) \in \mathbb{R}^2$, the tangent plane of
$g$ at $(a, g(a)) \in \mathbb{R}^3$ is given by the graph of the linear function of two
variables
\[
L^g_a(x_1, x_2) = g(a) + (x - a) \cdot \nabla g(a),
\]
where $x = (x_1, x_2)$. Notice in this formula, $x - a$ and $\nabla g(a)$ are both
vectors in $\mathbb{R}^2$, and the dot $\cdot$ represents the scalar product.

Example B.10. Find the tangent plane to $g(x_1, x_2) = x_1^3 x_2^2$ at $a = (2, 2)$.

Solution. We have
\[
\nabla g(x) = \left(\frac{\partial g}{\partial x_1}, \frac{\partial g}{\partial x_2}\right) = (3 x_1^2 x_2^2, \; 2 x_1^3 x_2),
\]
so that $\nabla g(a) = (48, 32)$. Now,
\begin{align*}
L^g_{(2,2)}(x) &= g(2, 2) + \nabla g(2, 2) \cdot ((x_1, x_2) - (2, 2)) \\
               &= 32 + (48, 32) \cdot (x_1 - 2, x_2 - 2) \\
               &= 32 + 48 x_1 - 96 + 32 x_2 - 64 \\
               &= 48 x_1 + 32 x_2 - 128.
\end{align*}
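
The arithmetic in Example B.10 can be verified with a few lines of code. This sketch is an illustration only: the function names are chosen here, and the gradient is entered by hand from the solution above rather than computed symbolically.

```python
def g(x1, x2):
    """The function g(x1, x2) = x1^3 x2^2 from Example B.10."""
    return x1 ** 3 * x2 ** 2


def grad_g(x1, x2):
    """Gradient of g, taken from the worked solution."""
    return (3 * x1 ** 2 * x2 ** 2, 2 * x1 ** 3 * x2)


def tangent_plane(a, x):
    """Value at x of the linear function g(a) + (x - a) . grad g(a)."""
    g1, g2 = grad_g(*a)
    return g(*a) + g1 * (x[0] - a[0]) + g2 * (x[1] - a[1])


a = (2, 2)
print(grad_g(*a))                                        # gradient at (2, 2)
print(tangent_plane(a, (3, 1)), 48 * 3 + 32 * 1 - 128)   # same plane, two forms
print(g(2.01, 1.99), tangent_plane(a, (2.01, 1.99)))     # close near (2, 2)
```

Near the point (2, 2) the plane and the function take almost the same values, which is the sense in which the tangent plane is the best linear approximation.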

Integration by Partial Fractions


Let’s just mention briefly one other method of integrating. Determining
$\int \frac{1}{x+1}\,dx$ is a simple matter of substituting $u = x + 1$. However, if the
denominator has a higher degree then such a simple substitution is unlikely
to work. For example,
\[
\int \frac{1}{x^2 - 1}\,dx
\]
just doesn’t work out if we substitute $u = x^2 - 1$ (try it!). However, it is
possible to use the method of partial fractions to rewrite the integrand
in a different way, namely
\[
\frac{1}{x^2 - 1} = \frac{1}{2}\left(\frac{1}{x - 1} - \frac{1}{x + 1}\right),
\]
then the integration is straightforward:
\begin{align*}
\int \frac{1}{x^2 - 1}\,dx &= \frac{1}{2}\left(\int \frac{1}{x - 1}\,dx - \int \frac{1}{x + 1}\,dx\right) \\
                           &= \tfrac{1}{2}\left(\log|x - 1| - \log|x + 1|\right) + c.
\end{align*}

The method of partial fractions is an algebraic technique which won’t be


covered in these notes. We mention it here because it can be used to
rewrite many otherwise intractable integrals in a form that enables them
to be solved.
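
Both the partial fraction decomposition and the resulting antiderivative can be checked numerically. The sketch below is an illustration under stated assumptions: the interval [2, 3], the midpoint rule and the function names are all choices made here.

```python
import math


def integrand(x):
    """The original integrand 1 / (x^2 - 1)."""
    return 1 / (x * x - 1)


def partial_fractions(x):
    """The rewritten integrand (1/2)(1/(x - 1) - 1/(x + 1))."""
    return 0.5 * (1 / (x - 1) - 1 / (x + 1))


def antiderivative(x):
    """(1/2)(log|x - 1| - log|x + 1|), with the constant c taken to be 0."""
    return 0.5 * (math.log(abs(x - 1)) - math.log(abs(x + 1)))


def midpoint_rule(func, a, b, n=20000):
    """Approximate the integral of func over [a, b] using n midpoints."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


print(integrand(2.0), partial_fractions(2.0))
print(midpoint_rule(integrand, 2, 3), antiderivative(3) - antiderivative(2))
```

The interval [2, 3] avoids the points x = ±1, where the integrand is undefined.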

Integration using inverse trigonometric functions


Above, we saw the introduction of inverse trigonometric functions and
their derivatives. Let’s take a brief look at how they can help us in the
context of integration. Consider the following problem, where we are
asked to find the area illustrated below, enclosed between the x-axis
and the semicircle in $\mathbb{R}^2$ having centre 0 and radius 1, between $a$ and $b$,
where $-1 \leq a \leq b \leq 1$.

Figure B.4: Finding areas enclosed under a semicircle

Using Pythagoras, the equation for the semicircle is given by $f(x) =
\sqrt{1 - x^2}$, $x \in [-1, 1]$, so our problem boils down to finding the definite
integral
\[
\int_a^b f(x)\,dx.
\]
We require an inverse trigonometric function to evaluate this integral. In
the next example, you will see how a successful integration sometimes
involves the combination of a number of techniques.

Example B.11. Determine the indefinite integral
\[
\int \sqrt{1 - x^2}\,dx,
\]
and thus evaluate the area above.

Solution. We begin by making the substitution $u = \arcsin x$. Using
Proposition B.2, we have
\[
\frac{du}{dx} = \frac{1}{\sqrt{1 - x^2}} = \frac{1}{\sqrt{1 - \sin^2 u}} = \frac{1}{\cos u},
\]
so that $dx = \cos u\,du$ and $\sqrt{1 - x^2} = \cos u$, and so
\[
\int \sqrt{1 - x^2}\,dx = \int \cos^2 u\,du.
\]
At this point, we use the trigonometric identities $\cos^2 u = \frac{1}{2}(1 +
\cos 2u)$, $\sin 2u = 2 \sin u \cos u$ and equation (B.5) to write
\begin{align*}
\int \sqrt{1 - x^2}\,dx = \int \cos^2 u\,du &= \tfrac{1}{2} \int (1 + \cos 2u)\,du \\
  &= \tfrac{1}{2} u + \tfrac{1}{4} \sin 2u + c \\
  &= \tfrac{1}{2} u + \tfrac{1}{2} \sin u \cos u + c \\
  &= \tfrac{1}{2} \arcsin x + \tfrac{1}{2} x \cos(\arcsin x) + c \\
  &= \tfrac{1}{2} \arcsin x + \tfrac{1}{2} x \sqrt{1 - x^2} + c.
\end{align*}
We leave the final part to the reader.
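
As a check on the antiderivative, the area under the semicircle can be computed both from the formula above and by direct numerical integration. The sketch below is an illustration only. The function names, the midpoint rule and the sample interval are choices made here.

```python
import math


def antiderivative(x):
    """(1/2) arcsin x + (1/2) x sqrt(1 - x^2), with the constant c taken as 0."""
    return 0.5 * math.asin(x) + 0.5 * x * math.sqrt(1 - x * x)


def area(a, b):
    """Area under the unit semicircle between a and b, where -1 <= a <= b <= 1."""
    return antiderivative(b) - antiderivative(a)


def midpoint_rule(func, a, b, n=2000):
    """Approximate the integral of func over [a, b] using n midpoints."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def semicircle(x):
    return math.sqrt(1 - x * x)


print(area(-1, 1), math.pi / 2)     # the whole semicircle has area pi / 2
print(area(-0.5, 0.5), midpoint_rule(semicircle, -0.5, 0.5))
```

The first line recovers the familiar area of half a unit disc, and the second compares the formula with a direct numerical approximation.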

The solution above makes perfect sense from a geometric point of view.
Given an angle $\theta$ (in radians), the area of the circular sector having radius
1 and central angle $\theta$ is equal to $\frac{1}{2}\theta$.

Figure B.5: A geometric approach to Example B.11

In the first figure above, to the left, $x = \sin \theta$ and hence $\theta = \arcsin x$.
Therefore the area of the illustrated sector equals $\frac{1}{2} \arcsin x$. Meanwhile,
the area of the shaded triangular region on the right is $\frac{1}{2} x \sqrt{1 - x^2}$.
Combining the two regions gives the area sought above, between 0 and $x$.

An application of integration – Probability


There are many applications of integration: calculating areas we’ve seen,
and this can be generalised to calculating volumes and lengths of curves.
Other applications include solving the differential equations that arise in
physics and finance. We want to briefly describe one of its applications
in probability theory.
Simple probability models of experiments involving a finite number of
possible outcomes involve counting: for example, the probability of
getting an even number when we toss a die is found by counting the number
of ‘successful’ outcomes (3) and the total number of possible outcomes (6)
and taking the ratio $\frac{3}{6}$, giving the probability of getting an even number
as $\frac{1}{2} = 0.5$. The expected value, essentially the long run average, when
we throw the die is obtained by taking each possible outcome,
multiplying it by the probability that this outcome occurs, and then summing.
For a fair die, each outcome has an equal probability $\frac{1}{6}$ of occurring, so we
get an expected value of $1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = 3\frac{1}{2}$.
However, if the experiment involves an interval of possible outcomes then
we can’t calculate probabilities just by counting. For example, suppose
a random number generator produces a number between 0 and 1. There
are an infinite number of possible outcomes. The probability of any one
of them occurring is zero, but we can measure the probability that the
outcome will lie in any given interval by measuring the length of that
interval: the probability that we get a number between $\frac{7}{10}$ and $\frac{8}{10}$,
for example, is $\frac{1}{10}$. The probability that the outcome lies between $a$ and $b$
can actually be written as $\int_a^b 1\,dx$, a definite integral which evaluates to
$b - a$. The constant integrand here, 1, reflects the fact that all outcomes
are equally likely.
But this is not always the case. Perhaps we have a weighted random
number generator with numbers close to zero more likely to occur. This
would be reflected by a different function in the integrand, for example
$\int_a^b (2 - 2x)\,dx$. In this context, the integrand is called a probability density
function. It should be a non-negative function and its integral over the interval
of possible outcomes should equal one (because the probability of some
number being generated equals 1, i.e. it is a certainty). You can check that
this is the case with our weighted random number generator: $2 - 2x \geq 0$
for x ∈ [0, 1], and
\[
\int_0^1 (2 - 2x)\,dx = \left[2x - x^2\right]_0^1 = (2 - 1) - (0 - 0) = 1.
\]

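The probability that this random number generator gives an answer between
$\tfrac{7}{10}$ and $\tfrac{8}{10}$ is

\[
\int_{7/10}^{8/10} (2 - 2x)\,dx = \left[2x - x^2\right]_{7/10}^{8/10}
= \left(\tfrac{8}{5} - \tfrac{64}{100}\right) - \left(\tfrac{7}{5} - \tfrac{49}{100}\right) = \tfrac{1}{20}.
\]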
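These evaluations are easy to check with exact arithmetic. In the sketch below, the helper names `F` and `probability` are our own; `F` is simply the antiderivative 2x − x² of the density 2 − 2x used above.

```python
from fractions import Fraction


def F(x):
    """Antiderivative 2x - x^2 of the density 2 - 2x."""
    return 2 * x - x * x


def probability(a, b):
    """Probability that the weighted generator lands between a and b."""
    return F(b) - F(a)


print(probability(Fraction(0), Fraction(1)))          # 1
print(probability(Fraction(7, 10), Fraction(8, 10)))  # 1/20
print(probability(Fraction(0), Fraction(1, 10)))      # 19/100
```

The last line shows the weighting at work: an interval of the same length placed next to zero is almost four times as likely as the one between 7/10 and 8/10.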

The ‘expected’ outcome, where the probability density function is f(x), is
calculated by integrating xf(x). For example, our ‘fair’ random number
generator has expected value


\[
\int_0^1 x \cdot 1\,dx = \int_0^1 x\,dx = \left[\tfrac{1}{2}x^2\right]_0^1 = \tfrac{1}{2}.
\]

So ‘on average’ we get $\tfrac{1}{2}$ from our fair random number generator.
For our weighted random number generator, the expected value would
be $\int_0^1 x(2 - 2x)\,dx$. Can you work this out?
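If you would like to check your answer numerically, the integral of xf(x) can be estimated by a simple sum. The following is a rough sketch rather than a definitive method: the function names and the choice of the midpoint rule with n = 100000 strips are our own assumptions.

```python
def expected_value(pdf, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule estimate of the integral of x * pdf(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h   # midpoint of the k-th strip
        total += x * pdf(x)
    return h * total


def fair(x):
    """Density of the fair generator on [0, 1]."""
    return 1.0


def weighted(x):
    """Density of the weighted generator on [0, 1]."""
    return 2.0 - 2.0 * x


# Should be very close to the value 1/2 found above.
print(expected_value(fair))
```

Calling `expected_value(weighted)` gives an estimate to compare with your own calculation.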

Derivation of Simpson’s Rule


The trapezoidal rule approximates a function f by a series of straight
lines. We can make a better approximation by using quadratics instead
of straight lines. To do this we again split the interval [a, b] into n
subintervals. This time we will take n = 2m to be even because we will
be considering the subintervals in pairs. Look at the function f on the first
pair of subintervals, $[x_0, x_1]$ and $[x_1, x_2]$. Now, f passes through the three
points $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$, where $x_k = x_0 + kh$, $y_k = f(x_k)$ and $h = \tfrac{1}{n}(b - a)$.
We want to find a quadratic which also passes through these three points:
it should be a good approximation to f and thus the area underneath
the graph of the quadratic should be a good approximation to that of f.
Suppose that the quadratic is $ax^2 + bx + c$ (the coefficients a and b here
are not the endpoints of the interval).
To simplify the calculations let us suppose that $x_1 = 0$. This is a valid
assumption because in moving the quadratic left or right to ensure that
$x_1 = 0$ we do not change the area beneath it.

Figure B.6: moving the quadratic does not change the area underneath

The area beneath the graph of the quadratic is


\begin{align}
\int_{-h}^{h} (ax^2 + bx + c)\,dx &= \left[\tfrac{1}{3}ax^3 + \tfrac{1}{2}bx^2 + cx\right]_{-h}^{h} \notag \\
&= \tfrac{2}{3}ah^3 + 2ch \notag \\
&= \tfrac{1}{3}h(2ah^2 + 6c). \tag{B.9}
\end{align}

Now use the fact that the quadratic passes through $(-h, y_0)$, $(0, y_1)$ and
$(h, y_2)$ to get

\begin{align}
y_0 &= ah^2 - bh + c \tag{B.10} \\
y_1 &= c \tag{B.11} \\
y_2 &= ah^2 + bh + c. \tag{B.12}
\end{align}

Adding (B.10) to (B.12) gives $y_0 + y_2 = 2ah^2 + 2c$. Now add four times
(B.11) to get $y_0 + 4y_1 + y_2 = 2ah^2 + 6c$. Compare this to (B.9). We see
that the area under the quadratic is $\tfrac{1}{3}h(y_0 + 4y_1 + y_2)$.
Now this is only an approximation to the area over the first two subintervals
$[x_0, x_1]$ and $[x_1, x_2]$. The area over the next two will be $\tfrac{1}{3}h(y_2 + 4y_3 + y_4)$.
We keep doing this up to the last pair of subintervals. Therefore the total
area $S_n$ over the n = 2m subintervals will be

\[
\begin{aligned}
S_n &= \tfrac{1}{3}h(y_0 + 4y_1 + y_2 + y_2 + 4y_3 + y_4 + y_4 + 4y_5 + \cdots + 4y_{2m-1} + y_{2m}) \\
&= \tfrac{1}{3}h(y_0 + 4y_1 + 2y_2 + 4y_3 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n).
\end{aligned}
\]

This gives us Simpson’s rule.
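The final formula translates directly into a short program. The sketch below is one possible implementation, not the only one: the function name `simpson` and its argument order are our own choices, and here `a` and `b` denote the endpoints of the interval, not the coefficients of the quadratic.

```python
import math


def simpson(f, a, b, n):
    """Approximate the integral of f over [a, b] using n (even) subintervals."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    y = [f(a + k * h) for k in range(n + 1)]   # y_0, y_1, ..., y_n
    total = y[0] + y[n]                        # end points have weight 1
    total += 4 * sum(y[1:n:2])                 # y_1, y_3, ..., y_{n-1}
    total += 2 * sum(y[2:n:2])                 # y_2, y_4, ..., y_{n-2}
    return h * total / 3


# A quadratic is reproduced exactly (up to rounding), even with n = 2.
print(simpson(lambda x: x * x, 0.0, 3.0, 2))

# The integral of sin over [0, pi] is 2; ten subintervals already get close.
print(simpson(math.sin, 0.0, math.pi, 10))
```

The first call illustrates the derivation: since the rule is built from the exact area under a quadratic, applying it to a quadratic gives the true area.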
