Introduction To Mathematical Analysis
John E. Hutchinson
1994
Revised by
Richard J. Loy
1995/6/7
Department of Mathematics
School of Mathematical Sciences
ANU
Pure mathematics have one peculiar advantage, that they occa-
sion no disputes among wrangling disputants, as in other branches
of knowledge; and the reason is, because the definitions of the
terms are premised, and everybody that reads a proposition has
the same idea of every part of it. Hence it is easy to put an end
to all mathematical controversies by shewing, either that our ad-
versary has not stuck to his definitions, or has not laid down true
premises, or else that he has drawn false conclusions from true
principles; and in case we are able to do neither of these, we must
acknowledge the truth of what he has proved . . .
The mathematics, he [Isaac Barrow] observes, effectually exercise,
not vainly delude, nor vexatiously torment, studious minds with
obscure subtilties; but plainly demonstrate everything within their
reach, draw certain conclusions, instruct by profitable rules, and
unfold pleasant questions. These disciplines likewise enure and
corroborate the mind to a constant diligence in study; they wholly
deliver us from credulous simplicity; most strongly fortify us
against the vanity of scepticism, effectually refrain us from a
rash presumption, most easily incline us to a due assent, per-
fectly subject us to the government of right reason. While the
mind is abstracted and elevated from sensible matter, distinctly
views pure forms, conceives the beauty of ideas and investigates
the harmony of proportion; the manners themselves are sensibly
corrected and improved, the affections composed and rectified,
the fancy calmed and settled, and the understanding raised and
excited to more divine contemplations.
1 Introduction
1.1 Preliminary Remarks
1.2 History of Calculus
1.3 Why “Prove” Theorems?
1.4 “Summary and Problems” Book
1.5 The approach to be used
1.6 Acknowledgments
4 Set Theory
4.1 Introduction
4.2 Russell’s Paradox
4.3 Union, Intersection and Difference of Sets
4.4 Functions
4.4.1 Functions as Sets
4.4.2 Notation Associated with Functions
4.4.3 Elementary Properties of Functions
4.5 Equivalence of Sets
4.6 Denumerable Sets
4.7 Uncountable Sets
4.8 Cardinal Numbers
4.9 More Properties of Sets of Cardinality c and d
4.10 *Further Remarks
4.10.1 The Axiom of Choice
4.10.2 Other Cardinal Numbers
4.10.3 The Continuum Hypothesis
4.10.4 Cardinal Arithmetic
4.10.5 Ordinal Numbers
6 Metric Spaces
6.1 Basic Metric Notions in Rⁿ
8 Cauchy Sequences
8.1 Cauchy Sequences
8.2 Complete Metric Spaces
8.3 Contraction Mapping Theorem
11 Continuity
11.1 Continuity at a Point
11.2 Basic Consequences of Continuity
11.3 Lipschitz and Hölder Functions
11.4 Another Definition of Continuity
11.5 Continuous Functions on Compact Sets
14 Fractals
14.1 Examples
14.1.1 Koch Curve
14.1.2 Cantor Set
14.1.3 Sierpinski Sponge
14.2 Fractals and Similitudes
14.3 Dimension of Fractals
14.4 Fractals as Fixed Points
14.5 *The Metric Space of Compact Subsets of Rⁿ
14.6 *Random Fractals
15 Compactness
15.1 Definitions
15.2 Compactness and Sequential Compactness
15.3 *Lebesgue covering theorem
15.4 Consequences of Compactness
15.5 A Criterion for Compactness
15.6 Equicontinuous Families of Functions
15.7 Arzela-Ascoli Theorem
15.8 Peano’s Existence Theorem
16 Connectedness
16.1 Introduction
16.2 Connected Sets
16.3 Connectedness in R
16.4 Path Connected Sets
16.5 Basic Results
Bibliography
Chapter 1
Introduction
where each assumption is used in the proof of the theorem. Think of various
interesting examples of the theorem.
The dependencies of the various chapters are shown in the following diagram:
[Chapter dependency diagram: the main sequence runs through Chapters 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, with Chapters 4, 5, 14, 15, 16, 17, 18, 19 branching off it.]
1.6 Acknowledgments
Thanks are due to Paulius Stepanas and other members of the 1992 and
1993 B21H and B30H classes, and Simon Stephenson, for suggestions and
corrections from the previous versions of these notes. Thanks also to Paulius
for writing up a first version of Chapter 16, to Jane James and Simon for
some assistance with the typing, and to Maciej Kocan for supplying problems
for some of the later chapters.
The diagram of the Sierpinski Sponge is from [Ma].
Chapter 2
Elementary Logic
1. (x + y)² = x² + 2xy + y².
2. 3x² + 2x − 1 = 0.
The precise meaning should always be clear from context; if it is not then
more information should be provided.
Statement (2) probably refers to a particular real number x; although it
is possibly an abbreviation for the (false) statement
Again, the precise meaning should be clear from the context in which the
statement occurs.
Statement (3) is known as Fermat’s Last “Theorem”.2 An equivalent
statement is
or as
In the previous line, the symbols u and v are sometimes called dummy vari-
ables. Note, however, that the statement
2.2 Quantifiers
The expression for all (or for every, or for each, or (sometimes) for any), is
called the universal quantifier and is often written ∀.
The following all have the same meaning (and are true)
It is implicit in the above that when we say “for all x” or ∀x, we really
mean for all real numbers x, etc. In other words, the quantifier ∀ “ranges
over” the real numbers. More generally, we always quantify over some set of
objects, and often make the abuse of language of suppressing this set when
it is clear from context what is intended. If it is not clear from context, we
can include the set over which the quantifier ranges. Thus we could write
for all x ∈ R and for all y ∈ R, (x + y)² = x² + 2xy + y²,
which we abbreviate to
∀x ∈ R ∀y ∈ R ((x + y)² = x² + 2xy + y²).
The expression there exists (or there is, or there is at least one, or there
are some), is called the existential quantifier and is often written ∃.
The following statements all have the same meaning (and are true)
and
there exists y such that for all x, x < y,
respectively. Here (as usual for us) the quantifiers are intended to range over
the real numbers. Note once again that the meaning of these statements is
unchanged if we replace x and y by, say, u and v.4
is false.
We have seen that reversing the order of consecutive existential and uni-
versal quantifiers can change the meaning of a statement. However, changing
the order of consecutive existential quantifiers, or of consecutive universal
quantifiers, does not change the meaning. In particular, if P (x, y) is a state-
ment whose meaning possibly depends on x and y, then
∀x∀y∃z(x² + y³ = z),
and
∀y∀x∃z(x² + y³ = z),
both have the same meaning. Similarly,
2.4 Connectives
The logical connectives and the logical quantifiers (already discussed) are used
to build new statements from old. The rigorous study of these concepts falls
within the study of Mathematical Logic or the Foundations of Mathematics.
We now discuss the logical connectives.
2.4.1 Not
If p is a statement, then the negation of p is denoted by
¬p (2.3)
Negation of Quantifiers
1. The negation of ∀xP (x), i.e. the statement ¬(∀xP (x)), is equivalent to
∃x(¬P (x)). Likewise, the negation of ∀x ∈ R P (x), i.e. the statement
¬(∀x ∈ R P (x)), is equivalent to ∃x ∈ R (¬P (x)); etc.
2. The negation of ∃xP (x), i.e. the statement ¬(∃xP (x)), is equivalent to
∀x(¬P (x)). Likewise, the negation of ∃x ∈ R P (x), i.e. the statement
¬(∃x ∈ R P (x)), is equivalent to ∀x ∈ R (¬P (x)).
Thus the negation of
∀x∃yP (x, y)
is equivalent to
∃x∀y¬P (x, y).
Also, the negation of
∃x∀yP (x, y)
is equivalent to
∀x∃y¬P (x, y).
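These negation rules can be tested exhaustively when the quantifiers range over a finite set, since there ∀ corresponds to Python’s `all` and ∃ to `any`. A small sketch (not part of the notes; the predicate P is just an arbitrary example):

```python
# Checking the quantifier-negation rules over a finite domain.
# Over a finite set, "for all x" is all(...) and "there exists x" is any(...),
# so the equivalences can be tested exhaustively.

domain = range(-5, 6)

def P(x):
    return x > 2   # an arbitrary example predicate

# not (forall x P(x))  <=>  exists x (not P(x))
lhs = not all(P(x) for x in domain)
rhs = any(not P(x) for x in domain)
assert lhs == rhs

# not (exists x P(x))  <=>  forall x (not P(x))
lhs2 = not any(P(x) for x in domain)
rhs2 = all(not P(x) for x in domain)
assert lhs2 == rhs2
```

Over the real numbers no such exhaustive check is possible, which is exactly why the symbolic rules above matter.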
Similar rules apply if the quantifiers range over specified sets; see the
following Examples.
Examples
1 Suppose a is a fixed real number. The negation of
∃x ∈ R (x > a)
is equivalent to
∀x ∈ R ¬(x > a).
From the properties of inequalities, this is equivalent to
∀x ∈ R (x ≤ a).
∃y∀x(x ≤ y).
if ε > 0 then there is a natural number n such that 0 < 1/n < ε.
The statement 0 < 1/n was only added for emphasis, and follows from the
fact any natural number is positive and so its reciprocal is positive. Thus
the Corollary is equivalent to
The Corollary was proved by assuming it was false, i.e. by assuming the
negation of (2.4), and obtaining a contradiction. Let us go through essentially
the same argument again, but this time using quantifiers. This will take a
little longer, but it enables us to see the logical structure of the proof more
clearly.
From the properties of inequalities, and the fact that ε and n range over certain
sets of positive numbers, we have
∃ε > 0 ∀n ∈ N (n ≤ 1/ε).
But this implies that the set of natural numbers is bounded above by 1/ε,
and so is false by Theorem 3.2.10.
Thus we have obtained a contradiction from assuming the negation of (2.4),
and hence (2.4) is true.
4 The negation of
is
Not (every differentiable function is continuous), i.e. ¬(∀f ∈ D C(f )),
and is equivalent to
or
2.4.2 And
If p and q are statements, then the conjunction of p and q is denoted by
p∧q (2.6)
2.4.3 Or
If p and q are statements, then the disjunction of p and q is denoted by
p∨q (2.7)
1 = 1 or 1 = 2
is true. This may seem different from common usage, but consider the fol-
lowing true statement
1 = 1 or I am a pink elephant.
2.4.4 Implies
This often causes some confusion in mathematics. If p and q are statements,
then the statement
p⇒q (2.8)
is read as “p implies q” or “if p then q”.
Alternatively, one sometimes says “q if p”, “p only if q”, “p” is a sufficient
condition for “q”, or “q” is a necessary condition for “p”. But we will not
usually use these wordings.
If p is true and q is false then p ⇒ q is false, and in all other cases p ⇒ q
is true.
This may seem a bit strange at first, but it is essentially unavoidable.
Consider for example the true statement
Since in general we want to be able to say that a statement of the form ∀xP (x)
is true if and only if the statement P (x) is true for every (real number) x,
this leads us to saying that the statement
x > 2 ⇒ x > 1
is true for every real number x. This requires that
3 > 2 ⇒ 3 > 1,
1.5 > 2 ⇒ 1.5 > 1,
.5 > 2 ⇒ .5 > 1,
all be true statements. Thus we have examples where p is true and q is true,
where p is false and q is true, and where p is false and q is false; and in all
three cases p ⇒ q is true.
Next, consider the false statement
and
The statement p ⇒ q is equivalent to ¬(p ∧ ¬q), i.e. not(p and not q).
This may seem confusing, and is perhaps best understood by considering the
four different cases corresponding to the truth and/or falsity of p and q.
It follows that the negation of ∀x (P (x) ⇒ Q(x)) is equivalent to the state-
ment ∃x ¬ (P (x) ⇒ Q(x)) which in turn is equivalent to ∃x (P (x) ∧ ¬Q(x)).
As a final remark, note that the statement all elephants are pink can be
written in the form ∀x (E(x) ⇒ P (x)), where E(x) means x is an elephant
and P (x) means x is pink. Previously we wrote it in the form ∀x ∈ E P (x),
where here E is the set of elephants, rather than the property of being
an elephant.
2.4.5 Iff
If p and q are statements, then the statement
p⇔q (2.9)
Remarks
2.6 Proofs
A mathematical proof of a theorem is a sequence of assertions (mathemati-
cal statements), of which the last assertion is the desired conclusion. Each
assertion
The word “obvious” is a problem. At first you should be very careful to write
out proofs in full detail. Otherwise you will probably write out things which
you think are obvious, but in fact are wrong. After practice, your proofs will
become shorter.
A common mistake of beginning students is to write out the very easy
points in much detail, but to quickly jump over the difficult points in the
proof.
The problem of knowing “how much detail is required” is one which will
become clearer with (much) practice.
In the next few subsections we will discuss various ways of proving math-
ematical statements.
Besides Theorem, we will also use the words Proposition, Lemma and
Corollary. The distinction between these is not a precise one. Generally,
“Theorems” are considered to be more significant or important than “Propo-
sitions”. “Lemmas” are usually not considered to be important in their own
right, but are intermediate results used to prove a later Theorem. “Corollar-
ies” are fairly easy consequences of Theorems.
16
Theorem 2.6.1 For every integer n there exists an integer m such that m > n.
Proof:
and so
2p² = (n∗)².
Hence (n∗)² is even, and so n∗ is even.
Since both m∗ and n∗ are even, they must have the common factor 2,
which is a contradiction. So m/n ≠ √2.
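The argument shows that no fraction squares to exactly 2, although rationals come arbitrarily close. As an illustration (not part of the notes), both points can be confirmed with exact rational arithmetic; the fraction 1393/985 used below is one of the classical convergents of √2:

```python
from fractions import Fraction

# Exhaustive check, for small numerators and denominators, that no
# fraction m/n squares to exactly 2 -- consistent with the proof above.
found = [
    Fraction(m, n)
    for n in range(1, 200)
    for m in range(1, 400)
    if Fraction(m, n) ** 2 == 2
]
assert found == []

# Rational numbers can still come arbitrarily close to sqrt(2):
approx = Fraction(1393, 985)          # a convergent of sqrt(2)
assert abs(approx ** 2 - 2) < Fraction(1, 900000)
```

Of course a finite search proves nothing by itself; the proof above is what rules out every fraction at once.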
3.1 Introduction
The Real Number System satisfies certain axioms, from which its other prop-
erties can be deduced. There are various slightly different, but equivalent,
formulations.
Definition 3.1.1 The Real Number System is a set¹ of objects called Real
Numbers and denoted by R together with two binary operations² called ad-
dition and multiplication and denoted by + and × respectively (we usually
write xy for x × y), a binary relation called less than and denoted by <,
and two distinct elements called zero and unity and denoted by 0 and 1
respectively.
The axioms satisfied by these fall into three groups and are detailed in the
following sections.
A1 a + b = b + a
A2 (a + b) + c = a + (b + c)
A3 a + 0 = 0 + a = a
¹ We discuss sets in the next Chapter.
² To say + is a binary operation means that + is a function such that + : R × R → R.
We write a + b instead of +(a, b). Similar remarks apply to ×.
A5 a × b = b × a
A6 (a × b) × c = a × (b × c)
A7 a × 1 = 1 × a = a, and 1 ≠ 0.
Algebraic Axioms It turns out that one can prove all the algebraic prop-
erties of the real numbers from properties A1–A9 of addition and multipli-
cation. We will do some of this in the next subsection.
We call A1–A9 the Algebraic Axioms for the real number system.
1. a = a
2. a = b ⇒ b = a³
3. a = b and⁴ b = c ⇒ a = c⁵
a − b = a + (−b).
Similarly, if b 6= 0 define
a ÷ b = a/b = ab⁻¹.
³ By ⇒ we mean “implies”. Let P and Q be two statements, then “P ⇒ Q” means “P
implies Q”; or equivalently “if P then Q”.
⁴ We sometimes write “∧” for “and”.
⁵ Whenever we write “P ∧ Q ⇒ R”, or “P and Q ⇒ R”, the convention is that we mean
“(P ∧ Q) ⇒ R”, not “P ∧ (Q ⇒ R)”.
Some consequences of axioms A1–A9 are as follows. The proofs are given
in the AH1 notes.
1. a0 = 0
2. −(−a) = a
3. (c⁻¹)⁻¹ = c
4. (−1)a = −a
5. a(−b) = −(ab) = (−a)b
6. (−a) + (−b) = −(a + b)
7. (−a)(−b) = ab
8. (a/c)(b/d) = (ab)/(cd)
9. (a/c) + (b/d) = (ad + bc)/(cd)
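Since these identities hold in any field, they can be spot-checked with exact rational arithmetic. A sketch (not part of the notes; the particular values of a, b, c, d are arbitrary nonzero choices):

```python
from fractions import Fraction

# Spot-checking consequences 7, 8 and 9 with exact rational arithmetic.
a, b, c, d = Fraction(3, 7), Fraction(-2, 5), Fraction(4, 9), Fraction(11, 3)

# 7. (-a)(-b) = ab
assert (-a) * (-b) == a * b

# 8. (a/c)(b/d) = (ab)/(cd)
assert (a / c) * (b / d) == (a * b) / (c * d)

# 9. (a/c) + (b/d) = (ad + bc)/(cd)
assert (a / c) + (b / d) == (a * d + b * c) / (c * d)
```

A numerical check is no substitute for the derivations from A1–A9, but it is a useful way to catch a mis-stated identity.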
Remark Henceforth (unless we say otherwise) we will assume all the usual
properties of addition, multiplication, subtraction and division. In particular,
we can solve simultaneous linear equations. We will also assume standard
definitions including x² = x × x, x³ = x × x × x, x⁻² = (x⁻¹)², etc.
Order Axioms There is a subset⁶ P of the set of real numbers, called the
set of positive numbers, such that:
A10 For any real number a, exactly one of the following holds:
a = 0 or a ∈ P or −a ∈ P
The following important properties can be deduced from the axioms; but we
will not pause to do so.
2. |a + b| ≤ |a| + |b|
3. ||a| − |b|| ≤ |a − b|
with similar definitions for [a, b), (a, b], (−∞, a], (−∞, a). Note that ∞ is not
a real number and there is no interval of the form (a, ∞].
We only use the symbol ∞ as part of an expression which, when written
out in full, does not refer to ∞.
[Diagram: points a, c, b on the real line, with A the part to the left of c and B the part to the right.]
Note that every number < c belongs to A and every number > c belongs
to B. Moreover, either c ∈ A or c ∈ B by 2. Hence if c ∈ A then A = (−∞, c]
and B = (c, ∞); while if c ∈ B then A = (−∞, c) and B = [c, ∞).
The pair of sets {A, B} is called a Dedekind Cut.
The intuitive idea of A12 is that the Completeness Axiom says there are
no “holes” in the real numbers.
We next use Axiom A12 to prove the existence of √2, i.e. the existence
of a number c such that c² = 2.
Proof: Let
It follows (from the algebraic and order properties of the real numbers; i.e.
A1–A11) that every real number x is in exactly one of A or B, and hence
that the two hypotheses of A12 are satisfied.
By A12 there is a unique real number c such that
[Diagram: points 0, c, b on the real line, with A to the left of c and B to the right.]
From the Note following A12, either c ∈ A or c ∈ B.
If c ∈ A then c < 0 or (c > 0 and c² < 2). But then by taking ε > 0
sufficiently small, we would also have c + ε ∈ A (from the definition
of A), which contradicts conclusion b in A12.
Hence c ∈ B, i.e. c > 0 and c² ≥ 2.
If c² > 2, then by choosing ε > 0 sufficiently small we would also have
c − ε ∈ B (from the definition of B), which contradicts a in A12.
Hence c² = 2.
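The proof is non-constructive, but the same two sets A and B drive the familiar bisection method: repeatedly halve an interval whose left endpoint lies in A and whose right endpoint lies in B. A sketch (not part of the notes; floating point stands in for the reals):

```python
# A computational echo of the cut argument: keep the left end of an
# interval in A (square less than 2) and the right end in B (positive
# with square at least 2), and halve the interval repeatedly.
lo, hi = 1.0, 2.0          # lo is in A, hi is in B
for _ in range(50):
    mid = (lo + hi) / 2
    if mid * mid < 2:
        lo = mid           # mid is in A
    else:
        hi = mid           # mid is in B
assert abs(lo * lo - 2) < 1e-12
```

The dividing number c of Axiom A12 is what the two endpoints squeeze down onto; the axiom is exactly what guarantees such a number exists.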
We write
b = l.u.b. S = sup S
One similarly defines lower bound and greatest lower bound (or g.l.b. or infi-
mum or inf ) by replacing “≤” by “≥”.
¹¹ And hence a negative real number c such that c² = 2; just replace c by −c.
Note that if the l.u.b. or g.l.b. exists it is unique, since if b₁ and b₂ are
both l.u.b.’s then b₁ ≤ b₂ and b₂ ≤ b₁, and so b₁ = b₂.
Examples
Proof: Let
T = {−x : x ∈ S}.
Then it follows that a is a lower bound for S iff −a is an upper bound for T ;
and b is a g.l.b. for S iff −b is a l.u.b. for T .
[Diagram: the set S to the right of its lower bound b, and its reflection T = {−x : x ∈ S} to the left of the upper bound −b.]
We will usually use the version Axiom A12′ rather than Axiom A12; and
we will usually refer to either as the Completeness Axiom. Whenever we use
the Completeness Axiom in our future developments, we will explicitly refer
to it. The Completeness Axiom is essential in proving such results as the
Intermediate Value Theorem.¹⁴
Exercise: Give an example to show that the Intermediate Value Theorem
does not hold in the “world of rational numbers”.
The structure consisting of the set R, together with the operations + and
× and the set of positive numbers P , is uniquely characterised by Axioms 1–
12, in the sense that any two structures satisfying the axioms are essentially
¹⁴ If a continuous real valued function f : [a, b] → R satisfies f (a) < 0 < f (b), then
f (c) = 0 for some c ∈ (a, b).
the same. More precisely, the two systems are isomorphic, see Chapter IV-5
of Birkhoff and MacLane or Chapter 29 of Spivak.
N = {1, 1 + 1, 1 + 1 + 1, . . .}.
n ∈ N implies n ≤ b. (3.1)
It follows that
m ∈ N implies m + 1 ≤ b, (3.2)
since if m ∈ N then m + 1 ∈ N, and so we can now apply (3.1) with n there
replaced by m + 1.
But from (3.2) (and the properties of subtraction and of <) it follows that
m ∈ N implies m ≤ b − 1.
Corollary 3.2.11 If ε > 0 then there is a natural number n such that 0 <
1/n < ε.¹⁶
Proof: Assume there is no natural number n such that 0 < 1/n < ε. Then
for every n ∈ N it follows that 1/n ≥ ε and hence n ≤ 1/ε. Hence 1/ε is an
upper bound for N, contradicting the previous Theorem.
Hence there is a natural number n such that 0 < 1/n < ε.
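The proof is by contradiction, but an explicit witness is also easy: n = ⌊1/ε⌋ + 1 always satisfies 1/n < ε. A sketch (not part of the notes; the name archimedean_n is ours, and floating point stands in for the reals):

```python
import math

# Given eps > 0, one explicit choice of n with 0 < 1/n < eps is
# n = floor(1/eps) + 1, since then n > 1/eps.
def archimedean_n(eps):
    n = math.floor(1 / eps) + 1
    assert 0 < 1 / n < eps
    return n

assert archimedean_n(0.3) == 4     # 1/4 = 0.25 < 0.3
assert archimedean_n(0.25) == 5    # 1/5 = 0.2  < 0.25
```

The smaller ε is, the larger n must be, which matches the intuition that the Corollary is most “interesting” for small ε.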
¹⁵ Logical Point: Our intention is to obtain a contradiction from this assumption, and
hence to deduce that N is not bounded above.
¹⁶ We usually use ε and δ to denote numbers that we think of as being small and positive.
Note, however, that the result is true for any real number ε; but it is more “interesting”
if ε is small.
We can now prove that between any two real numbers there is a rational
number.
Theorem 3.2.12 For any two reals x and y, if x < y then there exists a
rational number r such that x < r < y.
Proof: (a) First suppose y − x > 1. Then there is an integer k such that
x < k < y.
To see this, let l be the least upper bound of the set S of all integers j
such that j ≤ x. It follows that l itself is a member of S, and so in particular
is an integer.¹⁷ Hence l + 1 > x, since otherwise l + 1 ≤ x, i.e. l + 1 ∈ S,
contradicting the fact that l = lub S.
Moreover, since l ≤ x and y − x > 1, it follows from the properties of < that
l + 1 < y.
[Diagram: points l, x, l + 1, y in order on the real line.]
Thus if k = l + 1 then x < k < y.
(b) Now just assume x < y.
By the previous Corollary choose a natural number n such that 1/n < y − x.
Hence ny − nx > 1 and so by (a) there is an integer k such that nx < k < ny.
Hence x < k/n < y, as required.
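The two steps of the proof translate directly into a construction: pick n with 1/n < y − x, then let k be the least integer exceeding nx. A sketch (not part of the notes; the name rational_between is ours, and floats stand in for the reals):

```python
from fractions import Fraction
import math

# Following the proof: choose n with 1/n < y - x, then an integer k
# with n*x < k < n*y; the rational k/n lies strictly between x and y.
def rational_between(x, y):
    n = math.floor(1 / (y - x)) + 1      # so 1/n < y - x
    k = math.floor(n * x) + 1            # the least integer > n*x
    r = Fraction(k, n)
    assert x < r < y
    return r

r = rational_between(math.sqrt(2), math.sqrt(3))
assert math.sqrt(2) < r < math.sqrt(3)
```

Here k < ny holds because n(y − x) > 1 forces ny > nx + 1 ≥ k, exactly as in step (a) of the proof.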
Theorem 3.2.13 For any two reals x and y, if x < y then there exists an
irrational number r such that x < r < y.
Proof: First suppose a and b are rational and a < b.
Note that √2/2 is irrational (why?) and √2/2 < 1. Hence
a < a + (b − a)√2/2 < b and moreover a + (b − a)√2/2 is irrational¹⁸.
To prove the result for general x < y, use the previous theorem twice to
first choose a rational number a and then another rational number b, such
that x < a < b < y.
By the first paragraph there is an irrational number r such that x < a < r <
b < y.
¹⁷ The least upper bound b of any set S of integers which is bounded above, must itself
be a member of S. This is fairly clear, using the fact that members of S must be at least
the fixed distance 1 apart.
More precisely, consider the interval [b − 1/2, b]. Since the distance between any two
integers is ≥ 1, there can be at most one member of S in this interval. If there is no
member of S in [b − 1/2, b] then b − 1/2 would also be an upper bound for S, contradicting
the fact b is the least upper bound. Hence there is exactly one member s of S in [b−1/2, b];
it follows s = b as otherwise s would be an upper bound for S which is < b; contradiction.
Note that this argument works for any set S whose members are all at least a fixed
positive distance d > 0 apart. Why?
¹⁸ Let r = a + (b − a)√2/2. Then √2 = 2(r − a)/(b − a). So if r were rational then √2
would also be rational, which we know is not the case.
Corollary 3.2.14 For any real number x, and any positive number ε, there
is a rational (resp. irrational) number r (resp. s) such that 0 < |r − x| < ε
(resp. 0 < |s − x| < ε).
Chapter 4
Set Theory
4.1 Introduction
The notion of a set is fundamental to mathematics.
A set is, informally speaking, a collection of objects. We cannot use
this as a definition however, as we then need to define what we mean by a
collection.
The notion of a set is a basic or primitive one, as is the notion of membership
∈; these are not usually defined in terms of other notions. Synonyms for set
are collection, class¹ and family.
It is possible to write down axioms for the theory of sets. To do this
properly, one also needs to formalise the logic involved. We will not follow
such an axiomatic approach to set theory, but will instead proceed in a more
informal manner.
Sets are important as it is possible to formulate all of mathematics in set
theory. This is not done in practice, however, unless one is interested in the
Foundations of Mathematics.²
¹ *Although we do not do so, in some studies of set theory, a distinction is made between
set and class.
² There is a third/fourth year course Logic, Set Theory and the Foundations of
Mathematics.
This is read as: “S is the set of all x such that P (x) (is true)”.³
For example, if P (x) is an abbreviation for
x is an integer > 5
or
x is a pink elephant,
then there is a corresponding set (although in the second case it is the so-
called empty set, which has no members) of objects x having property P (x).
However, Bertrand Russell came up with the following property of x:
or in symbols
x ∉ x.
Suppose
S = {x : x ∉ x} .
If there is indeed such a set S, then either S ∈ S or S ∉ S. But
• if the first case is true, i.e. S is a member of S, then S must satisfy the
defining property of the set S, and so S ∉ S, a contradiction;
• if the second case is true, i.e. S is not a member of S, then S does not
satisfy the defining property of the set S, and so S ∈ S, a contradiction.
is the set with members 1, 3 and {1, 5}. Note that 5 is not a member.⁵ If we
write the members of a set in a different order, we still have the same set.
If S is the set of all x such that . . . x . . . is true, then we write
S = {x : . . . x . . .} , (4.3)
and read this as “S is the set of x such that . . . x . . .”. For example, if
S = {x : 1 < x ≤ 2}, then S is the interval of real numbers that we also
denote by (1, 2].
Members of a set may themselves be sets, as in (4.2).
If A and B are sets, their union A ∪ B is the set of all objects which
belong to A or belong to B (remember that by the meaning of or this also
includes those objects belonging to both A and B). Thus
A ∪ B = {x : x ∈ A or x ∈ B} .
The intersection A ∩ B of A and B is defined by
A ∩ B = {x : x ∈ A and x ∈ B} .
The difference A \ B of A and B is defined by
A \ B = {x : x ∈ A and x ∉ B} .
It is sometimes convenient to represent this schematically by means of a Venn
Diagram.
⁵ However, it is a member of a member; membership is generally not transitive.
We can take the union of more than two sets. If F is a family of sets, the
union of all sets in F is defined by
⋃F = {x : x ∈ A for at least one A ∈ F} . (4.4)
and
⋂_{i=1}^n A_i or A₁ ∩ · · · ∩ Aₙ (4.7)
Examples
1. ⋃_{n=1}^∞ [0, 1 − 1/n] = [0, 1)
2. ⋂_{n=1}^∞ [0, 1/n] = {0}
3. ⋂_{n=1}^∞ (0, 1/n) = ∅
We say two sets A and B are equal iff they have the same members, and
in this case we write
A = B. (4.10)
∅. (4.11)
There is only one empty set, since any two empty sets have the same mem-
bers, and so are equal!
Thus if X is the (set of) reals and A is the (set of) rationals, then the
complement of A is the set of irrationals.
The complement of the union (intersection) of a family of sets is the in-
tersection (union) of the complements; these facts are known as de Morgan’s
laws. More precisely,
Proposition 4.3.2
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ. (4.14)
More generally,
(⋃_{i=1}^∞ A_i)ᶜ = ⋂_{i=1}^∞ A_iᶜ and (⋂_{i=1}^∞ A_i)ᶜ = ⋃_{i=1}^∞ A_iᶜ, (4.15)
and
(⋃_{λ∈J} A_λ)ᶜ = ⋂_{λ∈J} A_λᶜ and (⋂_{λ∈J} A_λ)ᶜ = ⋃_{λ∈J} A_λᶜ. (4.16)
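De Morgan’s laws can be verified directly for finite sets; in the sketch below (not part of the notes) X plays the role of a universal set so that complements make sense, and a small indexed family stands in for the general family {A_λ}:

```python
# De Morgan's laws checked on small finite sets.
X = set(range(10))          # the "universal" set for complements
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def comp(S):
    return X - S

# The two-set versions (4.14):
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)

# The family versions, over an indexed family of sets:
family = [set(range(n)) for n in range(1, 5)]
union_all = set().union(*family)
inter_all = set.intersection(*family)
assert comp(union_all) == set.intersection(*[comp(S) for S in family])
assert comp(inter_all) == set().union(*[comp(S) for S in family])
```

The checks succeed for any choice of X, A, B and family, reflecting that the laws are identities of set algebra rather than facts about these particular sets.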
4.4 Functions
We think of a function f : A → B as a way of assigning to each element a ∈ A
an element f (a) ∈ B. We will make this idea precise by defining functions
as particular kinds of sets.
If A and B are sets, their Cartesian product is the set of all ordered pairs
(x, y) with x ∈ A and y ∈ B. Thus
A₁ × A₂ × · · · × Aₙ = {(a₁, a₂, . . . , aₙ) : a₁ ∈ A₁, a₂ ∈ A₂, . . . , aₙ ∈ Aₙ} .
(4.21)
In particular, we write
Rⁿ = R × · · · × R (n factors). (4.22)
If f is a set of ordered pairs from A × B with the property that for every
x ∈ A there is exactly one y ∈ B such that (x, y) ∈ f , then we say f is a
function (or map or transformation or operator ) from A to B. We write
f : A → B, (4.23)
y = f (x). (4.24)
f (x) = x², (4.26)
Note that f [A] ⊂ B but may not equal B. For example, in (4.26) the range
of f is the set [0, ∞) = {x ∈ R : 0 ≤ x}.
We say f is one-one or injective or univalent if for every y ∈ B there is
at most one x ∈ A such that y = f (x). Thus the function f₁ : R → R given
by f₁(x) = x² for all x ∈ R is not one-one, while the function f₂ : R → R
given by f₂(x) = eˣ for all x ∈ R is one-one.
We say f is onto or surjective if every y ∈ B is of the form f (x) for some
x ∈ A. Thus neither f₁ nor f₂ is onto. However, f₁ maps R onto [0, ∞).
If f is both one-one and onto, then there is an inverse function f⁻¹ :
B → A defined by f⁻¹(y) = x iff f (x) = y. For example, if f (x) = eˣ for all
x ∈ R, then f : R → (0, ∞) is one-one and onto, and so has an inverse which
is usually denoted by ln. Note, incidentally, that f : R → R is not onto, and
so strictly speaking does not have an inverse.
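The definition of a function as a set of ordered pairs, and of f⁻¹ by swapping the pairs, can be modelled literally with finite sets. A sketch (not part of the notes; is_function is our name for the “exactly one pair per x” condition):

```python
# A function f : A -> B modelled literally as a set of ordered pairs,
# together with a check of the "exactly one pair per x" condition and
# an inverse for the one-one-and-onto case.
A = {0, 1, 2, 3}
B = {0, 1, 4, 9}
f = {(x, x ** 2) for x in A}

def is_function(pairs, dom):
    # every element of dom appears as first coordinate of exactly one pair
    return all(sum(1 for (x, y) in pairs if x == a) == 1 for a in dom)

assert is_function(f, A)

# Here f is one-one and onto B, so swapping each pair gives the inverse.
f_inv = {(y, x) for (x, y) in f}
assert is_function(f_inv, B)
assert (9, 3) in f_inv
```

If f were not one-one, the swapped set would fail the is_function check, mirroring why only one-one and onto functions have inverses.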
If S ⊂ A, then the image of S under f is defined by
f [S] = {f (x) : x ∈ S} ,
and if T ⊂ B, the inverse image of T under f is defined by
f⁻¹[T ] = {x : f (x) ∈ T } . (4.31)
Proposition 4.4.1
f [C ∪ D] = f [C] ∪ f [D],   f [⋃_{λ∈J} A_λ] = ⋃_{λ∈J} f [A_λ] (4.33)
f [C ∩ D] ⊂ f [C] ∩ f [D],   f [⋂_{λ∈J} C_λ] ⊂ ⋂_{λ∈J} f [C_λ] (4.34)
f⁻¹[U ∪ V ] = f⁻¹[U ] ∪ f⁻¹[V ],   f⁻¹[⋃_{λ∈J} U_λ] = ⋃_{λ∈J} f⁻¹[U_λ] (4.35)
f⁻¹[U ∩ V ] = f⁻¹[U ] ∩ f⁻¹[V ],   f⁻¹[⋂_{λ∈J} U_λ] = ⋂_{λ∈J} f⁻¹[U_λ] (4.36)
(f⁻¹[U ])ᶜ = f⁻¹[U ᶜ] (4.37)
A ⊂ f⁻¹[f [A]] (4.38)
Exercise Give a simple example to show equality need not hold in (4.34).
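One way to experiment with this exercise computationally (the sets below give away one answer, so skip this if you want to find your own): with f (x) = x², C = {−1} and D = {1}, the left side of (4.34) is empty while the right side is not.

```python
# Experimenting with (4.34): for f(x) = x^2, take C = {-1} and D = {1}.
f = lambda x: x * x
C = {-1}
D = {1}

image = lambda S: {f(x) for x in S}

lhs = image(C & D)            # f[C ∩ D] = f[∅] = ∅
rhs = image(C) & image(D)     # f[C] ∩ f[D] = {1} ∩ {1} = {1}
assert lhs == set()
assert rhs == {1}
assert lhs <= rhs and lhs != rhs   # inclusion is strict here
```

The failure of equality comes precisely from f not being one-one: two distinct points with the same image.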
The idea is that the two sets A and B have the same number of elements.
Thus the sets {a, b, c}, {x, y, z} and that in (4.2) are equivalent.
Some immediate consequences are:
Proposition 4.5.2
1. A ∼ A (i.e. ∼ is reflexive).
2. If A ∼ B then B ∼ A (i.e. ∼ is symmetric).
3. If A ∼ B and B ∼ C then A ∼ C (i.e. ∼ is transitive).
When we consider infinite sets there are some results which may seem
surprising at first:
Proof: It is sufficient to show that N is not finite (why?). But in fact any
finite subset of N is bounded, whereas we know that N is not (Chapter 3).
⁶ This is clear from the graph of f₂. More precisely:
(i) if x ∈ (0, 1) then x/(1 − x) ∈ (0, ∞) follows from elementary properties of inequalities,
(ii) for each y ∈ (0, ∞) there is a unique x ∈ (0, 1) such that y = x/(1 − x), namely
x = y/(1 + y), as follows from elementary algebra and properties of inequalities.
⁷ As is again clear from the graph of f₃, or by arguments similar to those used for f₂.
⁸ See Section 4.8 for a more general discussion of cardinal numbers.
We have seen that the set of even integers is denumerable (and similarly
for the set of odd integers). More generally, the following result is straight-
forward (the only problem is setting out the proof in a reasonable way):
Remark This proof is rather more subtle than may appear. Why is the
resulting function from N → B onto? We should really prove that every
non-empty set of natural numbers has a least member, but for this we need
to be a little more precise in our definition of N. See [St, pp 13–15] for
details.
In the first row are listed all positive rationals whose reduced
form is m/1 for some m (this is just the set of natural numbers);
In the second row are all positive rationals whose reduced form
is m/2 for some m;
In the third row are all positive rationals whose reduced form is
m/3 for some m;
...
We will see in the next section that not all infinite sets are denumerable.
However denumerable sets are the smallest infinite sets in the following sense:
Proof: Since A ≠ ∅ there exists at least one element in A; denote one such element by a1 . Since A is not finite, A ≠ {a1 }, and so there exists a2 , say, where a2 ∈ A, a2 ≠ a1 . Similarly there exists a3 , say, where a3 ∈ A, a3 ≠ a2 , a3 ≠ a1 . This process will never terminate, as otherwise A ∼ {a1 , a2 , . . . , an } for some natural number n.
Thus we construct a denumerable set B = {a1 , a2 , . . .}9 where B ⊂ A.
It turns out that the answer is No, as we see from the next theorem. Two
proofs will be given, both are due to Cantor (late nineteenth century), and
the underlying idea is the same.
Proof: 10 We show that for any f : N → (0, 1), the map f cannot be onto.
It follows that there is no one-one and onto map from N to (0, 1).
To see this, let yn = f (n) for each n. If we write out the decimal expansion
for each yn , we obtain a sequence
Some rational numbers have two decimal expansions, e.g. .14000 . . . = .13999 . . .
but otherwise the decimal expansion is unique. In order to have uniqueness,
we only consider decimal expansions which do not end in an infinite sequence
of 9’s.
To show that f cannot be onto we construct a real number z not in the
above sequence, i.e. a real number z not in the range of f . To do this define
z = .b1 b2 b3 . . . bi . . . by “going down the diagonal” as follows:
It follows that z is not in the sequence (4.40)11 , since for each i it is clear
that z differs from the i’th member of the sequence in the i’th place of z’s
decimal expansion. But this implies that f is not onto.
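The diagonal construction is entirely mechanical, as the following Python sketch (ours) shows; the choice of digit 1-or-2 is one concrete way to differ from the diagonal digit while never producing a 9:

```python
def missing_number(exps):
    """Given (an initial segment of) a list of decimal expansions of
    numbers in (0,1), each written as a string of digits, return a
    decimal .b1 b2 ... whose i-th digit differs from the i-th digit of
    the i-th expansion, and which never uses the digit 9."""
    b = []
    for i, y in enumerate(exps):
        d = y[i]                          # the diagonal digit
        b.append('1' if d != '1' else '2')
    return '0.' + ''.join(b)

exps = ['1415926', '7182818', '0000000']
z = missing_number(exps)
# z == '0.221': it differs from the i-th listed number in the i-th place
```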
Proof: Suppose that (an ) is a sequence of real numbers; we show that there is a real number r ∈ (0, 1) such that r ≠ an for every n.
Let I1 be a closed subinterval of (0, 1) with a1 ∉ I1 , I2 a closed subinterval of I1 such that a2 ∉ I2 . Inductively, we obtain a sequence (In ) of intervals
such that In+1 ⊆ In for all n. Writing In = [αn , βn ], the nesting of the
intervals shows that αn ≤ αn+1 < βn+1 ≤ βn . In particular, (αn ) is bounded
above, (βn ) is bounded below, so that α = supn αn , β = inf n βn are defined.
Further it is clear that [α, β] ⊆ In for all n, and hence excludes all the (an ).
Any r ∈ [α, β] suffices.
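The nested-interval proof is also constructive. The following Python sketch (our illustration) avoids each term of a given finite list by repeatedly keeping an outer third of the current interval, exactly in the spirit of the inductive step above:

```python
def avoid_sequence(a):
    """Given a finite list a of reals, build nested closed intervals
    I_1 >= I_2 >= ... inside (0,1) with a[n-1] not in I_n, by keeping
    at each step an outer third of the current interval that misses
    a[n-1]; every point of the final interval avoids all the a[n-1]."""
    lo, hi = 0.25, 0.75               # a closed interval inside (0,1)
    for x in a:
        t = (hi - lo) / 3.0
        if lo <= x <= lo + t:         # x lies in the left third:
            lo = hi - t               #   keep the right third
        else:                         # otherwise the left third misses x:
            hi = lo + t               #   keep the left third
    return lo, hi

lo, hi = avoid_sequence([0.3, 0.5, 0.5, 0.33333])
r = (lo + hi) / 2
# r lies in (0,1) and differs from every number in the list
```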
[10] We will show that any sequence (“list”) of real numbers from (0, 1) cannot include all numbers from (0, 1). In fact, there will be an uncountable (see Definition 4.7.3) set of real numbers not in the list — but for the proof we only need to find one such number.
[11] First convince yourself that we really have constructed a number z. Then convince yourself that z is not in the list, i.e. z is not of the form yn for any n.
Remark We have seen that the set of rationals has cardinality d. It follows13
that the set of irrationals has cardinality c. Thus there are “more” irrationals
than rationals.
On the other hand, the rational numbers are dense in the reals, in the
sense that between any two distinct real numbers there is a rational number14 .
(It is also true that between any two distinct real numbers there is an irrational number15 .)
Definition 4.8.1 With every set A we associate a symbol called the cardinal number of A, denoted by card A. Two sets are assigned the same cardinal number iff they are equivalent16 . Thus card A = card B iff A ∼ B.
[12] c comes from continuum, an old way of referring to the set R.
[13] We show in one of the problems for this chapter that if A has cardinality c and B ⊂ A has cardinality d, then A \ B has cardinality c.
[14] Suppose a < b. Choose an integer n such that 1/n < b − a. Then a < m/n < b for some integer m.
[15] Using the notation of the previous footnote, take the irrational number m/n + √2/N for some sufficiently large natural number N .
[16] We are able to do this precisely because the relation of equivalence is reflexive, symmetric and transitive. For example, suppose 10 people are sitting around a round table.
Define a relation between people by A ∼ B iff A is sitting next to B, or A is the same
as B. It is not possible to assign to each person at the table a colour in such a way that
two people have the same colour if and only if they are sitting next to each other. The
problem is that the relation we have defined is reflexive and symmetric, but not transitive.
If A = ∅ we write card A = 0.
If A = {a1 , . . . , an } (where a1 , . . . , an are all distinct) we write card A = n.
If A ∼ N we write card A = d (or ℵ0 , called “aleph zero”, where ℵ is the first letter of the Hebrew alphabet).
If A ∼ R we write card A = c.
Proposition 4.8.3
1 ≤ 2 ≤ 3 ≤ . . . ≤ d ≤ c. (4.42)
1. card A ≤ card A;
2. card A ≤ card B and card B ≤ card A implies card A = card B;
3. card A ≤ card B and card B ≤ card C implies card A ≤ card C;
4. either card A ≤ card B or card B ≤ card A.
Proof: The first and the third results are simple. The first follows from
Theorem 4.5.2(1) and the third from Theorem 4.5.2(3).
The other two results are not easy.
*Proof of (2): Since card A ≤ card B there exists a function f : A → B which is one-one (but not necessarily onto). Similarly there exists a one-one function g : B → A since card B ≤ card A.
If f (x) = y or g(u) = v we say x is a parent of y and u is a parent of v.
Since f and g are one-one, each element has exactly one parent, if it has any.
If y ∈ B and there is a finite sequence x1 , y1 , x2 , y2 , . . . , xn , y or y0 , x1 , y1 ,
x2 , y2 , . . . , xn , y, for some n, such that each member of the sequence is the
parent of the next member, and such that the first member has no parent,
then we say y has an original ancestor, namely x1 or y0 respectively. Notice
that every member in the sequence has the same original ancestor. If y has
no parent, then y is its own original ancestor. Some elements may have no
original ancestor.
Let A = AA ∪ AB ∪ A∞ , where AA is the set of elements in A with original
ancestor in A, AB is the set of elements in A with original ancestor in B,
and A∞ is the set of elements in A with no original ancestor. Similarly let
B = BA ∪ BB ∪ B∞ , where BA is the set of elements in B with original
ancestor in A, BB is the set of elements in B with original ancestor in B,
and B∞ is the set of elements in B with no original ancestor.
Define h : A → B as follows:
Note that every element in AB must have a parent (in B), since if it did
not have a parent in B then the element would belong to AA . It follows that
the definition of h makes sense.
If x ∈ AA , then h(x) ∈ BA , since x and h(x) must have the same original
ancestor (which will be in A). Thus h : AA → BA . Similarly h : AB → BB
and h : A∞ → B∞ .
Note that h is one-one, since f is one-one and since each x ∈ AB has
exactly one parent.
Every element y in BA has a parent in A (and hence in AA ). This parent
is mapped to y by f and hence by h, and so h : AA → BA is onto. A similar
argument shows that h : A∞ → B∞ is onto. Finally, h : AB → BB is onto as
each element y in BB is the image under h of g(y). It follows that h is onto.
Thus h is one-one and onto, as required. End of proof of (2).
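The parent-chasing construction of h can be carried out mechanically. Below is a Python sketch (our illustration, not from the notes) for the toy instance A = even naturals, B = all naturals, with f the inclusion and g the doubling map; the names sb_image, g_inv and f_inv are our own:

```python
def sb_image(x, f, g_inv, f_inv):
    """Compute h(x) for the Schroeder-Bernstein bijection h : A -> B
    built from one-one maps f : A -> B and g : B -> A.  g_inv / f_inv
    return the unique parent of an element, or None when there is no
    parent.  We chase parents backwards: if the chase stops at an
    element of B, then x is in A_B and h(x) = g_inv(x); otherwise
    (original ancestor in A, or a cyclic chase, i.e. x in A_infinity)
    h(x) = f(x)."""
    y, side, seen = x, 'A', set()
    while (y, side) not in seen:
        seen.add((y, side))
        p = g_inv(y) if side == 'A' else f_inv(y)
        if p is None:                   # y has no parent: y is the
            return g_inv(x) if side == 'B' else f(x)  # original ancestor
        y, side = p, ('B' if side == 'A' else 'A')
    return f(x)                         # cyclic chase: x in A_infinity

# Toy instance: A = even naturals, B = all naturals,
# f(x) = x (inclusion), g(y) = 2y (doubling); both are one-one.
f = lambda x: x
g_inv = lambda x: x // 2                     # every even x equals g(x // 2)
f_inv = lambda y: y if y % 2 == 0 else None  # odd y has no f-parent

h = {x: sb_image(x, f, g_inv, f_inv) for x in range(0, 20, 2)}
# h maps 0, 2, 4, ..., 18 one-one onto 0, 1, 2, ..., 9
```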
*Proof of (4): We do not really have the tools to do this, see Section 4.10.1
below. One lets
It follows from Zorn’s Lemma, see 4.10.1 below, that F contains a maximal
element. Either this maximal element is a one-one function from A into B,
or its inverse is a one-one function from B into A.
Proof: Suppose card A = card B. Then the second alternative holds and the first and third do not.
Suppose card A ≠ card B. Either card A ≤ card B or card B ≤ card A from the previous theorem. Again from the previous theorem exactly one of these possibilities can hold, as both together would imply card A = card B. If card A ≤ card B then in fact card A < card B since card A ≠ card B. Similarly, if card B ≤ card A then card B < card A.
We now prove the result promised at the end of the previous Section.
Proof: Let f : (0, 1) → R be one-one and onto, see Section 4.5. The map
(x, y) 7→ (f (x), f (y)) is thus (exercise) a one-one map from (0, 1)×(0, 1) onto
R × R; thus (0, 1) × (0, 1) ∼ R × R. Since also (0, 1) ∼ R, it is sufficient to
show that (0, 1) ∼ (0, 1) × (0, 1).
Consider the map f : (0, 1) × (0, 1) → (0, 1) given by
We take the unique decimal expansion for each of x and y given by requiring
that it does not end in an infinite sequence of 9’s. Then f is one-one but
not onto (since the number .191919 . . . for example is not in the range of f ).
Thus card((0, 1) × (0, 1)) ≤ card(0, 1).
On the other hand, there is a one-one map g : (0, 1) → (0, 1) × (0, 1) given by g(z) = (z, 1/2), for example. Thus card(0, 1) ≤ card((0, 1) × (0, 1)).
Hence card((0, 1) × (0, 1)) = card(0, 1) by the Schröder-Bernstein Theorem, and the result follows as card(0, 1) = c.
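The interleaving map f used in this proof acts on decimal-digit strings as follows (a sketch of ours):

```python
def interleave(x_digits, y_digits):
    """The one-one map f(x, y) = .a1 b1 a2 b2 ... of the proof, acting
    on decimal expansions written as digit strings (the expansions are
    chosen not to end in an infinite run of 9s)."""
    return '0.' + ''.join(a + b for a, b in zip(x_digits, y_digits))

z = interleave('123', '456')
# z == '0.142536'
# .191919... is never attained: it would force y = .999..., an
# expansion we have excluded, so f is one-one but not onto.
```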
(2) If the sets A and B have cardinality c then they are in one-one
correspondence20 with R. It follows that A × B is in one-one correspondence with R × R, and so the result follows from Theorem 4.8.8.
(3) Let {Ai }∞i=1 be a countable family of countable sets. Consider an array
whose first column enumerates the members of A1 , whose second column
enumerates the members of A2 , etc. Then an enumeration similar to that
in (1), but suitably modified to take account of the facts that some columns
may be finite, that the number of columns may be finite, and that some
elements may appear in more than one column, gives the result.
(4) Let {Aα }α∈S be a family of sets each of cardinality c, where the index
set S has cardinality c. Let fα : Aα → R be a one-one and onto function for
each α.
Let A = ∪α∈S Aα and define f : A → R × R by f (x) = (α, fα (x)) if x ∈ Aα (if x ∈ Aα for more than one α, choose one such α21 ). It follows that card A ≤ card(R × R), and so card A ≤ c from Theorem 4.8.8.
On the other hand there is a one-one map g from R into A (take g equal to the inverse of fα for some α ∈ S) and so c ≤ card A.
The result now follows from the Schröder-Bernstein Theorem.
could interpret the hypothesis on {Aα }α∈S as providing the maps fα . This
has implicitly been done in (3) and (4).
Remark It is clear from the proof that in (4) it is sufficient to assume that
each set is countable or of cardinality c, provided that at least one of the sets
has cardinality c.
For some of the most commonly used equivalent forms we need some
further concepts.
1. (x ≤ y) ∧ (y ≤ x) ⇒ x = y (antisymmetry), and
[22] This says that a ball in R3 can be divided into five pieces which can be rearranged by rigid body motions to give two disjoint balls of the same radius as before!
2. (x ≤ y) ∧ (y ≤ z) ⇒ x ≤ z (transitivity), and
An element x ∈ X is maximal if (y ∈ X) ∧ (x ≤ y) ⇒ y = x; x is maximum (= greatest) if z ≤ x for all z ∈ X. Similarly for minimal and minimum (= least), and for upper and lower bounds.
A subset Y of X such that for any x, y ∈ Y , either x ≤ y or y ≤ x is
called a chain. If X itself is a chain, the partial order is a linear or total
order. A linear order ≤ for which every non-empty subset of X has a least
element is a well order.
With this notation we have the following, the proof of which is not easy
(though some one way implications are).
X = {a ∈ A : a ∉ f (a)} .        (4.47)
Remark Note that the argument is similar to that used to obtain Russell’s
Paradox.
α + β = card(A ∪· B)        (4.48)
α × β = card(A × B),        (4.49)
α^β = card({f | f : B → A}).        (4.50)
Why is this consistent with the usual definition of mn and Rn where m and
n are natural numbers?
For more information, see [BM, Chapter XII].
2. W is well ordered by ⊂
Then N, and its elements are ordinals, as is N ∪ {N}. Recall that for
n ∈ N, n = {m ∈ N : m < n}. An ordinal number in fact is equal to the set
of ordinal numbers less than itself.
[24] We will discuss the Zermelo-Fraenkel axioms for set theory in a later course.
Chapter 5
In this Chapter we briefly review the fact that Rn , together with the usual
definitions of addition and scalar multiplication, is a vector space. With
the usual definition of Euclidean inner product, it becomes an inner product
space.
6. 1u = u for all u ∈ V
[1] One can define a vector space over the complex numbers in an analogous manner.
[2] It is common to denote vectors in boldface type.
Examples
Remarks You should review the following concepts for a general vector
space (see [F, Appendix 1] or [An]):
e1 = (1, 0, . . . , 0)
e2 = (0, 1, . . . , 0)
...
en = (0, 0, . . . , 1)        (5.4)
Examples
defines a norm on Rn , called the sup norm. Exercise: Show this notation is consistent, in the sense that
6. On the other hand RN , which is clearly a vector space under pointwise operations, has no natural norm. Why?
1. u · u ≥ 0, u · u = 0 iff u = 0 (positivity)
2. u · v = v · u (symmetry)
[5] We will see the reason for the || · ||2 notation when we discuss the Lp norm.
[6] Other notations are ⟨·, ·⟩ and (·|·).
Vector Space Properties of Rn 61
Examples
1. The Euclidean inner product (or dot product or standard inner product)
of two vectors in Rn is defined by
It is easily checked that this does indeed satisfy the axioms for an inner
product. The corresponding inner product space is denoted by E n
in [F], but we will abuse notation and use Rn for the set of n-tuples,
for the corresponding vector space, and for the inner product space just
defined.
Definition 5.3.2 In an inner product space we define the length (or norm)
of a vector by
|u| = (u · u)1/2 , (5.16)
and the notion of orthogonality between two vectors by
The proof of the inequality is in [F, p. 6]. Although the proof given
there is for the standard inner product in Rn , the same proof applies to any
inner product space. A similar remark applies to the proof of the triangle
inequality in [F, p. 7]. The other two properties of a norm are easy to show.
An orthonormal basis for a finite dimensional inner product space is a
basis {v1 , . . . , vn } such that
vi · vj = 1 if i = j, and vi · vj = 0 if i ≠ j.        (5.21)
Beginning from any basis {x1 , . . . , xn } for an inner product space, one can
construct an orthonormal basis {v1 , . . . , vn } by the Gram-Schmidt process
described in [F, p.10 Question 10]:
Metric Spaces
Proof: The first two are immediate. For the third we have d(x, y) = |x −
y| = |x−z +z−y| ≤ |x−z|+|z −y| = d(x, z) +d(z, y), where the inequality
comes from version (5.20) of the triangle inequality in Section 5.3.
We often denote the corresponding metric space by (X, d), to indicate that
a metric space is determined by both the set X and the metric d.
Examples
The proof is the same as that for Theorem 6.1.2. As examples, the sup
norm on Rn , and both the inner product norm and the sup norm on
C[a, b] (c.f. Section 5.2), induce corresponding metric spaces.
One can check that this defines a metric—the French metro with Paris
at the centre. The distance between two stations on different lines is
measured by travelling in to Paris and then out again.
One can check that this defines a metric which in fact satisfies the
strong triangle inequality (which implies the usual one):
Members of a general metric space are often called points, although they
may be functions (as in the case of C[a, b]), sets (as in 14.5) or other mathe-
matical objects.
Definition 6.2.2 Let (X, d) be a metric space. The open ball of radius r > 0
centred at x, is defined by
Note that the open balls in R are precisely the intervals of the form (a, b)
(the centre is (a + b)/2 and the radius is (b − a)/2).
Exercise: Draw the open ball of radius 1 about the point (1, 2) ∈ R2 , with
respect to the Euclidean (L2 ), sup (L∞ ) and L1 metrics. What about the
French metro?
[1] Some definitions of neighbourhood require the set to be open (see Section 6.4 below).
Proof: These all follow immediately from the previous definition, why?
We next make precise the notion of a point for which there are members
of A which are arbitrarily close to that point.
NB The terms cluster point and accumulation point are also used here.
However, the usage of these three terms is not universally the same throughout the literature.
Proof: Exercise.
Proof: For (6.5) first note that x ∈ A iff every Br (x) (r > 0) contains at
least one member of A. On the other hand, x ∈ ext A iff some Br (x) is a
subset of Ac , and so x ∈ (ext A)c iff it is not the case that some Br (x) is a
subset of Ac , i.e. iff every Br (x) contains at least one member of A.
Equality (6.6) follows from (6.5), (6.2) and the fact that the sets on the
right side of (6.2) are mutually disjoint.
For (6.7) it is sufficient from (6.5) to show A ∪ ∂A = (int A) ∪ ∂A. But clearly (int A) ∪ ∂A ⊂ A ∪ ∂A.
On the other hand suppose x ∈ A ∪ ∂A. If x ∈ ∂A then x ∈ (int A) ∪ ∂A, while if x ∈ A then x ∉ ext A from the definition of exterior, and so x ∈ (int A) ∪ ∂A from (6.2). Thus A ∪ ∂A ⊂ (int A) ∪ ∂A.
Example 2
(ext A ⊃ {y : d(y, x) > r}) If d(y, x) > r, let d(y, x) = s. Then Bs−r (y) ⊂ Ac by the triangle inequality (exercise), i.e. y is an exterior point of A.
(∂A ⊂ {y : d(y, x) = r}, with equality for Rn ) This follows from the previ-
ous results and the fact that ∂A = X \ ((int A) ∪ ext A).
[2] If z ∈ Br−s (y) then d(z, y) < r − s. But d(y, x) = s and so d(z, x) ≤ d(z, y) + d(y, x) < (r − s) + s = r.
Everything in this section apart from specific examples, applies with Rn re-
placed by an arbitrary metric space (X, d).
The concept of an open set is very important in Rn and more generally
is basic to the study of topology4 . We will see later that notions such as
connectedness of a set and continuity of a function can be expressed in terms
of open sets.
Remark Thus a set is open iff all its members are interior points. Note that since always int A ⊂ A, it follows that
Proof:
[4] There will be courses on elementary topology and algebraic topology in later years. Topological notions are important in much of contemporary mathematics.
[Diagram: the open ball Br (x); for a point y ∈ Br (x) with d(x, y) = s, the smaller ball Br−s (y) is contained in Br (x).]
The next result shows that finite intersections and arbitrary unions of
open sets are open. It is not true that an arbitrary intersection of open sets
is open. For example, the intervals (−1/n, 1/n) are open for each positive integer n, but ∩∞n=1 (−1/n, 1/n) = {0}, which is not open.
Proof: Exercise.
We saw before that a set is open iff it is contained in, and hence equals,
its interior. Analogously we have the following result.
Proof: A is closed iff Ac is open iff Ac = int (Ac ) iff Ac = ext A (from (6.3))
iff A = A (taking complements and using (6.5)).
Remark “Most” sets are neither open nor closed. In particular, Q and
(a, b] are neither open nor closed in R.
Proof: This follows from the previous theorem by DeMorgan’s rules. More
precisely, if A = A1 ∪ · · · ∪ An then Ac = Ac1 ∩ · · · ∩ Acn and so Ac is open
and hence A is closed. A similar proof applies in the case of arbitrary intersections.
Remark The example (0, 1) = ∪∞n=1 [1/n, 1 − 1/n] shows that a non-finite union of closed sets need not be closed.
Theorem 6.4.9 A set U ⊂ R is open iff U = ∪i≥1 Ii , where {Ii } is a countable (finite or denumerable) family of disjoint open intervals.
It is easy (exercise) to see that the axioms for a metric space do indeed
hold for (S, dS ).
Examples
1. The sets [a, b], (a, b] and Q all define metric subspaces of R.
2. Consider R2 with the usual Euclidean metric. We can identify R with
the “x-axis” in R2 , more precisely with the subset {(x, 0) : x ∈ R}, via
the map x 7→ (x, 0). The Euclidean metric on R then corresponds to
the induced metric on the x-axis.
Proof:
BrS (a) := {x ∈ S : dS (x, a) < r} = {x ∈ S : d(x, a) < r}
= S ∩ {x ∈ X : d(x, a) < r} = S ∩ Br (a).
[5] There is no connection between the notions of a metric subspace and that of a vector subspace! For example, every subset of Rn defines a metric subspace, but this is certainly not true for vector subspaces.
Theorem 6.5.3 Suppose (X, d) is a metric space and (S, d) is a metric sub-
space. Then for any A ⊂ S:
Let U = ∪a∈A Bra (a), which is open in X. Then
S ∩ U = ∪a∈A (S ∩ Bra (a)) = ∪a∈A BrSa (a).
But BrSa (a) ⊂ A, and for each a ∈ A we trivially have that a ∈ BrSa (a). Hence S ∩ U = A as required.
The result for closed sets follows from the result for open sets together with DeMorgan’s rules.
(iii) First suppose A = S ∩ C, where C (⊂ X) is closed in X. Then
S \ A = S ∩ C c from elementary properties of sets. Since C c is open in X, it
follows from (1) that S \ A is open in S, and so A is closed in S.
[6] We use the notation r = ra to indicate that r depends on a.
Examples
1. Let S = (0, 2]. Then (0, 1) and (1, 2] are both open in S (why?), but
(1, 2] is not open in R. Similarly, (0, 1] and [1, 2] are both closed in S
(why?), but (0, 1] is not closed in R.
7.1 Notation
If X is a set and xn ∈ X for n = 1, 2, . . ., then (x1 , x2 , . . .) is called a sequence
in X and xn is called the nth term of the sequence. We also write x1 , x2 , . . .,
or (xn )∞n=1 , or just (xn ), for the sequence.
Thus xn → x if for every open ball Br (x) centred at x the sequence (xn )
is eventually contained in Br (x). The “smaller” the ball, i.e. the smaller the
[1] It is sometimes convenient to replace r by ε, to remind us that we are interested in small values of r (or ε).
Examples
Series An infinite series ∑∞n=1 xn of terms from R (more generally, from Rn ) is studied via its partial sums
sn = x1 + · · · + xn .
Then we say the series ∑∞n=1 xn converges iff the sequence (of partial sums) (sn ) converges, and in this case the limit of (sn ) is called the sum of the series.
NB Note that changing the order (re-indexing) of the (xn ) gives rise to a
possibly different sequence of partial sums (sn ).
Example
If 0 < r < 1 then the geometric series ∑∞n=0 rn converges to (1 − r)−1 .
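A quick numerical check of the geometric series (an illustration of ours):

```python
def partial_sum(r, n):
    """s_n = 1 + r + r^2 + ... + r^(n-1), the n-th partial sum."""
    return sum(r ** k for k in range(n))

r = 0.5
# the partial sums approach (1 - r)^(-1) = 2
assert abs(partial_sum(r, 60) - 1 / (1 - r)) < 1e-12
```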
Definition 7.3.2 A sequence is bounded if the set of terms from the sequence
is bounded.
and so
d(x, y) − d(xn , yn ) ≤ d(x, xn ) + d(yn , y). (7.2)
Similarly
d(xn , yn ) ≤ d(xn , x) + d(x, y) + d(y, yn ),
and so
d(xn , yn ) − d(x, y) ≤ d(x, xn ) + d(yn , y). (7.3)
It follows from (7.2) and (7.3) that
7.4 Sequences in R
The results in this section are particular to sequences in R. They do not even
make sense in a general metric space.
Proof: Suppose (xn )∞n=1 ⊂ R and (xn ) is increasing (if (xn ) is decreasing,
the argument is analogous). Since the set of terms {x1 , x2 , . . .} is bounded
above, it has a least upper bound x, say. We claim that xn → x as n → ∞.
To see this, note that xn ≤ x for all n; but if ε > 0 then xk > x − ε for some k, as otherwise x − ε would be an upper bound. Choose such k = k(ε). Since xk > x − ε, then xn > x − ε for all n ≥ k as the sequence is increasing. Hence
x − ε < xn ≤ x
for all n ≥ k. Thus |x − xn | < ε for n ≥ k, and so xn → x (since ε > 0 is arbitrary).
It follows that a bounded closed set in R contains its infimum and supremum, which are thus the minimum and maximum respectively.
For sequences (xn ) ⊂ R it is also convenient to define the notions xn → ∞
and xn → −∞ as n → ∞.
Example Let zn = ∑nk=1 1/k − log n. Then (zn ) is monotonically decreasing and 0 < zn ≤ 1 for all n. Thus (zn ) has a limit γ, say. This is Euler’s constant, and γ = 0.577 . . .. It arises when considering the Gamma function:
Γ(z) = ∫0∞ e−t tz−1 dt = (e−γz /z) ∏∞n=1 (1 + z/n)−1 ez/n .
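The monotone convergence of (zn ) to γ is easy to observe numerically (a sketch, ours):

```python
from math import log

def z(n):
    """z_n = (1 + 1/2 + ... + 1/n) - log n."""
    return sum(1.0 / k for k in range(1, n + 1)) - log(n)

vals = [z(n) for n in (2, 10, 100, 10000)]
assert all(a > b for a, b in zip(vals, vals[1:]))   # decreasing
assert all(0 < v < 1 for v in vals)                 # trapped in (0, 1)
assert abs(z(10 ** 5) - 0.5772156649) < 1e-5        # near Euler's constant
```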
Proof: If (xn ) ⊂ A and xn → x, then for every r > 0, Br (x) must contain
some term from the sequence. Thus x ∈ A from Definition (6.3.3) of a limit
point.
Conversely, if x ∈ A then (again from Definition (6.3.3)), B1/n (x) ∩ A ≠ ∅ for each n ∈ N. Choose xn ∈ B1/n (x) ∩ A for n = 1, 2, . . .. Then (xn )∞n=1 ⊂ A. Since d(xn , x) ≤ 1/n it follows xn → x as n → ∞.
[4] More precisely, to be consistent with the definition of convergence, we could replace ε throughout the proof by ε/√k and so replace ε√k on the last line of the proof by ε. We would not normally bother doing this.
Proof: From the theorem, (7.4) is true iff A = A, i.e. iff A is closed.
Remark Thus in a metric space X the closure of a set A equals the set of
all limits of sequences whose members come from A. And this is generally
not the same as the set of limit points of A (which points will be missed?).
The set A is closed iff it is “closed” under the operation of taking limits of
convergent sequences of elements from A5 .
Exercise Use Corollary 7.6.2 to show directly that the closure of a set is
indeed closed.
limn→∞ (xn + yn ) = x + y,        (7.5)
limn→∞ αxn = αx,        (7.6)
limn→∞ αn xn = αx,        (7.7)
limn→∞ xn · yn = x · y.        (7.8)
[5] There is a possible inconsistency of terminology here. The sequence (1, 1 + 1, 1/2, 1 + 1/2, 1/3, 1 + 1/3, . . . , 1/n, 1 + 1/n, . . .) has no limit; the set A = {1, 1 + 1, 1/2, 1 + 1/2, 1/3, 1 + 1/3, . . . , 1/n, 1 + 1/n, . . .} has the two limit points 0 and 1; and the closure of A, i.e. the set of limits of sequences whose members belong to A, is A ∪ {0, 1}.
Exercise: What is a sequence with members from A converging to 0? to 1? to 1/3?
Sequences and Convergence 85
Let N = max{N1 , N2 }.
(i) If n ≥ N then
Since (yn ) is convergent, it follows from Theorem 7.3.3 that |yn | ≤ M1 (say)
for all n. Setting M = max{M1 , |x|} it follows that
|xn · yn − x · y| ≤ 2M ε
Cauchy Sequences
Definition 8.1.1 Let (xn )∞n=1 ⊂ X where (X, d) is a metric space. Then (xn ) is a Cauchy sequence if for every ε > 0 there exists an integer N such that
m, n ≥ N ⇒ d(xm , xn ) < ε.
We sometimes write this as d(xm , xn ) → 0 as m, n → ∞.
Thus a sequence is Cauchy if, for each ε > 0, beyond a certain point in the sequence all the terms are within distance ε of one another.
Warning This is stronger than claiming that, beyond a certain point in the sequence, consecutive terms are within distance ε of one another.
For example, consider the sequence xn = √n. Then
|xn+1 − xn | = (√(n + 1) − √n) · (√(n + 1) + √n)/(√(n + 1) + √n) = 1/(√(n + 1) + √n).        (8.1)
Hence |xn+1 − xn | → 0 as n → ∞.
But |xm − xn | = √m − √n if m > n, and so for any N we can choose n = N and m > N such that |xm − xn | is as large as we wish. Thus the sequence (xn ) is not Cauchy.
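Numerically (our sketch): the consecutive gaps of xn = √n shrink, yet taking m = 4n exhibits the failure of the Cauchy condition.

```python
from math import sqrt

x = [sqrt(n) for n in range(1, 40001)]   # x[n-1] is x_n = sqrt(n)

# consecutive terms get arbitrarily close ...
assert x[100] - x[99] < 0.05             # x_101 - x_100 = 1/(sqrt(101) + 10)

# ... but the sequence is not Cauchy: with n = N and m = 4N,
# x_m - x_n = 2 sqrt(N) - sqrt(N) = sqrt(N), as large as we please
N = 10000
assert x[4 * N - 1] - x[N - 1] == sqrt(N)
```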
Proof: Let (xn ) be a convergent sequence in the metric space (X, d), and
suppose x = lim xn .
Given ε > 0, choose N such that
n ≥ N ⇒ d(xn , x) < ε.
since the sequence (xn ) is Cauchy and hence bounded (if |xn | ≤ M for all n
then also |yn | ≤ M for all n).
From Theorem 7.4.2 on monotone sequences, yn → a (say) as n → ∞.
We will prove that also xn → a.
Suppose ε > 0. Since (xn ) is Cauchy there exists N = N (ε) such that
xn − ε ≤ xm ≤ xℓ + ε        (8.2)
for all ℓ, m, n ≥ N .
Claim:
yn − ε ≤ xm ≤ yn + ε        (8.3)
for all m, n ≥ N .
To establish the first inequality in (8.3), note that from the first inequality in (8.2) we immediately have for m, n ≥ N that
yn − ε ≤ xm ,
since yn ≤ xn .
To establish the second inequality in (8.3), note that from the second inequality in (8.2) we have for ℓ, m ≥ N that
xm ≤ xℓ + ε .
Taking the infimum over ℓ ≥ n (for any n ≥ N ) then gives
xm ≤ yn + ε.
It now follows from the Claim, by fixing m and letting n → ∞, and from the Comparison Test (Theorem 7.4.4), that
a − ε ≤ xm ≤ a + ε
Remark In the case k = 1, and for any bounded sequence, the number a constructed above,
a = supn inf{xi : i ≥ n},
is denoted lim inf xn or lim xn . It is the least of the limits of the subsequences of (xn )∞n=1 (why?). One can analogously define lim sup xn or lim xn (exercise).
Examples
1. We will see in Corollary 12.3.5 that C[a, b] (see Section 5.1 for the
definition) is a Banach space with respect to the sup metric. The same
argument works for `∞ (N).
On the other hand, C[a, b] with respect to the L1 norm (see 5.11) is not
complete. For example, let fn ∈ C[−1, 1] be defined by
fn (x) = 0 for −1 ≤ x ≤ 0;  fn (x) = nx for 0 ≤ x ≤ 1/n;  fn (x) = 1 for 1/n ≤ x ≤ 1.
Then there is no f ∈ C[−1, 1] such that ||fn − f ||L1 → 0, i.e. such that the integral of |fn − f | over [−1, 1] tends to 0. (If there were such an f , then we would have to have f (x) = 0 if −1 ≤ x < 0 and f (x) = 1 if 0 < x ≤ 1 (why?). But such an f cannot be continuous on [−1, 1].)
The same example shows that C[a, b] with respect to the L2 norm
(see 5.12) is not complete.
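The L1 computation behind this example can be checked numerically; a sketch (ours), approximating the integrals by Riemann sums:

```python
def f(n, x):
    """The ramp function f_n from the example (0, then nx, then 1)."""
    if x <= 0:
        return 0.0
    if x <= 1.0 / n:
        return n * x
    return 1.0

def l1_dist(n, m, steps=20000):
    """Midpoint-rule approximation of the integral of |f_n - f_m|
    over [-1, 1]."""
    h = 2.0 / steps
    return sum(abs(f(n, -1 + (i + 0.5) * h) - f(m, -1 + (i + 0.5) * h)) * h
               for i in range(steps))

# ||f_n - f_m||_L1 = |1/(2n) - 1/(2m)|, so (f_n) is Cauchy in the L1
# norm, even though its pointwise limit (a step function) is not
# continuous -- which is why C[-1, 1] with the L1 norm is not complete.
assert abs(l1_dist(10, 20) - (1 / 20 - 1 / 40)) < 1e-3
```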
Remark The second example here utilizes the following simple fact. Given
a set X, a metric space (Y, dY ) and a suitable function f : X → Y , the
function dX (x, x0 ) = dY (f (x), f (x0 )) is a metric on X. (Exercise : what does
suitable mean here?) The cases X = (0, 1), Y = R2 , f (x) = (x, sin x−1 ) and
f (x) = (cos 2πx, sin 2πx) are of interest.
Proof: We will find the fixed point as the limit of a Cauchy sequence.
Let x be any point in X and define a sequence (xn )∞n=1 by
But
Remark Fixed point theorems are of great importance for proving existence
results. The one above is perhaps the simplest, but has the advantage of
giving an algorithm for determining the fixed point. In fact, it also gives an
estimate of how close an iterate is to the fixed point (how?).
Example Take R with the standard metric, and let a > 1. Then the map
f (x) = (x + a/x)/2
takes [1, ∞) into itself, and is contractive with λ = 1/2. What is the fixed point? (This was known to the Babylonians, nearly 4000 years ago.)
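Iterating this contraction from any starting point of [1, ∞) gives the Babylonian square-root algorithm; a Python sketch (ours):

```python
def babylonian(a, x0=1.0, steps=25):
    """Iterate the contraction f(x) = (x + a/x)/2 on [1, oo).
    The fixed point satisfies x = (x + a/x)/2, i.e. x^2 = a."""
    x = x0
    for _ in range(steps):
        x = 0.5 * (x + a / x)
    return x

# the fixed point is sqrt(a)
assert abs(babylonian(2.0) - 2.0 ** 0.5) < 1e-12
assert abs(babylonian(10.0) - 10.0 ** 0.5) < 1e-12
```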
Cauchy Sequences and Complete Metric Spaces 95
g ′ (x) = f (x)f ′′ (x)/(f ′ (x))2 ,
9.1 Subsequences
Recall that if (xn )∞n=1 is a sequence in some set X and n1 < n2 < n3 < . . ., then the sequence (xni )∞i=1 is called a subsequence of (xn )∞n=1 .
Proof: Let xn → x in the metric space (X, d) and let (xni ) be a subse-
quence.
Let ε > 0 and choose N so that
d(xn , x) < ε
for n ≥ N . Since ni ≥ i for all i, it follows
d(xni , x) < ε
for i ≥ N .
Proof: Suppose that (xn ) is Cauchy in the metric space (X, d). So given ε > 0 there is N such that d(xn , xm ) < ε provided m, n > N . If (xnj ) is a subsequence convergent to x ∈ X, then there is N ′ such that d(xnj , x) < ε for j > N ′ . Thus for m > N , take any j > max{N, N ′ } (so that nj > N certainly), to see that
d(x, xm ) ≤ d(x, xnj ) + d(xnj , xm ) < 2ε .
[1] Other texts may have different forms of the Bolzano-Weierstrass Theorem.
Subsequences and Compactness 99
Remark 2 The Theorem is not true if Rn is replaced by C[0, 1]. For ex-
ample, consider the sequence of functions (fn ) whose graphs are as shown in
the diagram.
[Diagram: the graphs of the functions fn ; each fn is a “spike” of height 1 on the interval from 1/(n + 1) to 1/n, and is 0 elsewhere on [0, 1].]
[2] The distance between any two points in I1 is ≤ 2√k r, between any two points in I2 is thus ≤ √k r, between any two points in I3 is thus ≤ √k r/2, etc.
The sequence is bounded since ||fn ||∞ = 1, where we take the sup norm
(and corresponding metric) on C[0, 1].
But if n ≠ m then
Remark The second half of this corollary holds in a general metric space,
the first half does not.
Remark
Examples
1. From Corollary 9.2.2 the compact subsets of Rn are precisely the closed
bounded subsets. Any such compact subset, with the induced metric,
is a compact metric space. For example, [a, b] with the usual metric is
a compact metric space.
2. The Remarks on C[0, 1] in the previous section show that the closed5
bounded set S = {f ∈ C[0, 1] : ||f ||∞ = 1} is not compact. The set
S is just the “closed unit sphere” in C[0, 1]. (You will find later that C[0, 1] is not unusual in this regard: the closed unit ball in any infinite dimensional normed space fails to be compact.)
Relative and Absolute Notions Recall from the Note at the end of
Section (6.4) that if X is a metric space the notion of a set S ⊂ X being
open or closed is a relative one, in that it depends also on X and not just on
the induced metric on S.
However, whether or not S is compact depends only on S itself and the induced metric, and so we say compactness is an absolute notion. Similarly, completeness is an absolute notion.
It is not necessarily the case that there exists y ∈ A such that d(x, A) =
d(x, y). For example if A = [0, 1) ⊂ R and x = 2 then d(x, A) = 1, but
d(x, y) > 1 for all y ∈ A.
Moreover, even if d(x, A) = d(x, y) for some y ∈ A, this y may not be
unique. For example, let S = {y ∈ R2 : ||y|| = 1} and let x = (0, 0). Then
d(x, S) = 1 and d(x, y) = 1 for every y ∈ S.
Notice also that if x ∈ A then d(x, A) = 0. But d(x, A) = 0 does not
imply x ∈ A. For example, take A = [0, 1) and x = 1.
[5] S is the boundary of the unit ball B1 (0) in the metric space C[0, 1] and is closed as noted in the Examples following Theorem 6.4.7.
Proof: Let
γ = d(x, S)
and choose a sequence (yn ) in S such that
d(x, yn ) → γ as n → ∞.
[6] This is fairly obvious, and the actual argument is similar to showing that convergent sequences are bounded, c.f. Theorem 7.3.3. More precisely, there exists an integer N such that d(x, yn ) ≤ γ + 1 for all n ≥ N , by the fact d(x, yn ) → γ. Let
Limits of Functions
1. f : A (⊂ R) → R,
2. f : A (⊂ Rn ) → R,
3. f : A (⊂ Rn ) → Rm .
You should think of these particular cases when you read the following.
Sometimes we can represent a function by its graph. Of course functions
can be quite complicated, and we should be careful not to be misled by the
simple features of the particular graphs which we are able to sketch. See the
following diagrams.
[Diagram: graph of f : R → R^2.]
Sometimes we can sketch the domain and the range of the function, per-
haps also with a coordinate grid and its image. See the following diagram.
[Diagram: image of a coordinate grid under f : R^2 → R^2.]
(xn)n≥1 ⊂ A \ {a} and xn → a together imply f(xn) → b.
(where in the last notation the intended domain A is understood from the
context).
and say the limit as x approaches a from the right or the limit as x approaches
a from the left, respectively.
Example 2 If g : R → R then
lim_{x→a} (g(x) − g(a))/(x − a)
is (if the limit exists) called the derivative of g at a. Note that in Defini-
tion 10.2.1 we are taking f(x) = (g(x) − g(a))/(x − a) and that f(x) is not
defined at x = a. We take A = R \ {a}, or A = (a − δ, a) ∪ (a, a + δ) for
some δ > 0.
We can also use the definition to show in some cases that limx→a f(x)
does not exist. For this it is sufficient to show that for some sequence (xn) ⊂
A \ {a} with xn → a the corresponding sequence (f(xn)) does not have a
limit. Alternatively, it is sufficient to give two sequences (xn) ⊂ A \ {a} and
(yn) ⊂ A \ {a} with xn → a and yn → a but lim f(xn) ≠ lim f(yn).
Example 4 (Draw a sketch.) limx→0 sin(1/x) does not exist. To see this
consider, for example, the sequences xn = 1/(nπ) and yn = 1/((2n + 1/2)π).
Then sin(1/xn) = 0 → 0 while sin(1/yn) = 1 → 1.
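A quick numerical check of these two sequences (an illustrative sketch, not part of the text):

```python
import math

# Two sequences approaching 0 along which sin(1/x) has different limits,
# so lim_{x -> 0} sin(1/x) cannot exist.
xs = [1 / (n * math.pi) for n in range(1, 50)]              # sin(1/x_n) = sin(n*pi) = 0
ys = [1 / ((2 * n + 0.5) * math.pi) for n in range(1, 50)]  # sin(1/y_n) = sin((2n + 1/2)*pi) = 1

assert all(abs(math.sin(1 / x)) < 1e-9 for x in xs)
assert all(abs(math.sin(1 / y) - 1) < 1e-9 for y in ys)
```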
Example 5
Then da,k > 0, even if a ∈ Sk, since it is the minimum of a finite set of strictly
positive (i.e. > 0) numbers.
Now let (xn) be any sequence with xn → a and xn ≠ a for all n. We need
to show g(xn) → 0.
Suppose ε > 0 and choose k so 1/2k ≤ ε. Then from (10.2), g(xn) < ε if
xn ∉ Sk.
On the other hand, 0 < |xn − a| < da,k for all n ≥ N (say), since xn ≠ a
for all n and xn → a. It follows from (10.3) that xn ∉ Sk for n ≥ N. Hence
g(xn) < ε for n ≥ N.
Since also g(x) ≥ 0 for all x, it follows that g(xn) → 0 as n → ∞. Hence
limx→a g(x) = 0 as claimed.
Example 6 Define h : R → R by
h(x) = lim_{m→∞} lim_{n→∞} (cos(m!πx))^n.
Example 7 Let
f(x, y) = xy/(x^2 + y^2)
for (x, y) ≠ (0, 0).
If y = ax then f(x, y) = a(1 + a^2)^{-1} for x ≠ 0. Hence
lim_{(x,y)→(0,0), y=ax} f(x, y) = a/(1 + a^2).
One can also visualise f by sketching level sets1 of f as shown in the next
diagram. Then you can visualise the graph of f as being swept out by a
straight line rotating around the origin at a height as indicated by the level
sets.
Example 8 Let
f(x, y) = x^2 y/(x^4 + y^2)
for (x, y) ≠ (0, 0). Then
lim_{(x,y)→(0,0), y=ax} f(x, y) = lim_{x→0} ax^3/(x^4 + a^2 x^2)
= lim_{x→0} ax/(x^2 + a^2)
= 0.
Thus the limit of f as (x, y) → (0, 0) along any line y = ax is 0. The limit
along the y-axis x = 0 is also easily seen to be 0.
But it is still not true that lim(x,y)→(0,0) f (x, y) exists. For if we consider
the limit of f as (x, y) → (0, 0) along the parabola y = bx2 we see that
f = b(1 + b2 )−1 on this curve and so the limit is b(1 + b2 )−1 .
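The line and parabola limits in Example 8 can be checked numerically. An illustrative sketch (the sample values of a, b and t are our own choices):

```python
# f(x, y) = x^2 y / (x^4 + y^2) tends to 0 along every line y = a*x,
# yet equals the constant b/(1 + b^2) along the parabola y = b*x^2.
def f(x, y):
    return x**2 * y / (x**4 + y**2)

a, b = 3.0, 1.0
for t in [1e-2, 1e-4, 1e-6]:
    # along y = ax:  f(t, a t) = a t / (t^2 + a^2)  -> 0
    assert abs(f(t, a * t)) < a * t
    # along y = b x^2:  f(t, b t^2) = b / (1 + b^2), independent of t
    assert abs(f(t, b * t**2) - b / (1 + b**2)) < 1e-12
```

So the two paths give different limits, and lim_{(x,y)→(0,0)} f(x, y) cannot exist.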
You might like to draw level curves (corresponding to parabolas y = bx2 ).
Footnote 1: A level set of f is a set on which f is constant.
This example reappears in Chapter 17. Clearly we can make such exam-
ples as complicated as we please.
1. limx→a f(x) = b;
2. for each ε > 0 there exists δ > 0 such that x ∈ A and 0 < d(x, a) < δ imply ρ(f(x), b) < ε.
Proof:
(1) ⇒ (2): Assume (1), so that whenever (xn ) ⊂ A \ {a} and xn → a then
f (xn ) → b.
Suppose (by way of obtaining a contradiction) that (2) is not true. Then
for some ε > 0 there is no δ > 0 such that
In other words, for some ε > 0 and every δ > 0, there exists an x depending
on δ, with
But (*) is true for all n ≥ N (say, where N depends on δ and hence on ε), and
so ρ(f(xn), b) < ε for all n ≥ N. Thus f(xn) → b as required and so (1) is
true.
Proposition 10.4.2 Assume limx→a f (x) exists. Then for some r > 0, f is
bounded on the set A ∩ Br (a).
Most of the basic properties of limits of functions follow directly from the
corresponding properties of limits of sequences without the necessity for any
ε–δ arguments.
Theorem 10.4.3 Limits are unique; in the sense that if limx→a f (x) = b1
and limx→a f (x) = b2 then b1 = b2 .
(fg)(x) = f(x)g(x),
(f/g)(x) = f(x)/g(x).
The following algebraic properties of limits follow easily from the cor-
responding properties of sequences. As usual you should think of the case
X = Rn and V = Rm (in particular, m = 1).
If V = R then
provided in the last case that g(x) ≠ 0 for all x ∈ A \ {a}2 and limx→a g(x) ≠ 0.
Footnote 2: It is often convenient to instead just require that g(x) ≠ 0 for all x ∈ Br(a) ∩ (A \ {a})
and some r > 0. In this case the function f/g will be defined everywhere in Br(a) ∩ (A \ {a})
and the conclusion still holds.
(f + g)(xn ) → b + c.
But
from (10.6) and the algebraic properties of limits of sequences, Theorem 7.7.1.
This proves the result.
The others are proved similarly. For the second last we also need the
Problem in Chapter 7 about the ratio of corresponding terms of two conver-
gent sequences.
One usually uses the previous Theorem, rather than going back to the
original definition, in order to compute limits.
lim_{x→a} P(x)/Q(x) = P(a)/Q(a)
if Q(a) ≠ 0.
To see this, let P (x) = a0 + a1 x + a2 x2 + · · · + an xn . It follows (exer-
cise) from the definition of limit that limx→a c = c (for any real number c)
and limx→a x = a. It then follows by repeated applications of the previous
theorem that limx→a P (x) = P (a). Similarly limx→a Q(x) = Q(a) and so the
result follows by again using the theorem.
Chapter 11
Continuity
Example 1 Define
f(x) = x sin(1/x) if x ≠ 0, and f(x) = 0 if x = 0.
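That f is continuous at 0 follows from the squeeze |x sin(1/x)| ≤ |x|; a quick numerical sketch (not part of the text):

```python
import math

# The squeeze |x sin(1/x)| <= |x| forces f(x) -> 0 = f(0), so f is continuous at 0.
def f(x):
    return x * math.sin(1 / x) if x != 0 else 0.0

for x in [10**-k for k in range(1, 12)]:
    assert abs(f(x) - f(0)) <= abs(x)
```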
Example 2 From the Example in Section 10.4 it follows that any rational
function P/Q, and in particular any polynomial, is continuous everywhere it
is defined, i.e. everywhere Q(x) ≠ 0.
Example 3 Define
f(x) = x if x ∈ Q, and f(x) = −x if x ∉ Q.
Then f is continuous at 0, and only at 0.
The following equivalent definitions are often useful. They also have the
advantage that it is not necessary to consider the case of isolated points and
limit points separately. Note that, unlike in Theorem 10.3.1, we allow xn = a
in (2), and we allow x = a in (3).
1. f is continuous at a;
2. whenever (xn)n≥1 ⊂ A and xn → a then f(xn) → f(a);
3. for each ε > 0 there exists δ > 0 such that
x ∈ A and d(x, a) < δ implies ρ(f(x), f(a)) < ε;
i.e. f[Bδ(a)] ⊆ Bε(f(a)).
Proof: (1) ⇒ (2): Assume (1). Then in particular for any sequence
(xn ) ⊂ A \ {a} with xn → a, it follows f (xn ) → f (a).
In order to prove (2) suppose we have a sequence (xn) ⊂ A with xn → a
(where we allow xn = a). If xn = a for all n ≥ some N then f(xn) = f(a)
for n ≥ N and so trivially f(xn) → f(a). If this case does not occur then
by deleting any xn with xn = a we obtain a new (infinite) sequence x′n → a
with (x′n) ⊂ A \ {a}. Since f is continuous at a it follows f(x′n) → f(a). As
also f(xn) = f(a) for all terms from (xn) not in the sequence (x′n), it follows
that f(xn) → f(a). This proves (2).
The equivalence of (2) and (3) is proved almost exactly as is the equiva-
lence of the two corresponding conditions (1) and (2) in Theorem 10.3.1. The
only essential difference is that we replace A \ {a} everywhere in the proof
there by A.
Remark Property (3) here is perhaps the simplest to visualise; try giving
a diagram which shows this property.
A useful consequence of this observation is that if f : [a, b] → R is con-
tinuous and ∫_a^b |f| = 0, then f = 0. (This fact has already been used in
Section 5.2.)
The following two Theorems are proved using Theorem 11.1.2 (2) in the
same way as are the corresponding properties for limits. The only difference
is that we no longer require sequences xn → a with (xn ) ⊂ A to also satisfy
xn 6= a.
Proof: Using Theorem 11.1.2 (2), the proofs are as for Theorem 10.4.5,
except that we take sequences (xn) ⊂ A with possibly xn = a. The only
extra point is that because g(a) ≠ 0 and g is continuous at a, then from the
remark at the beginning of this section, g(x) ≠ 0 for all x sufficiently near a.
Remark Use of property (3) again gives a simple picture of this result.
More generally:
Remarks
Examples
Footnote 1: To prove this we need to first give a proper definition of sin x. This can be
done by means of an infinite series expansion.
Theorem 11.4.1 Let f : X → Y , where (X, d) and (Y, ρ) are metric spaces.
Then the following are equivalent:
1. f is continuous;
2. f −1 [E] is open in X whenever E is open in Y ;
3. f −1[C] is closed in X whenever C is closed in Y.
Proof:
(1) ⇒ (2): Assume (1). Let E be open in Y. We wish to show that f −1[E]
is open (in X).
Let x ∈ f −1[E]. Then f(x) ∈ E, and since E is open there exists r > 0
such that Br(f(x)) ⊂ E. From Theorem 11.1.2(3) there exists δ > 0 such
that f[Bδ(x)] ⊂ Br(f(x)). This implies Bδ(x) ⊂ f −1[Br(f(x))]. But
f −1[Br(f(x))] ⊂ f −1[E] and so Bδ(x) ⊂ f −1[E].
Thus every point x ∈ f −1[E] is an interior point and so f −1[E] is open.
(2) ⇒ (1): Assume (2). We will use Theorem 11.1.2(3) to prove (1).
Let x ∈ X. In order to prove f is continuous at x take any Br(f(x)) ⊂ Y.
Since Br(f(x)) is open it follows that f −1[Br(f(x))] is open. Since
x ∈ f −1[Br(f(x))] it follows there exists δ > 0 such that Bδ(x) ⊂ f −1[Br(f(x))].
Hence f[Bδ(x)] ⊂ f[f −1[Br(f(x))]]; but f[f −1[Br(f(x))]] ⊂ Br(f(x))
(exercise) and so f[Bδ(x)] ⊂ Br(f(x)).
It follows from Theorem 11.1.2(3) that f is continuous at x. Since x ∈ X
was arbitrary, it follows that f is continuous on X.
1. f is continuous;
Proof: Since (S, d) is a metric space, this follows immediately from the
preceding theorem.
Proof: Let (yn ) be any sequence from f [K]. We want to show that some
subsequence has a limit in f [K].
Let yn = f (xn ) for some xn ∈ K. Since K is compact there is a sub-
sequence (xni ) such that xni → x (say) as i → ∞, where x ∈ K. Hence
yni = f (xni ) → f (x) since f is continuous, and moreover f (x) ∈ f [K] since
x ∈ K. It follows that f [K] is compact.
You know from your earlier courses on Calculus that a continuous function
defined on a closed bounded interval is bounded above and below and has
a maximum value and a minimum value. This is generalised in the next
theorem.
Proof: From the previous theorem f [K] is a closed and bounded subset of
R. Since f [K] is bounded it has a least upper bound b (say), i.e. b ≥ f (x)
for all x ∈ K. Since f [K] is closed it follows that b ∈ f [K]4 . Hence b = f (x0 )
for some x0 ∈ K, and so f (x0 ) is the maximum value of f on K.
Similarly, f has a minimum value taken at some point in K.
1. Let f (x) = 1/x for x ∈ (0, 1]. Then f is continuous and (0, 1] is
bounded, but f is not bounded above on the set (0, 1].
2. Let f (x) = x for x ∈ [0, 1). Then f is continuous and is even bounded
above on [0, 1), but does not have a maximum on [0, 1).
3. Let f (x) = 1/x for x ∈ [1, ∞). Then f is continuous and is bounded
below on [1, ∞) but does not have a minimum on [1, ∞).
Definition 11.6.1 Let (X, d) and (Y, ρ) be metric spaces. The function
f : X → Y is uniformly continuous on X if for each ε > 0 there exists δ > 0
such that
d(x, x′) < δ ⇒ ρ(f(x), f(x′)) < ε,
for all x, x′ ∈ X.
Remark The point is that δ may depend on ε, but does not depend on x
or x′.
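The dependence of δ on x is visible numerically for f(x) = 1/x on (0, 1] (cf. the earlier example of 1/x on (0, 1]). This is an illustrative sketch, not part of the text: the pairs x = 1/n, x′ = 1/(n + 1) get arbitrarily close while their images stay distance 1 apart, so for ε = 1 no single δ works.

```python
# f(x) = 1/x is continuous on (0, 1] but not uniformly continuous:
# |x - x'| -> 0 while |f(x) - f(x')| stays equal to 1.
for n in range(1, 200):
    x, xp = 1.0 / n, 1.0 / (n + 1)
    assert abs(x - xp) <= 1.0 / n**2 + 1e-15    # the points get arbitrarily close
    assert abs(1.0 / x - 1.0 / xp) > 0.999      # but the images stay about 1 apart
```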
Examples
Proof: If f is not uniformly continuous, then there exists ε > 0 such that
for every δ > 0 there exist x, y ∈ S with
Fix some such ε and, using δ = 1/n, choose two sequences (xn) and (yn)
such that for all n
is (uniformly) continuous.
Proof: We have
|f(x) − f(y)| ≤ ∫_0^1 |K(x, t) − K(y, t)| dt.
Uniform continuity of K on [0, 1]^2 means that given ε > 0 there is δ > 0 such
that |K(x, s) − K(y, t)| < ε provided d((x, s), (y, t)) < δ. So if |x − y| < δ
then |f(x) − f(y)| < ε.
Chapter 12
Uniform Convergence of Functions
1. f, fn : [−1, 1] → R for n = 1, 2, . . ., where
fn(x) = 0 for −1 ≤ x ≤ 0,
fn(x) = 2nx for 0 < x ≤ (2n)^{-1},
fn(x) = 2 − 2nx for (2n)^{-1} < x ≤ n^{-1},
fn(x) = 0 for n^{-1} < x ≤ 1,
and
f(x) = 0 for all x.
2.
f, fn : [−1, 1] → R
for n = 1, 2, . . . and
f (x) = 0 for all x.
3. f, fn : [−1, 1] → R for n = 1, 2, . . . and
f(x) = 1 if x = 0, f(x) = 0 if x ≠ 0.
4. f, fn : [−1, 1] → R for n = 1, 2, . . . and
f(x) = 0 if x < 0, f(x) = 1 if 0 ≤ x.
5.
f, fn : R → R
for n = 1, 2, . . . and
f (x) = 0 for all x.
6.
f, fn : R → R
for n = 1, 2, . . . and
f (x) = 0 for all x.
7.
f, fn : R → R
for n = 1, 2, . . .,
fn(x) = (1/n) sin nx,
and
f (x) = 0 for all x.
Then it is not the case for Examples 1–6 that the graph of fn is a subset
of the ε-strip about the graph of f for all sufficiently large n.
However, in Example 7, since
|fn(x) − f(x)| = |(1/n) sin nx − 0| ≤ 1/n,
it follows that |fn(x) − f(x)| < ε for all n > 1/ε. In other words, the graph
of fn is a subset of the ε-strip about the graph of f for all sufficiently large
n. In this case we say that fn → f uniformly.
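The uniform bound in Example 7 can be checked numerically by estimating the sup norm on a grid. A sketch (the grid and the helper name are our own, not from the text):

```python
import math

# sup_x |f_n(x) - 0| <= 1/n for f_n(x) = sin(n x)/n, so f_n -> 0 uniformly:
# the graph of f_n lies in the eps-strip about 0 once n > 1/eps.
def sup_norm_on_grid(n, grid):
    return max(abs(math.sin(n * x) / n) for x in grid)

grid = [k * 0.01 for k in range(-1000, 1001)]   # sample points in [-10, 10]
for n in [1, 10, 100]:
    assert sup_norm_on_grid(n, grid) <= 1.0 / n
```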
Footnote 1: Notice that how large n needs to be depends on x.
for all x ∈ S.
Remarks (i) Thus (fn) is uniformly Cauchy iff the following is true: for
each ε > 0, any two functions from the sequence after a certain point (which
will depend on ε) lie within the ε-strip of each other.
(ii) Informally, (fn) is uniformly Cauchy if ρ(fn(x), fm(x)) → 0 “uniformly”
in x as m, n → ∞.
(iii) We will see in the next section that if Y is complete (e.g. Y = Rn) then
a sequence (fn) (where fn : S → Y) is uniformly Cauchy iff fn → f uniformly
for some function f : S → Y.
(Thus in this case the uniform “metric” is the metric corresponding to the
uniform “norm”, as in the examples following Definition 6.2.1.)
∞ + ∞ = ∞, c·∞ = ∞ if c > 0, c·∞ = 0 if c = 0. (12.6)
Theorem 12.2.2 The uniform “metric” (“norm”) satisfies the axioms for
a metric space (Definition 6.2.1) (Definition 5.2.1) provided we interpret
arithmetic operations on ∞ as in (12.6).
n ≥ N ⇒ du(fn, f) < ε.
it follows
m, n ≥ N ⇒ ρ(fn(x), fm(x)) < 2ε
for each x ∈ S.
We know that fn → f in the pointwise sense, but we need to show that
fn → f uniformly.
So suppose that ε > 0 and, using the fact that (fn) is uniformly Cauchy,
choose N such that
m, n ≥ N ⇒ ρ(fn(x), fm(x)) < ε
for all x ∈ S.4
Since this applies to every m ≥ N, we have that
m ≥ N ⇒ ρ(f(x), fm(x)) ≤ ε.
Footnote 4: … is < ε. Since fN+p(x) → f(x) as p → ∞, it follows that ρ(fN+p(x), fm(x)) →
ρ(f(x), fm(x)) as p → ∞ (this is clear if Y is R or Rk, and follows in general from
Theorem 7.3.4). By the Comparison Test it follows that ρ(f(x), fm(x)) ≤ ε.
[Diagram: graphs of f and fN over X, comparing the values f(x), fN(x) and f(x0), fN(x0) at points x and x0.]
Recall from Definition 11.1.1 that if X and Y are metric spaces and
A ⊂ X, then the set of all continuous functions f : A → Y is denoted by
C(A; Y ). If A is compact, then we have seen that f [A] is compact and hence
bounded5 , i.e. f is bounded. If A is not compact, then continuous functions
need not be bounded6 .
Proof: It has already been verified in Theorem 12.2.2 that the three axioms
for a metric are satisfied. We need only check that du (f, g) is always finite
for f, g ∈ BC(A; Y ).
But this is immediate. For suppose b ∈ Y . Then since f and g are
bounded on A, it follows there exist K1 and K2 such that ρ(f (x), b) ≤ K1
and ρ(g(x), b) ≤ K2 for all x ∈ A. But then du (f, g) ≤ K1 + K2 from the
definition of du and the triangle inequality. Hence BC(A; Y ) is a metric space
with the uniform metric.
In order to verify completeness, let (fn)n≥1 be a Cauchy sequence from
BC(A; Y ). Then (fn ) is uniformly Cauchy, as noted in Proposition 12.2.4.
From Theorem 12.2.5 it follows that fn → f uniformly, for some function
f : A → Y . From Proposition 12.2.3 it follows that fn → f in the uniform
metric.
From Theorem 12.3.1 it follows that f is continuous. It is also clear that
f is bounded7 . Hence f ∈ BC(A; Y ).
We have shown that fn → f in the sense of the uniform metric du , where
f ∈ BC(A; Y ). Hence (BC(A; Y ), du ) is a complete metric space.
Corollary 12.3.4 Let (X, d) be a metric space and (Y, ρ) be a complete met-
ric space. Let A ⊂ X be compact. Then C(A; Y ) is a complete metric space
with the uniform metric du .
We will use the previous corollary to find solutions to (systems of) differ-
ential equations.
*Remarks
Footnote 8: For the following, recall
|∫_a^b g| ≤ ∫_a^b |g|,
and
f(x) ≤ g(x) for all x ∈ [a, b] ⇒ ∫_a^b f ≤ ∫_a^b g.
In particular, let
fn(x) = (n/2)x^2 + 1/(2n) for 0 ≤ |x| ≤ 1/n,
fn(x) = |x| for 1/n ≤ |x| ≤ 1.
Then the fn are differentiable on [−1, 1] (the only points to check are x =
±1/n), and fn → f uniformly since du(fn, f) ≤ 1/n.
Example 7 from Section 12.1 gives an example where fn → f uniformly
and f is differentiable, but f′n does not converge for most x. In fact, f′n(x) =
cos nx, which does not converge (unless x = 2kπ for some k ∈ Z (exercise)).
However, if the derivatives themselves converge uniformly to some limit,
then we have the following theorem.
Then f′ exists and is continuous on [a, b], and f′n → f′ uniformly on [a, b].
Moreover, fn → f uniformly on [a, b].
Footnote 9: Recall that the integral of a continuous function is differentiable, and the
derivative is just the original function.
Chapter 13
First Order Systems
The main result in this Chapter is the Existence and Uniqueness Theorem
for first order systems of (ordinary) differential equations. Essentially any
differential equation or system of differential equations can be reduced to a
first-order system, so the result is very general. The Contraction Mapping
Principle is the main ingredient in the proof.
The local Existence and Uniqueness Theorem for a single equation, to-
gether with the necessary preliminaries, is in Sections 13.3, 13.7–13.9. See
Sections 13.10 and 13.11 for the global result and the extension to systems.
These sections are independent of the remaining sections.
In Section 13.1 we give two interesting examples of systems of differential
equations.
In Section 13.2 we show how higher order differential equations (and more
generally higher order systems) can be reduced to first order systems.
In Sections 13.4 and 13.5 we discuss “geometric” ways of analysing and
understanding the solutions to systems of differential equations.
In Section 13.6 we give two examples to show the necessity of the condi-
tions assumed in the Existence and Uniqueness Theorem.
13.1 Examples
Suppose there are two species of animals, and let the populations at time t
be x(t) and y(t) respectively. We assume we can approximate x(t) and y(t)
by differentiable functions. Species x is eaten by species y. The rates of
dx/dt = ax − bxy − ex^2, (13.1)
dy/dt = −cy + dxy − fy^2.
We will return to this system later. It is first order, since only first
derivatives occur in the equation, and nonlinear, since some of the terms
involving the unknowns (or dependent variables) x and y occur in a nonlinear
way (namely the terms xy, x2 and y 2 ). It is a system of ordinary differential
equations since there is only one independent variable t, and so we only form
ordinary derivatives; as opposed to differential equations where there are two
or more independent variables, in which case the differential equation(s) will
involve partial derivatives.
If the spring obeys Hooke’s law, then the force is proportional to the dis-
placement, but acts in the opposite direction, and so
Force = −kx(t),
i.e.
mx″(t) + kx(t) = 0.
velocity x′, and so obtain the following first order system for the “unknowns”
x and y:
x′ = y,
y′ = −m^{-1}cy − m^{-1}kx. (13.3)
This is a first order system (linear in this case).
If x, y is a solution of (13.3) then it is clear that x is a solution of (13.2).
Conversely, if x is a solution of (13.2) and we define y(t) = x′(t), then x, y is
a solution of (13.3).
x1(t) = x(t)
x2(t) = x′(t)
x3(t) = x″(t)
. . .
xn(t) = x^{(n−1)}(t).
x = (x1, . . . , xn),
dx/dt = (dx1/dt, . . . , dxn/dt),
f(t, x) = f(t, x1, . . . , xn)
= (f^1(t, x1, . . . , xn), . . . , f^n(t, x1, . . . , xn)).
for some given t0 and some given x0 = (x10 , . . . , xn0 ). That is,
x(t0 ) = x0 .
Here, (t0 , x0 ) ∈ U .
The following diagram sketches the situation (schematically in the case
n > 1).
dx/dt = f(t, x), (13.6)
x(t0) = x0. (13.7)
We say x(t) = (x1 (t), . . . , xn (t)) is a solution of this initial value problem for
t in the interval I if:
1. t0 ∈ I,
2. x(t0 ) = x0 ,
Footnote 3: Sometimes it is convenient to allow U to be the closure of an open set.
x(t0 + h) ≈ x0 + hf (t0 , x0 ) =: x1 4
Similarly
P = (t0 , x0 )
Q = (t0 + h, x1 )
R = (t0 + 2h, x2 )
S = (t0 + 3h, x3 )
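The stepping scheme sketched above is Euler’s method, and it is easy to carry out numerically. A minimal sketch (the test problem x′ = x, x(0) = 1 is our own assumption, not an example from the text; its exact solution is e^t):

```python
import math

# One-variable Euler scheme x_{k+1} = x_k + h f(t_k, x_k): at each step we
# follow the tangent direction given by the differential equation.
def euler(f, t0, x0, h, steps):
    t, x = t0, x0
    for _ in range(steps):
        x = x + h * f(t, x)   # x(t + h) ~ x(t) + h f(t, x(t))
        t = t + h
    return x

# Approximate x(1) for x' = x, x(0) = 1; exact value is e.
approx = euler(lambda t, x: x, 0.0, 1.0, 0.001, 1000)
assert abs(approx - math.e) < 0.01
```

Halving h roughly halves the error, reflecting the first-order accuracy of the scheme.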
Direction field
Competing Species
Consider the case of two species whose populations at time t are x(t) and
y(t). Suppose they have a good food supply but fight each other whenever
they come into contact. By a discussion similar to that in Section 13.1.1,
their populations may be modelled by the equations
dx/dt = ax − bxy (= f^1(x, y)),
dy/dt = cy − dxy (= f^2(x, y)),
for suitable a, b, c, d > 0. Consider as an example the case a = 1000, b = 1,
c = 2000 and d = 1.
If a solution x(t), y(t) passes through a point (x, y) in phase space at some
time t, then the “velocity” of the path at this point is (f 1 (x, y), f 2 (x, y)) =
(x(1000 − y), y(2000 − x)). In particular, the path is tangent to the vector
(x(1000 − y), y(2000 − x)) at the point (x, y). The set of all such velocity
vectors (f 1 (x, y), f 2 (x, y)) at the points (x, y) ∈ R2 is called the velocity field
associated to the system of differential equations. Notice that as the example
we are discussing is autonomous, the velocity field is independent of time.
In the previous diagram we have shown some vectors from the velocity
field for the present system of equations. For simplicity, we have only shown
their directions in a few cases, and we have normalised each vector to have
the same length; we sometimes call the resulting vector field a direction field5 .
Footnote 5: Note the distinction between the direction field in phase space and the direction field
for the graphs of solutions as discussed in the last section.
Once we have drawn the velocity field (or direction field), we have a good
idea of the structure of the set of solutions, since each solution curve must
be tangent to the velocity field at each point through which it passes.
Next note that (f 1 (x, y), f 2 (x, y)) = (0, 0) if (x, y) = (0, 0) or (2000, 1000).
Thus the “velocity” (or rate of change) of a solution passing through either
of these pairs of points is zero. The pair of constant functions given by
x(t) = 2000 and y(t) = 1000 for all t is a solution of the system, and from
Theorem 13.10.1 is the only solution passing through (2000, 1000). Such a
constant solution is called a stationary solution or stationary point. In this
example the other stationary point is (0, 0) (this is not surprising!).
The stationary point (2000, 1000) is unstable in the sense that if we
change either population by a small amount away from these values, then
the populations do not converge back to these values. In this example, one
population will always die out. This is all clear from the diagram.
x(t) = t for t ≤ 1, and x(t) = 2t − 1 for t > 1.
Notice that x(t) satisfies the initial condition and also satisfies the differential equation provided t ≠ 1. But x(t) is not differentiable at t = 1.
If f is Lipschitz with respect to x in Ah,k (t0 , x0 ), for every set Ah,k (t0 , x0 ) ⊂ A
of the form
then we say f is locally Lipschitz with respect to x, (see the following dia-
gram).
[Diagram: the set Ah,k(t0, x0) ⊂ A, containing points (t, x1) and (t, x2).]
We could have replaced the sets Ah,k(t0, x0) by closed balls centred at
(t0, x0) without affecting the definition, since each such ball contains a set
Ah,k(t0, x0) for some h, k > 0, and conversely. We choose sets of the form
Ah,k(t0, x0) for later convenience.
The difference between being Lipschitz with respect to x and being locally
Lipschitz with respect to x is clear from the following Examples.
for some ξ between x1 and x2 , again using the Mean Value Theorem. If
x1 , x2 ∈ B for some bounded set B, in particular if B is of the form
{(t, x) : |t − t0 | ≤ h, |x − x0 | ≤ k}, then ξ is also bounded, and so f is lo-
cally Lipschitz in A. But f is not Lipschitz in A.
Proof: Let (t0 , x0 ) ∈ U . Since U is open, there exist h, k > 0 such that
Theorem 13.8.1 Assume the function x satisfies (t, x(t)) ∈ U for all t ∈ I,
where I is some closed bounded interval. Assume t0 ∈ I. Then x is a C 1
solution to (13.17), (13.18) in I iff x is a C 0 solution to the integral equation
x(t) = x0 + ∫_{t0}^t f(s, x(s)) ds (13.19)
in I.
Proof:
Footnote 6: If h exists and is continuous on I, t0 ∈ I and g(t) = ∫_{t0}^t h(s) ds for all t ∈ I, then g′
exists and g′ = h on I. In particular, g is C^1.
[Diagram: graph of x(t) through (t0, x0) over the interval [t0 − h, t0 + h].]
T : C ∗ [t0 − h, t0 + h] → C ∗ [t0 − h, t0 + h]
Footnote 7: If xn → x uniformly and |xn(t)| ≤ k for all t, then |x(t)| ≤ k for all t.
defined by
(T x)(t) = x0 + ∫_{t0}^t f(s, x(s)) ds for t ∈ [t0 − h, t0 + h]. (13.24)
Notice that the fixed points of T are precisely the solutions of (13.20).
We check that T is indeed a map into C ∗ [t0 − h, t0 + h] as follows:
(i) Since in (13.24) we are taking the definite integral of a continuous func-
tion, Corollary 11.6.4 shows that T x is a continuous function.
Since the contraction mapping theorem gives an algorithm for finding the
fixed point, this can be used to obtain approximations to the solution of the
differential equation. In fact the argument can be sharpened considerably.
At the step (13.25)
|(T x1)(t) − (T x2)(t)| ≤ ∫_{t0}^t K |x1(s) − x2(s)| ds
≤ K |t − t0| du(x1, x2).
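The iteration behind this estimate, x_{k+1} = T x_k, can be carried out numerically. A minimal sketch (the test problem x′ = x, x(0) = 1 is an assumption of ours, not an example from the text; we discretise the integral with the trapezoid rule):

```python
import math

# Picard iteration x_{k+1}(t) = x0 + integral from t0 to t of f(s, x_k(s)) ds,
# carried out on a grid. For x' = x, x(0) = 1 the iterates are the Taylor
# partial sums of e^t.
def picard_step(f, t0, x0, ts, xs):
    """One application of the operator T, using the trapezoid rule."""
    new = [x0]
    for i in range(1, len(ts)):
        h = ts[i] - ts[i - 1]
        new.append(new[-1] + 0.5 * h * (f(ts[i - 1], xs[i - 1]) + f(ts[i], xs[i])))
    return new

ts = [k / 1000 for k in range(1001)]   # grid on [0, 1]
xs = [1.0] * len(ts)                   # starting iterate x_0(t) = 1
for _ in range(20):                    # 20 applications of T
    xs = picard_step(lambda t, x: x, 0.0, 1.0, ts, xs)
assert abs(xs[-1] - math.e) < 1e-4     # x(1) should be close to e
```

The remaining error comes from the grid discretisation; the Picard iterates themselves converge extremely fast here.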
x′(t) = x^2,
x(0) = a,
Remark* The second alternative in the Theorem just says that the graph
of the solution eventually leaves any closed bounded A ⊂ U . We can think
of it as saying that the graph of the solution either escapes to infinity or
approaches the boundary of U as t → T .
The integral of the vector function on the right side is defined componentwise
in the natural way, i.e.
∫_{t0}^t f(s, x(s)) ds := ( ∫_{t0}^t f^1(s, x(s)) ds, . . . , ∫_{t0}^t f^n(s, x(s)) ds ).
The proof of equivalence is essentially the proof in Section 13.8 for the single
equation case, applied to each component separately.
Solutions of the integral equation are precisely the fixed points of the
operator T , where
(T x)(t) = x0 + ∫_{t0}^t f(s, x(s)) ds, t ∈ [t0 − h, t0 + h].
Chapter 14
Fractals
14.1 Examples
In general,
This is quite plausible, and will easily follow after we make precise the limiting
process used to define K.
A = A(0) = [0, 1]
A(1) = [0, 1/3] ∪ [2/3, 1]
A(2) = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]
. . .
Let C = ⋂_{n≥0} A(n). Since C is the intersection of a family of closed sets, C
is closed.
Note that A(n+1) ⊂ A(n) for all n and so the A(n) form a decreasing family
of sets.
Consider the ternary expansion of numbers x ∈ [0, 1], i.e. write each
x ∈ [0, 1] in the form
x = .a1a2 . . . an . . . = a1/3 + a2/3^2 + · · · + an/3^n + · · · (14.1)
Next let
S1(x) = x/3, S2(x) = 1 + (x − 1)/3.
Notice that S1 is a dilation with dilation ratio 1/3 and fixed point 0. Similarly,
S2 is a dilation with dilation ratio 1/3 and fixed point 1.
Then
A(n+1) = S1 [A(n) ] ∪ S2 [A(n) ].
Moreover,
C = S1 [C] ∪ S2 [C].
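The construction of the sets A(n) from S1 and S2 can be reproduced exactly in code. A sketch using exact rational arithmetic (not part of the text):

```python
from fractions import Fraction

# Build A^(n+1) = S1[A^(n)] u S2[A^(n)] from the similitudes
# S1(x) = x/3 and S2(x) = 1 + (x - 1)/3, applied interval by interval.
def next_level(intervals):
    def S1(x):
        return x / 3
    def S2(x):
        return 1 + (x - 1) / 3
    out = []
    for a, b in intervals:
        out.append((S1(a), S1(b)))
        out.append((S2(a), S2(b)))
    return sorted(out)

A = [(Fraction(0), Fraction(1))]   # A^(0) = [0, 1]
for _ in range(2):
    A = next_level(A)

# A^(2) = [0, 1/9] u [2/9, 1/3] u [2/3, 7/9] u [8/9, 1]
assert A == [(Fraction(0), Fraction(1, 9)), (Fraction(2, 9), Fraction(1, 3)),
             (Fraction(2, 3), Fraction(7, 9)), (Fraction(8, 9), Fraction(1))]
```

Each level doubles the number of intervals while scaling their lengths by 1/3, which is the combinatorics behind the dimension computation below.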
The Sierpinski Sponge P is obtained by first drilling out from the closed
unit cube A = A(0) = [0, 1]×[0, 1]×[0, 1], the three open, square cross-section,
tubes
S = D ◦ T ◦ O,
This proves the result, including the last statement of the Theorem.
K = K1 ∪ · · · ∪ KN ,
Ki = Si [K]
where each Si is a similitude with dilation ratio ri > 0. See the following
diagram for a few examples.
Footnote 6: In the sense that the Hausdorff dimension of the intersection is less than k.
[Diagram: several examples of sets K decomposed as K = K1 ∪ · · · ∪ KN.]
N r^k = 1,
and so
k = log N / log(1/r).
Thus we have a formula for the dimension k in terms of the number N of
“almost disjoint” sets Ki whose union is K, and the dilation ratio r used to
obtain each Ki from K.
More generally, if the ri are not all equal, the dimension k can be deter-
mined from N and the ri as follows. Define
g(p) = Σ_{i=1}^N r_i^p.
Then g(0) = N (> 1), g is a strictly decreasing function (assuming 0 < ri <
1), and g(p) → 0 as p → ∞. It follows there is a unique value of p such that
g(p) = 1, and from (14.4) this value of p must be the dimension k.
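The unique root of g(p) = 1 can be found by bisection, precisely because g is strictly decreasing with g(0) = N > 1 and g(p) → 0. A sketch (the helper name `similarity_dimension` is ours, not the text's):

```python
import math

# Solve sum_i r_i^p = 1 for p: g is strictly decreasing, so bisection applies.
def similarity_dimension(ratios, tol=1e-12):
    def g(p):
        return sum(r**p for r in ratios)
    lo, hi = 0.0, 1.0
    while g(hi) > 1:                  # grow hi until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 1:                # g decreasing: root lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Equal ratios recover k = log N / log(1/r):
assert abs(similarity_dimension([1/3, 1/3]) - math.log(2) / math.log(3)) < 1e-9   # Cantor set
assert abs(similarity_dimension([1/3] * 4) - math.log(4) / math.log(3)) < 1e-9    # Koch curve
```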
The preceding considerations lead to the following definition:
K = S1[K] ∪ · · · ∪ SN[K],
where the Si are similitudes with dilation ratios 0 < ri < 1. Then the
similarity dimension of K is the unique real number D such that
1 = Σ_{i=1}^N r_i^D.
Remarks This is only a good definition if the sets Si [K] are “almost” dis-
joint in some sense (otherwise different decompositions may lead to different
values of D). In this case one can prove that the similarity dimension and the
Hausdorff dimension are equal. The advantage of the similarity dimension is
that it is easy to calculate.
Variants on the Koch Curve Let K be the Koch curve. We have seen
how we can write
K = S1 [K] ∪ · · · ∪ S4 [K].
K = S1 [K] ∪ S2 [K]
for suitable other choices of similitudes S1 , S2 . Here S1 [K] is the left side of
the Koch curve, as shown in the next diagram, and S2[K] is the right side.
Either O = [cos θ, −sin θ; sin θ, cos θ], i.e. O is a rotation by θ in an
anticlockwise direction, or O = [cos θ, −sin θ; −sin θ, −cos θ], i.e. O
is a rotation by θ in an anticlockwise direction followed by reflection in the
x-axis.
For a given fractal it is often a simple matter to work “backwards” and
find a corresponding family of similitudes. One needs to find S1 , . . . , SN such
that
K = S1 [K] ∪ · · · ∪ SN [K].
If equality is only approximately true, then it is not hard to show that the
fractal generated by S1 , . . . , SN will be approximately equal to K 11 .
In this way, complicated structures can often be encoded in very efficient
ways. The point is to find appropriate S1 , . . . , SN . There is much applied
and commercial work (and venture capital!) going into this problem.
Aε = {x ∈ Rn : d(x, A) ≤ ε}.
1. A ⊂ B ⇒ Aε ⊂ Bε.
5. ⋂_{ε>δ} Aε = Aδ. (14.9)
To see this,12 first note that Aδ ⊂ ⋂_{ε>δ} Aε, since Aδ ⊂ Aε whenever
ε > δ. On the other hand, if x ∈ ⋂_{ε>δ} Aε then x ∈ Aε for all ε > δ.
Hence d(x, A) ≤ ε for all ε > δ, and so d(x, A) ≤ δ. That is, x ∈ Aδ.
Footnote 12: The result is not completely obvious. Suppose we had instead defined Aε =
{x ∈ Rn : d(x, A) < ε}. Let A = [0, 1] ⊂ R. With this changed definition we would
have Aε = (−ε, 1 + ε), and so
⋂_{ε>δ} Aε = ⋂_{ε>δ} (−ε, 1 + ε) = [−δ, 1 + δ] ≠ Aδ.
Elementary Properties of dH
1. (Exercise) If E, F, G, H ⊂ Rn then
The Hausdorff metric is not a metric on the set of all subsets of Rn. For
example, in R we have
dH((a, b), [a, b]) = 0.
Thus the distance between two non-equal sets is 0. But if we restrict to
compact sets, then dH is indeed a metric, and moreover it makes K into a
complete metric space.
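For finite sets the suprema and infima in the definition of dH are maxima and minima, so the Hausdorff distance can be computed directly. A sketch in the plane (the sets A and B are our own illustrative choices):

```python
import math

# d_H(A, B) = max( sup_{a in A} d(a, B), sup_{b in B} d(b, A) ),
# computed exactly for finite subsets of R^2.
def hausdorff(A, B):
    def d(p, q):
        return math.dist(p, q)
    return max(max(min(d(a, b) for b in B) for a in A),
               max(min(d(a, b) for a in A) for b in B))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (0.5, 2.0)]
# Every point of A lies in B, but (0.5, 2.0) is at distance sqrt(0.25 + 4) from A.
assert abs(hausdorff(A, B) - math.sqrt(4.25)) < 1e-12
assert hausdorff(A, A) == 0.0
```

Note the asymmetry of the two one-sided suprema is removed by taking their maximum, which is what makes dH symmetric.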
Proof: (a) We first prove the three properties of a metric from Defini-
tion 6.2.1. In the following, all sets are compact and non-empty.
(b) Assume (Ai )i≥1 is a Cauchy sequence (of compact non-empty sets)
from K.
Let
Cj = the closure of ⋃_{i≥j} Ai,
for j = 1, 2, . . .. Then the Cj are closed and bounded,14 and hence compact.
Moreover, the sequence (Cj) is decreasing, i.e.
Cj ⊂ Ck
Footnote 14: This follows from the fact that (Ak) is a Cauchy sequence.
if j ≥ k.
Let
C = ⋂_{j≥1} Cj.
Chapter 15
Compactness
15.1 Definitions
In Definition 9.3.1 we defined the notion of a compact subset of a metric
space. As noted following that Definition, the notion defined there is usually
called sequential compactness.
We now give another definition of compactness, in terms of coverings
by open sets (which applies to any topological space)1 . We will show that
compactness and sequential compactness agree for metric spaces. (There are
examples to show that neither implies the other in an arbitrary topological
space.)
K ⊂ U1 ∪ . . . ∪ UN
for some U1 , . . . , UN ∈ F. That is, every open cover has a finite subcover.
If X is compact, we say the metric space itself is compact.
Let A = {xn }. Note that A may be finite (in case there are only a finite
number of distinct terms in the sequence).
(1) Claim: If A is finite, then some subsequence of (xn ) converges.
Proof: If A is finite there is only a finite number of distinct terms in the
sequence. Thus there is a subsequence of (xn ) for which all terms are equal.
This subsequence converges to the common value of all its terms.
(2) Claim: If A is infinite, then A has at least one limit point.
Proof: Assume A has no limit points.
It follows that Ā = A from Definition 6.3.4, and so A is closed by Theorem 6.4.6.
It also follows that each a ∈ A is not a limit point of A, and so from
Definition 6.3.3 there exists a neighbourhood Va of a (take Va to be some
open ball centred at a) such that
Va ∩ A = {a}. (15.1)
In particular,

X = Aᶜ ∪ ⋃_{a∈A} V_a
(4) Claim:³ For each integer k there is a finite set {x₁, . . . , x_N} ⊂ X such that

x ∈ X ⇒ d(xᵢ, x) < 1/k for some i = 1, . . . , N.
X = Vn
for some n.
Suppose not. Then there exists a sequence (xₙ) where xₙ ∉ Vₙ for each n. By assumption of sequential compactness, some subsequence (x_{n′}) converges to x, say. Since G is a cover of X, it follows x ∈ U_N, say, and so x ∈ V_N. But V_N is open and so

x_{n′} ∈ V_N for n′ > M,   (15.3)

for some M.
On the other hand, xₖ ∉ Vₖ for all k, and so

xₖ ∉ V_N   (15.4)
d(Y) = sup{d(y, y′) : y, y′ ∈ Y}.

Note this is not necessarily the same as the diameter of the smallest ball containing the set; however, Y ⊆ B_{d(Y)}(y) for any y ∈ Y.
⁴ It is not true in general that closed and bounded sets are compact. In
Remark 2 of Section 9.2 we see that the set
We also have the following facts about continuous functions and compact
sets:
by
f (θ) = (cos θ, sin θ).
Then f is clearly continuous (assuming the functions cos and sin are continu-
ous), one-one and onto. But f −1 is not continuous, as we easily see by finding
a sequence xₙ (∈ S¹) → (1, 0) (∈ S¹) such that f⁻¹(xₙ) ↛ f⁻¹((1, 0)) = 0.
Compactness 191
[Diagram: points x₁, x₂, . . . , xₙ on the circle S¹ approaching (1, 0), whose images f⁻¹(x₁), f⁻¹(x₂), . . . in [0, 2π) approach 2π, while f⁻¹((1, 0)) = 0.]
Note that [0, 2π) is not compact (exercise: prove directly from the def-
inition of sequential compactness that it is not sequentially compact, and
directly from the definition of compactness that it is not compact).
Remark If necessary, we can assume that the centres of the balls belong
to A.
To see this, first cover A by balls of radius δ/2, as in the Definition. Let
the centres be x1 , . . . , xN . If the ball Bδ/2 (xi ) contains some point ai ∈ A,
then we replace the ball by the larger ball Bδ (ai ) which contains it. If Bδ/2 (xi )
contains no point from A then we discard this ball. In this way we obtain a
finite cover of A by balls of radius δ with centres in A.
That X is totally bounded follows from the observation that the set of all balls B_δ(x), where x ∈ X, is an open cover of X, and so has a finite subcover by compactness of X.
Definition 15.6.1 Let (X, d) and (Y, ρ) be metric spaces. Let F be a family
of functions from X to Y .
Then F is equicontinuous at the point x ∈ X if for every ε > 0 there exists δ > 0 such that
Remarks
Example 3 In the first example, equicontinuity followed from the fact that
the families of functions had a uniform Lipschitz bound.
More generally, families of Hölder continuous functions with a fixed exponent α and a fixed constant M (see Definition 11.3.2) are also uniformly equicontinuous. This follows from the fact that in the definition of uniform equicontinuity we can take δ = (ε/M)^{1/α}.
Proof: Suppose ε > 0. For each x ∈ X there exists δ_x > 0 (where δ_x may depend on x as well as ε) such that
for all f ∈ F.
The family of all balls B(x, δx /2) = Bδx /2 (x) forms an open cover of X.
By compactness there is a finite subcover B₁, . . . , B_N by open balls with centres x₁, . . . , x_N and radii δ₁/2 = δ_{x₁}/2, . . . , δ_N/2 = δ_{x_N}/2, say.
Let
δ = min{δ1 , . . . , δN }.
Take any x, x′ ∈ X with d(x, x′) < δ/2.
Then d(xᵢ, x) < δᵢ/2 for some xᵢ, since the balls Bᵢ = B(xᵢ, δᵢ/2) cover X. Moreover,

d(xᵢ, x′) ≤ d(xᵢ, x) + d(x, x′) < δᵢ/2 + δ/2 ≤ δᵢ.

In particular, both x, x′ ∈ B(xᵢ, δᵢ).
It follows that for all f ∈ F,
Remarks
Claim: C^α_{M,K}(X; Rⁿ) is closed, uniformly bounded and uniformly equicontinuous, and hence compact by the Arzelà-Ascoli Theorem.

We saw in Example 3 of the previous Section that C^α_{M,K}(X; Rⁿ) is equicontinuous.

Boundedness is immediate, since the distance from any f ∈ C^α_{M,K}(X; Rⁿ) to the zero function is at most K (in the sup metric).
In order to show closure in C(X; Rⁿ), suppose that

fₙ ∈ C^α_{M,K}(X; Rⁿ)
for n = 1, 2, . . ., and
fn → f uniformly as n → ∞,
Remark The Arzela-Ascoli Theorem implies that any sequence from the
class LipM,K [a, b] has a convergent subsequence. This is not true for the set
CK [a, b] of all continuous functions f from C[a, b] merely satisfying sup |f | ≤
K. For example, consider the sequence of functions (fn ) defined by
fₙ(x) = sin nx,  x ∈ [0, 2π].

No subsequence of (fₙ) converges uniformly. On the other hand, if we instead take fₙ(x) = n⁻¹ sin nx, then the absolute value of the derivatives, and hence the Lipschitz constants, are uniformly bounded by 1. In this case the entire sequence converges uniformly to the zero function, as is easy to see.
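This contrast can be checked numerically by approximating the sup metric on a sampling grid (the grid approximation and the particular indices 5 and 50 are our own choices):

```python
import math

# Approximate the sup metric on C[0, 2pi] by sampling on a fine grid.
ts = [2 * math.pi * k / 1000 for k in range(1001)]

def sup_dist(f, g):
    return max(abs(f(t) - g(t)) for t in ts)

# sin(5x) and sin(50x) stay far apart in the sup metric, consistent with
# the absence of a uniformly convergent subsequence of (sin nx) ...
d = sup_dist(lambda t: math.sin(5 * t), lambda t: math.sin(50 * t))

# ... while n^{-1} sin(nx) converges uniformly to 0, since |sin(nx)/n| <= 1/n.
e = sup_dist(lambda t: math.sin(50 * t) / 50, lambda t: 0.0)
```

Here `d` is close to 2, while `e` is at most 1/50.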
d(x, xi ) < δ1
and hence
α : {x1 , . . . , xp } → {y1 , . . . , yq }.
then choose one such f and label it gα . Let S be the set of all gα . Thus
Then
|f(x) − g_α(x)| ≤ |f(x) − f(xᵢ)| + |f(xᵢ) − α(xᵢ)| + |α(xᵢ) − g_α(xᵢ)| + |g_α(xᵢ) − g_α(x)|
               < 4 × δ/4    from (15.6), (15.9), (15.10) and (15.8)
               = δ.
This establishes (15.5) since x was an arbitrary member of X. Thus F is
totally bounded.
Remark We saw in Theorem 13.9.1 that the integral equation does indeed
have a solution, assuming f is also locally Lipschitz in x. The proof used the
Contraction Mapping Principle. But if we merely assume continuity of f ,
then that proof no longer applies (if it did, it would also give uniqueness,
which we have just remarked is not the case).
In the following proof, we show that some subsequence of the sequence
of Euler polygons, first constructed in Section 13.4, converges to a solution
of the integral equation.
Proof of Theorem
(1) Choose h, k > 0 so that

h ≤ k/M.   (15.13)
(2) (See the diagram for n = 3.) For each integer n ≥ 1, let xₙ(t) be the piecewise linear function, defined as in Section 13.4, but with step-size h/n. More precisely, if t ∈ [t₀, t₀ + h],

xₙ(t) = x₀ + (t − t₀) f(t₀, x₀)   for t ∈ [t₀, t₀ + h/n],

xₙ(t) = xₙ(t₀ + h/n) + (t − (t₀ + h/n)) f(t₀ + h/n, xₙ(t₀ + h/n))   for t ∈ [t₀ + h/n, t₀ + 2h/n],

xₙ(t) = xₙ(t₀ + 2h/n) + (t − (t₀ + 2h/n)) f(t₀ + 2h/n, xₙ(t₀ + 2h/n))   for t ∈ [t₀ + 2h/n, t₀ + 3h/n],

⋮
(3) From (15.12) and (15.13), and as is clear from the diagram, |(d/dt) xₙ(t)| ≤ M (except at the points t₀, t₀ ± h/n, t₀ ± 2h/n, . . .). It follows (exercise) that xₙ is Lipschitz on [t₀ − h, t₀ + h] with Lipschitz constant at most M.
In particular, since k ≥ M h, the graph of t 7→ xn (t) remains in the closed
rectangle Ah,k (t0 , x0 ) for t ∈ [t0 − h, t0 + h].
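The Euler polygons of step (2) are easy to generate numerically. Below is a sketch; the right-hand side f(t, x) = x with t₀ = 0, x₀ = 1 (exact solution eᵗ) is our own illustrative choice, not from the text:

```python
# Nodes of the Euler polygon x_n on [t0, t0 + h], with step-size h/n,
# for the ODE x' = f(t, x), x(t0) = x0.
def euler_polygon(f, t0, x0, h, n):
    step = h / n
    ts, xs = [t0], [x0]
    for _ in range(n):
        # each segment has slope f evaluated at the left endpoint
        xs.append(xs[-1] + step * f(ts[-1], xs[-1]))
        ts.append(ts[-1] + step)
    return ts, xs

# Illustrative choice (assumption): f(t, x) = x, so the exact solution is e^t.
ts, xs = euler_polygon(lambda t, x: x, 0.0, 1.0, 1.0, 1000)
approx_e = xs[-1]   # (1 + 1/1000)^1000, close to e = 2.71828...
```

As n grows, the polygon converges to the solution, in line with the convergence of a subsequence proved in this section (here the whole sequence converges because f is Lipschitz).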
(4) From (3), the functions xn belong to the family F of Lipschitz func-
tions
f : [t0 − h, t0 + h] → R
such that
Lipf ≤ M
and
|f (t) − x0 | ≤ k for all t ∈ [t0 − h, t0 + h].
But F is closed, uniformly bounded, and uniformly equicontinuous, by the same argument as used in Example 1 of Section 15.7. It follows from the Arzelà-Ascoli Theorem that some subsequence (x_{n′}) of (xₙ) converges uniformly to a function x ∈ F.
Our aim now is to show that x is a solution of (15.11).
(5) For each point (t, xₙ(t)) on the graph of xₙ, let Pⁿ(t) ∈ R² be the coordinates of the point at the left (right) endpoint of the corresponding line segment if t ≥ t₀ (t ≤ t₀). More precisely,

Pⁿ(t) = (t₀ + (i − 1)h/n, xₙ(t₀ + (i − 1)h/n))   if t ∈ [t₀ + (i − 1)h/n, t₀ + i h/n],

for i = 1, . . . , n. A similar formula is true for t ≤ t₀.
Notice that Pⁿ(t) is constant for t ∈ [t₀ + (i − 1)h/n, t₀ + i h/n), (and in particular Pⁿ(t) is of course not continuous in [t₀ − h, t₀ + h]).
(6) Without loss of generality, suppose t ∈ [t₀ + (i − 1)h/n, t₀ + i h/n]. Then from (5) and (3),

|Pⁿ(t) − (t, xₙ(t))| ≤ √[(t − (t₀ + (i − 1)h/n))² + (xₙ(t) − xₙ(t₀ + (i − 1)h/n))²]
                    ≤ √[(h/n)² + (M h/n)²]
                    = √(1 + M²) · h/n.
Thus |P n (t) − (t, xn (t))| → 0, uniformly in t, as n → ∞.
(7) It follows from the definitions of xₙ and Pⁿ, and is clear from the diagram, that

xₙ(t) = x₀ + ∫_{t₀}^{t} f(Pⁿ(s)) ds.   (15.14)
|f (s, x(s)) − f (P n (s))| ≤ |f (s, x(s)) − f (s, xn (s))| + |f (s, xn (s)) − f (P n (s))|.
From (6), |(s, xₙ(s)) − Pⁿ(s)| < δ for all n ≥ N₁ (say), independently of s. From uniform convergence (4), |x(s) − x_{n′}(s)| < δ for all n′ ≥ N₂ (say), independently of s. By the choice of δ it follows

|f(s, x(s)) − f(P^{n′}(s))| < 2ε,   (15.15)
(10) From (4), the left side of (15.14) converges to the left side of (15.11), for the subsequence (x_{n′}).
From (15.15), the difference of the right sides of (15.14) and (15.11) is bounded by 2εh for members of the subsequence (x_{n′}) such that n′ ≥ N(ε).
As ² is arbitrary, it follows that for this subsequence, the right side of (15.14)
converges to the right side of (15.11).
This establishes (15.11), and hence the Theorem.
Chapter 16
Connectedness
16.1 Introduction
1. iff there do not exist two non-empty disjoint closed sets U and V such
that X = U ∪ V ;
2. iff the only non-empty subset of X which is both open and closed1 is X
itself.
16.3 Connectedness in R
Not surprisingly, the connected sets in R are precisely the intervals in R.
We first need a precise definition of interval.
S = U ∪ V, U ∩ V = ∅.
Definition 16.4.2 A metric space (X, d) is path connected if any two points
in X can be connected by a path in X.
A set S ⊂ X is path connected if the metric subspace (S, d) is path
connected.
The notion of path connected may seem more intuitive than that of con-
nected. However, the latter is usually mathematically easier to work with.
Every path connected set is connected (Theorem 16.4.3). A connected
set need not be path connected (Example (3) below), but for open subsets
of Rn (an important case) the two notions of connectedness are equivalent
(Theorem 16.4.4).
3
We want to show that X is not path connected.
Connectedness 211
Examples
3. Let
Then it is easy to see that h is a continuous path in U from a to y (the main point is to
check what happens at t = 1/2).
E = f[X] ∩ E′ = f[X] ∩ E′′.
In particular,
f⁻¹[E] = f⁻¹[E′] = f⁻¹[E′′],

and so f⁻¹[E] is both open and closed in X. Since E ≠ ∅, f[X], it follows that f⁻¹[E] ≠ ∅, X. Hence X is not connected, contradiction.
Thus f [X] is connected.
Chapter 17

Differentiation of Real-Valued Functions
17.1 Introduction
In this Chapter we discuss the notion of derivative (i.e. differential) for functions f : D (⊂ Rⁿ) → R. In the next chapter we consider the case for functions f : D (⊂ Rⁿ) → Rᵐ.
We can represent such a function (m = 1) by drawing its graph, as is done
in the first diagrams in Section 10.1 in case n = 1 or n = 2, or as is done
“schematically” in the second last diagram in Section 10.1 for arbitrary n.
In case n = 2 (or perhaps n = 3) we can draw the level sets, as is done in
Section 17.6.
y · x = y 1 x1 + . . . + y n xn
L : Rn → R
L(x) = y · x ∀x ∈ Rn . (17.1)
y i = L(ei ) i = 1, . . . , n.
Then
L(x) = L(x1 e1 + · · · + xn en )
= x1 L(e1 ) + · · · + xn L(en )
= x1 y 1 + · · · + xn y n
= y · x.
L(ei ) = y i i = 1, . . . , n.
Note that if L is the zero operator , i.e. if L(x) = 0 for all x ∈ Rn , then
the vector y corresponding to L is the zero vector.
Differential Calculus for Real-Valued Functions 215
[Diagram: the line L through x in the direction eᵢ, with the points x and x + teᵢ marked.]
Thus ∂f/∂xⁱ(x) is just the usual derivative at t = 0 of the real-valued function g defined by g(t) = f(x¹, . . . , xⁱ + t, . . . , xⁿ). Think of g as being defined along the line L, with t = 0 corresponding to the point x.
[Diagram: the line L through x in the direction v, lying in D, with the point x + tv marked.]
[Diagram: the graph of x ↦ f(x) together with the graph of the tangent line x ↦ f(a) + f′(a)(x − a), showing the two values at a point x near a.]
Note that the right-hand side of (17.5) is linear in x. (More precisely, the
right side is a polynomial in x of degree one.)
The error, or difference between the two sides of (17.5), approaches zero
as x → a, faster than |x − a| → 0. More precisely
|f(x) − (f(a) + f′(a)(x − a))| / |x − a| = |(f(x) − f(a))/(x − a) − f′(a)|
                                         → 0 as x → a.   (17.6)
We make this the basis for the next definition in the case n > 1.
[Diagram: the graphs of x ↦ f(x) and x ↦ f(a) + L(x − a), with the error |f(x) − (f(a) + L(x − a))| shown at a point x near a.]
Proof:
Thus df(a) is the linear map corresponding to the row vector (2a₁ + 3a₂², 6a₁a₂ + 3a₂²a₃, a₂³ + 1).

If a = (1, 0, 1) then ⟨df(a), v⟩ = 2v₁ + v₃. Thus df(a) is the linear map corresponding to the row vector (2, 0, 1).

If a = (1, 0, 1) and v = e₁ then ⟨df(1, 0, 1), e₁⟩ = ∂f/∂x¹(1, 0, 1) = 2.
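These values can be checked with difference quotients. The particular f below, f(x₁, x₂, x₃) = x₁² + 3x₁x₂² + x₂³x₃ + x₃, is our own reconstruction from the stated row vector (any f with those partials would do):

```python
# Check the row vector (2, 0, 1) at a = (1, 0, 1) by central differences.
def f(x1, x2, x3):
    # reconstructed so that df = (2x1 + 3x2^2, 6x1x2 + 3x2^2 x3, x2^3 + 1)
    return x1**2 + 3*x1*x2**2 + x2**3*x3 + x3

def partial(f, a, i, h=1e-6):
    """Central difference approximation to the ith partial derivative at a."""
    ap, am = list(a), list(a)
    ap[i] += h
    am[i] -= h
    return (f(*ap) - f(*am)) / (2 * h)

a = (1.0, 0.0, 1.0)
row = [partial(f, a, i) for i in range(3)]   # approximately (2, 0, 1)
```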
|ψ(x)|/|x − a| → 0 as x → a,
and
O(|x − a|) for sin(x − a).
Clearly, if ψ(x) can be written as o(|x − a|) then it can also be written as
O(|x − a|), but the converse may not be true as the above example shows.
Then
f (x) = f (a) + hdf (a), x − ai + ψ(x),
and ψ(x) = o(|x − a|) from Definition 17.5.1.
Conversely, suppose
Finally we have:
The previous proposition corresponds to the fact that the partial deriva-
tives for f + g are the sum of the partial derivatives corresponding to f and
g respectively. Similarly for αf.¹
Dv f (x) = v · ∇f (x).
For example, the contour lines on a map are the level sets of the height
function.
[Diagrams: level sets of f(x) = x₁² + x₂² (the circles x₁² + x₂² = 0.8 and x₁² + x₂² = 2.5, with ∇f(x) normal to the level set through x), and level sets of f(x) = x₁² − x₂² (the hyperbolas x₁² − x₂² = −0.5, 0, 2).]
Proof: This is immediate from the previous Definition and Proposition 17.6.3.
(2) An example where the directional derivatives at some point all exist, but
the function is not differentiable at the point.
Let

f(x, y) = xy²/(x² + y⁴) if (x, y) ≠ (0, 0),   f(0, 0) = 0.
Thus the directional derivatives Dv f (0, 0) exist for all v, and are given
by (17.9).
In particular
∂f/∂x(0, 0) = ∂f/∂y(0, 0) = 0.   (17.10)
But if f were differentiable at (0, 0), then we could compute any directional derivative from the partial derivatives. Thus for any vector v we would have
(3) An Example where the directional derivatives at a point all exist, but
the function is not continuous at the point
Take the same example as in (2). Approach the origin along the curve
x = λ², y = λ. Then

lim_{λ→0} f(λ², λ) = lim_{λ→0} λ⁴/(2λ⁴) = 1/2.
But if we approach the origin along any straight line of the form (λv1 , λv2 ),
then we can check that the corresponding limit is 0.
Thus it is impossible to define f at (0, 0) in order to make f continuous
there.
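A quick numerical check of the two limits (the evaluation points are our own choices):

```python
# f(x, y) = x y^2 / (x^2 + y^4) away from the origin, f(0, 0) = 0.
def f(x, y):
    return 0.0 if x == 0.0 and y == 0.0 else x * y**2 / (x**2 + y**4)

# Along the parabola x = t^2, y = t, the value is identically 1/2 ...
along_parabola = [f(t**2, t) for t in (0.1, 0.01, 0.001)]

# ... while along the straight line (t, t) the values tend to 0.
along_line = [f(t, t) for t in (0.1, 0.01, 0.001)]
```

So f takes the value 1/2 arbitrarily close to (0, 0), while its limit along every straight line through the origin is 0, which is why no choice of f(0, 0) makes f continuous there.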
Then
[Diagram: the line segment L from a = g(0) to a + h = g(1), with a point x on L.]
Proof: Note that (17.12) follows immediately from (17.11) by Corollary 17.5.3.
Define the one variable function g by
g(t) = f (a + th).
By the usual Mean Value Theorem for a function of one variable, applied
to g, we have
g(1) − g(0) = g′(t)   (17.16)
for some t ∈ (0, 1).
Substituting (17.13) and (17.15) in (17.16), the required result (17.11)
follows.
Corollary 17.9.2 Assume the hypotheses of the previous theorem and sup-
pose |∇f (x)| ≤ M for all x ∈ L. Then
|f (a + h) − f (a)| ≤ M |h|
E = {x ∈ Ω : f (x) = α}.
Then E is non-empty (as a ∈ E). We will prove E is both open and closed
in Ω. Since Ω is connected, this will imply that E is all of Ω.³ This establishes
the result.
To see E is open 4 , suppose x ∈ E and choose r > 0 so that Br (x) ⊂ Ω.
If y ∈ Br (x), then from (17.11) for some u between x and y,
Proof: We prove the theorem in case n = 2 (the proof for n > 2 is only
notationally more complicated).
Suppose that the partial derivatives of f exist and are continuous in Ω.
Then if a ∈ Ω and a + h is sufficiently close to a,
f(a¹ + h¹, a² + h²) = f(a¹, a²)
                      + f(a¹ + h¹, a²) − f(a¹, a²)
                      + f(a¹ + h¹, a² + h²) − f(a¹ + h¹, a²)
                    = f(a¹, a²) + ∂f/∂x¹(ξ¹, a²)h¹ + ∂f/∂x²(a¹ + h¹, ξ²)h²,
for some ξ 1 between a1 and a1 + h1 , and some ξ 2 between a2 and a2 + h2 . The
first partial derivative comes from applying the usual Mean Value Theorem,
for a function of one variable, to the function f (x1 , a2 ) obtained by fixing
a2 and taking x1 as a variable. The second partial derivative is similarly
obtained by considering the function f (a1 + h1 , x2 ), where a1 + h1 is fixed
and x2 is variable.
[Diagram: the points (a¹, a²), (a¹ + h¹, a²) and (a¹ + h¹, a² + h²) in Ω, together with the intermediate points (ξ¹, a²) and (a¹ + h¹, ξ²).]
Hence

f(a¹ + h¹, a² + h²) = f(a¹, a²) + ∂f/∂x¹(a¹, a²)h¹ + ∂f/∂x²(a¹, a²)h²
                      + (∂f/∂x¹(ξ¹, a²) − ∂f/∂x¹(a¹, a²)) h¹
                      + (∂f/∂x²(a¹ + h¹, ξ²) − ∂f/∂x²(a¹, a²)) h²
                    = f(a¹, a²) + L(h) + ψ(h), say.
∂²f/∂xʲ∂xⁱ  or  f_{ij}  or  D_{ij}f.
If all first and second partial derivatives of f exist and are continuous in Ω⁵ we write

f ∈ C²(Ω).
∂²f/∂xⁱ∂xʲ = ∂²f/∂xʲ∂xⁱ,

and so

∂³f/∂xⁱ∂xʲ∂xᵏ = ∂³f/∂xʲ∂xⁱ∂xᵏ = ∂³f/∂xʲ∂xᵏ∂xⁱ, etc.
⁵ In fact, it is sufficient to assume just that the second partial derivatives are continuous.
For under this assumption, each ∂f /∂xi must be differentiable by Theorem 17.10.1 applied
to ∂f /∂xi . From Proposition 17.8.1 applied to ∂f /∂xi it then follows that ∂f /∂xi is
continuous.
Theorem 17.11.1 If f ∈ C¹(Ω)⁶ and both f_{ij} and f_{ji} exist and are continuous (for some i ≠ j) in Ω, then f_{ij} = f_{ji} in Ω.
In particular, if f ∈ C²(Ω) then f_{ij} = f_{ji} for all i ≠ j.
[Diagram: a rectangle with corners labelled A, B, C, D.]
g′(x²) = ∂f/∂x²(a¹ + h, x²) − ∂f/∂x²(a¹, x²)   (17.19)

for a² ≤ x² ≤ a² + h.
Applying the mean value theorem for a function of a single variable
to (17.18), we see from (17.19) that
A(h) = (1/h) g′(ξ²)   for some ξ² ∈ (a², a² + h)
     = (1/h) (∂f/∂x²(a¹ + h, ξ²) − ∂f/∂x²(a¹, ξ²)).   (17.20)
⁶ As usual, Ω is assumed to be open.
Applying the mean value theorem again to the function ∂f/∂x²(x¹, ξ²), with ξ² fixed, we see

A(h) = ∂²f/∂x¹∂x²(ξ¹, ξ²)   for some ξ¹ ∈ (a¹, a¹ + h).   (17.21)
A(h) = ∂²f/∂x²∂x¹(η¹, η²)   (17.23)
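The equality f_{ij} = f_{ji} is easy to test numerically with nested difference quotients; the sample function below is our own choice of a C² function:

```python
import math

def f(x, y):
    return math.sin(x * y) + x**3 * y**2   # an arbitrary smooth test function

def dx(g, h):
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def dy(g, h):
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

# Use different inner/outer step sizes so the two stencils are genuinely
# independent estimates of the mixed partials.
fxy = dy(dx(f, 1e-5), 2e-5)(0.7, 1.3)   # differentiate in x, then in y
fyx = dx(dy(f, 1e-5), 2e-5)(0.7, 1.3)   # differentiate in y, then in x

# Exact value, from f_xy = cos(xy) - xy sin(xy) + 6 x^2 y.
exact = math.cos(0.91) - 0.91 * math.sin(0.91) + 6 * 0.49 * 1.3
```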
g(b) = g(a) + g′(a)(b − a) + (1/2!) g″(a)(b − a)² + · · ·
       + (1/(k − 1)!) g^{(k−1)}(a)(b − a)^{k−1} + ∫_a^b ((b − t)^{k−1}/(k − 1)!) g^{(k)}(t) dt.   (17.24)
Now choose

φ(t) = (b − t)^{k−1}/(k − 1)!.

Then

φ′(t) = (−1)(b − t)^{k−2}/(k − 2)!
φ″(t) = (−1)²(b − t)^{k−3}/(k − 3)!
⋮
φ^{(k−3)}(t) = (−1)^{k−3}(b − t)²/2!
φ^{(k−2)}(t) = (−1)^{k−2}(b − t)
φ^{(k−1)}(t) = (−1)^{k−1}
φ^{(k)}(t) = 0.   (17.26)
By the Intermediate Value Theorem, g (k) takes all values in the range
[m, M ], and so the middle term in the previous inequality must equal g (k) (ξ)
for some ξ ∈ (a, b). Since

∫_a^b ((b − t)^{k−1}/(k − 1)!) dt = (b − a)^k/k!,

it follows

∫_a^b ((b − t)^{k−1}/(k − 1)!) g^{(k)}(t) dt = ((b − a)^k/k!) g^{(k)}(ξ).
Formula (17.27) now follows from (17.24).
Remark For a direct proof of (17.27), which does not involve any integra-
tion, see [Sw, pp 582–3] or [F, Appendix A2].
where

R_k(a, h) = (1/(k − 1)!) Σ_{i₁,...,i_k=1}^{n} (∫₀¹ (1 − t)^{k−1} D_{i₁...i_k} f(a + th) dt) h^{i₁} · . . . · h^{i_k}
          = (1/k!) Σ_{i₁,...,i_k=1}^{n} D_{i₁...i_k} f(a + sh) h^{i₁} · . . . · h^{i_k}   for some s ∈ (0, 1).
This is just a particular case of the chain rule, which we will discuss later.
This particular version follows from (17.15) and Corollary 17.5.3 (with f
there replaced by F ).
Let
g(t) = f (a + th).
Then g : [0, 1] → R. We will apply Taylor’s Theorem for a function of one
variable to g.
From (17.28) we have

g′(t) = Σ_{i=1}^{n} D_i f(a + th) hⁱ.   (17.29)
Similarly,

g‴(t) = Σ_{i,j,k=1}^{n} D_{ijk} f(a + th) hⁱ hʲ hᵏ,   (17.31)
etc. In this way, we see g ∈ C k [0, 1] and obtain formulae for the derivatives
of g.
But from (17.24) and (17.27) we have

g(1) = g(0) + g′(0) + (1/2!) g″(0) + · · · + (1/(k − 1)!) g^{(k−1)}(0)
       + (1/(k − 1)!) ∫₀¹ (1 − t)^{k−1} g^{(k)}(t) dt,

or, with the second form of the remainder,

       + (1/k!) g^{(k)}(s)   for some s ∈ (0, 1).
If we substitute (17.29), (17.30), (17.31) etc. into this, we obtain the required
results.
Remark The first two terms of Taylor’s Formula give the best first order
approximation 7 in h to f (a + h) for h near 0. The first three terms give
⁷ I.e. constant plus linear term.
the best second order approximation 8 in h, the first four terms give the best
third order approximation, etc.
Note that the remainder term R_k(a, h) in Theorem 17.12.3 can be written as O(|h|ᵏ) (see the Remarks on rates of convergence in Section 17.5), i.e.

R_k(a, h)/|h|ᵏ is bounded as h → 0.
This follows from the second version for the remainder in Theorem 17.12.3
and the facts:
Example Let
f (x, y) = (1 + y 2 )1/2 cos x.
One finds the best second order approximation to f for (x, y) near (0, 1) as
follows.
First note that

f(0, 1) = 2^{1/2}.
Moreover,
Hence

f(x, y) = 2^{1/2} + 2^{−1/2}(y − 1) − 2^{−1/2}x² + 2^{−5/2}(y − 1)² + R₃((0, 1), (x, y)),

where

R₃((0, 1), (x, y)) = O(|(x, y) − (0, 1)|³) = O((x² + (y − 1)²)^{3/2}).
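One can verify numerically that the error of a second order approximation is O(ρ³), where ρ = |(x, y) − (0, 1)|: halving ρ should divide the error by roughly 8. The test points are our own, and the coefficients used are (1/2)f_xx(0,1) = −2^{−1/2} and (1/2)f_yy(0,1) = 2^{−5/2}, computed from the exact derivatives of f:

```python
import math

def f(x, y):
    return math.sqrt(1 + y * y) * math.cos(x)

def P2(x, y):
    # second order Taylor polynomial of f about (0, 1)
    s = y - 1.0
    return 2**0.5 + 2**-0.5 * s - 2**-0.5 * x * x + 2**-2.5 * s * s

e1 = abs(f(0.2, 1.2) - P2(0.2, 1.2))   # rho = |(0.2, 0.2)|
e2 = abs(f(0.1, 1.1) - P2(0.1, 1.1))   # rho halved
ratio = e1 / e2                        # should be roughly 2^3 = 8
```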
⁸ I.e. constant plus linear term plus quadratic term.
Chapter 18
Differentiation of
Vector-Valued Functions
18.1 Introduction
In this chapter we consider functions
f : D (⊂ Rⁿ) → Rᵐ,
where
fⁱ : D → R,  i = 1, . . . , m

are real-valued functions.
Example Let
f (x, y, z) = (x2 − y 2 , 2xz + 1).
Then f 1 (x, y, z) = x2 − y 2 and f 2 (x, y, z) = 2xz + 1.
18.2 Paths in Rm
In this section we consider the case corresponding to n = 1 in the notation
of the previous section. This is an important case in its own right and also
helps motivate the case n > 1.
Proof: Since

(f(t + s) − f(t))/s = ((f¹(t + s) − f¹(t))/s, . . . , (fᵐ(t + s) − fᵐ(t))/s),

the theorem follows by applying Theorem 10.4.4.
Definition 18.2.3 If f(t) = (f¹(t), . . . , fᵐ(t)) then f is C¹ if each fⁱ is C¹.
We have the usual rules for differentiating the sum of two functions from I
to Rᵐ, and the product of such a function with a real valued function (exer-
cise: formulate and prove such a result). The following rule for differentiating
the inner product of two functions is useful.
Proof: Since

(f₁(t), f₂(t)) = Σ_{i=1}^{m} f₁ⁱ(t) f₂ⁱ(t),

the result follows from the usual rule for differentiating sums and products.
¹ If t is an endpoint of I then one takes the corresponding one-sided limits.
Differential Calculus for Vector-Valued Functions 239
[Diagram: a path f : I → R² with f(t₁) = f(t₂), showing the chord (f(t + s) − f(t))/s and the tangent vectors f′(t₁), f′(t₂), f′(t).]
Examples
1. Let
f (t) = (cos t, sin t) t ∈ [0, 2π).
This traces out a circle in R2 and
2. Let
f (t) = (t, t2 ).
This traces out a parabola in R2 and
[Diagrams: the circle traced out by f(t) = (cos t, sin t) and the parabola traced out by f(t) = (t, t²).]
1. f₁(t) = (t, t³),  t ∈ R,

2. f₂(t) = (t³, t⁹),  t ∈ R,

3. f₃(t) = (t^{1/3}, t),  t ∈ R.
Then each function f i traces out the same “cubic” curve in R2 , (i.e., the
image is the same set of points), and
However,
f₁′(0) = (1, 0),  f₂′(0) = (0, 0),  f₃′(0) is undefined.
[Diagram: paths f₁ : I₁ → R² and f₂ : I₂ → R² with f₁(t) = f₂(φ(t)) and equal unit tangent vectors f₁′(t)/|f₁′(t)| = f₂′(φ(t))/|f₂′(φ(t))|.]
Proof: From the chain rule for a function of one variable, we have

f₁′(t) = (f₁¹′(t), . . . , f₁ᵐ′(t))
       = (f₂¹′(φ(t)) φ′(t), . . . , f₂ᵐ′(φ(t)) φ′(t))
       = f₂′(φ(t)) φ′(t).
Example If |f 0 (t)| is constant (i.e. the “speed” is constant) then the velocity
and the acceleration are orthogonal.
Proof: Since |f′(t)|² = (f′(t), f′(t)) is constant, we have from Proposition 18.2.4 that

0 = (d/dt)(f′(t), f′(t)) = 2(f″(t), f′(t)).
[Diagram: a path f with partition points t₁, . . . , t_{i−1}, tᵢ, . . . , t_N and the inscribed polygon through the points f(t₁), . . . , f(t_{i−1}), f(tᵢ), . . . .]
The next result shows that this definition is independent of the particular
parametrisation chosen for the curve.
Proof: From the chain rule and then the rule for change of variable of integration,

∫_{a₁}^{b₁} |f₁′(t)| dt = ∫_{a₁}^{b₁} |f₂′(φ(t))| φ′(t) dt = ∫_{a₂}^{b₂} |f₂′(s)| ds.
Remarks
[Diagram: f maps a, b ∈ R² to f(a), f(b) ∈ R³; the vectors v at a and e₁, e₂ at b correspond to D_v f(a), ∂f/∂x(b) and ∂f/∂y(b).]
in the sense that if one side of either equality exists, then so does the other,
and both sides are then equal.
Then

∂f/∂x(x, y) = (∂f¹/∂x, ∂f²/∂x, ∂f³/∂x) = (2x − 2y, 2x, cos x),
∂f/∂y(x, y) = (∂f¹/∂y, ∂f²/∂y, ∂f³/∂y) = (−2x, 3y², 0),

are vectors in R³.
³ More precisely, if n ≤ m and the differential df(x) has rank n. See later.
Li = df i (a) i = 1, . . . , m
But this says that the differential df (a) is unique and is given by (18.2).
⁴ It follows from Proposition 18.4.2 that if L exists then it is unique and is given by the right side of (18.2).
Proof: The ith column of the matrix corresponding to df(a) is the vector ⟨df(a), eᵢ⟩.⁵ From Proposition 18.4.2 this is the column vector corresponding to

(⟨df¹(a), eᵢ⟩, . . . , ⟨dfᵐ(a), eᵢ⟩),

i.e. to

(∂f¹/∂xⁱ(a), . . . , ∂fᵐ/∂xⁱ(a)).

This proves the result.

⁵ Recall that the ith column of the matrix corresponding to a linear map L is L(eᵢ).
Find the best first order approximation to f (x) for x near (1, 2).
Solution:

f(1, 2) = [ −3 ]
          [  9 ],

df(x, y) = [ 2x − 2y   −2x ]
           [ 2x        3y² ],

df(1, 2) = [ −2   −2 ]
           [  2   12 ].
Alternatively, working with each component separately, the best first order approximation is

(f¹(1, 2) + ∂f¹/∂x(1, 2)(x − 1) + ∂f¹/∂y(1, 2)(y − 2),
 f²(1, 2) + ∂f²/∂x(1, 2)(x − 1) + ∂f²/∂y(1, 2)(y − 2))
= (−3 − 2(x − 1) − 2(y − 2), 9 + 2(x − 1) + 12(y − 2))
= (3 − 2x − 2y, −17 + 2x + 12y).
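The approximation can be sanity-checked numerically. A function with the stated value and derivative at (1, 2) is f(x, y) = (x² − 2xy, x² + y³); this reconstruction is ours, but any f with the same data works:

```python
# First order approximation f(1,2) + df(1,2)(x - 1, y - 2).
def f(x, y):
    return (x * x - 2 * x * y, x * x + y ** 3)   # reconstructed example

def first_order(x, y):
    dx, dy = x - 1.0, y - 2.0
    # rows of df(1, 2) are (-2, -2) and (2, 12)
    return (-3.0 - 2.0 * dx - 2.0 * dy,
             9.0 + 2.0 * dx + 12.0 * dy)

exact = f(1.01, 2.02)
linear = first_order(1.01, 2.02)
# the discrepancy is second order in the displacement (1.01, 2.02) - (1, 2)
err = max(abs(a - b) for a, b in zip(exact, linear))
```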
The previous proposition corresponds to the fact that the partial deriva-
tives for f + g are the sum of the partial derivatives corresponding to f and
g respectively. Similarly for αf .
1. f ∈ C 1 (D) ⇒ f is differentiable in D;
2. C 0 (D) ⊃ C 1 (D) ⊃ C 2 (D) ⊃ . . ..
This is generalised in the following theorem. The theorem says that the
linear approximation to g ◦f (computed at x) is the composition of the linear
approximation to f (computed at x) followed by the linear approximation to
g (computed at f (x)).
Schematically:

D (⊂ Rⁿ) ──f──→ Ω (⊂ Rᵐ) ──g──→ Rʳ,   with composition g ∘ f : D → Rʳ;

Rⁿ ──df(x)──→ Rᵐ ──dg(f(x))──→ Rʳ,   with d(g ∘ f)(x) = dg(f(x)) ∘ df(x).
Example To see how all this corresponds to other formulations of the chain
rule, suppose we have the following:
R³ ──f──→ R² ──g──→ R²
(x, y, z)   (u, v)   (p, q)
Thus coordinates in R3 are denoted by (x, y, z), coordinates in the first copy
of R2 are denoted by (u, v) and coordinates in the second copy of R2 are
denoted by (p, q).
The functions f and g can be written as follows:
The usual version of the chain rule in terms of partial derivatives is:
∂p/∂x = ∂p/∂u · ∂u/∂x + ∂p/∂v · ∂v/∂x
∂p/∂y = ∂p/∂u · ∂u/∂y + ∂p/∂v · ∂v/∂y
⋮
∂q/∂z = ∂q/∂u · ∂u/∂z + ∂q/∂v · ∂v/∂z.
In the first equality, ∂p/∂x is evaluated at (x, y, z), ∂p/∂u and ∂p/∂v are evaluated at (u(x, y, z), v(x, y, z)), and ∂u/∂x and ∂v/∂x are evaluated at (x, y, z). Similarly for the other equalities.
In terms of the matrices of partial derivatives:

[ ∂p/∂x  ∂p/∂y  ∂p/∂z ]   [ ∂p/∂u  ∂p/∂v ] [ ∂u/∂x  ∂u/∂y  ∂u/∂z ]
[ ∂q/∂x  ∂q/∂y  ∂q/∂z ] = [ ∂q/∂u  ∂q/∂v ] [ ∂v/∂x  ∂v/∂y  ∂v/∂z ],

where the three matrices are d(g ∘ f)(x), dg(f(x)) and df(x) respectively.
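The matrix identity can be checked numerically: compute the Jacobian of g ∘ f directly and compare it with the product of the two Jacobians. The concrete f and g below are our own illustrative choices:

```python
# Chain rule as matrix multiplication: d(g.f)(x) = dg(f(x)) df(x).
def f(x, y, z):
    return (x * y + z, y * z)        # f : R^3 -> R^2, (x,y,z) |-> (u,v)

def g(u, v):
    return (u * u, u + 3 * v)        # g : R^2 -> R^2, (u,v) |-> (p,q)

def jacobian(F, args, h=1e-6):
    """Central-difference Jacobian of F at args, as a list of rows."""
    n, m = len(args), len(F(*args))
    cols = []
    for i in range(n):
        ap, am = list(args), list(args)
        ap[i] += h
        am[i] -= h
        Fp, Fm = F(*ap), F(*am)
        cols.append([(p - q) / (2 * h) for p, q in zip(Fp, Fm)])
    return [[cols[i][r] for i in range(n)] for r in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = (1.0, 2.0, 0.5)
J_direct = jacobian(lambda *a: g(*f(*a)), x)           # d(g.f)(x)
J_product = matmul(jacobian(g, f(*x)), jacobian(f, x))  # dg(f(x)) df(x)
max_diff = max(abs(a - b) for ra, rb in zip(J_direct, J_product)
               for a, b in zip(ra, rb))
```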
1. Suppose
f : Ω (⊂ Rn ) → Rn
and f is C 1 . Note that the dimension of the domain and the range are
the same. Suppose f (x0 ) = y0 . Then a good approximation to f (x)
for x near x₀ is given by

x ↦ f(x₀) + ⟨f′(x₀), x − x₀⟩.   (19.1)
[Diagram: f maps a square grid near x₀ in R² to a curved grid near f(x₀) in R²; the first order map x ↦ f(x₀) + ⟨f′(x₀), x − x₀⟩ maps it to a straight grid.]
We expect that if f′(x₀) is a one-one and onto linear map (which is the same as det f′(x₀) ≠ 0, and which implies the map in (19.1) is one-one and onto), then f should be one-one and onto near x₀. This is true, and is called the Inverse Function Theorem.
2. Consider the set of equations

f¹(x¹, . . . , xⁿ) = y¹
f²(x¹, . . . , xⁿ) = y²
⋮
fⁿ(x¹, . . . , xⁿ) = yⁿ,
[Diagram: f maps the open set U ∋ x₀ onto V ∋ f(x₀), with inverse g : V → U.]
y ∗ ∈ Bδ (f (x0 )).
We will choose δ later. (We will take the set V in the theorem to be the open
set Bδ (f (x0 )) )
For each such y∗, we want to prove the existence of x (= x∗, say) such that
¹ Note that the dimensions of the domain and range are equal.
² That is, the matrix f′(x₀) is one-one and onto, or equivalently, det f′(x₀) ≠ 0.
Inverse Function Theorem 253
We write f (x) as a first order function plus an error term. Thus we want
to solve (for x)
where

R(x) := f(x) − f(x₀) − ⟨f′(x₀), x − x₀⟩.   (19.4)
In other words, we want to find x such that
(why?).
The right side of (19.5) is the sum of two terms. The first term, that is x₀ + ⟨[f′(x₀)]⁻¹, y∗ − f(x₀)⟩, is the solution of the linear equation y∗ = f(x₀) + ⟨f′(x₀), x − x₀⟩. The second term is the error term −⟨[f′(x₀)]⁻¹, R(x)⟩, which is o(|x − x₀|) because R(x) is o(|x − x₀|) and [f′(x₀)]⁻¹ is a fixed
linear map.

[Diagram: the graph of f over Rⁿ, showing x₀, x∗, f(x₀) and y∗, the error term ⟨[f′(x₀)]⁻¹, R(x)⟩, and the point x₀ + ⟨[f′(x₀)]⁻¹, y∗ − f(x₀)⟩.]
Note that x is a fixed point of A_{y∗} iff x satisfies (19.5) and hence solves (19.2).
We claim that

A_{y∗} : B̄_ε(x₀) → B̄_ε(x₀),   (19.7)

and that A_{y∗} is a contraction map, provided ε > 0 is sufficiently small (ε will depend only on x₀ and f) and provided y∗ ∈ B_δ(y₀) (where δ > 0 also depends only on x₀ and f).
To prove the claim, we compute

A_{y∗}(x₁) − A_{y∗}(x₂) = ⟨[f′(x₀)]⁻¹, R(x₂) − R(x₁)⟩,

and so

|A_{y∗}(x₁) − A_{y∗}(x₂)| ≤ K |R(x₁) − R(x₂)|,   (19.8)

where

K := ‖[f′(x₀)]⁻¹‖.   (19.9)
From (19.4),

R(x₂) − R(x₁) = f(x₂) − f(x₁) − ⟨f′(x₀), x₂ − x₁⟩.

We apply the mean value theorem (17.9.1) to each of the components of this equation to obtain

|Rⁱ(x₂) − Rⁱ(x₁)| = |⟨fⁱ′(ξᵢ), x₂ − x₁⟩ − ⟨fⁱ′(x₀), x₂ − x₁⟩|
                      for i = 1, . . . , n and some ξᵢ ∈ Rⁿ between x₁ and x₂
                  = |⟨fⁱ′(ξᵢ) − fⁱ′(x₀), x₂ − x₁⟩|
                  ≤ |fⁱ′(ξᵢ) − fⁱ′(x₀)| |x₂ − x₁|,

by Cauchy-Schwarz, treating fⁱ′ as a “row vector”.

[Diagram: the points x₁, x₂ with ξᵢ between them, near x₀.]
provided x ∈ B̄_ε(x₀) and y∗ ∈ B_δ(f(x₀)) (if Kδ < ε). This establishes (19.7) and completes the proof of the claim.
Step 3 We now know that for each y ∈ B_δ(f(x₀)) there is a unique x ∈ B_ε(x₀) such that f(x) = y. Denote this x by g(y). Thus

g : B_δ(f(x₀)) → B_ε(x₀).
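Unwinding (19.5), the fixed-point map simplifies to A_{y∗}(x) = x + ⟨[f′(x₀)]⁻¹, y∗ − f(x)⟩, so iterating A_{y∗} is a practical way to compute g(y∗). A one-dimensional sketch, where the function f(x) = x + x³ and the target y∗ = 0.2 are our own choices:

```python
# Solve f(x) = y* near x0 by iterating the contraction
#   A(x) = x + [f'(x0)]^{-1} (y* - f(x)),
# a rearrangement of the fixed-point map used in the proof.
def f(x):
    return x + x ** 3        # illustrative choice (assumption); f'(0) = 1 != 0

x0, dfx0 = 0.0, 1.0          # base point and f'(x0)
ystar = 0.2                  # a point near f(x0) = 0

x = x0
for _ in range(50):
    x = x + (ystar - f(x)) / dfx0

residual = abs(f(x) - ystar)   # the fixed point satisfies f(x) = y*
```

Note the iteration stays near x₀ and converges geometrically, exactly as the contraction estimate (19.8) predicts.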
Step 4 Let
V = Bδ (f (x0 )), U = g [Bδ (f (x0 ))] .
Remark We have

g′(y) = [f′(g(y))]⁻¹ = Ad[f′(g(y))] / det[f′(g(y))],   (19.14)

where Ad[f′(g(y))] is the matrix of cofactors of the matrix [f′(g(y))].
If f is C 2 , then since we already know g is C 1 , it follows that the terms
in the matrix (19.14) are algebraic combinations of C 1 functions and so are
C 1 . Hence the terms in the matrix g 0 are C 1 and so g is C 2 .
Similarly, if f is C 3 then since g is C 2 it follows the terms in the ma-
trix (19.14) are C 2 and so g is C 3 .
By induction we have the following Corollary.
See (19.5).
5. Write out the difference quotient for the derivative of g and use this
and the differentiability of f to show g is differentiable.
f (x, u) = y,
i.e.,
f¹(x¹, . . . , xⁿ, u¹, . . . , uᵐ) = y¹
f²(x¹, . . . , xⁿ, u¹, . . . , uᵐ) = y²
⋮
fⁿ(x¹, . . . , xⁿ, u¹, . . . , uᵐ) = yⁿ,
Thus det[∂f/∂x] is the determinant of the derivative of the map x ↦ f(x, u), where x¹, . . . , xⁿ are taken as the variables and the u¹, . . . , uᵐ are taken to be fixed.
Now suppose that

f(x₀, u₀) = y₀,   det[∂f/∂x]|_{(x₀,u₀)} ≠ 0.
f (x, u0 ) = y.
The Implicit Function Theorem says more generally that for y near y0 and
for u near u0 , there exists a unique x near x0 such that
f (x, u) = y.
Hence for u near u0 there exists a unique x = x(u) near x0 such that
f (x(u), u) = c. (19.16)
x2 + y 2 = 1.
Write
F (x, y) = 1. (19.18)
Thus in (19.15), u is replaced by y and c is replaced by 1.
[Diagram: the circle x² + y² = 1, with two example points (x₀, y₀) marked.]
Suppose F(x₀, y₀) = 1 and ∂F/∂x|_{(x₀,y₀)} ≠ 0 (i.e. x₀ ≠ 0). Then for y near y₀ there is a unique x near x₀ satisfying (19.18). In fact x = ±√(1 − y²) according as x₀ > 0 or x₀ < 0. See the diagram for two examples of such points (x₀, y₀).
Similarly, if ∂F/∂y|_{(x₀,y₀)} ≠ 0, i.e. y₀ ≠ 0, then for x near x₀ there is a unique y near y₀ satisfying (19.18).
Φ(x, y, z) = 0. (19.19)
[Diagram: the surface Φ(x, y, z) = 0, with the point (x₀, y₀, z₀) lying above (x₀, y₀).]
Then by the Implicit Function Theorem, for (x, y) near (x_0, y_0) there is
a unique z near z_0 such that Φ(x, y, z) = 0. Thus the "surface" can locally³
be written as a graph over the x-y plane.
More generally, if ∇Φ(x_0, y_0, z_0) ≠ 0 then at least one of the derivatives
∂Φ/∂x(x_0, y_0, z_0), ∂Φ/∂y(x_0, y_0, z_0) or ∂Φ/∂z(x_0, y_0, z_0) does not
equal 0. The corresponding variable x, y or z can then be solved for in terms
of the other two variables, and the surface is locally a graph over the plane
corresponding to these two other variables.

³ By "locally" we mean in some B_r(a) for each point a in the surface.
Φ(x, y, z) = 0,
Ψ(x, y, z) = 0.
[Diagram: the curve of intersection of the surfaces Φ(x, y, z) = 0 and Ψ(x, y, z) = 0, passing through the point (x_0, y_0, z_0).]
has rank 2. In other words, two of the three columns must be linearly inde-
pendent. Suppose it is the first two. Then
$$\det\begin{bmatrix} \dfrac{\partial\Phi}{\partial x} & \dfrac{\partial\Phi}{\partial y} \\[4pt] \dfrac{\partial\Psi}{\partial x} & \dfrac{\partial\Psi}{\partial y} \end{bmatrix}_{(x_0, y_0, z_0)} \neq 0.$$
By the Implicit Function Theorem, we can solve for (x, y) near (x0 , y0 ) in
terms of z near z0 . In other words we can locally write the curve as a graph
over the z axis.
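Writing such a curve as a graph over the z-axis can be illustrated numerically. The sketch below uses our own example surfaces (a sphere and a plane, not from the text) and a hand-rolled 2×2 Newton iteration; the determinant computed inside the loop is the one required to be nonzero above:

```python
# A sketch (our own example): the curve of intersection of the sphere
# Phi = x^2 + y^2 + z^2 - 1 = 0 with the plane Psi = x - y = 0, written
# locally as a graph (x(z), y(z)) over the z-axis by Newton's method.
# The Jacobian with respect to (x, y) is [[2x, 2y], [1, -1]], whose
# determinant -2(x + y) must be nonzero, i.e. x + y != 0.

def curve_point(z, x, y, tol=1e-12, max_iter=50):
    """Solve Phi = Psi = 0 for (x, y) at the given z, starting from (x, y)."""
    for _ in range(max_iter):
        phi = x*x + y*y + z*z - 1.0
        psi = x - y
        det = -2.0*x - 2.0*y              # determinant of the 2x2 Jacobian
        # Newton step: (x, y) -= J^{-1} (phi, psi), with J^{-1} written out.
        step_x = (-phi - 2.0*y*psi) / det
        step_y = (-phi + 2.0*x*psi) / det
        x, y = x - step_x, y - step_y
        if abs(phi) < tol and abs(psi) < tol:
            break
    return x, y

x, y = curve_point(0.5, 0.6, 0.6)   # the curve point with z = 0.5
print(x, y)   # both components equal sqrt((1 - 0.25)/2) = sqrt(0.375)
```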
f (0, 1, 3, 2, 7) = 0
⁴ One constraint gives a four-dimensional surface, two constraints give a three-dimensional surface, etc. Each further constraint reduces the dimension by one.
Inverse Function Theorem 261
and
$$f'(0, 1, 3, 2, 7) = \begin{bmatrix} 2 & 3 & 1 & -4 & 0 \\ -6 & 1 & 2 & 0 & -1 \end{bmatrix}.$$
The first two columns are linearly independent and so we can solve for x1 , x2
in terms of y1 , y2 , y3 near (3, 2, 7).
Moreover, from (19.17) we have
$$\begin{bmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_1}{\partial y_3} \\[6pt] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \dfrac{\partial x_2}{\partial y_3} \end{bmatrix}_{(3,2,7)} = -\begin{bmatrix} 2 & 3 \\ -6 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & -4 & 0 \\ 2 & 0 & -1 \end{bmatrix}$$
$$= -\frac{1}{20}\begin{bmatrix} 1 & -3 \\ 6 & 2 \end{bmatrix}\begin{bmatrix} 1 & -4 & 0 \\ 2 & 0 & -1 \end{bmatrix} = \begin{bmatrix} \frac{1}{4} & \frac{1}{5} & -\frac{3}{20} \\[4pt] -\frac{1}{2} & \frac{6}{5} & \frac{1}{10} \end{bmatrix}.$$
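The arithmetic above can be checked exactly with Python's fractions module; the sketch below (a verification of the computation, not part of the text) recomputes −[∂f/∂x]⁻¹[∂f/∂y] from the two blocks of f'(0, 1, 3, 2, 7):

```python
# Exact check of the implicit-derivative matrix at (3, 2, 7):
# dx/dy = -A^{-1} B, where A = [df/dx] (first two columns of f') and
# B = [df/dy] (last three columns).
from fractions import Fraction as Fr

A = [[Fr(2), Fr(3)], [Fr(-6), Fr(1)]]
B = [[Fr(1), Fr(-4), Fr(0)], [Fr(2), Fr(0), Fr(-1)]]

det = A[0][0]*A[1][1] - A[0][1]*A[1][0]          # = 20
A_inv = [[ A[1][1]/det, -A[0][1]/det],
         [-A[1][0]/det,  A[0][0]/det]]            # = (1/20) [[1, -3], [6, 2]]

# dx_dy = -A_inv B, written out entry by entry
dx_dy = [[-sum(A_inv[i][k]*B[k][j] for k in range(2)) for j in range(3)]
         for i in range(2)]
print(dx_dy)  # exactly [[1/4, 1/5, -3/20], [-1/2, 6/5, 1/10]], as above
```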
Proof: Define
F : D → R^n × R^m
by
F(x, u) = (f(x, u), u).
Also
F (x0 , u0 ) = (y0 , u0 ).
From the Inverse Function Theorem, for all (y, u) near (y0 , u0 ) there exists
a unique (x, w) near (x_0, u_0) such that
F(x, w) = (y, u). (19.20)
Moreover, x and w are C^1 functions of (y, u). But from the definition of F
it follows that (19.20) holds iff w = u and f (x, u) = y. Hence for all (y, u)
near (y0 , u0 ) there exists a unique x = g(u, y) near x0 such that
f (x, u) = y. (19.21)
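The augmentation trick in this proof can be mimicked numerically. Below is a sketch with our own scalar example f(x, u) = x² + u (so n = m = 1, and all names are ours): inverting F(x, u) = (f(x, u), u) by Newton's method and reading off the first component recovers the implicit solution x = g(u, y):

```python
# Invert F(x, w) = (x^2 + w, w) near (x0, u0) = (1, 0) by Newton's
# method on R^2; the first component of the inverse is the implicit
# solution x = g(u, y) with f(x, u) = x^2 + u = y.

def invert_F(y, u, x=1.0, w=0.0, tol=1e-12, max_iter=50):
    """Solve F(x, w) = (y, u) by Newton's method, starting at (x, w)."""
    for _ in range(max_iter):
        r1 = x*x + w - y          # first component of F(x, w) - (y, u)
        r2 = w - u                # second component
        # Jacobian of F is [[2x, 1], [0, 1]]; solve J * step = residual.
        step_w = r2
        step_x = (r1 - step_w) / (2.0*x)
        x, w = x - step_x, w - step_w
        if abs(r1) < tol and abs(r2) < tol:
            break
    return x, w

x, w = invert_F(y=1.2, u=0.1)
print(x, w)   # x = sqrt(1.2 - 0.1) = sqrt(1.1), and w = u = 0.1
```

As the proof observes, (19.20) forces w = u, and that is visible in the iteration: the second equation is solved exactly in one step.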
19.3 Manifolds
Discussion Loosely speaking, M is a k-dimensional manifold in R^n if M
locally⁶ looks like the graph of a function of k variables. Thus a
2-dimensional manifold is a surface, and a 1-dimensional manifold is a curve.
We will give three different ways to define a manifold and show that they
are equivalent.
We begin by considering manifolds of dimension n − 1 in Rn (e.g. a curve
in R2 or a surface in R3 ). Such a manifold is said to have codimension one.
Suppose
Φ : Rn → R
is C 1 . Let
M = {x : Φ(x) = 0}.
See Examples 1 and 2 in Section 19.2 (where Φ(x, y) = F (x, y) − 1 in Exam-
ple 1).
If ∇Φ(a) ≠ 0 for some a ∈ M, then as in Examples 1 and 2 we can write
M locally as the graph of a function of one of the variables x_i in terms of
the remaining n − 1 variables.
[Diagram: the manifold M near a point a, with the normal space N_a M spanned by ∇Φ(a).]
Remarks
2. See Section 17.6 for a discussion of ∇Φ(a) which motivates the defini-
tion of Na M .
3. With the same proof as in Examples 1 and 2 from Section 19.2, we can
locally write M as the graph of a function x_i = f(x_1, …, x_{i−1}, x_{i+1}, …, x_n)
for some 1 ≤ i ≤ n.
Φ : R^n → R^ℓ
This leads to the following definition which generalises the previous one.
Remarks With the same proof as in Example 3 from the section on the
Implicit Function Theorem, we can locally write M as the graph of a function
of ℓ of the variables in terms of the remaining n − ℓ variables.
[Diagram: near the point a, M is the graph of a function f over the x_i-axis.]
⁸ The space N_a M does not depend on the particular Φ used to describe M. We show this in the next section.
Then
$$\nabla\Phi(x) = \left(-\frac{\partial f}{\partial x_1}, \dots, -\frac{\partial f}{\partial x_{i-1}},\ 1,\ -\frac{\partial f}{\partial x_{i+1}}, \dots, -\frac{\partial f}{\partial x_n}\right).$$
In particular, ∇Φ(x) ≠ 0 and so M is a manifold in the level-set sense.
Conversely, we have already seen (in the Remarks following Definitions 19.3.1
and 19.3.2) that if M is a manifold in the level-set sense then it is also a man-
ifold in the graphical sense.
F : Ω (⊂ R^{n−1}) → R^n
such that
M ∩ B_r(a) = F[Ω] ∩ B_r(a).
∂F/∂u_1(u), …, ∂F/∂u_{n−1}(u)
[Diagram: F maps u = (u_1, …, u_{n−1}) ∈ R^{n−1} to the point (F^1(u), …, F^{n−1}(u), F^n(u)) of M; its local inverse G sends (x_1, …, x_{n−1}) = (F^1(u), …, F^{n−1}(u)) back to u, so points of M take the form (x_1, …, x_{n−1}, (F^n ∘ G)(x_1, …, x_{n−1})).]
It follows that the (n − 1) × (n − 1) matrix $\left[\frac{\partial F^i}{\partial u_j}(p)\right]_{1 \le i, j \le n-1}$ is invertible, and hence by the Inverse Function Theorem there is locally a one-one correspondence between u = (u_1, …, u_{n−1}) and points of the form (x_1, …, x_{n−1}) = (F^1(u), …, F^{n−1}(u)).
Moreover,
$$\frac{\partial F}{\partial x_i} = e_i + \frac{\partial f}{\partial x_i}\, e_n$$
for i = 1, …, n − 1, and so these vectors are linearly independent.
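For a concrete instance (our own, with f(x_1, x_2) = x_1² + x_2², so M is a paraboloid in R³), the tangent vectors e_i + (∂f/∂x_i) e_n are plainly independent, and the gradient of Φ(x) = x_n − f(x_1, …, x_{n−1}) from the earlier display is orthogonal to both:

```python
# Tangent vectors of the graph parametrisation F(x1, x2) = (x1, x2, f)
# with f(x1, x2) = x1^2 + x2^2 (our own example).  Their first two
# coordinates form the identity matrix, so they are independent, and
# grad Phi = (-df/dx1, -df/dx2, 1) is orthogonal to both.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def graph_data(x1, x2):
    df = (2.0 * x1, 2.0 * x2)            # gradient of f at (x1, x2)
    v1 = (1.0, 0.0, df[0])               # dF/dx1 = e1 + (df/dx1) e3
    v2 = (0.0, 1.0, df[1])               # dF/dx2 = e2 + (df/dx2) e3
    grad_phi = (-df[0], -df[1], 1.0)     # normal direction at the point
    return v1, v2, grad_phi

v1, v2, n = graph_data(1.0, 2.0)
print(dot(v1, n), dot(v2, n))   # 0.0 0.0 -- n spans the normal space
```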
k + ℓ = n.
To see this, note that previous arguments show that M is locally the graph
of a function from R^k to R^{n−k}, and also locally the graph of a function
from R^{n−ℓ} to R^ℓ. This makes it very plausible that k = n − ℓ. A strict
proof requires a little topology or measure theory.
[Diagram: a curve ψ through a = ψ(0), with tangent vector ψ'(0).]
∂F/∂u_1(u), …, ∂F/∂u_k(u).¹⁰

¹⁰ As in Definition 19.3.4, these vectors are assumed to be linearly independent.
Suppose
F : R^n → R
is C^1 and F has a constrained minimum (maximum) at a ∈ M. Then
$$\nabla F(a) = \sum_{j=1}^{\ell} \lambda_j \nabla\Phi_j(a)$$
$$0 = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(a)\, \frac{d\psi^i}{dt}(0),$$
i.e.
$$\nabla F(a) \perp \psi'(0).$$
$$\nabla F(a) = \sum_{j=1}^{\ell} \lambda_j \nabla\Phi_j(a)$$
$$\frac{\partial H}{\partial x_i} = \frac{\partial F}{\partial x_i} - \sum_j \sigma_j \frac{\partial \Phi_j}{\partial x_i}, \qquad \frac{\partial H}{\partial \sigma_j} = -\Phi_j.$$
$$\nabla F(a) = \sum_{j=1}^{\ell} \lambda_j \nabla\Phi_j(a).$$
F (x, y, z) = x + y + 2z
on the ellipsoid
M = {(x, y, z) : x2 + y 2 + 2z 2 = 2}.
Solution: Let
Φ(x, y, z) = x2 + y 2 + 2z 2 − 2.
At a critical point there exists λ such that
∇F = λ∇Φ.
That is
1 = λ(2x)
1 = λ(2y)
2 = λ(4z).
Moreover
x2 + y 2 + 2z 2 = 2.
These four equations give
$$x = \frac{1}{2\lambda}, \quad y = \frac{1}{2\lambda}, \quad z = \frac{1}{2\lambda}, \quad \frac{1}{\lambda} = \pm\sqrt{2}.$$
Hence
$$(x, y, z) = \pm\frac{1}{\sqrt{2}}\,(1, 1, 1).$$
Since F is continuous and M is compact, F must have a minimum and a
maximum point. Thus one of ±(1, 1, 1)/√2 must be the minimum point and
the other the maximum point. A calculation gives
$$F\left(\frac{1}{\sqrt{2}}(1, 1, 1)\right) = 2\sqrt{2},$$
$$F\left(-\frac{1}{\sqrt{2}}(1, 1, 1)\right) = -2\sqrt{2}.$$
Thus the minimum and maximum points are −(1, 1, 1)/√2 and +(1, 1, 1)/√2
respectively.
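The critical points found above can be checked mechanically. The following sketch (ours, not part of the text) verifies that ∇F = λ∇Φ holds at ±(1, 1, 1)/√2 with λ = ±1/√2, that both points lie on M, and that F takes the values ±2√2 there:

```python
# Check of the Lagrange-multiplier computation for F = x + y + 2z on the
# ellipsoid x^2 + y^2 + 2z^2 = 2.
from math import sqrt, isclose

def F(x, y, z):   return x + y + 2*z
def Phi(x, y, z): return x**2 + y**2 + 2*z**2 - 2   # constraint, = 0 on M

grad_F = (1.0, 1.0, 2.0)
def grad_Phi(x, y, z): return (2*x, 2*y, 4*z)

for sign in (+1.0, -1.0):
    p = tuple(sign / sqrt(2) for _ in range(3))     # +/- (1, 1, 1)/sqrt(2)
    lam = sign / sqrt(2)                            # the multiplier lambda
    assert isclose(Phi(*p), 0.0, abs_tol=1e-12)     # the point lies on M
    assert all(isclose(gF, lam * g)                 # grad F = lambda grad Phi
               for gF, g in zip(grad_F, grad_Phi(*p)))
    print(F(*p))   # equals 2*sqrt(2) for sign = +1, -2*sqrt(2) for sign = -1
```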