Adel N. Boules - Fundamentals of Mathematical Analysis-OUP Oxford (2021)


OUP UNCORRECTED PROOF – FINAL, 14/1/2021, SPi

Fundamentals of
Mathematical Analysis
ADEL N. BOULES

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Adel N. Boules 2021
The moral rights of the author have been asserted
First Edition published in 2021
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2020952673
ISBN 978–0–19–886878–1 (hbk.)
ISBN 978–0–19–886879–8 (pbk.)
DOI: 10.1093/oso/9780198868781.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.

This work is a tribute


To all my teachers
The ones I met and the ones I did not

Dedication
To all my children

Preface

This is a beginning graduate book on real and functional analysis, with a significant
component on topology. The prerequisites include a solid understanding of
undergraduate real analysis and linear algebra, and a good degree of mathematical
maturity. Rudimentary knowledge of metric spaces, although not required, is a
huge asset. With the singular exception of Liouville’s theorem (stated without
proof), and a passing reference to Laurent series, knowledge of complex analysis
is neither assumed nor needed.

It is possible for students with high mathematical aptitude to study this book
independently. However, the book is designed as a textbook for well-prepared
students of mathematics, to be taught under the able guidance of an instructor.
I like to think of this book as an accessible classical introduction to the subject.
The goal is to provide a springboard from which students can dive into greater
depths in the sea of mathematics.

The book is neither encyclopedic nor a shallow introduction. The aim is to achieve
excellent breadth and depth. The topics are organized logically but not rigidly, in
order to maximize utility and the potential readership. The careful sequencing
of the sections is designed to allow instructors to select topics that suit their
course goals, student backgrounds, and time limitations. Although the proofs are
detailed, I hope the reader will find the writing style clear and concise. The section
exercises constitute an important complement to the results in the main body of
the section. Indeed, some of the exercises provide alternative approaches to some
topics, and generalizations of some of the results in the main text are considered in
the exercises. The book synopsis included after the preface furnishes more details
on the structure of the book and brief chapter descriptions.

I deliberately avoided making specific bibliographic citations within the body of
the text. There are two main reasons for this. First, all the results in this book are
well established and can be found in multiple sources. Second, the book contains
no original results. Therefore, the lack of bibliographic citations or the absence
of any specific source must not be conflated with claims of originality. I did not
number the definitions, in order to prevent item numbers within sections from
escalating to an annoying level. Definitions are seldom referenced far from where
they first appear, and the extensive index and the glossary of symbols should help
the reader locate items easily. Examples are locally and manually numbered within
each section.

Almost all of the historical information contained in this book is abridged,
with large excerpts included without quotation marks, from J. J. O’Connor and
E. F. Robertson’s articles in the MacTutor History of Mathematics archive, School of
Mathematics and Statistics, University of St Andrews, Scotland (see
http://www-history.mcs.st-andrews.ac.uk/index.html).

Sir Isaac Newton once said that if he had seen further than others, it was by
standing on the shoulders of giants. I am no giant, but this book is the shoulder I
have to offer. Perhaps a few students will climb and will be able to see farther than
I have.

Acknowledgments

I would like to express my sincere appreciation to a long succession of upper
academic administrators at the University of North Florida for their support
throughout my tenure there. I must expressly mention Lewis Radonovich, Mark
Workman, and Barbara Hetrick. I also owe a debt to the College of Arts and
Sciences for granting me a sabbatical for the academic year 2017–18, during
which a significant bulk of writing was achieved.

I thank the anonymous reviewers of this book for their insightful criticism.
Their suggestions greatly contributed to the richness of the book and the
cohesion of its topics. I also thank the commissioning editor Dan Taber and the
assistant commissioning editor Katherine Ward for their prompt and professional
assistance during the review and production stages, and my son Youssef for his
assistance with the typesetting and formatting of the graphics.

Finally but foremost, my deep gratitude goes to my wife and life companion. Her
kind, patient, and trusting nature touched many lives and transformed mine.

Jacksonville, Florida
July 2020

The Book in Synopsis

The book in its entirety contains enough material for a two-semester course. The
core of the book can be used for an easy-paced two-semester course. If a definition
of the core contents of the book is desirable, I define the core to consist of the
following sections, in addition to the very basic ideas in sections 1.1, 1.2, and 3.1:

▶ Sections 2.1 and 2.2
▶ Sections 3.2–3.4, 3.6, and 3.7
▶ Sections 4.1–4.10
▶ Sections 5.1–5.4 and 5.6–5.8
▶ Sections 6.1–6.4
▶ Sections 7.1 and 7.2
▶ Sections 8.1–8.4

Part I. Background Material

Instructors can choose material from this part as their students’ background
warrants. The most basic results in the first three chapters are stated without proof.

Chapter 1. This chapter furnishes a brief refresher of basic concepts. The natural,
rational, and real number systems are taken for granted, although we develop the
completeness of the real line and the Bolzano-Weierstrass theorem at length,
as well as the complex number field, including its completeness. Embryonic
manifestations of completeness and compactness can be seen in this chapter.
Examples include the nested interval theorem and the uniform continuity of
continuous functions on compact intervals, and our proof of the Heine-Borel
theorem in chapter 4 is squarely based on the Bolzano-Weierstrass property of
bounded sets.

Chapter 2. This chapter fills in any potential gaps that may exist in the
student’s knowledge of set theory. Sections 2.1 and 2.2 are essential for a proper
understanding of the rest of the book. In particular, a thorough understanding
of countability and Zorn’s lemma is indispensable. Some of section 2.3 may be
included, but only an intuitive understanding of cardinal numbers is sufficient.
Studying section 2.3 up to theorem 2.3.4, together with theorem 2.3.13, is sufficient
to follow the discussion on the existence of a vector space of arbitrary (infinite)
dimension, and the existence of inseparable Hilbert spaces. Cardinal arithmetic
can be omitted. Indeed, the results on cardinal arithmetic are applied only once in
order to prove the invariance of the cardinality of a linear basis of a vector space.
Ordinal numbers have been carefully avoided.

Chapter 3. It is this author’s observation that the undergraduate linear algebra
curriculum has settled into a matrix theory mode without enough exposure to
vector space theory. This chapter aims to provide a solid but brief account of the
theory of vector spaces. The reader is assumed to have good knowledge of the
basic definitions, which are briefly summarized in section 3.1. The aim of sections
3.2 and 3.3 is to provide a thorough presentation of the concepts of basis and
dimension, especially for infinite-dimensional vector spaces, as these are topics
that are not normally developed rigorously in the undergraduate curriculum. The
approach is unified in the sense that we do not treat finite and infinite-dimensional
spaces separately. Important concepts make their debut in section 3.4. These
include algebraic complements, quotient spaces, direct sums, projections, linear
functionals, and invariant subspaces. Section 3.5 provides a brief refresher of
matrix representations and diagonalization. Section 3.6 introduces normed linear
spaces and is followed by an extensive study of inner product spaces in section
3.7. The presentation of inner product spaces in this section and in section 4.10
is not limited to finite-dimensional spaces; it is limited instead to those properties of
inner products that do not require completeness. The chapter concludes with the
finite-dimensional spectral theory.

Part II. Topology

A respectable one-semester course on topology can be based on chapters 4 and 5.
It is my belief that an adequate mastery of the basics of topology is a necessary
prerequisite for an organized study of higher mathematics. This is a focal point of
the book philosophy. It is fair to say that the book, generally speaking, has a mild
topological flavor. Chapters 4 and 5 provide a solid launch pad into the last three
chapters of the book. It is possible for the instructor, with a moderate amount of
maneuvering, to navigate most of the rest of the book while avoiding chapter 5. This
chapter, however, contributes richly to the depth of the book.

Chapter 4. This chapter provides an extensive account of the metric topology
and is a prerequisite for all the subsequent chapters. The leading sections furnish
basic concepts such as closure, continuity, separation properties, product spaces,
and countability axioms. This is followed by a detailed study of completeness,
compactness, and function spaces. Chapter applications include contraction
mappings, nowhere differentiable functions, and space-filling curves. The chapter
concludes with a detailed section on Fourier series and orthogonal polynomials,
which, together with section 3.7, provides an excellent background for Hilbert
spaces. Our study of sequence and function spaces in this chapter leads gently
into Banach spaces.

Chapter 5. This chapter emphasizes the nonmetric properties of topology.
Sections 5.1–5.8 constitute the core of the chapter. Section 5.5 is terminal and
may be omitted. The remaining sections are more advanced and can be omitted.
Section 5.9 (locally compact spaces) is the transitional section between the core
of the chapter and the last three sections. At various points in the book, I point
out how results stated for the metric case can be extended to topological spaces,
especially locally compact spaces. Some such results are developed in the exercises.
Sections 5.10–5.12 are optional, and little subsequent material is based on them. I
provided a specialized proof of Urysohn’s lemma for ℝ^n in section 8.4 in order to
help instructors avoid section 5.11, if they so choose. Tychonoff’s theorem appears
twice: once in section 5.8, for the product of finitely many topological spaces,
and again in section 5.12, for the product of infinitely many spaces. The proofs
are different, and both are worthy of inclusion, if an instructor decides to include
section 5.12.

Part III. Functional Analysis

An introductory course on functional analysis can be based on the instructor’s
choice of the background material and chapters 4, 6, and 7.

Chapter 6. This chapter introduces Banach spaces. Sections 6.1–6.4 form the
core of the chapter. It would be accurate to characterize sections 6.1–6.4 as quite
classical. Section 6.5 is needed for sections 7.3 and 7.4. Section 6.6 can be omitted
if a brief introduction is the goal. In this case, section 7.5 must also be omitted.
Section 6.7 is terminal and may be omitted without consequence. I have enriched
the chapter by including such topics as Gelfand’s theorem, Schauder bases, and
complemented subspaces. Chapters 6 and 7 include a good number of applications
of the four fundamental theorems of functional analysis.

Chapter 7. This chapter introduces Hilbert spaces and the elements of operator
theory. Sections 7.3 and 7.4 contain a good set of results on self-adjoint and
compact operators. The section exercises contain problems that suggest alternative
approaches and hence allow the instructor to shorten these two sections while
preserving good depth. For example, the Fredholm theory can be bypassed if
the instructor wishes to limit the discussion to compact, self-adjoint operators on
Hilbert spaces. Sections 7.3 and 7.4 are written in such a way as to facilitate extending
the results to compact operators on Banach spaces (section 7.5). For example, we
used Riesz’s lemma instead of the projection theorem in order to keep the proofs
adaptable for extension to Banach spaces. Sections 7.3–7.5 contain more results
than are typically found in an introductory course.

Part IV. Integration Theory

Together with chapter 4, this chapter constitutes the general/real analysis
component of the book, and a good course on real analysis can be built on the
background material and those two chapters.

Chapter 8. Section 8.1 furnishes a brief but rigorous introduction to the Riemann
integral of continuous functions on compact boxes in ℝ^n. Although it has intrinsic
value, the section is included for the express purpose of developing section 8.4.
Section 8.4 develops the Lebesgue measure on ℝ^n, and the approach is to extend
the positive linear functional provided by the Riemann integral on the space
of continuous, compactly supported functions on ℝ^n. This very nearly amounts
to developing the Radon measure theory on locally compact Hausdorff spaces.
However, I chose to limit the discussion to Lebesgue measure on ℝ^n because I did
not wish to base the presentation heavily on chapter 5. I did, nonetheless, include
an excursion into Radon measures as an optional topic. The rest of the chapter is
largely independent of sections 8.1 and 8.4 and constitutes a decent introduction
to general measure and integration theories. The section on complex measures has
intrinsic value but is also included in order to facilitate the study of the duals of 𝔏^p
spaces. In particular, I limited the discussion of signed measures to real measures,
that is, signed measures that are not allowed to assume infinite values. This turned
out to be sufficient for our purposes. The selection of topics and the approach in
sections 8.6 and 8.8 are quite classical and cover the basics of 𝔏^p spaces and product
measures. Section 8.7 contains an excellent collection of approximation theorems,
including approximations by 𝒞^∞ functions. The title of the last section accurately
captures its contents: a mere glimpse of the subject. However, the section finally
settles questions started in sections 3.7 and 4.10 and concludes with the unraveling
of the mystery about the completeness of orthogonal polynomials.

Appendices

Appendix A. This appendix contains the proof of the equivalence of the axiom
of choice, Zorn’s lemma, and the well-ordering principle. I created this appendix
in order to avoid distraction if instructors decide not to include the proof in their
course.

Appendix B. This appendix is rather elementary in nature. It develops matrix
factorizations and is used for deriving the change of variables formula in the
exercises on section 8.8. Reference to this appendix is also made in section 3.5.

Contents

1. Preliminaries
1.1 Sets, Functions, and Relations
1.2 The Real and Complex Number Fields
2. Set Theory
2.1 Finite, Countable, and Uncountable Sets
2.2 Zorn’s Lemma and the Axiom of Choice
2.3 Cardinal Numbers
3. Vector Spaces
3.1 Definitions and Basic Properties
3.2 Independent Sets and Bases
3.3 The Dimension of a Vector Space
3.4 Linear Mappings, Quotient Spaces, and Direct Sums
3.5 Matrix Representation and Diagonalization
3.6 Normed Linear Spaces
3.7 Inner Product Spaces
4. The Metric Topology
4.1 Definitions and Basic Properties
4.2 Interior, Closure, and Boundary
4.3 Continuity and Equivalent Metrics
4.4 Product Spaces
4.5 Separable Spaces
4.6 Completeness
4.7 Compactness
4.8 Function Spaces
4.9 The Stone-Weierstrass Theorem
4.10 Fourier Series and Orthogonal Polynomials
5. Essentials of General Topology
5.1 Definitions and Basic Properties
5.2 Bases and Subbases
5.3 Continuity
5.4 The Product Topology: The Finite Case
5.5 Connected Spaces
5.6 Separation by Open Sets
5.7 Second Countable Spaces
5.8 Compact Spaces
5.9 Locally Compact Spaces
5.10 Compactification
5.11 Metrization
5.12 The Product of Infinitely Many Spaces
6. Banach Spaces
6.1 Finite vs. Infinite-Dimensional Spaces
6.2 Bounded Linear Mappings
6.3 Three Fundamental Theorems
6.4 The Hahn-Banach Theorem
6.5 The Spectrum of an Operator
6.6 Adjoint Operators and Quotient Spaces
6.7 Weak Topologies
7. Hilbert Spaces
7.1 Definitions and Basic Properties
7.2 Orthonormal Bases and Fourier Series
7.3 Self-Adjoint Operators
7.4 Compact Operators
7.5 Compact Operators on Banach Spaces
8. Integration Theory
8.1 The Riemann Integral
8.2 Measure Spaces
8.3 Abstract Integration
8.4 Lebesgue Measure on ℝ^n
8.5 Complex Measures
8.6 𝔏^p Spaces
8.7 Approximation
8.8 Product Measures
8.9 A Glimpse of Fourier Analysis
Appendix A: The Equivalence of Zorn’s Lemma, the Axiom of Choice, and the Well Ordering Principle
Appendix B: Matrix Factorizations

Bibliography
Glossary of Symbols
Index

1 Preliminaries

We are justified in calling numbers a free creation of the human mind.
Richard Dedekind

Richard Dedekind. 1831–1916

In 1848, at the age of 16, Dedekind entered The Collegium Carolinum, an
educational institution between a high school and a university. He then attended
the University of Göttingen in 1850, and in 1852 completed his doctoral work in
four semesters under Gauss’s supervision. Dedekind was to be Gauss’s last pupil.
Dedekind spent the following two years in Berlin for further training, returning
to Göttingen in 1855, the year Gauss died.

Dirichlet was appointed to fill Gauss’s chair at Göttingen, soon became Dedekind’s
friend and mentor, and had a strong influence in shaping his mathematical
interests. While at Göttingen, Dedekind studied the work of Galois and was the
first to lecture on Galois theory.

Dedekind was later appointed to the Polytechnic of Zürich and began teaching
there in 1858. By the 1860s, The Collegium Carolinum in Brunswick had been
upgraded to the Brunswick Polytechnic, and Dedekind was appointed to it in
1862. With this appointment he returned to his hometown and remained there
for the rest of his life.

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0001

Dedekind made a number of highly significant contributions to mathematics,
including his definition of finite and infinite sets, and his construction of the real
numbers as cuts in the set of rational numbers. Dedekind’s definitions are accepted
today as the standard definitions.

Among Dedekind’s other notable contributions to mathematics were his editions
of the collected works of Dirichlet, Gauss, and Riemann. His study of Dirichlet’s
work led him to study algebraic number fields, where he realized the importance
of rings and ideals. The general term ring did not appear in Dedekind’s work; it was
introduced later by David Hilbert, and Dedekind’s notion of an ideal was taken up
and extended by Hilbert and then later by Emmy Noether.

Dedekind retired in 1894. His life was long, healthy, and contented. He never
married and instead lived with one of his sisters, who also remained unmarried,
for most of his adult life. “He did not feel pressed to have a more marked effect in
the outside world: such confirmation of himself was unnecessary.”1

“Dedekind’s legacy ... consisted not only of important theorems, examples, and
concepts, but a whole style of mathematics that has been an inspiration to each
succeeding generation.”2

1.1 Sets, Functions, and Relations

The reader is expected to be familiar with basic set theoretic concepts such
as containment, unions, and intersections and should be comfortable with set
notation. Most of the essential definitions will be stated in this section. A number
of basic facts will be stated as theorems, without proof.

We use the symbols ℕ, ℤ, ℚ, ℝ, and ℂ to denote, respectively, the natural numbers,
the integers, rational numbers, real numbers, and complex numbers. The symbol
∅ denotes the empty set.

1 J. J. O’Connor and E. F. Robertson, “Julius Wilhelm Richard Dedekind,” in MacTutor History of
Mathematics (St Andrews: University of St Andrews, 1998),
http://mathshistory.st-andrews.ac.uk/Biographies/Dedekind/, accessed Oct. 31, 2020.
2 O’Connor and Robertson, “Julius Wilhelm Richard Dedekind.”

Notation. If X is a suitable universal set for a particular problem and A ⊆ X, we
use the notation X − A to denote the complement of A in X:

X − A = {x ∈ X ∶ x ∉ A}.

We use the same notation for relative differences (the complement of B in A):

A − B = {x ∈ A ∶ x ∉ B} = A ∩ (X − B).

Theorem 1.1.1 (distributive laws). Let A and B_1, B_2, …, B_n be subsets of a set X.
Then

(a) A ∪ (⋂_{i=1}^{n} B_i) = ⋂_{i=1}^{n} (A ∪ B_i),
(b) A ∩ (⋃_{i=1}^{n} B_i) = ⋃_{i=1}^{n} (A ∩ B_i). ∎

Theorem 1.1.2 (De Morgan’s laws). Let A_1, A_2, …, A_n be subsets of a set X. Then

(a) X − ⋃_{i=1}^{n} A_i = ⋂_{i=1}^{n} (X − A_i),
(b) X − ⋂_{i=1}^{n} A_i = ⋃_{i=1}^{n} (X − A_i). ∎
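
For finite families, the identities in theorems 1.1.1 and 1.1.2 can be checked mechanically. The following Python sketch is an illustration only and not a proof; the universal set X, the set A, and the family B are hypothetical sample data chosen for this check, and the helper names are not from the text.

```python
from functools import reduce

X = set(range(10))                      # hypothetical universal set
A = {0, 1, 2}                           # hypothetical subset of X
B = [{1, 2, 3}, {2, 3, 4}, {2, 5, 9}]   # hypothetical family B_1, B_2, B_3

def union(sets):
    """Union of a finite list of sets (an empty list gives the empty set)."""
    return reduce(set.union, sets, set())

def intersection(sets, universe):
    """Intersection of a finite list of subsets of `universe`."""
    return reduce(set.intersection, sets, set(universe))

# Theorem 1.1.1 (distributive laws)
dist_a = (A | intersection(B, X)) == intersection([A | Bi for Bi in B], X)
dist_b = (A & union(B)) == union([A & Bi for Bi in B])

# Theorem 1.1.2 (De Morgan's laws), applied to the family B
dm_a = (X - union(B)) == intersection([X - Bi for Bi in B], X)
dm_b = (X - intersection(B, X)) == union([X - Bi for Bi in B])

print(dist_a, dist_b, dm_a, dm_b)
```

Changing the sample sets should leave all four comparisons true, since the identities hold for every finite family of subsets of X.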

Definition. If x and y are objects (e.g., numbers, functions, sets), the ordered pair
(x, y) is defined by (x, y) = {x, {x, y}}. The reader can verify that the definition
guarantees that (x, y) = (a, b) if and only if x = a, and y = b.

Definition. Let X and Y be nonempty sets. The Cartesian product of X and Y is
the set of all ordered pairs:

X × Y = {(x, y) ∶ x ∈ X, y ∈ Y}.

Definitions. Let X and Y be nonempty sets. A function f from X to Y is a subset
of X × Y such that for any x ∈ X, there is a unique y ∈ Y such that (x, y) ∈ f. We
use the more common notation y = f(x) instead of the cumbersome (x, y) ∈ f.
We use the notation f ∶ X → Y to indicate that f is a function from X to Y; X is
called the domain of f, denoted Dom(f), and the range of f, denoted ℜ(f), is
the set of all function values,

ℜ(f) = {f(x) ∶ x ∈ X}.

If A ⊆ X, the image of A under f is the set f(A) = {f(a) ∶ a ∈ A}. The inverse
image of a set B ⊆ Y is the set f^{-1}(B) = {x ∈ X ∶ f(x) ∈ B}.

Definitions. A function f ∶ X → Y is onto (or surjective) if ℜ(f) = Y. Thus,
every y ∈ Y is the image of some x ∈ X. A function f is called one-to-one (or
injective) if, for x_1, x_2 ∈ X, x_1 ≠ x_2 implies f(x_1) ≠ f(x_2). Finally, f is called a
one-to-one correspondence (or a bijection) if f is one-to-one and onto.

Definition. The identity function I_X on a set X is the function I_X(x) = x for all
x ∈ X.

Definition. Let f ∶ X → Y and g ∶ Y → Z. The composition of f and g is the
function g ∘ f ∶ X → Z defined by (g ∘ f)(x) = g(f(x)).
We sometimes use the notation gf if there is no danger of confusing the
composition of f and g with the product of f and g.

Definition. Let f ∶ X → Y and g ∶ Y → X be functions. We say that g is the inverse
of f if g ∘ f = I_X and f ∘ g = I_Y.
We write g = f^{-1} to indicate that g is the inverse of f. Notice that a function f
has an inverse if and only if it is bijective. Also, if g = f^{-1}, then f = g^{-1}.
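
On finite sets, a function can be stored as a Python dictionary, which mirrors its definition as a set of ordered pairs. The sketch below is illustrative only; the helper names compose and inverse, and the sample sets X and Y, are assumptions made for this example. It shows that the inverse of a bijection composes to the identity on both sides, and that a function that is not bijective has no inverse.

```python
def compose(g, f):
    """Return g ∘ f for functions stored as dicts: (g ∘ f)(x) = g(f(x))."""
    return {x: g[f[x]] for x in f}

def inverse(f, Y):
    """Return the inverse of f : X → Y as a dict, or None if f is not a bijection."""
    values = set(f.values())
    is_onto = values == set(Y)
    is_one_to_one = len(values) == len(f)
    if not (is_onto and is_one_to_one):
        return None
    return {y: x for x, y in f.items()}

X = {1, 2, 3}                     # hypothetical domain
Y = {"a", "b", "c"}               # hypothetical codomain
f = {1: "a", 2: "b", 3: "c"}      # a bijection from X to Y
g = inverse(f, Y)

identity_X = {x: x for x in X}
identity_Y = {y: y for y in Y}
print(compose(g, f) == identity_X, compose(f, g) == identity_Y)

h = {1: "a", 2: "a", 3: "b"}      # neither one-to-one nor onto
print(inverse(h, Y))
```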

Definition. A finite sequence in a set A is a function a ∶ {1, 2, … , n} → A. The
element a(i) is often denoted by a_i. It is sometimes the case that a distinction
must be made between the sequence (as a function) and its range (as a set). We
denote a sequence by the notation (a_i)_{i=1}^{n}, and its range by {a_1, a_2, … , a_n}. An
infinite sequence in A is a function a ∶ ℕ → A. An infinite sequence is often
given the notation (a_n).

Indexed Sets

Let I be a set (the indexing set) and let 𝔄 be a collection of sets. An indexing of
𝔄 by I is a bijection A ∶ I → 𝔄. The image of an element 𝛼 ∈ I is denoted by A𝛼
instead of A(𝛼). Thus 𝔄 = {A𝛼 ∶ 𝛼 ∈ I}. Indexing is, of course, not limited to sets;
one can index, for example, a set of numbers, or functions. If there is no danger
of ambiguity, we sometimes omit reference to the indexing set I and write {A𝛼 }𝛼 .
Indexing is clearly a generalization of sequencing, as illustrated by the examples
below.

Example 1. For each n ∈ ℤ, let A_n = (n, n + 1). This is a collection of intervals
indexed by ℤ. ∎

Example 2. Let I = ℝ and, for 𝛼 ∈ I, let B_𝛼 = (𝛼, ∞). ∎


Example 3. We can index the set of linear homogeneous functions in one real
variable as {f_𝛼 ∶ 𝛼 ∈ ℝ}, where, for x ∈ ℝ, f_𝛼(x) = 𝛼x. ∎

Definition. Let 𝔄 = {A_𝛼 ∶ 𝛼 ∈ I} be an indexed family of subsets of a set X. We
define the union and intersection of 𝔄 as follows:
⋃_{𝛼∈I} A_𝛼 = {x ∈ X ∶ x ∈ A_𝛼 for some 𝛼 ∈ I},
⋂_{𝛼∈I} A_𝛼 = {x ∈ X ∶ x ∈ A_𝛼 for all 𝛼 ∈ I}.

Example 4. In example 1, ⋃_{n∈ℤ} A_n = ℝ − ℤ, ⋂_{n∈ℤ} A_n = ∅. ∎

Example 5. In example 2, ⋃_{𝛼∈ℝ} B_𝛼 = ℝ, ⋂_{𝛼∈ℝ} B_𝛼 = ∅. ∎

Definition. A family of sets {A_𝛼}_𝛼 is said to be disjoint if A_𝛼 ∩ A_𝛽 = ∅ whenever
𝛼 ≠ 𝛽.

For example, the family {A_n} in example 1 is a disjoint family.

The following theorem will be used frequently in this book.

Theorem 1.1.3. Let (A_n) be a sequence of subsets of a given set X. Then

(a) There exists a sequence of sets (B_n) such that B_1 ⊆ B_2 ⊆ … and ⋃_{n=1}^{∞} A_n = ⋃_{n=1}^{∞} B_n. We simply define B_n = ⋃_{i=1}^{n} A_i.
(b) There exists a disjoint sequence of sets (C_n) such that ⋃_{n=1}^{∞} A_n = ⋃_{n=1}^{∞} C_n. The sequence we seek is C_1 = A_1 and, for n ≥ 2, C_n = A_n − ⋃_{i=1}^{n−1} A_i. ∎
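
The two constructions in theorem 1.1.3 are easy to carry out on a finite list of sets. The Python sketch below is only an illustration on a hypothetical finite sample (the theorem itself concerns infinite sequences); the function names ascending and disjointify are introduced here and do not appear in the text.

```python
def ascending(sets):
    """Part (a): B_n is the union of A_1, ..., A_n."""
    result, accumulated = [], set()
    for a in sets:
        accumulated = accumulated | a
        result.append(accumulated)
    return result

def disjointify(sets):
    """Part (b): C_1 = A_1 and C_n = A_n minus the union of A_1, ..., A_{n-1}."""
    result, seen = [], set()
    for a in sets:
        result.append(a - seen)
        seen = seen | a
    return result

A = [{1, 2}, {2, 3}, {1, 4}, {3}]   # hypothetical finite sample A_1, ..., A_4
B = ascending(A)
C = disjointify(A)

print(B)
print(C)
```

Note that some of the sets C_n may be empty, as happens for the last set in this sample; this does not affect disjointness or the equality of the unions.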

Definition. A sequence (B_n) of sets is said to be ascending if B_1 ⊆ B_2 ⊆ … .
A sequence (B_n) of sets is said to be descending if B_1 ⊇ B_2 ⊇ … .

The following two theorems generalize theorems 1.1.1 and 1.1.2.

Theorem 1.1.4 (distributive laws). Let {B_𝛼}_𝛼 be an indexed family of subsets of a
set X, and let A be a subset of X. Then

(a) A ∪ (⋂_𝛼 B_𝛼) = ⋂_𝛼 (A ∪ B_𝛼),
(b) A ∩ (⋃_𝛼 B_𝛼) = ⋃_𝛼 (A ∩ B_𝛼). ∎

Theorem 1.1.5 (De Morgan’s laws). Let {A_𝛼}_𝛼 be an indexed family of subsets of a
set X. Then

(a) X − ⋂_𝛼 A_𝛼 = ⋃_𝛼 (X − A_𝛼),
(b) X − ⋃_𝛼 A_𝛼 = ⋂_𝛼 (X − A_𝛼). ∎

Theorem 1.1.6. Let f ∶ X → Y, let {A_𝛼}_𝛼 be a collection of subsets of X, and let {B_𝛽}_𝛽
be a collection of subsets of Y. Then

(a) f(⋃_𝛼 A_𝛼) = ⋃_𝛼 f(A_𝛼),
(b) f(⋂_𝛼 A_𝛼) ⊆ ⋂_𝛼 f(A_𝛼),
(c) f^{-1}(⋃_𝛽 B_𝛽) = ⋃_𝛽 f^{-1}(B_𝛽),
(d) f^{-1}(⋂_𝛽 B_𝛽) = ⋂_𝛽 f^{-1}(B_𝛽). ∎
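
The inclusion in part (b) of theorem 1.1.6 can be strict when f is not one-to-one, whereas inverse images preserve both unions and intersections. The Python sketch below illustrates this with a hypothetical example, the squaring function on a small set of integers; the helper names image and preimage are introduced for this example only.

```python
def image(f, A):
    """The image f(A) = {f(a) : a in A}."""
    return {f(a) for a in A}

def preimage(f, X, B):
    """The inverse image of B under f, with domain X."""
    return {x for x in X if f(x) in B}

def square(x):
    return x * x

X = {-2, -1, 0, 1, 2}        # hypothetical domain
A1, A2 = {-2, -1}, {1, 2}    # disjoint subsets of X with the same image
B1, B2 = {0, 1}, {1, 4}      # subsets of the codomain

# Part (b): the image of the intersection is a proper subset here.
lhs = image(square, A1 & A2)
rhs = image(square, A1) & image(square, A2)
print(lhs, rhs, lhs < rhs)

# Parts (c) and (d): inverse images commute with unions and intersections.
print(preimage(square, X, B1 | B2) == preimage(square, X, B1) | preimage(square, X, B2))
print(preimage(square, X, B1 & B2) == preimage(square, X, B1) & preimage(square, X, B2))
```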

Definition (Cartesian products). Let {X_𝛼}_{𝛼∈I} be a nonempty collection of
nonempty sets. The product ∏_{𝛼∈I} X_𝛼 is the collection of all functions

x ∶ I → ⋃_{𝛼∈I} X_𝛼

such that x(𝛼) ∈ X_𝛼 for all 𝛼 ∈ I. We write x_𝛼 for x(𝛼).

We will denote the function x in the above definition by (x_𝛼)_{𝛼∈I}, or simply (x_𝛼).
The above definition generalizes the definition of the Cartesian product of a
finite number of sets. Indeed, for sets X_1, X_2, … , X_n, the Cartesian product ∏_{i=1}^{n} X_i
is the set of all sequences (x_1, x_2, … , x_n) such that x_i ∈ X_i for all 1 ≤ i ≤ n. A
sequence is nothing but a function x ∶ {1, 2, … , n} → ⋃_{i=1}^{n} X_i such that x_i = x(i) ∈
X_i for all 1 ≤ i ≤ n.

Example 6. ℝ^n = ℝ × ... × ℝ (n factors) is the Euclidean n-space. The complex
n-space ℂ^n is defined similarly. ∎

Example 7. Let ℝ^ℕ be the set of all infinite sequences in ℝ. This is also the product
∏_{i=1}^{∞} X_i, where each X_i = ℝ. ∎

Example 8. Let A be a set, and let 2^A denote the set of all functions from A to
the set {0, 1}. Indeed, 2^A is a product because if we define X_𝛼 = {0, 1} for all
𝛼 ∈ A, then 2^A = ∏_{𝛼∈A} X_𝛼. As a special case, the set 2^ℕ is the set of all binary
sequences. ∎

Definition (set exponentiation). Let A and B be nonempty sets. Define A^B to be
the set of all functions f ∶ B → A. We leave it to the reader to interpret A^B as a
product.

Definition. Let A be a nonempty set. The collection of all the subsets of A,
including the empty set, is known as the power set of A and is denoted
by 𝒫(A).

Definition. For a subset S of a set A, we define the characteristic function of S by

𝜒_S(x) = 1 if x ∈ S,
𝜒_S(x) = 0 if x ∉ S.

Clearly, 𝜒_S ∈ 2^A. Moreover, the correspondence 𝜒 ∶ 𝒫(A) → 2^A that assigns to
each element S ∈ 𝒫(A) (i.e., S ⊆ A) its characteristic function 𝜒_S is a bijection
from 𝒫(A) to 2^A. We leave it to the reader to verify the details.
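
For a small finite set, the bijection between the power set and the set of all {0, 1}-valued functions can be listed explicitly. The Python sketch below is an illustration under stated assumptions, not a substitute for the verification left to the reader: the three-element set A is hypothetical, and a function from A to {0, 1} is encoded as a tuple of values taken in the fixed order of the list A.

```python
from itertools import combinations, product

A = ["a", "b", "c"]   # hypothetical three-element set, listed in a fixed order

def power_set(elements):
    """All subsets of `elements`, each represented as a frozenset."""
    return [frozenset(c)
            for r in range(len(elements) + 1)
            for c in combinations(elements, r)]

def chi(S):
    """Characteristic function of S, encoded as its tuple of values over A."""
    return tuple(1 if x in S else 0 for x in A)

subsets = power_set(A)
images = {chi(S) for S in subsets}
all_functions = set(product((0, 1), repeat=len(A)))   # every element of 2^A

print(len(subsets), len(images), images == all_functions)
```

Distinct subsets give distinct tuples (the map is one-to-one), and every tuple arises from some subset (the map is onto), so both sets have 2^3 = 8 elements.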

Definition. Let {X_𝛼}_{𝛼∈I} be a collection of sets, and let X = ∏_{𝛼∈I} X_𝛼. For each
𝛼 ∈ I, define the projection 𝜋_𝛼 ∶ X → X_𝛼 by 𝜋_𝛼(x) = x_𝛼. Here x = (x_𝛼)_{𝛼∈I} is
an element of X.

Example 9. Let X = ℝ^n. Then, 𝜋_1 ∶ ℝ^n → ℝ is indeed what we think of as the
projection of ℝ^n onto the x_1-axis: 𝜋_1(x_1, … , x_n) = x_1. ∎

Example 10. Consider the set D of all functions f ∶ [0, 1] → ℝ. This set can be
thought of as ∏_{𝛼∈[0,1]} X_𝛼, where each X_𝛼 = ℝ. If f ∈ D and a ∈ [0, 1], then
𝜋_a(f) = f(a). Fix an element a ∈ [0, 1] and an interval U ⊆ ℝ. It makes sense
to ask what 𝜋_a^{-1}(U) is. This is simply the set of all functions f ∈ D such that
𝜋_a(f) ∈ U or simply f(a) ∈ U. Thus 𝜋_a^{-1}(U) is the set of all the functions on
the closed unit interval whose graphs cross the line segment {a} × U. ∎

Definition. A relation R on a set A is a subset of A × A. Thus R is a set of ordered
pairs (x, y), where x, y ∈ A. Instead of writing (x, y) ∈ R, we write xRy. If xRy,
we say that x is related to y.

Definition. A relation R on a set A is said to be

(a) reflexive if, for all x ∈ A, xRx;
(b) symmetric if yRx whenever xRy;
(c) transitive if xRy and yRz imply xRz; and
(d) an equivalence relation if it is reflexive, symmetric, and transitive.

Definition. Let R be an equivalence relation on a set A, and let x ∈ A. The
equivalence class of x, denoted [x], is [x] = {y ∈ A ∶ yRx}.

Theorem 1.1.7. Let R be an equivalence relation on a set A, and let x, y ∈ A. Then

(a) [x] = [y] if and only if xRy.
(b) If [x] ≠ [y], then [x] ∩ [y] = ∅.

Thus the union of the equivalence classes is A, and distinct equivalence classes are
disjoint. The common terminology is that the equivalence classes partition A.
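
On a finite set, the partition into equivalence classes can be computed directly. The sketch below is a hypothetical illustration: the helper `equivalence_classes` and the sample relation (x is related to y when |x| = |y|) are our own choices. It uses the first part of theorem 1.1.7 to compare each element with a single representative of every class found so far.

```python
def equivalence_classes(elements, related):
    """Group elements into the classes of the equivalence relation `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            # [x] = [y] if and only if xRy, so one representative suffices.
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])   # x starts a new class
    return classes

A = list(range(-2, 3))
classes = equivalence_classes(A, lambda x, y: abs(x) == abs(y))
print(classes)   # [[-2, 2], [-1, 1], [0]]

# The classes cover A and are pairwise disjoint, i.e. they partition A.
print(sorted(x for cls in classes for x in cls) == A)
```

The helper is only meaningful when `related` really is an equivalence relation; for a relation that fails transitivity, comparing with one representative is not justified.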

Exercises

In the exercises below, A, An , B, C, and so on are subsets of a nonempty set X.

1. Prove that A ∩ (B − C) = (A ∩ B) − (A ∩ C).
Is it true that A ∪ (B − C) = (A ∪ B) − (A ∪ C)?
2. Find $\bigcup_{n \in \mathbb{N}} [\frac{1}{n}, n]$, $\bigcap_{\alpha \in (0,1)} [\alpha - 1, \alpha + 1]$, and $\bigcup_{\alpha \in (0,1)} [\alpha - 1, \alpha + 1]$.
3. (a) Show that A ⊆ B if and only if 𝒫(A) ⊆ 𝒫(B).
(b) Show that 𝒫(A) ∪ 𝒫(B) ⊆ 𝒫(A ∪ B).
(c) Show that 𝒫(A) ∩ 𝒫(B) = 𝒫(A ∩ B).
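4. For $r \in \mathbb{R}$, $r > 0$, let $B_r = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le r^2\}$. Find $\bigcap_{r>0} B_r$ and
$\bigcup_{r>0} B_r$.
5. Describe the following sets in words: $\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$ and $\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$.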
6. Let {A𝛼 }𝛼 be an indexed family of sets, and let B be a set. Show that
(a) (∪𝛼 A𝛼 ) × B = ∪𝛼 (A𝛼 × B), and
(b) (∩𝛼 A𝛼 ) × B = ∩𝛼 (A𝛼 × B).
7. Let A = {x ∈ ℝ ∶ |x| ≤ 1}, B = {x ∈ ℝ ∶ |x| ≥ 1}. Give a geometric interpretation
of A × B.
8. Consider the product $\prod_{\alpha \in I} X_\alpha$ of a collection of sets. Suppose that
$\{\alpha_1, \alpha_2, \dots, \alpha_n\}$ is a finite subset of $I$ and that $U_{\alpha_i} \subseteq X_{\alpha_i}$, $1 \le i \le n$.
Describe the set $\bigcap_{i=1}^{n} \pi_{\alpha_i}^{-1}(U_{\alpha_i})$.
9. Prove theorem 1.1.6.
10. Let f ∶ X → Y. Show that the following are equivalent:
(a) f is one-to-one.
(b) f (A1 ∩ A2 ) = f (A1 ) ∩ f (A2 ) for every A1 , A2 ⊆ X.
(c) $f^{-1}(f(A)) = A$ for every $A \subseteq X$.
(d) f (X − A) ⊆ Y − f (A) for every A ⊆ X.
11. Let f ∶ X → Y. Show that the following are equivalent:
(a) f is onto.
(b) $f(f^{-1}(B)) = B$ for every $B \subseteq Y$.
(c) f (X − A) ⊇ Y − f (A) for every A ⊆ X.
12. Show that the composition of two injective (respectively, surjective, bijective)
functions is injective (respectively, surjective, bijective).
13. Let $f : A \to B$ and $g : B \to C$ be bijections. Show that $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$.
14. (a) Show that if $f : A \to B$ is injective, then there exists a function $g : B \to A$
such that $g \circ f = I_A$.
(b) Show that if $f : A \to B$ is surjective, then there exists a function $g : B \to A$
such that $f \circ g = I_B$.
15. Show that the function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ given by $f(m, n) = 2^{m-1}(2n - 1)$ is a
one-to-one correspondence.
16. Verify the one-to-one correspondence between $2^A$ and $\mathcal{P}(A)$.
17. Show that if $A$ has $n$ elements and $B$ has $m$ elements, then $A^B$ has $n^m$
elements. Conclude that $\mathcal{P}(A)$ has $2^n$ elements.
18. Let S and T be subsets of a set A. Show that
(a) $\chi_{S \cap T} = \chi_S \cdot \chi_T$; and
(b) $\chi_{S \cup T} = \chi_S + \chi_T - \chi_{S \cap T}$.
19. Prove theorem 1.1.7.
20. Fix an integer n > 1, and define a relation R on ℤ as follows: xRy if x − y
is a multiple of n. Show that R is an equivalence relation, and describe the
equivalence classes.
21. Define a relation R on ℝ as follows: xRy if and only if x − y ∈ ℚ. Show that
R is an equivalence relation.
22. Define a relation $R$ on $\mathbb{Z}$ by $xRy$ if and only if $x^2 + y^2$ is even. Show that $R$
is an equivalence relation.
23. Define a relation R on ℝ by xRy if and only if xy ≥ 0. Is R an equivalence
relation?

1.2 The Real and Complex Number Fields

An organized study of mathematics must be rooted in a proper understanding of
number systems. Authors of textbooks such as this one are often divided between
two extremes: either they provide an extensive development of number systems
from scratch or they ignore the entire matter and consider knowledge of the
real numbers to be a prerequisite. This presentation is a compromise between
the two extremes. It is assumed that the reader has a thorough knowledge of
integers and the rational number field, including such topics as divisibility, prime
factorizations, the infinitude of the set of prime numbers, and the construction
of ℚ in the usual manner as the quotient field of ℤ. We basically accept the
completeness of real numbers as an axiom, then prove the Cauchy criterion and
the Bolzano-Weierstrass property, which we decided to develop at length since it
is a cornerstone theorem. The section concludes with the definition of complex
numbers and a study of their basic properties, including completeness. Although
the section is not totally self-contained, there is value in its inclusion because
it illustrates a number of important proof techniques and provides a succinct
summary of the properties of real and complex number fields.

Definition. Let F be a nonempty set endowed with two binary operations, +
(addition) and × (multiplication). The triple (F, +, ×) is said to be a field if the
following conditions are satisfied for all a, b, c ∈ F :

(a) a + b = b + a.
(b) a + (b + c) = (a + b) + c.
(c) There is an element 0 ∈ F such that a + 0 = a.
(d) For every a ∈ F, there is an element −a ∈ F such that a + (−a) = 0.
(e) a × b = b × a.
(f) a × (b × c) = (a × b) × c.
(g) There is an element 1 ∈ F such that a × 1 = a.
(h) For every $a \ne 0$, there is an element $a^{-1}$ such that $a \times a^{-1} = 1$.
(i) a × (b + c) = a × b + a × c.

We often omit the symbol for multiplication and write ab or a.b for a × b. The
element 0 is called the additive identity, and 1 is called the multiplicative identity
of the field. A field must clearly contain at least two elements.

With the usual operations of addition and multiplication of numbers, the rational
numbers, ℚ, and the real numbers, ℝ, are fields. We will see later in this section
that complex numbers also form a field.

Example 1. Let $p$ be a prime number. Define an equivalence relation $\equiv$ on $\mathbb{Z}$ by
$a \equiv b$ if $a - b$ is divisible by $p$. Since the remainder upon dividing a whole
number by $p$ is an integer between $0$ and $p - 1$, the equivalence classes
containing $0, 1, \dots, p - 1$ are all the equivalence classes of $\equiv$. The field of $p$
elements (also called the integers modulo $p$) consists of the equivalence classes
of $0, 1, \dots, p - 1$ and is often given the symbol $\mathbb{Z}_p$. The equivalence class of an
integer $n$ is denoted $\overline{n}$. Addition and multiplication in $\mathbb{Z}_p$ are defined as follows:
$\overline{n} + \overline{m} = \overline{n + m}$, and $\overline{n} \cdot \overline{m} = \overline{nm}$. This simply means that we add or multiply
integers representing the class, then reduce the result modulo $p$. For example,
let $p = 7$. Then $\overline{6} + \overline{5} = \overline{11} = \overline{4}$.
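
The arithmetic of the integers modulo p is easy to experiment with. The following sketch is an illustration under our own naming (`add`, `mul`, `inverses`), with a class represented by any integer in it and results reduced to the range 0, …, p − 1. It reproduces the computation for p = 7 and lists a multiplicative inverse for each nonzero class, found by brute-force search.

```python
p = 7

def add(a, b):
    """Add two classes, each given by any integer representative."""
    return (a + b) % p

def mul(a, b):
    """Multiply two classes, each given by any integer representative."""
    return (a * b) % p

print(add(6, 5))   # 4, since 11 leaves remainder 4 modulo 7

# Other representatives of the same two classes give the same answer.
print(add(6 + 7, 5 - 14) == add(6, 5))

# Every nonzero class has a multiplicative inverse when p is prime.
inverses = {a: next(b for b in range(1, p) if mul(a, b) == 1) for a in range(1, p)}
print(inverses)    # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}
```

Replacing p by a composite number such as 6 makes the search for inverses fail for some classes, which is one way to see why p is required to be prime.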

Real Numbers

Definition. A subset A of ℝ is said to be bounded above if there is a real number
M such that, for all x ∈ A, x ≤ M. The number M is called an upper bound of
A. A is said to be bounded below if there is a real number m such that, for all
x ∈ A, x ≥ m. The number m is called a lower bound of A. Finally, A is bounded
if it is bounded above and below.

It is clear that if M is an upper bound of A, then every real number greater than M
is also an upper bound of A. This leads to the following definition.

Definition. The least upper bound of a set A ⊆ ℝ is the number M such that

(a) M is an upper bound of A, and
(b) for all 𝜖 > 0, M − 𝜖 is not an upper bound of A. Thus there is an element
x ∈ A such that x > M − 𝜖.

The least upper bound of A is also called the supremum of A and is denoted by
sup A. If A is not bounded above, we set sup A = ∞.

Definition. The greatest lower bound of a set A ⊆ ℝ is the number m such that

(a) m is a lower bound of A, and
(b) for all 𝜖 > 0, m + 𝜖 is not a lower bound of A. Thus there is an element
x ∈ A such that x < m + 𝜖.

The greatest lower bound of A is also called the infimum of A and is given the
notation inf A. If A is not bounded below, we set inf A = −∞.

Example 2. Let $A_1 = (-\infty, 1)$, $A_2 = \{\frac{1}{n} : n \in \mathbb{N}\}$, and $A_3 = \{\frac{n}{n+1} : n \in \mathbb{N}\}$. Then
$\sup A_1 = \sup A_2 = \sup A_3 = 1$, $\inf A_1 = -\infty$, $\inf A_2 = 0$, and $\inf A_3 = 1/2$.

The completeness of ℝ. We accept the following fact as true:

Let A ⊂ ℝ be nonempty and bounded above. Then A has a least upper bound.

The above fact is not trivial; its establishment requires delving deep into the very
definition of the real numbers, which we will not do here. The following example
shows that ℚ is not complete and illustrates that the completeness of ℝ is not to
be mistaken for a simple fact.

Example 3. Let $A = \{x \in \mathbb{Q} : x^2 < 2\}$. Clearly, $A$ is bounded above and below.
However, $\sup A$ and $\inf A$ are not in $\mathbb{Q}$.

Definition. A sequence (an ) of real numbers is said to converge to a ∈ ℝ if, for
every 𝜖 > 0, there is a natural number N such that |an − a| < 𝜖 for all n > N. In
this case, we say the limit of (an ) is a, and we write limn→∞ an = a or simply
limn an = a.

Definition. A sequence (an ) of real numbers is said to diverge to ∞ if, for every
M ∈ ℝ, there is a natural number N such that an > M for all n > N. In this case,
we also say that an has limit ∞, and we write limn an = ∞. The sequence (an )
is said to diverge to −∞ if, for every m ∈ ℝ, there is a natural number N such
that an < m for all n > N. In this case, we also say that an has limit −∞, and we
write limn an = −∞.

Example 4. Let $a_n = n + \frac{1}{n}$, $b_n = 1 + (-1)^n$, $c_n = e^{-n}$.
The sequence $(a_n)$ diverges to $\infty$, while $(b_n)$ does not converge, nor does it
diverge to $\pm\infty$. Finally, $\lim_n c_n = 0$.

Example 5. If, for every n ∈ ℕ, an ≤ bn and limn an = a, limn bn = b, then a ≤ b.
Suppose for a contradiction that b < a. Let 𝜖 = (a − b)/3. Observe that b − 𝜖 <
b + 𝜖 < a − 𝜖 < a + 𝜖. There exist integers N1 and N2 such that, for n > N1 , bn ∈
(b − 𝜖, b + 𝜖), and for n > N2 , an ∈ (a − 𝜖, a + 𝜖). Now, for any n > max{N1 , N2 },
bn < an , which is a contradiction. 

Theorem 1.2.1. If an ≤ cn ≤ bn and limn an = limn bn = a, then limn cn = a. 

Definition. A sequence (an ) is bounded if its range {a1 , a2 , ...} is a bounded set.
Thus there is a positive number M such that, for all n ∈ ℕ, |an | ≤ M.

Theorem 1.2.2. A convergent sequence is bounded. 

Definitions. A sequence (an ) is non-decreasing if a1 ≤ a2 ≤ ….
A sequence (an ) is (strictly) increasing if a1 < a2 < ….
A sequence (an ) is non-increasing if a1 ≥ a2 ≥ ….
A sequence (an ) is (strictly) decreasing if a1 > a2 > ….
A sequence (an ) is monotonic if it is non-decreasing or non-increasing.

Example 6. $a_n = 1/n$ is decreasing, but $b_n = (-1)^n/n$ is not monotonic.

Theorem 1.2.3. A monotonic sequence is convergent if and only if it is bounded.

Proof. Without loss of generality, let (an ) be a bounded, non-decreasing sequence,
and let A = {an ∶ n ∈ ℕ}. By assumption, A is bounded, so, by the completeness
of ℝ, a = sup A exists. We show that limn an = a. Let 𝜖 > 0. There is an integer
N > 0 such that a − 𝜖 < aN . Because (an ) is non-decreasing, for any n > N,
a − 𝜖 < aN ≤ an ≤ a < a + 𝜖; hence limn an = a. The converse is a special case of
theorem 1.2.2. 

Definition. A sequence (an ) is said to be a Cauchy sequence if, for every 𝜖 > 0, there
is a natural number N such that, for all m, n > N, |an − am | < 𝜖.

Theorem 1.2.4. A convergent sequence is a Cauchy sequence. 

Theorem 1.2.5. A Cauchy sequence is bounded.

Proof. Let 𝜖 = 1. There is a positive integer N such that, for m, n ≥ N, |an − am | < 1.
In particular, taking m = N, |an − aN | < 1 for all n ≥ N. Thus, by the triangle
inequality, for every n ≥ N, |an | = |(an − aN ) + aN | ≤ |an − aN | + |aN | ≤ 1 +
|aN |. Let M = max{|a1 |, … , |aN−1 |, 1 + |aN |}. Clearly, |an | ≤ M for all n.

Definition. Let $(a_n)$ be a sequence, and let $(n_1, n_2, \dots)$ be a strictly increasing
sequence of natural numbers. We say that $(a_{n_k})_{k=1}^{\infty}$ is a subsequence of $(a_n)$.

Theorem 1.2.6. A subsequence of a convergent sequence is convergent to the same
limit. Thus if limn an = a and (ank ) is a subsequence of (an ), then limk→∞ ank = a.

Proof. Let 𝜖 > 0. Since limn an = a, there is a positive integer N such that, for n > N,
|an − a| < 𝜖. Since (nk ) is an increasing sequence of natural numbers, nk ≥ k for
every k ∈ ℕ. Thus, for k > N, nk > N and |ank − a| < 𝜖. 

Theorem 1.2.7. Every sequence (an ) contains a monotonic subsequence.

Proof. Define a term $a_n$ of the sequence to be a peak if, for every $i \ge n$, $a_n \ge a_i$. There
are two cases:

Case 1. The sequence $(a_n)$ has finitely many peaks. If there are no peaks, let $n_1 = 1$;
otherwise, let $k_0$ be the largest positive integer for which $a_{k_0}$ is a peak, and let
$n_1 = k_0 + 1$. Since $a_{n_1}$ is not a peak, there is
an integer $n_2 > n_1$ such that $a_{n_2} > a_{n_1}$. Continuing inductively, one can construct
a strictly increasing sequence of positive integers $n_1 < n_2 < n_3 < \cdots$ such that
$a_{n_1} < a_{n_2} < a_{n_3} < \cdots$. The sequence $(a_{n_k})$ is an increasing subsequence of $(a_n)$.
Case 2. The sequence $(a_n)$ contains infinitely many peaks $a_{n_1}, a_{n_2}, a_{n_3}, \dots$,
where $(n_k)$ is an increasing sequence of positive integers. By the definition of a peak,
$a_{n_1} \ge a_{n_2} \ge a_{n_3} \ge \cdots$, so the sequence $(a_{n_k})$ is a
non-increasing subsequence of $(a_n)$.

Theorem 1.2.8 (the Bolzano-Weierstrass theorem). Every bounded sequence
contains a convergent subsequence.

Proof. Let (an ) be a bounded sequence. By the previous theorem, (an ) contains a
monotonic subsequence, (ank ), which is convergent by theorem 1.2.3. 

The following two examples are fixtures in undergraduate real analysis books. The
proof technique is quite common and is valid for general compact sets. See chapter
4. First we remind the reader of the definition of continuity.

Definition. Let X be a subset of ℝ, and let f ∶ X → ℝ. We say that f is continuous
at a point x ∈ X if, for every 𝜖 > 0, there exists 𝛿 > 0 such that | f (y) − f (x)| < 𝜖
whenever y ∈ X and |y − x| < 𝛿. The function f is said to be continuous on X if
it is continuous at every point x ∈ X.

It is easy to see that if f is continuous at x, and xn ∈ X is such that limn xn = x, then
limn f (xn ) = f (x).

Example 7. A continuous real-valued function f on a closed bounded interval
[a, b] is bounded and attains its supremum and infimum values.

Suppose, for a contradiction, that f is unbounded. Without loss of generality,
assume that supx∈[a,b] f (x) = ∞. There exists a sequence (xn ) in [a, b] such that
limn f (xn ) = ∞. By the Bolzano-Weierstrass theorem, (xn ) contains a convergent
subsequence xnk . Because [a, b] is closed, x = limk xnk ∈ [a, b]. Now we have the
following contradiction: f (x) = limk f (xnk ) = ∞.
The proof that f attains its supremum and infimum values replicates the above
argument. 

Definition. Let X be a subset of ℝ, and let f ∶ X → ℝ. We say that f is uniformly
continuous on X, if for every 𝜖 > 0, there exists 𝛿 > 0 such that | f (y) − f (x)| < 𝜖
whenever x, y ∈ X and |y − x| < 𝛿.

The number 𝛿 in the above definition depends on 𝜖 only and not on any particular
x ∈ X. For example, the function f ∶ (0, 1) → ℝ defined by f (x) = 1/x is continuous
but not uniformly continuous.
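
The failure of uniform continuity for f(x) = 1/x on (0, 1) can be seen from one explicit computation. The following worked equation is a sketch; the particular points x_n and y_n are our own choice, and any pair of points that crowd toward 0 would serve equally well.

```latex
\[
x_n = \frac{1}{n}, \quad y_n = \frac{1}{n+1} \quad (n \ge 2), \qquad
|x_n - y_n| = \frac{1}{n(n+1)} \longrightarrow 0, \qquad
|f(x_n) - f(y_n)| = |n - (n+1)| = 1 .
\]
```

Thus, for 𝜖 = 1 and any 𝛿 > 0, choosing n so large that 1/(n(n + 1)) < 𝛿 yields two points of (0, 1) that are within 𝛿 of each other while their images differ by 1. No single 𝛿 works for 𝜖 = 1, so f is not uniformly continuous on (0, 1).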

Example 8. A continuous real-valued function $f$ on a closed bounded interval
$[a, b]$ is uniformly continuous.

Suppose that $f$ is not uniformly continuous. Then there exists a positive
number $\epsilon$ such that, for every $n \in \mathbb{N}$, $[a, b]$ contains a pair of points $x_n$ and
$y_n$ such that $|x_n - y_n| < \frac{1}{n}$ and $|f(y_n) - f(x_n)| \ge \epsilon$. By the Bolzano-Weierstrass
theorem, $(x_n)$ contains a convergent subsequence $(x_{n_k})$. Let $x = \lim_k x_{n_k}$. Observe
that $x \in [a, b]$ and $\lim_k y_{n_k} = x$. Now, for all $k \in \mathbb{N}$, $|f(y_{n_k}) - f(x_{n_k})| \ge \epsilon$.
This is a contradiction because if we take the limit as $k \to \infty$ of the left-hand
side of the last inequality and use the continuity of $f$, we obtain
$0 = |f(x) - f(x)| = \lim_k |f(y_{n_k}) - f(x_{n_k})| \ge \epsilon$.

Theorem 1.2.9 (the Cauchy criterion). A sequence in ℝ is a Cauchy sequence if
and only if it is convergent.

Proof. By theorem 1.2.4, every convergent sequence is a Cauchy sequence. To prove
the converse, let (an ) be a Cauchy sequence. By theorem 1.2.5, (an ) is bounded;
hence, by theorem 1.2.8, (an ) contains a convergent subsequence, (ank ). Let
limk ank = a. We show that limn an = a. Let 𝜖 > 0. Since (an ) is Cauchy, there
is a positive integer N such that, for n, m > N, |an − am | < 𝜖/2. Since limk ank = a,
there is an integer K such that for k ≥ K, |ank − a| < 𝜖/2. Without loss of
generality, we may assume that K > N; thus nK ≥ K > N. Taking m = nK and
using the triangle inequality, for all n > N, |an − a| ≤ |an − anK | + |anK − a| <
𝜖/2 + 𝜖/2 = 𝜖. 

Example 9. The rational field $\mathbb{Q}$ does not satisfy the Cauchy criterion. For
example, the sequence $\sum_{i=0}^{n} \frac{1}{i!}$ is a Cauchy sequence in $\mathbb{Q}$, but its limit, $e$, is
not in $\mathbb{Q}$.
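
The partial sums in this example can be computed exactly as rational numbers. The sketch below is an illustration only: the helper name `partial_sum` is ours, and the comparison with the floating-point constant `math.e` is a numerical sanity check rather than a proof that the limit is e.

```python
from fractions import Fraction
from math import factorial, e

def partial_sum(n):
    """The n-th partial sum 1/0! + 1/1! + ... + 1/n!, as an exact rational number."""
    return sum(Fraction(1, factorial(i)) for i in range(n + 1))

for n in (2, 5, 10):
    s = partial_sum(n)
    print(n, s, float(s))

# Consecutive terms differ by exactly 1/(n+1)!, which shrinks very quickly.
print(partial_sum(11) - partial_sum(10) == Fraction(1, factorial(11)))

# Every term is rational, yet the values approach the irrational number e.
print(abs(float(partial_sum(15)) - e) < 1e-12)
```

Each printed partial sum is a quotient of two integers, which makes the point of the example concrete: the terms never leave ℚ even though their limit does.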

Remark. The completeness of ℝ is, in fact, equivalent to the Cauchy criterion.
See example 10 below. This is why the Cauchy criterion is sometimes used as a
definition of the completeness of ℝ.

Definition. Let A be a subset of ℝ. A real number x is called a limit point of A if,
for every 𝛿 > 0, (x − 𝛿, x + 𝛿) ∩ A contains a point other than x.

Theorem 1.2.10 (the Bolzano-Weierstrass property of bounded sets). Every
bounded infinite subset A of ℝ has a limit point.

Proof. Let $I_1 = [a, b]$ be a closed bounded interval that contains $A$. Bisect $I_1$ into
two congruent closed subintervals. One of the two subintervals contains infinitely
many points of $A$. Denote that interval by $I_2$. Continuing this process produces
a sequence of subintervals $I_1 \supseteq I_2 \supseteq \cdots$ such that $A \cap I_n$ is infinite for all $n \in \mathbb{N}$,
and the length of $I_n$ is $l(I_n) = \frac{b-a}{2^{n-1}}$. For each $n \in \mathbb{N}$, pick a point $a_n \in A \cap I_n$. If
$m > n$, then $I_n \supseteq I_m$ and $a_m, a_n \in I_n$; hence $|a_n - a_m| \le \frac{b-a}{2^{n-1}}$. Since $\lim_n \frac{b-a}{2^{n-1}} = 0$, $(a_n)$
is a Cauchy sequence and hence, by theorem 1.2.9, convergent. Let $a = \lim_n a_n$. Since $a_i \in I_n$ for all $i \ge n$, $a \in I_n$ for all
$n$ (see exercise 9 at the end of this section). Now let $\delta > 0$. Since $\lim_n l(I_n) = 0$
and $a \in \bigcap_{n=1}^{\infty} I_n$, $I_n \subseteq (a - \delta, a + \delta)$ for sufficiently large $n$. Thus $(a - \delta, a + \delta)$
contains infinitely many points of $A$ because $I_n$ does. In particular, $(a - \delta, a + \delta) \cap A$
contains a point other than $a$.

Example 10. The completeness of ℝ is equivalent to the Cauchy criterion. The fact
that the completeness of ℝ implies the Cauchy criterion has been established
in theorem 1.2.9. Observe that the proof of theorem 1.2.9 depends heavily
(through the intervening theorems) on theorem 1.2.3, where the completeness
of ℝ was crucial. We now prove that the Cauchy criterion implies the completeness
of ℝ.

Let $A$ be a subset of $\mathbb{R}$ that is bounded above, pick an element $a_0 \in A$, and let $b_0$
be an upper bound of $A$. We construct two sequences $(a_n)$ and $(b_n)$ such that

(1) each $a_n \in A$, and each $b_n$ is an upper bound of $A$;
(2) $[a_{n+1}, b_{n+1}] \subseteq [a_n, b_n]$; and
(3) $b_n - a_n \le \frac{b_0 - a_0}{2^n}$.

Consequently, $a_{n+1} - a_n \le \frac{b_0 - a_0}{2^n}$ and $b_n - b_{n+1} \le \frac{b_0 - a_0}{2^n}$.

Suppose $a_1, \dots, a_n$ and $b_1, \dots, b_n$ have been found. We define $a_{n+1}$ and $b_{n+1}$ as
follows. Let $m = \frac{a_n + b_n}{2}$. If $m$ is an upper bound of $A$, let $a_{n+1} = a_n$, and let
$b_{n+1} = m$. If $m$ is not an upper bound of $A$, choose an element $a_{n+1} \in A$ such
that $m < a_{n+1} < b_n$, and define $b_{n+1} = b_n$.³

We now show that $(b_n)$ is a Cauchy sequence. Let $\epsilon > 0$, and choose an integer $N$
such that $(b_0 - a_0)/2^{N-1} < \epsilon$. For $m > n > N$, we have

$$
\begin{aligned}
|b_n - b_m| = b_n - b_m &= (b_n - b_{n+1}) + (b_{n+1} - b_{n+2}) + \cdots + (b_{m-1} - b_m) \\
&\le (b_0 - a_0)\left[\frac{1}{2^n} + \cdots + \frac{1}{2^{m-1}}\right] < (b_0 - a_0)\,\frac{1 + 1/2 + \cdots}{2^n} \\
&= \frac{b_0 - a_0}{2^{n-1}} < \frac{b_0 - a_0}{2^{N-1}} < \epsilon.
\end{aligned}
$$

By the Cauchy criterion, the sequence $(b_n)$ has a limit, say, $b$. An argument
identical to the one above shows that $(a_n)$ is convergent, and since
$b_n - a_n \le (b_0 - a_0)/2^n$, $\lim_n a_n = b$.

Finally, we prove that b = supA. If a > b for an element a ∈ A, then a > bn for some
n, which contradicts the fact that bn is an upper bound of A. Thus b is an upper
bound of A. For any number c < b, let 𝜖 = b − c. Since limn an = b, there exists an
integer n such that an ∈ (b − 𝜖, b + 𝜖). In particular, an > c; hence c is not an upper
bound of A. 
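
The bisection construction of this example can be traced numerically. The sketch below is a hypothetical illustration, not part of the proof: it fixes the specific set A = {x ∈ ℚ : x ≥ 0 and x² < 2}, for which membership and the upper-bound test are easy to decide, and the helper names (`in_A`, `is_upper_bound`, `element_above`, `bisect_sup`) are our own. Exact rational arithmetic is used so that every a_n really lies in A.

```python
from fractions import Fraction

def in_A(x):
    """Membership test for the assumed set A = {x in Q : x >= 0 and x^2 < 2}."""
    return x >= 0 and x * x < 2

def is_upper_bound(m):
    """For this particular A, m is an upper bound exactly when m >= 0 and m^2 >= 2."""
    return m >= 0 and m * m >= 2

def element_above(m, b):
    """Return an element of A in the open interval (m, b), assuming m is in A."""
    t = (m + b) / 2
    while not in_A(t):
        t = (m + t) / 2   # move toward m until we land inside A
    return t

def bisect_sup(a, b, steps):
    """Run `steps` rounds of the construction; a is in A, b is an upper bound of A."""
    for _ in range(steps):
        m = (a + b) / 2
        if is_upper_bound(m):
            b = m                      # a stays the same, b moves down to m
        else:
            a = element_above(m, b)    # a moves above m, b stays the same
    return a, b

a0, b0 = Fraction(1), Fraction(2)
a, b = bisect_sup(a0, b0, 30)
print(float(a), float(b))                 # both close to the square root of 2
print(b - a <= (b0 - a0) / 2**30)         # property (3) of the construction
```

In each round the gap between the two endpoints is at least halved, which is property (3); the two sequences squeeze a single real number that is the least upper bound of A but is not itself rational.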

3 Observe that if an+1 = bn , the process terminates and bn is the least upper bound (in fact,
the maximum) of A. Otherwise, the process continues ad infinitum, and each bn is a strict upper
bound of A.

Definition. The extended real line is $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$. We need this extension
of $\mathbb{R}$ because the limits of some sequences are infinite and because it is sometimes
convenient to allow functions to take infinite values. We retain the usual
ordering on $\mathbb{R}$, and, for $x \in \mathbb{R}$, we define $-\infty < x < \infty$. The following rules of
arithmetic in $\overline{\mathbb{R}}$ are convenient and widely accepted:

(a) For a real number $a$, $a + \infty = \infty$, $a - \infty = -\infty$.
(b) If $a > 0$, then $a \cdot \infty = \infty$, and $a \cdot (-\infty) = -\infty$, while if $a < 0$, then
$a \cdot \infty = -\infty$, and $a \cdot (-\infty) = \infty$.
(c) For any real number $a$, $a/(\pm\infty) = 0$.
(d) In chapter 8, we adopt the convention that $0 \cdot \infty = 0$. However, this definition
is specific to integration theory.

We do not define the operation ∞ − ∞.

Definition. A point a ∈ ℝ is a limit point of a sequence (an ) if, for every
𝜖 > 0, |an − a| < 𝜖 for infinitely many n. We say that ∞ is a limit point of (an )
if, for every M ∈ ℝ, an > M for infinitely many n. Likewise, −∞ is a limit point
of (an ) if, for every m ∈ ℝ, an < m for infinitely many n.

Remark. Not every limit point of a sequence is a limit point of its range. For
example, the sequence $a_n = (-1)^n$ has two limit points, $\pm 1$, while its range, the
set $\{-1, 1\}$, has no limit points.

Theorem 1.2.11. An extended real number a is a limit point of (an ) if and only if
there exists a subsequence (ank ) of (an ) such that limk ank = a.

Proof. Suppose a ∈ ℝ is a limit point of (an ). There exists n1 ∈ ℕ such that
|an1 − a| < 1. Now we can find an integer n2 > n1 such that |an2 − a| < 1/2.
Continuing this construction produces a sequence n1 < n2 < n3 < ... of integers
such that |ank − a| < 1/k. Thus limk ank = a. The converse is trivial. We leave it to
the reader to provide the details when a = ±∞. 

Definition. Let $(a_n)$ be a sequence, and consider the sequences

$\alpha_n = \sup_{k \ge n} a_k$, and $\beta_n = \inf_{k \ge n} a_k$.

Clearly, $(\alpha_n)$ is non-increasing, and $(\beta_n)$ is non-decreasing. Therefore $\lim_n \alpha_n$ and
$\lim_n \beta_n$ exist. We define the limit superior, or the upper limit, and the limit
inferior, or the lower limit, of $(a_n)$, respectively, as follows:

$\limsup_n a_n = \lim_n \alpha_n = \lim_n \sup_{k \ge n} a_k = \inf_{n \in \mathbb{N}} \alpha_n$,
$\liminf_n a_n = \lim_n \beta_n = \lim_n \inf_{k \ge n} a_k = \sup_{n \in \mathbb{N}} \beta_n$.
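
The tail suprema and infima can be tabulated for a concrete sequence. The sketch below is a numerical illustration under stated assumptions: the sequence (−1)ⁿ(1 + 1/n) is our own example, and each tail is truncated at index N, so `alpha` and `beta` are stand-ins for the true supremum and infimum over an infinite tail. For this particular sequence the truncated values agree with the exact ones as long as the tail still contains both an even and an odd index.

```python
def a(n):
    """Example sequence a_n = (-1)^n (1 + 1/n), for n >= 1."""
    return (-1) ** n * (1 + 1 / n)

N = 1000  # truncation point: tails are cut off at index N
terms = [a(n) for n in range(1, N + 1)]

def alpha(n):
    """Truncated stand-in for the supremum of a_k over k >= n."""
    return max(terms[n - 1:])

def beta(n):
    """Truncated stand-in for the infimum of a_k over k >= n."""
    return min(terms[n - 1:])

for n in (1, 10, 100, 900):
    print(n, alpha(n), beta(n))
```

The printed values of `alpha` decrease toward 1 and those of `beta` increase toward −1, in line with the monotonicity noted in the definition; for this sequence the upper limit is 1 and the lower limit is −1.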

Theorem 1.2.12. 𝛼 = lim supn an if and only if, for every 𝜖 > 0,

(a) there is a positive integer N such that an < 𝛼 + 𝜖 for all n > N, and
(b) an > 𝛼 − 𝜖 for infinitely many n ∈ ℕ.

Proof. Suppose 𝛼 = lim supn an . Since 𝛼 = infn 𝛼n , there is a positive integer N such
that 𝛼N < 𝛼 + 𝜖. Now, because an ≤ 𝛼n and 𝛼n is non-increasing, an ≤ 𝛼n ≤ 𝛼N <
𝛼 + 𝜖, for all n ≥ N. This proves (a).
To prove (b), note that 𝛼 − 𝜖 < 𝛼 ≤ 𝛼1 = sup{a1 , a2 , ...}. Thus there is a positive
integer n1 such that an1 > 𝛼 − 𝜖. Now 𝛼 − 𝜖 < 𝛼 ≤ 𝛼n1 +1 = sup{an1 +1 , an1 +2 , ...}.
Thus there is a positive integer n2 > n1 such that an2 > 𝛼 − 𝜖. This process produces
a subsequence (ank ) of (an ) such that ank > 𝛼 − 𝜖.
To prove the converse, suppose 𝛼 ∈ ℝ satisfies conditions (a) and (b), and let
𝜖 > 0. By condition (b), for every n ∈ ℕ, there exists an integer k ≥ n such that
ak > 𝛼 − 𝜖. Thus 𝛼n = supk≥n ak ≥ 𝛼 − 𝜖. Taking the limit as n → ∞ produces
lim supn an ≥ 𝛼 − 𝜖. Since 𝜖 is arbitrary, lim supn an ≥ 𝛼.
By condition (a), there exists an integer N such that ak < 𝛼 + 𝜖 for every k > N.
Thus, for every n > N, 𝛼n = supk≥n ak ≤ 𝛼 + 𝜖. Taking the limit as n → ∞, we
obtain lim supn an ≤ 𝛼 + 𝜖. Because 𝜖 is arbitrary, lim supn an ≤ 𝛼.

Theorem 1.2.13. The upper limit of a sequence (an ) is the largest limit point of (an ).

Proof. Let 𝛼 = lim supn an , and let 𝜖 > 0. By the previous theorem (and its proof), there is a positive integer N
such that, for all n > N, an < 𝛼 + 𝜖, and a subsequence (ank ) such that ank > 𝛼 − 𝜖.
Since limk nk = ∞, there is a positive integer K such that nk > N for all k > K.
Therefore 𝛼 − 𝜖 < ank < 𝛼 + 𝜖 for all k > K. By theorem 1.2.11, 𝛼 is a limit point
of (an ). If t is a limit point of (an ), then, for infinitely many positive integers n,
t − 𝜖 < an . By theorem 1.2.12, there is an integer N such that, for all n > N, an <
𝛼 + 𝜖. Choosing n large enough for the last two inequalities to be simultaneously
satisfied, we have t − 𝜖 < an < 𝛼 + 𝜖. Therefore t < 𝛼 + 2𝜖. Since 𝜖 is arbitrary,
t ≤ 𝛼. 

Theorem 1.2.14. A sequence (an ) converges to a if and only if lim supn an =
lim infn an = a.

Proof. Let 𝛼 = lim supn an , 𝛽 = lim infn an , and suppose that 𝛼 = 𝛽. Let 𝜖 > 0. By theorem
1.2.12 and problem 17 at the end of this section, there is a positive integer N such
that, for n > N, an < 𝛼 + 𝜖 and 𝛼 − 𝜖 = 𝛽 − 𝜖 < an . Thus, for n > N, 𝛼 − 𝜖 < an <
𝛼 + 𝜖; hence limn an = 𝛼. Conversely, if limn an = a, then it is easy to verify that
the conditions of theorem 1.2.12 and those of problem 17 are met with 𝛼 = a and
𝛽 = a, respectively. Hence 𝛼 = 𝛽. 

Complex Numbers

Definition. A complex number $z$ is an ordered pair $(x, y)$ of real numbers. We use
the symbol $\mathbb{C}$ for the set of complex numbers.

The definition so far makes $\mathbb{C}$ nothing more than the Euclidean plane $\mathbb{R}^2$. This
is why the set of complex numbers is also called the complex plane. What sets $\mathbb{C}$
apart from $\mathbb{R}^2$ is the following pair of binary operations.

Definition. For complex numbers $z = (x, y)$ and $w = (a, b)$, we define the sum
$z + w = (x + a, y + b)$ and the product $zw = (ax - by, ay + bx)$. The real field $\mathbb{R}$
is embedded into the complex plane in a natural way: we identify a real number
$x$ with the complex number $(x, 0)$. Under the operations of complex addition
and multiplication, the subset $\tilde{\mathbb{R}} = \{(x, 0) \in \mathbb{C} : x \in \mathbb{R}\}$ is closed in the sense
that if $z = (x, 0)$ and $w = (a, 0)$ are in $\tilde{\mathbb{R}}$, then $z + w$ and $zw$ are in $\tilde{\mathbb{R}}$. Indeed, $z + w = (x + a, 0)$
and $zw = (ax, 0)$. From now on, we make no distinction between $\mathbb{R}$ and $\tilde{\mathbb{R}}$ and
simply write $x$ for $(x, 0)$. With this understanding, we see that if $x \in \mathbb{R}$, then
$xw = (x, 0)(a, b) = (xa, xb)$. It is also straightforward to verify that the elements
$0 = (0, 0)$ and $1 = (1, 0)$ satisfy $z + 0 = z$ and $z \cdot 1 = z$ for all $z \in \mathbb{C}$. Thus $0$ and $1$
are the identity elements for complex addition and multiplication, respectively.

Definition. The complex number $i = (0, 1)$ is called the imaginary number. Now
$i^2 = (0, 1) \cdot (0, 1) = (-1, 0) = -1$. We therefore think of $i$ as the square root of $-1$.

Armed with the imaginary number i, we now have a convenient and notationally
simple way to represent complex numbers. An arbitrary complex number z
can be written as z = (x, y) = (x, 0) + (0, y) = x + y(0, 1) = x + iy. With this way
of representing complex numbers, we can restate the definitions of complex
addition and multiplication as follows. For complex numbers z = x + iy and
w = a + ib, z + w = (x + a) + i(y + b) and zw = (ax − by) + i(ay + bx). Note that
the complex operations obey the same rules as the addition and multiplication of
linear polynomials, taking into account that $i^2 = -1$. Indeed, if we multiply out the
product of the binomials $x + iy$ and $a + ib$ according to the usual rules of algebra,
we obtain $(x + iy)(a + ib) = ax + iay + ibx + i^2 by = (ax - by) + (ay + bx)i$, which
is consistent with the original definition of complex multiplication. Now that
we have a convenient way of manipulating complex numbers, we can prove the
following theorem.

Theorem 1.2.15. With the operations of complex addition and multiplication, ℂ is
a field.

Proof. Most of the defining properties of a field are easy to verify. As a sample of the
calculations, we verify the following two properties:

(a) Complex multiplication is associative. Let $z = x + iy$, $w = a + ib$, and $t = r + is$
be complex numbers. Then
$(zw)t = [(ax - by) + i(ay + bx)](r + is) = r(ax - by) - s(ay + bx) + i[r(ay + bx) + s(ax - by)]$.
The reader can calculate $z(wt)$ and reconcile the result with the above expression for $(zw)t$.
(b) A slightly less obvious fact is the inversion formula of a nonzero complex
number. If $z = x + iy \ne 0$, then $z^{-1} = \frac{x}{x^2 + y^2} - i\,\frac{y}{x^2 + y^2}$. It is easy to verify that
$zz^{-1} = 1$.
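
The two sample properties can be spot-checked with complex numbers represented literally as ordered pairs. The sketch below is an illustration with our own helper names (`add`, `mul`, `inverse`) and arbitrarily chosen sample values; exact rational coordinates are used so that the equalities hold without rounding error.

```python
from fractions import Fraction

def add(z, w):
    """Complex addition on ordered pairs: (x, y) + (a, b) = (x + a, y + b)."""
    (x, y), (a, b) = z, w
    return (x + a, y + b)

def mul(z, w):
    """Complex multiplication on ordered pairs: (x, y)(a, b) = (ax - by, ay + bx)."""
    (x, y), (a, b) = z, w
    return (a * x - b * y, a * y + b * x)

def inverse(z):
    """Inversion formula for a nonzero pair (x, y)."""
    x, y = z
    d = x * x + y * y          # nonzero exactly when z != (0, 0)
    return (x / d, -y / d)

i = (Fraction(0), Fraction(1))
print(mul(i, i) == (-1, 0))                        # i squared is -1

z = (Fraction(3), Fraction(4))
w = (Fraction(1), Fraction(-2))
t = (Fraction(5), Fraction(7))
print(inverse(z))                                  # (Fraction(3, 25), Fraction(-4, 25))
print(mul(z, inverse(z)) == (1, 0))                # z times its inverse is 1
print(mul(mul(z, w), t) == mul(z, mul(w, t)))      # associativity for one sample triple
```

A check on a few sample values does not prove the field properties, but it is a quick way to catch a sign error when carrying out the computations by hand.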

Definition. For a complex number z = x + iy, x is called the real part of z, and y
is the imaginary part of z. We use the notation x = Re(z), and y = Im(z).

Definition. The complex conjugate of a complex number $z = x + iy$ is the number
$\overline{z} = x - iy$, and the absolute value (or modulus) of $z$ is $|z| = \sqrt{x^2 + y^2}$.

Theorem 1.2.16. If $z$ and $w$ are complex numbers, then

(a) $\overline{z + w} = \overline{z} + \overline{w}$ and $\overline{zw} = \overline{z}\,\overline{w}$;
(b) $z + \overline{z} = 2\,\mathrm{Re}(z)$ and $z - \overline{z} = 2i\,\mathrm{Im}(z)$;
(c) $|\overline{z}| = |z|$ and $z\overline{z} = |z|^2$;
(d) $|\mathrm{Re}(z)| \le |z|$, $|\mathrm{Im}(z)| \le |z|$, and $|z| \le |\mathrm{Re}(z)| + |\mathrm{Im}(z)|$;
(e) $z^{-1} = \frac{\overline{z}}{|z|^2}$; and
(f) the triangle inequality $|z + w| \le |z| + |w|$.

Proof. The proofs are mostly computational and are left to the reader to check. We
prove the triangle inequality below.
Note that $\overline{z}w$ is the conjugate of $z\overline{w}$; hence
$z\overline{w} + \overline{z}w = 2\,\mathrm{Re}(z\overline{w}) \le 2|z\overline{w}| = 2|z||\overline{w}| = 2|z||w|$.
Using this, we have
$|z + w|^2 = (z + w)\overline{(z + w)} = z\overline{z} + z\overline{w} + \overline{z}w + w\overline{w} \le |z|^2 + 2|z||w| + |w|^2 = (|z| + |w|)^2$.
The result follows by taking the
square roots of the extreme sides of the above string of inequalities.

Now that we have a measure of the length of a complex number, we have a measure
of the distance between two points in the complex plane. For complex numbers
z1 and z2 , the quantity |z1 − z2 | is exactly the Euclidean distance between z1 and
z2 . Now we can generalize many of the properties of subsets of the real line to
the complex plane. For example, a bounded subset of ℂ is a set A of complex
numbers such that sup{|z| ∶ z ∈ A} < ∞. For a complex number a, and a positive
real number 𝛿, the set {z ∈ ℂ ∶ |z − a| < 𝛿} is an open disk of radius 𝛿 and centered
at a. A point z ∈ ℂ is a limit point of a set A of complex numbers if every open
disk centered at z contains points of A other than z. We urge the reader to examine
the rest of the concepts we studied for real numbers and generalize them to the
complex field, whenever possible. One important distinction between ℝ and ℂ is
that there is no natural (or useful) way to order the complex field. We conclude the
section with the following theorem.

Theorem 1.2.17 (completeness of the complex field). A complex sequence (zn ) is
a Cauchy sequence if and only if it is convergent.

Proof. Let zn = xn + iyn , where (xn ) and (yn ) are real sequences. If (zn ) is Cauchy,
then, given 𝜖 > 0, there exists a positive integer N such that, for n, m > N,
|zn − zm | < 𝜖. It follows that the real sequences (xn ) and (yn ) are Cauchy
sequences, since |xn − xm | ≤ |zn − zm | and |yn − ym | ≤ |zn − zm |. By the completeness
of ℝ, (xn ) and (yn ) converge to real numbers x and y, respectively. Clearly,
(zn ) converges to z = x + iy because |zn − z| ≤ |xn − x| + |yn − y|. We leave the
proof of the converse to the reader. 

The following example establishes the Bolzano-Weierstrass theorem for complex
sequences.

Example 11. Every bounded complex sequence contains a convergent subsequence.
Let zn = xn + iyn be a bounded sequence in ℂ. Since |xn | ≤ |zn |, (xn )
is bounded. By theorem 1.2.8, (xn ) has a convergent subsequence $(x_{n_k})$. Let
$x = \lim_k x_{n_k}$. Now the sequence $(y_{n_k})$ is bounded, so it contains a convergent
subsequence $(y_{n_{k_p}})$. Let $y = \lim_p y_{n_{k_p}}$. The subsequence $z_{n_{k_p}} = x_{n_{k_p}} + i y_{n_{k_p}}$ of
(zn ) clearly converges to x + iy as p → ∞. 

Exercises

1. Prove that the finite union of bounded subsets of ℝ is bounded, and give an
example to show that the conclusion is false for an infinite union of bounded
sets.
2. Prove that if A ⊆ ℝ is bounded below, then A has a greatest lower bound.
Hint: Define −A = {−x ∶ x ∈ A}. Show that inf A = −sup(−A).
3. Prove that if limn an = a, limn bn = b, then limn (an ± bn ) = a ± b and that
limn (an bn ) = ab.
4. Let limn bn = b ≠ 0. Prove that there is a natural number N such that, for
all n > N, |bn | ≥ |b|/2. Hence prove that if, in addition, limn an = a, then
$\lim_n \frac{a_n}{b_n} = \frac{a}{b}$.
5. Prove theorem 1.2.1.
6. Prove theorem 1.2.2.
7. Prove theorem 1.2.4.
8. Show that the limit of a convergent sequence is unique.
9. Suppose all the terms of a sequence (an ) are in a closed interval I. Prove that
if limn an = a, then a ∈ I.
10. Show that limn an = a if and only if every interval centered at a contains all
but finitely many terms of the sequence.
11. Let (𝛿n ) be a positive sequence such that $\sum_{n=1}^{\infty} \delta_n < \infty$. Show that if (an ) is
such that |an+1 − an | ≤ 𝛿n , then (an ) is convergent. Hint: Examine the proof
in example 10.
12. Prove that a point x is a limit point of a subset A of ℝ if and only if every
interval centered at x contains infinitely many points of A.
13. Show that if A is bounded above and a = supA, then there is a sequence
(an ) in A such that limn an = a. Also prove that if a ∉ A, the terms of the
sequence (an ) can be chosen to be distinct.
14. Prove that if {In }n∈ℕ is a descending sequence of closed bounded intervals,
then ∩n∈ℕ In is a closed interval or a point. Give examples to show that the
result is false if either of the conditions closed or bounded is omitted.
15. Let A and B be nonempty subsets of ℚ such that A ∪ B = ℚ and, for
every a ∈ A and every b ∈ B, a < b.⁴ Prove that exactly one of the following
alternatives holds:
(a) there exists a number 𝛼 ∈ ℚ such that A = ℚ ∩ (−∞, 𝛼],
(b) there exists a number 𝛼 ∈ ℚ such that A = ℚ ∩ (−∞, 𝛼), or
(c) there exists a number 𝛼 ∈ ℝ − ℚ such that A = ℚ ∩ (−∞, 𝛼).
16. Suppose (an ) and (bn ) are real sequences. Prove that
(a) lim infn an ≤ lim supn an ;
(b) lim infn (−an ) = − lim supn an and lim supn (−an ) = − lim infn an ;
(c) lim infn an + lim infn bn ≤ lim infn (an + bn );
(d) lim supn (an + bn ) ≤ lim supn an + lim supn bn ; and
(e) if an ≤ bn , then lim supn an ≤ lim supn bn .
17. Prove that 𝛽 = lim infn an if and only if, for every 𝜖 > 0,
(a) there is a positive integer N such that an > 𝛽 − 𝜖 for all n > N, and
(b) an < 𝛽 + 𝜖 for infinitely many n ∈ ℕ.
18. Show that lim infn an is the smallest limit point of (an ).
19. Let (an ) be a positive sequence. Prove that $\limsup_n \frac{1}{a_n} = \frac{1}{\liminf_n a_n}$.
20. Verify the details of the proof of theorem 1.2.15.
21. Verify the details of the proof of theorem 1.2.16.
22. Verify the details of the proof of theorem 1.2.17.
23. Prove that every bounded infinite subset of ℂ has a limit point.

⁴ Such a partition of ℚ is called a Dedekind cut.


24. (a) Show that the series $\sum_{n=0}^{\infty} \frac{z^n}{n!}$ is absolutely convergent for all z ∈ ℂ.
(b) Define $e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}$. Show that, for all z, w ∈ ℂ, $e^z e^w = e^{z+w}$. Conclude
that, for n ∈ ℕ, and z ∈ ℂ, $(e^z)^n = e^{nz}$. Hint: Recall that absolutely convergent
series can be multiplied term by term. The reader will recognize $e^z$ as
the complex exponential function.
25. (a) Show that, for 𝜃 ∈ ℝ, $e^{i\theta} = \cos\theta + i\sin\theta$. Hint: Recall that the terms of
an absolutely convergent series can be rearranged without affecting the sum
of the series.
(b) Show that if z is a nonzero complex number, then there is a unique
positive number r and a unique real number 𝜃 ∈ [0, 2𝜋) such that $z = re^{i\theta}$.
Hint: Write z = |z|w, where $w = \frac{z}{|z|}$. Note that |w| = 1.
26. Show that, for 𝜃 ∈ ℝ, $(\cos\theta + i\sin\theta)^n = \cos(n\theta) + i\sin(n\theta)$.
27. Let z be a nonzero complex number, and write $z = re^{i\theta}$. Show that, for n ≥ 2,
each of the numbers $\xi_k = r^{1/n} e^{i(\theta + 2\pi k)/n}$, 0 ≤ k ≤ n − 1, satisfies $\xi_k^n = z$. The
numbers $\xi_k$ are the nth roots of z.
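
The formula for the nth roots in the last exercise can be checked numerically. The following Python sketch is an illustration only, not a proof: the function name nth_roots is our own, and the computation relies on floating-point arithmetic from the standard cmath module, so equalities hold only up to rounding error.

```python
import cmath
import math

def nth_roots(z: complex, n: int) -> list:
    """Return xi_k = r^(1/n) * exp(i(theta + 2*pi*k)/n) for 0 <= k <= n - 1."""
    r = abs(z)                               # z is assumed to be nonzero
    theta = cmath.phase(z) % (2 * math.pi)   # choose theta in [0, 2*pi)
    return [
        r ** (1.0 / n) * cmath.exp(1j * (theta + 2 * math.pi * k) / n)
        for k in range(n)
    ]

# The three cube roots of z = -8 (here r = 8 and theta = pi).
roots = nth_roots(-8 + 0j, 3)
for xi in roots:
    print(xi, abs(xi ** 3 - (-8)) < 1e-9)    # each xi**3 agrees with z up to rounding
```

The choice of 𝜃 in [0, 2𝜋) matches the normalization used in exercise 25(b); any other choice of argument would produce the same set of roots in a different order.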

2
Set Theory

A false conclusion once arrived at and widely accepted is not easily dislodged
and the less it is understood the more tenaciously it is held.
Georg Cantor

Georg Cantor. 1845–1918

Georg Cantor entered the Polytechnic of Zürich in 1862 to study engineering. He
later moved to the University of Berlin, where he attended lectures by Weierstrass,
Kummer, and Kronecker, completing his dissertation on number theory in 1867.

In 1873 Cantor proved the countability of the set of rational numbers. He then
proved that the real numbers were uncountable and published the result in 1874.
It is in that paper that the idea of a one-to-one correspondence appeared for the
first time. He next pondered the question of whether the unit interval could be
put in a one-to-one correspondence with the unit square. He initially dismissed
the possibility and wrote that “the answer seems so clearly to be ‘no’ that proof
appears almost unnecessary.” When he did prove the result, he wrote to Dedekind
in 1877, “I see it, but I don’t believe it!” In a paper published in 1878, he made the
concept of one-to-one correspondence precise and discussed sets of equal power,
that is, sets which have equal cardinality.

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0002

Between 1879 and 1884, Cantor published a series of six papers designed to
provide a basic introduction to set theory, and this is when he realized that his
work was not finding the acceptance that he had hoped for. In fact, Cantor’s ideas
earned him the strong antagonism of Kronecker, among other mathematicians
and philosophers. Dedekind was sympathetic to Cantor’s work and in 1888 wrote
his article Was sind und was sollen die Zahlen [What are numbers and what should
they be], partially in defense of Cantor’s work.

Cantor’s last major papers on set theory appeared in 1895 and 1897, where he
had hoped, without success, to include a proof of the continuum hypothesis. He
did, however, succeed in formulating his theory of well-ordered sets and ordinal
numbers. It was also during those years that Cantor discovered the first paradoxes
of set theory.

Hilbert described Cantor’s work as “the finest product of mathematical genius and
one of the supreme achievements of purely intellectual human activity.”

Cantor’s personal life was not entirely a happy one. For more than thirty years,
Cantor was troubled with bouts of depression, and, in 1899, he suffered the death
of his youngest son. He spent the last year of his life confined to a sanatorium,
where he died of a heart attack.

2.1 Finite, Countable, and Uncountable Sets

Cantor’s revolutionary ideas were initially focused on understanding infinite sets.
His starting point was, as is ours in this chapter, set equivalence. The title of the
section accurately captures its objectives: to formulate clear definitions of finite
and infinite sets, and to study their properties in good detail. Among the results
we establish are Dedekind’s definition of an infinite set, the countability of ℚ, and,
in general, the countability of a countable union of countable sets. We conclude the
section by showing the existence of uncountable sets through the establishment of
the fact that $2^{\mathbb{N}}$ and ℝ are uncountable.

Definition. Two sets A and B are equivalent if there is a bijection from A to B.¹
We use the notation A ≈ B to indicate the equivalence of A and B.

Example 1. The set 2ℕ of even positive integers is equivalent to ℕ. The function
f ∶ ℕ → 2ℕ defined by f (n) = 2n is bijective. 

¹ The term equipotent is also used.


Example 2. The closed interval [0, 1] is equivalent to an arbitrary closed interval
[a, b] (a < b). The function $f(x) = \frac{x - a}{b - a}$ is a bijection from [a, b] to [0, 1]. 

Example 3. The closed interval [0, 1] is equivalent to the open interval (0, 1).
Define a function f ∶ [0, 1] → (0, 1) as follows:

$$f(x) = \begin{cases} 1/2 & \text{if } x = 0, \\ 1/(n + 2) & \text{if } x = 1/n,\ n \in \mathbb{N}, \\ x & \text{otherwise.} \end{cases}$$

It is easy to verify that f is a bijection. 
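
The map of example 3 can be tried out on rational inputs. The sketch below is only an illustration under stated assumptions: it restricts attention to rational points of [0, 1] (represented exactly with Python's fractions.Fraction), and the names f, points, and images are our own. It checks injectivity and the range condition on a finite sample, which of course does not replace the verification asked of the reader.

```python
from fractions import Fraction

def f(x: Fraction) -> Fraction:
    """The map of example 3 from [0, 1] to (0, 1), restricted to rational inputs."""
    if x == 0:
        return Fraction(1, 2)
    if x > 0 and x.numerator == 1:       # x = 1/n for some n in N (includes x = 1)
        return Fraction(1, x.denominator + 2)
    return x                              # every other point is left fixed

# All fractions p/q in [0, 1] with denominator at most 12 (a finite sample).
points = sorted({Fraction(p, q) for q in range(1, 13) for p in range(q + 1)})
images = [f(x) for x in points]

print(f(Fraction(0)), f(Fraction(1)), f(Fraction(1, 2)))  # the shifted points
print(all(0 < y < 1 for y in images))                     # images avoid 0 and 1
print(len(set(images)) == len(points))                    # no two sample points collide
```

The design mirrors the definition: the points 0, 1, 1/2, 1/3, … are shifted along the sequence 1/2, 1/3, 1/4, …, and all other points are fixed.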

Example 4. Let A = (−𝜋/2, 𝜋/2), B = ℝ. The function f (x) = tan(x) is a bijection
from A to B. Thus A ≈ B. 

Theorem 2.1.1. Let A, B and C be sets. Then

(a) A ≈ A.
(b) If A ≈ B, then B ≈ A.
(c) If A ≈ B and B ≈ C, then A ≈ C.

Proof. (a) The identity function IA ∶ A → A is a bijection.
(b) If f ∶ A → B is a bijection, then $f^{-1} : B \to A$ is a bijection.
(c) If f ∶ A → B and g ∶ B → C are bijections, then $g \circ f : A \to C$ is a bijection. 

Definitions. For n ∈ ℕ, let ℕn = {1, 2, … , n}. A set A is finite if A ≈ ℕn for some
n ∈ ℕ. A set is said to be infinite if it is not finite. If A ≈ ℕn , we say that the
cardinality of A is n, and we write Card(A) = n. The cardinality of a finite set is
simply the number of elements in it. We also define Card(∅) = 0.

Theorem 2.1.2.
(a) A proper subset B of ℕn is finite, and Card(B) = m for some m < n.
(b) A proper subset B of a finite set A is finite, and Card(B) < Card(A).
(c) If m, n ∈ ℕ and m < n, then there is no injection from ℕn to ℕm .
(d) A finite set is not equivalent to any of its proper subsets.
(e) ℕ is infinite.

Proof. (a) We proceed by induction. The only proper subset of ℕ1 is ∅, and
Card(∅) = 0 < 1 = Card(ℕ1 ). Suppose the statement is true for some integer
n ≥ 1, and let B be a proper subset of ℕn+1 . If n + 1 ∉ B, then B ⊆ ℕn , and,
by the inductive hypothesis, B is finite and Card(B) ≤ n < n + 1. Otherwise,
B = {n + 1} ∪ C, where C is a proper subset of ℕn . By the inductive hypothesis,
C is finite, and Card(C) = m < n. Let g be a bijection from ℕm to C. Define
f ∶ ℕm+1 → B by

$$f(x) = \begin{cases} g(x) & \text{if } x \in \mathbb{N}_m, \\ n + 1 & \text{if } x = m + 1. \end{cases}$$

Clearly, f is a bijection; hence Card(B) = m + 1 < n + 1.
(b) Suppose Card(A) = n, and let f be a bijection from A to ℕn . The restriction of f
to B is a bijection from B onto f (B). By part (a), f (B) is finite, and Card( f (B)) < n;
thus Card(B) = Card( f (B)) < n.
(c) If f is an injection from ℕn to ℕm , then B = f (ℕn ) is a subset of ℕm . By part
(a), n = Card(ℕn ) = Card(B) ≤ m. This contradiction shows that no such f exists.

(d) Suppose B is a proper subset of a finite set A, and let n = Card(A), and m =
Card(B). By part (b), m < n. Let g ∶ B → ℕm and h ∶ ℕn → A be bijections. If
there is a bijection f ∶ A → B, then $g \circ f \circ h$ would be an injection from ℕn to ℕm .
This contradicts part (c).
(e) If, for some positive integer m, there exists a bijection f ∶ ℕ → ℕm , then, for
any integer n > m, the restriction of f to ℕn would be an injection from ℕn into
ℕm . This contradicts part (c) and completes the proof. 

Corollary 2.1.3. If A is infinite and A ⊆ B, then B is infinite.

Proof. If B were finite, A would also be finite by theorem 2.1.2. 

Theorem 2.1.4. A set A is infinite if and only if it contains a sequence of distinct
elements.

Proof. Suppose A is infinite. First we show that, for n ∈ ℕ, A contains a set of exactly
n elements. The proof is inductive, and here is the inductive step: Having found a
subset {a1 , … , an } of A containing exactly n elements, we pick an element an+1 ∈
A − {a1 , … , an }. Such an an+1 exists because otherwise A would be equal to
{a1 , … , an }, which is finite. The set {a1 , … , an+1 } has exactly n + 1 elements.
For n = 0, 1, 2, …, let Bn be a subset of A of exactly $2^n$ elements. Define C0 = B0 ,
and, for n ∈ ℕ, let $C_n = B_n - \bigcup_{i=0}^{n-1} B_i$. Now $\mathrm{Card}\left(\bigcup_{i=0}^{n-1} B_i\right) \le \sum_{i=0}^{n-1} 2^i = 2^n - 1$.
Hence $\mathrm{Card}(C_n) \ge 2^n - (2^n - 1) = 1$. Thus the sets Cn are disjoint and nonempty.
We choose an element cn from each Cn , and we obtain a sequence of distinct
elements of A. The converse is true because ℕ is infinite. 

Theorem 2.1.5. A set A is infinite if and only if it is equivalent to one of its proper
subsets.

Proof. If A is equivalent to one of its proper subsets, A is infinite by theorem 2.1.2(d).
Conversely, if A is infinite, by theorem 2.1.4, A contains a sequence of distinct
elements (b1 , b2 , . . .). Let B = {b1 , b2 , . . .} and define a function f ∶ A → A − {b1 }
as follows: for n ∈ ℕ, f (bn ) = bn+1 , and f (x) = x if x ∉ B. As f is clearly a bijection,
A ≈ A − {b1 }. 

Example 5. The closed interval [0, 1] is infinite because it is equivalent to its subset
(0, 1). (See example 3.)

Definition. A set A is countable if it is equivalent to ℕ. A bijection f ∶ ℕ → A
is called an enumeration (sequencing) of A. A set is said to be at most countable
if it is finite or countable. If an infinite set is not countable, we say it is
uncountable.

Theorem 2.1.6. ℕ × ℕ is countable.

Proof. We enumerate ℕ × ℕ recursively as follows: f (1) = (1, 1), and once f (n) has
been defined, say, f (n) = (a, b), define

$$f(n + 1) = \begin{cases} (a - 1, b + 1) & \text{if } a > 1, \\ (b + 1, 1) & \text{if } a = 1. \end{cases}$$

We pictorially think of ℕ × ℕ as the integer points in the open first quadrant
of the plane. The above enumeration sequences the integer points in the open
first quadrant along each diagonal a + b = constant, a, b ∈ ℕ, from bottom to
top. Once the top of a diagonal has been reached, we start at the bottom of the
next diagonal. See figure 2.1. It is clear that f is a bijection from ℕ to ℕ × ℕ.
The enumeration trick in this proof is attributed to Cantor. Another proof of this
theorem is provided by exercise 15 in section 1.1. 
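
The recursive enumeration in the proof of theorem 2.1.6 is easy to experiment with. The following Python sketch is an illustration only; the names cantor_enumeration and g are our own. It generates the first terms f(1), f(2), … and compares them with the closed form for the inverse function stated in exercise 8 at the end of this section.

```python
def cantor_enumeration(count: int) -> list:
    """Return [f(1), ..., f(count)] for the recursive enumeration of N x N."""
    pairs = [(1, 1)]                       # f(1) = (1, 1)
    while len(pairs) < count:
        a, b = pairs[-1]
        # Move one step along the current diagonal, or, when a = 1,
        # jump to the start of the next diagonal.
        pairs.append((a - 1, b + 1) if a > 1 else (b + 1, 1))
    return pairs[:count]

def g(a: int, b: int) -> int:
    """Closed form for the inverse of f (exercise 8 of this section)."""
    return (a + b - 2) * (a + b - 1) // 2 + b

pairs = cantor_enumeration(1000)
print(pairs[:6])
# Check that g(f(n)) = n on the computed terms.
print(all(g(a, b) == n for n, (a, b) in enumerate(pairs, start=1)))
```

The integer division in g is exact because (a + b − 2)(a + b − 1) is a product of two consecutive integers and is therefore even.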

Theorem 2.1.7. An infinite subset B of a countable set A is countable.

Proof. Let A = {a1 , a2 , . . .} be an enumeration of A. Let $n_1$ be the least positive
integer such that $a_{n_1} \in B$. Suppose we have found integers $n_1 < n_2 < \dots < n_k$ such
that, for 1 ≤ i ≤ k − 1, $n_{i+1}$ is the least positive integer greater than $n_i$ for which
$a_{n_{i+1}} \in B$. Define $n_{k+1}$ to be the least integer greater than $n_k$ such that $a_{n_{k+1}} \in B$;
such an integer exists because B is infinite.

[Figure 2.1: Cantor’s trick]

We claim that $B = \{a_{n_k} : k \in \mathbb{N}\}$. If not, then there exists an element b ∈ B such
that $b \ne a_{n_k}$ for all k ∈ ℕ. Now b ∈ A, so b = an for some positive integer n. By
assumption, $n \ne n_k$ for all k ∈ ℕ. Because $(n_k)$ is a strictly increasing sequence of
positive integers, there are two possibilities: either $n < n_1$ or there is a unique k ∈ ℕ
such that $n_k < n < n_{k+1}$. The former possibility would contradict the definition of
$n_1$, and the latter possibility contradicts the definition of $n_{k+1}$. This shows that
$B = \{a_{n_k} : k \in \mathbb{N}\}$. 
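
The selection of the indices n1 < n2 < … in the proof above amounts to scanning the enumeration of A in order and keeping the terms that lie in B. The sketch below illustrates this under simplifying assumptions of our own: the enumeration is given as a Python function a on the positive integers, membership in B is given as a predicate in_B, and B is assumed to be infinite (otherwise the generator eventually searches forever). The names enumerate_subset and is_square are hypothetical.

```python
import math
from itertools import count, islice
from typing import Callable, Iterator

def enumerate_subset(a: Callable[[int], int],
                     in_B: Callable[[int], bool]) -> Iterator[int]:
    """Yield a(n_1), a(n_2), ..., where n_1 < n_2 < ... are the indices with a(n) in B."""
    for n in count(1):            # scan the indices 1, 2, 3, ... in increasing order
        if in_B(a(n)):
            yield a(n)            # n is the least index not yet used whose term is in B

def is_square(x: int) -> bool:
    """Membership test for B = set of perfect squares (an infinite subset of N)."""
    return math.isqrt(x) ** 2 == x

# A = N with the identity enumeration a(n) = n.
first_squares = list(islice(enumerate_subset(lambda n: n, is_square), 5))
print(first_squares)
```

A lazy generator is used because the enumeration of B is infinite; only as many terms as requested are ever computed.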

Theorem 2.1.8. If there exists an injection f from a set A to ℕ, then A is at most
countable.

Proof. Without loss of generality, assume A is infinite. Since f is one-to-one, ℜ( f )
(the range of f) is an infinite subset of ℕ. By theorem 2.1.7, ℜ( f ) is countable.
Therefore A is countable since it is in one-to-one correspondence with ℜ( f ). 

Theorem 2.1.9. A set A is at most countable if and only if there is a surjection
f ∶ ℕ → A.

Proof. If A is countable, a bijection exists from ℕ onto A. If A is finite and
Card(A) = n, then there exists a bijection f ∶ {1, 2, … , n} → A. Extend f to a
surjection f ∶ ℕ → A by defining f (m) = f (n) for all m > n.
Conversely, if there exists a surjection f ∶ ℕ → A, the sets $S_a = f^{-1}(a)$, a ∈ A,
are mutually disjoint and nonempty. Choose an element na from each of the sets
Sa and define a function g ∶ A → ℕ by g(a) = na . Clearly, g is one-to-one. Now A
is at most countable by theorem 2.1.8. 

Theorem 2.1.10. A countable union of countable sets is countable.

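Proof. Let {An } be a countable collection of countable sets, and let $A = \bigcup_{n=1}^{\infty} A_n$.
Write $A_n = \{a_{n1}, a_{n2}, \dots\}$. Define $f : \mathbb{N} \times \mathbb{N} \to \bigcup_{n=1}^{\infty} A_n$ by $f(m, n) = a_{mn}$. Clearly, f
is onto. By theorem 2.1.6, there exists a bijection g ∶ ℕ → ℕ × ℕ. The composition
$f \circ g$ maps ℕ onto A. By theorem 2.1.9, A is at most countable. Since A contains
the infinite set A1 , A is infinite and hence countable. 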

Corollary 2.1.11. ℤ and ℚ are countable.

Proof. Use theorem 2.1.10 and the facts that ℤ = ℕ ∪ {0} ∪ −ℕ and
$\mathbb{Q} = \bigcup_{n=1}^{\infty} \left\{ \frac{m}{n} : m \in \mathbb{Z} \right\}$. 
n

Theorem 2.1.12. The set 𝔉 of finite sequences in a countable set A is countable.

Proof. For each n ∈ ℕ, let An be the family of sequences in A of exact length n. As
a consequence of theorem 2.1.6 (see problem 9 at the end of this section), An is
countable. Since $\mathfrak{F} = \bigcup_{n=1}^{\infty} A_n$, 𝔉 is countable by theorem 2.1.10. 

Theorem 2.1.13. $2^{\mathbb{N}}$ is uncountable.

Proof. Recall that $2^{\mathbb{N}}$ is the set of all functions from ℕ to {0, 1} (binary sequences).
Suppose, for a contradiction, that $2^{\mathbb{N}}$ is countable. Then $2^{\mathbb{N}} = \{x_1, x_2, \dots\}$, where
each $x_i$ is a binary sequence, say, $x_i = (x_{i1}, x_{i2}, \dots)$, and each $x_{ij}$ is 0 or 1. The binary
sequence y = (y1 , y2 , . . .), where

$$y_i = \begin{cases} 0 & \text{if } x_{ii} = 1, \\ 1 & \text{if } x_{ii} = 0, \end{cases}$$

is clearly not equal to any xi . This contradiction establishes the theorem. 
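
The diagonal construction in the proof above can be illustrated on a finite list of binary sequences. The sketch below is only a finite truncation of the argument and proves nothing about 2^ℕ itself; the function name diagonal and the sample rows are our own. Given k rows, each with at least k binary entries, it flips the ith entry of the ith row.

```python
def diagonal(rows: list) -> list:
    """Given rows x_1, ..., x_k of binary digits, return y with y_i = 1 - x_ii."""
    return [1 - rows[i][i] for i in range(len(rows))]

rows = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
y = diagonal(rows)
print(y)
# y differs from the i-th row in the i-th position, so y is none of the rows.
print(all(y[i] != rows[i][i] for i in range(len(rows))))
```

The same idea applies verbatim to infinite sequences: the ith term of y depends only on the ith term of the ith sequence in the purported enumeration.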

Corollary 2.1.14. If a set A contains at least two elements, then $A^{\mathbb{N}}$ is uncountable.

Proof. Pick two distinct elements a0 and a1 in A. By the previous theorem, $\{a_0, a_1\}^{\mathbb{N}}$
is uncountable. Consequently, $A^{\mathbb{N}}$ is uncountable because it contains $\{a_0, a_1\}^{\mathbb{N}}$. See
problem 4 at the end of this section. 

Theorem 2.1.15. The interval (0, 1] is uncountable. Consequently, ℝ is uncountable.

Proof. Let T be the set of binary sequences which contain only a finite number
of nonzero terms. Each of the sets Tn of binary sequences of length n is
finite, and T is equivalent to $\bigcup_{n=1}^{\infty} T_n$. Therefore T is countable. It follows that
$A = 2^{\mathbb{N}} - T$ is uncountable, by problem 6 at the end of this section. We construct
an injection f from A to (0, 1] as follows: for a sequence a = (a1 , a2 , . . .) ∈ A,
define $f(a) = \sum_{i=1}^{\infty} \frac{a_i}{2^i}$. To prove that f is one-to-one, suppose a = (a1 , a2 , . . .) and
b = (b1 , b2 , . . .) are distinct sequences in A. Let n be the least positive integer
for which an ≠ bn , and assume that an = 1, bn = 0. If $t = \sum_{i=1}^{n-1} \frac{a_i}{2^i} = \sum_{i=1}^{n-1} \frac{b_i}{2^i}$,
then $f(b) \le t + \sum_{i=n+1}^{\infty} \frac{1}{2^i} = t + \frac{1}{2^n}$. Since (an ) contains an infinite number
of terms that are equal to 1, let m be such that m > n and am = 1. Now
$f(a) > t + \frac{1}{2^n} + \frac{1}{2^m} > f(b)$. Thus ℜ( f ) is uncountable. Since ℜ( f ) ⊆ (0, 1] ⊆ ℝ, both
(0, 1] and ℝ are uncountable. See problem 4 at the end of this section. 

Remark. It is easy to see that the function f in the above proof is onto and hence
A ≈ (0, 1]. See problem 11 at the end of this section.

Exercises

1. Show that any two (bounded or unbounded) intervals in ℝ are equivalent.
2. Show that A × B ≈ B × A.
3. Show that if A ≈ B and C ≈ D, then A × C ≈ B × D and $A^C \approx B^D$.
4. Prove that if A ⊆ B and A is uncountable then B is uncountable.
5. Show that, for any two sets A and B, there exist disjoint sets C and D such
that A ≈ C and B ≈ D.
6. Prove that if A is a countable subset of an uncountable set B, then B − A is
uncountable and B − A ≈ B.
7. Let f ∶ A → B.
(a) Show that if f is onto and B is uncountable, then A is uncountable.
(b) Show that if f is one-to-one and A is uncountable, then B is uncountable.
8. Let g ∶ ℕ × ℕ → ℕ be the inverse of the function f in the proof of theorem
2.1.6. Show that, for (a, b) ∈ ℕ × ℕ, $g(a, b) = \frac{1}{2}(a + b - 2)(a + b - 1) + b$.
9. Show that if A is a countable set, then $A^n$ is countable.
10. Let A be a countable set. Show that the collection of finite subsets of A is
countable.
11. In connection with the proof of theorem 2.1.15, show that A ≈ (0, 1] ≈ ℝ.
12. A real number x is said to be algebraic if it is a root of some polynomial
equation with integer coefficients. For example, √2 is algebraic. Prove that
the set of algebraic numbers is countable. If a real number is not algebraic,
it is said to be transcendental. It can be shown, for example, that e and 𝜋 are
transcendental numbers. Conclude that the set of transcendental numbers
is uncountable.

2.2 Zorn’s Lemma and the Axiom of Choice

The axiom of choice is one of the most useful tools in set theory. Although
it is easy to state and widely accepted, the axiom of choice has also generated
much controversy among mathematicians. In this section, we study the axiom
of choice and its most famous and widely applicable equivalent: Zorn’s lemma,
which is an indispensable tool in this book. The section and the section exercises
contain typical but illuminating illustrations of how Zorn’s lemma is applied. In
this section, we also study partially ordered, linearly ordered, and well-ordered sets
and establish results such as the Schröder-Bernstein theorem, which will help us
study cardinal numbers in the next section. Although ordinal numbers have been
avoided in this book, the section exercises are largely focused on well-ordered sets.

Definition. Let A be a nonempty set. A partial ordering on A is a relation ≤ on
A such that, for all x, y, and z ∈ A,

(a) x ≤ x,
(b) if x ≤ y and y ≤ z, then x ≤ z, and
(c) if x ≤ y and y ≤ x, then x = y.

If x ≤ y and x ≠ y, we write x < y.

A relation satisfying condition (c) is called antisymmetric.

Definition. Let A be a nonempty set. A partial ordering ≤ on A is said to be a
linear (or total) ordering if it also satisfies the condition that, for x, y ∈ A, either
x ≤ y or y ≤ x. In this case, we say that A is linearly ordered by ≤. A linearly
ordered set is commonly called a chain.

Example 1. Let A = 𝒫(ℕ). Order A by set inclusion. Thus if S and T are subsets
of ℕ, then S ≤ T means that S ⊆ T. Set inclusion is a partial ordering of A. It is
not total because if S and T are subsets of ℕ, it need not be the case that T ⊆ S
or S ⊆ T. The set {ℕn ∶ n ∈ ℕ} is a chain in A. 

Definitions. Let (A, ≤) be a partially ordered set and let S ⊆ A.

(a) An element s ∈ S is the greatest element of S if, for all t ∈ S, t ≤ s. Thus s
exceeds every other element of S. The greatest element of a set, if one exists,
is unique.
(b) An element s ∈ S is maximal if, for all t ∈ S, t ≥ s implies that t = s. Thus s
is not exceeded by any other element of S. A maximal element need not be
unique.
(c) An element a ∈ A is an upper bound of S if s ≤ a for all s ∈ S. Notice that
an upper bound of S need not be in S.
(d) An element a ∈ A is the least upper bound of S if a is an upper bound of S
and, if b < a, then b is not an upper bound of S.

Example 2. Let A = $\mathbb{R}^2$ and define ≤ on A as follows: P = (a, b) ≤ Q = (x, y) if
a ≤ x and b ≤ y. Thus P is below and to the left of Q. This ordering of $\mathbb{R}^2$ is
not linear because the points (1, 2) and (2, 1) are not comparable. Let S (the
shaded region in fig. 2.2) be the closed subset of the third quadrant below the
line x + y + 1 = 0. The fact that S is closed means that S contains its three straight
boundaries. The set S has no greatest element, since no point of S is strictly
above and to the right of every other point of S. Every point on the line segment
x + y + 1 = 0, −1 ≤ x ≤ 0, is a maximal element of S. The set of upper bounds
of S is the closed first quadrant, and the least upper bound of S is (0, 0). 
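
The difference between maximal and greatest elements in example 2 can be seen on a finite sample of S. The sketch below is an illustration under assumptions of our own: it uses only the integer points of S with coordinates between −3 and 0, and the names leq, sample, maximal, and greatest are hypothetical. On this sample the maximal elements are the integer points of the segment x + y + 1 = 0, and there is no greatest element.

```python
from itertools import product

def leq(p, q) -> bool:
    """The ordering of example 2: (a, b) <= (x, y) iff a <= x and b <= y."""
    return p[0] <= q[0] and p[1] <= q[1]

# Integer points of S (third quadrant, on or below x + y + 1 = 0), coordinates in [-3, 0].
sample = [(x, y) for x, y in product(range(-3, 1), repeat=2) if x + y + 1 <= 0]

# s is maximal if no other sample point lies above it in the ordering.
maximal = [s for s in sample if all(t == s or not leq(s, t) for t in sample)]
# s is greatest if every sample point lies below it.
greatest = [s for s in sample if all(leq(t, s) for t in sample)]

print(sorted(maximal))
print(greatest)
print(all(leq(s, (0, 0)) for s in sample))   # (0, 0) is an upper bound, though not in S
```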

Definition. A linearly ordered set A is said to be well ordered if every nonempty
subset of A contains a least element. Thus if S is a subset of A, then there is an
element s ∈ S such that s ≤ t for every t ∈ S. The least element of S is also called
the first element of S.

Example 3. ℕ is well ordered with the usual ordering of the real numbers. We
often use the well ordering of ℕ without explicit mention; see, for example, the
proof of theorem 2.1.7. 

[Figure 2.2: the shaded region S of example 2]

Example 4. Let A = ℕ ∪ {𝜔}, where 𝜔 is any object not in ℕ. The ordering on the
subset ℕ of A is the natural ordering of the integers. We define n < 𝜔 for all
n ∈ ℕ. Thus we simply define 𝜔 to be the largest element of A. The set (A, ≤) is
well ordered. 

Example 5. Let B = {x1 , x2 , . . .} be a countable set of distinct elements such that
B ∩ ℕ = ∅. Define an ordering ≤ on A = ℕ ∪ B as follows: the restriction of ≤
to ℕ is the usual ordering on ℕ and if xn and xm are in B, xn ≤ xm if n ≤ m.
Finally, if n ∈ ℕ, x ∈ B, we define n < x. The set (A, ≤) is well ordered.

We now state the well ordering principle, which is really an axiom. It simply states
that any set can be well ordered.

The well ordering principle: given a nonempty set A, there exists a well ordering
on A.

It should be clear that a countable set can be well ordered. If {a1 , a2 , . . .} is an
enumeration of a countable set A, then we can well order A in a natural way: define
an ≤ am if n ≤ m. Therefore the challenge is when A is uncountable. Notice that an
arbitrary uncountable set contains an abundance of well-ordered subsets, namely,
all the countable subsets of A. In order to make the terminology we use in this
section unambiguous, when we speak of a well-ordered subset of A, we mean a
subset that can be well ordered.

The axiom of choice: if {X𝛼 }𝛼∈I is a nonempty collection of nonempty sets, then
∏𝛼 X𝛼 is nonempty.

Recall that an element of ∏𝛼 X𝛼 is a function x ∶ I → ∪𝛼∈I X𝛼 such that x𝛼 ∈ X𝛼
for all 𝛼 ∈ I. Such a function x is called a choice function because it is constructed
by choosing an element x𝛼 from each of the sets X𝛼 , hence the name “Axiom of
Choice,” which we can restate as follows: choice functions exist. Notice that the
axiom of choice is not needed when the product ∏𝛼 X𝛼 involves a finite number
of factor sets. Also when each of the factor sets X𝛼 contains a distinguishable
element, then one does not need the axiom of choice to assert that ∏𝛼 X𝛼 is
nonempty. For example, in ℕ, we can pick a distinguishable element, say, 1, and
$\mathbb{N}^{\mathbb{N}}$ is not empty because it contains the constant sequence (1, 1, 1, . . .).

The axiom of choice is often applied in the following equivalent form: if X is a
nonempty set, it is possible to choose an element from each of the nonempty
subsets of X.

The axiom of choice is perhaps the most believable axiom of set theory. However,
it is neither a simple fact nor obvious. In fact, the axiom of choice is equivalent to
the well ordering principle and to the following axiom, which is less intuitive than
the well ordering principle or the axiom of choice:

Zorn’s lemma: if A is a partially ordered set such that every chain in A has an
upper bound, then A contains a maximal element.

Theorem 2.2.1. The axiom of choice, Zorn’s lemma, and the well ordering principle
are all equivalent. 

We will include the lengthy proof of the above theorem in Appendix A.
The rest of the results in this section are well-known theorems and lay the
foundation for studying the cardinality of sets. The theorem below is a typical
example of how Zorn’s lemma is applied.

Theorem 2.2.2. Let A and B be nonempty sets. Then there is an injection from A to
B or an injection from B to A.

Proof. Let 𝔅 be the collection of all injective functions f such that Dom( f ) ⊆ A
and ℜ( f ) ⊆ B. Let { f𝛼 ∶ 𝛼 ∈ I} be an indexing of 𝔅 and, for 𝛼 ∈ I, write A𝛼 =
Dom( f𝛼 ), and B𝛼 = ℜ( f𝛼 ). Partially order 𝔅 as follows: f𝛼 ≤ f𝛽 if f𝛽 extends f𝛼 .
More explicitly, f𝛼 ≤ f𝛽 means that A𝛼 ⊆ A𝛽 , B𝛼 ⊆ B𝛽 , and the restriction of f𝛽 to
A𝛼 is f𝛼 . Clearly, ≤ is a partial ordering of 𝔅. Now let ℭ be a chain in 𝔅, and index
ℭ by a subset J of I; ℭ = { f𝛼 ∶ 𝛼 ∈ J }. We show that ℭ has an upper bound: let
A𝛼 = Dom( f𝛼 ), B𝛼 = ℜ( f𝛼 ), S = ∪𝛼∈J A𝛼 , and T = ∪𝛼∈J B𝛼 . Define f ∶ S → T as
follows: if x ∈ S, choose a set A𝛼 that contains x, and let f (x) = f𝛼 (x). The function
f is well defined because ℭ is a chain. Specifically, if x ∈ A𝛼 ∩ A𝛽 (𝛼, 𝛽 ∈ J), then
f𝛼 ≤ f𝛽 or f𝛽 ≤ f𝛼 ; say, the former. Since f𝛽 extends f𝛼 , f𝛼 (x) = f𝛽 (x). We leave
it to the reader to verify that f is an injection. Clearly, f is an upper bound of ℭ.
By Zorn’s lemma, 𝔅 has a maximal element, say, f1 . Write A1 = Dom( f1 ) and
B1 = ℜ( f1 ). If A1 = A, then f1 is an injection from A into B. If B1 = B, then $f_1^{-1}$ is
an injection from B to A, and the proof is complete. We now show that A1 ≠ A
and B1 ≠ B cannot occur simultaneously. If that were the case, pick elements
a ∈ A − A1 , b ∈ B − B1 , and extend f1 to a function f ∶ A1 ∪ {a} → B1 ∪ {b} by
defining f (a) = b and f |A1 = f1 . Clearly, f is a strict extension of f1 , and this
contradicts the maximality of f1 . 

Lemma 2.2.3. Let B be a nonempty subset of a set A. If there is an injection f ∶
A → B, then A ≈ B.

Proof. Assume, without loss of generality, that ℜ( f ) ⊂ B ⊂ A (strict inclusions).
Define the powers of f as follows: $f^{(1)}(x) = f(x)$, $f^{(n+1)}(x) = f(f^{(n)}(x))$, n ≥ 1.
Let $B' = \{f^{(n)}(x) : x \in A - B,\ n \in \mathbb{N}\}$. Note that B′ ⊆ ℜ( f ) and that f (B′ ) ⊆
B′ . Let C = (A − B) ∪ B′ , and let f1 be the restriction of f to C. Thus f1 is an
injection from C to B′ . We now show that f1 is a bijection. If y ∈ B′ , then
$y = f^{(n)}(x)$ for some x ∈ A − B. If n = 1, then y = f (x), x ∈ A − B ⊆ C. If n > 1,
then y = f (z), $z = f^{(n-1)}(x)$ ∈ B′ ⊆ C. Now let D = B − B′ . Since B′ ⊆ ℜ( f ), D =
B − B′ ⊇ B − ℜ( f ) ≠ ∅. The reader can check that B′ and D partition B and
that the three sets B′ , D, and A − B partition A; a simple Venn diagram makes
this abundantly clear. Thus C and D partition A, and B′ and D partition B. The
function h ∶ A → B defined below is a bijection from A to B:

$$h(x) = \begin{cases} f_1(x) & \text{if } x \in C, \\ x & \text{if } x \in D. \end{cases}$$ 
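
The construction in the proof of lemma 2.2.3 can be traced on a concrete instance. The choices below are our own and serve only as an illustration: A = {0, 1, 2, …}, B = {1, 2, 3, …}, and the injection f(x) = x + 2, so that ℜ( f ) ⊂ B ⊂ A strictly. Then A − B = {0}, B′ consists of the positive even integers, C of all even integers in A, and D of the odd positive integers. The function names in_C and h are hypothetical.

```python
def f(x: int) -> int:
    """An injection from A = {0, 1, 2, ...} into B = {1, 2, 3, ...} (illustrative choice)."""
    return x + 2

def in_C(x: int) -> bool:
    """x lies in C = (A - B) ∪ B' iff undoing f repeatedly lands in A - B = {0}."""
    while x >= 2:          # x is in the range of f exactly when x >= 2
        x -= 2             # replace x by its unique preimage under f
    return x == 0

def h(x: int) -> int:
    """The bijection of the lemma: f on C, the identity on D = B - B'."""
    return f(x) if in_C(x) else x

values = sorted(h(x) for x in range(100))
print(values[:10])
# On the window {0, ..., 99}, h takes each of the values 1, ..., 100 exactly once.
print(values == list(range(1, 101)))
```

In this instance h shifts the even integers by 2 and fixes the odd ones, which makes the partition of A into C and D, and of B into B′ and D, easy to see.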

Example 6. Any two open disks in the plane are equivalent. Let 0 < r1 < r2 ,
and consider the disks $D_1 = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < r_1^2\}$ and
$D_2 = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < r_2^2\}$. Choose a number a such that 0 < a < r1 . The function f ∶ D2 →
D1 defined by $f(x, y) = \frac{a}{r_2}(x, y)$ is an injection, as the reader can easily verify. By
the previous lemma, D1 ≈ D2 . In a similar manner, one can prove that any two
open squares are equivalent. 

Theorem 2.2.4 (the Schröder-Bernstein theorem). Let A and B be nonempty sets.
If there exist injections f ∶ A → B and g ∶ B → A, then A ≈ B.

Proof. Let A1 = ℜ(g) ⊆ A. The function $g \circ f : A \to A_1$ is an injection. By lemma
2.2.3, there exists a bijection h ∶ A → A1 . Now $g^{-1}$ is a bijection from A1 onto
B, and the composition $g^{-1} \circ h$ is the desired bijection from A to B. 

Example 7. An open square is equivalent to an open disk. Let S = {(x, y) ∶ |x| <
2, |y| < 2}, let $D_3 = \{(x, y) : x^2 + y^2 < 9\}$, and let $D_1 = \{(x, y) : x^2 + y^2 < 1\}$.
Observe that D1 ⊆ S ⊆ D3 . Set inclusion is clearly an injection from S into
D3 . By example 6, there is an injection from D3 into D1 , and hence from D3 into
S. By the Schröder-Bernstein theorem, S ≈ D3 . 

Exercises

1. Let A be a partially ordered set and let S ⊆ A. State the definition of each of
the following terms: the least element of S, a minimal element of S, a lower
bound of S, and the greatest lower bound of S.

2. Prove that a nonempty subset of a well-ordered set is well ordered.


3. Let A be a partially ordered set such that every nonempty subset of A has a
least element. Prove that A is linearly ordered and hence well-ordered.
4. Let (A, ≤) be a linearly ordered set. Prove that ≤ is a well ordering if and
only if A does not contain a strictly decreasing sequence a1 > a2 > a3 > . . .
5. Prove that if every countable subset of a linearly ordered set A is well
ordered, then A is well ordered.

Definition. Let A be a linearly ordered set and let a ∈ A. The initial
segment of A determined by a is the set S(a) = {x ∈ A ∶ x < a}.

6. Let A be linearly ordered and let a, b ∈ A. Show that S(a) = S(b) if and only
if a = b.
7. Prove that if every segment of a linearly ordered set A is well ordered, then
A is well ordered.
8. Let A be a well-ordered set, and let B be a proper subset of A with the
property that the conditions b ∈ B and c < b imply that c ∈ B. Prove that
B is a segment of A.
9. Let A be a well-ordered set, and let B be a proper subset of A such that, for
every b ∈ B and for every a ∈ A − B, b < a. Prove that B is a segment of A.
10. The principle of transfinite induction. Suppose that A is a well-ordered
set and that ∅ ≠ B ⊆ A is such that whenever S(x) ⊆ B, x ∈ B. Prove that
B = A.
11. Suppose that A is a well-ordered set and that B ⊆ A. Prove that either
∪_{x∈B} S(x) = A, or ∪_{x∈B} S(x) is an initial segment of A.
12. Prove that there exists an uncountable, well-ordered set Ω such that every
initial segment of Ω is countable.
13. Let Ω be as in the previous problem. Prove that every countable subset of
Ω has an upper bound.
14. Give a direct proof of the fact that Zorn’s lemma implies the axiom of
choice. Hint: Let {X𝛼 }𝛼∈I be a nonempty collection of nonempty sets, and let
𝔅 = {(J, g) ∶ J ⊆ I, g ∈ ∏𝛼∈J X𝛼 }, that is, g is a choice function on {X𝛼 }𝛼∈J .
The set 𝔅 ≠ ∅ because finite subsets J of I generate such functions. Partially
order 𝔅 as follows: (J1 , g1 ) ≤ (J2 , g2 ) if J1 ⊆ J2 and g2 extends g1 .
15. Let A be a linearly ordered set. Is the union of a collection of well-ordered
subsets of A necessarily a well-ordered set?
16. The Hausdorff maximal principle. Every partially ordered set contains a
maximal chain, that is, a chain that is not properly contained in any other
chain.
Prove that the Hausdorff maximal principle is equivalent to Zorn’s
lemma.

Hint: To prove that the Hausdorff maximal principle implies Zorn’s lemma,
let (X, ≤) be a partially ordered set that satisfies the conditions of Zorn’s
lemma. Let C be a maximal chain, and let x be an upper bound of C. To
prove the converse, let ℭ be the collection of all chains in X, and order ℭ by
set inclusion. Verify that the conditions of Zorn’s lemma are met and hence
ℭ contains a maximal member, that is, a maximal chain in X.
17. Prove that any open disk is equivalent to any closed disk.
18. Prove that any open square is equivalent to any closed square.

2.3 Cardinal Numbers

In section 2.1 we took a small step toward showing that infinite sets are not created
equal. In this section, we show that there are infinitely many types of infinities,
in the sense that there is a whole cascade (loosely speaking) of infinite sets of
unequal sizes, or cardinalities. This is the first result in the section. Our approach
to infinite cardinals is intuitive rather than axiomatic. We proceed to show that the
set of integers is the smallest infinite set, then we prove that a set of infinite sets is
well ordered by size, or cardinality. Only an intuitive understanding of cardinal
numbers is essential for subsequent material that makes reference to cardinality.
Thus the discussion of cardinal arithmetic and sums of infinitely many cardinals
can be omitted on the first reading if the goal is to take the fastest route to chapter 4.

Definition. Let A and B be nonempty sets. We say that A and B have the same
cardinality if A ≈ B. We also say that A and B define the same cardinal number,
and we write Card(A) = Card(B).

Definition. Let a = Card(A) and b = Card(B). We say that a ≤ b if there is an
injection from A to B. We also write a < b to mean that there is an injection from
A to B but that A and B are not equivalent. By the Schröder-Bernstein theorem,
this is equivalent to saying that there is no injection from B to A.

Theorem 2.3.1. For any set A, Card(A) < Card(𝒫(A)).

Proof. Recall that the notation 𝒫(A) stands for the power set of A. Define a
function f ∶ A → 𝒫(A) by f (x) = {x}. Clearly, f is one-to-one; therefore, Card(A) ≤
Card(𝒫(A)). If Card(A) = Card(𝒫(A)), then there exists a bijection g ∶ A →
𝒫(A). Define S = {x ∈ A ∶ x ∉ g(x)}. Since g is onto, let a be such that g(a) = S. If
a ∈ S, then, by the definition of S, a ∉ S. If a ∉ S, then again, by the definition of
S, a ∈ S. This contradiction completes the proof. 
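
For a small finite set, the diagonal argument can be checked exhaustively. The sketch below assumes the three-element set A = {0, 1, 2}, chosen only for illustration. It enumerates every function g ∶ A → 𝒫(A) and tests whether the set S = {x ∈ A ∶ x ∉ g(x)} belongs to the range of g.

```python
from itertools import combinations, product

A = [0, 1, 2]   # assumed small example set
# All subsets of A, stored as frozensets so that they can be compared and hashed.
subsets = [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]

def diagonal_set(g):
    # S = {x in A : x not in g(x)}
    return frozenset(x for x in A if x not in g[x])

total = 0    # number of functions g : A -> P(A) examined
missed = 0   # number of those functions whose range omits the diagonal set
for values in product(subsets, repeat=len(A)):
    g = dict(zip(A, values))
    total += 1
    if diagonal_set(g) not in g.values():
        missed += 1
```

The two counters agree: no function from A to 𝒫(A) attains its own diagonal set, so none is onto.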

The reader may have observed that while our definition of what it means for
two sets to have the same cardinality is unambiguous, we have not really defined

what a cardinal number is. We can give a slightly more tangible definition of a
set of cardinal numbers as follows: Let 𝔖 be a set of sets. By theorem 2.1.1, set
equivalence is an equivalence relation on 𝔖. We can define the cardinal numbers
in 𝔖 to be equivalence classes of set equivalence in 𝔖. This does not define all
cardinal numbers because if A is a set that is not equivalent to any set in 𝔖,² then
Card(A) ≠ Card(S) for all S ∈ 𝔖.

One might be tempted to generalize the idea of the last paragraph by considering
the set of all sets, instead of a fixed set of sets. However, within the limitations
of naïve set theory, this is paradoxical for the following reason: If we were
allowed to speak of the set of all sets, 𝔖, we could form U = ∪{S ∶ S ∈ 𝔖}. Since
U contains every S ∈ 𝔖, Card(S) ≤ Card(U) for every S ∈ 𝔖. Since 𝔖 contains all sets, Card(U)
would be the largest cardinal number. This is a paradox because, by theorem 2.3.1,
Card(𝒫(U)) > Card(U). Such paradoxes can be avoided in an axiomatic treatment
of set theory. Such a treatment is hardly essential for our purposes because we will
never refer to cardinal numbers as an absolute concept. We will be content to
think of cardinal numbers as a comparative measure of the size of sets in the sense
of the opening definition of this section.

Some common cardinals are:

n = Card(ℕ_n)
ℵ0 = Card(ℕ)
𝔠 = Card(ℝ)

The natural numbers are the finite cardinals, and all other cardinals are infinite.

Theorem 2.3.2. ℵ0 is the smallest infinite cardinal number.

Proof. Let A be an infinite set. By theorem 2.1.4, A contains a countable set
of distinct elements, B = {b1, b2, . . .}. Since the inclusion B ⊆ A is an injection,
ℵ0 = Card(B) ≤ Card(A).

Theorem 2.3.3. Let 𝔖 be a set of cardinal numbers. Then 𝔖 is linearly ordered.

Proof. Let a, b ∈ 𝔖 and let A and B be sets such that a = Card(A), and b = Card(B).
By theorem 2.2.2, there is an injection from A to B or one from B to A. Thus
a ≤ b or b ≤ a. To check antisymmetry, suppose that a ≤ b ≤ a. Then there is an
injection from A to B and one from B to A. By the Schröder-Bernstein theorem,
A ≈ B and a = b. 

2 Such a set A exists. One can take A to be the power set of ∪{S ∶ S ∈ 𝔖}.

The following theorem establishes the fact that any set of cardinal numbers is well
ordered.

Theorem 2.3.4. If 𝔖 = {𝜉_𝛼}_{𝛼∈I} is a set of cardinal numbers, then there is an element
𝛼0 ∈ I such that 𝜉_{𝛼0} ≤ 𝜉_𝛼 for all 𝛼 ∈ I.

Proof. Let {X_𝛼}_{𝛼∈I} be sets such that 𝜉_𝛼 = Card(X_𝛼). If 𝔖 contains any integers, the
smallest of these integers is the least cardinal in 𝔖. Otherwise, all the sets X_𝛼
are infinite. We prove that there is 𝛼0 ∈ I such that, for every 𝛼 ∈ I, there is an
injection f_𝛼 ∶ X_{𝛼0} → X_𝛼.
Let X = ∏_{𝛼∈I} X_𝛼, and let 𝔅 be the collection of subsets B of X with the property
that if x = (x_𝛼) and y = (y_𝛼) are distinct elements of B, then x_𝛼 ≠ y_𝛼 for all 𝛼 ∈ I.
Order 𝔅 by set inclusion. It is clear that if ℭ is a chain in 𝔅, then ℭ has an upper
bound, namely, ∪{C ∶ C ∈ ℭ}. By Zorn’s lemma, 𝔅 has a maximal member, B.
We claim that, for some 𝛼0 ∈ I, 𝜋_{𝛼0}(B) = X_{𝛼0}. If this is not the case, then, for
each 𝛼 ∈ I, choose an element a_𝛼 ∈ X_𝛼 − 𝜋_𝛼(B) and let a = (a_𝛼). The set B ∪ {a}
is clearly in 𝔅, which contradicts the maximality of B and shows that, for some
𝛼0 ∈ I, 𝜋_{𝛼0}(B) = X_{𝛼0}.
Now, for each a ∈ X_{𝛼0}, there is a unique element x ∈ B such that 𝜋_{𝛼0}(x) = a.
Such x exists because 𝜋_{𝛼0} maps B onto X_{𝛼0}, and it is unique by the definition of 𝔅 and the
fact that B ∈ 𝔅. Now define f_𝛼 ∶ X_{𝛼0} → X_𝛼 as follows: f_𝛼(a) = 𝜋_𝛼(x), where x is
the element of B constructed above.³ By construction, f_𝛼 is an injection.

Cardinal Arithmetic

Definition. Let A and B be disjoint sets, and write a = Card(A), b = Card(B).
By definition:

the sum of a and b: a + b = Card(A ∪ B)
the product of a and b: ab = Card(A × B)
exponentiation: a^b = Card(A^B); recall that A^B is the set of all functions B → A

The above operations are well defined in the sense that they are independent of
the particular sets A and B chosen to represent a and b. For example, if A ≈ C and
B ≈ D, then A × B ≈ C × D and A^B ≈ C^D. See the exercises on section 2.1.

Example 1. For any cardinal number a, a < 2^a.

3 In fact, f_𝛼(a) = 𝜋_𝛼(𝜋_{𝛼0}⁻¹(a)).



By theorem 2.3.1 and problem 16 on section 1.1, a < Card(𝒫(A)) =
Card(2^A) = 2^a.

Example 2. Let a and b be cardinal numbers. Then a ≤ b if and only if there is a
cardinal number c such that a + c = b.

Suppose b = a + c and let A, B, and C be sets such that Card(A) = a, Card(B) = b,
and Card(C) = c, and suppose that A ∩ C = ∅. By assumption, there is a bijection
f ∶ A ∪ C → B. The restriction of f to A injects A into B. Hence a ≤ b. Conversely,
if a ≤ b, let A and B be as above, let f ∶ A → B be an injection and assume that
A ∩ B = ∅. Define C = B − f (A). The function g ∶ A ∪ C → B defined below is
easily seen to be a bijection:

g(x) = f (x) if x ∈ A,
g(x) = x if x ∈ C.

If c = Card(C) then, by definition, a + c = b. 

Theorem 2.3.5. Let a, b, and c be cardinal numbers. Then

1. a + b = b + a and ab = ba.
2. a + (b + c) = (a + b) + c and a(bc) = (ab)c.
3. a(b + c) = ab + ac.
4. a^b a^c = a^(b+c).
5. a^c b^c = (ab)^c.
6. (a^b)^c = a^(bc).
7. If a ≤ b, then a + c ≤ b + c.
8. If a ≤ b, and c ≥ 1, then ac ≤ bc.

Proof. Most of the rules of cardinal arithmetic are obvious. We prove property 6,
as an example. Let A, B, and C be such that a = Card(A), b = Card(B) and
c = Card(C). We need to show that (A^B)^C is equivalent to A^(B×C). Let f ∈ (A^B)^C.
Then for c ∈ C, f (c) is a function from B to A. We write f_c instead of f (c). For such
an f, define a function 𝜙( f ) = g ∶ B × C → A by g(b, c) = f_c(b). The assignment
𝜙 ∶ f ↦ g maps (A^B)^C to A^(B×C).

𝜙 is onto: If g ∶ B × C → A, define f ∶ C → A^B by f_c(b) = g(b, c). Clearly g = 𝜙( f ).

𝜙 is one-to-one: Let f and f ′ in (A^B)^C be such that f ≠ f ′. Then there is
c ∈ C such that f_c ≠ f ′_c. Thus there is b ∈ B such that f_c(b) ≠ f ′_c(b). Now if
g = 𝜙( f ), g′ = 𝜙( f ′), then g(b, c) = f_c(b) ≠ f ′_c(b) = g′(b, c).
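
The map 𝜙 of the proof can be enumerated explicitly when A, B, and C are small finite sets. The sketch below assumes A = {0, 1}, B = {p, q}, and C = {u, v, w}, chosen only for illustration, and represents a function as a tuple of (argument, value) pairs; the helper name `functions` is hypothetical.

```python
from itertools import product

A, B, C = [0, 1], ['p', 'q'], ['u', 'v', 'w']   # assumed small example sets

def functions(dom, cod):
    # All functions dom -> cod, each stored as a tuple of (argument, value) pairs.
    return [tuple(zip(dom, vals)) for vals in product(cod, repeat=len(dom))]

def phi(f):
    # f : C -> A^B; phi(f) is the function g on B x C with g(b, c) = f_c(b).
    f = dict(f)
    return tuple(((b, c), dict(f[c])[b]) for b in B for c in C)

outer = functions(C, functions(B, A))                    # the set (A^B)^C
target = functions([(b, c) for b in B for c in C], A)    # the set A^(B x C)
images = {phi(f) for f in outer}                         # the range of phi
```

Here both sets have 64 elements, in agreement with (2²)³ = 2⁶, and the range of 𝜙 is all of A^(B×C), so 𝜙 is a bijection in this instance.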

Example 3. Let a, b, c, and d be cardinal numbers such that a ≤ b and c ≤ d. Then
ac ≤ bd.

By example 2, there are cardinal numbers r and s such that a + r = b, and
c + s = d. Now bd = (a + r)(c + s) = ac + (as + rc + rs). Again by example 2,
ac ≤ bd.

Example 4. If a, b, and c are cardinal numbers and a ≤ b, then a^c ≤ b^c.

Let A, B, and C be such that Card(A) = a, Card(B) = b, and Card(C) = c. By
assumption, there is an injection g ∶ A → B. Define a function 𝜙 ∶ A^C → B^C by
𝜙( f ) = g∘f. The function 𝜙 is an injection; hence a^c ≤ b^c.

Theorem 2.3.6. Let a be an infinite cardinal. Then a.a = a.

Proof. Let A be such that Card(A) = a, and let 𝔅 = {(A𝛼 , f𝛼 )}𝛼∈I be the collection
of all bijections f𝛼 ∶ A𝛼 → A𝛼 × A𝛼 , where A𝛼 ⊆ A. To see that 𝔅 ≠ ∅, pick a
countable subset G of A. By theorem 2.1.6, G ≈ G × G; hence 𝔅 ≠ ∅.
Order 𝔅 as follows: for 𝛼 and 𝛽 ∈ I, (A𝛼 , f𝛼 ) ≤ (A𝛽 , f𝛽 ) if A𝛼 ⊆ A𝛽 , and f𝛽
extends f𝛼 . If ℭ = {(A𝛼 , f𝛼 )}𝛼∈J is a chain in 𝔅, let C = ∪𝛼∈J A𝛼 and define a
function f ∶ C → C × C by f (x) = f𝛼 (x), where 𝛼 ∈ J is such that x ∈ A𝛼 . The
function f is a well-defined bijection from C to C × C, as the reader can verify.
Clearly, (C, f ) extends every member in ℭ and hence is an upper bound of
ℭ. By Zorn’s lemma, 𝔅 contains a maximal member, (C, g). We claim that
Card(C) = a. Suppose for a contradiction that Card(C) = b < a. First observe
that b ≤ b + b ≤ b.b = Card(C × C) = Card(C) = b, and hence b + b = b.b = b.
Now let d = Card(A − C). If d ≤ b, then a = b + d ≤ b + b = b, which contradicts
the supposition that b < a. Therefore b < d, and A − C contains a subset E such
that Card(E) = b.
Now (C ∪ E) × (C ∪ E) = (C × C) ∪ K, where K = (C × E) ∪ (E × C) ∪ (E × E).
Since K is the disjoint union of three sets each of cardinality b, Card(K) = b +
b + b = (b + b) + b = b + b = b. Therefore there is a bijection h ∶ E → K. Now
define a function f ∶ C ∪ E → (C × C) ∪ K = (C ∪ E) × (C ∪ E) by

f (x) = g(x) if x ∈ C,
f (x) = h(x) if x ∈ E.

Clearly, the pair (C ∪ E, f ) ∈ 𝔅 is a strict extension of (C, g), which contradicts the
maximality of (C, g). This shows that the supposition b < a is false; hence a = b.
This concludes the proof because a.a = Card(C × C) = Card(C) = a. 

Corollary 2.3.7. If a is an infinite cardinal number and 1 ≤ b ≤ a, then ab = a.



Proof. a ≤ ab ≤ a.a = a. 

Theorem 2.3.8. Let a be an infinite cardinal. Then a + a = a.

Proof. a ≤ a + a ≤ a.a = a. Therefore a + a = a. 

Corollary 2.3.9. If a is an infinite cardinal and 1 ≤ b ≤ a, then a + b = a.

Proof. a ≤ a + b ≤ a + a = a.

Example 5. Let b be an infinite cardinal number, and suppose that 1 < a ≤ b.
Then a^b = 2^b.

Since a < 2^a, a^b ≤ (2^a)^b = 2^(ab) = 2^b, where ab = b by corollary 2.3.7. Because
2 ≤ a, 2^b ≤ a^b, and hence 2^b = a^b.

We conclude our study of cardinal arithmetic with a brief exploration of sums of
infinitely many cardinals.

Definition. Let {a𝛼 }𝛼∈I be a set of cardinal numbers, and let {A𝛼 } be a collection
of disjoint sets such that Card(A𝛼 ) = a𝛼 . Define ∑𝛼∈I a𝛼 = Card(∪𝛼∈I A𝛼 ).

Theorem 2.3.10. Let {a𝛼 }𝛼∈I be a collection of equal cardinal numbers, say, a𝛼 = a
and let b = Card(I). Then ∑𝛼∈I a𝛼 = ab.

Proof. Let A be such that Card(A) = a, and let {A𝛼 } be a collection of disjoint
sets such that Card(A𝛼 ) = a. Then there are bijections f𝛼 ∶ A → A𝛼 . Define a
function f ∶ A × I → ∪𝛼∈I A𝛼 by f (x, 𝛼) = f𝛼 (x). Verifying that f is a bijection is
straightforward. Therefore ∑𝛼∈I a𝛼 = Card(∪𝛼∈I A𝛼 ) = Card(A × I) = ab. 

Theorem 2.3.11. If a𝛼 ≤ b𝛼 for every 𝛼 ∈ I, then ∑𝛼∈I a𝛼 ≤ ∑𝛼∈I b𝛼 .

Proof. Let {A𝛼 } be a collection of disjoint sets such that Card(A𝛼 ) = a𝛼 , and let
{B𝛼 } be a collection of disjoint sets such that Card(B𝛼 ) = b𝛼 . By assumption,
there exist injections f𝛼 ∶ A𝛼 → B𝛼 . Define a function f ∶ ∪𝛼∈I A𝛼 → ∪𝛼∈I B𝛼 by
f (x) = f𝛼 (x) if x ∈ A𝛼 . The function f is well defined because {A𝛼 } is a disjoint
family. Clearly, f is an injection from ∪𝛼∈I A𝛼 into ∪𝛼∈I B𝛼 . 

The following theorem is a far-reaching generalization of theorem 2.1.12:

Theorem 2.3.12. Let I be an infinite set, and let b = Card(I). Then the family 𝔉 of
finite sequences in I has cardinality b.

Proof. Let I^n be the family of sequences in I of length exactly n. Since I^n =
I^{1,2,. . .,n}, Card(I^n) = b^n = b. Now 𝔉 = ∪_{n=1}^∞ I^n, so Card(𝔉) = ∑_{n=1}^∞ Card(I^n).
By theorem 2.3.10, ∑_{n=1}^∞ Card(I^n) = ℵ0 b = b.

We conclude the section with a well-known theorem and a famous conjecture.

Theorem 2.3.13. 2^ℵ0 = 𝔠.

Proof. By definition, 2^ℵ0 = Card(2^ℕ). Let T be the set of all binary sequences that
contain only a finite number of nonzero terms. By problem 6 on section 2.1,
Card(2^ℕ − T) = Card(2^ℕ) = 2^ℵ0. By the proof of theorem 2.1.15 (see also problem
11 on section 2.1), 2^ℕ − T ≈ (0, 1] ≈ ℝ. Thus 𝔠 = Card(ℝ) = Card((0, 1]) =
Card(2^ℕ − T) = 2^ℵ0.

Example 6. 𝔠^ℵ0 = 𝔠.

Using theorems 2.3.6 and 2.3.13, 𝔠^ℵ0 = (2^ℵ0)^ℵ0 = 2^(ℵ0 ℵ0) = 2^ℵ0 = 𝔠.

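Example 7. There exists a sequence of cardinal numbers a_1, a_2, . . . such that
a_1 < a_2 < . . . and a_n^ℵ0 = a_n for every n.

Take a_1 = 𝔠. By the previous example, a_1^ℵ0 = a_1. For n ≥ 1, define a_{n+1} = 2^(a_n).
Then a_n < a_{n+1} by example 1, and a_{n+1}^ℵ0 = 2^(a_n ℵ0) = 2^(a_n) = a_{n+1} because
a_n ℵ0 = a_n by corollary 2.3.7.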

The Continuum Hypothesis


Take a sufficiently large infinite cardinal such as 𝔞 = 2^(2^(2^ℵ0)). There are several infinite
cardinals between ℵ0 and 𝔞, such as 2^ℵ0 and 2^(2^ℵ0). Consider the set of cardinals
strictly between ℵ0 and 𝔞. By theorem 2.3.4, there is a smallest cardinal in that set.
We call it ℵ1. Thus ℵ1 is the immediate successor of ℵ0 in the sense that there are
no cardinals strictly between ℵ0 and ℵ1.
We know that 2^ℵ0 > ℵ0. By the above paragraph, ℵ1 ≤ 2^ℵ0. A famous conjecture
of set theory is the continuum hypothesis, which states that 2^ℵ0 = ℵ1. In other
words, there are no cardinals strictly between ℵ0 and 2^ℵ0 = 𝔠.

The generalized continuum hypothesis states that, for any infinite cardinal a,
there is no cardinal number b such that a < b < 2^a, that is, 2^a is the immediate
successor of a.

Exercises

1. Provide proofs of the statements of theorem 2.3.5.



2. Prove that if a ≥ 2 is a cardinal number, then a + a ≤ a.a. This result was used
in the proof of theorems 2.3.6 and 2.3.8.
3. Let a and b be infinite cardinal numbers. Prove that if a + a = a + b, then
a ≥ b.
4. Let a, b, and c be infinite cardinal numbers. Prove that if a + b < a + c, then
b < c.

5. What is ℵ0^ℵ0?
6. Prove that ∑_{n=1}^∞ n = ℵ0.
7. Let A and B be infinite sets and let f ∶ A → B be a surjection.
(a) Prove that Card(A) ≥ Card(B).
(b) Prove that if f⁻¹(b) is at most countable for each b ∈ B, then A ≈ B.
8. Let {A𝛼 }𝛼∈I be a family of nonempty sets. Prove that Card(∪𝛼∈I A𝛼 ) ≤
∑𝛼∈I Card(A𝛼 ).

3
Vector Spaces

Questions that pertain to the foundations of mathematics, although treated by
many in recent times, still lack a satisfactory solution. Ambiguity of language is
philosophy’s main source of problems. That is why it is of the utmost importance
to examine attentively the very words we use.
Giuseppe Peano

Giuseppe Peano. 1858–1932

Peano was born in a farmhouse about 5 km from Cuneo, where he received his
early education. One of Peano’s uncles was a priest and a lawyer in Turin, and
he realized the child’s talent. He took him to Turin in 1870 for his secondary
schooling. Peano entered the University of Turin in 1876, graduated in 1880
as a doctor of mathematics, and was appointed to the university the same year. He
received his qualification to be a university professor in 1884.

In 1886 Peano proved the existence of the solution of the differential equation
dy/dx = f (x, y) under the mere assumption that f is continuous in the neigh-
borhood of the initial point (x0 , y0 ). In 1888 he published the book Geometrical
Calculus, which begins with a chapter on mathematical logic. A significant feature
of the book is that, in it, Peano sets out with great clarity the ideas of Grassmann,
who made the first attempt to define a vector space, albeit in a rather obscure way.

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0003

This book contains the first definition of a vector space given with a remarkably
modern notation and style. This was, without a doubt, a big development in the
history of mathematics. In 1889 Peano published an axiomatic approach to the
definition of the natural numbers that was based on the notion of the successor
function. In 1890 he made the stunning discovery that there are continuous
surjective mappings from [0,1] onto the unit square, which came to be known as
space-filling curves.

Peano’s career was strangely divided into two periods. The period up to 1900
is one where he showed great originality and a remarkable feel for topics that
would be important in the development of mathematics. His achievements were
outstanding, and he had a modern style quite ahead of his own time. However,
this feel for what was important seemed to leave him, and, after 1900, he worked
with great enthusiasm on two projects of great difficulty, which were enormous
undertakings but proved quite unimportant in the development of mathematics.
From around 1892, Peano embarked on a new and extremely ambitious project,
namely, the Formulario Mathematico. As he explained:¹

of the greatest usefulness would be the publication of collections of all
the theorems now known that refer to given branches of the mathematical
sciences . . . Such a collection, which would be long and difficult in ordinary
language, is made noticeably easier by using the notation of mathematical
logic

Even before the Formulario Mathematico project was completed, Peano took up the
project of finding an international, artificial language, “Latino sine flexione,” which
was based on Latin but stripped of all grammar. He compiled the vocabulary by
taking words from English, French, German, and Latin. In fact, the final edition of
the Formulario Mathematico was written in Latino sine flexione, which is another
reason the work was so little used.

The Evolution of the Concept of a Vector Space²


The emergence of the modern definition of a vector space was delayed for a
considerable length of time for several reasons. It appears that early attempts
to define what we know now as a vector space were hindered by the insistence
on incorporating axioms for determinants. The lack of awareness of the
importance of axiomatics and abstract thinking was also a major obstacle. Grassmann’s

1 J. J. O’Connor and E. F. Robertson, “Giuseppe Peano,” in MacTutor History of Mathematics (St
Andrews: University of St Andrews, 1998), http://mathshistory.st-andrews.ac.uk/Biographies/Peano/,
accessed Nov. 3, 2020.
2 All the historical information in this article can be found in Jean-Luc Dorier, “A general outline of
the genesis of vector space theory.” Historia Mathematica 22, no. 3 (1995): 227–61.

pioneering ideas were obscured by philosophical language, and although Peano’s
definition was a long step toward axiomatization, it did not produce the modern
definition. The founders of functional analysis were instrumental in framing the
modern definition. In 1916 Riesz studied spaces of continuous functions and
defined linear transformations and even the concept of bounded linear operators.
Decisive steps toward axiomatization were taken independently by Banach in 1920
and by Hahn, in two papers published in 1922 and 1927. In 1920 Banach took
Riesz’s ideas one step further and defined what is known in modern terminology as
a Banach space. The function spaces Banach and Riesz studied are infinite dimen-
sional, and this makes the use of an axiomatic approach compulsory. Banach’s
approach was confined to function spaces, and his axioms did not coincide with
the modern definition in that some axioms are redundant and some are missing.
Modern algebra finally paved the way toward the modern definition of a vector
space: determinants were dropped from the axiomatic approach, and this unified
the definition of finite and infinite-dimensional spaces. The definition was made
accessible to beginning students in books that were published in the 1940s by
Birkhoff and MacLane, Halmos, and Bourbaki.

3.1 Definitions and Basic Properties

This section is a summary of the most basic concepts of vector space theory. The
main reason for including this section is to establish terminology and provide a
collection of important examples. The reader should pay particular attention to
the examples, because the sequence and function spaces we introduce here are of
fundamental importance for the rest of the book. The theorems are stated without
proof.

Definition. Let 𝕂 be a field, and suppose U is a nonempty set equipped with a
binary operation, + (vector addition). Suppose also that there is a function
× ∶ 𝕂 × U → U (scalar multiplication) that assigns to each pair (a, u) ∈ 𝕂 × U
an element a × u (or simply au) in U. The triple (U, +, ×) is called a vector space
over the field 𝕂 if the following conditions are satisfied by all elements a, b ∈ 𝕂
and all elements u, v, w ∈ U:

(a) u + v = v + u.
(b) u + (v + w) = (u + v) + w.
(c) There is an element 0 ∈ U (the zero vector) such that u + 0 = u.
(d) For every u ∈ U, there is an element −u ∈ U such that u + (−u) = 0.
(e) a(u + v) = au + av.
(f) (a + b)u = au + bu.
(g) (ab)u = a(bu).
(h) 1.u = u.

The field 𝕂 is called the base field, the elements of 𝕂 are referred to as scalars, and
the elements in U are called vectors. The only two fields we will use in this book are
the real field, ℝ, and the complex field, ℂ. Either of these two fields will be denoted
by 𝕂. Most of the results we will obtain apply equally whether the underlying field
is ℝ or ℂ. When a given result applies to only one field but not the other, we will
explicitly state the base field.

Example 1. For each n ∈ ℕ, let 𝕂^n be the set of sequences in 𝕂 of length n. The
set 𝕂^n is a vector space with the operations

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn), a(x1, . . . , xn) = (ax1, . . . , axn).

Example 2. Let I be a nonempty set, and let 𝕂^I be the space of all functions from
I to 𝕂. For functions x = (x𝛼)𝛼∈I, y = (y𝛼)𝛼∈I, and for a ∈ 𝕂, define

x + y = (x𝛼 + y𝛼)𝛼∈I, ax = (ax𝛼)𝛼∈I.

Example 3. The space 𝕂(I) is the space of all functions x ∶ I → 𝕂 such that x𝛼 = 0
for all but a finite number of elements 𝛼 ∈ I. Addition and scalar multiplication
are defined as in example 2.

Example 4. Important special cases of examples 2 and 3 are obtained when I = ℕ;
𝕂^ℕ is the space of all sequences in 𝕂, and 𝕂(ℕ) is the space of sequences that
have a finite number of terms different from 0.

Example 5. Let ℙ be the set of all polynomials with coefficients in 𝕂. We
add polynomials by adding the coefficients of equal powers of x, and scalar
multiplication is defined by a(∑_{i=0}^n a_i x^i) = ∑_{i=0}^n (a a_i) x^i. Let ℙ_n be the space of
polynomials of degree ≤ n. Clearly, ℙ_n is contained in ℙ for all n ∈ ℕ. In fact,
ℙ = ∪_{n=0}^∞ ℙ_n.

Example 6. Let 𝕂^(m×n) be the space of all m × n matrices. Addition and scalar
multiplication are defined entrywise, in the usual manner.

Example 7. For real numbers a < b, define X = ℬ[a, b] as the space of all bounded
(real or complex) functions on the interval [a, b]. For f, g ∈ X, x ∈ [a, b], and
c ∈ 𝕂, define vector addition and scalar multiplication in X, respectively, by

(f + g)(x) = f (x) + g(x),


(cf)(x) = cf (x). 

Example 8. Another important example is the set 𝒞[a, b] of continuous functions
on the closed bounded interval [a, b]. Vector addition and scalar multiplication
are defined as in the previous example. Because continuous functions are closed
under addition and scalar multiplication (the sum of two continuous functions
is continuous, etc.), 𝒞[a, b] is a vector space.

Example 9. The space 𝒞^∞(ℝ) consists of all real-valued functions on ℝ that have
derivatives of all orders. Vector addition and scalar multiplication are defined
as in example 7.

Theorem 3.1.1.
(a) The zero vector is unique.
(b) For u ∈ U and a ∈ 𝕂, 0.u = O and a.O = O (0 is the scalar zero and O is the
zero vector).
(c) (−a)u = a(−u) = −(au).
(d) (∑_{i=1}^n a_i)u = ∑_{i=1}^n a_i u.
(e) a(∑_{i=1}^n u_i) = ∑_{i=1}^n a u_i.

Definition. A subset V of a vector space U is called a subspace of U if V is closed
under vector addition and scalar multiplication. Thus V is a subspace if, for all
v, w ∈ V, and all a ∈ 𝕂, v + w ∈ V, and av ∈ V. It is clear that V is a vector space
in its own right.

Example 10. The set V = {(x1, x2, 0) ∶ x1, x2 ∈ 𝕂} is a subspace of 𝕂^3.

Example 11. More generally, for n < m, 𝕂^n can be viewed as a subspace of 𝕂^m if
we identify an element (x1, . . . , xn) ∈ 𝕂^n with the element (x1, . . . , xn, 0, . . . , 0) ∈
𝕂^m.

Example 12. For every n ∈ ℕ, ℙn is a subspace of ℙ. 

Example 13. For an arbitrary nonempty set I, 𝕂(I) is a subspace of 𝕂^I. In
particular, 𝕂(ℕ) is a subspace of 𝕂^ℕ. A particularly important subspace of 𝕂^ℕ
is the space of bounded sequences,
l^∞ = {(x1, x2, . . .) ∶ sup_n |xn| < ∞}.

Example 14. Two well-known subspaces of l^∞ are the space c of convergent
sequences, and the space c_0 of all sequences that converge to 0. We also call
c_0 the space of null sequences.

Example 15. The space 𝒞[a, b] is a subspace of ℬ[a, b] because a continuous
function on a closed bounded interval is bounded.

Theorem 3.1.2. A subset V of a vector space U is a subspace of U if and only if for
all v, w ∈ V, and all a, b ∈ 𝕂, av + bw ∈ V.

Definitions. The Canonical Vectors:

(a) The canonical vectors in 𝕂^n are the n vectors

e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , and en = (0, 0, . . . , 1).

(b) The canonical vectors in 𝕂(ℕ) are the sequences en (n ∈ ℕ), where the nth
term of en is one, and all the other terms are zero.
(c) The canonical vectors in 𝕂(I) are the functions e𝛼 ∶ I → 𝕂, defined by
e𝛼(𝛽) = 𝛿𝛼,𝛽. Here 𝛿𝛼,𝛽 is the Kronecker delta:

𝛿𝛼,𝛽 = 1 if 𝛼 = 𝛽,
𝛿𝛼,𝛽 = 0 if 𝛼 ≠ 𝛽.

Definition. Let {u1, u2, . . . , un} be a finite subset of a vector space U. A linear
combination of u1, u2, . . . , un is an element of U of the form ∑_{i=1}^n a_i u_i for some
scalars a1, . . . , an.

Example 16. Every vector in 𝕂^n is a linear combination of the canonical vectors
in 𝕂^n. Indeed, if x = (x1, . . . , xn) ∈ 𝕂^n, then x = ∑_{i=1}^n x_i e_i.

Example 17. In 𝕂(ℕ), every vector is a linear combination of a finite number of
the canonical vectors, because if f ∈ 𝕂(ℕ), and a1 = f (k1), . . . , an = f (kn) are all
the nonzero terms of f, then f = ∑_{i=1}^n a_i e_{k_i}.
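
The decomposition in example 17 can be carried out mechanically. The following is a minimal sketch, not a definitive implementation: a sequence with finitely many nonzero terms is stored as a dictionary that maps the index of each nonzero term to its value, the scalars are taken to be rational numbers, and the helper names `add`, `scale`, and `e` are hypothetical.

```python
from fractions import Fraction

def add(x, y):
    # Termwise sum of two finitely supported sequences, stored as {index: nonzero value}.
    z = dict(x)
    for k, v in y.items():
        z[k] = z.get(k, 0) + v
    return {k: v for k, v in z.items() if v != 0}

def scale(a, x):
    # Scalar multiple a*x; zero terms are not stored.
    return {k: a * v for k, v in x.items() if a * v != 0}

def e(n):
    # Canonical vector e_n: the nth term is one and all other terms are zero.
    return {n: 1}

# An assumed element f of K(N) with nonzero terms at the indices 2, 5, and 9.
f = {2: Fraction(3), 5: Fraction(-1, 2), 9: Fraction(4)}

# Rebuild f as the finite linear combination of canonical vectors from example 17.
combo = {}
for k, a in f.items():
    combo = add(combo, scale(a, e(k)))
```

The rebuilt sequence agrees with f, and f + (−1)f is the zero sequence, which this representation stores as the empty dictionary.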

Example 18. Every polynomial in ℙ_n is a linear combination of the n + 1 vectors
1, x, x², . . . , x^n.

Definition. Let S be a subset of a vector space U. The span of S, written Span(S),
is the collection of all linear combinations of finite subsets of S (common
terminology: finite linear combinations of S). To reiterate, a finite linear combination
of S is a vector in U of the form ∑_{i=1}^n a_i u_i, where u_i ∈ S and a_i ∈ 𝕂.

Example 19. In 𝕂^n, Span({e1, e2, . . . , en}) = 𝕂^n. In 𝕂(ℕ), Span({e1, e2}) is the set
of all sequences where only the first two terms may be nonzero.

It is easy to see that Span(S) is a subspace of U. If V is a subspace of U that contains
S, then V contains all finite linear combinations of S. Thus Span(S) ⊆ V, hence the
following result.

Theorem 3.1.3. Span(S) is the smallest subspace of U containing S. 

Theorem 3.1.4. If {V𝛼} is a collection of subspaces of U, then ∩𝛼 V𝛼 is a subspace
of U.

Theorem 3.1.5. Span(S) is the intersection of all the subspaces containing S. 

Exercises

1. Prove theorem 3.1.1.


2. Prove theorem 3.1.2.
3. Prove theorem 3.1.4.
4. Prove theorem 3.1.5.
5. Let S1 and S2 be subsets of a vector space U. Prove that Span(S1 ∪ S2 ) =
Span(S1 ) if and only if S2 ⊆ Span(S1 ).

3.2 Independent Sets and Bases

This section focuses on the concepts of linear independence and bases. Our approach to studying bases is unified in the sense that we do not treat finite-dimensional and infinite-dimensional spaces separately. We use Zorn’s lemma to prove the existence of a basis. A number of important equivalent characterizations of a basis are also discussed, both in the body of the section and in the section exercises.

Definition. A finite subset $\{u_1, u_2, \dots, u_n\}$ of a vector space U is dependent if there exist scalars $a_1, a_2, \dots, a_n$, not all zero, such that $\sum_{i=1}^{n} a_i u_i = 0$.

Terminology. A vector of the form $\sum_{i=1}^{n} a_i u_i$, where at least one $a_i \neq 0$, is called a nontrivial linear combination of $u_1, u_2, \dots, u_n$. The above definition can be restated as follows: $\{u_1, u_2, \dots, u_n\}$ is dependent if some nontrivial linear combination of $u_1, u_2, \dots, u_n$ is zero.

Theorem 3.2.1. A subset $S = \{u_1, u_2, \dots, u_n\}$ of a vector space U is dependent if and only if one of the vectors in S is a linear combination of the remaining vectors.

Proof. Suppose $\{u_1, u_2, \dots, u_n\}$ is dependent. Then $\sum_{i=1}^{n} a_i u_i = 0$ for scalars $a_1, a_2, \dots, a_n$, not all zero. Say $a_i \neq 0$. Then $u_i = -\frac{1}{a_i} \sum_{j \neq i}^{n} a_j u_j$. Conversely, if $u_i = \sum_{j \neq i}^{n} a_j u_j$, then $1 \cdot u_i - \sum_{j \neq i}^{n} a_j u_j = 0$, and $\{u_1, u_2, \dots, u_n\}$ is dependent.

Definition. A finite subset $\{u_1, u_2, \dots, u_n\}$ of a vector space U is independent if it is not dependent. Equivalently, $\{u_1, u_2, \dots, u_n\}$ is independent if a linear combination $\sum_{i=1}^{n} a_i u_i$ is equal to zero if and only if each $a_i = 0$.

Example 1. The set {e1 , . . . , en } is independent in 𝕂n . 

Example 2. Any finite subset of {en ∶ n ∈ ℕ} is independent in 𝕂(ℕ). 

Example 3. Any finite subset of the monomials $1, x, x^2, \dots$ is independent in $\mathbb{P}$.

Example 4. Any finite subset of the canonical vectors e𝛼 in 𝕂(I) is independent. 

Definitions. An infinite subset S of a vector space U is independent if every finite subset of S is independent. An infinite subset S of vectors is dependent if some finite subset of S is dependent.

The next three examples follow immediately from the previous set of examples.

Example 5. The set of canonical vectors {en ∶ n ∈ ℕ} is independent in 𝕂(ℕ). 

Example 6. The set of all monomials $\{1, x, x^2, \dots\}$ is independent in $\mathbb{P}$.

Example 7. The set of canonical vectors {e𝛼 ∶ 𝛼 ∈ I} is independent in 𝕂(I). 

Example 8. The functions $f_\alpha(x) = e^{\alpha x}$, $\alpha \in \mathbb{R}$, are independent in $\mathcal{C}^\infty(\mathbb{R})$.

We show that, for any finite set $\{\alpha_1, \dots, \alpha_n\}$ of distinct real numbers, the functions $e^{\alpha_1 x}, \dots, e^{\alpha_n x}$ are independent. Suppose that, for constants $c_1, \dots, c_n$, $\sum_{i=1}^{n} c_i e^{\alpha_i x} = 0$. Repeated differentiation of the above identity ($n - 1$ times) yields $\sum_{i=1}^{n} \alpha_i^j c_i e^{\alpha_i x} = 0$, $j = 0, \dots, n - 1$. Evaluating each of the last identities at $x = 0$ yields the system of linear equations

$$\sum_{i=1}^{n} \alpha_i^j c_i = 0, \quad j = 0, \dots, n - 1.$$

The matrix of the system is the famous Vandermonde matrix,

$$J = \begin{pmatrix} 1 & 1 & \dots & 1 \\ \alpha_1 & \alpha_2 & \dots & \alpha_n \\ \vdots & \vdots & & \vdots \\ \alpha_1^{n-1} & \alpha_2^{n-1} & \dots & \alpha_n^{n-1} \end{pmatrix}.$$

Since $\det(J) = \prod_{1 \le i < j \le n} (\alpha_j - \alpha_i) \neq 0$, we must have $c_1 = \dots = c_n = 0$, establishing the independence of the set $\{ f_\alpha(x) = e^{\alpha x} : \alpha \in \mathbb{R}\}$.
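
The determinant formula used in example 8 can be checked numerically. The following is a minimal sketch, assuming rational exponents and exact arithmetic; the function names (determinant, vandermonde, vandermonde_product) and the sample values are chosen for this illustration and do not come from the text. It compares a determinant computed by Gaussian elimination with the product of the differences.

```python
from fractions import Fraction
from itertools import combinations


def determinant(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(entry) for entry in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def vandermonde(alphas):
    """Row j holds the j-th powers of the alphas, j = 0, ..., n - 1."""
    n = len(alphas)
    return [[Fraction(alpha) ** j for alpha in alphas] for j in range(n)]


def vandermonde_product(alphas):
    """Product of (alpha_j - alpha_i) over all pairs i < j."""
    result = Fraction(1)
    for i, j in combinations(range(len(alphas)), 2):
        result *= Fraction(alphas[j]) - Fraction(alphas[i])
    return result


alphas = [-1, 0, 2, 5]  # hypothetical distinct exponents
print(determinant(vandermonde(alphas)), vandermonde_product(alphas))
```

With a repeated value among the alphas both quantities vanish, which is why the argument requires the exponents to be distinct.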

Definition. Let U be a vector space. A basis for U is a maximal independent subset of U. To rephrase, S is a basis if S is independent and any subset of U properly containing S is dependent. A basis for a vector space is sometimes called a linear basis or a Hamel basis.

Theorem 3.2.2. Every vector space U has a basis.

Proof. Let 𝔅 be the collection of all independent subsets of U. Order 𝔅 by set
inclusion, and let ℭ be a chain in 𝔅. We show that ∪{C ∶ C ∈ ℭ} is independent.
Let {u1 , u2 , . . . , un } be a subset of ∪{C ∶ C ∈ ℭ}. Then, for each 1 ≤ i ≤ n, there is
a member Ci ∈ ℭ such that ui ∈ Ci . Because ℭ is a chain, one set Ci contains all
the other sets Cj , 1 ≤ j ≤ n. Therefore {u1 , u2 , . . . , un } is a subset of Ci and hence
is independent. Thus ℭ has an upper bound, namely, ∪{C ∶ C ∈ ℭ}. By Zorn’s
lemma, 𝔅 has a maximal member, that is, a maximal independent subset of U,
that is, a basis for U. 

The corollary below says that an independent subset of a vector space can be
augmented to a basis.

Corollary 3.2.3. Let $S_1$ be an independent subset of a vector space U. Then there is a basis S for U containing $S_1$.

Proof. The proof parallels that of theorem 3.2.2, except that 𝔅 is defined to be the
family of all independent subsets of U that contain S1 ; 𝔅 ≠ ∅ because S1 ∈ 𝔅. 

Theorem 3.2.4. Let S be a subset of a vector space U. The following are equivalent:

(a) S is a basis.
(b) S is independent and spans U (meaning that Span(S) = U).
(c) Every nonzero element of U can be written uniquely as a finite linear combination of vectors in S. Specifically, if $u \neq 0$, then there exists a unique subset $\{u_1, \dots, u_n\}$ of S and a unique set of nonzero scalars $\{a_1, \dots, a_n\}$ such that $u = \sum_{i=1}^{n} a_i u_i$.

Proof. (a) implies (b). Since a basis is independent, we only need to show that Span(S) = U. Let $u \in U$ and, without loss of generality, assume that $u \notin S$. Then $S_1 = S \cup \{u\}$ is dependent, so a finite subset $S_2$ of $S_1$ is dependent. $S_2$ must contain u because the other elements of $S_2$ are independent. Write $S_2 = \{u, u_1, \dots, u_n\}$. Then there are scalars $a, a_1, \dots, a_n$, not all zero, such that $a u + a_1 u_1 + \dots + a_n u_n = 0$; $a \neq 0$ because otherwise $\{u_1, \dots, u_n\}$ would be dependent. Hence $u = -\frac{1}{a}(a_1 u_1 + \dots + a_n u_n)$. Thus S spans U.
(b) implies (c). We only need to show the uniqueness of the representation of a nonzero element $u \in U$ as a finite linear combination of S. Suppose there are finite subsets E and F of S such that u can be written as a linear combination of the elements of both E and F. We will show that $E \neq F$ leads to a contradiction. We adopt the notation $E \cap F = \{u_1, \dots, u_r\}$, $E - F = \{u_{r+1}, \dots, u_s\}$, and $F - E = \{u_{s+1}, \dots, u_n\}$. The assumption is that there are nonzero scalars $a_1, \dots, a_s$ and $b_1, \dots, b_r, b_{s+1}, \dots, b_n$ such that

$$u = \sum_{i=1}^{r} a_i u_i + \sum_{i=r+1}^{s} a_i u_i = \sum_{i=1}^{r} b_i u_i + \sum_{i=s+1}^{n} b_i u_i.$$

Rearranging the above equation, we have $\sum_{i=1}^{r} (a_i - b_i) u_i + \sum_{i=r+1}^{s} a_i u_i - \sum_{i=s+1}^{n} b_i u_i = 0$. This would contradict the independence of $E \cup F$ unless $E - F = \emptyset = F - E$. Now $\sum_{i=1}^{r} (a_i - b_i) u_i = 0$, and the independence of E forces $a_i = b_i$ for all $1 \le i \le r$.
(c) implies (a). First observe that the zero vector is not in S because otherwise the uniqueness of representation of any finite linear combination of S would be violated by adding $1 \cdot 0$ to it. To show the independence of S, suppose a nontrivial linear combination of some finite subset of S is equal to zero, say, $\sum_{i=1}^{n} a_i u_i = 0$. By the previous observation, at least two of the coefficients are nonzero, say, $a_1 \neq 0 \neq a_2$. In this case, $a_1 u_1 = -\sum_{i=2}^{n} a_i u_i$. This contradicts the uniqueness of the representation of $a_1 u_1$ and proves the independence of S. To show that S is maximal, let $u \in U - S$. Then u is a finite linear combination of elements $u_1, \dots, u_n$ of S. This implies that $\{u, u_1, \dots, u_n\}$ is dependent, and hence $\{u\} \cup S$ is dependent. This establishes the maximality of S.

Example 9. The canonical vectors $e_1, \dots, e_n$ are independent and span $\mathbb{K}^n$ and therefore form a basis for $\mathbb{K}^n$.

Example 10. The set $S = \{1, x, x^2, \dots\}$ is independent and spans $\mathbb{P}$ and is therefore a basis for $\mathbb{P}$. Naturally, we call S the canonical basis for $\mathbb{P}$.

Example 11. For the same reason, {en ∶ n ∈ ℕ} is a basis for 𝕂(ℕ). 

Theorem 3.2.5. If S is a basis for a vector space U, then S is a minimal spanning set for U in the sense that Span(S) = U and no proper subset of S spans U.

Proof. If S is a basis for U, then, by the previous theorem, S is independent and Span(S) = U. If $S_1$ is a proper subset of S that also spans U, then $S_1$ is independent (being a subset of S), so, again by the previous theorem, $S_1$ would also be a basis for U. This contradicts the maximality of $S_1$, and hence the very definition of a basis, because the independent set S properly contains $S_1$.

Exercises

1. Prove that a subset of an independent set is independent.


2. Prove that a set containing a dependent set is dependent.
3. Prove that a minimal spanning subset of a vector space U is a basis for U.
This is the converse of theorem 3.2.5.
4. Prove that every spanning subset of a vector space U contains a basis for U.
5. Find a basis for 𝕂m×n .

3.3 The Dimension of a Vector Space

In this section, we discuss the definition of dimension and prove the invariance
of the cardinality of the basis. Some results on cardinal arithmetic are needed in
the infinite-dimensional case. We also prove the existence of a vector space of any
given dimension.

Definition. A vector space U is said to be finite dimensional if it has a finite basis.

Example 1. 𝕂n and ℙn are finite dimensional. 

Lemma 3.3.1. Consider the following system of linear equations with coefficients in $\mathbb{K}$:

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1m} x_m &= 0, \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2m} x_m &= 0, \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nm} x_m &= 0.
\end{aligned}$$

If $m > n$, then the system has a nontrivial (i.e., nonzero) solution $(x_1, \dots, x_m) \in \mathbb{K}^m$.

Proof. Without loss of generality, assume that $m = n + 1$, because we can augment the system by adding $m - n - 1$ equations with zero coefficients to the system. If all the coefficients are zero, every vector is a solution, so we may assume, by reordering the equations and renumbering the variables, that $a_{11} \neq 0$. We prove the lemma by induction on n. (When $n = 1$, the single equation $a_{11} x_1 + a_{12} x_2 = 0$ has the nontrivial solution $(-a_{12}, a_{11})$.) Subtracting $\frac{a_{i1}}{a_{11}}$ times the top equation from equation i, $2 \le i \le n$, yields the equivalent system

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1,n+1} x_{n+1} &= 0, \\
b_{22} x_2 + \dots + b_{2,n+1} x_{n+1} &= 0, \\
&\;\;\vdots \\
b_{n2} x_2 + \dots + b_{n,n+1} x_{n+1} &= 0,
\end{aligned}$$

where $b_{ij} = a_{ij} - a_{i1} a_{1j} / a_{11}$, $2 \le i \le n$, $2 \le j \le n + 1$. The bottom $n - 1$ equations of the above system have a nontrivial solution $(x_2, \dots, x_{n+1})$, by the inductive hypothesis. Defining $x_1 = -\frac{1}{a_{11}} \sum_{j=2}^{n+1} a_{1j} x_j$ yields a nontrivial solution $(x_1, \dots, x_{n+1})$ of the original system.
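
The proof of lemma 3.3.1 is by induction. The sketch below takes a different but closely related route, ordinary Gauss-Jordan elimination over exact rational numbers: with more unknowns than equations, some column has no pivot, and setting that free unknown to 1 produces a nontrivial solution. The function name nontrivial_solution and the sample system are assumptions made for this illustration.

```python
from fractions import Fraction


def nontrivial_solution(rows):
    """Return a nonzero x with rows * x = 0, for n equations in m > n unknowns."""
    a = [[Fraction(entry) for entry in row] for row in rows]
    n, m = len(a), len(a[0])
    if m <= n:
        raise ValueError("need more unknowns than equations")
    pivot_cols = []
    row = 0
    for col in range(m):
        pivot = next((r for r in range(row, n) if a[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column stays free
        a[row], a[pivot] = a[pivot], a[row]
        lead = a[row][col]
        a[row] = [entry / lead for entry in a[row]]
        # Clear the pivot column in every other row (reduced echelon form).
        for r in range(n):
            if r != row and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[row])]
        pivot_cols.append(col)
        row += 1
        if row == n:
            break
    # There are at most n pivot columns, so a free column exists because m > n.
    free = next(c for c in range(m) if c not in pivot_cols)
    x = [Fraction(0)] * m
    x[free] = Fraction(1)
    for r, col in enumerate(pivot_cols):
        x[col] = -a[r][free]
    return x


system = [[1, 2, 3], [4, 5, 6]]  # hypothetical: 2 equations, 3 unknowns
solution = nontrivial_solution(system)
print(solution)
```

The same routine applies to lemma 3.3.2: writing m > n vectors in terms of a basis of n vectors gives a coefficient system of this shape, and a nontrivial solution supplies the dependence relation.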

Lemma 3.3.2. If a finite dimensional space U has a basis $S = \{u_1, \dots, u_n\}$ of n vectors, then any subset of U containing more than n elements is dependent.

Proof. Let $\{v_1, \dots, v_m\}$ be a subset of U with $m > n$. Each $v_j$ is a linear combination of S, say, $v_j = \sum_{i=1}^{n} a_{ij} u_i$ $(1 \le j \le m)$. By the previous lemma, there exists a nontrivial solution $(x_1, \dots, x_m)$ of the system $\sum_{j=1}^{m} a_{ij} x_j = 0$, $i = 1, 2, \dots, n$. Now

$$\sum_{j=1}^{m} x_j v_j = \sum_{j=1}^{m} x_j \sum_{i=1}^{n} a_{ij} u_i = \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} a_{ij} x_j \Big) u_i = \sum_{i=1}^{n} 0 \cdot u_i = 0.$$

Thus $\{v_1, \dots, v_m\}$ is dependent.

We now prove the invariance of the number of vectors in a basis for a finite-
dimensional space.

Theorem 3.3.3. If $S = \{u_1, \dots, u_n\}$ and $T = \{v_1, \dots, v_m\}$ are bases for a finite-dimensional vector space U, then $n = m$.

Proof. Since S is independent and T is a basis, $n \le m$, by the previous lemma. For the same reason, $m \le n$.

Definition. The dimension of a finite-dimensional vector space U is the number of elements in a basis for U. This number is independent of the basis by the previous theorem.

Example 2. The dimension of $\mathbb{K}^n$ is n. The dimension of $\mathbb{P}_n$ is n + 1. The dimension of $\mathbb{K}^{m \times n}$ is mn.

Definition. A vector space U is said to be infinite dimensional if it is not finite dimensional. Thus U is infinite dimensional if every basis for U is infinite.

As in the finite-dimensional case, the cardinality of a basis for an infinite-dimensional space is an invariant of the space, as the following theorem shows.

Theorem 3.3.4. Let {u𝛼 }𝛼∈I and {v𝛽 }𝛽∈J be bases for an infinite-dimensional space
U. Then Card(I) = Card(J).

Proof. For each 𝛽 ∈ J, there is a finite subset I𝛽 ⊆ I such that v𝛽 is a linear combina-
tion of the finite set {u𝛼 ∶ 𝛼 ∈ I𝛽 }. Therefore

U = Span({v𝛽 ∶ 𝛽 ∈ J}) ⊆ Span({u𝛼 ∶ 𝛼 ∈ ∪𝛽∈J I𝛽 }) ⊆ U.

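Since no proper subset of $\{u_\alpha\}_{\alpha \in I}$ spans U (theorem 3.2.5), $I = \cup_{\beta \in J} I_\beta$. Using theorems 2.3.11 and 2.3.10 (also see problem 8 in section 2.3),

$$\operatorname{Card}(I) = \operatorname{Card}\Big( \bigcup_{\beta \in J} I_\beta \Big) \le \sum_{\beta \in J} \operatorname{Card}(I_\beta) \le \sum_{\beta \in J} \aleph_0 = \aleph_0 \operatorname{Card}(J) = \operatorname{Card}(J).$$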

Likewise, Card(J) ≤ Card(I), and the proof is complete. 

Now that we have proved the invariance of the cardinality of a basis in an infinite-dimensional space, we define the dimension of such a space to be the cardinality of any basis for the space.

Notation. We use the notation dim𝕂 (U) to denote the dimension of a vector space
U over the field 𝕂. If the base field is understood, we simply write dim(U).

Example 3. dim(𝕂(ℕ)) = ℵ0 = dim(ℙ). 

Example 4. In example 8 in section 3.2, we proved that the set of functions $\{ f_\alpha(x) = e^{\alpha x} : \alpha \in \mathbb{R}\}$ is independent in $\mathcal{C}^\infty(\mathbb{R})$. This shows that $\dim(\mathcal{C}^\infty(\mathbb{R})) \ge \mathfrak{c}$.

We now show the existence of a vector space of any given dimension. The essential
uniqueness of such a space will be discussed in section 3.4.

Theorem 3.3.5. Let ℵ be a cardinal number. Then there is a vector space of dimension ℵ.

Proof. If ℵ is a finite cardinal, n, then $\mathbb{K}^n$ has dimension n. So assume that ℵ is infinite, and let I be a set such that Card(I) = ℵ. We show that the space $U = \mathbb{K}(I)$ discussed in section 3.1 has dimension ℵ by finding a basis for U which is in one-to-one correspondence with I. Let $S = \{e_\alpha\}_{\alpha \in I}$ be the set of canonical vectors in $\mathbb{K}(I)$; S is clearly in one-to-one correspondence with I. We show that S is a basis for U. Let $\{e_{\alpha_1}, \dots, e_{\alpha_n}\}$ be a finite subset of S, and suppose that, for some scalars $a_1, \dots, a_n$, $f = \sum_{k=1}^{n} a_k e_{\alpha_k} = 0$ (the zero function from I to $\mathbb{K}$). For a fixed $1 \le j \le n$, $0 = f(\alpha_j) = \sum_{k=1}^{n} a_k e_{\alpha_k}(\alpha_j) = a_j$. This shows the independence of S. Next we show that S spans U. Let $f \in U$ and let $a_1 = f(\alpha_1), \dots, a_n = f(\alpha_n)$ be all the nonzero values of f. Clearly, $f = \sum_{k=1}^{n} a_k e_{\alpha_k}$.
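
As a concrete illustration of the space 𝕂(I) used in the proof, the sketch below models a finitely supported function as a Python dictionary that stores only its nonzero values, with 𝕂 taken to be the rationals and a small, arbitrary index set. The names canonical, add, scale, and expand_in_canonical_basis are assumptions made for this example, not notation from the text.

```python
from fractions import Fraction


def canonical(alpha):
    """The canonical vector e_alpha: value 1 at alpha and 0 elsewhere."""
    return {alpha: Fraction(1)}


def add(f, g):
    """Pointwise sum; only nonzero values are stored."""
    total = dict(f)
    for key, value in g.items():
        total[key] = total.get(key, Fraction(0)) + value
    return {k: v for k, v in total.items() if v != 0}


def scale(a, f):
    """Pointwise scalar multiple; only nonzero values are stored."""
    return {k: a * v for k, v in f.items() if a * v != 0}


def expand_in_canonical_basis(f):
    """Rebuild f as the finite sum of f(alpha) * e_alpha over its support."""
    result = {}
    for alpha, value in f.items():
        result = add(result, scale(value, canonical(alpha)))
    return result


# A hypothetical element of K(I); the index set mixes strings and integers.
f = {"red": Fraction(2), "blue": Fraction(-1, 3), 7: Fraction(5)}
print(expand_in_canonical_basis(f) == f)
```

Reading off the value of a combination at an index recovers the corresponding coefficient, which is the computation behind the independence argument above.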

Exercises

1. In this problem, the base field is ℝ. Let $V_1$ be the set of real symmetric n × n matrices, and let $V_2$ be the set of skew-symmetric matrices. Show that $V_1$ and $V_2$ are subspaces of $\mathbb{R}^{n \times n}$, and find their dimensions. An n × n matrix A is skew-symmetric if, for all $1 \le i, j \le n$, $a_{ij} = -a_{ji}$.
2. Let V be a subspace of U. Show that dim(V) ≤ dim(U).
3. Let U be an n-dimensional vector space, and let S be a subset of U of exactly
n elements. Prove that the following are equivalent:
(a) S is a basis for U.
(b) S is independent.
(c) S spans U.
4. Let V be a subspace of U. Show that if V contains a basis for U, then V = U.
5. Show that a vector space U is infinite dimensional if and only if it contains
an infinite independent subset.
6. Let U be an infinite-dimensional vector space. Show that there is a sequence
V1 ⊃ V2 ⊃ ... (proper containments) of subspaces of U such that dim(Vn ) =
dim(U) for all n.
7. Let $\{x_0, \dots, x_n\}$ be a set of distinct real numbers. For $0 \le i \le n$, define the following set of polynomials in $\mathbb{P}_n$:

$$L_0(x) = \frac{(x - x_1)(x - x_2) \cdots (x - x_n)}{(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n)},$$

$$L_1(x) = \frac{(x - x_0)(x - x_2) \cdots (x - x_n)}{(x_1 - x_0)(x_1 - x_2) \cdots (x_1 - x_n)}, \dots, \text{ and}$$

$$L_n(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{n-1})}{(x_n - x_0)(x_n - x_1) \cdots (x_n - x_{n-1})}.$$

Show that the set $\{L_i\}_{i=0}^{n}$ is a basis for $\mathbb{P}_n$. Hint: For a polynomial $f \in \mathbb{P}_n$, show that $f = \sum_{i=0}^{n} f(x_i) L_i(x)$. Observe that $L_i(x_j) = \delta_{ij}$.
8. Let I = [a, b] be a closed, bounded interval, and suppose a = t1 < t2 < ... <
tn = b is a fixed set of points (also called nodes) in I. Define V to be the
set of continuous functions on [a, b] whose restrictions to the subintervals
[ti , ti+1 ] are linear. Prove that V is a vector space, and find a basis for it.
A function in the space V is known as a continuous, piecewise linear
function with nodes {t1 , . . . , tn }.
9. The space of continuous, piecewise linear functions. Let U be the collec-
tion of all continuous, piecewise linear functions on [a, b]. Prove that U is
an infinite-dimensional vector space.
10. Show that ℝ is an infinite-dimensional vector space over ℚ.
11. Let M be a field, let L be a subfield of M, and let K be a subfield of L. We can consider L as a vector space over K, and M as a vector space over either L or K. Prove that if $\dim_L(M)$ and $\dim_K(L)$ are finite, then $\dim_K(M) = \dim_L(M) \cdot \dim_K(L)$.
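
For readers who want to experiment with the polynomials of exercise 7 before proving anything, here is a small numerical sketch. It evaluates the polynomials $L_i$ on one hypothetical set of nodes with exact rational arithmetic and checks the two identities mentioned in the hint on a sample polynomial. The function names and the sample data are assumptions of this illustration, and a check on one example is of course not a proof.

```python
from fractions import Fraction


def lagrange_basis(nodes, i):
    """Return L_i as a Python function, for distinct nodes x_0, ..., x_n."""
    xi = Fraction(nodes[i])
    others = [Fraction(x) for k, x in enumerate(nodes) if k != i]

    def L(x):
        value = Fraction(1)
        for xk in others:
            value *= (Fraction(x) - xk) / (xi - xk)
        return value

    return L


def interpolate(f, nodes):
    """Return the function x -> sum of f(x_i) * L_i(x)."""
    basis = [lagrange_basis(nodes, i) for i in range(len(nodes))]
    return lambda x: sum(f(xi) * L(x) for xi, L in zip(nodes, basis))


def sample_polynomial(x):
    # Hypothetical element of P_2: 2x^2 - x + 5.
    return 2 * x * x - x + 5


nodes = [0, 1, 3]  # three distinct nodes, so n = 2
g = interpolate(sample_polynomial, nodes)
print(g(7), sample_polynomial(7))
```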

3.4 Linear Mappings, Quotient Spaces, and Direct Sums

A proper understanding of this section is essential for a smooth transition to the rest of the book. While the early results in the section are elementary, a number of important concepts make their debut later in the section. Specifically, these include quotient spaces and quotient maps, direct sums, projections and algebraic complements, linear functionals and linear operators, maximal subspaces and the co-dimension of a subspace and, finally, the definition of an algebra over a field.

Definition. Let U and V be vector spaces over $\mathbb{K}$. A mapping $T : U \to V$ is said to be linear if, for all $u, v \in U$, and all $a \in \mathbb{K}$,

$$T(u + v) = T(u) + T(v), \quad \text{and} \quad T(au) = aT(u).$$

The following are examples of linear mappings.

Example 1. Define T ∶ ℙ → ℙ by T( f ) = f ′ (the derivative of f). 

Example 2. Let A be an m × n matrix with entries in $\mathbb{K}$. The linear mapping $T : \mathbb{K}^n \to \mathbb{K}^m$, defined by $T(x) = Ax$, is known as the mapping induced by the matrix A. It is easy to check that every linear transformation $T : \mathbb{K}^n \to \mathbb{K}^m$ is induced by the m × n matrix A whose columns are $T(e_1), \dots, T(e_n)$. Here $\{e_1, \dots, e_n\}$ is the canonical basis for $\mathbb{K}^n$. The matrix A is called the standard matrix of T.
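
The construction of the standard matrix in example 2 can be sketched in a few lines. The code below assumes a linear map given as a Python function on lists; the particular map T from 𝕂³ to 𝕂² and the helper names standard_matrix and apply are hypothetical and chosen only for this illustration.

```python
def standard_matrix(T, n):
    """Matrix (as a list of rows) whose columns are T(e_1), ..., T(e_n)."""
    columns = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        columns.append(T(e_i))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(matrix, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def T(x):
    # Hypothetical linear map from K^3 to K^2.
    x1, x2, x3 = x
    return [x1 + 2 * x2, 3 * x3 - x1]


A = standard_matrix(T, 3)
print(A)
print(apply(A, [4, 5, 6]), T([4, 5, 6]))
```

Because T is linear, multiplying by A agrees with T on every vector, not only on the canonical basis.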

Theorem 3.4.1. If T ∶ U → V is linear, then

(a) T(0) = 0;
(b) T(−u) = −T(u);
(c) if $a_1, \dots, a_n \in \mathbb{K}$ and $u_1, \dots, u_n \in U$, then $T(\sum_{i=1}^{n} a_i u_i) = \sum_{i=1}^{n} a_i T(u_i)$;
(d) the image under T of a subspace of U is a subspace of V; and
(e) the inverse image under T of a subspace of V is a subspace of U. 

Definition. Let $T : U \to V$ be linear. The kernel (or null-space) of T, written $\operatorname{Ker}(T)$ or $\mathcal{N}(T)$, is $T^{-1}(0)$. The range of T is defined by $\mathfrak{R}(T) = \{T(u) : u \in U\}$.

Theorem 3.4.2. Let T ∶ U → V be linear. Then

(a) 𝒩(T ) is a subspace of U,


(b) ℜ(T) is a subspace of V, and
(c) T is one-to-one if and only if 𝒩(T ) = {0}. 

Example 3. Let $T : \mathbb{P} \to \mathbb{P}$ be defined by $T(f) = \int_0^x f(t)\,dt$. It is easy to verify directly that $\mathcal{N}(T) = \{0\}$ and that T is one-to-one.

Definition. Let $T : U \to V$ be linear. The rank of T is the dimension of $\mathfrak{R}(T)$ and the nullity of T is the dimension of $\mathcal{N}(T)$. The rank and nullity of a linear mapping are particularly useful when they are finite.

Theorem 3.4.3. Let U be a vector space of dimension n < ∞, and let T be a linear
transformation from U to a vector space V.
Then dim(Ker(T )) + dim(ℜ(T )) = n. In other words,

rank(T ) + nullity(T ) = n.

Proof. Let $S_1 = \{u_1, \dots, u_r\}$ be a basis for $\operatorname{Ker}(T)$. Augment $S_1$ to a basis $\{u_1, \dots, u_n\}$ for U. We show that $\operatorname{rank}(T) = n - r$ by showing that $\{T(u_{r+1}), \dots, T(u_n)\}$ is a basis for $\mathfrak{R}(T)$. Every element y in $\mathfrak{R}(T)$ has the form $T(x)$, where $x \in U$. Write $x = \sum_{i=1}^{n} a_i u_i$; then $y = T(\sum_{i=1}^{n} a_i u_i) = \sum_{i=r+1}^{n} a_i T(u_i)$, because $T(u_i) = 0$ for $1 \le i \le r$. This shows that $T(u_{r+1}), \dots, T(u_n)$ span $\mathfrak{R}(T)$. Suppose, for some scalars $b_{r+1}, \dots, b_n$, $\sum_{i=r+1}^{n} b_i T(u_i) = 0$. Then $T(\sum_{i=r+1}^{n} b_i u_i) = 0$, and $\sum_{i=r+1}^{n} b_i u_i \in \operatorname{Ker}(T)$, so $\sum_{i=r+1}^{n} b_i u_i = \sum_{i=1}^{r} a_i u_i$ for some scalars $a_1, \dots, a_r$. This would contradict the independence of $\{u_1, \dots, u_n\}$, unless $a_1, \dots, a_r$ and $b_{r+1}, \dots, b_n$ are all zero. This shows the independence of $T(u_{r+1}), \dots, T(u_n)$ and concludes the proof.

Example 4. Let $T : \mathbb{P}_n \to \mathbb{P}_n$ be defined by $T(f) = f'$. Clearly, $\mathcal{N}(T)$ consists of all constant functions. Now $\mathfrak{R}(T) = \operatorname{Span}(\{1, x, \dots, x^{n-1}\})$ because if $g = \sum_{j=0}^{n-1} a_j x^j$, then $T(f) = g$, where $f = \sum_{j=0}^{n-1} \frac{a_j x^{j+1}}{j+1}$. Observe that $\operatorname{rank}(T) = n$, and $\operatorname{nullity}(T) = 1$, consistent with theorem 3.4.3 (here $\dim(\mathbb{P}_n) = n + 1$).
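
The count in example 4 can be reproduced with a short computation. The sketch below assumes the monomial basis $1, x, \dots, x^n$ for $\mathbb{P}_n$, writes the differentiation map as an (n + 1) × (n + 1) matrix, and computes its rank by elimination over exact fractions. The function names rank, derivative_matrix, and apply are assumptions of this illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by row reduction over exact fractions."""
    a = [[Fraction(e) for e in row] for row in matrix]
    rows, cols = len(a), len(a[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == rows:
            break
    return r


def derivative_matrix(n):
    """Matrix of f -> f' on P_n in the basis 1, x, ..., x^n.

    Column j holds the coordinates of (x^j)' = j x^(j-1).
    """
    size = n + 1
    d = [[0] * size for _ in range(size)]
    for j in range(1, size):
        d[j - 1][j] = j
    return d


def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


n = 4
D = derivative_matrix(n)
constant = [1] + [0] * n          # coordinates of the constant polynomial 1
print(rank(D), apply(D, constant))
```

For n = 4 the rank is 4, and the constants are sent to zero, so rank plus nullity equals 5, the dimension of $\mathbb{P}_4$.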

Theorem 3.4.4. Let S = {u𝛼 }𝛼∈I be a basis for a vector space U, and let {v𝛼 }𝛼∈I be
an arbitrary subset of a vector space V. Then there exists a unique linear mapping
T ∶ U → V such that, for every 𝛼 ∈ I, T(u𝛼 ) = v𝛼 .

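Proof. Every vector $x \in U$ has a unique representation as $x = \sum_{\alpha \in F} a_\alpha u_\alpha$ for some finite subset $F \subseteq I$. Define $T(x) = \sum_{\alpha \in F} a_\alpha v_\alpha$; T is clearly linear (the interested reader is encouraged to formulate the notation needed to write out the details). To show that T is unique, suppose $S : U \to V$ is another linear mapping such that $S(u_\alpha) = v_\alpha$. Let $x = \sum_{\alpha \in F} a_\alpha u_\alpha \in U$. Then

$$S(x) = S\Big( \sum_{\alpha \in F} a_\alpha u_\alpha \Big) = \sum_{\alpha \in F} a_\alpha S(u_\alpha) = \sum_{\alpha \in F} a_\alpha v_\alpha = \sum_{\alpha \in F} a_\alpha T(u_\alpha) = T\Big( \sum_{\alpha \in F} a_\alpha u_\alpha \Big) = T(x).$$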

The above theorem says that a linear mapping is completely (and uniquely)
determined by its values on a basis. Stated differently, an arbitrary function on
a basis for U can be uniquely extended to a linear function on U.

Example 5. Let $S = \{1, x, x^2, \dots\}$ be the canonical basis for $\mathbb{P}$, and define $T : S \to \mathbb{P}$ by $T(1) = 0$, $T(x) = 0$, and, for $n \ge 2$, $T(x^n) = n(n - 1)x^{n-2}$. It is clear that the unique linear mapping on $\mathbb{P}$ that extends T is $T(f) = f''$ (the second derivative of f).
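
The extension described in theorem 3.4.4 and example 5 can be made concrete. In the sketch below a polynomial is represented by its list of coefficients (an assumption of this illustration, as are all the function names), a map is specified only on the monomials, and it is then extended to every polynomial by linearity. The result is compared with the second derivative computed directly.

```python
def extend_linearly(image_of_monomial):
    """Extend a map given on the monomials x^n to all polynomials.

    Polynomials are coefficient lists [a0, a1, ..., an], and
    image_of_monomial(n) returns the coefficient list of the image of x^n.
    """
    def T(coeffs):
        result = []
        for n, a in enumerate(coeffs):
            image = image_of_monomial(n)
            # Grow the accumulator if this image has higher degree.
            result.extend([0] * (len(image) - len(result)))
            for k, b in enumerate(image):
                result[k] += a * b
        return result

    return T


def image_of_monomial(n):
    """T(1) = 0, T(x) = 0, and T(x^n) = n(n - 1) x^(n - 2) for n >= 2."""
    if n < 2:
        return [0]
    return [0] * (n - 2) + [n * (n - 1)]


def second_derivative(coeffs):
    """Second derivative computed directly from the coefficients."""
    return [k * (k - 1) * a for k, a in enumerate(coeffs)][2:] or [0]


T = extend_linearly(image_of_monomial)
f = [5, 4, 3, 2, 1]  # 5 + 4x + 3x^2 + 2x^3 + x^4
print(T(f), second_derivative(f))
```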

Definition. A linear mapping $T : U \to V$ is an isomorphism if it is a bijection. In this case, we say that U and V are isomorphic. Isomorphic spaces may have different underlying sets and different operations, but, from the algebraic point of view, they are essentially identical.

Example 6. $\mathbb{P}_n$ is isomorphic to $\mathbb{K}^{n+1}$ because the linear mapping $T(\sum_{i=0}^{n} a_i x^i) = (a_0, a_1, \dots, a_n)$ is an isomorphism.

Example 7. The space $\mathbb{P}$ of all polynomials is isomorphic to $\mathbb{K}(\mathbb{N})$. Let $f = \sum_{i=0}^{n} a_i x^i \in \mathbb{P}$. The following linear mapping is an isomorphism: $T(f) = y$, where y is the sequence $(y_0, y_1, y_2, \dots)$ such that

$$y_i = \begin{cases} a_i & \text{if } 0 \le i \le n, \\ 0 & \text{if } i > n. \end{cases}$$

In theorem 3.3.5, we established the existence of a vector space of any given dimension. We now show that such a space is unique up to an isomorphism.

Theorem 3.4.5. An n-dimensional vector space U is isomorphic to 𝕂n .

Proof. Let $\{u_1, \dots, u_n\}$ be a basis for U, and define $T : U \to \mathbb{K}^n$ to be the unique linear mapping that extends $T(u_i) = e_i$, $1 \le i \le n$; T is clearly one-to-one, and it is onto because its range contains the canonical basis for $\mathbb{K}^n$.

Theorem 3.4.6. Let U be a vector space of infinite dimension ℵ, and let I be a set
such that Card(I) = ℵ. Then U is isomorphic to 𝕂(I).

Proof. Let {u𝛼 }𝛼∈I be a basis for U, and let {e𝛼 } be the canonical basis for 𝕂(I). If
∑𝛼∈F a𝛼 u𝛼 is the unique representation of an element x ∈ U as a finite linear
combination of the basis elements, define T ∶ U → 𝕂(I) by T(x) = ∑𝛼∈F a𝛼 e𝛼 .
The proof that T is an isomorphism is much like the proof of the previous
theorem. 

Quotient Spaces

Let V be a subspace of a vector space U. Define a relation R on U by xRy if $x - y \in V$. It is easy to verify that R is an equivalence relation. The equivalence classes of R are subsets of U of the form $x + V = \{x + v : v \in V\}$. Such a set is called a coset of V.

For example, let $U = \mathbb{R}^2$ and let V be a one-dimensional subspace of U. Then V is a straight line containing the origin, and the cosets of V are lines parallel to V.

Definition. The quotient space U/V (read U modulo V) consists of the cosets of
V, endowed with a vector space structure by the operations

(x + V) + (y + V) = (x + y) + V
and
a(x + V) = (ax) + V.

The above operations are well defined in the sense that they do not depend on the particular element x chosen to represent the coset $x + V$. For example, if $x' + V = x + V$ and $y' + V = y + V$, then $x' - x \in V$, $y' - y \in V$, and $(x' + y') - (x + y) \in V$; hence $(x' + y') + V = (x + y) + V$. For brevity of notation, the coset $x + V$ will be denoted by $\bar{x}$.
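
To see the well-definedness concretely, the following sketch takes the hypothetical choice of U as the rational plane and V = Span({(1, 1)}) (neither appears in the text) and checks that replacing representatives by other members of the same cosets does not change the coset of the sum or of a scalar multiple. The helper names are illustrative only.

```python
from fractions import Fraction


def in_V(w):
    """Membership in V = Span({(1, 1)}): both coordinates agree."""
    return w[0] == w[1]


def same_coset(x, y):
    """x + V equals y + V exactly when x - y lies in V."""
    return in_V((x[0] - y[0], x[1] - y[1]))


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def scale(a, x):
    return (a * x[0], a * x[1])


# Two representatives for each of two cosets.
x, x_alt = (Fraction(3), Fraction(1)), (Fraction(5), Fraction(3))
y, y_alt = (Fraction(0), Fraction(2)), (Fraction(-1), Fraction(1))
a = Fraction(7, 2)

print(same_coset(add(x, y), add(x_alt, y_alt)))    # sums lie in one coset
print(same_coset(scale(a, x), scale(a, x_alt)))    # so do scalar multiples
```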

Definition. Let V be a subspace of a vector space U. The function $\pi : U \to U/V$, defined by $\pi(x) = \bar{x} = x + V$, is called the quotient map. It is easy to verify that $\pi$ is linear.

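Theorem 3.4.7. Let $T : U \to W$ be linear, and let $V = \operatorname{Ker}(T)$. Then $U/V$ is isomorphic to $\mathfrak{R}(T)$ via the isomorphism $\tilde{T}(\bar{x}) = T(x)$, where $\bar{x} = x + V$.

Proof. We leave it to the reader to verify that $\tilde{T}$ is well defined. Clearly, $\tilde{T}$ is onto. We verify the linearity of $\tilde{T}$:

$$\tilde{T}(a\bar{x} + b\bar{y}) = \tilde{T}(\overline{ax + by}) = T(ax + by) = aT(x) + bT(y) = a\tilde{T}(\bar{x}) + b\tilde{T}(\bar{y}).$$

To show that $\tilde{T}$ is one-to-one, suppose $\tilde{T}(\bar{x}) = 0$. Therefore $T(x) = 0$, and $x \in \operatorname{Ker}(T) = V$; hence $\bar{x} = \bar{0}$.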

Direct Sums

Definition. Let $U_1$ and $U_2$ be subspaces of a vector space U. The sum of $U_1$ and $U_2$ is the set $U_1 + U_2 = \{x + y : x \in U_1, y \in U_2\}$. It is clear that $U_1 + U_2$ is a subspace of U.

Example 8. Let U = ℝ3 , and let U1 and U2 be distinct lines containing the origin.
Then the subspace U1 + U2 is the plane that contains U1 and U2 . 

Theorem 3.4.8. U1 + U2 = Span(U1 ∪ U2 ). 

Definition. A vector space U is the direct sum of two subspaces $U_1$ and $U_2$ if $U_1 + U_2 = U$, and $U_1 \cap U_2 = \{0\}$. In this case, we write $U = U_1 \oplus U_2$ and say that $U_2$ is an algebraic complement of $U_1$ in U.

Example 9. ℝ2 = {(x1 , 0) ∶ x1 ∈ ℝ} ⊕ {(0, x2 ) ∶ x2 ∈ ℝ}. 

Theorem 3.4.9. Let $U_1$ and $U_2$ be subspaces of a vector space U. Then $U = U_1 \oplus U_2$ if and only if every vector $u \in U$ can be written uniquely as $u = u_1 + u_2$, where $u_1 \in U_1$, $u_2 \in U_2$.

Proof. Such a representation of u is guaranteed by the definition of a direct sum. To prove the uniqueness, suppose $u_1 + u_2 = v_1 + v_2$, where $u_1, v_1 \in U_1$ and $u_2, v_2 \in U_2$. Then $u_1 - v_1 = v_2 - u_2 \in U_1 \cap U_2 = \{0\}$, so $u_1 - v_1 = v_2 - u_2 = 0$ and $u_1 = v_1$ and $u_2 = v_2$. The converse is straightforward.

Example 10. Let $c$ be the space of all convergent sequences, and let $c_0$ be the space of all sequences that converge to 0. We show that $c = c_0 \oplus \operatorname{Span}(\{e\})$, where $e = (1, 1, 1, \dots)$. Let $x = (x_1, x_2, \dots)$ be a convergent sequence, and let $\xi = \lim_n x_n$. Define $y = (x_1 - \xi, x_2 - \xi, x_3 - \xi, \dots)$, and let $z = \xi e = (\xi, \xi, \xi, \dots)$. Clearly, y converges to 0 (i.e., $y \in c_0$), and $x = y + z$. The representation of $x = y + z$ is unique because the only constant sequence that converges to 0 is the zero sequence.

Theorem 3.4.10. Every subspace U1 of a vector space U has a complement in U.

Proof. We need to show that there is a subspace $U_2$ of U such that $U = U_1 \oplus U_2$. Let $S_1$ be a basis for $U_1$. Augment $S_1$ to a basis S of U, and let $S_2 = S - S_1$. If $U_2 = \operatorname{Span}(S_2)$, then $U = U_1 \oplus U_2$. We leave it to the reader to write out the details.

Definition. Let $U = U_1 \oplus U_2$. The projection $\pi_1 : U \to U_1$ is the linear mapping $\pi_1(u) = u_1$, where $u = u_1 + u_2$ is the unique representation of u provided by theorem 3.4.9. The projection $\pi_2$ onto $U_2$ is defined similarly. Some of the properties of projections are explored in the section exercises.

Example 11. $\mathbb{R}^2 = \{(x_1, 0) : x_1 \in \mathbb{R}\} \oplus \{(0, x_2) : x_2 \in \mathbb{R}\}$. The projection $\pi_1$ projects $\mathbb{R}^2$ onto the $x_1$-axis in the sense of elementary geometry.

Theorem 3.4.11. Let $U = U_1 \oplus U_2$. Then $\operatorname{Ker}(\pi_1) = U_2$, and $U/U_2$ is isomorphic to $U_1$.

Proof. To verify that $\operatorname{Ker}(\pi_1) = U_2$, let $x = u_1 + u_2$, where $u_1 \in U_1$, $u_2 \in U_2$. Then $\pi_1(x) = 0$ if and only if $u_1 = 0$, if and only if $x = u_2 \in U_2$. The fact that $U/U_2$ is isomorphic to $U_1$ follows from theorem 3.4.7.

Linear Functionals and Operators

A particularly important set of linear transformations is that from a vector space U to the base field $\mathbb{K}$.

Definition. A linear mapping from a vector space U to the base field 𝕂 is called a
linear functional on U.

The following are examples of linear functionals.

Example 12. Define $\lambda : \mathbb{P} \to \mathbb{R}$ by $\lambda(f) = \int_0^1 f(x)\,dx$. (The base field is $\mathbb{R}$ and the polynomials have real coefficients.)

Example 13. Define $\lambda : \mathbb{K}^{n \times n} \to \mathbb{K}$ by $\lambda(A) = \sum_{i=1}^{n} a_{ii}$. Here $A = (a_{ij})$ is an n × n matrix. The quantity $\sum_{i=1}^{n} a_{ii}$ is called the trace of A, often written tr(A).

Theorem 3.4.12. Let M be a subspace of a vector space U. The following are equivalent:

(a) $M = \operatorname{Ker}(\lambda)$ for some nonzero linear functional $\lambda$ on U.
(b) M has a one-dimensional complement.

Proof. (a) implies (b). Let x ∈ U be such that 𝜆(x) ≠ 0. By replacing x with x/𝜆(x),
we may assume that 𝜆(x) = 1. For y ∈ U, let w = y − 𝜆(y)x. Then 𝜆(w) = 𝜆(y) −
𝜆(𝜆(y)x) = 𝜆(y) − 𝜆(y)𝜆(x) = 0. This shows that w ∈ Ker(𝜆) = M; hence y =
w + 𝜆(y)x ∈ M + Span({x}), and U = M + Span({x}). Next we show that M ∩
Span({x}) = {0}. This will complete the proof. If y ∈ M ∩ Span({x}), then y = ax
for some a ∈ 𝕂, and 𝜆(y) = 0. But 𝜆(y) = a𝜆(x) = a. Thus a = 0, and y = 0.
Conversely, suppose that U = M ⊕ Span({x}) for some nonzero x ∈ U. Let S1
be a basis for M, and let S = S1 ∪ {x}. Then S is a basis for U. Define 𝜆 ∶ S → 𝕂
by 𝜆(x) = 1, and 𝜆(u) = 0 for all u ∈ S1 . Finally, extend 𝜆 to a linear functional,
which we also denote by 𝜆, on U according to theorem 3.4.4. The reader can easily
verify that Ker(𝜆) = M. 

Example 14. Refer to example 12. Let $M = \operatorname{Ker}(\lambda)$. The following facts are easy to verify: A basis for M is $\{x^n - \frac{1}{n+1} : n \in \mathbb{N}\}$, and the one-dimensional subspace N of constant polynomials is a complement of M. Every polynomial f can be written as $f = g + c$, where $c = \int_0^1 f(t)\,dt$, and $g = f - c$.
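
The decomposition in example 14 is easy to carry out by machine. The sketch below represents a real polynomial by its list of coefficients (rational coefficients are assumed here so that the arithmetic is exact), evaluates the functional of example 12 term by term, and splits a polynomial into a part in the kernel plus a constant. The function names integral_0_1 and split are assumptions of this illustration.

```python
from fractions import Fraction


def integral_0_1(coeffs):
    """The functional of example 12: integral over [0, 1] of sum a_k x^k."""
    return sum(Fraction(a) / (k + 1) for k, a in enumerate(coeffs))


def split(coeffs):
    """Write f = g + c, with c the constant integral_0_1(f) and g in the kernel."""
    c = integral_0_1(coeffs)
    g = [Fraction(a) for a in coeffs]
    g[0] -= c  # subtracting a constant only changes the constant term
    return g, c


f = [1, 2, 3]  # the polynomial 1 + 2x + 3x^2
g, c = split(f)
print(g, c, integral_0_1(g))
```

The same function confirms that each proposed basis element $x^n - \frac{1}{n+1}$ lies in the kernel, for instance integral_0_1([Fraction(-1, 3), 0, 1]) is zero.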

Definition. A proper subspace M of a vector space U is said to be a maximal subspace if it is not properly contained in any other proper subspace of U.

Theorem 3.4.13. For a subspace M of a vector space U, each of the following is equivalent to each of the conditions (a) and (b) of the previous theorem:

(a) M is a maximal subspace of U.
(b) U/M has dimension 1.

Definition. If dim(U/M) = 1, M is said to have co-dimension 1. More generally, if V is a subspace of U, then dim(U/V) is called the co-dimension of V in U. The concept is particularly useful when dim(U/V) < ∞.

Another important vector space is the space of all linear transformations from one
vector space U to another space V.

Notation. Let U and V be vector spaces. The set of all linear transformations
from U to V is denoted by Hom(U, V). A linear mapping is also called a
homomorphism, hence the notation Hom(U, V).

It is easy to see that Hom(U, V) is a vector space with the following operations:
for T1 , T2 ∈ Hom(U, V) and a ∈ 𝕂, (T1 + T2 )(u) = T1 (u) + T2 (u), and (aT1 )(u) =
aT1 (u).

An element of Hom(U, U) is often called a linear operator on U. Hom(U, U) has additional structure provided by the composition of linear operators; if S, T ∈ Hom(U, U), $(S \circ T)(u) = S(T(u))$. When there is no danger of ambiguity, we write ST for $S \circ T$. The composition of linear operators satisfies a number of properties including, for example, $S(T_1 + T_2) = ST_1 + ST_2$.

Definition. A vector space U over a field 𝕂 is an algebra over 𝕂 if U possesses another binary operation, to be called multiplication, such that, for all u, v, w ∈ U, and all a ∈ 𝕂, the following conditions are met:

(a) u(vw) = (uv)w
(b) u(v + w) = uv + uw, and (u + v)w = uw + vw
(c) a(uv) = u(av) = (au)v

The multiplication operation in an algebra is not necessarily commutative, and an algebra need not contain a multiplicative identity element, although many important algebras do. The simplest example of an algebra is the space of square matrices $\mathbb{K}^{n \times n}$, where the binary operations are addition and multiplication of matrices.
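
The algebra axioms can be spot-checked for 2 × 2 matrices. The sketch below is only an illustration on three hypothetical integer matrices, with helper names (mat_add, mat_mul, scalar_mul) chosen here; it checks conditions (a), (b), and (c) on that sample and also exhibits a pair of matrices that do not commute.

```python
def mat_add(a, b):
    """Entrywise sum of two matrices given as lists of rows."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    """Matrix product; zip(*b) iterates over the columns of b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def scalar_mul(c, a):
    """Scalar multiple of a matrix."""
    return [[c * x for x in row] for row in a]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]
k = 5

associative = mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C)
distributive = mat_mul(A, mat_add(B, C)) == mat_add(mat_mul(A, B), mat_mul(A, C))
scalars = scalar_mul(k, mat_mul(A, B)) == mat_mul(A, scalar_mul(k, B))
commutes = mat_mul(A, B) == mat_mul(B, A)
print(associative, distributive, scalars, commutes)
```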

Theorem 3.4.14. Suppose U and V are vector spaces over a field 𝕂. Then Hom(U, V)
is a vector space, and Hom(U, U) is an algebra over 𝕂. 

Exercises

1. Prove theorem 3.4.1.


2. Prove theorem 3.4.2.
3. Let T ∶ U → V be linear, and let S1 be a basis for Ker(T ). Augment S1 to a
basis S of U, and let S2 = S − S1 . Prove that T(S2 ) is a basis for ℜ(T ). This
result is a generalization of theorem 3.4.3 when dim(U) = ∞.
4. Show that if there exists a linear mapping that maps U onto V, then
dim(V) ≤ dim(U).
5. Show that if there exists a one-to-one linear mapping from U to V, then
dim(U) ≤ dim(V).
6. Let {x0 , . . . , xn } be a set of distinct real numbers. Show that the mapping
T ∶ ℙn → $\mathbb{K}^{n+1}$ given by T( f ) = (f (x0 ), . . . , f (xn )) is an isomorphism.

7. Prove theorem 3.4.8.


8. Give an example to show that the algebraic complement of a subspace is not
unique.
9. Prove that if U = U1 ⊕ U2 , then dim(U) = dim(U1 ) + dim(U2 ).
10. In this problem, the base field is ℝ. Let U1 be the subspace of $\mathbb{R}^{n \times n}$ of real
symmetric n × n matrices, and let U2 be the subspace of skew-symmetric
matrices. Show that $\mathbb{R}^{n \times n}$ = U1 ⊕ U2 .
11. Let U = U1 ⊕ U2 , and, for i = 1, 2, let Ti be a linear mapping from Ui to a
vector space W. Prove that there exists a unique linear mapping T ∶ U → W
such that $T|_{U_i} = T_i$.

Definition. Let T ∶ U → U be linear. A subspace V of U is said to be
T-invariant if T(V) ⊆ V.

12. Prove that if V is a T-invariant subspace of U, then the mapping $\tilde{T}$ ∶ U/V →
U/V, defined by $\tilde{T}$(x + V) = T(x) + V, is linear. Part of the exercise is to
show that $\tilde{T}$ is well defined. If 𝜋 ∶ U → U/V is the quotient map, show that
𝜋∘T = $\tilde{T}$∘𝜋.
13. Let U1 and U2 be subspaces of a vector space U such that U = U1 ⊕ U2 ,
and let 𝜋 ∶ U → U1 be the projection of U onto U1 . Prove that 𝜋² = 𝜋, that
U1 = {x ∈ U ∶ 𝜋(x) = x} = ℜ(𝜋), and that U2 = Ker(𝜋). By definition,
𝜋²(x) = 𝜋(𝜋(x)).

Definition. A linear operator T on a vector space U is said to
be idempotent if T² = T. The problem above says that the projection 𝜋 of
U onto U1 is idempotent.

14. Let 𝜋 be a linear idempotent operator on a vector space U. Prove that
U = U1 ⊕ U2 , where U1 = {x ∈ U ∶ 𝜋(x) = x}, and U2 = Ker(𝜋).
15. Prove theorem 3.4.13.
16. Exhibit a basis for the null-space of the functional in example 13.
17. Prove theorem 3.4.14.
18. Let U be an n-dimensional vector space, and let U∗ = Hom(U, 𝕂). Suppose
{u1 , . . . , un } is a basis for U, and, for each 1 ≤ i ≤ n, define a linear functional
𝜆i ∈ U∗ by 𝜆i (uj ) = 𝛿ij , (1 ≤ j ≤ n) (see theorem 3.4.4). Prove that {𝜆i ∶ 1 ≤
i ≤ n} is a basis for U∗ . The space U∗ is called the dual of U.
19. Let 𝜆1 , . . . , 𝜆n be linear functionals on a vector space U, let Mi = Ker(𝜆i ),
and let $N = \bigcap_{i=1}^{n} M_i$. Prove that dim(U/N) ≤ n. Hint: Define T ∶ U → $\mathbb{K}^n$
by T(x) = (𝜆1 (x), . . . , 𝜆n (x)). Use theorem 3.4.7.

3.5 Matrix Representation and Diagonalization

A careful reading of example 2 in section 3.4 reveals that the set of linear mappings
from 𝕂n to 𝕂m is in one-to-one correspondence with the set of m × n matrices.
This section generalizes this result. Suppose U and V are finite-dimensional vector
spaces and that {u1 , . . . , un } and {v1 , . . . , vm } are bases for U and V, respectively.
Theorem 3.4.4 states that a linear mapping T ∶ U → V is uniquely determined by
the vectors T(u1 ), . . . , T(un ). Since each of the vectors T(uj ) can be uniquely written
as a linear combination of {v1 , . . . , vm } with coefficients in 𝕂, the set of coefficients
determines T uniquely. This observation is the basis for the opening definition of
this section. The information in this section is standard, and we assume familiarity
with its contents.

Matrix Representations of Linear Mappings

Let U and V be finite-dimensional vector spaces, and let n = dim(U), m = dim(V).
Fix a pair of bases B = {u1 , . . . , un } and C = {v1 , . . . , vm } for U and V, respectively.
If T ∈ Hom(U, V), then, for every 1 ≤ j ≤ n, T(uj ) can be written as a linear
combination of C, say, $T(u_j) = \sum_{i=1}^{m} a_{ij} v_i$.

Definition. Given the construction in the previous paragraph, the matrix
A = (aij ) is called the matrix of T relative to the base pair (B, C).

The matrix representing a linear mapping is totally dependent on the base pair
(B, C) and is even sensitive to the permutation of the elements in each basis. Thus
the bases B and C are assumed to be ordered.

Example 1. Consider the linear transformation T ∶ $\mathbb{K}^n$ → $\mathbb{K}^m$ induced by an
m × n matrix A. It is clear that the matrix of T relative to the base pair (Bn , Bm )
is A, where Bn and Bm are the canonical bases for $\mathbb{K}^n$ and $\mathbb{K}^m$, respectively.

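Example 2. Let T ∶ ℙn → ℙn be the linear transformation $T(f) = \dfrac{df}{dx}$.
If $B = \{1, x, \ldots, x^n\}$, then the matrix of T relative to (B, B) is the (n + 1) × (n + 1) matrix

$$\begin{pmatrix} 0 & 1 & & & \\ & 0 & 2 & & \\ & & \ddots & \ddots & \\ & & & 0 & n \\ & & & & 0 \end{pmatrix}.$$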
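
The matrix in example 2 can be generated mechanically. The Python sketch below is only an illustration, with hypothetical helper names `derivative_matrix` and `apply`; it assumes a polynomial is stored as its list of coefficients relative to the ordered basis (1, x, ..., x^n).

```python
def derivative_matrix(n):
    """Matrix of d/dx on P_n relative to the ordered basis (1, x, ..., x^n)."""
    size = n + 1
    A = [[0] * size for _ in range(size)]
    for j in range(1, size):
        # d/dx (x^j) = j * x^(j-1): column j has the single entry j in row j - 1.
        A[j - 1][j] = j
    return A

def apply(A, coeffs):
    """Multiply the matrix A by a coefficient vector."""
    return [sum(A[i][j] * coeffs[j] for j in range(len(coeffs)))
            for i in range(len(A))]

D3 = derivative_matrix(3)
# f(x) = 4 + 3x + 2x^2 + x^3 has derivative f'(x) = 3 + 4x + 3x^2.
fprime = apply(D3, [4, 3, 2, 1])
print(D3)
print(fprime)
```

Applying the matrix to the coefficient vector of f returns the coefficient vector of df/dx, which is exactly what it means for the matrix to represent T relative to (B, B).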

Theorem 3.5.1. The function Φ ∶ Hom(U, V) → $\mathbb{K}^{m \times n}$ that assigns to an element
T ∈ Hom(U, V) its matrix relative to the base pair (B, C) is an isomorphism.

Now we study how the matrix of the composition of two linear transformations
relates to the matrices of the composed transformations.

Theorem 3.5.2. Let U, V, B, C, and T be as in the above definition, let D =
{w1 , . . . , wp } be a basis for a third vector space W, and, finally, let S ∈ Hom(V, W).
If $A = (a_{ij})$ is the matrix of T relative to the base pair (B, C), and $A' = (a'_{ij})$ is the
matrix of S relative to the base pair (C, D), then the matrix product A′A is the
matrix of S∘T relative to the base pair (B, D).

Proof. Let 1 ≤ j ≤ n. Then

$$(S \circ T)(u_j) = S(T(u_j)) = S\Big(\sum_{i=1}^{m} a_{ij} v_i\Big) = \sum_{i=1}^{m} \sum_{k=1}^{p} a_{ij}\, a'_{ki}\, w_k = \sum_{k=1}^{p} \Big(\sum_{i=1}^{m} a'_{ki}\, a_{ij}\Big) w_k = \sum_{k=1}^{p} e_{kj} w_k, \quad \text{where } e_{kj} = \sum_{i=1}^{m} a'_{ki}\, a_{ij}.$$

Thus the matrix of S∘T relative to (B, D) is $E = (e_{kj})$. By the definition of matrix
multiplication, $e_{kj}$ is the (k, j) entry of the product A′A. ∎
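
A quick numerical illustration of theorem 3.5.2, offered as a sketch rather than as part of the proof: with the canonical bases, applying S after T to a vector agrees with applying the single matrix A′A. The matrices and the helper names `mat_mul` and `mat_vec` are chosen here only for the example.

```python
def mat_mul(X, Y):
    """Usual matrix product of nested-list matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_vec(X, v):
    """Matrix-vector product."""
    return [sum(X[i][j] * v[j] for j in range(len(v))) for i in range(len(X))]

A = [[1, 2, 0],
     [0, 1, 3]]          # matrix of T : K^3 -> K^2
A_prime = [[2, 1],
           [1, -1]]      # matrix of S : K^2 -> K^2
E = mat_mul(A_prime, A)  # candidate matrix of the composition S∘T

u = [1, -2, 4]
lhs = mat_vec(A_prime, mat_vec(A, u))  # S(T(u)), computed in two steps
rhs = mat_vec(E, u)                    # (A'A) u, computed in one step
print(E, lhs, rhs)
```

Note the order of the factors: the matrix of the mapping applied first (T) appears on the right of the product.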

The above theorem is the crucial piece of information needed to prove the
following theorem, which is a special case of theorem 3.5.1 when V = U and C = B.

Theorem 3.5.3. The function Φ ∶ Hom(U, U) → $\mathbb{K}^{n \times n}$ that assigns to an element
T ∈ Hom(U, U) its matrix relative to the base B (i.e., relative to the base pair
(B, B)) is an algebra isomorphism. Thus, in addition to being linear, Φ satisfies
Φ(S∘T) = Φ(S)Φ(T ).

Definition. Let U be an n-dimensional vector space, and let $B = \{u_1, \ldots, u_n\}$ and
$B' = \{u'_1, \ldots, u'_n\}$ be two bases for U. Every vector $u_j \in B$ is a linear combination
of the base B′, $u_j = \sum_{i=1}^{n} p_{ij} u'_i$. The resulting matrix $P = (p_{ij})$ is called the matrix
from B to B′.

It is important to understand that the matrix P from B to B′ is the matrix of the
identity transformation $I_U$ ∶ U → U relative to the base pair (B, B′).

Example 3. Notice that if P is the matrix from B to B′, then $P^{-1}$ is the matrix from
B′ to B. Indeed, let Q be the matrix from B′ to B, and consider the mapping
$T = I_U$ with the base pair (B, B′) (its matrix is P), and the mapping $S = I_U$ with
the base pair (B′, B) (its matrix is Q). Consider the matrix of the composition
S∘T. On the one hand, its matrix relative to (B, B) is QP by theorem 3.5.2. On the
other hand, the matrix of $I_U$ relative to (B, B) is the identity matrix $I_n$. Therefore
$QP = I_n$, and $Q = P^{-1}$.

Example 4. Given a basis B for U and an invertible n × n matrix P, there is a basis
B′ for U such that P is the matrix from B to B′. To see this, let $Q = (q_{ij})$ be the
inverse of P, and define $u'_j = \sum_{i=1}^{n} q_{ij} u_i$. The set $B' = \{u'_1, \ldots, u'_n\}$ is a basis for U
(see problem 3 at the end of this section), and the matrix from B′ to B is Q. By
example 3, P is the matrix from B to B′.

Example 5. As another application of problem 3, let $B' = \{P_0, \ldots, P_n\}$ be
polynomials such that, for each 0 ≤ i ≤ n, $P_i$ has exact degree i. Then B′ is a basis for
ℙn . To see this, write $P_i = \sum_{j=0}^{i} q_{ij} x^j$. The lower triangular matrix $Q = (q_{ij})$ is
invertible because its determinant is $\prod_{i=0}^{n} q_{ii} \neq 0$. Since $B = \{1, \ldots, x^n\}$ is a basis
for ℙn , so is B′, by problem 3.

The above discussion leads to the following.

Theorem 3.5.4. Let U be an n-dimensional vector space. Then the collection of
bases for U is in one-to-one correspondence with the collection of invertible n × n
matrices.

Proof. Fix a basis B for U. For another basis B′ for U, let P be the matrix from
B to B′ . The correspondence Ψ ∶ B′ ↦ P is the correspondence promised by the
theorem. We leave the rest of the formalities to the reader. The examples preceding
this theorem are relevant for verifying the details. 

Theorem 3.5.5 (change of base formula). Let U and V be finite-dimensional
vector spaces, and let n = dim(U), m = dim(V). Let B and B′ be bases for U, and
let C and C′ be bases for V. Let T ∈ Hom(U, V) and suppose that A is the matrix
of T relative to the base pair (B, C) and that A′ is the matrix of T relative to the base
pair (B′, C′). If P is the matrix from B′ to B and Q is the matrix from C′ to C, then
$A' = Q^{-1}AP$.

Proof. Consider diagram 1. Each corner contains a pair: a space and a basis. The
top arrow prompts the reader to consider the mapping T ∶ U → V and mind the
bases indicated in the top corners of the diagram. Thus the matrix of the mapping
T represented by the top arrow is relative to the base pair (B, C) and is therefore A.
Likewise, the matrices representing the rest of the mappings indicated on the diagram
are $Q^{-1}$ for $I_V$, $P^{-1}$ for $I_U$, and A′ for the mapping depicted by the bottom arrow.
Now $I_V \circ T = T \circ I_U$. Applying theorem 3.5.2 to each side of the above equation, we
get $Q^{-1}A = A'P^{-1}$, or $A' = Q^{-1}AP$. ∎

               T
   (U, B)  --------->  (V, C)
     |                   |
     | I_U               | I_V
     v                   v
   (U, B′) --------->  (V, C′)
               T

Diagram 1

Corollary 3.5.6. Let U be an n-dimensional vector space, let T ∈ Hom(U, U), and
let B and B′ be bases for U. If A is the matrix of T relative to B, and A′ is the matrix
of T relative to B′, then $A' = P^{-1}AP$, where P is the matrix from B′ to B.

Proof. This is the special case of the above theorem when V = U, C = B, and
C′ = B′ . 

Diagonalization

When the matrix representing a linear operator T on a finite-dimensional vector
space relative to a basis B is diagonal, the action of T on B is quite simple: T maps
each element of B to a multiple of itself. The following question is natural: given an
operator T ∈ Hom(U, U), can you find a basis for U relative to which the matrix of
T is diagonal? By corollary 3.5.6, the matrix equivalent of the question is as follows:
given an arbitrary square matrix A, can you find an invertible matrix P such that
$P^{-1}AP$ is diagonal? The answer to both questions is, in general, no. The following
definitions formalize the discussion.

Definition. A linear operator T on a finite-dimensional vector space U is
diagonalizable if U contains a basis relative to which the matrix of T is diagonal.
Equivalently, U possesses a basis B consisting entirely of eigenvectors of T.

Definition. A square matrix A is diagonalizable if there exists an invertible matrix
P such that $P^{-1}AP$ is diagonal.

The following theorem gives a necessary and sufficient condition for a square
matrix (linear operator) to be diagonalizable.

Theorem 3.5.7. A square matrix A is diagonalizable if and only if $\mathbb{K}^n$ has a basis
consisting entirely of eigenvectors of A.

Proof. Suppose A is diagonalizable. Thus there exists an invertible matrix P such that
$P^{-1}AP = D$, a diagonal matrix. Let 𝜆1 , . . . , 𝜆n be the diagonal entries of D, and let
P = [u1 , . . . , un ] be a partitioning of P by its columns. The equation $P^{-1}AP = D$
is equivalent to A[u1 , . . . , un ] = PD, or [Au1 , . . . , Aun ] = [𝜆1 u1 , . . . , 𝜆n un ]. Thus
Aui = 𝜆i ui , for 1 ≤ i ≤ n. Since P is invertible, its columns are linearly independent,
so {u1 , . . . , un } is a basis of $\mathbb{K}^n$ consisting of eigenvectors
of A. To prove the converse, we simply reverse the above argument. ∎
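
The proof can be traced on a concrete matrix. In the sketch below (an illustration with values chosen for the example), the columns of P are the eigenvectors (1, 1) and (1, −1) of A, with eigenvalues 3 and 1. The helpers `mat_mul` and `inv2` are hypothetical, and `inv2` handles only 2 × 2 matrices; exact rational arithmetic from the standard `fractions` module avoids rounding.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Usual matrix product of nested-list matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula (exact arithmetic)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1],
     [1, 2]]
P = [[1, 1],
     [1, -1]]            # columns: eigenvectors (1, 1) and (1, -1)
D = [[3, 0],
     [0, 1]]             # corresponding eigenvalues on the diagonal

AP = mat_mul(A, P)       # columns are A u_1 and A u_2
PD = mat_mul(P, D)       # columns are 3 u_1 and 1 u_2
similar = mat_mul(inv2(P), AP)   # P^{-1} A P
print(AP, PD, similar)
```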

We will discuss in section 3.7 a class of matrices that can be diagonalized in a
very special way. We will also extend the discussion to infinite-dimensional spaces
in chapter 7. We conclude the section with two examples of linear operators on
infinite-dimensional spaces. In the first example, the operator has uncountably
many eigenvalues; in the second, it has none.

Example 6. Let T be an operator on $\mathcal{C}^{\infty}(\mathbb{R})$, defined by $T(f) = \dfrac{d^2 f}{dx^2} + f$. It is easy
to verify that, for every nonzero 𝜔 ∈ ℝ, the function $f_\omega(x) = \sin(\omega x)$ is an eigenfunction
of T corresponding to the eigenvalue $\lambda_\omega = 1 - \omega^2$.

Example 7. Let T be an operator on $\mathcal{C}^{\infty}(\mathbb{R})$, defined by (Tf)(x) = xf (x). We verify
that T has no eigenvalues. If T( f ) = 𝜆f, then xf (x) = 𝜆f (x) for all x ∈ ℝ. This
implies that f (x) = 0 for all x ≠ 𝜆. The continuity of f implies that f (𝜆) = 0; thus
f = 0.

Exercises

1. Let A and B be n × n matrices such that $AB = I_n$. Prove that A is invertible
and hence $B = A^{-1}$.
2. Let U be a finite-dimensional vector space, and let T ∶ U → U be linear. Prove
that if V is an r-dimensional, T-invariant subspace of U, then there is a basis
for U relative to which the matrix of T has the form

$$\begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix},$$

where $A_{11}$ is an r × r submatrix.


3. Given a basis $B = \{u_1, \ldots, u_n\}$ for U and an invertible n × n matrix $Q = (q_{ij})$,
define $u'_j = \sum_{i=1}^{n} q_{ij} u_i$. Show that the set $B' = \{u'_1, \ldots, u'_n\}$ is a basis for U.
4. Let A be an n × n matrix and let P be an invertible n × n matrix. Show that
A and $P^{-1}AP$ have the same eigenvalues. It follows from this result that the
eigenvalues of a linear operator T on a finite-dimensional space are those of
the matrix representing T relative to any basis.
5. Let U be a finite-dimensional vector space, and let T ∈ Hom(U, U). Prove
that T is diagonalizable if and only if its matrix relative to any basis is
diagonalizable.

6. Let U be a finite-dimensional vector space, and let T ∈ Hom(U, U). Define
det(T ) to be the determinant of the matrix representing T relative to some
basis of U. Prove that det(T ) is independent of the choice of the basis.
7. Let T be a linear mapping from a finite-dimensional vector space U to a
finite-dimensional vector space V. Prove that there exist a pair of bases for
U and V relative to which the matrix of T is diagonal. Hint: See theorem B.3
in appendix B.
8. Let T ∶ ℙn → ℙn be the linear operator $T(f) = \dfrac{df}{dx} + f$. Show that T is not
diagonalizable.

3.6 Normed Linear Spaces

Let us examine the function d ∶ $\mathbb{R}^2$ → ℝ, which assigns to a point $(x_1, x_2) \in \mathbb{R}^2$ its
distance from the origin. Thus $d(x) = (x_1^2 + x_2^2)^{1/2}$. The function d has the following
characteristics:

(1) d(x) ≥ 0 and d(x) = 0 if and only if x = 0.
(2) For a real scalar a and a point x ∈ $\mathbb{R}^2$, d(ax) = |a|d(x).
(3) For x, y ∈ $\mathbb{R}^2$, d(x + y) ≤ d(x) + d(y).

The abstraction of the function d to an arbitrary vector space yields the definition
of a normed linear space. Instead of using the notation d(x), we use the universally
accepted notation ‖x‖ for the length of a vector x, or its distance from the zero
vector.
Normed linear spaces are the most common examples of metric spaces. What
sets norms apart, still using the function d on $\mathbb{R}^2$ as our prototype, is the fact that
the distance function between two points in the plane is translation invariant
in the sense that if D ∶ $\mathbb{R}^2 \times \mathbb{R}^2$ → ℝ is the function
$D(x, y) = \{(x_1 - y_1)^2 + (x_2 - y_2)^2\}^{1/2}$, then D(x, y) = D(x − a, y − a) for all x, y, a ∈ $\mathbb{R}^2$. Equivalently,
D(x, y) = D(x − y, 0) = d(x − y). See the definition of a translation later on in
this section. This property makes no sense for a general metric space because the
underlying set of a metric space is not required to be a vector space.

Definition. A normed linear space is a vector space X over 𝕂 together with a
function ‖.‖ ∶ X → ℝ such that, for all x, y ∈ X and all a ∈ 𝕂,

(a) ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0,


(b) ‖ax‖ = |a|‖x‖, and
(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The function ‖.‖ is called a norm on X, and condition (c) in the above definition
is known as the triangle inequality.

Motivated by the discussion about the translation invariance of the distance
function in the plane, the following definition makes sense.

Definition. The distance between two points x and y in a normed linear space X
is the scalar ‖x − y‖.

The reader can easily verify that the defining conditions of a norm are satisfied
in each of the examples below.

Example 1. Let X = $\mathbb{K}^n$, and define the 1-norm of $x = (x_1, \ldots, x_n)$ by

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|.$$

Example 2. Let X = $\mathbb{K}^n$, and define the ∞-norm of x by

$$\|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$
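
For concreteness, here is a minimal Python sketch of the two norms in examples 1 and 2, with a spot check of the triangle inequality on one pair of vectors. The function names `norm_1` and `norm_inf` are illustrative; Python's built-in `abs` also handles complex entries, so the same code covers 𝕂 = ℂ.

```python
def norm_1(x):
    """1-norm: the sum of the absolute values of the entries."""
    return sum(abs(t) for t in x)

def norm_inf(x):
    """Infinity-norm: the largest absolute value among the entries."""
    return max(abs(t) for t in x)

x = [3, -4, 1]
y = [-1, 2, 2]
s = [a + b for a, b in zip(x, y)]   # the vector x + y

print(norm_1(x), norm_inf(x))
print(norm_1(s) <= norm_1(x) + norm_1(y))
print(norm_inf(s) <= norm_inf(x) + norm_inf(y))
print(norm_1([3 + 4j, 1j]))         # complex entries: |3 + 4i| + |i|
```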

Example 3. Let $l^\infty$ be the space of bounded sequences discussed in section 3.1.
The norm of a bounded sequence $(x_n)$ is defined by

$$\|x\|_\infty = \sup_{n \in \mathbb{N}} |x_n|.$$

Example 4. In section 3.1, we defined the space X = ℬ[a, b] to consist of all
bounded functions on the interval [a, b]. The supremum norm (also the
uniform or ∞-norm) of a function f ∈ X is defined by

$$\|f\|_\infty = \sup_{x \in [a,b]} |f(x)|.$$

We verify the triangle inequality here. If f and g are bounded functions on [a, b]
and x ∈ [a, b], then $|(f + g)(x)| \le |f(x)| + |g(x)| \le \|f\|_\infty + \|g\|_\infty$.
Thus $\|f + g\|_\infty = \sup_{x \in [a,b]} |(f + g)(x)| \le \|f\|_\infty + \|g\|_\infty$.

Example 5. An important subspace of ℬ[a, b] is the space 𝒞[a, b] of continuous
functions on [a, b]. Both spaces are given the uniform norm.

Example 6. Another useful norm on 𝒞[a, b] is the 1-norm defined by

$$\|f\|_1 = \int_a^b |f(x)|\,dx.$$

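Example 7. Consider the following sequence of functions in 𝒞[0, 1]:

$$f_n(x) = \begin{cases} 2n^3 x & \text{if } 0 \le x \le \dfrac{1}{2n^2}, \\[4pt] -2n^3\Big(x - \dfrac{1}{n^2}\Big) & \text{if } \dfrac{1}{2n^2} \le x \le \dfrac{1}{n^2}, \\[4pt] 0 & \text{if } \dfrac{1}{n^2} \le x \le 1. \end{cases}$$

It is clear that

$$\|f_n\|_\infty = f_n\Big(\frac{1}{2n^2}\Big) = n, \qquad \|f_n\|_1 = \frac{1}{2n}.$$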
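
The two norms in example 7 can be confirmed with exact arithmetic. The sketch below is illustrative only, with hypothetical names `f` and `norms`. It relies on the fact that each fn is nonnegative and piecewise linear, so its maximum is attained at a breakpoint and the trapezoid rule applied on the breakpoints gives the integral exactly.

```python
from fractions import Fraction

def f(n, x):
    """The tent function f_n from example 7, evaluated exactly at x in [0, 1]."""
    x = Fraction(x)
    peak = Fraction(1, 2 * n * n)   # location of the maximum, 1/(2n^2)
    end = Fraction(1, n * n)        # f_n vanishes on [1/n^2, 1]
    if x <= peak:
        return 2 * n**3 * x
    if x <= end:
        return -2 * n**3 * (x - end)
    return Fraction(0)

def norms(n):
    """Return (sup norm, 1-norm) of f_n, computed from the breakpoints."""
    pts = [Fraction(0), Fraction(1, 2 * n * n), Fraction(1, n * n), Fraction(1)]
    vals = [f(n, p) for p in pts]
    sup_norm = max(vals)
    # Trapezoid rule on each linear piece (exact for piecewise linear functions).
    one_norm = sum((pts[i + 1] - pts[i]) * (vals[i] + vals[i + 1]) / 2
                   for i in range(len(pts) - 1))
    return sup_norm, one_norm

for n in range(1, 6):
    print(n, norms(n))
```

As n grows, the uniform norm of fn increases while its 1-norm decreases.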

Example 8. Define a function ‖.‖ on the space $\mathbb{R}^{n \times n}$ of n × n matrices as follows:
let $a_1, \ldots, a_n$ be the rows of a matrix A and set

$$\|A\|_\infty = \max_{1 \le i \le n} \|a_i\|_1.$$

The function defined above is a norm on $\mathbb{R}^{n \times n}$. We verify the triangle inequality:
if $b_1, \ldots, b_n$ are the rows of another matrix B, then the rows of A + B are
$a_1 + b_1, \ldots, a_n + b_n$, and, for 1 ≤ i ≤ n,

$$\|a_i + b_i\|_1 \le \|a_i\|_1 + \|b_i\|_1 \le \max_{1 \le i \le n} \|a_i\|_1 + \max_{1 \le i \le n} \|b_i\|_1 = \|A\|_\infty + \|B\|_\infty.$$

Thus

$$\|A + B\|_\infty \le \|A\|_\infty + \|B\|_\infty.$$

The matrix norm in the above example is compatible with the ∞-norm on $\mathbb{R}^n$ in
the sense that, for x ∈ $\mathbb{R}^n$, $\|Ax\|_\infty \le \|A\|_\infty \|x\|_\infty$, as the reader can easily verify.
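
The compatibility inequality can be spot-checked numerically. The following sketch (illustrative names, one sample matrix) also exhibits a vector for which equality holds, so the constant ‖A‖∞ cannot be improved for this matrix.

```python
def norm_inf(x):
    """Infinity-norm of a vector."""
    return max(abs(t) for t in x)

def matrix_norm_inf(A):
    """Maximum of the 1-norms of the rows of A."""
    return max(sum(abs(entry) for entry in row) for row in A)

def mat_vec(A, x):
    """Matrix-vector product."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[1, -2],
     [3, 4]]
x = [1, -1]
z = [1, 1]   # signs match the entries of the row with the largest 1-norm

print(matrix_norm_inf(A))
print(norm_inf(mat_vec(A, x)), matrix_norm_inf(A) * norm_inf(x))
print(norm_inf(mat_vec(A, z)), matrix_norm_inf(A) * norm_inf(z))
```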

lp Spaces

We now define the rest of the lp spaces.

Definition. For every real number 1 ≤ p < ∞, define $l^p$ to be the set of all
sequences $x = (x_1, x_2, \ldots)$ in 𝕂 such that $\sum_{n=1}^{\infty} |x_n|^p < \infty$. For $x \in l^p$,

$$\|x\|_p = \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p}.$$

It is straightforward to verify that $l^1$ is a normed linear space. For example, the
triangle inequality is obtained from the following inequality upon taking the limit
as n → ∞: $\sum_{i=1}^{n} |x_i + y_i| \le \sum_{i=1}^{n} |x_i| + \sum_{i=1}^{n} |y_i|$.

Showing that lp (for 1 < p < ∞) is a normed linear space is less straightforward
and requires the development of two useful inequalities which are important in
their own right.

Definition. Let 1 < p < ∞. The conjugate Hölder exponent of p is the number
q > 1 such that $\dfrac{1}{p} + \dfrac{1}{q} = 1$. By definition, p = 1 and q = ∞ are conjugate Hölder
exponents.

Lemma 3.6.1. If p > 1, and x, y ∈ ℂ, then $|xy| \le \dfrac{|x|^p}{p} + \dfrac{|y|^q}{q}$. Here p and q are
conjugate Hölder exponents.

Proof. Consider the function $f(t) = t^{1/p} - \dfrac{t}{p} - \dfrac{1}{q}$, t ≥ 1; $f'(t) = \dfrac{1}{p} t^{1/p - 1} - \dfrac{1}{p} \le 0$. Thus
f is decreasing for all t ≥ 1, and since f (1) = 0, it follows that f (t) ≤ 0 for all t ≥ 1.
Thus $t^{1/p} \le \dfrac{t}{p} + \dfrac{1}{q}$ for t ≥ 1. Now let a, b > 0 and, say, $\dfrac{a}{b} \ge 1$. By replacing t with
a/b, we obtain $\Big(\dfrac{a}{b}\Big)^{1/p} \le \dfrac{1}{p}\Big(\dfrac{a}{b}\Big) + \dfrac{1}{q}$. Therefore, $\dfrac{b\,a^{1/p}}{b^{1/p}} \le \dfrac{a}{p} + \dfrac{b}{q}$, or $a^{1/p} b^{1/q} \le \dfrac{a}{p} + \dfrac{b}{q}$.
Letting $a = |x|^p$, $b = |y|^q$, we obtain the inequality we seek. ∎
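
As a numerical sanity check of lemma 3.6.1 (a sketch on a few sample values, not a proof), the function below returns the gap between the two sides of the inequality. The names `conjugate` and `young_gap` are hypothetical, and a small tolerance is allowed for floating-point rounding. The gap vanishes when |x|^p = |y|^q.

```python
def conjugate(p):
    """Conjugate Hölder exponent q of p > 1, so that 1/p + 1/q = 1."""
    return p / (p - 1)

def young_gap(x, y, p):
    """Right side minus left side of |xy| <= |x|^p / p + |y|^q / q."""
    q = conjugate(p)
    return abs(x) ** p / p + abs(y) ** q / q - abs(x * y)

samples = [(2, -5, 3), (0.1, 7.0, 1.5), (3, 3, 2), (1 + 2j, 0.5, 4)]
gaps = [young_gap(x, y, p) for x, y, p in samples]
print(gaps)
```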

Theorem 3.6.2 (Hölder’s inequality). If $x = (x_n) \in l^p$ and $y = (y_n) \in l^q$, then
$z = (x_n y_n) \in l^1$ and $\|z\|_1 \le \|x\|_p \|y\|_q$.

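Proof. If p = 1, q = ∞, $x \in l^1$, and $y \in l^\infty$, then

$$\sum_{n=1}^{\infty} |x_n y_n| \le \sup_n |y_n| \sum_{n=1}^{\infty} |x_n| = \|y\|_\infty \|x\|_1.$$

Now let 1 < p, q < ∞, and assume that x ≠ 0 and y ≠ 0, since otherwise there is nothing
to prove. Applying lemma 3.6.1,
$\dfrac{|x_i|}{\|x\|_p} \dfrac{|y_i|}{\|y\|_q} \le \dfrac{1}{p} \dfrac{|x_i|^p}{\|x\|_p^p} + \dfrac{1}{q} \dfrac{|y_i|^q}{\|y\|_q^q}$. Thus

$$\frac{1}{\|x\|_p \|y\|_q} \sum_{i=1}^{n} |x_i y_i| \le \frac{1}{p\|x\|_p^p} \sum_{i=1}^{n} |x_i|^p + \frac{1}{q\|y\|_q^q} \sum_{i=1}^{n} |y_i|^q \le \frac{1}{p\|x\|_p^p} \sum_{i=1}^{\infty} |x_i|^p + \frac{1}{q\|y\|_q^q} \sum_{i=1}^{\infty} |y_i|^q = \frac{1}{p} + \frac{1}{q} = 1.$$

The summary of the above calculations is that $\dfrac{1}{\|x\|_p \|y\|_q} \sum_{i=1}^{n} |x_i y_i| \le 1$. Taking the
limit as n → ∞ we obtain Hölder’s inequality. ∎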

Theorem 3.6.3 (Minkowski’s inequality). If $x = (x_n)$, $y = (y_n) \in l^p$, then $x + y \in l^p$,
and

$$\|x + y\|_p \le \|x\|_p + \|y\|_p.$$

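Proof. We already proved the theorem for p = 1 and p = ∞, so assume that 1 < p <
∞, and let q be the conjugate Hölder exponent of p. Then:

$$\sum_{i=1}^{n} |x_i + y_i|^p \le \sum_{i=1}^{n} |x_i + y_i|^{p-1} (|x_i| + |y_i|).$$

Applying Hölder’s inequality to the right side of the above inequality, and using the
identity (p − 1)q = p, yields

$$\sum_{i=1}^{n} |x_i + y_i|^{p-1} |x_i| + \sum_{i=1}^{n} |x_i + y_i|^{p-1} |y_i| \le \Big[\Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^{n} |y_i|^p\Big)^{1/p}\Big] \Big(\sum_{i=1}^{n} |x_i + y_i|^{(p-1)q}\Big)^{1/q} \le (\|x\|_p + \|y\|_p) \Big(\sum_{i=1}^{n} |x_i + y_i|^p\Big)^{1/q}.$$

The summary of the above calculations is that

$$\sum_{i=1}^{n} |x_i + y_i|^p \le (\|x\|_p + \|y\|_p) \Big(\sum_{i=1}^{n} |x_i + y_i|^p\Big)^{1/q}.$$

Thus (dividing by the last factor when it is nonzero; otherwise the inequality below is trivial)

$$\Big(\sum_{i=1}^{n} |x_i + y_i|^p\Big)^{1 - 1/q} \le \|x\|_p + \|y\|_p.$$

Taking the limit as n → ∞, and recalling that 1 − 1/q = 1/p, we have

$$\Big(\sum_{i=1}^{\infty} |x_i + y_i|^p\Big)^{1/p} \le \|x\|_p + \|y\|_p, \quad \text{or} \quad \|x + y\|_p \le \|x\|_p + \|y\|_p. \; ∎$$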

We have verified all the crucial details needed to prove the result below.

Theorem 3.6.4. For 1 ≤ p ≤ ∞, lp is a normed linear space. 



Observe that Hölder’s inequality and the triangle inequality apply to finite
sequences:

$$\sum_{i=1}^{n} |x_i y_i| \le \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} \Big(\sum_{i=1}^{n} |y_i|^q\Big)^{1/q}, \quad \text{and}$$

$$\Big(\sum_{i=1}^{n} |x_i + y_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^{n} |y_i|^p\Big)^{1/p}.$$

Thus, for 1 ≤ p ≤ ∞, $(\mathbb{R}^n, \|.\|_p)$ is a normed linear space.
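
The finite versions of the two inequalities are easy to test numerically. The sketch below uses one pair of sample vectors and p = 3; the function name `p_norm` is illustrative, and floating-point arithmetic is used, so the comparisons are approximate checks rather than proofs.

```python
def p_norm(x, p):
    """The p-norm of a finite sequence, for a real number p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, -1.0]
p = 3.0
q = p / (p - 1.0)   # conjugate Hölder exponent, here 1.5

# Hölder: sum |x_i y_i| <= ||x||_p ||y||_q
holder_lhs = sum(abs(a * b) for a, b in zip(x, y))
holder_rhs = p_norm(x, p) * p_norm(y, q)

# Minkowski: ||x + y||_p <= ||x||_p + ||y||_p
minkowski_lhs = p_norm([a + b for a, b in zip(x, y)], p)
minkowski_rhs = p_norm(x, p) + p_norm(y, p)

print(holder_lhs, holder_rhs)
print(minkowski_lhs, minkowski_rhs)
```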

Balls, Lines, and Convex Sets

Definition. Let X be a normed linear space. The open ball of radius r centered at
x ∈ X is the set
B(x, r) = {y ∈ X ∶ ‖y − x‖ < r}.

Example 9. In $(\mathbb{R}^2, \|.\|_\infty)$, the open ball of radius r centered at the point $(x_0, y_0)$
is the open square $\{(x, y) : |x - x_0| < r, |y - y_0| < r\}$. In $(\mathbb{R}^2, \|.\|_1)$, the open ball
of radius r centered at $(x_0, y_0)$ is the open square with vertices $(x_0 \pm r, y_0)$ and
$(x_0, y_0 \pm r)$.
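
The dependence of the ball on the norm can be seen in a few lines of Python. In this illustrative sketch, `in_ball` is a hypothetical helper that tests membership in B(center, r) for whichever norm function is passed in.

```python
def norm_1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

def in_ball(point, center, r, norm):
    """True if point lies in the open ball B(center, r) for the given norm."""
    return norm([a - b for a, b in zip(point, center)]) < r

center = (0.0, 0.0)
corner = (0.8, 0.8)   # near a corner of the unit square
inner = (0.5, 0.4)

print(in_ball(corner, center, 1.0, norm_inf))  # inside the infinity-norm ball
print(in_ball(corner, center, 1.0, norm_1))    # outside the 1-norm ball
print(in_ball(inner, center, 1.0, norm_1))     # inside both balls
```

Since ‖x‖∞ ≤ ‖x‖1 for every x, the 1-norm ball is contained in the ∞-norm ball of the same radius and center.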

Definition. Let u be a unit vector (i.e., a vector of length 1) in a normed linear
space X, and let 𝜉 be a fixed point in X. The line that contains 𝜉 and is parallel
to u is the set
{𝜉 + tu ∶ −∞ < t < ∞}.

Remark. The vector u in the above definition does not have to be a unit vector, but
when it is, |t| is the exact distance between 𝜉 + tu and 𝜉. An important special case
is the equation of the line joining two points 𝜉 and 𝜂 in X. In this case, the line
is the set of all points x such that
x = 𝜉 + t(𝜂 − 𝜉) = (1 − t)𝜉 + t𝜂, −∞ < t < ∞.
The set {(1 − t)𝜉 + t𝜂 ∶ 0 ≤ t ≤ 1} is called the line segment joining 𝜉 and 𝜂.

Example 10. In $\mathbb{R}^n$, the above definition reduces to the familiar definition
of a straight line, especially when n = 2, 3. Indeed, if $u = (u_1, \ldots, u_n)$, and
$\xi = (\xi_1, \ldots, \xi_n)$, then the parametric equation of the line containing 𝜉 and
parallel to u is
$x_1 = \xi_1 + tu_1,\; x_2 = \xi_2 + tu_2,\; \ldots,\; x_n = \xi_n + tu_n,\; -\infty < t < \infty$.

Definition. Let E be a subset of a vector space V, and let x ∈ V. The set E + x =
{x + y ∶ y ∈ E} is known as the translation of E by x (or in the direction of x).

The set E + x can be visualized as rigidly moving E in the direction of the vector x.
The graph of the parabola y = x² + 1 is the translation of the graph of the parabola
y = x² by the vector (0, 1). Figure 3.1 depicts the translation of the raindrop-like
set E by the vector x = (0, −1).
Translating a set preserves most of its characteristics. Convexity is a good
example.

Definition. A subset C of a vector space V is said to be convex if, for every 𝜉, 𝜂 ∈ C,
and all 0 ≤ t ≤ 1, (1 − t)𝜉 + t𝜂 ∈ C. Thus C is convex if whenever it contains 𝜉
and 𝜂, it contains the line segment joining 𝜉 and 𝜂.

Example 11.
(a) An open ball in $\mathbb{R}^n$ is a convex set.
(b) Let A be an m × n real matrix, and let b ∈ $\mathbb{R}^m$. The two sets
{x ∈ $\mathbb{R}^n$ ∶ Ax = b} and {x ∈ $\mathbb{R}^n$ ∶ Ax > b} are convex subsets of $\mathbb{R}^n$ (see footnote 3).
(c) The union of the first and third quadrants in the plane is not convex.
(d) The raindrop region in figure 3.1 is not convex.
(e) The intersection of an arbitrary collection of convex sets is convex.

[Figure 3.1 The falling raindrop: the set E and its translation E + x]

3 The notation Ax > b means that $\sum_{j=1}^{n} a_{ij} x_j > b_i$ for all 1 ≤ i ≤ m.

Excursion: Convex Hulls and Polytopes

We limit the discussion below to ℝn , although some of the statements are valid for
an arbitrary vector space.

Definition. A convex combination of a finite set $\{x_1, \ldots, x_k\} \subseteq \mathbb{R}^n$ is a point of the
form $x = \sum_{i=1}^{k} \lambda_i x_i$ such that $0 \le \lambda_i \le 1$ and $\sum_{i=1}^{k} \lambda_i = 1$.

Theorem 3.6.5. A subset C ⊆ $\mathbb{R}^n$ is convex if and only if it contains all convex
combinations of points of C.

Proof. It is enough to show that if C is convex and $x = \sum_{i=1}^{k} \lambda_i x_i$ is a convex
combination of points $x_1, \ldots, x_k \in C$, then x ∈ C. The converse is trivial. We use induction
on k. The statement is true for k = 2 by the very definition of convexity. Without
loss of generality, assume that $\lambda_1 < 1$, and write $x = \lambda_1 x_1 + (1 - \lambda_1) \sum_{i=2}^{k} \dfrac{\lambda_i x_i}{1 - \lambda_1}$.
Now $\sum_{i=2}^{k} \dfrac{\lambda_i}{1 - \lambda_1} = 1$ and $y = \sum_{i=2}^{k} \dfrac{\lambda_i x_i}{1 - \lambda_1} \in C$ by the inductive hypothesis. By the
convexity of C, $x = \lambda_1 x_1 + (1 - \lambda_1) y \in C$. ∎

Definition. The convex hull of a nonempty set A ⊆ $\mathbb{R}^n$ is the smallest convex
subset of $\mathbb{R}^n$ that contains A. The notation conv(A) denotes the convex hull of A.

It is clear that conv(A) is the intersection of all the convex subsets of ℝn that contain
A and that conv(A) ≠ ∅, since A ⊆ ℝn , and ℝn is convex.

Theorem 3.6.6. For a nonempty set A ⊆ ℝn , conv(A) is the set of all convex
combinations of points of A.

Proof. By the previous theorem, it is enough to show that the set of all convex
combinations of points in A is a convex set. Suppose that $x = \sum_{i=1}^{k} \lambda_i x_i$ and
$y = \sum_{j=1}^{l} \mu_j y_j$ are, respectively, convex combinations of points $x_1, \ldots, x_k$ and
$y_1, \ldots, y_l$ in A. If 𝛼 ∈ [0, 1], then

$$(1 - \alpha)x + \alpha y = \sum_{i=1}^{k} (1 - \alpha)\lambda_i x_i + \sum_{j=1}^{l} \alpha \mu_j y_j$$

is a convex combination of $x_1, \ldots, x_k, y_1, \ldots, y_l$ because

$$\sum_{i=1}^{k} (1 - \alpha)\lambda_i + \sum_{j=1}^{l} \alpha \mu_j = 1. \; ∎$$

A natural question now is whether there is an upper bound on the length of the
convex combinations of vectors in A needed to generate all of conv(A). The next
theorem provides the answer.

Theorem 3.6.7 (Carathéodory’s theorem). Let A ⊆ $\mathbb{R}^n$, and let C = conv(A).
Then every point in C is the convex combination of, at most, n + 1 points of A.

Proof. Let x ∈ C. Then $x = \sum_{i=1}^{k} \lambda_i x_i$ for some $x_1, \ldots, x_k \in A$ and some
$\lambda_1, \ldots, \lambda_k \in (0, 1]$ with $\sum_{i=1}^{k} \lambda_i = 1$. If k > n + 1, then the vectors $x_2 - x_1, \ldots, x_k - x_1$ are
linearly dependent, so there are constants $\mu_2, \ldots, \mu_k$, not all zero, such that
$\sum_{j=2}^{k} \mu_j (x_j - x_1) = 0$. If we set $\mu_1 = -\sum_{j=2}^{k} \mu_j$, then $\sum_{j=1}^{k} \mu_j x_j = 0$, and $\sum_{j=1}^{k} \mu_j = 0$.
Observe that at least one of the numbers $\mu_1, \ldots, \mu_k$ is positive. Now, for all
𝛼 ∈ ℝ, $x = \sum_{j=1}^{k} (\lambda_j - \alpha\mu_j) x_j$. Let i be such that $\dfrac{\lambda_i}{\mu_i} = \min_{1 \le j \le k} \Big\{\dfrac{\lambda_j}{\mu_j} : \mu_j > 0\Big\}$, and
choose $\alpha = \dfrac{\lambda_i}{\mu_i}$. Observe that 𝛼 > 0 and for 1 ≤ j ≤ k, $\lambda_j - \alpha\mu_j \ge 0$. Now
$x = \sum_{j=1}^{k} (\lambda_j - \alpha\mu_j) x_j$, $\lambda_j - \alpha\mu_j \ge 0$, and $\sum_{j=1}^{k} (\lambda_j - \alpha\mu_j) = 1$. Since $\lambda_i - \alpha\mu_i = 0$, x
is a convex combination of, at most, k − 1 points of A. We continue this process
until x is a convex combination of, at most, n + 1 vectors in A. ∎
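
The proof is constructive, and the reduction step can be carried out mechanically. The Python sketch below is one possible rendering under stated assumptions: the points have rational coordinates, the weights are positive and sum to 1, and exact `Fraction` arithmetic is used. The helper names `null_vector` and `caratheodory` are hypothetical; the dependence relation is found by Gauss-Jordan elimination, which is one of several ways to obtain the constants μ2, ..., μk.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """Return a nonzero rational solution v of M v = 0, or None if none exists."""
    M = [[Fraction(entry) for entry in row] for row in rows]
    pivots = []                      # pivot column of each pivot row, in order
    r = 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        pivot = M[r][c]
        M[r] = [entry / pivot for entry in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    v = [Fraction(0)] * ncols
    v[free[0]] = Fraction(1)         # one free variable set to 1, the others to 0
    for row_index, c in enumerate(pivots):
        v[c] = -M[row_index][free[0]]
    return v

def caratheodory(points, weights):
    """Rewrite a convex combination in R^n using at most n + 1 of the points."""
    pts = [tuple(Fraction(c) for c in p) for p in points]
    lam = [Fraction(w) for w in weights]
    n = len(pts[0])
    while len(pts) > n + 1:
        k = len(pts)
        # Columns are x_j - x_1 for j = 2..k; more than n vectors in R^n are dependent.
        rows = [[pts[j][i] - pts[0][i] for j in range(1, k)] for i in range(n)]
        tail = null_vector(rows, k - 1)      # mu_2, ..., mu_k, not all zero
        mu = [-sum(tail)] + tail             # mu_1 = -(mu_2 + ... + mu_k)
        # alpha = min lambda_j / mu_j over mu_j > 0 keeps every new weight >= 0.
        alpha = min(lam[j] / mu[j] for j in range(k) if mu[j] > 0)
        lam = [lam[j] - alpha * mu[j] for j in range(k)]
        keep = [j for j in range(k) if lam[j] != 0]
        pts = [pts[j] for j in keep]
        lam = [lam[j] for j in keep]
    return pts, lam

# Five points in the plane with equal weights; the combination is the point (1, 1).
square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
new_pts, new_lam = caratheodory(square, [Fraction(1, 5)] * 5)
recombined = tuple(sum(w * p[i] for p, w in zip(new_pts, new_lam)) for i in range(2))
print(new_pts, new_lam, recombined)
```

Each pass through the loop removes at least one point, mirroring the step from k to k − 1 in the proof. The representation returned is not unique; it depends on which dependence relation the elimination happens to find.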

Example 12. It is possible that k < n + 1. The closed unit disk D in $\mathbb{R}^2$ is the convex
hull of the unit circle $\mathcal{S}^1$, and every interior point in D is a convex combination
of two vectors in $\mathcal{S}^1$. However, k = n + 1 is the best possible bound. For example,
if $x_0$, $x_1$, and $x_2$ are three noncollinear points in the plane, then an interior point
in the triangle defined by the three points is not a convex combination of any
two of the three points.

Definition. A polytope in ℝn is the convex hull of a finite subset of ℝn .

Definition. A point x in a convex subset C ⊆ $\mathbb{R}^n$ is said to be an extreme point
of C if whenever y, z ∈ C, and y ≠ z, then, for any 𝜆 ∈ (0, 1), x ≠ 𝜆y + (1 − 𝜆)z.
The extreme points of a polytope are more specifically called its vertices.

A convex set may not have any extreme points. A simple example is the set
{(x, y) ∈ ℝ2 ∶ 0 ≤ x ≤ 1, −∞ < y < ∞}.

Example 13. If $x_1$ and $x_2$ are distinct points in $\mathbb{R}^n$, then the polytope
$Q = \mathrm{conv}(x_1, x_2)$ is the line segment $\{(1 - t)x_1 + tx_2 : 0 \le t \le 1\}$. It is easy to
verify that $x_1$ and $x_2$ are the vertices of Q.

The number of vertices of a polytope in $\mathbb{R}^n$ is not related to the dimension n of
the space. For example, a regular polygon in $\mathbb{R}^2$ can have any number of vertices.

While it is intuitively obvious that a polytope is the convex hull of its vertices,
this is not an entirely trivial fact. We prove this fact, together with the even more
fundamental fact that polytopes do have vertices.

Lemma 3.6.8. The vertices of a polytope $Q = \mathrm{conv}(x_1, \ldots, x_k)$ in $\mathbb{R}^n$ are contained
in $\{x_1, \ldots, x_k\}$.

Proof. We show that if x ∈ Q and $x \ne x_j$ for every 1 ≤ j ≤ k, then x is not an extreme point of Q.
By assumption, x is a convex combination of $x_1, \ldots, x_k$, say, $x = \sum_{j=1}^{k} \lambda_j x_j$,
and assume, without loss of generality, that $\lambda_1 > 0$. Since $x \ne x_1$, $\lambda_1 \ne 1$. Let
$y = \sum_{i=2}^{k} \dfrac{\lambda_i}{1 - \lambda_1} x_i$. Since y is a convex combination of $x_2, \ldots, x_k$, y ∈ Q, and
$x = \lambda_1 x_1 + (1 - \lambda_1) y$. Note that $y \ne x_1$, since otherwise $x = x_1$. Since $0 < \lambda_1 < 1$,
x is not an extreme point of Q. ∎

Lemma 3.6.9. Consider the polytope $Q = \mathrm{conv}(x_1, \ldots, x_k)$. If $x_k$ is not a vertex
of Q, then $x_k$ is a convex combination of $x_1, \ldots, x_{k-1}$. Consequently,
$Q = \mathrm{conv}(x_1, \ldots, x_{k-1})$.

Proof. Since $x_k$ is not a vertex of Q, there exist convex combinations $y = \sum_{i=1}^{k} \beta_i x_i$
and $z = \sum_{i=1}^{k} \gamma_i x_i$ (y ≠ z), and a number 𝜆 ∈ (0, 1) such that $x_k = \lambda y + (1 - \lambda) z$.
Now $x_k = \sum_{i=1}^{k} \alpha_i x_i$, where $\alpha_i = \lambda\beta_i + (1 - \lambda)\gamma_i$. If $\alpha_k = 0$, the proof is complete.
It is easy to check that $\alpha_k = 1$ is possible only if $\beta_k = \gamma_k = 1$. But this would force
$y = z = x_k$, which is a contradiction. Thus $0 < \alpha_k < 1$ and $x_k = \sum_{i=1}^{k-1} \dfrac{\alpha_i}{1 - \alpha_k} x_i$, as
desired. We leave it to the reader to check that $Q = \mathrm{conv}(x_1, \ldots, x_{k-1})$. ∎

Theorem 3.6.10. The set of vertices of a polytope $Q = \mathrm{conv}(x_1, \ldots, x_k)$ is not empty,
and Q is the convex hull of its vertices.

Proof. We prove the result by induction on k. The result is true for k = 2 by example
13. Now consider the polytope Q = conv(x1 , . . . , xk ). If all the points x1 , . . . , xk are
vertices of Q, there is nothing to prove. Otherwise, one point, say, xk is not a vertex
of Q. By the previous lemma, Q = conv(x1 , . . . , xk−1 ). By the inductive hypothesis,
Q is the convex hull of its vertices. 

The fact that a polytope is the convex hull of its extreme points is the weakest
version of the well-known Krein-Milman theorem.

An important special type of polytope is the simplex.

Definition. The standard n-simplex, $T_n$, in $\mathbb{R}^n$ is the convex hull of the n + 1
vectors $0, e_1, \ldots, e_n$. In general, if $\{x_0, \ldots, x_n\} \subseteq \mathbb{R}^n$ is such that
$x_1 - x_0, \ldots, x_n - x_0$ are linearly independent, the set $\mathrm{conv}\{x_0, \ldots, x_n\}$ is called an n-simplex in $\mathbb{R}^n$.

The standard 2-simplex is a triangle with vertices (0, 0), (1, 0), and (0, 1). The
standard 3-simplex is a pyramid with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0), and
(0, 0, 1).

Every point $x$ in the standard n-simplex can be written uniquely as $x = \sum_{i=1}^{n} \lambda_i e_i$, where $\lambda_i \in [0, 1]$ and $\sum_{i=1}^{n} \lambda_i \le 1$. Set $\lambda_0 = 1 - \sum_{i=1}^{n} \lambda_i$. The numbers $\lambda_0, \dots, \lambda_n$ are called the barycentric coordinates of $x$.
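
As a small illustration of the preceding paragraph, the following Python sketch computes the barycentric coordinates of a point of the standard n-simplex. The function name `barycentric` and the use of exact fractions are our own choices for the illustration and are not part of the text.

```python
from fractions import Fraction


def barycentric(x):
    """Return [lam_0, lam_1, ..., lam_n] for a point x of the standard n-simplex.

    Here lam_i = x_i for i >= 1 and lam_0 = 1 - (x_1 + ... + x_n).
    """
    coords = [1 - sum(x)] + list(x)
    if any(c < 0 for c in coords):
        raise ValueError("the point lies outside the standard simplex")
    return coords


# A point of the standard 2-simplex (the triangle with vertices 0, e1, e2).
point = [Fraction(1, 5), Fraction(3, 10)]
coords = barycentric(point)
print(coords)
```

The coordinates are nonnegative and add up to 1, so the point is the convex combination of the vertices 0, e1, e2 with these weights.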

Exercises

1. Show that, for elements $x$ and $y$ of a normed linear space,

$$\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x \pm y\|.$$

2. Let $a_1, \dots, a_n$ be the columns of a real $n \times n$ matrix $A$. Prove that the function

$$\|A\|_1 = \max_{1 \le i \le n} \|a_i\|_1$$

defines a norm on $\mathbb{R}^{n \times n}$ and that, for $x \in \mathbb{R}^n$,

$$\|Ax\|_1 \le \|A\|_1 \|x\|_1.$$

3. Show that if $1 \le r < s \le \infty$, then $l^r \subset l^s$.
4. Show that if $x \in l^1$, then $\lim_{p \to \infty} \|x\|_p = \|x\|_\infty$.
5. Show that in a normed linear space, $B(x, r) + B(y, s) = B(x + y, r + s)$.
6. Prove that the translation of a convex subset of $\mathbb{R}^n$ is convex.
7. Prove that $x$ is an extreme point of a convex set $C$ if and only if $C - \{x\}$ is convex.
8. Prove that $0, e_1, \dots, e_n$ are the vertices of the standard n-simplex.

3.7 Inner Product Spaces

The concept of an inner product stems from the need for an instrument that determines the orthogonality of vectors in a normed linear space. Let us consider the Euclidean norm on $\mathbb{R}^n$. The orthogonality of two vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $\mathbb{R}^n$ is equivalent to the condition that $\|x + y\|^2 = \|x\|^2 + \|y\|^2$ (the Pythagorean theorem). Now

$$\|x + y\|^2 = \sum_{i=1}^{n} (x_i + y_i)^2 = \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} y_i^2 + 2\sum_{i=1}^{n} x_i y_i = \|x\|^2 + \|y\|^2 + 2\sum_{i=1}^{n} x_i y_i.$$

Thus the orthogonality of $x$ and $y$ is equivalent to the condition $\sum_{i=1}^{n} x_i y_i = 0$.

This suggests that we examine the function $B : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by $B(x, y) = \sum_{i=1}^{n} x_i y_i$. A little reflection reveals that $B$ is linear in each of its arguments, that $B(x, y) = 0$ if and only if $x$ and $y$ are orthogonal, and that $B$ defines the norm in the sense that $B(x, x) = \|x\|^2$. Therefore a function $B$ with the above properties may be a useful specialization of norms in that it provides a tool for defining orthogonality. Abstracting the above discussion leads directly to the definition of an inner product.

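Definition. An inner product on a vector space $H$ is a function $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{K}$ such that, for all $x, y, z \in H$ and all scalars $\alpha \in \mathbb{K}$,

(a) $\langle x, y \rangle = \overline{\langle y, x \rangle}$,
(b) $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$,
(c) $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$, and
(d) $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$.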

A vector space H with an inner product is called an inner product space.

Example 1. The standard inner product on $\mathbb{C}^n$ is defined by

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i \overline{y_i} = y^* x.$$

Consistent with matrix notation, we write vectors as columns and $y^*$ as the conjugate transpose of the vector $y$, while $y^* x$ is conveniently thought of as a matrix product. The standard inner product on $\mathbb{R}^n$ is defined by $\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i = y^T x = x^T y$.

Example 2. The space $l^2$ is an inner product space with the inner product $\langle x, y \rangle = \sum_{n=1}^{\infty} x_n \overline{y_n}$.

Example 3. The space $\mathcal{C}[a, b]$ is an inner product space with the inner product $\langle f, g \rangle = \int_a^b f(x)\overline{g(x)}\,dx$.

Of particular interest to us is the special case when $a = -\pi$, $b = \pi$. In this case, we define

$$\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$$

The normalization constant $1/2\pi$ is included for convenience, as will become clear later in this section.

For an element x in an inner product space H, we write ‖x‖ = √⟨x, x⟩. We will see
shortly that ‖.‖ is indeed a norm on H.

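Theorem 3.7.1 (the Cauchy-Schwarz inequality). If $H$ is an inner product space, then, for all $x, y \in H$,

$$|\langle x, y \rangle| \le \|x\|\|y\|.$$

Equality holds if and only if $x$ and $y$ are linearly dependent.

Proof. Without loss of generality, assume that $y \ne 0$. For $\alpha \in \mathbb{K}$,

$$0 \le \|x + \alpha y\|^2 = \langle x + \alpha y, x + \alpha y \rangle = \langle x, x \rangle + \alpha \langle y, x \rangle + \overline{\alpha} \langle x, y \rangle + |\alpha|^2 \langle y, y \rangle.$$

Substituting $\alpha = -\langle x, y \rangle / \|y\|^2$, we obtain $0 \le \|x\|^2 - \frac{|\langle x, y \rangle|^2}{\|y\|^2}$, from which the Cauchy-Schwarz inequality follows.

It is easy to verify that if $y = \alpha x$, then $|\langle x, y \rangle| = \|x\|\|y\|$. Conversely, suppose that $|\langle x, y \rangle| = \|x\|\|y\|$. Now

$$\begin{aligned}
\bigl\| \|y\|^2 x - \langle x, y \rangle y \bigr\|^2 &= \bigl\langle \|y\|^2 x - \langle x, y \rangle y,\ \|y\|^2 x - \langle x, y \rangle y \bigr\rangle \\
&= \|y\|^4 \|x\|^2 - \|y\|^2 \langle x, y \rangle \langle y, x \rangle - \|y\|^2 \overline{\langle x, y \rangle} \langle x, y \rangle + |\langle x, y \rangle|^2 \|y\|^2 \\
&= \|y\|^2 \bigl\{ \|x\|^2 \|y\|^2 - |\langle x, y \rangle|^2 \bigr\} = 0.
\end{aligned}$$

Thus $\|y\|^2 x - \langle x, y \rangle y = 0$, and $x$ and $y$ are dependent.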

Corollary 3.7.2 (the triangle inequality). In an inner product space $H$,

$$\|x + y\| \le \|x\| + \|y\|.$$

Proof. Using the Cauchy-Schwarz inequality,

$$\begin{aligned}
\|x + y\|^2 &= \langle x + y, x + y \rangle = \|x\|^2 + \langle x, y \rangle + \langle y, x \rangle + \|y\|^2 \\
&= \|x\|^2 + 2\operatorname{Re}\langle x, y \rangle + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.
\end{aligned}$$

Taking the square roots of the extreme sides of the above string yields the triangle inequality.

It follows from the above corollary that the function $\|x\| = \langle x, x \rangle^{1/2}$ defines a norm on $H$. Therefore every inner product space is a normed linear space.

Definition. Two vectors x and y in an inner product space are said to be orthogonal
if ⟨x, y⟩ = 0. Symbolically, we write x ⟂ y to indicate the orthogonality of
x and y.

Theorem 3.7.3 (the Pythagorean theorem). If $x$ and $y$ are orthogonal vectors in an inner product space, then

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$

Proof.

$$\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle = \langle x, x \rangle + \langle y, y \rangle = \|x\|^2 + \|y\|^2.$$

The Pythagorean theorem can be easily generalized as follows: if $x_1, \dots, x_n$ are mutually orthogonal elements of an inner product space, then

$$\|x_1 + \dots + x_n\|^2 = \|x_1\|^2 + \dots + \|x_n\|^2.$$

Definition. A subset $S$ of an inner product space $H$ is said to be orthogonal if the vectors in $S$ are pairwise orthogonal. If, in addition, each vector in $S$ is a unit vector, then $S$ is called an orthonormal subset of $H$. We always assume that an orthogonal subset excludes the zero vector.

Example 4. The canonical vectors in $\mathbb{K}^n$ form an orthonormal set.

Example 5. The set of functions $\{u_n(t) = e^{int} : n \in \mathbb{Z}\}$ is orthogonal in $\mathcal{C}[-\pi, \pi]$ with respect to the inner product $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$. (Here $e^{i\theta} = \cos\theta + i\sin\theta$.) Indeed, if $m$ and $n$ are distinct integers, then

$$\langle u_n, u_m \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{int} e^{-imt}\,dt = \frac{1}{2\pi i(n - m)} e^{i(n-m)t} \Big|_{-\pi}^{\pi} = 0,$$

while

$$\langle u_n, u_n \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{int} e^{-int}\,dt = 1.$$
Observe the convenience of including the factor 1/2𝜋 in the definition of the
inner product. 
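
The orthogonality relations of example 5 can also be checked numerically. The sketch below approximates the normalized inner product on 𝒞[−𝜋, 𝜋] by a Riemann sum over equally spaced sample points; the helper names `inner` and `u` and the number of sample points are assumptions made for this illustration only.

```python
import cmath
import math


def inner(f, g, samples=2048):
    """Approximate (1/2pi) * integral over [-pi, pi] of f(t) * conj(g(t)) dt
    by a Riemann sum on `samples` equally spaced points."""
    h = 2 * math.pi / samples
    total = sum(
        f(-math.pi + k * h) * g(-math.pi + k * h).conjugate()
        for k in range(samples)
    )
    return total * h / (2 * math.pi)


def u(n):
    """The function t -> exp(i n t)."""
    return lambda t: cmath.exp(1j * n * t)


print(abs(inner(u(3), u(3))))   # close to 1
print(abs(inner(u(3), u(-2))))  # close to 0
```

Up to rounding error, the first value is 1 and the second is 0, in agreement with the two integrals computed in the example.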

Theorem 3.7.4. An orthogonal subset S of an inner product space H is independent.

Proof. Let $\{u_1, \dots, u_n\}$ be a finite subset of $S$, and suppose that, for scalars $a_1, \dots, a_n$, $\sum_{i=1}^{n} a_i u_i = 0$. For a fixed $1 \le j \le n$,

$$\Bigl\langle \sum_{i=1}^{n} a_i u_i, u_j \Bigr\rangle = \sum_{i=1}^{n} a_i \langle u_i, u_j \rangle = a_j \|u_j\|^2.$$

But

$$\Bigl\langle \sum_{i=1}^{n} a_i u_i, u_j \Bigr\rangle = \langle 0, u_j \rangle = 0.$$

Since $u_j \ne 0$, it follows that $a_j = 0$, and $\{u_1, \dots, u_n\}$ is independent.

We now pause briefly in order to get confirmation that the concepts we have developed so far are consistent with the geometry of $\mathbb{R}^n$. The Cauchy-Schwarz inequality, which we write as $\frac{|\langle x, y \rangle|}{\|x\|\|y\|} \le 1$ for nonzero vectors $x$ and $y$, implies that there exists a unique number $\theta \in [0, \pi]$ such that $\cos\theta = \frac{\langle x, y \rangle}{\|x\|\|y\|}$, or $\langle x, y \rangle = \|x\|\|y\|\cos\theta$. Now $\|y - x\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y \rangle = \|x\|^2 + \|y\|^2 - 2\|x\|\|y\|\cos\theta$. When $n = 2$, the last identity is the well-known law of cosines in elementary trigonometry. The number $\theta$, of course, is the angle between $x$ and $y$.
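
The following Python sketch computes the angle between two nonzero vectors of ℝn from the formula above and confirms the law of cosines on one pair of vectors. The function names are hypothetical, and the clamping of the cosine to [−1, 1] is only a guard against rounding error that we add for the illustration.

```python
import math


def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def angle(x, y):
    """Angle in [0, pi] between nonzero vectors x and y."""
    cosine = dot(x, y) / (norm(x) * norm(y))
    # Cauchy-Schwarz guarantees |cosine| <= 1; clamp to absorb rounding error.
    return math.acos(max(-1.0, min(1.0, cosine)))


x, y = [1.0, 0.0], [1.0, 1.0]
theta = angle(x, y)

# Law of cosines: |y - x|^2 = |x|^2 + |y|^2 - 2 |x| |y| cos(theta).
lhs = norm([b - a for a, b in zip(x, y)]) ** 2
rhs = norm(x) ** 2 + norm(y) ** 2 - 2 * norm(x) * norm(y) * math.cos(theta)
print(theta, lhs, rhs)
```

For this pair the angle is 𝜋/4, and the two sides of the law of cosines agree up to rounding.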

We continue to exploit the geometry of vectors in $\mathbb{R}^2$ to get direction for the next step. An important concept in geometry (and in Hilbert space theory) is that of projecting a vector onto another. Let $x \in \mathbb{R}^2$ and let $u$ be a unit vector in the plane. The length of the projection of $x$ onto $u$ is given by $\|x\|\cos\theta = \langle x, u \rangle$; hence the vector projection of $x$ onto the line containing $u$ is the vector $y = \langle x, u \rangle u$. This is the closest vector in the line containing $u$ to the vector $x$. Since the projection of a vector $x \in \mathbb{R}^3$ onto the span $M$ of two orthonormal vectors $u_1$ and $u_2$ is the sum of the individual projections of $x$ onto $u_1$ and $u_2$, the projection of $x$ on $M$ is $\langle x, u_1 \rangle u_1 + \langle x, u_2 \rangle u_2$. The constructions involved in the next two theorems are now well motivated.

Theorem 3.7.5. Let $S = \{u_1, \dots, u_n\}$ be a finite orthonormal subset of an inner product space $H$, let $x \in \operatorname{Span}(S)$, and write $\hat{x}_i = \langle x, u_i \rangle$. Then

$$x = \sum_{i=1}^{n} \hat{x}_i u_i, \quad \text{and} \quad \|x\|^2 = \sum_{i=1}^{n} |\hat{x}_i|^2.$$

In particular, if $\langle x, u_i \rangle = 0$ for all $1 \le i \le n$, then $x = 0$.

Proof. Since $x \in \operatorname{Span}(S)$, there are scalars $a_1, \dots, a_n$ such that $x = \sum_{i=1}^{n} a_i u_i$. For a fixed $1 \le j \le n$,

$$\hat{x}_j = \langle x, u_j \rangle = \sum_{i=1}^{n} a_i \langle u_i, u_j \rangle = a_j.$$

By the Pythagorean theorem,

$$\|x\|^2 = \sum_{i=1}^{n} \|\hat{x}_i u_i\|^2 = \sum_{i=1}^{n} |\hat{x}_i|^2.$$

Definition. Let $M$ be a subspace of an inner product space $H$. The orthogonal complement of $M$ is the set

$$M^\perp = \{z \in H : z \perp x \ \ \forall x \in M\}.$$

It is clear that $M^\perp$ is a subspace of $H$ and that $M \cap M^\perp = \{0\}$.

Example 6. Let $a = (a_1, \dots, a_n)^T$ be a nonzero vector in $\mathbb{R}^n$. The orthogonal complement of $M = \operatorname{Span}(\{a\})$ is the set of all vectors $x$ such that $a^T x = 0$. Thus $M^\perp$ can be viewed as the kernel of a $1 \times n$ matrix and is therefore a maximal subspace of $\mathbb{R}^n$. A translation of a maximal subspace of $\mathbb{R}^n$ is called a hyperplane. Thus the equation of a hyperplane is of the form $\sum_{i=1}^{n} a_i x_i = b$, $b \in \mathbb{R}$. Observe that the hyperplane $a^T x = b$ partitions $\mathbb{R}^n$ into the three sets $\{x \in \mathbb{R}^n : a^T x = b\}$, $\{x \in \mathbb{R}^n : a^T x < b\}$, and $\{x \in \mathbb{R}^n : a^T x > b\}$. The latter two sets are called the open half-spaces determined by the hyperplane $a^T x = b$.

Theorem 3.7.6. Let $S = \{u_1, \dots, u_n\}$ be a finite orthonormal subset of $H$, and let $M = \operatorname{Span}(S)$. Then every vector $x \in H$ can be written uniquely as

$$x = y + z, \quad \text{where } y \in M \text{ and } z \in M^\perp.$$

Additionally, $y$ is the closest vector in $M$ to $x$ in the sense that if $y' \in M$ and $y' \ne y$, then $\|x - y\| < \|x - y'\|$.

Proof. Define $y = \sum_{i=1}^{n} \langle x, u_i \rangle u_i$, and let $z = x - y$. Clearly, $y \in M$. We show that $z \in M^\perp$. For $1 \le j \le n$,

$$\langle z, u_j \rangle = \Bigl\langle x - \sum_{i=1}^{n} \langle x, u_i \rangle u_i, u_j \Bigr\rangle = \langle x, u_j \rangle - \sum_{i=1}^{n} \langle x, u_i \rangle \langle u_i, u_j \rangle = \langle x, u_j \rangle - \langle x, u_j \rangle = 0.$$

This shows that $H = M + M^\perp$. To show the uniqueness part, suppose that $x = y + z = y' + z'$, where $y, y' \in M$ and $z, z' \in M^\perp$. Thus $y - y' = z' - z$. Since $y - y' \in M$ and $z' - z \in M^\perp$, $y - y' \in M \cap M^\perp = \{0\}$. Hence $y = y'$ and $z = z'$.

To prove the last assertion, suppose $y' \in M$, $y' \ne y$, and write $x - y' = (x - y) + (y - y')$. Now $x - y = z \in M^\perp$ and $y - y' \in M$. Using the Pythagorean theorem, we have $\|x - y'\|^2 = \|x - y\|^2 + \|y - y'\|^2 > \|x - y\|^2$.

The vector $y = \sum_{i=1}^{n} \langle x, u_i \rangle u_i$ is called the orthogonal projection of $x$ on $M$. We reiterate that $y$ is the closest vector of $M$ to $x$. We also say that $y$ is the best approximation of $x$ in $M$.

Sometimes the basis vectors $u_1, \dots, u_n$ in the above theorem are merely orthogonal and not orthonormal. In this case, we find the orthogonal projection of $x$ on $M$ by using the formula

$$y = \sum_{i=1}^{n} \langle x, u_i \rangle \frac{u_i}{\|u_i\|^2},$$

which is the previously stated formula for $y$ when each $u_i$ is replaced with the normalized vector $\frac{u_i}{\|u_i\|}$.
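
The projection formula for an orthogonal (not necessarily orthonormal) basis translates directly into a short computation. The sketch below works in ℝ3 with the standard inner product and exact rational arithmetic; the function names and the sample vectors are assumptions chosen for the illustration, and the code does not check that the given basis is in fact orthogonal.

```python
from fractions import Fraction


def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def project(x, basis):
    """Orthogonal projection of x on the span of `basis`.

    `basis` is assumed to consist of nonzero, mutually orthogonal vectors.
    """
    y = [Fraction(0)] * len(x)
    for u in basis:
        coefficient = Fraction(dot(x, u)) / Fraction(dot(u, u))
        y = [yi + coefficient * ui for yi, ui in zip(y, u)]
    return y


x = [1, 2, 3]
basis = [[1, 1, 0], [1, -1, 0]]           # orthogonal, but not unit vectors
y = project(x, basis)                      # the component of x in M
z = [xi - yi for xi, yi in zip(x, y)]      # the component of x in the complement
print(y, z)
```

Here y = (1, 2, 0) and z = (0, 0, 3); the vector z is orthogonal to both basis vectors, as theorem 3.7.6 predicts.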

The following question naturally arises: does every finite-dimensional
inner product space have an orthonormal basis? The following theorem delivers
the answer.

Theorem 3.7.7. Every finite-dimensional inner product space contains an orthonormal
basis.

Proof. We use induction on $\dim(H)$. Let $\{v_1, \dots, v_n\}$ be a basis for $H$. Use the inductive hypothesis to find an orthogonal basis $\{u_1, \dots, u_{n-1}\}$ for the inner product space $\operatorname{Span}(\{v_1, \dots, v_{n-1}\})$, and define

$$u_n = v_n - \sum_{j=1}^{n-1} \langle v_n, u_j \rangle \frac{u_j}{\|u_j\|^2}.$$

Clearly, $u_n \ne 0$ because otherwise $v_n \in \operatorname{Span}(\{u_1, \dots, u_{n-1}\}) = \operatorname{Span}(\{v_1, \dots, v_{n-1}\})$. Observe that $u_n$ is nothing but the difference between $v_n$ and its orthogonal projection on $\operatorname{Span}(\{u_1, \dots, u_{n-1}\})$ (the vector $z$ in the notation of theorem 3.7.6), and therefore $u_n$ is orthogonal to each of the vectors $u_1, \dots, u_{n-1}$. By theorem 3.7.4, the orthogonal set $\{u_1, u_2, \dots, u_n\}$ is independent and hence is a basis for $H$. To obtain the desired orthonormal basis, we simply normalize each of the vectors $u_i$.

The above theorem and its proof deliver more than the mere existence of an
orthonormal basis for an arbitrary finite-dimensional inner product space.
The proof is inductive and constructive; hence it can be applied to an infinite
independent sequence of vectors, recursively, as follows.

The Gram-Schmidt Process

Given an infinite sequence $v_1, v_2, \dots$ of independent vectors in an inner product space, the sequence defined below is orthogonal:

$$u_1 = v_1, \quad \text{and, for } n \ge 2, \quad u_n = v_n - \sum_{j=1}^{n-1} \langle v_n, u_j \rangle \frac{u_j}{\|u_j\|^2}.$$

Additionally, for each $n \in \mathbb{N}$,

$$V_n = \operatorname{Span}(\{v_1, \dots, v_n\}) = \operatorname{Span}(\{u_1, \dots, u_n\}) = U_n.$$
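
The recursion above can be sketched in a few lines of Python for vectors in ℝn with the standard inner product. Exact rational arithmetic is used so that orthogonality can be checked without rounding error; the function name `gram_schmidt` and the sample vectors are our own choices, and the input vectors are assumed to be independent.

```python
from fractions import Fraction


def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Return the orthogonal vectors u_1, ..., u_n built from v_1, ..., v_n.

    Each u_n is v_n minus its orthogonal projection on Span{u_1, ..., u_{n-1}}.
    The input vectors are assumed to be linearly independent.
    """
    us = []
    for v in vectors:
        u = [Fraction(c) for c in v]
        for w in us:
            coefficient = dot(v, w) / dot(w, w)
            u = [ui - coefficient * wi for ui, wi in zip(u, w)]
        us.append(u)
    return us


vs = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
us = gram_schmidt(vs)
for u in us:
    print(u)
```

For these vectors the process gives u1 = (1, 1, 0), u2 = (1/2, −1/2, 1), and u3 = (−2/3, 2/3, 2/3). Dividing each vector by its norm would produce an orthonormal basis of ℝ3.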

Example 7. Consider the space $\mathcal{C}[-1, 1]$ with the inner product $\langle f, g \rangle = \int_{-1}^{1} f(x)g(x)\,dx$. Applying the Gram-Schmidt process to the infinite independent sequence of monomials $1, x, x^2, \dots$, we obtain a sequence of orthogonal polynomials $P_0, P_1, \dots$ that spans the space of polynomials such that

$$\operatorname{Span}(\{1, x, \dots, x^n\}) = \operatorname{Span}(\{P_0, P_1, \dots, P_n\}) \quad \text{for all } n \ge 0.$$

The polynomials $P_n$ are known as the Legendre polynomials. We will study some of the properties of these polynomials in section 4.10.

The following observation is sometimes crucial for avoiding the often cumbersome
calculations needed to compute the orthogonal sequence u1 , u2 , . . . .

If $w_1, w_2, \dots$ is an orthogonal sequence and, for each $n \in \mathbb{N}$, $\operatorname{Span}(\{v_1, \dots, v_n\}) = \operatorname{Span}(\{w_1, \dots, w_n\})$, then each $w_n$ is a multiple of the corresponding $u_n$. This is because the orthogonal complement $U_{n-1}^\perp$ of $U_{n-1}$ in $V_n$ is one-dimensional, hence any nonzero vector in $U_{n-1}^\perp$ is a multiple of any other nonzero vector in $U_{n-1}^\perp$. The following example exploits this idea to generate the Legendre polynomials.

Example 8. Consider the following set of polynomials:

$$Q_n(x) = D^n[(x^2 - 1)^n],$$

where $D^n f$ denotes the $n$th derivative of $f$. Each $Q_n$ is a polynomial of exact degree $n$; thus $\operatorname{Span}(\{P_0, \dots, P_n\}) = \operatorname{Span}(\{Q_0, \dots, Q_n\}) = \operatorname{Span}(\{1, x, \dots, x^n\})$. If we show that $Q_n$ is orthogonal to each of the monomials $x^j$ for $0 \le j \le n - 1$, then the polynomials $\{Q_n : n \in \mathbb{N}\}$ are orthogonal and, by the above observation, $P_n = c_n Q_n$.

Integration by parts yields

$$\int_{-1}^{1} x^j D^n (x^2 - 1)^n\,dx = -\int_{-1}^{1} j x^{j-1} D^{n-1} (x^2 - 1)^n\,dx + x^j D^{n-1} (x^2 - 1)^n \Big|_{-1}^{1}.$$

The second term is zero because if $k < n$, then $x^2 - 1$ is a factor of $D^k (x^2 - 1)^n$. The same reason coupled with integration by parts $j - 1$ times proves the desired result.
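
Examples 7 and 8 can be compared on a computer. The sketch below represents a polynomial by its list of coefficients (constant term first), applies the Gram-Schmidt process to the monomials with the inner product of example 7, and checks that each Qn of example 8 is a constant multiple of the resulting Pn. All function names are hypothetical, and the Pn produced here are monic, which is only one of several common normalizations of the Legendre polynomials.

```python
from fractions import Fraction


def inner(p, q):
    """Integral of p(x) q(x) over [-1, 1] for coefficient lists p and q.

    The integral of x^m over [-1, 1] is 2/(m + 1) for even m and 0 for odd m.
    """
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:
                total += a * b * Fraction(2, i + j + 1)
    return total


def gram_schmidt_monomials(count):
    """Apply Gram-Schmidt to 1, x, ..., x^(count-1); returns monic P_0, P_1, ..."""
    polys = []
    for n in range(count):
        v = [Fraction(0)] * n + [Fraction(1)]      # the monomial x^n
        u = v[:]
        for w in polys:
            coefficient = inner(v, w) / inner(w, w)
            for k, wk in enumerate(w):
                u[k] -= coefficient * wk
        polys.append(u)
    return polys


def multiply(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def derivative(p):
    return [k * c for k, c in enumerate(p)][1:]


def rodrigues(n):
    """Q_n = D^n[(x^2 - 1)^n] as a coefficient list."""
    q = [Fraction(1)]
    for _ in range(n):
        q = multiply(q, [Fraction(-1), Fraction(0), Fraction(1)])
    for _ in range(n):
        q = derivative(q)
    return q


P = gram_schmidt_monomials(5)
Q = [rodrigues(n) for n in range(5)]
print(P[2], Q[2])   # x^2 - 1/3 and 12x^2 - 4
```

In each case Qn equals its leading coefficient times Pn; for instance, Q2 = 12x² − 4 = 12(x² − 1/3) = 12 P2.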

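Example 9. Let $\lambda$ be a linear functional on a finite-dimensional inner product space $H$. Then there exists a unique vector $y \in H$ such that, for every $x \in H$, $\lambda(x) = \langle x, y \rangle$.

Let $\{u_1, \dots, u_n\}$ be an orthonormal basis for $H$, and define $a_i = \lambda(u_i)$. We claim that $y = \sum_{i=1}^{n} \overline{a_i} u_i$ is the desired vector. For a vector $x \in H$, use theorem 3.7.5 to write $x = \sum_{i=1}^{n} \hat{x}_i u_i$. On the one hand,

$$\lambda(x) = \sum_{i=1}^{n} \hat{x}_i \lambda(u_i) = \sum_{i=1}^{n} \hat{x}_i a_i.$$

On the other hand,

$$\langle x, y \rangle = \Bigl\langle \sum_{i=1}^{n} \hat{x}_i u_i, \sum_{j=1}^{n} \overline{a_j} u_j \Bigr\rangle = \sum_{i,j=1}^{n} \hat{x}_i a_j \langle u_i, u_j \rangle = \sum_{i=1}^{n} \hat{x}_i a_i = \lambda(x).$$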

The Spectral Decomposition of a Normal Matrix

The goal of this subsection is to derive the spectral decomposition of a normal matrix. An exact generalization of this decomposition is valid for compact self-adjoint (in fact, normal) operators on infinite-dimensional separable Hilbert spaces.

For a (complex) matrix $A$, we use the symbol $A^*$ to denote the conjugate transpose of $A$. Thus $(A^*)_{ij} = \overline{a_{ji}}$. The following theorem sums up the properties of conjugate transposition. We only verify part (c).

Theorem 3.7.8. Let $A$ and $B$ be matrices of compatible sizes for matrix multiplication. Then

(a) $A^{**} = A$,
(b) $(AB)^* = B^* A^*$, and
(c) if $A$ is an $n \times n$ matrix, then, for all $x, y \in \mathbb{C}^n$, $\langle Ax, y \rangle = \langle x, A^* y \rangle$.

Proof. $\langle x, A^* y \rangle = (A^* y)^* x = y^* A^{**} x = y^* A x = \langle Ax, y \rangle$.

Definition. An $n \times n$ matrix $P$ is said to be unitary if its columns form an orthonormal basis for $\mathbb{C}^n$. A real unitary matrix is specifically called an orthogonal matrix.

Theorem 3.7.9. A unitary matrix $P$ is invertible, and $P^{-1} = P^*$.

Proof. Partition $P$ by its columns, and write $P = (u_1, \dots, u_n)$. Then

$$P^* P = \begin{pmatrix} u_1^* \\ \vdots \\ u_n^* \end{pmatrix} (u_1, \dots, u_n).$$

Thus the $(i, j)$ entry of the product $P^* P$ is $u_i^* u_j = \delta_{i,j}$. Hence $P^* P = I_n$.

The simplest example of an orthogonal (hence unitary) matrix is a rotation matrix.

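Example 10. For $\theta \in [0, 2\pi)$, the $2 \times 2$ matrix

$$P_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

is an orthogonal matrix. We show that the linear mapping induced by $P_\theta$ is a rotation of the plane. Indeed, if we identify a point $(x, y) \in \mathbb{R}^2$ with the complex number $z = x + iy$ and write $z = re^{it}$, then multiplying $z$ by $e^{i\theta}$ produces the point $z_1 = x_1 + iy_1$, which is the rotation of $z$ about the origin by the angle $\theta$. Thus

$$\begin{aligned}
z_1 = ze^{i\theta} = re^{i(t+\theta)} &= r[\cos(t + \theta) + i\sin(t + \theta)] \\
&= r[\cos t\cos\theta - \sin t\sin\theta + i(\sin t\cos\theta + \cos t\sin\theta)] \\
&= x\cos\theta - y\sin\theta + i(x\sin\theta + y\cos\theta).
\end{aligned}$$

Equating the real and imaginary parts yields

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix} = P_\theta \begin{pmatrix} x \\ y \end{pmatrix}.$$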
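
The computation in example 10 is easy to confirm numerically. The sketch below applies the rotation matrix to a vector, compares the result with multiplication by the complex number of modulus 1 and argument 𝜃, and checks that inner products and norms are unchanged; the helper names and the sample values are assumptions made for the illustration.

```python
import cmath
import math


def rotation(theta):
    """The 2 x 2 rotation matrix for the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply(matrix, v):
    """Multiply a 2 x 2 matrix by a vector of R^2."""
    return [
        matrix[0][0] * v[0] + matrix[0][1] * v[1],
        matrix[1][0] * v[0] + matrix[1][1] * v[1],
    ]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


theta = 0.7
P = rotation(theta)
x, y = [3.0, 4.0], [-1.0, 2.0]
Px, Py = apply(P, x), apply(P, y)

# The same rotation carried out as multiplication of complex numbers.
z1 = complex(x[0], x[1]) * cmath.exp(1j * theta)

print(Px, z1)
print(dot(Px, Py), dot(x, y))
```

Up to rounding, the coordinates of the rotated vector are the real and imaginary parts of z1, and the inner product of the rotated vectors equals the inner product of the original vectors.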

Rotation matrices are important because their obvious geometrical properties typify most of the general properties of a unitary matrix. For example, it is clear that a rotation of the plane preserves distances between vectors as well as angles between them. More specifically, $\|P_\theta x\| = \|x\|$ and $\langle P_\theta x, P_\theta y \rangle = \langle x, y \rangle$. This is the reason many people, including this author, loosely think of orthogonal matrices as rotations, although this is inaccurate, even in two dimensions. For example, the following matrix is orthogonal, but it is not a rotation matrix:

$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.$$

The following innocent-sounding question leads to a whole set of interesting definitions and problems, in both finite and infinite-dimensional inner product spaces: which linear operators on $\mathbb{R}^2$ can be diagonalized by simply rotating the axes? The question is exactly equivalent to the question of which matrices can be diagonalized by a rotation matrix. The immediate generalization is the question of which (complex) matrices can be diagonalized by a unitary matrix. To answer the question, suppose that a matrix $A$ can be diagonalized by a unitary matrix $P = (u_1, \dots, u_n)$. Thus $P^{-1} A P = P^* A P = D$, where $D$ is a diagonal matrix whose entries are the eigenvalues of $A$ (see the proof of theorem 3.5.7). Then

$$A = P D P^* = (u_1, \dots, u_n) \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} \begin{pmatrix} u_1^* \\ \vdots \\ u_n^* \end{pmatrix} = (u_1, \dots, u_n) \begin{pmatrix} \lambda_1 u_1^* \\ \vdots \\ \lambda_n u_n^* \end{pmatrix}.$$

Thus $A = \sum_{i=1}^{n} \lambda_i u_i u_i^*$. Similarly, $A^* = \sum_{i=1}^{n} \overline{\lambda_i} u_i u_i^*$. Now

$$A A^* = \sum_{i,j=1}^{n} \lambda_i \overline{\lambda_j}\, u_i (u_i^* u_j) u_j^* = \sum_{i=1}^{n} |\lambda_i|^2 u_i u_i^* = A^* A.$$

The above calculation shows that a necessary condition for a matrix $A$ to be unitarily diagonalizable is that $A^* A = A A^*$. Such a matrix has a name.

Definition. A (complex) matrix $A$ is called normal if $A^* A = A A^*$. A matrix $A$ is called Hermitian if $A^* = A$. A Hermitian matrix is clearly normal. Observe that a real Hermitian matrix is simply a symmetric matrix.

Theorem 3.7.13 establishes the fact that normality is also a sufficient condition for the unitary diagonalization of a matrix.

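Lemma 3.7.10. If $A$ is normal, then, for all $x \in \mathbb{C}^n$, $\|Ax\| = \|A^* x\|$.

Proof.

$$\begin{aligned}
\|Ax\|^2 - \|A^* x\|^2 &= \langle Ax, Ax \rangle - \langle A^* x, A^* x \rangle \\
&= \langle A^* A x, x \rangle - \langle A A^* x, x \rangle \\
&= \langle (A^* A - A A^*) x, x \rangle = \langle 0, x \rangle = 0.
\end{aligned}$$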

Theorem 3.7.11. Let $A$ be a normal matrix. Then a vector $u$ is an eigenvector of $A$ with the corresponding eigenvalue $\lambda$ if and only if $u$ is an eigenvector of $A^*$ corresponding to the eigenvalue $\overline{\lambda}$.

Proof. It is easy to verify that $A - \lambda I$ is normal and that its conjugate transpose is $A^* - \overline{\lambda} I$. By the previous lemma, $\|(A - \lambda I)u\| = \|(A^* - \overline{\lambda} I)u\|$. Thus $(A - \lambda I)u = 0$ if and only if $(A^* - \overline{\lambda} I)u = 0$.

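Theorem 3.7.12. If $A$ is a normal matrix, then eigenvectors of $A$ corresponding to distinct eigenvalues are orthogonal.

Proof. Suppose $A u_1 = \lambda_1 u_1$, $A u_2 = \lambda_2 u_2$, where $u_1 \ne 0 \ne u_2$, and $\lambda_1 \ne \lambda_2$. By theorem 3.7.11, $A^* u_2 = \overline{\lambda_2} u_2$. Then

$$\lambda_1 \langle u_1, u_2 \rangle = \langle \lambda_1 u_1, u_2 \rangle = \langle A u_1, u_2 \rangle = \langle u_1, A^* u_2 \rangle = \langle u_1, \overline{\lambda_2} u_2 \rangle = \lambda_2 \langle u_1, u_2 \rangle.$$

Thus $(\lambda_1 - \lambda_2) \langle u_1, u_2 \rangle = 0$, and $\langle u_1, u_2 \rangle = 0$.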

Theorem 3.7.13 (diagonalization). Let $A$ be a normal matrix. Then there exists a unitary matrix $P$ and a diagonal matrix $D$ such that

$$P^* A P = D.$$

Proof. The proof is inductive. The base case ($n = 2$) is left as an exercise. Let $\lambda_1$ be an eigenvalue of $A$, and let $v_1$ be a unit eigenvector corresponding to $\lambda_1$. Let $M = \operatorname{Span}(\{v_1\})$, and let $\{v_2, \dots, v_n\}$ be an orthonormal basis for $M^\perp$. By construction, the matrix $Q = (v_1, \dots, v_n)$ is unitary.

We claim that $Q^* A Q$ has the form

$$Q^* A Q = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & A' & \\ 0 & & & \end{pmatrix}.$$

The $(i, j)$ entry of $Q^* A Q$ is $e_i^T Q^* A Q e_j$.⁴ But, for $1 \le i \le n$,

$$e_i^T Q^* A Q e_1 = e_i^T Q^* A v_1 = \lambda_1 e_i^T Q^* v_1 = \lambda_1 (Q e_i)^* v_1 = \lambda_1 v_i^* v_1 = \lambda_1 \delta_{i,1}.$$

Thus the entries in the first column of $Q^* A Q$ are what we claim they are. The entries in the top row are computed similarly by examining the quantity $e_1^T Q^* A Q e_j$, and using the fact that $A^* v_1 = \overline{\lambda_1} v_1$ (from theorem 3.7.11).

Next we show that the matrix $Q^* A Q$ is normal:

$$(Q^* A Q)(Q^* A Q)^* = Q^* A Q Q^* A^* Q = Q^* A A^* Q = Q^* A^* A Q = Q^* A^* Q Q^* A Q = (Q^* A Q)^* (Q^* A Q).$$

⁴ Here $\{e_1, \dots, e_n\}$ is the standard basis for $\mathbb{K}^n$.

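Now

$$(Q^* A Q)(Q^* A Q)^* = \begin{pmatrix} |\lambda_1|^2 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & A'(A')^* & \\ 0 & & & \end{pmatrix},$$

while

$$(Q^* A Q)^* (Q^* A Q) = \begin{pmatrix} |\lambda_1|^2 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & (A')^* A' & \\ 0 & & & \end{pmatrix}.$$

This shows that $A'$ is normal.

Invoking the inductive hypothesis, there is a unitary $(n - 1) \times (n - 1)$ matrix $Q_1$ such that

$$Q_1^* A' Q_1 = \begin{pmatrix} \lambda_2 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}.$$

Now define

$$P = Q \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & Q_1 & \\ 0 & & & \end{pmatrix}.$$

Being the product of two unitary matrices, $P$ is unitary. It is straightforward to verify that

$$P^* A P = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} = D.$$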

Remarks. 1. If we write $P = (u_1, \dots, u_n)$, then, by the proof of theorem 3.5.7, the eigenvalues of $A$ are $\lambda_1, \dots, \lambda_n$, and $u_1, \dots, u_n$ are the corresponding eigenvectors.

2. Observe that $A = P D P^* = \sum_{i=1}^{n} \lambda_i u_i u_i^*$. Each of the matrices $P_i = u_i u_i^*$ is a rank 1 matrix and, in fact, $P_i$ is the projection of $\mathbb{C}^n$ onto the one-dimensional subspace generated by $u_i$, because $P_i x = (u_i u_i^*) x = u_i (u_i^* x) = \langle x, u_i \rangle u_i$. The representation

$$A = \sum_{i=1}^{n} \lambda_i P_i$$

is known as the spectral decomposition of the normal matrix A.
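
As a concrete instance of the spectral decomposition, the sketch below treats a real symmetric 2 × 2 matrix whose eigenvalues 3 and 1 and unit eigenvectors (1, 1)/√2 and (1, −1)/√2 are easy to find by hand. The matrix and the helper names are our own choices for the illustration; since the example is real, no complex conjugation is needed in forming the rank 1 matrices.

```python
import math


def outer(u):
    """The rank 1 matrix u u^T for a real vector u."""
    return [[a * b for b in u] for a in u]


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
eigenpairs = [(3.0, [s, s]), (1.0, [s, -s])]   # (eigenvalue, unit eigenvector)

projections = [outer(u) for _, u in eigenpairs]

# Rebuild A as the sum of eigenvalue times projection.
reconstructed = [
    [sum(lam * Pi[i][j] for (lam, _), Pi in zip(eigenpairs, projections))
     for j in range(2)]
    for i in range(2)
]
product = matmul(projections[0], projections[1])   # should be the zero matrix
print(reconstructed)
```

Up to rounding, the reconstructed matrix is A, the two projections add up to the identity matrix, and their product is the zero matrix.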



Spectral Theory for Normal Operators

Let $H$ be a finite-dimensional inner product space, and let $T$ be a linear operator on $H$. For a fixed element $y \in H$, define a functional $\lambda_y$ on $H$ by $\lambda_y(x) = \langle Tx, y \rangle$. It is clear that $\lambda_y$ is linear. By example 9, there is a unique vector $T^* y \in H$ such that $\lambda_y(x) = \langle x, T^* y \rangle$. Therefore we have a function $T^* : H \to H$ defined by the requirement that

$$\langle Tx, y \rangle = \langle x, T^* y \rangle$$

for all $x, y \in H$.

It is straightforward to verify that $T^*$ itself is a linear operator on $H$. We call it the adjoint operator of $T$.

Definition. A linear operator $T$ on a finite-dimensional inner product space $H$ is said to be normal if $T^* T = T T^*$. We say that $T$ is self-adjoint if $T = T^*$. A self-adjoint operator is clearly normal.

We will develop the analog of the spectral decomposition of a normal matrix for
normal operators.

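Lemma 3.7.14. Let $T$ be a normal operator on a finite-dimensional inner product space $H$, and let $B = \{v_1, \dots, v_n\}$ be an orthonormal basis for $H$. Then the matrix $A$ of $T$ relative to $B$ is a normal matrix.

Proof. By theorem 3.5.2, it is sufficient to prove that the matrix of $T^*$ relative to $B$ is $A^*$. By assumption, $T(v_k) = \sum_{i=1}^{n} a_{ik} v_i$ for $1 \le k \le n$. We need to show that, for $1 \le j \le n$, $T^*(v_j) - \sum_{i=1}^{n} \overline{a_{ji}} v_i = 0$. It is further sufficient to show that, for all $1 \le j, k \le n$, the quantity $q_{jk} = \langle T^*(v_j) - \sum_{i=1}^{n} \overline{a_{ji}} v_i, v_k \rangle$ is zero:

$$\begin{aligned}
q_{jk} &= \langle T^*(v_j), v_k \rangle - \sum_{i=1}^{n} \overline{a_{ji}} \langle v_i, v_k \rangle = \langle v_j, T v_k \rangle - \overline{a_{jk}} \\
&= \Bigl\langle v_j, \sum_{i=1}^{n} a_{ik} v_i \Bigr\rangle - \overline{a_{jk}} = \sum_{i=1}^{n} \overline{a_{ik}} \langle v_j, v_i \rangle - \overline{a_{jk}} = \overline{a_{jk}} - \overline{a_{jk}} = 0.
\end{aligned}$$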

Theorem 3.7.15. A normal operator $T$ on a finite-dimensional inner product space $H$ is diagonalizable.

Proof. Fix an orthonormal basis $B = \{v_1, \dots, v_n\}$ for $H$, and let $A$ be the matrix of $T$ relative to $B$. By the previous lemma, $A$ is normal and, by theorem 3.7.13, $A$ is diagonalizable by a unitary matrix $P$. Thus $P^* A P = D$. Let $B'$ be the basis of $H$ such that $P$ is the matrix from $B'$ to $B$. Such a basis exists by example 4 in section 3.5. By theorem 3.5.5, the matrix of $T$ relative to $B'$ is $P^{-1} A P = P^* A P = D$, as desired. We leave it to the reader to verify that $B'$ is, in fact, an orthonormal basis for $H$.

The above theorem leads to the spectral theorem for normal operators on finite-
dimensional inner product spaces.

Theorem 3.7.16 (the spectral theorem). A normal operator $T$ on a finite-dimensional inner product space can be written as

$$T = \sum_{i=1}^{n} \lambda_i P_i,$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $T$, and $P_1, \dots, P_n$ are rank 1 operators.

Proof. In the notation of the previous theorem, let $u_1, \dots, u_n$ be the columns of the matrix $P$, and let $\lambda_1, \dots, \lambda_n$ be the diagonal entries of $D$. Then $T(u_i) = \lambda_i u_i$. Write an arbitrary element $x$ of $H$ as $x = \sum_{i=1}^{n} \hat{x}_i u_i$. Then $T(x) = \sum_{i=1}^{n} \hat{x}_i T(u_i) = \sum_{i=1}^{n} \lambda_i \hat{x}_i u_i$. Define $P_i(x) = \hat{x}_i u_i = \langle x, u_i \rangle u_i$. Each of the operators $P_i$ is the projection of $H$ onto the one-dimensional subspace generated by $u_i$ and $T = \sum_{i=1}^{n} \lambda_i P_i$.

Exercises

1. For functions $f, g \in \mathcal{C}^\infty[0, 1]$, define $\langle f, g \rangle = f(0)g(0) + \int_0^1 f'(x)g'(x)\,dx$. Prove that $\langle \cdot, \cdot \rangle$ is an inner product on $\mathcal{C}^\infty[0, 1]$.
2. Prove that the following generalization of the previous exercise also defines an inner product on $\mathcal{C}^\infty[0, 1]$: for a fixed positive integer $n$,

$$\langle f, g \rangle = \sum_{i=0}^{n-1} f^{(i)}(0) g^{(i)}(0) + \int_0^1 f^{(n)}(x) g^{(n)}(x)\,dx.$$

3. Prove the following properties of inner products, which are often used without explicit mention:
(a) If $x, y$ are vectors in an inner product space $H$ such that $\langle x, w \rangle = \langle y, w \rangle$ for every $w \in H$, then $x = y$.

(b) For $u_1, \dots, u_n, v_1, \dots, v_m \in H$ and for scalars $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_m$,

$$\Bigl\langle \sum_{i=1}^{n} \alpha_i u_i, \sum_{j=1}^{m} \beta_j v_j \Bigr\rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_i \overline{\beta_j} \langle u_i, v_j \rangle.$$

(c) If $u_1, \dots, u_n$ are mutually orthogonal, then

$$\Bigl\| \sum_{i=1}^{n} \alpha_i u_i \Bigr\|^2 = \sum_{i=1}^{n} |\alpha_i|^2 \|u_i\|^2.$$

4. Prove that, in an inner product space, $x \perp y$ if and only if $\|x + \alpha y\| = \|x - \alpha y\|$ for every $\alpha \in \mathbb{K}$.
5. Let $u_1, u_2, \dots$ be an infinite orthonormal sequence in an inner product space $H$. Prove that, for $x \in H$, $\sum_{n=1}^{\infty} |\hat{x}_n|^2 \le \|x\|^2$. Here $\hat{x}_n = \langle x, u_n \rangle$.
6. Prove that if $M$ is a subspace of an inner product space $H$, then $M^\perp$ is a subspace of $H$, and $M \cap M^\perp = \{0\}$.
7. Let $M$ be a finite-dimensional proper subspace of an inner product space $H$. Prove that $M^\perp \ne \{0\}$. In particular, there exists a unit vector $x$ orthogonal to $M$.
8. Consider the space $M = \mathbb{K}(\mathbb{N})$ of finite sequences. Clearly, $M$ is a subspace of $l^2$. Prove that $M^\perp = \{0\}$.
9. For real square matrices $A$ and $B$, define $\langle A, B \rangle = \operatorname{tr}(A B^T)$. Prove that $\langle \cdot, \cdot \rangle$ is an inner product on $\mathbb{R}^{n \times n}$.
10. This is a continuation of the previous exercise. Prove that the orthogonal complement of the space of symmetric matrices is the subspace of skew-symmetric matrices.
11. Let $M$ be a proper subspace of $\mathbb{R}^n$, and let $r = \dim(M)$. Prove that there exists an $(n - r) \times n$ matrix $A$ whose null-space is $M$. What additional properties can $A$ have?
12. QR factorization. Prove that every real invertible matrix $A$ can be written as $A = QR$, where $Q$ is an orthogonal matrix, and $R$ is an upper triangular matrix. Hint: Apply the Gram-Schmidt process to the columns $a_1, \dots, a_n$ of $A$ to find an orthonormal basis $q_1, \dots, q_n$. For $1 \le i \le n$, $a_i$ is a linear combination of $q_1, \dots, q_i$.
13. Let $P$ be an orthogonal matrix.
(a) Prove that $\langle Px, Py \rangle = \langle x, y \rangle$ for every $x, y \in \mathbb{R}^n$.
(b) Prove that $\|Px\| = \|x\|$ for every $x \in \mathbb{R}^n$.
14. Let $P$ be an orthogonal matrix.
(a) Prove that $P^T$ is orthogonal.
(b) Prove that $\det(P) = \pm 1$.
(c) Prove that the product of two orthogonal matrices is orthogonal.

15. Prove theorem 3.7.13 when $n = 2$.
16. Refer to theorem 3.7.16. Prove that the operators $P_i$ satisfy
(a) $\sum_{i=1}^{n} P_i = I$ (the identity operator on $H$), and
(b) $P_i P_j = \delta_{ij} P_i$.
17. Find an orthogonal matrix that diagonalizes the matrix $A$ below, and write down the spectral decomposition of $A$:

$$A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 2 \\ 0 & 2 & 3 \end{pmatrix}.$$

4
The Metric Topology

A mathematician who is not also something of a poet will never be a perfect
mathematician.
Karl Weierstrass

Felix Hausdorff. 1868–1942

Felix Hausdorff was born into a wealthy Jewish family and when he was still
a young boy, the family moved to Leipzig. He studied at Leipzig University,
graduating in 1891 with a doctorate in the applications of mathematics to
astronomy. He published four papers on astronomy and optics over the next
few years. Hausdorff remained in Leipzig, where he lectured until 1910. He then
moved to Bonn, then to Greifswald in 1913, returning to Bonn in 1921, where he
continued his work until 1935.

Hausdorff was the first to coin the definitions of metric and topological spaces. In
1914, building on work by Maurice Fréchet and others, he published his famous
text Grundzüge der Mengenlehre. The book was the beginning point for studying
metric and topological spaces, which are now core topics in modern mathematics.
Among Hausdorff ’s numerous achievements, we count his introduction of the
notion of the Hausdorff dimension, his study of the Gaussian law of errors, limit
theorems and the problem of moments, and the strong law of large numbers. He
introduced the concept of a partially ordered set and, from 1901 to 1909, he proved

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0004

a series of results on ordered sets. In 1907 he introduced special types of ordinals
in an attempt to prove Cantor’s continuum hypothesis, and he was also the first to
pose the generalized continuum hypothesis.

Hausdorff sensed the oncoming calamity of Nazism but made no attempt to
emigrate while it was still possible. Although he swore the necessary oath to Hitler
in 1934, he was forced to give up his position in 1935. He continued to undertake
research in topology and set theory but the results could not be published in
Germany. As a Jew, Hausdorff found his position increasingly difficult. He
lived under the constant threat of being deported to an internment camp. Bonn
University requested that the Hausdorffs be allowed to remain in their home, and
the request was granted. But, in 1941, they were forced to wear the “yellow star”,
and, in January 1942, the Hausdorffs were informed that they were to be interned
in Endenich. Together with his wife and his wife’s sister, Hausdorff committed
suicide on 26 January.

Hausdorff was, according to a quote attributed to Weierstrass, a perfect mathematician.
Indeed, he was something of a poet, according to the following excerpt:¹

Hausdorff pursued, especially during the early years in Leipzig, a kind of
double identity: as Felix Hausdorff, the productive mathematician, and as
Paul Mongré. Under this pseudonym, Hausdorff enjoyed remarkable recognition
within the German intelligentsia at the end of the 19th century as
a writer, philosopher and socially critical essayist. He fostered a circle of
friends that consisted of well-known writers, artists and publishers including
Hermann Conradi, Richard Dehmel, Otto Erich Hartleben, Gustav Kirstein,
Max Klinger, Max Reger and Frank Wedekind. Between 1897 and 1904,
Hausdorff reached the peak of his literary-philosophical accomplishment:
during this period, 18 of a total of 22 works were published under his
pseudonym. These included the volume of aphorisms Sant’ Ilario: Thoughts
from Zarathustra’s Country, his critique Das Chaos in kosmischer Auslese,
a book of poems entitled Ekstasen, the farce Der Arzt seiner Ehre, as well as
numerous essays, most of which appeared in the leading journal of the day,
“Neue Deutsche Rundschau (Freie Bühne)”. The play was Hausdorff ’s greatest
literary success, as it was performed over 300 times in 31 cities.

¹ Excerpted from Hausdorff Center for Mathematics, Felix Hausdorff,
http://www.hcm.uni-bonn.de/about-hcm/felix-hausdorff/about-felix-hausdorff/, accessed Oct. 29, 2020.

4.1 Definitions and Basic Properties

Basic calculus concepts such as limits and continuity are heavily based on the
concept of proximity. A metric is the most common tool for measuring proximity.
The definition of a metric is a direct abstraction of the properties of the distance
function in the plane. The most important characteristics of the Euclidean distance
are:

(1) the distance between two distinct points in the plane is positive,
(2) the distance is a symmetric function, and
(3) the distance satisfies the triangle inequality as it is understood in plane geometry.

These three characteristics are the ingredients of the definition of a metric. It is
an amazing fact that so few axioms produce such a rich structure. The abstraction
of a simple concept almost never produces a structure with properties identical
to those of the concept. Indeed, there are fundamental differences between the
properties of a general metric and those of the Euclidean distance. For example,
you will see that there are metrics where a ball consists of a single point or the
entire space. Of course, such metrics generally have much less importance than the
most common metrics, those induced by a norm. Thus the fact that some metric
properties violate our sense of geometry does not detract from the usefulness of
metric spaces as one of the most powerful tools of mathematics.

Definition. A metric space is a nonempty set X together with a function d ∶ X ×
X → ℝ such that, for all x, y and z ∈ X,

(a) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y,
(b) d(x, y) = d(y, x), and
(c) d(x, y) ≤ d(x, z) + d(z, y).

The function d is called the distance function, or the metric. Property (c) is
known as the triangle inequality.

Example 1. Let X = ℝ, and let d(x, y) = |x − y|. The triangle inequality is indeed
the inequality known by the same name in elementary mathematics. In general,
the metric on ℝn given by d(x, y) = (∑_{i=1}^{n} |x_i − y_i|²)^{1/2} = ‖x − y‖2 is called the
Euclidean (or the usual) metric on ℝn . In this case, the triangle inequality
follows from Minkowski’s inequality with p = 2. ∎

Example 2. Normed linear spaces provide a rich source of metric spaces. If (X, ‖.‖)
is a normed linear space, and the distance function is defined by d(x, y) = ‖x −
y‖, then d(x, y) = ‖x − y‖ = ‖(x − z) + (z − y)‖ ≤ ‖x − z‖ + ‖z − y‖ = d(x, z) +
d(z, y), and this proves the triangle inequality. The other properties are trivial to
verify. Special cases include all l^p spaces, ℝn with any of the l^p metrics, and the
space ℬ[0, 1]. See section 3.6. ∎

Example 3. Let X be a nonempty set and define the discrete metric on X as
follows:

d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y. ∎
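The three axioms can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the function names (euclidean, discrete, satisfies_metric_axioms) and the sample points are illustrative assumptions, and a finite check of this kind can refute the axioms for a candidate function but can never prove them.

```python
import itertools
import math

def euclidean(x, y):
    """Euclidean metric on R^n for points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1

def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check axioms (a)-(c) on every triple drawn from a finite sample."""
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False                      # axiom (a) fails
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # axiom (b) fails
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False                      # axiom (c) fails
    return True

sample = [(0, 0), (1, 0), (0, 1), (3, 4), (-2, 5)]   # hypothetical test points
print(satisfies_metric_axioms(euclidean, sample))
print(satisfies_metric_axioms(discrete, sample))
```

By contrast, the squared distance (x − y)² on ℝ fails the check on the points 0, 1, 2, because 4 > 1 + 1 violates the triangle inequality.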

Definition. Let (X, d) be a metric space. The open ball of radius r centered at
x ∈ X is the set

B(x, r) = {y ∈ X ∶ d(x, y) < r}.

The special case of this definition stated in section 3.6 when X is a normed linear
space is consistent with the above definition.

Example 4. In ℝ with the usual metric, B(x, r) is the open interval of radius r
centered at x. 

Example 5. In (ℝ2 , ‖.‖2 ), the open ball of radius r centered at (x0 , y0 ) is the open
disk of radius r centered at (x0 , y0 ). 

Example 6. In the space ℬ[0, 1] of bounded real functions on [0, 1], the ball B(f, r)
is the set of all bounded functions whose graphs on [0, 1] are between the graphs
of the functions y = f(x) − r and y = f(x) + r. 

Example 7. In the discrete metric on a set X, B(x, r) = {x} if r ≤ 1, and B(x, r) = X
if r > 1. ∎
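For a finite set, the two cases of Example 7 can be computed directly from the definition of an open ball. This is an illustrative sketch only; the helper open_ball and the four-element set X are assumptions made for the example.

```python
def open_ball(space, d, center, r):
    """B(center, r) = {y in space : d(center, y) < r}, for a finite space."""
    return {y for y in space if d(center, y) < r}

def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1

X = {"a", "b", "c", "d"}                           # hypothetical finite set
print(open_ball(X, discrete, "a", 0.5))            # radius r <= 1
print(open_ball(X, discrete, "a", 1))              # radius r <= 1 (strict inequality)
print(open_ball(X, discrete, "a", 1.5) == X)       # radius r > 1
```

The case r = 1 is the one worth noticing: the inequality d(x, y) < r is strict, so the ball of radius exactly 1 still contains only its center.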

Definition. A subset of a metric space X is said to be open if it is the union of
open balls.

Example 8. An open ball in any metric space X is an open subset of X. 

Example 9. Consider the discrete metric on a nonempty set X. Single-point
subsets of X are open because B(x, 1) = {x}. It follows that every subset of X
is an open set since a set is the union of its single-point subsets. ∎

Example 10. In ℝ with the usual metric, the interval (0, ∞) is open since (0, ∞) =
∪_{i=0}^{∞} (i, i + 2). ∎

Theorem 4.1.1. A subset U of a metric space X is open if and only if, for every x ∈ U,
there exists a positive number 𝛿 such that B(x, 𝛿) ⊆ U.

Proof. Suppose U is open, and let x ∈ U. Since U is the union of open balls, there
exists a ball B(y, r) in X such that x ∈ B(y, r) ⊆ U. Since d(x, y) < r, the number
𝛿 = r − d(x, y) is positive. We show that B(x, 𝛿) ⊆ B(y, r). Let z ∈ B(x, 𝛿). Then
d(z, y) ≤ d(z, x) + d(x, y) < d(x, y) + 𝛿 = r. Conversely, if, for every x ∈ U, there
is a positive number 𝛿x such that B(x, 𝛿x ) ⊆ U, then U = ∪x∈U B(x, 𝛿x ). 
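The choice 𝛿 = r − d(x, y) in the proof can be tried out numerically in the Euclidean plane. The sketch below is only an illustration under stated assumptions: the center y, the radius r, the point x, and the helper names are all hypothetical, and random sampling merely fails to find a counterexample; the proof is the triangle inequality above.

```python
import math
import random

def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def inner_radius(x, y, r):
    """delta = r - d(x, y) from the proof; positive exactly when x is in B(y, r)."""
    return r - dist(x, y)

def sample_ball(center, radius, count, rng):
    """Random points at distance < radius from center (not uniformly distributed)."""
    points = []
    for _ in range(count):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # Scale by 0.999 to stay clear of the boundary and of rounding effects.
        rho = 0.999 * radius * rng.random()
        points.append((center[0] + rho * math.cos(angle),
                       center[1] + rho * math.sin(angle)))
    return points

rng = random.Random(0)          # fixed seed so the illustration is reproducible
y, r = (0.0, 0.0), 1.0          # the ball B(y, r)
x = (0.6, 0.3)                  # a point of B(y, r)
delta = inner_radius(x, y, r)
print(delta > 0)
print(all(dist(z, y) < r for z in sample_ball(x, delta, 10_000, rng)))
```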

Example 11. In ℝ2 , the first quadrant U = {(x, y) ∶ x > 0, y > 0} is open. If
(x0 , y0 ) ∈ U, then the disk centered at (x0 , y0 ) of radius 𝛿, where 0 < 𝛿 < min(x0 , y0 ), is
contained in U. ∎

Theorem 4.1.2. Let X be a metric space. Then

(a) The union of an arbitrary collection of open subsets of X is open.
(b) The intersection of two (hence any finite number of) open sets is open.
(c) X is open.

Proof. (a) Let {U𝛼 } be a collection of open sets of X, and let U = ∪𝛼 U𝛼 . If x ∈ U,
then x ∈ U𝛼 for some 𝛼. Therefore there exists 𝛿 > 0 such that B(x, 𝛿) ⊆ U𝛼 ⊆ U.
Thus U is open by theorem 4.1.1.

(b) Let U and V be open subsets of X, and let x ∈ U ∩ V. By theorem 4.1.1, there
exist positive numbers 𝛿1 and 𝛿2 such that B(x, 𝛿1 ) ⊆ U, and B(x, 𝛿2 ) ⊆ V. Let
𝛿 = min{𝛿1 , 𝛿2 }. Clearly, B(x, 𝛿) ⊆ U ∩ V. Again, by theorem 4.1.1, U ∩ V is open.

(c) Fix an element x ∈ X. Clearly, X = ∪_{n=1}^{∞} B(x, n). ∎

By definition, the empty set is also an open subset of any metric space. This is
largely a useful convention. Without it, the statement of theorem 4.1.2 (b) would have to
read: “The intersection of two open sets is open or empty.” Once we declare the empty
set to be open, the statement as it stands is correct.

Definition. A subset F of a metric space X is closed if its complement X − F is
open.

Example 12. In ℝ with the usual metric, [a, b], [a, ∞), and ∪n∈ℤ [2n, 2n + 1] are
all closed sets, as the complement of each set is open. 

The theorem below follows from theorem 4.1.2 and De Morgan’s laws.

Theorem 4.1.3. Let X be a metric space. Then

(a) X and ∅ are closed.
(b) A finite union of closed sets is closed.
(c) The intersection of an arbitrary collection of closed sets is closed. ∎

Theorem 4.1.4. Let x and y be distinct elements of a metric space X. Then there exist
open sets U and V containing x and y, respectively, such that U ∩ V = ∅.

Proof. Let 𝛿 = d(x, y), and set U = B(x, 𝛿/2) and V = B(y, 𝛿/2). We show that U ∩
V = ∅. If z ∈ U ∩ V, then d(x, y) ≤ d(x, z) + d(z, y) < 𝛿/2 + 𝛿/2 = 𝛿, which is a
contradiction. 

The property established by the above theorem, namely, that distinct points in a
metric space are contained in disjoint open sets, is called the Hausdorff property.
This is an important separation property of metric spaces. Common terminology
used to describe the Hausdorff property is that distinct points in a metric space
can be separated by disjoint open sets.

Definition. A neighborhood of a point x of a metric space X is a subset of X that
contains an open subset of X that contains x. A neighborhood of a point need
not be open. The concept is sometimes helpful in economizing on verbiage.

Now you see that distance functions are not created equal. An open ball centered at
a point in the discrete metric is either very small (a single point) or very large
(the whole space), while the collection of neighborhoods of a point x in a normed
linear space includes all the open balls centered at x and is therefore quite rich.
There is another important distinction between a general metric and the metric
generated by a norm. In the latter case, the collection of open neighborhoods
of a point is exactly a translate of the collection of open neighborhoods
of any other point. The open neighborhoods
in a general metric space can be quite heterogeneous in the sense that knowledge
of the neighborhoods of one point tells us nothing about the open neighborhoods
of other points.

Definition. Let (xn ) be a sequence in a metric space X, and let x ∈ X. We say that
(xn ) converges to x if limn d(xn , x) = 0. In this case, we write limn xn = x. We
also say that x is the limit of (xn ). Observe that if X is a normed linear space,
limn xn = x is equivalent to the condition that limn ‖xn − x‖ = 0.

Theorem 4.1.5. The limit of a convergent sequence (xn ) is unique.



Proof. Suppose that limn xn = x, limn xn = y, and x ≠ y. By the Hausdorff property,
there exists 𝛿 > 0 such that B(x, 𝛿) ∩ B(y, 𝛿) = ∅. There exist natural numbers
N1 and N2 such that, for n > N1 , d(xn , x) < 𝛿, and, for n > N2 , d(xn , y) < 𝛿. If
we choose an integer n > max{N1 , N2 }, then d(xn , x) < 𝛿 and d(xn , y) < 𝛿, and
xn ∈ B(x, 𝛿) ∩ B(y, 𝛿) = ∅, which is a contradiction. ∎

Convergence in the spaces (ℬ[0, 1], ‖.‖∞ ) and (𝒞[0, 1], ‖.‖∞ ) is equivalent to
uniform convergence. Explicitly stated, a sequence (fn ) of bounded (respectively,
continuous) functions converges in the uniform norm to a bounded (respectively,
continuous) function f if, for every 𝜖 > 0, there exists a positive integer N, dependent
only on 𝜖, such that, for all x ∈ [0, 1] and all n > N, |fn (x) − f(x)| < 𝜖. Clearly, the
pointwise convergence of (fn ) to f is necessary for its uniform convergence to f. The
following two examples illustrate that it is not sufficient.

Example 13. Let

fn (x) = nx if 0 ≤ x ≤ 1/n, and fn (x) = 1 if 1/n ≤ x ≤ 1.

The pointwise limit of the sequence (fn ) is clearly the bounded function

f(x) = 0 if x = 0, and f(x) = 1 if 0 < x ≤ 1.

However, the sequence (fn ) does not converge to f in ℬ[0, 1] because, for every
n ∈ ℕ, ‖fn − f‖∞ ≥ |fn (1/(2n)) − f(1/(2n))| = |1/2 − 1| = 1/2. ∎
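The lower bound in Example 13 can be reproduced with exact rational arithmetic. This is a minimal sketch under stated assumptions: the names f_n, f, and gap_at_witness are illustrative, and the code evaluates the gap only at the single witness point 1/(2n) rather than computing the supremum itself.

```python
from fractions import Fraction

def f_n(n, x):
    """The function f_n of Example 13, evaluated at a rational x in [0, 1]."""
    return n * x if x <= Fraction(1, n) else Fraction(1)

def f(x):
    """The pointwise limit of (f_n)."""
    return Fraction(0) if x == 0 else Fraction(1)

def gap_at_witness(n):
    """|f_n(t) - f(t)| at t = 1/(2n), a lower bound for the uniform distance."""
    t = Fraction(1, 2 * n)
    return abs(f_n(n, t) - f(t))

for n in (1, 10, 100, 1000):
    # The gap at the witness point does not shrink, while f_n(3/10) reaches 1 once 1/n <= 3/10.
    print(n, gap_at_witness(n), f_n(n, Fraction(3, 10)))
```

The witness point moves toward 0 as n grows, which is exactly why no single N can serve every x at once.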

Example 14. Let fn (x) = 1/(n³(x − 1/n)² + 1). Clearly, 0 ≤ fn (x) ≤ 1, and limn fn (x) = 0 for
all x ∈ [0, 1]. However, fn does not converge to the zero function in the uniform
norm since ‖fn ‖∞ = fn (1/n) = 1. ∎
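A quick floating-point experiment makes the behavior in Example 14 visible: the peak of height 1 sits at x = 1/n and slides toward 0, while the value at any fixed x decays. The sketch below is illustrative; the function name f_n and the sample values of n and x are assumptions.

```python
def f_n(n, x):
    """f_n(x) = 1 / (n^3 (x - 1/n)^2 + 1) from Example 14."""
    return 1.0 / (n ** 3 * (x - 1.0 / n) ** 2 + 1.0)

for n in (10, 100, 1000):
    peak = f_n(n, 1.0 / n)      # value at the moving peak x = 1/n
    at_zero = f_n(n, 0.0)       # equals 1/(n + 1) up to rounding
    at_half = f_n(n, 0.5)       # value at a fixed point away from 0
    print(n, peak, at_zero, at_half)
```

At x = 0 the value is 1/(n + 1), so even the endpoint closest to the peak converges to 0, only more slowly than the interior points.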

The following theorem is occasionally useful. Its proof is left as an exercise.

Theorem 4.1.6. Let (xn ) be a sequence in a metric space X, and let x ∈ X. If every
subsequence of (xn ) contains a subsequence that converges to x, then (xn ) converges
to x. 

Exercises

1. Show that the intersection of an arbitrary collection of open sets need not
be open.
2. Show that an arbitrary union of closed sets is not necessarily closed.
3. Show that a single-point subset of a metric space X is closed. Hint: Use the
Hausdorff property. Conclude that an arbitrary subset of X is the union of
closed sets.
4. Show that |d(x, z) − d(y, z)| ≤ d(x, y). Hence show that if limn xn = x,
limn yn = y, then limn d(xn , yn ) = d(x, y).
5. Prove that a convergent sequence in a metric space is bounded.
6. If, in a normed linear space X, limn xn = x, limn yn = y, and (an ) and (bn )
are scalar sequences that converge to a and b, respectively, then limn (an xn +
bn yn ) = ax + by.
7. Prove that if limn xn = x, then every subsequence of (xn ) converges to x.
8. Prove theorem 4.1.6.
9. Show that limn xn = x if and only if every neighborhood of x contains all
but finitely many terms of (xn ).
10. Give an example of a normed linear space that contains an uncountable
number of mutually disjoint balls of equal radii. Hint: Let A be the subset
of l∞ of all binary sequences. What is the distance between any pair of points
in A?
11. Prove that the sphere 𝒮n−1 = {x ∈ ℝn ∶ ‖x‖2 = 1} is a closed subset of ℝn .

4.2 Interior, Closure, and Boundary

The notions of interior, closure, and boundary are quite familiar, and their mean-
ing is rather obvious for simple sets. For example, the interior of the closed
disk D = {x ∈ ℝ2 ∶ ‖x‖2 ≤ 1} is the open disk U = {x ∈ ℝ2 ∶ ‖x‖2 < 1}, and the
boundary of D is the unit circle. The fact that a concept is intuitively obvious
is no substitute for a definition. It is often the case that the definition of a
familiar concept deepens our realization that familiarity and simplicity are not
synonymous. You will see in this section that the interior of ℚ is empty, that its
boundary is the entire real line, and that important subsets of ℝ, such as the Cantor
set, can come in infinitely many fragments. Intuitively speaking, one expects the
definitions to capture the ideas that an interior point of a set A must be completely
surrounded by points of A and that a boundary point of A falls on the edge of A.
Thus any ball centered at a boundary point of A falls partially inside A and partially
outside it. This section formulates precise generalizations of those concepts. We
will also see that disjoint closed sets can be separated in much the same way that
the Hausdorff property separates points.

Definition. Let A be a nonempty subset of a metric space X. The interior of A,
denoted int(A), is the union of all the open subsets of X contained in A. A point
of int(A) is called an interior point of A.

Example 1. The interior of a nonempty subset may well be empty. The simplest
example is the subset ℚ of the metric space ℝ; int(ℚ) = ∅ because ℚ contains
no open intervals and hence no open subsets of ℝ. 

The proofs of the following two theorems are straightforward.

Theorem 4.2.1. The interior of a subset A is the largest open subset of X contained
in A. A subset A is open if and only if int(A) = A. Finally, if A ⊆ B, then int(A) ⊆
int(B). 

Theorem 4.2.2. Let A be a nonempty subset of X, and let x ∈ X. Then x ∈ int(A) if
and only if there exists 𝛿 > 0 such that B(x, 𝛿) ⊆ A. ∎

The above theorem captures what it means for an interior point of A to be totally
surrounded by points of A. In fact, the statement of theorem 4.2.2 can be taken as
the definition of an interior point x of A. One can then define the interior of A to
be the set of all interior points of A.

Definition. Let A be a subset of a metric space X. The closure, Ā, of A is the
intersection of all the closed subsets of X containing A. Points of Ā are called closure
points of A. Since X is closed and contains A, the closure is well defined, and,
because A ⊆ Ā, the closure of a nonempty set is nonempty. The following theorem
is an immediate consequence of theorem 4.1.3.

Theorem 4.2.3. The closure, Ā, of A is the smallest closed subset of X containing A.
A subset A of X is closed if and only if Ā = A. Finally, if A ⊆ B, then Ā ⊆ B̄. ∎

Theorem 4.2.4. Let A be a nonempty subset of X, and let x ∈ X. Then x ∈ Ā if and
only if for every 𝛿 > 0, A ∩ B(x, 𝛿) ≠ ∅.

Proof. Suppose x ∉ Ā. Then x ∈ X − Ā, which is open. Thus there exists 𝛿 > 0 such
that B(x, 𝛿) ⊆ X − Ā. In particular, B(x, 𝛿) ∩ A = ∅. Conversely, if for some 𝛿 >
0, B(x, 𝛿) ∩ A = ∅, then A ⊆ X − B(x, 𝛿), which is closed, so Ā ⊆ X − B(x, 𝛿)
because Ā is the smallest closed set containing A. In particular, x ∉ Ā. ∎

Example 2. In ℝ with the usual metric, ℚ̄ = ℝ. This is because every open interval
in ℝ contains rational points. ∎

Example 3. (The Comb) Consider the following subset of ℝ2 :
A = ⋃_{n=1}^{∞} ({1/n} × [0, 1]). The line segments {1/n} × [0, 1] are called the teeth of
A. We claim that Ā = A ∪ ({0} × [0, 1]). Since A is contained in the closed unit
square S = [0, 1] × [0, 1], Ā ⊆ S. Any point in S that does not belong to A or the
line segment {0} × [0, 1] must lie strictly between two consecutive teeth, and a
small-enough disk centered at the point is strictly contained between the two
teeth and hence does not intersect A. Finally, any disk centered at a point on
{0} × [0, 1] intersects all the teeth from some index n on. ∎

Theorem 4.2.5. Let A be a nonempty subset of X, and let x ∈ X. Then x ∈ Ā if and
only if there exists a sequence (xn ) in A such that limn xn = x.

Proof. Suppose limn xn = x, where each xn ∈ A, and let 𝛿 > 0. There exists a natural
number N such that, for all n ≥ N, d(xn , x) < 𝛿. In particular, xN ∈ B(x, 𝛿) ∩ A;
thus x ∈ Ā, by theorem 4.2.4. Conversely, suppose x ∈ Ā. By theorem 4.2.4,
B(x, 1/n) ∩ A ≠ ∅ for all n ∈ ℕ. Choose a point xn ∈ B(x, 1/n) ∩ A. Clearly,
limn xn = x. ∎

Definition. Let A be a subset of a metric space X. A point x ∈ X is called a limit
point of A if, for every 𝛿 > 0, B(x, 𝛿) ∩ A contains a point other than x. A point
of A that is not a limit point of A is called an isolated point of A.

Observe that a limit point of A need not belong to A. A point x of A is isolated if
and only if there is 𝛿 > 0 such that B(x, 𝛿) ∩ A = {x}.

Example 4. In ℝ2 with the usual metric, points on the unit circle {(x, y) ∶ x² + y² =
1} are limit points of the open unit disk {(x, y) ∶ x² + y² < 1}. ∎

Example 5. In ℝ, every point of ℕ is an isolated point of ℕ. A little reflection shows
that every point of the set A = {1/n ∶ n ∈ ℕ} is an isolated point of A. ∎

Theorem 4.2.6. If A is a nonempty subset of X, and x ∈ X, then x is a limit point
of A if and only if there exists a sequence (xn ) of distinct points of A such that
limn xn = x.

Proof. Let x be a limit point of A. There exists a point x1 ∈ A such that 0 < d(x1 , x) <
1. Let 𝛿2 = min{d(x1 , x), 1/2}. There exists a point x2 ∈ A such that 0 < d(x2 , x) <
𝛿2 . Note that x2 ≠ x1 by construction. The rest of the construction is inductive.
Having found points x1 , . . . , xn of A such that 0 < d(xn , x) < . . . < d(x1 , x) and
d(xi , x) < 1/i for 1 ≤ i ≤ n, let 𝛿_{n+1} = min{1/(n + 1), d(xn , x)}, then choose a point
x_{n+1} ∈ A such that 0 < d(x_{n+1} , x) < 𝛿_{n+1} . The points xn are distinct because their
distances to x are strictly decreasing, and limn xn = x because d(xn , x) < 1/n.
The converse is straightforward. ∎

Definition. The derived set of a subset A of a metric space X, denoted by A′ , is
the set of all limit points of A.

Theorem 4.2.7. Ā = A ∪ A′ . Thus A is closed if and only if it contains all its limit
points.

Proof. By theorems 4.2.5 and 4.2.6, A′ ⊆ Ā, and, by definition, A ⊆ Ā. Thus A ∪ A′ ⊆
Ā. Now suppose x ∈ Ā and that x ∉ A. Since x ∈ Ā, B(x, 𝛿) ∩ A ≠ ∅ for every
𝛿 > 0. Because x ∉ A, B(x, 𝛿) ∩ A contains a point of A other than x. This makes
x a limit point of A, by definition. ∎

Definition. The boundary of a subset A of a metric space X is the set
𝜕A = Ā ∩ \overline{X − A}, the intersection of the closures of A and X − A. Points of 𝜕A
are called the boundary points of A. Observe that
x ∈ 𝜕A if and only if every neighborhood of x intersects both A and X − A.

Example 6. In ℝ with the usual metric, 𝜕ℚ = ℝ. This is because every open
interval in ℝ contains both rational and irrational points. ∎

Theorem 4.2.8. Ā = int(A) ∪ 𝜕A.

Proof. It is enough to show that Ā ⊆ int(A) ∪ 𝜕A. The reverse containment is
obvious. Let x ∈ Ā − 𝜕A. Since every open ball centered at x intersects A, and since
x ∉ 𝜕A, there exists an open ball B(x, 𝛿) that does not intersect X − A. This means
that B(x, 𝛿) ⊆ A; hence x ∈ int(A). ∎

Theorem 4.2.9. Ā = A ∪ 𝜕A.

Proof. By the previous theorem, Ā = int(A) ∪ 𝜕A ⊆ A ∪ 𝜕A ⊆ Ā. ∎

Definition. Let A be a nonempty subset of a metric space X, and let x ∈ X. The
distance from x to A is dist(x, A) = inf {d(x, a) ∶ a ∈ A}.

Definition. Let A and B be nonempty subsets of a metric space X. The distance
between A and B is dist(A, B) = inf {d(a, b) ∶ a ∈ A, b ∈ B}.

Observe that dist(x, A) and dist(A, B) are always finite numbers.

Example 7. Let A = {n ∈ ℕ ∶ n ≥ 2} and B = {n + 1/n ∶ n ≥ 2}. Then dist(A, B) = 0.
To see this, observe that an = n ∈ A, that bn = n + 1/n ∈ B, and that |an − bn | =
1/n → 0 as n → ∞. ∎
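The sets in Example 7 are disjoint, yet their distance is 0. A finite truncation shows the mechanism; in this illustrative sketch the cutoff 200 and the names gap, A, and B are assumptions, and the truncated minimum is of course positive, since only the infimum over all n is 0.

```python
from fractions import Fraction

def gap(n):
    """|a_n - b_n| for a_n = n in A and b_n = n + 1/n in B."""
    return abs(Fraction(n) - (n + Fraction(1, n)))

# Finite truncations of A and B (n = 2, ..., 199), using exact rationals.
A = {Fraction(n) for n in range(2, 200)}
B = {n + Fraction(1, n) for n in range(2, 200)}

print(A.isdisjoint(B))                            # the truncated sets share no point
print([gap(n) for n in (2, 10, 100)])             # the gaps 1/n shrink
print(min(abs(a - b) for a in A for b in B))      # smallest distance in the truncation
```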

Theorem 4.2.10. Let A be a nonempty subset of a metric space X. Then Ā = {x ∈
X ∶ dist(x, A) = 0}.

Proof. Suppose x ∈ Ā. By theorem 4.2.5, there exists a sequence (xn ) in A such that
limn xn = x. Thus dist(x, A) = inf {d(x, a) ∶ a ∈ A} ≤ d(xn , x) for all n ∈ ℕ. Since
limn d(xn , x) = 0, dist(x, A) = 0. Conversely, if dist(x, A) = 0, then there exists a
sequence of points xn ∈ A such that limn d(xn , x) = 0. Thus limn xn = x, and x ∈ Ā
by theorem 4.2.5. ∎

Definition. Let A be a nonempty subset of a metric space X. The diameter of
A is diam(A) = sup{d(x, y) ∶ x, y ∈ A}. If diam(A) is finite, we say that A is a
bounded subset of X. A sequence (xn ) is said to be bounded if its range, {xn },
is a bounded set. If diam(X) < ∞, we say that d is a bounded metric.

Theorem 4.2.11. diam(A) = diam(Ā).

Proof. Clearly, diam(A) ≤ diam(Ā). To prove that diam(Ā) ≤ diam(A), let x, y ∈ Ā.
We will show that d(x, y) ≤ diam(A). There exist sequences (xn ) and (yn ) in A
such that limn xn = x, limn yn = y. Now d(x, y) ≤ d(x, xn ) + d(xn , yn ) + d(yn , y) ≤
d(xn , x) + diam(A) + d(yn , y). The desired inequality follows from the above string
of inequalities by taking the limit as n → ∞. ∎

Separation by Open Sets

Separation is a central idea in topology and analysis, and its importance cannot be
exaggerated. The Hausdorff property is the simplest form of separation. We will
see below that closed sets can be separated in much the same way points can be.

Theorem 4.2.12. Let F be a closed subset of X, and let x ∈ X − F. Then there exist
open subsets U and V such that x ∈ U, F ⊆ V, and U ∩ V = ∅.

Proof. Since x ∈ X − F, which is open, there exists 𝛿 > 0 such that B(x, 𝛿) ⊆ X − F.
For every y ∈ F, d(x, y) ≥ 𝛿; hence B(x, 𝛿/2) ∩ B(y, 𝛿/2) = ∅. The open sets U =
B(x, 𝛿/2) and V = ∪{B(y, 𝛿/2) ∶ y ∈ F} satisfy the conclusion of the theorem. 

An alternative (and commonly used) terminology to summarize theorem 4.2.12 is
to say that there are disjoint open subsets that separate a closed subset of X and a
point outside it.

Theorem 4.2.13. Let E and F be disjoint closed subsets of a metric space X. Then E
and F can be separated by disjoint open subsets of X.

Proof. We need to find disjoint open subsets U and V such that E ⊆ U and F ⊆ V.
For x ∈ E, dist(x, F) > 0, by theorem 4.2.10. Let 𝛿x = dist(x, F). By the proof
of theorem 4.2.12, for every y ∈ F, B(x, 𝛿x /2) ∩ B(y, 𝛿x /2) = ∅. For y ∈ F, let
𝛿y = dist(y, E) > 0. By the proof of theorem 4.2.12, for every x ∈ E, B(x, 𝛿y /2) ∩
B(y, 𝛿y /2) = ∅. Let U = ∪x∈E B(x, 𝛿x /2), and V = ∪y∈F B(y, 𝛿y /2). Clearly, U and
V are open, E ⊆ U, and F ⊆ V. It remains to show that U and V are disjoint.
If z ∈ U ∩ V, then z ∈ B(x, 𝛿x /2) ∩ B(y, 𝛿y /2) for some x ∈ E and y ∈ F. Now,
d(x, y) ≤ d(x, z) + d(z, y) < 𝛿x /2 + 𝛿y /2 ≤ max{𝛿x , 𝛿y }. But d(x, y) ≥ dist(x, F) =
𝛿x and d(x, y) ≥ dist(y, E) = 𝛿y . We have arrived at a contradiction that proves
the theorem. ∎

Example 8. Let E = {(x, y) ∈ ℝ2 ∶ x > 0, y ≥ 1/x²}, and F = {(x, y) ∈ ℝ2 ∶ x <
0, y ≥ 1/x²}. Clearly, E and F are disjoint closed subsets of the plane. They
are separated by the open right and left half planes. ∎

Subspaces

Let (X, d) be a metric space, and let A be a nonempty subset of X. The defining
conditions of the metric are clearly satisfied by the elements of A. Since the distance
function is the only defining characteristic of a metric space, the pair (A, d) is a
metric space in its own right. We say that (A, d) is a subspace of (X, d), and the
metric d on A is called the restricted (induced, or subspace) metric.

If A is a subspace of a metric space X, we use the notation BA (x, 𝛿) to denote the ball
in A of radius 𝛿 centered at a point x ∈ A. Thus BA (x, 𝛿) = {a ∈ A ∶ d(x, a) < 𝛿}.
We use the notation B̄^A to denote the closure of a subset B of A in the restricted
metric.

Theorem 4.2.14. Let A be a subspace of a metric space X, and let B ⊆ A. Then

(a) BA (x, 𝛿) = B(x, 𝛿) ∩ A.
(b) B is open in the restricted metric on A if and only if there exists an open subset
U of X such that B = U ∩ A.
(c) B is closed in the restricted metric on A if and only if there exists a closed subset
E of X such that B = E ∩ A.

Proof. We prove part (b) and leave the rest of the statements to the reader. If B is an
open subset of A, then B is the union of open balls in A. Thus B = ∪𝛼∈I BA (x𝛼 , 𝛿𝛼 ).
By part (a), BA (x𝛼 , 𝛿𝛼 ) = B(x𝛼 , 𝛿𝛼 ) ∩ A; thus, B = ∪𝛼∈I [B(x𝛼 , 𝛿𝛼 ) ∩ A] = [ ∪𝛼∈I
B(x𝛼 , 𝛿𝛼 )] ∩ A = U ∩ A, where U = ∪𝛼∈I B(x𝛼 , 𝛿𝛼 ), which is open in X. We leave
the proof of the converse as an exercise. 

The Cantor Set

Consider the closed unit interval I = [0, 1]. Trisect I and remove the open middle
third (1/3, 2/3). This leaves two closed intervals: I1,1 = [0, 1/3] and I1,2 = [2/3, 1].
Let C1 = I1,1 ∪ I1,2 . Repeat the construction for each of the intervals I1,1 and
I1,2 , thus removing the middle open third of each of the two intervals. This
leaves four closed intervals: I2,1 = [0, 1/9], I2,2 = [2/9, 1/3], I2,3 = [2/3, 7/9], and
I2,4 = [8/9, 1]. Define C2 = ∪_{j=1}^{4} I2,j . Repeating this construction yields, for every
n ∈ ℕ, a sequence of closed intervals In,1 , . . . , In,2^n , each of length 3^{−n}. Define
Cn = ∪_{j=1}^{2^n} In,j .
The Cantor set is defined to be C = ∩_{n=1}^{∞} Cn .
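The trisection step is easy to carry out with exact fractions. The sketch below is an illustration of the construction just described, not code from the text; the function name cantor_level is an assumption, and the intervals are returned as (left, right) pairs ordered from left to right.

```python
from fractions import Fraction

def cantor_level(n):
    """Return the closed intervals I_{n,1}, ..., I_{n,2^n} whose union is C_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_level = []
        for left, right in intervals:
            third = (right - left) / 3
            next_level.append((left, left + third))      # keep the left third
            next_level.append((right - third, right))    # keep the right third
        intervals = next_level
    return intervals

for left, right in cantor_level(2):
    print(f"[{left}, {right}]")
```

For n = 2 this prints [0, 1/9], [2/9, 1/3], [2/3, 7/9], and [8/9, 1], matching the intervals I2,1 through I2,4 listed above.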

It is clear that C is an infinite set because it contains the endpoints of each of the
intervals In,j for all n ∈ ℕ and all 1 ≤ j ≤ 2^n . What is less obvious is whether C
contains any additional points. The surprising answer is that C is uncountable.

First we establish some topological properties of C.

Definition. A closed subset A of a metric space X is said to be a perfect set if every
point of A is a limit point of A. Thus A is perfect if it is equal to its derived set.

Example 9. The closed unit interval [0, 1] is a perfect set. 

Definition. A subset A of a metric space X is said to be nowhere dense if
int(Ā) = ∅.

Example 10. A hyperplane M in ℝn is nowhere dense in ℝn .

Without loss of generality, assume that the hyperplane contains the origin.
Thus there is a nonzero vector a such that M = {x ∈ ℝn ∶ a^T x = 0}. Since M is
closed, it is enough to show that int(M) = ∅. For x ∈ M
and 𝜖 > 0, the open ball B = B(x, 𝜖) is not contained in M because the point
x + 𝜖a/(2‖a‖) ∈ B − M. ∎

Lemma 4.2.15. The Cantor set is closed, perfect, and nowhere dense.

Proof. Since each Cn is closed, C = ∩_{n=1}^{∞} Cn is closed.

We show that C contains no open intervals. This proves that int(C) = ∅ and, since
C is closed, that C is nowhere dense. Let J be an
open interval of length 𝜖 > 0, and choose an integer n such that 3^{−n} < 𝜖. Since
the length of each of the intervals In,j is 3^{−n}, none of the intervals In,j contains J.
Since J is connected,² it cannot be contained in the (disconnected) union of two or
more of the intervals In,j . Thus J is not contained in ∪_{j=1}^{2^n} In,j = Cn . Hence J is not
contained in C.

Finally, let x ∈ C and consider the interval (x − 𝜖, x + 𝜖). Again choose an integer n
such that 3^{−n} < 𝜖. Since x ∈ Cn , x ∈ In,j for some 1 ≤ j ≤ 2^n . Because the length of
In,j is 3^{−n} < 𝜖, In,j ⊆ (x − 𝜖, x + 𝜖). Thus (x − 𝜖, x + 𝜖) intersects C at a point other
than x (both endpoints of In,j belong to C, and at least one of them is not equal to x).
This shows that every point in C is a limit point of C; hence C is perfect. ∎

In the rest of this subsection and in the section exercises, we need to consider the
ternary (base 3) expansions of points in [0, 1]. Every point x ∈ [0, 1] has a ternary
representation x = ∑_{i=1}^{∞} a_i/3^i, where each a_i ∈ {0, 1, 2}. The sum may well be finite,
x = ∑_{i=1}^{n} a_i/3^i, and the ternary representation of a number may not be unique, but
this point is of no immediate consequence. However, see the section exercises.

Lemma 4.2.16. If x = ∑_{i=1}^{n} a_i/3^i, a_i ∈ {0, 2}, then x is the left endpoint of an interval
In,j for some 1 ≤ j ≤ 2^n .³

Proof. We prove the result by induction on n. When n = 1, x = a_1/3. If a_1 = 0, x = 0,
which is the left endpoint of I1,1 . If a_1 = 2, x = 2/3, which is the left endpoint of I1,2 .

Now suppose the statement is true for a certain integer n. Consider a number
y = ∑_{i=1}^{n+1} a_i/3^i, where a_i ∈ {0, 2}. If a_{n+1} = 0, there is nothing to prove, so suppose
a_{n+1} = 2. By the inductive hypothesis, the number x = ∑_{i=1}^{n} a_i/3^i is the left endpoint
of an interval In,j for some 1 ≤ j ≤ 2^n . Since y = x + 2/3^{n+1}, y is the left endpoint of
the right closed subinterval that results from the trisection of In,j . Thus y is the left
endpoint of an interval I_{n+1,k} for some 1 ≤ k ≤ 2^{n+1}. ∎

Proposition 4.2.17. If y = ∑_{i=1}^{∞} a_i/3^i, where a_i ∈ {0, 2}, then y ∈ C.

Proof. It is enough to prove that y ∈ Cn for every n ∈ ℕ. Let x = ∑_{i=1}^{n} a_i/3^i. By the
previous lemma, x is the left endpoint of some interval In,j . Since the length of In,j
is 3^{−n} and 0 ≤ y − x ≤ ∑_{i=n+1}^{∞} 2/3^i = 3^{−n}, y ∈ In,j ⊆ Cn . ∎
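The tail estimate used in the proof is a geometric series. For completeness, the computation can be written out as follows (the summation index k is introduced here only for the re-indexing):

```latex
\sum_{i=n+1}^{\infty} \frac{2}{3^{i}}
  = \frac{2}{3^{n+1}} \sum_{k=0}^{\infty} \frac{1}{3^{k}}
  = \frac{2}{3^{n+1}} \cdot \frac{1}{1 - \tfrac{1}{3}}
  = \frac{2}{3^{n+1}} \cdot \frac{3}{2}
  = 3^{-n}.
```

Thus y lies within 3^{−n} of the left endpoint x of In,j, which is exactly the length of In,j.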

² To say that J is connected means that if x, y ∈ J, and x < z < y, then z ∈ J.
³ Observe that the statement does not exclude the possibility that a_n = 0. This is because if a point
x is the left endpoint of an interval Im,k for some m and some 1 ≤ k ≤ 2^m , then x is the left endpoint
of In,j for every n > m and some 1 ≤ j ≤ 2^n . The reason is that the successive trisections of Im,k always
result in an interval (the leftmost) whose left endpoint is x.

We now need the binary (base 2) representations of numbers in the interval [0, 1].
In this system, every x ∈ [0, 1] can be written as a series ∑_{i=1}^{∞} a_i/2^i, where a_i ∈ {0, 1}.
Again, such a representation may be finite, x = ∑_{i=1}^{n} a_i/2^i, and, in this case, x does
not have a unique representation. For example, the number 1/2 can also be written
as 1/4 + 1/8 + 1/16 + . . . . In general, the number x = ∑_{i=1}^{n} a_i/2^i, where a_n = 1, can
also be written as x = ∑_{i=1}^{n−1} a_i/2^i + ∑_{i=n+1}^{∞} 1/2^i. In order to avoid ambiguity, we use
the latter representation of x and not the finite sum representation.

Theorem 4.2.18. The Cantor set has cardinality 𝔠.

Proof. We define a function f ∶ [0, 1] → C as follows: f(0) = 0, and, for x ∈ (0, 1],
write x = ∑_{i=1}^{∞} a_i/2^i and define f(x) = ∑_{i=1}^{∞} 2a_i/3^i. By the previous proposition, f(x) ∈
C. We leave it to the reader to verify that f is one-to-one. Now lemma 2.2.3 implies
that C is equivalent to [0, 1]; hence Card(C) = Card([0, 1]) = 𝔠. ∎
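The digit map behind f can be explored on finite digit strings. The following sketch is an illustration under stated assumptions: the names cantor_image and binary_value are hypothetical, and the code works with finite truncations only, so it approximates f rather than computing it. In particular, because the proof uses the non-terminating expansion 1/2 = 1/4 + 1/8 + . . . , the value f(1/2) is the limit 1/3 of the truncated sums below, not 2/3.

```python
import itertools
from fractions import Fraction

def binary_value(bits):
    """Sum of a_i / 2^i for the finite digit string bits = (a_1, ..., a_n)."""
    return sum(Fraction(a, 2 ** i) for i, a in enumerate(bits, start=1))

def cantor_image(bits):
    """Sum of 2 a_i / 3^i for the same digits: a truncation of f."""
    return sum(Fraction(2 * a, 3 ** i) for i, a in enumerate(bits, start=1))

# Truncations of the expansion 1/2 = 0.0111... (base 2) used in the proof.
for k in (1, 3, 6):
    bits = (0,) + (1,) * k
    print(binary_value(bits), cantor_image(bits))

# Distinct digit strings of a fixed length have distinct images.
images = {cantor_image(b) for b in itertools.product((0, 1), repeat=6)}
print(len(images))
```

The first column approaches 1/2 and the second approaches 1/3, consistent with the convention adopted above for numbers with two binary representations.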

Exercises

1. Which of the following subsets of ℝ2 are open?
(a) A = {(x, y) ∈ ℝ2 ∶ x ≠ 0, y < 1/x²}
(b) B = ∪a>0 {(x, y) ∈ ℝ2 ∶ x ∈ ℝ, y = ax }
2. Find the closure of each of the sets A and B in the previous problem.
3. Let X be a metric space, and let x ∈ X. Show that the set B[x, 𝛿] = {y ∈ X ∶
d(x, y) ≤ 𝛿} is closed in X. The set B[x, 𝛿] is called the closed ball of radius
𝛿 centered at x. Give an example to show that the closure of the open ball
B(x, 𝛿) is not necessarily the closed ball B[x, 𝛿].
4. Show that if X is a normed linear space, then the closure of the open ball
B(x, 𝛿) is the closed ball B[x, 𝛿].
5. Let ℋ = {x ∈ l2 ∶ |xn | ≤ 1/n}. Prove that ℋ is closed. This set is known as
the Hilbert cube.
6. Prove that a subset A of a metric space X is bounded if and only if it is
contained in a ball.
7. Let (X, d) be a metric space, let A ⊆ X, and let x, y ∈ X. Prove that
dist(x, A) ≤ d(x, y) + dist(y, A).
   8. Let X, A, and x be as in the previous exercise. Prove that dist(x, A) =
      dist(x, $\overline{A}$).
9. Let (X, d) be a metric space, and let A and B be nonempty subsets of X.
Prove that
(a) dist(A, B) = inf {dist(a, B) ∶ a ∈ A} = inf {dist(b, A) ∶ b ∈ B},
      (b) dist(A, B) = dist($\overline{A}$, $\overline{B}$), and
(c) there are disjoint closed subsets E and F (of ℝ) such that dist(E, F) = 0.
10. Let (X, d) be a metric space, and let A and B be subsets of X. Prove that
(a) int(A ∩ B) = int(A) ∩ int(B), and
(b) int(A ∪ B) ⊇ int(A) ∪ int(B), giving an example to show that the con-
tainment may be proper.
11. Let A and B be as in the previous exercise. Prove that
      (a) $\overline{A \cup B} = \overline{A} \cup \overline{B}$, and
      (b) $\overline{A \cap B} \subseteq \overline{A} \cap \overline{B}$, giving an example to show that the containment may be
          proper.
12. Show that if a sequence (xn ) in a metric space X converges to x, then {xn ∶
n ∈ ℕ} ∪ {x} is closed in X.
  13. Cantor-like sets. Let 0 < 𝜖 ≤ 1. From the unit interval [0, 1], remove the
      open subinterval of length 𝜖/3 centered at 1/2, leaving the two closed
      intervals $I_{1,1}$ and $I_{1,2}$. Then repeat the geometric construction of the Cantor
      set, except require that the removed open interval from $I_{n,j}$ be centered at
      the midpoint of $I_{n,j}$ and have length $\epsilon/3^{n+1}$. The resulting set, $C_\epsilon$, is known
      as a Cantor-like set. Prove that $C_\epsilon$ is closed, perfect, and nowhere dense.
  14. Complete the proof of theorem 4.2.18. Hint: Modify the proof of theorem
      2.1.15.
15. Prove the converse of lemma 4.2.16.
      We now take a more careful look at the ternary representation of numbers
      in [0, 1]. Specifically, we address the issue of the nonuniqueness of the
      representation of a finite sum $x = \sum_{i=1}^{n} a_i/3^i$, where $a_n \neq 0$. If $a_n = 2$, we use
      the finite sum to represent x. If $a_n = 1$, we use the series
      $x = \sum_{i=1}^{n-1} a_i/3^i + \sum_{i=n+1}^{\infty} 2/3^i$ and not the finite sum to represent x.
  16. Prove the converse of proposition 4.2.17. Thus the Cantor set consists
      of exactly the points in [0, 1] that have a ternary representation of the
      form $x = \sum_{i=1}^{\infty} a_i/3^i$, $a_i \in \{0, 2\}$. Hint: Prove that if $x = \sum_{i=1}^{\infty} a_i/3^i$, and any of the
      integers $a_i = 1$, then x ∉ C.
  17. Prove that a number x ∈ C is a right endpoint of an interval $I_{n,j}$ if and only
      if the ternary representation of x contains a finite number of zeros.
  18. Prove that the interior of the standard n-simplex $T_n$ consists of all the points
      in $T_n$ with positive barycentric coordinates. Hence describe the boundary
      of $T_n$.

4.3 Continuity and Equivalent Metrics

Continuity, from the intuitive point of view, is about the gradual rather than the
abrupt change of function values. In its simplest form, the graph of a continuous,
real-valued function of a single real variable must be connected. Most functions in
mathematics are too complicated for such a visual characterization of continuity,
and a more rigorous and robust definition is needed. The 𝜖-𝛿 definition of
continuity revolutionized calculus, and hence mathematics, in the early nineteenth
century. It is based on the idea that the fluctuations of a continuous function
can be controlled in a sufficiently small neighborhood of a point of continuity. Our
definition of local continuity in the metric setting is an immediate generalization
of the 𝜖-𝛿 definition. We then define the global continuity of a function on a metric
space, an important concept seldom treated in undergraduate textbooks. You will
see that continuity does not depend on the specific metric we use to measure
proximity, but rather on the collection of open sets the metric induces. This leads
us to the notion of equivalent metrics and, more generally, homeomorphisms.

Definition. Let (X, d) and (Y, 𝜌) be metric spaces. A function f ∶ X → Y is said to
   be continuous at a point x0 ∈ X if, for every 𝜖 > 0, there exists 𝛿 > 0 such that
   𝜌(f(x), f(x0 )) < 𝜖 whenever d(x, x0 ) < 𝛿.

The following theorem is an obvious restatement of the definition.

Theorem 4.3.1. Let (X, d) and (Y, 𝜌) be metric spaces. A function f ∶ X → Y is
   continuous at x0 if and only if the inverse image of every open ball in Y centered
   at f(x0 ) contains an open ball in X centered at x0 . ∎

Theorem 4.3.2. Let f ∶ (X, d) → (Y, 𝜌). Then f is continuous at x0 if and only if, for
   every sequence (xn ) in X with limn xn = x0 , limn f(xn ) = f(x0 ).

Proof. Suppose f is continuous at x0 , and let limn xn = x0 . Given 𝜖 > 0, there
   exists 𝛿 > 0 such that 𝜌(f(x), f(x0 )) < 𝜖 whenever d(x, x0 ) < 𝛿. Now there exists
   a natural number N such that, for n > N, d(xn , x0 ) < 𝛿. Thus, for n > N,
   𝜌(f(xn ), f(x0 )) < 𝜖, and limn f(xn ) = f(x0 ). Conversely, if f is not continuous at
   x0 , there exists 𝜖 > 0 such that $f^{-1}(B(f(x_0), \epsilon))$ contains no open ball centered
   at x0 , and hence $B(x_0, 1/n) - f^{-1}(B(f(x_0), \epsilon)) \neq \emptyset$ for every n ∈ ℕ. Pick a point
   $x_n \in B(x_0, 1/n) - f^{-1}(B(f(x_0), \epsilon))$. Clearly, limn xn = x0 , but limn f(xn ) ≠ f(x0 )
   because 𝜌(f(xn ), f(x0 )) ≥ 𝜖 for all n. ∎

Theorem 4.3.2 provides an extremely useful criterion for proving that a given
function is continuous. It is called the sequential characterization of continuity.
See examples 1 and 2 below.

Definition. A function f from a metric space (X, d) to a metric space (Y, 𝜌) is
   continuous on X if it is continuous at each point x ∈ X.

Theorem 4.3.3. For a function f from a metric space (X, d) to a metric space (Y, 𝜌),
the following are equivalent.

(a) f is continuous on X.
(b) The inverse image of an open subset of Y is an open subset of X.
(c) The inverse image of a closed subset of Y is a closed subset of X.

Proof. (a) implies (b). Let V be an open subset of Y, and let $x_0 \in f^{-1}(V)$. Since V is
   open, there exists 𝜖 > 0 such that B(f(x0 ), 𝜖) ⊆ V. Since f is continuous at x0 , there
   exists 𝛿 > 0 such that f(B(x0 , 𝛿)) ⊆ B(f(x0 ), 𝜖). Thus $f^{-1}(V) \supseteq f^{-1}(B(f(x_0), \epsilon)) \supseteq B(x_0, \delta)$.
   This proves that $f^{-1}(V)$ is open in X.
      (b) implies (c). Let F be a closed subset of Y. Then Y − F is open in Y. By
   assumption, $f^{-1}(Y - F)$ is open in X. But $f^{-1}(Y - F) = X - f^{-1}(F)$; hence $f^{-1}(F)$
   is closed in X.
      (c) implies (a). Let x0 ∈ X, let 𝜖 > 0, and let V = B(f(x0 ), 𝜖); Y − V is closed in Y, so, by
   assumption, $f^{-1}(Y - V) = X - f^{-1}(V)$ is closed in X, and hence $f^{-1}(V)$ is open.
   Because $x_0 \in f^{-1}(V)$, there exists 𝛿 > 0 such that $B(x_0, \delta) \subseteq f^{-1}(V)$. By theorem 4.3.1,
   f is continuous at x0 . ∎

Example 1 (the continuity of norms). Let (xn ) be a convergent sequence in a
   normed linear space, and suppose that limn xn = x. Then limn ‖xn ‖ = ‖x‖. This
   follows immediately from the fact that |‖xn ‖ − ‖x‖| ≤ ‖xn − x‖. ∎

Example 2 (the continuity of inner products). Let (xn ) and (yn ) be convergent
sequences in an inner product space with limits x and y, respectively. Then
limn ⟨xn , yn ⟩ = ⟨x, y⟩. First recall that convergent sequences are bounded. Thus
there is a constant M such that ‖yn ‖ ≤ M. Now

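$$\begin{aligned}
|\langle x_n, y_n\rangle - \langle x, y\rangle| &= |\langle x_n - x, y_n\rangle + \langle x, y_n - y\rangle| \\
&\le |\langle x_n - x, y_n\rangle| + |\langle x, y_n - y\rangle| \\
&\le \|x_n - x\|\,\|y_n\| + \|x\|\,\|y_n - y\| \\
&\le M\|x_n - x\| + \|x\|\,\|y_n - y\| \to 0 \text{ as } n \to \infty. \qquad ∎
\end{aligned}$$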

Definition. Let d1 and d2 be metrics on the same underlying set X. We say that d1
is weaker (or coarser) than d2 if every d1 -open subset of X is d2 -open. In this
case, we also say that d2 is stronger or finer than d1 .

Example 3. Let (X, d1 ) be any metric space, and let d2 be the discrete metric on X.
   Since every subset of X is d2 -open, d1 is weaker than d2 . We will give more interesting examples later. ∎

Theorem 4.3.4. A metric d1 is weaker than another metric d2 if and only if the
   identity function IX ∶ (X, d2 ) → (X, d1 ) is continuous.

Proof. If IX ∶ (X, d2 ) → (X, d1 ) is continuous and V is d1 -open, then $I_X^{-1}(V)$ is
   d2 -open. But $I_X^{-1}(V) = V$, so d1 is weaker than d2 . The converse is proved by reversing
   the above reasoning. ∎

Now we discuss concrete criteria that guarantee that a metric d1 is weaker than d2 .
Since every d1 -open set is the union of d1 -open balls, it suffices to show that every
d1 -open ball $B_{d_1}(x, \delta)$ is d2 -open. Since every $y \in B_{d_1}(x, \delta)$ is the center of a ball
$B_{d_1}(y, \delta') \subseteq B_{d_1}(x, \delta)$, it is further sufficient to show that every open ball $B_{d_1}(y, \delta')$
contains a d2 -open ball $B_{d_2}(y, \epsilon)$ for some 𝜖 > 0. We apply the above strategy to
prove the following theorem.

Theorem 4.3.5. If there exists a real number 𝛼 > 0 such that d1 (x, y) ≤ 𝛼d2 (x, y) for
all x, y ∈ X, then d1 is weaker than d2 .

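Proof. Consider a d1 -open ball $B_{d_1}(x, \delta)$, and let 𝜖 = 𝛿/𝛼. It follows that $B_{d_2}(x, \epsilon) \subseteq B_{d_1}(x, \delta)$,
   because if $y \in B_{d_2}(x, \epsilon)$, then d1 (x, y) ≤ 𝛼d2 (x, y) < 𝛼𝜖 = 𝛿. By the discussion preceding
   the theorem, d1 is weaker than d2 . ∎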

Now we look at more significant examples of the concepts we developed.

Example 4. It is clear that l1 ⊆ l∞ since the terms of an absolutely convergent series
   form a bounded sequence. Thus the space X = l1 has two metrics: the metric induced by the 1-
   norm ‖.‖1 and that induced by the infinity norm ‖.‖∞ . Since, for x ∈ l1 , ‖x‖∞ ≤
   ‖x‖1 , we have d∞ (x, y) = ‖x − y‖∞ ≤ ‖x − y‖1 = d1 (x, y), and, by theorem 4.3.5, the
   infinity metric on X is weaker than the 1-metric on X. ∎

Example 5. Consider the space X = 𝒞[0, 1] under the uniform metric and the
   1-metric. The identity function IX ∶ (X, ‖.‖∞ ) → (X, ‖.‖1 ) is continuous since,
   for f ∈ 𝒞[0, 1], ‖f‖1 ≤ ‖f‖∞ . By theorem 4.3.4, the 1-metric is weaker than the
   uniform metric. However, the identity function IX ∶ (X, ‖.‖1 ) → (X, ‖.‖∞ ) is not
   continuous. To see this, consider the sequence (see section 3.6)

$$f_n(x) = \begin{cases}
2n^3 x & \text{if } 0 \le x \le \frac{1}{2n^2}, \\
-2n^3\left(x - \frac{1}{n^2}\right) & \text{if } \frac{1}{2n^2} \le x \le \frac{1}{n^2}, \\
0 & \text{if } \frac{1}{n^2} \le x \le 1.
\end{cases}$$

   Here $\|f_n\|_1 = \frac{1}{2n}$; hence fn → 0 in the 1-norm, while fn does not converge in the
   uniform norm since ‖fn ‖∞ = n. ∎
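
The two norms of the tent functions can be checked with exact arithmetic. This is an illustrative sketch only: the function names are hypothetical, and the breakpoints 1/(2n^2) and 1/n^2 are read off from the definition of fn above.

```python
from fractions import Fraction

def f(n, x):
    """Tent function f_n: rises to the peak n at x = 1/(2n^2), vanishes beyond 1/n^2."""
    a = Fraction(1, 2 * n * n)   # location of the peak
    b = Fraction(1, n * n)       # right end of the support
    if 0 <= x <= a:
        return 2 * n**3 * x
    if a <= x <= b:
        return -2 * n**3 * (x - b)
    return Fraction(0)

def breakpoints(n):
    return [Fraction(0), Fraction(1, 2 * n * n), Fraction(1, n * n), Fraction(1)]

def sup_norm(n):
    # f_n is piecewise linear and nonnegative, so its maximum is attained at a breakpoint.
    return max(f(n, x) for x in breakpoints(n))

def one_norm(n):
    # The trapezoid rule is exact on each linear piece of a nonnegative function.
    pts = breakpoints(n)
    return sum((r - l) * (f(n, l) + f(n, r)) / 2 for l, r in zip(pts, pts[1:]))

for n in (1, 2, 5, 10):
    print(n, one_norm(n), sup_norm(n))
```

The 1-norms shrink like 1/(2n) while the uniform norms grow like n, which is exactly why the identity from the 1-metric to the uniform metric fails to be continuous.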

Definition. Two metrics d1 and d2 on X are equivalent if they generate the same
collection of open sets. Thus d1 and d2 are equivalent if d1 is weaker than d2 and
d2 is weaker than d1 .

Definition. A bijection f from a metric space (X, d) to a metric space (Y, 𝜌) is
   bicontinuous if f and $f^{-1}$ are continuous.

By theorem 4.3.4, the following result is immediate.

Theorem 4.3.6. Two metrics d1 and d2 on a set X are equivalent if and only if the
   identity function IX ∶ (X, d1 ) → (X, d2 ) is bicontinuous. ∎

Theorem 4.3.5 directly implies the following theorem.

Theorem 4.3.7. If there exist positive constants 𝛼 and 𝛽 such that 𝛽d2 (x, y) ≤
   d1 (x, y) ≤ 𝛼d2 (x, y) for every x, y ∈ X, then d1 and d2 are equivalent. ∎

The following theorem gives a sequential characterization of the equivalence of
two metrics.

Theorem 4.3.8. A necessary and sufficient condition for two metrics d1 and d2 on
X to be equivalent is that a sequence (xn ) converges to x in d1 if and only if it
converges to x in d2 .

Proof. Suppose d1 and d2 are equivalent, and let limn xn = x in d1 . By theorem 4.3.6,
   IX ∶ (X, d1 ) → (X, d2 ) is continuous; hence, by the sequential characterization of
   continuity, IX (xn ) = xn converges to x in d2 . We leave the rest of the proof to the
   reader. ∎

Example 6. Let X = $\mathbb{R}^n$. The metrics induced by the 1-norm, the 2-norm, and
   the ∞-norm are all equivalent. To see this, we use theorem 4.3.7. The reader
   should work out the details. A partial list of the inequalities needed includes
   ‖x‖1 ≤ n‖x‖∞ and ‖x‖1 ≤ √n ‖x‖2 . ∎
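
As a sanity check before working out the details, the inequalities can be tested on random vectors. This is only a numerical sketch, not a proof: the two inequalities quoted in the example are tested together with the standard chain ‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1, which is an addition here, and the small tolerance `tol` absorbs floating-point rounding.

```python
import math
import random

def norm1(x):
    return sum(abs(t) for t in x)

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    return max(abs(t) for t in x)

def inequalities_hold(x, tol=1e-9):
    """Check the norm comparisons on R^n for a single vector x."""
    n = len(x)
    return (norm_inf(x) <= norm2(x) + tol
            and norm2(x) <= norm1(x) + tol
            and norm1(x) <= math.sqrt(n) * norm2(x) + tol
            and norm1(x) <= n * norm_inf(x) + tol)

random.seed(0)  # fixed seed so the experiment is reproducible
samples = [[random.uniform(-10, 10) for _ in range(5)] for _ in range(1000)]
print(all(inequalities_hold(x) for x in samples))
```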

Example 7. Let (X, d) be a metric space. Then the metric $\bar{d}(x, y) = \min\{1, d(x, y)\}$
   is equivalent to d. It is a simple exercise to show that $\bar{d}$ is a metric. The fact that
   the two metrics are equivalent follows from $B_d(x, \epsilon) \subseteq B_{\bar{d}}(x, \epsilon)$ and $B_{\bar{d}}(x, \delta) \subseteq B_d(x, \epsilon)$,
   where 𝛿 = min{𝜖, 1}. ∎

Remarks.
1. Important properties of metric spaces are often determined by the collection
of open sets and not by the specific metric that generates the open sets. For
   example, a function f from a metric space (X, d) to a metric space (Y, 𝜌) is
   continuous if and only if the inverse image of an open subset of Y is open
   in X. Clearly, if the metric d is replaced with an equivalent metric d1 , then
   f is continuous with respect to d1 . The same is true if 𝜌 is replaced with an
   equivalent metric. In many ways, the collection of open sets in a metric space
   is almost as intrinsic to the space as the specific metric that generates the
   open sets.
2. Not all metric properties are preserved under metric equivalence. Observe
   that the metric $\bar{d}$ in example 7 above is a bounded metric because $\bar{d}(x, y) \le 1$
   for all x, y ∈ X, even though the metric d may be unbounded. For example, the metric $\bar{d}$ on
   ℝ is equivalent to the usual metric on ℝ, which is unbounded. In particular, boundedness is not
   preserved under metric equivalence.

We include the following as another example of a bounded metric that is equivalent
to an arbitrary metric.

Example 8. For an arbitrary metric space (X, d), the metric $\bar{d}(x, y) = \frac{d(x, y)}{1 + d(x, y)}$ is
   equivalent to d.

   We show that $\bar{d}$ satisfies the triangle inequality and leave the rest of the details
   for the reader to verify. The function f ∶ [0, ∞) → [0, 1) defined by $f(t) = \frac{t}{1+t}$ is
   increasing. Thus if 0 ≤ a ≤ b, then $\frac{a}{1+a} \le \frac{b}{1+b}$. Replacing a with d(x, z), and b with
   d(x, y) + d(y, z), yields

$$\begin{aligned}
\bar{d}(x, z) = \frac{d(x, z)}{1 + d(x, z)} &\le \frac{d(x, y) + d(y, z)}{1 + d(x, y) + d(y, z)} \\
&= \frac{d(x, y)}{1 + d(x, y) + d(y, z)} + \frac{d(y, z)}{1 + d(x, y) + d(y, z)} \\
&\le \frac{d(x, y)}{1 + d(x, y)} + \frac{d(y, z)}{1 + d(y, z)} = \bar{d}(x, y) + \bar{d}(y, z). \qquad ∎
\end{aligned}$$
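
The triangle inequality for the bounded metric d/(1 + d) can also be spot-checked by brute force. The sketch below assumes X = ℝ with the usual metric and a small grid of rational sample points; both choices are illustrative and not taken from the text.

```python
from fractions import Fraction
from itertools import product

def d(x, y):
    """The usual metric on the real line."""
    return abs(x - y)

def d_bar(x, y):
    """The bounded metric d / (1 + d) built from d."""
    return d(x, y) / (1 + d(x, y))

# Rational sample points, so every comparison below is exact.
points = [Fraction(k, 3) for k in range(-9, 10)]

triangle_ok = all(d_bar(x, z) <= d_bar(x, y) + d_bar(y, z)
                  for x, y, z in product(points, repeat=3))
bounded = all(d_bar(x, y) < 1 for x, y in product(points, repeat=2))
print(triangle_ok, bounded)
```

A finite check of this kind proves nothing in general, but it is a quick way to catch a mis-stated formula.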

Definition. Let (X, d) and (Y, 𝜌) be metric spaces. A function 𝜑 ∶ X → Y is said
   to be an isometry if, for every x, y ∈ X, 𝜌(𝜑(x), 𝜑(y)) = d(x, y). Notice that an
   isometry is always injective. We say that the metric spaces (X, d) and (Y, 𝜌) are
   isometric if there is a bijective isometry 𝜑 ∶ (X, d) → (Y, 𝜌).

Example 9 (linear isometries on $\mathbb{R}^n$). Let T be a linear isometry on $\mathbb{R}^n$. Then
   there exists an orthogonal matrix P such that T(x) = Px for every x ∈ $\mathbb{R}^n$.
   Observe that the converse was established in the exercises in section 3.7.
      Let P be the standard matrix of T. We prove that P is orthogonal. The
   assumption is that, for x, y ∈ $\mathbb{R}^n$, ‖Px − Py‖ = ‖x − y‖. In particular, taking
   y = 0, we have ‖Px‖ = ‖x‖. First we claim that, for x, y ∈ $\mathbb{R}^n$, ⟨Px, Py⟩ = ⟨x, y⟩.
   This will conclude the proof because we then have

$$\langle P^T P x - x, y\rangle = \langle P^T P x, y\rangle - \langle x, y\rangle = \langle Px, Py\rangle - \langle x, y\rangle = 0.$$

   Choosing $y = P^T P x - x$, we obtain $\langle P^T P x - x, P^T P x - x\rangle = 0$, or

$$(P^T P - I_n)x = 0.$$

   Since x is arbitrary, $P^T P - I_n = 0$.
      We now prove the claim. The assumption that $\|Px - Py\|^2 = \|x - y\|^2$ yields
   ⟨Px − Py, Px − Py⟩ = ⟨x − y, x − y⟩. Expanding the bilinear forms on the two
   sides of the last identity gives $\|Px\|^2 - 2\langle Px, Py\rangle + \|Py\|^2 = \|x\|^2 - 2\langle x, y\rangle + \|y\|^2$.
   Since ‖Px‖ = ‖x‖ and ‖Py‖ = ‖y‖, it follows that ⟨Px, Py⟩ = ⟨x, y⟩, as claimed. ∎
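
The relationship between orthogonality and distance preservation is easy to observe on a concrete matrix. The sketch below uses a plane rotation as an assumed example of an orthogonal matrix; the angle and the test vectors are arbitrary, and the helper names are hypothetical.

```python
import math

def matvec(P, x):
    """Matrix-vector product Px, with P given as a list of rows."""
    return [sum(p * t for p, t in zip(row, x)) for row in P]

def gram(P):
    """Entries of P^T P, computed as dot products of the columns of P."""
    cols = list(zip(*P))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def dist(x, y):
    """Euclidean distance between x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

theta = 0.7  # arbitrary rotation angle
P = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
x, y = [1.0, 2.0], [-3.0, 0.5]

print(gram(P))                                        # close to the identity matrix
print(dist(matvec(P, x), matvec(P, y)), dist(x, y))   # the two distances agree up to rounding
```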

Isometric spaces are virtually identical except for the nature of the elements of the
spaces X and Y and the definition of the metrics d and 𝜌. An isometry preserves
all the metric properties of the space, including boundedness, which, as we saw, is
not preserved under the equivalence of metrics. Another metric property that is
preserved under isometries but not under metric equivalence is completeness. See
section 4.6.

Homeomorphisms

The concept of a homeomorphism is of central importance in topology. In the metric
setting, isometry, although quite useful, is too stringent a requirement, and this leaves
ample room for homeomorphisms to be useful. One can loosely think of a homeomorphism
as a relaxation of the concept of isometry and an extension of the notion of metric
equivalence.

Definition. Two metric spaces (X, d) and (Y, 𝜌) are homeomorphic if there exists
a bicontinuous bijection 𝜑 from X to Y. The function 𝜑 is called a homeomor-
phism from X to Y.

Example 10. The open interval (−1, 1) is homeomorphic to ℝ (both sets have the
   usual metric). The function $f(t) = \frac{t}{1 - t^2}$ maps (−1, 1) bicontinuously onto ℝ. ∎

Example 11. The closed upper half plane H = ℝ × [0, ∞) is homeomorphic to
   the half-open strip A = ℝ × [0, 1). To see this, define 𝜑 ∶ H → A by
   $\varphi(x, y) = \left(x, \frac{y}{1+y}\right)$. It is a rather routine matter to verify that 𝜑 is a bijection, that its
   inverse is $\varphi^{-1}(x, t) = \left(x, \frac{t}{1-t}\right)$, and that both maps are continuous. ∎

Example 12 (the stereographic projection). Let $\mathcal{S}_1 = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : \xi_1^2 + \xi_2^2 - \xi_2 = 0\}$
   be the circle of diameter 1 centered at the point (0, 1/2), and let N =
   (0, 1) be the top point on the circle. Define the punctured circle to be the circle
   with the top point removed: $\mathcal{S}_1^* = \mathcal{S}_1 - \{N\}$. We give $\mathcal{S}_1^*$ the Euclidean metric in
   the plane. Define a bijection $P : \mathcal{S}_1^* \to \mathbb{R}$ as follows: for a point $\xi = (\xi_1, \xi_2) \in \mathcal{S}_1^*$,
   P(𝜉) is the horizontal intercept of the line that contains the points N and 𝜉, as
   shown in figure 4.1.

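   [Figure 4.1 The stereographic projection. The line through N = (0, 1) and the point $\xi = P^{-1}(x)$ on the circle meets the horizontal axis at x = P(𝜉); the circle passes through (0, 0).]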

   The mapping P is known as the stereographic projection of the punctured circle
   onto the real line. It is geometrically clear that P is a bijection and that it is
   bicontinuous: the inverse image of a bounded open interval in ℝ is an open arc
   on $\mathcal{S}_1^*$, and conversely.

   Explicit formulas exist for P and $P^{-1}$. It is easier to derive the formula for $P^{-1}$
   than to compute that for P. For a fixed x ∈ ℝ, the parametric equations of the
   line containing the points N and (x, 0) are $\xi_1 = xt$, $\xi_2 = 1 - t$, where −∞ < t < ∞.
   Substituting into the equation of the circle gives $t\,[(1 + x^2)t - 1] = 0$; the root t = 0
   corresponds to N, and the root $t = 1/(1 + x^2)$ yields the formula for $P^{-1}$:

$$P^{-1}(x) = \xi = (\xi_1, \xi_2) = \left(\frac{x}{1 + x^2}, \frac{x^2}{1 + x^2}\right).$$

   Inverting the above formulas, one obtains the following formula for the stereographic
   projection:

$$x = P(\xi) = \frac{\xi_1}{1 - \xi_2}.$$

   We define the chordal metric 𝜒(x, y) on ℝ as follows: for two points x, y ∈ ℝ,
   𝜒(x, y) is the length of the chord of the circle that joins the points $P^{-1}(x)$ and
   $P^{-1}(y)$, hence the name chordal metric. Note that 𝜒 is the metric on ℝ that
   makes the stereographic projection an isometry. Given the above formula for
   $P^{-1}$, a direct calculation of the Euclidean distance between $P^{-1}(x)$ and $P^{-1}(y)$
   yields

$$\chi(x, y) = \frac{|x - y|}{\sqrt{1 + x^2}\,\sqrt{1 + y^2}}.$$
   Because the stereographic projection is a homeomorphism, the chordal metric
   𝜒 is equivalent to the usual metric on ℝ. We will see in section 4.6 that the
   chordal metric is not complete. This illustrates the fact that completeness is not
   preserved under homeomorphisms. The reader will recall that boundedness
   is not preserved under metric equivalence. Saying that two metrics d1 and
   d2 on a space X are equivalent is exactly the same as saying that the identity
   mapping IX ∶ (X, d1 ) → (X, d2 ) is a homeomorphism. The properties of a space
   that are preserved under homeomorphisms are called topological properties
   of the space. Compactness is the prime example of a topological property;
   see theorem 4.7.4. The fact that some metric properties, such as boundedness
   and completeness, fail to be preserved under homeomorphisms is a rather
   inconvenient fact, but it does not diminish the usefulness of such properties. ∎
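
The formulas of this example lend themselves to a quick numerical check. The following sketch implements P, its inverse, and the chordal metric as given in the example, and compares the closed-form chordal distance with the Euclidean length of the chord; the function names and the sample points are illustrative choices.

```python
import math

def P(xi):
    """Stereographic projection of a point xi = (xi1, xi2) of the punctured circle."""
    xi1, xi2 = xi
    return xi1 / (1.0 - xi2)

def P_inv(x):
    """Inverse stereographic projection of a real number x."""
    return (x / (1 + x * x), x * x / (1 + x * x))

def chordal(x, y):
    """Closed-form chordal metric on the real line."""
    return abs(x - y) / (math.sqrt(1 + x * x) * math.sqrt(1 + y * y))

def chord_length(x, y):
    """Euclidean length of the chord joining P_inv(x) and P_inv(y)."""
    return math.dist(P_inv(x), P_inv(y))

for x, y in [(0.0, 1.0), (-3.0, 2.0), (0.25, -0.5)]:
    print(x, y, chordal(x, y), chord_length(x, y))
```

Since the chord can never be longer than the diameter, the chordal metric is bounded by 1, in contrast to the usual metric on ℝ.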

Example 13. Stereographic projections can be defined in all dimensions. Let
   $\mathcal{S}_n = \{\xi = (\xi_1, \xi_2, \dots, \xi_{n+1}) \in \mathbb{R}^{n+1} : \sum_{i=1}^{n+1} \xi_i^2 - \xi_{n+1} = 0\}$ be the sphere in $\mathbb{R}^{n+1}$ of
   diameter 1 and center $(0, 0, \dots, 0, 1/2) \in \mathbb{R}^{n+1}$, and let $N = (0, 0, \dots, 0, 1) \in \mathcal{S}_n$.
   Define the punctured sphere $\mathcal{S}_n^* = \mathcal{S}_n - \{N\}$. The stereographic projection P of
   $\mathcal{S}_n^*$ onto $\mathbb{R}^n$ maps a point $\xi = (\xi_1, \xi_2, \dots, \xi_{n+1})$ on the punctured sphere to the
   intersection of the hyperplane $\xi_{n+1} = 0$ and the line that contains the points
   𝜉 and N. As in the one-dimensional case, it is easier to compute the formula
   for $P^{-1}$ than that for P. The calculations needed for computing the formulas
   for P and $P^{-1}$ are left as an exercise (see problem 18). The continuity of all the
   component functions shows that P is a homeomorphism. See theorem 4.4.6.
   One can also define the chordal metric on $\mathbb{R}^n$ by $\chi(x, y) = \|P^{-1}(x) - P^{-1}(y)\|_2$.
   See problem 19 in the section exercises. ∎

One important special case is when n = 2. This is relevant to the one-point
compactification of the plane; see section 5.10.

Exercises

1. Let 𝕂 denote the real or complex field with the usual metric. Prove that if f
and g are continuous functions from a metric space (X, d) to 𝕂, then so are
the functions f ± g and fg. If, in addition, g(x) ≠ 0 for all x ∈ X, then f/g is
continuous.
   2. Let f be a continuous function from a metric space X to a metric space Y,
      and let g be a continuous function from Y to a metric space Z. Prove that
      the composition g ∘ f ∶ X → Z is continuous.
3. Let f be a continuous function from a metric space X to a metric space Y,
and let A ⊆ X. Prove that the restriction of f to A is continuous when A is
given the restricted metric.
4. Fix an element a of a metric space X, and define a function f ∶ X → ℝ by
f(x) = d(x, a). Prove that f is continuous.
5. Let A be a fixed subset of a metric space X, and define a function f ∶ X → ℝ
by f(x) = dist(x, A). Prove that f is continuous.
   6. Let E be a closed subset of a metric space X, and let a ∈ X − E. Prove
      that there exists a continuous function f ∶ X → ℝ such that f(a) = 0, f(E) =
      1. Hint: Consider the function $f(x) = \frac{d(x, a)}{d(x, a) + \operatorname{dist}(x, E)}$. Show how this result
      provides an alternative proof of theorem 4.2.12.
7. Let E and F be disjoint closed subsets of a metric space X. Prove that there
exists a continuous function f ∶ X → ℝ such that f(E) = 0, f(F) = 1. Show
how this result provides an alternative proof of theorem 4.2.13.
8. Let (X, d) be a metric space, and let E and F be disjoint closed subsets
of X. Set U = {x ∈ X ∶ dist(x, E) < dist(x, F)}, and V = {x ∈ X ∶ dist(x, F) <
dist(x, E)}. Show that U and V are open sets that separate E and F.
9. Let f and g be continuous functions from a metric space X to a metric space
Y. Prove that {x ∈ X ∶ f(x) ≠ g(x)} is an open subset of X.
  10. Let f and g be continuous functions from a metric space X to a metric space
      Y, and let A be a subset of X such that f(x) = g(x) for every x ∈ A. Prove that
      f(x) = g(x) for all x ∈ $\overline{A}$.
  11. Prove that the metric $\bar{d}$ in example 8 is equivalent to d.
  12. Show that the converse of theorem 4.3.5 is false by finding a metric d1 that
      is weaker than another metric d2 but for which there exists no constant 𝛼 > 0 such
      that d1 (x, y) ≤ 𝛼d2 (x, y) for all x, y ∈ X.
13. Fix an element x0 of a normed linear space X, and define a function 𝜑 ∶
X → X by 𝜑(x) = x + x0 . Show that 𝜑 is an isometry.
14. Define a function 𝜑 ∶ X → X on a normed linear space X by 𝜑(x) = −x.
Show that 𝜑 is an isometry.
  15. Show that the open unit disk $U = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}$ is homeomorphic
      to the plane $\mathbb{R}^2$. Hence show that the punctured disk U − {(0, 0)}
      is homeomorphic to the punctured plane $\mathbb{R}^2$ − {(0, 0)}. The same results
      extend to $\mathbb{R}^n$.
16. Let f, g ∶ ℝ → ℝ be continuous functions such that f(x) < g(x) for all x ∈ ℝ.
Show that the region between the graphs, {(x, y) ∶ x ∈ ℝ, f(x) < y < g(x)}, is
homeomorphic to the open strip ℝ × (0, 1).
17. Let X be a normed linear space. Prove that any two open balls in X are
homeomorphic. The same is true of closed balls.
  18. The parametric equations of the line containing the points (0, 0, . . . , 1) and
      $(x_1, x_2, \dots, x_n, 0)$ in $\mathbb{R}^{n+1}$ are

$$\xi = (\xi_1, \dots, \xi_{n+1}) = (tx_1, tx_2, \dots, tx_n, 1 - t).$$

      Find the point of intersection of the line and the sphere,

$$\Big\{\xi = (\xi_1, \xi_2, \dots, \xi_{n+1}) : \sum_{i=1}^{n+1} \xi_i^2 - \xi_{n+1} = 0\Big\},$$

      to derive the formula for the inverse of the stereographic projection,
      $P^{-1}(x_1, \dots, x_n) = (\xi_1, \dots, \xi_{n+1})$, where

$$\xi_i = \frac{x_i}{1 + \|x\|_2^2}, \quad 1 \le i \le n, \qquad \xi_{n+1} = \frac{\|x\|_2^2}{1 + \|x\|_2^2}.$$

      Hence, by inverting the above formulas, derive the formula for the stereographic
      projection,

$$P(\xi_1, \dots, \xi_{n+1}) = (x_1, \dots, x_n), \quad \text{where } x_i = \frac{\xi_i}{1 - \xi_{n+1}}.$$

  19. Derive the formula for the chordal metric on $\mathbb{R}^n$,

$$\chi(x, y) = \frac{\|x - y\|_2}{\sqrt{1 + \|x\|_2^2}\,\sqrt{1 + \|y\|_2^2}}.$$

4.4 Product Spaces

The Euclidean plane ℝ2 , as the product of two copies of ℝ, is the simplest example
of a product space. We saw in section 4.3 that the Euclidean metric in the plane,
although the most natural, is equivalent to several other metrics, including the
∞-metric, which, according to the definition below, is the product metric on
ℝ2 . It is only natural to expect that the product of two open intervals should
be an open subset of ℝ2 , and the definition we adopt for the product metric
smoothly guarantees that. When we identify the complex field with ℝ2 , the
convergence of a complex sequence zn = xn + iyn is equivalent to the convergence
of its real and imaginary parts in ℝ, and one expects that product metrics in
general should extend this property. Not only does the product metric preserve the
componentwise convergence in the factor spaces, it is characterized by it. You will
see that the product metric is the weakest metric that guarantees componentwise
convergence in the factor spaces. Additionally, we will show that the product
metric admits the continuity of the projections on the factor spaces and, once
again, is characterized by it. We therefore think of the product metric as the most
economical metric that generalizes the properties of Euclidean space in relation
to its factor spaces.
   Let $\{(X_i, d_i)\}_{i=1}^{n}$ be a finite set of metric spaces, and let $X = \prod_{i=1}^{n} X_i =
\{(x_1, \dots, x_n) : x_i \in X_i\}$ be the Cartesian product of the underlying sets $X_i$.

Definition. The product metric D on X is defined by

$$D(x, y) = \max_{1 \le i \le n} d_i(x_i, y_i).$$

   Here $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ are points in X. The verification that D is a
   metric is straightforward.

Example 1. For 1 ≤ i ≤ n, take $X_i = \mathbb{R}$, and let $d_i$ be the usual metric on ℝ. The
   product metric D on $\prod_{i=1}^{n} X_i = \mathbb{R}^n$ is exactly the ∞-metric on $\mathbb{R}^n$. ∎

For x ∈ X and 𝛿 > 0, we denote the D-ball in X of radius 𝛿 centered at x by BD (x, 𝛿).

Theorem 4.4.1. If x ∈ X and 𝛿 > 0, then

$$B_D(x, \delta) = B_{d_1}(x_1, \delta) \times \dots \times B_{d_n}(x_n, \delta).$$

Proof. A point $y = (y_1, \dots, y_n)$ is in $B_D(x, \delta)$ if and only if $\max_{1 \le i \le n} d_i(x_i, y_i) < \delta$, if
   and only if $d_i(x_i, y_i) < \delta$ for each 1 ≤ i ≤ n, if and only if $y_i \in B_{d_i}(x_i, \delta)$ for each
   1 ≤ i ≤ n, if and only if $y \in \prod_{i=1}^{n} B_{d_i}(x_i, \delta)$. ∎

Theorem 4.4.2. If $U_i$ is open in $X_i$ for each 1 ≤ i ≤ n, then the set $U = \prod_{i=1}^{n} U_i$ is
   open in (X, D).

Proof. Let $x \in \prod_{i=1}^{n} U_i$. Then $x_i \in U_i$, and hence there exists $\delta_i > 0$ such that
   $B_{d_i}(x_i, \delta_i) \subseteq U_i$. Let $\delta = \min_{1 \le i \le n} \delta_i$. Clearly, $\prod_{i=1}^{n} B_{d_i}(x_i, \delta) \subseteq \prod_{i=1}^{n} B_{d_i}(x_i, \delta_i) \subseteq
   \prod_{i=1}^{n} U_i$. By theorem 4.4.1, $\prod_{i=1}^{n} B_{d_i}(x_i, \delta) = B_D(x, \delta)$. Thus $x \in B_D(x, \delta) \subseteq U$,
   which proves that U is open. ∎

Remark. For a fixed 1 ≤ i ≤ n, let $U_i$ be an open subset of $X_i$. As an immediate
   consequence of theorem 4.4.2, the set $X_1 \times \dots \times X_{i-1} \times U_i \times X_{i+1} \times \dots \times X_n$ is
   open in X. It follows that $X - [X_1 \times \dots \times X_{i-1} \times U_i \times X_{i+1} \times \dots \times X_n] = X_1 \times
   \dots \times X_{i-1} \times (X_i - U_i) \times X_{i+1} \times \dots \times X_n$ is closed in X.

Theorem 4.4.3. If $F_1, \dots, F_n$ are closed subsets of $X_1, \dots, X_n$, respectively, then
   $F_1 \times \dots \times F_n$ is closed in X.

Proof. Let $U_i = X_i - F_i$. Then $U_i$ is open in $X_i$, and $F = \prod_{i=1}^{n} F_i = \prod_{i=1}^{n} (X_i - U_i) =
   \bigcap_{i=1}^{n} [X_1 \times \dots \times X_{i-1} \times (X_i - U_i) \times X_{i+1} \times \dots \times X_n]$. By the above remark, each of the sets
   $X_1 \times \dots \times X_{i-1} \times (X_i - U_i) \times X_{i+1} \times \dots \times X_n$ is closed, and hence F is closed. ∎
Theorem 4.4.4. Suppose $(X, D) = \prod_{i=1}^{n} (X_i, d_i)$. Let $(x^{(k)})_{k=1}^{\infty}$ be a sequence in X,
   and let $x = (x_1, \dots, x_n) \in X$. Write $x^{(k)} = (x_1^{(k)}, \dots, x_n^{(k)})$. Then $\lim_k x^{(k)} = x$ in D
   if and only if $\lim_k x_i^{(k)} = x_i$ in $d_i$ for each 1 ≤ i ≤ n.

Proof. Because $d_i(x_i^{(k)}, x_i) \le D(x^{(k)}, x)$, $\lim_k D(x^{(k)}, x) = 0$ implies that
   $\lim_k d_i(x_i^{(k)}, x_i) = 0$. Conversely, if $\lim_k x_i^{(k)} = x_i$ in $d_i$ for each 1 ≤ i ≤ n, then
   $\lim_k \max_{1 \le i \le n} d_i(x_i^{(k)}, x_i) = 0$, and hence $\lim_k x^{(k)} = x$. ∎

Theorem 4.4.4 says that the convergence of a sequence in the product metric D
is equivalent to the convergence of each of the component sequences (compo-
nentwise convergence). In fact, componentwise convergence characterizes all the
metrics on X that are equivalent to the product metric D, as the following theorem
shows.

Theorem 4.4.5. Suppose D∗ is a metric on the product space X where convergence
   in D∗ is equivalent to componentwise convergence. Then D∗ is equivalent to D.

Proof. We use theorem 4.3.8. The metrics D and D∗ are equivalent if and only if
   convergence of a sequence in one metric occurs if and only if it occurs in the other
   metric. Clearly, this is the case for D and D∗ , since convergence in either metric is
   equivalent to componentwise convergence. ∎

Example 2. To illustrate the importance of the above theorem, note that each of
   the following metrics is equivalent to the product metric D on X:

$$D_1(x, y) = \sum_{i=1}^{n} d_i(x_i, y_i), \qquad D_2(x, y) = \Big(\sum_{i=1}^{n} d_i^2(x_i, y_i)\Big)^{1/2}.$$

   It is clear that convergence of a sequence in (X, D1 ) or (X, D2 ) occurs exactly
   when the component sequences converge.
      Therefore we can use either of the metrics D1 or D2 or any other metric
   equivalent to D as a definition of the product space, since they all generate the
   same collection of open sets. It is common to use whatever metric happens
   to be convenient in any particular situation. Theorem 4.4.5 also yields an
   equivalent metric for the product space if the metrics $d_1, \dots, d_n$ are replaced
   with equivalent metrics. ∎
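
The three product metrics are simple to compute side by side. In the sketch below the factor spaces (two copies of ℝ with the usual metric and one set with the discrete metric) are assumptions chosen for illustration; the comparison D ≤ D2 ≤ D1 ≤ nD that the example prints is a standard fact which gives another route to the equivalence via theorem 4.3.7.

```python
import math

def usual(a, b):
    """The usual metric on the real line."""
    return abs(a - b)

def discrete(a, b):
    """The discrete metric on any set."""
    return 0.0 if a == b else 1.0

def product_metrics(metrics, x, y):
    """Return (D, D1, D2) for points x, y of the product X_1 x ... x X_n."""
    parts = [d(a, b) for d, a, b in zip(metrics, x, y)]
    return max(parts), sum(parts), math.sqrt(sum(p * p for p in parts))

metrics = [usual, usual, discrete]
x = (0.0, 1.0, "a")
y = (3.0, -1.0, "b")
D, D1, D2 = product_metrics(metrics, x, y)
n = len(metrics)
print(D, D2, D1, n * D)   # nondecreasing from left to right
```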

The following theorem is very useful in characterizing continuity of vector functions
(functions into a product space). It says that a vector function is continuous
exactly when its component functions are.

Theorem 4.4.6. Let (Y, 𝜌) be a metric space, let $f : Y \to \prod_{i=1}^{n} X_i$, and write f(y) =
   $(f_1(y), \dots, f_n(y))$. Then f is continuous if and only if each of the component
   functions $f_i : Y \to X_i$ is continuous.

Proof. Let $(y^{(k)})$ be a sequence in Y, and suppose $\lim_k y^{(k)} = y$. By theorem 4.4.4,
   $\lim_k f(y^{(k)}) = f(y)$ if and only if $\lim_k f_i(y^{(k)}) = f_i(y)$ for all 1 ≤ i ≤ n. The result
   now follows from the sequential characterization of continuity (theorem 4.3.2). ∎

Example 3. We used the previous theorem to prove the continuity of the stereographic
   projections. See example 13 in section 4.3, and problem 18 in the same
   section.

Exercises

1. Let $\{(X_i, d_i)\}_{i=1}^{n}$ be a finite set of metric spaces. Prove that $X_1 \times \dots \times X_n$ is
   isometric to $X_1 \times (X_2 \times \dots \times X_n)$.
2. Let $\{(X_i, d_i)\}_{i=1}^{n}$ be a finite set of metric spaces, and let (X, D) be their product.
   (a) Prove directly, using the definition of D, that the projections $\pi_i : X \to X_i$
       are continuous. It follows that $\pi_i^{-1}(U_i)$ is open in X for every open subset
       $U_i$ of $X_i$.
   (b) It then follows from part (a) that, for open subsets $U_i \subseteq X_i$, 1 ≤ i ≤ n,
       $\bigcap_{i=1}^{n} \pi_i^{-1}(U_i)$ is open. What is $\bigcap_{i=1}^{n} \pi_i^{-1}(U_i)$?
3. When you have solved exercise 2 above, you will have an alternative proof of
theorem 4.4.2. Do you see it?
4. Consider the metric d on a metric space X as a function on the product
space X × X, endowed with the product metric. Prove that d ∶ X × X → ℝ
is continuous.
5. Let $\{(X_i, d_i)\}_{i=1}^{n}$ be a finite set of metric spaces, and let $X = \prod_{i=1}^{n} X_i$ be the
   Cartesian product of the underlying sets. Prove that the product metric is the
   weakest metric relative to which all the projections $\pi_i$ are continuous. More
   explicitly stated, show that if D∗ is a metric on X and each $\pi_i : (X, D^*) \to
   (X_i, d_i)$ is continuous, then the product metric is weaker than D∗ .

4.5 Separable Spaces

Although the rigorous definition of the real line was a giant leap in the devel-
opment of mathematics, it would not be nearly as useful an invention had it not
been for the fact that it contains the rational numbers as a dense subset. Indeed,
all practical computations, including machine calculations, are done exclusively
using rational numbers. The simplicity of rational numbers is enhanced by their
countability. Thus ℚ is numerous enough, simple enough, but not too enormous to
be a useful approximation of ℝ. It is a reasonable quest to study metric spaces
that contain a countable dense subset (of simpler elements). Such spaces are,
by definition, separable. You will see that many (but not all) metric spaces are
separable. The classical example is the space 𝒞[0, 1]. It is well known that (see
section 4.8) the set of polynomials with rational coefficients, which is countable,
is dense in 𝒞[0, 1]. What can be a nicer approximation of a continuous function
than a rational polynomial! Separability of a metric space turns out to be equivalent
to the existence of a countable collection of open sets that generate all open sets,
which is an added benefit and an important characterization of separability.

Definition. A subset A of a metric space X is dense in X if Ā = X, where Ā denotes
the closure of A. By theorem 4.2.5, A is dense in X if and only if every point in X
is the limit of a sequence in A. Equivalently, A is dense in X if and only if for
every x ∈ X and every 𝜖 > 0, there is an element a ∈ A such that d(x, a) < 𝜖.

Example 1. Given a function f ∈ 𝒞[0, 1] and a number 𝜖 > 0, there exists a
continuous, piecewise linear function g such that ‖f − g‖∞ < 𝜖.

We use the uniform continuity of f (see example 8 on section 1.2). Let 𝛿 > 0
be such that |f(x) − f(y)| < 𝜖 whenever |x − y| < 𝛿. Choose a natural number n
such that 1/n < 𝛿, and, for 0 ≤ j ≤ n, let xj = j/n. Define the function g to be
the continuous, piecewise linear function such that g(xj ) = f(xj ) for 0 ≤ j ≤ n.
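By construction, ‖f − g‖∞ < 𝜖: for x ∈ [xj , xj+1 ], g(x) is a convex combination of
f(xj ) and f(xj+1 ), each of which is within 𝜖 of f(x). Observe that this example says
that the space of continuous, piecewise linear functions is dense in 𝒞[0, 1]. ∎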
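The construction in example 1 can be tried numerically. The following is a minimal sketch, not part of the text: the test function sin(2πx), the node counts, and the helper names `piecewise_linear` and `sup_distance` are all illustrative assumptions, and the sup norm is only estimated by sampling a fine grid.

```python
import math

def piecewise_linear(f, n):
    """Return g that agrees with f at the nodes j/n (j = 0..n) and is linear in between."""
    nodes = [j / n for j in range(n + 1)]
    values = [f(x) for x in nodes]

    def g(x):
        # Index of the subinterval [j/n, (j+1)/n] containing x (x = 1 goes in the last one).
        j = min(int(x * n), n - 1)
        t = (x - nodes[j]) * n  # relative position inside the subinterval
        return (1 - t) * values[j] + t * values[j + 1]

    return g

def sup_distance(f, g, samples=10_000):
    """Estimate the sup norm of f - g on [0, 1] by sampling (an approximation only)."""
    return max(abs(f(i / samples) - g(i / samples)) for i in range(samples + 1))

f = lambda x: math.sin(2 * math.pi * x)  # illustrative choice of f
errors = [sup_distance(f, piecewise_linear(f, n)) for n in (4, 16, 64)]
print(errors)
```

The estimated distances shrink as the mesh 1/n decreases, which is the behavior the uniform continuity argument predicts.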

Definition. A metric space is separable if it contains a countable dense subset.

Example 2. Since ℚ is dense in ℝ, ℝ is separable. More generally, ℚⁿ is countable
and dense in ℝⁿ; hence ℝⁿ is separable. ∎

Example 3. The metric space l¹ is separable. Let
A = {(a1 , a2 , . . . , an , 0, 0, . . . ) ∶ n ∈ ℕ, ai ∈ ℚ}
be the subset of l¹ that contains all of the sequences with finitely
many nonzero rational terms. As A is countable, we need only show that it is
dense in l¹. Let x = (xi ) ∈ l¹, and let 𝜖 > 0. Since $\sum_{i=1}^{\infty} |x_i|$ is convergent, there
exists N ∈ ℕ such that $\sum_{i=N+1}^{\infty} |x_i| < \epsilon/2$. For 1 ≤ i ≤ N, choose ai ∈ ℚ such that
|xi − ai | < 𝜖/(2N), and let a = (a1 , . . . , aN , 0, 0, . . . ). Now

\[ \|x - a\|_1 = \sum_{i=1}^{N} |x_i - a_i| + \sum_{i=N+1}^{\infty} |x_i| < N \cdot \frac{\epsilon}{2N} + \frac{\epsilon}{2} = \epsilon. \] ∎
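The two-step approximation in example 3 (cut off the tail, then replace the remaining terms by rationals) can be sketched as follows. This is an illustration under stated assumptions: the sequence x_i = √2/2^i, the tolerance, and the helper `rational_truncation` are hypothetical choices, and the caller must supply a bound on the tail sum.

```python
import math
from fractions import Fraction

def rational_truncation(term, tail_bound, eps):
    """Finitely supported rational sequence within eps of x in the l^1 norm.

    term(i) is the i-th entry of x (i >= 1); tail_bound(N) bounds the sum of
    |x_i| over i > N. Both are assumptions supplied by the caller.
    """
    N = 1
    while tail_bound(N) >= eps / 2:
        N += 1
    # Rounding to the nearest multiple of 1/q moves each entry by at most 1/(2q),
    # which is below eps/(2N) once q > N/eps.
    q = math.ceil(N / eps) + 1
    return [Fraction(round(term(i) * q), q) for i in range(1, N + 1)]

term = lambda i: math.sqrt(2) / 2**i   # illustrative element of l^1
tail = lambda N: math.sqrt(2) / 2**N   # exact tail sum of this geometric series
eps = 1e-3
a = rational_truncation(term, tail, eps)
N = len(a)
distance = sum(abs(term(i) - float(a[i - 1])) for i in range(1, N + 1)) + tail(N)
print(N, distance)
```

The printed distance is the l¹ distance between x and the rational sequence, computed as head error plus tail, exactly as in the estimate in the example.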

Example 4. The space c of convergent sequences is separable.

Let A = {(a1 , a2 , . . . , an , a, a, a, . . . ) ∶ n ∈ ℕ, ai ∈ ℚ, a ∈ ℚ} be the set of rational
sequences that are eventually constant. We show that A is dense in c. Let
x = (xn ) be a convergent sequence, and let 𝜉 = limn xn . For 𝜖 > 0, there is an
integer N such that, for n > N, |xn − 𝜉| < 𝜖/2. For 1 ≤ i ≤ N, choose ai ∈ ℚ such
that |xi − ai | < 𝜖, and then choose a rational number a such that |𝜉 − a| < 𝜖/2.
Finally, set y = (a1 , . . . , aN , a, a, . . . ). By construction, ‖x − y‖∞ < 𝜖. ∎

Definition. A collection 𝔅 of open subsets of a metric space X is said to be an
open base for X if every open subset of X is the union of members of 𝔅.

Example 5. The collection 𝔅 = {B(x, r) ∶ x ∈ ℚⁿ, r ∈ ℚ, r > 0} of open balls
in ℝⁿ that have rational centers and rational radii is an open base for ℝⁿ with the
Euclidean metric. This takes some verification, and we urge the reader to
work out the details. ∎

Definition. A metric space X is second countable if it has a countable open base.

The above example shows that ℝn is second countable because the collection 𝔅
is countable.

Definition. A collection of open subsets 𝒰 = {U𝛼 }𝛼∈I of a metric space X covers
X if ∪𝛼∈I U𝛼 = X. We also say that {U𝛼 } is an open cover of X. A subset of 𝒰
that also covers X is said to be a subcover of 𝒰.

Example 6. The collection 𝒰 = {(−n, n) ∶ n ∈ ℕ} is an open cover of ℝ. The subset
{(−2n, 2n) ∶ n ∈ ℕ} is a subcover of 𝒰. ∎

Definition. A metric space X is said to be a Lindelöf space if every open cover of
X contains a countable subcover of X.

Theorem 4.5.1. The following are equivalent for a metric space X


(a) X is separable.
(b) X is second countable.
(c) X is Lindelöf.

Proof. (a) implies (b). Let A = {a1 , a2 , . . . } be a countable dense subset of X. We claim
that the countable collection 𝔅 = {B(an , r) ∶ n ∈ ℕ, r ∈ ℚ} is an open base for X.
To prove that every open subset of X is the union of members of 𝔅, it is sufficient to
show that if x ∈ X and 𝛿 > 0, there exist an element an ∈ A and a rational number
r such that x ∈ B(an , r) ⊆ B(x, 𝛿). Pick an element an ∈ A such that d(x, an ) <
𝛿/4, and choose a rational number r such that 𝛿/4 < r < 𝛿/2. Then x ∈ B(an , r),
and if y ∈ B(an , r), then d(x, y) ≤ d(x, an ) + d(an , y) < 𝛿/4 + r < 𝛿, so B(an , r) ⊆
B(x, 𝛿).
(b) implies (c). Let 𝔅 = {Bn ∶ n ∈ ℕ} be a countable open base for X. Suppose,
for some collection 𝒰 = {U𝛼 ∶ 𝛼 ∈ I} of open subsets of X, X = ∪𝛼∈I U𝛼 . For each
natural number n, pick an element Vn in 𝒰 that contains Bn . If no element of
𝒰 contains Bn , define Vn = ∅. We claim that {Vn }n∈ℕ covers X. If x ∈ X, then
x ∈ U𝛼 for some 𝛼 ∈ I. There exists Bn such that x ∈ Bn ⊆ U𝛼 ; thus, Vn ≠ ∅ and
x ∈ Vn .
(c) implies (a). For a fixed n ∈ ℕ, X = ∪x∈X B(x, 1/n). By assumption, there exists a
countable set {xn,1 , xn,2 , . . . } such that $X = \bigcup_{j=1}^{\infty} B(x_{n,j}, 1/n)$. We claim that
the countable set {xn,j ∶ n, j ∈ ℕ} is dense in X. Let x ∈ X and let 𝛿 > 0. Choose
n ∈ ℕ such that 1/n < 𝛿. Because $x \in \bigcup_{j=1}^{\infty} B(x_{n,j}, 1/n)$, x ∈ B(xn,j , 1/n) for
some j ∈ ℕ. Now d(xn,j , x) < 1/n < 𝛿, and the proof is complete. ∎

The following example shows that a separable metric space is, in a way, not too
large.

Example 7. The cardinality of a separable metric space is, at most, 𝔠.

Let A = {an ∶ n ∈ ℕ} be a countable dense subset of a separable metric space
X. For each x ∈ X, define a real sequence Sx = (d(x, a1 ), d(x, a2 ), d(x, a3 ), . . . ). We
prove that the function x ↦ Sx is an injection from X to the space $\mathbb{R}^{\mathbb{N}}$ of real
sequences. Let x and y be distinct elements of X, and let r = d(x, y). Since A is
dense in X, there exists an element an of A such that d(x, an ) < r/3. By the triangle
inequality, d(y, an ) ≥ d(x, y) − d(x, an ) > 2r/3. In particular, d(x, an ) ≠ d(y, an );
thus x ↦ Sx is an injection.
It follows that $\operatorname{Card}(X) \le \operatorname{Card}(\mathbb{R}^{\mathbb{N}}) = \mathfrak{c}^{\aleph_0} = \mathfrak{c}$. ∎

Exercises

1. Show that lp is separable for all 1 ≤ p < ∞.
2. Show that the space c0 of null sequences is separable.
3. Prove that a subset A of a metric space X is dense if and only if it intersects
every nonempty open subset of X.
4. Prove that if X is separable, then any collection of pairwise disjoint open
subsets of X is countable. Conclude that l∞ is not separable. See problem 10
on section 4.1.
5. Show that a normed linear space X is separable if and only if the unit ball
{x ∈ X ∶ ‖x‖ ≤ 1} is separable.
6. Show that a subspace of a second countable space is second countable.
7. Show that a subspace of a separable space is separable and that a subspace
of a Lindelöf space is Lindelöf.
8. Show that the product of finitely many separable metric spaces is separable.
9. Let f be a function from a metric space X to a metric space Y, and let 𝔅 be
an open base for Y. Prove that f is continuous if and only if f−1 (V) is open
for every V ∈ 𝔅.
10. Prove that every open subset V of ℝ is the countable union of disjoint open
intervals. Hint: For x ∈ V, let ax = inf {t ∈ V ∶ (t, x] ⊆ V}, and bx = sup{t ∈
V ∶ [x, t) ⊆ V}. Show that Ix = (ax , bx ) ⊆ V, then prove that, for x, y ∈ V,
either Ix = Iy or Ix ∩ Iy = ∅.

4.6 Completeness

The mathematicians of antiquity had a clear understanding of the existence
of irrational numbers, and mathematicians through the ages understood that
irrational numbers are gaps inside the rational number field. Thus it was quite
well understood that the rational field is not complete. It took some twenty-
four centuries for a rigorous definition of the real number field as a complete
ordered field to materialize. The definitions and some of the results in this section
parallel those in section 1.2. For example, the proof of the Bolzano-Weierstrass
property of bounded sets (theorem 1.2.10) includes a proof of the nested interval
theorem, which is a very special case of the Cantor intersection theorem. Another
highlight of this section is Baire’s theorem, which is one of the cornerstones upon
which functional analysis is built. We will establish the completeness of the lp
spaces as well as the function space 𝒞[a, b], which will pave the way for a number
of interesting applications begun in the section and continued in the section
exercises.

Definition. A sequence (xn ) in a metric space X is said to be a Cauchy sequence
if, for every 𝜖 > 0, there is a natural number N such that,

for all m, n > N, d(xn , xm ) < 𝜖.



Theorem 4.6.1. A convergent sequence is a Cauchy sequence.

Proof. Let lim xn = x, and let 𝜖 > 0. There exists a natural number N such that, for
n > N, d(xn , x) < 𝜖/2. Now, for m, n > N, d(xn , xm ) ≤ d(xn , x) + d(x, xm ) < 𝜖. 

Theorem 4.6.2. A Cauchy sequence is bounded.

Proof. Let 𝜖 = 1. There exists a positive integer N such that, for m, n ≥ N,
d(xn , xm ) < 1. In particular, d(xn , xN ) < 1 for all n ≥ N. Therefore, for all n ∈ ℕ,
d(xn , xN ) ≤ max{1, d(x1 , xN ), . . . , d(xN−1 , xN )}. ∎

Theorem 4.6.3. If a Cauchy sequence (xn ) contains a subsequence $(x_{n_k})$ that converges
to x, then (xn ) converges to x.

Proof. Let 𝜖 > 0. There exists a natural number N such that, for m, n ≥ N,
d(xn , xm ) < 𝜖/2. Since limk xnk = x, there exists an integer K such that, for k ≥ K,
d(xnk , x) < 𝜖/2. Without loss of generality, we may assume that K > N and
thus nK ≥ K > N. Taking m = nK and using the triangle inequality, for n > N,
d(xn , x) ≤ d(xn , xnK ) + d(xnK , x) < 𝜖. 

Definition. A metric space X is complete if every Cauchy sequence in X converges
to a point in X.

Before we look at major examples of complete spaces, we look at an example of
an incomplete one.

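Example 1. Consider the chordal metric 𝜒 on ℝ. We will show that although the
sequence xn = n is a Cauchy sequence in (ℝ, 𝜒), it does not converge.

\[ \chi(n, m) = \frac{|n - m|}{\sqrt{1 + n^2}\,\sqrt{1 + m^2}} = \frac{\left|\frac{1}{n} - \frac{1}{m}\right|}{\sqrt{1 + \frac{1}{n^2}}\,\sqrt{1 + \frac{1}{m^2}}} \le \left|\frac{1}{n} - \frac{1}{m}\right| \to 0 \quad \text{as } m, n \to \infty. \]

To prove that the sequence does not converge to any x ∈ ℝ, we observe that

\[ \lim_n \chi(n, x) = \lim_n \frac{|n - x|}{\sqrt{1 + n^2}\,\sqrt{1 + x^2}} = \lim_n \frac{\left|1 - \frac{x}{n}\right|}{\sqrt{1 + \frac{1}{n^2}}\,\sqrt{1 + x^2}} = \frac{1}{\sqrt{1 + x^2}} \ne 0. \] ∎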
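Both claims in this example are easy to check numerically. The sketch below assumes the chordal metric has the form 𝜒(x, y) = |x − y| / (√(1 + x²) √(1 + y²)), which is how the displayed formulas read; the sample values of n, m, and x are arbitrary.

```python
import math

def chi(x, y):
    """Chordal distance on the real line (formula assumed from the displayed computation)."""
    return abs(x - y) / (math.sqrt(1 + x * x) * math.sqrt(1 + y * y))

# The sequence x_n = n is Cauchy: chi(n, m) is dominated by |1/n - 1/m|.
pairs = [(10, 20), (100, 1000), (10**4, 10**6)]
for n, m in pairs:
    print(n, m, chi(n, m), abs(1 / n - 1 / m))

# It has no limit in R: chi(n, x) approaches 1/sqrt(1 + x^2), which is positive.
x = 3.0
print(chi(10**8, x), 1 / math.sqrt(1 + x * x))
```

The distances between far-out integers collapse to zero, while the distance from n to any fixed real x stays bounded away from zero.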

Theorem 4.6.4.
(a) A closed subspace A of a complete metric space is complete.
(b) A complete subspace A of a metric space is closed.

Proof. (a) Let (xn ) be a Cauchy sequence in A. Since X is complete, there exists x ∈ X
such that limn xn = x. Since A is closed, theorem 4.2.5 guarantees that x ∈ A.
(b) Let x ∈ Ā, the closure of A. By theorem 4.2.5, there exists a sequence (xn ) in A such that
limn xn = x. Now (xn ) is Cauchy (theorem 4.6.1), and A is complete, so (xn )
converges to a point y in A. By the uniqueness of limits, x = y, and hence x ∈ A. ∎

Example 2. The space c of convergent sequences is complete. We show that c
is complete by showing that it is closed in l∞ . See problem 2 in the section
exercises. Let x = (xn ) ∈ l∞ be a closure point of c. We show that x is convergent
by showing that it is Cauchy. Let 𝜖 > 0, and choose a convergent sequence
y = (yn ) such that ‖x − y‖∞ < 𝜖. Because (yn ) is Cauchy, there exists an integer
N such that, for n, m > N, |yn − ym | < 𝜖. Now if n, m > N, then

|xn − xm | ≤ |xn − yn | + |yn − ym | + |ym − xm | < 3𝜖. ∎

Theorem 4.6.5. The sequence spaces lp are complete for 1 ≤ p ≤ ∞.

Proof. We leave the proof of the case p = ∞ to the reader (also see theorem
4.8.1). Fix 1 ≤ p < ∞, let (xn ) be a Cauchy sequence in lp , and write xn =
(xn,1 , xn,2 , . . . , xn,k , . . . ). Given 𝜖 > 0, there exists N ∈ ℕ such that, for n, m > N,
$\|x_n - x_m\|_p^p = \sum_{k=1}^{\infty} |x_{n,k} - x_{m,k}|^p < \epsilon^p$. In particular, if k is a fixed positive
integer, then, for every n, m > N, |xn,k − xm,k | < 𝜖. Thus $(x_{n,k})_{n=1}^{\infty}$ is a Cauchy
sequence in 𝕂. By the completeness of 𝕂, xk = limn xn,k exists for every k ∈ ℕ.
Set $x = (x_k)_{k=1}^{\infty}$. We will show that x ∈ lp and that limn ‖xn − x‖p = 0. For an
arbitrary positive integer K,

\[ \sum_{k=1}^{K} |x_k|^p = \lim_n \sum_{k=1}^{K} |x_{n,k}|^p \le \limsup_n \sum_{k=1}^{\infty} |x_{n,k}|^p = \limsup_n \|x_n\|_p^p. \]

Because (xn ) is Cauchy, ‖xn ‖p is bounded by theorem 4.6.2; hence $\limsup_n \|x_n\|_p^p < \infty$.
This shows $\sum_{k=1}^{\infty} |x_k|^p < \infty$, and hence x ∈ lp .

Finally we show that ‖xn − x‖p → 0, as n → ∞. For arbitrary positive integers n
and K,

\[ \sum_{k=1}^{K} |x_{n,k} - x_k|^p = \lim_{m \to \infty} \sum_{k=1}^{K} |x_{n,k} - x_{m,k}|^p \le \limsup_{m \to \infty} \|x_n - x_m\|_p^p. \]

Taking the limit as K → ∞ of the extreme left side of the above string of inequalities,
we have

\[ \|x_n - x\|_p^p = \sum_{k=1}^{\infty} |x_{n,k} - x_k|^p \le \limsup_{m \to \infty} \|x_n - x_m\|_p^p. \]

Observe that the above inequalities hold for an arbitrary positive integer n. Now
let 𝜖 > 0. There exists N ∈ ℕ such that, for m, n > N, ‖xn − xm ‖p < 𝜖. Thus, for
n > N, lim supm→∞ ‖xn − xm ‖p ≤ 𝜖. We have shown that, for n > N, ‖xn − x‖p ≤
𝜖. This completes the proof. 

Theorem 4.6.6 (the Cantor intersection theorem). Suppose X is a complete metric
space. If $\{F_n\}_{n=1}^{\infty}$ is a descending sequence of closed nonempty subsets of X such that
limn diam(Fn ) = 0, then $\bigcap_{n=1}^{\infty} F_n$ is a one-point set.

Proof. For every n ∈ ℕ, choose a point xn ∈ Fn . Let 𝜖 > 0. There exists a natural
number N such that, for n > N, diam(Fn ) < 𝜖. Now if m ≥ n > N, then Fn ⊇ Fm
and xn , xm ∈ Fn , and hence d(xn , xm ) < 𝜖. This makes (xn ) a Cauchy sequence and
hence convergent to, say, x. Each of the sets Fn contains all but a finite number
of terms of (xk ). Since each Fn is closed, x ∈ Fn for all n, and $x \in \bigcap_{n=1}^{\infty} F_n$. Now
$\operatorname{diam}\left(\bigcap_{n=1}^{\infty} F_n\right) \le \operatorname{diam}(F_n) \to 0$. Hence $\bigcap_{n=1}^{\infty} F_n = \{x\}$. ∎

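Definition. A subset A of a metric space X is nowhere dense if int(Ā) = ∅, where Ā
denotes the closure of A.

Example 3. ℕ is nowhere dense in ℝ. The reader is cautioned that the closure of
int(A) need not equal int(Ā); for example, the closure of int(ℚ) is ∅, while the
interior of the closure of ℚ is ℝ. ∎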

Theorem 4.6.7 (Baire’s theorem). A complete metric space cannot be expressed as
a countable union of nowhere dense subsets.

Proof. Let {An } be a countable family of nowhere dense subsets of X. Without loss
of generality, assume that each An is closed. Since X − A1 is open and nonempty,
there exists a ball B1 = B(x1 , 𝛿1 ) such that B1 ∩ A1 = ∅. By reducing the radius
𝛿1 , if necessary, we may assume that 𝛿1 < 1 and that $\overline{B_1} \cap A_1 = \emptyset$. Since B1 − A2
is open and nonempty, we can find a ball B2 = B(x2 , 𝛿2 ) ⊆ B1 such that B2 ∩ A2 = ∅.
As before, we may assume that 𝛿2 < 1/2, $\overline{B_2} \subseteq B_1$, and $\overline{B_2} \cap A_2 = \emptyset$. We can continue this
process and construct a sequence of balls {Bn } such that $\overline{B_n} \cap A_n = \emptyset$,
$\overline{B_1} \supseteq \overline{B_2} \supseteq \cdots$, and $\operatorname{diam}(\overline{B_n}) \le 2/n$. By the Cantor intersection theorem,
$\bigcap_{n=1}^{\infty} \overline{B_n} = \{x\}$. Since $\overline{B_n} \cap A_n = \emptyset$, x ∉ An for all n ∈ ℕ, and $\bigcup_{n=1}^{\infty} A_n \ne X$. ∎

The following two results are powerful consequences of Baire’s theorem.

Theorem 4.6.8. Let {An } be a countable family of closed nowhere dense subsets of
a complete metric space X, and let U0 be a nonempty open subset of X. Then
$U_0 - \bigcup_{n=1}^{\infty} A_n \ne \emptyset$.

Proof. Modify the proof of Baire’s theorem by requiring that B1 ⊆ U0 − A1 . 



This result generalizes Baire’s theorem: not only is the set $A = \bigcup_{n=1}^{\infty} A_n$ not equal
to X, but A, in fact, has an empty interior.

Theorem 4.6.9. If {Un } is a countable collection of open dense subsets of a complete
metric space, then $\bigcap_{n=1}^{\infty} U_n$ is dense.

Proof. If there is a nonempty open subset U0 such that $U_0 \cap \bigcap_{n=1}^{\infty} U_n = \emptyset$, then
$U_0 \subseteq X - \bigcap_{n=1}^{\infty} U_n = \bigcup_{n=1}^{\infty} (X - U_n)$. This contradicts the previous theorem because each
X − Un is closed and nowhere dense. See problem 10 at the end of the section. ∎

Theorem 4.6.10. The product of a finite number $\{(X_i, d_i)\}_{i=1}^{n}$ of complete metric
spaces is complete.

Proof. Since X1 × . . . × Xn is isometric to X1 × (X2 × . . . × Xn ), it is enough to show
that the product of two complete metric spaces, (X1 , d1 ) and (X2 , d2 ), is complete.
Let $(x^{(k)})$ be a Cauchy sequence in X1 × X2 , and write $x^{(k)} = (x_1^{(k)}, x_2^{(k)})$. Since
$d_i(x_i^{(n)}, x_i^{(m)}) \le D(x^{(n)}, x^{(m)})$, each of the sequences $(x_i^{(k)})$ is Cauchy. Therefore, for
i = 1, 2, $\lim_k x_i^{(k)} = x_i$ exists. Clearly, $\lim_k x^{(k)} = (x_1, x_2)$. ∎

Before we embark on the application subsection, we prove the following result.

Theorem 4.6.11. The space (𝒞[a, b], ‖.‖∞ ) is complete.

Proof. Let (fn ) be a Cauchy sequence in 𝒞[a, b]. For 𝜖 > 0, there is a positive integer
N such that ‖fn − fm ‖∞ < 𝜖 for every m, n ≥ N. Thus, for every x ∈ [a, b] and
every m, n ≥ N, |fn (x) − fm (x)| < 𝜖; hence (fn (x)) is a Cauchy sequence for every
x ∈ [a, b]. By the completeness of 𝕂, f(x) = limn fn (x) exists for every x.

We claim that limn ‖fn − f‖∞ = 0. Let 𝜖 and N be as in the previous paragraph.
Then |fn (x) − fm (x)| < 𝜖 for every x ∈ [a, b] and every n, m ≥ N. Taking the limit
as m → ∞, we obtain |fn (x) − f(x)| ≤ 𝜖 for every x ∈ [a, b] and every n ≥ N. This
means that ‖fn − f‖∞ ≤ 𝜖 for every n ≥ N, as claimed.

Finally, we need to show that f is continuous. Suppose that xk ∈ [a, b] and that
limk xk = x. Let 𝜖 > 0. By the previous paragraph, there is an integer N such that
‖fN − f‖∞ < 𝜖. By the continuity of fN at x, there exists an integer K such that, for
k > K, |fN (xk ) − fN (x)| < 𝜖. Now, for k > K,

|f(x) − f(xk )| ≤ |f(x) − fN (x)| + |fN (x) − fN (xk )| + |fN (xk ) − f(xk )| < 3𝜖. 

Example 4 (the Weierstrass M-test). Let (fn ) be a sequence in 𝒞[a, b], and suppose
that there exists a real sequence (Mn ) such that, for every n ∈ ℕ, ‖fn ‖∞ ≤ Mn
and $\sum_{n=1}^{\infty} M_n < \infty$. Then the series of functions $\sum_{n=1}^{\infty} f_n(x)$ converges in 𝒞[a, b].
We prove that the sequence of partial sums $S_n(x) = \sum_{i=1}^{n} f_i(x)$ is a Cauchy
sequence in 𝒞[a, b]. Let 𝜖 > 0. By the convergence of the positive series
$\sum_{n=1}^{\infty} M_n$, there is an integer N such that, for m > n > N, $\sum_{i=n+1}^{m} M_i < \epsilon$.⁴
Thus, for m > n > N, and for every x ∈ [a, b],

\[ |S_m(x) - S_n(x)| \le \sum_{i=n+1}^{m} |f_i(x)| \le \sum_{i=n+1}^{m} M_i < \epsilon, \]

or ‖Sm − Sn ‖∞ < 𝜖. This shows that (Sn ) is a Cauchy sequence and
hence, by the completeness of 𝒞[a, b], is convergent to a function f ∈ 𝒞[a, b]. ∎

In fact, the series $\sum_{n=1}^{\infty} f_n(x)$ converges to f absolutely as well as uniformly on
[a, b].
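The key estimate in the M-test, ‖Sm − Sn‖∞ ≤ ∑ Mi over n < i ≤ m, can be observed on a concrete series. The sketch below is illustrative only: the choice fn(x) = sin(nx)/n² with Mn = 1/n² on [0, π] is an assumption, and the sup norm is estimated on a finite grid.

```python
import math

def partial_sum(x, n):
    """S_n(x) for the illustrative series with f_i(x) = sin(i x) / i^2."""
    return sum(math.sin(i * x) / i**2 for i in range(1, n + 1))

def sup_diff(n, m, a=0.0, b=math.pi, samples=500):
    """Grid estimate of the sup norm of S_m - S_n on [a, b]."""
    grid = [a + (b - a) * j / samples for j in range(samples + 1)]
    return max(abs(partial_sum(x, m) - partial_sum(x, n)) for x in grid)

def tail_bound(n, m):
    """Sum of M_i = 1 / i^2 for n < i <= m."""
    return sum(1 / i**2 for i in range(n + 1, m + 1))

pairs = [(10, 20), (50, 100), (200, 400)]
for n, m in pairs:
    print(n, m, sup_diff(n, m), tail_bound(n, m))
```

In each row the estimated sup distance stays below the corresponding block of the majorant series, and both shrink as n grows.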

Example 5. If a sequence (fn ) converges to f in 𝒞[a, b], then

\[ \lim_n \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx. \]

In particular, if the series $\sum_{n=1}^{\infty} g_n(x)$ converges in 𝒞[a, b], then

\[ \int_a^b \sum_{n=1}^{\infty} g_n(x)\,dx = \sum_{n=1}^{\infty} \int_a^b g_n(x)\,dx. \]

Let 𝜖 > 0. There exists an integer N such that for n > N and all x ∈ [a, b], |fn (x) −
f(x)| < 𝜖. Now if n > N, then

\[ \left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx \le \epsilon (b - a). \] ∎

Applications of Completeness, Part 1: Contraction Mappings and Applications

In this application we prove the contraction mapping theorem,⁵ which is one of
the simplest fixed point theorems. Then we apply it to derive the existence and
uniqueness of solutions of certain types of differential and integral equations.

⁴ The sequence of partial sums $\sum_{i=1}^{n} M_i$ is Cauchy because the series $\sum_{n=1}^{\infty} M_n$ is convergent.
⁵ Also called Banach’s fixed point theorem.

Definition. Let X be a metric space. A function T ∶ X → X is called a contraction
if there exists a constant 0 < k < 1 such that,

for all x, y ∈ X, d(T(x), T(y)) ≤ kd(x, y).

Theorem 4.6.12 (contraction mapping theorem). Let T ∶ X → X be a contraction
on a complete metric space X. Then T has a unique fixed point. Thus there is a
unique point z in X such that T(z) = z.

Proof. Let x0 be an arbitrary point in X, and define a sequence (xn ) in X by
xn+1 = T(xn ). First we show that (xn ) is a Cauchy sequence.

For n ≥ 1, d(xn+1 , xn ) = d(T(xn ), T(xn−1 )) ≤ kd(xn , xn−1 ), and, by induction,
$d(x_{n+1}, x_n) \le k^n d(x_1, x_0)$. Now, for m, n ∈ ℕ with m < n,

\[ \begin{aligned}
d(x_n, x_m) &\le d(x_n, x_{n-1}) + d(x_{n-1}, x_{n-2}) + \dots + d(x_{m+1}, x_m) \\
&\le (k^{n-1} + k^{n-2} + \dots + k^m)\, d(x_1, x_0) = k^m (1 + k + \dots + k^{n-m-1})\, d(x_1, x_0) \\
&\le k^m \sum_{j=0}^{\infty} k^j\, d(x_1, x_0) = \frac{k^m}{1 - k}\, d(x_1, x_0).
\end{aligned} \]

Since $\lim_m k^m = 0$, (xn ) is a Cauchy sequence. By the completeness of X, z =
limn xn exists. We claim that z is the unique fixed point of T.

Now z = limn xn = limn T(xn−1 ) = T(limn xn−1 ) = T(z). To show that z is unique,
suppose w is a fixed point of T. Then d(z, w) = d(T(z), T(w)) ≤ kd(z, w). This
would be a contradiction unless d(z, w) = 0, that is, z = w. ∎
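The proof is constructive: iterating T from any starting point converges to the fixed point, and letting n → ∞ in the displayed estimate gives the a priori bound d(xm , z) ≤ kᵐ d(x1 , x0 )/(1 − k). The following sketch illustrates this under assumptions chosen for the example only: T = cos on the complete space [0, 1], where k = sin(1) is a valid contraction constant by the mean value theorem, and the helper name `iterate` is hypothetical.

```python
import math

def iterate(T, x0, steps):
    """Return the orbit [x0, T(x0), T(T(x0)), ...] with steps applications of T."""
    xs = [x0]
    for _ in range(steps):
        xs.append(T(xs[-1]))
    return xs

T = math.cos          # maps [0, 1] into itself
k = math.sin(1.0)     # sup of |T'| on [0, 1], so d(T(x), T(y)) <= k d(x, y)
xs = iterate(T, 0.5, 200)
z = xs[-1]            # numerical stand-in for the fixed point
d10 = abs(xs[1] - xs[0])

for m in (5, 10, 20):
    bound = k**m / (1 - k) * d10   # a priori error bound from the proof
    print(m, abs(xs[m] - z), bound)
```

The observed errors sit well below the a priori bound, which is typical: the bound uses the worst-case constant k over the whole space.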

Definition. A function f ∶ [a, b] × ℝ → ℝ is said to be a Lipschitz function in its
second variable if there is a constant L > 0 such that

|f(x, y) − f(x, z)| ≤ L|y − z|,

for all x ∈ [a, b] and all y, z ∈ ℝ.

Theorem 4.6.13. Consider the initial value problem

\[ \frac{dy}{dx} = f(x, y(x)), \qquad y(a) = y_0. \]

Suppose that f ∶ [a, b] × ℝ → ℝ is continuous and that it satisfies the Lipschitz
condition in its second argument: |f(x, y) − f(x, z)| ≤ L|y − z|. Then the initial
value problem has a unique solution y(x) on the interval [a, b].

Proof. Choose a constant K > L, and define a metric on X = 𝒞[a, b] by

\[ d(y, z) = \sup_{x \in [a, b]} \exp\{-K(x - a)\}\, |y(x) - z(x)|. \]

It is relatively easy to check that d is a complete metric on X. Note that, for all
x ∈ [a, b], and all y, z ∈ X, $|y(x) - z(x)| \le e^{K(x-a)} d(y, z)$. The initial value problem
in question is equivalent to the integral equation

\[ y(x) = y_0 + \int_a^x f(s, y(s))\,ds. \]

Define a function T ∶ X → X by y ↦ Ty , where $T_y(x) = y_0 + \int_a^x f(s, y(s))\,ds$. The
proof will be complete if we show that T has a unique fixed point y ∈ X. For two
functions y, z ∈ X,

\[ \begin{aligned}
e^{-K(x-a)} |T_y(x) - T_z(x)| &= e^{-K(x-a)} \left| \int_a^x \bigl( f(s, y(s)) - f(s, z(s)) \bigr)\,ds \right| \\
&\le e^{-K(x-a)} \int_a^x |f(s, y(s)) - f(s, z(s))|\,ds \le L e^{-K(x-a)} \int_a^x |y(s) - z(s)|\,ds \\
&\le L e^{-K(x-a)} d(y, z) \int_a^x e^{K(s-a)}\,ds = \frac{L}{K} e^{-K(x-a)} d(y, z) \bigl[ e^{K(x-a)} - 1 \bigr] \\
&\le \frac{L}{K} e^{-K(x-a)} d(y, z)\, e^{K(x-a)} = \frac{L}{K}\, d(y, z) = k\, d(y, z), \quad \text{where } k = \frac{L}{K} < 1.
\end{aligned} \]

The above inequalities show that d(Ty , Tz ) ≤ kd(y, z); hence T is a contraction. We
now invoke the contraction mapping theorem to conclude that the initial value
problem has a unique solution in 𝒞[a, b]. ∎
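Iterating the map T from this proof is the classical Picard iteration. The sketch below is a discretized illustration, not the theorem itself: the integral is replaced by the trapezoid rule on a uniform grid, and the test problem y′ = y, y(0) = 1 on [0, 1] (so L = 1), the constant K = 2, and the helper names are all assumptions made for the example.

```python
import math

def picard_step(f, y0, xs, ys):
    """Grid version of (Ty)(x) = y0 + integral from a to x of f(s, y(s)) ds (trapezoid rule)."""
    out = [y0]
    for i in range(1, len(xs)):
        h = xs[i] - xs[i - 1]
        out.append(out[-1] + 0.5 * h * (f(xs[i - 1], ys[i - 1]) + f(xs[i], ys[i])))
    return out

def weighted_distance(xs, ys, zs, K):
    """Grid version of d(y, z) = sup exp(-K (x - a)) |y(x) - z(x)|."""
    a = xs[0]
    return max(math.exp(-K * (x - a)) * abs(y - z) for x, y, z in zip(xs, ys, zs))

a, b, n = 0.0, 1.0, 1000
xs = [a + (b - a) * i / n for i in range(n + 1)]
f = lambda x, y: y          # Lipschitz in y with L = 1
y0 = 1.0

ys = [y0] * len(xs)         # start from the constant function y0
for _ in range(30):
    ys = picard_step(f, y0, xs, ys)

# Contraction check in the weighted metric with K = 2 > L, so k = L/K = 0.5.
K = 2.0
u = [math.cos(3 * x) for x in xs]
v = [x * x for x in xs]
ratio = (weighted_distance(xs, picard_step(f, y0, xs, u), picard_step(f, y0, xs, v), K)
         / weighted_distance(xs, u, v, K))
print(ys[-1], math.e, ratio)
```

The iterates approach the exact solution eˣ up to the discretization error of the trapezoid rule, and the measured ratio d(Tu , Tv )/d(u, v) stays below k = L/K.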

Theorem 4.6.14. If X is a complete metric space and T ∶ X → X is such that Tⁿ is
a contraction for some positive integer n, then the unique fixed point of Tⁿ is the
unique fixed point of T.

Proof. Let x be the unique fixed point of Tⁿ. Thus Tⁿ(x) = x. Now T(x) = Tⁿ⁺¹(x) =
Tⁿ(T(x)). Thus T(x) is a fixed point of Tⁿ. But the fixed point of Tⁿ is unique, so
T(x) = x. We leave it to the reader to show that x is the only fixed point of T. ∎

Theorem 4.6.15. Consider the nonlinear Volterra equation

\[ u(x) = \int_a^x K(x, y, u(y))\,dy + f(x). \]

Suppose f ∈ 𝒞[a, b] and that K is continuous on [a, b] × [a, b] × (−∞, ∞) and
satisfies the Lipschitz condition |K(x, y, z1 ) − K(x, y, z2 )| ≤ L|z1 − z2 | for all
x, y ∈ [a, b] and all z1 , z2 ∈ ℝ. Then the above integral equation has a unique
solution u ∈ 𝒞[a, b].

Proof. Let X = 𝒞[a, b], equipped with the uniform metric. Define a function T ∶ X → X
by

\[ u \mapsto T_u, \quad \text{where } T_u(x) = f(x) + \int_a^x K(x, y, u(y))\,dy. \]

We leave it to the reader to verify that Tu ∈ 𝒞[a, b]. For u, v ∈ X,

\[ |T_u(x) - T_v(x)| \le L \|u - v\|_\infty (x - a). \]

One more application of T yields

\[ |T^2_u(x) - T^2_v(x)| \le L^2 \|u - v\|_\infty (x - a)^2 / 2, \]

and, by induction,

\[ |T^n_u(x) - T^n_v(x)| \le \frac{L^n}{n!} \|u - v\|_\infty (x - a)^n \le \frac{L^n}{n!} \|u - v\|_\infty (b - a)^n. \]

For sufficiently large n, $k = \frac{L^n (b-a)^n}{n!} < 1$, and, for such an n, Tⁿ is a contraction.
By theorem 4.6.14, T has a unique fixed point, and the Volterra equation has a
unique solution. ∎
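How large n must be depends only on the product L(b − a), since the factorial eventually dominates any power. A small sketch follows; the function name and the sample values L = 5, b − a = 2 are illustrative assumptions, and the result is only the first n at which the bound from the proof drops below 1 (a smaller power of T may already be a contraction).

```python
import math

def smallest_contracting_power(L, length):
    """First n with (L * length)**n / n! < 1, together with that value of k."""
    n = 1
    term = L * length            # value of the bound for n = 1
    while term >= 1:
        n += 1
        term *= L * length / n   # update (L*length)^n / n! incrementally
    return n, term

n, k = smallest_contracting_power(5.0, 2.0)
print(n, k)
```

Computing the bound incrementally avoids forming large powers and factorials separately.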

Applications of Completeness, Part 2: Continuous, Nowhere Differentiable Functions

The first example of a continuous, nowhere differentiable function was produced
by Weierstrass in 1872. Until that time, it was generally believed that
continuous functions could fail to be differentiable only at an isolated set of points. The
main result in this application establishes an extreme contrast to the Weierstrass
polynomial approximation theorem. Like polynomials, which are simple, well-behaved
functions, the very erratic continuous, nowhere differentiable functions
are also dense in 𝒞[0, 1].

Definition. For a fixed integer n ≥ 1, let 𝔉n be the set of functions f ∈ 𝒞[0, 1]
for which there is a point x0 ∈ [0, 1] such that, for all x ∈ [0, 1], |f(x) − f(x0 )| ≤
n|x − x0 |.

The geometric meaning of the condition in the above definition is that the slope
of the line joining the point (x0 , f(x0 )) and an arbitrary point (x, f(x)) on the graph
of f cannot exceed n in absolute value. There is no shortage of functions in 𝔉n . For
example, the functions f𝛼 (x) = 𝛼x are in 𝔉n if |𝛼| ≤ n.

Example 6. If f ∈ 𝒞[0, 1] has a continuous derivative and ‖f′ ‖∞ ≤ n, then f ∈ 𝔉n .
Fix a point x0 ∈ [0, 1]. By the mean value theorem, for any x ∈ [0, 1],
|f(x) − f(x0 )| = |f′ (𝜉)(x − x0 )| ≤ n|x − x0 |. Here 𝜉 is a point between x and x0 . ∎

The assumptions of the previous example are much stronger than they need to
be. The differentiability of f at a single point in (0, 1) is enough to guarantee that
f ∈ 𝔉n for some n, as the following example illustrates.

Example 7. If f ∈ 𝒞[0, 1] is differentiable at x0 ∈ (0, 1), then f ∈ 𝔉n for some
n ≥ 1.
By assumption, there exists a number 𝛿 > 0 such that, for 0 < |x − x0 | < 𝛿,
$\left| \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right| < 1$. Thus, for |x − x0 | < 𝛿, |f(x) − f(x0 )| ≤ (|f′ (x0 )| + 1)|x − x0 |.
If |x − x0 | ≥ 𝛿, then $|f(x) - f(x_0)| \le 2\|f\|_\infty = \frac{2\|f\|_\infty}{\delta}\,\delta \le \frac{2\|f\|_\infty}{\delta}\,|x - x_0|$. Now,
for any integer n > max{|f′ (x0 )| + 1, 2‖f‖∞ /𝛿}, and all x ∈ [0, 1], |f(x) − f(x0 )| ≤
n|x − x0 |. ∎

Remark 1. A direct consequence of example 7 is that if f ∈ 𝒞[0, 1] is not in any
𝔉n , then f is nowhere differentiable in (0, 1). To prove the existence of a single
nowhere differentiable function, we need to show that $\mathcal{C}[0, 1] - \bigcup_{n=1}^{\infty} \mathfrak{F}_n \ne \emptyset$.
Theorem 4.6.9 provides the plan of attack if we wish to prove that there is an
abundance of continuous nowhere differentiable functions: Prove that each 𝔉n
is closed and nowhere dense. Then Un = 𝒞[0, 1] − 𝔉n is open and dense in
𝒞[0, 1], and hence $\bigcap_{n=1}^{\infty} U_n$ is also dense. The set $\bigcap_{n=1}^{\infty} U_n$ consists entirely of
nowhere differentiable functions.

Lemma 4.6.16. The set 𝔉n is closed.

Proof. Let (fk ) be a sequence in 𝔉n , and suppose fk → f in the uniform norm. For each
k ∈ ℕ, let xk be such that |fk (x) − fk (xk )| ≤ n|x − xk |. By the Bolzano-Weierstrass
theorem (theorem 1.2.8), (xk ) contains a convergent subsequence xkp . For simplic-
ity of notation, write xp for xkp and fp for fkp , and let x0 = limp→∞ xp . For x ∈ [0, 1],
|f(x) − f(x0 )| = limp |fp (x) − fp (xp )| ≤ n limp |x − xp | = n|x − x0 |. 

We need to construct continuous functions that change direction steeply and
frequently. Continuous, piecewise linear functions of this type exist. A continuous,
piecewise linear function f has one-sided derivatives at each x in [0, 1]. We denote
the right and left derivatives of f by D⁺f(x) and D⁻f(x), respectively. We use
the notation |Df| to denote the minimum (absolute) value of the (one-sided)
derivatives of f. Simply put, |Df| is the minimum absolute value of the slope of
any straight line segment of the graph of f. Figure 4.2 shows the graph of the type
of functions of interest to us. In that graph, the slope of any straight line segment
of the graph is ±4, and |D𝜓| = 4.

Example 8. Given an arbitrary interval [a, b] and an integer k ≥ 1, there exists a
continuous, piecewise linear function 𝜓 on [a, b] such that 𝜓(a) = 0 = 𝜓(b),
$\|\psi\|_\infty = 2^{-k}$, and $|D\psi| > 2^k$.

Choose an integer m such that $2 \cdot 4^m > 4^k (b - a)$, and divide the interval [a, b]
into $4^m$ subintervals of equal length. For $0 \le j \le 4^m$, let $x_j = a + \frac{j(b-a)}{4^m}$. Define 𝜓
to be the continuous, piecewise linear function such that, for $0 \le j \le 4^m$, 𝜓(xj ) =
0, and, for $0 \le j \le 4^m - 1$, $\psi\left(\frac{x_j + x_{j+1}}{2}\right) = 2^{-k}$. The magnitude of the slope of any
straight line segment of the graph of 𝜓 is equal to

\[ \frac{2^{-k}}{\frac{1}{2} \cdot \frac{b-a}{4^m}} > 2^k. \] ∎
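The function 𝜓 of example 8 is determined by its breakpoints, so it is easy to build and inspect. The sketch below is an illustration under assumptions: it picks the smallest m satisfying 2·4ᵐ > 4ᵏ(b − a), whereas the example allows any such m, and the helper names `spiky` and `slopes` are hypothetical.

```python
def spiky(a, b, k):
    """Breakpoints (xs) and values (ys) of the sawtooth psi on [a, b] with height 2**-k."""
    m = 0
    while 2 * 4**m <= 4**k * (b - a):   # smallest m with 2 * 4^m > 4^k (b - a)
        m += 1
    pieces = 4**m
    h = (b - a) / pieces
    xs, ys = [], []
    for j in range(pieces):
        xs += [a + j * h, a + (j + 0.5) * h]   # left endpoint and midpoint of piece j
        ys += [0.0, 2.0**-k]                   # psi vanishes at x_j, peaks at the midpoint
    xs.append(b)
    ys.append(0.0)
    return xs, ys

def slopes(xs, ys):
    """Slopes of the straight line segments joining consecutive breakpoints."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

xs, ys = spiky(0.0, 1.0, 1)
print(len(xs) - 1, max(ys), min(abs(s) for s in slopes(xs, ys)))
```

With [a, b] = [0, 1] and k = 1 this reproduces the case k = 1 = m described in the text: four teeth of height 1/2 whose segments all have slope ±4.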

The idea behind the construction in example 8 is geometrically simple. If we want
the short function 𝜓 (its height is $2^{-k}$) to have very steep slopes, we must make the
base of each triangle very small, about $4^{-k}$. Then the slopes would have magnitudes
$2^{-k}/4^{-k} = 2^k$. Figure 4.2 depicts the function 𝜓 on [0, 1] with k = 1 = m.

Figure 4.2 The short, narrow, but spiky function 𝜓



Remark 2. The function 𝜓 constructed in example 8 is not in 𝔉n for any n ≤ $2^k$.
For any point (x0 , 𝜓(x0 )) on the graph of 𝜓, there is another point (x, 𝜓(x)) on
the same straight line segment of the graph. Thus the slope of that line segment
is ±|D𝜓|, which is greater than $2^k$ in absolute value.

Example 9. Let [a, b] be an arbitrary interval, and let h be the linear function
h(x) = mx + c. For every 𝜖 > 0 and for every n ≥ 1, there exists a continuous,
piecewise linear function 𝜑 on [a, b] such that 𝜑(a) = h(a), 𝜑(b) = h(b),
‖h − 𝜑‖∞ < 𝜖, and |D𝜑| > n.

Choose an integer k such that $2^{-k} < \epsilon$ and $2^k - |m| > n$. Using example 8,
we find a continuous, piecewise linear function 𝜓 such that 𝜓(a) = 0 = 𝜓(b),
$\|\psi\|_\infty = 2^{-k}$, $|D\psi| > 2^k$. Define 𝜑 = h + 𝜓. Clearly, $\|h - \varphi\|_\infty = \|\psi\|_\infty = 2^{-k} < \epsilon$
and, for x ∈ [a, b], $|D^{\pm}\varphi(x)| = |D^{\pm}\psi(x) + m| \ge |D^{\pm}\psi(x)| - |m| = |D\psi| - |m| > 2^k - |m| > n$. ∎

Lemma 4.6.17. For each n ≥ 1, 𝔉n is nowhere dense in 𝒞[0, 1].

Proof. Let f ∈ 𝒞[0, 1] and let 𝜖 > 0. We will show that there is a continuous, piecewise
linear function g such that ‖f − g‖∞ < 𝜖 and g ∉ 𝔉n . Since continuous, piecewise
linear functions are dense in 𝒞[0, 1] (see example 1 in section 4.5), let h be a
continuous, piecewise linear function such that h(xj ) = f(xj ) for 0 ≤ j ≤ M and
‖f − h‖∞ < 𝜖/2. For 0 ≤ j ≤ M − 1, let hj be the restriction of h to [xj , xj+1 ]. By
example 9, for each j we construct a piecewise linear function 𝜑j such that ‖hj −
𝜑j ‖∞ < 𝜖/2 and |D𝜑j | > n. We define the required function g by pasting together
the functions 𝜑j . The function g is continuous because 𝜑j (xj ) = f(xj ) = 𝜑j+1 (xj ).
Now ‖f − g‖∞ ≤ ‖f − h‖∞ + ‖h − g‖∞ < 𝜖. 

The following result follows from remark 1, lemma 4.6.16, lemma 4.6.17, and
theorem 4.6.9.

Theorem 4.6.18. Continuous, nowhere differentiable functions are dense in
𝒞[0, 1]. ∎

Exercises

1. Prove that ℝⁿ and ℂⁿ are complete.
2. Prove that l∞ is complete.
3. Prove that the space c0 of null sequences is complete.
4. Define a metric on ℕ as follows: $d(m, n) = \left| \frac{1}{n} - \frac{1}{m} \right|$. Prove that d is an
incomplete metric.
5. For x, y ∈ ℝ, define $d(x, y) = |\tan^{-1} x - \tan^{-1} y|$. Prove that d is an incomplete
metric on ℝ. Use the identity $\tan^{-1} x - \tan^{-1} y = \tan^{-1}\left( \frac{x - y}{1 + xy} \right)$.
6. Let A be a dense subset of a metric space X such that every Cauchy sequence
in A is convergent to a point in X. Prove that X is complete.
7. Prove that if (xn ) and (yn ) are Cauchy sequences in a metric space, then
d(xn , yn ) converges.
8. Prove the converse of the Cantor intersection theorem. Hint: Let (xn ) be a
Cauchy sequence. For each n ∈ ℕ, let An = {xn , xn+1 , . . . }, and let Fn be the
closure of An . Show that limn diam(Fn ) = 0.
9. Prove that a subset A of a metric space is nowhere dense if and only if every
nonempty open set U contains a nonempty open subset V such that V ∩
A = ∅.
10. Show that a closed subset F of a metric space X is nowhere dense, if and
only if X − F is dense.
11. Show that the boundary of a closed subset F of a metric space X is nowhere
dense and give an example to show that the assumption that F is closed
cannot be omitted.
12. Let X be a complete metric space, and let {Fn } be a countable collection of
closed, nowhere dense subsets of X. Is ∪_{n=1}^∞ Fn necessarily nowhere dense?
13. Prove that a contraction on a metric space is continuous. Notice that this
fact was used in the proof of theorem 4.6.12.
14. Prove that the metric d in the proof of theorem 4.6.13 is complete.
15. Prove that the function Ty in the proof of theorem 4.6.13 is continuous.
16. Let g ∶ [a, b] → ℝ and K ∶ [a, b] × [a, b] → ℝ be continuous functions.
Show that when |𝛼| is small enough, the integral equation

y(x) = 𝛼 ∫_a^b K(x, t) y(t) dt + g(x)

has a unique solution in 𝒞[a, b].


17. Show that the fixed point of T found in the proof of theorem 4.6.14 is
unique.
18. Show that the function Tu in the proof of theorem 4.6.15 is continuous.

Definition. An n × n matrix A = (aij ) is said to be diagonally dominant


if, for each 1 ≤ i ≤ n, |aii | > ∑j≠i |aij |.

19. Prove that a diagonally dominant matrix is invertible. Hint: If 0 ≠ x ∈ ℝn


is such that Ax = 0, let i be such that |xi | = max1≤j≤n |xj |. Now write down
the ith equation of the system Ax = 0.

Numerical approximations of linear elliptic partial differential equa-


tions often lead to matrix equations with a very large, sparse, diagonally
dominant matrix. Iterative solutions are practical in this situation, and the
method described in the problem below, the Jacobi iteration, is one of the
simplest (and the slowest).
20. Let A be a diagonally dominant matrix, and consider the system Ax = b.
Write A as follows: A = D + L + U, where

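\[
U = \begin{pmatrix}
0 & a_{1,2} & \cdots & a_{1,n} \\
\vdots & 0 & \ddots & \vdots \\
\vdots & & \ddots & a_{n-1,n} \\
0 & \cdots & \cdots & 0
\end{pmatrix},
\quad
L = \begin{pmatrix}
0 & \cdots & \cdots & 0 \\
a_{2,1} & 0 & & \vdots \\
\vdots & \ddots & \ddots & \vdots \\
a_{n,1} & \cdots & a_{n,n-1} & 0
\end{pmatrix},
\quad
D = \begin{pmatrix}
a_{11} & & \\
& \ddots & \\
& & a_{nn}
\end{pmatrix}.
\]

Define J = −D⁻¹(L + U). Show that the function T ∶ ℝⁿ → ℝⁿ defined
by Tx = Jx + D⁻¹b is a contraction. Conclude that the iteration x_k =
Tx_{k−1}, k ≥ 1, converges to the solution of the system Ax = b. Hint: Examine
the matrix norm ‖J‖∞ defined in section 3.6.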
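
The iteration in problem 20 can be tried numerically. The following is a minimal sketch, not part of the text: the names jacobi and jacobi_norm and the sample system are illustrative assumptions, and ‖J‖∞ is assumed to be the maximum absolute row sum of J. Written componentwise, x_k = Jx_{k−1} + D⁻¹b says that the ith entry of x_k is (b_i − ∑_{j≠i} a_ij x_j)/a_ii, where the x_j are the entries of x_{k−1}.

```python
def jacobi(A, b, iterations=100):
    """Iterate x_k = J x_{k-1} + D^{-1} b, written componentwise."""
    n = len(A)
    x = [0.0] * n  # starting vector; a contraction converges from any start
    for _ in range(iterations):
        # The comprehension reads only the previous iterate x.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def jacobi_norm(A):
    """Maximum absolute row sum of J: max_i sum_{j != i} |a_ij| / |a_ii|."""
    n = len(A)
    return max(
        sum(abs(A[i][j]) for j in range(n) if j != i) / abs(A[i][i])
        for i in range(n)
    )


# Hypothetical diagonally dominant system whose exact solution is (1, 1, 1).
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [0.0, 1.0, 3.0]]
b = [6.0, 8.0, 4.0]

x = jacobi(A, b)
contraction_constant = jacobi_norm(A)  # less than 1 for this matrix
```

For this sample matrix the row sums of |J| are 1/2, 3/5, and 1/3, so the error in the ∞-norm shrinks by a factor of at most 3/5 per step.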

4.7 Compactness

A clear manifestation of sequential compactness can be seen in examples 7 and 8 in


section 1.2, where we proved the boundedness of continuous functions and their
uniform continuity on a compact interval. We urge the reader to re-examine these
two examples. This section opens with the topological (non-sequential) definition
of compactness and the establishment of the general characteristics of compact
spaces. This is done in order to avoid the duplication of definitions and results in
the corresponding section in chapter 5. The various equivalent characterizations
of compact metric spaces are discussed, and then we prove two famous theorems:
Tychonoff ’s theorem and the Heine-Borel theorem. The section concludes with an
illuminating application on closed convex subsets of ℝn .

Definition. A metric space X is said to be compact if every open cover of X


contains a finite subcover of X. The definitions of open covers and subcovers
have been stated in section 4.5.

Example 1. The collection 𝒰 = {(−n, n) ∶ n ∈ ℕ} of open subsets of ℝ is an open


cover of ℝ that contains no finite subcover. Therefore ℝ is not compact. 

Example 2. The sequence of open intervals 𝒰 = {In = (1/n, 1 − 1/n) ∶ n ≥ 3} cov-


ers the open interval (0, 1), but no finite subset of 𝒰 covers (0, 1). Thus (0, 1) is
not compact. 

Definition. Let K be a subset of a metric space X. We say that K is a compact subset


(or a compact subspace) of X if it is compact in the restricted metric.

Theorem 4.7.1. A subset K of a metric space X is compact if and only if it satisfies


the following condition: if 𝒰 is a collection of open subsets of X such that
K ⊆ ∪ {U ∶ U ∈ 𝒰}, then there exists a finite subcollection {U1 , U2 , . . . , Un } of 𝒰
such that K ⊆ ∪ni=1 Ui .

Proof. Suppose K is compact and that K ⊆ ∪{U ∶ U ∈ 𝒰}. Then K = ∪{K ∩ U ∶


U ∈ 𝒰}, and each of the sets K ∩ U is open in K. Therefore, for a finite
subcollection {U1 , U2 , . . . , Un } of 𝒰, K = ∪{K ∩ Ui ∶ 1 ≤ i ≤ n}. Thus K ⊆ ∪ni=1 Ui .
The proof of the converse is left as an exercise. 

Example 3. The set K = {1/n ∶ n ∈ ℕ} ∪ {0} is compact. Suppose 𝒰 = {U𝛼 ∶ 𝛼 ∈ I} is
an open cover of K, and let 0 ∈ U𝛼0 . Since limn 1/n = 0, there is an integer N
such that 1/n ∈ U𝛼0 for all n > N. For i = 1, . . . , N, choose members U𝛼1 , . . . , U𝛼N
of 𝒰 such that 1/i ∈ U𝛼i . Clearly, K ⊆ ∪_{i=0}^N U𝛼i . ∎

Theorem 4.7.2. A closed subspace K of a compact space X is compact.

Proof. Let 𝒰 be a collection of open subsets of X whose union contains K.


Then 𝒰∗ = 𝒰 ∪ {X − K} is an open cover of X. Therefore there exists a finite
subcollection 𝒰′ of 𝒰∗ that covers X. There is no loss of generality in assuming
that X − K ∈ 𝒰′ . Thus X = (X − K) ∪ ∪ni=1 Ui , where each Ui ∈ 𝒰. Since K ⊆ X,
and K does not intersect X − K, K ⊆ ∪ni=1 Ui . This proves that K is compact. 

Theorem 4.7.3. A compact subspace K of a metric space X is closed and bounded.

Proof. We prove that X − K is open. Let x ∈ X − K. For every point y ∈ K, there


exist disjoint open subsets Uy and Vy of X such that x ∈ Uy and y ∈ Vy . Now K ⊆
∪y∈K Vy , so, by the compactness of K, there are finitely many points y1 , y2 , . . . , yn ∈
K such that K ⊆ ∪ni=1 Vyi . Now let U = ∩ni=1 Uyi . Clearly, K ∩ U = ∅, thus x ∈ U ⊆
X − K, and hence X − K is open. We leave the proof that K is bounded as an
exercise. 

Theorem 4.7.4. The continuous image of a compact space is compact.



Proof. Let (X, d) be a compact space, and let (Y, 𝜌) be a metric space. We show that
if f ∶ X → Y is a continuous surjection, then (Y, 𝜌) is compact. Let {V𝛼 } be an
open cover of Y. Since f is continuous, f−1 (V𝛼 ) is open in X for each 𝛼, and hence
{f−1 (V𝛼 )} is an open cover of X. The compactness of X yields a finite subcover
{f−1 (V𝛼i )}ni=1 of X. Clearly, {V𝛼i }ni=1 covers Y. 

Definition. A metric space X is sequentially compact if every sequence in X


contains a subsequence that converges in X.

Definition. A metric space X has the Bolzano-Weierstrass property if every


infinite subset of X has a limit point.

Theorem 4.7.5. A metric space X is sequentially compact if and only if it has the
Bolzano-Weierstrass property.

Proof. Let A be an infinite subset of a sequentially compact space X. Then A contains


a sequence (xn ) of distinct points. By assumption, (xn ) contains a subsequence
that converges to a point, x. By theorem 4.2.6, x is a limit point of A.

Conversely, suppose X has the Bolzano-Weierstrass property, and let (xn ) be a


sequence in X. If the range A = {x1 , x2 , . . . } of (xn ) is finite, then (xn ) contains
a constant subsequence, which is clearly convergent. So, suppose A is infinite. By
assumption, A has a limit point, x. Let n1 ∈ ℕ be such that d(xn1 , x) < 1. Having
found positive integers n1 < n2 < . . . < nk such that d(xni , x) < 1/i, for 1 ≤ i ≤ k,
we pick an integer nk+1 > nk such that d(xnk+1 , x) < 1/(k + 1). Such an integer
exists because otherwise, for every n > nk , we would have d(x, xn ) ≥ 1/(k + 1), which
is impossible since x is a limit point of A. By construction, limk xnk = x. ∎

Definition. A metric space X is totally bounded if, for every 𝜖 > 0, there exists
a finite subset {x1 , . . . , xn } of X such that X = ∪ni=1 B(xi , 𝜖). The set {x1 , . . . , xn } is
called an 𝜖-dense subset of X.

Theorem 4.7.6. A metric space X is sequentially compact if and only if it is complete


and totally bounded.

Proof. Suppose X is sequentially compact. Let (xn ) be a Cauchy sequence in X. By


assumption, (xn ) contains a subsequence (xnk ) that converges to a point x. By
theorem 4.6.3, limn xn = x, and X is complete. If X is not totally bounded, then
there exists 𝜖 > 0 such that if F is a finite subset of X, then X ≠ ∪x∈F B(x, 𝜖). Pick a
point x1 ∈ X, and a point x2 ∉ B(x1 , 𝜖). Since B(x1 , 𝜖) ∪ B(x2 , 𝜖) ≠ X, there exists a

point x3 ∈ X such that d(x3 , xi ) ≥ 𝜖, i = 1, 2. Continuing this construction yields a


sequence (x1 , x2 , . . . ) of X such that d(xn , xm ) ≥ 𝜖 for all n, m ∈ ℕ, n ≠ m. Clearly,
(xn ) contains no convergent subsequence, which contradicts the sequential
compactness of X.

Suppose X is complete and totally bounded. We claim that X has the Bolzano-
Weierstrass property. The proof will be complete by theorem 4.7.5. Let A be an
infinite subset of X. The total boundedness of X allows us to cover X by a finite
collection of closed balls of radius 1. One of the balls, B1 , contains infinitely many
points of A. Define F1 = B1 , and A1 = A ∩ B1 . Now cover X by a finite collection of
closed balls of radius 1/2. One of those balls, B2 , contains infinitely many points of
A1 and hence infinitely many points of A. Define F2 = B2 ∩ F1 , and A2 = A1 ∩ B2 .
Continue by induction to construct a sequence of closed subsets F1 ⊇ F2 ⊇ F3 ⊇ . . .
such that diam(Fn ) ≤ 2/n and each Fn contains infinitely many points of A. By the
Cantor intersection theorem, let {x} = ∩_{n=1}^∞ Fn . Since limn diam(Fn ) = 0, any ball
centered at x contains Fn for sufficiently large n. Since Fn contains infinitely many
points of A, x is a limit point of A. 
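
The first half of the proof of theorem 4.7.6 picks points one at a time, each at distance at least 𝜖 from all the earlier ones. In a totally bounded space this selection must stop, and the points chosen so far then form an 𝜖-dense subset. The sketch below runs the same greedy selection on a finite sample of the unit square; the function name and the sample grid are illustrative assumptions, and the Euclidean distance is used.

```python
import math


def greedy_separated_points(points, eps):
    """Select points so that each one is at distance >= eps from earlier picks."""
    chosen = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in chosen):
            chosen.append(p)
    return chosen


# Hypothetical finite sample of the unit square [0, 1] x [0, 1].
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
eps = 0.25
net = greedy_separated_points(grid, eps)

# Every sample point is within eps of a selected point: a point that was
# skipped had some earlier pick at distance less than eps.
covered = all(any(math.dist(p, q) < eps for q in net) for p in grid)
```

The selected points are pairwise at least 𝜖 apart, which is exactly the property that rules out a convergent subsequence when the selection never terminates.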

Definition. Let 𝒰 = {U𝛼 } be an open cover of a metric space X. A Lebesgue


number for 𝒰 is a positive number a such that every subset A of X of diameter
less than a is contained in one member of 𝒰.

Theorem 4.7.7. In a sequentially compact metric space X, every open cover of X has
a Lebesgue number.

Proof. Suppose that there is an open cover 𝒰 = {U𝛼 } of X that does not have a
Lebesgue number. We show that X is not sequentially compact. By assumption,
for every n ∈ ℕ, there exists a subset An of X such that diam(An ) < 1/n, and
An is not contained in any member of 𝒰. For each n ∈ ℕ, pick a point xn ∈ An .
We claim that (xn ) has no convergent subsequence. Suppose, contrary to our
claim, that some subsequence (xnk ) of (xn ) converges to x. Since ∪𝛼 U𝛼 = X,
there exists a member U𝛼 of 𝒰 that contains x, and since U𝛼 is open, there
is a number 𝛿 > 0 such that B(x, 𝛿) ⊆ U𝛼 . Now choose a positive integer K
such that d(xnK , x) < 𝛿/2, and 1/nK < 𝛿/2. If y ∈ AnK , then d(x, y) ≤ d(x, xnK ) +
d(xnK , y) < 𝛿/2 + diam(AnK ) < 𝛿/2 + 𝛿/2 = 𝛿. This implies that AnK ⊆ B(x, 𝛿) ⊆
U𝛼 , which is a contradiction. 
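
For a concrete cover, a Lebesgue number can be estimated directly. The sketch below is an illustration under stated assumptions, not a method from the text: the space is [0, 1], the cover is a finite list of open intervals (lo, hi), and the name lebesgue_number is hypothetical. For each x, r(x) is the largest radius of a ball about x that fits in a single member of the cover. Since r is 1-Lipschitz, its minimum over [0, 1] is at least the minimum over a grid of spacing h minus h/2, and any positive number not exceeding that minimum is a Lebesgue number.

```python
def lebesgue_number(intervals, steps=1000):
    """Return a Lebesgue number for a finite open-interval cover of [0, 1]."""
    h = 1.0 / steps

    def r(x):
        # Largest radius of a ball about x contained in one interval.
        return max(min(x - lo, hi - x) for lo, hi in intervals)

    grid_min = min(r(k * h) for k in range(steps + 1))
    # r is 1-Lipschitz, so r(x) >= grid_min - h/2 for every x in [0, 1].
    return grid_min - h / 2


# Hypothetical cover of [0, 1] by three open intervals.
cover = [(-0.1, 0.4), (0.3, 0.7), (0.6, 1.1)]
a = lebesgue_number(cover)

# Spot check: short closed subintervals of [0, 1] lie in one member.
width = 0.9 * a
starts = [k / 100 for k in range(101)]
contained = all(
    any(lo < s and min(s + width, 1.0) < hi for lo, hi in cover)
    for s in starts
)
```

If the returned value is not positive, the grid is too coarse or the intervals do not cover [0, 1].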

Theorem 4.7.8. Every sequentially compact metric space is compact.

Proof. Let X be sequentially compact, and let 𝒰 = {U𝛼 } be an open cover of X.


By theorem 4.7.7, 𝒰 has a Lebesgue number, a. Let 𝜖 = a/3. By theorem 4.7.6,
there exists a finite subset {x1 , . . . , xn } of X such that ∪ni=1 B(xi , 𝜖) = X. For each

1 ≤ i ≤ n, diam(B(xi , 𝜖)) ≤ 2𝜖 < a. Therefore each ball B(xi , 𝜖) is contained in a


member U𝛼i of 𝒰. Clearly, X = ∪ni=1 U𝛼i . 

Theorem 4.7.9. For a metric space X, the following are equivalent:

(a) X is compact.
(b) X is sequentially compact.
(c) X has the Bolzano-Weierstrass property.
(d) X is complete and totally bounded.

Proof. In light of theorems 4.7.5, 4.7.6, and 4.7.8, we only need to show that (a)
implies (b). Let (xn ) be a sequence in X. Define An = {xn , xn+1 , . . . }, and let Fn be
the closure of An . Clearly, {Fn } is a descending sequence of closed nonempty sets.
If ∩n∈ℕ Fn = ∅, then ∪n∈ℕ (X − Fn ) = X. Thus (X − Fn ) is an ascending sequence
of open subsets that covers X. Therefore, by the compactness of X, X = X − Fn for
some positive integer n, and hence Fn = ∅. This contradiction shows that
∩n∈ℕ Fn ≠ ∅. Let x ∈ ∩n∈ℕ Fn . Observe that x is a closure point of each of the sets
An . Since x is a closure point of A1 , there exists an integer n1 ≥ 1 such that
d(xn1 , x) < 1. Now x is a closure point of A_{n1+1} ; thus there is an integer
n2 ≥ n1 + 1 such that d(xn2 , x) < 1/2. Having found a sequence of positive integers
n1 < n2 < . . . < nk such that, for 1 ≤ i ≤ k, d(xni , x) < 1/i, choose an integer
nk+1 ≥ nk + 1 such that d(xnk+1 , x) < 1/(k + 1). This is possible because x is a
closure point of A_{nk+1} . By construction, limk xnk = x. ∎

Theorem 4.7.10 (Tychonoff ’s theorem). The product of finitely many compact


metric spaces is compact.

Proof. It is enough to show that the product of two compact metric spaces X and
Y is compact. Let (xn , yn ) be a sequence in X × Y. Since X is compact, there is a
subsequence (xnk ) of (xn ) that converges to x ∈ X. Since Y is compact, there exists
a subsequence (ynkp ) of (ynk ) that converges to a point y ∈ Y. Now (xnkp , ynkp ) is a
subsequence of (xn , yn ) that converges to (x, y) as p → ∞. 

Example 4. The convex hull, C, of a compact subset K ⊆ ℝn is compact.

Let Tn be the standard n-simplex, which is compact by problem 21 at the end of


this section. Consider the function F ∶ Tn × Kⁿ⁺¹ → C defined by

F(𝜆0 , . . . , 𝜆n , x0 , . . . , xn ) = ∑_{i=0}^n 𝜆i xi , where (𝜆0 , . . . , 𝜆n ) ∈ Tn , and x0 , . . . , xn ∈ K.

The continuity of F is straightforward, and F is surjective since, by Carathéodory’s


theorem, every point in C is a convex combination of at most n + 1 points in K.
By Tychonoff ’s theorem, the set Tn × Kⁿ⁺¹ is compact. The result now follows
from theorem 4.7.4. 

Theorem 4.7.11 (the Heine-Borel theorem). A subset K of ℝn is compact if and


only if it is closed and bounded.

Proof. A compact subset of any metric space is closed and bounded by theorem
4.7.3. Conversely, suppose K ⊆ ℝn is closed and bounded; K is contained in some
rectangle I1 × . . . × In , where each Ii is a closed bounded interval in ℝ. By theorem
1.2.10, each Ii has the Bolzano-Weierstrass property and hence is compact by
theorem 4.7.9. By Tychonoff ’s theorem, I1 × . . . × In is compact and, by theorem
4.7.2, K is compact. 

Remark 1. In the above proof of the Heine-Borel theorem, it is tacitly assumed


that the metric involved is the product metric as defined in section 4.4. This is
largely a matter of convenience. We may show that K is closed and bounded in the
1-norm or the Euclidean norm.⁶ See example 6 following theorem 4.3.8.

Theorem 4.7.12. A continuous real-valued function f on a compact space X is


bounded and attains its maximum and minimum values.

Proof. By theorem 4.7.4, f(X) is a compact subset of ℝ. By the Heine-Borel theorem,


f(X) is closed and bounded. Therefore f is a bounded function. Since f(X) is closed,
it contains its least upper and greatest lower bounds. Therefore maxx∈X f(x) =
supx∈X f(x) is in f(X), and hence the maximum value of f is attained. The same
reasoning shows that the minimum value of f is attained in X. 

Definition. A metric space X is locally compact if every point in X belongs to the


interior of a compact subset of X. Thus, for every x ∈ X, there is an open subset
V of X such that x ∈ V and the closure of V is compact.

Theorem 4.7.13. ℝn is locally compact.

Proof. Any point x = (x1 , . . . , xn ) ∈ ℝn is contained in the open rectangle

V = (x1 − 1, x1 + 1) × . . . × (xn − 1, xn + 1)

⁶ We will see in section 6.1 that all norms on ℝn are equivalent. Thus if a set K is closed and bounded
in one norm on ℝn , then it is closed and bounded in any norm on ℝn .

and the closure of V, namely
[x1 − 1, x1 + 1] × . . . × [xn − 1, xn + 1],
is compact. ∎

Example 5. ℚ is not locally compact.

Since every open subset of ℚ is the union of sets of the form (a, b) ∩ ℚ,
where a, b ∈ ℝ, it is enough to show that a set of the form I = [a, b] ∩ ℚ is
not sequentially compact. Choose an irrational number r ∈ (a, b), then choose
a sequence xn ∈ I such that limn xn = r. Clearly, no subsequence of (xn ) is
convergent in I. 

Example 6. The metric space l∞ is not locally compact. It is enough to show that
the closed unit ball B = {x ∈ l∞ ∶ ‖x‖∞ ≤ 1} is not compact (see problem 8 in
the section exercises). The ball B contains the canonical vectors en of 𝕂(ℕ) and,
since d(en , em ) = 1 if n ≠ m, the sequence (en ) in l∞ does not contain a convergent
subsequence. ∎

The proof of the following theorem is left as an exercise.

Theorem 4.7.14. The product of finitely many locally compact spaces is locally
compact. 

Excursion: Closed Convex Subsets of ℝn

Example 7. Let K be a compact subset of ℝn , and let a ∈ ℝn − K. Then there exists


a point z ∈ K such that ‖z − a‖2 = dist(a, K). The point z is the closest point in
K to a.

Define a function f ∶ K → ℝ by f(x) = ‖x − a‖2 . Clearly, f is continuous. By


theorem 4.7.12, f is bounded and attains its minimum value in K. Thus there
is a point z ∈ K such that f(z) = ‖z − a‖2 = min{f(x) ∶ x ∈ K} = dist(a, K). ∎

Example 8. Let C be a closed subset of ℝn , and let a ∈ ℝn − C. Then there exists


a point z ∈ C such that ‖z − a‖2 = dist(a, C). If, in addition, C is convex, then z
is unique.

Let B be a closed ball of radius r centered at a, and assume r is large enough so


that B ∩ C ≠ ∅. The set K = B ∩ C is a closed and bounded subset of ℝn . By the
Heine-Borel theorem, K is compact. By the previous example, there is a point
z ∈ K such that d = ‖z − a‖2 = dist(a, K). Since d ≤ r and ‖x − a‖2 > r for every

vector x ∈ C − B, ‖a − z‖2 = dist(a, C). We leave it as an exercise to show that z


is unique when C is convex. 

Example 9 (the obtuse angle criterion). Let C be a closed convex subset of ℝn ,


let a ∈ ℝn − C, and let z be the closest element of C to a. Then, for every y ∈ C,
⟨a − z, y − z⟩ ≤ 0. Here ⟨., .⟩ is the Euclidean inner product on ℝn .

Without loss of generality, assume that y ≠ z. Consider the quadratic function

𝜑(t) = ‖(1 − t)z + ty − a‖₂², 0 ≤ t ≤ 1.

Observe that

𝜑(t) = ‖(z − a) + t(y − z)‖₂² = ‖z − a‖₂² + 2t⟨z − a, y − z⟩ + t²‖y − z‖₂².

Because C is convex, (1 − t)z + ty ∈ C for every 0 ≤ t ≤ 1. Since z is the clos-


est point in C to a, 𝜑(0) ≤ 𝜑(t) for every 0 ≤ t ≤ 1, and 𝜑 is increasing on
[0, 1]. This can happen only if 𝜑′ (0) ≥ 0. Thus 2⟨z − a, y − z⟩ ≥ 0, and hence
⟨a − z, y − z⟩ ≤ 0. 

Observe that if 𝜃 is the angle between a − z and y − z, then
cos 𝜃 = ⟨a − z, y − z⟩ / (‖a − z‖₂ ‖y − z‖₂).

The condition ⟨a − z, y − z⟩ ≤ 0 is equivalent to saying that 𝜃 is at least 90°, hence
the name obtuse angle criterion. Figure 4.3 illustrates the geometry. The wedge-
shaped region depicts the convex set C, and the rest of the diagram is self-
explanatory.

Figure 4.3 The obtuse angle criterion
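
The obtuse angle criterion is easy to test numerically when the closest point has a closed form. The sketch below assumes C is the closed unit square [0, 1] × [0, 1], for which the closest point to a in the Euclidean norm is obtained by clamping each coordinate of a to [0, 1]; the function names and the sample point a are illustrative choices, not taken from the text.

```python
import random


def project_onto_box(a, lower, upper):
    """Closest point (Euclidean norm) of the box prod [lower_i, upper_i] to a."""
    return [min(max(ai, lo), hi) for ai, lo, hi in zip(a, lower, upper)]


def inner(u, v):
    """Euclidean inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


lower, upper = [0.0, 0.0], [1.0, 1.0]
a = [2.0, 1.5]                          # a point outside the square
z = project_onto_box(a, lower, upper)   # closest point of the square to a

# Evaluate <a - z, y - z> at randomly sampled points y of the square.
random.seed(0)
a_minus_z = [ai - zi for ai, zi in zip(a, z)]
worst = max(
    inner(a_minus_z, [yi - zi for yi, zi in zip(y, z)])
    for y in ([random.random(), random.random()] for _ in range(1000))
)
```

Here z is the corner (1, 1), and the largest sampled value of ⟨a − z, y − z⟩ is not positive, in agreement with example 9.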



We know (see theorem 4.2.12) that a closed subset in a metric space can be
separated from a point outside it by disjoint open sets. In ℝn , a closed con-
vex subset can be separated from a point outside it in a much stronger and
more specific way. They can be separated by a hyperplane, as the next example
illustrates.

Example 10. Let C be a closed convex subset of ℝn , and let a ∈ ℝn − C. Then there
exists a hyperplane nT x = b such that nT y < b for every y ∈ C, and nT a > b. Thus
C is contained in one of the open half-spaces determined by the hyperplane,
and a is contained in the other open half-space.

Let z ∈ C be the closest point in C to a, and let m = (a + z)/2. We show that


the hyperplane we seek is the hyperplane that contains m and is normal to the
vector n = a − z. Without loss of generality, assume that m = 0 or, equivalently,
z = −a and n = 2a.⁷
By the previous example, for every y ∈ C, ⟨y − z, a − z⟩ ≤ 0, and hence
⟨y − z, n⟩ ≤ 0. Thus ⟨y, n⟩ ≤ ⟨z, n⟩ = ⟨−a, 2a⟩ = −2‖a‖₂² < 0, and nT y < 0. On
the other hand, nT a = ⟨a, n⟩ = ⟨a, 2a⟩ = 2‖a‖₂² > 0. ∎
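
The construction of example 10 can be carried out explicitly when the closest point is known. The sketch below assumes C is the closed unit disk in ℝ², where the closest point to a point a with ‖a‖₂ > 1 is z = a/‖a‖₂. It forms n = a − z and the midpoint m = (a + z)/2 and takes b = ⟨n, m⟩, without first translating m to the origin. The function name and the sample point are hypothetical.

```python
import math


def separating_hyperplane_for_disk(a):
    """Return (n, b) with <n, y> < b on the closed unit disk and <n, a> > b."""
    norm_a = math.hypot(*a)
    if norm_a <= 1:
        raise ValueError("a must lie outside the closed unit disk")
    z = [ai / norm_a for ai in a]                 # closest point of the disk to a
    n = [ai - zi for ai, zi in zip(a, z)]         # normal vector n = a - z
    m = [(ai + zi) / 2 for ai, zi in zip(a, z)]   # midpoint of a and z
    b = sum(ni * mi for ni, mi in zip(n, m))      # hyperplane <n, x> = b through m
    return n, b


a = [3.0, 4.0]
n, b = separating_hyperplane_for_disk(a)

# A linear functional attains its maximum over the disk on the unit circle,
# so sampling the circle is enough for a spot check.
values = [
    n[0] * math.cos(t) + n[1] * math.sin(t)
    for t in (2 * math.pi * k / 360 for k in range(360))
]
n_dot_a = n[0] * a[0] + n[1] * a[1]
```

For this a, the values of ⟨n, y⟩ on the disk stay at or below ‖n‖₂ = 4, while b is approximately 12 and ⟨n, a⟩ is approximately 20.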

Remark 2. A direct consequence of the above example is the following. Under


the assumptions of example 10, there exists a unit vector u and a real number b
such that, for all y ∈ C, uT y < b < uT a.

Theorem 4.7.15. A closed convex subset C of ℝn is the intersection of the closed


half-spaces containing C.

Proof. We only need to show that C contains the intersection of the closed half-spaces
containing C. The reverse containment is obvious. If a ∉ C, then, by the previous
example, there is a hyperplane nT x = b such that nT y < b for all y ∈ C and
nT a > b. Thus C is contained in the closed half-space H = {x ∈ ℝn ∶ nT x ≤ b},
but a ∉ H. 

Definition. A hyperplane M is said to be a supporting hyperplane of the closed


convex set C ⊆ ℝn if C is contained in one of the closed half-spaces determined
by M, and C ∩ M ≠ ∅. The closed half-space determined by M that contains
C is called a supporting half-space of C. Observe that every point in C ∩ M is
necessarily a boundary point of C.

⁷ We can translate C by −m. Specifically, we look at the set C′ = {x − m ∶ x ∈ C} and the point
m′ = 0. This translation preserves all the properties of C but has the advantage that the hyperplane we
seek has a homogeneous equation. This simplifies the algebra.

Example 11. Every tangent line to the unit circle is a supporting line of the closed
unit disk. The line y = x + 1 is a supporting line of the closed unit square
S = [0, 1] × [0, 1]. Slight rotations of the line about the point (0, 1) are also
supporting lines of S. Thus there are infinitely many supporting lines of S at
the point (0, 1). The line x = 1 is also a supporting line of the square.

Example 12. In the notation of example 10, the hyperplane nT x = nT z is a sup-


porting hyperplane of C. 

We conclude this section with the following fine application of compactness.

Theorem 4.7.16 (the supporting hyperplane theorem). Suppose z is a boundary


point of a closed convex set C ⊆ ℝn . Then there exists a supporting hyperplane M
of C that contains z.

Proof. Let an be a sequence in ℝn − C such that limn an = z. By remark 2, there is a


sequence un of unit vectors such that,

for every y ∈ C, (un )T y < (un )T an .

By the compactness of the unit sphere in ℝn , (un ) contains a convergent subse-


quence, which we continue to call (un ) for simplicity. Let u = limn un . Taking the
limit of the two sides of the above inequality, we have uT y ≤ uT z for all y ∈ C. The
hyperplane M orthogonal to u and containing z is the one we seek. 

Exercises

1. Prove directly that a compact metric space X is bounded.


2. Let (xn ) be a convergent sequence in a metric space X, and let x = limn xn .
Prove that the set {xn ∶ n ∈ ℕ} ∪ {x} is compact.
3. Let f ∶ [0, 1] → ℝ be continuous. Prove that the graph of f, {(x, f(x)) ∶ x ∈
[0, 1]}, is compact in ℝ2 .
4. Prove that if X is a compact metric space and F1 ⊇ F2 ⊇ . . . is a descending
sequence of nonempty closed subsets of X, then ∩_{n=1}^∞ Fn ≠ ∅.
5. Consider the space c0 of null sequences, endowed with the supremum
norm. Prove that a bounded subset A of c0 is totally bounded if and only if,
for every 𝜖 > 0, there exists a natural number N such that |xn | < 𝜖 for every
n > N and every x ∈ A.
6. Prove that if a subset A of a metric space is totally bounded, then the closure
of A is totally bounded.
7. Prove that a totally bounded metric space is separable.

8. Prove that, in a normed linear space X, the closed unit ball

B = {x ∈ X ∶ ‖x‖ ≤ 1}

is compact if and only if any closed ball in X is compact.


9. Prove that a normed linear space X is locally compact if and only if the
closed unit ball is compact. In this case, show that the unit sphere {x ∈ X ∶
‖x‖ = 1} is compact.
10. Prove that the product of finitely many locally compact spaces is locally
compact.
11. In connection with example 8, prove the point z is unique when C is convex.
12. Let F be a compact subset of a metric space X, and let a ∈ X − F. Prove that
there exists a point z ∈ F such that d(z, a) = dist(a, F). Also give an example
to show that z is not necessarily unique.
13. Let F be a closed subset of a locally compact normed linear space X, and let
a ∈ X − F. Prove that there exists a point z ∈ F such that d(z, a) = dist(a, F).
14. Let K be a compact subset of a metric space X. Prove that there exist points
x, y ∈ K such that d(x, y) = diam(K).
15. Show that if E is a compact subset of a metric space X and F is closed in X
and disjoint from E, then dist(E, F) > 0.
16. Let A be a subset of a metric space (X, d). For 𝜖 > 0, define

A𝜖 = ∪x∈A B(x, 𝜖).

Prove that
A𝜖 = {x ∈ X ∶ dist(x, A) < 𝜖}.
Also show that if E is a compact subset of X and F is closed in X and disjoint
from E, then E𝜖 ∩ F𝜖 = ∅ for some 𝜖 > 0.
17. Show that if E and F are disjoint compact subsets of ℝn , then there are points
x ∈ E and y ∈ F such that d(x, y) = dist(E, F).
18. Let E and F be disjoint compact convex subsets of ℝn . Show that there exists
a hyperplane uT x = b such that uT x > b for every x ∈ E, and uT x < b for
every x ∈ F.
19. Prove that a closed convex subset C of ℝn is the intersection of the closed
supporting half-spaces that contain C.
20. Find a countable set of closed supporting half-planes whose intersection is
the closed unit disk.
21. Prove that the standard n-simplex in ℝn is compact.

4.8 Function Spaces

We already encountered several examples of function spaces. We mention two


examples here before we embark on a more general study of function spaces. An
early example is the space l∞ , which is nothing but the space of bounded functions
from ℕ to the base field 𝕂. A little reflection reveals that the same definition makes
sense when ℕ is replaced with an arbitrary set X. Another space we studied in some
detail is the space 𝒞[0, 1]. We also studied several of the properties of l∞ and 𝒞[0, 1],
such as completeness and the lack of local compactness. We start the section with
the definition of a number of important function spaces that generalize l∞ and
𝒞[0, 1] in particular.

Definition. Let X be a nonempty set, and define ℬ(X) to be the set of all
bounded, real or complex, functions on X. Define vector addition and scalar
multiplication in ℬ(X) by (f + g)(x) = f(x) + g(x), (af)(x) = af(x). Here f and
g are bounded functions, x ∈ X, and a ∈ 𝕂. The supremum norm (also the
uniform or ∞-norm) of a function f ∈ ℬ(X) is defined by

‖f‖∞ = supx∈X |f(x)|.

It is a straightforward exercise to verify that ℬ(X) is a vector space and that the
function ‖.‖∞ is a norm. Observe that it is not assumed that X is necessarily a
metric space. It is sometimes necessary to specify the scalar field. In this case, we
use the notations ℬ(X, ℝ), and ℬ(X, ℂ) to indicate whether we wish to consider
real or complex valued functions.

Definition. Let X be a metric space, and define 𝒞(X) to be the set of continuous
real or complex functions on X. The operations on 𝒞(X) are defined pointwise
as in the above definition. This clearly makes 𝒞(X) into a vector space; see
problem 1 in section 4.3. However, since a continuous function is not necessarily
bounded, the supremum norm is not necessarily defined on 𝒞(X).

Definition. Let X be a metric space. The space of continuous bounded functions,


denoted by ℬ𝒞(X), is the intersection of ℬ(X) and 𝒞(X). It is a normed subspace
of ℬ(X). In the special case when X is a compact metric space, 𝒞(X) ⊆ ℬ(X),
and 𝒞(X) = ℬ𝒞(X).

Theorem 4.8.1. The space ℬ(X) of bounded functions on a set X is a complete


normed linear space.

Proof. We only prove the completeness of ℬ(X). Let (fn ) be a Cauchy sequence in
ℬ(X), and let 𝜖 > 0. There exists a natural number N such that, for m, n > N,

‖fn − fm ‖∞ = supx∈X |fn (x) − fm (x)| < 𝜖. In particular, (fn (x)) is a Cauchy
sequence in 𝕂 for each x ∈ X. Therefore limn fn (x) exists. Define f(x) = limn fn (x).

We claim that f ∈ ℬ(X). Since fn is Cauchy, there exists N ∈ ℕ such that, for
n > N, ‖fn − fN ‖∞ < 1. Consequently, for all x ∈ X, and all n > N, |fn (x)| ≤
|fn (x) − fN (x)| + |fN (x)| ≤ ‖fn − fN ‖∞ + ‖fN ‖∞ < 1 + ‖fN ‖∞ . Taking the limit of
the quantity on the extreme left of the above string of inequalities, we obtain
|f(x)| ≤ 1 + ‖fN ‖∞ . Thus f is a bounded function.

Finally, we show that limn fn = f in ℬ(X). Let 𝜖 > 0. There exists N ∈ ℕ such that,
for n, m > N and for all x ∈ X, |fn (x) − fm (x)| < 𝜖. Taking the limit as m → ∞, we
obtain |fn (x) − f(x)| ≤ 𝜖 for all x ∈ X, and all n > N. This means that ‖fn − f‖∞ ≤
𝜖 for all n > N, and the proof is now complete. ∎

Theorem 4.8.2. If X is a metric space, then the space ℬ𝒞(X) of continuous bounded
functions on X is a complete normed linear space.

Proof. Since ℬ𝒞(X) is a subspace of ℬ(X), it suffices, by theorems 4.8.1 and 4.6.4, to
show that ℬ𝒞(X) is closed in ℬ(X). Let f ∈ ℬ(X) be a closure point of ℬ𝒞(X). We
need to show that f is continuous. For 𝜖 > 0, there exists a function g ∈ ℬ𝒞(X) such
that ‖f − g‖∞ < 𝜖/3. Fix x0 ∈ X, and let 𝛿 > 0 be such that d(x, x0 ) < 𝛿 implies
that |g(x) − g(x0 )| < 𝜖/3. Now if d(x, x0 ) < 𝛿, then

|f(x) − f(x0 )| ≤ |f(x) − g(x)| + |g(x) − g(x0 )| + |g(x0 ) − f(x0 )| < 𝜖.

This proves that f is continuous at x0 . 

Definition. A function f ∶ X → 𝕂 from a metric space X to the base field 𝕂 is


uniformly continuous if, for every 𝜖 > 0, there exists a number 𝛿 > 0 such that,
for all x, y ∈ X with d(x, y) < 𝛿, |f(x) − f(y)| < 𝜖.

What distinguishes uniform continuity from continuity is that 𝛿 in the above


definition does not depend on x.

Theorem 4.8.3. A continuous (real or complex) function f on a compact metric space


X is uniformly continuous.

Proof. Let 𝜖 > 0. For every x ∈ X, there exists 𝛿x > 0 such that whenever d(x, 𝜉) <
𝛿x , |f(x) − f(𝜉)| < 𝜖/2. Now X = ∪x∈X B(x, 𝛿x ). Let 3𝛿 be a Lebesgue number for
the open cover {B(x, 𝛿x ) ∶ x ∈ X}. For each 𝜉, 𝜂 ∈ X with d(𝜉, 𝜂) < 𝛿, B(𝜉, 𝛿)
contains 𝜂 and has diameter < 3𝛿. By the definition of a Lebesgue number,

there exists x ∈ X such that B(𝜉, 𝛿) ⊆ B(x, 𝛿x ). In particular, d(𝜉, x) < 𝛿x , and
d(𝜂, x) < 𝛿x . Consequently, |f(𝜉) − f(𝜂)| ≤ |f(𝜉) − f(x)| + |f(x) − f(𝜂)| < 𝜖. 

The next result uses function spaces to provide an elegant and succinct proof of
the existence of the completion of an arbitrary (incomplete) metric space.

Theorem 4.8.4. Let X be a metric space. Then there exists a complete metric space
X̃ and an isometry 𝜑 ∶ X → X̃ such that 𝜑(X) is dense in X̃.

Proof. We know that ℬ(X, ℝ) is a complete metric space. We will find an isometry
𝜑 ∶ X → ℬ(X, ℝ). The theorem follows by taking X̃ to be the closure of 𝜑(X) in
ℬ(X, ℝ). To this end, fix an
element a ∈ X. For 𝜉 ∈ X, define a function 𝜑𝜉 ∶ X → ℝ by

𝜑𝜉 (x) = d(x, 𝜉) − d(x, a).

By the triangle inequality, |𝜑𝜉 (x)| = |d(x, 𝜉) − d(x, a)| ≤ d(a, 𝜉) for all x ∈ X.
Therefore 𝜑𝜉 is bounded. We now show that the map 𝜉 ↦ 𝜑𝜉 from X to ℬ(X, ℝ)
is an isometry. Specifically, we need to show that, for 𝜉, 𝜂 ∈ X, ‖𝜑𝜉 − 𝜑𝜂 ‖∞ =
d(𝜉, 𝜂).
Now ‖𝜑𝜉 − 𝜑𝜂 ‖∞ = supx∈X |𝜑𝜉 (x) − 𝜑𝜂 (x)| = supx∈X |d(x, 𝜉) − d(x, 𝜂)| ≤ d(𝜉, 𝜂).
Therefore ‖𝜑𝜉 − 𝜑𝜂 ‖∞ ≤ d(𝜉, 𝜂).
Since |𝜑𝜉 (𝜉) − 𝜑𝜂 (𝜉)| = d(𝜉, 𝜂), ‖𝜑𝜉 − 𝜑𝜂 ‖∞ = d(𝜉, 𝜂), as desired. 
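
The isometry in the proof of theorem 4.8.4 can be checked by hand on a finite metric space, where the supremum is a maximum. The sketch below is only an illustration: the point set, the base point a, and the function names are assumptions, and the metric is d(x, y) = |x − y| on a few integers so that all arithmetic is exact.

```python
def embedding(points, d, a):
    """Map each xi to the tuple of values phi_xi(x) = d(x, xi) - d(x, a)."""
    return {xi: tuple(d(x, xi) - d(x, a) for x in points) for xi in points}


def sup_distance(f, g):
    """Supremum norm of f - g for functions stored as tuples of values."""
    return max(abs(s - t) for s, t in zip(f, g))


def d(x, y):
    """Hypothetical metric on the sample points."""
    return abs(x - y)


points = [0, 1, 3, 7, 12]
phi = embedding(points, d, a=points[0])

# The embedding preserves distances: the sup norm of phi_xi - phi_eta is d(xi, eta).
is_isometry = all(
    sup_distance(phi[p], phi[q]) == d(p, q) for p in points for q in points
)
```

As in the proof, the maximum of |d(x, 𝜉) − d(x, 𝜂)| is attained at x = 𝜉, and the choice of the base point a does not affect the distances.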

Commonly used language to describe the conclusion of the previous theorem is


that X is isometrically embedded in X. By identifying a point 𝜉 ∈ X with 𝜑𝜉 ∈ X,
we often think of X as a subset of X. We employ this convenience in the next
theorem.

Definition. The space X that we just constructed is called the completion of X. It is


the (unique) smallest complete metric space that contains X as a dense subspace.
The following theorem frames that concept.

Theorem 4.8.5. Let (Y, 𝜌) be a complete metric space, and let 𝜑 ∶ X → Y be
an isometry from a metric space X into Y such that 𝜑(X) is dense in Y. Then 𝜑
can be uniquely extended to an isometry 𝜑̄ ∶ X̄ → Y.

Proof. Let x ∈ X̄, and choose a sequence (xn ) in X such that limn xn = x. In
particular, (xn ) is Cauchy. Because 𝜑 is an isometry, (𝜑(xn )) is a Cauchy sequence
in Y. By the completeness of Y, (𝜑(xn )) converges to a point dependent on x. Define
𝜑̄(x) = limn 𝜑(xn ). The reader should verify that the function 𝜑̄ ∶ X̄ → Y is well
defined in the sense that it depends only on x and not on the particular choice of
the sequence (xn ). See problem 4 on section 4.1

To show that 𝜑̄ is onto, let y ∈ Y. Since 𝜑(X) is dense in Y, there exists a sequence
(xn ) in X such that limn 𝜑(xn ) = y. Again, because 𝜑 is an isometry, (xn ) is Cauchy
in X. Since X̄ is complete, (xn ) converges to a point x ∈ X̄. By the very definition
of 𝜑̄, 𝜑̄(x) = y.

Finally, we verify that 𝜑̄ is an isometry. Let x, y ∈ X̄ and choose sequences (xn )
and (yn ) in X such that limn xn = x, and limn yn = y. Since 𝜑 is an isometry,
𝜌(𝜑(xn ), 𝜑(yn )) = d(xn , yn ). Taking the limit of the two sides of the last identity
gives

𝜌(𝜑̄(x), 𝜑̄(y)) = 𝜌(limn 𝜑(xn ), limn 𝜑(yn )) = limn 𝜌(𝜑(xn ), 𝜑(yn ))
= limn d(xn , yn ) = d(x, y). ∎

Example 1. We know that the chordal metric 𝜒 on ℝ is not a complete metric.
It makes sense to ask if the completion of (ℝ, 𝜒) can be described in concrete
terms. The answer is rather obvious now. Since (ℝ, 𝜒) is isometric to 𝒮1∗ , which
is a dense subset of 𝒮1 , the completion of (ℝ, 𝜒) is (isometric to) the circle 𝒮1 .
We did use here the fact that the completion of an incomplete metric space is
unique. See theorems 4.8.4 and 4.8.5. More generally, the completion of (ℝⁿ, 𝜒)
is the sphere 𝒮n . ∎

We now prove two theorems of great utility: Ascoli’s theorem, which gives neces-
sary and sufficient conditions for the compactness of a subset of continuous func-
tions on a compact space in the uniform metric, and the Weierstrass polynomial
approximation theorem. Later in the book, we will encounter several applications
of the two theorems.

Definition. Let X be a metric space. A subset 𝔉 of 𝒞(X) is said to be equicontinuous
at x ∈ X if, for every 𝜖 > 0, there exists 𝛿 > 0 such that, for every y ∈ X with
d(x, y) < 𝛿, and every f ∈ 𝔉, |f(x) − f(y)| < 𝜖. We say that 𝔉 is equicontinuous
if it is equicontinuous at every x ∈ X.

Definition. A subset 𝔉 of 𝒞(X) is said to be uniformly equicontinuous if, for
every 𝜖 > 0, there exists 𝛿 > 0 such that, for every x, y ∈ X with d(x, y) < 𝛿, and
every f ∈ 𝔉, |f(x) − f(y)| < 𝜖.
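To make the two definitions concrete, the following Python sketch compares two families on X = [0, 1]: the powers x ↦ xⁿ, which fail to be equicontinuous at x = 1, and the functions x ↦ sin(nx)/n, which are uniformly equicontinuous because each has Lipschitz constant at most 1. The families are truncated to n ≤ 500 and the neighborhood is sampled on a finite grid; both are assumptions of this illustration, and the helper name `worst_gap` is hypothetical.

```python
import math

def worst_gap(family, x, delta, samples=200):
    """Largest |f(x) - f(y)| over f in family and sampled y in [0, 1] with |x - y| < delta."""
    worst = 0.0
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)   # strictly between 0 and delta
        for y in (x - offset, x + offset):
            if 0.0 <= y <= 1.0:
                for f in family:
                    worst = max(worst, abs(f(x) - f(y)))
    return worst

# Finite truncations of the two families (the cut-off 500 is arbitrary).
powers = [lambda x, n=n: x ** n for n in range(1, 501)]
scaled_sines = [lambda x, n=n: math.sin(n * x) / n for n in range(1, 501)]

gap_powers = worst_gap(powers, 1.0, 0.01)        # stays close to 1 however small delta is
gap_sines = worst_gap(scaled_sines, 1.0, 0.01)   # bounded by delta
```

A finite computation of this kind is evidence rather than proof: it samples finitely many functions and points, whereas the definitions quantify over the whole family and the whole neighborhood.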

Theorem 4.8.6. If X is a compact metric space and 𝔉 is an equicontinuous subset
of 𝒞(X), then 𝔉 is uniformly equicontinuous.

Proof. The proof mimics that of theorem 4.8.3 and is left as an exercise. 

Theorem 4.8.7 (Ascoli’s theorem).⁸ Let X be a compact metric space. A subset 𝔉
of 𝒞(X) is compact in the uniform metric if and only if 𝔉 is closed, bounded, and
equicontinuous.

Proof. Suppose 𝔉 is compact. By theorem 4.7.3, 𝔉 is closed and bounded. We show
that 𝔉 is equicontinuous. Let 𝜖 > 0. The total boundedness of 𝔉 (theorem 4.7.9)
guarantees a finite set of functions {f1 , . . . , fn } in 𝔉 such that 𝔉 ⊆ ⋃ᵢ₌₁ⁿ B(fi , 𝜖/3).
By theorem 4.8.3, each fi is uniformly continuous, so there exists 𝛿i > 0 such
that if d(x, y) < 𝛿i , then |fi (x) − fi (y)| < 𝜖/3. Let 𝛿 = min1≤i≤n 𝛿i . If f ∈ 𝔉, there
exists fi such that ‖fi − f‖∞ < 𝜖/3. Now if x, y ∈ X are such that d(x, y) < 𝛿, then
|f(x) − f(y)| ≤ |f(x) − fi (x)| + |fi (x) − fi (y)| + |fi (y) − f(y)| < 𝜖. This proves that 𝔉
is equicontinuous. We now prove the converse.

Because of theorems 4.6.4 and 4.7.9, it is sufficient to show that 𝔉 is totally
bounded. Let 𝜖 > 0. Since 𝔉 is equicontinuous, for every x ∈ X, there exists 𝛿x > 0
such that, for all y ∈ B(x, 𝛿x ) and all f ∈ 𝔉, |f(x) − f(y)| < 𝜖/4. By the compactness
of X, there exists a subset A = {x1 , . . . , xn } of X such that X = ⋃ᵢ₌₁ⁿ B(xi , 𝛿xi ). By
the boundedness of 𝔉, the set R = ⋃ᵢ₌₁ⁿ {f(xi ) ∶ f ∈ 𝔉} is a bounded subset of the
complex plane, and therefore its closure is compact and hence R is totally bounded. Thus there
is a finite set B = {z1 , . . . , zm } of complex numbers such that R ⊆ ⋃ⱼ₌₁ᵐ B(zj , 𝜖/4).
The following observation is crucial. Consider an arbitrary function f in 𝔉. For
every xi ∈ A, there is a point zj ∈ B such that |f(xi ) − zj | < 𝜖/4. The assignment
xi ↦ zj clearly defines a function from A to B.⁹ This suggests that we look at
the finite set B^A of all functions from A to B. For each 𝜑 ∈ B^A , we define
a set 𝔉𝜑 = ⋂ᵢ₌₁ⁿ {f ∈ 𝔉 ∶ |f(xi ) − 𝜑(xi )| < 𝜖/4}.¹⁰ By the above observation,
𝔉 = ⋃{𝔉𝜑 ∶ 𝜑 ∈ B^A }. We claim that each of the sets 𝔉𝜑 has a diameter less
than 𝜖. This will complete the proof because we can choose a function f𝜑 from
each nonempty 𝔉𝜑 , and then we will have an 𝜖-dense subset of 𝔉.

To prove the claim, let f, g ∈ 𝔉𝜑 . Since |f(xi ) − 𝜑(xi )| < 𝜖/4 and |g(xi ) − 𝜑(xi )| <
𝜖/4 for every xi ∈ A, |f(xi ) − g(xi )| < 𝜖/2 for every xi ∈ A. Now let x ∈ X. Then
x ∈ B(xi , 𝛿xi ) for some xi ∈ A, and

|f(x) − g(x)| ≤ |f(x) − f(xi )| + |f(xi ) − g(xi )| + |g(xi ) − g(x)| < 𝜖. 

Remark. Observe that we did not use the full force of the assumption that 𝔉 is
bounded, just that it is pointwise bounded. Problem 8 at the end of this section
is relevant here.

⁸ Also widely known as the Arzelà-Ascoli theorem.
⁹ The point zj may not be unique, so we pick one such.
¹⁰ It is possible that 𝔉𝜑 = ∅.

Theorem 4.8.8 (the Weierstrass polynomial approximation theorem). Let g ∈
𝒞[0, 1] and let 𝜖 > 0. Then there exists a polynomial P such that

‖g − P‖∞ < 𝜖.

Proof. Observe that the theorem says that the space of polynomials is dense
in 𝒞[0, 1]. Without loss of generality, we may replace g with the function
f(x) = g(x) − [g(0) + x(g(1) − g(0))]. This is because g(0) + x(g(1) − g(0)) is a
polynomial. Replacing g with f has the advantage that f(0) = f(1) = 0. Extend f to
ℝ by defining f(x) = 0 when x ∉ [0, 1].

Define $L_n(x) = c_n \int_{-1}^{1} f(x+t)(1-t^2)^n\,dt$, where $c_n^{-1} = \int_{-1}^{1} (1-t^2)^n\,dt$. Since
f(x) = 0 for x ∉ [0, 1], for every x ∈ [0, 1],

$$L_n(x) = c_n \int_{-x}^{1-x} f(x+t)(1-t^2)^n\,dt = c_n \int_{0}^{1} f(\xi)\,[1-(\xi-x)^2]^n\,d\xi \qquad (\xi = x+t).$$

The last expression makes it clear that Ln (x) is a polynomial of degree ≤ 2n.

It is a simple induction exercise to show that, for all n ∈ ℕ, $(1-t^2)^n \ge 1 - nt^2$, so

$$\int_{-1}^{1} (1-t^2)^n\,dt \ge \int_{-1/\sqrt{n}}^{1/\sqrt{n}} (1-nt^2)\,dt = \frac{4}{3\sqrt{n}} > \frac{1}{\sqrt{n}}.$$

In particular, $c_n < \sqrt{n}$.
Now let 𝜖 > 0. The uniform continuity of f yields a number 0 < 𝛿 < 1 such that, for
all x, y with |x − y| < 𝛿, |f(x) − f(y)| < 𝜖. Now $c_n \int_{\delta}^{1} (1-t^2)^n\,dt < \sqrt{n}\,(1-\delta^2)^n$.
Since $\lim_n \sqrt{n}\,(1-\delta^2)^n = 0$, we can pick an integer n such that $c_n \int_{\delta}^{1} (1-t^2)^n\,dt < \epsilon$.

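Since $\int_{-1}^{1} c_n (1-t^2)^n\,dt = 1$ and, for |t| < 1, $c_n (1-t^2)^n \ge 0$, we have, for every x ∈ [0, 1],

$$
\begin{aligned}
|L_n(x) - f(x)| &= \left| c_n \int_{-1}^{1} [f(x+t) - f(x)](1-t^2)^n\,dt \right| \\
&\le c_n \int_{-1}^{1} |f(x+t) - f(x)|(1-t^2)^n\,dt \\
&= c_n \int_{-\delta}^{\delta} |f(x+t) - f(x)|(1-t^2)^n\,dt + c_n \int_{\delta < |t| \le 1} |f(x+t) - f(x)|(1-t^2)^n\,dt \\
&\le \epsilon\, c_n \int_{-\delta}^{\delta} (1-t^2)^n\,dt + 2\|f\|_\infty\, c_n \int_{\delta < |t| \le 1} (1-t^2)^n\,dt \\
&\le \epsilon + 4\|f\|_\infty\, c_n \int_{\delta}^{1} (1-t^2)^n\,dt \le \epsilon + 4\epsilon\|f\|_\infty.
\end{aligned}
$$

Since 𝜖 is arbitrary, Ln approximates f uniformly on [0, 1] as closely as we wish. ∎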
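The construction in the proof can be tried numerically. The sketch below evaluates Ln (x) with a midpoint rule for one sample function and reports the largest error on a grid. The sample function `f`, the quadrature sizes, and the helper names `midpoint`, `landau`, and `sup_error` are assumptions made for this illustration only.

```python
def midpoint(func, a, b, nodes):
    """Midpoint-rule estimate of the integral of func over [a, b]."""
    h = (b - a) / nodes
    return h * sum(func(a + (k + 0.5) * h) for k in range(nodes))

def f(x):
    # Sample function: continuous on [0, 1] with f(0) = f(1) = 0, as the proof requires.
    return abs(x - 0.5) - 0.5

def landau(n, x, c_inv, nodes=2000):
    """Estimate L_n(x) = c_n * integral over [0, 1] of f(s) * (1 - (s - x)^2)^n ds."""
    return midpoint(lambda s: f(s) * (1.0 - (s - x) ** 2) ** n, 0.0, 1.0, nodes) / c_inv

def sup_error(n, grid=101):
    """Largest |L_n(x) - f(x)| over an equally spaced grid in [0, 1]."""
    # c_inv estimates the integral of (1 - t^2)^n over [-1, 1], i.e. 1 / c_n.
    c_inv = midpoint(lambda t: (1.0 - t * t) ** n, -1.0, 1.0, 4000)
    return max(abs(landau(n, i / (grid - 1), c_inv) - f(i / (grid - 1)))
               for i in range(grid))

errors = [sup_error(n) for n in (10, 50, 200)]
```

For this sample function the errors shrink as n grows, roughly like 1/√n, which is consistent with the kink of f at x = 1/2; smoother functions should converge faster.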

Example 2. 𝒞[0, 1] is separable.

Let A be the set of polynomials with rational coefficients. Clearly, A is countable.
We show it is dense in 𝒞[0, 1]. Let f ∈ 𝒞[0, 1], and let 𝜖 > 0. By the Weierstrass
theorem, there is a polynomial $q = \sum_{i=0}^{n} a_i x^i$ such that ‖f − q‖∞ < 𝜖/2. For
each 0 ≤ i ≤ n, choose a rational number $r_i$ such that $|a_i - r_i| < \epsilon/(2(n+1))$,
and define $p = \sum_{i=0}^{n} r_i x^i$. Since $|x^i| \le 1$ on [0, 1],
$\|f - p\|_\infty \le \|f - q\|_\infty + \|q - p\|_\infty \le \epsilon/2 + \sum_{i=0}^{n} |a_i - r_i| < \epsilon$. ∎
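The rounding step in the example above is easy to carry out exactly. The following Python sketch replaces each coefficient by a nearby rational number within the tolerance 𝜖/(2(n + 1)) used in the example; the function name `rationalize` and the sample coefficients are hypothetical.

```python
from fractions import Fraction
import math

def rationalize(coeffs, eps):
    """Return rationals r_i with |a_i - r_i| < eps / (2 * (n + 1)), where n + 1 = len(coeffs)."""
    tol = Fraction(eps) / (2 * len(coeffs))
    denom = int(1 / tol) + 1                  # guarantees 1/denom < tol
    # Rounding a_i * denom to the nearest integer moves a_i by at most 1/(2 * denom).
    return [Fraction(round(Fraction(a) * denom), denom) for a in coeffs]

sample_coeffs = [math.pi, -math.e, math.sqrt(2.0)]   # illustrative coefficients a_0, a_1, a_2
sample_eps = 1e-3
rational_coeffs = rationalize(sample_coeffs, sample_eps)

# On [0, 1] the uniform distance between q and p is at most this sum.
total_shift = sum(abs(Fraction(a) - r) for a, r in zip(sample_coeffs, rational_coeffs))
```

Exact rational arithmetic is used so that the comparison with the tolerance is not itself subject to rounding.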

The last example and the Weierstrass polynomial approximation theorem have
far-reaching generalizations. Their proofs require the full power of the Stone-
Weierstrass theorem. The proof of all three theorems can be found in section 4.9.

Example 3. Let f ∈ 𝒞[0, 1] be such that $\int_0^1 x^n f(x)\,dx = 0$ for every nonnegative
integer n. Then f = 0.

Without loss of generality, assume that f is a real function. The assumption
implies that $\int_0^1 f(x)p(x)\,dx = 0$ for any polynomial p. Let 𝜖 > 0. By the Weierstrass
theorem, there is a polynomial p such that ‖f − p‖∞ < 𝜖.
Now $\left|\int_0^1 f^2\,dx\right| = \left|\int_0^1 f(f-p)\,dx\right| \le \int_0^1 |f(x)||f(x)-p(x)|\,dx \le \epsilon \int_0^1 |f|\,dx \le \epsilon\|f\|_\infty$.
Since 𝜖 is arbitrary, $\int_0^1 f^2\,dx = 0$. The continuity of f forces f = 0. ∎

The discussion so far has been focused on scalar-valued functions, and all the
function spaces we have studied are normed linear spaces. We now expand the
discussion and consider functions that take values in a general metric space. The
next two examples are extensions of theorems 4.8.1 and 4.8.2.

Example 4. Let (Y, 𝜌) be a bounded metric space, and let X be an arbitrary
nonempty set. For functions f, g ∶ X → Y, define
D(f, g) = supx∈X 𝜌(f(x), g(x)).
Then D is a metric on the set Y^X of all functions from X to Y. If, in addition, 𝜌
is a complete metric, so is D.

Observe that the definition of D makes sense because 𝜌 is a bounded metric.
The verification that D is a metric on Y^X is straightforward. Now assume that 𝜌
is complete, and let (fn ) be a Cauchy sequence in Y^X . For 𝜖 > 0, there is a natural
number N such that, for m, n > N, supx∈X 𝜌(fn (x), fm (x)) < 𝜖. In particular, for
an arbitrary x ∈ X, the sequence (fn (x)) is a Cauchy sequence in Y. By the
completeness of 𝜌, f(x) = limn fn (x) exists. We now show that fn converges to f
in the metric D.

Let 𝜖, N, n, and m be as in the last paragraph. Taking the limit as m → ∞, we
obtain 𝜌(fn (x), f(x)) ≤ 𝜖. Since the last inequality holds for all x ∈ X, D(fn , f) ≤ 𝜖
for all n > N. This shows that limn D(fn , f) = 0. ∎

Example 5. Let (Y, 𝜌) be as in the previous example, and assume that 𝜌 is
complete. If (X, d) is a metric space, then the space (𝒞(X, Y), D) of continuous
functions from (X, d) to (Y, 𝜌) is complete.

Because (Y^X , D) is complete by example 4, it is enough to show that 𝒞(X, Y) is
closed in (Y^X , D). Let f ∈ Y^X be a closure point
of 𝒞(X, Y). We need to show that f is continuous. For 𝜖 > 0, there exists a
function g ∈ 𝒞(X, Y) such that D(f, g) < 𝜖/3. Fix x0 ∈ X, and let 𝛿 > 0 be such
that d(x, x0 ) < 𝛿 implies that 𝜌(g(x), g(x0 )) < 𝜖/3. Now if d(x, x0 ) < 𝛿, then
𝜌(f(x), f(x0 )) ≤ 𝜌(f(x), g(x)) + 𝜌(g(x), g(x0 )) + 𝜌(g(x0 ), f(x0 )) < 𝜖. This proves
that f is continuous at x0 . ∎

Let I = [0, 1] be the closed unit interval, and let I² = [0, 1] × [0, 1] be the closed
unit square; I is given the usual metric on ℝ, and we give I² the product metric
𝜌((r, s), (u, v)) = max{|r − u|, |s − v|}. In theorem 4.8.9, we will make use of
the space 𝒞(I, I²) defined in example 5 with the complete metric D defined in
example 4.

Application: A Space-Filling Curve

Let J = [a, b] be an arbitrary closed interval, and let S be an arbitrary closed square.
We will refer to a function in 𝒞(J, S) as a path. We are particularly interested in the
four types of triangular paths g shown in figure 4.4. The triangles differ only in
orientation. Specifically, the intervals [a, (a + b)/2] and [(a + b)/2, b] are mapped
linearly onto the straight line segments of the triangle such that g(a) and g(b) are
adjacent corners of S and g((a + b)/2) is the center of S. See the formula defining
the path f0 in the proof of theorem 4.8.9.
Before we embark on the task of finding the space-filling curve, we describe
a special type of operation we need in the proof of the next theorem. Observe
that the paths g intersect only two of the four sub-squares that result from
bisecting the sides of S. We define the modified paths g′ as follows. Divide [a, b]
into four congruent subintervals Jj = [a + j(b − a)/4, a + (j + 1)(b − a)/4], 0 ≤ j ≤
3, and map the subinterval Jj linearly onto the four triangular paths that make up
the path g′ , as shown in figure 4.5. Observe that the paths g′ intersect all the sub-
squares of S.
We are now ready to find the space-filling curve. The statement of the theorem
below justifies the term space filling.

Figure 4.4 The triangular paths g, panels (a) to (d)

Theorem 4.8.9. There exists a continuous surjection f from I to I².

Proof. We apply the operation discussed above the theorem to construct a sequence
(fn ) that converges to the desired function f.
Define the path f0 ∶ I → I² by

f0 (t) = (t, t) if 0 ≤ t ≤ 1/2,
f0 (t) = (t, 1 − t) if 1/2 ≤ t ≤ 1.

Figure 4.4 (a) depicts the path f0 .


Applying the operation described above the theorem, we can find a path f1
consisting of the four triangular paths shown in figure 4.5(a). Next we apply the
operation to each of the triangular pieces of f1 to produce the path f2 . Observe
that this requires dividing each of the subintervals Ij = [j/4, (j + 1)/4] into four
congruent subintervals and modifying the restriction of f1 to each Ij , depending on
its orientation, according to figure 4.5. The repeated application of the operation
produces a sequence of paths (fn ), which we show converges to the space-filling
curve. The path fn consists of 4ⁿ triangular paths, and each triangle is contained in
a square of side length 2⁻ⁿ. The triangular pieces correspond to partitioning I into the
subintervals [j/4ⁿ, (j + 1)/4ⁿ], 0 ≤ j ≤ 4ⁿ − 1. The paths f2 , f3 , and f4 are shown in
figure 4.6.

Figure 4.5 The modified paths g′, panels (a) to (d)

Figure 4.6 The paths f2 , f3 , and f4 , panels (a) to (c), each drawn in the unit square

A crucial feature of the sequence (fn ) is that if a triangular piece T of the path
fn is contained in a square S of side length 2⁻ⁿ, then the four triangular pieces of
fn+1 obtained by modifying T are contained in the same square S. Thus, for every
t ∈ I, 𝜌(fn+1 (t), fn (t)) < 2⁻ⁿ. Consequently, D(fn+1 , fn ) < 2⁻ⁿ. This is the crux of
the proof.

Now, for positive integers m, n, if m > n, then

D(fm , fn ) ≤ D(fm , fm−1 ) + D(fm−1 , fm−2 ) + . . . + D(fn+1 , fn )
< 2⁻⁽ᵐ⁻¹⁾ + . . . + 2⁻ⁿ < 2⁻ⁿ⁺¹ → 0 as n → ∞.

Thus the sequence (fn ) is Cauchy, and the completeness of (𝒞(I, I²), D) guarantees
that fn converges to a function f ∈ 𝒞(I, I²).
Since I is compact, the range of f is compact and hence closed in I². The proof will
be complete if we show that the range of f is dense in I². Let x ∈ I², and let 𝜖 > 0.
Choose an integer n such that 2⁻ⁿ < 𝜖/2 and D(fn , f) < 𝜖/2. The point x belongs
to one of the 4ⁿ squares that contain the triangular pieces of fn . Let S be such a
square. If t ∈ [0, 1] is such that fn (t) is on the triangular piece contained in S, then
𝜌(fn (t), x) ≤ 2⁻ⁿ and

𝜌(f(t), x) ≤ 𝜌(f(t), fn (t)) + 𝜌(fn (t), x) ≤ D(fn , f) + 2⁻ⁿ < 𝜖. ∎

Exercises

1. Let Y be a complete metric space, let A be a dense subset of a metric space
X, and suppose that f ∶ A → Y is a uniformly continuous function.
(a) Show that f maps Cauchy sequences into Cauchy sequences.
(b) Show that f admits a unique uniformly continuous extension f ∶ X → Y.
2. Give an example to show that the mere continuity of f in the above exercise
is not enough to guarantee an extension.
3. Prove that the completion of a separable metric space is separable.
4. Prove that the function 𝜑 in the proof of theorem 4.8.5 is well defined.
5. Let 𝔉 be a pointwise bounded family of continuous, scalar-valued functions
on a complete metric space X. Thus, for each x ∈ X, sup{|f(x)| ∶ f ∈ 𝔉} < ∞.
Prove that there exists an open subset V ⊆ X such that sup{|f(x)| ∶ x ∈ V, f ∈
𝔉} < ∞. Hint: Let An = {x ∈ X ∶ |f(x)| ≤ n for every f ∈ 𝔉}.
6. Show that if X is compact and 𝔉 ⊆ 𝒞(X) is equicontinuous, then the closure 𝔉̄ of 𝔉 is
equicontinuous.
7. Ascoli’s theorem is often applied in the following form. Let X be a compact
metric space. If a sequence (fn ) of functions in 𝒞(X) is bounded and
equicontinuous, then (fn ) contains a subsequence that converges in 𝒞(X).
Prove this version of Ascoli’s theorem.
8. Let X be a compact metric space, and let 𝔉 be an equicontinuous family
of functions in 𝒞(X). Prove that if 𝔉 is pointwise bounded, then 𝔉 is
uniformly bounded. Hint: Let g(x) = sup{|f(x)| ∶ f ∈ 𝔉}. For n ∈ ℕ, let
Un = {x ∈ X ∶ g(x) < n}. Prove that Un is open and that {Un ∶ n ∈ ℕ} is an
open cover of X.
9. Let X be a compact metric space, and let (fn ) be a sequence of equicon-
tinuous functions that converges pointwise to a function f. Prove that f is
continuous and that (fn ) converges uniformly to f.
10. Dini’s theorem. Let X be a compact metric space, and let fn ∶ X → ℝ be a
sequence of continuous functions such that, for x ∈ X, f1 (x) ≥ f2 (x) ≥ . . . ,
and limn fn (x) = 0. Prove that (fn ) converges uniformly to the zero function.
11. Let X be a compact metric space, and let fn ∶ X → ℝ be a sequence of con-
tinuous functions such that, for x ∈ X, f1 (x) ≤ f2 (x) ≤ . . . , and limn fn (x) =
f(x), where f is continuous. Prove that fn converges uniformly to f.
12. Prove that 𝒞[a, b] is not locally compact.
13. Prove that 𝒞[a, b] is separable and that polynomials are dense in 𝒞[a, b].
14. Let X = 𝒞(ℝⁿ), and let Ki be the closed ball of radius i centered at
the origin. Clearly, K1 ⊆ K2 ⊆ . . . , and $\bigcup_{i=1}^{\infty} K_i = \mathbb{R}^n$. Let ‖.‖i denote the
uniform norm on 𝒞(Ki ). For a continuous function f ∶ ℝⁿ → ℂ, ‖f‖i =
sup{|f(x)| ∶ x ∈ Ki } denotes the norm of the restriction of f to Ki . Define
a metric d on X as follows:

$$d(f, g) = \sum_{i=1}^{\infty} 2^{-i} \min\{1, \|f - g\|_i\}.$$

(a) Show that d is a metric on X.
(b) Show that, for each i ∈ ℕ, d(f, g) ≤ ‖f − g‖i + 2⁻ⁱ.
(c) Show that a sequence of functions (fm ) in X converges in the metric d
to f ∈ X if and only if (fm ) converges uniformly to f on compact subsets
of ℝⁿ.
(d) Show that d is a complete metric.
15. Prove that, for every natural number n, there is a continuous surjection
from I to Iⁿ.
16. Prove that a countable metric space can be isometrically injected in l^∞.
Hint: Examine the proof of theorem 4.8.4.
17. Prove that every separable metric space can be isometrically injected in l^∞.

4.9 The Stone-Weierstrass Theorem

Like the Weierstrass theorem, the Stone-Weierstrass theorem is an approximation
theorem. However, the Stone-Weierstrass theorem allows us to prove far-reaching
generalizations of the results we obtained in section 4.8. Powerful theorems often
require the development of elaborate machinery and this section demonstrates
that.

Throughout this section, X is a compact metric space, 𝒞(X, ℝ) is the space of
continuous, real-valued functions on X, and 𝒞(X, ℂ) is the space of continuous,
complex-valued functions on X. We use the notation 𝒞(X) for either of the two
spaces when the distinction is immaterial; 𝒞(X) is endowed with the uniform
norm and, as such, is a Banach space. Additionally, 𝒞(X) is an algebra with the
pointwise multiplication of functions: (fg)(x) = f(x)g(x). See the definition of an
algebra in section 3.4.

Let 𝒜 be a subalgebra of 𝒞(X) satisfying the following standing assumptions:

(SA1) 𝒜 contains all constant functions.


(SA2) 𝒜 separates points in X in the sense that if x, y are distinct points in X,
then there exists a function h ∈ 𝒜 such that h(x) ≠ h(y).

Lemma 4.9.1. Let 𝒜 be a subalgebra of 𝒞(X, ℝ) satisfying SA1 and SA2. Then, for
f, g ∈ 𝒜, the functions max{f, g} and min{f, g} are in 𝒜̄ (the closure of 𝒜).

Proof. Since $\max\{f, g\} = \tfrac{1}{2}(f + g) + \tfrac{1}{2}|f - g|$, and $\min\{f, g\} = \tfrac{1}{2}(f + g) - \tfrac{1}{2}|f - g|$, and
since 𝒜̄ is a subspace of 𝒞(X, ℝ), it is sufficient to prove that |f| ∈ 𝒜̄ whenever
f ∈ 𝒜. Let M = ‖f‖∞ , and let 𝜖 > 0. By the Weierstrass approximation theorem
applied to the function g(t) = |t|, there exists a polynomial $p(t) = \sum_{j=0}^{n} a_j t^j$ ($a_j \in \mathbb{R}$)
such that, for all t ∈ [−M, M], | |t| − p(t)| < 𝜖. Consider the function
$p \circ f = \sum_{j=0}^{n} a_j f^j$. Since 𝒜 is an algebra, p∘f ∈ 𝒜, and since | |f(x)| − p(f(x))| < 𝜖 for all
x ∈ X, ‖|f| − p∘f‖∞ < 𝜖, and |f| ∈ 𝒜̄. ∎

Lemma 4.9.2. Let 𝒜 be a subalgebra of 𝒞(X, ℝ) satisfying SA1 and SA2, and let f ∈
𝒞(X, ℝ). For every y, z ∈ X, there exists a function gyz ∈ 𝒜 such that gyz (y) = f(y)
and gyz (z) = f(z).

Proof. If y = z, define gyz (x) = f(y) (a constant function). Otherwise, by SA2, there
exists a function h ∈ 𝒜 such that h(y) ≠ h(z). The following function is in 𝒜 and
satisfies the requirements:

$$g_{yz}(x) = f(z) + (f(y) - f(z))\,\frac{h(x) - h(z)}{h(y) - h(z)}.$$
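The formula for gyz is short enough to test directly. The Python sketch below builds the interpolating function from a target f and a separating function h; the name `two_point_interpolant` and the sample choices (f = cos, h the identity on X = [0, 1]) are assumptions made for illustration.

```python
import math

def two_point_interpolant(f, h, y, z):
    """Return g_yz from lemma 4.9.2; when y != z it assumes h(y) != h(z)."""
    if y == z:
        value = f(y)
        return lambda x: value                      # the constant function f(y)
    scale = (f(y) - f(z)) / (h(y) - h(z))
    return lambda x: f(z) + scale * (h(x) - h(z))   # affine in h, hence in the algebra

# Sample data: interpolate cos at the points 0.2 and 0.9 using h(x) = x.
g = two_point_interpolant(math.cos, lambda x: x, 0.2, 0.9)
```

Because g is an affine combination of h and a constant, it lies in any algebra that contains h and the constants, which is exactly what the lemma uses.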

Theorem 4.9.3 (the Stone-Weierstrass theorem). Let 𝒜 be a subalgebra of
𝒞(X, ℝ) satisfying SA1 and SA2. Then 𝒜 is dense in 𝒞(X, ℝ).

Proof. It is sufficient to show that 𝒜̄ is dense in 𝒞(X, ℝ). Observe that 𝒜̄ is a
subalgebra of 𝒞(X, ℝ) that satisfies SA1 and SA2. Let f ∈ 𝒞(X, ℝ), and let 𝜖 > 0.
We will show that there is a function g ∈ 𝒜̄ such that ‖f − g‖∞ < 𝜖. For y, z ∈ X,
let $g_{y,z}$ be as in lemma 4.9.2, and let $U_{y,z} = (f - g_{y,z})^{-1}(-\epsilon, \epsilon)$. By the continuity
of $f - g_{y,z}$, $U_{y,z}$ is open and, clearly, $y, z \in U_{y,z}$. In particular, for every $x \in U_{y,z}$,
$f(x) < g_{y,z}(x) + \epsilon$ and $f(x) > g_{y,z}(x) - \epsilon$. Fix z ∈ X. The collection $\{U_{y,z} : y \in X\}$ covers X.
Thus there exists a finite subset $\{y_1, \dots, y_{n_z}\}$ of X such that $\bigcup_{i=1}^{n_z} U_{y_i,z} = X$.
Define $g_z = \max\{g_{y_i,z} : 1 \le i \le n_z\}$, and let $V_z = \bigcap_{i=1}^{n_z} U_{y_i,z}$. The function $g_z$ is
in 𝒜̄ by lemma 4.9.1. Observe that

f(x) < gz (x) + 𝜖 for all x, z ∈ X,

and

f(x) > gz (x) − 𝜖 for all z ∈ X and all x ∈ Vz .

Now each Vz is an open neighborhood of z; hence the collection {Vz ∶ z ∈ X}
covers X. Thus there exists a finite subset {z1 , . . . , zm } of X such that $\bigcup_{j=1}^{m} V_{z_j} = X$.
Finally, let

$g = \min\{g_{z_j} : 1 \le j \le m\}$.

Then g ∈ 𝒜̄ by lemma 4.9.1, and a little reflection reveals that g(x) − 𝜖 < f(x) < g(x) + 𝜖 for all x ∈ X. ∎

The following corollary is a far-reaching generalization of the Weierstrass polynomial
approximation theorem.

Corollary 4.9.4. If X is a compact subset of ℝⁿ, then the set 𝒜 of polynomials with
real coefficients in n variables is dense in 𝒞(X, ℝ).

Proof. Clearly, 𝒜 is an algebra, and it contains all constant functions. To show that 𝒜
separates points in X, let x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) be distinct points in
X. The polynomial $p(t_1, \dots, t_n) = \sum_{i=1}^{n} (t_i - x_i)^2$ satisfies p(x) = 0, and p(y) > 0.
By the Stone-Weierstrass theorem, 𝒜 is dense in 𝒞(X, ℝ). ∎

The following result is the promised generalization of example 2 in section 4.8.

Corollary 4.9.5. If X is a compact metric space, then 𝒞(X, ℂ) is separable.

Proof. Since compact metric spaces are separable, let {𝜉n ∶ n ∈ ℕ} be a countable
dense subset of X. For n ∈ ℕ, define fn (x) = d(x, 𝜉n ), and define f0 (x) = 1. Let
ℳ be the set of all finite products of the functions f0 , f1 , . . . , and let 𝒜 be the set
of all linear combinations with real coefficients of elements in ℳ. Clearly, 𝒜 is
a subalgebra of 𝒞(X, ℝ).¹¹ We show that 𝒜 separates points in X. If x and y are
distinct, let 𝛿 = d(x, y)/4. There exists a natural number n such that d(x, 𝜉n ) < 𝛿.
The function fn separates x and y since fn (x) < 𝛿, and fn (y) > 3𝛿.

¹¹ In fact, 𝒜 is the subalgebra generated by the set $\{f_n\}_{n=0}^{\infty}$, that is, the smallest subalgebra of 𝒞(X, ℝ)
that contains $\{f_n\}_{n=0}^{\infty}$.

By theorem 4.9.3, 𝒜 is dense in 𝒞(X, ℝ). We now show that the countable set
$\mathcal{A}_1 = \{\sum_{i=1}^{n} q_i g_i : n \in \mathbb{N},\ q_i \in \mathbb{Q},\ g_i \in \mathcal{M}\}$ is dense in 𝒞(X, ℝ). By the first part of
the proof, it is enough to show that if $f = \sum_{i=1}^{n} a_i g_i \in \mathcal{A}$ and 𝜖 > 0, then there exists
an element h ∈ 𝒜1 such that ‖f − h‖∞ < 𝜖. Let M = max{‖gi ‖∞ ∶ 1 ≤ i ≤ n}
and choose rational numbers qi such that |ai − qi | < 𝜖/(nM). Set $h = \sum_{i=1}^{n} q_i g_i$.
Clearly, ‖f − h‖∞ < 𝜖.

To show that 𝒞(X, ℂ) is separable, let f = f1 + if2 ∈ 𝒞(X, ℂ), and choose functions
h1 and h2 in 𝒜1 such that ‖f1 − h1 ‖∞ < 𝜖/2, and ‖f2 − h2 ‖∞ < 𝜖/2. The function
h = h1 + ih2 is in 𝒜1 + i𝒜1 and satisfies ‖f − h‖∞ < 𝜖. Since 𝒜1 + i𝒜1 is count-
able, the proof is complete. 

Theorem 4.9.3 does not extend to 𝒞(X, ℂ), as we show in example 1 below. First
we need a definition.

Definition. Let 𝒞(𝒮1 , ℂ)¹² be the space of all continuous complex functions on
[−𝜋, 𝜋] such that f(−𝜋) = f(𝜋). It is clear that 𝒞(𝒮1 , ℂ) is a closed subspace of
𝒞[−𝜋, 𝜋] when both spaces are given the uniform norm.

Another way to view the space 𝒞(𝒮1 , ℂ) is as follows. The restriction of any
continuous, 2𝜋-periodic function g ∶ ℝ → ℂ to the interval [−𝜋, 𝜋] is in the space
𝒞(𝒮1 , ℂ). Conversely, any function f ∈ 𝒞(𝒮1 , ℂ) can be extended by periodicity
to a continuous, 2𝜋-periodic function. Thus the space 𝒞(𝒮1 , ℂ) is also the space
of continuous, 2𝜋-periodic functions. Every point 𝜃 ∈ [−𝜋, 𝜋) corresponds to a
unique point $e^{i\theta}$ on the unit circle 𝒮1 in the complex plane, and, for every function
f ∈ 𝒞(𝒮1 , ℂ), there corresponds a function f̃ ∶ 𝒮1 → ℂ, where $\tilde{f}(e^{i\theta}) = f(\theta)$ (here
𝜃 ∈ [−𝜋, 𝜋)). The correspondence f ↔ f̃ is unambiguous because of the condition
f(−𝜋) = f(𝜋). Therefore the space of continuous, 2𝜋-periodic functions can also be thought of
as the space of continuous functions on the unit circle 𝒮1 . We adopt any of the
three equivalent characterizations of 𝒞(𝒮1 , ℂ), as convenience dictates.

Example 1. For n = 0, 1, . . . let $u_n(t) = e^{int}$. The set $\{u_n\}_{n=0}^{\infty}$ separates points in
[−𝜋, 𝜋]. Thus the set $\mathcal{A} = \{\sum_{i=0}^{n} a_i u_i : a_i \in \mathbb{C}\}$ is a subalgebra of 𝒞(𝒮1 , ℂ) that
separates points in [−𝜋, 𝜋] and contains all constant functions. However,
𝒜 is not dense in 𝒞(𝒮1 , ℂ). We show that for the function $f(t) = e^{-it}$,
‖f − p‖∞ ≥ 1 for all p ∈ 𝒜. First observe that for any $p = \sum_{j=0}^{n} a_j u_j \in \mathcal{A}$,

$$\int_0^{2\pi} \bar{f} p\,dt = \sum_{j=0}^{n} a_j \int_0^{2\pi} e^{i(j+1)t}\,dt = 0.$$

Because $|f|^2 = f\bar{f} = 1$,

$$2\pi = \left| \int_0^{2\pi} \bar{f} f\,dt \right| = \left| \int_0^{2\pi} \bar{f}(f - p)\,dt \right| \le \int_0^{2\pi} |f - p|\,dt \le 2\pi \|f - p\|_\infty.$$

Thus ‖f − p‖∞ ≥ 1, as claimed. ∎

¹² The reason for the notation will be justified in the next paragraph.
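The lower bound in the example can be observed numerically. The Python sketch below evaluates |f(t) − p(t)| on an equally spaced grid for a few trial coefficient lists; the helper name `sup_distance_on_grid`, the grid size, and the trial coefficients are hypothetical choices for this illustration.

```python
import cmath
import math

def sup_distance_on_grid(coeffs, samples=512):
    """Grid estimate of ||f - p||_inf for f(t) = e^{-it} and p(t) = sum_j coeffs[j] * e^{ijt}."""
    worst = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        f_t = cmath.exp(-1j * t)
        p_t = sum(a * cmath.exp(1j * j * t) for j, a in enumerate(coeffs))
        worst = max(worst, abs(f_t - p_t))
    return worst

# A few arbitrary elements p of the algebra, given by their coefficients a_0, a_1, ...
trial_coefficients = [
    [0.0],
    [1.0, -0.5],
    [0.3 + 0.2j, -1.0j, 0.25, 2.0],
]
distances = [sup_distance_on_grid(c) for c in trial_coefficients]
```

A maximum over a grid can only underestimate the true supremum, so grid values of at least 1 (up to rounding) are consistent with the bound proved in the example.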

The following is the generalization of theorem 4.9.3 to the complex case.

Theorem 4.9.6 (the Stone-Weierstrass theorem). Let X be a compact metric space
and let 𝒜 be a subalgebra of 𝒞(X, ℂ) that satisfies SA1 and SA2. If 𝒜 is closed
under complex conjugation, then 𝒜 is dense in 𝒞(X, ℂ).

Proof. Let ℛ = {Re(f) ∶ f ∈ 𝒜}, and let ℐ = {Im(f) ∶ f ∈ 𝒜}. If f = f1 + if2 ∈ 𝒜, then
if = −f2 + if1 ∈ 𝒜. Thus f2 ∈ ℛ and f1 ∈ ℐ. It follows that ℐ = ℛ. First we show
that ℛ satisfies SA1 and SA2. It is clear that ℛ contains all constant functions. If
x and y are distinct points of X, then there exists f ∈ 𝒜 such that f(x) ≠ f(y). Thus
f1 (x) ≠ f1 (y) or f2 (x) ≠ f2 (y). Because f1 and f2 are in ℛ, ℛ separates points in X.
Theorem B.3 implies that ℛ is dense in 𝒞(X, ℝ). Because 𝒜 is closed under complex
conjugation, f1 = (f + f̄)/2 ∈ 𝒜; thus ℛ ⊆ 𝒜, and hence ℛ + iℛ ⊆ 𝒜. We show
that ℛ + iℛ is dense in 𝒞(X, ℂ). Let f = f1 + if2 ∈ 𝒞(X, ℂ), and let 𝜖 > 0. By the
density of ℛ in 𝒞(X, ℝ), there are functions
h1 , h2 ∈ ℛ such that ‖f1 − h1 ‖∞ < 𝜖/2 and ‖f2 − h2 ‖∞ < 𝜖/2. The function h =
h1 + ih2 is in ℛ + iℛ and ‖f − h‖∞ < 𝜖. ∎

Example 2. For n ∈ ℤ, let $u_n(t) = e^{int}$, and consider the set 𝒯 = Span({un ∶ n ∈
ℤ}). 𝒯 is clearly a subalgebra of 𝒞(𝒮1 , ℂ) that satisfies the assumptions of
theorem 4.9.6. Therefore 𝒯 is dense in 𝒞(𝒮1 , ℂ).

The last example is really a well-known theorem. We will expand this discussion
in a more focused manner in the next section.

4.10 Fourier Series and Orthogonal Polynomials

In section 3.7 we studied the geometry of inner product spaces more than their
metric properties. We now have a bigger toolbox with which we can tackle
inner product spaces. Before we pose the central questions of this section, let us
summarize the highlights of section 3.7, upon which this section rests heavily. Let
{u1 , u2 , . . . } be an infinite orthonormal sequence of vectors in an inner product
space H. The orthogonal projection of an element x ∈ H on the finite-dimensional
space Mn = Span({u1 , . . . , un }) is, by definition, the vector $S_n x = \sum_{i=1}^{n} \langle x, u_i \rangle u_i$. We
know from theorem 3.7.6 that the vector Sn x is the closest vector in Mn to x,
and we also say that Sn x is the best approximation of x in Mn . Now that we have
studied convergence in metric spaces, it is natural to ask whether limn Sn x = x.
Unfortunately, we are still not in a position to state an exact set of conditions under
which a general answer can be provided because the answer depends on the space
H and the sequence {u1 , u2 , . . . }. The reader should suspect that completeness is
relevant here, and it is. The spaces we study in this section are not complete, and this
is precisely the reason we cannot decisively settle the question posed above about
the convergence of the sequence Sn x. In two of the major examples we consider in
this section, we will answer this question satisfactorily but not completely. The full
picture will materialize in sections 7.2 and 8.9.

Fourier series

In section 3.7, we defined the inner product $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$ on the space
𝒞[−𝜋, 𝜋]. The sequence
$\{u_n(t) = e^{int} : n \in \mathbb{Z}\}$
is an orthonormal sequence with respect to the above inner product. The norm
of a function f induced by the inner product will be denoted by ‖f‖2 in order
to distinguish it from the uniform norm on 𝒞[−𝜋, 𝜋], which will also play a
prominent role in this section. Thus the uniform norm of a function f ∈ 𝒞[−𝜋, 𝜋]
will be denoted by the usual notation ‖f‖∞ , while

$$\|f\|_2 = \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx \right)^{1/2}.$$

It is clear that ‖f‖2 ≤ ‖f‖∞ .

In section 4.9, we introduced the space 𝒞(𝒮1 , ℂ) (which we now abbreviate
𝒞(𝒮1 )) of 2𝜋-periodic functions on [−𝜋, 𝜋]. It is clear that 𝒞(𝒮1 ) is a closed
subspace of (𝒞[−𝜋, 𝜋], ‖.‖∞ ).

For a function f ∈ 𝒞[−𝜋, 𝜋], we define the Fourier series of f to be the formal
series

$$\sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx}, \quad \text{where } \hat{f}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int}\,dt.$$

The numbers $\hat{f}(n)$, n ∈ ℤ, are called the Fourier coefficients of f. It is clear that
the partial sum of the Fourier series,

$$S_n f(x) = \sum_{j=-n}^{n} \hat{f}(j) e^{ijx},$$

is the orthogonal projection of f on Mn = Span({ui ∶ −n ≤ i ≤ n}).



The question now is whether the sequence Sn f converges to f in the 2-norm. We
answer the question affirmatively after we establish a few facts. Convergence in the
2-norm is sometimes called the mean square convergence in order to distinguish
it from the uniform convergence of Sn f to f, which is also a valid question.

Definition. A trigonometric polynomial is a linear combination of the functions
$\{u_n = e^{int} : n \in \mathbb{Z}\}$. Thus a trigonometric polynomial is a function of the form

$$p(t) = \sum_{j=-n}^{n} c_j e^{ijt}, \quad \text{where } c_j \in \mathbb{C},\ n \in \mathbb{N}.$$

The collection 𝒯 of all trigonometric polynomials is clearly the span of the
sequence {un ∶ n ∈ ℤ} and is a subspace of 𝒞(𝒮1 ).

Using the terminology we just established, example 2 in section 4.9 can be stated
as follows.

Theorem 4.10.1. The space of trigonometric polynomials is dense in the space
(𝒞(𝒮1 ), ‖.‖∞ ). Explicitly, for every f ∈ 𝒞(𝒮1 ) and every 𝜖 > 0, there exists a
trigonometric polynomial p such that ‖f − p‖∞ < 𝜖. ∎

Lemma 4.10.2. For a function f ∈ 𝒞[−𝜋, 𝜋] (not necessarily periodic), and for every
𝜖 > 0, there exists a 2𝜋-periodic function g such that ‖f − g‖2 < 𝜖.

𝜖2
Proof. Let M = ‖f‖∞ and define 𝛿 = . Define g ∈ 𝒞(𝒮1 ) as follows:
8M2

f(−𝜋+𝛿)
⎧ (x + 𝜋) if − 𝜋 ≤ x ≤ −𝜋 + 𝛿,
⎪ 𝛿
g(x) = f(x) if − 𝜋 + 𝛿 ≤ x ≤ 𝜋 − 𝛿,
⎨ f(𝜋−𝛿)

⎩ −𝛿 (x − 𝜋) if 𝜋 − 𝛿 ≤ x ≤ 𝜋.

Figure 4.7 below shows how f is modified on the subintervals [−𝜋, −𝜋 + 𝛿] and
[𝜋 − 𝛿, 𝜋] to produce g. We replace the graph of f on the subinterval [−𝜋, −𝜋 + 𝛿]
with the straight line that interpolates the points (−𝜋 + 𝛿, f(−𝜋 + 𝛿)) and (−𝜋, 0),
and similarly on the subinterval [𝜋 − 𝛿, 𝜋]. The dotted lines in figure 4.7 indicate
the modification of f to produce g. By construction, g is continuous and periodic.
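Also, for x ∈ [−𝜋, 𝜋], |f(x) − g(x)| < 2M. Now
\[ \|f-g\|_2^2 = \frac{1}{2\pi}\int_{-\pi}^{-\pi+\delta} |f(x)-g(x)|^2\,dx + \frac{1}{2\pi}\int_{\pi-\delta}^{\pi} |f(x)-g(x)|^2\,dx \le \frac{4M^2}{2\pi}\int_{-\pi}^{-\pi+\delta} dx + \frac{4M^2}{2\pi}\int_{\pi-\delta}^{\pi} dx = \frac{8M^2\delta}{2\pi} = \frac{\epsilon^2}{2\pi} < \epsilon^2. \] ∎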
Figure 4.7 The modified function g

Corollary 4.10.3. Trigonometric polynomials are dense in (𝒞[−𝜋, 𝜋], ‖.‖2 ).

Proof. Let f ∈ 𝒞[−𝜋, 𝜋], and let 𝜖 > 0. By lemma 4.10.2, there is a function g ∈
𝒞(𝒮1 ) such that ‖f − g‖2 < 𝜖. By theorem 4.10.1, there exists a trigonometric
polynomial p such that ‖g − p‖∞ < 𝜖. Now

‖f − p‖2 ≤ ‖f − g‖2 + ‖g − p‖2 ≤ ‖f − g‖2 + ‖g − p‖∞ < 2𝜖. 

Observe that the set of trigonometric polynomials whose coefficients have rational real and imaginary parts is countable and dense in (𝒞[−𝜋, 𝜋], ‖.‖2 ); hence (𝒞[−𝜋, 𝜋], ‖.‖2 ) is separable.

We are now able to settle a question posed in the preamble to this section.

Theorem 4.10.4. For every function f ∈ 𝒞[−𝜋, 𝜋], the sequence of partial sums Sn f
converges in the mean square to f.

Proof. We need to show that $\lim_n \|f - S_n f\|_2 = 0$. Let 𝜖 > 0. By corollary 4.10.3, there exists a trigonometric polynomial $p = \sum_{j=-N}^{N} c_j u_j$ such that ‖f − p‖2 < 𝜖. For every n ≥ N, p ∈ Mn = Span({uj ∶ −n ≤ j ≤ n}). Because Sn f is the best approximation of f in Mn , it follows that, for every n ≥ N, ‖f − Sn f‖2 ≤ ‖f − p‖2 < 𝜖. ∎
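The mean square convergence just proved can be observed numerically. The following Python sketch is only an illustration under stated assumptions: the function name `fourier_partial_sum_error`, the midpoint-rule quadrature, and the sample count are our own choices, and the test function f(x) = x is picked because it is continuous on [−𝜋, 𝜋] but not periodic.

```python
import cmath
import math

def fourier_partial_sum_error(f, n, samples=2000):
    """Approximate ||f - S_n f||_2 on [-pi, pi] using the midpoint rule.

    The norm carries the factor 1/(2 pi), as in the text.
    """
    h = 2 * math.pi / samples
    xs = [-math.pi + (k + 0.5) * h for k in range(samples)]
    fx = [f(x) for x in xs]
    # Fourier coefficients: (1/2pi) * integral of f(t) e^{-ijt} dt
    coeffs = {
        j: sum(v * cmath.exp(-1j * j * x) for v, x in zip(fx, xs)) * h / (2 * math.pi)
        for j in range(-n, n + 1)
    }
    err_sq = 0.0
    for v, x in zip(fx, xs):
        partial = sum(c * cmath.exp(1j * j * x) for j, c in coeffs.items())
        err_sq += abs(v - partial) ** 2
    return math.sqrt(err_sq * h / (2 * math.pi))

# f(x) = x is not periodic, yet the 2-norm error still shrinks as n grows.
errors = {n: fourier_partial_sum_error(lambda x: x, n) for n in (1, 4, 16)}
for n, err in errors.items():
    print(n, err)
```

The printed errors decrease with n, in agreement with theorem 4.10.4, although the decay is slow because f(−𝜋) ≠ f(𝜋).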

We take a short detour to discuss the sum of a two-sided sequence. The concept
framed in the following, more general, definition is sometimes useful. See the
excursion in section 7.2.

Definition. Let {a𝛼 ∶ 𝛼 ∈ I} be an indexed set of nonnegative numbers, where I is


an infinite set, possibly uncountable. The sum ∑{a𝛼 ∶ 𝛼 ∈ I} is, by definition

∑{a𝛼 ∶ 𝛼 ∈ I} = sup{∑𝛼∈F a𝛼 }.

where the supremum is taken over all finite subsets F ⊆ I.

Example 1. If ∑{a𝛼 ∶ 𝛼 ∈ I} < ∞, then the set J = {𝛼 ∈ I ∶ a𝛼 > 0} is countable.

Fix a positive integer n. If the set Jn = {𝛼 ∈ I ∶ a𝛼 > 1/n} is infinite, we can choose a sequence 𝛼1 , 𝛼2 , . . . of distinct elements of Jn . For the finite subset Fj = {𝛼1 , . . . , 𝛼j }, $\sum_{\alpha \in F_j} a_\alpha > j/n$. Since j is arbitrary, ∑{a𝛼 ∶ 𝛼 ∈ I} would be infinite. This proves that Jn is a finite set for every n ∈ ℕ. Since $(0, \infty) = \cup_{n=1}^{\infty} (1/n, \infty)$, $J = \cup_{n=1}^{\infty} J_n$. Thus J is countable. ∎

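Example 2. Let J be a countable set and suppose that ∑{a𝛼 ∶ 𝛼 ∈ J} < ∞, where a𝛼 ≥ 0. If 𝛼1 , 𝛼2 , . . . is any enumeration of J, then $\sum\{a_\alpha : \alpha \in J\} = \sum_{i=1}^{\infty} a_{\alpha_i}$.
For an integer N, $\sum_{i=1}^{N} a_{\alpha_i} \le \sum\{a_\alpha : \alpha \in J\}$. Thus the partial sums of the series $\sum_{i=1}^{\infty} a_{\alpha_i}$ are bounded by ∑{a𝛼 ∶ 𝛼 ∈ J} and hence $\sum_{n=1}^{\infty} a_{\alpha_n} \le \sum\{a_\alpha : \alpha \in J\}$. Conversely, if F is an arbitrary finite subset of J, then there is an integer N such that F ⊆ {𝛼1 , . . . , 𝛼N }. Therefore $\sum\{a_\alpha : \alpha \in F\} \le \sum_{n=1}^{N} a_{\alpha_n} \le \sum_{n=1}^{\infty} a_{\alpha_n}$. Thus $\sum\{a_\alpha : \alpha \in J\} \le \sum_{n=1}^{\infty} a_{\alpha_n}$. ∎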

A special case of the above examples is when (an )n∈ℤ is a two-sided sequence of nonnegative numbers.13 The series $\sum_{n=-\infty}^{\infty} a_n$ can be defined (for example) as $\lim_{n\to\infty} \sum_{i=-n}^{n} a_i$, which corresponds to the following enumeration of ℤ: 0, −1, 1, −2, 2, −3, 3, . . . .

We can now define, for 1 ≤ p < ∞, the space lp (ℤ) to be the set of all two-sided sequences x = (xn )n∈ℤ such that $\sum_{n=-\infty}^{\infty} |x_n|^p < \infty$. It is easy to check that lp (ℤ) is a complete normed linear space with the norm $\|x\|_p = \left( \sum_{n=-\infty}^{\infty} |x_n|^p \right)^{1/p}$.
Similarly, we define l∞ (ℤ) to be the space of all bounded scalar functions
x ∶ ℤ → 𝕂, which is a complete normed linear space with the norm ‖x‖∞ =
sup{|x(n)| ∶ n ∈ ℤ}.

13 More accurately, functions from ℤ to the base field 𝕂.



We also define the space c0 (ℤ) as the subspace of l∞ (ℤ) of all two-sided sequences
x ∈ l∞ (ℤ) such that lim|n|→∞ xn = 0.

Theorem 4.10.5. For every f ∈ 𝒞[−𝜋, 𝜋],
\[ \|f\|_2^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2. \]

Proof. By theorem 4.10.4 and the continuity of norms (see section 4.3), $\lim_n \|S_n f\|_2^2 = \|f\|_2^2$. By the Pythagorean theorem, $\|S_n f\|_2^2 = \left\| \sum_{j=-n}^{n} \hat{f}(j)u_j \right\|_2^2 = \sum_{j=-n}^{n} |\hat{f}(j)|^2$. We obtain the required result by taking the limit of both sides as n → ∞. ∎

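Example 3. Let $f(x) = x^2$. We compute the Fourier coefficients of f. Since $x^2\sin(nx)$ is an odd function and $x^2\cos(nx)$ is an even function, $\frac{1}{\pi}\int_{-\pi}^{\pi} x^2\sin(nx)\,dx = 0$ and thus $\hat{f}(n) = \hat{f}(-n) = \frac{1}{\pi}\int_{0}^{\pi} x^2\cos(nx)\,dx$.
For n ≠ 0, integration by parts now yields
\[ \frac{1}{\pi}\int_{0}^{\pi} x^2\cos(nx)\,dx = \frac{-2}{\pi n}\int_{0}^{\pi} x\sin(nx)\,dx = \frac{2x}{\pi n^2}\cos(nx)\Big|_{0}^{\pi} = \frac{2\cos n\pi}{n^2} = \frac{2(-1)^n}{n^2}. \]
Thus
\[ \hat{f}(n) = \hat{f}(-n) = \frac{2(-1)^n}{n^2}. \]
We also have
\[ \hat{f}(0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{\pi^2}{3}. \]
Theorem 4.10.5 now yields
\[ \frac{\pi^4}{5} = \frac{1}{2\pi}\int_{-\pi}^{\pi} x^4\,dx = |\hat{f}(0)|^2 + \sum_{|n|=1}^{\infty} |\hat{f}(n)|^2 = \frac{\pi^4}{9} + 2\sum_{n=1}^{\infty} |\hat{f}(n)|^2 = \frac{\pi^4}{9} + \sum_{n=1}^{\infty} \frac{8}{n^4}. \]
Rearranging the extreme sides of the above string we obtain
\[ \sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}. \]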
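As a rough numerical sanity check of the computation in example 3 (a sketch only, not part of the argument), the Python snippet below truncates the series and compares the two sides of the identity of theorem 4.10.5 for f(x) = x². The helper name `parseval_rhs` and the truncation level are illustrative assumptions.

```python
import math

def parseval_rhs(terms):
    """Truncated right-hand side for f(x) = x^2.

    Uses |f^(0)|^2 = pi^4/9 and |f^(n)|^2 = 4/n^4 for n != 0.
    """
    tail = sum(4.0 / n**4 for n in range(1, terms + 1))
    return math.pi**4 / 9 + 2 * tail

lhs = math.pi**4 / 5                     # (1/2pi) * integral of x^4 over [-pi, pi]
rhs = parseval_rhs(10_000)
zeta4 = sum(1.0 / n**4 for n in range(1, 10_001))

print(abs(lhs - rhs))                    # only truncation and rounding error remain
print(abs(zeta4 - math.pi**4 / 90))      # likewise
```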

The next result says that a function in 𝒞[−𝜋, 𝜋] is determined by its Fourier coefficients.

Corollary 4.10.6 (the uniqueness theorem). If f, g ∈ 𝒞[−𝜋, 𝜋] and $\hat{f}(n) = \hat{g}(n)$ for every n ∈ ℤ, then f = g.

Proof. Let h = f − g. By assumption, $\hat{h}(n) = 0$ for every n ∈ ℤ. By theorem 4.10.5, $\frac{1}{2\pi}\int_{-\pi}^{\pi} |h(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |\hat{h}(n)|^2 = 0$. Since h is continuous, h = 0. ∎

We now turn to the question of uniform convergence of Fourier series, which


is a more complicated problem than the mean square convergence. Since 𝒞(𝒮1 )
is closed in (𝒞[−𝜋, 𝜋], ‖.‖∞ ), the Fourier series of a non-periodic function f on
[−𝜋, 𝜋] cannot converge uniformly to f. There are, however, simple criteria that
guarantee the uniform convergence of the Fourier series of 2𝜋-periodic functions.
The next theorem is a sample. See also example 5 below.

Theorem 4.10.7. If f ∈ 𝒞(𝒮1 ) is such that $\sum_{n=-\infty}^{\infty} |\hat{f}(n)| < \infty$, then the Fourier series of f converges uniformly to f. In particular,
\[ f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n)e^{inx} \quad\text{for every } x \in [-\pi, \pi]. \]

Proof. For n ∈ ℕ, define $F_n(x) = \sum_{j=-n}^{n} \hat{f}(j)e^{ijx}$. For positive integers m > n,
\[ \|F_m - F_n\|_\infty = \sup_{x\in[-\pi,\pi]} \Big| \sum_{|j|=n+1}^{m} \hat{f}(j)e^{ijx} \Big| \le \sum_{|j|=n+1}^{m} |\hat{f}(j)| \to 0 \text{ as } n \to \infty. \]

Thus the sequence of functions Fn (x) is Cauchy in 𝒞(𝒮1 ) and hence converges uniformly on [−𝜋, 𝜋] to some function F ∈ 𝒞(𝒮1 ). Thus limn ‖Fn − F‖∞ = 0. But ‖Fn − F‖2 ≤ ‖Fn − F‖∞ , and hence Fn converges to F in ‖.‖2 . Since Fn also converges to f in ‖.‖2 (theorem 4.10.4), F = f by the uniqueness of limits. ∎

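Example 4. This is a continuation of example 3. For the function $f(x) = x^2$, $\sum_{|n|=1}^{\infty} |\hat{f}(n)| = \sum_{|n|=1}^{\infty} \frac{2}{n^2} < \infty$. By theorem 4.10.7, the Fourier series of $x^2$ converges uniformly (and absolutely) to $x^2$ on [−𝜋, 𝜋]. Since $\hat{f}(n) = \hat{f}(-n)$, $\sum_{|n|=1}^{\infty} \hat{f}(n)e^{inx} = \sum_{n=1}^{\infty} \hat{f}(n)(e^{inx} + e^{-inx}) = 2\sum_{n=1}^{\infty} \hat{f}(n)\cos(nx)$. Therefore
\[ x^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty} \frac{(-1)^n\cos(nx)}{n^2}, \quad -\pi \le x \le \pi. \]

Substituting x = 0 and then x = 𝜋 in this identity, we obtain, respectively,
\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^2} = \frac{\pi^2}{12}, \quad\text{and}\quad \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]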
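The uniform convergence in example 4 can be checked numerically. The Python sketch below evaluates a truncated version of the cosine series for x² at a few sample points; the function name `x_squared_series` and the number of terms are illustrative assumptions, and the error tolerance reflects the tail bound 4∑1/n² over n > N, which is roughly 4/N.

```python
import math

def x_squared_series(x, terms=20_000):
    """Truncated series pi^2/3 + 4 * sum_{n>=1} (-1)^n cos(nx) / n^2."""
    total = math.pi**2 / 3
    for n in range(1, terms + 1):
        total += 4 * (-1) ** n * math.cos(n * x) / n**2
    return total

# The error is uniformly small on [-pi, pi], including the endpoints.
for x in (-math.pi, -1.0, 0.0, 0.5, 2.0, math.pi):
    print(x, abs(x_squared_series(x) - x * x))
```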

Example 5. If a function F ∈ 𝒞(𝒮1 ) has a continuous derivative, then the Fourier series of F converges uniformly to F on [−𝜋, 𝜋].

Let F′ = f. If n ≠ 0, integration by parts yields
\[ \hat{F}(n) = \frac{-1}{2\pi i n} F(t)e^{-int}\Big|_{-\pi}^{\pi} + \frac{1}{2\pi i n}\int_{-\pi}^{\pi} f(t)e^{-int}\,dt = \frac{1}{in}\hat{f}(n), \]
where the boundary term vanishes because F is 2𝜋-periodic.

Using the inequality $|ab| \le \frac{1}{2}[|a|^2 + |b|^2]$, we have
\[ |\hat{F}(n)| \le \frac{1}{2}\Big[ \frac{1}{n^2} + |\hat{f}(n)|^2 \Big] \]
and
\[ \sum_{|n|=1}^{\infty} |\hat{F}(n)| \le \frac{1}{2}\Big[ 2\sum_{n=1}^{\infty} \frac{1}{n^2} + \sum_{|n|=1}^{\infty} |\hat{f}(n)|^2 \Big] < \infty, \]
where the last sum is finite by theorem 4.10.5.

The result now follows from theorem 4.10.7. ∎

Orthogonal Polynomials: The General Construction

Let (a, b) be an interval. A function 𝜔 ∶ (a, b) → ℝ is said to be a weight function if

(a) 𝜔 is continuous and strictly positive on (a, b),
(b) $\int_a^b \omega(x)\,dx < \infty$, and
(c) for every integer n ≥ 0, $\int_a^b x^n\omega(x)\,dx < \infty$.

Consequently, $\int_a^b p(x)\omega(x)\,dx < \infty$ for every polynomial p. Neither the function 𝜔 nor the interval (a, b) is assumed to be bounded.
When either 𝜔 or (a, b) is unbounded, we interpret the integrals involved
as improper Riemann integrals according to the standard definitions. Observe
that if (a, b) is a bounded interval, then 𝜔 can be unbounded if and only if
limx↓a 𝜔(x) = ∞ or limx↑b 𝜔(x) = ∞. See the weight function for the Tchebychev
polynomials later on in this section.

Let H be the collection of all continuous functions f on (a, b) such that
\[ \int_a^b |f(x)|^2\omega(x)\,dx < \infty. \]

It is obvious that H is closed under complex conjugation and scalar multiplication.

The following estimates show that H is a vector space. If f, g ∈ H, then
\[ \int_a^b |f(x)g(x)|\omega(x)\,dx \le \frac{1}{2}\int_a^b [|f(x)|^2 + |g(x)|^2]\omega(x)\,dx < \infty \]
and
\[ \int_a^b |f+g|^2\omega(x)\,dx \le \int_a^b (|f|+|g|)^2\omega(x)\,dx = \int_a^b \big[ |f|^2 + 2|fg| + |g|^2 \big]\omega(x)\,dx < \infty. \]

We call H the space of continuous, square integrable functions with respect to the weight function 𝜔. It now makes sense to define the following inner product on H:
\[ \langle f, g\rangle = \int_a^b f(x)\overline{g(x)}\,\omega(x)\,dx. \]
The Gram-Schmidt process can be applied to the sequence of independent func-
tions {1, x, x2 , . . . } to yield a sequence of orthogonal polynomials 𝜙0 , 𝜙1 , . . . . The
orthogonal polynomials in this general construction (regardless of the weight
function or the interval (a, b)) share broad characteristics, which we will not
discuss further. See the section exercises for some of the general features of
orthogonal polynomials. In the remainder of this section, we give three major
examples of orthogonal polynomials.

The Legendre Polynomials

In this special case, we take


(a, b) = (−1, 1)
and
𝜔(x) = 1.
Observe that the space H of continuous, square integrable functions on (−1, 1)
contains the entire space 𝒞[−1, 1]. The resulting orthogonal polynomials are
the well-known Legendre polynomials. In section 3.7, we derived the following
formula for the Legendre polynomials (up to a multiplicative constant):
$Q_n(x) = D^n (x^2 - 1)^n.$
The first two Legendre polynomials are obvious: Q0 (x) = 1, and Q1 (x) = 2x. We
establish the properties of the Legendre polynomials in a number of steps:

1. The parity of the Legendre polynomials. Since the binomial expansion of $(x^2 - 1)^n$ contains only even powers of x, Qn is an even polynomial if n is even and an odd polynomial if n is odd.
2. The normalized Legendre polynomials Pn . It is customary to normalize the Legendre polynomials so that they take the value 1 at x = 1. Thus Pn = cn Qn , and we find cn by imposing the condition Pn (1) = 1. Using the Leibniz rule,
\[ Q_n(x) = D^n (x^2-1)^n = D^{n-1}[2nx(x^2-1)^{n-1}] = 2nx\,D^{n-1}(x^2-1)^{n-1} + (n-1)2n\,D^{n-2}(x^2-1)^{n-1}. \]

The last term vanishes at x = 1 because $(x^2-1)^{n-1}$ has a zero of order n − 1 there. Evaluating the last identity at x = 1, we obtain $Q_n(1) = 2nQ_{n-1}(1)$ and, by induction, $Q_n(1) = 2^n n!$. Therefore the polynomials
\[ P_n(x) = \frac{1}{2^n n!} D^n (x^2-1)^n \]
are orthogonal and satisfy the normalization condition Pn (1) = 1.


3. The leading coefficient of Pn . We use the symbol 𝛼n to indicate the leading
coefficient of Pn . The leading coefficient in Dn (x2 − 1)n is the result of
differentiating x2n exactly n times. Therefore

1 (2n)!
𝛼n = (2n)(2n − 1) . . . (n + 1) = .
2n n! 2n (n!)2

4. The three-term recurrence relation. The recurrence relation we derive below facilitates the generation of the sequence Pn . We will use the brief notation $\beta_n = \frac{\alpha_{n+1}}{\alpha_n} = \frac{2n+1}{n+1}$. The polynomial $P_{n+1} - \frac{\alpha_{n+1}}{\alpha_n} xP_n$ is of degree at most n. Therefore,
\[ P_{n+1} - \beta_n xP_n = \sum_{j=0}^{n} c_j P_j, \]
where
\[ c_j \|P_j\|_2^2 = \langle P_{n+1} - \beta_n xP_n, P_j\rangle = -\beta_n \langle xP_n, P_j\rangle. \]

If j < n − 1, $c_j \|P_j\|_2^2 = -\beta_n \langle xP_n, P_j\rangle = -\beta_n \langle P_n, xP_j\rangle = 0$, since xPj has degree less than n. Thus $P_{n+1} - \beta_n xP_n = c_n P_n + c_{n-1}P_{n-1}$. Now
\[ c_n \|P_n\|_2^2 = -\beta_n \langle xP_n, P_n\rangle = -\beta_n \int_{-1}^{1} xP_n^2(x)\,dx = 0, \]
since $xP_n^2(x)$ is an odd function. Therefore
\[ P_{n+1} - \beta_n xP_n = c_{n-1}P_{n-1}. \]

Evaluating the last identity at x = 1, we obtain $c_{n-1} = 1 - \beta_n = \frac{-n}{n+1}$, and we have the recurrence relation:
\[ P_{n+1} = \frac{2n+1}{n+1}\,xP_n - \frac{n}{n+1}\,P_{n-1}. \]

Here is a list of the first Legendre polynomials:

$P_0(x) = 1,$
$P_1(x) = x,$
$P_2(x) = \frac{1}{2}(3x^2 - 1),$
$P_3(x) = \frac{1}{2}(5x^3 - 3x),$
$P_4(x) = \frac{1}{8}(35x^4 - 30x^2 + 3).$

5. The norm of Pn . Next we show that

1
2 2
∫ [Pn (x)] dx = .
−1
2n + 1

1 2
Let an = ∫−1 [Pn (x)] dx. Taking the inner product of Pn with both sides of the
2n−1 n−1 2n−1 1
identity Pn = xPn−1 − Pn−2 , we obtain an = ∫−1 (xPn )Pn−1 dx.
n n n
Using the recurrence relation again, xPn = [(n + 1)Pn+1 + nPn−1 ]/(2n + 1),
and hence
1
2n − 1 1 2n − 1
an = ∫ P [(n + 1)Pn+1 + nPn−1 ] = a .
n 2n + 1 −1 n−1 2n + 1 n−1

1
Now a0 = ∫−1 dx = 2. By induction, one obtains

2
an = .
2n + 1

It follows that the polynomials below are orthonormal in (𝒞[−1, 1], ‖.‖2 ):

2n + 1
P̃ n = √ Pn .
2
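The properties established in steps 2 through 5 can be verified exactly for small n. The Python sketch below generates the Legendre polynomials from the three-term recurrence using rational arithmetic and checks the normalization, the orthogonality, and the norm formula. The names `legendre` and `inner`, and the representation of a polynomial as a list of coefficients with the lowest degree first, are illustrative assumptions.

```python
from fractions import Fraction

def legendre(n_max):
    """Coefficient lists (lowest degree first) of P_0, ..., P_{n_max}."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        x_pn = [Fraction(0)] + polys[n]                   # coefficients of x * P_n
        prev = polys[n - 1] + [Fraction(0), Fraction(0)]  # P_{n-1}, padded to equal length
        # P_{n+1} = (2n+1)/(n+1) * x * P_n - n/(n+1) * P_{n-1}
        polys.append([
            Fraction(2 * n + 1, n + 1) * a - Fraction(n, n + 1) * b
            for a, b in zip(x_pn, prev)
        ])
    return polys[: n_max + 1]

def inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1] (weight function 1)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                          # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

P = legendre(6)
print(P[4])                          # coefficients 3/8, 0, -15/4, 0, 35/8
print([sum(p) for p in P])           # P_n(1) is the sum of the coefficients
print([inner(P[n], P[n]) for n in range(7)])
```

Because the arithmetic is exact, the computed norms can be compared with 2/(2n + 1) by equality rather than up to a tolerance.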

Theorem 4.10.8 (mean square convergence). For every f ∈ 𝒞[−1, 1], the sequence $S_n f = \sum_{i=0}^{n} \langle f, \tilde{P}_i\rangle \tilde{P}_i$ converges to f in the sense that limn ‖Sn f − f‖2 = 0.

Proof. Let 𝜖 > 0. By the Weierstrass approximation theorem, there exists a polynomial q such that ‖f − q‖∞ < 𝜖/√2. Now ‖f − q‖2 ≤ √2‖f − q‖∞ < 𝜖. Let N be the degree of q. For every n ≥ N, q ∈ ℙn , and since Sn f is the best approximation of f in ℙn , ‖f − Sn f‖2 ≤ ‖f − q‖2 < 𝜖, as required. ∎

Observe the resemblance between the proof of the last theorem and that of
theorem 4.10.4. See also the examples in section 6.1.

The Tchebychev Polynomials

In this special case, we take


(a, b) = (−1, 1)
and
$\omega(x) = \frac{1}{\sqrt{1 - x^2}}.$
Observe that the space H of square integrable functions with respect to 𝜔 contains
the entire space 𝒞[−1, 1].

A simple and direct derivation of the orthogonal polynomials is possible because of the observation that, for an integer n ≥ 0, cos(nx) can be expressed as a polynomial of cos x. For example, $\cos(2x) = 2\cos^2 x - 1$. The next lemma proves the existence of such polynomials and establishes the three-term recurrence relation among them.

Lemma 4.10.9. For n ≥ 0, there exists a polynomial Tn of exact degree n such that,
for all x ∈ ℝ, cos(nx) = Tn (cos x).

Proof. For n = 0, 1, the polynomials T0 (x) = 1 and T1 (x) = x trivially satisfy the
requirements. The rest of the construction is inductive. Suppose that there are
polynomials T0 , . . . , Tn that satisfy the statement we wish to prove. For n ≥ 1, we
have cos(n + 1)x + cos(n − 1)x = 2cos(nx)cos x. Therefore

cos(n + 1)x = 2cos(nx)cos x − cos(n − 1)x = 2cos x Tn (cos x) − Tn−1 (cos x).

The last identity dictates the definition of Tn+1 and concludes the proof:

Tn+1 (x) = 2xTn (x) − Tn−1 (x). 



Definition. The polynomials Tn in the previous lemma are called the Tchebychev
polynomials. A list of the next three Tchebychev polynomials appears below:

$T_2(x) = 2x^2 - 1,$
$T_3(x) = 4x^3 - 3x,$
$T_4(x) = 8x^4 - 8x^2 + 1.$
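The recurrence $T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)$ from lemma 4.10.9 translates directly into a short program. The Python sketch below is illustrative only; the names `chebyshev` and `evaluate`, and the convention of storing coefficients with the lowest degree first, are our own assumptions. It regenerates the list above and spot-checks the defining identity cos(n𝜃) = Tn (cos 𝜃).

```python
import math

def chebyshev(n_max):
    """Integer coefficient lists (lowest degree first) of T_0, ..., T_{n_max}."""
    polys = [[1], [0, 1]]
    for n in range(1, n_max):
        two_x_tn = [0] + [2 * c for c in polys[n]]   # coefficients of 2x * T_n
        prev = polys[n - 1] + [0, 0]                 # T_{n-1}, padded to equal length
        polys.append([a - b for a, b in zip(two_x_tn, prev)])
    return polys[: n_max + 1]

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial with lowest-degree-first coefficients."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

T = chebyshev(8)
print(T[2], T[3], T[4])   # [-1, 0, 2] [0, -3, 0, 4] [1, 0, -8, 0, 8]

theta = 0.7
print(max(abs(evaluate(T[n], math.cos(theta)) - math.cos(n * theta)) for n in range(9)))
```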

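Theorem 4.10.10. The Tchebychev polynomials are orthogonal with respect to the weight function 𝜔. Additionally,
\[ \|T_0\|_2^2 = \pi \]
and, for n ≥ 1,
\[ \|T_n\|_2^2 = \frac{\pi}{2}. \]

Proof. We use the change of variable x = cos 𝜃. If m ≠ n, then
\[ \langle T_m, T_n\rangle = \int_{-1}^{1} \frac{T_n(x)T_m(x)}{\sqrt{1-x^2}}\,dx = \int_{0}^{\pi} \cos(n\theta)\cos(m\theta)\,d\theta = \frac{1}{2}\int_{0}^{\pi} [\cos(m+n)\theta + \cos(m-n)\theta]\,d\theta = 0. \]

Clearly, $\|T_0\|_2^2 = \int_0^\pi d\theta = \pi$. Finally, for n ≥ 1, $\|T_n\|_2^2 = \int_{-1}^{1} \frac{[T_n(x)]^2}{\sqrt{1-x^2}}\,dx = \int_{0}^{\pi} \cos^2(n\theta)\,d\theta = \frac{1}{2}\int_{0}^{\pi} [1 + \cos(2n\theta)]\,d\theta = \pi/2$. ∎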

The basic properties of the Tchebychev polynomials appear below. The first three
follow from the three-term recurrence relation and induction:

1. Tn is even if and only if n is even.
2. For n ≥ 1, the leading coefficient of Tn is $2^{n-1}$.
3. Tn (1) = 1, and $T_n(-1) = (-1)^n$.
4. For all n ≥ 0, ‖Tn ‖∞ = max{|Tn (x)| ∶ −1 ≤ x ≤ 1} = 1. For every x ∈ [−1, 1], there is a number 𝜃 such that x = cos 𝜃. Thus |Tn (x)| = |cos(n𝜃)| ≤ 1. Since Tn (1) = 1, it follows that ‖Tn ‖∞ = 1.
5. The roots of Tn are $x_k = \cos\frac{(2k-1)\pi}{2n}$, 1 ≤ k ≤ n. This can be verified directly, or one can write x = cos 𝜃. If Tn (x) = 0, then cos(n𝜃) = 0. So n𝜃 is an odd multiple of 𝜋/2; hence the stated values of xk are all the roots of Tn .
6. The extreme values of Tn in [−1, 1] are attained at the points $y_k = \cos\frac{\pi k}{n}$, 0 ≤ k ≤ n. Additionally, $T_n(y_k) = (-1)^k$. Again a direct verification is the simplest or, as before, we write x = cos 𝜃, then Tn (x) = cos(n𝜃), and $\frac{dT_n(x)}{dx} = \frac{d\cos(n\theta)}{d\theta}\,\frac{d\theta}{dx} = \frac{n\sin(n\theta)}{\sqrt{1-x^2}}$. The interested reader can work out the calculus and arrive at the points yk .

For n ≥ 1, let $\tilde{T}_n = \frac{1}{2^{n-1}}T_n$. From the above properties of Tn , $\tilde{T}_n$ is a monic polynomial,14 and

$\tilde{T}_n(x_k) = 0$ for 1 ≤ k ≤ n,
$\tilde{T}_n(y_k) = \frac{(-1)^k}{2^{n-1}}$ for 0 ≤ k ≤ n,
and $\|\tilde{T}_n\|_\infty = \frac{1}{2^{n-1}}$.

The following theorem establishes the curious fact that, among all monic poly-
nomials of degree n, T̃ n has the least uniform norm on [−1, 1]. This result is
important for understanding the error when a sufficiently differentiable function
is interpolated by a polynomial.

Theorem 4.10.11. Suppose p is a monic polynomial of degree n. Then
\[ \|p\|_\infty = \max\{|p(x)| : -1 \le x \le 1\} \ge \frac{1}{2^{n-1}}. \]

Proof. Suppose, for a contradiction, that $\|p\|_\infty < \frac{1}{2^{n-1}}$. Consider the integers 0 ≤ k ≤ n.

If k is odd, then $p(y_k) > \frac{-1}{2^{n-1}} = \tilde{T}_n(y_k)$. If k is even, then $p(y_k) < \frac{1}{2^{n-1}} = \tilde{T}_n(y_k)$.

Thus the polynomial $q = p - \tilde{T}_n$ alternates sign at the points y0 , . . . , yn ; hence q has a root in each of the n open intervals (y0 , y1 ), . . . , (yn−1 , yn ). This is a contradiction because q is not identically zero and has degree at most n − 1. ∎
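A small numerical experiment makes the minimality property of theorem 4.10.11 concrete. The Python sketch below compares the uniform norm of the monic Tchebychev polynomial of degree 5 with that of a few other monic quintics. The grid-based `sup_norm` is only an approximation from below of the true maximum, and the particular rival polynomials are arbitrary choices made for illustration.

```python
import math

def monic_chebyshev(n, x):
    """Value of T_n / 2^(n-1) at x in [-1, 1], using T_n(cos t) = cos(n t); n >= 1."""
    return math.cos(n * math.acos(x)) / 2 ** (n - 1)

def sup_norm(p, samples=2001):
    """Approximate max |p(x)| on [-1, 1] over an evenly spaced grid."""
    grid = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(p(x)) for x in grid)

n = 5
cheb = sup_norm(lambda x: monic_chebyshev(n, x))
rivals = {
    "x^5": sup_norm(lambda x: x**5),
    "x^5 - x^3": sup_norm(lambda x: x**5 - x**3),
    "x^5 - 1.25 x^3 + 0.3 x": sup_norm(lambda x: x**5 - 1.25 * x**3 + 0.3 * x),
}

print(cheb)      # 1/2^(n-1) = 0.0625
for name, value in rivals.items():
    print(name, value)
```

Each rival has a larger uniform norm than the monic Tchebychev polynomial, as the theorem predicts.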

The Hermite Polynomials

For our last example of orthogonal polynomials, we take

(a, b) = (−∞, ∞)

14 A monic polynomial is one whose leading coefficient is 1.



and
$\omega(x) = e^{-x^2}.$

We will show that the polynomials defined below are orthogonal with respect to 𝜔:
\[ H_n(x) = (-1)^n e^{x^2} D^n[e^{-x^2}]. \]
Since H0 = 1, and H1 (x) = 2x, $\langle H_0, H_1\rangle = \int_{-\infty}^{\infty} 2xe^{-x^2}\,dx = 0$. We now use induction on n. If 0 ≤ j < n, then integration by parts yields
\[ \langle x^j, H_n\rangle = \int_{-\infty}^{\infty} x^j (-1)^n e^{x^2} D^n[e^{-x^2}]e^{-x^2}\,dx = (-1)^n \int_{-\infty}^{\infty} x^j D^n[e^{-x^2}]\,dx = (-1)^n x^j D^{n-1}e^{-x^2}\Big|_{-\infty}^{\infty} - (-1)^n \int_{-\infty}^{\infty} jx^{j-1} D^{n-1}e^{-x^2}\,dx. \]

The first term of the last expression is 0 because $x^j D^{n-1}e^{-x^2}$ is the product of a polynomial and $e^{-x^2}$, and the second term is 0 by the inductive hypothesis.
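The orthogonality of the Hermite polynomials can be confirmed exactly for small degrees. The Python sketch below is an illustration under stated assumptions: it generates Hn from the recurrence $H_{n+1} = 2xH_n - 2nH_{n-1}$ (which is exercise 12 of this section, taken here without proof) and uses the standard Gaussian moments $\int_{-\infty}^{\infty} x^{2m}e^{-x^2}dx = \sqrt{\pi}\,(2m-1)!!/2^m$. The names `hermite`, `gaussian_moment`, and `inner_over_sqrt_pi` are hypothetical helpers; all inner products are reported divided by √𝜋 so that they are rational numbers.

```python
from fractions import Fraction
from math import factorial

def hermite(n_max):
    """Integer coefficient lists (lowest degree first) of H_0, ..., H_{n_max}."""
    polys = [[1], [0, 2]]
    for n in range(1, n_max):
        two_x_hn = [0] + [2 * c for c in polys[n]]         # coefficients of 2x * H_n
        prev = [2 * n * c for c in polys[n - 1]] + [0, 0]  # 2n * H_{n-1}, padded
        polys.append([a - b for a, b in zip(two_x_hn, prev)])
    return polys[: n_max + 1]

def gaussian_moment(k):
    """Integral of x^k e^{-x^2} over the real line, divided by sqrt(pi)."""
    if k % 2 == 1:
        return Fraction(0)
    double_factorial = 1
    for odd in range(1, k, 2):                             # (k-1)!! = 1 * 3 * ... * (k-1)
        double_factorial *= odd
    return Fraction(double_factorial, 2 ** (k // 2))

def inner_over_sqrt_pi(p, q):
    """Inner product of p and q with weight e^{-x^2}, divided by sqrt(pi)."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

H = hermite(6)
print(H[2], H[3])                                          # [-2, 0, 4] [0, -12, 0, 8]
print([inner_over_sqrt_pi(H[n], H[n]) for n in range(7)])  # compare with n! * 2^n
print(inner_over_sqrt_pi(H[2], H[4]))                      # 0
```

The diagonal values agree with the formula $\|H_n\|_2^2 = n!\,2^n\sqrt{\pi}$ stated in exercise 15.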

We leave some of the properties of the Hermite polynomials as exercises for the
interested reader.

Exercises
1. Let f, g ∈ 𝒞[−𝜋, 𝜋]. Prove that $\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx = \sum_{n=-\infty}^{\infty} \hat{f}(n)\overline{\hat{g}(n)}$. Hint: Use theorem 4.10.4 and the continuity of inner products. See section 4.3.
2. Show that $|x| = \frac{\pi}{2} - \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\cos(2n-1)x}{(2n-1)^2}$, −𝜋 ≤ x ≤ 𝜋. Conclude that $\sum_{n=1}^{\infty} \frac{1}{(2n-1)^2} = \frac{\pi^2}{8}$.
3. Show that $\frac{\pi^2 x - x^3}{12} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}\sin(nx)}{n^3}$.
4. Use the previous problem to show that $\sum_{n=1}^{\infty} \frac{1}{n^6} = \frac{\pi^6}{945}$.
5. This exercise furnishes the three-term recurrence relation for general orthogonal polynomials with respect to a weight function 𝜔 on an interval (a, b), and the inner product $\langle f, g\rangle = \int_a^b f(x)\overline{g(x)}\,\omega(x)\,dx$. Let 𝜙0 , 𝜙1 , . . . be the orthogonal monic polynomials with respect to the weight function 𝜔, where 𝜙0 = 1, and 𝜙n has degree n.15 Prove the three-term recurrence relation below:

15 Observe that these are precisely the orthogonal polynomials generated by applying the Gram-Schmidt process to the monomials $1, x, x^2, \ldots$.

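\[ \phi_{n+1}(x) = (x - a_{n+1})\phi_n(x) - b_{n+1}^2\,\phi_{n-1}(x), \]
where
\[ a_{n+1} = \frac{\langle x\phi_n, \phi_n\rangle}{\|\phi_n\|^2} \quad\text{and}\quad b_{n+1} = \frac{\|\phi_n\|}{\|\phi_{n-1}\|}. \]
Here n ≥ 0, and, for notational convenience, define 𝜙−1 = 0, b1 = 0.
6. In the notation of the previous exercise, prove that the roots of 𝜙n are the eigenvalues of the tri-diagonal matrix
\[ J_n = \begin{pmatrix} a_1 & b_2 & & \\ b_2 & a_2 & \ddots & \\ & \ddots & \ddots & b_n \\ & & b_n & a_n \end{pmatrix}. \]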

7. Prove that all the roots of 𝜙n are real and simple and lie in the interval (a, b). Outline: Since ⟨𝜙0 , 𝜙n ⟩ = 0, $\int_a^b \phi_n\omega\,dx = 0$. Thus 𝜙n changes sign in (a, b), and hence it has at least one root of odd multiplicity. Let x1 , . . . , xr be the roots of 𝜙n of odd multiplicity in (a, b), and let q = (x − x1 ) . . . (x − xr ). If r < n, examine ⟨q, 𝜙n ⟩.
8. Prove that the Legendre polynomial Pn satisfies the differential equation $(x^2 - 1)P_n'' + 2xP_n' - n(n+1)P_n = 0$.
9. Prove that the sum of the coefficients of any Legendre polynomial is 1. The same is true for the Tchebychev polynomials.
10. Prove that 𝒞[−1, 1] is contained in the space of continuous square integrable functions on (−1, 1) with respect to $\omega(x) = \frac{1}{\sqrt{1-x^2}}$. The integrals involved are improper Riemann integrals.
11. Define the normalized Tchebychev polynomials $\overline{T}_0 = \frac{1}{\sqrt{\pi}}$, $\overline{T}_n = \sqrt{\frac{2}{\pi}}\,T_n$ for n ≥ 1. For a function f ∈ 𝒞[−1, 1] let $S_n f = \sum_{j=0}^{n} \langle f, \overline{T}_j\rangle \overline{T}_j$. Prove that limn ‖Sn f − f‖2 = 0.
12. Prove that $H_{n+1} = 2xH_n - 2nH_{n-1}$. Conclude that Hn is even if and only if n is even.
13. Prove that $H_n' = 2xH_n - H_{n+1}$. Conclude that $H_n' = 2nH_{n-1}$.
14. Compute H2 and H3 .
15. Show that $\|H_n\|_2^2 = \int_{-\infty}^{\infty} [H_n(x)]^2 e^{-x^2}\,dx = n!\,2^n\sqrt{\pi}$.
16. Prove that the Hermite polynomial Hn satisfies the differential equation $H_n''(x) - 2xH_n'(x) + 2nH_n(x) = 0$.

5
Essentials of General Topology

Considering that he only had three years to devote to topology, he made his mark
in his chosen field with brilliance and passion. He transformed the subject into
a rich domain of modern mathematics. How much more might there have been,
had he not died so young?1
Crilly and Johnson wrote of Pavel Urysohn

Pavel Urysohn. 1898–1924

In 1915 Urysohn entered the University of Moscow to study physics. However, his
interest in physics soon took second place, for, after attending lectures by Luzin
and Egoroff, he began to concentrate on mathematics. Urysohn graduated in 1919
and continued working toward his doctorate. In June 1921, he became an assistant
professor at the University of Moscow.
Urysohn soon turned to topology. Egoroff gave him two problems in 1921.
These were difficult problems that had been around for some time. Egoroff was
not to be disappointed. Near the end of August, even before working out the details,
Urysohn had the correct ideas for solving the problems. During the following year,
Urysohn worked through the details, building a whole new area of dimension
theory in topology. It was an exciting time for topologists in Moscow, for Urysohn
lectured on the topology of continua, and often his latest results were presented
in the course shortly after he had proved them. He published a series of short
notes on this topic during 1922. The complete theory was presented in an article

1 T. Crilly and D. Johnson, “The emergence of topological dimension theory,” in I. M. James (ed.),
History of Topology (New York: Elsevier, 1999), 1–24.

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0005

that Lebesgue accepted for publication in the Comptes rendus of the Academy of
Sciences in Paris. This gave Urysohn an international platform for his ideas, which
immediately attracted the interest of mathematicians such as Hilbert, Hausdorff,
and Brouwer. In addition to advancing dimension theory, Urysohn is credited for
an important metrization theorem. He is particularly remembered for “Urysohn’s
lemma,” which establishes the existence of a continuous function taking the values
0 and 1 on disjoint closed subsets of a normal space.

Urysohn published a full version of his dimension theory in Fundamenta


Mathematicae. He wrote a major paper in two parts in 1923, but they did not
appear in print until 1925 and 1926. Sadly, Urysohn died in a drowning accident
before even the first part was published. His untimely death generated much
sadness in the mathematical community.

In the summer of 1924, Urysohn set off with Alexandroff on a European trip
through Germany, Holland, and France. The two mathematicians visited Hilbert.
After they left, Hilbert wrote to Urysohn, informing him that his paper with
Alexandroff had been accepted for publication in Mathematische Annalen, and
expressing the hope that Urysohn would visit again the following summer. They
then met Hausdorff, who was impressed with Urysohn’s results. He also wrote a
letter to Urysohn, which was dated August 11, 1924. The letter discusses Urysohn’s
metrization theorem and his construction of a universal separable metric space
(one into which any separable metric space can be injected), which was one of
Urysohn’s last results. Like Hilbert, Hausdorff expressed the hope that Urysohn
would visit again the following summer. Van Dalen writes about their final
mathematical visit, which was to Brouwer:2 “This time [Urysohn and Alexandroff]
visited Brouwer, who was most favourably impressed by the two Russians. He
was particularly taken with Urysohn, for whom he developed something like the
attachment to a lost son.”

5.1 Definitions and Basic Properties

While the metric topology is often sufficient for most introductory courses in
analysis, a good understanding of the elements of general topology is essential for
any advanced study of analysis. An attempt to define topology in a paragraph is
quite difficult and not likely to be successful, but we offer the following narrative

2 J. J. O’Connor and E. F. Robertson, “Pavel Samuilovich Urysohn,” in MacTutor History of Mathematics, (St Andrews: University of St Andrews, 1998), http://mathshistory.st-andrews.ac.uk/Biographies/Urysohn/, accessed Oct. 31, 2020.

for the satisfaction of the reader who insists on an overview of the subject. We
saw in chapter 4 that the collection of open sets generated by a metric has many
intrinsic properties independent of the defining metric. In this section, we study
the arrangement of the collection of open sets, or the topology, in a metric-free
context. Every metric space is a topological space; hence all results for topological
spaces (which are meaningful in the metric setting) are also valid for metric spaces,
but not conversely. We often fall back on the metric case to gain insight into both
subjects. We will encounter in this section many of the definitions that appeared
in chapter 4, such as closure, interior, and boundary. We include those definitions
again in this chapter for ease of reference. However, the proofs that duplicate
those in chapter 4 are omitted. The amount of duplication is small and does
not rise to the level of redundancy. We encourage the reader to compare results in
this section to their counterparts in the previous chapter. The exercise is insightful.

Let X be a nonempty set, and let 𝒯 be a collection of subsets of X; 𝒯 is called a


topology on X if

(a) ∅ and X are in 𝒯,


(b) the union of an arbitrary family of members of 𝒯 is a member of 𝒯, and
(c) the intersection of two members of 𝒯 is a member of 𝒯.

Thus 𝒯 is closed under the formation of arbitrary unions and finite intersections.
The members of 𝒯 are called the open subsets of X, and the pair (X, 𝒯) is called a
topological space.
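For a finite set X, the axioms above can be checked mechanically. The Python sketch below is a minimal illustration, assuming a hypothetical helper named `is_topology`; for a finite family, closure under pairwise unions and intersections already implies closure under arbitrary unions and finite intersections, so only pairs need to be tested.

```python
from itertools import combinations

def is_topology(X, family):
    """Check axioms (a)-(c) for a finite family of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(U) for U in family}
    if frozenset() not in T or X not in T:
        return False                      # axiom (a) fails
    if any(not U <= X for U in T):
        return False                      # every member must be a subset of X
    # axioms (b) and (c): pairwise checks suffice because T is finite
    return all((U | V) in T and (U & V) in T for U, V in combinations(T, 2))

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))   # True
print(is_topology(X, [set(), {1}, {2}, X]))      # False: {1} ∪ {2} is missing
```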

Example 1. Let X be a nonempty set, and let 𝒯 = 𝒫(X). Clearly, (X, 𝒯) is a


topological space. In fact, 𝒫(X) is the largest topology one can define on X. In
this topology, every subset of X is open. This topology is known as the discrete
topology on X. It is clear that the discrete topology is too large to be useful. 

Example 2. Let X be a nonempty set, and let 𝒯 = {∅, X}. This topology is called
the trivial or indiscrete topology on X. 

Example 3. Let X be an infinite set, and define a subset U of X to be open if U = ∅


or if X − U is finite. We verify that the collection of open sets we just defined is
a topology. If {U𝛼 }𝛼 is a collection of open sets, then, for each 𝛼, F𝛼 = X − U𝛼
is finite. Now ∪𝛼 U𝛼 is open because X − ∪𝛼 U𝛼 = ∩𝛼 F𝛼 , which is finite. It is
easy to verify that the intersection of two open subsets is open. This topology is
called the co-finite topology on X (or the finite complement topology). ∎

Example 4. Let X = (0, ∞), and let 𝒯 consist of ∅ and all intervals of the form
(a, ∞), for all a ≥ 0. It is easy to verify that 𝒯 is a topology. 

Example 5. The most common topologies are the metric topologies. Thus every
metric space is a topological space in accordance with the following definition: a
subset U of a metric space (X, d) is open if it is the union of open balls. Theorem
4.1.2 says precisely that the collection of open sets thus defined is a topology.
The reader can look up sections 3.6, 3.7, 4.1, and 4.8 for a variety of examples of
metric spaces and hence topological spaces. 

Example 6. The most important topological space is ℝn , where the topology is the
metric topology generated by the Euclidean metric (or any equivalent metric).
We will call this the usual topology on ℝn . 

If the topology 𝒯 on a topological space X is understood, we simply say that X is


a topological space and omit the reference to 𝒯. If more than one topology on X
is being considered or if there is a danger of ambiguity, we will specifically state
which topology applies to the situation in hand.

Definition. Let (X, 𝒯) be a topological space. A subset F of X is said to be closed


if its complement is open.

Theorem 5.1.1. Let X be a topological space. Then

(a) X and ∅ are closed,


(b) the union of finitely many closed sets is closed, and
(c) the intersection of an arbitrary collection of closed sets is closed. 

Definition. Let A be a subset of a topological space X. The interior of A, denoted int(A), is the union of all the open sets contained in A. A point of int(A) is called an interior point of A. The closure of A, denoted $\overline{A}$, is the intersection of all the closed sets containing A. A point of $\overline{A}$ is called a closure point of A.

The following properties of interiors and closures are straightforward. See the
corresponding results in chapter 4.

Theorem 5.1.2. Let A and B be subsets of a topological space X. Then

(a) int(A) is the largest open subset of A;
(b) A is open if and only if int(A) = A;
(c) if A ⊆ B, then int(A) ⊆ int(B);
(d) $\overline{A}$ is the smallest closed subset of X containing A;
(e) A is closed if and only if $\overline{A} = A$; and
(f) if A ⊆ B, then $\overline{A} \subseteq \overline{B}$. ∎
OUP UNCORRECTED PROOF – FINAL, 14/1/2021, SPi

essentials of general topology 195

Definition. Let X be a topological space, and let x ∈ X. A neighborhood of x is a


subset A of X that contains an open set that contains x. A neighborhood of a set
E ⊆ X is a subset A of X that contains an open set that contains E.

Observe that the definition does not require a neighborhood of a point or a set
to be open. If A is open, we specifically refer to it as an open neighborhood of
x (respectively, E). For example, an open set is a neighborhood of each of its points.

The following theorem provides a useful criterion for characterizing the closure of
a set. Compare its statement and proof to those of theorem 4.2.4.

Theorem 5.1.3. Let A be a subset of a topological space X. Then $x \in \overline{A}$ if and only if every open neighborhood of x intersects A.

Proof. Suppose $x \notin \overline{A}$. Then x is in the open set $U = X - \overline{A}$, and, clearly, U ∩ A = ∅. Conversely, if U is an open neighborhood of x that does not intersect A, then A is contained in the closed set F = X − U. Therefore $\overline{A} \subseteq F$. In particular, $x \notin \overline{A}$. ∎

Theorem 5.1.4. Let A be a subspace of a topological space (X, 𝒯). Then

(a) int(A) = X − (X − A), and


(b) A = X − int(X − A).

Proof. The proof is as follows:

int(A) = ∪{U ∶ U ∈ 𝒯, U ⊆ A} = X − [X − ∪{U ∶ U ∈ 𝒯, U ⊆ A}]
= X − ∩{X − U ∶ U ∈ 𝒯, U ⊆ A} = X − ∩{F ∶ (X − A) ⊆ F, F closed}
= X − $\overline{X - A}$.

To prove part (b), let B = X − A. Then, by part (a),

X − int(B) = X − [X − $\overline{X - B}$] = $\overline{X - B}$ = $\overline{A}$. ∎
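
For a finite topological space, the definitions of interior and closure, and the duality in theorem 5.1.4, can be checked by direct computation. The following Python sketch is an illustration only. The space X = {1, 2, 3}, the topology T, and the helper names interior, closure, and powerset are assumptions made for the example and do not appear in the text.

```python
from itertools import combinations

def interior(A, topology):
    """Union of all the open sets contained in A."""
    result = frozenset()
    for U in topology:
        if U <= A:
            result |= U
    return result

def closure(A, X, topology):
    """Intersection of all the closed sets (complements of open sets) containing A."""
    result = X
    for U in topology:
        F = X - U
        if A <= F:
            result &= F
    return result

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Hypothetical example: a nested topology on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = [frozenset(), frozenset({1}), frozenset({1, 2}), X]

# Theorem 5.1.4, checked for every subset A of X.
duality_holds = all(
    interior(A, T) == X - closure(X - A, X, T)
    and closure(A, X, T) == X - interior(X - A, T)
    for A in powerset(X)
)
print(duality_holds)
```

The check is exhaustive only because X is finite; it illustrates the theorem and is not a substitute for the proof.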

Definition. Let A be a subset of a topological space X. The boundary of A, denoted
𝜕A, is the set $\overline{A}$ ∩ $\overline{X - A}$. A point of 𝜕A is called a boundary point of A.

By theorem 5.1.3, a point x is a boundary point of A if and only if every open
neighborhood of x intersects both A and its complement.

The proofs of the following statements strongly resemble their metric counterparts.
See, for example, theorems 4.2.8 and 4.2.9.

Theorem 5.1.5. Let A be a subset of a topological space X. Then

(a) int(A) ∩ 𝜕A = ∅,
(b) $\overline{A}$ = int(A) ∪ 𝜕A,
(c) $\overline{A}$ = A ∪ 𝜕A, and
(d) A is closed if and only if 𝜕A ⊆ A. ∎

Subspace Topology

Let (X, 𝒯) be a topological space, and let Y be a subset of X. Define 𝒯Y = {Y ∩ U ∶


U ∈ 𝒯}. It is easy to verify that 𝒯Y is a topology on Y. For example, if {Y ∩ U𝛼 }𝛼
is a collection of members of 𝒯Y , then ∪𝛼 (Y ∩ U𝛼 ) = Y ∩ (∪𝛼 U𝛼 ), which is in
𝒯Y because ∪𝛼 U𝛼 ∈ 𝒯. Verifying that 𝒯Y is closed under the formation of finite
intersections is straightforward.

Definition. The topology 𝒯Y is known as the relative, subspace, or restricted


topology on Y induced by the topology 𝒯.

Theorem 5.1.6. Let A ⊆ Y, and let $\overline{A}^Y$ denote the closure of A in (Y, 𝒯Y ). Then
$\overline{A}^Y$ = $\overline{A}$ ∩ Y.

Proof. Since $\overline{A}$ is closed in X, $\overline{A}$ ∩ Y is closed in Y. Since A ⊆ $\overline{A}$ ∩ Y, $\overline{A}^Y$ ⊆ $\overline{A}$ ∩ Y. We
prove the reverse containment. Since $\overline{A}^Y$ is closed in Y, there exists a closed subset
F of X such that $\overline{A}^Y$ = F ∩ Y. Thus F is a closed subset of X, and A ⊆ F. Hence
$\overline{A}$ ⊆ F, and $\overline{A}$ ∩ Y ⊆ F ∩ Y = $\overline{A}^Y$. ∎
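
The subspace topology and theorem 5.1.6 can be illustrated on a small finite space. The Python sketch below is a hedged example. The four-point space X, its topology T, the subset Y, and the function names closure and subspace_topology are all hypothetical choices made for the illustration.

```python
from itertools import combinations

def closure(A, X, topology):
    """Intersection of all the closed subsets of (X, topology) that contain A."""
    result = X
    for U in topology:
        F = X - U
        if A <= F:
            result &= F
    return result

def subspace_topology(Y, topology):
    """The collection {Y ∩ U : U open in X}."""
    return {Y & U for U in topology}

# Hypothetical example: a nested topology on a four-point set.
X = frozenset({1, 2, 3, 4})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3}), X}
Y = frozenset({1, 3})
T_Y = subspace_topology(Y, T)

# Theorem 5.1.6: the closure of A in Y equals (closure of A in X) ∩ Y.
subsets_of_Y = [frozenset(c) for r in range(len(Y) + 1)
                for c in combinations(sorted(Y), r)]
theorem_holds = all(
    closure(A, Y, T_Y) == closure(A, X, T) & Y for A in subsets_of_Y
)
print(sorted(map(sorted, T_Y)), theorem_holds)
```

Note that the closure of {3} in X is {3, 4}, while its closure in Y is {3}; the two differ exactly by the intersection with Y.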

Exercises

1. Let 𝒯 = {[n, ∞) ∶ n ∈ ℤ}. Is 𝒯 a topology on ℝ?


2. Let X be an infinite set, and let 𝒯 be the collection of subsets U of X such
that U = ∅ or X − U is countable. Is 𝒯 a topology?
3. Define 𝒯 to be the following collection of subsets of ℝ: U ∈ 𝒯 if U is empty
or if 0 ∈ U. Verify that 𝒯 is a topology. Prove that $\overline{\{0\}}$ = ℝ and that the
restriction of 𝒯 to ℝ − {0} is the discrete topology.
4. Let (X, 𝒯) be a topological space, and let 𝜔 be an object not in X. Define a
collection ℱ of subsets of Y = X ∪ {𝜔} as follows: a subset A of Y is in ℱ if
A = ∅ or if A = {𝜔} ∪ U, where U ∈ 𝒯. Is ℱ a topology on Y?
5. Prove that the intersection of an arbitrary collection of topologies on a set
X is a topology on X.

6. (a) Prove that if A and B are subsets of a topological space X, then
$\overline{A \cup B} = \overline{A} \cup \overline{B}$.
(b) Let {A𝛼 }𝛼 be an arbitrary collection of subsets of X. Prove that
$\bigcup_\alpha \overline{A_\alpha} \subseteq \overline{\bigcup_\alpha A_\alpha}$ and give an example to show that strict inclusion is possible.
7. Let Y be a subspace of a topological space X. Show that a subset A of Y
is closed in Y if and only if there exists a closed subset F of X such that
A = F ∩ Y.
8. Let U be an open subset of X, and let A ⊆ X. Show that if U ∩ $\overline{A}$ ≠ ∅, then
U ∩ A ≠ ∅.

Definition. A point x is said to be a limit point of a subset A of a


topological space X if every open neighborhood of x intersects A at a point
other than x. The set of limit points of A is denoted by A′ .

9. (a) Prove that x is a limit point of A if and only if x ∈ $\overline{A - \{x\}}$.


(b) Prove that theorem 4.2.7 is valid for a general topological space.
10. Let A and B be subsets of a topological space X. Which of the following is
true?
(a) (A ∪ B)′ = A′ ∪ B′
(b) (A ∩ B)′ = A′ ∩ B′
Definition. A subset A of a topological space X is said to be nowhere dense
in X if int($\overline{A}$) = ∅.

11. Show that the results of problems 9 and 10 on section 4.6 are valid for a
general topological space.

5.2 Bases and Subbases

Some topologies are quite difficult to define directly, and it is frequently the case
that we want to define a topology on a set X that includes a certain collection
𝔖 of subsets of X. The existence of such a topology is obvious because 𝒫(X) is
such a topology. However, 𝒫(X) is useless because it is too large. This immediately
suggests the question of finding the smallest topology 𝒯 on X that contains 𝔖.
Fortunately, such a unique smallest topology 𝒯 exists.
The reader may wonder what situations would compel us to “want” the members
of 𝔖 to be open. The prime such situation is when we need a certain class
of functions from X to another topological space Y to be continuous, which is
the overarching idea behind the definition of product and weak topologies. See
sections 5.4 and 6.7.

The set 𝔖 in the above discussion is called a subbase for 𝒯, and a closely connected
concept is that of a base for the topology 𝒯, which is our first definition. Bases and
subbases have a wide range of applications. In addition to providing the means
to define useful topologies, bases and subbases give us easy ways to prove the
continuity of functions and to characterize closures. See theorems 5.2.2 and 5.3.1.

Definition. An open base for a topology 𝒯 on a set X is a collection 𝔅 of open


subsets of X such that every nonempty open subset in X is the union of members
of 𝔅. If 𝔅 is an open base for 𝒯, we say that 𝔅 generates 𝒯.

See problem 2 at the end of this section for an equivalent, more explicit
formulation of the definition of an open base.

Example 1. The collection 𝔅 = {(r, s) ∶ r, s ∈ ℚ, r < s} is an open base for the usual
topology on ℝ. This is because every open subset of ℝ is the union of open
bounded intervals, and any such interval is the union of members of 𝔅: (a, b) =
∪{(r, s) ∶ r ∈ ℚ, s ∈ ℚ, a < r < s < b}. See section 4.5 for a more general version
of this example. 

The collection of open balls in a metric space is an open base for the metric
topology. This follows immediately from the definition of open sets in a metric
space.

Caution: Not every collection ℭ of subsets of X such that ∪{U ∶ U ∈ ℭ} = X is an
open base for some topology on X, as the next example illustrates.

Example 2. Let X = {a, b, c}, and let ℭ = {∅, X, {a, b}, {b, c}}. The collection ℭ is
not the base for any topology on X because if it were, that topology would be ℭ
because the union of two members of ℭ is in ℭ. However, ℭ is not a topology,
because {a, b} ∩ {b, c} ∉ ℭ. 

Theorem 5.2.1. Let X be a nonempty set, and let 𝔅 be a collection of subsets of X


such that ∪{U ∶ U ∈ 𝔅} = X. Then 𝔅 is an open base for some topology on X
if and only if, for every U, V ∈ 𝔅, and every x ∈ U ∩ V, there exists a member
W ∈ 𝔅 such that x ∈ W ⊆ U ∩ V.

Proof. If 𝔅 is an open base for some topology 𝒯, and x, U, and V are as in the
statement of the theorem, then U ∩ V is a nonempty open set. By the definition
of an open base, there is a member W of 𝔅 such that x ∈ W ⊆ U ∩ V.

Conversely, suppose 𝔅 satisfies the assumptions of the theorem. Define a family


of subsets 𝒯 of X as follows: U ∈ 𝒯 if and only if U is the union of members of

𝔅. We claim that 𝒯 is a topology. Suppose {U𝛼 }𝛼 is a collection of members of


𝒯, and let x ∈ U = ∪𝛼 U𝛼 . Then x ∈ U𝛼 for some 𝛼. By the very definition of 𝒯,
there is a member B of 𝔅 such that x ∈ B ⊆ U𝛼 ⊆ U. This makes U the union of
members of 𝔅, that is, U ∈ 𝒯. Now consider two members U1 and U2 of 𝒯, and
let x ∈ U1 ∩ U2 . By the definition of 𝒯, there are members B1 and B2 of 𝔅 such
that x ∈ B1 ⊆ U1 and x ∈ B2 ⊆ U2 . By assumption, there is a member W of 𝔅
such that x ∈ W ⊆ B1 ∩ B2 . Thus x ∈ W ⊆ U1 ∩ U2 , and U1 ∩ U2 ∈ 𝒯. We have
proved that 𝒯 is a topology. It is clear that 𝔅 is an open base for 𝒯. 
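
The criterion of theorem 5.2.1 is easy to test mechanically on a finite set. The following Python sketch is offered as an illustration under stated assumptions: the function name is_base_for_some_topology is hypothetical, and the two collections tested are the collection ℭ of example 2 and the same collection enlarged by the set {b}.

```python
def is_base_for_some_topology(X, B):
    """Check the hypotheses of theorem 5.2.1 for a finite collection B of subsets of X."""
    # The members of B must cover X.
    if frozenset().union(*B) != X:
        return False
    # For all U, V in B and x in U ∩ V, some W in B satisfies x ∈ W ⊆ U ∩ V.
    for U in B:
        for V in B:
            for x in U & V:
                if not any(x in W and W <= (U & V) for W in B):
                    return False
    return True

X = frozenset({"a", "b", "c"})

# The collection of example 2: {a, b} ∩ {b, c} = {b} contains no member around b.
C = [frozenset(), X, frozenset({"a", "b"}), frozenset({"b", "c"})]

# Adding {b} repairs the defect.
C_repaired = C + [frozenset({"b"})]

print(is_base_for_some_topology(X, C))
print(is_base_for_some_topology(X, C_repaired))
```

The first call reports failure for the reason given in example 2, and the second succeeds once the intersection {b} is itself available as a member of the collection.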

Example 3. Let X = ℝ, and let 𝔅 be the collection of intervals in ℝ of the form


[a, b), where a, b ∈ ℝ and a < b. The nonempty intersection of two members
of 𝔅 is a member of 𝔅. By theorem 5.2.1, 𝔅 is the base for a topology on ℝ
called the lower limit topology. The real line with the lower limit topology is
sometimes referred to as the Sorgenfrey line and is denoted by ℝl . The lower
limit topology is a rich and complicated topology that provides a number of
interesting counterexamples. See problem 3 at the end of this section, and the
exercises on section 5.7. 

The following theorem serves as an early indicator of the importance and typical
uses of open bases.

Theorem 5.2.2. Let X be a topological space, and let 𝔅 be an open base for the
topology on X. If A is a subset of X, and x ∈ X, then x ∈ $\overline{A}$ if and only if every
basis element containing x intersects A.

Proof. Use theorem 5.1.3 and problem 2 at the end of this section. 

Definition. A topology 𝒯 on a set X is said to be weaker (or smaller, or coarser)


than a topology ℱ on X if 𝒯 ⊆ ℱ. We also say that ℱ is stronger (or finer)
than 𝒯.

Example 4. If X is an infinite set, then the indiscrete topology on X is weaker


than the co-finite topology, which, in turn, is weaker than the topology 𝒫(X).
The lower limit topology is strictly stronger than the usual topology on ℝ. See
problem 3 at the end of this section. 

Definition. An open subbase for the topology 𝒯 on X is a collection 𝔖 of open


sets such that the collection of finite intersections of members of 𝔖 is an open
base for 𝒯. If 𝔖 is a subbase for 𝒯, we say that 𝔖 generates 𝒯.

Example 5. The collection of intervals {(−∞, b) ∶ b ∈ ℚ} ∪ {(a, ∞) ∶ a ∈ ℚ} is an


open subbase for the usual topology on ℝ. 

The following theorem provides an important mechanism for constructing a


topology that contains a predetermined collection of subsets, as described in the
preamble to the section.

Theorem 5.2.3. Let 𝔖 be a collection of subsets of a nonempty set X such that ∪{S ∶
S ∈ 𝔖} = X. Then there exists a unique smallest topology on X that contains 𝔖 as
a subbase.

Proof. Let 𝔅 be the collection of finite intersections of members of 𝔖. If U and V are


in 𝔅, then clearly U ∩ V is in 𝔅. By theorem 5.2.1, 𝔅 is the base of a topology, 𝒯.
Notice that the members of 𝒯 are unions of finite intersections of members of 𝔖. If
ℱ is another topology that contains 𝔖, then ℱ, being a topology, contains all finite
intersections of 𝔖 and hence all unions of such intersections. Thus ℱ contains 𝒯.
This makes 𝒯 the weakest topology that contains 𝔖. 
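
The construction in the proof of theorem 5.2.3 (finite intersections first, then arbitrary unions) can be carried out explicitly when X is finite. The sketch below is a brute-force illustration, exponential in the size of the subbase, and is intended only for very small examples. The function name topology_from_subbase and the sample subbase are assumptions. The empty intersection is taken to be X, which does not change the result because the members of the subbase already cover X.

```python
from itertools import combinations

def topology_from_subbase(X, S):
    """Unions of finite intersections of members of S (brute force, finite X only)."""
    S = list(S)
    # Step 1: the base of all finite intersections of members of S.
    base = {X}  # convention: the empty intersection is X
    for r in range(1, len(S) + 1):
        for combo in combinations(S, r):
            base.add(frozenset.intersection(*combo))
    base = list(base)
    # Step 2: all unions of members of the base (the empty union is the empty set).
    topology = {frozenset()}
    for r in range(1, len(base) + 1):
        for combo in combinations(base, r):
            topology.add(frozenset().union(*combo))
    return topology

# Hypothetical subbase on X = {a, b, c}; compare with example 2.
X = frozenset({"a", "b", "c"})
S = [frozenset({"a", "b"}), frozenset({"b", "c"})]
T = topology_from_subbase(X, S)
print(sorted(sorted(U) for U in T))
```

For this subbase the generated topology consists of ∅, {b}, {a, b}, {b, c}, and X; the set {b} appears precisely because it is the intersection of the two subbase members.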

Exercises
1. Show that the collection of open boxes $\{\prod_{i=1}^{n} (a_i, b_i) : a_i, b_i \in \mathbb{Q}\}$ is an open
base for the usual topology on ℝⁿ.
2. Prove that a collection 𝔅 of open subsets of a topological space X is an
open base if and only if, for every open set U and every x ∈ U, there exists a
member B of 𝔅 such that x ∈ B ⊆ U.
3. (a) Prove that the usual topology on ℝ is weaker than the lower limit
topology.
(b) Prove that each of the following intervals is both open and closed in
the lower limit topology: [a, b), (−∞, a), and [a, ∞). Conclude that the
usual topology is strictly weaker than the lower limit topology.
4. Let 𝔅1 and 𝔅2 be bases for the topologies 𝒯1 and 𝒯2 on the same set X. Show
that if, for every B ∈ 𝔅1 and every x ∈ B, there exists an element B′ ∈ 𝔅2
such that x ∈ B′ ⊆ B, then 𝒯1 ⊆ 𝒯2 .
5. Let 𝔅 be an open base for a topology 𝒯 on a set X. Prove that 𝒯 is the
intersection of all the topologies on X that contain 𝔅.
6. What topology on ℝ is generated by the open subbase {(−∞, a) ∶ a ∈ ℝ}?
7. Let {𝒯𝛼 }𝛼 be a collection of topologies on a set X. Prove that there is a unique
smallest topology 𝒯 that contains ∪𝛼 𝒯𝛼 .

5.3 Continuity

In section 4.3, we studied the definition of local continuity of functions on metric


spaces. It is clear that the 𝜖-𝛿 definition provides no clues to generalizing the

definition to the topological case. However, theorem 4.3.1 provides a metric-free


characterization of local continuity which, with very slight changes, produces the
following definition.

Definition. Let X and Y be topological spaces. A function f ∶ X → Y is said to be
continuous at a point x0 ∈ X if, for every open subset V of Y containing f (x0 ),
f⁻¹(V) contains an open neighborhood of x0 .

We point out here an important distinction between metric and general


topologies. Theorem 4.3.2 established the fact that, in the metric case, continuity
is equivalent to sequential continuity. This is not the case for a general
topological space. See problem 11 at the end of this section.

As in the metric case, we can define a function from a topological space X to


another space Y to be continuous if it is continuous at each point of X. However,
theorem 4.3.3 suggests a more convenient, and widely used, definition of global
continuity.

Definition. Let (X, 𝒯X ) and (Y, 𝒯Y ) be topological spaces. A function f ∶ X → Y is
said to be continuous if the inverse image of every open subset of Y is an open
subset of X. Symbolically, V ∈ 𝒯Y implies f⁻¹(V) ∈ 𝒯X .

Continuity depends entirely on the topologies on X and Y. Let X = ℝ, 𝒯1 be


the discrete topology on X, and let 𝒯2 be the usual topology on ℝ. The identity
function IX ∶ (X, 𝒯1 ) → (X, 𝒯2 ) is continuous, but the very same function IX ∶
(X, 𝒯2 ) → (X, 𝒯1 ) is not continuous because not every subset of ℝ is open in the
usual topology of ℝ.

Theorem 5.3.1. Using the notation of the above definition, the following are
equivalent:

(a) f is continuous.
(b) The inverse image of a closed subset of Y is a closed subset of X.
(c) If 𝔅 is an open base for 𝒯Y , then f⁻¹(B) is open in X for every B ∈ 𝔅.
(d) If 𝔖 is an open subbase for 𝒯Y , then, for every S ∈ 𝔖, f⁻¹(S) is open in X.

Proof. Parts (a) and (b) are equivalent because of the identity f⁻¹(F) = X − f⁻¹(Y − F)
and the fact that a subset F of Y is closed if and only if Y − F is open.
Clearly, (a) implies (c), and (c) implies (d). Now (d) implies (c) by virtue of the
identity f⁻¹(S1 ∩ . . . ∩ Sn ) = f⁻¹(S1 ) ∩ . . . ∩ f⁻¹(Sn ), and (c) implies (a) because of
the identity f⁻¹(∪𝛼 B𝛼 ) = ∪𝛼 f⁻¹(B𝛼 ). ∎
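
Theorem 5.3.1 says that continuity may be tested on a subbase alone. The Python sketch below illustrates this for maps between finite spaces. The spaces, the maps f and g (stored as dictionaries), and the helper names preimage and is_continuous are hypothetical and chosen only for the example.

```python
def preimage(f, V, X):
    """The inverse image of V under the map f (a dict from X to Y)."""
    return frozenset(x for x in X if f[x] in V)

def is_continuous(f, X, T_X, family):
    """True if the inverse image of every member of `family` is open in X."""
    return all(preimage(f, V, X) in T_X for V in family)

# Hypothetical domain: X = {1, 2, 3} with a five-element topology.
X = frozenset({1, 2, 3})
T_X = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 3}), X}

# Hypothetical codomain: Y = {a, b, c} with the topology generated by S_Y.
Y = frozenset({"a", "b", "c"})
S_Y = [frozenset({"a", "b"}), frozenset({"b", "c"})]
T_Y = {frozenset(), frozenset({"b"}), frozenset({"a", "b"}), frozenset({"b", "c"}), Y}

f = {1: "b", 2: "a", 3: "c"}
g = {1: "a", 2: "b", 3: "c"}

# Testing on the subbase agrees with testing on the whole topology.
print(is_continuous(f, X, T_X, S_Y), is_continuous(f, X, T_X, T_Y))
print(is_continuous(g, X, T_X, S_Y), is_continuous(g, X, T_X, T_Y))
```

Here f passes both tests, while g fails both because the inverse image of {b, c} under g is {2, 3}, which is not open in X.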

Example 1. Suppose f is a real-valued function on a topological space X. If f is
continuous at x0 , then there exists an open neighborhood U of x0 such that f (U)
is a bounded subset of ℝ.
Consider the open interval V = ( f (x0 ) − 1, f (x0 ) + 1). By the definition of
continuity, there exists an open neighborhood U of x0 such that f (U) ⊆ V.
Equivalently, for every x ∈ U, | f (x) − f (x0 )| < 1. Now, for every x ∈ U, | f (x)| ≤
| f (x) − f (x0 )| + | f (x0 )| < 1 + | f (x0 )| < ∞. 

We leave the proof of the following theorem as an exercise.

Theorem 5.3.2. A function f from a topological space X to a topological space Y is


continuous if and only if it is continuous at each point x ∈ X. 

Definition. A real-valued function f on a topological space X is said to be locally


bounded if, for every x ∈ X, there exists an open neighborhood Ux of x such
that f |Ux is bounded in ℝ.

The following example follows directly from example 1 and the previous theorem.

Example 2. A continuous, real-valued function on a topological space is locally


bounded. 

Two Important Function Spaces

In section 4.8, we defined the spaces ℬ(X) of bounded functions on an arbitrary


nonempty set X and, for a metric space X, the spaces 𝒞(X) of continuous
functions on X and ℬ𝒞(X) of continuous bounded functions on X. The same
definitions clearly make good sense when X is a topological space. Theorem 4.8.1
states that ℬ(X) is a complete normed linear space under the uniform metric. The
following theorem is the generalization of theorem 4.8.2.

Theorem 5.3.3. If X is a topological space, then the space ℬ𝒞(X) of continuous


bounded functions on X is a complete normed linear space.

Proof. Since ℬ𝒞(X) is a subspace of ℬ(X), it suffices to show that ℬ𝒞(X) is closed
in ℬ(X). Let f ∈ ℬ(X) be a closure point of ℬ𝒞(X). We need to show that f is
continuous at each point x0 ∈ X. For 𝜖 > 0, there exists a function g ∈ ℬ𝒞(X)
such that ‖ f − g‖∞ < 𝜖/3. By the continuity of g at x0 , there exists an open
neighborhood U of x0 such that, for every x ∈ U, |g (x) − g (x0 )| < 𝜖/3. Now if
x ∈ U, then | f (x) − f (x0 )| ≤ | f (x) − g (x)| + |g (x) − g (x0 )| + |g (x0 ) − f (x0 )| < 𝜖.


Homeomorphisms

Definition. We say that two topological spaces (X, 𝒯X ) and (Y, 𝒯Y ) are homeomorphic
if there exists a bijection 𝜑 ∶ X → Y such that both 𝜑 and 𝜑⁻¹ are
continuous. We call such a function 𝜑 bicontinuous, or a homeomorphism.

Intuitively speaking, two topological spaces are homeomorphic if they have identical
arrangements of open sets.

Example 3. Any two open bounded intervals are homeomorphic. The linear
function t ↦ a + (b − a)t maps (0, 1) onto (a, b) and is clearly bicontinuous. ∎

Example 4. The stereographic projection is a homeomorphism from the punctured
sphere onto ℝ². ∎

Example 5. Not every continuous bijection is a homeomorphism. The function
f (t) = (cos t, sin t) is a continuous bijection from the half-open interval [0, 2𝜋)
onto the unit circle, but its inverse is not continuous at the point (1, 0). ∎

Definition. Let (X, 𝒯X ) and (Y, 𝒯Y ) be topological spaces, let 𝜑 ∶ X → Y be an
injection, and let Z = ℜ(𝜑). If 𝜑 and 𝜑⁻¹ ∶ Z → X are both continuous, we say
that 𝜑 injects X homeomorphically into Y. We also say that X is topologically
embedded in Y. Here Z is given the restricted topology induced by 𝒯Y .

Example 6. The inverse stereographic projection embeds ℝ² into the unit
sphere. ∎

Upper and Lower Semicontinuous Functions

Definition. A real-valued function f on a topological space X is said to be lower
semicontinuous if, for every a ∈ ℝ, f⁻¹((a, ∞)) is open. We say that f is upper
semicontinuous if f⁻¹((−∞, b)) is open for every b ∈ ℝ.³

Theorem 5.3.4. Let f be a real-valued function on a topological space X.

(a) f is continuous if and only if it is both upper and lower semicontinuous.


(b) The characteristic function of an open subset is lower semicontinuous.

³ Lower semicontinuous functions played a significant role in the early development of measure
theory. Upper and lower semicontinuous functions facilitate a succinct proof of Urysohn’s lemma
(theorem 5.11.2).

(c) The characteristic function of a closed subset is upper semicontinuous.


(d) If { f𝛼 }𝛼 is a family of lower semicontinuous functions, then sup𝛼 { f𝛼 } is lower
semicontinuous.
(e) If { f𝛼 }𝛼 is a family of upper semicontinuous functions, then inf𝛼 {f𝛼 } is upper
semicontinuous.

Proof. (a) If f is both upper and lower semicontinuous, then, for all real numbers
a and b, f⁻¹(a, ∞) and f⁻¹(−∞, b) are open. Since intervals of the type
(a, ∞) and (−∞, b) form an open subbase for the usual topology on ℝ, f is
continuous. The converse is trivial.
(b) If A ⊆ X is open, then, for a ∈ ℝ, $\chi_A^{-1}(a, \infty)$ is open because

$$\chi_A^{-1}(a, \infty) = \begin{cases} \emptyset & \text{if } a \ge 1, \\ A & \text{if } 0 \le a < 1, \\ X & \text{if } a < 0. \end{cases}$$

(c) The proof is similar to that of part (b).
(d) Let $f = \sup_\alpha \{f_\alpha\}$. Since $f^{-1}(a, \infty) = \bigcup_\alpha f_\alpha^{-1}(a, \infty)$, f⁻¹(a, ∞) is open.
(e) The proof is similar to that of part (d). ∎

Exercises

1. Let X, Y, and Z be topological spaces, and let f ∶ X → Y and g ∶ Y → Z.


Prove that
(a) if f is constant, it is continuous;
(b) if f and g are continuous, so is g ∘ f;
(c) if A is a subset of X, then the inclusion map from A to X is continuous;
and
(d) if f is continuous and A ⊆ X, then the restriction of f to A is continuous.
2. Let X and Y be topological spaces, and let f ∶ X → Y.
(a) Prove that f is continuous if and only if, for every x ∈ X and every open
neighborhood V of f (x) in Y, there exists an open neighborhood U of x
such that f (U) ⊆ V.
(b) Prove that f is continuous if and only if for every subset A of X,
$f(\overline{A}) \subseteq \overline{f(A)}$.
3. Let 𝒯 and ℱ be two topologies on a set X. Prove that 𝒯 is weaker than ℱ if
and only if the identity function IX ∶ (X, ℱ) → (X, 𝒯) is continuous. Conclude
that 𝒯 = ℱ if and only if IX ∶ (X, ℱ) → (X, 𝒯) is a homeomorphism.
4. Suppose that f is a function from a topological space X to a topological space
Y and that $X = \bigcup_{i=1}^{n} A_i$, where A1 , . . . , An are closed subsets of X. Prove that

if each f |Ai is continuous, then f is continuous. This result is also true when
each of the sets Ai is open.
5. Let f and g be continuous real-valued functions on a topological space X.
Prove that
(a) f ± g, fg and | f | are continuous,
(b) the set {x ∈ X ∶ f (x) ≤ g (x)} is closed, and
(c) the functions h = min{ f, g} and k = max{ f, g} are continuous.
6. Prove that the following subspaces of the Euclidean plane are homeomorphic:
(a) the punctured plane {(x, y) ∶ x² + y² > 0}
(b) the open annulus {(x, y) ∶ 1 < x² + y² < 4}
7. Prove that a discrete topological space (X, 𝒯) is homeomorphic to a subspace
of ℝ if and only if X is countable.
8. Let a, b, c, and d be real numbers such that

$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} > 0.$$

Show that the function $f(z) = \frac{az + b}{cz + d}$ is a homeomorphism of the open upper
half of the complex plane.
9. Let X = ℝⁿ − {0}. Prove that the function $f(x) = \frac{x}{\|x\|_2^2}$ is continuous on X.
10. (a) Let f be a real function on a topological space X. Prove that f is lower
semicontinuous if and only if −f is upper semicontinuous.
(b) Prove that a subset A of X is open if 𝜒A is lower semicontinuous.
(c) Prove that a subset B of X is closed if 𝜒B is upper semicontinuous.
11. Definition. A sequence (xn ) in a topological space X is said to converge to
x ∈ X if every neighborhood of x contains all but finitely many terms of
(xn ). See problem 9 on section 4.1.
Let f be a function from a topological space X to a topological space Y.
Show that if f is continuous at x0 ∈ X, then it is sequentially continuous at
x0 (see theorem 4.3.2). Also give an example to show that the converse is
false.

5.4 The Product Topology: The Finite Case

In section 4.4, we defined the product of finitely many metric spaces. In this
section, we develop a construction that generalizes the concept to the case of
topological spaces. Thus we define a topology on the Cartesian product of a finite
number of topological spaces. Needless to say, the product topology should agree
with and extend the definition of the product metric.

Let (X1 , 𝒯1 ), … , (Xn , 𝒯n ) be topological spaces, and consider the Cartesian product
$X = \prod_{i=1}^{n} X_i$ of the underlying sets. Consider the following collection of subsets
of X:

$$\mathfrak{S} = \bigcup_{i=1}^{n} \{X_1 \times \dots \times U_i \times \dots \times X_n : U_i \in \mathcal{T}_i\}.$$

Since ∪{S ∶ S ∈ 𝔖} = X, theorem 5.2.3 applies; hence the following definition is
meaningful:

Definition. In the notation of the above paragraph, the product topology on X


is the weakest topology that contains 𝔖.

By construction, 𝔖 is a subbase for the product topology (theorem 5.2.3). An
open base for the product topology on X consists of intersections of finitely
many members of 𝔖. Since $\bigcap_{i=1}^{n} (X_1 \times \dots \times U_i \times \dots \times X_n) = U_1 \times \dots \times U_n$, an
open base for the product topology is the collection

𝔅 = {U1 × . . . × Un ∶ Ui ∈ 𝒯i }.

The set 𝔖 is referred to as the defining subbase for the product topology and
the set 𝔅 is called the defining base for the product topology.

The following theorem establishes a property of the product topology that


actually characterizes that topology and is frequently used as an alternative,
equivalent definition of the product topology. Recall that 𝜋i denotes the pro-
jection of the product space X onto the factor space Xi : 𝜋i (x1 , . . . , xn ) = xi .

Theorem 5.4.1. The product topology is the weakest topology relative to which all
the projections 𝜋i ∶ X → Xi are continuous.

Proof. Let Ui be open in Xi . A set of the type $\pi_i^{-1}(U_i)$ = X1 × . . . × Ui × . . . × Xn is
a member of the defining subbase for the product topology on X. Thus $\pi_i^{-1}(U_i)$
is open in X, and 𝜋i is continuous. Any topology ℱ relative to which all the
projections are continuous must contain all sets of the form $\pi_i^{-1}(U_i)$ for all
1 ≤ i ≤ n and all Ui ∈ 𝒯i . Thus ℱ contains the defining subbase 𝔖 of the product
topology. By theorem 5.2.3, the product topology is weaker than ℱ. ∎

Comparing the above result with problems 2 and 5 on section 4.4 should convince
the reader that the product topology defined in this section is indeed the correct
generalization of the product metric.

Theorem 5.4.2. If Fi is closed in Xi , then $\prod_{i=1}^{n} F_i$ is closed in $\prod_{i=1}^{n} X_i$.

Proof. The proof is identical to that of theorem 4.4.3. 

Theorem 5.4.3. Let 𝔅i be an open base for the topology 𝒯i on Xi . Then

$$\prod_{i=1}^{n} \mathfrak{B}_i = \{V_1 \times \dots \times V_n : V_i \in \mathfrak{B}_i\}$$

is an open base for the product topology on $\prod_{i=1}^{n} X_i$.

Proof. Let W be open in $\prod_{i=1}^{n} X_i$, and let x = (x1 , . . . , xn ) ∈ W; W is the union of
sets of the type U1 × . . . × Un , where Ui ∈ 𝒯i . Therefore, for a set of that type, x ∈
U1 × . . . × Un ⊆ W. For each xi , choose a member Vi ∈ 𝔅i such that xi ∈ Vi ⊆ Ui .
Clearly, x ∈ V1 × . . . × Vn ⊆ U1 × . . . × Un ⊆ W. ∎
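
For two small finite spaces, the product topology can be computed from its defining base, and theorems 5.4.1 and 5.4.3 can be checked directly. The following Python sketch is illustrative only. The two-point factor spaces, the bases B1 and B2, and the helper names unions_of and preimage_under_projection are assumptions introduced for the example.

```python
from itertools import combinations, product

def unions_of(base):
    """All unions of subfamilies of a finite family (the empty union is the empty set)."""
    base = list(base)
    result = {frozenset()}
    for r in range(1, len(base) + 1):
        for combo in combinations(base, r):
            result.add(frozenset().union(*combo))
    return result

def preimage_under_projection(i, U, X):
    """The set of points of the product X whose i-th coordinate lies in U."""
    return frozenset(p for p in X if p[i] in U)

# Hypothetical factor spaces: two copies of a two-point space in which {0} is open.
X1 = X2 = frozenset({0, 1})
T1 = T2 = [frozenset(), frozenset({0}), X1]
X = frozenset(product(X1, X2))

# Defining base: all products U1 x U2 with U1 in T1 and U2 in T2.
defining_base = {frozenset(product(U1, U2)) for U1 in T1 for U2 in T2}
product_topology = unions_of(defining_base)

# Theorem 5.4.1 (first half): both projections are continuous.
projections_continuous = all(
    preimage_under_projection(i, U, X) in product_topology
    for i, T in enumerate([T1, T2]) for U in T
)

# Theorem 5.4.3: products of base elements generate the same topology.
B1 = B2 = [frozenset({0}), X1]
base_from_bases = {frozenset(product(V1, V2)) for V1 in B1 for V2 in B2}
same_topology = unions_of(base_from_bases) == product_topology

print(len(product_topology), projections_continuous, same_topology)
```

In this example the product topology has six members; the singleton {(1, 1)} is not among them, in agreement with the fact that {1} is not open in either factor.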

Exercises

1. Let X and Y be topological spaces, and let x be a fixed element of X. Prove that
Y is homeomorphic to {x} × Y. The latter set is given the restricted topology
induced by the product topology on X × Y.
2. Prove that X1 × . . . × Xn is homeomorphic to X1 × (X2 × . . . × Xn ).
3. Let X and Y be topological spaces, and let A ⊆ X and B ⊆ Y. Prove that
$\overline{A \times B} = \overline{A} \times \overline{B}$.

Definition. A function f from a topological space X to a topological space Y


is said to be an open mapping if f (U) is open in Y for every open subset U of
X. Similarly, f is a closed mapping if it maps closed subsets of X into closed
subsets of Y.
4. Prove that the projections 𝜋i from a product space $\prod_{i=1}^{n} X_i$ onto the factor
spaces Xi are open. Also give an example to show that the projections need
not be closed mappings.
5. Let X1 , X2 , Y1 , and Y2 be topological spaces, and let fi ∶ Xi → Yi be continuous,
i = 1, 2. Prove that the function F ∶ X1 × X2 → Y1 × Y2 defined by
F(x1 , x2 ) = ( f1 (x1 ), f2 (x2 )) is continuous.
6. Let X be a nonempty set (no topology), and let {Y𝛼 } be a collection of
topological spaces. Show that, for an arbitrary collection of functions f𝛼 ∶
X → Y𝛼 , there is a unique smallest topology on X relative to which all the
functions f𝛼 are continuous.

7. Prove that if Ai is dense in Xi for 1 ≤ i ≤ n, then $\prod_{i=1}^{n} A_i$ is dense in
$\prod_{i=1}^{n} X_i$ with the product topology.
8. Let X be an infinite set, and let 𝒯 be the co-finite topology on X. Prove that
the product topology on X × X is not the co-finite topology.

5.5 Connected Spaces

Intuitively speaking, a disconnected space comes in two pieces. One might be


tempted to define a disconnected space as the union (X1 ∪ X2 , 𝒯1 ∪ 𝒯2 ) of two
topological spaces (X1 , 𝒯1 ) and (X2 , 𝒯2 ) where X1 ∩ X2 = ∅. A little reflection
reveals that 𝒯1 ∪ 𝒯2 is not a topology. A topology 𝒯 on X = X1 ∪ X2 that contains
𝒯1 ∪ 𝒯2 must contain U1 ∪ U2 for any two open sets U1 ∈ 𝒯1 and U2 ∈ 𝒯2 . In
particular, X1 ∈ 𝒯 and X2 ∈ 𝒯. Thus X is the union of two open, disjoint proper
subsets of X. This leads us to the following definition.

Definition. A topological space is said to be connected if it is not the union of two


disjoint nonempty open subsets. If X is not connected, we say it is disconnected.
Thus X is disconnected if X = P ∪ Q, where P and Q are open, disjoint, and
P ≠ ∅ ≠ Q. The pair (P, Q) is called a disconnection of X. It is clear that X is
disconnected if and only if it contains a proper, nonempty subset that is both
open and closed.

Example 1. The space X = {0, 1} with the discrete topology is disconnected


because it is the union of the open sets {0} and {1}. We will refer to this space as
the discrete space {0, 1}.

Theorem 5.5.1. A topological space X is disconnected if and only if there exists a


continuous function from X onto the discrete space {0, 1}.

Proof. Let X be disconnected, and let (P, Q) be a disconnection of X. The function


𝜑 ∶ X → {0, 1} defined by 𝜑(P) = 0, and 𝜑(Q) = 1 is clearly continuous.
Conversely, if 𝜑 ∶ X → {0, 1} is a continuous surjection, then P = 𝜑⁻¹(0) and
Q = 𝜑⁻¹(1) form a disconnection of X. ∎
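
In a finite space, a disconnection can be found by searching for a proper, nonempty subset that is both open and closed. The Python sketch below does this and then builds the map 𝜑 of theorem 5.5.1. The function name find_disconnection and the two sample spaces are hypothetical and serve only as an illustration.

```python
def find_disconnection(X, topology):
    """Return a pair (P, Q) of disjoint nonempty open sets with P ∪ Q = X, or None."""
    opens = set(topology)
    for P in opens:
        Q = X - P
        if P and Q and Q in opens:
            return P, Q
    return None

X = frozenset({0, 1})

# The discrete space {0, 1} of example 1 is disconnected.
discrete = [frozenset(), frozenset({0}), frozenset({1}), X]

# A hypothetical two-point space in which only {0} is a proper nonempty open set.
nested = [frozenset(), frozenset({0}), X]

pair = find_disconnection(X, discrete)
print(pair is not None, find_disconnection(X, nested))

# The continuous surjection of theorem 5.5.1 built from the disconnection (P, Q).
P, Q = pair
phi = {x: 0 if x in P else 1 for x in X}
print(sorted(set(phi.values())))
```

The second space has no disconnection, so by theorem 5.5.1 it admits no continuous map onto the discrete space {0, 1}.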

Definition. A subset X of ℝ is an interval if whenever x, y ∈ X and x < z < y, then


z ∈ X.

Theorem 5.5.2. A subset X of ℝ is connected if and only if it is an interval. In


particular, ℝ is connected.

Proof. Here X is given the relative topology induced by the usual topology on ℝ.
If X is not an interval, then there exist two real numbers x and y in X and a
real number z ∈ ℝ − X such that x < z < y. The two sets P = X ∩ (−∞, z) and
Q = X ∩ (z, ∞) form a disconnection of X.

Now suppose, contrary to our assertion, that X is a disconnected interval, and
let 𝜑 ∶ X → {0, 1} be a continuous surjection. Since 𝜑 is onto, there exist two
real numbers a, b ∈ X such that 𝜑(a) = 0, and 𝜑(b) = 1. Without loss of generality,
assume that a < b (otherwise, replace 𝜑 with 1 − 𝜑). Since X is an
interval, the closed interval [a, b] is contained in X. Define P = [a, b] ∩ 𝜑⁻¹(0) and
Q = [a, b] ∩ 𝜑⁻¹(1). Clearly, P and Q are closed and nonempty and partition
[a, b]. We claim that there exist sequences an ∈ P and bn ∈ Q such that (an ) is
non-decreasing, (bn ) is non-increasing and $b_n - a_n = \frac{b - a}{2^{n-1}}$. This immediately leads
to a contradiction: the two sequences then have a common limit, and since P and Q
are closed, that limit belongs to P ∩ Q = ∅.
We now construct the sequences (an ) and (bn ). Define a1 = a, b1 = b. Having
found a2 , … , an and b2 , … , bn , let $m = \frac{a_n + b_n}{2}$. If m ∈ P, define an+1 = m, and
bn+1 = bn . If m ∈ Q, define an+1 = an , and bn+1 = m. The sequences (an ) and
(bn ) have the stated properties. ∎

Theorem 5.5.3. The continuous image of a connected space is connected.

Proof. Let X be a connected space, and let f be a continuous surjection of X onto a


topological space Y. If Y is disconnected, there is a continuous surjection 𝜑 ∶ Y →
{0, 1}. In this case, the function 𝜑 ∘ f would be a continuous surjection from X onto
{0, 1}. This contradicts the connectedness of X and proves that Y is connected. 

Example 2. The closed interval [0, 1] is not homeomorphic to the circle 𝒮1 .

Suppose there exists a homeomorphism f ∶ [0, 1] → 𝒮1 . Then the restriction


of f to the connected subset (0, 1) would be a homeomorphism onto its image. But this is a
contradiction because f ((0, 1)) is the circle with two missing points, which is not
connected. ∎

The following result follows directly from the last two theorems.

Corollary 5.5.4 (the intermediate value theorem). If X is connected and f ∶


X → ℝ is continuous, then f (X) is an interval. 

Example 3. (a) Let f ∶ [a, b] → ℝ be a continuous function and, say, f (a) < f (b).
If k is between f (a) and f (b), then there exists a point x ∈ (a, b) such that
f (x) = k.

(b) Let f ∶ [a, b] → [a, b] be continuous; then f has a fixed point in [a, b].
Since the range, R, of f is connected, it is an interval. In particular, R
contains the interval [f (a), f (b)]. This proves (a). We now prove (b).
If f (a) = a or f (b) = b, there is nothing to prove, so assume that f (a) >
a and f (b) < b. Define a function h on [a, b] by h(x) = x − f (x). Then
h(a) < 0 < h(b). By (a), there is a point x ∈ [a, b] such that h(x) = 0, that is,
f (x) = x. 
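
The existence argument in part (b) is not constructive, but the interval-halving idea used in the proof of theorem 5.5.2 suggests a simple numerical procedure for locating a fixed point. The Python sketch below is offered under stated assumptions: f is continuous and maps [a, b] into itself, the function name fixed_point and the tolerance are hypothetical, and the cosine on [0, 1] is merely a convenient test case.

```python
import math

def fixed_point(f, a, b, tol=1e-12):
    """Approximate a fixed point of a continuous f: [a, b] -> [a, b] by bisection.

    Works with h(x) = x - f(x), which satisfies h(a) <= 0 <= h(b).
    """
    h = lambda x: x - f(x)
    lo, hi = a, b
    if h(lo) == 0:
        return lo
    if h(hi) == 0:
        return hi
    # Invariant: h(lo) < 0 < h(hi), so a zero of h lies between lo and hi.
    while hi - lo > tol:
        m = (lo + hi) / 2
        hm = h(m)
        if hm == 0:
            return m
        if hm < 0:
            lo = m
        else:
            hi = m
    return (lo + hi) / 2

# The cosine maps [0, 1] into itself, so it has a fixed point there.
x = fixed_point(math.cos, 0.0, 1.0)
print(x, abs(math.cos(x) - x))
```

Each step halves the interval while preserving the sign change of h, which is the numerical counterpart of the nested sequences (an ) and (bn ) constructed earlier in this section.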

Theorem 5.5.5. If X and Y are connected, then the product X × Y is connected.

Proof. Let (x0 , y0 ) and (x1 , y1 ) be arbitrary but fixed elements in X × Y. Suppose 𝜑 ∶
X × Y → {0, 1} is continuous. The function i ∶ Y → {x0 } × Y given by i(y) = (x0 , y)
is continuous; hence 𝜑 ∘ i is continuous and hence constant because Y is connected.
Thus 𝜑(x0 , y0 ) = 𝜑(x0 , y1 ). Likewise, the function x ↦ 𝜑(x, y1 ) is constant, so
𝜑(x0 , y1 ) = 𝜑(x1 , y1 ). Thus 𝜑(x1 , y1 ) = 𝜑(x0 , y0 ), and 𝜑 is constant. This proves
that X × Y is connected. 

Corollary 5.5.6. ℝn is connected.

Proof. Use induction, the previous theorem and the fact that ℝn is homeomorphic to
ℝ × ℝn−1 . 

Definition. A subset A of a topological space (X, 𝒯) is connected if it is a


connected space with respect to the restricted topology on A induced by 𝒯.

Example 4. The set X = (−1, 0) ∪ (0, 1) is a disconnected subspace of ℝ. This is
because (−∞, 0) ∩ X = (−1, 0) and (0, ∞) ∩ X = (0, 1); hence both (−1, 0) and
(0, 1) are open in X. ∎

Theorem 5.5.7. Let {A𝛼 }𝛼 be a collection of connected subsets of a topological space


X such that ∩𝛼 A𝛼 ≠ ∅. Then A = ∪𝛼 A𝛼 is connected.

Proof. Let 𝜑 ∶ A → {0, 1} be continuous. Fix an element b ∈ ∩𝛼 A𝛼 . For any a ∈ A,


a ∈ A𝛼 for some 𝛼. The restriction of 𝜑 to A𝛼 is continuous; therefore 𝜑(a) = 𝜑(b)
since A𝛼 is connected. Thus 𝜑 is not onto, and A is connected. 

Theorem 5.5.8. Let A be a connected subset of a topological space X. If B is such that
A ⊆ B ⊆ Ā, then B is connected. In particular, Ā is connected.

Proof. Let 𝜑 ∶ B → {0, 1} be continuous. Since A is connected, 𝜑|A is constant; say,
𝜑(A) = 0. Now {0} is closed in {0, 1}, so 𝜑−1 (0) is closed in B and contains A.
Therefore 𝜑−1 (0) contains the closure of A in B. But the closure of A in B is
Ā ∩ B = B. Thus 𝜑(B) = 0, and 𝜑 is not onto, showing that B is connected. ∎

Definition. Let X be a topological space. We say that two points x
and y in X are connected if there is a connected subset of X that contains x and
y. Define a relation ≡ on X by x ≡ y if x and y are connected. It is clear that ≡ is
an equivalence relation; transitivity follows from theorem 5.5.7.

Theorem 5.5.9. The equivalence classes of the relation ≡ in the above definition are
connected sets.

Proof. Let C be one of the equivalence classes and fix an element a ∈ C. For every
x ∈ C, there exists a connected subset Ax of X containing a and x. All the
elements of Ax are related; hence Ax ⊆ C. Since C = ∪x∈C Ax , and a ∈ ∩x∈C Ax ,
C is connected by theorem 5.5.7. 

Definition. The equivalence classes of the relation ≡ are called the connected
components of X.

If A is a connected subset of X, then all the elements of A are connected (hence
related). Therefore A is contained in exactly one of the connected components of
X. The summary of the above discussion is that the connected components are
the maximal connected subsets of X. If C is a connected component of X, then C̄
is also connected by theorem 5.5.8. Thus C̄ is contained in a unique connected
component of X. Since C ∩ C̄ ≠ ∅, C̄ ⊆ C. Thus C̄ = C.
We have proved most of the next result.

Theorem 5.5.10. A topological space X is the disjoint union of a collection 𝒞 of


disjoint, connected, closed subsets of X, namely, the connected components of
X. Every connected subset of X is contained in exactly one of the connected
components of X. Every proper, nonempty subset of X that is both open and closed
is the union of connected components of X.

Proof. The last assertion of the theorem is the only one we still need to prove. Let P
be a proper nonempty subset of X that is both open and closed, and let Q = X − P.
Then ∅ ≠ Q ≠ X, and Q is also open and closed. We show that if C is a connected
component of X, and C ∩ P ≠ ∅, then C ⊆ P. The sets C ∩ P and C ∩ Q are both
open and closed in C. Since C is connected, and C ∩ P ≠ ∅, C ∩ Q = ∅, because
otherwise the pair (C ∩ P, C ∩ Q) would form a disconnection of C. This proves
that C ⊆ P. 
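
For a small finite space, the definitions above can be checked by brute force. The sketch below is an illustration under stated assumptions, not a method from the text: a topology is represented as a set of frozensets, the helper names is_connected, component, and components are hypothetical, and the four-point topology is an example chosen for this sketch. The component of a point is computed literally as the union of all connected subsets containing it.

```python
from itertools import combinations

def is_connected(opens, A):
    """A is connected if no nonempty proper subset of A is both
    open and closed in the restricted topology on A."""
    A = frozenset(A)
    rel = {U & A for U in opens}          # restricted topology on A
    for P in rel:
        if P and P != A and (A - P) in rel:
            return False                  # (P, A - P) is a disconnection of A
    return True

def component(opens, X, a):
    """Union of all connected subsets of X that contain a (brute force)."""
    others = sorted(frozenset(X) - {a})
    C = {a}
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            A = frozenset(extra) | {a}
            if is_connected(opens, A):
                C |= A
    return frozenset(C)

def components(opens, X):
    """The set of connected components of the finite space X."""
    return {component(opens, X, a) for a in X}

# An illustrative topology on {1, 2, 3, 4}.
X = frozenset({1, 2, 3, 4})
opens = {frozenset(s) for s in [(), (1,), (1, 2), (3, 4), (1, 3, 4), (1, 2, 3, 4)]}
print(components(opens, X))
```

In this example the components are {1, 2} and {3, 4}. Both are closed, and the proper open and closed subsets {1, 2} and {3, 4} are unions of components, in agreement with theorem 5.5.10. The search is exponential in the size of X, so it is suitable only for very small examples.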

We conclude this section with a brief excursion into path connected spaces.

Definition. Given two points x and y in a topological space X, a path from x to y


is a continuous function f ∶ [0, 1] → X such that f (0) = x, and f (1) = y. If there
is a path from x to y, we say that x and y are path connected. A topological space
X is path connected if every pair of points in X are path connected.

Example 5. Every path connected space is connected.


Let x and y be points in a path connected space X, and let f be a path from
x to y. The set { f (t) ∶ t ∈ [0, 1]} is connected and contains x and y. This shows
that every two points in X are connected, and hence X is connected. 

Example 6. For n > 1, the space X = ℝn − {0} is path connected.

Let x and y be points in X, and consider the line segment, L, that joins x and
y. If L does not contain 0, then L is the path we need. If 0 ∈ L, take a point z ∈ X
not on the line through x and y. The union of the two line segments that join x and
z, and then z and y, is a path from x to y that avoids 0. ∎
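
The construction can be made explicit. The following sketch, under assumptions of its own, builds a path in ℝn − {0} for n > 1: it uses the straight segment when that segment misses 0 and otherwise detours through a standard basis vector z that does not lie on the line through x and y. The function names are hypothetical, and points are assumed to be nonzero tuples of exact numbers (integers or fractions) so that the parallelism test is exact.

```python
def segment(p, q):
    """Straight-line path t -> (1 - t) p + t q on [0, 1]."""
    return lambda t: tuple((1 - t) * a + t * b for a, b in zip(p, q))

def passes_through_origin(x, y):
    """For nonzero x and y, the segment from x to y contains 0
    exactly when y = -c x for some c > 0."""
    n = len(x)
    parallel = all(x[i] * y[j] == x[j] * y[i]
                   for i in range(n) for j in range(i + 1, n))
    dot = sum(a * b for a, b in zip(x, y))
    return parallel and dot < 0

def path_avoiding_origin(x, y):
    """A path from x to y in R^n - {0}, n > 1 (x and y nonzero)."""
    n = len(x)
    if n < 2:
        raise ValueError("R - {0} is not path connected")
    if not passes_through_origin(x, y):
        return segment(x, y)
    # 0 lies between x and y, so the line through x and y is the span of x.
    # Choose a standard basis vector z that is not a multiple of x.
    k = 0 if any(x[i] != 0 for i in range(1, n)) else 1
    z = tuple(1 if i == k else 0 for i in range(n))
    first, second = segment(x, z), segment(z, y)
    # Traverse x -> z on [0, 1/2] and z -> y on [1/2, 1].
    return lambda t: first(2 * t) if t <= 0.5 else second(2 * t - 1)

path = path_avoiding_origin((1, 0), (-2, 0))
print(path(0), path(0.5), path(1))
```

Neither of the two segments contains 0, because z is not a multiple of x and hence not a multiple of y.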

Example 7. For n > 1, the sphere 𝒮n−1 is path connected.


The function f (x) = x/‖x‖2 maps ℝn − {0} continuously onto 𝒮n−1 . The result
now follows from example 6 and problem 12 at the end of this section. 

Exercises

1. Prove that a subset X of ℝ is an interval (according to the definition in this


section) if and only if X has one of the following types: (−∞, ∞), (−∞, a),
(−∞, a], (b, ∞), [b, ∞), [a, b), (a, b], [a, b], or (a, b). Here a and b are real
numbers, and a < b.
2. Prove that the intervals [0, 1) and (0, 1) are not homeomorphic. Also show
that [0, 1] and [0, 1) are not homeomorphic.
3. Show that, for n > 1, ℝn is not homeomorphic to ℝ.
4. Prove that a topological space X is connected if and only if every nonempty
proper subset of X has a nonempty boundary.
5. Let X be connected. Show that if there exists a continuous, nonconstant
function f ∶ X → ℝ, then X is uncountable.
6. Prove that if a subset A of a topological space X is connected, open and
closed, then A is a connected component of X.

Definition. A topological space (X, 𝒯) is called totally disconnected if the
connected components of X are singletons.

7. Prove that ℚ (with the usual topology) is totally disconnected. This result
shows that the connected components of a topological space need not be
open.
8. Prove that a topological space X is totally disconnected if, for every pair of
distinct points x and y, there is a disconnection (P, Q) of X such that x ∈ P
and y ∈ Q.
9. Prove that if a Hausdorff space X has an open base whose members are also
closed, then X is totally disconnected. The definition of a Hausdorff space
appears in the next section.
10. Prove that the Sorgenfrey line is totally disconnected.
11. Prove that the product of two totally disconnected spaces is totally
disconnected.
12. Prove that the continuous image of a path connected space is path
connected.
13. Prove that the set {x ∈ ℝn ∶ ‖x‖2 > 1} is path connected.

Definition. Define a relation ≈ on a topological space X by x ≈ y if x and


y are path connected.

14. Prove that ≈ is an equivalence relation. The equivalence classes of ≈ are


called the path connected components of X.
15. It follows from the above exercise that the path connected components
partition X. Prove that if A is a path connected subset of X, then A is
contained in exactly one of the path connected components.
16. Let A = {(x, sin(1/x)) ∶ 0 < x < 1/𝜋}. Clearly, A is path connected and hence
connected. By theorem 5.5.8, the closure Ā of A in ℝ2 is also connected.
Show that Ā is not path connected. Notice that Ā = A ∪ {(1/𝜋, 0)} ∪ {(0, y) ∈ ℝ2 ∶ −1 ≤
y ≤ 1}.

5.6 Separation by Open Sets

Metric spaces enjoy strong separation properties, which we often take for granted.
For example, two distinct points in a metric space have disjoint open
neighborhoods. In chapter 4, we called this property the Hausdorff property. There
is no reason to expect that the same property should hold true for an arbitrary
topological space, so this property must be axiomatized. Similarly, theorem 4.2.13
shows that disjoint closed subsets of a metric space possess disjoint open
neighborhoods. In the general topological setting, this property is known as normality.
One important problem in topology is that of the metrizability of a topological
space. Explicitly stated, under what set of conditions is a given topology induced
by a metric? The fact that every metric space is normal imposes an immediate
necessary condition on a topology to be metrizable: such a topology must be
normal. Of course, normality is not a sufficient condition for a space to be
metrizable. In section 5.11, we prove a metrization theorem that gives a sufficient
set of conditions for a topology to be metrizable. In this section, we study the three
most common forms of separating points and sets in a topological space.

Definition. A topological space X is said to be a T1 space if, for every pair of


distinct points x and y in X, there exists a neighborhood of x not containing
y and a neighborhood of y not containing x. The two neighborhoods may
intersect.

Definition. A topological space X is said to be Hausdorff (or T2 ) if for every pair


of distinct points x and y, there is an open neighborhood U of x and an open
neighborhood V of y such that U ∩ V = ∅.
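
On a finite set, the two definitions above can be tested directly. The sketch below is an illustration only: a topology is represented as a set of frozensets, and the function names is_t1 and is_hausdorff are hypothetical. Finite examples have a limitation worth stating: a finite T1 space is necessarily discrete, so a finite example cannot separate T1 from Hausdorff.

```python
def is_t1(opens, X):
    """Every point x has an open neighborhood missing any other point y.
    Ranging over ordered pairs (x, y) covers both directions."""
    return all(any(x in U and y not in U for U in opens)
               for x in X for y in X if x != y)

def is_hausdorff(opens, X):
    """Distinct points have disjoint open neighborhoods."""
    return all(any(x in U and y in V and not (U & V)
                   for U in opens for V in opens)
               for x in X for y in X if x != y)

X = {0, 1}
# Two-point space in which {1} is open but {0} is not.
sierpinski = {frozenset(), frozenset({1}), frozenset({0, 1})}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}

print(is_t1(sierpinski, X), is_hausdorff(sierpinski, X))
print(is_t1(discrete, X), is_hausdorff(discrete, X))
```

The first space fails both conditions because the only open set containing 0 also contains 1; the discrete space satisfies both.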

It is safe to say that all important topological spaces are Hausdorff. Weaker
separation axioms, such as T1 , are used mostly to generate exercises and
counterexamples.

Theorem 4.1.4 states that a metric space is Hausdorff, which supports the
statement in the above paragraph since metric spaces are the most important
(but not the only important) examples of topological spaces.

Theorem 5.6.1. If X is a Hausdorff space and x ∈ X, then {x} is closed.

Proof. We show that the set W = X − {x} is open. For every y ∈ W, there exist open
neighborhoods Uy and Vy of x and y, respectively, such that Uy ∩ Vy = ∅. This
clearly implies that Vy ⊆ W for all y ∈ W. Consequently, W = ∪{Vy ∶ y ∈ W},
which is open. 

Definition. A sequence (xn ) in a topological space X is said to converge to a point
x ∈ X if every neighborhood of x contains all but finitely many terms of the
sequence.

Theorem 4.1.5 says that the limit of a convergent sequence in a metric space is
unique. This is precisely because metric spaces are Hausdorff spaces.

Example 1. Let X be a Hausdorff space, and suppose that (xn ) is a convergent


sequence. Then the limit is unique.
Suppose that limn xn = x and limn xn = y and that x ≠ y. Let U and V be
disjoint open neighborhoods of x and y, respectively. Since limn xn = x, there
is an integer N such that, for all n > N, xn ∈ U. Since U ∩ V = ∅, V can contain
only finitely many terms of (xn ), which contradicts limn xn = y. ∎

Example 2. Let X be a topological space, and let Y be a Hausdorff space. If f ∶


X → Y is continuous, then the graph of f, G = {(x, f (x)) ∶ x ∈ X} is closed in the
product space X × Y.
We will show that the complement of G is open in X × Y. Let (x, y) ∉ G, thus
y ≠ f (x). Let U and V be disjoint open neighborhoods of y and f (x), respectively.
Because f is continuous, there exists an open neighborhood W of x such that
f (W) ⊆ V. It is easy to check that (W × U) ∩ G = ∅. 

Definition. A Hausdorff space X is said to be regular if, for every x ∈ X and every
closed subset F that does not contain x, there exist open sets U and V such that
x ∈ U, F ⊆ V, and U ∩ V = ∅.

Theorem 4.2.12 states that a metric space is regular.

Example 3. A subspace of a regular space X is regular.


Let Y be a subspace of X, let F be a closed subset of Y (in the restricted
topology on Y), and let x ∈ Y − F. By theorem 5.1.6, F = F̄ ∩ Y, where F̄ denotes
the closure of F in X. Now x ∉ F̄, so by the regularity of X, there exist disjoint
open neighborhoods U and V of x and F̄, respectively. The sets U1 = U ∩ Y and
V1 = V ∩ Y are open in Y and separate x and F. ∎

The following characterization of regularity is often useful.

Theorem 5.6.2. A Hausdorff space X is regular if and only if for every x ∈ X and every
open neighborhood U of x, there exists an open neighborhood V of x such that
V̄ ⊆ U.

Proof. Suppose X is regular, and let x and U be as in the statement of the theorem. By
regularity, applied to x and the closed set X − U, there exist open neighborhoods
V of x and W of X − U such that V ∩ W = ∅. Because V ⊆ X − W and the latter
set is closed, V̄ ⊆ X − W. In particular, V̄ ⊆ U.
Conversely, let F be a closed subset of X that does not contain x. By assumption,
there exists an open neighborhood U of x such that Ū ⊆ X − F. Set V = X − Ū.
The sets U and V are disjoint open neighborhoods of x and F, respectively,
as desired. ∎

Definition. A Hausdorff space X is said to be normal if, for every pair of disjoint
closed subsets E and F of X, there exist open sets U and V such that E ⊆ U,
F ⊆ V, and U ∩ V = ∅.

Theorem 4.2.13 states that a metric space is normal.



The proof of theorem 5.6.3 mimics that of theorem 5.6.2 and is therefore omitted.

Theorem 5.6.3. A Hausdorff space X is normal if and only if for every closed set E
and every open neighborhood U of E, there exists an open neighborhood V of E
such that V̄ ⊆ U. ∎

Products and subspaces of normal and regular spaces have dissimilar properties.
For example, the product of regular spaces is regular, but the same result does not
hold for the product of normal spaces. Likewise, an arbitrary subspace of a normal
space need not be normal. See the exercises on section 5.7. However, the following
special case is easy to prove.

Example 4. A closed subspace Y of a normal space X is normal.


Let E and F be disjoint closed subsets of Y. Since Y is closed in X, E and F are closed
in X. By the normality of X, there exist disjoint open neighborhoods U and V
of E and F, respectively. The sets U1 = U ∩ Y and V1 = V ∩ Y are open in Y and
separate E and F. ∎

Exercises

1. Prove that a topological space X is T1 if and only if every single-point set of X
is closed.
2. Let X be an infinite set. Prove that the co-finite topology on X is T1 but not
Hausdorff.
3. Let A be a subset of a Hausdorff space X. Show that a point x ∈ X is a limit
point of A if and only if every neighborhood of x contains infinitely many
points of A.
4. Prove that a subspace of a Hausdorff space is Hausdorff and that the product
of two Hausdorff spaces is Hausdorff.
5. Prove that a topological space X is Hausdorff if and only if the diagonal set
{(x, x) ∶ x ∈ X} is closed in the product space X × X.
6. Let X and Y be topological spaces, and let f, g ∶ X → Y be continuous. Prove
that if Y is Hausdorff, then the set {x ∈ X ∶ f (x) = g (x)} is closed.
7. Prove that the set of fixed points of a continuous function on a Hausdorff
space is closed.
8. Let f and g be continuous functions from a topological space X to a
Hausdorff space Y. Show that if f and g agree on a dense subset of X, then
f = g.
9. Prove that the product of two regular spaces is regular.
10. Prove theorem 5.6.3.
11. Let X be a regular space. Prove that every pair of distinct points in X have
neighborhoods whose closures are disjoint.
12. Let X be a normal space. Prove that every pair of disjoint closed subsets of
X have neighborhoods whose closures are disjoint.

5.7 Second Countable Spaces

In this section, we study second countable, separable, and Lindelöf spaces.
Theorem 4.5.1 states that all three conditions are equivalent for metric spaces. This is
not true for general topological spaces, and several counterexamples are provided
in this section and the section exercises to show the nonequivalence of the three
conditions. However, second countability implies the other two conditions.
Second countability has other pleasant consequences, especially when it is combined
with normality or local compactness. The definitions in this section are identical
to those in the metric case and are included below for ease of reference.

Definition. A subset A of a topological space X is dense in X if Ā = X.

Definition. A topological space X is separable if it contains a countable dense


subset.

Definition. A topological space X is second countable if the topology on X


contains a countable open base.

Definition. A topological space X is said to be a Lindelöf space if every open


cover of X contains a countable subcover of X. The definitions of open covers
and subcovers can be seen in section 4.5.

Example 1. Consider the Sorgenfrey plane, ℝ2l = ℝl × ℝl . In problem 11, we


ask the reader to show that ℝ2l is separable. Here we show that the subspace
L = {(x, −x) ∶ x ∈ ℝ} is not separable. We claim that the restriction of the topology
on ℝ2l to L is the discrete topology. Since L is uncountable, it is not separable. To
prove our claim, let x ∈ ℝ, and consider the set U = [x, x + 1) × [−x, −x + 1); U
is open in ℝ2l , and U ∩ L = {(x, −x)}. Therefore the single point (x, −x) is open
in L. 

Example 2. In problem 10, we ask the reader to show that the Sorgenfrey line
ℝl is Lindelöf. We show here that ℝ2l is not Lindelöf. Thus the product of two
Lindelöf spaces is not necessarily Lindelöf. Let L be as in example 1. The line L is
closed in ℝ2l . Consider the open cover 𝒰 of ℝ2l that consists of {ℝ2l − L} and the
collection {[x, x + 1) × [−x, −x + 1) ∶ x ∈ ℝ}. Clearly, no countable subset of 𝒰
can cover ℝ2l . 

Definition. A collection 𝔉 = {F𝛼 ∶ 𝛼 ∈ I} of subsets of a nonempty set X is said


to have the countable intersection property if every countable subcollection
of 𝔉 has a nonempty intersection.

Example 3. A topological space X is Lindelöf if and only if every collection of


closed subsets of X with the countable intersection property has a nonempty
intersection.
Suppose X is Lindelöf, and let 𝔉 be a collection of closed subsets with the
countable intersection property. If ∩{F𝛼 ∶ 𝛼 ∈ I} = ∅, then X = ∪𝛼∈I (X − F𝛼 ). Therefore
X = ∪_{n=1}^{∞} (X − F𝛼n ), for some countable subset {𝛼1 , 𝛼2 , . . . } of I. It follows that
∩_{n=1}^{∞} F𝛼n = ∅; a contradiction.
Conversely, if {U𝛼 }𝛼∈I is an open cover of X with no countable subcover,
then the family 𝔉 = {F𝛼 ∶ 𝛼 ∈ I} = {X − U𝛼 ∶ 𝛼 ∈ I} has the countable intersection
property because, for a countable subcollection {F𝛼1 , F𝛼2 , . . . } of 𝔉,
∩_{n=1}^{∞} F𝛼n = ∩_{n=1}^{∞} (X − U𝛼n ) = X − ∪_{n=1}^{∞} U𝛼n ≠ ∅.
However, ∩𝛼∈I F𝛼 = ∩𝛼∈I (X − U𝛼 ) = X − (∪𝛼∈I U𝛼 ) = ∅. ∎

Theorem 5.7.1. A subset A of a topological space X is dense if and only if it intersects
every nonempty open subset of X.

Proof. We prove the contrapositive of each implication. If Ā ≠ X, then the set
U = X − Ā is open, nonempty and U ∩ A = ∅.
Conversely, if there exists a nonempty open set U such that U ∩ A = ∅, then
A ⊆ X − U. Since X − U is closed, Ā ⊆ X − U ≠ X. ∎

Theorem 5.7.2. In a separable topological space X, every collection of pairwise


disjoint open sets is countable.

Proof. We prove that if X contains an uncountable collection of pairwise disjoint
nonempty open subsets, then any dense subset A of X is uncountable. Let {U𝛼 }𝛼∈I be an
uncountable family of pairwise disjoint nonempty open subsets of X. By theorem 5.7.1,
each U𝛼 intersects A. Choose an element a𝛼 ∈ U𝛼 ∩ A. Now a𝛼 ≠ a𝛽 if 𝛼 ≠ 𝛽 since U𝛼 ∩ U𝛽 = ∅.
Hence A is uncountable. ∎

Theorem 5.7.3. Let X be a second countable topological space. Then

(a) X is separable, and


(b) X is Lindelöf.

Proof. Let {Bn } be a countable open base for the topology on X. For each n ∈ ℕ,
choose a point an ∈ Bn , and let A = {an ∶ n ∈ ℕ}. If U ≠ ∅ is open in X, then U
contains a basis element Bn , and hence an ∈ A ∩ U. Theorem 5.7.1 implies that A
is dense. The proof of (b) is identical to that in theorem 4.5.1. ∎

It was observed in section 5.6 that normality is a necessary condition for the
metrizability of a topological space. For second countable spaces, the normality
requirement can be relaxed, as the following theorem shows. As it turns out,
regular second countable spaces are metrizable. See the Urysohn metrization
theorem in section 5.11.

Theorem 5.7.4. A regular second countable Hausdorff space X is normal.

Proof. Let E and F be disjoint closed subsets of X, and let 𝔅 be a countable open base
for the topology on X. For every x ∈ E, x belongs to the open set X − F. By theorem
5.6.2, there exists an open neighborhood W of x such that W̄ ⊆ X − F. Choose a
basis element Bx such that x ∈ Bx ⊆ W. Clearly, E ⊆ ∪x∈E Bx . Since 𝔅 is countable,
the collection {Bx }x∈E can be enumerated as {Un }. Observe that Ūn ⊆ X − F. A
similar argument produces a countable open cover {Vn } of F such that Vn ∈ 𝔅,
and V̄n ⊆ X − E.
Define U′n = Un − ∪_{i=1}^{n} V̄i , and V′n = Vn − ∪_{i=1}^{n} Ūi . These sets are open
because a finite union of closed sets is closed. Notice that if n ≤ m, then
Un ∩ V′m = ∅. By symmetry, if m ≤ n, then V′m ∩ U′n = ∅. It follows that, for
all m, n ∈ ℕ, U′n ∩ V′m = ∅. Now define U = ∪_{n=1}^{∞} U′n and V = ∪_{n=1}^{∞} V′n . Clearly,
U ∩ V = ∅, and it is straightforward to verify that E ⊆ U and F ⊆ V. ∎
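
The verification that E ⊆ U can be spelled out as follows. This is a sketch of the omitted step under the assumptions of the proof, namely that {Un } covers E, that the sets removed from Un are the closures of V1 , … , Vn , and that each of these closures is contained in X − E. The inclusion F ⊆ V follows by exchanging the roles of the two families.

```latex
\begin{align*}
x \in E
  &\implies x \in U_n \text{ for some } n,
     \text{ since } E \subseteq \bigcup_{n} U_n, \\
\overline{V_i} \subseteq X - E \text{ for every } i
  &\implies x \notin \bigcup_{i=1}^{n} \overline{V_i}, \\
  &\implies x \in U_n - \bigcup_{i=1}^{n} \overline{V_i} = U_n' \subseteq U .
\end{align*}
```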

Example 4. Let X be a second countable topological space, and let ℭ be an open


base for the topology on X. Then ℭ contains a countable subset which is also
an open base for X.

Let 𝔅 = {Bn ∶ n ∈ ℕ} be a countable open base. Let I be the subset of ℕ × ℕ of


pairs (m, n) for which there is a member C ∈ ℭ such that Bm ⊆ C ⊆ Bn . For each
pair (m, n) ∈ I, choose a member Cm,n of ℭ such that Bm ⊆ Cm,n ⊆ Bn . We show
that the countable collection {Cm,n ∶ (m, n) ∈ I} is an open base. Let U be an
open set, and let x ∈ U. Because 𝔅 is an open base, there is a set Bn such that
x ∈ Bn ⊆ U. For the same reason, there is a member C of ℭ such that x ∈ C ⊆ Bn .
Finally, there is an element Bm of 𝔅 such that x ∈ Bm ⊆ C. Clearly, (m, n) ∈ I,
and x ∈ Cm,n ⊆ U. 

Exercises

1. Prove that the product of two second countable spaces is second countable
and that the product of two separable spaces is separable.

Definition. A point x of a subset A of a topological space X is said to be


isolated if x has an open neighborhood U such that A ∩ U = {x}.

2. Prove that the set of isolated points of a second countable space X is at


most countable. Then show that if X is uncountable, then X has uncountably
many limit points.

Definition. A topological space X is said to be first countable if, for every


x ∈ X, there is a countable collection {Un } of open neighborhoods of x such
that, for every open neighborhood U of x, there is an integer n such that
x ∈ Un ⊆ U. The collection {Un } is called a local base at x. It is sometimes
convenient to have a local base {Vn } with the additional property that
Vn ⊇ Vn+1 . This can be easily achieved by defining Vn = ∩ni=1 Ui .

3. Prove that every second countable space is first countable and that every
metric space is first countable.
4. Show that a subspace of a second (respectively, first) countable space is second
(respectively, first) countable. Also show that the product of two first
countable spaces is first countable.
5. Show that a subspace of a separable space need not be separable. Hint: See
problem 3 on section 5.1. For a more elaborate example, see problem 12
below.
6. Let X be an uncountable set, and let 𝒯 be the co-finite topology on X.
Show that every infinite subset of X is dense, and hence X is separable.
Show, however, that X is not second countable. Hint: If {Bn } is a countable
collection of nonempty open subsets of X, then ∩_{n=1}^{∞} Bn is uncountable. Pick a point
x ∈ ∩_{n=1}^{∞} Bn , and consider the open set U = X − {x}.
7. Show that a closed subspace of a Lindelöf space is Lindelöf.
8. Let X be a topological space, and let 𝔅 be an open base for X. Prove that
X is Lindelöf if and only if every open cover of X by members of 𝔅 has a
countable subcover.
9. Show that the Sorgenfrey line is first countable, separable, but not second
countable. Hint: To show that ℝl is not second countable, let 𝔅 be an open
base for ℝl . For every x ∈ ℝ, there is a member Bx ∈ 𝔅 such that
x ∈ Bx ⊆ [x, x + 1).
10. Prove that the Sorgenfrey line ℝl is Lindelöf. Together with the previous
problem, this problem shows that not every Lindelöf space is second
countable. Hint: Use problem 8. Let {[a𝛼 , b𝛼 ) ∶ 𝛼 ∈ I} be an open cover of
ℝl by basic open subsets of ℝl . Define C = ∪𝛼∈I (a𝛼 , b𝛼 ). View C as a subset
of ℝ with the usual topology; C is Lindelöf because it is a separable metric space. Thus
there exists a countable subset {𝛼n ∶ n ∈ ℕ} of I such that C = ∪_{n=1}^{∞} (a𝛼n , b𝛼n ).
Argue that ℝ − C is countable.
11. Show that the Sorgenfrey plane ℝ2l is separable.


12. Show that the line L in example 2 is closed in ℝ2l .
13. Let X be a topological space, and let 𝔅 be an open base for the topology on
X. Suppose that 𝔅 has infinite cardinality ℵ. Prove that if ℭ is another open
base for the topology on X, then ℭ contains a subset of cardinality ≤ ℵ that
is also an open base for X.

5.8 Compact Spaces

In section 4.7, we studied compact metric spaces extensively, including several


equivalent formulations of the definition of compactness. We adopt the same
definition of compactness in this chapter because the other characterizations do
not lend themselves easily to generalization to general topological spaces, and
especially because some of the other characterizations of compact metric spaces
are false in general. You will also see that compact spaces have pleasant separation
properties. Finally, we will prove the celebrated Tychonoff theorem for the product
of finitely many topological spaces. The leading theorems in this section have
counterparts in section 4.7. Therefore, proofs that duplicate those in section 4.7
will be omitted.

Definition. A topological space X is said to be compact if every open cover of X


contains a finite subcover of X.

Example 1. The co-finite topology on an infinite set X is compact.


Let 𝒰 be an open cover of X, and fix a nonempty element U1 ∈ 𝒰. The complement of
U1 is finite, say, U1 = X − {x2 , . . . , xn }. Now, for each 2 ≤ i ≤ n, pick an element
Ui ∈ 𝒰 that contains xi . The finite collection {U1 , … , Un } covers X. ∎
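
The selection procedure in this proof can be carried out mechanically. In the sketch below, which is an illustration under stated assumptions, the underlying set is the integers and each nonempty open set of the co-finite topology is represented by its finite complement. The function name finite_subcover is hypothetical, and the input is assumed to be a genuine cover, that is, no point lies in every complement.

```python
def finite_subcover(cover):
    """Extract a finite subcover in the co-finite topology.

    cover: a list of finite frozensets, each one the complement of a
    nonempty open set. A point x belongs to an open set exactly when
    x is absent from the corresponding complement.
    """
    first = cover[0]              # complement of U1
    chosen = [first]
    for x in sorted(first):       # the finitely many points missed by U1
        # pick a member of the cover that contains x
        chosen.append(next(c for c in cover if x not in c))
    return chosen

cover = [frozenset({1, 2, 3}), frozenset({2, 5}), frozenset({1, 7}),
         frozenset({3}), frozenset({9})]
sub = finite_subcover(cover)
# The chosen sets cover X exactly when their complements have empty intersection.
print(sub, frozenset.intersection(*sub))
```

The subcover has at most one more member than the number of points missed by the first set, matching the count n in the proof.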

Example 2. A real-valued, locally bounded function f on a compact space X is


bounded.
By the definition of local boundedness, for every x ∈ X, there exists a positive
number Mx and an open neighborhood Ux of x such that supt∈Ux | f (t)| ≤ Mx .
Clearly, {Ux ∶ x ∈ X} is an open cover of X. Choose points x1 , … , xn ∈ X such
that the sets Ux1 , … , Uxn cover X, and let M = max1≤i≤n Mxi . For x ∈ X, x ∈ Uxi
for some 1 ≤ i ≤ n, and | f (x)| ≤ Mxi ≤ M. 

Definition. Let K be a subset of a topological space X. We say that K is a compact


subset (or a compact subspace) of X if it is compact in the restricted topology.

Theorem 5.8.1. A subset K of a topological space X is compact if and only if it


satisfies the following condition: if 𝒰 is a collection of open subsets of X such that
K ⊆ ∪{U ∶ U ∈ 𝒰}, then there exists a finite subcollection {U1 , U2 , … , Un } of 𝒰
such that K ⊆ ∪_{i=1}^{n} Ui .
The proof is identical to that of theorem 4.7.1. 

Theorem 5.8.2. A closed subspace K of a compact space X is compact.


The proof is identical to that of theorem 4.7.2. 

Example 3. Every compact space has the Bolzano-Weierstrass property.


It is sufficient to prove that if a subset A of a compact space X has no limit
points, then it is finite. By problem 9(b) on section 5.1, A is closed. By theorem
5.8.2, A is compact. No point a ∈ A is a limit point of A; hence there exists
an open neighborhood Ua of a such that A ∩ Ua = {a}. If A is infinite, then the open cover
{Ua ∶ a ∈ A} of A would have no finite subcover. This forces A to be finite, as
claimed. 

The converse of the above example is false, but counterexamples are rather difficult.

Theorem 5.8.3. A compact subspace K of a Hausdorff space X is closed.


The proof is identical to that of theorem 4.7.3. 

Example 4. Let {K𝛼 }𝛼∈I be a collection of compact subsets of a Hausdorff space X.


If ∩𝛼 K𝛼 = ∅, then the intersection of some finite subcollection of {K𝛼 } is empty.

Let V𝛼 = X − K𝛼 , and fix an element 𝛼1 ∈ I. By assumption, {V𝛼 } covers


X and hence K𝛼1 . Thus there exists a finite subset {𝛼2 , … , 𝛼n } of I such that
K𝛼1 ⊆ ∪ni=2 V𝛼i . It follows directly that ∩1≤i≤n K𝛼i = ∅. 

Theorem 5.8.4. The continuous image of a compact space is compact.


The proof is identical to that of theorem 4.7.4. 

Theorem 5.8.5. A continuous real-valued function f ∶ X → ℝ on a compact space


X is bounded and attains its maximum and minimum values.
The proof is identical to that of theorem 4.7.12. 

The next result follows immediately from theorem 5.3.3 and the fact that for a
compact space X, 𝒞(X) = ℬ𝒞(X).

Theorem 5.8.6. Let X be a compact Hausdorff space, and let 𝒞(X) be the space
of continuous functions on X. Then (𝒞(X), ‖.‖∞ ) is a complete normed linear
space. 

Theorem 5.8.7. Let X be a compact space, and let Y be a Hausdorff space. Then a
continuous bijection 𝜑 ∶ X → Y is a homeomorphism from X to Y.

Proof. We prove that 𝜑−1 is continuous by showing that 𝜑 is a closed mapping. Let
F be a closed subset of X. By theorem 5.8.2, F is compact. By theorem 5.8.4, 𝜑(F)
is compact in Y. Now theorem 5.8.3 implies that 𝜑(F) is closed, as desired. ∎

The theorem says that when we limit our attention to compact Hausdorff spaces,
a bijection 𝜑 ∶ X → Y is a homeomorphism if and only if it is simply continuous.
In this situation, we can show that X and Y are homeomorphic by merely showing
the continuity of 𝜑 or 𝜑−1 or by showing that 𝜑 (or 𝜑−1 ) is an open (or a closed)
mapping.

Definition. A collection 𝔉 of subsets of a nonempty set X is said to have the


finite intersection property if every finite subcollection of 𝔉 has a nonempty
intersection.

The next theorem provides a useful equivalent characterization of compactness.


Its proof is left as an exercise. See example 3 on section 5.7.

Theorem 5.8.8. The following are equivalent for a topological space X:


(a) X is compact.
(b) If 𝔉 = {F𝛼 ∶ 𝛼 ∈ I} is a collection of closed subsets of X satisfying the finite
intersection property, then ∩{F𝛼 ∶ 𝛼 ∈ I} ≠ ∅. 

Compactness and Separation

Theorem 5.8.9. Let X be a Hausdorff space, and let F be a compact subset of X.


For every x ∈ X − F, there exist disjoint open sets U and V such that x ∈ U, and
F ⊆ V.

Proof. For every y ∈ F, there exist disjoint open sets Uy and Vy such that x ∈ Uy
and y ∈ Vy . Now F ⊆ ∪y∈F Vy . Since F is compact, F ⊆ ∪ni=1 Vyi for a finite sub-
set {y1 , … , yn } of F. The sets U = ∩ni=1 Uyi and V = ∪ni=1 Vyi have the desired
properties. 

Theorem 5.8.10. A compact Hausdorff space X is normal. Thus if E and F are disjoint
closed subsets of X, then there exist disjoint open subsets U and V such that E ⊆ U
and F ⊆ V.

Proof. First observe that E and F are compact by theorem 5.8.2. Let x ∈ E. By the
previous theorem, there are disjoint open sets Ux and Vx such that x ∈ Ux and
F ⊆ Vx . Since E ⊆ ∪x∈E Ux , and E is compact, E ⊆ ∪ni=1 Uxi for some finite subset
{x1 , … , xn } of E. Set U = ∪ni=1 Uxi and V = ∩ni=1 Vxi . The sets U and V have the
stated properties. 

Finite Products of Compact Spaces

Lemma 5.8.11 (the tube lemma). Let X be a topological space, and let Y be a
compact space. If an open subset W in X × Y contains a line, {x} × Y, then there
exists a neighborhood U of x such that U × Y ⊆ W. Here x is a fixed element of X.

Proof. For every y ∈ Y, there are open sets Uy ⊆ X and Vy ⊆ Y such that (x, y) ∈
Uy × Vy ⊆ W. Thus {x} × Y ⊆ ∪y∈Y (Uy × Vy ) ⊆ W. The compactness of {x} × Y
yields a finite subset {y1 , … , yn } of Y such that {x} × Y ⊆ ∪_{i=1}^{n} (Uyi × Vyi ) ⊆ W. Define
U = ∩_{i=1}^{n} Uyi . We claim that U × Y ⊆ W. If u ∈ U, and y ∈ Y, then y ∈ Vyi for
some 1 ≤ i ≤ n. But u belongs to Uyi for every 1 ≤ i ≤ n. Therefore (u, y) ∈ Uyi ×
Vyi ⊆ W. ∎

The above lemma says that if an open subset of X × Y contains a line, then it
must contain a strip (or a tube, hence the name) that contains the line. Intuitively,
an open subset of X × Y cannot get arbitrarily thin around a line. The following
example illustrates the concept.

Example 5. The open subset W = {(x, y) ∈ ℝ^2 ∶ x ∈ ℝ, |y| < 1/(1 + x^2)}
contains the x-axis, but there is no positive number 𝛿 such that ℝ × (−𝛿, 𝛿) is
contained in W.
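
The failure in example 5 can be checked numerically. The following Python sketch is only an illustration, not part of the text's argument; the function names in_W and witness are hypothetical. For a given 𝛿 > 0 it produces a point of the strip ℝ × (−𝛿, 𝛿) that lies outside W, which is possible because the factor ℝ is not compact.

```python
import math

def in_W(x, y):
    """Membership in W = {(x, y) : |y| < 1/(1 + x^2)}."""
    return abs(y) < 1.0 / (1.0 + x * x)

def witness(delta):
    """Return a point of R x (-delta, delta) that is not in W.

    With y = delta/2 and x = sqrt(1/y) we get 1/(1 + x^2) = y/(1 + y) < y,
    so (x, y) fails the defining inequality of W.
    """
    y = delta / 2.0
    x = math.sqrt(1.0 / y)
    return (x, y)

for delta in (1.0, 0.1, 0.001):
    x, y = witness(delta)
    print(delta, (x, y), in_W(x, y))
```

However small 𝛿 is, the witness only has to move far enough out along the x-axis.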

Theorem 5.8.12 (Tychonoff's theorem). If X and Y are compact spaces, so is
X × Y.

Proof. Let 𝒲 be an open cover of X × Y. For x ∈ X, {x} × Y ⊆ ∪{W ∶ W ∈ 𝒲}.
Since {x} × Y is compact, there exists a finite subset {W^x_1, …, W^x_{n_x}} of
𝒲 such that {x} × Y ⊆ ∪_{i=1}^{n_x} W^x_i. Let W^x = ∪_{i=1}^{n_x} W^x_i. By the
previous lemma, there exists an open neighborhood U^x of x such that
U^x × Y ⊆ W^x. The collection of open sets {U^x}_{x∈X} covers X; hence
X = ∪_{i=1}^m U^{x_i}, for some finite subset {x_1, …, x_m} of X.
We claim that the finite collection {W^{x_j}_i ∶ 1 ≤ j ≤ m, 1 ≤ i ≤ n_{x_j}}
covers X × Y. Let (x, y) ∈ X × Y. Then x ∈ U^{x_j} for some 1 ≤ j ≤ m. Now
(x, y) ∈ U^{x_j} × Y ⊆ ∪_{i=1}^{n_{x_j}} W^{x_j}_i. Therefore (x, y) ∈ W^{x_j}_i
for some 1 ≤ i ≤ n_{x_j}.

Theorem 5.8.13 (Tychonoff's theorem). If X_1, …, X_n are compact spaces, then
∏_{i=1}^n X_i is compact.

Proof. Use induction, the previous theorem, and the fact that X1 × . . . × Xn is
homeomorphic to X1 × (X2 × . . . × Xn ). 

The following topic is included as an excursion. More properties of countably
compact spaces are explored in the section exercises.

Definition. A topological space X is said to be countably compact if every
countable open cover of X contains a finite subcover.

Example 6. A topological space X is countably compact if and only if, for every
descending sequence F_1 ⊇ F_2 ⊇ … of nonempty closed sets, ∩_{n=1}^∞ F_n ≠ ∅.

Suppose X is countably compact. If ∩_{n=1}^∞ F_n = ∅, then
X = ∪_{n=1}^∞ (X − F_n). The countable compactness assumption and the fact that
the sequence X − F_n is ascending imply that X = X − F_N for some integer N.
This would force F_N = ∅, which is a contradiction.
To prove the converse, suppose that {U_n} is a countable open cover of X, and
for n ∈ ℕ, define V_n = ∪_{i=1}^n U_i. Finally define F_n = X − V_n. Then
F_1 ⊇ F_2 ⊇ …, and ∩_{n=1}^∞ F_n = ∅ because {V_n} covers X. It follows that
F_n = ∅ for some integer n, hence X = V_n = ∪_{i=1}^n U_i.

Exercises

1. Show that the union of a finite number of compact subsets of a topological
space X is compact.
2. Verify that the proofs of theorems 5.8.1 through 5.8.5 are those included for
the corresponding theorems in section 4.7, without alteration.
3. Let X be a compact Hausdorff space, and suppose there exists a countable
set of continuous functions f_n ∶ X → [0, 1] such that, for every pair of
distinct points x and y in X, there exists a function f_n such that
f_n(x) ≠ f_n(y). Prove that the function
d(x, y) = ∑_{n=1}^∞ 2^{−n} |f_n(x) − f_n(y)| is a metric and that it induces
the topology on X.
4. Let X be a compact space, and let F_1 ⊇ F_2 ⊇ … be a descending sequence
of nonempty closed subsets of X. Prove that ∩_{n=1}^∞ F_n ≠ ∅.
5. Prove that a compact Hausdorff space X cannot be expressed as a countable
union of (closed) nowhere dense subsets {A_n}.
6. Let 𝒯1 and 𝒯2 be topologies on the same set X such that 𝒯1 is Hausdorff and
𝒯2 is compact. Prove that if 𝒯1 ⊆ 𝒯2 , then 𝒯1 = 𝒯2 . Conclude that if 𝒯 is a
compact Hausdorff topology, then any strictly larger topology than 𝒯 is
not compact, and any strictly smaller topology than 𝒯 is not Hausdorff.⁴
7. Let X be a topological space, and let 𝔅 be an open base for X. Prove that X
is compact if and only if every open cover of X by members of 𝔅 has a finite
subcover. The same result is true for open subbases, but it is considerably
harder to prove.
8. Prove that if X is a compact space and Y is a Lindelöf space, then X × Y is
Lindelöf.
9. Prove that any two disjoint compact subsets of a Hausdorff space have
disjoint open neighborhoods.
10. Let 𝒦 be a collection of compact subsets of a Hausdorff space X, and let U
be an open subset of X such that ∩{K ∶ K ∈ 𝒦} ⊆ U. Prove that U contains
the intersection of a finite subcollection of 𝒦.
11. Let X be a Hausdorff space. Prove that if K_1 ⊇ K_2 ⊇ … is a descending
sequence of nonempty compact subsets of X, then ∩_{n=1}^∞ K_n ≠ ∅.
12. Prove that the continuous image of a countably compact space is countably
compact.
13. Prove that a closed subspace of a countably compact space is countably
compact.
14. Prove that a countably compact metric space is compact.
15. Prove that a second countable, countably compact space is compact.
16. Verify that the proofs included in section 4.9 for theorems 4.9.3 (the Stone-
Weierstrass theorem) and 4.9.6 are valid without alteration when X is a
compact Hausdorff topological space.

⁴ This property is sometimes referred to as the rigidity of compact Hausdorff topologies.

5.9 Locally Compact Spaces

Without a doubt, ℝ^n is the most important example of a locally compact Hausdorff
space. We studied locally compact metric spaces briefly in section 4.7. In this
section, we will see that locally compact Hausdorff spaces are regular (theorem
5.9.3); hence they have good separation properties. They are also very nearly
normal. Compare theorems 5.9.2 and 5.6.3. The next section is the natural
continuation of this one, where we show that every locally compact Hausdorff
space can be embedded into a compact Hausdorff space in a special kind of way.
We will take another journey into locally compact spaces in section 5.11, where
we establish Urysohn's theorem for locally compact Hausdorff spaces and
introduce the space of continuous, compactly supported functions on such spaces.
This section is the transitional section to the remaining three sections in this
chapter. It may be bypassed on the first reading of the book because locally compact
metric spaces (section 4.7) are sufficient for most of the rest of the book. Locally
compact Hausdorff spaces are needed only in sections 8.4 and 8.7, where frequent
reference is made to the results in this section and sections 5.10 and 5.11, and where
certain theorems are extended from ℝ^n to locally compact Hausdorff spaces.

Definition. A topological space X is locally compact if, for every x ∈ X, there
exists an open set V such that x ∈ V and the closure V̄ is compact. Thus every
point is in the interior of a compact set.

We established in section 4.7 that ℝ^n is locally compact and that l^∞ is not. See
theorem 6.1.5 for a far-reaching result. Also in section 4.7, we showed that ℚ is
not locally compact.

Theorem 5.9.1. Let X be a Hausdorff space. Then X is locally compact if and only
if, for every x ∈ X and every open neighborhood U of x, there exists an open
neighborhood V of x such that V̄ is compact and V̄ ⊆ U.

Proof. Suppose X is locally compact, and let x and U be as in the statement of the
theorem. Let K be a compact subset of X that contains x in its interior, and
let F = K − U. As F is a closed subset of the compact subset K, it is compact.
Invoking theorem 5.8.9 yields disjoint open sets W_1 and W_2 such that x ∈ W_1 and
F ⊆ W_2. Define V = W_1 ∩ int(K). Since K is compact (hence closed) and V ⊆ K,
V̄ ⊆ K and V̄ is compact. Finally, since V ⊆ X − W_2, and the latter set is
closed, V̄ ⊆ X − W_2 ⊆ X − F. Thus V̄ ⊆ K ∩ (X − F) = K − F ⊆ U. The proof of
the converse is trivial.

The next result generalizes the last.

Theorem 5.9.2. Let X be a locally compact Hausdorff space, and let U be an open
neighborhood of a compact subset K of X. Then there exists an open neighborhood
V of K such that V̄ is compact and V̄ ⊆ U.

Proof. By theorem 5.9.1, every point x ∈ K has an open neighborhood V_x with
compact closure such that V̄_x ⊆ U. Since K is compact and K ⊆ ∪_{x∈K} V_x,
K ⊆ ∪_{i=1}^n V_{x_i} for some finite subset {x_1, …, x_n} of K. The open set
V = ∪_{i=1}^n V_{x_i} has the desired properties.

The following result is a direct consequence of theorems 5.6.2 and 5.9.1.


Theorem 5.9.3. A locally compact Hausdorff space is regular. 

Theorem 5.9.4. Let X be a second countable locally compact Hausdorff space. Then
X is a countable union of compact subsets of X.

Proof. Let 𝔅 be a countable open base for X. For every x ∈ X, there is an open set
V_x such that x ∈ V_x and V̄_x is compact. Let B_x ∈ 𝔅 be such that
x ∈ B_x ⊆ V_x. Clearly, B̄_x ⊆ V̄_x; thus B̄_x is compact. Now X = ∪_{x∈X} B̄_x.
Since 𝔅 is countable, only countably many of the sets B̄_x can be distinct,
showing that X is a countable union of compact subsets of X.

Definition. A topological space X is said to be 𝜎-compact if it is a countable
union of compact subsets.

For example, ℝ^n is 𝜎-compact. More generally, the above theorem states that a
second countable locally compact Hausdorff space is 𝜎-compact.

We will use the following result in the next section to prove a simple
characterization of locally compact Hausdorff spaces. The proof is left as an
exercise.

Proposition 5.9.5. An open subspace of a locally compact Hausdorff space is
locally compact.

Exercises

1. Prove proposition 5.9.5.
2. Prove that a closed subspace of a locally compact space is locally compact.
3. Prove that the product of two locally compact spaces is locally compact.
4. Prove that a second countable locally compact Hausdorff space is normal.
5. Let f be a continuous, open mapping from a locally compact space X onto a
topological space Y. Prove that Y is locally compact.
6. Prove that if E and F are compact subsets of a locally compact Hausdorff
space X, then E and F have disjoint neighborhoods with compact closures.
7. Prove that a compact subspace of the Sorgenfrey line ℝ_l is countable.
Conclude that ℝ_l is not locally compact. Hint: Let K be compact in ℝ_l, and let
x ∈ K. Clearly, ℝ = ∪_{n=1}^∞ (−∞, x − 1/n) ∪ [x, ∞). Let n be the least
positive integer such that K ⊆ (−∞, x − 1/n) ∪ [x, ∞). Set a_x = x − 1/n.
Clearly, (a_x, x] ∩ K = {x}. Show that if x and y are distinct points of K, then
(a_x, x] ∩ (a_y, y] = ∅.

5.10 Compactification

In this section, we show that a locally compact Hausdorff space (X, 𝒯) can be
embedded in a compact Hausdorff space (X∞ , 𝒯∞ ) in the manner described in
theorem 5.10.1. In that theorem, the definition of the topology 𝒯∞ requires some
explanation.

The prototypical and most important example of a locally compact Hausdorff
space is ℝ^n. We focus here on ℝ^2, because the stereographic projection of the
punctured sphere 𝒮^2_∗ onto ℝ^2 is easy to visualize and provides an excellent
motivation for the definition of 𝒯∞. The stereographic projection has been
known to mapmakers since the late sixteenth century, and it is reasonable to
surmise that Alexandroff was aware of that projection when he invented the
topology 𝒯∞.

It is clear that a compactification of the plane (more literally, its homeomorphic
image 𝒮^2_∗) is the compact sphere 𝒮^2, which contains 𝒮^2_∗ and a single
additional point N. Some reflection reveals that there are two types of open
subsets of the compact sphere:

(a) The open subsets of 𝒮^2 that do not contain N: These are in one-to-one
correspondence (through the stereographic projection) with the open subsets
of the usual topology of ℝ^2.
(b) The open subsets U of 𝒮^2 that contain the point N: The complement
K = 𝒮^2 − U of such an open set is closed in 𝒮^2. Since 𝒮^2 is compact, K is
compact. Thus the open sets U of this type are exactly the complements
of compact subsets of the punctured sphere, which are in one-to-one
correspondence with the compact subsets of ℝ^2.

The above discussion suggests that a compact topology that contains the usual
topology on ℝ^2 can be constructed by adding a single point, which we call ∞
(this point corresponds to the point N on the compact sphere), to ℝ^2 and
defining the topology on ℝ^2 ∪ {∞} to consist of the above two types of sets.
This is exactly how the topology 𝒯∞ in theorem 5.10.1 is defined.
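
The correspondence between the plane and the punctured sphere can be made concrete. The Python sketch below is an illustration only; it assumes the unit sphere centered at the origin with N = (0, 0, 1), which may differ from the normalization used elsewhere in the book, and the function names are hypothetical. It shows that points far from the origin of the plane land close to N, which is why complements of compact subsets of the plane play the role of neighborhoods of ∞.

```python
import math

NORTH = (0.0, 0.0, 1.0)

def to_sphere(x, y):
    """Inverse stereographic projection R^2 -> S^2 - {N} (unit sphere assumed)."""
    s = x * x + y * y
    return (2 * x / (s + 1), 2 * y / (s + 1), (s - 1) / (s + 1))

def to_plane(X, Y, Z):
    """Stereographic projection S^2 - {N} -> R^2 (requires Z != 1)."""
    return (X / (1 - Z), Y / (1 - Z))

def distance_to_north(x, y):
    """Euclidean distance in R^3 between the image of (x, y) and N."""
    return math.dist(to_sphere(x, y), NORTH)

for radius in (1.0, 10.0, 1000.0):
    print(radius, distance_to_north(radius, 0.0))
```

The printed distances decrease toward 0 as the radius grows.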

Theorem 5.10.1. Let (X, 𝒯) be a locally compact Hausdorff space that is not
compact. Then there exists a compact Hausdorff space (X∞ , 𝒯∞ ) containing (X, 𝒯)
such that

(i) X∞ − X is a single point,
(ii) 𝒯 is the restriction of 𝒯∞ to X, and
(iii) X is dense in (X∞ , 𝒯∞ ).

Proof. Take an object, which we give the symbol ∞, that does not belong to X, and
let X∞ = X ∪ {∞}.
We define 𝒯∞ to be the collection of subsets of X∞ of one of the following two
types:

(a) all the members of 𝒯, or
(b) subsets of X∞ of the form X∞ − K, where K is a compact subset of X.

We claim that 𝒯∞ is a topology that satisfies the stated properties. We leave it to
the reader to verify that the intersection of two members of 𝒯∞ belongs to 𝒯∞. To
show that the union of an arbitrary subcollection of 𝒯∞ is in 𝒯∞, we work out three
cases:

1. Since 𝒯 is a topology and 𝒯 ⊆ 𝒯∞, the union of open sets of type (a) is in 𝒯∞.
2. Consider the union of a collection {X∞ − K𝛼}𝛼 of subsets of X∞ of type (b);
∪𝛼 (X∞ − K𝛼) = X∞ − ∩𝛼 K𝛼, which is in 𝒯∞ because ∩𝛼 K𝛼 is compact in X.
3. Consider the union of a subcollection {U𝛼}𝛼 of 𝒯 and a subcollection
{X∞ − K𝛽}𝛽 of 𝒯∞, where each K𝛽 is compact. By cases 1 and 2 above,
∪𝛼 U𝛼 ∈ 𝒯 and ∪𝛽 (X∞ − K𝛽) ∈ 𝒯∞. Write ∪𝛼 U𝛼 = U and
∪𝛽 (X∞ − K𝛽) = X∞ − K. Now
(∪𝛼 U𝛼) ∪ (∪𝛽 (X∞ − K𝛽)) = U ∪ (X∞ − K) = X∞ − (K − U), which is in
𝒯∞ because K − U is compact in X. This proves that 𝒯∞ is a topology.

We verify that 𝒯 is the restriction of 𝒯∞ to X. Given an open subset of X∞, its
intersection with X is in 𝒯 since, for an open subset U of X, U ∩ X = U and, for a
compact subset K of X, (X∞ − K) ∩ X = X − K, which is open in X because the
compact set K is closed in the Hausdorff space X. The converse is trivial since,
for an open subset U of X, U ∩ X∞ = U.

Next we show that X is dense in X∞ by showing that every open neighborhood of
∞ intersects X. Such a neighborhood is of the type X∞ − K, where K is a compact
subset of X. Since X is not compact, X − K ≠ ∅, and clearly (X∞ − K) ∩ X ≠ ∅.

We now show that 𝒯∞ is Hausdorff. It is sufficient to show that if x ∈ X, then x and
∞ have disjoint open neighborhoods in X∞. Since X is locally compact, there is a
compact subset K of X such that x ∈ int(K). Let U = int(K), and V = X∞ − K; U
and V are disjoint neighborhoods of x and ∞.

Finally, we show that X∞ is compact. Let 𝒰 be an open cover of X∞; 𝒰 must
contain a member of the type X∞ − K for some compact subset K of X. Let
𝒰′ be 𝒰 with the exclusion of X∞ − K. The intersection of X and the members
of 𝒰′ clearly covers K. Thus there exists a finite subcollection 𝒰″ of 𝒰′ such
that K ⊆ ∪{X ∩ W ∶ W ∈ 𝒰″ }. Since X∞ = K ∪ (X∞ − K), the finite subcollection
𝒰″ ∪ {X∞ − K} of 𝒰 covers X∞.

Definition. The topological space (X∞ , 𝒯∞ ) we constructed in theorem 5.10.1 is
called the one-point (or Alexandroff) compactification of the locally compact
Hausdorff space X.

Theorem 5.10.2. The one-point compactification of a locally compact Hausdorff
space X is unique up to homeomorphism. More specifically, if Y is a compact
Hausdorff space and 𝜑 ∶ X → Y is a topological embedding such that Y − 𝜑(X)
is a single point, then 𝜑 can be extended to a homeomorphism 𝜑∞ ∶ X∞ → Y.

Proof. Let Y − 𝜑(X) = {𝜔}, and extend 𝜑 to a function 𝜑∞ ∶ X∞ → Y by defining
𝜑∞ (∞) = 𝜔. Trivially, 𝜑∞ is a bijection. Since both X∞ and Y are compact
Hausdorff spaces, we need only to show that the inverse 𝜑∞^{−1} is continuous;
see theorem 5.8.7. Equivalently, we show that 𝜑∞ is an open mapping. If V is an
open subset of X, then 𝜑∞ (V) = 𝜑(V), which is open by the assumption that 𝜑 is
an embedding. If V contains ∞, then V = X∞ − K for some compact subset K of X.
Now 𝜑∞ (V) = 𝜑∞ (X∞ − K) = Y − 𝜑∞ (K) = Y − 𝜑(K). The compactness of K
together with the continuity of 𝜑 imply that 𝜑(K) is compact. By theorem 5.8.3,
𝜑(K) is closed, and Y − 𝜑(K) is open.

Example 1. Let 𝜒 be the chordal metric on ℝ. Recall that (ℝ, 𝜒) is homeomorphic
to ℝ with the usual topology. Therefore the one-point compactification
of ℝ with respect to the usual topology is homeomorphic to the one-point
compactification of (ℝ, 𝜒). By the very definition of the chordal metric, (ℝ, 𝜒)
is homeomorphic to the punctured circle 𝒮^1 − {N}. Therefore the one-point
compactification of (ℝ, 𝜒), hence (ℝ, 𝒯), is the compactification of 𝒮^1 − {N},
which is clearly 𝒮^1. We have arrived at the following result: the one-point
compactification of ℝ is the circle. The same argument shows that the one-point
compactification of the complex plane ℂ (identified with the Euclidean plane
ℝ^2) is the sphere. This is the reason the sphere is thought of as the extended
complex plane and is often called the Riemann sphere.

Example 2. The one-point compactification of the open interval (0, 1) is the circle
𝒮^1. To see this, recall that the open unit interval (0, 1) is homeomorphic to
the line ℝ. Since the one-point compactification of ℝ is 𝒮^1, the one-point
compactification of (0, 1) is 𝒮^1.

Example 3. While the one-point compactification X∞ of a locally compact
Hausdorff space X is essentially unique, it is possible to embed X as a dense
subspace of other compact spaces not homeomorphic to X∞. For example, another
compactification of the open unit interval (0, 1) is the closed interval [0, 1],
which is not homeomorphic to 𝒮^1.

Example 4. The one-point compactification of the punctured line,
X = (−∞, 0) ∪ (0, ∞), is homeomorphic to the union of two externally tangent
circles (a figure eight).

Each of the open half lines is homeomorphic to an open half circle, as shown in
figure 5.1(a). There are several ways to see this. The stereographic projection
is the easiest to visualize. The next step is to pull the two open half circles
horizontally apart a distance equal to the diameter of each half circle, as shown
in figure 5.1(b). Now each half circle is homeomorphic to a punctured circle.
For example, the function f(e^{i𝜃}) = e^{2i𝜃} maps the half circle
{e^{i𝜃} ∶ −𝜋/2 < 𝜃 < 𝜋/2} onto the punctured circle {e^{i𝜃} ∶ −𝜋 < 𝜃 < 𝜋}.
Hence X is homeomorphic to the union of the two tangent punctured circles shown
in figure 5.1(c). If we define the point at infinity to be the missing point of
tangency, we obtain the figure eight shown in figure 5.1(d).

The following succinct characterization of locally compact Hausdorff spaces
follows directly from proposition 5.9.5 and theorem 5.10.1.

Theorem 5.10.3. A Hausdorff space is locally compact if and only if it is an open
subspace of a compact Hausdorff space.

[Figure 5.1: panels (a), (b), (c), (d)]

Exercises

1. Let X = ℕ with the restricted topology induced by the usual topology on ℝ.
This topology is, in fact, the discrete topology on ℕ. Prove that the one-point
compactification of X is homeomorphic to the space
X∞ = {1/n ∶ n ∈ ℕ} ∪ {0} (as a subspace of ℝ).
2. Prove that the one-point compactification of ℝ^n is the sphere 𝒮^n.
3. What is the one-point compactification of the open unit disk in ℝ^2?
4. By generalizing the idea of example 4, make a conjecture about the one-point
compactification of the union of the two open half planes
{(x, y) ∈ ℝ^2 ∶ x > 0} ∪ {(x, y) ∈ ℝ^2 ∶ x < 0}.
5. Prove that if a locally compact Hausdorff space X is second countable, then
so is X∞.

5.11 Metrization

We now turn to the question of which topologies are induced by a metric. Theorem
5.11.3 is the main result in this section. Although it is not the best known result,
it does establish sufficient conditions for metrization. The proof techniques we
develop along the path to theorem 5.11.3 are elegant and important in their own
right. We first state the following definition.

Definition. A topological space (X, 𝒯) is metrizable if there is a metric d on X
that induces the topology 𝒯.

Lemma 5.11.1. Suppose X is a normal space, and let E and F be disjoint closed
subsets of X. Let C be the set of rational points in the interval [0, 1]. Then there
exists a countable collection of open subsets {U_p ∶ p ∈ C} such that

if p, q ∈ C and p < q, then Ū_p ⊆ U_q. (*)

Additionally, for all p ∈ C, E ⊆ U_p and Ū_p ⊆ X − F.

Proof. Let p_0 = 0 and p_1 = 1, and let {p_2, p_3, p_4, …} be an enumeration of the
rational points in (0, 1). Since E ⊆ X − F, theorem 5.6.3 yields an open set U_1
such that E ⊆ U_1 ⊆ Ū_1 ⊆ X − F. Another application of theorem 5.6.3 yields an
open set U_0 such that E ⊆ U_0 ⊆ Ū_0 ⊆ U_1. The rest of the construction is
inductive. Suppose that, for each element p_i of the finite set
C_n = {p_0, …, p_n}, we have found an open set U_{p_i} such that the sets
U_{p_0}, …, U_{p_n} satisfy condition (*) for p, q ∈ C_n. Consider the rational
number p_{n+1}. It must fall strictly between two elements of C_n; let p_i be the
largest element of C_n less than p_{n+1}, and let p_j be the smallest element of
C_n greater than p_{n+1}, so that p_i < p_{n+1} < p_j. Again by theorem 5.6.3,
there exists an open set U_{p_{n+1}} such that
Ū_{p_i} ⊆ U_{p_{n+1}} ⊆ Ū_{p_{n+1}} ⊆ U_{p_j}. By construction, the sets
U_{p_0}, …, U_{p_{n+1}} satisfy condition (*) for p, q ∈ C_{n+1}. Since, for
every pair of points p and q in C, there is a finite set C_n that contains p and
q, the proof is complete.
The inclusions E ⊆ U_p and Ū_p ⊆ X − F for all p ∈ C are obvious since E ⊆ U_0
and Ū_1 ⊆ X − F.

Remark. Any countable dense subset C of [0, 1] containing 0 and 1 can be used in
the construction of the collection {U_p ∶ p ∈ C}. A commonly used such set is
the set of dyadic rational numbers D = {0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, …},⁵
which is slightly more advantageous in the visualization of the construction of
the sets {U_p}.
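
The order in which the inductive construction of lemma 5.11.1 visits the dyadic rationals can be traced mechanically. The Python sketch below is an illustration only, with hypothetical function names; it does not construct any open sets. It lists, for each new point p_{n+1}, the largest previously placed point below it and the smallest previously placed point above it, which are the two indices between which the new open set is inserted.

```python
from fractions import Fraction
from itertools import count, islice

def dyadic_enumeration():
    """Yield 0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, ... (each dyadic rational in [0, 1] once)."""
    yield Fraction(0)
    yield Fraction(1)
    for n in count(1):
        for k in range(1, 2 ** n, 2):  # odd numerators only
            yield Fraction(k, 2 ** n)

def neighbors(placed, p):
    """Largest placed element below p and smallest placed element above p."""
    lower = max(q for q in placed if q < p)
    upper = min(q for q in placed if q > p)
    return lower, upper

def construction_order(steps):
    """Return the triples (p_i, p_{n+1}, p_j) for the first `steps` inductive steps."""
    gen = dyadic_enumeration()
    placed = [next(gen), next(gen)]  # p_0 = 0 and p_1 = 1
    triples = []
    for p in islice(gen, steps):
        lower, upper = neighbors(placed, p)
        triples.append((lower, p, upper))
        placed.append(p)
    return triples

for lower, p, upper in construction_order(7):
    print(f"insert U_{p} between U_{lower} and U_{upper}")
```

With the dyadic enumeration, each new point is the midpoint of its two neighbors, which is what makes the construction easy to visualize.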

The following theorem is crucial for the proof of theorem 5.11.3. It is greatly
important in its own right.

Theorem 5.11.2 (Urysohn’s lemma). Suppose that X is a normal space and that
E and F are disjoint closed subsets of X. Then there exists a continuous function
f ∶ X → [0, 1] such that f (E) = 1, and f (F) = 0.

Proof. Let C and {U_p ∶ p ∈ C} be as in lemma 5.11.1. For p, q ∈ C, define

f_p(x) = p if x ∈ U_{1−p}, and f_p(x) = 0 if x ∉ U_{1−p};
g_q(x) = 1 if x ∈ Ū_{1−q}, and g_q(x) = q if x ∉ Ū_{1−q}.

Since f_p = p𝜒_{U_{1−p}} and g_q = q + (1 − q)𝜒_{Ū_{1−q}}, f_p is lower
semicontinuous and g_q is upper semicontinuous by theorem 5.3.4 (U_{1−p} is
open, and Ū_{1−q} is closed). Now define
f = sup_{p∈C} f_p and g = inf_{q∈C} g_q.
Again theorem 5.3.4 implies that f is lower semicontinuous and that g is upper
semicontinuous.
If x ∈ E, then x ∈ U_{1−p} for every p ∈ C, and hence f(x) = sup_{p∈C} p = 1.
If x ∈ F, then x ∉ U_{1−p} for every p ∈ C, and hence f(x) = 0.

By theorem 5.3.4, the proof will be complete if we show that f = g.

⁵ D = {0, 1} ∪ ∪_{n=1}^∞ {k/2^n ∶ k = 1, 3, 5, …, 2^n − 1}.
We claim that, for all p, q ∈ C, f_p ≤ g_q. It follows immediately that f ≤ g.

If, for some x ∈ X, g_q(x) < f_p(x), then f_p(x) > 0 and g_q(x) < 1. Thus

f_p(x) = p, hence x ∈ U_{1−p}, and
g_q(x) = q, hence x ∉ Ū_{1−q}.

Now p = f_p(x) > g_q(x) = q; hence 1 − p < 1 − q, and Ū_{1−p} ⊆ U_{1−q}. This
contradicts x ∈ U_{1−p} and x ∉ Ū_{1−q} and establishes the claim that f_p ≤ g_q.

Suppose, for a contradiction, that f(x) < g(x) for some x ∈ X. Because C is dense
in [0, 1], there are points p, q ∈ C such that f(x) < p < q < g(x). Now f(x) < p
implies that x ∉ U_{1−p}, and g(x) > q implies x ∈ Ū_{1−q}. This is a
contradiction because 1 − q < 1 − p; hence Ū_{1−q} ⊆ U_{1−p}. The contradiction
concludes the proof.
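
A concrete instance may help in following the construction. The Python sketch below is an illustration under assumptions that are not taken from the text: X = ℝ, E = (−∞, 0], F = [1, ∞), and the hypothetical family U_p = (−∞, (1 + p)/3), which satisfies condition (*) as well as E ⊆ U_0 and Ū_1 ⊆ X − F. For this family the function f = sup f_p works out to f(x) = min(1, max(0, 2 − 3x)), and the code compares that formula with the supremum taken over a finite grid of rationals.

```python
from fractions import Fraction

def in_U(p, x):
    """Membership in the assumed set U_p = (-inf, (1 + p)/3)."""
    return x < (1 + p) / 3

def f_approx(x, n=64):
    """Maximum of f_p(x) over p = 0, 1/n, ..., 1, where f_p = p on U_{1-p} and 0 elsewhere."""
    grid = [Fraction(k, n) for k in range(n + 1)]
    return max(p if in_U(1 - p, x) else Fraction(0) for p in grid)

def f_exact(x):
    """Closed form of sup_p f_p(x) for the assumed family."""
    return min(Fraction(1), max(Fraction(0), 2 - 3 * x))

for x in (Fraction(-1), Fraction(1, 3), Fraction(1, 2), Fraction(2)):
    print(x, f_approx(x), f_exact(x))
```

The grid value never exceeds the closed form and differs from it by at most 1/n, since only finitely many of the indices p are used.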

Theorem 5.11.3 (the Urysohn metrization theorem). Every regular second
countable topological space (X, 𝒯) is metrizable.

Proof. According to theorem 5.7.4, X is actually a normal space. Let
𝔅 = {B_1, B_2, …} be a countable open base for the topology. If x ∈ B_1, then,
by theorem 5.6.2 and the fact that 𝔅 is an open base, there exists a basis
element B_k ∈ 𝔅 such that x ∈ B_k ⊆ B̄_k ⊆ B_1. Therefore the collection of
pairs P = {(B_n, B_m) ∶ B̄_n ⊆ B_m} is not empty. Since P is countable, we
enumerate P as follows: P = {(B_{n_i}, B_{m_i}) ∶ i ∈ ℕ}. By theorem 5.11.2, for
each i ∈ ℕ, a continuous function f_i ∶ X → [0, 1] exists such that
f_i(B̄_{n_i}) = 0 and f_i(X − B_{m_i}) = 1.
We now define a metric d on X as follows:

d(x, y) = { ∑_{i=1}^∞ |f_i(x) − f_i(y)|^2 / i^2 }^{1/2}.

It is clear that the series in the above definition converges since
|f_i(x) − f_i(y)| ≤ 1. In fact, the sequences
𝜑_x = (f_1(x), f_2(x)/2, …, f_i(x)/i, …) and
𝜑_y = (f_1(y), f_2(y)/2, …, f_i(y)/i, …)
are in l^2, and d(x, y) is nothing but the l^2 distance between 𝜑_x and 𝜑_y. It
becomes clear that d is a metric once we show that the function x ↦ 𝜑_x is an
injection. Let x and y be distinct elements of X, and let U be an open
neighborhood of x that excludes y. Choose a basis member B_m such that
x ∈ B_m ⊆ B̄_m ⊆ U, then choose a basis member B_n such that
x ∈ B_n ⊆ B̄_n ⊆ B_m. The pair (B_n, B_m) ∈ P, and hence
(B_n, B_m) = (B_{n_i}, B_{m_i}) for some i ∈ ℕ. It follows that f_i(x) = 0 and
f_i(y) = 1.
We now show that the metric d induces the topology 𝒯.

We prove that every d-open subset U of X is 𝒯-open. Let x ∈ U, and let r > 0
be such that B(x, r) ⊆ U. Here B(x, r) is the d-ball of radius r centered at x.
We will show that there exists a 𝒯-open set V such that x ∈ V ⊆ B(x, r). First
choose an integer N such that ∑_{i=N+1}^∞ 1/i^2 < r^2/2. Since each f_i is
continuous, there exists an open neighborhood V_i of x such that, for every
y ∈ V_i, |f_i(x) − f_i(y)|^2 < r^2/(2N). We claim that the set
V = ∩_{i=1}^N V_i is the set we seek. If y ∈ V, then

[d(x, y)]^2 = ∑_{i=1}^∞ |f_i(x) − f_i(y)|^2 / i^2
= ∑_{i=1}^N |f_i(x) − f_i(y)|^2 / i^2 + ∑_{i=N+1}^∞ |f_i(x) − f_i(y)|^2 / i^2
< (r^2/(2N)) ∑_{i=1}^N 1/i^2 + ∑_{i=N+1}^∞ 1/i^2 < r^2/2 + r^2/2 = r^2.
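
The choice of N in the estimate above can be made explicit. The Python sketch below is an illustration with hypothetical function names. It relies on the standard bound ∑_{i=N+1}^∞ 1/i^2 < 1/N, which is not stated in the text, so any integer N ≥ 2/r^2 makes the tail smaller than r^2/2. The sketch also evaluates a truncated version of d for a finite list of functions, purely to show the shape of the formula.

```python
import math

def choose_N(r):
    """An integer N with sum_{i > N} 1/i^2 < r^2 / 2, using the bound tail < 1/N."""
    return max(1, math.ceil(2.0 / (r * r)))

def tail(N):
    """Numerical value of sum_{i > N} 1/i^2, computed as pi^2/6 minus a partial sum."""
    return math.pi ** 2 / 6 - sum(1.0 / (i * i) for i in range(1, N + 1))

def d_truncated(fs, x, y):
    """Square root of sum_i |f_i(x) - f_i(y)|^2 / i^2 over a finite list fs (i starts at 1)."""
    return math.sqrt(sum((f(x) - f(y)) ** 2 / (i * i) for i, f in enumerate(fs, start=1)))

for r in (1.0, 0.5, 0.1):
    N = choose_N(r)
    print(r, N, tail(N), r * r / 2)
```

The value of N grows like 1/r^2, which is harmless because only the existence of some N is needed in the proof.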

To show that every 𝒯-open set is d-open, it is sufficient to show that every basic
open set B_m is d-open. Let x ∈ B_m. We need to show that there exists r > 0 such
that B(x, r) ⊆ B_m. By theorem 5.6.2, there exists a basis element B_n such that
x ∈ B_n ⊆ B̄_n ⊆ B_m. Now (B_n, B_m) ∈ P, say, (B_n, B_m) = (B_{n_i}, B_{m_i}).
Let r = 1/(2i). If y ∈ B(x, r), then
∑_{j=1}^∞ |f_j(x) − f_j(y)|^2 / j^2 < 1/(4i^2). In particular,
|f_i(x) − f_i(y)|^2 / i^2 < 1/(4i^2). Thus |f_i(x) − f_i(y)| < 1/2. Because
f_i(x) = 0, |f_i(y)| < 1/2. Since f_i(X − B_{m_i}) = 1, y ∈ B_{m_i} = B_m.

The conditions of theorem 5.11.3 are not necessary for a space to be metrizable.
For example, the space l^∞ is metrizable but not second countable. However, if we
limit ourselves to compact Hausdorff spaces, the conditions of theorem 5.11.3 are
necessary as well as sufficient, as the next theorem shows.

Theorem 5.11.4. A compact Hausdorff space is metrizable if and only if it is second
countable.

Proof. A compact Hausdorff space X is normal by theorem 5.8.10. Therefore if X is
second countable, it is metrizable by the previous theorem. To prove the converse,
recall that a compact metric space is second countable (see problem 7 in section
4.7 and theorem 4.5.1).

We now venture back into locally compact Hausdorff spaces. The following theo-
rem is the closest analog of theorem 5.11.2 for locally compact Hausdorff spaces,
which need not be normal. It is sometimes referred to as Urysohn’s theorem for
locally compact spaces.

Theorem 5.11.5. Let X be a locally compact Hausdorff space, and let K and F be
disjoint subsets of X such that K is compact and F is closed. Then there exists a
continuous function f ∶ X → [0, 1] such that f (K) = 1, and f (F) = 0.

Proof. Applying theorem 5.9.2 to the compact set K and the open set X − F, there
exists an open subset V with compact closure such that K ⊆ V ⊆ V̄ ⊆ X − F.
Applying theorem 5.9.2 again to the compact set K and the open set V, we can
find an open set U with compact closure such that K ⊆ U ⊆ Ū ⊆ V. Now the
subspace V̄ with the restricted topology is a compact Hausdorff space and is
therefore normal by theorem 5.8.10. Applying theorem 5.11.2 to the closed subsets
K and V̄ − U of V̄, there is a continuous function f ∶ V̄ → [0, 1] such that
f(K) = 1 and f(V̄ − U) = 0. Extend f to a continuous function f ∶ X → [0, 1] by
defining f(x) = 0 for all x ∈ X − U. Problem 4 in section 5.3 is relevant here
to show the continuity of the extended f.

Definition. Let f be a complex-valued function on a topological space X. The
support of f, written supp(f), is the closure of the set {x ∈ X ∶ f(x) ≠ 0}. A
function f ∶ X → ℂ is said to have compact support if supp(f) is compact. The
set 𝒞c (X) of continuous, compactly supported functions is clearly a subspace
of ℬ𝒞(X).

The following corollary is of pivotal importance in studying a certain class of
measures on locally compact Hausdorff spaces. In section 8.4, we present the
main examples of such measures: Lebesgue and Radon measures.

Corollary 5.11.6. Suppose that X is a locally compact Hausdorff space, that K is a
compact subspace of X, and that V is an open neighborhood of K. Then there exists
a continuous function of compact support, f ∶ X → [0, 1], such that f(K) = 1 and
supp(f) ⊆ V.

Proof. Apply theorem 5.9.2 to find an open set U with compact closure such that
K ⊆ U ⊆ Ū ⊆ V. Now apply theorem 5.11.5 to the sets K and F = X − U to find
a continuous function f ∶ X → [0, 1] such that f(K) = 1 and f(X − U) = 0.
Observe that supp(f) ⊆ Ū ⊆ V. Being a closed subset of the compact set Ū,
supp(f) is compact.

Definition. Let X be a locally compact Hausdorff space. A continuous, scalar-valued
function f is said to vanish at ∞ if, for every 𝜖 > 0, there exists a compact
subset K of X such that |f(x)| < 𝜖 for every x ∈ X − K. The set 𝒞0 (X) of all
scalar-valued functions on X that vanish at ∞ is clearly a vector space.

Theorem 5.11.7. A function f ∈ 𝒞0 (X) is bounded and the space 𝒞0 (X) is a complete
normed linear space under the supremum norm.

Proof. We leave it to the reader to show that 𝒞0 (X) ⊆ ℬ𝒞(X). We prove that 𝒞0 (X) is
closed in ℬ𝒞(X). Let f ∈ ℬ𝒞(X) be a closure point of 𝒞0 (X), and let 𝜖 > 0. There
exists a function g ∈ 𝒞0 (X) such that ‖ f − g‖∞ < 𝜖/2. Let K be a compact subset
of X such that |g(x)| < 𝜖/2 whenever x ∉ K. Now if x ∉ K, then
|f(x)| ≤ |f(x) − g(x)| + |g(x)| < 𝜖/2 + 𝜖/2 = 𝜖. Hence f ∈ 𝒞0 (X). Since ℬ𝒞(X) is
complete, so is its closed subspace 𝒞0 (X).

Theorem 5.11.8. Suppose X is a locally compact Hausdorff space. Then 𝒞c (X) is
dense in 𝒞0 (X).

Proof. Let g ∈ 𝒞0 (X), let 𝜖 > 0, and let K be a compact subset of X such that
|g (x)| < 𝜖 for x ∈ X − K. By theorem 5.9.2, there is an open subset V with compact
closure such that K ⊆ V. By corollary 5.11.6, there exists a function f ∈ 𝒞c (X) such
that f (K) = 1, 0 ≤ f (x) ≤ 1, and supp( f) ⊆ V. The function fg is in 𝒞c (X) and
‖g − fg‖∞ < 𝜖. 
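
The approximation in the proof of theorem 5.11.8 can be tried on a simple case. The Python sketch below is an illustration under assumptions that are not taken from the text: X = ℝ, g(x) = 1/(1 + x^2), K = [−R, R] with R = (1/𝜖)^{1/2}, and a piecewise linear cutoff f that equals 1 on K and vanishes outside (−R − 1/2, R + 1/2). The maximum is taken over a finite grid, so it only approximates the supremum norm.

```python
import math

def g(x):
    """A function that vanishes at infinity on R."""
    return 1.0 / (1.0 + x * x)

def cutoff(x, R):
    """Equals 1 on [-R, R], 0 for |x| >= R + 1/2, and is linear in between."""
    return min(1.0, max(0.0, 2.0 * (R + 0.5 - abs(x))))

def sup_error(eps, samples=20001):
    """Grid approximation of sup |g - f g|, where f is the cutoff adapted to eps."""
    R = math.sqrt(1.0 / eps)  # |g(x)| < eps whenever |x| > R
    a = R + 2.0               # sample the interval [-a, a]
    worst = 0.0
    for k in range(samples):
        x = -a + 2.0 * a * k / (samples - 1)
        worst = max(worst, abs(g(x) - cutoff(x, R) * g(x)))
    return worst

for eps in (0.5, 0.1, 0.01):
    print(eps, sup_error(eps))
```

The error vanishes on K, where f = 1, and is bounded by |g| outside K, which is the whole content of the estimate.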

Exercises

In the problems below, X is a locally compact Hausdorff space.

1. Prove that 𝒞0 (X) ⊆ ℬ𝒞(X).
2. Prove that f ∈ 𝒞0 (X) if and only if, for every 𝜖 > 0, the set {x ∈ X ∶ | f (x)| ≥ 𝜖}
is compact.
3. Prove that f ∈ 𝒞0 (X) if and only if f is the restriction to X of a function
g ∈ 𝒞(X∞ ) such that g (∞) = 0. Here X∞ is the one-point compactification
of X.

5.12 The Product of Infinitely Many Spaces

This section generalizes section 5.4. First we review some terminology and
notation.

Let {X𝛼 }𝛼∈I be an arbitrary collection of nonempty sets. The Cartesian product
X = ∏𝛼∈I X𝛼 is the set of all functions x ∶ I → ∪𝛼∈I X𝛼 such that, for every 𝛼 ∈ I,
x(𝛼) ∈ X𝛼 . We write x𝛼 instead of x(𝛼), and we denote an element of X by
x = (x𝛼 )𝛼∈I , or simply x = (x𝛼 ). For a fixed 𝛼 ∈ I, the projection of X onto the
factor set X𝛼 is the function 𝜋𝛼 (x) = x𝛼 .

Let {(X𝛼 , 𝒯𝛼 )}𝛼∈I be a collection of topological spaces, and let X = ∏𝛼 X𝛼 be
the Cartesian product of the underlying sets. As in the definition of the product
topology in section 5.4, we would like the product topology to guarantee the
continuity of all the projections 𝜋𝛼 ∶ X → X𝛼 . One might be tempted to adopt the
following simple generalization of the product of finitely many spaces. Consider
the topology 𝒯, which has the following subbase:

{∏𝛼∈I U𝛼 ∶ 𝛼 ∈ I, U𝛼 ∈ 𝒯𝛼 }.

It would be a hasty decision to define 𝒯 to be the product topology. Although 𝒯
certainly guarantees the continuity of all the projections, it is too wasteful because,
in order to guarantee the continuity of 𝜋𝛼 , we only need the openness of sets of
the form 𝜋𝛼⁻¹(U𝛼 ), where U𝛼 ∈ 𝒯𝛼 . A little reflection shows that
𝜋𝛼⁻¹(U𝛼 ) = U𝛼 × ∏𝛽≠𝛼 X𝛽 . ⁶
Therefore the smallest topology which guarantees the continuity of all
the projections is the topology whose subbase is the collection {𝜋𝛼⁻¹(U𝛼 ) ∶ 𝛼 ∈ I,
U𝛼 ∈ 𝒯𝛼 }.

We now formalize the above motivation to define the product topology. Let

𝔖 = {𝜋𝛼⁻¹(U𝛼 ) ∶ 𝛼 ∈ I, U𝛼 ∈ 𝒯𝛼 }.

Since ∪{S ∶ S ∈ 𝔖} = X, theorem 5.2.3 applies, and the following definition is
meaningful.

Definition. The product topology on X is the weakest topology that contains 𝔖.

By construction, 𝔖 is a subbase for the product topology (theorem 5.2.3).
An open base 𝔅 for the product topology on X consists of finite intersections
of members of 𝔖. Thus a typical member of 𝔅 is a set of the form

∩_{i=1}^{n} 𝜋_{𝛼_i}⁻¹(U_{𝛼_i}) = U_{𝛼_1} × … × U_{𝛼_n} × ∏_{𝛼 ∉ {𝛼_1 , … , 𝛼_n }} X𝛼 ,

where {𝛼_1 , … , 𝛼_n } is a finite subset of I.

To reiterate, the above set is the set of all x ∈ X such that 𝜋_{𝛼_i}(x) ∈ U_{𝛼_i} for all
1 ≤ i ≤ n.
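
The description of a basic open set can be made concrete with a small sketch. The Python code below is an illustration only and is not part of the text: it assumes that every factor X𝛼 is the real line, that the index set I is the set of integers, and that each restricted coordinate is confined to an open interval. The names in_basic_open_set, intersect, B1, and B2 are hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

# A point of the product is modelled as a function from indices to coordinates.
Point = Callable[[Hashable], float]
# A basic open set restricts finitely many coordinates to open intervals (lo, hi);
# every coordinate that is not listed is unrestricted.
BasicOpenSet = Dict[Hashable, Tuple[float, float]]


def in_basic_open_set(x: Point, constraints: BasicOpenSet) -> bool:
    """Return True when x(alpha) lies in its interval for every restricted alpha."""
    return all(lo < x(alpha) < hi for alpha, (lo, hi) in constraints.items())


def intersect(b1: BasicOpenSet, b2: BasicOpenSet) -> BasicOpenSet:
    """Intersect two basic open sets; still only finitely many coordinates are restricted.

    If an interval becomes empty (lo >= hi), the result describes the empty set.
    """
    out = dict(b1)
    for alpha, (lo, hi) in b2.items():
        if alpha in out:
            out[alpha] = (max(out[alpha][0], lo), min(out[alpha][1], hi))
        else:
            out[alpha] = (lo, hi)
    return out


# Assumed example: I = the integers, and x is the point with x_alpha = 1/(1 + |alpha|).
x = lambda alpha: 1.0 / (1 + abs(alpha))
B1 = {0: (0.5, 1.5), 3: (0.0, 1.0)}      # restricts coordinates 0 and 3
B2 = {3: (0.2, 0.3), -7: (0.0, 0.2)}     # restricts coordinates 3 and -7
B = intersect(B1, B2)                    # restricts coordinates 0, 3, and -7
print(in_basic_open_set(x, B1), in_basic_open_set(x, B))
```

Only the finitely many listed coordinates are ever inspected, which mirrors the fact that a basic open set leaves all but finitely many coordinates unrestricted.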
The following theorem is a restatement of the definition of the product
topology. See the proof of theorem 5.4.1.

Theorem 5.12.1. The product topology is the weakest topology relative to which all
the projections 𝜋𝛼 ∶ X → X𝛼 are continuous. 

⁶ The set 𝜋𝛼⁻¹(U𝛼 ) is the set of all elements x in X such that x𝛼 ∈ U𝛼 and the other coordinates, x𝛽 ,
of x are unrestricted elements of X𝛽 . This is exactly the set U𝛼 × ∏𝛽≠𝛼 X𝛽 .

Example 1. The product of an arbitrary collection {X𝛼 ∶ 𝛼 ∈ I} of Hausdorff
spaces is Hausdorff.
Let x = (x𝛼 ) and y = (y𝛼 ) be distinct elements of ∏𝛼 X𝛼 . Fix an element 𝛼 ∈
I for which x𝛼 ≠ y𝛼 . Since X𝛼 is Hausdorff, there exist open neighborhoods U𝛼
and V𝛼 of x𝛼 and y𝛼 , respectively, such that U𝛼 ∩ V𝛼 = ∅. Now 𝜋𝛼⁻¹(U𝛼 ) and
𝜋𝛼⁻¹(V𝛼 ) are disjoint open neighborhoods of x and y, respectively.

Example 2. Let {X𝛼 ∶ 𝛼 ∈ I} be a collection of connected spaces, let X = ∏𝛼∈I X𝛼 ,
and let 𝜑 ∶ X → {0, 1} be continuous. If x, y ∈ X are such that x𝛼 = y𝛼 except
for 𝛼 in a finite subset F of I, then 𝜑(x) = 𝜑(y).

Define a function j ∶ ∏𝛼∈F X𝛼 → X as follows: z ↦ jz , where

jz (𝛼) = z𝛼 if 𝛼 ∈ F,
jz (𝛼) = y𝛼 if 𝛼 ∉ F.

For each 𝛽 ∈ I, the function 𝜋𝛽 ∘ j ∶ ∏𝛼∈F X𝛼 → X𝛽 is equal to 𝜋𝛽 if 𝛽 ∈ F, and
it is constant if 𝛽 ∉ F. By problem 6 at the end of this section, the function j is
continuous. Now ∏𝛼∈F X𝛼 is connected by theorem 5.5.5; hence j(∏𝛼∈F X𝛼 )
is connected by theorem 5.5.3. Since x and y are in j(∏𝛼∈F X𝛼 ), 𝜑(x) = 𝜑(y).

Example 3. If {X𝛼 ∶ 𝛼 ∈ I} is a collection of connected spaces, then X = ∏𝛼∈I X𝛼
is connected.
Let 𝜑 ∶ X → {0, 1} be a continuous function, and suppose that U = 𝜑⁻¹(0) ≠
∅. We show that 𝜑(X) = {0}. Fix an element a = (a𝛼 ) ∈ U, and let x ∈ X be
arbitrary. Since U is open in X, there is a basis element B = ∩_{i=1}^{n} 𝜋_{𝛼_i}⁻¹(U_{𝛼_i}) such
that a ∈ B ⊆ U. Define an element y ∈ X as follows:

y𝛼 = a𝛼 if 𝛼 ∈ {𝛼_1 , … , 𝛼_n },
y𝛼 = x𝛼 if 𝛼 ∉ {𝛼_1 , … , 𝛼_n }.

Then y ∈ B ⊆ U and 𝜑(y) = 0. Since x𝛼 = y𝛼 for all 𝛼 ∉ {𝛼_1 , … , 𝛼_n }, 𝜑(x) = 𝜑(y) = 0 by
example 2.

Theorem 5.12.2. If, for each 𝛼, F𝛼 is closed in X𝛼 , then ∏𝛼 F𝛼 is closed in X.

Proof. Let U𝛼 = X𝛼 − F𝛼 . We claim that ∏𝛼∈I F𝛼 = X − ∪𝛼 𝜋𝛼⁻¹(U𝛼 ). The result
follows since ∪𝛼 𝜋𝛼⁻¹(U𝛼 ) is open in X. Let x = (x𝛼 ) be an element of X. Now
x ∈ X − ∏𝛼 F𝛼 if and only if x ∉ ∏𝛼 F𝛼 , if and only if x𝛼 ∉ F𝛼 for some 𝛼 ∈ I, if
and only if x𝛼 ∈ U𝛼 for some 𝛼 ∈ I, if and only if x ∈ ∪𝛼 𝜋𝛼⁻¹(U𝛼 ).

Theorem 5.12.3. For each 𝛼 ∈ I, let 𝔅𝛼 be an open base for X𝛼 . Then the family of
subsets of X of the form ∩𝛼∈F 𝜋𝛼⁻¹(B𝛼 ), where F ranges over finite subsets of I and
B𝛼 ∈ 𝔅𝛼 , is an open base for the product topology.
The proof is left as an exercise.

Example 4. The product of a countable collection of second countable spaces is
second countable.
This follows directly from the previous theorem. Indeed, when I is countable
and, for each 𝛼 ∈ I, 𝔅𝛼 is countable, then the family of sets ∩𝛼∈F 𝜋𝛼⁻¹(B𝛼 ), where F
ranges over finite subsets of I and B𝛼 ∈ 𝔅𝛼 , is countable.

We need the following lemma before we tackle Tychonoff ’s theorem.

Lemma 5.12.4. Let {X𝛼 }𝛼 be a collection of topological spaces, and let X be the
product space. If 𝔉 is a collection of closed subsets of X possessing the finite
intersection property, then there exists a family 𝔉∗ of subsets of X, not necessarily
closed, which is maximal subject to the following conditions:

(a) 𝔉∗ has the finite intersection property, and


(b) 𝔉 ⊆ 𝔉∗ .

Furthermore, 𝔉∗ is closed under the formation of finite intersections.

Proof. Consider the collection 𝔇 of all families of subsets of X that contain 𝔉 and have the finite
intersection property. Order 𝔇 by set inclusion, and let ℭ be a chain in 𝔇. We
will verify that ∪{𝒞 ∶ 𝒞 ∈ ℭ} is an upper bound on ℭ. Let F1 , … , Fn be members
of ∪{𝒞 ∶ 𝒞 ∈ ℭ}. Then there are members 𝒞1 , … , 𝒞n of ℭ such that Fi ∈ 𝒞i . Since
ℭ is a chain, one of the families 𝒞1 , … , 𝒞n , say, 𝒞1 , contains all the others. Now
all the sets F1 , … , Fn are in 𝒞1 ; hence ∩_{i=1}^{n} Fi ≠ ∅, and ∪{𝒞 ∶ 𝒞 ∈ ℭ} has the
finite intersection property. Clearly, ∪{𝒞 ∶ 𝒞 ∈ ℭ} contains 𝔉. By Zorn’s lemma, 𝔇 contains
a maximal member 𝔉∗ . If there are sets F1 and F2 in 𝔉∗ such that F1 ∩ F2 ∉
𝔉∗ , then 𝔉∗ ∪ {F1 ∩ F2 } would have properties (a) and (b), which contradicts
the maximality of 𝔉∗ . This proves that the intersection of two (hence any finite
number of) sets in 𝔉∗ is in 𝔉∗ .

Theorem 5.12.5 (Tychonoff’s theorem). Let {(X𝛼 , 𝒯𝛼 )}𝛼∈I be a collection of
compact topological spaces. Then the product space X is compact.

Proof. Let 𝔉 be a collection of closed subsets of X that has the finite intersection
property. We prove that ∩{F ∶ F ∈ 𝔉} ≠ ∅; by theorem 5.8.8, this implies that X is compact. Let
𝔉∗ be a collection of subsets of X having the properties described in lemma 5.12.4.
Throughout the proof, cl(A) denotes the closure of a set A in the relevant space.
We will show that ∩{cl(F) ∶ F ∈ 𝔉∗ } ≠ ∅. This will establish the theorem because
the members of 𝔉 are closed and ∩{cl(F) ∶ F ∈ 𝔉∗ } ⊆ ∩{cl(F) ∶ F ∈ 𝔉} = ∩{F ∶ F ∈ 𝔉}.

For a fixed 𝛼 ∈ I, consider the following collection of sets:

{cl(𝜋𝛼 (F)) ∶ F ∈ 𝔉∗ }.

This is a family of closed subsets of X𝛼 , and it has the finite intersection property
because if F1 , … , Fn are in 𝔉∗ , then ∩_{i=1}^{n} cl(𝜋𝛼 (Fi )) ⊇ ∩_{i=1}^{n} 𝜋𝛼 (Fi ) ⊇ 𝜋𝛼 (∩_{i=1}^{n} Fi ) ≠ ∅.
Since each X𝛼 is compact, there is an element x𝛼 ∈ ∩{cl(𝜋𝛼 (F)) ∶ F ∈ 𝔉∗ } (theorem
5.8.8). Let x = (x𝛼 ). We will show that x ∈ ∩{cl(F) ∶ F ∈ 𝔉∗ }. Let U = ∩_{i=1}^{n} 𝜋_{𝛼_i}⁻¹(U_{𝛼_i})
be an arbitrary basic open neighborhood of x. We claim that U intersects every
F ∈ 𝔉∗ . This will show that x ∈ cl(F), and the proof will be complete. Since x_{𝛼_i} ∈
U_{𝛼_i} and x_{𝛼_i} ∈ cl(𝜋_{𝛼_i}(F)) for every F ∈ 𝔉∗ , U_{𝛼_i} ∩ 𝜋_{𝛼_i}(F) ≠ ∅ for every F ∈ 𝔉∗ . Thus
𝜋_{𝛼_i}⁻¹(U_{𝛼_i}) ∩ F ≠ ∅ for every F ∈ 𝔉∗ . By the maximality of 𝔉∗ , it must be the case
that 𝜋_{𝛼_i}⁻¹(U_{𝛼_i}) ∈ 𝔉∗ . Since 𝔉∗ is closed under the formation of finite intersections,
U = ∩_{i=1}^{n} 𝜋_{𝛼_i}⁻¹(U_{𝛼_i}) ∈ 𝔉∗ . In particular, U ∩ F ≠ ∅ for every F ∈ 𝔉∗ .

Example 5. (the box topology). Let {(X𝛼 , 𝒯𝛼 )}𝛼∈I be a collection of topological
spaces, and let X = ∏𝛼∈I X𝛼 be the Cartesian product of the underlying sets.
Consider the topology 𝒯 whose open subbase is

𝔖 = {∏𝛼∈I U𝛼 ∶ U𝛼 ∈ 𝒯𝛼 }.

We alluded to this topology in the section preamble. It is a well-known topology,
although it is more intellectually curious than practically important. When each
of the spaces X𝛼 is compact and Hausdorff, the box topology is Hausdorff,
because it contains the product topology, which is Hausdorff by example 1.
However, the box topology is not compact, by problem 6 on section 5.8. In fact,
the product topology has the optimality feature of being the smallest Hausdorff
topology on X that admits the continuity of the projections, and the largest
topology on X that admits Tychonoff’s theorem.

We conclude this section by showing that not every compact Hausdorff space is
metrizable.

Example 6. Let I be an uncountable set and, for each 𝛼 ∈ I, let X𝛼 = [0, 1]. The
space X = [0, 1]I = ∏𝛼∈I X𝛼 is compact by Tychonoff ’s theorem, and Hausdorff
by example 1. We show that X is not metrizable.

Let 0 denote the zero function from I to [0, 1], and let A consist of all elements
x = (x𝛼 ) ∈ X such that x𝛼 = 0 for finitely many 𝛼 ∈ I, and x𝛼 = 1 otherwise. We
show that 0 ∈ Ā, the closure of A. Suppose B = ∩_{i=1}^{n} 𝜋_{𝛼_i}⁻¹(U_{𝛼_i}) is a basic open neighborhood of 0.
The element x = (x𝛼 ) defined below is in A ∩ B, and hence A ∩ B ≠ ∅:

x𝛼 = 0 if 𝛼 ∈ {𝛼_1 , … , 𝛼_n },
x𝛼 = 1 if 𝛼 ∉ {𝛼_1 , … , 𝛼_n }.

We show that, for any sequence x^(n) = (x𝛼^(n)) in A, lim_n x^(n) ≠ 0. The proof will
be complete by theorem 4.2.5. Let I_n be the set of elements 𝛼 ∈ I for which
x𝛼^(n) = 0. The set J = ∪_{n=1}^{∞} I_n is countable since each of the sets I_n is finite. Because
I is uncountable, I − J ≠ ∅. Pick an element 𝛽 ∈ I − J. By construction, x𝛽^(n) = 1
for all n ∈ ℕ. Now consider the open set V = 𝜋𝛽⁻¹([0, 1/2)); V is a neighborhood
of 0 that contains no terms of the sequence (x^(n)).

Exercises

1. Let {X𝛼 }𝛼 be a collection of topological spaces, and let A𝛼 ⊆ X𝛼 . Prove that
the closure of ∏𝛼 A𝛼 in the product space is ∏𝛼 Ā𝛼 , where Ā𝛼 denotes the
closure of A𝛼 in X𝛼 .
2. Prove theorem 5.12.3.
3. Prove that the product of an arbitrary collection of regular spaces is regular.
4. Prove that the product of a countable collection of separable spaces is
separable.
5. Let {X𝛼 }𝛼 be a family of topological spaces, and let X = ∏𝛼 X𝛼 . Prove that
a sequence (x^(n)) in X converges to x ∈ X if and only if, for each 𝛼, 𝜋𝛼 (x^(n))
converges to 𝜋𝛼 (x).
6. Let {X𝛼 }𝛼 be a collection of topological spaces, and let f be a function from a
topological space Y to the product space ∏𝛼 X𝛼 . Prove that f is continuous
if and only if each of the compositions 𝜋𝛼 ∘ f ∶ Y → X𝛼 is continuous.
7. Let {X𝛼 }𝛼∈I and {Y𝛼 }𝛼∈I be two collections of topological spaces, and,
for each 𝛼 ∈ I, let f𝛼 ∶ X𝛼 → Y𝛼 be continuous. Define a function F ∶
∏𝛼 X𝛼 → ∏𝛼 Y𝛼 by F(x) = ( f𝛼 (x𝛼 ))𝛼∈I . Prove that F is continuous.
8. Prove that the product of a countable collection of metrizable spaces is
metrizable. Hint: Let {(Xn , dn )} be a countable collection of metric spaces.
Without loss of generality, assume that each of the metrics dn is bounded
by 1 (see theorem 4.3.9). Prove that the metric

D(x, y) = sup_{n∈ℕ} dn (xn , yn )/n

induces the product topology on ∏_{n=1}^{∞} Xn . Here x = (xn ) and y = (yn ) are
elements of ∏_{n=1}^{∞} Xn .
9. In the notation of the previous exercise, prove that the metric

d(x, y) = ∑_{n=1}^{∞} 2^{−n} dn (xn , yn )

also induces the product topology on ∏_{n=1}^{∞} Xn .
10. In the notation of problem 8, prove that if dn is a complete metric for every
n ∈ ℕ, then D is a complete metric.
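
The two metrics in problems 8 and 9 can be explored numerically. The following Python sketch is only an illustration under stated assumptions: every factor is taken to be [0, 1] with the metric dn (s, t) = |s − t|, points are modelled as functions n ↦ xn , and the supremum and the series are truncated after N terms (the neglected terms are at most 1/(N + 1) and 2^{−N}, respectively). The helper names D_trunc, d_trunc, and tail_point are hypothetical.

```python
def D_trunc(x, y, N):
    """Truncated version of D(x, y) = sup_n d_n(x_n, y_n) / n, using n = 1..N."""
    return max(abs(x(n) - y(n)) / n for n in range(1, N + 1))


def d_trunc(x, y, N):
    """Truncated version of d(x, y) = sum_n 2^(-n) d_n(x_n, y_n), using n = 1..N."""
    return sum(abs(x(n) - y(n)) / 2 ** n for n in range(1, N + 1))


def tail_point(k):
    """The point whose coordinates are 0 for n <= k and 1 for n > k."""
    return lambda n: 1.0 if n > k else 0.0


zero = lambda n: 0.0          # the point with all coordinates equal to 0
N = 60
ks = (1, 5, 25)

# tail_point(k) converges to zero coordinatewise as k grows.
D_vals = [D_trunc(zero, tail_point(k), N) for k in ks]
d_vals = [d_trunc(zero, tail_point(k), N) for k in ks]
# The unweighted supremum of the coordinate distances does not detect this convergence.
sup_vals = [max(abs(zero(n) - tail_point(k)(n)) for n in range(1, N + 1)) for k in ks]

print(D_vals)
print(d_vals)
print(sup_vals)
```

Both weighted distances shrink as k grows, whereas the unweighted supremum stays at 1. This suggests why the weights 1/n and 2^{−n} are needed for the metric to match coordinatewise convergence.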

6
Banach Spaces

Mathematics is the most beautiful and most powerful creation of the human
spirit.
Stefan Banach

Stefan Banach. 1892–1945

In 1902 Banach began his secondary education at the Henryk Sienkiewicz
Gymnasium in Kraków,¹ where he graduated in 1910. He then went to Lvov
where he studied engineering at Lvov Technical University, graduating in 1914,
shortly before World War I broke out in August. With the outbreak of the war,
the Russian troops occupied the city of Lvov. Having poor vision in his left eye,
Banach was not physically fit for army service. During the war, he worked building
roads but also spent time in Kraków, where he earned money by teaching in the
local schools. He also attended mathematics lectures at the Jagiellonian University
in Kraków.

A life-changing event occurred in the spring of 1916 when Banach met Steinhaus,
who was living in Kraków, waiting to take up a post at the Jan Kazimierz University
in Lvov. Steinhaus and Banach wrote a joint paper, which was published in The
Bulletin of the Kraków Academy after the war ended in 1918. From that time,
Banach started to produce important mathematics papers at a rapid rate. On
Steinhaus’s initiative, the Mathematical Society of Kraków was set up in 1919. The
society later became the Polish Mathematical Society in 1920. It was also through
Steinhaus that Banach met his future wife, Lucja Braus, whom he married in 1920.

¹ A European secondary school that prepares students for the university.

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0006

Banach was offered an assistantship to Lomnicki at Lvov Technical University in
1920. He lectured there in mathematics and submitted a dissertation. This was,
of course, not the standard route to a doctorate, for Banach had no university
mathematics qualifications. However, an exception was made to allow him to
submit his thesis “On Operations on Abstract Sets and their Application to Integral
Equations.” This thesis is sometimes said to mark the birth of functional analysis.
In his dissertation, Banach defined axiomatically what today is called a Banach
space, a term which was coined later by Fréchet. The importance of Banach’s work
is that he developed a systematic theory of functional analysis, where before there
had only been isolated results, which were later seen to fit into the new theory.

In 1922 the Jan Kazimierz University in Lvov awarded Banach his qualification to
become a university professor, and in 1924 Banach was promoted to full professor.
The years between the wars were extremely busy for Banach. As well as continuing
to produce a stream of important papers, he wrote arithmetic, geometry, and
algebra texts for high schools. In 1929, together with Steinhaus, he started a
new journal, Studia Mathematica, and Banach and Steinhaus became the first
editors. Another important publishing venture, begun in 1931, was a new series
of mathematical monographs. These were set up under the editorship of Banach
and Steinhaus, from Lvov, and Knaster, Kuratowski, Mazurkiewicz, and Sierpiński
from Warsaw. The first volume in the series, Théorie des opérations linéaires, was
written by Banach and appeared in 1932. It was a French version of a volume
he originally published in Polish in 1931 and quickly became a classic. Another
important influence on Banach was the fact that Kuratowski was appointed to
the Lvov Technical University in 1927 and worked there until 1934. Banach
collaborated with Kuratowski, and they wrote some joint papers during this
period. Banach proved a number of fundamental results on normed linear spaces,
including the Hahn-Banach theorem, the Banach-Steinhaus theorem, the Banach-
Alaoglu theorem, Banach’s open mapping theorem, and the Banach fixed point
theorem. In addition, he contributed to measure theory, integration, topological
vector spaces, and set theory.

In 1939, just before the start of World War II, Banach was elected as President
of the Polish Mathematical Society. At the beginning of the war, Soviet troops
occupied Lvov. Banach had been on good terms with the Soviet mathematicians
before the war started, visiting Moscow several times, and he was treated well by
the new Soviet administration. He was allowed to continue to hold his chair at
the university, and he became Dean of the Faculty of Science at the university,
now renamed the Ivan Franko University. Life at this stage was little changed for
Banach, who continued his research, his textbook writing, lecturing, and holding
sessions in cafés. Sobolev and Alexandroff visited Banach in Lvov in 1940, and
Banach attended conferences in the Soviet Union. He was in Kiev when Germany
invaded the Soviet Union, and he returned immediately to his family in Lvov.

The Nazi occupation of Lvov in June 1941 meant that Banach lived under very
difficult conditions. He was arrested under suspicion of trafficking in German
currency but was released after a few weeks. As soon as the Soviet troops retook
Lvov, Banach renewed his contacts. He met Sobolev outside Moscow, but by this
time he was seriously ill. Sobolev, giving an address at a memorial conference for
Banach, said of this meeting:²

Despite heavy traces of the war years under German occupation, and despite the
grave illness that was undercutting his strength, Banach’s eyes were still lively.
He remained the same sociable, cheerful, and extraordinarily well-meaning and
charming Stefan Banach whom I had seen in Lvov before the war. That is how he
remains in my memory: with a great sense of humor, an energetic human being, a
beautiful soul, and a great talent.

Banach had planned to go to Kraków after the war to take up the chair of
mathematics at the Jagiellonian University, but he died in Lvov in 1945 of lung
cancer.

6.1 Finite vs. Infinite-Dimensional Spaces

This section draws some sharp distinctions between finite and infinite-
dimensional spaces. Although some of the results in this section have intrinsic
importance and will be used later in the book, they are collected here to convince
the reader that infinite-dimensional spaces are truly vast compared to finite-
dimensional ones and that a very different set of tools is needed for studying
them. Among other results, we will see that local compactness characterizes finite-
dimensional normed linear spaces, and that an infinite-dimensional Banach space
cannot have a countable linear basis.

Definition. A Banach space is a complete normed linear space.

Examples of Banach spaces include (𝕂ⁿ, ‖.‖₂), (𝒞[0, 1], ‖.‖∞ ), and all lᵖ spaces.

² J. J. O’Connor and E. F. Robertson, “Stefan Banach,” in MacTutor History of Mathematics,
(St Andrews: University of St Andrews, 1998), http://mathshistory.st-andrews.ac.uk/Biographies/Banac/,
accessed Nov. 1, 2020.

Lemma 6.1.1. Let X be an n-dimensional vector space. Then there exists a norm
‖.‖∗ on X such that (X, ‖.‖∗ ) is isometric to (𝕂ⁿ, ‖.‖∞ ). In particular, (X, ‖.‖∗ ) is
complete and locally compact.

Proof. Fix a basis {x1 , … , xn } of X, and define ‖x‖∗ = max_{1≤i≤n} |ai |, where x =
∑_{i=1}^{n} ai xi is the unique representation of x as a linear combination of the basis
elements. The mapping T ∶ x ↦ (a1 , … , an ) is clearly a linear isometry from
(X, ‖.‖∗ ) onto (𝕂ⁿ, ‖.‖∞ ).

Theorem 6.1.2. Let (X, ‖.‖) be an n-dimensional normed linear space, and let ‖.‖∗
be the norm on X defined in lemma 6.1.1. Then there exist positive constants 𝛼
and 𝛽 such that, for all x ∈ X, 𝛽‖x‖∗ ≤ ‖x‖ ≤ 𝛼‖x‖∗ .

Proof. We continue to use the notation of the proof of the previous lemma.
Let 𝛼 = n max_{1≤i≤n} ‖xi ‖. Then

‖x‖ = ‖∑_{i=1}^{n} ai xi ‖ ≤ ∑_{i=1}^{n} |ai |‖xi ‖ ≤ max_{1≤i≤n} ‖xi ‖ ∑_{i=1}^{n} |ai |
    ≤ n max_{1≤i≤n} ‖xi ‖ max_{1≤i≤n} |ai | = 𝛼‖x‖∗ .

To prove the other inequality, define a function 𝜆 ∶ (X, ‖.‖∗ ) → ℝ by 𝜆(x) = ‖x‖.
Now 𝜆 is continuous because if lim_n ‖xn − x‖∗ = 0, then |𝜆(xn ) − 𝜆(x)| = |‖xn ‖ −
‖x‖| ≤ ‖xn − x‖ ≤ 𝛼‖xn − x‖∗ . Hence 𝜆(xn ) → 𝜆(x). By lemma 6.1.1, (X, ‖.‖∗ ) is
locally compact; hence the closed unit sphere S = {x ∈ X ∶ ‖x‖∗ = 1} in (X, ‖.‖∗ )
is compact (see problem 9 on section 4.7). Thus the restriction of 𝜆 to S assumes a
minimum value 𝛽 = 𝜆(x0 ) at some point x0 ∈ S. The constant 𝛽 must be positive
since, otherwise, 𝜆(x0 ) = ‖x0 ‖ = 0, and hence x0 = 0, which is not possible. We
have shown that, for every x ∈ S, ‖x‖ ≥ 𝛽. Now, for a nonzero vector x ∈ X, x/‖x‖∗ ∈
S; hence ‖ x/‖x‖∗ ‖ ≥ 𝛽, and ‖x‖ ≥ 𝛽‖x‖∗ .
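
The constants 𝛼 and 𝛽 of theorem 6.1.2 can be computed in a small case. The Python sketch below is an illustration under assumptions that are not in the text: X = ℝ² with the Euclidean norm, and the basis x1 = (1, 0), x2 = (1, 1). The constant 𝛼 is taken from the formula in the proof, and 𝛽 is estimated by minimizing the norm over a grid on the unit sphere of ‖.‖∗ , which here is the boundary of the square max(|a1 |, |a2 |) = 1. All names in the code are hypothetical.

```python
import math
import random

basis = [(1.0, 0.0), (1.0, 1.0)]     # assumed basis x_1, x_2 of R^2


def vec(a):
    """The vector a_1 x_1 + a_2 x_2 with coordinates a = (a_1, a_2)."""
    return tuple(sum(ai * xi[j] for ai, xi in zip(a, basis)) for j in range(2))


def norm(a):
    """Euclidean norm of the vector whose coordinates in the basis are a."""
    return math.hypot(*vec(a))


def star(a):
    """The norm ||x||* = max |a_i| from lemma 6.1.1."""
    return max(abs(ai) for ai in a)


n = len(basis)
alpha = n * max(math.hypot(*xi) for xi in basis)   # alpha = n * max ||x_i||

# Grid on the sphere {a : max(|a_1|, |a_2|) = 1}, the boundary of a square.
m = 1000
grid = [-1.0 + 2.0 * k / m for k in range(m + 1)]
sphere = [(s, t) for t in grid for s in (-1.0, 1.0)] + \
         [(t, s) for t in grid for s in (-1.0, 1.0)]
beta = min(norm(a) for a in sphere)                # estimate of the minimum on the sphere

# Spot-check beta * ||x||* <= ||x|| <= alpha * ||x||* on random coordinate vectors.
random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
ok = all(beta * star(a) - 1e-12 <= norm(a) <= alpha * star(a) + 1e-12 for a in samples)
print(alpha, beta, ok)
```

For this particular basis the minimum on the sphere is attained at the coordinates (1, −1/2), so the estimate of 𝛽 agrees with 1/√2. In general a grid search yields only an approximation of 𝛽.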

Corollary 6.1.3. All norms on a finite-dimensional normed linear space are
equivalent.

Proof. By theorem 6.1.2, an arbitrary norm on a finite-dimensional space is
equivalent to ‖.‖∗ ; hence any two norms are equivalent.

Theorem 6.1.4. A finite-dimensional proper subspace F of a normed linear space X
is closed and nowhere dense.

Proof. By lemma 6.1.1, F is complete and hence closed in X. To show that F is
nowhere dense, let {x1 , … , xn } be a basis for F, and let x ∈ X − F. If F contained
a ball of radius 𝛿, then it would contain the ball B of radius 𝛿 centered at
the origin. But then B (hence F) would contain a nonzero multiple of x, namely, 𝛿x/(2‖x‖),
and therefore x itself, because F is a subspace. This
contradiction shows that F is nowhere dense in X.

Example 1. Let F be a finite-dimensional subspace of a normed linear space X,
and let x ∈ X − F. Then there exists a point z ∈ F such that ‖x − z‖ = dist(x, F).
Let B = B[x, r] be a closed ball centered at x, and assume r is large enough so
that B ∩ F ≠ ∅. The set K = B ∩ F is a closed and bounded subset of F. By the
Heine-Borel theorem (see problem 6 at the end of this section), K is compact.
The function f ∶ K → ℝ given by f (y) = ‖x − y‖ is continuous and positive.
Therefore d = min{ f ( y) ∶ y ∈ K } is positive and is attained at some point z ∈ K.
Since d ≤ r and ‖x − y‖ > r for every vector y ∈ F − B, ‖x − z‖ = dist(x, F).

Example 2. The following is a direct application of the previous example. Take
X = 𝒞[0, 1], and F = ℙn . For any function f ∈ X, there is a polynomial p∗n ∈ ℙn
such that ‖ f − p∗n ‖∞ = dist( f, ℙn ).

The polynomial p∗n is the best approximation of f in ℙn . It can be shown that p∗n is
unique. Observe that p∗n can have degree less than n.

Example 3. For a function f ∈ 𝒞[0, 1], the sequence of best approximations p∗n
converges uniformly to f.
Let 𝜖 > 0. By the Weierstrass polynomial approximation theorem, there exists
a polynomial q such that ‖ f − q‖∞ < 𝜖. Let N be the degree of q. Then, for
every n > N, q ∈ ℙn . Since p∗n is the best approximation of f in ℙn , ‖ f − p∗n ‖∞ ≤
‖ f − q‖∞ < 𝜖. This shows that limn ‖ f − p∗n ‖∞ = 0. 

The following theorem establishes the fact that local compactness is exclusively a
property of finite-dimensional spaces.

Theorem 6.1.5. A normed linear space X is locally compact if and only if it is finite
dimensional.

Proof. Finite-dimensional spaces are locally compact by lemma 6.1.1 and
corollary 6.1.3. Now suppose X is a locally compact normed linear space. Thus the
closed unit ball B = {x ∈ X ∶ ‖x‖ ≤ 1} is compact. Since B ⊆ ∪_{x∈B} B(x, 1/2), B ⊆
∪_{i=1}^{n} B(xi , 1/2) for a finite subset {x1 , … , xn } of B. We will show that {x1 , … , xn }
spans X. Let F = Span{x1 , … , xn }, and suppose, for a contradiction, that there
is a vector x ∈ X − F. By example 1, there is an element z ∈ F which is closest
to x, and d = dist(x, F) = ‖x − z‖ > 0. Now (x − z)/‖x − z‖ ∈ B, so there is an element
xi ∈ B such that ‖(x − z)/‖x − z‖ − xi ‖ < 1/2, and ‖x − z − ‖x − z‖xi ‖ ≤ ‖x − z‖/2 = d/2.
But z + ‖x − z‖xi ∈ F, so ‖x − z − ‖x − z‖xi ‖ ≥ d. This contradiction concludes
the proof.

Remark. The last theorem implies that compact subsets of an infinite-dimensional
space have empty interiors and are therefore thought of as rather thin and scarce
sets. However, compact sets continue to play an important role in the study of
infinite-dimensional spaces. The Hilbert cube is an example of a rather exotic
compact set.

Example 4. The Hilbert cube ℋ is a compact, convex, nowhere dense subset of
l². Furthermore, Span(ℋ) is dense in l².

It is simple to show that ℋ is closed and convex. Once we show that ℋ is
compact, the above remark implies that it has an empty interior.

Let (x^k)_{k=1}^{∞} be a sequence in ℋ, and write x^k = (x_1^k , x_2^k , …). The sequence
(x_1^k)_{k=1}^{∞} is bounded by 1, so there exists a strictly increasing sequence (k_{1,p})_{p=1}^{∞}
of positive integers such that x_1 = lim_{p→∞} x_1^{k_{1,p}} exists. Since the sequence (x_2^{k_{1,p}})
is bounded by 1/2, it contains a convergent subsequence (x_2^{k_{2,p}})_{p=1}^{∞}. Let x_2 =
lim_p x_2^{k_{2,p}}. Continue inductively to find sequences (k_{n,p})_{p=1}^{∞} of positive integers
such that, for each n ≥ 1, (k_{n+1,p}) is a subsequence of (k_{n,p}) and x_n =
lim_p x_n^{k_{n,p}} exists. Consider the diagonal sequence (k_{p,p})_{p=1}^{∞}, and observe that,
apart from its first n − 1 terms, it is a subsequence of (k_{n,p})_{p=1}^{∞} for every n ≥ 1;
thus lim_p x_n^{k_{p,p}} = x_n . Let x = (x_n ), and
observe that |x_n | = lim_p |x_n^{k_{p,p}}| ≤ 1/n; thus x ∈ ℋ. We claim that lim_p x^{k_{p,p}} = x in
l². For simplicity of notation, write y^p = x^{k_{p,p}}. Let 𝜖 > 0, and choose an integer N
such that ∑_{n=N+1}^{∞} 1/n² < 𝜖²/8. For n = 1, 2, … , N, lim_p y_n^p = x_n ; thus there exists an
integer P such that, for n = 1, … , N, p > P implies that |y_n^p − x_n |² < 𝜖²/(2N). Hence,
for p > P,

∑_{n=1}^{N} |y_n^p − x_n |² < 𝜖²/2. Also,
∑_{n=N+1}^{∞} |y_n^p − x_n |² ≤ ∑_{n=N+1}^{∞} (2/n)² = 4 ∑_{n=N+1}^{∞} 1/n² < 𝜖²/2.

These last two inequalities imply that, for p > P, ‖y^p − x‖₂ < 𝜖.

Finally, we show that Span(ℋ) is dense in l². Let x = (x_n ) ∈ l², and let 𝜖 > 0.
Choose an integer N such that ∑_{n=N+1}^{∞} |x_n |² < 𝜖². Define y_1 = (1, 0, 0, …),
y_2 = (0, 1/2, 0, 0, …), … , y_N = (0, 0, … , 0, 1/N, 0, …), and set h = ∑_{n=1}^{N} a_n y_n ,
where a_n = n x_n . Clearly, h ∈ Span(ℋ) and ‖x − h‖₂ < 𝜖.
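
The density argument can be traced for a specific vector. The Python sketch below is a hedged illustration and rests on assumptions that are not in the text: the vector is x_n = n^{−3/4}, which is square-summable but does not lie in ℋ, and the tail of the series ∑ x_n² is controlled by the integral estimate ∑_{n>N} n^{−3/2} ≤ 2/√N. The names x, tail_bound, a, and h are hypothetical.

```python
import math


def x(n):
    """x_n = n^(-3/4): square-summable, but |x_n| > 1/n for n >= 2, so x is not in the cube."""
    return n ** -0.75


def tail_bound(N):
    """Upper bound for sum_{n>N} x_n^2 = sum_{n>N} n^(-3/2) via the integral test."""
    return 2.0 / math.sqrt(N)


eps = 0.5
# Choose N so that the tail of the series is below eps^2 (here 2/sqrt(N) < eps^2).
N = math.floor(4 / eps ** 4) + 1

# Coefficients a_n = n * x_n; since y_n = (1/n) e_n, h = sum a_n y_n has coordinates a_n / n.
a = [n * x(n) for n in range(1, N + 1)]
h = [a[n - 1] / n for n in range(1, N + 1)]

# h agrees with x in the first N coordinates, so ||x - h||^2 is exactly the tail of the series.
max_coordinate_error = max(abs(h[n - 1] - x(n)) for n in range(1, N + 1))
partial_tail = sum(x(n) ** 2 for n in range(N + 1, 100001))   # finite piece of the tail
print(N, max_coordinate_error, partial_tail, tail_bound(N), eps ** 2)
```

The coefficients a_n grow with n even though h stays close to x. This is consistent with h being a linear combination of elements of ℋ rather than an element of ℋ.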

Two very useful tools for studying finite-dimensional spaces are local compact-
ness and the existence of a finite linear basis. We already saw in theorem 6.1.5
that infinite-dimensional normed linear spaces are never locally compact. By
definition, an infinite-dimensional space cannot have a finite Hamel basis. The
following theorem should thoroughly convince the reader that a Hamel basis is
of no practical use as a tool for studying infinite-dimensional Banach spaces.
However, see the concept of a Schauder basis in the exercises following this section
and the next section.

Theorem 6.1.6. An infinite-dimensional Banach space does not have a countable
Hamel basis.

Proof. Let X be an infinite-dimensional Banach space, and let B = {x1 , x2 , …} be
a countably infinite independent subset of X. The finite-dimensional spaces
Fn = Span{x1 , … , xn } are closed and nowhere dense in X. Baire’s theorem implies
that ∪_{n=1}^{∞} Fn ≠ X. Since Span(B) = ∪_{n=1}^{∞} Fn , Span(B) ≠ X. Therefore no countable
subset of X spans X.

The following result will be used frequently later in the book. The motivation for
the theorem is provided below.
Let M be a proper subspace of ℝn . It is an elementary fact of linear algebra (see
problem 7 on section 3.7) that there is a unit vector x orthogonal to M. In this case,
dist(x, M) = 1.
Generalizing this result to Banach spaces is more challenging because we lack
the concept of orthogonality, which is available only for inner product spaces.
The result below provides the next best alternative to the desirable property that
dist(x, M) = 1; we can pick a unit vector x whose distance from M is arbitrarily
close to 1.

Lemma 6.1.7 (Riesz’s lemma). Let M be a closed proper subspace of a normed
linear space X, and let 0 < 𝜃 < 1. Then there exists a unit vector x such that
dist (x, M) > 𝜃.

Proof. Let v ∈ X − M, and let 𝛿 = dist(v, M). Since M is closed, 𝛿 > 0. Since 𝜃 < 1,
there exists y0 ∈ M such that 𝛿 ≤ ‖v − y0 ‖ < 𝛿/𝜃. Define x = (v − y0 )/‖v − y0 ‖.
For y ∈ M, y0 + ‖v − y0 ‖y ∈ M and ‖v − (y0 + ‖v − y0 ‖y)‖ ≥ 𝛿. Now

‖x − y‖ = ‖(v − y0 )/‖v − y0 ‖ − y‖ = (1/‖v − y0 ‖) ‖v − y0 − ‖v − y0 ‖y‖
        = (1/‖v − y0 ‖) ‖v − (y0 + ‖v − y0 ‖y)‖ ≥ 𝛿/‖v − y0 ‖ > 𝛿/(𝛿/𝜃) = 𝜃.

Since the lower bound 𝛿/‖v − y0 ‖ does not depend on y, dist(x, M) ≥ 𝛿/‖v − y0 ‖ > 𝜃.
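
The construction in the proof can be followed step by step in a toy case. The Python sketch below is an illustration only, under assumptions that are not in the text: X = ℝ² with the norm ‖(p, q)‖₁ = |p| + |q|, M = Span{(1, 1)}, v = (1, 0), 𝜃 = 0.8, and a deliberately non-optimal choice y0 = (−0.1, −0.1). Distances to M are approximated by a grid search over multiples of (1, 1). The names norm1 and dist_to_M are hypothetical.

```python
def norm1(p):
    """The norm |p_1| + |p_2| on R^2."""
    return abs(p[0]) + abs(p[1])


def dist_to_M(p, steps=3000, scale=1000):
    """Approximate dist(p, M) for M = span{(1, 1)} by a grid search over t in [-3, 3]."""
    return min(norm1((p[0] - k / scale, p[1] - k / scale))
               for k in range(-steps, steps + 1))


theta = 0.8
v = (1.0, 0.0)                      # a vector outside M
delta = dist_to_M(v)                # delta = dist(v, M)

y0 = (-0.1, -0.1)                   # an element of M with delta <= ||v - y0|| < delta / theta
w = (v[0] - y0[0], v[1] - y0[1])    # w = v - y0
x = (w[0] / norm1(w), w[1] / norm1(w))   # the unit vector from the proof

dist_x = dist_to_M(x)
print(delta, norm1(w), norm1(x), dist_x)
```

In this toy case the computed distance from x to M agrees with 𝛿/‖v − y0 ‖, the lower bound obtained in the proof, and it exceeds 𝜃.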

Exercises

1. (a) Prove that a sequence (xn ) in a normed linear space X is Cauchy if and
only if lim_n (x_{p_n} − x_{q_n}) = 0 for every pair (pn ) and (qn ) of increasing
sequences of positive integers.
(b) Show that if lim_n xn = x, then lim_n (1/n) ∑_{i=1}^{n} xi = x.
2. Let w be a fixed positive function in 𝒞[0, 1]. For f ∈ 𝒞[0, 1], define
‖ f ‖ = ‖ fw‖∞ . Prove that ‖.‖ is a norm, and determine if it is equivalent
to the uniform norm on 𝒞[0, 1].
3. Let X be a normed linear space. Prove that X is a Banach space if and only
if the closed unit ball in X is complete.
4. Let X be a normed linear space. Prove that X is separable if and only if the
closed unit sphere in X is separable.
5. Let (xn ) be a sequence in a Banach space X such that, for every 𝜖 > 0, there
exists a convergent sequence (yn ) in X such that ‖xn − yn ‖ < 𝜖 for all n ∈ ℕ.
Prove that (xn ) is convergent.
6. The Heine-Borel theorem. Let V be a finite-dimensional subspace of a
normed linear space X. Show that a subset K of V is compact if and only
if it is closed and bounded.
7. Let X be an infinite-dimensional normed linear space. Show that X contains
a compact countable subset that is not contained in any finite-dimensional
subspace of X. Hint: Let {x1 , x2 , …} be an infinite independent subset of X,
and let 𝜉n = xn /(n‖xn ‖). Consider the set {𝜉n } ∪ {0}.
8. Prove that, for 1 ≤ p < ∞, the linear dimension of lp is 𝔠. Hint: For each
0 < 𝜆 < 1, let x𝜆 = (𝜆, 𝜆2 , 𝜆3 , ...). Show that the set {x𝜆 } is independent, then
use example 7 on section 4.5.

Definition. Let (xn ) be a sequence in a normed linear space X. We say that
the series ∑_{n=1}^{∞} xn converges if the sequence of partial sums Sn = ∑_{i=1}^{n} xi
converges to an element x ∈ X. We write x = ∑_{n=1}^{∞} xn for lim_n Sn and say
that x is the sum of the series. We say that ∑_{n=1}^{∞} xn is absolutely convergent
if ∑_{n=1}^{∞} ‖xn ‖ < ∞.

9. Prove that if X is a Banach space, then every absolutely convergent series is
convergent.
10. Prove that if every absolutely convergent series in a normed linear space
X is convergent, then X is a Banach space. The proof outline is as follows.
Let (xn ) be a Cauchy sequence in X. It is enough to show that (xn ) contains
a convergent subsequence. Choose a subsequence (x_{n_k}) of (xn ) such that
‖x_{n_{k+1}} − x_{n_k}‖ < 2^{−k}. Define yk = x_{n_{k+1}} − x_{n_k}. Show that ∑_{k=1}^{∞} ‖yk ‖ < ∞. By
assumption, ∑_{k=1}^{∞} yk converges. But ∑_{k=1}^{∞} yk = −x_{n_1} + lim_k x_{n_k}.

Definition. Let X be a normed linear space. A countable subset {un } of X
is a Schauder basis for X if every element x ∈ X can be expressed uniquely
as x = ∑_{n=1}^{∞} an un , where an ∈ 𝕂.

11. Prove that a Schauder basis is independent.
12. Prove that if X has a Schauder basis then it is separable.
13. Find a Schauder basis for lᵖ, 1 ≤ p < ∞.
14. Prove that if M is a subspace of a normed linear space X, then the closure M̄
of M is also a subspace of X.
15. Let M be a closed subspace of a normed linear space X, and let x ∈ X. Prove
that dist(x, M) ≤ ‖x‖.
16. Use Riesz’s lemma to produce another proof that an infinite-dimensional
normed linear space is not locally compact.

6.2 Bounded Linear Mappings

For a linear transformation between normed linear spaces, boundedness and
continuity are equivalent (theorem 6.2.1), and the two terms are used synonymously.
Every linear transformation on a finite-dimensional
space is continuous. The picture is far more complicated for linear transformations
on infinite-dimensional spaces. In this chapter and the next, we study continuous
linear transformations exclusively because nonlinear transformations and
discontinuous linear transformations fall outside the realm of beginning linear functional
analysis.
In this section, we study the various equivalent characterizations of bounded-
ness, the space of bounded linear transformations on a normed linear space, and
the dual space in particular. The section concludes with a typical representation
theorem, which gives a concrete description of the dual of a normed linear space.
Throughout this section, X and Y are normed linear spaces.

Definition. A linear mapping T ∶ X → Y is said to be bounded if there exists a
constant M > 0 such that for every x ∈ X,

‖T(x)‖ ≤ M‖x‖.

Theorem 6.2.1. Let T ∶ X → Y be linear. The following are equivalent:


(a) T is continuous.
(b) T is continuous at one point x0 ∈ X.
(c) T is continuous at 0.
(d) T is bounded.

Proof. (a) implies (b), obviously.

(b) implies (c). Let x_n → 0 in X. Then x_n + x_0 converges to x_0. By assumption,
lim_n T(x_n + x_0) = T(x_0). But lim_n T(x_n + x_0) = lim_n T(x_n) + T(x_0); hence
lim_n T(x_n) = 0.

(c) implies (d). Suppose T is not bounded. Then, for every n ∈ ℕ, there exists
x_n ∈ X such that ‖T(x_n)‖ > n‖x_n‖. Let 𝜉_n = x_n/(n‖x_n‖). Then lim_n 𝜉_n = 0 in X, but
‖T(𝜉_n)‖ = ‖T(x_n)‖/(n‖x_n‖) > 1. Thus, lim_n T(𝜉_n) ≠ 0 in Y, and T is not continuous at 0.

(d) implies (a). Suppose that there is a constant M > 0 such that, for every x ∈ X,
‖T(x)‖ ≤ M‖x‖, and limn xn = x in X. Then

limn ‖T(xn ) − T(x)‖ = limn ‖T(xn − x)‖ ≤ limn M‖xn − x‖ = 0. 

Let T ∶ X → Y be a bounded linear mapping. The norm of T is

‖T‖ = sup_{x≠0} ‖T(x)‖/‖x‖.

Notice that since T is bounded, there is a constant M > 0 such that ‖T(x)‖ ≤ M‖x‖
for all x ∈ X; therefore ‖T(x)‖/‖x‖ ≤ M for all x ≠ 0, and hence ‖T‖ is finite. It also
follows directly from the definition that ‖T(x)‖ ≤ ‖T‖‖x‖.

Example 1. Let (c_n) be a bounded sequence and, for a sequence x = (x_1, x_2, ...) ∈
l^2, define T(x) = (c_1 x_1, c_2 x_2, ...). We claim that T is a bounded linear mapping
on l^2. Indeed, ‖T(x)‖_2^2 = ∑_{n=1}^∞ |c_n x_n|^2 ≤ ‖c‖_∞^2 ∑_{n=1}^∞ |x_n|^2 = ‖c‖_∞^2 ‖x‖_2^2. This
estimate shows that T(x) ∈ l^2 and that ‖T‖ ≤ ‖c‖_∞. The linearity of T is
obvious.
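The estimate in Example 1 can be checked numerically on truncated sequences. The sketch below is only an illustration under stated assumptions: the truncation length N, the choice c_n = sin n, and the helper names l2_norm and diagonal_operator are hypothetical and do not come from the text.

```python
import math
import random

def l2_norm(x):
    """Euclidean norm of a finite list, standing in for the l^2 norm."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def diagonal_operator(c, x):
    """Apply T(x) = (c_1 x_1, c_2 x_2, ...) to a finitely supported sequence."""
    return [cn * xn for cn, xn in zip(c, x)]

random.seed(0)
N = 200  # hypothetical truncation length
c = [math.sin(n) for n in range(1, N + 1)]  # a bounded sequence, |c_n| <= 1
c_sup = max(abs(cn) for cn in c)            # sup norm of the truncated c

# Ratios ||T(x)||_2 / ||x||_2 for random nonzero x.
ratios = []
for _ in range(100):
    x = [random.uniform(-1.0, 1.0) for _ in range(N)]
    ratios.append(l2_norm(diagonal_operator(c, x)) / l2_norm(x))

largest_ratio = max(ratios)
```

Every sampled ratio stays below c_sup, in agreement with the bound ‖T‖ ≤ ‖c‖_∞; the sampling says nothing about whether the bound is attained.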

Example 2. A bounded linear mapping T ∶ X → Y maps bounded sets into
bounded sets.
Let A be a bounded subset of X and suppose that ‖x‖ ≤ r for every x ∈ A.
Then ‖T(x)‖ ≤ ‖T‖‖x‖ ≤ ‖T‖r < ∞ for every x ∈ A.

Example 3. If X is finite dimensional, then every linear mapping T ∶ X → Y is
bounded.
Let {x_1, …, x_n} be a linear basis for X, and let M = max_{1≤i≤n} ‖T(x_i)‖. Since all
norms on X are equivalent, we may assume that the norm on X is the 1-norm.
Thus if x = ∑_{i=1}^n a_i x_i, then ‖x‖ = ∑_{i=1}^n |a_i|. Now ‖T(x)‖ = ‖T(∑_{i=1}^n a_i x_i)‖ ≤
∑_{i=1}^n |a_i|‖T(x_i)‖ ≤ M ∑_{i=1}^n |a_i| = M‖x‖.
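The bound ‖T(x)‖ ≤ M‖x‖ of Example 3 can be observed on a small concrete map. In the sketch below, the three image vectors, the use of the sup norm on Y = ℝ², and all function names are assumptions made for illustration only.

```python
import random

def one_norm(a):
    """1-norm of the coefficient vector (a_1, ..., a_n)."""
    return sum(abs(t) for t in a)

def sup_norm(v):
    """Sup norm on the target space R^2."""
    return max(abs(t) for t in v)

# Hypothetical images T(x_1), T(x_2), T(x_3) of the basis vectors.
images = [[1.0, -2.0], [0.5, 3.0], [-4.0, 1.0]]
M = max(sup_norm(v) for v in images)  # M = max_i ||T(x_i)||

def apply_T(a):
    """T(sum a_i x_i) = sum a_i T(x_i), computed coordinatewise."""
    return [sum(ai * v[j] for ai, v in zip(a, images)) for j in range(2)]

random.seed(1)
worst = 0.0
for _ in range(1000):
    a = [random.uniform(-5.0, 5.0) for _ in images]
    worst = max(worst, sup_norm(apply_T(a)) / one_norm(a))
```

The largest sampled ratio never exceeds M, which is the content of the estimate; the constant M depends on the chosen basis and norms.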

Example 4. If X is infinite dimensional, then there exists an unbounded linear
mapping from X to Y.

Fix a nonzero element y ∈ Y, and let S_1 = {x_1, x_2, ...} be an infinite independent
set of unit vectors in X. Let S_2 ⊆ X be such that S = S_1 ∪ S_2 is a linear basis for
X. Define a function T ∶ S → Y as follows:

T(x) = ny if x = x_n,
T(x) = 0 if x ∈ S_2.

Extend T to a linear mapping T ∶ X → Y. See theorem 3.4.4. Since S_1 is bounded
but T(S_1) is not, T is not bounded by example 2.

Theorem 6.2.2. Let T ∶ X → Y be a bounded linear mapping. Then

‖T‖ = sup_{‖x‖≤1} ‖T(x)‖ = sup_{‖x‖=1} ‖T(x)‖.

Proof. Let M = sup_{‖x‖≤1} ‖T(x)‖. For every x ∈ X, x ≠ 0, ‖T(x/‖x‖)‖ ≤ M, hence
‖T(x)‖/‖x‖ ≤ M. Thus ‖T‖ ≤ M. To prove that M ≤ ‖T‖, fix a vector x ∈ X such
that 0 < ‖x‖ ≤ 1. By definition of ‖T‖, ‖T‖ ≥ ‖Tx‖/‖x‖ ≥ ‖Tx‖. Since x is arbitrary,
it follows that M = sup_{‖x‖≤1} ‖Tx‖ ≤ ‖T‖, as desired.
The proof that ‖T‖ = sup_{‖x‖=1} ‖T(x)‖ is similar.

Theorem 6.2.3. Let X and Y be normed linear spaces, and let ℒ(X, Y) be the set
of all bounded linear mappings from X to Y. Then ℒ(X, Y) is a normed linear
space with the operations (T_1 + T_2)(x) = T_1(x) + T_2(x), (aT)(x) = aT(x), and
the norm ‖T‖ = sup_{x≠0} ‖T(x)‖/‖x‖. Furthermore, if Y is a Banach space, then so is
ℒ(X, Y).

Proof. We first show that a linear combination of bounded linear mappings is
bounded. Let a and b be scalars, and let T_1, T_2 ∈ ℒ(X, Y). Then, for every x ∈
X, ‖aT_1(x) + bT_2(x)‖ ≤ |a|‖T_1(x)‖ + |b|‖T_2(x)‖ ≤ |a|‖T_1‖‖x‖ + |b|‖T_2‖‖x‖ =
(|a|‖T_1‖ + |b|‖T_2‖)‖x‖. This shows that ‖aT_1 + bT_2‖ ≤ |a|‖T_1‖ + |b|‖T_2‖ and
that ℒ(X, Y) is closed under addition and scalar multiplication. Verifying the rest
of the axioms for a vector space is routine.
The above inequality can also be used to verify the defining properties of
a norm. For example, taking a = b = 1 gives ‖T1 + T2 ‖ ≤ ‖T1 ‖ + ‖T2 ‖. The
identity ‖aT‖ = |a|‖T‖ is obvious.
It remains to show that ℒ(X, Y) is complete if Y is complete. Suppose (Tn )
is a Cauchy sequence in ℒ(X, Y), and let 𝜖 > 0. By assumption, there exists
a positive integer N such that, for m, n > N, ‖Tn − Tm ‖ < 𝜖. For all x ∈ X,
‖Tn (x) − Tm (x)‖ = ‖(Tn − Tm )(x)‖ ≤ ‖Tn − Tm ‖‖x‖ ≤ 𝜖‖x‖. Thus (Tn (x)) is a
Cauchy sequence in Y, and, since Y is complete, limn Tn (x) exists for every x ∈ X. Define T(x) =
limn Tn (x). We show that T ∈ ℒ(X, Y). The linearity of T is straightforward;
if x, y ∈ X, and a and b are scalars, then T(ax + by) = limn Tn (ax + by) =
limn aTn (x) + bTn (y) = aT(x) + bT(y). To show that T is bounded, let 𝜖 =
1. There is a positive integer N such that, for m, n ≥ N, ‖Tn − Tm ‖ ≤ 1. In
particular, for all n ≥ N, ‖Tn (x) − TN (x)‖ ≤ ‖x‖. Hence, for all x ∈ X, and
all n ≥ N, ‖Tn (x)‖ ≤ ‖Tn (x) − TN (x)‖ + ‖TN (x)‖ ≤ ‖x‖ + ‖TN ‖‖x‖ = (1 +
‖TN ‖)‖x‖. Taking the limit as n → ∞, we obtain ‖T(x)‖ ≤ (1 + ‖TN ‖)‖x‖. Thus
‖T‖ ≤ (1 + ‖TN ‖). Finally, we show that limn ‖Tn − T‖ = 0. Let 𝜖 > 0, and let N
be such that ‖Tn − Tm ‖ < 𝜖 for all m, n > N. For all x ∈ X, ‖Tn (x) − Tm (x)‖ ≤
𝜖‖x‖. Taking the limit as m → ∞, we have ‖Tn (x) − T(x)‖ ≤ 𝜖‖x‖ for all x ∈ X,
and all n > N. Thus ‖Tn − T‖ ≤ 𝜖 for all n > N; hence limn Tn = T. 

An important special case of theorem 6.2.3 is the space ℒ(X, 𝕂) of all bounded
linear functionals from a normed linear space X to the base field. This space is
known as the dual space of X, and is denoted by X∗. Since 𝕂 is complete, X∗ is a
Banach space, even when X is not complete.

Another important special case of theorem 6.2.3 is the space ℒ(X) = ℒ(X, X)
of bounded linear transformations on a Banach space X. Elements of ℒ(X) are
also called bounded operators on X. The norm ‖T‖ = sup_{x≠0} ‖T(x)‖/‖x‖ is called, not
surprisingly, the operator norm on ℒ(X).

Example 5. Let ‖.‖ and ‖.‖′ be norms on a vector space X. Then ‖.‖ and ‖.‖′ are
equivalent if and only if there exist positive constants k1 and k2 such that, for
every x ∈ X, k1 ‖x‖ ≤ ‖x‖′ ≤ k2 ‖x‖. Note the contrast between this result and
exercise 12 in section 4.3.
If k1 and k2 exist, the two norms are equivalent by theorem 4.3.9. Conversely,
the equivalence of the two norms implies the bi-continuity of the identity
mapping I ∶ (X, ‖.‖) → (X, ‖.‖′ ). The continuity of I implies the existence of a
positive constant k2 (namely, the norm of I relative to the given norms) such that
‖x‖′ = ‖I(x)‖′ ≤ k2 ‖x‖. The existence of k1 is established in a similar way. 

Example 6. It is easy to verify that the function ‖f‖ = ‖f‖_∞ + |f(0)| + |f(1)|
defines a norm on 𝒞[0, 1]. This norm is equivalent to the uniform norm on
𝒞[0, 1] by the previous example since ‖f‖_∞ ≤ ‖f‖ ≤ 3‖f‖_∞.

Example 7. Let L ∶ l^2 → l^2 be the operator defined by L(x_1, x_2, ...) = (x_2, x_3, ...),
and let T_n = L^n. Thus T_n(x_1, x_2, ...) = (x_{n+1}, x_{n+2}, ...). Since ‖T_n(x)‖_2 ≤ ‖x‖_2 for
every x ∈ l^2, ‖T_n‖ ≤ 1. In fact, ‖T_n‖ = 1 because if we take x = e_{n+1}, then
T_n(x) = e_1, and ‖T_n(x)‖_2 = 1 = ‖x‖_2. Next we show that lim_n T_n(x) = 0 for
every x ∈ l^2. Indeed, ‖T_n(x)‖_2^2 = ∑_{k=n+1}^∞ |x_k|^2. Being the tail end of a convergent
series, the last quantity approaches 0 as n → ∞. Observe that the sequence
(T_n) does not converge in the operator norm because if it did, it would have to converge
to the zero operator. This is obviously false because ‖T_n‖ = 1.
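The contrast in Example 7 between pointwise convergence and convergence in the operator norm can be seen on truncated sequences. The following sketch assumes a truncation length K and the test vector x = (1/k); both are hypothetical choices, and the helper names are not from the text.

```python
import math

def l2_norm(x):
    """Euclidean norm of a finite list, standing in for the l^2 norm."""
    return math.sqrt(sum(t * t for t in x))

def shift_power(n, x):
    """T_n = L^n drops the first n coordinates of a finite list."""
    return x[n:]

def basis_vector(j, length):
    """Canonical vector e_j, truncated to the given length."""
    e = [0.0] * length
    e[j - 1] = 1.0
    return e

K = 10000  # hypothetical truncation length
x = [1.0 / k for k in range(1, K + 1)]  # truncation of (1/k), an element of l^2

ns = (1, 10, 100, 1000)
# ||T_n(x)||_2 is the norm of a tail of x and shrinks as n grows.
tail_norms = [l2_norm(shift_power(n, x)) for n in ns]
# ||T_n(e_{n+1})||_2 = 1 for every n, so ||T_n|| does not tend to 0.
unit_norms = [l2_norm(shift_power(n, basis_vector(n + 1, K))) for n in ns]
```

The list tail_norms decreases toward 0 for this fixed x, while unit_norms stays equal to 1: the vector witnessing ‖T_n‖ = 1 moves with n.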

Definition. A bounded linear operator T on a Banach space X is said to be
bounded away from zero if there is a constant c > 0 such that ‖Tx‖ ≥ c‖x‖ for
all x ∈ X.

Example 8. Let X be a Banach space, and let T ∈ ℒ(X) be bounded away from
zero. Then T is one-to-one, ℜ(T) is closed in X, and T^{−1} ∶ ℜ(T) → X is
bounded.

If Tx = 0, then ‖x‖ ≤ ‖Tx‖/c = 0; hence x = 0, and T is one-to-one.
To prove that ℜ(T) is closed, we show that if (x_n) is such that lim_n Tx_n = y,
then y ∈ ℜ(T). It is enough to show that (x_n) is Cauchy because if we set x =
lim_n x_n, then y = lim_n Tx_n = Tx. Since (Tx_n) is a Cauchy sequence, ‖x_n − x_m‖ ≤
(1/c)‖Tx_n − Tx_m‖ → 0 as n, m → ∞. Thus (x_n) is a Cauchy sequence, as desired.
Finally, the inequality ‖Tx‖ ≥ c‖x‖ implies that ‖T^{−1}‖ ≤ 1/c.

We conclude this section with an example of a representation theorem. The result
below gives a concrete characterization of the dual of the sequence spaces l^p.

Theorem 6.2.4. Let 1 ≤ p < ∞. The dual (l^p)∗ of l^p is isometrically isomorphic to l^q,
where 1/p + 1/q = 1.

Proof. Fix a real number p > 1. For y ∈ l^q, define a functional 𝜆_y ∶ l^p → 𝕂 by
𝜆_y(x) = ∑_{n=1}^∞ x_n y_n. By Hölder’s inequality, |𝜆_y(x)| ≤ ∑_{n=1}^∞ |x_n y_n| ≤ ‖x‖_p ‖y‖_q.
Thus 𝜆_y is a bounded linear functional on l^p, and ‖𝜆_y‖ ≤ ‖y‖_q. The linearity of
𝜆_y is clear. To show that ‖𝜆_y‖ = ‖y‖_q, define a sequence (x_n) as follows: x_n = 0
if y_n = 0, and x_n = |y_n|^q/y_n if y_n ≠ 0. Note that ‖x‖_p = ‖y‖_q^{q/p} = ‖y‖_q^{q−1}; hence
x ∈ l^p. Now 𝜆_y(x) = ∑_{n=1}^∞ x_n y_n = ∑_{n=1}^∞ |y_n|^q = ‖y‖_q^q = ‖y‖_q^{q−1} ‖y‖_q = ‖x‖_p ‖y‖_q.
Thus ‖𝜆_y‖ = ‖y‖_q.
The mapping Λ ∶ l^q → (l^p)∗ given by y ↦ 𝜆_y is clearly linear, and the fact that
‖𝜆_y‖ = ‖y‖_q makes Λ an isometry. It remains to show that Λ maps l^q onto (l^p)∗.
Let 𝜆 be a bounded linear functional on l^p. We need to show that 𝜆 = 𝜆_y for
some y ∈ l^q. Let e_n be the canonical vectors in 𝕂(ℕ). For x = (x_1, x_2, ...) ∈ l^p, the
sequence 𝜉_n = ∑_{i=1}^n x_i e_i = (x_1, x_2, …, x_n, 0, 0, 0, ...) converges to x in l^p. Let y_n =
𝜆(e_n), and let y = (y_n). First we show that y ∈ l^q. Let 𝜂_n = (y_1, y_2, …, y_n, 0, 0, 0, ...),
and define 𝜆_n(x) = ∑_{i=1}^n x_i y_i. By the part of the theorem we already established,
𝜆_n ∈ (l^p)∗, and ‖𝜆_n‖ = ‖𝜂_n‖_q = (∑_{i=1}^n |y_i|^q)^{1/q}. Now |𝜆_n(x)| = |𝜆(𝜉_n)| ≤
‖𝜆‖‖𝜉_n‖_p ≤ ‖𝜆‖‖x‖_p; hence ‖𝜆_n‖ ≤ ‖𝜆‖. Therefore (∑_{i=1}^n |y_i|^q)^{1/q} is bounded by
‖𝜆‖; hence ∑_{n=1}^∞ |y_n|^q < ∞, that is, y ∈ l^q. Finally, we show that 𝜆 = 𝜆_y:

𝜆(x) = 𝜆(lim_n 𝜉_n) = lim_n 𝜆(𝜉_n) = lim_n 𝜆(∑_{i=1}^n x_i e_i)
     = lim_n ∑_{i=1}^n x_i 𝜆(e_i) = lim_n ∑_{i=1}^n x_i y_i = ∑_{n=1}^∞ x_n y_n = 𝜆_y(x).

We sometimes summarize the above result by saying that the dual of lp is lq instead
of saying that the dual of lp is isometrically isomorphic to lq . This slight abuse of
language is common.
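The extremal sequence used in the proof of theorem 6.2.4 to show ‖𝜆_y‖ = ‖y‖_q can be tested on a finitely supported y. The sketch below is illustrative only: the exponent p = 3, the sample vector y, and the helper p_norm are assumptions, not part of the text.

```python
def p_norm(x, p):
    """The p-norm of a finite list of real numbers."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1

y = [2.0, -1.0, 0.0, 0.5, -3.0]  # hypothetical finitely supported element of l^q
# x_n = 0 if y_n = 0, and x_n = |y_n|^q / y_n otherwise.
x = [0.0 if t == 0 else abs(t) ** q / t for t in y]

pairing = sum(a * b for a, b in zip(x, y))  # lambda_y(x) = sum of x_n * y_n
bound = p_norm(x, p) * p_norm(y, q)         # Hoelder bound ||x||_p * ||y||_q
```

Up to rounding, pairing and bound agree, so Hölder's inequality holds with equality for this choice of x, as the proof asserts.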

Exercises

1. Prove that if a linear function T ∶ X → Y maps bounded sets into bounded
sets, then T is bounded.
2. Let T ∶ X → Y be a bounded linear mapping. Show that the closed ball
{y ∈ Y ∶ ‖y‖ ≤ ‖T‖} is the smallest ball in Y that contains the image of the
closed unit ball, {x ∈ X ∶ ‖x‖ ≤ 1} in X.
3. Let T ∶ X → Y be a bounded linear mapping. Show that ‖T‖ =
sup‖x‖<1 ‖T(x)‖.
4. Show that a bounded linear mapping T ∶ X → Y is uniformly continuous.
5. Prove that every linear functional on X is bounded if and only if
dim(X) < ∞.
6. Let 𝜆 ∈ X∗ . Prove that 𝜆 is an open mapping.
7. Let T ∶ X → X be a linear mapping such that whenever xn → 0, then {T(xn )}
is bounded. Prove that T is bounded. Hint: If T is unbounded, then there
exists a sequence (xn ) such that xn → 0 but ‖T(xn )‖ is bounded away from
0. Consider the sequence 𝜉n = xn /√‖xn ‖.
8. Suppose X is an n-dimensional normed linear space. Prove that
dim(X∗ ) = n.
9. Let T ∶ X → Y be a bounded linear injection. Prove that the following
conditions are equivalent:
(a) T is an isometry from X onto Y.
(b) T(SX ) = SY .
(c) T(BX ) = BY .
Here BX and BY are the closed unit balls in X and Y, respectively, and SX
and SY are the unit spheres in X and Y, respectively.
10. We know that if 1 ≤ p < q ≤ ∞, then l^p ⊂ l^q; see problem 3 in section 3.6.
Let i ∶ lp → lq be the inclusion map. Find ‖i‖.
11. Define a linear operator T ∈ ℒ(c_0) as follows: for x = (x_n), T(x) = (x_n/n). Find
‖T‖ and show that ℜ(T) is dense in c_0.
12. In connection with example 1, show that ‖T‖ = ‖c‖∞ .
13. Let X be the space of polynomials equipped with the norm ‖ f ‖ = sup0≤x≤1
| f (x)|. Prove that differentiation is an unbounded operator on X.
14. Define a function ‖.‖′ on the space of null sequences c_0 by ‖x‖′ =
∑_{n=1}^∞ 2^{−n}|x_n|. Here x = (x_n). Prove that the given function is a norm and
that it is not equivalent to the infinity norm on c_0. Hint: The sequence
(1, 1, …, 1, 0, 0, 0, ...) is Cauchy in ‖.‖′.
15. Let ‖.‖_1 and ‖.‖_2 be equivalent norms on a Banach space X. Prove that the
closed unit balls B_1 = {x ∈ X ∶ ‖x‖_1 ≤ 1} and B_2 = {x ∈ X ∶ ‖x‖_2 ≤ 1} are
homeomorphic. Hint: Consider the function 𝜑 ∶ B_1 → B_2 defined by

𝜑(x) = (‖x‖_1/‖x‖_2) x if x ≠ 0,
𝜑(x) = 0 if x = 0.

16. Prove that (c_0)∗ = l^1 and (l^1)∗ = l^∞.


17. Let 𝜆 ∶ X → 𝕂 be a nonzero linear functional on a vector space X. Prove that
there exists a one-dimensional subspace M of X such that X = Ker(𝜆) ⊕ M.
18. Let 𝜆 ∶ X → 𝕂 be a nonzero linear functional on a normed linear space X.
Prove that the following are equivalent:
(a) 𝜆 is bounded.
(b) Ker(𝜆) is closed.
(c) Ker(𝜆) is not dense in X.
Conclude that 𝜆 is unbounded if and only if Ker(𝜆) is dense in X.
19. Let 𝜆 ∶ X → 𝕂 be a nonzero linear functional on a normed linear space X.
Prove that the following are equivalent:
(a) 𝜆 is unbounded.
(b) There is a sequence (xn ) in X such that ‖xn ‖ = 1 and limn |𝜆(xn )| = ∞.
(c) There is a sequence (xn ) in X such that limn xn = 0 and 𝜆(xn ) = 1.
20. Let T ∶ 𝒞[0, 1] → 𝒞[0, 1] be the linear operator (Tf)(x) = ∫_0^x f(t)dt. Show
that T is bounded, and find its norm.
21. Let 𝜆 ∶ 𝒞[0, 1] → 𝕂 be the linear functional 𝜆(f) = ∫_0^1 f(t)dt. Show that 𝜆
is bounded, and find its norm.
22. Define a linear operator Pn on the space of convergent sequences c by
Pn (x) = (x1 , x2 , … , xn , xn , xn , ...).
(a) Prove that ‖P_n‖ = 1 and that lim_n P_n(x) = x for all x ∈ c.
(b) Prove that u1 = (1, 1, 1...), u2 = (0, 1, 1, 1, ...), u3 = (0, 0, 1, 1, 1, ...), ... , is
a Schauder basis for c.
23. Let {tn } be a countable dense subset of [0, 1], where t1 = 0, t2 = 1. For n ∈ ℕ,
define an operator Pn on 𝒞[0, 1] as follows: Pn f is the continuous, piecewise
linear function with nodes t1 , … , tn such that (Pn f)(ti ) = f (ti ) for 1 ≤ i ≤ n.
Show that ‖Pn ‖ = 1 for all n ∈ ℕ and that, for every f ∈ 𝒞[0, 1], limn ‖Pn f −
f ‖∞ = 0.
24. This is a continuation of the previous exercise. Define u_1(x) = 1, and, for
n ≥ 2, define u_n to be the continuous, piecewise linear function such that
u_n(t_n) = 1, and u_n(t_i) = 0 for 1 ≤ i ≤ n − 1. Prove that {u_i}_{i=1}^n is a basis for
the range of P_n, and hence conclude that {u_n}_{n=1}^∞ is a Schauder basis for
𝒞[0, 1].

Definition. Let {u_n} be a Schauder basis for a Banach space X. Thus every
x ∈ X has a unique representation x = ∑_{n=1}^∞ a_n(x)u_n. Define the canonical
projections P_n ∶ X → Span{u_1, …, u_n} by P_n(x) = ∑_{i=1}^n a_i(x)u_i. Notice that
the last three problems include examples of canonical projections. We
assume, without proof, the fact that the set {P_n} is uniformly bounded, that
is, sup_n ‖P_n‖ < ∞.

25. Let {u_n} be a Schauder basis for a Banach space X, and consider the series
representation x = ∑_{n=1}^∞ a_n u_n of an element x ∈ X. Each of the coefficients
a_n is clearly a linear functional on X. Prove that a_n ∈ X∗. Hint: a_n(x)u_n =
P_n(x) − P_{n−1}(x).

6.3 Three Fundamental Theorems

In addition to the Hahn-Banach theorem, the three theorems we present in this
section are of fundamental importance. All three theorems require completeness;
hence they apply only to Banach spaces.

In chapter 4 (see problem 5 in section 4.8), we encountered an example
where a family of pointwise bounded functions on a complete metric space is,
in fact, uniformly bounded on a ball. Lemma 6.3.1 is similar in spirit, and its
proof demonstrates the centrality of Baire’s theorem in this section. Because the
boundedness of a linear function on a ball implies its boundedness, it should not be
surprising that when X is a Banach space, pointwise boundedness implies uniform
boundedness. This is the uniform boundedness principle.
The open mapping theorem is a central theorem in functional analysis, and
one cannot exaggerate its importance. Lemma 6.3.3 is critical to the proof of the
open mapping theorem, and, again, completeness is crucial. The closed graph
theorem comes in quite handy in certain applications to prove the boundedness
of a linear function. It follows rather easily from the open mapping theorem. Later
in the book, you will see many applications of the three theorems, as well as the
Hahn-Banach theorem.

In this section, X and Y are normed linear spaces.

A family of bounded linear functions {T𝛼 }𝛼∈I from X to Y such that, for each x ∈ X,
sup𝛼∈I {‖T𝛼 (x)‖} < ∞ is said to be pointwise bounded. If sup𝛼∈I ‖T𝛼 ‖ < ∞, we say
that the family {T𝛼 } is uniformly bounded.

Example 1. Let X and Y be normed linear spaces, and suppose that dim(X) <
∞. If a family of linear transformations {T_𝛼}_{𝛼∈I} from X to Y is pointwise
bounded, then sup_{𝛼∈I} ‖T_𝛼‖ < ∞. To see this, fix a basis {x_1, …, x_n} for X,
and use the 1-norm on X. Thus if x = ∑_{i=1}^n a_i x_i, then ‖x‖ = ∑_{i=1}^n |a_i|. Define
M_i = sup_{𝛼∈I} ‖T_𝛼(x_i)‖, and let M = max_{1≤i≤n} M_i. For any 𝛼 ∈ I, we have

‖T_𝛼(x)‖ = ‖T_𝛼(∑_{i=1}^n a_i x_i)‖ ≤ ∑_{i=1}^n |a_i|‖T_𝛼(x_i)‖ ≤ ∑_{i=1}^n M_i|a_i| ≤ M‖x‖.

The uniform boundedness principle generalizes example 1 to the infinite-dimensional
case.

Lemma 6.3.1. Let X be a Banach space, and let Y be a normed linear space.
Suppose {T𝛼 }𝛼∈I is a family of bounded linear functions X → Y such that, for
each x ∈ X, sup𝛼∈I {‖T𝛼 (x)‖} < ∞. Then there exists a ball B(x0 , 𝛿) such that
sup{‖T𝛼 (x)‖ ∶ x ∈ B(x0 , 𝛿), 𝛼 ∈ I} < ∞.

Proof. For each n ∈ ℕ, let F_n = ∩_{𝛼∈I} {x ∈ X ∶ ‖T_𝛼(x)‖ ≤ n}. Note that each F_n is
closed and that X = ∪_{n=1}^∞ F_n. Since X is complete, Baire’s theorem forces at least
one set F_N to have a nonempty interior. Thus there exists a ball B = B(x_0, 𝛿) ⊆
int(F_N) ⊆ F_N. Now, for every x ∈ B and every 𝛼 ∈ I, ‖T_𝛼(x)‖ ≤ N.

Theorem 6.3.2 (the uniform boundedness principle). Under the assumptions
of lemma 6.3.1, the family {T_𝛼} is a bounded subset of ℒ(X, Y), that is,
sup_𝛼 ‖T_𝛼‖ < ∞.

Proof. We continue to use the notation of lemma 6.3.1.
Let 𝛽 = sup_𝛼 ‖T_𝛼(x_0)‖. We claim that sup{‖T_𝛼(x)‖ ∶ ‖x‖ ≤ 1, 𝛼 ∈ I} ≤
(2/𝛿)(N + 𝛽). This will show that sup{‖T_𝛼‖ ∶ 𝛼 ∈ I} ≤ (2/𝛿)(N + 𝛽). Let 𝛼 ∈ I, and
‖x‖ ≤ 1. Then x_0 + 𝛿x/2 ∈ B, and

‖T_𝛼(x)‖ = (2/𝛿)‖T_𝛼(𝛿x/2)‖ = (2/𝛿)‖T_𝛼(x_0 + 𝛿x/2) − T_𝛼(x_0)‖
         ≤ (2/𝛿){‖T_𝛼(x_0 + 𝛿x/2)‖ + ‖T_𝛼(x_0)‖} ≤ (2/𝛿)(N + 𝛽).

The above theorem fails when X is not complete. See problem 1 at the end of this
section.
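The role of completeness can be illustrated on the space 𝕂(ℕ) of finitely supported sequences with the supremum norm, using the functionals 𝜆_n(x) = nx_n that appear in problem 1. The sketch below only evaluates these functionals on sample vectors; the list representation of a sequence and the function names are assumptions made for illustration.

```python
def lam(n, x):
    """lambda_n(x) = n * x_n for a finitely supported sequence x (a list).

    Coordinates beyond the end of the list are treated as 0.
    """
    return n * x[n - 1] if n <= len(x) else 0.0

def sup_norm(x):
    """Supremum norm of a finitely supported sequence."""
    return max(abs(t) for t in x)

def basis_vector(n):
    """Canonical vector e_n, stored as a list of length n."""
    return [0.0] * (n - 1) + [1.0]

x = [3.0, -1.0, 0.5, 2.0]  # hypothetical finitely supported sequence
# For a fixed x only finitely many lambda_n(x) are nonzero, so the sup is finite.
pointwise_bound = max(abs(lam(n, x)) for n in range(1, 1000))

# |lambda_n(e_n)| / ||e_n|| = n, so the norms of the lambda_n are unbounded.
lower_bounds = [abs(lam(n, basis_vector(n))) / sup_norm(basis_vector(n))
                for n in (1, 10, 100)]
```

For this x the pointwise bound is 8, while the ratios in lower_bounds grow like n. This is consistent with the failure of the theorem on an incomplete space, though a finite computation is of course not a proof.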

In the lemma below, we use the notation B𝛿 to denote an open ball in X of radius
𝛿 centered at 0. We use the same notation, in addition to the prime character,
to indicate an open ball in Y. Thus B′r denotes an open ball in Y of radius r and
centered at 0.

Lemma 6.3.3. Suppose that X and Y are Banach spaces and that T is a bounded
linear mapping from X to Y. If, for some r > 0, B′_r ⊆ cl(T(B_1)), where cl denotes
closure in Y, then B′_r ⊆ T(B_3). Equivalently, B′_{r/3} ⊆ T(B_1).

Proof. First observe that B′_r ⊆ cl(T(B_1)) implies that B′_{r/2^i} ⊆ cl(T(B_{1/2^i})), for every i ∈ ℕ.
Pick y ∈ B′_r. There exists x_1 ∈ B_1 such that ‖y − T(x_1)‖ < r/2. Now y − T(x_1) ∈
B′_{r/2} ⊆ cl(T(B_{1/2})), so there exists x_2 ∈ B_{1/2} such that ‖y − T(x_1) − T(x_2)‖ < r/4.
Continuing in this manner, we can construct a sequence (x_n) in X such that
x_n ∈ B_{1/2^{n−1}} (i.e., ‖x_n‖ < 1/2^{n−1}), and ‖y − T(x_1) − T(x_2) − ... − T(x_n)‖ < r/2^n.
Because ‖x_n‖ < 1/2^{n−1}, the sequence S_n = x_1 + ... + x_n is a Cauchy sequence in
X; hence x = lim_n S_n exists. Now T(x) = T(lim_n S_n) = lim_n T(S_n) = y, and ‖x‖ =
lim_n ‖S_n‖ = lim_{n→∞} ‖x_1 + ... + x_n‖ ≤ lim_n ∑_{i=1}^n ‖x_i‖ ≤ ∑_{i=1}^∞ 1/2^{i−1} = 2 < 3. We
have shown that every y ∈ B′_r is the image of an element x ∈ B_3. This proves the
result.
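The successive-approximation idea in the proof of lemma 6.3.3 (correct the current residual, then repeat) has a simple finite-dimensional analogue. The sketch below is not the lemma's construction: the matrix A standing in for T, the crude approximate inverse G, and the function names are all hypothetical. It only shows how summing geometrically shrinking corrections yields an exact preimage.

```python
def mat_vec(A, v):
    """Matrix-vector product for lists of rows."""
    return [sum(a * t for a, t in zip(row, v)) for row in A]

A = [[2.0, 1.0], [1.0, 3.0]]        # hypothetical surjective map on R^2
G = [[0.5, 0.0], [0.0, 1.0 / 3.0]]  # crude approximate right inverse of A

def solve_by_successive_approximation(y, steps=60):
    """Build x as a sum of corrections, each aimed at the current residual."""
    x = [0.0, 0.0]      # partial sum S_n = x_1 + ... + x_n
    residual = list(y)  # y - A(S_n)
    for _ in range(steps):
        correction = mat_vec(G, residual)  # plays the role of x_{n+1}
        x = [a + b for a, b in zip(x, correction)]
        residual = [a - b for a, b in zip(y, mat_vec(A, x))]
    return x, residual

y = [1.0, -2.0]
x, residual = solve_by_successive_approximation(y)
```

Here each step multiplies the residual by I − AG, whose norm is 1/2 for these matrices, so the corrections form an absolutely convergent series, just as the vectors x_n do in the proof. The limit satisfies Ax = y; for this y it is x = (1, −1).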

Theorem 6.3.4 (the open mapping theorem). Suppose that X and Y are Banach
spaces and that T ∶ X → Y is a bounded linear mapping from X onto Y. Then there
exists a number 𝛿 > 0 such that B′𝛿 ⊆ T(B1 ).

Proof. Since T is onto, Y = ∪_{n=1}^∞ T(B_n). Baire’s theorem implies that cl(T(B_N)),
the closure of T(B_N) in Y, has a nonempty interior for some positive integer N. Thus
there exists an element y_0 ∈ Y and a positive number r such that B(y_0, r) ⊆ cl(T(B_N)).
We claim that B′_r ⊆ cl(T(B_{2N})). Let y ∈ Y be such that ‖y‖ < r, and let 𝜖 > 0. Both y_0
and y_0 + y are in B(y_0, r), so there are vectors 𝜉 and 𝜂 in B_N such that ‖y + y_0 − T(𝜉)‖ <
𝜖/2, and ‖y_0 − T(𝜂)‖ < 𝜖/2. Let x = 𝜉 − 𝜂. Then ‖x‖ ≤ ‖𝜉‖ + ‖𝜂‖ < 2N,
and ‖y − T(x)‖ = ‖y − T(𝜉 − 𝜂) − y_0 + y_0‖ ≤ ‖y + y_0 − T(𝜉)‖ + ‖T(𝜂) − y_0‖ <
𝜖/2 + 𝜖/2 = 𝜖. This proves that B′_r ⊆ cl(T(B_{2N})), which establishes our claim and
implies that B′_{r/2N} ⊆ cl(T(B_1)). By lemma 6.3.3, B′_𝛿 ⊆ T(B_1), where 𝛿 = r/(6N).

Corollary 6.3.5. Under the assumptions of the open mapping theorem, given r > 0,
there exists 𝛿 > 0 such that B′𝛿 ⊆ T(Br ). 

The following theorem justifies the name of the open mapping theorem.

Theorem 6.3.6. Under the assumptions of the open mapping theorem, T is an open
mapping.

Proof. Let U be an open subset of X. We need to show that T(U) is open in Y.
Let y = T(x) ∈ T(U). Since U is open, there exists r > 0 such that B(x, r) ⊆ U.
Corollary 6.3.5 implies that there is a positive number 𝛿 such that B′_𝛿 ⊆ T(B_r).
Now T(B(x, r)) = T(x + B_r) = T(x) + T(B_r) ⊇ y + B′_𝛿 = B(y, 𝛿). This concludes
the proof.

The continuity of a function does not imply its openness. For example, the function
f(x) = sin x is continuous but not open, since the image of the open interval (0, 𝜋) is
(0, 1], which is not open.

Example 2. Under the assumptions of the open mapping theorem, there exists a
constant M > 0 such that, for every y ∈ Y, there is an element x ∈ T^{−1}(y) such
that ‖x‖ ≤ M‖y‖. By the open mapping theorem, there exists a positive number
𝛿 such that B′_𝛿 ⊆ T(B_1). For a nonzero vector y ∈ Y, 𝛿y/(2‖y‖) ∈ B′_𝛿; hence there is a
vector x_1 ∈ X such that ‖x_1‖ ≤ 1 and T(x_1) = 𝛿y/(2‖y‖). Define x = 2x_1‖y‖/𝛿. One can
see that T(x) = y, and ‖x‖ ≤ 2‖y‖/𝛿. (For y = 0, take x = 0.) The constant we seek is M = 2/𝛿.

The following results represent a small sample of applications of the open mapping
theorem.

Theorem 6.3.7. Let X and Y be Banach spaces, and let T ∶ X → Y be a bounded
linear bijection. Then T is a homeomorphism.

Proof. By theorem 6.3.6, T is an open mapping; hence T^{−1} is continuous, and T is a
homeomorphism.

The above theorem also follows from example 2. If T is injective, then
‖T^{−1}‖ ≤ M.
The following theorem states that it is not possible for a complete norm to be
strictly stronger than another.

Theorem 6.3.8. Let X be a Banach space under each of the norms ‖.‖ and ‖.‖′ . If
there exists a constant 𝛼 > 0 such that ‖x‖ ≤ 𝛼‖x‖′ for every x ∈ X, then there
exists a constant 𝛽 > 0 such that ‖x‖′ ≤ 𝛽‖x‖ for every x ∈ X.

Proof. Consider the identity mapping IX ∶ (X, ‖.‖′ ) → (X, ‖.‖). The assumption
‖x‖ ≤ 𝛼‖x‖′ is equivalent to the boundedness of IX . By theorem 6.3.7, the inverse
of IX is also continuous. Thus IX ∶ (X, ‖.‖) → (X, ‖.‖′ ) is bounded. Thus there
exists a positive constant 𝛽 such that, for all x ∈ X, ‖x‖′ ≤ 𝛽‖x‖. 

Definition. Let (X, d) and (Y, 𝜌) be metric spaces, and let T ∶ X → Y. The graph
of T is the subset G = {(x, T(x)) ∶ x ∈ X} of X × Y. We say that the graph of T is
closed if G is closed in the product metric on X × Y.
Recall that a sequence (xn , yn ) ∈ X × Y converges to (x, y) if and only if xn → x
and yn → y. Thus the graph of T is closed if whenever xn → x, and T(xn ) → y,
then (x, y) ∈ G, or simply y = T(x). It is a simple exercise to verify that if T is
continuous, then the graph of T is closed in X × Y. For Banach spaces and linear
mappings, the converse is true.

Theorem 6.3.9 (the closed graph theorem). Let X and Y be Banach spaces, and let
T be a linear mapping from X to Y. If the graph of T is closed, then T is bounded.

Proof. Define a norm on X as follows: ‖x‖′ = ‖x‖ + ‖T(x)‖. We first show that ‖.‖′
is complete, and hence (X, ‖.‖′ ) is a Banach space. If (xn ) is a Cauchy sequence
in ‖.‖′ , then, for 𝜖 > 0, there is a natural number N such that, for m, n > N,
‖xn − xm ‖′ < 𝜖. In particular, both (xn ) and (T(xn )) are Cauchy sequences in X
and Y, respectively. The completeness of X and Y guarantees that both sequences
converge, say, x = lim_n x_n, and y = lim_n T(x_n). The assumption that the
graph of T is closed implies that y = T(x). Now ‖x_n − x‖′ = ‖x_n − x‖ + ‖T(x_n) −
T(x)‖ = ‖x_n − x‖ + ‖T(x_n) − y‖ → 0 as n → ∞. This demonstrates the completeness
of ‖.‖′. Now ‖x‖ ≤ ‖x‖ + ‖T(x)‖ = ‖x‖′. By theorem 6.3.8, the two norms ‖.‖
and ‖.‖′ are equivalent; thus the boundedness of T in one norm is equivalent to its
boundedness in the other. But the boundedness of T in the ‖.‖′ norm is immediate
from the inequality ‖T(x)‖ ≤ ‖T(x)‖ + ‖x‖ = ‖x‖′ . 

The following examples show that both the linearity of T and the completeness
of the spaces are needed for the closed graph theorem to hold.

Example 3. The function f ∶ ℝ → ℝ, defined below, is discontinuous but its graph
is closed:

f(x) = 1/x if x ≠ 0,
f(x) = 0 if x = 0.

Example 4. Let X = 𝒞[0, 1] be equipped with the 1-norm, and let Y = 𝒞[0, 1] be
equipped with the uniform norm. The identity function I ∶ X → Y is discontinuous
by example 7 in section 3.6. However, the graph of I is closed. Suppose
lim_n ‖f_n − f‖_1 = 0 and lim_n ‖I(f_n) − g‖_∞ = lim_n ‖f_n − g‖_∞ = 0. Since convergence
in the uniform norm implies convergence in the 1-norm, lim_n ‖f_n − g‖_1 = 0.
Now the uniqueness of limits forces f = g.
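The discontinuity of the identity map in Example 4 can be made concrete with a standard family of functions. The sketch below is an illustration, not the argument of the cited example in section 3.6: the functions f_n(x) = max(0, 1 − nx), the grid sizes, and the helper names are assumptions.

```python
def tent(n):
    """f_n(x) = max(0, 1 - n*x): continuous on [0, 1] with f_n(0) = 1."""
    return lambda x: max(0.0, 1.0 - n * x)

def sup_norm(f, samples=1001):
    """Maximum of |f| over an equally spaced grid on [0, 1]."""
    return max(abs(f(i / (samples - 1))) for i in range(samples))

def one_norm(f, cells=10000):
    """Midpoint-rule approximation of the integral of |f| over [0, 1]."""
    h = 1.0 / cells
    return h * sum(abs(f((i + 0.5) * h)) for i in range(cells))

ns = (1, 10, 100)
sup_norms = [sup_norm(tent(n)) for n in ns]
one_norms = [one_norm(tent(n)) for n in ns]  # the exact integrals are 1/(2n)
```

The 1-norms shrink like 1/(2n) while the uniform norms stay equal to 1, so f_n → 0 in X but not in Y.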

Exercises

1. Let 𝜆n ∶ 𝕂(ℕ) → 𝕂 be the functional defined by 𝜆n (x) = nxn . Prove that the
set {𝜆n } is pointwise bounded but not uniformly bounded. Here x = (xn ), and
𝕂(ℕ) is given the supremum norm.
2. The Banach-Steinhaus theorem. Let X and Y be Banach spaces, and let
(Tn ) be a sequence of bounded linear mappings from X to Y such that,
for every x ∈ X, T(x) = limn Tn (x) exists. Prove that T is bounded and that
‖T‖ ≤ lim infn ‖Tn ‖. Is it necessarily true that limn Tn = T in ℒ(X, Y)?

3. Let (x_n) be a sequence such that ∑_{n=1}^∞ x_n y_n < ∞ for all sequences (y_n) ∈ l^q.
Prove that (x_n) ∈ l^p. Here p and q are conjugate Hölder exponents with p > 1.

4. Let (y_n) be a sequence such that ∑_{n=1}^∞ x_n y_n < ∞ for all sequences (x_n) that
converge to 0. Prove that ∑_{n=1}^∞ |y_n| < ∞.
5. Let X be a Banach space, and suppose that the sequence 𝜆n ∈ X∗ is pointwise
bounded. Prove that 𝜆n is equicontinuous.
6. Let M and N be closed subspaces of a Banach space (X, ‖.‖) such that
X = M ⊕ N. Thus every x ∈ X can be written uniquely as x = y + z, where
y ∈ M, z ∈ N. Define a norm on X by ‖x‖′ = ‖y‖ + ‖z‖. Prove that ‖.‖′ is
equivalent to ‖.‖.

6.4 The Hahn-Banach Theorem

The importance of the Hahn-Banach theorem cannot be overstated. The results
following theorem 6.4.4 represent only a sample of the wide range of applications
of the Hahn-Banach theorem. Unlike the three major theorems of the previous
section, the Hahn-Banach theorem does not require completeness.

The Hahn-Banach theorem has many guises, and one of them is an extension
theorem. The following example shows that, from the purely algebraic perspective,
extending a linear functional on a subspace M of a vector space X is a trivial task.
Compare the following example to theorem 6.4.4.

Example 1. Let M be a subspace of a vector space X, and let 𝜆 be a linear functional
on M. Then 𝜆 can be extended to a linear functional on X.

Let S_1 be a basis for M, and choose a subset S_2 of X such that S = S_1 ∪ S_2 is a basis
for X. Define a function Λ ∶ S → ℂ as follows:

Λ(x) = 𝜆(x) if x ∈ S_1,
Λ(x) = 0 if x ∈ S_2.

Extend the function Λ by linearity to a functional Λ on X. The restriction of Λ
to M is clearly 𝜆.

One of the corollaries of the Hahn-Banach theorem (theorem 6.4.5) is a powerful
separation theorem. Earlier in the book, we saw examples of separation theorems
by linear functionals, albeit in a slightly different context. See example 10 in section
4.7. The following example shows, once again, that, from the algebraic point of
view, the problem of separating a subspace from a point outside it is a simple one.
Compare the result below to theorem 6.4.5.

Example 2. Let M be a proper subspace of a vector space X, and let x_0 ∈ X − M.
There exists a linear functional 𝜆 on X such that 𝜆(M) = 0 and 𝜆(x_0) ≠ 0.

Choose a basis S_1 for M. Since S_1 ∪ {x_0} is independent, there is a subset
S_2 of X such that S_1 ∪ {x_0} ∪ S_2 is a basis for X. Define a function 𝜆 ∶ S_1 ∪ {x_0} ∪
S_2 → ℂ by

𝜆(x) = 1 if x = x_0,
𝜆(x) = 0 if x ∈ S_1 ∪ S_2.

Extend 𝜆 by linearity to a linear functional 𝜆 on X. Clearly, 𝜆(M) = 0 and 𝜆(x_0) = 1.

Definition. Let X be a complex normed linear space. A real functional u on X is
said to be a bounded real-valued functional if

(a) u(x + y) = u(x) + u(y) for all x, y ∈ X,
(b) u(ax) = au(x) for all x ∈ X and a ∈ ℝ, and
(c) ‖u‖ = sup_{x≠0} |u(x)|/‖x‖ < ∞.

Lemma 6.4.1. Let 𝜆 be a bounded complex functional on a normed linear space X,
and let u be the real part of 𝜆. Then

(a) 𝜆(x) = u(x) − iu(ix), and
(b) u is a bounded real functional on X and ‖u‖ = ‖𝜆‖.

Conversely, if u is a bounded real functional on X, then 𝜆(x) = u(x) − iu(ix)
is a bounded complex functional on X.

Proof. Write 𝜆(x) = u(x) + iv(x). On the one hand, 𝜆(ix) = i𝜆(x) = iu(x) − v(x).
On the other hand, 𝜆(ix) = u(ix) + iv(ix). Equating the right-hand sides
of the above identities yields v(x) = −u(ix), and hence (a). Since, for any
complex number z, |Re(z)| ≤ |z|, |u(x)| ≤ |𝜆(x)|; hence ‖u‖ ≤ ‖𝜆‖. If 𝜆(x) ≠ 0,
let 𝛼 = |𝜆(x)|/𝜆(x). Then |𝜆(x)| = 𝛼𝜆(x) = 𝜆(𝛼x) = u(𝛼x) ≤ ‖u‖‖𝛼x‖ = |𝛼|‖u‖‖x‖ =
‖u‖‖x‖. Thus ‖𝜆‖ ≤ ‖u‖, and this establishes (b).
Conversely, if u is a bounded real functional on X and 𝜆(x) = u(x) − iu(ix),
then the additivity of 𝜆 is straightforward. Now 𝜆(ix) = u(ix) − iu(−x) =
u(ix) + iu(x) = i[u(x) − iu(ix)] = i𝜆(x). Hence 𝜆((a + ib)x) = 𝜆(ax) + 𝜆(ibx) =
a𝜆(x) + i𝜆(bx) = a𝜆(x) + ib𝜆(x) = (a + ib)𝜆(x). Thus 𝜆 is complex linear. The
boundedness of 𝜆 follows from the proof of part (b). 
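The identity 𝜆(x) = u(x) − iu(ix) of lemma 6.4.1 is easy to test on X = ℂ². In the sketch below, the coefficients defining 𝜆, the test vector, and the function names are hypothetical choices made for illustration.

```python
c = (2 - 1j, 0.5 + 3j)  # hypothetical coefficients defining lambda on C^2

def lam(x):
    """A complex-linear functional lambda(x) = c_1 x_1 + c_2 x_2."""
    return c[0] * x[0] + c[1] * x[1]

def u(x):
    """Real part of lambda; it is additive and real-homogeneous."""
    return lam(x).real

def rebuilt(x):
    """lambda recovered from its real part via u(x) - i*u(ix)."""
    ix = (1j * x[0], 1j * x[1])
    return complex(u(x), -u(ix))

x = (1.5 - 2j, -0.25 + 4j)
difference = abs(rebuilt(x) - lam(x))
```

The difference is zero up to rounding, which reflects the fact that the imaginary part of 𝜆(x) equals −u(ix).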

Lemma 6.4.2. Let M be a subspace of a real normed linear space X, and let
x_0 ∈ X − M. If u is a bounded real functional on M, then u has an extension U to
a bounded real functional on N = M ⊕ Span{x_0} such that ‖U‖ = ‖u‖.

Proof. Without loss of generality, assume that ‖u‖ = 1. Every element of N can
be written uniquely as x + 𝛼x0 , where x ∈ M and 𝛼 ∈ ℝ. Define U ∶ N → ℝ
by U(x + 𝛼x0 ) = u(x) + 𝛼b, where b is a constant to be determined later in
the proof. The linearity of U is obvious, and since U extends u, ‖u‖ ≤ ‖U‖. It
remains to show that ‖U‖ ≤ 1. It suffices to show that a constant b exists such that
|u(x) − b| ≤ ‖x − x0 ‖ for every x ∈ M, because then |U(x)| = |u(x) + 𝛼b| =
u(x) x
| − 𝛼|| − b| ≤ | − 𝛼|‖ − x0 ‖ = ‖x + 𝛼x0 ‖; hence ‖U‖ ≤ 1. We now show
−𝛼 −𝛼
that a constant b exists such that |u(x) − b| ≤ ‖x − x0 ‖ for every x ∈ M. For
x, y ∈ X, u(x) − u(y) = u(x − y) ≤ ‖u‖‖x − y‖ = ‖x − y‖ ≤ ‖x − x0 ‖ + ‖y − x0 ‖.
Therefore u(x) − ‖x − x0 ‖ ≤ u(y) + ‖y − x0 ‖, and b1 = supx∈M {u(x) − ‖x −
x0 ‖} ≤ infy∈M {u(y) + ‖y − x0 ‖} = b2 . Any constant b such that b1 ≤ b ≤ b2


satisfies u(x) − ‖x − x0 ‖ ≤ b ≤ u(x) + ‖x − x0 ‖ for every x ∈ M. For such a
constant b, |u(x) − b| ≤ ‖x − x0 ‖. The existence of b concludes the proof. 

Lemma 6.4.3. Let M be a subspace of a real normed linear space X, and let u be a
bounded real functional on M. Then u has a bounded real extension, U, on X such
that ‖U‖ = ‖u‖.

Proof. Consider the family 𝔅 = {(M𝛼 , U𝛼 ) ∶ 𝛼 ∈ I} of extensions of u that satisfy the
conclusion of the lemma. Thus, for each 𝛼 ∈ I, M is a subspace of M𝛼 , U𝛼 extends
u, and ‖U𝛼 ‖ = ‖u‖. Order 𝔅 by set and function inclusion: (M𝛼 , U𝛼 ) ⊆ (M𝛽 , U𝛽 )
means, by definition, that M𝛼 ⊆ M𝛽 , and U𝛽 extends U𝛼 . If ℭ = {(M𝛼 , U𝛼 ) ∶
𝛼 ∈ J} is a chain in 𝔅, let (N, U) = (∪𝛼∈J M𝛼 , ∪𝛼∈J U𝛼 ). It is easy to verify that N
is a subspace of X, that U is well defined and linear, and that ‖U‖ = ‖u‖. All the
properties follow from the fact that ℭ is a chain. Thus (N, U) is an upper bound of
ℭ. By Zorn’s lemma, 𝔅 has a maximal member, (M∗ , U∗ ). It must be the case that
M∗ = X because otherwise we can pick an element x0 ∈ X − M∗ and use lemma
6.4.2 to extend U∗ to M∗ ⊕ Span{x0 }, which would contradict the maximality of
(M∗ , U∗ ). 

Theorem 6.4.4 (the Hahn-Banach theorem). Let 𝜆 be a bounded linear functional


on a subspace M of a complex normed linear space X. Then 𝜆 has an extension to
a bounded linear functional, Λ, on X such that ‖Λ‖ = ‖𝜆‖.

Proof. Consider X as a real normed linear space simply by limiting the scalar field to
ℝ, and let u be the real part of 𝜆. By lemma 6.4.1, u is a bounded real functional
on M, and ‖u‖ = ‖𝜆‖. By lemma 6.4.3, u has an extension, U, to X such that
‖U‖ = ‖u‖. Define Λ ∶ X → ℂ by Λ(x) = U(x) − iU(ix). By lemma 6.4.1, Λ is a
bounded linear functional on X, and ‖Λ‖ = ‖U‖ = ‖u‖ = ‖𝜆‖. 

We now look at some applications of the Hahn-Banach theorem. The results below
are important in their own right.

Theorem 6.4.5. Let M be a subspace of a normed linear space X, and let x0 ∈ X.
Then x0 belongs to the closure M̄ of M if and only if there does not exist a bounded linear functional 𝜆 on
X such that 𝜆(M) = 0 and 𝜆(x0 ) ≠ 0.

Proof. We show that if x0 is not in the closure M̄ of M, then there exists a functional Λ ∈ X∗ such
that Λ(M) = 0, and Λ(x0 ) = 1. Since x0 ∉ M̄, there exists a number 𝛿 > 0
such that ‖x − x0 ‖ ≥ 𝛿 for every x ∈ M. Let N = M ⊕ Span{x0 }, and define a
function 𝜆 ∶ N → ℂ by 𝜆(x + 𝛼x0 ) = 𝛼; 𝜆 is clearly linear on N, 𝜆(M) = 0, and
𝜆(x0 ) = 1. Now, for any x ∈ M and 𝛼 ≠ 0, ‖x/(−𝛼) − x0 ‖ ≥ 𝛿, since x/(−𝛼) ∈ M. Thus 𝛿|𝜆(x + 𝛼x0 )| =
𝛿|𝛼| ≤ |𝛼|‖x/(−𝛼) − x0 ‖ = ‖x + 𝛼x0 ‖. Thus 𝜆 is bounded on N (‖𝜆‖ ≤ 1/𝛿).
Extend 𝜆 to a bounded linear functional Λ on X. The functional Λ has the
desired properties. Conversely, if x0 is in the closure of M and 𝜆 ∈ X∗ is such that 𝜆(M) = 0,
then there exists a sequence of vectors (xn ) in M such that limn xn = x0 . Now
𝜆(x0 ) = 𝜆(limn xn ) = limn 𝜆(xn ) = 0. 

The following result can be used to prove certain approximation theorems. It


follows immediately from the previous theorem.

Corollary 6.4.6. Let M be a subspace of a normed linear space X. If, for


𝜆 ∈ X∗ , 𝜆(M) = 0 implies that 𝜆 = 0, then M is dense in X. 

Example 3. Let A be a dense subset of [−𝜋, 𝜋]. For a fixed t ∈ A, the sequence
𝜉t = (e^{int}/n)_{n=1}^∞ is in l2 . We claim that the subspace M = Span{𝜉t ∶ t ∈ A} is
dense in l2 . We use the above corollary and show that, for a bounded
linear functional 𝜆 on l2 , 𝜆(M) = 0 is possible only if 𝜆 = 0. By theorem
6.2.4, there exists a sequence (yn ) ∈ l2 such that, for every sequence x =
(xn ) ∈ l2 , 𝜆(x) = ∑_{n=1}^∞ xn yn . For every t ∈ [−𝜋, 𝜋], ∑_{n=1}^∞ |(yn /n)e^{int}| = ∑_{n=1}^∞ |yn |/n ≤
‖y‖2 {∑_{n=1}^∞ 1/n^2}^{1/2} < ∞, and the series ∑_{n=1}^∞ (yn /n)e^{int} converges absolutely and
uniformly on [−𝜋, 𝜋] to a continuous function F(t). Since F(t) = 𝜆(𝜉t ) = 0 for every t ∈ A, F vanishes
on a dense subset of [−𝜋, 𝜋], so F is identically equal to the zero function.
Theorem 4.10.5 implies that yn /n = 0 for all n ∈ ℕ. Thus yn = 0, and 𝜆 = 0.

Another corollary of the Hahn-Banach theorem is the following separation


theorem.

Corollary 6.4.7. Let X be a normed linear space, and let x0 ∈ X, x0 ≠ 0. Then there
exists a bounded linear functional 𝜆 on X such that 𝜆(x0 ) = ‖x0 ‖, and ‖𝜆‖ = 1.
In particular, if y ∈ X and 𝜆(y) = 0 for all 𝜆 ∈ X∗ , then y = 0.

Proof. Let M = Span{x0 }, and define a functional 𝜆 ∶ M → ℂ by 𝜆(𝛼x0 ) = 𝛼‖x0 ‖.


Clearly, 𝜆(x0 ) = ‖x0 ‖, and ‖𝜆‖ = 1. By the Hahn-Banach theorem, 𝜆 has a norm-
preserving extension to X. 

The following important construction relies heavily on the above corollary. As we


established in section 6.2, the dual X∗ of a normed linear space X is a Banach space
with the norm ‖𝜆‖ = supx≠0 |𝜆(x)|/‖x‖; X∗ , in turn, has a dual, X∗∗ , which is a Banach
space known as the second dual of X. Now X can be linearly and isometrically
embedded into X∗∗ as follows. For an element x ∈ X, define an element x̂ of X∗∗ by
x̂(𝜆) = 𝜆(x). The linearity of x̂, as well as that of the mapping X → X∗∗ defined
by x ↦ x̂, are obvious. We now show that ‖x̂‖ = ‖x‖. Since |x̂(𝜆)| = |𝜆(x)| ≤
‖𝜆‖‖x‖, |x̂(𝜆)|/‖𝜆‖ ≤ ‖x‖ for every nonzero 𝜆 ∈ X∗ . Hence ‖x̂‖ ≤ ‖x‖. We now show that ‖x̂‖ ≥ ‖x‖ for x ≠ 0. By corollary
6.4.7, there exists 𝜆 ∈ X∗ such that ‖𝜆‖ = 1, and 𝜆(x) = ‖x‖. Now |x̂(𝜆)| = |𝜆(x)| =
‖x‖. Therefore ‖x̂‖ ≥ ‖x‖ and ‖x̂‖ = ‖x‖. We have proved the following result.

Theorem 6.4.8. Let X be a normed linear space, and let 𝜑 ∶ X → X∗∗ be the function
𝜑(x) = x̂. Then 𝜑 is a linear isometry.

The function 𝜑 in the above theorem is known as the natural embedding of X into
X∗∗ . We use the notation X̂ to denote the range of 𝜑. Thus X̂ = {x̂ ∶ x ∈ X}.
The above theorem provides the neatest construction of the completion of a
normed linear space.

Theorem 6.4.9. Let X be a normed linear space. Then X can be linearly and
isometrically embedded as a dense subspace of a Banach space. Thus every normed
linear space has a completion.

Proof. We know that X∗∗ is a Banach space. Let X̂ be the image of X under the
natural embedding 𝜑 in theorem 6.4.8. The desired completion of X is the closure
of X̂ in X∗∗ . 

Definition. A Banach space X is reflexive if X̂ = X∗∗ . Thus X is reflexive if every


member of X∗∗ is of the form x̂ for some x ∈ X.

Example 4. The lp spaces are reflexive for 1 < p < ∞. This follows directly from
theorem 6.2.4. 

The result below is important in its own right, but it also helps us decide whether
certain spaces are reflexive.

Example 5. Let X be a normed linear space. If X∗ is separable, then X is separable.

Let {𝜆n } be a countable dense subset of X∗ . Since ‖𝜆n ‖ = sup‖x‖=1 |𝜆n (x)|, there
exist unit vectors xn ∈ X such that |𝜆n (xn )| ≥ ‖𝜆n ‖/2. Let M = Span{x1 , x2 , ...}.
We employ corollary 6.4.6. Suppose that 𝜆 ∈ X∗ is such that 𝜆(M) = 0. Let
𝜖 > 0, and pick a positive integer n such that ‖𝜆n − 𝜆‖ < 𝜖. By the definition of
xn , and the fact that 𝜆(xn ) = 0, we have ‖𝜆n ‖/2 ≤ |𝜆n (xn )| = |𝜆n (xn ) − 𝜆(xn )| =
|(𝜆n − 𝜆)(xn )| ≤ ‖𝜆n − 𝜆‖ < 𝜖. Therefore ‖𝜆‖ ≤ ‖𝜆 − 𝜆n ‖ + ‖𝜆n ‖ < 𝜖 + 2𝜖 = 3𝜖.
This means that 𝜆 = 0, and, by corollary 6.4.6, M is dense in X. Now the
countable set {∑_{i=1}^n ai xi ∶ n ∈ ℕ, ai ∈ ℚ + iℚ} is dense in X.

Definition. A closed subspace M of a Banach space X is said to be complemented


if there exists a closed subspace N of X such that X = M ⊕ N.

The definition of the algebraic complement of an arbitrary subspace of a vector


space was introduced in section 3.4. The current definition requires both M and
N to be closed subspaces of X. The direct sum of two closed subspaces of a Banach
space is sometimes referred to as the topological direct sum of the two subspaces.

The very definition suggests that not every closed subspace of a Banach space has
a closed complement. However, the following examples identify two important
special cases where closed complements are guaranteed.

Example 6. If M is a finite-dimensional subspace of a Banach space X, then M is


complemented.
Let {x1 , … , xn } be a basis for M. For x ∈ M, x = ∑_{i=1}^n ai (x)xi for a unique set of
coefficients a1 (x), … , an (x). Each ai is a continuous linear functional on M. By
the Hahn-Banach theorem, each ai has an extension to a functional 𝜆i ∈ X∗ .
Define an operator P ∶ X → X by P(x) = ∑_{i=1}^n 𝜆i (x)xi . It is easy to see that
P ∈ ℒ(X), that P(x) = x for every x ∈ M, and that P^2 = P. Let N = Ker(P) =
∩_{i=1}^n Ker(𝜆i ). Clearly, N is a closed subspace of X. For x ∈ X, x = x − P(x) + P(x).
By the above, P(x) ∈ M, and x − P(x) ∈ N, since P(x − P(x)) = P(x) − P^2 (x) =
P(x) − P(x) = 0. This shows that M + N = X. If x ∈ M ∩ N, then 𝜆i (x) = 0 for
every 1 ≤ i ≤ n; hence x = ∑_{i=1}^n ai (x)xi = ∑_{i=1}^n 𝜆i (x)xi = 0. We have shown that
M ⊕ N = X. 

Example 7. If N is a closed, finite co-dimensional subspace of X, then N is


complemented.

Recall that the co-dimension of N is the dimension of the quotient space


X/N. Pick vectors x1 , … , xn such that {x̃1 , … , x̃n } is a basis for X/N, where
x̃i = xi + N, and let M = Span{x1 , … , xn }. We claim that M ⊕ N = X. For x ∈ X,
x + N = ∑_{i=1}^n ai (xi + N) = (∑_{i=1}^n ai xi ) + N for some scalars a1 , … , an . Therefore y = x − ∑_{i=1}^n ai xi ∈ N,
and x = ∑_{i=1}^n ai xi + y ∈ M + N. If x ∈ M ∩ N, then x = ∑_{i=1}^n ai xi ∈ N. Thus
∑_{i=1}^n ai x̃i = 0; hence ai = 0 for every 1 ≤ i ≤ n by the independence of
{x̃1 , … , x̃n }, and x = 0. 

Exercises

1. Let M be a closed maximal subspace of a normed linear space X. Prove that


there exists a functional 𝜆 ∈ X∗ such that Ker(𝜆) = M.
2. Let A be a subset of a normed linear space X. Prove that A is bounded in X


if and only if, for every 𝜆 ∈ X∗ , 𝜆(A) is bounded in ℂ.
3. Prove that if {x1 , … , xn } is an independent subset of a normed linear space
X, and {𝛼1 , … , 𝛼n } is an arbitrary set of complex numbers, then there exists
𝜆 ∈ X∗ such that 𝜆(xi ) = 𝛼i for all 1 ≤ i ≤ n.
4. Let M be a closed subspace of a normed linear space X, and let x0 ∉ M. Prove
there exists 𝜆 ∈ X∗ such that 𝜆(M) = 0, ‖𝜆‖ ≤ 1, and |𝜆(x0 )| = dist(x0 , M).
5. Prove that, for an element x of a normed linear space X, ‖x‖ = sup{|𝜆(x)| ∶
𝜆 ∈ X∗ , ‖𝜆‖ = 1}.

Definition. A sequence (xn ) in a normed linear space X is said to converge


weakly to an element x ∈ X if, for every 𝜆 ∈ X∗ , limn 𝜆(xn ) = 𝜆(x). We use
the notation xn →w x to indicate the weak convergence of (xn ) to x.

6. Prove that if (xn ) is weakly convergent, then (‖xn ‖) is bounded.


7. Prove that if xn →w x and yn →w y, then for any scalars a and b, axn +
byn →w ax + by.
8. Prove that the weak limit of a sequence, if it exists, is unique.
9. Prove that l1 is not reflexive.

Definition. A bounded operator P on a Banach space X is called a bounded


projection if P^2 = P. Equivalently, P is a bounded projection if Px = x for every x ∈ M = ℜ(P). See
problems 13 and 14 in section 3.4 for the general properties of the projection
of a vector space onto a subspace.

10. Let M be a closed subspace of a Banach space X. Prove that M is complemented
if and only if there exists a bounded projection P on X such that
M = ℜ(P). Hint: Suppose X = M ⊕ N, where M and N are closed subspaces
of X, and let P ∶ X → X be the projection of X onto M. Use the closed graph
theorem to prove the boundedness of P.
11. Suppose that M and N are closed, complementary subspaces of a Banach
space X, and let T1 ∶ M → X and T2 ∶ N → X be bounded linear mappings.
Define T ∶ X → X by T(x) = T1 (y) + T2 (z), where x = y + z, y ∈ M, z ∈ N.
Prove that T is bounded.

6.5 The Spectrum of an Operator

The spectrum of a square matrix A is simply its set of eigenvalues, and the
eigenvalues of A are easy to characterize. They are exactly the complex numbers 𝜆
for which the matrix A − 𝜆I is not invertible. We recall the simple fact that A − 𝜆I
is not invertible if and only if the linear operator T it generates on 𝕂n is not one-to-one,
and this is the case if and only if T is not onto.
The definition of the spectrum of an operator T on an infinite-dimensional
space is exactly the same as it is for a matrix. The stark distinction here is that
not every point in the spectrum of an operator on an infinite-dimensional space
is an eigenvalue. This is because such an operator may be one-to-one but not onto
or conversely. See example 1. Thus the spectrum consists of two main parts: the
complex numbers 𝜆 for which T − 𝜆I is not one-to-one (the eigenvalues) and those
for which T − 𝜆I is one-to-one but not onto. The spectrum of an operator T often
carries valuable information about T, and, in some cases, the eigenvalues of an
operator and the corresponding eigenvectors completely define the operator.

Definition. A Banach algebra is a Banach space X that is also an algebra with a


multiplicative identity I such that the norm satisfies the following additional
assumptions:

(a) ‖I‖ = 1, and


(b) ‖ST‖ ≤ ‖S‖‖T‖ for all S and T in X.

We know that the set ℒ(X) of bounded linear operators on a Banach space X is a
Banach space. In fact, ℒ(X) is a Banach algebra with the composition of operators
as the multiplication operation. The composition of two operators S and T is
usually denoted by ST rather than S ∘ T. Property (a) is obvious, and property (b)
follows from the inequalities ‖(ST)(x)‖ = ‖S(T(x))‖ ≤ ‖S‖‖T(x)‖ ≤ ‖S‖‖T‖‖x‖.

For the convenience of the reader, we list below the properties that make ℒ(X) a
Banach algebra: for operators T, S, U ∈ ℒ(X) and all a, b ∈ 𝕂,

(a) (ST)U = S(TU),


(b) (ab)T = a(bT),
(c) (T + S)U = TU + SU and U(T + S) = UT + US,
(d) ‖I‖ = 1, and
(e) ‖ST‖ ≤ ‖S‖‖T‖.

The algebra ℒ(X) is called the operator algebra on X.

Definition. An operator T ∈ ℒ(X) is called invertible if there exists an operator


S ∈ ℒ(X) such that ST = TS = I.

If T is a bounded linear bijection of a Banach space X, then its inverse is bounded


by theorem 6.3.7. Thus a bounded operator T fails to be invertible if

(a) T is not one-to-one, that is, Ker(T) ≠ {0}, or


(b) T is not onto, that is, ℜ(T) ≠ X.

We point out here an important distinction between operators on finite-dimensional and
infinite-dimensional spaces. Every linear operator on a finite-dimensional space
is bounded, and such an operator is one-to-one if and only if it is onto. This is not
the case in infinite dimensions, as the following example illustrates.

Example 1. The right shift operator R and the left shift operator L on l2 are,
respectively,

R(x1 , x2 , ...) = (0, x1 , x2 , ...),


L(x1 , x2 , ...) = (x2 , x3 , ...).

It is clear that R is one-to-one but not onto, while L is onto but not
one-to-one. 
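
The behavior of the two shifts can be seen on finitely supported sequences. The sketch below is only an illustration: it represents a sequence by a Python tuple with implicit trailing zeros, and the function names right_shift and left_shift are chosen here for the example.

```python
def right_shift(x):
    """R(x1, x2, ...) = (0, x1, x2, ...) on a finitely supported sequence."""
    return (0,) + tuple(x)

def left_shift(x):
    """L(x1, x2, ...) = (x2, x3, ...); the first coordinate is discarded."""
    return tuple(x[1:])

x = (1, 2, 3)
print(left_shift(right_shift(x)))  # LR acts as the identity
print(right_shift(left_shift(x)))  # RL loses the first coordinate
e1 = (1, 0, 0)
print(left_shift(e1))              # L sends e1 to the zero sequence
```

Thus LR = I while RL ≠ I: the sequence e1 is a nonzero element of Ker(L), and no sequence with a nonzero first term lies in the range of R.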

Definition. The spectrum, 𝜎(T), of an operator T ∈ ℒ(X) is the set of all complex
numbers 𝜆 such that T − 𝜆I is not invertible. It follows that there are two types
of points in the spectrum:

(a) Complex numbers 𝜆 such that Ker(T − 𝜆I) ≠ {0}: Such a number 𝜆 is called
an eigenvalue of T. Specifically, 𝜆 is an eigenvalue of T if there exists a nonzero
vector x such that Tx = 𝜆x. In this case, we say that x is an eigenvector of T
corresponding (or belonging) to the eigenvalue 𝜆. The set of eigenvalues of T is
known as the point spectrum of T. The set Ker(T − 𝜆I) is called the eigenspace
of T corresponding to the eigenvalue 𝜆.

(b) Complex numbers 𝜆 such that T − 𝜆I is one-to-one but not onto, that
is, ℜ(T − 𝜆I) ≠ X. We will not dwell on this part of the spectrum, since the
eigenvalues are the only important part of the spectrum for our purposes.

The complement of the spectrum of T in the complex plane is called the resolvent
set of T and is denoted 𝜌(T). Thus 𝜆 ∈ 𝜌(T) if and only if (T − 𝜆I)−1 exists. If
𝜆 ∈ 𝜌(T), we use the notation T𝜆 to denote (T − 𝜆I)−1 .

Example 2. Define an operator T on 𝒞[0, 1] as follows: for f ∈ 𝒞[0, 1], (Tf)(x) =


xf(x). The reader can easily verify that T has no eigenvalues. Thus the spectrum
consists only of complex numbers 𝜆 for which T − 𝜆I is not onto. For 𝜆 ∈ ℂ
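and g ∈ 𝒞[0, 1], if there exists a function f ∈ 𝒞[0, 1] such that (T − 𝜆I)f = g, then
f(x) = g(x)/(x − 𝜆) for x ≠ 𝜆. If 𝜆 ∉ [0, 1], this quotient is continuous on [0, 1] for every g, so T − 𝜆I is onto; if 𝜆 ∈ [0, 1], the constant function g = 1 has no continuous preimage. Therefore the spectrum is the interval [0, 1].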

Example 3. Every complex number 𝜆 in the open unit disk is an eigenvalue of the
left shift operator on l2 .

If 0 ≠ 𝜆 ∈ ℂ and |𝜆| < 1, then the vector x𝜆 = (𝜆, 𝜆2 , 𝜆3 , ...) is clearly in l2


and L(x𝜆 ) = 𝜆x𝜆 . Also, 𝜆 = 0 is an eigenvalue of L because L(e1 ) = 0, where
e1 = (1, 0, 0, ...). 
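
The eigenvector relation in example 3 can be checked on a truncation of x𝜆. This is a numerical sketch under stated assumptions: the value of lam and the truncation length N are arbitrary choices, and only the first N − 1 coordinates of L(x𝜆) are compared, since the truncated vector does not carry the term 𝜆^{N+1}.

```python
def left_shift(x):
    """Drop the first coordinate: L(x1, x2, ...) = (x2, x3, ...)."""
    return x[1:]

lam = 0.5 + 0.25j   # arbitrary point of the open unit disk
N = 40              # truncation length (an assumption of this sketch)
x = [lam ** k for k in range(1, N + 1)]   # (lam, lam^2, ..., lam^N)

Lx = left_shift(x)
# Largest coordinatewise gap between L(x) and lam * x on the truncation.
residual = max(abs(Lx[k] - lam * x[k]) for k in range(N - 1))

# The squared l2 norm is a geometric series bounded by |lam|^2 / (1 - |lam|^2).
norm_sq = sum(abs(t) ** 2 for t in x)
bound = abs(lam) ** 2 / (1 - abs(lam) ** 2)
print(residual, norm_sq, bound)
```

The geometric bound on norm_sq is what places x𝜆 in l2 when |𝜆| < 1; for |𝜆| ≥ 1 the same sequence is not square summable.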

Lemma 6.5.1. If T ∈ ℒ(X), and ‖T‖ < 1, then (I − T)^{−1} exists, (I − T)^{−1} =
∑_{n=0}^∞ T^n, and ‖(I − T)^{−1}‖ ≤ 1/(1 − ‖T‖).

Proof. First observe that ∑_{n=0}^∞ ‖T^n‖ ≤ ∑_{n=0}^∞ ‖T‖^n = 1/(1 − ‖T‖). Thus the series ∑_{n=0}^∞ T^n
converges to an operator S ∈ ℒ(X), and ‖S‖ ≤ 1/(1 − ‖T‖). Now (I − T) ∑_{j=0}^n T^j = I − T^{n+1}. Taking the
limit as n → ∞, (I − T)S = I. Similarly, S(I − T) = I; hence (I − T)−1 = S. 
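
The series in lemma 6.5.1 can be summed numerically for a small matrix. The sketch below is an illustration under assumptions not made in the text: X is ℝ² with the maximum norm, so that the operator norm is the maximum absolute row sum (norm_inf), and the matrix T and the number of terms are arbitrary choices.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def norm_inf(A):
    """Operator norm induced by the max norm: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def neumann_inverse(T, terms=200):
    """Partial sum I + T + T^2 + ... + T^terms of the series for (I - T)^(-1)."""
    n = len(T)
    S = identity(n)   # running partial sum, starting with T^0
    P = identity(n)   # current power of T
    for _ in range(terms):
        P = matmul(P, T)
        S = matadd(S, P)
    return S

T = [[0.2, 0.3], [-0.1, 0.4]]   # norm_inf(T) = 0.5 < 1
S = neumann_inverse(T)
I_minus_T = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(2)]
             for i in range(2)]
product = matmul(I_minus_T, S)   # should be close to the identity
print(product, norm_inf(S), 1 / (1 - norm_inf(T)))
```

The last two printed numbers compare ‖(I − T)^{−1}‖ with the bound 1/(1 − ‖T‖) from the lemma.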

Theorem 6.5.2. Let T ∈ ℒ(X). If 𝜆 ∈ ℂ and |𝜆| > ‖T‖, then 𝜆 ∈ 𝜌(T).

Proof. Since ‖T‖ < |𝜆|, ‖T/𝜆‖ < 1. By lemma 6.5.1, (I − T/𝜆)^{−1} exists. Thus T − 𝜆I
is invertible since (T − 𝜆I)^{−1} = −𝜆^{−1}(I − T/𝜆)^{−1}. Notice that, in this case,
T𝜆 = (T − 𝜆I)^{−1} = −(1/𝜆) ∑_{n=0}^∞ T^n/𝜆^n.

Corollary 6.5.3. The spectrum 𝜎(T) of an operator T ∈ ℒ(X) is bounded.

Proof. By theorem 6.5.2, 𝜎(T) is contained in the closed disk {z ∈ ℂ ∶


|z| ≤ ‖T‖}. 

Theorem 6.5.4. The spectrum 𝜎(T) of an operator T ∈ ℒ(X) is a closed, hence


compact, subset of ℂ.

Proof. We show that 𝜌(T) is an open subset of ℂ. Let 𝜆0 ∈ 𝜌(T), and let 𝜆 ∈ ℂ. Recall
the notation T𝜆 = (T − 𝜆I)−1 . Now

T − 𝜆I = (T − 𝜆0 I) − (𝜆 − 𝜆0 )I = (T − 𝜆0 I)[I − (𝜆 − 𝜆0 )T𝜆0 ].

If |𝜆 − 𝜆0 | < 1/‖T𝜆0 ‖, then, by lemma 6.5.1, I − (𝜆 − 𝜆0 )T𝜆0 is invertible, and


hence T − 𝜆I is also invertible, being the composition of invertible operators. This
shows that 𝜌(T) contains the disk in the complex plane centered at 𝜆0 of radius
1/‖T𝜆0 ‖, and therefore 𝜌(T) is open in ℂ. 

Definition. The spectral radius of an operator T is the number

r(T) = sup{|𝜆| ∶ 𝜆 ∈ 𝜎(T)}.


Thus r(T) is the radius of the smallest closed disk centered at the origin of the complex plane that
contains 𝜎(T). By corollary 6.5.3, r(T) ≤ ‖T‖. It is possible that r(T) < ‖T‖. See
problem 5 in section 7.4.

Example 4. Let L ∶ l2 → l2 be the left shift operator. It is clear that ‖L(x)‖2 ≤ ‖x‖2 ,
and since ‖L(e2 )‖2 = ‖e1 ‖2 = 1 = ‖e2 ‖2 , ‖L‖ = 1. Therefore the spectrum of
L is contained in the closed unit disk, D. By example 3, 𝜎(L) contains the open unit disk, and 𝜎(L) is closed
by theorem 6.5.4; hence 𝜎(L) = D. Thus, ‖L‖ = r(L) = 1.

The last conclusion of the previous example is true for the right shift operator. We
derive it without directly computing 𝜎(R).

Example 5. For the right shift operator R, ‖R‖ = r(R) = 1.

Since R is an isometry, ‖R‖ = 1. The result follows if we prove that 𝜆 = 1 ∈ 𝜎(R).


We show below that R − I is not onto.
We formally compute the inverse image, x = (xn ), under R − I of an element
y = (yn ) ∈ l2 . If (R − I)(x) = y, then (−x1 , x1 − x2 , x2 − x3 , ...) = (y1 , y2 , ...).
Equating the corresponding terms, we have −x1 = y1 , x1 − x2 = y2 , … , xn −
xn+1 = yn+1 , ….
Solving for x, we have x1 = −y1 , x2 = −y1 − y2 , … , xn = −y1 − y2 − ... − yn .
Now if y = (1/n) ∈ l2 , then there is no x ∈ l2 such that (R − I)(x) = y since
xn = −∑_{i=1}^n 1/i → −∞.
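
The formal computation in example 5 can be reproduced on a truncation. The sketch below is an illustration only; the truncation length N and the function names are choices made here. It builds the formal preimage of y = (1/n), confirms that R − I maps it back to y, and shows that its terms do not tend to zero.

```python
def formal_preimage(y):
    """Return x with x_n = -(y_1 + ... + y_n), the formal solution of (R - I)x = y."""
    x, partial = [], 0.0
    for yn in y:
        partial += yn
        x.append(-partial)
    return x

def apply_R_minus_I(x):
    """(R - I)x = (-x1, x1 - x2, x2 - x3, ...), truncated to len(x) terms."""
    return [-x[0]] + [x[k - 1] - x[k] for k in range(1, len(x))]

N = 10000                                   # truncation length (assumption)
y = [1.0 / n for n in range(1, N + 1)]      # y = (1/n), a member of l2
x = formal_preimage(y)

gap = max(abs(a - b) for a, b in zip(apply_R_minus_I(x), y))
y_norm_sq = sum(t * t for t in y)           # stays below pi^2 / 6
x_norm_sq = sum(t * t for t in x)           # at least N, since |x_n| >= 1
print(gap, y_norm_sq, x[-1], x_norm_sq)
```

Since |xn | ≥ 1 for every n, the partial sums of ∑ |xn |^2 grow at least linearly in N, which is the numerical face of the statement that x ∉ l2.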
i

Before we show that the spectrum of a bounded linear operator on a Banach space
is not empty, we need to establish the following identity: for 𝜆 and 𝜇 ∈ 𝜌(T),

T𝜆 − T𝜇 = (𝜆 − 𝜇)T𝜆 T𝜇 . (1)

Indeed,

T𝜆 = (T − 𝜆I)−1 = T𝜆 (T − 𝜇I)T𝜇 = T𝜆 [T − 𝜆I + (𝜆 − 𝜇)I]T𝜇
= [I + (𝜆 − 𝜇)T𝜆 ]T𝜇 = T𝜇 + (𝜆 − 𝜇)T𝜆 T𝜇 .
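
Identity (1) can be tested numerically on a 2 × 2 matrix. The following sketch is an illustration under assumptions of its own: the matrix T and the points lam and mu are arbitrary choices in the resolvent set, and inv2 uses the explicit formula for the inverse of a 2 × 2 matrix.

```python
def matmul2(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Inverse of a 2 x 2 matrix via the adjugate formula (nonzero determinant assumed)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent(T, z):
    """T_z = (T - z I)^(-1) for a 2 x 2 matrix T and a point z outside its spectrum."""
    return inv2([[T[0][0] - z, T[0][1]], [T[1][0], T[1][1] - z]])

T = [[1, 2], [0, 3]]          # eigenvalues 1 and 3
lam, mu = 2j, -1 + 0.5j       # two points of the resolvent set

T_lam, T_mu = resolvent(T, lam), resolvent(T, mu)
lhs = [[T_lam[i][j] - T_mu[i][j] for j in range(2)] for i in range(2)]
prod = matmul2(T_lam, T_mu)
rhs = [[(lam - mu) * prod[i][j] for j in range(2)] for i in range(2)]
err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(err)   # expected to be at rounding level
```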

We need the following result from complex analysis, which we state without proof.

Lemma 6.5.5 (Liouville’s theorem). If F(z) is a bounded differentiable complex


function defined on the entire complex plane, then F is constant. 

Theorem 6.5.6. The spectrum of a bounded linear operator T on a Banach space X


is nonempty.

Proof. Suppose, contrary to the above statement, that 𝜎(T) = ∅. Thus 𝜌(T) = ℂ.
For an arbitrary but fixed functional g ∈ (ℒ(X))∗ , define a function F ∶ ℂ → ℂ
by F(𝜆) = g(T𝜆 ). By identity (1), [F(𝜆) − F(𝜇)]/(𝜆 − 𝜇) = g(T𝜆 T𝜇 ).
As 𝜇 → 𝜆, g(T𝜆 T𝜇 ) → g(T𝜆^2 ). Therefore F′(𝜆) = lim𝜇→𝜆 [F(𝜆) − F(𝜇)]/(𝜆 − 𝜇) = g(T𝜆^2 ),
and F is differentiable at every point of the complex plane. If |𝜆| ≥ 1 + ‖T‖, then,
by lemma 6.5.1,

‖T𝜆 ‖ = (1/|𝜆|)‖(T/𝜆 − I)^{−1}‖ ≤ (1/|𝜆|) · 1/(1 − ‖T‖/|𝜆|) = 1/(|𝜆| − ‖T‖) ≤ 1. (2)

Thus ‖T𝜆 ‖ is bounded by 1 outside the closed disk, D, of radius 1 + ‖T‖. Therefore,
outside the disk D, |F(𝜆)| = |g(T𝜆 )| ≤ ‖g‖. Because F is continuous on D, it is
bounded on D; hence F is a bounded differentiable function on the entire complex
plane. By lemma 6.5.5, F(𝜆) is constant. If 𝜖 > 0, there exists a positive constant R
such that ‖T𝜆 ‖ < 𝜖 for |𝜆| ≥ R (see inequality (2) above). Consequently, for such
𝜆, |F(𝜆)| ≤ ‖g‖𝜖. Since 𝜖 is arbitrary, and F is constant, F(𝜆) = 0 for all 𝜆 ∈ ℂ.
Now since g is an arbitrary element of (ℒ(X))∗ , T𝜆 = 0 (see corollary 6.4.7). This
is impossible because T𝜆 is invertible. 

The following formula for the spectral radius is well known.

Theorem 6.5.7 (Gelfand’s theorem). Let T be a bounded operator on a Banach


space X. Then r(T) = limn ‖T^n‖^{1/n}.

Proof. By problem 9 at the end of this section, r(T^n) = [r(T)]^n. Therefore r(T) =
[r(T^n)]^{1/n} ≤ ‖T^n‖^{1/n}, and r(T) ≤ lim infn ‖T^n‖^{1/n}. The proof will be complete if
we show that lim supn ‖T^n‖^{1/n} ≤ r(T).
Let 𝜆 ∈ ℂ be such that |𝜆| > ‖T‖. By theorem 6.5.2, T𝜆 = (T − 𝜆I)^{−1} =
−(1/𝜆) ∑_{n=0}^∞ T^n/𝜆^n. If g ∈ (ℒ(X))∗ , then g(T𝜆 ) = −(1/𝜆) ∑_{n=0}^∞ g(T^n)/𝜆^n. By the proof of theorem
6.5.6, the function F(𝜆) = g(T𝜆 ) is differentiable for all 𝜆 ∈ 𝜌(T); thus the function
F(𝜆) extends the series −(1/𝜆) ∑_{n=0}^∞ g(T^n)/𝜆^n to the set {𝜆 ∈ ℂ ∶ |𝜆| > r(T)}. Therefore
the series expansion −(1/𝜆) ∑_{n=0}^∞ g(T^n)/𝜆^n is valid for all complex numbers 𝜆 such that
|𝜆| > r(T).³ Now, for an arbitrary real number a > r(T), the series −(1/a) ∑_{n=0}^∞ g(T^n)/a^n
is convergent; hence the sequence g(T^n/a^n) is bounded. Since g ∈ (ℒ(X))∗ is
arbitrary, T^n/a^n is bounded in ℒ(X) (see problem 2 in section 6.4). Let K > 0 be such that ‖T^n/a^n‖ ≤ K. Then
‖T^n‖^{1/n} ≤ K^{1/n} a, and lim supn ‖T^n‖^{1/n} ≤ a. Since a is an arbitrary number greater
than r(T), lim supn ‖T^n‖^{1/n} ≤ r(T).
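
Gelfand's formula can be observed numerically on a matrix whose norm is much larger than its spectral radius. The sketch below is an illustration under assumptions made here: the operator norm is the maximum absolute row sum (the norm induced by the max norm on ℝ²), and the upper triangular matrix T is an arbitrary choice with spectrum {0.5}.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def norm_inf(A):
    """Operator norm induced by the max norm: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def gelfand_estimates(T, exponents):
    """Return {n: ||T^n||^(1/n)} for each n in the given set of exponents."""
    size = len(T)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    estimates = {}
    for n in range(1, max(exponents) + 1):
        power = matmul(power, T)
        if n in exponents:
            estimates[n] = norm_inf(power) ** (1.0 / n)
    return estimates

T = [[0.5, 10.0], [0.0, 0.5]]   # triangular: the only eigenvalue is 0.5
est = gelfand_estimates(T, {1, 10, 100, 400})
for n in sorted(est):
    print(n, est[n])
```

For this matrix the estimates decrease toward r(T) = 0.5 from above, in agreement with the inequality r(T) ≤ ‖T^n‖^{1/n} used in the proof, while ‖T‖ itself is 10.5.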

3 The series involved here are Laurent series.


Exercises

1. Show that the composition of invertible operators is invertible.


2. Show that if limn Tn = T, and limn Sn = S in ℒ(X), then limn Tn Sn = TS.
3. Let 𝜆1 , … , 𝜆n be distinct eigenvalues of a bounded operator T, and let
u1 , … , un be eigenvectors that correspond to 𝜆1 , … , 𝜆n , respectively. Prove
that u1 , … , un are independent. Hint: Use induction on n.
4. Prove the following version of lemma 6.5.1. If T ∈ ℒ(X) is such that
‖I − T‖ < 1, then T is invertible, T^{−1} = ∑_{n=0}^∞ (I − T)^n, and ‖T^{−1}‖ ≤ 1/(1 − ‖I − T‖).
5. Let T ∈ ℒ(X) be an invertible operator, and let S ∈ ℒ(X). Prove that if
‖S − T‖ < 1/‖T^{−1}‖, then S is invertible.
6. Let T, S ∈ ℒ(X) be invertible operators, such that ‖S − T‖ < 1/(2‖T^{−1}‖). Prove
that ‖S^{−1} − T^{−1}‖ < 2‖T^{−1}‖^2 ‖S − T‖. Hint: First show that ‖I − T^{−1}S‖ <
1/2, then use the identity S^{−1} = [I − (I − T^{−1}S)]^{−1} T^{−1} to show that S^{−1} −
T^{−1} = ∑_{n=1}^∞ (I − T^{−1}S)^n T^{−1}.
7. Let U be the set of all invertible operators in ℒ(X). Prove that U is open in
ℒ(X) and that inversion is a homeomorphism on U.
8. Let T and S be commuting bounded linear operators on a Banach space
X. Prove that if ST is invertible, then S and T are invertible. Also give an
example of two singular operators whose composition is invertible.
9. Prove that, for T ∈ ℒ(X), 𝜎(T^n) = {𝜇^n ∶ 𝜇 ∈ 𝜎(T)}. Conclude that r(T^n) =
[r(T)]^n. Hint: Let 𝜆 ∈ ℂ, and let t^n − 𝜆 = (t − 𝜇1 )...(t − 𝜇n ). Then T^n − 𝜆I =
(T − 𝜇1 I)...(T − 𝜇n I).
10. For a fixed function w ∈ 𝒞[0, 1], define an operator T on 𝒞[0, 1] by
(Tf )(x) = f(x)w(x). Show that T is a bounded operator and that ‖T‖ =
‖w‖∞ . Also give a sufficient condition for T to be invertible.

6.6 Adjoint Operators and Quotient Spaces

In section 3.7, we defined the adjoint of an operator on a finite-dimensional inner


product space, and, in chapter 7, we will study adjoints of operators on a Hilbert
space. The definition of the adjoints on Banach spaces X is more complicated. In
fact, the adjoint of a bounded operator on a Banach space X is a bounded operator
on the dual space X∗ . Among other results, we prove that an operator T and its
adjoint, T∗ , have the same norm, the same spectrum, and the same spectral radius.
We also study annihilators and quotient spaces. Little subsequent material rests on
this section, and it is possible to study the remainder of the book independently
of this section.

Notation. The duality bracket: Let X be a Banach space. For x ∈ X and 𝜆 ∈ X∗ , we


write ⟨x, 𝜆⟩ for 𝜆(x). This is a notational convenience that also facilitates certain
computations. In addition, the notation equalizes the roles of X and X∗ . We already


saw that X acts on X∗ in much the same way X∗ acts on X. See, for example, the
construction leading up to theorem 6.4.8. Observe that |⟨x, 𝜆⟩| ≤ ‖x‖‖𝜆‖, reminiscent
of the Cauchy-Schwarz inequality. We revert to the traditional notation 𝜆(x)
when convenient.

Theorem 6.6.1. Let X be a Banach space, let x ∈ X, 𝜆 ∈ X∗ , and let T ∈ ℒ(X). Then

(a) ‖𝜆‖ = sup{|⟨x, 𝜆⟩| ∶ x ∈ X, ‖x‖ ≤ 1},


(b) ‖x‖ = ‖x̂‖ = sup{|⟨x, 𝜆⟩| ∶ 𝜆 ∈ X∗ , ‖𝜆‖ ≤ 1}, and
(c) ‖T‖ = sup{|⟨Tx, 𝜆⟩| ∶ x ∈ X, 𝜆 ∈ X∗ , ‖x‖ ≤ 1, ‖𝜆‖ ≤ 1}.

Proof. (a) and (b) are previously established facts in new notation. To prove (c),

‖T‖ = sup{‖Tx‖ ∶ ‖x‖ ≤ 1} = sup‖x‖≤1 sup‖𝜆‖≤1 |⟨Tx, 𝜆⟩|


= sup{|⟨Tx, 𝜆⟩| ∶ ‖x‖ ≤ 1, ‖𝜆‖ ≤ 1}. 

Definition. Let T ∈ ℒ(X). We define the adjoint operator T∗ on X∗ by the


requirement that for all x ∈ X,

⟨Tx, 𝜆⟩ = ⟨x, T∗ (𝜆)⟩.

Using conventional notation rather than duality brackets, the requirement in


the above definition can be written as 𝜆(Tx) = (T∗ (𝜆))(x) for every x ∈ X. This
simply means that T∗ (𝜆) = 𝜆 ∘ T, which can well be taken as the definition of the
operator T∗ . It is obvious that T∗ ∈ ℒ(X∗ ).

Example 1. In this example, we use theorem 6.2.4 and identify (l1 )∗ with l∞ . For
elements x = (xn ) ∈ l1 and 𝜆 = (𝜆n ) ∈ l∞ , define T(x) = (x2 , x3 , ...) and S(𝜆) =
(0, 𝜆1 , 𝜆2 , ...). Clearly, T ∈ ℒ(l1 ) and S ∈ ℒ(l∞ ). We claim that S = T∗ . We need
to verify that 𝜆 ∘ T = S(𝜆), which is straightforward since, for x ∈ l1 , 𝜆(T(x)) =
∑_{n=1}^∞ 𝜆n xn+1 = (S(𝜆))(x).
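
The duality relation ⟨Tx, 𝜆⟩ = ⟨x, S(𝜆)⟩ of example 1 can be checked on finitely supported data. The sketch below is an illustration only: sequences are tuples with implicit trailing zeros, the sample entries are arbitrary integers, and the name pair for the duality bracket is chosen here.

```python
def T(x):
    """Left shift on l1: T(x1, x2, ...) = (x2, x3, ...)."""
    return tuple(x[1:])

def S(lam):
    """Right shift on l-infinity: S(lam1, lam2, ...) = (0, lam1, lam2, ...)."""
    return (0,) + tuple(lam)

def pair(x, lam):
    """Duality bracket <x, lam> = sum of x_n * lam_n; missing terms count as zero."""
    return sum(a * b for a, b in zip(x, lam))

x = (3, -1, 4, 1, -5)      # a finitely supported element of l1
lam = (2, 7, 1, -8, 2)     # the first terms of a bounded sequence

print(pair(T(x), lam), pair(x, S(lam)))   # the two brackets coincide
```

Only the terms of 𝜆 that meet the support of x enter either bracket, so truncating 𝜆 does not affect the comparison.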

Theorem 6.6.2. ‖T∗ ‖ = ‖T‖.

Proof. By theorem 6.6.1,

‖T‖ = sup{|⟨Tx, 𝜆⟩| ∶ ‖x‖ ≤ 1, ‖𝜆‖ ≤ 1}


= sup{|⟨x, T∗ 𝜆⟩| ∶ ‖x‖ ≤ 1, ‖𝜆‖ ≤ 1}
= sup{‖T∗ 𝜆‖ ∶ ‖𝜆‖ ≤ 1} = ‖T∗ ‖. 

Example 2. r(T) = r(T∗ ).


It is straightforward to show that, for n ∈ ℕ, (Tn )∗ = (T∗ )n . It follows that
‖(T∗ )n ‖ = ‖(Tn )∗ ‖ = ‖Tn ‖. Employing theorem 6.5.7, we have

r(T∗ ) = limn ‖(T∗ )^n‖^{1/n} = limn ‖T^n‖^{1/n} = r(T).

The next example utilizes several of the ideas of sections 6.3 and 6.4.

Example 3. An operator T ∈ ℒ(X) is invertible if and only if T∗ is invertible.

If T is invertible, then TT−1 = T−1 T = IX . Equating the adjoints of the operators


in the previous identities and using problem 2 at the end of this section, we have
(T−1 )∗ T∗ = T∗ (T−1 )∗ = IX∗ . Thus (T∗ )−1 = (T−1 )∗ .
Now suppose T∗ is invertible, and write S = (T∗ )−1 .
We show that T is bounded away from zero. It then follows from example 8
in section 6.2 that ℜ(T) is closed and that T is injective. Now

‖x‖ = sup{|⟨x, 𝜆⟩| ∶ 𝜆 ∈ X∗ , ‖𝜆‖ ≤ 1}


= sup{|⟨x, T∗ S𝜆⟩| ∶ 𝜆 ∈ X∗ , ‖𝜆‖ ≤ 1}
= sup{|⟨Tx, S𝜆⟩| ∶ 𝜆 ∈ X∗ , ‖𝜆‖ ≤ 1}
≤ ‖Tx‖sup{‖S𝜆‖ ∶ 𝜆 ∈ X∗ , ‖𝜆‖ ≤ 1} = ‖Tx‖‖S‖.

Thus ‖Tx‖ ≥ c‖x‖, where c = 1/‖S‖.


If we show that T is surjective, then T is invertible by theorem 6.3.7, and
the proof will be complete. Suppose there is an element y ∈ X − ℜ(T). Since
ℜ(T) is closed, theorem 6.4.5 yields an element 𝜆 ∈ X∗ such that 𝜆(y) ≠ 0 and
𝜆(Tx) = 0 for every x ∈ X. Now (T∗ 𝜆)(x) = 𝜆(Tx) = 0 for every x ∈ X; hence
T∗ 𝜆 = 0. Since T∗ is injective, 𝜆 = 0, and this is a contradiction. 

Example 4. 𝜎(T) = 𝜎(T∗ ).


For any 𝜆 ∈ ℂ, (T − 𝜆I)∗ = T∗ − 𝜆I. By the above example, T − 𝜆IX is invertible
if and only if T∗ − 𝜆IX∗ is invertible, and the result follows. Observe that the
result of example 2 is a trivial consequence of this result.

Theorem 6.6.3. Let T ∈ ℒ(X). Then, for every x ∈ X, (Tx)̂ = T∗∗ (x̂), where (Tx)̂ denotes the image of Tx under the natural embedding of X into X∗∗ .

Proof. For 𝜆 ∈ X∗ , (Tx)̂(𝜆) = 𝜆(Tx) = (𝜆 ∘ T)(x) = (T∗ 𝜆)(x) = x̂(T∗ 𝜆) = (x̂ ∘ T∗ )(𝜆) =
(T∗∗ (x̂))(𝜆). Thus (Tx)̂ = T∗∗ (x̂).

Loosely interpreted, the above theorem says that T is the restriction of T∗∗ to X.

Definition. Let M be a subspace of a Banach space X, and let N be a subspace of


X∗ . The annihilator M⊥ of M consists of all the functionals in X∗ that vanish
on M. Symbolically, M⊥ = {𝜆 ∈ X∗ ∶ 𝜆(M) = 0}. Similarly, the annihilator of N
is N⊥ = {x ∈ X ∶ 𝜆(x) = 0 ∀𝜆 ∈ N}.

Example 5. Let M be the set of all sequences in l1 where every even term is 0. We
claim that M⊥ is the set S of all the sequences in l∞ where every odd term is 0. It
is clear that if 𝜆 = (𝜆n ) ∈ S, then, for every x = (xn ) ∈ M, ∑_{n=1}^∞ xn 𝜆n = 0; hence
S ⊆ M⊥ . Conversely, if 𝜆 = (𝜆n ) ∈ M⊥ , then 𝜆(e2n−1 ) = 0 for every positive
integer n, since e2n−1 ∈ M. This means that 𝜆2n−1 = 0, and 𝜆 ∈ S. Observe that M⊥ is a closed
subspace of l∞ , consistent with the theorem below.

Theorem 6.6.4. M⊥ is a closed subspace of X∗ , and N⊥ is a closed subspace of X.

Proof. For x ∈ M, x⊥ = {𝜆 ∈ X∗ ∶ 𝜆(x) = 0} is a closed subspace of X∗ because
x⊥ = Ker(x̂). Consequently, M⊥ = ∩x∈M x⊥ is a closed subspace of X∗ . Similarly,
N⊥ = ∩𝜆∈N Ker(𝜆) is a closed subspace of X. 

Theorem 6.6.5. Let T ∈ ℒ(X), let 𝒩(T) and ℜ(T) be the kernel and range of T,
respectively, and let 𝒩(T∗ ), and ℜ(T∗ ) be the kernel and range of T∗ . Then

(a) 𝒩(T) = ℜ(T∗ )⊥ .
(b) 𝒩(T∗ ) = ℜ(T)⊥ .

Proof. (a) x ∈ 𝒩(T) if and only if Tx = 0, if and only if ⟨Tx, 𝜆⟩ = 0 for all 𝜆 ∈ X∗ , if
and only if ⟨x, T∗ 𝜆⟩ = 0 for all 𝜆 ∈ X∗ , if and only if x ∈ ℜ(T∗ )⊥ .

(b) 𝜆 ∈ 𝒩(T∗ ) if and only if T∗ 𝜆 = 0, if and only if ⟨x, T∗ 𝜆⟩ = 0 for all x ∈ X, if
and only if ⟨Tx, 𝜆⟩ = 0 for every x ∈ X, if and only if 𝜆 ∈ ℜ(T)⊥ .

Quotient Spaces

Let M be a closed subspace of a normed linear space X. We define a norm on


the quotient space X/M as follows: for x̃ = x + M ∈ X/M, ‖x̃‖ = inf {‖x − y‖ ∶ y ∈
M} = dist(x, M). We leave it to the reader to verify that the norm we just defined
on X/M is well defined and that ‖x̃‖ = 0 if and only if x̃ = 0. The triangle inequality
is the only norm property yet to be verified. Let x1 , x2 ∈ X, and y1 , y2 ∈ M.
Since y1 + y2 ∈ M, dist(x1 + x2 , M) ≤ ‖(x1 + x2 ) − (y1 + y2 )‖ ≤ ‖x1 − y1 ‖ + ‖x2 −
y2 ‖. Because the last inequality is valid for arbitrary elements y1 and y2 of M,
dist(x1 + x2 , M) ≤ dist(x1 , M) + dist(x2 , M), that is, ‖x̃1 + x̃2 ‖ ≤ ‖x̃1 ‖ + ‖x̃2 ‖.
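
The quotient norm can be computed explicitly in a very small case. The sketch below is an illustration under assumptions not made in the text: X is ℝ² with the Euclidean norm and M is the line spanned by a nonzero vector m, so that dist(x, M) is attained at the orthogonal projection of x onto M. That projection formula is specific to inner product spaces and is not available in a general normed space.

```python
import math

def quotient_norm(x, m):
    """dist(x, span{m}) in Euclidean R^2, computed via orthogonal projection."""
    t = (x[0] * m[0] + x[1] * m[1]) / (m[0] ** 2 + m[1] ** 2)
    return math.hypot(x[0] - t * m[0], x[1] - t * m[1])

m = (1.0, 1.0)                     # M = span{m}
x1, x2 = (3.0, 1.0), (-1.0, 2.0)

shifted = (x1[0] + 5 * m[0], x1[1] + 5 * m[1])    # same coset as x1
total = (x1[0] + x2[0], x1[1] + x2[1])

print(quotient_norm(x1, m), quotient_norm(shifted, m))   # equal: well defined on cosets
print(quotient_norm(total, m), quotient_norm(x1, m) + quotient_norm(x2, m))
```

The first line of output reflects that the norm depends only on the coset x + M; the second illustrates the triangle inequality verified above.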

Remarks. 1. If ‖x̃‖ < 𝛿, then there exists 𝜉 ∈ x̃ such that ‖𝜉‖ < 𝛿. This is because
if ‖x̃‖ < 𝛿, then there exists y ∈ M such that ‖x − y‖ < 𝛿. Set 𝜉 = x − y.
2. For x ∈ X, ‖x̃‖ ≤ ‖x‖. This is because 0 ∈ M; hence ‖x‖ = ‖x − 0‖ ≥ ‖x̃‖.
3. It follows directly from remark 2 that if (xn ) converges to x in X, then (x̃n )
converges to x̃ in X/M. In particular, if the series ∑_{n=1}^∞ 𝜉n converges in X,
then ∑_{n=1}^∞ 𝜉̃n converges in X/M.

Theorem 6.6.6. Let M be a closed subspace of a Banach space X. Then X/M is a
Banach space.

Proof. We use the result of problem 10 on section 6.1. Suppose ∑_{n=1}^∞ ‖x̃n‖ < ∞. We
prove that ∑_{n=1}^∞ x̃n converges in X/M. By remark 1, for every n ∈ ℕ, there exists an
element 𝜉n ∈ x̃n such that ‖𝜉n‖ < ‖x̃n‖ + 1/2ⁿ. Now ∑_{n=1}^∞ ‖𝜉n‖ ≤ ∑_{n=1}^∞ [‖x̃n‖ +
1/2ⁿ] = 1 + ∑_{n=1}^∞ ‖x̃n‖ < ∞. By the completeness of X, the series ∑_{n=1}^∞ 𝜉n
converges in X, and, by remark 3, ∑_{n=1}^∞ 𝜉̃n = ∑_{n=1}^∞ x̃n converges in X/M.

Theorem 6.6.7. Let M be a closed subspace of a Banach space X. Then

(a) (X/M)∗ is isometrically isomorphic to M⊥ .


(b) X∗ /M⊥ is isometrically isomorphic to M∗ .

Proof. (a) Define a map 𝛿 ∶ M⊥ → (X/M)∗ by 𝛿 ∶ 𝜆 ↦ 𝛿𝜆, where 𝛿𝜆(x + M) =
𝛿𝜆(x̃) = 𝜆(x). The map 𝛿 is onto: given 𝜇 ∈ (X/M)∗, define a functional 𝜆 ∶ X → ℂ by
𝜆(x) = 𝜇(x̃). It is easy to see that 𝜆 ∈ X∗, 𝜆 ∈ M⊥, and 𝛿𝜆 = 𝜇. To show that 𝛿
is an isometry, first notice that ‖x̃‖ ≤ ‖x‖ and, by remark 1, if ‖x̃‖ < 1, there
exists an element x ∈ x̃ such that ‖x‖ < 1. Therefore ‖𝛿𝜆‖ = sup_{‖x̃‖<1} |𝛿𝜆(x̃)| =
sup_{‖x‖<1} |𝜆(x)| = ‖𝜆‖.

(b) Let 𝜇 ∈ M∗. By the Hahn-Banach theorem, 𝜇 has an extension 𝜆 ∈ X∗.
Define a mapping 𝜎 ∶ M∗ → X∗/M⊥ by 𝜎𝜇 = 𝜆 + M⊥; 𝜎 is well defined
because if 𝜆 and 𝜆′ are bounded extensions of 𝜇, then (𝜆 − 𝜆′ )(M) = 0; hence
𝜆 − 𝜆′ ∈ M⊥ , and 𝜆 + M⊥ = 𝜆′ + M⊥ . The linearity of 𝜎 is obvious and since
the restriction of any 𝜆 ∈ X∗ to M is in M∗ , 𝜎 is onto. It remains to show
that ‖𝜎𝜇 ‖ = ‖𝜇‖. Observe that 𝜎𝜇 is the collection of all bounded extensions
of 𝜇. Thus, by the definition of the quotient norm, ‖𝜎𝜇 ‖ = inf{‖𝜆‖}, where 𝜆
is a bounded extension of 𝜇. Since, for any such 𝜆, ‖𝜇‖ ≤ ‖𝜆‖, it follows that
‖𝜇‖ ≤ ‖𝜎𝜇 ‖ ≤ ‖𝜆‖. The Hahn-Banach theorem also guarantees an extension 𝜆
for which ‖𝜆‖ = ‖𝜇‖. Thus ‖𝜎𝜇 ‖ = ‖𝜇‖. 

Exercises

1. Show that if T, S ∈ ℒ(X) and a and b are scalars, then (aT + bS)∗ = aT∗ +
bS∗ . Conclude that if X is reflexive, then the correspondence Ψ ∶ T ↦ T∗ is
an isometric isomorphism from ℒ(X) to ℒ(X∗ ).
2. If T and S are as in the above exercise, show that (ST)∗ = T∗ S∗ .
3. Let X be a Banach space. Prove that X⊥ = {0}, {0}⊥ = X∗ . State and prove the
corresponding statements for X∗ .
4. Let M be a subspace of a Banach space X. Prove that (M⊥)⊥ = M̄, the closure
of M. Hint: Use theorem 6.4.5 to show that if x ∉ M̄, then x ∉ (M⊥)⊥.
5. Let X be a Banach space, and let T ∈ ℒ(X). Prove that the closure of ℜ(T)
is 𝒩(T∗)⊥. Conclude that ℜ(T) is dense if and only if T∗ is one-to-one.
6. Let X be a Banach space, and let T ∈ ℒ(X). Show that if xn →w x, then
Txn →w Tx.
7. Let S and T be commuting bounded linear operators on X. Prove that the
eigenspaces of T are S-invariant.
8. Let T ∈ ℒ(X), and suppose M is a T-invariant subspace of X. Prove that M⊥
is invariant under T∗ .
9. Verify the details of the proof that the norm defined on X/M is indeed a
norm.
10. Show that if M is a closed subspace of a Banach space X, then the quotient
map 𝜋 ∶ X → X/M is continuous. Also prove that if N is a finite-dimensional
subspace of X, then 𝜋(N) is a finite-dimensional subspace of X/M.
11. In the quotient space l∞/c0, prove that ‖x̃‖ = lim sup_n |xn|. Hint: For 𝜖 > 0,
there are only finitely many n ∈ ℕ such that |xn| > lim sup_n |xn| + 𝜖.
12. Let X be a Banach space, and let T ∈ ℒ(X). Define T̃ ∶ X/Ker(T) → ℜ(T)
by T̃(x̃) = T(x). Prove that T̃ is a bounded isomorphism. Hint: To show
the continuity of T̃, suppose x̃n → 0, and choose xn ∈ x̃n such that ‖xn‖ <
‖x̃n‖ + 1/n.
13. Let R be the right shift operator on l2 , let M1 be the range of R, and let M2 be
the range of R2 . Determine the quotient spaces l2 /M1 and l2 /M2 . Conclude
that if M1 and M2 are isomorphic closed subspaces of a Banach space X,
then it is not necessarily true that X/M1 and X/M2 are isomorphic.
14. Prove that if M is a closed subspace of a separable Banach space X, then
X/M is separable.
15. Let M be a closed subspace of a Banach space X. Prove that if X∗ is separable,
then so is M∗ .
16. Let M be a closed subspace of a Banach space X. Prove that if M and X/M
are separable, then X is separable. Hint: Let {xn } ⊆ X be such that {x̃n } is
dense in X/M, and let {ym } ⊆ M be dense in M.

17. Let X be a normed linear space, and let M be a closed subspace of X. Prove
that if M and X/M are Banach spaces, then so is X.
18. Let M be a closed subspace of a Banach space X, and let N be a finite-
dimensional subspace of X. Show that M + N is closed. Hint: Consider
𝜋 −1 (𝜋(N)), where 𝜋 ∶ X → X/M is the quotient map.
19. Let M be a complemented subspace of a Banach space X. Show that M⊥ is
complemented in X∗ . Hint: Let P be the projection of X onto M, and let
N = Ker(P). Show that M⊥ = Ker(P∗ ) and that N⊥ = ℜ(P∗ ).
20. Define a linear operator T ∈ ℒ(c0) as follows: for x = (xn), T(x) = (xn/n).
Describe T∗. Recall the result of problem 16 on section 6.2.

6.7 Weak Topologies

The weak topologies are defined in much the same way the product topology
is defined. They are designed to guarantee the continuity of a certain class of
functions. We urge the reader to look up theorem 5.4.1, the definition of the
product topology in section 5.12, and theorem 5.12.1. This section is terminal and
may be omitted without loss of continuity.

Definition. Let X be a normed linear space. The weak topology on X is the
smallest topology relative to which all the bounded linear functionals on X are
continuous. We use the abbreviation w-topology for the weak topology on X.

Definition. Let X be a normed linear space, and let X∗ be its dual. The weak*
topology on X∗ is the smallest topology on X∗ relative to which the functionals
x̂ are continuous. Here x̂ is the image of x ∈ X under the natural embedding of
X into X∗∗ . We use the abbreviation w∗ -topology for the weak* topology on X∗ .

Notice that the definitions of the w- and w∗-topologies are asymmetric. Only
functionals on X∗ of the form x̂ are admitted in the definition of the w∗-topology on
X∗. Thus if X is not reflexive, then the functionals in X∗∗ − X̂ are not guaranteed
to be continuous in the w∗-topology, and indeed they are not. See theorem 6.7.6.

In order to eliminate any potential confusion, we specifically refer to the topology
generated by the norm on a space X (or its dual X∗) as the norm topology on X
(or X∗). The norm topology is also referred to as the strong topology. We denote
the closed unit balls of a normed linear space X and its dual X∗ by B and B∗,
respectively. We use notation such as (B∗, w∗) to indicate the closed unit ball of
X∗, when it is endowed with the w∗-topology.

It follows directly from the definitions that an open base for the w-topology is
the collection of sets of the form ∩_{i=1}^n {x ∈ X ∶ |𝜆i(x) − 𝜆i(x0)| < r}, where r > 0,
x0 ∈ X, and {𝜆1, …, 𝜆n} is a finite subset of X∗. Similarly, an open base for the
w∗-topology is the collection of all sets of the type ∩_{i=1}^n {𝜆 ∈ X∗ ∶ |𝜆(xi) − 𝜆0(xi)| < r},
where r > 0, 𝜆0 ∈ X∗, and {x1, …, xn} is a finite subset of X.

In the exercises in section 6.4, we introduced the notion of a weakly convergent
sequence in a normed linear space. We now reconcile this concept with the
definition of the weak topologies. First recall the definition of a convergent sequence
in a topological space.

Definition. A sequence (xn) in a topological space X is said to converge to a point
x ∈ X if every open neighborhood of x contains all but finitely many terms of
the sequence (xn). Thus if U is an open neighborhood of x, then there exists a
natural number N such that xn ∈ U for every n ≥ N.

Theorem 6.7.1.
(a) A sequence (xn ) converges to x in the w-topology on a normed linear space X
if and only if limn 𝜆(xn ) = 𝜆(x) for every 𝜆 ∈ X∗ .
(b) A sequence (𝜆n ) converges to 𝜆 in the w∗ -topology if and only if limn 𝜆n (x) =
𝜆(x) for every x ∈ X.

Proof. We prove part (b). Let 𝜆n and 𝜆0 be such that limn 𝜆n(x) = 𝜆0(x) for every
x ∈ X. We show that 𝜆n converges to 𝜆0 in the w∗-topology. If U is a w∗-open
neighborhood of 𝜆0, then there exist r > 0 and a finite subset {x1, ..., xm}
of X such that ∩_{i=1}^m {𝜆 ∈ X∗ ∶ |𝜆(xi) − 𝜆0(xi)| < r} ⊆ U. Since for all 1 ≤ i ≤ m,
limn 𝜆n(xi) = 𝜆0(xi), there is a natural number N such that |𝜆n(xi) − 𝜆0(xi)| < r
for all n > N and all 1 ≤ i ≤ m. This means that 𝜆n ∈ ∩_{i=1}^m {𝜆 ∈ X∗ ∶ |𝜆(xi) −
𝜆0(xi)| < r} ⊆ U, for every n > N. The proof of the converse is a partial reversal
of the above argument.
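
As a concrete illustration of part (a), the sketch below works in l2 and assumes the standard fact (made precise by the Riesz representation theorem in chapter 7) that each y ∈ l2 induces the bounded functional x ↦ ∑ xk yk. The helper names `e` and `pairing`, the truncation length, and the particular choice of y are assumptions made for the illustration. It only samples one functional, so it suggests rather than proves that the standard unit vectors converge weakly to 0 while their norms stay equal to 1.

```python
import math

def e(n, length):
    """First `length` coordinates of the n-th standard unit vector (1-indexed)."""
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

def pairing(x, y):
    """The value at x of the functional induced by y: sum_k x_k * y_k."""
    return sum(a * b for a, b in zip(x, y))

LENGTH = 2000                                    # truncation length (illustrative)
y = [1.0 / k for k in range(1, LENGTH + 1)]      # a square-summable sequence

indices = (1, 10, 100, 1000)
values = [pairing(e(n, LENGTH), y) for n in indices]   # equals y_n, which tends to 0
norms = [math.sqrt(pairing(e(n, LENGTH), e(n, LENGTH))) for n in indices]
```

Here `values` decreases toward 0 while every entry of `norms` is 1, which is the numerical shadow of a sequence that converges in the w-topology but not in the norm topology.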

Theorem 6.7.2. Let X be a finite-dimensional normed linear space. Then

(a) The w-topology and the norm topology on X coincide.


(b) The w∗ -topology and the norm topology on X∗ coincide.

Proof. We prove part (b). We show that if U = {𝜆 ∈ X∗ ∶ ‖𝜆 − 𝜆0‖ < r}, then U
contains a w∗-neighborhood V of 𝜆0. Let {e1, …, en} be a basis for X, and define a norm
on X∗ by ‖𝜆‖′ = max_{1≤i≤n} |𝜆(ei)|. Since all norms on X∗ are equivalent, there
exists 𝛿 > 0 such that ‖𝜆‖′ < 𝛿 implies that ‖𝜆‖ < r. The w∗-open neighborhood
V = ∩_{i=1}^n {𝜆 ∈ X∗ ∶ |𝜆(ei) − 𝜆0(ei)| < 𝛿} of 𝜆0 is contained in U.

Before we can prove that the w-topology is different from the norm topology on
an infinite-dimensional normed linear space, we need a fact from general vector
space theory.

Lemma 6.7.3. Let U, V, and W be vector spaces, and let 𝜋 ∶ U → V and 𝜑 ∶ U → W
be linear mappings such that Ker(𝜋) ⊆ Ker(𝜑). Then there exists a linear mapping
𝜓 ∶ V → W such that 𝜓 ∘ 𝜋 = 𝜑.

Proof. Let V1 = ℜ(𝜋), and define 𝜓 ∶ V1 → W by 𝜓(𝜋(x)) = 𝜑(x). The condition
Ker(𝜋) ⊆ Ker(𝜑) guarantees that 𝜓 is well defined. Let V2 be an algebraic
complement of V1 in V, and extend the definition of 𝜓 to V by 𝜓(v) = 𝜓(v1), where
v = v1 + v2 and vi ∈ Vi. By construction, 𝜓 ∘ 𝜋 = 𝜑.

Lemma 6.7.4. Let X be a vector space, and let 𝜆 and 𝜆1, …, 𝜆n be linear functionals
on X. If ∩_{i=1}^n Ker(𝜆i) ⊆ Ker(𝜆), then 𝜆 is a linear combination of 𝜆1, …, 𝜆n.

Proof. Define 𝜋 ∶ X → 𝕂ⁿ by 𝜋(x) = (𝜆1(x), …, 𝜆n(x)). The condition ∩_{i=1}^n Ker(𝜆i) ⊆
Ker(𝜆) implies that Ker(𝜋) ⊆ Ker(𝜆). The previous lemma produces a functional
𝜓 ∶ 𝕂ⁿ → 𝕂 such that 𝜓 ∘ 𝜋 = 𝜆. Because 𝜓 is linear, there exist scalars a1, …, an
such that for (v1, …, vn) ∈ 𝕂ⁿ, 𝜓(v1, …, vn) = ∑_{i=1}^n ai vi. Now, for x ∈ X, 𝜆(x) =
(𝜓 ∘ 𝜋)(x) = 𝜓(𝜆1(x), …, 𝜆n(x)) = ∑_{i=1}^n ai 𝜆i(x).

Theorem 6.7.5. A nonempty weakly open subset U of an infinite-dimensional normed
linear space X is unbounded.

Proof. Without loss of generality, we assume that 0 ∈ U. Then there are r > 0 and
a finite subset {𝜆1, …, 𝜆n} of X∗ such that ∩_{i=1}^n {x ∈ X ∶ |𝜆i(x)| < r} ⊆ U. The set
N = ∩_{i=1}^n Ker(𝜆i) is clearly contained in U. If N = {0}, then, for every 𝜆 ∈ X∗,
N ⊆ Ker(𝜆). By lemma 6.7.4, every 𝜆 ∈ X∗ would be a linear combination of
𝜆1, …, 𝜆n, contradicting the assumption that X, hence X∗, is infinite dimensional.
Thus N ≠ {0}, and, for any nonzero x ∈ N, the line {cx ∶ c ∈ ℝ} ⊆ N; hence U is
unbounded.

The above theorem implies that the weak and norm topologies on an infinite-
dimensional space X are distinct since no nonempty bounded open subset of X can be
weakly open.

Weak topologies are generally intricate, and great caution must be exercised when
formulating arguments involving them. In metric topologies, when one speaks
of an open neighborhood of a point x, one instinctively thinks of an open ball
centered at x. In an infinite-dimensional space, a w-open neighborhood of a point
looks nothing like an open ball since nonempty bounded subsets of X are never w-open.

We now prove that the w∗ -topology is very tight in the sense that it admits
the continuity of no linear functionals other than the functionals x̂ used in the
definition of the w∗ -topology.

Theorem 6.7.6. Let X be a Banach space, and let F be a w∗-continuous linear
functional on X∗. Then F = x̂ for some x ∈ X.

Proof. Let D be the open unit disk in the complex plane. By the w∗-continuity of
F, F⁻¹(D) is w∗-open; hence it contains a w∗-neighborhood of 0 of the form
U = ∩_{i=1}^n {𝜆 ∈ X∗ ∶ |𝜆(xi)| < r} for some r > 0 and some finite subset {x1, …, xn}
of X. In particular, F(U) is a bounded subset of the complex plane. We show that
∩_{i=1}^n Ker(x̂i) ⊆ Ker(F). If 𝜆 ∈ ∩_{i=1}^n Ker(x̂i), then, clearly, 𝜆 ∈ U and c𝜆 ∈ U, for
all c ∈ ℝ; hence |c||F(𝜆)| = |F(c𝜆)| is bounded for all c ∈ ℝ. This forces F(𝜆) = 0.
Therefore ∩_{i=1}^n Ker(x̂i) ⊆ Ker(F). By lemma 6.7.4, F is a linear combination of
x̂1, …, x̂n; hence F ∈ X̂.

Theorem 6.7.7 (the Banach-Alaoglu theorem). Let X be a normed linear space.
Then B∗ = {𝜆 ∈ X∗ ∶ ‖𝜆‖ ≤ 1} is compact in the w∗-topology.

Proof. For each x ∈ X, let Dx = {z ∈ ℂ ∶ |z| ≤ ‖x‖}, and let D = ∏_{x∈X} Dx. By
Tychonoff’s theorem, D is compact. For each 𝜆 ∈ B∗, define f𝜆 ∈ D by f𝜆(x) = 𝜆(x).
Since, for x ∈ X, |f𝜆 (x)| ≤ ‖𝜆‖‖x‖ ≤ ‖x‖, f𝜆 (x) ∈ Dx , and, indeed, f𝜆 ∈ D. The
function f ∶ B∗ → D given by 𝜆 ↦ f𝜆 clearly injects B∗ into D. For the rest of the
proof, we identify 𝜆 and f𝜆 and consider B∗ as a subset of D. The w∗ -topology
on B∗ is the restriction of the product topology on D to B∗ . The proof will
be complete if we show that B∗ is closed in D. Let 𝜇 ∈ D be a closure point
of B∗ . We need to show that 𝜇 ∈ B∗ . Fix a pair of points x, y ∈ X, and let
𝜖 > 0. The D-open set {g ∈ D ∶ |g(x) − 𝜇(x)| < 𝜖/3, |g(y) − 𝜇(y)| < 𝜖/3, |g(x +
y) − 𝜇(x + y)| < 𝜖/3} intersects B∗ , so there exists an element 𝜆 ∈ B∗ such that
|𝜆(x) − 𝜇(x)| < 𝜖/3, |𝜆(y) − 𝜇(y)| < 𝜖/3, and |𝜆(x + y) − 𝜇(x + y)| < 𝜖/3. Since
𝜆(x + y) − 𝜆(x) − 𝜆(y) = 0,

|𝜇(x + y) − 𝜇(x) − 𝜇(y)| = |𝜇(x + y) − 𝜇(x) − 𝜇(y) − 𝜆(x + y) + 𝜆(x) + 𝜆(y)|
≤ |𝜆(x + y) − 𝜇(x + y)| + |𝜆(x) − 𝜇(x)| + |𝜆(y) − 𝜇(y)| < 𝜖.

Since 𝜖 is arbitrary, 𝜇(x + y) = 𝜇(x) + 𝜇(y). The homogeneity of 𝜇 is proved using
a similar argument. Finally, 𝜇 ∈ B∗ because 𝜇(x) ∈ Dx, so |𝜇(x)| ≤ ‖x‖, which
means that 𝜇 is bounded and ‖𝜇‖ ≤ 1.

The following corollary is curious if not very practical.



Corollary 6.7.8. Every Banach space X is isometrically isomorphic to a closed
subspace of 𝒞(K) for some compact Hausdorff space K.

Proof. By the Banach-Alaoglu theorem, the space K = (B∗, w∗) is compact. Define
a function F ∶ X → 𝒞(K) by F(x) = x̂; F is a linear isometry since ‖F(x)‖∞ =
sup{|x̂(𝜆)| ∶ 𝜆 ∈ B∗} = ‖x̂‖ = ‖x‖. (Note: The norm on 𝒞(K) is the supremum
norm.) Thus X is isometrically isomorphic to F(X), which is a closed subspace
of 𝒞(K).

Theorem 6.7.9. Let X be a separable Banach space. Then (B∗ , w∗ ) is metrizable.

Proof. Let {xn} be a countable dense subset of B. For 𝜆, 𝜇 ∈ B∗, define

d(𝜆, 𝜇) = ∑_{n=1}^∞ 2⁻ⁿ |𝜆(xn) − 𝜇(xn)|.

If d(𝜆, 𝜇) = 0, then (𝜆 − 𝜇)(xn) = 0 for all n. The density of {xn} forces 𝜆 = 𝜇.
The other defining properties of a metric are easy to verify. We show that the
identity function I ∶ (B∗, w∗) → (B∗, d) is a homeomorphism. It follows that
(B∗, w∗) is metrizable. Since (B∗, w∗) is compact and (B∗, d) is Hausdorff, it
suffices to show that I is continuous. See theorem 5.8.7. To this end, we prove that a
d-open ball U = {𝜆 ∈ B∗ ∶ d(𝜆, 𝜆0) < r} contains a w∗-open neighborhood V
of 𝜆0. Because |𝜆(xi) − 𝜆0(xi)| ≤ 2 for all 𝜆 ∈ B∗ and all i ∈ ℕ, there exists
a positive integer N such that ∑_{i=N+1}^∞ 2⁻ⁱ |𝜆(xi) − 𝜆0(xi)| < r/2 for
all 𝜆 ∈ B∗. For every 𝜆 ∈ B∗ satisfying |𝜆(xi) − 𝜆0(xi)| < r/2 for all 1 ≤ i ≤ N,
we have d(𝜆, 𝜆0) = ∑_{i=1}^N 2⁻ⁱ |𝜆(xi) − 𝜆0(xi)| + ∑_{i=N+1}^∞ 2⁻ⁱ |𝜆(xi) − 𝜆0(xi)| < r/2 +
r/2. Thus the w∗-neighborhood V = ∩_{i=1}^N {𝜆 ∈ B∗ ∶ |𝜆(xi) − 𝜆0(xi)| < r/2} of 𝜆0 is
contained in U.

Corollary 6.7.10. If X is separable, so is (X∗ , w∗ ).

Proof. By theorems 6.7.7 and 6.7.9, (B∗, w∗) is compact and metrizable; hence, it is
separable. Since X∗ = ∪_{n=1}^∞ nB∗, X∗ is separable in the w∗-topology.

The converse of theorem 6.7.9 is also true. Recall (see theorem 4.9.10) that if K is
a compact metric space, then 𝒞(K) is separable.

Theorem 6.7.11. Let X be a Banach space. Then (B∗ , w∗ ) is metrizable if and only
if X is separable.

Proof. If X is separable, then (B∗, w∗) is metrizable by theorem 6.7.9. Conversely,
suppose (B∗, w∗) is metrizable. Theorem 6.7.7 implies that K = (B∗, w∗) is a
compact metrizable space; hence 𝒞(K) is separable. The mapping F ∶ X → 𝒞(K)
defined by F(x) = x̂ is an isometry. Hence X is separable because it is isometric to
F(X), which is separable, being a subspace of a separable metric space.

Exercises

1. Prove that the w- and the w∗ -topologies are Hausdorff.


2. Complete the proof of theorem 6.7.1.
3. Complete the proof of theorem 6.7.2.
4. Let X be an infinite-dimensional normed linear space. Prove that every w-
neighborhood of 0 contains an infinite-dimensional subspace of X. Hint:
Examine the proof of theorem 6.7.5, and see problem 19 on section 3.4.
5. In connection with corollary 6.7.8, prove that F(X) is a closed subspace
of 𝒞(K).
6. Let M be a subspace of a normed linear space X. Prove that the w-closure
of M is a subspace of X.
7. Prove that every norm-closed subspace M of a Banach space X is w-closed.
Conclude that the w-closure and the norm-closure of a subspace of X
coincide.
8. Prove that a Banach space is separable if it is weakly separable.
9. Prove that if X is a Banach space such that X∗ is separable, then (B, w) is
metrizable.
10. Let K be a compact subset of a Banach space X. Prove that the weak and
norm topologies on K coincide. Hint: See problem 6 on section 5.8.

7
Hilbert Spaces

Wir müssen wissen.


Wir werden wissen.
David Hilbert. 1862–1943


Upon graduation from the Wilhelm Gymnasium, where he spent his final year of
schooling, Hilbert enrolled at the University of Königsberg in the autumn of 1880.
He received his Ph.D. from Königsberg in 1885, remained there as a member of
staff from 1886 to 1895, and was promoted to the rank of professor in 1893. In 1895
Hilbert was appointed to the chair of mathematics at the University of Göttingen,
where he spent the rest of his career. Among Hilbert’s numerous students were
Hermann Weyl, Felix Bernstein, Otto Blumenthal, Richard Courant, Alfred Haar,
and Hugo Steinhaus.

Hilbert contributed to many branches of mathematics, including geometry,
algebraic number fields, functional analysis, integral equations, mathematical
physics, and the calculus of variations. Hilbert’s work in geometry had the greatest
influence in that area after Euclid. A systematic study of the axioms of Euclidean
geometry led Hilbert to propose twenty-one such axioms, and he analyzed their
significance. He published Grundlagen der Geometrie in 1899, putting geometry
in a formal axiomatic setting. Hilbert is most remembered for studying
infinite-dimensional Euclidean spaces, which are now known as Hilbert spaces.

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0007

Hilbert’s famous twenty-three Paris problems challenged (and still today
challenge) mathematicians to solve fundamental questions. Hilbert’s famous
speech The Problems of Mathematics was delivered to the Second International
Congress of Mathematicians in Paris. It was a speech full of optimism for
mathematics in the coming century, and he felt that open problems were the sign
of vitality in the subject. Hilbert’s problems included the continuum hypothesis,
Goldbach’s conjecture, and the Riemann hypothesis.

Hilbert’s mathematical abilities were nicely summed up by Otto Blumenthal, his
first student:

In the analysis of mathematical talent one has to differentiate between the ability to
create new concepts that generate new types of thought structures and the gift for
sensing deeper connections and underlying unity. In Hilbert’s case, his greatness
lies in an immensely powerful insight that penetrates into the depths of a question.
All of his works contain examples from far-flung fields in which only he was able
to discern an interrelatedness and connection with the problem at hand. From
these, the synthesis, his work of art, was ultimately created. Insofar as the creation
of new ideas is concerned, I would place Minkowski higher, and of the classical
great ones, Gauss, Galois, and Riemann. But when it comes to penetrating insight,
only a few of the very greatest were the equal of Hilbert.

Hilbert retired in 1930, and the city of Königsberg made him an honorary citizen.
He gave an address which ended with famous words that now appear on his
epitaph:

Wir müssen wissen, wir werden wissen: We must know, we shall know.1

1 Perhaps as a rebuttal of du Bois-Reymond’s statement “we do not know and will not know,”
reflecting the idea that scientific knowledge is unknown and unknowable.

7.1 Definitions and Basic Properties

Let {u1, u2, ...} be an infinite orthonormal sequence of vectors in an inner product
space H, and let x ∈ H. In the introduction to section 4.10, we posed the
following problem. Under what conditions does the sequence of orthogonal
projections, Sn x = ∑_{i=1}^n ⟨x, ui⟩ui = ∑_{i=1}^n x̂i ui, of x on the finite-dimensional space
Mn = Span({u1, …, un}), converge to x? Regardless of whether Sn x converges to
x, it is a Cauchy sequence. To see this, recall the result of problem 5 on section
3.7 (also see theorem 7.2.6), which states that ∑_{n=1}^∞ |x̂n|² < ∞. Now, for m > n,
‖Sm x − Sn x‖² ≤ ∑_{i=n+1}^m |x̂i|². The sum in the last expression tends to 0 as n → ∞
because it is the middle section of the convergent series ∑_{i=1}^∞ |x̂i|². Thus we have a
sufficient condition for the convergence of the sequence Sn x: the completeness of
H. This is exactly the definition of a Hilbert space. The completeness of H merely
guarantees the convergence of Sn x. It does not guarantee that limn Sn x = x, as the
following situation illustrates. If u ∈ H is a unit vector orthogonal to each un, then
Sn u = 0 for all n ∈ ℕ; hence limn Sn u = 0 ≠ u. To remedy this situation, one may
want to impose the condition that no such vector u exists. Equivalently, this means
that the sequence {u1, u2, ...} is a maximal orthonormal subset of H, and this is
precisely the definition of a countable orthonormal basis for H. Hilbert spaces
and orthonormal bases are the subject of our study in this section and the next.
The question about the smallest Hilbert space H in which trigonometric series
of functions in H converge will be settled in section 8.9, together with related
questions pertaining to orthogonal polynomials. It is strongly recommended that you
study sections 3.7 and 4.10 before you tackle this chapter.

Definition. A Hilbert space is a complete inner product space.

Example 1. The spaces 𝕂n and l2 are Hilbert spaces. 

Example 2. The space (𝕂(ℕ), ‖.‖2) is not a Hilbert space. We use the fact that a
subspace of l2 is complete if and only if it is closed. Now 𝕂(ℕ) is not closed
in l2 because it contains the sequence x1 = (1, 0, 0, ...), x2 = (1, 1/2, 0, 0, ...), …,
xn = (1, 1/2, 1/3, ..., 1/n, 0, 0, ...). The limit of the sequence (xn) is the harmonic
sequence x = (1, 1/2, …, 1/n, ...) because ‖xn − x‖₂² = ∑_{j=n+1}^∞ 1/j² → 0 as
n → ∞. Clearly, x ∉ 𝕂(ℕ).
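
The decay of ‖xn − x‖₂² in Example 2 can be observed numerically. The following is a rough sketch, not part of the argument: the function name `tail_norm_squared` and the number of terms kept are assumptions, and the infinite tail is approximated by a long finite sum.

```python
import math

def tail_norm_squared(n, terms=200_000):
    """Approximate sum_{j > n} 1/j^2, the squared l2 distance from x_n to x,
    by keeping only the first `terms` terms of the tail."""
    return sum(1.0 / (j * j) for j in range(n + 1, n + 1 + terms))

# Squared distances for a few truncation points n.
tails = {n: tail_norm_squared(n) for n in (1, 10, 100, 1000)}
```

Each computed value lies below 1/n, in line with the elementary bound ∑_{j>n} 1/j² < 1/n, so the truncated sequences approach the harmonic sequence in l2 even though none of them equals it.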

For ease of reference, we state, without proof, a few results from section 3.7. We
urge the reader to look up the proofs and the basic definitions in section 3.7.

Theorem 7.1.1 (the Cauchy-Schwarz inequality). If H is an inner product space,
then, for all x, y ∈ H, |⟨x, y⟩| ≤ ‖x‖‖y‖. Equality holds if and only if x and y are
linearly dependent.

Corollary 7.1.2. In an inner product space H, ‖x + y‖ ≤ ‖x‖ + ‖y‖. Here
‖x‖ = ⟨x, x⟩^{1/2} is the norm on H induced by the inner product.

Theorem 7.1.3. Let x and y be elements of an inner product space H.

(a) The Pythagorean theorem. If x and y are orthogonal, then

‖x + y‖² = ‖x‖² + ‖y‖².

(b) The Parallelogram law.

‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

(c) The Polarization identity.

‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖² = 4⟨x, y⟩.

Proof. We leave the proof as an exercise. 
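
Before writing out the proof, it can help to test identities (b) and (c) on sample vectors. The sketch below assumes H = ℂ³ with the inner product taken linear in the first argument and conjugate-linear in the second, which is the convention under which the polarization identity above holds as stated. The helper names and the sample vectors are illustrative assumptions.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared norm induced by the inner product."""
    return inner(x, x).real

def combo(x, y, c):
    """Return the vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]

x = [1 + 2j, -3j, 0.5 + 0j]
y = [2 - 1j, 4 + 0j, -1 + 1j]

# Parallelogram law: ||x+y||^2 + ||x-y||^2 = 2||x||^2 + 2||y||^2.
parallelogram_lhs = norm_sq(combo(x, y, 1)) + norm_sq(combo(x, y, -1))
parallelogram_rhs = 2 * norm_sq(x) + 2 * norm_sq(y)

# Polarization identity: the left side should equal 4<x, y>.
polarization_lhs = (norm_sq(combo(x, y, 1)) - norm_sq(combo(x, y, -1))
                    + 1j * norm_sq(combo(x, y, 1j))
                    - 1j * norm_sq(combo(x, y, -1j)))
polarization_rhs = 4 * inner(x, y)
```

With the opposite convention (conjugate-linear in the first argument), the right side of the polarization identity would become 4⟨y, x⟩ instead.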

Not all norms are induced by an inner product. However, we have the following
result, which we limit to real normed linear spaces for simplicity.

Example 3. Suppose that (X, ‖.‖) is a real normed linear space and that the norm
satisfies the parallelogram identity. Then the function

⟨x, y⟩ = (1/4)[‖x + y‖² − ‖x − y‖²]

is an inner product that generates the norm.

It is clear that ⟨x, x⟩ = ‖x‖², thus establishing the positivity of the function
⟨., .⟩ and that it generates the norm. The symmetry of ⟨., .⟩ is obvious. Next we
establish the linearity of ⟨., .⟩.
We leave it to the reader to use the parallelogram identity to show that

‖x + y + z‖² + ‖x‖² + ‖y‖² + ‖z‖² = ‖x + y‖² + ‖y + z‖² + ‖x + z‖². (1)

Replacing z with −z in identity (1), we have

‖x + y − z‖² + ‖x‖² + ‖y‖² + ‖z‖² = ‖x + y‖² + ‖y − z‖² + ‖x − z‖². (2)

Subtracting (2) from (1), we obtain

‖x + y + z‖² − ‖x + y − z‖² = ‖x + z‖² − ‖x − z‖² + ‖y + z‖² − ‖y − z‖²,

which is equivalent to ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

Using the linearity property we just established and induction, it follows
that, for m ∈ ℕ, ⟨mx, y⟩ = m⟨x, y⟩. Since ⟨−x, y⟩ = −⟨x, y⟩, the identity
⟨mx, y⟩ = m⟨x, y⟩ holds for all m ∈ ℤ. Using this, for all n ∈ ℕ, ⟨x, y⟩ =
⟨n(1/n)x, y⟩ = n⟨(1/n)x, y⟩. Equivalently, ⟨(1/n)x, y⟩ = (1/n)⟨x, y⟩.
We have shown that, for all q ∈ ℚ, ⟨qx, y⟩ = q⟨x, y⟩. It is easy to see that if
limn xn = x, then limn ⟨xn, y⟩ = ⟨x, y⟩. Now the homogeneity property, ⟨𝛼x, y⟩ =
𝛼⟨x, y⟩ for 𝛼 ∈ ℝ, is immediate because ℚ is dense in ℝ.
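
The parallelogram identity is a genuine restriction on a norm. As a quick check, the sketch below computes the "parallelogram defect" for the 1-norm and the Euclidean norm on ℝ² at the two standard unit vectors. The function names are assumptions made for the illustration.

```python
import math

def norm1(v):
    """The 1-norm on R^n."""
    return sum(abs(t) for t in v)

def norm2(v):
    """The Euclidean norm on R^n."""
    return math.sqrt(sum(t * t for t in v))

def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2; zero when the identity holds at (x, y)."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

x, y = [1.0, 0.0], [0.0, 1.0]
defect_l1 = parallelogram_defect(norm1, x, y)   # 4 + 4 - 2 - 2
defect_l2 = parallelogram_defect(norm2, x, y)   # zero up to rounding
```

A nonzero defect at a single pair of vectors already shows that the 1-norm on ℝ² is not induced by any inner product.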

Definition. For a subset A of an inner product space H, the annihilator of A is
the set of all vectors that are orthogonal to every element in A. Symbolically,
A⊥ = {x ∈ H ∶ ⟨x, a⟩ = 0 ∀a ∈ A}.

Example 4. Consider the space 𝒞[−𝜋, 𝜋] with the inner product ⟨f, g⟩ =
∫_{−𝜋}^{𝜋} f(x)g(x)dx. Let M be the set of functions in 𝒞[−𝜋, 𝜋] that vanish on the
interval [−𝜋, 0], and let N be the set of all functions in 𝒞[−𝜋, 𝜋] that vanish on
the interval [0, 𝜋]. Every function in M is orthogonal to every function in N.
Thus N ⊆ M⟂ and M ⊆ N⟂.

Theorem 7.1.4. For subsets A and B of an inner product space H,

(a) A ⊆ A⊥⊥ ;
(b) if A ⊆ B, then A⊥ ⊇ B⊥ ;
(c) A⊥ is a closed subspace of H; and
(d) A⊥ = M⊥ , where M = Span(A).

Proof. (a) and (b) are obvious.


(c) Since A⊥ = ∩a∈A a⊥ , it is enough to prove that a⊥ is a closed subspace of
H. If 𝛼, 𝛽 ∈ 𝕂 and x, y ∈ a⊥ , then ⟨𝛼x + 𝛽y, a⟩ = 𝛼⟨x, a⟩ + 𝛽⟨y, a⟩ = 0. Thus
a⊥ is a subspace of H. If xn ∈ a⊥ and limn xn = x, then ⟨x, a⟩ = ⟨limn xn , a⟩ =
limn ⟨xn , a⟩ = 0. The continuity of the inner product in its arguments has been
used here. The proof of part (d) is left as an exercise. 

Definition. If M is a closed subspace of a Hilbert space H, the closed subspace M⊥
is called the orthogonal complement of M (rather than the annihilator of M).
The reason for the above terminology will become apparent in theorem 7.1.7.

Example 8 in section 4.7 is a very special case of the theorem below. Observe that
the completeness of H is crucial here.

Theorem 7.1.5. Let C be a closed convex subset of a Hilbert space H, and let
x ∈ H. Then there exists a unique element y ∈ C such that ‖x − y‖ = dist(x, C) =
inf_{z∈C} ‖x − z‖.

Proof. If x ∈ C, take y = x. If x ∉ C, then 𝛿 = dist(x, C) > 0 because C is closed.
There exists a sequence (yn) in C such that limn ‖x − yn‖ = 𝛿. By the parallelogram
law,

‖yn − ym‖² = ‖(yn − x) − (ym − x)‖² = 2‖yn − x‖² + 2‖ym − x‖² − ‖yn − x + ym − x‖².

Now ‖yn − x + ym − x‖² = 4‖x − (yn + ym)/2‖² ≥ 4𝛿². The last inequality is true
because (yn + ym)/2 ∈ C due to the convexity of C. Thus

‖yn − ym‖² ≤ 2‖yn − x‖² + 2‖ym − x‖² − 4𝛿² → 0 as m, n → ∞.

This shows that (yn) is a Cauchy sequence, and hence y = limn yn exists. Since C is
closed, y ∈ C. Now 𝛿 = limn ‖x − yn‖ = ‖x − y‖, and y is one of the closest points
in C to x. To show that y is unique, suppose z ∈ C is such that ‖x − z‖ = 𝛿. By the
parallelogram law, and as in the calculation above,

‖y − z‖² = 2‖y − x‖² + 2‖x − z‖² − ‖y + z − 2x‖²
= 2𝛿² + 2𝛿² − 4‖x − (y + z)/2‖² ≤ 2𝛿² + 2𝛿² − 4𝛿² = 0.

Hence z = y.

Corollary 7.1.6. If C is a closed convex subset of a Hilbert space H, then C contains
a unique element of smallest norm.

Proof. Apply the above theorem with x = 0.

Theorem 7.1.7 (the projection theorem). Let M be a closed subspace of a Hilbert
space H. Then H = M ⊕ M⊥, where M⊥ is the orthogonal complement of M.

Proof. Let x ∈ H, let y be the closest element of M to x, and let z = x − y. Write
𝛿 = dist(x, M) = ‖z‖. We show that z ∈ M⊥. Let w ∈ M and, without loss of
generality, assume that ‖w‖ = 1. For any 𝛼 ∈ 𝕂, y + 𝛼w ∈ M, so

𝛿² ≤ ‖x − y − 𝛼w‖² = ‖z − 𝛼w‖² = ⟨z − 𝛼w, z − 𝛼w⟩
= ‖z‖² − 𝛼⟨w, z⟩ − 𝛼̄⟨z, w⟩ + |𝛼|²‖w‖²
= 𝛿² − 2Re(𝛼⟨w, z⟩) + |𝛼|².

Therefore 2Re(𝛼⟨w, z⟩) ≤ |𝛼|². Since the above is true for an arbitrary 𝛼, choose
𝛼 = ⟨z, w⟩. We now have 2|⟨w, z⟩|² ≤ |⟨w, z⟩|²; hence ⟨w, z⟩ = 0. The proof is now
complete because M ∩ M⊥ = {0}.

As an immediate consequence of the above theorem, every element x ∈ H can be
written uniquely as x = y + z, where y ∈ M and z ∈ M⊥.

Example 5. Let M be the set of all sequences in l2 whose even terms are zero, and
let N be the set of all sequences in l2 whose odd terms are zero. It is easy to see
that M and N are closed subspaces of l2 and that N = M⟂ . Trivially, every vector
x = (xn ) ∈ l2 can be written as x = (x1 , 0, x3 , 0, ...) + (0, x2 , 0, x4 , ...) ∈ M ⊕ N. 

Definition. The element y in theorem 7.1.7 is called the orthogonal projection
of x on M. It is worth reiterating that y is the closest element in M to x. The
mapping PM ∶ H → H defined by PM(x) = y is called the projection operator
(or simply the projection) of H onto M.

Theorem 7.1.8. Let M be a closed subspace of a Hilbert space H, and let P = PM be
the projection of H on M. Then

(a) P is bounded and ‖P‖ = 1,
(b) ℜ(P) = M, and
(c) P² = P.

Proof. (a) Let x, x′ ∈ H, and let x = y + z and x′ = y′ + z′, where y, y′ ∈ M and
z, z′ ∈ M⊥. Then x + x′ = (y + y′) + (z + z′). Since y + y′ ∈ M and z + z′ ∈ M⊥,
the uniqueness of the orthogonal projection of x + x′ on M forces P(x + x′) = y +
y′ = P(x) + P(x′). The proof that P(ax) = aP(x) is similar. Now ‖x‖² = ‖y‖² +
‖z‖²; thus ‖P(x)‖ = ‖y‖ ≤ ‖x‖, so ‖P‖ ≤ 1. Since P(x) = x for every x ∈ M,
‖P‖ = 1. This proves (a). Parts (b) and (c) are obvious.
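
The properties of the projection operator can be observed in a small finite-dimensional case. The sketch below assumes H = ℝ³ and a two-dimensional subspace M given by a hypothetical orthonormal basis; it uses the formula P(x) = ∑ ⟨x, ui⟩ui for the projection onto the span of an orthonormal set, as in the partial sums Sn x discussed at the start of this section. The function and variable names are illustrative.

```python
import math

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(x, orthonormal_basis):
    """Orthogonal projection of x onto the span of an orthonormal list of vectors."""
    result = [0.0] * len(x)
    for u in orthonormal_basis:
        coeff = dot(x, u)
        result = [r + coeff * ui for r, ui in zip(result, u)]
    return result

s = 1.0 / math.sqrt(2.0)
basis_of_M = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # hypothetical orthonormal basis of M
x = [3.0, -1.0, 2.0]

y = project(x, basis_of_M)                 # component of x in M
z = [a - b for a, b in zip(x, y)]          # component of x in the orthogonal complement

orthogonal = all(abs(dot(z, u)) < 1e-12 for u in basis_of_M)                      # z is in M-perp
idempotent = all(abs(a - b) < 1e-12 for a, b in zip(project(y, basis_of_M), y))   # P(P(x)) = P(x)
pythagoras = abs(dot(x, x) - dot(y, y) - dot(z, z)) < 1e-12                       # ||x||^2 = ||y||^2 + ||z||^2
contraction = dot(y, y) <= dot(x, x)                                              # ||P(x)|| <= ||x||
```

For this choice of x the decomposition is x = (1, 1, 2) + (2, −2, 0), up to rounding.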

The following theorem gives a complete and simple characterization of the dual
of a Hilbert space. The Riesz representation theorem basically says that a Hilbert
space is isometrically isomorphic to its dual in a very natural way.

Theorem 7.1.9 (the Riesz representation theorem). Let 𝜆 be a bounded linear
functional on a Hilbert space H. Then there exists a unique element y ∈ H such
that 𝜆(x) = ⟨x, y⟩ for all x ∈ H. Furthermore, ‖𝜆‖ = ‖y‖.

Proof. If $\lambda = 0$, take $y = 0$. Otherwise, let $M = \mathrm{Ker}(\lambda)$; $M$ is a closed subspace of
$H$ because $M = \lambda^{-1}(0)$, and $M \neq H$ because $\lambda \neq 0$. By the projection theorem,
$H = M \oplus M^\perp$. Pick a nonzero element $z \in M^\perp$. Then $\lambda(z) \neq 0$, and, by replacing
$z$ with $z/\lambda(z)$, we may assume that $\lambda(z) = 1$. For $x \in H$, $x = (x - \lambda(x)z) + \lambda(x)z$. It
is easy to verify that $w = x - \lambda(x)z \in M$.
Observe that $\langle x, z\rangle = \langle w, z\rangle + \langle \lambda(x)z, z\rangle = \lambda(x)\|z\|^2$. Define $y = z/\|z\|^2$. Then,
by the above identity, $\lambda(x) = \langle x, z\rangle/\|z\|^2 = \langle x, y\rangle$. To prove that $y$ is unique, suppose
that there is another element $y_1 \in H$ such that $\langle x, y\rangle = \langle x, y_1\rangle$ for all $x \in H$. Then
$\langle x, y - y_1\rangle = 0$ for all $x \in H$. Choose $x = y - y_1$. Then $\|y - y_1\|^2 = 0$; hence $y = y_1$.
Finally, $|\lambda(x)| = |\langle x, y\rangle| \le \|x\|\|y\|$. Thus $\|\lambda\| \le \|y\|$.
Also $|\lambda(y)| = |\langle y, y\rangle| = \|y\|^2 = \|y\|\|y\|$. This shows that $\|\lambda\| \ge \|y\|$ and that
$\|\lambda\| = \|y\|$. ∎
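
In $\mathbb{C}^n$ the representing vector can be written down explicitly, which makes the theorem easy to test. The sketch below assumes the inner product $\langle x, y\rangle = \sum_i x_i\overline{y_i}$ (linear in the first slot) and a functional $\lambda(x) = \sum_i c_i x_i$; the coefficient list c and the test vector are hypothetical values chosen for illustration.

```python
import math

def inner(x, y):
    """Inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

c = [1 + 2j, -1j, 3 + 0j]            # hypothetical coefficients of the functional

def lam(x):
    """The bounded linear functional lam(x) = sum c_i x_i."""
    return sum(ci * xi for ci, xi in zip(c, x))

y = [ci.conjugate() for ci in c]     # candidate representing vector

x = [2 - 1j, 0.5 + 0j, 1j]
print("lam(x)  =", lam(x))
print("<x, y>  =", inner(x, y))

# The norm of lam is attained at the unit vector y / ||y||.
unit = [yi / norm(y) for yi in y]
print("|lam(y/||y||)| =", abs(lam(unit)), " ||y|| =", norm(y))
```

Note that y is the entrywise conjugate of c, which reflects the conjugate-linearity of the correspondence between functionals and vectors in the complex case.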

Recall that a hyperplane in $\mathbb{R}^n$ is nothing other than the translation of the
null-space of a linear functional on $\mathbb{R}^n$, that all linear functionals on $\mathbb{R}^n$ are
continuous, and that all maximal subspaces are closed. In infinite dimensions, the
null-space of a linear functional $\lambda$ is closed if and only if $\lambda$ is continuous. The
following result is the exact analog of example 10 in section 4.7.

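Example 6. Let $C$ be a closed convex subset of a real Hilbert space $H$, and let
$a \in H - C$. Then there exists a bounded linear functional $\lambda$ on $H$ and a constant $b$
such that $\lambda(y) < b$ for every $y \in C$, and $\lambda(a) > b$.

The obtuse angle criterion extends to the current situation, and the proof is
identical to that in example 9 in section 4.7. Thus if $z$ is the closest element
in $C$ to $a$, then, for every $y \in C$, $\langle a - z, y - z\rangle \le 0$. Let $m = (a + z)/2$, and define
$n = a - z$, $\lambda(x) = \langle x, n\rangle$, and $b = \lambda(m)$. As in example 10 in section 4.7, we may
assume that $m = 0$; hence $b = 0$. It is easy to verify that $\lambda(y) < 0$ for all $y \in C$
and that $\lambda(a) > 0$. Indeed, $m = 0$ gives $z = -a$ and $n = 2a$, so $\lambda(a) = 2\|a\|^2 > 0$,
while $\lambda(y) = \langle y - z, n\rangle + \langle z, n\rangle \le \langle z, n\rangle = -2\|a\|^2 < 0$. ∎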
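
The construction can be tried on a concrete convex set. The sketch below takes $C$ to be the closed unit disk in $\mathbb{R}^2$ and $a = (2, 0)$; both choices, and the sampling grid, are assumptions made for illustration. It builds $z$, $n$, $m$, and $b$ as in the example (without translating $m$ to the origin) and checks the obtuse angle criterion and the separation on sample points of $C$.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

a = (2.0, 0.0)                               # a point outside the unit disk C
r = math.hypot(*a)
z = (a[0] / r, a[1] / r)                     # closest point of C to a
n = (a[0] - z[0], a[1] - z[1])               # normal vector n = a - z
m = ((a[0] + z[0]) / 2, (a[1] + z[1]) / 2)   # midpoint of a and z
b = dot(m, n)                                # separating constant b = lam(m)

def lam(x):
    return dot(x, n)

# Sample points of C on a polar grid.
samples = [(rad * math.cos(t), rad * math.sin(t))
           for rad in (0.0, 0.5, 1.0)
           for t in [2 * math.pi * k / 24 for k in range(24)]]

obtuse = all(dot(n, (y[0] - z[0], y[1] - z[1])) <= 1e-12 for y in samples)
separated = all(lam(y) < b for y in samples) and lam(a) > b
print("obtuse angle criterion holds on samples:", obtuse)
print("lam separates C from a on samples:", separated)
```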

The Completion of an Inner Product Space.

Example 7. If $(x_n)$ and $(y_n)$ are Cauchy sequences in an inner product space, then
$\lim_n \langle x_n, y_n\rangle$ exists.
We prove that the sequence $\langle x_n, y_n\rangle$ is Cauchy in $\mathbb{C}$; hence the limit in question
exists. Recall that Cauchy sequences are bounded. Now

$|\langle x_n, y_n\rangle - \langle x_m, y_m\rangle| = |\langle x_n - x_m, y_n\rangle + \langle x_m, y_n - y_m\rangle|$
$\le \|x_n - x_m\|\|y_n\| + \|x_m\|\|y_n - y_m\| \to 0$ as $m, n \to \infty$. ∎

Theorem 7.1.10. Let (X, ⟨., .⟩) be an incomplete inner product space. Then there
exists a Hilbert space H that contains X as a dense subspace such that the inner
product on X is the restriction of the inner product on H. If X is separable, so is H.

Proof. Let $\|\cdot\|$ be the norm on $X$ induced by the inner product, and let $H$ be the
completion of $X$ with respect to that norm (theorem 6.4.9). Refer to the
extended norm by $\|\cdot\|'$. For $x, y \in H$, choose sequences $(x_n)$ and $(y_n)$ in $X$ such
that $x_n \to x$ and $y_n \to y$, and extend the definition of the inner product to $H$ by
$\langle x, y\rangle' = \lim_n \langle x_n, y_n\rangle$; the limit exists by example 7. We leave it to the reader
to verify that the inner product we just defined is well defined and that it is indeed
an inner product. Clearly, the inner product on $H$ extends that on $X$. Finally, we
prove that the extended inner product induces the extended norm on $H$. For a
sequence $(x_n)$ in $X$ converging to $x \in H$,
$\langle x, x\rangle' = \lim_n \langle x_n, x_n\rangle = \lim_n \|x_n\|^2 = \lim_n (\|x_n\|')^2 = (\|x\|')^2$. ∎

Exercises

1. Prove that the norm on $\mathcal{C}[0, 1]$ generated by the inner product

$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$

is not complete.
2. Prove the parallelogram law and the polarization identity.
3. Let $x$ and $y$ be nonzero vectors in an inner product space. Prove that there
exists a unique number $0 \le \theta \le \pi$ such that $\cos\theta = \frac{\mathrm{Re}\langle x, y\rangle}{\|x\|\|y\|}$. Conclude that
$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\|x\|\|y\|\cos\theta$.
4. Prove the Apollonius identity: For vectors $x$, $y$, and $z$ in an inner product
space, $\|z - x\|^2 + \|z - y\|^2 = \frac{1}{2}\|x - y\|^2 + 2\left\|z - \frac{x+y}{2}\right\|^2$.
5. Let $A$ be a subset of a Hilbert space $H$, and let $M = \mathrm{Span}(A)$. Prove that
$A^\perp = M^\perp$.
6. Let $M$ be a closed subspace of a Hilbert space. Prove that $M = M^{\perp\perp}$. Give
an example to show that the result fails if $M$ is not closed. More generally,
show that $M^{\perp\perp} = \overline{M}$ (the closure of $M$) for every subspace $M$.
7. Show that if A is a subset of a Hilbert space H, then A⊥⊥ is the smallest
closed subspace of H containing A.
8. Let $(x_n)$ and $(y_n)$ be sequences in an inner product space. Prove that
(a) if $\lim_n x_n = 0$, and $(y_n)$ is bounded, then $\lim_n \langle x_n, y_n\rangle = 0$; and
(b) if $y \perp x_n$ for each $n \in \mathbb{N}$, and $\sum_{n=1}^\infty x_n$ is convergent, then $y \perp \sum_{n=1}^\infty x_n$.
9. Prove that if an element $x$ in a Hilbert space $H$ is orthogonal to every vector
in a dense subset of $H$, then $x = 0$.
10. Let $(x_n)$ be a sequence of mutually orthogonal vectors in a Hilbert space $H$.
Prove that $\sum_{n=1}^\infty x_n$ converges in $H$ if and only if $\sum_{n=1}^\infty \|x_n\|^2 < \infty$.
11. Use Hilbert space methods to provide an easy proof of the Hahn-Banach
theorem for Hilbert spaces.
12. Let $M = \{x = (x_1, \dots, x_n) \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1\}$. Show that $M$ is closed and
convex, and find the element in $M$ closest to the origin.
13. Let $C$ be a closed convex subset of a Hilbert space $H$, let $x \in H - C$, and
let $y$ be the closest element of $C$ to $x$. Prove that, for every $z \in C$,
$\mathrm{Re}\langle x - y, z - y\rangle \le 0$.
14. Let $(\delta_n)$ be a positive sequence, and let $C = \{x \in l^2 : |x_n| \le \delta_n \text{ for all } n\}$. Show that $C$
is compact if and only if $\sum_{n=1}^\infty \delta_n^2 < \infty$.

7.2 Orthonormal Bases and Fourier Series

In the introduction to section 7.1, we made the case for the existence of a maximal
orthonormal sequence {u1 , u2 , ...} in a Hilbert space H. As you will see in this
section, some Hilbert spaces do not admit countable maximal orthonormal
subsets. Perhaps we must first tackle the problem of the existence of a maximal
orthogonal subset of H, then examine the problem of which Hilbert spaces possess
a countable such subset. In this section, we provide solutions to both problems
and reveal the basic structure of a Hilbert space, hence paving the way to answer
the problems posed in section 4.10.

The proof of the following theorem can be seen in section 3.7.

Theorem 7.2.1. An orthogonal subset $S$ of a Hilbert space $H$ is independent. ∎

Definition. An orthonormal basis for a Hilbert space $H$ is a maximal orthonormal
subset of $H$. An orthonormal subset of $H$ is maximal if it is not properly
contained in another orthonormal subset of $H$.

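Example 1. We show that $S = \{e_n : n \in \mathbb{N}\}$ is an orthonormal basis for $l^2$.
It is clear that $S$ is orthonormal. If $x = (x_n) \in l^2$ is orthogonal to $S$, then, for every
$n \in \mathbb{N}$, $x_n = \langle x, e_n\rangle = 0$, and hence $x = 0$. Thus no unit vector can be added to $S$,
and $S$ is maximal. ∎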

In the theorem below, we prove a little more than the existence of an orthonormal
basis for an arbitrary Hilbert space.

Theorem 7.2.2. Every orthonormal subset $A$ of a Hilbert space $H$ is contained
in an orthonormal basis for $H$. In particular, every Hilbert space contains an
orthonormal basis.

Proof. Let $\mathfrak{B}$ be the collection of all orthonormal subsets of $H$ that contain $A$; $\mathfrak{B}$ is
not empty since $A$ is one of its members. Order $\mathfrak{B}$ by set inclusion. It is rather
straightforward to show that the union of the members of a chain in $\mathfrak{B}$ is an
orthonormal subset of $H$ that contains $A$ and is therefore an upper bound of the
chain. By Zorn’s lemma, $\mathfrak{B}$ has a maximal member, that is, an orthonormal basis
of $H$ containing $A$. To prove that every Hilbert space possesses an orthonormal
basis, apply the result we just proved with $A = \{x\}$, where $x$ is a unit vector. ∎

The goal of this section is to represent an arbitrary element of a Hilbert space
$H$ in terms of a basis of some kind. If $\dim(H) < \infty$, the goal is too trivial, and
if $\dim(H) = \infty$, the goal is unrealistic if one insists on looking at a Hamel basis
because any such basis is uncountable and hence too big to be useful. The only
realistic expectation is to hope to express an arbitrary element of $H$ as a series
of the basis elements, as was achieved in section 4.10 for trigonometric series of
continuous functions. This means that $H$ has a Schauder basis, which immediately
suggests that we investigate separable Hilbert spaces (see problem 12 in section
6.1). The following theorem is the happy coincidence we hope for.

Theorem 7.2.3. A Hilbert space H is separable if and only if every orthonormal basis
of H is countable.

Proof. If $H$ is separable, then $H$ contains a countable dense subset $\{x_1, x_2, \dots\}$ and,
clearly, $H = \cup_{n\in\mathbb{N}} B(x_n, 1/2)$. If $S = \{u_\alpha\}_{\alpha\in I}$ is an orthonormal basis for $H$, then,
for distinct $\alpha, \beta \in I$, $\|u_\alpha - u_\beta\| = \sqrt{2}$. Since the diameter of each of the balls
$B(x_n, 1/2)$ is at most 1, no such ball can contain more than one member of $S$.
Therefore $S$ is at most countable.
Conversely, if $H$ possesses a countable orthonormal basis $S = \{u_n : n \in \mathbb{N}\}$,
let $A$ be the collection of all finite linear combinations of elements of $S$ with
coefficients in $\mathbb{Q} + i\mathbb{Q}$. We claim that $A$ is dense in $H$. This will conclude the
proof because $A$ is countable. To prove the claim, let $M$ be the closure of $A$. To
show that $M$ is a subspace of $H$, let $x, y \in M$, and let $a, b \in \mathbb{K}$. Then there exist
sequences $(x_n)$ and $(y_n)$ in $A$, and sequences $(a_n)$, $(b_n)$ in $\mathbb{Q} + i\mathbb{Q}$ such that
$\lim_n x_n = x$, $\lim_n y_n = y$, $\lim_n a_n = a$, and $\lim_n b_n = b$. The sequence $(a_n x_n + b_n y_n)$ is in
$A$, and $\lim_n (a_n x_n + b_n y_n) = ax + by$. Therefore $ax + by \in M$. We now show that
$M = H$. If not, then $H = M \oplus M^\perp$, and $M^\perp \neq \{0\}$. Pick a unit vector $z \in M^\perp$. Then
$S \cup \{z\}$ is an orthonormal subset of $H$ that properly contains $S$. This contradicts the
maximality of $S$ and completes the proof. ∎

Example 2. It is possible for a separable inner product space (hence for a separable
Hilbert space) to contain uncountably many pairs of orthogonal vectors.
Consider the space $\mathcal{C}[-\pi, \pi]$ with the inner product
$\langle f, g\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$;
$\mathcal{C}[-\pi, \pi]$ is separable.² In the notation of example 4 in section 7.1, every pair
of functions $(f, g) \in M \times N$ is orthogonal. Since both $M$ and $N$ are uncountable,
we have proved our assertion. ∎

We focus mostly but not exclusively on separable Hilbert spaces. The existence
of inseparable Hilbert spaces of arbitrary Hilbert dimension will be presented in
the excursion at the end of this section. Many of the results we develop in this
chapter are valid for inseparable Hilbert spaces. Examples include the projection
theorem, the Riesz representation theorem, and the next three theorems. Also, in
the definition below, the set I need not be countable; hence H is not assumed to be
separable.

Definition. Let $S = \{u_\alpha : \alpha \in I\}$ be an orthonormal subset of a Hilbert space $H$. It
is not assumed that $S$ is an orthonormal basis. For an element $x \in H$, the scalars
$\hat{x}_\alpha = \langle x, u_\alpha\rangle$ are called the Fourier coefficients of $x$ relative to $S$.

Theorem 7.2.4. Let $S = \{u_1, \dots, u_n\}$ be an orthonormal subset of $H$, and let
$x \in \mathrm{Span}(S)$. Then $x = \sum_{i=1}^n \hat{x}_i u_i$, and $\|x\|^2 = \sum_{i=1}^n |\hat{x}_i|^2$.

Proof. See the proof of theorem 3.7.5. ∎

Theorem 7.2.5. Let $S = \{u_1, \dots, u_n\}$ be an orthonormal subset of $H$, and let
$M = \mathrm{Span}(S)$. For a vector $x \in H$, the vector $y = \sum_{i=1}^n \hat{x}_i u_i$ is the orthogonal
projection of $x$ on $M$. In particular, for all scalars $a_1, \dots, a_n$,
$\|x - \sum_{i=1}^n \hat{x}_i u_i\| \le \|x - \sum_{i=1}^n a_i u_i\|$. Furthermore, $\sum_{i=1}^n |\hat{x}_i|^2 \le \|x\|^2$.

Proof. We only need to show that the vector $z = x - y = x - \sum_{i=1}^n \hat{x}_i u_i$ is in $M^\perp$. The
rest of the assertions follow from the projection theorem and theorem 7.2.4. Now,
for a fixed $1 \le j \le n$,

$\langle z, u_j\rangle = \langle x, u_j\rangle - \sum_{i=1}^n \hat{x}_i \langle u_i, u_j\rangle = \langle x, u_j\rangle - \hat{x}_j = 0$. ∎

Theorem 7.2.6 (Bessel’s inequality). Let $\{u_n\}$ be an orthonormal subset of a Hilbert
space $H$. Then, for $x \in H$, $\sum_{n=1}^\infty |\hat{x}_n|^2 \le \|x\|^2$.

Proof. By theorem 7.2.5, $\sum_{i=1}^n |\hat{x}_i|^2 \le \|x\|^2$ for each $n \in \mathbb{N}$. Taking the limit as
$n \to \infty$ yields Bessel’s inequality. ∎
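
Theorems 7.2.5 and 7.2.6 can be checked numerically in $\mathbb{R}^4$. The sketch below uses a two-element orthonormal set that is not a basis; the set S, the vector x, and the perturbations tried are all assumptions chosen for illustration. It computes the Fourier coefficients, verifies Bessel's inequality, and confirms that no other coefficient choice from a small test grid gives a smaller error than the Fourier coefficients do.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

s = 1 / math.sqrt(2)
S = [[1.0, 0.0, 0.0, 0.0], [0.0, s, s, 0.0]]   # orthonormal, not a basis of R^4
x = [1.0, 2.0, 3.0, 4.0]

coeffs = [dot(x, u) for u in S]                # Fourier coefficients of x

def combo(a):
    """Linear combination a_1 u_1 + a_2 u_2."""
    return [sum(ai * u[j] for ai, u in zip(a, S)) for j in range(len(x))]

def error_sq(a):
    """Squared distance from x to the combination with coefficients a."""
    y = combo(a)
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

bessel_lhs = sum(c * c for c in coeffs)        # sum of |x-hat_i|^2
norm_sq = dot(x, x)                            # ||x||^2
best = error_sq(coeffs)                        # error of the orthogonal projection
others = [error_sq([coeffs[0] + da, coeffs[1] + db])
          for da in (-1.0, 0.0, 0.5) for db in (-0.5, 0.0, 2.0)]

print("Bessel:", bessel_lhs, "<=", norm_sq)
print("projection error:", best, " min over perturbed coefficients:", min(others))
```

The identity $\|x - y\|^2 = \|x\|^2 - \sum_i |\hat{x}_i|^2$ used later in the proof of theorem 7.2.7 shows up here as best being equal to norm_sq minus bessel_lhs, up to rounding.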

2 The set of trigonometric polynomials with rational coefficients is dense in 𝒞[−𝜋, 𝜋]. See corollary
4.10.3.

Theorem 7.2.7. Let $H$ be a separable Hilbert space, and let $S = \{u_n : n \in \mathbb{N}\}$ be an
orthonormal subset of $H$. Then the following are equivalent:

(a) $S$ is an orthonormal basis for $H$.
(b) For every $x \in H$, $x = \sum_{n=1}^\infty \hat{x}_n u_n$.
(c) $\mathrm{Span}(S)$ is dense in $H$.
(d) For every $x \in H$, $\|x\|^2 = \sum_{n=1}^\infty |\hat{x}_n|^2$.
(e) Parseval’s identity. For every $x, y \in H$, $\langle x, y\rangle = \sum_{n=1}^\infty \hat{x}_n \overline{\hat{y}_n}$.

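Proof. (a) implies (b). Let $y_n = \sum_{k=1}^n \hat{x}_k u_k$.
For $n < m$, $\|y_m - y_n\|^2 = \|\sum_{k=n+1}^m \hat{x}_k u_k\|^2 = \sum_{k=n+1}^m |\hat{x}_k|^2 \to 0$ as $m, n \to \infty$,
because $\sum_{n=1}^\infty |\hat{x}_n|^2 < \infty$ (Bessel’s inequality). This shows that $(y_n)$ is a Cauchy
sequence in $H$; hence it converges to, say, $y$. Thus $y = \sum_{n=1}^\infty \hat{x}_n u_n$. We need to show
that $y = x$. For a fixed $k \in \mathbb{N}$, $\langle y, u_k\rangle = \lim_{n\to\infty} \langle y_n, u_k\rangle = \langle x, u_k\rangle$. Thus $y - x$ is
orthogonal to each $u_k$. If $y - x \neq 0$, then $S \cup \{\frac{y-x}{\|y-x\|}\}$ would be an orthonormal set
that properly contains $S$. The maximality of $S$ forces $y = x$.

That (b) implies (c) is obvious, since $x = \lim_{n\to\infty} \sum_{k=1}^n \hat{x}_k u_k$, and each $\sum_{k=1}^n \hat{x}_k u_k$
is in $\mathrm{Span}(S)$.

(c) implies (d). Suppose, for some $x \in H$, $\|x\|^2 > \sum_{n=1}^\infty |\hat{x}_n|^2$, and let
$\delta^2 = \|x\|^2 - \sum_{n=1}^\infty |\hat{x}_n|^2$. We show that the ball $B(x, \delta)$ contains no finite linear
combination of elements of $S$. This will show that $\mathrm{Span}(S)$ is not dense in $H$. If
$\sum_{k=1}^n a_k u_k \in \mathrm{Span}(S)$, then, by theorem 7.2.5,
$\|x - \sum_{k=1}^n a_k u_k\|^2 \ge \|x - \sum_{k=1}^n \hat{x}_k u_k\|^2 = \|x\|^2 - \|\sum_{k=1}^n \hat{x}_k u_k\|^2 = \|x\|^2 - \sum_{k=1}^n |\hat{x}_k|^2 \ge \delta^2$.

(d) implies (e). The identity $\|x\|^2 = \sum_{n=1}^\infty |\hat{x}_n|^2$ can be written as $\|x\|^2 = \|\hat{x}\|_2^2$,
where $\hat{x} = (\hat{x}_n) \in l^2$, and $\|\hat{x}\|_2$ is the $l^2$-norm of $\hat{x}$. Now, assuming (d) is true, then,
for every $\alpha \in \mathbb{K}$, $\langle x + \alpha y, x + \alpha y\rangle = \langle \hat{x} + \alpha\hat{y}, \hat{x} + \alpha\hat{y}\rangle$. Equivalently,
$\alpha\langle y, x\rangle + \bar{\alpha}\langle x, y\rangle = \alpha\langle \hat{y}, \hat{x}\rangle + \bar{\alpha}\langle \hat{x}, \hat{y}\rangle$.
Setting $\alpha = 1/2$, we obtain $\mathrm{Re}\langle x, y\rangle = \mathrm{Re}\langle \hat{x}, \hat{y}\rangle$.
Setting $\alpha = 1/2i$ yields $\mathrm{Im}\langle x, y\rangle = \mathrm{Im}\langle \hat{x}, \hat{y}\rangle$. This proves that $\langle x, y\rangle = \langle \hat{x}, \hat{y}\rangle$, which
is equivalent to $\langle x, y\rangle = \sum_{n=1}^\infty \hat{x}_n \overline{\hat{y}_n}$.

(e) implies (a). Suppose there exists a unit vector $u$ such that $S \cup \{u\}$ is orthonormal.
Then $\hat{u}_k = \langle u, u_k\rangle = 0$ for all $k \in \mathbb{N}$, and $1 = \langle u, u\rangle = \sum_{k=1}^\infty \hat{u}_k \overline{\hat{u}_k} = 0$. This
contradiction shows that (a) is true. ∎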

Example 3. Every element $x = (x_n)$ in $l^2$ can be written as a series $x = \sum_{n=1}^\infty x_n e_n$.
Consider the vectors $y_n = x - \sum_{i=1}^n x_i e_i = (0, 0, \dots, 0, x_{n+1}, x_{n+2}, \dots)$. Since
$\lim_n \|y_n\|^2 = \lim_n \sum_{i=n+1}^\infty |x_i|^2 = 0$, $x = \sum_{n=1}^\infty x_n e_n$. ∎

Example 4. Consider the countable set of functions $S = \{e^{int} : n \in \mathbb{Z}\}$. By
corollary 4.10.3, $\mathrm{Span}(S)$ is dense in $(\mathcal{C}[-\pi, \pi], \|\cdot\|_2)$. Therefore $\mathrm{Span}(S)$ is dense
in the completion, $H$, of $(\mathcal{C}[-\pi, \pi], \|\cdot\|_2)$. Thus the set $\{e^{int} : n \in \mathbb{Z}\}$ is an
orthonormal basis for $H$. We will see in section 8.9 that $H$ is the space $\mathfrak{L}^2(-\pi, \pi)$
of (Lebesgue) square integrable functions on $(-\pi, \pi)$.
For exactly the same reason (see theorem 4.10.8), the set of normalized
Legendre polynomials $\{\tilde{P}_n : n \in \mathbb{N}\}$ is an orthonormal basis for $\mathfrak{L}^2(-1, 1)$. ∎
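
The identity $\|x\|^2 = \sum |\hat{x}_n|^2$ of theorem 7.2.7 can be observed for the basis $\{e^{int}\}$ on a specific function. The sketch below takes $f(t) = t$ on $[-\pi, \pi]$ and assumes the normalized inner product $\frac{1}{2\pi}\int_{-\pi}^{\pi} f\overline{g}$; the choice of $f$, the closed-form coefficient $\hat{f}_n = i(-1)^n/n$ (obtained by integration by parts), and the midpoint-rule check are illustrative assumptions rather than material from the text.

```python
import cmath
import math

def coeff(n):
    """Closed-form coefficient (1/2pi) * integral of t e^{-int} over [-pi, pi]."""
    return 0j if n == 0 else 1j * (-1) ** n / n

def numeric_coeff(n, steps=20000):
    """Midpoint-rule approximation of the same integral, as a sanity check."""
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        t = -math.pi + (k + 0.5) * h
        total += t * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)

norm_sq = math.pi ** 2 / 3          # (1/2pi) * integral of t^2 over [-pi, pi]

def partial_sum(N):
    """Sum of |f-hat_n|^2 over |n| <= N."""
    return sum(abs(coeff(n)) ** 2 for n in range(-N, N + 1))

for N in (10, 100, 1000):
    print(N, partial_sum(N), "vs", norm_sq)
```

The partial sums increase toward $\pi^2/3$ from below, as Bessel's inequality requires, and the gap shrinks roughly like $2/N$.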

Definition. Two Hilbert spaces H1 and H2 are isomorphic (as Hilbert spaces) if
there exists an isomorphism T ∶ H1 → H2 such that, for all x, y ∈ H1 ,

⟨x, y⟩ = ⟨Tx, Ty⟩.

It follows directly that such an isomorphism is also an isometry because

‖x‖2 = ⟨x, x⟩ = ⟨Tx, Tx⟩ = ‖Tx‖2 .

Theorem 7.2.8 (the Riesz-Fischer theorem). Let $H$ be a separable Hilbert space.

(a) If $\dim(H) = n$, then $H$ is isomorphic to $\mathbb{K}^n$.
(b) If $\dim(H) = \infty$, then $H$ is isomorphic to $l^2$.

Proof. We only prove the second statement. The proof of the first statement is simpler.
Let $\{u_n\}$ be an orthonormal basis for $H$. For $x \in H$, let $T(x) = (\hat{x}_n)_{n=1}^\infty = \hat{x}$;
$T : H \to l^2$ is linear since $\widehat{(ax + by)} = a\hat{x} + b\hat{y}$. The fact that $\langle x, y\rangle = \langle Tx, Ty\rangle$ is
Parseval’s identity in theorem 7.2.7. To verify that $T$ is one-to-one, suppose that
$\hat{x} = \hat{y}$. Then $\widehat{(x - y)} = 0$, and $\sum_{n=1}^\infty |\hat{x}_n - \hat{y}_n|^2 = 0$. Therefore $\hat{x}_n = \hat{y}_n$. Hence, by
theorem 7.2.7, $x = \sum_{n=1}^\infty \hat{x}_n u_n = \sum_{n=1}^\infty \hat{y}_n u_n = y$; $T$ is onto because if $(a_n) \in l^2$,
then the series $\sum_{n=1}^\infty a_n u_n$ converges to a vector $x \in H$ such that $\hat{x} = (a_n)$. See
problem 3 at the end of this section. ∎

We offer a few observations on some crucial differences between Banach and
Hilbert spaces. This will hopefully explain why Hilbert spaces have such an elegant
and uncluttered structure compared to a general Banach space.

The closest point property and the projection theorem (theorems 7.1.5 and 7.1.7,
respectively) are at the heart of the constructions of this chapter. An examination
of the proof of theorem 7.1.5 reveals that the parallelogram law delivers both
the existence and the uniqueness of the closest point to a closed convex set. The
parallelogram law is a direct result of the fact that the norm on a Hilbert space is
induced by an inner product, which is what sets Hilbert spaces apart from general
Banach spaces, where the closest point property fails as does the conclusion of the
projection theorem. The following simple example illustrates one of the discussion
points.

Example 5. Consider the space $\mathbb{R}^2$ with the norm $\|x\|_\infty = \max\{|x_1|, |x_2|\}$. The set
$M = \{x = (x_1, x_2) \in \mathbb{R}^2 : |x_1| \le 1, |x_2| \le 1\}$ is closed and convex. Every point on
the line segment $\{(1, y) : |y| \le 1\}$ has distance 1 from the point $x = (2, 0)$, and
$\mathrm{dist}(x, M) = 1$. Thus the closest point of $M$ to $x$ is not unique. There are examples
where the very existence of a closest point is not guaranteed. See problems 6–8 at
the end of this section for a slight expansion of this discussion. ∎
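
The failure of uniqueness is easy to see numerically. The following sketch samples the segment $\{(1, t) : |t| \le 1\}$ and compares distances to $x = (2, 0)$ in the maximum norm and in the Euclidean norm; the sampling step and the helper names are assumptions chosen for illustration.

```python
import math

x = (2.0, 0.0)

def sup_dist(p, q):
    """Distance in the norm max(|x1|, |x2|)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def euclid_dist(p, q):
    """Distance in the Euclidean (inner product) norm."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

edge = [(1.0, -1.0 + k / 10) for k in range(21)]   # points (1, t) with |t| <= 1
sup_d = [sup_dist(x, p) for p in edge]
euc_d = [euclid_dist(x, p) for p in edge]

sup_minimizers = [p for p, d in zip(edge, sup_d) if d == min(sup_d)]
euc_minimizers = [p for p, d in zip(edge, euc_d) if d == min(euc_d)]

print("max-norm minimizers on the segment:", len(sup_minimizers))
print("Euclidean minimizers on the segment:", euc_minimizers)
```

Every sampled point of the segment is a closest point in the maximum norm, whereas the Euclidean norm, which comes from an inner product, singles out one point.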

It was mentioned in section 6.4 that not every closed subspace of a Banach space is
complemented. Theorem 7.1.7 guarantees that every closed subspace of a Hilbert
space is complemented. Projections in Banach spaces play a similar role to orthog-
onal projections in proving that certain closed subspaces are complemented. See
problem 10 in section 6.4 for necessary and sufficient conditions for a closed
subspace of a Banach space to be complemented. Also examine example 6 in
section 6.4.

Excursion: Inseparable Hilbert Spaces

Inseparable Hilbert spaces do exist. They are mostly a curiosity and do not have
much practical use. We include the discussion below for the satisfaction of the
inquisitive reader.
The motivation for the definition below and the construction in theorem 7.2.9
is provided by the following example.

Example 6. Let $S = \{u_\alpha : \alpha \in I\}$ be an uncountable orthonormal subset of a
Hilbert space $H$. For a vector $x \in H$, consider the set of Fourier coefficients
$\{\hat{x}_\alpha : \alpha \in I\}$. We claim that $\hat{x}_\alpha = 0$ for all but countably many $\alpha \in I$.
Let $\{u_{\alpha_1}, \dots, u_{\alpha_n}\}$ be a finite subset of $S$. By theorem 7.2.5,
$\sum_{i=1}^n |\hat{x}_{\alpha_i}|^2 \le \|x\|^2 < \infty$. It follows that $\sum_{\alpha\in I} |\hat{x}_\alpha|^2 < \infty$ (see example 1 in section 4.10 and
the definition preceding it); hence the set $\{\alpha \in I : \hat{x}_\alpha \neq 0\}$ is countable. ∎

The above example strongly suggests the following definition.

Definition. Let $I$ be an infinite set, and let $\aleph = \mathrm{Card}(I)$. Define $l^2(\aleph)$ to be the
set of all functions $x : I \to \mathbb{C}$ such that $x_\alpha = 0$ for all but countably many $\alpha \in I$
and $\|x\| = (\sum_{\alpha\in I} |x_\alpha|^2)^{1/2} < \infty$. To eliminate any danger of ambiguity, let
$I_x = \{\alpha_1, \alpha_2, \dots\}$ be the subset of $I$ for which $x_\alpha \neq 0$. The notation $\sum_{\alpha\in I} |x_\alpha|^2$ means
$\sum_{i=1}^\infty |x_{\alpha_i}|^2$. We will continue to employ this notation for the remainder of this
discussion.

Theorem 7.2.9. The set l2 (ℵ) is a Hilbert space with the operations defined within
the proof.

Proof. Let $x = (x_\alpha)$ and $y = (y_\alpha) \in l^2(\aleph)$. We show that $x + y \in l^2(\aleph)$ and that
$\|x + y\| \le \|x\| + \|y\|$. Let $I_x = \{\alpha \in I : x_\alpha \neq 0\}$ and $I_y = \{\alpha \in I : y_\alpha \neq 0\}$, and
let $J = I_x \cup I_y$. Since $J$ is countable, we can write $J = \{\alpha_1, \alpha_2, \dots\}$. Note that $\hat{x} = (x_{\alpha_i})$
and $\hat{y} = (y_{\alpha_i})$ are in $l^2$; hence $\|\hat{x} + \hat{y}\|_2 \le \|\hat{x}\|_2 + \|\hat{y}\|_2$. But every $\alpha$ for which
$x_\alpha + y_\alpha \neq 0$ is in $J$; hence
$\|x + y\| = (\sum_{i=1}^\infty |x_{\alpha_i} + y_{\alpha_i}|^2)^{1/2} = \|\hat{x} + \hat{y}\|_2 \le \|\hat{x}\|_2 + \|\hat{y}\|_2 = \|x\| + \|y\|$.
The fact that $\|ax\| = |a|\|x\|$ for all $x \in l^2(\aleph)$ and all scalars $a$
requires an even simpler argument. The rest of the properties of a normed linear
space are easily verifiable. Thus $l^2(\aleph)$ is a normed linear space.

Define an inner product on $l^2(\aleph)$ as follows: $\langle x, y\rangle = \langle \hat{x}, \hat{y}\rangle = \sum_{i=1}^\infty x_{\alpha_i} \overline{y_{\alpha_i}}$. This
inner product induces the norm on $l^2(\aleph)$ we defined earlier. We now show
the completeness of $l^2(\aleph)$. Suppose $(x^{(n)})$ is a Cauchy sequence in $l^2(\aleph)$, let
$I_n = \{\alpha \in I : x^{(n)}_\alpha \neq 0\}$, and let $J = \cup_{n\in\mathbb{N}} I_n$. Then $J$ is a countable subset of $I$,
and we can write $J = \{\alpha_1, \alpha_2, \dots\}$. Write $\hat{x}^{(n)} = (x^{(n)}_{\alpha_i})_{i=1}^\infty$. Since
$\|\hat{x}^{(m)} - \hat{x}^{(n)}\|_2 = \|x^{(m)} - x^{(n)}\|$, $(\hat{x}^{(n)})$ is a
Cauchy sequence in $l^2$ and is therefore convergent to an element $\hat{x} = (x_1, x_2, \dots) \in l^2$.

Define $x \in l^2(\aleph)$ by

$x_\alpha = x_i$ if $\alpha = \alpha_i$, and $x_\alpha = 0$ otherwise.

Clearly, $x^{(n)}$ converges to $x$ in $l^2(\aleph)$. ∎

The reader can now anticipate the theorem that must be stated next: the set {e𝛼 }𝛼∈I
is an orthonormal basis for l2 (ℵ), where e𝛼 (𝛽) = 𝛿𝛼,𝛽 . Thus, for any cardinal
number ℵ, we have constructed a Hilbert space whose orthonormal basis has
cardinality ℵ. Such a space is also unique up to Hilbert space isomorphism in
the sense that it depends only on ℵ and not on the particular set I in the above
construction. We leave it to the interested reader to reflect on the details.
The cardinality of an orthonormal basis of a Hilbert space H is known as the
Hilbert dimension of H.

Exercises

1. Let $\{u_n\}$ be an orthonormal basis for a separable Hilbert space $H$, and let
$\{v_n\}$ be an orthonormal set in $H$ such that $\sum_{n=1}^\infty \|u_n - v_n\|^2 < 1$. Prove that
$\{v_n\}$ is an orthonormal basis for $H$.
2. Let $S = \{v_1, v_2, \dots\}$ be an orthonormal subset of a separable Hilbert space $H$
(not necessarily an orthonormal basis), and let $M = \overline{\mathrm{Span}(S)}$ be the closure of
the span of $S$. Prove that if $P$ is the projection of $H$ onto $M$, then $Px = \sum_{i=1}^\infty \langle x, v_i\rangle v_i$.
3. Let {un } be an orthonormal basis for a separable Hilbert space H, and let
(an ) ∈ l2 . Prove that there is an element x ∈ H such that x̂n = an .
4. Use theorems 6.2.4 and 7.2.8 to provide an alternative proof of the Riesz
representation theorem for separable Hilbert spaces.
5. Let $\{u_n\}$ be an orthonormal basis for a separable Hilbert space $H$. Define a
function $\|\cdot\|' : H \to \mathbb{R}$ as follows: $\|x\|' = \sum_{n=1}^\infty 2^{-n}|\hat{x}_n|$. Show that $\|\cdot\|'$ is a
norm on $H$ and that it is not equivalent to the original norm on $H$.
6. Let $X = \mathcal{C}[0, 1]$ endowed with the uniform norm, and let $M$ be the subset
of $X$ consisting of all functions $f$ such that $f(0) = 0$, $f(1) = 1$, $f \ge 0$, and
$\int_0^1 f(x)\,dx = 1$. Prove that $M$ is closed and convex and that $\mathrm{dist}(0, M) = 1$.
Also show that, for every $f \in M$, $\|f\|_\infty > 1$, and hence $M$ contains no element
of smallest norm.

Definition. A Banach space $X$ is strictly convex if whenever $x \neq y$, and
$\|x\| = \|y\|$, then $\left\|\frac{x+y}{2}\right\| < \|x\|$. Geometrically, strict convexity means that if
$x$ and $y$ are equidistant from the origin, then the midpoint is strictly closer
to the origin than $x$ (and $y$).

7. Prove that a Hilbert space is strictly convex.


8. Let X be a strictly convex Banach space, let M be a closed convex subset, and
let x ∈ X. Show that if there is a point y ∈ M such that ‖x − y‖ = dist(x, M),
then y is unique.

Definition. A sequence $(x_n)$ in a Hilbert space $H$ is said to converge weakly
to $x \in H$ if, for every $y \in H$, $\lim_n \langle x_n, y\rangle = \langle x, y\rangle$. In light of the Riesz
representation theorem, this definition is consistent with the corresponding
definition for Banach spaces introduced in the problem set in section 6.4.
Some of the exercises below repeat problems in section 6.4, but
the proofs can be significantly simplified in the context of Hilbert spaces.

9. Prove that if {un } is an orthonormal basis for a separable Hilbert space, then
(un ) converges weakly to 0.
10. Show that a norm convergent sequence is weakly convergent but not
conversely.
11. Show that the weak limit of a sequence in a Hilbert space, if it exists, is
unique.
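12. Let $(x_n)$ be a bounded sequence in $H$. Show that if $\lim_n \langle x_n, y\rangle = \langle x, y\rangle$ for
every $y$ in a dense subset of $H$, then $x_n \to^w x$.
13. Let $\{u_n\}$ be an orthonormal basis for a separable Hilbert space $H$, and let $(x_n)$
be a bounded sequence in $H$. Prove that $x_n \to^w x$ if and only if
$\lim_n \langle x_n, u_j\rangle = \langle x, u_j\rangle$ for every $j \in \mathbb{N}$.
14. Show that if $x_n \to^w x$, and $\lim_n \|x_n\| = \|x\|$, then $(x_n)$ is norm convergent to $x$.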
15. Prove that if $x_n \to^w x$, then $\{\|x_n\|\}$ is bounded and $\|x\| \le \liminf_n \|x_n\|$. Hint:
The linear functionals $\lambda_n(\xi) = \langle \xi, x_n\rangle$ are pointwise bounded.
16. Let $x_n \to^w x$. Prove that there exists a subsequence $(x_{n_k})$ of $(x_n)$ such that
$\frac{1}{N}\sum_{k=1}^N x_{n_k}$ is strongly convergent to $x$. Hint: Without loss of generality,
assume that $x = 0$. Inductively define a subsequence $(x_{n_k})$ of $(x_n)$ such that
$|\langle x_{n_i}, x_{n_k}\rangle| < 2^{-k}$ for $i = 1, \dots, k - 1$. Now

$\left\|\frac{1}{N}\sum_{k=1}^N x_{n_k}\right\|^2 = \frac{1}{N^2}\sum_{k=1}^N \|x_{n_k}\|^2 + \frac{1}{N^2}\sum_{k=2}^N \sum_{i=1}^{k-1} 2\,\mathrm{Re}\langle x_{n_i}, x_{n_k}\rangle$.

This is a version of the Banach-Saks theorem.

7.3 Self-Adjoint Operators

This section establishes the broad characteristics of self-adjoint operators. We also
study projection operators and prove a theorem (7.3.11) which produces ample
examples of self-adjoint operators. Self-adjointness (more generally, normality) is
not nearly sufficient to produce a result resembling the spectral theorem for normal
operators on finite-dimensional inner product spaces. The complete picture
emerges in the next section.
In the finite-dimensional case, the definition of the adjoint is straightforward,
owing to the simplicity of the characterization of linear functionals on
finite-dimensional inner product spaces (see example 9 in section 3.7). The definition
below in the infinite-dimensional case requires the full power of the Riesz
representation theorem. Let $H$ be a separable Hilbert space, and let $T \in \mathcal{L}(H)$. For a
fixed element $y \in H$, define a functional $\lambda_y$ by $\lambda_y(x) = \langle Tx, y\rangle$. It is clear that $\lambda_y$ is
linear. In fact, $\lambda_y$ is bounded since $|\lambda_y(x)| = |\langle Tx, y\rangle| \le \|Tx\|\|y\| \le \|T\|\|x\|\|y\|$. By
the Riesz representation theorem, there exists a unique element $T^*y \in H$ such that,
for all $x \in H$, $\lambda_y(x) = \langle x, T^*y\rangle$. We therefore have a function $T^* : H \to H$ defined
by the requirement that
$\langle Tx, y\rangle = \langle x, T^*y\rangle$
for all $x, y \in H$.

The above equation is the defining property of the adjoint operator T∗ of T. The
reader can easily see that the definition is consistent with the definition of the
adjoint operator on a Banach space that was introduced in section 6.6.

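It is easy to verify that $T^*$ is linear. For example,

$\langle x, T^*(y_1 + y_2)\rangle = \langle Tx, y_1 + y_2\rangle = \langle Tx, y_1\rangle + \langle Tx, y_2\rangle$
$= \langle x, T^*y_1\rangle + \langle x, T^*y_2\rangle = \langle x, T^*y_1 + T^*y_2\rangle$.

This shows that $T^*(y_1 + y_2) = T^*(y_1) + T^*(y_2)$. We now show that $T^* \in \mathcal{L}(H)$:
$\|T^*y\|^2 = \langle T^*y, T^*y\rangle = \langle T(T^*y), y\rangle \le \|T(T^*y)\|\|y\| \le \|T\|\|T^*y\|\|y\|$. Thus
$\|T^*y\| \le \|T\|\|y\|$ for every $y \in H$. Hence $T^* \in \mathcal{L}(H)$, and $\|T^*\| \le \|T\|$.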

Theorem 7.3.1. For $T, T_1, T_2 \in \mathcal{L}(H)$ and $\alpha \in \mathbb{K}$,

(a) $(T_1 + T_2)^* = T_1^* + T_2^*$.
(b) $(\alpha T)^* = \bar{\alpha}T^*$.
(c) $(T_1 T_2)^* = T_2^* T_1^*$. Consequently, for every $n \in \mathbb{N}$, $(T^*)^n = (T^n)^*$.
(d) $T^{**} = T$.
(e) $\|T^*\| = \|T\|$.
(f) $\|T^*T\| = \|T\|^2$.

Proof. The computations needed to prove parts (a)–(d) are simple. As an example, we
establish part (c): $\langle T_1 T_2 x, y\rangle = \langle T_2 x, T_1^* y\rangle = \langle x, T_2^* T_1^* y\rangle$, which, by definition, means
that $(T_1 T_2)^* = T_2^* T_1^*$. We already saw that $\|T^*\| \le \|T\|$. Applying the same fact to
$T^*$ and using part (d), we have $\|T\| = \|T^{**}\| \le \|T^*\|$, thus proving (e). To prove
(f), $\|T^*T\| \le \|T^*\|\|T\| = \|T\|^2$. Also,

$\|Tx\|^2 = \langle Tx, Tx\rangle = \langle x, T^*Tx\rangle \le \|x\|\|T^*Tx\| \le \|x\|\|T^*T\|\|x\| = \|T^*T\|\|x\|^2$,

which implies that $\|T\|^2 \le \|T^*T\|$, and the proof of (f) is complete. ∎
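
On $\mathbb{C}^n$ with the standard inner product, the adjoint of a matrix operator is its conjugate transpose, so several parts of theorem 7.3.1 can be checked directly. The sketch below does this for $2 \times 2$ matrices; the particular matrices, vectors, and scalar are arbitrary values chosen for illustration, and the helper names are not from the text.

```python
def adjoint(A):
    """Conjugate transpose of a square complex matrix given as a list of rows."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """Inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

T1 = [[1 + 1j, 2 + 0j], [0 + 3j, 1 - 1j]]
T2 = [[0 + 0j, 1 + 0j], [2 - 1j, 0 + 1j]]
x = [1 + 2j, -1 + 0j]
y = [3 + 0j, 1 - 1j]
alpha = 2 + 1j

# Defining property <Tx, y> = <x, T*y>
print(inner(apply(T1, x), y), inner(x, apply(adjoint(T1), y)))
# Part (c): (T1 T2)* = T2* T1*
print(max_diff(adjoint(matmul(T1, T2)), matmul(adjoint(T2), adjoint(T1))))
# Part (b): (alpha T)* = conj(alpha) T*
print(max_diff(adjoint(scale(alpha, T1)), scale(alpha.conjugate(), adjoint(T1))))
```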

Definition. An operator $T \in \mathcal{L}(H)$ is called self-adjoint if $T^* = T$. Thus $T$ is
self-adjoint if and only if, for all $x, y \in H$, $\langle Tx, y\rangle = \langle x, Ty\rangle$.

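Example 1. Let $\{u_n : n \in \mathbb{N}\}$ be an orthonormal basis for a separable Hilbert space
$H$, and fix a positive integer $N$. The projection operator $P : H \to H$ defined by
$Px = \sum_{n=1}^N \hat{x}_n u_n$ is self-adjoint. For all vectors $x$ and $y$ in $H$,

$\langle Px, y\rangle = \langle \sum_{n=1}^N \hat{x}_n u_n, y\rangle = \sum_{n=1}^N \hat{x}_n \overline{\hat{y}_n}$,

while

$\langle x, Py\rangle = \sum_{n=1}^N \langle x, \hat{y}_n u_n\rangle = \sum_{n=1}^N \overline{\hat{y}_n}\langle x, u_n\rangle = \sum_{n=1}^N \hat{x}_n \overline{\hat{y}_n}$. ∎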

Projection operators are the simplest self-adjoint operators, and, in a way, are
building blocks we can use to generate more examples of self-adjoint operators.

Theorem 7.3.2. (a) The sum of self-adjoint operators is self-adjoint.
(b) If $T$ is self-adjoint and $\alpha \in \mathbb{R}$, then $\alpha T$ is self-adjoint.
(c) The composition of two self-adjoint operators $T_1$ and $T_2$ is self-adjoint if and
only if $T_1 T_2 = T_2 T_1$.
(d) The set of self-adjoint operators is closed in $\mathcal{L}(H)$.

Proof. We leave the proof of (a)–(c) to the reader. To prove (d), let $(T_n)$ be a sequence
of self-adjoint operators that converges in $\mathcal{L}(H)$ to $T$. We show that $T^* = T$. Since
$T_n = T_n^*$ for every $n$,

$\|T - T^*\| \le \|T - T_n\| + \|T_n - T_n^*\| + \|T_n^* - T^*\|$
$= \|T - T_n\| + \|(T_n - T)^*\| = 2\|T - T_n\| \to 0$ as $n \to \infty$. ∎

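Theorem 7.3.3. The eigenvalues of a self-adjoint operator $T$ are real.

Proof. Let $\lambda$ be an eigenvalue of $T$ with eigenvector $u$. Then
$\lambda\langle u, u\rangle = \langle \lambda u, u\rangle = \langle Tu, u\rangle = \langle u, Tu\rangle = \langle u, \lambda u\rangle = \bar{\lambda}\langle u, u\rangle$.
Since $\langle u, u\rangle \neq 0$, $\lambda = \bar{\lambda}$, and $\lambda$ is real. ∎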

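Theorem 7.3.4. If $T$ is a self-adjoint operator, then eigenvectors of $T$ corresponding
to distinct eigenvalues are orthogonal.

Proof. Let $Tu_1 = \lambda_1 u_1$, $Tu_2 = \lambda_2 u_2$, where $u_1 \neq 0 \neq u_2$, and $\lambda_1 \neq \lambda_2$. Then
$\lambda_1\langle u_1, u_2\rangle = \langle \lambda_1 u_1, u_2\rangle = \langle Tu_1, u_2\rangle = \langle u_1, Tu_2\rangle = \langle u_1, \lambda_2 u_2\rangle = \lambda_2\langle u_1, u_2\rangle$,
where the last step uses the fact that $\lambda_2$ is real. Thus
$(\lambda_1 - \lambda_2)\langle u_1, u_2\rangle = 0$, and $\langle u_1, u_2\rangle = 0$. ∎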
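
For a real symmetric $2 \times 2$ matrix both theorems can be seen from the characteristic polynomial: its discriminant $(a - d)^2 + 4b^2$ is never negative, so the roots are real. The sketch below computes the eigenpairs of one such matrix; the function name eig_sym2, the restriction to $b \neq 0$, and the sample entries are assumptions made for illustration.

```python
import math

def eig_sym2(a, b, d):
    """Eigenpairs of the real symmetric matrix [[a, b], [b, d]], assuming b != 0."""
    disc = (a - d) ** 2 + 4 * b * b        # never negative, so the roots are real
    r = math.sqrt(disc)
    lams = ((a + d - r) / 2, (a + d + r) / 2)
    # For b != 0 the vector (b, lam - a) solves (T - lam I)v = 0.
    return [(lam, (b, lam - a)) for lam in lams]

def apply(a, b, d, v):
    return (a * v[0] + b * v[1], b * v[0] + d * v[1])

(l1, v1), (l2, v2) = eig_sym2(2.0, 1.0, 2.0)
print("eigenvalues:", l1, l2)
print("eigenvectors:", v1, v2)
print("inner product of eigenvectors:", v1[0] * v2[0] + v1[1] * v2[1])
```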

Example 2. The set of eigenvalues of a self-adjoint operator on a separable Hilbert
space is at most countable.
Since a separable Hilbert space cannot contain an uncountable subset of
orthogonal vectors and since eigenvectors corresponding to distinct eigenvalues
are orthogonal, the set of eigenvalues is at most countable. ∎

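Lemma 7.3.5. (a) Let $H$ be a complex Hilbert space, and let $T \in \mathcal{L}(H)$. If $\langle Tx, x\rangle = 0$
for all $x \in H$, then $T = 0$.
(b) Let $H$ be a real Hilbert space, and let $T \in \mathcal{L}(H)$ be self-adjoint. If $\langle Tx, x\rangle = 0$
for all $x \in H$, then $T = 0$.

Proof. It is sufficient to show that $\langle Tx, y\rangle = 0$ for all $x, y \in H$, because, in that
case, $\langle Tx, Tx\rangle = 0$, so $\|Tx\|^2 = 0$, and $T = 0$. For $x, y \in H$, and scalars $\alpha$ and $\beta$,
$0 = \langle T(\alpha x + \beta y), \alpha x + \beta y\rangle = \alpha\bar{\beta}\langle Tx, y\rangle + \bar{\alpha}\beta\langle Ty, x\rangle$. If we take $\alpha = \beta = 1$, then
$\langle Tx, y\rangle + \langle Ty, x\rangle = 0$. If we take $\alpha = i$, $\beta = 1$, then $i\langle Tx, y\rangle - i\langle Ty, x\rangle = 0$. The
above two identities imply that $\langle Tx, y\rangle = 0$.
The proof of part (b) is a straightforward specialization of the proof of (a): taking
$\alpha = \beta = 1$ and using the self-adjointness of $T$, $0 = \langle Tx, y\rangle + \langle Ty, x\rangle = 2\langle Tx, y\rangle$. ∎

Remark. Part (b) of the above lemma is false if $T$ is not self-adjoint. For example,
if $T : \mathbb{R}^2 \to \mathbb{R}^2$ is the 90° rotation of the plane, then $\langle Tx, x\rangle = 0$ for all $x \in \mathbb{R}^2$,
although $T \neq 0$.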

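Theorem 7.3.6. Let $H$ be a complex Hilbert space, and let $T \in \mathcal{L}(H)$. Then $T$ is
self-adjoint if and only if $\langle Tx, x\rangle$ is real for all $x \in H$.

Proof. If $T$ is self-adjoint, then $\langle Tx, x\rangle = \langle x, T^*x\rangle = \langle x, Tx\rangle = \overline{\langle Tx, x\rangle}$. Thus $\langle Tx, x\rangle$ is
real. Conversely, if $\langle Tx, x\rangle$ is real for all $x \in H$, then
$\langle Tx, x\rangle = \overline{\langle Tx, x\rangle} = \overline{\langle x, T^*x\rangle} = \langle T^*x, x\rangle$.
Thus $\langle (T^* - T)x, x\rangle = 0$ for all $x$; hence $T^* - T = 0$, by the previous
lemma. ∎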

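Theorem 7.3.7. Let $T \in \mathcal{L}(H)$ be self-adjoint. Then

$\|T\| = \sup\{|\langle Tx, x\rangle| : \|x\| = 1\}$.

Proof. Let $M = \sup\{|\langle Tx, x\rangle| : \|x\| = 1\}$. If $\|x\| = 1$, then

$|\langle Tx, x\rangle| \le \|T\|\|x\|^2 = \|T\|$.

Thus $M \le \|T\|$.
It follows from the definition of $M$ that $|\langle Tx, x\rangle| \le M\|x\|^2$ for all $x \in H$. The
following identities are easy to verify:

$\langle T(x + y), x + y\rangle - \langle T(x - y), x - y\rangle = 2\langle Tx, y\rangle + 2\langle Ty, x\rangle = 2\langle Tx, y\rangle + 2\langle y, Tx\rangle$
$= 2\langle Tx, y\rangle + 2\overline{\langle Tx, y\rangle} = 4\,\mathrm{Re}\langle Tx, y\rangle$.

Thus

$|\mathrm{Re}\langle Tx, y\rangle| \le \frac{1}{4}|\langle T(x + y), x + y\rangle| + \frac{1}{4}|\langle T(x - y), x - y\rangle|$
$\le \frac{M}{4}\{\|x + y\|^2 + \|x - y\|^2\} = \frac{M}{2}\{\|x\|^2 + \|y\|^2\}$.

The summary of the above calculations is that

for all $x, y \in H$, $|\mathrm{Re}\langle Tx, y\rangle| \le \frac{M}{2}\{\|x\|^2 + \|y\|^2\}$. (3)

If $\|x\| = 1$, and $Tx \neq 0$, let $y = \frac{Tx}{\|Tx\|}$. Then

$\mathrm{Re}\left\langle Tx, \frac{Tx}{\|Tx\|}\right\rangle = \frac{1}{\|Tx\|}\langle Tx, Tx\rangle = \|Tx\|$.

This and inequality (3) yield

$\|Tx\| \le \frac{M}{2}\left\{\|x\|^2 + \left\|\frac{Tx}{\|Tx\|}\right\|^2\right\} = M$.

Since this holds for every unit vector $x$, $\|T\| \le M$. This completes the proof. ∎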
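
The role of self-adjointness in theorem 7.3.7 can be seen by sampling unit vectors in $\mathbb{R}^2$. The sketch below estimates both $\|T\|$ and $\sup|\langle Tx, x\rangle|$ for a symmetric matrix and for the 90° rotation mentioned in the remark following lemma 7.3.5; the matrices, the sampling density, and the helper names are assumptions chosen for illustration, and the sampled maxima only approximate the true suprema.

```python
import math

def apply(A, x):
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Unit vectors sampled on the circle.
units = [(math.cos(2 * math.pi * k / 720), math.sin(2 * math.pi * k / 720))
         for k in range(720)]

def sampled_norm(A):
    """Approximate operator norm: max of ||Ax|| over the sampled unit vectors."""
    return max(math.hypot(*apply(A, x)) for x in units)

def sampled_form_sup(A):
    """Approximate sup of |<Ax, x>| over the sampled unit vectors."""
    return max(abs(dot(apply(A, x), x)) for x in units)

S = [[2.0, 1.0], [1.0, 2.0]]     # self-adjoint
R = [[0.0, -1.0], [1.0, 0.0]]    # 90-degree rotation, not self-adjoint

print("symmetric:", sampled_norm(S), sampled_form_sup(S))
print("rotation: ", sampled_norm(R), sampled_form_sup(R))
```

For the symmetric matrix the two quantities agree, while for the rotation the quadratic form vanishes identically although the operator has norm 1.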

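Theorem 7.3.8. If $T$ is a self-adjoint operator, then $r(T) = \|T\|$.

Proof. Since $|\lambda| \le \|T\|$ for all $\lambda \in \sigma(T)$, it is sufficient to find an element $\lambda \in \sigma(T)$
such that $|\lambda| = \|T\|$. By the previous theorem, there exists a sequence of unit
vectors $(x_n)$ such that $\lim_n |\langle Tx_n, x_n\rangle| = \|T\|$. Since each $\langle Tx_n, x_n\rangle$ is real, there exists
a subsequence $(y_n)$ of $(x_n)$ such that $\lim_n \langle Ty_n, y_n\rangle = \|T\|$, or $\lim_n \langle Ty_n, y_n\rangle = -\|T\|$.
Therefore there exists a real number $\lambda$ such that $|\lambda| = \|T\|$ and $\lim_n \langle Ty_n, y_n\rangle = \lambda$. Now

$\|Ty_n - \lambda y_n\|^2 = \|Ty_n\|^2 - 2\lambda\langle Ty_n, y_n\rangle + \lambda^2\|y_n\|^2 \le \|T\|^2 - 2\lambda\langle Ty_n, y_n\rangle + \lambda^2$
$= 2\lambda^2 - 2\lambda\langle Ty_n, y_n\rangle = 2\lambda(\lambda - \langle Ty_n, y_n\rangle) \to 0$.

If $T - \lambda I$ is invertible, then
$1 = \|y_n\| = \|(T - \lambda I)^{-1}(T - \lambda I)y_n\| \le \|(T - \lambda I)^{-1}\|\|Ty_n - \lambda y_n\| \to 0$.
This contradiction shows that $\lambda \in \sigma(T)$. ∎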

Definition. A bounded operator P ∈ ℒ(H) is a projection if, for some closed
subspace M of H, P is the (orthogonal) projection of H onto M. See the
projection theorem. We remind the reader that we use the notation P_M to denote
the projection of H onto its closed subspace M.

Theorem 7.3.9. A bounded operator P is a projection if and only if it is idempotent
and self-adjoint.

Proof. Suppose P is the projection of H onto a closed subspace M. The fact that P² = P
has been established in theorem 7.1.8. We show that P is self-adjoint. First observe
that, for all x, y ∈ H, Px ∈ M and Py − y ∈ M⊥ ; hence ⟨Px, Py − y⟩ = 0.
Now for x, y ∈ H,

⟨Px, y⟩ = ⟨Px, y − Py⟩ + ⟨Px, Py⟩ = ⟨Px, Py⟩ = ⟨Px − x, Py⟩ + ⟨x, Py⟩ = ⟨x, Py⟩.

Conversely, suppose that P is self-adjoint and idempotent. Let M = {x ∈ H ∶ Px = x}.
Being the kernel of the bounded operator P − I, M is a closed subspace
of H. We show that P is the projection of H onto M by showing that, for x ∈ H,
y = Px ∈ M, and z = x − Px ∈ M⊥. Because Py = P(Px) = P²x = Px = y, y ∈ M.
Now if w ∈ M, then ⟨z, w⟩ = ⟨x − Px, w⟩ = ⟨x, w⟩ − ⟨Px, w⟩ = ⟨x, w⟩ − ⟨x, P∗w⟩ =
⟨x, w⟩ − ⟨x, Pw⟩ = ⟨x, w⟩ − ⟨x, w⟩ = 0. ∎

Definition. Two projections P_M and P_N are said to be orthogonal if M ⊥ N. Notice
that, in this case, M + N = M ⊕ N.

Theorem 7.3.10. The sum of two orthogonal projections P_M and P_N is a projection
and, in this case, P_M + P_N = P_{M⊕N}. Consequently, the sum of a finite set of
pairwise orthogonal projections is a projection.

Proof. It is easy to verify that $P_M P_N = P_N P_M = 0$. For example, $P_M(P_N x) = 0$
because $P_N x \in N$, and $N \subseteq M^\perp$. Now theorem 7.3.9 implies that $P_M + P_N$ is
a projection since the sum of self-adjoint operators is self-adjoint and
$(P_M + P_N)^2 = P_M^2 + P_M P_N + P_N P_M + P_N^2 = P_M^2 + P_N^2 = P_M + P_N$. We now show
that $\Re(P_M + P_N) = M \oplus N$. Clearly, $\Re(P_M + P_N) \subseteq M \oplus N$. Conversely, if
$x = y + z \in M \oplus N$, where $y \in M$, and $z \in N$, then $(P_M + P_N)(x) = P_M(y) + P_N(y) + P_M(z) + P_N(z) = P_M(y) + P_N(z) = y + z = x$. ∎
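Theorems 7.3.9 and 7.3.10 are easy to test in ℝ³. In the sketch below, M and N are the lines spanned by the hypothetical vectors (1, 1, 0) and (1, −1, 0), which are orthogonal; the helper `projection_onto` builds the matrix of the orthogonal projection onto a line. All names and the choice of vectors are assumptions for illustration.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def projection_onto(v):
    """Matrix of the orthogonal projection of R^n onto span{v}, for nonzero v."""
    s = sum(c * c for c in v)
    return [[a * b / s for b in v] for a in v]

def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

P_M = projection_onto([1.0, 1.0, 0.0])
P_N = projection_onto([1.0, -1.0, 0.0])   # orthogonal to the first vector
S = add(P_M, P_N)
zero = [[0.0] * 3 for _ in range(3)]

print("P_M idempotent and symmetric:", close(matmul(P_M, P_M), P_M), close(transpose(P_M), P_M))
print("P_M P_N = 0:", close(matmul(P_M, P_N), zero))
print("P_M + P_N idempotent and symmetric:", close(matmul(S, S), S), close(transpose(S), S))
```

Here the sum P_M + P_N is the projection onto the plane spanned by the first two coordinate vectors, which is M ⊕ N.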

Example 3. Let M and N be the following closed subspaces of l²:
M = Span({e_{2n} ∶ n ∈ ℕ}) and N = Span({e_{2n+1} ∶ n ∈ ℕ}). Since M ⊕ N = l²,
P_M + P_N = I. ∎

The following construction produces an abundance of examples of self-adjoint
operators. This is also the first step toward understanding the structure of compact
self-adjoint operators.

Theorem 7.3.11. Let $(P_n)$ be a sequence of pairwise orthogonal projections, and
let $(\lambda_n)$ be a sequence of nonzero complex numbers such that $\lim_n \lambda_n = 0$. Define
$T : H \to H$ by $Tx = \sum_{n=1}^{\infty} \lambda_n P_n x$. Then

(a) T is a bounded operator;
(b) $T^* = \sum_{n=1}^{\infty} \overline{\lambda_n} P_n$; therefore if each $\lambda_n \in \mathbb{R}$, then T is self-adjoint; and
(c) $\{\lambda_n\}$ is the set of nonzero eigenvalues of T.
Proof. We show that the sequence of operators $S_n = \sum_{i=1}^{n} \lambda_i P_i$ is a Cauchy sequence
in ℒ(H). The previous theorem shows that, for n < m, $\sum_{k=n}^{m} P_k$ is a projection;
hence $\|\sum_{k=n}^{m} P_k\| = 1$. (See theorem 7.1.8.) Observe that the mutual orthogonality
of the projections $P_n$ implies that, for every x ∈ H, the vectors $P_n x$ are orthogonal.
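Let $\epsilon > 0$. Since $\lambda_n \to 0$, there exists a positive integer N such that, for all
$n > N$, $|\lambda_n| < \epsilon$. Now, for $m > n > N$, and an arbitrary x ∈ H,

\[
\Big\|\sum_{k=n}^{m} \lambda_k P_k x\Big\|^2 = \sum_{k=n}^{m} |\lambda_k|^2\|P_k x\|^2 \le \epsilon^2 \sum_{k=n}^{m} \|P_k x\|^2
 = \epsilon^2 \Big\|\sum_{k=n}^{m} P_k x\Big\|^2 \le \epsilon^2 \Big\|\sum_{k=n}^{m} P_k\Big\|^2 \|x\|^2 = \epsilon^2\|x\|^2.
\]

This shows that the sequence $S_n$ of partial sums of the series defining T is
Cauchy; hence the series converges, and T ∈ ℒ(H). This proves part (a).
Now $\langle Tx, y\rangle = \langle \sum_{n=1}^{\infty} \lambda_n P_n x, y\rangle = \sum_{n=1}^{\infty} \lambda_n\langle P_n x, y\rangle = \sum_{n=1}^{\infty} \lambda_n\langle x, P_n y\rangle = \langle x, \sum_{n=1}^{\infty} \overline{\lambda_n} P_n y\rangle$;
hence we obtain the stated formula for $T^*$.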

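Fix a positive integer m. For a nonzero element $x \in M_m = \Re(P_m)$, $P_m x = x$, and
$P_n x = 0$ for all $n \ne m$. Thus $Tx = \sum_{n=1}^{\infty} \lambda_n P_n x = \lambda_m P_m x = \lambda_m x$. Hence $\lambda_m$ is an
eigenvalue of T. To show that $\{\lambda_n\}$ is the entire set of nonzero eigenvalues of
T, let u ∈ H and $0 \ne \mu \in \mathbb{C}$ be such that $\mu \ne \lambda_n$ for every n and $Tu = \mu u$. Since $u = \frac{1}{\mu}Tu$,
$u \in \Re(T)$. We show below that $u \in \Re(T)^\perp$; hence u = 0, which will establish (c).
For $x \in M_n$, $Tx = \lambda_n x$, $T^*x = \overline{\lambda_n} x$, so

\[
\begin{aligned}
\mu\langle u, Tx\rangle = \langle \mu u, Tx\rangle = \langle Tu, \lambda_n x\rangle &= \langle u, T^*(\lambda_n x)\rangle \\
 &= \langle u, \lambda_n\overline{\lambda_n} x\rangle = \lambda_n\langle u, \lambda_n x\rangle = \lambda_n\langle u, Tx\rangle.
\end{aligned}
\]

Thus $(\mu - \lambda_n)\langle u, Tx\rangle = 0$. Since $\mu \ne \lambda_n$, $\langle u, Tx\rangle = 0$. We have shown that $u \in M_n^\perp$
for every n ∈ ℕ. Therefore $u \perp S = \operatorname{Span}\{\cup_{n\in\mathbb{N}} M_n\}$; hence $u \perp \overline{S}$. Clearly, $\Re(T) \subseteq \overline{S}$;
hence $u \in \Re(T)^\perp$. ∎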
Example 4. Consider the operator $T : l^2 \to l^2$ defined by $Tx = \sum_{n=1}^{\infty} \frac{\hat{x}_n}{n} e_n$. Here
$P_n$ is the projection of $l^2$ on the one-dimensional subspace spanned by $e_n$.
By the above theorem, T is self-adjoint, and the set $\{\lambda_n = \frac{1}{n} : n \in \mathbb{N}\}$ is the
entire set of nonzero eigenvalues. Since the spectrum of T is closed, λ = 0 is
in σ(T). However, since T is injective, λ = 0 is not an eigenvalue of T. We
now show that the set $S = \{\frac{1}{n} : n \in \mathbb{N}\} \cup \{0\}$ is the entire spectrum of T. If
$\lambda \in \mathbb{C} - S$, then $\delta = \operatorname{dist}(\lambda, S) > 0$. Now $(T - \lambda I)x = \sum_{n=1}^{\infty} (\frac{1}{n} - \lambda)\hat{x}_n e_n$; hence

\[ \|(T - \lambda I)(x)\|^2 = \sum_{n=1}^{\infty} \Big|\frac{1}{n} - \lambda\Big|^2 |\hat{x}_n|^2 \ge \delta^2 \sum_{n=1}^{\infty} |\hat{x}_n|^2 = \delta^2\|x\|^2. \]

Thus $T - \lambda I$ is bounded away from zero. In the same manner, the adjoint of $T - \lambda I$, namely,
$T - \overline{\lambda} I$, is bounded away from zero. Hence $T - \lambda I$ is invertible by problem 11 at the
end of this section. ∎
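The lower bound used in example 4 can be observed on vectors with finitely many nonzero coordinates. The sketch below is an illustration only: the value of λ, the test vector, and the function names are hypothetical, and the distance is computed against the truncated set {0} ∪ {1/n ∶ n ≤ N}, which is sufficient for vectors supported on the first N coordinates.

```python
import math

def distance_to_truncated_spectrum(lam, N):
    """min(|lam|, min over 1 <= n <= N of |1/n - lam|).

    This is at least dist(lam, S) for the full set S, and it is a valid lower
    bound for |1/n - lam| whenever n <= N."""
    return min([abs(lam)] + [abs(1.0 / n - lam) for n in range(1, N + 1)])

def apply_T_minus_lam(x, lam):
    # The n-th coordinate of (T - lam I)x is (1/n - lam) times the n-th coordinate of x.
    return [(1.0 / n - lam) * c for n, c in enumerate(x, start=1)]

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

lam = 0.3 + 0.4j                              # a point outside the spectrum (assumed)
x = [complex(1, -1), 2, -0.5j, 4, 1, 0.25]    # first six coordinates; the rest are zero

delta = distance_to_truncated_spectrum(lam, len(x))
lhs = norm(apply_T_minus_lam(x, lam))
rhs = delta * norm(x)
print("||(T - lam I)x|| =", lhs, " delta * ||x|| =", rhs)
```

The printed values satisfy ‖(T − λI)x‖ ≥ δ‖x‖, which is the estimate that makes T − λI bounded away from zero.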

Theorem 7.3.12. Let T ∈ ℒ(H). Then

(a) $\Re(T)^\perp = \mathcal{N}(T^*)$, and
(b) $\mathcal{N}(T^*)^\perp = \overline{\Re(T)}$.

Proof. $y \in \Re(T)^\perp$ if and only if $\langle y, Tx\rangle = 0$ for all x ∈ H if and only if $\langle T^*y, x\rangle = 0$
for all x ∈ H if and only if $T^*y = 0$ if and only if $y \in \mathcal{N}(T^*)$. Part (b) follows from
(a) and $\mathcal{N}(T^*)^\perp = \Re(T)^{\perp\perp} = \overline{\Re(T)}$. ∎
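A concrete instance of theorem 7.3.12 is provided by the shift operators on l². The following worked computation is a sketch that assumes the standard formulas for the right shift R and the left shift L, together with the relation R∗ = L (which is the content of problem 14 in the exercises).

```latex
% Assumed definitions of the shifts on l^2:
%   R(x_1, x_2, x_3, \dots) = (0, x_1, x_2, \dots), \qquad
%   L(x_1, x_2, x_3, \dots) = (x_2, x_3, x_4, \dots), \qquad R^* = L.
\[
  \Re(R) = \{ y \in l^2 : y_1 = 0 \}, \qquad
  \Re(R)^{\perp} = \operatorname{Span}\{e_1\} = \mathcal{N}(L) = \mathcal{N}(R^*),
\]
% which is part (a) for T = R. For T = L, part (b) reads
\[
  \overline{\Re(L)} = \mathcal{N}(L^*)^{\perp} = \mathcal{N}(R)^{\perp} = \{0\}^{\perp} = l^2 ,
\]
% in agreement with the fact that L maps l^2 onto l^2.
```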

The above theorem has many applications. Here is a simple example.

Example 5. Suppose T is a self-adjoint operator. If λ is not an eigenvalue of T, then
the range of $T - \lambda I$ is dense.
If λ is not an eigenvalue of T, $\overline{\lambda}$ is not an eigenvalue of T. Applying the previous
theorem, we have $\overline{\Re(T - \lambda I)} = \mathcal{N}(T^* - \overline{\lambda} I)^\perp = \mathcal{N}(T - \overline{\lambda} I)^\perp = \{0\}^\perp = H$. ∎

The following example shows that the entire spectrum, not just the eigenvalues,
of a self-adjoint operator is contained in ℝ. Problem 16 at the end of this section
provides a sharper result.

Example 6. Let T be a self-adjoint operator on H. Then σ(T) ⊆ ℝ.
It is enough to show that if λ ∈ ℂ and μ = Im(λ) ≠ 0, then λ ∈ ρ(T). Let
x ∈ H. Using theorem 7.3.6, we have μ‖x‖² = −Im(⟨(T − λI)x, x⟩). Thus
|μ|‖x‖² ≤ |⟨(T − λI)x, x⟩| ≤ ‖(T − λI)x‖‖x‖. Hence |μ|‖x‖ ≤ ‖(T − λI)x‖. This
proves that T − λI is bounded away from zero. In particular, by example 8 in
section 6.2, T − λI is injective, and its range is closed. By the previous example,
the range of T − λI is also dense, so ℜ(T − λI) = H. This shows that T − λI is invertible. ∎

The next example illustrates that theorem 7.3.11 is not the only way to construct
self-adjoint operators and that we have wide control over the design of the
spectrum.

Example 7. Let $\{q_n : n \in \mathbb{N}\}$ be an enumeration of the rational numbers in [0, 1],
and let $\{u_n : n \in \mathbb{N}\}$ be an orthonormal basis for a Hilbert space H. Define an
operator T as follows: $T(x) = \sum_{n=1}^{\infty} q_n \hat{x}_n u_n$. We show that σ(T) = [0, 1].
The assumptions of theorem 7.3.11 are clearly not satisfied. However, T
is bounded because $\|Tx\|^2 = \sum_{n=1}^{\infty} q_n^2|\hat{x}_n|^2 \le \sum_{n=1}^{\infty} |\hat{x}_n|^2 = \|x\|^2$. Thus ‖T‖ ≤ 1.
The verification that T is self-adjoint is similar to example 1, and we leave it to
the reader. Since $Tu_n = q_n u_n$, each $q_n$ is an eigenvalue of T. Since σ(T) is closed,
[0, 1] ⊆ σ(T). By the above example and corollary 6.5.3, σ(T) ⊆ [−1, 1]. It is
easy to see that if λ ∈ [−1, 0), then $|q_n - \lambda| \ge |\lambda|$ for every n ∈ ℕ. A calculation
similar to that in example 4 shows that ‖(T − λI)x‖ ≥ |λ|‖x‖, and hence T − λI is
bounded away from zero. Since T − λI is self-adjoint, it is invertible by problem
11 at the end of this section. ∎

Normal and Unitary Operators

We briefly discuss two classes of bounded operators. The section exercises extend
the discussion. The definition of a normal operator is the same as that in the finite-
dimensional case discussed in section 3.7.

Definition. A bounded operator T on a Hilbert space is normal if TT∗ = T∗ T.

Observe that every self-adjoint operator is normal and that, for an arbitrary
operator T, TT∗ and T∗ T are self-adjoint.

Example 8. Let T be a normal operator. Then, for n ∈ ℕ, $\|T^n\| = \|T\|^n$.
We apply the result of problem 8 in the section exercises to the self-adjoint
operator $T^*T$:

\[ \|T\|^{2n} = (\|T\|^2)^n = \|T^*T\|^n = \|(T^*T)^n\| = \|(T^n)^*(T^n)\| = \|T^n\|^2. \]

Now equate the square roots of the extreme quantities of the last string. ∎
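The normality assumption in example 8 is needed. The following worked computation, with a matrix N on ℂ² chosen here for illustration (it is not taken from the text), exhibits an operator that is not normal and for which the norm of the square is strictly smaller than the square of the norm.

```latex
\[
  N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
  N^* = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad
  N^*N = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \neq
  \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = NN^* .
\]
% N maps (x_1, x_2) to (x_2, 0), so \|N\| = 1, while N^2 = 0:
\[
  \|N^2\| = 0 < 1 = \|N\|^2 .
\]
```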

Example 9. A bounded operator T is normal if and only if, for every x ∈ H, ‖Tx‖ =
‖T∗x‖. Consequently, 𝒩(T) = 𝒩(T∗).
If T is normal, then ‖Tx‖² − ‖T∗x‖² = ⟨Tx, Tx⟩ − ⟨T∗x, T∗x⟩ = ⟨x, T∗Tx⟩ −
⟨x, TT∗x⟩ = ⟨x, (T∗T − TT∗)x⟩ = ⟨x, 0⟩ = 0.
If ‖Tx‖ = ‖T∗x‖, then, by the above calculation, ⟨x, (T∗T − TT∗)x⟩ = 0 for all
x ∈ H. Since T∗T − TT∗ is self-adjoint, it is 0, by lemma 7.3.5. ∎

Example 10. Let μ be an arbitrary complex number, and define $T(x) = \mu x$.
It is clear that $T^*x = \overline{\mu}x$. By the previous example, T is normal because
$\|Tx\| = |\mu|\|x\| = |\overline{\mu}|\|x\| = \|\overline{\mu}x\| = \|T^*x\|$. ∎

Definition. A bounded operator U is unitary if UU∗ = U∗ U = I.

Observe that U is unitary if and only if U⁻¹ = U∗ and that every unitary operator
is normal.

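Example 11. A surjective operator U ∈ ℒ(H) is unitary if and only if ‖Ux‖ = ‖x‖ for every x ∈ H.
If U is unitary, then ‖Ux‖² = ⟨Ux, Ux⟩ = ⟨x, U∗Ux⟩ = ⟨x, x⟩ = ‖x‖². Conversely,
if ⟨Ux, Ux⟩ = ⟨x, x⟩ for all x, then ⟨x, U∗Ux⟩ − ⟨x, x⟩ = 0 for all x ∈ H; hence
U∗U = I by lemma 7.3.5. In particular, U is injective; being also surjective, U is
invertible, and U⁻¹ = U∗, so UU∗ = I. Surjectivity cannot be dropped in infinite
dimensions: the right shift R on l² satisfies ‖Rx‖ = ‖x‖ but is not invertible. ∎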

Example 12. For θ ∈ [0, 2π), the operator $U(x) = e^{i\theta}x$ is unitary, by the previous
example. ∎

The results of the last two examples are consistent with the fact that unitary matrices
resemble rotations of the plane. Also see the last three problems in the section
exercises.

Exercises

1. Let T be a linear operator on H such that, for all x, y ∈ H,

⟨Tx, y⟩ = ⟨x, Ty⟩.

Prove that T is bounded. Hint: Use the closed graph theorem.


2. Let T ∈ ℒ(H), and let S be an invertible operator. Show that T and S⁻¹TS
have the same eigenvalues.
3. Show that if T is invertible, then (T⁻¹)∗ = (T∗)⁻¹.
4. Prove that $\lambda \in \sigma(T)$ if and only if $\overline{\lambda} \in \sigma(T^*)$.
5. Complete the proof of theorem 7.3.1.
6. Complete the proof of theorem 7.3.2.
7. Let T be a linear operator on a complex Hilbert space H. Prove that, for all
x, y ∈ H,

\[ \langle T(x+y), x+y\rangle - \langle T(x-y), x-y\rangle + i\langle T(x+iy), x+iy\rangle - i\langle T(x-iy), x-iy\rangle = 4\langle Tx, y\rangle. \]

For a real Hilbert space, prove that if T is self-adjoint, then

⟨T(x + y), x + y⟩ − ⟨T(x − y), x − y⟩ = 4⟨Tx, y⟩.

Use the above identities to provide another proof of lemma 7.3.5.


8. Let T be a self-adjoint operator on a separable Hilbert space H.
(a) Prove that $\|T\|^2 = \|T^2\|$. By induction, $\|T\|^{2^k} = \|T^{2^k}\|$ for every positive
integer k.
(b) Prove that, for every positive integer n, $\|T^n\| = \|T\|^n$. Hint: Choose an
integer k such that $1 \le n \le 2^k$; $\|T\|^{2^k} = \|T^{2^k}\| = \|T^n T^{2^k - n}\|$.
9. Prove that if $x_n \xrightarrow{w} x$, and T ∈ ℒ(H), then $Tx_n \xrightarrow{w} Tx$.
10. Let T ∈ ℒ(H). Prove there are unique self-adjoint operators A and B such
that T = A + iB and T∗ = A − iB.
11. Prove that a bounded operator T on a Hilbert space is invertible if and only
if both T and T∗ are bounded away from zero. Consequently, if T is self-
adjoint, the mere assumption that T is bounded away from zero implies the
invertibility of T.
12. Let $\{u_n\}$ be an orthonormal basis for H, and let $\{\lambda_n\} \subseteq \mathbb{C}$ be such
that $\lim_n \lambda_n = \lambda$ and $\lambda_n \ne \lambda$ for all n. Prove that the function
$Tx = \sum_{n=1}^{\infty} \lambda_n\langle x, u_n\rangle u_n$ is a bounded linear operator on H. Prove also that each
$\lambda_n$ is an eigenvalue of T, but λ is not.
13. Let T ∈ ℒ(H). Show that if a subspace M is invariant under T, then M⊥ is
invariant under T∗ .
14. Let R and L be the right and left shift operators on l2 , respectively.
(a) Show that R∗ = L.
(b) Describe the eigenvalues of each operator.
(c) Prove that 𝜎(R) = 𝜎(L) = the closed unit disk in the complex plane.
15. Let T be a self-adjoint operator on H, and let 𝜆 be a complex number. Prove
that 𝜆 ∈ 𝜎(T) if and only if inf‖x‖=1 ‖(T − 𝜆I)(x)‖ = 0. Hint: If there exists
a constant 𝛿 > 0 such that ‖(T − 𝜆I)(x)‖ ≥ 𝛿‖x‖ for every x ∈ H, then, by
example 8 in section 6.2, ℜ(T − 𝜆I) is closed. Show that it is also dense
in H. To prove the converse, examine the proof of theorem 7.3.8. Observe
that this result is false if T is not self-adjoint. The right shift on l2 satisfies
‖Rx‖ = ‖x‖ but 0 ∈ 𝜎(R).
16. Let T be a self-adjoint operator on a separable Hilbert space H, let
m = inf‖x‖=1 ⟨Tx, x⟩, and let M = sup‖x‖=1 ⟨Tx, x⟩. Prove that 𝜎(T) ⊆ [m, M]
and that both m and M are in 𝜎(T). Hint: Since 𝜎(T + 𝜇I) = 𝜎(T) + 𝜇, we
may assume (by considering T + 𝜇I for a sufficiently large positive constant
𝜇) that 0 ≤ m ≤ M. By theorem 7.3.7, ‖T‖ = M. Thus 𝜎(T) ⊆ [−M, M]. Let
𝛿 > 0, and let 𝜆 = m − 𝛿. For every unit vector u, ‖Tu − 𝜆u‖ ≥ ⟨Tu − 𝜆u, u⟩.
Show that ⟨Tu − 𝜆u, u⟩ ≥ 𝛿. By the previous problem, m − 𝛿 ∉ 𝜎(T). This
proves that 𝜎(T) ⊆ [m, M]. To show that M ∈ 𝜎(T), use theorem 7.3.7 to
find a sequence of unit vectors (un ) such that limn ⟨Tun , un ⟩ = M. Show that
limn (Tun − Mun) = 0, and again use problem 15. To show that m ∈ 𝜎(T),
assume (by considering T − 𝜇I for a sufficiently large positive constant 𝜇)
that m ≤ M ≤ 0. Apply the result you just obtained to the operator S = −T
to conclude that −m ∈ 𝜎(S).
17. Let T be a self-adjoint operator on H, let M be a closed, T-invariant
subspace of H, and let N = M⊥ . If T1 and T2 are the restrictions of T to
M and N, respectively, prove that ℜ(T) = ℜ(T1 ) ⊕ ℜ(T2 ) and that 𝜎(T) =
𝜎(T1 ) ∪ 𝜎(T2 ). Hint: Use problem 15 to show that if 𝜆 ∉ 𝜎(T1 ) ∪ 𝜎(T2 ), then
𝜆 ∉ 𝜎(T).
18. Prove that if P is the projection on a closed subspace M, then I − P is the
projection of H on M⊥ .
19. Let P be a projection. Prove that 0 and 1 are the only eigenvalues of P. What
is 𝜎(P)?
20. Let P be a projection. Show that, for all x ∈ H, ⟨Px, x⟩ = ‖Px‖2 .
21. Show that the composition P_M P_N of two projections is a projection if and
only if P_M and P_N commute. In this case, P_M P_N = P_{M∩N}.

Definition. A bounded operator T ∈ ℒ(H) is positive if ⟨Tx, x⟩ ≥ 0 for
every x ∈ H. If ⟨Tx, x⟩ > 0 for all nonzero vectors x ∈ H, T is said to be
strictly positive.

22. Prove that the eigenvalues of a positive operator are nonnegative. If T is
strictly positive, prove that the eigenvalues of T are positive.
23. Prove that a bounded operator T on a Hilbert space is invertible if there
exists a positive constant c such that T − cI is a positive operator.
24. Show that, for T ∈ ℒ(H), T∗T is a positive operator.
25. (a) Prove that if each $\lambda_n \ge 0$, then the operator T defined in theorem 7.3.11
is positive.
(b) For α > 0, define $T_\alpha(x) = \sum_{n=1}^{\infty} \lambda_n^{\alpha} P_n(x)$. Prove that $T_\alpha T_\beta = T_{\alpha+\beta}$.
26. Let S and T be commuting self-adjoint operators. Prove that the operator
S + 𝛼T is normal for every 𝛼 ∈ ℂ.
27. Let T be a normal operator. Prove that r(T) = ‖T‖. Conclude that if
T ≠ 0, then 𝜎(T) contains at least one nonzero point. Hint: Use example
8 and theorem 6.5.7. Observe that this result generalizes theorem 7.3.8 and
provides an alternative proof of it.
28. Let T be a normal operator. Prove that if λ is an eigenvalue of T and u
is a corresponding eigenvector, then $\overline{\lambda}$ is an eigenvalue of T∗ and u is a
corresponding eigenvector.
29. Prove that eigenvectors of a normal operator corresponding to distinct
eigenvalues are orthogonal.
30. Prove that a bounded operator U is unitary if and only if ⟨Ux, Uy⟩ = ⟨x, y⟩
for all x, y ∈ H.
31. If U is a unitary operator and {un } is an orthonormal basis for H, prove that
{Uun } is an orthonormal basis for H.
32. Prove that if 𝜆 is an eigenvalue of a unitary operator, then |𝜆| = 1.

7.4 Compact Operators

In section 3.7, we established the spectral theorem for normal operators on
finite-dimensional inner product spaces. The question now is how much of the
finite-dimensional theory can be generalized to self-adjoint (generally, normal)
operators on a separable Hilbert space.

Self-adjoint operators on an infinite-dimensional separable Hilbert space share
some of the properties of Hermitian matrices. For example, the eigenvalues of
such an operator are real, and eigenvectors corresponding to distinct eigenvalues
are orthogonal. However, the spectral theorem does not extend to self-adjoint
operators for a simple reason: a self-adjoint operator on an infinite-dimensional
separable Hilbert space may not have any eigenvalues. The following example
assumes familiarity with the space 𝔏²(0, 1) of (Lebesgue) square integrable functions
on [0, 1], with the inner product $\langle f, g\rangle = \int_0^1 f(x)\overline{g(x)}\,dx$. The unfamiliar reader
can think of 𝔏²(0, 1) as the completion of 𝒞[0, 1] with respect to the given inner
product. See theorem 7.1.10 and example 4 in section 7.2.

Example 1. The operator T on 𝔏²(0, 1) defined by (Tu)(x) = xu(x) is clearly
self-adjoint and has no eigenvalues. ∎
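For readers who want the details behind example 1, here is a sketch of the two verifications. It assumes the inner product of 𝔏²(0, 1) is conjugate-linear in the second argument and uses only the fact that a single point has Lebesgue measure zero.

```latex
% Self-adjointness: the multiplier x is real valued on [0,1].
\[
  \langle Tu, v\rangle = \int_0^1 x\,u(x)\,\overline{v(x)}\,dx
                       = \int_0^1 u(x)\,\overline{x\,v(x)}\,dx
                       = \langle u, Tv\rangle .
\]
% No eigenvalues: suppose Tu = \lambda u for some \lambda \in \mathbb{C}. Then
\[
  (x - \lambda)\,u(x) = 0 \ \text{for almost every } x \in [0,1]
  \;\Longrightarrow\;
  u(x) = 0 \ \text{for almost every } x \neq \lambda
  \;\Longrightarrow\;
  u = 0 \ \text{in } \mathfrak{L}^2(0,1),
\]
% so no nonzero u satisfies Tu = \lambda u.
```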

In this section, we study compact operators in some depth. The culmination of the
section is the spectral theorem for compact self-adjoint operators.

Definition. A linear operator T on a separable Hilbert space H is compact if it
maps bounded sets into relatively compact sets. Thus T is compact if whenever
A is a bounded subset of H, then $\overline{T(A)}$ is compact.

Example 2. (a) Compact operators are clearly bounded.
(b) The identity operator, I, on an infinite-dimensional Hilbert space is never
compact. The image of the unit ball, which is bounded, is itself. But,
in infinite-dimensional space, no ball is relatively compact, so I is not
compact.
(c) Define T ∶ l² → l² as follows: for x = (xn) ∈ l², T(x) = (x1, 0, x3, 0, x5, ...).
The set A = {e_{2n−1} ∶ n ∈ ℕ} is bounded, but its image T(A) = A is not
relatively compact. Hence T is not a compact operator. ∎

Example 3. A bounded operator T on a Hilbert space H is compact if and only if
$\overline{T(B)}$ is compact, where B is the unit ball in H.
Suppose $\overline{T(B)}$ is compact, and let A be a bounded subset of H. Then A is
contained in a ball $B_r$ of radius r and centered at the origin. Because $T(B_r) = rT(B)$,
$\overline{T(B_r)}$ is compact; hence $\overline{T(A)}$ is compact. The converse is trivial. ∎

Example 4. Define D ∶ l² → l² as follows: for an element x = (xn) ∈ l², D(x) =
(x1, x2/2, x3/3, ...). Being a closed subset of the Hilbert cube, $\overline{D(B)}$ is compact.
Thus D is compact, by the previous example. ∎

Theorem 7.4.1. An operator T ∈ ℒ(H) is compact if and only if, for every bounded
sequence (xn ) in H, (T(xn )) contains a convergent subsequence.

Proof. Suppose T is compact, and let (xn) be a bounded sequence in H, say, ‖xn‖ ≤ r.
By assumption, $\overline{T(B(0, r))}$ is compact and contains every T(xn). By the sequential
compactness of $\overline{T(B(0, r))}$, (T(xn)) contains a convergent subsequence.

Conversely, if T is not compact, then there exists a bounded subset A such that
$\overline{T(A)}$ is not compact. In particular, T(A) is not totally bounded. Thus there exists
a positive number 𝜖 and a sequence (xn) in A such that ‖T(xn) − T(xm)‖ ≥ 𝜖 for all
m ≠ n. See the proof of theorem 4.7.6. We have constructed a bounded sequence
(xn) for which (T(xn)) contains no convergent subsequence. ∎

Example 5. A compact operator T maps weakly convergent sequences into
(norm) convergent sequences. The converse is also true. See problem 2 at
the end of this section.
Let $(x_n)$ be a sequence in H such that $x_n \xrightarrow{w} x$. We show that every subsequence
of $(T(x_n))$ contains a subsequence that converges to T(x). Pick a subsequence
$y_k = x_{n_k}$ of $(x_n)$. Since $y_k \xrightarrow{w} x$ and since weakly convergent sequences are bounded
(problem 15 in section 7.2), the previous theorem yields a subsequence $z_p = y_{k_p}$
of $(y_k)$ such that $(T(z_p))$ is convergent. Set $\lim_p T(z_p) = z$. In particular,
$T(z_p) \xrightarrow{w} z$. Now $z_p \xrightarrow{w} x$, so, by problem 6 in section 6.6, $T(z_p) \xrightarrow{w} T(x)$. By
the uniqueness of weak limits (problem 11 in section 7.2), z = T(x). ∎

Theorem 7.4.2. The set 𝒦 of compact operators on a separable Hilbert space H is a
closed subspace of ℒ(H).

Proof. We leave it to the reader to verify that 𝒦 is a vector space. To prove that 𝒦
is closed, let T ∈ ℒ(H) be in the closure of 𝒦. Let (xn) be a bounded sequence
in H, and suppose that ‖xn‖ ≤ r. If 𝜖 > 0, there exists a compact operator K
such that ‖T − K‖ < 𝜖. Since K is compact, a subsequence (yn) of (xn) exists
such that (K(yn)) is convergent. In particular, (K(yn)) is a Cauchy sequence, so
there exists a positive integer N such that, for m, n > N, ‖Kyn − Kym‖ < 𝜖. Now,
for m, n > N, ‖Tyn − Tym‖ ≤ ‖Tyn − Kyn‖ + ‖Kyn − Kym‖ + ‖Kym − Tym‖ ≤
‖T − K‖‖yn‖ + 𝜖 + ‖K − T‖‖ym‖ < r𝜖 + 𝜖 + r𝜖 = (2r + 1)𝜖. Applying this argument
successively with 𝜖 = 1/k (k = 1, 2, …), each time extracting a further subsequence,
and then passing to the diagonal subsequence, we obtain a subsequence (zn) of (xn)
such that (Tzn) is Cauchy; hence it is convergent, and T is compact by theorem 7.4.1. ∎

Theorem 7.4.3. (a) If T is compact and S ∈ ℒ(H), then ST and TS are compact.
(b) If T is compact, and H is infinite dimensional, then 0 ∈ 𝜎(T).

Proof. The proof of part (a) is a straightforward application of theorem 7.4.1 and the
fact that a bounded operator maps bounded sequences into bounded sequences
and convergent sequences into convergent sequences. To prove (b), suppose
0 ∉ 𝜎(T). Then T is invertible, so there exists a bounded operator S such that
ST = I. By part (a), ST would be compact, so I would be compact, which is false
by example 2. ∎

Definition. An operator T ∈ ℒ(H) is said to be of finite rank if ℜ(T) is finite
dimensional.

Theorem 7.4.4.
(a) A bounded, finite-rank operator T is compact.
(b) If T is compact and ℜ(T) is closed, then T is of finite rank.

Proof. (a) Suppose dim(ℜ(T)) < ∞. The continuity of T implies that T(A) is a
bounded subset of ℜ(T) for every bounded subset A of H. But bounded subsets
of a finite-dimensional space are relatively compact by the Heine-Borel theorem.
This proves that T is compact.

(b) If ℜ(T) is closed, then it is a Banach space, and T maps H onto ℜ(T). The open
mapping theorem implies that T is an open mapping. Coupled with the compactness
of T, this implies that the image T(B) of the unit ball, B, in H is relatively compact
and contains a ball B′ = {x ∈ ℜ(T) ∶ ‖x‖ < 𝛿}. In particular, the closed ball
$\overline{B'}$ in ℜ(T) is compact. This cannot happen unless ℜ(T) is finite dimensional. ∎
Example 6. Let $(a_{ij})$ be an infinite matrix such that $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} |a_{ij}|^2 < \infty$, and
define an operator T on $l^2$ as follows: for $x = (x_n) \in l^2$, the i-th entry of T(x) is $\sum_{j=1}^{\infty} a_{ij}x_j$. We
claim that T is compact. Observe that the assumptions imply that

\[ \Big|\sum_{j=1}^{\infty} a_{ij}x_j\Big|^2 \le \sum_{j=1}^{\infty} |a_{ij}|^2 \sum_{j=1}^{\infty} |x_j|^2 = \|x\|^2 \sum_{j=1}^{\infty} |a_{ij}|^2. \]

Also, $\lim_{n\to\infty} \sum_{i=n+1}^{\infty}\sum_{j=1}^{\infty} |a_{ij}|^2 = 0$.
For n ∈ ℕ let $P_n$ be the projection of $l^2$ onto the finite-dimensional subspace
$\operatorname{Span}(\{e_1, \dots, e_n\})$. Thus, for $x = (x_n) \in l^2$, $P_n(x) = (x_1, \dots, x_n, 0, 0, 0, ...)$. Since
$P_n$ is compact, $P_nT$ is compact by theorem 7.4.3. If we show that $\lim_n \|T - P_nT\| = 0$,
the proof will be complete by theorem 7.4.2. Now

\[ \|(T - P_nT)x\|^2 = \sum_{i=n+1}^{\infty} \Big|\sum_{j=1}^{\infty} a_{ij}x_j\Big|^2 \le \sum_{i=n+1}^{\infty} \|x\|^2 \sum_{j=1}^{\infty} |a_{ij}|^2. \]

This shows that $\|T - P_nT\|^2 \le \sum_{i=n+1}^{\infty}\sum_{j=1}^{\infty} |a_{ij}|^2$.
Since $\lim_n \sum_{i=n+1}^{\infty}\sum_{j=1}^{\infty} |a_{ij}|^2 = 0$, we are done. ∎
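To see the estimate of example 6 at work, consider the matrix with entries a_{ij} = 2^{-(i+j)}. This particular matrix is a hypothetical choice made for illustration; it is convenient because every sum is geometric and because the resulting operator has rank one, so the bound can be compared with the exact value.

```latex
% Square summability and the tail sums, using \sum_{i \ge 1} 4^{-i} = 1/3:
\[
  \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} |a_{ij}|^2
    = \Big(\sum_{i=1}^{\infty} 4^{-i}\Big)^{2} = \frac{1}{9},
  \qquad
  \sum_{i=n+1}^{\infty}\sum_{j=1}^{\infty} |a_{ij}|^2
    = \frac{4^{-n}}{3}\cdot\frac{1}{3} = \frac{4^{-n}}{9}.
\]
% The estimate \|(T - P_nT)x\|^2 \le \|x\|^2 \sum_{i>n}\sum_j |a_{ij}|^2 therefore gives
\[
  \|T - P_nT\| \le \frac{2^{-n}}{3} \longrightarrow 0 .
\]
% Exact value: with v = (2^{-i})_{i \ge 1} one has Tx = \langle x, v\rangle v, so
\[
  \|T - P_nT\| = \|v\|\,\|v - P_n v\|
    = \sqrt{\tfrac{1}{3}}\,\sqrt{\tfrac{4^{-n}}{3}} = \frac{2^{-n}}{3},
\]
% and the bound is attained in this example.
```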

Not every compact operator is of finite rank. The following theorem provides the
next best result.

Theorem 7.4.5. Every compact operator T on a separable Hilbert space H is the limit
of a sequence of finite-rank operators.

Proof. Let B be the closed unit ball in H. Since T(B) is relatively compact, for every
n ∈ ℕ, there exists a finite subset $F_n$ of H such that $T(B) \subseteq \cup_{y\in F_n} B(y, 1/n)$. Let
$M_n = \operatorname{Span}\{F_n\}$, and let $P_n$ be the projection of H onto $M_n$. Finally, let $T_n = P_nT$.
Note that $\Re(T_n)$ has finite dimension because it is contained in $M_n$. Thus each
$T_n$ is a finite-rank operator. We now show that, for x ∈ B, $\|T_nx - Tx\| < 2/n$.
This will prove that $\lim_n T_n = T$. Fix n ∈ ℕ and write $F_n = \{y_1, \dots, y_N\}$. If x ∈ B,
$Tx \in T(B) \subseteq \cup_{i=1}^{N} B(y_i, 1/n)$. Thus, for some $1 \le i \le N$, $\|Tx - y_i\| < 1/n$. Now
$\|T_nx - y_i\| = \|P_n(Tx) - P_n(y_i)\| = \|P_n(Tx - y_i)\| \le \|P_n\|\|Tx - y_i\| \le \|Tx - y_i\| < 1/n$.
Finally, $\|T_nx - Tx\| \le \|T_nx - y_i\| + \|y_i - Tx\| < 1/n + 1/n = 2/n$. ∎
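The operator D(x) = (x1, x2/2, x3/3, ...) of example 4 gives a concrete picture of theorem 7.4.5. The sketch below works with vectors having finitely many nonzero coordinates and uses the coordinate truncations D_N (keep the first N entries of D(x)) as the finite-rank approximants; these truncations, the test vector, and the function names are assumptions for illustration, not the operators P_nT built in the proof. It checks the estimate ‖(D − D_N)x‖ ≤ ‖x‖/(N + 1).

```python
import math

def D(x):
    """The diagonal operator of example 4 applied to a finitely supported vector."""
    return [c / n for n, c in enumerate(x, start=1)]

def D_trunc(x, N):
    """Finite-rank truncation: keep only the first N coordinates of D(x)."""
    return [c / n if n <= N else 0.0 for n, c in enumerate(x, start=1)]

def norm(x):
    return math.sqrt(sum(c * c for c in x))

# A hypothetical test vector with 200 nonzero coordinates.
x = [1.0 / k for k in range(1, 201)]

errors = []
for N in (1, 5, 25, 100):
    diff = [a - b for a, b in zip(D(x), D_trunc(x, N))]
    # Store N, the actual error, and the theoretical bound ||x|| / (N + 1).
    errors.append((N, norm(diff), norm(x) / (N + 1)))

for N, err, bound in errors:
    print(f"N = {N:3d}   ||(D - D_N)x|| = {err:.3e}   bound = {bound:.3e}")
```

The error decreases as N grows and never exceeds the bound, which reflects the fact that the coefficients 1/n of the discarded coordinates are at most 1/(N + 1).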

Theorem 7.4.6. Let T ∈ ℒ(H). Then T is compact if and only if T∗ is compact.

Proof. Since $T^{**} = T$, it is enough to show that if T is compact, then $T^*$ is compact.
Let $(y_n)$ be a sequence in the closed unit ball B of H. We show that $(T^*y_n)$ contains
a convergent subsequence. For each n ∈ ℕ, define $\lambda_n \in H^*$ by $\lambda_n(x) = \langle x, y_n\rangle$. For
x, x′ ∈ H,

\[ |\lambda_n(x) - \lambda_n(x')| = |\langle x - x', y_n\rangle| \le \|x - x'\|. \]

It follows that the sequence $(\lambda_n)$ is equicontinuous on H and, in particular, on $\overline{T(B)}$,
which is compact by assumption. Ascoli’s theorem guarantees a subsequence $(\lambda_{n_k})$
that converges uniformly on $\overline{T(B)}$.
Now

\[
\begin{aligned}
\|T^*y_{n_i} - T^*y_{n_j}\| &= \sup_{x\in B} |\langle x, T^*(y_{n_i} - y_{n_j})\rangle| = \sup_{x\in B} |\langle Tx, y_{n_i} - y_{n_j}\rangle| \\
 &= \sup_{x\in B} |\langle Tx, y_{n_i}\rangle - \langle Tx, y_{n_j}\rangle| \\
 &= \sup_{x\in B} |\lambda_{n_i}(Tx) - \lambda_{n_j}(Tx)|.
\end{aligned}
\]

The uniform convergence of $(\lambda_{n_k})$ on $\overline{T(B)}$ guarantees that the last quantity can be
made less than 𝜖 for sufficiently large integers i and j. Thus $(T^*(y_{n_k}))$ is Cauchy and
hence convergent. ∎

The Eigenvalues of a Compact Operator

Theorem 7.4.7 (the Riesz-Schauder theorem). Let T be compact, and let r > 0.
Then the set of eigenvalues 𝜆 of T such that |𝜆| > r is finite.

Proof. Suppose there exist infinitely many eigenvalues 𝜆n of T such that |𝜆n | > r. For
each eigenvalue 𝜆n , choose an eigenvector xn , and let Mn = Span{x1 , … , xn }. Note
that Mn is properly contained in Mn+1 and that T(Mn ) ⊆ Mn . By Riesz’s lemma,
for every n ≥ 2, there exists a unit vector yn ∈ Mn such that dist(yn , Mn−1 ) ≥ 1/2.
It is easy to verify that $(T - \lambda_m I)y_m \in M_{m-1}$. Now if n < m, then
$Ty_n - (T - \lambda_m I)y_m \in M_{m-1}$, so $\frac{1}{\lambda_m}[Ty_n - (T - \lambda_m I)y_m] \in M_{m-1}$, and

\[ \|Ty_n - Ty_m\| = |\lambda_m|\,\Big\|\frac{1}{\lambda_m}[Ty_n - (T - \lambda_m I)y_m] - y_m\Big\| \ge |\lambda_m| \operatorname{dist}(y_m, M_{m-1}) \ge r/2. \]

Thus $(Ty_n)$ contains no convergent subsequence, contradicting the compactness of T. ∎

Theorem 7.4.8. For a compact operator T on a separable Hilbert space H,

(a) the set of eigenvalues of T is at most countable and can be arranged as follows:
|𝜆1| ≥ |𝜆2| ≥ ...; and the set {𝜆n} of eigenvalues of T has no nonzero limit
points;
(b) if T has infinitely many eigenvalues, then limn 𝜆n = 0; and
(c) if 0 ≠ 𝜆 ∈ ℂ, and L = T − 𝜆I, then dim(Ker(L)) < ∞.

Proof. (a) Let Λ be the set of nonzero eigenvalues of T, and assume Λ is infinite. Let r be
the spectral radius of T, and let $r_n = r/n$ (n ∈ ℕ). If $U_n$ is the complement in the
complex plane of the closed disk of radius $r_n$ centered at 0, then $\mathbb{C} - \{0\} = \cup_{n=1}^{\infty} U_n$,
and $\Lambda = \cup_{n=1}^{\infty} (\Lambda \cap U_n)$. Since each of the sets $\Lambda_n = \Lambda \cap U_n$ is finite by theorem
7.4.7, Λ is countable. If Λ has a nonzero limit point, z, then $z \in U_n$ for some
positive integer n. Because $U_n$ is open, it contains a disk centered at z, and such
a disk would contain infinitely many points of Λ. This would contradict the
finiteness of $\Lambda_n$, so no such point z exists. Next let n ∈ ℕ be such that $\Lambda \cap U_n \ne \emptyset$.
Since $\Lambda \cap U_n$ is finite, the eigenvalues λ such that $|\lambda| > r_n$ can be enumerated such
that $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_{N_1}|$. Since Λ is infinite, there exists an integer m > n such
that $\Lambda \cap U_m$ properly contains $\Lambda \cap U_n$. Arrange the eigenvalues in $(U_m - U_n) \cap \Lambda$
in such a way that $|\lambda_{N_1+1}| \ge |\lambda_{N_1+2}| \ge \dots \ge |\lambda_{N_2}|$. Continuing in this manner, we
can enumerate all the eigenvalues in the desired fashion.

(b) Any disk centered at 0 contains all but finitely many of the points 𝜆n . This
proves part (b).

(c) Write $N_L$ for Ker(L). Note that $T(N_L) \subseteq N_L$; hence the restriction of T to $N_L$ is
compact. Since $Tx = \lambda x$ for all $x \in N_L$, $I = \frac{1}{\lambda}T$ on $N_L$. Thus the identity operator
on $N_L$ is compact, so $N_L$ is finite dimensional. ∎

Now that we have established enough of the basic properties of compact operators,
we give examples of how compact operators can be constructed. We hope this will
help motivate some of the results we discuss later in the section. The following
builds on the constructions of theorem 7.3.11.

Example 7. Let $(P_n)$ be a sequence of pairwise orthogonal projections, and let $(\lambda_n)$
be a sequence of nonzero complex numbers such that $\lim_n \lambda_n = 0$. Define
$T : H \to H$ by $Tx = \sum_{n=1}^{\infty} \lambda_n P_n x$. By theorem 7.3.11, T ∈ ℒ(H). If, in addition, the
rank of each $P_n$ is finite (i.e., $P_n$ projects H onto a finite-dimensional subspace),
then T is compact, by theorems 7.4.4 and 7.4.2. By theorem 7.3.11, the $\lambda_n$ are all the
nonzero eigenvalues of T. The importance of theorem 7.3.11 and this example
is that they not only produce an abundance of examples of self-adjoint and
compact operators but also illustrate that we have wide control over tailoring
the spectrum, as the examples below illustrate. Also see problem 4 at the end
of this section for an example of the ultimate tailoring of the spectrum of a
bounded operator. ∎

Example 8. Let $\{u_n\}$ be an orthonormal basis for H and define
$T(x) = \sum_{n=1}^{\infty} \lambda_n\langle x, u_n\rangle u_n$, where $(\lambda_n)$ is a sequence of nonzero complex numbers such
that $\lim_n \lambda_n = 0$. This is a special case of the previous example; each $P_n$ is the
projection on the one-dimensional subspace spanned by $u_n$. In this example,
λ = 0 is not an eigenvalue of T, as the reader can easily verify. ∎

Example 9. Let $\{u_n\}$ be an orthonormal basis for H and define
$T(x) = \sum_{n=1}^{\infty} \lambda_n\langle x, u_{2n}\rangle u_{2n}$, where $(\lambda_n)$ is a sequence of nonzero complex numbers
such that $\lim_n \lambda_n = 0$. In this case, λ = 0 is an eigenvalue of T. In fact,
$\dim(\mathcal{N}(T)) = \infty$ since $T(u_{2n+1}) = 0$ for all positive integers n. ∎

The following subsection is independent of the subsequent subsections and can be
bypassed without loss of continuity.

The Fredholm Theory

We will adopt the following standing assumptions for the remainder of this section:
T is a compact operator on a separable Hilbert space H, and 𝜆 is a nonzero
complex number. We also use the following notation: $L = T - \lambda I$, $L^* = T^* - \overline{\lambda} I$;
$N_L = \operatorname{Ker}(L)$; $R_L = \Re(L)$; $N_{L^*} = \operatorname{Ker}(L^*)$; $R_{L^*} = \Re(L^*)$.
In the calculations in the rest of this section, we repeatedly use the fact that T
commutes with the powers of L. This is because the powers of T commute.

Theorem 7.4.9. RL is closed.

Proof. Let $X_1$ be a complement of $N_L$ in H. One exists by example 6 in section 6.4.
We can choose $X_1 = N_L^\perp$, but we are not making this election because the rest
of the proof below works well with any complement of $N_L$. We first prove the
following fact: There exists a constant δ > 0 such that $\|Lu\| \ge \delta\|u\|$ for every
$u \in X_1$. Suppose not. Then there exists a sequence $(u_n)$ in $X_1$ such that $\|u_n\| = 1$,
$\|Lu_n\| < 1/n$. Clearly, $Lu_n \to 0$ as n → ∞. Since T is compact, $(Tu_n)$ contains
a convergent subsequence, $(Tw_n)$. Thus $w_n = \frac{1}{\lambda}Tw_n - \frac{1}{\lambda}Lw_n$ is convergent. Let
$w = \lim_n w_n$. Since $X_1$ is closed, $w \in X_1$. Now
$w = \lim_n w_n = \lim_n (\frac{1}{\lambda}Tw_n - \frac{1}{\lambda}Lw_n) = \frac{1}{\lambda}Tw$. Thus $Tw = \lambda w$; hence $w \in N_L \cap X_1 = \{0\}$. This contradicts the fact
that $\|w\| = \lim_n \|w_n\| = 1$ and establishes the fact. We now prove that $R_L$ is closed.
Suppose $(Ly_n)$ is a convergent sequence in $R_L$. We need to show that $\lim_n Ly_n \in R_L$.
Write $y_n = u_n + w_n$, where $u_n \in X_1$, and $w_n \in N_L$. Note that $Ly_n = Lu_n$, so $(Lu_n)$
is convergent. By the above fact, $\|u_n - u_m\| \le \frac{1}{\delta}\|Lu_n - Lu_m\| \to 0$ as m, n → ∞.
Thus $(u_n)$ is a Cauchy sequence, so $u = \lim_n u_n$ exists. Finally, $\lim_n Lu_n = Lu$. ∎

Remarks. (a) An immediate consequence of theorem 7.4.9 is that $H = N_{L^*} \oplus R_L$,
because, by theorem 7.3.12, $\overline{R_L} = N_{L^*}^\perp$. Since $R_L$ is closed, $R_L = N_{L^*}^\perp$, which
is the result we seek.
(b) Since $L^* = T^* - \overline{\lambda} I$, and $T^*$ is compact, theorems 7.4.8 and 7.4.9 imply that $N_{L^*}$
is finite dimensional and that $R_{L^*}$ is closed. As in remark (a), $H = N_L \oplus R_{L^*}$.
(c) By the above remarks, codim(RL ) = dim(NL∗ ), and codim(RL∗ ) = dim(NL ).
It is also true that dim(NL ) = dim(NL∗ ). It follows that the numbers
dim(NL ), dim(NL∗ ), codim(RL ), and codim(RL∗ ) are all finite and equal.
The proof that dim(NL∗ ) = dim(NL ) appears at the end of this subsection.
See theorem 7.4.15.

Lemma 7.4.10. Let N_{L^n} denote Ker(L^n). Then N_{L^n} is finite dimensional, and
N_{L^n} ⊆ N_{L^{n+1}}. Moreover, there exists an integer n such that, for every k ≥ n,
N_{L^k} = N_{L^n}.

Proof. Observe that

    L^n = (T − 𝜆I)^n = ∑_{i=0}^{n} (n choose i) T^{n−i} (−𝜆I)^i
        = (T^n − n𝜆T^{n−1} + ... + n(−𝜆)^{n−1}T) − [(−1)^{n+1}𝜆^n]I.

The operator K = T^n − n𝜆T^{n−1} + ... + n(−𝜆)^{n−1}T is compact by theorems 7.4.2
and 7.4.3. Since (−1)^{n+1}𝜆^n ≠ 0, the kernel N_{L^n} of L^n is finite dimensional by
theorem 7.4.8. The fact that N_{L^n} ⊆ N_{L^{n+1}} is obvious. Now suppose, for a
contradiction, that no integer n as in the statement exists. If N_{L^n} = N_{L^{n+1}}
for some n, then N_{L^k} = N_{L^n} for every k ≥ n (if k ≥ n and L^{k+1}x = 0, then
L^{k−n}x ∈ N_{L^{n+1}} = N_{L^n}, so L^k x = 0). Hence N_{L^n} ≠ N_{L^{n+1}} for all n ∈ ℕ.
By Riesz's lemma, choose a unit vector u_n ∈ N_{L^{n+1}} such that
dist(u_n, N_{L^n}) ≥ 1/2. We claim that (Tu_n) contains no convergent subsequence,
contradicting the compactness of T, and concluding the proof. For n > m,

    Tu_n − Tu_m = 𝜆u_n − (Tu_m − Lu_n).    (4)

Now L^n(Tu_m − Lu_n) = T(L^n u_m) − L^{n+1}u_n = 0 − 0 = 0. Thus Tu_m − Lu_n ∈ N_{L^n},
and, by (4),

    ‖Tu_n − Tu_m‖ = |𝜆| ‖u_n − (1/𝜆)(Tu_m − Lu_n)‖ ≥ |𝜆|/2,

which is the contradiction we seek. In the above computation, we used the fact that
T and L^n commute. ∎
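
In finite dimensions every linear operator is compact, so the chain of kernels in
lemma 7.4.10 can be observed directly on a matrix. The following Python sketch is an
illustration only: the matrix T, the value lam, and the helper functions matmul and
rank are hypothetical choices made for this example and do not appear in the text.
It computes dim Ker(L^n) for n = 1, ..., 5 by exact Gaussian elimination and the
rank-nullity theorem.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(A):
    """Rank of a matrix, by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


# Hypothetical example: T has the eigenvalue 2 on a 3x3 Jordan block and the
# eigenvalue 5 on a fourth coordinate. We take lam = 2, so L = T - lam*I.
T = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]
lam = 2
L = [[T[i][j] - (lam if i == j else 0) for j in range(4)] for i in range(4)]

kernel_dims = []
power = L                                  # holds L^n at step n
for n in range(1, 6):
    kernel_dims.append(4 - rank(power))    # dim Ker(L^n) by rank-nullity
    power = matmul(power, L)

print(kernel_dims)
```

For this choice the printed list is [1, 2, 3, 3, 3]: the kernels grow strictly until
n = 3 and are constant afterwards, as the lemma predicts.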

Lemma 7.4.11. Let R_{L^n} = ℜ((T − 𝜆I)^n). Then each R_{L^n} is closed,
R_{L^n} ⊇ R_{L^{n+1}}, and there exists a positive integer n such that
R_{L^k} = R_{L^n} for all k ≥ n.

Proof. As in the proof of the previous lemma, L^n = K − [(−1)^{n+1}𝜆^n]I, where K is
a compact operator; hence, by theorem 7.4.9, R_{L^n} is closed. The inclusions
R_{L^n} ⊇ R_{L^{n+1}} are obvious. If R_{L^n} ≠ R_{L^{n+1}} for all n, then, by Riesz's
lemma, choose a unit vector u_n ∈ R_{L^n} such that dist(u_n, R_{L^{n+1}}) ≥ 1/2. We claim
that (Tu_n) contains no convergent subsequence, contradicting the compactness of T,
and concluding the proof. For n > m,

    Tu_m − Tu_n = 𝜆u_m − (Tu_n − Lu_m).    (5)

Now Tu_n − Lu_m ∈ R_{L^{m+1}}; hence, by (5),

    ‖Tu_m − Tu_n‖ = |𝜆| ‖u_m − (1/𝜆)(Tu_n − Lu_m)‖ ≥ |𝜆|/2. ∎

Proposition 7.4.12 (the Fredholm alternative theorem). The operator L is surjective
if and only if it is injective. Symbolically, R_L = H if and only if N_L = {0}.

Proof. Suppose R_L = H. If N_L ≠ {0}, then there exists a vector u_0 ≠ 0 such that
Lu_0 = 0. Since L is onto, there is a vector u_1 such that Lu_1 = u_0, and, by induction,
there exists a sequence of nonzero vectors u_1, u_2, ... such that Lu_i = u_{i−1}. Now, for
all n, L^n u_n = u_0 ≠ 0, but L^{n+1}u_n = 0. Thus u_n ∈ N_{L^{n+1}} ∖ N_{L^n}. This contradicts
lemma 7.4.10.

Conversely, suppose N_L = {0}. Note that N_{L^n} = {0}, that is, L^n is one-to-one. If
R_L ≠ H, then there is an element x ∉ R_L. In this case, for all y ∈ H,
L^n x − L^{n+1}y = L^n(x − Ly) ≠ 0, because x ≠ Ly and L^n is injective. Hence
L^n x ≠ L^{n+1}y for all y ∈ H. This means that R_{L^n} strictly contains R_{L^{n+1}} for
all n, thus contradicting lemma 7.4.11. ∎

Theorem 7.4.13. Let T be compact. If 𝜆 ≠ 0, and 𝜆 is not an eigenvalue, then T − 𝜆I
is invertible. In other words, all the nonzero elements of the spectrum of a compact
operator are eigenvalues.

Proof. Suppose 𝜆 ≠ 0 and that 𝜆 is not an eigenvalue, that is, N_L = {0}. By
proposition 7.4.12, R_L = H. Hence L = T − 𝜆I is one-to-one and onto and hence is
invertible by theorem 6.3.7. ∎

Remark. It follows from theorem 7.4.8 and the previous theorem that the spectrum
of a compact operator T on an infinite-dimensional separable Hilbert space
is {0, 𝜆_1, 𝜆_2, ...}, where 𝜆_1, 𝜆_2, ... are the nonzero eigenvalues of T (there may be
finitely many or none).

The following result is an immediate consequence of proposition 7.4.12 and
theorem 7.4.13.

Theorem 7.4.14 (the Fredholm alternative theorem). Let T be a compact operator,
and let 𝜆 ≠ 0. Then exactly one of the following holds:

(a) T − 𝜆I is invertible.
(b) 𝜆 is an eigenvalue of T. ∎

We conclude this subsection by furnishing the proof of a result we mentioned
earlier.

Theorem 7.4.15. Let T be a compact operator on a separable Hilbert space H, and,
for a complex number 𝜆 ≠ 0, let L = T − 𝜆I, L^* = T^* − 𝜆̄I. Then N_L and N_{L^*} have
the same (finite) dimension.

Proof. Suppose, for a contradiction, that dim(N_L) = m < n = dim(N_{L^*}), and let
{u_1, …, u_m} and {v_1, …, v_n} be orthonormal bases for N_L and N_{L^*}, respectively.
Define a finite-rank operator on H by

    F(x) = ∑_{i=1}^{m} ⟨x, u_i⟩ v_i.

Notice that Fu_i = v_i for 1 ≤ i ≤ m and that the restriction of F to N_L is
one-to-one. The operator K = T + F is compact. We claim that K − 𝜆I is one-to-one.
If (K − 𝜆I)(x) = 0, then (T − 𝜆I)(x) = −Fx ∈ R_L ∩ N_{L^*} = {0}. Thus
(T − 𝜆I)(x) = 0 = Fx. In particular, x ∈ N_L. Because F|_{N_L} is one-to-one, x = 0,
and this proves our claim. By the Fredholm alternative theorem, K − 𝜆I is
onto, which is a contradiction because ℜ(K − 𝜆I) ⊆ R_L ⊕ Span{v_1, …, v_m} and
v_{m+1} ∉ R_L ⊕ Span{v_1, …, v_m}. This contradiction shows that n ≤ m. By the
preceding part of the proof and the fact that L^{**} = L, m ≤ n, and the proof is
complete. ∎

The Spectral Theorem

The discussion so far shows that compact operators, like self-adjoint operators,
share some properties with operators on finite-dimensional spaces. When we limit
our attention to compact, self-adjoint operators, we obtain results that directly
extend those of the finite-dimensional case.

Lemma 7.4.16. If T is a nonzero compact, self-adjoint operator on a separable
Hilbert space, then ‖T‖ or −‖T‖ is an eigenvalue of T. In particular, every nonzero
compact, self-adjoint operator has a nonzero eigenvalue.

Proof. By the proof of theorem 7.3.8, there exists a real number 𝜆 and a sequence of
unit vectors (y_n) such that |𝜆| = ‖T‖ and lim_n (Ty_n − 𝜆y_n) = 0. Since T is compact,
(y_n) contains a subsequence (u_n) such that (Tu_n) is convergent. It follows that
(u_n) is convergent (it is the difference between the two convergent sequences
(1/𝜆)Tu_n and (1/𝜆)[Tu_n − 𝜆u_n]). Let u = lim_n u_n. Now lim_n (Tu_n − 𝜆u_n) = 0; hence
Tu − 𝜆u = 0. Since ‖u‖ = lim_n ‖u_n‖ = 1, u ≠ 0, and 𝜆 is an eigenvalue of T. ∎

In light of theorems 7.3.8 and 7.4.13, r(T) = ‖T‖ = |𝜆_1|, where 𝜆_1 is an eigenvalue
of T of largest absolute value. Thus the previous lemma is, in fact, redundant.
However, we decided to include it here in order to make this subsection
self-contained and independent of the Fredholm theory.

Theorem 7.4.17 (the Hilbert-Schmidt theorem). Let T be a compact self-adjoint
operator on a separable Hilbert space H. Then H possesses an orthonormal basis
of eigenvectors of T.

Proof. Let 𝜆_1, 𝜆_2, ... be the nonzero eigenvalues of T, and, for each n, let B_n be an
orthonormal basis for the (finite-dimensional) eigenspace, V_n, that corresponds to
𝜆_n. The reader should keep in mind that the set of eigenvalues may be finite. Since
the eigenspaces are mutually orthogonal, the set B = ∪_n B_n is an orthonormal
set. Let M be the closure of the span of B, and let N = M^⊥. Since each V_n is
T-invariant, so is M. It follows that N is also T-invariant (see problem 13 in section
7.3). If N = {0}, then M = H and B is the desired orthonormal basis for H.

If N ≠ {0}, the restriction of T to N is compact and self-adjoint. If T|_N is not the
zero operator, then, by lemma 7.4.16, T|_N has a nonzero eigenvalue 𝜆, which is also
an eigenvalue of T. Since the set {𝜆_1, 𝜆_2, ...} contains all the nonzero eigenvalues
of T, 𝜆 = 𝜆_n for some n. This is a contradiction because then an eigenvector v
that corresponds to 𝜆 would be in M ∩ N = {0}. This shows that N ⊆ Ker(T). In
particular, Ker(T) ≠ {0}; hence 𝜆 = 0 is an eigenvalue of T. Now we show that
N = Ker(T). If x ∈ Ker(T), then by the orthogonality of eigenvectors belonging to
distinct eigenvalues, x ⊥ u for every u ∈ B. Thus x ∈ M^⊥ = N, and N = Ker(T).
Now choose an orthonormal basis B_0 of N. The set B ∪ B_0 is an orthonormal basis
of M ⊕ N = H consisting entirely of eigenvectors of T. ∎

We now arrive at the spectral theorem for compact self-adjoint operators.

Theorem 7.4.18 (the spectral theorem). Let T be a compact self-adjoint operator
on a separable Hilbert space H, and let {u_n} be an orthonormal basis of
eigenvectors of T corresponding to the eigenvalues {𝜆_n}. Then, for every x ∈ H,

    Tx = ∑_{n=1}^{∞} 𝜆_n ⟨x, u_n⟩ u_n.

Proof. Write x = ∑_{n=1}^{∞} ⟨x, u_n⟩ u_n. Then, by the linearity and continuity of T,

    Tx = T(∑_{n=1}^{∞} ⟨x, u_n⟩ u_n) = ∑_{n=1}^{∞} ⟨x, u_n⟩ Tu_n = ∑_{n=1}^{∞} 𝜆_n ⟨x, u_n⟩ u_n. ∎

The spectral theorem is the exact analog of the finite-dimensional case for a
Hermitian matrix. If we define P_n to be the projection on the one-dimensional
subspace spanned by u_n, then P_n is a rank-1 operator, and T = ∑_{n=1}^{∞} 𝜆_n P_n. Notice
that the series ∑_{n=1}^{∞} 𝜆_n P_n converges in the operator norm by theorem 7.3.11.
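
The finite-dimensional analog mentioned above can be checked by hand or by machine.
The following Python sketch is only an illustration under stated assumptions: the
symmetric matrix T = [[2, 1], [1, 2]], its eigenpairs, the test vector x, and the
helper names inner, apply_matrix, and spectral_sum are hypothetical choices that do
not come from the text. It compares Tx with the sum of 𝜆_n ⟨x, u_n⟩ u_n.

```python
import math

# Hypothetical example: T = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with
# orthonormal eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
T = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
eigenpairs = [(3.0, (s, s)), (1.0, (s, -s))]


def inner(x, y):
    """Real inner product on R^2."""
    return sum(a * b for a, b in zip(x, y))


def apply_matrix(A, x):
    """Matrix-vector product, row by row."""
    return tuple(inner(row, x) for row in A)


def spectral_sum(x):
    """Evaluate the sum over n of lambda_n <x, u_n> u_n."""
    total = [0.0, 0.0]
    for lam, u in eigenpairs:
        coeff = lam * inner(x, u)
        total = [t + coeff * ui for t, ui in zip(total, u)]
    return tuple(total)


x = (0.3, -1.7)
direct = apply_matrix(T, x)
expanded = spectral_sum(x)
error = max(abs(a - b) for a, b in zip(direct, expanded))
print(direct, expanded, error)
```

Each term of the sum is 𝜆_n P_n x, so the two results agree up to rounding.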

Remarks. 1. If 𝜆 = 0 is an eigenvalue of T, then 𝜆 = 0 contributes nothing to
the sum in the statement of theorem 7.4.18. Consequently, if y ∈ ℜ(T),
then y = ∑_{n=1}^{∞} ⟨y, u_n⟩ u_n, where the series involves only the eigenvectors that
correspond to the nonzero eigenvalues.
2. The proof of theorem 7.4.17 and remark 1 reveal that T maps H into the
orthogonal complement of 𝒩(T), which is nothing other than the closure of
the span of the eigenvectors that belong to the nonzero eigenvalues of T.

Example 10. Let T be a compact self-adjoint operator, let {𝜆_n} be the nonzero
eigenvalues of T, and let u_n be the corresponding eigenvectors. For a fixed
g ∈ H, consider the equation Tf − 𝜆f = g. We write f̂_n = ⟨f, u_n⟩ and ĝ_n = ⟨g, u_n⟩
and work out two cases:

(a) Suppose 𝜆 ≠ 0 is not an eigenvalue. In this case, T − 𝜆I is invertible, and the
equation has a unique solution f. To find f, observe that the equation can be
written as Tf = 𝜆f + g; hence 𝜆f + g ∈ ℜ(T). Remark 1 implies that

    𝜆f + g = ∑_{n=1}^{∞} ⟨𝜆f + g, u_n⟩ u_n = ∑_{n=1}^{∞} [𝜆f̂_n + ĝ_n] u_n.    (6)

By theorem 7.4.18,

    Tf = ∑_{n=1}^{∞} 𝜆_n f̂_n u_n.    (7)

Equating the Fourier coefficients of the two series in (6) and (7), we obtain
𝜆_n f̂_n = 𝜆f̂_n + ĝ_n, which gives f̂_n = ĝ_n/(𝜆_n − 𝜆), and the unique solution of the
equation is

    f = −g/𝜆 + (1/𝜆)Tf = −g/𝜆 + ∑_{n=1}^{∞} [𝜆_n ⟨g, u_n⟩ / (𝜆(𝜆_n − 𝜆))] u_n.

(b) If 𝜆 = 𝜆_n, and 𝜆_n is a simple eigenvalue with eigenvector v, then the equation
has a solution if and only if ⟨g, v⟩ = 0 and, in this case, the solutions are of the
form

    f = −g/𝜆_n + av + ∑{ [𝜆_k ⟨g, u_k⟩ / (𝜆_n(𝜆_k − 𝜆_n))] u_k ∶ k ≠ n },

where a is an arbitrary scalar.

To see this, one duplicates case (a) to obtain the equations

    𝜆_k f̂_k = 𝜆_n f̂_k + ĝ_k for all k ∈ ℕ.    (8)

When k = n, the equation 𝜆_n f̂_n = 𝜆_n f̂_n + ĝ_n is satisfied if and only if ĝ_n = 0. In
this case, f̂_n is arbitrary and, for k ≠ n, f̂_k is uniquely determined by equation
(8), so we have arrived at the stated solution. ∎
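
Case (a) of the example can be tested on a small diagonal model. The Python sketch
below is an illustration under stated assumptions: the eigenvalues 1, 1/2, 1/3, 1/4,
the coefficients g_hat, and the value lam = 2 are hypothetical data, and a
four-dimensional space stands in for H. Exact fractions are used so that the
comparison is free of rounding.

```python
from fractions import Fraction

# Hypothetical data: T acts diagonally in an orthonormal basis u_1, ..., u_4
# with eigenvalues 1, 1/2, 1/3, 1/4.
eigenvalues = [Fraction(1, n) for n in range(1, 5)]
g_hat = [Fraction(3), Fraction(-1), Fraction(2), Fraction(5)]  # <g, u_n>
lam = Fraction(2)                    # nonzero and not an eigenvalue

# Fourier coefficients of the solution: f_hat_n = g_hat_n / (lambda_n - lam).
f_hat = [g / (ln - lam) for g, ln in zip(g_hat, eigenvalues)]

# The same coefficients read off from the closed form
# f = -g/lam + sum_n lambda_n <g, u_n> / (lam (lambda_n - lam)) u_n.
f_closed = [-g / lam + ln * g / (lam * (ln - lam))
            for g, ln in zip(g_hat, eigenvalues)]

# Residual of T f - lam f = g, coefficient by coefficient.
residual = [ln * f - lam * f - g
            for ln, f, g in zip(eigenvalues, f_hat, g_hat)]

print(f_hat == f_closed, residual)
```

Both descriptions of f give the same coefficients, and every residual is zero.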

See problem 8 at the end of the section for the continuation of this example.

We are now ready to prove the spectral theorem for compact normal operators.

Theorem 7.4.19. Let T be a compact normal operator on a separable Hilbert space
H. Then H possesses an orthonormal basis of eigenvectors of T.

Proof. Consider the compact self-adjoint operator U = TT^* = T^*T. Observe that T
and U commute: TU = TT^*T = T^*TT = UT.

We show that if 𝜆_0 = 0 is an eigenvalue of U, then 𝜆_0 is an eigenvalue of T and
Ker(T) = Ker(U). If U(x) = 0, then

    ‖T(x)‖^2 = ⟨Tx, Tx⟩ = ⟨x, T^*Tx⟩ = ⟨x, U(x)⟩ = ⟨x, 0⟩ = 0.

Conversely, if Tx = 0, then U(x) = T^*(T(x)) = 0. Now let 𝜆_1, 𝜆_2, ... be the
distinct nonzero eigenvalues of U, and let V_1, V_2, ... be the corresponding
finite-dimensional, mutually orthogonal eigenspaces. We show that each V_n is
T-invariant. If x ∈ V_n, then

    (U − 𝜆_n I)(Tx) = (UT − 𝜆_n T)(x) = (TU − 𝜆_n T)(x) = T(U − 𝜆_n I)(x) = T(0) = 0.

Thus Tx ∈ V_n. Now T|_{V_n} is a normal operator on the finite-dimensional space V_n.
By theorem 3.7.15, V_n has an orthonormal basis B_n of eigenvectors of T. Choose an
orthonormal basis B_0 for V_0 = Ker(T) = Ker(U). By theorem 7.4.17, H is the closure
of Span(∪_{n=0}^{∞} V_n). Since V_n = Span(B_n), ∪_{n=0}^{∞} B_n is an orthonormal
basis for H. ∎

Example 11. Let T ∶ l^2 → l^2 be the finite-rank operator

    T(x_1, x_2, ...) = (ix_1, −ix_2, (1 + i)x_3, (1 − i)x_4, 0, 0, ...).

It is easy to verify that T^*(x_1, x_2, ...) = (−ix_1, ix_2, (1 − i)x_3, (1 + i)x_4, 0, 0, ...) and
that T is a normal operator. The self-adjoint operator U = T^*T is given by

    U(x_1, x_2, ...) = (x_1, x_2, 2x_3, 2x_4, 0, 0, ...).

The three eigenvalues of U are 𝜆_0 = 0, 𝜆_1 = 1, and 𝜆_2 = 2 with eigenspaces
Ker(U) = {(0, 0, 0, 0, x_5, x_6, ...)}, V_1 = Span{e_1, e_2}, and V_2 = Span{e_3, e_4},
respectively.³
The nonzero eigenvalues of T are i, −i, 1 + i, and 1 − i, and the corresponding
eigenvectors are e_1, e_2, e_3, and e_4, respectively. Also, 𝜆_0 = 0 is an eigenvalue of T
and Ker(T) = Ker(U). In the notation of the previous theorem, B_0 = {e_5, e_6, ...},
B_1 = {e_1, e_2}, and B_2 = {e_3, e_4}. ∎

Excursion: Integral Equations

The theory of compact operators has deep roots in the study of integral equations,
and this section would not be complete without a brief mention of integral
equations.
Consider the Fredholm integral equation

    Tu − 𝜆u = f,

where T is the integral operator generated by the function K(x, 𝜉),

    Tu(x) = ∫_a^b K(x, 𝜉)u(𝜉)d𝜉.

The complex function K(x, 𝜉) on the square [a, b] × [a, b] is called the kernel of the
operator, and we limit ourselves to Hilbert-Schmidt kernels since these, as it turns
out, define compact integral operators on 𝔏^2 = 𝔏^2[a, b].

Definition. The function K(x, 𝜉) is said to be a Hilbert-Schmidt kernel if

    ∫_a^b ∫_a^b |K(x, 𝜉)|^2 dxd𝜉 < ∞.

We now prove that a Hilbert-Schmidt kernel generates a compact integral
operator on 𝔏^2, and we achieve this in a number of steps.

Theorem 7.4.20. If K(x, 𝜉) is continuous on the closed square [a, b] × [a, b] and
u ∈ 𝔏^2, then Tu is continuous on [a, b].

3 The set {en } is the canonical basis for 𝕂(ℕ).



Proof. Let 𝜖 > 0. By the uniform continuity of K on [a, b] × [a, b], there exists a
number 𝛿 > 0 such that if |x_1 − x_2| < 𝛿, then |K(x_1, 𝜉) − K(x_2, 𝜉)| < 𝜖 for every
𝜉 ∈ [a, b]. Now using the Cauchy-Schwarz inequality,

    |Tu(x_1) − Tu(x_2)| = | ∫_a^b (K(x_1, 𝜉) − K(x_2, 𝜉))u(𝜉)d𝜉 |
        ≤ ∫_a^b |(K(x_1, 𝜉) − K(x_2, 𝜉))u(𝜉)|d𝜉
        ≤ ( ∫_a^b |K(x_1, 𝜉) − K(x_2, 𝜉)|^2 d𝜉 )^{1/2} ( ∫_a^b |u(𝜉)|^2 d𝜉 )^{1/2}
        ≤ ( ∫_a^b 𝜖^2 d𝜉 )^{1/2} ‖u‖_2 = 𝜖(b − a)^{1/2} ‖u‖_2.

This proves the continuity of Tu. ∎

Corollary 7.4.21. If K is continuous on [a, b] × [a, b], and 𝔉 is a bounded subset of
𝔏^2, then T(𝔉) is equicontinuous.

Proof. This is obvious from the proof of the previous theorem since if ‖u‖_2 ≤ C
for all u ∈ 𝔉, then |Tu(x_1) − Tu(x_2)| ≤ C(b − a)^{1/2} 𝜖 for all x_1, x_2 ∈ [a, b] with
|x_1 − x_2| < 𝛿. ∎

Theorem 7.4.22. If K(x, 𝜉) is continuous on [a, b] × [a, b], then the integral operator
it generates is a compact operator on 𝔏^2.

Proof. This result follows from the previous corollary and Ascoli's theorem. If {u_n} is
a bounded sequence in 𝔏^2, then (Tu_n) is equicontinuous and bounded in 𝒞[a, b];
hence it contains a subsequence (Tu_{n_k}) that converges uniformly in 𝒞[a, b]. Since,
for any function u ∈ 𝒞[a, b], ‖u‖_2 ≤ (b − a)^{1/2} ‖u‖_∞, the subsequence (Tu_{n_k}) is
convergent in 𝔏^2. ∎

We now prove the result we seek.

Theorem 7.4.23. If K(x, 𝜉) is a Hilbert-Schmidt kernel, then the integral operator it
generates is compact.

Proof. We utilize the fact that 𝒞([a, b] × [a, b]) is dense in 𝔏^2([a, b] × [a, b]).
Let K_n(x, 𝜉) be a sequence of continuous functions on [a, b] × [a, b] such that
lim_n ‖K_n − K‖_2 = 0. It suffices to show that if T_n is the compact integral operator
generated by K_n, then lim_n ‖T_n − T‖ = 0 in ℒ(𝔏^2). Now

    ‖T_n u − Tu‖_2^2 = ∫_a^b | ∫_a^b (K_n(x, 𝜉) − K(x, 𝜉))u(𝜉)d𝜉 |^2 dx
        ≤ ∫_a^b ∫_a^b |K_n(x, 𝜉) − K(x, 𝜉)|^2 d𝜉dx ∫_a^b |u(𝜉)|^2 d𝜉 = ‖K_n − K‖_2^2 ‖u‖_2^2.

Thus ‖T_n − T‖ ≤ ‖K_n − K‖_2 → 0 as n → ∞. ∎
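
The key estimate in the proof is ‖Su‖_2 ≤ ‖G‖_2 ‖u‖_2 for the integral operator S
generated by a square-integrable kernel G (in the proof, G = K_n − K and
S = T_n − T). The Python sketch below checks a discretized version of this inequality
with the midpoint rule. The kernel exp(−|x − 𝜉|), the function sin, the interval
[0, 1], and the helper name hs_bound_check are hypothetical choices for
illustration and are not taken from the text.

```python
import math


def hs_bound_check(kernel, u, a, b, n=200):
    """Return (||Tu||_2, ||K||_2 * ||u||_2), both computed by the midpoint rule."""
    h = (b - a) / n
    pts = [a + (i + 0.5) * h for i in range(n)]
    u_vals = [u(s) for s in pts]
    # (Tu)(x_i) is approximated by h * sum_j K(x_i, s_j) u(s_j).
    Tu_vals = [h * sum(kernel(x, s) * us for s, us in zip(pts, u_vals))
               for x in pts]
    norm_Tu = math.sqrt(h * sum(v * v for v in Tu_vals))
    norm_u = math.sqrt(h * sum(v * v for v in u_vals))
    norm_K = math.sqrt(h * h * sum(kernel(x, s) ** 2 for x in pts for s in pts))
    return norm_Tu, norm_K * norm_u


lhs, rhs = hs_bound_check(lambda x, s: math.exp(-abs(x - s)), math.sin, 0.0, 1.0)
print(lhs, rhs)
```

The discrete inequality is an instance of the Cauchy-Schwarz inequality applied row
by row, so lhs does not exceed rhs (up to rounding) for any kernel and any u.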

Example 12. Consider the Hilbert-Schmidt kernel K(x, 𝜉) = cos x cos 𝜉, 0 ≤ x, 𝜉 ≤ 𝜋,
and let T be the corresponding integral operator.
If 𝜆 ≠ 0 is an eigenvalue of T and u is a corresponding eigenvector, then
(Tu)(x) = cos x ∫_0^𝜋 cos 𝜉 u(𝜉)d𝜉 = 𝜆u(x). It follows that u is a multiple of
cos x, the only nonzero eigenvalue is 𝜆_1 = 𝜋/2, and the normalized eigenvector
is u_1(x) = √(2/𝜋) cos x. In this case, 0 is an eigenvalue of infinite multiplicity, and
the null-space of T is the orthogonal complement of the one-vector set {cos x}.
Using the result of example 10, we find the solutions of the equation Tf − 𝜆f = g
in two cases:

(a) 0 ≠ 𝜆 ≠ 𝜋/2. The unique solution of the equation is

    f(x) = −g(x)/𝜆 + c cos x / (𝜆(𝜋/2 − 𝜆)), where c = ∫_0^𝜋 g(𝜉) cos 𝜉 d𝜉.

(b) The equation Tf − (𝜋/2)f = g has a solution if and only if ⟨g, cos x⟩ = 0, and, in
this case, f = −2g(x)/𝜋 + k cos x, where k is an arbitrary constant. ∎
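
The formula in case (a) can be checked numerically. The following Python sketch is
an illustration under stated assumptions: the right-hand side g(x) = x, the value
lam = 1, the sample points, and the helper names integrate, apply_T, and solve are
hypothetical choices, and the integrals are approximated by the midpoint rule.

```python
import math


def integrate(func, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def apply_T(u, x):
    """(Tu)(x) = cos x * integral over [0, pi] of cos(s) u(s) ds."""
    return math.cos(x) * integrate(lambda s: math.cos(s) * u(s), 0.0, math.pi)


def solve(g, lam):
    """Case (a): f = -g/lam + c cos x / (lam (pi/2 - lam))."""
    c = integrate(lambda s: g(s) * math.cos(s), 0.0, math.pi)
    return lambda x: -g(x) / lam + c * math.cos(x) / (lam * (math.pi / 2 - lam))


def g(x):
    return x          # hypothetical right-hand side


lam = 1.0             # hypothetical value, neither 0 nor pi/2
f = solve(g, lam)

# Residual of T f - lam f = g at a few sample points.
samples = (0.0, 0.7, 1.9, math.pi)
residual = max(abs(apply_T(f, x) - lam * f(x) - g(x)) for x in samples)

# cos x should be an eigenvector with eigenvalue pi/2.
eig_error = abs(apply_T(math.cos, 1.0) - (math.pi / 2) * math.cos(1.0))
print(residual, eig_error)
```

Both printed numbers should be at the level of rounding error, which is consistent
with the eigenvalue 𝜋/2 and with the solution formula of case (a).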

Exercises

1. Prove that 𝒦 is a subspace of ℒ(H).
2. The following characterization of compact operators is sometimes handy.
Let T ∈ ℒ(H). Prove that T is compact if and only if x_n →w x implies that
Tx_n converges in the norm to Tx. Hint: If x_n →w x, then Tx_n →w Tx. Now
see problem 10 in section 6.7.
3. Let {𝜆_n} be a bounded sequence of complex numbers, and let {u_n} be
an orthonormal basis for H. Prove that the function T ∶ H → H defined
by Tx = ∑_{n=1}^{∞} 𝜆_n ⟨x, u_n⟩ u_n is a bounded operator. Also show that
T^*x = ∑_{n=1}^{∞} 𝜆̄_n ⟨x, u_n⟩ u_n and hence T is self-adjoint if and only if each 𝜆_n is real.
Finally, show that {𝜆_n} are all the eigenvalues of T.
4. This is a continuation of the previous exercise. Show that every compact
subset C of the complex plane is the spectrum of a bounded operator. Hint:
Let {𝜆_n} be a dense subset of C, and define T as in the previous exercise.
Since {𝜆_n} ⊆ 𝜎(T) and 𝜎(T) is closed, C ⊆ 𝜎(T). To show that 𝜎(T) ⊆ C, let 𝜆 ∉ C,
and show that (T − 𝜆I)^{−1}x = ∑_{n=1}^{∞} (𝜆_n − 𝜆)^{−1} ⟨x, u_n⟩ u_n.
5. In this exercise, we construct a compact operator which has no eigenvalues.
Define a bounded operator T ∶ l^2 → l^2 by T(x) = (0, x_1, x_2/2, x_3/3, ...). Show
that T is compact and that 𝜎(T) = {0}. Hint: T = R∘D, where R is the right
shift operator and D(x) = (x_1, x_2/2, ...). D is compact by example 4. Show
directly that T has no eigenvalues.
6. Give a direct proof of the following theorem without using the subsection
on the Fredholm theory. Let T be a compact self-adjoint operator on a
separable Hilbert space. Prove that the nonzero points of the spectrum of
T are eigenvalues. Hint: Use problem 15 in section 7.3, and examine the
proof of lemma 7.4.16.
7. Let T be a compact self-adjoint operator on a separable Hilbert space H.
Prove that there exists a set of orthonormal vectors {u_n} corresponding
to nonzero eigenvalues {𝜆_n} such that every element x ∈ H can be written
uniquely as x = ∑_{n=1}^{∞} ⟨x, u_n⟩ u_n + v, where v ∈ 𝒩(T). Some books refer to
this result as the Hilbert-Schmidt theorem. It is clearly equivalent to
theorem 7.4.17.
8. Let T be a compact self-adjoint operator, let {𝜆_n} be the nonzero eigenvalues
of T, and let u_n be the corresponding eigenvectors. For a fixed g ∈ H,
consider the equation Tf − 𝜆f = g.
(a) Prove that if 𝜆 = 𝜆_n, and 𝜆_n has multiplicity m, with eigenvectors
v_1, …, v_m, then the equation has a solution if and only if ⟨g, v_i⟩ = 0 for
all 1 ≤ i ≤ m and, in this case, the solutions are of the form

    f = −g/𝜆_n + ∑_{i=1}^{m} a_i v_i + ∑{ [𝜆_k ⟨g, u_k⟩ / (𝜆_n(𝜆_k − 𝜆_n))] u_k ∶ k ≠ n }.

Here a_1, …, a_m are arbitrary scalars.
(b) Prove that if 𝜆 = 0 is not an eigenvalue, and ∑_{n=1}^{∞} |⟨g, u_n⟩/𝜆_n|^2 < ∞, then
f = ∑_{n=1}^{∞} (⟨g, u_n⟩/𝜆_n) u_n is the unique solution of the equation Tf = g.
(c) What can you say about the case when 𝜆 = 0 is an eigenvalue of T?
9. Let T be a self-adjoint operator. Show that if T^k is compact for some integer
k ≥ 2, then T is compact. Hint: It is enough to show that T^{k−1} is compact.
10. Let K(x, 𝜉) = sin x cos 𝜉, 0 ≤ x, 𝜉 ≤ 𝜋. Show that 𝜆 = 0 is the only eigenvalue
of the corresponding integral operator.
11. Let K(x, 𝜉) be a Hilbert-Schmidt kernel, and let T be the corresponding
integral operator.
(a) Show that if K(x, 𝜉) is equal to the complex conjugate of K(𝜉, x) for all x
and 𝜉, then T is self-adjoint.
(b) Show that ‖T‖ ≤ ‖K‖_2 = { ∫_a^b ∫_a^b |K(x, 𝜉)|^2 dxd𝜉 }^{1/2}.
12. For a fixed 0 < k < 1, let K(x, 𝜉) = x𝜉/(1 − x^2𝜉^2), 0 ≤ x, 𝜉 ≤ k, and let T be the
corresponding integral operator. Show that
‖T‖ ≤ (1/4) log((1 + k)/(1 − k)) − (1/2) tan^{−1} k.
13. Let K(x, 𝜉) = 1 + sin x sin 𝜉, 0 ≤ x, 𝜉 ≤ 2𝜋, and let T be the corresponding
integral operator. Find all the eigenvalues of T. Then find the solution of the
integral equation ∫_0^{2𝜋} (1 + sin x sin 𝜉)u(𝜉)d𝜉 = 𝜆u(x) + x, where 𝜆 is not an
eigenvalue of T. Hint: If 𝜆 ≠ 0 is an eigenvalue of T, then the corresponding
eigenfunction must be of the form u(x) = A + B sin x.
14. Let K(x, 𝜉) = cos(x − 𝜉), 0 ≤ x, 𝜉 ≤ 2𝜋, and let T be the corresponding inte-
gral operator. Find all the eigenvalues of T, and describe the corresponding
eigenspaces.
15. Let K(x, 𝜉) be a Hilbert-Schmidt kernel, and let T be the corresponding
integral operator. Show that T^2u(x) = ∫_a^b K_2(x, 𝜉)u(𝜉)d𝜉, where
K_2(x, 𝜉) = ∫_a^b K(x, t)K(t, 𝜉)dt. In general, show that if
K_n(x, 𝜉) = ∫_a^b K_{n−1}(x, t)K(t, 𝜉)dt, then T^n u(x) = ∫_a^b K_n(x, 𝜉)u(𝜉)d𝜉.
16. Let K(x, 𝜉) be a Hilbert-Schmidt kernel, and let T be the corresponding
integral operator. Show that if |𝜆| > ‖T‖, then the function F ∶ 𝔏^2 → 𝔏^2
defined by F(u) = (1/𝜆)[Tu − f] is a contraction on 𝔏^2. In this case, show that
the solution of the equation Tu − 𝜆u = f is (−1/𝜆) ∑_{n=0}^{∞} T^n f/𝜆^n.
17. Let K(x, 𝜉) = x𝜉, 0 ≤ x, 𝜉 ≤ 1. Show that K_n(x, 𝜉) = x𝜉/3^{n−1}. Also show that
if |𝜆| > 1/3, then the solution of the integral equation Tu − 𝜆u = f is
u(x) = −f(x)/𝜆 − [3x/(𝜆(3𝜆 − 1))] ∫_0^1 𝜉f(𝜉)d𝜉.

7.5 Compact Operators on Banach Spaces

The reader may have observed that the definition of a compact operator makes
perfectly good sense for an operator on a Banach space. We state the definition
again. A linear operator on a Banach space X is compact if it maps bounded
subsets of X into relatively compact subsets of X. All the results in theorems 7.4.1
through 7.4.15 are valid for compact operators on Banach spaces, and the proofs
we presented are valid without alteration, with the exception of the proofs of
theorems 7.4.5, 7.4.6, and 7.4.15. Those proofs (with the exceptions noted) were
deliberately made more general than is needed for Hilbert spaces. For
example, we used Riesz's lemma at several places when a simpler alternative was
available. As an illustration, in the proof of lemma 7.4.10, we could simply choose a
unit vector u_n ∈ N_{L^{n+1}} such that u_n ⟂ N_{L^n}. Another place where the proof could be
simplified is theorem 7.4.9, where we could have used the orthogonal complement
N_L^⊥ instead of a complement X_1 of N_L. We now furnish the proofs of theorems 7.4.5,
7.4.6, and 7.4.15 for compact operators on Banach spaces.

Lemma 7.5.1. Let (T_n) be a sequence of bounded operators on a Banach space X, and
let T be a bounded operator on X such that lim_n T_n(x) = T(x) for every x ∈ X.
Then, for every compact subset K of X, T_n converges uniformly to T on K.

Proof. By the Banach-Steinhaus theorem, sup_n ‖T_n‖ < ∞. Choose a constant
M > 0 such that M > ‖T‖ and M > sup_n ‖T_n‖. Suppose, for a contradiction,
that there exists a compact subset K of X on which (T_n) does not converge
uniformly to T. Then there exists a sequence (x_n) in K, a subsequence (S_n)
of (T_n), and a positive number 𝜖 such that ‖S_n(x_n) − T(x_n)‖ ≥ 𝜖 for every
n ∈ ℕ. By the compactness of K, (x_n) contains a convergent subsequence (y_n);
we continue to write (S_n) for the corresponding subsequence of operators.
Let y = lim_n y_n. Now

    ‖S_n(y_n) − T(y_n)‖ ≤ ‖S_n(y) − T(y)‖ + ‖(S_n − T)(y_n − y)‖
        ≤ ‖S_n(y) − T(y)‖ + 2M‖y_n − y‖ → 0.

This contradicts ‖S_n(y_n) − T(y_n)‖ ≥ 𝜖 and concludes the proof. ∎

The following is a partial generalization of theorem 7.4.5.

Theorem 7.5.2. If a Banach space X has a Schauder basis, then every compact
operator T on X is the limit of a sequence of finite-rank operators.

Proof. Let {u_n} be a Schauder basis for X, and let P_n be the canonical projection of
X onto Span{u_1, …, u_n} (see the definition before problem 25 in section 6.2). We
prove that the sequence T_n = P_nT of finite-rank operators converges in ℒ(X) to T.
For every x ∈ X, lim_n (P_n − I)(x) = 0. By the previous lemma, the sequence (P_n − I)
converges uniformly to 0 on compact subsets of X. In particular, if B denotes the
closed unit ball of X, then P_n − I converges uniformly to 0 on the closure of T(B),
which is compact. Now

    ‖T_n − T‖ = sup_{x∈B} ‖T_n(x) − T(x)‖ = sup_{x∈B} ‖(P_n − I)(Tx)‖ → 0. ∎

Theorem 7.5.3. A bounded operator T on a Banach space X is compact if and only
if T^* is compact.

Proof. Suppose T is compact. The proof that T^* is compact is a slight modification
of the proof of theorem 7.4.6. Let B and B^* denote the closed unit balls in X and
X^*, respectively. We need to show that T^*(B^*) is relatively compact in X^*. Let (𝜆_n)
be a sequence of functionals in B^*. For x, x′ ∈ X, |𝜆_n(x) − 𝜆_n(x′)| ≤ ‖x − x′‖. It
follows that the sequence (𝜆_n) is equicontinuous on X and, in particular, on the
closure of T(B), which is compact by assumption. Ascoli's theorem guarantees a
subsequence (𝜆_{n_k}) of (𝜆_n) that converges uniformly on T(B). Now

    ‖T^*𝜆_{n_i} − T^*𝜆_{n_j}‖ = sup_{x∈B} |⟨x, T^*(𝜆_{n_i} − 𝜆_{n_j})⟩| = sup_{x∈B} |⟨Tx, 𝜆_{n_i} − 𝜆_{n_j}⟩|
        = sup_{x∈B} |⟨Tx, 𝜆_{n_i}⟩ − ⟨Tx, 𝜆_{n_j}⟩|
        = sup_{x∈B} |𝜆_{n_i}(Tx) − 𝜆_{n_j}(Tx)|.

The uniform convergence of (𝜆_{n_k}) on T(B) guarantees that, for every 𝜖 > 0, the last
quantity is less than 𝜖 for sufficiently large integers i and j. Thus (T^*𝜆_{n_k}) is
Cauchy and hence convergent.

We now prove that if T^* is compact, then T is compact. If T^* is compact, then, by
the first part of the theorem, T^{**} is compact. Let B̂ be the image of B under the
natural embedding of X into X^{**}, and let B^{**} be the closed unit ball of X^{**}. By the
compactness of T^{**}, T^{**}(B^{**}) is relatively compact, and hence so is its subset
T^{**}(B̂). By theorem 6.6.3, T^{**}(B̂) is the image of T(B) under the natural embedding.
Therefore T(B) is relatively compact, since it is isometric to T^{**}(B̂) and X is
complete. This proves the compactness of T. ∎

Using annihilators is not as simple as using orthogonal complements, for the
simple reason that the annihilator of a subspace M of a Banach space X resides
in a different space, X^*. Thus the fact that H = R_L ⊕ N_{L^*} makes no sense if H
is replaced with a Banach space X. However, we will generalize the fact that
the dimensions of the spaces N_L, N_{L^*}, X/R_L, and X^*/R_{L^*} are all finite and equal
(theorem 7.4.15). We adopt the standing assumption that T is a compact operator
on X, and use the notation that 𝜆 is a nonzero complex number, L = T − 𝜆I,
L^* = T^* − 𝜆I, N_L = Ker(L), R_L = ℜ(L), N_{L^*} = Ker(L^*), and R_{L^*} = ℜ(L^*).

Recall that we have already established (theorem 7.4.8) that N_L and N_{L^*} are finite
dimensional and that R_L and R_{L^*} are closed (theorem 7.4.9).

Lemma 7.5.4. dim(X/R_L) ≤ dim(N_{L^*}).

Proof. Let x_1, …, x_n ∈ X be such that the cosets x̃_i = x_i + R_L are linearly
independent in X/R_L. Then, for each i,
x_i ∉ M_i = R_L + Span{x_1, …, x_{i−1}, x_{i+1}, …, x_n}. Because the spaces
M_i are closed (see problem 18 in section 6.6), there exist bounded linear
functionals 𝜆_1, …, 𝜆_n ∈ X^* such that 𝜆_i(x_i) = 1, 𝜆_i(M_i) = 0. Clearly, {𝜆_1, …, 𝜆_n} are
independent in X^* (reason: 𝜆_i(x_j) = 𝛿_{ij}), and since 𝜆_i(R_L) = 0, 𝜆_i ∈ R_L^⊥ = N_{L^*}.
Thus dim(X/R_L) ≤ dim(N_{L^*}). ∎

Theorem 7.5.5. dim(X/R_L) = dim(N_{L^*}).

Proof. Since X/R_L is finite dimensional by the previous lemma,
dim(X/R_L) = dim((X/R_L)^*). Applying theorem 6.6.7 with M = R_L, (X/R_L)^* is
isometrically isomorphic to R_L^⊥ = N_{L^*}. Thus dim((X/R_L)^*) = dim(N_{L^*}), and
dim(X/R_L) = dim((X/R_L)^*) = dim(N_{L^*}). ∎

Theorem 7.5.6. dim(X/R_L) = dim(N_L).

Proof. Let y_1, …, y_n be such that the cosets ỹ_i = y_i + R_L form a basis for X/R_L,
let x_1, …, x_m be a basis for N_L, and let X_1 be a closed complement of N_L. We will
show that m = n. Suppose that m < n. Define a finite-rank operator F ∈ ℒ(X) by
F|_{X_1} = 0, Fx_i = y_i for 1 ≤ i ≤ m. The operator K = T + F is compact, and we
claim that K − 𝜆I is one-to-one. If (K − 𝜆I)(x) = 0, then
(T − 𝜆I)(x) = −Fx ∈ R_L ∩ Span{y_1, …, y_n} = {0}. Thus (T − 𝜆I)x = 0 = Fx, and hence
x ∈ N_L. The restriction of F to N_L is clearly one-to-one; hence Fx = 0 implies that
x = 0, and we have proved the claim. By the Fredholm alternative, K − 𝜆I is onto,
which contradicts the fact that y_{m+1} is not in the range of K − 𝜆I. (Note that
ℜ(K − 𝜆I) ⊆ R_L ⊕ Span{y_1, …, y_m}.)

We have proved that m ≥ n. If m > n, define a finite-rank operator F by F|_{X_1} = 0,
Fx_i = y_i for 1 ≤ i ≤ n, and Fx_i = y_n for n ≤ i ≤ m. In this case, K − 𝜆I is onto
(note that ℜ(K − 𝜆I) = R_L ⊕ Span{y_1, …, y_n} = X) and hence one-to-one by the
Fredholm alternative theorem. But this contradicts the fact that
(K − 𝜆I)(x_n) = Fx_n = Fx_{n+1} = (K − 𝜆I)(x_{n+1}). Therefore m = n. ∎

The following result follows immediately.

Theorem 7.5.7. The following numbers are finite and equal:

    dim(N_L), dim(X/R_L), dim(N_{L^*}), and dim(X^*/R_{L^*}). ∎

Exercises

1. Find an example of an unbounded, finite-rank linear operator on a Banach
space.
2. Verify the details of the proof of theorem 7.5.6.
3. Let X = 𝒞[0, 1], and define (Tu)(x) = ∫_0^x u(t)dt. Prove that T is compact.
4. Let X = 𝒞[0, 1], and define (Tu)(x) = ∫_0^1 e^{xt} u(t)dt. Prove that T is compact.
5. Let X = 𝒞[−1, 1], and define (Tu)(x) = (1/𝜋) ∫_{−1}^{1} u(𝜉)/(1 + (x − 𝜉)^2) d𝜉. Prove
that T is compact, and estimate ‖T‖.
6. Let X = 𝒞[−1, 1], and define (Tu)(x) = x ∫_{−1}^{1} 𝜉u(𝜉)d𝜉. It is clear that T is
compact. Show that 𝜆 = 0 and 𝜆 = 2/3 are the only eigenvalues of T and
that if 0 ≠ 𝜆 ≠ 2/3, then
(T − 𝜆I)^{−1}f = (−1/𝜆)[ (3x/(3𝜆 − 2)) ∫_{−1}^{1} 𝜉f(𝜉)d𝜉 + f(x) ]. Is it true that
X is the direct sum of the two eigenspaces?

8
Integration Theory

The only teaching that a professor can give, in my opinion, is that of thinking in
front of his students.
Henri Lebesgue

Henri Lebesgue. 1875–1941

Lebesgue entered the École Normale Supérieure in Paris in 1894 and was awarded
his teaching diploma in mathematics in 1897. He studied Baire’s papers on dis-
continuous functions and realized that much more could be achieved in this area.
Building on the work of others, including that of Émile Borel and Camille Jordan,
Lebesgue formulated measure theory, which he published in 1901. He generalized
the definition of the Riemann integral by extending the concept of the area (or
measure), and his definition allowed the integrability of a much wider class of
functions, including many discontinuous functions. This generalization of the
Riemann integral revolutionized integral calculus. Up to the end of the nineteenth
century, mathematical analysis was limited to continuous functions, based largely
on the Riemann method of integration.

Fundamentals of Mathematical Analysis. Adel N. Boules, Oxford University Press (2021). © Adel N. Boules.
DOI: 10.1093/oso/9780198868781.003.0008

Hawkins writes,1

In Lebesgue’s work ... the generalized definition of the integral was simply
the starting point of his contributions to integration theory. What made the
new definition important was that Lebesgue was able to recognize in it an
analytic tool capable of dealing with—and to a large extent overcoming—
the numerous theoretical difficulties that had arisen in connection with Rie-
mann’s theory of integration. In fact, the problems posed by these difficulties
motivated all of Lebesgue’s major results.

After he received his doctorate in 1902, Lebesgue held appointments in regional
colleges. In 1910 he was appointed to the Sorbonne, where he was promoted to
Professor of the Application of Geometry to Analysis in 1918. In 1921 he was
named as Professor of Mathematics at the Collège de France, a position he held
until his death in 1941. He also taught at the École Supérieure de Physique et de
Chimie Industrielles de la Ville de Paris between 1927 and 1937 and at the École
Normale Supérieure in Sèvres.

Lebesgue did not concentrate throughout his career on the field which he started.
He also made major contributions in other areas of mathematics, including
topology, potential theory, the Dirichlet problem, the calculus of variations, set
theory, the theory of surface area, and dimension theory.

8.1 The Riemann Integral

In this section, we treat the definition and the fundamental properties of the
Riemann integral of a bounded function on a compact box. The main reason for
the inclusion of this section is that our definition of Lebesgue measure is, loosely
stated, based on the notion that the Riemann integral of a continuous function f
on a compact box measures the volume of the region below the graph of f. The
presentation in this section is standard and reflects almost exactly the standard
approach to the Riemann integral on a compact interval found in undergraduate
real analysis textbooks.

1 T. Hawkins, “Lebesgue, Henri Léon” in C. C. Gillispie, F. L. Holmes, and N. Koertge (eds.), Complete
Dictionary of Scientific Biography (Detroit: Charles Scribner’s Sons, 2008), 110–12.

Let I = [a, b] be a compact interval. A grid in I is a sequence of points
a = x_0 < x_1 < x_2 < ... < x_k = b.
Each grid in I defines a partition of I into a finite set of closed intervals
𝒫 = {[x_0, x_1], [x_1, x_2], … , [x_{k−1}, x_k]}. We make no distinction between a grid in
I and the partition it generates. We also denote a partition of I by the sequence
that defines it, {x_0, … , x_k}. We say that a partition 𝒫′ = {y_0, … , y_m} is a refinement
of a partition 𝒫 = {x_0, … , x_k} if {x_0, … , x_k} ⊆ {y_0, … , y_m}. This simply means that
𝒫′ is obtained from 𝒫 by inserting additional grid points between some (or all)
consecutive points x_i and x_{i+1}. Note that if 𝒫′ is a refinement of 𝒫, then every
interval in 𝒫 is the union of intervals in 𝒫′. If 𝒫 and 𝒫′ are partitions of [a, b],
then 𝒫 and 𝒫′ have a common refinement, namely, the partition generated by the
grid {x_0, … , x_k} ∪ {y_0, … , y_m}.
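
The common refinement described above can be sketched in a few lines of Python. This is only an illustration: the function names common_refinement and is_refinement are hypothetical, and grids are represented as increasing lists of floats.

```python
def common_refinement(grid_p, grid_q):
    """Return the grid generated by the union of two grids of the same interval."""
    if grid_p[0] != grid_q[0] or grid_p[-1] != grid_q[-1]:
        raise ValueError("both grids must partition the same interval [a, b]")
    # The union of the grid points, listed in increasing order.
    return sorted(set(grid_p) | set(grid_q))


def is_refinement(fine, coarse):
    """A grid refines another when it contains all of its points."""
    return set(coarse) <= set(fine)


P = [0.0, 0.5, 1.0]
Q = [0.0, 0.25, 0.75, 1.0]
R = common_refinement(P, Q)
print(R, is_refinement(R, P), is_refinement(R, Q))
```

Here R contains every point of P and of Q, so it refines both grids.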

Let I_1, … , I_n be compact intervals. The closed box in ℝ^n determined by I_1, … , I_n
is Q = I_1 × ... × I_n. Thus if I_i = [a_i, b_i], then Q = {x = (x_1, … , x_n) ∶ a_i ≤ x_i ≤ b_i}. By
definition, the volume of the box Q is vol(Q) = ∏_{i=1}^{n} (b_i − a_i). It is easy to show
that diam(Q) = (∑_{i=1}^{n} (b_i − a_i)^2)^{1/2}. Now if, for each 1 ≤ i ≤ n, 𝒫_i is a partition of I_i,
then the corresponding partition of Q is Δ = 𝒫_1 × ... × 𝒫_n. We often use the notation
𝜎 to denote a typical sub-box in Δ. Thus we use the following notation to denote
the partition of Q generated by 𝒫_1, … , 𝒫_n:

Δ = {𝜎 = J_1 × J_2 × ... × J_n ∶ J_i ∈ 𝒫_i}.

By a refinement Δ′ of Δ, we mean a sequence of refinements 𝒫′_1, … , 𝒫′_n of 𝒫_1, … , 𝒫_n,
respectively, and

Δ′ = {𝜎′ = J′_1 × ... × J′_n ∶ J′_i ∈ 𝒫′_i}.

Again, if Δ′ is a refinement of Δ, then every sub-box 𝜎 in Δ is the union of sub-boxes
{𝜎′_1, … , 𝜎′_s} in Δ′ such that vol(𝜎) = ∑_{j=1}^{s} vol(𝜎′_j).
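
As a quick numerical illustration of the definitions above, the following Python sketch enumerates the sub-boxes of a product partition and checks that their volumes add up to vol(Q). The helper names (volume, diameter, sub_boxes) and the sample box are assumptions made for this example only.

```python
from itertools import product
from math import isclose, prod, sqrt


def volume(box):
    """vol(Q): product of the side lengths b_i - a_i."""
    return prod(b - a for a, b in box)


def diameter(box):
    """diam(Q): Euclidean length of the main diagonal."""
    return sqrt(sum((b - a) ** 2 for a, b in box))


def sub_boxes(grids):
    """Sub-boxes J_1 x ... x J_n of the partition generated by one grid per axis."""
    intervals_per_axis = [list(zip(g[:-1], g[1:])) for g in grids]
    return [tuple(choice) for choice in product(*intervals_per_axis)]


Q = [(0.0, 2.0), (1.0, 4.0)]                      # the box [0, 2] x [1, 4]
grids = [[0.0, 0.5, 2.0], [1.0, 2.0, 3.0, 4.0]]   # one grid for each side
boxes = sub_boxes(grids)
print(len(boxes), sum(volume(s) for s in boxes), volume(Q), diameter(Q))
```

The same check applied to a refinement of the two grids gives the additivity of volume over the sub-boxes of each 𝜎.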

Now let f be a bounded real-valued function on Q, and let Δ be a partition of Q.
Let the sub-boxes in Δ be enumerated as 𝜎_1, … , 𝜎_K. We use the notation

f^{𝜎_i} = sup_{x∈𝜎_i} f(x), and f_{𝜎_i} = inf_{x∈𝜎_i} f(x).

Both numbers are finite because f is assumed to be bounded. We define the upper
and lower Riemann sums, respectively, of f corresponding to the partition Δ on
Q by

S_Δ(f) = ∑_{i=1}^{K} f^{𝜎_i} vol(𝜎_i), and s_Δ(f) = ∑_{i=1}^{K} f_{𝜎_i} vol(𝜎_i).

Clearly, s_Δ(f) ≤ S_Δ(f). Since f is bounded, there exist real numbers m and M such
that, for every x ∈ Q, m ≤ f(x) ≤ M. For an arbitrary partition Δ of Q, f^{𝜎_i} ≥ m, so
S_Δ(f) ≥ m ∑_{i=1}^{K} vol(𝜎_i) = m vol(Q). Thus the set {S_Δ(f)} of upper Riemann sums is
bounded below, and hence the number 𝛽 = inf_Δ S_Δ(f) is finite, where inf is taken
over all partitions Δ of Q.
Similarly, 𝛼 = sup_Δ s_Δ(f) is a finite number. The numbers 𝛼 and 𝛽 are called,
respectively, the lower and upper Riemann integrals of f over Q.

Definition. A bounded function f on a box Q is Riemann integrable over Q if
𝛼 = 𝛽. In this case, we use the notation ∫_Q f(x)dx to denote the common value
of 𝛼 and 𝛽, and we call this number the Riemann integral of f over Q.

An important property of refinements is the following: if Δ′ is a refinement of
Δ, then

S_{Δ′}(f) ≤ S_Δ(f), and s_{Δ′}(f) ≥ s_Δ(f).

The reason is as follows: Consider the contribution f^{𝜎_i} vol(𝜎_i) of one sub-box 𝜎_i to
the upper Riemann sum of f corresponding to the partition Δ. Since 𝜎_i is the union
of sub-boxes 𝜎′_1, … , 𝜎′_s in Δ′, f^{𝜎′_j} = sup_{x∈𝜎′_j} f(x) ≤ sup_{x∈𝜎_i} f(x) = f^{𝜎_i}. Therefore the
sum of the contributions of 𝜎′_1, … , 𝜎′_s to the upper Riemann sum corresponding
to Δ′ is ∑_{j=1}^{s} f^{𝜎′_j} vol(𝜎′_j) ≤ f^{𝜎_i} ∑_{j=1}^{s} vol(𝜎′_j) = f^{𝜎_i} vol(𝜎_i). This shows that
S_{Δ′}(f) ≤ S_Δ(f). The fact that s_{Δ′}(f) ≥ s_Δ(f) is justified using a similar estimate. The
reader can now check that, for any two partitions Δ_1 and Δ_2 of Q,

s_{Δ_1}(f) ≤ S_{Δ_2}(f).

See problem 2 at the end of this section. Therefore, 𝛼 ≤ 𝛽.

Theorem 8.1.1. A bounded function f on a box Q is Riemann integrable if and only
if, for every 𝜖 > 0, there exists a partition Δ of Q such that S_Δ(f) − s_Δ(f) < 𝜖.

Proof. Let 𝜖 > 0, and let Δ be such that S_Δ(f) − s_Δ(f) < 𝜖. Now s_Δ(f) ≤ 𝛼 ≤ 𝛽 ≤
S_Δ(f). Therefore 𝛽 ≤ 𝛼 + 𝜖. Since 𝜖 is arbitrary, 𝛽 ≤ 𝛼, and hence 𝛼 = 𝛽. Conversely,
if 𝛼 = 𝛽, and 𝜖 > 0, then there exist partitions Δ_1 and Δ_2 of Q such that
S_{Δ_1}(f) − 𝛼 < 𝜖/2, and 𝛼 − s_{Δ_2}(f) < 𝜖/2. Let Δ be a common refinement of Δ_1 and Δ_2.
Then S_Δ(f) ≤ S_{Δ_1}(f), and s_Δ(f) ≥ s_{Δ_2}(f). Therefore, S_Δ(f) − s_Δ(f) = (S_Δ(f) − 𝛼) +
(𝛼 − s_Δ(f)) < 𝜖/2 + 𝜖/2 = 𝜖. ∎

Example 1. If f ∶ [a, b] → ℝ is integrable, then |f| is integrable.
Let Δ be a partition of [a, b], and let 𝜎_i be one of the subintervals in Δ. It is easy
to see that, for x, y ∈ 𝜎_i, |f(x)| − |f(y)| ≤ f^{𝜎_i} − f_{𝜎_i}. It follows that |f|^{𝜎_i} − |f|_{𝜎_i} ≤
f^{𝜎_i} − f_{𝜎_i}; hence S_Δ(|f|) − s_Δ(|f|) ≤ S_Δ(f) − s_Δ(f). Since f is integrable, for every
𝜖 > 0 there is a partition Δ such that S_Δ(f) − s_Δ(f) < 𝜖. The result now follows from
theorem 8.1.1. ∎

Example 2. Under the assumptions of example 1, f^2 is integrable.
Since f^2 = |f|^2, we may assume that f is a nonnegative function; hence

(f^2)^{𝜎_i} = (f^{𝜎_i})^2, and (f^2)_{𝜎_i} = (f_{𝜎_i})^2.

Now
(f^2)^{𝜎_i} − (f^2)_{𝜎_i} = (f^{𝜎_i} + f_{𝜎_i})(f^{𝜎_i} − f_{𝜎_i}) ≤ 2M(f^{𝜎_i} − f_{𝜎_i}),

where M is an upper bound of f on I = [a, b]. The result now follows from
theorem 8.1.1. ∎

Example 3. If f and g are integrable on an interval [a, b], then so is fg.
By problem 1 at the end of this section, the functions f + g and f − g are
integrable. By example 2, fg is integrable since fg = (1/4)[(f + g)^2 − (f − g)^2]. ∎

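Example 4. The converse of the result in example 2 is false.
For example, the function

f(x) = {  1 if x ∈ ℚ,
       { −1 if x ∉ ℚ

is not integrable on [0, 1], but f^2 is. Indeed, every subinterval contains both
rational and irrational points, so S_Δ(f) = 1 and s_Δ(f) = −1 for every partition Δ
of [0, 1], while f^2 is the constant function 1. ∎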

Now we consider a special sequence of partitions of Q that is very useful in proving
results, especially when f is continuous. As before, Q = I_1 × ... × I_n. For each k ∈ ℕ,
and for 1 ≤ i ≤ n, let 𝒫_i be the partition of I_i into 2^k subintervals of equal length
(b_i − a_i)/2^k, and let Δ_k be the corresponding partition of Q. This is the construction
described earlier except that each of the intervals I_1, … , I_n is divided into the same
number of congruent subintervals, which is a power of 2. It follows that each
Δ_{k+1} is a refinement of Δ_k. Thus Δ_k consists of 2^{nk} congruent sub-boxes, and
each sub-box 𝜎 has dimensions (b_1 − a_1)/2^k, … , (b_n − a_n)/2^k, vol(𝜎) = vol(Q)/2^{nk}, and
diam(𝜎) = (1/2^k)(∑_{i=1}^{n} (b_i − a_i)^2)^{1/2}. Denote the sub-boxes in the partition Δ_k by
𝜎_1, … , 𝜎_{2^{nk}}. As before, we form the upper and lower Riemann sums of f corresponding to the
partition Δ_k and write

S_k(f) = ∑_{i=1}^{2^{nk}} f^{𝜎_i} vol(𝜎_i), and s_k(f) = ∑_{i=1}^{2^{nk}} f_{𝜎_i} vol(𝜎_i).

Since Δ_{k+1} is a refinement of Δ_k,

S_1(f) ≥ S_2(f) ≥ … , and s_1(f) ≤ s_2(f) ≤ … .

As we discussed, the above sequences are bounded; hence

𝛼_0 = lim_k s_k(f) and 𝛽_0 = lim_k S_k(f)

are finite and 𝛼_0 ≤ 𝛽_0. Also 𝛼_0 ≤ 𝛼 ≤ 𝛽 ≤ 𝛽_0.


For the rest of this section, we assume that f is continuous on Q.

Theorem 8.1.2. If f is a continuous real-valued function on Q, then f is Riemann
integrable, and

∫_Q f(x)dx = lim_k s_k(f) = lim_k S_k(f).

Proof. In the notation of the previous paragraph, we prove that 𝛼_0 = 𝛽_0. This will
establish all the assertions of the theorem. Let 𝜖 > 0, and let k be a positive
integer such that |S_k(f) − 𝛽_0| < 𝜖/3, and |s_k(f) − 𝛼_0| < 𝜖/3. Since f is uniformly
continuous on Q, there exists 𝛿 > 0 such that |f(x) − f(y)| < 𝜖/(3 vol(Q)) whenever
‖x − y‖ < 𝛿. We may assume, without loss of generality, that the integer k is such
that the diameter of each sub-box in Δ_k is less than 𝛿. Since f assumes its maximum
and minimum values on 𝜎_i in 𝜎_i, |f^{𝜎_i} − f_{𝜎_i}| < 𝜖/(3 vol(Q)), for each 1 ≤ i ≤ 2^{nk}. Now
|S_k(f) − s_k(f)| ≤ ∑_{i=1}^{2^{nk}} |f^{𝜎_i} − f_{𝜎_i}| vol(𝜎_i) ≤ (𝜖/(3 vol(Q))) ∑_{i=1}^{2^{nk}} vol(𝜎_i) = 𝜖/3. Finally,
|𝛼_0 − 𝛽_0| ≤ |𝛼_0 − s_k(f)| + |s_k(f) − S_k(f)| + |S_k(f) − 𝛽_0| < 𝜖/3 + 𝜖/3 + 𝜖/3 = 𝜖.
Since 𝜖 is arbitrary, 𝛼_0 = 𝛽_0. ∎
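
The monotone convergence of the dyadic sums S_k(f) and s_k(f) can be observed numerically. The Python sketch below is an illustration under a simplifying assumption that is not part of the theorem: f is taken to be nondecreasing in each variable, so that its maximum and minimum on a sub-box are attained at the upper and lower corners. The function name dyadic_sums and the test function f(x, y) = xy on [0, 1] × [0, 1] are hypothetical choices.

```python
from itertools import product


def dyadic_sums(f, box, k):
    """Upper and lower sums S_k(f), s_k(f) on the dyadic partition of `box`.

    Assumption (not from the text): f is nondecreasing in each variable, so
    the sup and inf over a sub-box are the values at its two extreme corners.
    """
    n = len(box)
    cells = 2 ** k                                   # subintervals per axis
    steps = [(b - a) / cells for a, b in box]        # side lengths of a sub-box
    cell_volume = 1.0
    for h in steps:
        cell_volume *= h
    upper = lower = 0.0
    for index in product(range(cells), repeat=n):    # one index per sub-box
        low_corner = [a + j * h for (a, _), j, h in zip(box, index, steps)]
        high_corner = [x + h for x, h in zip(low_corner, steps)]
        lower += f(*low_corner) * cell_volume
        upper += f(*high_corner) * cell_volume
    return upper, lower


f = lambda x, y: x * y
Q = [(0.0, 1.0), (0.0, 1.0)]
sums = [dyadic_sums(f, Q, k) for k in range(1, 7)]
for k, (S, s) in enumerate(sums, start=1):
    print(k, S, s, S - s)
```

For this f the integral is 1/4; the printed upper sums decrease toward it, the lower sums increase toward it, and the gap S_k(f) − s_k(f) shrinks as k grows.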

Theorem 8.1.3. If f and g are continuous on Q, then

∫_Q (f + g)dx = ∫_Q f dx + ∫_Q g dx.

Proof. S_k(f + g) = ∑_{i=1}^{2^{nk}} (f + g)^{𝜎_i} vol(𝜎_i). Now (f + g)^{𝜎_i} = max_{x∈𝜎_i} (f(x) + g(x)) ≤
max_{x∈𝜎_i} f(x) + max_{x∈𝜎_i} g(x) = f^{𝜎_i} + g^{𝜎_i}. Therefore, S_k(f + g) ≤ S_k(f) + S_k(g).
Taking the limit of both sides as k → ∞, ∫_Q (f + g)dx ≤ ∫_Q f dx + ∫_Q g dx. Similarly,
s_k(f + g) ≥ s_k(f) + s_k(g); hence ∫_Q (f + g)dx ≥ ∫_Q f dx + ∫_Q g dx. ∎

Example 5. Let f ∶ ℝ^n → ℂ be continuous. If ∫_Q |f(x)|dx = 0 for every cube Q,
then f = 0.
Suppose, contrary to our assertion, there is a point x_0 ∈ ℝ^n such that m =
|f(x_0)| > 0. By the continuity of f, there is a cube Q centered at x_0 such that,
for x ∈ Q, |f(x) − f(x_0)| < m/2. Now, for x ∈ Q, m − |f(x)| = |f(x_0)| − |f(x)| ≤
|f(x_0) − f(x)| < m/2. Thus |f(x)| > m/2 for all x ∈ Q. Consequently, ∫_Q |f(x)|dx ≥
m vol(Q)/2 > 0. This contradiction proves the result. ∎

Lemma 8.1.4. Let f be continuous on Q. Then ∫_Q (−f)dx = −∫_Q f dx.

Proof. Since (−f)^{𝜎_i} = max_{x∈𝜎_i} (−f(x)) = −min_{x∈𝜎_i} f(x) = −f_{𝜎_i},

∫_Q (−f)dx = lim_k S_k(−f) = lim_k ∑_{i=1}^{2^{nk}} (−f)^{𝜎_i} vol(𝜎_i)
= −lim_k ∑_{i=1}^{2^{nk}} f_{𝜎_i} vol(𝜎_i) = −lim_k s_k(f) = −∫_Q f dx. ∎

Theorem 8.1.5. If f is a continuous real-valued function on Q, and a ∈ ℝ, then
∫_Q (af)dx = a ∫_Q f dx.

Proof. If a ≥ 0, the proof is simple. If a < 0, then

∫_Q (af)dx = ∫_Q |a|(−f)dx = |a| ∫_Q (−f)dx = −|a| ∫_Q f dx = a ∫_Q f dx. ∎

It is now easy to verify the linearity of the integral: if f and g are continuous on Q,
and a, b ∈ ℝ, then ∫_Q (af + bg)dx = a ∫_Q f dx + b ∫_Q g dx.

Theorem 8.1.6. Let f and g be continuous real-valued functions on Q. Then

(a) if f ≥ 0, then ∫_Q f dx ≥ 0; and
(b) if f ≤ g on Q, then ∫_Q f dx ≤ ∫_Q g dx.

Proof. Part (a) follows from the definition.
To prove (b), let h = g − f. Then h ≥ 0; hence, by (a),
∫_Q h dx ≥ 0, so ∫_Q g dx − ∫_Q f dx = ∫_Q (g − f)dx = ∫_Q h dx ≥ 0. ∎

Definition. Let f be a continuous, complex-valued function on Q, and write
f = f_1 + if_2, where f_1 and f_2 are continuous real-valued functions. Define

∫_Q f dx = ∫_Q f_1 dx + i ∫_Q f_2 dx.

Theorem 8.1.7. For continuous complex-valued functions f and g, and all a, b ∈ ℂ,

∫_Q (af + bg)dx = a ∫_Q f dx + b ∫_Q g dx.

The proof is purely computational and is left as an exercise. ∎

Theorems 8.1.6(a) and 8.1.7 are often summarized by the terminology that the
Riemann integral is a positive linear functional on the space 𝒞(Q) of continuous
complex-valued functions on Q. The positivity of the integral means that, for f ≥ 0,
∫_Q f dx ≥ 0.

Exercises

In all the exercises below, we assume that f is a bounded function on a box Q.

1. Prove that the sum (difference) of two integrable functions f and g is
integrable, and ∫_Q (f ± g)dx = ∫_Q f dx ± ∫_Q g dx.
2. Prove that, for any two partitions Δ_1 and Δ_2 of Q, s_{Δ_1}(f) ≤ S_{Δ_2}(f). Hint: Consider
a common refinement, Δ, of Δ_1 and Δ_2.
3. Let f and g be integrable on [a, b], and let f ≤ h ≤ g. Give an example to show
that h need not be integrable.
4. Suppose f is integrable, and f(x) ≥ m > 0 for some constant m and all
x ∈ [a, b]. Prove that 1/f is integrable on [a, b].
5. Let a = x_0 < x_1 < ... < x_n = b be a partition of the interval [a, b]. For 1 ≤
k ≤ n − 1, let E_k = [x_{k−1}, x_k), and let E_n = [x_{n−1}, x_n]. For constants a_1, … , a_n,
define s = ∑_{k=1}^{n} a_k 𝜒_{E_k}. Such a function is called a step function. Prove that
∫_a^b s(x)dx = ∑_{k=1}^{n} a_k (x_k − x_{k−1}).
6. Let f ∶ [a, b] → [0, ∞) be an integrable function. Prove that ∫_a^b f(x)dx =
sup{∫_a^b s(x)dx}, where the supremum is taken over all step functions s such
that s ≤ f.
In all of the remaining exercises, we assume that f is continuous on Q.
7. (a) Prove theorem 8.1.7.
(b) Prove that |∫_Q f dx| ≤ ∫_Q |f|dx. Prove the statement for complex-valued
functions f.
8. Let f ≥ 0 be such that ∫_Q f dx = 0. Prove that f = 0.
9. Suppose that the sequence f_k converges to f in 𝒞(Q), that is, in the uniform
norm. Prove that lim_k ∫_Q f_k dx = ∫_Q f dx.
10. Define the average value of the function f on Q by f_av = (1/vol(Q)) ∫_Q f dx. Prove
that there exists a point 𝜉 ∈ Q such that f(𝜉) = f_av.
11. Let Q = I_1 × ... × I_n, let J_i be a closed subinterval of I_i for 1 ≤ i ≤ n, and let
Q_1 = J_1 × ... × J_n. Prove that if f ≥ 0 on Q, then ∫_{Q_1} f dx ≤ ∫_Q f dx.
12. Let I_1, … , I_n be compact intervals, and let c be an interior point in I_1 = [a, b].
Suppose Q_1 = [a, c] × I_2 × ... × I_n and that Q_2 = [c, b] × I_2 × ... × I_n. Show
that if f is continuous on Q_1 ∪ Q_2, then ∫_{Q_1 ∪ Q_2} f dx = ∫_{Q_1} f dx + ∫_{Q_2} f dx.
13. Fubini’s theorem. Let {I_i = [a_i, b_i], 1 ≤ i ≤ n} be compact intervals, let
Q = I_1 × ... × I_n, and let Q′ = I_2 × ... × I_n. For a point x = (x_1, x_2, … , x_n) ∈ Q,
we write x = (x_1, x′), where x′ = (x_2, … , x_n) ∈ Q′. Prove that ∫_Q f(x)dx =
∫_{a_1}^{b_1} ∫_{Q′} f(x_1, x′)dx′ dx_1. It follows that ∫_Q f dx can be computed by evaluating
the iterated integral ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} ... ∫_{a_n}^{b_n} f dx_n ... dx_1.
14. The fundamental theorem of calculus. Let f ∈ 𝒞[a, b].
(a) For x ∈ [a, b], define F(x) = ∫_a^x f(t)dt. Show that F is differentiable and
that F′(x) = f(x).
(b) Show that if F is differentiable in an open interval containing [a, b] such
that F′ = f on [a, b], then ∫_a^b f(x)dx = F(b) − F(a).

8.2 Measure Spaces

Let us consider the problem of measuring the volume of objects (sets) in ℝ^3.
Strictly speaking, volume is a function that assigns a nonnegative number to a
subset of ℝ^3. A natural question is whether it is possible to measure the volume
of an arbitrary subset of ℝ^3. For the most natural measure on ℝ^3, namely, the
Lebesgue measure, the answer to the question is no. In other words, there are
subsets of ℝ^3 to which a volume cannot be assigned. The question then becomes
that of finding a large enough collection of subsets of ℝ^3 to which a volume can be assigned.
Such sets are called measurable. It is clearly desirable for the finite union of
measurable sets to be measurable. It was a paradigm shift when it was realized
that a successful formulation of a measure theory necessitates that we allow
the countable union of measurable sets to be measurable, and this leads to the
definition of a 𝜎-algebra. The definition of a measure as a set function on a
𝜎-algebra is quite intuitive. This section develops the basics of abstract measure
theory and measurable functions. The picture continues to evolve and culminates
in section 8.4 with the construction of the Lebesgue measure.

For the remainder of this chapter, we use the notation E′ for the complement X − E
of a subset E of a set X.

Definitions. A collection 𝔐 of subsets of a nonempty set X is said to be an algebra
of sets in X if the following two conditions are met:

(a) if E ∈ 𝔐, then E′ ∈ 𝔐; and
(b) if E_1, E_2 ∈ 𝔐, then E_1 ∪ E_2 ∈ 𝔐.

An algebra 𝔐 is called a 𝜎-algebra if it satisfies the additional condition

(c) if (E_n) is a sequence in 𝔐, then ∪_{n=1}^{∞} E_n ∈ 𝔐.

Example 1. For an arbitrary set X, the power set 𝒫(X) is a 𝜎-algebra in X. ∎

Example 2. Let X be an uncountable set. A subset E of X is called co-countable if
E′ is countable. The collection of countable and co-countable subsets of X is a
𝜎-algebra. ∎

Theorem 8.2.1.
(a) If 𝔐 is an algebra, then ∅, X ∈ 𝔐.
(b) If 𝔐 is an algebra and E_1, E_2 ∈ 𝔐, then E_1 ∩ E_2 ∈ 𝔐, and E_1 − E_2 ∈ 𝔐. It
follows by induction that an algebra is closed under the formation of finite
unions and intersections.
(c) If 𝔐 is a 𝜎-algebra, and E_n ∈ 𝔐, then ∩_{n=1}^{∞} E_n ∈ 𝔐.

Proof.
(a) Let E ∈ 𝔐. Then E′ ∈ 𝔐; hence X = E ∪ E′ ∈ 𝔐, and ∅ = X′ ∈ 𝔐.
(b) Using De Morgan’s laws, if E_1, E_2 ∈ 𝔐, then E_1 ∩ E_2 = (E′_1 ∪ E′_2)′ ∈ 𝔐. Also
E_1 − E_2 = E_1 ∩ E′_2 ∈ 𝔐.
(c) This follows from De Morgan’s law, since ∩_{n=1}^{∞} E_n = (∪_{n=1}^{∞} E′_n)′. ∎

Theorem 8.2.2. Let ℭ be an arbitrary collection of subsets of a set X. Then there
exists a (unique) smallest 𝜎-algebra 𝔐 that contains ℭ.

Proof. It is clear that the intersection of a family of 𝜎-algebras is a 𝜎-algebra.
The collection of 𝜎-algebras on X containing ℭ is not empty since 𝒫(X) is such
a 𝜎-algebra. Now take 𝔐 to be the intersection of all the 𝜎-algebras in X that
contain ℭ. ∎

Definition. The smallest 𝜎-algebra that contains a collection of sets ℭ is called the
𝜎-algebra generated by ℭ.
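
On a finite set the generated 𝜎-algebra can be computed explicitly. The Python sketch below does not follow the intersection argument of theorem 8.2.2; instead it closes the collection under complements and pairwise unions until nothing new appears, which gives the same family when X is finite because countable unions then reduce to finite ones. The function name generated_sigma_algebra and the sample data are hypothetical.

```python
from itertools import combinations


def generated_sigma_algebra(X, collection):
    """Smallest family containing `collection`, closed under complement and union.

    Only valid for a finite set X, where closure under finite unions suffices.
    """
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(E) for E in collection}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for E in current:                      # close under complements
            if X - E not in family:
                family.add(X - E)
                changed = True
        for E, F in combinations(current, 2):  # close under pairwise unions
            if E | F not in family:
                family.add(E | F)
                changed = True
    return family


X = {1, 2, 3, 4}
M = generated_sigma_algebra(X, [{1}, {2}])
print(sorted(sorted(E) for E in M))
```

In this example the points 3 and 4 are never separated by the generating sets, so {3, 4} belongs to the generated 𝜎-algebra while {3} does not.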

Definition. Let X be a metric (topological) space. The smallest 𝜎-algebra in X
containing the collection of open subsets of X is called the Borel algebra in
X, and its members are called the Borel subsets of X. The collection of Borel
sets of X is denoted by ℬ(X). In particular, the 𝜎-algebras ℬ(ℝ^n) are of central
importance.

Example 3. The collection ℭ = {(a, b) ∶ a, b ∈ ℝ, a < b} generates ℬ(ℝ).
Since every member of ℭ is an open set, the 𝜎-algebra generated by ℭ is
contained in ℬ(ℝ). Conversely, every open subset of ℝ is a countable union of
members of ℭ, so the 𝜎-algebra generated by ℭ contains every open set and
therefore contains ℬ(ℝ). ∎

Definition. Let X be a metric (topological) space. The intersection of a countable
collection of open subsets of X is known as a G_𝛿 set. The countable union of
closed subsets of X is called an F_𝜎 set.

It follows from theorem 8.2.1 that ℬ(X) contains all open sets, closed sets, F_𝜎 sets,
and G_𝛿 sets.

Definitions. Let 𝔐 be a 𝜎-algebra of subsets in X. A positive measure on 𝔐 is a
set function 𝜇 ∶ 𝔐 → [0, ∞] such that

(a) 𝜇 ≢ ∞, in the sense that 𝜇(E) < ∞ for at least one E ∈ 𝔐; and
(b) if {E_n} is a countable collection of mutually disjoint members of 𝔐, then

𝜇(∪_{n=1}^{∞} E_n) = ∑_{n=1}^{∞} 𝜇(E_n).

The pair (X, 𝔐) is called a measurable space, the members of 𝔐 are called
measurable sets, and (X, 𝔐, 𝜇) is called a measure space. If 𝔐 and 𝜇 are
understood, we loosely say that X is a measure space.

Property (b) is known as the countable additivity of positive measures.
If 𝜇(X) < ∞, we say that 𝜇 is a finite positive measure.

Example 4 (the counting measure). Let X be a nonempty set, and let 𝔐 = 𝒫(X).
Define 𝜇 ∶ 𝔐 → [0, ∞] as follows: 𝜇(E) = Card(E) if E is finite, and 𝜇(E) = ∞
otherwise. Then 𝜇 is a measure on 𝒫(X). ∎

Example 5 (the Dirac measure). Let X be a nonempty set, and let 𝔐 = 𝒫(X).
Fix an element x_0 ∈ X, and define 𝜇 ∶ 𝔐 → ℝ as follows: 𝜇(E) = 1 if x_0 ∈ E,
and 𝜇(E) = 0 otherwise. Then 𝜇 is a measure on 𝒫(X). ∎

Example 6. Let X = ℕ. A subset E of X is at most countable, so we can write E =
{n_1, n_2, ...}. Define 𝜇(E) = ∑_k 2^{−n_k}. It is easy to see that 𝜇 is a measure on 𝒫(X).
Observe that 𝜇(X) = 1. ∎
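
The measure of example 6 can be evaluated exactly on finite sets. The following Python sketch uses exact rational arithmetic and assumes, consistently with 𝜇(X) = 1, that ℕ = {1, 2, 3, ...}. It only checks finite additivity and the partial sums 𝜇({1, … , N}) = 1 − 2^{−N}; countable additivity is not something a finite computation can verify. The name mu is a hypothetical helper.

```python
from fractions import Fraction


def mu(E):
    """mu(E) = sum of 2^(-n) over n in E, for a finite subset E of {1, 2, ...}."""
    return sum((Fraction(1, 2 ** n) for n in E), Fraction(0))


E = {1, 3, 5}
F = {2, 4}
additive = mu(E | F) == mu(E) + mu(F)   # E and F are disjoint
partial = {N: mu(set(range(1, N + 1))) for N in (1, 2, 10)}
print(additive, partial)
```

The partial sums increase toward 1, in agreement with the observation that 𝜇(X) = 1.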

Theorem 8.2.3. If X is a measure space, then

(a) The monotonicity of positive measures: if E, F ∈ 𝔐 and E ⊆ F, then

𝜇(E) ≤ 𝜇(F).

(b) The countable subadditivity of positive measures: if (E_n) is a sequence in
𝔐, then

𝜇(∪_{n=1}^{∞} E_n) ≤ ∑_{n=1}^{∞} 𝜇(E_n).

(c) If E_1 ⊆ E_2 ⊆ ... is an ascending sequence of subsets in 𝔐, then

𝜇(∪_{n=1}^{∞} E_n) = lim_n 𝜇(E_n).

(d) If E_1 ⊇ E_2 ⊇ ... is a descending sequence of subsets in 𝔐 and 𝜇(E_1) < ∞, then

𝜇(∩_{n=1}^{∞} E_n) = lim_n 𝜇(E_n).

Proof. (a) Since F = E ∪ (F − E), 𝜇(F) = 𝜇(E) + 𝜇(F − E) ≥ 𝜇(E).

(b) Let B_1 = E_1, and, for n ≥ 2, let B_n = E_n − ∪_{i=1}^{n−1} E_i. The sequence {B_n} is
pairwise disjoint, B_n ⊆ E_n, and ∪_{n=1}^{∞} E_n = ∪_{n=1}^{∞} B_n; hence 𝜇(∪_{n=1}^{∞} E_n) = 𝜇(∪_{n=1}^{∞} B_n) =
∑_{n=1}^{∞} 𝜇(B_n) ≤ ∑_{n=1}^{∞} 𝜇(E_n).

(c) Let B_1 = E_1, and, for n ≥ 2, let B_n = E_n − E_{n−1}. The sequence {B_n} is pairwise
disjoint, and ∪_{i=1}^{n} B_i = E_n. Now 𝜇(∪_{n=1}^{∞} E_n) = 𝜇(∪_{n=1}^{∞} B_n) = ∑_{n=1}^{∞} 𝜇(B_n) =
lim_n ∑_{i=1}^{n} 𝜇(B_i) = lim_n 𝜇(∪_{i=1}^{n} B_i) = lim_n 𝜇(E_n).

(d) The sequence E_1 − E_n is ascending, and E_1 − ∩_{n=1}^{∞} E_n = ∪_{n=1}^{∞} (E_1 − E_n). By
part (c), 𝜇(E_1) − 𝜇(∩_{n=1}^{∞} E_n) = 𝜇(∪_{n=1}^{∞} (E_1 − E_n)) = lim_n 𝜇(E_1 − E_n) = 𝜇(E_1) −
lim_n 𝜇(E_n). Hence 𝜇(∩_{n=1}^{∞} E_n) = lim_n 𝜇(E_n). ∎
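
The disjointification used in part (b) of the proof is easy to try out on finite sets. The Python sketch below is an illustration only, with the counting measure (here simply len) standing in for 𝜇; the function name disjointify and the sample sets are hypothetical.

```python
def disjointify(sets):
    """B_1 = E_1 and, for n >= 2, B_n = E_n minus (E_1 ∪ ... ∪ E_{n-1})."""
    seen = set()
    pieces = []
    for E in sets:
        pieces.append(set(E) - seen)   # the part of E_n not covered earlier
        seen |= set(E)
    return pieces


E = [{1, 2}, {2, 3}, {1, 4}, {3}]
B = disjointify(E)
union_E = set().union(*E)
union_B = set().union(*B)
# Counting measure: mu(union) = sum of mu(B_n) <= sum of mu(E_n).
print(B, len(union_E), sum(len(b) for b in B), sum(len(e) for e in E))
```

The pieces B_n are pairwise disjoint, have the same union as the sets E_n, and each B_n is contained in E_n, which is exactly what the subadditivity estimate uses.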

Example 7. The condition 𝜇(E_1) < ∞ in part (d) of the previous theorem cannot
be omitted. For example, if 𝜇 is the counting measure on ℕ, and E_n = [n, ∞) ∩
ℕ, then lim_n 𝜇(E_n) = ∞, while 𝜇(∩_{n=1}^{∞} E_n) = 𝜇(∅) = 0. ∎

Outer Measures

We now discuss an important general construction which we need in section 8.4
for the construction of the Lebesgue measure on ℝ^n.

Definition. Let X be a nonempty set. A set function m∗ ∶ 𝒫(X) → [0, ∞] is called
an outer measure on X if the following conditions are satisfied:

(a) if E ⊆ F ⊆ X, then m∗(E) ≤ m∗(F); and
(b) for a countable sequence (E_n) of subsets of X,

m∗(∪_{n=1}^{∞} E_n) ≤ ∑_{n=1}^{∞} m∗(E_n).

Thus an outer measure is a nonnegative set function on 𝒫(X) that is monotone and
countably subadditive. Outer measures have little intrinsic importance. However,
an outer measure can be restricted to a positive measure on a certain 𝜎-algebra of
sets in X, as we detail below.

Definition. Let m∗ be an outer measure on X. A subset E of X is said to be
m∗-measurable (or simply measurable, in this discussion) if

m∗(A) = m∗(A ∩ E) + m∗(A ∩ E′)

for all subsets A of X.
The above condition is known as the Carathéodory condition. Let 𝔐 denote
the set of all m∗-measurable subsets of X.

The Carathéodory condition is not a very intuitive idea. However, it immediately
guarantees the finite additivity of m∗ on 𝔐. Indeed, if E_1 and E_2 are disjoint
subsets of X, and E_1 is measurable, then applying the Carathéodory condition with
A = E_1 ∪ E_2, we obtain

m∗(E_1 ∪ E_2) = m∗((E_1 ∪ E_2) ∩ E_1) + m∗((E_1 ∪ E_2) ∩ E′_1) = m∗(E_1) + m∗(E_2).

The Carathéodory condition also implies without too much difficulty that 𝔐 is an
algebra (see lemma 8.2.4). In fact, it turns out that 𝔐 is a 𝜎-algebra and that the
restriction of m∗ to 𝔐 is a positive measure. We prove this in three steps.

Lemma 8.2.4. 𝔐 is an algebra.

Proof. If E ∈ 𝔐, then E′ ∈ 𝔐. This follows from the symmetry of the definition of a
measurable set. Now let E_1, E_2 ∈ 𝔐. We need to prove that E_1 ∪ E_2 is measurable.
Because m∗ is subadditive, it is sufficient to show that

m∗(A ∩ (E_1 ∪ E_2)) + m∗(A ∩ (E_1 ∪ E_2)′) ≤ m∗(A)

for all subsets A of X.
Using the identities A ∩ (E_1 ∪ E_2) = (A ∩ E_1) ∪ (A ∩ E_2 ∩ E′_1) and
A ∩ (E_1 ∪ E_2)′ = A ∩ E′_1 ∩ E′_2, and the measurability of E_1 and E_2,

m∗((A ∩ E_1) ∪ (A ∩ E_2 ∩ E′_1)) + m∗(A ∩ E′_1 ∩ E′_2)
≤ m∗(A ∩ E_1) + m∗(A ∩ E_2 ∩ E′_1) + m∗(A ∩ E′_1 ∩ E′_2)
= m∗(A ∩ E_1) + m∗(A ∩ E′_1) = m∗(A). ∎

Lemma 8.2.5. If (E_n) is a disjoint sequence of measurable sets and A ⊆ X, then

(a) m∗(A ∩ ∪_{i=1}^{n} E_i) = ∑_{i=1}^{n} m∗(A ∩ E_i),
(b) m∗(A ∩ ∪_{i=1}^{∞} E_i) = ∑_{i=1}^{∞} m∗(A ∩ E_i), and
(c) m∗(∪_{i=1}^{∞} E_i) = ∑_{i=1}^{∞} m∗(E_i).

Proof. Using the fact that E_1 is measurable, we have

m∗(A ∩ (E_1 ∪ E_2)) = m∗(A ∩ (E_1 ∪ E_2) ∩ E_1) + m∗(A ∩ (E_1 ∪ E_2) ∩ E′_1)
= m∗(A ∩ E_1) + m∗(A ∩ E_2).

To complete the proof of part (a), we use induction coupled with the fact we just
established (n = 2) and the fact that 𝔐 is an algebra.

To prove (b),

∑_{i=1}^{n} m∗(A ∩ E_i) = m∗(A ∩ ∪_{i=1}^{n} E_i) ≤ m∗(A ∩ ∪_{i=1}^{∞} E_i)
= m∗(∪_{i=1}^{∞} (A ∩ E_i)) ≤ ∑_{i=1}^{∞} m∗(A ∩ E_i).

Taking the limit as n → ∞, we obtain (b). Part (c) follows from (b) by taking
A = ∪_{i=1}^{∞} E_i. ∎

Theorem 8.2.6 (Carathéodory’s theorem). 𝔐 is a 𝜎-algebra, and the restriction
of m∗ to 𝔐 is a positive measure.

Proof. The fact that m∗ is countably additive on 𝔐 is part (c) of lemma 8.2.5.
We need to show that 𝔐 is closed under the formation of countable
unions. Let E_n ∈ 𝔐, and write E = ∪_{n=1}^{∞} E_n. Define B_1 = E_1, and, for n ≥ 2, B_n =
E_n − ∪_{i=1}^{n−1} E_i. Since 𝔐 is an algebra, each B_n ∈ 𝔐. Notice that the sets B_n are
mutually disjoint, and ∪_{n=1}^{∞} B_n = ∪_{n=1}^{∞} E_n. Therefore, without loss of generality,
we may assume the sets E_n are mutually disjoint. We need to show that, for
A ⊆ X, m∗(A) ≥ m∗(A ∩ E) + m∗(A ∩ E′). Using the facts that ∪_{i=1}^{n} E_i ∈ 𝔐, A ∩
(∪_{i=1}^{n} E_i)′ ⊇ A ∩ E′, and lemma 8.2.5, we obtain

m∗(A) = m∗(A ∩ (∪_{i=1}^{n} E_i)) + m∗(A ∩ (∪_{i=1}^{n} E_i)′)
≥ m∗(A ∩ (∪_{i=1}^{n} E_i)) + m∗(A ∩ E′) = ∑_{i=1}^{n} m∗(A ∩ E_i) + m∗(A ∩ E′).

Taking the limit as n → ∞ in the above string, then applying part (b) of
lemma 8.2.5, we obtain

m∗(A) ≥ ∑_{i=1}^{∞} m∗(A ∩ E_i) + m∗(A ∩ E′) = m∗(A ∩ E) + m∗(A ∩ E′). ∎

Definition. Let (X, 𝔐, 𝜇) be a measure space. We say that 𝜇 is a complete


measure if whenever E ∈ 𝔐 is such that 𝜇(E) = 0, then any subset of E is in
𝔐. Thus 𝔐 contains all subsets of sets of measure 0.

We have now reached the culmination of this construction.

Theorem 8.2.7. Let m∗ be an outer measure on a set X, and let 𝔐 be the 𝜎-algebra
of measurable subsets of X. Then the restriction of m∗ to 𝔐 is a complete measure.

Proof. We have already established the fact that m∗ is a measure on 𝔐. It remains
to show the completeness of m∗. We first show that if Z ⊆ X and m∗(Z) = 0, then
Z ∈ 𝔐. Let A ⊆ X. Then 0 ≤ m∗(A ∩ Z) ≤ m∗(Z) = 0. Thus m∗(A) ≤ m∗(A ∩
Z) + m∗(A ∩ Z′) = m∗(A ∩ Z′) ≤ m∗(A). This proves that Z is measurable. Now if
E ⊆ Z, then 0 ≤ m∗(E) ≤ m∗(Z) = 0; hence m∗(E) = 0. By what we have already
established, E ∈ 𝔐. ∎
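
The whole construction can be traced on a small finite example. In the Python sketch below, the set X = {1, 2, 3, 4} and the outer measure m∗(E) = number of the blocks {1, 2}, {3, 4} that E meets are hypothetical choices made for illustration (this set function is monotone and subadditive). The code tests the Carathéodory condition against every subset A of X by brute force.

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4})
BLOCKS = [frozenset({1, 2}), frozenset({3, 4})]


def outer(E):
    """Hypothetical outer measure: the number of blocks that E meets."""
    return sum(1 for block in BLOCKS if block & E)


def subsets(S):
    """All subsets of a finite set S."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_measurable(E):
    """Caratheodory condition: m*(A) = m*(A ∩ E) + m*(A ∩ E') for every A ⊆ X."""
    return all(outer(A) == outer(A & E) + outer(A - E) for A in subsets(X))


measurable = {E for E in subsets(X) if is_measurable(E)}
print(sorted(sorted(E) for E in measurable))
```

Only ∅, {1, 2}, {3, 4}, and X pass the test. These four sets form a 𝜎-algebra on which the outer measure is additive, in line with theorem 8.2.6, whereas a set such as {1} fails the condition with A = {1, 2}.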

A word about complete measures is very much in order here. It is an inconvenient
fact that incomplete measures can occur quite naturally. For example, the product
of Lebesgue measures, which are complete, is not a complete measure (see section
8.8). It is desirable to know whether an incomplete measure space can be
completed. The answer is yes, and the completion of a measure turns out to be a rather
simple construction. See problems 3 and 4 at the end of this section.

Measurable Functions

For the remainder of this section, (X, 𝔐) is a measurable space. We allow real-
valued functions on X to take infinite values. This is essential because, for example,
the limit of a sequence of functions fn (x) may well diverge to ±∞, or it may not
even exist for some x ∈ X. It will turn out that this is largely a technicality because,
in practice, the exceptional set of points where a reasonable measurable function
f takes infinite values has measure 0 (see, e.g., example 1 in section 8.3). In this
section, we have to contend with the nuisance that functions can assume infinite
values.

Definition. An extended real-valued function f ∶ X → [−∞, ∞] is said to be measurable
if f^{−1}((a, ∞]) is measurable for every a ∈ ℝ.

Proposition 8.2.8. For a function f ∶ X → [−∞, ∞], the following are
equivalent:

(a) f is measurable.
(b) f^{−1}([a, ∞]) ∈ 𝔐 for every a ∈ ℝ.
(c) f^{−1}([−∞, a)) ∈ 𝔐 for every a ∈ ℝ.
(d) f^{−1}([−∞, a]) ∈ 𝔐 for every a ∈ ℝ.

Proof. (a) implies (b): f^{−1}([a, ∞]) = ∩_{n=1}^{∞} f^{−1}((a − 1/n, ∞]).
(b) implies (c): f^{−1}([−∞, a)) = X − f^{−1}([a, ∞]).
(c) implies (d): f^{−1}([−∞, a]) = ∩_{n=1}^{∞} f^{−1}([−∞, a + 1/n)).
(d) implies (a): f^{−1}((a, ∞]) = X − f^{−1}([−∞, a]). ∎
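
On a finite measurable space the definition of a measurable function can be checked directly. The Python sketch below assumes a finite set X, a 𝜎-algebra given as a set of frozensets, and a function with finite values; the names preimage_gt and is_measurable_function are hypothetical. Because the set {f > a} changes only when a crosses a value of f, it is enough to test finitely many thresholds a.

```python
def preimage_gt(f, X, a):
    """The set f^{-1}((a, ∞]) = {x in X : f(x) > a}."""
    return frozenset(x for x in X if f(x) > a)


def is_measurable_function(f, X, sigma_algebra):
    """Check that {f > a} is in the sigma-algebra for every real a.

    Assumes X is finite and f takes finite values, so it suffices to test
    one threshold below min f together with each value of f.
    """
    values = sorted({f(x) for x in X})
    thresholds = [values[0] - 1] + values
    return all(preimage_gt(f, X, a) in sigma_algebra for a in thresholds)


X = frozenset({1, 2, 3, 4})
M = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
step = lambda x: 0.0 if x <= 2 else 1.0     # constant on {1, 2} and on {3, 4}
identity = lambda x: float(x)               # separates the points 1 and 2
print(is_measurable_function(step, X, M), is_measurable_function(identity, X, M))
```

The function step is measurable with respect to M, while identity is not, since {x ∶ x > 1} = {2, 3, 4} does not belong to M.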

Theorem 8.2.9. (a) A constant function is measurable.
(b) If A ⊆ X, then 𝜒_A is measurable if and only if A is measurable.
(c) If f ∶ X → ℝ is measurable and c ∈ ℝ, then f + c and cf are measurable.

The proof is left as an exercise. ∎

Lemma 8.2.10. Let f and g be measurable, extended real-valued functions. Then
the following subsets of X are measurable:

A = {x ∈ X ∶ f(x) > g(x)},
B = {x ∈ X ∶ f(x) ≥ g(x)},
C = {x ∈ X ∶ f(x) = g(x)}.

Proof. The set

A = ∪_{r∈ℚ} [{x ∈ X ∶ f(x) > r} ∩ {x ∈ X ∶ g(x) < r}]

is measurable because ℚ is countable.
The set B is measurable because it is the complement of the set

{x ∈ X ∶ g(x) > f(x)},

which is measurable by the first part of the proof, with the roles of f and g interchanged.
Finally,
C = {x ∈ X ∶ f(x) ≥ g(x)} ∩ {x ∈ X ∶ g(x) ≥ f(x)}
is measurable by the second part. ∎

Proposition 8.2.11. An extended real-valued function f is measurable if and only if
the following two conditions hold:

(a) f^{−1}(−∞) and f^{−1}(∞) are measurable subsets of X, and
(b) f^{−1}(V) is measurable for every open subset V of ℝ.

Proof. Suppose f is measurable. Because f^{−1}(∞) = ∩_{n=1}^{∞} f^{−1}((n, ∞]), and f^{−1}(−∞) =
∩_{n=1}^{∞} f^{−1}([−∞, −n)), f^{−1}(∞) and f^{−1}(−∞) are measurable. Since an open subset
of ℝ is a countable union of open bounded intervals, it is enough to show that the
inverse image under f of an open bounded interval is in 𝔐. But, for a bounded
interval (a, b), f^{−1}((a, b)) = f^{−1}((a, ∞]) ∩ f^{−1}([−∞, b)), which is in 𝔐.
Conversely, since (a, ∞) is open and f^{−1}((a, ∞]) = f^{−1}((a, ∞)) ∪ f^{−1}(∞),
f^{−1}((a, ∞]) is in 𝔐. ∎

Lemma 8.2.12. Let f ∶ X → [−∞, ∞] be a measurable function, and let 𝜑 ∶ ℝ → ℝ be
continuous. Then the function h ∶ X → ℝ defined below is measurable:

h(x) = { 𝜑(f(x)) if f(x) ∈ ℝ,
       { 0 if |f(x)| = ∞.

Proof. We use proposition 8.2.11. By construction, h takes only finite values, so
h^{−1}(∞) = ∅ = h^{−1}(−∞). By the continuity of 𝜑, if V is an open subset of ℝ,
then U = 𝜑^{−1}(V) is open in ℝ. By proposition 8.2.11, f^{−1}(U) is measurable. Now

h^{−1}(V) = { f^{−1}(U) if 0 ∉ V,
            { f^{−1}(U) ∪ f^{−1}(∞) ∪ f^{−1}(−∞) if 0 ∈ V.

In either case, h^{−1}(V) is measurable. Again by proposition 8.2.11, h is a measurable
function. ∎

This lemma can be applied to infer the measurability of a wide class of functions.
The following is a sample.

Theorem 8.2.13. If f ∶ X → [−∞, ∞] is measurable, then so are max{f, 0}, min{f, 0}, |f|^p
for all positive p, and f^m for m ∈ ℕ.

Proof. This follows from lemma 8.2.12 applied with 𝜑(t) = max{t, 0}, 𝜑(t) =
min{t, 0}, 𝜑(t) = |t|^p, and 𝜑(t) = t^m, respectively. Here it is assumed, in
accordance with the lemma, that when |f(x)| = ∞, (𝜑 ∘ f)(x) is defined to be 0. ∎

Lemma 8.2.14. Let A be a measurable subset of X, and let f ∶ A → [−∞, ∞] be such that
{x ∈ A ∶ f(x) > a} ∈ 𝔐 for every a ∈ ℝ. Define h ∶ X → [−∞, ∞] as follows: h|_A = f,
and h(X − A) = 0. Then h is measurable.

Proof. Let a ∈ ℝ. If a ≥ 0, then

h^{−1}((a, ∞]) = {x ∈ A ∶ h(x) > a} ∪ {x ∈ X − A ∶ h(x) > a} = {x ∈ A ∶ f(x) > a},

which is measurable.
If a < 0,
h^{−1}((a, ∞]) = {x ∈ A ∶ f(x) > a} ∪ (X − A),
which is also measurable. ∎

A function satisfying the conditions of lemma 8.2.14 is said to be measurable on A.
Loosely speaking, lemma 8.2.14 says that a measurable function on a measurable
subset of X can be extended to a measurable function on X. Another way to look
at it is that altering the values of a measurable function on a measurable subset
produces a measurable function. Assigning the value 0 to h(X − A) is arbitrary,
and any (extended) real number can be used instead of 0.

Theorem 8.2.15. Let f and g be measurable, extended real-valued functions on a


measurable space X, and let

A = {x ∈ X ∶ f (x) ∈ ℝ} ∩ {x ∈ X ∶ g(x) ∈ ℝ}.

Then the following functions are measurable:

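\[ h(x) = \begin{cases} f(x) + g(x) & \text{if } x \in A, \\ 0 & \text{if } x \notin A, \end{cases} \qquad k(x) = \begin{cases} f(x)g(x) & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} \]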

Proof. By proposition 8.2.11, the set A is measurable. By lemma 8.2.14, it is enough


to check that h and k are measurable on A. Now if a ∈ ℝ, {x ∈ A ∶ f (x) + g(x) >
a} = A ∩ {x ∈ X ∶ f (x) > a − g(x)}, which is measurable by lemma 8.2.10. Thus h
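is measurable on $A$. The same argument applied to $f$ and $-g$ shows that $f - g$ is
measurable on $A$. Now that $f + g$ and $f - g$ are measurable on $A$, $(f + g)^2$ and
$(f - g)^2$ are measurable on $A$ by theorem 8.2.13. It follows that $k$ is measurable on
$A$ because $fg = [(f + g)^2 - (f - g)^2]/4$. ∎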

Theorem 8.2.16. Let fn be a sequence of measurable functions. Then the following


functions are measurable:

(a) supn fn ,
(b) infn fn ,
(c) lim supn fn , and
(d) lim infn fn .

Also, the set {x ∈ X ∶ limn fn (x) exists} is measurable.

Proof. Parts (a) and (b) are true because

\[ \{x \in X : \sup_n f_n(x) > a\} = \bigcup_{n=1}^{\infty} \{x \in X : f_n(x) > a\} \]

and

\[ \{x \in X : \inf_n f_n(x) < a\} = \bigcup_{n=1}^{\infty} \{x \in X : f_n(x) < a\}, \]

respectively. Now parts (c) and (d) follow from parts (a) and (b) because

\[ \limsup_n f_n = \inf_n \{\sup_{k \ge n} f_k\}, \quad \text{and} \quad \liminf_n f_n = \sup_n \{\inf_{k \ge n} f_k\}. \]

The last assertion follows from parts (c) and (d) and from lemma 8.2.10, because
the set in question is

\[ \{x \in X : \limsup_n f_n(x) = \liminf_n f_n(x)\}. \] ∎

Definition. A complex function f ∶ X → ℂ is said to be measurable if its real and


imaginary parts are measurable.

Theorem 8.2.17. A complex function f ∶ X → ℂ is measurable if and only if


f−1 (V) ∈ 𝔐 for every open subset V of the complex plane.

Proof. Write $f = f_1 + i f_2$, and suppose $f$ is measurable. An open set $V$ in $\mathbb{C}$ is a
countable union of open bounded rectangles, so it is enough to show that, for
the rectangle $R = (a, b) \times (c, d)$, $f^{-1}(R)$ is measurable. But this is obvious since
$f^{-1}(R) = f_1^{-1}((a, b)) \cap f_2^{-1}((c, d))$. To prove the converse, let $a \in \mathbb{R}$, and consider
the open set $V = \{\xi + i\eta \in \mathbb{C} : \xi > a\}$. By assumption, $f^{-1}(V)$ is in $\mathfrak{M}$. But
$f^{-1}(V) = f_1^{-1}((a, \infty))$. One shows that $f_2$ is measurable by considering the open
set $V = \{\xi + i\eta \in \mathbb{C} : \eta > a\}$. ∎

Excursion: The Hopf Extension Theorem²

The motivation for the Hopf extension included below is not entirely precise, but
we hope it will help the reader gain some insight into the construction of important
measures such as the Lebesgue measure on ℝ². The plane contains a collection of
subsets for which a natural measure exists, namely, the collection of rectangles.³
The measure (area) of a rectangle ought to be the product of its dimensions. The
collection ℭ of finite disjoint unions of rectangles in the plane is known to be an
algebra in ℝ², and the measure of a member of ℭ is defined in the obvious way: it
is the (finite) sum of the measures of the rectangles in the union. The immediate
question is whether the natural measure we just described can be extended to the
𝜎-algebra 𝔐 generated by ℭ.
The Hopf extension abstracts the above motivation and provides an affirmative
answer (theorem 8.2.19). Theorem 8.2.20 gives a sufficient condition for the
uniqueness of such an extension.

We will construct measures on product spaces (section 8.8) using a different


approach, and this excursion can be bypassed without affecting the continuity of
the rest of this chapter.

We will adopt the following standing assumptions throughout this excursion:

1. ℭ is an algebra of subsets in X. (Thus X ∈ ℭ, and ∅ ∈ ℭ.)


2. 𝜇 ∶ ℭ → [0, ∞] is a set function such that 𝜇(∅) = 0.
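3. $\mu$ is countably additive on $\mathfrak{C}$ in the sense that if $\{C_n\}$ is a disjoint sequence
in $\mathfrak{C}$, and $\bigcup_{n=1}^{\infty} C_n \in \mathfrak{C}$, then $\mu(\bigcup_{n=1}^{\infty} C_n) = \sum_{n=1}^{\infty} \mu(C_n)$. Observe that such a
function is monotone.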

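Lemma 8.2.18. Under the standing assumptions, define a set function
$\nu^* : \mathcal{P}(X) \to [0, \infty]$ by

\[ \nu^*(E) = \inf\Big\{ \sum_{n=1}^{\infty} \mu(C_n) : C_n \in \mathfrak{C},\ E \subseteq \bigcup_{n=1}^{\infty} C_n \Big\}. \]

(a) $\nu^*$ is an outer measure on $X$.
(b) For every $E \in \mathfrak{C}$, $\nu^*(E) = \mu(E)$.
(c) Every $E \in \mathfrak{C}$ is $\nu^*$-measurable.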

² This theorem is also attributed to Lebesgue.

³ Intuitively, a rectangle is the product of two intervals. More precisely, a rectangle is the product of
two Lebesgue measurable subsets of ℝ.

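Proof. The definition of $\nu^*$ is meaningful because $X \in \mathfrak{C}$, and the monotonicity
of $\nu^*$ is obvious. To prove that $\nu^*$ is subadditive, let $\{E_n\}$ be a countable
collection of subsets of $X$, and, without loss of generality, assume that
$\sum_{n=1}^{\infty} \nu^*(E_n) < \infty$. Let $\epsilon > 0$ and, for each $n \in \mathbb{N}$, let $\{C_{n,j}\} \subseteq \mathfrak{C}$ be such that
$E_n \subseteq \bigcup_{j=1}^{\infty} C_{n,j}$ and $\sum_{j=1}^{\infty} \mu(C_{n,j}) < \nu^*(E_n) + \epsilon/2^n$. Now $\bigcup_{n=1}^{\infty} E_n \subseteq \bigcup_{n,j=1}^{\infty} C_{n,j}$; hence

\[ \nu^*\Big(\bigcup_{n=1}^{\infty} E_n\Big) \le \sum_{n=1}^{\infty} \sum_{j=1}^{\infty} \mu(C_{n,j}) \le \sum_{n=1}^{\infty} \big[\nu^*(E_n) + \epsilon/2^n\big] = \sum_{n=1}^{\infty} \nu^*(E_n) + \epsilon. \]

Since $\epsilon$ is arbitrary, the proof of part (a) is complete.

(b) If $E \in \mathfrak{C}$, then $E \subseteq \bigcup_{n=1}^{\infty} C_n$, where $C_1 = E$, and $C_n = \emptyset$ for $n \ge 2$. Thus
$\nu^*(E) \le \mu(E)$. Suppose $E \subseteq \bigcup_{n=1}^{\infty} C_n$, where each $C_n \in \mathfrak{C}$, and define $D_1 = C_1$ and, for $n \ge 2$,
$D_n = C_n - \bigcup_{i=1}^{n-1} C_i$. Because $\mathfrak{C}$ is an algebra, each $D_n \in \mathfrak{C}$. Clearly, the sets $D_n$ are
disjoint and $\bigcup_{n=1}^{\infty} D_n = \bigcup_{n=1}^{\infty} C_n$; hence $E = \bigcup_{n=1}^{\infty} (E \cap D_n)$. By the additivity and
monotonicity of $\mu$ on $\mathfrak{C}$, $\mu(E) = \mu(\bigcup_{n=1}^{\infty} (E \cap D_n)) = \sum_{n=1}^{\infty} \mu(E \cap D_n) \le \sum_{n=1}^{\infty} \mu(C_n)$.
By the very definition of $\nu^*$, $\mu(E) \le \nu^*(E)$.

(c) Let $E \in \mathfrak{C}$, $A \subseteq X$, and, without loss of generality, assume that $\nu^*(A) < \infty$.
For every $\epsilon > 0$, there exists a sequence $\{C_n\}$ in $\mathfrak{C}$ such that $A \subseteq \bigcup_{n=1}^{\infty} C_n$ and
$\sum_{n=1}^{\infty} \mu(C_n) \le \nu^*(A) + \epsilon$. By the additivity of $\mu$ on $\mathfrak{C}$,

\[ \nu^*(A) + \epsilon \ge \sum_{n=1}^{\infty} \mu(C_n) = \sum_{n=1}^{\infty} \mu(C_n \cap E) + \sum_{n=1}^{\infty} \mu(C_n \cap E') \ge \nu^*(A \cap E) + \nu^*(A \cap E'). \]

Since $\epsilon$ is arbitrary, the result follows. ∎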
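
The outer measure of lemma 8.2.18 can be computed by brute force when $X$ is finite. The sketch below is only an illustration under stated assumptions: the four-point space, the algebra generated by the partition {0, 1}, {2, 3}, and the values of the set function are all hypothetical choices, and the names `outer_measure` and `is_caratheodory_measurable` are ours. Because a finite algebra has finitely many members, it suffices to search over finite covers.

```python
from itertools import combinations

def outer_measure(E, algebra, mu):
    """Smallest total mu-value of a cover of E by members of the algebra."""
    E = frozenset(E)
    best = float("inf")
    members = list(algebra)
    for r in range(1, len(members) + 1):
        for cover in combinations(members, r):
            if E <= frozenset().union(*cover):
                best = min(best, sum(mu[C] for C in cover))
    return best

def is_caratheodory_measurable(E, X, algebra, mu):
    """Check the splitting condition against every test set A in P(X)."""
    E, points = frozenset(E), list(X)
    for r in range(len(points) + 1):
        for A in map(frozenset, combinations(points, r)):
            whole = outer_measure(A, algebra, mu)
            split = outer_measure(A & E, algebra, mu) + outer_measure(A - E, algebra, mu)
            if split != whole:
                return False
    return True

# Hypothetical example: X = {0, 1, 2, 3}, algebra generated by {0, 1} and {2, 3}.
X = frozenset({0, 1, 2, 3})
algebra = [frozenset(), frozenset({0, 1}), frozenset({2, 3}), X]
mu = {algebra[0]: 0, algebra[1]: 1, algebra[2]: 2, algebra[3]: 3}
```

In this toy example the outer measure agrees with the set function on the algebra and every member of the algebra passes the splitting test, in line with parts (b) and (c), while the singleton {0}, which is not in the algebra, fails it.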

Theorem 8.2.19 (the Hopf extension theorem). Under the standing assumptions,
the set function 𝜇 has an extension to a positive measure on the 𝜎-algebra 𝔐
generated by ℭ.

Proof. By theorem 8.2.6 (Carathéodory’s theorem), the collection $\mathfrak{N}$ of $\nu^*$-measurable
subsets of $X$ is a $\sigma$-algebra, and the restriction, $\nu$, of $\nu^*$ to $\mathfrak{N}$ is a positive measure.
Since every member of $\mathfrak{C}$ is $\nu^*$-measurable, $\mathfrak{M} \subseteq \mathfrak{N}$. The measure space we seek
is $(X, \mathfrak{M}, \nu)$. ∎

The next theorem establishes a sufficient condition for the uniqueness of the Hopf
extension.

Theorem 8.2.20. Suppose, in addition to the standing assumptions, that the


following assumption is satisfied:

$\sigma$-finiteness: there exists a sequence $(X_n)$ in $\mathfrak{C}$ such that $X = \bigcup_{n=1}^{\infty} X_n$, and, for
every $n \in \mathbb{N}$, $\mu(X_n) < \infty$.

Then the extension 𝜈 provided by the previous theorem is unique.

Proof. The following two facts are consequences of the 𝜎-finiteness assumption.

(a) The sequence $(X_n)$ may be assumed to be mutually disjoint, because we can
replace it with the sequence $Y_1 = X_1$, and, for $n \ge 2$, $Y_n = X_n - \bigcup_{i=1}^{n-1} X_i$. Clearly,
$Y_n \in \mathfrak{C}$, $\mu(Y_n) \le \mu(X_n) < \infty$, and $\bigcup_{n=1}^{\infty} Y_n = X$.

(b) An arbitrary member $E \in \mathfrak{M}$ can be written as $E = \bigcup_{n=1}^{\infty} E_n$, where $(E_n)$ is a
disjoint sequence in $\mathfrak{M}$ such that $\nu(E_n) < \infty$. We simply set $E_n = E \cap Y_n$. Then
$\nu(E_n) \le \nu(Y_n) = \mu(Y_n) < \infty$.

We now prove the result. Suppose there is another measure that extends $\mu$ from $\mathfrak{C}$
to $\mathfrak{M}$. We continue to use the symbol $\mu$ to denote this extension. Thus we assume
that $\mu(C) = \nu(C)$ for every $C \in \mathfrak{C}$ and prove that $\mu(E) = \nu(E)$ for every $E \in \mathfrak{M}$.
Observe the following facts:

(c) If $\{C_n\} \subseteq \mathfrak{C}$, and $C = \bigcup_{n=1}^{\infty} C_n$, then $\nu(C) = \lim_n \nu(\bigcup_{i=1}^{n} C_i) = \lim_n \mu(\bigcup_{i=1}^{n} C_i) = \mu(C)$.

(d) For every $E \in \mathfrak{M}$, $\mu(E) \le \nu(E)$. If $E \subseteq \bigcup_{n=1}^{\infty} C_n$, where $\{C_n\} \subseteq \mathfrak{C}$, then
$\mu(E) \le \mu(\bigcup_{n=1}^{\infty} C_n) \le \sum_{n=1}^{\infty} \mu(C_n)$. By the definition of $\nu$, $\mu(E) \le \nu(E)$.

(e) If $E \in \mathfrak{M}$ and $\nu(E) < \infty$, then $\mu(E) = \nu(E)$. Let $\epsilon > 0$. There exists a
sequence $(C_n)$ in $\mathfrak{C}$ such that $E \subseteq C = \bigcup_{n=1}^{\infty} C_n$ and $\sum_{n=1}^{\infty} \mu(C_n) < \nu(E) + \epsilon$.
Now $\nu(C) \le \sum_{n=1}^{\infty} \nu(C_n) = \sum_{n=1}^{\infty} \mu(C_n) < \nu(E) + \epsilon$. In particular, $\nu(C - E) < \epsilon$.
Using fact (c), we have $\nu(E) \le \nu(C) = \mu(C) = \mu(E) + \mu(C - E) \le \mu(E) + \nu(C - E) < \mu(E) + \epsilon$.
Since $\epsilon$ is arbitrary, $\nu(E) \le \mu(E)$. Now $\mu(E) = \nu(E)$ by
fact (d).

Finally, for an arbitrary set $E \in \mathfrak{M}$, we use fact (b) to write $E = \bigcup_{n=1}^{\infty} E_n$, where
$(E_n)$ is a disjoint sequence in $\mathfrak{M}$, such that $\nu(E_n) < \infty$. Using fact (e),
$\nu(E) = \sum_{n=1}^{\infty} \nu(E_n) = \sum_{n=1}^{\infty} \mu(E_n) = \mu(E)$. ∎

Exercises

1. Let $\mathcal{A} = \{A_1, \dots, A_n\}$ be distinct subsets of a nonempty set $X$. Show that the
$\sigma$-algebra generated by $\mathcal{A}$ contains at most $2^{2^n}$ members.
2. Show that if a 𝜎-algebra 𝔐 is infinite, then it is uncountable.
3. Completion of an incomplete measure. Let $(X, \mathfrak{M}, \mu)$ be an incomplete
measure space, and let $\mathfrak{Z}$ be the collection of subsets of sets of $\mu$-measure
0. Let $\overline{\mathfrak{M}}$ be the smallest $\sigma$-algebra in $X$ that contains $\mathfrak{M} \cup \mathfrak{Z}$. Prove that
every member of $\overline{\mathfrak{M}}$ has the form $E \cup Z$, where $E \in \mathfrak{M}$ and $Z \in \mathfrak{Z}$. Extend
the definition of $\mu$ to $\overline{\mathfrak{M}}$ as follows: for $E \in \mathfrak{M}$ and $Z \in \mathfrak{Z}$, $\overline{\mu}(E \cup Z) = \mu(E)$.
Show that $\overline{\mu}$ is well defined and that $\overline{\mu}$ is a complete measure. Hint: Show
that the set $\mathfrak{M}_1 = \{E \cup Z : E \in \mathfrak{M}, Z \in \mathfrak{Z}\}$ is a $\sigma$-algebra.
4. This exercise provides a useful alternative characterization of the completion
of a measure space. In the notation of the previous exercise, prove that,
for a subset $E$ of $X$, $E \in \overline{\mathfrak{M}}$ if and only if there exist two sets $A$ and $B$ in $\mathfrak{M}$
such that $A \subseteq E \subseteq B$ and $\mu(B - A) = 0$.
5. Prove that each of the following collections of sets generates ℬ(ℝ):
(a) {(a, ∞) ∶ a ∈ ℝ}
(b) {(−∞, b) ∶ b ∈ ℝ}
6. Prove that the collection of open boxes $\{\prod_{i=1}^{n} (a_i, b_i) : a_i, b_i \in \mathbb{Q}\}$ generates
$\mathcal{B}(\mathbb{R}^n)$.
7. Suppose 𝔐 is a 𝜎-algebra generated by a collection ℭ of subsets of a
nonempty set X. Prove that 𝔐 is the union of the 𝜎-algebras generated
by 𝔉 where 𝔉 ranges over all the countable subsets of ℭ. Hint: The latter
union is a 𝜎-algebra.
8. Prove that if E and F are measurable sets such that 𝜇(EΔF) = 0, then 𝜇(E) =
𝜇(F ) = 𝜇(E ∪ F) = 𝜇(E ∩ F).

9. Let $E_n$ be a sequence of measurable sets such that $\sum_{n=1}^{\infty} \mu(E_n) < \infty$. Prove
that the set $\bigcap_{n=1}^{\infty} \bigcup_{k \ge n} E_k$ has measure 0. Conclude that, except for a set of
measure 0, every $x \in X$ belongs to finitely many of the sets $E_n$.
10. Let $E_1, \dots, E_n$ be measurable sets and, for $1 \le j \le n$, let $F_j$ be the set
of points in $X$ that belong to exactly $j$ of the sets $E_1, \dots, E_n$. Prove that
$\mu(\bigcup_{i=1}^{n} E_i) = \sum_{j=1}^{n} \mu(F_j)$, and $\sum_{i=1}^{n} \mu(E_i) = \sum_{j=1}^{n} j\mu(F_j)$. Hint:
$F_j = \{x \in X : \sum_{i=1}^{n} \chi_{E_i}(x) = j\}$.
11. Prove theorem 8.2.9.
12. Show that if f is measurable and a ∈ ℝ, then f−1 (a) is measurable.
13. Let (X, 𝔐) be a measurable space such that 𝔐 ≠ 𝒫(X). Prove that there is
a function f such that | f | is measurable but f is not.
14. Suppose that (X, 𝔐) is a measurable space and that Y is a nonempty set.
Show that if f ∶ X → Y, then the collection 𝔑 = {E ⊆ Y ∶ f−1 (E) ∈ 𝔐} is a
𝜎-algebra.
15. Let (X, 𝔐) be a measurable space, and let f ∶ X → ℝ be a measurable
function. Show that f−1 (B) is measurable for every Borel subset B of ℝ.
Hint: The collection Ω = {E ⊆ ℝ ∶ f−1 (E) ∈ 𝔐} contains all open subsets
of ℝ.
16. Let X be a topological space, and let f ∶ X → ℝ be a continuous function.
Show that f−1 (B) is a Borel subset of X for every Borel subset B of ℝ.
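17. Show that if $E \in \mathcal{B}(\mathbb{R}^r)$ and $F \in \mathcal{B}(\mathbb{R}^s)$, then $E \times F \in \mathcal{B}(\mathbb{R}^{r+s})$. Hint: For
an open subset $E$ of $\mathbb{R}^r$, consider the collection $\Omega = \{F \subseteq \mathbb{R}^s : E \times F \in \mathcal{B}(\mathbb{R}^{r+s})\}$.
Show that $\mathcal{B}(\mathbb{R}^s) \subseteq \Omega$. Then, for a Borel subset $F$ of $\mathbb{R}^s$, consider
the collection $\{E \subseteq \mathbb{R}^r : E \times F \in \mathcal{B}(\mathbb{R}^{r+s})\}$.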

18. Let $C$ be the Cantor set, and define a function $f : [0, 1] \to C$ as follows:
$f(0) = 0$, and, for $x \in (0, 1]$, write $x = \sum_{i=1}^{\infty} \frac{a_i}{2^i}$ and set $f(x) = \sum_{i=1}^{\infty} \frac{2a_i}{3^i}$.⁴ Show
that $f$ is Borel measurable. Hint: For a fixed $i \in \mathbb{N}$, define $f_i(x) = a_i$. It is
enough to show that $f_i$ is measurable. To this end, show that
$f_i = \sum \{\chi_{E_{i,k}} : k = 1, 3, 5, \dots, 2^i - 1\}$, where $E_{i,k} = (\frac{k}{2^i}, \frac{k+1}{2^i}]$.

8.3 Abstract Integration

In this section, we examine Lebesgue’s revolutionary approach to the definition of


the integral. The motivation below is imprecise and does not rigorously develop
any particular set of ideas. For the sake of simplicity, we assume that f is a positive
continuous function on a compact interval.
The Riemann integral is based on the simple geometrical idea of dividing the
region below the graph of f into thin vertical strips, where the area below the graph
is approximated by the integral of a step function (the Riemann sum). Lebesgue’s
idea was to divide the range of f by points y0 , … , yn , and, for k = 1, … , n, we consider
the sets Ek = f−1 ([yk , yk+1 )). Even for an uncomplicated function, the set Ek may
come in several fragments, as shown in figure 8.1, where Ek has three fragments.
When yk+1 − yk is small, the approximate combined area of the three shaded strips
is the approximate common height, yk , times the sum of the lengths of the three
fragments that comprise the set Ek or, more precisely, the measure of Ek . Thus

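[Figure 8.1 Lebesgue integration. The levels $y_k$ and $y_{k+1}$ are marked on the vertical axis, and the three fragments of $E_k$ on the horizontal axis.]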

⁴ We use the series representation of x if x has a terminating binary expansion.



the approximate area below the graph is $\sum_{k=1}^{n} y_k \mu(E_k)$, which, by definition, is
the integral of a simple function. Needless to say, as the partition of the range
of f gets finer, we expect the integrals of the simple functions to converge to the
integral of f. This is the overarching idea in Lebesgue integration. As it turns out,
we can integrate far more functions under the Lebesgue definition than under
the Riemann definition. For example, the integral of any positive measurable
function is defined, although it may not be finite. Additionally, the definition of the
integral extends seamlessly to abstract measure spaces. The results of this section make
the above ideas precise. First we define the integral of a positive measurable function
f, then we show that f is the limit of simple functions, sn , and then we show
that ∫X fd𝜇 = limn ∫X sn d𝜇. Extending the definition of the integral to complex
functions follows without difficulty. The section concludes with three important
convergence theorems.
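
The motivation above can be tried numerically. The following is a rough sketch, not part of the formal development: it slices the range of $f$ into equal levels and estimates the measure of each set $E_k$ by counting midpoints of a fine grid on $[a, b]$. The grid-based estimate of $\mu(E_k)$, the function name `lebesgue_sum`, and the test function $f(x) = x^2$ are all assumptions made for illustration.

```python
def lebesgue_sum(f, a, b, n_levels, n_grid=100_000):
    """Approximate the integral of f over [a, b] by partitioning the RANGE of f."""
    h = (b - a) / n_grid
    # Sample f at grid midpoints; each sample stands for a cell of length h.
    ys = [f(a + (i + 0.5) * h) for i in range(n_grid)]
    y_min, y_max = min(ys), max(ys)
    if y_max == y_min:
        return y_min * (b - a)
    step = (y_max - y_min) / n_levels
    counts = [0] * n_levels
    for y in ys:
        # Index k of the level set E_k = {y_k <= f < y_k + step} containing this sample.
        k = min(int((y - y_min) / step), n_levels - 1)
        counts[k] += 1
    # Sum of (lower level y_k) times (estimated measure of E_k).
    return sum((y_min + k * step) * counts[k] * h for k in range(n_levels))

approx = lebesgue_sum(lambda x: x * x, 0.0, 1.0, n_levels=1000)
```

With 1000 levels the value of `approx` should lie slightly below 1/3, and refining the partition of the range brings it closer.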

Definition. Let (X, 𝔐) be a measurable space. A simple function on X is a


function $s : X \to \mathbb{C}$ of finite range. If $a_1, \dots, a_m$ are the distinct values of $s$, then
$s = \sum_{i=1}^{m} a_i \chi_{E_i}$, where $E_i = s^{-1}(a_i)$. Clearly, $E_i \cap E_j = \emptyset$ if $i \ne j$.

Remarks. (a) It is clear that a simple function is measurable if and only if each
Ei is a measurable set. Also, a simple function need not have bounded support.
For example, $s = \chi_{(-\infty,0)} + \chi_{(1,2)}$ is not supported on a bounded set.

(b) Our definition of a simple function is sometimes referred to as the standard


form of a simple function. It is important to understand that any finite linear
combination of characteristic functions of disjoint sets is a simple function. For
example, if $s = \sum_{i=1}^{m} a_i \chi_{E_i}$, where $a_1, \dots, a_m$ are not all distinct, we can rewrite
$s$ in standard form as follows: Let $b_1, \dots, b_n$ be the distinct values of $s$ and, for
$1 \le i \le n$, let $T_i = \{j \in \mathbb{N}_m : a_j = b_i\}$. Note that $\{T_1, \dots, T_n\}$ is a partition of $\mathbb{N}_m$.
Set $B_i = \bigcup_{j \in T_i} E_j$. The sets $B_i$ are disjoint since the sets $E_i$ are. Clearly,
$s = \sum_{i=1}^{n} b_i \chi_{B_i}$. It is, in fact, true that a finite linear combination of characteristic
functions of subsets of $X$ is a simple function, even when the sets $E_i$ overlap. An
inductive proof is possible. We invite the reader to try it.

Definition. The integral of a simple function. Let $(X, \mathfrak{M}, \mu)$ be a measure space,
and let $s = \sum_{i=1}^{m} a_i \chi_{E_i}$ be a measurable simple function in standard form. The
integral of $s$ with respect to $\mu$ is

\[ \int_X s\,d\mu = \sum_{i=1}^{m} a_i \mu(E_i). \]

The above formula is robust in the sense that it is valid even when s is not
in standard form. This follows from remark (b) above. If a1 , … , am are not

all distinct, write $s$ in standard form using remark (b): $s = \sum_{i=1}^{n} b_i \chi_{B_i}$. Then
$\sum_{i=1}^{n} b_i \mu(B_i) = \sum_{i=1}^{n} b_i \sum_{j \in T_i} \mu(E_j) = \sum_{j=1}^{m} a_j \mu(E_j)$.
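
The robustness of the formula is easy to check on a finite measure space. The sketch below assumes a hypothetical four-point space in which the measure of a set is the sum of point weights; the helper names `mu`, `integrate_simple`, and `standard_form` are ours. A simple function is stored as a list of pairs (value, set) with disjoint sets, and the set on which the function vanishes is omitted because it contributes nothing to the sum.

```python
def mu(E, weights):
    """Measure of E when every point x carries the mass weights[x]."""
    return sum(weights[x] for x in E)

def integrate_simple(terms, weights):
    """Sum of a_i * mu(E_i) over the terms (a_i, E_i) of a simple function."""
    return sum(a * mu(E, weights) for a, E in terms)

def standard_form(terms):
    """Merge the sets that share a value, as in remark (b)."""
    merged = {}
    for a, E in terms:
        merged[a] = merged.get(a, frozenset()) | frozenset(E)
    return sorted(merged.items(), key=lambda pair: pair[0])

# Hypothetical data: the value 5 is repeated, so s is not in standard form.
weights = {1: 1, 2: 2, 3: 3, 4: 4}
s = [(5, {1}), (7, {2}), (5, {3})]
before = integrate_simple(s, weights)
after = integrate_simple(standard_form(s), weights)
```

Both `before` and `after` equal 34 here, as the computation preceding this example predicts.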

Theorem 8.3.1. If $s$ and $t$ are simple functions and $c \in \mathbb{K}$, then

\[ \int_X (s + t)\,d\mu = \int_X s\,d\mu + \int_X t\,d\mu, \quad \text{and} \quad \int_X cs\,d\mu = c \int_X s\,d\mu. \]

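Proof. The second identity follows trivially from the definition.

Let $s = \sum_{i=1}^{m} a_i \chi_{E_i}$ and $t = \sum_{j=1}^{n} b_j \chi_{F_j}$ be simple functions in standard form, and
let $B_{ij} = E_i \cap F_j$, $1 \le i \le m$, $1 \le j \le n$. The collection $\{B_{ij}\}$ is disjoint, $E_i = \bigcup_{j=1}^{n} B_{ij}$,
and $F_j = \bigcup_{i=1}^{m} B_{ij}$. Now $s = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i \chi_{B_{ij}}$, $t = \sum_{j=1}^{n} \sum_{i=1}^{m} b_j \chi_{B_{ij}}$, and
$s + t = \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \chi_{B_{ij}}$. By definition,

\begin{align*}
\int_X (s + t)\,d\mu &= \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \mu(B_{ij}) \\
&= \sum_{i=1}^{m} a_i \sum_{j=1}^{n} \mu(B_{ij}) + \sum_{j=1}^{n} b_j \sum_{i=1}^{m} \mu(B_{ij}) \\
&= \sum_{i=1}^{m} a_i \mu(E_i) + \sum_{j=1}^{n} b_j \mu(F_j) = \int_X s\,d\mu + \int_X t\,d\mu.
\end{align*} ∎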

Remark. This proof includes a proof of the fact that the sum of two simple
functions is a simple function.
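
The common refinement $B_{ij} = E_i \cap F_j$ used in the proof can be written out directly. This is a small sketch on a hypothetical weighted four-point space; the names `integral` and `add_simple` are ours, and both simple functions are assumed to be given on partitions of the whole space, so that every point lies in exactly one $B_{ij}$.

```python
def integral(terms, weights):
    """Sum of a * mu(E), where mu(E) is the total weight of the points of E."""
    return sum(a * sum(weights[x] for x in E) for a, E in terms)

def add_simple(s, t):
    """Represent s + t on the nonempty sets B_ij = E_i & F_j."""
    return [(a + b, E & F) for a, E in s for b, F in t if E & F]

# Hypothetical data on X = {1, 2, 3, 4}.
weights = {1: 1, 2: 2, 3: 3, 4: 4}
s = [(1, frozenset({1, 2})), (3, frozenset({3, 4}))]
t = [(10, frozenset({1, 3})), (20, frozenset({2, 4}))]
total = integral(add_simple(s, t), weights)
```

Here the integrals of `s` and `t` are 24 and 160, and `total` is 184, their sum.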

Definition. The integral of a positive function. Let $f : X \to [0, \infty]$ be a measurable
function. We define

\[ \int_X f\,d\mu = \sup\Big\{ \int_X s\,d\mu : 0 \le s \le f,\ s \text{ simple} \Big\}. \]

Observe that this definition is reminiscent of the fact that the Riemann integral
of a function is the supremum of the lower Riemann sums of the function
and that a lower Riemann sum of a function is the integral of a step function
dominated by f.

The following facts are immediately obvious:

(a) ∫X cfd𝜇 = c ∫X fd𝜇 for c ≥ 0, and


(b) If g ∶ X → [0, ∞] is measurable and 0 ≤ f ≤ g, then ∫X fd𝜇 ≤ ∫X gd𝜇.

The fact that, for positive functions f and g, ∫X (f + g)d𝜇 = ∫X fd𝜇 + ∫X gd𝜇 requires
the development of some machinery. First we show that a positive measurable
function f is the limit of a sequence, sn , of simple functions, then we show that
limn ∫X sn d𝜇 = ∫X fd𝜇. The details appear below.

Definition. The positive and negative parts of a measurable real-valued func-


tion f are, respectively,

f+ (x) = max{f (x), 0}, and f− (x) = −min{f (x), 0}.

Observe that f+ and f− are positive, measurable functions,

f = f+ − f− , and | f | = f+ + f− .

Theorem 8.3.2. (a) Let f ∶ X → [0, ∞] be a measurable function. Then there exists
an increasing sequence of simple functions s1 , s2 , ... such that limn sn (x) = f (x)
for every x ∈ X.
(b) Let f ∶ X → ℂ be a measurable function. Then there exists a sequence of
simple functions u1 , u2 , ... such that limn un (x) = f (x) for every x ∈ X and
|u1 | ≤ |u2 | ≤ ... ≤ | f |.

Proof. For each $n \in \mathbb{N}$, define $E_{n,k} = \{x \in X : \frac{k-1}{2^n} \le f(x) < \frac{k}{2^n}\}$, $k = 1, 2, \dots, n2^n$,
and $F_n = \{x \in X : f(x) \ge n\}$. Let

\[ s_n = \sum_{k=1}^{n2^n} \frac{k-1}{2^n} \chi_{E_{n,k}} + n\chi_{F_n}. \]

The fact that $s_n(x) \le f(x)$ is clear. Now every $x \in X$ belongs to exactly one of
the sets $E_{n,k}$ or to $F_n$. We show that $s_n$ is an increasing sequence of functions. If
$\frac{2(k-1)}{2^{n+1}} \le f(x) < \frac{2k-1}{2^{n+1}}$, then $s_n(x) = \frac{2(k-1)}{2^{n+1}} = s_{n+1}(x)$. If $\frac{2k-1}{2^{n+1}} \le f(x) < \frac{2k}{2^{n+1}}$, then
$s_n(x) = \frac{2(k-1)}{2^{n+1}} < \frac{2k-1}{2^{n+1}} = s_{n+1}(x)$. If $f(x) \ge n$, $n = s_n(x) \le s_{n+1}(x)$. Now we show
that $\lim_n s_n(x) = f(x)$. If $f(x) < \infty$, $0 \le f(x) - s_n(x) \le 1/2^n$ for every $n > f(x)$. If $f(x) = \infty$,
$s_n(x) = n$ for all $n \in \mathbb{N}$. In either case, $\lim_n s_n(x) = f(x)$.

To prove (b), write $f = (f_1^+ - f_1^-) + i(f_2^+ - f_2^-)$. By part (a), there are sequences
of positive, increasing, measurable simple functions $s_n^+$, $s_n^-$, $t_n^+$, and $t_n^-$ such that
$\lim_n s_n^+ = f_1^+$, $\lim_n s_n^- = f_1^-$, $\lim_n t_n^+ = f_2^+$, and $\lim_n t_n^- = f_2^-$. The sequence of simple
functions $u_n = (s_n^+ - s_n^-) + i(t_n^+ - t_n^-)$ satisfies the requirements of part (b). ∎

Remark. This proof shows that if $f$ is a bounded, positive, measurable function,
then $s_n$ converges uniformly to $f$ because $\|s_n - f\|_\infty \le 1/2^n$ for every $n > \sup f$.
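
The dyadic construction in the proof of theorem 8.3.2(a) depends only on the value $f(x)$, so it can be sketched as a function of that value. The name `s_n` and the sample values below are illustrative assumptions; multiplication by a power of 2 is exact in floating point, so the floor reproduces the index $k - 1$ of the set $E_{n,k}$ containing $x$.

```python
import math

def s_n(value, n):
    """Value of the n-th dyadic simple function at a point where f equals `value`."""
    if value >= n:                      # the point lies in F_n (this covers value = inf)
        return float(n)
    # Otherwise (k - 1) / 2**n <= value < k / 2**n with k - 1 = floor(value * 2**n).
    return math.floor(value * 2**n) / 2**n

samples = [0.0, 0.3, 0.7, 2.5, 5.3, math.inf]
approximations = {v: [s_n(v, n) for n in range(1, 12)] for v in samples}
```

For each sample value the list in `approximations` is nondecreasing and never exceeds the value, and once $n$ exceeds a finite value the error is at most $2^{-n}$.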

Lemma 8.3.3. Let fn ∶ X → [0, ∞] be an increasing sequence of measurable func-


tions such that limn fn (x) = f (x) for all x ∈ X. If s is a simple measurable function
such that 0 ≤ s ≤ f, then limn ∫X fn d𝜇 ≥ ∫X sd𝜇.
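Proof. Let $s = \sum_{j=1}^{m} a_j \chi_{E_j}$. Fix $0 < \alpha < 1$, and define $B_n = \{x \in X : f_n(x) \ge \alpha s(x)\}$.
It is easy to see that $B_n \subseteq B_{n+1}$ and that $\bigcup_{n=1}^{\infty} B_n = X$: if $s(x) = 0$, then $x \in B_n$ for
every $n$, and if $s(x) > 0$, then $f(x) \ge s(x) > \alpha s(x)$, so $f_n(x) > \alpha s(x)$ for all large $n$.
Notice that $(E_j \cap B_n)$ is an ascending sequence of sets, and $E_j = \bigcup_{n=1}^{\infty} (E_j \cap B_n)$;
hence $\mu(E_j) = \lim_n \mu(E_j \cap B_n)$. Now
$\int_X f_n\,d\mu \ge \int_X f_n \chi_{B_n}\,d\mu \ge \alpha \int_X s\chi_{B_n}\,d\mu = \alpha \sum_{j=1}^{m} a_j \mu(E_j \cap B_n)$. Taking the
limit as $n \to \infty$, we obtain $\lim_n \int_X f_n\,d\mu \ge \alpha \sum_{j=1}^{m} a_j \mu(E_j) = \alpha \int_X s\,d\mu$. The result
we need follows by letting $\alpha \to 1$. ∎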

Theorem 8.3.4. Let f ≥ 0 be a measurable function, and let 0 ≤ sn ≤ sn+1 be simple


functions such that sn ≤ f, and limn sn (x) = f (x). Then ∫X fd𝜇 = limn ∫X sn d𝜇.

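Proof. Since $0 \le s_n \le f$, $\int_X s_n\,d\mu \le \int_X f\,d\mu$, and $\lim_n \int_X s_n\,d\mu \le \int_X f\,d\mu$. Now if $t$ is a
simple function such that $0 \le t \le f$, then, by lemma 8.3.3, $\int_X t\,d\mu \le \lim_n \int_X s_n\,d\mu$.
Therefore, $\int_X f\,d\mu = \sup\{\int_X t\,d\mu : 0 \le t \le f,\ t \text{ simple}\} \le \lim_n \int_X s_n\,d\mu$. ∎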

Theorem 8.3.5. If f ≥ 0 and g ≥ 0 are measurable functions and a, b ≥ 0, then


∫X (af + bg)d𝜇 = a ∫X fd𝜇 + b ∫X gd𝜇.

Proof. By theorem 8.3.2, there exist sequences of simple functions s1 ≤ s2 ≤ … , and


t1 ≤ t2 ≤ ... such that limn sn (x) = f (x), and limn tn (x) = g(x). By theorem 8.3.1,
∫X (asn + btn )d𝜇 = a ∫X sn d𝜇 + b ∫X tn d𝜇. Now the sequence of simple functions
asn + btn is increasing, and limn (asn (x) + btn (x)) = af (x) + bg(x). Thus,
by theorem 8.3.4, ∫X (af + bg)d𝜇 = limn ∫X (asn + btn )d𝜇 = a limn ∫X sn d𝜇 +
$b \lim_n \int_X t_n\,d\mu = a \int_X f\,d\mu + b \int_X g\,d\mu$. ∎

Definition. The integral of a real function. Let f ∶ X → [−∞, ∞] be a measurable


function, and write f = f+ − f− . By definition,

\[ \int_X f\,d\mu = \int_X f^+\,d\mu - \int_X f^-\,d\mu, \]

provided that at least one of the integrals on the right-hand side of the definition
is finite. We say f is integrable if both ∫X f+ d𝜇 and ∫X f− d𝜇 are finite, which
is equivalent to the condition that ∫X | f |d𝜇 < ∞. This is because | f | = f+ + f− ,
f+ ≤ | f |, and f− ≤ | f |.

Theorem 8.3.6. If f and g are real and integrable, then ∫X (f + g)d𝜇 = ∫X fd𝜇 +
∫X gd𝜇. Also, ∫X afd𝜇 = a ∫X fd𝜇 for every real number a.

Proof. Write f = f+ − f− , g = g+ − g− , and let h = f + g. Writing h = h+ − h−


yields h+ + f− + g− = h− + f+ + g+ . Integrating both sides of the last identity
and using the previous theorem, we obtain ∫X h+ d𝜇 + ∫X f− d𝜇 + ∫X g− d𝜇 =
∫X h− d𝜇 + ∫X f+ d𝜇 + ∫X g+ d𝜇. The result we seek is obtained by rearranging
the last identity. To complete the proof of the theorem, we only need to show
that ∫X (−f )d𝜇 = − ∫X fd𝜇. This is a simple calculation if we use the identities
$(-f)^+ = f^-$, and $(-f)^- = f^+$. ∎

Definition. The integral of a complex function. If f ∶ X → ℂ, and f = f1 + if2 ,


define ∫X fd𝜇 = ∫X f1 d𝜇 + i ∫X f2 d𝜇. We say that f is integrable if f1 and f2 are
integrable.

Notice that a complex function is integrable if and only if ∫X | f |d𝜇 < ∞. This is
because | f1 | ≤ | f |, | f2 | ≤ | f |, and | f | ≤ | f1 | + | f2 |.

Theorem 8.3.7. If f and g are integrable complex functions, and a, b ∈ ℂ, then


af + bg is integrable and

\[ \int_X (af + bg)\,d\mu = a \int_X f\,d\mu + b \int_X g\,d\mu. \]

Proof. ∫X |af + bg|d𝜇 ≤ ∫X |a|| f | + |b||g|d𝜇 = |a| ∫X | f |d𝜇 + |b| ∫X |g|d𝜇 < ∞. Thus
af + bg is integrable. The verification that ∫X (f + g)d𝜇 = ∫X fd𝜇 + ∫X gd𝜇 is a
routine calculation, as is the fact that ∫X cfd𝜇 = c ∫X fd𝜇 when c is a real constant.
It now suffices to show that $\int_X if\,d\mu = i \int_X f\,d\mu$. Indeed, $\int_X if\,d\mu = \int_X i(f_1 + if_2)\,d\mu =
\int_X (-f_2 + if_1)\,d\mu = \int_X -f_2\,d\mu + i \int_X f_1\,d\mu = i \int_X f\,d\mu$. ∎

It is easy to see that the set of complex integrable functions is a vector space. We
denote it by 𝔏1 (𝜇). In fact, if a norm is defined on 𝔏1 (𝜇) by ‖ f‖1 = ∫X | f |d𝜇, then
𝔏1 (𝜇) is a normed linear space, as the reader can easily verify.

Definition. Let (X, 𝔐, 𝜇) be a measure space, and let P(x) be a property that may
or may not be satisfied by a point x ∈ X. For example, for a given extended real-
valued function f, P(x) may be the property that f (x) is finite. Another example
is the property that f (x) = g(x) for two measurable functions f and g. We say
that property P holds for almost every x in a measurable set E, or that P holds
almost everywhere in E, if 𝜇({x ∈ E ∶ P(x) is false}) = 0. In this situation, we
often write “P holds for a.e. x ∈ E.” The examples below are good illustrations of
the concept.

Example 1. If f ∈ 𝔏1 (𝜇), then f is finite almost everywhere.



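Let $E_n = \{x \in X : |f(x)| \ge n\}$. It is clear that $E_1 \supseteq E_2 \supseteq E_3 \supseteq \dots$, and that,
for each $n \in \mathbb{N}$, $\int_X |f|\,d\mu \ge \int_X |f|\chi_{E_n}\,d\mu \ge \int_X n\chi_{E_n}\,d\mu = n\mu(E_n)$. Thus
$\mu(E_n) \le \frac{1}{n} \int_X |f|\,d\mu \to 0$ as $n \to \infty$. Therefore $\mu(\bigcap_{n=1}^{\infty} E_n) = \lim_n \mu(E_n) = 0$. But
$\bigcap_{n=1}^{\infty} E_n = \{x \in X : |f(x)| = \infty\}$. This shows that $f$ is finite almost everywhere. ∎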

Example 2. If f ∈ 𝔏1 (𝜇), then | ∫X fd𝜇| ≤ ∫X | f |d𝜇.


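Assume that $z = \int_X f\,d\mu \ne 0$ because, otherwise, there is nothing to prove.
Let $\alpha = |z|/z$. Then $|\alpha| = 1$, and $\alpha z = |z|$. Now $|\int_X f\,d\mu| = |z| = \alpha z = \alpha \int_X f\,d\mu =
\int_X (\alpha f)\,d\mu$. It follows that $\int_X \alpha f\,d\mu$ is real and positive; therefore $\int_X \alpha f\,d\mu =
\int_X u\,d\mu$, where $u = \operatorname{Re}(\alpha f)$. Now $u \le |\alpha f| = |f|$, and $\int_X u\,d\mu \le \int_X |\alpha f|\,d\mu =
\int_X |f|\,d\mu$. ∎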

Definition. If f ∶ X → [0, ∞] is measurable and E ∈ 𝔐, we define

\[ \int_E f\,d\mu = \int_X f\chi_E\,d\mu. \]

Note that if E1 ⊆ E2 , then ∫E1 fd𝜇 ≤ ∫E2 fd𝜇. Also, if 0 ≤ f ≤ g, then ∫E fd𝜇 ≤
∫E gd𝜇.
When $s = \sum_{j=1}^{m} a_j \chi_{E_j}$ is a simple function, then $s\chi_E = \sum_{j=1}^{m} a_j \chi_{E_j \cap E}$ is also a simple
function and

\[ \int_E s\,d\mu = \sum_{j=1}^{m} a_j \mu(E_j \cap E). \]

This equation can very well be used to define ∫E sd𝜇. One can then take the
alternative approach of defining

\[ \int_E f\,d\mu = \sup\Big\{ \int_E s\,d\mu : 0 \le s \le f,\ s \text{ simple} \Big\}. \]

The two methods of defining ∫E fd𝜇 are clearly equivalent, and the interested
reader is encouraged to work out the details of reconciling the two definitions.

Another detail must be mentioned here. If E ∈ 𝔐, one can restrict 𝔐 and 𝜇 to E


in the obvious way: Define 𝔐E to be the members of 𝔐 contained in E, and define
𝜇E to be the restriction of 𝜇 to 𝔐E . This clearly turns (E, 𝔐E , 𝜇E ) into a measure
space, and it makes sense to define ∫E fd𝜇 to be the integral of f|E with respect to
(E, 𝔐E , 𝜇E ). Again, this definition is consistent with the above definitions of ∫E fd𝜇,
and, again, we leave the details to the interested reader.

Example 3. Suppose f ∶ X → [0, ∞] is a measurable function. If ∫E fd𝜇 = 0 for


some measurable set E, then f = 0 a.e. on E. In particular, if ∫X fd𝜇 = 0, then
f = 0 a.e. on X.

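Let $E_n = \{x \in E : f(x) > \frac{1}{n}\}$. Then $\frac{1}{n}\mu(E_n) \le \int_{E_n} f\,d\mu \le \int_E f\,d\mu = 0$. Thus
$\mu(E_n) = 0$. The result now follows from the fact that $\{x \in E : f(x) > 0\} =
\bigcup_{n=1}^{\infty} E_n$, and $\mu(\bigcup_{n=1}^{\infty} E_n) \le \sum_{n=1}^{\infty} \mu(E_n) = 0$. ∎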

Example 4. If f is a measurable function and ∫E fd𝜇 = 0, for every measurable set


E, then f = 0 a.e.
Without loss of generality, assume f is real. Let E = {x ∈ X ∶ f (x) ≥ 0}. By
assumption, ∫E fd𝜇 = 0. But ∫E fd𝜇 = ∫X f+ d𝜇. By example 3, f+ = 0 a.e. on X.
Similarly, $f^- = 0$ a.e. on $X$. ∎

Convergence Theorems

Theorem 8.3.8 (Fatou’s theorem). Let fn ∶ X → [0, ∞] be a sequence of measur-


able functions. Then

\[ \int_X \liminf_n f_n\,d\mu \le \liminf_n \int_X f_n\,d\mu. \]

Proof. Let gn = infk≥n fk . Then 0 ≤ g1 ≤ g2 ≤ … , and let f (x) = limn gn (x). Note that
f (x) = lim infn fn (x). If s is a simple function such that 0 ≤ s ≤ f, then, by lemma
8.3.3, ∫X sd𝜇 ≤ limn ∫X gn d𝜇. Hence ∫X fd𝜇 = sup{∫X sd𝜇 ∶ s ≤ f} ≤ limn ∫X gn d𝜇.
Since $g_n \le f_n$, $\int_X g_n\,d\mu \le \int_X f_n\,d\mu$, and $\lim_n \int_X g_n\,d\mu \le \liminf_n \int_X f_n\,d\mu$. ∎
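
The inequality in Fatou's theorem can be strict. The following worked example is a sketch under an assumption not made in the text above: $X = \mathbb{N}$ with the counting measure $\mu$, so that the integral of a positive function is the sum of its values.

```latex
% Assumption: X = \mathbb{N} with the counting measure \mu, so \int_X f\,d\mu = \sum_{k} f(k).
\begin{align*}
f_n &= \chi_{\{n\}}, & \int_X f_n\,d\mu &= \mu(\{n\}) = 1 \quad \text{for every } n, \\
\liminf_n f_n(k) &= 0 \quad \text{for every } k \in \mathbb{N}, & \int_X \liminf_n f_n\,d\mu &= 0 < 1 = \liminf_n \int_X f_n\,d\mu .
\end{align*}
```

The unit of mass carried by $f_n$ moves off to infinity, which is exactly the loss that the inequality allows.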

Example 5. Let (fn ) be a convergent sequence in 𝔏1 (𝜇), and let f be its 𝔏1 -limit.
Then (fn ) contains a subsequence that converges to f for almost every x ∈ X.

Choose a subsequence $(f_{n_i})$ of $(f_n)$ such that, for $i \in \mathbb{N}$, $\|f_{n_i} - f\|_1 < 2^{-i}$. Let
$g_k = \sum_{i=1}^{k} |f_{n_i} - f|$. The functions $g_k$ are in $\mathfrak{L}^1$ and, by construction, $0 \le g_1 \le
g_2 \le \dots$, and $\|g_k\|_1 \le 1$. Let $g(x) = \lim_k g_k(x)$. By Fatou’s theorem, $\int_X g\,d\mu \le
\liminf_k \|g_k\|_1 \le 1$. This shows that $g \in \mathfrak{L}^1$.⁵ Since $g(x) = \sum_{i=1}^{\infty} |f_{n_i}(x) - f(x)|$,
it follows that the series $\sum_{i=1}^{\infty} |f_{n_i}(x) - f(x)|$ is convergent for a.e. $x \in X$ (by
example 1). In particular, $\lim_{i \to \infty} |f_{n_i}(x) - f(x)| = 0$ for a.e. $x \in X$. ∎

Theorem 8.3.9 (the monotone convergence theorem). If fn ∶ X → [0, ∞] is an


increasing sequence of measurable functions such that f (x) = limn fn (x) exists for
every x ∈ X, then

\[ \int_X f\,d\mu = \lim_n \int_X f_n\,d\mu. \]

⁵ One can also use the monotone convergence theorem to show that g ∈ 𝔏1 .

Proof. Since $f_n$ is increasing, $\liminf_n f_n = f$, and since $\int_X f_n\,d\mu$ is increasing,
$\liminf_n \int_X f_n\,d\mu = \lim_n \int_X f_n\,d\mu$. By Fatou’s theorem, $\int_X f\,d\mu \le \lim_n \int_X f_n\,d\mu$. Since
$f_n \le f$, $\int_X f_n\,d\mu \le \int_X f\,d\mu$, and $\lim_n \int_X f_n\,d\mu \le \int_X f\,d\mu$, and the proof is complete. ∎

Example 6. Let $f \in \mathfrak{L}^1$, and suppose that $X = \bigcup_{n=1}^{\infty} E_n$, where $E_n$ is an ascending
sequence of measurable sets. Then $\lim_n \int_{E_n} |f|\,d\mu = \int_X |f|\,d\mu$.
For $n \in \mathbb{N}$ define $f_n = |f|\chi_{E_n}$. It is clear that $f_n$ is increasing and that
$\lim_n f_n(x) = |f|(x)$. By the monotone convergence theorem, $\lim_n \int_{E_n} |f|\,d\mu =
\lim_n \int_X f_n\,d\mu = \int_X |f|\,d\mu$. ∎
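
Example 6 can be checked by hand in a discrete setting. The sketch below assumes $X = \mathbb{N}$ with the counting measure, the function $f(k) = 2^{-k}$, and the ascending sets $E_n = \{1, \dots, n\}$; these choices and the name `integral_over_E` are ours. Under the counting measure the integral over $E_n$ is a finite sum, computed here exactly with rational arithmetic.

```python
from fractions import Fraction

def integral_over_E(n):
    """Integral of f(k) = 2**-k over E_n = {1, ..., n} under the counting measure."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

values = [integral_over_E(n) for n in range(1, 21)]
```

The entries of `values` increase and equal $1 - 2^{-n}$, so they approach 1, which is the sum of the full series and hence the integral of $f$ over $X$.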

Theorem 8.3.10 (the dominated convergence theorem). Let fn be a sequence of


complex measurable functions, and let g ∈ 𝔏1 (𝜇) be such that | fn (x)| ≤ |g(x)|. If
f (x) = limn fn (x) exists for every x ∈ X, then

\[ f \in \mathfrak{L}^1(\mu) \quad \text{and} \quad \lim_n \int_X |f_n - f|\,d\mu = 0, \quad \text{that is, } f_n \to f \text{ in } \mathfrak{L}^1(\mu). \]

Proof. Notice that $|f_n(x)| \le |g(x)|$ implies that $|f(x)| \le |g(x)|$. Hence $f_n \in \mathfrak{L}^1(\mu)$,
and $f \in \mathfrak{L}^1(\mu)$. Since $|f_n - f| \le 2|g|$, we can apply Fatou’s theorem to the
sequence $2|g| - |f_n - f|$ to obtain $\int_X 2|g|\,d\mu \le \liminf_n \int_X (2|g| - |f_n - f|)\,d\mu = \int_X 2|g|\,d\mu -
\limsup_n \int_X |f_n - f|\,d\mu$. Because $\int_X 2|g|\,d\mu$ is finite, it can be cancelled. Hence
$\limsup_n \int_X |f_n - f|\,d\mu \le 0$, so $\limsup_n \int_X |f_n - f|\,d\mu = 0$. Since $\int_X |f_n - f|\,d\mu$ is a
nonnegative sequence, $\lim_n \int_X |f_n - f|\,d\mu = 0$, as desired. ∎
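
A small worked instance may help fix the hypotheses. It is a sketch under assumptions of our own choosing: $X = \mathbb{N}$ with the counting measure, the dominating function $g(k) = 2^{-k}$, and the sequence $f_n$ defined below.

```latex
% Assumptions: X = \mathbb{N}, \mu the counting measure, g(k) = 2^{-k}, so \int_X g\,d\mu = 1.
\begin{align*}
f_n(k) &= \frac{n}{n+k}\,2^{-k}, \qquad |f_n(k)| \le g(k), \qquad \lim_n f_n(k) = 2^{-k} = f(k), \\
\int_X |f_n - f|\,d\mu &= \sum_{k=1}^{\infty} \frac{k}{n+k}\,2^{-k} \le \frac{1}{n} \sum_{k=1}^{\infty} k\,2^{-k} = \frac{2}{n} \longrightarrow 0 .
\end{align*}
```

The explicit bound $2/n$ is consistent with the conclusion of the theorem; it uses the standard identity $\sum_{k \ge 1} k\,2^{-k} = 2$.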

Example 7. Let f ∈ 𝔏1 (𝜇). Then, for every 𝜖 > 0, there exists 𝛿 > 0 such that
whenever 𝜇(E) < 𝛿, ∫E |f|d𝜇 < 𝜖.
Suppose there exists a number $\epsilon > 0$ such that, for every $n \in \mathbb{N}$, there is a
measurable set $E_n$ such that $\mu(E_n) < 2^{-n}$, and $\int_{E_n} |f|\,d\mu \ge \epsilon$. Let $F_k = \bigcup_{n \ge k} E_n$,
and let $F = \bigcap_{k=1}^{\infty} F_k$. On the one hand, $\mu(F_k) \le \sum_{n=k}^{\infty} 2^{-n} = 2^{-k+1}$; hence $\mu(F) =
\lim_k \mu(F_k) = 0$, and $\int_F |f|\,d\mu = 0$. On the other hand, by the dominated convergence
theorem, $\int_F |f|\,d\mu = \lim_k \int_{F_k} |f|\,d\mu \ge \liminf_k \int_{E_k} |f|\,d\mu \ge \epsilon$. This
contradiction establishes the result. ∎

Exercises

In the problems below, (X, 𝔐, 𝜇) is a measure space.

1. Let f be a measurable function, and let g be a function such that f (x) = g(x)
for a.e. x ∈ X. Prove that g is measurable.
2. Define a relation on the collection of measurable functions as follows: f ≡ g
if f (x) = g(x) for a.e. x ∈ X. Prove that ≡ is an equivalence relation.

3. Let f be an integrable function, and let g be such that f (x) = g(x) a.e. Prove
that g is integrable and that ∫X fd𝜇 = ∫X gd𝜇.
4. Let $f \in \mathfrak{L}^1(\mu)$, and let $E = \{x \in X : |f(x)| > c\}$, where $c > 0$. Prove
the inequality (Tchebychev) $\mu(E) \le \frac{1}{c} \int_E |f|\,d\mu$. More generally, if $f$ is
measurable and $|f|^p \in \mathfrak{L}^1(\mu)$, then $\mu(E) \le \frac{1}{c^p} \int_E |f|^p\,d\mu$. Here $1 \le p < \infty$.
5. Let f ∈ 𝔏1 (𝜇). Show that the set E = {x ∈ X ∶ f (x) ≠ 0} is a countable union
of sets of finite measure.
6. Let f be a positive measurable function. Show that if E and F are measurable
sets such that 𝜇(EΔF) = 0, then ∫E fd𝜇 = ∫F fd𝜇.

7. Let $f_n$ be a sequence of measurable functions such that $\sum_{n=1}^{\infty} \int_X |f_n|\,d\mu < \infty$.
Show that the series $\sum_{n=1}^{\infty} |f_n(x)|$ converges a.e. in $X$.
8. Show that if 𝜇 is a finite measure and (fn ) is a sequence of bounded mea-
surable functions such that fn converges uniformly to f, then limn ∫X | fn −
f|d𝜇 = 0.
9. Let f ∈ 𝔏1 (𝜇). Prove that for every 𝜖 > 0, there exists a set E of finite measure
such that ∫E | f |d𝜇 > ‖ f‖1 − 𝜖.
10. Let (fn ) be a decreasing sequence of nonnegative measurable functions, and
let f = limn fn . Show that if f1 is integrable, then ∫X fd𝜇 = limn ∫X fn d𝜇.

8.4 Lebesgue Measure on ℝn

This section is the centerpiece of the chapter. The motivation for the definition of
the Lebesgue measure, as well as an extensive development of its properties, appears
later in the section. We must first furnish some needed background. The four leading
results in this section are valid for locally compact Hausdorff spaces, and this is
made abundantly clear in the excursion on Radon measures. We chose to limit the
bulk of the section to the Lebesgue measure because we do not wish to base this
section too heavily on chapter 5.

Preliminaries

Lemma 8.4.1 (Urysohn’s lemma). Let E and F be disjoint closed subsets of ℝn .
Then there exists a continuous function f ∶ ℝn → [0, 1] such that f(E) = 1, and
f(F ) = 0.

Proof. The functions g(x) = dist(x, F) and h(x) = dist(x, E) are continuous and are
never simultaneously zero since E and F are closed and disjoint. Furthermore,
g(x) > 0 for every x ∈ E, and h(x) > 0 for every x ∈ F.
The function f (x) = g(x)/(g(x) + h(x)) has the stated properties.
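
The formula in the proof is easy to test numerically. The following Python sketch is an illustration only: it assumes n = 1 and takes the hypothetical sets E = [0, 1] and F = [2, 3]; the names dist_to_interval and urysohn are ours and do not appear in the text.

```python
def dist_to_interval(x, lo, hi):
    """Distance from the point x to the closed interval [lo, hi]."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0

E = (0.0, 1.0)  # assumed closed set E = [0, 1]
F = (2.0, 3.0)  # assumed closed set F = [2, 3], disjoint from E

def urysohn(x):
    """f(x) = g(x) / (g(x) + h(x)) with g = dist(., F) and h = dist(., E)."""
    g = dist_to_interval(x, *F)
    h = dist_to_interval(x, *E)
    # g and h never vanish together because E and F are disjoint.
    return g / (g + h)

if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 10.0):
        print(x, urysohn(x))
```

On E the distance h vanishes, so f = 1; on F the distance g vanishes, so f = 0; everywhere else f lies strictly between 0 and 1.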

Lemma 8.4.2. Let K be a compact subset of an open subset V of ℝn . Then there exists
an open set U such that Ū (the closure of U) is compact and K ⊆ U ⊆ Ū ⊆ V.

Proof. For every x ∈ K, there exists a ball B(x, 𝛿x ) whose closure is contained in V. Since
K is compact, and K ⊆ ∪x∈K B(x, 𝛿x ), there exists a finite number of points
x1 , … , xm ∈ K such that K ⊆ ∪_{i=1}^m B(xi , 𝛿xi ). The set U = ∪_{i=1}^m B(xi , 𝛿xi ) satisfies the
requirements.

Definition. Let f ∶ ℝn → ℂ be a continuous function. The support of f, written
supp(f ), is the closure of the set {x ∈ ℝn ∶ f (x) ≠ 0}. A continuous function f ∶
ℝn → ℂ is said to be of compact support if supp(f ) is compact.

We use the notation 𝒞c (ℝn ) to denote the set of continuous, complex-valued
functions of compact support on ℝn . Clearly, 𝒞c (ℝn ) is a vector space. We also use
𝒞rc (ℝn ) to denote the set of continuous, real-valued functions of compact support
on ℝn .

Notation. Let K be a compact subset of ℝn , and let V be an open subset of ℝn . For a
function f ∈ 𝒞rc (ℝn ), we write f ≺ V to mean that 0 ≤ f ≤ 1 and supp(f ) ⊆ V. We
use the notation K ≺ f to mean that 0 ≤ f ≤ 1 and f (x) = 1 for all x ∈ K. Many
books refer to the following result as Urysohn’s lemma.

Lemma 8.4.3 (Urysohn’s lemma). Let K be a compact subset of an open subset V
of ℝn . Then there exists a function f ∈ 𝒞rc (ℝn ) such that K ≺ f ≺ V.

Proof. By lemma 8.4.2, there exists an open set U such that Ū is compact and
K ⊆ U ⊆ Ū ⊆ V. Applying lemma 8.4.1 with E = K and F = ℝn − U, we find the
function we seek: since f vanishes outside U, supp(f ) ⊆ Ū ⊆ V, and supp(f ) is compact.

Lemma 8.4.4. Suppose K ⊆ ℝn is compact, and let {V1 , … , Vm } be an open cover of
K. Then there exist continuous, compactly supported functions h1 , … , hm such that
hi ≺ Vi and (h1 + ... + hm )(x) = 1 for all x ∈ K.

Proof. First we show that there exists an open cover {U1 , … , Um } of K such that
each Ūi is compact and Ūi ⊆ Vi . The proof is by induction on m. When m = 2,
let K1 = K − V2 . Then K1 is compact and contained in V1 . By lemma 8.4.2,
there exists an open set U1 with compact closure such that K1 ⊆ U1 ⊆ Ū1 ⊆ V1 .
Clearly, {U1 , V2 } is an open cover of K. Now let K2 = K − U1 , and repeat
the above argument to find an open set U2 with compact closure such that
K2 ⊆ U2 ⊆ Ū2 ⊆ V2 . Clearly, {U1 , U2 } is an open cover of K. This proves the base
case when m = 2. We outline the inductive step. Let {V1 , … , Vm } be an open cover
of K, and write W = V2 ∪ ... ∪ Vm . Then {V1 , W} is an open cover of K. By what
we already established, there are open sets U1 and W1 with compact closures such
that K ⊆ U1 ∪ W1 , and Ū1 ⊆ V1 and W̄1 ⊆ W = V2 ∪ ... ∪ Vm . Now apply the
inductive hypothesis to the compact set W̄1 and its open cover {V2 , … , Vm }.

By lemma 8.4.3, there exist functions gi ∈ 𝒞rc (ℝn ) such that Ūi ≺ gi ≺ Vi . Define

h1 = g1 , h2 = (1 − g1 )g2 , … , hm = (1 − g1 )...(1 − gm−1 )gm .

The fact that hi ≺ Vi is obvious. Simple induction shows that, for 2 ≤ i ≤ m,
h1 + ... + hi = 1 − (1 − g1 )...(1 − gi ). Now define h = h1 + ... + hm . Thus h = 1 −
(1 − g1 )...(1 − gm ). If x ∈ K, then x ∈ Ui for some i, so gi (x) = 1, and h(x) = 1.

The functions h1 , … , hm in the above lemma are called a partition of unity on K
subordinate to the open cover {Vi }_{i=1}^m .
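
The telescoping construction in the proof of lemma 8.4.4 can be checked numerically. The sketch below is a one-dimensional illustration under assumptions of our own: K = [0, 3], three hypothetical plateau intervals on which the functions gi equal 1, and piecewise linear bumps standing in for the functions supplied by lemma 8.4.3. None of the names come from the text.

```python
from math import prod

def bump(x, lo, hi, margin):
    """Continuous function equal to 1 on [lo, hi], equal to 0 outside
    (lo - margin, hi + margin), and linear in between."""
    if lo <= x <= hi:
        return 1.0
    if x < lo:
        return max(0.0, 1.0 - (lo - x) / margin)
    return max(0.0, 1.0 - (x - hi) / margin)

# Assumed data: K = [0, 3] is covered by the intervals on which g_i = 1.
plateaus = [(0.0, 1.2), (1.0, 2.2), (2.0, 3.0)]
g = [lambda x, lo=lo, hi=hi: bump(x, lo, hi, 0.5) for lo, hi in plateaus]

def partition(x):
    """Return [h_1(x), ..., h_m(x)] with h_i = (1 - g_1)...(1 - g_{i-1}) g_i."""
    values, remaining = [], 1.0
    for gi in g:
        value = gi(x)
        values.append(remaining * value)
        remaining *= 1.0 - value
    return values

def closed_form(x):
    """1 - (1 - g_1(x))...(1 - g_m(x)), the claimed value of h_1 + ... + h_m."""
    return 1.0 - prod(1.0 - gi(x) for gi in g)

if __name__ == "__main__":
    for x in (-0.25, 0.0, 1.1, 1.5, 3.0, 3.3):
        print(x, partition(x), sum(partition(x)), closed_form(x))
```

At every point of K some gi equals 1, so the product in closed_form vanishes and the hi sum to 1; outside K the sum drops continuously to 0.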

Dicing ℝn

For a fixed natural number k, consider the following partition of ℝ:

[𝜈/2^k , (𝜈 + 1)/2^k ), 𝜈 ∈ ℤ.

This partitions each interval [m, m + 1) (m ∈ ℤ) into 2^k congruent half-open intervals,
each of length 1/2^k .
The above partition of ℝ can be employed to partition ℝn into a collection of
half-open cubes:

𝒮k = {𝜎 = [𝜈1 /2^k , (𝜈1 + 1)/2^k ) × ... × [𝜈n /2^k , (𝜈n + 1)/2^k ) ∶ (𝜈1 , … , 𝜈n ) ∈ ℤn }.

Note that, for 𝜎 ∈ 𝒮k , diam(𝜎) = √n 2^{−k} and that if 𝜎 and 𝜎 ′ are distinct cubes in
𝒮k , then 𝜎 ∩ 𝜎 ′ = ∅.

Observe that the half-open unit cube [0, 1) × ... × [0, 1) is the union of 2^{nk} cubes
in 𝒮k , that 𝒮_{k+1} is a refinement of 𝒮k , and that each cube in 𝒮k is the union of 2^n
cubes in 𝒮_{k+1} .

Now, given an open set V ⊆ ℝn , let

𝒮k (V) = {𝜎 ∈ 𝒮k ∶ 𝜎 ⊆ V}, and Gk = ∪{𝜎 ∶ 𝜎 ∈ 𝒮k (V)}.

Note that G1 ⊆ G2 ⊆ …. We claim that V = ∪_{k=1}^∞ Gk . Clearly, V ⊇ ∪_{k=1}^∞ Gk . Conversely,
if x ∈ V, there exists 𝛿 > 0 such that B(x, 2𝛿) ⊆ V. Choose k ∈ ℕ such that
√n 2^{−k} < 𝛿. (Reminder: √n 2^{−k} is the diameter of a cube in 𝒮k .) Since ℝn = ∪{𝜎 ∶
𝜎 ∈ 𝒮k }, x ∈ 𝜎 for some 𝜎 ∈ 𝒮k . Since diam(𝜎) < 𝛿, 𝜎 ⊆ B(x, 2𝛿) ⊆ V. This proves
that x ∈ Gk and that V = ∪_{k=1}^∞ Gk .

Figure 8.2 (a) 𝒮3 (U) (b) 𝒮4 (U)

This construction should be geometrically obvious. The set 𝒮k (V) is the largest set
of cubes in 𝒮k that fits inside V. It is also clear that 𝒮k+1 (V) is a refinement of 𝒮k (V)
that also contains all the additional cubes in 𝒮k+1 that fit in V. Figure 8.2 illustrates
the construction: figure 8.2(a) depicts all the squares of side length 1/8 that fit in the
unit disk U, and figure 8.2(b) shows all the squares of side length 1/16 that fit in the
disk. The unions of these squares are G3 (U) and G4 (U), respectively.
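
The construction can also be explored numerically. The Python sketch below counts the squares of 𝒮k that fit in the open unit disk of ℝ2 and reports the total area of their union. It is an illustration under our own conventions (integer arithmetic in units of 2^{−k}, function names of our choosing) and is not part of the formal development.

```python
from math import pi

def cubes_in_unit_disk(k):
    """Count the half-open squares of S_k (side 2^-k) contained in the open unit disk."""
    n = 2 ** k
    count = 0
    for i in range(-n, n):        # square [i/n, (i+1)/n) x [j/n, (j+1)/n)
        for j in range(-n, n):
            # Coordinates (in units of 1/n) of the corner farthest from the origin.
            fx = max(abs(i), abs(i + 1))
            fy = max(abs(j), abs(j + 1))
            # The square fits in the open disk when that corner has norm < 1.
            if fx * fx + fy * fy < n * n:
                count += 1
    return count

def area_of_G(k):
    """Total area of the union G_k(U): each square in S_k has area 4^-k."""
    return cubes_in_unit_disk(k) / 4 ** k

if __name__ == "__main__":
    for k in range(1, 8):
        print(k, cubes_in_unit_disk(k), area_of_G(k), pi - area_of_G(k))
```

The printed areas increase with k and stay below the area 𝜋 of the disk, in line with the inclusions G1 ⊆ G2 ⊆ … ⊆ U.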

Lemma 8.4.5. Let V be an open subset of ℝn . Then V is the countable union of
disjoint cubes 𝜎 of the type discussed in the previous paragraph. More specifically,
V = ∪_{i=1}^∞ 𝜎i , where 𝜎i ∈ ∪_{k=1}^∞ 𝒮k , and 𝜎i ∩ 𝜎j = ∅ if i ≠ j.

Proof. Let B1 = G1 , and, for k ≥ 1, let B_{k+1} = G_{k+1} − Gk . The family {Bk } is mutually
disjoint, and ∪_{k=1}^∞ Bk = V. Each Bk is the union of cubes in 𝒮k . The collection of
all such cubes is countable, and their union (over k ∈ ℕ) is V. Renumbering those
cubes as 𝜎1 , 𝜎2 , … , we obtain V = ∪_{k=1}^∞ 𝜎k . Finally, consider two distinct cubes, 𝜎i
and 𝜎j . If 𝜎i ⊆ Br and 𝜎j ⊆ Bs , where r ≠ s, then 𝜎i ∩ 𝜎j = ∅ because Br ∩ Bs = ∅.
If 𝜎i and 𝜎j are subsets of Br , for some integer r, then 𝜎i ∩ 𝜎j = ∅ because
the cubes in 𝒮r are disjoint.

As an illustration of the above construction, figure 8.3 depicts the set

B4 (U) = G4 (U) − G3 (U)

for the unit disk U.


Figure 8.3 B4 (U) shown as the union of the unshaded squares

Lebesgue measure: Motivation and Overview

This subsection is included for the sole purpose of building the reader’s intuition.
It is not meant to be a rigorous development of any particular set of ideas.

It must be emphasized at the outset that the Lebesgue measure is not an artificial
construct but rather a very natural kind of measure, as the reader will see below.
The broad goals are intuitively clear; we wish to find a large enough 𝜎-algebra
ℒn in ℝn and a positive measure 𝜆 on ℒn that extends and is consistent with our
common geometric perceptions about length, area, and volume. It is therefore
entirely reasonable to expect (indeed, require) that every closed box Q must
be in ℒn and that the Lebesgue measure of such a box must be the product of its
dimensions, consistent with our definition of the volume of a closed box in section
8.1. Surprisingly, those two simple requirements allow us to achieve most of our
broad goals. Because every open subset of ℝn is a countable union of closed boxes,
every open subset of ℝn must be in ℒn ; hence ℒn contains all Borel subsets of ℝn .
The requirement that 𝜆(Q) = vol(Q) uniquely extends the Lebesgue measure to
all open sets, as we explain below.

In lemma 8.4.5, if we define Ki = ∪_{j=1}^i 𝜎j , then 𝜆(Ki ) = ∑_{j=1}^i vol(𝜎j ), and it follows
directly from theorem 8.2.3 that

𝜆(V) = limi 𝜆(Ki ). (1)
This discussion strongly suggests equation (1) as a possible definition of the
Lebesgue measure of an open set.⁶ However, we will take a different path.

Another way to view equation (1) is as follows: Since, for any compact subset K of
V, 𝜆(K) ≤ 𝜆(V) and since there is a sequence Ki of compact subsets of V such that
𝜆(V) = limi 𝜆(Ki ), it must be true that, for an open set V ⊆ ℝn ,

𝜆(V) = sup{𝜆(K) ∶ K compact, K ⊆ V}. (2)

We will use a variant of equation (2) as the definition of the Lebesgue outer
measure of an open subset V of ℝn . However, this raises a serious question: why
would we abandon equation (1), which defines 𝜆(V) in terms of the measure
of a sequence of simple compact subsets of V, in favor of equation (2), which
involves the measure of general compact sets? In other words, how do we define
the measure of an arbitrary compact subset K of ℝn ? The answer is, we do not!
We use the Riemann integral as an instrument for the approximation of 𝜆(K) for a
compact subset K of V, and this is why Urysohn’s lemma is crucially important for
our development of the Lebesgue measure. Figures 8.4 and 8.5 illustrate the idea.
In figure 8.4, the outer disk depicts the open set V, and the inner disk depicts a
compact subset K of V. If f is a continuous function such that K ≺ f ≺ V, then the
Riemann integral ∫ℝn f (x)dx can be regarded as an approximation of both 𝜆(K)
and 𝜆(V). Figure 8.5 further illustrates the point. In that figure, the measure of K is
the volume of the cylinder above K, which differs from ∫ℝn f (x)dx by the volume
of the thin shell between the cylinder and the wall of the graph of f. Since we can
construct a compact subset K that fills as much of V as we wish (the compact sets Ki
in equation (1)), ∫ℝn f (x)dx can be used to simultaneously approximate 𝜆(K) and
𝜆(V) with arbitrary precision. We hope that the preceding discussion motivates
the definition below of the outer measure of an open subset of ℝn .

⁶ Equation (1) can be written more explicitly as 𝜆(V) = limk Card(𝒮k (V))/2^{nk} . This is a perfectly viable
approach, and some recent books have adopted this as the definition of the measure of an open subset
of ℝn . Observe that this definition accepts as an axiom the fact that the measure of the half-open cube is
the product of its dimensions; hence the quantity 1/2^{nk} . Another implied assumption is that all the cubes
in 𝒮k (V) have the same measure. This is the seed of the translation invariance of the Lebesgue measure.
See the proof of theorem 8.4.14.

Figure 8.4 A compact set K filling most of V

Figure 8.5 A function f such that K ≺ f ≺ V


In figure 8.4 we depict V as a bounded open set. However, the discussion points
are valid even for open sets of infinite measure. Specifically, this means that if V
is an open set of infinite measure, then V contains compact subsets of arbitrarily
large measure.

Lebesgue Measure

As explained in the above motivation, the Riemann integral will play a pivotal role
in our development of the Lebesgue measure. For a function f ∈ 𝒞c (ℝn ), let Q be
a closed box that contains supp(f ), and define

∫ℝn f (x)dx = ∫Q f (x)dx.

The Riemann integral is clearly a positive linear functional on 𝒞c (ℝn ). For the
remainder of this section, we will use the following notation: for a function
f ∈ 𝒞c (ℝn ), write I(f ) = ∫ℝn f (x)dx.

Definition. The Lebesgue outer measure is the set function

m∗ ∶ 𝒫(ℝn ) → [0, ∞],

which is as follows: for an open set V ⊆ ℝn ,

m∗ (V) = sup{I(f ) ∶ f ≺ V},

and for an arbitrary set A ⊆ ℝn ,

m∗ (A) = inf{m∗ (V) ∶ A ⊆ V, V open}.

The definition of m∗ (A) requires some justification. It is a well-known fact that
an open subset V of ℝ is the disjoint union of a countable collection {(ai , bi )}_{i=1}^∞ of
open intervals. Therefore m∗ (V) = ∑_{i=1}^∞ (bi − ai ). It follows, trivially, that m∗ (V) =
inf{∑_{i=1}^∞ (bi − ai ) ∶ V ⊆ ∪_{i=1}^∞ (ai , bi )}. While an arbitrary subset A of ℝn is not the
countable union of open boxes, it can be covered by such a set of boxes. Therefore
it makes sense to define m∗ (A) = inf{∑_{i=1}^∞ vol(Qi )}, where the infimum is taken
over all the countable covers {Qi }_{i=1}^∞ of A by open boxes. Since the union of open
boxes is an open set, and since every open set is the countable union of open boxes,
the definition of m∗ (A) is well justified.

Proposition 8.4.6. The set function m∗ is an outer measure on ℝn .

Proof. The monotonicity of m∗ is obvious. First we show that, for open sets V1 , … , Vm ,
m∗ (∪_{i=1}^m Vi ) ≤ ∑_{i=1}^m m∗ (Vi ). Let f ≺ ∪_{i=1}^m Vi . By lemma 8.4.4, there exist functions
hi ≺ Vi , 1 ≤ i ≤ m, such that h1 (x) + ... + hm (x) = 1 for every x ∈
supp(f ). Since hi f ≺ Vi , I(f ) = I(∑_{i=1}^m hi f) = ∑_{i=1}^m I(hi f) ≤ ∑_{i=1}^m m∗ (Vi ). This shows that
m∗ (∪_{i=1}^m Vi ) ≤ ∑_{i=1}^m m∗ (Vi ).
We now show that m∗ is countably subadditive. Let (Ei ) be a sequence of subsets
of ℝn . We must prove that m∗ (∪_{i=1}^∞ Ei ) ≤ ∑_{i=1}^∞ m∗ (Ei ). If ∑_{i=1}^∞ m∗ (Ei ) = ∞, there
is nothing to prove, so assume that ∑_{i=1}^∞ m∗ (Ei ) < ∞. Let 𝜖 > 0, and choose
open sets Vi such that Ei ⊆ Vi and m∗ (Vi ) < m∗ (Ei ) + 𝜖/2^i . Let V = ∪_{i=1}^∞ Vi ,
and let f ≺ V, that is, K = supp(f ) ⊆ ∪_{i=1}^∞ Vi . The compactness of K produces a
finite subcover V1 , … , Vm of K. Now I(f ) ≤ m∗ (V1 ∪ ... ∪ Vm ) ≤ ∑_{i=1}^m m∗ (Vi ) ≤
∑_{i=1}^∞ m∗ (Vi ) ≤ ∑_{i=1}^∞ [m∗ (Ei ) + 𝜖/2^i ] = ∑_{i=1}^∞ m∗ (Ei ) + 𝜖. Since the last inequality
is true for an arbitrary f ≺ V, m∗ (V) ≤ ∑_{i=1}^∞ m∗ (Ei ) + 𝜖. Since ∪_{i=1}^∞ Ei ⊆ V and
m∗ is monotone, m∗ (∪_{i=1}^∞ Ei ) ≤ m∗ (V) ≤ ∑_{i=1}^∞ m∗ (Ei ) + 𝜖. Because 𝜖 is arbitrary,
m∗ (∪_{i=1}^∞ Ei ) ≤ ∑_{i=1}^∞ m∗ (Ei ).

Example 1. The outer measure of an open interval V = (a, b) is b − a.
For any function f such that f ≺ V, it is clear that ∫_a^b f (x)dx ≤ b − a. Thus
m∗ (V) ≤ b − a. For 0 < 𝜖 < (b − a)/4, let g be the continuous, piecewise linear function whose graph
contains the points (a, 0), (a + 𝜖, 0), (a + 2𝜖, 1), (b − 2𝜖, 1), (b − 𝜖, 0), and (b, 0).
The function g is supported in V, and ∫_a^b g(x)dx = b − a − 3𝜖. Since 𝜖 is arbitrary,
m∗ (V) = b − a.
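
The value b − a − 3𝜖 of the integral of g can be confirmed by exact arithmetic. The following Python sketch integrates the piecewise linear function through the six points listed above with the trapezoid rule, which is exact for such functions; the function names and the sample values of a, b, and 𝜖 are assumptions chosen for illustration.

```python
from fractions import Fraction

def trapezoid_integral(knots):
    """Exact integral of the piecewise linear function through the (x, y) knots."""
    total = Fraction(0)
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        total += (x1 - x0) * (y0 + y1) / 2
    return total

def g_integral(a, b, eps):
    """Integral over [a, b] of the function g of example 1 (requires eps < (b - a)/4)."""
    knots = [(a, 0), (a + eps, 0), (a + 2 * eps, 1),
             (b - 2 * eps, 1), (b - eps, 0), (b, 0)]
    return trapezoid_integral(knots)

if __name__ == "__main__":
    a, b = Fraction(0), Fraction(1)
    for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
        print(eps, g_integral(a, b, eps), b - a - 3 * eps)
```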

Example 2. The outer measure of a point set {x} in ℝ is zero.
For any 𝜖 > 0, the interval V𝜖 = (x − 𝜖, x + 𝜖) contains {x}, and m∗ (V𝜖 ) = 2𝜖.
Since 𝜖 is arbitrary, m∗ ({x}) = 0.

Example 3. The following facts follow directly from example 2. The outer measure
of a countable subset of ℝ is zero. The outer measure of the closed interval [a, b]
is b − a. 

Example 4. The outer measure of the Cantor set is zero.
In the notation of section 4.2, the Cantor set, C, is contained in the set Cn for each
n ∈ ℕ. By the previous example and the subadditivity of m∗ , m∗ (Cn ) ≤ 2^n /3^n .
Since m∗ (C) ≤ m∗ (Cn ) and since n is arbitrary, m∗ (C) = 0.
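
The bound 2^n /3^n can be seen concretely. The sketch below assumes that Cn denotes the standard n-th stage of the middle-thirds construction, that is, a union of 2^n closed intervals each of length 3^{−n} (section 4.2 is not reproduced here, so this reading is an assumption), and it computes the total length of those intervals exactly.

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals making up the n-th stage of the middle-thirds construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            refined.append((lo, lo + third))   # keep the left third
            refined.append((hi - third, hi))   # keep the right third
        intervals = refined
    return intervals

def total_length(intervals):
    """Sum of the lengths of the given intervals."""
    return sum((hi - lo for lo, hi in intervals), Fraction(0))

if __name__ == "__main__":
    for n in range(0, 8):
        stage = cantor_stage(n)
        print(n, len(stage), total_length(stage))
```

The total length equals (2/3)^n, which tends to 0 as n grows.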

Example 5. The two-dimensional outer measure of the set E = {(x, 0) ∶ x ∈ ℝ}
(the x-axis) is zero.
In a manner quite similar to that in example 1, one can show that the
two-dimensional outer measure of an open rectangle is the product of its
dimensions. Let 𝜖 > 0, and let Rn be the open rectangle (−n, n) × (−𝜖/n^3 , 𝜖/n^3 ). Now
m∗ (Rn ) = 4𝜖/n^2 , and E ⊆ ∪_{n=1}^∞ Rn . By the subadditivity of m∗ , m∗ (E) ≤
m∗ (∪_{n=1}^∞ Rn ) ≤ ∑_{n=1}^∞ 4𝜖/n^2 . Since the series ∑_{n=1}^∞ 1/n^2 converges and 𝜖 is arbitrary, m∗ (E) = 0.

Definition. A subset E of ℝn is Lebesgue measurable if it satisfies the
Carathéodory condition: for every A ⊆ ℝn ,

m∗ (A) = m∗ (A ∩ E) + m∗ (A ∩ E′ ).

By theorem 8.2.7, the set ℒ(ℝn ) of Lebesgue measurable sets is a 𝜎-algebra,
and the restriction of m∗ to ℒ(ℝn ) is a complete positive measure: the Lebesgue
measure on ℒ(ℝn ). We will reserve the notation 𝜆(E) exclusively to denote the
Lebesgue measure of a set E ∈ ℒ(ℝn ). We continue to write m∗ (E) for the outer
measure of a set E whose Lebesgue measurability has not been established. We
will frequently write ℒn as an abbreviation of ℒ(ℝn ).

The immediate task now is to show that every open subset of ℝn is Lebesgue
measurable (theorem 8.4.9). We first need to establish the finite additivity of m∗
for compact and open sets.

Theorem 8.4.7.
(a) If K is compact, then

m∗ (K) = inf{I(f ) ∶ K ≺ f}.

In particular, compact subsets of ℝn have finite Lebesgue outer measures.


(b) If K1 and K2 are disjoint compact subsets of ℝn , then

m∗ (K1 ∪ K2 ) = m∗ (K1 ) + m∗ (K2 ).

Proof. Let K ≺ f. If 0 < 𝛼 < 1, then the set V𝛼 = {x ∈ ℝn ∶ f (x) > 𝛼} is open and
contains K. Now if g ≺ V𝛼 , then 𝛼g ≤ f, and m∗ (K) ≤ m∗ (V𝛼 ) = sup{I(g) ∶ g ≺
V𝛼 } ≤ (1/𝛼) I(f ). Letting 𝛼 → 1, we obtain m∗ (K) ≤ I(f ). Let 𝜖 > 0. There exists
an open set V containing K such that m∗ (V) < m∗ (K) + 𝜖. Choose a function
f ∈ 𝒞rc (ℝn ) such that K ≺ f ≺ V. Then I(f ) ≤ m∗ (V) < m∗ (K) + 𝜖. This establishes
part (a).

To prove part (b), let 𝜖 > 0. By part (a), there exists a function g ∈ 𝒞rc (ℝn )
such that K1 ∪ K2 ≺ g, and I(g) < m∗ (K1 ∪ K2 ) + 𝜖. By lemma 8.4.2, there exists
an open subset W with compact closure such that K1 ⊆ W ⊆ W̄ ⊆ ℝn − K2 .
By lemma 8.4.3, there exists a function f ∈ 𝒞rc (ℝn ) such that K1 ≺ f ≺ W. In
particular, f(K1 ) = 1, and f(K2 ) = 0. Note that K1 ≺ fg and that K2 ≺ (1 − f )g.
Now m∗ (K1 ) + m∗ (K2 ) ≤ I(fg) + I(g − fg) = I(g) < m∗ (K1 ∪ K2 ) + 𝜖. Since 𝜖 is
arbitrary, m∗ (K1 ) + m∗ (K2 ) ≤ m∗ (K1 ∪ K2 ). Now the subadditivity of m∗ delivers
the result.

Theorem 8.4.8. For an open set V,

(a) m∗ (V) = sup{m∗ (K) ∶ K compact, K ⊆ V},
(b) m∗ (V) = sup{m∗ (U) ∶ U open, Ū compact, Ū ⊆ V}, and
(c) if V1 , V2 are disjoint open sets, then m∗ (V1 ∪ V2 ) = m∗ (V1 ) + m∗ (V2 ).

Proof. Let 𝛼 < m∗ (V). By the definition of m∗ (V), there exists a function f ∈ 𝒞rc (ℝn )
such that f ≺ V and I(f ) > 𝛼. Let K = supp(f ). If K ⊆ W for some open set W, then
f ≺ W, so I(f ) ≤ m∗ (W). This shows that

m∗ (K) = inf{m∗ (W) ∶ W open, K ⊆ W} ≥ I(f ) > 𝛼.

Thus 𝛼 < m∗ (K) ≤ m∗ (V). This proves part (a). Observe that this proof is valid
even when m∗ (V) = ∞.

Part (b) follows from part (a) and lemma 8.4.2.

To prove (c), we may, without loss of generality, assume that V1 and V2 have
finite outer measure. Let 𝜖 > 0. By part (a), there exist compact sets K1 and K2
such that m∗ (Vi ) < m∗ (Ki ) + 𝜖/2, i = 1, 2. The set K = K1 ∪ K2 is compact, and
K1 ∪ K2 ⊆ V1 ∪ V2 . Now m∗ (V1 ) + m∗ (V2 ) ≤ m∗ (K1 ) + m∗ (K2 ) + 𝜖 = m∗ (K1 ∪
K2 ) + 𝜖 ≤ m∗ (V1 ∪ V2 ) + 𝜖. Since 𝜖 is arbitrary, and m∗ is subadditive, m∗ (V1 ) +
m∗ (V2 ) = m∗ (V1 ∪ V2 ). 

Theorem 8.4.9. Every open subset of ℝn is Lebesgue measurable. Consequently,
every Borel subset of ℝn is in ℒn .

Proof. Let U be an open subset of ℝn , and let A be an arbitrary subset of ℝn . Since
m∗ (A) ≤ m∗ (A ∩ U) + m∗ (A ∩ U′ ), we may assume that m∗ (A) < ∞. Without
loss of generality, assume that A ∩ U ≠ ∅ ≠ A ∩ U′ . Let 𝜖 > 0. There exists an
open set V containing A such that m∗ (V) < m∗ (A) + 𝜖/2. By theorem 8.4.8, there
exists an open set W such that W̄ is compact, W̄ ⊆ V ∩ U, and m∗ (W) + 𝜖/2 >
m∗ (V ∩ U).
Let W0 = V ∩ (W̄)′ . Notice that W0 ∩ W = ∅, that W0 ∪ W ⊆ V, and that W0
has finite outer measure. Now V ∩ U′ ⊆ W0 , so m∗ (W0 ) ≥ m∗ (V ∩ U′ ). Using this
information and theorem 8.4.8(c),

m∗ (A) + 𝜖 > m∗ (V) + 𝜖/2 ≥ m∗ (W ∪ W0 ) + 𝜖/2 = m∗ (W) + m∗ (W0 ) + 𝜖/2
≥ m∗ (V ∩ U) + m∗ (V ∩ U′ ) ≥ m∗ (A ∩ U) + m∗ (A ∩ U′ ).

Since 𝜖 is arbitrary, m∗ (A) ≥ m∗ (A ∩ U) + m∗ (A ∩ U′ ). 

Theorem 8.4.10. Let E ∈ ℒn .

(a) For 𝜖 > 0, there exists a closed subset F and an open subset V such that
F ⊆ E ⊆ V and 𝜆(V − F) < 𝜖.
(b) 𝜆(E) = sup{𝜆(K) ∶ K compact, K ⊆ E}.
(c) There exists an F𝜎 set A and a G𝛿 set B such that A ⊆ E ⊆ B and 𝜆(B − A) = 0.

Proof. ℝn is the countable union of the nest of compact (closed) balls Ki = B̄(0, i), i ∈
ℕ. For each i ∈ ℕ, 𝜆(Ki ∩ E) < ∞. Thus there exist open sets Vi ⊇ Ki ∩ E
such that 𝜆(Vi − (Ki ∩ E)) < 𝜖/2^{i+1} . Let V = ∪_{i=1}^∞ Vi . Then E ⊆ V, V − E ⊆
∪_{i=1}^∞ (Vi − (Ki ∩ E)), and 𝜆(V − E) < 𝜖/2. Applying this result to E′ , we can
find an open set W containing E′ such that 𝜆(W − E′ ) < 𝜖/2. Let F = W′ .
Then F ⊆ E, E − F = W − E′ and 𝜆(E − F) = 𝜆(W − E′ ) < 𝜖/2. Thus 𝜆(V − F) =
𝜆(V − E) + 𝜆(E − F) < 𝜖/2 + 𝜖/2 = 𝜖. This proves part (a).


If F is closed, F = ∪_{i=1}^∞ (Ki ∩ F). Each Ki ∩ F is compact and limi 𝜆(Ki ∩ F) = 𝜆(F ).
Thus (b) holds for closed subsets of ℝn . If E ∈ ℒn and 𝜖 > 0, by part (a) we
can choose a closed set F ⊆ E such that 𝜆(E − F) < 𝜖/2. If 𝜆(F ) = ∞, then sup
{𝜆(K) ∶ K compact, K ⊆ E} ≥ sup{𝜆(K) ∶ K compact, K ⊆ F} = ∞. If 𝜆(F ) < ∞,
there exists a compact subset K of F such that 𝜆(F − K) < 𝜖/2. Now 𝜆(E) =
𝜆(K) + 𝜆(E − F) + 𝜆(F − K) < 𝜆(K) + 𝜖.

To prove part (c), find open sets Vi and closed sets Fi such that Fi ⊆ E ⊆ Vi
and 𝜆(Vi − Fi ) < 1/i. Set A = ∪_{i=1}^∞ Fi , and B = ∩_{i=1}^∞ Vi . Then 𝜆(B − A) < 1/i for
every i ∈ ℕ; hence 𝜆(B − A) = 0. Observe that these results are valid even when
𝜆(E) = ∞.

Observation. Let ℬn be the 𝜎-algebra of Borel subsets of ℝn , and consider the
measure space (ℝn , ℬn , 𝜆). It is known that the restriction of 𝜆 to ℬn is not a
complete measure (see problem 4 at the end of this section). Theorem 8.4.10(c)
implies that (ℝn , ℒn , 𝜆) is the completion of (ℝn , ℬn , 𝜆). See problems 3 and 4 in
section 8.2. The following corollary is an affirmation of the same fact.

Corollary 8.4.11. ℒn is the smallest 𝜎-algebra that contains ℬn and all sets of
Lebesgue (outer) measure 0.

Proof. We have already seen that ℬn ⊆ ℒn and that all subsets of Lebesgue outer
measure 0 are Lebesgue measurable. We show that if 𝔐 is a 𝜎-algebra containing
ℬn and all subsets of Lebesgue measure 0, then ℒn ⊆ 𝔐. If E ∈ ℒn , then, by
theorem 8.4.10, there exists an F𝜎 set A such that A ⊆ E and 𝜆(E − A) = 0. Thus
E = A ∪ (E − A), where A ∈ ℬn and 𝜆(E − A) = 0. Both A and E − A belong to 𝔐; hence E ∈ 𝔐.

Recall from section 8.1 that the volume of a closed box

Q = {x ∈ ℝn ∶ ai ≤ xi ≤ bi }

is, by definition, vol(Q) = ∏_{i=1}^n (bi − ai ).

Lemma 8.4.12. For an open box Q = (a1 , b1 ) × ... × (an , bn ),

𝜆(Q) = vol(Q) = ∏_{i=1}^n (bi − ai ).

Proof. Let f ≺ Q. Then I(f ) = ∫Q f (x)dx ≤ ∫Q 1dx = vol(Q). Thus 𝜆(Q) ≤ vol(Q).
For small-enough positive constants 𝜖, define Q𝜖 = [a1 + 𝜖, b1 − 𝜖] × ... × [an +
𝜖, bn − 𝜖]. There exists a function f ∈ 𝒞rc (ℝn ) such that Q𝜖 ≺ f ≺ Q. Therefore
𝜆(Q) ≥ I(f ) = ∫Q f (x)dx ≥ ∫Q𝜖 f (x)dx = ∫Q𝜖 1dx = vol(Q𝜖 ) = ∏_{i=1}^n (bi − ai − 2𝜖).
Since 𝜖 is arbitrary, 𝜆(Q) ≥ ∏_{i=1}^n (bi − ai ) = vol(Q).

Example 6. Consider an open box Q = (a1 , b1 ) × ... × (an , bn ), and let Q̄ be the
closed box [a1 , b1 ] × ... × [an , bn ]. For every k ∈ ℕ, let Qk be the open box
(a1 − 1/k, b1 + 1/k) × ... × (an − 1/k, bn + 1/k). Since {Qk } is a descending sequence and
Q̄ = ∩_{k=1}^∞ Qk , 𝜆(Q̄) = limk 𝜆(Qk ) = limk ∏_{i=1}^n (bi − ai + 2/k) = ∏_{i=1}^n (bi − ai ) =
vol(Q) = 𝜆(Q). Therefore 𝜆(Q̄) = 𝜆(Q), and the boundary of any box has
Lebesgue measure zero. Therefore the Lebesgue measure of any box (open,
closed, or half-open) is the product of its dimensions. We will continue to refer
to the Lebesgue measure of a box as its volume.

Example 7. The Lebesgue measure of an open ball of radius r in ℝn is cn r^n , where
cn is the measure of the unit open ball in ℝn .

Let Br be the open ball of radius r centered at the origin, and let B be
the open unit ball. By lemma 8.4.5, B = ∪_{i=1}^∞ 𝜎i , where {𝜎i } is a sequence of
disjoint half-open cubes. Because Br = rB = ∪_{i=1}^∞ r𝜎i , 𝜆(Br ) = ∑_{i=1}^∞ 𝜆(r𝜎i ) =
∑_{i=1}^∞ r^n 𝜆(𝜎i ) = r^n 𝜆(B) = cn r^n .

We are now ready to prove that the Riemann integral of a function of compact
support is the same as its integral with respect to Lebesgue measure.

Theorem 8.4.13. For a function f ∈ 𝒞c (ℝn ), ∫ℝn fd𝜆 = ∫ℝn f (x)dx.

Proof. It is sufficient to prove the result for a positive function f. Let Q be a half-open
cube containing supp(f ) in its interior. Consider the partition of Q into
2^{nk} congruent, half-open sub-cubes {𝜎1 , … , 𝜎_{2^{nk}} }, and let sk (f ) = ∑_{i=1}^{2^{nk}} f𝜎i vol(𝜎i )
be the lower Riemann sums of f. Here f𝜎i = inf{f (x) ∶ x ∈ 𝜎i }. By theorem
8.1.2, limk→∞ sk (f ) = ∫ℝn f (x)dx. On the other hand, the simple functions fk (x) =
∑_{i=1}^{2^{nk}} f𝜎i 𝜒𝜎i (x) satisfy 0 ≤ f1 ≤ f2 ≤ … , and limk fk (x) = f (x) for every x ∈ ℝn .
By the monotone convergence theorem, limk ∫ℝn fk d𝜆 = ∫ℝn fd𝜆. But ∫ℝn fk d𝜆 =
∑_{i=1}^{2^{nk}} f𝜎i 𝜆(𝜎i ) = ∑_{i=1}^{2^{nk}} f𝜎i vol(𝜎i ) = sk (f ). Therefore ∫ℝn fd𝜆 = limk sk (f ) = ∫ℝn f (x)dx.
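
The monotone convergence of the lower sums can be observed on a simple case. The Python sketch below is an illustration under assumptions of our own: n = 1, the tent function f (x) = max(0, 1 − |x|), whose Riemann integral is 1, and the half-open interval Q = [−2, 2) partitioned into 2^k congruent cells. The names tent and lower_sum are not from the text.

```python
from fractions import Fraction

def tent(x):
    """f(x) = max(0, 1 - |x|): continuous, nonnegative, supported in [-1, 1]."""
    return max(Fraction(0), 1 - abs(x))

def lower_sum(k):
    """Lower Riemann sum of the tent function over Q = [-2, 2) split into 2^k cells.

    Intended for k >= 2, where the break points -1, 0, 1 are cell endpoints,
    so the tent is monotone on each cell and its infimum over the cell is
    the smaller of the two endpoint values.
    """
    cells = 2 ** k
    h = Fraction(4, cells)
    total = Fraction(0)
    for j in range(cells):
        left = -2 + j * h
        total += min(tent(left), tent(left + h)) * h
    return total

if __name__ == "__main__":
    for k in range(2, 10):
        print(k, lower_sum(k))
```

For this function the sums equal 1 − 4/2^k, so they increase to the integral 1 as k grows.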

The previous theorem is commonly cast in the following language: the Lebesgue
integral extends the Riemann integral from 𝒞c (ℝn ) to 𝔏1 (ℝn , ℒn , 𝜆). We also
say that the Lebesgue measure 𝜆 represents the positive linear functional I(f ) =
∫ℝn f (x)dx.

Theorem 8.4.14. The Lebesgue measure is translation invariant. Thus if E ∈ ℒn ,
and x ∈ ℝn , then 𝜆(E + x) = 𝜆(E).

Proof. It is easy to see that vol(Q + x) = vol(Q) for every box Q. Now let V be an
open subset of ℝn . By lemma 8.4.5, we can write V as a disjoint union of half-open
cubes, V = ∪_{i=1}^∞ 𝜎i . Thus V + x = ∪_{i=1}^∞ (𝜎i + x), and 𝜆(V + x) = 𝜆(∪_{i=1}^∞ (𝜎i + x)) =
∑_{i=1}^∞ 𝜆(𝜎i + x) = ∑_{i=1}^∞ 𝜆(𝜎i ) = 𝜆(V). Thus the result holds for open subsets
of ℝn . The general result for an arbitrary measurable set E follows from the special
case we just established and the fact that 𝜆(E) = inf{𝜆(V) ∶ E ⊆ V, V open}. See
the definition of the Lebesgue outer measure.

We now summarize the properties of Lebesgue measure.

Theorem 8.4.15. Lebesgue measure is a complete, translation-invariant measure on
ℒn , and

(a) every Borel subset of ℝn is Lebesgue measurable;
(b) for every open set V, 𝜆(V) = sup{∫ℝn f (x)dx ∶ f ≺ V};
(c) for every E ∈ ℒn , 𝜆(E) = inf{𝜆(V) ∶ E ⊆ V, V open};
(d) for every compact set K, 𝜆(K) = inf{∫ℝn f (x)dx ∶ K ≺ f};
(e) for every E ∈ ℒn , 𝜆(E) = sup{𝜆(K) ∶ K ⊆ E, K compact}, and
(f ) 𝜆 extends (represents) the Riemann integral in the sense that

∫ℝn f (x)dx = ∫ℝn fd𝜆

for every f ∈ 𝒞c (ℝn ).

Property (a) is by theorem 8.4.9.
Properties (b) and (c) are the definitions of Lebesgue outer measure.
Property (d) is theorem 8.4.7(a).
Property (e) is theorem 8.4.10(b).
Property (f ) is theorem 8.4.13.
Finally, the completeness of 𝜆 is by theorem 8.2.7, and the translation invariance
of 𝜆 is by theorem 8.4.14. 

Definitions. Let X be a locally compact metric (topological) space, and let 𝔐 be
a 𝜎-algebra of subsets of X that contains all Borel subsets of X.
A positive measure 𝜇 on 𝔐 is said to be

(a) outer regular if, for every E ∈ 𝔐, 𝜇(E) = inf{𝜇(V) ∶ E ⊆ V, V open}, and
(b) inner regular if, for every E ∈ 𝔐, 𝜇(E) = sup{𝜇(K) ∶ K ⊆ E, K compact}.

We say that 𝜇 is regular if it is both inner and outer regular.

Lebesgue measure is outer regular by the very definition of the Lebesgue outer
measure, m∗ . Theorem 8.4.10(b) states that Lebesgue measure is inner regular.

We conclude this section with two uniqueness results that characterize Lebesgue
measure.

Theorem 8.4.16. Let 𝜇 be a regular measure on ℒn such that 𝜇(K) < ∞ for
every compact subset K of ℝn , and ∫ℝn fd𝜇 = ∫ℝn f (x)dx for every f ∈ 𝒞c (ℝn ).
Then 𝜇 = 𝜆.

Proof. It is sufficient to prove that 𝜇(K) = 𝜆(K) for every compact set K. The result
then follows from the regularity of 𝜆 and 𝜇.
Let 𝜖 > 0. There exists an open subset V such that K ⊆ V and 𝜇(V) < 𝜇(K) + 𝜖.
Let f ∈ 𝒞rc (ℝn ) be such that K ≺ f ≺ V. Then 𝜆(K) = ∫ℝn 𝜒K d𝜆 ≤ ∫ℝn fd𝜆 =
∫ℝn f (x)dx = ∫ℝn fd𝜇 ≤ ∫ℝn 𝜒V d𝜇 = 𝜇(V) < 𝜇(K) + 𝜖. Since 𝜖 is arbitrary,
𝜆(K) ≤ 𝜇(K). Switching the roles of 𝜆 and 𝜇, we obtain 𝜇(K) ≤ 𝜆(K). 

Theorem 8.4.17. Let 𝜇 be a translation invariant measure on ℒn such that
c = 𝜇([0, 1)^n ) > 0. Then 𝜇 = c𝜆, that is, 𝜇(E) = c𝜆(E) for every E ∈ ℒn .

Proof. Let Q be the half-open unit cube [0, 1)^n . For a fixed k ∈ ℕ, partition Q
into 2^{nk} congruent half-open sub-cubes 𝜎1 , … , 𝜎_{2^{nk}} in 𝒮k . Each cube in 𝒮k is a
translation of any other cube in 𝒮k ; therefore, by assumption, 𝜇(𝜎i ) = 𝜇(𝜎1 ) for
i = 1, 2, … , 2^{nk} . Since 𝜎1 , … , 𝜎_{2^{nk}} are disjoint, c = 𝜇(Q) = 2^{nk} 𝜇(𝜎1 ). Also c = c · 1 =
c𝜆(Q) = c ∑_{i=1}^{2^{nk}} 𝜆(𝜎i ) = c 2^{nk} 𝜆(𝜎1 ). It follows that 𝜇(𝜎1 ) = c𝜆(𝜎1 ); hence 𝜇(𝜎) =
c𝜆(𝜎) for every cube 𝜎 in 𝒮k . Since k is arbitrary, 𝜇(𝜎) = c𝜆(𝜎) for any cube
𝜎 ∈ ∪_{k=1}^∞ 𝒮k . By lemma 8.4.5, an arbitrary open set V is a countable union of
disjoint cubes in ∪_{k=1}^∞ 𝒮k . The countable additivity of 𝜆 and 𝜇 produces 𝜇(V) =
c𝜆(V) for every open subset V.
Now let E ∈ ℒn . For an open set V ⊇ E, 𝜇(E) ≤ 𝜇(V) = c𝜆(V). Hence
𝜇(E) ≤ c inf{𝜆(V) ∶ V ⊇ E, V open} = c𝜆(E). To show that 𝜇(E) ≥ c𝜆(E), we first
assume that E is bounded. Choose a large enough open box Ω that contains E.
Then 𝜇(Ω) − 𝜇(E) = 𝜇(Ω − E) ≤ c𝜆(Ω − E) = c𝜆(Ω) − c𝜆(E) = 𝜇(Ω) − c𝜆(E).
Thus 𝜇(E) ≥ c𝜆(E).
If E is unbounded, let Bi be the open ball of radius i and centered at the origin.
Then E = ∪_{i=1}^∞ (E ∩ Bi ), and 𝜇(E) = limi 𝜇(E ∩ Bi ) = c limi 𝜆(E ∩ Bi ) = c𝜆(E).

Excursion: Radon Measures

A close examination of the constructions and the results of this section so far
reveals that most of the theory we developed can be extended to locally compact
Hausdorff spaces. Specifically, if ℝn is replaced with a locally compact Hausdorff
space X and the Riemann integral is replaced with a positive linear functional I on
𝒞c (X), then we can construct a measure 𝜇 that represents I and has most (but not
all) of the regularity properties we derived for Lebesgue measure.⁷
The following results can be established by replicating the proofs of the corresponding
results for the Lebesgue measure. Theorem 5.9.2 must be used instead of
lemma 8.4.2, and lemma 5.11.6 instead of lemma 8.4.3. The proof we included
for lemma 8.4.4 is valid for any locally compact Hausdorff space; hence we state the
lemma below for the sake of completeness. We urge the reader to scrutinize our
claim that the proofs of the theorems below for Radon measures are identical to
those provided for the Lebesgue measure. The exercise is illuminating.
Throughout this subsection, X is a locally compact Hausdorff space, and I is a
positive linear functional on 𝒞c (X). Explicitly, for f, g ∈ 𝒞c (X), and 𝛼, 𝛽 ∈ 𝕂, I(𝛼f +
𝛽g) = 𝛼I(f ) + 𝛽I(g), and if f ≥ 0, then I(f ) ≥ 0. Observe that such a functional is
monotone in the sense that if f ≤ g, then I(f ) ≤ I(g).
We continue to use the notation K ≺ f ≺ V to indicate that f ∈ 𝒞rc (X), 0 ≤ f ≤ 1,
f(K) = 1, and supp(f ) ⊆ V.

⁷ In the sense that ∫X fd𝜇 = I(f ) for all f ∈ 𝒞c (X).


Lemma 8.4.18. Suppose K ⊆ X is compact and that V1 , … , Vm are open subsets of X
such that K ⊆ ∪_{i=1}^m Vi . Then there exist functions h1 , … , hm such that hi ≺ Vi and
(h1 + ... + hm )(x) = 1 for all x ∈ K.

The Radon outer measure induced by the positive linear functional I is the set
function
m∗ ∶ 𝒫(X) → [0, ∞],
defined as follows: for an open set V ⊆ X,

m∗ (V) = sup{I(f ) ∶ f ≺ V},

and for an arbitrary set A ⊆ X,

m∗ (A) = inf{m∗ (V) ∶ A ⊆ V, V open}.

Proposition 8.4.19. The set function m∗ is an outer measure on X. 

Definition. A subset E of X is Radon measurable, or simply measurable, if it
satisfies the Carathéodory condition: for every A ⊆ X,

m∗ (A) = m∗ (A ∩ E) + m∗ (A ∩ E′ ).

By theorem 8.2.7, the set 𝔐 of measurable sets is a 𝜎-algebra, and the restric-
tion of m∗ to 𝔐 is a complete positive measure: the Radon measure on 𝔐
induced by I . We will reserve the notation 𝜇(E) exclusively to denote the 𝜇-
measure of a set E ∈ 𝔐. We continue to write m∗ (E) for the outer measure of a
set E whose Radon measurability has not been established.

Theorem 8.4.20.
(a) If K is compact, then

m∗ (K) = inf{I(f ) ∶ K ≺ f}.

In particular, compact subsets of X have finite outer measure.


(b) If K1 and K2 are disjoint compact subsets of X, then m∗ (K1 ∪ K2 ) = m∗ (K1 ) +
m∗ (K2 ). 

Theorem 8.4.21. For an open set V,

(a) m∗ (V) = sup{m∗ (K) ∶ K compact, K ⊆ V},
(b) m∗ (V) = sup{m∗ (U) ∶ U open, U̅ compact, U̅ ⊆ V}, and
(c) if V1 , V2 are disjoint open sets, then m∗ (V1 ∪ V2 ) = m∗ (V1 ) + m∗ (V2 ). ∎

Theorem 8.4.22. Every open subset of X is Radon measurable. Consequently, every
Borel subset of X is in 𝔐. ∎

We now arrive at the main distinction between the Lebesgue measure and general
Radon measures. Part (a) of theorem 8.4.21 states that 𝜇 is inner regular on
open subsets of X. Inner regularity does not extend to all Radon measurable sets,
however. But we do have the following result.

Theorem 8.4.23. If E ∈ 𝔐 and 𝜇(E) < ∞, then 𝜇(E) = sup{𝜇(K) ∶ K ⊆ E, K compact}.
Thus 𝜇 is inner regular on sets of finite measure.

Proof. Let 𝜖 > 0, and choose an open set U ⊇ E such that 𝜇(U) < 𝜇(E) + 𝜖.
Since 𝜇(U − E) = 𝜇(U) − 𝜇(E) < 𝜖, there exists an open set V ⊇ U − E such
that 𝜇(V) < 𝜖. By theorem 8.4.21, U contains a compact subset H such that
𝜇(U) < 𝜇(H) + 𝜖. The set K = H − V is compact, being a closed subset of the
compact set H, and K ⊆ U − V ⊆ E. Now

𝜇(E) ≤ 𝜇(U) = 𝜇(U) − 𝜇(H) + 𝜇(H)
= 𝜇(U − H) + 𝜇(H − V) + 𝜇(H ∩ V) < 𝜖 + 𝜇(K) + 𝜇(V) < 𝜇(K) + 2𝜖. ∎

We now prove the generalization of theorem 8.4.13.

Theorem 8.4.24. For a function f ∈ 𝒞c (X), ∫X fd𝜇 = I(f ).

Proof. It is enough to prove that

I(f ) ≤ ∫X fd𝜇 for every f ∈ 𝒞rc (X)    (3)

because we then have −I(f ) = I(−f ) ≤ ∫X −fd𝜇 = − ∫X fd𝜇.


It is further sufficient to establish (3) when $0 \le f \le 1$. Let $K = \operatorname{supp}(f)$, and let
$n$ be an arbitrary positive integer. For notational convenience, set $\epsilon = 1/n$. For
$0 \le i \le n$, let $y_i = i/n$, and define $E_1 = f^{-1}([0, y_1]) \cap K$ and $E_i = f^{-1}((y_{i-1}, y_i])$ for
$2 \le i \le n$. Clearly, the sets $E_i$ are disjoint and $\bigcup_{i=1}^{n} E_i = K$. Since $f$ is continuous,
$f^{-1}(B) \in \mathcal{B}(X)$ for every Borel subset $B \subseteq \mathbb{R}$; see problem 16 in section 8.2. In
particular, the sets $E_i$ are in $\mathfrak{M}$. Because $y_i - \epsilon = y_{i-1} \le f(x)$ for all $x \in E_i$, the
simple function $s = \sum_{i=1}^{n} (y_i - \epsilon)\chi_{E_i}$ satisfies $0 \le s \le f$. Therefore

\[ \sum_{i=1}^{n} (y_i - \epsilon)\mu(E_i) \le \int_X f\,d\mu. \tag{4} \]

For 1 ≤ i ≤ n, choose open subsets Vi ⊇ Ei such that 𝜇(Vi ) < 𝜇(Ei ) + 𝜖/n and
f (x) < yi + 𝜖 for all x ∈ Vi , and let {hi } be a partition of unity of K subordinate
to {Vi }. Since hi ≺ Vi ,

I(hi ) ≤ 𝜇(Vi ) < 𝜇(Ei ) + 𝜖/n. (5)

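Since $f = \sum_{i=1}^{n} h_i f$ and since $h_i f \le (y_i + \epsilon)h_i$, we have (using inequalities (4)
and (5)),

\begin{align*}
I(f) = \sum_{i=1}^{n} I(h_i f) &\le \sum_{i=1}^{n} (y_i + \epsilon) I(h_i) \le \sum_{i=1}^{n} (y_i + \epsilon)\bigl(\mu(E_i) + \epsilon/n\bigr) \\
&= \sum_{i=1}^{n} (y_i - \epsilon)\mu(E_i) + 2\epsilon\mu(K) + \frac{\epsilon}{n} \sum_{i=1}^{n} (y_i + \epsilon) \\
&\le \int_X f\,d\mu + 2\epsilon\mu(K) + \frac{\epsilon}{n} \sum_{i=1}^{n} (1 + \epsilon) = \int_X f\,d\mu + 2\epsilon\mu(K) + \epsilon(1 + \epsilon).
\end{align*}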

This establishes inequality (3) because n is arbitrary. 

The following theorem summarizes the properties of Radon measures.

Theorem 8.4.25. Suppose X is a locally compact Hausdorff space, and let I be a
positive linear functional on 𝒞c (X). Then the Radon measure 𝜇 induced by I is a
complete measure on 𝔐, and

(a) every Borel subset of X is Radon measurable;
(b) for every open subset V, 𝜇(V) = sup{I(f ) ∶ f ≺ V};
(c) 𝜇 is outer regular;
(d) 𝜇 is inner regular on open sets and sets of finite 𝜇-measure;
(e) for every compact set K, 𝜇(K) = inf{I(f ) ∶ K ≺ f}; and
(f ) 𝜇 extends (represents) I in the sense that ∫X fd𝜇 = I(f ) for every f ∈ 𝒞c (X).

Additionally, 𝜇 is unique, subject to these properties. ∎

To prove the uniqueness part of this theorem, mimic the proof of theorem 8.4.16.
Observe that the proof of theorem 8.4.16 is based only on the outer regularity of
the measures in question and their inner regularity for open sets.

The following theorem provides a sufficient condition for the inner regularity of
Radon measures (for all E ∈ 𝔐). The proof is identical to that of theorem 8.4.10.

Theorem 8.4.26. Suppose X is a 𝜎-compact, locally compact Hausdorff space, and
let E ∈ 𝔐.

(a) For 𝜖 > 0, there exist a closed subset F and an open subset V such that F ⊆
E ⊆ V and 𝜇(V − F) < 𝜖.
(b) 𝜇(E) = sup{𝜇(K) ∶ K compact, K ⊆ E}.
(c) There exist an F𝜎 set A and a G𝛿 set B such that A ⊆ E ⊆ B and
𝜇(B − A) = 0. ∎

Exercises

1. Let f ∈ 𝒞c (ℝn ). Prove that f is uniformly continuous.


2. Prove that every countable subset of ℝn has measure 0, and find an example
of a subset E of ℝn such that 𝜆(E) = 0 but 𝜆(𝜕E) = ∞.
3. The goal of this exercise is to show the existence of non-Lebesgue measur-
able subsets of ℝ. Complete the following sketch of the proof. Define an
equivalence relation ≈ on ℝ by x ≈ y if x − y ∈ ℚ. Each equivalence class
of ≈ intersects the interval [0, 1/2]. Let P be a subset of [0, 1/2] containing
exactly one element from each of the equivalence classes of ≈. Enumerate
the rational numbers in [−1/2, 1/2] as {rn ∶ n ∈ ℕ}, and let Pn = rn + P.
Show that the family $\{P_n : n \in \mathbb{N}\}$ is disjoint and that the union $A = \bigcup_{n=1}^{\infty} P_n$
satisfies $[0, 1/2] \subseteq A \subseteq [-1/2, 1]$. If $P$ were measurable, then $1/2 \le \lambda(A) \le 3/2$.
But $\lambda(A) = \sum_{n=1}^{\infty} \lambda(P_n)$.
4. Show that not every Lebesgue measurable set is a Borel set. Construction:
Let $C$ be the Cantor set, and define a function $f : [0, 1] \to C$ as follows:
$f(0) = 0$, and, for $x \in (0, 1]$, write $x = \sum_{i=1}^{\infty} a_i/2^i$ with each $a_i \in \{0, 1\}$,
and set $f(x) = \sum_{i=1}^{\infty} 2a_i/3^i$.⁸ The
function $f$ is measurable by problem 18 in section 8.2, and is one-to-one
by theorem 4.2.18. Choose a subset $P$ of $[0, 1]$ which is not Lebesgue
measurable. The set A = f(P) is the set you need. Recall that the Cantor set
has measure 0; see example 4.
5. Use the fact that the Cantor set has measure 0 to show that Card(ℒ(ℝ)) =
2^𝔠. It can be shown that Card(ℬ(ℝ)) = 𝔠. Thus there are many more
Lebesgue measurable subsets than Borel subsets of ℝ.
6. Let V = {(x1 , … , xn ) ∈ ℝn ∶ xn = 0}. Prove that 𝜆(V) = 0.
7. Compute the Lebesgue measure of the set {x ∈ (0, 1/𝜋) ∶ sin(1/x) > 0}.
8. Let E be a subset of ℝn .
(a) Prove that E is Lebesgue measurable if and only if, for every 𝜖 > 0, there
exists an open set V containing E such that m∗ (V − E) < 𝜖. Hint: The
necessity of the condition is by theorem 8.4.10. To prove the sufficiency,
use the identity A ∩ E′ = (A ∩ V′ ) ∪ (A ∩ (V − E)).

⁸ We use the series representation of x if x has a terminating binary expansion.



(b) Prove that E is Lebesgue measurable if and only if, for every 𝜖 > 0, E
contains a closed subset F such that m∗ (E − F) < 𝜖.
The importance of this problem and the next is that they provide more
intuitive characterizations of Lebesgue measurability than the Carathéodory
condition does. Intuitively, a subset E of ℝn is Lebesgue measurable if it can
be approximated from the outside by an open set or from the inside by a
closed set.
9. Let E be a subset of ℝn .
(a) Prove that E is Lebesgue measurable if and only if there exists a G𝛿 set
G containing E such that m∗ (G − E) = 0.
(b) Prove that E is Lebesgue measurable if and only if E contains an F𝜎 set
F such that m∗ (E − F) = 0.
10. In this exercise, we use 𝜆k to denote the Lebesgue measure on ℝk . Let r and
s be positive integers, and let n = r + s. Prove that if U ⊆ ℝr and V ⊆ ℝs are
open sets, then 𝜆n (U × V) = 𝜆r (U)𝜆s (V). Hint: 𝒮k (U × V) = 𝒮k (U) × 𝒮k (V).
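11. Let $(r_n)$ be an enumeration of $\mathbb{Q}$, and let $G = \bigcup_{n=1}^{\infty} \bigl(r_n - \frac{1}{n^2}, r_n + \frac{1}{n^2}\bigr)$. Prove
that $\lambda(G \,\Delta\, F) > 0$ for every closed subset $F$ of $\mathbb{R}$. Hint: Show that if
$\lambda(G - F) = 0$, then $F = \mathbb{R}$.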
12. Let f be a continuous function in 𝔏1 (ℝn ). Show that if lim‖x‖→∞ f (x) exists,
then lim‖x‖→∞ f (x) = 0. Also give an example to show that a continuous
positive integrable function need not be bounded.
13. Let f ∈ 𝔏1 (ℝn ), and let a ∈ ℝn be fixed. Define (𝜏a f)(x) = f(x − a). Show
that ∫ℝn fd𝜆 = ∫ℝn (𝜏a f)d𝜆. This is a familiar linear change of variables
formula. Using more conventional notation, ∫ℝn f (x)d𝜆(x) = ∫ℝn f(x −
a)d𝜆(x).
14. For a subset E of ℝn , let −E = {−x ∶ x ∈ E}. Prove that E is measurable if
and only if −E is measurable and, in this case, 𝜆(−E) = 𝜆(E).
15. For r > 0 and E ⊆ ℝn , define rE = {rx ∶ x ∈ E}. Prove that E is measurable
if and only if rE is measurable and that 𝜆(rE) = r^n 𝜆(E).
16. Let f ∈ 𝔏1 (ℝn ). For r > 0, define fr (y) = f(ry). Show that ∫ℝn fd𝜆 =
r^n ∫ℝn fr d𝜆. Using more familiar notation, if x = ry, then d𝜆(x) = r^n d𝜆(y).
17. Let X be an infinite-dimensional normed linear space. Prove that there
does not exist a translation-invariant measure on ℬ(X) that assigns finite
measure to bounded sets in ℬ(X). Hint: Use Riesz’s theorem to find a
sequence {un } of unit vectors such that ‖ui − uj ‖ ≥ 1/2.

8.5 Complex Measures

Complex measures do not really measure anything in the strict geometric sense of
the word, but they do share the defining property of a positive measure, namely,
countable additivity. Although they are rather abstract, real and complex measures
have applications in differentiation and probability theories, among many other
applications. We study the notion of differentiating one measure with respect to
another measure, and our main result is the Radon-Nikodym theorem, which we
apply in section 8.6 to study duals of 𝔏p spaces. Although the section results are
limited to the basics, example 2 and the section exercises significantly expand
the scope of the section, where we introduce such topics as the total variation
of real and complex measures, uniform integrability, and measurable dissections.
The properties of the Radon-Nikodym derivatives are also explored in the section
exercises.

Definition. Let $(X, \mathfrak{M})$ be a measurable space. A real measure on $\mathfrak{M}$ is a countably
additive function $\nu : \mathfrak{M} \to \mathbb{R}$. Thus if $(E_n)$ is a disjoint sequence in $\mathfrak{M}$, and
$E = \bigcup_{n=1}^{\infty} E_n$, then $\nu(E) = \sum_{n=1}^{\infty} \nu(E_n)$. By definition, $\nu(\emptyset) = 0$. Observe that
finite positive measures are real measures.

In this definition, it is tacitly assumed that the series is absolutely convergent. We
call the reader’s attention to the fact that, according to the above definition, 𝜈 takes
finite values, that is, 𝜈(E) = ∞ and 𝜈(E) = −∞ are specifically not permitted.⁹

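Theorem 8.5.1. Let $(X, \mathfrak{M}, \nu)$ be a real measure space.

(a) If $(E_n)$ is an ascending sequence in $\mathfrak{M}$, then $\nu\bigl(\bigcup_{n=1}^{\infty} E_n\bigr) = \lim_n \nu(E_n)$.
(b) If $(E_n)$ is a descending sequence in $\mathfrak{M}$, then $\nu\bigl(\bigcap_{n=1}^{\infty} E_n\bigr) = \lim_n \nu(E_n)$.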

Proof. The proofs parallel those of theorem 8.2.3 and are therefore omitted. 

Definition. Let (X, 𝔐, 𝜈) be a real measure space. A measurable set E is said to
be a 𝜈-positive set (or simply positive) if 𝜈(F ) ≥ 0 for every measurable subset
F of E. A measurable set E is said to be a 𝜈-negative set (or simply negative) if
𝜈(F ) ≤ 0 for every measurable subset F of E. A measurable set E is said to be
𝜈-null if 𝜈(F ) = 0 for every measurable subset F of E.

Clearly, if a measurable set E is both negative and positive, then E is 𝜈-null.

Warning. Monotonicity does not hold for real measures. It is possible for a set of
positive measure to contain a subset of negative measure, and conversely. Mono-
tonicity does hold, however, for positive and negative sets: if F is a measurable
subset of a positive set E, then 𝜈(F ) ≤ 𝜈(E).

⁹ This definition is not standard. Most books allow a real measure to take extended real values, ∞
or −∞, but not both. The standard term used in this case is signed measure.

Proposition 8.5.2. A measurable subset of a positive set is positive, and the count-
able union of positive sets is positive. The corresponding statements are true for
negative sets.

Proof. The first assertion follows from the definition. To prove the second assertion,
let $(E_n)$ be a sequence of positive measurable subsets of $X$. Define $A_1 = E_1$,
and, for $n \ge 2$, let $A_n = E_n - \bigcup_{i=1}^{n-1} E_i$. The sequence $(A_n)$ is disjoint, and each $A_n$
is a positive set. Now let $E$ be a measurable subset of $\bigcup_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} A_n$. Since
$\nu(E \cap A_n) \ge 0$, $\nu(E) = \sum_{n=1}^{\infty} \nu(E \cap A_n) \ge 0$. ∎

Lemma 8.5.3. Every set of positive measure contains a positive set of positive
measure.

Proof. Suppose, for a contradiction, that S is a set of positive measure that contains
no positive sets of positive measure. We first establish the following:

If $A \subseteq S$ and $\nu(A) > 0$, then there is a subset $B$ of $A$ such that $\nu(B) > \nu(A)$.    (*)

Since $A$ is not a positive set, $A$ contains a subset $C$ such that $\nu(C) < 0$. Set $B =
A - C$. Then $\nu(B) = \nu(A) - \nu(C) > \nu(A)$. This proves (*).
Set $A_1 = S$, and let $n_1$ be the least natural number for which $\nu(A_1) > \frac{1}{n_1}$. By
(*), $A_1$ contains a set $B$ such that $\nu(B) > \nu(A_1)$. Let $n_2$ be the least natural
number for which $A_1$ contains a set $B$ such that $\nu(B) > \nu(A_1) + \frac{1}{n_2}$, and let $A_2$
be such a set. Continue inductively to construct a sequence of natural numbers
$n_1, n_2, \dots$ and measurable sets $A_1 \supseteq A_2 \supseteq \cdots$ such that $n_j$ is the least positive
integer for which $A_{j-1}$ contains a set $B$ with $\nu(B) > \nu(A_{j-1}) + \frac{1}{n_j}$, and $A_j$ is such
a set. Now $\nu(A_3) > \nu(A_2) + \frac{1}{n_3} > \nu(A_1) + \frac{1}{n_2} + \frac{1}{n_3} > \frac{1}{n_1} + \frac{1}{n_2} + \frac{1}{n_3}$. Inductively,
$\nu(A_j) > \sum_{i=1}^{j} \frac{1}{n_i}$. Define $A = \bigcap_{j=1}^{\infty} A_j$. Then $\infty > \nu(A) = \lim_j \nu(A_j) \ge \sum_{j=1}^{\infty} \frac{1}{n_j}$. In
particular, $\sum_{j=1}^{\infty} \frac{1}{n_j}$ is convergent, and $\lim_j n_j = \infty$. Again by (*), $A$ contains a
subset $B$ such that $\nu(B) > \nu(A)$, and there is a natural number $n$ such that $\nu(B) >
\nu(A) + \frac{1}{n}$. But there is an integer $j$ such that $n_j > n$. Thus $\nu(B) > \nu(A) + \frac{1}{n} >
\nu(A_{j-1}) + \frac{1}{n}$. This contradicts the definition of $n_j$ because $B \subseteq A_{j-1}$. ∎

Theorem 8.5.4 (the Hahn decomposition theorem). If (X, 𝔐, 𝜈) is a real measure
space, then there exist a positive set P and a negative set N such that X = P ∪ N
and P ∩ N = ∅. The sets P and N are essentially unique in the sense that if (Q, M)
is another pair of subsets of X satisfying the conclusion of the theorem, then PΔQ
and NΔM are 𝜈-null sets. Here PΔQ = (P − Q) ∪ (Q − P).

Proof. Let $K = \sup\{\nu(E) : E \in \mathfrak{M},\ E \text{ positive}\}$, let $(P_n)$ be a sequence of positive
measurable subsets such that $\lim_n \nu(P_n) = K$, and let $P = \bigcup_{n=1}^{\infty} P_n$. By proposition
8.5.2, P is positive; hence 𝜈(P) ≤ K. Now 𝜈(Pn ) ≤ 𝜈(P) ≤ K implies that K =
limn 𝜈(Pn ) ≤ 𝜈(P) ≤ K. Therefore 𝜈(P) = K. Notice that this proves that K < ∞.
Let N = X − P. We show that N is a negative set, and this will prove the existence
part of the theorem. If N is not negative, then it contains a subset S of positive
measure. By lemma 8.5.3, S contains a positive subset G of positive measure. This
contradicts the definition of K, since P ∪ G would be a positive set and 𝜈(P ∪ G) =
𝜈(P) + 𝜈(G) > 𝜈(P) = K. If the pair (Q, M) also satisfies the conclusion of the
theorem, then P − Q = P ∩ M is both positive and negative and hence P − Q is
a 𝜈-null set. Similarly, Q − P is 𝜈-null and so is PΔQ. One shows that NΔM is
𝜈-null using an argument identical to this one. 

Corollary 8.5.5. A real measure is bounded.

Proof. We use the notation of the proof of the previous theorem. Let k = 𝜈(N). For a
measurable set E, 𝜈(E) = 𝜈(E ∩ P) + 𝜈(E ∩ N). Since 0 ≤ 𝜈(E ∩ P) ≤ K, and k ≤
𝜈(E ∩ N) ≤ 0, k ≤ 𝜈(E) ≤ K. 

Definition. Two positive measures 𝜈 and 𝜇 on a 𝜎-algebra 𝔐 are called mutually
singular if there exist disjoint measurable subsets Q and M such that Q ∪ M = X
and 𝜇(Q) = 0 = 𝜈(M).

Theorem 8.5.6 (the Jordan decomposition theorem). If 𝜈 is a real measure, then
there exist unique, finite, positive, mutually singular measures 𝜈 + and 𝜈 − such
that, for every E ∈ 𝔐, 𝜈(E) = 𝜈 + (E) − 𝜈 − (E).

Proof. Let (P, N) be a Hahn decomposition of 𝜈, and define 𝜈 + (E) = 𝜈(E ∩ P), and
𝜈− (E) = −𝜈(E ∩ N). The pair 𝜈+ and 𝜈 − has the desired properties since 𝜈 + (N) =
0 = 𝜈 − (P). If 𝜇+ and 𝜇− is another pair satisfying the stated properties with
𝜇+ (M) = 0 = 𝜇− (Q), where Q ∩ M = ∅, Q ∪ M = X, then the pair (Q, M) is a
Hahn decomposition of 𝜈 and hence PΔQ is 𝜈-null. Therefore, for E ∈ 𝔐,

𝜇+ (E) = 𝜇+ (E ∩ Q) + 𝜇+ (E ∩ M) = 𝜇+ (E ∩ Q)
= 𝜇+ (E ∩ Q) − 𝜇− (E ∩ Q) = 𝜈(E ∩ Q) = 𝜈(E ∩ P) = 𝜈 + (E).

Thus 𝜇+ = 𝜈 + ; hence 𝜇− = 𝜈 − . 

Definitions. The finite positive measures 𝜈 + and 𝜈− are called the positive and
negative variations of 𝜈, respectively. The finite positive measure |𝜈| = 𝜈 + + 𝜈 −
is called the total variation of 𝜈. Notice that, for every E ∈ 𝔐, |𝜈(E)| ≤ |𝜈|(E).
Define ‖𝜈‖ = |𝜈|(X).
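
The Hahn and Jordan decompositions become concrete on a finite set, where a real measure is just a choice of point masses. The following is a minimal sketch under that assumption; the set X = {a, b, c, d}, the weights, and all function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical point masses of a real measure nu on X = {a, b, c, d}.
weights = {"a": Fraction(3), "b": Fraction(-2), "c": Fraction(1, 2), "d": Fraction(-1)}

def nu(E):
    """nu(E) is the sum of the point masses of the points of E."""
    return sum((weights[x] for x in E), Fraction(0))

X = set(weights)
P = {x for x in X if weights[x] >= 0}  # a positive set
N = X - P                              # its complement, a negative set

def nu_plus(E):
    """Positive variation: nu restricted to P."""
    return nu(set(E) & P)

def nu_minus(E):
    """Negative variation: minus nu restricted to N."""
    return -nu(set(E) & N)

def total_variation(E):
    """|nu|(E) = nu_plus(E) + nu_minus(E)."""
    return nu_plus(E) + nu_minus(E)

E = {"a", "b", "d"}
print(nu(E), nu_plus(E), nu_minus(E), total_variation(E))
```

Here the set E has 𝜈-measure 0 even though its total variation is 6, which illustrates the inequality |𝜈(E)| ≤ |𝜈|(E) and why ‖𝜈‖ = |𝜈|(X), not 𝜈(X), measures the size of 𝜈.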

Example 1. Let $\lambda$ be Lebesgue measure on $\mathbb{R}$, and let $\mathcal{L}$ be the set of Lebesgue
measurable subsets of $\mathbb{R}$. Let $f(x) = xe^{-|x|}$, and define a set function $\nu : \mathcal{L} \to
\mathbb{R}$ by $\nu(E) = \int_E f\,d\lambda$ $(E \in \mathcal{L})$. One can easily check that $\nu$ is a real measure
(also see theorem 8.5.7). A Hahn decomposition of $\nu$ consists of the sets
$P = [0, \infty)$ and $N = (-\infty, 0)$. Let $A = (-1, 2)$, $B = (-1, 0)$, and $C =
(0, 2)$. Notice that while $B \subseteq A$, $A$ has positive measure and $B$ has negative
measure. Also $C \subseteq A$, but $\nu(C) = \int_0^2 xe^{-x}\,dx > \nu(A) = \int_0^2 xe^{-x}\,dx + \int_{-1}^{0} xe^{x}\,dx$. One
can see that $\nu^+(E) = \int_{E \cap (0,\infty)} f\,d\lambda$, $\nu^-(E) = -\int_{E \cap (-\infty,0)} f\,d\lambda$, and that $|\nu|(E) =
\int_E |f|\,d\lambda$. In particular, $\nu(\mathbb{R}) = 0$, while $\|\nu\| = |\nu|(\mathbb{R}) = 2$. ∎
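
The values in example 1 can be checked numerically from the elementary antiderivatives of x e^{-x} and x e^{x}. The sketch below is only a sanity check; the helper names are hypothetical, and ℝ is approximated by the interval (−50, 50), outside of which the integrand is negligible.

```python
import math

def F_pos(x):
    """Antiderivative of x*exp(-x), used where x >= 0."""
    return -(x + 1.0) * math.exp(-x)

def F_neg(x):
    """Antiderivative of x*exp(x), used where x <= 0."""
    return (x - 1.0) * math.exp(x)

def nu_interval(a, b):
    """nu((a, b)) for the density f(x) = x*exp(-|x|), assuming a <= b."""
    total = 0.0
    if a < 0:
        total += F_neg(min(b, 0.0)) - F_neg(a)
    if b > 0:
        total += F_pos(b) - F_pos(max(a, 0.0))
    return total

nu_A = nu_interval(-1.0, 2.0)
nu_B = nu_interval(-1.0, 0.0)
nu_C = nu_interval(0.0, 2.0)
# Total variation of nu on (-50, 50): nu_plus minus the (negative) mass on the left half.
norm = nu_interval(0.0, 50.0) - nu_interval(-50.0, 0.0)
print(nu_A, nu_B, nu_C, norm)
```

The output shows 𝜈(B) < 0 < 𝜈(A) < 𝜈(C) although B and C are both subsets of A, so monotonicity fails, and the approximate norm is 2.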

Definition. Let (X, 𝔐) be a measurable space. A complex measure on 𝔐 is a
countably additive complex-valued function on 𝔐.

Now let 𝜈 be a complex measure, and let 𝜈r and 𝜈i be the real and imaginary parts
of 𝜈, that is for E ∈ 𝔐, 𝜈(E) = 𝜈r (E) + i𝜈i (E). Clearly, 𝜈r and 𝜈i are real measures;
hence ‖𝜈r ‖ < ∞, ‖𝜈i ‖ < ∞, and |𝜈(E)| ≤ |𝜈r (E)| + |𝜈i (E)| ≤ ‖𝜈r ‖ + ‖𝜈i ‖ < ∞.
Therefore, complex measures, like real measures, are bounded. Notice that the
set of complex measures contains the set of real measures and, in particular, the
set of finite positive measures. The set of complex measures on a 𝜎-algebra 𝔐 is a
vector space under the obvious operations: for complex measures 𝜈 and 𝜇 and for
a complex scalar 𝛼, (𝜈 + 𝜇)(E) = 𝜈(E) + 𝜇(E), and (𝛼𝜈)(E) = 𝛼𝜈(E) (E ∈ 𝔐).

The following theorem generalizes example 1 and provides a rich source of real
and complex measures.

Theorem 8.5.7. Let (X, 𝔐) be a measurable space, and let 𝜇 be a positive (not
necessarily finite) measure on 𝔐. If h ∈ 𝔏1 (𝜇), then the following set function
defines a complex measure on 𝔐:

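\[ \nu(E) = \int_E h\,d\mu. \]

Proof. Let $\{E_n\}$ be a disjoint family of members of $\mathfrak{M}$, and let $E = \bigcup_{n=1}^{\infty} E_n$. Then
$h\chi_E = \sum_{n=1}^{\infty} h\chi_{E_n} = \lim_n \sum_{i=1}^{n} h\chi_{E_i}$. Since $\bigl|\sum_{i=1}^{n} h\chi_{E_i}\bigr| \le \sum_{i=1}^{n} |h|\chi_{E_i} \le |h| \in \mathfrak{L}^1(\mu)$,
the dominated convergence theorem implies that

\begin{align*}
\nu(E) = \int_X h\chi_E\,d\mu &= \int_X \sum_{n=1}^{\infty} h\chi_{E_n}\,d\mu \\
&= \lim_n \int_X \sum_{i=1}^{n} h\chi_{E_i}\,d\mu = \lim_n \sum_{i=1}^{n} \nu(E_i) = \sum_{n=1}^{\infty} \nu(E_n).
\end{align*}
∎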

Definition. Let $\mu$ and $\nu$ be as in theorem 8.5.7. The function $h$ is called the Radon-Nikodym
derivative of $\nu$ with respect to $\mu$, and we symbolically write $h = \frac{d\nu}{d\mu}$,
or $d\nu = h\,d\mu$, to indicate that $\nu(E) = \int_E h\,d\mu$. The following theorem justifies the
definition and the notation.

Theorem 8.5.8. Let $\mu$ be a positive measure, let $h \in \mathfrak{L}^1(\mu)$ be a positive function,
and let $d\nu = h\,d\mu$. Then, for a positive measurable function $f$,

\[ \int_X f\,d\nu = \int_X fh\,d\mu. \]

In particular, if $f \in \mathfrak{L}^1(\nu)$, then $fh \in \mathfrak{L}^1(\mu)$ and $\|f\|_{1,\nu} = \|fh\|_{1,\mu}$.

Proof. For a measurable set E, ∫X 𝜒E d𝜈 = 𝜈(E) = ∫E hd𝜇 = ∫X 𝜒E hd𝜇. Linearity
guarantees that ∫X sd𝜈 = ∫X shd𝜇 for every simple function s. Now, for a
positive function f, let sn be an increasing sequence of simple functions such
that limn sn = f. Then sn h increases to fh, and, by the monotone convergence
theorem, ∫X fd𝜈 = limn ∫X sn d𝜈 = limn ∫X sn hd𝜇 = ∫X fhd𝜇. The remaining parts
of the theorem are obvious. ∎
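
On a finite set with point masses, every integral is a finite sum, and the identity ∫X f d𝜈 = ∫X fh d𝜇 can be verified exactly. The sketch below assumes such a discrete setting; the points x1, x2, x3, the masses, and the helper `integral` are hypothetical illustrations, not part of the text.

```python
from fractions import Fraction

# Hypothetical positive measure mu (point masses) and density h on X = {x1, x2, x3}.
mu = {"x1": Fraction(1, 2), "x2": Fraction(2), "x3": Fraction(0)}
h = {"x1": Fraction(4), "x2": Fraction(1, 3), "x3": Fraction(7)}

def nu(E):
    """nu(E) = integral of h over E with respect to mu, i.e. d(nu) = h d(mu)."""
    return sum((h[x] * mu[x] for x in E), Fraction(0))

def integral(g, weight):
    """Integral of g against the point masses in `weight` (a finite sum)."""
    return sum((g[x] * weight[x] for x in weight), Fraction(0))

f = {"x1": Fraction(3), "x2": Fraction(5), "x3": Fraction(11)}
nu_masses = {x: nu({x}) for x in mu}                # point masses of nu
lhs = integral(f, nu_masses)                        # integral of f d(nu)
rhs = integral({x: f[x] * h[x] for x in mu}, mu)    # integral of f*h d(mu)
print(lhs, rhs)
```

Both sides equal 28/3. The point x3 has 𝜇-measure zero, so it has 𝜈-measure zero as well, whatever the value of h there.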

Theorem 8.5.8 justifies the definition of the Radon-Nikodym derivative and the
notation $h = \frac{d\nu}{d\mu}$. Indeed, this theorem can be stated using the notation
$\int_X f\,d\nu = \int_X f\,\frac{d\nu}{d\mu}\,d\mu$. Observe that the last formula is reminiscent of the change of variables
formula. Problem 8 at the end of this section is what one might call the chain rule
for Radon-Nikodym derivatives.

Definition. Let 𝜇 be a positive (not necessarily finite) measure on a 𝜎-algebra 𝔐,
and let 𝜈 be an arbitrary complex measure on 𝔐. We say that 𝜈 is absolutely
continuous with respect to 𝜇 if, for every E ∈ 𝔐, 𝜇(E) = 0 implies 𝜈(E) = 0. In
this situation, we write 𝜈 << 𝜇.

Notice that if 𝔐, 𝜇, h, and 𝜈 are as in theorem 8.5.7, then 𝜈 << 𝜇. The Radon-
Nikodym theorem, in effect, is the converse of theorem 8.5.7.

If 𝜈 is a real measure and 𝜈 = 𝜈 + − 𝜈 − is the Jordan decomposition of 𝜈, then 𝜈 <<
𝜇 if and only if 𝜈 + << 𝜇 and 𝜈− << 𝜇. Also if 𝜈 = 𝜈r + i𝜈i is a complex measure,
then 𝜈 << 𝜇 if and only if 𝜈r and 𝜈i are absolutely continuous with respect to 𝜇. We
leave the details to the reader to verify.

Lemma 8.5.9. Let 𝜈 and 𝜇 be finite positive measures on a 𝜎-algebra 𝔐, and
suppose that 𝜈 is not identically zero and that 𝜈 << 𝜇. Then there exist a positive
number 𝜖 and a set P ∈ 𝔐 such that

(a) 𝜇(P) > 0, and
(b) P is positive for the measure 𝜈 − 𝜖𝜇, that is, 𝜈(E ∩ P) − 𝜖𝜇(E ∩ P) ≥ 0 for
every E ∈ 𝔐.

Proof. Since 0 < 𝜈(X) < ∞ and 0 < 𝜇(X) < ∞, there exists a positive number 𝜖
such that 𝜈(X) − 𝜖𝜇(X) > 0. Let (P, N) be the Hahn decomposition of the real
measure 𝜈 − 𝜖𝜇. Then P is positive for 𝜈 − 𝜖𝜇. If 𝜇(P) = 0, then 𝜈(P) = 0; hence
𝜈(X) − 𝜖𝜇(X) = 𝜈(N) − 𝜖𝜇(N) ≤ 0, since N is negative for 𝜈 − 𝜖𝜇. This contra-
dicts 𝜈(X) − 𝜖𝜇(X) > 0 and proves that 𝜇(P) > 0. 

Theorem 8.5.10 (the Radon-Nikodym theorem—the real version). If 𝜈 and
𝜇 are finite positive measures and 𝜈 << 𝜇, then there exists a unique positive
function f ∈ 𝔏1 (𝜇) such that d𝜈 = fd𝜇, that is,

𝜈(E) = ∫E fd𝜇 for every E ∈ 𝔐.

Proof. The uniqueness of f follows from example 4 in section 8.3.¹⁰ Let 𝔉 be the
following set of measurable functions:

𝔉 = {f ≥ 0 ∶ ∫E fd𝜇 ≤ 𝜈(E) ∀ E ∈ 𝔐}.

Since f = 0 ∈ 𝔉, 𝔉 ≠ ∅. Every f ∈ 𝔉 is 𝜇-integrable since ∫X fd𝜇 ≤ 𝜈(X) < ∞.
It follows that 𝛼 = sup{∫X fd𝜇 ∶ f ∈ 𝔉} is finite. We first prove the fact that if
f, g ∈ 𝔉, then h = max{f, g} ∈ 𝔉. Let A = {x ∈ X ∶ f (x) ≥ g(x)}, and B = {x ∈
X ∶ f (x) < g(x)}. For every E ∈ 𝔐, ∫E hd𝜇 = ∫A∩E hd𝜇 + ∫B∩E hd𝜇 = ∫A∩E fd𝜇 +
∫B∩E gd𝜇 ≤ 𝜈(A ∩ E) + 𝜈(B ∩ E) = 𝜈(E). Hence h ∈ 𝔉. By the definition of
𝛼, there exists a sequence g1 , g2 , ... ∈ 𝔉 such that limn ∫X gn d𝜇 = 𝛼. Let f1 =
g1 , f2 = max{g1 , g2 }, … , fn = max{g1 , g2 , … , gn }. By the above fact, fn ∈ 𝔉, 0 ≤ f1 ≤
f2 ≤ … , and limn ∫X fn d𝜇 = 𝛼. Set f (x) = limn fn (x). By the monotone convergence
theorem, ∫E fd𝜇 = limn ∫E fn d𝜇 ≤ 𝜈(E); hence f ∈ 𝔉. Also, ∫X fd𝜇 = 𝛼. We claim
that 𝜈(E) = ∫E fd𝜇 for every E ∈ 𝔐. To this end, it is enough to show that the
measure 𝜁(E) = 𝜈(E) − ∫E fd𝜇 is identically equal to zero. If not, then 𝜁 would be
a nonzero finite positive measure (because f ∈ 𝔉), and 𝜁 << 𝜇. By lemma 8.5.9, there exists a positive number
𝜖 and a set P with 𝜇(P) > 0 such that P is positive for 𝜁 − 𝜖𝜇. Thus, for every

¹⁰ More precisely, if f1 d𝜇 = f2 d𝜇 for fi ∈ 𝔏1 (𝜇), then f1 = f2 , 𝜇-a.e.



E ∈ 𝔐, 𝜁(E) ≥ 𝜁(E ∩ P) ≥ 𝜖𝜇(E ∩ P) = 𝜖 ∫E 𝜒P d𝜇, and 𝜈(E) = ∫E fd𝜇 + 𝜁(E) ≥
∫E (f + 𝜖𝜒P )d𝜇. Thus f + 𝜖𝜒P ∈ 𝔉. But this leads to the following violation of the
definition of 𝛼: ∫X (f + 𝜖𝜒P )d𝜇 = ∫X fd𝜇 + 𝜖𝜇(P) > ∫X fd𝜇 = 𝛼. ∎
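
For measures given by point masses on a finite set, the Radon-Nikodym derivative can be written down directly as a ratio of masses, which gives a quick way to see what the theorem asserts. The following is a sketch under that assumption only; it does not follow the supremum construction of the proof, and the names `density` and `subsets` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite positive measures on X = {a, b, c}; nu vanishes wherever mu does.
mu = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3)}
nu = {"a": Fraction(2), "b": Fraction(0), "c": Fraction(1, 2)}

def density(mu, nu):
    """Return f with nu({x}) = f(x) * mu({x}); fails if nu is not << mu."""
    dens = {}
    for x in mu:
        if mu[x] == 0:
            if nu[x] != 0:
                raise ValueError("nu is not absolutely continuous with respect to mu")
            dens[x] = Fraction(0)  # any value works on a mu-null point
        else:
            dens[x] = nu[x] / mu[x]
    return dens

def subsets(points):
    """All subsets of a finite collection of points, as tuples."""
    pts = list(points)
    return chain.from_iterable(combinations(pts, r) for r in range(len(pts) + 1))

f = density(mu, nu)
# Check nu(E) = integral of f over E with respect to mu, for every subset E.
represents = all(
    sum((nu[x] for x in E), Fraction(0)) == sum((f[x] * mu[x] for x in E), Fraction(0))
    for E in subsets(mu)
)
print(f, represents)
```

The freedom in choosing f on the 𝜇-null point b is the discrete counterpart of uniqueness up to 𝜇-a.e. equality.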

Theorem 8.5.11 (the Radon-Nikodym theorem). If 𝜇 is a finite positive measure
on 𝔐 and 𝜈 is a complex measure on 𝔐 such that 𝜈 << 𝜇, then there exists
a unique complex-valued function f ∈ 𝔏1 (𝜇) such that, for every E ∈ 𝔐,
𝜈(E) = ∫E fd𝜇.

Proof. If 𝜈 is real and 𝜈 << 𝜇, then 𝜈+ << 𝜇 and 𝜈 − << 𝜇. By theorem 8.5.10, we
find positive 𝜇-integrable functions f+ and f− such that, for every E ∈ 𝔐, 𝜈 + (E) =
∫E f+ d𝜇, and 𝜈− (E) = ∫E f− d𝜇. Thus 𝜈(E) = ∫E fd𝜇, where f = f+ − f− ∈ 𝔏1 (𝜇).
If 𝜈 is a complex measure, apply the result we just established to the real and
imaginary parts of 𝜈, since each part is absolutely continuous with respect to 𝜇. 

As an application of the Radon-Nikodym theorem, we develop the definition of
the total variation of a complex measure.

Example 2. Let $\nu$ be a complex measure, and let $\mu$ be a finite positive measure
such that $\nu \ll \mu$. Such $\mu$ exists; one can take, for example, $\mu = |\nu_r| + |\nu_i|$.
By the Radon-Nikodym theorem, there exists a function $f \in \mathfrak{L}^1(\mu)$ such that
$d\nu = f\,d\mu$. We define the total variation of $\nu$ to be the finite positive measure
given by $d|\nu| = |f|\,d\mu$. Notice that this definition is consistent with the result
of problem 6(c) for real measures. We need to prove that $|\nu|$ is well defined in
the sense that it does not depend on the particular choice of $\mu$. Suppose that,
for finite positive measures $\mu_1$ and $\mu_2$, and for functions $f_i \in \mathfrak{L}^1(\mu_i)$, $f_1\,d\mu_1 =
f_2\,d\mu_2$. Let $\xi = \mu_1 + \mu_2$. Then $\xi$ is a finite positive measure and $\mu_i \ll \xi$. By
problem 8 in the section exercises, $f_1 \frac{d\mu_1}{d\xi}\,d\xi = f_2 \frac{d\mu_2}{d\xi}\,d\xi$. By the uniqueness of
the Radon-Nikodym derivative, $f_1 \frac{d\mu_1}{d\xi} = f_2 \frac{d\mu_2}{d\xi}$, $\xi$-a.e. Since $\frac{d\mu_i}{d\xi} \ge 0$,
$|f_1| \frac{d\mu_1}{d\xi} = \bigl|f_1 \frac{d\mu_1}{d\xi}\bigr| = \bigl|f_2 \frac{d\mu_2}{d\xi}\bigr| = |f_2| \frac{d\mu_2}{d\xi}$, $\xi$-a.e. Now, for a measurable set $E$,

\[ \int_E |f_1|\,d\mu_1 = \int_E |f_1| \frac{d\mu_1}{d\xi}\,d\xi = \int_E |f_2| \frac{d\mu_2}{d\xi}\,d\xi = \int_E |f_2|\,d\mu_2. \]

Exercises

In the following exercises, (X, 𝔐) is a measurable space and 𝜇 is a positive measure
on 𝔐.

1. Prove that if P and Q are positive sets for a real measure 𝜈 such that P Δ Q
is 𝜈-null, then 𝜈(E ∩ P) = 𝜈(E ∩ Q) = 𝜈(E ∩ P ∩ Q) for every measurable
set E.
2. Show that, for a real measure $\nu$, $\nu^+ = \frac{1}{2}(|\nu| + \nu)$ and $\nu^- = \frac{1}{2}(|\nu| - \nu)$.
3. Let 𝜈 be a real measure. Prove that if 𝜉 and 𝜂 are finite positive measures
such that 𝜈 = 𝜉 − 𝜂, then 𝜉 ≥ 𝜈 + and 𝜂 ≥ 𝜈 − .
4. Define the following function on the space of real measures on 𝔐: ‖𝜈‖ =
|𝜈|(X). Prove that ‖.‖ is a norm.
5. Show that if 𝜈 is a real measure on 𝔐, then 𝜈 << 𝜇 if and only if 𝜈 + << 𝜇
and 𝜈 − << 𝜇, if and only if |𝜈| << 𝜇.
6. Let f ∈ 𝔏1 (𝜇) be a real-valued function, and define 𝜈(E) = ∫E fd𝜇, (E ∈ 𝔐).
Prove that
(a) the pair (P, N) is a Hahn decomposition of 𝜈, where P = {x ∈ X ∶ f (x) ≥
0}, and N = {x ∈ X ∶ f (x) < 0};
(b) 𝜈 + (E) = ∫E f+ d𝜇, and 𝜈− (E) = − ∫E f− d𝜇; and
(c) $|\nu|(E) = \int_E |f|\,d\mu$; using our notation for Radon-Nikodym derivatives,
this result can be written as $\frac{d|\nu|}{d\mu} = \bigl|\frac{d\nu}{d\mu}\bigr|$.

Definition. A subset 𝔉 of 𝔏1 (𝜇) is said to be uniformly integrable if, for
every 𝜖 > 0, there exists 𝛿 > 0 such that ∫E | f |d𝜇 < 𝜖 for every f ∈ 𝔉 and for
every measurable set E with 𝜇(E) < 𝛿.

7. Prove that if (fn ) is a convergent sequence in 𝔏1 (𝜇), then {fn } is uniformly
integrable. Hint: See example 7 in section 8.3.
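8. Let $\zeta$, $\nu$, and $\mu$ be finite positive measures such that $\zeta \ll \nu \ll \mu$. Show that
$\frac{d\zeta}{d\mu} = \frac{d\zeta}{d\nu} \frac{d\nu}{d\mu}$. Conclude that if $\nu \ll \mu \ll \nu$, then $\frac{d\nu}{d\mu} = \bigl(\frac{d\mu}{d\nu}\bigr)^{-1}$ ($\mu$- or $\nu$-a.e.).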
9. Prove that if 𝜈 is a complex measure, then, for every measurable set E,
|𝜈(E)| ≤ |𝜈|(E).
10. For a complex measure 𝜈, define ‖𝜈‖ = |𝜈|(X). Prove that ‖.‖ is a norm on
the space of complex measures on 𝔐.
11. Let 𝜈 be a complex measure on 𝔐. Show that 𝜈 << 𝜇 if and only if, for
every 𝜖 > 0, there exists 𝛿 > 0 such that, for every E ∈ 𝔐, 𝜇(E) < 𝛿 implies
that |𝜈(E)| < 𝜖. Hint: See example 7 in section 8.3.

Definition. Let $\nu$ be a real measure on $\mathfrak{M}$, and let $E \in \mathfrak{M}$. A measurable dissection of $E$ is
a disjoint collection $\{E_1, \dots, E_n\}$ of measurable subsets such that $E = \bigcup_{i=1}^{n} E_i$.
12. Prove that, for every $E \in \mathfrak{M}$, $|\nu|(E) = \sup \sum_{i=1}^{n} |\nu(E_i)|$, where the supremum
is taken over all measurable dissections of $E$. The result also holds
when {Ei } is a countable dissection of E. Hint: If (P, N) is a Hahn decomposition
of 𝜈, then E1 = E ∩ P and E2 = E ∩ N is a measurable dissection
of E.

Definition. A positive measure $\mu$ on $(X, \mathfrak{M})$ is said to be $\sigma$-finite if
$X = \bigcup_{n=1}^{\infty} X_n$, where $X_n \in \mathfrak{M}$ and $\mu(X_n) < \infty$. We may, and often do, choose
$(X_n)$ to be a disjoint sequence.

13. Prove that theorem 8.5.11 is valid when 𝜇 is a 𝜎-finite positive measure and
𝜈 is a complex measure such that 𝜈 << 𝜇.
Here is a proof outline. It is sufficient to prove the result when 𝜈 is a finite
positive measure. For n ∈ ℕ, define two finite positive measures on 𝔐 as
follows: 𝜇n (E) = 𝜇(E ∩ Xn ) and 𝜈n (E) = 𝜈(E ∩ Xn ). Show that 𝜈n << 𝜇n . By
theorem 8.5.10, there exist positive functions $h_n \in \mathfrak{L}^1(\mu_n)$ such that $d\nu_n =
h_n\,d\mu_n$. Without loss of generality, $h_n$ vanishes outside $X_n$. Set $h = \sum_{n=1}^{\infty} h_n$.
Argue that h ∈ 𝔏1 (𝜇) and that d𝜈 = hd𝜇.

8.6 𝔏p Spaces

In addition to the function spaces ℬ(X), 𝒞(X), and ℬ𝒞(X), the 𝔏p spaces are
prototypical examples of Banach spaces and play a prominent role in modern
analysis. By far, the most important of the 𝔏p spaces is the Hilbert space
𝔏2 (X, 𝔐, 𝜇), where (X, 𝔐, 𝜇) is a positive measure space, such as a Lebesgue
measure restricted to a subset X of ℝn . The section results parallel those for the
sequence spaces lp . We prove the completeness of 𝔏p and derive the representation
theorem that, for 1 < p < ∞, 𝔏q is the dual of 𝔏p . In fact, the sequence spaces lp
are special cases of the 𝔏p spaces. See problem 1 at the end of this section. The
next section is a continuation of this one.

Throughout this section, $p \ge 1$ and $q \ge 1$ denote conjugate Hölder exponents; thus
$\frac{1}{p} + \frac{1}{q} = 1$. It is understood that $p = 1$ and $q = \infty$ are conjugate exponents.

Lemma 8.6.1. For all $x, y \in \mathbb{C}$,

(a) $|xy| \le \frac{|x|^p}{p} + \frac{|y|^q}{q}$, $1 < p, q < \infty$.
(b) $|x + y|^p \le 2^p(|x|^p + |y|^p)$, $1 \le p < \infty$.

Proof. Part (a) was established in lemma 3.6.1.
For part (b), $|x + y| \le |x| + |y| \le 2\max\{|x|, |y|\}$. Thus $|x + y|^p \le 2^p(\max\{|x|, |y|\})^p =
2^p \max\{|x|^p, |y|^p\} \le 2^p(|x|^p + |y|^p)$. ∎

Definition. Let $(X, \mathfrak{M})$ be a measurable space, and let $\mu$ be a positive measure on
$\mathfrak{M}$. For $1 \le p < \infty$, we define $\mathfrak{L}^p(\mu)$ to be the set of all measurable functions $f :
X \to \mathbb{C}$ such that $\int_X |f|^p\,d\mu < \infty$. If the measure $\mu$ is understood, we sometimes
write $\mathfrak{L}^p$ for $\mathfrak{L}^p(\mu)$. In anticipation of the fact that $\mathfrak{L}^p$ is a normed linear space,
we write

\[ \|f\|_p = \Bigl( \int_X |f|^p\,d\mu \Bigr)^{1/p} \quad \text{for } f \in \mathfrak{L}^p(\mu). \]

Definition. We define 𝔏∞ (𝜇) as follows. A measurable function f is said to be
essentially bounded if there exists a positive constant M such that | f (x)| ≤ M
for almost every x ∈ X. Thus f is bounded by M a.e. Such a constant M is called
an essential upper bound of f. The space 𝔏∞ (𝜇) is the set of all essentially
bounded functions on X. For f ∈ 𝔏∞ (𝜇), we define

‖ f‖∞ = inf{M > 0 ∶ M is an essential upper bound of f}.

We leave it to the reader to prove that ‖ f‖∞ is an essential upper bound of f.
Thus ‖ f‖∞ is the least nonnegative constant such that | f (x)| ≤ ‖ f‖∞ a.e.
Observe that if 0 < 𝜖 < ‖ f‖∞ , then the set {x ∈ X ∶ | f (x)| > ‖ f‖∞ − 𝜖} has a
positive measure.

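Theorem 8.6.2 (Hölder’s inequality). If $f \in \mathfrak{L}^p(\mu)$ and $g \in \mathfrak{L}^q(\mu)$, then $fg \in
\mathfrak{L}^1(\mu)$, and $\|fg\|_1 \le \|f\|_p \|g\|_q$.

Proof. Suppose first that $1 < p, q < \infty$. If $\|f\|_p = 0$ or $\|g\|_q = 0$, then $fg = 0$ a.e. and
there is nothing to prove, so assume that both norms are positive. By lemma 8.6.1,

\[ \frac{|f(x)|\,|g(x)|}{\|f\|_p \|g\|_q} \le \frac{1}{p} \frac{|f(x)|^p}{\|f\|_p^p} + \frac{1}{q} \frac{|g(x)|^q}{\|g\|_q^q}. \]

Integrating both sides, we obtain

\[ \frac{1}{\|f\|_p \|g\|_q} \int_X |fg|\,d\mu \le \frac{1}{p} \frac{\|f\|_p^p}{\|f\|_p^p} + \frac{1}{q} \frac{\|g\|_q^q}{\|g\|_q^q} = \frac{1}{p} + \frac{1}{q} = 1. \]

Therefore $\int_X |fg|\,d\mu \le \|f\|_p \|g\|_q$.
If $f \in \mathfrak{L}^1$ and $g \in \mathfrak{L}^\infty$, then $\int_X |fg|\,d\mu \le \int_X |f|\,\|g\|_\infty\,d\mu = \|f\|_1 \|g\|_\infty$. ∎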

When p = 2 = q, Hölder’s inequality is the familiar Cauchy-Schwarz inequality.
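As an illustration only, the inequality can be checked numerically in the simplest setting, the counting measure on a finite set, where the integrals become finite sums. The vectors f and g, the exponent p = 3, and the helper name p_norm below are arbitrary choices made for this sketch and are not taken from the text.

```python
def p_norm(v, p):
    """The l^p norm of a finite sequence (counting measure on a finite set)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

# Arbitrary sample data, chosen only for illustration.
f = [1.0, -2.0, 0.5, 3.0]
g = [0.25, 1.5, -4.0, 2.0]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1

lhs = sum(abs(x * y) for x, y in zip(f, g))    # ||fg||_1
rhs = p_norm(f, p) * p_norm(g, q)              # ||f||_p ||g||_q
cauchy_schwarz = p_norm(f, 2) * p_norm(g, 2)   # the case p = q = 2

print(lhs, rhs, cauchy_schwarz)
```

Both products on the right bound the sum on the left; a numerical check of this kind is of course no substitute for the proof.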

Example 1. If f, g ∈ 𝔏1 (𝜇), then √| fg| ∈ 𝔏1 .


By assumption, the functions √| f | and √|g| are in 𝔏2 . By the Cauchy-Schwarz
inequality, √| fg| ∈ 𝔏1 . 

Example 2. We show that ∫_1^∞ (e^{−x}/x) dx ≤ 1/(e√2).

Let f(x) = 1/x, and g(x) = e^{−x}. Then f and g are in 𝔏^2((1, ∞)), and ‖f‖_2 =
1, ‖g‖_2 = 1/(e√2). The desired inequality now follows from the Cauchy-Schwarz
inequality. ∎
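A rough numerical comparison of the two sides of this estimate may be helpful. The sketch below approximates the integral with a composite midpoint rule after truncating the interval at x = 40; the truncation point, the number of subintervals, and the helper name midpoint are assumptions of this illustration.

```python
import math

def midpoint(func, a, b, n):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Truncate (1, ∞) at 40; the discarded tail is smaller than e^{-40}.
integral = midpoint(lambda x: math.exp(-x) / x, 1.0, 40.0, 200_000)
bound = 1.0 / (math.e * math.sqrt(2.0))   # the Cauchy-Schwarz bound 1/(e√2)

print(integral, bound)
```

The computed integral lies below the bound, as the example asserts; the bound is not sharp, because f and g are not proportional.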

Example 3. If 𝜇(X) < ∞, then, for p ∈ (1, ∞), 𝔏p (𝜇) ⊆ 𝔏1 (𝜇), and, for f ∈ 𝔏p (𝜇),
‖ f‖1 ≤ ‖ f‖p (𝜇(X))1/q . In particular, if 𝜇(X) = 1, then ‖ f‖1 ≤ ‖ f‖p .

Because 𝜇(X) < ∞, the constant function g(x) = 1 ∈ 𝔏q . Using Hölder’s


inequality, we have ∫X | f |d𝜇 ≤ ‖ f‖p ‖g‖q = ‖ f‖p (𝜇(X))1/q . 

Example 4. Suppose that 𝜔 is a positive integrable function on ℝ^n and that
∫_{ℝ^n} 𝜔(x) dx = 1. If f is a measurable function such that |f|^p 𝜔 is Lebesgue integrable,
so is |f|𝜔, and

∫_{ℝ^n} |f(x)|𝜔(x) dx ≤ (∫_{ℝ^n} |f(x)|^p 𝜔(x) dx)^{1/p}.

Here p ∈ [1, ∞).

Define a finite measure on ℝ^n by d𝜇 = 𝜔 d𝜆, where 𝜆 is Lebesgue measure. Then 𝜇(ℝ^n) = 1, and
the measure 𝜇 and the function f satisfy the conditions of example 3 (the case p = 1 is trivial); hence the
result. ∎

Theorem 8.6.3 (Minkowski’s inequality). For f, g ∈ 𝔏^p, f + g ∈ 𝔏^p and ‖f + g‖_p ≤
‖f‖_p + ‖g‖_p.

Proof. We leave the cases p = 1 and p = ∞ to the reader. Assume 1 < p < ∞. By
lemma 8.6.1, ∫_X |f + g|^p d𝜇 ≤ ∫_X 2^p (|f|^p + |g|^p) d𝜇 = 2^p (‖f‖_p^p + ‖g‖_p^p) < ∞. This
shows that f + g ∈ 𝔏^p. To prove Minkowski’s inequality when 1 < p < ∞, notice
that if h ∈ 𝔏^p, then |h|^{p−1} ∈ 𝔏^q because (p − 1)q = p. Now, by Hölder’s inequality,

‖f + g‖_p^p = ∫_X |f + g|^p d𝜇
= ∫_X |f + g|^{p−1} |f + g| d𝜇 ≤ ∫_X (|f + g|^{p−1} |f| + |f + g|^{p−1} |g|) d𝜇
≤ (∫_X |f|^p d𝜇)^{1/p} (∫_X |f + g|^{(p−1)q} d𝜇)^{1/q} + (∫_X |g|^p d𝜇)^{1/p} (∫_X |f + g|^{(p−1)q} d𝜇)^{1/q}
= ‖f + g‖_p^{p/q} (‖f‖_p + ‖g‖_p).

If ‖f + g‖_p = 0, there is nothing to prove. Otherwise, dividing by ‖f + g‖_p^{p/q} and using
p − p/q = 1, we obtain ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p. ∎

It is now easy to verify that 𝔏p is a normed linear space for 1 ≤ p ≤ ∞.

Theorem 8.6.4 (completeness of 𝔏p (𝜇)). For an arbitrary positive measure 𝜇,


𝔏p (𝜇) is a Banach space for 1 ≤ p ≤ ∞.

Proof. We do two cases.

Case 1. 1 ≤ p < ∞. We use the result of problem 10 on section 6.1. Let (f_k) be a
sequence in 𝔏^p such that K = ∑_{k=1}^∞ ‖f_k‖_p < ∞. We show that the series ∑_{k=1}^∞ f_k
converges in 𝔏^p. Define g_n = ∑_{k=1}^n |f_k|, and let g = ∑_{k=1}^∞ |f_k|. Then g_n ∈ 𝔏^p
and ‖g_n‖_p ≤ ∑_{k=1}^n ‖f_k‖_p ≤ K. By the monotone convergence theorem, ∫_X g^p d𝜇 =
lim_n ∫_X g_n^p d𝜇 ≤ K^p. Thus g ∈ 𝔏^p. In particular, the series ∑_{k=1}^∞ f_k(x) converges for
a.e. x ∈ X. Define f(x) = ∑_{k=1}^∞ f_k(x). Since |f| ≤ g, f ∈ 𝔏^p. Finally, we show that
the sequence ∑_{k=1}^n f_k converges to f in 𝔏^p. Now |f − ∑_{k=1}^n f_k|^p ≤ (2g)^p, and the
dominated convergence theorem implies that lim_n ‖f − ∑_{k=1}^n f_k‖_p = 0.

Case 2. p = ∞. Let (f_n) be a Cauchy sequence in 𝔏^∞, let 𝜖 > 0, and choose N such that
‖f_n − f_m‖_∞ < 𝜖 for m, n > N. Let E_n = {x ∈ X ∶ |f_n(x)| >
‖f_n‖_∞}, and E_{n,m} = {x ∈ X ∶ |f_n(x) − f_m(x)| > ‖f_n − f_m‖_∞}. By definition of
‖.‖_∞, the set E = (∪_{n=1}^∞ E_n) ∪ (∪_{n,m=1}^∞ E_{n,m}) has measure 0.
For m, n > N, sup{| fn (x) − fm (x)| ∶ x ∈ X − E} < 𝜖. Thus {fn } is a Cauchy
sequence in the space ℬ(X − E). Therefore, by 4.8.1, fn converges uniformly to
some function f ∈ ℬ(X − E). Extend f to X by defining f (x) = 0 for x ∈ E. Clearly,
‖ fn − f‖∞ → 0 as n → ∞. 

Representation of Bounded Linear Functionals on 𝔏p

Definition. If g is a measurable function, the sign of g is the function

(sgn(g))(x) = g(x)/|g(x)| if g(x) ≠ 0, and (sgn(g))(x) = 1 otherwise.

Notice that g.sgn(g) = |g| and that |sgn(g)| = 1.

Theorem 8.6.5. Let 1 < p ≤ ∞, and let g ∈ 𝔏q (𝜇). Then the functional

Φg (f ) = ∫X fgd𝜇

is a bounded linear functional on 𝔏p (𝜇), and ‖Φg ‖ = ‖g‖q . The same is true for
p = 1 under the additional assumption that 𝜇 is 𝜎-finite.

Proof. By Hölder’s inequality, |Φg (f )| ≤ ∫X | fg|d𝜇 ≤ ‖ f‖p ‖g‖q . Since the linearity of
Φg is obvious, this inequality shows that Φg is bounded and that ‖Φg ‖ ≤ ‖g‖q .
It remains to show that ‖Φ_g‖ = ‖g‖_q.
If 1 < p < ∞, let f = |g|^{q−1} sgn(g). Then f ∈ 𝔏^p(𝜇), and ‖f‖_p^p = ‖g‖_q^q. Now
Φ_g(f) = ∫_X fg d𝜇 = ∫_X |g|^q d𝜇 = ‖g‖_q^q = ‖g‖_q^{q−1} ‖g‖_q = ‖f‖_p ‖g‖_q. This concludes
the proof of the case 1 < p < ∞.

If p = ∞, set f = sgn(g). Then ‖ f‖∞ = 1, and ∫X fgd𝜇 = ∫X |g|d𝜇 = ‖g‖1 .

Now suppose that p = 1 and that 𝜇 is 𝜎-finite. Let 0 < 𝜖 < ‖g‖_∞, and let E = {x ∈
X ∶ |g(x)| > ‖g‖_∞ − 𝜖}. By definition of ‖g‖_∞, 𝜇(E) > 0. Since 𝜇 is 𝜎-finite, X =
∪_{n=1}^∞ X_n, where each X_n has finite measure. Since E = ∪_{n=1}^∞ (E ∩ X_n) and since 0 <
𝜇(E) ≤ ∑_{n=1}^∞ 𝜇(E ∩ X_n), 𝜇(E ∩ X_n) > 0 for some integer n. Let A = E ∩ X_n, and
let f = sgn(g)𝜒_A/𝜇(A). Then f ∈ 𝔏^1(𝜇), ‖f‖_1 = 1, and |Φ_g(f)| = (1/𝜇(A)) ∫_A |g| d𝜇 ≥
‖g‖_∞ − 𝜖. ∎
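The extremal function used in the case 1 < p < ∞ can be tried out in a finite-dimensional setting. The sketch below works with real sequences and the counting measure on a four-point set, so integrals are finite sums; the vector g, the exponent p = 3, and the helper names are assumptions made for this illustration.

```python
def p_norm(v, p):
    """The l^p norm of a finite sequence (counting measure on a finite set)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def sgn(x):
    """Sign of a real number, with the convention sgn(0) = 1."""
    return x / abs(x) if x != 0 else 1.0

# Arbitrary real sample data, chosen only for illustration.
g = [2.0, -1.0, 0.0, 0.5]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent

f = [abs(x) ** (q - 1.0) * sgn(x) for x in g]   # f = |g|^{q-1} sgn(g)
phi = sum(x * y for x, y in zip(f, g))          # Φ_g(f) as a finite sum

print(phi, p_norm(f, p) * p_norm(g, q))
```

The two printed numbers agree up to rounding, which is the equality Φ_g(f) = ‖f‖_p ‖g‖_q for this particular f.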

Theorem 8.6.5 establishes the fact that Φ ∶ g ↦ Φg is an isometry from 𝔏q (𝜇) into
(𝔏p (𝜇))∗ . Theorem 8.6.7 establishes sufficient conditions for Φ to be an isometric
isomorphism. First we need a technical result.

Lemma 8.6.6. Suppose 𝜇(X) < ∞. If g ∈ 𝔏1 (𝜇) and there exists a constant M such
that | ∫X sgd𝜇| ≤ M‖s‖p for every simple function s, then g ∈ 𝔏q .

Proof. Because 𝜇(X) < ∞, all measurable simple functions are in 𝔏p (𝜇), and
𝔏∞ (𝜇) ⊆ 𝔏p (𝜇) for all p ≥ 1. We work out two separate cases.

Case 1. 1 < p < ∞. First we show that | ∫X fgd𝜇| ≤ M‖ f‖p for every function
f ∈ 𝔏∞ . To see that, let (sn ) be a sequence of simple functions that converges
to f in 𝔏∞ (see theorem 8.3.2.) In this case, sn converges to f in 𝔏p for every
p ≥ 1. Now | ∫X fg − sn gd𝜇| ≤ ‖sn − f‖∞ ‖g‖1 → 0 as n → ∞. Thus | ∫X fgd𝜇| =
limn | ∫X sn gd𝜇| ≤ limn M‖sn ‖p = M‖ f‖p .
We show that g ∈ 𝔏q . Let En = {x ∈ X ∶ |g(x)| ≤ n}, and let f = |g|q−1 sgn(g)𝜒En .
Then f ∈ 𝔏∞ , fg = |g|q 𝜒En , and | f |p = |g|q 𝜒En . Hence ∫En |g|q d𝜇 = ∫X fgd𝜇 ≤
M(∫X | f |p d𝜇)1/p = M(∫En |g|q d𝜇)1/p . Thus (∫En |g|q d𝜇)1/q ≤ M. Taking the limit
of the left side of the last inequality, the monotone convergence theorem yields
‖g‖q ≤ M < ∞.

Case 2. p = 1. For every measurable set E, |∫_E g d𝜇| ≤ M‖𝜒_E‖_1 = M𝜇(E). It follows
that, for every measurable set E of positive measure, (1/𝜇(E)) ∫_E g d𝜇 is in the closed
disk D of radius M and centered at the origin of the complex plane. We claim
that |g(x)| ≤ M for a.e. x ∈ X, that is, ‖g‖_∞ ≤ M. We show that if B = B(z_0, r)
is an open disk of radius r in ℂ − D, then 𝜇(g^{−1}(B)) = 0. This will establish
the claim, since ℂ − D is a countable union of open disks. Let E = g^{−1}(B). If
𝜇(E) > 0, then, because |g − z_0| < r on E,

|(1/𝜇(E)) ∫_E g d𝜇 − z_0| = |(1/𝜇(E)) ∫_E (g − z_0) d𝜇| ≤ (1/𝜇(E)) ∫_E |g − z_0| d𝜇 < r.

Thus (1/𝜇(E)) ∫_E g d𝜇 ∈ B ∩ D = ∅. This contradiction completes the proof. ∎

Theorem 8.6.7. If 𝜇 is 𝜎-finite, then the function Φ in theorem 8.6.5 is onto for
1 ≤ p < ∞.

Proof. Let 𝜑 ∈ (𝔏p (𝜇))∗ . We need to prove the existence of a function g ∈ 𝔏q such
that 𝜑 = Φg . Equivalently, for all f ∈ 𝔏p (𝜇),

𝜑(f) = ∫_X fg d𝜇. (6)

We first prove the result in the special case when 𝜇(X) < ∞. For a measurable
set E, define 𝜈(E) = 𝜑(𝜒_E). Since 𝜑 is linear and since, for disjoint measurable
sets E_1 and E_2, 𝜒_{E_1 ∪ E_2} = 𝜒_{E_1} + 𝜒_{E_2}, 𝜈(E_1 ∪ E_2) = 𝜈(E_1) + 𝜈(E_2). Thus 𝜈 is finitely
additive. We show that 𝜈 is countably additive, and this will establish the fact
that 𝜈 is a complex measure. Let (E_n) be a disjoint sequence in 𝔐, and let
E = ∪_{n=1}^∞ E_n. Let A_k = ∪_{i=1}^k E_i. Then lim_k 𝜇(A_k) = 𝜇(E); hence ‖𝜒_E − 𝜒_{A_k}‖_p^p =
∫_X |𝜒_E − 𝜒_{A_k}|^p d𝜇 = 𝜇(E − A_k) → 0 as k → ∞. Thus 𝜒_{A_k} converges to 𝜒_E in 𝔏^p(𝜇).
By the continuity of 𝜑, lim_k 𝜑(𝜒_{A_k}) = 𝜑(𝜒_E), that is, ∑_{n=1}^∞ 𝜈(E_n) = 𝜈(E).
If 𝜇(E) = 0, then 𝜒_E = 0 𝜇-a.e.; thus 𝜈(E) = 0.
The summary of the proof so far is that 𝜈 is a complex measure and 𝜈 << 𝜇. By
the Radon-Nikodym theorem, there exists a function g ∈ 𝔏1 (𝜇) such that 𝜑(𝜒E ) =
∫E gd𝜇 for every measurable set E. The linearity of the functionals on the two
sides of the last identity implies that 𝜑(s) = ∫X sgd𝜇 for every simple measurable
function s. Now, for a simple function s, | ∫X sgd𝜇| = |𝜑(s)| ≤ ‖𝜑‖‖s‖p . By the
previous lemma, g ∈ 𝔏q . The functional on the left-hand side of identity (6) is
continuous on 𝔏p by assumption, and the functional on the right side of (6) is
continuous by theorem 8.6.5. Since the two functionals agree on a dense subset
of 𝔏p , namely, the set of simple functions,11 identity (6) holds for all f ∈ 𝔏p . This
completes the proof of the theorem when 𝜇(X) < ∞.

Now suppose that 𝜇(X) = ∞ and that X is the disjoint union of a countable
sequence (E_n) of sets of finite measure. Define h(x) = ∑_{n=1}^∞ 𝜒_{E_n}(x)/(2^n 𝜇(E_n)). By the
monotone convergence theorem, ∫_X h d𝜇 = ∑_{n=1}^∞ ∫_{E_n} 𝜒_{E_n} d𝜇/(2^n 𝜇(E_n)) = ∑_{n=1}^∞ 1/2^n = 1.
Thus h ∈ 𝔏^1(𝜇). Let 𝜈 be the finite positive measure such that d𝜈 = h d𝜇. For
1 ≤ p < ∞, the correspondence F ↦ h^{1/p} F defines a linear isometry from 𝔏^p(𝜈)
onto 𝔏^p(𝜇) (theorem 8.5.8 is relevant here and for the rest of the proof). In
particular, 𝜓(F) = 𝜑(h^{1/p} F) defines a bounded linear functional on 𝔏^p(𝜈). By the
first part of the proof, there exists a function G ∈ 𝔏^q(𝜈) such that 𝜓(F) = ∫_X FG d𝜈,
for every F ∈ 𝔏^p(𝜈).

11 see theorem 8.7.3.



If 1 < p < ∞, define g = h1/q G. By theorem 8.5.8, ∫X |g|q d𝜇 = ∫X |G|q d𝜈 <


∞; hence g ∈ 𝔏q (𝜇). For f ∈ 𝔏p (𝜇), 𝜑(f ) = 𝜓(h−1/p f) = ∫X h−1/p fGd𝜈 =
∫X h−1/p fGhd𝜇 = ∫X fh1/q Gd𝜇 = ∫X fgd𝜇, as desired.

If p = 1, define g = G. Because ‖G‖𝜇,∞ = ‖G‖𝜈,∞ , g ∈ 𝔏∞ (𝜇). If f ∈ 𝔏1 (𝜇), then


𝜑(f ) = 𝜓(h−1 f) = ∫X h−1 fGd𝜈 = ∫X h−1 fghd𝜇 = ∫X fgd𝜇. 

Exercises

1. Let 𝜇 be the counting measure on ℕ, and let f ∶ ℕ → 𝕂. Show that f ∈ 𝔏p (𝜇)


if and only if f ∈ lp as defined in section 3.6.
2. Show that, for an essentially bounded function f, ‖ f‖∞ is an essential upper
bound of f.
3. Prove that if 𝜇(X) < ∞ and 1 ≤ p < q ≤ ∞, then 𝔏q (𝜇) ⊆ 𝔏p (𝜇).
4. Let fn be a convergent sequence in 𝔏1 (𝜇), and let f = limn fn . For 𝜖 > 0,
define En,𝜖 = {x ∈ X ∶ | fn (x) − f (x)| ≥ 𝜖}. Show that limn 𝜇(En,𝜖 ) = 0.
5. Let f ∈ 𝔏∞ (𝜇), and suppose that 𝜇(X) < ∞. Prove that limn ‖ f‖n = ‖ f‖∞ .
6. Let f ∈ 𝔏^∞(𝜇), and suppose that 𝜇(X) < ∞. Show that lim_n ‖f^{n+1}‖_1/‖f^n‖_1 = ‖f‖_∞.
7. Show that if p_1, …, p_m > 1 are such that 1/p_1 + ... + 1/p_m = 1, and f_i ∈ 𝔏^{p_i}, then
f_1...f_m ∈ 𝔏^1, and ‖f_1...f_m‖_1 ≤ ‖f_1‖_{p_1}...‖f_m‖_{p_m}.
8. Show that if f ∈ 𝔏p1 and g ∈ 𝔏p2 , then fg ∈ 𝔏p for some p.
9. Let f ∶ X → [0, ∞) be in 𝔏p , and let fm = min{f, m}. Prove that fm converges
to f in 𝔏p .
10. Show that if f_n → f in 𝔏^p and g_n → g in 𝔏^q, then f_n g_n → fg in 𝔏^1. Here p and
q are conjugate Hölder exponents.
11. Let 𝜇 and 𝜈 be finite positive measures such that 𝜈 << 𝜇 << 𝜈. Prove that
𝔏∞ (𝜇) = 𝔏∞ (𝜈).

8.7 Approximation

In this section, we prove a large collection of approximation theorems. The highlights
include approximating 𝔏^p functions by simple functions and continuous
functions of compact support. We prove that trigonometric polynomials are dense
in 𝔏^2(−𝜋, 𝜋), which is the last piece of information we need to settle the question
of the convergence of Fourier series of functions in 𝔏^2(−𝜋, 𝜋). The important
operation of convolution makes its debut in this section. Finally, we
study approximations by 𝒞^∞ functions, prove the 𝒞^∞ version of Urysohn’s lemma,
and prove that 𝒞_c^∞(ℝ^n) is dense in 𝔏^p(ℝ^n).

Lemma 8.7.1 (the Tietze extension theorem). Let K be a compact subset of ℝn and
let f ∶ K → [0, 1] be continuous. Then f can be extended to a continuous function
g ∈ 𝒞c (ℝn ) such that 0 ≤ g ≤ 1. If K is contained in an open set U, then g can be
constructed in such a way that supp(g) ⊆ U.

Proof. Let W be an open set such that its closure W̄ is compact and K ⊆ W ⊆ W̄ ⊆ U. Let K_1 =
f^{−1}([0, 1/3]), and K_2 = f^{−1}([2/3, 1]). Applying lemma 8.4.1 to the closed sets E =
K_1 ∪ (ℝ^n − W), and F = K_2, produces a continuous function g_1 ∶ ℝ^n → [0, 1/3]
such that g_1(E) = 0, and g_1(F) = 1/3. By construction, supp(g_1) ⊆ W̄, and 0 ≤
f − g_1 ≤ 2/3 on K. Applying the same construction to the function f − g_1, we can
find a function g_2 ∶ ℝ^n → [0, (1/3)(2/3)] such that supp(g_2) ⊆ W̄, and 0 ≤ f − g_1 − g_2 ≤
(2/3)^2 on the set K. Continuing this construction yields a sequence of continuous
functions g_i on ℝ^n such that supp(g_i) ⊆ W̄, 0 ≤ g_i ≤ 2^{i−1}/3^i, and 0 ≤ f − g_1 − g_2 − ... −
g_i ≤ (2/3)^i on K. The sequence G_i = g_1 + ... + g_i is supported in W̄ and is Cauchy in
the uniform norm on the compact set W̄. Therefore G_i converges uniformly to a
continuous function g. Since 0 ≤ f − G_i ≤ (2/3)^i on K, g = f on K. Because each G_i
is supported in W̄ ⊆ U, so is g. ∎

Remark. The Tietze extension theorem is valid for locally compact Hausdorff
spaces. See problem 1 at the end of this section.

Proposition 8.7.2 (Egoroff’s theorem). Let (X, 𝔐, 𝜇) be a finite measure space.
Suppose the functions f and (f_n)_{n=1}^∞ are measurable and finite a.e. and that
lim_n f_n(x) = f(x) for a.e. x ∈ X. Then, for every positive number 𝛿, there exists
a measurable set E such that 𝜇(E) < 𝛿 and f_n converges to f uniformly on X − E.

Proof. First we show that, for every pair of positive real numbers 𝜖 and 𝛿, there exists
a measurable set A and an integer N ≥ 1 such that 𝜇(A) < 𝛿 and sup{|f_k(x) −
f(x)| ∶ x ∈ X − A} < 𝜖 for every k ≥ N. Define C_k = {x ∈ X ∶ |f_k(x) − f(x)| < 𝜖},
and let D_n = ∩_{k=n}^∞ C_k = {x ∈ X ∶ |f_k(x) − f(x)| < 𝜖 for every k ≥ n}. Clearly,
D_1 ⊆ D_2 ⊆ … . The set X − ∪_{n=1}^∞ D_n is contained in the set {x ∈ X ∶ lim_n f_n(x) ≠
f(x)}, which, by assumption, has measure 0. It follows that lim_n 𝜇(D_n) = 𝜇(X).
Therefore there exists a positive integer N such that 𝜇(X − D_N) < 𝛿. Set A =
X − D_N. This proves our assertion because if x ∉ A, then x ∈ C_k for every k ≥ N,
and |f_k(x) − f(x)| < 𝜖 for every k ≥ N.

For a fixed 𝛿 > 0, and each k ∈ ℕ, let 𝛿_k = 𝛿/2^k. Applying the above construction
to the pair 𝜖_k = 1/k, and 𝛿_k, we find a measurable set A_k such that 𝜇(A_k) < 𝛿/2^k
and a positive integer n_k such that sup{|f_n(x) − f(x)| ∶ x ∈ X − A_k} < 1/k for
n > n_k. Define E = ∪_{k=1}^∞ A_k. Then 𝜇(E) ≤ ∑_{k=1}^∞ 𝜇(A_k) < ∑_{k=1}^∞ 𝛿/2^k = 𝛿.

Now let 𝜖 > 0, and choose a positive integer k such that 1/k < 𝜖.
Now, for m > nk = N,

sup{| fm (x) − f (x)| ∶ x ∈ X − E} ≤ sup{| fm (x) − f (x)| ∶ x ∈ X − Ak } < 1/k < 𝜖.

Thus fn converges uniformly to f on X − E. 
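A standard concrete instance may help fix ideas; it is not taken from the text. On [0, 1] with Lebesgue measure, f_n(x) = x^n converges to 0 for every x < 1, hence a.e., but not uniformly, since sup{x^n ∶ 0 ≤ x < 1} = 1 for every n. Removing the set E = (1 − 𝛿, 1], of measure 𝛿, restores uniform convergence, because the supremum over [0, 1 − 𝛿] is (1 − 𝛿)^n. The function name below is a hypothetical helper.

```python
def sup_on_complement(n, delta):
    """sup of |x**n - 0| over [0, 1 - delta].

    x**n is increasing on [0, 1], so the supremum is attained at the
    right endpoint 1 - delta.
    """
    return (1.0 - delta) ** n

delta = 0.01  # measure of the exceptional set E = (1 - delta, 1]
for n in (10, 100, 1000, 10000):
    print(n, sup_on_complement(n, delta))
```

The printed suprema decrease to 0, however small 𝛿 is taken, although the rate of decay deteriorates as 𝛿 shrinks.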

Before we proceed to the next theorem, we call the reader’s attention to the fact that
𝔏^∞ contains all simple functions, while, for 1 ≤ p < ∞, a simple function s = ∑_{i=1}^m a_i 𝜒_{E_i}
(with nonzero coefficients a_i) belongs
to 𝔏^p if and only if the support of s has finite measure, that is, if 𝜇(E_i) < ∞ for all
1 ≤ i ≤ m.

Theorem 8.7.3. Let (X, 𝔐, 𝜇) be a measure space. For 1 ≤ p ≤ ∞, the simple


functions that belong to 𝔏p (𝜇) are dense in 𝔏p (𝜇).

Proof. We will show that if f ∈ 𝔏p , there is a sequence of simple functions sn such


that limn ‖ f − sn ‖p = 0. By theorem 8.3.2, there exists a sequence s1 , s2 , ... of simple
functions such that |s1 | ≤ |s2 | ≤ ... ≤ | f | and limn sn (x) = f (x).
Clearly, |f − s_n| ≤ 2|f|; hence, for 1 ≤ p < ∞, |f − s_n|^p ≤ 2^p |f|^p ∈ 𝔏^1. By the dominated
convergence theorem, lim_n ‖f − s_n‖_p = 0.
For p = ∞, notice that if n > ‖f‖_∞, then, for a.e. x ∈ X, 0 ≤ f(x) − s_n(x) ≤ 1/2^n
(theorem 8.3.2). Clearly, lim_n ‖f − s_n‖_∞ = 0. ∎

Lemma 8.7.4. Let s = ∑_{i=1}^m a_i 𝜒_{E_i} be a simple function on ℝ^n, and let E = ∪_{i=1}^m E_i.
It is assumed that E_1, …, E_m are pairwise disjoint. If 𝜆(E) < ∞, then, for every
𝜖 > 0, there exists a function g ∈ 𝒞_c(ℝ^n) such that 𝜆({x ∈ ℝ^n ∶ s(x) ≠ g(x)}) < 𝜖,
and ‖g‖_∞ ≤ ‖s‖_∞.

Proof. Let U be an open set containing E such that 𝜆(U − E) < 𝜖/2. For each 1 ≤
i ≤ m, let K_i be a compact subset of E_i such that 𝜆(E_i − K_i) < 𝜖/(2m), and set H =
∪_{i=1}^m K_i. Notice that 𝜆(E − H) < 𝜖/2. For 1 ≤ i ≤ m, define V_i = U − ∪_{j≠i} K_j. By theorem
8.4.3, there exist functions g_i ∈ 𝒞_c(ℝ^n) such that K_i ≺ g_i ≺ V_i. Now define
g = ∑_{i=1}^m a_i g_i. Clearly, g ∈ 𝒞_c(ℝ^n), g|_H = s|_H, and g vanishes outside U. The set
{x ∈ ℝ^n ∶ s(x) ≠ g(x)} is contained in the union of U − E and E − H, and the
Lebesgue measure of each of the two sets is less than 𝜖/2. If ‖g‖_∞ > ‖s‖_∞, we
modify g as follows to satisfy the last requirement of the lemma. Let S = {z ∈
ℂ ∶ |z| ≤ ‖s‖_∞}, and T = {z ∈ ℂ ∶ |z| ≤ ‖g‖_∞}. Define 𝜑 ∶ T → S by

𝜑(z) = z if z ∈ S, and 𝜑(z) = z‖s‖_∞/|z| if z ∈ T − S.

𝜑 is continuous,¹² and |𝜑(z)| = ‖s‖_∞ for every z ∈ T − S. Now let h = 𝜑∘g.
Clearly, h(x) = 0 when g(x) = 0, and hence h ∈ 𝒞_c(ℝ^n), and ‖h‖_∞ = ‖s‖_∞. ∎

Lemma 8.7.4 is a very special case of the following well-known theorem, which
says, loosely speaking, that a measurable function on a set of finite Lebesgue
measure is not too far from being continuous.

Theorem 8.7.5 (Luzin’s theorem). Let f ∶ ℝn → ℂ be a measurable function, and


suppose that there exists a set E of finite Lebesgue measure such that f (x) = 0 for
every x ∈ ℝn − E. Then, for every 𝜖 > 0, there exists a function g ∈ 𝒞c (ℝn ) such
that the set {x ∈ E ∶ f (x) ≠ g(x)} has a Lebesgue measure less than 𝜖. Moreover, if
f is bounded, g can be chosen in such a way that ‖g‖∞ ≤ ‖ f‖∞ .

Proof. Let U be an open set such that E ⊆ U and 𝜆(U − E) < 𝜖/3. Let si be a sequence
of simple measurable functions such that |s1 | ≤ |s2 | ≤ ... ≤ | f | and limi si (x) =
f (x). Since f is supported in E, each si is supported in E. By Egoroff ’s theorem,
there exists a subset A of E such that 𝜆(A) < 𝜖/3, and the sequence si converges
uniformly to f on E − A. By the proof of lemma 8.7.4, there exist compact sets
H_i ⊆ E − A such that 𝜆((E − A) − H_i) < (𝜖/3)/2^i and functions g_i ∈ 𝒞_c(ℝ^n) such that
g_i|_{H_i} = s_i|_{H_i}, and each g_i vanishes outside U. Now let K = ∩_{i=1}^∞ H_i. Clearly,
K is compact, and 𝜆((E − A) − K) < 𝜖/3. The sequence of continuous functions
gi converges uniformly to f on K. Thus f|K is continuous. By the Tietze extension
theorem, there exists a function g ∈ 𝒞c (ℝn ) that extends f|K and g(x) = 0 for every
x ∉ U. The set {x ∈ ℝn ∶ f (x) ≠ g(x)} is contained in the union of U − E, A, and
(E − A) − K, and each of the three sets has Lebesgue measure less than 𝜖/3.
If ‖ f‖∞ < ∞, and ‖g‖∞ > ‖ f‖∞ , we modify g as in the proof of lemma 8.7.4. to
satisfy the requirement ‖g‖∞ ≤ ‖ f‖∞ . 

Theorem 8.7.6. For 1 ≤ p < ∞, 𝒞_c(ℝ^n) is dense in 𝔏^p(ℝ^n).

Proof. Let f ∈ 𝔏^p(ℝ^n), and let 𝜖 > 0. By theorem 8.7.3, we may assume that f = s, a
simple function with 𝜆(supp(s)) < ∞. Lemma 8.7.4 produces a set A of measure
less than 𝜖 and a function g ∈ 𝒞_c(ℝ^n) such that s(x) = g(x) for x ∉ A, and
‖g‖_∞ ≤ ‖s‖_∞. Thus |g − s| ≤ |g| + |s| ≤ 2‖s‖_∞. Hence ‖g − s‖_p^p = ∫_A |g − s|^p d𝜆 ≤
2^p ‖s‖_∞^p 𝜆(A) < 2^p ‖s‖_∞^p 𝜖. ∎

Remark. Lemma 8.7.4 and theorems 8.7.5 and 8.7.6 are valid for Radon measures
on locally compact Hausdorff spaces without any alterations to the proofs
included for the last three results. For example, in the proof of lemma 8.7.4,
we only used the inner regularity of measurable sets of finite measure.

12 Observe that 𝜑 simply fixes S and retracts the annulus between the disks T and S radially onto the
boundary of S.

Example 1. Let [a, b] be a compact interval in ℝ. For 1 ≤ p < ∞, 𝒞[a, b] is dense
in 𝔏^p(a, b).
Let f ∈ 𝔏^p(a, b), let 𝜖 > 0, and extend f to a function f̃ ∈ 𝔏^p(ℝ) by defining f̃ to be 0
outside (a, b). By theorem 8.7.6, there exists a function g ∈ 𝒞_c(ℝ) such that
‖f̃ − g‖_p < 𝜖. The restriction of g to [a, b] is in 𝒞[a, b], and ∫_a^b |f(x) − g(x)|^p dx ≤
∫_ℝ |f̃ − g|^p dx < 𝜖^p. ∎

Example 2. For a function f ∈ 𝒞[−𝜋, 𝜋] (not necessarily periodic), and for


every 𝜖 > 0, there exists a 2𝜋-periodic function g such that ‖ f − g‖p < 𝜖. Here
1 ≤ p < ∞.
The function g can be obtained by modifying f near ±𝜋 in exactly the same
manner as in the proof of lemma 4.10.2. ∎

Example 3. The space 𝒞(𝒮1 ) is dense in 𝔏p (−𝜋, 𝜋). Also, trigonometric


polynomials are dense in 𝔏p (−𝜋, 𝜋) for p ∈ [1, ∞).

This result follows immediately from the last two examples and theorem
4.10.1. 

We conclude this subsection by proving the following separability result.

Theorem 8.7.7. For 1 ≤ p < ∞, 𝔏p (ℝn ) is separable.

Proof. Let ℭ be the collection of half-open boxes of the form 𝜎 = [a1 , b1 ) × ... ×
[an , bn ), where ai , bi ∈ ℚ. Define 𝔇 to be the collection of linear combinations of
characteristic functions of members of ℭ with rational coefficients. Thus a member
of 𝔇 is a simple function of the form s = ∑_{i=1}^m c_i 𝜒_{𝜎_i}, where m ∈ ℕ, the coefficients
c_i are rational numbers, and 𝜎_i ∈ ℭ. It is clear that 𝔇 is countable. We prove that
it is dense in 𝔏p (ℝn ). In light of theorem 8.7.6, it suffices to show that if f ∈ 𝒞rc (ℝn )
and 𝜖 > 0, then there is a function s ∈ 𝔇 such that ‖ f − s‖p < c𝜖 for some constant
c, which is independent of 𝜖.
By the uniform continuity of f, there exists a number 𝛿 > 0 such that | f (x) −
f(y)| < 𝜖 whenever ‖x − y‖ < 𝛿. Let Q be a box in ℭ that contains supp(f ) in
its interior. Partition Q into disjoint sub-boxes 𝜎1 , … , 𝜎m , where each 𝜎i ∈ ℭ,
and diam(𝜎_i) < 𝛿. For each 1 ≤ i ≤ m, choose a rational number c_i such that
min_{x∈𝜎_i} f(x) ≤ c_i ≤ max_{x∈𝜎_i} f(x). Finally, define s = ∑_{i=1}^m c_i 𝜒_{𝜎_i}. By construction,
‖f − s‖_∞ < 𝜖.
Now ‖f − s‖_p^p = ∫_Q |f − s|^p d𝜆 ≤ ‖f − s‖_∞^p vol(Q) < 𝜖^p vol(Q). ∎

Approximation by 𝒞∞ Functions

Definition. For Lebesgue measurable functions f and g on ℝ^n, the convolution of
f and g is the function

(f ∗ g)(x) = ∫_{ℝ^n} f(x − y)g(y) dy.

It is clear that if (f ∗ g)(x) is finite, then (f ∗ g)(x) = (g ∗ f)(x). Thus

(f ∗ g)(x) = ∫_{ℝ^n} f(x − y)g(y) dy = ∫_{ℝ^n} f(y)g(x − y) dy.

A variety of conditions can be imposed on f and g to guarantee the finiteness of


the integral, at least for a.e. x ∈ ℝn . We take for granted the measurability of the
function f(x − y)g(y).
In this subsection, we will limit the functions f and g to be continuous functions
of compact support. The reader can look at the section exercises for a slightly
expanded discussion of the properties of convolutions.

Lemma 8.7.8. Let f, g ∈ 𝒞c (ℝn ). Then

(a) (f ∗ g)(x) exists for all x ∈ ℝn , and


(b) f ∗ g ∈ 𝒞c (ℝn ).

Proof. (a) Let K = supp(g). Then

|(f ∗ g)(x)| ≤ ∫_{ℝ^n} |f(x − y)g(y)| dy ≤ ‖f‖_∞ ∫_{ℝ^n} |g(y)| dy ≤ ‖f‖_∞ ‖g‖_∞ 𝜆(K) < ∞.

(b) Let F be the closure of the (bounded) set {x + y ∶ x ∈ supp(f), y ∈ supp(g)}.
We claim that f ∗ g is supported inside F. If x ∉ F, then, for every y ∈ supp(g),
x − y ∉ supp(f). Thus f(x − y)g(y) = 0 for all y ∈ ℝ^n; hence (f ∗ g)(x) = 0.
Let 𝜖 > 0. Since f is uniformly continuous, there exists a number 𝛿 > 0 such that,
for 𝜉, 𝜂 ∈ ℝ^n, |f(𝜉) − f(𝜂)| < 𝜖 whenever ‖𝜉 − 𝜂‖ < 𝛿. Now, for such 𝜉 and 𝜂,

|(f ∗ g)(𝜉) − (f ∗ g)(𝜂)| ≤ ∫_{ℝ^n} |f(𝜉 − y) − f(𝜂 − y)| |g(y)| dy ≤ 𝜖 ∫_K |g(y)| dy ≤ 𝜖‖g‖_∞ 𝜆(K). ∎
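The support statement in part (b) has a simple discrete analogue, sketched here for finitely supported sequences indexed from 0 (the counting measure on the nonnegative integers in place of Lebesgue measure on ℝ^n). The function name and the sample sequences are assumptions of this illustration.

```python
def convolve(f, g):
    """Discrete convolution of two finitely supported sequences:
    (f * g)[k] = sum over j of f[k - j] * g[j].

    If f lives on indices 0..len(f)-1 and g on 0..len(g)-1, the result
    lives on 0..len(f)+len(g)-2, the sum of the two supports.
    """
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

# Arbitrary sample sequences, chosen only for illustration.
f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.5]

print(convolve(f, g))
print(convolve(g, f))
```

Both calls print the same list, in line with the commutativity f ∗ g = g ∗ f, and the result has no entries outside the sum of the two index ranges.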

Lemma 8.7.9. Let f ∈ 𝔏p (ℝn ), where 1 ≤ p < ∞. Then lima→0 ‖𝜏a f − f‖p = 0.

Proof. Recall that (𝜏a f)(x) = f(x − a). First we prove the result for a function
g ∈ 𝒞c (ℝn ). Without loss of generality, assume that ‖a‖ < 1. Thus the functions
𝜏a g have a common support, say, K. Let 𝜖 > 0. By the uniform continuity of g,
there exists a number 𝛿 > 0 such that whenever ‖a‖ < 𝛿, then |(𝜏_a g)(x) − g(x)| =
|g(x − a) − g(x)| < 𝜖. Now ‖𝜏_a g − g‖_p^p = ∫_K |g(x − a) − g(x)|^p dx ≤ 𝜖^p 𝜆(K).
Now let f ∈ 𝔏^p(ℝ^n), and let 𝜖 > 0. By theorem 8.7.6, there is a function g ∈
𝒞c (ℝn ) such that ‖ f − g‖p < 𝜖/3. By the first part of the proof, there is 𝛿 > 0 such
that for ‖a‖ < 𝛿, ‖𝜏a g − g‖p < 𝜖/3. Now if ‖a‖ < 𝛿, then

‖ f − 𝜏a f‖p ≤ ‖ f − g‖p + ‖g − 𝜏a g‖p + ‖𝜏a g − 𝜏a f‖p


= ‖ f − g‖p + ‖g − 𝜏a g‖p + ‖g − f‖p < 𝜖. 

Definition. A multi-index 𝛼 is a sequence 𝛼 = (𝛼_1, …, 𝛼_n), where each 𝛼_i is a
nonnegative integer. The length of 𝛼 is the integer |𝛼| = ∑_{i=1}^n 𝛼_i.

Notation. Let f be a scalar-valued function on ℝ^n, and let 𝛼 = (𝛼_1, …, 𝛼_n) be
a multi-index. The notation D^𝛼 f stands for the derivative 𝜕^{|𝛼|} f/(𝜕x_1^{𝛼_1} 𝜕x_2^{𝛼_2} ... 𝜕x_n^{𝛼_n}), if it
exists. For example, if n = 5, and 𝛼 = (1, 0, 2, 0, 1), then D^𝛼 f = 𝜕^4 f/(𝜕x_1 𝜕x_3^2 𝜕x_5).
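The bookkeeping behind the multi-index notation can be spelled out in a few lines. The helpers below are hypothetical and exist only for this illustration; a multi-index is represented as a tuple of nonnegative integers.

```python
def length(alpha):
    """|alpha| = alpha_1 + ... + alpha_n for a multi-index alpha."""
    if not all(isinstance(a, int) and a >= 0 for a in alpha):
        raise ValueError("a multi-index has nonnegative integer entries")
    return sum(alpha)

def describe(alpha):
    """Plain-text rendering of the derivative D^alpha f."""
    order = length(alpha)
    if order == 0:
        return "f"  # D^0 f is f itself
    parts = []
    for i, a in enumerate(alpha, start=1):
        if a == 1:
            parts.append(f"∂x{i}")
        elif a > 1:
            parts.append(f"∂x{i}^{a}")
    return f"∂^{order} f / (" + " ".join(parts) + ")"

alpha = (1, 0, 2, 0, 1)
print(length(alpha), describe(alpha))
```

For 𝛼 = (1, 0, 2, 0, 1) this reproduces the example above: the length is 4, and the variables x_2 and x_4 do not appear in the denominator.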

Definition. A function f is said to be infinitely differentiable if D^𝛼 f exists for every
multi-index 𝛼. The space of infinitely differentiable functions is denoted by
𝒞^∞(ℝ^n), and the space of infinitely differentiable functions of compact support
is given the symbol 𝒞_c^∞(ℝ^n). We will shortly see that there is an abundance of
such functions.

Example 4. Consider the function

exp{−1/x} if x > 0,
f (x) = {
0 if x ≤ 0.

It is easily seen that, for x > 0, and k ∈ ℕ, f(k) (x) = p(1/x) exp{−1/x}, where p is
a polynomial of degree 2k. Therefore limx↓0 f(k) (x) = 0. Hence f(k) (0) = 0, and
f is infinitely differentiable at x = 0. Since the differentiability of f at x ≠ 0 is
obvious, f ∈ 𝒞∞ (ℝ). 

Example 5 (the bump function). For a fixed h > 0, consider the function

𝜑(x) = exp{−h^2/(h^2 − |x|^2)} if |x| < h, and 𝜑(x) = 0 if |x| ≥ h.

As |x| ↑ h, h^2 − |x|^2 ↓ 0, so, by example 4, 𝜑 ∈ 𝒞_c^∞(ℝ). ∎

Example 6 (the bump kernel). We can use the function 𝜑 in example 5 to construct
a continuously parameterized family of functions as follows. For a fixed h > 0, define

𝛿_h(x) = A_n h^{−n} exp{−h^2/(h^2 − ‖x‖^2)} if ‖x‖ < h, and 𝛿_h(x) = 0 if ‖x‖ ≥ h,

where A_n^{−1} = ∫_{‖y‖<1} exp{−1/(1 − ‖y‖^2)} dy. By the above examples, 𝛿_h ∈ 𝒞_c^∞(ℝ^n). ∎

Observe that max_{x∈ℝ^n} 𝛿_h(x) = 𝛿_h(0) = A_n h^{−n}/e, and ∫_{ℝ^n} 𝛿_h(x) dx = 1.

The first assertion is obvious. For the second assertion, use the change of variable
x = hy. By problem 14 on section 8.4, dx = h^n dy, and

∫_{ℝ^n} 𝛿_h(x) dx = ∫_{‖x‖<h} A_n h^{−n} exp{−h^2/(h^2 − ‖x‖^2)} dx
= A_n h^{−n} ∫_{‖y‖<1} exp{−1/(1 − ‖y‖^2)} h^n dy = 1.

We call the family {𝛿_h ∶ h > 0} the bump kernel.
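The normalization can be checked numerically in dimension n = 1. The sketch below approximates A_1 with a composite midpoint rule and then integrates 𝛿_h for h = 0.25; the value of h, the grid sizes, and the helper names are assumptions made for this illustration.

```python
import math

def midpoint(func, a, b, n):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def bump(y):
    """exp{-1/(1 - y^2)} for |y| < 1, and 0 otherwise."""
    d = 1.0 - y * y
    return math.exp(-1.0 / d) if d > 0.0 else 0.0

# Normalizing constant A_n for n = 1 (a numerical approximation).
A1 = 1.0 / midpoint(bump, -1.0, 1.0, 20_000)

def delta_h(x, h):
    """The kernel delta_h in dimension n = 1."""
    d = h * h - x * x
    return A1 / h * math.exp(-h * h / d) if d > 0.0 else 0.0

h = 0.25
total = midpoint(lambda x: delta_h(x, h), -h, h, 30_011)

print(A1, total, delta_h(0.0, h), A1 / (h * math.e))
```

The integral comes out as 1 up to quadrature error, and the value at the origin matches A_1 h^{−1}/e, in line with the two observations above.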

Lemma 8.7.10. If f ∈ 𝒞_c^∞(ℝ^n), and g ∈ 𝒞_c(ℝ^n), then f ∗ g ∈ 𝒞_c^∞(ℝ^n), and, for every
multi-index 𝛼, D^𝛼(f ∗ g) = D^𝛼 f ∗ g.

Proof. The proof is by induction on |𝛼|. It is sufficient to prove the result when
|𝛼| = 1. Thus we need to show that 𝜕(f ∗ g)/𝜕x_i = (𝜕f/𝜕x_i) ∗ g. For simplicity of notation,
fix x_1, …, x_{i−1}, x_{i+1}, …, x_n, and consider f as a function of the single variable x_i,
which we rename x. Thus we need to prove that (d/dx)(f ∗ g) = f′ ∗ g.
We will show that lim_{t→0} {[(f ∗ g)(x + t) − (f ∗ g)(x)]/t − (f′ ∗ g)(x)} = 0.
Let 𝜖 > 0. By the uniform continuity of f′, there is 𝛿 > 0 such that
|f′(𝜉) − f′(𝜂)| < 𝜖 whenever |𝜉 − 𝜂| < 𝛿. Now

|[(f ∗ g)(x + t) − (f ∗ g)(x)]/t − (f′ ∗ g)(x)|
= |∫_ℝ {[f(x + t − y) − f(x − y)]/t − f′(x − y)} g(y) dy|
= |∫_ℝ {f′(x + 𝜃t − y) − f′(x − y)} g(y) dy|,

where 0 < 𝜃 < 1. Now if |t| < 𝛿, then |f′(x + 𝜃t − y) − f′(x − y)| < 𝜖 and
|∫_ℝ {f′(x + 𝜃t − y) − f′(x − y)} g(y) dy| ≤ 𝜖 ∫_ℝ |g(y)| dy ≤ 𝜖‖g‖_∞ 𝜆(K), where K =
supp(g). ∎

As a corollary of the last result, for every f ∈ 𝒞_c(ℝ^n), f ∗ 𝛿_h ∈ 𝒞_c^∞(ℝ^n).

The following is the C∞ version of Urysohn’s lemma.



Corollary 8.7.11. Let K be a compact subset of an open subset V of ℝ^n. Then there
exists a function f ∈ 𝒞_c^∞(ℝ^n) such that K ≺ f ≺ V.

Proof. Let 𝛿 = dist(K, ℝn − V). Since K is compact, 𝛿 is positive. Define K1 =


{x ∈ ℝn ∶ dist(x, K) ≤ 𝛿/4}, and V1 = {x ∈ ℝn ∶ dist(x, K) < 𝛿/2}. Since K1 is
compact, V1 is open, and K1 ⊆ V1 , theorem 8.4.3 produces a function g such that
K1 ≺ g ≺ V1 . Now choose a number h < 𝛿/4, and define f = g ∗ 𝛿h . By lemma
8.7.8, supp(f ) ⊆ {x ∈ ℝn ∶ dist(x, K) ≤ 3𝛿/4} ⊆ V.
Clearly, f (x) ≥ 0. Since 0 ≤ g(x) ≤ 1, f (x) = ∫ℝn g(x − y)𝛿h (y)dy ≤
∫ℝn 𝛿h (y)dy = 1.
It remains to show that K ≺ f. If x ∈ K, then B(x, h) ⊆ B(x, 𝛿/4) ⊆ K_1; hence
f(x) = ∫_{‖x−y‖<h} g(y)𝛿_h(x − y) dy = ∫_{‖x−y‖<h} 𝛿_h(x − y) dy = 1. ∎

Proposition 8.7.12. If f ∈ 𝒞c (ℝn ), then, for 1 ≤ p < ∞, f ∗ 𝛿h → f in 𝔏p (ℝn ).

Proof. We will make use of the fact that ∫_{ℝ^n} 𝛿_h(y) dy = 1 and example 4 on section 8.6:

|(f ∗ 𝛿_h)(x) − f(x)| = |∫_{ℝ^n} (f(x − y) − f(x)) 𝛿_h(y) dy|
≤ ∫_{ℝ^n} |f(x − y) − f(x)| 𝛿_h(y) dy
≤ (∫_{ℝ^n} |f(x − y) − f(x)|^p 𝛿_h(y) dy)^{1/p}.

Integrating the pth power of the extreme sides of the above string, we have

‖f ∗ 𝛿_h − f‖_p^p ≤ ∫_{ℝ^n} ∫_{ℝ^n} |f(x − y) − f(x)|^p 𝛿_h(y) dy dx
= ∫_{ℝ^n} 𝛿_h(y) ∫_{ℝ^n} |f(x − y) − f(x)|^p dx dy
= ∫_{ℝ^n} ‖𝜏_y f − f‖_p^p 𝛿_h(y) dy = ∫_{‖y‖≤h} ‖𝜏_y f − f‖_p^p 𝛿_h(y) dy.¹³

Let 𝜖 > 0. By lemma 8.7.9, there exists a number h_0 > 0 such that, for ‖y‖ < h_0, ‖𝜏_y f − f‖_p <
𝜖. Hence, for h < h_0, ∫_{‖y‖≤h} ‖𝜏_y f − f‖_p^p 𝛿_h(y) dy ≤ 𝜖^p ∫_{‖y‖≤h} 𝛿_h(y) dy = 𝜖^p. ∎

Part (a) of the following result is a vast generalization of theorem 8.7.6.

Theorem 8.7.13. (a) For 1 ≤ p < ∞, 𝒞_c^∞(ℝ^n) is dense in 𝔏^p(ℝ^n).
(b) 𝒞_c^∞(ℝ^n) is dense in 𝒞_0(ℝ^n).

13 Fubini’s theorem is used to switch the order of integration.



Proof. (a) Let f ∈ 𝔏p (ℝn ), and let 𝜖 > 0. By theorem 8.7.6, there is a function
g ∈ 𝒞c (ℝn ) such that ‖ f − g‖p < 𝜖/2. By the previous proposition, we can choose
h > 0 small enough so that ‖g − g ∗ 𝛿_h‖_p < 𝜖/2. The function g ∗ 𝛿_h is in 𝒞_c^∞(ℝ^n),
and ‖f − g ∗ 𝛿_h‖_p < 𝜖.

(b) Let f ∈ 𝒞_0(ℝ^n), and let 𝜖 > 0. By theorem 5.11.8, there exists a function
g ∈ 𝒞_c(ℝ^n) such that ‖f − g‖_∞ < 𝜖. By the uniform continuity of g, there is a
number 𝛿 > 0 such that |g(x) − g(y)| < 𝜖 whenever ‖x − y‖ < 𝛿. Choose a positive
number h < 𝛿. The proof will be complete if we show that ‖g ∗ 𝛿_h − g‖_∞ < 𝜖. Since
∫_{ℝ^n} 𝛿_h(x − y) dy = 1 and 𝛿_h(x − y) = 0 for ‖x − y‖ ≥ h,

|(g ∗ 𝛿_h)(x) − g(x)| = |∫_{ℝ^n} {g(y) − g(x)} 𝛿_h(x − y) dy|
≤ ∫_{‖x−y‖<h} |g(x) − g(y)| 𝛿_h(x − y) dy
< 𝜖 ∫_{‖x−y‖<h} 𝛿_h(x − y) dy = 𝜖. ∎

Exercises

1. Let K be a compact subset of a locally compact Hausdorff space X, and let f ∶


K → [0, 1] be continuous. Show f can be extended to a continuous function
g ∈ 𝒞c (X) such that 0 ≤ g ≤ 1. If K is contained in an open set U, then g can be
constructed in such a way that supp(g) ⊆ U. Hint: Mimic the proof of lemma
8.7.1, and use theorem 5.11.5.
2. Let F be a closed subset of normal space X, and let f ∶ F → [0, 1] be contin-
uous. Show f can be extended to a continuous function g ∈ 𝒞(X) such that
0 ≤ g ≤ 1. If F is contained in an open set U, then g can be constructed in
such a way that supp(g) ⊆ U. Hint: Modify the proof of lemma 8.7.1, and use
theorem 5.11.2. In this case, convergence of the functions Gi takes place in
the space ℬ𝒞(X).
3. Let 𝜇 be a 𝜎-finite measure on X, let (f_n) be a sequence of measurable functions,
and suppose that lim_n f_n(x) = f(x). Prove that there exists a sequence
(E_j) of measurable sets such that f_n converges uniformly to f on each E_j and
𝜇(X − ∪_{j=1}^∞ E_j) = 0.
4. Let f ∈ 𝔏p (ℝn ), where 1 ≤ p < ∞. Show that the mapping ℝn → 𝔏p (ℝn )
defined by a → 𝜏a f is uniformly continuous. Observe that, in lemma 8.7.9,
we established the continuity at a = 0.
5. Assuming that (f ∗ g)(x) exists for every x ∈ ℝn , prove that
(a) f ∗ g = g ∗ f, and
(b) 𝜏a (f ∗ g) = (𝜏a f) ∗ g = f ∗ (𝜏a g).

6. Let f ∈ 𝔏p (ℝn ) and g ∈ 𝔏q (ℝn ), where p and q are conjugate exponents.


Prove that f ∗ g ∈ 𝔏∞ (ℝn ) and that ‖ f ∗ g‖∞ ≤ ‖ f‖p ‖g‖q .
7. This problem is a continuation of the previous exercise. Prove that if 1 ≤ p <
∞, then f ∗ g is uniformly continuous.
8. This problem is a continuation of exercise 6. Show that if 1 < p < ∞, then
f ∗ g ∈ 𝒞0 (ℝn ).
9. Prove that dim 𝒞_c^∞(ℝ^n) = ∞.

8.8 Product Measures

Throughout this section, (X, 𝔐, 𝜇) and (Y, 𝔑, 𝜈) denote a pair of measure spaces.
The objective of this section is to find a reasonable definition of the product
measure on X × Y. Fubini’s theorem is one of the section’s main results. We also
settle questions about the products of Lebesgue measures in this section.

The basic definitions are motivated by the ideas found in standard calculus
textbooks. Let us look at the simplest case, which is the product of two copies
of the real line with Lebesgue measure, 𝜆. The problem of computing the area of
a plane region contains all the motivations for the ideas behind the definitions in
this section. Figure 8.6 depicts a (bounded) plane region E in ℝ2 . To compute the
area of E, we take a vertical cross section Sx in E, and the area of E is obtained
by integrating the length (the Lebesgue measure) of the cross section. The same
can be achieved by taking a horizontal cross section Sy in E. Thus the area (two-
dimensional measure) of E, denoted 𝜌(E), is given by

\[
\rho(E) = \int_{\mathbb{R}} \lambda(S_x)\,dx = \int_{\mathbb{R}} \lambda(S_y)\,dy.
\]

[Figure 8.6 Computing the area of a plane region. Labels in the figure: the point (x, y) and the cross sections Sx and Sy.]

We also wish the two-dimensional measure 𝜌 to preserve the property that the
area of a rectangle is the product of its dimensions. More generally, if A and B are
measurable subsets of ℝ, then it should be the case that

𝜌(A × B) = 𝜆(A)𝜆(B).

Now see theorem 8.8.9, where the definition of the product measure appears.

Before we can achieve any of the above goals, we need to define a reasonable 𝜎-
algebra in X × Y where our expectations can materialize. Geometry dictates that
the product of two intervals (or, more generally, measurable subsets) A and B in
ℝ ought to be measurable in the product space. This immediately suggests that we
look at the smallest 𝜎-algebra that contains all rectangles, and this motivates the
definitions below of the product of measurable spaces.

Products of Measurable Spaces

Definition. A subset of X × Y of the form A × B, where A ∈ 𝔐 and B ∈ 𝔑, is called a
measurable rectangle in X × Y.

Definition. The product of the measurable spaces (X, 𝔐) and (Y, 𝔑) is the
measurable space (X × Y, 𝔐 ⊗ 𝔑), where 𝔐 ⊗ 𝔑 is the 𝜎-algebra generated
by the collection of measurable rectangles.

Definition. For a subset E ⊆ X × Y, and for a fixed element x ∈ X, we define the
x-section of E to be the set Ex = {y ∈ Y ∶ (x, y) ∈ E}. Similarly, for y ∈ Y, the
y-section of E is the set Ey = {x ∈ X ∶ (x, y) ∈ E}.

The following lemma will be used without explicit reference. Its proof is simple.

Lemma 8.8.1. If (En ) is a sequence of subsets of X × Y, then

(∪n En )x = ∪n (En )x , and (∩n En )x = ∩n (En )x .

The corresponding statements for y-sections are true. 

Proposition 8.8.2. If E ∈ 𝔐 ⊗ 𝔑, then, for every x ∈ X, Ex ∈ 𝔑. Likewise, for
every y ∈ Y, Ey ∈ 𝔐.

Proof. Let Ω = {E ⊆ X × Y ∶ Ex ∈ 𝔑 ∀x ∈ X}. Clearly, Ω contains all measurable
rectangles because if A ∈ 𝔐 and B ∈ 𝔑, then

\[
(A \times B)_x = \begin{cases} B & \text{if } x \in A, \\ \emptyset & \text{if } x \notin A. \end{cases}
\]

If E ∈ Ω, then (E′ )x = (Ex )′ ∈ 𝔑; hence E′ ∈ Ω. Here E′ denotes the complement of E
in X × Y.
Finally, if En ∈ Ω for every n ∈ ℕ, then (∪_{n=1}^∞ En )x = ∪_{n=1}^∞ (En )x ∈ 𝔑. Therefore
∪_{n=1}^∞ En ∈ Ω.
The above shows that Ω is a 𝜎-algebra that contains all measurable rectangles;
hence Ω ⊇ 𝔐 ⊗ 𝔑. The proof that Ey ∈ 𝔐 for every y ∈ Y is identical to the
above case. ∎

Definition. Let f be a scalar function on X × Y. For an element x ∈ X, the x-section
of f is the function fx ∶ Y → ℂ defined by fx (y) = f(x, y). Similarly, the y-section
of f is the function f y ∶ X → ℂ such that f y (x) = f(x, y).

Proposition 8.8.3. If f ∶ X × Y → ℂ is 𝔐 ⊗ 𝔑-measurable, then, for every x ∈ X,
fx is 𝔑-measurable, and, for every y ∈ Y, f y is 𝔐-measurable.

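Proof. It suffices to consider a real-valued f (for a complex function, apply the
result to its real and imaginary parts). Let a ∈ ℝ, and let
E = f−1 (a, ∞) = {(x, y) ∈ X × Y ∶ f(x, y) > a}. Now (fx )−1 (a, ∞) is exactly the
set Ex , which is measurable by the previous proposition. Thus fx is 𝔑-measurable.
The argument for f y is identical. ∎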

Definition. An elementary set in X × Y is a disjoint union of finitely many
measurable rectangles. The collection of elementary sets will be given the
symbol 𝔈.

It is clear that the collection of elementary sets also generates 𝔐 ⊗ 𝔑.

Proposition 8.8.4. The collection 𝔈 of elementary sets is an algebra.

Proof. It is clear that the intersection of two measurable rectangles is either empty or
a measurable rectangle. Also (A × B)′ = (A′ × Y) ∪ (A × B′ ), so the complement
of a measurable rectangle is an elementary set.
Let E = ∪_{i=1}^n Ri and F = ∪_{j=1}^m Sj be elementary sets, where each of {Ri } and {Sj }
is a set of disjoint measurable rectangles. Now E ∩ F = ∪{Ri ∩ Sj ∶ 1 ≤ i ≤ n, 1 ≤
j ≤ m}. This shows that E ∩ F ∈ 𝔈, and that 𝔈 is closed under the formation of
finite intersections.
Now consider the complement of an elementary set E = ∪_{i=1}^n Ri . Since E′ =
∩_{i=1}^n Ri′ , E′ ∈ 𝔈. Thus 𝔈 is closed under complementation. It is also clear that 𝔈
is closed under the formation of finite disjoint unions.
Now if E1 , E2 ∈ 𝔈, then, by the above, E1′ ∩ E2 ∈ 𝔈; hence E1 ∪ E2 = E1 ∪
(E1′ ∩ E2 ) ∈ 𝔈, the union on the right being disjoint. ∎

Definition. Let T be a nonempty set. A monotone class in T is a collection ℭ of
subsets of T such that

(a) if (En ) is an ascending sequence in ℭ, then ∪_{n=1}^∞ En ∈ ℭ; and

(b) if (En ) is a descending sequence in ℭ, then ∩_{n=1}^∞ En ∈ ℭ.

Proposition 8.8.5. Given an arbitrary collection ℰ of subsets of a nonempty set T,
there exists a (unique) smallest monotone class in T that contains ℰ.

Proof. The intersection of an arbitrary collection of monotone classes in T is clearly
a monotone class. The family of monotone classes containing ℰ is nonempty since
𝒫(T) is such a monotone class. The intersection of the monotone classes in T that
contain ℰ is the monotone class we seek. ∎

Lemma 8.8.6 (the monotone class lemma). Let 𝔈 be an algebra of subsets in a
nonempty set T. Then the smallest monotone class in T containing 𝔈 is the 𝜎-
algebra generated by 𝔈. In particular, if an algebra in T is a monotone class, then
it is a 𝜎-algebra.

Proof. Let 𝔐 be the 𝜎-algebra generated by 𝔈, and let 𝔐1 be the smallest monotone
class in T containing 𝔈. Since 𝔐 is a monotone class containing 𝔈, 𝔐1 ⊆ 𝔐.
Thus we need to establish the reverse inclusion. It is clearly sufficient to show that
𝔐1 is a 𝜎-algebra.
We first show that 𝔐1 is an algebra. Let 𝔐′1 = {E ⊆ T ∶ E′ ∈ 𝔐1 }. It is clear
that 𝔐′1 is a monotone class in T and that 𝔈 ⊆ 𝔐′1 . Thus 𝔐1 ⊆ 𝔐′1 ; hence 𝔐1
is closed under complementation.
For a member F ∈ 𝔐1 , define Ω(F ) = {E ∈ 𝔐1 ∶ E ∪ F ∈ 𝔐1 }. It is easy to
verify that Ω(F ) is a monotone class in T. Now if G ∈ 𝔈, then Ω(G) contains
𝔈, so Ω(G) = 𝔐1 . Hence, for any H ∈ 𝔐1 , H ∈ Ω(G). By the very definition of
Ω(H), G ∈ Ω(H), so 𝔈 ⊆ Ω(H) for each H ∈ 𝔐1 . Because Ω(H) is a monotone
class, 𝔐1 = Ω(H), so 𝔐1 is closed under finite unions and is therefore an algebra.
Now if (En ) is a sequence of members of 𝔐1 , let Bn = ∪_{i=1}^n Ei . Because 𝔐1 is an
algebra, each Bn ∈ 𝔐1 . Since (Bn ) is ascending and 𝔐1 is a monotone class, it
follows that ∪_{n=1}^∞ En = ∪_{n=1}^∞ Bn is in 𝔐1 . This shows that 𝔐1 is a 𝜎-algebra,
and the proof is complete. ∎

The following result is immediate.



Corollary 8.8.7. The 𝜎-algebra 𝔐 ⊗ 𝔑 is the smallest monotone class that contains
the algebra 𝔈 of elementary sets. 

Product Measures

Theorem 8.8.8. Suppose (X, 𝔐, 𝜇) and (Y, 𝔑, 𝜈) are 𝜎-finite measure spaces.
For a subset E ∈ 𝔐 ⊗ 𝔑, and for x ∈ X, y ∈ Y, define

𝜑(x) = 𝜈(Ex ), and 𝜓(y) = 𝜇(Ey ).

Then

(i) 𝜑 is 𝔐-measurable,
(ii) 𝜓 is 𝔑-measurable, and
(iii) ∫X 𝜑d𝜇 = ∫Y 𝜓d𝜈.

Proof. Let Ω be the collection of members of 𝔐 ⊗ 𝔑 for which all three conclusions
of the theorem hold. We will show that Ω = 𝔐 ⊗ 𝔑.

First, we establish a number of facts.

(a) Ω contains all elementary sets.


If E = A × B is a measurable rectangle, then 𝜈((A × B)x ) = 𝜒A (x)𝜈(B), and
𝜇((A × B)y ) = 𝜒B (y)𝜇(A), are measurable;¹⁴ hence ∫X 𝜈(Ex )d𝜇 = ∫X 𝜈(B)𝜒A d𝜇 =
𝜈(B)𝜇(A) = ∫Y 𝜇(A)𝜒B d𝜈 = ∫Y 𝜇(Ey )d𝜈. Now the result is true for elementary sets
because of the additivity of measures and the linearity of integrals.

(b) If En ∈ Ω and E1 ⊆ E2 ⊆ … , then E = ∪_{n=1}^∞ En ∈ Ω.

Write 𝜑n (x) = 𝜈((En )x ), 𝜓n (y) = 𝜇((En )y ), 𝜑(x) = 𝜈(Ex ), and 𝜓(y) = 𝜇(Ey ).
Now 𝜑n (x) increases to 𝜑(x) = 𝜈(Ex ), and 𝜓n (y) increases to 𝜓(y) = 𝜇(Ey ).
By assumption, 𝜑n and 𝜓n are measurable, so 𝜑 and 𝜓 are measurable. Also
by assumption, ∫X 𝜑n d𝜇 = ∫Y 𝜓n d𝜈. By the monotone convergence theorem,
conclusion (iii) holds for the set E.

(c) If E1 ⊇ E2 ⊇ ... is a sequence in Ω and if E1 ⊆ A × B for some measurable
rectangle A × B with 𝜇(A) < ∞ and 𝜈(B) < ∞, then E = ∩_{n=1}^∞ En ∈ Ω.

In the notation of the proof of fact (b), 𝜑n decreases to 𝜑, and 𝜓n decreases to 𝜓.
Thus 𝜑 and 𝜓 are measurable. Since (E1 )x ⊆ (A × B)x , 𝜈((E1 )x ) ≤ 𝜈((A × B)x ) =
𝜈(B)𝜒A (x). Therefore ∫X 𝜑1 d𝜇 = ∫X 𝜈((E1 )x )d𝜇 ≤ ∫X 𝜈(B)𝜒A d𝜇 = 𝜇(A)𝜈(B) <
∞. Similarly, ∫Y 𝜓1 d𝜈 < ∞. By assumption, ∫X 𝜑n d𝜇 = ∫Y 𝜓n d𝜈. Fact (c) now
follows from the dominated convergence theorem.

¹⁴ Recall that characteristic functions of measurable sets are measurable functions.



(d) If (En ) is a disjoint sequence in Ω, then E = ∪_{n=1}^∞ En ∈ Ω.

For each n ∈ ℕ, the set ∪_{i=1}^n Ei is in Ω (see the proof of fact (a)). Now (d) follows
from (b) applied to the ascending sequence (∪_{i=1}^n Ei ), n = 1, 2, … .

Now we use the 𝜎-finiteness assumption to write X as the disjoint union of subsets
Xn of finite 𝜇-measure, and Y as the disjoint union of subsets Ym of finite 𝜈
measure. For a member E of 𝔐 ⊗ 𝔑, define Em,n = E ∩ (Xn × Ym ), and let Ω1 be
the collection of all members E of 𝔐 ⊗ 𝔑 such that, for all m, n ∈ ℕ, Em,n ∈ Ω.
Facts (b) and (c) imply that Ω1 is a monotone class, and fact (a) implies that Ω1
contains all elementary sets. Thus Ω1 = 𝔐 ⊗ 𝔑 by corollary 8.8.7.
Thus Em,n ∈ Ω for every E ∈ 𝔐 ⊗ 𝔑 and for all m, n ∈ ℕ. Since E = ∪m,n Em,n
and the sets Em,n are disjoint, fact (d) implies that E ∈ Ω. 

Observe that conclusion (iii) of theorem 8.8.8 can be written as

\[
\int_X \Bigl\{ \int_Y \chi_E(x, y)\,d\nu(y) \Bigr\}\,d\mu(x) = \int_Y \Bigl\{ \int_X \chi_E(x, y)\,d\mu(x) \Bigr\}\,d\nu(y).
\]

Thus the order of integration can be switched in iterated integrals of characteristic
functions of 𝔐 ⊗ 𝔑-measurable sets. This is clearly the first step to prove Fubini’s
theorem. First we need to define the product measure of two 𝜎-finite measure
spaces.

Theorem 8.8.9. Under the assumptions of theorem 8.8.8, the set function defined by

\[
(\mu \otimes \nu)(E) = \int_X \varphi\,d\mu = \int_Y \psi\,d\nu
\]

is the unique positive measure on 𝔐 ⊗ 𝔑 such that (𝜇 ⊗ 𝜈)(A × B) = 𝜇(A)𝜈(B)
for all measurable rectangles A × B. Furthermore, 𝜇 ⊗ 𝜈 is 𝜎-finite. The measure
𝜇 ⊗ 𝜈 is the product of the measures 𝜇 and 𝜈.

Proof. Let (En ) be a disjoint sequence of 𝔐 ⊗ 𝔑-measurable subsets of X × Y, and
let E = ∪_{n=1}^∞ En . Since Ex = ∪_{n=1}^∞ (En )x and since the sequence ((En )x ) is disjoint,
𝜈(Ex ) = ∑_{n=1}^∞ 𝜈((En )x ). An application of the monotone convergence theorem
yields

\[
(\mu \otimes \nu)(E) = \int_X \sum_{n=1}^{\infty} \nu((E_n)_x)\,d\mu = \sum_{n=1}^{\infty} \int_X \nu((E_n)_x)\,d\mu = \sum_{n=1}^{\infty} (\mu \otimes \nu)(E_n).
\]

The 𝜎-finiteness of 𝜇 ⊗ 𝜈 is obvious. We leave the proof of the uniqueness part as
an exercise. ∎
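The definition of 𝜇 ⊗ 𝜈 through sections can be checked by hand on finite spaces. The following is a minimal sketch, assuming two hypothetical finite measure spaces in which every subset is measurable and the measures are given by point weights; the names `mu`, `nu`, `x_section`, and the sample sets are illustrative choices, not taken from the text.

```python
from itertools import product

# Hypothetical finite measure spaces: each point carries a weight.
mu = {"a": 0.5, "b": 2.0, "c": 1.5}   # weights of the points of X
nu = {0: 1.0, 1: 3.0}                  # weights of the points of Y

def x_section(E, x):
    """E_x = {y : (x, y) in E}."""
    return {y for (u, y) in E if u == x}

def y_section(E, y):
    """E^y = {x : (x, y) in E}."""
    return {x for (x, v) in E if v == y}

def measure(weights, A):
    """Measure of a finite set A as the sum of its point weights."""
    return sum(weights[p] for p in A)

def product_measure_via_x(E):
    # Integral over X of phi(x) = nu(E_x) with respect to mu.
    return sum(mu[x] * measure(nu, x_section(E, x)) for x in mu)

def product_measure_via_y(E):
    # Integral over Y of psi(y) = mu(E^y) with respect to nu.
    return sum(nu[y] * measure(mu, y_section(E, y)) for y in nu)

# A set that is not a rectangle, and a measurable rectangle A x B.
E = {("a", 0), ("b", 0), ("b", 1), ("c", 1)}
A, B = {"a", "c"}, {1}
rect = set(product(A, B))

print(product_measure_via_x(E), product_measure_via_y(E))
print(product_measure_via_x(rect), measure(mu, A) * measure(nu, B))
```

In this toy setting both iterated sums agree for E, and the value on the rectangle is 𝜇(A)𝜈(B), which mirrors conclusion (iii) of theorem 8.8.8 and the defining property in theorem 8.8.9.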

Remark. Both the existence and uniqueness of the product measure of 𝜎-finite
spaces can be based on the Hopf extension theorem. For a measurable rectangle
A × B in 𝔐 × 𝔑, we define 𝜌(A × B) = 𝜇(A)𝜈(B), and, for an elementary set
C = ∪_{i=1}^n Ai × Bi , we define 𝜌(C) = ∑_{i=1}^n 𝜇(Ai )𝜈(Bi ). Then one can check that
𝜌 is countably additive on the algebra 𝔈 of elementary sets (it is not difficult).
Now all the conditions of theorems 8.2.19 and 8.2.20 are met, and the (unique)
Hopf extension of 𝜌 is the product measure 𝜇 ⊗ 𝜈. The approach we took to
define the product measure has the slight advantage that it is better motivated
by calculus concepts, as explained in the opening remarks of this section. In
addition, Fubini’s theorem follows without difficulty from the above results.

Fubini’s Theorem

Theorem 8.8.10 (Tonelli’s theorem). Suppose f ∶ X × Y → ℂ is an 𝔐 ⊗ 𝔑-
measurable function.

(a) If f is positive, let 𝜑(x) = ∫Y fx d𝜈, and 𝜓(y) = ∫X f y d𝜇. Then 𝜑 is 𝔐-
measurable, 𝜓 is 𝔑-measurable, and

\[
\int_X \varphi\,d\mu = \int_{X \times Y} f\,d(\mu \otimes \nu) = \int_Y \psi\,d\nu.
\]

(b) In general, let 𝜑∗ (x) = ∫Y | f |x d𝜈, and 𝜓∗ (y) = ∫X | f |y d𝜇. If 𝜑∗ ∈ 𝔏1 (𝜇) or if
𝜓∗ ∈ 𝔏1 (𝜈), then f ∈ 𝔏1 (𝜇 ⊗ 𝜈).

Proof. Tonelli’s theorem holds for the characteristic function of an 𝔐 ⊗ 𝔑-
measurable set by the previous theorem. By the linearity of the integral, Tonelli’s
theorem holds for any 𝔐 ⊗ 𝔑-measurable simple function.
Now let 0 ≤ s1 ≤ s2 ≤ ... be a sequence of 𝔐 ⊗ 𝔑-simple functions converging
to f(x, y) for every (x, y) ∈ X × Y, and let 𝜑n (x) = ∫Y (sn )x d𝜈. By the above
paragraph, ∫X 𝜑n d𝜇 = ∫X×Y sn d(𝜇 ⊗ 𝜈). The monotone convergence theorem implies
that ∫X 𝜑d𝜇 = ∫X×Y fd(𝜇 ⊗ 𝜈). The proof that ∫Y 𝜓d𝜈 = ∫X×Y fd(𝜇 ⊗ 𝜈) is
identical to the above.
Part (b) is obtained by applying part (a) to the function | f |. ∎

Theorem 8.8.11 (Fubini’s theorem). If f ∈ 𝔏1 (𝜇 ⊗ 𝜈), then fx ∈ 𝔏1 (𝜈) for a.e.
x ∈ X, f y ∈ 𝔏1 (𝜇) for a.e. y ∈ Y, the functions 𝜑(x) = ∫Y fx d𝜈 and 𝜓(y) = ∫X f y d𝜇
are in 𝔏1 (𝜇) and 𝔏1 (𝜈), respectively, and

\[
\int_X \varphi\,d\mu = \int_{X \times Y} f\,d(\mu \otimes \nu) = \int_Y \psi\,d\nu. \tag{7}
\]

Proof. It is clearly sufficient to prove the result when f is a real function. Let f+
and f− be the positive and negative parts of f, and write 𝜑1 (x) = ∫Y (f+ )x d𝜈, and
𝜑2 (x) = ∫Y (f− )x d𝜈. Since f+ ≤ | f |, f+ ∈ 𝔏1 (𝜇 ⊗ 𝜈), theorem 8.8.10 applies and
∫X 𝜑1 d𝜇 = ∫X×Y f+ d(𝜇 ⊗ 𝜈) < ∞. Thus 𝜑1 ∈ 𝔏1 (𝜇) and example 1 in section 8.3
now implies that 𝜑1 (x) is finite for a.e. x ∈ X, that is, (f+ )x is integrable for a.e.
x ∈ X. Similar results apply to f− ; 𝜑2 ∈ 𝔏1 (𝜇), and 𝜑2 is finite for a.e. x ∈ X.
The function 𝜑 = 𝜑1 − 𝜑2 is defined for a.e. x ∈ X, and the identity ∫X 𝜑d𝜇 =
∫X×Y fd(𝜇 ⊗ 𝜈) follows from the fact that fx = (f+ )x − (f− )x and the linearity of
the integral. The remaining assertion of the theorem and the other identity in (7)
are obtained by replicating the above proof for the function f y . 
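The integrability hypothesis in theorem 8.8.11 cannot be dropped. A minimal sketch of a standard illustration follows, assuming X = Y = {0, 1, 2, …} with counting measure (a 𝜎-finite space), so that integrals become sums; the array `a` and the helper names are hypothetical and not from the text. Each row and column of `a` has at most two nonzero entries, so the inner sums are computed exactly over their finite supports.

```python
def a(m, n):
    """a(m, n) = 1 if n == m, -1 if n == m + 1, and 0 otherwise (m, n >= 0)."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def row_sum(m):
    # Sum over n of a(m, n); the only nonzero terms are n = m and n = m + 1.
    return sum(a(m, n) for n in range(m + 2))

def col_sum(n):
    # Sum over m of a(m, n); the only nonzero terms are m = n - 1 and m = n.
    return sum(a(m, n) for m in range(n + 1))

N = 1000
# Iterated sum with the inner sum over n: every row sums to 0.
sum_rows_first = sum(row_sum(m) for m in range(N))
# Iterated sum with the inner sum over m: only column 0 contributes.
sum_cols_first = sum(col_sum(n) for n in range(N))

print(sum_rows_first, sum_cols_first)
```

The two iterated sums differ, and this is consistent with Fubini’s theorem because the sum of |a(m, n)| over all pairs is infinite, so the array is not integrable with respect to the product of the counting measures.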

Products of Lebesgue Measures

In the discussion below and until the end of the section, k is a positive integer, and
𝜆k denotes Lebesgue measure on the 𝜎-algebra ℒk of Lebesgue measurable subsets
of ℝk . We also use the notation ℬk to denote the 𝜎-algebra of Borel subsets of ℝk .

In the following, we use the result of problem 10 on section 8.4 without explicit
mention.

Lemma 8.8.12. Let r and s be positive integers, and let n = r + s. If Z is a set of
Lebesgue measure 0 in ℝr and B ∈ ℒs , then Z × B ∈ ℒn , and 𝜆n (Z × B) = 0.

Proof. First assume that B is bounded, and choose an open set V of finite measure
such that B ⊆ V ⊆ ℝs . Let 𝜖 > 0. Choose an open set U such that Z ⊆ U ⊆ ℝr
and 𝜆r (U) < 𝜖. Since we have not yet established the Lebesgue measurability of
Z × B, we estimate its outer measure: m∗n (Z × B) ≤ m∗n (U × V) = 𝜆n (U × V) =
𝜆r (U)𝜆s (V) < 𝜖𝜆s (V). Since 𝜖 is arbitrary, m∗n (Z × B) = 0; hence Z × B is
measurable of measure 0.
If B is unbounded, consider the intersection Bi of B with the open ball in ℝs
of radius i and centered at the origin. By what we just proved, for each i ∈ ℕ,
Z × Bi ∈ ℒn has measure 0. Since Z × B = ∪_{i=1}^∞ (Z × Bi ), the proof is complete. ∎

Proposition 8.8.13. Let r, s, and n be as in lemma 8.8.12. Then

(a) ℬn ⊆ ℒr ⊗ ℒs ⊆ ℒn .
(b) If A ∈ ℒr and B ∈ ℒs , then 𝜆n (A × B) = 𝜆r (A)𝜆s (B).

Proof. (a) Every open cube in ℝn is the product of two open cubes, one in ℝr and
one in ℝs . Thus ℒr ⊗ ℒs contains all open cubes in ℝn . Since every open subset of
ℝn is a countable union of open cubes, ℒr ⊗ ℒs contains all open subsets of ℝn ;
hence ℬn ⊆ ℒr ⊗ ℒs .
Suppose A ∈ ℒr and B ∈ ℒs . By theorem 8.4.10(c), choose F𝜎 sets F ⊆ A
and K ⊆ B such that the sets Z1 = A − F and Z2 = B − K have measure 0. Now
A × B = (F × K) ∪ Z, where Z = (F × Z2 ) ∪ (Z1 × K) ∪ (Z1 × Z2 ). By the previous
lemma, 𝜆n (Z) = 0. Since the product of F𝜎 sets is an F𝜎 set, F × K ∈ ℒn . The
ℒn -measurability of A × B is now immediate because it is the union of two measurable
sets. We have shown that ℒn contains all measurable rectangles in ℒr ⊗ ℒs . Hence
ℒr ⊗ ℒs ⊆ ℒn .

(b) First assume that A and B are bounded G𝛿 sets. Thus there exist descending
sequences of bounded open sets {Ui } in ℝr and {Vj } in ℝs such that A = ∩_{i=1}^∞ Ui
and B = ∩_{j=1}^∞ Vj . Now

\[
\lambda_n(A \times B) = \lambda_n\Bigl(\bigcap_{i=1}^{\infty} \bigcap_{j=1}^{\infty} (U_i \times V_j)\Bigr) = \lim_i \lambda_n\Bigl(\bigcap_{j=1}^{\infty} (U_i \times V_j)\Bigr)
= \lim_i \lim_j \lambda_n(U_i \times V_j) = \lim_i \lim_j \lambda_r(U_i)\lambda_s(V_j) = \lambda_r(A)\lambda_s(B).
\]

Now, for arbitrary (unbounded) G𝛿 sets A and B, the result follows from the 𝜎-
finiteness of Lebesgue measure. We invite the reader to work out the details.

Finally, if A and B are Lebesgue measurable in their respective spaces, then, by
theorem 8.4.10(c), choose G𝛿 sets G in ℝr and H in ℝs such that A = G − Z1 ,
B = H − Z2 , where Z1 and Z2 have measure 0. By lemma 8.8.12,

𝜆n (G × H) = 𝜆n (A × B) + 𝜆n (A × Z2 ) + 𝜆n (Z1 × B) + 𝜆n (Z1 × Z2 ) = 𝜆n (A × B).

But 𝜆n (G × H) = 𝜆r (G)𝜆s (H) = 𝜆r (A)𝜆s (B), hence the result. 

Before we proceed to the next theorem, we will show that 𝜆1 ⊗ 𝜆1 is not a complete
measure. Let E be a subset of [0, 1] that is not Lebesgue measurable. The set A =
E × {0} ⊆ ℝ2 is contained in B = [0, 1] × {0}, which is in ℒ1 ⊗ ℒ1 . Clearly, (𝜆1 ⊗
𝜆1 )(B) = 0. However, A is not in ℒ1 ⊗ ℒ1 by proposition 8.8.2. As a by-product of
this example, it follows that ℒ2 is strictly larger than ℒ1 ⊗ ℒ1 .

Theorem 8.8.14. Let r and s be positive integers, and let n = r + s. Then (ℝn , ℒn , 𝜆n )
is the completion of (ℝn , ℒr ⊗ ℒs , 𝜆r ⊗ 𝜆s ).

Proof. By the above proposition, if A ∈ ℒr and B ∈ ℒs , then 𝜆n (A × B) =
𝜆r (A)𝜆s (B) = (𝜆r ⊗ 𝜆s )(A × B). Thus 𝜆n agrees with 𝜆r ⊗ 𝜆s on the set of
measurable rectangles in ℒr ⊗ ℒs . By the uniqueness of the product measure
(theorem 8.8.9 and problem 5 at the end of this section), 𝜆n extends 𝜆r ⊗ 𝜆s .
Since (ℝn , ℒn , 𝜆n ) is a complete measure space, it contains the completion of
(ℝn , ℒr ⊗ ℒs , 𝜆r ⊗ 𝜆s ).

The proof will be complete if we show that for a member E of ℒn , there are members
A and B of ℒr ⊗ ℒs such that A ⊆ E ⊆ B, and (𝜆r ⊗ 𝜆s )(B − A) = 0; see problem
4 on section 8.2. By theorem 8.4.10, there exists an F𝜎 set A ⊆ ℝn and a G𝛿 set
B ⊆ ℝn such that A ⊆ E ⊆ B, and 𝜆n (B − A) = 0. Since A, B ∈ ℬn ⊆ ℒr ⊗ ℒs , the
above paragraph implies that (𝜆r ⊗ 𝜆s )(B − A) = 𝜆n (B − A) = 0, as desired. 

Excursion: The Product of Finitely Many Measures

It is clear that the above definitions and constructions for the product of two
measurable spaces can be extended to the product of any finite number of
measurable spaces {(Xi , 𝔐i ), 1 ≤ i ≤ n}. A measurable rectangle is a set of the form
A1 × ... × An , Ai ∈ 𝔐i , and an elementary set is a disjoint union of a finite number
of measurable rectangles. It is easy to see that the collection, ℭ, of elementary
sets is an algebra. By definition, 𝔐1 ⊗ ... ⊗ 𝔐n is the 𝜎-algebra generated by
the collection of measurable rectangles. Obviously, the algebra ℭ also generates
𝔐1 ⊗ ... ⊗ 𝔐n .

We first establish the following technical lemma.

Lemma 8.8.15. Let (Xi , 𝔐i ), 1 ≤ i ≤ 3 be measurable spaces. Then

𝔐1 ⊗ (𝔐2 ⊗ 𝔐3 ) = 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3 .

Proof. Recall that 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3 is generated by the set of all measurable
rectangles A1 × A2 × A3 , where Ai ∈ 𝔐i , while 𝔐1 ⊗ (𝔐2 ⊗ 𝔐3 ) is generated
by sets of the form A1 × P, where A1 ∈ 𝔐1 and P ∈ 𝔐2 ⊗ 𝔐3 . We show
that 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3 ⊆ 𝔐1 ⊗ (𝔐2 ⊗ 𝔐3 ). Every measurable rectangle R =
A1 × A2 × A3 in X1 × X2 × X3 can be written as R = A1 × P, where P = A2 × A3 .
Since P ∈ 𝔐2 ⊗ 𝔐3 , 𝔐1 ⊗ (𝔐2 ⊗ 𝔐3 ) contains all measurable rectangles;
hence 𝔐1 ⊗ (𝔐2 ⊗ 𝔐3 ) ⊇ 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3 .

To prove the reverse containment, it is enough to show that 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3
contains every set of the form A1 × P, where A1 ∈ 𝔐1 and P ∈ 𝔐2 ⊗ 𝔐3 , since
these sets generate 𝔐1 ⊗ (𝔐2 ⊗ 𝔐3 ).
Define a collection of subsets of X2 × X3 as follows:
Ω = {P ⊆ X2 × X3 ∶ X1 × P ∈ 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3 }.
It is easy to see that Ω is a 𝜎-algebra and that every measurable rectangle A2 × A3
is in Ω. Thus Ω contains the 𝜎-algebra generated by all measurable rectangles in
X2 × X3 ; hence Ω ⊇ 𝔐2 ⊗ 𝔐3 . It follows that, for every P ∈ 𝔐2 ⊗ 𝔐3 , X1 ×
P ∈ 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3 . It is clear that, for every A1 ∈ 𝔐1 , A1 × X2 × X3 ∈ 𝔐1 ⊗
𝔐2 ⊗ 𝔐3 ; hence the intersection of X1 × P and A1 × X2 × X3 is in 𝔐1 ⊗ 𝔐2 ⊗
𝔐3 . But the intersection of the latter two sets is exactly A1 × P. This concludes the
proof. ∎

By an argument almost identical to the above proof, it can be shown that
(𝔐1 ⊗ 𝔐2 ) ⊗ 𝔐3 = 𝔐1 ⊗ 𝔐2 ⊗ 𝔐3 . Thus the formation of products of
measurable spaces is associative.

It follows by induction that if {(Xi , 𝔐i ), 1 ≤ i ≤ n} is a finite set of measurable
spaces, then

𝔐1 ⊗ ... ⊗ 𝔐n = 𝔐1 ⊗ ℜn−1 , where ℜn−1 = 𝔐2 ⊗ ... ⊗ 𝔐n .

This immediately suggests an inductive definition of the product of more than two
measure spaces.

Definition. Let {(Xi , 𝔐i , 𝜇i ), 1 ≤ i ≤ n} be a set of 𝜎-finite measure spaces. We
define the product measure 𝜇1 ⊗ ... ⊗ 𝜇n on 𝔐1 ⊗ ... ⊗ 𝔐n = 𝔐1 ⊗ ℜn−1 by

𝜇1 ⊗ ... ⊗ 𝜇n = 𝜇1 ⊗ 𝜌n−1 , where 𝜌n−1 = 𝜇2 ⊗ ... ⊗ 𝜇n .

Theorem 8.8.9 and the inductive nature of the construction imply that 𝜇1 ⊗ ... ⊗
𝜇n is a 𝜎-finite measure on 𝔐1 ⊗ ... ⊗ 𝔐n and that, for a measurable rectangle
A1 × ... × An , we have (𝜇1 ⊗ ... ⊗ 𝜇n )(A1 × ... × An ) = ∏_{i=1}^n 𝜇i (Ai ).

Theorem 8.8.16. Suppose that {(Xi , 𝔐i , 𝜇i ), 1 ≤ i ≤ n} is a set of 𝜎-finite measure
spaces. Then the product measure 𝜇1 ⊗ ... ⊗ 𝜇n is the unique 𝜎-finite measure
on 𝔐1 ⊗ ... ⊗ 𝔐n such that, for every measurable rectangle A1 × ... × An ,
(𝜇1 ⊗ ... ⊗ 𝜇n )(A1 × ... × An ) = ∏_{i=1}^n 𝜇i (Ai ). ∎

The existence of the product measure is by the inductive construction outlined
before the statement of the theorem. The uniqueness of 𝜇1 ⊗ ... ⊗ 𝜇n is by problem
5 at the end of the section.

Fubini’s theorem (theorem 8.8.11) extends to the product of any finite number of
measures in a straightforward manner. Using the notation we established earlier
in this excursion, if f ∈ 𝔏1 (𝜇1 ⊗ ... ⊗ 𝜇n ), then

\[
\int_{X_1 \times \cdots \times X_n} f\,d(\mu_1 \otimes \cdots \otimes \mu_n) = \int_{X_1} \int_{X_2 \times \cdots \times X_n} f\,d\rho_{n-1}\,d\mu_1.
\]

The repeated application of Fubini’s theorem (induction) yields

\[
\int_{X_1 \times \cdots \times X_n} f\,d(\mu_1 \otimes \cdots \otimes \mu_n) = \int_{X_1} \int_{X_2} \cdots \int_{X_n} f\,d\mu_n \cdots d\mu_2\,d\mu_1.
\]

Exercises

1. Let r and s be positive integers, and let n = r + s. Prove that ℬn = ℬr ⊗ ℬs .
2. Let r, s, and n be as in problem 1. Prove that ℒr ⊗ ℒs is strictly contained
   in ℒn .
3. Let T ∶ ℝn → ℝn be an invertible linear operator. Prove that, for a function
f that is either positive or integrable,

\[
\int_{\mathbb{R}^n} f\,d\lambda = |\det(T)| \int_{\mathbb{R}^n} (f \circ T)\,d\lambda. \tag{$\ast$}
\]

Hint: Let A be the matrix of T relative to the standard basis of ℝn . By
theorem B.4 in Appendix B, A is the product of elementary matrices. Prove
that (*) holds for linear mappings generated by elementary matrices. You
need Fubini’s theorem and a specialized version of problem 15 on section
8.4. Observe that a useful by-product of this exercise is that if A is an
orthogonal matrix, then, for all E ∈ ℒn , 𝜆(E) = 𝜆(AE), where AE = {Ax ∶
x ∈ E}. Thus, the Lebesgue measure is rotation invariant.
4. Prove that a proper subspace of ℝn has Lebesgue measure 0. Hint: See
problem 6 on section 8.4.
5. Complete the proof of theorem 8.8.9. Thus prove that if 𝜌 is a measure on
𝔐 ⊗ 𝔑 such that for A ∈ 𝔐, and B ∈ 𝔑, 𝜌(A × B) = 𝜇(A)𝜈(B), then 𝜌 =
𝜇 ⊗ 𝜈 on 𝔐 ⊗ 𝔑. The same result easily extends to the product of any finite
number of measures.
6. Let f ∶ ℝ2 → ℝ be the function

\[
f(x, y) = \begin{cases} 1 & \text{if } x \ge 0,\ x \le y \le x + 1, \\ -1 & \text{if } x \ge 0,\ x + 1 < y \le x + 2, \\ 0 & \text{otherwise.} \end{cases}
\]

Prove that ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y)dxdy ≠ ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y)dydx. This does not
contradict theorem 8.8.11 because, clearly, | f | is not integrable.
7. Let X = [0, 1], 𝔐 be ℒ1 -restricted to [0, 1], and dx (or dy) denote the
Lebesgue measure on [0, 1]. Choose a sequence 𝛼1 < 𝛼2 < ... in (0, 1) and,
for each n ≥ 1, let gn be a continuous positive function that vanishes outside
[𝛼n , 𝛼n+1 ] such that ∫_{𝛼_n}^{𝛼_{n+1}} gn (x)dx = 1. Define a function f ∶ ℝ2 → ℝ by
f(x, y) = ∑_{n=1}^∞ gn (y)[gn (x) − g_{n+1} (x)]. Show that ∫_0^1 ∫_0^1 f(x, y)dxdy ≠
∫_0^1 ∫_0^1 f(x, y)dydx. Also prove directly that | f | is not integrable on the unit square.
8. Show that

\[
\int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx = -\int_0^1 \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dx\,dy = \frac{\pi}{4}.
\]

By integrating the positive part of f = (x² − y²)/(x² + y²)² on the unit square, show
directly that f is not integrable on the unit square.
9. By integrating e^{−y} sin 2xy on the strip [0, 1] × (0, ∞), show that
∫_0^∞ (1/y) e^{−y} sin² y dy = log(5)/4.
10. Let {(Xi , 𝔐i ), 1 ≤ i ≤ n} be a finite set of measurable spaces. Show that the
complement of a measurable rectangle in X1 × ... × Xn is an elementary set.
This fact is needed in the proof that the collection of elementary sets is an
algebra.

8.9 A Glimpse of Fourier Analysis

This section has several themes. We extend the discussion of Fourier series of
2𝜋-periodic functions we started in section 4.10. We also study Fourier series of
functions in 𝔏p (−𝜋, 𝜋). Then we take a brief tour through the Fourier transform.
Finally we take a last look at the orthogonal polynomials we encountered in
section 4.10.

Fourier Series of 2𝜋-Periodic Functions

In section 4.10, we looked at the sequence of partial sums Sn f of a 2𝜋-periodic
function f. The first tool we develop is an integral formula for Sn f. Using the
notation of section 4.10,

\[
S_n f(x) = \sum_{j=-n}^{n} \hat{f}(j) e^{ijx} = \sum_{j=-n}^{n} \frac{1}{2\pi} e^{ijx} \int_{-\pi}^{\pi} e^{-ijt} f(t)\,dt
= \frac{1}{2\pi} \int_{-\pi}^{\pi} \Bigl( \sum_{j=-n}^{n} e^{ij(x-t)} \Bigr) f(t)\,dt.
\]

We define the Dirichlet kernel to be the sequence of functions

\[
D_n(x) = \sum_{j=-n}^{n} e^{ijx}.
\]

Then the above calculation yields

\[
S_n f(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_n(x - t)\,dt = \frac{1}{2\pi} (f \ast D_n)(x).
\]

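Observe that Dn (x) = 1 + ∑_{j=1}^n (e^{ijx} + e^{−ijx} ) = 1 + 2 ∑_{j=1}^n cos(jx). Multiplying
both sides of the last identity by sin(x/2), we obtain

\[
\sin\Bigl(\frac{x}{2}\Bigr) D_n(x) = \sin\Bigl(\frac{x}{2}\Bigr) + 2 \sum_{j=1}^{n} \sin\Bigl(\frac{x}{2}\Bigr) \cos(jx)
= \sin\Bigl(\frac{x}{2}\Bigr) + \sum_{j=1}^{n} \Bigl[ \sin\Bigl(j + \frac{1}{2}\Bigr)x - \sin\Bigl(j - \frac{1}{2}\Bigr)x \Bigr] = \sin\Bigl(n + \frac{1}{2}\Bigr)x,
\]

from which we obtain the formula for Dn (x) in closed form:

\[
D_n(x) = \frac{\sin\bigl(n + \frac{1}{2}\bigr)x}{\sin\frac{x}{2}}.
\]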

The Dirichlet kernel is clearly an even, 2𝜋-periodic function, and Dn (0) = 2n + 1.
Since sin(x/2) > 0 on the interval (0, 𝜋), Dn (x) has simple roots at the roots of the
function sin(n + 1/2)x, namely, xj = 2𝜋j/(2n + 1), j = ±1, … , ±n.
The graph of D10 appears in figure 8.7.

Example 1. We derive the following estimate of ‖Dn ‖1 :

\[
\|D_n\|_1 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |D_n(x)|\,dx > \frac{4}{\pi^2} \sum_{k=1}^{n} \frac{1}{k}.
\]

[Figure 8.7 D10]

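Since 0 < sin(x/2) < x/2 for x ∈ (0, 𝜋),

\[
\begin{aligned}
\|D_n\|_1 &= \frac{1}{\pi} \int_0^{\pi} \frac{|\sin(n + 1/2)x|}{\sin\frac{x}{2}}\,dx \ge \frac{2}{\pi} \int_0^{\pi} \frac{|\sin(n + 1/2)x|}{x}\,dx \\
&= \frac{2}{\pi} \int_0^{(n+1/2)\pi} \frac{|\sin x|}{x}\,dx > \frac{2}{\pi} \int_0^{n\pi} \frac{|\sin x|}{x}\,dx = \frac{2}{\pi} \sum_{k=1}^{n} \int_{(k-1)\pi}^{k\pi} \frac{|\sin x|}{x}\,dx \\
&> \frac{2}{\pi} \sum_{k=1}^{n} \frac{1}{k\pi} \int_{(k-1)\pi}^{k\pi} |\sin x|\,dx = \frac{4}{\pi^2} \sum_{k=1}^{n} \frac{1}{k}.
\end{aligned}
\]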
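The closed form of Dn and the lower bound of example 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the norm ‖Dn ‖1 is approximated by a midpoint rule on a grid whose size (20,000 points) is an arbitrary choice. The midpoints never hit x = 0, so the closed form is always defined on the grid.

```python
import math

def dirichlet_sum(n, x):
    """D_n(x) = 1 + 2 * sum_{j=1}^{n} cos(j x)."""
    return 1.0 + 2.0 * sum(math.cos(j * x) for j in range(1, n + 1))

def dirichlet_closed(n, x):
    """sin((n + 1/2) x) / sin(x / 2), valid when sin(x / 2) != 0."""
    return math.sin((n + 0.5) * x) / math.sin(0.5 * x)

def l1_norm(n, samples=20_000):
    """Midpoint-rule approximation of (1 / 2pi) * integral of |D_n| over [-pi, pi]."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for k in range(samples):
        x = -math.pi + (k + 0.5) * h      # midpoints avoid x = 0 when samples is even
        total += abs(dirichlet_closed(n, x))
    return total * h / (2.0 * math.pi)

def harmonic_bound(n):
    """The lower bound (4 / pi^2) * sum_{k=1}^{n} 1/k from example 1."""
    return 4.0 / math.pi ** 2 * sum(1.0 / k for k in range(1, n + 1))

for n in (1, 5, 10, 50):
    print(n, l1_norm(n), harmonic_bound(n))
```

The printed norms stay above the harmonic bound and grow slowly with n, in line with the logarithmic growth that the estimate predicts.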

We are now ready to prove that the Fourier series of a continuous, 2𝜋-periodic
function f need not converge pointwise to f.

Theorem 8.9.1. There exists a function f ∈ 𝒞(𝒮1 ) such that Sn f(0) does not converge
to f(0).

Proof. We prove that, for some continuous, 2𝜋-periodic function f, the sequence
Sn f(0) is unbounded. For each n ∈ ℕ, define a functional 𝜆n on 𝒞(𝒮1 ) as follows:
𝜆n (f ) = Sn f(0). Then

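\[
|\lambda_n(f)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t)|\,|D_n(t)|\,dt \le \frac{\|f\|_\infty}{2\pi} \int_{-\pi}^{\pi} |D_n(t)|\,dt = \|D_n\|_1 \|f\|_\infty.
\]

It follows that ‖𝜆n ‖ ≤ ‖Dn ‖1 . We show that ‖𝜆n ‖ = ‖Dn ‖1 . Let 𝜖 > 0. Consider
the function

\[
f(x) = \begin{cases} 1 & \text{if } D_n(x) \ge 0, \\ -1 & \text{if } D_n(x) < 0. \end{cases}
\]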

Observe that f (x)Dn (x) = |Dn (x)| for all x ∈ [−𝜋, 𝜋]. By example 3 in section 8.7,
there exists a function g ∈ 𝒞(𝒮1 ) such that ‖ f − g‖1 < 𝜖. Now

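\[
\begin{aligned}
\frac{1}{2\pi} \Bigl| \int_{-\pi}^{\pi} D_n(x) g(x)\,dx - \int_{-\pi}^{\pi} |D_n(x)|\,dx \Bigr| &= \frac{1}{2\pi} \Bigl| \int_{-\pi}^{\pi} \bigl( D_n(x) g(x) - D_n(x) f(x) \bigr)\,dx \Bigr| \\
&\le \frac{1}{2\pi} \int_{-\pi}^{\pi} |D_n(x)|\,|f(x) - g(x)|\,dx \le \|D_n\|_\infty \|f - g\|_1 < \epsilon \|D_n\|_\infty.
\end{aligned}
\]

It follows that |𝜆n (g)| = |(1/2𝜋) ∫_{−𝜋}^{𝜋} Dn (x)g(x)dx| > (1/2𝜋) ∫_{−𝜋}^{𝜋} |Dn (x)|dx − 𝜖‖Dn ‖∞ .
Since 𝜖 is arbitrary, ‖𝜆n ‖ = ‖Dn ‖1 .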

In particular, the sequence of functionals (𝜆n ) is not uniformly bounded, since
‖𝜆n ‖ = ‖Dn ‖1 → ∞ as n → ∞. By the Banach-Steinhaus theorem, 𝜆n cannot be
pointwise bounded. Thus there exists a function f ∈ 𝒞(𝒮1 ) such that sup{|Sn f(0)| ∶
n ∈ ℕ} = sup{|𝜆n (f )| ∶ n ∈ ℕ} = ∞. ∎

We now prove another classical theorem about the convergence of the means
of the sequence of partial sums of the Fourier series of continuous, 2𝜋-periodic
functions.

For a function f ∈ 𝒞(𝒮1 ), consider the trigonometric polynomial

\[
(\sigma_n f)(x) = \frac{S_0 f(x) + \cdots + S_n f(x)}{n + 1}.
\]

Since Sj f (x) = (1/2𝜋) f ∗ Dj (x), 𝜎n f (x) = (1/2𝜋) f ∗ Kn (x), where

\[
K_n(x) = \frac{1}{n + 1} (D_0 + \cdots + D_n).
\]

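The function Kn is known as the Fejér kernel. We derive a formula for Kn below.
From the formula for the Dirichlet kernel, we have

\[
(n + 1) \sin\Bigl(\frac{x}{2}\Bigr) K_n(x) = \sum_{j=0}^{n} \sin\Bigl(j + \frac{1}{2}\Bigr)x.
\]

Thus

\[
\begin{aligned}
(n + 1) \sin^2\Bigl(\frac{x}{2}\Bigr) K_n(x) &= \sum_{j=0}^{n} \sin\Bigl(\frac{x}{2}\Bigr) \sin\Bigl(j + \frac{1}{2}\Bigr)x \\
&= \frac{1}{2} \sum_{j=0}^{n} \bigl( \cos jx - \cos (j + 1)x \bigr) = \frac{1}{2} \bigl( 1 - \cos(n + 1)x \bigr) = \sin^2\Bigl(\frac{(n + 1)x}{2}\Bigr).
\end{aligned}
\]

Hence

\[
K_n(x) = \frac{\sin^2\bigl(\frac{(n+1)x}{2}\bigr)}{(n + 1) \sin^2\bigl(\frac{x}{2}\bigr)}.
\]

Clearly, Kn is an even, positive, 2𝜋-periodic function, and since ∫_{−𝜋}^{𝜋} Dj (x)dx = 2𝜋
for all j ∈ ℕ, ∫_{−𝜋}^{𝜋} Kn (x)dx = 2𝜋 for all n ∈ ℕ.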

The following property of Kn is crucial for the next theorem: For 𝛿 ∈
(0, 𝜋), limn→∞ max{Kn (x) ∶ 𝛿 ≤ |x| ≤ 𝜋} = 0. This is because if 𝛿 ≤ |x| ≤ 𝜋, then
sin² (x/2) ≥ sin² (𝛿/2), and hence

\[
0 \le K_n(x) \le \frac{1}{(n + 1) \sin^2(\delta/2)} \to 0 \quad \text{as } n \to \infty.
\]
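The three properties of Kn used in the next theorem (agreement of the closed form with the average of Dirichlet kernels, total integral 2𝜋, and uniform decay away from the origin) can be illustrated numerically. This is a minimal sketch with hypothetical helper names; the grid sizes and the choice 𝛿 = 0.5 are arbitrary assumptions made for the illustration.

```python
import math

def fejer_closed(n, x):
    """K_n(x) = sin^2((n+1)x/2) / ((n+1) sin^2(x/2)), for sin(x/2) != 0."""
    return math.sin(0.5 * (n + 1) * x) ** 2 / ((n + 1) * math.sin(0.5 * x) ** 2)

def fejer_average(n, x):
    """K_n(x) as the mean of D_0, ..., D_n, with D_j(x) = 1 + 2 sum_{k<=j} cos(kx)."""
    total, cos_sum = 0.0, 0.0
    for j in range(n + 1):
        if j > 0:
            cos_sum += math.cos(j * x)
        total += 1.0 + 2.0 * cos_sum
    return total / (n + 1)

def max_away_from_zero(n, delta, samples=2000):
    """Largest sampled value of K_n on [delta, pi]; K_n is even, so this covers |x| >= delta."""
    step = (math.pi - delta) / samples
    return max(fejer_closed(n, delta + i * step) for i in range(samples + 1))

def integral(n, samples=4096):
    """Midpoint-rule approximation of the integral of K_n over [-pi, pi]."""
    h = 2.0 * math.pi / samples
    return h * sum(fejer_closed(n, -math.pi + (i + 0.5) * h) for i in range(samples))

delta = 0.5
for n in (10, 100, 1000):
    bound = 1.0 / ((n + 1) * math.sin(0.5 * delta) ** 2)
    print(n, max_away_from_zero(n, delta), bound, integral(n))
```

For each n the sampled maximum on 𝛿 ≤ |x| ≤ 𝜋 stays below the bound 1/((n + 1) sin²(𝛿/2)), while the integral remains close to 2𝜋; the mass of Kn concentrates near the origin as n grows.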

Theorem 8.9.2 (Fejér’s theorem). For f ∈ 𝒞(𝒮1 ), limn ‖𝜎n f − f‖∞ = 0.



Proof. Since f ∗ Kn = Kn ∗ f, it is more convenient here to write 𝜎n f (x) =
(1/2𝜋) ∫_{−𝜋}^{𝜋} f(x − t)Kn (t)dt. Let 𝜖 > 0. By the uniform continuity of f, there exists a number
𝛿 > 0 such that if |t| < 𝛿, then | f(x − t) − f (x)| < 𝜖 for all x ∈ [−𝜋, 𝜋]. Choose a
natural number N such that, for n > N, max{Kn (t) ∶ 𝛿 ≤ |t| ≤ 𝜋} < 𝜖. Recall that
∫_{−𝜋}^{𝜋} Kn (t)dt = 2𝜋. Now, for n > N,

\[
\begin{aligned}
|\sigma_n f(x) - f(x)| &= \Bigl| \frac{1}{2\pi} \int_{-\pi}^{\pi} \bigl( f(x - t) - f(x) \bigr) K_n(t)\,dt \Bigr| \\
&\le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x - t) - f(x)|\,K_n(t)\,dt \\
&= \frac{1}{2\pi} \int_{|t|<\delta} |f(x - t) - f(x)|\,K_n(t)\,dt + \frac{1}{2\pi} \int_{\delta \le |t| \le \pi} |f(x - t) - f(x)|\,K_n(t)\,dt \\
&\le \frac{\epsilon}{2\pi} \int_{|t|<\delta} K_n(t)\,dt + \frac{2\|f\|_\infty}{2\pi} \int_{\delta \le |t| \le \pi} \epsilon\,dt \le \epsilon + 2\epsilon \|f\|_\infty.
\end{aligned}
\]

Since 𝜖 is arbitrary, the proof is complete. ∎
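As an illustration of the theorem, the means 𝜎n f can be computed for a specific continuous 2𝜋-periodic function. The sketch below assumes the sample function f(x) = |x| on [−𝜋, 𝜋] (our choice, not the text’s), whose Fourier coefficients are f̂(0) = 𝜋/2 and f̂(j) = ((−1)^j − 1)/(𝜋 j²) for j ≠ 0; the helper names and the 201-point grid are likewise assumptions.

```python
import math

def coeff(j):
    """Fourier coefficient of f(x) = |x| on [-pi, pi], normalized by 1 / (2 pi)."""
    if j == 0:
        return math.pi / 2.0
    return ((-1) ** j - 1) / (math.pi * j * j)

def partial_sum(k, x):
    # The coefficients are real and even in j, so
    # S_k f(x) = f^(0) + 2 * sum_{j=1}^{k} f^(j) cos(j x).
    return coeff(0) + 2.0 * sum(coeff(j) * math.cos(j * x) for j in range(1, k + 1))

def fejer_mean(n, x):
    """sigma_n f(x) = (S_0 f(x) + ... + S_n f(x)) / (n + 1)."""
    return sum(partial_sum(k, x) for k in range(n + 1)) / (n + 1)

def sup_error(n, samples=201):
    """Largest sampled value of |sigma_n f(x) - f(x)| on a uniform grid of [-pi, pi]."""
    grid = [-math.pi + 2.0 * math.pi * i / (samples - 1) for i in range(samples)]
    return max(abs(fejer_mean(n, x) - abs(x)) for x in grid)

errors = [sup_error(n) for n in (2, 10, 40)]
print(errors)
```

The sampled uniform error decreases as n increases, which is the behavior the theorem guarantees; the decay is slow here because |x| has corners at 0 and ±𝜋.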

Observe that Fejér’s theorem furnishes another proof that trigonometric polynomials
are uniformly dense in 𝒞(𝒮1 ).

Fourier Series of 𝔏p -Functions

Consider the Banach space 𝔏p (−𝜋, 𝜋) with the norm

\[
\|f\|_p = \Bigl( \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^p\,dx \Bigr)^{1/p}, \qquad 1 \le p < \infty.
\]

We are primarily interested in the cases p = 1 and p = 2, but a good number of
the results in this subsection are valid for any p ∈ [1, ∞). One can see directly that
the Fourier coefficients f̂(n) = (1/2𝜋) ∫_{−𝜋}^{𝜋} f(t)e^{−int} dt of a function f in 𝔏p (−𝜋, 𝜋) are
defined. This is because, for p ≥ 1, 𝔏p (−𝜋, 𝜋) ⊆ 𝔏1 (−𝜋, 𝜋); hence

\[
|\hat{f}(n)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t)|\,dt = \|f\|_1 < \infty.
\]

It is convenient to refer to the set of Fourier coefficients (f̂(n))n∈ℤ of a function
f ∈ 𝔏p (−𝜋, 𝜋) by the notation 𝔉(f ). We think of 𝔉 as a linear transformation
from 𝔏p (−𝜋, 𝜋) to some suitable range space. For example, when p = 2, the
range space of 𝔉 is l2 (ℤ). We will show in example 2 below that, for all p ≥ 1,
and all f ∈ 𝔏p (−𝜋, 𝜋), 𝔉(f ) ∈ c0 (ℤ). The norm on c0 (ℤ) is the ∞-norm. Thus
‖𝔉(f )‖∞ = sup{|f̂(n)| ∶ n ∈ ℤ}.

The following theorem is a special case of example 3 in section 8.7.

Theorem 8.9.3. Trigonometric polynomials are dense in 𝔏2 (−𝜋, 𝜋). 

We are now ready to extend the discussion of Fourier series we started in section
4.10 from 𝒞(𝒮1 ) to 𝔏2 (−𝜋, 𝜋).

Theorem 8.9.4. The set $\{u_n(t) = e^{int} : n \in \mathbb{Z}\}$ is an orthonormal basis for $\mathfrak{L}^2(-\pi, \pi)$.

Proof. By theorem 8.9.3, Span({un ∶ n ∈ ℤ}) is dense in 𝔏2 (−𝜋, 𝜋). The assertion
of this theorem follows directly from theorem 7.2.7. 

All the results we obtained in section 4.10 for Fourier series of continuous
functions extend to 𝔏2 (−𝜋, 𝜋). The following theorem lists some of the properties.
They follow directly from general Hilbert space theory.

Theorem 8.9.5. The following are true for a function f ∈ 𝔏2 (−𝜋, 𝜋):

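(a) The sequence $(\hat f(n))_{n\in\mathbb{Z}}$ belongs to $l^2(\mathbb{Z})$.
(b) If $a = (a_n) \in l^2(\mathbb{Z})$, then the series $\sum_{n=-\infty}^{\infty} a_n u_n$ converges in $\mathfrak{L}^2(-\pi, \pi)$.
(c) $f(x) = \sum_{n=-\infty}^{\infty} \hat f(n)e^{inx}$, where convergence takes place in $\mathfrak{L}^2(-\pi, \pi)$.
(d) $\|f\|_2^2 = \sum_{n=-\infty}^{\infty} |\hat f(n)|^2$.
(e) If $\hat f(n) = 0$ for all $n \in \mathbb{Z}$, then f = 0 a.e.
(f) The mapping $f \mapsto \mathfrak{F}(f)$ is a linear isometry from $\mathfrak{L}^2(-\pi, \pi)$ onto $l^2(\mathbb{Z})$. ∎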
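Part (d) can be illustrated with the function $f(x) = x$ on $(-\pi, \pi)$, for which $\hat f(0) = 0$, $\hat f(n) = i(-1)^n/n$ for $n \ne 0$, and $\|f\|_2^2 = \pi^2/3$. The sketch below is only a numerical sanity check under these formulas; the choice of f and the helper names are ours.

```python
import cmath
import math

def fhat_exact(n):
    """Closed form of the Fourier coefficients of f(x) = x on (-pi, pi)."""
    return 0j if n == 0 else 1j * (-1) ** abs(n) / n

def fhat_numeric(n, m=20000):
    """Midpoint-rule approximation of (1/(2 pi)) * integral of x e^{-inx} over [-pi, pi]."""
    h = 2.0 * math.pi / m
    total = 0j
    for k in range(m):
        x = -math.pi + (k + 0.5) * h
        total += x * cmath.exp(-1j * n * x)
    return total * h / (2.0 * math.pi)

def parseval_partial_sum(N):
    """Sum of |fhat(n)|^2 over |n| <= N."""
    return sum(abs(fhat_exact(n)) ** 2 for n in range(-N, N + 1))

norm_squared = math.pi ** 2 / 3.0   # (1/(2 pi)) * integral of x^2 over [-pi, pi]
for N in (10, 100, 1000):
    print(N, parseval_partial_sum(N), norm_squared)
```

The partial sums increase toward $\pi^2/3$, and the gap after N terms is roughly 2/N.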

The simplicity, elegance, and completeness of theorem 8.9.5 do not extend to functions in $\mathfrak{L}^1(-\pi, \pi)$. For example, the sequence of partial sums $S_n f(x) = \sum_{j=-n}^{n} \hat f(j)u_j(x)$ need not converge to f in the 1-norm (see the section exercises), and $\mathfrak{F}$ does not map $\mathfrak{L}^1(-\pi, \pi)$ onto its range space, which we now describe.

Example 2 (the Riemann-Lebesgue lemma). For f ∈ 𝔏1 (−𝜋, 𝜋), 𝔉(f ) is in c0 (ℤ).

Observe that the assertion holds for trigonometric polynomials. Indeed, if $p(t) = \sum_{j=-N}^{N} a_j e^{ijt}$, then $\hat p(n) = 0$ whenever $|n| > N$.
To prove the general case, let $f \in \mathfrak{L}^1(-\pi, \pi)$, and let $\epsilon > 0$. Choose a trigonometric polynomial p such that $\|f - p\|_1 < \epsilon$, and an integer N such that $\hat p(n) = 0$ for $|n| > N$. Now if $|n| > N$,

\[
|\hat f(n)| = |\hat f(n) - \hat p(n)| = \frac{1}{2\pi}\Big|\int_{-\pi}^{\pi} (f(t) - p(t))e^{-int}\,dt\Big| \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(t) - p(t)|\,dt = \|f - p\|_1 < \epsilon. ∎
\]
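A concrete instance may help. For the indicator function of [0, 1], viewed as an element of $\mathfrak{L}^1(-\pi, \pi)$, a direct computation gives $\hat f(n) = (1 - e^{-in})/(2\pi i n)$ for $n \ne 0$, so $|\hat f(n)| \le 1/(\pi|n|)$. The sketch below, with a function of our own choosing, tabulates the decay.

```python
import cmath
import math

def fhat(n):
    """Fourier coefficients of the indicator function of [0, 1] on (-pi, pi)."""
    if n == 0:
        return complex(1.0 / (2.0 * math.pi))
    return (1.0 - cmath.exp(-1j * n)) / (2j * math.pi * n)

norm_1 = 1.0 / (2.0 * math.pi)   # (1/(2 pi)) * integral of |f| over (-pi, pi)
for n in (1, 10, 100, 1000):
    # |fhat(n)| never exceeds ||f||_1 and is at most 1/(pi n)
    print(n, abs(fhat(n)), 1.0 / (math.pi * n))
```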

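Example 3. If we view $\sigma_n$ as a linear operator on $\mathfrak{L}^1(-\pi, \pi)$, $\sigma_n$ is bounded, and $\|\sigma_n\| \le 1$. For $f \in \mathfrak{L}^1(-\pi, \pi)$,

\[
\begin{aligned}
\|\sigma_n f\|_1 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \Big|\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)K_n(x-t)\,dt\Big|\,dx \le \frac{1}{4\pi^2}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi} |f(t)|K_n(x-t)\,dt\,dx \\
&= \frac{1}{4\pi^2}\int_{-\pi}^{\pi} |f(t)|\int_{-\pi}^{\pi} K_n(x-t)\,dx\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(t)|\,dt = \|f\|_1. ∎
\end{aligned}
\]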

Example 4. For f ∈ 𝔏1 (−𝜋, 𝜋), 𝜎n f converges to f in 𝔏1 (−𝜋, 𝜋).


Let $\epsilon > 0$, and choose $g \in \mathcal{C}(\mathcal{S}^1)$ such that $\|f - g\|_1 < \epsilon$. By Fejér's theorem, $\sigma_n g$ converges to g uniformly; hence $\sigma_n g$ converges to g in $\mathfrak{L}^1(-\pi, \pi)$. Choose an integer N such that, for $n > N$, $\|\sigma_n g - g\|_1 < \epsilon$. Using example 3, if $n > N$, then $\|\sigma_n f - f\|_1 \le \|\sigma_n f - \sigma_n g\|_1 + \|\sigma_n g - g\|_1 + \|g - f\|_1 \le 2\|f - g\|_1 + \|\sigma_n g - g\|_1 < 3\epsilon$. ∎

Theorem 8.9.6 (the uniqueness theorem). If $f \in \mathfrak{L}^1(-\pi, \pi)$ and $\hat f(n) = 0$ for all $n \in \mathbb{Z}$, then f = 0 a.e. Consequently, the mapping $\mathfrak{F} : \mathfrak{L}^1(-\pi, \pi) \to c_0(\mathbb{Z})$ is injective.

Proof. By assumption, $\sigma_n f = 0$ for all $n \in \mathbb{N}$. Since $\sigma_n f$ converges to f in $\mathfrak{L}^1(-\pi, \pi)$ by the previous example, $\|f\|_1 = 0$, and f = 0 a.e. ∎

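Theorem 8.9.7. The linear mapping $\mathfrak{F} : \mathfrak{L}^1(-\pi, \pi) \to c_0(\mathbb{Z})$ is not onto.

Proof. First observe that $\mathfrak{F}$ is bounded by virtue of the inequality $|\hat f(n)| \le \|f\|_1$. If $\mathfrak{F}$ were surjective, then, since $\mathfrak{F}$ is injective by theorem 8.9.6, the open mapping theorem would imply that $\mathfrak{F}$ is invertible with a bounded inverse; hence, for every $f \in \mathfrak{L}^1(-\pi, \pi)$, $\|f\|_1 \le M\|\mathfrak{F}(f)\|_\infty$, where $M = \|\mathfrak{F}^{-1}\|$. Now, for the sequence $(D_n)$ of Dirichlet kernels, $\|\mathfrak{F}(D_n)\|_\infty = 1$, while $\|D_n\|_1 \to \infty$ as $n \to \infty$. This contradiction delivers the result. ∎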
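The growth of $\|D_n\|_1$ that drives this proof can be observed numerically. The sketch below assumes the closed form $D_n(x) = \sum_{j=-n}^{n} e^{ijx} = \sin((n + 1/2)x)/\sin(x/2)$ and approximates the normalized 1-norm by the midpoint rule; the helper names and the grid size are illustrative.

```python
import math

def dirichlet_kernel(n, x):
    """Assumed closed form D_n(x) = sin((n + 1/2) x) / sin(x / 2); D_n(0) = 2n + 1."""
    s = math.sin(x / 2.0)
    if abs(s) < 1e-12:
        return 2.0 * n + 1.0
    return math.sin((n + 0.5) * x) / s

def l1_norm(n, m=40000):
    """(1/(2 pi)) * integral of |D_n| over [-pi, pi], by the midpoint rule."""
    h = 2.0 * math.pi / m
    total = sum(abs(dirichlet_kernel(n, -math.pi + (k + 0.5) * h)) for k in range(m))
    return total * h / (2.0 * math.pi)

for n in (1, 8, 64, 512):
    print(n, l1_norm(n))
```

The values grow slowly (like a multiple of log n), whereas every Fourier coefficient of $D_n$ is 0 or 1, so no inequality of the form $\|f\|_1 \le M\|\mathfrak{F}(f)\|_\infty$ can hold.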

The Fourier Transform

The Fourier transform of a function $f \in \mathfrak{L}^1(\mathbb{R})$ is, by definition, the function

\[ \hat f(x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(t)e^{-ixt}\,dt. \]

The normalization constant $\frac{1}{\sqrt{2\pi}}$ is included only for the symmetry of the formulas and is not essential.
One can think of the Fourier transform as the continuous equivalent of Fourier series. Instead of using the discrete set of frequencies $\{e^{int}\}_{n\in\mathbb{Z}}$, the Fourier transform uses a continuum of frequencies $\{e^{ixt}\}_{x\in\mathbb{R}}$.

It is clear that $\hat f \in \mathfrak{L}^\infty(\mathbb{R})$ and that $\|\hat f\|_\infty \le \|f\|_1$. The following theorem narrows down the range space of the Fourier transform.

Theorem 8.9.8. If $f \in \mathfrak{L}^1(\mathbb{R})$, then $\hat f \in \mathcal{C}_0(\mathbb{R})$.

Proof. First we prove that $\hat f$ is continuous. Suppose $(x_n)$ is a convergent sequence and that $\lim_n x_n = x_0$. Since $|f(t)e^{-ix_n t}| = |f(t)|$ and $f \in \mathfrak{L}^1(\mathbb{R})$, the dominated convergence theorem implies that

\[ \lim_n \hat f(x_n) = \lim_n \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(t)e^{-ix_n t}\,dt = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(t)e^{-ix_0 t}\,dt = \hat f(x_0). \]

To prove that $\hat f$ vanishes at ∞, write

\[ \hat f(x) = -\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(t)e^{-ix(t+\pi/x)}\,dt = -\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} (\tau_a f)(\xi)e^{-ix\xi}\,d\xi \quad (a = \pi/x). \]

Thus

\[ 2\hat f(x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(\xi)e^{-ix\xi}\,d\xi - \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} (\tau_a f)(\xi)e^{-ix\xi}\,d\xi = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} (f - \tau_a f)(\xi)e^{-ix\xi}\,d\xi. \]

It follows that

\[ 2|\hat f(x)| \le \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} |(f(\xi) - (\tau_a f)(\xi))e^{-i\xi x}|\,d\xi = \frac{1}{\sqrt{2\pi}}\|f - \tau_a f\|_1. \]

As $|x| \to \infty$, $a \to 0$, and $\lim_{|x|\to\infty} \|f - \tau_a f\|_1 = 0$, by theorem 8.7.9. ∎

In this theorem, the fact that $\lim_{|x|\to\infty} \hat f(x) = 0$ is known as the Riemann-Lebesgue lemma.

The next goal is to prove the inversion formula. Guided by the inversion formula for a function $f \in \mathfrak{L}^1(-\pi, \pi)$ when $\mathfrak{F}(f) \in l^1(\mathbb{Z})$ (see problem 1 at the end of this section), one can reasonably conjecture that if f and $\hat f$ are both in $\mathfrak{L}^1(\mathbb{R})$, then

\[ f(x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} \hat f(t)e^{ixt}\,dt \quad \text{for almost every } x \in \mathbb{R}. \]

The proof bears some resemblance to that of Fejér's theorem in that we will find a family of functions $\{G_\sigma\}$ such that $f * G_\sigma$ converges to f in $\mathfrak{L}^1(\mathbb{R})$ as $\sigma \downarrow 0$. Before we construct the family $G_\sigma$, it may be useful to find an even function that is equal to its own Fourier transform. One such function exists. The proof of the following proposition is left as an exercise.

Proposition 8.9.9. For the function $G_1(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, $\hat G_1 = G_1$. ∎
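Although the proof is deferred to the exercises, the identity $\hat G_1 = G_1$ is easy to test numerically. The sketch below approximates the Fourier transform, with the $1/\sqrt{2\pi}$ normalization used in this section, by the midpoint rule on a truncated interval; the truncation length and the grid size are arbitrary choices.

```python
import math

def G1(x):
    """The Gaussian G_1(x) = exp(-x^2 / 2) / sqrt(2 pi)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def fourier_transform_G1(x, L=12.0, m=6000):
    """(1/sqrt(2 pi)) * integral of G1(t) e^{-ixt} dt over [-L, L], midpoint rule.
    G1 is even, so the imaginary part vanishes and only cos(x t) is integrated."""
    h = 2.0 * L / m
    total = 0.0
    for k in range(m):
        t = -L + (k + 0.5) * h
        total += G1(t) * math.cos(x * t)
    return total * h / math.sqrt(2.0 * math.pi)

for x in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(x, fourier_transform_G1(x), G1(x))
```

The two columns should agree to many digits, since the integrand is smooth and decays rapidly.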

Example 5. The inversion formula holds for the function G1 .


Because $G_1$ is even,

\[ G_1(x) = G_1(-x) = \hat G_1(-x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} G_1(t)e^{ixt}\,dt = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} \hat G_1(t)e^{ixt}\,dt. ∎ \]

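Definition (the Gauss kernel). For $\sigma > 0$, define

\[ G_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big\{\frac{-x^2}{2\sigma^2}\Big\}. \]

Observe that $G_\sigma(x) = \frac{1}{\sigma}G_1(x/\sigma)$.

The family $\{G_\sigma : \sigma > 0\}$ is an approximate identity in the sense that

(a) $G_\sigma(x) \ge 0$ for all $x \in \mathbb{R}$ and all $\sigma > 0$,
(b) $\int_{\mathbb{R}} G_\sigma(x)\,dx = 1$ for all $\sigma > 0$, and
(c) for every $\delta > 0$, $\lim_{\sigma\downarrow 0}\int_{|x|>\delta} G_\sigma(x)\,dx = 0$.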

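Example 6. We prove property (c). Since $G_\sigma$ is even and $\delta \le x$ on the interval $[\delta, \infty)$, we have

\[
\begin{aligned}
\int_{|x|>\delta} G_\sigma(x)\,dx &= \frac{2}{\sigma\delta\sqrt{2\pi}}\int_{\delta}^{\infty} \delta\exp\Big\{\frac{-x^2}{2\sigma^2}\Big\}\,dx \le \frac{2}{\sigma\delta\sqrt{2\pi}}\int_{\delta}^{\infty} x\exp\Big\{\frac{-x^2}{2\sigma^2}\Big\}\,dx \\
&= \frac{2\sigma}{\delta\sqrt{2\pi}}\int_{\delta/\sigma}^{\infty} y\exp\Big\{\frac{-y^2}{2}\Big\}\,dy = \frac{2\sigma}{\delta\sqrt{2\pi}}\exp\Big\{\frac{-\delta^2}{2\sigma^2}\Big\} \to 0 \quad \text{as } \sigma \downarrow 0. ∎
\end{aligned}
\]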
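The estimate in example 6 can be compared with the exact tail mass, which for a Gaussian equals $\operatorname{erfc}(\delta/(\sigma\sqrt{2}))$. The sketch below uses `math.erfc` from the Python standard library; the function names are ours.

```python
import math

def gauss_tail(sigma, delta):
    """Integral of G_sigma over |x| > delta, i.e. erfc(delta / (sigma * sqrt(2)))."""
    return math.erfc(delta / (sigma * math.sqrt(2.0)))

def example6_bound(sigma, delta):
    """The upper bound (2 sigma / (delta sqrt(2 pi))) * exp(-delta^2 / (2 sigma^2))."""
    return 2.0 * sigma / (delta * math.sqrt(2.0 * math.pi)) * math.exp(-delta ** 2 / (2.0 * sigma ** 2))

delta = 1.0
for sigma in (1.0, 0.5, 0.25, 0.1):
    print(sigma, gauss_tail(sigma, delta), example6_bound(sigma, delta))
```

Both columns tend to 0 as σ decreases, and the exact tail never exceeds the bound.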

Other examples of approximate identities include the bump kernel we studied in section 8.7. Indeed, for a fixed $\eta > 0$, $\lim_{h\downarrow 0}\int_{|x|\ge\eta} \delta_h(x)\,dx = 0$, because, for every $h < \eta$, $\int_{|x|\ge\eta} \delta_h(x)\,dx = 0$.

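Example 7. Let $g : \mathbb{R} \to \mathbb{R}$ be a bounded continuous function. Then $\lim_{\sigma\downarrow 0}\int_{\mathbb{R}} g(y)G_\sigma(y)\,dy = g(0)$.

Let $\epsilon > 0$, and choose $\delta > 0$ such that $|g(y) - g(0)| < \epsilon$ for all y such that $|y| < \delta$. Also choose $\sigma_0 > 0$ such that $\int_{|y|\ge\delta} G_\sigma(y)\,dy < \epsilon$ whenever $0 < \sigma < \sigma_0$. Then, for such σ,

\[
\begin{aligned}
\Big|\int_{\mathbb{R}} g(y)G_\sigma(y)\,dy - g(0)\Big| &= \Big|\int_{\mathbb{R}} g(y)G_\sigma(y)\,dy - \int_{\mathbb{R}} g(0)G_\sigma(y)\,dy\Big| \\
&\le \int_{\mathbb{R}} |g(y) - g(0)|G_\sigma(y)\,dy \\
&= \int_{|y|<\delta} |g(y) - g(0)|G_\sigma(y)\,dy + \int_{|y|\ge\delta} |g(y) - g(0)|G_\sigma(y)\,dy \\
&\le \epsilon\int_{|y|<\delta} G_\sigma(y)\,dy + 2\|g\|_\infty\int_{|y|\ge\delta} G_\sigma(y)\,dy < \epsilon(1 + 2\|g\|_\infty). ∎
\end{aligned}
\]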
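For a quick numerical illustration of example 7, take $g(y) = \cos y$, for which the integral can be evaluated in closed form: $\int_{\mathbb{R}} \cos(y)G_\sigma(y)\,dy = e^{-\sigma^2/2}$, which tends to $1 = g(0)$. The sketch below, with a test function of our own choosing, compares a midpoint-rule approximation with this value.

```python
import math

def G(sigma, y):
    """The Gauss kernel G_sigma(y)."""
    return math.exp(-y * y / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def smoothed_value(g, sigma, m=20000):
    """Midpoint rule for the integral of g(y) G_sigma(y) over [-12 sigma, 12 sigma]."""
    L = 12.0 * sigma
    h = 2.0 * L / m
    total = 0.0
    for k in range(m):
        y = -L + (k + 0.5) * h
        total += g(y) * G(sigma, y)
    return total * h

for sigma in (1.0, 0.1, 0.01):
    print(sigma, smoothed_value(math.cos, sigma), math.exp(-sigma ** 2 / 2.0))
```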

Proposition 8.9.10. If $1 \le p < \infty$ and $f \in \mathfrak{L}^p(\mathbb{R})$, then $\lim_{\sigma\downarrow 0} \|f * G_\sigma - f\|_p = 0$.

Proof. Replicating the estimates in the proof of theorem 8.7.12, we obtain

\[ \|f * G_\sigma - f\|_p^p \le \int_{\mathbb{R}} \|\tau_y f - f\|_p^p\, G_\sigma(y)\,dy. \]

The function $g(y) = \|\tau_y f - f\|_p^p$ is continuous by problem 4 in section 8.7 and is bounded because $|g(y)| \le (\|\tau_y f\|_p + \|f\|_p)^p = 2^p\|f\|_p^p$.
Applying the previous example, we have $\lim_{\sigma\downarrow 0}\int_{\mathbb{R}} \|\tau_y f - f\|_p^p\, G_\sigma(y)\,dy = g(0) = 0$. ∎

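Example 8. We will later need the identity $G_\sigma(x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} G_1(\sigma t)e^{ixt}\,dt$:

\[
G_\sigma(x) = \frac{1}{\sigma}G_1(x/\sigma) = \frac{1}{\sigma}\hat G_1(-x/\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\int_{\mathbb{R}} G_1(y)\exp\Big(iy\frac{x}{\sigma}\Big)\,dy = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} G_1(\sigma t)e^{ixt}\,dt. ∎
\]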

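Theorem 8.9.11 (the inversion theorem). If f and $\hat f$ are in $\mathfrak{L}^1(\mathbb{R})$, then for almost every $x \in \mathbb{R}$,

\[ f(x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} \hat f(t)e^{ixt}\,dt. \]

Proof. Using the previous example, we have

\[
\begin{aligned}
f * G_\sigma(x) &= \int_{\mathbb{R}} f(x-t)G_\sigma(t)\,dt = \int_{\mathbb{R}} f(x-t)\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} G_1(\sigma y)e^{iyt}\,dy\,dt \\
&= \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}\int_{\mathbb{R}} f(x-t)e^{iyt}\,dt\,G_1(\sigma y)\,dy \\
&= \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}\int_{\mathbb{R}} f(u)e^{iy(x-u)}\,du\,G_1(\sigma y)\,dy \\
&= \int_{\mathbb{R}} \hat f(y)e^{ixy}G_1(\sigma y)\,dy.
\end{aligned}
\]

The summary of the above calculations is that

\[ f * G_\sigma(x) = \int_{\mathbb{R}} \hat f(y)e^{ixy}G_1(\sigma y)\,dy. \]

Now consider the sequence $\sigma_n = 1/n$. By the identity we just established,

\[ f * G_{\sigma_n}(x) = \int_{\mathbb{R}} \hat f(y)e^{ixy}G_1(\sigma_n y)\,dy. \tag{8} \]

On the one hand, $|\hat f(y)e^{ixy}G_1(\sigma_n y)| \le |\hat f(y)|/\sqrt{2\pi}$, $\hat f \in \mathfrak{L}^1(\mathbb{R})$, and $\lim_n G_1(\sigma_n y) = G_1(0) = \frac{1}{\sqrt{2\pi}}$. Thus, by the dominated convergence theorem, the right side of identity (8) converges to $\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} \hat f(y)e^{ixy}\,dy$ for every $x \in \mathbb{R}$.

On the other hand, by proposition 8.9.10, the left side of identity (8) converges to f in $\mathfrak{L}^1(\mathbb{R})$. By example 5 in section 8.3, the sequence $f * G_{\sigma_n}$ contains a subsequence that converges a.e. to f.
Putting the last two facts together, we arrive at the inversion formula. ∎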

Observe that the function $g(x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} \hat f(t)e^{ixt}\,dt$ is in $\mathcal{C}_0(\mathbb{R})$ by an argument identical to that in the proof of theorem 8.9.8. Thus the assumptions of the above theorem imply that f is equal a.e. to a $\mathcal{C}_0(\mathbb{R})$ function.

Corollary 8.9.12 (the uniqueness theorem). If $f \in \mathfrak{L}^1(\mathbb{R})$ and $\hat f = 0$, then f(x) = 0 for a.e. $x \in \mathbb{R}$. ∎

Orthogonal Polynomials: One More Time

In section 4.10, we studied the space H of continuous, square integrable functions with respect to a weight function ω. The inner product we used was given by the formula

\[ \langle f, g\rangle = \int_a^b f(x)g(x)\omega(x)\,dx. \]

We are now ready to settle a question that could not be answered completely in
chapter 4. What is the smallest Hilbert space that contains the space H? The answer
is now within our reach.

If we define a finite positive measure μ on the σ-algebra $\mathcal{L}$ of Lebesgue measurable sets by $d\mu = \omega\,d\lambda$, where λ is Lebesgue measure on ℝ, then the Hilbert space we seek is (or should be, if there is justice) $\mathfrak{L}^2(\mathbb{R}, \mathcal{L}, \mu) = \mathfrak{L}^2(\mu)$.

In the case of Legendre polynomials, the situation is simple. Since ω(x) = 1, the measure μ is nothing other than the Lebesgue measure on (−1, 1), and $\mathfrak{L}^2(\mu) = \mathfrak{L}^2(-1, 1)$. As we observed in section 4.10, the space H contains the space $\mathcal{C}[-1, 1]$. By theorem 4.10.8, the linear span of the sequence of normalized Legendre polynomials $(\tilde P_n)$ is dense in the space $\mathcal{C}[-1, 1]$. By example 1 in section 8.7, $\mathcal{C}[-1, 1]$ is dense in $\mathfrak{L}^2(-1, 1)$. It follows that the linear span of $(\tilde P_n)$ is dense in $\mathfrak{L}^2(-1, 1)$, and we have proved the following result.

Theorem 8.9.13. The normalized Legendre polynomials $\tilde P_n$ form an orthonormal basis for $\mathfrak{L}^2(-1, 1)$. ∎

The situation is far less obvious in the case of Hermite polynomials. In this case, $d\mu = e^{-x^2}d\lambda$, and it is true that the normalized Hermite polynomials $\tilde H_n = \frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}}H_n$ (see problem 15 in section 4.10) form an orthonormal basis for $\mathfrak{L}^2(\mu)$. Equivalently, we prove the following.

Theorem 8.9.14. If f ∈ 𝔏2 (𝜇) and ∫ℝ f (x)H̃ n (x)d𝜇 = 0 for all n ∈ ℕ, then f (x) = 0
for a.e. x ∈ ℝ.

Proof. Since $\mathrm{Span}(\{\tilde H_0, \dots, \tilde H_n\}) = \mathrm{Span}(\{1, x, \dots, x^n\})$, the assumption is equivalent to $\int_{\mathbb{R}} f(x)x^n e^{-x^2}\,dx = 0$ for all $n \in \mathbb{N}$.

Because μ is a finite measure, $\mathfrak{L}^2(\mu) \subseteq \mathfrak{L}^1(\mu)$. In particular, $f \in \mathfrak{L}^1(\mu)$, and the function $g(t) = f(t)e^{-t^2} \in \mathfrak{L}^1(\lambda)$. The proof will be complete if we show that $\hat g = 0$, because then g = 0 a.e. by corollary 8.9.12.

We leave it to the reader to verify that, for a fixed $x \in \mathbb{R}$, the function $h(t) = e^{|xt|} \in \mathfrak{L}^2(\mu)$. It follows that the product $f(t)h(t) \in \mathfrak{L}^1(\mu)$; hence $f(t)e^{|xt|}e^{-t^2} \in \mathfrak{L}^1(\lambda)$. Now

\[
\sqrt{2\pi}\,\hat g(x) = \int_{\mathbb{R}} f(t)e^{-t^2}e^{-ixt}\,dt = \int_{\mathbb{R}} f(t)e^{-t^2}\sum_{n=0}^{\infty}\frac{(-ixt)^n}{n!}\,dt = \sum_{n=0}^{\infty}\frac{(-ix)^n}{n!}\int_{\mathbb{R}} f(t)t^n e^{-t^2}\,dt = 0.
\]

The term-by-term integration of the series is justified by the dominated convergence theorem because the sequence of functions $f(t)e^{-t^2}\sum_{j=0}^{n}\frac{(-ixt)^j}{j!}$ is dominated by the integrable function $|f(t)|e^{|xt|}e^{-t^2}$.
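The normalization $\tilde H_n = H_n/\sqrt{n!\,2^n\sqrt{\pi}}$ can be sanity-checked numerically. The sketch below assumes that $H_n$ denotes the physicists' Hermite polynomials, generated by $H_0 = 1$, $H_1 = 2x$, $H_{k+1} = 2xH_k - 2kH_{k-1}$ (the convention that matches the weight $e^{-x^2}$), and approximates the inner products in $\mathfrak{L}^2(\mu)$ by the midpoint rule. The function names and grid parameters are ours.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1} (assumed convention)."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def normalized_hermite(n, x):
    """H_n divided by sqrt(n! 2^n sqrt(pi))."""
    return hermite(n, x) / math.sqrt(math.factorial(n) * 2 ** n * math.sqrt(math.pi))

def inner(m, n, L=10.0, pts=8000):
    """Midpoint rule for the integral of H~_m H~_n e^{-x^2} over [-L, L]."""
    h = 2.0 * L / pts
    total = 0.0
    for k in range(pts):
        x = -L + (k + 0.5) * h
        total += normalized_hermite(m, x) * normalized_hermite(n, x) * math.exp(-x * x)
    return total * h

for m in range(4):
    print([round(inner(m, n), 10) for n in range(4)])
```

The printed Gram matrix should be the identity up to rounding. This checks orthonormality only; completeness is the content of the theorem.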

Exercises
1. Prove that if $f \in \mathfrak{L}^1(-\pi, \pi)$ and $\sum_{n=-\infty}^{\infty} |\hat f(n)| < \infty$, then $f(x) = \sum_{n=-\infty}^{\infty} \hat f(n)e^{inx}$ a.e. In particular, f is equal a.e. to a continuous, 2π-periodic function. See theorem 4.10.7.
2. Prove that there exists a function $f \in \mathfrak{L}^1(-\pi, \pi)$ such that $S_n f$ does not converge to f in the 1-norm. Hint: View $S_n$ as a bounded linear operator on $\mathfrak{L}^1(-\pi, \pi)$, and use the Banach-Steinhaus theorem.
3. Prove that the family $h_\lambda(x) = \frac{\lambda}{\pi(x^2 + \lambda^2)}$, $\lambda > 0$, is an approximate identity.
4. Show that the mapping $\mathfrak{F} : \mathfrak{L}^1(\mathbb{R}) \to \mathcal{C}_0(\mathbb{R})$ given by $f \mapsto \hat f$ is bounded and that $\|\mathfrak{F}\| = \frac{1}{\sqrt{2\pi}}$.
5. Suppose $f \in \mathfrak{L}^1(\mathbb{R})$, and let $a \in \mathbb{R}$. Prove that
   (a) if $g(x) = f(x)e^{iax}$, then $\hat g(x) = \hat f(x - a)$;
   (b) if $g(x) = f(x - a)$, then $\hat g(x) = \hat f(x)e^{-iax}$;
   (c) if $g(x) = \overline{f(-x)}$, then $\hat g(x) = \overline{\hat f(x)}$; and
   (d) if $g(x) = f(x/a)$ and $a > 0$, then $\hat g(x) = a\hat f(ax)$.
6. Show that if $f, g \in \mathfrak{L}^1(\mathbb{R})$, then $f * g \in \mathfrak{L}^1(\mathbb{R})$, and $\widehat{f * g} = \hat f\,\hat g$.
7. Prove that if $f \in \mathfrak{L}^1(\mathbb{R})$ and the function $g(x) = -ixf(x) \in \mathfrak{L}^1(\mathbb{R})$, then $\hat f$ is differentiable and $\frac{d\hat f}{dx} = \hat g(x)$. Hint: Use the definition of derivative and the dominated convergence theorem.
8. Prove proposition 8.9.9. Hint: Apply the previous exercises to derive the differential equation $\frac{d\hat G_1(x)}{dx} = -x\hat G_1(x)$.
9. Let $g : \mathbb{R} \to \mathbb{R}$ be a bounded continuous function. Prove that, for every $x \in \mathbb{R}$, $\lim_{\sigma\downarrow 0} (g * G_\sigma)(x) = g(x)$. This result generalizes example 7.
10. Prove that there does not exist a function $\delta \in \mathfrak{L}^1(\mathbb{R})$ such that $\delta * f = f$ for all $f \in \mathfrak{L}^1(\mathbb{R})$.
11. Verify the details of the proof of theorem 8.9.14.
12. Prove that the sequence $\Big(\frac{1}{\sqrt{n!\,2^n\sqrt{\pi}}}H_n(x)e^{-x^2/2}\Big)$ is an orthonormal basis for $\mathfrak{L}^2(\lambda)$.

APPENDIX A

The Equivalence of Zorn’s Lemma, the Axiom of Choice, and the Well Ordering Principle

Before we embark on the task of proving theorem 2.2.1, we need to develop some background work.

Notation. If S is a subset of a well-ordered set A, we use the notation min{S} to denote the
least element of S.

Definition. Let A be a well-ordered set, and let x ∈ A. The initial segment of A determined
by x is the set
S(A, x) = { y ∈ A ∶ y < x}.

Observe that x = min{A − S(A, x)}.

Definition. Let $(A_1, \le_1)$ and $(A_2, \le_2)$ be well-ordered subsets of a nonempty set X. No ordering of X is assumed. We say that $(A_2, \le_2)$ is a continuation of $(A_1, \le_1)$ if $A_1 \subseteq A_2$, $A_1$ is a segment of $A_2$, and $\le_1$ agrees with $\le_2$ on $A_1$. Simply stated, $A_1$ is a segment of $A_2$, and $\le_1$ is the ordering $A_1$ inherits from $(A_2, \le_2)$. We use the notation $(A_1, \le_1) \subseteq (A_2, \le_2)$ to indicate that $(A_2, \le_2)$ is a continuation of $(A_1, \le_1)$ or that $(A_2, \le_2) = (A_1, \le_1)$.

A little reflection reveals that ⊆ is a partial ordering of the collection 𝔚 of well-ordered subsets of X.

Lemma A.1. In the notation of the above paragraph, let ℭ = {(A𝛼 , ≤𝛼 )}𝛼∈I be a chain in 𝔚,
and let A = ∪𝛼 A𝛼 . Then A is well ordered.

Proof. Recall that to say that ℭ is a chain means that, for 𝛼, 𝛽 ∈ I, either (A𝛼 , ≤𝛼 ) ⊆ (A𝛽 , ≤𝛽 )
or (A𝛽 , ≤𝛽 ) ⊆ (A𝛼 , ≤𝛼 ). Here is an explicit definition of the ordering ≤ on A: for a, b ∈ A,
let 𝛼, 𝛽 ∈ I be such that a ∈ A𝛼 , b ∈ A𝛽 . Since ℭ is a chain, say, (A𝛼 , ≤𝛼 ) ⊆ (A𝛽 , ≤𝛽 ). Then
a, b ∈ A𝛽 . Define a ≤ b if a ≤𝛽 b. The fact that ≤ is well defined follows from the fact that
ℭ is a chain. It is a simple exercise to show that ≤ linearly orders A. We now show that
≤ is a well ordering on A. Let S be a nonempty subset of A. Then S ∩ A𝛼 ≠ ∅ for some 𝛼.
Let a be the least element of S ∩ A𝛼 . We claim that a is the least element of S. Let b ∈ S be
such that b ≤ a, and assume that b ∈ A𝛽 . If (A𝛽 , ≤𝛽 ) ⊆ (A𝛼 , ≤𝛼 ), then a, b ∈ S ∩ A𝛼 and
b = a, since a is least in S ∩ A𝛼 . If (A𝛼 , ≤𝛼 ) ⊆ (A𝛽 , ≤𝛽 ), then A𝛼 is a segment of A𝛽 , and
b ≤ a; hence, b ∈ A𝛼 , and, as before, b = a. 

Note that (A, ≤) in the above lemma is an upper bound of the chain ℭ. Thus if A𝛼 ≠ A,
then (A, ≤) is a continuation of (A𝛼 , ≤𝛼 ). The reader is encouraged to work out the details.
The crucial step to verify is the following: if a ∈ A𝛼 , y ∈ A, and y < a, then y ∈ A𝛼 . See
lemma A.4.

Theorem A.2. Zorn’s lemma implies the well ordering principle.

Proof. Let X be a nonempty set. We show that X can be well ordered. Let 𝔚 be the collection
of well-ordered subsets of X, and partially order 𝔚 by continuation. By lemma A.1, a chain
in 𝔚 has an upper bound. By Zorn’s lemma, 𝔚 has a maximal member (A, ≤). We claim
that A = X. If A ≠ X, pick an element z in X − A, and define an ordering ≤0 on Z = A ∪ {z}
as follows: retain the ordering ≤ on A, and define a <0 z for all a ∈ A. Now (Z, ≤0 ) is a strict
continuation of (A, ≤), which contradicts the maximality of (A, ≤). ∎

Theorem A.3. The well ordering principle implies the axiom of choice.

Proof. Let {X𝛼 } be a nonempty collection of nonempty sets. By assumption, each X𝛼 can be well
ordered. Let x𝛼 be the least element of X𝛼 , and let x = (x𝛼 ). Clearly, x is a choice function
and ∏𝛼 X𝛼 ≠ ∅. 

We need a final set of details before we prove the last leg of theorem 2.2.1. The definition
below makes sense for linearly ordered sets, but we limit the discussion to well-ordered sets
because this is where our interest lies now.

Definition. A subset B of a well-ordered set A is said to be a section of A if the conditions a ∈ A, b ∈ B and a < b imply that a ∈ B.

The following facts are obvious:


(a) A segment of A is a section of A.
(b) A is a section of itself.
The lemma below is crucial. It is a partial converse of fact (a).

Lemma A.4. Every proper section B of a well-ordered set A is a segment of A.

Proof. By assumption, A − B ≠ ∅. Let x = min{A − B}. We show that B = S(A, x). If y ∈ S(A, x), then y < x; hence y ∈ B because otherwise y would contradict the definition of x. Conversely, suppose y ∈ B. Now y ≠ x since y ∈ B and x ∉ B. Also if y > x, then, by the definition of a section, x ∈ B, which is a contradiction. Thus y < x and y ∈ S(A, x). ∎
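Lemma A.4 can be checked exhaustively on a small example. The sketch below works with a finite set of integers under the usual order (finite linearly ordered sets are trivially well ordered), so it is an illustration rather than evidence for the general statement; all names are ours.

```python
from itertools import combinations

def initial_segment(A, x):
    """S(A, x) = {y in A : y < x}."""
    return {y for y in A if y < x}

def is_section(B, A):
    """B is a section of A if a in A, b in B and a < b imply a in B."""
    return all(a in B for b in B for a in A if a < b)

def check_lemma_a4(A):
    """Verify that every proper section B of A equals S(A, min(A - B))."""
    A = set(A)
    for r in range(len(A)):                      # r < len(A): proper subsets only
        for B in map(set, combinations(sorted(A), r)):
            if is_section(B, A):
                x = min(A - B)
                assert B == initial_segment(A, x)
    return True

print(check_lemma_a4({2, 3, 5, 7, 11, 13}))
```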

We adopt the following assumptions and terminology for the remainder of this appendix.

Let (X, ≤) be a partially ordered set such that every chain in X has an upper bound but X
has no maximal element.

Given a proper chain A in X, A has an upper bound, u. Since u is not maximal in X, there
is x ∈ X such that x > u. Clearly, x ∉ A because x is a strict upper bound of A. Let 𝔄 be
the collection of all chains in X. Invoking the axiom of choice, we can choose a strict upper
bound of each chain in X. Thus we have a function f ∶ 𝔄 → X that assigns to each chain A
a strict upper bound, f (A).

Fix an element x0 ∈ X, and define a subset A of X to be conforming if


(a) (A, ≤) is well ordered,
(b) x0 is the least element in A, and
(c) For every x ∈ A, f (S(A, x)) = x.

Lemma A.5. If A and B are conforming subsets of X and A − B ≠ ∅, then B is a segment of A.

Proof. Let x = min{A − B}, and define C = S(A, x). It is easy to verify that C ⊆ B. We claim
that C = B. Suppose, for a contradiction, that B − C ≠ ∅, and let y = min{B − C}. We need
three steps before we finalize the proof:
1. S(B, y) is a proper subset of C: Suppose u ∈ B, u < y. If u ∉ C, then u ∈ B − C, and
u < y. This contradicts the definition of y. If S(B, y) = C = S(A, x), then y = f (S(B, y)) =
f (S(A, x)) = x, which is a contradiction because y ∈ B and x ∉ B. This proves our
assertion that S(B, y) is a proper subset of C.
2. S(B, y) is a section of C: If u ∈ C, v ∈ S(B, y), and u < v, we show that u ∈ S(B, y). Since
u < v < y, u < y. If u ∉ S(B, y), then u ∉ B. Thus u ∈ A − B; hence u ≥ x. But u ∈ C =
S(A, x); hence u < x. This contradiction proves that u ∈ S(B, y); hence S(B, y) is a section
of C.
3. S(B, y) is a segment of C; thus S(B, y) = S(C, z), where z ∈ C: This follows directly from
steps 1 and 2 and lemma A.4.
Now we conclude the proof. By step 3, y = f (S(B, y)) = f (S(C, z)) = z. This is a contradiction
because z ∈ C, but y ∉ C by the definition of y. This contradiction proves that B = C. 

Let U be the union of all the conforming subsets of X. The following is a direct result of
lemma A.5.
Observation. If A is a conforming subset of X, a ∈ A, y ∈ U and y < a, then y ∈ A.

Lemma A.6. U is a conforming subset of X. Thus U is the largest conforming subset of X.

Proof. It is clear that x0 is the least element of U.


The following facts follow directly from the above observation:
(a) If T ⊆ U and A is a conforming subset that intersects T, then the least element of T ∩ A
is also the least element of T. Thus U is well-ordered.
(b) If x ∈ U, and A is a conforming subset that contains x, then S(U, x) = S(A, x).
Thus f (S(U, x)) = f (S(A, x)) = x. 

Theorems A.2 and A.3 together with theorem A.7 below constitute the proof of
theorem 2.2.1.

Theorem A.7. The axiom of choice implies Zorn’s lemma.

Proof. Let (X, ≤) be a partially ordered set such that each chain in X has an upper bound. If X
is a chain, then it would have a maximal (in fact, a largest) element, and there is nothing
more to prove. Therefore, we assume that X is not a chain. We show that X has a maximal
element. Suppose, for a contradiction, that X contains no maximal element.
We have shown in lemma A.6 that the set U is the largest conforming subset of X. Since U
is well ordered and X is not, U ≠ X. Let 𝜔 = f (U). The set U ∪ {𝜔} is clearly a conforming
subset of X that strictly contains U. This contradiction establishes the theorem. 

APPENDIX B

Matrix Factorizations

The main purpose of this appendix is to prove a useful matrix factorization result: theorem
B.4. Theorem B.3 is a useful by-product of this appendix.

Definition. Let A be an m × n matrix. By an elementary row (column) operation on A, we


mean one of the following operations:

(a) multiplying one row (column) by a nonzero scalar s


(b) interchanging two rows (columns)
(c) adding a multiple (𝜇) of one row (column) to another row (column)

Definition. An elementary matrix is a matrix obtained by performing a single elementary row (or column) operation on the identity matrix.

Thus there are three types of elementary matrices:

(a) a scaling matrix (the entry s ≠ 0 is the ith diagonal entry):

\[
S(s, i) = \begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & s & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix}
\]

(b) an elementary permutation matrix (the off diagonal entries are (i, j) and (j, i)):

\[
P(i, j) = \begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 0 & \cdots & 1 & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & 1 & \cdots & 0 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}
\]

(c) a multiplier matrix (the entry μ is the (j, i) entry):

\[
M(\mu, i, j) = \begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & \mu & \ddots & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix}
\]

Elementary matrices are invertible since they have nonzero determinants: det(S) = s, det(P) = −1, and det(M) = 1.
The inverse of an elementary matrix is an elementary matrix of the same type. Clearly, a permutation matrix is equal to its own inverse. For a scaling matrix, $S(s, i)^{-1} = S(1/s, i)$.

Observe that a multiplier matrix can be written as $M(\mu, i, j) = I + \mu e_j e_i^T$. Using this, it is easy to verify that $(I + \mu e_j e_i^T)^{-1} = I - \mu e_j e_i^T$. Here I is the identity matrix of the appropriate size.

Lemma B.1. Let A be an m × n matrix.

(a) If E is an m × m elementary matrix obtained by performing a certain elementary row operation on $I_m$, then performing the same operation on A produces the matrix EA.
(b) If F is an n × n elementary matrix obtained by performing a certain elementary column operation on $I_n$, then performing the same operation on A produces the matrix AF.

Proof. Verifying the lemma when E or F is a scaling matrix or a permutation matrix is trivial.

If F is obtained from $I_n$ by adding μ times column j to column i, then $AF = A(I_n + \mu e_j e_i^T) = A + \mu(Ae_j)e_i^T$. Now $Ae_j$ is the jth column of A, and $(Ae_j)e_i^T$ is a matrix whose only nonzero column is the jth column of A placed in the ith column. Hence the result.

Proving part (a) for the case of left multiplication by a multiplier matrix is similar and is left to the reader. ∎

Theorem B.2. Given a nonzero m × n matrix A, there exist elementary m × m matrices $E_1, E_2, \dots, E_r$ and elementary n × n matrices $F_1, F_2, \dots, F_s$ such that $E_r \cdots E_2 E_1 A F_1 F_2 \cdots F_s = D$, where D is a diagonal matrix of the form

\[
D = \begin{pmatrix}
d_1 & & & \\
 & \ddots & & \\
 & & d_q & \\
 & & & 0
\end{pmatrix}, \quad d_i \ne 0, \ 1 \le i \le q.
\]

Proof. In light of lemma B.1, it is enough to prove that A can be reduced to a diagonal matrix through a sequence of elementary row and column operations. We proceed by induction on the number of rows, m. The result is true for a 1 × n matrix and for any n ∈ ℕ. Consider the 1 × n matrix $A = (a_1, \dots, a_n)$. If $a_1 = 0$, we interchange the first entry with a later, nonzero entry. Once that is achieved, we subtract $a_j/a_1$ times entry 1 from entry j, 2 ≤ j ≤ n. We obtain a matrix of the form $(a_1, 0, 0, \dots, 0)$. This proves the base case of our inductive proof. Now we show the inductive step. Suppose the conclusion of the theorem holds for k × n matrices if k < m and n ∈ ℕ. Let A be an m × n matrix. If $a_{1,1} = 0$, we can move a nonzero entry from a later row and/or column to the top left entry, so assume that $a_{1,1} \ne 0$. Subtracting $a_{i,1}/a_{1,1}$ times the top row from row i, 2 ≤ i ≤ m, then subtracting $a_{1,j}/a_{1,1}$ times the first column from column j, 2 ≤ j ≤ n, we obtain a matrix of the form

\[
\begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & A' & \\
0 & & &
\end{pmatrix}. \tag{$*$}
\]

Applying the inductive hypothesis to the sub-matrix A′, we can reduce A′ to a diagonal matrix through a combination of elementary row and column operations. Notice that operating on A′ does not perturb the top row or the first column of the matrix (∗). ∎
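The inductive argument translates into a short procedure. The sketch below is one possible rendering, not the author's algorithm verbatim: it uses exact rational arithmetic, picks the first nonzero entry of the trailing block as the pivot, and accumulates the row operations in E and the column operations in F so that EAF = D. All function names are ours.

```python
from fractions import Fraction

def reduce_to_diagonal(A):
    """Return (E, D, F) with E*A*F = D diagonal, where E and F are products of
    elementary matrices obtained by applying the same operations to identities."""
    m, n = len(A), len(A[0])
    D = [[Fraction(v) for v in row] for row in A]
    E = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    F = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(min(m, n)):
        # Look for a nonzero pivot in the block of rows >= k and columns >= k.
        pivot = next(((i, j) for i in range(k, m) for j in range(k, n) if D[i][j] != 0), None)
        if pivot is None:
            break                                  # the remaining block is zero
        i, j = pivot
        D[k], D[i] = D[i], D[k]                    # row interchange, recorded in E
        E[k], E[i] = E[i], E[k]
        for M in (D, F):                           # column interchange, recorded in F
            for row in M:
                row[k], row[j] = row[j], row[k]
        for r in range(k + 1, m):                  # clear column k below the pivot
            c = D[r][k] / D[k][k]
            D[r] = [a - c * b for a, b in zip(D[r], D[k])]
            E[r] = [a - c * b for a, b in zip(E[r], E[k])]
        for s in range(k + 1, n):                  # clear row k to the right of the pivot
            c = D[k][s] / D[k][k]
            for M in (D, F):
                for row in M:
                    row[s] -= c * row[k]
    return E, D, F

def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[0, 2, 4], [1, 3, 5]]
E, D, F = reduce_to_diagonal(A)
print(D)
print(matmul(matmul(E, A), F) == D)
```

Because E and F are built from invertible elementary operations, the same output also illustrates theorem B.3 with $Q = E^{-1}$ and $P = F$.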

Theorem B.3. Given an m × n matrix A, there exists an invertible m × m matrix Q and an invertible n × n matrix P such that $Q^{-1}AP$ is diagonal.

Proof. Use the previous theorem and take $Q = (E_r \cdots E_1)^{-1}$ and $P = F_1 \cdots F_s$. ∎

Theorem B.4. An invertible square matrix is the product of elementary matrices.

Proof. Using theorem B.2, if A is invertible, so is D (recall that elementary matrices are invertible). In this case, still in the notation of theorem B.2, q = n and $D = S_1 S_2 \cdots S_n$, where $S_i$ is the scaling matrix

\[
S_i = \begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & d_i & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix}.
\]

Thus $A = E_1^{-1} \cdots E_r^{-1} S_1 \cdots S_n F_s^{-1} \cdots F_1^{-1}$, as desired. ∎

Bibliography

Bogachev, Vladimir I. Measure Theory. Vol. 1. Berlin, Heidelberg: Springer-Verlag, 2007.


Bowers, Adam and Nigel Kalton. An Introductory Course in Functional Analysis. New York:
Springer, 2014.
Butcher, J. C. Numerical Methods for Ordinary Differential Equations. Chichester: Wiley,
2003.
Chartrand, Gary et al. Mathematical Proofs: A Transition to Advanced Mathematics. 3rd
Edition. New York: Pearson, 2013.
Cohn, Donald. Measure Theory. 2nd Edition. Birkhäuser Advanced Texts. Basel: Springer,
2013.
Conte, S. D. and Carl De Boor. Elementary Numerical Analysis: An Algorithmic Approach.
Tokyo: McGraw-Hill, 1980.
Crilly, Tony and Dale Johnson. The emergence of topological dimension theory, in I. M. James
(ed.), History of Topology. New York: Elsevier, 1999, 1–24.
DeBarra, G. Introduction to Measure Theory. New York: Van Nostrand Reinhold Company,
1974.
Debnath, Lokenath and Piotr Mikusinski. Introduction to Hilbert Spaces with Applications.
3rd Edition. Amsterdam: Elsevier Academic Press, 2005.
Dorier, Jean-Luc. A general outline of the genesis of vector space theory. Historia mathematica, 22, no. 3, 1995, 227–61.
Fabian, Marián et al. Functional Analysis and Infinite-Dimensional Geometry. CMS Books in
Mathematics. New York: Springer, 2001.
Folland, Gerald. Real Analysis. 2nd Edition. Hoboken: John Wiley and Sons, 1999.
Garling, D. J. H. A Course in Mathematical Analysis. Cambridge: Cambridge University
Press, 2013.
Griffel, D. H. Applied Functional Analysis. Ellis Horwood Series Mathematics and Its
Applications. Chichester: Ellis Horwood, 1981.
Hewitt, Edwin and Karl Stromberg. Real and Abstract Analysis. Vol. 25. Graduate Texts in
Mathematics. Berlin, Heidelberg: Springer-Verlag, 1965.
Hillen, Thomas et al. Partial Differential Equations. Hoboken: Wiley and Sons, 2012.
Hönig, Chaim. Proof of the well-ordering of cardinal numbers. Proceedings of the American
Mathematical Society 5 no. 2, 1954, 312.
Hutson, Vivian and John Sydney Pym. Applications of Functional Analysis and Operator
Theory. Mathematics in Science and Engineering, Vol 146. London: Academic Press,
1980.
Johnson, L. W. and R. D. Riess. Numerical Analysis. Reading: Addison-Wesley, 1982.
Lang, Serge. Real and Functional Analysis. 3rd Edition. Graduate Texts in Mathematics. New
York: Springer-Verlag, 1993.
Lewin, Jonathan. A Simple Proof of Zorn’s Lemma. The American Mathematical Monthly
98 no. 4, 1991, 353.
MacCluer, Barbara. Elementary Functional Analysis. New York: Springer, 2009.
Mendelson, Bert. Introduction to Topology. 3rd Edition. New York: Dover Publications,
1990.

Mostow, George et al. Fundamental Structures of Algebra. New York: McGraw-Hill, 1975.
Munkres, James. Topology: A First Course. London: Prentice-Hall, 1975.
Muscat, Joseph. Functional Analysis: An Introduction to Metric Spaces, Hilbert Spaces and
Banach Algebras. Cham: Springer, 2014.
O’Connor, J. J. and E. F. Robertson. MacTutor History of Mathematics. St. Andrews:
University of St Andrews, 1998, http://www-history.mcs.st-andrews.ac.uk/index.html,
accessed Nov. 6, 2020.
Oden, Tinsley and Leszek Demkowicz. Applied Functional Analysis. 2nd Edition. Boca
Raton: CRC Press, 2010.
Pedersen, Gert K. Analysis Now. New York: Springer-Verlag, 1989.
Pinter, Charles. A Book of Set Theory. New York: Dover Publications, 2014.
Pitts, C. G. C. Introduction to Metric Spaces. University Mathematical Texts. Edinburgh:
Oliver and Boyd, 1972.
Rudin, Walter. Functional Analysis. McGraw-Hill Series in Higher Mathematics. New York:
McGraw-Hill, 1973.
Rudin, Walter. Principles of Mathematical Analysis. 3rd Edition. International Series in Pure
and Applied Mathematics. New York: McGraw-Hill, 1976.
Rudin, Walter. Real and Complex Analysis. 2nd Edition. McGraw-Hill Series in Higher
Mathematics. New York: McGraw-Hill, 1974.
Rynne, Bryan and Martin Youngson. Linear Functional Analysis. London: Springer-Verlag,
2008.
Searcóid, Micheál. Metric Spaces. Springer Undergraduate Mathematics Series. London:
Springer, 2007.
Shalit, Orr Moshe. A First Course in Functional Analysis. Boca Raton: CRC Press, 2017.
Simmons, George. Introduction to Topology and Modern Analysis. New York: McGraw-Hill,
1963.
Smith, Douglas et al. A Transition to Advanced Mathematics. 8th Edition. Andover: Cengage
Learning, 2015.
Spence, Lawrence et al. Elementary Linear Algebra: A Matrix Approach. 2nd Edition. Noida:
Pearson, 2018.
Stakgold, Iver. Green’s Functions and Boundary Value Problems. Pure and Applied Mathematics. New York: John Wiley and Sons, 1979.
Stoer, J. and R. Bulirsch. Introduction to Numerical Analysis. New York: Springer-Verlag,
1980.
Viro, O. Ya. et al. Elementary Topology Problem Book. Providence: American Mathematical
Society, 2008.
Wade, William. Introduction to Analysis. Edinburgh: Pearson, 2014.
Young, Nicholas. An Introduction to Hilbert Space. Cambridge: Cambridge University Press,
1988.

Glossary of Symbols

$(x_\alpha)_{\alpha \in I}$  Typical element in $\prod_{\alpha \in I} X_\alpha$  6
$A^B$  Set exponentiation  6
$I_X$  Identity function on $X$  4
$X = \prod_{\alpha \in I} X_\alpha$  Product of $\{X_\alpha\}$  6
$\chi_S$  Characteristic function of $S$  7
$\lim_n$  Limit as $n \to \infty$  11
$\liminf_n$  Limit inferior as $n \to \infty$  17
$\limsup_n$  Limit superior as $n \to \infty$  17
$\mathbb{N}$  Natural numbers  2
$\mathbb{Q}$  Rational numbers  2
$\mathbb{R}^n$  Euclidean $n$-space  6
$\mathbb{R}$  Real numbers  2
$\mathbb{Z}$  Integers  2
$\mathcal{P}(A)$  Power set of $A$  7
$\mathfrak{R}(f)$  The range of $f$  3
$\overline{\mathbb{R}}$  Extended real numbers  17
$\pi_\alpha \colon X \to X_\alpha$  Projection onto $X_\alpha$  7
$f(A)$  Direct image of $A$  3
$f \colon X \to Y$  Function from $X$ to $Y$  3
$f^{-1}(B)$  Inverse image of $B$  4
$g \circ f$  Composition of functions  4
$\inf(A)$  Infimum of a set  11
$\sup(A)$  Supremum of a set  11
$2^{\mathbb{N}}$  Binary sequences  31
$A - B$  Set difference  3
$A \approx B$  Set equivalence  26
$A'$  Derived set of $A$  112
$A^*$  Conjugate transpose of $A$  93
$B(x, r)$  Ball of radius $r$ centered at $x$  80
$\mathrm{Card}(A)$  Cardinality of $A$  27
$\mathrm{Conv}(A)$  Convex hull of $A$  82
$D_n$  Dirichlet kernel  430
$G_\sigma$  Gauss kernel  438
$H_n$  Hermite polynomials  189
$\mathrm{Ker}(T)$  Kernel of $T$  62
$M^\perp$  Orthogonal complement of $M$  90
$P_M$  Projection operator on $M$  297
$P_n$  Legendre polynomials  184
$S_\Delta$  Upper Riemann sum  343
$S_n f$  Finite Fourier sum  176
$\mathrm{Span}(A)$  Linear span of $A$  52
$T^*$  Adjoint of $T$  279
$T_n$  Tchebychev polynomials  187
$U \oplus V$  Direct sum  65
$U/V$  Quotient space  64
$X^*$  Dual space of $X$  256
$X^{**}$  Second dual space of $X$  270
$X_\infty$  One-point compactification  229
$\aleph_0$  Cardinality of $\mathbb{N}$  40
$\chi(x, y)$  Chordal distance between $x$, $y$  127
$\delta_h$  Bump kernel  415
$\hat{X}$  Image of $X$ in $X^{**}$  270
$\hat{f}_n$  Fourier coefficients of $f$  176
$\hat{f}$  Fourier transform  437
$\hat{x}_n$  Fourier coefficients of $x$  302
$\langle \cdot, \cdot \rangle$  Inner product  86
$\langle x, \lambda \rangle$  Duality bracket  279
$\mathbb{K}(I)$  Finitely supported functions $I \to \mathbb{K}$  50
$\mathbb{K}(\mathbb{N})$  Finitely supported sequences  50
$\mathbb{K}^n$  $n$-dimensional space, $\mathbb{R}^n$ or $\mathbb{C}^n$  50, 433
$\mathbb{K}^{m \times n}$  Space of $m \times n$ matrices  50
$\mathbb{K}$  Base field, $\mathbb{R}$ or $\mathbb{C}$  50
$\mathbb{N}_n$  Natural numbers $\le n$  27
$\mathbb{P}_n$  Polynomials of degree $\le n$  50
$\mathbb{P}$  Space of polynomials  50
$\mathbb{R}_l^2$  Sorgenfrey plane  217
$\mathbb{R}_l$  Sorgenfrey line  199
$\mathcal{B}(X)$  Bounded functions on $X$  160
$\mathcal{B}[a, b]$  Space of bounded functions on $[a, b]$  50
$\mathcal{BC}(X)$  Continuous bounded functions on $X$  160
$\mathcal{C}(X)$  Continuous functions on $X$  160
$\mathcal{C}(\mathcal{S}^1)$  $2\pi$-periodic functions  174
$\mathcal{C}[a, b]$  Space of continuous functions on $[a, b]$  51
$\mathcal{C}^\infty(\mathbb{R})$  Infinitely differentiable functions on $\mathbb{R}$  51
$\mathcal{C}_0(X)$  Continuous functions vanishing at $\infty$  237
$\mathcal{C}_c(X)$  Continuous functions of compact support  237
$\mathcal{H}$  Hilbert cube  118
$\mathcal{L}(X)$  Bounded operators on $X$  256
$\mathcal{L}(X, Y)$  Bounded transformations $X \to Y$  255
$\mathcal{N}(T)$  Null-space of $T$  62
$\mathcal{S}^1$  Unit circle  174
$\mathcal{S}^{n-1}_*$  Punctured unit sphere in $\mathbb{R}^n$  126
$\mathcal{S}^n$  Unit sphere in $\mathbb{R}^n$  126
$\mathfrak{L}^\infty$  Essentially bounded functions  403
$\mathfrak{L}^p$  Lebesgue spaces  403
$\mathfrak{M} \otimes \mathfrak{N}$  Product of $\sigma$-algebras  419
$\mathfrak{c}$  Cardinality of $\mathbb{R}$  40
$\mathcal{B}(X)$  Borel subsets of $X$  350
$\mathcal{B}^n$  Borel subsets of $\mathbb{R}^n$  350
$\mathcal{L}(\mathbb{R}^n)$  Lebesgue measurable subsets of $\mathbb{R}^n$  382
$\mathcal{L}^n$  Lebesgue measurable subsets of $\mathbb{R}^n$  382
$\mu \otimes \nu$  Product measure  423
$\nu \ll \mu$  Absolute continuity  398
$\nu^+$  Positive part of $\nu$  396
$\nu^-$  Negative part of $\nu$  396
$\overline{A}$  Closure of $A$  111
$\partial A$  Boundary of $A$  113
$\rho(T)$  Resolvent set of $T$  274
$\xrightarrow{w}$  Weak convergence  272
$\sigma(T)$  Spectrum of $T$  274
$\{e_1, e_2, \ldots\}$  Canonical vectors in $\mathbb{K}(\mathbb{N})$  52
$c_0$  Space of null sequences  51
$c$  Space of convergent sequences  51
$\dim(U)$  Dimension of $U$  58
$\mathrm{dist}(A, B)$  Distance between $A$ and $B$  113
$\mathrm{dist}(x, A)$  Distance from $x$ to $A$  113
$f * g$  Convolution of functions  413
$f^+$  Positive part of $f$  367
$f^-$  Negative part of $f$  367
$\mathrm{int}(A)$  Interior of $A$  110
$l^\infty$  Space of bounded sequences  51
$l^p$  Space of $p$-convergent series  77
$r(T)$  Spectral radius of $T$  276
$s_\Delta$  Lower Riemann sum  343
$\mathrm{sgn}(g)$  Sign of $g$  405
$\mathrm{supp}(f)$  Support of $f$  237
$\mathrm{vol}(Q)$  Volume of box $Q$  343
$x \perp y$  $x$ is orthogonal to $y$  87
$|\nu|$  Total variation of $\nu$  396

Index

Absolute continuity of measures 398 Bounded subset 114
Absolutely convergent series 252 Box topology 242
Adjoint operator 98, 279, 308 Bump function 414
Alexandroff 231 Bump kernel 415
Algebra of sets 349
Algebra over a field 68 Canonical projections 260
Algebraic complement 65 Canonical vectors 52
Almost everywhere 369 Cantor intersection theorem 139
Annihilator 281 Cantor set 116
Annihilator of a set 295 Cantor, Georg 25
Antisymmetric relation 33 Carathéodory condition 353, 382, 389
Apollonius identity 299 Carathéodory’s theorem 83, 354
Approximate identity 438 Cardinal arithmetic 41
Arzela-Ascoli theorem 164 Cardinality 27, 39
Ascending sequence of sets 5 Cartesian product 3, 6
Ascoli’s theorem 164, 170 Cauchy criterion 15
Axiom of choice 35 Cauchy sequence 13, 136
Cauchy-Schwarz inequality 87, 293
Baire’s theorem 139, 225 Chain 33
Banach algebra 273 Change of base 72
Banach space 247 Characteristic function 7
Banach, Stefan 245 Choice function 35
Banach-Alaoglu theorem 287 Chordal metric 126
Banach-Saks theorem 308 Closed box 343
Banach-Steinhaus theorem 265 Closed graph 264
Barycentric coordinates 85 Closed graph theorem 264
Basis 55 Closed mapping 207
Bessel’s inequality 302 Closed set 107, 194
Best approximation 91 Closure of a set 111, 194
Bicontinuous function 123, 203 Closure point 111, 194
Bijection 4 Co-dimension 67
Binary sequences 6 Co-finite topology 193
Bolzano-Weierstrass property 15, 151 Coarser metric 121
Bolzano-Weierstrass theorem 13 Coarser topology 199
Borel algebra 350 Compact operator 320, 336
Borel sets 350 Compact space 149, 221
Boundary of a set 113, 195 Compact subspace 150, 221
Boundary point 113, 195 Compactification 231
Bounded away from zero 257 Complemented subspace 271
Bounded functions, space of 160 Complement of a set 3
Bounded linear mapping 253 Complete measure 355
Bounded metric 114, 124 Complete metric space 137
Bounded projection 272 Completeness of Lp 404
Bounded sequence 12, 114 Completeness of R 11
Bounded sequences, space of 51 Completeness of C 21
Bounded set 11 Completion of a measure 362
Completion of a metric 162 Disconnected space 208
Completion of a norm 270 Discrete metric 106
Completion of an inner product 298 Discrete topology 193
Complex measure 397 Disjoint family 5
Complex numbers 19 Distance function 105
Componentwise convergence 131 Distributive laws 3, 5
Composition of functions 4 Dominated convergence theorem 372
Conforming set 447 Dual space 256
Conjugate Hölder exponents 78 Duality bracket 278
Conjugate transpose 93
Connected components 211 Egoroff’s theorem 409
Connected points 211 Eigenspace 274
Connected space 208 Eigenvalues 274
Connected subset 210 Eigenvectors 274
Continuity at a point 120, 201 Elementary matrix 449
Continuity of inner products 121 Elementary sets 420
Continuity of norms 121 Equicontinuity 163
Continuous bounded functions 160, 161, 202 Equivalence classes 7
Continuous function 120, 201, 202 Equivalence relation 7
Continuous functions, space of 160 Equivalent metrics 123
Continuum hypothesis 45 Equivalent sets 26
Contraction 142 Essential upper bound 403
Contraction mapping theorem 141 Essentially bounded functions 403
Convergent sequence 11, 108 Euclidean metric 105
Convergent sequences, space of 51 Euclidean n-space 6
Convergent series 252 Extended real line 17
Convex combination 82 Extreme point 83
Convex hull 82
Convex set 81 F𝜎 set 351
Convolution of functions 413 Fatou’s theorem 371
Coset of a subspace 64 Fejér kernel 433
Countable additivity 351 Field 10
Countable intersection property 218 Finer metric 121
Countable set 29 Finite intersection property 223
Countably compact space 225 Finite measure 351
Counting measure 351 Finite rank operator 322
Finite sequence 4
De Morgan’s laws 3, 5 Finite set 27
Decreasing sequence 12 Finite-dimensional space 57
Dedekind, Richard 1 First countable space 220
Defining base 206 Fixed point 142
Defining subbase 206 Fourier coefficients 176, 302
Dense subset 133, 217 Fourier series 176
Dependent vectors 53 Fourier transform 436
Derived set 112 Fredholm alternative theorem 327
Descending sequence of sets 5 Fredholm integral equation 332
Diagonalization 73 Fredholm theory 325
Diameter of a set 114 Fubini’s theorem 424, 428
Dimension of a vector space 58, 59 Functions of compact support 237, 374
Dini’s theorem 171
Dirac measure 351 G𝛿 set 351
Direct sum of subspaces 65 Gauss kernel 438
Direct sums 65 Gelfand’s theorem 277
Dirichlet kernel 430 Generalized continuum hypothesis 45
Gram-Schmidt process 92 Inverse image 3
Greatest element 33 Inversion formula 438, 440
Greatest lower bound 11 Invertible operator 273
Grid 342 Isolated point 112, 220
Isometric spaces 124
Hölder’s inequality 78, 403 Isometry 124
Hahn decomposition theorem 395 Isomorphism 63
Hahn-Banach theorem 268
Half-spaces 90 Jordan decomposition theorem 396
Hamel basis 55
Hausdorff property 108 Kernel of a linear mapping 62
Hausdorff space 214 Krein-Milman theorem 84
Heine-Borel theorem 154, 252 Kronecker delta 52
Hermite polynomials 188
Hermitian matrix 95 Least upper bound 11, 34
Hilbert cube 118, 250 Lebesgue measurable set 382
Hilbert dimension 306 Lebesgue measure 382
Hilbert space 293 Lebesgue number 152
Hilbert space isomorphism 304 Lebesgue outer measure 380
Hilbert, David 291 Lebesgue, Henri 341
Hilbert-Schmidt kernel 332 Left shift operator 274
Hilbert-Schmidt theorem 329, 335 Legendre polynomials 92, 183
Homeomorphic spaces 125, 203 Limit inferior 17
Homeomorphism 125, 203 Limit point 15, 20, 112, 197
Homomorphism 67 Limit point of a sequence 17
Hopf extension theorem 361 Limit superior 17
Hyperplane 90 Lindelöf space 134, 217
Linear basis 55
Idempotent operator 69 Linear combination 52
Identity function 4 Linear functional 66
Image of a set 3 Linear mapping 61
Increasing sequence 12 Linear operator 68
Independent vectors 54 Linear ordering 33
Indexed sets 4 Liouville’s theorem 276
Indiscrete topology 193 Lipschitz function 142
Induced metric 115 Locally bounded function 202
Infimum 11 Locally compact space 154, 227
Infinite sequence 4 Lower bound 11
Infinite set 27 Lower limit 17
Infinite-dimensional space 59 Lower limit topology 199
Infinity norm 76, 160 Lower Riemann integral 344
Initial segment 38, 445 Lower Riemann sum 343
Injective function 4 Lower semicontinuous 203
Inner product space 86 𝔏p spaces 403
Inner regularity 387 lp spaces 77
Integrable function 368, 369 Luzin’s theorem 411
Integral of a function 365, 366, 368, 369
Interior of a set 110, 194 Matrix of a linear mapping 70
Interior point 110, 194 Matrix representation 70
Intermediate value theorem 209 Maximal element 33
Interval 208 Maximal subspace 67
Invariance of dimension 59 Mean square convergence 177
Invariant subspace 69 Measurable dissection 401
Inverse functions 4 Measurable function 355, 359
Measurable rectangle 419 Orthogonal set 88
Measurable sets 351 Orthogonal vectors 87
Measurable space 351 Orthonormal basis 300
Measure space 351 Orthonormal set 88
Metric 105 Outer measure 352
Metric space 105 Outer regularity 387
Metrizable space 233
Metrization 233 Parallelogram law 294
Minimal spanning set 56 Parseval’s identity 303
Minkowski’s inequality 78, 404 Partial ordering 33
Monotone class 421 Partition 342
Monotone class lemma 421 Partition of unity 375
Monotone convergence theorem 371 Path connected 212
Monotonic sequence 12 Peano, Giuseppe 47
Monotonicity of measures 351 Perfect set 116
Mutually singular measures 396 Piecewise linear function 61
Point spectrum 274
Natural embedding 270 Pointwise boundedness 261
Negative set 394 Polarization identity 294
Negative variation of a measure 396 Polytope 83
Neighborhood of a point 108, 195 Positive measure 351
Neighborhood of a set 195 Positive operator 319
Norm 76 Positive set 394
Norm of a bounded mapping 254 Positive variation of a measure 396
Norm topology 284 Power set 7
Normal matrix 95 Product measure 423
Normal operator 98, 316 Product metric 130
Normal space 215 Product spaces 130
Normed linear space 75 Product topology 206, 239
Nowhere dense set 116, 139, 197 Projection 7, 66
Nowhere differentiable functions 144 Projection operator 297
Null sequences, space of 51 Projection theorem 296
Nullity of a linear mapping 62 Punctured circle 126
Null-space of a linear mapping 62 Punctured sphere 127
Pythagorean theorem 88, 294
Obtuse angle criterion 156
One-point compactification 231 Quotient map 64
One-to-one correspondence 4 Quotient space 64, 281
Open ball 80, 106
Open base 134, 198 Radon measurable set 389
Open cover 134, 149, 217, 221 Radon measure 389
Open mapping 207 Radon outer measure 389
Open mapping theorem 262 Radon-Nikodym derivative 398
Open neighborhood 195 Radon-Nikodym theorem 400
Open set 106, 193 Range of a function 3
Open subbase 199 Range of a linear mapping 62
Open subcover 134, 149, 217, 221 Rank of a linear mapping 62
Operator algebra 273 Real measure 394
Operator norm 256 Reflexive relation 7
Ordered pairs 3 Reflexive space 270
Orthogonal complement 90, 295 Regular measure 387
Orthogonal matrix 93 Regular space 215
Orthogonal polynomials 183 Relation 7
Orthogonal projection 91, 297 Relative topology 196
Resolvent set 274 Subspace 51, 115
Restricted metric 115 Subspace metric 115
Restricted topology 196 Subspace topology 196
Riemann integrable function 344 Sum of subspaces 65
Riemann integral 344 Support of a function 237, 374
Riemann-Lebesgue lemma 435, 437 Supporting half-space 157
Riesz representation theorem 297 Supporting hyperplane theorem 158
Riesz’s lemma 251 Supremum 11
Riesz-Fischer theorem 304 Supremum norm 76, 160
Riesz-Schauder theorem 323 Surjective function 4
Right shift operator 274 Symmetric relation 7

Schauder basis 253 T1 space 214


Schröder-Bernstein theorem 37 Tchebychev polynomials 186, 187
Second countable space 134, 217 Three-term recurrence 184
Second dual space 269 Tietze extension theorem 409
Section of a set 446 Tonelli’s theorem 424
Sections of functions 420 Topological embedding 203
Sections of sets 419 Topological space 193
Segment of a set 445 Topology 193
Self-adjoint operator 98, 309 Total ordering 33
Separable space 133, 217 Total variation of a measure 396, 400
Sequential continuity 120
Sequentially compact space 151 Totally bounded space 151
Set exponentiation 6 Totally disconnected space 212
𝜎-algebra 349 Transfinite induction 38
𝜎-compact 228 Transitive relation 7
𝜎-finite 402 Translation of a set 81
Simple function 365 Triangle inequality 76, 105
Skew symmetric matrices 60 Trigonometric polynomial 177
Sorgenfrey line 199 Tube lemma 224
Sorgenfrey plane 217 Tychonoff’s theorem 153, 224, 241
Space of continuous functions 202
Space of bounded functions 160, 202 Uncountable set 29
Space-filling curve 167 Uniform boundedness 261
Span of a set 52 Uniform boundedness principle 262
Spectral decomposition 97
Spectral radius 275 Uniform continuity 14, 161
Spectral theorem 99, 329 Uniform equicontinuity 163
Spectrum of an operator 274 Uniform norm 76, 160
Square integrable functions 183 Uniqueness theorem 181, 436, 440
Standard inner product 86 Unitary matrix 93
Standard matrix 61 Unitary operator 316
Standard n-simplex 84 Upper bound 10, 34
Step function 348 Upper limit 17
Stereographic projections 126 Upper Riemann integral 344
The Stone-Weierstrass Theorem 172 Upper Riemann sum 343
Strictly convex norm 307 Upper semicontinuous 203
Strong topology 284 Urysohn, Pavel 191
Stronger metric 121 Urysohn metrization theorem 235
Stronger topology 199
Subadditivity of measure 352 Urysohn’s lemma 234, 373, 415
Subcover 134, 149, 221 Usual metric 105
Subsequence 13 Usual topology 194
Vector space 49 Weakly convergent sequence 272, 307
Vertices of a polytope 83 Weierstrass approximation theorem 165
Volterra equation 143 Weierstrass M-test 140
Weierstrass, Karl 103
Weak topology 284 Weight function 182
Weak* topology 284 Well-ordered set 34
Weaker metric 121
Weaker topology 199 Zorn’s lemma 36
