Calculus and Analysis in Euclidean Space

Jerry Shurman

Undergraduate Texts in Mathematics
Series Editors:
Sheldon Axler
San Francisco State University, San Francisco, CA, USA
Kenneth Ribet
University of California, Berkeley, CA, USA
Jerry Shurman
Department of Mathematics
Reed College
Portland, OR
USA
Contents

Preface
2 Euclidean Space
  2.1 Algebra: Vectors
  2.2 Geometry: Length and Angle
  2.3 Analysis: Continuous Mappings
  2.4 Topology: Compact Sets and Continuity
6 Integration
  6.1 Machinery: Boxes, Partitions, and Sums
  6.2 Definition of the Integral
  6.3 Continuity and Integrability
  6.4 Integration of Functions of One Variable
  6.5 Integration over Nonboxes
  6.6 Fubini’s Theorem
  6.7 Change of Variable
  6.8 Topological Preliminaries for the Change of Variable Theorem
  6.9 Proof of the Change of Variable Theorem
Index
Preface
This book came into being as lecture notes for a course at Reed College on
multivariable calculus and analysis. The setting is n-dimensional Euclidean
space, with the material on differentiation culminating in the inverse function
theorem and its consequences, and the material on integration culminating
in the general fundamental theorem of integral calculus (often called Stokes’s
theorem) and some of its consequences in turn. The prerequisite is a proof-
based course in one-variable calculus and analysis. Some familiarity with the
complex number system and complex mappings is occasionally assumed as
well, but the reader can get by without it.
The book’s aim is to use multivariable calculus to teach mathematics as
a blend of reasoning, computing, and problem-solving, doing justice to the
structure, the details, and the scope of the ideas. To this end, I have tried to
write in an informal style that communicates intent early in the discussion of
each topic rather than proceeding coyly from opaque definitions. Also, I have
tried occasionally to speak to the pedagogy of mathematics and its effect on
the process of learning the subject. Most importantly, I have tried to spread
the weight of exposition among figures, formulas, and words. The premise is
that the reader is eager to do mathematics resourcefully by marshaling the
skills of
• geometric intuition (the visual cortex being quickly instinctive)
• algebraic manipulation (symbol-patterns being precise and robust)
• and incisive use of natural language (slogans that encapsulate central ideas
enabling a large-scale grasp of the subject).
Thinking in these ways renders mathematics coherent, inevitable, and fluid.
In my own student days I learned this material from books by Apostol,
Buck, Rudin, and Spivak, books that thrilled me. My debt to those sources
pervades these pages, and there are many other fine books on the subject as
well. Indeed, nothing in these notes is claimed as new. Whatever effective-
ness this exposition has acquired over time is due to innumerable ideas from
my students, and from discussion with colleagues, especially Joe Buhler, Paul
Garrett, Ray Mayer, and Tom Wieting. After many years of tuning my presen-
tation of this subject matter to serve the needs in my classroom, I hope that
now this book can serve other teachers and their students too. I welcome sug-
gestions for improving it, especially because some of its parts are more tested
than others. Comments and corrections should be sent to [email protected].
By way of a warmup, Chapter 1 reviews some ideas from one-variable
calculus, and then covers the one-variable Taylor’s theorem in detail.
Chapters 2 and 3 cover what might be called multivariable precalculus, in-
troducing the requisite algebra, geometry, analysis, and topology of Euclidean
space, and the requisite linear algebra, for the calculus to follow. A pedagogical
theme of these chapters is that mathematical objects can be better understood
from their characterizations than from their constructions. Vector geometry
follows from the intrinsic (coordinate-free) algebraic properties of the vector
inner product, with no reference to the inner product formula. The fact that
passing a closed and bounded subset of Euclidean space through a continuous
mapping gives another such set is clear once such sets are characterized in
terms of sequences. The multiplicativity of the determinant and the fact that
the determinant indicates whether a linear mapping is invertible are conse-
quences of the determinant’s characterizing properties. The geometry of the
cross product follows from its intrinsic algebraic characterization. Further-
more, the only possible formula for the (suitably normalized) inner product,
or for the determinant, or for the cross product, is dictated by the relevant
properties. As far as the theory is concerned, the only role of the formula is
to show that an object with the desired properties exists at all. The intent
here is that the student who is introduced to mathematical objects via their
characterizations will see quickly how the objects work, and that how they
work makes their constructions inevitable.
In the same vein, Chapter 4 characterizes the multivariable derivative as a
well-approximating linear mapping. The chapter then solves some multivari-
able problems that have one-variable counterparts. Specifically, the multivari-
able chain rule helps with change of variable in partial differential equations,
a multivariable analogue of the max/min test helps with optimization, and
the multivariable derivative of a scalar-valued function helps to find tangent
planes and trajectories.
Chapter 5 uses the results of the three chapters preceding it to prove the
inverse function theorem, then the implicit function theorem as a corollary,
and finally the Lagrange multiplier criterion as a consequence of the implicit
function theorem. Lagrange multipliers help with a type of multivariable op-
timization problem that has no one-variable analogue, optimization with con-
straints. For example, given two curves in space, what pair of points—one
on each curve—are closest to each other? Not only does this problem have
six variables (the three coordinates of each point), but furthermore, they are
not fully independent: the first three variables must specify a point on the
first curve, and similarly for the second three. In this problem, x1 through x6
e1 x1 + ⋯ + en xn = 1.

vol(Bn (r)) = (π^{n/2}/(n/2)!) r^n,  n = 1, 2, 3, 4, . . . .
the integral is useful in ways far beyond computing volumes. The second point
is that with approximation by convolution in hand, we feel free to assume in
the sequel that functions are smooth. The reader who is willing to grant this
assumption in any case can skip Chapter 7.
Chapter 8 introduces parametrized curves as a warmup for Chapter 9
to follow. The subject of Chapter 9 is integration over k-dimensional parame-
trized surfaces in n-dimensional space, and parametrized curves are the special
case k = 1. Aside from being one-dimensional surfaces, parametrized curves
are interesting in their own right. Chapter 8 focuses on the local description
of a curve in an intrinsic coordinate system that continually adjusts itself as
it moves along the curve, the Frenet frame.
Chapter 9 presents the integration of differential forms. This subject poses
the pedagogical dilemma that fully describing its structure requires an in-
vestment in machinery untenable for students who are seeing it for the first
time, whereas describing it purely operationally is unmotivated. The approach
here begins with the integration of functions over k-dimensional surfaces in
n-dimensional space, a natural tool to want, with a natural definition suggest-
ing itself. For certain such integrals, called flow and flux integrals, the inte-
grand takes a particularly workable form consisting of sums of determinants
of derivatives. It is easy to see what other integrands—including integrands
suitable for n-dimensional integration in the sense of Chapter 6, and includ-
ing functions in the usual sense—have similar features. These integrands can
be uniformly described in algebraic terms as objects called differential forms.
That is, differential forms assemble the smallest coherent algebraic structure
encompassing the various integrands of interest to us. The fact that differen-
tial forms are algebraic makes them easy to study without thinking directly
about the analysis of integration. The algebra leads to a general version of
the fundamental theorem of integral calculus that is rich in geometry. The
theorem subsumes the three classical vector integration theorems: Green’s
theorem, Stokes’s theorem, and Gauss’s theorem, also called the divergence
theorem.
The following two exercises invite the reader to start engaging with some
of the ideas in this book immediately.
Exercises
0.0.1. (a) Consider two surfaces in space, each surface having at each of its
points a tangent plane and therefore a normal line, and consider pairs of
points, one on each surface. Conjecture a geometric condition, phrased in
terms of tangent planes and/or normal lines, about the closest pair of points.
(b) Consider a surface in space and a curve in space, the curve having at
each of its points a tangent line and therefore a normal plane, and consider
pairs of points, one on the surface and one on the curve. Make a conjecture
about the closest pair of points.
(c) Make a conjecture about the closest pair of points on two curves.
0.0.2. (a) Assume that the factorial of a half-integer makes sense, and grant
the general formula for the volume of a ball in n dimensions. Explain why
it follows that (1/2)! = √π/2. Further assume that the half-integral factorial
function satisfies the relation

x! = x ⋅ (x − 1)!  for x = 3/2, 5/2, 7/2, . . . .

Subject to these assumptions, verify that the volume of the ball of radius r
in three dimensions is (4/3)πr³ as claimed. What is the volume of the ball of
radius r in five dimensions?
(b) The ball of radius r in n dimensions sits inside a circumscribing box
with sides of length 2r. Draw pictures of this configuration for n = 1, 2, 3.
Determine what portion of the box is filled by the ball in the limit as the
dimension n gets large. That is, find

lim_{n→∞} vol(Bn (r))/(2r)^n.
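As a plausibility check, the volume formula is easy to evaluate numerically. Here is a minimal Python sketch, ours rather than the text's, using the gamma function to give meaning to half-integral factorials via x! = Γ(x + 1):

```python
from math import gamma, pi

def ball_volume(n, r=1.0):
    # vol(B_n(r)) = pi^(n/2) / (n/2)! * r^n, reading x! as gamma(x + 1)
    return pi ** (n / 2) / gamma(n / 2 + 1) * r ** n

for n in (1, 2, 3, 5, 10, 20):
    ratio = ball_volume(n) / 2.0 ** n   # ball volume over circumscribing box, r = 1
    print(n, round(ball_volume(n), 6), ratio)
# The printed ratios shrink rapidly toward 0 as n grows.
```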
The original version of the book was revised: the author's later corrections have been incorporated. The corrected book is available at https://doi.org/10.1007/978-3-319-49314-5_10.
1 Results from One-Variable Calculus
We begin with a quick review of some ideas from one-variable calculus. The
material of Sections 1.1 and 1.2 is assumed to be familiar. Section 1.3 discusses
Taylor’s theorem at greater length, not assuming that the reader has already
seen it.
1.1 The Real Number System

All of basic algebra follows from the field axioms. Additive and multi-
plicative inverses are unique, the cancellation law holds, 0 ⋅ x = 0 for all real
numbers x, and so on.
Subtracting a real number from another is defined as adding the additive
inverse. In symbols,

x − y = x + (−y),  x, y ∈ R.
We also assume that R is an ordered field. That is, we assume that there
is a subset R+ of R (the positive elements) such that the following axioms
hold.
(o1) Trichotomy axiom: for every real number x, exactly one of the following
three conditions holds:

x ∈ R+ ,  −x ∈ R+ ,  x = 0.
(o2) Closure of positive numbers under addition: for all real numbers x and y,
if x ∈ R+ and y ∈ R+ then also x + y ∈ R+ .
(o3) Closure of positive numbers under multiplication: for all real numbers x
and y, if x ∈ R+ and y ∈ R+ then also xy ∈ R+ .
For all real numbers x and y, define

x < y
to mean
y − x ∈ R+ .
The usual rules for inequalities then follow from the axioms.
Finally, we assume that the real number system is complete. Complete-
ness can be phrased in various ways, all logically equivalent. One version of
completeness is phrased in terms of set-bounds.
N = {0, 1, 2, . . . }.
Indeed, the hypotheses of the theorem say that P (n) is true for a subset
of N that is inductive, and so the theorem follows from the definition of N as
the smallest inductive subset of R.
The Archimedean property of the real number system states that the
subset N of R is not bounded above. Equivalently, the sequence {1, 1/2, 1/3, . . . }
converges to 0: there are no infinitesimal real numbers greater than 0 but
less than every reciprocal positive integer. The Archimedean property follows
from the assumption that R satisfies the set-bound criterion for completeness.
A second version of completeness is phrased in terms of monotonic se-
quences. Again it is an existence statement.
This version of completeness follows from the first one. However, it does
not imply the first one unless we also assume the Archimedean property.
The set of integers, denoted Z, is the union of the natural numbers and
their additive inverses,
Z = {0, ±1, ±2, . . . }.
Exercises
1.1.1. Referring only to the field axioms, show that 0x = 0 for all x ∈ R.
1.1.2. Prove that in every ordered field, 1 is positive. Prove that the complex
number field C cannot be made an ordered field.
1.1.3. Use a completeness property of the real number system to show that 2
has a positive square root.
(b) (Bernoulli’s inequality) For every real number r ≥ −1, prove that

(1 + r)^n ≥ 1 + rn  for every n ∈ N.
1.1.5. (a) Use the induction theorem to show that for every natural num-
ber m, the sum m + n and the product mn are again natural for every natural
number n. Thus N is closed under addition and multiplication, and conse-
quently so is Z.
(b) Which of the field axioms continue to hold for the natural numbers?
(c) Which of the field axioms continue to hold for the integers?
1.1.6. For every positive integer n, let Z/nZ denote the set {0, 1, . . . , n − 1}
with the usual operations of addition and multiplication carried out taking
remainders on division by n. That is, add and multiply in the usual fashion
but subject to the additional condition that n = 0. For example, in Z/5Z we
have 2 + 4 = 1 and 2 ⋅ 4 = 3. For what values of n does Z/nZ form a field?
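A brute-force computation suggests the answer before any proof is attempted; this sketch (ours, not the text's) tests whether every nonzero element of Z/nZ has a multiplicative inverse:

```python
def is_field(n):
    # Z/nZ is a field exactly when each a in {1, ..., n-1} has some b with a*b = 1 mod n
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

print([n for n in range(2, 30) if is_field(n)])
# prints 2, 3, 5, 7, 11, ... : the primes
```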
The second theorem says that under suitable conditions, every value
trapped between two output values of a function must itself be an output
value.
Theorem 1.2.2 (Intermediate value theorem). Let I be a nonempty
interval in R, and let f ∶ I Ð→ R be a continuous function. Let y be a real
number, and suppose that

f (x) < y for some x ∈ I

and

f (x′ ) > y for some x′ ∈ I.

Then

f (c) = y for some c ∈ I.
Theorem 1.2.3 (Mean value theorem). Let a and b be real numbers with
a < b. Suppose that the function f ∶ [a, b] Ð→ R is continuous and that f is
differentiable on the open subinterval (a, b). Then
(f (b) − f (a))/(b − a) = f ′(c)  for some c ∈ (a, b).
The fundamental theorem of integral calculus quantifies the idea that inte-
gration and differentiation are inverse operations. In fact, two different results
are both called the fundamental theorem, one a result about the derivative
of the integral and the other a result about the integral of the derivative.
“Fundamental theorem of calculus,” unmodified, usually refers to the second
of the next two results.
Exercises
1.2.1. Use the intermediate value theorem to show that 2 has a positive square
root.
1.2.2. Let f ∶ [0, 1] Ð→ [0, 1] be continuous. Use the intermediate value theo-
rem to show that f (x) = x for some x ∈ [0, 1].
1.2.3. Let a and b be real numbers with a < b. Suppose that f ∶ [a, b] Ð→ R
is continuous and that f is differentiable on the open subinterval (a, b). Use
the mean value theorem to show that if f ′ > 0 on (a, b) then f is strictly
increasing on [a, b]. (Note: The quantities called a and b in the mean value
theorem when you cite it to solve this exercise will not be the a and b given
here. It may help to review the definition of “strictly increasing.”)
1.2.4. For the extreme value theorem, the intermediate value theorem, and
the mean value theorem, give examples to show that weakening the hypotheses
of the theorem gives rise to examples for which the conclusion of the theorem
fails.
1.3 Taylor’s Theorem

p(a) = f (a), p′ (a) = f ′ (a), p′′ (a) = f ′′ (a), ..., p(n) (a) = f (n) (a)?
Tn (x) = 1 + x + x²/2 + x³/3! + ⋯ + x^n/n! = ∑_{k=0}^{n} x^k/k!.
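To see how fast these polynomials close in on their target, here is a small numerical sketch (ours), taking the function in question to be eˣ at a = 0 as in the surrounding discussion:

```python
from math import exp, factorial

def T(n, x):
    # T_n(x) = sum_{k=0}^{n} x^k / k!
    return sum(x ** k / factorial(k) for k in range(n + 1))

for n in (2, 4, 8, 12):
    print(n, T(n, 1.0), exp(1.0) - T(n, 1.0))
# The error shrinks roughly like 1/(n + 1)!.
```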
Recall that the second question is how well the polynomial Tn (x) approx-
imates f (x) for x ≠ a. Thus it is a question about the difference f (x) − Tn (x).
Giving this quantity its own name is useful.
Ik (x) = ∫_{x1=a}^{x} ∫_{x2=a}^{x1} ⋯ ∫_{xk=a}^{x_{k−1}} dxk ⋯ dx2 dx1 .
The method and pattern are clear, and the answer in general is
Ik (x) = (1/k!)(x − a)^k ,  k = 0, 1, 2, . . . .
Note that this is part of the kth term (f (k) (a)/k!)(x − a)k of the Taylor
polynomial, the part that makes no reference to the function f . That is,
f (k) (a)Ik (x) is the kth term of the Taylor polynomial for k = 0, 1, 2, . . . .
With the formula for Ik (x) in hand, we return to using the fundamental
theorem of integral calculus to study the remainder Rn (x), the function f (x)
minus its nth-degree Taylor polynomial Tn (x). According to the fundamental
theorem,
f (x) = f (a) + ∫_{a}^{x} f ′(x1 ) dx1 .
That is, f (x) is equal to the constant term of the Taylor polynomial plus an
integral,
f (x) = T0 (x) + ∫_{a}^{x} f ′(x1 ) dx1 .
By the fundamental theorem again, the integral is in turn

∫_{a}^{x} f ′(x1 ) dx1 = ∫_{a}^{x} (f ′(a) + ∫_{a}^{x1} f ′′(x2 ) dx2 ) dx1 .

The first term of the outer integral is f ′(a)I1 (x), giving the first-order term
of the Taylor polynomial and leaving a doubly nested integral,

f (x) = T1 (x) + ∫_{a}^{x} ∫_{a}^{x1} f ′′(x2 ) dx2 dx1 .

By the fundamental theorem once more,

∫_{a}^{x} ∫_{a}^{x1} f ′′(x2 ) dx2 dx1 = (f ′′(a)/2)(x − a)² + ∫_{a}^{x} ∫_{a}^{x1} ∫_{a}^{x2} f ′′′(x3 ) dx3 dx2 dx1 ,

and the first term of the outer integral is f ′′(a)I2 (x), giving the second-order
term of the Taylor polynomial and leaving a triply nested integral.
Continuing this process through n iterations shows that f (x) is Tn (x) plus
an (n + 1)-fold iterated integral,

f (x) = Tn (x) + ∫_{a}^{x} ∫_{a}^{x1} ⋯ ∫_{a}^{xn} f^(n+1)(x_{n+1}) dx_{n+1} ⋯ dx2 dx1 .

Suppose now that a ≤ x, and let m and M be bounds with m ≤ f^(n+1)(t) ≤ M
for all t ∈ [a, x]. Running these bounds through the iterated integral and using
the formula for Ik (x) gives

m (x − a)^{n+1}/(n + 1)! ≤ Rn (x) ≤ M (x − a)^{n+1}/(n + 1)!.    (1.2)
By the extreme value theorem, m and M can be taken to be the minimum
and maximum values of f^(n+1) on [a, x]. Consider the function

g ∶ [a, x] Ð→ R,  g(t) = f^(n+1)(t) (x − a)^{n+1}/(n + 1)!.

That is, since there exist values tm and tM in [a, x] such that f^(n+1)(tm ) = m
and f^(n+1)(tM ) = M , the result (1.2) of our calculation can be rephrased as

g(tm ) ≤ Rn (x) ≤ g(tM ).

Since g is continuous, the intermediate value theorem says that g(c) = Rn (x)
for some c between tm and tM . That is,

Rn (x) = f^(n+1)(c) (x − a)^{n+1}/(n + 1)!

for some c between a and x.
f̃ ∶ −I Ð→ R,  f̃(−x) = f (x).
Since f̃ = f ○ neg, where neg is the negation function, a small exercise with
the chain rule shows that f̃ has the same differentiability as f , with

f̃^(k)(−x) = (−1)^k f^(k)(x)  for k = 0, 1, . . . , n + 1.

Since −x > −a, the case of Taylor’s theorem already proved applies to f̃ at −a,

f̃(−x) = T̃n (−x) + R̃n (−x),

where

T̃n (−x) = ∑_{k=0}^{n} (f̃^(k)(−a)/k!) (−x − (−a))^k

and

R̃n (−x) = (f̃^(n+1)(−c)/(n + 1)!) (−x − (−a))^{n+1}  for some −c between −a and −x.
But f̃(−x) = f (x), and T̃n (−x) is precisely the desired Taylor polynomial Tn (x),

T̃n (−x) = ∑_{k=0}^{n} (f̃^(k)(−a)/k!) (−x − (−a))^k
        = ∑_{k=0}^{n} ((−1)^k f^(k)(a)/k!) (−1)^k (x − a)^k = ∑_{k=0}^{n} (f^(k)(a)/k!) (x − a)^k = Tn (x),

and similarly the remainder is the desired one,

R̃n (−x) = (f^(n+1)(c)/(n + 1)!) (x − a)^{n+1}  for some c between a and x.
Thus we obtain the statement of Taylor’s theorem in the case x < a as well.
Whereas our proof of Taylor’s theorem relies primarily on the fundamental
theorem of integral calculus, and a similar proof relies on repeated integration
by parts (Exercise 1.3.6), many proofs rely instead on the mean value theorem.
Our proof neatly uses three different mathematical techniques for the three
different parts of the argument:
• To find the Taylor polynomial Tn (x), we differentiated repeatedly, using
a substitution at each step to determine a coefficient.
• To get a precise (if unwieldy) expression for the remainder Rn (x) = f (x) −
Tn (x), we integrated repeatedly, using the fundamental theorem of integral
calculus at each step to produce a term of the Taylor polynomial.
• To express the remainder in a more convenient form, we used the extreme
value theorem and then the intermediate value theorem once each. These
foundational theorems are not results from calculus but (as we will discuss
in Section 2.4) from an area of mathematics called topology.
The expression for Rn (x) given in Theorem 1.3.3 is called the Lagrange
form of the remainder. Other expressions for Rn (x) exist as well. Whatever
form is used for the remainder, it should be something that we can estimate
by bounding its magnitude.
For example, we use Taylor’s theorem to estimate ln(1.1) by hand to within
1/500 000. Let f (x) = ln(1+x) on (−1, ∞), and let a = 0. Compute the following
table:
k      f^(k)(x)                           f^(k)(0)/k!
0      ln(1 + x)                          0
1      1/(1 + x)                          1
2      −1/(1 + x)²                        −1/2
3      2/(1 + x)³                         1/3
4      −3!/(1 + x)⁴                       −1/4
⋮       ⋮                                  ⋮
n      (−1)^{n−1}(n − 1)!/(1 + x)^n       (−1)^{n−1}/n
n+1    (−1)^n n!/(1 + x)^{n+1}
Next, read off from the table that for n ≥ 1, the nth-degree Taylor polynomial
is
Tn (x) = x − x²/2 + x³/3 − ⋯ + (−1)^{n−1} x^n/n = ∑_{k=1}^{n} (−1)^{k−1} x^k/k,
and the remainder is
Rn (x) = (−1)^n x^{n+1} / ((1 + c)^{n+1} (n + 1))  for some c between 0 and x.
This expression for the remainder may be a bit much to take in, because
it involves three variables: the point x at which we are approximating the
logarithm, the degree n of the Taylor polynomial that is providing the ap-
proximation, and the unknown value c in the error term. But we are in-
terested in x = 0.1 in particular (since we are approximating ln(1.1) using
f (x) = ln(1 + x)), so that the Taylor polynomial specializes to

Tn (0.1) = 0.1 − (0.1)²/2 + (0.1)³/3 − ⋯ + (−1)^{n−1}(0.1)^n/n,

and the absolute remainder is

∣Rn (0.1)∣ = (0.1)^{n+1} / ((1 + c)^{n+1} (n + 1))  for some c between 0 and 0.1.
Now the symbol x is gone. Next, note that although we don’t know the value
of c, the smallest possible value of the quantity (1 + c)n+1 in the denominator
of the absolute remainder is 1, because c ≥ 0. And since this value occurs in
the denominator, it lets us write the greatest possible value of the absolute
remainder with no reference to c. That is,
∣Rn (0.1)∣ ≤ (0.1)^{n+1}/(n + 1),
and the symbol c is gone as well. The only remaining variable is n, and the
goal is to approximate ln(1.1) to within 1/500 000. Set n = 4 in the previous
display to get
∣R4 (0.1)∣ ≤ 1/500 000.
That is, the fourth-degree Taylor polynomial
T4 (0.1) = 1/10 − 1/200 + 1/3000 − 1/40000,
which numerically is
T4 (0.1) = 0.10000000 . . .
−0.00500000 . . .
+0.00033333 . . .
−0.00002500 . . .
= 0.09530833 . . . ,

agrees with ln(1.1) to within 1/500 000. Any computer should confirm this. The
point here is not that we have ob-
tained impressively many digits of ln(1.1), or that we would want to continue
carrying out such calculations by hand, but that we see how Taylor’s theo-
rem guarantees correct computation to a specified accuracy using only basic
arithmetic.
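The whole hand computation takes only a few lines to verify mechanically; a quick Python check (ours) of the estimate and its guaranteed accuracy:

```python
from math import log

T4 = 1/10 - 1/200 + 1/3000 - 1/40000
print(T4)                        # 0.0953083333...
print(log(1.1) - T4)             # about 1.8e-06
assert abs(log(1.1) - T4) <= 1/500_000   # the bound from Taylor's theorem
```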
Continuing to work with the function f (x) = ln(1 + x) for x > −1, set x = 1
instead to get that for n ≥ 1,
Tn (1) = 1 − 1/2 + 1/3 − ⋯ + (−1)^{n−1} 1/n,
and
∣Rn (1)∣ = 1 / ((1 + c)^{n+1} (n + 1))  for some c between 0 and 1.
Thus ∣Rn (1)∣ ≤ 1/(n + 1), and this goes to 0 as n → ∞. Therefore ln(2) is
expressible as an infinite series,
1 1 1
ln(2) = 1 − + − + ⋯.
2 3 4
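The convergence here is very slow, in keeping with the bound ∣Rn (1)∣ ≤ 1/(n + 1); a quick experiment (ours) makes this vivid:

```python
from math import log

def T(n):
    # partial sum 1 - 1/2 + 1/3 - ... + (-1)^(n-1)/n
    return sum((-1) ** (k - 1) / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, T(n), log(2) - T(n))
# The error shrinks only like 1/(2n): each extra digit costs ten times more terms.
```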
This example illustrates an important general principle:

To show that the Taylor polynomials of a function converge to the
function at x, show that the remainder Rn (x) goes to 0 as n → ∞.

Figure 1.1. ln(1 + x) and some of its Taylor polynomials
For example, for f (x) = e^x at a = 0, the remainder is

∣Rn (x)∣ = e^c ∣x∣^{n+1}/(n + 1)!  for some c between 0 and x.

Since c lies between 0 and x, e^c is at most max{1, e^x}, and so

∣Rn (x)∣ ≤ max{1, e^x} ∣x∣^{n+1}/(n + 1)!.

For each fixed x this goes to 0 as n → ∞, and so the Taylor series converges
to the function everywhere,

e^x = 1 + x + x²/2! + ⋯ + x^n/n! + ⋯ = ∑_{k=0}^{∞} x^k/k!  for every x ∈ R.
The power series here can be used to define e^x, but then obtaining the prop-
erties of e^x depends on the technical fact that a power series can be differenti-
ated term by term in its open interval (or disk if we are working with complex
numbers) of convergence.
The power series in the previous display also allows a small illustration of
the utility of quantifiers. Since it is valid for every real number x, it is valid
with x² in place of x,

e^{x²} = 1 + x² + x⁴/2! + x⁶/3! + ⋯ + x^{2n}/n! + ⋯ = ∑_{k=0}^{∞} x^{2k}/k!  for every x ∈ R.
There is no need here to introduce the function g(x) = e^{x²} and then work
out its Taylor series from scratch: because the expansion of e^x holds for every
real input, substituting x² for x is already justified.
The ratio test shows that this series converges absolutely when ∣x∣ < 1, and
the nth-term test shows that the series diverges when x > 1. The series also
converges at x = 1, as observed earlier. Thus, while the domain of the func-
tion ln(1 + x) is (−1, ∞), the Taylor series has no chance to match the func-
tion outside of (−1, 1]. As for whether the Taylor series matches the function
on (−1, 1], recall the Lagrange form of the remainder,
Rn (x) = (−1)^n x^{n+1} / ((1 + c)^{n+1} (n + 1))  for some c between 0 and x.
By contrast, a Taylor series can converge without converging to its function
away from the expansion point. Define

f ∶ R Ð→ R,  f (x) = e^{−1/x²} if x ≠ 0,  f (x) = 0 if x = 0.
It is possible to show that f is infinitely differentiable and that every derivative
of f at 0 is 0. That is, f (k) (0) = 0 for k = 0, 1, 2, . . . . Consequently, the Taylor
series for f at 0 is
T (x) = 0 + 0x + 0x2 + ⋯ + 0xn + ⋯.
That is, the Taylor series is the zero function, which certainly converges for all
x ∈ R. But the only value of x for which it converges to the original function f
is x = 0. In other words, although this Taylor series converges everywhere,
it fails catastrophically to equal the function it is attempting to match. The
problem is that the function f decays exponentially, and since exponential be-
havior dominates polynomial behavior, any attempt to discern f using poly-
nomials will fail to see it. Figures 1.2 and 1.3 plot f to display its rapid decay.
The first plot is for x ∈ [−25, 25] and the second is for x ∈ [−1/2, 1/2].
Figure 1.2. Rapidly decaying function, wide view
Exercises
1.3.1. (a) Let n ∈ N. What is the (2n+1)st-degree Taylor polynomial T2n+1 (x)
for the function f (x) = sin x at 0? (The reason for the strange indexing here
is that every second term of the Taylor polynomial is 0.) Prove that sin x is
equal to the limit of T2n+1 (x) as n → ∞, similarly to the argument in the text
for ex . Also find T2n (x) for f (x) = cos x at 0, and explain why the argument
for sin shows that cos x is the limit of its even-degree Taylor polynomials as
well.
(b) Many years ago, the author’s high-school physics textbook asserted,
bafflingly, that the approximation sin x ≈ x is good for x up to 8°. Deconstruct.
1.3.2. What is the nth-degree Taylor polynomial Tn (x) for the following func-
tions at 0?
(a) f (x) = arctan x. (This exercise is not just a matter of routine mechan-
ics. One way to proceed involves the geometric series, and another makes use
of the factorization 1 + x2 = (1 − ix)(1 + ix).)
(b) f (x) = (1 + x)^α where α ∈ R. (Although the answer can be written
in a uniform way for all α, it behaves differently when α ∈ N. Introduce the
generalized binomial coefficient symbol

(α k) = α(α − 1)(α − 2)⋯(α − k + 1)/k!  for k ≥ 1,  (α 0) = 1

to describe the answer tidily.)
1.3.3. (a) Further tighten the numerical estimate of ln(1.1) from this section
by reasoning as follows. As n increases, the Taylor polynomials Tn (0.1) add
terms of decreasing magnitude and alternating sign. Therefore T4 (0.1) un-
derestimates ln(1.1). Now that we know this, it is useful to find the smallest
possible value of the remainder (by setting c = 0.1 rather than c = 0 in the for-
mula). Then ln(1.1) lies between T4 (0.1) plus this smallest possible remainder
value and T4 (0.1) plus the largest possible remainder value, obtained in the
section. Supply the numbers, and verify by machine that the tighter estimate
of ln(1.1) is correct.
(b) In Figure 1.1, identify the graphs of T1 through T5 and the graph of ln
near x = 0 and near x = 2.
1.3.4. Working by hand, use the third-degree Taylor polynomial for sin(x)
at 0 to approximate a decimal representation of sin(0.1). Also compute the
decimal representation of an upper bound for the error of the approximation.
Bound sin(0.1) between two decimal representations.
1.3.5. Use a second-degree Taylor polynomial to approximate √4.2. Use Tay-
lor’s theorem to find a guaranteed accuracy of the approximation and thus to
find upper and lower bounds for √4.2.
1.3.6. (a) Another proof of Taylor’s Theorem uses the fundamental theorem
of integral calculus once and then integrates by parts repeatedly. Begin with
the hypotheses of Theorem 1.3.3, and let x ∈ I. By the fundamental theorem,

f (x) = f (a) + ∫_{a}^{x} f ′(t) dt.

Let u = f ′(t) and v = t − x, so that the integral is ∫_{a}^{x} u dv, and integration by
parts gives

f (x) = f (a) + f ′(a)(x − a) − ∫_{a}^{x} f ′′(t)(t − x) dt.
Let u = f ′′ (t) and v = 21 (t − x)2 , so that again the integral is ∫a u dv, and
x
(x − a)2 (t − x)2
f (x) = f (a) + f ′ (a)(x − a) + f ′′ (a) + ∫ f ′′′ (t)
x
dt.
2 a 2
Show that after n steps, the result is
f (x) = Tn (x) + (−1)^n ∫_{a}^{x} f^(n+1)(t)(t − x)^n/n! dt.
Whereas the expression for f (x) − Tn (x) in Theorem 1.3.3 is called the La-
grange form of the remainder, this exercise has derived the integral form
of the remainder. Use the extreme value theorem and the intermediate value
theorem to derive the Lagrange form of the remainder from the integral form.
(b) Use the integral form of the remainder to show that
2 Euclidean Space

2.1 Algebra: Vectors

Euclidean n-space is the set of all n-tuples of real numbers,

Rn = {(x1 , . . . , xn ) ∶ xi ∈ R for i = 1, . . . , n} ,
[Figure: a point p, a vector x drawn as an arrow, and the translated point p + x]
Vector addition is the operation

+ ∶ Rn × Rn Ð→ Rn ,

defined by

(x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ).
For example, (1, 2, 3) + (4, 5, 6) = (5, 7, 9). Note that the meaning of the “+”
sign is now overloaded: on the left of the displayed equality, it denotes the
new operation of vector addition, whereas on the right side it denotes the old
addition of real numbers. The multiple meanings of the plus sign shouldn’t
cause problems, because the meaning of “+” is clear from context, i.e., the
meaning of “+” is clear from whether it sits between vectors or scalars. (An
expression such as “(1, 2, 3) + 4,” with the plus sign between a vector and a
scalar, makes no sense according to our grammar.)
The interpretation of vectors as arrows gives a geometric description of
vector addition, at least in R2 . To add the vectors x and y, draw them as
arrows starting at 0 and then complete the parallelogram P that has x and y
as two of its sides. The diagonal of P starting at 0 is then the arrow depicting
the vector x + y. (See Figure 2.3.) The proof of this is a small argument with
similar triangles, left to the reader as Exercise 2.1.2.
Figure 2.3. The parallelogram law of vector addition
Scalar multiplication is the operation

⋅ ∶ R × Rn Ð→ Rn ,
defined by
a ⋅ (x1 , . . . , xn ) = (ax1 , . . . , axn ).
For example, 2⋅(3, 4, 5) = (6, 8, 10). We will almost always omit the symbol “⋅”
and write ax for a⋅x. With this convention, juxtaposition is overloaded as “+”
was overloaded above, but again this shouldn’t cause problems.
Scalar multiplication of the vector x (viewed as an arrow) by a also has a
geometric interpretation: it simply stretches (i.e., scales) x by a factor of a.
When a is negative, ax turns x around and stretches it in the other direction
by ∣a∣. (See Figure 2.4.)
Figure 2.4. Scalar multiplication as stretching
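The componentwise definitions of the two operations translate directly into code; a tiny sketch (ours):

```python
def vadd(x, y):
    # vector addition in R^n, componentwise
    return tuple(a + b for a, b in zip(x, y))

def smul(a, x):
    # scalar multiplication: scale every entry of x by a
    return tuple(a * c for c in x)

print(vadd((1, 2, 3), (4, 5, 6)))   # (5, 7, 9)
print(smul(2, (3, 4, 5)))           # (6, 8, 10)
print(smul(-3, (1, 0)))             # (-3, 0): turned around and stretched
```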
The other vector space axioms for Rn can be shown similarly, by unwinding
vectors to their coordinates, quoting field axioms coordinatewise, and then
bundling the results back up into vectors (see Exercise 2.1.3). Nonetheless,
the vector space axioms do not perfectly parallel the field axioms, and you
are encouraged to spend a little time comparing the two axiom sets to get a
feel for where they are similar and where they are different (see Exercise 2.1.4).
Note in particular that
For n > 1, Rn is not endowed with vector-by-vector multiplication.
Although one can define vector multiplication on Rn componentwise, this mul-
tiplication does not combine with vector addition to satisfy the field axioms
except when n = 1. The multiplication of complex numbers makes R2 a field,
and in Section 3.10 we will see an interesting noncommutative multiplication
of vectors for R3 , but these are special cases.
One benefit of the vector space axioms for Rn is that they are phrased
intrinsically, meaning that they make no reference to the scalar coordinates
of the vectors involved. Thus, once you use coordinates to establish the vector
space axioms, your vector algebra can be intrinsic thereafter, making it lighter
and more conceptual. Also, in addition to being intrinsic, the vector space
axioms are general. While Rn is the prototypical set satisfying the vector space
axioms, it is by no means the only one. In coming sections we will encounter
other sets V (whose elements may be, for example, functions) endowed with
their own addition, multiplication by elements of a field F , and distinguished
element 0. If the vector space axioms are satisfied with V and F replacing Rn
and R then we say that V is a vector space over F .
The pedagogical point here is that although the similarity between vector
algebra and scalar algebra may initially make vector algebra seem uninspiring,
in fact the similarity is exciting. It makes mathematics easier, because familiar
algebraic manipulations apply in a wide range of contexts. The same symbol-
patterns have more meaning. For example, we use intrinsic vector algebra to
prove a result from Euclidean geometry, that the three medians of a triangle
intersect. (A median is a segment from a vertex to the midpoint of the opposite
edge.) Consider a triangle with vertices x, y, and z, and form the average of
the three vertices,
p = (x + y + z)/3.
This algebraic average will be the geometric center of the triangle, where
the medians meet. (See Figure 2.5.) Indeed, rewrite p as
p = x + (2/3) ((y + z)/2 − x).
The displayed expression for p shows that it is two-thirds of the way from x
along the line segment from x to the average of y and z, i.e., that p lies on
the triangle median from vertex x to side yz. (Again see the figure. The idea
is that (y + z)/2 is being interpreted as the midpoint of y and z, each of these
viewed as a point, while on the other hand, the little mnemonic
Figure 2.5. The three medians of a triangle, meeting at p

The standard basis of Rn is the set

{e1 , e2 , . . . , en }

where

e1 = (1, 0, . . . , 0),  e2 = (0, 1, . . . , 0),  . . . ,  en = (0, 0, . . . , 1).
(Thus each ei is itself a vector, not the ith scalar entry of a vector.) Every
vector x = (x1 , x2 , . . . , xn ) (where the xi are scalar entries) decomposes as
x = (x1 , x2 , . . . , xn )
= (x1 , 0, . . . , 0) + (0, x2 , . . . , 0) + ⋯ + (0, 0, . . . , xn )
= x1 (1, 0, . . . , 0) + x2 (0, 1, . . . , 0) + ⋯ + xn (0, 0, . . . , 1)
= x 1 e1 + x 2 e2 + ⋯ + x n en ,
or more succinctly,
x = ∑_{i=1}^{n} xi ei .    (2.1)
Note that in equation (2.1), x and the ei are vectors, while the xi are scalars.
The equation shows that every x ∈ Rn is expressible as a linear combination
(sum of scalar multiples) of the standard basis vectors. The expression is
unique, for if also x = ∑ni=1 x′i ei for some scalars x′1 , . . . , x′n then the equality
says that x = (x′1 , x′2 , . . . , x′n ), so that x′i = xi for i = 1, . . . , n.
(The reason that the geometric-sounding word linear is used here and
elsewhere in this chapter to describe properties having to do with the alge-
braic operations of addition and scalar multiplication will be explained in
Chapter 3.)
The standard basis is handy in that it is a finite set of vectors from which
each of the infinitely many vectors of Rn can be obtained in exactly one way
as a linear combination. But it is not the only such set, nor is it always the
optimal one.
For example, the set {f1 , f2 } = {(1, 1), (1, −1)} is a basis of R2 . To see this,
consider an arbitrary vector (x, y) ∈ R2 . This vector is expressible as a linear
combination of f1 and f2 if and only if there are scalars a and b such that

(x, y) = a f1 + b f2 .

Since f1 = (1, 1) and f2 = (1, −1), this vector equation is equivalent to a pair
of scalar equations,

x = a + b,
y = a − b.

Adding and subtracting these equations shows that the unique solution is
a = (x + y)/2 and b = (x − y)/2, so every (x, y) is a linear combination of f1
and f2 in exactly one way.
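Because the system is linear, any numerical solver recovers the coordinates; a sketch using numpy (an assumed dependency, not something the text introduces):

```python
import numpy as np

M = np.array([[1.0,  1.0],
              [1.0, -1.0]])       # columns are f1 = (1, 1) and f2 = (1, -1)
x, y = 3.0, 7.0
a, b = np.linalg.solve(M, np.array([x, y]))
print(a, b)                        # 5.0 -2.0
print(a == (x + y) / 2, b == (x - y) / 2)   # matches the closed-form solution
```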
Exercises
2.1.1. Write down any three specific nonzero vectors u, v, w from R3 and any
two specific nonzero scalars a, b from R. Compute u + v, aw, b(v + w), (a + b)u,
u + v + w, abw, and the additive inverse to u.
2.1.3. Verify that Rn satisfies vector space axioms (A2), (A3), (D1).
2.1.4. Are all the field axioms used in verifying that Euclidean space satisfies
the vector space axioms?
2.1.5. Show that 0 is the unique additive identity in Rn . Show that each vector
x ∈ Rn has a unique additive inverse, which can therefore be denoted −x. (And
it follows that vector subtraction can now be defined,

x − y = x + (−y).)
2.1.7. Show the uniqueness of the additive identity and the additive inverse
using only (A1), (A2), (A3). (This is tricky; the opening pages of some books
on group theory will help.)
How many elements do you think a basis for Rn must have? Give (without
proof) geometric descriptions of all bases of R2 , of R3 .
Cn = {(z1 , . . . , zn ) ∶ zi ∈ C for i = 1, . . . , n} ,
and endow it with addition and scalar multiplication defined by the same
formulas as for Rn . You may take for granted that under these definitions, Cn
satisfies the vector space axioms with scalar multiplication by scalars from R,
and also Cn satisfies the vector space axioms with scalar multiplication by
scalars from C. That is, using language that was introduced briefly in this
section, Cn can be viewed as a vector space over R and also, separately, as a
vector space over C. Give a basis for each of these vector spaces.
Before continuing, a few comments about how to work with these notes may
be helpful.
2.2 Geometry: Length and Angle
The inner product is the function

⟨ , ⟩ ∶ Rn × Rn Ð→ R,

defined by

⟨(x1 , . . . , xn ), (y1 , . . . , yn )⟩ = x1 y1 + ⋯ + xn yn .
For example,
⟨(1, 1, . . . , 1), (1, 2, . . . , n)⟩ = n(n + 1)/2,
⟨x, ej ⟩ = xj where x = (x1 , . . . , xn ) and j ∈ {1, . . . , n},
⟨ei , ej ⟩ = δij (this means 1 if i = j, 0 otherwise).
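The first of these examples is a one-line computation to confirm; a throwaway check (ours):

```python
def inner(x, y):
    # <x, y> = x1*y1 + ... + xn*yn
    return sum(a * b for a, b in zip(x, y))

n = 100
print(inner([1] * n, range(1, n + 1)))   # 5050
print(n * (n + 1) // 2)                  # 5050 as well
```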
(IP1) The inner product is positive definite: ⟨x, x⟩ ≥ 0 for all x ∈ Rn , with
equality if and only if x = 0.
(IP2) The inner product is symmetric: ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ Rn .
(IP3) The inner product is bilinear:

⟨ax + bx′ , y⟩ = a⟨x, y⟩ + b⟨x′ , y⟩,  ⟨x, ay + by ′ ⟩ = a⟨x, y⟩ + b⟨x, y ′ ⟩

for all a, b ∈ R, x, x′ , y, y ′ ∈ Rn .
The modulus of a vector x ∈ Rn is defined as ∣x∣ = √⟨x, x⟩. Thus the modulus
is defined in terms of the inner product, rather than by its own formula. The
inner product formula shows that the modulus formula is

∣(x1 , . . . , xn )∣ = √(x1² + ⋯ + xn²),
Like other symbols, the absolute value signs are now overloaded, but their
meaning can be inferred from context, as in property (Mod2). When n is 1,
2, or 3, the modulus ∣x∣ gives the distance from 0 to the point x, or the length
of x viewed as an arrow. (See Figure 2.6.)
Figure 2.6. The modulus as length
The following relation between inner product and modulus will help to
show that distance in Rn behaves as it should, and that angle in Rn makes
sense. Since the relation is not obvious, its proof is a little subtle.

Theorem 2.2.5 (Cauchy–Schwarz inequality). For all x, y ∈ Rn ,

∣⟨x, y⟩∣ ≤ ∣x∣ ∣y∣,

with equality if and only if one of x, y is a scalar multiple of the other.
Note that the absolute value signs mean different things on each side of
the Cauchy–Schwarz inequality. On the left side, the quantities x and y are
vectors, their inner product ⟨x, y⟩ is a scalar, and ∣⟨x, y⟩∣ is its scalar absolute
value, while on the right side, ∣x∣ and ∣y∣ are the scalar absolute values of
vectors, and ∣x∣ ∣y∣ is their product. That is, the Cauchy–Schwarz inequality
says:
The size of the product is at most the product of the sizes.
The Cauchy–Schwarz inequality can be written out in coordinates if we
temporarily abandon the principle that we should avoid reference to formulas,
(∑_i xi yi )² ≤ (∑_i xi²)(∑_j yj²),
where the indices of summation run from 1 to n. Expand the square to get

∑_i xi² yi² + ∑_{i≠j} xi yi xj yj ≤ ∑_{i,j} xi² yj².

Subtracting the common terms with i = j from both sides leaves

∑_{i≠j} xi yi xj yj ≤ ∑_{i≠j} xi² yj²,

or

∑_{i≠j} (xi² yj² − xi yi xj yj ) ≥ 0.

Rather than sum over all pairs (i, j) with i ≠ j, sum over the pairs with
i < j, collecting the (i, j)-term and the (j, i)-term for each such pair, and the
previous inequality becomes

∑_{i<j} (xi² yj² + xj² yi² − 2 xi yj xj yi ) ≥ 0.

Thus the desired inequality has reduced to a true inequality,

∑_{i<j} (xi yj − xj yi )² ≥ 0,

which holds because squares are nonnegative.
So the main proof is done, although there is still the question of when equality
holds.
But surely the previous paragraph is not the graceful way to argue. The
computation draws on the minutiae of the formulas for the inner product and
the modulus, rather than using their properties. It is uninformative, making
the inequality look like an accident of computation. Instead, argue from the
inner product properties: for every a ∈ R, positive definiteness and bilinearity
give

0 ≤ ∣ax − y∣² = ⟨ax − y, ax − y⟩ = a²∣x∣² − 2a⟨x, y⟩ + ∣y∣².
View the right side as a quadratic polynomial in the scalar variable a, where
the scalar coefficients of the polynomial depend on the generic but fixed vec-
tors x and y,
f (a) = ∣x∣²a² − 2⟨x, y⟩a + ∣y∣².
We have shown that f (a) is always nonnegative, so f has at most one root.
Thus by the quadratic formula its discriminant is nonpositive,

4⟨x, y⟩² − 4∣x∣²∣y∣² ≤ 0,

and the Cauchy–Schwarz inequality ∣⟨x, y⟩∣ ≤ ∣x∣ ∣y∣ follows. Equality holds
exactly when the quadratic polynomial f (a) = ∣ax − y∣² has a root a, i.e.,
exactly when y = ax for some a ∈ R. ⊔⊓
Geometrically, the condition for equality in Cauchy–Schwarz is that the
vectors x and y, viewed as arrows at the origin, are parallel, though perhaps
pointing in opposite directions. A geometrically conceived proof of Cauchy–
Schwarz is given in Exercise 2.2.15 to complement the algebraic argument
that has been given here.
The Cauchy–Schwarz inequality shows that the modulus function satisfies
the triangle inequality.
Theorem 2.2.6 (Triangle inequality). For all x, y ∈ Rn ,
∣x + y∣ ≤ ∣x∣ + ∣y∣,
with equality if and only if one of x, y is a nonnegative scalar multiple of the
other.

Proof. Compute that
∣x + y∣² = ⟨x + y, x + y⟩
        = ∣x∣² + 2⟨x, y⟩ + ∣y∣²      by bilinearity
        ≤ ∣x∣² + 2∣x∣∣y∣ + ∣y∣²      by Cauchy–Schwarz
        = (∣x∣ + ∣y∣)²,
proving the inequality. Equality holds exactly when ⟨x, y⟩ = ∣x∣∣y∣, or equiva-
lently when ∣⟨x, y⟩∣ = ∣x∣∣y∣ and ⟨x, y⟩ ≥ 0. These hold when one of x, y is a
scalar multiple of the other and the scalar is nonnegative. ⊔⊓
While the Cauchy–Schwarz inequality says that the size of the product is
at most the product of the sizes, the triangle inequality says:

The size of the sum is at most the sum of the sizes.

Figure 2.7. Sides of a triangle
The only obstacle to generalizing the basic triangle inequality in this fashion
is notation. The argument can’t use the symbol n to denote the number of
vectors, because n already denotes the dimension of the Euclidean space where
we are working; and furthermore, the vectors can’t be denoted with subscripts
since a subscript denotes a component of an individual vector. Thus, for now
we are stuck writing something like

∣x(1) + ⋯ + x(k) ∣ ≤ ∣x(1) ∣ + ⋯ + ∣x(k) ∣,  x(1) , . . . , x(k) ∈ Rn ,

or

∣∑_{i=1}^{k} x(i) ∣ ≤ ∑_{i=1}^{k} ∣x(i) ∣,  x(1) , . . . , x(k) ∈ Rn .
As our work with vectors becomes more intrinsic, vector entries will demand
less of our attention, and we will be able to denote vectors by subscripts. The
notation-change will be implemented in the next section.
For every vector x = (x1 , . . . , xn ) ∈ Rn , useful bounds on the modulus ∣x∣
in terms of the scalar absolute values ∣xi ∣ are as follows: for each j ∈ {1, . . . , n},

∣xj ∣ ≤ ∣x∣ ≤ ∑_{i=1}^{n} ∣xi ∣.
The Cauchy–Schwarz inequality also lets us define the angle between two
nonzero vectors in terms of the inner product. If x and y are nonzero vectors
in Rn , define their angle θx,y by the condition
cos θx,y = ⟨x, y⟩/(∣x∣ ∣y∣),  0 ≤ θx,y ≤ π.    (2.2)
q − y ⊥ x and q − x ⊥ y.
We want to show that q also lies on the third altitude, i.e., that
q ⊥ x − y.
⟨q − y, x⟩ = 0 and ⟨q − x, y⟩ = 0  Ô⇒  ⟨q, x − y⟩ = 0.
Since the inner product is linear in each of its arguments, a further rephrasing
is that we want to show that
⟨q, x⟩ = ⟨y, x⟩ and ⟨q, y⟩ = ⟨x, y⟩  Ô⇒  ⟨q, x⟩ = ⟨q, y⟩.
And this is immediate because the inner product is symmetric: ⟨q, x⟩ and ⟨q, y⟩
both equal ⟨x, y⟩, and so they equal each other as desired. The point q where
the three altitudes meet is called the orthocenter of the triangle. In general,
the orthocenter of a triangle is not the geometric center that we considered
in the previous section.
Figure 2.8. Three altitudes of a triangle
Exercises
2.2.1. Let x = (√3/2, −1/2, 0), y = (1/2, √3/2, 1), z = (1, 1, 1). Compute ⟨x, x⟩,
⟨x, y⟩, ⟨y, z⟩, ∣x∣, ∣y∣, ∣z∣, θx,y , θy,e1 , θz,e2 .
2.2.2. Show that the points x = (2, −1, 3, 1), y = (4, 2, 1, 4), z = (1, 3, 6, 1) form
the vertices of a triangle in R4 with two equal angles.
2.2.5. Use the inner product properties and the definition of the modulus in
terms of the inner product to prove the modulus properties.
2.2.6. In the text, the modulus is defined in terms of the inner product. Prove
that this can be turned around by showing that for every x, y ∈ Rn ,
⟨x, y⟩ = (∣x + y∣² − ∣x − y∣²)/4.
2.2.7. Prove the full triangle inequality: for every x, y ∈ Rn ,

∣ ∣x∣ − ∣y∣ ∣ ≤ ∣x ± y∣ ≤ ∣x∣ + ∣y∣.
Do not do this by writing three more variants of the proof of the triangle in-
equality, but by substituting suitably into the basic triangle inequality, which
is already proved.
2.2.10. Working in R2 , depict the nonzero vectors x and y as arrows from the
origin and depict x − y as an arrow from the endpoint of y to the endpoint
of x. Let θ denote the angle (in the usual geometric sense) between x and y.
Use the law of cosines to show that
cos θ = ⟨x, y⟩/(∣x∣ ∣y∣),
so that our notion of angle agrees with the geometric one, at least in R2 .
2.2.12. Prove that two nonzero vectors x, y are orthogonal if and only if
∣x + y∣² = ∣x∣² + ∣y∣².
2.2.14. Use vectors to show that every angle inscribed in a semicircle is right.
2.2.15. Let x and y be vectors, with x nonzero. Define the parallel component
of y along x and the normal component of y to x to be
y(∥x) = (⟨x, y⟩/∣x∣²) x  and  y(⊥x) = y − y(∥x) .
(a) Show that y = y(∥x) + y(⊥x) ; show that y(∥x) is a scalar multiple of x; show
that y(⊥x) is orthogonal to x. Show that the decomposition of y as a sum of
vectors parallel and perpendicular to x is unique. Draw an illustration.
(b) Show that
∣y∣² = ∣y(∥x) ∣² + ∣y(⊥x) ∣².
What theorem from classical geometry does this encompass?
(c) Explain why it follows from (b) that
∣y(∥x) ∣ ≤ ∣y∣,
x′1 = x1
x′2 = x2 − (x2 )(∥x′1 )
x′3 = x3 − (x3 )(∥x′2 ) − (x3 )(∥x′1 )
⋮
x′n = xn − (xn )(∥x′n−1 ) − ⋯ − (xn )(∥x′1 ) .
(a) What is the result of applying the Gram–Schmidt process to the vectors
x1 = (1, 0, 0), x2 = (1, 1, 0), and x3 = (1, 1, 1)?
(b) Returning to the general case, show that x′1 , . . . , x′n are pairwise or-
thogonal and that each x′j has the form

x′j = aj1 x1 + aj2 x2 + ⋯ + aj,j−1 xj−1 + xj

for some scalars aj1 , . . . , aj,j−1 .
Thus every linear combination of the new {x′j } is also a linear combination
of the original {xj }. The converse is also true and will be shown in Exer-
cise 3.3.13.
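The Gram–Schmidt process is entirely mechanical, so it is natural to code; here is a sketch for part (a) (ours, with numpy assumed), implementing the parallel component y(∥x) from Exercise 2.2.15:

```python
import numpy as np

def parallel(y, x):
    # y(||x) = (<x, y>/|x|^2) x
    return (np.dot(x, y) / np.dot(x, x)) * x

def gram_schmidt(vectors):
    # x'_j is x_j minus its components parallel to the earlier x'_i
    out = []
    for x in vectors:
        for xp in out:
            x = x - parallel(x, xp)
        out.append(x)
    return out

xs = [np.array(v, dtype=float) for v in [(1, 0, 0), (1, 1, 0), (1, 1, 1)]]
for v in gram_schmidt(xs):
    print(v)
```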
2.3 Analysis: Continuous Mappings

The mapping

f ∶ R2 Ð→ R2
defined by
f (x, y) = (x² − y², 2xy)

takes the real and imaginary parts of a complex number z = x + iy and returns
the real and imaginary parts of z².
numbers, this means that each output point has modulus equal to the square
of the modulus of the input point and has angle equal to twice the angle of
the input point. Make sure that you see how this is shown in Figure 2.9.
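The claim about moduli and angles is easy to test numerically; a small sketch (ours) comparing f with honest complex squaring:

```python
import cmath

def f(x, y):
    # real and imaginary parts of (x + iy)^2
    return (x * x - y * y, 2 * x * y)

z = complex(1.2, 0.5)
print(f(z.real, z.imag), z * z)                   # same pair of numbers
print(abs(z) ** 2, abs(z * z))                    # modulus squares
print(2 * cmath.phase(z), cmath.phase(z * z))     # angle doubles
```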
Mappings expressed by formulas may be undefined at certain points (e.g.,
f (x) = 1/∣x∣ is undefined at 0), so we need to restrict their domains. For a given
dimension n, a given set A ⊂ Rn , and a second dimension m, let M(A, Rm )
denote the set of all mappings f ∶ A Ð→ Rm . This set forms a vector space
over R (whose points are functions) under the operations
Figure 2.9. The complex-squaring mapping

+ ∶ M(A, Rm ) × M(A, Rm ) Ð→ M(A, Rm ),
defined by
(f + g)(x) = f (x) + g(x) for all x ∈ A,
and
⋅ ∶ R × M(A, Rm ) Ð→ M(A, Rm ),
defined by
(a ⋅ f )(x) = a ⋅ f (x) for all x ∈ A.
As usual, “+” and “⋅” are overloaded: on the left they denote operations
on M(A, Rm ), while on the right they denote the operations on Rm de-
fined in Section 2.1. Also as usual, the “⋅” is generally omitted. The origin
in M(A, Rm ) is the zero mapping, 0 ∶ A Ð→ Rm , defined by

0(x) = 0m  for all x ∈ A.
For example, to verify that M(A, Rm ) satisfies (A1), consider any mappings
f, g, h ∈ M(A, Rm ). For every x ∈ A,

((f + g) + h)(x) = (f + g)(x) + h(x) = (f (x) + g(x)) + h(x)
                 = f (x) + (g(x) + h(x)) = f (x) + (g + h)(x) = (f + (g + h))(x),

and since x is arbitrary, (f + g) + h = f + (g + h).
That is, a sequence is null if for every ε > 0, all but finitely many terms of the
sequence lie within distance ε of 0n .
Note that

∣ ∣x∣ ∣ = ∣x∣,  x ∈ Rn ,
and so a vector sequence {xν } is null if and only if the scalar sequence {∣xν ∣}
is null.
Proof. By the observation just before the lemma, it suffices to show that
{∣(x1,ν , . . . , xn,ν )∣} is null if and only if each {∣xj,ν ∣} is null. The size bounds
give for every j ∈ {1, . . . , n} and every ν,
∣xj,ν ∣ ≤ ∣(x1,ν , . . . , xn,ν )∣ ≤ ∑_{i=1}^{n} ∣xi,ν ∣.
If {∣(x1,ν , . . . , xn,ν )∣} is null then by the first inequality, so is each {∣xj,ν ∣}. On
the other hand, if each {∣xj,ν ∣} is null then so is {∑ni=1 ∣xi,ν ∣}, and thus by the
second inequality, {∣(x1,ν , . . . , xn,ν )∣} is null as well. ⊔⊓
For example, the modulus function

∣ ∣ ∶ Rn Ð→ R

is continuous: if a sequence {xν } converges to p, then the full triangle in-
equality gives

∣ ∣xν ∣ − ∣p∣ ∣ ≤ ∣xν − p∣.
Since the right side is the νth term of a null sequence, so is the left, giving
the result.
For another example, let a ∈ Rn be any fixed vector and consider the
function defined by taking the inner product of this vector with other vectors,
T ∶ Rn Ð→ R,  T (x) = ⟨a, x⟩.

This function is continuous: for every p ∈ Rn and every sequence {xν }
converging to p,
∣T (xν ) − T (p)∣ = ∣⟨a, xν ⟩ − ⟨a, p⟩∣ = ∣⟨a, xν − p⟩∣ ≤ ∣a∣ ∣xν − p∣.
Since ∣a∣ is a constant, the right side is the νth term of a null sequence,
whence so is the left, and the proof is complete. We will refer to this example
in Section 3.1. Also, note that as a special case of this example we may take
any j ∈ {1, . . . , n} and set the fixed vector a to ej , showing that the jth
coordinate function map,
πj ∶ Rn Ð→ R, πj (x1 , . . . , xn ) = xj ,
is continuous.
Proposition 2.3.7 (Vector space properties of continuity). Let A be a
subset of Rn , let f, g ∶ A Ð→ Rm be continuous mappings, and let c ∈ R. Then
the sum and the scalar multiple mappings
f + g, cf ∶ A Ð→ Rm

are continuous.

Proposition 2.3.8 (Persistence of continuity under composition). Let
A be a subset of Rn and let B be a subset of Rm . Let f ∶ A Ð→ B and
g ∶ B Ð→ Rℓ be continuous mappings. Then the composition

g ○ f ∶ A Ð→ Rℓ

is continuous.
The proof is Exercise 2.3.7.
Let A be a subset of Rn . Every mapping f ∶ A Ð→ Rm decomposes as m
functions f1 , . . . , fm , with each fi ∶ A Ð→ R, by the formula

f (x) = (f1 (x), . . . , fm (x)).
The previous example was actually fairly simple in that we only needed
to study f (x, y) as (x, y) approached 0 along straight lines. Consider the
function g ∶ R2 Ð→ R defined by
g(x, y) = x²y/(x⁴ + y²)  if (x, y) ≠ 0,
g(x, y) = b              if (x, y) = 0.
Along the parabola y = x², however, g takes the constant value 1/2 away from
the origin: for a null sequence {xν } and yν = xν²,

g(xν , yν ) = g(xν , xν²) = xν⁴/(xν⁴ + xν⁴) = 1/2.
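A short experiment shows both behaviors at once; a sketch (ours), taking the value b = 0 just for concreteness:

```python
def g(x, y, b=0.0):
    return x * x * y / (x ** 4 + y * y) if (x, y) != (0.0, 0.0) else b

for t in (0.1, 0.01, 0.001):
    print(g(t, t), g(t, t * t))
# Along the line y = x the values tend to 0, but along y = x^2 they are always 0.5.
```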
Proof. Assume that the displayed statement in the proposition fails for ev-
ery ε > 0. Then in particular, it fails for ε = 1/ν for ν = 1, 2, 3, . . . . So there is
a sequence {xν } in A such that
Exercises
Briefly explain how this section has shown that C(A, Rm ) is a vector space.
Do the inner product properties (IP1), (IP2), and (IP3) (see Proposition 2.2.2)
hold for this inner product on C([0, 1], R)? How much of the material from Sec-
tion 2.2 on the inner product and modulus in Rn carries over to C([0, 1], R)?
Express the Cauchy–Schwarz inequality as a relation between integrals.
2.3.6. Use the definition of continuity and the componentwise nature of con-
vergence to prove the componentwise nature of continuity.
Is f continuous?
2.4 Topology: Compact Sets and Continuity

Proof. The hypothesis that {xν } converges to p means that for every given
ε > 0, only finitely many sequence-terms xν lie outside the ball B(p, ε). Con-
sequently, only finitely many subsequence-terms xνk lie outside B(p, ε), which
is to say that {xνk } converges to p. ⊔⊓
Since the static notions of closed and bounded are reasonably intuitive, we
can usually recognize compact sets on sight. But it is not obvious from how
compact sets look that they are related to continuity. So our program now
has two steps: first, combine Proposition 2.4.5 and the Bolzano–Weierstrass
property to characterize compact sets in terms of sequences, and second, use
the characterization to prove that compactness is preserved by continuous
mappings.
Again, the sets in Theorem 2.4.14 are defined with no direct reference to
sequences, but the theorem is proved entirely using sequences. The point is
that with the theorem proved, we can easily see that it applies in particular
contexts without having to think any longer about the sequences that were
used to prove it.
A corollary of Theorem 2.4.14 generalizes the theorem that was quoted to
begin the section:
Exercises
2.4.1. Are the following subsets of Rn closed, bounded, compact?
(a) B(0, 1),
(b) {(x, y) ∈ R2 ∶ y − x2 = 0},
(c) {(x, y, z) ∈ R3 ∶ x2 + y 2 + z 2 − 1 = 0},
(d) {x ∶ f (x) = 0m }, where f ∈ M(Rn , Rm ) is continuous (this generalizes
(b) and (c)),
(e) Qn where Q denotes the rational numbers,
(f) {(x1 , . . . , xn ) ∶ x1 + ⋯ + xn > 0}.
2.4.2. Give a set A ⊂ Rn and limit point b of A such that b ∉ A. Give a set
A ⊂ Rn and a point a ∈ A such that a is not a limit point of A.
2.4.3. Let A be a closed subset of Rn and let f ∈ M(A, Rm ). Define the
graph of f to be
G(f ) = {(a, f (a)) ∶ a ∈ A},
a subset of Rn+m . Show that if f is continuous then its graph is closed.
2.4.4. Prove the closed set properties: (1) the empty set ∅ and the full space
Rn are closed subsets of Rn ; (2) every intersection of closed sets is closed; (3)
every finite union of closed sets is closed.
2.4.5. Prove that every ball B(p, ε) is bounded in Rn .
2.4.6. Show that A is a bounded subset of Rn if and only if for each j ∈
{1, . . . , n}, the jth coordinates of its points form a bounded subset of R.
2.4.7. Show by example that a closed set need not satisfy the sequential char-
acterization of bounded sets, and that a bounded set need not satisfy the
sequential characterization of closed sets.
2.4.8. Show by example that the continuous image of a closed set need not
be closed, that the continuous image of a closed set need not be bounded,
that the continuous image of a bounded set need not be closed, and that the
continuous image of a bounded set need not be bounded.
2.4.9. A subset A of Rn is called discrete if each of its points is isolated.
(Recall that the term isolated was defined in this section.) Show or take for
granted the (perhaps surprising at first) fact that every mapping whose do-
main is discrete must be continuous. Is discreteness a topological property?
That is, need the continuous image of a discrete set be discrete?
2.4.10. A subset A of Rn is called path-connected if for every two points
x, y ∈ A, there is a continuous mapping
γ ∶ [0, 1] Ð→ A
such that γ(0) = x and γ(1) = y. (This γ is the path that connects x and y.)
Draw a picture to illustrate the definition of a path-connected set. Prove that
path-connectedness is a topological property.
3 Linear Mappings and Their Matrices
3.1 Linear Mappings

Definition 3.1.1. The mapping T ∶ Rn Ð→ Rm is linear if

T (∑_{i=1}^{k} αi xi ) = ∑_{i=1}^{k} αi T (xi )

for all positive integers k, all real numbers α1 through αk , and all vectors x1
through xk .
The reader may find this definition discomfiting. It does not say what form
a linear mapping takes, and this raises some immediate questions. How are we
to recognize linear mappings when we encounter them? Or are we supposed to
think about them without knowing what they look like? For that matter, are
there even any linear mappings to encounter? Another troublesome aspect of
Definition 3.1.1 is semantic: despite the geometric sound of the word linear,
the definition is in fact algebraic, describing how T behaves with respect to
the algebraic operations of vector addition and scalar multiplication. (Note
that on the left of the equality in the definition, the operations are set in Rn ,
while on the right they are in Rm .) So what is the connection between the
definition and actual lines? Finally, how exactly do conditions (3.1) and (3.2)
relate to the condition in the definition?
On the other hand, Definition 3.1.1 has the virtue of illustrating the prin-
ciple that to do mathematics effectively we should characterize our objects
rather than construct them. The characterizations are admittedly guided by
hindsight, but there is nothing wrong with that. Definition 3.1.1 says how
a linear mapping behaves. It says that whatever form linear mappings will
turn out to take, our reflex should be to think of them as mappings through
which we can pass sums and constants. (This idea explains why one of the
inner product properties is called bilinearity: the inner product is linear as a
function of either of its two vector variables when the other variable is held
fixed.) The definition of linearity tells us how to use linear mappings once we
know what they are, or even before we know what they are. Another virtue
of Definition 3.1.1 is that it is intrinsic, making no reference to coordinates.
Some of the questions raised by Definition 3.1.1 have quick answers. The
connection between the definition and actual lines will quickly emerge from our
pending investigations. Also, an induction argument shows that (3.1) and (3.2)
are equivalent to the characterization in the definition, despite appearing
weaker (Exercise 3.1.1). Thus, to verify that a mapping is linear, we only
need to show that it satisfies the easier-to-check conditions (3.1) and (3.2);
but to derive properties of mappings that are known to be linear, we may want
to use the more powerful condition in the definition. As for finding linear map-
pings, the definition suggests a two-step strategy: first, derive the form that
a linear mapping necessarily takes in consequence of satisfying the definition;
and second, verify that the mappings of that form are indeed linear, i.e., show
that the necessary form of a linear mapping is also sufficient for a mapping
to be linear. We now turn to this.
The easiest case to study is linear mappings from R to R. Following the
strategy, first we assume that we have such a mapping and determine its form,
obtaining the mappings that are candidates to be linear. Second, we show
that all the candidates are indeed linear mappings. Thus suppose that some
mapping T ∶ R Ð→ R is linear. The mapping determines a scalar, a = T (1).
And then for every x ∈ R,
T (x) = T (x ⋅ 1) since x ⋅ 1 = x
= xT (1) by (3.2)
= xa by definition of a
= ax since multiplication in R commutes.
Thus a linear mapping T ∶ R Ð→ R necessarily has the form T (x) = ax. Conversely, every mapping of this form is linear: for all α, x ∈ R we have T (αx) = a(αx) = α(ax) = αT (x), satisfying (3.2) as needed. You can check (3.1) similarly, and the calculation that T (1) = a is immediate. These last two paragraphs combine to prove the following result.
Proposition 3.1.2 (Description of linear mappings from scalars to scalars). The linear mappings T ∶ R Ð→ R are precisely the mappings
T (x) = ax
where a ∈ R. That is, each linear mapping T ∶ R Ð→ R is multiplication by a unique a ∈ R and conversely.
The slogan encapsulating the formula T (x) = ax (read “T of x equals a
times x”) in the proposition is:
For scalar input and scalar output, linear OF is scalar TIMES.
That is, given x ∈ R, the effect of a linear mapping T ∶ R Ð→ R on x is
simply to multiply x by a scalar a ∈ R associated with T . This may seem
trivial, but the issue is that at times our methodology will be to study a linear
mapping by its defining properties, i.e., the rules T (x + y) = T (x) + T (y) and
T (αx) = αT (x), while at other times we will profit from studying a linear
mapping computationally, i.e., as a mapping that simply multiplies its inputs
by something—by a scalar here, but by a vector or by a matrix later in this
section. The slogan displayed just above, as well as its two variants to follow
below, gives the connection between the two ways to think about a linear
mapping.
Also, the proposition explains the term linear: the graphs of linear map-
pings from R to R are lines through the origin. (Mappings f (x) = ax + b with
b ≠ 0 are not linear according to our definition even though their graphs are
also lines. However, see Exercises 3.1.15 and 3.2.6.) For example, a typical
linear mapping from R to R is T (x) = (1/2)x. Figure 3.1 shows two ways of
visualizing this mapping. The left half of the figure plots the domain axis and
the codomain axis orthogonally to each other in one plane, the familiar way
to graph a function. The right half of the figure plots the axes separately,
using the spacing of the dots to describe the mapping instead. The uniform
spacing along the rightmost axis depicts the fact that T (x) = xT (1) for all
x ∈ Z, and the spacing is half as big because the multiplying factor is 1/2.
Figures of this second sort can generalize up to three dimensions of input and
three dimensions of output, whereas figures of the first sort can display at
most three dimensions of input and output combined.
Figure 3.1. Two ways of visualizing the linear mapping T (x) = (1/2)x
Next consider a linear mapping T ∶ Rn Ð→ R. Such a mapping determines the vector a = (a1 , . . . , an ) ∈ Rn whose ith entry is ai = T (ei ), and it acts on every vector x = (x1 , . . . , xn ) ∈ Rn . (So here each xi is a scalar entry of the vector x, whereas in Definition 3.1.1,
each xi was itself a vector. The author does not know any graceful way to
avoid this notation collision, the systematic use of boldface or arrows to adorn
vector names being heavyhanded, and the systematic use of the Greek letter
ξ rather than its Roman counterpart x to denote scalars being alien. Since
mathematics involves finitely many symbols and infinitely many ideas, the
reader will in any case eventually need the skill of discerning meaning from
context, a skill that may as well start receiving practice now.) Returning to
the main discussion, since x = ∑i=1..n xi ei and T is linear, Definition 3.1.1 shows that
T (x) = T (∑i=1..n xi ei ) = ∑i=1..n xi T (ei ) = ∑i=1..n xi ai = ⟨x, a⟩ = ⟨a, x⟩.
Proposition 3.1.3 (Description of linear mappings from vectors to scalars). The linear mappings T ∶ Rn Ð→ R are precisely the mappings
T (x) = ⟨a, x⟩
where a ∈ Rn . That is, each linear mapping T ∶ Rn Ð→ R is the inner product with a unique a ∈ Rn and conversely.
The slogan encapsulating the formula T (x) = ⟨a, x⟩ of the proposition is:
For vector input and scalar output, linear OF is vector TIMES.
In the previous chapter, the second example after Definition 2.3.6 showed
that every linear mapping T ∶ Rn Ð→ R is continuous. You are encouraged to
reread that example now before continuing.
A depiction of a linear mapping from R2 to R can again plot the domain
plane and the codomain axis orthogonally to each other or separately. See
Figures 3.2 and 3.3 for examples of each type of plot. The first figure suggests
that the graph forms a plane in R3 and that a line of inputs is taken to
the output value 0. The second figure shows more clearly how the mapping
compresses the plane into the line. As in the right half of Figure 3.1, the idea
is that T (x, y) = xT (1, 0) + yT (0, 1) for all x, y ∈ Z. The compression is that
although (1, 0) and (0, 1) lie on separate input axes, T (1, 0) and T (0, 1) lie
on the same output axis.
T (x + y) = (T1 (x + y), . . . , Tm (x + y)) and T (x) + T (y) = (T1 (x) + T1 (y), . . . , Tm (x) + Tm (y)).
But T satisfies (3.1) exactly when the left sides are equal, the left sides are
equal exactly when the right sides are equal, and the right sides are equal
exactly when each Ti satisfies (3.1). A similar argument with (3.2), left as
Exercise 3.1.5, completes the proof. ⊓⊔
The componentwise nature of linearity combines with the fact that scalar-
valued linear mappings are continuous (as observed after Proposition 3.1.3)
and with the componentwise nature of continuity to show that all linear map-
pings are continuous. Despite being so easy to prove, this fact deserves a
prominent statement.
Theorem 3.1.5 (Linear mappings are continuous). Every linear mapping T ∶ Rn Ð→ Rm is continuous.
Sometimes one saves writing by abbreviating the right side of (3.3) to [aij ]m×n ,
or even just [aij ] when m and n are firmly established.
The set of all m × n matrices (those with m rows and n columns) of real
numbers is denoted Mm,n (R). The n × n square matrices are denoted Mn (R).
Euclidean space Rn is often identified with Mn,1 (R) and vectors written as
columns,
(x1 , . . . , xn ) = ⎡ x1 ⎤
                    ⎢ ⋮  ⎥ .
                    ⎣ xn ⎦
This typographical convention may look odd, but it is useful. The idea is that
a vector in parentheses is merely an ordered list of entries, not inherently a
row or a column; but when a vector—or, more generally, a matrix—is enclosed
by square brackets, the distinction between rows and columns is significant.
To make the linear mapping T ∶ Rn Ð→ Rm be multiplication by its matrix
A ∈ Mm,n (R), we need to define multiplication of an m × n matrix A by an
n × 1 vector x appropriately. That is, the only sensible definition is as follows.
Definition 3.1.6 (Matrix-by-vector multiplication). Let A ∈ Mm,n (R)
and let x ∈ Rn . The product Ax ∈ Rm is defined to be the vector whose ith
entry is the inner product of A’s ith row and x,
Ax = ⎡ a11 a12 ⋯ a1n ⎤ ⎡ x1 ⎤   ⎡ a11 x1 + ⋯ + a1n xn ⎤
     ⎢ a21 a22 ⋯ a2n ⎥ ⎢ x2 ⎥ = ⎢ a21 x1 + ⋯ + a2n xn ⎥ .
     ⎢  ⋮   ⋮      ⋮ ⎥ ⎢ ⋮  ⎥   ⎢          ⋮          ⎥
     ⎣ am1 am2 ⋯ amn ⎦ ⎣ xn ⎦   ⎣ am1 x1 + ⋯ + amn xn ⎦
For example,
⎡ 1 2 3 ⎤ ⎡ 7 ⎤   ⎡ 1 ⋅ 7 + 2 ⋅ 8 + 3 ⋅ 9 ⎤   ⎡ 50  ⎤
⎣ 4 5 6 ⎦ ⎢ 8 ⎥ = ⎣ 4 ⋅ 7 + 5 ⋅ 8 + 6 ⋅ 9 ⎦ = ⎣ 122 ⎦ .
          ⎣ 9 ⎦
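For readers who want to experiment, here is a minimal Python sketch of the matrix-by-vector product of Definition 3.1.6 (the function names inner and matvec are ours, not the book's):

# Matrix-by-vector multiplication per Definition 3.1.6:
# the ith entry of Ax is the inner product of A's ith row and x.
def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    # A is a list of m rows, each a list of n numbers; x is a list of n numbers
    return [inner(row, x) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]
print(matvec(A, [7, 8, 9]))  # [50, 122], matching the example above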
Definition 3.1.6 is designed to give the following theorem, which encompasses
Propositions 3.1.2 and 3.1.3 as special cases.
Theorem 3.1.7 (Description of linear mappings from vectors to vec-
tors). The linear mappings T ∶ Rn Ð→ Rm are precisely the mappings
T (x) = Ax
where A ∈ Mm,n (R). That is, each linear mapping T ∶ Rn Ð→ Rm is multiplication by a unique A ∈ Mm,n (R) and conversely.
The columns of A also have a description in terms of T . Indeed, the jth column
is
⎡ a1j ⎤   ⎡ T1 (ej ) ⎤
⎢  ⋮  ⎥ = ⎢    ⋮     ⎥ = T (ej ).
⎣ amj ⎦   ⎣ Tm (ej ) ⎦
That is:
The jth column of A is T (ej ), i.e., is T of the jth standard basis vector.
Figure 3.4. A rotation takes x1 , x2 , and x1 + x2 to r(x1 ), r(x2 ), and r(x1 ) + r(x2 )

For example, let r ∶ R2 Ð→ R2 be the counterclockwise rotation of the plane through the angle π/6. Rotation is linear (see Figure 3.4 and Exercise 3.1.8), and it takes the standard basis vectors to r(e1 ) = (√3/2, 1/2) and r(e2 ) = (−1/2, √3/2), and thus
A = ⎡ √3/2 −1/2 ⎤
    ⎣ 1/2  √3/2 ⎦ .
So now we know r, because the rows of A describe its component functions,
r(x, y) = ⎡ √3/2 −1/2 ⎤ ⎡ x ⎤   ⎡ (√3/2)x − (1/2)y ⎤
          ⎣ 1/2  √3/2 ⎦ ⎣ y ⎦ = ⎣ (1/2)x + (√3/2)y ⎦ = ((√3/2)x − (1/2)y, (1/2)x + (√3/2)y).
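The recipe that the jth column of A is T (ej ) is easy to test numerically; a small Python sketch, assuming the rotation angle π/6 of the example above (the helper rotate is ours):

import math

def rotate(theta, x, y):
    # counterclockwise rotation of the plane through the angle theta
    return (math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y)

theta = math.pi / 6
# the columns of the rotation matrix are r(e1) and r(e2)
print(rotate(theta, 1, 0))  # approximately (0.866, 0.5)  = (√3/2, 1/2)
print(rotate(theta, 0, 1))  # approximately (-0.5, 0.866) = (-1/2, √3/2)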
Figures 3.5 through 3.8 show more depictions of linear mappings between
spaces of various dimensions. Note that although these mappings stretch and
torque their basic input grids, the grids still get taken to configurations of
straight lines. Contrast this to how the nonlinear mapping of Figure 2.9 bends
the basic grid lines into curves.
We end this section by returning from calculations to intrinsic methods.
The following result could have come immediately after Definition 3.1.1, but
it has been deferred to this point for the sake of presenting some of the objects
more explicitly first, to make them familiar. However, it is most easily proved
intrinsically.
Let L(Rn , Rm ) denote the set of all linear mappings from Rn to Rm . Not
only does this set sit inside the vector space M(Rn , Rm ), it is a vector space
in its own right:
Proposition 3.1.8 (L(Rn , Rm ) forms a vector space). For all S, T ∈ L(Rn , Rm ) and a ∈ R, the mappings S + T and aS again lie in L(Rn , Rm ), and consequently L(Rn , Rm ) is a vector space over R.
Exercises
3.1.1. Prove that T ∶ Rn Ð→ Rm is linear if and only if it satisfies (3.1)
and (3.2). (It may help to rewrite (3.1) with the symbols x1 and x2 in place
of x and y. Then prove one direction by showing that (3.1) and (3.2) are
implied by the defining condition for linearity, and prove the other direction
by using induction to show that (3.1) and (3.2) imply the defining condition.
Note that as pointed out in the text, one direction of this argument has a bit
more substance than the other.)
3.1.8. Let θ denote a fixed but generic angle. Argue geometrically that the
mapping R ∶ R2 Ð→ R2 given by counterclockwise rotation by θ is linear, and
then find its matrix.
3.1.12. Continue the proof of Proposition 3.1.8 by proving the other three
statements about S + T and aS satisfying (3.1) and (3.2).
Sᵀ ∶ Rm Ð→ Rn .
Granting that indeed a unique such Sᵀ exists, use the characterizing condition to show that
Sᵀ(y1 + y2 ) = Sᵀ(y1 ) + Sᵀ(y2 ) for all y1 , y2 ∈ Rm
by showing that
⟨Sᵀ(y1 + y2 ), x⟩ = ⟨Sᵀ(y1 ) + Sᵀ(y2 ), x⟩ for all x ∈ Rn .
A similar argument (not requested here) shows that Sᵀ(αy) = αSᵀ(y) for all α ∈ R and y ∈ Rm , and so the transpose of a linear mapping is linear.
(b) Keeping S from part (a), now further introduce T ∈ L(Rp , Rn ), so that
also S ○ T ∈ L(Rp , Rm ). Show that the transpose of the composition is the
composition of the transposes in reverse order,
(S ○ T )ᵀ = Tᵀ ○ Sᵀ,
by showing that
⟨(S ○ T )ᵀ(y), x⟩ = ⟨(Tᵀ ○ Sᵀ)(y), x⟩ for all x ∈ Rp and y ∈ Rm .
Show that this function satisfies the distance properties of Theorem 2.2.8.
(e) Show that for every S ∈ L(Rn , Rm ) and every T ∈ L(Rp , Rn ),
∥ST ∥ ≤ ∥S∥∥T ∥.
3.2 Operations on Matrices

If the linear mappings S, T ∶ Rn Ð→ Rm have matrices
A, B ∈ Mm,n (R),
and if a is a real number, then the matrices for the linear mappings
S + T ∶ Rn Ð→ Rm and aS ∶ Rn Ð→ Rm
are naturally denoted A + B and aA.
So “+” and “⋅” (or juxtaposition) are about to acquire new meanings yet
again,
+ ∶ Mm,n (R) × Mm,n (R) Ð→ Mm,n (R)
and
⋅ ∶ R × Mm,n (R) Ð→ Mm,n (R).
To define the sum, fix j between 1 and n. Then the jth column of the matrix of S + T is (S + T )(ej ) = S(ej ) + T (ej ), the sum of the jth columns of A and B. Thus the (i, j)th entry of the matrix of S + T is aij + bij , so that the appropriate definition of the sum is entrywise, A + B = [aij + bij ].
For example,
⎡ 1 2 ⎤   ⎡ −1 0 ⎤   ⎡ 0 2 ⎤
⎣ 3 4 ⎦ + ⎣ 2  1 ⎦ = ⎣ 5 5 ⎦ .
A similar argument shows that the appropriate definition to make for scalar
multiplication of matrices is as follows.
Definition 3.2.2 (Scalar-by-matrix multiplication). For a ∈ R and A = [aij ] ∈ Mm,n (R), the scalar multiple aA ∈ Mm,n (R) is defined entrywise, aA = [a aij ].
For example,
2 ⎡ 1 2 ⎤   ⎡ 2 4 ⎤
  ⎣ 3 4 ⎦ = ⎣ 6 8 ⎦ .
The zero matrix 0m,n ∈ Mm,n (R), corresponding to the zero mapping in
L(Rn , Rm ), is the obvious one, with all entries 0. The operations in Mm,n (R)
precisely mirror those in L(Rn , Rm ), giving the following result.
Proposition 3.2.3 (Mm,n (R) forms a vector space). The set Mm,n (R)
of m × n matrices forms a vector space over R.
Next suppose that the linear mappings
S ∶ Rn Ð→ Rm and T ∶ Rp Ð→ Rn
have matrices A ∈ Mm,n (R) and B ∈ Mn,p (R), so that the composition
S ○ T ∶ Rp Ð→ Rm
is also linear. Then the composition S ○ T has a matrix in Mm,p (R) that is naturally defined as the matrix-by-matrix product
AB ∈ Mm,p (R),
the order of multiplication being chosen for consistency with the composition. Under this specification, the product AB has for its (i, j)th entry (for every (i, j) ∈ {1, . . . , m} × {1, . . . , p}) the inner product of the ith row of A and the jth column of B. In symbols,
(AB)ij = ⟨ith row of A, jth column of B⟩ = ∑k=1..n aik bkj .
Equivalently,
ith row of AB = (ith row of A) B.
Indeed, both quantities in the previous display are the 1 × p vector whose jth
entry is the inner product of the ith row of A and the jth column of B.
For example, consider the matrices
A = ⎡ 1 2 3 ⎤ ,  B = ⎡ 1 −2 ⎤ ,  C = ⎡ 4 5 ⎤ ,
    ⎣ 4 5 6 ⎦       ⎢ 2 −3 ⎥       ⎣ 6 7 ⎦
                    ⎣ 3 −4 ⎦
D = ⎡ 1 1 1 ⎤ ,  E = [ a b c ] ,  F = ⎡ x ⎤ .
    ⎢ 0 1 1 ⎥                         ⎢ y ⎥
    ⎣ 0 0 1 ⎦                         ⎣ z ⎦
Some products among these (verify!) are
AB = ⎡ 14 −20 ⎤ ,  BC = ⎡ −8  −9  ⎤ ,  AD = ⎡ 1 3 6  ⎤ ,
     ⎣ 32 −47 ⎦        ⎢ −10 −11 ⎥        ⎣ 4 9 15 ⎦
                       ⎣ −12 −13 ⎦
DB = ⎡ 6 −9 ⎤ ,  AF = ⎡ x + 2y + 3z  ⎤ ,  F E = ⎡ ax bx cx ⎤ ,
     ⎢ 5 −7 ⎥        ⎣ 4x + 5y + 6z ⎦          ⎢ ay by cy ⎥
     ⎣ 3 −4 ⎦                                  ⎣ az bz cz ⎦
EF = ax + by + cz.
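These products are quick to verify mechanically; here is a Python sketch of the inner-product recipe for matrix multiplication (the function name matmul is ours):

def matmul(A, B):
    # (i, j) entry of AB: inner product of A's ith row and B's jth column
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3], [4, 5, 6]]
B = [[1, -2], [2, -3], [3, -4]]
D = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
print(matmul(A, B))  # [[14, -20], [32, -47]]
print(matmul(A, D))  # [[1, 3, 6], [4, 9, 15]]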
Matrix multiplication is not commutative; for example,
⎡ 0 1 ⎤ ⎡ 0 0 ⎤   ⎡ 1 0 ⎤        ⎡ 0 0 ⎤ ⎡ 0 1 ⎤   ⎡ 0 0 ⎤
⎣ 0 0 ⎦ ⎣ 1 0 ⎦ = ⎣ 0 0 ⎦ ,  but ⎣ 1 0 ⎦ ⎣ 0 0 ⎦ = ⎣ 0 1 ⎦ .
The identity matrix In ∈ Mn (R), with entries 1 on the diagonal and 0 elsewhere, is the matrix of the identity mapping
id ∶ Rn Ð→ Rn , id(x) = x.
Proof. The right way to prove these is intrinsic, by recalling that addition,
scalar multiplication, and multiplication of matrices precisely mirror addition,
scalar multiplication, and composition of mappings. For example, if A, B, C
are the matrices of the linear mappings S ∈ L(Rn , Rm ), T ∈ L(Rp , Rn ), and
U ∈ L(Rq , Rp ), then (AB)C and A(BC) are the matrices of (S ○ T ) ○ U and
S ○ (T ○ U ). But these two mappings are the same, because the composition of
mappings (mappings in general, not only linear mappings) is associative. To
verify the associativity, we cite the definition of four different binary compo-
sitions to show that the ternary composition is independent of parentheses,
as follows. For every x ∈ Rq ,
The steps here are not explained in detail because the author finds this method
as grim as it is gratuitous: the coordinates work because they must, but their
presence only clutters the argument. The other equalities are similar. ⊓⊔
Exercises
3.2.4. (If you have not yet worked Exercise 3.1.14 then do so before working
this exercise.) Let A = [aij ] ∈ Mm,n (R) be the matrix of S ∈ L(Rn , Rm ). Its transpose Aᵀ ∈ Mn,m (R) is the matrix of the transpose mapping Sᵀ. Since S and Sᵀ act respectively as multiplication by A and Aᵀ, the characterizing property of Sᵀ from Exercise 3.1.14 gives
⟨Aᵀx, y⟩ = ⟨x, Ay⟩ for all x ∈ Rm and y ∈ Rn .
Make specific choices of x and y to show that the transpose Aᵀ ∈ Mn,m (R) is obtained by flipping A about its northwest–southeast diagonal; that is, show that the (i, j)th entry of Aᵀ is aji . It follows that the rows of Aᵀ are the columns of A, and the columns of Aᵀ are the rows of A.
(Similarly, let B ∈ Mn,p (R) be the matrix of T ∈ L(Rp , Rn ), so that Bᵀ is the matrix of Tᵀ. Because matrix multiplication is compatible with linear mapping composition, we know immediately from Exercise 3.1.14(b), with no reference to the concrete description of the matrix transposes Aᵀ and Bᵀ in terms of the original matrices A and B, that the transpose of the product is the product of the transposes in reverse order,
(AB)ᵀ = BᵀAᵀ.)
3.2.5. The trace of a square matrix A ∈ Mn (R) is the sum of its diagonal entries, tr(A) = ∑i=1..n aii . Show that
tr(AB) = tr(BA), A, B ∈ Mn (R).
(This exercise may entail double subscripts.)
3.2.6. For every matrix A ∈ Mm,n (R) and column vector a ∈ Rm , define the
affine mapping (cf. Exercise 3.1.15)
AffA,a ∶ Rn Ð→ Rm , AffA,a (x) = Ax + a,
and define the (m + 1) × (n + 1) matrix
A′ = ⎡ A  a ⎤ ,
     ⎣ 0n 1 ⎦
where 0n denotes the row of n zeros. Show that for every x ∈ Rn ,
A′ ⎡ x ⎤   ⎡ AffA,a (x) ⎤
   ⎣ 1 ⎦ = ⎣     1      ⎦ .
⎡ a b ⎤   ⎡ 1 ab ⎤ ⎡ a 0 ⎤
⎣ 0 d ⎦ = ⎣ 0 1  ⎦ ⎣ 0 d ⎦ .
Thus this exercise has shown that every matrix [ a b ; c d ] with ad − bc = 1 can be expressed in terms of matrices [ 1 β ; 0 1 ], matrices [ α 0 ; 0 α−1 ], and the matrix [ 0 −1 ; 1 0 ].
3.3 The Inverse of a Linear Mapping

Given a linear mapping S ∶ Rn Ð→ Rm , does it have an inverse? That is, is there a linear mapping T ∶ Rm Ð→ Rn such that
S ○ T = idm and T ○ S = idn ?
If so, what is T ?
The symmetry of the previous display shows that if T is an inverse of S
then S is an inverse of T in turn. Also, the inverse T , if it exists, must be
unique, for if T ′ ∶ Rm Ð→ Rn also inverts S then
T ′ = T ′ ○ idm = T ′ ○ (S ○ T ) = (T ′ ○ S) ○ T = idn ○ T = T.
Thus T can unambiguously be denoted S −1 . In fact, this argument has shown
a little bit more than claimed: if T ′ inverts S from the left and T inverts S
from the right then T ′ = T . On the other hand, the argument does not show
that if T inverts S from the left then T also inverts S from the right—this is
not true.
If the inverse T exists then it too is linear. To see this, note that the
elementwise description of S and T being inverses of one another is that every
y ∈ Rm takes the form y = S(x) for some x ∈ Rn , every x ∈ Rn takes the form
x = T (y) for some y ∈ Rm , and
for all x ∈ Rn and y ∈ Rm , y = S(x) ⇐⇒ x = T (y).
Now compute that for every y1 , y2 ∈ Rm ,
T (y1 + y2 ) = T (S(x1 ) + S(x2 )) for some x1 , x2 ∈ Rn
= T (S(x1 + x2 )) since S is linear
= x1 + x2 since T inverts S
= T (y1 ) + T (y2 ) since y1 = S(x1 ) and y2 = S(x2 ).
Thus T satisfies (3.1). The argument that T satisfies (3.2) is similar.
Since matrices are more explicit than linear mappings, we replace the
question at the beginning of this section with its matrix counterpart: given a
matrix A ∈ Mm,n (R), does it have an inverse matrix, a matrix B ∈ Mn,m (R)
such that
AB = Im and BA = In ?
As above, if the inverse exists then it is unique, and so it can be denoted A−1 .
The first observation to make is that if the equation Ax = 0m has a nonzero
solution x ∈ Rn then A has no inverse. Indeed, also A0n = 0m , so an inverse
A−1 would have to take 0m both to x and to 0n , which is impossible. And so
we are led to a subordinate question: when does the matrix equation
Ax = 0m
have nonzero solutions x ∈ Rn ?
Proof. (1) As observed immediately after Definition 3.2.4, each row of Ri;j,a M
equals the corresponding row of Ri;j,a times M . For every row index k ≠ i,
the only nonzero entry of the row is a 1 in the kth position, so the product
of the row and M simply picks out the kth row of M . Similarly, the ith row
of Ri;j,a has a 1 in the ith position and an a in the jth, so the row times M
equals the ith row of M plus a times the jth row of M .
The proofs of statements (2) and (3) are similar, left as Exercise 3.3.2. ⊓⊔
To get a better sense of why the statements in the proposition are true, it
may be helpful to do the calculations explicitly with some moderately sized
matrices. But then, the point of the proposition is that once one believes it, left
multiplication by elementary matrices no longer requires actual calculation.
Instead, one simply carries out the appropriate row operations. For example,
R1;2,3 ⋅ ⎡ 1 2 3 ⎤   ⎡ 13 17 21 ⎤
         ⎣ 4 5 6 ⎦ = ⎣ 4  5  6  ⎦ ,
because R1;2,3 adds 3 times the second row to the first. The slogan here is: to multiply by an elementary matrix on the left, carry out the corresponding row operation.
hand calculation we simply carry out the row operations.
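As a numerical sanity check of this slogan, the following Python sketch (helper names ours) compares left multiplication by R1;2,3 with carrying out the row operation directly; indices are 0-based in the code, so R1;2,3 appears as recombine(2, 0, 1, 3):

def recombine(n, i, j, a):
    # the n-by-n elementary matrix R_{i;j,a}: the identity with an extra a at (i, j)
    R = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    R[i][j] = a
    return R

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M = [[1, 2, 3], [4, 5, 6]]
print(matmul(recombine(2, 0, 1, 3), M))   # [[13, 17, 21], [4, 5, 6]]

M2 = [row[:] for row in M]                # the row operation, done directly
M2[0] = [x + 3 * y for x, y in zip(M2[0], M2[1])]
print(M2)                                 # [[13, 17, 21], [4, 5, 6]]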
The next result is that performing row operations on A doesn’t change the
set of solutions x to the equation Ax = 0m .
Note that B has a 1 as the leftmost entry of its first row. Recombine various
multiples of the first row with the other rows to put 0’s beneath the leading 1
of the first row; call the result C:
R5;1,−5 R4;1,−5 R3;1,2 R2;1,3 B = ⎡ 1 0  2  3   0 5   ⎤
                                  ⎢ 0 −1 −7 −11 0 −13 ⎥
                                  ⎢ 0 1  7  11  0 13  ⎥ = C.
                                  ⎢ 0 1  7  11  1 30  ⎥
                                  ⎣ 0 0  0  0   1 17  ⎦
Recombine various multiples of the second row with the others to put 0’s
above and below its leftmost nonzero entry; scale the second row to make its
leading nonzero entry a 1; call the result D:
S2,−1 R4;2,1 R3;2,1 C = ⎡ 1 0 2 3  0 5  ⎤
                        ⎢ 0 1 7 11 0 13 ⎥
                        ⎢ 0 0 0 0  0 0  ⎥ = D.
                        ⎢ 0 0 0 0  1 17 ⎥
                        ⎣ 0 0 0 0  1 17 ⎦
Transpose the third and fifth rows; put 0’s above and below the leading 1 in
the third row; call the result E:
R4;3,−1 T3;5 D = ⎡ 1 0 2 3  0 5  ⎤
                 ⎢ 0 1 7 11 0 13 ⎥
                 ⎢ 0 0 0 0  1 17 ⎥ = E.
                 ⎢ 0 0 0 0  0 0  ⎥
                 ⎣ 0 0 0 0  0 0  ⎦
Matrix E is a prime example of a so-called echelon matrix. (The term will be
defined precisely in a moment.) Its virtue is that the equation Ex = 05 is now
easy to solve. This equation expands out to
Ex = ⎡ 1 0 2 3  0 5  ⎤ ⎡ x1 ⎤   ⎡ x1 + 2x3 + 3x4 + 5x6   ⎤   ⎡ 0 ⎤
     ⎢ 0 1 7 11 0 13 ⎥ ⎢ x2 ⎥   ⎢ x2 + 7x3 + 11x4 + 13x6 ⎥   ⎢ 0 ⎥
     ⎢ 0 0 0 0  1 17 ⎥ ⎢ x3 ⎥ = ⎢ x5 + 17x6              ⎥ = ⎢ 0 ⎥ .
     ⎢ 0 0 0 0  0 0  ⎥ ⎢ x4 ⎥   ⎢ 0                       ⎥   ⎢ 0 ⎥
     ⎣ 0 0 0 0  0 0  ⎦ ⎢ x5 ⎥   ⎣ 0                       ⎦   ⎣ 0 ⎦
                       ⎣ x6 ⎦
In an echelon matrix E, the columns with leading 1’s are called new
columns, and all others are old columns. The recipe for solving the equation
Ex = 0m is then as follows.
1. Freely choose the entries in x that correspond to the old columns of E.
2. Then each nonzero row of E will determine the entry of x corresponding
to its leading 1 (which sits in a new column). This entry will be a linear
combination of the free entries to its right.
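A computer algebra system can carry out this recipe; here is a sketch assuming the sympy library, applied to the echelon matrix E of the example above (Matrix.rref returns the reduced echelon form together with the positions of the new columns, and Matrix.nullspace returns one basis solution per old column):

from sympy import Matrix

E = Matrix([[1, 0, 2, 3, 0, 5],
            [0, 1, 7, 11, 0, 13],
            [0, 0, 0, 0, 1, 17],
            [0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0]])

rref_form, pivots = E.rref()
print(pivots)          # (0, 1, 4): the new columns; columns 2, 3, 5 are old
for v in E.nullspace():
    print(v.T)         # one basis solution per freely chosen old-column entry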
Let’s return to the problem of determining whether A ∈ Mm,n (R) is in-
vertible. The idea was to see whether the equation Ax = 0m has any nonzero
solutions x, in which case A is not invertible. Equivalently, we may check
whether Ex = 0m has nonzero solutions, where E is the echelon matrix to
which A row reduces. The recipe for solving Ex = 0m shows that there are
nonzero solutions unless all of the columns are new.
If A ∈ Mm,n (R) has more columns than rows then its echelon matrix E
must have old columns. Indeed, each new column comes from the leading 1 in
a distinct row, so the number of new columns is at most the number m of rows, and m < n, showing that not all the columns are new.
m < n. On the other hand, if A ∈ Mm,n (R) has more rows than columns and
it has an inverse matrix A−1 ∈ Mn,m (R), then A−1 in turn has inverse A, but
this is impossible, because A−1 has more columns than rows. Thus A is also
not invertible when m > n.
The remaining case is that A is square. The only square echelon matrix
with all new columns is I, the identity matrix (Exercise 3.3.10). Thus, unless
A’s echelon matrix is I, A is not invertible. On the other hand, if A’s echelon
matrix is I, then P A = I for some product P of elementary matrices. Multiply
from the left by P −1 to get A = P −1 ; thus A is invertible, with inverse A−1 = P .
This discussion is summarized in the following theorem.
Theorem 3.3.7 (Invertibility and echelon form for matrices). A non-
square matrix A is never invertible. A square matrix A is invertible if and
only if its echelon form is the identity matrix.
When A is square, the discussion above gives an algorithm that simulta-
neously checks whether it is invertible and finds its inverse when it is.
Proposition 3.3.8 (Matrix inversion algorithm). Given A ∈ Mn (R), set
up the matrix
B = [A ∣ In ]
in Mn,2n (R). Carry out row operations on this matrix to reduce the left side
to echelon form. If the left side reduces to In then A is invertible and the right
side is A−1 . If the left side doesn’t reduce to In then A is not invertible.
The algorithm works because if B is left multiplied by a product P of
elementary matrices, the result is
P B = [P A ∣ P ] .
As discussed, P A = In exactly when P = A−1 .
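A minimal Python sketch of the matrix inversion algorithm, using exact rational arithmetic (the function name invert is ours); it reduces [A ∣ I] and returns the right side, or None when the left side does not reach I:

from fractions import Fraction

def invert(A):
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return None                      # echelon form is not I: not invertible
        M[c], M[pivot] = M[pivot], M[c]      # transposition
        M[c] = [x / M[c][c] for x in M[c]]   # scale: make the leading entry 1
        for r in range(n):
            if r != c and M[r][c] != 0:      # recombine: clear the rest of the column
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

A = [[1, -1, 0], [0, 1, -1], [0, 0, 1]]
print(invert(A))   # rows [1, 1, 1], [0, 1, 1], [0, 0, 1] (as Fractions)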
For example, the calculation
R1;2,1 R2;3,1 ⎡ 1 −1 0 ∣ 1 0 0 ⎤   ⎡ 1 0 0 ∣ 1 1 1 ⎤
              ⎢ 0 1 −1 ∣ 0 1 0 ⎥ = ⎢ 0 1 0 ∣ 0 1 1 ⎥
              ⎣ 0 0  1 ∣ 0 0 1 ⎦   ⎣ 0 0 1 ∣ 0 0 1 ⎦
shows that
⎡ 1 −1 0 ⎤ −1   ⎡ 1 1 1 ⎤
⎢ 0 1 −1 ⎥    = ⎢ 0 1 1 ⎥ ,
⎣ 0 0  1 ⎦      ⎣ 0 0 1 ⎦
and one readily checks that the claimed inverse really works. Since arithmetic
by hand is so error-prone a process, one always should confirm one’s answer
from the matrix inversion algorithm.
We now have an algorithmic answer to the question at the beginning of
the section.
Theorem 3.3.9 (Echelon criterion for invertibility). The linear map-
ping S ∶ Rn Ð→ Rm is invertible only when m = n and its matrix A has
echelon matrix In , in which case its inverse S −1 is the linear mapping with
matrix A−1 .
Exercises
3.3.1. Write down the following 3 × 3 elementary matrices and their inverses:
R3;2,π , S3,3 , T3;2 , T2;3 .
3.3.2. Finish the proof of Proposition 3.3.2.
3.3.7. Are the following matrices echelon? For each matrix M , solve the equa-
tion M x = 0.
⎡ 1 0 3 ⎤   ⎡ 0 0 0 1 ⎤   ⎡ 1 1 0 0 ⎤   ⎡ 0 0 ⎤   ⎡ 1 0 0 0 ⎤   ⎡ 0 1 1 ⎤
⎢ 0 1 1 ⎥ , ⎣ 0 0 0 0 ⎦ , ⎣ 0 0 1 1 ⎦ , ⎢ 1 0 ⎥ , ⎢ 0 1 1 0 ⎥ , ⎢ 1 0 3 ⎥ .
⎣ 0 0 1 ⎦                               ⎢ 0 1 ⎥   ⎣ 0 0 1 0 ⎦   ⎣ 0 0 0 ⎦
                                        ⎣ 0 0 ⎦
3.3.8. For each matrix A solve the equation Ax = 0.
⎡ −1 1 4 ⎤ ⎡ 2 −1 3 2 ⎤ ⎡ 3 −1 2 ⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ 1 3 8⎥, ⎢1 4 0 1⎥, ⎢2 1 1⎥.
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ 1 2 5⎥ ⎢ 2 6 −1 5 ⎥ ⎢ 1 −3 0 ⎥
⎣ ⎦ ⎣ ⎦ ⎣ ⎦
3.3.9. Balance the chemical equation
Ca + H3 PO4 Ð→ Ca3 P2 O8 + H2 .
3.3.10. Prove by induction that the only square echelon matrix with all new
columns is the identity matrix.
3.3.11. Are the following matrices invertible? Find the inverse when possible,
and then check your answer.
⎡ 1 −1 1 ⎤   ⎡ 2 5 −1 ⎤   ⎡ 1   1/2 1/3 ⎤
⎢ 2 0  1 ⎥ , ⎢ 4 −1 2 ⎥ , ⎢ 1/2 1/3 1/4 ⎥ .
⎣ 3 0  1 ⎦   ⎣ 6 4  1 ⎦   ⎣ 1/3 1/4 1/5 ⎦
and thus every linear combination of the original {xj } is also a linear combi-
nation of the new {x′j }.
3.4 Inhomogeneous Linear Equations

The question whether a linear mapping T is invertible led to solving the linear
equation Ax = 0, where A was the matrix of T . Such a linear equation, with
right side 0, is called homogeneous. An inhomogeneous linear equation
has nonzero right side,
Ax = b, A ∈ Mm,n (R), x ∈ Rn , b ∈ Rm , b ≠ 0.
Reduce A to an echelon matrix E = P A, where P is the corresponding product of elementary matrices; the equation becomes
Ex = P b,
and since P b is just a vector, the solutions to this can be read off as in the homogeneous case. There may not always be solutions, however.
Exercises
3.4.3. A parent has a son and a daughter. The parent is four times as old
as the daughter, and the daughter is four years older than the son. In three
years, the parent will be five times as old as the son. How old are the parent,
daughter, and son?
3.4.4. Show that to solve an inhomogeneous linear equation, one may solve
a homogeneous system in one more variable and then restrict to solutions for
which the last variable is equal to −1.
3.5 The Determinant: Characterizing Properties and Their Consequences

In this section all matrices are square, n × n. The goal is to define a function
that takes such a matrix, with its n2 entries, and returns a single number.
The putative function is called the determinant,
det ∶ Mn (R) Ð→ R.
For every square matrix A ∈ Mn (R), the scalar det(A) should contain as
much algebraic and geometric information about the matrix as possible. Not
surprisingly, so informative a function is complicated to encode.
This context nicely demonstrates a pedagogical principle already men-
tioned in Section 3.1: characterizing a mathematical object illuminates its
construction and its use. Rather than beginning with a definition of the de-
terminant, we will stipulate a few natural behaviors for it, and then we will
eventually see that
• there is a function with these behaviors (existence),
• there is only one such function (uniqueness), and, most importantly,
• these behaviors, rather than the definition, further show how the function
works (consequences).
We could start at the first bullet (existence) and proceed from the construction
of the determinant to its properties, but when a construction is complicated
(as the determinant’s construction is), it fails to communicate intent, and
pulling it out of thin air as the starting point of a long discussion is an obstacle
to understanding. A few naturally gifted readers will see what the unexplained
idea really is, enabling them to skim the ensuing technicalities and go on to
start using the determinant effectively; some other tough-minded readers can
work through the machinery and then see its operational consequences; but
it is all too easy for the rest of us to be defeated by disorienting detail-fatigue
before the presentation gets to the consequential points and provides any
energizing clarity.
Another option would be to start at the second bullet (uniqueness), letting
the desired properties of the determinant guide our construction of it. This is the approach that we take, viewing the determinant as a function of the n row-vectors of a square matrix,
det ∶ Rn × ⋯ × Rn Ð→ R.
We stipulate three behaviors. First, the determinant is multilinear, meaning that it is linear as a function of each row-vector variable when the others are held fixed,
det(r1 , . . . , αri + α′ ri′ , . . . , rn ) = α det(r1 , . . . , ri , . . . , rn ) + α′ det(r1 , . . . , ri′ , . . . , rn ).
Second, the determinant is skew-symmetric, meaning that exchanging any two rows changes its sign,
det(r1 , . . . , rj , . . . , ri , . . . , rn ) = − det(r1 , . . . , ri , . . . , rj , . . . , rn ).
Third, the determinant is normalized, det(e1 , . . . , en ) = 1. The multilinearity condition does not say that the determinant is linear as a function of the whole matrix.
What the condition does say is that if all rows but one of a square matrix are
held fixed, then the determinant of the matrix varies linearly as a function
of the one row. By induction, an equivalent statement of multilinearity is the more cluttered version in which each row is an arbitrary finite linear combination of vectors, but to keep the notation manageable, we work with the simpler version.
We will prove the following theorem in the next section.
Theorem 3.5.1 (Existence and uniqueness of the determinant). There exists exactly one multilinear skew-symmetric normalized function
det ∶ Rn × ⋯ × Rn Ð→ R.
In more structural language, Theorem 3.5.1 says that the multilinear skew-
symmetric functions from the n-fold product of Rn to R form a 1-dimensional
vector space over R, and {det} is a basis.
The reader may object that even if the conditions of multilinearity, skew-
symmetry, and normalization are grammatically natural, they are concep-
tually opaque. Indeed, they reflect considerable hindsight, since the idea of
a determinant originally emerged from explicit calculations. But again, the
payoff is that characterizing the determinant rather than constructing it illu-
minates its many useful properties. The rest of the section can be viewed as
an amplification of this idea.
For one consequence of the determinant’s existence, with no reference to
its uniqueness, consider the standard basis of Rn taken in order,
(e1 , . . . , en ).
Theorem 3.5.2 (The determinant is multiplicative). For all A, B ∈ Mn (R),
det(AB) = det(A) det(B).
In particular, if A is invertible then
det(A−1 ) = (det(A))−1 .
Proof. Fix the matrix B and define δ ∶ Mn (R) Ð→ R by δ(A) = det(AB). Viewed as a function of the rows of A, δ is multilinear because the determinant is, and it is skew-symmetric:
δ(r1 , . . . , rj , . . . , ri , . . . , rn ) = det(r1 B, . . . , rj B, . . . , ri B, . . . , rn B)
= − det(r1 B, . . . , ri B, . . . , rj B, . . . , rn B)
= −δ(r1 , . . . , ri , . . . , rj , . . . , rn ).
It follows from Theorem 3.5.1 that δ(A) = det(B) det(A), and this is the
desired main result det(AB) = det(A) det(B) of the theorem. Finally, if A is invertible then
1 = det(In ) = det(A A−1 ) = det(A) det(A−1 ),
so that det(A−1 ) = (det(A))−1 . ⊓⊔
And we note that the same result holds for the trace, introduced in Exercise 3.2.5, in consequence of that exercise,
tr(AB) = tr(BA).
More facts about the determinant are immediate consequences of its char-
acterizing properties.
The proofs of statements (2) and (3) are similar. For (4), if E = I then det(E) =
1, because the determinant is normalized. Otherwise the bottom row of E is 0,
and because a linear function takes 0 to 0, it follows that det(E) = 0. ⊓⊔
For one consequence of Theorem 3.5.2 and Proposition 3.5.3, recall that every matrix A ∈ Mn (R) has a transpose matrix Aᵀ, obtained by flipping A about its northwest–southeast diagonal. The next theorem (whose proof is
Exercise 3.5.4) says that all statements about the determinant as a function
of the rows of A also apply to the columns. This fact will be used without
comment from now on. In particular, det(A) is the unique multilinear skew-
symmetric normalized function of the columns of A.
Proof. We may consider only upper triangular matrices, because a lower tri-
angular matrix has an upper triangular matrix for its transpose. The 3×3 case
makes the general argument clear. The determinant of a 3×3 upper triangular
matrix A is
det A = det( ∑i1=1..3 a1i1 ei1 , ∑i2=2..3 a2i2 ei2 , ∑i3=3..3 a3i3 ei3 ),
and multilinearity expands this to
det A = ∑i1=1..3 ∑i2=2..3 ∑i3=3..3 a1i1 a2i2 a3i3 det(ei1 , ei2 , ei3 ).
Since the third sum runs only over i3 = 3, this is
det A = ∑i1=1..3 ∑i2=2..3 a1i1 a2i2 a33 det(ei1 , ei2 , e3 ),
and since the determinant vanishes whenever two of its row-vectors are equal, the surviving terms have i2 = 2 and i1 ≠ 3,
det A = ∑i1=1..2 a1i1 a22 a33 det(ei1 , e2 , e3 ).
For the same reason the term with i1 = 2 vanishes as well, and by normalization,
det A = a11 a22 a33 det(e1 , e2 , e3 ) = a11 a22 a33 ,
the product of the diagonal entries. ⊓⊔
3.5 The Determinant: Characterizing Properties and Their Consequences 95
Exercises
3.5.1. Consider a scalar-valued function of pairs of vectors,
ip ∶ Rn × Rn Ð→ R,
satisfying the following three properties.
(1) The function is bilinear,
ip(αx + α′ x′ , y) = α ip(x, y) + α′ ip(x′ , y),
ip(x, βy + β ′ y ′ ) = β ip(x, y) + β ′ ip(x, y ′ )
for all α, α′ , β, β ′ ∈ R and x, x′ , y, y ′ ∈ Rn .
(2) The function is symmetric,
ip(x, y) = ip(y, x) for all x, y ∈ Rn .
(3) The function is normalized,
ip(ei , ej ) = δij for all i, j ∈ {1, . . . , n}.
(The Kronecker delta δij was defined in Section 2.2.)
Compute that this function, if it exists at all, must be the inner product.
On the other hand, we already know that the inner product has these three
properties, so this exercise has shown that it is characterized by them.
3.5.2. Let n ≥ 2. This exercise proves, without invoking the determinant, that every succession of pair-exchanges of the ordered set
(1, 2, . . . , n)
that leaves the set in its original order must consist of an even number of exchanges. Consider the first two exchanges in such a succession,
(i j)(∗ ∗),
which must take one of the four forms
(i j)(i j),
(i j)(i k), k ∉ {i, j},
(i j)(j k), k ∉ {i, j},
(i j)(k ℓ), k, ℓ ∉ {i, j}, k ≠ ℓ.
Show that in the last three cases, the pair of exchanges can be rewritten in the form
(∗ ∗)(i ∗)
where the first exchange does not involve the ith slot. Next we may apply
the same argument to the second and third exchanges, then to the third and
fourth, and so on. Eventually, either a contradiction arises from the first of
the four cases, or only the last pair-exchange involves the ith slot. Explain
why the second possibility is untenable, completing the argument.
3.5.3. Let f ∶ Rn × ⋯ × Rn Ð→ R be a multilinear skew-symmetric function,
and let c be a real number. Show that the function cf is again multilinear and
skew-symmetric.
3.5.4. This exercise shows that det(Aᵀ) = det(A) for every square matrix A.
(a) Show that det(Rᵀ) = det(R) for every elementary matrix R. (That is, R can be a recombine matrix, a scale matrix, or a transposition matrix.)
3.6 The Determinant: Uniqueness and Existence

To show that the three characterizing properties determine the determinant uniquely, begin with the 2 × 2 case and consider a matrix
A = ⎡ a b ⎤ .
    ⎣ c d ⎦
Its first row is r1 = ae1 + be2 , and similarly its second row is r2 = ce1 + de2 . Thus, since we view the determinant as a function of rows, its determinant must be
det(A) = det(ae1 + be2 , ce1 + de2 ).
Since the determinant is linear in its first vector variable, this expands to
det(ae1 + be2 , ce1 + de2 ) = a det(e1 , ce1 + de2 ) + b det(e2 , ce1 + de2 ),
and since the determinant is also linear in its second vector variable, this expands further,
det(A) = ac det(e1 , e1 ) + ad det(e1 , e2 ) + bc det(e2 , e1 ) + bd det(e2 , e2 ).
Skew-symmetry gives det(e1 , e1 ) = det(e2 , e2 ) = 0 and det(e2 , e1 ) = − det(e1 , e2 ), and finally, since the determinant is normalized, det(e1 , e2 ) = 1. So we have found the only possible formula for the 2 × 2 case,
det(A) = ad − bc.
But now that we have the only possible formula, checking that indeed it
satisfies the desired properties is purely mechanical. For example, to verify
linearity in the first vector variable, compute
det(α(a, b) + α′ (a′ , b′ ), (c, d)) = (αa + α′ a′ )d − (αb + α′ b′ )c
                                   = α(ad − bc) + α′ (a′ d − b′ c)
                                   = α det((a, b), (c, d)) + α′ det((a′ , b′ ), (c, d)).
For skew-symmetry,
det((c, d), (a, b)) = cb − da = −(ad − bc) = − det((a, b), (c, d)).
We should also verify linearity in the second vector variable, but this no longer
requires the defining formula. Instead, since the formula is skew-symmetric
and is linear in the first variable,
det(r1 , αr2 + α′ r2′ ) = − det(αr2 + α′ r2′ , r1 )
                       = −α det(r2 , r1 ) − α′ det(r2′ , r1 )
                       = α det(r1 , r2 ) + α′ det(r1 , r2′ ).
This little trick illustrates the value of thinking in general terms: a slight
modification, inserting a few occurrences of “. . . ” and replacing the subscripts
1 and 2 by i and j, shows that for every n, the three required conditions for
the determinant are redundant—linearity in one vector variable combines with
skew-symmetry to ensure linearity in each vector variable.
One can similarly show that for a 1 × 1 matrix A = [a], the only possible determinant formula is
det(A) = a,
and that indeed this works. The result is perhaps silly, but the exercise of
working through a piece of language and logic in the simplest instance can
help one to understand its more elaborate cases. As another exercise, the same
techniques show, granting that each permutation of three elements has only
one parity, that the only possible formula for a 3 × 3 determinant is
det ⎡ a b c ⎤
    ⎢ d e f ⎥ = aek + bf g + cdh − af h − bdk − ceg.
    ⎣ g h k ⎦
This formula is complicated enough that we should rethink it in a more sys-
tematic way before verifying that it has the desired properties. And we may
as well generalize it to arbitrary n in the process. Here are some observations
about the 3 × 3 formula:
• It is a sum of 3-fold products of matrix entries.
• Every 3-fold product contains one element from each row of the matrix.
• Every 3-fold product also contains one element from each column of the
matrix. So every 3-fold product arises from the positions of three rooks
that don’t threaten each other on a 3 × 3 chessboard.
• Every 3-fold product comes weighted by a “+” or a “−”.
Similar observations apply to the 1×1 and 2×2 formulas. Our general formula
should encode them. Making it do so is partly a matter of notation, but also
an idea is needed to describe the appropriate distribution of plus signs and
minus signs among the terms. The following language provides all of this.
δ(r1 , r2 , . . . , rn ) = δ( ∑i=1..n a1i ei , ∑j=1..n a2j ej , . . . , ∑p=1..n anp ep )
                       = ∑i=1..n ∑j=1..n ⋯ ∑p=1..n a1i a2j ⋯ anp δ(ei , ej , . . . , ep ).
By skew-symmetry, δ(ei , ej , . . . , ep ) = 0 unless the indices i, j, . . . , p are distinct, that is, unless (i, j, . . . , p) is a permutation π of (1, . . . , n), in which case δ(ei , ej , . . . , ep ) = (−1)^π c,
where
c = δ(e1 , . . . , en ).
Especially, a possible formula for a multilinear skew-symmetric normalized function is
det(r1 , r2 , . . . , rn ) = ∑π=(i,j,...,p) (−1)^π a1i a2j ⋯ anp .
Definition 3.6.2 (Determinant). The determinant is the function
det ∶ Mn (R) Ð→ R, det(A) = ∑π∈Sn (−1)^π a1π(1) a2π(2) ⋯ anπ(n) .
Figure 3.9. The rook placement for (2, 3, 1), showing the two inversions
det ⎡ a b ⎤ = ad − bc,
    ⎣ c d ⎦
and even the silly 1 × 1 formula det[a] = a. The 2 × 2 and 3 × 3 cases are
worth memorizing. They can be visualized as adding the products along
northwest–southeast diagonals of the matrix and then subtracting the prod-
ucts along southwest–northeast diagonals, where the word diagonal connotes
wraparound in the 3×3 case. (See Figure 3.10.) But be aware that this pattern
of the determinant as the northwest–southeast diagonals minus the southwest–
northeast diagonals is valid only for n = 2 and n = 3.
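The determinant formula transcribes directly into Python, with itertools.permutations supplying the π's and the sign computed by counting inversions (function names ours):

from itertools import permutations

def sign(pi):
    # (-1)^pi: -1 if the number of inversions of pi is odd, +1 if even
    n = len(pi)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])
    return -1 if inversions % 2 else 1

def det(A):
    n = len(A)
    total = 0
    for pi in permutations(range(n)):
        term = sign(pi)
        for i in range(n):
            term *= A[i][pi[i]]            # one entry from each row and column
        total += term
    return total

print(det([[1, 2], [3, 4]]))                    # -2 = 1*4 - 2*3
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3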
We have completed the program of the second bullet at the beginning of the previous section, finding the only possible formula (the one in Definition 3.6.2) that could satisfy the three desired determinant properties; the formula is well defined granting that each permutation has a unique sign. That is, we have now proved the uniqueness but not yet the existence of the determinant in Theorem 3.5.1, the uniqueness possibly provisional on the existence.
3.6 The Determinant: Uniqueness and Existence 103
Figure 3.10. The determinant as the northwest–southeast diagonal products, taken with “+”, minus the southwest–northeast diagonal products, taken with “−”
The first bullet tells us to prove the existence by verifying that the com-
puted determinant formula indeed satisfies the three stipulated determinant
properties. Similarly to the 2 × 2 case, this is a mechanical exercise. The im-
pediments are purely notational, but the notation is admittedly cumbersome,
and so the reader is encouraged to skim the next proof.
Proof. (1) If A has rows ri = (ai1 , . . . , ain ) except that its kth row is the linear
combination αrk + rk′ where rk = (ak1 , . . . , akn ) and rk′ = (a′k1 , . . . , a′kn ), then
its (i, j)th entry is aij if i ≠ k, and αakj + a′kj if i = k. Thus
det(r1 , . . . , αrk + rk′ , . . . , rn )
   = ∑π∈Sn (−1)^π a1π(1) ⋯ (αakπ(k) + a′kπ(k) ) ⋯ anπ(n)
   = α ∑π∈Sn (−1)^π a1π(1) ⋯ akπ(k) ⋯ anπ(n) + ∑π∈Sn (−1)^π a1π(1) ⋯ a′kπ(k) ⋯ anπ(n)
   = α det(r1 , . . . , rk , . . . , rn ) + det(r1 , . . . , rk′ , . . . , rn ),
as desired.
(2) Let A have rows r1 , . . . , rn where ri = (ai1 , . . . , ain ). Suppose that rows
k and k + 1 are exchanged. The resulting matrix has (i, j)th entry
The resulting matrix has (i, j)th entry aij if i ∉ {k, k + 1}, entry ak+1,j if i = k, and entry akj if i = k + 1.
For each permutation π ∈ Sn , define a companion permutation π′ by exchanging the kth and (k + 1)st entries,
π′ = (π(1), . . . , π(k + 1), π(k), . . . , π(n)).
Thus π′(k) = π(k + 1), π′(k + 1) = π(k), and π′(i) = π(i) for all other i. As π varies through Sn , so does π′, and for each π we have the relation (−1)^π′ = −(−1)^π (Exercise 3.6.6). The defining formula of the determinant gives
det(r1 , . . . , rk+1 , rk , . . . , rn ) = ∑π∈Sn (−1)^π a1π(1) ⋯ ak+1,π(k) akπ(k+1) ⋯ anπ(n)
                                     = − ∑π′∈Sn (−1)^π′ a1π′(1) ⋯ akπ′(k) ak+1,π′(k+1) ⋯ anπ′(n)
                                     = − det(r1 , . . . , rk , rk+1 , . . . , rn ).
The previous calculation establishes the result when adjacent rows of A are
exchanged. To exchange rows k and ℓ in A where ℓ > k, carry out the following
adjacent row exchanges to trickle the kth row down to the ℓth and then bubble
the ℓth row back up to the kth, bobbing each row in between them up one
position and then back down:
The display shows that the process carries out an odd number of exchanges
(all but the bottom one come in pairs), each of which changes the sign of the
determinant.
(3) This is left to the reader (Exercise 3.6.7). ⊓⊔
The determinant formula therefore exists and satisfies the three properties, and the calculation earlier in this section shows that every multilinear skew-symmetric function must be a scalar multiple of the determinant. The last comment nec-
essary to complete the proof of Theorem 3.5.1 is that since the determinant
is multilinear and skew-symmetric, so are its scalar multiples. This fact was
shown in Exercise 3.5.3.
The reader is invited to contemplate how unpleasant it would have been
to prove the various theorems about the determinant in the previous section
using the unwieldy determinant formula, with its n! terms, each an n-fold
product. That said, the theorems really can be shown directly from the for-
mula. For example, to prove that det(Aᵀ) = det(A), one can write
det(Aᵀ) = ∑π∈Sn (−1)^π aπ(1)1 aπ(2)2 ⋯ aπ(n)n ,
and then persuade oneself that this is also the sum over the permutations π′ that undo the permutations π, and the undo-permutations have the same signs as the originals,
det(Aᵀ) = ∑π′∈Sn (−1)^π′ a1π′(1) a2π′(2) ⋯ anπ′(n) ,
and this is det(A). Here we are adumbrating basic ideas from group theory.
The previous section has already established that the determinant of a
triangular matrix is the product of the diagonal entries, but the result also
follows immediately from the determinant formula (Exercise 3.6.8). This fact
should be cited freely to save time.
An algorithm for computing det(A) for every A ∈ Mn (R) is now at hand.
Algebraically, the idea is that if
P1 AP2 = ∆,
where P1 and P2 are products of elementary matrices and ∆ is a triangular
matrix, then since the determinant is multiplicative,
det(A) = det(P1 )−1 det(∆) det(P2 )−1 .
Multiplying A by P2 on the right carries out a sequence of column operations
on A, just as multiplying A by P1 on the left carries out row operations. Recall
that the determinants of the elementary matrices are
det(Ri;j,a ) = 1,
det(Si,a ) = a,
det(Ti;j ) = −1.
Procedurally, this all plays out as follows.
Proposition 3.6.4 (Determinant algorithm). Given A ∈ Mn (R), use row
and column operations—recombines, scales, transpositions—to reduce A to a
triangular matrix ∆. Then det(A) is det(∆) times the reciprocal of each scale
factor and times −1 for each transposition.
The only role that the determinant formula (as compared to the determi-
nant properties) played in obtaining this algorithm is that it gave the deter-
minant of a triangular matrix easily.
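Here is a Python sketch of the determinant algorithm (ours, with exact arithmetic): reduce to a triangular matrix by transpositions and recombines, multiply the diagonal, and negate once per transposition. No scales are used, so no reciprocals are needed.

from fractions import Fraction

def det(A):
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    s = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)    # no pivot available: determinant is 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            s = -s                # each transposition negates the determinant
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            M[r] = [x - t * y for x, y in zip(M[r], M[c])]  # recombine: det unchanged
    d = Fraction(s)
    for i in range(n):
        d *= M[i][i]              # triangular determinant: product of the diagonal
    return d

print(det([[0, 1], [1, 0]]))                     # -1: a transposition matrix
print(det([[1, -1, 0], [0, 1, -1], [0, 0, 1]]))  # 1: already triangular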
For example, the matrix
A = ⎡ 1/0! 1/1! 1/2! 1/3! ⎤
    ⎢ 1/1! 1/2! 1/3! 1/4! ⎥
    ⎢ 1/2! 1/3! 1/4! 1/5! ⎥
    ⎣ 1/3! 1/4! 1/5! 1/6! ⎦
becomes, after scaling the first row by 3!, the second row by 4!, the third row
by 5!, and the fourth row by 6!,
B = ⎡ 6   6  3 1 ⎤
    ⎢ 24  12 4 1 ⎥ .
    ⎢ 60  20 5 1 ⎥
    ⎣ 120 30 6 1 ⎦
Subtract the first row from each of the others to get
C = ⎡ 6   6  3 1 ⎤
    ⎢ 18  6  1 0 ⎥ ,
    ⎢ 54  14 2 0 ⎥
    ⎣ 114 24 3 0 ⎦
and then scale the third row by 1/2 and the fourth row by 1/3, yielding
D = ⎡ 6  6 3 1 ⎤
    ⎢ 18 6 1 0 ⎥ .
    ⎢ 27 7 1 0 ⎥
    ⎣ 38 8 1 0 ⎦
Next subtract the second row from the third and fourth rows, and scale the fourth row by 1/2 to get
E = ⎡ 6  6 3 1 ⎤
    ⎢ 18 6 1 0 ⎥ .
    ⎢ 9  1 0 0 ⎥
    ⎣ 10 1 0 0 ⎦
Subtract the third row from the fourth, transpose the first and fourth columns,
and transpose the second and third columns, leading to
∆ = ⎡ 1 3 6 6  ⎤
    ⎢ 0 1 6 18 ⎥ .
    ⎢ 0 0 1 9  ⎥
    ⎣ 0 0 0 1  ⎦
This triangular matrix has determinant 1, and so according to the algorithm,
det(A) = (2 ⋅ 3 ⋅ 2)/(6! 5! 4! 3!) = 1/1036800.
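The value is easy to double-check exactly with the permutation formula and rational arithmetic; a sketch (all names ours):

from fractions import Fraction
from itertools import permutations
from math import factorial

A = [[Fraction(1, factorial(i + j)) for j in range(4)] for i in range(4)]

def sign(pi):
    inversions = sum(1 for i in range(4) for j in range(i + 1, 4) if pi[i] > pi[j])
    return -1 if inversions % 2 else 1

d = sum(sign(pi) * A[0][pi[0]] * A[1][pi[1]] * A[2][pi[2]] * A[3][pi[3]]
        for pi in permutations(range(4)))
print(d)   # 1/1036800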
In the following exercises, feel free to use the determinant properties and
the determinant formula in whatever combined way gives you the least work.
Exercises
3.6.1. For this exercise, let n and m be positive integers, not necessarily equal,
and let Rn × ⋯ × Rn denote m copies of Rn . Consider any multilinear function
f ∶ Rn × ⋯ × Rn Ð→ R.
Given m vectors
a1 = (a11 , . . . , a1n ),
a2 = (a21 , . . . , a2n ),
⋮
am = (am1 , . . . , amn ),
explain why
f (a1 , a2 , . . . , am ) = ∑i=1..n ∑j=1..n ⋯ ∑p=1..n a1i a2j ⋯ amp f (ei , ej , . . . , ep ).
3.6.2. Use the three desired determinant properties to derive the formulas in
this section for 1 × 1 and 3 × 3 determinants. Verify that the 1 × 1 formula
satisfies the properties.
3.6.3. For each permutation, count the inversions and compute the sign:
(2, 3, 4, 1), (3, 4, 1, 2), (5, 1, 4, 2, 3).
3.6.6. Explain why (−1)^π′ = −(−1)^π in the proof of part (2) of Proposition 3.6.3.
3.6.7. Use the defining formula of the determinant to reproduce the result
that det(In ) = 1.
3.6.8. Explain why in every term (−1)^π a1π(1) a2π(2) ⋯ anπ(n) from the determinant formula, ∑i=1..n π(i) = ∑i=1..n i. Use this to reexplain why the determinant of a triangular matrix is the product of its diagonal entries.
3.6.9. Calculate the determinants of the following matrices:
⎡ 4 3 −1 2 ⎤   ⎡ 1 −1 2  3  ⎤
⎢ 0 1 2  3 ⎥ , ⎢ 2 2  0  2  ⎥ .
⎢ 1 0 4  1 ⎥   ⎢ 4 1  −1 −1 ⎥
⎣ 2 0 3  0 ⎦   ⎣ 1 2  3  0  ⎦
3.6.10. Show that the Vandermonde matrix,
⎡ 1 a a² ⎤
⎢ 1 b b² ⎥ ,
⎣ 1 c c² ⎦
has determinant (b − a)(c − a)(c − b). For what values of a, b, c is the Vander-
monde matrix invertible? (The idea is to do the problem conceptually rather
than writing out the determinant and then factoring it, so that the same ideas
would work for larger matrices. The determinant formula shows that the de-
terminant in the problem is a polynomial in a, b, and c. What is its degree in
each variable? Why must it vanish if any two variables are equal? Once you have argued that the determinant is as claimed, don't forget to finish the problem.)
3.6.11. Consider the following n × n matrix based on Pascal’s triangle:
A = ⎡ 1 1 1  1  ⋯ 1        ⎤
    ⎢ 1 2 3  4  ⋯ n        ⎥
    ⎢ 1 3 6  10 ⋯ n(n+1)/2 ⎥
    ⎢ 1 4 10 20 ⋯ ⋅        ⎥ .
    ⎢ ⋮ ⋮ ⋮  ⋮    ⋮        ⎥
    ⎣ 1 n n(n+1)/2 ⋅ ⋯ ⋅   ⎦
Find det(A). (Hint: Row and column reduce.)
3.6.12. This exercise constructs a determinant with no reference to permu-
tations or their signs, inductively on the dimension n of the matrix. Define
det1 ([a]) = a, and then
detn (A) = ∑j=1..n (−1)^(1+j) a1j detn−1 (A1j ), n ≥ 2,
3.7 An Explicit Formula for the Inverse

Consider a linear mapping
T ∶ Rn Ð→ Rn
having matrix
A ∈ Mn (R).
In Section 3.3 we discussed a process to invert A and thereby invert T . Now,
with the determinant in hand, we can also write the inverse of A explicitly in
closed form. Because the formula giving the inverse involves many determi-
nants, it is hopelessly inefficient for computation. Nonetheless, it is of interest
to us for a theoretical reason (the pending Corollary 3.7.3) that we will need
in Chapter 5.
For every pair (i, j), let Ai,j ∈ Mn−1 (R) be the (n − 1) × (n − 1) matrix obtained by deleting the ith row and the jth column of A. The classical adjoint of A is the n × n matrix whose (i, j)th entry is (−1)^(i+j) times the determinant of Aj,i ,
Aadj = [ (−1)^(i+j) det(Aj,i ) ] ∈ Mn (R).
For example, the classical adjoint of a 2 × 2 matrix is
⎡ a b ⎤ adj   ⎡ d  −b ⎤
⎣ c d ⎦     = ⎣ −c a  ⎦ .
Already for a 3 × 3 matrix, the formula for the classical adjoint is daunting,

⎡ a b c ⎤ adj   ⎡  det[ e f ; h k ]  −det[ b c ; h k ]   det[ b c ; e f ] ⎤
⎢ d e f ⎥     = ⎢ −det[ d f ; g k ]   det[ a c ; g k ]  −det[ a c ; d f ] ⎥
⎣ g h k ⎦       ⎣  det[ d e ; g h ]  −det[ a b ; g h ]   det[ a b ; d e ] ⎦

                ⎡ ek − f h  ch − bk  bf − ce ⎤
              = ⎢ f g − dk  ak − cg  cd − af ⎥ .
                ⎣ dh − eg   bg − ah  ae − bd ⎦
110 3 Linear Mappings and Their Matrices
Returning to the 2 × 2 case, where
A = ⎡ a b ⎤  and  Aadj = ⎡ d  −b ⎤ ,
    ⎣ c d ⎦              ⎣ −c a  ⎦
compute that
A Aadj = ⎡ ad − bc    0    ⎤ = (ad − bc) ⎡ 1 0 ⎤ = det(A) I2 .
         ⎣    0    ad − bc ⎦             ⎣ 0 1 ⎦
The same identity holds in general,
A Aadj = det(A)In .
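For a concrete check of this identity in the 3 × 3 case, here is a Python sketch (the permutation-formula det and the adjoint builder adj are ours):

from itertools import permutations

def det(A):
    n = len(A)
    total = 0
    for pi in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= A[i][pi[i]]
        total += term
    return total

def adj(A):
    # (i, j) entry: (-1)^(i+j) times det of A with row j and column i deleted
    n = len(A)
    def minor(r, c):
        return [[A[x][y] for y in range(n) if y != c] for x in range(n) if x != r]
    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
Aadj = adj(A)
product = [[sum(A[i][k] * Aadj[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product)   # [[-3, 0, 0], [0, -3, 0], [0, 0, -3]]: det(A) times I3, det(A) = -3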
Exercise
3.7.1. Verify at least one diagonal entry and at least one off-diagonal entry
in the formula A Aadj = det(A)In for n = 3.
3.8 Geometry of the Determinant: Volume

Consider a linear mapping
T ∶ Rn Ð→ Rn .
This section discusses a geometric interpretation of its determinant: for suitable sets
E ⊂ Rn ,
the image set
T E ⊂ Rn
has volume
vol T E = t ⋅ vol E,
where the magnification factor t depends only on T and will turn out to be the absolute value of the determinant of the matrix of T .
To begin, define the unit box in Rn to be the set
B = {α1 e1 + ⋯ + αn en ∶ 0 ≤ α1 ≤ 1, . . . , 0 ≤ αn ≤ 1}.
Thus box means interval when n = 1, rectangle when n = 2, and the usual
notion of box when n = 3. Let p be a point in Rn , let a1 , . . . , an be positive
real numbers, and let B ′ denote the box spanned by the vectors a1 e1 , . . . , an en
and translated by p,
B ′ = {α1 a1 e1 + ⋯ + αn an en + p ∶ 0 ≤ α1 ≤ 1, . . . , 0 ≤ αn ≤ 1}.
(See Figure 3.11. The figures of this section are set in two dimensions, but the
ideas are general and hence so are the figure captions.) A face of a box is the
set of its points such that some particular αi is held fixed at 0 or at 1 while
the others vary. A box in Rn has 2n faces.
Figure 3.11. Scaling and translating the unit box
We assume that volume is finitely additive and that the volume of the unit box is normalized,
vol B = 1.
And we assume that scaling any spanning vector of a box affects the box’s
volume continuously in the scaling factor. It follows that scaling any spanning
vector of a box by a real number a magnifies the volume by ∣a∣. To see this,
first note that scaling a spanning vector by an integer ℓ creates ∣ℓ∣ abutting
translated copies of the original box, and so the desired result follows in this
case from finite additivity. A similar argument applies to scaling a spanning
vector by a reciprocal integer 1/m (m ≠ 0), since the original box is now ∣m∣
copies of the scaled one. These two special cases show that the result holds
for scaling a spanning vector by any rational number r = ℓ/m. Finally, the
continuity assumption extends the result from the rational numbers to the
real numbers, since every real number is approached by a sequence of rational
numbers. Since the volume of the unit box is normalized to 1, since scaling the ith spanning vector by ai magnifies volume by ai , and since translating a box does not change its volume, it follows that
vol B ′ = a1 ⋯ an .
and such that the boxes that complete the partial union to the full union have
a small sum of volumes,
∑i=N+1..M vol Bi < ε. (3.9)
With L = ∑i=1..N vol Bi and U = ∑i=1..M vol Bi the resulting lower and upper bounds for vol E, the bounds differ by
U − L = ∑i=N+1..M vol Bi < ε.
The parallelepiped spanned by vectors v1 , . . . , vn ∈ Rn is the set
P(v1 , . . . , vn ) = {α1 v1 + ⋯ + αn vn ∶ 0 ≤ α1 ≤ 1, . . . , 0 ≤ αn ≤ 1},
abbreviated to P when the vectors are firmly fixed. Again the terminology
is pandimensional, meaning in particular interval, parallelogram, and paral-
lelepiped in the usual sense for n = 1, 2, 3. We will also consider translations
of parallelepipeds away from the origin by offset vectors p,
P ′ = P + p = {v + p ∶ v ∈ P}.
(See Figure 3.13.) A face of a parallelepiped is the set of its points such that
some particular αi is held fixed at 0 or at 1 while the others vary. A paral-
lelepiped in Rn has 2n faces. Boxes are special cases of parallelepipeds. The
methods of Chapter 6 will show that parallelepipeds are well approximated by
boxes, and so they have well-defined volumes. We assume that parallelepiped
volume is finitely additive, and we assume that every finite union of paral-
lelepipeds each having volume zero again has volume zero.
Figure 3.13. A parallelepiped P spanned by v1 and v2 , and its translate P ′ = P + p
Figure 3.14. Linear image of the unit box and of a scaled translated box
3.8 Geometry of the Determinant: Volume 115
We need one last preliminary result about volume. Again let E be a subset
of Rn that is well approximated by boxes. Fix a linear mapping T ∶ Rn Ð→
Rn . Very similarly to the argument for E, the set T E also should have a
volume, because it is well approximated by parallelepipeds. Indeed, the set
containments (3.8) are preserved under the linear mapping T ,
T ( ⋃i=1..N Bi ) ⊂ T E ⊂ T ( ⋃i=1..M Bi ).
In general, the image of a union is the union of the images, so this can be
rewritten as
⋃i=1..N T Bi ⊂ T E ⊂ ⋃i=1..M T Bi .
(See Figure 3.15.) As before, a lower bound and an upper bound for the volume of T E are
L = vol ⋃i=1..N T Bi = ∑i=1..N vol T Bi ,  U = vol ⋃i=1..M T Bi = ∑i=1..M vol T Bi .
The only new wrinkle is that citing the finite additivity of parallelepiped
volume here assumes that the parallelepipeds T Bi either inherit from the
original boxes Bi the property of being disjoint except possibly for shared
faces, or they all have volume zero. The assumption is valid because if T is
invertible then the inheritance holds, while if T is not invertible then we will
see later in this section that the T Bi have volume zero, as desired. With this
point established, let t be the factor by which T magnifies box-volume. The
previous display and (3.10) combine to show that the difference of the bounds
is
U − L = ∑i=N+1..M vol T Bi = ∑i=N+1..M t ⋅ vol Bi = t ⋅ ∑i=N+1..M vol Bi ≤ tε.
It follows that
tV /(V + ε) ≤ vol T E / vol E ≤ t(V + ε)/V.
Since ε can be arbitrarily small, the left and right quantities in the display
can be arbitrarily close to t, and so the only possible value for the quantity in
the middle (which is independent of ε) is t. Thus we have the desired equality
announced at the beginning of this section,
vol T E = t ⋅ vol E.
The discussion for scale matrices, transposition matrices, and echelon ma-
trices generalizes effortlessly from 2 to n dimensions, but generalizing the dis-
cussion for recombine matrices Ri;j,a takes a small argument. Because trans-
position matrices have no effect on volume, we may multiply Ri;j,a from the
left and from the right by various transposition matrices to obtain R1;2,a and
study it instead. Multiplication by R1;2,a preserves all of the standard basis
vectors except e2 , which is taken to ae1 + e2 as before. The resulting paral-
lelepiped P(e1 , ae1 + e2 , e3 , . . . , en ) consists of the parallelogram shown in the
right side of Figure 3.16, extended one unit in each of the remaining orthogo-
nal n−2 directions of Rn . The n-dimensional volume of the parallelepiped is its
base (the area of the parallelogram, 1) times its height (the (n−2)-dimensional
volume of the unit box over each point of the parallelogram, again 1). That is,
the n × n recombine matrix still magnifies volume by 1, the absolute value of
its determinant, as desired. The base times height property of volume is yet
another invocation here, but it is a consequence of a theorem to be proved in
Chapter 6, Fubini’s theorem. Summarizing, we have the following result.
3.8 Geometry of the Determinant: Volume 119
The work of this section has given a geometric interpretation of the mag-
nitude of det A: it is the magnification factor of multiplication by A. If the
columns of A are denoted c1 , . . . , cn then Aej = cj for j = 1, . . . , n, so that
even more explicitly ∣ det A∣ is the volume of the parallelepiped spanned by
the columns of A. For instance, to find the volume of the 3-dimensional par-
allelepiped spanned by the vectors (1, 2, 3), (2, 3, 4), and (3, 5, 8), compute that
∣ det ⎡ 1 2 3 ⎤ ∣
∣     ⎢ 2 3 5 ⎥ ∣ = 1.
∣     ⎣ 3 4 8 ⎦ ∣
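In code this is a single 3 × 3 determinant; a sketch (det3 is ours, written out from the 3 × 3 formula; the determinant is unchanged by transposing, so the spanning vectors may be fed in as rows):

def det3(u, v, w):
    (a, b, c), (d, e, f), (g, h, k) = u, v, w
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)

print(abs(det3((1, 2, 3), (2, 3, 4), (3, 5, 8))))   # 1, the volume computed above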
Exercises
3.8.1. (a) This section states that the image of a union is the union of the
images. More specifically, let A and B be any sets, let f ∶ A Ð→ B be any
mapping, and let A1 , . . . , AN be any subsets of A. Show that
f ( ⋃i=1..N Ai ) = ⋃i=1..N f (Ai ).
3.8.4. (a) Express the matrix [ 0 −1 ; 1 0 ] as a product of recombine and scale matrices (you may not need both types).
(b) Use part (a) to describe counterclockwise rotation of the plane through
the angle π/2 as a composition of shears and scales.
3.8.7. Let P be the parallelogram in R2 spanned by (a, c) and (b, d). Calculate directly that ∣ det [ a b ; c d ] ∣ = area P. (Hint: area = base × height = ∣(a, c)∣ ∣(b, d)∣ ∣ sin θ(a,c),(b,d) ∣. It may be cleaner to find the square of the area.)
3.8.8. This exercise shows directly that ∣ det ∣ = volume in R3 . Let P be the
parallelepiped in R3 spanned by v1 , v2 , v3 , let P ′ be spanned by the vectors
v1′ , v2′ , v3′ obtained from performing the Gram–Schmidt process on the vj ’s,
let A ∈ M3 (R) have rows v1 , v2 , v3 , and let A′ ∈ M3 (R) have rows v1′ , v2′ , v3′ .
(a) Explain why det A′ = det A.
(b) Give a plausible geometric argument that vol P ′ = vol P.
(c) Show that
A′ A′ᵀ = ⎡ ∣v1′ ∣² 0      0      ⎤
         ⎢ 0      ∣v2′ ∣² 0      ⎥ .
         ⎣ 0      0      ∣v3′ ∣² ⎦
Explain why therefore ∣ det A′ ∣ = vol P ′ . It follows from parts (a) and (b) that
∣ det A∣ = vol P.
3.9 Geometry of the Determinant: Orientation

Recall from Section 2.1 that a basis of Rn is a set of vectors {f1 , . . . , fp } such that every vector in Rn is a unique linear combination of the {fj }. Though strictly speaking a basis is only a set, we adopt here the convention that the basis vectors are given in a specified order. Given such a basis,
view the vectors as columns and let F denote the matrix in Mn,p (R) with
columns f1 , . . . , fp . Thus the order of the basis vectors is now relevant. For
a standard basis vector ej of Rp , the matrix-by-vector product F ej gives the
jth column fj of F . Therefore, for every vector x = (x1 , . . . , xp ) ∈ Rp (viewed
as a column),
F x = F ( ∑j=1..p xj ej ) = ∑j=1..p xj F ej = ∑j=1..p xj fj .
Thus
{f1 , . . . , fp } is a basis of Rn
   ⇐⇒ each y ∈ Rn is uniquely expressible as a linear combination of the {fj }
   ⇐⇒ each y ∈ Rn takes the form y = F x for a unique x ∈ Rp
   ⇐⇒ F is invertible
   ⇐⇒ F is square (i.e., p = n) and det F ≠ 0.
The calculation lets us interpret the sign of det A geometrically: if det A > 0
then T preserves the orientation of bases, and if det A < 0 then T reverses
orientation. For example, the mapping with matrix
⎡0 0 0 1⎤
⎢ ⎥
⎢1 0 0 0⎥
⎢ ⎥
⎢ ⎥
⎢0 1 0 0⎥
⎢ ⎥
⎢0 0 1 0⎥
⎣ ⎦
reverses orientation in R4 .
To summarize: Let A be an n × n matrix. Whether det A is nonzero says
whether A is invertible; the magnitude of det A is the factor by which A
magnifies volume; and (assuming that det A ≠ 0) the sign of det A determines
how A affects orientation. The determinant is astonishing.
Exercises

3.10 The Cross Product, Lines, and Planes in R3

The cross product of two vectors u and v in R3 should be a vector u × v orthogonal to both u and v,
⟨u × v, u⟩ = 0 and ⟨u × v, v⟩ = 0. (3.11)
There is the question of which way u×v should point along the line orthogonal
to the plane spanned by u and v. The natural answer is that the direction
should be chosen to make the ordered triple of vectors {u, v, u × v} positive
unless it is degenerate,
det(u, v, u × v) ≥ 0. (3.12)
Also there is the question of how long u × v should be. With hindsight, we assert that specifying the length to be the area of the parallelogram spanned by u and v will work well. That is,
∣u × v∣ = area P(u, v). (3.13)
The three desired geometric properties (3.11) through (3.13) seem to describe the cross product completely. (See Figure 3.22.)
the cross product completely. (See Figure 3.22.)
The three geometric properties also seem disparate. However, they combine
into a uniform algebraic property, as follows. Since the determinant in (3.12) is
nonnegative, it is the volume of the parallelepiped spanned by u, v, and u × v.
124 3 Linear Mappings and Their Matrices
Figure 3.22. The cross product of u and v
The volume is the base times the height, and because u × v is normal to u
and v, the base is the area of P(u, v) and the height is ∣u × v∣. Thus
det(u, v, u × v) = area P(u, v) ∣u × v∣.
Since orthogonal vectors have inner product 0, since the determinant is 0 when
two rows agree, and since the square of the absolute value is the vector’s inner
product with itself, we can rewrite (3.11) and this last display (obtained from
(3.12) and (3.13)) uniformly as equalities of the form ⟨u × v, w⟩ = det(u, v, w)
for various w,
⟨u × v, u⟩ = det(u, v, u),
⟨u × v, v⟩ = det(u, v, v), (3.14)
⟨u × v, u × v⟩ = det(u, v, u × v).
Instead of saying what the cross product is, as an equality of the form u × v =
f (u, v) would, the three equalities of (3.14) say how the cross product interacts
with certain vectors—including itself—via the inner product. Again, the idea
is to characterize rather than construct.
(The reader may object to the argument just given that det(u, v, u × v) =
area P(u, v) ∣u × v∣, on the grounds that we don’t really understand the area
of a 2-dimensional parallelogram in 3-dimensional space to start with, that
in R3 we measure volume rather than area, and the parallelogram surely has
volume zero. In fact, the argument can be viewed as motivating the formula
as the definition of the area. This idea will be discussed more generally in
Section 9.1.)
Based on (3.14), we leap boldly to an intrinsic algebraic characterization
of the cross product.
Definition 3.10.1 (Cross product). Let u and v be any two vectors in R3 .
Their cross product u × v is defined by the property
$$\langle u\times v, w\rangle = \det(u, v, w) \quad\text{for all } w\in\mathbb{R}^3.$$
That is, u × v is the unique vector x ∈ R3 such that ⟨x, w⟩ = det(u, v, w) for
all w ∈ R3 .
As with the determinant earlier, we do not yet know that the characterizing
property determines the cross product uniquely, or even that a cross product
that satisfies the characterizing property exists at all. But also as with the
determinant, we defer those issues and first reap the consequences of the
characterizing property with no reference to an unpleasant formula for the
cross product. Of course the cross product will exist and be unique, but for
now the point is that graceful arguments with its characterizing property show
that it has all the further properties that we want it to have.
Proof. (1) This follows from the skew-symmetry of the determinant. For every
w ∈ R3 ,
$$\langle v\times u, w\rangle = \det(v, u, w) = -\det(u, v, w) = -\langle u\times v, w\rangle = \langle -(u\times v), w\rangle.$$
Since w is arbitrary, v × u = −u × v.
(2) For the first variable, this follows from the linearity of the determinant
in its first row-vector variable and the linearity of the inner product in its first
vector variable. Fix a, a′ ∈ R, u, u′ , v ∈ R3 . For every w ∈ R3 ,
⟨(au + a′ u′ ) × v, w⟩ = det(au + a′ u′ , v, w)
= a det(u, v, w) + a′ det(u′ , v, w)
= a⟨u × v, w⟩ + a′ ⟨u′ × v, w⟩
= ⟨a(u × v) + a′ (u′ × v), w⟩.
Since w is arbitrary, (au + a′ u′ ) × v = a(u × v) + a′ (u′ × v). The proof for the
second variable follows from the result for the first variable and from (1).
(4) Suppose that u and v are linearly independent, and let w be any vector
such that {u, v, w} is a basis of R3 , so that det(u, v, w) ≠ 0, i.e.,
⟨u × v, w⟩ = det(u, v, w) ≠ 0.
Therefore u × v ≠ 0.
(5) By (4), u × v ≠ 0, so 0 < ⟨u × v, u × v⟩ = det(u, v, u × v). By the results
on determinants and orientation, {u, v, u × v} is right-handed.
(6) By definition, ∣u × v∣2 = ⟨u × v, u × v⟩ = det(u, v, u × v). As discussed
earlier in this section, det(u, v, u × v) = area P(u, v) ∣u × v∣. The result follows
from dividing by ∣u × v∣ if it is nonzero, and from (4) otherwise. □
Now we show that the characterizing property determines the cross prod-
uct uniquely. The idea is that a vector’s inner products with all other vectors
completely describe the vector itself. The observation to make is that for every
vector x ∈ Rn (n need not be 3 in this paragraph),
$$x = (\langle x, e_1\rangle, \langle x, e_2\rangle, \ldots, \langle x, e_n\rangle).$$
That is, the inner product values ⟨x, w⟩ for all w ∈ Rn specify x, as anticipated.
To prove that the cross product exists, it suffices to write a formula for it
that satisfies the characterizing property in Definition 3.10.1. Since we need
the cross product to have components
⟨u × v, e1 ⟩ = det(u, v, e1 ),
⟨u × v, e2 ⟩ = det(u, v, e2 ),
⟨u × v, e3 ⟩ = det(u, v, e3 ),
the only possible formula is to construct the cross product from these compo-
nents,
u × v = (det(u, v, e1 ), det(u, v, e2 ), det(u, v, e3 )).
This formula indeed satisfies the definition, because by definition of the inner
product and then by the linearity of the determinant in its third argument,
we have for every w = (w1 , w2 , w3 ) ∈ R3 ,
⟨u × v, w⟩ = det(u, v, e1 ) ⋅ w1 + det(u, v, e2 ) ⋅ w2 + det(u, v, e3 ) ⋅ w3
= det(u, v, w1 e1 + w2 e2 + w3 e3 )
= det(u, v, w).
In coordinates, the formula for the cross product is
$$u\times v = \left(\det\begin{bmatrix} u_1 & u_2 & u_3\\ v_1 & v_2 & v_3\\ 1 & 0 & 0\end{bmatrix},\ \det\begin{bmatrix} u_1 & u_2 & u_3\\ v_1 & v_2 & v_3\\ 0 & 1 & 0\end{bmatrix},\ \det\begin{bmatrix} u_1 & u_2 & u_3\\ v_1 & v_2 & v_3\\ 0 & 0 & 1\end{bmatrix}\right)$$
$$= (u_2v_3 - u_3v_2,\ u_3v_1 - u_1v_3,\ u_1v_2 - u_2v_1).$$
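For instance, taking u = (1, 2, 3) and v = (4, 5, 6), the formula gives
$$u\times v = (2\cdot 6 - 3\cdot 5,\ 3\cdot 4 - 1\cdot 6,\ 1\cdot 5 - 2\cdot 4) = (-3, 6, -3),$$
and indeed ⟨u × v, u⟩ = −3 + 12 − 9 = 0 and ⟨u × v, v⟩ = −12 + 30 − 18 = 0, as the characterizing property demands.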
A bit more conceptually, the cross product formula in coordinates is
$$u\times v = \det\begin{bmatrix} u_1 & u_2 & u_3\\ v_1 & v_2 & v_3\\ e_1 & e_2 & e_3\end{bmatrix}.$$
The previous display is only a mnemonic device: strictly speaking, it doesn’t
lie within our grammar, because the entries of the bottom row are vectors
rather than scalars. But even so, its two terms u1 v2 e3 − u2 v1 e3 do give the
third entry of the cross product, and similarly for the others. In Chapter 9,
where we will have to compromise our philosophy of working intrinsically
rather than in coordinates, this formula will be cited and generalized. In the
meantime, its details are not important except for mechanical calculations,
and we want to use it as little as possible, as with the determinant earlier.
Indeed, the display shows that the cross product is essentially a special case
of the determinant.
It is worth knowing the cross products of the standard basis pairs,
e1 × e1 = 03 ,   e1 × e2 = e3 ,   e1 × e3 = −e2 ,
e2 × e1 = −e3 ,   e2 × e2 = 03 ,   e2 × e3 = e1 ,
e3 × e1 = e2 ,   e3 × e2 = −e1 ,   e3 × e3 = 03 .
Here ei × ej is 03 if i = j, and ei × ej is the remaining standard basis vector if
i ≠ j and i and j are in order in the diagram
i ≠ j and i and j are in order in the cyclic diagram
$$1 \to 2 \to 3 \to 1,$$
and ei × ej is minus the remaining standard basis vector if i ≠ j and i and j
are out of order in the diagram.
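For instance, the value e1 × e2 = e3 can be reread directly from the characterizing property: for every w,
$$\langle e_1\times e_2, w\rangle = \det(e_1, e_2, w) = \det\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ w_1 & w_2 & w_3\end{bmatrix} = w_3 = \langle e_3, w\rangle.$$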
The plane in R3 that passes through the point p and is normal to the direction n is
P (p, n) = {q ∈ R3 ∶ ⟨q − p, n⟩ = 0}.
In coordinates, a point (x, y, z) lies in P ((xp , yp , zp ), (xn , yn , zn )) exactly when
(x − xp )xn + (y − yp )yn + (z − zp )zn = 0.
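For instance, the plane through p = (2, 0, 1) with normal direction n = (3, −1, 2) is
$$3(x-2) - (y-0) + 2(z-1) = 0, \quad\text{i.e.,}\quad 3x - y + 2z = 8.$$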
Exercises
3.10.1. Evaluate (2, 0, −1) × (1, −3, 2).
3.10.2. Suppose that a vector v ∈ R3 takes the form v = u1 × e1 = u2 × e2 for
some u1 and u2 . Describe v.
3.10.3. True or false: For all u, v, w in R3 , (u × v) × w = u × (v × w).
3.10.4. Express (u + v) × (u − v) as a scalar multiple of u × v.
3.10.5. (a) Let U, V ∈ Mn (R) be skew-symmetric, meaning that U T = −U and
similarly for V , where U T is the transpose of U (Exercise 3.2.4). Show that aU
is skew-symmetric for every a ∈ R, and that U + V is skew-symmetric. Thus
the skew-symmetric matrices form a vector space. Show furthermore that the
Lie bracket [U, V ] = U V − V U is skew-symmetric. One can optionally check
that although the Lie bracket product is not in general associative, it instead
satisfies the Jacobi identity,
[U, [V, W ]] + [V, [W, U ]] + [W, [U, V ]] = 0.
(b) Encode the vectors u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) as 3 × 3 skew-
symmetric matrices,
$$U = \begin{bmatrix} 0 & -u_1 & -u_2\\ u_1 & 0 & -u_3\\ u_2 & u_3 & 0\end{bmatrix}, \qquad V = \begin{bmatrix} 0 & -v_1 & -v_2\\ v_1 & 0 & -v_3\\ v_2 & v_3 & 0\end{bmatrix}.$$
Show that the Lie bracket product [U, V ] encodes the cross product u × v.
3.10.6. Investigate the extent to which a cancellation law holds for the cross
product, as follows: for fixed u, v in R3 with u ≠ 0, describe the vectors w
satisfying the condition u × v = u × w.
3.10.7. What is the line specified by two points p and p′ ?
3.10.8. Give conditions on the points p, p′ and the directions d, d′ so that
ℓ(p, d) = ℓ(p′ , d′ ).
3.10.9. Express the relation between the coordinates of a point on ℓ(p, d) if
the x-component of d is 0.
3.10.10. What can you conclude about the lines
$$\frac{x - x_p}{x_d} = \frac{y - y_p}{y_d} = \frac{z - z_p}{z_d} \qquad\text{and}\qquad \frac{x - x_p}{x_D} = \frac{y - y_p}{y_D} = \frac{z - z_p}{z_D}$$
given that xd xD + yd yD + zd zD = 0? What can you conclude if instead xd /xD =
yd /yD = zd /zD ?
3.10.11. Show that ℓ(p, d) and ℓ(p′ , d′ ) intersect if and only if the linear
equation Dt = ∆p is solvable, where D ∈ M3,2 (R) has columns d and d′ , t
is the column vector [ tt12 ], and ∆p = p′ − p. For what points p and p′ do
ℓ(p, (1, 2, 2)) and ℓ(p′ , (2, −1, 4)) intersect?
3.10.12. Use vector geometry to show that the distance from the point q to
the line ℓ(p, d) is
$$\frac{|(q-p)\times d|}{|d|}.$$
(Hint: what is the area of the parallelogram spanned by q − p and d?) Find
the distance from the point (3, 4, 5) to the line ℓ((1, 1, 1), (1, 2, 3)).
3.10.13. Show that the time of nearest approach of two particles whose po-
sitions are s(t) = p + tv, s̃(t) = p̃ + tṽ is t = −⟨∆p, ∆v⟩/∣∆v∣2 . (You may assume
that the particles are at their nearest approach when the difference of their
velocities is orthogonal to the difference of their positions.)
3.10.14. Write the equation of the plane through (1, 2, 3) with normal direc-
tion (1, 1, 1).
3.10.15. Where does the plane x/a + y/b + z/c = 1 intersect each axis?
3.10.16. Specify the plane containing the point p and spanned by directions
d and d′ . Specify the plane containing the three points p, q, and r.
3.10.17. Use vector geometry to show that the distance from the point q to
the plane P (p, n) is
$$\frac{|\langle q-p, n\rangle|}{|n|}.$$
(Hint: Resolve q − p into components parallel and normal to n.) Find the
distance from the point (3, 4, 5) to the plane P ((1, 1, 1), (1, 2, 3)).
4
The Derivative
For a function f ∶ R Ð→ R and a point a, the derivative is defined as a limit,
$$f'(a) = \lim_{h\to 0}\frac{f(a+h)-f(a)}{h}.$$
But for every integer n > 1, the corresponding expression makes no sense for
a mapping f ∶ Rn Ð→ Rm and for a point a of Rn . Indeed, the expression is
$$\lim_{h\to 0_n}\frac{f(a+h)-f(a)}{h},$$
but this is not even grammatically admissible—there is no notion of division by
the vector h. That is, the standard definition of derivative does not generalize
to more than one input variable.
The breakdown here cannot be repaired by any easy patch. We must re-
think the derivative altogether in order to extend it to many variables.
Fortunately, the reconceptualization is richly rewarding.
For example, the two component functions of the identity mapping of R2 ,
ψ(h, k) = h, φ(h, k) = k,
are O((h, k)), because the size bounds say that they are bounded absolutely by
the O(h)-mapping ϕ1 (h, k) = ∣(h, k)∣, i.e., ∣ψ(h, k)∣ = ∣h∣ ≤ ∣(h, k)∣ and similarly
for φ. For general n and for every i ∈ {1, . . . , n}, now letting h denote a vector
again as usual rather than the first component of a vector as it did a moment
ago, the ith component function
ψ ∶ Rn Ð→ R, ψ(h) = hi
is O(h) by the same argument. We will use this observation freely in the
sequel.
The o(1) and O(h) and o(h) conditions give rise to predictable closure
properties.
Proposition 4.2.4 (Vector space properties of the Landau spaces).
For every fixed domain-ball B(0n , ε) and codomain-space Rm , the o(1)-map-
pings form a vector space, and O(h) forms a subspace, of which o(h) forms
a subspace in turn. That is, o(1) and O(h) and o(h) absorb addition and scalar multiplication.
The fact that o(1) forms a vector space encodes the rules that sums and
constant multiples of continuous mappings are again continuous.
Proof (Sketch). Consider any ϕ, ψ ∈ o(1) and any α ∈ R. For every c > 0, for all h close enough to 0n , ∣ϕ(h)∣ ≤ c/2 and ∣ψ(h)∣ ≤ c/2, so that ∣(ϕ + ψ)(h)∣ ≤ c, and ∣(αϕ)(h)∣ ≤ ∣α∣c, with ∣α∣c arbitrary because c is. The argument for O(h) uses bounds of the form c∣h∣ instead, e.g., ∣(αϕ)(h)∣ ≤ (∣α∣c)∣h∣, and the argument for o(h) is similar to the argument for o(1) (Exercise 4.2.3). □
Using the left side of the size bounds and then the vector space properties of
o(1) and then the right side of the size bounds, we get
$$|\varphi| \text{ is } o(1) \;\Longrightarrow\; \text{each } |\varphi_i| \text{ is } o(1) \;\Longrightarrow\; \sum_{i=1}^{m} |\varphi_i| \text{ is } o(1) \;\Longrightarrow\; |\varphi| \text{ is } o(1).$$
Thus ∣ϕ∣ is o(1) if and only if each ∣ϕi ∣ is. As explained just above, we may
drop the absolute values, and so in fact ϕ is o(1) if and only if each ϕi is,
as desired. The arguments for the O(h) and o(h) conditions are the same
(Exercise 4.2.4). The componentwise nature of the o(1) condition encodes the
componentwise nature of continuity.
The role of linear mappings in the Landau notation scheme is straightfor-
ward, affirming the previously mentioned intuition that the O(h) condition de-
scribes at-most-linear growth and the o(h) condition describes smaller-than-
linear growth.
Proposition 4.2.5. Every linear mapping is O(h). The only o(h) linear
mapping is the zero mapping.
Proof. For the second statement, suppose that T is not the zero mapping, and take a unit vector h0 such that T (h0 ) ≠ 0. For every t ≠ 0, ∣T (th0 )∣ = ∣T (h0 )∣ ∣th0 ∣. That is, ∣T (h)∣ > c∣h∣ for some arbitrarily small h-values, i.e., it is not the case that ∣T (h)∣ ≤ c∣h∣ for all small enough h. Thus T fails the o(h) definition for the particular constant c = ∣T (h0 )∣/2. □
The Landau conditions also interact well with multiplication. The product property for Landau functions (Proposition 4.2.6) is that if ϕ is O(h) and ψ is o(1), where
ϕ, ψ, ϕψ ∶ B(0n , ε) Ð→ R,
then the product ϕψ is o(h). In particular, the product of two linear functions is o(h).
Proof. Let c > 0 be given. For some d > 0, ∣ϕ(h)∣ ≤ d∣h∣ for all small enough h, and for all h close enough to 0n , ∣ψ(h)∣ ≤ c/d, and so
∣(ϕψ)(h)∣ ≤ c∣h∣.
The second statement of the proposition follows from its first statement and the previous proposition. □
For example, the two projection functions
π1 , π2 ∶ R2 Ð→ R, π1 (h, k) = h, π2 (h, k) = k
are linear, so that each product πi πj is o(h, k) by the proposition. The proposition thus combines with the vector space properties of o(h, k) to say that the functions
α, β ∶ R2 Ð→ R, α(h, k) = h2 − k 2 , β(h, k) = hk
are o(h, k). The Landau conditions also compose predictably:
o(o(1)) = o(1),
O(O(h)) = O(h),
o(O(h)) = o(h),
O(o(h)) = o(h).
That is, o(1) and O(h) absorb themselves, and o(h) absorbs O(h) from either
side.
The rule o(o(1)) = o(1) encodes the persistence of continuity under com-
position.
Proof. For example, to verify the third rule, suppose that ϕ ∶ B(0n , ε) Ð→ Rm
is O(h) and that ψ ∶ B(0m , ρ) Ð→ Rℓ is o(k). Then ∣ϕ(h)∣ ≤ c∣h∣ for some c > 0 and all small enough h, and so for every d > 0, once h is small enough,
∣(ψ ○ ϕ)(h)∣ ≤ d∣ϕ(h)∣ ≤ (dc)∣h∣.
Since c is some particular positive number and d can be any positive number,
cd again can be any positive number. That is, letting e = cd and combining
the previous two displays, we have
for every e > 0, ∣(ψ ○ ϕ)(h)∣ ≤ e∣h∣ for all small enough h.
To repeat the argument precisely, we are given that
there exist c > 0 and δ > 0 such that ∣ϕ(h)∣ ≤ c∣h∣ if ∣h∣ ≤ δ,
and that
for every d > 0 there exists εd > 0 such that ∣ψ(k)∣ ≤ d∣k∣ if ∣k∣ ≤ εd .
Now let e > 0 be given. Define d = e/c and ρe = min{δ, εd /c}. Suppose that
∣h∣ ≤ ρe . Then
∣ϕ(h)∣ ≤ c∣h∣ ≤ cρe ≤ εd ,
and so
∣ψ(ϕ(h))∣ ≤ d∣ϕ(h)∣ ≤ dc∣h∣.
That is,
∣ψ(ϕ(h))∣ ≤ e∣h∣ since cd = e.
This shows that ψ ○ ϕ is o(h), since for every e > 0 there exists ρe > 0 such
that ∣(ψ ○ ϕ)(h)∣ ≤ e∣h∣ if ∣h∣ ≤ ρe .
The other rules are proved similarly (Exercise 4.2.5). □
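For a concrete instance of the rule o(O(h)) = o(h): the function ϕ(h) = 2h is O(h), the function ψ(k) = ∣k∣3/2 is o(k), and their composition
$$(\psi\circ\varphi)(h) = |2h|^{3/2} = 2\sqrt2\,|h|^{3/2}$$
is again o(h).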
Exercises
ϕe ∶ Rn Ð→ R, ϕe (x) = ∣x∣e .
(a) Suppose that e > 0. Let c > 0 be given. If ∣h∣ ≤ c1/e then what do we
know about ∣ϕe (h)∣ in comparison to c? What does this tell us about ϕe ?
(b) Prove that ϕ1 is O(h).
(c) Suppose that e > 1. Combine parts (a) and (b) with the product prop-
erty for Landau functions (Proposition 4.2.6) to show that ϕe is o(h).
(d) Explain how parts (a), (b), and (c) have proved Proposition 4.2.2.
4.2.4. Establish the componentwise nature of the O(h) condition, and estab-
lish the componentwise nature of the o(h) condition.
The one-variable derivative,
$$f'(a) = \lim_{h\to 0}\frac{f(a+h)-f(a)}{h},$$
is a construction. To rethink the derivative, we should characterize it instead.
To think clearly about what it means for the graph of a function to have a
tangent of slope t at a point (a, f (a)), we should work in local coordinates and
normalize to the case of a horizontal tangent. That is, given a function f of
x-values near some point a, and given a candidate tangent-slope t at (a, f (a)),
define a related function g of h-values near 0,
$$g(h) = f(a+h) - f(a) - th.$$
Thus g takes 0 to 0, and the graph of g near the origin is like the graph of f
near (a, f (a)) but with the line of slope t subtracted. To reiterate, the idea
that f has a tangent of slope t at (a, f (a)) has been normalized to the tidier
idea that g has slope 0 at the origin:
To say that the graph of g is horizontal at the origin is to say that for
every positive real number c, however small, the region between the
lines of slope ±c contains the graph of g close enough to the origin.
That is:
for every c > 0, ∣g(h)∣ ≤ c∣h∣ for all small enough h.
The intuitive condition for the graph of g to be horizontal at the origin
is precisely that g is o(h). The horizontal nature of the graph of g
at the origin connotes that the graph of f has a tangent of slope t
at (a, f (a)).
The symbolic connection between this characterization of the derivative
and the constructive definition is immediate. As always, the definition of f
having derivative f ′ (a) at a is
$$\lim_{h\to 0}\frac{f(a+h)-f(a)}{h} = f'(a),$$
which is to say,
$$\lim_{h\to 0}\frac{f(a+h)-f(a)-f'(a)h}{h} = 0,$$
and indeed, this is precisely the o(h) condition on g. Figure 4.2 illustrates the
idea that when h is small, not only is the vertical distance f (a + h) − f (a) −
f ′ (a)h from the tangent line to the curve small as well, but it is small even
relative to the horizontal distance h.
We need to scale these ideas up to many dimensions. Instead of viewing
the one-variable derivative as the scalar f ′ (a), think of it as the corresponding
linear mapping
Ta ∶ R Ð→ R, Ta (h) = f ′ (a)h.
(Figures: the graph of f near a, with the tangent-line height f (a) + f ′ (a)h alongside f (a + h); and the mapping Ta carrying h to Ta (h) ≈ f (a + h) − f (a).)
Now we can define the derivative in a way that encompasses many variables
and is suitably local.
For example, a function f of two variables should be differentiable
at (a, b) if its graph has a well-fitting tangent plane through (a, b, f (a, b)).
(See Figure 4.4.) Here the derivative of f at (a, b) is the linear mapping
taking (h, k) to αh + βk, and the Jacobian matrix of f at a is therefore [α, β].
The tangent plane in the figure is not the graph of the derivative Df(a,b) ,
but rather a translation of the graph. Another way to say this is that the
(h, k, Df(a,b) (h, k))-coordinate system has its origin at the point (a, b, f (a, b))
in the figure.
(Figure 4.4: the graph of f (x, y) with its well-fitting tangent plane at the point above (a, b); the plane is the translated graph of the linear mapping T (h, k).)
Similarly, for a mapping f from R to R3 , the derivative at a point a is a linear
mapping t ↦ (αt, βt, γt), whose Jacobian matrix is the column [α β γ]ᵀ, and the
image of f has a well-fitting tangent line ℓ through f (a). Note that the figure
does not show the domain of f : what is shown is the image of f in R3 , not
the graph.
f ∶ A Ð→ R, f (x, y) = x2 − y 2 .
We show that for every point (a, b) ∈ A, f is differentiable at (a, b), and its
derivative is the linear mapping
$$Df_{(a,b)}(h,k) = 2ah - 2bk.$$
To verify this, we need to check Definition 4.3.2. The point that is written
in the definition intrinsically as a (where a is a vector) is written here in
coordinates as (a, b) (where a and b are scalars), and similarly the vector h in
the definition is written (h, k) here, because the definition is intrinsic, whereas
here we are going to compute. To check the definition, first note that every
point (a, b) of A is an interior point; the fact that every point of A is interior
doesn’t deserve a detailed proof right now, only a quick comment. Second,
confirm the derivative’s characterizing property (4.1) by calculating that
$$f(a+h, b+k) - f(a,b) - (2ah - 2bk) = h^2 - k^2.$$
We saw immediately after the product property for Landau functions (Propo-
sition 4.2.6) that h2 −k 2 is o(h, k). This is the desired result. Also, the calcula-
tion tacitly shows how the derivative was found for us to verify: the difference
f (a + h, b + k) − f (a, b) is 2ah − 2bk + h2 − k 2 , which as a function of h and k has
a linear part 2ah − 2bk and a quadratic part h2 − k 2 that is much smaller when
h and k are small. The linear approximation of the difference is the derivative.
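To see the approximation numerically at, say, (a, b) = (1, 2), where Df(1,2) (h, k) = 2h − 4k: with (h, k) = (0.1, 0.2),
$$f(1.1, 2.2) = 1.21 - 4.84 = -3.63, \qquad f(1,2) + Df_{(1,2)}(0.1, 0.2) = -3 + (0.2 - 0.8) = -3.6,$$
and the error −0.03 = h2 − k 2 is small even relative to ∣(h, k)∣ ≈ 0.22.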
Before continuing, we need to settle a grammatical issue. Definition 4.3.2
refers to any linear mapping that satisfies condition (4.1) as the derivative of f
at a. Fortunately, the derivative, if it exists, is unique, justifying the definite
article. The uniqueness is geometrically plausible: if two straight objects (e.g.,
lines or planes) approximate the graph of f well near (a, f (a)), then they
should also approximate each other well enough that straightness forces them
to coincide. The quantitative argument amounts to recalling that the only
linear o(h)-mapping is zero.
Proof. Suppose that the linear mappings Ta , T̃a ∶ Rn Ð→ Rm are both derivatives of f at a. Then the two mappings
$$f(a+h) - f(a) - T_a(h) \qquad\text{and}\qquad f(a+h) - f(a) - \tilde{T}_a(h)$$
are both o(h). By the vector space properties of o(h), so is their difference
(T̃a − Ta )(h). Since the linear mappings from Rn to Rm form a vector space
as well, the difference T̃a − Ta is linear. But the only o(h) linear mapping is
the zero mapping, so T̃a = Ta as desired. □
Also as in one variable, differentiability at a point implies continuity there.
Proof. Compute, using the differentiability of f at a and the fact that linear mappings are O(h), then the containment o(h) ⊂ O(h) and the closure of O(h) under addition, and finally the containment O(h) ⊂ o(1), that
$$f(a+h) - f(a) = Df_a(h) + o(h) \subset O(h) + o(h) \subset O(h) \subset o(1),$$
which is precisely the condition that f is continuous at a. □
We will study the derivative via two routes. On the one hand, the linear
mapping Dfa ∶ Rn Ð→ Rm is specified by mn scalar entries of its matrix f ′ (a),
and so calculating the derivative is tantamount to determining these scalars
by using coordinates. On the other hand, developing conceptual theorems
without getting lost in coefficients and indices requires the intrinsic idea of
the derivative as a well-approximating linear mapping.
Exercises
4.3.1. Let T ∶ Rn Ð→ Rm be a linear mapping. Show that for every ε > 0, the
behavior of T on B(0n , ε) determines the behavior of T everywhere.
The derivative of a constant mapping C is the zero mapping Z, and the derivative of (the restriction of) a linear mapping T is T itself, at any interior point of the domain.
Proof. (1) Being the zero mapping, C(a + h) − C(a) − Z(h) is crushingly o(h), showing that Z meets the condition to be DCa . And (2) is similar (Exercise 4.4.1). □
The proof is a matter of seeing that the vector space properties of o(h)
encode the sum rule and constant multiple rule for derivatives.
Proof. Since f and g are differentiable at a, some ball about a lies in A and
some ball about a lies in B. The smaller of these two balls lies in A ∩ B. That
is, a is an interior point of the domain of f + g. With this topological issue
settled, proving the proposition reduces to direct calculation. For (1),
$$(f+g)(a+h) - (f+g)(a) - (Df_a + Dg_a)(h) = \big(f(a+h)-f(a)-Df_a(h)\big) + \big(g(a+h)-g(a)-Dg_a(h)\big),$$
which is o(h) + o(h) = o(h). Statement (2) is Exercise 4.4.2. □
Elaborate mappings are built by composing simpler ones. The next theo-
rem is the important result that the derivative of a composition is the composi-
tion of the derivatives. That is, the best linear approximation of a composition
is the composition of the best linear approximations.
The fact that we can prove that the derivative of a composition is the
composition of the derivatives without an explicit formula for the derivative
is akin to the fact in the previous chapter that we could prove that the deter-
minant of the product is the product of the determinants without an explicit
formula for the determinant.
Proof. To showcase the true issues of the argument clearly, we reduce the
problem to a normalized situation. For simplicity, we first take a = 0n and
f (a) = 0m . So we are given that
$$f(h) = S(h) + o(h) \qquad\text{and}\qquad g(k) = T(k) + o(k),$$
where S = Df0n and T = Dg0m are linear. Compute that
$$g(f(h)) = T(f(h)) + o(f(h)) = (T\circ S)(h) + T(o(h)) + o\big(O(h)\big).$$
Since o(h) ⊂ O(h) and O(h) is closed under addition, since o(h) absorbs O(h)
from either side, and since o(h) is closed under addition, the error (the last
two terms on the right side of the previous display) is
$$T(o(h)) + o\big(O(h)\big) \subset O(o(h)) + o\big(O(h)\big) = o(h) + o(h) = o(h),$$
exactly as desired. The crux of the matter is that o(h) absorbs O(h) from
either side.
For the general case, no longer assuming that a = 0n and f (a) = 0m , we
are given that
$$f(a+h) = f(a) + S(h) + o(h) \qquad\text{and}\qquad g(f(a)+k) = g(f(a)) + T(k) + o(k),$$
where S = Dfa and T = Dgf (a) . Compute that
$$(g\circ f)(a+h) - (g\circ f)(a) - (T\circ S)(h) = T(o(h)) + o\big(O(h)\big),$$
and from here the proof that the remainder term is o(h) is precisely as it is
in the normalized case. □
Two quick applications of the chain rule arise naturally for scalar-valued
functions. Given two such functions, not only is their sum defined, but because
R is a field (unlike Rm for m > 1), so is their product and so is their quotient at
points where g is nonzero. With some help from the chain rule, the derivative
laws for product and quotient follow easily from elementary calculations.
To set up the product rule, define the product function
p ∶ R2 Ð→ R, p(x, y) = xy.
Then:
(1) The derivative of p at every point (a, b) ∈ R2 exists and is
Dp(a,b) (h, k) = bh + ak.
(2) The corresponding statement holds for the quotient function where it is defined.
Proof. (1) Compute that
p(a + h, b + k) − p(a, b) − (bh + ak) = hk.
By the size bounds, ∣h∣ ≤ ∣(h, k)∣ and ∣k∣ ≤ ∣(h, k)∣, so ∣hk∣ = ∣h∣ ∣k∣ ≤ ∣(h, k)∣2 .
Since ∣(h, k)∣2 is ϕ2 (h, k) (where ϕe is the example from Proposition 4.2.2), it
is o(h, k).
Statement (2) is left as Exercise 4.4.3. □
Now, for f and g differentiable at a, the chain rule gives
$$D(fg)_a(h) = D(p\circ(f,g))_a(h) = \big(Dp_{(f,g)(a)}\circ D(f,g)_a\big)(h) = Dp_{(f(a),g(a))}\big(Df_a(h), Dg_a(h)\big),$$
and by the lemma,
$$Dp_{(f(a),g(a))}\big(Df_a(h), Dg_a(h)\big) = g(a)\,Df_a(h) + f(a)\,Dg_a(h).$$
This proves (1). Statement (2) is similar (Exercise 4.4.4) but with the wrinkle
that one needs to show that since g(a) ≠ 0 and since Dga exists, it follows
that a is an interior point of the domain of f /g. Here it is relevant that g
must be continuous at a, and so by the persistence of inequality principle
(Proposition 2.3.10), g is nonzero on some ε-ball at a, as desired. □
For example, let X and Y denote the coordinate functions on R2 , X(x, y) = x
and Y (x, y) = y, so that DX(a,b) (h, k) = h and DY(a,b) (h, k) = k. By the quotient
rule, the function f = (X 2 − Y )/(Y + 1) has derivative at any point (a, b) with
b ≠ −1,
$$\begin{aligned}
Df_{(a,b)}(h,k) &= \frac{(Y+1)(a,b)\,D(X^2-Y)_{(a,b)} - (X^2-Y)(a,b)\,D(Y+1)_{(a,b)}}{\big((Y+1)(a,b)\big)^2}\,(h,k)\\
&= \frac{(b+1)\big(D(X^2)_{(a,b)} - DY_{(a,b)}\big) - (a^2-b)\big(DY_{(a,b)} + D1_{(a,b)}\big)}{(b+1)^2}\,(h,k)\\
&= \frac{(b+1)\big(2X(a,b)\,DX_{(a,b)} - DY_{(a,b)}\big) - (a^2-b)\,DY_{(a,b)}}{(b+1)^2}\,(h,k)\\
&= \frac{(b+1)(2ah - k) - (a^2-b)\,k}{(b+1)^2}\\
&= \frac{2a}{b+1}\,h - \frac{a^2+1}{(b+1)^2}\,k.
\end{aligned}$$
In practice, this method is too unwieldy for any functions beyond the simplest,
and in any case, it applies only to mappings with rational component func-
tions. But on the other hand, there is no reason to expect much in the way of
computational results from our methods so far, since we have been studying
the derivative based on its intrinsic characterization. In the next section we
will construct the derivative in coordinates, enabling us to compute easily by
drawing on the results of one-variable calculus.
For another application of the chain rule, let A and B be subsets of Rn ,
and suppose that f ∶ A Ð→ B is invertible with inverse g ∶ B Ð→ A. Suppose
further that f is differentiable at a ∈ A and that g is differentiable at f (a).
The composition g ○ f is the identity mapping idA ∶ A Ð→ A, which, being the
restriction of a linear mapping, has that linear mapping id ∶ Rn Ð→ Rn as its
derivative at a. Therefore,
$$\mathrm{id} = D(\mathrm{id}_A)_a = D(g\circ f)_a = Dg_{f(a)} \circ Df_a.$$
This argument partly shows that for invertible f as described, the linear map-
ping Dfa is also invertible. A symmetric argument completes the proof by
showing that also id = Dfa ○ Dgf (a) . Because we have methods available to
check the invertibility of a linear map, we can apply this criterion once we
know how to compute derivatives.
Not too much should be made of this result, however; its hypotheses are
too strong. Even in the one-variable case, the function f (x) = x3 from R
to R is invertible and yet has the noninvertible derivative 0 at x = 0. (The
inverse, g(x) = x1/3 , is not differentiable at 0, so the conditions above are not
met.)
met.) Besides, we would prefer a converse statement, that if the derivative is
invertible then so is the mapping. The converse statement is not true, but we
will see in Chapter 5 that it is locally true, i.e., it is true in the small.
Exercises
4.4.1. Prove part (2) of Proposition 4.4.1.
4.4.2. Prove part (2) of Proposition 4.4.2.
4.4.3. Prove part (2) of Lemma 4.4.4.
4.4.4. Prove the quotient rule.
4.4.5. Let f (x, y, z) = xyz. Find Df(a,b,c) for arbitrary (a, b, c) ∈ R3 . (Hint:
f is the product XY Z, where X is the linear function X(x, y, z) = x and
similarly for Y and Z.)
4.4.6. Define f (x, y) = xy 2 /(y − 1) on {(x, y) ∈ R2 ∶ y ≠ 1}. Find Df(a,b) where
(a, b) is a point in the domain of f .
4.4.7. (A generalization of the product rule.) Recall that a function
f ∶ Rn × Rn Ð→ R
is called bilinear if for all x, x′ , y, y ′ ∈ Rn and all α ∈ R,
f (x + x′ , y) = f (x, y) + f (x′ , y),
f (x, y + y ′ ) = f (x, y) + f (x, y ′ ),
f (αx, y) = αf (x, y) = f (x, αy).
(a) Show that if f is bilinear then f (h, k) is o(h, k).
(b) Show that if f is bilinear then f is differentiable with Df(a,b) (h, k) =
f (a, k) + f (h, b).
(c) What does this exercise say about the inner product?
4.4.8. (A bigger generalization of the product rule.) A function
f ∶ Rn × ⋯ × Rn Ð→ R
(there are k copies of Rn ) is called multilinear if for each j ∈ {1, . . . , k}, for
all x1 , . . . , xj , x′j , . . . , xk ∈ Rn and all α ∈ R,
f (x1 , . . . , xj + x′j , . . . , xk ) = f (x1 , . . . , xj , . . . , xk ) + f (x1 , . . . , x′j , . . . , xk )
f (x1 , . . . , αxj , . . . , xk ) = αf (x1 , . . . , xj , . . . , xk ).
(a) Show that if f is multilinear and a1 , . . . , ak , h1 , . . . , hk ∈ Rn then for
any j ∈ {2, . . . , k}, f (h1 , . . . , hj , aj+1 , . . . , ak ) is o(h1 , . . . , hk ). The same result
holds if any j inputs to f are h’s, rather than the first j inputs, because
permuting the inputs of a multilinear function creates another multilinear
function. Flesh this argument out as much as feels necessary for your under-
standing.
(b) Show that if f is multilinear then f is differentiable with
k
Df(a1 ,...,ak ) (h1 , . . . , hk ) = ∑ f (a1 , . . . , aj−1 , hj , aj+1 , . . . , ak ).
j=1
(c) When k = n, what does this exercise say about the determinant?
(Figure: the graph of f (x, y), with the tangent line ℓx at the point above (a, b).)
The line ℓx is tangent to a cross section of the graph of f . To see this cross
section, freeze the variable y at the value b and look at the resulting function
of one variable, ϕ(x) = f (x, b). The slope of ℓx in the vertical (x, b, z)-plane
is precisely ϕ′ (a). A small technicality here is that since (a, b) is an interior
point of A, also a is an interior point of the domain of ϕ.
Similarly, ℓy has slope ψ ′ (b) where ψ(y) = f (a, y). The linear function
approximating f (a + h, b + k) − f (a, b) for small (h, k) is now specified as
T (h, k) = ϕ′ (a)h + ψ ′ (b)k. Thus Df(a,b) has matrix [ϕ′ (a) ψ ′ (b)].
The cross-sectional derivatives ϕ′ (a) and ψ ′ (b) are the prototype of the general
definition (Definition 4.5.1): for f ∶ A Ð→ R with A ⊂ Rn , an interior point a
of A, and an index j ∈ {1, . . . , n}, let ϕ be the jth cross-sectional function,
ϕ(t) = f (a1 , . . . , aj−1 , t, aj+1 , . . . , an ); then the jth partial derivative of f at a is
Dj f (a) = ϕ′ (aj )
if this derivative exists.
Partial derivatives are easy to compute: fix all but one of the variables,
and then take the one-variable derivative with respect to the variable that
remains. For example, if
f (x, y, z) = ey cos x + z
then
$$D_1 f(x,y,z) = -e^y\sin x, \qquad D_2 f(x,y,z) = e^y\cos x, \qquad D_3 f(x,y,z) = 1.$$
Theorem 4.5.2 says that if the derivative Dfa exists, then each partial derivative Dj fi (a) exists, and the Jacobian matrix of f at a is the matrix of partial derivatives,
$$f'(a) = \big[D_j f_i(a)\big]_{i=1,\dots,m,\ j=1,\dots,n} = \begin{bmatrix} D_1 f_1(a) & \cdots & D_n f_1(a)\\ \vdots & \ddots & \vdots\\ D_1 f_m(a) & \cdots & D_n f_m(a)\end{bmatrix}.$$
Proof. The idea is to read off the (i, j)th entry of f ′ (a) by studying the ith
component function of f and letting h → 0n along the jth coordinate direction
in the defining property (4.1) of the derivative. The ensuing calculation will
repeat the quick argument in Section 4.3 that the characterization of the
derivative subsumes the construction in the one-variable case.
The derivative of the component function fi at a is described by the ith
row of f ′ (a). Call the row entries di1 , di2 , . . . , din . Since applying a linear
mapping is multiplying by its matrix, it follows that
(Dfi )a (tej ) = dij t for all t ∈ R.
Let h = tej with t a variable real number, so that h → 0n as t → 0R . Since
(Dfi )a exists, we have as a particular instance of the characterizing property
that fi (a + h) − fi (a) − (Dfi )a (h) is o(h),
$$0 = \lim_{t\to 0}\frac{|f_i(a+te_j) - f_i(a) - (Df_i)_a(te_j)|}{|te_j|} = \lim_{t\to 0}\left|\frac{f_i(a+te_j) - f_i(a) - d_{ij}t}{t}\right| = \lim_{t\to 0}\left|\frac{f_i(a+te_j) - f_i(a)}{t} - d_{ij}\right|.$$
That is,
$$\lim_{t\to 0}\frac{f_i(a+te_j) - f_i(a)}{t} = d_{ij}.$$
The previous display says precisely that Dj fi (a) exists and equals dij . □
So the existence of the derivative Dfa makes necessary the existence of all
partial derivatives of all component functions of f at a. The natural question
is whether their existence is also sufficient for the existence of Dfa . It is not.
The proof of Theorem 4.5.2 was akin to the straight line test from Section 2.3:
the general condition h → 0n was specialized to h = tej , i.e., to letting h
approach 0n only along the axes. The specialization let us show that the
derivative matrix entries are the partial derivatives of the component functions
of f . But the price for this specific information was loss of generality, enough
loss that the derived necessary conditions are not sufficient.
For example, the function
$$f\colon\mathbb{R}^2\to\mathbb{R}, \qquad f(x,y) = \begin{cases} \dfrac{2xy}{x^2+y^2} & \text{if } (x,y)\neq(0,0),\\[1ex] 0 & \text{if } (x,y)=(0,0) \end{cases}$$
has for its first partial derivative at the origin
$$D_1 f(0,0) = \lim_{t\to 0}\frac{f(t,0)-f(0,0)}{t} = \lim_{t\to 0}\frac{0-0}{t} = 0,$$
and similarly D2 f (0, 0) = 0; but as discussed in Chapter 2, f is not contin-
uous at the origin, much less differentiable there. However, this example is
contrived, the sort of function that one sees only in a mathematics class, and
in fact a result in the spirit of the converse to Theorem 4.5.2 does hold, though
with stronger hypotheses.
Theorem 4.5.3 gives a converse under stronger hypotheses: let f ∶ A Ð→ R
(where A ⊂ Rn ) and let a be an interior point of A; if the partial derivatives
of f exist at and about a and are continuous at a, then f is differentiable at a.
Proof (for n = 3). We need to show that the candidate for the derivative,
the linear mapping whose matrix is the row [D1 f (a) D2 f (a) D3 f (a)],
satisfies the defining property of the derivative. That is, we need to show that
$$f(a+h) - f(a) - \sum_{j=1}^{3} D_j f(a)\,h_j \quad\text{is } o(h).$$
We may take h small enough that the partial derivatives Dj f exist at all
points within distance ∣h∣ of a. Here we use the hypothesis that the partial
derivatives exist everywhere near a.
Because the partial derivatives exist, we may apply the mean value theorem
in two directions and the one-variable derivative’s characterizing property in
the third,
$$\begin{aligned}
f(a+h) - f(a) ={}& D_1 f(a_1+c_1,\, a_2+h_2,\, a_3+h_3)\,h_1 + D_2 f(a_1,\, a_2+c_2,\, a_3+h_3)\,h_2\\
&+ D_3 f(a)\,h_3 + o(h_3),
\end{aligned}$$
where ∣ci ∣ ≤ ∣hi ∣ for i = 1, 2. Since D1 f and D2 f are continuous at the point a =
(a1 , a2 , a3 ), and since the condition h → 03 squeezes each hi and ci to 0,
$$D_1 f(a_1+c_1,\, a_2+h_2,\, a_3+h_3) = D_1 f(a) + o(1), \qquad D_2 f(a_1,\, a_2+c_2,\, a_3+h_3) = D_2 f(a) + o(1).$$
Also, o(1)hi = o(h) for i = 1, 2 and o(h3 ) = o(h), and so altogether we have
$$f(a+h) - f(a) = \sum_{j=1}^{3} D_j f(a)\,h_j + o(h),$$
as desired. □
Note how all this compares to the discussion of the determinant in the
previous chapter. There we wanted the determinant to satisfy characterizing
properties. We found the only function that could possibly satisfy them, and
then we verified that it did. Here we wanted the derivative to satisfy a char-
acterizing property, and we found the only possibility for the derivative—the
linear mapping whose matrix consists of the partial derivatives, which must
exist if the derivative does. But analysis is more subtle than algebra: this linear
mapping need not satisfy the characterizing property of the derivative unless
we add further assumptions. The derivative-existence theorem, Theorem 4.5.3
or the slightly stronger Theorem 4.5.4, is the most substantial result so far
in this chapter. We have already seen a counterexample to the converse of
Theorem 4.5.3, in which the function had partial derivatives but wasn’t differ-
entiable because it wasn’t even continuous (page 155). For a one-dimensional
counterexample to the converse of Theorem 4.5.3, in which the derivative ex-
ists but is not continuous, see Exercise 4.5.3. The example in the exercise does
not contradict the weaker converse of the stronger Theorem 4.5.4.
To demonstrate the ideas of this section so far, consider the function
$$f(x,y) = \begin{cases} \dfrac{x^2 y}{x^2+y^2} & \text{if } (x,y)\neq(0,0),\\[1ex] 0 & \text{if } (x,y)=(0,0). \end{cases}$$
The top formula in the definition describes a rational function of x and y
on the punctured plane R2 − {(0, 0)}. Every rational function and all of its
partial derivatives are continuous on its domain (feel free to invoke this result),
and furthermore every point (a, b) away from (0, 0) lies in some ε-ball that
is also away from (0, 0). That is, for every point (a, b) ≠ (0, 0), the partial
derivatives of f exist at and about (a, b) and they are continuous at (a, b).
Thus the conditions for Theorem 4.5.3 are met, and so its conclusion follows:
f is differentiable at (a, b). Now Theorem 4.5.2 says that the derivative matrix
at (a, b) is the matrix of partial derivatives,
$$f'(a,b) = \begin{bmatrix} D_1 f(a,b) & D_2 f(a,b)\end{bmatrix} = \begin{bmatrix} \dfrac{2ab^3}{(a^2+b^2)^2} & \dfrac{a^2(a^2-b^2)}{(a^2+b^2)^2}\end{bmatrix},$$
and therefore the derivative is the linear mapping
$$Df_{(a,b)}(h,k) = \frac{2ab^3}{(a^2+b^2)^2}\,h + \frac{a^2(a^2-b^2)}{(a^2+b^2)^2}\,k.$$
However, this analysis breaks down at the point (a, b) = (0, 0). Here our only
recourse is to figure out whether a candidate derivative exists and then test
whether it works. The first partial derivative of f at (0, 0) is
$$D_1 f(0,0) = \lim_{t\to 0}\frac{f(t,0)-f(0,0)}{t} = 0,$$
and similarly D2 f (0, 0) = 0, so the only candidate for the derivative is the zero mapping, and the quantity to test is
$$|f(h,k) - f(0,0) - 0| = |f(h,k)| = \frac{|h|^2|k|}{|(h,k)|^2}.$$
Let (h, k) approach 02 along the line h = k. Because ∣h∣ = ∣(h, h)∣/√2, the displayed quantity is then ∣h∣/2 = ∣(h, h)∣/(2√2), so that ∣f (h, h)∣/∣(h, h)∣ holds constant at 1/(2√2) rather than shrinking to 0 as (h, h) → 02 . Thus f is not differentiable at (0, 0).
Figure 4.7. The crimped sheet is differentiable everywhere except at the origin
Similarly, the function g(x, y) = xey from Exercise 4.3.5 has domain R2 ,
all of whose points are interior, and its partial derivatives D1 g(x, y) = ey and
D2 g(x, y) = xey are continuous everywhere. Thus it is differentiable every-
where. Its matrix of partial derivatives at every point (a, b) is
$$g'(a,b) = \begin{bmatrix} e^b & a e^b\end{bmatrix}.$$
At every (x, y) where f is defined, the partial derivatives are D1 f1 (x, y) = 2x,
D2 f1 (x, y) = −2y, D1 f2 (x, y) = 2y, and D2 f2 (x, y) = 2x. These are continuous
functions of (x, y), so for every (a, b) ≠ (0, 0), Df(a,b) exists and its matrix is
$$f'(a,b) = \begin{bmatrix} 2a & -2b\\ 2b & 2a\end{bmatrix}.$$
The matrix has determinant 4(a2 + b2 ) > 0, and hence it is always invertible.
On the other hand, the mapping f takes the same value at points (x, y)
and −(x, y), so it is definitely not invertible.
With the Jacobian matrix described explicitly, a more calculational version
of the chain rule is available.
Proof. The composition is differentiable by the intrinsic chain rule. The Jacobian matrix of g at b is g ′ (b) and the Jacobian matrix of f at a is f ′ (a), and since the derivative of the composition is the composition of the derivatives, the Jacobian matrix of the composition is the matrix product,
$$(g\circ f)'(a) = g'(f(a))\,f'(a). \qquad\Box$$
In classical notation, the partial derivative D1 f of a quantity w = f (x, y, z) goes by various other names and symbols, such as
$$f_1, \quad f_x, \quad \frac{\partial f}{\partial x}, \quad w_x, \quad \frac{\partial w}{\partial x}.$$
If x, y, z are in turn functions of s and t then a classical formulation of the
chain rule would be
$$\frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial t}. \tag{4.2}$$
The formula is easily visualized as chasing back along all dependency chains
from t to w in a diagram where an arrow means contributes to:
(Diagram: each of s and t contributes to each of x, y, and z, and each of x, y, and z contributes to w.)
Unfortunately, for all its mnemonic advantages, the classical notation is a
veritable minefield of misinterpretation. Formula (4.2) doesn’t indicate where
the various partial derivatives are to be evaluated, for one thing. Specifying the
variable of differentiation by name rather than by position also becomes con-
fusing when different symbols are substituted for the same variable, especially
since the symbols themselves may denote specific values or other variables.
For example, one can construe many different meanings for the expression
$$\frac{\partial f}{\partial x}(y, x, z).$$
Blurring the distinction between functions and the variables denoting their
outputs is even more problematic. If one has, say, z = f (x, t, u), x = g(t, u),
(Diagram: t and u each contribute to x and also directly to z, and x contributes to z.)
then chasing all paths from z back to t gives
$$\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial t},$$
with “∂z/∂t” meaning something different on each side of the equality. While
the classical formulas are useful and perhaps simpler to apply in elementary
situations, they are not particularly robust until one has a solid understand-
ing of the chain rule. On the other hand, the classical formulas work fine
in straightforward applications, so several exercises are phrased in the older
language to give you practice with it.
For example, let f (r, θ) = (r cos θ, r sin θ) = (x, y) and g(x, y) = (x2 − y 2 , 2xy) = (z, w),
and suppose that we want ∂z/∂r at (r, θ) = (2, π/3). The classical chain rule takes the matrix form
$$\begin{bmatrix} \partial z/\partial r & \partial z/\partial\theta\\ \partial w/\partial r & \partial w/\partial\theta\end{bmatrix} = \begin{bmatrix} \partial z/\partial x & \partial z/\partial y\\ \partial w/\partial x & \partial w/\partial y\end{bmatrix}\cdot\begin{bmatrix} \partial x/\partial r & \partial x/\partial\theta\\ \partial y/\partial r & \partial y/\partial\theta\end{bmatrix},$$
and in particular
$$\frac{\partial z}{\partial r} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial r} = 2x\cos\theta - 2y\sin\theta.$$
We are given (r, θ) = (2, π/3), and it follows that (x, y) = (1, √3). So the
answer is
$$\frac{\partial z}{\partial r}(2, \pi/3) = 2\cdot1\cdot\frac12 - 2\cdot\sqrt3\cdot\frac{\sqrt3}{2} = -2.$$
To confirm the result without using the chain rule, note that f is the polar-
to-Cartesian change of coordinates, and g is the complex squaring function in
Cartesian coordinates, so that the composition g ○ f is the squaring function
in polar coordinates. That is, the composition is
$$(z, w) = (r^2\cos 2\theta,\ r^2\sin 2\theta).$$
Consequently ∂z/∂r = 2r cos 2θ, and substituting (r, θ) = (2, π/3) gives in
particular (∂z/∂r)(2, π/3) = 2 ⋅ 2 cos 2π/3 = 2 ⋅ 2 ⋅ (−1/2) = −2, as we know it
must.
For another example, the function f (x) = xx is usually differentiated as
follows in one-variable calculus: Consider the related function ln(f (x)) =
ln(xx ) = x ln(x), and take derivatives of both sides to get f ′ (x)/f (x) =
1 + ln(x); thus f ′ (x) = xx (1 + ln(x)). On the other hand, if we differentiate xx treating the first x as variable and the second x as constant then
we get x ⋅ xx−1 = xx , and if we differentiate xx treating the first x as con-
stant and the second x as variable then we get xx ln(x); the sum of these
two sort-of-derivatives is xx (1 + ln(x)), the derivative of xx as computed a
moment ago. The method of treating the two x’s as independent has pro-
duced the right answer, despite its illegality. This can’t be a coincidence, and
it isn’t. In general, if F (x1 , . . . , xn ) is a differentiable function of many vari-
ables then the derivative of the one-variable function f (x) = F (x, x, . . . , x) is
f ′ (x) = ∑ni=1 Di F (x, x, . . . , x). Exercise 4.5.10 is to prove this formula as an
immediate consequence of the chain rule, and then to use it to establish a
result known as Leibniz’s Rule. Exercise 4.5.11(a) is to use this formula to
differentiate the function f (x) = xx , and more generally Exercise 4.5.11(b)
x
Exercises
4.5.1. Explain why in the discussion beginning this section the tangent
plane P consists of all points (a, b, f (a, b)) + (h, k, T (h, k)) where T (h, k) =
ϕ′ (a)h + ψ ′ (b)k.
4.5.2. This exercise shows that all partial derivatives of a function can exist at
and about a point without being continuous at the point. Define f ∶ R2 Ð→ R
by
$$f(x,y) = \begin{cases} \dfrac{2xy}{x^2+y^2} & \text{if } (x,y)\neq(0,0),\\[1ex] 0 & \text{if } (x,y)=(0,0). \end{cases}$$
(a) Show that D1 f (0, 0) = D2 f (0, 0) = 0.
(b) Show that D1 f (a, b) and D2 f (a, b) exist and are continuous at all
other (a, b) ∈ R2 .
(c) Show that D1 f and D2 f are discontinuous at (0, 0).
4.5.3. Define f ∶ R Ð→ R by
$$f(x) = \begin{cases} x^2\sin\dfrac1x & \text{if } x\neq 0,\\[1ex] 0 & \text{if } x = 0. \end{cases}$$
Show that f ′ (x) exists for all x but that f ′ is discontinuous at 0. Explain how
this disproves the converse of Theorem 4.5.3.
(After you are done, compare the effort of doing the problem now to the effort
of doing it as we did at the end of Section 4.4.)
(b) f (x, y) = xy 2 /(y − 1) on {(x, y) ∈ R2 ∶ y ≠ 1}, at generic (a, b) with b ≠ 1.
(c)
$$f(x,y) = \begin{cases} \dfrac{xy}{\sqrt{x^2+y^2}} & \text{if } (x,y)\neq(0,0),\\[1ex] 0 & \text{if } (x,y)=(0,0), \end{cases}$$
at generic (a, b) ≠ (0, 0) and at (0, 0).
4.5.6. Show that if z = f (xy) then x, y, and z satisfy the differential equation
x ⋅ zx − y ⋅ zy = 0.
Define F ∶ R Ð→ R by
$$F(x) = \int_{y=a}^{b} f(x,y)\,dy.$$
A symbolic computation suggests differentiating under the integral sign:
$$F'(x) = \lim_{h\to 0}\frac{F(x+h)-F(x)}{h} = \lim_{h\to 0}\int_{y=a}^{b}\frac{f(x+h,y)-f(x,y)}{h}\,dy \overset{!}{=} \int_{y=a}^{b}\lim_{h\to 0}\frac{f(x+h,y)-f(x,y)}{h}\,dy = \int_{y=a}^{b}\frac{\partial f}{\partial x}(x,y)\,dy.$$
Now let G(x) = ∫_{y=α(x)}^{β(x)} f (x, y) dy. Thus x affects G in three ways: as a
contributor to the lower and upper limits of integration, and as a parameter
for the integrand. What is dG(x)/dx? (Hint: G(x) = H(x, x, x) where
$$H(x_1, x_2, x_3) = \int_{y=\alpha(x_1)}^{\beta(x_2)} f(x_3, y)\,dy.)$$
4.5.11. (a) Use the ideas at the end of the section to differentiate the function
f (x) = xx .
(b) For x > 0, define f−1 (x) = 0 and then fn (x) = x^{f_{n-1}(x)} for n ≥ 0. Thus
f0 (x) = x0 = 1, f1 (x) = x1 = x, f2 (x) = x^x , f3 (x) = x^{x^x} , and so on. Show that
$$f_n'(x) = f_{n-1}(x)\,x^{f_{n-1}(x)-1} + f_n(x)\ln(x)\,f_{n-1}'(x).$$
Partial differentiation can be carried out more than once on nice enough func-
tions. For example, if
$$f(x,y) = e^{x\sin y}$$
then
$$D_1 f(x,y) = \sin y\, e^{x\sin y}, \qquad D_2 f(x,y) = x\cos y\, e^{x\sin y}.$$
Taking partial derivatives again yields
$$\begin{aligned}
D_{11}f(x,y) &= \sin^2 y\, e^{x\sin y},\\
D_{12}f(x,y) &= \cos y\, e^{x\sin y} + x\sin y\cos y\, e^{x\sin y},\\
D_{21}f(x,y) &= \cos y\, e^{x\sin y} + x\cos y\sin y\, e^{x\sin y},\\
D_{22}f(x,y) &= -x\sin y\, e^{x\sin y} + x^2\cos^2 y\, e^{x\sin y}.
\end{aligned}$$
Suspiciously many of these match. The result of two or three partial differen-
tiations seems to depend only on how many were taken with respect to x and
how many with respect to y, not on the order in which they were taken.
To analyze the situation, it suffices to consider only two differentiations.
Streamline the notation by writing D2 D1 f as D12 f . (The subscripts may look
reversed, but reading D12 from left to right as D-one-two suggests the appro-
priate order of differentiating.) The definitions for D11 f , D21 f , and D22 f are
similar. These four functions are called the second-order partial derivatives
of f , and in particular D12 f and D21 f are the second-order mixed partial
derivatives. More generally, the kth-order partial derivatives of a function f are
those that come from k partial differentiations. A C k -function is a function
for which all the kth-order partial derivatives exist and are continuous. The
theorem is that with enough continuity, the order of differentiation doesn’t
matter. That is, the mixed partial derivatives agree.
$$\begin{aligned}
D_{12}f(a,b) &= \lim_{k\to 0}\frac{D_1 f(a,b+k) - D_1 f(a,b)}{k}\\
&= \lim_{k\to 0}\frac{\displaystyle\lim_{h\to 0}\frac{f(a+h,b+k)-f(a,b+k)}{h} - \lim_{h\to 0}\frac{f(a+h,b)-f(a,b)}{h}}{k}\\
&= \lim_{k\to 0}\lim_{h\to 0}\frac{f(a+h,b+k) - f(a,b+k) - f(a+h,b) + f(a,b)}{hk},
\end{aligned}$$
and similarly
$$D_{21}f(a,b) = \lim_{h\to 0}\lim_{k\to 0}\frac{f(a+h,b+k) - f(a+h,b) - f(a,b+k) + f(a,b)}{hk}.$$
So, letting ∆(h, k) = f (a + h, b + k) − f (a, b + k) − f (a + h, b) + f (a, b), we want
to show that
$$\lim_{h\to 0}\lim_{k\to 0}\frac{\Delta(h,k)}{hk} = \lim_{k\to 0}\lim_{h\to 0}\frac{\Delta(h,k)}{hk}.$$
If the order of taking the limits doesn’t matter then we have the desired re-
sult. However, if f is not a C 2 -function then the order of taking the limits can
in fact matter, i.e., the two mixed partial derivatives can both exist but not
be equal (see Exercise 4.6.1 for an example). Thus a correct proof of Theo-
rem 4.6.1 requires a little care. The theorem is similar to Taylor’s theorem
from Section 1.3 in that both are stated entirely in terms of derivatives, but
they are most easily proved using integrals. The following proof uses integra-
tion to show that ∆(h, k)/(hk) is an average value of both D12 f and D21 f
near (a, b), and then letting h and k shrink to 0 forces D12 f and D21 f to
agree at (a, b), as desired. That is, the proof shows that the two quantities in
the previous display are equal by showing that each of them equals a common
third quantity.
Proof. Since f is a C 2 -function on A, every point of A is interior. Take any
point (a, b) ∈ A. Then some box B = [a, a + h] × [b, b + k] lies in A. Compute
the nested integral
$$\int_{x=a}^{a+h}\int_{y=b}^{b+k} dy\,dx = \int_{x=a}^{a+h} k\,dx = hk.$$
Also compute that
$$\int_{x=a}^{a+h}\int_{y=b}^{b+k} D_{12}f(x,y)\,dy\,dx = \Delta(h,k) \qquad\text{and}\qquad \int_{y=b}^{b+k}\int_{x=a}^{a+h} D_{21}f(x,y)\,dx\,dy = \Delta(h,k),$$
and so the same argument shows that D21 f (a, b) is the same limit of
∆(h, k)/(hk), whence D12 f (a, b) = D21 f (a, b).
urr = (ux xr + uy yr )r
= uxr xr + ux xrr + uyr yr + uy yrr .
Since ux and uy depend on r and θ via x and y just as u does, each of them can
take the place of u in the diagram above, and the chain rule gives expansions
of uxr and uyr as it did for ur ,
$$u_{xr} = u_{xx}\,x_r + u_{xy}\,y_r, \qquad u_{yr} = u_{yx}\,x_r + u_{yy}\,y_r.$$
Note the use of equality of mixed partial derivatives. The same calculation
with θ instead of r gives the corresponding expansion of uθθ . It follows, after
substituting xr = cos θ, yr = sin θ, xθ = −r sin θ, and yθ = r cos θ and combining
terms, that
$$r^2 u_{rr} + r u_r + u_{\theta\theta} = r^2\,(u_{xx} + u_{yy}).$$
Recall that the Cartesian form of Laplace’s equation is uxx + uyy = 0. Now the
polar form follows:
r2 urr + rur + uθθ = 0.
That is,
$$r^2\frac{\partial^2 u}{\partial r^2} + r\frac{\partial u}{\partial r} + \frac{\partial^2 u}{\partial\theta^2} = 0.$$
The point of this involved calculation is that having done it once, and only
once, we now can check directly whether any given function g of the polar
variables r and θ satisfies Laplace’s equation. We no longer need to transform
each u = g(r, θ) into Cartesian terms u = f (x, y) before checking.
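As a quick check, take u = g(r, θ) = r2 cos 2θ, the polar form of the harmonic function x2 − y 2 : here
$$r^2 u_{rr} + r u_r + u_{\theta\theta} = 2r^2\cos 2\theta + 2r^2\cos 2\theta - 4r^2\cos 2\theta = 0,$$
so u satisfies the polar form of Laplace’s equation, with no return to Cartesian coordinates needed.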
An n×n matrix A is orthogonal if AT A = I. (This concept was introduced
in Exercise 3.5.5.) Let A be orthogonal and consider its associated linear map,
TA ∶ Rn Ð→ Rn , TA (x) = Ax.
Recall that the Laplacian of a C 2 -function f ∶ Rn Ð→ R is ∆f = ∑ni=1 Dii f .
For every such f , we show that
∆(f ○ TA ) = ∆f ○ TA .
To see this, start by noting that for every x ∈ Rn , the chain rule and then
the fact that the derivative of every linear map is itself give two equalities of
linear mappings,
$$D(f\circ T_A)_x = Df_{T_A(x)}\circ D(T_A)_x = Df_{Ax}\circ T_A.$$
In terms of matrices, the equality of the first and last quantities in the previous
display is an equality of row-vector-valued functions of x,
$$(f\circ T_A)'(x) = f'(Ax)\,A.$$
Differentiating entrywise and citing the chain rule in the same way again gives
an equality of matrix-valued functions of x,
$$(f\circ T_A)''(x) = A^{\mathsf T}\,f''(Ax)\,A.$$
The trace of a square matrix was introduced in Exercise 3.2.5 as the sum of its
diagonal entries, and the fact that tr(A−1 BA) = tr(B) if A is invertible was
noted just after the proof of Theorem 3.5.2. Equate the traces of the matrices
in the previous display to get the desired result,
∆(f ○ TA ) = ∆f ○ TA .
Alternatively, one can verify the identity in coordinates: by the chain rule,
Di (f ○ TA )(x) = ∑j Dj f (Ax) aji , and thus
$$D_{ii}(f\circ T_A)(x) = \sum_{j=1}^n a_{ji}\sum_{k=1}^n D_{jk}f(Ax)\,D_i(Ax)_k = \sum_{j,k=1}^n a_{ji}a_{ki}\,D_{jk}f(Ax);$$
summing over i and noting that ∑i aji aki = (AAᵀ)jk is 1 when j = k and 0
otherwise (because A is orthogonal) gives ∆(f ○ TA )(x) = ∑j Djj f (Ax) = ∆f (Ax),
as desired.
Exercises
4.6.1. This exercise shows that continuity is necessary for the equality of
mixed partial derivatives. Let
$$f(x,y) = \begin{cases} \dfrac{xy(y^2-x^2)}{x^2+y^2} & \text{if } (x,y)\neq(0,0),\\[1ex] 0 & \text{if } (x,y)=(0,0). \end{cases}$$
Away from (0, 0), f is rational, and so it is continuous and all its partial
derivatives of all orders exist and are continuous. Show: (a) f is continuous
at (0, 0), (b) D1 f and D2 f exist and are continuous at (0, 0), (c) D12 f (0, 0) =
1 ≠ −1 = D21 f (0, 0).
For the rest of these exercises, assume that the relevant functions are C 2 .
Define new independent variables p = x + ct and q = x − ct.
(a) Show that in these variables the wave equation c2 wxx = wtt becomes
wpq = 0.
(b) Using part (a), show that in particular if w = F (x + ct) + G(x − ct)
(where F and G are arbitrary C 2 -functions of one variable) then w satisfies
the wave equation. Here F and G are traveling waves, F traveling backward
and G forward.
(c) Now let 0 < v < c (both v and c are constant), and define new space
and time variables in terms of the original ones by a Lorentz transformation,
$$y = \gamma(x - vt), \qquad u = \gamma(t - vx/c^2), \qquad\text{where } \gamma = (1 - v^2/c^2)^{-1/2}.$$
Show that
$$y + cu = \gamma(1 - v/c)(x + ct) \qquad\text{and}\qquad y - cu = \gamma(1 + v/c)(x - ct),$$
so that consequently (y, u) has the same spacetime norm as (x, t),
y 2 − c 2 u 2 = x 2 − c 2 t2 .
4.6.6. Let u be a function of x and y, and suppose that x and y in turn depend
linearly on s and t,
$$\begin{bmatrix} x\\ y\end{bmatrix} = \begin{bmatrix} a & b\\ c & d\end{bmatrix}\begin{bmatrix} s\\ t\end{bmatrix}, \qquad ad - bc = 1.$$
What is the relation between uss utt − u2st and uxx uyy − u2xy ?
4.6.7. (a) Let H denote the set of points (x, y) ∈ R2 such that y > 0. Associate
to each point (x, y) ∈ H another point,
$$(z, w) = \left(\frac{-x}{x^2+y^2},\ \frac{y}{x^2+y^2}\right),$$
noting that w > 0, so that (z, w) again lies in H. Consider a quantity u = f (z, w), so that also u = f˜(x, y) for a different function f˜. As usual, we have
$$u_x = u_z z_x + u_w w_x, \qquad u_y = u_z z_y + u_w w_y.$$
Show that
y 2 (uxx + uyy ) = w2 (uzz + uww ).
The operator y 2 (∂ 2 /∂x2 + ∂ 2 /∂y 2 ) on H is the hyperbolic Laplacian, de-
noted ∆H . We have just established the invariance of ∆H under the hyperbolic
transformation that takes (x, y) to (z, w) = (−x/(x2 + y 2 ), y/(x2 + y 2 )).
(b) Show that the invariance relation y 2 (uxx + uyy ) = w2 (uzz + uww ) also
holds when (z, w) = (x + b, y) for every fixed real number b, and that the
relation also holds when (z, w) = (rx, ry) for every fixed positive real num-
ber r. It is known that every hyperbolic transformation of H takes the form
(z, w) = φ(x, y) where φ is a finite succession of transformations of the type
in part (a) or of the two types just addressed here. Note that consequently
this exercise has shown that the invariance relation holds for every hyperbolic
transformation of H. That is, for every hyperbolic transformation φ and for
every twice-differentiable function f ∶ H Ð→ R we have, analogously to the result
at the very end of this section,
∆H (f ○ φ) = ∆H f ○ φ.
The 2 × 2 matrices
$$X = \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}, \qquad Y = \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}, \qquad H = \begin{bmatrix} 1 & 0\\ 0 & -1\end{bmatrix}$$
satisfy the relations
XY − Y X = H, HX − XH = 2X, HY − Y H = −2Y.
Define corresponding operators on smooth functions f ∶ Rn Ð→ R,
$$(Xf)(x) = \tfrac12|x|^2 f(x), \qquad (Yf)(x) = -\tfrac12\Delta f(x), \qquad (Hf)(x) = \tfrac n2 f(x) + \sum_{i=1}^n x_i D_i f(x),$$
and show that as operators they satisfy the same relations,
XY − Y X = H, HX − XH = 2X, HY − Y H = −2Y.
The three matrices generate a small instance of a Lie algebra, and this exercise
shows that the space of smooth functions on Rn can be made a representation
of the Lie algebra. Further show, partly by citing the work at the end of this
section, that the action of every orthogonal matrix A on smooth functions
commutes with the representation,
X(f ○ TA ) = (Xf ) ○ TA ,
Y (f ○ TA ) = (Y f ) ○ TA ,
H(f ○ TA ) = (Hf ) ○ TA .
Just as the affine function f (a) + f ′ (a)(x − a) specifies the tangent line to the
graph of f at (a, f (a)), the quadratic function f (a) + f ′ (a)(x − a) + ½ f ′′ (a)(x − a)2
specifies the best-fitting parabola. The critical point theorem says that if f
takes an extreme value at an interior point a of its domain and f is
differentiable at a, then all partial derivatives of f vanish at a.
Proof. For each j ∈ {1, . . . , n}, the value f (a) is an extreme value for the one-
variable function ϕ from Definition 4.5.1 of the partial derivative Dj f (a). By
the one-variable critical point theorem, ϕ′ (aj ) = 0. That is, Dj f (a) = 0. □
As an example, if
f (x, y) = sin2 x + x2 y + y 2 ,
then for every (a, b) ∈ R2 ,
$$f'(a,b) = \begin{bmatrix} \sin 2a + 2ab & a^2 + 2b\end{bmatrix}$$
and
$$f''(a,b) = \begin{bmatrix} 2\cos 2a + 2b & 2a\\ 2a & 2\end{bmatrix}.$$
Every n × n matrix M determines a quadratic function
QM ∶ Rn Ð→ R, QM (h) = hT M h.
$$Q_M(h) = \begin{bmatrix} h_1 & \cdots & h_n\end{bmatrix}\begin{bmatrix} m_{11} & \cdots & m_{1n}\\ \vdots & & \vdots\\ m_{n1} & \cdots & m_{nn}\end{bmatrix}\begin{bmatrix} h_1\\ \vdots\\ h_n\end{bmatrix} = \sum_{i=1}^n\sum_{j=1}^n m_{ij}h_i h_j.$$
The function QM is homogeneous of degree 2, meaning that each of its terms
has degree 2 in the entries of h and therefore QM (th) = t2 QM (h) for all t ∈ R
and h ∈ Rn .
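For instance, the (nonsymmetric) matrix M = [ 1 2 ; 3 4 ] gives
$$Q_M(h_1, h_2) = h_1^2 + 2h_1h_2 + 3h_2h_1 + 4h_2^2 = h_1^2 + 5h_1h_2 + 4h_2^2,$$
the same quadratic function as its symmetrized counterpart with both off-diagonal entries 5/2.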
When M is the second derivative matrix of a function f at a point a, the
corresponding quadratic function is denoted Qfa rather than Qf ′′ (a) . Just as
f (a) + Dfa (h) gives the best affine approximation of f (a + h) for small h,
f (a) + Dfa (h) + 21 Qfa (h) gives the best quadratic approximation.
In the example f (x, y) = sin2 x + x2 y + y 2 , the second derivative matrix at
a point (a, b) defines the quadratic function
$$Qf_{(a,b)}(h,k) = \begin{bmatrix} h & k\end{bmatrix}\begin{bmatrix} 2\cos 2a + 2b & 2a\\ 2a & 2\end{bmatrix}\begin{bmatrix} h\\ k\end{bmatrix} = 2\big((\cos 2a + b)\,h^2 + 2a\,hk + k^2\big) \quad\text{for } (h,k)\in\mathbb{R}^2,$$
and so the best quadratic approximation of f near, for instance, the point
(π/2, 1) is
$$f(\pi/2 + h,\ 1 + k) \approx 2 + \frac{\pi^2}{4} + \pi h + \Big(\frac{\pi^2}{4} + 2\Big)k + \pi hk + k^2.$$
When (a, b) is a critical point of f , the first-order terms vanish, and the local
behavior of f near (a, b) is modeled by ½ Qf(a,b) , a
quadratic function on R2 having a critical point at (0, 0). The graphs of nine
such quadratic functions are shown in Figure 4.9. If the best quadratic ap-
proximation of f at (a, b) is a bowl then f should have a minimum at (a, b).
Similarly for an inverted bowl and a maximum. If the best quadratic ap-
proximation is a saddle then there should be points (x, y) near (a, b) where
f (x, y) > f (a, b) and points (x′ , y ′ ) near (a, b) where f (x′ , y ′ ) < f (a, b). In
this case, (a, b) is called for obvious reasons a saddle point of f .
Returning to the example f (x, y) = sin2 x + x2 y + y 2 , note that (0, 0) is
a critical point of f because f ′ (0, 0) = [0 0]. The second derivative matrix is
$$f''(0,0) = \begin{bmatrix} 2 & 0\\ 0 & 2\end{bmatrix},$$
and so the quadratic function ½ Qf(0,0) is given by
$$\tfrac12 Qf_{(0,0)}(h,k) = \tfrac12\begin{bmatrix} h & k\end{bmatrix}\begin{bmatrix} 2 & 0\\ 0 & 2\end{bmatrix}\begin{bmatrix} h\\ k\end{bmatrix} = h^2 + k^2.$$
Thus the graph of f looks like a bowl near (0, 0), and f (0, 0) should be a local
minimum.
This discussion is not yet rigorous. Justifying the ideas and proving the
appropriate theorems will occupy the rest of this section. The first task is to
study quadratic approximation of C 2 -functions.
Proposition 4.7.3 (Special case of Taylor’s theorem). Let I be an open
interval in R containing [0, 1]. Let ϕ ∶ I Ð→ R be a C 2 -function. Then there
exists some c ∈ [0, 1] such that
$$\varphi(1) = \varphi(0) + \varphi'(0) + \tfrac12\varphi''(c).$$
Figure 4.9. Two bowls, two saddles, four half-pipes, and a plane
γ ∶ I Ð→ A, γ(t) = a + th.
ϕ = f ○ γ ∶ I Ð→ R.
That is, ϕ(t) = f (a + th) is the restriction of f to the line segment from a
to a + h. By the chain rule and the fact that γ ′ = h,
$$\varphi'(t) = f'(\gamma(t))\,h.$$
The previous display can be rephrased as ϕ′ (t) = ⟨f ′ (γ(t)), h⟩, and so the
chain rule and the symmetry of f ′′ give
$$\varphi''(t) = h^{\mathsf T} f''(\gamma(t))\,h = Qf_{\gamma(t)}(h).$$
For a symmetric 2 × 2 matrix with α ≠ 0, completing the square gives the factorization
$$\begin{bmatrix} \alpha & \beta\\ \beta & \delta\end{bmatrix} = \begin{bmatrix} 1 & 0\\ \alpha^{-1}\beta & 1\end{bmatrix}\begin{bmatrix} \alpha & 0\\ 0 & \alpha^{-1}(\alpha\delta - \beta^2)\end{bmatrix}\begin{bmatrix} 1 & \alpha^{-1}\beta\\ 0 & 1\end{bmatrix}.$$
That is, a change of variables eliminates the cross term, and the variant
quadratic function makes the results of the definiteness test clear.
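For instance, M = [ 2 1 ; 1 3 ] has α = 2 and αδ − β 2 = 5, and the change of variables gives
$$Q_M(h, k) = 2h^2 + 2hk + 3k^2 = 2\Big(h + \tfrac12 k\Big)^2 + \tfrac52 k^2,$$
visibly positive for (h, k) ≠ (0, 0), so M is positive definite.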
The positive definite, negative definite, or indefinite character of a matrix
is preserved if the matrix entries vary by small enough amounts. Again we re-
strict our discussion to the 2×2 case. Here the result is plausible geometrically,
since it says that if the matrix M (a, b) defines a function whose graph is (for
example) a bowl, then matrices close to M (a, b) should define functions with
similar graphs, which thus should still be bowl-shaped. The same persistence
holds for a saddle, but a half-pipe can deform immediately into either a bowl
or a saddle, and so can a plane.
Proposition 4.7.7 (Persistence of definiteness). Let A be a subset of R2 ,
and let the matrix-valued mapping
$$M\colon A\to M_2(\mathbb{R}), \qquad M(x,y) = \begin{bmatrix} \alpha(x,y) & \beta(x,y)\\ \beta(x,y) & \delta(x,y)\end{bmatrix}$$
For example, consider the function
$$f\colon\mathbb{R}^2\to\mathbb{R}, \qquad f(x,y) = \tfrac12 x^2 + xy - 2x - \tfrac12 y^2.$$
Since R2 is not compact, there is no guarantee that f has any extrema. In fact,
for large x, f (x, 0) gets arbitrarily large, and for large y, f (0, y) gets arbitrarily
large in the negative direction. So f has no global extrema. Nonetheless, there
may be local ones. Every point of R2 is interior, so it suffices to examine the
critical points of f . The partial derivatives are
fx (x, y) = x + y − 2, fy (x, y) = x − y,
and the only point where both of them vanish is (x, y) = (1, 1). The second
derivative matrix is
$$f''(1,1) = \begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix},$$
whose determinant is negative, so the critical point (1, 1) is a saddle point.
The function f has no extrema, local or global.
Exercises
While the roots of a polynomial with real coefficients are in general com-
plex, the roots of the characteristic polynomial of a symmetric matrix in
Mn (R) are guaranteed to be real. The characterization we want is contained
in the following theorem.
With this result one can extend the methods in this section to functions
of more than two variables.
(a) Let M be the symmetric matrix [ α β ; β δ ] ∈ M2 (R). Show that the roots
of its characteristic polynomial λ2 − (α + δ)λ + (αδ − β 2 ), namely
$$\lambda = \frac{(\alpha+\delta) \pm \sqrt{(\alpha-\delta)^2 + 4\beta^2}}{2},$$
are real.
4.7.11. This exercise eliminates the cross terms from a quadratic function of
n variables, generalizing the calculation for n = 2 in this section. Throughout,
we abbreviate positive definite to positive. Let M be a positive n×n symmetric
matrix where n > 1. This exercise shows how to diagonalize M as a quadratic
function. (This is different from diagonalizing M as a transformation, as is
done in every linear algebra course.) Decompose M as
$$M = \begin{bmatrix} a & c^{\mathsf T}\\ c & N\end{bmatrix},$$
where a ∈ R, c ∈ Rn−1 , and N ∈ Mn−1 (R) is symmetric.
Recall that the jth partial derivative of f at a,
$$D_j f(a) = \lim_{t\to 0}\frac{f(a+te_j) - f(a)}{t},$$
measures the rate of change of f at a as its input varies in the jth direction.
Visually, Dj f (a) gives the slope of the jth cross section through a of the
graph of f .
Analogous formulas measure the rate of change of f at a as its input varies
in a direction that doesn’t necessarily parallel a coordinate axis. A direction
in Rn is specified by a unit vector d, i.e., a vector d such that ∣d∣ = 1. As the
input to f moves a distance t in the d direction, f changes by f (a + td) − f (a).
Thus the following definition is natural.
The directional derivative of f at a in the d direction is
$$D_d f(a) = \lim_{t\to 0}\frac{f(a+td) - f(a)}{t},$$
if this limit exists.
In particular, if f is differentiable at a then the limit exists: the characterizing
property of the derivative gives
$$\lim_{t\to 0}\frac{f(a+td) - f(a) - Df_a(td)}{t} = 0,$$
or, since the constant t passes through the linear map Dfa ,
$$\lim_{t\to 0}\frac{f(a+td) - f(a)}{t} = Df_a(d),$$
or, since the linear map Dfa has matrix [D1 f (a), . . . , Dn f (a)],
$$D_d f(a) = \sum_{j=1}^{n} D_j f(a)\,d_j,$$
as desired.
The derivative matrix f ′ (a) of a scalar-valued function f at a is often
called the gradient of f at a and written ∇f (a). That is,
$$\nabla f(a) = f'(a) = (D_1 f(a), \ldots, D_n f(a)).$$
The previous calculation and this definition lead to the following theorem.
The theorem, Theorem 4.8.2, is that if f is differentiable at a and d is any unit vector, then
$$D_d f(a) = \langle \nabla f(a), d\rangle = |\nabla f(a)|\cos\theta_d,$$
where θd is the angle between d and ∇f (a). Therefore:
• The rate of increase of f at a in the d direction varies with d, from −∣∇f (a)∣
when d points in the direction opposite to ∇f (a), to ∣∇f (a)∣ when d points
in the same direction as ∇f (a).
• In particular, the vector ∇f (a) points in the direction of greatest increase
of f at a, and its modulus ∣∇f (a)∣ is precisely this greatest rate.
• Also, the directions orthogonal to ∇f (a) are the directions in which f
neither increases nor decreases at a.
For example, consider a parabolic mountain with altitude function f (x, y) = 9 − x2 − 2y 2 (this function returns below). The gradient at the point (1, 1) is
∇f (1, 1) = (D1 f (1, 1), D2 f (1, 1)) = (−2x, −4y)∣(x,y)=(1,1) = (−2, −4).
Therefore the direction of steepest descent down the hillside is the (2, 4)-direction (this could be divided by its modulus √20 to make it a unit vector), and the slope of steepest descent is the absolute value ∣∇f (1, 1)∣ = √20. On the
other hand, cross-country skiing in the (2, −1)-direction, which is orthogonal
to ∇f (1, 1), neither gains nor loses elevation immediately. (See Figure 4.12.)
The cross-country skiing trail that neither climbs nor descends has a mathe-
matical name.
Figure 4.12. Gradient and its orthogonal vector for the parabolic mountain
The curves on a topographical map are level sets of the altitude function.
The isotherms on a weather map are level sets of the temperature function,
and the isobars on a weather map are level sets of the pressure function.
Indifference curves in economics are level sets of the utility function, and iso-
quants are level sets of the production function. Surfaces of constant potential
in physics are level sets of the potential function.
For example, on the mountain
f ∶ R2 Ð→ R, f (x, y) = 9 − x2 − 2y2 ,
the level set of points taken by f to 5 is the ellipse
L = {(x, y) ∈ R2 ∶ x2 + 2y2 = 4}.
And similarly, the level set is an ellipse for every real number b up to 9. As
just mentioned, plotting the level sets of a function f of two variables gives a
topographical map description of f . The geometry is different for a function
Figure 4.13. Level set and gradients for the sine function
of one variable: each level set is a subset of the line. For example, consider a restriction of the sine function,
f ∶ [0, π] Ð→ R, f (x) = sin(x).
The level set of points taken by f to 1/2 is
L = {π/6, 5π/6}.
For a function of three variables, each level set is a subset of space. For ex-
ample, if a, b, and c are positive numbers, and the function is
f ∶ R3 Ð→ R, f (x, y, z) = x2/a2 + y2/b2 + z2/c2 ,
then its level sets are ellipsoids. Specifically, for every positive r, the level set of points taken by f to r is the ellipsoid of x-radius a√r, y-radius b√r, and z-radius c√r,
L = {(x, y, z) ∈ R3 ∶ (x/(a√r))2 + (y/(b√r))2 + (z/(c√r))2 = 1} .
The third bullet in Theorem 4.8.2 says that the gradient is normal to the
level set. This fact may seem surprising, since the gradient is a version of the
derivative, and we think of the derivative as describing a tangent object to a
graph. The reason that the derivative has become a normal object is that a level set is not a graph: the level set lies in the domain of f , the same space in which the gradient lives, and f neither increases nor decreases along the level set, so the gradient, pointing in the direction of greatest increase, can only be normal to it.
Figure 4.14. Level set and gradients for the temperature function
Although Theorem 4.8.2 has already stated that the gradient is orthogonal
to the level set, we now amplify the argument. Let f ∶ A Ð→ R (where A ⊂ Rn )
be given, and assume that it is differentiable. Let a be a point of A, and
let b = f (a). Consider the level set of f containing a,
L = {x ∈ A ∶ f (x) = b} ⊂ Rn ,
and consider any smooth curve from some interval into the level set, passing
through a,
γ ∶ (−ε, ε) Ð→ L, γ(0) = a.
The composite function
f ○ γ ∶ (−ε, ε) Ð→ R
is the constant function b, so that its derivative at 0 is 0. By the chain rule
this relation is
∇f (a) ⋅ γ ′ (0) = 0.
Every tangent vector to L at a takes the form γ ′ (0) for some γ of the sort that
we are considering. Therefore, ∇f (a) is orthogonal to every tangent vector
to L at a, i.e., ∇f (a) is normal to L at a.
Before continuing to work with the gradient, we pause to remark that level
sets and graphs are related. For one thing:
The graph of a function is also the level set of a different function.
To see this, let n > 1, let A0 be a subset of Rn−1 , and let f ∶ A0 Ð→ R be any
function. Given this information, let A = A0 × R and define a second function
g ∶ A Ð→ R,
g(x1 , . . . , xn−1 , xn ) = f (x1 , . . . , xn−1 ) − xn .
Then the graph of f is a level set of g, specifically the set of inputs that g takes to 0,
graph(f ) = {x ∈ A ∶ g(x) = 0}.
For a specific example, consider the surface
H = {(x, y, z) ∈ R3 ∶ x2 + y2 − z2 = 1}.
(This surface is a hyperboloid of one sheet.) The point (2√2, 3, 4) belongs to H. Note that H is a level set of the function f (x, y, z) = x2 + y2 − z2 , and compute the gradient
∇f (2√2, 3, 4) = (4√2, 6, −8).
Since this is the normal vector to H at (2√2, 3, 4), the tangent plane equation at the end of Section 3.10 shows that the equation of the tangent plane to H at (2√2, 3, 4) is
4√2 (x − 2√2) + 6(y − 3) − 8(z − 4) = 0.
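This computation can be double-checked symbolically. The following sketch (not from the text; it assumes Python with SymPy) confirms that the point lies on H, recomputes the gradient, and expands the tangent plane equation.

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    f = x**2 + y**2 - z**2
    p = {x: 2*sp.sqrt(2), y: 3, z: 4}

    print(f.subs(p))            # 1, so the point lies on H
    grad = [sp.diff(f, v).subs(p) for v in (x, y, z)]
    print(grad)                 # [4*sqrt(2), 6, -8]

    plane = sum(g*(v - p[v]) for g, v in zip(grad, (x, y, z)))
    print(sp.expand(plane))     # 4*sqrt(2)*x + 6*y - 8*z - 2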
The gradient also gives rise to integral curves. Given f and a starting point a, an integral curve of the gradient is a curve
γ ∶ I Ð→ Rn (where I is an interval of time containing 0)
such that
γ ′ (t) = ∇f (γ(t)) for all t ∈ I, γ(0) = a. (4.3)
Whether (and how) one can solve this for γ depends on the data f and a.
In the case of the mountain function f (x, y) = 9 − x2 − 2y 2 , with gradient
∇f (x, y) = (−2x, −4y), the path γ has two components γ1 and γ2 , and the
differential equation and initial conditions (4.3) become
(γ1′ (t), γ2′ (t)) = (−2γ1 (t), −4γ2 (t)), (γ1 (0), γ2 (0)) = (a, b),
with solution
(γ1 (t), γ2 (t)) = (a e−2t , b e−4t ).
Let x = γ1 (t) and y = γ2 (t). Then the previous display shows that
a2 y = bx2 ,
and so the integral curve lies on a parabola. The parabola is degenerate if the
starting point (a, b) lies on either axis. Every parabola that forms an integral
curve for the mountain function meets orthogonally with every ellipse that
forms a level set. (See Figure 4.15.)
For another example, let f (x, y) = x2 − y 2 . The level sets for this function
are hyperbolas having the 45 degree lines x = y and x = −y as asymptotes.
The gradient of the function is ∇f (x, y) = (2x, −2y), so to find the integral
curve starting at (a, b), we need to solve the equations
(γ1′ (t), γ2′ (t)) = (2γ1 (t), −2γ2 (t)), (γ1 (0), γ2 (0)) = (a, b).
Figure 4.15. Level sets and integral curves for the parabolic mountain
Thus (γ1 (t), γ2 (t)) = (ae2t , be−2t ), so that the integral curve lies on the hy-
perbola xy = ab having the axes x = 0 and y = 0 as asymptotes. The integral
curve hyperbola is orthogonal to the level set hyperbolas. (See Figure 4.16.)
For another example, let f (x, y) = ex − y. The level sets for this function
are the familiar exponential curve y = ex and all of its vertical translates. The
gradient of the function is ∇f (x, y) = (ex , −1), so to find the integral curve
starting at (0, 1), we need to solve the equations
(γ1′ (t), γ2′ (t)) = (eγ1 (t) , −1), (γ1 (0), γ2 (0)) = (0, 1).
The first differential equation gives
e−γ1 (t) γ1′ (t) = 1 for all t ≥ 0 where the system is sensible,
and so
∫0t e−γ1 (τ ) γ1′ (τ ) dτ = t.
Integration gives
−e−γ1 (t) + e−γ1 (0) = t,
and since γ1 (0) = 0 this rearranges to e−γ1 (t) = 1 − t, i.e., γ1 (t) = − ln(1 − t) for 0 ≤ t < 1, while the second differential equation gives γ2 (t) = 1 − t. Thus y = γ2 (t) = e−γ1 (t) , and so the integral curve
is the portion of the curve y = e−x where x ≥ 0. (See Figure 4.17.) The entire
integral curve is traversed in one unit of time.
Figure 4.17. Negative exponential integral curve for exponential level sets
For another example, let f (x, y) = x2 + xy + y 2 . The level sets for this
function are tilted ellipses. The gradient of f is ∇f (x, y) = (2x + y, x + 2y), so
to find the integral curve starting at (a, b), we need to solve the equations
γ1′ (t) = 2γ1 (t) + γ2 (t), γ1 (0) = a,
γ2′ (t) = γ1 (t) + 2γ2 (t), γ2 (0) = b.
Here the two differential equations are coupled, meaning that the derivative
of γ1 depends on both γ1 and γ2 , and similarly for the derivative of γ2 . How-
ever, the system regroups conveniently,
(γ1 + γ2 )′ (t) = 3(γ1 + γ2 )(t), (γ1 + γ2 )(0) = a + b,
(γ1 − γ2 )′ (t) = (γ1 − γ2 )(t), (γ1 − γ2 )(0) = a − b.
Thus
(γ1 + γ2 )(t) = (a + b)e3t , (γ1 − γ2 )(t) = (a − b)et ,
from which
γ1 (t) = ((a + b)e3t + (a − b)et )/2, γ2 (t) = ((a + b)e3t − (a − b)et )/2.
(See Figure 4.18.) The integral curves in the first two examples were quadratic
only by happenstance, in consequence of the functions 9 − x2 − 2y 2 and x2 − y 2
having such simple coefficients. Changing the mountain function to 9−x2 −3y 2
would produce cubic integral curves, and changing x2 − y 2 to x2 − 5y 2 in the
second example would produce integral curves x5 y = a5 b.
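The integral curves above were found in closed form, but they can also be produced numerically. The following sketch (an illustration only, assuming Python with SciPy; nothing here is from the text) integrates γ ′ = ∇f (γ) for f (x, y) = x2 − y2 from (a, b) = (1, 2) and checks that the product xy stays at ab, so the computed curve lies on the hyperbola xy = 2, as found above.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Gradient flow for f(x, y) = x^2 - y^2: gamma' = grad f(gamma) = (2x, -2y).
    def gradient_field(t, p):
        x, y = p
        return [2*x, -2*y]

    a, b = 1.0, 2.0
    sol = solve_ivp(gradient_field, (0.0, 1.0), [a, b],
                    dense_output=True, rtol=1e-10, atol=1e-12)

    ts = np.linspace(0.0, 1.0, 5)
    xs, ys = sol.sol(ts)
    print(xs * ys)      # constant, approximately a*b = 2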
Exercises
4.8.1. Let f (x, y, z) = xy2 + yz. Find D(2/3, −1/3, 2/3) f (1, 1, 2).
4.8.2. Let g(x, y, z) = xyz, and let d be the unit vector in the direction from
(1, 2, 3) to (3, 1, 5). Find Dd g(1, 2, 3).
4.8.3. Let f be differentiable at a point a, and let d = −e1 , a unit vector. Are
the directional derivative Dd f (a) and the partial derivative D1 f (a) equal?
Explain.
4.8.9. (a) Sketch some level sets and integral curves for the function f (x, y) =
x2 + y. Find the integral curves analytically if you can.
(b) Sketch some level sets and integral curves for the function f (x, y) = xy.
Find the integral curves analytically if you can.
4.8.11. Define f ∶ R2 Ð→ R by
f (x, y) = 1 if y = x2 and (x, y) ≠ (0, 0), f (x, y) = 0 otherwise.
(a) Show that f is discontinuous at (0, 0). It follows that f is not differ-
entiable at (0, 0).
(b) Let d be any unit vector in R2 . Show that Dd f (0, 0) = 0. Show that
consequently the formula Dd f (0, 0) = ⟨∇f (0, 0), d⟩ holds for every unit vec-
tor d. Thus, the existence of every directional derivative at a point and the
fact that each directional derivative satisfies the formula are still not sufficient
for differentiability at the point.
4.8.12. Fix two real numbers a and b satisfying 0 < a < b. Define a mapping
T = (T1 , T2 , T3 ) ∶ R2 Ð→ R3 by
T (s, t) = ((b + a cos s) cos t, (b + a cos s) sin t, a sin s).
(a) Describe the shape of the set in R3 mapped to by T . (The answer will
explain the name T .)
(b) Find the points (s, t) ∈ R2 such that ∇T1 (s, t) = 02 . The points map to
only four image points p under T . Show that one such p is a maximum of T1 ,
another is a minimum, and the remaining two are saddle points.
(c) Find the points (s, t) ∈ R2 such that ∇T3 (s, t) = 02 . To what points q
do these (s, t) map under T ? Which such q are maxima of T3 ? Minima? Saddle
points?
5
Inverse and Implicit Functions
and the right side is the product of two positive numbers, hence positive. But
the mean value theorem is an abstract existence theorem (“for some c”) whose
proof relies on foundational properties of the real number system. Thus, mov-
ing from the linearized problem to the actual problem is far more sophisticated
technically than linearizing the problem or solving the linearized problem. In
sum, this one-variable example is meant to amplify the point of the preced-
ing paragraph, that (now returning to n dimensions) if f ∶ A Ð→ Rn has
an invertible derivative at a then the inverse function theorem—that f itself
is invertible in the small near a—is surely inevitable, but its proof will be
technical and require strengthening our hypotheses.
Already in the one-variable case, the inverse function theorem relies on
foundational theorems about the real number system, on a property of con-
tinuous functions, and on a foundational theorem of differential calculus. We
quickly review the ideas. Let f ∶ A Ð→ R (where A ⊂ R) be a function, let a be
an interior point of A, and let f be continuously differentiable on some inter-
val about a, meaning that f ′ exists and is continuous on the interval. Suppose
that f ′ (a) > 0. Since f ′ is continuous about a, the persistence of inequality
principle (Proposition 2.3.10) says that f ′ is positive on some closed interval
[a − δ, a + δ] about a. By an application of the mean value theorem as in the
previous paragraph, f is therefore strictly increasing on the interval, and so its
restriction to the interval does not take any value twice. By the intermediate
value theorem, f takes every value from f (a − δ) to f (a + δ) on the interval.
Therefore f takes every such value exactly once, making it locally invertible.
A slightly subtle point is that the inverse function f −1 is continuous at f (a),
but then a purely formal calculation with difference quotients will verify that
the derivative of f −1 exists at f (a) and is 1/f ′ (a). Note how heavily this
proof relies on the fact that R is an ordered field. A proof of the multivariable
inverse function theorem must use other methods.
Although the proof to be given in this chapter is technical, its core idea
is simple common sense. Let a mapping f be given that takes x-values to y-
values and in particular takes a to b. Then the local inverse function must take
y-values near b to x-values near a, taking each such y back to the unique x
that f took to y in the first place. We need to determine conditions on f
that make us believe that a local inverse exists. As explained above, the basic
condition is that the derivative of f at a—giving a good approximation of f
near a, but easier to understand than f itself—should be invertible, and the
derivative should be continuous as well. With these conditions in hand, an
argument similar to that in the one-variable case (though more painstaking)
shows that f is locally injective:
• Given y near b, there is at most one x near a that f takes to y.
So the remaining problem is to show that f is locally surjective:
• Given y near b, show that there is some x near a that f takes to y.
5.1 Preliminaries
Recall also that a subset of Rn is called closed if it contains all of its limit
points. Not unnaturally, a subset S of Rn is called open if its complement
S c = Rn − S is closed. A set, however, is not a door: it can be neither open nor
closed, and it can be both open and closed. (Examples?)
Proposition 5.1.1 (ε-balls are open). For every a ∈ Rn and every ε > 0,
the ball B(a, ε) is open.
Proof. Let x be any point in B(a, ε), and set δ = ε − ∣x − a∣, a positive number.
The triangle inequality shows that B(x, δ) ⊂ B(a, ε) (Exercise 5.1.1), and
therefore x is not a limit point of the complement B(a, ε)c . Consequently all
limit points of B(a, ε)c are in fact elements of B(a, ε)c , which is thus closed,
making B(a, ε) itself open. ⊓⊔
This proof shows that every point x ∈ B(a, ε) is an interior point. In fact,
an equivalent definition of open is that a subset of Rn is open if each of its
points is interior (Exercise 5.1.2).
The closed ε-ball at a, denoted B(a, ε), consists of the corresponding
open ball with its edge added in,
B(a, ε) = {x ∈ Rn ∶ ∣x − a∣ ≤ ε}.
The boundary of the closed ball B(a, ε), denoted ∂B(a, ε), is the set of
points on the edge,
∂B(a, ε) = {x ∈ Rn ∶ ∣x − a∣ = ε}.
(See Figure 5.1.) Every closed ball B and its boundary ∂B are compact sets
(Exercise 5.1.3).
The inverse image characterization of continuity (Theorem 5.1.2) says in one direction that if f ∶ A Ð→ Rm is continuous, where A ⊂ Rn is open, then for every open set W ⊂ Rm , the inverse image of W under f ,
V = {x ∈ A ∶ f (x) ∈ W },
is open as well.
The converse to Theorem 5.1.2 is also true and is Exercise 5.1.8. We need
one last technical result for the proof of the inverse function theorem, the difference magnification lemma (Lemma 5.1.3): if B is a ball in Rn , g = (g1 , . . . , gn ) ∶ B Ð→ Rn is differentiable, and ∣Dj gi (x)∣ ≤ c for all x ∈ B and all i, j, then ∣g(x̃) − g(x)∣ ≤ n2 c ∣x̃ − x∣ for all x, x̃ ∈ B. To prove this, fix x and x̃ in B and consider the component functions gi one at a time.
Thus we have reduced the problem from vector output to scalar output. To
create an environment of scalar input as well, make the line segment from x
to x̃ the image of a function of one variable,
γ ∶ [0, 1] Ð→ B, γ(t) = x + t(x̃ − x).
Note that γ(0) = x, γ(1) = x̃, and γ ′ (t) = x̃ − x for all t ∈ (0, 1). Fix any
i ∈ {1, . . . , n} and consider the restriction of gi to the segment, a scalar-valued
function of scalar input,
ϕ ∶ [0, 1] Ð→ R, ϕ(t) = (gi ○ γ)(t).
Thus ϕ(0) = gi (x) and ϕ(1) = gi (x̃). By the mean value theorem,
gi (x̃) − gi (x) = (gi ○ γ)′ (t) = gi′ (γ(t)) γ ′ (t) = gi′ (γ(t)) (x̃ − x) for some t ∈ (0, 1).
Because gi′ (γ(t)) is a row vector and x̃−x is a column vector, the last quantity
in the previous display is their inner product. Hence the display and the
Cauchy–Schwarz inequality give
∣gi (x̃) − gi (x)∣ ≤ ∣gi′ (γ(t))∣ ∣x̃ − x∣.
For each j, the jth entry of the vector gi′ (γ(t)) is the partial derivative
Dj gi (γ(t)). And we are given that ∣Dj gi (γ(t))∣ ≤ c, so the size bounds show
that ∣gi′ (γ(t))∣ ≤ nc, and therefore ∣gi (x̃) − gi (x)∣ ≤ nc ∣x̃ − x∣ for each i. The size bounds then give ∣g(x̃) − g(x)∣ ≤ n ⋅ nc ∣x̃ − x∣ = n2 c ∣x̃ − x∣, proving the lemma.
Exercises
5.1.1. Let x ∈ B(a, ε) and let δ = ε − ∣x − a∣. Explain why δ > 0 and why
B(x, δ) ⊂ B(a, ε).
5.1.2. Show that a subset of Rn is open if and only if each of its points is
interior.
5.1.3. Prove that every closed ball B is indeed a closed set, as is its boundary
∂B. Show that every closed ball and its boundary are also bounded, hence
compact.
5.1.4. Find a continuous function f ∶ Rn Ð→ Rm and an open set A ⊂ Rn such
that the image f (A) ⊂ Rm of A under f is not open. Feel free to choose n
and m.
5.1.5. Define f ∶ R Ð→ R by f (x) = x3 − 3x. Compute f (−1/2). Find
f −1 ((0, 11/8)), f −1 ((0, 2)), f −1 ((−∞, −11/8) ∪ (11/8, ∞)). Does f −1 exist?
5.1.6. Show that for f ∶ Rn Ð→ Rm and B ⊂ Rm , the inverse image of the
complement is the complement of the inverse image,
f −1 (B c ) = f −1 (B)c .
Before the proof, it is worth remarking that the formula for the derivative
of the local inverse, and the fact that the derivative of the local inverse is
continuous, are easy to establish once everything else is in place. If the local
inverse f −1 of f is known to exist and to be differentiable, then for every
x ∈ V the fact that the identity mapping is its own derivative combines with
the chain rule to say that
idn = D(f −1 ○ f )x = (Df −1 )y ○ Dfx where y = f (x),
with idn the identity mapping on x-space,
and similarly idn = Dfx ○ (Df −1 )y , where this time idn is the identity mapping
on y-space. The last formula in the theorem follows. In terms of matrices, the
formula is
(f −1 )′ (y) = f ′ (x)−1 where y = f (x).
This formula combines with Proposition 4.3.4 (differentiability implies conti-
nuity) and Corollary 3.7.3 (the entries of the inverse matrix are continuous
functions of the entries of the matrix) to show that since the mapping is con-
tinuously differentiable and the local inverse is differentiable, the local inverse
is continuously differentiable: If y varies slightly, then so does x because f −1
is continuous, hence so does f ′ (x) because f ′ is continuous, hence so does
f ′ (x)−1 , which is (f −1 )′ (y). Thus we need to show only that the local inverse
exists and is differentiable.
Proof. The proof begins with a simplification. Let T = Dfa , a linear map-
ping from Rn to Rn that is invertible because its matrix f ′ (a) has nonzero
determinant. Let
f˜ = T −1 ○ f.
By the chain rule, the derivative of f˜ at a is
Df˜a = T −1 ○ Dfa = T −1 ○ T = idn .
So we may as well prove the theorem for f˜, whose derivative at a is the identity: if some mapping g̃ inverts f˜ locally,
g̃ ○ f˜ = idn near a
and
f˜ ○ g̃ = idn near f˜(a),
then f is locally inverted as well.
(Diagram: f˜ = T −1 ○ f carries V to W̃ , g̃ carries W̃ back to V , T carries W̃ to W , and f = T ○ f˜ carries V to W .)
The diagram shows that the way to invert f locally, going from W back to V , is to proceed through W̃ : g = g̃ ○ T −1 . Indeed, since f = T ○ f˜, we have g ○ f = g̃ ○ T −1 ○ T ○ f˜ = g̃ ○ f˜ = idn near a, and similarly f ○ g = idn near f (a). So from now on we may assume that Dfa = idn . Define
g = f − idn , so that g ′ (a) = 0n×n .
(This g is a new mapping, measuring how far f is from the identity.) Since f is continuously differentiable, there is a small closed ball B centered at a on which ∣Dj gi (x)∣ ≤ 1/(2n2 ) for all i and j. This choice of B
and Lemma 5.1.3 (with c = 1/(2n2 )) show that for every two points x and x̃
in B,
∣g(x̃) − g(x)∣ ≤ (1/2)∣x̃ − x∣,
and therefore, since f = idn + g,
∣f (x̃) − f (x)∣ ≥ ∣x̃ − x∣ − ∣g(x̃) − g(x)∣ ≥ (1/2)∣x̃ − x∣.
The previous display shows that f is injective on B, i.e., every two distinct
points of B are taken by f to distinct points of Rn . For future reference, we
note that the result of the previous calculation can be rearranged as
∣x̃ − x∣ ≤ 2 ∣f (x̃) − f (x)∣ for all x, x̃ ∈ B. (5.4)
(Figure: the ball B about a and its boundary ∂B, mapped by f to f (a) and f (∂B); the distance from f (a) to f (∂B) exceeds 2ε.)
Let W = B(f (a), ε), the open ball with radius less than half the distance
from f (a) to f (∂B). Thus
∣y − f (a)∣ < ∣y − f (x)∣ for all y ∈ W and all x ∈ ∂B. (5.5)
That is, for every point y of W , f (a) is closer to y than every point of f (∂B) is to y. (See Figure 5.4.)
The goal now is to exhibit a mapping on W that inverts f near a. In
other words, the goal is to show that for each y ∈ W , there exists a unique x
interior to B such that f (x) = y. So fix an arbitrary y ∈ W . Define a function
∆ ∶ B Ð→ R that measures for each x the square of the distance between f (x)
and y,
(Figure 5.4. The ball W of radius ε about f (a); each y ∈ W is to be hit by some f (x).)
∆(x) = ∣f (x) − y∣2 = (f1 (x) − y1 )2 + ⋯ + (fn (x) − yn )2 .
The idea is to show that for one and only one x near a, ∆(x) = 0. Because the
modulus is always nonnegative, the x we seek must minimize ∆. As mentioned
at the beginning of the chapter, this simple idea inside all the technicalities
is the heart of the proof: the x to be taken to y by f must be the x that is
taken closest to y by f .
The function ∆ is continuous and B is compact, so the extreme value
theorem guarantees that ∆ does indeed take a minimum on B. Condition (5.5)
guarantees that ∆ takes no minimum on the boundary ∂B. Therefore the
minimum of ∆ must occur at an interior point x of B; this interior point x
must be a critical point of ∆, so all partial derivatives of ∆ vanish at x. Thus
by the chain rule,
0 = Dj ∆(x) = 2 ∑i (fi (x) − yi ) Dj fi (x) for j = 1, . . . , n, the sum being over i = 1, . . . , n.
In other words, the row vector (f (x) − y)T times the matrix f ′ (x) is the zero row vector. But f ′ (x) is invertible (its entries are uniformly close to those of the identity matrix by the choice of B, which is where the linear invertibility theorem from Chapter 3 enters), and so f (x) − y = 0n , i.e., f (x) = y. Thus f is locally surjective, and with local injectivity already established, a local inverse f −1 ∶ W Ð→ V exists, where V is the set of interior points of B taken into W by f . (Figure: f carries V to W and f −1 carries W back.) It remains to show that f −1 is differentiable. Normalize so that a = 0n and f (a) = 0n (translations affect nothing); then, since Dfa = idn , the differentiability of f at a says that
f (h) − h = o(h),
and what needs to be shown, that f −1 is differentiable at 0n with derivative idn , is the symmetric statement
f −1 (k) − k = o(k).
For every point k ∈ W , let h = f −1 (k). Note that ∣h∣ ≤ 2∣k∣ by condition (5.4)
with x̃ = h and x = 0n , so that f (x̃) = k and f (x) = 0n , and thus h = O(k). So
now we have
f −1 (k) − k = h − f (h) = −(f (h) − h) = o(h) = o(k),
the last equality because h = O(k). This shows that f −1 is differentiable at 0n with derivative idn , completing the proof. ⊓⊔
Note the range of mathematical skills that this proof of the inverse func-
tion theorem required. The ideas were motivated and guided by pictures, but
the actual argument was symbolic. At the level of fine detail, we normalized
the derivative to the identity in order to reduce clutter, we made an adroit
choice of quantifier in choosing a small enough B to apply the difference mag-
nification lemma with c = 1/(2n2 ), and we used the full triangle inequality to
obtain (5.4). This technique sufficed to prove that f is locally injective. Since
the proof of the difference magnification lemma used the mean value theorem
many times, the role of the mean value theorem in the multivariable inverse
function theorem is thus similar to its role in the one-variable proof reviewed
at the beginning of this chapter. However, while the one-variable proof that
f is locally surjective relied on the intermediate value theorem, the multivari-
able argument was far more elaborate. The idea was that the putative x taken
by f to a given y must be the actual x taken by f closest to y. We exploited
this idea by working in broad strokes:
• The extreme value theorem from Chapter 2 guaranteed that there is such
an actual x.
• The critical point theorem and then the chain rule from Chapter 4 de-
scribed necessary conditions associated to x.
• And finally, the linear invertibility theorem from Chapter 3 showed that
f (x) = y as desired. Very satisfyingly, the hypothesis that the derivative is
invertible sealed the argument that the mapping itself is locally invertible.
Indeed, the proof of local surjectivity used nearly every significant result from
Chapters 2 through 4 of these notes.
For an example, define f ∶ R2 Ð→ R2 by f (x, y) = (x3 − 2xy 2 , x + y). Is f
locally invertible at (1, −1)? If so, what is the best affine approximation to the
inverse near f (1, −1)? To answer the first question, calculate the Jacobian
f ′ (1, −1) = [ 3x2 − 2y2  −4xy ; 1  1 ] ∣(x,y)=(1,−1) = [ 1 4 ; 1 1 ] .
This matrix is invertible with inverse f ′ (1, −1)−1 = (1/3) [ −1 4 ; 1 −1 ]. Therefore f
is locally invertible at (1, −1), and the affine approximation to f −1 near
f (1, −1) = (−1, 0) is
f −1 (−1 + h, 0 + k) ≈ [ 1 ; −1 ] + (1/3) [ −1 4 ; 1 −1 ] [ h ; k ] = (1 − (1/3)h + (4/3)k, −1 + (1/3)h − (1/3)k).
The actual inverse function f −1 about (−1, 0) may not be clear, but the inverse
function theorem guarantees its existence, and its affine approximation is easy
to find.
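A numerical sanity check of this conclusion (a sketch assuming Python with NumPy; the function f and the affine approximation are the ones just displayed): applying f to the approximate inverse of a point near (−1, 0) should nearly reproduce that point, with the error shrinking quadratically.

    import numpy as np

    f = lambda p: np.array([p[0]**3 - 2*p[0]*p[1]**2, p[0] + p[1]])

    def approx_inverse(h, k):
        # Affine approximation of f^{-1} near f(1, -1) = (-1, 0).
        return np.array([1 - h/3 + 4*k/3, -1 + h/3 - k/3])

    for eps in (1e-1, 1e-2, 1e-3):
        target = np.array([-1 + eps, eps])
        err = np.linalg.norm(f(approx_inverse(eps, eps)) - target)
        print(err)      # shrinks roughly 100-fold per step: the error is O(eps^2)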
Exercises

5.3 The Implicit Function Theorem
Also, we review the argument in Section 4.8 that every graph is a level
set. Let A0 be a subset of Rr , and let f ∶ A0 Ð→ Rc be any mapping. Let
A = A0 × Rc (a subset of Rn ) and define a second mapping g ∶ A Ð→ Rc ,
g(x, y) = f (x) − y for all x ∈ A0 and y ∈ Rc . Then the graph of f is
graph(f ) = {(x, y) ∈ A ∶ y = f (x)} = {(x, y) ∈ A ∶ g(x, y) = 0c },
and this is the set of inputs to g that g takes to 0c , a level set of g as desired.
Now we return to rephrasing the question at the beginning of this section.
Let A be an open subset of Rn , and let a mapping g ∶ A Ð→ Rc have continuous
partial derivatives at every point of A. Points of A can be written
(x, y), x ∈ Rr , y ∈ Rc ,
where n = r + c, and the question is whether near a point of the level set
L = {(x, y) ∈ A ∶ g(x, y) = 0c }
the condition g(x, y) = 0c implicitly defines y as a function of x. When the condition is affine it takes the form
[M N ] [ x ; y ] = w,
that is,
M x + N y = w.
Assume that N is invertible. Then subtracting M x from both sides and then
left multiplying by N −1 shows that the relation is
y = N −1 (w − M x).
In particular, the homogeneous version of the condition describes its solutions as a graph over the x-coordinates:
[M N ] [ h ; k ] = 0c ⇐⇒ k = −N −1 M h. (5.6)
When the conditions are nonaffine, the situation is not so easy to analyze.
However:
• The problem is easy to linearize. That is, given a point (a, b) (where a ∈ Rr
and b ∈ Rc ) on the level set {(x, y) ∶ g(x, y) = w}, differential calculus
tells us how to describe the tangent object to the level set at the point.
Depending on the value of r, the tangent object will be a line, or a plane,
or higher-dimensional. But regardless of its dimension, it is described by
the linear conditions g ′ (a, b)v = 0c , and these conditions take the form
that we have just considered,
M h + N k = 0c where [M N ] = g ′ (a, b) and v = (h, k).
For a specific example, let g(x, y) = x2 + y2 − 1, so that the level set C = {(x, y) ∶ g(x, y) = 0} is the unit circle, described by the relation
x2 + y2 = 1.
Globally (in the large), this relation specifies neither x as a function of y nor
y as a function of x. It can’t: the circle is visibly not the graph of a function
of either sort—recall the vertical line test to check whether a curve is the
graph of a function y = ϕ(x), and analogously for the horizontal line test. The
situation does give a function, however, if one works locally (in the small) by
looking only at part of the circle at a time. Every arc in the bottom half of
the circle is described by the function
y = ϕ(x) = −√(1 − x2) .
Similarly, every arc in the right half of the circle is described by the function
x = ψ(y) = √(1 − y2) ,
and every arc in the bottom right quarter is described by both functions. (See
Figure 5.6.) On the other hand, no arc of the circle about the point (a, b) =
(1, 0) is described by a function y = ϕ(x), and no arc about (a, b) = (0, 1) is
described by a function x = ψ(y). (See Figure 5.7.) Thus, about some points
(a, b), the circle relation x2 + y 2 = 1 contains the information to specify each
variable as a function of the other. These functions are implicit in the relation.
About other points, the relation implicitly defines one variable as a function
of the other, but not the second as a function of the first.
The tangent line to the circle at (a, b) consists of the points (a + h, b + k) such
that (h, k) is orthogonal to g ′ (a, b),
[2a 2b] [ h ; k ] = 0.
That is,
2ah + 2bk = 0.
Thus, whenever b ≠ 0 we have
k = −(a/b)h,
showing that on the tangent line, the second coordinate is a linear function
of the first, and the function has derivative −a/b. And so on the circle it-
self near (a, b), plausibly the second coordinate is a function of the first as
well, provided that b ≠ 0. Note that indeed this argument excludes the two
points (1, 0) and (−1, 0), about which y is not an implicit function of x. But
about points (a, b) ∈ C where D2 g(a, b) ≠ 0, the circle relation should im-
plicitly define y as a function of x. And at such points (say, on the lower
half-circle), the function is explicitly
ϕ(x) = −√(1 − x2) ,
so that ϕ′ (x) = x/√(1 − x2) = −x/y (the last minus sign is present because the
square root is positive but y is negative) and in particular,
ϕ′ (a) = −a/b.
Thus ϕ′ (a) is exactly the slope that we found a moment earlier by solving
the linear problem g ′ (a, b)v = 0 where v = (h, k) is a column vector. That is,
using the constraint g(x, y) = 0 to set up and solve the linear problem, making
no reference in the process to the function ϕ implicitly defined by the con-
straint, we found the derivative ϕ′ (a) nonetheless. The procedure illustrates
the general idea of the pending implicit function theorem:
Constraining conditions locally define some variables implicitly in
terms of others, and the implicitly defined function can be differen-
tiated without being found explicitly.
(And returning to the circle example, yet another way to find the derivative
is to differentiate the relation x2 + y 2 = 1 at a point (a, b) about which we
assume that y = ϕ(x),
2a + 2bϕ′ (a) = 0,
so that again ϕ′ (a) = −a/b. The reader may recall from elementary calculus
that this technique is called implicit differentiation.)
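All three routes to the derivative can be compared symbolically. A small sketch (assuming Python with SymPy; not part of the text) differentiates the explicit branch ϕ(x) = −√(1 − x2) and checks it against −x/y.

    import sympy as sp

    x = sp.symbols('x')
    phi = -sp.sqrt(1 - x**2)              # the lower half of the circle
    dphi = sp.diff(phi, x)

    print(sp.simplify(dphi - (-x/phi)))   # 0: phi'(x) = -x/y with y = phi(x)

    a = sp.Rational(3, 5)                 # the point (a, b) = (3/5, -4/5)
    print(dphi.subs(x, a))                # 3/4, which is -a/b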
It may help the reader to visualize the situation if we revisit the idea of
the previous paragraph more geometrically. Since C is a level set of g, the
gradient g ′ (a, b) is orthogonal to C at the point (a, b). When g ′ (a, b) has a
nonzero y-component, C should locally have a big shadow on the x-axis, from
which there is a function ϕ back to C. (See Figure 5.8, in which the arrow
drawn is quite a bit shorter than the true gradient, for graphical reasons.)
The same ideas apply to the unit sphere S = {(x, y, z) ∈ R3 ∶ x2 + y2 + z2 = 1}, a level set of the function
g(x, y, z) = x2 + y2 + z2 .
Figure 5.9. Function from the (x, y)-plane to the z-axis via the sphere
The argument based on calculus and linear algebra to suggest that near
points (a, b, c) ∈ S such that D3 g(a, b, c) ≠ 0, z is implicitly a function ϕ(x, y)
on S is similar to the case of the circle. The derivative of g at the point is
g ′ (a, b, c) = [2a 2b 2c],
and the tangent plane to S at (a, b, c) consists of the points (a + h, b + k, c + ℓ) such that 2ah + 2bk + 2cℓ = 0. Whenever c ≠ 0 this gives
ℓ = −(a/c)h − (b/c)k,
showing that on the tangent plane, the third coordinate is a linear function of
the first two, and the function has partial derivatives −a/c and −b/c. And so
on the sphere itself near (a, b, c), plausibly the third coordinate is a function of
the first two as well, provided that c ≠ 0. This argument excludes points on the
equator, about which z is not an implicit function of (x, y). But about points
(a, b, c) ∈ S where D3 g(a, b, c) ≠ 0, the sphere relation should implicitly define z
as a function of (x, y). And at such points (say, on the upper hemisphere),
the function is explicitly
ϕ(x, y) = √(1 − x2 − y2) ,
so that ϕ′ (x, y) = −[x/√(1 − x2 − y2)  y/√(1 − x2 − y2)] = −[x/z y/z], and in particular,
ϕ′ (a, b) = − [a/c b/c] .
The partial derivatives are exactly as predicted by solving the linear problem
g ′ (a, b, c)v = 0, where v = (h, k, ℓ) is a column vector, with no reference to ϕ.
(As with the circle, a third way to find the derivative is to differentiate the
sphere relation x2 + y 2 + z 2 = 1 at a point (a, b, c) about which we assume that
z = ϕ(x, y), differentiating with respect to x and then with respect to y: this gives 2a + 2c D1 ϕ(a, b) = 0 and 2b + 2c D2 ϕ(a, b) = 0, so that once again ϕ′ (a, b) = −[a/c b/c].)
For the next example, let GC be the great circle in which the unit sphere meets the plane y + z = 0,
GC = {(x, y, z) ∈ R3 ∶ x2 + y2 + z2 = 1, y + z = 0},
a level set of the mapping g(x, y, z) = (x2 + y2 + z2 , y + z), taken to the value (1, 0). The two conditions on the three variables should generally leave one variable (say, the first one) free and define the other two variables in terms of it. That is, n = 3 and c = 2, so that r = 1. Indeed, GC is a circle that is orthogonal to
the plane of the page, and away from its two points (±1, 0, 0) that are farthest
in and out of the page, it does define (y, z) locally as functions of x. (See
Figure 5.10.) This time we first proceed by linearizing the problem to obtain
the derivatives of the implicit function without finding the implicit function
ϕ = (ϕ1 , ϕ2 ) itself. The derivative matrix of g at p is
g ′ (a, b, c) = [ 2a 2b 2c ; 0 1 1 ] .
The level set GC is defined by the condition that g(x, y, z) remain constant
at (1, 0) as (x, y, z) varies. Thus the tangent line to GC at a point (a, b, c)
consists of points (a + h, b + k, c + ℓ) such that neither component function of g
is instantaneously changing in the (h, k, ℓ)-direction,
[ 2a 2b 2c ; 0 1 1 ] [ h ; k ; ℓ ] = [ 0 ; 0 ] .
The right 2 × 2 submatrix of g ′ (a, b, c) has nonzero determinant whenever
b ≠ c, that is, at all points of GC except the two aforementioned ex-
treme points (±1, 0, 0). Assuming that b ≠ c, let M denote the first column
of g ′ (a, b, c) and let N denote the right 2 × 2 submatrix. Then by (5.6), the
linearized problem has solution
[ k ; ℓ ] = −N −1 M h = (1/(2(c − b))) [ 1 −2c ; −1 2b ] [ 2a ; 0 ] h = [ −a/(2b) ; −a/(2c) ] h,
the last equality using the relation c = −b, which holds on GC. That is,
k = −(a/(2b)) h, ℓ = −(a/(2c)) h. (5.7)
And so for all points (a + h, b + k, c + ℓ) on the tangent line to GC at (a, b, c),
the last two coordinate-offsets k and ℓ are specified in terms of the first co-
ordinate offset h via (5.7), and the component functions have partial deriva-
tives −a/(2b) and −a/(2c). (And as with the circle and the sphere, the two
partial derivatives can be obtained by implicit differentiation as well.)
To make the implicit function in the great circle relations explicit, note
that near the point p = (a, b, c) in the figure,
(y, z) = (ϕ1 (x), ϕ2 (x)) = (−√((1 − x2)/2) , √((1 − x2)/2)) .
For a more complicated example: do the conditions
y2 − ez cos(y + x2 ) = 0, y2 + z2 − x2 = 0
define y and z implicitly in terms of x near the point (1, −1, 0)? (This point meets both conditions.) Answering this directly by solving for y and z is manifestly unappealing. But linearizing the problem is easy. At our point (1, −1, 0), the mapping
g(x, y, z) = (y2 − ez cos(y + x2 ), y2 + z2 − x2 )
has derivative matrix
g ′ (1, −1, 0) = [ 0 −2 −1 ; −2 −2 0 ] .
Let M be the first column of this matrix and N the right 2 × 2 submatrix. Since N is invertible, the conditions do locally define (y, z) as a function ϕ of x, with derivative
ϕ′ (1) = −N −1 M = − [ −2 −1 ; −2 0 ]−1 [ 0 ; −2 ] = (1/2) [ 0 1 ; 2 −2 ] [ 0 ; −2 ] = [ −1 ; 2 ] .
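Because the implicit function itself remains unknown here, a numerical experiment is reassuring. The following sketch (an illustration only, assuming Python with SciPy's fsolve; it is not part of the text) solves the two conditions for (y, z) at x = 1 ± δ and estimates ϕ′ (1) by a centered difference, recovering approximately (−1, 2).

    import numpy as np
    from scipy.optimize import fsolve

    def g(yz, x):
        y, z = yz
        return [y**2 - np.exp(z)*np.cos(y + x**2), y**2 + z**2 - x**2]

    delta = 1e-5
    # Solve for (y, z) near (-1, 0) on either side of x = 1.
    lo = fsolve(g, [-1.0, 0.0], args=(1 - delta,))
    hi = fsolve(g, [-1.0, 0.0], args=(1 + delta,))

    print((hi - lo) / (2*delta))    # approximately [-1.  2.] = phi'(1)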
These computations illustrate the implicit function theorem: if g ∶ A Ð→ Rc is continuously differentiable near a point (a, b) of its level set {g = 0c }, and the derivative matrix there decomposes as g ′ (a, b) = [M N ] with N an invertible c × c block, then near (a, b) the level set defines y implicitly as a continuously differentiable function y = ϕ(x) with ϕ(a) = b and
ϕ′ (a) = −N −1 M.
Consequently the affine approximation of the implicitly defined function is
ϕ(a + h) ≈ b − N −1 M h.
Proof. Examining the derivative has already shown the theorem’s plausibility
in specific instances. Shoring up these considerations into a proof is easy with
a well-chosen change of variables and the inverse function theorem. For the
change of variables, define
G ∶ A Ð→ Rn
as follows: for all x ∈ Rr and y ∈ Rc such that (x, y) ∈ A,
G(x, y) = (x, g(x, y)).
This mapping is highly reversible, being the identity mapping on the x-coordinates. That is, it
is easy to recover g from G. The mapping G affects only y-coordinates, and
it is designed to take the level set L = {(x, y) ∈ A ∶ g(x, y) = 0c } to the x-axis.
(See Figure 5.11, in which the inputs and the outputs of G are shown in the
same copy of Rn .)
(Figure 5.11. G fixes x-coordinates and flattens the level set L through p = (a, b) onto the x-axis.)
The derivative matrix of G at (a, b) is
G′ (a, b) = [ Ir  0r×c ; M  N ] ,
which is invertible because N is. So the inverse function theorem applies: near (a, b), G has a continuously differentiable local inverse Φ. Because G fixes x-coordinates, so does Φ, and thus Φ takes the form
Φ(x, z) = (x, φ(x, z)),
where for all (x, y) near (a, b) and all (x, z) near (a, 0c ),
G(x, y) = (x, z) ⇐⇒ (x, y) = Φ(x, z). (5.9)
Now we can exhibit the desired mapping implicit in the original g. Define
a mapping
ϕ(x) = φ(x, 0c ) for x near a. (5.10)
The idea is that locally this lifts the x-axis to the level set L where g(x, y) = 0c
and then projects horizontally to the y-axis. (See Figure 5.13.) For every (x, y)
near (a, b), a specialization of condition (5.9) combines with the definition
(5.10) of ϕ to give
g(x, y) = 0c ⇐⇒ y = ϕ(x).
This equivalence exhibits y as a local function of x on the level set of g, as
desired. And since by definition (5.10), ϕ is the last c component functions
of Φ restricted to the first r inputs to Φ, the derivative ϕ′ (a) is exactly the
lower left c × r block of Φ′ (a, 0c ), which is −N −1 M . This completes the proof.
⊓⊔
Thus the implicit function theorem follows easily from the inverse function
theorem. The converse implication is even easier. Imagine a scenario in which
somehow we know the implicit function theorem but not the inverse function
theorem. Let f ∶ A Ð→ Rn (where A ⊂ Rn ) be a mapping that satisfies the
hypotheses for the inverse function theorem at a point a ∈ A. That is, f is
continuously differentiable in an open set containing a, and det f ′ (a) ≠ 0.
Define a mapping
g ∶ A × Rn Ð→ Rn , g(x, y) = f (x) − y.
(This mapping should look familiar from the beginning of this section.) Let
b = f (a). Then g(a, b) = 0, and the derivative matrix of g at (a, b) is
g ′ (a, b) = [ f ′ (a)  −In ] .
Figure 5.13. The implicit mapping from x-space to y-space via the level set
Since f ′ (a) is invertible, we may apply the implicit function theorem, with
the roles of c, r, and n in the theorem taken by the values n, n, and 2n here,
and with the theorem modified as in the third remark before its proof so that
we are checking whether the first n variables depend on the last n values. The
theorem supplies us with a differentiable mapping ϕ defined for values of y
near b such that for all (x, y) near (a, b),
g(x, y) = 0 ⇐⇒ x = ϕ(y).
But the condition g(x, y) = 0 is precisely y = f (x), so the equivalence reads
y = f (x) ⇐⇒ x = ϕ(y).
That is, ϕ is a differentiable local inverse of f , with derivative ϕ′ (b) = −f ′ (a)−1 (−In ) = f ′ (a)−1
(as it must be), and we have recovered the inverse function theorem. In a
nutshell, the argument converts the graph y = f (x) into a level set g(x, y) = 0,
and then the implicit function theorem says that locally the level set is also
the graph of x = ϕ(y). (See Figure 5.14.)
Rederiving the inverse function theorem so easily from the implicit function
theorem is not particularly impressive, since proving the implicit function
theorem without citing the inverse function theorem would be just as hard as
the route we took of proving the inverse function theorem first. The point is
that the two theorems have essentially the same content.
We end this section with one more example. Consider the function
g ∶ R2 Ð→ R, g(x, y) = (x2 + y2 )2 − x2 + y2 .
Figure 5.14. The inverse function theorem from the implicit function theorem
Exercises
5.3.1. Does the relation x2 + y + sin(xy) = 0 implicitly define y as a function
of x near the origin? If so, what is its best affine approximation? How about
x as a function of y and its affine approximation?
5.3.2. Does the relation xy − z ln y + exz = 1 implicitly define z as a function
of (x, y) near (0, 1, 1)? How about y as a function of (x, z)? When possible,
give the affine approximation to the function.
5.3.3. Do the simultaneous conditions x2 (y 2 + z 2 ) = 5 and (x − z)2 + y 2 = 2
implicitly define (y, z) as a function of x near (1, −1, 2)? If so, then what is
the function’s affine approximation?
5.3.4. Same question for the conditions x2 + y 2 = 4 and 2x2 + y 2 + 8z 2 = 8
near (2, 0, 0).
5.3.5. Do the simultaneous conditions xy + 2yz = 3xz and xyz + x − y = 1
implicitly define (x, y) as a function of z near (1, 1, 1)? How about (x, z) as a
function of y? How about (y, z) as a function of x? Give affine approximations
when possible.
5.3.6. Do the conditions xy 2 +xzu+yv 2 = 3 and u3 yz+2xv−u2 v 2 = 2 implicitly
define (u, v) in terms of (x, y, z) near the point (1, 1, 1, 1, 1)? If so, what is the
derivative matrix of the implicitly defined mapping at (1, 1, 1)?
5.3.7. Do the conditions x2 + yu + xv + w = 0 and x + y + uvw = −1 implicitly
define (x, y) in terms of (u, v, w) near (x, y, u, v, w) = (1, −1, 1, 1, −1)? If so,
what is the best affine approximation to the implicitly defined mapping?
5.3.8. Do the conditions
2x + y + 2z + u − v = 1
xy + z − u + 2v = 1
yz + xz + u2 + v = 0
define the first three variables (x, y, z) as a function ϕ(u, v) near the point
(x, y, z, u, v) = (1, 1, −1, 1, 1)? If so, find the derivative matrix ϕ′ (1, 1).
Let’s step back from specifics (but we will return to the currently unre-
solved example soon) and consider in general the necessary nature of a critical
point in a constrained problem. The discussion will take place in two stages:
first we consider the domain of the problem, and then we consider the critical
point.
The domain of the problem is the points in n-space that satisfy a set of c
constraints. To satisfy the constraints is to meet a condition
g(x) = 0c ,
where g = (g1 , . . . , gc ) ∶ A Ð→ Rc is a differentiable mapping on an open set A ⊂ Rn . Equivalently, the domain is the level set
L = {x ∈ A ∶ g(x) = 0c }.
At a point p of L where the gradients ∇gi (p) are linearly independent, the most general vector orthogonal to L at p is a linear combination λ1 ∇g1 (p) + ⋯ + λc ∇gc (p). Now suppose that p is a critical point of the restriction to L of a differentiable function f . (For simplicity, f
has the same domain A ⊂ Rn as g.) Then for every unit vector d describing a
direction in L at p, the directional derivative Dd f (p) must be 0. But Dd f (p) =
⟨∇f (p), d⟩, so this means that:
• ∇f (p) must be orthogonal to L at p.
This observation combines with our description of the most general vector
orthogonal to L at p, given above, to give Lagrange’s condition:
Suppose that p is a critical point of the function f restricted to the
level set L = {x ∶ g(x) = 0c } of g. If the gradients ∇gi (p) are linearly
independent, then
∇f (p) = λ1 ∇g1 (p) + ⋯ + λc ∇gc (p) for some scalars λ1 , . . . , λc ,
g(p) = 0c .
For example, to find the nearest point to the origin in R5 on the intersection of the two hyperplanes v + w + x + y + z = 1 and v − w + 2x − y + z = −1, minimize the function
f (v, w, x, y, z) = v2 + w2 + x2 + y2 + z2
subject to the constraints g1 = 0 and g2 = 0, where
g1 (v, w, x, y, z) = v + w + x + y + z − 1,
g2 (v, w, x, y, z) = v − w + 2x − y + z + 1,
and the corresponding Lagrange condition and constraints are (after absorbing
a 2 into the λ’s, whose particular values are irrelevant anyway)
(v, w, x, y, z) = λ1 (1, 1, 1, 1, 1) + λ2 (1, −1, 2, −1, 1),
v + w + x + y + z = 1, v − w + 2x − y + z = −1.
Substitute the expressions from the Lagrange condition into the constraints
to get 5λ1 + 2λ2 = 1 and 2λ1 + 8λ2 = −1. That is,
[ 5 2 ; 2 8 ] [ λ1 ; λ2 ] = [ 1 ; −1 ] ,
and so
[ λ1 ; λ2 ] = (1/36) [ 8 −2 ; −2 5 ] [ 1 ; −1 ] = [ 10/36 ; −7/36 ] .
Note how much more convenient the two λ’s are to work with than the five
original variables. Their values are auxiliary to the original problem, but sub-
stituting back now gives the nearest point to the origin,
(v, w, x, y, z) = (1/12, 17/36, −1/9, 17/36, 1/12).
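Here is the same computation as a numerical sketch (assuming Python with NumPy; not part of the text). The 2 × 2 matrix of the λ-system is the Gram matrix of the two constraint gradients, and the point is reconstructed from the λ's.

    import numpy as np

    # Gradients of g1 and g2 (constants, since the constraints are affine).
    n1 = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
    n2 = np.array([1.0, -1.0, 2.0, -1.0, 1.0])

    # Lagrange condition: p = lam1*n1 + lam2*n2; substituting into the
    # constraints n1.p = 1 and n2.p = -1 gives the 2x2 Gram system.
    G = np.array([[n1 @ n1, n1 @ n2], [n2 @ n1, n2 @ n2]])   # [[5, 2], [2, 8]]
    lam = np.linalg.solve(G, np.array([1.0, -1.0]))
    print(lam)                        # [10/36, -7/36]

    p = lam[0]*n1 + lam[1]*n2
    print(p)                          # (1/12, 17/36, -1/9, 17/36, 1/12)
    print(n1 @ p, n2 @ p)             # 1.0 and -1.0: both constraints hold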
(Figure: Euclid's problem. A segment through the fixed interior point P cuts a triangle of base x and height H from the angle; P sits at distance a from one side and height h above the base line.)
The area of the triangle is f (x, H) = (1/2)xH, and similar triangles give the constraint g(x, H) = 0 where g(x, H) = (x − a)H − xh. The Lagrange condition ∇f (x, H) = λ∇g(x, H) and the constraint are (after absorbing a factor of 2 into λ)
(H, x) = λ(H − h, x − a),
(x − a)H = xh.
The first relation quickly yields (x − a)H = x(H − h). Combining this with the
second shows that H −h = h, that is, H = 2h. The solution of Euclid’s problem
is, therefore, to take the segment that is bisected by P between the two sides
of the angle. (See Figure 5.22.)
Euclid’s least area problem has the interpretation of finding the point
of tangency between the level set g(x, H) = 0, a hyperbola having asymp-
totes x = a and H = h, and the level sets of f (x, H) = (1/2)xH, a family
of hyperbolas having asymptotes x = 0 and H = 0. (See Figure 5.23, where
the dashed asymptotes meet at (a, h) and the point of tangency is visibly
(x, H) = (2a, 2h).)
(Figure: Snell's law. A particle travels from point A in medium 1 to point B in medium 2, crossing the interface; the legs have lengths a sec(α) and b sec(β), make angles α and β with the vertical, and have horizontal offsets a tan(α) and b tan(β), which sum to d.)
If the particle's speeds in the two media are v and w, the travel time to be minimized is a sec(α)/v + b sec(β)/w, subject to the constraint a tan(α) + b tan(β) = d; the Lagrange condition works out to sin(α)/v = sin(β)/w, which is Snell's law.
Figure 5.25 depicts the situation using the variables x = tan α and y = tan β.
The level set of possible configurations becomes the portion of the line ax + by = d in the first quadrant, and the function to be optimized becomes a√(1 + x2)/v + b√(1 + y2)/w. A level set for a large value of the function passes
through the point (0, d/b), the configuration with α = 0 in which the parti-
cle travels vertically in medium 1 and then travels a long path in medium 2,
and a level set for a smaller value of the function passes through the point
(d/a, 0), the configuration with β = 0 in which the particle travels a long path
in medium 1 and then travels vertically in medium 2, while a level set for an
even smaller value of the function is tangent to the line segment at its point
that describes the optimal configuration specified by Snell’s law.
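This tangency can be reproduced numerically. The sketch below (an illustration only, assuming Python with SciPy; the particular values of a, b, d, v, w are invented for the example) minimizes the travel time along the constraint line and checks Snell's law at the minimizer.

    import numpy as np
    from scipy.optimize import minimize_scalar

    a, b, d = 1.0, 2.0, 3.0      # geometry (illustrative values)
    v, w = 1.0, 0.7              # speeds in the two media

    def travel_time(x):          # x = tan(alpha); the constraint gives y = tan(beta)
        y = (d - a*x) / b
        return a*np.sqrt(1 + x**2)/v + b*np.sqrt(1 + y**2)/w

    res = minimize_scalar(travel_time, bounds=(0.0, d/a), method='bounded')
    x = res.x
    y = (d - a*x) / b
    alpha, beta = np.arctan(x), np.arctan(y)
    print(np.sin(alpha)/v, np.sin(beta)/w)   # equal at the optimum: Snell's law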
For an example from analytic geometry, let the function f measure the
square of the distance between the points x = (x1 , x2 ) and y = (y1 , y2 ) in the
plane,
f (x1 , x2 , y1 , y2 ) = (x1 − y1 )2 + (x2 − y2 )2 .
Fix points a = (a1 , a2 ) and b = (b1 , b2 ) in the plane, and fix positive numbers
r and s. Define
g ∶ R4 Ð→ R2 ,
g(x1 , x2 , y1 , y2 ) = ((x1 − a1 )2 + (x2 − a2 )2 − r2 , (y1 − b1 )2 + (y2 − b2 )2 − s2 ).
The set of points where
g(x1 , x2 , y1 , y2 ) = (0, 0)
can be viewed as the set of pairs of points x and y that lie respectively on the
circles centered at a and b with radii r and s. Thus, to optimize the function f
subject to the constraint g = 0 is to optimize the distance between pairs of
points on the circles. The rows of the 2 × 4 matrix
g ′ (x, y) = 2 [ x1 − a1  x2 − a2  0  0 ; 0  0  y1 − b1  y2 − b2 ]
are linearly independent as long as x ≠ a and y ≠ b, and the Lagrange condition (after absorbing the 2 into the λ’s) is
(x1 − y1 , x2 − y2 , y1 − x1 , y2 − x2 ) = λ1 (x1 − a1 , x2 − a2 , 0, 0)
− λ2 (0, 0, y1 − b1 , y2 − b2 ),
or
(x − y, y − x) = λ1 (x − a, 02 ) − λ2 (02 , y − b).
The second half of the vector on the left is the additive inverse of the first, so
the condition can be rewritten as
x − y = λ1 (x − a) = λ2 (y − b).
Assuming that x ≠ y, so that the λ’s are nonzero, this says that
x − y ∥ x − a ∥ y − b,
and so the points x, y, a, and b are collinear. Granted, these results are obvious
geometrically, but it is pleasing to see them follow so easily from the Lagrange
multiplier condition. On the other hand, not all points x and y such that x,
y, a, and b are collinear are solutions to the problem. For example, if both
circles are bisected by the x-axis and neither circle sits inside the other, then
x and y could be the leftmost points of the circles, neither the closest nor the
farthest pair.
The last example of this section begins by maximizing the geometric mean
of n nonnegative numbers,
f (x1 , . . . , xn ) = (x1 ⋯xn )1/n ,
subject to the constraint that their arithmetic mean is 1,
(x1 + ⋯ + xn )/n = 1.
The Lagrange condition forces x1 = ⋯ = xn at any interior critical point, so the only candidate is (1, . . . , 1), where
f (1, . . . , 1) = (1⋯1)1/n = 1.
And the candidate is indeed the maximum, because the constrained set is compact, f is continuous on it, and f vanishes wherever some xi is 0. Thus (x1 ⋯xn )1/n ≤ 1 for all nonnegative x1 , . . . , xn with mean 1.
This Lagrange multiplier argument provides most of the proof of the following
theorem.
Theorem 5.4.1 (Arithmetic–geometric mean inequality). The geomet-
ric mean of n positive numbers is at most their arithmetic mean:
(a1 ⋯an )1/n ≤ (a1 + ⋯ + an )/n for all nonnegative a1 , . . . , an .
Proof. If any ai = 0 then the inequality is clear. Given positive numbers
a1 , . . . , an , let a = (a1 + ⋯ + an )/n and let xi = ai /a for i = 1, . . . , n. Then
(x1 + ⋯ + xn )/n = 1, and therefore
(a1 ⋯an )1/n = a(x1 ⋯xn )1/n ≤ a = (a1 + ⋯ + an )/n.
⊓⊔
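A quick numerical spot check of the inequality (assuming Python with NumPy; not part of the text):

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(5):
        a = rng.uniform(0.0, 10.0, size=6)      # six nonnegative numbers
        geometric = a.prod() ** (1.0 / a.size)
        arithmetic = a.mean()
        print(geometric <= arithmetic)          # True every time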
Exercises
5.4.1. Find the nearest point to the origin on the intersection of the hyper-
planes x + y + z − 2w = 1 and x − y + z + w = 2 in R4 .
5.4.6. Find the rectangular box of greatest volume, having sides parallel to the coordinate axes, that can be inscribed in the ellipsoid (x/a)2 + (y/b)2 + (z/c)2 = 1.
5.4.7. The lengths of the twelve edges of a rectangular block sum to 4, and
the areas of the six faces sum to 4α. Find the lengths of the edges when the
excess of the block’s volume over that of a cube with edge equal to the least
edge of the block is greatest.
5.4.8. A cylindrical can (with top and bottom) has volume V . Subject to this
constraint, what dimensions give it the least surface area?
5.4.9. Find the distance in the plane from the point (0, 1) to the parabola y =
ax2 where a > 0. Note: the answer depends on whether a > 1/2 or 0 < a ≤ 1/2.
5.5 Lagrange Multipliers: Analytic Proof and General Examples
The proof will culminate the ideas in this chapter as follows. The inverse
function theorem says:
If the linearized inversion problem is solvable then the actual inversion
problem is locally solvable.
The inverse function theorem is equivalent to the implicit function theorem:
If the linearized level set is a graph then the actual level set is locally
a graph.
And finally, the idea for proving the Lagrange condition is:
Although the graph is a curved space, where the techniques of Chapter 4
do not apply, its domain is a straight space, where they do.
That is, the implicit function theorem lets us reduce optimization on the graph
to optimization on the domain, which we know how to do.
Theorem 5.5.1 (Lagrange multiplier condition). Let f ∶ A Ð→ R and g ∶ A Ð→ Rc (where A ⊂ Rn is open and c < n) be continuously differentiable, and let p be a critical point of f restricted to the level set L = {x ∈ A ∶ g(x) = 0c }. If g ′ (p) contains an invertible c × c block, then
∇f (p) = λ g ′ (p) for some row vector λ ∈ Rc , and g(p) = 0c .

Proof. The second condition holds since p is a point in L. The first condition
needs to be proved. Let r = n − c, the number of variables that should remain
free under the constraint g(x) = 0c , and notate the point p as p = (a, b),
where a ∈ Rr and b ∈ Rc . Using this notation, we have g(a, b) = 0c and
g ′ (a, b) = [M N ] where M is c × r and N is c × c and invertible. (We may
assume that N is the invertible block in the hypotheses to the theorem because
we may freely permute the variables.) The implicit function theorem gives a
mapping ϕ ∶ A0 Ð→ Rc (where A0 ⊂ Rr and a is an interior point of A0 ) with
ϕ(a) = b, ϕ′ (a) = −N −1 M , and for all points (x, y) ∈ A near (a, b), g(x, y) = 0c
if and only if y = ϕ(x).
Make f depend only on the free variables by defining
f0 = f ○ (idr , ϕ) ∶ A0 Ð→ R, f0 (x) = f (x, ϕ(x)).
(See Figure 5.26.) Since the domain of f0 doesn’t curve around in some larger
space, f0 is optimized by the techniques from Chapter 4. That is, the implicit
function theorem has reduced optimization on the curved set to optimization
in Euclidean space. Specifically, the multivariable critical point theorem says
that f0 has a critical point at a,
∇f0 (a) = 0r .
Our task is to express the previous display in terms of the given data f and g.
Doing so will produce the Lagrange condition.
Because f0 = f ○ (idr , ϕ) is a composition, the chain rule says that the
condition ∇f0 (a) = 0r is ∇f (a, ϕ(a)) ⋅ (idr , ϕ)′ (a) = 0r , or
∇f (a, b) [ Ir ; ϕ′ (a) ] = 0r .
Let ∇f (a, b) = (u, v) where u ∈ Rr and v ∈ Rc are row vectors, and recall that
ϕ′ (a) = −N −1 M . The previous display becomes
[u v] [ Ir ; −N −1 M ] = 0r ,
that is, u − v N −1 M = 0r , i.e., u = v N −1 M . It follows that
[u v] = v N −1 [M N ] .
But [u v] = ∇f (a, b) = ∇f (p) and [M N ] = g ′ (a, b) = g ′ (p), so setting λ = v N −1 ∈ Rc gives
∇f (p) = λ g ′ (p).
⊓⊔
Figure 5.26. The Lagrange multiplier criterion from the implicit function theorem
We have seen that the Lagrange multiplier condition is necessary but not
sufficient for an extreme value. That is, it can report a false positive, as in the
two-circle problem in the previous section. False positives are not a serious
problem, since inspecting all the points that meet the Lagrange condition will
determine which of them give the true extrema of f . A false negative would be
a worse situation, giving us no indication that an extreme value might exist,
much less how to find it. The following example shows that the false negative
scenario can arise without the invertible c × c block required in Theorem 5.5.1.
Let the temperature in the plane be given by
f (x, y) = x,
and let the constraint set be the curve
L = {(x, y) ∈ R2 ∶ y2 = x3 }.
(See Figure 5.27.) Since temperature increases as we move to the right, the
coldest point of L is its leftmost point, the cusp at (0, 0). However, the La-
grange condition does not find this point. Indeed, the constraining function
is g(x, y) = x3 − y 2 (which does have continuous derivatives, notwithstanding
that its level set has a cusp: the graph of a smooth function is smooth, but
the level set of a smooth function need not be smooth—this is exactly the
issue addressed by the implicit function theorem). Therefore the Lagrange
condition and the constraint are
(1, 0) = λ (3x2 , −2y), x3 − y2 = 0.
These equations have no solution: the second entry forces λy = 0, but λ = 0 is incompatible with the first entry, and y = 0 forces x = 0 by the constraint, again incompatible with the first entry. The problem is that the gradient at the cusp
is ∇g(0, 0) = (0, 0), and neither of its 1 × 1 subblocks is invertible. In general,
the Lagrange multiplier condition will not report a false negative as long as we
remember that it only claims to check for extrema at the nonsingular points
of L, the points p such that g ′ (p) has an invertible c × c subblock.
The previous section gave specific examples of the Lagrange multiplier
method. This section now gives some general families of examples.
Recall that the previous section discussed the problem of optimizing the
distance between two points in the plane, each point lying on an associated
circle. Now, as the first general example of the Lagrange multiplier method,
let (x, y) ∈ Rn × Rn denote a pair of points each from Rn , and let the function
f measure the square of the distance between such a pair,
f ∶ Rn × Rn Ð→ R, f (x, y) = ∣x − y∣2 .
Note that ∇f (x, y) = [x − y y − x], viewing x and y as row vectors. Given two
mappings g1 ∶ Rn Ð→ Rc1 and g2 ∶ Rn Ð→ Rc2 , define
g ∶ Rn × Rn Ð→ Rc1 +c2 , g(x, y) = (g1 (x), g2 (y)),
constraining x and y to lie respectively in the
level sets cut out of Rn by the c1 conditions g1 (x) = 0c1 and the c2 conditions
g2 (y) = 0c2 . Assuming that the Lagrange condition holds for the optimizing
pair, it is
[x − y y − x] = λ g ′ (x, y) = [λ1 −λ2 ] [ g1′ (x)  0c1 ×n ; 0c2 ×n  g2′ (y) ]
= (λ1 g1′ (x), −λ2 g2′ (y)),
where λ1 ∈ Rc1 and λ2 ∈ Rc2 are row vectors. The symmetry of ∇f reduces
this equality of 2n-vectors to an equality of n-vectors,
x − y = λ1 g1′ (x) = λ2 g2′ (y).
That is, either x = y or the line through x and y is normal to the first level
set at x and normal to the second level set at y, generalizing the result from
the two-circle problem. With this result in mind, you may want to revisit
Exercise 0.0.1 from the preface to these notes.
The remaining general Lagrange multiplier methods optimize a linear func-
tion or a quadratic function subject to affine constraints or a quadratic con-
straint. We gather the results in one theorem.
Theorem 5.5.2 (Low-degree optimization with constraints).
(1) Let f (x) = aT x (where a ∈ Rn ) subject to the constraint M x = b (where
M ∈ Mc,n (R) has linearly independent rows, with c < n, and b ∈ Rc ). Check
whether aT M T (M M T )−1 M = aT . If so, then f subject to the constraint is
identically aT M T (M M T )−1 b; otherwise, f subject to the constraint has no
optima.
(2) Let f (x) = xT Ax (where A ∈ Mn (R) is symmetric and invertible) subject to
the constraint M x = b (where M ∈ Mc,n (R) has linearly independent rows,
with c < n, and b ∈ Rc ). The x that optimizes f subject to the constraint
and the optimal value are
x = A−1 M T (M A−1 M T )−1 b, f (x) = bT (M A−1 M T )−1 b,
assuming that the c × c matrix M A−1 M T is invertible.
(3) Let f (x) = aT x (where a ∈ Rn ) subject to the constraint xT M x = b (where M ∈ Mn (R) is symmetric and invertible and b ∈ R is nonzero). If aT M −1 a b > 0 then the optimal values of f subject to the constraint are
f (x) = ±√(aT M −1 a b) .
(4) Let f (x) = xT Ax (where A ∈ Mn (R) is symmetric) subject to the constraint xT M x = b (where M ∈ Mn (R) is symmetric and invertible and b ∈ R is nonzero). The possible optimal values of f subject to the constraint are λb, where λ ranges over the real eigenvalues of M −1 A.

Proof. (1) The Lagrange condition is aT = λT M for some row vector λ ∈ Rc , i.e., aT must be a linear combination of the rows of M ; the displayed check is precisely whether this holds, with λT = aT M T (M M T )−1 .
With aT being a linear combination of the rows of M and with b being a linear
combination of the columns of M , the Lagrange condition and the constraints
immediately show that for every x in the constrained set,
f (x) = aT x = λT M x = λT b = aT M T (M M T )−1 b.
(2) As in (1), we assume that c < n, and we assume that the c rows of M are
linearly independent in Rn , i.e., some c columns of M are a basis of Rc ,
i.e., some c × c subblock of M has nonzero determinant. Thus the constraints
M x = b have solutions x for every b ∈ Rc .
To set up the Lagrange condition, we need to differentiate the quadratic
function f . Compute that
f (x + h) − f (x) = (x + h)T A (x + h) − xT A x = 2 xT A h + hT A h,
and so the best linear approximation of this difference is T (h) = 2xT Ah. It
follows that
∇f (x) = 2xT A.
Returning to the optimization problem, the Lagrange condition and the
constraints are
xT A = λT M where λ ∈ Rc ,
M x = b.
Having solved a particular problem of this sort in Section 5.4, we use its
particular solution to guide our solution of the general problem. The first step is to solve the Lagrange condition for x: since A is invertible,
x = A−1 M T λ.
The constraint M x = b then gives M A−1 M T λ = b, so that λ = (M A−1 M T )−1 b and therefore
x = A−1 M T (M A−1 M T )−1 b, f (x) = xT Ax = λT M x = λT b = bT (M A−1 M T )−1 b.
(3) Next let a linear function be optimized subject to a quadratic constraint,
f ∶ Rn Ð→ R, f (x) = aT x where a ∈ Rn ,
g ∶ Rn Ð→ R, g(x) = xT M x − b where M ∈ Mn (R) is symmetric and b ∈ R is nonzero.
The Lagrange condition and the constraint are
aT = λ xT M where λ ∈ R,
xT M x = b.
By the Lagrange condition and the constraint, the possible optimal values of f take the form f (x) = aT x = λ xT M x = λb, and so to find these values it suffices to find the possible values of λ. Assuming
that M is invertible, the Lagrange condition is aT M −1 = λxT , and hence
aT M −1 a b = λ xT a b = λ2 b2 = f (x)2 ,
so that the optimal values are f (x) = ±√(aT M −1 a b).
(4) Finally, consider a quadratic function subject to a quadratic constraint: f (x) = xT Ax with A ∈ Mn (R) symmetric, and g as in (3). The Lagrange condition and the constraint are
xT A = λ xT M where λ ∈ R,
xT M x = b.
By the Lagrange condition and the constraint, the possible optimal values
of f take the form
f (x) = xT Ax = λxT M x = λb,
which we will know as soon as we find the possible values of λ, without needing
to find x. Assuming that M is invertible, the Lagrange condition gives
M −1 Ax = λx.
That is, setting B = M −1 A, the vector x is a nonzero solution of
(B − λI)x = 0,
and so the matrix B − λI is not invertible:
det(B − λI) = 0.
Conversely, for every λ ∈ R satisfying this equation there is at least one eigen-
vector x of B, because the equation (B − λI)x = 0 has nonzero solutions. And
so the eigenvalues are the real roots of the polynomial
Exercises
5.5.1. Let f (x, y) = y and let g(x, y) = y 3 −x4 . Graph the level set L = {(x, y) ∶
g(x, y) = 0}. Show that the Lagrange multiplier criterion does not find any
candidate points where f is optimized on L. Optimize f on L nonetheless.
5.5.2. (a) Use Theorem 5.5.2, part (1), to optimize the linear function f (x, y, z) =
6x + 9y + 12z subject to the affine constraint g(x, y, z) = (7, 8).
(b) Verify without using the Lagrange multiplier method that the function
f subject to the constraint g = (7, 8) (with f and g from part (a)) is constant,
always taking the value that you found in part (a).
(c) Show that the function f (x, y, z) = 5x + 7y + z cannot be optimized
subject to any constraint g(x, y, z) = b.
5.5.3. (a) Use Theorem 5.5.2, part (2), to minimize the quadratic function
f (x, y) = x2 + y 2 subject to the affine constraint 3x + 5y = 8.
(b) Use the same result to find the extrema of f (x, y, z) = 2xy + z 2 subject
to the constraints x + y + z = 1, x + y − z = 0.
(c) Use the same result to find the nearest point to the origin on the
intersection of the hyperplanes x + y + z − 2w = 1 and x − y + z + w = 2 in R4 ,
reproducing your answer to Exercise 5.4.1.
5.5.5. (a) Use Theorem 5.5.2, part (4), to optimize the function f (x, y) = 2xy
subject to the constraint g(x, y) = 1 where g(x, y) = x2 + 2y 2 .
(b) Use the same result to optimize the function f (x, y, z) = 2(xy +yz +zx)
subject to the constraint g(x, y, z) = 1 where g(x, y, z) = x2 + y 2 − z 2 .
Part II

6

Integration
The integral represents physical ideas such as volume or mass or work, but
defining it properly in purely mathematical terms requires some care. Here is
some terminology that is standard from the calculus of one variable, except
perhaps compact (meaning closed and bounded ) from Section 2.4 of these
notes. The language describes a domain of integration and the machinery to
subdivide it.
A nonempty compact interval in R is a set
I = [a, b] = {x ∈ R ∶ a ≤ x ≤ b} ,
where a and b are real numbers with a ≤ b. The length of the interval is
length(I) = b − a.
A partition of I is a set of points
P = {t0 , t1 , . . . , tk }
satisfying
a = t0 < t1 < ⋯ < tk = b.
Such a partition divides I into k subintervals J1 , . . . , Jk where
Jj = [tj−1 , tj ], j = 1, . . . , k.
(Figure: a partition a = t0 < t1 < t2 < ⋯ < tk−1 < tk = b of I, with subintervals J1 , . . . , Jk .)
Let f ∶ I Ð→ R be a bounded function, and let P be a partition of I. For each subinterval J of P , define
mJ (f ) = inf {f (x) ∶ x ∈ J} ,
MJ (f ) = sup {f (x) ∶ x ∈ J} .
The lower sum and the upper sum of f over P are
L(f, P ) = ∑J mJ (f ) length(J), U (f, P ) = ∑J MJ (f ) length(J),
the sums being taken over the subintervals J of P .
If the interval I in Definition 6.1.3 has length zero, then the lower and
upper sums are empty, and so they are assigned the value 0 by convention.
The function f in Definition 6.1.3 is not required to be differentiable or
even continuous, only bounded. Even so, the values mJ (f ) and MJ (f ) in
the previous definition exist by the set-bound phrasing of the principle that
the real number system is complete. To review this idea, see Theorem 1.1.3.
When f is in fact continuous, the extreme value theorem (Theorem 2.4.15)
justifies substituting min and max for inf and sup in the definitions of mJ (f )
and MJ (f ), since each subinterval J is nonempty and compact. It may be eas-
iest at first to understand mJ (f ) and MJ (f ) by imagining f to be continuous
and mentally substituting appropriately. But we will need to integrate dis-
continuous functions f . Such functions may take no minimum or maximum
on J, and so we may run into a situation like the one pictured in Figure 6.2,
in which the values mJ (f ) and MJ (f ) are not actual outputs of f . Thus the
definition must be as given to make sense.
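To see the definitions in action, here is a small computational sketch (assuming Python with NumPy; not part of the text) that evaluates L(f, P ) and U (f, P ) for f (x) = x(1 − x) on I = [0, 1] under uniform partitions. Because this f is concave, its infimum on a subinterval occurs at an endpoint and its supremum at the vertex x = 1/2 or an endpoint, so mJ (f ) and MJ (f ) are easy to evaluate exactly; as the partition refines, the two sums pinch together around the integral 1/6.

    import numpy as np

    f = lambda x: x*(1 - x)      # concave on [0, 1]

    def lower_upper(k):
        t = np.linspace(0.0, 1.0, k + 1)        # uniform partition of [0, 1]
        L = U = 0.0
        for l, r in zip(t[:-1], t[1:]):
            m = min(f(l), f(r))                 # inf over [l, r]
            M = f(0.5) if l <= 0.5 <= r else max(f(l), f(r))   # sup over [l, r]
            L += m*(r - l)
            U += M*(r - l)
        return L, U

    for k in (4, 16, 64, 256):
        print(k, lower_upper(k))    # both approach 1/6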
The technical properties of inf and sup will figure in Lemmas 6.1.6, 6.1.8,
and 6.2.2. To see them in isolation first, we rehearse them now. So, let S be a nonempty set of real numbers, bounded above and below, and let S ′ be a nonempty subset of S. Then S ′ is bounded as well, and since shrinking a set can only raise its infimum and lower its supremum,
inf(S) ≤ inf(S ′ ) ≤ sup(S ′ ) ≤ sup(S).
In particular, if J ′ ⊂ J are nonempty subintervals, then
mJ (f ) ≤ mJ ′ (f ) ≤ MJ ′ (f ) ≤ MJ (f ).
The Cartesian product of sets S1 , . . . , Sn ⊂ R is
S1 × S2 × ⋯ × Sn = {(s1 , s2 , . . . , sn ) ∶ s1 ∈ S1 , s2 ∈ S2 , . . . , sn ∈ Sn } .
(See Figure 6.4, in which n = 2, and S1 has two components, and S2 has one
component, so that the Cartesian product S1 × S2 has two components.)
A nonempty compact box in Rn is a Cartesian product of nonempty compact intervals,
B = I1 × I2 × ⋯ × In ,
and its volume is the product of the lengths of its sides, vol(B) = length(I1 )⋯length(In ). A partition of B is a Cartesian product of partitions Pj of Ij for j = 1, . . . , n,
P = P1 × P2 × ⋯ × Pn .
Such a partition divides B into subboxes J, each such subbox being a Carte-
sian product of subintervals. By a slight abuse of language, these are called
the subboxes of P .
(See Figure 6.5, and imagine its three-dimensional Rubik’s cube counterpart.)
Every nonempty compact box in Rn has partitions, even such boxes with
some length-zero sides. This point will arise at the very beginning of the next
section.
One partition P ′ = P1′ × ⋯ × Pn′ refines another partition P = P1 × ⋯ × Pn when each Pj′ contains Pj . Figure 6.7 illustrates the fact that if P ′ refines P then every subbox of P ′
is contained in a subbox of P . The literal manifestation in the figure of the
containment P ′ ⊃ P is that the set of points where a horizontal line segment
and a vertical line segment meet in the right side of the figure subsumes the
set of such points in the left side.
Refining a partition brings the lower and upper sums nearer each other: if P ′ refines P then
L(f, P ) ≤ L(f, P ′ ) and U (f, P ′ ) ≤ U (f, P ).
See Figure 6.8 for a picture-proof for lower sums when n = 1, thinking of the sums in terms of area. The formal proof is just a symbolic rendition of the figure’s features: each subbox J of P is the union of the subboxes J ′ of P ′ that lie inside it, and mJ ′ (f ) ≥ mJ (f ) for each such J ′ , so that
∑J ′ ⊂J mJ ′ (f ) vol(J ′ ) ≥ mJ (f ) ∑J ′ ⊂J vol(J ′ ) = mJ (f ) vol(J).
Summing over the subboxes J of P gives L(f, P ′ ) ≥ L(f, P ); the argument for upper sums is symmetric.
Exercises
6.1.1. (a) Let I = [0, 1], let P = {0, 1/2, 1}, let P ′ = {0, 3/8, 5/8, 1}, and let P ′′
be the common refinement of P and P ′ . What are the subintervals of P , and
what are their lengths? Same question for P ′ . Same question for P ′′ .
(b) Let B = I × I, let Q = P × {0, 1/2, 1}, let Q′ = P ′ × {0, 1/2, 1}, and let Q′′ be the common refinement of Q and Q′ . What are the subboxes of Q and what are their areas? Same question for Q′ . Same question for Q′′ .
6.1.2. Show that the lengths of the subintervals of every partition of [a, b]
sum to the length of [a, b]. Same for the areas of the subboxes of [a, b] × [c, d].
Generalize to Rn .
6.1.3. Let J = [0, 1]. Compute mJ (f ) and MJ (f ) for each of the following
functions f ∶ J Ð→ R.
(a) f (x) = x(1 − x),
(b) f (x) = 1 if x is irrational, and f (x) = 1/m if x = n/m in lowest terms, n, m ∈ Z and m > 0,
(c) f (x) = (1 − x) sin(1/x) if x ≠ 0, and f (x) = 0 if x = 0.
6.1.7. Show that the union of partitions of a box B need not be a partition
of B.
The set of lower sums of f over B,
{L(f, P) ∶ P is a partition of B},
is nonempty because such partitions exist (as observed in the previous section), and similarly for the set of upper sums. Proposition 6.1.10 shows that
the set of lower sums is bounded above by every upper sum, and similarly the
set of upper sums is bounded below. Thus the next definition is natural.
The lower integral of f over B is the least upper bound of the lower sums of f over all partitions P,
L ∫_B f = sup {L(f, P) ∶ P is a partition of B}.
Similarly, the upper integral of f over B is the greatest lower bound of the upper sums of f over all partitions P,
U ∫_B f = inf {U(f, P) ∶ P is a partition of B}.
The function f is called integrable over B if the lower and upper integrals
are equal, i.e., if L ∫B f = U ∫B f . In this case, their shared value is called the
integral of f over B and written ∫B f .
sup(L) ≤ inf(U).
Since U is nonempty and has lower bounds, it has a greatest lower bound
inf(U). Each ℓ ∈ L is a lower bound of U, and inf(U) is the greatest lower
bound, so
ℓ ≤ inf(U) for each ℓ ∈ L,
meaning precisely that inf(U) is an upper bound of L.
Since L is nonempty and has an upper bound, it has a least upper bound
sup(L). Since sup(L) is the least upper bound and inf(U) is an upper bound,
sup(L) ≤ inf(U).
Chasing through the definitions shows that for this B and f , every lower
sum is L(f, P ) = 0, so the lower integral is L ∫B f = sup {0} = 0. Similarly,
U ∫B f = 1. Since the upper and lower integrals don’t agree, ∫B f does not
exist.
So the questions are, what functions are integrable, or at least, what are
some general classes of integrable functions, and how does one evaluate their
integrals? Working from the definitions, as in the last example, is a good
exercise in simple cases to get familiar with the machinery, but as a general
procedure it is hopelessly unwieldy. Here is one result that will help us in the
next section to show that continuous functions are integrable.
Proposition 6.2.3 (Integrability criterion). Let B be a box, and let f ∶
B Ð→ R be a bounded function. Then f is integrable over B if and only if for
every ε > 0, there exists a partition P of B such that U (f, P ) − L(f, P ) < ε.
Proof. ( Ô⇒ ) Let f be integrable over B and let ε > 0 be given. Since
∫_B f − ε/2 is less than the least upper bound of the lower sums, it is not an
upper bound of the lower sums, and similarly ∫_B f + ε/2 is not a lower bound
of the upper sums. Thus there exist partitions P and P′ of B such that
L(f, P) > ∫_B f − ε/2 and U(f, P′) < ∫_B f + ε/2.
Let P′′ be the common refinement of P and P′. Since refining a partition raises
lower sums and lowers upper sums,
U(f, P′′) − L(f, P′′) ≤ U(f, P′) − L(f, P) < ε.
( ⇐Ô ) For every partition P of B,
L(f, P) ≤ L ∫_B f ≤ U ∫_B f ≤ U(f, P),
so if U(f, P) − L(f, P) < ε then also
U ∫_B f − L ∫_B f < ε.
Since ε > 0 is arbitrary, the lower and upper integrals are equal, i.e., f is
integrable over B. ⊓⊔
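As a sanity check on the criterion, here is a sketch of ours (not the book's): for the increasing function f(x) = x² on [0, 1] and the uniform partition into k subintervals, the inf and sup on each subinterval are the endpoint values, so U(f, P) − L(f, P) = (f(1) − f(0))/k = 1/k, which beats any given ε once k > 1/ε.

    def gap(k):
        # U(f, P) - L(f, P) for f(x) = x^2 and the uniform k-piece partition
        pts = [i / k for i in range(k + 1)]
        lo = sum((pts[i] ** 2) * (1 / k) for i in range(k))        # L(f, P)
        hi = sum((pts[i + 1] ** 2) * (1 / k) for i in range(k))    # U(f, P)
        return hi - lo

    for k in (10, 100, 1000):
        print(k, gap(k))   # prints 0.1, 0.01, 0.001
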
The proposition is that if P is a partition of B into subboxes J and f is
integrable over B, then f is integrable over each subbox J and
∑_J ∫_J f = ∫_B f.
For any partition P′ of B that refines P, each subbox J of P is divided by P′
into subboxes J′, giving a partition P′_J of J, and the lower sum decomposes as
L(f, P′) = ∑_J L(f, P′_J). Similarly, U(f, P′) = ∑_J U(f, P′_J).
Suppose that f is integrable over B. Let an arbitrary ε > 0 be given. By
“ Ô⇒ ” of the integrability criterion, there exists a partition P ′ of B such
that
U (f, P ′ ) − L(f, P ′ ) < ε.
Since refining a partition cannot increase the difference between the upper and
lower sums, we may replace P ′ by its common refinement with P and thus
assume that P ′ refines P . Therefore the formulas from the previous paragraph
show that
∑_J (U(f, P′_J) − L(f, P′_J)) < ε,
and so
U (f, PJ′ ) − L(f, PJ′ ) < ε for each subbox J of B.
Therefore f is integrable over each subbox J of B by “ ⇐Ô ” of the integra-
bility criterion.
Now assume that f is integrable over B and hence over each subbox J.
Still letting P ′ be any partition of B that refines P , the integral over each
subbox J lies between the corresponding lower and upper sums, and so
L(f, P′) = ∑_J L(f, P′_J) ≤ ∑_J ∫_J f ≤ ∑_J U(f, P′_J) = U(f, P′).
Thus ∑_J ∫_J f is an upper bound of all lower sums L(f, P′) and a lower bound
of all upper sums U(f, P′), giving
L ∫_B f ≤ ∑_J ∫_J f ≤ U ∫_B f.
Since f is integrable over B, the outer terms both equal ∫_B f, and so
∑_J ∫_J f = ∫_B f. ⊓⊔
Similar techniques show that the converse of the proposition holds as well,
so that given B, f , and P , f is integrable over B if and only if f is integrable
over each subbox J, but we do not need this full result. Each of the proposition
and its converse requires both implications of the integrability criterion.
The symbol B denotes a box in the next set of exercises.
Exercises
6.2.4. Granting that every interval of positive length contains both rational
and irrational numbers, fill in the details in the argument that the function
f ∶ [0, 1] Ð→ R with f (x) = 1 for rational x and f (x) = 0 for irrational x is
not integrable over [0, 1].
mJ (f ) + mJ (g) ≤ mJ (f + g) ≤ MJ (f + g) ≤ MJ (f ) + MJ (g).
(b) Part (a) of this exercise obtained comparisons between lower and upper
sums, analogously to the first paragraph of the proof of Proposition 6.2.4.
Argue analogously to the rest of the proof to show that ∫B (f + g) exists and
equals ∫B f + ∫B g. (One way to begin is to use the integrability criterion twice
and then a common refinement to show that there exists a partition P of B
such that U (f, P ) − L(f, P ) < ε/2 and U (g, P ) − L(g, P ) < ε/2.)
(c) Let c ≥ 0 be any constant. Let P be any partition of B. Show that for
every subbox J of P ,
L ∫_B cf = c L ∫_B f and U ∫_B cf = c U ∫_B f.
∫_B cf = c ∫_B f.
∫_B (−f) = − ∫_B f.
Explain why the work so far here in part (d) combines with part (c) to show
that for every c ∈ R (positive, zero, or negative), ∫B cf exists and
∫_B cf = c ∫_B f.
To prove this theorem, as we will at the end of this section, we first need to
sharpen our understanding of continuity on boxes. The version of continuity
that we’re familiar with isn’t strong enough to prove certain theorems, this
one in particular. Formulating the stronger version of continuity requires first
revising the grammar of the familiar brand.
point f (x) in Rm . The idea is that in response, you can draw a ball of some
radius—this is the δ in the definition—about the point x in S such that every
point in the δ-ball about x gets taken by f into the ε-ball about f (x). (See
Figure 6.11.)
[Figure 6.11: f maps the δ-ball about x into the ε-ball about f(x).]
Now ∣f (x̃) − f (x)∣ ≤ (1 + 2∣x∣)∣x̃ − x∣. Next we constrain δ further to make this
estimate less than ε when ∣x̃ − x∣ < δ. Stipulating that δ be at most ε/(1 + 2∣x∣)
does so. Hence the choice of δ in the proof.
To prove instead that the function f ∶ R Ð→ R given by f (x) = x2 is
sequentially continuous on R, again take any x ∈ R. Consider any sequence
{xν } in R converging to x. To show that the sequence {f (xν )} in R converges
to f(x), compute by sequence limit properties that
lim_ν f(x_ν) = lim_ν x_ν² = (lim_ν x_ν)² = x² = f(x).
To show that {f (xν )} converges to f (x) means that given an arbitrary ε > 0,
we need to exhibit a starting index N such that
for all ν > N , ∣f (xν ) − f (x)∣ < ε.
The definition of ε-δ continuity gives a δ such that
if x̃ ∈ S and ∣x̃ − x∣ < δ then ∣f (x̃) − f (x)∣ < ε.
And since {xν } converges in S to x, there is some starting index N such that
for all ν > N , ∣xν − x∣ < δ.
The last two displays combine to imply the first display, showing that f is
sequentially continuous at x.
( Ô⇒ ) Now suppose that f is not ε-δ continuous at x. Then for some ε > 0,
no δ > 0 satisfies the relevant conditions. In particular, δ = 1/ν fails the
conditions for ν = 1, 2, 3, . . . . So there is a sequence {xν } in S such that
∣xν − x∣ < 1/ν and ∣f (xν ) − f (x)∣ ≥ ε, ν = 1, 2, 3, . . . .
The display shows that f is not sequentially continuous at x.
Since the two types of continuity imply each other at each point x of S,
they imply each other on S. ⊓⊔
The fact that the second half of this proof has to proceed by contrapo-
sition, whereas the first half is straightforward, shows that ε-δ continuity is
a little more powerful than sequential continuity on the face of it, until we
do the work of showing that they are equivalent. Also, the very definition
of ε-δ continuity seems harder for students than the definition of sequential
continuity, which is why these notes have used sequential continuity up to
now. However, the exceptionally alert reader may have recognized that the
second half of this proof is essentially identical to the proof of the persistence
of inequality principle (Proposition 2.3.10). Thus, the occasional arguments
in these notes that cited the persistence of inequality were tacitly using ε-δ
continuity already, because sequential continuity was not transparently strong
enough for their purposes. The reader who dislikes redundancy is encouraged
to rewrite the second half of this proof to quote the persistence of inequality
rather than re-prove it.
The reason that we bother with this new ε-δ type of continuity, despite
its equivalence to sequential continuity meaning that it is nothing new, is
that its grammar generalizes to describe the more powerful continuity that
we need. The two examples above of ε-δ continuity differed: in the example
f (x) = x2 , the choice of δ = min{1, ε/(2∣x∣ + 1)} for any given x and ε to
satisfy the definition of ε-δ continuity at x depended not only on ε but on x
as well. In the example f (x) = 2∣x∣, the choice of δ = ε/2 for any given x
and ε depended only on ε, i.e., it was independent of x. Here, one value of δ
works simultaneously at all values of x once ε is specified. This technicality
has enormous consequences.
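The following small experiment (an illustration of ours, not part of the text) makes the contrast visible: the proof's choice δ = min{1, ε/(2∣x∣ + 1)} for f(x) = x² shrinks as ∣x∣ grows yet always works, while for f(x) = 2∣x∣ the single choice δ = ε/2 serves every x at once.

    eps = 0.1
    for x in (1.0, 10.0, 100.0):
        delta = min(1.0, eps / (2 * abs(x) + 1))    # the proof's choice for x^2
        x_near = x + 0.999 * delta                  # a point with |x~ - x| < delta
        print(x, delta, abs(x_near ** 2 - x ** 2) < eps)   # True each time
    # for f(x) = 2|x|, one delta works at every x once eps is fixed:
    print("2|x| needs only delta =", eps / 2)
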
Proof. Suppose that f is not uniformly continuous. Then for some ε > 0 there
exists no suitable uniform δ, and so in particular no reciprocal positive inte-
ger 1/ν will serve as δ in the definition of uniform continuity. Thus for each
ν ∈ Z+ there exist points x_ν and y_ν in K such that
∣x_ν − y_ν∣ < 1/ν but ∣f(x_ν) − f(y_ν)∣ ≥ ε, ν = 1, 2, 3, . . . .   (6.2)
i.e., even if both limits exist then they still cannot be equal. (If they both
exist and they agree then lim(f (xν ) − f (yν )) = 0, but this is incompatible
with the second condition in (6.2), ∣f (xν ) − f (yν )∣ ≥ ε for all ν.) The previous
two displays combine to show that
i.e., at least one of the left sides in the previous display doesn’t match the
corresponding right side or doesn’t exist at all. Thus f is not continuous at p.
⊓⊔
Exercises
6.3.1. Reread the proof that sequential and ε-δ continuity are equivalent; then
redo the proof with the book closed.
6.3.2. Let f ∶ R Ð→ R be the cubing function f (x) = x3 . Give a direct proof
that f is ε-δ continuous on R. (Hint: A3 − B 3 = (A − B)(A2 + AB + B 2 ).)
6.3.3. Here is a proof that the squaring function f (x) = x2 is not uniformly
continuous on R. Suppose that some δ > 0 satisfies the definition of uniform
continuity for ε = 1. Set x = 1/δ and x̃ = 1/δ + δ/2. Then certainly ∣x̃ − x∣ < δ,
but
∣f(x̃) − f(x)∣ = ∣(1/δ + δ/2)² − 1/δ²∣ = ∣1/δ² + 1 + δ²/4 − 1/δ²∣ = 1 + δ²/4 > ε.
This contradicts uniform continuity.
Is the cubing function of the previous exercise uniformly continuous on R?
On [0, 500]?
6.3.4. (a) Show that if I ⊂ R is an interval (possibly all of R), f ∶ I Ð→ R is
differentiable, and there exists a positive constant R such that ∣f ′ (x)∣ ≤ R for
all x ∈ I then f is uniformly continuous on I.
(b) Prove that sine and cosine are uniformly continuous on R.
6.3.5. Let f ∶ [0, +∞) Ð→ R be the square root function f(x) = √x. You may
take for granted that f is ε-δ continuous on [0, +∞).
(a) What does part (a) of the previous problem say about the uniform
continuity of f ?
(b) Is f uniformly continuous?
6.3.6. Let J be a box in Rn with sides of length less than δ/n. Show that all
points x and x̃ in J satisfy ∣x̃ − x∣ < δ.
6.3.7. For ∫B f to exist, it is sufficient that f ∶ B Ð→ R be continuous, but it
is not necessary. What preceding exercise provides an example of this? Here is
another example. Let B = [0, 1] and let f ∶ B Ð→ R be monotonic increasing,
meaning that if x1 < x2 in B then f (x1 ) ≤ f (x2 ). Show that such a function
is bounded, though it need not be continuous. Use the integrability criterion
to show that ∫B f exists.
6.3.8. The natural logarithm is defined as an integral. Let r ∶ R+ Ð→ R be the
reciprocal function, r(x) = 1/x for x > 0. The natural logarithm is
ln ∶ R+ Ð→ R, ln(x) = { ∫_{[1,x]} r if x ≥ 1; −∫_{[x,1]} r if 0 < x < 1 }.
We know that the integrals in the previous display exist, because the reciprocal
function is continuous.
(a) Show that limx→∞ ln x/x = 0 as follows. Let some small ε > 0 be given.
For x > 2/ε, let u(x, ε) denote the sum of the areas of the boxes [1, 2/ε] ×[0, 1]
and [2/ε, x] × [0, ε/2]. Show that u(x, ε) ≥ ln x. (Draw a figure showing the
boxes and the graph of r, and use the words upper sum in your answer.)
Compute lim_{x→∞} u(x, ε)/x (here ε remains fixed), and use your result to show
that u(x, ε)/x < ε for all large enough x. This shows that lim_{x→∞} ln x/x = 0.
(b) Let a > 0 and b > 1 be fixed real numbers. Part (a) shows that
To evaluate the second of the two integrals,
∫_0^9 dx/√(1 + √x),
let u = √(1 + √x). Then some algebra shows that x = (u² − 1)², and so dx =
4(u² − 1)u du. Also, when x = 0, u = 1, and when x = 9, u = 2. Therefore the
integral is
∫_0^9 dx/√(1 + √x) = 4 ∫_1^2 ((u² − 1)u/u) du = 4 ∫_1^2 (u² − 1) du = 4 (u³/3 − u) ∣_1^2 = 16/3.
Although both of these examples use substitution, they differ from each
other in a way that a first calculus course may not explain. The first substitu-
tion involved picking an x-dependent u (i.e., u = ln x) where u′ (x) (i.e., 1/x)
was present in the integral and got absorbed by the substitution. The second
substitution took an opposite form to the first: this time the x-dependent u
was inverted to produce a u-dependent x, and the factor u′ (x) was introduced
into the integral rather than eliminated from it. Somehow, two different things
are going on under the guise of u-substitution.
In this section we specialize our theory of multivariable integration to n = 1
and review two tools for evaluating one-dimensional integrals, the fundamen-
tal theorem of integral calculus (FTIC) and the change of variable theorem.
Writing these down precisely will clarify the examples we just worked. More
importantly, generalizing these results appropriately to n dimensions is the
subject of the remainder of these notes.
The multivariable integral notation of this chapter, specialized to one di-
mension, is ∫[a,b] f . For familiarity, replace this by the usual notation,
∫_a^b f = ∫_{[a,b]} f for a ≤ b.
Once this is done, the same relation between signed integrals holds regardless
of which (if either) of a and b is larger,
∫_a^b f = − ∫_b^a f for all a and b.
With this convention, ∫_a^b describes positive traversal along the real line from
a up to b, while ∫_b^a describes negative
traversal from b down to a. This sort of thing does not obviously generalize
to higher dimensions, because Rn is not ordered.
Casewise inspection shows that for every three points a, b, c ∈ R in any
order, and for every integrable function f ∶ [min{a, b, c}, max{a, b, c}] Ð→ R,
∫_a^c f = ∫_a^b f + ∫_b^c f.
Also, if f ∶ [min{a, b}, max{a, b}] Ð→ R takes the constant value k then
∫_a^b f = k(b − a), regardless of which of a and b is larger.
Proof. Let x and x + h lie in [a, b] with h ≠ 0. Study the difference quotient
(F(x + h) − F(x))/h = (∫_a^{x+h} f − ∫_a^x f)/h = (∫_x^{x+h} f)/h.
Assume that h > 0. Then m_{[x,x+h]}(f) ⋅ h ≤ ∫_x^{x+h} f ≤ M_{[x,x+h]}(f) ⋅ h, and dividing
through by h shows that the difference quotient lies between m_{[x,x+h]}(f) and
M_{[x,x+h]}(f). Thus the difference quotient is forced to f(x) as h goes to 0,
since f is continuous. A similar analysis applies when h < 0.
Alternatively, an argument using the characterizing property of the deriva-
tive and the Landau–Bachmann notation does not require separate cases de-
pending on the sign of h. Compute that
F(x + h) − F(x) = ∫_x^{x+h} f = f(x)h + o(h).
But here the reader needs to believe, or check, the last equality. ⊓⊔
The alert reader will recall the convention in these notes that a mapping
can be differentiable only at an interior point of its domain. In particular,
the derivative of a function F ∶ [a, b] Ð→ R is undefined at a and b. Hence
the statement of Theorem 6.4.1 is inconsistent with our usage, and strictly
speaking the theorem should conclude that F is continuous on [a, b] and
differentiable on (a, b) with derivative F ′ = f . The given proof does show this,
since the existence of the one-sided derivative of F at each endpoint makes F
continuous there.
∫_a^b F′ = F(b) − F(a).
Proof. Define F_2 ∶ [a, b] Ð→ R by F_2(x) = ∫_a^x F′. Then F_2′ = F′ by the pre-
ceding theorem, so (Exercise 6.4.3) there exists a constant c such that for all
x ∈ [a, b],
F2 (x) = F (x) + c. (6.3)
Plug x = a into (6.3) to get 0 = F(a) + c (since F_2(a) = 0), so c = −F(a). Next
plug in x = b to get F_2(b) = F(b) − F(a). Since F_2(b) = ∫_a^b F′ by definition,
the proof is complete. ⊓⊔
One can also prove the fundamental theorem with no reference to Theo-
rem 6.4.1, letting the mean value theorem do all the work instead. Compute
that for every partition P of [a, b], whose points are a = t0 < t1 < ⋯ < tk = b,
F(b) − F(a) = ∑_{i=1}^k (F(t_i) − F(t_{i−1}))   (telescoping sum)
 = ∑_{i=1}^k F′(c_i)(t_i − t_{i−1}) with each c_i ∈ (t_{i−1}, t_i), by the MVT
 ≤ U(F′, P).
Since P is arbitrary, F (b)−F (a) is a lower bound of the upper sums and hence
is at most the upper integral U ∫_a^b F′. Since F′ is continuous, its integral
exists, and so
F(b) − F(a) ≤ ∫_a^b F′.
A symmetric argument with lower sums gives the opposite inequality.
Not all continuous functions have antiderivatives that are readily found, or
even possible to write in an elementary form (for example, try f(x) = e^{−x²} or
f(x) = sin(x²)).
∫_a^b (f ○ φ) ⋅ φ′ = ∫_{φ(a)}^{φ(b)} f.   (6.4)
Proof. Let F be an antiderivative of f. Then by the chain rule and two
applications of the fundamental theorem,
∫_a^b (f ○ φ) ⋅ φ′ = ∫_a^b (F ○ φ)′ = (F ○ φ)(b) − (F ○ φ)(a)
 = F(φ(b)) − F(φ(a)) = ∫_{φ(a)}^{φ(b)} F′ = ∫_{φ(a)}^{φ(b)} f. ⊓⊔
One way to use the theorem is to recognize that the integrand takes the form
g = (f ○ φ) ⋅ φ′, giving the left side of (6.4) for suitable f and φ such that the
right side ∫_{φ(a)}^{φ(b)} f is easier to evaluate. For instance, for the first integral at
the beginning of the section, take
g ∶ R+ Ð→ R, g(x) = (ln x)2 /x.
To evaluate ∫_1^e g, define
φ ∶ R+ Ð→ R, φ(x) = ln x
and
f ∶ R Ð→ R, f (u) = u2 .
Then g = (f ○ φ) ⋅ φ′ , and φ(1) = 0, φ(e) = 1, so by the change of variable
theorem,
∫_1^e g = ∫_1^e (f ○ φ) ⋅ φ′ = ∫_{φ(1)}^{φ(e)} f = ∫_0^1 f.
Since f has antiderivative F where F (u) = u3 /3, the last integral equals F (1)−
F (0) = 1/3 by the FTIC.
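A quick numerical confirmation of this example (a sketch of ours; the composite Simpson's rule helper simpson is our own, not the book's notation): both the original integral and the substituted one come out near 1/3.

    import math

    def simpson(g, a, b, n=1000):
        # composite Simpson's rule on [a, b]; n must be even
        h = (b - a) / n
        s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h)
                              for k in range(1, n))
        return s * h / 3

    print(simpson(lambda x: math.log(x) ** 2 / x, 1.0, math.e))  # about 1/3
    print(simpson(lambda u: u * u, 0.0, 1.0))                    # about 1/3 again
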
The second integral at the beginning of the section was evaluated not by
the change of variable theorem as given, but by a consequence of it:
∫_a^b (f ○ φ) = ∫_{φ(a)}^{φ(b)} f ⋅ (φ⁻¹)′.
Noting where the various elements of the left diagram occur in the forward
substitution formula ∫_a^b (f ○ φ) ⋅ φ′ = ∫_{φ(a)}^{φ(b)} f shows that applying the forward
integrand as g = f ○ φ, giving the left side, and then invert φ and differentiate
the inverse to see whether the right side is easier to evaluate. For instance, for
the second integral ∫_0^9 dx/√(1 + √x) at the beginning of the section, define
φ ∶ R≥0 Ð→ R≥1, φ(x) = √(1 + √x)
and
f ∶ R≥1 Ð→ R, f (u) = 1/u.
Then the integral is
∫_0^9 dx/√(1 + √x) = ∫_0^9 (f ○ φ).
Let
u = φ(x) = √(1 + √x).
Then a little algebra gives x = φ⁻¹(u) = (u² − 1)², so that
(φ−1 )′ (u) = 4u(u2 − 1).
Since φ(0) = 1 and φ(9) = 2, the integral becomes
∫_0^9 dx/√(1 + √x) = ∫_0^9 (f ○ φ) = ∫_1^2 f ⋅ (φ⁻¹)′ = 4 ∫_1^2 (u(u² − 1)/u) du,
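The inverse substitution can be confirmed numerically too (again a sketch of ours, reusing a composite Simpson's rule of our own): the original integral and the transformed one both come out near 16/3.

    import math

    def simpson(g, a, b, n=2000):
        h = (b - a) / n
        s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h)
                              for k in range(1, n))
        return s * h / 3

    print(simpson(lambda x: 1 / math.sqrt(1 + math.sqrt(x)), 0.0, 9.0))  # about 16/3
    print(simpson(lambda u: 4 * (u * u - 1), 1.0, 2.0))                  # 16/3 exactly
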
At the theoretical level, where we deal with functions as functions, this extra notation is useless
and cumbersome, but in any down-to-earth example it is in fact a convenience
because describing functions by formulas is easier and more direct than intro-
ducing new symbols to name them.
The second, more serious, objection to the variable-based notation is to
the dx, the du, and mysterious relations such as du = dx/x between them.
What kind of objects are dx and du? In a first calculus course they are typi-
cally described as infinitesimally small changes in x and u, but our theory of
integration is not based on such hazy notions; in fact, it was created in the
nineteenth century to answer objections to their validity. (Though infinitesi-
mals were revived and put on a firm footing in the 1960s, we have no business
with them here.) An alternative is to view dx and du as formal symbols that
serve, along with the integral sign ∫ , as bookends around the expression for the
function being integrated. This viewpoint leaves notation such as du = dx/x
still meaningless in its own right. In a first calculus course it may be taught
as a procedure with no real justification, whereas by contrast, the revisited
versions of the two integral-calculations of this section are visibly applications
of results that have been proved. However, the classical method is probably
easier for most of us, its notational conventions dovetailing with the change
of variable theorem and its corollary so well. So feel free to continue using it.
(And remember to switch the limits of integration when you do.)
Exercises
6.4.1. (a) Show that for three points a, b, c ∈ R in any order, and every inte-
grable function f ∶ [min{a, b, c}, max{a, b, c}] Ð→ R, ∫_a^c f = ∫_a^b f + ∫_b^c f.
(b) Show that if f ∶ [min{a, b}, max{a, b}] Ð→ R takes the constant value k
then ∫_a^b f = k(b − a), regardless of which of a and b is larger.
6.4.2. Complete the proof of Theorem 6.4.1 by analyzing the case h < 0.
6.4.3. Show that if F1 , F2 ∶ [a, b] Ð→ R are differentiable and F1′ = F2′ , then
F1 = F2 + C for some constant C. This result was used in this section to
prove the fundamental theorem of calculus (Theorem 6.4.2), so do not use
that theorem to address this exercise. However, this exercise does require a
theorem. Reducing to the case F2 = 0, as in the comment in Exercise 6.2.7,
will make this exercise a bit tidier.
6.4.5. Let f ∶ [0, 1] Ð→ R be continuous and suppose that for all x ∈ [0, 1],
∫_0^x f = ∫_x^1 f. What is f?
6.4.6. Find all differentiable functions f ∶ R≥0 Ð→ R such that for all x ∈ R≥0 ,
(f(x))² = ∫_0^x f.
Figure 6.16. Some type I subboxes of the partition, and for an arc in R3
The idea is that the graph of the function ϕ in the proposition will describe
some of the points of discontinuity of a different function f that we want to
integrate. Thus the dimension m in the proposition is typically n − 1, where
the function f that we want to integrate has n-dimensional input.
Now take a partition P of B whose subboxes J have sides of length less than
δ/m, so that if two points are in a common subbox J then the distance between
them is less than δ. Consider the partition P × Q of B × I. For each subbox J
of P there exist at most two subboxes J × K of P × Q over J that intersect the
graph of ϕ, i.e., subboxes of type I. To see this, note that if we have three or
more such subboxes, then some pair J ×K and J ×K ′ are not vertical neighbors,
and so every hypothetical pair of points of the graph, one in each subbox, are
less than distance δ apart horizontally but at least distance ε′ apart vertically.
But by (6.5), this is impossible. (See Figure 6.17. The horizontal direction in
the figure is only a schematic of the m-dimensional box B, but the vertical
direction accurately depicts the one-dimensional codomain of ϕ.)
Now, working with subboxes J × K of P × Q, compute that
Figure 6.17. The graph meets at most two boxes over each base
[Figure 6.18: a nonbox region of integration, bounded by y = 2π − x², x = sin(y), x = 2, and y = 0.]
∑_{J ∶ type I} ∑_{J′ ⊂ J} (M_{J′}(f) − m_{J′}(f)) vol(J′)
 ≤ 2R ∑_{J ∶ type I} ∑_{J′ ⊂ J} vol(J′) = 2R ∑_{J ∶ type I} vol(J) < 2R ⋅ ε/(4R) = ε/2,   (6.6)
and
∑_{J ∶ type II} ∑_{J′ ⊂ J} (M_{J′}(f) − m_{J′}(f)) vol(J′)
 ≤ ∑_{J ∶ type II} (U(f, P′_J) − L(f, P′_J)) < N ⋅ ε/(2N) = ε/2.   (6.7)
Finally, combining (6.6) and (6.7) shows that U (f, P ′ ) − L(f, P ′ ) < ε, and so
by ( ⇐Ô ) of the integrability criterion, ∫_B f exists. ⊓⊔
To recapitulate the argument: The fact that f is bounded means that its
small set of discontinuities can’t cause much difference between lower and up-
per sums, and the continuity of f on the rest of its domain poses no obstacle
to integrability either. The only difficulty was making the ideas fit into our
box-counting definition of the integral. The reader could well object that prov-
ing Theorem 6.5.4 shouldn’t have to be this complicated. Indeed, the theory
of integration being presented here, Riemann integration, involves laborious
proofs precisely because it uses such crude technology: finite sums over boxes.
More powerful theories of integration exist, with stronger theorems and more
graceful arguments. However, those theories also entail the startup cost of as-
similating a larger, more abstract set of working ideas, making them difficult
to present as quickly as Riemann integration.
Now we can discuss integration over nonboxes.
Definition 6.5.5 (Known-integrable function). A function
f ∶ K Ð→ R
is known-integrable if K is a compact subset of Rn having boundary of
volume zero, and if f is bounded on K and is continuous on all of K except
possibly a subset of volume zero.
For example, let K = {(x, y) ∶ ∣(x, y)∣ ≤ 1} be the closed unit disk in R², and
define
f ∶ K Ð→ R, f(x, y) = { 1 if x ≥ 0; −1 if x < 0 }.
To see that this function is known-integrable, note that the boundary of K
is the union of the upper and lower unit semicircles, which are graphs of
continuous functions on the same 1-dimensional box,
ϕ± ∶ [−1, 1] Ð→ R, ϕ±(x) = ±√(1 − x²).
Thus the boundary of K has area zero. Furthermore, f is bounded on K, and
f is continuous on all of K except the vertical interval {0} × [−1, 1], which has
area zero by the 2-dimensional box area formula.
Definition 6.5.6 (Integral over a nonbox). Let
f ∶ K Ð→ R
be a known-integrable function. Extend its domain to Rn by defining a new
function
f̃ ∶ Rⁿ Ð→ R, f̃(x) = { f(x) if x ∈ K; 0 if x ∉ K }.
Then the integral of f over K is
∫_K f = ∫_B f̃, where B is any box containing K.
For the example just before the definition, the extended function is
f̃ ∶ R² Ð→ R, f̃(x, y) = { 1 if ∣(x, y)∣ ≤ 1 and x ≥ 0; −1 if ∣(x, y)∣ ≤ 1 and x < 0; 0 if ∣(x, y)∣ > 1 },
and to integrate the original function over the disk, we integrate the extended
function over the box B = [−1, 1] × [−1, 1].
Returning to generality, the integral on the right side of the equality in the
definition exists because f˜ is bounded and discontinuous on a set of volume
zero, as required for Theorem 6.5.4. In particular, the definition of volume is
now, sensibly enough,
vol(K) = ∫_K 1.
Naturally, the result of Proposition 6.2.4, that the integral over the whole
is the sum of the integrals over the pieces, is not particular to boxes and
subboxes.
Proof. Define
f_1 ∶ K Ð→ R, f_1(x) = { f(x) if x ∈ K_1; 0 otherwise },
and define f_2 analogously with K_2 in place of K_1. Then
∫_{K_1} f_1 + ∫_{K_2} f_2 = ∫_K f_1 + ∫_K f_2 = ∫_K (f_1 + f_2).
Exercises
6.5.3. Let B ⊂ Rn be a box. Show that its volume under Definition 6.5.1
equals its volume under Definition 6.1.4. (Hint: Exercise 6.2.3.)
6.5.4. Let S be the set of rational numbers in [0, 1]. Show that under Defini-
tion 6.5.1, the volume (i.e., length) of S does not exist.
6.5.7. Prove that if S1 and S2 have volume zero, then so does S1 ∪ S2 . (Hint:
χS1 ∪S2 ≤ χS1 + χS2 .)
6.5.8. Find an unbounded set with nonempty boundary, and a bounded set
with empty boundary.
6.5.9. Review Figure 6.18 and its discussion in this section. Also review the
example that begins after Definition 6.5.5 and continues after Definition 6.5.6.
Similarly, use results from this section such as Theorem 6.5.4 and Proposi-
tion 6.5.3 to explain why for each set K and function f ∶ K Ð→ R below, the
integral ∫K f exists. Draw a picture each time, taking n = 3 for the picture in
part (f).
(a) K = {(x, y) ∶ 2 ≤ y ≤ 3, 0 ≤ x ≤ 1 + ln y/y}, f (x, y) = exy .
(b) K = {(x, y) ∶ 1 ≤ x ≤ 4, 1 ≤ y ≤ √x}, f(x, y) = e^{x/y²}/y⁵.
∫_{x_1=a_1}^{b_1} ∫_{x_2=a_2}^{b_2} ⋯ ∫_{x_n=a_n}^{b_n} f(x_1, x_2, . . . , x_n),
each inner integral over y is being taken over a segment of x-dependent length
as the outer variable x varies from 0 to π. (See Figure 6.21.)
Fubini’s theorem says that under suitable conditions, the n-dimensional
integral is equal to the n-fold iterated integral. The theorem thus provides an
essential calculational tool for multivariable integration.
Theorem 6.6.1 (Fubini’s theorem). Let B = [a, b] × [c, d] ⊂ R2 , and let
f ∶ B Ð→ R be bounded, and continuous except on a subset S ⊂ B of area zero,
so ∫_B f exists. Suppose that for each x ∈ [a, b], the cross-sectional integral
∫_{y=c}^d f(x, y) exists; this happens, for example, if the cross-sectional function
ϕ_x ∶ [c, d] Ð→ R, ϕ_x(y) = f(x, y), is continuous. Then the iterated integral
exists as well, and
∫_B f = ∫_{x=a}^b ∫_{y=c}^d f(x, y).
[Figure 6.21: the triangular region 0 ≤ y ≤ x, 0 ≤ x ≤ π.]
However, since the multiple integral and the iterated integral are defined
analytically as limits of sums, our only available method for proving the the-
orem is analytic: we must compare approximating sums for the two integrals.
We now discuss the ideas before giving the actual proof. A lower sum for the
integral ∫B f is shown geometrically on the left side of Figure 6.23. A partition
P × Q divides the box B = [a, b] × [c, d] into subboxes I × J, and the volume
of each solid region in the figure is the area of a subbox times the minimum
height of the graph over the subbox. By contrast, letting g(x) = ∫_{y=c}^d f(x, y)
be the area of the cross section at x, the right side of Figure 6.23 shows a lower
sum for the integral ∫_{x=a}^b g(x). The partition P divides the interval [a, b] into
subintervals I, and the volume of each bread-slice in the figure is the length
of a subinterval times the minimum area of the cross sections orthogonal to I.
The proof will show that because integrating in the y-direction is a finer di-
agnostic than summing minimal box-areas in the y-direction, the bread-slices
on the right side of the figure are a superset of the boxes on the left side.
Consequently, the volume beneath the bread-slices is at least the volume of
the boxes,
L(f, P × Q) ≤ L(g, P ).
By similar reasoning for upper sums, in fact we expect that
L(f, P × Q) ≤ L(g, P) ≤ U(g, P) ≤ U(f, P × Q).   (6.8)
Since L(f, P ×Q) and U (f, P ×Q) converge to ∫B f under a suitable refinement
of P × Q, so do L(g, P ) and U (g, P ). Thus the iterated integral exists and
equals the double integral as desired. The details of turning the geometric
intuition of this paragraph into a proof of Fubini’s theorem work out fine,
provided that we carefully tend to matters in just the right order. However,
the need for care is genuine. A subtle point not illustrated by Figure 6.23 is
that
• although the boxes lie entirely beneath the bread-slices (this is a relation
between two sets),
• and although the boxes lie entirely beneath the graph (so is this),
• and although the volume of the bread-slices is at most the volume beneath
the graph (but this is a relation between two numbers),
• the bread-slices need not lie entirely beneath the graph.
Since the bread-slices need not lie entirely beneath the graph, the fact that
their volume L(g, P ) estimates the integral ∫B f from below does not follow
from pointwise considerations. The proof finesses this point by establishing
the inequalities (6.8) without reference to the integral, only then bringing the
integral into play as the limit of the extremal sums in (6.8).
g ∶ [a, b] Ð→ R, g(x) = ∫_c^d ϕ_x.
The iterated integral ∫_{x=a}^b ∫_{y=c}^d f(x, y) is precisely the integral ∫_a^b g. We need
mJ×K (f ) ≤ mK (ϕx ).
The previous two displays combine to give a lower bound for the cross-
sectional integral g(x), the lower bound making reference to the interval J on
which x lies but independent of the particular point x of J,
∑_K m_{J×K}(f) length(K) ≤ g(x) for all x ∈ J.
That is, the left side of this last display is a lower bound of all values g(x)
as x varies through J. So it is at most the greatest lower bound,
∑_K m_{J×K}(f) length(K) ≤ m_J(g).
(This inequality says that each y-directional row of boxes in the left half of
Figure 6.23 has at most the volume of the corresponding bread-slice in the
right half of the figure.) As noted at the end of the preceding paragraph, the
iterated integral is the integral of g. The estimate just obtained puts us in
a position to compare lower sums for the double integral and the iterated
integral,
L(f, P × Q) = ∑_J ∑_K m_{J×K}(f) vol(J × K) ≤ ∑_J m_J(g) length(J) = L(g, P).
Concatenating a virtually identical argument with upper sums gives the an-
ticipated chain of inequalities,
L(f, P × Q) ≤ L(g, P) ≤ U(g, P) ≤ U(f, P × Q).
Since we will use Fubini’s theorem to evaluate actual examples, all the
notational issues discussed in Section 6.4 arise here again. A typical notation
for examples is
∫_B f(x, y) = ∫_{x=a}^b ∫_{y=c}^d f(x, y),
where the left side is a 2-dimensional integral, the right side is an iterated
integral, and f (x, y) is an expression defining f . For example, by Fubini’s
theorem and the calculation at the beginning of this section,
∫_{[0,1]×[0,2]} xy² = ∫_{x=0}^1 ∫_{y=0}^2 xy² = 4/3.
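Fubini's theorem invites a direct numerical check (a sketch of ours): iterating one-dimensional Simpson quadrature in either order over [0, 1] × [0, 2] reproduces 4/3 (in fact exactly, since Simpson's rule is exact on cubics).

    def simpson(g, a, b, n=200):
        h = (b - a) / n
        return (g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h)
                                  for k in range(1, n))) * h / 3

    f = lambda x, y: x * y * y
    yx = simpson(lambda x: simpson(lambda y: f(x, y), 0, 2), 0, 1)  # y inner
    xy = simpson(lambda y: simpson(lambda x: f(x, y), 0, 1), 0, 2)  # x inner
    print(yx, xy)   # both 4/3
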
Of course, an analogous theorem asserts that ∫_B f(x, y) = ∫_{y=c}^d ∫_{x=a}^b f(x, y).
If the region of integration lies between the graphs of two functions ϕ_1, ϕ_2
of x, then one iterated integral takes the form ∫_{x=a}^b ∫_{y=ϕ_1(x)}^{ϕ_2(x)} f(x, y). Similarly,
if the region lies between the graphs of two functions θ_1, θ_2 of y, then the
other iterated integral is ∫_{y=c}^d ∫_{x=θ_1(y)}^{θ_2(y)} f(x, y). (See Figure 6.24.)
[Figure 6.24: a region bounded below and above by y = ϕ1(x) and y = ϕ2(x), equivalently left and right by x = θ1(y) and x = θ2(y).]
For example, the iterated integral
∫_{y=0}^2 ∫_{x=y/2}^1 e^{−x²}
looks daunting because the integrand e^{−x²} has no convenient antiderivative,
but after exchanging the order of the integrations and then carrying out a
change of variable, it becomes
∫_{x=0}^1 ∫_{y=0}^{2x} e^{−x²} = ∫_{x=0}^1 2x e^{−x²} = ∫_{u=0}^1 e^{−u} = 1 − e^{−1}.
Interchanging the order of integration can be tricky in such cases; often one
has to break K up into several pieces first, e.g.,
∫_{x=1}^2 ∫_{y=1/x}^2 = ∫_{y=1/2}^1 ∫_{x=1/y}^2 + ∫_{y=1}^2 ∫_{x=1}^2.
A carefully labeled diagram facilitates this process. For example, Figure 6.25
shows the sketch that arises from the integral on the left side, and then the
resulting sketch that leads to the sum of two integrals on the right side.
Interchanging the outer two integrals in a triply iterated integral is no dif-
ferent from the double case, but interchanging the inner two is tricky, because
of the constant-but-unknown value taken by the outer variable. Sketching a
generic two-dimensional cross section usually makes the substitutions clear.
For example, consider the iterated integral
∫_{x=0}^1 ∫_{y=x³}^{x²} ∫_{z=y}^{x²}.   (6.9)
[Figure 6.25: the region 1 ≤ x ≤ 2, 1/x ≤ y ≤ 2, sketched for each order of integration.]
On the other hand, to exchange the inner integrals of (6.9), think of x as fixed
but generic between 0 and 1 and consider the second diagram in Figure 6.26.
This diagram shows that (6.9) is also the iterated integral
∫_{x=0}^1 ∫_{z=x³}^{x²} ∫_{y=x³}^{z}.   (6.10)
Switching the outermost and innermost integrals of (6.9) while leaving the
middle one in place requires three successive switches of adjacent integrals.
For instance, switching the inner integrals as we just did and then doing an
outer exchange on (6.10) virtually identical to the outer exchange of a moment
earlier (substitute z for y in the first diagram of Figure 6.26) shows that (6.9)
is also
∫_{z=0}^1 ∫_{x=√z}^{∛z} ∫_{y=x³}^{z}.
Finally, the first diagram of Figure 6.27 shows how to exchange the inner
integrals once more. The result is
∫_{z=0}^1 ∫_{y=z^{3/2}}^{z} ∫_{x=√z}^{∛y}.
The second diagram of Figure 6.27 shows the three-dimensional figure that our
iterated integral has traversed in various fashions. It is satisfying to see how
the different iterated integrals sweep out the same solid.
[Figures 6.26 and 6.27: generic cross sections bounded by y = x³ and y = x² (equivalently x = ∛y and x = √y), by z = x² and z = y, and by y = z^{3/2}, y = z, x = √z, x = ∛y; and the solid that the iterated integrals traverse.]
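All these reorderings can be checked numerically (a sketch of ours, with an arbitrary continuous integrand — here the constant 1, so the common value is the volume of the solid): the first and last orders of integration of (6.9) agree.

    def simpson(g, a, b, n=60):
        if b <= a:
            return 0.0
        h = (b - a) / n
        return (g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h)
                                  for k in range(1, n))) * h / 3

    f = lambda x, y, z: 1.0   # any continuous integrand works here

    # the order of (6.9): x outer, then y, then z
    v1 = simpson(lambda x: simpson(lambda y: simpson(
             lambda z: f(x, y, z), y, x * x), x ** 3, x * x), 0, 1)
    # the final order: z outer, then y, then x
    v2 = simpson(lambda z: simpson(lambda y: simpson(
             lambda x: f(x, y, z), z ** 0.5, y ** (1 / 3)), z ** 1.5, z), 0, 1)
    print(v1, v2)   # the two values agree
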
Consider the simplex
S = {(x, y, z) ∶ x ≥ 0, y ≥ 0, z ≥ 0, x + y + z ≤ 1}.
The centroid of S is (x̄, ȳ, z̄), where
x̄ = ∫_S x / vol(S), ȳ = ∫_S y / vol(S), z̄ = ∫_S z / vol(S).
Fubini’s theorem lets us treat the integrals as iterated, giving
∫_S x = ∫_{x=0}^1 ∫_{y=0}^{1−x} ∫_{z=0}^{1−x−y} x
 = ∫_{x=0}^1 ∫_{y=0}^{1−x} x(1 − x − y)
 = ∫_{x=0}^1 (1/2) x(1 − x)² = 1/24,
where the routine one-variable calculations are not shown in detail. Similarly,
vol(S) = ∫_S 1 works out to 1/6, so x̄ = 1/4. By symmetry, ȳ = z̄ = 1/4 also. See
the exercises for an n-dimensional generalization of this result.
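The centroid computation can be confirmed numerically (our own sketch, using nested Simpson quadrature): the moment ∫_S x comes out near 1/24, the volume near 1/6, and their ratio near 1/4.

    def simpson(g, a, b, n=60):
        if b <= a:
            return 0.0
        h = (b - a) / n
        return (g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h)
                                  for k in range(1, n))) * h / 3

    def triple_over_simplex(f):
        return simpson(lambda x: simpson(lambda y: simpson(
            lambda z: f(x, y, z), 0, 1 - x - y), 0, 1 - x), 0, 1)

    vol = triple_over_simplex(lambda x, y, z: 1.0)   # about 1/6
    x_moment = triple_over_simplex(lambda x, y, z: x)  # about 1/24
    print(vol, x_moment, x_moment / vol)               # ratio about 1/4
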
For another example, the region between the paraboloids z = x² + 3y² and
z = 8 − x² − y² projects onto the (x, y)-plane as the region bounded by the
ellipse {(x, y) ∶ (x/2)² + (y/√2)² = 1}. (See Figure 6.29.) By Fubini's theorem
the volume is
V = ∫_{y=−√2}^{√2} ∫_{x=−√(4−2y²)}^{√(4−2y²)} ∫_{z=x²+3y²}^{8−x²−y²} 1 = 8π√2.
Similarly, the volume of the region common to the two cylinders x² + y² ≤ 1
and x² + z² ≤ 1 is
∫_{x=−1}^1 ∫_{y=−√(1−x²)}^{√(1−x²)} ∫_{z=−√(1−x²)}^{√(1−x²)} 1 = 4 ∫_{x=−1}^1 (1 − x²) = 16/3.
(d/dx) ∫_{y=c}^d f(x, y) = ∫_{y=c}^d (∂f/∂x)(x, y).
Proof. Compute for x ∈ [a, b], using the fundamental theorem of integral cal-
culus (Theorem 6.4.2) for the second equality and then Fubini’s theorem for
the fourth,
g(x) = ∫_{y=c}^d f(x, y) = ⋯ = ∫_{t=a}^x ∫_{y=c}^d D_1 f(t, y) + C.
We show that ∫_{y=c}^d D_1 f(t, y) is a continuous function of t. Fix t, and let ε > 0
This proves the claimed continuity. Now Theorem 6.4.1 says that the deriva-
tive of the iterated integral is the inner integral evaluated at t = x,
g′(x) = ∫_{y=c}^d D_1 f(x, y). ⊓⊔
Exercises
6.6.1. Let S be the set of points (x, y) ∈ R2 between the x-axis and the
sine curve as x varies between 0 and 2π. Since the sine curve has two arches
between 0 and 2π, and since the area of an arch of the sine function is 2,
∫_S 1 = 4.
6.6.4. Exchange the inner order of integration in ∫_{x=0}^1 ∫_{y=0}^1 ∫_{z=0}^{x²+y²} f. Sketch
the region of integration.
6.6.5. Evaluate ∫K f from parts (a), (b), (c), (f) of Exercise 6.5.9, except
change K to [0, 1]n for part (f).
6.6.6. Find the volume of the region K bounded by the coordinate planes,
x + y = 1, and z = x2 + y 2 . Sketch K.
6.6.8. Find the volume of the region K in the first octant bounded by x = 0,
z = 0, z = y, and x = 4 − y 2 . Sketch K.
have equal (n − 1)-dimensional volumes. Show that K and L have the same
volume. (Hint: Use Fubini’s theorem to decompose the n-dimensional volume-
integral as the iteration of a 1-dimensional integral of (n − 1)-dimensional
integrals.) Illustrate for n = 2.
(Use induction. The base case n = 1 is easy; then the induction hypothesis
applies to the inner (n − 1)-fold integral.)
(b) Prove that vol(S1 (r)) = r. Use part (a) and Fubini’s theorem (cf. the
hint to Exercise 6.6.11) to prove that
Work this integral by substitution or by parts to get ∫_{S_n(r)} x_n = r^{n+1}/(n + 1)!.
(d) The centroid of S_n(r) is (x̄_1, . . . , x̄_n), where x̄_j = ∫_{S_n(r)} x_j / vol(S_n(r))
for each j. What are these coordinates explicitly? (Make sure your answer
agrees with the case in the text.)
A point p = (x, y) ∈ R² away from the origin is also described by its polar
coordinates (r, θ), where r is the distance from the origin to p and θ is the
angle from the positive x-axis to p, so that
x = r cos θ, y = r sin θ.   (6.11)
(See Figure 6.31.)
Also, tan θ = y/x provided that x ≠ 0, but this doesn’t mean that θ =
arctan(y/x). Indeed, arctan isn’t even a well-defined function until its range
is specified, e.g., as (−π/2, π/2). With this particular restriction, the actual
formula for θ, even given that not both x and y are 0, is not arctan(y/x), but
θ = { arctan(y/x) if x > 0 and y ≥ 0 (this lies in [0, π/2)),
  π/2 if x = 0 and y > 0,
  arctan(y/x) + π if x < 0 (this lies in (π/2, 3π/2)),
  3π/2 if x = 0 and y < 0,
  arctan(y/x) + 2π if x > 0 and y < 0 (this lies in (3π/2, 2π)) }.
The formula is unwieldy, to say the least. (The author probably would not
read through the whole thing if he were instead a reader. In any case, see
Figure 6.32.) A better approach is that given (x, y), the polar radius r is the
unique nonnegative number such that
r² = x² + y²,
and then, if r ≠ 0, the polar angle θ is the unique number in [0, 2π) such
that (6.11) holds. But still, going from polar coordinates (r, θ) to Cartesian
coordinates (x, y) as in (6.11) is considerably more convenient than conversely.
This is good, since as we will see, doing so is also more natural.
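In computational practice this better approach is exactly what the two-argument arctangent provides. The following sketch (ours — the helper polar is hypothetical, not the book's) shifts math.atan2's output from (−π, π] into [0, 2π) to match the convention above.

    import math

    def polar(x, y):
        # r is the nonnegative square root of x^2 + y^2; math.atan2 returns
        # the angle in (-pi, pi], which the modulus shifts into [0, 2*pi).
        r = math.hypot(x, y)
        theta = math.atan2(y, x) % (2 * math.pi)
        return r, theta

    print(polar(1, 1))    # (sqrt(2), pi/4)
    print(polar(-1, 0))   # (1, pi)
    print(polar(1, -1))   # (sqrt(2), 7*pi/4), in (3*pi/2, 2*pi) as tabulated
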
The change of variable mapping from polar to Cartesian coordinates is
Φ ∶ R≥0 × [0, 2π] Ð→ R², Φ(r, θ) = (r cos θ, r sin θ).
The mapping is injective except that the half-lines R≥0 × {0} and R≥0 × {2π}
both map to the nonnegative x-axis, and the vertical segment {0} × [0, 2π] is
squashed to the point (0, 0). Each horizontal half-line R≥0 × {θ} maps to the
ray of angle θ with the positive x-axis, and each vertical segment {r} × [0, 2π]
maps to the circle of radius r. (See Figure 6.33.)
It follows that regions in the (x, y)-plane defined by radial or angular con-
straints are images under Φ of (r, θ)-regions defined by rectangular constraints.
For example, the Cartesian disk
D_b = {(x, y) ∶ x² + y² ≤ b²}
is the image under Φ of the (r, θ) rectangle
R_b = {(r, θ) ∶ 0 ≤ r ≤ b, 0 ≤ θ ≤ 2π}.
(See Figure 6.34.) Similarly, the Cartesian annulus and quarter disk
A_{a,b} = {(x, y) ∶ a² ≤ x² + y² ≤ b²},
Q_b = {(x, y) ∶ x ≥ 0, y ≥ 0, x² + y² ≤ b²},
are the images under Φ of corresponding rectangles. (See Figures 6.35 and 6.36.)
Figure 6.36. Rectangle to quarter disk under the polar coordinate mapping
These tidy (r, θ) limits describe the (x, y) annulus Aa,b indirectly via Φ, while
the more direct approach of an (x, y)-iterated integral over Aa,b requires four
messy pieces,
∫_{x=−b}^{−a} ∫_{y=−√(b²−x²)}^{√(b²−x²)} + ∫_{x=−a}^{a} [∫_{y=−√(b²−x²)}^{−√(a²−x²)} + ∫_{y=√(a²−x²)}^{√(b²−x²)}] + ∫_{x=a}^{b} ∫_{y=−√(b²−x²)}^{√(b²−x²)}.
Φ′ = [D_j Φ_i]_{i,j=1,...,n}.
Φ ∶ A Ð→ Rⁿ. Let
f ∶ Φ(K) Ð→ R
be a continuous function. Then
∫_{Φ(K)} f = ∫_K (f ○ Φ) ⋅ ∣det Φ′∣.
This section will end with a heuristic argument to support Theorem 6.7.1,
and then Section 6.9 will prove the theorem after some preliminaries in Sec-
tion 6.8. In particular, Section 6.8 will explain why the left side integral in
the theorem exists. (The right-side integral exists because the integrand is
For example, let f(x, y) = x² + y², to be integrated over the annulus
A_{a,b} = Φ([a, b] × [0, 2π]). Then
(f ○ Φ)(r, θ) = r²,
and the derivative matrix of the polar coordinate mapping is
Φ′ = [ cos θ  −r sin θ
       sin θ   r cos θ ],
so ∣det Φ′∣ = r, and the change of variable theorem gives
∫_{A_{a,b}} f = ∫_{[a,b]×[0,2π]} r² ⋅ r.
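Here is a numerical sanity check of the two sides of the theorem for this example (a sketch of ours): the right side is a midpoint sum of r³ over the box, and the left side a midpoint sum of x² + y² over a Cartesian grid, kept only inside the annulus (the extended-function device of Section 6.5).

    import math

    a, b, N = 1.0, 2.0, 400
    dr = (b - a) / N
    polar_side = 2 * math.pi * sum((a + (i + 0.5) * dr) ** 3 * dr
                                   for i in range(N))
    h = 2 * b / N
    cart_side = 0.0
    for i in range(N):
        for j in range(N):
            x, y = -b + (i + 0.5) * h, -b + (j + 0.5) * h
            if a * a <= x * x + y * y <= b * b:
                cart_side += (x * x + y * y) * h * h
    print(polar_side, cart_side)   # both approximate pi*(b^4 - a^4)/2
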
For another example, the y-coordinate of the centroid of the half disk
H_b = Φ([0, b] × [0, π]) is
ȳ = (∫_{H_b} y)/area(H_b) = (∫_{θ=0}^π ∫_{r=0}^b r sin θ ⋅ r)/(πb²/2) = (2b³/3)/(πb²/2) = (4/(3π)) b.
Indeed, 4/(3π) is somewhat less than 1/2, in conformance with our physical
intuition of the centroid of a region as its balancing point.
Subtle aspects of Theorem 6.7.1 were in play for the previous two examples.
The polar change of coordinate mapping Φ(r, θ) isn’t injective on all of the
312 6 Integration
box [a, b] × [0, 2π] that parametrizes the annulus: the 2π-periodic behavior
of Φ as a function of θ maps the top and bottom edges of the box to the
same segment [a, b] of the x-axis. Furthermore, on the box [0, b] × [0, π] that
parametrizes the half disk, not only does Φ collapse the left edge of the box to
the origin in the (x, y)-plane, but also det Φ′ = 0 on the left edge of the box.
Thus we really do require the theorem’s hypotheses that Φ need be injective
only on the interior of K, and that the condition det Φ′ ≠ 0 need hold only on
the interior of K.
Just as polar coordinates are convenient for radial symmetry in R2 , cylin-
drical coordinates in R3 conveniently describe regions with symmetry about
the z-axis. A point p ∈ R3 with Cartesian coordinates (x, y, z) has cylindrical
coordinates (r, θ, z) where (r, θ) are the polar coordinates for the point (x, y).
(See Figure 6.37.)
The cylindrical coordinate mapping is
Φ ∶ R≥0 × [0, 2π] × R Ð→ R³, Φ(r, θ, z) = (r cos θ, r sin θ, z).
That is, Φ is just the polar coordinate mapping on z cross sections, so like the
polar map, it is mostly injective. Its derivative matrix is
Φ′ = [ cos θ  −r sin θ  0
       sin θ   r cos θ  0
         0        0     1 ],
and again
∣ det Φ′ ∣ = r.
So, for example, to integrate f (x, y, z) = y 2 z over the cylinder C : x2 + y 2 ≤ 1,
0 ≤ z ≤ 2, note that C = Φ([0, 1] × [0, 2π] × [0, 2]), and therefore by the change
of variable theorem and then Fubini’s theorem,
∫_C f = ∫_{θ=0}^{2π} ∫_{r=0}^1 ∫_{z=0}^2 r² sin²θ ⋅ z ⋅ r = ∫_{θ=0}^{2π} sin²θ ⋅ (r⁴/4)∣_{r=0}^1 ⋅ (z²/2)∣_{z=0}^2 = π/2.
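A midpoint-rule check of this computation (our own sketch): the integrand (r sin θ)² · z picks up the factor ∣det Φ′∣ = r.

    import math

    N = 60
    dth, dr, dz = 2 * math.pi / N, 1.0 / N, 2.0 / N
    total = 0.0
    for i in range(N):
        t = (i + 0.5) * dth
        for j in range(N):
            r = (j + 0.5) * dr
            for k in range(N):
                z = (k + 0.5) * dz
                total += (r * math.sin(t)) ** 2 * z * r * dth * dr * dz
    print(total, math.pi / 2)   # both about 1.5708
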
The spherical coordinate mapping is
Φ ∶ R≥0 × [0, 2π] × [0, π] Ð→ R³, Φ(ρ, θ, ϕ) = (ρ cos θ sin ϕ, ρ sin θ sin ϕ, ρ cos ϕ).
The spherical coordinate mapping has derivative matrix
Φ′ = [ cos θ sin ϕ  −ρ sin θ sin ϕ  ρ cos θ cos ϕ
       sin θ sin ϕ   ρ cos θ sin ϕ  ρ sin θ cos ϕ
        cos ϕ             0         −ρ sin ϕ    ],
with determinant (using column-linearity)
det Φ′ = −ρ² sin ϕ det [ cos θ sin ϕ   sin θ   cos θ cos ϕ
                          sin θ sin ϕ  −cos θ   sin θ cos ϕ
                           cos ϕ          0      −sin ϕ    ]
 = −ρ² sin ϕ (cos²θ sin²ϕ + sin²θ cos²ϕ + cos²θ cos²ϕ + sin²θ sin²ϕ)
 = −ρ² sin ϕ,
so that since 0 ≤ ϕ ≤ π,
∣det Φ′∣ = ρ² sin ϕ.
That is, the spherical coordinate mapping reverses orientation. It can be re-
defined to preserve orientation by changing ϕ to the latitude angle, varying
from −π/2 to π/2, rather than the colatitude.
Figure 6.38 shows the image under the spherical coordinate mapping of
some (θ, ϕ)-rectangles, each having a fixed value of ρ, and similarly for Fig-
ure 6.39 for some fixed values of θ, and Figure 6.40 for some fixed values of ϕ.
Thus the spherical coordinate mapping takes boxes to regions with these sorts
of walls, such as the half ice cream cone with a bite taken out of its bottom
in Figure 6.41.
vol(B_3(r)) = ∫_{B_3(r)} 1 = ∫_{θ=0}^{2π} ∫_{ρ=0}^r ∫_{ϕ=0}^π ρ² sin ϕ = 2π ⋅ (r³/3) ⋅ 2 = (4/3)πr³.
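Again a quick numerical check (ours; the helper name ball_volume is an assumption for illustration): a midpoint sum of the spherical volume element ρ² sin ϕ reproduces 4π/3 for the unit ball. The θ-integration contributes a plain factor of 2π since the integrand does not involve θ.

    import math

    def ball_volume(radius, N=200):
        drho, dph = radius / N, math.pi / N
        s = 0.0
        for i in range(N):
            rho = (i + 0.5) * drho
            for k in range(N):
                ph = (k + 0.5) * dph
                s += rho * rho * math.sin(ph) * drho * dph
        return 2 * math.pi * s

    print(ball_volume(1.0), 4 * math.pi / 3)   # both about 4.18879
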
It follows that the spherical shell B_3(b) − B_3(a) has volume 4π(b³ − a³)/3.
See Exercises 6.7.12 through 6.7.14 for the lovely formula giving the volume
of the n-ball for arbitrary n.
The change of variable theorem and spherical coordinates work together
to integrate over the solid ellipsoid of (positive) axes a, b, c,
E_{a,b,c} = {(x, y, z) ∶ (x/a)² + (y/b)² + (z/c)² ≤ 1}. To compute the integral
∫_{E_{a,b,c}} (Ax² + By² + Cz²),
first define a change of variable mapping that stretches the unit sphere into
the ellipsoid,
Φ ∶ R³ Ð→ R³, Φ(u, v, w) = (au, bv, cw).
Its derivative matrix is
Φ′ = [ a 0 0
       0 b 0
       0 0 c ], ∣det Φ′∣ = abc.
Let f(x, y, z) = z². Then because E_{a,b,c} = Φ(B_3(1)) and (f ○ Φ)(u, v, w) = c²w²,
∫_{Φ(B_3(1))} f = ∫_{B_3(1)} (f ○ Φ) ⋅ ∣det Φ′∣ = abc ⋅ c² ∫_{B_3(1)} w².
Apply the change of variable theorem again, using the spherical coordinate
mapping into (u, v, w)-space,
∫_{B_3(1)} w² = ∫_{ρ=0}^1 ∫_{θ=0}^{2π} ∫_{ϕ=0}^π ρ² cos²ϕ ⋅ ρ² sin ϕ = 4π/15.
By the symmetry of the symbols in the original integral, its overall value is
therefore
∫_{E_{a,b,c}} (Ax² + By² + Cz²) = (4π/15) abc (a²A + b²B + c²C).
Another example is to find the centroid of the upper hemispherical shell
S = (B_3(b) − B_3(a)) ∩ {z ≥ 0}. By symmetry, x̄ = ȳ = 0, while
∫_S z = ∫_{ρ=a}^b ∫_{θ=0}^{2π} ∫_{ϕ=0}^{π/2} ρ cos ϕ ⋅ ρ² sin ϕ = (π/4)(b⁴ − a⁴).
This integral needs to be divided by the volume 2π(b3 − a3 )/3 of S to give
z̄ = 3(b⁴ − a⁴)/(8(b³ − a³)).
In particular, the centroid of the solid hemisphere is 3/8 of the way up. It
is perhaps surprising that π does not figure in this formula, as it did in the
two-dimensional case.
Here is a heuristic argument to support the change of variable theorem.
Suppose that K is a box. Recall the theorem’s assertion: under certain con-
ditions,
∫_{Φ(K)} f = ∫_K (f ○ Φ) ⋅ ∣det Φ′∣.
To argue that this equality holds, take a partition P dividing K into subboxes
J, and in each subbox choose a point xJ . If the partition is fine enough, then
each J maps under Φ to a small patch A of volume vol(A) ≈ ∣ det Φ′ (xJ )∣vol(J)
(cf. Section 3.8), and each xJ maps to a point yA ∈ A. (See Figure 6.42.) Since
the integral is a limit of weighted sums, it follows that
∫_{Φ(K)} f ≈ ∑_A f(y_A) vol(A)
 ≈ ∑_J f(Φ(x_J)) ∣det Φ′(x_J)∣ vol(J)
 ≈ ∫_K (f ○ Φ) ⋅ ∣det Φ′∣,
and these should become equalities in the limit as P becomes finer. What
makes this reasoning incomplete is that the patches A are not boxes, as are
required for our theory of integration.
Recall from Sections 3.8 and 3.9 that the absolute value of det Φ′ (x) de-
scribes how the mapping Φ scales volume at x, while the sign of det Φ′ (x)
says whether the mapping locally preserves or reverses orientation. The fac-
tor ∣ det Φ′ ∣ in the n-dimensional change of variable theorem (rather than
the signed det Φ′ ) reflects the fact that n-dimensional integration does not
take orientation into account. This unsigned result is less satisfying than
the corresponding result in one-variable theory, which does consider ori-
entation and therefore comes with a signed change of variable theorem,
∫_{φ(a)}^{φ(b)} f = ∫_a^b (f ○ φ) ⋅ φ′. An orientation-sensitive n-dimensional integration
Exercises
6.7.1. Evaluate ∫S x2 + y 2 where S is the region bounded by x2 + y 2 = 2z and
z = 2. Sketch S.
6.7.2. Find the volume of the region S above x2 + y 2 = 4z and below x2 + y 2 +
z 2 = 5. Sketch S.
6.7.3. Find the volume of the region between the graphs of z = x2 + y 2 and
z = (x2 + y 2 + 1)/2.
6.7.4. Derive the spherical coordinate mapping.
6.7.5. Let Φ be the spherical coordinate mapping. Describe Φ(K) where
K = {(ρ, θ, ϕ) ∶ 0 ≤ θ ≤ 2π, 0 ≤ ϕ ≤ π/2, 0 ≤ ρ ≤ cos ϕ}.
(Hint: Along with visualizing the geometry, set θ = 0 and consider the condi-
tion ρ2 = ρ cos ϕ in Cartesian coordinates.) Same question for
K = {(ρ, θ, ϕ) ∶ 0 ≤ θ ≤ 2π, 0 ≤ ϕ ≤ π, 0 ≤ ρ ≤ sin ϕ}.
6.7.6. Evaluate ∫S xyz where S is the first octant of B3 (1).
6.7.7. Find the mass of a solid figure filling the spherical shell
S = B3 (b) − B3 (a)
with density δ(x, y, z) = x2 + y 2 + z 2 .
6.7.8. A solid sphere of radius b has density δ(x, y, z) = e^{−(x²+y²+z²)^{3/2}}. Find
its mass, ∫_{B_3(b)} δ.
6.7.9. Find the centroid of the region S = B3 (a) ∩ {x2 + y 2 ≤ z 2 } ∩ {z ≥ 0}.
Sketch S.
6.7.10. (a) Prove Pappus’s theorem: Let K be a compact set in the (x, z)-
plane lying to the right of the z-axis and with boundary of area zero. Let S
be the solid obtained by rotating K about the z-axis in R3 . Then
vol(S) = 2πx ⋅ area(K),
where as always, x̄ = ∫_K x / area(K). (Use cylindrical coordinates.)
(b) What is the volume of the torus Ta,b of cross-sectional radius a and
major radius b from the center of rotation to the center of the cross-sectional
disk? (See Figure 6.43.)
6.7.11. Prove the change of scale principle: if the set K ⊂ Rn has volume
v then for every r ≥ 0, the set rK = {rx ∶ x ∈ K} has volume rn v. (Change
variables by Φ(x) = rx.)
6.7.12. (Volume of the n-ball, first version.) Let n ∈ Z+ and r ∈ R≥0. The
n-dimensional ball of radius r is
B_n(r) = {x ∈ Rⁿ ∶ ∣x∣ ≤ r}.
Let
v_n = vol(B_n(1)).
(a) Explain how Exercise 6.7.11 reduces computing the volume of Bn (r)
to computing vn .
(b) Explain why v1 = 2 and v2 = π.
(c) Let D denote the unit disk B2 (1). Explain why for n > 2,
B_n(1) = ⊔_{(x_1,x_2)∈D} {(x_1, x_2)} × B_{n−2}(√(1 − x_1² − x_2²)).
(d) Show that consequently
v_n = v_{n−2} ∫_{θ=0}^{2π} ∫_{r=0}^1 (1 − r²)^{n/2−1} ⋅ r = v_{n−2} π/(n/2).
(Use the definition of volume at the end of Section 6.5, Fubini’s theorem, the
definition of volume again, the change of scale principle from the previous
exercise, and the change of variable theorem.)
(e) Prove by induction the for n even case of the formula
v_n = { π^{n/2}/(n/2)! for n even; 2ⁿ π^{(n−1)/2} ((n − 1)/2)! / n! for n odd }.
(The for n odd case can be proved by induction as well, but the next two
exercises provide a better, more conceptual, approach to the volumes of odd-
dimensional balls.)
6.7.13. This exercise computes the improper integral I = ∫_{x=0}^∞ e^{−x²}, defined as
the limit lim_{R→∞} ∫_{x=0}^R e^{−x²}. Let I(R) = ∫_{x=0}^R e^{−x²} for R ≥ 0.
(a) Use Fubini's theorem to show that I(R)² = ∫_{S(R)} e^{−x²−y²}, where S(R)
is the square
S(R) = {(x, y) ∶ 0 ≤ x ≤ R, 0 ≤ y ≤ R}.
(b) Let Q(R) be the quarter disk
Q(R) = {(x, y) ∶ 0 ≤ x, 0 ≤ y, x2 + y 2 ≤ R2 },
and similarly for Q(√2 R). Explain why
∫_{Q(R)} e^{−x²−y²} ≤ ∫_{S(R)} e^{−x²−y²} ≤ ∫_{Q(√2 R)} e^{−x²−y²}.
(c) Change variables, and evaluate ∫_{Q(R)} e^{−x²−y²} and ∫_{Q(√2 R)} e^{−x²−y²}.
What are the limits of these two quantities as R → ∞?
(d) What is I?
6.7.14. (Volume of the n-ball, improved version) Define the gamma function
as an integral,
Γ(s) = ∫_{x=0}^∞ x^{s−1} e^{−x} dx, s > 0.
(This improper integral is well behaved, even though it is not being carried
out over a bounded region and even though the integrand is unbounded near
x = 0 when 0 < s < 1. We use dx here because this exercise is computational.)
(a) Show: Γ(1) = 1, Γ(1/2) = √π, Γ(s + 1) = sΓ(s). (Substitute and see
the previous exercise for the second identity, integrate by parts for the third.)
(b) Use part (a) to show that n! = Γ (n + 1) for n = 0, 1, 2, . . . . Accordingly,
define x! = Γ (x + 1) for all real numbers x > −1, not only nonnegative integers.
(c) Use Exercise 6.7.12(b), Exercise 6.7.12(d), and the extended definition
of the factorial in part (b) of this exercise to obtain a uniform formula for the
volume of the unit n-ball,
v_n = π^{n/2}/(n/2)!, n = 1, 2, 3, . . . .
(We already have this formula for n even. For n odd, the argument is essen-
tially identical to Exercise 6.7.12(e) but starting at the base case n = 1.) Thus
the n-ball of radius r has volume
vol(B_n(r)) = (π^{n/2}/(n/2)!) rⁿ, n = 1, 2, 3, . . . .
(d) For odd n, what value of s shows that the values of v_n from part (c) of this
exercise and from part (e) of Exercise 6.7.12 are equal?
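For reference, the uniform formula is easy to tabulate with the standard library's gamma function (a sketch of ours; unit_ball_volume is a hypothetical helper name, and (n/2)! is read as Γ(n/2 + 1)).

    import math

    def unit_ball_volume(n):
        # v_n = pi^(n/2) / (n/2)!  with  (n/2)! = Gamma(n/2 + 1)
        return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

    for n in range(1, 8):
        print(n, unit_ball_volume(n))
    # v_1 = 2, v_2 = pi, v_3 = 4*pi/3, matching the earlier computations,
    # and the values eventually decrease toward 0 as n grows
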
(e) (Read-only. While the calculation of vn in these exercises shows the
effectiveness of our integration toolkit, the following heuristic argument il-
lustrates that we would profit from an even more effective theory of integra-
tion.) Decompose Euclidean space Rn into concentric n-spheres (the n-sphere
is the boundary of the n-ball), each having radius r and differential radial
thickness dr. Since each such n-sphere is obtained by removing the n-ball of
radius r from the n-ball of radius r + dr, its differential volume is
v_n(r + dr)ⁿ − v_n rⁿ = v_n n r^{n−1} dr + higher powers of dr.
Here we ignore the higher powers of dr on the grounds that they are so much
smaller than the dr-term. Thus, reusing some ideas from a moment ago, and
using informal notation,
π^{n/2} = (∫_R e^{−x²} dx)ⁿ   since the integral equals √π
 = ∫_{Rⁿ} e^{−∣x∣²} dV   by Fubini's theorem
 = v_n n ∫_{r=0}^∞ r^{n−1} e^{−r²} dr   integrating over spherical shells
 = v_n (n/2) ∫_{t=0}^∞ t^{n/2−1} e^{−t} dt   substituting t = r²
 = v_n (n/2) Γ(n/2)
 = v_n (n/2)!.
The formula vn = π n/2 /(n/2)! follows immediately. The reason that this
induction-free argument lies outside our theoretical framework is that it in-
tegrates directly (rather than by the change of variable theorem) even as it
decomposes Rn into small pieces that aren’t boxes. Although we would prefer
a more flexible theory of integration that allows such procedures, developing
it takes correspondingly more time.
(a) Show that because n! = Γ(n + 1), it follows that n! = ∫_{t=0}^∞ e^{n ln t − t} dt.
(b) With n fixed and t variable, show that the quantity n ln t − t takes
its maximum value n ln n − n at t = n, where its first derivative is 0 and its
second derivative is −1/n. Thus the quantity’s quadratic approximation about
its maximizing point is n ln n − n − (1/(2n))(t − n)².
(c) In the integral expression of n! from (a), replace n ln t−t by its quadratic
approximation from (b) to get
Γ(n + 1) ∼ (n/e)ⁿ ∫_{t=0}^∞ e^{−(t−n)²/(2n)} dt.
The quantity t − n runs through (−n, ∞) as t runs through (0, ∞). Thus,
assuming that n ≫ 0, replace t − n by t and extend the integration to all of R
to get
Γ(n + 1) ∼ (n/e)ⁿ ∫_{t=−∞}^∞ e^{−t²/(2n)} dt.
α^{−s} = (1/Γ(s)) ∫_{t=0}^∞ tˢ e^{−αt} dt/t, α > 0.
I_s = (1/Γ(s)) ∫_{x=−∞}^∞ ∫_{t=0}^∞ tˢ e^{−(1+x²)t} (dt/t) dx.
(c) Explain how after exchanging the order of integration, a few other steps
lead to
I_s = (1/Γ(s)) ∫_{t=0}^∞ t^{s−1/2} e^{−t} (dt/t) ∫_{x=−∞}^∞ e^{−x²} dx.
(d) Use earlier exercises to conclude that
I_s = √π Γ(s − 1/2)/Γ(s).
6.7.17. Let A and B be positive real numbers. This exercise evaluates the
improper integral
I_s = ∫_{x=−∞}^∞ dx/(Ae^{2x} + Be^{−2x})ˢ (for every real number s > 0).
(a) Recall from Exercise 6.7.16(a) that α^{−s} = (1/Γ(s)) ∫_{t=0}^∞ tˢ e^{−αt} dt/t for all α > 0.
Explain how a particular choice of α leads to
I_s = (1/Γ(s)) ∫_{x=−∞}^∞ ∫_{t=0}^∞ tˢ e^{−(Ae^{2x}+Be^{−2x})t} (dt/t) dx.
(b) Let x = (1/2) log u (natural logarithm) and show that
I_s = (1/(2Γ(s))) ∫_{u=0}^∞ ∫_{t=0}^∞ tˢ e^{−(Au+Bu⁻¹)t} (dt/t)(du/u).
Replace t by ut to get
I_s = (1/(2Γ(s))) ∫_{u=0}^∞ ∫_{t=0}^∞ tˢ uˢ e^{−(Au²+B)t} (dt/t)(du/u).
Replace u by √u to get
I_s = (1/(4Γ(s))) ∫_{u=0}^∞ ∫_{t=0}^∞ tˢ u^{s/2} e^{−(Au+B)t} (dt/t)(du/u).
6.7.18. (Read-only. This exercise makes use not only of the gamma function
but of some results beyond our scope, in the hope of interesting the reader in
those ideas.)
(a) Consider any x ∈ R>0 , ξ ∈ R, and s ∈ R>1 . We show that
∫_{y=−∞}^∞ e^{iξy}/(x + iy)ˢ dy = { (2π/Γ(s)) e^{−xξ} ξ^{s−1} if ξ > 0; 0 if ξ ≤ 0 }.
Γ(s) = ∫_{ξ=0}^∞ e^{−ξ} ξˢ dξ/ξ = xˢ ∫_{ξ=0}^∞ e^{−xξ} ξˢ dξ/ξ.
A result from complex analysis says that this formula extends from the open
half-line of positive x-values to the open half-plane of complex numbers x + iy
with x positive. That is, for every y ∈ R,
Γ(s) = (x + iy)ˢ ∫_{ξ=0}^∞ e^{−(x+iy)ξ} ξˢ dξ/ξ.
This is
Γ(s)/(x + iy)ˢ = ∫_ξ e^{−iyξ} ϕ_x(ξ) dξ where ϕ_x(ξ) = { e^{−xξ} ξ^{s−1} if ξ > 0; 0 if ξ ≤ 0 }.
The integral here is a Fourier transform. That is, letting F denote the Fourier
transform operator, the previous display says that
Γ(s)/(x + iy)ˢ = (Fϕ_x)(y), y ∈ R.
The integral Γ(s) ∫_{y=−∞}^∞ e^{iξy} (x + iy)^{−s} dy is consequently the inverse Fourier
in which dξ = ∏i≤j dξij is the product of the differentials of the diagonal and
superdiagonal elements of ξ, where we recall that because ξ is symmetric the
subdiagonal entries are redundant. The decomposition of Cn combines with
some other facts (which the reader is encouraged to identify, if not prove) to
show that
Γ_n(s) = ∫_{c∈R^{n−1}} ∫_{a∈R>0} ∫_{ξ_2∈C_{n−1}} ( ⋯ ) (dξ_2 da dc)/(a^{(n+1)/2} (det ξ_2)^{(n+1)/2}).
Replacing c by a^{1/2} c (and thus dc by a^{(n−1)/2} dc) lets the integral be separated,
Γ_n(s) = ∫_{c∈R^{n−1}} e^{−∣c∣²} dc ⋅ ∫_{a∈R>0} e^{−a} aˢ da/a ⋅ ∫_{ξ_2∈C_{n−1}} e^{−tr ξ_2} (det ξ_2)^{s−1/2} dξ_2/(det ξ_2)^{n/2}
 = π^{(n−1)/2} Γ(s) Γ_{n−1}(s − 1/2).
And iterating the argument gives the value of the nth gamma function in
terms of the basic gamma function,
Γ_n(s) = π^{n(n−1)/4} ∏_{j=0}^{n−1} Γ(s − j/2).
Similarly to part (a), one now can evaluate an integral over the vector space Vn
of n × n symmetric matrices for a given ξ ∈ Vn ,
∫_{y∈V_n} e^{i tr(ξy)}/det(x + iy)ˢ dy = { ((2π)ⁿ π^{(n−1)n/2}/Γ_n(s)) e^{−tr(xξ)} (det ξ)^{s−(n+1)/2} if ξ ∈ C_n; 0 otherwise },
using the fact that the constant for Fourier inversion over the space of n × n
symmetric matrices is (2π)n π (n−1)n/2 .
6.7.19. Figure 6.44 shows a geodesic dome with 5-fold vertices and 6-fold
vertices. (A geodesic of the sphere is a great circle.) Figure 6.45 shows a
bird’s-eye view of the dome. The thinner edges emanate from the 5-vertices,
while four of the six edges emanating from each 6-vertex are thicker. The five
triangles that meet at a 5-vertex are isosceles, while two of the six triangles
that meet at a 6-vertex are equilateral. This exercise uses vector algebra and
the spherical coordinate system to work out the lengths and angles of the
dome. Integration and the change of variable theorem play no role in this
exercise.
(a) Take all vertices to lie on a sphere of radius 1. The ten thick edges around the equator form a regular 10-gon. Show that consequently the thick edges have length

a = 2 sin(π/10) = 2 cos(2π/5).

This famous number from geometry goes back to Euclid. Note that a = ζ₅ + ζ₅^{−1} where ζ₅ = e^{2πi/5} = cos(2π/5) + i sin(2π/5) is the fifth root of unity one-fifth of the way counterclockwise around the complex unit circle. Thus a² + a − 1 = ζ₅² + 2 + ζ₅³ + ζ₅ + ζ₅⁴ − 1, and the right side is 0 by the finite geometric sum formula. That is,

a² + a − 1 = 0,   a > 0,

and so the length of the thick edges is

a = (−1 + √5)/2 = 0.618033988… .

This number is a variant of the so-called golden ratio.
(b) The dome has a point at the north pole (0, 0, 1); then a layer of five points p₀ through p₄ around the north pole at some colatitude ϕ; then a layer of ten points q₀ through q₉, five at colatitude 2ϕ and the other five at some second colatitude φ; and finally the layer of ten equatorial points r₀ through r₉. The colatitude ϕ must be such that the triangle with vertices n, q₀, and q₂ is equilateral. These vertices may be taken to be

n = (0, 0, 1),
q₀, q₂ = (cos(π/5) sin(2ϕ), ∓sin(π/5) sin(2ϕ), cos(2ϕ)).

Show that consequently

ϕ = arctan(a) = 31.7174…°.

Use the cross-sectional triangle having vertices 0, n, p₀ and the law of cosines to show that the shorter segments have length

b = √(2(1 − 1/√(2 − a))) = 0.546533057… .

(Alternatively, one can find ϕ and b using the triangle with vertices n, p₀, p₁.) For reference in part (e), show that

2ϕ = arctan(2) = 63.4349…°.
(c) Show that the angle of an isosceles triangle where its equal sides meet at a 5-vertex is

α = 2 arcsin(a/(2b)) = 68.8619…°,

and the angles where its unequal sides meet at 6-vertices are

β = arccos(a/(2b)) = 55.5690…°.
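All of the numerical values quoted in parts (a) through (c) can be confirmed with a short computation; the following Python lines are an illustration, not a substitute for the requested derivations.

```python
import numpy as np

a = 2*np.sin(np.pi/10)                  # thick-edge length, = 2 cos(2 pi/5)
print(a, (-1 + np.sqrt(5))/2)           # 0.6180339887... twice
print(a**2 + a - 1)                     # 0 up to rounding

phi = np.arctan(a)                      # colatitude of p0, ..., p4
print(np.degrees(phi))                  # 31.7174...
print(np.degrees(np.arctan(2.0)))       # 63.4349..., confirming 2 phi = arctan 2

b = np.sqrt(2*(1 - 1/np.sqrt(2 - a)))   # thin-edge length via law of cosines
print(b)                                # 0.546533057...

alpha = 2*np.arcsin(a/(2*b))            # apex angle at a 5-vertex
beta = np.arccos(a/(2*b))               # angles at the 6-vertices
print(np.degrees(alpha), np.degrees(beta))   # 68.8619... and 55.5690...
```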
(d) Show that the angle where two a-segments meet along a geodesic is 180° − 36°. Show that the angle where two b-segments meet along a geodesic (this happens at the 6-vertices but not at the 5-vertices) is 180° − ϕ.
(e) To find the colatitude φ of q1 , q3 , . . . , q9 , take q9 and q1 to be
and consider the geodesic containing them. Their cross product is normal to
the plane of the geodesic. Show that this cross product is
Φ : A → R^n

Let

f : Φ(K) → R

be a continuous function. Then

∫_{Φ(K)} f = ∫_K (f ○ Φ) · ∣det Φ′∣.
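Before dissecting the data of the theorem, it may help to see the equality verified numerically in one concrete instance. The following sketch (an illustration, not part of the development) takes K = [0, 1] × [0, 2π], the polar-coordinate mapping Φ(r, θ) = (r cos θ, r sin θ), and f(x, y) = e^{−(x²+y²)}, so that Φ(K) is the closed unit disk and ∣det Φ′∣ = r.

```python
import numpy as np
from scipy.integrate import dblquad

# Left side: integral of f over the unit disk Phi(K)
lhs, _ = dblquad(lambda y, x: np.exp(-(x*x + y*y)),
                 -1, 1,
                 lambda x: -np.sqrt(1 - x*x),
                 lambda x:  np.sqrt(1 - x*x))

# Right side: integral over K of (f o Phi) * |det Phi'| = e^{-r^2} r
rhs, _ = dblquad(lambda t, r: np.exp(-r*r) * r, 0, 1, 0, 2*np.pi)

print(lhs, rhs)   # both equal pi (1 - 1/e) = 1.98587...
```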
Thus the obvious data for the theorem are K, Φ, and f . (The description of Φ
subsumes A, and in any case the role of A is auxiliary.) But also, although the
dimension n is conceptually generic but fixed, in fact the proof of the theorem
will entail induction on n, so that we should view n as a variable part of the
setup as well. Here are some comments about the data.
• The continuous image of a compact set is compact (Theorem 2.4.14), so that Φ(K) is again compact. Similarly, by an invocation in Section 2.4, the continuous image of a connected set is connected, so that Φ(K) is again connected. The reader who wants to minimize invocation may instead assume that K is path-connected, so that Φ(K) is again path-connected (see Exercise 2.4.10 for the definition of path-connectedness and the fact that path-connectedness is a topological property); the distinction
B(a, r) = {x ∈ Rn ∶ ∣x − a∣ < r} .
Figure 6.46. The change of variable mapping need not behave well on the boundary
A boundary point of a set need not be a limit point of the set, and a limit
point of a set need not be a boundary point of the set (Exercise 6.8.1(b)).
Nonetheless, similarly to the definition of closed set in the second bullet
before Definition 6.8.1, a set is closed if and only if it contains all of its
boundary points (Exercise 6.8.1(c)). The boundary of every set is closed (Ex-
ercise 6.8.1(d)). Since the definition of boundary point is symmetric in the
set and its complement, the boundary of the set is also the boundary of the
complement,
∂S = ∂(S c ).
The closure of a set is the union of the set and its boundary (Exercise 6.8.2(a)),
S̄ = S ∪ ∂S.
∂B(a, r) = {x ∈ Rn ∶ ∣x − a∣ = r}
Proof (Sketch). Suppose that no finite collection of the open boxes J_i covers K. Let B₁ be a box that contains K. Partition B₁ into 2^n subboxes B̃ by bisecting it in each direction. If for each subbox B̃, some finite collection of the open boxes J_i covers K ∩ B̃, then the 2^n-fold collection of these finite collections in fact covers all of K. Thus no finite collection of the open boxes J_i covers K ∩ B̃ for at least one subbox B̃ of B₁. Name some such subbox B₂, repeat the argument with B₂ in place of B₁, and continue in this fashion, obtaining nested boxes

B₁ ⊃ B₂ ⊃ B₃ ⊃ ⋯

whose sides are half as long at each succeeding generation, and such that no K ∩ B_j is covered by a finite collection of the open boxes J_i. The intersection K ∩ B₁ ∩ B₂ ∩ ⋯ contains at most one point, because the boxes B_j eventually shrink smaller than the distance between any two given distinct points. On the other hand, since each K ∩ B_j is nonempty (otherwise the empty subcollection of the open boxes J_i would cover it), there is a sequence {c_j} with each c_j ∈ K ∩ B_j; and since K is compact and each B_j is compact and the B_j are nested, the sequence {c_j} has a subsequence that converges in K and in each B_j, hence converges in the intersection K ∩ B₁ ∩ B₂ ∩ ⋯. Thus the intersection is a single point c. Some open box J_i contains c because c ∈ K, and so because the boxes B_j shrink to c, that J_i also covers B_j for all high enough indices j. This contradicts the fact that no K ∩ B_j is covered by finitely many J_i. Thus the initial supposition that no finite collection of the open boxes J_i covers K is untenable. ⊔⊓
Although the finiteness property of compact sets plays only a small role in
these notes, the idea is important and far-reaching. For example, it lies at the
heart of sequence-free proofs that the continuous image of a compact set is
Proof. Let x be the centerpoint of B and let x̃ be any point of B. Make the line segment connecting x to x̃ the image of a function of one variable,

γ : [0, 1] → R^n,   γ(t) = x + t(x̃ − x).

Fix any i ∈ {1, …, n}. Identically to the proof of the difference magnification lemma, we have for some t ∈ (0, 1),

g_i(x̃) − g_i(x) = ⟨g_i′(γ(t)), x̃ − x⟩.

For each j, the jth entry of the vector g_i′(γ(t)) is D_j g_i(γ(t)), and we are given that ∣D_j g_i(γ(t))∣ ≤ c. Also, the jth entry of the vector x̃ − x satisfies ∣x̃_j − x_j∣ ≤ ℓ/2, where ℓ is the longest side of B. Thus

∣g_i(x̃) − g_i(x)∣ ≤ ncℓ/2,

and so

g_i(B) ⊂ [g_i(x) − ncℓ/2, g_i(x) + ncℓ/2].

Apply this argument for each i ∈ {1, …, n} to show that g(B) lies in the box B′ centered at g(x) having sides ncℓ and therefore having volume

vol(B′) = (ncℓ)^n.

Meanwhile, if the longest side of B is at most twice its shortest side (cf. Exercise 6.8.5(b)), then every side of B is at least ℓ/2, so that

vol(B) ≥ (ℓ/2)^n.
Using the previous two results, we can show that the property of having
volume zero is preserved under mappings that are well enough behaved. How-
ever, we need to assume more than just continuity. The property of having
volume zero is not a topological property.
⊔⊓
The last topological preliminary that we need is the formal definition of
interior.
Definition 6.8.7 (Interior point, interior of a set). Let S ⊂ Rn be a set.
Every nonboundary point of S is an interior point of S. Thus x is an interior
point of S if some open ball B(x, r) lies entirely in S. The interior of S is
S ○ = {interior points of S}.
The interior of every set S is open (Exercise 6.8.6(a)). Every set decom-
poses as the disjoint union of its interior points and its boundary points (Ex-
ercise 6.8.6(b)),
S = S° ∪ (S ∩ ∂S),   S° ∩ ∂S = ∅.
As anticipated at the beginning of this section, we now can complete the
argument that the properties of the set K in the change of variable theorem
are preserved by the mapping Φ in the theorem.
Proposition 6.8.8. Let K ⊂ Rn be a compact and connected set having
boundary of volume zero. Let A be an open superset of K, and let Φ ∶ A Ð→ Rn
be a C 1 -mapping such that det Φ′ ≠ 0 everywhere on K ○ . Then Φ(K) is again
a compact and connected set having boundary of volume zero.
Proof. We have discussed the fact that Φ(K) is again compact and connected.
Restrict Φ to K. The inverse function theorem says that Φ maps interior points
of K to interior points of Φ(K), and thus ∂(Φ(K)) ⊂ Φ(∂K). By the volume-
zero preservation proposition, vol(Φ(∂K)) = 0. So vol(∂(Φ(K))) = 0 as well.
⊔⊓
Exercises
6.8.1. (a) Show that every intersection—not just twofold intersections and
not even just finite-fold intersections—of closed sets is closed. (Recall from
Proposition 2.4.5 that a set S is closed if and only if every sequence in S that
converges in Rn in fact converges in S.)
(b) Show by example that a boundary point of a set need not be a limit
point of the set. Show by example that a limit point of a set need not be a
boundary point of the set.
(c) Show that a set is closed if and only if it contains each of its boundary
points. (Again recall the characterization of closed sets mentioned in part (a).)
(d) Show that the boundary of every set is closed.
(e) Show that every union of two closed sets is closed. It follows that every
union of finitely many closed sets is closed. Recall that by definition, a set is
open if its complement is closed. Explain why consequently every intersection
of finitely many open sets is open.
(f) Explain why every union of finitely many compact sets is compact.
6.8.2. Let S be any subset of Rn .
(a) Show that its closure is its union with its boundary, S = S ∪ ∂S.
(b) Show that if S is bounded then so is S.
6.8.3. (a) Which points of the proof of Proposition 6.8.4 are sketchy? Fill in
the details.
(b) Let S be an unbounded subset of Rn , meaning that S is not contained
in any ball. Find a collection of open boxes Ji that covers S but such that no
finite subcollection of the open boxes Ji covers S.
(c) Let S be a bounded but nonclosed subset of Rn , meaning that S is
bounded but missing a limit point. Find a collection of open boxes Ji that
covers S but such that no finite subcollection of the open boxes Ji covers S.
6.8.4. Let ε > 0. Consider the box B = [0, 1] × [0, ε] ⊂ R2 , and consider the
mapping g ∶ R2 Ð→ R2 given by g(x, y) = (x, x). What is the smallest box B ′
containing g(B)? What is the ratio vol(B ′ )/vol(B)? Discuss the relationship
between this example and Lemma 6.8.5.
6.8.5. The following questions are about the proof of Proposition 6.8.6.
(a) Explain why for each s ∈ S there exists an rs > 0 such that the copy of
the box [−rs , rs ]n centered at s lies in A.
(b) Explain why every box (with all sides assumed to be positive) can be
subdivided into boxes whose longest side is at most twice the shortest side.
∫_{Φ(K)} f = ∫_K (f ○ Φ) · ∣det Φ′∣.
(In the left side of Figure 6.47, the type I subboxes are shaded and the type II
subboxes are white. There are no type III subboxes in the figure, but type III
subboxes play no role in the pending argument anyway.) The three types of
box are exclusive and exhaustive (Exercise 6.9.2(a)).
Also define a function

g : B → R,   g(x) = (f ○ Φ)(x) · ∣det Φ′(x)∣ if x ∈ K,   g(x) = 0 if x ∉ K.
Figure 6.47. Type I and type II subboxes, image of the type I subboxes
(Exercise 6.9.2(c)).
Let

Φ(K)_I = ⋃_{J : type I} Φ(J),   Φ(K)_II = Φ(K)/Φ(K)_I.
(Thus Φ(K)I is shaded in the right side of Figure 6.47, while Φ(K)II is white.)
Then the integral on the left side of the equality in the change of variable
theorem decomposes into two parts,
∫_{Φ(K)} f = ∫_{Φ(K)_I} f + ∫_{Φ(K)_II} f,

∫_{Φ(K)} f = ∑_{J : type I} ∫_{Φ(J)} f + ∫_{Φ(K)_II} f.   (6.12)
Also,

Φ(K)_II ⊂ ⋃_{J : type II} Φ(J),

so that

∣∫_{Φ(K)_II} f∣ ≤ ∫_{Φ(K)_II} ∣f∣ ≤ ∑_{J : type II} ∫_{Φ(J)} ∣f∣.

By the bound on ∣f∣ and the smallness of the total volume of the type II boxes, it follows that

∣∫_{Φ(K)_II} f∣ < ε.
That is, the second term on the right side of (6.12) contributes as negligibly
as desired to the integral on the left side, which is the integral on the left side
of the change of variable theorem. In terms of Figure 6.47, the idea is that if
the boxes in the left half of the figure are refined until the sum of the white
box-areas is small enough then the integral of f over the corresponding small
white region in the right half of the figure becomes negligible.
Meanwhile, the integral on the right side of the equality in the change of
variable theorem also decomposes into two parts,
∫_K (f ○ Φ) · ∣det Φ′∣ = ∑_{J : type I} ∫_J g + ∑_{J : type II} ∫_J g.   (6.13)
That is, the second term on the right side of (6.13) contributes as negligibly
as desired to the integral on the left side, which is the integral on the right
side of the change of variable theorem. In terms of Figure 6.47, the idea is that
if the boxes in the left half of the figure are refined until the sum of the white
box-areas is small enough then the integral of (f ○ Φ) ⋅ ∣ det Φ′ ∣ over the white
boxes becomes negligible. That is, it suffices to prove the change of variable
theorem for boxes like the shaded boxes in the left half of the figure.
The type I subboxes J of the partition of the box B containing the orig-
inal K (which is not assumed to be a box) satisfy all of the additional hy-
potheses in the statement of the proposition: each J is a box, and we may
shrink the domain of Φ to the open superset K ○ of each J, where Φ is in-
jective and where det Φ′ ≠ 0. Thus, knowing the change of variable theorem
subject to any of the additional hypotheses says that the first terms on the
right sides of (6.12) and (6.13) are equal, making the integrals on the left sides
lie within ε of each other. Since ε is arbitrary, the integrals are in fact equal.
In sum, it suffices to prove the change of variable theorem assuming any of
the additional hypotheses, as desired. ⊔⊓
Similarly to the remark after Proposition 6.9.1, we will not always want
the additional hypotheses.
Proof. With the previous proposition in play, the idea now is to run through its
proof in reverse, starting from the strengthened hypotheses that it grants us.
Thus we freely assume that K is a box, that the change of variable mapping Φ
is injective on all of A, and that det Φ′ ≠ 0 on all of A. By the inverse function
theorem, the superset Φ(A) of Φ(K) is open and Φ ∶ A Ð→ Φ(A) has a C 1
inverse
Φ−1 ∶ Φ(A) Ð→ A.
Let ε > 0 be given.
Let B be a box containing Φ(K), and let P be a partition of B into
subboxes J. Define three types of subbox,
These three types of box are exclusive and exhaustive. Also, define as before
g : B → R,   g(x) = (f ○ Φ)(x) · ∣det Φ′(x)∣ if x ∈ K,   g(x) = 0 if x ∉ K.
Figure 6.48. Type I, II, and III subboxes, inverse image of the type I subboxes
Let

K_I = ⋃_{J : type I} Φ^{−1}(J),   K_II = K/K_I.
Then the integral on the left side of the equality in the change of variable
theorem decomposes into two parts,
∫_{Φ(K)} f = ∑_{J : type I} ∫_J f + ∑_{J : type II} ∫_J f.   (6.14)
That is, the second term on the right side of (6.14) contributes as negligibly
as desired to the integral on the left side, which is the integral on the left side
of the change of variable theorem.
Meanwhile, the integral on the right side of the equality in the change of
variable theorem also decomposes into two parts,
∫_K (f ○ Φ) · ∣det Φ′∣ = ∫_{K_I} g + ∫_{K_II} g,

∫_K (f ○ Φ) · ∣det Φ′∣ = ∑_{J : type I} ∫_{Φ^{−1}(J)} g + ∫_{K_II} g.   (6.15)
Also,

K_II ⊂ ⋃_{J : type II} Φ^{−1}(J),

so that

∣∫_{K_II} g∣ ≤ ∫_{K_II} ∣g∣ ≤ ∑_{J : type II} ∫_{Φ^{−1}(J)} ∣g∣.
For each box J of type II, vol(Φ^{−1}(J)) ≤ (2nc)^n vol(J). Thus, by the bounds on g and on the sum of the type II box-volumes, it follows that

∣∫_{K_II} g∣ < ε.
That is, the second term on the right side of (6.15) contributes as negligibly
as desired to the integral on the left side, which is the integral on the right
side of the change of variable theorem.
The type I subboxes J of the partition of the box B containing the orig-
inal Φ(K) (which is not assumed to be a box) satisfy the new additional
hypothesis in the statement of the proposition. The other two additional hy-
potheses in the statement of the proposition are already assumed. Thus, know-
ing the change of variable theorem subject to the additional hypotheses says
that the first terms on the right sides of (6.14) and (6.15) are equal, making
the integrals on the left sides lie within ε of each other. Since ε is arbitrary, the
integrals are in fact equal. In sum, it suffices to prove the change of variable
theorem assuming the additional hypotheses, as desired. ⊔⊓
∫_K (f ○ Φ) · ∣det Φ′∣ = ∑_J ∫_{Φ^{−1}(J)} (f ○ Φ) · ∣det Φ′∣
 ≤ ∑_J ∫_{Φ^{−1}(J)} (M_J(f) ○ Φ) · ∣det Φ′∣
 = ∑_J ∫_J M_J(f)   by the assumption
 = ∑_J M_J(f) vol(J)
 = U(f, P).
As a lower bound of the upper sums, ∫K (f ○Φ)⋅∣ det Φ′ ∣ is at most the integral,
∫_K (f ○ Φ) · ∣det Φ′∣ ≤ ∫_{Φ(K)} f.

A similar argument gives the opposite inequality, making the integrals equal as desired. ⊔⊓
The next result will allow the proof of the change of variable theorem to
decompose the change of variable mapping.
Φ = Γ ○ Ψ

and

∫_{Γ(Ψ(K))} 1 = ∫_{Ψ(K)} ∣det Γ′∣,

then also

∫_{Φ(K)} 1 = ∫_K ∣det Φ′∣.

= ∫_K ∣det(Γ ○ Ψ)′∣ = ∫_K ∣det Φ′∣. ⊔⊓
Proof. Let

T : R^n → R^n
Φ′ : [a, b] → R

can take the value 0 only at a and b. Thus by the intermediate value theorem, Φ′ never changes sign on [a, b]. If Φ′ ≥ 0 on [a, b] then Φ is increasing, and so (using Theorem 6.4.3 for the second equality)

∫_{Φ([a,b])} f = ∫_{Φ(a)}^{Φ(b)} f = ∫_a^b (f ○ Φ) · Φ′ = ∫_{[a,b]} (f ○ Φ) · ∣Φ′∣.

If Φ′ ≤ 0 on [a, b] then Φ is decreasing, and so

∫_{Φ([a,b])} f = ∫_{Φ(b)}^{Φ(a)} f = −∫_{Φ(a)}^{Φ(b)} f = −∫_a^b (f ○ Φ) · Φ′ = ∫_{[a,b]} (f ○ Φ) · ∣Φ′∣. ⊔⊓
At long last we can prove the change of variable theorem for n > 1.
Proof. We may assume the result for dimension n − 1, and we may assume
that K is a box B, that A is an open superset of B, and that Φ ∶ A Ð→ Rn is
a C 1 -mapping such that Φ is injective on A and det Φ′ ≠ 0 on A. We need to
show that
∫_{Φ(B)} 1 = ∫_B ∣det Φ′∣.   (6.17)
Φ = T ○ Γ ○ Ψ,
where Ψ and Γ are C 1 -mappings that fix at least one coordinate and T is a
linear transformation. Note that Ψ , Γ , and T inherit injectivity and nonzero
determinant-derivatives from Φ, so that in particular, T is invertible. Since
the theorem holds for each of Ψ , Γ , and T , it holds for their composition. In
more detail,
T = DΦ_x

and define

Φ̃ = T^{−1} ○ Φ,

so that DΦ̃_x = id_n is the n-dimensional identity map. Introduce the nth projection function, π_n(x₁, …, x_n) = x_n, and further define

Ψ : A → R^n,   Ψ = (Φ̃₁, …, Φ̃_{n−1}, π_n),
Γ : Ψ(A_x) → R^n,   Γ = (π₁, …, π_{n−1}, Φ̃_n ○ Ψ^{−1}).
In contrast to all of this, recall the much easier proof of the one-dimensional
change of variable theorem, using the construction of an antiderivative by in-
tegrating up to a variable endpoint (Theorem 6.4.1, sometimes called the first
fundamental theorem of integral calculus) and using the (second) fundamental
theorem of integral calculus twice,
∫_a^b (f ○ φ) · φ′ = ∫_a^b (F′ ○ φ) · φ′   where F(x) = ∫_a^x f, so F′ = f
 = ∫_a^b (F ○ φ)′   by the chain rule
 = (F ○ φ)(b) − (F ○ φ)(a)   by the FTIC
 = F(φ(b)) − F(φ(a))   by definition of composition
 = ∫_{φ(a)}^{φ(b)} F′   by the FTIC again
 = ∫_{φ(a)}^{φ(b)} f   since F′ = f.
realized version of that proof still has to handle topological issues, but even
so it is more efficient than the long, elementary method of this section.
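For comparison, the one-variable theorem itself is easy to test numerically. The sketch below is an illustration only, with arbitrary sample choices φ(t) = t² + 1 on [0, 2] and f = cos; it confirms that the two sides agree.

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 2.0
phi  = lambda t: t*t + 1     # increasing, so phi([a,b]) = [phi(a), phi(b)]
dphi = lambda t: 2*t

lhs, _ = quad(lambda t: np.cos(phi(t)) * dphi(t), a, b)
rhs, _ = quad(np.cos, phi(a), phi(b))
print(lhs, rhs)   # both equal sin(5) - sin(1) = -1.80039...
```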
Exercises
γ ∶ [0, 1] Ð→ R
such that γ(0) = a and γ(1) = b. Explain why consequently K = [a, b].
6.9.2. (a) Explain to yourself why the three types of rectangle in the proof
of Proposition 6.9.1 are exclusive. Now suppose that the three types are not
exhaustive, i.e., some rectangle J lies partly in K ○ and partly in (B/K)○
without meeting the set ∂K = ∂(B/K). Supply details as necessary for the
following argument. Let x ∈ J lie in K ○ and let x̃ ∈ J lie in (B/K)○ . Define
a function from the unit interval to R by mapping the interval to the line
segment from x to x̃, and then mapping each point of the segment to 1 if
it lies in K and to −1 if it lies in B/K. The resulting function is continuous
on the interval, and it changes sign on the interval, but it does not take the
value 0. This is impossible, so the rectangle J cannot exist.
(b) In the proof of Proposition 6.9.1, show that we may assume that the
partition P is fine enough that all subboxes J of type I and type II lie in U .
(c) In the proof of Proposition 6.9.1, show that given ε > 0, we may assume
that the partition P is fine enough that
f ∶ Rn Ð→ R
briefly touches on the fact that for functions that vanish off a compact set,
C 0 -functions and C 1 -functions and C 2 -functions are well approximated by C ∞ -
functions.
The approximation technology is an integral called the convolution. The
idea is as follows. Suppose that we had a function
δ ∶ Rn Ð→ R
∫_{y∈R^n} f(y) ϕ(x − y) ≈ f(x).
The approximating integral on the left side of the previous display is the
convolution of f and ϕ evaluated at x. Although f is assumed only to be con-
tinuous, the convolution is smooth. Indeed, every xi -derivative passes through
the y-integral and ϕ is smooth, so that
f ∶ Rn Ð→ R.
The support of f is the closure of the set of its inputs that produce nonzero
outputs,
supp(f ) = {x ∈ Rn ∶ f (x) ≠ 0}.
The function f is compactly supported if its support is compact. The class
of compactly supported C k -functions is denoted Cck (Rn ). Especially, Cc0 (Rn )
denotes the class of compactly supported continuous functions.
Each class Cck (Rn ) of functions forms a vector space over R (Exercise 7.1.1).
Figure 7.1 shows a compactly supported C 0 -function on R and its support.
The graph has some corners, so the function is not C 1 .
The class of test functions sits at the end of the chain of containments of function-spaces from a moment ago, and as an intersection of vector spaces over R, the test functions C_c^∞(R^n) again form a vector space over R. In the chain of containments

C_c^0(R^n) ⊃ C_c^1(R^n) ⊃ C_c^2(R^n) ⊃ ⋯ ⊃ C_c^∞(R^n),
all of the containments are proper. Indeed, for a vivid example of the first
containment, Weierstrass showed how to construct a function f of one variable,
having support [0, 1], that is continuous everywhere but differentiable nowhere
on its support. The function of n variables
thus lies in C_c^0(R^n) but not in C_c^1(R^n). Next, the function

f₁(x₁, x₂, …, x_n) = ∫_{t₁=0}^{x₁} f₀(t₁, x₂, …, x_n)
lies in Cc1 (Rn ) but not Cc2 (Rn ), because its first partial derivative is f0 , which
does not have a first partial derivative. Defining f2 as a similar integral of f1
gives a function that lies in Cc2 (Rn ) but not Cc3 (Rn ), and so on. Finally, none
of the functions fk just described lies in Cc∞ (Rn ).
For every k > 0 and every f ∈ Cck (Rn ), the supports of the partial deriva-
tives are contained in the support of the original function,
supp(Dj f ) ⊂ supp(f ), j = 1, . . . , n.
Thus the partial derivative operators Dj take Cck (Rn ) to Cck−1 (Rn ) as sets.
The operators are linear because

D_j(f + g) = D_j f + D_j g,   f, g ∈ C_c^k(R^n),

and

D_j(cf) = c D_j f,   f ∈ C_c^k(R^n), c ∈ R.
In addition, more can be said about the Dj operators. Each space Cck (Rn ) of
functions carries an absolute value function having properties similar to the
absolute value on Euclidean space Rn . With these absolute values in place,
the partial differentiation operators are continuous.
Definition 7.1.3 (C_c^k(R^n) absolute value). The absolute value function on C_c^k(R^n) is

∣ ∣_k : C_c^k(R^n) → R

given by

∣f∣_k = max{ ∣f∣,
             ∣D_j f∣ for j = 1, …, n,
             ∣D_{jj′} f∣ for j, j′ = 1, …, n,
             ⋮
             ∣D_{j₁⋯j_k} f∣ for j₁, …, j_k = 1, …, n }.
That is, ∣f ∣k is the largest absolute value of f or of any derivative of f up to
order k. In particular, ∣ ∣0 = ∣ ∣.
The largest absolute values mentioned in the definition exist by the ex-
treme value theorem, because the relevant partial derivatives are compactly
supported and continuous. By contrast, we have not defined an absolute value
on the space of test functions Cc∞ (Rn ), because the obvious attempt to extend
Definition 7.1.3 to test functions would involve the maximum of an infinite
set, a maximum that certainly need not exist.
Proof. The first two properties are straightforward to check. For the third property, note that for every f, g ∈ C_c^0(R^n) and every x ∈ R^n,

∣(f + g)(x)∣ ≤ ∣f(x)∣ + ∣g(x)∣ ≤ ∣f∣₀ + ∣g∣₀.

That is, ∣f + g∣₀ ≤ ∣f∣₀ + ∣g∣₀. If f, g ∈ C_c^1(R^n) then the same argument shows that also ∣D_j(f + g)∣ ≤ ∣D_j f∣ + ∣D_j g∣ for j = 1, …, n, so that

∣f + g∣₁ = max{ ∣f + g∣, ∣D_j f + D_j g∣ for j = 1, …, n }
        ≤ max{ ∣f∣ + ∣g∣, ∣D_j f∣ + ∣D_j g∣ for j = 1, …, n }
        ≤ max{ ∣f∣, ∣D_j f∣ for j = 1, …, n } + max{ ∣g∣, ∣D_j g∣ for j = 1, …, n }
        = ∣f∣₁ + ∣g∣₁.
Fix any j ∈ {1, …, n}. As a subset of the information in the previous display,

lim_m ∣D_j f_m − D_j f∣ = 0,
lim_m ∣D_{jj′} f_m − D_{jj′} f∣ = 0 for j′ = 1, …, n,
 ⋮
lim_m ∣D_{jj₂…j_k} f_m − D_{jj₂…j_k} f∣ = 0 for j₂, …, j_k = 1, …, n.

That is,

lim_m ∣D_j f_m − D_j f∣_{k−1} = 0.
The implication that we have just proved,

lim_m ∣f_m − f∣_k = 0 ⟹ lim_m ∣D_j f_m − D_j f∣_{k−1} = 0,

is exactly the assertion that D_j : C_c^k(R^n) → C_c^{k−1}(R^n) is continuous, and the proof is complete. ⊔⊓
Again let k ≥ 1. The fact that ∣f ∣k−1 ≤ ∣f ∣k for every f ∈ Cck (Rn ) (Ex-
ercise 7.1.2) shows that for every f ∈ Cck (Rn ) and every sequence {fm }
in Cck (Rn ), if limm ∣fm − f ∣k = 0 then limm ∣fm − f ∣k−1 = 0. That is, the in-
clusion mapping
i ∶ Cck (Rn ) Ð→ Cck−1 (Rn ), i(f ) = f
is continuous.
The space Cc∞ (Rn ) of test functions is closed under partial differentiation,
meaning that the partial derivatives of a test function are again test functions
(Exercise 7.1.3).
In this chapter we will show that just as every real number x ∈ R is ap-
proximated as closely as desired by rational numbers q ∈ Q, every compactly
supported continuous function f ∈ Cck (Rn ) is approximated as closely as de-
sired by test functions g ∈ Cc∞ (Rn ). More precisely, we will show that:
For every f ∈ Cck (Rn ), there exists a sequence {fm } in Cc∞ (Rn ) such
that limm ∣fm − f ∣k = 0.
The fact that lim_m ∣f_m − f∣_k = 0 means that given any ε > 0, there exists a starting index m₀ such that f_m for all m ≥ m₀ uniformly approximates f to within ε up to kth order. That is, for all m ≥ m₀, simultaneously for all x ∈ R^n,

∣D_{j₁⋯j_i} f_m(x) − D_{j₁⋯j_i} f(x)∣ < ε for every order i from 0 to k.
Exercises
7.1.1. Show that each class Cck (Rn ) of functions forms a vector space over R.
7.1.3. Explain why each partial derivative of a test function is again a test
function.
7.1.4. Let {fn } be a sequence of functions in Cc0 (Rn ), and suppose that the
sequence converges, meaning that there exists a function f ∶ Rn Ð→ R such
that limn fn (x) = f (x) for all x ∈ Rn . Must f have compact support? Must f
be continuous?
(See Figure 7.2.) Each x < 0 lies in an open interval on which s is the constant
function 0, and each x > 0 lies in an open interval on which s is a composi-
tion of smooth functions, so in either case all derivatives s(k) (x) exist. More
specifically, for every nonnegative integer k, there exists a polynomial pk (x)
such that the kth derivative of s takes the form
s^{(k)}(x) = { 0 if x < 0,   p_k(x) x^{−2k} e^{−1/x} if x > 0,   ? if x = 0 }.
Only s^{(k)}(0) is in question. However, s^{(0)}(0) = 0, and if we assume that s^{(k)}(0) = 0 for some k ≥ 0 then it follows (because exponential behavior dominates polynomial behavior) that

lim_{h→0⁺} (s^{(k)}(h) − s^{(k)}(0))/h = lim_{h→0⁺} p_k(h) h^{−2k−1} e^{−1/h} = 0,

while the corresponding limit from the left is 0 trivially. That is, s^{(k+1)}(0) exists and equals 0 as well. By induction, s^{(k)}(0) = 0 for all k ≥ 0. Thus s is smooth: each derivative exists, and each derivative is continuous because the next derivative exists as well. But s is not a test function, because its support is not compact: supp(s) = [0, ∞).
Now define

p : R → R,   p(x) = s(x + 1) s(−x + 1) / ∫_{x=−1}^{1} s(x + 1) s(−x + 1).
The graph of p (Figure 7.3) explains the name pulse function. As a product
of compositions of smooth functions, p is smooth. The support of p is [−1, 1],
so p is a test function. Also, p is normalized so that
∫ p = 1.
[−1,1]
The maximum pulse value p(0) is therefore close to 1 because the pulse graph
is roughly a triangle of base 2, but p(0) is not exactly 1. The pulse function
p2 (x, y) = p(x)p(y) from R2 to R, having support [−1, 1]2 , is shown in Fig-
ure 7.4. A similar pulse function p3 on R3 can be imagined as a concentration
of density in a box about the origin.
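The pulse function is easy to realize numerically; the following Python sketch (an illustration, with the normalizing integral computed by quadrature rather than in closed form) confirms that ∫ p = 1 and that p(0) is close to, but not exactly, 1.

```python
import numpy as np
from scipy.integrate import quad

def s(x):
    # the smooth function s: 0 for x <= 0, e^{-1/x} for x > 0
    return np.exp(-1/x) if x > 0 else 0.0

numerator = lambda x: s(x + 1) * s(-x + 1)
c, _ = quad(numerator, -1, 1)        # the normalizing constant

p = lambda x: numerator(x) / c
total, _ = quad(p, -1, 1)
print(total)    # 1.0: the pulse has integral 1
print(p(0.0))   # the maximum pulse value, close to but not exactly 1
```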
Exercises
7.2.1. Since the function s in this section is smooth, it has nth-degree Taylor
polynomials Tn (x) at a = 0 for all nonnegative integers n. (Here n does not
denote the dimension of Euclidean space.) For what x does s(x) = Tn (x)?
7.2.2. Let p be the pulse function defined in this section. Explain why
supp(p) = [−1, 1].
(c) Use the function r from part (b) to give a formula for a test function
that is 0 for x < a, climbs from 0 to 1 for a ≤ x ≤ b, is 1 for b < x < c, drops
from 1 to 0 for c ≤ x ≤ d, and is 0 for d < x.
7.3 Convolution
This section shows how to construct test functions from Cc0 (Rn )-functions. In
preparation, we introduce a handy piece of notation.
Definition 7.3.1 (Sum, difference of two sets). Let S and T be subsets
of Rn . Their sum is the set consisting of all sums of a point of S plus a point
of T ,
S + T = {s + t ∶ s ∈ S, t ∈ T }.
Their difference is similarly
S − T = {s − t ∶ s ∈ S, t ∈ T }.
(This is not to be confused with the set difference S/T = {s ∈ S : s ∉ T}.)
Returning to Cc0 (Rn )-functions, every such function can be integrated over
all of Rn .
Definition 7.3.2 (Integral of a Cc0 (Rn )-function). Let f ∈ Cc0 (Rn ). The
integral of f is the integral of f over any box that contains its support,
∫ f = ∫_B f   where supp(f) ⊂ B.
In Definition 7.3.2 the integral on the right side exists by Theorem 6.3.1.
Also, the integral on the right side is independent of the suitable box B, always
being the integral over the intersection of all such boxes, the smallest suitable
box. Thus the integral on the left side exists and is unambiguous. We do not
bother writing ∫Rn f rather than ∫ f , because it is understood that by default
we are integrating f over Rn .
Definition 7.3.3 (Mollifying kernel). Let f ∈ Cc0 (Rn ) be a compactly sup-
ported continuous function, and let ϕ ∈ Cc∞ (Rn ) be a test function. The mol-
lifying kernel associated to f and ϕ is the function
κ ∶ Rn × Rn Ð→ R, κ(x, y) = f (y)ϕ(x − y).
For every fixed x ∈ Rn , the corresponding cross section of the mollifying kernel
is denoted κx ,
κx ∶ Rn Ð→ R, κx (y) = κ(x, y).
For each x ∈ Rn , the mollifying kernel κx (y) can be nonzero only if y ∈
supp(f ) and x − y ∈ supp(ϕ). It follows that
supp(κx ) ⊂ supp(f ) ∩ ({x} − supp(ϕ)).
Therefore κx is compactly supported. (Figure 7.5 shows an example of the
multiplicands f (y) and ϕ(x−y) of κx (y), and Figure 7.6 shows their compactly
supported product.) Also, since f and ϕ are continuous, κx is continuous.
That is, for each x, the mollifying kernel κx viewed as a function of y again
lies in Cc0 (Rn ), making it integrable by Theorem 6.3.1.
The mollifying kernel is so named for good reason. First, it is a kernel in
the sense that we integrate it to get a new function.
Definition 7.3.4 (Convolution). Let f ∈ C_c^0(R^n) and let ϕ ∈ C_c^∞(R^n). The convolution of f and ϕ is the function defined by integrating the mollifying kernel,

f ∗ ϕ : R^n → R,   (f ∗ ϕ)(x) = ∫_{y∈R^n} κ_x(y) = ∫_{y∈R^n} f(y) ϕ(x − y).

D_j(f ∗ ϕ) = f ∗ D_j ϕ,   j = 1, …, n,
Proof. Fix some j in {1, …, n}. The mean value theorem at the jth coordinate gives for all a ∈ R^n and all nonzero h ∈ R,

ϕ(a + he_j) − ϕ(a) = D_j ϕ(a + the_j) h for some t ∈ (0, 1).

Because D_j ϕ is continuous and compactly supported, it is uniformly continuous, and so given ε > 0 there exists some δ_j > 0, independent of a, such that ∣D_j ϕ(a + the_j) − D_j ϕ(a)∣ < ε whenever ∣h∣ < δ_j. Thus

∣h∣ < δ_j ⟹ ∣(ϕ(a + he_j) − ϕ(a))/h − D_j ϕ(a)∣ < ε.

After running the argument of the previous paragraph for j = 1, …, n, define δ = min{δ₁, …, δ_n}. Then for all nonzero h ∈ R and for each j ∈ {1, …, n}, if ∣h∣ < δ then ∣h∣ < δ_j. This implication combines with the previous display to give the result. ⊔⊓
x = y + z, y ∈ supp(f ), z ∈ supp(ϕ).
That is, the integrand is always zero if x ∉ supp(f ) + supp(ϕ) (see Figure 7.7).
Hence,
supp(f ∗ ϕ) ⊂ supp(f ) + supp(ϕ).
Figure 7.7. The mollifying kernel is zero for x outside supp(f) + supp(ϕ)
Since the integral is being taken over some box B, the equality follows from
Proposition 6.6.2. But we prove it using other methods, for reasons that will
emerge later in the chapter. The function f is bounded, say by R, so we can
estimate that for every x ∈ Rn and every nonzero h ∈ R and every j,
Assuming that ∣h∣ < 1, the support of the integrand as a function of y lies in
the bounded set
{x + tej ∶ −1 < t < 1} − supp(ϕ),
and therefore the integral can be taken over some box B. By the unifor-
mity lemma, given any ε > 0, for all small enough h the integrand is less
than ε/(R vol(B)) uniformly in y. Consequently the integral is less than ε/R.
In sum, given any ε > 0, for all small enough h we have
Since x is arbitrary, this gives the desired result for first-order partial deriva-
tives,
Dj (f ∗ ϕ) = f ∗ Dj ϕ, j = 1, . . . , n.
As for higher-order partial derivatives, note that Dj ϕ ∈ Cc∞ (Rn ) for each j.
So the same result for second-order partial derivatives follows,
Djj ′ (f ∗ ϕ) = Dj ′ (f ∗ Dj ϕ) = f ∗ Djj ′ ϕ, j, j ′ = 1, . . . , n,
and so on. ⊔⊓
If the function f lies in the subspace Cc1 (Rn ) of Cc0 (Rn ) then the partial
derivatives of the convolution pass through the integral to f as well as to ϕ.
That is, for differentiable functions, the derivative of the convolution is the
convolution of the derivative.
Corollary 7.3.7. Let k ≥ 1, let f ∈ C_c^k(R^n), and let ϕ ∈ C_c^∞(R^n). Then the derivatives of the convolution up to order k pass through to f,

D_{j₁⋯j_i}(f ∗ ϕ) = D_{j₁⋯j_i} f ∗ ϕ,   1 ≤ i ≤ k, j₁, …, j_i = 1, …, n.
Proof. Since

(f ∗ ϕ)(x) = ∫_y f(y) ϕ(x − y),

the substitution y ↦ x − y shows that also

(f ∗ ϕ)(x) = ∫_y f(x − y) ϕ(y).
Now the proof of the proposition works with the roles of f and ϕ exchanged
to show that Dj (f ∗ ϕ) = Dj f ∗ ϕ for j = 1, . . . , n. (Here is where it is relevant
that the uniformity lemma requires only a Cc1 (Rn )-function rather than a test
function.) Similarly, if f ∈ Cc2 (Rn ) then because Dj f ∈ Cc1 (Rn ) for j = 1, . . . , n,
it follows that
Djj ′ (f ∗ ϕ) = Djj ′ f ∗ ϕ, j, j ′ = 1, . . . , n.
Consider a function f ∈ Cc0 (Rn ). Now that we know that every convolution
f ∗ ϕ (where ϕ ∈ Cc∞ (Rn )) lies in Cc∞ (Rn ), the next question is to what extent
the test function f ∗ ϕ resembles the original compactly supported continuous
function f . As already noted, for every x, the integral
(f ∗ ϕ)(x) = ∫_y f(y) ϕ(x − y)
Exercises
7.3.1. (a) Show that the sum of two compact sets is compact.
(b) Let B(a, r) and B(b, s) be open balls. Show that their sum is B(a +
b, r + s).
(c) Recall that there are four standard axioms for addition, either in the
context of a field or a vector space. Which of the four axioms are satisfied by
set addition, and which are not?
(d) Let 0 < a < b. Let A be the circle of radius b in the (x, y)-plane, centered
at the origin. Let B be the closed disk of radius a in the (x, z)-plane, centered
at (b, 0, 0). Describe the sum A + B.
7.3.2. Let f ∈ Cc0 (Rn ), and let ϕ ∈ Cc∞ (Rn ). Assume that ϕ ≥ 0, i.e., all
output values of ϕ are nonnegative, and assume that ∫ ϕ = 1. Suppose that R
bounds f , meaning that ∣f (x)∣ < R for all x. Show that R also bounds f ∗ ϕ.
{ϕ_m} = {ϕ₁, ϕ₂, ϕ₃, …}

such that:
(1) Each ϕ_m is nonnegative, i.e., each ϕ_m maps R^n to R_{≥0}.
(2) Each ϕ_m has integral 1, i.e., ∫ ϕ_m = 1 for each m.
(3) The supports of the ϕ_m shrink to {0}, i.e.,

supp(ϕ₁) ⊃ supp(ϕ₂) ⊃ ⋯ and ⋂_{m=1}^∞ supp(ϕ_m) = {0}.

For example, with p_n(x) = p(x₁)⋯p(x_n) the n-dimensional pulse built from the previous section, one such sequence is ϕ_m(x) = m^n p_n(mx). Then supp(ϕ_m) = [−1/m, 1/m]^n for each m. Here the coefficient m^n is chosen such that ∫ ϕ_m = 1 (Exercise 7.4.1). Figure 7.8 shows the graphs of ϕ₂, ϕ₄, ϕ₈, and ϕ₁₅ when n = 1. The first three graphs have the same vertical scale, but not the fourth. Figure 7.9 shows the graphs of ϕ₁ through ϕ₄ when n = 2, all having the same vertical scale.
Figure 7.8. The functions ϕ₂, ϕ₄, ϕ₈, and ϕ₁₅ from an approximate identity

The identity being approximated by the sequence of test functions {ϕ_m} is the Dirac delta function from the chapter introduction, denoted δ. To repeat
ideas from the introduction, δ is conceptually a unit point mass at the origin,
and so its properties should be
supp(δ) = {0}, ∫ δ = 1.
No such function exists in the orthodox sense of the word function. But regard-
less of sense, for every function f ∶ Rn Ð→ R and every x ∈ Rn , the mollifying
kernel associated to f and δ,
κx (y) = f (y)δ(x − y),
is conceptually a point of mass f (x) at each x. That is, its properties should
be
supp(κ_x) = {x},   (f ∗ δ)(x) = ∫_y κ_x(y) = f(x).
Under a generalized notion of function, the Dirac delta makes perfect sense as
an object called a distribution, defined by the integral in the previous display
but only for a limited class of functions:
for all x, (f ∗ δ)(x) = f (x) for test functions f .
Yes, now it is f that is restricted to be a test function. The reason for this is
that δ is not a test function, not being a function at all, and to get a good
theory of distributions such as δ, we need to restrict the functions that they
convolve with. In sum, the Dirac delta function is an identity in the sense that
f ∗ δ = f for test functions f .
Distribution theory is beyond the scope of these notes, but we may conceive
of the identity property of the Dirac delta function as the expected limiting
behavior of any test approximate identity. That is, returning to the environ-
ment of f ∈ Cc0 (Rn ) and taking any test approximate identity {ϕm }, we expect
that
lim(f ∗ ϕm ) = f for Cc0 (Rn )-functions f .
m
As explained in Section 7.1, this limit will be uniform, meaning that the values
(f ∗ ϕm )(x) will converge to f (x) at one rate simultaneously for all x in Rn .
See Exercise 7.4.3 for an example of nonuniform convergence.
For an example of convolution with elements of a test approximate identity,
consider the sawtooth function
f : R → R,   f(x) = { ∣x∣ if ∣x∣ ≤ 1/4,   1/2 − ∣x∣ if 1/4 < ∣x∣ ≤ 1/2,   0 if 1/2 < ∣x∣ }.
Recall the test approximate identity {ϕm } from after Definition 7.4.1. Fig-
ure 7.10 shows f and its convolutions with ϕ2 , ϕ4 , ϕ8 , and ϕ15 . The convo-
lutions approach the original function while smoothing its corners, and the
convolutions are bounded by the bound on the original function as shown in
Exercise 7.3.2. Also, the convolutions have larger supports than the original
function, but the supports shrink toward the original support as m grows.
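The convergence pictured in Figure 7.10 can be imitated with a discrete computation. The sketch below is an illustration: the integral is replaced by a Riemann sum on a fine grid, and the one-dimensional ϕ_m(x) = m p(mx) is built from the pulse p.

```python
import numpy as np

def s(t):
    return np.where(t > 0, np.exp(-1/np.where(t > 0, t, 1.0)), 0.0)

dx = 1e-3
grid = np.arange(-2, 2, dx)

pulse = s(grid + 1) * s(1 - grid)          # un-normalized pulse on [-1, 1]
pulse /= pulse.sum() * dx                  # normalize: integral 1

def f(x):                                  # the sawtooth function
    ax = np.abs(x)
    return np.where(ax <= 0.25, ax, np.where(ax <= 0.5, 0.5 - ax, 0.0))

for m in (2, 4, 8, 15):
    phi_m = m * np.interp(m*grid, grid, pulse)      # phi_m(x) = m p(m x)
    conv = np.convolve(f(grid), phi_m, mode="same") * dx
    print(m, np.abs(conv - f(grid)).max())          # sup-error shrinks with m
```

The printed errors shrink roughly in proportion to the support radius 1/m, in keeping with the uniform convergence established below.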
The following lemma says that if compact sets shrink to a point, then
eventually they lie inside any given ball about the point. Specifically, the sets
that we have in mind are the supports of a test approximate identity.
Let

{S_m} = {S₁, S₂, S₃, …}

be a sequence of compact subsets of R^n such that

S₁ ⊃ S₂ ⊃ S₃ ⊃ ⋯ and ⋂_{m=1}^∞ S_m = {0}.

Then for every δ > 0 there exists some positive integer m₀ such that

m ≥ m₀ ⟹ S_m ⊂ B(0, δ).
Proof. Let δ > 0 be given. If no S_m lies in B(0, δ) then there exist points

x₁ ∈ S₁/B(0, δ),
x₂ ∈ S₂/B(0, δ),
x₃ ∈ S₃/B(0, δ),
 ⋯
m ≥ m0 Ô⇒ ∣f ∗ ϕm − f ∣ < ε.
That is, the convolutions f ∗ϕm converge uniformly to the original function f .
Proof. Let ε > 0 be given. Since the support of f is compact, f is uniformly
continuous on its support, and hence f is uniformly continuous on all of Rn .
So there exists some δ > 0 such that for all x, y ∈ R^n,

∣y − x∣ < δ ⟹ ∣f(y) − f(x)∣ < ε.

Use the fact that the approximate identity functions ϕ_m are nonnegative and integrate to 1 to estimate that for all x ∈ R^n and all positive integers m,

∣(f ∗ ϕ_m)(x) − f(x)∣ = ∣∫_y (f(y) − f(x)) ϕ_m(x − y)∣ ≤ ∫_y ∣f(y) − f(x)∣ ϕ_m(x − y);

once m is large enough that supp(ϕ_m) ⊂ B(0, δ), the integrand is supported where ∣y − x∣ < δ, and so the right side is at most ε ∫ ϕ_m = ε.
This is the desired result. Note how the argument has used all three defining
properties of the approximate identity. ⊔⊓
m ≥ m0 Ô⇒ ∣f ∗ ϕm − f ∣k < ε.
That is, the convolutions and their derivatives converge uniformly to the orig-
inal function and its derivatives up to order k.
Proof. Recall from Corollary 7.3.7 that if f ∈ Cc1 (Rn ) then for every test func-
tion ϕ, the derivative of the convolution is the convolution of the derivative,
Dj (f ∗ ϕ) = Dj f ∗ ϕ, j = 1, . . . , n.
Since the derivatives Dj f lie in Cc0 (Rn ), the theorem says that their convo-
lutions Dj f ∗ ϕm converge uniformly to the derivatives Dj f as desired. The
argument for higher derivatives is the same. ⊔
⊓
Exercises
Also define

f : [0, 1] → R,   f(x) = 0 if 0 ≤ x < 1,   f(1) = 1.
(a) Using one set of axes, graph f1 , f2 , f3 , f10 , and f .
(b) Show that for every x ∈ [0, 1], limm fm (x) = f (x). That is, given ε > 0,
there exists some positive integer m₀ such that for all positive integers m,

m ≥ m₀ ⟹ ∣f_m(x) − f(x)∣ < ε.

Thus the function f is the limit of the sequence of functions {f_m}. That is, f_m → f pointwise on [0, 1].
(c) Now let ε = 1/2. Show that for every positive integer m, no matter how
large, there exists some corresponding x ∈ [0, 1] such that ∣fm (x) − f (x)∣ ≥ ε.
That is, no single m₀ makes ∣f_m(x) − f(x)∣ < 1/2 for all m ≥ m₀ and all x ∈ [0, 1] at once. Thus the convergence of {f_m} to f is not uniform, i.e., the functions do not
converge to the limit-function at one rate simultaneously for all x ∈ [0, 1].
f ∶ Rn Ð→ R.
∫ f = ∫_B f   where supp(f) ⊂ B.
Similarly to the remarks after Definition 7.3.2, the integral on the right side
exists, but this time by Theorem 6.5.4. The integral on the right side is inde-
pendent of the box B, and so the integral on the left side exists, is unambigu-
ous, and is understood to be the integral of f over all of Rn .
The convolution remains sensible when f is known-integrable. That is, if
f ∈ I_c(R^n) and ϕ ∈ C_c^∞(R^n) then for each x ∈ R^n the mollifying kernel

κ_x : R^n → R,   κ_x(y) = f(y) ϕ(x − y)

is again integrable, so that the convolution (f ∗ ϕ)(x) = ∫_y κ_x(y) is sensible.
The formulas for convolution derivatives remain valid as well. That is, if f ∈ I_c(R^n) and ϕ ∈ C_c^∞(R^n) then also f ∗ ϕ ∈ C_c^∞(R^n), and

D_j(f ∗ ϕ) = f ∗ D_j ϕ,   j = 1, …, n,
D_{jj′}(f ∗ ϕ) = f ∗ D_{jj′} ϕ,   j, j′ = 1, …, n,
and so on. Here is where it is relevant that our proof of Proposition 7.3.5
required only that each κx be integrable, that f be bounded, and that ϕ lie
in Cc1 (Rn ).
Given a known-integrable function f ∈ Ic (Rn ) and a test approximate
identity {ϕm }, we would like the convolutions {f ∗ ϕm } to approximate f uni-
formly as m grows. But the following proposition shows that this is impossible
when f has discontinuities.
Proposition 7.5.2 (The uniform limit of continuous functions is continuous). Let

f_m : R^n → R,   m = 1, 2, 3, …

be a sequence of continuous functions that converges uniformly to a limit function

f : R^n → R.

Then f is continuous as well.

Proof. For every two points x, x̃ ∈ R^n and for every positive integer m we have

∣f(x̃) − f(x)∣ ≤ ∣f(x̃) − f_m(x̃)∣ + ∣f_m(x̃) − f_m(x)∣ + ∣f_m(x) − f(x)∣.

Let ε > 0 be given. For all m large enough, the first and third terms are less than ε/3 regardless of the values of x and x̃. Fix such a value of m, and fix x. Then since f_m is continuous, the middle term is less than ε/3 if x̃ is close enough to x. It follows that

∣f(x̃) − f(x)∣ < ε for all x̃ close enough to x.

That is, f is continuous. ⊔⊓
Proof. Since K is compact, it lies in some ball B(0, R). Solving the problem
with the open set A ∩ B(0, R) in place of A also solves the original problem.
Having replaced A by A ∩ B(0, R), define a function on K that takes positive real values,

d : K → R_{>0},   d(a) = sup{r > 0 : B(a, r) ⊂ A}.

The fact that we have shrunk A (if necessary) to lie inside the ball has ensured that d is finite, because specifically d(a) ≤ R for all a. Fix some a ∈ K and let r = d(a). Let {r_m} be a strictly increasing sequence of positive real numbers such that lim_m r_m = r. Then B(a, r_m) ⊂ A for each m, and so
B(a, r) = ⋃_{m=1}^∞ B(a, r_m) ⊂ A.
The function d is continuous. To see this, fix some point a ∈ K and let
r = d(a). Consider also a second point ã ∈ K such that ∣ã − a∣ < r, and let
r̃ = d(ã). Then
B(ã, r − ∣ã − a∣) ⊂ B(a, r) ⊂ A,
showing that r̃ ≥ r − ∣ã − a∣. Either r̃ ≤ r + ∣ã − a∣, or r̃ > r + ∣ã − a∣ ≥ r so that also
∣ã − a∣ < r̃ and the same argument shows that r ≥ r̃ − ∣ã − a∣, i.e., r̃ ≤ r + ∣ã − a∣
after all. That is, we have shown that for every a ∈ K,
ã ∈ K and ∣ã − a∣ < d(a) ⟹ ∣d(ã) − d(a)∣ ≤ ∣ã − a∣.
That is, the convolutions converge uniformly to the original function on com-
pact subsets of open sets where the function is continuous.
Proof. Let ε > 0 be given. By the thickening lemma, there exists some r > 0
such that f is continuous on K + B(0, r). Hence f is uniformly continuous on
K + B(0, r). That is, there exists δ > 0 (with δ < r) such that for all x ∈ K and
all y ∈ Rn ,
∣y − x∣ < δ Ô⇒ ∣f (y) − f (x)∣ < ε.
There exists some positive integer m0 such that for all integers m ≥ m0 ,
supp(ϕm ) ⊂ B(0, δ). For all x ∈ K, all y ∈ Rn , and all m ≥ m0 ,
From here, the proof is virtually identical to the proof of Theorem 7.4.3. ⊔⊓
Note that f lies in Ic (Rn ) rather than in Cc0 (Rn ) because of its discontinuities
at x = ±1/2. Figure 7.11 shows f and its convolutions with ϕ2 , ϕ4 , ϕ8 , and ϕ15 .
The convolutions converge uniformly to the truncated parabola on compact
sets away from the two points of discontinuity. But the convergence is not well
behaved at or near those two points. Indeed, the function value f (±1/2) = 1/4
rather than f (±1/2) = 0 is arbitrary and has no effect on the convolution
in any case. And again the convolutions are bounded by the bound on the
original function, and their supports shrink toward the original support as m
grows.
The straightedge constructs the line that passes through two given points in
the Euclidean plane. The compass constructs the circle that is centered at
a given point and has a given distance as its radius. A finite succession of
straightedge and compass constructions is called a Euclidean construction.
Physical straightedge and compass constructions are imprecise. Further-
more, there is really no such thing as a straightedge: aside from having to be
infinite, the line-constructor somehow requires a prior line for its own con-
struction. But we don’t concern ourselves with the details of actual tools for
drawing lines and circles. Instead we imagine the constructions to be ideal,
and we focus on the theoretical question of what Euclidean constructions can
or cannot accomplish.
With computer graphics being a matter of course to us today, the techno-
logical power of Euclidean constructions, however idealized, is underwhelming,
and so one might reasonably wonder why they deserve study. One point of this
section is to use the study of Euclidean constructions to demonstrate the idea
of investigating the limitations of a technology. That is, mathematical reason-
ing of one sort (in this case, algebra) can determine the capacities of some
other sort of mathematical technique (in this case, Euclidean constructions).
In a similar spirit, a subject called Galois theory uses the mathematics of fi-
nite group theory to determine the capacities of solving polynomial equations
by radicals.
In a high-school geometry course one should learn that Euclidean con-
structions have the capacity to
• bisect an angle,
• bisect a segment,
• draw the line through a given point and perpendicular to a given line,
• and draw the line through a given point and parallel to a given line.
These constructions (Exercise 8.1.1) will be taken for granted here.
Two classical problems of antiquity are trisecting the angle and doubling
the cube. This section will argue algebraically that neither of these problems
can be solved by Euclidean constructions, and then the second point of this
section is to introduce particular curves—and methods to generate them—
that solve the classical problems where Euclidean constructions fail to do so.
Take any two distinct points in the plane and denote them 0 and 1. Use
the straightedge to draw the line through them. We may as well take the
line to be horizontal with 1 appearing to the right of 0. Now define a real
number r as Euclidean if we can locate it on our number line with a Euclidean
construction. For instance, it is clear how the compass constructs the integers
from 0 to any specified n, positive or negative, in finitely many steps. Thus
the integers are Euclidean. Further, we can add an orthogonal line through
any integer. Repeating the process on such orthogonal lines gives us as much
of the integer-coordinate grid as we want.
Proposition 8.1.1. The Euclidean numbers form a subfield of R. That is, 0
and 1 are Euclidean, and if r and s are Euclidean, then so are r ± s, rs, and
(if s ≠ 0) r/s.
Proof. We have already constructed 0 and 1, and given any r and s it is
easy to construct r ± s. If s ≠ 0 then the construction shown in Figure 8.1
produces r/s. Finally, to construct rs when s ≠ 0, first construct 1/s, and then
rs = r/(1/s) is Euclidean as well. ⊔⊓
Figure 8.1. Constructing r/s
Let E denote the field of Euclidean numbers. Since Q is the smallest sub-
field of R, it follows that Q ⊂ E ⊂ R. The questions are whether E is no more
than Q, whether E is all of R, and—assuming that in fact E lies properly be-
tween Q and R—how we can describe the elements of E. The next proposition
shows that E is a proper superfield of Q.
Proposition 8.1.2. If c ≥ 0 is constructible, i.e., if c ∈ E, then so is √c.

Figure 8.2. Constructing √c
C₁ : x² + y² + a₁x + b₁y + c₁ = 0,
C₂ : x² + y² + a₂x + b₂y + c₂ = 0

(−ey − f)² + y² + a₁(−ey − f) + b₁y + c₁ = 0,
the field E is the set of numbers expressible in finitely many field and
square root operations starting from Q.
Now we can dispense with the two classical problems mentioned earlier.
Proof. Indeed, the side satisfies the relation x³ − 2 = 0, which again has no quadratic factors. ⊔⊓
Figure 8.3. A conchoid
meets the circle, and the two horizontal distances labeled x are equal by the
nature of the cissoid. Continuing to work in the right half of the figure, we
see that the right triangle with base x and height y is similar to the two other
right triangles, and the analysis of the left half of the figure has shown that
the unlabeled vertical segment in the right half has height (2 − x)/2. Thus the
similar right triangles give the relations

y/x = x/((2 − x)/2)   and   y/x = (2 − x)/y.

It follows that

y²/x = 2 − x   and   y/x² = 2/(2 − x).

Multiply the two equalities to get

(y/x)³ = 2.
That is, multiplying the sides of a cube by y/x doubles the volume of the
cube, as desired.
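The two displayed relations pin down the intersection point numerically as well; the following sketch (an illustration) solves them and confirms that the cube of y/x is 2.

```python
from scipy.optimize import fsolve

def relations(v):
    x, y = v
    return [y*y/x - (2 - x),        # y^2/x = 2 - x
            y/(x*x) - 2/(2 - x)]    # y/x^2 = 2/(2 - x)

x, y = fsolve(relations, [0.8, 1.0])
print(x, y, (y/x)**3)   # (y/x)^3 = 2.0 up to rounding
```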
Exercises
8.1.1. Show how straightedge and compass constructions bisect an angle, bi-
sect a segment, draw the line through point P perpendicular to line L, and
draw the line through point P parallel to line L.
8.1.2. What tacit assumption does the proof of Proposition 8.1.2 make
about c? Complete the proof for constructible c ≥ 0 not satisfying the as-
sumption.
8.1.3. Show that for every subfield F of R, every line in F has equation ax + by + c = 0 with a, b, c ∈ F; show that every circle in F has equation x² + y² + ax + by + c = 0 with a, b, c ∈ F. Are the converses to these statements true? If the line passes through the point p in direction d, what are the relations between p, d and a, b, c? If the circle has center p and radius r, what are the relations between p, r and a, b, c?
8.1.4. (a) If L₁ and L₂ are nonparallel lines in F, show that L₁ ∩ L₂ is a point with coordinates in F.
(b) If C₁ and C₂ are distinct intersecting circles in F with equations x² + y² + a₁x + b₁y + c₁ = 0 for C₁ and similarly for C₂, show that C₁ ∩ C₂ is equal to C₁ ∩ L, where L is the line in F with equation (a₁ − a₂)x + (b₁ − b₂)y + (c₁ − c₂) = 0 obtained by subtracting the two circle equations.
In physical terms, this definition is a curvy version of the familiar idea that
distance equals speed times time. For a more purely mathematical definition
of a curve’s arc length, we should take the limit of the lengths of inscribed
polygonal paths. Take a partition t0 < t1 < ⋯ < tn of the parameter interval
[t, t′ ], where t0 = t and tn = t′ . The partition determines the corresponding
points on the curve, α(t0 ), α(t1 ), . . . , α(tn ). The arc length should be the
limit of the sums of the lengths of the line segments joining the points,
L(t, t′) = lim_{n→∞} ∑_{k=1}^n ∣α(t_k) − α(t_{k−1})∣.
rectifiable. For that matter, the image of a continuous curve need not match
our intuition of a curve. For instance, there is a continuous mapping from the
closed interval [0, 1] to all of the square [0, 1] × [0, 1], a so-called area-filling
curve. In any case, we will continue to assume that our curves are smooth,
and we will use the integral definition of arc length.
For example, the helix is the curve α ∶ R Ð→ R3 where
The length of the cycloid as the parameter varies from 0 to some angle θ is

L(0, θ) = ∫_{t=0}^θ 2 sin(t/2) dt = 4 ∫_{t=0}^θ sin(t/2) d(t/2) = 4 ∫_{τ=0}^{θ/2} sin(τ) dτ = 4 − 4 cos(θ/2),   0 ≤ θ ≤ 2π.

In particular, a full arch of the cycloid has length 8.
The cycloid has amazing properties. Upside down, it is the brachis-
tochrone, the curve of steepest descent, meaning that it is the curve between
two given points along which a bead slides (without friction) most quickly.
Upside down, it is also the tautochrone, meaning that a bead starting from
any point slides (without friction) to the bottom in the same amount of time.
For another property of the cycloid, suppose that a weight swings from a string
4 units long suspended at the origin, between two upside-down cycloids. The
right-hand upside-down cycloid is
But since 0 ≤ θ ≤ π, we may carry out the following calculation, in which all quantities under square root signs are nonnegative and so is the evaluation of the square root at the last step,

cot(θ/2) = cos(θ/2)/sin(θ/2) = √((1 + cos θ)/2) / √((1 − cos θ)/2) = √((1 + cos θ)/(1 − cos θ))
        = √((1 + cos θ)²/((1 − cos θ)(1 + cos θ))) = √((1 + cos θ)²/(1 − cos²θ)) = (1 + cos θ)/sin θ.

And so now

α(θ) = (θ − sin θ, cos θ − 1) + 2 · ((1 + cos θ)/sin θ) · (1 − cos θ, −sin θ)
     = (θ − sin θ, cos θ − 1) + 2(sin θ, −1 − cos θ)
     = (θ + sin θ, −3 − cos θ).
On the other hand, the right half of the original upside-down cycloid is

C(t) = (t − sin t, cos t − 1),   π ≤ t ≤ 2π.

These are identical: α(θ) + (π, 2) = C(θ + π) for 0 ≤ θ ≤ π. That is, the weight swings along the trace of a cycloid congruent to the two others. Since the upside-down cycloid is the tautochrone, this idea was used by Huygens to attempt to design pendulum-clocks that would work on ships despite their complicated motion.
The area under one arch of the cycloid is the integral

∫_{x=0}^{2π} y(x) dx

where y(x) is the function that takes the x-coordinate of a point of the cycloid and returns its y-coordinate. As the cycloid parameter θ varies from 0 to 2π, so does the x-coordinate of the cycloid-point,

x = x(θ) = θ − sin θ,

and the parametrization of the cycloid tells us that even without knowing y(x), we know that

y(x(θ)) = 1 − cos θ.

Thus the area under one arch of the cycloid is

∫_{θ=0}^{2π} (1 − cos θ)(1 − cos θ) dθ = ∫_{θ=0}^{2π} (1 − 2 cos θ + cos²θ) dθ = 3π.
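Both the arch length 8 and the arch area 3π succumb to a quick numerical check; the sketch below (an illustration) uses the parametrization (θ − sin θ, 1 − cos θ) established above.

```python
import numpy as np
from scipy.integrate import quad

# |alpha'(t)| = |(1 - cos t, sin t)| = 2 sin(t/2)
speed = lambda t: np.hypot(1 - np.cos(t), np.sin(t))
length, _ = quad(speed, 0, 2*np.pi)
print(length)              # 8.0

area, _ = quad(lambda t: (1 - np.cos(t))**2, 0, 2*np.pi)
print(area, 3*np.pi)       # both 9.42477...
```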
where now the line L is {x = b}, rotating the conchoid a quarter turn clockwise from before, and where the parameter θ is the usual angle from the polar coordinate system. Every point (x, y) on the conchoid satisfies the equation

(x² + y²)(x − b)² = d²x².

The cissoid is similarly parametrized by

α : R → R²,   α(t) = (2at²/(1 + t²), 2at³/(1 + t²)),

where the parameter t is tan θ, with θ being the usual angle from the polar coordinate system.
Exercises
8.2.1. (a) Let α ∶ I Ð→ Rn be a regular curve that doesn’t pass through the
origin, but has a point α(t0 ) of nearest approach to the origin. Show that
the position vector α(t0 ) and the velocity vector α′ (t0 ) are orthogonal. (Hint:
If u, v ∶ I Ð→ Rn are differentiable then ⟨u, v⟩′ = ⟨u′ , v⟩ + ⟨u, v ′ ⟩—this follows
quickly from the one-variable product rule.) Does the result agree with your
geometric intuition?
(b) Find a regular curve α ∶ I Ð→ Rn that does not pass through the origin
and does not have a point of nearest approach to the origin. Does an example
exist with I compact?
8.2.2. Let α be a regular parametrized curve with α′′ (t) = 0 for all t ∈ I. What
is the nature of α?
8.2.4. (a) Verify the parametrization of the conchoid given in this section.
(b) Verify the relation (x2 + y 2 )(x − b)2 = d2 x2 satisfied by points on the
conchoid.
8.2.5. (a) Verify the parametrization of the cissoid given in this section. Is
this parametrization regular? What happens to α(t) and α′ (t) as t → ∞?
(b) Verify Newton’s organic generation of the cissoid.
Recall that the trace of a curve is the set of points on the curve. Thinking of a curve as time-dependent traversal makes it clear that different curves may well have the same trace. That is, different curves can describe different motions along the same path. For example, four curves α, β, γ, and δ can all have the unit circle as their trace while their traversals of the circle are different: α traverses it once counterclockwise at unit speed, β traverses it five times counterclockwise at speed 5, γ traverses it once clockwise at unit speed, and δ traverses it once counterclockwise at increasing speed.
Among the four traversals, α and δ are somehow basically the same, mov-
ing from the same starting point to the same ending point in the same direc-
tion, never stopping or backing up. The similarity suggests that we should be
able to modify one into the other. On the other hand, β and γ seem essentially
different from α and from each other. The following definition describes the
idea of adjusting a curve without changing its traversal in any essential way.
α ∼ β,
Also, φ′ (s) = 1/(s + 1) is positive for all s ∈ I. Again recalling the examples α
and δ, the calculation
τ =t
L(s, s′ ) = s′ − s.
F ∶ [a, b] Ð→ R, F (x) = ∫
x
f.
a
∣α′ (τ )∣ dτ.
t
ℓ(t) = ∫
τ =t0
Since ∣α′ (t)∣ > 0 for all t because α is regular, it follows that
Exercises
8.3.1. Show that the equivalence “∼” on curves is reflexive, symmetric, and
transitive.
(where a > 0 and b < 0 are real constants) is called a logarithmic spiral.
(a) Show that as t → +∞, α(t) spirals in toward the origin.
(b) Show that as t → +∞, L(0, t) remains bounded. Thus the spiral has
finite length.
T = γ′.
394 8 Parametrized Curves
So to first order, the curve is moving in the T -direction. Its normal vec-
tor N (s) is the 90-degree counterclockwise rotation of T (s). Thus the Frenet
frame {T, N } is a positive basis of R2 consisting of orthogonal unit vectors.
Before proceeding, we need to establish two handy little facts that hold in
every dimension n.
Lemma 8.4.1. (a) Let v ∶ I Ð→ Rn be a smooth mapping such that ∣v(t)∣ = c
(where c is constant) for all t. Then
⟨v, v ′ ⟩ = 0.
T ′ = κN, κ = κ(s) ∈ R.
T ′ = ⟨T ′ , T ⟩T + ⟨T ′ , N ⟩N,
N ′ = ⟨N ′ , T ⟩T + ⟨N ′ , N ⟩N.
The condition T ′ = κN shows that the top row inner products are ⟨T ′ , T ⟩ = 0
and ⟨T ′ , N ⟩ = κ. Since N is a unit vector, ⟨N ′ , N ⟩ = 0 by part (a) of the
lemma, and since T and N are orthogonal, ⟨N ′ , T ⟩ = −⟨T ′ , N ⟩ = −κ by part (b).
Thus the Frenet equations for a curve parametrized by arc length can be
formulated as
[ ′]=[ ][ ].
T′ 0 κ T
N −κ 0 N
The geometric idea is that as we move along the curve at unit speed, the
Frenet frame continually adjusts itself so that its first vector is tangent to
the curve in the direction of motion and the second vector is ninety degrees
counterclockwise to the first. The curvature is the rate (positive, negative, or
zero) at which the first vector is bending toward the second while the second
vector preserves the ninety-degree angle between them by bending away from
the first vector as much as the first vector is bending toward it.
Since γ ′ = T and thus γ ′′ = T ′ , the first and second derivatives of every
curve γ parametrized by arc length are expressed in terms of the Frenet frame,
[ ]=[ ][ ].
γ′ 10 T
γ ′′ 0κ N
This matrix relation shows that the local canonical form of a such a curve is,
up to quadratic order,
α ∶ I Ð→ R2 .
By the chain rule, and then by the product rule and again the chain rule,
α′ = (γ ′ ○ ℓ) ⋅ ℓ′ ,
α′′ = (γ ′ ○ ℓ) ⋅ ℓ′′ + (γ ′′ ○ ℓ) ⋅ (ℓ′ )2 .
These relations and the earlier expressions of γ ′ and γ ′′ in terms of the Frenet
frame combine to give
ℓ′ 0 γ′ ○ ℓ ℓ′ 0
[ ′′ ] = [ ′′ ′ 2 ] [ ′′ ] = [ ′′ ′ 2 ] [ ][ ].
α′ 10 T
α ℓ ℓ γ ○ ℓ ℓ ℓ 0 κ N
The fact that a plane curve lies on a circle if and only if its curvature
is constant cries out to be true. (If it isn’t, then our definitions must be
misguided.) And it is easy to prove using global coordinates. However, we
prove it by working with the Frenet frame, in anticipation of the less obvious
result for space curves to follow in the next section.
Proposition 8.4.2. Let γ ∶ I Ð→ R2 be regular. Then
When these conditions hold, ∣κ∣ = 1/ρ where ρ > 0 is the radius of the circle.
Proof. We may assume that γ is parametrized by arc length.
( Ô⇒ ) We will zoom in on the global condition that γ lies on a circle,
differentiating repeatedly and using the Frenet frame as our coordinate sys-
tem. In the argument, γ and its derivatives depend on the parameter s, and
8.4 Plane Curves: Curvature 397
so does the curvature κ, but we omit s from the notation in order to keep the
presentation light. We are given that for some fixed point p ∈ R2 and some
fixed radius ρ > 0,
∣γ − p∣ = ρ.
And by the nature of the Frenet frame, γ − p decomposes as
γ − p = ⟨γ − p, T ⟩T + ⟨γ − p, N ⟩N. (8.1)
γ − p = −(1/κ)N.
(γ + (1/κ)N )′ = T + (1/κ)(−κT ) = 0.
Exercises
8.4.1. (a) Let a and b be positive. Find the curvature of the ellipse α(t) =
(a cos(t), b sin(t)) for t ∈ R.
(b) Let a be positive and b be negative. Find the curvature of the loga-
rithmic spiral α(t) = (aebt cos t, aebt sin t) for t ≥ 0.
8.4.2. Let γ ∶ I Ð→ R2 be parametrized by arc length. Fix any unit vector
v ∈ R2 , and define a function
θ ∶ I Ð→ R
by the conditions
Thus θ is the angle that the curve γ makes with the fixed direction v. Show
that θ′ = κ. Thus our notion of curvature does indeed measure the rate at
which γ is turning.
398 8 Parametrized Curves
B = T × N.
T ′ = ⟨T ′ , T ⟩T + ⟨T ′ , N ⟩N + ⟨T ′ , B⟩B,
N ′ = ⟨N ′ , T ⟩T + ⟨N ′ , N ⟩N + ⟨N ′ , B⟩B,
B ′ = ⟨B ′ , T ⟩T + ⟨B ′ , N ⟩N + ⟨B ′ , B⟩B.
The definition
T ′ = κN
shows that the top row inner products are
⟨T ′ , T ⟩ = 0, ⟨T ′ , N ⟩ = κ, ⟨T ′ , B⟩ = 0.
And since N and B are unit vectors, the other two diagonal inner products
also vanish by Lemma 8.4.1(a),
⟨N ′ , N ⟩ = ⟨B ′ , B⟩ = 0.
Lemma 8.4.1(b) shows that the first inner product of the second row is the
negative of the second inner product of the first row,
8.5 Space Curves: Curvature and Torsion 399
⟨N ′ , T ⟩ = −⟨T ′ , N ⟩ = −κ,
and so only the third inner product of the second row is a new quantity,
All of the derivatives computed so far can be gathered into the Frenet equa-
tions,
⎡T′ ⎤ ⎡ 0 κ 0⎤⎡T ⎤
⎢ ⎥ ⎢ ⎥⎢ ⎥
⎢ ′⎥ ⎢ ⎥⎢ ⎥
⎢ N ⎥ = ⎢ −κ 0 τ ⎥ ⎢ N ⎥ .
⎢ ′⎥ ⎢ ⎥⎢ ⎥
⎢ B ⎥ ⎢ 0 −τ 0 ⎥ ⎢ B ⎥
⎣ ⎦ ⎣ ⎦⎣ ⎦
The geometric idea is that as we move along the curve, the bending of the
first natural coordinate determines the second natural coordinate; the second
natural coordinate bends away from the first as much as the first is bending
toward it, in order to preserve the ninety-degree angle between them; the
remaining bending of the second coordinate is toward or away from the third
remaining orthogonal coordinate, which bends away from or toward from the
second coordinate at the same rate, in order to preserve the ninety-degree
angle between them.
The relations γ ′ = T and γ ′′ = T ′ = κN and γ ′′′ = (κN )′ = κ′ N + κN ′ , and
the second Frenet equation N ′ = −κT + τ B combine to show that
⎡ γ′ ⎤ ⎡ 1 0 0 ⎤ ⎡ T ⎤
⎢ ⎥ ⎢ ⎥⎢ ⎥
⎢ ′′ ⎥ ⎢ ⎥⎢ ⎥
⎢γ ⎥ = ⎢ 0 κ 0 ⎥⎢N ⎥.
⎢ ′′′ ⎥ ⎢ 2 ′ ⎥ ⎢ ⎥
⎢ γ ⎥ ⎢−κ κ κτ ⎥ ⎢ B ⎥
⎣ ⎦ ⎣ ⎦⎣ ⎦
This relation shows that the local canonical form of a such a curve is, up to
third order,
• In the (T, N )-plane the curve is locally (s, (κ/2)s2 ), a parabola opening
upward at the origin (see Figure 8.11, viewing the curve down the positive
B-axis).
• In the (T, B)-plane the curve is locally (s, (κτ /6)s3 ), a cubic curve inflect-
ing at the origin, rising from left to right if τ > 0 and falling if τ < 0 (see
Figure 8.12, viewing the figure up the negative N -axis).
• In the (N, B)-plane the curve is locally ((κ/2)s2 , (κτ /6)s3 ), a curve in the
right half-plane with a cusp at the origin (see Figure 8.13, viewing the
curve down the positive T -axis).
The relation of the curve to all three local coordinate axes is shown in Fig-
ure 8.14.
α′ = (γ ′ ○ ℓ)ℓ′ ,
α′′ = (γ ′ ○ ℓ)ℓ′′ + (γ ′′ ○ ℓ)ℓ′2 ,
α′′′ = (γ ′ ○ ℓ)ℓ′′′ + 3(γ ′′ ○ ℓ)ℓ′ ℓ′′ + (γ ′′′ ○ ℓ)ℓ′3 .
These relations and the earlier expressions of γ ′ and γ ′′ in terms of the Frenet
frame combine to give
⎡ α ′ ⎤ ⎡ ℓ′ 0 0 ⎤ ⎡ γ ′ ○ ℓ ⎤ ⎡ ℓ′ 0 0 ⎤ ⎡ 1 0 0 ⎤ ⎡ T ⎤
⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥⎢ ⎥
⎢ ′′ ⎥ ⎢ ′′ ′2 ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥⎢ ⎥
⎢α ⎥ = ⎢ℓ ℓ 0 ⎥ ⎢ γ ′′ ○ ℓ ⎥ = ⎢ ℓ′′ ℓ′2 0 ⎥ ⎢ 0 κ 0 ⎥ ⎢ N ⎥ .
⎢ ′′′ ⎥ ⎢ ′′′ ′ ′′ ′3 ⎥ ⎢ ′′′ ⎥ ⎢ ′′′ ′ ′′ ′3 ⎥ ⎢ 2 ′ ⎥ ⎢ ⎥
⎢ α ⎥ ⎢ℓ 3ℓ ℓ ℓ ⎥ ⎢ γ ○ ℓ ⎥ ⎢ℓ 3ℓ ℓ ℓ ⎥ ⎢−κ κ κτ ⎥ ⎢ B ⎥
⎣ ⎦ ⎣ ⎦⎣ ⎦ ⎣ ⎦⎣ ⎦⎣ ⎦
8.5 Space Curves: Curvature and Torsion 401
B
Thus α′ × α′′ = ℓ′ T × (∗T + ℓ′ κN ) = ℓ′3 κB, and since ℓ′ = ∣α′ ∣, this gives the
2
curvature,
∣α′ × α′′ ∣
κ=
∣α′ ∣3
.
r = 1/κ, t = 1/τ.
When these conditions hold, r2 + (r′ t)2 = ρ2 where ρ > 0 is the radius of the
sphere.
γ − p = ⟨γ − p, T ⟩T + ⟨γ − p, N ⟩N + ⟨γ − p, B⟩B. (8.2)
8.5 Space Curves: Curvature and Torsion 403
δ ′ = T + r′ N + r(−κT + τ B) + (r′′ t + r′ t′ )B − r′ tτ N
= (1 − rκ)T + (r′ − r′ tτ )N + (rτ + r′′ t + r′ t′ )B
= ( + r′′ t + r′ t′ ) B.
r
t
Exercise
8.5.1. (a) Let a and b be positive. Compute the curvature κ and the torsion τ
of the helix α(t) = (a cos t, a sin t, bt).
(b) How do κ and τ behave if a is held constant and b → ∞?
(c) How do κ and τ behave if a is held constant and b → 0?
(d) How do κ and τ behave if b is held constant and a → ∞?
(e) How do κ and τ behave if b is held constant and a → 0?
404 8 Parametrized Curves
α ∶ I Ð→ Rn
F1 = α′ /∣α′ ∣.
Thus F1 is a unit vector pointing in the same direction as the tangent vector
of α at t.
Assuming that F1′ never vanishes and that n ≥ 3, next define the first
curvature κ1 (t) of α at t and the second Frenet vector F2 (t) of α at t by the
conditions
F1′ = κ1 F2 , κ1 > 0, ∣F2 ∣ = 1.
Since ∣F1 ∣ = 1 for all t, it follows from Lemma 8.4.1(a) that ⟨F2 , F1 ⟩ = 0.
Because ⟨F2 , F1 ⟩ = 0, Lemma 8.4.1(b) gives ⟨F2′ , F1 ⟩ = −⟨F1′ , F2 ⟩ = −κ1 .
Assuming that F2′ + κ1 F1 never vanishes and that n ≥ 4, define the second
curvature κ2 (t) and the third Frenet vector F3 (t) by the conditions
⟨Fk′ , Fj ⟩ = −⟨Fj′ , Fk ⟩
= −⟨−κj−1 Fj−1 + κj Fj+1 , Fk ⟩
⎧
⎪
⎪ 0 if j = 1, . . . , k − 2,
=⎨
⎪
⎪−κk−1 if j = k − 1.
⎩
8.6 General Frenet Frames and Curvatures 405
So, assuming that Fk′ ≠ −κk−1 Fk−1 , define κk and Fk+1 by the conditions
Then the relation κk Fk+1 = Fk′ + κk−1 Fk−1 shows that ⟨Fk+1 , Fj ⟩ = 0 for j =
1, . . . , k. Use this process, assuming the nonvanishing that is needed, until
κn−2 and Fn−1 have been defined. Thus if n = 2 then the process consists only
of defining F1 ; if n = 3 then the process also defines κ1 and F2 ; if n = 4 then
the process further defines κ2 and F3 ; and so on.
Finally, define the nth Frenet vector Fn as the unique unit vector orthogo-
nal to F1 through Fn−1 such that det(F1 , F2 , . . . , Fn ) > 0, and then define the
(n − 1)st curvature κn−1 by the condition
′
Fn−1 = −κn−2 Fn−2 + κn−1 Fn .
The (n − 1)st curvature need not be positive. By Lemma 8.4.1(b) yet again,
we have Fn′ = −κn−1 Fn−1 , and so the Frenet equations are
⎡ F ′ ⎤ ⎡ 0 κ1 ⎤ ⎡ F1 ⎤
⎢ 1 ⎥ ⎢ ⎥⎢ ⎥
⎢ F ′ ⎥ ⎢ −κ 0 κ ⎥⎢ F ⎥
⎢ 2 ⎥ ⎢ 1 ⎥⎢ 2 ⎥
⎢ ′ ⎥ ⎢ ⎥⎢ ⎥
⎢ F3 ⎥ ⎢ ⎥ ⎢ F3 ⎥
2
⎢ ⎥ ⎢ −κ2 0 κ3 ⎥⎢ ⎥
⎢ ⋮ ⎥=⎢ ⋱ ⋱ ⋱ ⎥⎢ ⋮ ⎥.
⎢ ⎥ ⎢ ⎥⎢ ⎥
⎢ ⋮ ⎥ ⎢ ⋱ ⋱ ⋱ ⎥⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥⎢ ⎥
⎢ ′ ⎥ ⎢ ⎥⎢ ⎥
⎢ Fn−1 ⎥ ⎢ −κ ⎥ ⎢ Fn−1 ⎥
⎢ ′ ⎥ ⎢ 0 κ ⎥⎢ ⎥
⎢ Fn ⎥ ⎢ −κn−1 0 ⎥ ⎢ ⎥
n−2 n−1
⎣ ⎦ ⎣ ⎦ ⎣ Fn ⎦
The first n−1 Frenet vectors and the first n−2 curvatures can also be obtained
by applying the Gram–Schmidt process (see Exercise 2.2.16) to the vectors
α′ , . . . , α(n−1) .
The Frenet vectors and the curvatures are independent of parametrization.
To see this, let α̃ ∶ Ĩ Ð→ Rn be a second curve equivalent to α. That is,
α = α̃ ○ φ
Since the curvatures and the rest of the Frenet vectors are described in terms
of derivatives of the first Frenet vector with respect to its variable, it follows
that the Frenet vectors and the curvatures are independent of parametrization,
as claimed,
406 8 Parametrized Curves
Since the curvatures describe the curve in local terms, they should be
unaffected by passing the curve through a rigid motion. The remainder of this
section establishes this invariance property of curvature, partly because doing
so provides us an excuse to describe the rigid motions of Euclidean space.
Definition 8.6.1. The square matrix A ∈ Mn (R) is orthogonal if At A = I.
That is, A is orthogonal if A is invertible and its transpose is its inverse. The
set of n × n orthogonal matrices is denoted On (R).
It is straightforward to check (Exercise 8.6.2) that
• the identity matrix I is orthogonal,
• if A and B are orthogonal then so is the product AB,
• and if A is orthogonal then so is the inverse A−1 .
These three facts, along with the fact that matrix multiplication is associative,
show that the orthogonal matrices form a group under matrix multiplication.
Some examples of orthogonal matrices are
cos θ − sin θ
[ ], [ ] for every θ ∈ R.
1 0
0 −1 sin θ cos θ
That is, rigid maps preserve the geometry of vector differences. The next
proposition characterizes rigid mappings.
8.6 General Frenet Frames and Curvatures 407
This shows that S(x) = Ax where A has columns S(e1 ), . . . , S(en ). Since
⟨S(ei ), S(ej )⟩ = ⟨ei , ej ⟩ for i, j ∈ {1, . . . , n}, in fact A ∈ On (R), as desired. ⊓
⊔
α ∶ I Ð→ Rn ,
α̃ ∶ I Ð→ Rn , α̃ = R ○ α.
Thus the first Frenet vectors of the two curves satisfy the relation
F̃1 = AF1 ,
so that since κ̃1 and κ1 are positive and ∣F̃2 ∣ = 1 = ∣F2 ∣ = ∣AF2 ∣,
Similarly,
κ̃i = κi , i = 1, . . . , n − 1
and
F̃i = AFi , i = 1, . . . , n.
We need A to be special orthogonal rather than just orthogonal in order that
this argument apply to the last Frenet vector and the last curvature. If A is
orthogonal but not special orthogonal then F̃n = −AFn and ̃κn−1 = −κn−1 .
Exercises
special cases of the general FTIC, and Section 9.17 takes a closer look at some
of the quantities that arise in this context.
Φ ∶ D Ð→ A,
See Figure 9.1. Here are some points to note about Definition 9.1.1:
• Recall that a subset A of Rn is called open if its complement is closed.
The definitions in this chapter need the environment of an open subset
rather than all of Rn in order to allow for functions that are not defined
everywhere. For instance, the reciprocal modulus function
1/∣ ⋅ ∣ ∶ Rn − {0} Ð→ R
is defined only on surfaces that avoid the origin. In most of the examples,
A will be all of Rn , but Exercise 9.11.1 will touch on how the subject
becomes more nuanced when it is not.
• Recall also that compact means closed and bounded. Connected means
that D consists of only one piece, as discussed informally in Section 2.4.
And as discussed informally in Section 6.5 and formally in Section 6.8, the
boundary of a set consists of all points simultaneously near the set and
near its complement—roughly speaking, its edge. Typically D will be some
region that is easy to integrate over, such as a box, whose compactness,
connectedness, and small boundary are self-evident.
9.1 Integration of Functions over Surfaces 411
• The word smooth in the definition means that the mapping Φ extends
to some open superset of D in Rk , on which it has continuous partial
derivatives of all orders. Each such partial derivative is therefore again
smooth. All mappings in this chapter are assumed to be smooth.
• When we compute, coordinates in parameter space will usually be written
as (u1 , . . . , uk ), and coordinates in Rn as (x1 , . . . , xn ).
• It may be disconcerting that a surface is by definition a mapping rather
than a set, but this is for good reason. Just as the integration of Chapter 6
was facilitated by distinguishing between functions and their outputs, the
integration of this chapter is facilitated by viewing the surfaces over which
we integrate as mappings rather than their images.
• A parametrized curve, as in Definition 8.2.1, is precisely a 1-surface.
z
v y
Φ
u
Φp ∶ R0 Ð→ Rn , Φp (0) = p,
v
z
x
u
⎢vk ⎥ ⎢vk ⋅ v1 ⋯ vk ⋅ vk ⎥
⎣ ⎦ ⎣ ⎦
For example, if k = 1 and γ ∶ [a, b] Ð→ Rn is a 1-surface (i.e., a curve)
in Rn , then its derivative matrix at a point u of [a, b] has one column,
⎡ γ ′ (u) ⎤
⎢ 1 ⎥
⎢ ⎥
γ (u) = ⎢ ⋮ ⎥ .
⎢ ′ ⎥
′
⎢γn (u)⎥
⎣ ⎦
Consequently, formula (9.2) is
√
length(γ ′ (u)) = γ ′ (u) ⋅ γ ′ (u).
√
That is, Definition 9.1.2 for k = 1 specializes to the definition of ∣γ ′ ∣ as γ ′ ⋅ γ ′
from Section 2.2. (Here and throughout this chapter, we drop the notational
convention that curves named γ are parametrized by arc length; thus no as-
sumption is present that ∣γ ′ ∣ = 1.) At the other extreme, if k = n then for-
mula (9.1) is
voln (P(v1 , . . . , vn )) = ∣ det(v1 , . . . , vn )∣.
That is, Definition 9.1.2 for k = n recovers the interpretation of ∣ det ∣ as volume
from Section 3.8. When k = 2, formula (9.2) is
√
area(P(v1 , v2 )) = ∣v1 ∣2 ∣v2 ∣2 − (v1 ⋅ v2 )2
√
= ∣v1 ∣2 ∣v2 ∣2 (1 − cos2 θ12 )
= ∣v1 ∣ ∣v2 ∣ ∣ sin θ12 ∣,
414 9 Integration of Differential Forms
giving the familiar formula for the area of a parallelogram. When k = 2 and also
n = 3, we can study the formula further by working in coordinates. Consider
two vectors u = (xu , yu , zu ) and v = (xv , yv , zv ). An elementary calculation
shows that the quantity under the square root in the previous display works
out to
∣u∣2 ∣v∣2 − (u ⋅ v)2 = ∣u × v∣2 .
So when k = 2 and n = 3, Definition 9.1.2 subsumes the familiar formula
area(P(v1 , v2 )) = ∣v1 × v2 ∣.
Here is an argument that (9.2) is the appropriate formula for the k-
dimensional volume of the parallelepiped spanned by the vectors v1 , . . . , vk
in Rn . (The fact that the vectors are tangent vectors to a k-surface is irrele-
vant to this discussion.) Results from linear algebra guarantee that there exist
vectors vk+1 , . . . , vn in Rn such that
• each of vk+1 through vn is a unit vector orthogonal to all the other vj ,
• det(v1 , . . . , vn ) ≥ 0.
Recall the notation in Definition 9.1.2 that V is the n×k matrix with columns
v1 , . . . , vk . Augment V to an n × n matrix W by adding the remaining vj as
columns too,
W = [v1 ⋯ vn ] = [V vk+1 ⋯ vn ] .
The scalar det(W ) is the n-dimensional volume of the parallelepiped spanned
by v1 , . . . , vn . But by the properties of vk+1 through vn , this scalar should
also be the k-dimensional volume of the the parallelepiped spanned by v1 , . . . ,
vk . That is, the natural definition is (using the second property of v1 , . . . , vn
for the second equality to follow)
√ √
volk (P(v1 , . . . , vk )) = det(W ) = (det W )2 = det(W T ) det(W )
√
= det(W T W ).
The first property of v1 , . . . , vn shows that
WT W = [ ],
V T V 0k×(n−k)
0(n−k)×k In−k
v z
y
Φ
u
x
Φ ∶ [0, 2π] × [0, π] Ð→ R3 , Φ(θ, ϕ) = (r cos θ sin ϕ, r sin θ sin ϕ, r cos ϕ).
This surface is the 2-sphere of radius r. Since the sphere is a surface of revo-
lution, its area is readily computed by methods from a first calculus course,
but we do so with the ideas of this section to demonstrate their use. The
derivative vectors are
416 9 Integration of Differential Forms
⎡ −r sin θ sin ϕ ⎤ ⎡r cos θ cos ϕ⎤
⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥
v1 = ⎢ r cos θ sin ϕ⎥ , v2 = ⎢ r sin θ cos ϕ ⎥ ,
⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ −r sin ϕ ⎥
⎣ 0 ⎦ ⎣ ⎦
and so the integrand of the surface area integral is
√ √
∣v1 ∣2 ∣v2 ∣2 − (v1 ⋅ v2 )2 = r4 sin2 ϕ = r2 sin ϕ
The fact that the sphere-area magnification factor r2 sin ϕ is the familiar vol-
ume magnification factor for spherical coordinates is clear geometrically: to
traverse the sphere, the spherical coordinates θ and ϕ vary while r stays con-
stant, and when r does vary, it moves orthogonally to the sphere-surface so
that the incremental volume is the incremental surface-area times the incre-
mental radius-change. Indeed, the vectors v1 and v2 from a few displays back
are simply the second and third columns of the spherical change of variable
derivative matrix. The reader can enjoy checking that the first column of the
spherical change of variable derivative matrix is indeed a unit vector orthog-
onal to the second and third columns.
The integral in Definition 9.1.3 seems to depend on the surface Φ as a
parametrization rather than merely as a set, but in fact, the integral is unaf-
fected by reasonable changes of parametrization, because of the change of vari-
able theorem. To see this, let A be an open subset of Rn , and let Φ ∶ D Ð→ A
and Ψ ∶ D̃ Ð→ A be k-surfaces in A. Suppose that there exists a smoothly in-
vertible mapping T ∶ D Ð→ D ̃ such that Ψ ○T = Φ. In other words, T is smooth,
T is invertible, its inverse is also smooth, and the following diagram commutes
(meaning that either path around the triangle yields the same result):
D ◆◆
◆◆◆
◆◆Φ◆
◆◆◆
◆◆&
T 8A
qqqqq
qqq
qqqq Ψ
D̃q
√ √
But ∣ det(T ′ )∣ = det(T ′ )2 = det(T ′ T ) det(T ′ ), so this becomes
√
∫ (f ○ Ψ ○ T ) det (T ′ (Ψ ′ ○ T )T (Ψ ′ ○ T ) T ′ ) ,
T
D
Exercises
9.1.4. Find the surface area of the upper half of the cone at fixed angle ϕ from
the z-axis, extended outward to radius a. That is, the surface is the image of
the spherical coordinate mapping with ϕ fixed at some value between 0 and π
as ρ varies from 0 to a and θ varies from 0 to 2π.
418 9 Integration of Differential Forms
F = (F1 , F2 ) ∶ R2 Ð→ R2 ,
and a curve,
γ = (γ1 , γ2 ) ∶ [a, b] Ð→ R2 .
9.2 Flow and Flux Integrals 419
Assuming that the derivative γ ′ is always nonzero but not assuming that γ is
parametrized by arc length, the unit tangent vector to γ at the point γ(u),
pointing in the direction of the traversal, is
γ ′ (u)
T̂(γ(u)) = ′
∣γ (u)∣
.
Note that the denominator is the length factor in Definition 9.1.3. The parallel
component of F (γ(u)) along T̂(γ(u)) has magnitude (F ⋅ T̂)(γ(u)). (See Ex-
ercise 2.2.15.) Therefore the net flow of F along γ in the direction of traversal
is ∫γ F ⋅ T̂. By Definition 9.1.3, this flow integral is
γ ′ (u) ′
∫ F ⋅ T̂ = ∫ F (γ(u)) ⋅ ∣γ (u)∣ = ∫ F (γ(u)) ⋅ γ ′ (u),
b b
∣γ (u)∣
′
(9.3)
γ u=a u=a
and the length factor has canceled. In coordinates, the flow integral is
On the other hand, for every vector (x, y) ∈ R2 , define (x, y)× = (−y, x). (This
seemingly ad hoc procedure of negating one of the vector entries and then
exchanging them will be revisited soon as a particular manifestation of a
general idea.) The unit normal vector to the curve γ at the point γ(u), at
angle π/2 counterclockwise from T̂(γ(u)), is
̂ (γ(u)) = γ (u) .
′ ×
∣γ (u)∣
N ′
or, in coordinates,
F = (F1 , F2 , F3 ) ∶ R3 Ð→ R3 .
The intrinsic expression (9.3) for the flow integral of F along a curve γ remains
unchanged in R3 , making the 3-dimensional counterpart of (9.4) in coordinates
obvious,
γ u=a
420 9 Integration of Differential Forms
Φ = (Φ1 , Φ2 , Φ3 ) ∶ D Ð→ R3 .
Assuming that the two columns D1 Φ and D2 Φ of the derivative matrix Φ′ are
always linearly independent, a unit normal to the surface Φ at the point Φ(u)
(where now u = (u1 , u2 )) is obtained from their cross product,
̂=∫
∫ F ⋅N F (Φ(u)) ⋅ (D1 Φ(u) × D2 Φ(u)), (9.7)
Φ u∈D
or, in coordinates,
⎛ (F1 ○ Φ)(D1 Φ2 D2 Φ3 − D1 Φ3 D2 Φ2 )⎞
̂=∫
∫ F ⋅N ⎜ +(F2 ○ Φ)(D1 Φ3 D2 Φ1 − D1 Φ1 D2 Φ3 )⎟ (u).
⎜ ⎟ (9.8)
⎝ +(F3 ○ Φ)(D1 Φ1 D2 Φ2 − D1 Φ2 D2 Φ1 )⎠
Φ u∈D
Whereas the 2-dimensional flow and flux integrands and the 3-dimensional
flow integrand involved derivatives γj′ of the 1-surface γ, the integrand here
contains the determinants of all 2 × 2 subblocks of the 3 × 2 derivative matrix
of the 2-surface Φ,
⎡D Φ D Φ ⎤
⎢ 1 1 2 1⎥
⎢ ⎥
Φ = ⎢D1 Φ2 D2 Φ2 ⎥ .
⎢ ⎥
′
⎢D1 Φ3 D2 Φ3 ⎥
⎣ ⎦
The subdeterminants give a hint about the general picture. Nonetheless, (9.8)
is forbidding enough that we should pause and think before trying to compute
more formulas.
For general n, formula (9.3) for the flow integral of a vector field along a
curve generalizes transparently,
But the generalization of formulas (9.5) through (9.8) to a formula for the flux
integral of a vector field in Rn through an (n − 1)-surface is not so obvious.
Based on (9.7), the intrinsic formula should be
̂=∫
∫ F ⋅N ((F ○ Φ) ⋅ (D1 Φ × ⋯ × Dn−1 Φ))(u), (9.10)
Φ u∈D
9.2 Flow and Flux Integrals 421
v × = det [ ] = (−y, x)
x e1
y e2
This is the formula that appeared with no explanation as part of the flux
integral in R2 . That is, the generalization (9.10) of the 3-dimensional flux
integral to higher dimensions also subsumes the 2-dimensional case. Returning
to Rn , the cross product of the vectors D1 Φ(u),. . . ,Dn−1 Φ(u) is
⎡ e1 ⎤
⎢ ⎥
⎢ ⎥
(D1 Φ × ⋯ × Dn−1 Φ)(u) = det ⎢ D1 Φ(u) ⋯ Dn−1 Φ(u) ⋮ ⎥ .
⎢ ⎥
⎢ en ⎥
⎣ ⎦
This determinant can be understood better by considering the data in the
matrix as rows. Recall that for i = 1, . . . , n, the ith row of the n × (n − 1)
derivative matrix Φ′ is the derivative matrix of the ith component function
of Φ,
Φ′i (u) = [D1 Φi (u) ⋯ Dn−1 Φi (u)] .
In terms of these component function derivatives, the general cross product is
422 9 Integration of Differential Forms
⎡ Φ′ (u) e ⎤ ⎡e ⎤
⎢ 1 1⎥ ⎢ 1 Φ1 (u) ⎥
′
⎢ ⎥ ⎢ ⎥
(D1 Φ × ⋯ × Dn−1 Φ)(u) = det ⎢ ⋮ ⋮ ⎥ = (−1) det ⎢ ⋮ ⋮ ⎥
⎢ ′ ⎥ ⎢ ⎥
n−1
⎢ Φn (u) en ⎥ ⎢ en Φn (u) ⎥
⎣ ⎦ ⎣ ⎦
′
⎢ i+1 ⎥
′
Φ
⎢ ⋮ ⎥
i=1
⎢ ⎥
⎢ Φ′ (u) ⎥
⎣ n ⎦
Thus finally, the general flux integral in coordinates is
⎡ Φ′1 ⎤
⎢ ⎥
⎢ ⋮ ⎥
⎢ ⎥
⎢Φ′ ⎥
̂ ⎢ i−1 ⎥
( ∑(−1) (Fi ○ Φ) det ⎢ ′ ⎥ )(u).
n
∫ F ⋅ N = (−1) ∫ ⎢Φi+1 ⎥
n−1 i−1
(9.12)
⎢ ⎥
⎢ ⋮ ⎥
Φ u∈D i=1
⎢ ⎥
⎢ Φ′ ⎥
⎣ n⎦
Φ = (Φ1 , Φ2 , Φ3 , Φ4 ) ∶ D Ð→ R4 .
⎢ ′⎥ ⎢ ⎥
⎢Φ4 ⎥ ⎢D1 Φ4 D2 Φ 4 ⎥
⎣ ⎦ ⎣ ⎦
so that any two of its rows form a square matrix. Consider also any six smooth
functions
F1,2 , F1,3 , F1,4 , F2,3 , F2,4 , F3,4 ∶ R4 Ð→ R.
Then we can define an integral,
9.3 Differential Forms Syntactically and Operationally 423
⎜ 4 ⎟
⎜ ⎟ (u).
′ ′ ′
Φ Φ Φ
∫ ⎜ ⎟
2 3
u∈D ⎜ Φ′3 ⎟
[ ] (F [ ] (F [ ]
′ ′
+(F ○ + ○ + ○
Φ2 Φ2
⎝ 2,3 Φ) det
Φ′3 2,4 Φ) det
Φ′4 3,4 Φ) det
Φ′4 ⎠
(9.13)
Since the surface Φ is not 1-dimensional, this is not a flow integral. And since
Φ is not (n − 1)-dimensional, it is not a flux integral either. Nonetheless, since
the integrand contains the determinants of all 2 × 2 subblocks of the 4 × 2
derivative matrix of the 2-surface Φ, it is clearly cut from the same cloth as
the flow and flux integrands of this section. The ideas of this chapter will
encompass this integral and many others in the same vein.
As promised at the beginning of this section, the k-volume factor has
canceled in flow and flux integrals, and the remaining integrand features de-
terminants of the derivatives of the component functions of the surface of
integration. Rather than analyze such cluttered integrals, the method of this
chapter is to abstract their key properties into symbol-patterns, and then
work with the patterns algebraically instead. An analysis tracking all the de-
tails of the original setup would be excruciating to follow, not to mention
being unimaginable to recreate ourselves. Instead, we will work insightfully,
economy of ideas leading to ease of execution. Since the definitions to fol-
low do indeed distill the essence of vector integration, they will enable us to
think fluently about the phenomena that we encounter. This is real progress in
methodology, much less laborious than the classical approach. Indeed, having
seen the modern argument, it is unimaginable to want to recreate the older
one.
Exercises
(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2).
A sum over the ordered k-tuples from {1, . . . , n} means simply a sum of terms
with each term corresponding to a distinct k-tuple. Thus we may think of an
ordered k-tuple (i1 , . . . , ik ) as a sort of multiple index or multiple subscript,
and for this reason we often will abbreviate it to I. These multiple subscripts
will figure prominently throughout this chapter, so you should get comfortable
with them. Exercise 9.3.1 provides some practice.
or
∑ fI dxI ,
I
Make the convention that the empty set I = ∅ is the only ordered 0-tuple
from {1, . . . , n}, and that the corresponding empty product dx∅ is 1. Then
the definition of a k-form for k ≥ 1 in Definition 9.3.1 also makes sense for
k = 0, and it subsumes the special definition that was given for k = 0.
For example, a differential form for n = 3 and k = 1 is
y dx ∧ dx + ex dx ∧ dy + y cos x dy ∧ dx,
9.3 Differential Forms Syntactically and Operationally 425
with the missing dy ∧ dy term tacitly understood to have the zero function as
its coefficient-function f(2,2) (x, y), and hence to be zero itself. The expression
1
dx
x
is a 1-form on the open subset A = {x ∈ R ∶ x ≠ 0} of R, but it is not a 1-form
on all of R. The hybrid expression
z dx ∧ dy + ex dz
ω ∶ {0-surfaces in A} Ð→ R,
ω(Φp ) = f (p).
ω ∶ {k-surfaces in A} Ð→ R,
∫ ω = ω(Φ).
Φ
Formula (9.14), defining ω(Φ), is the key for everything to follow in this
chapter. It defines an integral over the image Φ(D), which may have volume
zero in Rn , by pulling back—this term will later be defined precisely—to an
integral over the parameter domain D, which is a full-dimensional set in Rk
and hence has positive k-dimensional volume.
Under Definition 9.3.2, the integral of a differential form over a surface
depends on the surface as a mapping, i.e., as a parametrization. However, it
is a straightforward exercise to show that that the multivariable change of
variable theorem implies that the integral is unaffected by reasonable changes
of parametrization.
Returning to formula (9.14): despite looking like the flux integral (9.12),
it may initially be impenetrable to the reader who (like the author) does not
assimilate notation quickly. The next two sections will illustrate the formula
in specific instances, after which its general workings should be clear. Before
long, you will have an operational understanding of the definition.
Operational understanding should be complemented by structural under-
standing. The fact that the formal consequences of Definitions 9.3.1 and 9.3.2
subsume the main results of classical integral vector calculus still doesn’t ex-
plain these ad hoc definitions conceptually. For everything to play out so
nicely, the definitions must somehow be natural rather than merely clever,
and a structural sense of why they work so well might let us extend the ideas
to other contexts rather than simply tracking them. Indeed, differential forms
fit into a mathematical structure called a cotangent bundle, with each differ-
ential form being a section of the bundle. The construction of the cotangent
bundle involves the dual space of the alternation of a tensor product, all of
these formidable-sounding technologies being utterly Platonic mathematical
objects. However, understanding this language requires an investment in ideas
and abstraction, and in the author’s judgment the startup cost is much higher
without some experience first. Hence the focus of the chapter is purely op-
erational. Since formula (9.14) may be opaque to the reader for now, the
first order of business is to render it transparent by working easy concrete
examples.
Exercises
9.3.1. Write out all ordered k-tuples from {1, . . . , n} in the cases n = 4, k = 1;
n = 3, k = 2. In general, how many ordered k-tuples I = (i1 , . . . , ik ) from
{1, . . . , n} are there? How many of these are increasing, meaning that i1 <
⋯ < ik ? Write out all increasing k-tuples from {1, 2, 3, 4} for k = 1, 2, 3, 4.
9.4 Examples: 1-Forms 427
γ = (γ1 , γ2 , γ3 ) ∶ [a, b] Ð→ R3 ,
For every such curve, ω is the instructions integrate γ1 γ2′ over the parameter
domain [a, b], and similarly λ instructs to integrate γ2 γ3′ . You should work
through applying formula (9.14) to ω and λ to see how it produces these direc-
tions. Note that x and y are being treated as functions on R3 —for example,
so that x ○ γ = γ1 .
To see ω and λ work on a specific curve, consider the helix
⎢ b ⎥
⎣ ⎦
Thus by (9.14),
2π 2π
∫ ω=∫ a cos t ⋅ a cos t = πa2 and ∫ λ=∫ a sin t ⋅ b = 0.
H t=0 H t=0
Looking at the projections of the helix in the (x, y)-plane and the (y, z)-plane
suggests that these are the right values for ∫H x dy and ∫H y dz if we interpret
the symbols x dy and y dz as in one-variable calculus. (See Figure 9.4.)
428 9 Integration of Differential Forms
z z
y y
x x
Then
∫ ω = ∫ (1 ○ γ) ⋅ γ1 = ∫ γ1′ = γ1 (b) − γ1 (a).
b b
′
γ a a
A change of notation makes this example more telling. Rewrite the component
functions of the curve as x, y, and z,
That is, the form dx does indeed measure change in x along curves. As a set
of instructions, it simply says to evaluate the x-coordinate difference from the
initial point on the curve to the final point. Returning to the helix H, it is
now clear with no further work that
∫ dx = 0, ∫ dy = 0, ∫ dz = 2πb.
H H H
ω = D1 f dx1 + ⋯ + Dn f dxn .
= ∫ (f ○ γ)′
b
by the chain rule in coordinates
a
= (f ○ γ)∣a
b
= f (γ(b)) − f (γ(a)).
That is, the form ω measures change in f along curves. Indeed, ω is classically
called the total differential of f . It is tempting to give ω the name df , i.e., to
define
df = D1 f dx1 + ⋯ + Dn f dxn .
Soon we will do so as part of a more general definition.
(Recall the chain rule: If A ⊂ Rn is open, then for every smooth γ ∶ [a, b] Ð→
A and f ∶ A Ð→ R,
ω = F1 dx1 + ⋯ + Fn dxn .
and this is the general flow integral (9.9) of the vector field (F1 , . . . , Fn )
along γ. That is, the flow integrals from Section 9.2 are precisely the inte-
grals of 1-forms.
430 9 Integration of Differential Forms
Exercises
9.4.2. Let ω = z dx+x2 dy+y dz, a 1-form on R3 . Evaluate ∫γ ω for the following
two curves.
(a) γ ∶ [−1, 1] Ð→ R3 , γ(t) = (t, at2 , bt3 );
(b) γ ∶ [0, 2π] Ð→ R3 , γ(t) = (a cos t, a sin t, bt).
Φ = (Φ1 , Φ2 , Φ3 ) ∶ D Ð→ R3 .
The parameter domain D has been partitioned into subrectangles, and the
image Φ(D) has been divided up into subpatches by mapping the grid lines
in D over to it via Φ. The subrectangle J of D maps to the subpatch B of Φ(D),
which in turn has been projected down to its shadow B(1,2) in the (x, y)-
plane. The point (uJ , vJ ) resides in J, and its image under Φ is Φ(uJ , vJ ) =
(xB , yB , zB ).
Note that B(1,2) = (Φ1 , Φ2 )(J). Rewrite this as
That is, B(1,2) is the image of J under the (1, 2) component functions of Φ.
If J is small then results on determinants give
J Φ
B12 x
y
x
B21
x
B12 y
∫ ω = ∫ (f ○ Φ) det Φ(1,2)
′
Φ D
≈ ∑(f ○ Φ)(uJ , vJ ) det Φ′(1,2) (uJ , vJ )area(J)
J
≈ ∑ f (xB , yB , zB )( ± area(B(1,2) )).
B
(See Figure 9.7.) The (x, y)-shadows of B1 , B2 have the same areas as J1 , J2
and positive orientation, so ∫Φ dx ∧ dy should be equal to area(D), i.e., 2. (See
the left half of Figure 9.8.) The (z, x)-shadows of B1 , B2 have area zero, so
∫Φ dz ∧ dx should be an emphatic 0. (See the right half of Figure 9.8.) The
(y, z)-shadows of B1 , B2 have the same area but opposite orientations, so
∫Φ dy ∧ dz should be 0 by some cancellation on opposite sides of the (y, z)-
plane or equivalently, cancellation in the u-direction of the parameter domain.
(See Figure 9.9.)
v
Φ
z
y
J1 J2
u
B2
y x
B1 B2 B2
x z
B1
z z
B1 B2
y y
⎢−2u 0⎥
⎣ ⎦
we have
det [ ] = 2,
1 1
∫ dx ∧ dy = ∫ det Φ(1,2) = ∫
10
∫
′
Φ D v=0 u=−1 01
and similarly
−2u 0
det [ ] = ∫ ∫ 0 = 0,
1 1
∫ dz ∧ dx = ∫ det Φ(3,1) = ∫ ∫
′
Φ D v=0 u=−1 1 0 v u
det [ ] = ∫ ∫ 2u = 0.
1 1
∫ dy ∧ dz = ∫ det Φ(2,3) = ∫
0 1
∫
′
Φ D v=0 u=−1 −2u 0 v u
434 9 Integration of Differential Forms
Note how the first integral reduces to integrating 1 over the parameter do-
main, the second integral vanishes because its integrand is zero, and the third
integral vanishes because of cancellation in the u-direction. All three of these
behaviors confirm our geometric insight into how forms should behave.
Since the differential form dx ∧ dy measures projected area in the (x, y)-
plane, the integral
∫ z dx ∧ dy
Φ
should give the volume under the arch. And indeed formula (9.14) gives
∫ z dx ∧ dy = ∫ (1 − u2 ) ⋅ 1,
Φ (u,v)∈D
(1 − u2 ) = 1 ⋅ (2 − u3 /3∣ ) = 4/3.
1 1 1
∫ z dx ∧ dy = ∫ ∫ −1
Φ v=0 u=−1
⎛ (F1 ○ Φ)(D1 Φ2 D2 Φ3 − D1 Φ3 D2 Φ2 )⎞
∫ ω=∫ ⎜ +(F2 ○ Φ)(D1 Φ3 D2 Φ1 − D1 Φ1 D2 Φ3 )⎟ (u),
⎜ ⎟
⎝ +(F3 ○ Φ)(D1 Φ1 D2 Φ2 − D1 Φ2 D2 Φ1 )⎠
Φ u∈D
9.5 Examples: 2-Forms on R3 435
and this is the flux integral (9.8) of the vector field (F1 , F2 , F3 ) through Φ. A
straightforward generalization of this example shows that the general integral
of an (n−1)-form over an (n−1)-surface in Rn is the general flux integral (9.12).
That is, the flux integrals from Section 9.2 are precisely the integrals of (n−1)-
forms.
Along with the last example of the previous section, this raises the follow-
ing question: why bother with k-forms for values of k other than 1 and n − 1,
and maybe also 0 and n? The answer is that the amalgamation of k-forms for
all values of k has a coherent algebraic structure, making the whole easier to
study than its parts. The remainder of the chapter is largely an elaboration
of this point.
After this discussion of the mechanics and meaning of integrating forms,
you should be ready to prove a result that has already been mentioned: inte-
gration of forms reduces to ordinary integration when k = n, and integration
of forms is unaffected by reasonable changes of parametrization. These points
are covered in the next set of exercises.
Exercises
Sketch this surface, noting that θ varies from 0 to π, not from 0 to 2π. Try
to determine ∫Φ dx ∧ dy by geometric reasoning, and then check your answer
using (9.14) to evaluate the integral. Do the same for dy ∧ dz and dz ∧ dx. Do
the same for z dx ∧ dy − y dz ∧ dx.
∫ ω = ∫ f.
∆ D
Your solution should use the basic properties of ∆ but not the highly sub-
stantive change of variable theorem. Note that in particular if f = 1, then
ω = dx1 ∧ ⋯ ∧ dxn and ∫∆ ω = vol(D), explaining why in this case ω is called
the volume form.
Thus in Rn , we may from now on blur the distinction between integrating
the function f over a set and integrating the n-form ω = f dxI over a surface,
provided that I = (1, . . . , n) (i.e., the dxi factors appear in canonical order),
and provided that the surface is parametrized trivially.
9.5.5. This exercise proves that because of the change of variable theorem,
the integration of differential forms is invariant under orientation-preserving
reparametrizations of a surface.
Let A be an open subset of Rn . Let Φ ∶ D Ð→ A and Ψ ∶ D ̃ Ð→ A be
k-surfaces in A. Suppose that there exists a smoothly invertible mapping
T ∶ D Ð→ D ̃ such that Ψ ○ T = Φ. In other words, T is smooth, T is invertible,
its inverse is also smooth, and the following diagram commutes:
D ◆◆
◆◆◆
◆◆Φ◆
◆◆◆
◆◆&
T q8 A
qqqqq
qq
qqqq Ψ
D̃ q
(M N )I = MI N.
(Suggestion: Do it first for the case I = i, that is, I denotes a single row.)
(c) Use the chain rule and part (b) to show that for every I,
det Φ′I (u) = det ΨI′ (T (u)) det T ′ (u) for all u ∈ D.
∫ ω = ∫ ω.
Ψ Φ
ω1 = ω2
where the first “+” lies between two forms, the second between two real num-
bers. Similarly, the definition of scalar multiplication is
The addition of forms here is compatible with the twofold use of summation
in the definition of forms and how they integrate. Addition and scalar multi-
plication of forms inherit all the vector space properties from corresponding
438 9 Integration of Differential Forms
the last equality holding since (−1)x = −x for all real numbers x.
Forms have other algebraic properties that are less familiar. For example,
on R2 , dy ∧ dx = −dx ∧ dy. This rule follows from the skew symmetry of the
determinant: for any 2-surface Φ ∶ D Ð→ R2 ,
More generally, given two k-tuples I and J from {1, . . . , n}, dxJ = −dxI if J
is obtained from I by an odd number of transpositions. Thus for example,
dz ∧ dy ∧ dx = −dx ∧ dy ∧ dz
since (3, 2, 1) is obtained from (1, 2, 3) by swapping the first and third entries.
Showing this reduces again to the skew symmetry of the determinant. As a
special case, dxI = 0 whenever the k-tuple I has two matching entries. This
rule holds because exchanging those matching entries has no effect on I but
gives the negative of dxI , and so dxI = −dxI , forcing dxI = 0. One can also
verify directly that dxI = 0 if I has matching entries by referring back to the
fact that the determinant of a matrix with matching rows vanishes.
Using these rules (dy∧dx = −dx∧dy, dx∧dx = 0, and their generalizations),
one quickly convinces oneself that every k-form can be written
Exercise
define their concatenation (I, J), a (k+ℓ)-tuple from {1, . . . , n}, in the obvious
way,
(I, J) = (i1 , . . . , ik , j1 , . . . , jℓ ).
Also, if f, g ∶ A Ð→ R are functions on an open subset of Rn then their product
f g is the function
f g ∶ A Ð→ R, (f g)(x) = f (x)g(x).
ω ∧ λ = ∑ fI gJ dx(I,J) .
I,J
That is, the wedge product is formed by following the usual distributive law
and wedge-concatenating the dx-terms.
For convenient notation, let Λk (A) denote the vector space of k-forms
on A. Thus the wedge product is a mapping,
This example shows that the wedge product automatically encodes the inner
product in R3 , and the idea generalizes easily to Rn . For another example, a
wedge product of two 1-forms on R3 is
Comparing this to the formula for the cross product in Section 3.10 shows
that the wedge product automatically encodes the cross product. Similarly, a
wedge product of two 1-forms on R2 is
440 9 Integration of Differential Forms
Exercises
9.7.1. Find a wedge product of two differential forms that encodes the inner
product of R4 .
9.7.2. Find a wedge product of three differential forms that encodes the 3 × 3
determinant.
by the rules
n
df = ∑ Di f dxi for a 0-form f ,
i=1
dω = ∑ dfI ∧ dxI for a k-form ω = ∑ fI dxI .
I I
For example, we saw in Section 9.4 that for a function f , the 1-form
df = D1 f dx1 + ⋯ + Dn f dxn
is the form that measures change in f along curves. To practice this new kind
of function-differentiation in a specific case, define the function
π1 ∶ R3 Ð→ R
This calculation is purely routine. In practice, however, one often blurs the
distinction between the name of a function and its output, for instance speak-
ing of the function x2 rather than the function f ∶ R Ð→ R where f (x) = x2
or the squaring function on R. Such loose nomenclature is usually harmless
442 9 Integration of Differential Forms
d(x) = dx.
ω = x dy − y dx
And if
ω = x dy ∧ dz + y dz ∧ dx + z dx ∧ dy
then
dω = 3 dx ∧ dy ∧ dz.
The differentiation operator d commutes with sums and scalar multiples.
That is, if ω1 , ω2 are k-forms and c is a constant then
More interesting are the following two theorems about form differentiation.
Theorem 9.8.2 (Product rule for differential forms). Let A be an open
subset of Rn . Let ω and λ be respectively a k-form and an ℓ-form on A. Then
Next consider a k-form and an ℓ-form with one term each, fI dxI and gJ dxJ .
Then
d(ω ∧ λ) = d (∑ ωI ∧ ∑ λJ ) = ∑ d(ωI ∧ λJ )
I J I,J
= d ∑ ωI ∧ ∑ λJ + (−1)k ∑ ωI ∧ d ∑ λJ
I J I J
= dω ∧ λ + (−1) ω ∧ dλ.k
⊔
⊓
Because the last step in this proof consisted only in pushing sums tediously
through the other operations, typically it will be omitted from now on, and
proofs will be carried out for the case of one-term forms.
Consider a function f (x, y) on R2 . Its derivative is
444 9 Integration of Differential Forms
The dx ∧ dx term and the dy ∧ dy term are both 0. And the other two terms
sum to 0, because the mixed partial derivatives D12 f (x, y) and D21 f (x, y)
are equal while dy ∧ dx and dx ∧ dy are opposite. Overall, then,
d2 f = 0.
d2 = 0.
and so
n
d2 f = d(df ) = ∑ d(Di f ) ∧ dxi = ∑ Dij f dxj ∧ dxi .
i=1 i,j
All terms with i = j cancel because dxi ∧ dxi = 0, and the rest of the terms
cancel pairwise because for i ≠ j, Dji f = Dij f (equality of mixed partial
derivatives) and dxi ∧dxj = −dxj ∧dxi (skew symmetry of the wedge product).
Thus
d2 f = 0.
Also, for a k-form dxI with constant coefficient function 1,
dω = df ∧ dxI ,
A form ω is called
and
closed if dω = 0.
Theorem 9.8.3 shows that:
The converse question, whether every closed form is exact, is more subtle. We
will discuss it in Section 9.11.
Exercises
ω0 = φ,
ω1 = f1 dx + f2 dy + f3 dz,
ω2 = g1 dy ∧ dz + g2 dz ∧ dx + g3 dx ∧ dy,
ω3 = h dx ∧ dy ∧ dz.
∇ = (D1 , D2 , D3 ),
where the Di are familiar partial derivative operators. Thus, for a function
φ ∶ R3 Ð→ R,
∇φ = (D1 φ, D2 φ, D3 φ).
Similarly, for a mapping F = (f1 , f2 , f3 ) ∶ R3 Ð→ R3 , ∇ × F is defined in the
symbolically appropriate way, and for a mapping G = (g1 , g2 , g3 ) ∶ R3 Ð→ R3 ,
so is ⟨∇, G⟩. Write down explicitly the vector-valued mapping ∇ × F and the
function ⟨∇, G⟩ for F and G as just described. The vector-valued mapping ∇φ
is the gradient of φ from Section 4.8,
446 9 Integration of Differential Forms
grad φ = ∇φ.
curl F = ∇ × F.
9.8.5. Continuing with the notation of the previous two problems, introduce
correspondences between the classical scalar–vector environment and the en-
vironment of differential forms, as follows. Let
dV = dx ∧ dy ∧ dz.
Let id be the mapping that takes each function φ ∶ R3 Ð→ R to itself, but with
Ð→
the output-copy of φ viewed as a 0-form. Let ⋅ds be the mapping that takes
each vector-valued mapping F = (f1 , f2 , f3 ) to the 1-form
Ð
→
F ⋅ ds = f1 dx + f2 dy + f3 dz.
Let ⋅dn be the mapping that takes each vector-valued mapping G = (g1 , g2 , g3 )
Ð→
to the 2-form
Ð→
G ⋅ dn = g1 dy ∧ dz + g2 dz ∧ dx + g3 dx ∧ dy.
And let dV be the mapping that takes each function h to the 3-form
h dV = h dx ∧ dy ∧ dz.
Combine the previous problems to verify that the following diagram com-
mutes, meaning that either path around each square yields the same result.
(Do each square separately, e.g., for the middle square start from an arbitrary
(f1 , f2 , f3 ) with no assumption that it is the gradient of some function φ.)
φ
✤ grad
/ (f1 , f2 , f3 ) ✤ curl / (g1 , g2 , g3 ) ✤ div /h
❴ ❴ ❴ ❴
Ð
→ Ð→
⋅ds ⋅dn
id dV
f1 dx g1 dy ∧ dz
+f2 dy ✤ +g2 dz ∧ dx ✤ / h dx ∧ dy ∧ dz
✤ d d d
φ / /
+f3 dz +g3 dx ∧ dy
Explain, using the diagram from the preceding exercise and the nilpotence
of d. For a function φ ∶ R3 Ð→ R, write out the harmonic equation (or
Laplace’s equation), which does not automatically hold for all φ but turns
out to be an interesting condition,
div(grad φ) = 0.
∫ f = ∫ (f ○ T ) ⋅ ∣ det T ′ ∣.
T (D) D
Using this formula, and thinking of T as mapping from (r, θ)-space forward
to (x, y)-space, every form on (x, y)-space can naturally be converted back
into a form on (r, θ)-space, simply by substituting r cos θ for x and r sin θ
for y. If the form on (x, y)-space is named λ then the form on (r, θ)-space is
denoted T ∗ λ. For example, the 2-form that gives area on (x, y)-space,
λ = dx ∧ dy,
Working out the derivatives and then the wedge shows that
448 9 Integration of Differential Forms
Thus (now dropping the wedges from the notation), this process has converted
dx dy into r dr dθ as required by the change of variable theorem.
For another example, continue to let T denote the polar coordinate map-
ping, and consider a 1-form on (x, y)-space (for (x, y) ≠ (0, 0)),
x dy − y dx
ω= .
x2 + y 2
The corresponding 1-form on (r, θ)-space (for r > 0) is
r cos θ d(r sin θ) − r sin θ d(r cos θ)
T ∗ω =
(r cos θ)2 + (r sin θ)2
.
d(r sin θ) = sin θ dr + r cos θ dθ, d(r cos θ) = cos θ dr − r sin θ dθ,
(See Figure 9.10.) To infinitesimalize this, multiply it by dt, and then, to make
the resulting form measure infinitesimal change in the polar angle θ along the
curve, we also need to divide by the distance from the origin to get altogether
(x dy − y dx)/(x2 + y 2 ).
For a third example, again start with the 1-form
x dy − y dx
ω= ,
x2 + y 2
but this time consider a different change of variable mapping,
(x′ , y ′ )
(x, y)×
(x, y)
The 1-form on (u, v)-space (for (u, v) ≠ (0, 0)) corresponding to ω is now
and so
(u2 − v 2 )(v du + u dv) − 2uv(u du − v dv)
T ∗ω = 2
(u2 + v 2 )2
((u − v )v − 2u2 v) du + ((u2 − v 2 )u + 2uv 2 ) dv
2 2
=2
(u2 + v 2 )2
u dv − v du
=2 2 .
u + v2
Thus T ∗ ω is essentially the original form, except that it is doubled, and now
it is a form on (u, v)-space. The result of the calculation stems from the fact
that T is the complex square mapping, which doubles angles. The original
form ω, which measures change of angle in (x, y)-space, has transformed back
to the form that measures twice the change of angle in (u, v)-space. Integrating
T ∗ ω along a curve γ in (u, v)-space that misses the origin returns twice the
change in angle along this curve, and this is the change in angle along the
image-curve T ○ γ in (x, y)-space.
450 9 Integration of Differential Forms
ω = ∑ fI dyI ,
is
T ∗ ω = ∑(fI ○ T ) dTI .
I
T ∗ f = f ○ T.
As the examples before the definition have shown, computing pullbacks is easy
and purely mechanical: given a form ω in terms of y’s and dy’s, its pullback
T ∗ ω comes from replacing each yi in ω by the expression Ti (x1 , . . . , xn ) and
then working out the resulting d’s and wedges.
The fact that pulling the form dx ∧ dy back through the polar coordinate
mapping produced the factor r from the change of variable theorem is no
coincidence.
We have already seen this result for n = 2 in Section 9.7 and for n = 3 in
Exercise 9.7.2.
Proof. The only increasing n-tuple from {1, . . . , n} is (1, . . . , n). As a product
of n 1-forms on Rn , ∆(a1 , a2 , . . . , an ) is an n-form on Rn , and therefore it is
a scalar-valued function δ(a1 , a2 , . . . , an ) times dx(1,...,n) . The relation
δ(a1 , a2 , . . . , an ) dx(1,...,n) = ω1 ∧ ω2 ∧ ⋯ ∧ ωn ,
where ωi is the inner product ai ⋅ (dx1 , . . . , dxn ) for each i, combines with
various properties of the wedge product to show that the following three con-
ditions hold:
• The function δ is linear in each of its vector variables, e.g.,
and
δ(a1 , ca2 , . . . , an ) = c δ(a1 , a2 , . . . , an ).
• The function δ is skew-symmetric, i.e., transposing two of its vector vari-
ables changes its sign.
• The function δ is normalized, i.e., δ(e1 , e2 , . . . , en ) = 1.
The determinant is the unique function satisfying these three conditions, so δ =
det. ⊔
⊓
Proof. By definition,
The right side is precisely ∆(Ti′1 , Ti′2 , . . . , Ti′n ), so the lemma completes the
proof. ⊔
⊓
You may want to verify this directly to get a better feel for the pullback and the
lemma. In general, the pullback–determinant theorem can be a big time-saver
for computing pullbacks when the degree of the form equals the dimension of
the domain space. Instead of multiplying out lots of wedge products, simply
compute the relevant subdeterminant of a derivative matrix.
What makes the integration of differential forms invariant under change of
variable is that the pullback operator commutes with everything else in sight.
T ∗ (ω1 + ω2 ) = T ∗ ω1 + T ∗ ω2 ,
T ∗ (cω) = c T ∗ ω.
T ∗ (ω ∧ λ) = (T ∗ ω) ∧ (T ∗ λ).
That is, the pullback is linear, the pullback is multiplicative (meaning that
it preserves products), and the pullback of the derivative is the derivative of
the pullback. The results in the theorem can be expressed in commutative
diagrams, as in Exercise 9.8.5. Part (2) says that the following diagram com-
mutes:
Λk (B) × Λℓ (B) / Λk (A) × Λℓ (A)
(T ∗ ,T ∗ )
∧ ∧
Λk (B) / Λk (A)
T∗
d d
All of this is especially gratifying because the pullback itself is entirely natural.
Furthermore, the proofs are straightforward: all we need to do is compute, ap-
ply definitions, and recognize definitions. The only obstacle is that the process
requires patience.
Proof. (1) Is immediate from the definition.
(2) For one-term forms f dyI and g dyJ ,
T ∗ (f dyI ∧ g dyJ ) = T ∗ (f g dy(I,J) ) by definition of multiplication
= (f g) ○ T dT(I,J) by definition of the pullback
= f ○ T dTI ∧ g ○ T dTJ since (f g) ○ T = (f ○ T )(g ○ T )
= T ∗ (f dyI ) ∧ T ∗ (g dyJ ) by definition of the pullback.
The result on multiterm forms follows from this and (1).
(3) For a 0-form f ∶ Rm Ð→ R, compute that
m
T ∗ (df ) = T ∗ (∑ Di f dyi ) applying the definition of d
i=1
m
= ∑(Di f ○ T ) dTi
applying the definition
i=1 of the pullback
m n
= ∑ Di f ○ T ⋅ ∑ Dj Ti dxj applying the definition of d
i=1 j=1
n m
= ∑ [∑(Di f ○ T ) ⋅ Dj Ti ] dxj interchanging the sums
j=1 i=1
n
= ∑ Dj (f ○ T ) dxj recognizing the chain rule
j=1
454 9 Integration of Differential Forms
For a one-term k-form f dyI we have d(f dyI ) = df ∧ dyI , so by (2) and the
result for 0-forms,
Λk (C) / Λk (B)
4 Λ (A).
S∗ T∗ / k
(S○T )∗
Since every k-form is a sum of wedge products of 0-forms and 1-forms, and
since the pullback passes through sums and products, the general case follows.
⊔
⊓
Recapitulating this section: To pull a differential form back though a map
is to change variables in the form naturally. Because the wedge product has
the determinant wired into it, so does the pullback. Because the pullback is
natural, it commutes with addition, scalar multiplication, wedge multiplica-
tion, and differentiation of forms, and it anticommutes with composition of
forms. That is, everything that we are doing is preserved under change of
variables.
The results of this section are the technical heart of this chapter. The
reader is encouraged to contrast their systematic algebraic proofs with the
tricky analytic estimates in the main proofs of Chapter 6. The work of this
section will allow the pending proof of the general fundamental theorem of
integral calculus to be carried out by algebra, an improvement over hand-
waving geometry or tortuous analysis. The classical integration theorems of
the nineteenth century will follow without recourse to the classical procedure
of cutting a big curvy object into many pieces and then approximating each
small piece by a straight piece instead. The classical procedure is either im-
precise or byzantine, but for those willing to think algebraically, the modern
procedure is accurate and clear.
We end this section by revisiting the third example from its beginning.
Recall that we considered the 1-form
x dy − y dx
ω=
x2 + y 2
and the complex square mapping
and we computed that the pullback T ∗ ω was twice ω, but written in (u, v)-
coordinates. Now we obtain the same result more conceptually in light of the
results of this section. The idea is that since ω measures change in angle, which
doubles under the complex square mapping, the result will be obvious in polar
coordinates, and furthermore, the pullback behaves so well under changes of
variable that the corresponding result for Cartesian coordinates will follow
easily as well. Thus, consider the polar coordinate mapping
And the polar coordinate mapping also applies to the polar coordinates that
are output by the complex square mapping,
456 9 Integration of Differential Forms
Φ ∶ R>0 × R Ð→ R2 /{(0, 0)}, Φ(r̃, θ̃) = (r̃ cos θ̃, r̃ sin θ̃) = (x, y).
R>0 × R
Φ / R2 /{(0, 0)}
S T
/ R2 /{(0, 0)}.
R>0 × R
Φ
❴
✤❴
dθ̃ o ω
Since d(2θ) = 2 dθ, the sought-for pullback T ∗ ω must be the (u, v)-form that
pulls back through the polar coordinate mapping to 2 dθ. And so T ∗ ω should
be the double of ω, but with u and v in place of x and y,
u dv − v du
T ∗ω = 2 .
u2 + v 2
This is the value of T ∗ ω that we computed mechanically at the beginning
of this section. Indeed, note that this second derivation of T ∗ ω makes no
reference whatsoever to the formula T (u, v) = (u2 − v 2 , 2uv), only to the fact
that in polar coordinates the complex square mapping squares the radius and
doubles the angle.
Similarly, we can use these ideas to pull the area-form λ = dx ∧ dy back
through T . Indeed, dx ∧ dy pulls back through the polar coordinate mapping
to r̃ dr̃ ∧ dθ̃, which pulls back through S to r2 d(r2 ) ∧ d(2θ) = 4r3 dr ∧ dθ. Thus
we have a commutative diagram
4r3 drO ∧ dθ o
✤
T ∗O λ
❴
✤❴
r̃ dr̃ ∧ dθ̃ o λ
9.9 Algebra of Forms: The Pullback 457
So T ∗ λ must pull back through the polar coordinate mapping to 4r3 dr ∧ dθ.
Since the area-form du ∧ dv pulls back to r dr ∧ dθ, the answer is the area
form du ∧ dv multiplied
√ by 4r2 in (u, v)-coordinates. That is, since r in (u, v)-
coordinates is u + v ,
2 2
Exercises
T (S λ).
∗ ∗
As at the end of this section, in light of the fact that T is the complex reciprocal
mapping, determine what T ∗ ω and T ∗ λ must be. If you wish, confirm your
answers by computing them mechanically as at the beginning of this section.
9.9.3. Consider a differential form on the punctured (x, y)-plane,
x dx + y dy
µ= √ .
x2 + y 2
(a) Pull µ back through the polar coordinate mapping from the end of this
section,
458 9 Integration of Differential Forms
Φ ∶ [0, 2π] × [0, π] Ð→ A, Φ(θ, ϕ) = (r cos θ sin ϕ, r sin θ sin ϕ, r cos ϕ).
Compute the derivative matrix Φ′ (θ, ϕ), and use the pullback–determinant
theorem three times to compute the pullback Φ∗ ω. Compare your answer
to the integrand of the surface integral near the end of Section 9.1 used to
compute the volume of the sphere of radius r. (It follows that ω is the area-
form for the particular surface Φ in this exercise, but not that ω is a general
area-form for all surfaces.)
Then
∫ ω=∫ Φ∗ ω.
Φ ∆D
∫ f dxI = ∫ (f ○ Φ) det ΦI
′
by definition, as in (9.14)
Φ D
⊔
⊓
The general change of variable theorem for differential forms follows im-
mediately from the pullback theorem and the contravariance of the pullback.
∫ ω = ∫ T ∗ ω.
T ○Φ Φ
∫ ω=∫ (T ○ Φ)∗ ω = ∫ Φ∗ (T ∗ ω) = ∫ T ∗ ω.
T ○Φ ∆D ∆D Φ
⊔
⊓
Exercise
Let γ be the curve γ ∶ [0, 1] Ð→ R2 given by γ(t) = (1, t) mapping the unit
interval into (x1 , x2 )-space, and let T ○ γ be the corresponding curve mapping
into (y1 , y2 )-space. Let ω = y1 dy2 , a 1-form on (y1 , y2 )-space.
(a) Compute T ○ γ, and then compute ∫T ○γ ω using formula (9.14).
(b) Compute T ∗ ω, the pullback of ω by T .
(c) Compute ∫γ T ∗ ω using formula (9.14). What theorem says that the
answer here is the same as (a)?
(d) Let λ = dy1 ∧ dy2 , the area form on (y1 , y2 )-space. Compute T ∗ λ.
(e) A rectangle in the first quadrant of (x1 , x2 )-space,
R = {(x1 , x2 ) ∶ a1 ≤ x1 ≤ b1 , a2 ≤ x2 ≤ b2 },
and
ω is closed if dω = 0.
The nilpotence of d (the rule d2 = 0 from Theorem 9.8.3) shows that every
exact form is closed. We now show that under certain conditions, the converse
is true as well, i.e., under certain conditions a closed differential form can be
antidifferentiated.
A homotopy of a set is a process of deforming the set to a single point,
the deformation taking place entirely within the original set. For example,
consider the open ball
A = {x ∈ Rn ∶ ∣x∣ < 1}.
A mapping that shrinks the ball to its center as one unit of time elapses is

h : [0, 1] × A → A,   h(t, x) = tx.

At time t = 1 this is the identity mapping on A, and at time t = 0 the
ball has shrunk to its center. (So here we have let time flow from t = 1 to t = 0
for convenience.)
However, the geometric story just told is slightly misleading. We could
replace the ball A in the previous example by all of Euclidean space Rn , and
the map
h ∶ [0, 1] × Rn Ð→ Rn , h(t, x) = tx
would still contract Rn to {0} in the sense that each point x ∈ Rn is moved
by h to 0 as t varies from 1 to 0. However, at any intermediate time t ∈ (0, 1),
h(t, Rn ) = tRn = Rn is still all of Euclidean space. Although every point
of Rn is moved steadily by h to 0, h does not shrink the set Rn as a whole
until the very end of the process, when space collapses instantaneously to a
point. Each point x of Rn is taken close to the origin once the time t is close
enough to 0, but the required smallness of t depends on x; for no positive t,
however close to 0, is all of Rn taken close to the origin simultaneously. The
relevant language here is that homotopy is a convergent process that need not
be uniformly convergent, analogously to how a continuous function need not
be uniformly continuous. The mental movie that we naturally have of a set
shrinking to a point depicts a uniformly convergent process, and so it doesn’t
fully capture homotopy.
For another example, consider the annulus
A = {x ∈ R2 ∶ 1 < ∣x∣ < 2}.
Plausibly there is no homotopy of the annulus, meaning that the annulus
cannot be shrunk to a point by a continuous process that takes place entirely
within the annulus. But proving that there is no homotopy of the annulus is
not trivial. We will return to this point in Exercise 9.11.1.
The formal definition of a homotopy is as follows.
Definition 9.11.1 (Homotopy, contractible set). Let A be an open subset
of Rn . Let ε be a positive number and let
B = (−ε, 1 + ε) × A,
an open subset of Rn+1 . A homotopy of A is a smooth mapping
h ∶ B Ð→ A
such that for some point p of A,
h(0, x) = p   and   h(1, x) = x   for all x ∈ A.
An open subset A of Rn that has a homotopy is called contractible.
Again, the idea is that B is a sort of cylinder over A, and that at one end
of the cylinder the homotopy gives an undisturbed copy of A, while by the
other end of the cylinder the homotopy has compressed A down to a point.
This section proves the following result: if A is a contractible open subset
of R^n, then every closed form on A is exact. This is Poincaré's theorem.
To set up the proof, consider again the set

B = (−ε, 1 + ε) × A,

but for now we make no reference to the pending homotopy that will have B
as its domain. Recall that the differentiation operator d increments the degree
of a differential form. Now, by contrast, we define a linear operator that takes
differential forms on B and returns differential forms of one degree lower on A.
Let the coordinates on B be (t, x) = (t, x1 , . . . , xn ) with t viewed as the zeroth
coordinate.
c : Λ^k(B) → Λ^{k−1}(A),
c(g(t, x) dt dx_I) = (∫_{t=0}^{1} g(t, x)) dx_I,   c(g(t, x) dx_J) = 0.
However, note that cd proceeds from Λk (B) to Λk (A) via Λk+1 (B), while dc
proceeds via Λk−1 (A). To analyze the two compositions, compute first that
for a one-term differential form that contains dt,
(cd)(g(t, x) dt dx_I) = c(∑_{i=1}^{n} D_i g(t, x) dx_i dt dx_I)
                      = c(−∑_{i=1}^{n} D_i g(t, x) dt dx_{(i,I)})
                      = −∑_{i=1}^{n} (∫_{t=0}^{1} D_i g(t, x)) dx_{(i,I)},
while, using the fact that xi -derivatives pass through t-integrals for the third
equality to follow,
(dc)(g(t, x) dt dx_I) = d((∫_{t=0}^{1} g(t, x)) dx_I)
                      = ∑_{i=1}^{n} D_i (∫_{t=0}^{1} g(t, x)) dx_{(i,I)}
                      = ∑_{i=1}^{n} (∫_{t=0}^{1} D_i g(t, x)) dx_{(i,I)}.
Thus cd + dc annihilates forms that contain dt. On the other hand, for a
one-term differential form without dt,
(cd)(g(t, x) dx_J) = c(D_0 g(t, x) dt dx_J + ∑_{j=1}^{n} D_j g(t, x) dx_{(j,J)})
                   = (∫_{t=0}^{1} D_0 g(t, x)) dx_J
                   = (g(1, x) − g(0, x)) dx_J,
while
(dc)(g(t, x) dxJ ) = d(0) = 0.
That is, cd + dc replaces each coefficient function g(t, x) in forms without dt
by g(1, x) − g(0, x), a function of x only.
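The bookkeeping can be spot-checked symbolically in the smallest case n = 1, k = 1. The following sketch (assuming sympy, with an arbitrary smooth test coefficient g of our choosing) confirms that cd and dc cancel on a form g(t, x) dt, as the first computation above asserts.

    import sympy as sp

    t, x = sp.symbols('t x', real=True)
    g = sp.exp(x*t) + sp.sin(x + t)      # an arbitrary smooth test coefficient

    # d(g dt) = D_x g dx ^ dt = -(D_x g) dt ^ dx, so (cd)(g dt) = -(int_0^1 D_x g) dx
    cd = -sp.integrate(sp.diff(g, x), (t, 0, 1))

    # c(g dt) = int_0^1 g, a 0-form in x, so (dc)(g dt) = D_x(int_0^1 g) dx
    dc = sp.diff(sp.integrate(g, (t, 0, 1)), x)

    print(sp.simplify(cd + dc))          # 0: cd + dc annihilates forms containing dt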
To notate the effect of cd + dc more tidily, define the two natural mappings
from A to the cross sections of B where the pending homotopy of A will end
and where it will begin,
β_0, β_1 : A → B,   β_0(x) = (0, x),   β_1(x) = (1, x).
Because β_0 and β_1 have ranges where t is constant, and because they don't
affect x, their pullbacks annihilate any form that contains dt, while for forms
without dt,

β_0^*(g(t, x) dx_J) = g(0, x) dx_J,   β_1^*(g(t, x) dx_J) = g(1, x) dx_J.
h ∶ B Ð→ A.
h∗ ∶ Λk (A) Ð→ Λk (B), k = 0, 1, 2, . . . .
d(ch∗ ω) = ω.
This function must have derivative ω. To verify that it does, compute that its
first partial derivative is
By the chain rule and then by the fact that D1 g = D2 f , the first partial
derivative is therefore
The last integral takes the form ∫_{t=0}^{1} u v′ where u(t) = t and v(t) = f(tx, ty).
Exercises
9.11.1. (a) Here is a special case of showing that a closed form is exact without
recourse to Poincaré's theorem. A function f : R^3 → R is called homogeneous
of degree k if

f(tx, ty, tz) = t^k f(x, y, z)   for all t ∈ R and (x, y, z) ∈ R^3.
ω = (x dy − y dx) / (x^2 + y^2),
gives a nonzero answer. Explain why this shows that there is no 0-form (i.e.,
function) θ on the punctured plane such that ω = dθ.
(c) Use part (b) to show that there cannot exist a homotopy of the punc-
tured plane. How does this nonexistence relate to the example of the annulus
at the beginning of this section?
As mentioned in Section 9.3, when k = 0 this means the one-point set whose
point is ().
C = ∑_s ν_s Φ_{(s)},
2Φ − 3Ψ + 23Γ
is a k-chain in Rn . This k-chain is not the singular k-cube that maps points u
to 2Φ(u) − 3Ψ (u) + 23Γ (u) in Rn . The term formal linear combination in the
definition means that we don’t actually carry out any additions and scalings.
Rather, the coefficients νs are to be interpreted as integration multiplicities.
A k-chain, like a k-form, is a set of instructions.
C = ∑_s ν_s Φ_{(s)},

∫_{∑_s ν_s Φ_{(s)}} ω = ∑_s ν_s ∫_{Φ_{(s)}} ω.
∫_{Φ_b − Φ_a} ω = f(b) − f(a).
One can define predictable rules for addition and scalar multiplication (integer
scalars) of chains, all of which will pass through the integral sign tautologically.
Especially, the change of variable theorem for differential forms extends from
integrals over surfaces to integrals over chains,
∫_{T∘C} ω = ∫_C T^*ω.
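Because a chain is just a finite list of signed instructions, integration over chains is easy to mechanize. The following sketch (assuming sympy; the helper names are ours) integrates a 1-form over a 1-chain in R^2 and, as a test, applies it to the chain that Section 9.13 will attach to the boundary of the unit square.

    import sympy as sp

    t = sp.symbols('t', real=True)
    x, y = sp.symbols('x y', real=True)

    def integrate_1form(omega, gamma):
        # omega = (f1, f2) stands for f1 dx + f2 dy; gamma is a pair of expressions in t
        gx, gy = gamma
        f1 = omega[0].subs({x: gx, y: gy})
        f2 = omega[1].subs({x: gx, y: gy})
        return sp.integrate(f1*sp.diff(gx, t) + f2*sp.diff(gy, t), (t, 0, 1))

    def integrate_over_chain(omega, chain):
        # a chain is a list of (multiplicity, 1-cube) pairs; integrate by weighted sum
        return sum(nu*integrate_1form(omega, g) for nu, g in chain)

    # omega = x dy over the counterclockwise boundary chain of the unit square
    omega = (sp.S(0), x)
    chain = [(-1, (sp.S(0), t)), (1, (sp.S(1), t)),   # left face minus, right face plus
             (1, (t, sp.S(0))), (-1, (t, sp.S(1)))]   # bottom face plus, top face minus
    print(integrate_over_chain(omega, chain))         # 1, the area enclosed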
Exercises
9.12.1. Let A be an open subset of Rn . Consider the inner-product-like func-
tion (called a pairing)
⟨ , ⟩ ∶ {k-chains in A} × {k-forms on A} Ð→ R
defined by the rule

⟨C, ω⟩ = ∫_C ω.
Show that this inner product is bilinear, meaning that for all suitable chains
C and Ci , all suitable forms ω and ωi , and all constants ci ,
⟨∑_i c_i C_i, ω⟩ = ∑_i c_i ⟨C_i, ω⟩

and

⟨C, ∑_i c_i ω_i⟩ = ∑_i c_i ⟨C, ω_i⟩.
It makes no sense to speak of symmetry of this pairing, because the argu-
ments cannot be exchanged.
Do you think the pairing is nondegenerate, meaning that for every fixed
chain C, if ⟨C, ω⟩ = 0 for all forms ω then C must be 0, and for every fixed
form ω, if ⟨C, ω⟩ = 0 for all chains C then ω must be 0?
9.12.2. Let A be an open subset of Rn , let B be an open subset of Rm , and
let k ≥ 0. Every smooth mapping T ∶ A Ð→ B gives rise via composition to a
corresponding pushforward mapping from k-surfaces in A to k-surfaces in B,
T∗ ∶ {k-surfaces in A} Ð→ {k-surfaces in B}, T∗ Φ = T ○ Φ.
In more detail, since a k-surface in A takes the form Φ ∶ D Ð→ A where D ⊂ Rk
is a parameter domain, the pushforward mapping is
T_* : (Φ : D → A) ↦ (T ∘ Φ : D → B).
Using the pairing-notation of the previous exercise, which result from earlier
in this chapter can be renotated as
⟨T∗ Φ, ω⟩ = ⟨Φ, T ∗ ω⟩ for all suitable Φ and ω?
Note that the renotation shows that the pushforward and pullback are like a
pair of adjoint operators in the sense of linear algebra.
∂(∑_s ν_s Φ_{(s)}) = ∑_s ν_s ∂Φ_{(s)}.

∂Φ = Φ ∘ ∂∆^k.
(The composition here is of the sort defined at the end of the previous
section.)
(3) Define mappings from the standard (k−1)-cube to the faces of the standard
k-cube as follows: for every i ∈ {1, . . . , k} and α ∈ {0, 1}, the mapping to
the face where the ith coordinate equals α is

∆^k_{i,α} : [0, 1]^{k−1} → [0, 1]^k,

given by

∆^k_{i,α}(u_1, . . . , u_{k−1}) = (u_1, . . . , u_{i−1}, α, u_i, . . . , u_{k−1}).

Then

∂∆^k = ∑_{i=1}^{k} ∑_{α=0}^{1} (−1)^{i+α} ∆^k_{i,α}.    (9.16)
In property (2) the composition symbol “○” has been generalized a little
from its ordinary usage. Since ∂∆k is a chain ∑ µs Ψ(s) , the composition Φ○∂∆k
is defined as the corresponding chain ∑ µs Φ ○ Ψ(s) . The compositions in the
sum make sense, because by property (3), each Ψ(s) maps [0, 1]k−1 into [0, 1]k .
To remember the definition of ∆^k_{i,α} in (9.16), read its name as: in the standard
k-cube, set the ith variable to α. The idea of formula (9.16) is that for each of
the directions in k-space (i = 1, . . . , k), the standard k-cube has two faces with
normal vectors in the ith direction (α = 0, 1), and we should take these two
faces with opposite orientations in order to make both normal vectors point
outward. Unlike differentiation, which increments the degree of the form it
acts on, the boundary operator decrements chain dimension.
For example, the boundary of the standard 1-cube is given by (9.16),

∂∆^1 = −∆^1_{1,0} + ∆^1_{1,1}.
That is, the boundary is the right endpoint of [0, 1] with a plus and the left
endpoint with a minus. (See Figure 9.11. The figures for this section show the
images of the various mappings involved, with symbols added as a reminder
that the images are being traversed by the mappings.) One consequence of
this is that the familiar formula from the one-variable fundamental theorem
of integral calculus,
∫_0^1 f′ = f(1) − f(0),

is now expressed suggestively in the notation of differential forms as

∫_{∆^1} df = ∫_{∂∆^1} f.
Similarly, by (9.16) the boundary of the standard 2-cube is

∂∆^2 = −∆^2_{1,0} + ∆^2_{1,1} + ∆^2_{2,0} − ∆^2_{2,1}.
This chain traverses the boundary square of [0, 1]2 once counterclockwise.
(See Figure 9.12.) Next consider a singular 2-cube that parametrizes the unit
disk,
Φ ∶ [0, 1]2 Ð→ R2 , Φ(r, θ) = (r cos 2πθ, r sin 2πθ).
By property (2), ∂Φ = Φ ○ ∂∆2 . This chain traverses the boundary circle
once counterclockwise, two radial traversals cancel, and there is a degener-
ate mapping to the centerpoint. (See Figure 9.13.) Changing to Φ(r, θ) =
(r cos 2πθ, −r sin 2πθ) also parametrizes the unit disk, but now ∂Φ traverses
the boundary circle clockwise.
This chain traverses the faces of [0, 1]3 , oriented positively if we look at them
from outside the solid cube. (See Figure 9.14.)
The second boundary of the standard 2-cube works out by cancellation to
∂^2 ∆^2 = 0.

(See the left side of Figure 9.15.) And the second boundary of the standard
3-cube similarly is

∂^2 ∆^3 = 0.

(See the right side of Figure 9.15.) These two examples suggest that the
notational counterpart to the nilpotence of d is also true,

∂^2 = 0.
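The suggestion can be tested by machine before Exercise 9.13.8 proves it. The sketch below is plain Python (the representation of cubes as coordinate-insertion maps is our own device): it builds boundary chains via (9.16) and checks that in the second boundary the signed multiplicity of every composite face cancels.

    from collections import Counter
    from itertools import product

    def face(i, alpha, k):
        # Delta^k_{i,alpha}: insert the constant alpha as the ith of k coordinates
        return lambda u: u[:i-1] + (alpha,) + u[i-1:]

    def boundary(chain):
        # apply formula (9.16) to each (coefficient, dimension, map) triple
        out = []
        for coeff, k, phi in chain:
            for i, alpha in product(range(1, k+1), (0, 1)):
                f = face(i, alpha, k)
                out.append((coeff*(-1)**(i+alpha), k-1,
                            lambda u, phi=phi, f=f: phi(f(u))))
        return out

    for k in (2, 3, 4):
        second = boundary(boundary([(1, k, lambda u: u)]))
        u = tuple('u%d' % m for m in range(1, k-1))   # a formal point of [0,1]^(k-2)
        counts = Counter()
        for coeff, dim, phi in second:
            counts[phi(u)] += coeff     # equal insertion maps share an image tuple
        assert all(c == 0 for c in counts.values())
    print('del^2 Delta^k = 0 checked for k = 2, 3, 4')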
Exercises
(In fact, the image is all of the simplex, but showing this would take us too
far afield.)
(c) For each of the values k = 1, 2, 3, do the following. Calculate ∂Φ (the
result is a (k − 1)-chain). Graph ∂Φ by graphing each (k − 1)-cube in the chain
and indicating its coefficient (+1 or −1) beneath the graph. Each graph should
show [0, 1]k−1 and Rk .
9.13.2. Describe the boundary of the hemispherical shell H : D → R^3, where
D is the unit disk in R^2 and H(x, y) = (x, y, √(1 − x^2 − y^2)). (You might
parametrize D from [0, 1]^2 and then compute the boundary of the composition,
or you might simply push ∂D from this section through H.)
H = {(x, y, z) ∈ R3 ∶ x2 + y 2 + z 2 ≤ 1, z ≥ 0}.
Φ(u, v) = (u, v, u2 + v 2 ).
(Again, first make sure that you understand the geometry of the problem,
especially the interpretation of the parametrizing variables in the image-
space.) How does this exercise combine with the result ∂ 2 = 0 to bear on
Exercise 9.13.5?
9.13.7. Fix constants 0 < a < b. Describe the boundary of Φ ∶ [0, 2π] × [0, 2π] ×
[0, 1] Ð→ R3 where Φ(u, v, t) = ((b + at cos v) cos u, (b + at cos v) sin u, at sin v).
(First understand the geometry, especially the interpretation of u, v, and t in
the image-space.)
9.13.8. This exercise gives a self-contained proof that the double boundary
operator is identically zero. It suffices to show this for the double boundary
on the standard k-cube, where k ≥ 2.
(a) Explain why the double boundary is
∂^2 ∆^k = ∑_{i=1}^{k} ∑_{α=0}^{1} ∑_{j=1}^{k−1} ∑_{β=0}^{1} (−1)^{i+j+α+β} ∆^k_{i,α} ∘ ∆^{k−1}_{j,β}.
with α in the ith slot and β in the (j + 1)st slot, whereas if i > j then we have
with β in the jth slot and α in the ith slot. Thus the double boundary of the
standard k-cube consists of two sums, written as formal sums of functions of
the variables u1 , . . . , uk−2 ,
∂^2 ∆^k = ∑_{i=1}^{k−1} ∑_{j=i}^{k−1} (−1)^{i+j+α+β} (u_1, . . . , u_{i−1}, α, u_i, . . . , u_{j−1}, β, u_j, . . . , u_{k−2})
        + ∑_{i=1}^{k} ∑_{j=1}^{i−1} (−1)^{i+j+α+β} (u_1, . . . , u_{j−1}, β, u_j, . . . , u_{i−2}, α, u_{i−1}, . . . , u_{k−2}).
(c) Explain why the second double sum can instead be written as

∑_{j=1}^{k−1} ∑_{i=j+1}^{k} (−1)^{i+j+α+β} (u_1, . . . , u_{j−1}, β, u_j, . . . , u_{i−2}, α, u_{i−1}, . . . , u_{k−2}).
(d) Convince yourself that it is valid to replace i by i+1 in this new second
sum, and that doing so gives
−∑_{j=1}^{k−1} ∑_{i=j}^{k−1} (−1)^{i+j+α+β} (u_1, . . . , u_{j−1}, β, u_j, . . . , u_{i−1}, α, u_i, . . . , u_{k−2}),
∫_C dω = ∫_{∂C} ω.    (9.17)
Before proving the theorem, we study two examples. First, suppose that
k = n = 1, and that the 1-chain C is a singular 1-cube Φ ∶ [0, 1] Ð→ R taking 0
and 1 to some points a and b. Then the theorem says that for every suitable
smooth function f ,
∫_a^b f′(x) dx = f(b) − f(a).
This is the one-variable fundamental theorem of integral calculus. Thus, what-
ever else we are doing, we are indeed generalizing it.
Second, to study a simple case involving more than one variable, suppose
that C = ∆2 (the standard 2-cube) and ω = f (x, y) dy for some smooth function
f ∶ [0, 1]2 Ð→ R. The derivative on the left side of (9.17) works out to
dω = D_1 f(x, y) dx ∧ dy.

Exercise 9.5.4 says that we may drop the wedges from the integral of this
2-form over the full-dimensional surface ∆2 in 2-space to obtain a Chapter 6
function-integral, and so the left side of (9.17) works out to
∫_{∆^2} dω = ∫_{∆^2} D_1 f(x, y) dx ∧ dy = ∫_{[0,1]^2} D_1 f.
Meanwhile, on the right side of (9.17), the boundary ∂∆2 has four pieces, but
on the two horizontal pieces dy is zero because y is constant. Thus only the
integrals over the two vertical pieces contribute, giving
∫_{u=0}^{1} ∫_{t=0}^{1} D_1 f(t, u) = ∫_{[0,1]^2} D_1 f.
Thus both sides of (9.17) work out to ∫[0,1]2 D1 f , making them equal, as
desired, and the general FTIC holds in this case. The first step of its proof is
essentially the same process as in this example.
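Readers can replicate the example symbolically for any test function; the sketch below assumes sympy, and f is an arbitrary choice of ours.

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = sp.sin(x*y) + x**2*y     # any smooth f on [0,1]^2

    lhs = sp.integrate(sp.diff(f, x), (x, 0, 1), (y, 0, 1))     # D_1 f over the square
    rhs = sp.integrate(f.subs(x, 1) - f.subs(x, 0), (y, 0, 1))  # the two vertical pieces
    print(sp.simplify(lhs - rhs))                               # 0, as (9.17) predicts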
To evaluate the left side ∫C dω of (9.17), we need to compute dω. In this special
case,
dω = D_j f(x) dx_j ∧ dx_J = (−1)^{j−1} D_j f dx_{(1,...,k)},
and so by Exercise 9.5.4, the left side reduces to the function-integral of the
jth partial derivative over the unit box,

∫_{∆^k} dω = (−1)^{j−1} ∫_{[0,1]^k} D_j f.    (9.18)
To evaluate the right side ∫∂C ω of (9.17), we need to examine the boundary
∂∆^k = ∑_{i=1}^{k} ∑_{α=0}^{1} (−1)^{i+α} ∆^k_{i,α},
(∆^k_{i,α})′ =
⎡ 1 0 ⋯ 0 0 ⋯ 0 ⎤
⎢ 0 1 ⋯ 0 0 ⋯ 0 ⎥
⎢ ⋮ ⋮ ⋱ ⋮ ⋮   ⋮ ⎥
⎢ 0 0 ⋯ 1 0 ⋯ 0 ⎥
⎢ 0 0 ⋯ 0 0 ⋯ 0 ⎥
⎢ 0 0 ⋯ 0 1 ⋯ 0 ⎥
⎢ ⋮ ⋮   ⋮ ⋮ ⋱ ⋮ ⎥
⎣ 0 0 ⋯ 0 0 ⋯ 1 ⎦ .
This derivative matrix is k × (k − 1), consisting of the identity matrix except
that zeros have been inserted at the ith row, displacing everything from there
downward. Meanwhile, recall that J = (1, . . . , ĵ, . . . , k), where the omitted
index j is fixed throughout this calculation. It follows that as the index i of
summation varies, the determinant of the Jth rows of the matrix is
det (∆^k_{i,α})′_J = 1 if i = j,  and 0 if i ≠ j.
That is, the integral of ω = f (x) dxJ can be nonzero only for the two terms
in the boundary chain ∂∆k with i = j, parametrizing the two boundary faces
whose normal vectors point in the direction missing from dxJ :
Here the last equality follows from the definition of integration over chains
and the defining formula (9.14). For every point u = (u1 , . . . , uk−1 ) ∈ [0, 1]k−1 ,
the integrand can be rewritten as an integral of the jth partial derivative by
the one-variable fundamental theorem of integral calculus,
By Fubini’s theorem this is equal to the right side of (9.18), and so the general
FTIC is proved in the special case.
The rest of the proof is handled effortlessly by the machinery of forms and
chains. A general (k − 1)-form on [0, 1]k is
ω = ∑_{j=1}^{k} ω_j,   each ω_j = f_j(x) dx_1 ∧ ⋯ ∧ dx̂_j ∧ ⋯ ∧ dx_k.
Each ωj is a form of the type covered by the special case, and dω = ∑j dωj .
So, continuing to integrate over the standard k-cube, and citing the special
case just shown for the crucial third equality,
∫_{∆^k} dω = ∫_{∆^k} ∑_j dω_j = ∑_j ∫_{∆^k} dω_j
           = ∑_j ∫_{∂∆^k} ω_j = ∫_{∂∆^k} ∑_j ω_j = ∫_{∂∆^k} ω.
∫_C dω = ∫_{∑_s ν_s Φ_{(s)}} dω = ∑_s ν_s ∫_{Φ_{(s)}} dω = ∑_s ν_s ∫_{∂Φ_{(s)}} ω,
with the third equality due to the result for singular cubes, and the calculation
continues
The beauty of this argument is that the only analytic results that it uses
are the one-variable FTIC and Fubini’s theorem, and the only geometry that
it uses is the definition of the boundary of a standard k-cube. All the twist-
ing and turning of k-surfaces and their boundaries in n-space is filtered out
automatically by the algebra of differential forms.
Computationally, the general FTIC will sometimes give you a choice be-
tween evaluating two integrals, one of which may be easier to work. Note that
the integral of lower dimension may not be the preferable one, however; for
example, integrating over a solid 3-cube may be quicker than integrating over
the six faces of its boundary.
Conceptually the general FTIC is exciting because it allows the possi-
bility of evaluating an integral over a region by antidifferentiating and then
integrating only over the boundary of the region instead.
Exercises
9.14.1. Similarly to the second example before the proof of the general FTIC,
show that the theorem holds when C = ∆3 and ω = f (x, y, z) dz ∧ dx.
∫_C f dω = ∫_{∂C} f ω − ∫_C df ∧ ω.
∫_{∂Φ} f_1 dy ∧ dz ∧ dw + f_2 dz ∧ dw ∧ dx + f_3 dw ∧ dx ∧ dy + f_4 dx ∧ dy ∧ dz
    = ∫_Φ (D_1 f_1 − D_2 f_2 + D_3 f_3 − D_4 f_4) dx ∧ dy ∧ dz ∧ dw.
[Figure: layout of the main results established so far — the one-variable FTIC, the one-variable change of variable theorem, and Fubini's theorem feed into the multivariable change of variable theorem (n > 1), which in turn yields the independence of ∫_Φ f from parametrization and the general FTIC.]
∫_{Φ(J)} f = ∫_J (f ∘ Φ) ⋅ det Φ′.
View the mapping Φ as a singular n-cube in Rn . (Since J need not be the unit
box, the definition of a singular n-cube is being extended here slightly to allow
any box as the domain. The boundary operator extends correspondingly, as
discussed at the end of Section 9.13.) Consider the trivial parametrization of
the image of the cube,
[Figure 9.17. Provisional layout of the main results after this section: the one-variable FTIC, the one-variable change of variable theorem, and Fubini's theorem now support the general FTIC directly, and the multivariable change of variable theorem and the independence of ∫_Φ f from parametrization follow from it.]
Here x = (x1 , . . . , xn ) and dx = dx1 ∧ ⋯ ∧ dxn , and the pullback on the right
side of the equality is Φ∗ ω = (f ○ Φ)(x) det Φ′ (x) dx. (Note that applying the
pullback theorem (Theorem 9.10.1) reduces the desired formula to
∫_{∆_{Φ(J)}} ω = ∫_Φ ω,
∫_{φ(a)}^{φ(b)} f = ∫_{φ(a)}^{φ(b)} F′ = F(φ(b)) − F(φ(a))    (9.20)
and
∫_a^b (f ∘ φ) ⋅ φ′ = ∫_a^b (F ∘ φ)′ = (F ∘ φ)(b) − (F ∘ φ)(a).    (9.21)
Since the right sides are equal, so are the left sides, giving the theorem. Here
the first version of the one-variable FTIC (Theorem 6.4.1) provides the
antiderivative F(x) = ∫_{φ(a)}^{x} f of f.
Now, starting from the integral on the left side of the desired equal-
ity (9.19), attempt to pattern-match the calculation (9.20) without yet wor-
rying about whether the steps are justified or even meaningful,
∫_{∆_{Φ(J)}} ω = ∫_{∆_{Φ(J)}} dλ = ∫_{∂∆_{Φ(J)}} λ.    (9.22)
Similarly, the integral on the right side of (9.19) looks like the integral at the
beginning of the calculation (9.21), so pattern-match again,
∫_{∆_J} Φ^*ω = ∫_{∆_J} d(Φ^*λ) = ∫_{∂∆_J} Φ^*λ.    (9.23)

If the pattern-matching is valid, then comparing (9.22) and (9.23) reduces the
desired equality to

∫_{∂∆_{Φ(J)}} λ = ∫_{∂∆_J} Φ^*λ.
This formula looks like the desired (9.19) but with (n−1)-dimensional integrals
of (n−1)-forms. Perhaps we are discovering a proof of the multivariable change
of variable theorem by induction on the number of variables. But we need to
check whether the calculation is sensible.
Just as the one-variable calculation rewrote f as F ′ , the putative multi-
variable calculation has rewritten ω as dλ, but this needs justification. Recall
that ω = f (x) dx. Although Φ(J) is not a box, an application of Theorem 6.4.1
to the first variable shows that in the small, f takes the form
f (x1 , x2 , . . . , xn ) = D1 F (x1 , x2 , . . . , xn ).
Φ ∶ J Ð→ Rn .
The box J is compact, and hence so is its continuous image Φ(J). Therefore
some large box B contains them both. If J is small enough then because
det Φ′ > 0 on J, it follows from some analysis that Φ extends to a mapping
Ψ ∶ B Ð→ Rn
such that
• Ψ is the original Φ on J,
• Ψ takes the complement of J in B to the complement of Φ(J) in B,
• Ψ is the identity mapping on the boundary of B.
(See Figure 9.19.) Furthermore, the n-form ω on the original Φ(J) can be
modified into a form ω on the larger set B such that
• ω is the original ω on Φ(J),
• ω = 0 essentially everywhere off the original Φ(J).
And now that the nonbox Φ(J) has been replaced by the box B, the calcula-
tion of the antiderivative form λ such that ω = dλ works in the large.
Let ∆B denote the trivial parametrization of B. Then the properties of Ψ
and ω show that the desired equality (9.19) has become
∫_{∆_B} ω = ∫_{∆_B} Ψ^*ω,
the integrals on both sides now being taken over the same box B. Again
pattern-matching the one-variable proof shows that the integral on the left
side is
∫_{∆_B} ω = ∫_{∆_B} dλ = ∫_{∂∆_B} λ
and the integral on the right side is
∫_{∆_B} Ψ^*ω = ∫_{∆_B} d(Ψ^*λ) = ∫_{∂∆_B} Ψ^*λ,
where everything here makes sense. Thus the problem is reduced to proving
that
∫_{∂∆_B} λ = ∫_{∂∆_B} Ψ^*λ.
And now the desired equality is immediate: since Ψ is the identity mapping on
the boundary of B, the pullback Ψ ∗ in the right-side integral of the previous
display does nothing, and the two integrals are equal. (See Exercise 9.15.1 for a
slight variant of this argument.) The multivariable argument has ended exactly
as the one-variable argument did. We did not need to argue by induction after
all.
In sum, the general FTIC lets us side-step the traditional proof of the
classical change of variable theorem, by expanding the environment of the
problem to a larger box and then reducing the scope of the question to the
larger box’s boundary. On the boundary there is no longer any difference
between the two quantities that we want to be equal, and so we are done.
The reader may well object that the argument here is only heuristic, and
that there is no reason to believe that its missing technical details will be
any less onerous than those of the usual proof of the classical change of variable
theorem. The difficulty of the usual proof is that it involves nonboxes, while
the analytic details of how this argument proceeds from the nonbox Φ(J) to
a box B were not given. Along with the extensions of Φ and ω to B being
invoked, the partitioning of J into small enough subboxes was handwaved.
Furthermore, the change of variable mapping Φ is assumed here to be smooth,
whereas in Theorem 6.7.1 it need only be C 1 . But none of these matters is
serious. A second article by Lax, written in response to such objections, shows
how to take care of them. Although some analysis is admittedly being elided
here, the new argument nonetheless feels more graceful to the author of these
notes than the older one.
Exercise
9.15.1. Show that in the argument at the end of this section, we could instead
reason about the integral on the right side that
∫_{∆_B} Ψ^*ω = ∫_Ψ dλ = ∫_{∂Ψ} λ.
Thus the problem is reduced to proving that ∫∂∆B λ = ∫∂Ψ λ. Explain why the
desired equality is immediate.
Setting n = k = 2 gives Green's theorem: Let A be an open subset of R^2.
For a singular 2-cube Φ in A and functions f, g : A → R,

∬_Φ (∂g/∂x − ∂f/∂y) dx ∧ dy = ∫_{∂Φ} f dx + g dy.
The double integral sign is used on the left side of Green’s theorem to em-
phasize that the integral is two-dimensional. Naturally the classical statement
doesn’t refer to a singular cube or include a wedge. Instead, the idea classi-
cally is to view Φ as a set in the plane and require a traversal of ∂Φ (also
viewed as a set) such that Φ is always to the left as one moves along ∂Φ.
Other than this, the boundary integral is independent of how the boundary is
traversed because the whole theory is invariant under orientation-preserving
reparametrization. (See Figure 9.20.)
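As a concrete sanity check, Green's theorem is easy to verify symbolically on the unit square. This sketch assumes sympy; the coefficient functions f and g are arbitrary test choices of ours, and the boundary is traversed counterclockwise so that Φ stays on the left.

    import sympy as sp

    x, y, t = sp.symbols('x y t', real=True)
    f, g = x*y**2, x**3 - y       # arbitrary test coefficients in f dx + g dy

    lhs = sp.integrate(sp.diff(g, x) - sp.diff(f, y), (x, 0, 1), (y, 0, 1))

    # counterclockwise boundary: bottom, right, then top and left reversed
    edges = [((t, sp.S(0)), 1), ((sp.S(1), t), 1), ((t, sp.S(1)), -1), ((sp.S(0), t), -1)]
    rhs = 0
    for (gx, gy), sign in edges:
        integrand = (f.subs({x: gx, y: gy})*sp.diff(gx, t)
                     + g.subs({x: gx, y: gy})*sp.diff(gy, t))
        rhs += sign*sp.integrate(integrand, (t, 0, 1))

    print(sp.simplify(lhs - rhs))   # 0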
Green’s theorem has two geometric interpretations. To understand them,
first let A ⊂ R2 be open and think of a vector-valued mapping F⃗ ∶ A Ð→ R2
as defining a fluid flow in A. Define two related scalar-valued functions on A,
curl F⃗ = D1 F2 − D2 F1 and div F⃗ = D1 F1 + D2 F2 .
These are two-dimensional versions of the quantities from exercises 9.8.4
and 9.8.5. Now consider a point p in A. Note that curl F⃗ (p) and div F⃗ (p)
depend only on the derivatives of F⃗ at p, not on F⃗ (p) itself. So replacing F⃗
by F⃗ − F⃗ (p), we may assume that F⃗ (p) = 0, i.e., the fluid flow is stationary
at p. Recall that D1 F2 is the rate of change of the vertical component of F
with respect to change in the horizontal component of its input, and D2 F1 is
the rate of change of the horizontal component of F with respect to change
in the vertical component of its input. The left side of Figure 9.21 shows a
scenario in which the two terms D1 F2 and −D2 F1 of (curl F⃗ )(p) are positive.
The figure illustrates why curl F⃗ is interpreted as measuring the vorticity of F⃗
at p, its tendency to rotate a paddle wheel at p counterclockwise. Similarly,
D1 F1 is the rate of change of the horizontal component of F with respect
to change in the horizontal component of its input, and D2 F2 is the rate of
change of the vertical component of F with respect to change in the verti-
cal component of its input. The right side of Figure 9.21 shows a scenario in
which the terms of (div F⃗ )(p) are positive. The figure illustrates why div F⃗
is viewed as measuring the extent to which fluid is spreading out from p, i.e.,
how much fluid is being pumped into or drained out of the system at the
point. Specifically, the left side of the figure shows the vector field
9.16 The Classical Theorems 489
F⃗ (x, y) = (−y, x)
and the right side shows (with some artistic license taken to make the figure
legible rather than accurate) the vector field
F⃗ (x, y) = (x, y)
In the language of these interpretations, Green's theorem says that
the net counterclockwise vorticity of F⃗ throughout Φ
equals the net circulation of F⃗ counterclockwise along ∂Φ,
and
the net positive rate of creation of fluid by F⃗ throughout Φ
equals the net flux of F⃗ outward through ∂Φ.
These interpretations appeal strongly to physical intuition.
We can also bring dimensional analysis to bear on the integrals in Green’s
theorem. Again view the vector field F⃗ as a velocity field describing a fluid
flow. Thus each component function of F⃗ carries units of length over time
(for instance, m/s). The partial derivatives that make up curl F⃗ and div F⃗
are derivatives with respect to space-variables, so the curl and the divergence
carry units of reciprocal time (1/s). The units of the area-integral on the left
side of Green's theorem are thus area over time (1/s · m^2 = m^2/s), as are the
units of the path-integral on the right side (m/s · m = m^2/s as well). Thus both
integrals measure area per unit of time. If the fluid is incompressible then
area of fluid is proportional to mass of fluid, and so both integrals essentially
measure fluid per unit of time: the amount of fluid being created throughout
the region per unit of time, and the amount of fluid passing through the
boundary per unit of time; or the amount of fluid circulating throughout the
region per unit of time, and the amount of fluid flowing along the boundary
per unit of time.
The physical interpretations of divergence and curl will be discussed more
carefully in the next section.
Setting n = 3, k = 2 gives Stokes’s theorem: Let A be an open subset
of R3 . For a singular 2-cube Φ in A and functions f, g, h ∶ A Ð→ R,
∬_Φ (∂h/∂y − ∂g/∂z) dy ∧ dz + (∂f/∂z − ∂h/∂x) dz ∧ dx + (∂g/∂x − ∂f/∂y) dx ∧ dy
    = ∫_{∂Φ} f dx + g dy + h dz.
curl F⃗ = (D2 F3 − D3 F2 , D3 F1 − D1 F3 , D1 F2 − D2 F1 ).
∬_Φ curl F⃗ ⋅ d⃗n = ∫_{∂Φ} F⃗ ⋅ d⃗s.
∭_Φ (∂f/∂x + ∂g/∂y + ∂h/∂z) dx ∧ dy ∧ dz = ∬_{∂Φ} f dy ∧ dz + g dz ∧ dx + h dx ∧ dy.
div F⃗ = D1 F1 + D2 F2 + D3 F3 .
∭_Φ div F⃗ dV = ∬_{∂Φ} F⃗ ⋅ d⃗n.
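Like Green's theorem, the divergence theorem can be sanity-checked on the unit cube, where the six boundary faces pair off along the coordinate directions. This sketch assumes sympy, with an arbitrary test field of our choosing.

    import sympy as sp

    x, y, z = sp.symbols('x y z', real=True)
    F = (x*y, y*z, sp.sin(x)*z)        # an arbitrary smooth test field

    div = sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)
    lhs = sp.integrate(div, (x, 0, 1), (y, 0, 1), (z, 0, 1))

    # outward flux through the six faces of [0,1]^3
    rhs = (sp.integrate(F[0].subs(x, 1) - F[0].subs(x, 0), (y, 0, 1), (z, 0, 1))
           + sp.integrate(F[1].subs(y, 1) - F[1].subs(y, 0), (x, 0, 1), (z, 0, 1))
           + sp.integrate(F[2].subs(z, 1) - F[2].subs(z, 0), (x, 0, 1), (y, 0, 1)))
    print(sp.simplify(lhs - rhs))      # 0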
Exercises
9.16.1. (a) Let γ : [0, 1] → R^2, t ↦ γ(t) be a curve, and recall the form-vectors
on R^2: d⃗s = (dx, dy) and d⃗n = (dy, −dx). Compute the pullbacks γ^*(d⃗s)
and γ^*(d⃗n) and explain why these are interpreted as differential tangent and
normal vectors to γ.
(b) Let γ : [0, 1] → R^3, t ↦ γ(t) be a curve and Φ : [0, 1]^2 → R^3 a 2-surface,
and recall the form-vectors on R^3: d⃗s = (dx, dy, dz) and
d⃗n = (dy ∧ dz, dz ∧ dx, dx ∧ dy). Compute the pullbacks γ^*(d⃗s) and Φ^*(d⃗n) and
explain why these are interpreted respectively as the differential tangent vector
to γ and the differential normal vector to Φ.
area(Φ) = ∫_{∂Φ} x dy = −∫_{∂Φ} y dx.
Thus one can measure the area of a planar set by traversing its bound-
ary. (This principle was used to construct ingenious area-measuring machines
called planimeters before Green’s theorem was ever written down.)
H = {(x, y, z) ∈ R3 ∶ x2 + y 2 + z 2 = 1, z ≥ 0}.
∫_{∂H} x^2 dy ∧ dz + y^2 dz ∧ dx + z^2 dx ∧ dy,
H = {(x, y, z) ∈ R3 ∶ x2 + y 2 + z 2 ≤ 1, z ≥ 0}.
(Thus ∂H is the union of the unit disk in the (x, y)-plane and the unit upper
hemispherical shell.) Feel free to cancel terms by citing symmetry if you’re
confident of what you’re doing.
∭_D (g ∆h + ∇g ⋅ ∇h) dV = ∬_{∂D} g ∇h ⋅ d⃗n.
(Here n is the unit outward normal to D and ∇h⋅n is the directional derivative
of h in the direction of n.) Interchange g and h and subtract the resulting
formula from the first one to get
F = F_r + F_θ,

where, writing r̂(x, y) = (x, y)/|(x, y)| and θ̂(x, y) = (x, y)^×/|(x, y)| away from
the origin, the vector radial and angular components are

F_r = f_r r̂   and   F_θ = f_θ θ̂

for scalar functions f_r and f_θ. (Recall that the unary cross product
(x, y)^× = (−y, x) in R^2 rotates vectors 90 degrees counterclockwise.)
90 degrees counterclockwise.) Here fr is positive if Fr points outward and
negative if Fr points inward, and fθ is positive if Fθ points counterclockwise
and negative if Fθ points clockwise. Since F (0) = 0, the resolution of F into
radial and angular components extends continuously to the origin, fr (0) =
fθ (0) = 0, so that Fr (0) = Fθ (0) = 0 even though r̂ and θ̂ are undefined at
the origin.
The goal of this section is to express the divergence and the curl of F
at the origin in terms of the polar coordinate system derivatives that seem
naturally suited to describe them, the radial derivative of the scalar radial
component of F ,
D_r f_r(0) = lim_{r→0+} f_r(r cos θ, r sin θ)/r,

and the radial derivative of the scalar angular component of F,

D_r f_θ(0) = lim_{r→0+} f_θ(r cos θ, r sin θ)/r.
However, matters aren’t as simple here as one might hope. For one thing,
the limits are stringent in the sense that they must always exist and take
the same values regardless of how θ behaves as r → 0+ . Also, although F
is differentiable at the origin if its vector radial and angular components Fr
and Fθ are differentiable at the origin, the converse is not true. So first we
need sufficient conditions for the converse, i.e., sufficient conditions for the
components to be differentiable at the origin. Necessary conditions are always
easier to find, so Proposition 9.17.1 will do so, and then Proposition 9.17.2 will
show that the necessary conditions are sufficient. The conditions in question
are the Cauchy–Riemann equations,
D1 f1 (0) = D2 f2 (0),
D1 f2 (0) = −D2 f1 (0).
F = (f1 , f2 ) ∶ A Ð→ R2 , F (0) = 0.
Assume that the vector radial and angular components Fr and Fθ of F are
differentiable at the origin. Then F is differentiable at the origin, and the
Cauchy–Riemann equations hold at the origin.
For example, the vector field F (x, y) = (x, 0) is differentiable at the origin,
but since D1 f1 (0) = 1 and D2 f2 (0) = 0, it does not satisfy the Cauchy–
Riemann equations, and so the derivatives of the radial and angular compo-
nents of F at the origin do not exist.
Proof. As already noted, the differentiability of F at the origin is immedi-
ate, because F = Fr + Fθ and the sum of differentiable mappings is again
differentiable. We need to establish the Cauchy–Riemann equations.
The radial component Fr is stationary at the origin, and we are given
that it is differentiable at the origin. By the componentwise nature of differ-
entiability, the first component Fr,1 of Fr is differentiable at the origin, and
so necessarily both partial derivatives of Fr,1 exist at 0. Since Fr,1 vanishes
on the y-axis, the second partial derivative is 0. Thus the differentiability
criterion for the first component of F_r is

F_{r,1}(h, k) − D_1 F_{r,1}(0) h = o(h, k).
To further study the condition in the previous display, use the formula
F_r(x, y) = { (f_r(x, y)/|(x, y)|) · (x, y)   if (x, y) ≠ 0,
              0                               if (x, y) = 0 }
to substitute h fr (h, k)/∣(h, k)∣ for Fr,1 (h, k). Also, because Fθ is angular,
Fθ,1 vanishes on the x-axis, and so D1 Fθ,1 (0) = 0; thus, since f1 = Fr,1 + Fθ,1 ,
we may substitute D1 f1 (0) for D1 Fr,1 (0) as well. Altogether the condition
becomes
h (f_r(h, k)/|(h, k)| − D_1 f_1(0)) = o(h, k).
A similar argument using the second component F_{r,2} of F_r shows that

k (f_r(h, k)/|(h, k)| − D_2 f_2(0)) = o(h, k).
And so we have shown that the first Cauchy–Riemann equation holds and a
little more,
lim_{(h,k)→0} f_r(h, k)/|(h, k)| = D_1 f_1(0) = D_2 f_2(0).

A similar argument with the angular component F_θ shows that the second
Cauchy–Riemann equation holds along with the analogous limit,

lim_{(h,k)→0} f_θ(h, k)/|(h, k)| = D_1 f_2(0) = −D_2 f_1(0).
Decompose the quantity in the previous display into radial and angular com-
ponents,
F (h, k) − (ah − bk, bh + ak) = (Fr (h, k) − a(h, k)) + (Fθ (h, k) − b(−k, h)).
That is, F_r and F_θ are differentiable at the origin with respective Jacobian
matrices

F_r′(0) = [a 0; 0 a]   and   F_θ′(0) = [0 −b; b 0].
This completes the proof. ⊓⊔
F = (f1 , f2 ) ∶ A Ð→ R2 , F (0) = 0.
D_r f_r(0) = lim_{r→0+} f_r(r cos θ, r sin θ)/r

and

D_r f_θ(0) = lim_{r→0+} f_θ(r cos θ, r sin θ)/r,
both exist independently of how θ behaves as r shrinks to 0. Furthermore,
the divergence of F at the origin is twice the radial derivative of the radial
component,
(div F )(0) = 2Dr fr (0),
and the curl of F at the origin is twice the radial derivative of the angular
component,
(curl F )(0) = 2Dr fθ (0).
D_r f_r(0) = lim_{(h,k)→0} f_r(h, k)/|(h, k)| = D_1 f_1(0) = D_2 f_2(0),

so that

(div F)(0) = D_1 f_1(0) + D_2 f_2(0) = 2 D_r f_r(0).

Similarly,

D_r f_θ(0) = lim_{(h,k)→0} f_θ(h, k)/|(h, k)| = D_1 f_2(0) = −D_2 f_1(0),

so that

(curl F)(0) = D_1 f_2(0) − D_2 f_1(0) = 2 D_r f_θ(0). ⊓⊔
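The proposition can be illustrated symbolically for the linear vector field whose derivative matrix at the origin is [a −b; b a], which by Proposition 9.17.1 is exactly the shape that differentiable radial and angular components force. The sketch assumes sympy.

    import sympy as sp

    x, y, a, b = sp.symbols('x y a b', real=True)
    r = sp.symbols('r', positive=True)
    th = sp.symbols('theta', real=True)
    F1, F2 = a*x - b*y, b*x + a*y      # F'(0) = [a -b; b a]

    polar = {x: r*sp.cos(th), y: r*sp.sin(th)}
    norm = sp.sqrt(x**2 + y**2)
    fr = ((F1*x + F2*y)/norm).subs(polar)     # f_r     = F . rhat
    fth = ((-F1*y + F2*x)/norm).subs(polar)   # f_theta = F . thetahat

    print(sp.simplify(fr/r), sp.simplify(fth/r))   # a and b, independent of theta
    # so D_r f_r(0) = a and D_r f_theta(0) = b, while div F = 2a and curl F = 2b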
Exercises
9.17.1. Put R2 into correspondence with the complex number field C as fol-
lows:
[x; y] ←→ x + i y.

Show that the correspondence extends to

[a −b; b a] [x; y] ←→ (a + i b)(x + i y),

and to

|[x; y]| = |x + i y|,
9.17.2. Let A ⊂ R2 be an open set that contains the origin, and let F ∶ A Ð→
R2 be a vector field on A that is stationary at the origin. Define a complex-
valued function of a complex variable corresponding to F ,
lim_{∆z→0} (f(z + ∆z) − f(z)) / ∆z.
The limit is denoted f ′ (z).
(a) Suppose that f is complex-differentiable at 0. Compute f′(0) first by
letting ∆z go to 0 along the x-axis, and again by letting ∆z go to 0 along
the y-axis. Explain how your calculation shows that the Cauchy–Riemann
equations hold at 0.
(b) Show also that if f is complex differentiable at 0 then F is vector
differentiable at 0, meaning differentiable in the usual sense. Suppose that f
is complex-differentiable at 0, and that f′(0) = r e^{iθ}. Show that