
A Radical Approach To Real Analysis 2E - David M Bressoud


AMS / MAA TEXTBOOKS VOL 10

A Radical Approach
to Real Analysis
Second Edition

David Bressoud
Originally published by
The Mathematical Association of America, 2007.
Softcover ISBN: 978-1-4704-6904-7
LCCN: 2006933946

Copyright © 2007, held by the American Mathematical Society


Printed in the United States of America.
Reprinted by the American Mathematical Society, 2022
The American Mathematical Society retains all rights
except those granted to the United States Government.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
Committee on Books
Frank Farris, Chair
MAA Textbooks Editorial Board
Zaven A. Karian, Editor
George Exner
Thomas Garrity
Charles R. Hadlock
William Higgins
Douglas B. Meade
Stanley E. Seltzer
Shahriar Shahriari
Kay B. Somers
MAA TEXTBOOKS
Bridge to Abstract Mathematics, Ralph W. Oberste-Vorth, Aristides Mouzakitis, and Bonita
A. Lawrence
Calculus Deconstructed: A Second Course in First-Year Calculus, Zbigniew H. Nitecki
Calculus for the Life Sciences: A Modeling Approach, James L. Cornette and Ralph A.
Ackerman
Combinatorics: A Guided Tour, David R. Mazur
Combinatorics: A Problem Oriented Approach, Daniel A. Marcus
Common Sense Mathematics, Ethan D. Bolker and Maura B. Mast
Complex Numbers and Geometry, Liang-shin Hahn
A Course in Mathematical Modeling, Douglas Mooney and Randall Swift
Cryptological Mathematics, Robert Edward Lewand
Differential Geometry and its Applications, John Oprea
Distilling Ideas: An Introduction to Mathematical Thinking, Brian P. Katz and Michael
Starbird
Elementary Cryptanalysis, Abraham Sinkov
Elementary Mathematical Models, Dan Kalman
An Episodic History of Mathematics: Mathematical Culture Through Problem Solving,
Steven G. Krantz
Essentials of Mathematics, Margie Hale
Field Theory and its Classical Problems, Charles Hadlock
Fourier Series, Rajendra Bhatia
Game Theory and Strategy, Philip D. Straffin
Geometry Illuminated: An Illustrated Introduction to Euclidean and Hyperbolic Plane
Geometry, Matthew Harvey
Geometry Revisited, H. S. M. Coxeter and S. L. Greitzer
Graph Theory: A Problem Oriented Approach, Daniel Marcus
An Invitation to Real Analysis, Luis F. Moreno
Knot Theory, Charles Livingston
Learning Modern Algebra: From Early Attempts to Prove Fermat’s Last Theorem, Al
Cuoco and Joseph J. Rotman
The Lebesgue Integral for Undergraduates, William Johnston
Lie Groups: A Problem-Oriented Introduction via Matrix Groups, Harriet Pollatsek
Mathematical Connections: A Companion for Teachers and Others, Al Cuoco
Mathematical Interest Theory, Second Edition, Leslie Jane Federer Vaaler and James W.
Daniel
Mathematical Modeling in the Environment, Charles Hadlock
Mathematics for Business Decisions Part 1: Probability and Simulation (electronic
textbook), Richard B. Thompson and Christopher G. Lamoureux
Mathematics for Business Decisions Part 2: Calculus and Optimization (electronic
textbook), Richard B. Thompson and Christopher G. Lamoureux
Mathematics for Secondary School Teachers, Elizabeth G. Bremigan, Ralph J. Bremigan,
and John D. Lorch
The Mathematics of Choice, Ivan Niven
The Mathematics of Games and Gambling, Edward Packel
Math Through the Ages, William Berlinghoff and Fernando Gouvea
Noncommutative Rings, I. N. Herstein
Non-Euclidean Geometry, H. S. M. Coxeter
Number Theory Through Inquiry, David C. Marshall, Edward Odell, and Michael Starbird
Ordinary Differential Equations: from Calculus to Dynamical Systems, V. W. Noonburg
A Primer of Real Functions, Ralph P. Boas
A Radical Approach to Lebesgue’s Theory of Integration, David M. Bressoud
A Radical Approach to Real Analysis, 2nd edition, David M. Bressoud
Real Infinite Series, Daniel D. Bonar and Michael Khoury, Jr.
Teaching Statistics Using Baseball, 2nd edition, Jim Albert
Thinking Geometrically: A Survey of Geometries, Thomas Q. Sibley
Topology Now!, Robert Messer and Philip Straffin
Understanding our Quantitative World, Janet Andersen and Todd Swanson
P1: kpb
book3 MAAB001/Bressoud October 20, 2006 4:18

to the memory of my mother


Harriet Carnrite Bressoud

Preface

The task of the educator is to make the child’s spirit pass again where its
forefathers have gone, moving rapidly through certain stages but suppressing
none of them. In this regard, the history of science must be our guide.
—Henri Poincaré

This course of analysis is radical; it returns to the roots of the subject. It is not a history
of analysis. It is rather an attempt to follow the injunction of Henri Poincaré to let history
inform pedagogy. It is designed to be a first encounter with real analysis, laying out its
context and motivation in terms of the transition from power series to those that are less
predictable, especially Fourier series, and marking some of the traps into which even great
mathematicians have fallen.
This is also an abrupt departure from the standard format and syllabus of analysis.
The traditional course begins with a discussion of properties of the real numbers, moves
on to continuity, then differentiability, integrability, sequences, and finally infinite series,
culminating in a rigorous proof of the properties of Taylor series and perhaps even Fourier
series. This is the right way to view analysis, but it is not the right way to teach it. It
supplies little motivation for the early definitions and theorems. Careful definitions mean
nothing until the drawbacks of the geometric and intuitive understandings of continuity,
limits, and series are fully exposed. For this reason, the first part of this book follows the
historical progression and moves backwards. It starts with infinite series, illustrating the
great successes that led the early pioneers onward, as well as the obstacles that stymied
even such luminaries as Euler and Lagrange.
There is an intentional emphasis on the mistakes that have been made. These highlight
difficult conceptual points. That Cauchy had so much trouble proving the mean value
theorem or coming to terms with the notion of uniform convergence should alert us to the
fact that these ideas are not easily assimilated. The student needs time with them. The highly
refined proofs that we know today leave the mistaken impression that the road of discovery
in mathematics is straight and sure. It is not. Experimentation and misunderstanding have
been essential components in the growth of mathematics.


Exploration is an essential component of this course. To facilitate graphical and numerical
investigations, Mathematica and Maple commands and programs as well as investigative
projects are available on a dedicated website at www.macalester.edu/aratra.
The topics considered in this book revolve around the questions raised by Fourier’s
trigonometric series and the restructuring of calculus that occurred in the process of
answering them. Chapter 1 is an introduction to Fourier series: why they are important and
why they met with so much resistance. This chapter presupposes familiarity with partial
differential equations, but it is purely motivational and can be given as much or as little
emphasis as one wishes. Chapter 2 looks at the background to the crisis of 1807. We
investigate the difficulties and dangers of working with infinite summations, but also the
insights and advances that they make possible. More of these insights and advances are
given in Appendix A. Calculus would not have revolutionized mathematics as it did if it
had not been coupled with infinite series. Beginning with Newton’s Principia, the physical
applications of calculus rely heavily on infinite sums. The chapter concludes with a closer
look at the understandings of late eighteenth century mathematicians: how they saw what
they were doing and how they justified it. Many of these understandings stood directly in
the way of the acceptance of trigonometric series.
In Chapter 3, we begin to find answers to the questions raised by Fourier’s series. We
follow the efforts of Augustin Louis Cauchy in the 1820s to create a new foundation to the
calculus. A careful definition of differentiability comes first, but its application to many of
the important questions of the time requires the mean value theorem. Cauchy struggled—
unsuccessfully—to prove this theorem. Out of his struggle, an appreciation for the nature
of continuity emerges.
We return in Chapter 4 to infinite series and investigate the question of convergence.
Carl Friedrich Gauss plays an important role through his complete characterization of
convergence for the most important class of power series: the hypergeometric series. This
chapter concludes with a verification that the Fourier cosine series studied in the first chapter
does, in fact, converge at every value of x.
The strange behavior of infinite sums of functions is finally tackled in Chapter 5. We
look at Dirichlet’s insights into the problems associated with grouping and rearranging
infinite series. We watch Cauchy as he wrestles with the problem of the discontinuity of
an infinite sum of continuous functions, and we discover the key that he was missing. We
begin to answer the question of when it is legitimate to differentiate or integrate an infinite
series by differentiating or integrating each summand.
Our story culminates in Chapter 6 where we present Dirichlet’s proof of the validity of
Fourier series representations for all “well behaved” functions. Here for the first time we
encounter serious questions about the nature and meaning of the integral. A gap remains in
Dirichlet’s proof which can only be bridged after we have taken a closer look at integration,
first using Cauchy’s definition, and then arriving at Riemann’s definition. We conclude with
Weierstrass’s observation that Fourier series are indeed strange creatures. The function
represented by the series
\[
\cos(\pi x) + \frac{1}{2}\cos(13\pi x) + \frac{1}{4}\cos(169\pi x) + \frac{1}{8}\cos(2197\pi x) + \cdots
\]
converges and is continuous at every value of x, but it is never differentiable.

The material presented within this book is not of uniform difficulty. There are
computational inquiries that should engage all students and refined arguments that will challenge the
best. My intention is that every student in the classroom and each individual reader striking
out alone should be able to read through this book and come away with an understanding of
analysis. At the same time, they should be able to return to explore certain topics in greater
depth.

Historical Observations
In the course of writing this book, unexpected images have emerged. I was surprised
to see Peter Gustav Lejeune Dirichlet and Niels Henrik Abel reveal themselves as the
central figures of the transformation of analysis that fits into the years from 1807 through
1872. While Cauchy is associated with the great theorems and ideas that launched this
transformation, one cannot read his work without agreeing with Abel’s judgement that
“what he is doing is excellent, but very confusing.” Cauchy’s seminal ideas required two
and a half decades of gestation before anyone could begin to see what was truly important
and why it was important, where Cauchy was right, and where he had fallen short of
achieving his goals.
That gestation began in the fall of 1826 when two young men in their early 20s, Gustav
Dirichlet and Niels Henrik Abel, met to discuss and work out the implications of what
they had heard and read from Cauchy himself. Dirichlet and Abel were not alone in this
undertaking, but they were of the right age to latch onto it. It would become a recurring
theme throughout their careers. By the 1850s, the stage was set for a new generation
of bright young mathematicians to sort out the confusion and solidify this new vision for
mathematics. Riemann and Weierstrass were to lead this generation. Dirichlet joined Gauss
as teacher and mentor to Riemann. Abel died young, but his writings became Weierstrass’s
inspiration.
It was another twenty years before the vision that Riemann and Weierstrass had grasped
became the currency of mathematics. In the early 1870s, the general mathematical
community finally understood and accepted this new analysis. A revolution had taken place.
It was not an overthrow of the old mathematics. No mathematical truths were
discredited. But the questions that mathematicians would ask and the answers they would accept
had changed in a fundamental way. An era of unprecedented power and possibility had
opened.

Changes to the Second Edition


This second edition incorporates many changes, all with the aim of aiding students who
are learning real analysis. The greatest conceptual change is in Chapter 2 where I clarify
that the Archimedean understanding of infinite series is the approach that Cauchy and the
mathematical community have adopted. While this chapter still has a free-wheeling style in
its use of infinite series—the intent being to convey the power and importance of infinite
series—it also begins to introduce rigorous justification of convergence. A new section
devoted entirely to geometric series has been added. Chapter 4, which introduces tests of
convergence, has been reorganized.

I have also trimmed some of the digressions that I found led students to lose sight of my
intent. In particular, the section on the Newton–Raphson method and the proof of Gauss’s
test for convergence of hypergeometric series have been taken out of the text. Because I
feel that this material is still important, though not central, these sections and much more
are available on the web site dedicated to this book.

Web Resource: When you see this box with the designation “Web Resource”,
more information is available in a pdf file, Mathematica notebook, or Maple
worksheet that can be downloaded at www.macalester.edu/aratra. The box is
also used to point to additional information available in Appendix A.

I have added many new exercises, including many taken from Problems in Mathematical
Analysis by Kaczor and Nowak. Problems taken from this book are identified in Appendix C.
I wish to acknowledge my debt to Kaczor and Nowak for pulling together a beautiful
collection of challenging problems in analysis. Neither they nor I claim that they are the
original source for all of these problems.
All code for Mathematica and Maple has been removed from the text to the website.
Exercises for which these codes are available are marked with the symbol M&M. The
appendix with selected solutions has been replaced by a more extensive appendix of hints.
I considered adding a new chapter on the structure of the real numbers. Ultimately,
I decided against it. That part of the story properly belongs to the second half of the
nineteenth century when the progress described in this book led to a thorough reappraisal
of integration. To everyone’s surprise this was not possible without a full understanding
of the real numbers which were to reveal themselves as far more complex than had been
thought. That is an entirely other story that will be told in another book, A Radical Approach
to Lebesgue’s Theory of Integration.

Acknowledgements
Many people have helped with this book. I especially want to thank the NSA and the
MAA for financial support; Don Albers, Henry Edwards, and Walter Rudin for their early
and enthusiastic encouragement; Ray Ayoub, Allan Krall, and Mark Sheingorn for helpful
suggestions; and Ivor Grattan-Guinness who was extremely generous with his time and
effort, suggesting historical additions, corrections, and references. The epilogue is among
the additions that were made in response to his comments. I am particularly indebted
to Meyer Jerison who went through the manuscript of the first edition very carefully and
pointed out many of the mathematical errors, omissions, and questionable approaches in the
early versions. Some was taken away and much was added as a result of his suggestions. I
take full responsibility for any errors or omissions that remain. Susan Dziadosz assisted with
the exercises. Her efforts helped weed out those that were impossible or incorrectly stated.
Beverly Ruedi helped me through many aspects of production and has shepherded this book
toward a speedy publication. Most especially, I want to thank the students who took this
course at Penn State in the spring of 1993, putting up with a very preliminary edition and
helping to identify its weaknesses. Among those who suggested improvements were Ryan
Anthony, Joe Buck, Robert Burns, Stephanie Deom, Lisa Dugent, David Dunson, Susan
Dziadosz, Susan Feeley, Rocco Foderaro, Chris Franz, Karen Lomicky, Becky Long, Ed
Mazich, Jon Pritchard, Mike Quarry, Curt Reese, Brad Rothenberger, Chris Solo, Randy
Stanley, Eric Steel, Fadi Tahan, Brian Ward, Roger Wherley, and Jennifer White.
Since publication of the first edition, suggestions and corrections have come from many
people including Dan Alexander, Bill Avant, Robert Burn, Dennis Caro, Colin Denis,
Paul Farnham II, Julian Fleron, Kristine Fowler, Øistein Gjøvik, Steve Greenfield, Michael
Kinyon, Mary Marion, Betty Mayfield, Mi-Kyong, Helen Moore, Nick O’Neill, David
Pengelley, Mac Priestley, Tommy Ratliff, James Reber, Fred Rickey, Wayne Roberts,
Cory Sand, Karen Saxe, Sarah Spence, Volker Strehl, Simon Terrington, and Stan Wagon.
I apologize to anyone whose name I may have forgotten.

David M. Bressoud
[email protected]
October 20, 2006

Contents

Preface
1 Crisis in Mathematics: Fourier’s Series
1.1 Background to the Problem
1.2 Difficulties with the Solution
2 Infinite Summations
2.1 The Archimedean Understanding
2.2 Geometric Series
2.3 Calculating π
2.4 Logarithms and the Harmonic Series
2.5 Taylor Series
2.6 Emerging Doubts
3 Differentiability and Continuity
3.1 Differentiability
3.2 Cauchy and the Mean Value Theorems
3.3 Continuity
3.4 Consequences of Continuity
3.5 Consequences of the Mean Value Theorem
4 The Convergence of Infinite Series
4.1 The Basic Tests of Convergence
4.2 Comparison Tests
4.3 The Convergence of Power Series
4.4 The Convergence of Fourier Series
5 Understanding Infinite Series
5.1 Groupings and Rearrangements
5.2 Cauchy and Continuity
5.3 Differentiation and Integration
5.4 Verifying Uniform Convergence
6 Return to Fourier Series
6.1 Dirichlet’s Theorem
6.2 The Cauchy Integral
6.3 The Riemann Integral
6.4 Continuity without Differentiability
7 Epilogue
A Explorations of the Infinite
A.1 Wallis on π
A.2 Bernoulli’s Numbers
A.3 Sums of Negative Powers
A.4 The Size of n!
B Bibliography
C Hints to Selected Exercises
Index

1 Crisis in Mathematics: Fourier’s Series

The crisis struck four days before Christmas 1807. The edifice of calculus was shaken
to its foundations. In retrospect, the difficulties had been building for decades. Yet while
most scientists realized that something had happened, it would take fifty years before the
full impact of the event was understood. The nineteenth century would see ever expanding
investigations into the assumptions of calculus, an inspection and refitting of the structure
from the footings to the pinnacle, so thorough a reconstruction that calculus was given
a new name: Analysis. Few of those who witnessed the incident of 1807 would have
recognized mathematics as it stood one hundred years later. The twentieth century was to
open with a redefinition of the integral by Henri Lebesgue and an examination of the logical
underpinnings of arithmetic by Bertrand Russell and Alfred North Whitehead, both direct
consequences of the events set in motion in that critical year. The crisis was precipitated by
the deposition at the Institut de France in Paris of a manuscript, Theory of the Propagation of
Heat in Solid Bodies, by the 39-year-old prefect of the department of Isère, Joseph Fourier.

1.1 Background to the Problem


Fourier began his investigations with the problem of describing the flow of heat in a very
long and thin rectangular plate or lamina. He considered the situation where there is no heat
loss from either face of the plate and the two long sides are held at a constant temperature
which he set equal to 0. Heat is applied in some known manner to one of the short sides,
and the remaining short side is treated as infinitely far away (Figure 1.1). This sheet can
be represented in the x, w plane by a region bounded below by the x-axis, on the left by
x = −1, and on the right by x = 1. It has a constant temperature of 0 along the left and
right edges so that if z(x, w) represents the temperature at the point (x, w), then

z(−1, w) = z(1, w) = 0, w > 0. (1.1)



FIGURE 1.1. Two views of Fourier’s thin plate.

The known temperature distribution along the bottom edge is described as a function of x:

z(x, 0) = f (x). (1.2)

Fourier restricted himself to the case where f is an even function of x, f (−x) = f (x).
The first and most important example he considered was that of a constant temperature
normalized to

z(x, 0) = f (x) = 1. (1.3)

The task was to find a stable solution under these constraints. Trying to apply a constant
temperature across the base of this sheet raises one problem: what is the value at x = 1,
w = 0? The temperature along the edge x = 1 is 0. On the other hand, the temperature
across the bottom where w = 0 is 1. Whatever value we try to assign here, there will have
to be a discontinuity.
But Joseph Fourier did find a solution, and he did it by looking at situations where the
temperature does drop off to zero as x approaches 1 along the bottom edge. What he found
is that if the original temperature distribution along the bottom edge −1 ≤ x ≤ 1 and w = 0
can be written in the form
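\[
f(x) = a_1 \cos\left(\frac{\pi x}{2}\right) + a_2 \cos\left(\frac{3\pi x}{2}\right) + a_3 \cos\left(\frac{5\pi x}{2}\right) + \cdots + a_n \cos\left(\frac{(2n-1)\pi x}{2}\right), \tag{1.4}
\]

where $a_1, a_2, \ldots, a_n$ are arbitrary constants, then the temperature of the sheet will drop off
exponentially as we move away from the x-axis,

\[
z(x, w) = a_1 e^{-\pi w/2} \cos\left(\frac{\pi x}{2}\right) + a_2 e^{-3\pi w/2} \cos\left(\frac{3\pi x}{2}\right) + \cdots + a_n e^{-(2n-1)\pi w/2} \cos\left(\frac{(2n-1)\pi x}{2}\right). \tag{1.5}
\]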

FIGURE 1.2. The functions f(x) and z(x, w).

Web Resource: To see how Fourier found this solution, go to The Derivation of
Fourier’s Solution.

For example (see Figure 1.2), if the temperature along the bottom edge is given by
the function f (x) = cos(π x/2) + 2 cos(5π x/2), then the temperature at the point (x, w),
−1 ≤ x ≤ 1, w ≥ 0, is given by
\[
z(x, w) = e^{-\pi w/2} \cos\left(\frac{\pi x}{2}\right) + 2e^{-5\pi w/2} \cos\left(\frac{5\pi x}{2}\right). \tag{1.6}
\]
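The boundary behavior of this example can be checked numerically. The following is a minimal sketch in Python; the function names `f` and `z` mirror the symbols in the text, but the code itself is an illustration of ours and not part of Fourier's argument. It evaluates equation (1.6) and confirms that the temperature vanishes on the edges x = ±1, matches f(x) along w = 0, and decays as w grows.

```python
import math

def f(x):
    """Bottom-edge temperature of the example: cos(pi x/2) + 2 cos(5 pi x/2)."""
    return math.cos(math.pi * x / 2) + 2 * math.cos(5 * math.pi * x / 2)

def z(x, w):
    """Temperature at (x, w) as given by equation (1.6)."""
    return (math.exp(-math.pi * w / 2) * math.cos(math.pi * x / 2)
            + 2 * math.exp(-5 * math.pi * w / 2) * math.cos(5 * math.pi * x / 2))

# Edges x = -1 and x = 1 are held at temperature 0 (equation (1.1)).
print("z(-1, 0.7) =", z(-1.0, 0.7))
print("z( 1, 0.7) =", z(1.0, 0.7))

# Along the bottom edge w = 0 the solution reduces to f(x) (equation (1.2)).
print("z(0.3, 0) - f(0.3) =", z(0.3, 0.0) - f(0.3))

# The temperature drops off exponentially as w increases.
for w in (0.0, 0.5, 1.0, 2.0):
    print("w =", w, " z(0, w) =", z(0.0, w))
```

The edge values come out as tiny nonzero numbers rather than exact zeros because π is only represented approximately in floating point.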

The problem with the solution in equation (1.5) is that it assumes that the distribution of
heat along the bottom edge is given by a formula of the form found in equation (1.4). Any
function that can be written in this way must be continuous and equal to 0 at x = ±1.
The constant function f (x) = 1 cannot be written in this form. One possible
interpretation is that there simply is no solution when f (x) = 1. That possibility did not sit well
with Fourier. After all, it is possible to apply a constant temperature to one end of a
metal bar.
Fourier observed that as we take larger values of n, the number of summands in
equation (1.4), we can get functions that more closely approximate f (x) = 1. If we could
take infinitely many terms, then we should be able to get a function of this form that is
exactly f (x) = 1. Fourier was convinced that this would work and boldly proclaimed his
solution. For −1 < x < 1, he asserted that
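\[
1 = \frac{4}{\pi}\left[\cos\left(\frac{\pi x}{2}\right) - \frac{1}{3}\cos\left(\frac{3\pi x}{2}\right) + \frac{1}{5}\cos\left(\frac{5\pi x}{2}\right) - \cdots\right]
= \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{2n-1} \cos\left(\frac{(2n-1)\pi x}{2}\right). \tag{1.7}
\]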
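Fourier's claim invites numerical experiment. The sketch below, in Python, sums the first N terms of the series in equation (1.7) at a few points; the function name `fourier_partial_sum` and the sample points are our own choices, made only for illustration. It gives evidence, not proof: the partial sums appear to settle toward 1 inside the interval, while at x = 1 every summand vanishes.

```python
import math

def fourier_partial_sum(x, terms):
    """Sum of the first `terms` summands of the series in equation (1.7)."""
    total = 0.0
    for n in range(1, terms + 1):
        k = 2 * n - 1  # odd multiplier 1, 3, 5, ...
        total += (-1) ** (n - 1) / k * math.cos(k * math.pi * x / 2)
    return 4 / math.pi * total

# Watch the partial sums at points inside the interval and at the endpoint.
for terms in (1, 10, 100, 1000):
    values = [fourier_partial_sum(x, terms) for x in (0.0, 0.5, 0.9, 1.0)]
    print(terms, [round(v, 4) for v in values])
```

The approach to 1 is slow and visibly slower near x = ±1, a first hint that this kind of convergence is more delicate than that of a power series.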

If true, then this implies that the temperature in the plate is given by
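\[
z(x, w) = \frac{4}{\pi}\left[e^{-\pi w/2}\cos\left(\frac{\pi x}{2}\right) - \frac{1}{3}e^{-3\pi w/2}\cos\left(\frac{3\pi x}{2}\right) + \cdots\right]
= \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{2n-1}\, e^{-(2n-1)\pi w/2} \cos\left(\frac{(2n-1)\pi x}{2}\right). \tag{1.8}
\]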
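Equation (1.8) can be explored in the same way. The following Python sketch truncates the series after a fixed number of terms; the name `plate_temperature` and the default of 200 terms are assumptions made for this illustration. For any w > 0 the exponential factors shrink the later summands very quickly, so a modest number of terms should already give a stable value.

```python
import math

def plate_temperature(x, w, terms=200):
    """Truncated version of the series in equation (1.8)."""
    total = 0.0
    for n in range(1, terms + 1):
        k = 2 * n - 1  # odd multiplier 1, 3, 5, ...
        total += ((-1) ** (n - 1) / k
                  * math.exp(-k * math.pi * w / 2)
                  * math.cos(k * math.pi * x / 2))
    return 4 / math.pi * total

# Temperature along the center line x = 0 at increasing distance from the base.
for w in (0.1, 0.5, 1.0, 2.0):
    print("w =", w, " z(0, w) ~", plate_temperature(0.0, w))

# Compare a short and a long truncation at a point with w > 0.
print(plate_temperature(0.3, 0.5, terms=10), plate_temperature(0.3, 0.5, terms=200))
```

The troublesome case is w = 0, where the exponential factors all equal 1 and the series reduces to the one in equation (1.7).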

Web Resource: To explore graphs of approximations to Fourier series go to
Approximating Fourier’s Solution.

Here was the heart of the crisis. Infinite sums of trigonometric functions had appeared
before. Daniel Bernoulli (1700–1782) proposed such sums in 1753 as solutions to the problem
of modeling the vibrating string. They had been dismissed by the greatest mathematician
of the time, Leonhard Euler (1707–1783). Perhaps Euler scented the danger they presented
to his understanding of calculus. The committee that reviewed Fourier’s manuscript, consisting of Pierre
Simon Laplace (1749–1827), Joseph Louis Lagrange (1736–1813), Sylvestre François
Lacroix (1765–1843), and Gaspard Monge (1746–1818), echoed Euler’s dismissal in an
unenthusiastic summary written by Siméon Denis Poisson (1781–1840). Lagrange was later
to make his objections explicit. In section 2.6 we shall investigate the specific objections to
trigonometric series that were raised by Lagrange and others. Well into the 1820s, Fourier
series would remain suspect because they contradicted the established wisdom about the
nature of functions.
Fourier did more than suggest that the solution to the heat equation lay in his trigonometric
series. He gave a simple and practical means of finding those coefficients, the $a_i$, for
any function. In so doing, he produced a vast array of verifiable solutions to specific
problems. Bernoulli’s proposition could be debated endlessly with little effect for it was
only theoretical. Fourier was modeling actual physical phenomena. His solution could not
be rejected without forcing the question of why it seemed to work.

Web Resource: To see Fourier’s method for finding the values of $a_i$ and to see
how to determine values for the function f (x) = 1, go to The General Solution.

There are problems with Fourier series, but they are subtler than anyone realized in that
winter of 1807–08. It was not until the 1850s that Bernhard Riemann (1826–1866) and
Karl Weierstrass (1815–1897) would sort out the confusion that had greeted Fourier and
clearly delineate the real questions.

1.2 Difficulties with the Solution


Fourier realized that equation (1.7) is only valid for −1 < x < 1. If we replace x by x + 2
in the nth summand, then it changes sign:
   
\[
\cos\left(\frac{(2n-1)\pi (x+2)}{2}\right) = \cos\left(\frac{(2n-1)\pi x}{2} + (2n-1)\pi\right) = -\cos\left(\frac{(2n-1)\pi x}{2}\right).
\]

FIGURE 1.3. $f(x) = \frac{4}{\pi}\left(\cos\frac{\pi x}{2} - \frac{1}{3}\cos\frac{3\pi x}{2} + \cdots\right)$.

It follows that for x between 1 and 3, equation (1.7) becomes


 
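\[
f(x) = -1 = \frac{4}{\pi}\left[\cos\left(\frac{\pi x}{2}\right) - \frac{1}{3}\cos\left(\frac{3\pi x}{2}\right) + \frac{1}{5}\cos\left(\frac{5\pi x}{2}\right) - \frac{1}{7}\cos\left(\frac{7\pi x}{2}\right) + \cdots\right]. \tag{1.9}
\]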

In general, f (x + 2) = −f (x). The function represented by this cosine series has a graph
that alternates between −1 and +1 as shown in Figure 1.3.
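Because every summand changes sign when x is replaced by x + 2, the same must be true of every partial sum. The Python sketch below checks this; the helper name `partial_sum` is ours and the sample points are arbitrary. It also evaluates a long partial sum at x = 2, the midpoint of the interval from 1 to 3, where equation (1.9) predicts the value −1.

```python
import math

def partial_sum(x, terms):
    """First `terms` summands of Fourier's cosine series from equation (1.7)."""
    total = 0.0
    for n in range(1, terms + 1):
        k = 2 * n - 1  # odd multiplier 1, 3, 5, ...
        total += (-1) ** (n - 1) / k * math.cos(k * math.pi * x / 2)
    return 4 / math.pi * total

# Shifting x by 2 flips the sign of each summand, hence of each partial sum.
for x in (-0.5, 0.2, 0.9):
    print(x, partial_sum(x, 50), partial_sum(x + 2, 50))

# Between 1 and 3 the series should be approaching -1 rather than +1.
print("partial sum at x = 2:", partial_sum(2.0, 2000))
```

The two columns agree up to sign and rounding error for any number of terms, so the alternation between +1 and −1 is built into the series itself and is not an artifact of truncation.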
This is very strange behavior. Equation (1.7) seems to be saying that our cosine series is
the constant function 1. Equation (1.9) says that our series is not constant. Moreover, to the
mathematicians of 1807, Figure 1.3 did not look like the graph of a function. Functions were
polynomials; roots, powers, and logarithms; trigonometric functions and their inverses; and
whatever could be built up by addition, subtraction, multiplication, division, or composition
of these functions. Functions had graphs with unbroken curves. Functions had derivatives
and Taylor series. Fourier’s cosine series flew in the face of everything that was known
about the behavior of functions. Something must be very wrong.
Fourier was aware that his justification for equation (1.7) was not rigorous. It began
with the assumption that such a cosine series should exist, and in a crucial step he assumes
that the integral of such a series can be obtained by integrating each summand. In fact,
strange things happen when you try to integrate or differentiate this series by integrating or
differentiating each term.

Term-by-term Integration and Differentiation


Term-by-term integration, the ability to find the integral of a sum of functions by integrating
each summand, works for finite sums,
\[
\int_a^b \bigl(f_1(x) + f_2(x) + \cdots + f_n(x)\bigr)\,dx
= \int_a^b f_1(x)\,dx + \int_a^b f_2(x)\,dx + \cdots + \int_a^b f_n(x)\,dx.
\]

It is not surprising that Fourier would assume that it also works for any infinite sum of
functions. After all, this lay behind one of the standard methods for finding integrals.
Pressed for a definition of integration, mathematicians of Fourier’s time would have
replied that it is the inverse process of differentiation: to find the integral of f (x), you find
a function whose derivative is f (x). This definition has its limitations: what is the integral
of $e^{-x^2}$?

There is no simple function with this derivative, but the integral can be found explicitly
by using power series. Using the fact that
\[
e^{-x^2} = 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{3!} + \frac{x^8}{4!} - \cdots,
\]
and the fact that a power series can be integrated by integrating each summand, we see that
\[
\int e^{-x^2}\,dx = C + x - \frac{x^3}{3} + \frac{x^5}{2 \cdot 5} - \frac{x^7}{3! \cdot 7} + \frac{x^9}{4! \cdot 9} - \cdots. \tag{1.10}
\]
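
As a numerical sanity check on equation (1.10), the sketch below sums the integrated series with C = 0 and compares it with the integral from 0 to x expressed through the error function, using the standard identity that this integral equals (√π/2) erf(x). The function names are illustrative assumptions, not notation from the text.

```python
import math

def integral_series(x, n_terms):
    """Partial sum of x - x^3/3 + x^5/(2!*5) - x^7/(3!*7) + ... (taking C = 0)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
               for k in range(n_terms))

def exact(x):
    """Integral of exp(-t^2) from 0 to x, written with the error function."""
    return math.sqrt(math.pi) / 2 * math.erf(x)

for x in (0.5, 1.0, 2.0):
    print(f"x = {x}: series = {integral_series(x, 30):.12f}, erf form = {exact(x):.12f}")
```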
Mathematicians knew that as long as you stayed inside the interval of convergence
there was never any problem integrating a power series term-by-term. The worst that
could go wrong when differentiating term-by-term was that you might lose convergence
at the endpoints. Few mathematicians even considered that switching to an infinite sum of
trigonometric functions would create problems. But you did not have to press Fourier’s
solution very far before you started to uncover real difficulties.

Web Resource: To see how complex analysis can shed light on why Fourier series
are problematic, go to Fourier Series as Complex Power Series.

Looking at the graph of
\[
f(x) = \frac{4}{\pi}\left(\cos\frac{\pi x}{2} - \frac{1}{3}\cos\frac{3\pi x}{2} + \frac{1}{5}\cos\frac{5\pi x}{2} - \cdots\right)
\]
shown in Figure 1.3, it is clear that the derivative, $f'(x)$, is zero for all values of x other
than odd integers. The derivative is not defined when x is an odd integer. But if we try to
differentiate this function by differentiating each summand, we get the series
\[
-2\left(\sin\frac{\pi x}{2} - \sin\frac{3\pi x}{2} + \sin\frac{5\pi x}{2} - \sin\frac{7\pi x}{2} + \cdots\right) \tag{1.11}
\]

which only converges when x is an even integer.
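
A short numerical experiment makes this failure visible. The sketch below is illustrative only (the function name is an assumption). It lists the first few partial sums of series (1.11) at x = 0, x = 0.5, and x = 2.

```python
import math

def differentiated_partial_sum(x, n_terms):
    """Sum of the first n_terms of -2(sin(pi x/2) - sin(3 pi x/2) + sin(5 pi x/2) - ...)."""
    total = 0.0
    for n in range(1, n_terms + 1):
        total += (-1) ** (n - 1) * math.sin((2 * n - 1) * math.pi * x / 2)
    return -2 * total

for x in (0.0, 0.5, 2.0):
    sums = [differentiated_partial_sum(x, n) for n in range(1, 9)]
    print(f"x = {x}: " + ", ".join(f"{s:+.3f}" for s in sums))
```

At the even integers every summand vanishes, so the partial sums stay at 0 up to rounding. At x = 0.5 the partial sums keep jumping between values and never settle, even though the graph of f is flat there.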


Many mathematicians of the time objected to even considering infinite sums of cosines.
These infinite summations cast doubt on what scientists thought they knew about the
nature of functions, about continuity, about differentiability and integrability. If Fourier’s
disturbing series were to be accepted, then all of calculus needed to be rethought.
Lagrange thought he had found the flaw in Fourier’s work in the question of convergence:
whether the summation approaches a single value as more terms are taken. He asserted that
the cosine series,
\[
\cos\frac{\pi x}{2} - \frac{1}{3}\cos\frac{3\pi x}{2} + \frac{1}{5}\cos\frac{5\pi x}{2} - \frac{1}{7}\cos\frac{7\pi x}{2} + \cdots,
\]

does not have a well-defined value for all x. His reason for believing this was that the series
consisting of the absolute values of the coefficients,
\[
1 + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{9} + \cdots,
\]
grows without limit (see exercise 1.2.3). In fact, Fourier’s cosine expansion of f (x) = 1
does converge for any x, as Fourier demonstrated a few years later. The complete justification
of the use of these infinite trigonometric series would have to wait twenty-two years
for the work of Peter Gustav Lejeune Dirichlet (1805–1859), a young German who, in 1807
when Fourier deposited his manuscript, was two years old.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

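1.2.1. M&M Graph each of the following partial sums of Fourier’s expansion over the
interval −1 ≤ x ≤ 3.
a. $\frac{4}{\pi}\cos(\pi x/2)$
b. $\frac{4}{\pi}\left[\cos(\pi x/2) - \frac{1}{3}\cos(3\pi x/2)\right]$
c. $\frac{4}{\pi}\left[\cos(\pi x/2) - \frac{1}{3}\cos(3\pi x/2) + \frac{1}{5}\cos(5\pi x/2)\right]$
d. $\frac{4}{\pi}\left[\cos(\pi x/2) - \frac{1}{3}\cos(3\pi x/2) + \frac{1}{5}\cos(5\pi x/2) - \frac{1}{7}\cos(7\pi x/2)\right]$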

1.2.2. M&M Let $F_n(x)$ denote the sum of the first n terms of Fourier’s series evaluated
at x:
\[
F_n(x) = \frac{4}{\pi}\left[\cos\frac{\pi x}{2} - \frac{1}{3}\cos\frac{3\pi x}{2} + \cdots + \frac{(-1)^{n-1}}{2n-1}\cos\frac{(2n-1)\pi x}{2}\right].
\]
a. Evaluate $F_{100}(x)$ at x = 0, 0.5, 0.9, 0.99, 1.1, and 2. Is this close to the expected value?
b. Evaluate $F_n(0.99)$ at n = 100, 200, 300, . . . , 2000 and plot these successive approximations.
c. Evaluate $F_n(0.999)$ at n = 100, 200, 300, . . . , 2000 and plot these successive approximations.
d. What is the value of this infinite series at x = 1?

1.2.3. M&M Evaluate the partial sums of the series
\[
1 + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{9} + \cdots
\]
for the first 10, 20, 40, 80, 160, 320, and 640 terms. Does this series appear to approach a
value? If so, what value is it approaching?

1.2.4. M&M Graph the surfaces described by the partial sums consisting of the first term,
the first two terms, the first three terms, and the first four terms of Fourier’s solution over
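0 ≤ w ≤ 0.6, −1 ≤ x ≤ 1:
\[
z(x, w) = \frac{4}{\pi}\left[e^{-\pi w/2}\cos\frac{\pi x}{2} - \frac{1}{3}e^{-3\pi w/2}\cos\frac{3\pi x}{2}
+ \frac{1}{5}e^{-5\pi w/2}\cos\frac{5\pi x}{2} - \frac{1}{7}e^{-7\pi w/2}\cos\frac{7\pi x}{2} + \cdots\right].
\]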

1.2.5. M&M Consider the series
\[
1 + \frac{1}{5} - \frac{1}{7} - \frac{1}{11} + \frac{1}{13} + \frac{1}{17} - \frac{1}{19} - \frac{1}{23} + \cdots.
\]
Prove that the partial sums are always greater than or equal to 1 once we have at least five
terms. What number does this series appear to approach?

1.2.6. Fourier series illustrate the dangers of trying to find limits by simply substituting the
value that x approaches. Consider Fourier’s series:
\[
f(x) = \frac{4}{\pi}\left(\cos\frac{\pi x}{2} - \frac{1}{3}\cos\frac{3\pi x}{2} + \frac{1}{5}\cos\frac{5\pi x}{2} - \frac{1}{7}\cos\frac{7\pi x}{2} + \cdots\right). \tag{1.12}
\]

a. What value does this approach as x approaches 1 from the left?


b. What value does this approach as x approaches 1 from the right?
c. What is the value of f (1)?
These three answers are all different.

1.2.7. M&M Consider the function that we get if we differentiate each summand of the
function f (x) defined in equation (1.12),
\[
g(x) = -2\left(\sin\frac{\pi x}{2} - \sin\frac{3\pi x}{2} + \sin\frac{5\pi x}{2} - \sin\frac{7\pi x}{2} + \cdots\right).
\]
a. For −1 < x < 3, graph the partial sums of this series consisting of the first 10, 20, 30,
40, and 50 terms. Does it appear that these graphs are approaching the constant function
0?
b. Evaluate the partial sums up to at least 20 terms when x = 0, 0.2, 0.3, and 0.5. Does it
appear that this series is approaching 0 at each of these values of x?
c. What is happening at x = 0, 0.2, 0.3, 0.5? What can you prove?

2 Infinite Summations

The term infinite summation is an oxymoron. Infinite means without limit, nonterminating,
never ending. Summation is the act of coming to the highest point (summus, summit),
reaching the totality, achieving the conclusion. How can we conclude a process that never
ends? The phrase itself should be a red flag alerting us to the fact that something very
subtle and nonintuitive is going on. It is safer to speak of an infinite series for a summation
that has no end, but we shall use the symbols of addition, the + and the Σ. We need to
remember that they no longer mean quite the same thing.
In this chapter we will see why infinite series are important. We will also see some of the
ways in which they can behave totally unlike finite summations. The discovery of Fourier
series accelerated this recognition of the strange behavior of infinite series. We will learn
more about why they were so disturbing to mathematicians of the early 19th century. We
begin by learning how Archimedes of Syracuse dealt with infinite processes. While he may
seem to have been excessively cautious, ultimately it was his approach that mathematicians
would adopt.

2.1 The Archimedean Understanding


The Greeks of the classical era avoided such dangerous constructions as infinite series. An
illustration of this can be found in the quadrature of the parabola by Archimedes of Syracuse
(287–212 B.C.). To make the problem concrete, we state it as one of finding the area of the
region bounded below by the x-axis and above by the curve $y = 1 - x^2$ (Figure 2.1), but
Archimedes actually showed how to find the area of any segment bounded by an arc of a
parabola and a straight line.


FIGURE 2.1. Archimedes’ triangulation of a parabolic region.

Web Resource: To see what Archimedes actually did to find the area of any
segment bounded by a parabola and a straight line, go to The quadrature of a
parabolic segment.

The triangle with vertices at (±1, 0) and (0, 1) has area 1. The two triangles that lie
above this and have vertices at (±1/2, 3/4) have a combined area of 1/4. If we put four
triangles above these two, adding vertices at (±1/4, 15/16) and (±3/4, 7/16), then these
four triangles will add a combined area of 1/16. In general, what Archimedes showed is
that no matter how many triangles we have placed inside this region, we can put in two
new triangles for each one we just inserted and increase the total area by one-quarter of the
amount by which we last increased it.
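As we take more triangles, we get successive approximations to the total area:
\[
1, \quad 1 + \frac{1}{4}, \quad 1 + \frac{1}{4} + \frac{1}{16}, \quad 1 + \frac{1}{4} + \frac{1}{16} + \frac{1}{64}, \quad \ldots, \quad 1 + \frac{1}{4} + \frac{1}{16} + \cdots + \frac{1}{4^n}.
\]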

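Archimedes then makes the observation that each of these sums brings us closer to 4/3:
\begin{align*}
1 &= \frac{4}{3} - \frac{1}{3}, \\
1 + \frac{1}{4} &= \frac{4}{3} - \frac{1}{4 \cdot 3}, \\
1 + \frac{1}{4} + \frac{1}{16} &= \frac{4}{3} - \frac{1}{16 \cdot 3}, \\
1 + \frac{1}{4} + \frac{1}{16} + \frac{1}{64} &= \frac{4}{3} - \frac{1}{64 \cdot 3}, \\
&\ \,\vdots \\
1 + \frac{1}{4} + \cdots + \frac{1}{4^k} &= \frac{4}{3} - \frac{1}{4^k \cdot 3}. \tag{2.1}
\end{align*}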
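
The triangle areas quoted above, and equation (2.1), can be reproduced with exact rational arithmetic. This is a sketch under stated assumptions: the helper names are invented for illustration, and the triangles are taken to be those inscribed under $y = 1 - x^2$ with vertices over an evenly spaced grid, as in the description of the construction.

```python
from fractions import Fraction

def parabola(x):
    return 1 - x * x

def triangle_area(p, q, r):
    """Half the absolute value of the cross product of two edge vectors."""
    (a1, b1), (a2, b2), (a3, b3) = p, q, r
    return abs((a2 - a1) * (b3 - b1) - (b2 - b1) * (a3 - a1)) / 2

def stage_area(j):
    """Combined area of the 2**j triangles added at stage j (stage 0 is the first triangle)."""
    h = Fraction(1, 2 ** j)          # half the width of each new triangle's base
    total = Fraction(0)
    for i in range(2 ** j):
        a = -1 + 2 * h * i           # left endpoint of the i-th base
        pts = [(x, parabola(x)) for x in (a, a + h, a + 2 * h)]
        total += triangle_area(*pts)
    return total

running = Fraction(0)
for k in range(6):
    running += stage_area(k)
    # equation (2.1): 1 + 1/4 + ... + 1/4^k = 4/3 - 1/(4^k * 3)
    assert running == Fraction(4, 3) - Fraction(1, 3 * 4 ** k)
    print(k, stage_area(k), running)
```

Using `Fraction` keeps every area exact, so the comparison with equation (2.1) is an equality and not a floating-point approximation.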

A modern reader is inclined to make the jump to an infinite summation at this point and
say that the actual area is
\[
1 + \frac{1}{4} + \frac{1}{16} + \cdots = \frac{4}{3}.
\]

This is precisely what Archimedes did not do. He proceeded very circumspectly, letting K
denote the area to be calculated and demonstrating that K could not be larger than 4/3 nor
less than 4/3.

Archimedes’ Argument
Let K denote the area bounded by the parabolic arc and the line segment. Archimedes
showed that each time we add new triangles, the area of the region inside the parabolic arc
that is not covered by our triangles is reduced by more than half (see exercises 2.1.2–2.1.3).
It follows that we can make this error as small as we want by taking enough inscribed
triangles. If K were larger than 4/3, then we could inscribe triangles until their total area
was more than 4/3. This would contradict equation (2.1) which says that the sum of the
areas of the inscribed triangles is always strictly less than 4/3. If K were smaller than 4/3,
then we could find a k for which $4/3 - 1/(4^k \cdot 3)$ is larger than K. But then equation (2.1)
tells us that the sum of the areas of the corresponding inscribed triangles is strictly larger
than K. This contradicts the fact that the sum of the areas of inscribed triangles cannot
exceed the total area.
This method of calculating areas by summing inscribed triangles is often referred to as
the “method of exhaustion.” E. J. Dijksterhuis has pointed out that this is “the worst name
that could have been devised.” As Archimedes or Eudoxus of Cnidus (ca. 408–355 B.C.)
(the first to employ this method) would have insisted, you never exhaust the area. You only
get arbitrarily close to it.
Archimedes’ argument is important because it points to our modern definition of the
infinite series $1 + 1/4 + 1/16 + \cdots + 1/4^n + \cdots$. Just as Archimedes handled his infinite
process by producing a value and demonstrating that the answer could be neither greater
nor less than this produced value, so Cauchy and others of the early nineteenth century
would handle infinite series by producing the desired value and demonstrating that the
series could not have a value either greater or less than this.
To a modern mathematician, an infinite series is the succession of approximations by
finite sums. Our finite sums may not close in quite as nicely as Archimedes’ series
$1 + 1/4 + 1/16 + \cdots + 1/4^n + \cdots$, but the idea stays the same. We seek a target value T
so that for each M > T , the finite sums eventually will all be below M, and for all real
numbers L < T , the finite sums eventually will all be above L. In other words, given any
open interval (L, M) that contains T , all of the partial sums are inside this interval once
they have enough terms. If we can find such a target value T , then it is the value of the
infinite series. We shall call this the Archimedean understanding of an infinite series.
For example, the Archimedean understanding of $1 + 1/4 + 1/16 + \cdots + 1/4^n + \cdots$ is
that it is the sequence 1, 1 + 1/4, 1 + 1/4 + 1/16, . . . . All of the partial sums are less than
4/3, and so they are less than any M > 4/3. For any L < 4/3, from some point on all of
the partial sums will be strictly larger than L. Therefore, the value of the series is 4/3.

Definition: Archimedean understanding of an infinite series


The Archimedean understanding of an infinite series is that it is shorthand for the
sequence of finite summations. The value of an infinite series, if it exists, is that number
T such that given any L < T and any M > T , all of the finite sums from some point
on will be strictly contained in the interval between L and M. More precisely, given
L < T < M, there is an integer n, whose value depends on the choice of L and M,
such that every partial sum with at least n terms lies inside the interval (L, M).
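
To make the dependence of n on L and M concrete, here is a small sketch for the series 1 + 1/4 + 1/16 + · · · with target value T = 4/3. The function name `terms_needed` is a hypothetical helper. It relies on two facts specific to this series: the partial sums increase, and they all stay below 4/3.

```python
from fractions import Fraction

T = Fraction(4, 3)  # target value of 1 + 1/4 + 1/16 + ...

def terms_needed(L, M):
    """Smallest n such that every partial sum with at least n terms lies in (L, M)."""
    assert L < T < M
    n, partial, term = 1, Fraction(1), Fraction(1)
    # The partial sums increase and stay below 4/3 < M,
    # so only the lower bound L has to be checked.
    while partial <= L:
        term /= 4
        partial += term
        n += 1
    return n

print(terms_needed(T - Fraction(1, 1000), T + Fraction(1, 1000)))
```

For a series whose partial sums are not monotone, both bounds would have to be checked for every later partial sum, which in general calls for an argument and not a finite search.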

In the seventeenth and eighteenth centuries, there was a free-wheeling style in which it
appeared that scientists treated infinite series as finite summations with a very large number
of terms. In fact, scientists of this time were very aware of the distinction between series
with a large number of summands and infinite series. They knew you could get into serious
trouble if you did not make this distinction. But they also knew that treating infinite series
as if they really were summations led to useful insights such as the fact that the integral
of a power series could be found by integrating each term, just as in a finite summation.
They developed a sense for what was and was not legitimate. But by the early 1800s,
the sense for what should and should not work was proving insufficient, as exemplified
by the strange behavior of Fourier’s trigonometric series. Cauchy and others returned to
Archimedes’ example of how to handle infinite processes.
It may seem the Archimedean understanding creates a lot of unnecessary work simply to
avoid infinite summations, but there is good reason to avoid infinite summations for they
are manifestly not summations in the usual sense.

Web Resource: To learn about the Archimedean principle and why it is essential
to the Archimedean understanding of an infinite series, go to The Archimedean
principle.

The Oddity of Infinite Sums


Ordinary sums are very well behaved. They are associative, which means that it does not
matter how we group them:

(2 + 3) + 5 = 2 + (3 + 5),

and they are commutative, which means that it does not matter how we order them:

2 + 3 + 5 = 3 + 5 + 2.

These simple facts do not always hold for infinite sums. If we could group an infinite sum
any way we wanted, then we would have that

1 − 1 + 1 − 1 + 1 − 1 + 1 − 1 + ···
= (1 − 1) + (1 − 1) + (1 − 1) + (1 − 1) + · · ·
= 0,

FIGURE 2.2. Plot of partial sums up to fifty terms of $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$.

whereas by regrouping we obtain

1 − 1 + 1 − 1 + 1 − 1 + 1 − 1 + ···
= 1 + (−1 + 1) + (−1 + 1) + (−1 + 1) + (−1 + 1) + · · ·
= 1.

It takes a little more effort to see that rearrangements are not always allowed, but the
effort is rewarded in the observation that some very strange things are happening here.
Consider the alternating harmonic series
\[
1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots.
\]
A plot of the partial sums of this series up to the sum of the first fifty terms is given
in Figure 2.2. The partial sums are narrowing in on a value near 0.7 (in fact, this series
converges to ln 2).
If we rearrange the summands in this series, taking two positive terms, then one negative
term, then the next two positive terms, then the next negative term:
\[
1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \cdots,
\]
we obtain a series whose partial sums are plotted in Figure 2.3. The partial sums are now
approaching a value near 1.04. Rearranging the summands has changed the value.
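
The two limiting values can be reproduced numerically. In this sketch the function names are illustrative assumptions. The first function sums the series in its usual order. The second sums blocks of two positive terms followed by one negative term.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (k - 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """Partial sum of the rearrangement, taken in blocks of
    two positive terms (odd denominators) and one negative term (even denominator)."""
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total

for n in (10, 1_000, 100_000):
    print(n, alternating_harmonic(n), rearranged(n))
print("ln 2 =", math.log(2))
```

Both functions use exactly the same summands. Only the order in which they are added differs, yet the two columns settle on visibly different values.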

FIGURE 2.3. Plot of partial sums of $1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots$.

Web Resource: To explore the alternating harmonic series and its rearrangements,
go to Explorations of the alternating harmonic series.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

2.1.1.
a. Show that the triangle with vertices at $(a_1, b_1)$, $(a_2, b_2)$, and $(a_3, b_3)$ has area equal to
\[
\frac{1}{2}\,\bigl|(a_2 - a_1)(b_3 - b_1) - (b_2 - b_1)(a_3 - a_1)\bigr|. \tag{2.2}
\]
One approach is to use the fact that the area of the parallelogram defined by
$(a_2 - a_1)\vec{\imath} + (b_2 - b_1)\vec{\jmath}$ and $(a_3 - a_1)\vec{\imath} + (b_3 - b_1)\vec{\jmath}$ is
\[
\Bigl|\bigl((a_2 - a_1)\vec{\imath} + (b_2 - b_1)\vec{\jmath} + 0\,\vec{k}\bigr) \times \bigl((a_3 - a_1)\vec{\imath} + (b_3 - b_1)\vec{\jmath} + 0\,\vec{k}\bigr)\Bigr|.
\]
b. Use the area formula in line (2.2) to prove that the area of the triangle with vertices at
$(a, a^2)$, $(a + \delta, (a + \delta)^2)$, $(a + 2\delta, (a + 2\delta)^2)$ is $|\delta|^3$.

c. Use these results to prove that the area of the polygon with vertices at $(-1, 0)$,
$(-1 + 2^{-n}, 1 - (-1 + 2^{-n})^2)$, $(-1 + 2 \cdot 2^{-n}, 1 - (-1 + 2 \cdot 2^{-n})^2)$,
$(-1 + 3 \cdot 2^{-n}, 1 - (-1 + 3 \cdot 2^{-n})^2)$, . . ., $(1, 0)$ is $1 + 4^{-1} + 4^{-2} + \cdots + 4^{-n}$.

2.1.2. Archimedes’ formula for the area of a parabolic region is obtained by constructing
triangles where the base is the line segment that bounds the region and the apex is located at
the point where the tangent line to the parabola is parallel to the base. Show that the tangent
to $y = 1 - x^2$ at $\bigl((k + 1/2)2^{-n},\ 1 - (k + 1/2)^2 2^{-2n}\bigr)$ has the same slope as the line segment
connecting the two endpoints: $\bigl(k 2^{-n},\ 1 - k^2 2^{-2n}\bigr)$ and $\bigl((k + 1)2^{-n},\ 1 - (k + 1)^2 2^{-2n}\bigr)$.

2.1.3. Show that if we take a parabolic region and inscribe a triangle whose base is the line
segment that bounds the region and whose apex is located at the point where the tangent
line to the parabola is parallel to the base, then the area of the triangle is more than half the
area of the parabolic region.

2.1.4. Archimedes’ method to find the area under the graph of $y = 1 - x^2$ is equivalent
to using trapezoidal approximations to the integral of this function from x = −1 to x = 1,
first with steps of size 1, then size 1/2, 1/4, 1/8, . . . .
a. Verify that the trapezoidal approximation to $\int_{-1}^{1} (1 - x^2)\,dx$ with steps of size 1/2 is
equal to $4/3 - 1/(3 \cdot 4) = 5/4$.
b. Verify that the trapezoidal approximation to $\int_{-1}^{1} (1 - x^2)\,dx$ with steps of size 1/4 is
equal to $4/3 - 1/(3 \cdot 4^2) = 21/16$.
c. Verify that the trapezoidal approximation to $\int_{-1}^{1} (1 - x^2)\,dx$ with steps of size 1/8 is
equal to $4/3 - 1/(3 \cdot 4^3) = 85/64$.

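2.1.5. Explain each step in the following evaluation of the trapezoidal approximation to
$\int_{-1}^{1} (1 - x^2)\,dx$ with steps of size $2^{-k}$:
\begin{align}
\int_{-1}^{1} (1 - x^2)\,dx &\approx \frac{1}{2^k} \sum_{i=1-2^k}^{2^k-1} \left(1 - \left(\frac{i}{2^k}\right)^2\right) \tag{2.3} \\
&= \frac{1}{2^k}\left(2^{k+1} - 1 - 2\sum_{i=1}^{2^k-1} \frac{i^2}{2^{2k}}\right) \tag{2.4} \\
&= 2 - \frac{1}{2^k} - \frac{1}{2^{3k-1}}\left(\frac{2^{3k}}{3} - \frac{2^{2k}}{2} + \frac{2^k}{6}\right) \tag{2.5} \\
&= \frac{4}{3} - \frac{1}{3 \cdot 4^k}. \tag{2.6}
\end{align}
It follows that the sum of the areas of the last $2^k$ triangles is
\[
\left(\frac{4}{3} - \frac{1}{3 \cdot 4^k}\right) - \left(\frac{4}{3} - \frac{1}{3 \cdot 4^{k-1}}\right) = \frac{4 - 1}{3 \cdot 4^k} = \frac{1}{4^k}.
\]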

2.1.6. Consider the series
\[
1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^k} + \cdots.
\]
Find the target value, T , of the partial sums. How do you know that for any M greater than
your target value, all of the partial sums are strictly less than M? How many terms do you
have to take in order to guarantee that all of the partial sums from that point on will be
larger than L = T − 1/10?

2.1.7. Consider the series
\[
3 - \frac{3}{2} + \frac{3}{4} - \frac{3}{8} + \cdots + \frac{(-1)^k\, 3}{2^k} + \cdots, \quad k \ge 0.
\]
Find the target value, T , of the partial sums. How many terms do you have to take in
order to guarantee that all of the partial sums from that point on will be smaller than
M = T + 1/10? How many terms do you have to take in order to guarantee that all of the
partial sums from that point on will be larger than L = T − 1/10? How many terms do
you have to take in order to guarantee that all of the partial sums from that point on will be
within 1/100 of T ?

2.1.8. Consider the series
\[
1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{k-1}}{k} + \cdots.
\]
Explain why there should be a target value. You may not be able to prove that the target
value is T = ln 2, but you should still be able to explain why there should be one. How
many terms will be enough to guarantee that all of the partial sums from that point on will
be within 1/10 of T ? Explain the reasoning that leads to your answer.

2.1.9. What is the Archimedean understanding of the infinite series 1 − 1 + 1 − 1 + · · · ?
Explain why this series cannot have a value under this understanding.

2.1.10. M&M
a. Calculate the sum of the first 2n terms of the alternating harmonic series with the summands in the
usual order. Check that it gets close to the target value of T = ln 2 as n gets large. How
large does n have to be before the partial sums are all within $10^{-6}$ of ln 2?
b. Find the value that the series approaches when you take two positive summands for
every negative summand,
\[
1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \cdots.
\]
c. Find the value that the series approaches when you take one positive summand for every
two negative summands,
\[
1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + \cdots.
\]

d. Take your decimal answers from parts (b) and (c). For each decimal, d, calculate $e^{2d}$.
Guess the true values of these rearranged series. Explore the decimal values you get
with other integers for r and s. Guess the general value formula. Keep notes of your
exploration and explain the process that led to your guess.

2.1.11. M&M Explore what happens if you rearrange the series
\[
1 - \frac{1}{2^2} + \frac{1}{3^2} - \frac{1}{4^2} + \cdots.
\]
Compare the values that you get with the original series, taking two positive summands for
every negative summand, and taking two negative summands for every positive summand.
Explore what happens with other values for r and s.

2.2 Geometric Series


By the fourteenth century, the Scholastics in Oxford and Paris, people such as Richard
Swineshead (fl. c. 1340–1355) and Nicole Oresme (1323–1382), were using and assigning
values to infinite series that arose in problems of motion. They began with series for which
each pair of consecutive summands has the same ratio, such as the summation used by
Archimedes,
\[
1 + \frac{1}{4} + \frac{1}{16} + \cdots + \frac{1}{4^n} + \cdots.
\]
Any series such as this for which there is a constant ratio between successive summands is
called a geometric series.
For many values of x, the infinite geometric series can be summed using the identity
\[
1 + x + x^2 + x^3 + x^4 + \cdots = \frac{1}{1 - x}. \tag{2.7}
\]

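Examples of this are
\[
1 + \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \frac{1}{81} + \cdots = \frac{3}{2} = \frac{1}{1 - 1/3}
\]
and
\[
1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \frac{1}{16} - \cdots = \frac{2}{3} = \frac{1}{1 - (-1/2)}.
\]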
One has to be very careful with equation (2.7). If we set x = 2, we get a very strange
equality:
\[
1 + 2 + 4 + 8 + 16 + \cdots = \frac{1}{1 - 2} = -1. \tag{2.8}
\]

We need to decide what we mean by an infinite summation. We could define
$1 + x + x^2 + x^3 + \cdots$ to mean $1/(1 - x)$, in which case equation (2.8) is correct. We would be
in good company. Leonhard Euler accepted this definition. It yields many other interesting
results, for example:
\[
1 - 2 + 4 - 8 + 16 - \cdots = \frac{1}{1 - (-2)} = \frac{1}{3}.
\]
In the exhaustive and fascinating account, Convolutions in French Mathematics, 1800–1840,
Ivor Grattan-Guinness writes, “Some modern appraisals of the cavalier style of
18th-century mathematicians in handling infinite series convey the impression that these
poor men set their brains aside when confronted by them.” They did not. Certainly Euler
had not set his brain aside. He rather viewed infinite series in a larger context, a context
that he makes clear in his article “On divergent series” published in 1760. Euler illustrates
his understanding with the series 1 − 1 + 1 − 1 + · · · which he asserts to be equal to 1/2,
obtained by setting x = −1 in equation (2.7).

Notable enough, however, are the controversies over the series 1 − 1 + 1 − 1 + 1 − etc.
whose sum was given by Leibniz as 1/2, although others disagree. . . . Understanding
of this question is to be sought in the word “sum”; this
idea, if thus conceived—namely, the sum of a series is said to be that quantity to
which it is brought closer as more terms of the series are taken—has relevance only
for the convergent series, and we should in general give up this idea of sum for
divergent series. On the other hand, as series in analysis arise from the expansion
of fractions or irrational quantities or even of transcendentals, it will in turn be
permissible in calculation to substitute in place of such series that quantity out of
whose development it is produced.

Here is the point we have been making: for any infinite summation we need to stretch our
definition of sum. Euler merely asks that in the case of a series that does not converge, we
allow a value determined by the genesis of the series.
As we shall see in section 2.6, Euler’s approach raises more problems than it settles.
Eventually, mathematicians would be forced to allow divergent series to have values. Such
values are too useful to abandon completely. But using these values must be done with great
delicacy. The scope of this book will only allow brief glimpses of how this can be done
safely. The Archimedean understanding is the easiest and most reliable way of assigning
values to infinite series.

Web Resource: To learn more about divergent series, go to Assigning values to
divergent series.

When an infinite series has a target value in the sense of Archimedes’ understanding, we
say that our series converges. For our purposes, it will be safest not to assign a value to an
infinite series unless it converges.

Definition: convergence of an infinite series


An infinite series converges if there is a target value T so that for any L < T and any
M > T , all of the partial sums from some point on are strictly between L and M.

Cauchy’s Approach
Returning to equation (2.7), it is tempting to try to prove this result using precisely the
associative law that we saw does not work:

\begin{align*}
1 &= 1 - x + x - x^2 + x^2 - x^3 + x^3 - \cdots \\
&= (1 - x) + x(1 - x) + x^2(1 - x) + x^3(1 - x) + \cdots \\
&= (1 + x + x^2 + x^3 + \cdots)(1 - x), \\
\frac{1}{1 - x} &= 1 + x + x^2 + x^3 + \cdots. \tag{2.9}
\end{align*}
In 1821, Augustin Louis Cauchy published his Cours d’analyse de l’École Royale Polytechnique
(Course in Analysis of the Royal Institute of Technology). One of his intentions
in writing this book was to put the study of infinite series on a solid foundation. In his
introduction, he writes,

As for the methods, I have sought to give them all of the rigor that one insists
upon in geometry, in such manner as to never have recourse to explanations drawn
from algebraic technique. Explanations of this type, however commonly admitted,
especially in questions of convergent and divergent series and real quantities that
arise from imaginary expressions, cannot be considered, in my opinion, except as
heuristics that will sometimes suggest the truth, but which accord little with the
accuracy that is so praised in the mathematical sciences.

When Cauchy speaks of “algebraic technique,” he is specifically referring to the kind of
technique employed in equation (2.9). While this argument is suggestive, we cannot rely
upon it.
Cauchy shows how to handle a result such as equation (2.7). We need to restrict our
argument to the safe territory of finite summations:

\begin{align*}
1 &= 1 - x + x - x^2 + x^2 - \cdots - x^n + x^n \\
&= (1 - x) + x(1 - x) + x^2(1 - x) + \cdots + x^{n-1}(1 - x) + x^n, \\
\frac{1}{1 - x} &= 1 + x + x^2 + \cdots + x^{n-1} + \frac{x^n}{1 - x}, \\
1 + x + x^2 + \cdots + x^{n-1} &= \frac{1}{1 - x} - \frac{x^n}{1 - x}. \tag{2.10}
\end{align*}
Cauchy follows the lead of Archimedes. What we call the infinite series is really just
the sequence of values obtained from these finite sums. Approaching the problem in this
way, we can see exactly how much the finite geometric series differs from the target value,
T = 1/(1 − x). The difference is
\[
\frac{x^n}{1 - x}.
\]
If we take a value larger than T , is this finite sum eventually below it? If we take a value
smaller than T , is this finite sum eventually above it? The value of this series is 1/(1 − x)
if and only if we can make the difference as close to 0 as we wish by putting a lower bound
on n. This happens precisely when |x| < 1.
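
A quick numerical check of equation (2.10) shows the role of |x| < 1. This is an illustrative sketch with assumed function names. It compares the gap between 1/(1 − x) and the n-term sum with the predicted remainder x^n/(1 − x).

```python
def finite_geometric(x, n):
    """1 + x + x**2 + ... + x**(n-1), summed term by term."""
    return sum(x ** k for k in range(n))

def remainder(x, n):
    """Difference between 1/(1 - x) and the n-term sum, as given by equation (2.10)."""
    return x ** n / (1 - x)

for x in (0.5, -0.9, 2.0):
    for n in (5, 20, 80):
        gap = 1 / (1 - x) - finite_geometric(x, n)
        print(f"x = {x:5.2f}  n = {n:3d}  gap = {gap: .6e}  x^n/(1-x) = {remainder(x, n): .6e}")
```

For x = 0.5 and x = −0.9 the gap shrinks toward 0 as n grows. For x = 2 the identity (2.10) still holds for every n, but the gap grows without bound, so −1 is not a target value in the Archimedean sense.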

Cauchy’s careful analysis shows us that equation (2.7) needs to carry a restriction:
\[
1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}, \quad \text{provided that } |x| < 1. \tag{2.11}
\]

We have stumbled across a curious and important phenomenon. Ordinary equalities do
not carry restrictions like this. A statement such as
\[
1 + x = \frac{1 - x^2}{1 - x}
\]
is valid for any x, as long as the denominator on the right is not 0. Equation (2.11) is
something very different. It is a statement about successive approximations. The equality
does not mean what it usually does. The symbol + no longer means quite the same. The
Archimedean understanding, cumbersome as it may seem, has become essential.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

2.2.1. Find the target value of the series
\[
1 + \frac{1}{3} + \frac{1}{9} + \cdots + \frac{1}{3^k} + \cdots.
\]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.2.2. Find the target value of the series
\[
1 - \frac{3}{4} + \frac{9}{16} - \frac{27}{64} + \cdots + (-1)^k \frac{3^k}{4^k} + \cdots.
\]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.2.3. Find the target value of the series
\[
\frac{1}{5} - \frac{1}{6} + \frac{5}{36} - \frac{25}{216} + \cdots + (-1)^k \frac{5^{k-1}}{6^k} + \cdots.
\]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.2.4. Find the target value of the series
\[
1 + \frac{1}{2} - \frac{1}{4} + \frac{1}{8} + \frac{1}{16} - \frac{1}{32} + \cdots + \frac{1}{2^{3k}} + \frac{1}{2^{3k+1}} - \frac{1}{2^{3k+2}} + \cdots.
\]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.2.5. It is tempting to differentiate each side of equation (2.11) with respect to x and to
assert that
\[
1 + 2x + 3x^2 + 4x^3 + \cdots = \frac{1}{(1 - x)^2}. \tag{2.12}
\]
Following Cauchy’s advice, we know we need to be careful. Differentiate each side of
equation (2.10). What is the difference between $1 + 2x + 3x^2 + \cdots + nx^{n-1}$ and $(1 - x)^{-2}$?
For which values of x will this difference approach 0 as n increases?

2.2.6. Find the target value of the series
\[
1 - \frac{2}{3} + \frac{3}{9} - \frac{4}{27} + \cdots + (-1)^{k-1} \frac{k}{3^{k-1}} + \cdots.
\]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.2.7. Find the target value of the series
\[
2 - \frac{1}{3} + \frac{4}{9} - \frac{3}{27} + \frac{6}{81} - \frac{5}{243} + \cdots + \frac{2k + 2}{3^{2k}} - \frac{2k + 1}{3^{2k+1}} + \cdots.
\]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.2.8.
M&M Explore the rearrangements of 1 − 1/2 + 1/4 − 1/8 + 1/16 − 1/32 + · · · .
Explain why all rearrangements of this series must have the same target value.

2.2.9.
M&M It is tempting to integrate each side of equation (2.11) with respect to x and
to assert that

\[ x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \cdots = -\ln(1 - x). \tag{2.13} \]

Following Cauchy’s advice, we know we need to be careful, but now we run into trouble.
What happens when we try to integrate $x^n/(1 - x)$? Fortunately, we do not have to find the
exact value of the difference between $x + x^2/2 + x^3/3 + \cdots + x^n/n$ and $-\ln(1 - x)$. All
we have to show is that we can make this difference as small as we wish by taking enough
terms. We can do this by bounding the integral of $x^n/(1 - x)$.
We use the fact that if

\[ |f(x)| < g(x) \text{ for all } x, \quad \text{then} \quad \left| \int_a^b f(x)\,dx \right| < \int_a^b g(x)\,dx. \]

If $0 < x < 1$, then we can find a number $a$ so that $0 < x < a < 1$ and
\[ \left| \frac{x^n}{1 - x} \right| < x^n (1 - a)^{-1}. \]

Integrate this bounding function with respect to x, and show that if 0 < x < 1, then the
partial sums of the series in equation (2.13) approach the target value of − ln(1 − x) as n
increases. Explain what happens for −1 < x < 0. Justify your answer.

2.2.10. Find the target value of the series

\[ \frac{1}{2} - \frac{1}{2 \cdot 2^2} + \frac{1}{3 \cdot 2^3} - \frac{1}{4 \cdot 2^4} + \cdots + (-1)^{k-1} \frac{1}{k \cdot 2^k} + \cdots . \]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.2.11. Find the target value of the series

\[ 1 - 1 + \frac{3}{2 \cdot 2^2} - \frac{4}{3 \cdot 2^3} + \frac{5}{4 \cdot 2^4} - \cdots + (-1)^k \frac{k + 1}{k \cdot 2^k} + \cdots . \]
Find a value of n so that any partial sum with at least n terms is within 0.001 of the target
value. Justify your answer.

2.3 Calculating π
Beginning in the Middle Ages, at first hesitantly and then with increasing confidence,
mathematicians plunged into the infinite. They resurfaced with treasures that Archimedes
could never have imagined. The true power of calculus lies in its coupling with infinite
processes. Mathematics as we know it and as it has come to shape modern science could
never have come into being without some disregard for the dangers of the infinite.
As we saw in the last section, the dangers are real. The genius of the early explorers of
calculus lay in their ability to sense when they could treat an infinite summation according
to the rules of the finite and when they could not. Such intuition is a poor foundation for
mathematics. By the time Fourier proposed his trigonometric series, it was recognized that
a better understanding of what was happening—what was legitimate and what would lead
to error—was needed. The solution that was ultimately accepted looks very much like what
Archimedes was doing, but it would be a mistake to jump directly from Archimedes to our
modern understanding of infinite series, for it would miss the point of that revolution in
mathematics that occurred in the late seventeenth century and that was so powerful precisely
because it dared to treat the infinite as if it obeyed the same laws as the finite.
The time will come when we will insist on careful definitions, when we will concentrate
on potential problems and learn how to avoid them. But the problems will not be meaningful
unless we first appreciate the usefulness of playing with infinite series as if they really are
summations. We begin by seeing what we can accomplish if we simply assume that infinite
series behave like finite sums.
Much of the initial impetus for using the infinite came from the search for better approx-
imations to π , the ratio of the circumference of a circle to its diameter. In this section we
will describe several different infinite series as well as an infinite product that can be used
to approximate π .

The Arctangent Series


One of the oldest and most elegant series for computing π is usually attributed to Gottfried
Leibniz (1646–1716) but was also known to Isaac Newton (1642–1727) and to James
Gregory (1638–1675). Almost two centuries earlier, it was known to Nilakantha (ca.
1450–1550) of Kerala in southwest India where the power series for the sine and cosine
probably had been discovered even earlier by Madhava (ca. 1340–1425). It is

\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \cdots . \tag{2.14} \]

This is the special case x = 0 of Fourier’s equation (1.7). It was discovered by integrating
a geometric series.
We use the fact that the derivative of the arctangent function is $1/(1 + x^2) = 1 - x^2 +
x^4 - x^6 + \cdots$. If we integrate this series, we should get the arctangent:

\[ x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \arctan x. \tag{2.15} \]

Equation (2.14) is the special case x = 1.
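A quick numerical look at the arctangent series may help here. The following is a minimal sketch, not part of the original text: the function name `arctan_series` is an illustrative choice, and the comparison values come from Python's own `math` module. It sums the series in equation (2.15) at x = 1 and at x = 1/5.

```python
import math

def arctan_series(x: float, terms: int) -> float:
    """Partial sum x - x^3/3 + x^5/5 - ... with `terms` terms (equation 2.15)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

# At x = 1 the series is equation (2.14); multiply by 4 to compare with pi.
for terms in (10, 100, 1000):
    print(terms, 4 * arctan_series(1.0, terms), math.pi)

# At x = 1/5 ten terms already agree with the library arctangent to machine precision.
print(arctan_series(0.2, 10), math.atan(0.2))
```

At x = 1 the error shrinks only about as fast as one over the number of terms, while at x = 1/5 each additional term gains more than a decimal digit.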


The series in equation (2.14) converges very slowly, but we have at our disposal the
series for the arctangent of any value between 0 and 1. The convergence becomes much
faster if we take a value of x close to 0. Around 1706, John Machin (1680–1751) calculated
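the first 100 digits of $\pi$ using the identity
\begin{align*}
\pi &= 16 \arctan\left(\frac{1}{5}\right) - 4 \arctan\left(\frac{1}{239}\right) \\
&= 16 \left( \frac{1}{5} - \frac{1}{3 \cdot 5^3} + \frac{1}{5 \cdot 5^5} - \frac{1}{7 \cdot 5^7} + \cdots \right) \\
&\quad - 4 \left( \frac{1}{239} - \frac{1}{3 \cdot 239^3} + \frac{1}{5 \cdot 239^5} - \frac{1}{7 \cdot 239^7} + \cdots \right). \tag{2.16}
\end{align*}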
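Machin's identity can be tried out directly. The sketch below is an illustration under stated assumptions rather than Machin's own procedure: it uses exact rational arithmetic from Python's `fractions` module, and the names `arctan_inverse` and `machin_pi` are hypothetical helpers introduced here.

```python
import math
from fractions import Fraction

def arctan_inverse(m: int, terms: int) -> Fraction:
    """Partial sum of the arctangent series at x = 1/m, kept as an exact fraction."""
    total = Fraction(0)
    for k in range(terms):
        # k-th term: (-1)^k / ((2k + 1) * m^(2k + 1))
        total += Fraction((-1) ** k, (2 * k + 1) * m ** (2 * k + 1))
    return total

def machin_pi(terms: int) -> Fraction:
    """Approximate pi from equation (2.16), using `terms` terms of each series."""
    return 16 * arctan_inverse(5, terms) - 4 * arctan_inverse(239, terms)

for terms in (2, 5, 10):
    approx = machin_pi(terms)
    print(terms, float(approx), abs(float(approx) - math.pi))
```

Because the fractions are exact, the only error is the truncation of the two series; converting to `float` at the end limits what can be displayed to about sixteen digits.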

Web Resource: To investigate series that converge to π , go to More pi.

Wallis’s Product
John Wallis (1616–1703) considered the integral

\[ \int_0^1 \left(1 - t^{1/p}\right)^q dt. \]

When $p = q = 1/2$, this is the area in the first quadrant bounded by the graph of
$y = \sqrt{1 - x^2}$, the upper half circle. It equals $\pi/4$. Wallis knew the binomial theorem for integer
exponents,
\[ (1 + x)^n = 1 + \binom{n}{1} x + \binom{n}{2} x^2 + \cdots + \binom{n}{k} x^k + \cdots + x^n, \tag{2.17} \]

and he knew how to integrate a rational power of x. Relying on what happens at integer
values of q, he was able to extrapolate to other values. From the patterns he observed, he

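discovered remarkable bounds for $\sqrt{\pi/2}$:
\[ \frac{2 \cdot 4 \cdot 6 \cdots (2n - 2) \sqrt{2n}}{3 \cdot 5 \cdot 7 \cdots (2n - 1)} > \sqrt{\frac{\pi}{2}} > \frac{2 \cdot 4 \cdot 6 \cdots (2n - 2)(2n)}{3 \cdot 5 \cdot 7 \cdots (2n - 1) \sqrt{2n + 1}}, \tag{2.18} \]

valid for any $n \ge 2$. This implies that

\[ \frac{\pi}{2} = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdot \frac{8}{7} \cdots . \tag{2.19} \]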
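To get a feel for how quickly the product in equation (2.19) approaches π/2, one can multiply out its factors in pairs. This is only a sketch: the function name `wallis_product` and the grouping of factors into pairs 2k/(2k − 1) and 2k/(2k + 1) are choices made for the illustration.

```python
import math

def wallis_product(n: int) -> float:
    """Product of the first n pairs (2k/(2k-1)) * (2k/(2k+1)) from equation (2.19)."""
    product = 1.0
    for k in range(1, n + 1):
        product *= (2 * k) / (2 * k - 1) * (2 * k) / (2 * k + 1)
    return product

for n in (10, 100, 1000):
    print(n, wallis_product(n), math.pi / 2)
```

The partial products increase toward π/2, but slowly: even a thousand pairs of factors leave an error in the fourth decimal place.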

To learn how John Wallis discovered equation (2.19), go to Appendix A.1, Wallis
on π.

Newton’s Binomial Series


In 1665, Isaac Newton read Wallis’s Arithmetica infinitorum, in which Wallis explains how
to derive his product identity. This led Newton to an even more important discovery. He
described the process in a letter to Leibniz written on October 24, 1676.
The starting point was both to generalize and to simplify Wallis’s integral. Newton
looked at

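\[ \int_0^x \left(1 - t^2\right)^{m/2} dt. \]

When $m$ is an even integer, we can use the binomial expansion in equation (2.17) to produce
a polynomial in $x$:
\begin{align*}
\int_0^x (1 - t^2)^0 \, dt &= x, \\
\int_0^x (1 - t^2)^1 \, dt &= x - \frac{1}{3} x^3, \\
\int_0^x (1 - t^2)^2 \, dt &= x - \frac{2}{3} x^3 + \frac{1}{5} x^5, \\
\int_0^x (1 - t^2)^3 \, dt &= x - \frac{3}{3} x^3 + \frac{3}{5} x^5 - \frac{1}{7} x^7, \\
&\;\;\vdots
\end{align*}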
What happens when m is an odd integer? Is it possible to interpolate between these
polynomials? If it is, then we could let m = 1 and x = 1 and obtain an expression for π/4.
Newton realized that the problem comes down to expanding $(1 - t^2)^{m/2}$ as a polynomial
in $t^2$ and then integrating each term. Could this be done when $m$ is an odd integer? Playing
with the patterns that he discovered, he stumbled upon the fact that not only could he find
an expansion for the binomial when the exponent is m/2, m odd, he could get the expansion
with any exponent. Unless the exponent is a positive integer (or zero), the expansion is an
infinite series.

Newton’s Binomial Series


For any real number a and any x such that |x| < 1, we have that

\[ (1 + x)^a = 1 + ax + \frac{a(a - 1)}{2!} x^2 + \frac{a(a - 1)(a - 2)}{3!} x^3 + \cdots . \tag{2.20} \]
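The partial sums of the binomial series are easy to compute because each term is a simple multiple of the one before it. The following sketch assumes nothing beyond equation (2.20); the function name `binomial_series` and the sample values of a and x are illustrative.

```python
def binomial_series(a: float, x: float, terms: int) -> float:
    """Sum of the first `terms` terms of 1 + a x + a(a-1)/2! x^2 + ... (equation 2.20)."""
    total = 0.0
    term = 1.0  # the k = 0 term
    for k in range(terms):
        total += term
        # The (k+1)-st term is the k-th term times (a - k) x / (k + 1).
        term *= (a - k) / (k + 1) * x
    return total

# A fractional exponent gives an infinite series; compare with the direct power.
print(binomial_series(0.5, 0.5, 30), 1.5 ** 0.5)

# A positive integer exponent: every term past x^3 is zero, so the sum is a polynomial.
print(binomial_series(3, 0.5, 30), 1.5 ** 3)
```

Computing each term from its predecessor avoids evaluating factorials and large powers separately, and it makes visible why the series stops when a is a positive integer: the factor (a − k) eventually vanishes.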

Web Resource: To learn how Newton discovered his binomial series, go to New-
ton’s formula. To explore the convergence of the series for π that arises from the
binomial series, go to More pi.

Equipped with equation (2.20) and assuming that there are no problems with term-by-
term integration, we can find another series that approaches π/4:

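\begin{align*}
\frac{\pi}{4} &= \int_0^1 \left(1 - t^2\right)^{1/2} dt \\
&= \int_0^1 \left( 1 - \frac{1}{2} t^2 + \frac{(1/2)(-1/2)}{2!} t^4 - \frac{(1/2)(-1/2)(-3/2)}{3!} t^6 + \cdots \right) dt \\
&= 1 - \frac{1}{2 \cdot 3} - \frac{1}{4 \cdot 2! \cdot 5} - \frac{3}{8 \cdot 3! \cdot 7} - \frac{3 \cdot 5}{16 \cdot 4! \cdot 9} - \cdots . \tag{2.21}
\end{align*}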
This series is an improvement over equation (2.14), but Newton showed how to use his
binomial series to do much better. He considered the area of the shaded region in Figure 2.4.
On the one hand, this area is represented by the series:

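\begin{align*}
\text{Area} &= \int_0^{1/4} \sqrt{x - x^2} \, dx \\
&= \int_0^{1/4} x^{1/2} (1 - x)^{1/2} \, dx \\
&= \int_0^{1/4} \left( x^{1/2} - \frac{1}{2} x^{3/2} + \frac{(1/2)(-1/2)}{2!} x^{5/2} - \frac{(1/2)(-1/2)(-3/2)}{3!} x^{7/2} + \cdots \right) dx \\
&= \frac{2}{3} \left(\frac{1}{4}\right)^{3/2} - \frac{2}{5 \cdot 2} \left(\frac{1}{4}\right)^{5/2} - \frac{2}{7 \cdot 2^2 \cdot 2!} \left(\frac{1}{4}\right)^{7/2} - \frac{2 \cdot 3}{9 \cdot 2^3 \cdot 3!} \left(\frac{1}{4}\right)^{9/2} \\
&\quad - \frac{2 \cdot 3 \cdot 5}{11 \cdot 2^4 \cdot 4!} \left(\frac{1}{4}\right)^{11/2} - \cdots \\
&= \frac{1}{3 \cdot 2^2} - \frac{1}{5 \cdot 2^5} - \frac{1}{7 \cdot 2^8 \cdot 2!} - \frac{3}{9 \cdot 2^{11} \cdot 3!} - \frac{3 \cdot 5}{11 \cdot 2^{14} \cdot 4!} - \cdots . \tag{2.22}
\end{align*}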

On the other hand, this area is one-sixth of a circle of radius 1/2 minus a right triangle
whose base is 1/4 and whose hypotenuse is 1/2:

\[ \text{Area} = \frac{\pi}{24} - \frac{\sqrt{3}}{32}. \tag{2.23} \]

FIGURE 2.4. The area under $y = \sqrt{x - x^2}$ from 0 to 1/4.

The square root of 3 can be expressed using the binomial series:


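\begin{align*}
\sqrt{3} &= 2 \sqrt{3/4} \\
&= 2 \left(1 - \frac{1}{4}\right)^{1/2} \\
&= 2 \left( 1 - \frac{1}{2 \cdot 4} - \frac{1}{2^2 \cdot 4^2 \cdot 2!} - \frac{3}{2^3 \cdot 4^3 \cdot 3!} - \frac{3 \cdot 5}{2^4 \cdot 4^4 \cdot 4!} - \cdots \right). \tag{2.24}
\end{align*}
Putting these together, we see that
\begin{align*}
\pi &= 24 \left( \frac{1}{3 \cdot 2^2} - \frac{1}{5 \cdot 2^5} - \frac{1}{7 \cdot 2^8 \cdot 2!} - \frac{3}{9 \cdot 2^{11} \cdot 3!} - \frac{3 \cdot 5}{11 \cdot 2^{14} \cdot 4!} - \cdots \right) \\
&\quad + \frac{3}{2} \left( 1 - \frac{1}{2^3} - \frac{1}{2^6 \cdot 2!} - \frac{3}{2^9 \cdot 3!} - \frac{3 \cdot 5}{2^{12} \cdot 4!} - \cdots \right). \tag{2.25}
\end{align*}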
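The gain from Newton's choice of region can be seen numerically. The sketch below is an illustration, not Newton's computation: it generates the coefficients of (1 + x)^{1/2} from the binomial series and uses them both for the quarter-circle series (2.21) and for the combination in equation (2.25), written as π = 24 · Area + (3/4)√3. The function names are hypothetical.

```python
import math

def half_binomial_coefficients(count: int) -> list:
    """Coefficients of (1 + x)^(1/2) from the binomial series: 1, 1/2, -1/8, ..."""
    coeffs = []
    c = 1.0
    for k in range(count):
        coeffs.append(c)
        c *= (0.5 - k) / (k + 1)
    return coeffs

def pi_from_quarter_circle(terms: int) -> float:
    """Equation (2.21): pi = 4 * integral from 0 to 1 of (1 - t^2)^(1/2), term by term."""
    coeffs = half_binomial_coefficients(terms)
    return 4 * sum(c * (-1) ** k / (2 * k + 1) for k, c in enumerate(coeffs))

def pi_from_newton(terms: int) -> float:
    """Equation (2.25): pi = 24 * Area + (3/4) * sqrt(3), both from binomial series."""
    coeffs = half_binomial_coefficients(terms)
    # Area: integrate x^(1/2) (1 - x)^(1/2) term by term from 0 to 1/4, as in (2.22).
    area = sum(c * (-1) ** k * 0.25 ** (k + 1.5) / (k + 1.5) for k, c in enumerate(coeffs))
    # sqrt(3) = 2 (1 - 1/4)^(1/2), as in (2.24).
    sqrt3 = 2 * sum(c * (-0.25) ** k for k, c in enumerate(coeffs))
    return 24 * area + 0.75 * sqrt3

for terms in (5, 10, 20):
    print(terms, pi_from_quarter_circle(terms), pi_from_newton(terms), math.pi)
```

With the same number of terms, the second approach is far more accurate, because its terms shrink by roughly a factor of 4 at each step while those of (2.21) decrease only like a power of the index.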
All of this work is fraught with potential problems. We have simply assumed that we
may integrate the infinite summations by integrating each term. In fact, here it works. That
will not always be the case.
Newton’s discovery was more than a means of calculating π . The binomial series is one
that recurs repeatedly and has become a critical tool of analysis. It is a simple series that
raises difficult questions. In Chapter 4, we will return to this series and determine the values
of a for which it converges at one or both of the endpoints, x = ±1.

Ramanujan’s Series
The calculation of π was and continues to be an important source of interesting infinite
series. Modern calculations to over two billion digits are based on far more complicated
series such as the one published by S. Ramanujan (1887–1920) in 1915:
\[ \frac{1}{\pi} = \frac{\sqrt{8}}{9801} \sum_{n=0}^{\infty} \frac{(4n)! \, (1103 + 26390n)}{(n!)^4 \, 396^{4n}}. \]
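Ramanujan's series converges so quickly that ordinary floating point cannot show it. The following sketch uses Python's `decimal` module for extra precision; the function name `ramanujan_pi` and the choice of working precision are assumptions made for the illustration, not a description of how record computations are actually carried out.

```python
from decimal import Decimal, getcontext
from math import factorial

def ramanujan_pi(terms: int, digits: int = 50) -> Decimal:
    """Approximate pi from the first `terms` terms of Ramanujan's series for 1/pi."""
    getcontext().prec = digits + 10  # a few guard digits beyond the requested precision
    total = Decimal(0)
    for n in range(terms):
        numerator = Decimal(factorial(4 * n) * (1103 + 26390 * n))
        denominator = Decimal(factorial(n) ** 4 * 396 ** (4 * n))
        total += numerator / denominator
    inverse_pi = Decimal(8).sqrt() / Decimal(9801) * total
    return 1 / inverse_pi

for terms in (1, 2, 3, 4):
    print(terms, ramanujan_pi(terms))
```

Each additional term contributes roughly eight more correct digits, which is why series of this kind replaced the arctangent formulas for large-scale calculations.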

Web Resource: To learn more about approximations to π and to find links and
references, go to More pi.

Exercises

The symbol
M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

2.3.1. Find a value of n so that any partial sum with at least n terms is within 1/100 of the
target value π/4 of the series 1 − 1/3 + 1/5 − 1/7 + · · · . Justify your answer.

2.3.2. Find a value of n so that any partial sum with at least n terms is within 0.001 of the
target value for the series expansion of arctan(1/2). Justify your answer.

2.3.3. Use the method outlined in exercise 2.2.9 to show that for $|x| < 1$ the partial sums
of $x - x^3/3 + x^5/5 - x^7/7 + \cdots$ can be forced arbitrarily close to the target value of
$\arctan x$ by taking enough terms.

2.3.4. Use equation (2.14) to prove that


\[ \frac{\pi}{8} = \frac{1}{1 \cdot 3} + \frac{1}{5 \cdot 7} + \frac{1}{9 \cdot 11} + \cdots . \]

2.3.5. Prove Machin’s identity:


   
\[ \frac{\pi}{4} = 4 \arctan\left(\frac{1}{5}\right) - \arctan\left(\frac{1}{239}\right). \]

2.3.6. How many terms of each series in equation (2.16) did Machin have to take in order to
calculate the first 100 digits of π ? Specify how many terms are needed so that the series for
$16 \arctan(1/5)$ and for $4 \arctan(1/239)$ are each within $2.5 \times 10^{-100}$ of their target values.

M&M
2.3.7.
Use your answer to exercise 2.3.6 to find the first 100 digits of π.

2.3.8. Explain why the geometric series is really just a special case of Newton’s binomial
series.

2.3.9. What happens to Newton’s binomial series when a is a positive integer? Explain
why it turns into a polynomial.

2.3.10. When $a = 1/2$, Newton’s binomial series becomes the series expansion for
$\sqrt{1 + x}$. Find a value of $n$ so that any partial sum with at least $n$ terms is within 0.001 of
the target value $\sqrt{3/2}$. Justify your answer.

2.3.11. It may appear that Newton’s binomial series can only be used to find approximations
to square roots of numbers between 0 and 2, but once you can do this, you can find a series
for the square root of any positive number. If $x \ge 2$, then find the integer $n$ so that
$n^2 \le x < (n + 1)^2$. It follows that $\sqrt{x} = n \sqrt{x/n^2}$ and $1 \le x/n^2 < 2$. Use this idea to find
a series expansion for $\sqrt{13}$. Find a value of $n$ so that any partial sum with at least $n$ terms
is within 0.001 of the target value $\sqrt{13}$. Justify your answer.


2.3.12.
M&M Evaluate the partial sum of at least the first hundred terms of the binomial
series expansion of $(1 + x)^a$ at $x = 0.5$ and for each of the following values of $a$: −2, −0.4,
1/3, 3, and 5.2. In each case, does the numerical evidence suggest that you are converging
to the true value of $(1 + 0.5)^a$? Describe and comment on what you see happening.

2.3.13.
M&M Evaluate the partial sum of at least the first hundred terms of the binomial

series expansion of $\sqrt{1 + x}$ at $x$ = −2, −1, 0.9, 0.99, 1, 1.01, 1.1, and 2. In each case,
does the numerical evidence suggest that you are converging to the true value of $\sqrt{1 + x}$?
Describe and comment on what you see happening.

2.3.14.
M&M Graph $y = \sqrt{1 + x}$ for $-1 \le x \le 2$ and compare this with the graphs over
the same interval of the polynomial approximations of degrees 2, 5, 8 and 11 obtained from
the binomial series
\[ 1 + \frac{1/2}{1} x + \frac{(1/2)(-1/2)}{2!} x^2 + \frac{(1/2)(-1/2)(-3/2)}{3!} x^3 + \cdots . \]
Describe what is happening in these graphs. For which values of $x$ is each polynomial a
good approximation to $\sqrt{1 + x}$?

2.3.15. Using the methods of this section, find an infinite series that is equal to

\[ \int_0^1 \left(1 - t^3\right)^{1/3} dt. \]

2.4 Logarithms and the Harmonic Series


In exercise 2.2.9 on page 21, we saw how to justify integrating each side of
\[ \frac{1}{1 - x} = 1 + x + x^2 + x^3 + x^4 + \cdots \]
to get the series expansion, valid for $|x| < 1$,

\[ -\ln(1 - x) = x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \frac{x^5}{5} + \cdots . \tag{2.26} \]

Replacing $x$ by $-x$ and multiplying through by −1, we get the series expansion for the
natural logarithm in its usual form,

\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \cdots . \tag{2.27} \]

Around 1667, this identity was independently discovered by Isaac Newton and by Nicolaus
Mercator (?–1687). Mercator was the first to publish it. Though we have only proved
its validity for $-1 < x < 1$, it also holds for $x = 1$ where it yields the target value for the
alternating harmonic series,

\[ \ln 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots . \tag{2.28} \]

What about the harmonic series,


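\[ 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots ? \]
It is not hard to see that under the Archimedean understanding, this does not have a value.
Consider the partial sums of the first $2^n$ terms:
\begin{align*}
n = 1: \quad & 1 + \frac{1}{2} = \frac{3}{2}, \\
n = 2: \quad & 1 + \frac{1}{2} + \left( \frac{1}{3} + \frac{1}{4} \right) > 1 + \frac{1}{2} + \frac{2}{4} = 2, \\
n = 3: \quad & 1 + \frac{1}{2} + \left( \frac{1}{3} + \frac{1}{4} \right) + \left( \frac{1}{5} + \cdots + \frac{1}{8} \right) > 1 + \frac{1}{2} + \frac{2}{4} + \frac{4}{8} = \frac{5}{2}, \\
n = 4: \quad & 1 + \frac{1}{2} + \left( \frac{1}{3} + \frac{1}{4} \right) + \left( \frac{1}{5} + \cdots + \frac{1}{8} \right) + \left( \frac{1}{9} + \cdots + \frac{1}{16} \right) \\
& > 1 + \frac{1}{2} + \frac{2}{4} + \frac{4}{8} + \frac{8}{16} = 3.
\end{align*}
In general, we see that
\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2^n} > 1 + \frac{1}{2} + \frac{2}{4} + \frac{4}{8} + \cdots + \frac{2^{n-1}}{2^n} = 1 + n \cdot \frac{1}{2} = \frac{n + 2}{2}. \]
No matter what number we pick, we can find an $n$ so that all of the partial sums with at
least $2^n$ terms will exceed that number. There is no target value.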
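The grouping argument can be checked numerically for small n. The following is a minimal sketch; the function name `harmonic` is an illustrative choice, and the comparison value (n + 2)/2 is the lower bound derived above.

```python
def harmonic(n: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1.0 / k for k in range(1, n + 1))

# Compare the partial sum with 2^n terms against the lower bound (n + 2) / 2.
for n in range(1, 11):
    print(n, 2 ** n, harmonic(2 ** n), (n + 2) / 2)
```

The partial sums do grow without bound, but very slowly: doubling the number of terms adds only a little more than one half.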
What about ∞? The problem is that ∞ is not a number, so it cannot be a target value.
Nevertheless, there is something special about the way in which this series diverges. No
matter how large a number we pick, all of the partial sums beyond some point will be larger
than that number.

Definition: divergence to infinity


When we write that an infinite series equals ∞, we mean that no matter what number
we pick, we can find an n so that all of the partial sums with at least n terms will exceed
that number.

We write
\[ 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots = \infty, \]
but the fact that we have set this series equal to infinity does not mean that it has a value.
This series does not have a value under the Archimedean understanding. What we have
written is shorthand for the fact that this series diverges in a special way.

Euler’s Constant
How large is the nth partial sum of the harmonic series? This is not an idle question. It arises
in many interesting and important problems. We are going to find a simple approximation,
in terms of n, to the value of the partial sum of the first n − 1 terms of the harmonic series,
1 + 1/2 + 1/3 + · · · + 1/(n − 1).

FIGURE 2.5. The graphs of $y = 1/x$ and $y = 1/\lfloor x \rfloor$.

Web Resource: For problems that require finding the values of the partial sums
of the harmonic series, go to Explorations of the Harmonic Series.

The key to getting started is to think of this value as an area. It is the area under $y = 1/\lfloor x \rfloor$
from $x = 1$ to $n$; this is why we only took the first $n - 1$ terms of the harmonic series.
(The symbol $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$, read as the floor of $x$.)
Our area is slightly larger than that under the graph of $y = 1/x$ from $x = 1$ to $x = n$ (see
Figure 2.5). The area under the graph of $y = 1/x$ is

\[ \int_1^n \frac{1}{x} \, dx = \ln n. \]

How much larger is the area we want to find?


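The missing areas can be approximated by triangles. The first has area $(1 - 1/2)/2$, the
second $(1/2 - 1/3)/2$, and so on. The sum of the areas of the triangles is
\[ \frac{1}{2} \left( 1 - \frac{1}{2} \right) + \frac{1}{2} \left( \frac{1}{2} - \frac{1}{3} \right) + \cdots + \frac{1}{2} \left( \frac{1}{n - 1} - \frac{1}{n} \right) = \frac{1}{2} - \frac{1}{2n}. \]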
This sum approaches 1/2 as n gets larger. This is not big enough. We have missed part of the
area between the curves. But it gives us some idea of the probable size of this missing area.

The value that this missing area approaches as n increases is denoted by the Greek letter γ,
read gamma, and is called Euler’s constant, since it was Leonhard Euler (1707–1783) who
discovered this constant and established the exact connection between the harmonic series
and the natural logarithm in 1734.

Definition: Euler’s constant, γ


Euler’s constant is defined as the limit of the difference between the partial sum of the
harmonic series and the natural logarithm,
\[ \gamma = \lim_{n \to \infty} \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} - \ln n \right). \tag{2.29} \]

Estimating Euler’s Gamma


We define

\[ x_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} - \ln n. \tag{2.30} \]

This sequence records the accumulated areas between $1/\lfloor x \rfloor$ and $1/x$ for $1 \le x \le n$, so it is
increasing. By definition, γ is the value of its limit, but how do we know that this sequence
has a limit? If it does, how large is γ ? We will answer these questions by finding another
sequence that decreases toward γ , enabling us to squeeze the true value between these two
sequences.
We define

\[ y_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} + \frac{1}{n} - \ln n = x_n + \frac{1}{n}. \tag{2.31} \]

We can use the series expansion for ln(1 + 1/n) to show that this sequence is decreasing
(see exercise 2.4.3 for a geometric proof that the sequence (y1 , y2 , . . .) decreases):
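\begin{align*}
y_n - y_{n+1} &= \ln(n + 1) - \ln(n) - \frac{1}{n + 1} \\
&= \ln\left(1 + \frac{1}{n}\right) - \frac{1}{n + 1} \\
&= \frac{1}{n} - \frac{1}{n + 1} - \frac{1}{2n^2} + \frac{1}{3n^3} - \frac{1}{4n^4} + \frac{1}{5n^5} - \cdots \\
&= \frac{2n(n + 1) - 2n^2 - (n + 1)}{2n^2 (n + 1)} + \frac{1}{3n^3} - \frac{1}{4n^4} + \frac{1}{5n^5} - \cdots \\
&= \frac{n - 1}{2n^2 (n + 1)} + \left( \frac{1}{3n^3} - \frac{1}{4n^4} \right) + \left( \frac{1}{5n^5} - \frac{1}{6n^6} \right) + \cdots \\
&> 0.
\end{align*}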

This implies that


\[ y_1 > y_2 > y_3 > y_4 > \cdots . \]
Now $y_n = x_n + 1/n$, and so $y_n$ is always larger than $x_n$. As $n$ gets larger, $x_n$ and $y_n$ get
closer. For example:
\begin{align*}
x_{10} = 0.526\ldots &< \gamma < y_{10} = 0.626\ldots, \\
x_{100} = 0.5722\ldots &< \gamma < y_{100} = 0.5822\ldots, \\
x_{1000} = 0.57671\ldots &< \gamma < y_{1000} = 0.57771\ldots .
\end{align*}
Since we know that $x_n$ and $y_n$ differ by $1/n$, we can bring them as close together as we
want, and find $\gamma$ to whatever accuracy we desire.
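The two bounding sequences are easy to tabulate. The sketch below assumes only the definitions in equations (2.30) and (2.31); the function name `gamma_bounds` is illustrative, and `math.fsum` is used simply to keep rounding error in the long sum small.

```python
import math

def gamma_bounds(n: int) -> tuple:
    """Return (x_n, y_n) from equations (2.30) and (2.31)."""
    partial = math.fsum(1.0 / k for k in range(1, n))  # 1 + 1/2 + ... + 1/(n - 1)
    x_n = partial - math.log(n)
    y_n = x_n + 1.0 / n
    return x_n, y_n

for n in (10, 100, 1000, 10**6):
    x_n, y_n = gamma_bounds(n)
    print(n, x_n, y_n)
```

The gap between the two columns is exactly 1/n, so squeezing γ to six decimal places this way takes on the order of a million terms.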

\[ x_1 < x_2 < x_3 < x_4 < x_5 < \cdots < y_5 < y_4 < y_3 < y_2 < y_1 \]

But as we narrow in, is there really something there?

The Nested Interval Principle


What may seem to be a ridiculous question is actually very profound. It gets to the heart
of what we mean by the real number line. It would not be until the second half of the 19th
century that anyone seriously asked this question: If we have two sequences approaching
each other, one always increasing, the other always decreasing, and so that the distance
between the sequences can be made as small as we wish by going out far enough on both
sequences, do these sequences have a limit?
We cannot prove that such a limit must exist. The existence of the limit must be wrapped
up in the definition of what we mean by the real number line, stated as an axiom or
fundamental assumption.

Definition: nested interval principle


Given an increasing sequence, $x_1 \le x_2 \le x_3 \le \cdots$, and a decreasing sequence, $y_1 \ge
y_2 \ge y_3 \ge \cdots$, such that $y_n$ is always larger than $x_n$ but the difference between $y_n$ and
$x_n$ can be made arbitrarily small by taking $n$ sufficiently large, there is exactly one real
number that is greater than or equal to every $x_n$ and less than or equal to every $y_n$.

The important part of this principle is that there is at least one number that is greater
than or equal to every $x_n$ and less than or equal to every $y_n$. We cannot have more than one
such number. If
\[ x_n \le a < b \le y_n \]
for all $n$, then $y_n - x_n$ would have to be at least as large as $b - a$. But our assumption is
that we can make $y_n - x_n$ as small as we want.
The conclusion that there is at least one such number is something we cannot prove, not
without making some other assumption that is equivalent to the nested interval principle.
We have reached one of the foundational assumptions on which a careful and rigorous

treatment of calculus can be built. It took mathematicians a long time to realize this. In
the early 1800s, the nested interval principle was used as if it was too obvious to bother
justifying or even stating very carefully. What it guarantees is that the real number line has
no holes in it. If two sequences are approaching each other from different directions, then
where they “meet” there is always some number.
This principle will play an important role in future chapters when we enter the nineteenth
century and begin to grapple with questions of continuity and convergence. It will be our
primary tool for showing that a desired number actually exists even when we do not know
what it is.

Approximating Partial Sums of the Harmonic Series


What is the first integer n for which
\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} > 10? \]
We know that the sum of the first $n$ terms of the harmonic series is about $\ln n$, so $e^{10} \approx
22{,}000$ should be roughly accurate. That is off by a factor close to 2 because we did not use $\gamma$.
Our partial sum is closer to $\ln n + \gamma$, so we want $\ln n + \gamma \approx 10$ or $n \approx e^{10 - \gamma} \approx 12{,}366.968$.
This requires a fairly accurate approximation to $\gamma$. We can use

\[ \gamma = 0.5772156649012. \]

Will 12,366 terms be enough, or do we need the 12,367th? Or is it the case that we are
close, but not quite close enough to be able to determine the exact number of terms? Can
we find out without actually adding 12,366 fractions?
We know that
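\[ 1 + \frac{1}{2} + \cdots + \frac{1}{n - 1} - \ln n < \gamma < 1 + \frac{1}{2} + \cdots + \frac{1}{n - 1} + \frac{1}{n} - \ln n, \]
and so

\[ \gamma + \ln(n - 1) < 1 + \frac{1}{2} + \cdots + \frac{1}{n - 1} < \gamma + \ln n < 1 + \frac{1}{2} + \cdots + \frac{1}{n}. \tag{2.32} \]

It follows that
\[ 9.9999217 < 1 + \frac{1}{2} + \cdots + \frac{1}{12366} < 10.00000258 < 1 + \frac{1}{2} + \cdots + \frac{1}{12367}, \]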
so the answer must be either 12,366 or 12,367.
In his original paper of 1734, Euler gave more information that will enable us to decide
which is the correct answer. We observe that
 
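\[ \ln(n + 1) - \ln n = \ln\left(1 + \frac{1}{n}\right) = \frac{1}{n} - \frac{1}{2n^2} + \frac{1}{3n^3} - \frac{1}{4n^4} + \cdots , \]
which we will write as

\[ \frac{1}{n} = \ln(n + 1) - \ln n + \frac{1}{2n^2} - \frac{1}{3n^3} + \frac{1}{4n^4} - \frac{1}{5n^5} + \cdots . \tag{2.33} \]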

This implies the following identities:


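\begin{align*}
1 &= \ln 2 - \ln 1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} - \frac{1}{5} + \cdots , \\
\frac{1}{2} &= \ln 3 - \ln 2 + \frac{1}{2 \cdot 2^2} - \frac{1}{3 \cdot 2^3} + \frac{1}{4 \cdot 2^4} - \frac{1}{5 \cdot 2^5} + \cdots , \\
\frac{1}{3} &= \ln 4 - \ln 3 + \frac{1}{2 \cdot 3^2} - \frac{1}{3 \cdot 3^3} + \frac{1}{4 \cdot 3^4} - \frac{1}{5 \cdot 3^5} + \cdots , \\
\frac{1}{4} &= \ln 5 - \ln 4 + \frac{1}{2 \cdot 4^2} - \frac{1}{3 \cdot 4^3} + \frac{1}{4 \cdot 4^4} - \frac{1}{5 \cdot 4^5} + \cdots , \\
&\;\;\vdots \\
\frac{1}{n - 1} &= \ln(n) - \ln(n - 1) + \frac{1}{2 \cdot (n - 1)^2} - \frac{1}{3 \cdot (n - 1)^3} \\
&\quad + \frac{1}{4 \cdot (n - 1)^4} - \frac{1}{5 \cdot (n - 1)^5} + \cdots .
\end{align*}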
Adding these equations (and recognizing that ln 1 = 0), we see that
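\begin{align*}
1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} = \ln n
&+ \frac{1}{2} \left( 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{(n - 1)^2} \right) \\
&- \frac{1}{3} \left( 1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{(n - 1)^3} \right) \\
&+ \frac{1}{4} \left( 1 + \frac{1}{2^4} + \frac{1}{3^4} + \cdots + \frac{1}{(n - 1)^4} \right) \\
&- \frac{1}{5} \left( 1 + \frac{1}{2^5} + \frac{1}{3^5} + \cdots + \frac{1}{(n - 1)^5} \right) + \cdots . \tag{2.34}
\end{align*}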
This implies that
 
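\begin{align*}
\gamma &= \lim_{n \to \infty} \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} - \ln n \right) \\
&= \frac{1}{2} \sum_{m=1}^{\infty} \frac{1}{m^2} - \frac{1}{3} \sum_{m=1}^{\infty} \frac{1}{m^3} + \frac{1}{4} \sum_{m=1}^{\infty} \frac{1}{m^4} - \frac{1}{5} \sum_{m=1}^{\infty} \frac{1}{m^5} + \cdots . \tag{2.35}
\end{align*}

We can now see exactly how far the partial sum of the harmonic series is from $\ln n + \gamma$:
\begin{align*}
& 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} - \ln n - \gamma \\
&\quad = -\frac{1}{2} \sum_{m=n}^{\infty} \frac{1}{m^2} + \frac{1}{3} \sum_{m=n}^{\infty} \frac{1}{m^3} - \frac{1}{4} \sum_{m=n}^{\infty} \frac{1}{m^4} + \frac{1}{5} \sum_{m=n}^{\infty} \frac{1}{m^5} - \cdots . \tag{2.36}
\end{align*}

It follows that
\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} < \ln n + \gamma - \frac{1}{2} \sum_{m=n}^{\infty} \frac{1}{m^2} + \frac{1}{3} \sum_{m=n}^{\infty} \frac{1}{m^3}. \]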

We can use integrals to approximate these sums:


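\begin{align*}
\sum_{m=n}^{\infty} \frac{1}{m^2} &> \int_n^{\infty} \frac{dx}{x^2} = \frac{1}{n}, \\
\sum_{m=n}^{\infty} \frac{1}{m^3} &< \frac{1}{n^3} + \int_n^{\infty} \frac{dx}{x^3} = \frac{2 + n}{2n^3}.
\end{align*}
Finally, we see that
\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{12366} < \ln 12367 + \gamma - \frac{1}{2 \cdot 12367} + \frac{12369}{6 \cdot 12367^3} < 9.9999622. \]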
The first time the partial sum of the harmonic series exceeds 10 is with the 12,367th
summand.
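The conclusion can be confirmed by brute force, which the analysis above was designed to avoid but which a computer does instantly. This is a sketch for checking purposes only: the function name `first_n_exceeding` is illustrative, and the value of γ is the approximation quoted in the text.

```python
import math

GAMMA = 0.5772156649012  # approximation to Euler's constant quoted in the text

def first_n_exceeding(target: float) -> int:
    """Smallest n for which 1 + 1/2 + ... + 1/n is larger than `target`."""
    total = 0.0
    n = 0
    while total <= target:
        n += 1
        total += 1.0 / n
    return n

print(first_n_exceeding(10))   # direct summation
print(math.exp(10 - GAMMA))    # the estimate e^(10 - gamma) used in the text
```

Direct summation is only practical for modest targets. The number of terms grows like e raised to the target, so for a target such as 100 the estimates developed in this section are the only workable route.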

Exercises

The symbol
M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

2.4.1. Give an example of a series that diverges to ∞ but whose partial sums do not form
an increasing sequence.

2.4.2. Give an example of a series that does not diverge to ∞ but whose partial sums are
increasing.

2.4.3. Show that the area above $1/\lfloor x + 1 \rfloor$ and below $1/x$ for $1 \le x \le n$ is equal to
$\ln n - (1/2 + 1/3 + \cdots + 1/n)$. Use this fact to prove that $y_n = 1 + 1/2 + 1/3 + \cdots +
1/n - \ln n$ is a decreasing sequence.

2.4.4.
M&M Evaluate the partial sum of the power series for ln(1 + x) with at least
100 terms at x = −0.9, 0.9, 0.99, 0.999, 1, 1.001, 1.01, and 1.1. Compare these approxima-
tions with the actual value of ln (1 + x). Describe and comment on what you see happening.

2.4.5.
M&M In 1668, James Gregory came up with an improvement on the series for
ln (1 + x). He started with the observation that
   
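\[ \ln\left( \frac{1 + z}{1 - z} \right) = 2 \left( z + \frac{z^3}{3} + \frac{z^5}{5} + \cdots \right). \tag{2.37} \]

Using the series for $\ln(1 + x)$, prove this identity. Show that if $1 + x = (1 + z)/(1 - z)$
then $z = x/(x + 2)$ and therefore, for any $x > -1$,
\[ \ln(1 + x) = 2 \left( \frac{x}{x + 2} + \frac{1}{3} \left( \frac{x}{x + 2} \right)^3 + \cdots \right). \tag{2.38} \]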

Explore the convergence of this series. How many terms are needed in order to calculate
ln 5 to within an accuracy of 1/100,000 if we set x = 4 in equation (2.38)? How many

terms are needed to get this same accuracy if we set x = −4/5 and calculate − ln(1/5)?
Justify your answers.

2.4.6.
M&M Evaluate the partial sum of the series given in equation (2.38) using at least
100 terms at x = −0.9, 0.9, 1, 1.1, 5, 20, and 100. Describe and comment on these results
and compare them with the results of exercise 2.4.4.

M&M
2.4.7.
Using equations (2.14), page 23, and (2.28), page 28, express the series
\[ 1 + \frac{1}{2} - \frac{1}{3} - \frac{1}{4} + \frac{1}{5} + \frac{1}{6} - \frac{1}{7} - \frac{1}{8} + \cdots \]
in terms of π and ln 2. Check your result by calculating the sum of the first 1000 terms of
this series.

2.4.8. Identify the point at which the following argument goes wrong. Which is the first
equation that is not true? Explain why it is not true.

Let $f$ be any given function. Then:

\[ \int_1^2 f(x) \, dx = \int_0^2 f(x) \, dx - \int_0^1 f(x) \, dx. \tag{2.39} \]

Letting $x = 2y$ in the first integral on the right:

\begin{align*}
\int_0^2 f(x) \, dx &= 2 \int_0^1 f(2y) \, dy \tag{2.40} \\
&= 2 \int_0^1 f(2x) \, dx. \tag{2.41}
\end{align*}

Take $f(x)$ such that $f(2x) = \frac{1}{2} f(x)$ for all values of $x$. Then

\begin{align*}
\int_1^2 f(x) \, dx &= 2 \int_0^1 \frac{1}{2} f(x) \, dx - \int_0^1 f(x) \, dx \tag{2.42} \\
&= 0. \tag{2.43}
\end{align*}
Now $f(2x) = \frac{1}{2} f(x)$ is satisfied by $f(x) = 1/x$. Thus, $\int_1^2 dx/x = 0$, so $\log 2 = 0$.

2.4.9.
M&M
The summations in equation (2.35) are well known as the zeta (Greek letter ζ)
functions:

\[ \zeta(k) = \sum_{m=1}^{\infty} \frac{1}{m^k}. \tag{2.44} \]

When $k$ is even, these are equal to a rational number times $\pi^k$. There is no simple formula
when $k$ is odd, but these values of $\zeta(k)$ are also known to very high accuracy. What happens
to the values of $\zeta(k)$ as $k$ increases? Assuming that we have arbitrarily good accuracy on

the values of $\zeta(k)$, how many terms of the series in equation (2.35) are needed to calculate
$\gamma$ to within $10^{-6}$?

To see how to evaluate the zeta function at positive even integers, go to


Appendix A.3, Sums of Negative Powers.

2.4.10. We know that the harmonic series does not converge. A result that is often seen as
surprising is that if we eliminate those integers that contain the digit 9, the partial sums of
the resulting series do stay bounded. Prove that the partial sums of the reciprocals of the
integers that do not contain any 9’s in their decimal representation are bounded.

2.4.11. Show that if we sum the reciprocals of the positive integers that contain neither an
8 nor a 9, the sum must be less than 35.

2.4.12. Find an upper bound for the sum of the reciprocals of the positive integers that do
not contain the digit 1.

M&M
2.4.13.
The following procedure enables us to estimate the sum of
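\[ 1 + \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{4}} + \cdots + \frac{1}{\sqrt{n}}. \]
a. Show that
\[ \sqrt{k + 1} - \sqrt{k} = \frac{1}{\sqrt{k + 1} + \sqrt{k}}, \]
and then explain how this implies that
\[ 2\sqrt{k + 1} - 2\sqrt{k} < \frac{1}{\sqrt{k}} < 2\sqrt{k} - 2\sqrt{k - 1}. \tag{2.45} \]
b. List the double inequalities of (2.45) for $k = 1, 2, 3, \ldots, n$, and then add up each column
to prove that
\[ 2\sqrt{n + 1} - 2\sqrt{1} < 1 + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}} < 2\sqrt{n} - 2\sqrt{0}. \tag{2.46} \]
Show that this implies that
\[ 61.27 < 1 + \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{4}} + \cdots + \frac{1}{\sqrt{1000}} < 63.25. \]
c. Using a computer or calculator, what is the value to six significant digits of the series
\[ 1 + \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{4}} + \cdots + \frac{1}{\sqrt{1000}} ? \]
d. Find bounds for
\[ 1 + \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{4}} + \cdots + \frac{1}{\sqrt{1{,}000{,}000{,}000}}. \]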

e. Does the infinite series


\[ 1 + \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{4}} + \cdots \]
converge or does it diverge to ∞? Explain your answer.

2.4.14. Find a simple function of n in terms of ln n, call it ω, so that


 
\[ \lim_{n \to \infty} \left( 1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n - 1} - \omega(n) \right) = 0. \]

2.4.15. Consider the rearranged alternating harmonic series which takes the first r positive
summands, then the first s negative summands, and then alternates r positive summands
with s negative summands. If we take the partial sum with n(r + s) terms, it is equal to
   
\[ \left( 1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2nr - 1} \right) - \left( \frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \cdots + \frac{1}{2ns} \right). \]
Using the result from exercise 2.4.14, find a simple function that differs from this summation
by an amount that approaches 0 as $n$ gets larger. Show that this function does not
depend on n. Explain why every partial sum of this rearranged alternating harmonic series
will be as close as we wish to this target value if we have enough terms.

2.4.16. Use the fact that



\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1} - \ln n - \gamma > \frac{-1}{2} \sum_{m=n}^{\infty} \frac{1}{m^2} \]

to find a lower bound for 1 + 1/2 + 1/3 + · · · + 1/(n − 1) in the form ln n + γ − R(n)
where R(n) is a rational function of n (a ratio of polynomials). Show that ln n + γ − R(n)
is strictly larger than ln(n − 1) + γ .

2.4.17.
M&M Use the result from exercise 2.4.16 to find the precise smallest integer n
such that 1 + 1/2 + 1/3 + · · · + 1/n is larger than 100. Show the work that leads to your
answer.

2.4.18. You are asked to walk to the end of an infinitely stretchable rubber road that is one
mile long. After each step, the road stretches uniformly so that it is one mile longer than it
was before you took that step. Assuming that there are 2000 steps to a mile and that you
are moving at the brisk pace of two steps per second, show that you will eventually reach
the end of the road. Find the approximate time (in years) that it will take.

2.5 Taylor Series


Infinite series explode across the eighteenth century. They are discovered, investigated,
and utilized. They are recognized as a central pillar of calculus, so much so that one of
the most important books to be published in this century, Euler’s Introductio in analysin
infinitorum of 1748, is a primer on infinite series. There is no calculus in it in the sense that
there are no derivatives, no integrals, only what Euler calls “algebra,” but it is the algebra
of the infinite: derivations of the power series for all of the common functions and some
extraordinary manipulations of them. This is done not as a consequence of calculus but as
a preparation for it. As he says in the Preface:

Often I have considered the fact that most of the difficulties which block the
progress of students trying to learn analysis stem from this: that although they
understand little of ordinary algebra, still they attempt this more subtle art. From
this it follows not only that they remain on the fringes, but in addition they entertain
strange ideas about the concept of the infinite, which they must try to use . . . I am
certain that the material I have gathered in this book is quite sufficient to remedy
that defect.

By the end of the seventeenth century, power series,

\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots, \]

had emerged as one of the primary tools of calculus. They were useful for finding approx-
imations. They soon became indispensable for solving differential equations. As long as x
is restricted to the interval where the power series are defined, they can be differentiated,
integrated, added, multiplied, and composed as if they were ordinary polynomials.
One example of their utility can be found in Leonhard Euler’s analysis of 1759 of the
vibrations of a circular drumhead. Euler was led to the differential equation
 
\[ \frac{d^2 u}{dr^2} + \frac{1}{r}\frac{du}{dr} + \left( \alpha^2 - \frac{\beta^2}{r^2} \right) u = 0, \tag{2.47} \]

where u (the vertical displacement) is a function of r (the distance from the center of the
drum) and α and β are constants depending on the properties of the drumhead. There is no
closed form for the solution of this differential equation, but if we assume that the solution
can be expressed as a power series,

\[ u = r^{\lambda} + a_1 r^{\lambda+1} + a_2 r^{\lambda+2} + a_3 r^{\lambda+3} + \cdots, \]

then we can solve for λ and the $a_i$.

Web Resource: To learn more about the drumhead problem and to see how Euler
solved it, go to Euler’s solution to the vibrating drumhead.

Power series are useful. They are also ubiquitous. Every time a power series representa-
tion was sought, it was found. It might be valid for all x as with sin x, or only for a restricted
range of x as with ln (1 + x), but it was always there. In 1671, James Gregory wrote to
John Collins and listed the first five or six terms of the power series for tan x, arctan x,
sec x, ln sec x, $\sec^{-1}(\sqrt{2}\,e^x)$, ln(tan(x/2 + π/4)), and 2 arctan(tanh x/2). Clearly, he was
drawing on some underlying machinery to generate these.

Everyone seemed to know about this power series machine. Gottfried Leibniz and
Abraham de Moivre had each described it and explained the path of their discovery in
separate letters to Jean Bernoulli, Leibniz in 1694, de Moivre in 1708. Newton had hinted at
it in his geometric interpretation of the coefficients of a power series in Book II, Proposition
X of the Principia of 1687. He elucidated it fully in an early draft of De Quadratura but
removed it before publication. Johann Bernoulli published the general result

\[ \int_0^x f(t)\,dt = x f(x) - \frac{x^2}{2!} f'(x) + \frac{x^3}{3!} f''(x) - \frac{x^4}{4!} f'''(x) + \cdots \tag{2.48} \]

in the journal Acta Eruditorum in 1694. He would later point out that this is equivalent
to the machine in question. Today, this machine is named for the first person to actually
put it into print, Brook Taylor (1685–1731). It appeared in his Methodus incrementorum
of 1715. His derivation is based on an interpolation formula discovered independently by
James Gregory and Isaac Newton (Book III, Lemma V of the Principia).

Taylor’s Formula
The machine described by Taylor expresses the coefficients of the power series in terms of
the derivatives at a particular point.

Definition: Taylor Series


If all of the derivatives of the function f exist at the point a, then the Taylor series for
f about a is the infinite series
\[ f(x) = f(a) + f'(a)\,(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \frac{f^{(4)}(a)}{4!}(x-a)^4 + \cdots. \tag{2.49} \]
This has as a special case (a = 0):
\[ f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \frac{f^{(4)}(0)}{4!}x^4 + \cdots. \tag{2.50} \]

All power series are special cases of equation (2.49). For example, if f (x) = ln(1 + x),
we observe that

\begin{align*}
f(a) &= \ln(1+a), \\
f'(a) &= (1+a)^{-1}, \\
f''(a) &= -(1+a)^{-2}, \\
f'''(a) &= 2(1+a)^{-3}, \\
&\ \ \vdots \\
f^{(n)}(a) &= (-1)^{n-1}(n-1)!\,(1+a)^{-n}.
\end{align*}

Making this substitution into equation (2.50), we obtain


\begin{align*}
\ln(1+x) &= \ln 1 + 1 \cdot x + \frac{-1}{2!}x^2 + \frac{2}{3!}x^3 + \cdots + \frac{(-1)^{n-1}(n-1)!}{n!}x^n + \cdots \\
&= x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots.
\end{align*}
We are not yet ready to prove that equation (2.49) satisfies the Archimedean understand-
ing of an infinite series. In fact, we know that we will have to restrict the values of x for
which it is true, since even the series for ln(1 + x) is only valid for −1 < x ≤ 1. But we can
give a fast and dirty reason why it makes sense. If we assume that f (x) can be represented
as a power series,

\[ f(x) = c_0 + c_1(x-a) + c_2(x-a)^2 + \cdots + c_k(x-a)^k + \cdots, \]

and if we assume that we are allowed to differentiate this power series by differentiating
each summand, then the kth derivative of f is equal to
\[ f^{(k)}(x) = c_k\,k! + c_{k+1}\frac{(k+1)!}{1!}(x-a) + c_{k+2}\frac{(k+2)!}{2!}(x-a)^2 + \cdots. \]
Setting x = a eliminates everything but the first term. If power series are as nice as we
hope them to be, then we will have
\[ f^{(k)}(a) = c_k\,k! \implies c_k = \frac{f^{(k)}(a)}{k!}. \]
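
To see the formula for the coefficients at work, here is a minimal Python sketch for the example f(x) = ln(1 + x) expanded about a = 0, where the coefficient of $x^n$ is $f^{(n)}(0)/n! = (-1)^{n-1}/n$. The function name is ours, chosen for illustration, and the built-in math.log1p is used only as a reference value for comparison.

```python
import math

def ln1p_taylor_partial_sum(x, n_terms):
    """Sum the first n_terms terms of x - x^2/2 + x^3/3 - ... ."""
    total = 0.0
    for n in range(1, n_terms + 1):
        # Coefficient f^(n)(0)/n! = (-1)^(n-1) (n-1)!/n! = (-1)^(n-1)/n
        total += (-1) ** (n - 1) * x ** n / n
    return total

if __name__ == "__main__":
    for x in (0.5, 1.0):
        for n_terms in (10, 100, 1000):
            approx = ln1p_taylor_partial_sum(x, n_terms)
            error = abs(approx - math.log1p(x))
            print(f"x = {x}, terms = {n_terms}, sum = {approx:.9f}, error = {error:.2e}")
```

At x = 0.5 the error shrinks very quickly, while at x = 1 it shrinks only about as fast as 1/n. This is a numerical observation, not a proof that the series converges.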

d’Alembert and the Question of Convergence


One of the first mathematicians to study the convergence of series was Jean Le Rond
d’Alembert (1717–1783) in his paper of 1768, “Réflexions sur les suites et sur les racines
imaginaires.” d’Alembert was science editor for Diderot’s Encyclopédie and contributed
many of the articles across many different fields. He was born Jean Le Rond, a foundling
whose name was taken from the church of Saint Jean Le Rond in Paris on whose steps he
had been abandoned. “Le Rond” (the round or plump) refers to the shape of the church.
Perhaps feeling that John the Round was a name lacking in dignity, he added d’Alembert,
and occasionally signed himself Jean Le Rond d’Alembert et de la Chapelle.1
d’Alembert considered Newton’s binomial series and asked when it is valid. In particular,
he looked at the following series:
  
\[ \sqrt{1 + \frac{200}{199}} \stackrel{?}{=} 1 + (1/2)\frac{200}{199} + \frac{(1/2)(1/2-1)}{2!}\left(\frac{200}{199}\right)^2 + \frac{(1/2)(1/2-1)(1/2-2)}{3!}\left(\frac{200}{199}\right)^3 + \cdots. \tag{2.51} \]
As d’Alembert pointed out, the series begins well. The partial sums of the first 100 and the
first 101 terms are, respectively, 1.416223987 and 1.415756552. It appears to be converging
very quickly toward the correct value, $\sqrt{399/199} \approx 1.41598910$.

1 d’Alembert was not the only famous mathematician to create his own surname. James Joseph Sylvester was born James Joseph.



Web Resource: To explore the convergence of d’Alembert’s series, go to Explorations of d’Alembert’s series.

Starting out well is not enough. d’Alembert analyzed this series by comparing it to the
geometric series. What characterizes a geometric series is the fact that the ratio of any two
consecutive summands is always the same. This suggests analyzing the binomial series
by looking at the ratio of consecutive summands. We can then compare our series to a
geometric series. The series in (2.51) has as the nth summand
 
\[ a_n = \frac{(1/2)(1/2-1)\cdots(1/2-n+2)}{(n-1)!}\left(\frac{200}{199}\right)^{n-1}. \]
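The absolute value of the ratio of consecutive summands is
\[
\left|\frac{a_{n+1}}{a_n}\right|
= \left|\frac{\dfrac{(1/2)(1/2-1)\cdots(1/2-n+2)(1/2-n+1)}{n!}\left(\dfrac{200}{199}\right)^{n}}{\dfrac{(1/2)(1/2-1)\cdots(1/2-n+2)}{(n-1)!}\left(\dfrac{200}{199}\right)^{n-1}}\right|
= \left|\frac{(1/2-n+1)}{n}\cdot\frac{200}{199}\right|
= \left(1-\frac{3}{2n}\right)\frac{200}{199}. \tag{2.52}
\]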
d’Alembert now observed that this ratio is larger than 1 whenever n is larger than 300:
\[ \left(1-\frac{3}{2n}\right)\frac{200}{199} > 1 \quad\text{if and only if}\quad n > 300. \]
At n = 301, the ratio is larger than 1.000016, and it approaches 200/199 as n gets larger.
Once we pass n = 300, our summands will start to get larger. If the summands approach
zero, we are not guaranteed convergence. On the other hand, if the summands do not
approach zero, then the series cannot converge. Table 2.1 shows the partial sums for
various values of n up to n = 3000. The partial sums are closest to the target value when
n = 300, and then they move away with ever increasing error.
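
The behavior described above can be reproduced with a short Python sketch. It builds each summand from the previous one using the ratio in (2.52), so no large factorials are needed. The names X, binomial_half_terms, partial_sum, and ratio are ours, and ordinary floating-point arithmetic is assumed to be accurate enough for the range of n shown.

```python
import math

X = 200 / 199  # the value of x in d'Alembert's example

def binomial_half_terms(x, count):
    """Yield the first `count` summands a_1, a_2, ... of the series for sqrt(1 + x)."""
    term = 1.0  # a_1 = 1
    for n in range(1, count + 1):
        yield term
        # a_{n+1} = a_n * (1/2 - n + 1)/n * x, the signed form of the ratio in (2.52)
        term *= (0.5 - n + 1) / n * x

def partial_sum(x, count):
    """Sum of the first `count` summands."""
    return math.fsum(binomial_half_terms(x, count))

def ratio(n, x=X):
    """|a_{n+1}/a_n| = (1 - 3/(2n)) x, valid for n >= 2."""
    return (1 - 3 / (2 * n)) * x

if __name__ == "__main__":
    target = math.sqrt(1 + X)
    for n in (100, 200, 300, 400, 500, 1000, 2000, 3000):
        s = partial_sum(X, n)
        print(f"n = {n:5d}  sum = {s:.9f}  error = {abs(s - target):.3e}  ratio = {ratio(n):.6f}")
```

The printed partial sums can be compared with the first column of Table 2.1.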
The general binomial series is

\[ (1+x)^{\alpha} = 1 + \alpha x + \frac{\alpha(\alpha-1)}{2!}x^2 + \frac{\alpha(\alpha-1)(\alpha-2)}{3!}x^3 + \cdots. \tag{2.53} \]

A similar analysis can be applied. The absolute value of the ratio of the (n + 1)st to the nth
summands is
\[ \left|\frac{a_{n+1}}{a_n}\right| = \left|\frac{(\alpha-n+1)}{n}\,x\right| = \left|1 - \frac{1+\alpha}{n}\right| |x|. \tag{2.54} \]

As n increases, this ratio approaches |x|. If |x| > 1, then the summands do not approach 0
and the series cannot converge to (1 + x)α .
If |x| < 1, then the summands approach 0. Is this enough to guarantee that the binomial
series converges to the desired value? d’Alembert did not answer this although he seemed to
imply it. Neither did he investigate what happens when |x| = 1 (a question with a delicate
answer that depends on the value of α and the sign of x), or how far this approach can be
extended to other series.

Table 2.1. Binomial series approximations to $\sqrt{1 + 200/199}$.

   n     sum of first n terms    sum of first n + 1 terms
  100        1.416223987             1.415756552
  200        1.416125419             1.415853117
  300        1.416111363             1.415866832
  400        1.416120069             1.415857961
  500        1.416143716             1.415834169
  600        1.416183194             1.415794514
  700        1.416243295             1.415734170
  800        1.416332488             1.415644629
  900        1.416464086             1.415512518
 1000        1.416658495             1.415317346
 1500        1.420454325             1.411505919
 2000        1.451536959             1.380289393
 2500        1.727776909             1.102822490
 3000        4.323452545            −1.504623925

Lagrange’s Remainder
Joseph Louis Lagrange (1736–1813) was born in Turin, Italy under the name Giuseppe
Lodovico Lagrangia. When he published his first mathematics in 1754, he signed it Luigi De
la Grange Tournier (the final name being a reference to his native city). Shortly thereafter,
he adopted the French form of his name, Joseph Louis. Like many people of his time, he was
not consistent in his signature. “J. L. de la Grange” was the most common. It was only after
his death that “Joseph Louis Lagrange” became the common spelling. His mathematical
reputation was established at an early age, and he was a frequent correspondent of Euler
and d’Alembert. In 1766 he succeeded Euler at the Berlin Academy, and in 1787 he moved
to Paris. He was clearly the dominant member of the committee that met, probably in early
1808, to reject Fourier’s treatise on the propagation of heat.
In the revised edition of Théorie des fonctions analytiques published in the year of his
death, Lagrange gives a means of estimating the size of the error that is introduced when
any partial sum of a Taylor series is used to approximate the value of the original function.
In other words, he finds a way of bounding the difference between the partial sums of
the Taylor series and the function value that it approaches. This is exactly what we will
need in order to prove that Taylor series satisfy the Archimedean understanding of infinite
series. Proving Lagrange’s Remainder Theorem will be one of the chief goals of the next
chapter.

Theorem 2.1 (Lagrange’s Remainder Theorem). Given a function f for which all
derivatives exist at x = a, let $D_n(a, x)$ denote the difference between the nth partial
sum of the Taylor series for f expanded about x = a and the target value f(x),
\[ D_n(a,x) = f(x) - \left( f(a) + f'(a)\,(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(x-a)^{n-1} \right). \tag{2.55} \]
There is at least one real number c strictly between a and x for which
\[ D_n(a,x) = \frac{f^{(n)}(c)}{n!}(x-a)^n. \tag{2.56} \]

While we do not know the value of c, the fact that it lies between a and x is often enough
to be able to bound the size of the error.

Web Resource: To explore the behavior of this difference function, go to Explorations of Lagrange’s Remainder.

The Exponential, Sine, and Cosine Functions


The exponential, sine, and cosine functions all have particularly nice derivatives which,
when evaluated at x = 0 always yield 1, −1, or 0, giving us simple Taylor series:
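\begin{align}
e^x &= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots, \tag{2.57} \\
\sin x &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots, \tag{2.58} \\
\cos x &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots. \tag{2.59}
\end{align}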
For which values of x do the partial sums of these infinite series approach the target value,
that is to say, when do they approach the value of the function at x?
Lagrange’s remainder theorem tells us that the difference between the partial sum that
ends with the term involving $x^{n-1}$ and the target value is bounded by
\[ \left|\frac{f^{(n)}(c)}{n!}\right| |x|^n, \]
for some c between 0 and x. For the exponential function, the nth derivative is $e^x$, and so
the difference is $e^c x^n/n!$. The absolute value of this is bounded by
\[ \frac{e^x x^n}{n!} \ \text{ for } x > 0, \qquad \frac{|x|^n}{n!} \ \text{ for } x < 0. \]
The situation is even simpler for the sine and cosine. Since the nth derivative is always a
sine or cosine function, we know that $|f^{(n)}(c)| \le 1$ no matter what the value of c, and so
the difference is bounded by $|x|^n/n!$.
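
As a check on these bounds, the following Python sketch compares the actual difference for the exponential function with the bound just derived, and also recovers the value of c promised by Lagrange’s remainder theorem by solving $D_n(0,x) = e^c x^n/n!$ for c. The function names are ours, and math.exp is taken as the reference value of $e^x$.

```python
import math

def exp_partial_sum(x, n):
    """Sum 1 + x + ... + x^(n-1)/(n-1)!, the partial sum ending at x^(n-1)."""
    return math.fsum(x ** k / math.factorial(k) for k in range(n))

def exp_remainder_bound(x, n):
    """Bound on |D_n(0, x)| for e^x: e^x x^n/n! if x > 0, |x|^n/n! if x < 0."""
    worst = math.exp(x) if x > 0 else 1.0
    return worst * abs(x) ** n / math.factorial(n)

def implied_c(x, n):
    """Solve D_n(0, x) = e^c x^n / n! for c (x must be nonzero)."""
    d = math.exp(x) - exp_partial_sum(x, n)
    return math.log(d * math.factorial(n) / x ** n)

if __name__ == "__main__":
    x = 3.0
    for n in (5, 10, 15, 20):
        d = math.exp(x) - exp_partial_sum(x, n)
        print(f"n = {n:2d}  |D_n| = {abs(d):.3e}  bound = {exp_remainder_bound(x, n):.3e}  c = {implied_c(x, n):.4f}")
```

For larger n the difference becomes comparable to floating-point rounding error, so the recovered c is only meaningful while the difference is well above that level.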

What happens to $|x|^n/n!$ as n gets large? Stirling’s formula gives us an easy answer.

Stirling’s Formula

The factorial function n! is well approximated by the function $(n/e)^n\sqrt{2\pi n}$. Specifically,
we have that
\[ \lim_{n\to\infty} \frac{n!}{(n/e)^n\sqrt{2\pi n}} = 1. \tag{2.60} \]

For a proof of Stirling’s formula and for information on the accuracy of this
approximation, go to Appendix A.4, The Size of n!.

With Stirling’s formula on hand, we see that for any real number x, the difference
between the exponential, sine, or cosine function and the partial sum of its Taylor series
can be made arbitrarily small by taking n sufficiently large,
 
\[ \lim_{n\to\infty} \frac{|x|^n}{n!} = \lim_{n\to\infty} \left(\frac{e|x|}{n}\right)^{n} \frac{1}{\sqrt{2\pi n}} = 0. \]
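
Both facts are easy to watch numerically. The Python sketch below prints the ratio of n! to Stirling’s approximation and the size of $|x|^n/n!$ for a sample value x = 10; the function names and the sample value are our own choices for illustration.

```python
import math

def stirling(n):
    """Stirling's approximation (n/e)^n sqrt(2 pi n)."""
    return (n / math.e) ** n * math.sqrt(2 * math.pi * n)

def power_over_factorial(x, n):
    """The quantity |x|^n / n! that bounds the remainders above."""
    return abs(x) ** n / math.factorial(n)

if __name__ == "__main__":
    for n in (1, 5, 10, 50, 100):
        ratio = math.factorial(n) / stirling(n)
        print(f"n = {n:3d}  n!/Stirling = {ratio:.6f}  10^n/n! = {power_over_factorial(10.0, n):.3e}")
```

Note that $10^n/n!$ first grows, reaching its largest values near n = 10, before the factorial takes over.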

Lagrange and the Binomial Series


Lagrange’s remainder enables us to answer three questions left open by d’Alembert’s
analysis of the binomial series:
a. What happens when x = 1?
b. If the series converges, how many terms must we take in order to obtain the desired
degree of accuracy?
c. If the series diverges, how accurate can we be?
To simplify our calculations, we shall restrict our attention to Newton’s original
expansion:
\[ \sqrt{1+x} = 1 + (1/2)\,x + \frac{(1/2)(-1/2)}{2!}x^2 + \cdots. \]
If we take the partial sum up to
\[ \frac{(1/2)(-1/2)\cdots(-n+1+3/2)}{(n-1)!}x^{n-1}, \]
then the difference between this partial sum and $\sqrt{1+x}$ is
\[ D_n(0,x) = \frac{(1/2)(-1/2)\cdots(-n+3/2)\,(1+c)^{-n+1/2}}{n!}x^n. \tag{2.61} \]

For x > 0, we find an upper bound on the absolute value of $D_n(0, x)$ by taking c = 0.
The error that is introduced by using the polynomial approximation of degree n − 1 is
bounded by
\[ |D_n(0,x)| \le \left|\frac{(1/2)(-1/2)\cdots(-n+3/2)}{n!}\right| x^n = \frac{1\cdot 3\cdot 5\cdots(2n-3)}{2\cdot 4\cdot 6\cdots(2n)}\,|x|^n. \tag{2.62} \]

FIGURE 2.6. Plot of $|D_6(0, x)/x^6|$ for 0 ≤ x ≤ 1. Note that it stays well below $6^{-3/2}\pi^{-1/2} \approx 0.0384$.

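Using Wallis’s inequality (2.18) on page 24, we have that
\[ \frac{1\cdot 3\cdots(2n-1)}{2\cdot 4\cdots(2n-2)\sqrt{2n}} < \sqrt{\frac{2}{\pi}}, \tag{2.63} \]
and so (Figure 2.6)
\begin{align}
|D_n(0,x)| &\le \frac{1\cdot 3\cdot 5\cdots(2n-3)}{2\cdot 4\cdot 6\cdots(2n)}\,|x|^n \notag \\
&< \frac{\sqrt{2/\pi}}{(2n-1)\sqrt{2n}}\,|x|^n \notag \\
&< |x|^n\, n^{-3/2}\, \pi^{-1/2}. \tag{2.64}
\end{align}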

When x = 1, the error term does approach zero as n gets larger. Given |x| < 1 and a
limit on the size of the allowable error, inequality (2.64) can be used to see how large n
must be. If |x| is larger than 1, then the error will eventually grow without bound. For such
x, the bound in (2.64) is minimized when
\[ n = \frac{3}{2\ln|x|}. \]
If x = 200/199, we want to choose n = 300. The resulting approximation will be within
\[ \frac{(200/199)^{300}}{300^{3/2}\sqrt{\pi}} \approx 4.88 \times 10^{-4}. \]
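
The location of the minimum comes from setting the derivative of $n\ln|x| - \frac{3}{2}\ln n$ with respect to n equal to zero, which gives $\ln|x| = 3/(2n)$. The Python sketch below evaluates the bound from (2.64) and compares this continuous minimizer with a direct search over integers. The function names and the search range 1 to 2000 are our own assumptions for illustration.

```python
import math

def error_bound(x, n):
    """Right side of (2.64): |x|^n n^(-3/2) pi^(-1/2)."""
    return abs(x) ** n / (n ** 1.5 * math.sqrt(math.pi))

def continuous_minimizer(x):
    """Zero of d/dn [n ln|x| - (3/2) ln n]; meaningful only for |x| > 1."""
    return 3 / (2 * math.log(abs(x)))

x = 200 / 199
# Direct search over a range of integers wide enough to contain the minimum
best_n = min(range(1, 2001), key=lambda n: error_bound(x, n))

if __name__ == "__main__":
    print("continuous minimizer:", continuous_minimizer(x))
    print("best integer n:", best_n)
    print("bound at n = 300:", error_bound(x, 300))
```

The bound is very flat near its minimum, so rounding the minimizer to n = 300 costs almost nothing.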

The True Significance


The Lagrange remainder for the Taylor series is more than a tool for estimating errors. It
makes precise the difference between the partial sum which is a polynomial and the target
function that this polynomial approximates. This precision will come to play a critical role
as we try to pin down the reasons why certain series behave well while others must be
treated with great care.

Exercises

The symbol (M&M) indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

2.5.1. (M&M) Find the first five nonzero terms in the power series for each of Gregory’s
functions.
a. tan x
b. arctan x
c. sec x
d. ln(sec x)
e. $\sec^{-1}(\sqrt{2}\,e^x)$ ($\sec^{-1}$ is the arc secant)
f. ln[tan(x/2 + π/4)]
g. 2 arctan(tanh x/2) [$\tanh x = (e^x - e^{-x})/(e^x + e^{-x})$ is the hyperbolic tangent]

2.5.2. (M&M) For each problem in exercise 2.5.1, graph the given function and compare
it to the graph of the first five terms of its power series. For what values of x do you have a
good approximation?

2.5.3. Prove Bernoulli’s identity, equation (2.48), by using repeated integration by parts:
\begin{align*}
\int_0^x f(t)\,dt &= x f(x) - \int_0^x t f'(t)\,dt \\
&= x f(x) - \frac{x^2}{2} f'(x) + \int_0^x \frac{t^2}{2} f''(t)\,dt \\
&= \cdots.
\end{align*}

2.5.4. Use Lagrange’s remainder theorem to determine the number of terms of the partial
sum for the power series expansion of the exponential, sine, or cosine function that are
needed in order to guarantee that the partial sum is within 1/100 of each of the following
target values.
a. $e^3$
b. $e^{10}$
c. $\sin(\pi/4)$
d. $\sin(2\pi/3)$
e. $\cos(\pi/2)$

2.5.5. Show that Taylor’s series implies Bernoulli’s identity by first using Taylor’s series
to prove that
\begin{align*}
f(x) - f(0) &= f'(0)\,x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots, \\
f'(x) - f'(0) &= f''(0)\,x + \frac{f'''(0)}{2!}x^2 + \frac{f^{(4)}(0)}{3!}x^3 + \cdots, \\
f''(x) - f''(0) &= f'''(0)\,x + \frac{f^{(4)}(0)}{2!}x^2 + \frac{f^{(5)}(0)}{3!}x^3 + \cdots, \\
f'''(x) - f'''(0) &= f^{(4)}(0)\,x + \frac{f^{(5)}(0)}{2!}x^2 + \frac{f^{(6)}(0)}{3!}x^3 + \cdots, \\
&\ \ \vdots
\end{align*}

Now eliminate $f'(0), f''(0), f'''(0), \ldots$ to obtain
\[ f(x) = f(0) + f'(x)\,x - \frac{f''(x)}{2!}x^2 + \frac{f'''(x)}{3!}x^3 - \frac{f^{(4)}(x)}{4!}x^4 + \cdots. \]

2.5.6. Use Taylor series to find the power series for (1 + x)α . What happens when α is a
positive integer? What happens when α = 0?

2.5.7. (M&M) Consider the binomial series for the reciprocal of the square root,
\[ (1+x)^{-1/2} = 1 + (-1/2)\,x + \frac{(-1/2)(-3/2)}{2!}x^2 + \frac{(-1/2)(-3/2)(-5/2)}{3!}x^3 + \cdots. \tag{2.65} \]

Calculate the partial sums as n goes from 100 to 1000 in steps of 100 when x = 200/199.
Describe what you see.

2.5.8. (M&M) Calculate the partial sum of the series in equation (2.65) as n goes from
100 to 1000 in steps of 100 when x = 1 and when x = −1. Describe what you see.
Make a guess of whether or not this series converges for these values of x. Explain your
reasoning.

2.5.9. Find a good bound for the absolute value of the Lagrange remainder for the series
in equation (2.65). What happens to this bound when |x| = 1 and n gets large? What can
you say about the convergence of this series at x = ±1?

2.5.10. For the series in equation (2.65) with x = 1/2, how many terms in the partial sum
are needed in order to guarantee that the partial sum is within 0.01 of the target value?
Use the Lagrange remainder to answer this question and show the work that leads to your
answer.

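2.5.11. Prove that
\[ \left(1-\frac{3}{2n}\right)\frac{200}{199} > 1 \quad\text{if and only if}\quad n > 300. \]
In general, prove that when n ≥ 1 + α > 0, then
\[ \left|\frac{(\alpha-n+1)\,x}{n}\right| > 1 \]
if and only if |x| > 1 and
\[ n > \frac{1+\alpha}{1-|x|^{-1}}. \]
What happens if 1 + α ≤ 0?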
 
2.5.12. (M&M) Calculate the partial sum for $\sqrt{1 + (200/199)}$:
\[ 1 + (1/2)\frac{200}{199} + \cdots + \frac{(1/2)(-1/2)\cdots(-297.5)}{299!}\left(\frac{200}{199}\right)^{299} \]
and compare your result to the calculator value of $\sqrt{399/199}$. Is the error within the
predicted bounds? How close are you to the outer bound?

2.5.13. Find the Lagrange remainder for an approximation to an arbitrary binomial series:
\[ (1+x)^{\alpha} \approx 1 + \alpha x + \frac{\alpha(\alpha-1)}{2!}x^2 + \cdots + \frac{\alpha(\alpha-1)\cdots(\alpha-n+2)}{(n-1)!}x^{n-1}. \]

2.5.14. Simplify the Lagrange remainder of the previous exercise when α = −1. What
happens to this remainder when x = 1 and n increases? Does it approach 0?

2.5.15. What is wrong with the following argument? Using the Lagrange remainder, we
know that

\[ (1+x)^{-1} = 1 - x + x^2 - x^3 + \cdots + (-1)^{n-1}x^{n-1} + D_n(0,x), \]

where $D_n(0, x) = (-1)^n x^n/(1 + c)^{n+1}$ for some c between 0 and x. If x = 1, then c is
positive and the absolute value of the Lagrange remainder is $1/(1 + c)^{n+1}$ which approaches
0 as n increases.

2.5.16. (M&M) Experiment with different values of α between −1 and 1/2 in the Lagrange
remainder for the binomial series (exercise 2.5.13). Does the remainder approach zero when
x = 1 and n increases? Describe and discuss the results of your experiments.

2.5.17. Find the Lagrange form of the remainder for the partial sum approximation to
ln(1 + x):

\[ \ln(1+x) \approx x - \frac{x^2}{2} + \cdots + (-1)^{n-2}\frac{x^{n-1}}{n-1}. \tag{2.66} \]

Use this error bound to prove that when x = 1 the partial sums approach the target value
ln 2 as n increases. How large must n be in order to guarantee that the partial sum is within
0.001 of the target value? Show the work that leads to your answer.

2.5.18. Continuing exercise 2.5.17, find the value of c, 0 ≤ c ≤ 200/199, that maximizes
the absolute value of the Lagrange form of the remainder for the partial sum approximation
to ln(1 + x) given in equation (2.66) and evaluated at x = 200/199. Find the value of n
that minimizes this bound.

2.5.19. Consider Lagrange’s form of the remainder for $D_6(0, x)$ when the function is
$f(x) = \sqrt[3]{1+x} = (1+x)^{1/3}$, x > 0. Find the value of c, 0 ≤ c ≤ x, that maximizes the
absolute value of D6 (0, x). Graph the resulting function of x, and find the largest value of
x for which this bound is less than or equal to 0.5.

2.5.20. (M&M) For the function f(x) = ln(1 + x), graph $x^{-7}$ times each of the two
functions, the greater of which must bound the remainder when n = 6:
\[ y = \frac{f^{(7)}(0)}{7!} \quad\text{and}\quad y = \frac{f^{(7)}(x)}{7!}. \]
Now graph $|D_7(0, x)\, x^{-7}|$, 0 ≤ x ≤ 1, where
\[ D_7(0,x) = \ln(1+x) - \left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \frac{x^6}{6} \right), \]
and see how it compares with these bounding functions.

2.6 Emerging Doubts


Calculus derives its name from its use as a tool of calculation. At its most basic level,
it is a collection of algebraic techniques that yield exact numerical answers to geometric
problems. One does not have to know why it works to use it. But the question of why
kept arising, partly because no one could satisfactorily answer it, partly because sometimes
these techniques would fail.
Newton and his successors thought in terms of velocities and rates of change and
talked of fluxions. For Leibniz and his school, the founding concept was the differential,
a small increment that was not zero yet smaller than any positive quantity. Neither of
these approaches is entirely satisfactory. George Berkeley (1685–1753) attacked both
understandings in his classic treatise The Analyst, published in 1734. His point was that a
belief in mechanistic principles of science that could explain everything was self-deception.
Not even the calculus was sufficiently well understood that it could be employed without
reliance on faith.

By moments we are not to understand finite particles. These are said not to be
moments, but quantities generated from moments, which last are only the nascent
principles of finite quantities. It is said that the minutest errors are not to be ne-
glected in mathematics: that the fluxions are celerities, not proportional to the
finite increments, though ever so small; but only to the moments or nascent incre-
ments, whereof the proportion alone, and not the magnitude, is considered . . . It
seems still more difficult to conceive the abstracted velocities of such nascent
imperfect entities. But the velocities of the velocities, the second, third, fourth,
and fifth velocities, &c, exceed, if I mistake not, all human understanding. The
further the mind analyseth and pursueth these fugitive ideas the more it is lost
and bewildered; the objects, at first fleeting and minute, soon vanishing out of
sight. Certainly in any sense, a second or third fluxion seems an obscure mystery.
The incipient celerity of an incipient celerity, the nascent augment of a nascent
augment, i.e., of a thing which hath no magnitude; take it in what light you please,
the clear conception of it will, if I mistake not, be found impossible; whether it
be so or no I appeal to the trial of every thinking reader. And if a second fluxion
be inconceivable, what are we to think of third, fourth, fifth fluxions, and so on
without end!
The foreign mathematicians are supposed by some, even of our own, to proceed
in a manner less accurate, perhaps, and geometrical, yet more intelligible. Instead
of flowing quantities and their fluxions, they consider the variable finite quantities
as increasing or diminishing by the continual addition or subduction of infinitely
small quantities. Instead of the velocities wherewith increments are generated, they
consider the increments or decrements themselves, which they call differences, and
which are supposed to be infinitely small. The difference of a line is an infinitely
little line; of a plane an infinitely little plane. They suppose finite quantities to
consist of parts infinitely little, which by the angles they make one with another
determine the curvity of the line. Now to conceive a quantity infinitely small, that is,
infinitely less than any sensible or imaginable quantity, or than any the least
finite magnitude is, I confess, above my capacity. But to conceive a part of such
infinitely small quantity that shall be still infinitely less than it, and consequently
though multiplied infinitely shall never equal the minutest finite quantity is, I
suspect, an infinite difficulty to any man whatsoever; and will be allowed such by
those who candidly say what they think; provided they really think and reflect,
and do not take things upon trust.

This is only a small piece of Berkeley’s attack, but it illustrates the fundamental weakness
of calculus which is hammered upon in the second paragraph: the need to use infinity without
ever clearly defining what it means. The abuse of infinity has yielded rich rewards, but it is
abuse. Berkeley recognizes this.
No one was prepared to abandon calculus, but the doubts that had been voiced were
unsettling. Many mathematicians tried to answer the question of why it was so successful.
Berkeley himself suggested that there was a system of compensating errors underlying
calculus. Jean Le Rond d’Alembert relied on the notion of limits. In 1784, the Berlin
Academy offered a prize for a “clear and precise theory of what is called the infinite in
mathematics.” They were not entirely satisfied with any of the entrants, although the prize
was awarded to the Swiss mathematician Simon Antoine Jean L’Huillier (1750–1840) who
had adopted d’Alembert’s limits.
To the reader who has seen the derivative and integral defined in terms of limits, it may
seem that d’Alembert and L’Huillier got it right. This was not so clear to their contem-
poraries. In his article of 1754 on the différentiel for Diderot’s Encyclopédie, d’Alembert
speaks of the limit as that number that is approached “as closely as we please” by the slope
of the approximating secant line. We still use this phrase to explain limits, but its meaning
is not entirely clear. Mathematicians of the 18th century were not yet ready to embrace the
full radicalism of the Archimedean understanding, that there is, strictly speaking, no such
thing as an infinite summation or process.

Problems with Infinite Series


If the trouble had only lain in the definition of the derivative and integral, then it would not
have received the attention that it did. Infinite series were also causing misgivings. Euler
worked with divergent series and, as we saw in section 2.1, determined the value from the
genesis of the series. He would assign the value $\sqrt{1 + 200/199}$ to the divergent series
\[ 1 + (1/2)\frac{200}{199} + \frac{(1/2)(1/2-1)}{2!}\left(\frac{200}{199}\right)^2 + \frac{(1/2)(1/2-1)(1/2-2)}{3!}\left(\frac{200}{199}\right)^3 + \cdots \tag{2.67} \]
because it arises from the Taylor series for $\sqrt{1+x}$.
There is a difficulty with this point of view that was exposed by Johann Bernoulli’s son
Daniel (1700–1782) in 1772: different machinery can give rise to the same series with
different values. The alternating series of 1’s and −1’s can arise when x is set equal to 1 in
\[ 1 - x + x^2 - x^3 + x^4 - x^5 + \cdots = \frac{1}{1+x}, \]
or
\[ 1 - x + x^3 - x^4 + x^6 - x^7 + \cdots = (1-x)(1 + x^3 + x^6 + \cdots) = \frac{1-x}{1-x^3} = \frac{1}{1+x+x^2}, \]
or
\[ 1 - x^2 + x^3 - x^5 + x^6 - x^8 + \cdots = (1-x^2)(1 + x^3 + x^6 + \cdots) = \frac{1-x^2}{1-x^3} = \frac{1+x}{1+x+x^2}. \]

The first gives a value of 1/2, the second 1/3, and the last 2/3. We can vary these exponents
to get any rational number between 0 and 1. The same divergent series can have different
values depending upon the context in which it arises. Many mathematicians found this to
be a highly unsatisfactory state of affairs. On the other hand, to simply discard all divergent
series is to lose those, like the error term in Stirling’s formula, that are truly useful.
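
The three values can be checked with a short Python sketch. It evaluates the closed forms exactly at x = 1 using rational arithmetic, and it also sums many terms of each power series at a value of x inside the interval of convergence to confirm that series and closed form agree there. The function names, the sample point x = 0.9, and the number of terms are our own choices for illustration.

```python
from fractions import Fraction

def closed_forms(x):
    """The three closed forms 1/(1+x), 1/(1+x+x^2), (1+x)/(1+x+x^2)."""
    return (
        1 / (1 + x),
        1 / (1 + x + x * x),
        (1 + x) / (1 + x + x * x),
    )

def series_partial_sums(x, blocks):
    """Partial sums of the three power series, grouped as in the text."""
    # First series: 1 - x + x^2 - x^3 + ...
    s1 = sum((-x) ** k for k in range(2 * blocks))
    # Second series: (1 - x)(1 + x^3 + x^6 + ...)
    s2 = sum((1 - x) * x ** (3 * k) for k in range(blocks))
    # Third series: (1 - x^2)(1 + x^3 + x^6 + ...)
    s3 = sum((1 - x * x) * x ** (3 * k) for k in range(blocks))
    return s1, s2, s3

if __name__ == "__main__":
    print("closed forms at x = 1:", closed_forms(Fraction(1)))
    print("series at x = 0.9:    ", series_partial_sums(0.9, 400))
    print("closed forms at 0.9:  ", closed_forms(0.9))
```

Inside the interval of convergence the three series are genuinely different functions of x. It is only when x is set to 1 that all three collapse to the same string of 1’s and −1’s.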

For information on the error term in Stirling’s formula, go to Appendix A.4, The
Size of n!.

The Vibrating String Problem


Fourier’s 1807 paper on the propagation of heat was seen by Lagrange and the other
members of the reviewing committee as another piece in a longstanding controversy within
mathematics. This controversy had begun with the mathematical model of the vibrating
string.
In 1747, d’Alembert published the differential equation governing the height y above
position x at time t of a vibrating string:

\[ \frac{\partial^2 y}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 y}{\partial t^2}, \tag{2.68} \]

where c depends on the length, tension, and mass of the string. To solve this equation, we
need to know boundary conditions. If the ends of the string at x = 0 and l are fixed, then
y(0, t) = y(l, t) = 0. We also need to know the original position of the string: y(x, 0).
The situation is very similar to that of heat propagation. If y(x) can be expressed as a
linear combination of functions of the form sin(πx/l), sin(2πx/l), sin(3πx/l), . . . , for
example

\[ y(x,0) = 3\sin(\pi x/l) - 2\sin(3\pi x/l), \tag{2.69} \]

then the solution to equation (2.68) is

\[ y(x,t) = 3\sin(\pi x/l)\cos(\pi c t/l) - 2\sin(3\pi x/l)\cos(3\pi c t/l). \tag{2.70} \]

It is worth noting that as a function of time, each piece of this solution is periodic. The first
piece has period 2l/c; the second has period 2l/3c. That means that the first piece has a
frequency of c/2l vibrations per unit time; the second has frequency 3c/2l. This explains
the overtones or harmonics of a vibrating string.
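
The claim that equation (2.70) solves equation (2.68) can be checked numerically. The following is only a sketch: the values l = 1 and c = 2, the step size h, and the function names are illustrative assumptions, and central differences merely approximate the second partial derivatives.

```python
import math

def y(x, t, l=1.0, c=2.0):
    """Equation (2.70); l and c are illustrative values, not from the text."""
    return (3 * math.sin(math.pi * x / l) * math.cos(math.pi * c * t / l)
            - 2 * math.sin(3 * math.pi * x / l) * math.cos(3 * math.pi * c * t / l))

def wave_residual(x, t, l=1.0, c=2.0, h=1e-4):
    """Central-difference estimate of y_xx - y_tt / c^2 at the point (x, t)."""
    y_xx = (y(x + h, t, l, c) - 2 * y(x, t, l, c) + y(x - h, t, l, c)) / h**2
    y_tt = (y(x, t + h, l, c) - 2 * y(x, t, l, c) + y(x, t - h, l, c)) / h**2
    return y_xx - y_tt / c**2

# The residual should be close to zero wherever we sample it.
print(wave_residual(0.3, 0.7))
# In time, the whole solution repeats with the period 2l/c of its first piece.
print(y(0.3, 0.2), y(0.3, 0.2 + 2 * 1.0 / 2.0))
```

The second print illustrates the remark about periods: with these values 2l/c = 1, and shifting t by 1 leaves the height unchanged up to rounding.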
Daniel Bernoulli suggested in 1753 that the vibrating string might be capable of infinitely
many harmonics. The most general initial position should be an infinite sum of the form

$$y(x) = a_1 \sin(\pi x/l) + a_2 \sin(2\pi x/l) + a_3 \sin(3\pi x/l) + \cdots. \tag{2.71}$$

Euler rejected this possibility. The reason for his rejection is illuminating. The function in
equation (2.71) is necessarily periodic with period 2l. Bernoulli’s solution cannot handle
an initial position that is not a periodic function of x.
Euler seems particularly obtuse to the modern mathematician. We only need to describe
the initial position between x = 0 and x = l. We do not care whether or not the function
repeats itself outside this interval. But this misses the point of a basic misunderstanding
that was widely shared in the eighteenth century.
For Euler and his contemporaries, a function was an expression: a polynomial, a trigono-
metric function, perhaps one of the more exotic series arising as a solution of a differential
equation. As a function of x, it existed as an organic whole for all values of x. This is not
to imply that it was necessarily well-defined for all values of x, but the values where it
was not well-defined would be part of its intrinsic nature. Euler admitted that one could
chop and splice functions. For example, one might want to consider $y = x^2$ for $x \le 1$ and
$y = 2x - 1$ for $x \ge 1$. But these were two different functions that had been juxtaposed. To
Euler, the shape of a function between 0 and l determined that function everywhere.
Lagrange built on this understanding when he asserted that every function has a power
series representation and that the derivative of f at x = a can be defined as the coefficient
of $(x - a)$ in the expansion of $f(x)$ in powers of $(x - a)$. In other words, he used the Taylor
series for $f$ to define $f'(a), f''(a), f'''(a), \ldots$. As late as 1816, Charles Babbage (1792–1871),
John Herschel (1792–1871), and George Peacock (1791–1858) would champion
Lagrange’s viewpoint. It implies that the values of a function and all of its derivatives at
one point completely determine that function at every value of x.
Lagrange was assuming that all functions, at least all functions worthy of study, possess
infinitely many derivatives and so have power series expansions. We now know that that
is far too limited a view. Lagrange’s functions are now given a special designation. They
are called analytic functions.

Definition: $C^p$ and analytic functions

Given an interval $I$, a function with a continuous first derivative in $I$ is said to belong to
the class $C^1$. If the $p$th derivative exists and is continuous in $I$, the function belongs to
the class $C^p$. If all derivatives exist, the function belongs to $C^\infty$ and is called analytic.

The most revolutionary thing that Fourier accomplished in 1807 was to assert that both
Daniel Bernoulli and Leonhard Euler were right. Any initial position can be expressed as
an infinite sum of the form given in equation (2.71). Fourier showed how to compute the
coefficients. But it is equally true that any function represented by such a trigonometric
expansion is periodic. The implication is that the description of a function between 0 and l
tells us nothing about the function outside this interval.
To Lagrange especially, but probably also to the other members of the committee that
reviewed this manuscript, there had to be a flaw. The easiest way out was to assume
a problem with the convergence of Fourier’s series. In succeeding years, Fourier and
others demonstrated that there was no problem with the convergence. This forced a critical
reevaluation of what was meant by a function, an infinite series, a derivative. As each
object of the structure that is calculus came under scrutiny, it was found to rest on uncertain
foundations that needed to be examined and reconstructed. Above all, it was the notion of
infinity that was in need of correction. It was Augustin Louis Cauchy (1789–1857) who
realized that the only true foundation was a return to the Archimedean understanding. In
the 1820s, he set himself the task of reconstructing calculus upon this bedrock.

FIGURE 2.7. Plot of $y = e^{-1/x^2}$.

Cauchy’s Counterexample
The death knell for Lagrange’s definition of the derivative was sounded by Cauchy in 1821.
He exhibited a counterexample to Lagrange’s assertion that distinct functions have distinct
power series (Figure 2.7):
$$f(x) = e^{-1/x^2}, \qquad f(0) = 0.$$
All of the derivatives of f (x) at x = 0 are equal to 0. At x = 0, this function has the same
power series expansion as the constant function 0. The determination of the derivatives of
f (x) at x = 0 will be demonstrated in section 3.1.

Exercises

2.6.1. Rewrite the series

$$1 - x^2 + x^5 - x^7 + x^{10} - x^{12} + \cdots$$

as a rational function of $x$. Set $x = 1$. What value does this give for the series
$1 - 1 + 1 - 1 + \cdots$?

2.6.2. Find a power series in $x$ that would imply that

$$1 - 1 + 1 - 1 + \cdots = \frac{4}{7}$$

when $x$ is set equal to 1.

2.6.3. Given any nonzero integers $m$ and $n$, find a power series in $x$ that would imply that

$$1 - 1 + 1 - 1 + \cdots = \frac{m}{n}$$

when $x$ is set equal to 1.

2.6.4. All of the derivatives of $e^x$ at $x = 0$ are equal to 1. Find another analytic function
on $\mathbb{R}$, call it $f$, that is not equal to $e^x$ at any $x$ other than $x = 0$, but for which the $n$th
derivative of $f$ at 0, $f^{(n)}(0)$, is equal to 1 for all $n \ge 1$.

2.6.5. Find an analytic function on $\mathbb{R}$, call it $g$, such that $g(0) = 1 = g^{(n)}(0)$ for all $n \ge 1$
and $g(1) = 1$.

3
Differentiability and Continuity

In this chapter we shall compress roughly fifty years of struggle to understand differentia-
bility and continuity. Our goal is to prove Theorem 2.1 on page 44, the Lagrange remainder
theorem, but we cannot do this without first coming to grips with these two concepts.
Our modern interpretations of differentiability and continuity were certainly in the air by
the early 1800s. Fr. Bernhard Bolzano (1781–1848) in Prague described these concepts
in terms that clearly conform to our present definitions although it is uncertain how much
influence he had. Carl Gauss in private notebooks of 1814 showed considerable insight into
these foundational questions, but he never published his results.
Credit for our current interpretation of differentiability and continuity usually goes to
Augustin Louis Cauchy (1789–1857) and the books that he wrote in the 1820s to support the
courses he was teaching, especially his Cours d’analyse de l’École Royale Polytechnique
of 1821. Cauchy was born in the summer of the fall of the Bastille. Laplace and Lagrange
were family friends who admired and encouraged this precocious child. His first job came
in 1810 as a military engineer in Cherbourg preparing for the invasion of England. In the
midst of it, he wrote and published mathematics. By the time he returned to Paris in 1813,
he had made his reputation, amply confirmed in the succeeding years. His contemporaries
described him as bigoted and cold, but he was loyal to his king and his church, and he was
brilliant. He dominated French mathematics through the golden years of the 1820s. In 1830
he left France, following the last Bourbon king of France, Charles X, into exile.
It is a mistake to think that Cauchy got all of his mathematics right the first time. He was
often very confused. We shall spend considerable time on the points that confused him,
precisely because those are still difficulties for those who are first entering this subject.
Nevertheless, Cauchy’s writings were important and influential. He brought foundational
questions to the forefront of mathematical research and discovered many of the definitions
that would make progress possible.

FIGURE 3.1. Illustration of the Mean Value Theorem.

It was not until the 1850s and 1860s that anything like our modern standards of rigor came
into analysis. This was the result of efforts by people such as Karl Weierstrass (1815–1897)
and Bernhard Riemann (1826–1866) who will be introduced in Chapter 5.

3.1 Differentiability
The Lagrange Remainder Theorem is a statement about the difference between the partial
sum of the first n terms of the Taylor series and the target value of that series. We will
prove it by induction on n, which means that we first need to prove the case n = 1:

$$f(x) = f(a) + f'(c)\,(x - a) \tag{3.1}$$

for some real number c strictly between a and x. Equation (3.1) is commonly known as the
Mean Value Theorem.

Theorem 3.1 (Mean Value Theorem). Given a function $f$ that is differentiable at all
points strictly between $a$ and $x$ and continuous at all points on the closed interval from
$a$ to $x$, there exists a real number $c$ strictly between $a$ and $x$ such that

$$\frac{f(x) - f(a)}{x - a} = f'(c). \tag{3.2}$$

This theorem says that the average rate of change of the function f over the interval
from a to x is equal to the instantaneous rate of change at some point between a and x. If
we look at what this means graphically (see Figure 3.1), we see that it makes sense. The
secant line connecting $(a, f(a))$ to $(x, f(x))$ should be parallel to the tangent
at some point between $a$ and $x$. It makes sense, but how do we prove it? How do we know
the right conditions under which this theorem is true? Could we weaken the assumption of
either differentiability or continuity and still get this conclusion? Is this really all we need
to assume?
Cauchy was the first mathematician to fully appreciate the importance of this theorem
and to wrestle with its proof. He was not entirely successful. In the next section, we will
follow his struggles because they are very informative of the complexities of working
with derivatives. Right now we need to address a more basic issue: what do we mean by the
derivative of a function $f$ at $a$?
The standard definition of the derivative given in first-year calculus is

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}, \tag{3.3}$$

where this is understood to mean that $(f(x) - f(a))/(x - a)$ is a pretty good approximation
to $f'(a)$ that gets better as $x$ gets closer to $a$. But what precisely do we mean by a
limit? We can apply the Archimedean understanding to limits.

Definition: Archimedean understanding of limits


When we write any limit statement such as

$$\lim_{x \to a} f(x) = T,$$

what we actually mean is that if we take any number $M > T$, then we can force
$f(x) < M$ by taking $x$ sufficiently close to $a$. Similarly, if we take any $L < T$, then
we can force $f(x) > L$ by taking $x$ sufficiently close to (but not equal to) $a$.

When we apply this strict definition of the limit to the slope of the secant line, we get a
precise definition of the derivative.

Definition: derivative of $f$ at $x = a$
The derivative of $f$ at $a$ is that value, denoted $f'(a)$, such that for any $L < f'(a)$ and
any $M > f'(a)$, we can force

$$L < \frac{f(x) - f(a)}{x - a} < M$$

simply by taking $x$ sufficiently close to (but not equal to) $a$.

For example, if $f(x) = x^3$ and $a = 2$, then

$$\frac{x^3 - 8}{x - 2} = x^2 + 2x + 4.$$

Near $x = 2$, this is an increasing function. Our candidate for $f'(2)$ is 12. Let $L = 11.99$
and $M = 12.01$. If we take $1.999 < x < 2.001$, then

$$\frac{x^3 - 8}{x - 2} > \frac{1.999^3 - 8}{1.999 - 2} = 11.994001 > 11.99 = L,$$

$$\frac{x^3 - 8}{x - 2} < \frac{2.001^3 - 8}{2.001 - 2} = 12.006001 < 12.01 = M.$$
Note that this does not prove that the derivative is 12. We must show that for every possible
pair $(L, M)$ with $L < 12 < M$, there is an interval around $x = 2$ that works.
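
The single pair L = 11.99, M = 12.01 can be spot-checked by machine. This is a minimal sketch, assuming a uniform grid of sample points (an arbitrary choice) and ordinary floating-point arithmetic; like the computation above, it supports the bounds for this one pair and proves nothing about other pairs.

```python
def difference_quotient(x, a=2.0):
    """Average rate of change of f(x) = x**3 between a and x (requires x != a)."""
    return (x**3 - a**3) / (x - a)

L, M = 11.99, 12.01

# Sample 1.999 < x < 2.001 on a uniform grid, skipping the midpoint x = 2,
# where the difference quotient is undefined.
samples = [1.999 + 0.002 * k / 1000 for k in range(1, 1000) if k != 500]
inside = all(L < difference_quotient(x) < M for x in samples)
print(inside)
```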

To accomplish this, it is useful to study the actual difference between the derivative and
the average rate of change:

$$E(x, a) = f'(a) - \frac{f(x) - f(a)}{x - a}. \tag{3.4}$$

The function E(x, a) is the discrepancy or error introduced when we use the average rate
of change in place of the derivative, or vice-versa.
We now present notation that was introduced by Cauchy in 1823. Rather than working
with $L$ and $M$, he used $f'(a) - \epsilon$ and $f'(a) + \epsilon$ where ε is a positive amount. It is the same
idea, and we do not lose anything by insisting that $L$ and $M$ must be the same distance
from $f'(a)$.
Following Cauchy, we let δ > 0 be the distance $x$ is allowed to vary from $a$:
$a - \delta < x < a + \delta$, $x \ne a$. It is important that $x \ne a$ because the slope of the secant line is not
defined at $x = a$. These conditions are neatly summarized by $0 < |x - a| < \delta$.

Definition: Cauchy definition of derivative of $f$ at $x = a$

The derivative of $f$ at $a$ is that value, denoted $f'(a)$, such that for any ε > 0, we have
a response δ > 0 so that if $0 < |x - a| < \delta$, then this forces

$$|E(x, a)| = \left| f'(a) - \frac{f(x) - f(a)}{x - a} \right| < \epsilon.$$

In our example, $f(x) = x^3$ at $x = 2$, we have

$$E(x, 2) = 12 - (x^2 + 2x + 4) = -(x - 2)(x + 4).$$

If we are given any ε > 0 and restrict our $x$ to $2 - \delta < x < 2 + \delta$ where δ is the smaller of
ε/10 and 1, then

$$|E(x, 2)| = |x - 2|\,|x + 4| < \delta(6 + \delta) = 6\delta + \delta^2 \le 7\delta \le \frac{7}{10}\,\epsilon < \epsilon.$$
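
The response "δ is the smaller of ε/10 and 1" can be tried on sample points with exact rational arithmetic. The following is a sketch only: the function names and the number of sample points are assumptions made for illustration, and a finite sample is a sanity check, not a proof.

```python
from fractions import Fraction

def delta_response(eps):
    """The response used in the text: the smaller of eps/10 and 1."""
    return min(eps / 10, Fraction(1))

def error(x):
    """E(x, 2) = 12 - (x**3 - 8)/(x - 2) = -(x - 2)(x + 4), valid for x != 2."""
    return -(x - 2) * (x + 4)

def response_works(eps, samples=200):
    """Check |E(x, 2)| < eps at sample points with 0 < |x - 2| < delta."""
    eps = Fraction(eps)
    delta = delta_response(eps)
    for k in range(1, samples):
        offset = delta * Fraction(k, samples)   # strictly between 0 and delta
        for x in (2 - offset, 2 + offset):
            if abs(error(x)) >= eps:
                return False
    return True

print(response_works(Fraction(1, 100)))   # delta = 1/1000
print(response_works(Fraction(50)))       # delta is capped at 1
```

The cap at 1 matters for large challenges: without it, |x + 4| could not be bounded by 7.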
We note that there is at most one number that can serve as the value of the derivative at
$a$. To prove this, we assume that both α and β will work, α ≠ β. Let

$$E_1(x, a) = \alpha - \frac{f(x) - f(a)}{x - a}$$

and

$$E_2(x, a) = \beta - \frac{f(x) - f(a)}{x - a}.$$

The distance between α and β can be expressed in terms of these error functions:

$$\begin{aligned}
|\beta - \alpha| &= \left| \left( \beta - \frac{f(x) - f(a)}{x - a} \right) - \left( \alpha - \frac{f(x) - f(a)}{x - a} \right) \right| \\
&= |E_2(x, a) - E_1(x, a)| \\
&\le |E_2(x, a)| + |E_1(x, a)|.
\end{aligned}$$

Since we can make the absolute value of each error as small as we want—certainly less
than |β − α|/2—this hands us a contradiction.

Web Resource: To see how this error function plays a role in analyzing the
effectiveness of the Newton–Raphson method for finding roots, go to Newton–Raphson
Method.

Caution!
The definition just given is neither obvious nor easy to absorb. The reader encountering
it for the first time should keep in mind that it is the fruit of two centuries of searching.
It looks deceptively like the casual definition, for it compares $f'(a)$ to the average rate
of change, $(f(x) - f(a))/(x - a)$. There may be a tendency to ignore the ε and δ and
hope that they are not important. They are. This has proven to be the definition of the
derivative that explains those grey areas where differentiation does not seem to be working
the way we expect. Later in this section, we shall use it to explain why it is that sometimes
you can differentiate an infinite series by differentiating each term, and sometimes you
cannot.
The key to this definition is the emphasis on the error or difference between the derivative
and the average rate of change. We have differentiability when this error can be made as
small as desired by tightening up the distance between x and a. We are not just saying that
we can make this error small. It is not enough to show that the absolute value of the error
can be made less than 0.01 or 0.0001 or even $10^{-100}$. We have to be able to get the error
inside any specific positive bound.

ε and δ
I find it useful to think of ε and δ as a two-person game. You play ε; I play δ. The particular
game is specified by a function, for example $f(x) = 3x^2 - 5x + 2$, and a point at which
we want to check for differentiability, perhaps $x = 2$. If this function has a derivative at 2,
the value of that derivative would have to be

$$f'(2) = 6 \cdot 2 - 5 = 7.$$

Our error term is

$$E(x, 2) = 7 - \frac{f(x) - f(2)}{x - 2} = 7 - \frac{3x^2 - 5x - 2}{x - 2} = 7 - (3x + 1) = -3x + 6.$$

You now get to choose any ε. The only constraint is that it must be positive. If you
choose ε = 0.01, then this is saying that you want to see the absolute value of the error,
$|-3x + 6|$, less than 0.01. My challenge is to find a positive distance δ such that if $x$ is
within δ of 2, then the error will satisfy your constraint (see Figure 3.2). I could respond
with δ = 0.001. This means that $1.999 < x < 2.001$, and so

$$-0.003 < E(x, 2) < 0.003.$$

I have met your ε. (Note that I did not have to find the best possible δ, I only had to find
some δ that would make $|E(x, 2)| < 0.01$.)

FIGURE 3.2. ε and δ when $E(x, 2) = -3x + 6$.

But I have not yet won. The game is not over. You are now permitted to reply with a
smaller ε, maybe $\epsilon = 10^{-100}$. I counter with $\delta = 10^{-101}$, and we check that if
$2 - 10^{-101} < x < 2 + 10^{-101}$, then

$$-3 \cdot 10^{-101} < E(x, 2) < 3 \cdot 10^{-101},$$

and so

$$|E(x, 2)| < 3 \cdot 10^{-101} < 10^{-100}.$$

At this point you realize that I always have a comeback. If you propose $\epsilon = 10^{-1000000}$, I can
counter with $\delta = 10^{-1000001}$. When it is recognized that every positive ε can be countered,
then I have won and we declare that $f(x) = 3x^2 - 5x + 2$ is differentiable at 2 and its
derivative is 7.
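
The rounds of this game can be replayed with exact rational numbers, which avoids floating-point underflow at challenges such as 10^-100. This is a sketch under stated assumptions: the function names are hypothetical, the strategy δ = ε/10 is the one read off from the responses above, and the third challenge uses a smaller exponent than the text's 10^-1000000 purely to keep the numbers short.

```python
from fractions import Fraction

def worst_error(delta):
    """Least upper bound of |E(x, 2)| = 3|x - 2| over 0 < |x - 2| < delta."""
    return 3 * delta

def counter(eps):
    """Respond to a challenge eps with delta = eps / 10."""
    return eps / 10

def wins_round(eps):
    """True when the response keeps |E(x, 2)| strictly below eps."""
    return worst_error(counter(eps)) < eps

# Challenges 0.01 and 10^-100 from the text, plus a much smaller one.
challenges = [Fraction(1, 100), Fraction(1, 10**100), Fraction(1, 10**1000)]
results = [wins_round(eps) for eps in challenges]
print(results)
```

Because 3ε/10 < ε for every positive ε, the same comparison succeeds for any challenge, which is the content of "I always have a comeback."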
On the other hand, if you ever succeed in stumping me, then the function is not differentiable
at that point. Let us take the function defined by $f(x) = |x^2 - x/200|$ at $x = 0$ (see
Figure 3.3). We shall try setting $f'(0) = 0$. If this is not right, then I am allowed to change
it because the definition of differentiability only asks that there is some number $f'(0)$ for
which I always have a comeback. The error function is

$$|E(x, 0)| = \left| 0 - \frac{|x^2 - x/200| - 0}{x - 0} \right| = |x - 0.005|.$$

We are ready to play.
Your first challenge is ε = 0.01. I successfully counter with δ = 0.001. If
$-0.001 < x < 0.001$, then $|E(x, 0)| < 0.006 < 0.01$. You return with ε = 0.001. I cannot reply. If I
want

$$|x - 0.005| < 0.001,$$

then $x$ must lie between 0.004 and 0.006. I cannot put $x$ between those limits. I can
only restrict how close $x$ lies to 0. In desperation, I can try a different $f'(0)$. The error
function is

$$|E(x, 0)| = \left| f'(0) - \frac{|x^2 - 0.005x|}{x} \right| =
\begin{cases}
|f'(0) - 0.005 + x|, & 0 < x < 0.005, \\
|f'(0) - x + 0.005|, & x < 0.
\end{cases}$$

To make this error less than 0.001, I would have to find an $f'(0)$ for which $|f'(0) - 0.005|$
and $|f'(0) + 0.005|$ are each less than 0.001. No number $f'(0)$ is going to lie within
ε = 0.001 of both 0.005 and −0.005. I have lost. This function is not differentiable at
$x = 0$.

FIGURE 3.3. Graph of $f(x) = |x^2 - x/200|$.
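
The two competing targets 0.005 and −0.005 show up directly in the one-sided average rates of change. The following sketch, with sample points chosen arbitrarily as powers of ten, is a numerical illustration rather than a proof.

```python
def f(x):
    """f(x) = |x^2 - x/200|, the function of Figure 3.3."""
    return abs(x * x - x / 200)

def quotient(x):
    """Average rate of change of f between 0 and x (requires x != 0)."""
    return (f(x) - f(0.0)) / x

# Approach 0 from the right and from the left along x = +/- 10^-k.
right = [quotient(10.0 ** -k) for k in range(3, 9)]
left = [quotient(-(10.0 ** -k)) for k in range(3, 9)]
print(right[-1], left[-1])
```

From the right the quotients approach 0.005, from the left −0.005, so no single candidate for f′(0) can be within 0.001 of both.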

Derivatives of Sums
The definition of the derivative that we have given is pointless unless we can show that it
tells us something about the derivative that we did not already know. One problem that we
can begin to tackle is that of determining when we are allowed to differentiate an infinite
series by differentiating each term.
We recall Fourier’s series,

$$F(x) = \frac{4}{\pi} \left[ \cos\left(\frac{\pi x}{2}\right) - \frac{1}{3} \cos\left(\frac{3\pi x}{2}\right) + \frac{1}{5} \cos\left(\frac{5\pi x}{2}\right) - \frac{1}{7} \cos\left(\frac{7\pi x}{2}\right) + \cdots \right],$$

which is equal to 1 for $-1 < x < 1$. Anywhere between −1 and 1, the derivative of this
series is 0,

$$F'(x) = 0, \qquad -1 < x < 1.$$

However, if we try to differentiate each term, we obtain the series

$$G(x) = -2 \left[ \sin\left(\frac{\pi x}{2}\right) - \sin\left(\frac{3\pi x}{2}\right) + \sin\left(\frac{5\pi x}{2}\right) - \sin\left(\frac{7\pi x}{2}\right) + \cdots \right].$$

For $-1 < x < 1$, this series does not converge unless $x = 0$. For example, at $x = 0.5$ we
have

$$G(0.5) = -\sqrt{2}\, [1 - 1 - 1 + 1 + 1 - 1 - 1 + 1 + \cdots].$$

The conclusion is that $G$ is not the derivative of $F$.
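
The contrast between the two series is easy to see in their partial sums. This is a minimal sketch; the function names and the numbers of terms are illustrative assumptions.

```python
import math

def F_partial(x, n_terms):
    """Sum of the first n_terms terms of Fourier's cosine series F."""
    total = 0.0
    for k in range(n_terms):
        m = 2 * k + 1
        total += (-1) ** k * math.cos(m * math.pi * x / 2) / m
    return 4 / math.pi * total

def G_partial(x, n_terms):
    """Sum of the first n_terms terms of the term-by-term derivative G."""
    total = 0.0
    for k in range(n_terms):
        m = 2 * k + 1
        total += (-1) ** k * math.sin(m * math.pi * x / 2)
    return -2 * total

# Partial sums of F at x = 0.5 settle near 1.
print(F_partial(0.5, 2000))
# Partial sums of G at x = 0.5 cycle through -sqrt(2), 0, sqrt(2), 0, ...
print([round(G_partial(0.5, n), 6) for n in range(1, 9)])
```

The partial sums of G never settle, which is the numerical face of the statement that the differentiated series does not converge at x = 0.5.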
To understand the difference between differentiating a finite sum and an infinite sum,
we shall begin by proving that if $f$ and $g$ are both differentiable at $x = a$, then so is $f + g$
and the derivative of this sum at $x = a$ is $f'(a) + g'(a)$.
Since $f$ and $g$ are both differentiable at $a$, we know that we have two error functions
satisfying

$$f'(a) = \frac{f(x) - f(a)}{x - a} + E_1(x, a), \tag{3.5}$$

$$g'(a) = \frac{g(x) - g(a)}{x - a} + E_2(x, a), \tag{3.6}$$

where $|E_1|$ and $|E_2|$ can each be made as small as we wish by taking $x$ sufficiently close to
$a$. We now define the error function for $f + g$:

$$\begin{aligned}
E(x, a) &= f'(a) + g'(a) - \frac{f(x) + g(x) - f(a) - g(a)}{x - a} \\
&= f'(a) - \frac{f(x) - f(a)}{x - a} + g'(a) - \frac{g(x) - g(a)}{x - a} \\
&= E_1(x, a) + E_2(x, a).
\end{aligned} \tag{3.7}$$

The error of the sum is the sum of the errors.
We have to show that we shall always win the ε–δ game for the error function $E(x, a)$.
We know that it is winnable for $E_1$ and $E_2$ separately. Someone serves us an ε > 0. We
give half of it to $E_1$ and half to $E_2$. In other words, we find a distance $\delta_1$ that guarantees
that

$$|E_1(x, a)| < \frac{\epsilon}{2},$$

and another distance $\delta_2$ that guarantees that

$$|E_2(x, a)| < \frac{\epsilon}{2}.$$

We choose whichever distance is smaller and call it δ. If any restriction on the distance
works, then any tighter restriction will also work. If $0 < |x - a| < \delta$, then both of the
individual errors are less than ε/2. It follows that

$$|E(x, a)| = |E_1(x, a) + E_2(x, a)| \le |E_1(x, a)| + |E_2(x, a)| < \epsilon. \tag{3.8}$$

We always have a comeback. The sum $f + g$ is differentiable at $x = a$ and the derivative
is $f'(a) + g'(a)$.

What Goes Wrong with Infinite Series?


If we want to prove the differentiability of the sum of three functions, the same argument
shows us that the error function for the sum is the sum of the three error functions, and we
allot to each of them a third of our error bound ε. With a sum of four functions, each error
gets a quarter of ε. How do we allocate the error bound if we have an infinite sum?
It can be done. Let us assume that we have

$$F(x) = f_1(x) + f_2(x) + f_3(x) + \cdots,$$

and that each function in the summation is differentiable at $x = a$:

$$f_n'(a) = \frac{f_n(x) - f_n(a)}{x - a} + E_n(x, a), \tag{3.9}$$

where we can make $|E_n(x, a)|$ arbitrarily small by taking $x$ sufficiently close to $a$. Given
a total error bound of ε, we can allocate ε/2 to the first error, ε/4 to the second, ε/8 to
the third, ε/16 to the fourth, and so on. Each function has a response: $\delta_1, \delta_2, \delta_3, \ldots$. Our
difficulty arises when we try to find a single positive δ that is less than or equal to each of
these. If there were finitely many δs we could do it. But there are infinitely many, and there
is no guarantee that they do not get arbitrarily close to 0. In other words, there may be no
positive number that is less than or equal to all of our δs.
As we shall see, another way of looking at this is to sum both sides of equation (3.9):

$$\begin{aligned}
\sum_{n=1}^{\infty} f_n'(a) &= \sum_{n=1}^{\infty} \frac{f_n(x) - f_n(a)}{x - a} + \sum_{n=1}^{\infty} E_n(x, a) \\
&= \frac{F(x) - F(a)}{x - a} + \sum_{n=1}^{\infty} E_n(x, a).
\end{aligned} \tag{3.10}$$

The derivative of $F = \sum_{n=1}^{\infty} f_n$ at $a$ is $\sum_{n=1}^{\infty} f_n'(a)$ if and only if

$$E(x, a) = \sum_{n=1}^{\infty} E_n(x, a)$$

can be made arbitrarily close to 0 by taking $x$ sufficiently close to $a$. Sometimes it can,
and sometimes it cannot. An infinite sum of very small numbers might be very small or
very large. When we learn more about infinite series, we shall see some useful criteria for
determining when the derivative of an infinite series is the infinite series of the derivatives.

Strange Derivatives: $x^2 \sin(x^{-2})$ and $e^{-1/x^2}$

Our ε–δ definition of the derivative enables us to demonstrate the existence of a derivative in
certain cases where a simple application of the techniques of differentiation would suggest
that no derivative exists. One example is the function defined by (see Figure 3.4)

$$f(x) = x^2 \sin(x^{-2}), \quad x \ne 0; \qquad f(0) = 0. \tag{3.11}$$


FIGURE 3.4. Graph of $f(x) = x^2 \sin(x^{-2})$.

When $x$ is not 0, we can use our standard algorithms to find the derivative:

$$f'(x) = 2x \sin(x^{-2}) - 2x^{-1} \cos(x^{-2}), \quad x \ne 0$$

(Figure 3.5). Since $x$ cannot be set equal to 0 in the expression on the right, there is
a common misconception that $f'(0)$ does not exist. But if we look at the definition
of the derivative, we see that if we try setting $f'(0) = 0$, then this is the correct value
provided

$$|E(x, 0)| = \left| 0 - \frac{x^2 \sin(x^{-2}) - 0}{x - 0} \right| = \left| -x \sin(x^{-2}) \right| \tag{3.12}$$

can be made less than any specified ε by restricting $x$ to within some δ of 0. Since
$|\sin(x^{-2})| \le 1$, we see that

$$|E(x, 0)| \le |x|, \tag{3.13}$$

and therefore given a bound ε > 0, we can always reply with δ = ε. If $|x| < \delta = \epsilon$, then
$|E(x, 0)| \le |x| < \epsilon$. There is a response that forces $|E(x, 0)|$ to be less than ε, and therefore
the derivative of $f$ at $x = 0$ is zero.
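
Both halves of this example can be seen numerically: the error with candidate derivative 0 stays within |x|, while the formula for f′(x) away from 0 takes large values arbitrarily close to 0. The sketch below assumes floating-point arithmetic, an arbitrary set of sample points, and a small tolerance for rounding; the points x_n = 1/√(2πn), where cos(x^-2) = 1, are chosen only for illustration.

```python
import math

def f(x):
    """f(x) = x^2 sin(x^-2) for x != 0, with f(0) = 0, as in equation (3.11)."""
    return x * x * math.sin(x ** -2) if x != 0 else 0.0

def error(x):
    """E(x, 0) for the candidate value f'(0) = 0 (requires x != 0)."""
    return 0.0 - (f(x) - f(0.0)) / x

def formula_derivative(x):
    """The derivative obtained from the standard rules, valid only for x != 0."""
    return 2 * x * math.sin(x ** -2) - (2 / x) * math.cos(x ** -2)

# |E(x, 0)| <= |x| at sample points on both sides of 0 (tolerance for rounding).
points = [s * 10.0 ** -k for k in range(1, 7) for s in (1, -1)]
bounded = all(abs(error(x)) <= abs(x) * (1 + 1e-12) for x in points)

# At x_n = 1/sqrt(2*pi*n) the formula gives roughly -2*sqrt(2*pi*n).
x_n = 1 / math.sqrt(2 * math.pi * 10_000)
print(bounded, formula_derivative(x_n))
```

So the derivative exists at 0 even though the values of f′(x) nearby are unbounded.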
Our second example is the function from section 2.6:

$$g(x) = e^{-1/x^2}, \quad x \ne 0; \qquad g(0) = 0. \tag{3.14}$$

FIGURE 3.5. Graph of $f'(x) = 2x \sin(x^{-2}) - 2x^{-1} \cos(x^{-2})$.

We claimed that not only is $g'(0) = 0$, but all derivatives of $g$ at $x = 0$ are zero: $g^{(n)}(0) = 0$.
When $x \ne 0$, the derivatives of $g$ are given by

$$\begin{aligned}
g'(x) &= 2x^{-3}\, e^{-1/x^2}, \\
g''(x) &= (4x^{-6} - 6x^{-4})\, e^{-1/x^2}, \\
g'''(x) &= (8x^{-9} - 36x^{-7} + 24x^{-5})\, e^{-1/x^2}, \\
&\;\;\vdots
\end{aligned}$$

In general, $g^{(n)}(x)$ will be a polynomial in $x^{-1}$ multiplied by $e^{-1/x^2}$ (see exercise 3.1.21).
We write this as

$$g^{(n)}(x) = P_n(x^{-1})\, e^{-1/x^2}, \quad x \ne 0. \tag{3.15}$$

We first prove that $g'(0) = 0$. Our error function is given by

$$|E_1(x, 0)| = \left| 0 - \frac{e^{-1/x^2} - 0}{x - 0} \right| = \left| x^{-1} e^{-1/x^2} \right|. \tag{3.16}$$

It follows from equation (2.57) on page 44 that if $z$ is positive, then $e^z$ is strictly larger than
$1 + z$. Therefore, for $x \ne 0$:

$$x^2 e^{1/x^2} > x^2 \left( 1 + \frac{1}{x^2} \right) = x^2 + 1 > 1, \tag{3.17}$$

and so

$$|x| > \frac{1}{|x|\, e^{1/x^2}} = \left| x^{-1} e^{-1/x^2} \right|. \tag{3.18}$$

Given an error bound ε, we can respond with δ = ε. If $|x| < \epsilon$, then

$$|E_1(x, 0)| = \left| x^{-1} e^{-1/x^2} \right| < |x| < \epsilon.$$

We now move to the higher derivatives of $g$. If we have shown that $g^{(n)}(0) = 0$, then to
prove that $g^{(n+1)}(0)$ is also 0, we need to demonstrate that we can force

$$E_{n+1}(x, 0) = 0 - \frac{P_n(x^{-1})\, e^{-1/x^2} - 0}{x - 0} = -x^{-1} P_n(x^{-1})\, e^{-1/x^2}$$

to be arbitrarily close to 0 by keeping $x$ sufficiently close to 0. We know that $x^{-1} P_n(x^{-1})$ is
a finite sum of powers of $x^{-1}$. It is enough to show that for any positive integer $k$, $x^{-k} e^{-1/x^2}$
can be forced arbitrarily close to 0 by keeping $x$ sufficiently close to 0.
We choose an integer $j$ so that $k + 1 - 2j \le -1$ and then restrict our choices of δ to
$\delta \le 1/j!$. It follows that $|x| < \delta$ implies that $|x|^{-1} > j!$. We know that when $z$ is positive,
$e^z$ is strictly larger than $1 + z + z^2/2! + \cdots + z^j/j!$. Since $|x| < 1/j! < 1$, we have that

$$\begin{aligned}
|x|^{k+1} e^{1/x^2} &> |x|^{k+1} \left( 1 + \frac{1}{x^2} + \frac{1}{2x^4} + \cdots + \frac{1}{j!\, x^{2j}} \right) \\
&> \frac{|x|^{k+1-2j}}{j!} \ge \frac{|x|^{-1}}{j!} > 1.
\end{aligned} \tag{3.19}$$

It follows that

$$|x| > \left| x^{-k} e^{-1/x^2} \right|. \tag{3.20}$$

Given the challenge ε, we respond with δ equal to either ε or $1/j!$, whichever is smaller.
We have proven that every derivative of $g$ at $x = 0$ is zero, and therefore the Taylor
series for $g(x) = e^{-1/x^2}$, expanded about $a = 0$, is the same as the Taylor series for the
constant function 0.
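
Inequality (3.20) can be spot-checked for small values of k. The following is a sketch only: the helper names, the sample grid, and the range of k are assumptions, and for very small x the exponential underflows to 0.0 in floating point, which is consistent with the inequality but is not evidence beyond it.

```python
import math

def choose_j(k):
    """Smallest integer j with k + 1 - 2j <= -1, that is, j >= (k + 2)/2."""
    return (k + 3) // 2

def term(x, k):
    """|x^(-k) e^(-1/x^2)| for x != 0."""
    return abs(x) ** (-k) * math.exp(-1.0 / (x * x))

def bound_holds(k, samples=50):
    """Check |x^(-k) e^(-1/x^2)| < |x| at sample points with 0 < |x| < 1/j!."""
    delta = 1.0 / math.factorial(choose_j(k))
    for i in range(1, samples):
        x = delta * i / samples
        if not (term(x, k) < x and term(-x, k) < x):
            return False
    return True

print([choose_j(k) for k in (1, 2, 3, 4)])
print(all(bound_holds(k) for k in range(1, 11)))
print(term(0.3, 1))   # already tiny well away from 0
```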

Exercises

3.1.1. You are the teacher in a calculus class. You give a quiz in which one of the questions
is, “Find the derivative of $f(x) = x^2$ at $x = 3$.” One of your students writes

$$f(3) = 3^2 = 9, \qquad \frac{d}{dx}\, 9 = 0.$$

Write a response to this student.

3.1.2. Find the derivatives (where they exist) of the following functions. The function
denoted by $\lfloor x \rfloor$ sends $x$ to the greatest integer less than or equal to $x$.
a. $f(x) = x\,|x|$, $x \in \mathbb{R}$
b. $f(x) = |x|$, $x \in \mathbb{R}$
c. $f(x) = \lfloor x \rfloor \sin^2(\pi x)$, $x \in \mathbb{R}$
d. $f(x) = (x - \lfloor x \rfloor) \sin^2(\pi x)$, $x \in \mathbb{R}$
e. $f(x) = \ln |x|$, $x \in \mathbb{R} \setminus \{0\}$
f. $f(x) = \arccos(1/|x|)$, $|x| > 1$

3.1.3. Find the derivatives of the following functions.
a. $f(x) = \log_x 2$, $x > 0$, $x \ne 1$
b. $f(x) = \log_x(\cos x)$, $0 < x < \pi/2$, $x \ne 1$

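3.1.4. Find the derivatives (where they exist) of the following functions. The signum
function, sgn $x$, is +1 if $x > 0$, −1 if $x < 0$, and 0 if $x = 0$.

a. $f(x) = \begin{cases} \arctan x, & \text{if } |x| \le 1 \\ (\pi/4)\,\operatorname{sgn} x + (x - 1)/2, & \text{if } |x| > 1 \end{cases}$

b. $f(x) = \begin{cases} x^2 e^{-x^2}, & \text{if } |x| \le 1 \\ 1/e, & \text{if } |x| > 1 \end{cases}$

c. $f(x) = \begin{cases} \arctan(1/|x|), & \text{if } x \ne 0 \\ \pi/2, & \text{if } x = 0 \end{cases}$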

3.1.5. Show that the function given by

$$f(x) = \begin{cases} x^2\, |\cos(\pi/x)|, & \text{if } x \ne 0 \\ 0, & \text{if } x = 0 \end{cases}$$

is not differentiable at $x_n = 2/(2n + 1)$, $n \in \mathbb{Z}$, but is differentiable at 0 which is the limit
of this sequence.

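3.1.6. Determine the constants $a$, $b$, $c$, and $d$ so that $f$ is differentiable for all real values
of $x$.

a. $f(x) = \begin{cases} 4x & \text{if } x \le 0 \\ ax^2 + bx + c & \text{if } 0 < x < 1 \\ 3 - 2x & \text{if } x \ge 1 \end{cases}$

b. $f(x) = \begin{cases} ax + b & \text{if } x \le 0 \\ cx^2 + dx & \text{if } 0 < x < 1 \\ 1 - \frac{1}{x} & \text{if } x \ge 1 \end{cases}$

c. $f(x) = \begin{cases} ax + b & \text{if } x \le 1 \\ ax^2 + c & \text{if } 1 < x < 2 \\ \frac{dx^2 + 1}{x} & \text{if } x \ge 2 \end{cases}$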

3.1.7. Let $f(x) = x^2$, $f'(a) = 2a$. Find the error $E(x, a)$ in equation (3.4) in terms of $x$
and $a$. How close must $x$ be to $a$ if $|E(x, a)|$ is to be less than 0.01, less than 0.0001?

3.1.8. Let $f(x) = x^3$. Find the error $E(x, 1)$ in equation (3.4) as a function of $x$. Graph
$E(x, 1)$. How close must $x$ be to 1 if $|E(x, 1)|$ is to be less than 0.01, less than 0.0001?

3.1.9. Let $f(x) = x^3$. Find the error $E(x, 10)$ in equation (3.4) as a function of $x$. Graph
$E(x, 10)$. How close must $x$ be to 10 if $|E(x, 10)|$ is to be less than 0.01, less than
0.0001?

3.1.10. Let $f(x) = \sin x$. Find the error $E(x, \pi/2)$ in equation (3.4) as a function of $x$.
Graph $E(x, \pi/2)$. Find a δ to respond with if you are given ε = 0.1, 0.0001, $10^{-100}$.

3.1.11. Use the definition of differentiability to prove that $f(x) = |x|$ is not differentiable
at $x = 0$, by finding an ε for which there is no δ response. Explain your answer.

3.1.12. Graph the function $f(x) = x \sin(1/x)$ ($f(0) = 0$) for $-2 \le x \le 2$. Prove that $f(x)$
is not differentiable at $x = 0$, by finding an ε for which there is no δ response. Explain your
answer.

3.1.13. Graph the function $f(x) = x^2 \sin(1/x)$ ($f(0) = 0$) for $-2 \le x \le 2$. Use the
definition of differentiability to prove that $f(x)$ is differentiable at $x = 0$. Show
that this derivative cannot be obtained by differentiating $x^2 \sin(1/x)$ and then setting
$x = 0$.

3.1.14. Prove that if $f$ is continuous at $a$ and $\lim_{x \to a} f'(x)$ exists, then so does $f'(a)$ and
they must be equal.

Web Resource: For help with finding and then presenting proofs, go to How to
find and write a proof.

3.1.15. Assume that $f$ and $g$ are differentiable at $a$. Find

a. $\displaystyle \lim_{x \to a} \frac{x f(a) - a f(x)}{x - a}$,

b. $\displaystyle \lim_{x \to a} \frac{f(x) g(a) - f(a) g(x)}{x - a}$.

3.1.16. Let $f$ be differentiable at $x = 0$, $f'(0) \ne 0$. Find

$$\lim_{x \to 0} \frac{f(x) e^x - f(0)}{f(x) \cos x - f(0)}.$$

3.1.17. Let $f$ be differentiable at $a$ with $f(a) > 0$. Find

$$\lim_{x \to a} \left( \frac{f(x)}{f(a)} \right)^{1/(\ln x - \ln a)}.$$

3.1.18. Let $f$ be differentiable at $x = 0$, $f(0) = 0$, and let $k$ be any positive integer. Find
the value of

$$\lim_{x \to 0} \frac{1}{x} \left( f(x) + f\left(\frac{x}{2}\right) + f\left(\frac{x}{3}\right) + \cdots + f\left(\frac{x}{k}\right) \right).$$

3.1.19. Let $f$ be differentiable at $a$ and let $\{x_n\}$ and $\{z_n\}$ be two sequences converging to
$a$ and such that $x_n \ne a$, $z_n \ne a$, $x_n \ne z_n$ for $n \in \mathbb{N}$. Give an example of $f$ for which

$$\lim_{n \to \infty} \frac{f(x_n) - f(z_n)}{x_n - z_n}$$

a. is equal to $f'(a)$,
b. does not exist or exists but is different from $f'(a)$.

3.1.20. Let $f$ be differentiable at $a$ and let $\{x_n\}$ and $\{z_n\}$ be two sequences converging to
$a$ such that $x_n < a < z_n$ for $n \in \mathbb{N}$. Prove that

$$\lim_{n \to \infty} \frac{f(x_n) - f(z_n)}{x_n - z_n} = f'(a).$$

3.1.21. Consider the polynomials $P_n(x)$ defined by equation (3.15) on page 67. We have
seen that

$$\begin{aligned}
P_0(x) &= 1, \\
P_1(x) &= 2x^3, \\
P_2(x) &= 4x^6 - 6x^4, \\
P_3(x) &= 8x^9 - 36x^7 + 24x^5.
\end{aligned}$$

Find $P_4(x)$. Show that

$$P_{n+1}(x) = 2x^3 P_n(x) - x^2 P_n'(x),$$

and therefore $P_n(x)$ is a polynomial of degree $3n$.

3.1.22. Sketch the graph of the function

$$f(x) = \begin{cases} e^{1/(x^2 - 1)} & \text{if } |x| < 1, \\ 0 & \text{if } |x| \ge 1. \end{cases}$$

Show that all derivatives of $f$ exist at $x = \pm 1$.

3.2 Cauchy and the Mean Value Theorems


Cauchy knew that he needed to prove Lagrange’s form of Taylor’s theorem with its explicit
remainder term:

\[ f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!} x^2 + \frac{f'''(0)}{3!} x^3 + \cdots + \frac{f^{(k-1)}(0)}{(k-1)!} x^{k-1} + R_k(x), \tag{3.21} \]

where $R_k(x) = f^{(k)}(c)\, x^k / k!$ for some constant c between 0 and x. If we replace x by
x − a and then shift our function by a, we obtain the equivalent but seemingly more
general equation:
\[
\begin{aligned}
f(x) = {} & f(a) + f'(a)\,(x - a) + \frac{f''(a)}{2!} (x - a)^2 + \frac{f'''(a)}{3!} (x - a)^3 \\
& + \cdots + \frac{f^{(k-1)}(a)}{(k-1)!} (x - a)^{k-1} + R_k(x),
\end{aligned} \tag{3.22}
\]
where $R_k(x) = f^{(k)}(c)\,(x - a)^k / k!$ for some constant c between a and x. He began the
proof of equation (3.22) with the very simplest case, k = 1, the mean value theorem,
Theorem 3.1 on page 58.
Equation (3.2) simply asserts that the average rate of change of f between x = a
and x = b is equal to the instantaneous rate of change at some point between a and b (see
Figure 3.1). Geometrically, this seems obvious. But to actually verify it and thus understand
when it is or is not valid requires a great deal of insight.
None of the early proofs of the mean value theorem was without flaws. The proof that
is found in most textbooks today is a very slick approach that was discovered by Ossian
Bonnet (1819–1892) and first published in Joseph Alfred Serret’s calculus text of 1868,
Cours de Calcul Différentiel et Intégral. It avoids all of the difficulties that we shall be
encountering as we attempt to follow Cauchy. We shall postpone Bonnet’s proof until after
we have discussed continuity and its consequences in section 3.4.

Cauchy’s First Proof of the Mean Value Theorem


The following proof of the mean value theorem, equation (3.2), was given by Cauchy in
his Résumé des Leçons données à l’École Royale Polytechnique sur le calcul infinitésimal
of 1823. We shall run through it and then analyze the difficulties.
We first assume that f is differentiable at every point in the interval [a, b]. The definition
of the derivative given in the last section guarantees that for any given positive error ε, we
can find a distance δ such that if |h| < δ, then

\[ f'(x) - \epsilon < \frac{f(x + h) - f(x)}{h} < f'(x) + \epsilon. \tag{3.23} \]

We partition the interval from a to b into n steps of size h < δ, $x_{i+1} = x_i + h$:

\[ x_0 = a < x_1 < x_2 < x_3 < x_4 < \cdots < x_n = b. \]

We note that nh = b − a.
Applying the inequality (3.23) to each of these intervals, we obtain

\[
\begin{aligned}
f'(x_0) - \epsilon &< \frac{f(x_1) - f(x_0)}{h} < f'(x_0) + \epsilon, \\
f'(x_1) - \epsilon &< \frac{f(x_2) - f(x_1)}{h} < f'(x_1) + \epsilon, \\
f'(x_2) - \epsilon &< \frac{f(x_3) - f(x_2)}{h} < f'(x_2) + \epsilon, \\
&\;\;\vdots \\
f'(x_{n-1}) - \epsilon &< \frac{f(x_n) - f(x_{n-1})}{h} < f'(x_{n-1}) + \epsilon.
\end{aligned} \tag{3.24}
\]
Let A be the minimal value of $f'(x)$ over the interval [a, b] and B the maximal value.
We can replace the left side of each of the lower inequalities by the same bound, A − ε,
and the right side of the upper inequalities by B + ε. Adding the central terms, we see that
\[ n(A - \epsilon) < \frac{f(x_1) - f(x_0)}{h} + \frac{f(x_2) - f(x_1)}{h} + \cdots + \frac{f(x_n) - f(x_{n-1})}{h} < n(B + \epsilon), \tag{3.25} \]

\[ n(A - \epsilon) < \frac{f(x_n) - f(x_0)}{h} < n(B + \epsilon). \tag{3.26} \]
We divide by n and recall that nh = b − a, $x_n = b$, $x_0 = a$:

\[ A - \epsilon < \frac{f(b) - f(a)}{b - a} < B + \epsilon. \tag{3.27} \]

This is true for any positive ε, no matter how small, and so we must have that

\[ A \le \frac{f(b) - f(a)}{b - a} \le B. \tag{3.28} \]
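
The telescoping step that leads from (3.25) to (3.28) can be checked numerically. The following is only a sketch and not part of Cauchy's argument: the choice f(x) = e^x on [0, 1], the step count n = 1000, and the helper name `mean_of_difference_quotients` are assumptions made for illustration. Because the n difference quotients telescope, their average should equal [f(b) − f(a)]/[b − a], and that value should sit between the minimum A and maximum B of the derivative.

```python
import math


def mean_of_difference_quotients(f, a, b, n):
    """Average the n difference quotients (f(x_{i+1}) - f(x_i)) / h
    taken over a uniform partition of [a, b] with step h = (b - a) / n."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    quotients = [(f(xs[i + 1]) - f(xs[i])) / h for i in range(n)]
    return sum(quotients) / n


# Illustrative choice (an assumption): f(x) = e^x, so f'(x) = e^x as well.
a, b, n = 0.0, 1.0, 1000
mean_q = mean_of_difference_quotients(math.exp, a, b, n)
slope = (math.exp(b) - math.exp(a)) / (b - a)

# f' = exp is increasing, so its minimum and maximum on [a, b]
# are attained at the endpoints.
A, B = math.exp(a), math.exp(b)

print(mean_q, slope, A, B)
```

The sketch says nothing about whether a single δ works across the whole interval; it only illustrates the arithmetic of the sum.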

Cauchy now assumes that $f'$ is continuous and he invokes the intermediate value property
of continuous functions. This result will need to be proven, a fact that Cauchy recognized.

Definition: intermediate value property


A function f is said to have the intermediate value property on the interval [a, b] if
given any two points x1 , x2 ∈ [a, b] and any number N satisfying
f (x1 ) < N < f (x2 ),
then there is at least one value c between x1 and x2 for which f (c) = N .

If $f'(x_1) = A$ and $f'(x_2) = B$ and N = [f(b) − f(a)]/[b − a] lies between A and B
(see Figure 3.6), and if $f'(x)$ is continuous, then $f'(x)$ must cross the line y = [f(b) −
f(a)]/[b − a] somewhere between $x_1$ and $x_2$. We let c denote the value of x where this
happens:
\[ \frac{f(b) - f(a)}{b - a} = f'(c). \]
Q.E.D.

The initials Q.E.D. stand for Quod Erat Demonstrandum, a Latin translation of
the phrase ὅπερ ἔδει δεῖξαι with which Euclid concluded most of his proofs. It
means “what was to be proved” and signifies that the proof has been concluded.

The Problems with this Proof


This is an ingenious proof that demonstrates a profound understanding of the derivative,
but it would not pass muster today. We can prove something stronger than Cauchy’s
theorem. Cauchy required that f be differentiable at every point in the interval [a, b].
This is necessary if we want to get all of the inequalities of (3.24). As we shall see,

FIGURE 3.6. Intermediate value property in Cauchy’s proof of the mean value theorem.

Bonnet’s proof works under the slightly weaker assumption that we have differentiability
on the open interval (a, b) and continuity on the closed interval [a, b]. For example,
Cauchy’s proof would not apply to the function x sin(1/x) on [0, 1] (see exercise 3.1.12 of
section 3.1 and exercise 3.2.1 of this section). Bonnet’s would. Furthermore, Cauchy only
proves that c lies somewhere in the interval [a, b]. Bonnet’s proof shows that it must lie
strictly between a and b.
But these are quibbles. The question is not whether we can improve on Cauchy’s state-
ment of the mean value theorem, but whether his proof is valid. There are two questionable
assertions in his proof. The first is that given the error bound ε we can find a δ that works
over the entire interval [a, b]. It is true that at $x_0$ we can find a δ so that if $|x_1 - x_0| < \delta$,
then
\[ f'(x_0) - \epsilon < \frac{f(x_1) - f(x_0)}{x_1 - x_0} < f'(x_0) + \epsilon, \]
but the inequalities of (3.24) assume that the same δ can be used at x1 and x2 and x3 . . . all
the way to xn−1 . This is not always so (see exercise 3.2.2 for a counterexample).
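
A numerical sketch can make the failure of a uniform δ concrete. It uses the function $f(x) = x^2 \sin(x^{-2})$, f(0) = 0, which this section introduces in (3.29), and follows the outline of exercise 3.2.2. The sample points $a = 1/\sqrt{2\pi m}$ and $x = 1/\sqrt{2\pi(m+1)}$ and the helper names `error` and `sample_points` are assumptions chosen for illustration. As m grows, x and a become arbitrarily close, yet the gap between $f'(a)$ and the difference quotient grows like 2/a.

```python
import math


def f(x):
    """f(x) = x^2 sin(x^-2) for x != 0, with f(0) = 0."""
    return x * x * math.sin(x ** -2) if x != 0 else 0.0


def f_prime(x):
    """Derivative of f: 2x sin(x^-2) - (2/x) cos(x^-2) for x != 0, and 0 at x = 0."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(x ** -2) - (2 / x) * math.cos(x ** -2)


def error(x, a):
    """|E(x, a)| = |f'(a) - (f(x) - f(a)) / (x - a)|."""
    return abs(f_prime(a) - (f(x) - f(a)) / (x - a))


def sample_points(m):
    """Hypothetical sample points whose inverse squares are the
    even multiples 2*pi*m and 2*pi*(m + 1) of pi."""
    a = 1 / math.sqrt(2 * math.pi * m)
    x = 1 / math.sqrt(2 * math.pi * (m + 1))
    return x, a


for m in (1, 100, 10_000):
    x, a = sample_points(m)
    # Columns: m, distance between the points, |E(x, a)|, and 2/a.
    print(m, a - x, error(x, a), 2 / a)
```

However small a distance δ is proposed, a large enough m places both points within δ of 0 and of each other while the error exceeds any bound ε < 1.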
The second assumption is that $f'$ actually has an upper bound B and a lower bound A
over the interval [a, b] and that it achieves these bounds. That is to say, we can find $c_1$,
$c_2$ ∈ [a, b] such that

\[ f'(c_1) \le f'(x) \le f'(c_2) \]

for all x ∈ [a, b]. In fact, such bounds do not always exist. Consider the function defined
by

\[ f(x) = x^2 \sin(x^{-2}), \quad x \neq 0; \qquad f(0) = 0. \tag{3.29} \]

We saw this function in the last section (pages 65–66) where it was demonstrated that the
derivative exists at every x in [−1, 1], but the derivative is not bounded over this interval.
Cauchy could have avoided these problems by assuming that the derivative of f is
continuous. When he does invoke continuity to pass from the double inequality (3.28) to the
statement that $[f(b) - f(a)]/[b - a] = f'(c)$ for some c ∈ [a, b], it is no longer needed.
While the derivative is not always continuous, we shall see that it does always possess the
intermediate value property. Part of the confusion that will have to be straightened out in
section 3.3 is the difference between continuity and the intermediate value property.

Cauchy’s Second Attempt


Cauchy gave another proof of the mean value theorem in an appendix to Résumé des
Leçons. He actually proves a far more powerful result. We shall state it in the form in
which we will eventually prove it.

Theorem 3.2 (Generalized Mean Value Theorem). If f and F are both continuous
at every point of [a, b] and differentiable at every point in the open interval (a, b) and
if $F'$ is never zero in this interval, then

\[ \frac{f(b) - f(a)}{F(b) - F(a)} = \frac{f'(c)}{F'(c)} \tag{3.30} \]

for at least one point c, a < c < b.

We note that if F (x) = x, then this becomes the ordinary mean value theorem.
We begin by defining g(x) = f(x) − f(a) and G(x) = F(x) − F(a) so that g(a) = 0,
$g'(x) = f'(x)$, G(a) = 0, and $G'(x) = F'(x)$. We consider the function defined by
\[ \frac{f'(x)}{F'(x)} = \frac{g'(x)}{G'(x)}, \]
and let A be its minimal value over [a, b], B its maximal value:

\[ A \le \frac{g'(x)}{G'(x)} \le B. \tag{3.31} \]

This implies that

\[ g'(x) - A\,G'(x) \ge 0, \tag{3.32} \]

\[ g'(x) - B\,G'(x) \le 0, \tag{3.33} \]

and so g − A G is an increasing function which is equal to g(a) − A G(a) = 0 at x = a,
and g − B G is a decreasing function which is equal to g(a) − B G(a) = 0 at x = a
(Figure 3.7). It follows that

\[ g(x) - A\,G(x) \ge 0, \tag{3.34} \]

\[ g(x) - B\,G(x) \le 0, \tag{3.35} \]

for all x, a ≤ x ≤ b, and therefore

\[ A \le \frac{g(b)}{G(b)} \le B. \tag{3.36} \]

We now assume that $g'/G'$ is a continuous function and so takes on every value between
its minimum A and its maximum B. For some c ∈ [a, b],

\[ \frac{g(b)}{G(b)} = \frac{g'(c)}{G'(c)} = \frac{f'(c)}{F'(c)}. \tag{3.37} \]

FIGURE 3.7. Graphs of g(x) − A G(x) and g(x) − B G(x).

When we substitute f (b) − f (a) for g(b) and F (b) − F (a) for G(b), we obtain the desired
equation (3.30).
Q.E.D.

The principal difficulty with this approach is the use of the result that if h(a) = 0
and $h'(x) \ge 0$ for a ≤ x ≤ b, then h(b) ≥ 0. This may seem obvious from our geometric
understanding of the derivative: a positive derivative means that the function is increasing.
But how do we prove this statement without relying on geometric intuition? If you look in
any current calculus text, you will see that the proof uses the mean value theorem:
\[ \frac{h(b) - h(a)}{b - a} = h'(c) \ge 0, \]
and so
\[ h(b) - h(a) \ge 0. \]
This sends us into a circular argument: the mean value theorem is true because positive
derivative implies increasing, and positive derivative implies increasing because of the
mean value theorem.
Cauchy does give an independent proof that a positive derivative implies an increasing
function. He first points out that when x is sufficiently close to a, $f'(a)$ will be close
to [f(x) − f(a)]/[x − a]. If $f'(a)$ is positive we can, by keeping x close to a, force
[f(x) − f(a)]/[x − a] to be positive. If x is larger than a, then f(x) will have to be larger
than f(a). So far, so good.
The problem arises when Cauchy tries to use this fact to conclude that if $f'(x)$ is positive
for all x ∈ [a, b], then f (b) is strictly larger than f (a). To quote Cauchy, “If one increases
the variable x by insensible degrees from the first limit to the second, the function y will
grow with it as long as it has a finite derivative with positive value.” The difficulty is subtle
but important. How large are these “insensible degrees”? Can we use them to get from a
to b?
There is a way of proving that a positive derivative implies an increasing function without
using the mean value theorem, but it requires a deeper understanding of continuity. This
proof is described in exercises 3.4.12 and 3.4.13 at the end of section 3.4.
Our present state of affairs with respect to the mean value theorem is far from satisfactory.
We still have not seen an acceptable proof. The key to such a proof is a better understanding
of continuity, and for this we shall have to wait until the next section.

Exercises

3.2.1. Where does Cauchy’s proof of the mean value theorem break down if we try to
apply it to the function defined by f(x) = x sin(1/x) (f(0) = 0) over the interval [0, 1]?
Note: the mean value theorem does apply to this function, but Cauchy’s approach cannot
be used to establish this fact.

3.2.2. The purpose of this exercise is to demonstrate that even though the function defined
by

\[ f(x) = x^2 \sin(x^{-2}), \quad x \neq 0; \qquad f(0) = 0, \]

is differentiable at all points of [0, 1], if we are given a bound 0 < ε < 1 for the error
function

\[ |E(x, a)| = \left| f'(a) - \frac{f(x) - f(a)}{x - a} \right|, \]

there is no response δ that works for all values of a ∈ [0, 1].

a. Prove that f is differentiable at x = 0 and that $f'(0) = 0$. Given a bound ε, find a
response δ (as a function of ε) such that |x| < δ guarantees that |E(x, 0)| < ε.
b. Graph f(x) and $f'(x)$ over the interval [0, 1].
c. Given δ > 0, show that we can always find a and x such that 0 < x < a < δ and both
$a^{-2}$ and $x^{-2}$ are even multiples of π.
d. Given the values of a and x that were found in the previous part of this problem, show
that

\[ |E(x, a)| = 2a^{-1} > 2\delta^{-1}. \]

e. Complete the proof that there is no single value of δ that will serve as a response to the
bound ε at every value of a ∈ [0, 1].

3.2.3. Using the function f given in exercise 3.2.2, let $a = 1/\sqrt{2\pi}$. Graph E(x, a). How
close must x be to a if we are to keep |E(x, a)| within the error bound ε = 0.1? ε = 0.01?

3.2.4. Repeat exercise 3.2.3 with $a = 1/\sqrt{200\pi}$.

3.2.5. Find another function that is differentiable at every point in [0, 1] but whose deriva-
tive is not bounded on [0, 1].

3.2.6. Find a function that is not continuous on a closed interval [a, b] but which does have
the intermediate value property on this interval.

3.2.7. Explain why a function f that satisfies the intermediate value property on the
interval [a, b] and satisfies f (a) < 0 < f (b) or f (a) > 0 > f (b) must have at least one
root between a and b.

3.2.8. Graph the function defined by

\[ f(x) = \frac{1 - x}{|x|}, \quad x \neq 0; \qquad f(0) = 1. \]

Explain how you know that it does not have the intermediate value property on the interval
[−2, 2].

3.2.9. Use the generalized mean value theorem to prove that if f is twice differentiable,
then there is some number c between $x_0$ and $x_0 + 2\Delta x$ such that

\[ \frac{f(x_0 + 2\Delta x) - 2f(x_0 + \Delta x) + f(x_0)}{\Delta x^2} = f''(c). \]

3.2.10. Explain why the generalized mean value theorem implies that

\[ \lim_{\Delta x \to 0} \frac{f(x_0 + 3\Delta x) - 3f(x_0 + 2\Delta x) + 3f(x_0 + \Delta x) - f(x_0)}{\Delta x^3} = f'''(x_0). \]

3.3 Continuity
Continuity is such an obvious geometric phenomenon that only slowly did it dawn on
mathematicians that it needed a precise definition. Well into the 19th century it was simply
viewed as a descriptive term for curves that are unbroken. The preeminent calculus text of
that era was S. F. Lacroix’s Traité élémentaire de calcul différentiel et de calcul intégral.
It was first published in 1802. The sixth edition appeared in 1858. Unchanged throughout
these editions was its definition of continuity: “By the law of continuity is meant that which
is observed in the description of lines by motion, and according to which the consecutive
points of the same line succeed each other without any interval.”
This intuitive notion of continuity is useless when one tries to prove anything. The first
appearance of the modern definition of continuity was published by Bernhard Bolzano in
1817 in the Proceedings of the Prague Scientific Society under the title Rein analytischer
Beweis des Lehrsatzes dass zwieschen je zwey [sic] Werthen, die ein entgegengesetztes Re-
sultat gewaehren, wenigstens eine reele Wurzel der Gleichung liege. This roughly translates
as Purely analytic proof of the theorem that between any two values that yield results of
opposite sign there will be at least one real root of the equation. The title says it all. Bolzano
was proving that any continuous function has the intermediate value property.
Bolzano’s article raises an important point. If he has to prove that continuity implies the
intermediate value property, then he is not using the intermediate value property to define
continuity. Why not? Such a definition would agree with the intuitive notion of continuity.
If a function is defined at every point on the interval [a, b], then to say that it has the
intermediate value property is equivalent to saying that it has no jumps or breaks.

FIGURE 3.8. Graph of $x^{-1} \sin(1/x)$.

There are several problems with choosing this definition of continuity. A function that
has the intermediate value property on [0, 1] is not necessarily bounded on that interval.
An example is the function defined by

\[ f(x) = x^{-1} \sin(1/x), \quad x \neq 0; \qquad f(0) = 0, \tag{3.38} \]

(see Figure 3.8).


Another problem with using the intermediate value property to define continuity is that
two functions can have it while their sum does not:

\[
\begin{gathered}
f(x) = \sin^2(1/x), \quad x \neq 0; \qquad f(0) = 0, \\
g(x) = \cos^2(1/x), \quad x \neq 0; \qquad g(0) = 0, \\
f(x) + g(x) = \sin^2(1/x) + \cos^2(1/x) = 1, \quad x \neq 0; \qquad f(0) + g(0) = 0
\end{gathered}
\]

(see Figure 3.9). This is not damning, but it is unsettling.


Perhaps the most important aspect of continuity that the intermediate value property
lacks, and the one that may have suggested the modern definition, is that if f is continuous
in a neighborhood of a and if there is a small error in the input so that instead of evaluating
f at a we evaluate it at something very close to a, then we want the output to be very close
to f(a) (see Figure 3.10). The function defined by

\[ f(x) = \sin(1/x), \quad x \neq 0; \qquad f(0) = 0, \]


FIGURE 3.9. Graphs of $f(x) = \sin^2(1/x)$ and $g(x) = \cos^2(1/x)$.

satisfies the intermediate value property no matter how we define f (0), provided that
−1 ≤ f (0) ≤ 1 (see Figure 3.11), but at a = 0 any allowance for error in the input will
result in an output that could be any number from −1 to 1. We want to be able to control
the variation in the output by setting a tolerance on the input.

Definition of Continuity
We require that if f is continuous in a neighborhood of a and if x is close to a, then f (x)
must be close to f (a). More precisely, we want to be able to force f (x) to be arbitrarily
close to f (a) by controlling the distance between x and a. This is the definition of continuity

FIGURE 3.10. The effect of a small error in input.

FIGURE 3.11. Graph of $f(x) = \sin(1/x)$, $x \neq 0$.

that Bolzano stated in 1817 and that Cauchy proposed in his Cours d’analyse of 1821. To
make it as precise as possible, Cauchy used the Archimedean understanding.

Definition: continuity
We say that f is continuous at a if given any positive error bound ε, we can always
reply with a tolerance δ such that if x is within δ of a, then f(x) is within ε of f(a):
\[ |x - a| < \delta \ \text{ implies that } \ |f(x) - f(a)| < \epsilon. \]
To say that f is continuous on an interval I means that it is continuous at every point
a in the interval I .

Neither sin(1/x) nor $x^{-1} \sin(1/x)$ satisfies this definition of continuity at x = 0. Neither
of these functions is forced to take values close to 0 simply by restricting x to be close
to 0.
This definition does it all for us: It implies the intermediate value property, and it implies
that a continuous function on [a, b] is bounded and achieves its bounds on that interval.
Continuity is preserved when we add two continuous functions or multiply them or even
take compositions of continuous functions. All of the difficulties that we encountered in
Cauchy’s first proof of the mean value theorem evaporate if we add the assumption that
$f'$ is continuous. Even better, our analysis of the properties of continuous functions will
eventually lead us to Bonnet’s proof of the mean value theorem in which we can weaken
Cauchy’s assumptions.

Strange Examples
This definition does have its own idiosyncrasies that run counter to our intuitive notion of
continuity. The emphasis is not on what happens over an interval but rather on what happens
near a specific point. The result is that it is possible for a function to be continuous at one
and only one point as the following example demonstrates. The examples given in this
section are elaborations of a basic idea that first occurred to Dirichlet: to define a function
one way over the rationals and in a different manner over the irrationals.
Define a function f by

\[ f(x) = \begin{cases} x, & \text{if } x \text{ is rational}, \\ 0, & \text{if } x \text{ is irrational}. \end{cases} \tag{3.39} \]

We observe that this function is continuous at x = 0. If we are given a bound ε > 0, we
can always reply with δ = ε:
\[ |x - 0| < \epsilon \ \text{ implies that } \ |f(x) - f(0)| < \epsilon \]
because |f(x) − f(0)| is always either |x − 0| = |x| (when x is rational) or |0 − 0| = 0
(when x is irrational).
If a ≠ 0, then f is not continuous at a because we have no reply to any bound ε < |a|. If
a is rational, then no matter how small we choose δ there is an irrational x within distance
δ of a:
\[ |x - a| < \delta \ \text{ but } \ |f(x) - f(a)| = |0 - a| = |a| > \epsilon. \]
If a is irrational, then no matter how small we choose δ there is a rational x within distance
δ of a and with an absolute value slightly larger than |a|:
\[ |x - a| < \delta \ \text{ but } \ |f(x) - f(a)| = |x - 0| > |a| > \epsilon. \]
This function is continuous at x = 0 and only at x = 0.
An example that is even stranger is the following function which is not continuous at
any rational number but which is continuous at every irrational number. In other words,
the points where this function is continuous form a discontinuous set. When we write a
nonzero rational number as p/q, we shall choose p and q > 0 to be the unique pair of
integers with no common factor. Let g be defined by

\[ g(x) = \begin{cases} 1, & \text{if } x = 0, \\ 1/q, & \text{if } x = p/q \text{ is rational}, \\ 0, & \text{if } x \text{ is irrational}. \end{cases} \tag{3.40} \]

If a = 0, there is no response to any ε < 1. If a = p/q is rational, then we cannot
respond to a bound of ε < 1/q. In both cases, this is because within any distance δ of a,
we can always find an irrational number x for which |g(x) − g(a)| = g(a) > ε.
On the other hand, if a is irrational and we are given a bound ε, then the change in g is
bounded by ε, |g(x) − g(a)| = g(x) < ε, provided that whenever x = p/q is rational, 1/q

FIGURE 3.12. Rational numbers between 3.14 and 3.143 with denominators ≤ 100. (Labeled points (p/q, 1/q) for p/q = 22/7, 157/50, 179/57, 201/64, 223/71, 245/78, 267/85, 289/92, 311/99; the interval (π − 0.00015, π + 0.00015) is marked.)

is less than ε. Equivalently, we want to choose a distance δ so that if x = p/q is a rational
number within δ of a, then $q > \epsilon^{-1}$. We locate the rational numbers within distance 1 of a
for which the denominator is less than or equal to $\epsilon^{-1}$. The critical observation is that there
are at most finitely many of them. For example, if a = π and ε = 0.01, then we only need
to exclude those fractions p/q with
\[ \pi - 1 < \frac{p}{q} < \pi + 1, \quad \text{and} \quad q \le 100. \]
We mark their positions on the interval (π − 1, π + 1) and choose our response δ to be less
than the distance between π and the closest of these unacceptable rational numbers. The
closest is 311/99 = 3.141414 . . . which is just over 0.00017 from π (see Figure 3.12). If we
respond with δ = 0.00015, then none of the fractions inside the interval (π − 0.00015, π +
0.00015) has a denominator less than or equal to 100:
\[ |x - \pi| < 0.00015 \ \text{ implies that } \ |g(x) - g(\pi)| < 0.01. \]
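
The search for the closest "unacceptable" fraction can be carried out by brute force. The sketch below is an illustration only: the helper name `nearby_fractions` is an assumption, and `math.pi` is a floating-point stand-in for π, which is accurate enough at this scale. It lists every fraction p/q with q ≤ 100 within distance 1 of π and picks the one nearest to π.

```python
import math
from fractions import Fraction


def nearby_fractions(a, max_q, radius=1.0):
    """Collect the reduced fractions p/q with q <= max_q and |p/q - a| < radius."""
    found = set()
    for q in range(1, max_q + 1):
        lo = math.ceil((a - radius) * q)
        hi = math.floor((a + radius) * q)
        for p in range(lo, hi + 1):
            if abs(p / q - a) < radius:
                found.add(Fraction(p, q))  # Fraction reduces to lowest terms
    return found


# For the bound 0.01 we must exclude denominators q <= 100.
candidates = nearby_fractions(math.pi, max_q=100)
closest = min(candidates, key=lambda r: abs(float(r) - math.pi))
gap = abs(float(closest) - math.pi)

print(closest, gap)
```

Any response δ smaller than `gap` works for this bound; the text's choice δ = 0.00015 is one such value.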

Web Resource: To discover how continued fractions can be used to find these
approximations to π with very small denominators, go to Continued Fractions.

An Equivalent Definition of Continuity


If we look back at the definition of continuity on page 81 and compare it with the definition
of limit on page 59, we see that f is continuous at x = a if and only if

\[ \lim_{x\to a} f(x) = f(a). \tag{3.41} \]

This says that a function is continuous at x = a if and only if we can force f (x) to be as
close as we wish to f (a) simply by restricting the distance between x and a, excluding
the value x = a. In particular, continuity at x = a implies that if $(x_1, x_2, x_3, \ldots)$ is any
sequence that converges to a, then
\[ \lim_{k\to\infty} f(x_k) = f\left( \lim_{k\to\infty} x_k \right). \tag{3.42} \]

Usually, this approach is not helpful when we are trying to prove continuity; there are too
many possible sequences to check. But if we know that a function is continuous, then this
characterization of continuity can be very useful (see the proof of Theorem 3.3). And there
are times when we want to prove that a function is discontinuous (not continuous) at a given
value, a. A common method of accomplishing this is to find a sequence of values of x,
(x1 , x2 , x3 , . . .), that converges to a, but for which the sequence (f (x1 ), f (x2 ), f (x3 ), . . .)
does not converge to f(a),
\[ \lim_{k\to\infty} f(x_k) \neq f(a). \]

If this can be done, then f cannot be continuous at x = a.

An example is provided by f(x) = sin(1/x), x ≠ 0, f(0) = 0. The sequence
(2/π, 2/3π, 2/5π, 2/7π, . . .) converges to 0, but the sequence
\[ \left( f\left(\frac{2}{\pi}\right), f\left(\frac{2}{3\pi}\right), f\left(\frac{2}{5\pi}\right), f\left(\frac{2}{7\pi}\right), \ldots \right) = (1, -1, 1, -1, \ldots) \]
does not converge.
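
This sequence test is easy to try numerically. The following sketch evaluates f(x) = sin(1/x) along the points $x_k = 2/((2k-1)\pi)$; the variable names and the choice of eight terms are assumptions for illustration. The inputs shrink toward 0 while the outputs keep alternating between values near 1 and −1.

```python
import math


def f(x):
    """f(x) = sin(1/x) for x != 0, with f(0) = 0."""
    return math.sin(1 / x) if x != 0 else 0.0


# x_k = 2 / ((2k - 1) * pi) for k = 1, ..., 8, a sequence decreasing toward 0.
xs = [2 / ((2 * k - 1) * math.pi) for k in range(1, 9)]
values = [f(x) for x in xs]

for x, v in zip(xs, values):
    print(f"x = {x:.6f}   f(x) = {v:+.6f}")
```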

The Intermediate Value Theorem


The key to proving that any continuous function satisfies the intermediate value prop-
erty is the nested interval principle. It was stated on page 32. It is repeated here for
convenience:

Definition: nested interval principle


Given an increasing sequence, x1 ≤ x2 ≤ x3 ≤ · · · , and a decreasing sequence, y1 ≥
y2 ≥ y3 ≥ · · · , such that yn is always larger than xn but the difference between yn and
xn can be made arbitrarily small by taking n sufficiently large, there is exactly one real
number that is greater than or equal to every xn and less than or equal to every yn .

As was mentioned in section 2.4, this principle is taken as an axiom or unproven


assumption. Both Bolzano and Cauchy used it without proof. When in the later 19th century
mathematicians began to realize that it might be in need of justification, they saw that it
depends upon the definition of the real numbers. In 1872, Richard Dedekind (1831–1916),
Georg Cantor (1845–1918), Charles Méray (1835–1911), and Heinrich Heine (1821–1881)
each gave a different definition of the real numbers that would imply this principle. It is
referred to as the Bolzano–Weierstrass theorem when it is proven as a consequence
of carefully stated properties of the real numbers. The name acknowledges the first two
mathematicians to recognize the need to state it explicitly.

FIGURE 3.13. Proof of intermediate value theorem.

We have been searching for bedrock, a solid and unequivocal foundation on which to
construct analysis. One of the lessons of the twentieth century has been that this search
can be continued forever. To define the real numbers we require a careful definition of the
rationals. This in turn rests on a precise description of the integers which is impossible
without a clear understanding of sets and cardinality. At this point, the very principles of
logic need underpinning. The solution is to draw a line somewhere and state that this is
what we shall assume, this is where we shall begin. Not everyone will agree that the nested
interval principle is the right place to draw that line, but it has the advantage of being
simple and yet sufficient for all we want to prove. Here is where we shall begin to build the
theorems of analysis.

Theorem 3.3 (Intermediate Value Theorem). If f is continuous on the interval [a, b],
then f has the intermediate value property on this interval.

Proof: We assume that f is continuous on the interval [a, b]. We want to show that if c1
and c2 are any two points on this interval and if A is any number strictly between f (c1 )
and f (c2 ), then there is at least one value c between c1 and c2 for which f (c) = A. The
trick is to shrink the interval in which c is located, show that we can make this interval
arbitrarily small, and then invoke the nested interval principle to justify the claim that there
is something left in our net when we are done.
We can assume that c1 < c2 . We begin to define the sequences for the nested interval
principle by setting $x_1 = c_1$ and $y_1 = c_2$. We split the difference between these endpoints,
call it
\[ c_3 = \frac{x_1 + y_1}{2}. \]
If f (c3 ) = A, then we are done. We have found our c. If not, then f (c3 ) is either on the
same side of A as f (x1 ) or it is on the same side of A as f (y1 ) (Figure 3.13). We are in one
of two possibilities:
1. If f (x1 ) and f (c3 ) are on opposite sides of A, then we define x2 = x1 and y2 = c3 .
2. If f (c3 ) and f (y1 ) are on opposite sides of A, then we define x2 = c3 and y2 = y1 .
In either case, the result is that

x1 ≤ x2 < y2 ≤ y1 ,
y2 − x2 = (y1 − x1 )/2,

and f (x2 ), f (y2 ) lie on opposite sides of A. We have cut in half the size of the interval
where c must lie.
We repeat what we have just done. We find the midpoint of our last interval:
\[ c_4 = \frac{x_2 + y_2}{2}. \]
If f (c4 ) = A, then we are done. Otherwise, we are in one of two situations:
1. If f (x2 ) and f (c4 ) are on opposite sides of A, then we define x3 = x2 and y3 = c4 .
2. If f (c4 ) and f (y2 ) are on opposite sides of A, then we define x3 = c4 and y3 = y2 .
In either case, the result is that

x1 ≤ x2 ≤ x3 < y3 ≤ y2 ≤ y1 ,
y3 − x3 = (y2 − x2 )/2 = (y1 − x1 )/4,

and f (x3 ), f (y3 ) lie on opposite sides of A.


We can keep on doing this as long as we please. Once we have found $x_k$ and $y_k$, we find
the midpoint:
\[ c_{k+2} = \frac{x_k + y_k}{2}. \]
If f (ck+2 ) = A, then we are done. Otherwise, we have that either
1. f (xk ) and f (ck+2 ) are on opposite sides of A in which case we define xk+1 = xk and
yk+1 = ck+2 , or
2. f (ck+2 ) and f (yk ) are on opposite sides of A in which case we define xk+1 = ck+2 and
yk+1 = yk .
In either case, the result is that

\[ x_1 \le x_2 \le \cdots \le x_k \le x_{k+1} < y_{k+1} \le y_k \le \cdots \le y_2 \le y_1, \]

\[ y_{k+1} - x_{k+1} = (y_k - x_k)/2 = \cdots = (y_2 - x_2)/2^{k-1} = (y_1 - x_1)/2^k, \]

and $f(x_{k+1})$, $f(y_{k+1})$ lie on opposite sides of A and can be forced as close as we wish to A
by taking k sufficiently large. This is the Archimedean definition of limit:

\[ \lim_{k\to\infty} f(x_k) = \lim_{k\to\infty} f(y_k) = A. \tag{3.43} \]

Our sequences $x_1 \le x_2 \le \cdots$ and $y_1 \ge y_2 \ge \cdots$ satisfy the conditions of the nested
interval principle and so there is a number c that lies in all of these intervals. Again by the
Archimedean definition of limit, we see that

\[ \lim_{k\to\infty} x_k = \lim_{k\to\infty} y_k = c. \tag{3.44} \]

Since f is continuous at x = c, we know that

\[ \lim_{k\to\infty} f(x_k) = f(c). \tag{3.45} \]

Since this limit is also equal to A and any sequence has at most one limit, we have proved
that f (c) = A.
Q.E.D.
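
The halving construction in this proof is, in effect, the bisection method. The sketch below mirrors it under stated assumptions: the function name `bisect_for_value`, the stopping tolerance, and the sample function $x^3 - x$ with target value A = 1 are illustrative choices and not part of the text. In floating point the loop stops once the interval is shorter than the tolerance, so it returns an approximation to c rather than the exact limit guaranteed by the nested interval principle.

```python
def bisect_for_value(f, c1, c2, A, tol=1e-12, max_steps=200):
    """Approximate a point c in [c1, c2] with f(c) = A by repeated halving.

    Assumes c1 < c2, f continuous on [c1, c2], and f(c1), f(c2) on
    opposite sides of A (or equal to A)."""
    x, y = c1, c2
    fx, fy = f(x), f(y)
    if fx == A:
        return x
    if fy == A:
        return y
    if (fx - A) * (fy - A) > 0:
        raise ValueError("f(c1) and f(c2) must lie on opposite sides of A")
    for _ in range(max_steps):
        if y - x <= tol:
            break
        mid = (x + y) / 2
        fm = f(mid)
        if fm == A:
            return mid          # we have found c exactly
        if (fx - A) * (fm - A) < 0:
            y = mid             # f(x) and f(mid) straddle A: keep [x, mid]
        else:
            x, fx = mid, fm     # otherwise f(mid) and f(y) straddle A: keep [mid, y]
    return (x + y) / 2


# Illustrative use: f(x) = x^3 - x on [0, 2] with A = 1, since f(0) = 0 < 1 < 6 = f(2).
c = bisect_for_value(lambda x: x ** 3 - x, 0.0, 2.0, 1.0)
print(c, c ** 3 - c)
```

Each pass halves the length of the interval, exactly as in the proof, so about forty passes shrink an interval of length 2 below the default tolerance.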

The Modified Converse to the Intermediate Value Theorem


As we have seen by example, the intermediate value property is not enough to imply
continuity. The converse of the intermediate value theorem is not true. But a very reasonable
question to pose is whether we can find a broad class of functions for which the intermediate
value property is equivalent to continuity. One such class consists of the functions that are
piecewise monotonic on any finite interval.

Definition: monotonic
A function is monotonic on [a, b] if it is increasing on this interval,
a ≤ x1 < x2 ≤ b implies that f (x1 ) ≤ f (x2 ),
or if it is decreasing on this interval,
a ≤ x1 < x2 ≤ b implies that f (x1 ) ≥ f (x2 ).
A function is piecewise monotonic on [a, b] if we can find a partition of the interval
into a finite number of subintervals
a = x1 < x2 < · · · < xn−1 < xn = b
for which the function is monotonic on each open subinterval (xi , xi+1 ).

The key word is finite. It excludes all of the strange functions we have encountered so far.
They all jumped or oscillated infinitely often within our interval.

Theorem 3.4 (Modified Converse to IVT). If f is piecewise monotonic and satisfies


the intermediate value property on the interval [a, b], then f is continuous at every
point c in (a, b).

Proof: We shall assume that c lies inside one of the intervals (xi , xi+1 ) on which f is
monotonic. The proof that f is also continuous at the ends of these intervals is similar
and is left as an exercise. For convenience, we assume that f is increasing on (xi , xi+1 ). If
not, then we replace f with −f . Since f is increasing and xi < c < xi+1 , it follows that
f (xi ) ≤ f (c) ≤ f (xi+1 ).
We are given an error bound ε. Our challenge is to show that we always have a response
δ so that keeping x within δ of c guarantees that f(x) will be within ε of f(c). We begin
by finding two numbers, c1 and c2, that satisfy
xi ≤ c1 < c < c2 ≤ xi+1
and
f(c) − ε < f(c1) ≤ f(c) ≤ f(c2) < f(c) + ε.
If f(c) − ε < f(xi) ≤ f(c), then we let c1 = xi. Otherwise, we have that
f(xi) ≤ f(c) − ε < f(c).
The intermediate value property promises us a c1 between xi and c for which f(c) − ε <
f(c1) < f(c). For example, we could choose c1 to be a value for which f(c1) = f(c) − ε/2.
In either case, we have that xi ≤ c1 < c and
f(c) − ε < f(c1) ≤ f(c).
We find c2 similarly. If f(xi+1) < f(c) + ε, then c2 = xi+1. Otherwise, we choose c2 so
that c < c2 < xi+1 and f(c) ≤ f(c2) < f(c) + ε. In either case, we have that c < c2 ≤ xi+1
and
f(c) ≤ f(c2) < f(c) + ε.
Now that we have found c1 and c2, we choose δ to be the smaller of the distances
c − c1 > 0 and c2 − c > 0 so that if x is within δ of c then
c1 < x < c2.
Since f is increasing on [c1, c2], we can conclude that
f(c) − ε < f(c1) ≤ f(x) ≤ f(c2) < f(c) + ε.
Q.E.D.
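
The choice of c1, c2, and δ in this proof can be traced numerically. The sketch below is only an illustration under extra assumptions: it takes an increasing function together with a known inverse (here x³ and its cube root), whereas the proof only needs the intermediate value property to guarantee that c1 and c2 exist. The helper name `delta_for_increasing` is hypothetical.

```python
def delta_for_increasing(f, f_inverse, c, eps, left, right):
    """Pick c1 and c2 as in the proof and return delta = min(c - c1, c2 - c)."""
    fc = f(c)
    # c1 is the left endpoint if f is already within eps there,
    # otherwise a point where f takes the value f(c) - eps/2.
    c1 = left if f(left) > fc - eps else f_inverse(fc - eps / 2)
    # c2 is chosen the same way on the right.
    c2 = right if f(right) < fc + eps else f_inverse(fc + eps / 2)
    return min(c - c1, c2 - c)

cube = lambda x: x**3
cube_root = lambda y: y ** (1 / 3)   # valid here because the arguments are positive
delta = delta_for_increasing(cube, cube_root, c=1.0, eps=0.5, left=0.0, right=2.0)
```

With these sample values c1 and c2 are the cube roots of 0.75 and 1.25, and δ is the distance from 1 to the nearer of the two.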

Most functions that you are likely to encounter are piecewise monotonic. It should come
as a relief that in this case our two definitions of continuity are interchangeable. When we
reach Dirichlet’s proof of the validity of the Fourier series expansion, we shall see that
piecewise monotonicity is a critical assumption.

Sums, Products, Reciprocals, and Compositions


Combinations of continuous functions using addition, multiplication, division, or composi-
tion yield continuous functions. The proofs follow directly from the definition of continuity.
We begin by assuming that f and g are continuous at x = c. In order to show that f + g
is also continuous at c, we have to demonstrate that if someone gives us a bound ε > 0,
then we can find a response δ so that if we keep x within distance δ of c, |x − c| < δ, then
we are guaranteed that
|[f(x) + g(x)] − [f(c) + g(c)]| < ε.
We split our error bound, giving half to f and half to g. The continuity of f and g at c
promises us responses δ1 and δ2 such that
|x − c| < δ1 implies that |f(x) − f(c)| < ε/2
and
|x − c| < δ2 implies that |g(x) − g(c)| < ε/2.
We choose δ to be the smaller of these two responses. When |x − c| < δ we have that
|[f(x) + g(x)] − [f(c) + g(c)]| ≤ |f(x) − f(c)| + |g(x) − g(c)|
< ε/2 + ε/2 = ε.

The product f g is a little trickier. We again begin with the assumption that both f and g
are continuous at x = c. Before deciding how to divide our assigned bound ε, we observe
that
|f(x) g(x) − f(c) g(c)| = |f(x) g(x) − f(c) g(x) + f(c) g(x) − f(c) g(c)|
≤ |f(x) − f(c)| |g(x)| + |f(c)| |g(x) − g(c)|. (3.46)
We want each of these two pieces to be less than ε/2. If f(c) = 0, then the second piece
gives us no problem. If f(c) is not zero, then we need to have |g(x) − g(c)| bounded by
ε/(2|f(c)|). Let δ1 be the response:
$$|x - c| < \delta_1 \text{ implies that } |g(x) - g(c)| < \frac{\varepsilon}{2|f(c)|}.$$
The first piece is slightly more problematic. Since c is fixed, f(c) is a constant. We are
now faced with a multiplier, |g(x)|, that can take on different values. Our first task is to use
the continuity of g to pin down |g(x)|. We choose a δ2 so that |x − c| < δ2 guarantees that
|g(x) − g(c)| < 1. This implies that
|g(x)| < 1 + |g(c)|.
We find a δ3 for which
$$|x - c| < \delta_3 \text{ implies that } |f(x) - f(c)| < \frac{\varepsilon}{2(1 + |g(c)|)}.$$
Choosing a δ that is less than or equal to both δ2 and δ3 gives us the desired
bound:
$$|f(x) - f(c)|\,|g(x)| < \frac{\varepsilon}{2(1 + |g(c)|)}\,(1 + |g(c)|) = \frac{\varepsilon}{2}.$$
If we choose our final response δ to be the smallest of δ1, δ2, and δ3 (ignoring δ1 when f(c) = 0), then
$$|f(x)\,g(x) - f(c)\,g(c)| \le |f(x) - f(c)|\,|g(x)| + |f(c)|\,|g(x) - g(c)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
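
The three-response recipe for the product can be written out as a short routine. This is a sketch under assumptions: the callables `respond_f` and `respond_g` are hypothetical stand-ins for "the δ that continuity of f (or g) at c provides for a given bound", and the linear functions are chosen only because their responses are easy to state.

```python
def product_delta(eps, f_c, g_c, respond_f, respond_g):
    """Return a delta for f*g at c, combining delta1, delta2, delta3 as in the text."""
    delta2 = respond_g(1.0)                          # pins |g(x)| < 1 + |g(c)|
    delta3 = respond_f(eps / (2 * (1 + abs(g_c))))   # makes the first piece < eps/2
    deltas = [delta2, delta3]
    if f_c != 0:                                     # delta1 is only needed when f(c) != 0
        deltas.append(respond_g(eps / (2 * abs(f_c))))
    return min(deltas)

# Illustrative example: f(x) = 3x and g(x) = 2x + 1 at c = 1.
f = lambda x: 3 * x
g = lambda x: 2 * x + 1
c, eps = 1.0, 0.1
delta = product_delta(
    eps, f(c), g(c),
    respond_f=lambda bound: bound / 3,   # |3x - 3c| < bound when |x - c| < bound/3
    respond_g=lambda bound: bound / 2,   # |2x - 2c| < bound when |x - c| < bound/2
)
```

For these sample functions the binding constraint is δ3, the response that controls the piece multiplied by |g(x)|.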
Reciprocals require a similar finesse. If f is continuous at x = c and if f(c) ≠ 0, then we
need to find a δ that will force
$$\left| \frac{1}{f(x)} - \frac{1}{f(c)} \right| = \frac{|f(c) - f(x)|}{|f(x)|\,|f(c)|}$$
to be less than any prespecified bound ε. We need an upper bound on 1/|f(x)| which means
finding a lower bound on |f(x)|. We use the continuity of f to find δ1 guaranteeing that
$$|f(x) - f(c)| < \frac{|f(c)|}{2},$$
and therefore
$$|f(x)| > \frac{|f(c)|}{2}.$$
We now have the bound
$$\frac{|f(c) - f(x)|}{|f(x)|\,|f(c)|} < |f(x) - f(c)| \cdot \frac{2}{|f(c)|^2}.$$
We again use the continuity of f at c to find δ2 for which
$$|x - c| < \delta_2 \text{ implies that } |f(x) - f(c)| < \frac{\varepsilon\,|f(c)|^2}{2}.$$
Choosing δ to be the smaller of δ1 and δ2, we see that
$$\left| \frac{1}{f(x)} - \frac{1}{f(c)} \right| < \frac{\varepsilon\,|f(c)|^2}{2} \cdot \frac{2}{|f(c)|^2} = \varepsilon.$$

The easiest has been saved for last: compositions of continuous functions. If g(x) is
continuous at c and f(y) is continuous at g(c), then given a bound ε we first feed it to f
and find the response δ1:
|y − g(c)| < δ1 implies that |f(y) − f(g(c))| < ε.
To get |g(x) − g(c)| < δ1, we feed δ1 to g, getting a response δ2:
|x − c| < δ2 implies that |g(x) − g(c)| < δ1
implies that |f(g(x)) − f(g(c))| < ε.
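
Feeding one response into the other is simply function composition in the opposite order. A minimal sketch, assuming hypothetical response functions `respond_f` and `respond_g` that return a suitable δ for any given bound (the linear examples are illustrative only):

```python
def compose_response(respond_f, respond_g):
    """Response for f(g(x)) at c: feed eps to f, then feed f's delta to g."""
    return lambda eps: respond_g(respond_f(eps))

# Illustrative example: g(x) = 2x + 1 at c = 1 and f(y) = 3y at g(c) = 3.
respond_g = lambda bound: bound / 2   # |g(x) - g(c)| < bound when |x - c| < bound/2
respond_f = lambda bound: bound / 3   # |f(y) - f(g(c))| < bound when |y - g(c)| < bound/3
respond_fg = compose_response(respond_f, respond_g)
delta = respond_fg(0.6)               # about 0.1, since f(g(x)) = 6x + 3
```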

Differentiability Implies Continuity


We conclude this section with the observation that differentiability at x = c implies conti-
nuity at x = c. Once it was realized that continuity was a significant property that actually
needed verification, it was seen that we could not have differentiability without continuity.
The converse remained an enigma for many years. It is possible to have a continuous
function that fails to be differentiable at a single point or even at several discrete points.
The function f(x) = |x| is the simplest example. It is continuous at 0. Given a bound ε,
one can always reply with δ = ε:
|x − 0| < δ = ε implies that | |x| − 0 | < ε.
On the other hand, if we look at the error term in the definition of differentiability:
$$E(x, 0) = f'(0) - \frac{|x| - 0}{x - 0} = f'(0) - \begin{cases} 1, & x > 0, \\ -1, & x < 0, \end{cases}$$
we see that there is no value that we can assign to f′(0): If it is close to 1 then it will be far
from −1, and if it is close to −1 then it will be far from 1.
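
A quick numerical look at the difference quotients of |x| at 0 makes the obstruction concrete. The helper `difference_quotient` and the grid of candidate slopes below are illustrative assumptions, not part of the text.

```python
def difference_quotient(f, c, h):
    """Average rate of change of f between c and c + h."""
    return (f(c + h) - f(c)) / h

steps = (0.1, 0.001, 1e-9)
right = [difference_quotient(abs, 0.0, h) for h in steps]    # approaching from the right
left = [difference_quotient(abs, 0.0, -h) for h in steps]    # approaching from the left

# For any candidate value m of f'(0), the error is |m - 1| on the right and
# |m + 1| on the left; the larger of the two never drops below 1.
candidates = [k / 10 for k in range(-20, 21)]
worst_errors = [max(abs(m - 1), abs(m + 1)) for m in candidates]
```

The right-hand quotients all equal 1 and the left-hand quotients all equal −1, so no single candidate slope keeps the error small on both sides.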
How nondifferentiable can a continuous function be? In particular, can we find a function
that is continuous at every point in some interval [a, b] but that is not differentiable at any
point in this interval? To most people’s surprise, the answer to this question is yes. Bolzano
found an example in the early 1830s, although it was not published until almost a century
later.
In 1872, Weierstrass shocked the mathematical community with his example,




$$f(x) = \sum_{n=0}^{\infty} b^n \cos(a^n \pi x), \tag{3.47}$$

where a is an odd integer, 0 < b < 1, and ab > 1 + 3π/2. What is so astonishing is that
this is a reasonable Fourier series. For example, if a = 13 and b = 1/2, then this is the
series
$$\cos(\pi x) + \frac{1}{2}\cos(13\pi x) + \frac{1}{4}\cos(169\pi x) + \frac{1}{8}\cos(2197\pi x) + \cdots.$$
This example and others will be explained in section 6.4. To verify that this function is
continuous but not differentiable at any value of x, we shall first need to study properties
of infinite series in more detail. For now, we shall content ourselves with the verification
that differentiability requires continuity.
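
Partial sums of the series (3.47) are easy to evaluate, which gives a way to experiment with the example before it is analyzed. This is a sketch only: the function name `weierstrass_partial` is an assumption, the defaults a = 13 and b = 1/2 are the sample values quoted above, and floating-point evaluation of cos(aⁿπx) loses accuracy once aⁿ becomes very large, so only a few terms are used.

```python
import math

def weierstrass_partial(x, terms, a=13, b=0.5):
    """Sum of the first `terms` terms of b**n * cos(a**n * pi * x), n = 0, 1, ..."""
    return sum(b**n * math.cos(a**n * math.pi * x) for n in range(terms))

at_zero = weierstrass_partial(0.0, 4)   # 1 + 1/2 + 1/4 + 1/8
at_one = weierstrass_partial(1.0, 4)    # every cosine equals -1 because a is odd
```

Every partial sum is bounded in absolute value by 1 + b + b² + ··· = 1/(1 − b), which is 2 for b = 1/2.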

Theorem 3.5 (Differentiable ⇒ Continuous). If f is differentiable at x = c, then f


is continuous at x = c.

Proof: From the definition of differentiability, we know that there is a value f′(c) for
which the error term
$$E(x, c) = f'(c) - \frac{f(x) - f(c)}{x - c}$$
can be made as small as we want by restricting x to be sufficiently close to c. We solve this
equation for f(x) − f(c):
f(x) − f(c) = (x − c) f′(c) − (x − c) E(x, c),
|f(x) − f(c)| ≤ |x − c| |f′(c)| + |x − c| |E(x, c)|.
Given a bound ε, we give half of it to each of the terms on the right side of this inequality.
If f′(c) = 0, then the first term is zero. If not, then to make |x − c| |f′(c)| less than ε/2 we
need to have
$$|x - c| < \frac{\varepsilon}{2|f'(c)|}.$$
We can make |E(x, c)| as small as we want. We find a δ1 so that |x − c| < δ1 implies that
|E(x, c)| < ε/2. The second term will be the right size as long as |x − c| is less than 1.
We choose δ to be the smallest of ε/(2|f′(c)|), δ1, and 1 (omitting the first of these when
f′(c) = 0) and we get the desired bound:
$$|f(x) - f(c)| \le |x - c|\,|f'(c)| + |x - c|\,|E(x, c)| < \frac{\varepsilon}{2|f'(c)|}\,|f'(c)| + 1 \cdot \frac{\varepsilon}{2} = \varepsilon.$$
Q.E.D.

In exercise 3.3.34 you will get a chance to prove this theorem with a weaker hypothesis,
only using one-sided derivatives.
Definition: one-sided limits and derivatives


The limit from the right, $\lim_{x\to a^+} f(x)$, is the target value T with the property that
for any ε > 0, there is a response δ so that if a < x < a + δ, then |f(x) − T| < ε.
Similarly, the limit from the left implies this inequality when a − δ < x < a. The
one-sided derivatives are defined by
$$f'_+(a) = \lim_{x\to a^+} \frac{f(x) - f(a)}{x - a}, \qquad f'_-(a) = \lim_{x\to a^-} \frac{f(x) - f(a)}{x - a}.$$

Exercises

The symbol
M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

3.3.1. Prove that the function defined by


f(x) = sin(1/x), x ≠ 0; f(0) = 0
is not continuous at x = 0 by finding an ε for which there is no reply.

3.3.2. Prove that the function defined by



$$f(x) = \begin{cases} 1, & x \text{ rational}, \\ 0, & x \text{ irrational} \end{cases}$$
is not continuous at any x.

3.3.3. For the function g given in equation (3.40) on page 82, find a response δ that will work
at a = √2 when ε = 0.2. What are the rational numbers x in the interval (√2 − 1, √2 + 1)
for which |g(x) − g(√2)| ≥ 0.2?

3.3.4. At what values of x is the function f continuous? Justify your answer.



$$f(x) = \begin{cases} 0, & \text{if } x \text{ is irrational}, \\ \sin x, & \text{if } x \text{ is rational} \end{cases}$$

3.3.5. At what values of x is the function f continuous? Justify your answer.

$$f(x) = \begin{cases} x^2 - 1, & \text{if } x \text{ is irrational}, \\ 0, & \text{if } x \text{ is rational} \end{cases}$$

3.3.6. At what values of x is the function f continuous? Justify your answer.

$$f(x) = \begin{cases} x, & \text{if } x \text{ is irrational or } x = 0, \\ qx/(q + 1), & \text{if } x = p/q, \text{ where } p \text{ and } q \text{ are relatively prime integers, } q > 0 \end{cases}$$

3.3.7. Prove that if f is continuous on [a, b], then |f | is continuous on [a, b]. Show by an
example that the converse is not true.
3.3.8. Let f be a continuous function from [0, 1] to [0, 1]. Show that there must be an x in
[0, 1] for which f (x) = x.

3.3.9. Let f and g be continuous functions on [0, 1] such that f (0) < g(0) and f (1) > g(1).
Show that there must be an x in (0, 1) for which f (x) = g(x).

3.3.10. Prove that the equation (1 − x) cos x = sin x has at least one solution in (0, 1).

3.3.11. Let f be a continuous function on [0, 2] such that f (0) = f (2). Show that there
must be values a and b in [0, 2] such that

a − b = 1 and f (a) = f (b).

3.3.12. Let f be a continuous function on [0, 2]. Show that there must be values a and b
in [0, 2] such that

$$a - b = 1 \text{ and } f(a) - f(b) = \frac{f(2) - f(0)}{2}.$$

3.3.13. Let f be a continuous function on [0, n] such that f (0) = f (n), where n ∈ N.
Show that there must be values a and b in [0, n] such that

a − b = 1 and f (a) = f (b).

3.3.14. Let f(x) = ⌊x²⌋ sin(πx) for all real x. Study the continuity of f. (⌊a⌋ is the floor
of a, the greatest integer less than or equal to a.)

3.3.15. Let

f(x) = ⌊x⌋ + (x − ⌊x⌋)^⌊x⌋, for x ≥ 1/2.

Show that f is continuous. Show that f is strictly increasing on [1, ∞).

3.3.16. Find an x between 0 and 0.1 for which sin(1/x) = 1. Find such an x for which
sin(1/x) = −1. Find x’s between 0 and 0.001 for which sin(1/x) = 1, = −1. Find x’s
between 0 and 10⁻¹⁰⁰ for which sin(1/x) = 1, = −1.

3.3.17. Find a δ > 0 such that 0 < h ≤ δ guarantees that

| sin(x + h) − sin x| < 0.1.

3.3.18. Find a δ > 0 such that 0 < h ≤ δ guarantees that

|(x + h)² − x²| < 0.1,

when 0 ≤ x ≤ 1.
3.3.19. If we do not restrict the size of x, can we find a δ > 0 that does not depend on x
and for which 0 < h ≤ δ guarantees that

|(x + h)² − x²| < 0.1?

Explain why or why not.

3.3.20. Find a δ > 0 such that 0 < h ≤ δ guarantees that

|e^(x+h) − e^x| < 0.1,

when 0 ≤ x ≤ 1. Do we need to restrict the size of x?

3.3.21. Give an example of a function other than f (x) = |x| that is continuous for all real
x but that is not differentiable for at least one value of x.

3.3.22. Prove that



$$|\ln x - \ln c| < \begin{cases} |x - c|/c, & \text{if } x > c, \\ |x - c|/x, & \text{if } x < c. \end{cases} \tag{3.48}$$

3.3.23. Use the inequality in exercise 3.3.22 to find a positive number δ > 0 such that
|h| ≤ δ implies that

| ln(x + h) − ln(x)| < 0.1

for all x ≥ 1.

3.3.24. Does it seem strange to you that a function can be continuous at exactly one point?
Find another function that is continuous at exactly one point.

M&M 3.3.25. Graph the functions defined by

$$f_n(x) = \frac{\ln(x + 2) - x^{2n} \sin x}{1 + x^{2n}},$$
for n = 2, 5, 10, and 20 over the interval [0, π/2]. Describe what you see. Find the
approximate location of the root.

3.3.26. The functions fn of exercise 3.3.25 are all continuous. Graph the function defined
by

$$f(x) = \lim_{n\to\infty} f_n(x)$$

and prove that it is not continuous on the interval [0, π/2]. What is the value of f (1)?
The conclusion is that the limit of a family of continuous functions might not be
continuous.

3.3.27. Let f(x) = sin(1/x) when x ≠ 0. Prove that if we choose any value from the
interval [−1, 1] to assign to f (0), then f will have the intermediate value property.
3.3.28. Consider the function that takes the tenths digit in the decimal expansion of x and
replaces it with a 1. For example, f (2.57) = 2.17, f (3) = 3.1, f (π ) = 3.14159. . . = π .
Where is this function continuous? Where is this function not continuous? Justify your
assertions.

3.3.29. Consider the function that takes the digits in the decimal expansion of x ∈ (0, 1)
and inserts 0’s between them so that 0.a1 a2 a3 . . . becomes 0.0a1 0a2 0a3 . . . . Is there
any x in this interval that has a finite decimal expansion and for which this function is
continuous?

3.3.30. Using the same function as in exercise 3.3.29, is there any x ∈ (0, 1) that has an
infinite decimal expansion and for which this function is not continuous?

3.3.31. Prove the intermediate value theorem with the weaker assumption that f is continuous
on (a, b), continuous from the right at x = a ($\lim_{x\to a^+} f(x) = f(a)$), and continuous
from the left at x = b ($\lim_{x\to b^-} f(x) = f(b)$).

3.3.32. Prove that if f has the intermediate value property on the interval [a, c] and if it is
monotonic on (a, c) then $\lim_{x\to c^-} f(x) = f(c)$. This completes the proof of Theorem 3.4
and shows that we can add the conclusion $\lim_{x\to a^+} f(x) = f(a)$ and $\lim_{x\to b^-} f(x) = f(b)$.

3.3.33. Prove that if f and g are both continuous at x = c and if g(c) ≠ 0, then f/g must
be continuous at x = c.

3.3.34. Show that if the one-sided derivatives $f'_-(a)$ and $f'_+(a)$ exist, then f is continuous
at a.

3.4 Consequences of Continuity


Continuity is a powerful concept. There is much that we can conclude about any continuous
function. In this section, we shall pursue some of these consequences and investigate the
even richer rewards that accrue when differentiability is also brought in to play.

Theorem 3.6 (Continuous ⇒ Bounded). If f is continuous on the interval [a, b],


then there exist finite bounds A and B such that
A ≤ f (x) ≤ B
for all x ∈ [a, b].

Before proving this theorem, we note that we really do need all of the conditions. If f
only satisfies the intermediate value property, then it could be the function defined by

f(x) = x⁻¹ sin(1/x), x ≠ 0; f(0) = 0

which is not bounded on [0, 1]. If the endpoints of the interval are not included, then we
could have a continuous function such as f (x) = 1/x which is not bounded on (0, 1).
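
To see the first counterexample grow without bound, one can evaluate it at points where sin(1/x) = 1. This is a small illustrative sketch; the particular points 1/(π/2 + 2πk) are an assumption made for convenience, and floating-point rounding means the values are only approximately π/2 + 2πk.

```python
import math

def f(x):
    """x**(-1) * sin(1/x) for x != 0, with f(0) = 0."""
    return math.sin(1 / x) / x if x != 0 else 0.0

# Points in (0, 1) at which sin(1/x) = 1, namely 1/x = pi/2 + 2*pi*k.
points = [1 / (math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]
values = [f(x) for x in points]   # roughly pi/2 + 2*pi*k, increasing with k
```

The points approach 0 while the values keep increasing, so no finite B can serve as an upper bound on [0, 1].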
Proof: We assume that f is not bounded and show that this implies at least one point
c ∈ [a, b] where f is not continuous. Again, we use the nested interval principle to find the
point c. Let x1 = a and y1 = b and let c1 be the midpoint of this interval:
$$c_1 = \frac{x_1 + y_1}{2}.$$
If we consider the two intervals [x1 , c1 ] and [c1 , y1 ], our function must be unbounded on at
least one of them (if it were bounded on both, then the greater of the upper bounds would
work for both intervals, the lesser of the lower bounds would serve as lower bound for both
intervals). We choose one of these intervals on which f is unbounded and define x2 , y2 to
be the endpoints of this shorter interval:
x1 ≤ x2 < y2 ≤ y1 ,
y2 − x2 = (y1 − x1 )/2.
We repeat this operation, setting c2 = (x2 + y2 )/2 and choosing a shorter interval on
which f is still unbounded. Continuing in this manner, we obtain a sequence of nested
intervals of arbitrarily short length,
x1 ≤ x2 ≤ · · · ≤ xk < yk ≤ · · · ≤ y2 ≤ y1 ,

yk+1 − xk+1 = (yk − xk)/2 = · · · = (y2 − x2)/2^(k−1) = (y1 − x1)/2^k,


each with the property that f is unbounded on [xk , yk ]. We let c be the point in all of these
intervals that is promised to us by the nested interval principle.
All that is left is to prove that f is not continuous at c. We play the ε–δ game with an
interesting twist: no matter which ε is chosen, there is no δ with which we can respond.
To see this, let ε be any positive number, and let us claim that a certain δ will work. Our
opponent points out that there is a k for which yk − xk < δ and so any point in [xk, yk] is less
than distance δ from c. We are also reminded that f is unbounded on [xk, yk] which means
that there is at least one x in this interval for which f(x) > f(c) + ε or f(x) < f(c) − ε
(otherwise we could use f(c) − ε and f(c) + ε as our bounds). But then the distance from
f(x) to f(c) is larger than ε.
Q.E.D.

Least Upper and Greatest Lower Bounds


If we want to patch up Cauchy’s first proof of the mean value theorem by assuming that
the derivative is continuous on [a, b], it is not enough to prove that a continuous function
on a closed interval is bounded, it must actually achieve the best possible bounds. That is
to say, if f is continuous on [a, b] then we must be able to find c1 and c2 in [a, b] for which
f (c1 ) ≤ f (x) ≤ f (c2 )
for all x ∈ [a, b]. The theorem we have just proved only promises us that bounds exist. It
says nothing about how close these bounds come to the actual values of the function.
What we are usually interested in are the best possible bounds. In the case of f(x) = x³
on [−2, 3], these are −8 and 27. Respectively, these are called the greatest lower bound
and the least upper bound. The greatest lower bound is a lower bound with the property
that any larger number is not a lower bound. Similarly, the least upper bound is an upper
bound with the property that any smaller number is not an upper bound. Before we can ask
whether or not f achieves these best possible bounds, we must know whether they always
exist. The precise definition is similar to the Archimedean understanding of a limit.

Definition: least upper, greatest lower bounds


Given a set S, the least upper bound or supremum of S, denoted sup S, is the number
G with the property that for any numbers L < G and M > G, there is at least one
element of S that is strictly larger than L and at least one upper bound for S that is
strictly smaller than M. The greatest lower bound or infimum of S, denoted inf S, is
the negative of the least upper bound of −S = {−s | s ∈ S}.

It may seem obvious that every bounded set has greatest lower and least upper bounds,
but this is a subtle point that would not be recognized as a potential difficulty until the
latter half of the 19th century. We have the machinery at hand for tackling it, and so we
shall proceed. To convince you that there is something worth investigating, we consider
the sequence
x1 = 1 − 1/3
= 2/3,
x2 = 1 − 1/3 + 1/5 − 1/7
= 76/105,
x3 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − 1/11
= 2578/3465,
x4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − 1/11 + 1/13 − 1/15
= 33976/45045, . . . .
As we know from our earlier work on series, these numbers are increasing and approaching
π/4. If we define
S = {xk | k ≥ 1},
then π/4 is the least upper bound for this set. But what happens if we restrict our attention
to rational numbers? Every element of S is rational, but π/4 is not. In the set of rational
numbers, we cannot call on π/4 to serve as our least upper bound. We are required to
choose a rational number. We can certainly find a rational number that is an upper bound.
The number 1 will do, but it is not a least upper bound: 83/105 would be better. Still better
would be 2722/3465. If we restrict our sights to the rational numbers, then we can always
find a better upper bound, there is no best. That is because the best, π/4, is outside the
domain in which we are searching. No matter how close to π/4 we choose our rational
number, there is always another rational number that is a little bit closer.
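
The partial sums and the rational upper bounds quoted above can be checked with exact rational arithmetic. A minimal sketch using Python's standard `fractions` module; the helper name `partial_sum` is an illustrative assumption, and the comparison with π/4 uses floating point, so it is a numerical check rather than a proof.

```python
from fractions import Fraction
import math

def partial_sum(k):
    """x_k: the first 2k terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum(Fraction((-1) ** j, 2 * j + 1) for j in range(2 * k))

terms = [partial_sum(k) for k in range(1, 5)]

# Rational upper bounds mentioned in the text, from worst to best.
upper_bounds = [Fraction(1), Fraction(83, 105), Fraction(2722, 3465)]
all_above = all(float(u) > math.pi / 4 for u in upper_bounds)
all_below = all(float(t) < math.pi / 4 for t in terms)
```

Each x_k stays below π/4 and each listed bound stays above it, yet the bounds can always be tightened by another rational number.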
The problem is that the set of rational numbers has holes in it: precisely those irrational
numbers like π/4. What characterizes the real numbers is that they include all of the
rationals plus what is needed to plug the holes. This property of the real numbers is implicit
in the nested interval principle.
Theorem 3.7 (Upper Bound ⇒ Least Upper Bound). In the real numbers, every set
that has an upper bound also has a least upper bound and every set that has a lower
bound also has a greatest lower bound.

Proof: Since the greatest lower bound can be defined in terms of the least upper bound, it
is enough to prove the existence of the least upper bound.
We assume that S has an upper bound (and therefore is not empty) and construct our
sequences for the nested interval principle as follows: let x1 be a number that is not an upper
bound of S (choose some x ∈ S and then choose any x1 < x) and let y1 be an upper bound
for S. We let c1 be the midpoint of [x1 , y1 ], c1 = (x1 + y1 )/2. If c1 is an upper bound, then
we set x2 = x1 and y2 = c1 . If c1 is not an upper bound, then we set x2 = c1 and y2 = y1 .
In either case, x2 is not an upper bound and y2 is an upper bound for S,
x1 ≤ x2 < y2 ≤ y1 ,
and
y2 − x2 = (y1 − x1 )/2.
This can be repeated as often as we like. Once we have found xk and yk , we split the
difference: ck = (xk + yk )/2. If ck is an upper bound, then xk+1 = xk and yk+1 = ck . If ck
is not an upper bound, then xk+1 = ck and yk+1 = yk . In either case, xk+1 is not an upper
bound and yk+1 is an upper bound for S,
x1 ≤ x2 ≤ · · · ≤ xk ≤ xk+1 < yk+1 ≤ yk ≤ · · · ≤ y2 ≤ y1 ,
and
yk+1 − xk+1 = (yk − xk)/2 = · · · = (y2 − x2)/2^(k−1) = (y1 − x1)/2^k.
We claim that the c that lies in all of these intervals is the least upper bound. If we take
any L < c, then we can find an xk > L and so there is an x ∈ S with L < xk < x. If we
take any M > c, then we can find a yk < M, and yk is an upper bound for S.
Q.E.D.
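
The construction in this proof is again a bisection, driven by the question "is this number an upper bound for S?". The sketch below assumes that question can be answered by a hypothetical predicate `is_upper_bound`; the sample set S = {s : s² < 2} and the tolerance are illustrative choices.

```python
def supremum(is_upper_bound, not_upper, upper, tol=1e-12):
    """Approximate sup S, starting from a non-upper-bound x1 and an upper bound y1."""
    x, y = not_upper, upper
    while y - x > tol:
        c = (x + y) / 2
        if is_upper_bound(c):
            y = c        # c is an upper bound: it replaces y
        else:
            x = c        # c is not an upper bound: it replaces x
    return y             # y is always an upper bound for S

# S = {s : s*s < 2}. A number u is an upper bound for S exactly when u >= 0 and u*u >= 2.
sup_S = supremum(lambda u: u >= 0 and u * u >= 2, not_upper=0.0, upper=2.0)
```

The returned value approximates √2, the least upper bound of S, which is irrational even though the bisection visits only rational points.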

We could have taken the existence of least upper bounds as an axiom of the real numbers.
As you are asked to prove in exercise 3.4.11, the statement “every set with an upper bound
has a least upper bound” implies the nested interval principle.

Achieving the Bounds

Theorem 3.8 (Continuous ⇒ Bounds Achieved). If f is continuous on [a, b], then


it achieves its greatest lower bound and its least upper bound. Equivalently, there exist
k1 , k2 ∈ [a, b] such that
f (k1 ) ≤ f (x) ≤ f (k2 )
for all x ∈ [a, b].
Proof: We shall only prove the existence of k2 . The proof for k1 follows by substituting
−f for f .
As we saw in Theorem 3.6, the set {f (x) | a ≤ x ≤ b} has an upper bound. Theorem 3.7
then promises us a least upper bound; call it A. Our problem is to show that there is
some c ∈ [a, b] for which f (c) = A. By now you should expect that we use the nested
interval principle to find our candidate for c. We start by defining x1 = a, y1 = b, and
c1 = (x1 + y1 )/2. Since A is an upper bound for f (x) over the entire interval [a, b], it
is also an upper bound for f (x) over each of the shorter intervals [x1 , c1 ] and [c1 , y1 ]. It
must be the least upper bound for f (x) over at least one of these subintervals, because
if something smaller worked for both subintervals, then A would not be the least upper
bound over [a, b]. If A is the least upper bound over [x1 , c1 ], then we define x2 = x1 and
y2 = c1 . If not, then A is the least upper bound over [c1 , y1 ] and we define x2 = c1 and
y2 = y1 .
We continue in this manner, each time chopping our interval in half and choosing a half
on which A is still the least upper bound. We get our sequences

x1 ≤ x2 ≤ · · · ≤ xk < yk ≤ · · · ≤ y2 ≤ y1 ,

yk − xk = (yk−1 − xk−1)/2 = · · · = (y2 − x2)/2^(k−2) = (y1 − x1)/2^(k−1).

Let c be the point that is in all of these intervals.


Since A is an upper bound for f(x) over [a, b], we know that f(c) is less than or equal
to A. If f(c) is strictly less than A, then we choose an ε with 0 < ε < A − f(c) and use the
continuity of f at c to find a δ such that |x − c| < δ guarantees that |f(x) − f(c)| < ε. This in turn
implies that

f(c) − ε < f(x) < f(c) + ε < A.

We now choose our k so that yk − xk < δ. It follows that every point in [xk, yk] is less than
distance δ from c. The quantity f(c) + ε, which is less than A, is an upper bound for
f(x) over [xk, yk]. This contradicts the fact that A is the least upper bound for f(x) over
the interval [xk, yk]. Our assumption that f(c) is strictly less than A cannot be valid, and
so f(c) = A.
Q.E.D.

Fermat’s Theorem on Extrema


As was mentioned in section 3.2, the best proof of the mean value theorem is slick, but
it is neither direct nor obvious. It is the result of knowing enough about continuous and
differentiable functions that someone, eventually, observed a better route. Here we begin
to look at the consequences of differentiability, starting with an observation that had been
made by Pierre de Fermat in 1637 or 1638, and in a less precise form by Johann Kepler
in 1615, well before Newton or Leibniz were born (1642 and 1646, respectively). It is
the observation that one finds the extrema (maxima or minima) of a function where the
derivative is zero. This observation was a principal impetus behind the search for the
algorithms of differentiation.
Theorem 3.9 (Fermat’s Theorem on Extrema). If f has an extremum at a point


c ∈ (a, b) [f(c) ≥ f(x) for all x ∈ (a, b) or f(c) ≤ f(x) for all x ∈ (a, b)] and if f
is differentiable at every point in (a, b), then f′(c) = 0.

Proof: We shall actually prove that if f′(c) ≠ 0, then we can find x1, x2 ∈ (a, b) for which
f(x1) < f(c) < f(x2).
It follows that if f′(c) ≠ 0, then we do not have an extremum at x = c. This is logically
equivalent to what we want to prove.
Without loss of generality, we can assume that f′(c) > 0 (if it is less than zero, then we
replace f by −f). It should be evident that we want x1 to be a little less than c and x2 to
be a little more, but both have to be very close to c. How close? Here is where we use the
definition of differentiability.
We let E(x, c) be the error introduced when the derivative is replaced by the average
rate of change:
$$E(x, c) = f'(c) - \frac{f(x) - f(c)}{x - c}.$$
If the absolute value of the error is smaller than |f′(c)|, then [f(x) − f(c)]/[x − c] will
have the same sign as f′(c). Since we have assumed that f′(c) is positive, we have
$$\frac{f(x) - f(c)}{x - c} > 0.$$
When x is less than c, f(x) will have to be less than f(c). When x is larger than c, f(x)
will have to be larger than f(c). The solution is therefore to find a δ for which
|x − c| < δ implies that |E(x, c)| < |f′(c)|.
The definition of differentiability promises us such a δ. We choose x1 and x2 so that
c − δ < x1 < c < x2 < c + δ.
Q.E.D.

Rolle’s Theorem
There is a special case of the mean value theorem that was noted by Michel Rolle (1652–
1719) in 1691 and was periodically resurrected over the succeeding years. At the time, it
seemed so obvious that no one bothered to prove it. In fact, it is equivalent to the mean
value theorem. Once we have proved it, we shall be almost there.

Theorem 3.10 (Rolle’s Theorem). Let f be a function that is continuous on [a, b]


and differentiable on (a, b) and for which f(a) = f(b) = 0. There exists at least one
c, a < c < b, for which f′(c) = 0.

Proof: Since our function is continuous on [a, b], Theorem 3.8 promises us that it must
achieve its maximal and minimal values somewhere on this interval. At least one of these

FIGURE 3.14. Proof of the mean value theorem.

extrema must occur at some x strictly between a and b. The only possible counterexample
would be a function with both extrema at the endpoints, but then 0 is the largest value of
the function and 0 is the smallest value of the function. The function would be identically
0 and so have an extremum at every point in [a, b]. Let c ∈ (a, b) be a point at which f has
an extremum. By Theorem 3.9, f′(c) = 0.
Q.E.D.

We note that this is a special case of the mean value theorem because the average rate of
change over this interval is
$$\frac{f(b) - f(a)}{b - a} = \frac{0 - 0}{b - a} = 0.$$
It is equivalent to the mean value theorem because we can find an auxiliary function that
enables us to reduce the mean value theorem to this case.

Mean Value Theorem


We are now ready to prove the mean value theorem, Theorem 3.1 on page 58, that if f is
continuous on [a, b] and differentiable on (a, b), then there is at least one c, a < c < b, for
which
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
Proof: (Mean Value Theorem) We subtract from our function the straight line passing
through [a, f (a)] and [b, f (b)] (Figure 3.14). The result is a new function that is continuous
on [a, b], differentiable on (a, b) and for which g(a) = g(b) = 0:
$$g(x) = f(x) - \frac{f(b) - f(a)}{b - a}\,(x - a) - f(a).$$
We apply Rolle’s theorem to g:
$$0 = g'(c) = f'(c) - \frac{f(b) - f(a)}{b - a},$$
and therefore
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
Q.E.D.
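
The auxiliary function g can be built and checked numerically for a concrete case. The sketch below is an illustration under extra assumptions: it uses f(x) = x³ on [0, 2] with a hand-supplied derivative, and it locates c by bisecting on the sign of g′, which works here because this particular f′ is continuous and increasing. The theorem itself only asserts that some c exists; the names `auxiliary` and `f_prime` are hypothetical.

```python
def auxiliary(f, a, b):
    """g(x) = f(x) - slope*(x - a) - f(a), where slope is the average rate of change."""
    slope = (f(b) - f(a)) / (b - a)
    return lambda x: f(x) - slope * (x - a) - f(a)

f = lambda x: x**3
f_prime = lambda x: 3 * x**2          # derivative supplied by hand
a, b = 0.0, 2.0
g = auxiliary(f, a, b)
slope = (f(b) - f(a)) / (b - a)       # equals 4 for this example

# g'(c) = f'(c) - slope is negative at a and positive at b, so bisect on its sign.
lo, hi = a, b
while hi - lo > 1e-12:
    mid = (lo + hi) / 2
    if f_prime(mid) - slope < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2                     # approximately 2 / sqrt(3)
```

Here g(a) = g(b) = 0 as Rolle's theorem requires, and the computed c satisfies f′(c) = 4 up to the tolerance.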

As was pointed out in section 3.2, this proof was discovered by Ossian Bonnet and
published in Serret’s calculus text of 1868. It should be noted that we have weakened
the assumptions that Cauchy made. Our function does not need to be differentiable at the
endpoints, and we certainly do not need the derivative to be continuous. There is no reason
why we cannot have an unbounded derivative, for example the function defined by
f(x) = x sin(1/x), x ≠ 0; f(0) = 0.
This is continuous on [0, 1], and it is differentiable on (0, 1), but it is not differentiable on
[0, 1]. The mean value theorem that we have just proven assures us that for every positive
x there is a point c, 0 < c < x, for which

$$\frac{x \sin(1/x) - 0}{x - 0} = \sin(1/x) = f'(c) = \sin(1/c) - c^{-1} \cos(1/c). \tag{3.49}$$

Exercises

The symbol
M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

3.4.1. Give an example of a function that exists and is bounded for all x in the interval
[0, 1] but which never achieves either its least upper bound or its greatest lower bound over
this interval.

3.4.2. Give an example of a bounded, continuous function which does not achieve its least
upper bound. Notice that the domain was not specified.

3.4.3. Give an example of a function whose derivative vanishes at x = 1, f'(1) = 0, but
which does not have an extremum at x = 1.

3.4.4. Prove that if A is not zero and |A − B| is less than |A|, then A and B must have the
same sign.

3.4.5. Prove that if a function is continuous at every point except x = c and is so
discontinuous at x = c that there is no response δ for any error bound ε, then the function
must have a vertical asymptote at x = c. Part of proving this statement is coming up with a
careful definition of a vertical asymptote.

3.4.6. Find the greatest lower bound and the least upper bound of each of the following
sets.
a. the interval (0, 3)
b. {1, 1/2, 1/4, 1/8, . . .}
c. {1, 1 + 1/2, 1 + 1/2 + 1/4, 1 + 1/2 + 1/4 + 1/8, . . .}


d. {2/1, (2 · 2)/(1 · 3), (2 · 2 · 4)/(1 · 3 · 3), (2 · 2 · 4 · 4)/(1 · 3 · 3 · 5), . . .} (See equation
(2.18) on page 24.)
e. {0.2, 0.22, 0.222, . . .}
f. the set of decimal fractions between 0 and 1 whose only digits are 0’s and 1’s
g. {(n + 1)^2 / 2^n | n ∈ N}
h. {(m + n)^2 / 2^(mn) | m, n ∈ N, m < 2n}
i. {m/n | m, n ∈ N}
j. {√n − ⌊√n⌋ | n ∈ N}
k. {x | x^2 + x + 1 > 0}
l. {x + x^(−1) | x > 0}
m. {2^x + 2^(1/x) | x > 0}
n. {m/n + 4n/m | m, n ∈ N}
o. {mn/(4m^2 + n^2) | m ∈ Z, n ∈ N}
p. {m/(m + n) | m, n ∈ N}
q. {m/(|m| + n) | m ∈ Z, n ∈ N}
r. {mn/(1 + m + n) | m, n ∈ N}

3.4.7. Prove that for any set S, the negative of the least upper bound of −S is a lower
bound for S, and there is no lower bound for S that is larger.

3.4.8. Modify the proof of Theorem 3.6 to prove that if f is continuous on (a, b),
lim_{x→a+} f(x) = f(a), and lim_{x→b−} f(x) = f(b), then there exist finite bounds A and
B such that A ≤ f(x) ≤ B for all x ∈ [a, b].

3.4.9. Modify the proof of Theorem 3.8 to prove that if f is continuous on (a, b),
lim_{x→a+} f(x) = f(a), and lim_{x→b−} f(x) = f(b), then there exist k_1, k_2 ∈ [a, b] such that
f(k_1) ≤ f(x) ≤ f(k_2) for all x ∈ [a, b].

3.4.10. For the mean value theorem (Theorem 3.1) and the generalized mean value the-
orem (Theorem 3.2), explain how the proofs need to be modified in order to weaken the
hypotheses so that instead of continuity at every point on a closed interval we only need
continuity on the open interval and continuity from one side at each of the endpoints.

3.4.11. Prove that if “every set with an upper bound has a least upper bound,” then the
nested interval principle holds.

3.4.12. Use the existence of a least upper bound for any bounded set to prove that if
g'(x) > 0 for all x ∈ [a, b], then g is increasing over [a, b] (a ≤ x_1 < x_2 ≤ b implies that
g(x_1) < g(x_2)).

3.4.13. Use the result from exercise 3.4.12 to prove that if f'(x) ≥ 0 for all x ∈ [a, b] and
if a ≤ x_1 < x_2 ≤ b, then f(x_1) ≤ f(x_2).
3.4.14. Prove that
\[ y = \frac{f(b) - f(a)}{b - a}\,(x - a) + f(a) \]
is the equation of the straight line through [a, f(a)] and [b, f(b)].

3.4.15. M&M In equation (3.49) on page 102, we showed that for any positive x, the mean
value theorem implies that there exists a value c, 0 < c < x, for which
\[ \sin(1/x) = \sin(1/c) - c^{-1} \cos(1/c). \]
Find (to ten-digit accuracy) such a c for each of the following values of x: 1, 1/3, and 0.01.

3.4.16. Using exercise 3.4.15, we can define a function g for which g(0) = 0 and if x > 0
then g(x) = c where c is the largest number less than x for which
\[ \sin(1/x) = \sin(1/c) - c^{-1} \cos(1/c). \]
Prove that g does not have the intermediate value property on [0, 1].

3.4.17. Explain why the conclusion of the mean value theorem still holds if we only assume
that f is continuous and differentiable on the open interval (a, b) and that
lim_{x→a+} f(x) = f(a), lim_{x→b−} f(x) = f(b).

3.4.18. Define
f (x) = x sin(ln x), x > 0; f (0) = 0.
Show that f on the interval [0, 1] satisfies the conditions of the mean value theorem given
in exercise 3.4.17. Prove that if x is positive, then there exists a c, 0 < c < x, for which
sin(ln c) + cos(ln c) = sin(ln x).

3.4.19. Using exercise 3.4.18, we can define a function h for which h(0) = 0 and if x > 0
then h(x) = c where c is the largest number less than x for which
sin(ln c) + cos(ln c) = sin(ln x).
Prove that h does not have the intermediate value property on [0, 1].

3.4.20. Prove that if f is differentiable on [a, b] and if f' is piecewise monotonic on [a, b],
then f' is continuous on [a, b].

3.4.21. M&M Graph the function
\[ f(x) = \frac{x}{1 + x \sin(1/x)}, \quad x \neq 0; \qquad f(0) = 0, \]
over the interval [0, 1]. Prove that it is differentiable and piecewise monotonic on the
interval [0, 1]. Is the derivative f' continuous on [0, 1]? Discuss this result in light of
exercise 3.4.20.
3.4.22. Let P(x) be any polynomial of degree at least 2, all of whose roots are real and
distinct. Prove that all of the roots of P'(x) must be real. What happens if some of the roots
of P are multiple roots?

3.4.23. Prove that if f is defined on (a, b), f achieves its maximum value at c ∈ (a, b),
and the one-sided derivatives f'_−(c) and f'_+(c) exist, then f'_+(c) ≤ 0 ≤ f'_−(c).

3.4.24. Prove that if f is continuous on [a, b], f(a) = f(b), and the one-sided derivative
f'_− exists for all x ∈ (a, b), then

\[ \inf\{ f'_-(x) \mid x \in (a, b) \} \le 0 \le \sup\{ f'_-(x) \mid x \in (a, b) \}. \]

3.4.25. Prove that if f is continuous on [a, b] and the one-sided derivative f'_− exists for
all x ∈ (a, b), then

\[ \inf\{ f'_-(x) \mid x \in (a, b) \} \le \frac{f(b) - f(a)}{b - a} \le \sup\{ f'_-(x) \mid x \in (a, b) \}. \]

3.4.26. Prove that if f'_− exists and is continuous for all x ∈ (a, b), then f is differentiable
on (a, b) and f'(x) = f'_−(x) for all x ∈ (a, b).

3.4.27. Does there exist a function f on (1, 2) such that f'_−(x) = x and f'_+(x) = 2x for all
x ∈ (1, 2)?

3.4.28. Let f be differentiable on [a, b] such that f(a) = 0 = f(b) and f'(a) > 0, f'(b) >
0. Prove that there is at least one c ∈ (a, b) for which f(c) = 0 and f'(c) ≤ 0.

3.4.29. Suppose that f is continuous on [a, ∞) and lim_{x→∞} f(x) is finite. Show that f is
bounded on [a, ∞).

3.4.30. Prove that if f is continuous on a closed interval [a, b], differentiable on the open
interval (a, b), and f(a) = f(b) = 0, then for any real number α there is an x ∈ (a, b) such
that

\[ \alpha f(x) + f'(x) = 0. \]

3.5 Consequences of the Mean Value Theorem


Cauchy used the generalized mean value theorem to prove Lagrange’s form of Taylor’s
theorem, so we begin by proving Theorem 3.2.

Proof: (Generalized Mean Value Theorem) We are given that f and F are continuous
in [a, b] and differentiable in (a, b) and that F'(x) is never zero for a < x < b. We define
a new function g by

g(x) = F (x) [f (b) − f (a)] − f (x) [F (b) − F (a)] .


The function g is also continuous on [a, b] and differentiable on (a, b) and

g(a) = F (a)f (b) − f (a)F (b) = g(b).

By the mean value theorem, there is a c strictly between a and b for which

\[ g'(c) = F'(c)\,[f(b) - f(a)] - f'(c)\,[F(b) - F(a)] = 0. \tag{3.50} \]

Since F'(x) is never zero for a < x < b, F(b) cannot equal F(a) (why not? See exercise 3.5.1)
and F'(c) ≠ 0. We can rewrite equation (3.50) as

\[ \frac{f(b) - f(a)}{F(b) - F(a)} = \frac{f'(c)}{F'(c)}. \tag{3.51} \]
Q.E.D.
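The proof works by finding a zero of g'. As a rough numerical sketch of that idea, the Python below applies bisection to g'(x) = F'(x)[f(b) − f(a)] − f'(x)[F(b) − F(a)] for the illustrative choice f = sin, F = exp on [0, 1]. The helper name `generalized_mvt_point` is hypothetical, and the sketch assumes that g' changes sign between the endpoints, which the theorem itself does not guarantee in general.

```python
import math

def generalized_mvt_point(f, df, F, dF, a, b, steps=200):
    """Locate c in (a, b) with f'(c)/F'(c) = (f(b)-f(a))/(F(b)-F(a)).

    Uses bisection on g'(x) = F'(x)[f(b)-f(a)] - f'(x)[F(b)-F(a)] and
    assumes g' has opposite signs at a and b (an assumption of this sketch).
    """
    rise_f = f(b) - f(a)
    rise_F = F(b) - F(a)
    g_prime = lambda x: dF(x) * rise_f - df(x) * rise_F
    lo, hi = a, b
    if g_prime(lo) * g_prime(hi) > 0:
        raise ValueError("g' does not change sign between a and b")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if g_prime(lo) * g_prime(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Illustrative functions: f = sin, F = exp on [0, 1]; F' = exp is never zero.
c = generalized_mvt_point(math.sin, math.cos, math.exp, math.exp, 0.0, 1.0)
lhs = (math.sin(1.0) - math.sin(0.0)) / (math.exp(1.0) - math.exp(0.0))
rhs = math.cos(c) / math.exp(c)
print(c, lhs, rhs)
```

The two printed ratios correspond to the two sides of equation (3.51).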

Finally, we are ready to prove Theorem 2.1 from page 44.

Proof: (Lagrange Remainder Theorem) We assume that the first k derivatives of f exist
in some neighborhood of x = a. We define F to be the difference between f and the
truncated Taylor series:

\[ F(x) = f(x) - f(a) - f'(a)(x - a) - \frac{f''(a)}{2!}(x - a)^2 - \cdots - \frac{f^{(k-1)}(a)}{(k-1)!}(x - a)^{k-1}. \tag{3.52} \]
We observe that

\[ F(a) = F'(a) = F''(a) = \cdots = F^{(k-1)}(a) = 0, \tag{3.53} \]

and

\[ F^{(k)}(x) = f^{(k)}(x). \tag{3.54} \]

We consider the fraction F(x) divided by (x − a)^k. Since both expressions are 0 when
x = a, we can subtract F(a) from the numerator and (a − a)^k from the denominator and
then apply the generalized mean value theorem. There must be some x_1 between x and a
for which

\[ \frac{F(x)}{(x - a)^k} = \frac{F(x) - F(a)}{(x - a)^k - (a - a)^k} = \frac{F'(x_1)}{k(x_1 - a)^{k-1}}. \tag{3.55} \]

We apply the generalized mean value theorem to this function of x_1:

\[ \frac{F'(x_1)}{k(x_1 - a)^{k-1}} = \frac{F'(x_1) - F'(a)}{k(x_1 - a)^{k-1} - k(a - a)^{k-1}} = \frac{F''(x_2)}{k(k-1)(x_2 - a)^{k-2}}, \tag{3.56} \]
for some x_2 between x_1 and a. We continue in this manner:

\[ \frac{F(x)}{(x - a)^k} = \frac{F'(x_1)}{k(x_1 - a)^{k-1}} = \frac{F''(x_2)}{k(k-1)(x_2 - a)^{k-2}} = \cdots = \frac{F^{(k-1)}(x_{k-1})}{k(k-1) \cdots 2\,(x_{k-1} - a)} = \frac{F^{(k)}(c)}{k!}, \tag{3.57} \]

where a < c < x_{k−1} < x_{k−2} < · · · < x_2 < x_1 < x. Since F^(k)(c) = f^(k)(c), we can rewrite
this last equation as

\[ F(x) = \frac{f^{(k)}(c)}{k!}\,(x - a)^k. \tag{3.58} \]
Q.E.D.

Cauchy realized that there was another way of expressing this error.

Theorem 3.11 (Cauchy’s Remainder Theorem). Given a function f for which all
derivatives exist at x = a, let D_n(a, x) denote the difference between the nth partial
sum of the Taylor series for f expanded about x = a and the target value f(x),

\[ D_n(a, x) = f(x) - \left( f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(x - a)^{n-1} \right). \tag{3.59} \]

There is at least one real number c strictly between a and x for which

\[ D_n(a, x) = \frac{f^{(n)}(c)}{(n-1)!}\,(x - c)^{n-1}(x - a). \tag{3.60} \]

Proof: We look at the difference between f(x) and the truncated series not as a function
of x but as a function of a:

\[ \varphi(a) = f(x) - f(a) - f'(a)(x - a) - \frac{f''(a)}{2!}(x - a)^2 - \cdots - \frac{f^{(k-1)}(a)}{(k-1)!}(x - a)^{k-1}. \tag{3.61} \]

We note that φ(x) = 0. Taking the derivative with respect to a, we see that

\[
\begin{aligned}
\varphi'(a) &= 0 - f'(a) - \left[ f''(a)(x - a) - f'(a) \right]
  - \left[ \frac{f'''(a)}{2!}(x - a)^2 - 2\,\frac{f''(a)}{2!}(x - a) \right] - \cdots \\
  &\qquad - \left[ \frac{f^{(k)}(a)}{(k-1)!}(x - a)^{k-1} - (k-1)\,\frac{f^{(k-1)}(a)}{(k-1)!}(x - a)^{k-2} \right] \\
  &= -\frac{f^{(k)}(a)}{(k-1)!}(x - a)^{k-1}.
\end{aligned}
\tag{3.62}
\]
We now use the mean value theorem just once:

\[ \frac{\varphi(a)}{a - x} = \frac{\varphi(a) - \varphi(x)}{a - x} = \varphi'(c) = -\frac{f^{(k)}(c)}{(k-1)!}\,(x - c)^{k-1}, \tag{3.63} \]

for some c between a and x. In this case, the remainder is

\[ \varphi(a) = \frac{f^{(k)}(c)}{(k-1)!}\,(x - c)^{k-1}(x - a). \tag{3.64} \]

Q.E.D.
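The constant c in the Lagrange form (3.58) and the constant c in the Cauchy form (3.64) are generally different numbers. The following Python sketch makes that concrete for the illustrative case f(x) = e^x, a = 0, x = 1, k = 3, solving each remainder formula for c by bisection. The helper `bisect` and the variable names are introduced only for this example.

```python
import math

def bisect(h, lo, hi, steps=200):
    """Plain bisection; assumes h(lo) and h(hi) have opposite signs."""
    if h(lo) * h(hi) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

k, a, x = 3, 0.0, 1.0

# Remainder after the Taylor terms through (x - a)^(k-1) for f = exp, where f^(j)(a) = e^a = 1.
remainder = math.exp(x) - sum((x - a) ** j / math.factorial(j) for j in range(k))

# Lagrange form: remainder = f^(k)(c) (x - a)^k / k!
c_lagrange = bisect(
    lambda c: math.exp(c) * (x - a) ** k / math.factorial(k) - remainder, a, x)

# Cauchy form: remainder = f^(k)(c) (x - c)^(k-1) (x - a) / (k-1)!
c_cauchy = bisect(
    lambda c: math.exp(c) * (x - c) ** (k - 1) * (x - a) / math.factorial(k - 1) - remainder, a, x)

print(remainder, c_lagrange, c_cauchy)
```

Both values of c lie strictly between a and x, as the two theorems require, but they need not coincide.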

Comparing Remainders
A good series for illustrating the distinction between these two expressions for the remainder
is the logarithmic series:
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^k \frac{x^{k-1}}{k-1} + R_k(x). \]
The Lagrange form of the remainder is

\[ R_k(x) = \frac{f^{(k)}(c)}{k!}\,x^k = (-1)^{k-1} \frac{x^k}{k(1 + c)^k}. \tag{3.65} \]

The Cauchy form is

\[ R_k(x) = \frac{f^{(k)}(c)}{(k-1)!}\,(x - c)^{k-1} x = (-1)^{k+1} \frac{x(x - c)^{k-1}}{(1 + c)^k}. \tag{3.66} \]

In each case, c is some constant (different in each case) lying between 0 and x.
If x = 2/3, then 0 < c < 2/3 and the absolute values of the respective remainders are
\[ \frac{(2/3)^k}{k(1 + c)^k} < \frac{2^k}{k \cdot 3^k} \]
(the Lagrange remainder is maximized when c = 0), and
\[ \frac{(2/3)(2/3 - c)^{k-1}}{(1 + c)^k} < \frac{2^k}{3^k} \]
(the Cauchy remainder is also maximized when c = 0). We see that the Lagrange form
gives a tighter bound.
If x = −2/3, then −2/3 < c < 0 and the absolute values of the respective remainders
are
\[ \frac{(2/3)^k}{k(1 + c)^k} < \frac{2^k}{k} \]
(the Lagrange remainder is maximized when c = −2/3), and
\[ \frac{(2/3)(2/3 + c)^{k-1}}{(1 + c)^k} < \frac{2^k}{3^k} \]
(the Cauchy remainder is maximized when c = 0). We see that the Cauchy form gives a
tighter bound in this case. In fact, the Cauchy bound approaches zero as k goes to infinity
while the Lagrange bound diverges to infinity.
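The comparison above can be checked numerically. The sketch below, which is an illustration rather than part of the original text, computes the actual remainder |R_k(x)| of the logarithmic series at x = 2/3 and x = −2/3 and sets it beside the two bounds just derived. The function names are hypothetical, and `bounds` hard-codes the bounds for these two values of x only.

```python
import math

def log_partial_sum(x, k):
    """x - x^2/2 + ... + (-1)^k x^(k-1)/(k-1): the terms that precede R_k(x)."""
    return sum((-1) ** (m + 1) * x ** m / m for m in range(1, k))

def bounds(x, k):
    """Return (|R_k(x)|, Lagrange bound, Cauchy bound); valid only for x = 2/3 or x = -2/3."""
    actual = abs(math.log(1 + x) - log_partial_sum(x, k))
    if x > 0:
        lagrange = 2 ** k / (k * 3 ** k)   # bound for x = 2/3
    else:
        lagrange = 2 ** k / k              # bound for x = -2/3
    cauchy = 2 ** k / 3 ** k               # same Cauchy bound at both points
    return actual, lagrange, cauchy

for k in (5, 10, 20, 40):
    print(k, bounds(2 / 3, k), bounds(-2 / 3, k))
```

For x = 2/3 the Lagrange column is the smaller bound; for x = −2/3 the Lagrange column grows with k while the Cauchy column shrinks, matching the discussion above.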

L’Hospital’s Rule
It is a familiar story that the Marquis de L’Hospital (1661–1704) stole what has come
to be known as L’Hospital’s rule from Johann Bernoulli. It needs to be tempered with
the observation that while the result is almost certainly Bernoulli’s, L’Hospital was a
respectable mathematician who had paid for the privilege of publishing Bernoulli’s results
under his own name.

To learn more about the Marquis de l’Hospital, his role in the early development of
calculus, our uncertainty over how to spell his name, and why we do not pronounce
the “s” in his name, go to The Marquis de l’Hospital.

We work with the Archimedean definition of limit given on page 59. We also need to be
careful about what we mean by an infinite limit and a limit at infinity.

Definition: infinite limit and limit at infinity

The statement
\[ \lim_{x \to a} f(x) = \infty \]
means that for any real number L, we can force f(x) > L by restricting x to be
sufficiently close to a. That is to say, there is a δ > 0 so that |x − a| < δ implies that
f(x) > L. When we write
\[ \lim_{x \to \infty} f(x) = T, \]
we mean that for any positive ε, we can force f(x) to be within ε of T by taking x
sufficiently large. In other words, there is an N so that x > N implies that |f(x) − T| < ε.

Theorem 3.12 (L’Hospital’s Rule: 0/0). If f and F are both differentiable inside an
open interval that contains a, if
\[ \lim_{x \to a} f(x) = 0 = \lim_{x \to a} F(x), \]
if F'(x) ≠ 0 for all x in this open interval, and if lim_{x→a} f'(x)/F'(x) exists, then

\[ \lim_{x \to a} \frac{f(x)}{F(x)} = \lim_{x \to a} \frac{f'(x)}{F'(x)}. \tag{3.67} \]

Note that this theorem has a lot of hypotheses. They are all important. As you work
through this proof, identify the places where the hypotheses are used. In the exercises,
you will show that each of these hypotheses is necessary by finding examples where the
conclusion does not hold when one of the hypotheses is removed. You will also be asked
to prove that this theorem remains true if a is replaced by ∞.
Proof: Since f and F are differentiable in this interval, they are continuous, and therefore
f(a) = 0 = F(a). The generalized mean value theorem tells us that
\[ \frac{f(x)}{F(x)} = \frac{f(x) - f(a)}{F(x) - F(a)} = \frac{f'(c)}{F'(c)}, \]
for some c between a and x. Let L be the value of lim_{x→a} f'(x)/F'(x). Given any ε > 0,
there is a response δ so that if |c − a| < δ, then
\[ \left| \frac{f'(c)}{F'(c)} - L \right| < \varepsilon. \]
If δ > |x − a|, then we also have that δ > |c − a| and so
\[ \left| \frac{f(x)}{F(x)} - L \right| = \left| \frac{f'(c)}{F'(c)} - L \right| < \varepsilon. \]

Q.E.D.

Note that there is nothing in this proof that requires that we work with values of x on
both sides of a or that either f or F is differentiable at a. In particular, L’Hospital’s rule
works equally well with one-sided limits (see exercise 3.5.6).
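As a small numerical illustration of the 0/0 rule (a sketch, not taken from the text), consider f(x) = e^x − 1 and F(x) = sin x at a = 0. Both functions vanish at 0 and F'(x) = cos x is nonzero near 0, so the hypotheses hold. The function name `ratios` is introduced here for illustration only.

```python
import math

def ratios(x):
    """Return (f(x)/F(x), f'(x)/F'(x)) for f(x) = e^x - 1 and F(x) = sin x."""
    # math.expm1 computes e^x - 1 accurately for small x.
    return math.expm1(x) / math.sin(x), math.exp(x) / math.cos(x)

for j in range(1, 7):
    x = 10.0 ** (-j)
    print(x, *ratios(x))
```

Both columns approach the same limit as x shrinks, which is what equation (3.67) asserts for this pair of functions.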

Theorem 3.13 (L’Hospital’s Rule: ∞/∞). If f and F are both differentiable at every
point except x = a in an open interval that contains a, if
\[ \lim_{x \to a} |F(x)| = \infty, \]
if F'(x) ≠ 0 for all x in this open interval, and if lim_{x→a} f'(x)/F'(x) exists, then

\[ \lim_{x \to a} \frac{f(x)}{F(x)} = \lim_{x \to a} \frac{f'(x)}{F'(x)}. \tag{3.68} \]

While we do not insist that limx→a |f (x)| = ∞, that is the only interesting case of this
theorem. Otherwise, the limit of f (x)/F (x) is 0 or does not exist (see exercise 3.5.5).
Proof: Here we need a great deal more finesse. We shall assume that L, the limit of
f'(x)/F'(x), is finite. The proof can be modified to handle an infinite limit.
We are given an error bound ε. We must find a response δ so that if x is within δ of a,
then f(x)/F(x) will lie within ε of L. We begin by observing that if we take two values
inside our open interval and both on the same side of a, call them x and s, then

\[ \frac{f(x) - f(s)}{F(x) - F(s)} = \frac{f'(c)}{F'(c)}, \tag{3.69} \]

for some c between x and s. We choose x and s so that x lies between a and s and so that
s is close enough to a to guarantee that |L − f'(c)/F'(c)| < ε/2 for any c between s and
a. The generalized mean value theorem implies that for any choice of x between s and a,
we have

\[ L - \frac{\varepsilon}{2} < \frac{f(x) - f(s)}{F(x) - F(s)} = \frac{f'(c)}{F'(c)} < L + \frac{\varepsilon}{2}. \tag{3.70} \]

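We fix our value for s. Since lim_{x→a} |F(x)| = ∞, there is a δ_1 for which |x − a| < δ_1
implies that |F(x)| > |F(s)| and therefore
\[ 1 - \frac{F(s)}{F(x)} \ge 1 - \left| \frac{F(s)}{F(x)} \right| > 0. \]
Multiplying equation (3.70) by
\[ 1 - \frac{F(s)}{F(x)} = \frac{F(x) - F(s)}{F(x)} \]
gives us
\[ \left( L - \frac{\varepsilon}{2} \right) \left( 1 - \frac{F(s)}{F(x)} \right) < \frac{f(x)}{F(x)} - \frac{f(s)}{F(x)} < \left( L + \frac{\varepsilon}{2} \right) \left( 1 - \frac{F(s)}{F(x)} \right). \tag{3.71} \]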

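This is equivalent to

\[ L - \frac{\varepsilon}{2} - \frac{F(s)[L - \varepsilon/2] - f(s)}{F(x)} < \frac{f(x)}{F(x)} < L + \frac{\varepsilon}{2} - \frac{F(s)[L + \varepsilon/2] - f(s)}{F(x)}. \tag{3.72} \]

Since s, L, and ε are fixed, we can find a δ_2 so that |x − a| < δ_2 implies that

\[ \left| \frac{F(s)[L - \varepsilon/2] - f(s)}{F(x)} \right| < \frac{\varepsilon}{2}, \tag{3.73} \]

\[ \left| \frac{F(s)[L + \varepsilon/2] - f(s)}{F(x)} \right| < \frac{\varepsilon}{2}. \tag{3.74} \]

Choose δ to be the smaller of δ_1 and δ_2. Equations (3.72)–(3.74) imply that if |x − a| < δ
then
\[ L - \varepsilon < \frac{f(x)}{F(x)} < L + \varepsilon. \]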
Q.E.D.

Intermediate Value Property for Derivatives


In exercise 3.1.14 of section 3.1, you were asked to prove that if lim_{x→a} f'(x) exists, then so
does f'(a), and they must be equal. This implies that where the limit exists, the derivative
must be continuous. Gaston Darboux (1842–1917) was the first to observe that even more
is true. Even if a derivative is not continuous, it must have the intermediate value property.
By Theorem 3.4, the modified converse of the intermediate value theorem, if a derivative
is not continuous then it cannot be piecewise monotonic. All examples of discontinuous
derivatives are similar to the derivative of x^2 sin(x^(−1)) which exists but is not continuous
at x = 0 because the derivative oscillates infinitely often in any neighborhood of 0. Our
proof is based on one discovered by Lars Olsen.
Theorem 3.14 (Darboux’s Theorem). If f is differentiable on [a, b], then f' has the
intermediate value property on [a, b].

Proof: We define a new function g:

\[
g(x) =
\begin{cases}
f'(a), & x = a, \\[4pt]
\dfrac{f(2x - a) - f(a)}{2x - 2a}, & a < x \le (a + b)/2, \\[8pt]
\dfrac{f(b) - f(2x - b)}{2b - 2x}, & (a + b)/2 \le x < b, \\[8pt]
f'(b), & x = b.
\end{cases}
\]

The function g is continuous on [a, b] (see exercise 3.5.19). Given any T between f'(a)
and f'(b), the intermediate value theorem promises us an x in [a, b] at which g(x) = T.
For all x in (a, b), g(x) is equal to
\[ \frac{f(t) - f(s)}{t - s} \]
for some pair s and t with a ≤ s < t ≤ b. By the mean value theorem, every value of g is
a value of the derivative of f at some point in [a, b].

Q.E.D.

Note that we could have weakened the hypotheses and only assumed that f is differentiable
on (a, b) and that f'_+(a) and f'_−(b) exist. The conclusion then applies to the function
defined as f' on (a, b), f'_+(a) at x = a, and f'_−(b) at x = b.
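The function g from the proof can be explored numerically. The Python sketch below uses f(x) = x^2 sin(1/x) with f(0) = 0 on [0, 1], whose derivative is discontinuous at 0, and a target value T = 0.5 between f'(0) = 0 and f'(1). It finds, by bisection on g, a chord of f whose slope equals T; the mean value theorem then supplies a point between s and t where f' equals T. The names `g` and `chord_for` and the choice of T are assumptions made for this illustration.

```python
import math

def f(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def df(x):
    return 2 * x * math.sin(1 / x) - math.cos(1 / x) if x != 0 else 0.0

def g(x, a, b):
    """The auxiliary function from the proof of Darboux's theorem."""
    if x == a:
        return df(a)
    if x == b:
        return df(b)
    if x <= (a + b) / 2:
        return (f(2 * x - a) - f(a)) / (2 * x - 2 * a)
    return (f(b) - f(2 * x - b)) / (2 * b - 2 * x)

def chord_for(T, a, b, steps=200):
    """Find s < t in [a, b] with (f(t) - f(s))/(t - s) = T, by bisection on g - T."""
    lo, hi = a, b
    if (g(lo, a, b) - T) * (g(hi, a, b) - T) > 0:
        raise ValueError("T is not between g(a) and g(b)")
    for _ in range(steps):
        m = (lo + hi) / 2
        if (g(lo, a, b) - T) * (g(m, a, b) - T) <= 0:
            hi = m
        else:
            lo = m
    x = (lo + hi) / 2
    return (a, 2 * x - a) if x <= (a + b) / 2 else (2 * x - b, b)

s, t = chord_for(0.5, 0.0, 1.0)
slope = (f(t) - f(s)) / (t - s)
print(s, t, slope)
```

The sketch stops at the chord; it does not locate the point c itself, since f' − T need not change sign at the endpoints s and t in a way that bisection can exploit.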

Exercises

3.5.1. Prove that if F is continuous on [a, b] and differentiable on (a, b) and if F'(x) is
not zero for any x strictly between a and b, then F(b) ≠ F(a).

3.5.2. Show that the approximation formula

\[ \sqrt{1 + x} \approx 1 + \frac{1}{2} x - \frac{1}{8} x^2 \]

gives √(1 + x) with an error not greater than |x|^3/2, if |x| < 1/2.

3.5.3. For x > −1, x ≠ 0, show that

\[ (1 + x)^{\alpha} > 1 + \alpha x, \quad \text{if } \alpha > 1 \text{ or } \alpha < 0, \]

\[ (1 + x)^{\alpha} < 1 + \alpha x, \quad \text{if } 0 < \alpha < 1. \]

3.5.4. Show that each of the following equations has exactly one real root.
a. x^13 + 7x^3 − 5 = 0
b. 3^x + 4^x = 5^x
3.5.5. Prove that under the hypothesis of Theorem 3.13, if lim_{x→a} |f(x)| ≠ ∞, then
\[ \lim_{x \to a} \frac{f(x)}{F(x)} = 0 \quad \text{or does not exist.} \]

3.5.6. Show that the 0/0 and ∞/∞ forms of L’Hospital’s rule also work for one-sided
limits. That is to say, explain how to modify the given proofs so that if f and F are both
differentiable in an open interval whose left-hand endpoint is a, if
\[ f(a) = \lim_{x \to a^+} f(x) = 0 = \lim_{x \to a^+} F(x) = F(a), \]
or
\[ \lim_{x \to a^+} F(x) = \infty, \]
if F'(x) ≠ 0 for all x in this open interval, and if lim_{x→a+} f'(x)/F'(x) exists, then

\[ \lim_{x \to a^+} \frac{f(x)}{F(x)} = \lim_{x \to a^+} \frac{f'(x)}{F'(x)}. \tag{3.75} \]

3.5.7. Explain what is wrong with the following application of L’Hospital’s rule:
To evaluate lim_{x→0} (3x^2 − 1)/(x − 1), apply L’Hospital’s rule:
\[ \lim_{x \to 0} \frac{3x^2 - 1}{x - 1} = \lim_{x \to 0} \frac{6x}{1} = 0. \]

From the original function, however, it can be seen that as x approaches zero, the
function approaches 1.

3.5.8. Explain what is wrong with the following application of L’Hospital’s rule:
Let f(x) = x^2 sin(1/x), F(x) = x. Each of these functions approaches 0 as x
approaches 0, so by L’Hospital’s rule
\[ \lim_{x \to 0} \frac{f(x)}{F(x)} = \lim_{x \to 0} \frac{f'(x)}{F'(x)} = \lim_{x \to 0} \frac{2x \sin(1/x) - \cos(1/x)}{1}, \]
which does not exist.

3.5.9. This exercise pursues a more subtle misapplication of the ∞/∞ form of L’Hospital’s
rule for limits from the right (see exercise 3.5.6). We begin with the functions
\[ f(x) = \cos(x^{-1}) \sin(x^{-1}) + x^{-1}, \]
\[ F(x) = \left( \cos(x^{-1}) \sin(x^{-1}) + x^{-1} \right) e^{\sin(x^{-1})}. \]

a. Show that lim_{x→0+} f(x) = ∞ = lim_{x→0+} F(x).

b. Show that the ratio of the derivatives of f and F simplifies to
\[ \frac{f'(x)}{F'(x)} = \frac{2x \cos(x^{-1})\, e^{-\sin(x^{-1})}}{2x \cos(x^{-1}) + x \cos(x^{-1}) \sin(x^{-1}) + 1}, \]
and that this approaches 0 as x approaches 0 from the right.
c. Show that f(x)/F(x) simplifies to

\[ \frac{f(x)}{F(x)} = e^{-\sin(x^{-1})} \]

which oscillates between e and e^(−1) as x approaches 0 and thus does not have a limit.
d. Which hypothesis of L’Hospital’s rule is violated by these functions?
e. Where was that hypothesis used in the proof? Identify the point at which the proof
breaks down for these functions.

3.5.10. Modify the proof of the ∞/∞ form of L’Hospital’s rule to prove that if f and
F are differentiable at every x in some neighborhood of a, if F' is never zero in this
neighborhood, if lim_{x→a} |F(x)| = ∞, and if
\[ \lim_{x \to a} \frac{f'(x)}{F'(x)} = \infty, \]
then
\[ \lim_{x \to a} \frac{f(x)}{F(x)} = \infty. \]

3.5.11. Use L’Hospital’s rule to prove that

\[ \lim_{x \to 0} \frac{e^{-1/x^2}}{x} = 0. \tag{3.76} \]

Use this to prove that if f(x) = e^(−1/x^2) when x ≠ 0 and f(0) = 0, then f'(0) = 0.

3.5.12. Prove by induction that for any positive integer n,

\[ \lim_{x \to 0} \frac{e^{-1/x^2}}{x^n} = 0. \tag{3.77} \]

3.5.13. Compare the remainder terms of Lagrange and Cauchy for the truncated Taylor
series for f(x) = e^x when x = 2, expanded around a = 0. Which remainder gives a tighter
bound on the error?

3.5.14. Prove that over the interval [0, 2/3] with k ≥ 1 both
\[ \frac{(2/3)^k}{k(1 + c)^k} \quad \text{and} \quad \frac{(2/3)(2/3 - c)^{k-1}}{(1 + c)^k} \]
are maximized at c = 0.

3.5.15. Prove that over the interval [−2/3, 0] with k ≥ 3,
\[ \frac{(2/3)^k}{k(1 + c)^k} \]
is maximized at c = −2/3 while
\[ \frac{(2/3)(2/3 + c)^{k-1}}{(1 + c)^k} \]
is maximized at c = 0.

3.5.16. Graph the function y = x^(1/x), x > 0. Approximately where does it achieve its
maximum? Use L’Hospital’s rule to prove that
\[ \lim_{x \to \infty} \ln\left( x^{1/x} \right) = 0. \]
It follows that
\[ \lim_{x \to \infty} x^{1/x} = 1. \]

3.5.17. Let f and g be functions with continuous second derivatives on [0, 1] such that
g'(x) ≠ 0 for x ∈ (0, 1) and f'(0)g''(0) − f''(0)g'(0) ≠ 0. Define a function θ for x ∈ (0, 1)
so that θ(x) is one of the values that satisfies the generalized mean value theorem,
\[ \frac{f(x) - f(0)}{g(x) - g(0)} = \frac{f'(\theta(x))}{g'(\theta(x))}. \]
Show that
\[ \lim_{x \to 0^+} \frac{\theta(x)}{x} = \frac{1}{2}. \]

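3.5.18. Use L’Hospital’s rule to evaluate the following limits.

a. \[ \lim_{x \to 1} \frac{\arctan\left( \frac{x^2 - 1}{x^2 + 1} \right)}{x - 1} \]
b. \[ \lim_{x \to +\infty} x \left( \left( 1 + \frac{1}{x} \right)^x - e \right) \]
c. \[ \lim_{x \to 5} (6 - x)^{1/(x - 5)} \]
d. \[ \lim_{x \to 0^+} \left( \frac{\sin x}{x} \right)^{1/x} \]
e. \[ \lim_{x \to 0^+} \left( \frac{\sin x}{x} \right)^{1/x^2} \]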

3.5.19. Prove that the function g defined in the proof of Darboux’s theorem is continuous.
L’Hospital’s rule for one-sided limits looks tempting, but it assumes that the derivative of
f is continuous on that side. To be safe, use the Cauchy definition of the derivative and
find the δ response that forces |E(2x − a, a)| < ε.

3.5.20. Find another function h that can be used to prove Darboux’s theorem. What makes
the function g work is that for a < x < b it is equal to (f(t) − f(s))/(t − s) where s and
t are continuous functions of x, a ≤ s < t ≤ b, and
\[ \lim_{x \to a} \frac{f(t) - f(s)}{t - s} = f'(a), \qquad \lim_{x \to b} \frac{f(t) - f(s)}{t - s} = f'(b). \]

4
The Convergence of Infinite Series

We have seen that when we talk about an infinite series, we are really talking about the
sequence of partial sums. The definitions of infinite series and of convergence on pages 12
and 18 are stated in terms of the partial sums. This is the approach that will enable us to
handle any infinite process. Thus the question “What is the value of 0.99999 . . .?” is not
well-posed until we clarify what we mean by such an infinite string of 9’s. Our interpre-
tation will be the limit of the sequence of finite strings of 9’s: 0.9, 0.99, 0.999, 0.9999,
0.99999, . . . . If we combine this with the Archimedean understanding of such a limit: the
number T such that for any L < T and any M > T , all of the finite strings from some
point on will lie strictly between L and M, then the meaning and value of 0.99999 . . . are
totally unambiguous. The value is 1.
Rather than using L and M, we shall follow the same procedure as we did in the last
chapter and choose symmetric bounds. We choose an ε > 0 and then use L = T − ε
and M = T + ε. In terms of ε, the definition of convergence of an infinite series is as
follows.

Definition: convergence of an infinite series


An infinite series converges if there is a value T with the property that for each ε > 0
there is a response N so that all of the partial sums with at least N terms lie strictly
within the open interval (T − ε, T + ε).
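To see the definition at work on the example 0.99999 . . . discussed above, the following Python sketch computes, for a given ε, the smallest response N for the target T = 1. It uses exact rational arithmetic so that no rounding obscures the comparison. The function names `partial_sum` and `response` are introduced for this illustration.

```python
from fractions import Fraction

def partial_sum(n):
    """0.99...9 with n nines, as an exact fraction: 9/10 + 9/100 + ... + 9/10^n."""
    return sum(Fraction(9, 10 ** j) for j in range(1, n + 1))

def response(eps):
    """Smallest N whose partial sum lies strictly inside (1 - eps, 1 + eps).

    Because the partial sums increase toward 1, every later partial sum
    also lies inside the interval, so N is a valid response to eps.
    """
    n = 1
    while 1 - partial_sum(n) >= eps:
        n += 1
    return n

print(response(Fraction(1, 1000)))  # the partial sum 0.9999 is the first within 1/1000 of 1
```

Since the partial sum with n nines equals 1 − 10^(−n), a response exists for every positive ε, which is exactly what convergence to 1 requires.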

This chapter is devoted to answering two basic questions:


• How do we know if a particular infinite series converges?
• If we know that a particular infinite series converges, how do we find its value?

Neither question is easy, and there are no universal procedures for finding an answer. In
some sense, the second question is meaningless. We know that
\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \]
has the value ln 2, but what do we mean by ln 2? My calculator tells me that ln 2 is
.6931471806, which I know is wrong because the natural logarithm of 2 is not a rational
number. Those ten digits give me an approximation. We have just finished seeing that a
convergent series is a sequence of approximations that can be used to obtain any degree of
accuracy we desire. It may require many terms, but the infinite series carries within itself a
better approximation to ln 2 than the ten digit decimal. We might as well call this number
1 − 1/2 + 1/3 − 1/4 + · · · .
This is a bit ingenuous. It is nice to know that the sequence of partial sums approaches
a number which, when exponentiated, yields precisely 2. Recognizing the value of a
convergent series as a number we have seen in another context can be very useful. But we
need to be alert to the fact that asking for the precise value of a convergent series is not
always meaningful. There may be no better way of expressing that value than as the limit
of the partial sums of the series. We return to the first question. How do we know if a series
converges?

4.1 The Basic Tests of Convergence


A highly unreliable method of deciding convergence is to actually calculate the partial
sum of the first hundred or thousand or million terms. If you know something more than
these first terms, then these calculations may give you some useful indications, but the first
million summands in and of themselves tell you nothing about the next million summands,
nor the million after them. It is even less true that as soon as the partial sums start agreeing
to within the accuracy of your calculations, you have found the value of the series.

Stirling’s Series

Stirling’s formula (page 45) says that n! is well approximated by (n/e)^n √(2πn). One of the
ways of making explicit what we mean by “well approximated” is that the logarithms of
these two functions of n differ by an amount that approaches 0 as n increases:

\[ \ln(n!) = n \ln n - n + \frac{1}{2} \ln(2\pi n) + E(n), \qquad \lim_{n \to \infty} E(n) = 0. \tag{4.1} \]

There is an explicit series for E(n) given in terms of the Bernoulli numbers, rational
numbers that were first discovered by Jacob Bernoulli as an aid to calculating sums of
powers,

\[ 1^k + 2^k + 3^k + \cdots + n^k, \qquad k = 1, 2, 3, \ldots. \]

To learn Bernoulli’s formula for the sum of consecutive integers raised to any
fixed positive integer power, go to Appendix A.2, Bernoulli’s Numbers.
Table 4.1. Values of S_k to ten-digit accuracy.

   k     S_k
   1     0.008333333333
   2     0.008330555556
   3     0.008330563492
   4     0.008330563433
   5     0.008330563433
   6     0.008330563433
   7     0.008330563433
   8     0.008330563433
   9     0.008330563433
  10     0.008330563433
  ...    ...

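The easiest way to define these numbers is in terms of a power series expansion:

\[ \frac{x}{e^x - 1} + \frac{x}{2} = 1 + \sum_{k=1}^{\infty} B_{2k} \frac{x^{2k}}{(2k)!}. \tag{4.2} \]

The first few values are

\[ B_2 = \frac{1}{6}, \quad B_4 = \frac{-1}{30}, \quad B_6 = \frac{1}{42}, \quad B_8 = \frac{-1}{30}, \quad B_{10} = \frac{5}{66}, \quad B_{12} = \frac{-691}{2730}. \]

The series expansion of the error term E(n) = ln(n!) − (n ln n − n + ln(2πn)/2) is

\[ \frac{B_2}{1 \cdot 2 \cdot n} + \frac{B_4}{3 \cdot 4 \cdot n^3} + \frac{B_6}{5 \cdot 6 \cdot n^5} + \cdots + \frac{B_{2k}}{(2k - 1) \cdot 2k \cdot n^{2k-1}} + \cdots. \tag{4.3} \]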

Does this series converge?

Web Resource: To explore the convergence of this error term for different values
of n, go to Stirling’s formula. More information on Stirling’s formula including
its derivation can be found in Appendix A.4, The Size of n!.

We let n = 10 and start calculating the partial sums:

\[ S_k = \frac{B_2}{1 \cdot 2 \cdot 10} + \frac{B_4}{3 \cdot 4 \cdot 10^3} + \cdots + \frac{B_{2k}}{(2k - 1) \cdot 2k \cdot 10^{2k-1}}. \]

It looks as if this series converges and that it converges quite rapidly. The values in
Table 4.1 are given with ten-digit accuracy.
This is pretty good. The true value of ln(10!) − 10 ln 10 + 10 − ln(20π)/2 to ten digits
is 0.008330563433. It appears that this series converges to the true value of the error.
But a little after k = 70, something starts to go wrong (see Table 4.2).
Table 4.2. Values of S_k to ten-digit accuracy.

   k     S_k
  ...    ...
  70     0.008330563433
  71     0.008330563433
  72     0.008330563432
  73     0.008330563436
  74     0.008330563418
  75     0.008330563514
  76     0.008330562971
  77     0.008330566127
  78     0.008330547295
  79     0.008330662638
  80     0.008329937885
  81     0.008334608215
  82     0.008303752990
  83     0.008512682811
  84     0.007063134389
  85     0.017364593510
  86    −0.05760318347
  87     0.5009177478
  88    −3.757762841
  89    29.46731813
  90  −235.6875347
  ...    ...

In fact, this series does not converge. Even for A = 0.008330563433 and an error bound
of ε = 0.01, there is no N that we can use as a reply. For any k above 85, the partial sums
all differ from A by more than 0.01.
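The behavior of these partial sums can be reproduced with exact rational arithmetic. The sketch below is an illustration, not the Maple or Mathematica code referred to in the text: it generates the Bernoulli numbers from the standard recurrence sum_{j=0}^{i} C(i+1, j) B_j = 0 (which is equivalent to the generating-function definition for the even-indexed numbers used here) and then accumulates the partial sums S_k of series (4.3) for n = 10. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] exactly, using sum_{j=0}^{i} C(i+1, j) B_j = 0 for i >= 1."""
    B = [Fraction(1)]
    for i in range(1, m + 1):
        B.append(-sum(comb(i + 1, j) * B[j] for j in range(i)) / (i + 1))
    return B

def stirling_partial_sums(n, kmax):
    """Return [S_1, ..., S_kmax] for the error series (4.3) at the given n."""
    B = bernoulli_numbers(2 * kmax)
    sums, total = [], Fraction(0)
    for k in range(1, kmax + 1):
        total += B[2 * k] / ((2 * k - 1) * (2 * k) * Fraction(n) ** (2 * k - 1))
        sums.append(total)
    return sums

S = stirling_partial_sums(10, 90)   # S[k - 1] holds S_k
for k in (1, 2, 3, 5, 70, 80, 85, 90):
    print(k, float(S[k - 1]))
```

Working with `Fraction` rather than floating point rules out rounding error as the cause of the late blow-up: the divergence visible in the tables is a property of the series itself.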

A Preview of Abel’s Test


On the other hand, it can take a convergent series a very long time before it closes in on its
value. As we shall see at the end of this chapter,

\[ \sum_{k=2}^{\infty} \frac{\sin(k/100)}{\ln k} \]

is a convergent series, but if we look at the partial sums:

\[ S_n = \sum_{k=2}^{n} \frac{\sin(k/100)}{\ln k}, \]
we see that at least as far as n = 3000 they do not seem to be settling down. Figure 4.1
is a plot of the values of the partial sums at the multiples of 100 from 100 to 3000.
Among the partial sums are S_100 = 11.6084, S_200 = 30.7754, S_300 = 41.1982, . . . , S_1300 =
11.5691, S_1400 = 22.2942, . . . , S_2200 = 37.2332, S_2300 = 31.1325, . . . , S_2900 = 33.6201,
and S_3000 = 22.3079.

[FIGURE 4.1. Plots of points (n, S_n) where S_n = \sum_{k=2}^{n} sin(k/100)/ln k, for n up to 3000.]
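The partial sums quoted above are easy to recompute. The following Python sketch accumulates S_n in floating point and records its value at each multiple of 100 up to 3000; the function name `partial_sums_at_marks` and its parameters are choices made for this illustration.

```python
import math

def partial_sums_at_marks(n_max=3000, step=100):
    """Return {n: S_n} for n = step, 2*step, ..., n_max, with S_n = sum_{k=2}^{n} sin(k/100)/ln k."""
    marks, total = {}, 0.0
    for k in range(2, n_max + 1):
        total += math.sin(k / 100) / math.log(k)
        if k % step == 0:
            marks[k] = total
    return marks

S = partial_sums_at_marks()
for n in (100, 200, 300, 1300, 1400, 2200, 2300, 2900, 3000):
    print(n, round(S[n], 4))
```

The recorded values swing up and down over a wide range, which is why no finite table of partial sums can settle the question of convergence by itself.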
It is also not enough to ask if the summands are approaching zero. The numbers 1, 1/2,
1/3, 1/4, 1/5, . . . approach 0, but

\[ 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \]

is the harmonic series which we know does not converge. A common explanation is to say
that these summands do not go to zero “fast enough,” but we must be more careful than
this. After all,

\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2 \]

does converge and its summands have exactly the same absolute values as those in the
harmonic series.
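A short numerical comparison, offered as a sketch rather than as evidence of convergence or divergence (the preceding pages explain why partial sums alone prove nothing), shows how differently the two series behave even though their summands have the same absolute values. The function name `partial_sums` is introduced here for illustration.

```python
import math

def partial_sums(n):
    """Return (H_n, A_n): the nth partial sums of the harmonic and alternating harmonic series."""
    harmonic = alternating = 0.0
    for k in range(1, n + 1):
        harmonic += 1 / k
        alternating += (-1) ** (k + 1) / k
    return harmonic, alternating

for n in (10, 1000, 100000):
    H, A = partial_sums(n)
    print(n, H, A, math.log(2))
```

The harmonic partial sums keep growing, roughly like ln n, while the alternating partial sums stay close to ln 2.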
When the Summands Do Not Approach 0


In section 2.5 we saw d’Alembert’s analysis of the binomial series and his proof that
the summands do not approach zero when |x| > 1. He concluded that the series cannot
converge. The justification for his conclusion is given in the next theorem.

Theorem 4.1 (The Divergence Theorem). Let a1 + a2 + a3 + · · · be an infinite series.
If this series converges, then the summands approach zero. More precisely, if this series
converges and we are given any positive error bound ε, then there is a positive integer
N for which all summands beyond the Nth have absolute value less than ε:
n ≥ N implies that |an | < ε.

Before we prove this theorem, I want to emphasize what it does and what it does not say.
The converse of any theorem reverses the direction of implication. The inverse states that
the negation of the hypothesis implies the negation of the conclusion. The contrapositive is
that the negation of the conclusion implies the negation of the hypothesis. For Theorem 4.1
these are

converse: “If the summands approach zero, then the series converges.” We know that
this is false.
inverse: “If the series diverges, then the summands do not approach zero.” The harmonic
series also contradicts this statement.
contrapositive: “If the summands do not approach zero, then the series diverges.” This
is logically equivalent to Theorem 4.1. It is the reason we call this the divergence
theorem. We shall use it both ways. It can provide a fast and easy way of seeing that
a series must diverge, but it also tells us something very useful about the summands
whenever we know that our series converges.

Note that the inverse is the contrapositive of the converse, so these two statements are
logically equivalent to each other. One is true if and only if the other is also. Whenever you
see a theorem, it is always worth asking whether the converse is also true. If you think it
might not always hold, can you think of an example for which it does not hold?

Proof: From the definition of convergence, we know that there is a number T for which
we can always win the ε–N game on the partial sums. The nth summand is the difference
between the nth partial sum and the one just before it:

|an | = |(a1 + a2 + · · · + an ) − (a1 + a2 + · · · + an−1 )|


= |(a1 + a2 + · · · + an ) − T + T − (a1 + a2 + · · · + an−1 )|
≤ |(a1 + a2 + · · · + an ) − T | + |T − (a1 + a2 + · · · + an−1 )|. (4.4)

We assign half of our bound to each of these differences. We find an N so that if m ≥ N ,
then

|(a1 + a2 + · · · + am ) − T | < ε/2.



As long as n is at least N + 1, we have that

|an | ≤ |(a1 + a2 + · · · + an ) − T | + |T − (a1 + a2 + · · · + an−1 )|
< ε/2 + ε/2 = ε. (4.5)

Q.E.D.

The Cauchy Criterion


It was Cauchy in his 1821 Cours d’analyse who presented the first systematic treatment of
the question of convergence of infinite series. He began by facing the question: how can
we determine whether the partial sums are approaching a value T when we do not know
the value of T ? The answer is known as the Cauchy criterion.

Theorem 4.2 (The Cauchy Criterion). Let a1 + a2 + a3 + · · · be an infinite series
whose partial sums are denoted by Sn = a1 + a2 + · · · + an . This series converges if
and only if the partial sums can be brought arbitrarily close together by taking the
subscripts sufficiently large. Specifically, it converges if and only if for any positive
error bound ε, we can always find a subscript N such that for any pair of partial sums
beyond the Nth (m, n ≥ N), we have

|Sm − Sn | < ε. (4.6)

In Cauchy’s own words, “It is necessary and it suffices that, for infinitely large values
of the number n, the sums Sn , Sn+1 , Sn+2 , . . . differ from the limit S, and in consequence
among each other, by infinitely small quantities.” This was not stated as a theorem by
Cauchy, but merely as an observation. He did prove that if the series converges, then for
every ε > 0 there is a response N for which equation (4.6) must hold whenever m and n
are at least as large as N . He stated the converse but did not prove it. As we shall see, this
is the difficult part of the proof. It is also the heart of the theorem because it gives us a
means for proving that a series converges even when we have no idea of the value to which
it converges.
We say that a series is Cauchy if its partial sums can be forced arbitrarily close together
by taking sufficiently many terms. Theorem 4.2 can be stated succinctly as: an infinite
series converges if and only if it is Cauchy.

Definition: Cauchy sequence and series


An infinite sequence {S1 , S2 , S3 , . . .} is Cauchy if for any positive error bound ε, we
can always find a subscript N such that N ≤ m < n implies that |Sm − Sn | < ε. A
series is Cauchy if the sequence of its partial sums is a Cauchy sequence.
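The definition can be explored numerically, with the caveat that a computer can only examine a finite window N ≤ m, n ≤ M and so can suggest, but never establish, that a series is Cauchy. The sketch below (the function name `max_spread` and the window sizes are our own choices) measures the largest gap |Sm − Sn| inside such a window for the harmonic and alternating harmonic series.

```python
def max_spread(terms, N, M):
    """Largest |S_m - S_n| over N <= m, n <= M, where S_j is the sum of the first j terms."""
    window, total = [], 0.0
    for j in range(1, M + 1):
        total += terms(j)
        if j >= N:
            window.append(total)
    return max(window) - min(window)

# Harmonic series: the spread over [N, 2N] never drops below 1/2.
print(max_spread(lambda k: 1 / k, 1000, 2000))
# Alternating harmonic series: the spread shrinks as N grows.
print(max_spread(lambda k: (-1) ** (k + 1) / k, 1000, 2000))
```

For the harmonic series the spread over the window from N to 2N stays above 1/2 no matter how large N is, which is exactly the failure of the Cauchy condition.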

Proof: We will work with the sequence of partial sums and prove that a sequence converges
if and only if it is Cauchy. We start with the easy direction. If our sequence converges, then
it has a value T , and we can force our terms to be arbitrarily close to T . Noting that
|Sm − Sn | = |Sm − T + T − Sn | ≤ |Sm − T | + |T − Sn |,
we split our error bound in half and find an N such that n ≥ N implies that
|Sn − T | < ε/2.
If m and n are both at least N , then
|Sm − Sn | ≤ |Sm − T | + |T − Sn | < ε/2 + ε/2 = ε.
The converse is harder. We need to show that there is a value T to which the sequence
converges. We are going to use Theorem 3.7 which states that every set with an upper
bound has a least upper bound. As we saw, this implies that any set with a lower bound has
a greatest lower bound.
Start with the set of all terms of the sequence. The fact that the sequence is Cauchy
guarantees that this set is bounded because we can find an n that is the response to ε = 1.
All of the terms from the nth on sit inside the interval (Sn − 1, Sn + 1). We are left with
{S1 , S2 , . . . , Sn−1 }, but any finite set is bounded. The entire sequence must be bounded. By
Theorem 3.7, this set has a greatest lower bound; call it L1 .
While L1 might be our target value, it also might not. Consider the sequence of partial
sums of the alternating harmonic series: 1 − 1/2 + 1/3 − 1/4 + · · · . For this series, L1 =
1 − 1/2 = 1/2. If we throw out the first two partial sums and consider the greatest lower
bound of {S3 , S4 , . . .}, then the greatest lower bound is 1 − 1/2 + 1/3 − 1/4 = 7/12. That
is still not the target value, but it is getting closer. We continue throwing away those partial
sums with only a few terms. In general, we let Lk denote the greatest lower bound of the
set {Sk , Sk+1 , Sk+2 , . . .}. Notice that as we throw away terms, the greatest lower bound can
only increase: L1 ≤ L2 ≤ L3 ≤ · · · . These Lk are bounded by any upper bound on our
sequence, and so they have a least upper bound. Let M be this least upper bound of the Lk .
I claim that this is the target value for the series.
To prove that M is the target value, we have to demonstrate that given any ε > 0, there
is a response N so that all of the terms from the N th term on lie inside the open interval
(M − ε, M + ε). By the definition of a least upper bound (page 97), we know that there is
at least one Lk that is larger than M − ε, call it LK . Since LK is the greatest lower bound
of all terms starting with the Kth, all terms starting with the Kth are strictly greater than
M − ε.
We have used the fact that this sequence is Cauchy to conclude that it must be bounded,
but not every bounded sequence also converges (consider {1, −1, 1, −1, . . .}). We now
need to use the full power of being Cauchy to find an N for which all terms starting with
the Nth are strictly less than M + ε.
We choose an N ≥ K such that m, n ≥ N implies that |Sn − Sm | < ε/2. Since LN is the
greatest lower bound among all Sn , n ≥ N , we can find an m ≥ N so that Sm is within ε/2
of LN . It follows that for any n ≥ N ,

\[ M - \varepsilon < L_K \le L_N \le S_n < S_m + \frac{\varepsilon}{2} < L_N + \varepsilon \le M + \varepsilon. \tag{4.7} \]
Q.E.D.
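The increasing lower bounds Lk used in this proof can be computed exactly for the alternating harmonic series. The following sketch is an illustration under one stated assumption: it takes the minimum over a finite window of partial sums, which agrees with the true greatest lower bound here only because the even-numbered partial sums of this particular series increase while the odd-numbered ones stay above them. The names `partial_sums` and `tail_lower_bound` are ours.

```python
from fractions import Fraction

def partial_sums(n_max):
    """Exact partial sums S_1, ..., S_n_max of 1 - 1/2 + 1/3 - ... (index 0 holds S_1)."""
    sums, total = [], Fraction(0)
    for k in range(1, n_max + 1):
        total += Fraction((-1) ** (k + 1), k)
        sums.append(total)
    return sums

def tail_lower_bound(sums, k):
    """Minimum of S_k, S_{k+1}, ... over the finite window that was computed."""
    return min(sums[k - 1:])

sums = partial_sums(200)
for k in (1, 3, 5, 51, 101):
    L = tail_lower_bound(sums, k)
    print(k, float(L))
```

The printed values increase toward ln 2 ≈ 0.6931, beginning with L1 = 1/2 and L3 = 7/12 as computed in the proof.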

Completeness
We have shown that the nested interval principle implies that every set with an upper bound
has a least upper bound, and we have shown that if every set with an upper bound has a
least upper bound, then every Cauchy series converges. We will now complete the cycle
by showing that if every Cauchy series converges, then the nested interval principle must
hold. This does not prove the nested interval principle. What it shows is that these three
statements are equivalent. They are different ways of looking at the same basic property of
the real numbers, a property that is called completeness.

Theorem 4.3 (Completeness). The following three properties of the real numbers are
equivalent:
• The nested interval principle,
• Every set with an upper bound has a least upper bound,
• Every Cauchy sequence converges.

Definition: completeness
A set of numbers is called complete if it has any of the three equivalent properties
listed in Theorem 4.3. In particular, the set of all real numbers is complete. The set of
all rational numbers is not complete.

Proof: We only have to prove that if every Cauchy sequence converges, then the nested
interval principle holds. Let x1 ≤ x2 ≤ x3 ≤ · · · be the left-hand endpoints of our nested
intervals. We first observe that this sequence is Cauchy: Given any ε > 0, we can find an
interval [xk , yk ] of length less than ε. All of the xn with n ≥ k lie inside this interval, and
so any two of them differ by at most ε.
Let T be the limit of this sequence. Since the xn form an increasing sequence, T must
be greater than or equal to every xn . We only need to show that T is less than or equal to
every yn . What would happen if we could find a yk < T ? Since T is the limit of the xn , we
could find an xn that is larger than yk . This cannot happen because our intervals are nested.
Our limit T lies inside all of the intervals.
Q.E.D.

Absolute Convergence
One of the consequences of the Cauchy criterion is the fact that if the sum of the absolute
values of the terms in a series converges, then the original series must converge.

Definition: absolute convergence


If |a1 | + |a2 | + |a3 | + · · · converges, then we say that a1 + a2 + a3 + · · · converges
absolutely.

Corollary 4.4 (Absolute Convergence Theorem). Given a series a1 + a2 + a3 + · · · ,


if |a1 | + |a2 | + |a3 | + · · · converges, then so does a1 + a2 + a3 + · · · .

Proof: Let Tn = |a1 | + |a2 | + · · · + |an | be the partial sum of the absolute values and let
Sn = a1 + a2 + · · · + an be the partial sum of the original series. Given any positive error
ε, we know we can find an N such that for any m, n ≥ N , |Tn − Tm | < ε. We now show
that the same response, N , will work for the series without the absolute values. We can
assume that m ≤ n, and therefore
|Sn − Sm | = |am+1 + am+2 + · · · + an |
≤ |am+1 | + |am+2 | + · · · + |an |
= |Tn − Tm |
< ε.
Q.E.D.
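The key inequality in this proof, |Sn − Sm| ≤ Tn − Tm, can be checked in exact arithmetic for any particular series. The sketch below is only a sanity check on a sample series of our own choosing, with summands (−1)^k/k²; the function name `check_absolute_bound` is hypothetical.

```python
from fractions import Fraction

def check_absolute_bound(terms, n_max):
    """Check |S_n - S_m| <= T_n - T_m for all 0 <= m <= n <= n_max, in exact arithmetic."""
    S, T = [Fraction(0)], [Fraction(0)]
    for k in range(1, n_max + 1):
        a = terms(k)
        S.append(S[-1] + a)        # partial sums of the series itself
        T.append(T[-1] + abs(a))   # partial sums of the absolute values
    return all(abs(S[n] - S[m]) <= T[n] - T[m]
               for m in range(n_max + 1) for n in range(m, n_max + 1))

print(check_absolute_bound(lambda k: Fraction((-1) ** k, k * k), 60))
```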

The Converse Is False


Convergence does not imply absolute convergence. The series
\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \]
converges, but if we take the absolute values we get the harmonic series which does not
converge. This is an example of a series that converges conditionally.

Definition: conditional convergence


We say that a series converges conditionally if it converges but does not converge
absolutely.

Cauchy realized that while having the summands approach zero is not enough to guarantee
convergence in all cases, it is sufficient when the summands decrease in size and
alternate between positive and negative values.

Corollary 4.5 (Alternating Series Test). If a1 , a2 , a3 , . . . are positive and decreasing
(a1 ≥ a2 ≥ a3 ≥ · · · ≥ 0), then the alternating series
a1 − a2 + a3 − a4 + a5 − a6 + · · ·
converges if and only if the summands approach zero. That is to say, we have convergence
if and only if given any positive error ε, we can find a subscript N such that for
all n ≥ N, an < ε.

Proof: Each time we add a summand with an odd subscript, we add back something less
than or equal to what we just subtracted. Each time we subtract a summand with an even
subscript, we subtract something less than or equal to what we just added. That means that
all of the partial sums from the nth on lie between Sn and Sn+1 . The absolute value of the
difference between these two partial sums is precisely an+1 , which we can make as small
as we wish by taking n + 1 sufficiently large. This series is Cauchy.
Q.E.D.
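The squeezing described in this proof can be watched numerically. The sketch below is an illustration with a sample sequence of our own choosing, a_k = 1/ln(k + 1); the function name `alternating_partial_sums` is hypothetical. It confirms, within the computed range, that every partial sum from the nth on lies between Sn and Sn+1, and that the gap between those two equals the next summand.

```python
import math

def alternating_partial_sums(a, n_max):
    """S[n] = a(1) - a(2) + a(3) - ... (n terms), with S[0] = 0."""
    S = [0.0]
    for k in range(1, n_max + 1):
        S.append(S[-1] + (-1) ** (k + 1) * a(k))
    return S

a = lambda k: 1.0 / math.log(k + 1)   # 1/ln 2, 1/ln 3, ...: positive, decreasing, tending to 0
S = alternating_partial_sums(a, 5000)
n = 1000
lo, hi = sorted((S[n], S[n + 1]))
print(lo, hi, hi - lo, a(n + 1))
```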
This corollary is a rich source of series that converge conditionally. For example,
\[ \frac{1}{\ln 2} - \frac{1}{\ln 3} + \frac{1}{\ln 4} - \frac{1}{\ln 5} + \cdots \]
converges. If we turned the minus signs to plus signs, it would diverge. It does not help us
determine the convergence of
\[ \frac{\sin(2/100)}{\ln 2} + \frac{\sin(3/100)}{\ln 3} + \frac{\sin(4/100)}{\ln 4} + \frac{\sin(5/100)}{\ln 5} + \cdots \]
because the summands of this series do not alternate between positive and negative values.
Warning: The hypotheses of the alternating series test are all important. In particular, it is
not enough that the signs alternate and the summands approach zero. Consider the series
\[ 1 - \frac{1}{2} + \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{8} + \cdots + \frac{1}{n} - \frac{1}{2^n} + \cdots . \]
The summands alternate in sign and approach zero, but this series does not converge.
The sum of the first 2n summands is (1 + 1/2 + · · · + 1/n) − (1 − 2^{−n}), which is bounded below by
ln n + γ − 1, and so the series diverges to infinity.
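The lower bound can be checked against exact partial sums. The sketch below is an illustration only; the function name `warning_partial_sum` is ours and the constant `EULER_GAMMA` is Euler's constant γ entered by hand to double precision. It adds the first 2n summands in pairs (1/k − 1/2^k) and compares the total with ln n + γ − 1.

```python
import math
from fractions import Fraction

def warning_partial_sum(n):
    """Exact sum of the first 2n summands: (1 - 1/2) + (1/2 - 1/4) + ... + (1/n - 1/2^n)."""
    return sum(Fraction(1, k) - Fraction(1, 2 ** k) for k in range(1, n + 1))

EULER_GAMMA = 0.5772156649015329  # Euler's constant, entered by hand

for n in (10, 100, 1000):
    print(n, float(warning_partial_sum(n)), math.log(n) + EULER_GAMMA - 1)
```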

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

4.1.1. How many terms of the series

\[ 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots \]

do we need to take if we are to guarantee that we are within ε = 0.0001 of the target
value 2?

4.1.2. How many terms of the series in exercise 4.1.1 do we need to take if we are to
guarantee that we are within ε = 10^{−1000000} of the target value 2?

4.1.3. How many terms of the series

\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \]

do we need to take if we are to guarantee that we have an approximation to the target value,
ln 2, with 10-digit accuracy? with 20-digit accuracy? with 100-digit accuracy?

4.1.4. M&M Evaluate the partial sums
\[ 1 + \sum_{k=1}^{n} \frac{k!}{100^k}, \]
for the multiples of 10 up to n = 400. Describe and discuss what you see happening.

4.1.5. M&M What is the smallest summand in the series in exercise 4.1.4?

4.1.6. d’Alembert would have described the series in exercise 4.1.4 as converging until we
reach the smallest summand and then diverging after that point. What did he mean by the
word “converging,” and how does that differ from our modern understanding of the word?

4.1.7. Prove that

\[ \frac{x}{e^x - 1} + \frac{x}{2} \]

is an even function, and therefore its power series only involves even powers of x.

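4.1.8. M&M Let B_{2k} be the 2kth Bernoulli number. Use the fact that

\[ B_{2k} \approx (-1)^{k-1} \frac{2\,(2k)!}{(2\pi)^{2k}} \approx (-1)^{k-1} \frac{2\,(2k)^{2k} \sqrt{4\pi k}\; e^{-2k}}{(2\pi)^{2k}} \]

to find the summand with the smallest absolute value in the series

\[ \frac{B_2}{1 \cdot 2 \cdot 10} + \frac{B_4}{3 \cdot 4 \cdot 10^3} + \frac{B_6}{5 \cdot 6 \cdot 10^5} + \cdots = \sum_{k=1}^{\infty} \frac{B_{2k}}{(2k-1)(2k)\, 10^{2k-1}}. \]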

4.1.9. Prove that the series in exercise 4.1.8 does not converge.

4.1.10. Find the summand with smallest absolute value in the series

\[ \sum_{k=1}^{\infty} \frac{B_{2k}}{(2k-1) \cdot (2k) \cdot 1000^{2k-1}}. \]

4.1.11. Prove that the series in exercise 4.1.10 does not converge.

4.1.12. Find a divergent series for which the first million partial sums, S1 , S2 , . . . , S1000000 ,
all agree to ten significant digits.

4.1.13. M&M Calculate the partial sums

\[ S_n = \sum_{k=2}^{n} \frac{\sin(k/100)}{\ln k} \]

up to at least n = 2000. Describe what you see happening. Make a guess of the approximate
value to which this series is converging. Explain the rationale behind your guess.

4.1.14. M&M For each of the following series, explore the values of the partial sums to at
least the first two thousand terms, then analyze the series to determine whether it converges
absolutely, converges conditionally, or diverges. Justify your answer.

a. $\frac{1}{\ln 2} - \frac{1}{\ln 3} + \frac{1}{\ln 4} - \frac{1}{\ln 5} + \cdots = \sum_{k=2}^{\infty} \frac{(-1)^k}{\ln k}$

b. $\sum_{k=2}^{\infty} (-1)^k \frac{(\ln k)^2}{k}$

c. $\sum_{k=1}^{\infty} (-1)^k \sin(1/k)$

d. $\sum_{k=2}^{\infty} (-1)^k \frac{(\ln k)^{\ln k}}{k^2}$


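4.1.15. M&M For each of the following series, explore the values of the partial sums to at
least the first two thousand terms, then analyze the series to determine whether it converges
absolutely, converges conditionally, or diverges. Justify your answer.

a. $1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} + \frac{1}{8} - \frac{1}{9} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1-3\lfloor (n+1)/3 \rfloor}}{n}$

b. $1 + \frac{1}{2} - \frac{1}{3} - \frac{1}{4} + \frac{1}{5} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{\lfloor (n-1)/2 \rfloor}}{n}$

c. $-1 - \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} - \frac{1}{9} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{\lfloor \sqrt{n} \rfloor}}{n}$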

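4.1.16. Let

\[ \varepsilon_n = \begin{cases} +1 & \text{for } 2^{2k} \le n < 2^{2k+1}, \\ -1 & \text{for } 2^{2k+1} \le n < 2^{2k+2}, \end{cases} \]

where k = 0, 1, 2, . . . Determine whether the series

\[ \sum_{n=1}^{\infty} \frac{\varepsilon_n}{n} \]

converges absolutely, converges conditionally, or diverges.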

4.1.17. There are six inequalities in equation (4.7). Explain why each of them holds.

4.2 Comparison Tests


Underlying d’Alembert’s treatment of the binomial series is the assumption that we can
determine the convergence or divergence of a series by comparing it to another series
whose convergence or divergence is known. We must be careful. This only works when
the summands are positive, and it requires both skill and luck in choosing the right series
with which to compare, but it is a powerful technique. Its justification rests on the Cauchy
criterion.

Theorem 4.6 (The Comparison Test). Let a1 + a2 + a3 + · · · and b1 + b2 + b3 + · · ·


be two series with summands that are greater than or equal to zero. We assume that
each bi is greater than or equal to the corresponding ai :
b1 ≥ a1 ≥ 0, b2 ≥ a2 ≥ 0, b3 ≥ a3 ≥ 0, . . .
If b1 + b2 + b3 + · · · converges, then so does a1 + a2 + a3 + · · · . If a1 + a2 + a3 +
· · · diverges, then so does b1 + b2 + b3 + · · · .

Proof: Let Sn = a1 + a2 + · · · + an and Tn = b1 + b2 + · · · + bn . If m < n, then

0 ≤ Sn − Sm = am+1 + am+2 + · · · + an ≤ bm+1 + bm+2 + · · · + bn = Tn − Tm ,

and so

|Sn − Sm | ≤ |Tn − Tm |. (4.8)

We assume the series b1 + b2 + b3 + · · · converges. Given a positive bound ε, we have
a response N . Equation (4.8) shows us that the same response will work for the series
a1 + a2 + a3 + · · · .
The contrapositive of what we have just proven says that if a1 + a2 + a3 + · · · diverges,
then b1 + b2 + b3 + · · · diverges.
Q.E.D.

The Ratio Test


The ratio and root tests rely on comparing our series to a geometric series. They are very
simple and powerful techniques that quickly yield one of three conclusions:
1. the series in question converges absolutely,
2. the series in question diverges, or
3. the results of this test are inconclusive.
It is the third possibility that is the principal drawback of these tests. The most interesting
series mathematicians and scientists were encountering in the early 1800s all fell into
category 3. Nevertheless, these tests are important because they are simple. Start with one
of these tests, and move on to a more complicated test only if the results are inconclusive.

Theorem 4.7 (The Ratio Test). Given a series with nonzero summands, a1 + a2 +
a3 + · · · , we consider the ratio r(n) = |an+1 /an |. If we can find a number α < 1 and
a subscript N such that for all n ≥ N , r(n) is less than or equal to α, then the series
converges absolutely.
If we can find a subscript N such that for all n ≥ N , r(n) is greater than or equal to
1, then the series diverges.

Proof: If r(n) is less than or equal to α < 1 when n ≥ N, then the series of absolute values,
|a1 | + |a2 | + |a3 | + · · · , is dominated by the convergent series
\[ |a_1| + \cdots + |a_N| + |a_N|\alpha + |a_N|\alpha^2 + |a_N|\alpha^3 + \cdots = |a_1| + \cdots + |a_{N-1}| + \frac{|a_N|}{1-\alpha}. \]
If r(n) is greater than or equal to 1 when n ≥ N , then |an | is greater than or equal to |aN |
and so does not approach zero as n gets larger.
Q.E.D.

In many cases, r(n) approaches a limit as n gets very large. If this happens, there is a
simpler form of the ratio test.

Corollary 4.8 (The Limit Ratio Test). Given a series with nonzero summands and
r(n) = |an+1 /an |, if
\[ \lim_{n\to\infty} r(n) = L < 1, \]
then the series converges absolutely. If
\[ \lim_{n\to\infty} r(n) = L > 1, \]
then the series diverges. If
\[ \lim_{n\to\infty} r(n) = L = 1, \]
then this test is inconclusive.

Proof: Recall that

\[ \lim_{n\to\infty} r(n) = L \]

means that given any positive error bound ε, we can find an N with which to reply such
that if n ≥ N , then |r(n) − L| < ε. If L < 1 then we can use an ε that is small enough so
that L + ε < 1. We can then choose L + ε to be our α. If n ≥ N , then r(n) < L + ε = α.
If L > 1, we can use an ε that is small enough so that L − ε ≥ 1. If n ≥ N , then
r(n) > L − ε ≥ 1.
If L = 1, then it might be the case that r(n) ≥ 1 for all n sufficiently large, which implies
that the series diverges. But if all we know is the value of this limit, then r(n) could be less
than 1 for all values of n. The ratio test is inconclusive.
Q.E.D.

The Root Test


Cauchy found an even better test that rests on a comparison with geometric series. We can
view a geometric series as one for which the nth root of the nth summand is constant,
\[ \sqrt[n]{|x^n|} = |x|. \]

This suggests taking the nth root of the absolute value of the nth summand in an arbitrary
series. This test is often more complicated to apply than the ratio test, but it will give an
answer in some cases where the ratio test is inconclusive.

Theorem 4.9 (The Root Test). Given a series a1 + a2 + a3 + · · · , we consider
\[ \rho(n) = \sqrt[n]{|a_n|}. \]
If we can find a number α < 1 and a subscript N such that for all n ≥ N , ρ(n) is less
than or equal to α, then the series converges absolutely.
If for any subscript N we can always find a larger n for which ρ(n) is greater than or
equal to 1, then the series diverges.

Notice that while the convergence condition looks very much the same, the divergence
condition has been liberalized a great deal. We do not have to go above 1 and stay there. It
is enough if we can always find another ρ(n) that climbs to or above 1. The ratio r(n) and
the root ρ(n) are related. Exercises 4.2.8–4.2.11 show that if limn→∞ r(n) exists, then so
does limn→∞ ρ(n) and the two will be equal. Whenever limn→∞ r(n) exists, the root and
ratio tests will always give the same response.

Proof: If n ≥ N implies that ρ(n) is less than or equal to α < 1, then the series of absolute
values, |a1 | + |a2 | + |a3 | + · · · , is dominated by the convergent series

\[ |a_1| + |a_2| + \cdots + |a_{N-1}| + \alpha^N + \alpha^{N+1} + \alpha^{N+2} + \cdots = |a_1| + |a_2| + \cdots + |a_{N-1}| + \frac{\alpha^N}{1-\alpha}. \]
If ρ(n) ≥ 1, then |an | ≥ 1. If this happens for arbitrarily large values of n, then the
summands do not approach zero, and so the series cannot converge.
Q.E.D.

Corollary 4.10 (The Limit Root Test). Given a series with positive summands and
$\rho(n) = \sqrt[n]{|a_n|}$, if
\[ \lim_{n\to\infty} \rho(n) = L < 1, \]
then the series converges absolutely. If
\[ \lim_{n\to\infty} \rho(n) = L > 1, \]
then the series diverges. If
\[ \lim_{n\to\infty} \rho(n) = L = 1, \]
then this test is inconclusive.

The proof of this corollary parallels that of Corollary 4.8 and is left as an exercise.

Examples
With all of this machinery in place, we can now answer the question of convergence for
many series. We recall the series expansion for (1 + x)^{1/2} at x = 2/3:

\[ \left(1 + \frac{2}{3}\right)^{1/2} = 1 + (1/2)\,\frac{2}{3} + \frac{(1/2)(1/2 - 1)}{2!} \left(\frac{2}{3}\right)^2 + \frac{(1/2)(1/2 - 1)(1/2 - 2)}{3!} \left(\frac{2}{3}\right)^3 + \cdots . \]

The absolute value of the ratios of successive terms is

\[ r(n) = \left| \frac{(1/2)(1/2 - 1) \cdots (1/2 - n + 1)\,(2/3)^n / n!}{(1/2)(1/2 - 1) \cdots (1/2 - n + 2)\,(2/3)^{n-1} / (n-1)!} \right| = \left| \frac{(1/2 - n + 1)(2/3)}{n} \right| = \left| \frac{2n - 3}{3n} \right| . \]

This has a limit:

\[ \lim_{n\to\infty} \frac{2n - 3}{3n} = \frac{2}{3} < 1. \]

This series converges absolutely.
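The ratio computed above can be confirmed in exact rational arithmetic. The following sketch is an illustration, not part of the original text; the function name `binomial_term` and its default arguments are ours. It builds the nth term of the binomial series for (1 + x)^{1/2} at x = 2/3 and prints the ratio of successive terms, which should equal (2n − 3)/(3n) for n ≥ 2.

```python
from fractions import Fraction

def binomial_term(n, alpha=Fraction(1, 2), x=Fraction(2, 3)):
    """n-th term alpha(alpha-1)...(alpha-n+1) x^n / n! of the series for (1+x)^alpha."""
    term = Fraction(1)
    for j in range(n):
        term *= (alpha - j) * x / (j + 1)
    return term

for n in (2, 5, 50, 500):
    r = abs(binomial_term(n) / binomial_term(n - 1))
    print(n, r, float(r))
```

The printed ratios approach 2/3 from below, in agreement with the limit.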


A more interesting example is

\[ 1 + \frac{1!\,2!}{3!} + \frac{2!\,4!}{6!} + \frac{3!\,6!}{9!} + \cdots = 1 + \sum_{n=1}^{\infty} \frac{n!\,(2n)!}{(3n)!}. \]

The ratio is

\[ r(n) = \frac{n!\,(2n)!/(3n)!}{(n-1)!\,(2n-2)!/(3n-3)!} = \frac{n(2n)(2n-1)}{(3n)(3n-1)(3n-2)} = \frac{4n^3 - 2n^2}{27n^3 - 27n^2 + 6n}. \]

The limit is

\[ \lim_{n\to\infty} \frac{4n^3 - 2n^2}{27n^3 - 27n^2 + 6n} = \frac{4}{27} < 1, \]

and so this series converges absolutely.
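This simplification is easy to get wrong by hand, so it is worth checking. The sketch below (function names `term` and `ratio` are ours) computes the ratio of successive summands n!(2n)!/(3n)! exactly and compares it with the closed form derived above.

```python
from fractions import Fraction
from math import factorial

def term(n):
    """The summand n!(2n)!/(3n)! as an exact fraction."""
    return Fraction(factorial(n) * factorial(2 * n), factorial(3 * n))

def ratio(n):
    """Ratio of the n-th summand to the (n-1)-st."""
    return term(n) / term(n - 1)

for n in (1, 10, 100, 1000):
    closed_form = Fraction(4 * n**3 - 2 * n**2, 27 * n**3 - 27 * n**2 + 6 * n)
    print(n, ratio(n) == closed_form, float(ratio(n)))
```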


We note that we obtain exactly the same limit if we use the nth root of the nth term.
To make life a little simpler, let us ignore the first summand so that the nth summand is
n! (2n)!/(3n)!. It is necessary to use Stirling’s formula, $n! = n^n \sqrt{2\pi n}\; e^{-n+E(n)}$, where E(n)
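approaches 0 as n gets large:

\[ \begin{aligned}
\rho(n) &= \sqrt[n]{\frac{n!\,(2n)!}{(3n)!}} \\
&= \left( \frac{n^n \sqrt{2\pi n}\; e^{-n+E(n)} \cdot (2n)^{2n} \sqrt{4\pi n}\; e^{-2n+E(2n)}}{(3n)^{3n} \sqrt{6\pi n}\; e^{-3n+E(3n)}} \right)^{1/n} \\
&= \frac{n\,(2\pi n)^{1/(2n)} e^{-1+E(n)/n} \cdot 4n^2 (4\pi n)^{1/(2n)} e^{-2+E(2n)/n}}{27 n^3 (6\pi n)^{1/(2n)} e^{-3+E(3n)/n}} \\
&= \frac{4}{27} \left( \frac{4\pi n}{3} \right)^{1/(2n)} e^{[E(n)+E(2n)-E(3n)]/n}.
\end{aligned} \]

This also approaches 4/27 < 1 as n gets arbitrarily large.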
Still more interesting is
\[ 1 + \frac{2!}{2^2} + \frac{3!}{3^3} + \frac{4!}{4^4} + \cdots = \sum_{n=1}^{\infty} \frac{n!}{n^n}. \]
The ratio test gives us
\[ r(n) = \frac{(n+1)!/(n+1)^{n+1}}{n!/n^n} = \frac{(n+1)\,n^n}{(n+1)^{n+1}} = \left( 1 + \frac{1}{n} \right)^{-n}, \]
which approaches e^{−1} < 1 as n gets arbitrarily large.

Web Resource: For a proof and exploration of the limit formula $\lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = e^x$, go to Exponential function.

The root test gives us

\[ \rho(n) = \left( \frac{n^n \sqrt{2\pi n}\; e^{-n+E(n)}}{n^n} \right)^{1/n} = (2\pi n)^{1/(2n)} e^{-1+E(n)/n}, \]

which also approaches e^{−1} < 1 as n gets arbitrarily large.
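Both limits can be observed numerically. The sketch below is a rough illustration in floating point (the function names `ratio` and `root` are ours); the root is computed through logarithms using `math.lgamma`, since n! itself overflows a float long before the limit becomes visible.

```python
import math

def ratio(n):
    """r(n) = a_{n+1}/a_n for a_n = n!/n^n, which simplifies to (1 + 1/n)^(-n)."""
    return (1 + 1 / n) ** (-n)

def root(n):
    """rho(n) = (n!/n^n)^(1/n), computed through logarithms to avoid overflow."""
    return math.exp((math.lgamma(n + 1) - n * math.log(n)) / n)

for n in (10, 1000, 100000):
    print(n, ratio(n), root(n), math.exp(-1))
```

The root approaches 1/e more slowly than the ratio does, as the extra factor (2πn)^{1/(2n)} in the formula for ρ(n) suggests.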


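Sometimes it is easier to take the nth root rather than the nth ratio. Consider the series
\[ (1 + 1)^{-1} + \left(1 + \frac{1}{2}\right)^{-4} + \left(1 + \frac{1}{3}\right)^{-9} + \left(1 + \frac{1}{4}\right)^{-16} + \cdots = \sum_{n=1}^{\infty} \left(1 + \frac{1}{n}\right)^{-n^2}. \]
The nth root of the nth summand is
\[ \rho(n) = \left(1 + \frac{1}{n}\right)^{-n}, \]
which approaches e^{−1} < 1 as n gets arbitrarily large. This series converges absolutely.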

Limitations of the Root and Ratio Tests


While the root and ratio tests are usually the ones we want to use first, there are many
important series for which they return an inconclusive result. Neither of these tests will
confirm that the harmonic series diverges. For the limit ratio test we have
\[ \lim_{n\to\infty} \frac{1/(n+1)}{1/n} = \lim_{n\to\infty} \frac{n}{n+1} = 1. \]

Similarly, the limit root test returns

\[ \lim_{n\to\infty} n^{-1/n} = e^{\lim_{n\to\infty} -(\ln n)/n} = e^0 = 1. \]

Of course, we know that the harmonic series diverges. We can use this information with
the comparison test. If p ≤ 1, then 1/n^p ≥ 1/n and so $\sum_{n=1}^{\infty} 1/n^p$ diverges. What if p is
greater than 1? Does

\[ \sum_{n=1}^{\infty} \frac{1}{n^{1.01}} \]

converge or diverge? Can we find a divergent series with an < 1/n? What about $\sum_{n=2}^{\infty} \frac{1}{n \ln n}$?
Our last two tests enable us to answer these questions. They are both based on the
observation that if the summands are positive, then the partial sums are increasing. If the
partial sums are bounded, then they form a Cauchy sequence and so the series converges.
If the partial sums are not bounded, then the series diverges to infinity.

Cauchy’s Condensation Test


The first convergence test in Cauchy’s Cours d’analyse is the root test. The second is the
ratio test. The third is the condensation test.

Theorem 4.11 (Cauchy’s Condensation Test). Let a1 + a2 + a3 + · · · be a series
whose summands are eventually positive and decreasing. That is to say, there is a
subscript N such that
n ≥ N implies that an ≥ an+1 ≥ 0.
This series converges if and only if the series
\[ a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots + 2^k a_{2^k} + \cdots \]
converges.

This test is good enough to settle the convergence questions that the root and ratio tests
could not handle. We shall state and prove the p-test after we have proven Cauchy’s test.
But first, we show that there is a series with smaller summands than the harmonic series
but which still diverges. We consider
\[ \frac{1}{2 \ln 2} + \frac{1}{3 \ln 3} + \frac{1}{4 \ln 4} + \cdots . \]
These summands are positive and decreasing. We can apply the condensation test, letting
the first summand be 0 and treating 1/(2 ln 2) as the second summand. We compare our series
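with
\[ \frac{2}{2 \ln 2} + \frac{4}{4 \ln 4} + \frac{8}{8 \ln 8} + \cdots = \sum_{k=1}^{\infty} \frac{2^k}{2^k \ln 2^k} = \sum_{k=1}^{\infty} \frac{1}{k \ln 2} = \frac{1}{\ln 2} \left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \right). \]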
We are comparing our original series with the harmonic series which we know diverges. It
follows that 1/(2 ln 2) + 1/(3 ln 3) + · · · also diverges.

Proof: We can assume that the summands are positive and decreasing beginning with
the first summand. Otherwise, we chop off the initial portion containing the recalcitrant
summands. This will change the value of the series (if it converges), but it will not change
whether or not it converges.
If a1 + 2a2 + 4a4 + · · · converges, then it has a value V . Given a partial sum of our
original series,

Sn = a1 + a2 + a3 + · · · + an ,

we choose the smallest integer m such that n < 2^m . We can compare Sn with the partial
sum of the first m terms in the second series:

\[ \begin{aligned}
S_n &= a_1 + (a_2 + a_3) + (a_4 + a_5 + a_6 + a_7) + (a_8 + a_9 + \cdots + a_{15}) + \cdots + (a_{2^{m-1}} + a_{2^{m-1}+1} + \cdots + a_n) \\
&\le a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots + 2^{m-1} a_{2^{m-1}} \\
&\le V.
\end{aligned} \]

The partial sums are bounded and so they converge.


If a1 + a2 + a3 + · · · converges, then it has a value W . Given a partial sum of the second
series,

\[ T_n = a_1 + 2a_2 + 4a_4 + \cdots + 2^n a_{2^n}, \]

we can compare Tn with twice the partial sum of the first 2^n terms in the first series:

\[ \begin{aligned}
T_n &\le 2a_1 + 2a_2 + 2(a_3 + a_4) + 2(a_5 + a_6 + a_7 + a_8) + 2(a_9 + a_{10} + \cdots + a_{16}) + \cdots + 2(a_{2^{n-1}+1} + a_{2^{n-1}+2} + \cdots + a_{2^n}) \\
&\le 2(a_1 + a_2 + \cdots + a_{2^n}) \\
&\le 2W.
\end{aligned} \]

The partial sums are bounded and so they converge.


Q.E.D.
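The two chains of inequalities in this proof can be tested on concrete sequences. The sketch below is only a finite check under our own indexing conventions: S(n) is the ordinary partial sum and T(m) = a1 + 2a2 + 4a4 + · · · + 2^m a_{2^m}. The function name `check_condensation_bounds` is hypothetical, and the sequence passed in is assumed to be positive and decreasing.

```python
from fractions import Fraction

def check_condensation_bounds(a, m_max):
    """For a positive, decreasing a(1), a(2), ..., verify for m = 1..m_max that
    S(2^m - 1) <= T(m - 1) and T(m) <= 2 * S(2^m)."""
    S = lambda n: sum(a(k) for k in range(1, n + 1))
    T = lambda m: sum(2 ** j * a(2 ** j) for j in range(m + 1))
    return all(S(2 ** m - 1) <= T(m - 1) and T(m) <= 2 * S(2 ** m)
               for m in range(1, m_max + 1))

print(check_condensation_bounds(lambda k: Fraction(1, k * k), 8))  # a convergent example
print(check_condensation_bounds(lambda k: Fraction(1, k), 8))      # the harmonic series
```

The bounds hold whether or not the series converges; what changes is whether the quantities on either side stay bounded as m grows.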

Corollary 4.12 (The p-Test). The series
\[ \sum_{n=1}^{\infty} \frac{1}{n^p} \]
diverges for p ≤ 1 and converges for p > 1.

Proof: We compare our series $\sum_{n=1}^{\infty} 1/n^p$ to

\[ \sum_{n=1}^{\infty} \frac{2^n}{(2^n)^p} = \sum_{n=1}^{\infty} \left( \frac{1}{2^{p-1}} \right)^n . \]

This is a geometric series. It converges if and only if 2^{p−1} > 1, which happens if and only
if p > 1.
Q.E.D.

The Integral Test



When we first studied the harmonic series in section 2.4, we proved that $\sum_{n=1}^{\infty} 1/n$ diverges
by comparing it to the improper integral $\int_1^{\infty} (1/x)\,dx$. This is an approach that works
whenever an is the value of a function of n that is positive, decreasing, and asymptotic to 0
as n approaches infinity. The following test for convergence was published by Cauchy in
1827.

Theorem 4.13 (The Integral Test). Let f be a positive, decreasing, integrable function
for x ≥ 1. The series
\[ \sum_{k=1}^{\infty} f(k) \]
converges if and only if we have convergence of the improper integral
\[ \int_1^{\infty} f(x)\,dx. \]

Any time we see the symbol ∞, warning lights should go off. The improper integral
actually means the limit

\[ \lim_{n\to\infty} \int_1^{n} f(x)\,dx. \]

Proof: Since f is positive for x ≥ 1, it is enough to show that when one of them converges,
it provides an upper bound for the other.
Since f is decreasing, we know that (see Figure 4.2)

\[ f(k+1) \le \int_k^{k+1} f(x)\,dx \le f(k). \]

Definition: improper integral (unbounded domain)

The improper integral
\[ \int_1^{\infty} f(x)\,dx \]
is said to converge if there is a number V such that for any error bound ε, we can
always find a response N for which
\[ n \ge N \quad \text{implies that} \quad \left| \int_1^{n} f(x)\,dx - V \right| < \varepsilon. \]
The number V is called the value of the integral.

It follows that

\[ \sum_{k=1}^{N} f(k+1) \le \sum_{k=1}^{N} \int_k^{k+1} f(x)\,dx = \int_1^{N+1} f(x)\,dx \le \sum_{k=1}^{N} f(k). \]

If the series converges, then the partial integrals are bounded:

\[ \int_1^{N+1} f(x)\,dx \le \sum_{k=1}^{N} f(k) \le \sum_{k=1}^{\infty} f(k). \]

If the integral converges, then the partial sums are bounded:

\[ \sum_{k=1}^{N+1} f(k) \le f(1) + \int_1^{N+1} f(x)\,dx \le f(1) + \int_1^{\infty} f(x)\,dx. \]

Q.E.D.
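The sandwich between sums and integrals can be verified exactly when the integral has a closed form. The sketch below uses a sample function of our own choosing, f(x) = 1/x², whose integral from a to b is 1/a − 1/b; the names `f`, `integral`, and `bounds_hold` are ours.

```python
from fractions import Fraction

def f(x):
    return Fraction(1, x * x)                 # f(x) = 1/x^2: positive and decreasing for x >= 1

def integral(a, b):
    return Fraction(1, a) - Fraction(1, b)    # exact integral of 1/x^2 from a to b

def bounds_hold(N):
    """Check sum_{k=1}^{N} f(k+1) <= integral from 1 to N+1 <= sum_{k=1}^{N} f(k)."""
    lower = sum(f(k + 1) for k in range(1, N + 1))
    upper = sum(f(k) for k in range(1, N + 1))
    return lower <= integral(1, N + 1) <= upper

print([bounds_hold(N) for N in (1, 10, 200)])
```

Since the improper integral of 1/x² from 1 to ∞ equals 1, the last inequality of the proof bounds every partial sum of the series by f(1) + 1 = 2.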

In section 2.4, we not only proved that the harmonic series diverges, we found an explicit
formula for the difference between the partial sum of the first n terms and $\int_1^n dx/x = \ln n$.
The same thing can be done whenever the summand is of the form f (k) where f
is an analytic function for x > 0. In the 1730s, Leonhard Euler and Colin Maclaurin

FIGURE 4.2. $f(k+1) \le \int_k^{k+1} f(x)\,dx \le f(k)$. (The graph marks the points $(k, f(k))$ and $(k+1, f(k+1))$ above $k$ and $k+1$.)

independently discovered this explicit connection, the Euler–Maclaurin formula:
\[
\sum_{k=1}^{n} f(k) = \int_1^{n} f(x)\,dx + \frac{1}{2}\,[f(n) + f(1)] + \frac{B_2}{2!}\,[f'(n) - f'(1)]
 + \frac{B_4}{4!}\,[f'''(n) - f'''(1)] + \frac{B_6}{6!}\,[f^{(5)}(n) - f^{(5)}(1)] + \cdots, \tag{4.9}
\]
where the $B_n$ are the Bernoulli numbers defined on page 119.

To see a proof of the Euler–Maclaurin formula and to explore its consequences,
go to Appendix A.4, The size of n!.

Examples
The series
\[ 1 + \frac{1}{2 \ln 2} + \frac{1}{3 \ln 3} + \cdots = 1 + \sum_{k=2}^{\infty} \frac{1}{k \ln k} \]
is handled very efficiently by the integral test. We can ignore the first summand and consider
the improper integral
\[
\int_2^{\infty} \frac{dx}{x \ln x} = \lim_{n\to\infty} \int_2^{n} \frac{dx}{x \ln x}
 = \lim_{n\to\infty} \ln(\ln x) \Big|_2^{n}
 = \lim_{n\to\infty} \left( \ln \ln n - \ln \ln 2 \right),
\]
which is an infinite limit. The improper integral does not converge. It follows that the series
also does not converge.
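On the other hand, the series
\[ 1 + \frac{1}{2(\ln 2)^2} + \frac{1}{3(\ln 3)^2} + \cdots = 1 + \sum_{k=2}^{\infty} \frac{1}{k(\ln k)^2} \]
is compared with the improper integral
\[
\int_2^{\infty} \frac{dx}{x(\ln x)^2} = \lim_{n\to\infty} \int_2^{n} \frac{dx}{x(\ln x)^2}
 = \lim_{n\to\infty} \frac{-1}{\ln x} \Big|_2^{n}
 = \lim_{n\to\infty} \left( \frac{1}{\ln 2} - \frac{1}{\ln n} \right)
 = \frac{1}{\ln 2}.
\]
Since the improper integral converges, this series must also converge.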
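The contrast between these two examples is easy to miss numerically, because $\ln \ln n$ grows so slowly. The following sketch (the function name and the cutoffs are choices made for this illustration) prints the partial sums of both series next to $\ln \ln n$.

```python
import math

def partial_sums(p, n):
    """Return the sum over k = 2..n of 1 / (k * (ln k)**p)."""
    return sum(1.0 / (k * math.log(k) ** p) for k in range(2, n + 1))

for n in (10**2, 10**3, 10**4, 10**5):
    s1 = partial_sums(1, n)   # divergent case, tracks ln ln n
    s2 = partial_sums(2, n)   # convergent case, stays bounded
    print(n, s1, math.log(math.log(n)), s2)
```

By the inequalities in the proof of the integral test, the second column can never exceed $\frac{1}{2(\ln 2)^2} + \frac{1}{\ln 2}$, while the first differs from $\ln \ln n$ by a bounded amount and so grows without limit, though far too slowly to notice from a few thousand terms.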

Exercises

The symbol
M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.
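4.2.1. Prove that if $a_n > 0$ and $\sum_{n=1}^{\infty} a_n$ converges, then $\sum_{n=1}^{\infty} a_n^3$ must converge.

4.2.2. Show that the following series converges:
\[
1 + \frac{1}{2\sqrt[3]{2}} + \frac{1}{2\sqrt[3]{2}} - \frac{1}{\sqrt[3]{2}}
 + \frac{1}{3\sqrt[3]{3}} + \frac{1}{3\sqrt[3]{3}} + \frac{1}{3\sqrt[3]{3}} - \frac{1}{\sqrt[3]{3}}
 + \cdots + \underbrace{\frac{1}{n\sqrt[3]{n}} + \frac{1}{n\sqrt[3]{n}} + \cdots + \frac{1}{n\sqrt[3]{n}}}_{n \text{ terms}} - \frac{1}{\sqrt[3]{n}} + \cdots. \tag{4.10}
\]

4.2.3. Show that if $b_k$ is the $k$th summand in the series given in (4.10), then $\sum_{k=1}^{\infty} b_k^3$
diverges. This gives us an example of a series for which $\sum b_k$ converges but $\sum b_k^3$ diverges.
Why does this not contradict the result of exercise 4.2.1?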

4.2.4. For each of the following series, determine whether it converges absolutely, converges
conditionally, or diverges. Justify your answer.
a. $\dfrac{\arctan 1}{2} + \dfrac{\arctan 2}{2^2} + \cdots + \dfrac{\arctan n}{2^n} + \cdots$
b. $1 + \dfrac{1}{4} + \dfrac{2^2}{4^2} + \cdots + \dfrac{n^2}{4^n} + \cdots$
c. $1 + \dfrac{1}{2} + \dfrac{1}{3} + \cdots + \dfrac{1}{n} + \cdots$
d. $\dfrac{1}{1\cdot 2} - \dfrac{1}{2\cdot 3} + \cdots + (-1)^{n-1}\dfrac{1}{n(n+1)} + \cdots$
e. $\alpha_1 q^1 + \alpha_2 q^2 + \cdots + \alpha_n q^n + \cdots$, where $|q| < 1$ and $|\alpha_k| \le M$ for $k = 1, 2, \ldots$
f. $\dfrac{1}{2^2} + \dfrac{2}{3^2} + \cdots + \dfrac{n}{(n+1)^2} + \cdots$

4.2.5. For each of the following series, determine whether it converges absolutely, converges
conditionally, or diverges. Justify your answer.
a. $\displaystyle\sum_{n=1}^{\infty} \left( \sqrt{n^2+1} - \sqrt[3]{n^3+1} \right)$
b. $\displaystyle\sum_{n=1}^{\infty} \left( \frac{n}{n+1} \right)^{n(n+1)}$
c. $\displaystyle\sum_{n=1}^{\infty} \left( 1 - \cos(1/n) \right)$
d. $\displaystyle\sum_{n=1}^{\infty} \left( \sqrt[n]{n} - 1 \right)^{n}$

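4.2.6. For each of the following series, find those values of $a$ for which it converges
absolutely, the values of $a$ for which it converges conditionally, and the values of $a$ for
which it diverges. Justify your answers.
a. $\displaystyle\sum_{n=1}^{\infty} \left( \frac{an}{n+1} \right)^{n}$
b. $\displaystyle\sum_{n=1}^{\infty} \frac{1}{n+1} \left( \frac{a^2 - 4a - 8}{a^2 + 6a - 16} \right)^{n}$, $a \ne -8, 2$
c. $\displaystyle\sum_{n=1}^{\infty} \frac{n^n}{a^{(n^2)}}$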

4.2.7. Describe the region in the $x,y$-half-plane, $y > 0$, in which the series
\[ \sum_{n=1}^{\infty} (-1)^n \frac{(\ln n)^x}{n^y} \]
converges absolutely, the region in which it converges conditionally, and the region in
which it diverges.

4.2.8. Given a series $a_1 + a_2 + a_3 + \cdots$, assume that we can find a bound $\alpha$ and a subscript
$N$ such that $n \ge N$ implies that
\[ \left| \frac{a_{n+1}}{a_n} \right| \le \alpha. \]
Prove that given any positive error $\epsilon$, there is a subscript $M$ such that $n \ge M$ implies that
\[ \sqrt[n]{|a_n|} < \alpha + \epsilon. \]
Show that this does not necessarily imply that $\sqrt[n]{|a_n|} \le \alpha$.

4.2.9. Use the result of exercise 4.2.8 to prove that if the ratio test tells us that our series
converges absolutely, then the root test will also tell us that our series converges absolutely.

4.2.10. Modify the argument in exercise 4.2.8 to prove that if we can find a bound $\beta$ and
a subscript $N$ such that $n \ge N$ implies that
\[ \left| \frac{a_{n+1}}{a_n} \right| \ge \beta, \]
then given any positive error $\epsilon$, there is a subscript $M$ such that $n \ge M$ implies
\[ \sqrt[n]{|a_n|} > \beta - \epsilon. \]

4.2.11. Use the results from exercises 4.2.8 and 4.2.10 to prove that if
$\lim_{n\to\infty} |a_{n+1}/a_n|$ exists, then
\[ \lim_{n\to\infty} \sqrt[n]{|a_n|} = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right|. \tag{4.11} \]

4.2.12. Find an infinite series of positive summands for which the root test shows divergence
but the ratio test is inconclusive. Explain why this example does not contradict the
result of exercise 4.2.11.

4.2.13. Verify that the root test can be used in situations where the ratio test is inconclusive
by applying both tests to the series
\[ \frac{1}{3} + \frac{1}{2^2} + \frac{1}{3^3} + \frac{1}{2^4} + \frac{1}{3^5} + \frac{1}{2^6} + \frac{1}{3^7} + \frac{1}{2^8} + \cdots + \left( \frac{5 - (-1)^n}{2} \right)^{-n} + \cdots, \]
and to the series
\[ \frac{1}{2} + 2^2 + \frac{1}{2^3} + 2^4 + \frac{1}{2^5} + 2^6 + \frac{1}{2^7} + 2^8 + \cdots + 2^{(-1)^n n} + \cdots. \]

4.2.14. Prove Corollary 4.10 on page 132.



M&M 4.2.15. Find the partial sums
\[ S_n = \sum_{k=1}^{n} \left( \frac{k}{2k-1} \right)^{k} \]
for $n = 20, 40, \ldots, 200$. Prove that this series converges.

M&M 4.2.16. Find the partial sums
\[ S_n(2) = \sum_{k=1}^{n} \left( \frac{k}{2k-1} \right)^{k} 2^k \quad \text{and} \quad S_n(-2) = \sum_{k=1}^{n} \left( \frac{k}{2k-1} \right)^{k} (-2)^k \]
for $n = 20, 40, \ldots, 200$. Describe what you see happening. Do you expect that either
or both of these converge? Prove your guesses about the convergence of the series in
exercise 4.2.16.

M&M 4.2.17. Find the partial sums
\[ S_n = \sum_{k=1}^{n} \frac{k^k}{k!} \]
for $n = 20, 40, \ldots, 200$. Prove that this series diverges.

M&M 4.2.18. Find the partial sums
\[ S_n(e^{-1}) = \sum_{k=1}^{n} \frac{k^k}{k!}\, e^{-k} \quad \text{and} \quad S_n(-e^{-1}) = \sum_{k=1}^{n} \frac{k^k}{k!}\, (-e)^{-k} \]
for $n = 20, 40, \ldots, 200$. Describe what you see happening. Do you expect that either
or both of these converge? Prove your guesses about the convergence of the series in
exercise 4.2.18.

M&M 4.2.19. Find the partial sums
\[ S_n = \sum_{k=1}^{n} \frac{2^k}{\sqrt{k}} \]
for $n = 20, 40, \ldots, 200$. Prove that this series diverges.

M&M 4.2.20. Calculate the partial sum
\[ \sum_{k=2}^{n} \frac{1}{k \ln k} \]
up to $n = 10{,}000$. Does it appear that this series is converging? Prove your assertion.

M&M 4.2.21. Calculate the partial sum
\[ \sum_{k=2}^{n} \frac{1}{k (\ln k)^{3/2}} \]
up to $n = 10{,}000$. Does it appear that this series is converging? Use both the integral test
and the Cauchy condensation test to determine whether or not this series converges.

M&M 4.2.22. Calculate the partial sum
\[ \sum_{k=10}^{n} \frac{1}{k (\ln k)(\ln \ln k)} \]
up to $n = 10{,}000$. Does it appear that this series is converging? Use both the integral test
and the Cauchy condensation test to determine whether or not this series converges.

4.2.23. For what values of $\alpha$ does
\[ \sum_{n=10}^{\infty} \frac{1}{n (\ln n)(\ln \ln n)^{\alpha}} \]
converge?

4.2.24. Determine whether or not
\[ \sum_{n=10}^{\infty} \frac{1}{n^{1+f(n)}} \]
converges when
\[ f(n) = \frac{\ln \ln n + \ln \ln \ln n}{\ln n}. \]
Do we have convergence when
\[ f(n) = \frac{\ln \ln n + 2 \ln \ln \ln n}{\ln n}\,? \]

4.2.25. For what values of $\alpha$ does
\[ \int_1^{\infty} x^{\alpha}\,dx \]
converge?

4.2.26. Define
\[ \ln_2 n = \ln \ln n, \quad \ln_3 n = \ln \ln \ln n, \quad \ldots, \quad \ln_k n = \ln(\ln_{k-1} n), \]
and let $N_k$ be the smallest positive integer for which $\ln_k N_k > 0$. Prove that
\[ \sum_{n=N_k}^{\infty} \frac{1}{n (\ln n)(\ln_2 n)(\ln_3 n) \cdots (\ln_k n)} \]
diverges.

4.2.27. Prove that
\[ \sum_{n=N_k}^{\infty} \frac{1}{n (\ln n)(\ln_2 n)(\ln_3 n) \cdots (\ln_k n)^{\alpha}} \]
diverges for $\alpha \le 1$ and converges for $\alpha > 1$.

4.2.28. Prove that
\[ \sum_{n=1}^{\infty} \frac{(n!)^n}{(n^2)!} \]
converges. Find a function $f(n)$ that grows as fast as possible and such that
\[ \sum_{n=1}^{\infty} f(n)\, \frac{(n!)^n}{(n^2)!} \]
still converges.

4.2.29. Use Cauchy condensation to determine whether the following series converge or
diverge.
a. $\displaystyle\sum_{n=1}^{\infty} \frac{1}{2^{\sqrt{n}}}$
b. $\displaystyle\sum_{n=1}^{\infty} \frac{1}{2^{\ln n}}$

4.2.30. Prove that if $a_n$ is a positive, decreasing sequence, then $\sum_{n=1}^{\infty} a_n$ converges if and
only if $\sum_{n=0}^{\infty} 3^n a_{3^n}$ converges. Use this to determine whether the series
\[ \sum_{n=1}^{\infty} \frac{1}{3^{\ln n}} \]
converges or diverges.

4.3 The Convergence of Power Series


We are concerned not just with infinite series but with infinite series of functions,
\[ F(x) = f_1(x) + f_2(x) + f_3(x) + \cdots. \]
For our purposes, convergence is always pointwise convergence.

Definition: pointwise convergence

A series of functions $f_1 + f_2 + f_3 + \cdots$ converges pointwise to $F$ if at each value
of $x$, the value of $F$ is the limit of the sum of the $f_k$ evaluated at that value of $x$,
\[ F(x) = f_1(x) + f_2(x) + f_3(x) + \cdots. \]

Web Resource: To see another type of convergence for infinite series of functions,
go to Convergence in norm.

For Fourier's cosine series,
\[ F(x) = \cos\frac{\pi x}{2} - \frac{1}{3}\cos\frac{3\pi x}{2} + \frac{1}{5}\cos\frac{5\pi x}{2} - \frac{1}{7}\cos\frac{7\pi x}{2} + \cdots, \]
the series at $x = 1$ is $0 - 0 + 0 - 0 + \cdots$ which converges to 0. If we evaluate the series
at any $x$ strictly between $-1$ and $1$, we obtain a series that converges to $\pi/4$.
With a series of functions, the question is not whether or not it converges, but for which
values of $x$ it converges. In this section, we shall consider power series in which the
summands are constant multiples of powers of $x$:
\[ a_0 + \sum_{n=1}^{\infty} a_n x^n. \]
A power series might be shifted $x_0$ units to the right by replacing the variable $x$ with $x - x_0$,
\[ a_0 + \sum_{n=1}^{\infty} a_n (x - x_0)^n. \]
In the next section, we shall treat trigonometric series in which the summands are
constant multiples of the sine and cosine of $nx$:
\[ a_0 + \sum_{n=1}^{\infty} \left( a_n \cos nx + b_n \sin nx \right). \]

As we shall see, power series are well behaved. The set of x for which they converge is
always an interval that is symmetric (except possibly for the endpoints) about the origin or
about the value x0 if it has been shifted. Trigonometric series are not always well behaved.

Some Examples
We begin with the most important of the power series, the binomial series:
\[ (1 + x)^{\alpha} = 1 + \alpha x + \frac{\alpha(\alpha - 1)}{2!}\, x^2 + \frac{\alpha(\alpha - 1)(\alpha - 2)}{3!}\, x^3 + \cdots. \tag{4.12} \]
As we saw in equation (2.54) on page 42, the absolute value of the ratios of successive
terms is
\[ r(n) = \left| 1 - \frac{1 + \alpha}{n} \right| |x|. \]
This has a limit:
\[ \lim_{n\to\infty} \left| 1 - \frac{1 + \alpha}{n} \right| |x| = |x|. \]
By Corollary 4.8, the binomial series converges absolutely when $|x| < 1$, it diverges when
$|x| > 1$, and we do not yet know what happens when $|x| = 1$.

The exponential series
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \]
is another easy case. We have that
\[ r(n) = \left| \frac{x^n/n!}{x^{n-1}/(n-1)!} \right| = \frac{|x|}{n}. \]
Regardless of the value of $x$, this approaches 0 as $n$ gets arbitrarily large,
\[ \lim_{n\to\infty} \frac{|x|}{n} = 0 < 1. \]
This series converges absolutely for all values of $x$.
We can also use the root test on this series, replacing $n!$ by Stirling's formula,
$n! = n^n \sqrt{2\pi n}\, e^{-n+E(n)}$ where $E(n)$ approaches 0 as $n$ gets large:
\[ \rho(n) = \left| \frac{x^n}{n^n \sqrt{2\pi n}\, e^{-n+E(n)}} \right|^{1/n} = \frac{|x|}{n\,(2\pi n)^{1/2n}\, e^{-1+E(n)/n}}. \]
Again, this quantity approaches 0 as $n$ gets large, and so the exponential series converges
absolutely for all values of $x$.
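Both quantities can be tabulated directly. The sketch below is an illustration only: the value $x = 10$ is an arbitrary choice, and $\rho(n)$ is computed through the standard library's log-gamma function, using $\ln n! = \texttt{lgamma}(n+1)$, rather than through Stirling's formula.

```python
import math

def ratio_term(n, x):
    """r(n) = |x| / n, the ratio of consecutive summands of the exponential series."""
    return abs(x) / n

def root_term(n, x):
    """rho(n) = (|x|**n / n!)**(1/n), computed with logarithms to avoid overflow."""
    return math.exp(math.log(abs(x)) - math.lgamma(n + 1) / n)

x = 10.0  # illustrative value; any nonzero x gives the same limiting behavior
for n in (10, 100, 1000, 10000):
    print(n, ratio_term(n, x), root_term(n, x))
```

Both columns decrease toward 0, the root-test column more slowly, since $\rho(n)$ behaves like $e|x|/n$ rather than $|x|/n$.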

Radius of Convergence
A power series will often have the property that the absolute value of the ratio of consecutive
terms has a well-defined limit. The limit ratio test produces a bound on the absolute value of
$x$ (or a bound on $|x - x_0|$ if the series has been shifted) within which the series converges.
This bound is called the radius of convergence.
We apply the limit ratio test to
\[ 1 + 2x + 6x^2 + 20x^3 + \cdots + \binom{2n}{n} x^n + \cdots = 1 + \sum_{n=1}^{\infty} \binom{2n}{n} x^n, \]
where $\binom{2n}{n}$ is the binomial coefficient,
\[ \binom{2n}{n} = \frac{(2n)!}{n!\, n!}. \]
By the limit ratio test, this converges absolutely for
\[ \lim_{n\to\infty} \left| \frac{(2n+2)!\, x^{n+1}}{(n+1)!\,(n+1)!} \cdot \frac{n!\, n!}{(2n)!\, x^n} \right| = \lim_{n\to\infty} \left| \frac{(2n+1)(2n+2)\, x}{(n+1)(n+1)} \right| = 4|x| < 1. \]
The radius of convergence is 1/4.


We can also apply the limit ratio test to
\[ 1 + 2x^2 + 6x^4 + 20x^6 + \cdots + \binom{2n}{n} x^{2n} + \cdots = 1 + \sum_{n=1}^{\infty} \binom{2n}{n} x^{2n}. \]
For this series, we have absolute convergence when
\[ \lim_{n\to\infty} \left| \frac{(2n+2)!\, x^{2n+2}}{(n+1)!\,(n+1)!} \cdot \frac{n!\, n!}{(2n)!\, x^{2n}} \right| = \lim_{n\to\infty} \left| \frac{(2n+1)(2n+2)\, x^2}{(n+1)(n+1)} \right| = 4|x|^2 < 1. \]
The radius of convergence in this case is 1/2.
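The limit behind both radii can be watched numerically. This sketch, whose function name is chosen only for illustration, computes the coefficient ratio $\binom{2n+2}{n+1} / \binom{2n}{n}$ with exact integers before dividing.

```python
from math import comb

def ratio(n):
    """Ratio of consecutive central binomial coefficients, C(2n+2, n+1) / C(2n, n)."""
    return comb(2 * n + 2, n + 1) / comb(2 * n, n)

for n in (1, 10, 100, 1000):
    r = ratio(n)
    # 1/r estimates the radius for the series in x^n, and sqrt(1/r) for the series in x^(2n)
    print(n, r, 1 / r, (1 / r) ** 0.5)
```

The ratio equals $2(2n+1)/(n+1)$, so it approaches 4 from below, and the two radius estimates approach 1/4 and 1/2 from above.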

Definition: radius of convergence

The radius of convergence of a power series $\sum_{n=1}^{\infty} a_n x^n$ is the bound $B$ with the
property that the series converges absolutely for $|x| < B$, and the series diverges for
$|x| > B$.

As we shall see in the next few pages, part of the beauty and convenience of power series
is that there will always be a radius of convergence. If
\[ \lim_{n\to\infty} \left| \frac{a_{n+1} x^{n+1}}{a_n x^n} \right| \]
does not exist, perhaps because many of the $a_n$ are zero, we can still use the root test. We
have absolute convergence when the upper limit of $\sqrt[n]{|a_n x^n|}$, denoted by
\[ \overline{\lim}_{n\to\infty} \sqrt[n]{|a_n x^n|} = \overline{\lim}_{n\to\infty} \sqrt[n]{|a_n|}\; |x|, \]
is strictly less than 1, divergence when it is strictly greater than 1. The radius of convergence
is then
\[ R = \frac{1}{\overline{\lim}_{n\to\infty} \sqrt[n]{|a_n|}}. \]

When the limit of a sequence exists, then the upper limit is simply the limit. The advantage
of using the upper limit is that for any bounded sequence it always exists, even when the
limit does not.

lim inf and lim sup (Limb Soup)


In the proof of the Cauchy criterion, starting on page 123, we took our bounded sequence of
partial sums and considered the set of greatest lower bounds where $L_k$ is the greatest lower
bound of the set $\{S_k, S_{k+1}, S_{k+2}, \ldots\}$. We then took the least upper bound $M$ of the set of
$L_k$. In this case, because we had assumed that the sequence was Cauchy, $M$ was the limit of
the sequence $(S_1, S_2, S_3, \ldots)$. But all we needed in order to have a least upper bound of the
sequence of greatest lower bounds was that our original sequence was bounded. Given a
bounded sequence, we call this least upper bound of the sequence of greatest lower bounds
the lim inf or lower limit of the original sequence. Similarly, the greatest lower bound of
the sequence of least upper bounds is called the lim sup or upper limit.
Thus for the sequence

(0.9, 3.1, 0.99, 3.01, 0.999, 3.001, 0.9999, 3.0001, . . .) ,

the lower limit is the least upper bound of {0.9, 0.99, 0.999, . . .} which is 1. The upper
limit is the greatest lower bound of {3.1, 3.01, 3.001, . . .} which is 3.
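The two-step construction (bounds of tails first, then a bound of those bounds) can be traced on this example. The sketch below works with only the first 20 terms. For this particular sequence that is harmless, because the small terms increase and the large terms decrease, so the bound of each tail is attained at the first small or large term it contains. In general a finite sample can only suggest the upper and lower limits.

```python
def sequence(n_terms):
    """First n_terms (an even number) of (0.9, 3.1, 0.99, 3.01, 0.999, 3.001, ...)."""
    terms = []
    for j in range(1, n_terms // 2 + 1):
        terms.append(1 - 10.0 ** (-j))
        terms.append(3 + 10.0 ** (-j))
    return terms

xs = sequence(20)

# L_k and M_k: greatest lower and least upper bounds of the tail starting at index k.
# Each tail used here contains at least one small term and one large term.
tail_inf = [min(xs[k:]) for k in range(len(xs) - 1)]
tail_sup = [max(xs[k:]) for k in range(len(xs) - 1)]

lim_inf_est = max(tail_inf)   # least upper bound of the L_k
lim_sup_est = min(tail_sup)   # greatest lower bound of the M_k
print(lim_inf_est, lim_sup_est)
```

The two estimates agree with the lower limit 1 and the upper limit 3 to within $10^{-10}$, the size of the last terms sampled.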

Definition: upper and lower limits


The upper limit of a bounded sequence $(x_1, x_2, x_3, \ldots)$ is the greatest lower
bound of the set $\{M_1, M_2, M_3, \ldots\}$ where $M_k$ is the least upper bound of the set
$\{x_k, x_{k+1}, x_{k+2}, \ldots\}$. The lower limit of the bounded sequence is the negative of the
upper limit of $(-x_1, -x_2, -x_3, \ldots)$,
\[ \underline{\lim}_{n\to\infty}\, x_n = - \overline{\lim}_{n\to\infty}\, (-x_n). \]

We owe the concept of the upper limit to Cauchy. He introduced it in his Cours d’analyse
for exactly the reason we have used it here: to find the radius of convergence of an arbitrary
power series. His definition was less precise than we would tolerate today. He spoke of it
as “the limit towards which the greatest values converge.”

Existence of Radius of Convergence


We consider the sequence $S = \left( |a_1|, \sqrt{|a_2|}, \sqrt[3]{|a_3|}, \sqrt[4]{|a_4|}, \ldots \right)$. If this sequence is
unbounded, then for every $x \ne 0$, the sequence $\left( |a_1|\,|x|, \sqrt{|a_2|}\,|x|, \sqrt[3]{|a_3|}\,|x|, \sqrt[4]{|a_4|}\,|x|, \ldots \right)$
is also unbounded. By the root test (Theorem 4.9), the power series diverges at every value
of $x$ other than $x = 0$. In this case, the radius of convergence is zero. If $S$ is bounded, then
$\overline{\lim}_{n\to\infty} \sqrt[n]{|a_n|}$ will always be well defined and greater than or equal to zero. It still remains
for us to prove that $R = 1 / \overline{\lim}_{n\to\infty} \sqrt[n]{|a_n|}$ is a radius of convergence.


Theorem 4.14 (Existence of Radius of Convergence). Let $a_0 + \sum_{n=1}^{\infty} a_n x^n$ be an
arbitrary power series and define
\[ R = \frac{1}{\overline{\lim}_{n\to\infty} \sqrt[n]{|a_n|}}. \]
This series converges absolutely for $|x| < R$ and it diverges for $|x| > R$. The power
series converges at all values of $x$ when $\overline{\lim}_{n\to\infty} \sqrt[n]{|a_n|} = 0$, and it converges only at
$x = 0$ when the upper limit is infinite.


Proof: Let $\lambda = \overline{\lim}_{n\to\infty} \sqrt[n]{|a_n|}$. If $|x| < 1/\lambda$, then we can find an $\alpha$ just a little less than 1
and an $\epsilon$ just a little larger than zero so that we still have
\[ |x| < \frac{\alpha}{\lambda + \epsilon}. \]
It follows that
\[ \sqrt[n]{|a_n x^n|} = \sqrt[n]{|a_n|}\; |x| < \alpha\, \frac{\sqrt[n]{|a_n|}}{\lambda + \epsilon}. \]
By the definition of $\lambda$ as the upper limit of $\sqrt[n]{|a_n|}$, this last term is strictly less than $\alpha$ for all
sufficiently large values of $n$. The root test, Theorem 4.9, tells us that the series converges
absolutely.
If $|x| > 1/\lambda$, then we can find an $\epsilon$ just a little larger than zero so that we still have
\[ |x| > \frac{1}{\lambda - \epsilon}. \]
It follows that
\[ \sqrt[n]{|a_n x^n|} = \sqrt[n]{|a_n|}\; |x| > \frac{\sqrt[n]{|a_n|}}{\lambda - \epsilon}. \]
From the definition of $\lambda$, there must be infinitely many elements of $\sqrt[n]{|a_n|}$ that equal or
exceed $\lambda - \epsilon$. This means that there are infinitely many values of $n$ for which $\sqrt[n]{|a_n x^n|} \ge 1$.
The root test tells us that this series diverges.
Q.E.D.
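The formula for $R$ is useful exactly when the ratio of consecutive coefficients is undefined. As a rough illustration, the sketch below takes the series $1 + \sum \binom{2n}{n} x^{2n}$ from earlier in this section, viewed as a power series in $x$ whose odd coefficients are all zero. The function names and the window of 200 indices are assumptions of this example, and the maximum over a finite window only approximates the least upper bound of a full tail.

```python
import math

def coeff(m):
    """Coefficient of x**m in 1 + sum of C(2n, n) x**(2n); zero when m is odd."""
    return math.comb(m, m // 2) if m % 2 == 0 else 0

def nth_root_abs(m):
    """|a_m|**(1/m), computed with logarithms so large coefficients do not overflow."""
    a = coeff(m)
    return 0.0 if a == 0 else math.exp(math.log(a) / m)

def tail_sup(start, stop):
    """Largest value of |a_m|**(1/m) for start <= m < stop (a finite stand-in for a tail)."""
    return max(nth_root_abs(m) for m in range(start, stop))

for start in (10, 100, 1000):
    s = tail_sup(start, start + 200)
    print(start, s, 1 / s)
```

The sequence $\sqrt[m]{|a_m|}$ alternates between 0 and values creeping up toward 2, so it has no limit, but its upper limit is 2 and the printed estimates of $R$ approach 1/2, in agreement with the ratio test applied to the series in $x^2$.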

Hypergeometric Series
What happens when |x| equals the radius of convergence? The series might converge at
both endpoints, diverge at both, or converge at only one of these values. If it converges
at both, the convergence might be absolute or conditional. There is no single test that will
return a conclusive answer for all power series, but in 1812 Carl Friedrich Gauss did publish
a test that determines the convergence at the endpoints for every power series you are likely
to encounter outside of a course in real analysis. It is a definitive test that works when the
power series is hypergeometric.
The easiest infinite series with which to work is the geometric series,
\[ 1 + x + x^2 + x^3 + \cdots. \]
It converges to $1/(1 - x)$ when $|x| < 1$, and it diverges when $|x| \ge 1$. In the seventeenth
and eighteenth centuries, mathematicians began to appreciate a larger class of series that
was almost as nice, the hypergeometric series. A geometric series is characterized by the
fact that the ratio of two successive summands is constant. In a hypergeometric series, the
ratio of two successive nonzero summands is a rational function of the subscript.

Definition: hypergeometric series


A series $a_1 + a_2 + a_3 + \cdots$ is hypergeometric if
\[ \frac{a_{n+1}}{a_n} = \frac{P(n)}{Q(n)} \]
where $P(n)$ and $Q(n)$ are polynomials in $n$.

For example, the exponential series is hypergeometric:
\[ \frac{a_{n+1}}{a_n} = \frac{x^n/n!}{x^{n-1}/(n-1)!} = \frac{x}{n}. \]
The numerator is the constant $x$ (constant with respect to $n$), and the denominator is the
linear function $n$. The series for $\sin x$ is also hypergeometric:
\[ \frac{a_{n+1}}{a_n} = \frac{(-1)^n x^{2n+1}/(2n+1)!}{(-1)^{n-1} x^{2n-1}/(2n-1)!} = \frac{-x^2}{(2n)(2n+1)}. \]
Again the numerator is constant, $-x^2$; the denominator is a quadratic function, $4n^2 + 2n$.
The binomial series is also hypergeometric. Given that
\[ a_n = \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 2)}{(n-1)!}\, x^{n-1}, \]
the ratio of consecutive terms is
\[ \frac{a_{n+1}}{a_n} = \frac{(\alpha - n + 1)\,x}{n}. \]
In this case, both numerator and denominator are linear functions of $n$. Even a series such
as
\[ 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots \]
is hypergeometric:
\[ \frac{a_{n+1}}{a_n} = \frac{n^2}{(n+1)^2}. \]
It was quickly realized that most of the series people were finding were hypergeometric or
could be expressed in terms of hypergeometric series. On page 39 we encountered Euler's
differential equation that models a vibrating drumhead. Euler showed that the solution to
this equation is given by the series
\[
u(r) = r^{\beta} \left( 1 - \frac{1}{(\beta+1)} \left( \frac{\alpha r}{2} \right)^{2} + \frac{1}{2!\,(\beta+1)(\beta+2)} \left( \frac{\alpha r}{2} \right)^{4}
 - \frac{1}{3!\,(\beta+1)(\beta+2)(\beta+3)} \left( \frac{\alpha r}{2} \right)^{6} + \cdots \right). \tag{4.13}
\]
The $n$th summand is
\[ a_n = (-1)^{n-1}\, \frac{1}{(n-1)!\,(\beta+1)(\beta+2) \cdots (\beta+n-1)} \left( \frac{\alpha r}{2} \right)^{2n-2}. \]
The ratio of successive summands is
\[ \frac{a_{n+1}}{a_n} = \frac{-\alpha^2 r^2}{4n(\beta + n)}. \tag{4.14} \]

We again have a hypergeometric series. The numerator is constant (as a function of n), and
the denominator is a quadratic polynomial.
Gauss's attention was turned to hypergeometric series by problems in astronomy.
Like Euler, he found that the solutions he was obtaining were power series that satisfied
the hypergeometric condition. In 1812, he presented a thorough study of these series
entitled “Disquisitiones generales circa seriem infinitam
$1 + \frac{\alpha\beta}{1 \cdot \gamma}\, x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1 \cdot 2 \cdot \gamma(\gamma+1)}\, xx + \frac{\alpha(\alpha+1)(\alpha+2)\beta(\beta+1)(\beta+2)}{1 \cdot 2 \cdot 3 \cdot \gamma(\gamma+1)(\gamma+2)}\, x^3 +$ etc.”

The Question of Convergence


A hypergeometric series is custom-made for the ratio test, or rather, the ratio test is
custom-made for hypergeometric series. We can always make sense of the limit
\[ \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \lim_{n\to\infty} \left| \frac{P(n)}{Q(n)} \right|. \]
We observe that if the degree of $P(n)$ is larger than the degree of $Q(n)$, then $|P(n)/Q(n)|$
gets arbitrarily large as $n$ increases, and so the series diverges. If the degree of $P(n)$ is less
than the degree of $Q(n)$, then our ratio approaches zero as $n$ increases and so the series is
absolutely convergent. In both of these cases, our conclusion is independent of the choice
of $x$.
The exponential function, the sine, and the cosine all fall into this second category. These
are functions for which the radius of convergence is infinite, and so there are no endpoints
of the interval of convergence. If $x$ is an endpoint of the interval of convergence, then we
know that the series evaluated at this point satisfies
\[ \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \lim_{n\to\infty} \left| \frac{P(n)}{Q(n)} \right| = 1. \]
This happens if and only if $P$ and $Q$ are polynomials of the same degree with leading
coefficients that have the same absolute value:
\[ \frac{P(n)}{Q(n)} = \frac{C_t n^t + C_{t-1} n^{t-1} + \cdots + C_0}{c_t n^t + c_{t-1} n^{t-1} + \cdots + c_0}, \]
where $C_t = \pm c_t$.

On the Radius of Convergence


Gauss found a test that is absolutely sharp for all hypergeometric series for which
$\lim_{n\to\infty} |P(n)/Q(n)| = 1$. It never returns an inconclusive answer. Nine years before
Cauchy published his Cours d’analyse, Gauss demonstrated an understanding of the question
of convergence that was decades ahead of its time. Twenty years later, in 1832,
J. L. Raabe was to publish a test for convergence that could be applied to hypergeometric
series but which was less effective than Gauss’s test, leaving some situations indeterminate.
Gauss was so far ahead of his contemporaries that few realized what he had accomplished.
It was not until other mathematicians began to rediscover his test that it was recognized
that Gauss had already been there. Not only was Gauss the first to arrive, his proof is a
model of clarity and precision. One sees in it the hand of the master.

Web Resource: To see Gauss’s proof of his test as well as additional information on
it, go to Gauss’s test.

Theorem 4.15 (Gauss's Test). Let $a_0 + a_1 + a_2 + \cdots$ be a hypergeometric series for
which
\[ \frac{a_{n+1}}{a_n} = \frac{C_t n^t + C_{t-1} n^{t-1} + \cdots + C_0}{c_t n^t + c_{t-1} n^{t-1} + \cdots + c_0}, \]
where $C_t = \pm c_t$. Set $B_j = C_j/C_t$ and $b_j = c_j/c_t$ so that the resulting polynomials
are monic (the coefficient of the highest term is 1). The test is as follows:
1. If $B_{t-1} > b_{t-1}$, then the absolute values of the summands grow without limit and
the series cannot converge.
2. If $B_{t-1} = b_{t-1}$, then the absolute values of the summands approach a finite nonzero
limit and the series cannot converge.
3. If $B_{t-1} < b_{t-1}$, then the absolute values of the summands approach zero. If the
series is alternating, then it converges.
4. If $B_{t-1} \ge b_{t-1} - 1$, then the series is not absolutely convergent.
5. If $B_{t-1} < b_{t-1} - 1$, then the series is absolutely convergent.

We note that if the question is simply one of convergence, then there are three cases:
1. If $B_{t-1} \ge b_{t-1}$, then the series does not converge.
2. If $b_{t-1} > B_{t-1} \ge b_{t-1} - 1$, then the series converges if and only if it is an alternating
series.
3. If $b_{t-1} - 1 > B_{t-1}$, then the series is absolutely convergent.


We can use Gauss's test to determine the convergence of $1 + \sum_{n=1}^{\infty} \binom{2n}{n} x^n$ at the
endpoints of the interval of convergence, $x = \pm 1/4$. At $x = 1/4$, we have
\[
\frac{a_{n+1}}{a_n} = \frac{(2n+2)!\,(1/4)^{n+1}}{(n+1)!\,(n+1)!} \cdot \frac{n!\, n!}{(2n)!\,(1/4)^{n}}
 = \frac{(2n+2)(2n+1)}{(n+1)(n+1)\,4} = \frac{4n^2 + 6n + 2}{4n^2 + 8n + 4} = \frac{n^2 + (3/2)n + (1/2)}{n^2 + 2n + 1}.
\]
We see that $B_1 = 3/2$, $b_1 = 2$ and we are in the situation where
\[ b_1 = 2 > B_1 = \frac{3}{2} \ge b_1 - 1 = 1. \]
The series converges if and only if it is alternating. At $x = 1/4$, the series does not alternate,
and so it diverges. At $x = -1/4$, the summands of the series do alternate in sign, and so
the series converges conditionally. The interval of convergence is $[-1/4, 1/4)$.
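The three-case summary that follows Theorem 4.15 is mechanical enough to write as a small function. This is only a sketch: the name `gauss_test` and its interface are invented here. It takes the coefficients $B_{t-1}$ and $b_{t-1}$ of the monic numerator and denominator, together with a flag saying whether the summands alternate in sign, and it assumes the caller has already checked that the series is hypergeometric with $C_t = \pm c_t$.

```python
from fractions import Fraction

def gauss_test(B, b, alternating):
    """Apply the three-case form of Gauss's test.

    B, b: coefficients B_{t-1} and b_{t-1} of the monic numerator and denominator.
    alternating: True if the summands alternate in sign.
    """
    if B >= b:
        return "diverges"
    if B >= b - 1:
        # summands approach zero but the series is not absolutely convergent
        return "converges conditionally" if alternating else "diverges"
    return "converges absolutely"

# The example above: B_1 = 3/2 and b_1 = 2 at both endpoints.
print(gauss_test(Fraction(3, 2), 2, alternating=False))  # x = 1/4
print(gauss_test(Fraction(3, 2), 2, alternating=True))   # x = -1/4
```

Exact fractions are used for the coefficients so that the boundary cases $B_{t-1} = b_{t-1}$ and $B_{t-1} = b_{t-1} - 1$ are not blurred by floating-point rounding.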
For a more general example, we consider the binomial series
\[ (1 + x)^{\alpha} = 1 + \alpha x + \frac{\alpha(\alpha - 1)}{2!}\, x^2 + \frac{\alpha(\alpha - 1)(\alpha - 2)}{3!}\, x^3 + \cdots, \]
\[ \frac{a_{n+1}}{a_n} = \frac{(-n + \alpha)\,x}{n + 1}. \]
The radius of convergence is $|-1/1| = 1$. If $x = \pm 1$, then the rational function that
determines convergence is
\[ \frac{n - \alpha}{n + 1}. \]
We see that $t = 1$, $B_0 = -\alpha$, $b_0 = 1$.

1. If $-\alpha \ge 1$ ($\alpha \le -1$), then the summands either grow without limit ($\alpha < -1$) or all have
absolute value 1 ($\alpha = -1$). In either case, the series does not converge.
2. If $1 > -\alpha > 0$ ($-1 < \alpha < 0$), then the summands approach zero. The series converges
when the summands alternate in sign which happens when $x = 1$. It diverges when
$x = -1$. (Note that the case $\alpha = 0$ is degenerate: $(1 + x)^0 = 1$. This is true for all
values of $x$.)
3. If $0 > -\alpha$ ($\alpha > 0$), then the series converges absolutely. It converges for both $x = 1$
and $x = -1$.

Exercises

The symbol
M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.
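4.3.1. Determine the domain of convergence of the power series given below.
a. $\displaystyle\sum_{n=1}^{\infty} n^3 x^n$
b. $\displaystyle\sum_{n=1}^{\infty} \frac{2^n}{n!}\, x^n$
c. $\displaystyle\sum_{n=1}^{\infty} \frac{2^n}{n^2}\, x^n$
d. $\displaystyle\sum_{n=1}^{\infty} \left( 2 + (-1)^n \right)^n x^n$
e. $\displaystyle\sum_{n=1}^{\infty} \left( \frac{2 + (-1)^n}{5 + (-1)^{n+1}} \right)^{n} x^n$
f. $\displaystyle\sum_{n=1}^{\infty} 2^n x^{n^2}$
g. $\displaystyle\sum_{n=1}^{\infty} 2^{n^2} x^{n!}$
h. $\displaystyle\sum_{n=1}^{\infty} \left( 1 + \frac{1}{n} \right)^{(-1)^n n^2} x^n$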

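4.3.2. Find the domain of convergence of the following series.
a. $\displaystyle\sum_{n=1}^{\infty} \frac{(x - 1)^{2n}}{2^n n^3}$
b. $\displaystyle\sum_{n=1}^{\infty} \frac{n}{n+1} \left( \frac{2x + 1}{x} \right)^{n}$
c. $\displaystyle\sum_{n=1}^{\infty} \frac{n\, 4^n}{3^n}\, x^n (1 - x)^n$
d. $\displaystyle\sum_{n=1}^{\infty} \frac{(n!)^2}{(2n)!}\, (x - 1)^n$
e. $\displaystyle\sum_{n=1}^{\infty} \sqrt{n}\, (\tan x)^n$
f. $\displaystyle\sum_{n=1}^{\infty} \left( \arctan(1/x) \right)^{n^2}$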


4.3.3. Find the radius of convergence $R$ of $\sum_{n=0}^{\infty} a_n x^n$ if
a. there are $\alpha$ and $L > 0$ such that $\lim_{n\to\infty} |a_n n^{\alpha}| = L$,
b. there exist positive $\alpha$ and $L$ such that $\lim_{n\to\infty} |a_n \alpha^n| = L$,
c. there is a positive $L$ such that $\lim_{n\to\infty} |a_n n!| = L$.

4.3.4. Suppose that the radius of convergence of $\sum_{n=0}^{\infty} a_n x^n$ is $R$, $0 < R < \infty$. Evaluate
the radius of convergence of the following series.
a. $\displaystyle\sum_{n=0}^{\infty} 2^n a_n x^n$
b. $\displaystyle\sum_{n=0}^{\infty} n^n a_n x^n$
c. $\displaystyle\sum_{n=0}^{\infty} \frac{n^n}{n!}\, a_n x^n$
d. $\displaystyle\sum_{n=0}^{\infty} a_n^2 x^n$

4.3.5. Find the radius of convergence for
\[ \sum_{k=1}^{\infty} \left( \frac{k}{2k-1} \right)^{k} x^k. \]

M&M 4.3.6. Graph the partial sums
\[ S_n(x) = \sum_{k=1}^{n} \left( \frac{k}{2k-1} \right)^{k} x^k \]
for $n = 3, 6, 9,$ and $12$. Describe what you see. Do you expect convergence at either or both
of the endpoints of the interval of convergence of the infinite series? Prove your assertions.

4.3.7. Find the radius of convergence for
\[ \sum_{k=1}^{\infty} \frac{k^k}{k!}\, x^k. \]

M&M 4.3.8. Graph the partial sums
\[ S_n(x) = \sum_{k=1}^{n} \frac{k^k}{k!}\, x^k \]
for $n = 3, 6, 9,$ and $12$. Describe what you see. Do you expect convergence at either or both
of the endpoints of the interval of convergence of the infinite series? Prove your assertions.

4.3.9. Find the radius of convergence for
\[ \sum_{k=1}^{\infty} \frac{2^k}{\sqrt{k}}\, x^k. \]

M&M 4.3.10. Graph the partial sums
\[ S_n(x) = \sum_{k=1}^{n} \frac{2^k}{\sqrt{k}}\, x^k \]
for $n = 3, 6, 9,$ and $12$. Describe what you see. Do you expect convergence at either or both
of the endpoints of the interval of convergence of the infinite series? Prove your assertions.

M&M 4.3.11. For the series
\[ \frac{2}{3}\, x + \frac{2 \cdot 4}{3 \cdot 5}\, x^2 + \frac{2 \cdot 4 \cdot 6}{3 \cdot 5 \cdot 7}\, x^3 + \cdots, \]
graph the partial sums of the first 3 terms, the first 6 terms, the first 9 terms, and the first 12
terms over the domain $-2 \le x \le 2$. Find the radius of convergence $R$ for this series.

4.3.12. Using the series in exercise 4.3.11, decide whether or not this series converges
when $x = R$ and when $x = -R$. Explain your answers.

4.3.13. Use Stirling's formula to prove that
\[ 1 \cdot 3 \cdot 5 \cdots (2n - 1) = 2^{n+1/2}\, n^n\, e^{-n+F(n)} \tag{4.15} \]
where $F(n)$ is an error term that approaches zero as $n$ gets large.

M&M 4.3.14. Graph the partial sums of the first 3 terms, the first 6 terms, the first 9
terms, and the first 12 terms of the series
\[ \sum_{k=1}^{\infty} \frac{k^k}{1 \cdot 3 \cdot 5 \cdots (2k - 1)}\, x^k, \]
over the domain $-2 \le x \le 2$. Find the radius of convergence for this series.

M&M 4.3.15. For each of the following series:
(i) Verify that the series is hypergeometric.
(ii) Graph the polynomial approximations that are obtained from the first three, six, and
nine terms of the series. Describe what you see and where it appears that each of these
gives a reasonable approximation to the function represented by this series.
(iii) Find the radius of convergence.
(iv) Use Gauss's test to determine whether or not the series converges at each endpoint.
a. $\displaystyle x + \frac{x^2}{4} + \frac{x^3}{9} + \cdots = \sum_{k=1}^{\infty} \frac{x^k}{k^2}$
b. $\displaystyle 1 + \frac{2!}{1 \cdot 1}\, x + \frac{4!}{2! \cdot 2!}\, x^2 + \frac{6!}{3! \cdot 3!}\, x^3 + \cdots = 1 + \sum_{k=1}^{\infty} \frac{(2k)!}{k! \cdot k!}\, x^k$
c. $\displaystyle 1 + \frac{1}{3!}\, x + \frac{(2!)^3}{6!}\, x^2 + \frac{(3!)^3}{9!}\, x^3 + \cdots = 1 + \sum_{k=1}^{\infty} \frac{(k!)^3}{(3k)!}\, x^k$
d. $\displaystyle 1 + \frac{3}{1}\, x + \frac{3 \cdot 5}{2!}\, x^2 + \frac{3 \cdot 5 \cdot 7}{3!}\, x^3 + \cdots = 1 + \sum_{k=1}^{\infty} \frac{3 \cdot 5 \cdots (2k+1)}{k!}\, x^k$
e. $\displaystyle \frac{3}{4}\, x^2 + \frac{3 \cdot 8}{4 \cdot 9}\, x^3 + \frac{3 \cdot 8 \cdot 15}{4 \cdot 9 \cdot 16}\, x^4 + \cdots = \sum_{k=2}^{\infty} \frac{3 \cdot 8 \cdots (k^2 - 1)}{4 \cdot 9 \cdots k^2}\, x^k$
f. $\displaystyle 1 + x + \frac{1^2 \cdot 3^2}{(2!)^2}\, x^2 + \frac{1^2 \cdot 3^2 \cdot 5^2}{(3!)^2}\, x^3 + \cdots = 1 + \sum_{k=1}^{\infty} \frac{1^2 \cdot 3^2 \cdots (2k-1)^2}{(k!)^2}\, x^k$
g. $\displaystyle 1 + \frac{3!}{1 \cdot 2!}\, x + \frac{6!}{2! \cdot 4!}\, x^2 + \frac{9!}{3! \cdot 6!}\, x^3 + \cdots = 1 + \sum_{k=1}^{\infty} \frac{(3k)!}{k! \cdot (2k)!}\, x^k$

4.3.16. Explain why the following series is not a hypergeometric series:
\[ x + \frac{x^2}{2} - \frac{x^3}{3} + \frac{x^4}{4} + \frac{x^5}{5} - \frac{x^6}{6} + \cdots. \]

4.3.17. The power series in exercise 4.3.16 can be expressed as a difference of two
hypergeometric series. What are they?

4.3.18. Find the upper and lower limits of the following sequences.
a. $a_n = n\alpha - \lfloor n\alpha \rfloor$, $\alpha \in \mathbb{Q}$
b. $a_n = n\alpha - \lfloor n\alpha \rfloor$, $\alpha \notin \mathbb{Q}$
c. $a_n = \sin(n\pi\alpha)$, $\alpha \in \mathbb{Q}$
d. $a_n = \sin(n\pi\alpha)$, $\alpha \notin \mathbb{Q}$

4.3.19. Prove that $\overline{\lim}_{n\to\infty}\, a_n = A$ if and only if given any $\epsilon > 0$, there exists a response $N$
so that for any $n \ge N$, $a_n < A + \epsilon$ and there is an $m \ge n$ with $a_m > A - \epsilon$.

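4.3.20. Prove that if $a_n$ and $b_n$ are bounded sequences, then
\[
\underline{\lim}_{n\to\infty}\, a_n + \underline{\lim}_{n\to\infty}\, b_n \le \underline{\lim}_{n\to\infty}\, (a_n + b_n) \le \underline{\lim}_{n\to\infty}\, a_n + \overline{\lim}_{n\to\infty}\, b_n
 \le \overline{\lim}_{n\to\infty}\, (a_n + b_n) \le \overline{\lim}_{n\to\infty}\, a_n + \overline{\lim}_{n\to\infty}\, b_n.
\]
For each of these inequalities, give an example of sequences $\{a_n\}$ and $\{b_n\}$ for which weak
inequality ($\le$) becomes strict inequality ($<$).

4.3.21. Prove that if $\lim_{n\to\infty} a_n = a$, then
\[ \overline{\lim}_{n\to\infty}\, (a_n + b_n) = a + \overline{\lim}_{n\to\infty}\, b_n. \]

4.3.22. Prove that if $\lim_{n\to\infty} a_n = a > 0$ and $b_n \ge 0$ for all $n$ sufficiently large, then
\[ \overline{\lim}_{n\to\infty}\, (a_n \cdot b_n) = a \cdot \overline{\lim}_{n\to\infty}\, b_n. \]

4.3.23. Prove that if $a_n > 0$ then
\[ \underline{\lim}_{n\to\infty}\, \frac{a_{n+1}}{a_n} \le \underline{\lim}_{n\to\infty}\, \sqrt[n]{a_n} \le \overline{\lim}_{n\to\infty}\, \sqrt[n]{a_n} \le \overline{\lim}_{n\to\infty}\, \frac{a_{n+1}}{a_n}. \tag{4.16} \]

4.3.24. Let $f$ be a continuous function for all $x$ and let $\{x_n\}$ be a bounded sequence. Prove
or disprove:
\[ \underline{\lim}_{n\to\infty}\, f(x_n) = f\left( \underline{\lim}_{n\to\infty}\, x_n \right) \quad \text{and} \quad \overline{\lim}_{n\to\infty}\, f(x_n) = f\left( \overline{\lim}_{n\to\infty}\, x_n \right). \]

4.3.25. Let $f$ be a continuous and increasing function for all $x$ and let $\{x_n\}$ be a bounded
sequence. Prove that
\[ \underline{\lim}_{n\to\infty}\, f(x_n) = f\left( \underline{\lim}_{n\to\infty}\, x_n \right) \quad \text{and} \quad \overline{\lim}_{n\to\infty}\, f(x_n) = f\left( \overline{\lim}_{n\to\infty}\, x_n \right). \]

4.3.26. Let $f$ be a continuous and decreasing function for all $x$ and let $\{x_n\}$ be a bounded
sequence. Prove or disprove:
\[ \underline{\lim}_{n\to\infty}\, f(x_n) = f\left( \underline{\lim}_{n\to\infty}\, x_n \right) \quad \text{and} \quad \overline{\lim}_{n\to\infty}\, f(x_n) = f\left( \overline{\lim}_{n\to\infty}\, x_n \right). \]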

M&M 4.3.27. Evaluate the sum of the first thousand terms of
\[ \sum_{k=1}^{\infty} \left( \frac{2 \cdot 5 \cdots (3k - 1)}{3 \cdot 6 \cdots (3k)} \right)^{m} \]
when $m = 1, 2, 3,$ and $4$. Use Gauss's test to determine those values of $m$ for which this
series converges.

M&M 4.3.28. Evaluate the sum of the first thousand terms of
\[ \sum_{j=1}^{\infty} \left( \frac{1 \cdot 4 \cdots (3j - 2)}{3 \cdot 6 \cdots (3j)} \right)^{m} \]
when $m = 1, 2, 3,$ and $4$. Use Gauss's test to determine those values of $m$ for which this
series converges.

4.4 The Convergence of Fourier Series


None of the convergence tests that we have examined so far can help us with the question
of convergence of the Fourier series that we met in the first chapter:

$$\cos(\pi x/2) - \frac{1}{3}\cos(3\pi x/2) + \frac{1}{5}\cos(5\pi x/2) - \frac{1}{7}\cos(7\pi x/2) + \cdots. \tag{4.17}$$

Table 4.3. Comparison of summands and partial sums; summations are over k odd, 1 ≤ k ≤ n.

n   $\pm n^{-1}\cos(0.15\,n\pi)$   $\sum \pm k^{-1}\cos(0.15\,k\pi)$   $\sum \pm\cos(0.15\,k\pi)$
1 0.8910070 0.891007 0.891007
3 −0.0521448 0.838862 0.734572
5 −0.1414210 0.697440 0.027465
7 0.1410980 0.838539 1.015150
9 −0.0504434 0.788095 0.561163
11 −0.0412719 0.746823 0.107173
13 0.0759760 0.822799 1.094860
15 −0.0471405 0.775659 0.387754
17 −0.0092020 0.766457 0.231320
19 0.0468951 0.813352 1.122330
21 −0.0424289 0.770923 0.231320
23 0.0068015 0.777725 0.387754
25 0.0282843 0.806009 1.094860
27 −0.0365810 0.769428 0.107173
29 0.0156548 0.785083 0.561163
31 0.0146449 0.799728 1.015150
33 −0.0299299 0.769798 0.027465
35 0.0202031 0.790001 0.734572
37 0.0042280 0.794229 0.891007
39 −0.0228463 0.771382 0.000000
41 0.0217319 0.793114 0.891007
43 −0.0036380 0.789476 0.734572
45 −0.0157135 0.773763 0.027465
47 0.0210146 0.794777 1.015150
49 −0.0092651 0.785512 0.561163
51 −0.0089018 0.776611 0.107173
53 0.0186356 0.795246 1.094860
55 −0.0128565 0.782390 0.387754
57 −0.0027445 0.779645 0.231320
59 0.0151018 0.794747 1.122330

It does not converge absolutely; at x = 0 it becomes the alternating series


$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$$
On the other hand, for most values of x it does not alternate. Table 4.3 shows summands
and partial sums when x = 0.3 (the significance of the last column will be explained after
Abel’s lemma). The sign of the summands displays an interesting pattern:

+−−+−−+−−+
−++−++−++−
+−−+−−+−−+
− + +··· ,

but this is not an alternating series.
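
The entries of Table 4.3 can be regenerated with a few lines of code. What follows is a minimal sketch in Python, not the Maple or Mathematica code from the Web Resources; the function name `table_rows` and its parameters are our own choices for illustration.

```python
import math

def table_rows(x=0.3, n_max=59):
    """Return (n, summand, weighted partial sum, unweighted partial sum) for odd n."""
    y = math.pi * x / 2              # equals 0.15*pi when x = 0.3
    weighted = 0.0                   # running sum of +/- cos(k*y) / k
    unweighted = 0.0                 # running sum of +/- cos(k*y)
    rows = []
    for j, n in enumerate(range(1, n_max + 1, 2)):
        sign = 1 if j % 2 == 0 else -1   # the series alternates +, -, +, - in front of the cosines
        a = sign * math.cos(n * y)
        weighted += a / n
        unweighted += a
        rows.append((n, a / n, weighted, unweighted))
    return rows

if __name__ == "__main__":
    for n, summand, t, s in table_rows():
        print(f"{n:3d} {summand:11.7f} {t:9.6f} {s:9.6f}")
```

Printing the rows in this way should agree with the table to the displayed precision, including the repeating sign pattern of the summands.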


Joseph Fourier had shown that this particular series converges for all values of x, but it
was Niels Henrik Abel (1802–1829) who, in 1826, published results on the analysis of such
series, enabling the construction of a simple and useful test for the convergence of Fourier
series. Abel was a Norwegian, born in Findö. In 1825, the Norwegian government paid
for him to travel through Europe to meet and study with the great mathematicians of the
time. He arrived in Paris in the summer of 1826. He had already done great mathematics:
his primary accomplishment was the proof that the roots of a general fifth degree (quintic)
polynomial cannot be expressed in terms of algebraic operations on the coefficients (for
arbitrary quadratic, cubic, and biquadratic polynomials, the roots can be expressed in terms
of the coefficients).
Abel stayed only six months in Paris. Almost all of the great mathematicians of the time
were there, but it was difficult to get to know them and he felt very isolated. He described
this world in a letter written to his former teacher and mentor, Bernt Holmboe, back in
Norway:

I am so anxious to hear your news! You have no idea. Don’t let me down, send
me a few consoling lines in this isolation where I find myself because, to tell you
the truth, this, the busiest capital on the continent, now feels to me like a desert.
I know almost no one; that’s because in the summer months everyone is in the
country, and no one can be found. Up until now, I have only met Mr. Legendre,
Cauchy and Hachette, and several less famous but very capable mathematicians:
Mr. Saigey, the editor of the Bulletin of the Sciences, and Mr. Lejeune-Dirichlet,
a Prussian who came to see me the other day believing I was a compatriot. He’s
a mathematician of penetrating ability. With Mr. Legendre he has proven the
impossibility of a solution in integers to the equation $x^5 + y^5 = z^5$, and some
other very beautiful things. Legendre is extremely kind, but unfortunately very
old. Cauchy is crazy, and there is no way of getting along with him, even though
right now he is the only one who knows how mathematics should be done. What
he is doing is excellent, but very confusing. At first I understood almost nothing;
now I see a little more clearly. He is publishing a series of memoirs under the title
Exercises in mathematics. I’m buying them and reading them assiduously. Nine
issues have appeared since the beginning of this year. Cauchy is, at the moment,
the only one concerned with pure mathematics. Poisson, Fourier, Ampère, etc.
are working exclusively on magnetism and other physical subjects. Mr. Laplace
is writing nothing, I think. His last work was a supplement to his Theory of
probabilities. I’ve often seen him at the Institute. He’s a very jolly little man.
Poisson is a short gentleman; he knows how to carry himself with a great deal of
dignity; the same for Mr. Fourier. Lacroix is really old. Mr. Hachette is going to
introduce me to some of these men.

Dirichlet was actually from Düren in Germany, near Bonn and Cologne where he had
attended school. In 1822, at the age of seventeen, he had come to Paris to study. Dirichlet
also left Paris at the end of 1826, going to a professorship at the university in Breslau. Both
of these men play a role in the mathematics that we shall see in this section. Abel’s traveling
allowance was not generous, and most of it was sent home to support his widowed mother
and younger siblings. His living conditions were mean. While in Paris, he was diagnosed
with tuberculosis. In January of 1829, it killed him.

Abel’s Lemma

Theorem 4.16 (Abel’s Lemma). We consider a series of the form

$$a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots$$

where the b’s are positive and decreasing: $b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0$. Let $S_n$ be the nth
partial sum of the a’s:

$$S_n = \sum_{k=1}^{n} a_k.$$

If these partial sums stay bounded—that is to say, if there is some number M for which
|Sn | ≤ M for all values of n—then
$$\left| \sum_{k=1}^{n} a_k b_k \right| \le M b_1. \tag{4.18}$$

We note that this theorem is applicable to Fourier series such as the series given in (4.17).
We take
$$b_1 = 1, \quad b_2 = \frac{1}{3}, \quad b_3 = \frac{1}{5}, \quad b_4 = \frac{1}{7}, \quad \ldots,$$

$$a_1 = \cos(\pi x/2), \quad a_2 = -\cos(3\pi x/2), \quad a_3 = \cos(5\pi x/2), \quad a_4 = -\cos(7\pi x/2), \quad \ldots.$$

While it will still take some work to prove that the partial sums of these a’s stay bounded,
a little experimentation shows that whenever x is rational the partial sums are periodic (see
the last column of Table 4.3 and Figure 4.3).

FIGURE 4.3. Plot of partial sums of ±cos(0.15 kπ), k odd, 1 ≤ k ≤ n.

We notice that the sum of the a’s does not have to converge. When x = 0, we have
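$$a_1 = 1, \quad a_2 = -1, \quad a_3 = 1, \quad a_4 = -1, \quad \ldots,$$

$$S_n = \sum_{k=1}^{n} (-1)^{k-1} = \begin{cases} 1, & \text{if } n \text{ is odd}, \\ 0, & \text{if } n \text{ is even}. \end{cases}$$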

This series does not converge, but the partial sums are bounded by M = 1.

Proof: We use the fact that $a_k = S_k - S_{k-1}$ and do a little rearranging of the partial sum
of the $a_k b_k$:

$$\begin{aligned}
\sum_{k=1}^{n} a_k b_k &= S_1 b_1 + (S_2 - S_1) b_2 + \cdots + (S_n - S_{n-1}) b_n \\
&= (S_1 b_1 + S_2 b_2 + \cdots + S_n b_n) - (S_1 b_2 + S_2 b_3 + \cdots + S_{n-1} b_n) \\
&= S_1 (b_1 - b_2) + S_2 (b_2 - b_3) + \cdots + S_{n-1} (b_{n-1} - b_n) + S_n b_n \\
&= \sum_{k=1}^{n-1} S_k (b_k - b_{k+1}) + S_n b_n.
\end{aligned} \tag{4.19}$$

We take absolute values and use the fact that the absolute value of a sum is less than or
equal to the sum of the absolute values. We then use our assumptions that |Sk | ≤ M and
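$b_k - b_{k+1} \ge 0$:

$$\begin{aligned}
\left| \sum_{k=1}^{n} a_k b_k \right| &\le \sum_{k=1}^{n-1} \left| S_k (b_k - b_{k+1}) \right| + |S_n b_n| \\
&= \sum_{k=1}^{n-1} |S_k| (b_k - b_{k+1}) + |S_n| b_n \\
&\le \sum_{k=1}^{n-1} M (b_k - b_{k+1}) + M b_n \\
&= M (b_1 - b_2 + b_2 - b_3 + \cdots + b_{n-1} - b_n + b_n) \\
&= M b_1.
\end{aligned} \tag{4.20}$$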

Q.E.D.
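
The rearrangement in (4.19) and the bound in (4.18) are easy to test numerically. The sketch below is our own illustration in Python; the name `abel_check` is hypothetical, and M is taken to be the largest |S_k| among the terms supplied, which is all that the proof uses for a fixed n.

```python
import math

def abel_check(a, b):
    """Return (direct sum, sum rearranged as in (4.19), bound M * b_1) for lists a and b."""
    n = len(a)
    S = []                               # S[k] holds the partial sum a_1 + ... + a_(k+1)
    running = 0.0
    for ak in a:
        running += ak
        S.append(running)
    direct = sum(ak * bk for ak, bk in zip(a, b))
    by_parts = sum(S[k] * (b[k] - b[k + 1]) for k in range(n - 1)) + S[-1] * b[-1]
    M = max(abs(s) for s in S)           # bound on the partial sums of the a's
    return direct, by_parts, M * b[0]

if __name__ == "__main__":
    y = 0.15 * math.pi                   # x = 0.3, as in Table 4.3
    a = [(-1) ** (k - 1) * math.cos((2 * k - 1) * y) for k in range(1, 31)]
    b = [1 / (2 * k - 1) for k in range(1, 31)]
    print(abel_check(a, b))
```

With the a’s and b’s of series (4.17) at x = 0.3, the first two returned values should agree up to rounding, and both should be no larger in absolute value than the third.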

The Convergence Test


As it stands, Abel’s lemma does not seem to be much help. The partial sums of the a’s in
our example never exceed 1.12233 (see the last column of Table 4.3), but since $b_1 = 1$, Abel’s lemma only tells us that
for every odd integer n,

$$\left| \cos(0.15\pi) - \frac{1}{3}\cos(0.45\pi) + \frac{1}{5}\cos(0.75\pi) - \frac{1}{7}\cos(1.05\pi) + \cdots \pm \frac{1}{n}\cos(0.15\,n\pi) \right| \le 1.12234. \tag{4.21}$$
That may be nice to know, but it does not prove convergence.
Abel proved his lemma in order to answer questions about the convergence of power
series. His paper of 1826 was the first fully rigorous treatment of the binomial series for all
values (real and complex) of x. We are almost to a convergence test for Fourier series, but
it was Dirichlet who was the first to publicly point out how to pass from Abel’s lemma to
the convergence test that we shall apply to Fourier series.
The key is to use the Cauchy criterion. A series converges if and only if the partial sums
can be brought arbitrarily close together by taking sufficient terms. The difference between
two partial sums is simply a partial sum that starts much farther out. We observe that
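$$\left| \sum_{k=m+1}^{n} a_k \right| = \left| \sum_{k=1}^{n} a_k - \sum_{k=1}^{m} a_k \right| \le |S_n| + |S_m| \le 2M. \tag{4.22}$$

If $T_n = \sum_{k=1}^{n} a_k b_k$ and the a’s and b’s satisfy the conditions of Abel’s lemma, then
applying the lemma to the summands with index greater than m, with 2M in place of M and $b_{m+1}$ in place of $b_1$, gives

$$|T_n - T_m| = \left| \sum_{k=m+1}^{n} a_k b_k \right| \le 2M b_{m+1}. \tag{4.23}$$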

If the b’s actually are approaching 0—notice that we did not need to assume this for Abel’s
lemma—then the difference between the partial sums can be made arbitrarily small and the
series must converge.

Corollary 4.17 (Dirichlet’s Test). We consider a series of the form

$$a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots$$

where the b’s are positive, decreasing, and approaching 0,

$$b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0.$$

Let $S_n$ be the nth partial sum of the a’s:

$$S_n = \sum_{k=1}^{n} a_k.$$

If these partial sums stay bounded—that is to say, if there is some number M for which
|Sn | ≤ M for all values of n—then the series converges.

Proof: We must demonstrate that we can win an $\epsilon$–$N$ game: given any error bound $\epsilon$, we
must always have a response N for which N ≤ m < n implies that

$$\left| \sum_{k=m+1}^{n} a_k b_k \right| < \epsilon.$$

As we have seen, Abel’s lemma implies that

$$\left| \sum_{k=m+1}^{n} a_k b_k \right| \le 2M b_{m+1}.$$

We use the fact that the b’s are approaching 0. We can find an N for which n > N implies
that $b_n < \epsilon/(2M)$. This is our response. If m + 1 is larger than N, then

$$\left| \sum_{k=m+1}^{n} a_k b_k \right| \le 2M b_{m+1} < 2M \cdot \frac{\epsilon}{2M} = \epsilon.$$
Q.E.D.
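
The response in this proof can be computed explicitly once M and the b’s are known. The following Python sketch is only an illustration under stated assumptions: the names `dirichlet_response` and `fourier_partial_sum` are ours, the b’s are assumed to decrease to 0, and the value M = 1.12234 is the largest entry in the last column of Table 4.3, rounded up.

```python
import math

def dirichlet_response(b, M, eps, n_limit=10**6):
    """Smallest N with b(N + 1) < eps / (2 * M).

    b is assumed to be a decreasing function of n with limit 0, so every
    n > N then satisfies b(n) < eps / (2 * M), as the proof requires.
    """
    threshold = eps / (2 * M)
    N = 0
    while b(N + 1) >= threshold:
        N += 1
        if N > n_limit:
            raise ValueError("no response found below n_limit")
    return N

def fourier_partial_sum(n, x):
    """T_n(x): the sum of the first n terms of series (4.17)."""
    return sum((-1) ** (k - 1) / (2 * k - 1) * math.cos((2 * k - 1) * math.pi * x / 2)
               for k in range(1, n + 1))

if __name__ == "__main__":
    # b_n = 1/(2n - 1) for series (4.17); x = 0.3 and eps = 0.1
    N = dirichlet_response(lambda n: 1 / (2 * n - 1), 1.12234, 0.1)
    print("response N =", N)
    print("T_N, T_200 =", fourier_partial_sum(N, 0.3), fourier_partial_sum(200, 0.3))
```

The response found this way is sufficient but usually far from the smallest possible, since Dirichlet’s test uses only the crude bound $2Mb_{m+1}$.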

A Trigonometric Identity
We have proven that

$$\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{2k-1} \cos\left( \frac{(2k-1)\pi x}{2} \right)$$

converges when x = 0.3. What about other values of x? In order to apply Dirichlet’s test,
we must prove that once we have chosen x, the absolute value of the partial sum of the a’s,

$$\left| \sum_{k=1}^{n} (-1)^{k-1} \cos\left( \frac{(2k-1)\pi x}{2} \right) \right|,$$

stays bounded for all n.



We let y stand for πx/2. We would like to find a trigonometric identity that enables us
to simplify

$$\cos y - \cos 3y + \cos 5y - \cdots - (-1)^n \cos(2n-1)y.$$

Such an identity can be found by using the fact that

$$\cos A + i \sin A = e^{iA}. \tag{4.24}$$

If we add

$$i \sin y - i \sin 3y + i \sin 5y - \cdots - (-1)^n i \sin(2n-1)y$$

to our summation, we can rewrite it as a finite geometric series which we know how to
sum:

$$\begin{aligned}
&[\cos y + i \sin y] - [\cos 3y + i \sin 3y] + [\cos 5y + i \sin 5y] - \cdots - (-1)^n [\cos(2n-1)y + i \sin(2n-1)y] \\
&\qquad = e^{iy} - e^{3iy} + e^{5iy} - \cdots - (-1)^n e^{(2n-1)iy} \\
&\qquad = e^{iy} \left( 1 + z + z^2 + \cdots + z^{n-1} \right) \\
&\qquad = e^{iy}\, \frac{1 - z^n}{1 - z},
\end{aligned} \tag{4.25}$$

where $z = -e^{2iy}$.
We want to separate the real and imaginary parts of our formula. To do this, we need to
make the denominator real. If we multiply it by 1 + z, we get

$$\begin{aligned}
1 - z^2 &= z (z^{-1} - z) \\
&= z (-\cos 2y + i \sin 2y + \cos 2y + i \sin 2y) \\
&= z (2i \sin 2y).
\end{aligned}$$

Multiplying numerator and denominator by 1 + z yields

$$\begin{aligned}
e^{iy}\, \frac{1 - z^n}{1 - z} &= e^{iy}\, \frac{(1 - z^n)(1 + z)}{1 - z^2} \\
&= -i z^{-1} e^{iy}\, \frac{(1 - z^n)(1 + z)}{2 \sin 2y} \\
&= i e^{-iy}\, \frac{(1 - z^n)(1 - e^{2iy})}{2 \sin 2y} \\
&= i\, \frac{(1 - z^n)(e^{-iy} - e^{iy})}{2 \sin 2y} \\
&= i\, \frac{(1 - z^n)(-2i \sin y)}{2 \sin 2y} \\
&= \frac{1 - z^n}{2 \cos y} \\
&= \frac{1 - (-1)^n [\cos(2ny) + i \sin(2ny)]}{2 \cos y} \\
&= \frac{1 - (-1)^n \cos 2ny}{2 \cos y} + i\, \frac{(-1)^{n+1} \sin 2ny}{2 \cos y}.
\end{aligned} \tag{4.26}$$
We get two identities. The real part will equal the sum of the cosines. The imaginary part
will be i times the sum of the sines:

$$\sum_{k=1}^{n} (-1)^{k-1} \cos\left( \frac{(2k-1)\pi x}{2} \right) = \frac{1 - (-1)^n \cos(\pi n x)}{2 \cos(\pi x/2)}, \tag{4.27}$$

$$\sum_{k=1}^{n} (-1)^{k-1} \sin\left( \frac{(2k-1)\pi x}{2} \right) = \frac{(-1)^{n+1} \sin(\pi n x)}{2 \cos(\pi x/2)}. \tag{4.28}$$

Since $|\cos \pi n x| \le 1$, we have a bound on the partial sums of the a’s:

$$\left| \sum_{k=1}^{n} (-1)^{k-1} \cos\left( \frac{(2k-1)\pi x}{2} \right) \right| \le |\sec(\pi x/2)|. \tag{4.29}$$

This gives us a bound provided x is not an odd integer. When x is an odd integer, we have that
$\cos\left( (2k-1)\pi x/2 \right) = 0$, and so the partial sums are 0. Dirichlet’s test can be invoked to
imply the convergence of Fourier’s series,

$$\cos(\pi x/2) - \frac{1}{3}\cos(3\pi x/2) + \frac{1}{5}\cos(5\pi x/2) - \frac{1}{7}\cos(7\pi x/2) + \cdots,$$
for all values of x. Lagrange was wrong. It does converge.
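
Identity (4.27) and the bound (4.29) can be spot-checked numerically before trusting the algebra. This is a minimal Python sketch; the function names `cosine_partial_sum` and `closed_form` are our own.

```python
import math

def cosine_partial_sum(n, x):
    """Left side of (4.27): the alternating sum of cos((2k - 1) * pi * x / 2), k = 1..n."""
    return sum((-1) ** (k - 1) * math.cos((2 * k - 1) * math.pi * x / 2)
               for k in range(1, n + 1))

def closed_form(n, x):
    """Right side of (4.27); requires cos(pi * x / 2) != 0, so x must not be an odd integer."""
    return (1 - (-1) ** n * math.cos(math.pi * n * x)) / (2 * math.cos(math.pi * x / 2))

if __name__ == "__main__":
    for x in (0.3, 0.5, 0.99):
        bound = 1 / abs(math.cos(math.pi * x / 2))   # |sec(pi x / 2)| from (4.29)
        worst = max(abs(cosine_partial_sum(n, x)) for n in range(1, 201))
        print(x, worst, bound)
```

For each x, the largest partial sum found should stay at or below the value of the bound, and the bound itself should grow as x moves toward 1.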

An Observation
We recall that Dirichlet’s test requires a bound on the partial sums that does not depend
on n. The bound that we found in inequality (4.29) satisfies this requirement, but it does
depend on x. Choosing a specific value for x, we get a specific bound and so can apply
Dirichlet’s test. If we graph this bound as a function of x (Figure 4.4), we see that the graph
is not bounded as x approaches an odd integer. Something very curious is happening; our
bound is not bounded.
This is significant. We do not need a bounded bound in order to get convergence, but
we do need a bounded bound if we want the sum of these continuous functions to again be
continuous. This strange behavior of the bound is directly related to the strange behavior
of the Fourier series and the fact that it is discontinuous at odd integers. We shall explore
this topic further in Chapter 5.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

FIGURE 4.4. The graph of $|\sec(\pi x/2)|$ for −2 ≤ x ≤ 2.


M&M 4.4.1. Construct a table of the values of the partial sums of

$$\sum_{k=1}^{n} (-1)^{k-1} \cos\left[ (2k-1)\pi x/2 \right]$$
when x = 1/2, 2/3, 3/5, and 5/18. How large do you have to take n in order to see whatever
patterns are present? Describe what you see. What happens when x is irrational?

4.4.2. Prove that if x is rational, then the values taken on by the partial sums in exercise 4.4.1
are periodic. What is the period? How many values do you go through before they repeat?

M&M 4.4.3. Plot the values of the partial sums

$$T_n(x) = \sum_{k=1}^{n} \frac{(-1)^{k-1}}{2k-1} \cos\left[ (2k-1)\pi x/2 \right]$$

for 1 ≤ n ≤ 200 when x = 1/2, 2/3, 9/10, and 99/100. Describe what you see.

4.4.4. Let $T_n(x)$ be the partial sum defined in exercise 4.4.3. We are given an error bound
of $\epsilon = 0.1$. For x = 1/2, 2/3, 9/10, and 99/100, use Dirichlet’s test to determine the size of
a response N such that if N ≤ m < n, then

$$|T_n - T_m| < 0.1.$$

4.4.5. Let $T_n(x)$ be the partial sum defined in exercise 4.4.3. We are given an error bound
of $\epsilon = 0.001$. For x = 1/2, 2/3, 9/10, and 99/100, use Dirichlet’s test to determine the size
of a response N such that if N ≤ m < n, then

$$|T_n - T_m| < 0.001.$$



M&M 4.4.6. Graph the partial sums

$$U_n(x) = \sum_{k=1}^{n} \frac{(-1)^{k-1}}{2k-1} \sin\left[ (2k-1)\pi x/2 \right]$$

over −2 ≤ x ≤ 2 for n = 3, 6, 9, and 12. Describe what you see. For what values of x does
this series converge? Use Dirichlet’s test to prove your assertion.

M&M 4.4.7. Show that

$$z + z^2 + z^3 + \cdots + z^n = \frac{(1 + z)(1 - z^n)}{z^{-1} - z}.$$

Use this to prove that if x is not a multiple of π, then

$$\sin x + \sin 2x + \sin 3x + \cdots + \sin nx = \frac{\sin x}{2} \left( \frac{1 - \cos nx}{1 - \cos x} \right) + \frac{\sin nx}{2}. \tag{4.30}$$

Graph this function of x over the domain −π ≤ x ≤ π for n = 10, 20, 100, and 1000.

4.4.8. If x is held constant, does sin x + sin 2x + sin 3x + · · · + sin nx stay bounded for
all values of n?

4.4.9. Prove that the series that we met in section 4.1,

$$\sum_{k=2}^{\infty} \frac{\sin(k/100)}{\ln k},$$

does converge.

4.4.10. Use Dirichlet’s test to estimate the number of terms of the series in exercise 4.4.9
that we must take if we are to ensure that the partial sums are within $\epsilon = 0.01$ of the value
of this series.

M&M 4.4.11. Graph the partial sums

$$V_n(x) = \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k} \sin(k\pi x/2)$$

over −2 ≤ x ≤ 2 for n = 3, 6, 9, and 12. Describe what you see. For what values of x does
this series converge? Use Dirichlet’s test to prove your assertion.

4.4.12. The term radius of convergence was coined because of its applicability to power
series in which x is allowed to take on complex values. In this case, the series converges
absolutely for every x inside the circle with center at the origin and radius R, and it diverges
at every x outside this circle. The situation on the circle of radius R was investigated by
Abel in his paper on the convergence of binomial series. For the following problems,

$c_0 + \sum_{k=1}^{\infty} c_k x^k$ is a power series with radius of convergence R. For the sake of simplicity,
we assume that $c_k R^k \ge c_{k+1} R^{k+1} \ge 0$ for all k.

a. Show that if $\lim_{k\to\infty} c_k R^k \ne 0$, then this series does not converge for any x on the circle
of radius R.

b. Show that if $c_0 + \sum_{k=1}^{\infty} c_k R^k$ converges, then the power series converges for every x
on the circle of radius R.

c. Show that if $\lim_{k\to\infty} c_k R^k = 0$ but $c_0 + \sum_{k=1}^{\infty} c_k R^k$ diverges, then the power series
converges for every x on the circle of radius R except x = R.

5
Understanding Infinite Series

As we have seen, infinite series are not summations with lots of terms. Many of the nice
things that hold for sums of functions fall apart when we look at series of functions. But they
do not always fall apart. Sometimes, we can regroup or rearrange a series without affecting
its value. Sometimes, an infinite summation of continuous functions will be continuous and
can be differentiated or integrated following the rules that hold for finite summations. It
is precisely because infinite series can often be treated as if they were finite sums that so
much progress was made in the eighteenth century.
Trigonometric series such as those introduced by Joseph Fourier can be troublesome.
Once Fourier’s series were accepted, the question that came to the fore was why some series
behaved well and others did not. By understanding why, it became possible to predict when
a series could be rearranged without changing its value, when it was safe to differentiate
each summand and claim that the resulting series was the derivative of the original series.
Most of the problems that we shall investigate reduce to a basic question: when are we
allowed to interchange limits? Continuity, differentiation, and integration are each defined
in terms of limiting processes. So is infinite summation. In exercise 1.2.6 of section 1.2,
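we saw that these limiting processes are not always interchangeable. If we use $\lim_{x\to 1^-}$ to
designate the limit as x approaches 1 from the left, then

$$\lim_{x\to 1^-} \left( \lim_{n\to\infty} \frac{4}{\pi} \sum_{k=1}^{n} \frac{(-1)^{k-1}}{2k-1} \cos\frac{(2k-1)\pi x}{2} \right) = \lim_{x\to 1^-} 1 = 1; \tag{5.1}$$

$$\lim_{n\to\infty} \left( \lim_{x\to 1^-} \frac{4}{\pi} \sum_{k=1}^{n} \frac{(-1)^{k-1}}{2k-1} \cos\frac{(2k-1)\pi x}{2} \right) = \lim_{n\to\infty} \frac{4}{\pi} \sum_{k=1}^{n} 0 = 0. \tag{5.2}$$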

FIGURE 5.1. $z = (x^2 - y^2)/(x^2 + y^2)$.

On a more basic level, we can see why interchanging limits is potentially dangerous.
Consider the function defined by

$$f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}, \quad (x, y) \ne (0, 0); \qquad f(0, 0) = 0. \tag{5.3}$$

If we want to find the value f approaches as (x, y) approaches (0, 0), it makes a great deal
of difference how we approach (0, 0):

$$\lim_{x\to 0} \left( \lim_{y\to 0} \frac{x^2 - y^2}{x^2 + y^2} \right) = \lim_{x\to 0} \frac{x^2}{x^2} = 1, \tag{5.4}$$

$$\lim_{y\to 0} \left( \lim_{x\to 0} \frac{x^2 - y^2}{x^2 + y^2} \right) = \lim_{y\to 0} \frac{-y^2}{y^2} = -1. \tag{5.5}$$

The reason for the difference is transparent when we look at the graph of z = f (x, y)
(Figure 5.1). In the first case, we moved to the ridgeline when we took the limit y → 0. We
then stayed on this ridge as we approached the origin. In the second case, the limit x → 0
took us to the bottom of the valley which we followed toward the origin.
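
The ridge and the valley show up immediately if we evaluate f along the two axes. Here is a small Python sketch; the helper names `along_x_axis` and `along_y_axis` are ours, and each one carries out the inner limit by setting the corresponding variable to 0.

```python
def f(x, y):
    """The function in (5.3), with the value 0 assigned at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return (x * x - y * y) / (x * x + y * y)

def along_x_axis(h):
    """Inner limit y -> 0 taken first: we are left on the ridge at the point (h, 0)."""
    return f(h, 0.0)

def along_y_axis(h):
    """Inner limit x -> 0 taken first: we are left in the valley at the point (0, h)."""
    return f(0.0, h)

if __name__ == "__main__":
    for h in (1e-1, 1e-4, 1e-8):
        # ridge value, valley value, and the value on the diagonal y = x
        print(h, along_x_axis(h), along_y_axis(h), f(h, h))
```

Along the diagonal y = x the function is identically 0 away from the origin, a third candidate for the limit, so no single value can serve as the limit of f at (0, 0).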
We shall see lots of examples where geometric representations will help us understand
what can go wrong when limits are interchanged, but such pictures are not always available.
When in doubt, the safest route will be to rely on the $\epsilon$–$\delta$ and $\epsilon$–$N$ definitions.

5.1 Groupings and Rearrangements


In section 2.1 we saw that while the operation of addition is associative and commutative,
these properties often disappear from infinite series. The standard example of the lack of
associativity is the divergent series whose value Leibniz and Euler fixed at 1/2. If we were
free to regroup at will, then we would have

1 − 1 + 1 − 1 + · · · = (1 − 1) + (1 − 1) + · · · = 0 (5.6)
= 1 − (1 − 1) − (1 − 1) − · · · = 1. (5.7)

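Regrouping does seem to be allowed, however, when the series converges. For example,
regrouping yields another series that represents ln 2:

$$\begin{aligned}
\ln 2 &= 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \\
&= \left( 1 - \frac{1}{2} \right) + \left( \frac{1}{3} - \frac{1}{4} \right) + \cdots \\
&= \frac{1}{1 \cdot 2} + \frac{1}{3 \cdot 4} + \frac{1}{5 \cdot 6} + \cdots
\end{aligned} \tag{5.8}$$

$$\begin{aligned}
\phantom{\ln 2} &= 1 - \left( \frac{1}{2} - \frac{1}{3} \right) - \left( \frac{1}{4} - \frac{1}{5} \right) - \cdots \\
&= 1 - \frac{1}{2 \cdot 3} - \frac{1}{4 \cdot 5} - \frac{1}{6 \cdot 7} - \cdots.
\end{aligned} \tag{5.9}$$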
This is easily justified using our definition of convergence.

Theorem 5.1 (Regrouping Convergent Series). Given a convergent series $a_1 + a_2 + a_3 + \cdots = A$,
we can regroup consecutive summands without changing the value of
the series. In other words, the associative law holds for convergent series.

Proof: We consider an arbitrary series, $b_1 + b_2 + b_3 + \cdots$, formed from our original series
by regrouping the summands so that $b_1$ is the sum of one or more of the initial terms in our
series, $b_2$ is obtained by adding one or more of the next terms to come along, and so on. We
are not allowed to change the order of the summands, only to regroup. For convenience,
we choose $k_1$ to denote the subscript of the last a in $b_1$, $k_2$ the subscript of the last a in $b_2$,
and so on:

$$\begin{aligned}
b_1 &= a_1 + a_2 + \cdots + a_{k_1}, \\
b_2 &= a_{k_1+1} + a_{k_1+2} + \cdots + a_{k_2}, \\
b_3 &= a_{k_2+1} + a_{k_2+2} + \cdots + a_{k_3}, \\
&\;\;\vdots
\end{aligned}$$

We note that $k_m \ge m$. If $S_n = a_1 + a_2 + \cdots + a_n$ is the partial sum of the a’s and
$T_m = b_1 + b_2 + \cdots + b_m$ is the partial sum of the b’s, then

$$T_m = b_1 + b_2 + \cdots + b_m = a_1 + a_2 + \cdots + a_{k_m} = S_{k_m}. \tag{5.10}$$

We need to show that we can win the $\epsilon$–$N$ game for the b’s. Given a positive error bound
$\epsilon$, we must find a response N so that m ≥ N implies that $|T_m - A| < \epsilon$. We know that

$$|T_m - A| = |S_{k_m} - A|,$$



and there is a response for the a’s; call it N. We use this response. If m is greater than or
equal to N, then so is $k_m$:

$$|T_m - A| = |S_{k_m} - A| < \epsilon.$$
Q.E.D.
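
Equation (5.10) says that each partial sum of a regrouped series is one of the partial sums of the original series. The Python sketch below checks this for the two regroupings of the series for ln 2 in (5.8) and (5.9); the function names are our own.

```python
import math

def alternating_harmonic(n):
    """S_n = 1 - 1/2 + 1/3 - ... taken to n terms."""
    return sum((-1) ** (k - 1) / k for k in range(1, n + 1))

def grouped_pairs(m):
    """First m terms of (5.8): 1/(1*2) + 1/(3*4) + ...; here k_m = 2m."""
    return sum(1 / ((2 * j - 1) * (2 * j)) for j in range(1, m + 1))

def grouped_after_first(m):
    """First m terms of (5.9): 1 - 1/(2*3) - 1/(4*5) - ...; here k_m = 2m - 1."""
    return 1 - sum(1 / ((2 * j) * (2 * j + 1)) for j in range(1, m))

if __name__ == "__main__":
    for m in (1, 10, 1000):
        print(m, grouped_pairs(m), alternating_harmonic(2 * m),
              grouped_after_first(m), alternating_harmonic(2 * m - 1))
    print("ln 2 =", math.log(2))
```

The two regroupings pick out the even-indexed and the odd-indexed partial sums of the original series, so they approach ln 2 from below and from above respectively.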

Rearrangements
Determining when it is safe to rearrange an infinite series is going to be harder. We have
seen convergent series that change their value when rearranged. For example,

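$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \cdots \tag{5.11}$$

converges to π/4. If we rearrange it to

$$1 + \frac{1}{5} - \frac{1}{3} + \frac{1}{9} + \frac{1}{13} - \frac{1}{7} + \frac{1}{17} + \frac{1}{21} - \frac{1}{11} + \cdots, \tag{5.12}$$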

then we have exactly the same summands, but this series converges to a number near 1.0,
well above π/4.
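
A quick computation makes the shift in value visible. The following Python sketch sums (5.12) in blocks of two positive summands followed by one negative summand; the name `rearranged_leibniz` is ours. The comparison value π/4 + (ln 2)/4 used in the demonstration is not derived in the text; we quote it as the standard value for this pattern of rearrangement.

```python
import math

def rearranged_leibniz(blocks):
    """Partial sum of (5.12) after `blocks` groups of two positive terms and one negative term."""
    total = 0.0
    for j in range(blocks):
        total += 1 / (8 * j + 1) + 1 / (8 * j + 5)   # next two positive summands: 1, 1/5; 1/9, 1/13; ...
        total -= 1 / (4 * j + 3)                      # next negative summand: 1/3; 1/7; 1/11; ...
    return total

if __name__ == "__main__":
    print("pi/4             =", math.pi / 4)
    print("pi/4 + (ln 2)/4  =", math.pi / 4 + math.log(2) / 4)
    for blocks in (10, 1000, 100000):
        print(blocks, rearranged_leibniz(blocks))
```

The partial sums should settle a little below 1, visibly above π/4.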
What is happening? Different rearrangements can lead to different answers, but not
always. Taking the series
$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$$
and experimenting with different rearrangements, we see that it always approaches 2. The
fact that all of these summands are positive is significant. The previous series grew larger
after rearrangement because we kept postponing those negative summands that would
bring the value back down. But it is not just the presence of negative summands that spoils
rearrangements. The series

$$1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \frac{1}{16} - \frac{1}{32} + \cdots = \frac{2}{3} \tag{5.13}$$

will also be the same no matter how we rearrange it. What is the difference between the
series in (5.11) and the one in (5.13)?

Bernhard Riemann
The first complete answer to this question did not appear until 1867 in Bernhard Riemann’s
posthumous work Über die Darstellbarkeit einer Function durch eine trigonometrische
Reihe (On the representability of a function by a trigonometric series). Georg Friedrich
Bernhard Riemann was born in 1826, the same fall that Abel and Dirichlet had met in
Paris. Riemann entered Göttingen in 1846 at the age of nineteen, stayed one year, and then
transferred to Berlin where he studied with Dirichlet (who had gone to Berlin in 1828),
Eisenstein, Jacobi, and Steiner. He remained there two years, and then transferred back to
Göttingen to finish his studies with Gauss, now an old man whose sparse praise for others
became effusive when he saw Riemann’s work.

In the fall of 1852, Dirichlet visited Gauss and spent much of the time talking about series
with young Riemann. Riemann was later to credit many of his insights into trigonometric
series to these discussions. Gauss died in 1855. Dirichlet succeeded to his chair at Göttingen.
He had only four years to live. In 1859, Riemann became heir to what was now the world’s
most prestigious position in mathematics. Riemann died in 1866. Like Abel, he was killed
by tuberculosis.
He had not published his work on trigonometric series. It was his friend and colleague
Richard Dedekind who, after Riemann’s death, recognized that it had to be published. It
revolutionized our understanding of these series.

The Difference
As Riemann realized, the difference between the series in (5.11) and in (5.13) lies in
the summation formed from just the positive (or negative) terms. In (5.13), the positive
summands give a convergent series:
$$1 + \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots = \frac{4}{3}.$$
No matter how we rearrange our positive terms, they will never take us above 4/3. The
negative terms in this series will always subtract 2/3. Any rearrangement will leave us with
4/3 − 2/3 = 2/3.
On the other hand, the series in (5.11) has positive terms whose sum diverges:
$$1 + \frac{1}{5} + \frac{1}{9} + \frac{1}{13} + \cdots.$$
The only thing that keeps the whole series from diverging is the presence of the negative
terms that constantly compensate. The sum of the negative terms, taken on their own, must
also diverge, otherwise they would not be sufficient to compensate for the diverging sum
of positive terms. The difference between our series is that (5.13) is absolutely convergent
and (5.11) is not.
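
For contrast, we can apply the same kind of rearrangement (two positive summands, then one negative) to the absolutely convergent series (5.13). This Python sketch is an illustration only; the name `rearranged_geometric` is ours.

```python
def rearranged_geometric(blocks):
    """Partial sum of a rearrangement of (5.13): two positive terms, then one negative term."""
    total = 0.0
    for j in range(blocks):
        total += 0.25 ** (2 * j) + 0.25 ** (2 * j + 1)   # positives 1, 1/4; 1/16, 1/64; ...
        total -= 0.5 * 0.25 ** j                          # negatives 1/2; 1/8; 1/32; ...
    return total

if __name__ == "__main__":
    for blocks in (1, 5, 20, 60):
        print(blocks, rearranged_geometric(blocks))
```

Here the positive summands still total 4/3 and the negative summands still total 2/3, so the partial sums should return to 2/3 even though the negative summands are being postponed.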

Theorem 5.2 (Rearranging Convergent Series). Given an absolutely convergent
series $a_1 + a_2 + a_3 + \cdots = A$, any rearrangement of this series yields another convergent
series, converging to the same value. In other words, the commutative law
holds for absolutely convergent series.

Proof: We begin with the simplest case. We assume that all of the summands are positive.
Let b1 + b2 + b3 + · · · be a rearrangement of our series, still all positive but put into a
different order. As we saw in section 4.1, a series of positive summands converges if and
only if there is an upper bound on the set of partial sums. The least upper bound is then the
value of this series.
We look at any partial sum of the b’s:
Tn = b1 + b2 + · · · + bn .
The value A is larger than any partial sum of the a’s, and we can always find some partial
sum of the a’s that includes everything in Tn , so A must be larger than Tn . This tells us that
the partial sums of the b’s are bounded. They must converge to something that is less than
or equal to A:

b1 + b2 + b3 + · · · = B ≤ A.

We turn this argument around. Any partial sum of the a’s must be less than B, and so B is
an upper bound for these partial sums. It follows that A is less than or equal to B,

A ≤ B ≤ A.

This tells us that A = B. The values are the same.


We now consider the case where the a’s are not all positive, but we do have absolute
convergence:

$$|a_1| + |a_2| + |a_3| + \cdots = A'.$$

As before, we let $b_1 + b_2 + b_3 + \cdots$ be some rearrangement of the a’s. By what we proved
in the first part, we at least know that

$$|b_1| + |b_2| + |b_3| + \cdots = A'$$

is also convergent, and so the series of b’s is absolutely convergent. Let
$T_m = b_1 + b_2 + \cdots + b_m$ be a partial sum of the b’s. We must show that we can win the $\epsilon$–$N$ game, that
given any positive $\epsilon$, we can always find a response N such that m ≥ N implies that
$|A - T_m| < \epsilon$.
We let Sn = a1 + a2 + · · · + an be a partial sum of the a’s. We know that we can force
Sn close to A:

|A − Tm | = |A − Sn + Sn − Tm |
≤ |A − Sn | + |Sn − Tm |. (5.14)

We choose an $N_1$ so that $n \ge N_1$ implies that $|A - S_n| < \epsilon/2$. It remains to choose an
$n \ge N_1$ and a lower bound on m so that $|S_n - T_m|$ is less than $\epsilon/2$.
Let $\mathcal{S}_n = \{a_1, a_2, \ldots, a_n\}$ and $\mathcal{T}_m = \{b_1, b_2, \ldots, b_m\}$. Using the Cauchy criterion and
the fact that the sum of the b’s converges absolutely, we can find an $N_2$ for which

$$m > \ell \ge N_2 \quad\text{implies that}\quad |b_{\ell+1}| + |b_{\ell+2}| + \cdots + |b_m| < \epsilon/2.$$

We find an n large enough so that

$$\mathcal{T}_{N_2} \subseteq \mathcal{S}_n.$$

If necessary, we make n slightly larger so that it is at least as big as $N_1$. We now find an N
large enough that

$$\mathcal{S}_n \subseteq \mathcal{T}_N.$$

If $m \ge N$, then

$$\mathcal{T}_{N_2} \subseteq \mathcal{S}_n \subseteq \mathcal{T}_N \subseteq \mathcal{T}_m,$$
and so the summands that appear in $T_m - S_n$ lie in the set difference $\mathcal{T}_m - \mathcal{T}_{N_2}$. That is to say, they are taken
from the b’s with subscript less than or equal to m and strictly greater than $N_2$. This implies
that

$$|T_m - S_n| \le |b_{N_2+1}| + |b_{N_2+2}| + \cdots + |b_m| < \epsilon/2. \tag{5.15}$$

Our response is N.
Q.E.D.

Rearrangement with Conditional Convergence


If a series converges conditionally, then a rearrangement can change its value. How many
possible values are there? Riemann realized that every real number can be obtained by
rearranging such a series. You want to rearrange the series in (5.11) so that it converges to
1? We can do it. To 10.35? No problem. Sum it up to $-68 + \sqrt{3} - e^{\pi}$? A piece of cake.

Theorem 5.3 (Riemann Rearrangement Theorem). If the series $a_1 + a_2 + a_3 + \cdots$
converges conditionally, then for any real number r, we can find a rearrangement of
this series that converges to r.

Rather than a formal proof, we shall see how this is done with an example. We shall take
the series
$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$
and rearrange it so that it converges to ln 2 instead of π/4. We separate the positive
summands from the negative ones, keeping their relative order, and note that if we added
up just the positive summands, the series would diverge. The same must be true for the
negative summands. This is important.
We have a target value T; in this case T = ln 2 = 0.693147.... We add positive summands
until we are at or over this target:

$$1 = 1.$$

We know that sooner or later we shall reach or exceed the target because the positive
summands diverge. We now add the negative summands until we are below the target:

$$1 - \frac{1}{3} = 0.6666\ldots.$$

Again, the negative summands diverge so that eventually we shall move below the target.
We now put in more positive summands until we are above the target:

$$1 - \frac{1}{3} + \frac{1}{5} = 0.8666\ldots,$$

and then add negative summands until we are below again:

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} - \frac{1}{11} = 0.63290\ldots,$$
and so on. No matter how far along the series we may be, there are always enough positive
or negative terms remaining to move us back to the other side of the target. Every summand
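will eventually be inserted.

$$\ln 2 = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} - \frac{1}{11} + \frac{1}{9} - \frac{1}{15} + \frac{1}{13} - \frac{1}{19} - \frac{1}{23} + \frac{1}{17} - \frac{1}{27} + \frac{1}{21} - \frac{1}{31} - \frac{1}{35} + \cdots. \tag{5.16}$$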
Let b1 + b2 + b3 + · · · be the reordered series and Sm = b1 + b2 + · · · + bm be the mth
partial sum.
How do we know that we are converging to the target value and not just bouncing around
it? We have to show that when we are given a positive error ε, we always have a response
N such that

m ≥ N implies that |T − Sm | < ε.

We know that our original series converges, and so the summands are approaching zero.
This means that there is a finite list of summands with absolute value greater than or equal
to ε. We move down our reordered series, b1 + b2 + b3 + · · · , until we have included all
summands with absolute value greater than or equal to ε. We continue moving down the
series until we come to the next pair of consecutive partial sums that lie on opposite sides
of the target T :

SN < T ≤ SN+1 or SN ≥ T > SN+1 .

The subscript N is our response. We know that all of the summands from here on have
absolute value less than ε. Since T lies between SN and SN+1 , it must differ from each
by less than ε. Each time we add a new summand we are either moving closer to T (the
difference is getting smaller) or we are jumping by an amount less than ε to the other side
of T (we are still within ε of T ).
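
The rearrangement procedure described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name rearrange, the use of floating-point arithmetic, and the rule "take a positive summand when the running total is below the target, otherwise a negative one" are our reading of the steps in the text, not code from the book.

```python
import math
from itertools import count

def rearrange(target, n_terms):
    """Greedy rearrangement of 1 - 1/3 + 1/5 - 1/7 + ... aimed at `target`.

    Hypothetical helper: returns the first n_terms summands of the
    rearranged series together with their sum.
    """
    positives = (1.0 / d for d in count(1, 4))    # 1, 1/5, 1/9, ...
    negatives = (-1.0 / d for d in count(3, 4))   # -1/3, -1/7, -1/11, ...
    terms, total = [], 0.0
    for _ in range(n_terms):
        # Below the target: use the next positive summand.
        # At or above the target: use the next negative summand.
        term = next(positives) if total < target else next(negatives)
        terms.append(term)
        total += term
    return terms, total

terms, total = rearrange(math.log(2), 15)
# Signed denominators of the first fifteen summands.
print([round(1 / t) for t in terms])
print(total)
```

Running the sketch with a larger n_terms shows the partial sums closing in on ln 2, in line with the argument that the error is eventually bounded by the size of the summands still to come.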

Other Results
There is another result on rearranging series that we shall need later in this chapter. When
can we add two series by adding the corresponding summands? When are we allowed to
say that

(a1 + a2 + a3 + · · · ) + (b1 + b2 + b3 + · · · ) = (a1 + b1 ) + (a2 + b2 ) + (a3 + b3 ) + · · · ? (5.17)

Theorem 5.4 (Addition of Series). If a1 + a2 + a3 + · · · = A and b1 + b2 + b3 +
· · · = B both converge, then (a1 + b1 ) + (a2 + b2 ) + (a3 + b3 ) + · · · converges to
A + B.

Proof: Let

Sn = a1 + a2 + · · · + an , Tm = b1 + b2 + · · · + bm .

Given an error bound ε, we must find an N such that

n ≥ N implies that |(A + B) − (Sn + Tn )| < ε.


We use the fact that

|(A + B) − (Sn + Tn )| ≤ |A − Sn | + |B − Tn |,

and split our allowable error between these differences. We find an N1 such that

n ≥ N1 implies that |A − Sn | < ε/2

and an N2 such that

n ≥ N2 implies that |B − Tn | < ε/2.

Our response is the larger of N1 and N2 .


Q.E.D.

We need one more basic result.

Theorem 5.5 (Distributive Law for Series). If a1 + a2 + a3 + · · · converges to A
and if c is any constant, then ca1 + ca2 + ca3 + · · · converges to cA.

The proof of this theorem is similar and is left as an exercise.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

5.1.1. Evaluate the series


\[ 1 - \frac{1}{3} + \frac{1}{9} - \frac{1}{27} + \frac{1}{81} - \frac{1}{243} + \cdots \]
in two different ways: first as a geometric series with initial term 1 and ratio −1/3, then by
combining each positive term with the succeeding negative term.

5.1.2. M&M Use regrouping to evaluate the series
\[ 1 + \frac{1}{2} - \frac{1}{4} + \frac{1}{8} + \frac{1}{16} - \frac{1}{32} + \frac{1}{64} + \frac{1}{128} - \frac{1}{256} + \cdots . \]
Use numerical calculation to check your answer.

5.1.3. Prove that


\[ 1 + \frac{1}{5} + \frac{1}{9} + \frac{1}{13} + \cdots \]
diverges.

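5.1.4. M&M The series in (5.12) can be regrouped so that it forms a series of positive
summands:
\[
\begin{aligned}
& \left(1 + \frac{1}{5} - \frac{1}{3}\right) + \left(\frac{1}{9} + \frac{1}{13} - \frac{1}{7}\right) + \left(\frac{1}{17} + \frac{1}{21} - \frac{1}{11}\right) + \cdots \\
& \qquad + \left(\frac{1}{8n-7} + \frac{1}{8n-3} - \frac{1}{4n-1}\right) + \cdots \\
& = \frac{13}{15} + \frac{37}{819} + \frac{61}{3927} + \cdots + \frac{24n-11}{(8n-7)(8n-3)(4n-1)} + \cdots .
\end{aligned}
\]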
Calculate the partial sum of the first thousand terms of this series and so find a lower bound
for the value of the rearrangement in (5.12).

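5.1.5. M&M We can also regroup the series in (5.12) so that it is 6/5 plus a series of
negative summands:
\[
\begin{aligned}
& 1 + \frac{1}{5} - \left(\frac{1}{3} - \frac{1}{9} - \frac{1}{13}\right) - \left(\frac{1}{7} - \frac{1}{17} - \frac{1}{21}\right) - \cdots \\
& \qquad - \left(\frac{1}{4n-1} - \frac{1}{8n+1} - \frac{1}{8n+5}\right) - \cdots .
\end{aligned}
\]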
Find the general summand of this regrouping and calculate the partial sum of the first
thousand terms of this new series, thereby finding an upper bound for the value of the
rearrangement in (5.12).

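5.1.6. Consider the following two evaluations of the series 1/(2 · 3) + 1/(3 · 4) + · · · +
1/((k + 1)(k + 2)) + · · · . Which of these is correct? Where is the flaw in the one that is
wrong? Justify the reasoning for the one that is correct.
\[
\begin{aligned}
& \frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} + \cdots + \frac{1}{(k+1)(k+2)} + \cdots \\
& \qquad = \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{k+1} - \frac{1}{k+2}\right) + \cdots = \frac{1}{2}, \\
& \frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} + \cdots + \frac{1}{(k+1)(k+2)} + \cdots \\
& \qquad = \left(1 - \frac{5}{6}\right) + \left(\frac{5}{6} - \frac{3}{4}\right) + \cdots + \left(\frac{k+3}{2k+2} - \frac{k+4}{2k+4}\right) + \cdots = 1.
\end{aligned}
\]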

5.1.7. M&M Find the first 200 summands in the rearrangements of
\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \]
that approach 1.5 and 0.5, respectively. Is it possible to pick up any patterns that will
continue forever?

5.1.8. M&M Find the first 200 summands in the rearrangements of
\[ 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \cdots \]
that approach 1.5 and 0.5, respectively. Is it possible to pick up any patterns that will
continue forever?

5.1.9. Find a different rearrangement of the series in exercise 5.1.7 that approaches 1.5.

5.1.10. If a series converges conditionally, how many distinct rearrangements of that series
are there that yield the same value? Can you describe all possible rearrangements that yield
the same value?

5.1.11. Find an example of two divergent series a1 + a2 + a3 + · · · and b1 + b2 + b3 +
· · · for which the sum (a1 + b1 ) + (a2 + b2 ) + (a3 + b3 ) + · · · converges.

5.1.12. Is it possible for a1 + a2 + a3 + · · · to converge, b1 + b2 + b3 + · · · to diverge,
and the sum (a1 + b1 ) + (a2 + b2 ) + (a3 + b3 ) + · · · to converge? Either give an example
of such series or prove it is impossible.

5.1.13. Prove Theorem 5.5.

5.1.14. Prove that if a series converges conditionally, then we can find a rearrangement
that diverges.

5.2 Cauchy and Continuity


On page 120 of his Cours d’analyse, Cauchy proves his first theorem about infinite series.
Let S be an infinite series of continuous functions,

S(x) = f1 (x) + f2 (x) + f3 (x) + · · · ,

let Sn be the partial sum of the first n terms,

Sn (x) = f1 (x) + f2 (x) + · · · + fn (x),

and let Rn be the remainder,

Rn (x) = S(x) − Sn (x) = fn+1 (x) + fn+2 (x) + · · · .

Just as questions of convergence are investigated by considering the sequence of partial
sums, so also in this chapter we shall look at questions of continuity, differentiability, and
integrability in terms of the sequence of partial sums. Cauchy remarks that Sn , a finite sum
of continuous functions, must be continuous, and then goes on to state:

Let us consider the changes in these three functions when we increase x by an
infinitely small value α. For all possible values of n, the change in Sn (x) will be
infinitely small; the change in Rn (x) will be as insignificant¹ as the size of Rn (x)
when n is made very large. It follows that the change in the function S(x) can
only be an infinitely small quantity. From this remark, we immediately deduce the
following proposition:
THEOREM I. When the terms of a series are functions of a single variable x and are
continuous with respect to this variable in the neighborhood of a particular value
where the series converges, the sum S(x) of the series is also, in the neighborhood
of this particular value, a continuous function of x.

¹ Literally: insensible.

Cauchy has proven that any infinite series of continuous functions is continuous.
There is only one problem with this theorem. It is wrong. The Fourier series
\[ \cos(\pi x/2) - \frac{1}{3}\cos(3\pi x/2) + \frac{1}{5}\cos(5\pi x/2) - \frac{1}{7}\cos(7\pi x/2) + \cdots \]
is an infinite series of continuous functions. As we have seen, it is not continuous at x = 1.
No one seems to have noticed this contradiction until 1826 when Niels Abel pointed it out
in a footnote to his paper on infinite series.
Even though Dirichlet definitively established the validity of Fourier series in 1829, it
was 1847 before anyone was able to make progress on resolving the contradiction between
Cauchy’s theorem and the properties of Fourier series. The first light was shed by George
Stokes (1819–1903). A year later, Dirichlet’s student Philipp Seidel (1821–1896) went a
long way toward clarifying Cauchy’s error. Cauchy corrected his error in 1853, but the
conditions required for the continuity of an infinite series were not generally recognized
until the 1860s when Weierstrass began to emphasize their importance.

Cauchy’s Proof
Before we search for the flaw in Cauchy’s argument, we need to restate it more carefully
using our definitions of continuity and convergence. The simple act of putting it into precise
language may reveal the problem.
To prove the continuity of S(x) at x = a, we must show that for any given ε > 0, there
is a δ such that as long as x stays within δ of a, S(x) will be within ε of S(a):

|x − a| < δ implies that |S(x) − S(a)| < ε.

Cauchy’s analysis begins with the observation that

|S(x) − S(a)| = |Sn (x) + Rn (x) − Sn (a) − Rn (a)|


≤ |Sn (x) − Sn (a)| + |Rn (x)| + |Rn (a)|. (5.18)

We can divide the allowable error three ways, giving ε/3 to each of the terms in the last
line. The continuity of Sn (x) guarantees that we can make

|Sn (x) − Sn (a)| < ε/3.

The convergence of S(x) at x = a and at all points close to a tells us that the remainders
can each be made arbitrarily small:

|Rn (x)| < ε/3 and |Rn (a)| < ε/3.

If you still do not see what is wrong with this proof, you should not be discouraged. It took
mathematicians over a quarter of a century to find the error.

An Example
It is easiest to see where Cauchy went wrong by analyzing an example of an infinite series
of continuous functions that is itself discontinuous. Fourier series are rather complicated.
We shall use a simpler example:
\[ S(x) = \sum_{k=1}^{\infty} \frac{x^2}{(1 + kx^2)(1 + (k-1)x^2)}. \tag{5.19} \]

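Each of the summands is a continuous function of x. The partial sums are particularly easy
to work with. We observe that
\[ \frac{x^2}{(1 + kx^2)(1 + (k-1)x^2)} = \frac{1}{1 + (k-1)x^2} - \frac{1}{1 + kx^2}, \]
and therefore
\[
\begin{aligned}
S_n(x) &= \left(1 - \frac{1}{1 + x^2}\right) + \left(\frac{1}{1 + x^2} - \frac{1}{1 + 2x^2}\right)
 + \left(\frac{1}{1 + 2x^2} - \frac{1}{1 + 3x^2}\right) + \cdots + \left(\frac{1}{1 + (n-1)x^2} - \frac{1}{1 + nx^2}\right) \\
&= 1 - \frac{1}{1 + nx^2} \\
&= \frac{nx^2}{1 + nx^2}.
\end{aligned}
\tag{5.20}
\]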
We see that Sn (0) = 0 for all values of n, and so S(0) = 0. If x is not zero, then
\[ S_n(x) = \frac{x^2}{n^{-1} + x^2} \]
which approaches 1 as n gets large,
\[ S(x) = 1, \quad x \ne 0. \]
The series is definitely discontinuous at x = 0.


We can see what is happening if we look at the graphs of the partial sums (Figure 5.2).
As n increases, the graphs become steeper near x = 0. In the limit, we get a vertical jump.
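
The telescoping in (5.20) and the jump at x = 0 can be checked numerically. The sketch below is ours, not the book's: the helper names summand, partial_sum, and closed_form are hypothetical, and floating-point sums only approximate the exact values.

```python
def summand(k, x):
    """k-th summand of the series in (5.19)."""
    return x**2 / ((1 + k * x**2) * (1 + (k - 1) * x**2))

def partial_sum(n, x):
    """S_n(x), computed by adding the first n summands directly."""
    return sum(summand(k, x) for k in range(1, n + 1))

def closed_form(n, x):
    """S_n(x) as given by the telescoped formula (5.20)."""
    return n * x**2 / (1 + n * x**2)

for x in (0.0, 0.01, 0.1, 1.0):
    for n in (3, 6, 9, 1000):
        # The two columns should agree up to rounding error.
        print(x, n, partial_sum(n, x), closed_form(n, x))
```

At x = 0 every partial sum is 0, while for any fixed nonzero x the values creep up toward 1 as n grows, which is the vertical jump seen in the graphs.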

Where is the Mistake?


Cauchy must be making some unwarranted assumption in his proof. To see what it might
be, we return to his proof and use our specific example:
\[ S(x) = \begin{cases} 0, & \text{if } x = 0, \\ 1, & \text{if } x \ne 0, \end{cases} \tag{5.21} \]
\[ S_n(x) = \frac{nx^2}{1 + nx^2}, \tag{5.22} \]
\[ R_n(x) = \begin{cases} 0, & \text{if } x = 0, \\ 1/(1 + nx^2), & \text{if } x \ne 0. \end{cases} \tag{5.23} \]

FIGURE 5.2. Graphs of S3 (x) (solid), S6 (x) (dotted), and S9 (x) (dashed).

The critical point at which we want to investigate continuity is a = 0. If x is close to but
not equal to 0, then inequality (5.18) becomes
\[
\begin{aligned}
|S(x) - S(0)| &\le |S_n(x) - S_n(0)| + |R_n(x)| + |R_n(0)| \\
&= \left| \frac{nx^2}{1 + nx^2} - 0 \right| + \left| \frac{1}{1 + nx^2} \right| + |0| \qquad (5.24) \\
&= \frac{nx^2}{1 + nx^2} + \frac{1}{1 + nx^2} \\
&= 1. \qquad (5.25)
\end{aligned}
\]

Something is wrong with the assertion that we can make each of the pieces in line (5.24)
arbitrarily small.
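We make the first piece small by taking x close to 0. How close does it have to be? We
want
\[ \frac{nx^2}{1 + nx^2} < \frac{\epsilon}{3}. \tag{5.26} \]
Multiplying through by 1 + nx² and then solving for x², we see that
\[
\begin{aligned}
nx^2 &< (\epsilon/3)(1 + nx^2), \\
x^2 (n - n\epsilon/3) &< \epsilon/3, \\
x^2 &< \frac{\epsilon/3}{n - n\epsilon/3} = \frac{\epsilon}{n(3 - \epsilon)}, \\
|x| &< \sqrt{\epsilon/(3n - \epsilon n)}.
\end{aligned}
\tag{5.27}
\]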

The size of our response δ depends on n. As n gets larger, δ must get smaller. This makes
sense if we think of the graph in Figure 5.2. If ε = 0.1 so that we want Sn (x) < 0.1, we
need to take a much tighter interval when n = 9 than we do when n = 3.
To make the second piece small,
\[ \frac{1}{1 + nx^2} < \frac{\epsilon}{3}, \tag{5.28} \]
we have to take a large value of n. If we solve this inequality for n, we see that we need
\[
\begin{aligned}
1 &< (\epsilon/3)(1 + nx^2), \\
\frac{3}{\epsilon} - 1 &< nx^2, \\
\frac{3 - \epsilon}{\epsilon x^2} &< n.
\end{aligned}
\tag{5.29}
\]
The size of n depends on x. As |x| gets smaller, n must be taken larger. This also makes
sense when we look at the graph. If we take an x that is very close to 0, then we need a
very large value of n before we are near S(x) = 1.
Here is our difficulty. The size of x depends on n, and the size of n depends on x. We
can make the first piece small by making x small, but that increases the size of the second
piece. If we increase n to make the second piece small, the first piece increases. We are in
a vicious cycle. We cannot make both pieces small simultaneously.
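
The cycle can be made concrete with a short computation. The sketch below is an illustration under our own assumptions: delta_for and n_for are hypothetical names for the bounds in (5.27) and (5.29), and we evaluate the second piece at the boundary value of |x| allowed by the first.

```python
import math

def delta_for(n, eps):
    """Bound on |x| from (5.27): keeps n x^2 / (1 + n x^2) below eps/3."""
    return math.sqrt(eps / (3 * n - eps * n))

def n_for(x, eps):
    """Threshold from (5.29): 1 / (1 + n x^2) < eps/3 needs n above this."""
    return (3 - eps) / (eps * x**2)

eps = 0.1
for n in (3, 9, 100):
    x = delta_for(n, eps)
    second_piece = 1 / (1 + n * x**2)
    # At this x the second piece is stuck at 1 - eps/3, nowhere near eps/3,
    # and the n needed to shrink it is far larger than the n we started with.
    print(n, x, second_piece, n_for(x, eps))
```

Whatever n we start from, the x that just tames the first piece leaves the second piece at 1 − ε/3, and the n demanded by (5.29) at that x is larger than the original n by the factor (3 − ε)²/ε².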

Fixing it up with Uniform Convergence


Part of the reason that Cauchy made his mistake is that many infinite series of continuous
functions are continuous. Having found what is wrong with Cauchy’s proof, we can attempt
to find criteria that will identify infinite series that are continuous. If we are going to be
able to break our cycle, then either the size of the first piece does not depend on n or the
size of the second piece does not depend on x.
The usual solution is the second: that the size of |Rn (x)| does not depend on x. When
this happens, we say that the series is uniformly convergent. Specifically, we have the
following definition.

Definition: uniform convergence


Given a series of functions, S = f1 + f2 + f3 + · · · , which converges for all x in an
interval I , we let {S1 , S2 , S3 , . . .} denote the sequence of partial sums: Sn = f1 + f2 +
· · · + fn . We say that this series converges uniformly over I if given any positive
error bound ε, we always have a response N such that
n ≥ N implies that |S(x) − Sn (x)| < ε.
The same N must work for all x ∈ I .

Graphically, this implies that if we put an envelope extending distance ε above and below
S (Figure 5.3), then there is a response N such that n ≥ N implies that the graph of Sn lies
entirely inside this envelope. Using the example from equation (5.19) (Figure 5.4), we see
that when ε is small (less than 1/2), none of the partial sums stay inside the envelope.
This example was not uniformly convergent.

FIGURE 5.3. The envelope around the graph of y = S(x).
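
A quick numerical check shows why no partial sum of the series in (5.19) fits inside a narrow envelope. This is a sketch under our own assumptions: remainder is a hypothetical helper for the remainder of that series, and the sample point x = 1/√n is our choice.

```python
def remainder(n, x):
    """R_n(x) = S(x) - S_n(x) for the series in (5.19)."""
    return 0.0 if x == 0 else 1.0 / (1.0 + n * x**2)

for n in (10, 1000, 100000):
    x = n ** -0.5          # sample point 1/sqrt(n), which moves toward 0
    # At this point the remainder is 1/2 (up to rounding) for every n.
    print(n, x, remainder(n, x))
```

Since every partial sum misses S by 1/2 at some point near 0, no single N can serve all x once the error bound is below 1/2.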

FIGURE 5.4. Figure 5.2 with envelope.

Theorem 5.6 (Continuity of Infinite Series). If S = f1 + f2 + f3 + · · · converges
uniformly over the interval (α, β), and if each of the summands is continuous at every
point in (α, β), then the series S is continuous at every point in (α, β).

Proof: We repeat Cauchy’s proof, being careful to choose n first. We choose any a ∈ (α, β)
and use inequality (5.18):

|S(x) − S(a)| ≤ |Sn (x) − Sn (a)| + |Rn (x)| + |Rn (a)|.

As before, we assign a third of our error bound to each of these terms. Using the uniform
convergence, we can find an n for which both |Rn (x)| and |Rn (a)| are less than ε/3,
regardless of our choice of x. Once n is chosen, we turn to the first piece and use the
continuity of Sn (x) to find a δ for which

|x − a| < δ implies that |Sn (x) − Sn (a)| < ε/3.

This is now the δ that we can use as our response,

|x − a| < δ implies that |S(x) − S(a)| < ε/3 + ε/3 + ε/3 = ε.


Q.E.D.

A Nice Example
As an example of the use of uniform convergence, we consider the dilogarithm shown in
Figure 5.5:
\[ \mathrm{Li}_2(x) = \sum_{k=1}^{\infty} \frac{x^k}{k^2}. \tag{5.30} \]

This series has radius of convergence 1. Using either Gauss’s test or an appropriate com-
parison test, we see that it converges for all x ∈ [−1, 1].

Web Resource: To learn more about the dilogarithm, go to The Dilogarithm.

This series converges uniformly over [−1, 1] as we can see by comparing the remainder
Rn (x) with a bounding integral:
\[
\begin{aligned}
|R_n(x)| &\le \frac{|x|^{n+1}}{(n+1)^2} + \frac{|x|^{n+2}}{(n+2)^2} + \frac{|x|^{n+3}}{(n+3)^2} + \cdots \\
&\le \frac{1}{(n+1)^2} + \frac{1}{(n+2)^2} + \frac{1}{(n+3)^2} + \cdots \\
&< \int_n^{\infty} \frac{dt}{t^2} = \frac{1}{n}.
\end{aligned}
\tag{5.31}
\]
Given an error bound ε, we can respond with any integer N ≥ 1/ε. If n ≥ N , then |Rn (x)| <
1/n ≤ ε regardless of which x we choose from [−1, 1].
Theorem 5.6 assumes that we are working over an open interval. It only implies that
Li2 (x) is continuous at every x ∈ (−1, 1). The behavior of this function at the endpoints is
left as an exercise.
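
The bound (5.31) is easy to test numerically. The sketch below rests on assumptions of ours: li2_partial is a hypothetical helper, and a long partial sum (20000 terms) stands in for the exact value of Li2(x), so the check is only approximate.

```python
import math

def li2_partial(x, n):
    """Partial sum of the first n terms of the dilogarithm series (5.30)."""
    return sum(x**k / k**2 for k in range(1, n + 1))

reference_terms = 20000   # assumption: long partial sum used as a stand-in for Li2(x)
for x in (-1.0, -0.5, 0.5, 1.0):
    reference = li2_partial(x, reference_terms)
    for n in (10, 100, 1000):
        tail = abs(reference - li2_partial(x, n))
        # The same bound 1/n should hold at every x in [-1, 1].
        print(x, n, tail, 1 / n)

# At x = 1 the exact value pi^2/6 is available for comparison.
print(abs(li2_partial(1.0, 100) - math.pi**2 / 6))
```

The point of the experiment is that the bound 1/n does not depend on which x in [−1, 1] we pick, which is exactly what uniform convergence requires.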

FIGURE 5.5. The dilogarithm, Li2 (x).

Continuity without Uniform Convergence


Uniform convergence is sufficient to patch up Cauchy’s theorem. It is not necessary. It is
possible that the series is continuous even when we do not have uniform convergence. An
example of this is the series
\[ S(x) = \sum_{k=1}^{\infty} \frac{x + x^3(k - k^2)}{(1 + k^2x^2)(1 + (k-1)^2x^2)}. \tag{5.32} \]

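Observing that
\[ \frac{x + x^3(k - k^2)}{(1 + k^2x^2)(1 + (k-1)^2x^2)} = \frac{kx}{1 + k^2x^2} - \frac{(k-1)x}{1 + (k-1)^2x^2}, \]
we can evaluate the partial sums,
\[
\begin{aligned}
S_n(x) &= \frac{x}{1 + x^2} + \left(\frac{2x}{1 + 4x^2} - \frac{x}{1 + x^2}\right) + \left(\frac{3x}{1 + 9x^2} - \frac{2x}{1 + 4x^2}\right)
 + \cdots + \left(\frac{nx}{1 + n^2x^2} - \frac{(n-1)x}{1 + (n-1)^2x^2}\right) \\
&= \frac{nx}{1 + n^2x^2}.
\end{aligned}
\tag{5.33}
\]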

FIGURE 5.6. Sn (x) = nx/(1 + n²x²), n = 3, 6, 12.

For any value of x, these partial sums approach 0 as n increases and so

S(x) = 0 for all x. (5.34)

Graphing our partial sums and a small envelope (Figure 5.6), we see that we do not
have uniform convergence over any interval containing x = 0. The remainder is
\[ R_n(x) = \frac{-nx}{1 + n^2x^2}. \]
If we have been given an error ε < 1/2, then we can always find a large integer n for which
1/n or −1/n is inside the interval, but
\[ |R_n(\pm 1/n)| = \left| \frac{\pm 1}{1 + 1} \right| = \frac{1}{2} > \epsilon. \]

Nevertheless, the constant function S(x) = 0 is continuous.
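
The contrast between pointwise and uniform convergence in this example can be seen in a few lines of Python. The helper s_n and the sample points are our own choices, offered as a sketch rather than as part of the text.

```python
def s_n(n, x):
    """Partial sum (5.33): S_n(x) = n x / (1 + n^2 x^2)."""
    return n * x / (1 + n**2 * x**2)

for n in (10, 1000, 100000):
    # Fixed x: the values shrink toward S(x) = 0.
    # Moving x = 1/n: the value stays at 1/2, so the envelope is always breached.
    print(n, s_n(n, 0.01), s_n(n, 1 / n))
```

At any fixed x the partial sums die away, but the hump of height 1/2 simply slides toward 0, so the convergence is never uniform near x = 0 even though the limit function is continuous.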

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

5.2.1. M&M Graph the partial sums to 3, 6, 9, and 12 terms of
\[ S(x) = \sum_{n=1}^{\infty} x^2 (1 - x^2)^{n-1}, \quad -1 \le x \le 1. \]

Either prove that this series converges uniformly on [−1, 1], or explain why it cannot
converge uniformly over this interval.

5.2.2. M&M Calculate the partial sums of the series in equation (5.19):
\[ S_n(x) = \sum_{k=1}^{n} \frac{x^2}{(1 + kx^2)(1 + (k-1)x^2)} \]

when x = 1/10, 1/100, and 1/1000. How many terms are needed in order to get the value of
Sn (x) within 0.01 of S(x) = 1? Explain the reasoning that leads to your answer.

5.2.3. M&M Calculate the partial sums of the series in equation (5.32):
\[ S_n(x) = \sum_{k=1}^{n} \frac{x + x^3(k - k^2)}{(1 + k^2x^2)(1 + (k-1)^2x^2)} \]

when x = 1/10, 1/100, and 1/1000. How many terms are needed in order to get the value of
Sn (x) within 0.01 of S(x) = 0? Explain the reasoning that leads to your answer.

5.2.4. Consider the power series expansion for the sine:
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots = \sum_{k=1}^{\infty} (-1)^{k-1} \frac{x^{2k-1}}{(2k-1)!}. \]
Show that this series converges uniformly over the interval [−π, π ]. How many terms must
you take if the partial sum is to lie within the ε envelope when ε = 1/2, 1/10, 1/100?

5.2.5. Prove that the power series expansion for the sine converges uniformly over the
interval [−2π, 2π ]. How many terms must you take if the partial sum is to lie within the
ε envelope when ε = 1/2, 1/10, 1/100?

5.2.6. Is the power series expansion for the sine uniformly convergent over the set of all
real numbers? Explain your answer.

5.2.7. Euler proved (see Appendix A.3) that
\[ \mathrm{Li}_2(1) = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots = \frac{\pi^2}{6}. \]
Find the value of Li2 (−1).

5.2.8. What is the relationship between the series
\[ \sum_{k=1}^{\infty} \frac{x^k}{k} \]
and the natural logarithm? Why do you think that $\sum_{k=1}^{\infty} x^k/k^2$ is called the dilogarithm?

5.2.9. Prove that as x approaches 1 or −1 from inside the interval (−1, 1), the value of
Li2 (x) approaches Li2 (1) or Li2 (−1), respectively. For a = 1, you have to show that for
any given error bound ε, there is always a response δ such that

1 − δ < x < 1 implies that |Li2 (1) − Li2 (x)| < ε.

How large is δ when ε = 1/4?

5.2.10. The graph of Li2 (x) and the fact that it is analogous to the natural logarithm both
suggest that we should be able to define this function for values of x that are less than
−1. Show that if term-by-term integration of power series is allowed over the domain of
convergence, then
\[ \mathrm{Li}_2(x) = \int_x^0 \frac{\ln(1 - t)}{t} \, dt \]
for −1 ≤ x ≤ 1, and this integral is defined for all x < 1.

5.2.11. We arrived at the notion of uniform convergence by breaking the second part of
the cycle that we encountered on page 185. We found an N that was independent of x
for which n ≥ N implies that |Rn (x)| < ε/3. Discuss what it would mean to break the
first part of the cycle, to find a δ independent of n for which |x − a| < δ implies that
|Sn (x) − Sn (a)| < ε/3. Find an example of such a series. Why is this not the route that is
usually chosen?

5.3 Differentiation and Integration


As we saw in section 3.2, it is not always safe to differentiate a series by differentiating
each term. For example, the Fourier series
\[ F(x) = \frac{4}{\pi} \left[ \cos(\pi x/2) - \frac{1}{3}\cos(3\pi x/2) + \frac{1}{5}\cos(5\pi x/2) - \frac{1}{7}\cos(7\pi x/2) + \cdots \right], \]

is equal to 1 for −1 < x < 1. Its derivative is zero at each x between −1 and 1. If we try
to differentiate each term, we obtain the series

−2 [sin(π x/2) − sin(3π x/2) + sin(5π x/2) − sin(7π x/2) + · · · ] ,

which does not converge unless x is an even integer.


Worse than this can happen. Trying to differentiate a series by differentiating each
summand can give us a series that converges to the wrong answer. Consider the series
\[ F(x) = \sum_{k=1}^{\infty} f_k(x) = \sum_{k=1}^{\infty} \frac{x^3}{(1 + kx^2)(1 + (k-1)x^2)}. \tag{5.35} \]

The derivative of the kth summand is
\[
\begin{aligned}
f_k'(x) = {} & \frac{3x^2}{(1 + kx^2)(1 + (k-1)x^2)} - \frac{2kx^4}{(1 + kx^2)^2 (1 + (k-1)x^2)} \\
& - \frac{2(k-1)x^4}{(1 + kx^2)(1 + (k-1)x^2)^2}.
\end{aligned}
\]
We see that
\[ f_k'(0) = 0 \]
for all values of k. If we try to find F′(0) by differentiating each term and then setting
x = 0, we get 0, which is the wrong answer.
To see what the derivative should be, we look at the partial sums and observe that
\[ \frac{x^3}{(1 + kx^2)(1 + (k-1)x^2)} = \frac{kx^3}{1 + kx^2} - \frac{(k-1)x^3}{1 + (k-1)x^2}. \]
We let Fn (x) denote the partial sum of the first n summands:
\[
\begin{aligned}
F_n(x) &= f_1(x) + f_2(x) + \cdots + f_n(x) \\
&= \frac{x^3}{1 + x^2} + \left(\frac{2x^3}{1 + 2x^2} - \frac{x^3}{1 + x^2}\right) + \left(\frac{3x^3}{1 + 3x^2} - \frac{2x^3}{1 + 2x^2}\right)
 + \cdots + \left(\frac{nx^3}{1 + nx^2} - \frac{(n-1)x^3}{1 + (n-1)x^2}\right) \\
&= \frac{nx^3}{1 + nx^2}.
\end{aligned}
\tag{5.36}
\]
As n gets large, Fn (x) approaches x for all values of x. Our series is
\[ F(x) = x, \quad F'(x) = 1. \tag{5.37} \]

Figure 5.7 shows what is happening. We see that it is possible for the slope of the partial
sums at a particular point to bear no relationship whatsoever to the slope of the infinite
series. The series that we are using even converges uniformly. The graph suggests that
it should. We can confirm this algebraically. Given an error bound ε, we want to find N
(independent of x) for which n ≥ N implies that
\[ \epsilon > |F(x) - F_n(x)| = \left| x - \frac{nx^3}{1 + nx^2} \right| = \left| \frac{x}{1 + nx^2} \right| = \frac{|x|}{1 + nx^2}. \tag{5.38} \]
Solving this inequality for n, we see that we want
\[ n > \frac{|x| - \epsilon}{\epsilon x^2}. \tag{5.39} \]

FIGURE 5.7. Graphs of Fn (x) = nx³/(1 + nx²), n = 2 (solid), 4 (dotted), and 6 (dashed), with graph
of y = x included.

It appears that the right-hand side depends on x, but if we graph (|x| − ε)/(εx²) (Figure 5.8),
we see that it has an absolute maximum of 1/(4ε²) at x = ±2ε. As long as N > 1/(4ε²), the
error will be within the allowed bounds.
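
Both claims, that the convergence is uniform and that the slopes at 0 are nevertheless wrong, can be checked numerically. The sketch below uses hypothetical helpers of our own (f_n, error, threshold), a finite grid on [−1, 1], and a central difference in place of the exact derivative, so it is an illustration rather than a proof.

```python
def f_n(n, x):
    """Partial sum (5.36): F_n(x) = n x^3 / (1 + n x^2)."""
    return n * x**3 / (1 + n * x**2)

def error(n, x):
    """|F(x) - F_n(x)| with F(x) = x, as in (5.38)."""
    return abs(x - f_n(n, x))

def threshold(x, eps):
    """Right-hand side of (5.39): (|x| - eps) / (eps x^2)."""
    return (abs(x) - eps) / (eps * x**2)

eps = 0.05
n = 101                                   # any integer above 1/(4 eps^2) = 100
grid = [i / 10000 for i in range(-10000, 10001)]
worst = max(error(n, x) for x in grid)    # largest error found on the grid
h = 1e-6
slope_at_0 = (f_n(n, h) - f_n(n, -h)) / (2 * h)   # central difference at x = 0

print(threshold(2 * eps, eps), 1 / (4 * eps**2))  # maximum of the threshold
print(worst, eps)                                 # uniform error versus eps
print(slope_at_0)                                 # near 0, although F'(0) = 1
```

The worst error on the grid stays below ε once n exceeds 1/(4ε²), yet the slope of Fn at 0 remains essentially 0 for every n.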

When is Term-by-term Differentiation Legitimate?


An example like this should make the prospects of being able to differentiate a series by
differentiating each summand seem very dim. In fact, in most of the series you are likely to
encounter, it is safe to differentiate each summand. This can be a very powerful technique.
For example, once you know that
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots , \]
then, provided it is legal to differentiate this series by differentiating each summand, we
can conclude that
\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots . \]
To find conditions under which it is safe to differentiate each term, we return to the
definition of the derivative given in section 3.2. To say that fk (x) is differentiable at x = a
means that
\[ E_k(x, a) = f_k'(a) - \frac{f_k(x) - f_k(a)}{x - a} \tag{5.40} \]
can be made arbitrarily small by taking x sufficiently close to a. We know that any finite
sum of differentiable functions is differentiable, and so there is a comparable error term
that corresponds to the partial sum Fn (x). This error,
\[ E_n(x, a) = F_n'(a) - \frac{F_n(x) - F_n(a)}{x - a}, \tag{5.41} \]
can be made arbitrarily small by taking x sufficiently close to a. If $\sum_{k=1}^{\infty} f_k(x)$ converges for
all x close to a and if $\sum_{k=1}^{\infty} f_k'(a)$ converges, then we have that
\[
\begin{aligned}
E(x, a) &= \sum_{k=1}^{\infty} f_k'(a) - \frac{F(x) - F(a)}{x - a} \\
&= \sum_{k=1}^{\infty} f_k'(a) - \frac{\sum_{k=1}^{\infty} f_k(x) - \sum_{k=1}^{\infty} f_k(a)}{x - a} \\
&= \sum_{k=1}^{\infty} \left( f_k'(a) - \frac{f_k(x) - f_k(a)}{x - a} \right) \\
&= \sum_{k=1}^{\infty} E_k(x, a).
\end{aligned}
\tag{5.42}
\]
Our series is differentiable at x = a and the derivative is equal to $\sum_{k=1}^{\infty} f_k'(a)$ if and only if
$E(x, a) = \sum_{k=1}^{\infty} E_k(x, a)$ can be made arbitrarily small by taking x sufficiently close to a.

FIGURE 5.8. Graph of (|x| − ε)/(εx²).

A Glance at Our Example


For the series given in equation (5.35) on page 192, we see that
\[ E_k(x, 0) = 0 - \frac{f_k(x) - 0}{x - 0} = \frac{-x^2}{(1 + kx^2)(1 + (k-1)x^2)}. \tag{5.43} \]
This should look familiar. It is precisely the summand that we saw in the last section where
we showed that
\[ E(x, 0) = \sum_{k=1}^{\infty} E_k(x, 0) = \sum_{k=1}^{\infty} \frac{-x^2}{(1 + kx^2)(1 + (k-1)x^2)} = -1, \]
(since x is not 0). No matter how close we take x to 0, E(x, 0) will remain −1. It cannot be
made arbitrarily small. This confirms what we already knew; we cannot differentiate this
series at x = 0 by differentiating each summand.

The Solution

Uniform convergence of $\sum_{k=1}^{\infty} f_k(x)$ is not enough to guarantee that term-by-term
differentiation can be used. Uniform convergence of the series of derivatives, $\sum_{k=1}^{\infty} f_k'(x)$, is
sufficient.

Theorem 5.7 (Term-by-term Differentiation). Let f1 + f2 + f3 + · · · be a series of
functions that converges at x = a and for which the series of derivatives, f1′ + f2′ +
f3′ + · · · , converges uniformly over an open interval I that contains a. It follows that
1. F = f1 + f2 + f3 + · · · converges uniformly over the interval I ,
2. F is differentiable at x = a, and
3. for all x ∈ I , $F'(x) = \sum_{k=1}^{\infty} f_k'(x)$.

Proof: The key to this proof is defining the function
\[ g_k(x) = \frac{f_k(x) - f_k(a)}{x - a}, \quad x \ne a. \tag{5.44} \]
We can make this function continuous at x = a by setting gk (a) = fk′(a), though, in fact,
we are only interested in it for x ≠ a, x ∈ I . We will show that $\sum_{k=1}^{\infty} g_k(x)$ converges
uniformly over I . As you think about how we might be able to do this (hint: think mean
value theorem), notice what uniform convergence will do for us.
First, we can express fk (x) in terms of gk (x) and fk (a),

fk (x) = (x − a)gk (x) + fk (a).

Using Theorems 5.4 and 5.5, it follows that $\sum f_k(x)$ converges. It is not hard to see
(exercise 5.3.3) that if $\sum g_k$ converges uniformly over I , then so must $\sum f_k$.
Next, we let Fn denote the partial sum of the first n functions:

Fn = f1 + f2 + · · · + fn .

We denote by En (x, a) the size of the error when the average rate of change of Fn between
x and a is replaced by the derivative of Fn at a:
\[ E_n(x, a) = F_n'(a) - \frac{F_n(x) - F_n(a)}{x - a}. \]
Given any positive error ε, our task is to find a response δ for which 0 < |x − a| < δ
implies that
\[ |E(x, a)| = \left| \sum_{k=1}^{\infty} f_k'(a) - \frac{F(x) - F(a)}{x - a} \right| < \epsilon. \]
We know that we can control the size of En (x, a), though we must keep in mind that the δ
response could depend on n. We rewrite the quantity to be bounded as
\[
\begin{aligned}
|E(x, a)| &= \left| \sum_{k=1}^{\infty} f_k'(a) - \frac{F(x) - F(a)}{x - a} \right| \\
&= \left| \sum_{k=1}^{\infty} f_k'(a) - F_n'(a) - \frac{F(x) - F(a)}{x - a} + \frac{F_n(x) - F_n(a)}{x - a}
 + F_n'(a) - \frac{F_n(x) - F_n(a)}{x - a} \right| \\
&\le \left| \sum_{k=1}^{\infty} f_k'(a) - F_n'(a) \right| + \left| \frac{F(x) - F(a)}{x - a} - \frac{F_n(x) - F_n(a)}{x - a} \right|
 + \left| F_n'(a) - \frac{F_n(x) - F_n(a)}{x - a} \right| \\
&= \left| \sum_{k=n+1}^{\infty} f_k'(a) \right| + \left| \sum_{k=n+1}^{\infty} g_k(x) \right| + \left| E_n(x, a) \right|.
\end{aligned}
\tag{5.45}
\]
We split our error three ways and choose an n so that each of the first two pieces is less
than ε/3. We then choose our δ so that the third piece is also less than ε/3. We can do this
because of the convergence of $\sum_{k=1}^{\infty} f_k'(a)$, the uniform convergence of $\sum_{k=1}^{\infty} g_k(x)$ (so
that the choice of n does not depend on x), and the differentiability of Fn , a finite sum of
differentiable functions.

So it all comes down to the uniform convergence of $\sum g_k$. Have you figured it out yet?
We can use the Cauchy criterion to establish uniform convergence. A series such as
\[ \sum_{k=1}^{\infty} g_k(x) \]
converges uniformly over the interval I if and only if given an error bound ε, there is a
response N independent of x for which
\[ N \le m < n \quad \text{implies that} \quad \left| \sum_{k=m+1}^{n} g_k(x) \right| < \epsilon. \]
For the series under consideration, the difference between the partial sums is
\[
\begin{aligned}
\sum_{k=m+1}^{n} g_k(x) &= \frac{F_n(x) - F_n(a)}{x - a} - \frac{F_m(x) - F_m(a)}{x - a} \\
&= \frac{[F_n(x) - F_m(x)] - [F_n(a) - F_m(a)]}{x - a}.
\end{aligned}
\tag{5.46}
\]
Applying the mean value theorem to the function Fn (x) − Fm (x), we see that this is equal
to
\[ \sum_{k=m+1}^{n} g_k(x) = F_n'(t) - F_m'(t) = \sum_{k=m+1}^{n} f_k'(t) \tag{5.47} \]
for some t between x and a. This t must also lie in I . By the uniform convergence of
$\sum_{k=1}^{\infty} f_k'(t)$, we can find a response N that forces
\[ \left| \sum_{k=m+1}^{n} g_k(x) \right| \]
to be as small as we wish regardless of the choice of x ∈ I . It follows that $\sum_{k=1}^{\infty} g_k(x)$ is
also uniformly convergent.

We have only proven that $F'(a) = \sum_{k=1}^{\infty} f_k'(a)$, but now that we know that $\sum_{k=1}^{\infty} f_k(x)$
converges for all x in I , we can replace a by any x in I .
Q.E.D.

Integration
In his derivation of the formula for the coefficients of a Fourier series, Joseph Fourier
assumed that the integral of a series is the sum of the integrals. This is a questionable
procedure that will sometimes fail. It is correct, however, when the series in question
converges uniformly over the interval of integration.

Theorem 5.8 (Term-by-term Integration). Let f1 + f2 + f3 + · · · be uniformly
convergent over the interval [a, b], converging to F . If each fk is integrable over [a, b],
then so is F and
\[ \int_a^b F(x) \, dx = \sum_{k=1}^{\infty} \int_a^b f_k(x) \, dx. \]

Before proceeding with the proof, we need to face one major obstacle: we have not yet
defined integration. The reason for this is that defining integration is not easy. It requires
a very profound understanding of the nature of the real number line. In fact, it will not be
until the sequel to this book, A Radical Approach to Lebesgue’s Theory of Integration, that
we do justice to the question of integration. The modern definition was not determined until
the 20th century.
In the meantime, you will have to rely on whatever definition of integration you prefer.
Fortunately, to prove this theorem we only need two properties of the integral:
\[ \int_a^b \bigl( f(x) + g(x) \bigr) \, dx = \int_a^b f(x) \, dx + \int_a^b g(x) \, dx, \tag{5.48} \]
\[ \left| \int_a^b f(x) \, dx \right| \le \int_a^b \left| f(x) \right| dx. \tag{5.49} \]

In the next chapter we shall discuss integration as defined by Cauchy and Riemann. It is
not hard to show that their integrals satisfy these properties.

Proof: We have to show that given any > 0, we can find an N for which

n
b

b 

F (x) dx − fk (x) dx <
a a
k=1

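when n is at least N. From equation (5.48), any finite sum of integrals over the same interval
is the integral of the sum. We can rewrite our difference as

\[
\begin{aligned}
\left| \int_a^b F(x)\,dx - \sum_{k=1}^{n} \int_a^b f_k(x)\,dx \right|
  &= \left| \int_a^b F(x)\,dx - \int_a^b F_n(x)\,dx \right| \\
  &= \left| \int_a^b \bigl( F(x) - F_n(x) \bigr)\,dx \right| \\
  &\le \int_a^b \bigl| F(x) - F_n(x) \bigr|\,dx.
\end{aligned}
\tag{5.50}
\]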

Since our series converges uniformly, we can find an N for which $n \ge N$ implies that
$|F(x) - F_n(x)| < \epsilon/(b - a)$. Substituting this bound, we see that

\[ \left| \int_a^b F(x)\,dx - \sum_{k=1}^{n} \int_a^b f_k(x)\,dx \right| < \int_a^b \frac{\epsilon}{b - a}\,dx = \epsilon. \tag{5.51} \]

Q.E.D.
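
As a numerical illustration of Theorem 5.8, the following Python sketch uses the geometric series $1 + x + x^2 + \cdots$, which converges uniformly to $1/(1 - x)$ on [0, 1/2]. The choice of series, the interval, and all function names are assumptions made here for illustration; they do not come from the text. The sum of the integrals of the summands should approach the integral of the limit, which is ln 2.

```python
import math


def integral_of_term(k, a, b):
    """Exact integral of the summand f_k(x) = x**k over [a, b]."""
    return (b ** (k + 1) - a ** (k + 1)) / (k + 1)


def sum_of_integrals(n_terms, a=0.0, b=0.5):
    """Sum of the integrals of the first n_terms summands x**0, ..., x**(n_terms - 1)."""
    return sum(integral_of_term(k, a, b) for k in range(n_terms))


def integral_of_limit(a=0.0, b=0.5):
    """Exact integral of F(x) = 1/(1 - x) over [a, b]; requires b < 1."""
    return math.log(1 - a) - math.log(1 - b)


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, sum_of_integrals(n), integral_of_limit())
```

The gap between the two printed values is bounded by the integral of the tail of the series, which is the quantity controlled in (5.50) and (5.51).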

An Example
An example of a series that cannot be integrated by integrating each summand is given by

\[ \sum_{k=1}^{\infty} \Bigl( k^2 x^k (1 - x) - (k - 1)^2 x^{k-1} (1 - x) \Bigr) \]

whose partial sums are (see Figure 5.9)

\[ S_n(x) = n^2 x^n (1 - x). \]

As n increases, the hump in the graph of $S_n$ gets pushed further to the right. For any x in
[0, 1], $S_n(x)$ approaches 0 as n gets larger, and so

\[ \int_0^1 \sum_{k=1}^{\infty} \Bigl( k^2 x^k (1 - x) - (k - 1)^2 x^{k-1} (1 - x) \Bigr)\,dx = \int_0^1 0\,dx = 0. \tag{5.52} \]

FIGURE 5.9. Graphs of $S_n(x) = n^2 x^n (1 - x)$, n = 2 (solid), 4 (dotted), and 6 (dashed).

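But the area under $S_n(x)$ approaches 1 as n gets larger:

\[
\begin{aligned}
\lim_{n\to\infty} \sum_{k=1}^{n} \int_0^1 \Bigl( k^2 x^k (1 - x) - (k - 1)^2 x^{k-1} (1 - x) \Bigr)\,dx
  &= \lim_{n\to\infty} \sum_{k=1}^{n} \left( \frac{k^2}{(k + 1)(k + 2)} - \frac{(k - 1)^2}{k(k + 1)} \right) \\
  &= \lim_{n\to\infty} \frac{n^2}{(n + 1)(n + 2)} \\
  &= 1.
\end{aligned}
\tag{5.53}
\]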

In this example, the integral of the sum is not the same as the sum of the integrals.
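
The two computations can be checked numerically. The Python sketch below evaluates the partial sum $S_n(x) = n^2 x^n (1 - x)$ at a fixed point, its exact integral $n^2/((n + 1)(n + 2))$ from (5.53), and a grid estimate of the height of the hump. The function names and the grid size are assumptions introduced here for illustration only.

```python
def partial_sum(n, x):
    """S_n(x) = n^2 x^n (1 - x), the n-th partial sum of the telescoping series."""
    return n * n * x ** n * (1 - x)


def integral_of_partial_sum(n):
    """Exact integral of S_n over [0, 1], as computed in (5.53)."""
    return n * n / ((n + 1) * (n + 2))


def max_of_partial_sum(n, grid=10_000):
    """Grid estimate of the maximum of S_n on [0, 1] (the height of the hump)."""
    return max(partial_sum(n, i / grid) for i in range(grid + 1))


if __name__ == "__main__":
    for n in (2, 4, 6, 50, 1000):
        print(n, partial_sum(n, 0.5), integral_of_partial_sum(n), max_of_partial_sum(n))
```

At any fixed x the values shrink to 0, while the integral tends to 1 and the hump grows taller; the growing hump is what rules out uniform convergence on [0, 1].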

Not Enough!
Theorem 5.8 is often useful, but it is not what we need for Fourier series. Looking back
to the technique introduced by Joseph Fourier for finding the coefficients in the cosine
expansion of an even function, we see that he began by assuming that his function had a
cosine expansion:

\[ f(x) = a_1 \cos(\pi x/2) + a_2 \cos(3\pi x/2) + a_3 \cos(5\pi x/2) + \cdots = \sum_{k=1}^{\infty} a_k \cos\left( \frac{(2k - 1)\pi x}{2} \right). \]

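This is a dangerous assumption, but we shall accept it for the moment. Fourier observed
that

\[ \int_{-1}^{1} \cos\left( \frac{(2k - 1)\pi x}{2} \right) \cos\left( \frac{(2m - 1)\pi x}{2} \right) dx = \begin{cases} 0, & \text{if } k \ne m, \\ 1, & \text{if } k = m. \end{cases} \]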

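He then argued as follows:

\[
\begin{aligned}
\int_{-1}^{1} f(x) \cos\left( \frac{(2m - 1)\pi}{2} x \right) dx
  &= \int_{-1}^{1} \left( \sum_{k=1}^{\infty} a_k \cos\left( \frac{(2k - 1)\pi}{2} x \right) \right) \cos\left( \frac{(2m - 1)\pi}{2} x \right) dx \\
  &= \sum_{k=1}^{\infty} a_k \int_{-1}^{1} \cos\left( \frac{(2k - 1)\pi}{2} x \right) \cos\left( \frac{(2m - 1)\pi}{2} x \right) dx \qquad (5.54) \\
  &= a_1 \cdot 0 + a_2 \cdot 0 + \cdots + a_{m-1} \cdot 0 + a_m \cdot 1 + a_{m+1} \cdot 0 + \cdots \\
  &= a_m. \qquad (5.55)
\end{aligned}
\]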

As we see, his argument rests on integrating the series by integrating each summand. If
the cosine series converges uniformly, then we are completely justified in doing this. But
one of the most important series that we have to deal with is the cosine expansion of the
constant 1 between −1 and 1. As we have seen, its expansion,

\[ \frac{4}{\pi} \left[ \cos(\pi x/2) - \frac{1}{3} \cos(3\pi x/2) + \frac{1}{5} \cos(5\pi x/2) - \frac{1}{7} \cos(7\pi x/2) + \cdots \right], \]

does not converge uniformly over [−1, 1].
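
One way to see the failure of uniform convergence numerically: every summand $\cos((2k - 1)\pi x/2)$ vanishes at x = ±1, so each partial sum is 0 there, while at interior points the partial sums approach 1. The Python sketch below evaluates the partial sums at x = 0 and x = 1; the function name and the sample points are assumptions chosen for illustration.

```python
import math


def cosine_partial_sum(n, x):
    """First n terms of (4/pi)[cos(pi x/2) - cos(3 pi x/2)/3 + cos(5 pi x/2)/5 - ...]."""
    total = 0.0
    for k in range(1, n + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * math.cos((2 * k - 1) * math.pi * x / 2) / (2 * k - 1)
    return 4.0 / math.pi * total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, cosine_partial_sum(n, 0.0), cosine_partial_sum(n, 1.0))
```

Because the partial sums are continuous and equal 0 at x = 1, each of them differs from 1 by nearly 1 at points just inside the interval, however many terms are taken.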


Fortunately, Theorem 5.8 gives a condition that is sufficient but not necessary. Even if
the series does not converge uniformly, it may be permissible to integrate by integrating
each summand. The search in the late 19th century for necessary as well as sufficient
conditions will be an important part of the story in A Radical Approach to Lebesgue’s
Theory of Integration.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

5.3.1. Give an example of a series for which each summand $f_k$ is differentiable at every
x in an interval I and $\sum_{k=1}^{\infty} f_k$ converges uniformly over I, but $\sum_{k=1}^{\infty} f_k'(x)$ does not
converge for any x in I.

5.3.2. Prove that

\[ f(x) = \sum_{n=1}^{\infty} \frac{1}{n^2 + x^2} \]

is differentiable for all values of x.



5.3.3. Prove that if $\sum_{k=1}^{\infty} g_k$ converges uniformly over the interval I, then

\[ \sum_{k=1}^{\infty} f_k(x) = (x - a) \sum_{k=1}^{\infty} g_k(x) + \sum_{k=1}^{\infty} f_k(a) \]

also converges uniformly over I.



5.3.4. (M&M) Graph the partial sums of the first 5, the first 10, and the first 20 terms of

\[ -2 \left[ \sin(\pi x/2) - \sin(3\pi x/2) + \sin(5\pi x/2) - \sin(7\pi x/2) + \cdots \right]. \]

Prove that this series converges if and only if x is an even integer.



5.3.5. (M&M) Consider the series

\[ G(x) = \sum_{k=1}^{\infty} g_k(x) = \sum_{k=1}^{\infty} \frac{x^2 \sin x}{(1 + kx^2)(1 + (k - 1)x^2)}. \]

Evaluate the partial sum of this series to at least a thousand terms when x = π/6, π/4, and
π/2.

5.3.6. (M&M) Graph the partial sums

\[ G_n(x) = \sum_{k=1}^{n} \frac{x^2 \sin x}{(1 + kx^2)(1 + (k - 1)x^2)} \]

for −π ≤ x ≤ π and n = 3, 6, 9, and 12. Discuss what you see. Prove that

\[ G_n(x) = \frac{n x^2 \sin x}{1 + n x^2}. \]

What is G(x)?

5.3.7. Prove that

\[ G(x) = \sum_{k=1}^{\infty} \frac{x^2 \sin x}{(1 + kx^2)(1 + (k - 1)x^2)} \]

converges uniformly for all values of x.

5.3.8. Show that if

\[ g_k(x) = \frac{x^2 \sin x}{(1 + kx^2)(1 + (k - 1)x^2)}, \]

then

\[ \sum_{k=1}^{\infty} g_k'(0) = 0. \]

Using the result from exercise 5.3.6, find $G'(0)$. This is a series that is differentiable but
which we cannot differentiate term-by-term. This series does converge uniformly. Explain
why this does not contradict Theorem 5.7.

5.3.9. Prove that the Cauchy criterion can be used for uniform convergence:

Let $f_1 + f_2 + f_3 + \cdots$ be a series of functions converging to F for all x in
the interval I, and let $F_n = f_1 + f_2 + \cdots + f_n$ be the partial sum. This series
converges uniformly over I if and only if given any error bound $\epsilon$, there is a
response N (valid for all x ∈ I) such that N ≤ m < n implies that
$|F_m(x) - F_n(x)| < \epsilon$.

5.3.10. (M&M) Show that

\[ \sum_{k=1}^{N} \left( kxe^{-kx^2} - (k - 1)xe^{-(k-1)x^2} \right) = N x e^{-Nx^2}, \]

and use this to prove that

\[ \sum_{k=1}^{\infty} \left( kxe^{-kx^2} - (k - 1)xe^{-(k-1)x^2} \right) = 0 \]

for all values of x (including x = 0). Graph the partial sums for N = 5, 10, and 20.

5.3.11. Using the result of exercise 5.3.10, evaluate

\[ \int_0^1 \sum_{k=1}^{\infty} \left( kxe^{-kx^2} - (k - 1)xe^{-(k-1)x^2} \right) dx. \]

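5.3.12. Show that

\[ \int_0^1 \left( kxe^{-kx^2} - (k - 1)xe^{-(k-1)x^2} \right) dx = -\frac{1}{2} e^{-k} + \frac{1}{2} e^{-(k-1)}. \]

Use this to evaluate

\[ \sum_{k=1}^{\infty} \int_0^1 \left( kxe^{-kx^2} - (k - 1)xe^{-(k-1)x^2} \right) dx. \]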

5.3.13. The last two exercises should have yielded different results. This tells us that
the convergence of $\sum_{k=1}^{\infty} \left( kxe^{-kx^2} - (k - 1)xe^{-(k-1)x^2} \right)$ cannot be uniform over the interval
[0, 1]. Where is it that this series does not converge uniformly?

5.4 Verifying Uniform Convergence


The importance of uniform convergence was not generally recognized until the 1860s. Once
it was accepted as a critical property of “nice” series, the question that came to the fore was
how to determine whether or not a series converged uniformly over a given interval. Three
names stand out among those associated with the tests for uniform convergence: Gustav
Lejeune Dirichlet and Niels Henrik Abel, whose work of forty years earlier turned out to be
applicable to this new question, and Karl Weierstrass (1815–1897).
Weierstrass had gone to the University of Bonn at the age of nineteen to study law.
Instead, he became noted for his drinking and fencing. He left after four years without
earning a degree. After convincing the authorities that he had reformed himself, he was
allowed to enter the university at Münster to seek a teaching certificate. There he had the
good fortune to be taught by Christof Gudermann (1798–1852). It was with Gudermann
that Weierstrass began his life-long love of analysis.
In 1841, at the age of 26, Weierstrass received his certification and began to teach high
school² mathematics. In his spare time, he worked on questions of analysis, concentrating
on the writings of Abel and building upon them. His first papers appeared in 1854. They
excited the entire mathematical community. Weierstrass was granted an honorary doctorate
by the University of Königsberg. Two years later he was made a professor at the University
of Berlin. To him we owe the first truly clear vision of the nature and significance of
uniform convergence.

The Weierstrass M -test


One of the simplest and most useful tests for uniform convergence was published by
Weierstrass in 1880, the M-test. It is based on the following analog of the comparison test.

Theorem 5.9 (Dominated Uniform Convergence). If the series $g_1 + g_2 + g_3 + \cdots$
is uniformly convergent over the interval I and if $|f_k(x)| \le g_k(x)$ for all x ∈ I and for
every positive integer k, then $f_1 + f_2 + f_3 + \cdots$ is also uniformly convergent over I.

Proof: We use the Cauchy criterion for uniform convergence. Given an error bound $\epsilon$, we
must find a response N, independent of x, such that N ≤ m < n implies that

\[ \left| \sum_{k=m+1}^{n} f_k(x) \right| < \epsilon. \]

We know that

\[ \left| \sum_{k=m+1}^{n} f_k(x) \right| \le \sum_{k=m+1}^{n} \bigl| f_k(x) \bigr| \le \sum_{k=m+1}^{n} g_k(x). \]

2 The German gymnasium.



The uniform convergence of $g_1(x) + g_2(x) + g_3(x) + \cdots$ guarantees an N, independent
of x, for which this sum is less than $\epsilon$ when N ≤ m < n.
Q.E.D.

This theorem has several immediate corollaries, including the M-test.

Corollary 5.10 (Absolute Uniform Convergence). If $|f_1| + |f_2| + |f_3| + \cdots$ converges
uniformly over I, then so does $f_1 + f_2 + f_3 + \cdots$.

Corollary 5.11 (Variation on Dominated Uniform Convergence). If the series
$g_1 + g_2 + g_3 + \cdots$ is uniformly convergent over the interval I and if $|f_k(x)| \le g_k(x)$ for
every k greater than or equal to some fixed integer N and for all x ∈ I, then
$f_1(x) + f_2(x) + f_3(x) + \cdots$ is also uniformly convergent over I.

Corollary 5.12 (Weierstrass M-test). If we can find a sequence of constants $M_1, M_2, M_3, \ldots$ such that

\[ |f_k(x)| \le M_k \]

for every k greater than or equal to some fixed integer N and for all x ∈ I, and if
$M_1 + M_2 + M_3 + \cdots$ converges, then $f_1 + f_2 + f_3 + \cdots$ converges uniformly over I.

The first and third corollaries follow from the theorem by taking gk (x) = |fk (x)|, gk (x) =
Mk , respectively. The second corollary is simply the observation that convergence, and thus
uniform convergence, is not affected by changing a finite number of summands.
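
A small numerical sketch of the M-test may help. It assumes the illustrative series with $f_k(x) = \sin(kx)/k^2$ and constants $M_k = 1/k^2$; this example and the function names are not taken from the text. The code compares a grid estimate of the largest tail $|f_{m+1}(x) + \cdots + f_n(x)|$ with the tail of the constants $M_{m+1} + \cdots + M_n$, which does not depend on x.

```python
import math


def tail_sup_on_grid(m, n, xs):
    """Largest value over the grid xs of |f_{m+1}(x) + ... + f_n(x)|, f_k(x) = sin(kx)/k^2."""
    return max(
        abs(sum(math.sin(k * x) / k ** 2 for k in range(m + 1, n + 1)))
        for x in xs
    )


def tail_of_constants(m, n):
    """M_{m+1} + ... + M_n with M_k = 1/k^2; this bound does not depend on x."""
    return sum(1.0 / k ** 2 for k in range(m + 1, n + 1))


if __name__ == "__main__":
    grid = [i * 0.01 for i in range(629)]  # sample points covering roughly [0, 2*pi]
    for m in (5, 10, 50):
        print(m, tail_sup_on_grid(m, 200, grid), tail_of_constants(m, 200))
```

The first column of values never exceeds the second, so one response N serves every x at once, which is what the Cauchy criterion for uniform convergence requires.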

Why Power Series are so Nice


The M-test has an important consequence for power series:

Corollary 5.13 (Uniform Convergence of Power Series, I). If

\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots \]

is a power series with finite radius of convergence R > 0 and if 0 < α < R, then this
series converges uniformly over [−α, α]. If the radius of convergence is infinite, then
the power series converges uniformly over [−α, α] for any finite positive value of α.

Proof: From the definition of the radius of convergence, R, we know that if R is finite,
then

\[ \limsup_{k\to\infty} \sqrt[k]{\left| a_k R^k \right|} = 1. \tag{5.56} \]

For any positive error $\epsilon$ we can find an N such that k ≥ N implies that

\[ \sqrt[k]{\left| a_k R^k \right|} < 1 + \epsilon. \]

If $0 < |x| < \alpha < R$ and we choose $\epsilon = (\alpha - |x|)/|x|$, then

\[ \sqrt[k]{\left| a_k x^k \right|} = \frac{|x|}{R} \sqrt[k]{\left| a_k R^k \right|} < \frac{|x|}{R} (1 + \epsilon) = \frac{\alpha}{R} < 1, \tag{5.57} \]

for k ≥ N. We can apply the Weierstrass M-test to this series, using

\[ M_k = \left( \frac{\alpha}{R} \right)^k. \]

We have proven that the convergence is uniform on the open interval (−α, α). Exercise 5.4.5
asks you to finish this part of the proof by explaining why the convergence must
be uniform on the closed interval [−α, α].
If R is infinite, then

\[ \limsup_{k\to\infty} \sqrt[k]{|a_k|} = 0. \tag{5.58} \]

We can find an N such that k ≥ N implies that $\sqrt[k]{|a_k|} < 1/(2\alpha)$. If $0 \le |x| < \alpha$, then

\[ \sqrt[k]{\left| a_k x^k \right|} < \frac{|x|}{2\alpha} < \frac{1}{2}, \]

for k ≥ N. We can apply the Weierstrass M-test to this series, using $M_k = 1/2^k$.
Q.E.D.

We note that this corollary does not permit us to take uniform convergence all the way
out to the end of the radius of convergence. We have to stop at some α < R. This should
not be too surprising as we do not always have convergence when x = ±R, much less
uniform convergence on [−R, R].
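
The role of the restriction α < R can be seen in a short numerical sketch. It assumes the geometric series $1 + x + x^2 + \cdots = 1/(1 - x)$, with R = 1, as an illustrative example that is not taken from the text; the function name and grid size are also assumptions. The error of the partial sum through $x^n$ is $x^{n+1}/(1 - x)$, so its largest value on [−α, α] is $\alpha^{n+1}/(1 - \alpha)$.

```python
def sup_error_geometric(n, alpha, grid=2000):
    """Grid estimate of the sup over [-alpha, alpha] of |1/(1 - x) - (1 + x + ... + x^n)|."""
    worst = 0.0
    for i in range(grid + 1):
        x = -alpha + 2 * alpha * i / grid
        partial = sum(x ** k for k in range(n + 1))
        worst = max(worst, abs(1 / (1 - x) - partial))
    return worst


if __name__ == "__main__":
    for alpha in (0.5, 0.9, 0.99):
        print(alpha, [sup_error_geometric(n, alpha) for n in (5, 20, 80)])
```

For each fixed α < 1 the largest error shrinks as n grows, but for a fixed number of terms it blows up as α approaches 1, so no single N can serve the whole interval (−1, 1).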
We also note that the radius of convergence of any power series is the same as the radius
of convergence of the series of derivatives. This is because

\[ \lim_{k\to\infty} \sqrt[k]{k} = 1, \]

and so

\[ \limsup_{k\to\infty} \sqrt[k]{|a_k|} = \limsup_{k\to\infty} \sqrt[k]{|a_k|\,k}. \tag{5.59} \]

We can now see why power series are so very nice and never gave any indication
that there might be problems with continuity or how to differentiate or integrate infi-
nite series. At each point inside the interval of convergence, we are inside an interval
in which the series and the series of derivatives both converge uniformly. Power series
will always be continuous functions; differentiation and integration can always be accom-
plished by differentiating or integrating each term in the series. Power series are always
“nice.”
FIGURE 5.10. Partial sums of the graphs of the expansions of $1/\sqrt{1 + x}$ (thick) to 3 (long dash),
6 (dots), 9 (short-long), and 12 (short-short-short-long) terms.

An Example
Let us return to Newton’s binomial series and in particular look at the expansion of
$1/\sqrt{1 + x}$:

\[ (1 + x)^{-1/2} = 1 + (-1/2)x + \frac{(-1/2)(-3/2)}{2!} x^2 + \frac{(-1/2)(-3/2)(-5/2)}{3!} x^3 + \cdots. \tag{5.60} \]

As we have seen, this has radius of convergence R = 1. It converges at x = 1 but not at
x = −1. Figure 5.10 shows the graphs of the partial sums of the first 3, 6, 9, and 12 terms
of this series. We see that the graphs are spreading further apart near x = −1. Even though
our series converges at every point in (−1, 0), it appears that there is no hope for uniform
convergence over this interval. On the other hand, we see that the graphs do seem to be
coming closer near x = 1 (see Figures 5.11 and 5.12). It looks as if it should be possible,
given an envelope around $1/\sqrt{1 + x}$ over the interval [0, 1], to find an N so that all of
the partial sums with at least N terms have graphs that lie entirely inside the envelope.
In fact, the behavior that we see here is typical of any power series. If the series converges
at the end of the radius of convergence, then it converges uniformly up to and including
that point. If it does not converge at the end of the radius of convergence, then we cannot
maintain uniform convergence over the entire open interval. In the next two subsections,
we shall prove these assertions.
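
The behavior described here can be explored numerically. The Python sketch below builds the coefficients of the binomial series (5.60) from the recurrence $c_k = c_{k-1}\,(-1/2 - (k - 1))/k$ and estimates, on a grid, the largest gap between a partial sum and $1/\sqrt{1 + x}$. The function names, grid size, and sample intervals are assumptions introduced for illustration.

```python
import math


def binomial_coefficients(n_terms, exponent=-0.5):
    """Coefficients c_0, ..., c_{n_terms - 1} of the binomial series for (1 + x)**exponent."""
    coeffs = [1.0]
    for k in range(1, n_terms):
        coeffs.append(coeffs[-1] * (exponent - (k - 1)) / k)
    return coeffs


def evaluate(coeffs, x):
    """Evaluate c_0 + c_1 x + c_2 x^2 + ... by Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total


def sup_error(n_terms, left, right, grid=400):
    """Grid estimate of the largest gap between the partial sum and 1/sqrt(1 + x) on [left, right]."""
    coeffs = binomial_coefficients(n_terms)
    xs = [left + (right - left) * i / grid for i in range(grid + 1)]
    return max(abs(evaluate(coeffs, x) - 1 / math.sqrt(1 + x)) for x in xs)


if __name__ == "__main__":
    for n in (12, 50, 200):
        print(n, sup_error(n, 0.0, 1.0), sup_error(n, -0.99, 0.0))
```

On [0, 1] the largest gap shrinks as terms are added, while close to x = −1 it remains much larger for the same number of terms. This is only numerical evidence; the proofs follow in the next two subsections.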
To simplify our arguments, we are only going to consider what happens at the right-hand
endpoint, x = R, where R is the radius of convergence. If $a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots$
is a power series with radius of convergence R and we want to look at its behavior at
x = −R, then this is the same as the behavior at x = R of

\[ a_0 - a_1 x + a_2 x^2 - a_3 x^3 + \cdots. \]

FIGURE 5.11. Close-up near x = 1 of graphs of partial sums from Figure 5.10.

FIGURE 5.12. Close-up near x = 1 of graphs of $1/\sqrt{1 + x}$ (thick) and of partial sums with 20 (long
dash), 45 (dots), 70 (short-long), and 95 (short-short-short-long) terms.

Non-uniform Convergence
Given a power series a0 + a1 x + a2 x 2 + a3 x 3 + · · · with a finite radius of convergence
R > 0, we assume that the series does not converge at x = R. We want to show that even
though we have convergence at every point in (0, R), we cannot have uniform convergence
over this interval. It is easier to look at this problem from the other direction: if we do have
uniform convergence over (0, R), then the series must converge at x = R. This result is
not unique to power series. It holds for any series of continuous functions.

Theorem 5.14 (Continuity & Uniform Conv. =⇒ Conv. at Endpoints). Let
$F = f_1 + f_2 + f_3 + \cdots$ be a series that is uniformly convergent over (a, b). If each of the
summands $f_k$ is continuous at every point in [a, b], then this series converges at a and
at b and is uniformly convergent over [a, b].

Proof: The difficult part is proving that the series converges at x = a and x = b. Once we
have shown convergence, uniform convergence follows (see exercise 5.4.5).
We need to prove that $f_1(b) + f_2(b) + f_3(b) + \cdots$ converges. Since we do not have any
handle on F(b) (we do not yet know that F(x) is continuous on [a, b]), we shall use the
Cauchy criterion for convergence. We must show how to find a response N to any positive
error bound $\epsilon$ so that

\[ N \le m < n \quad \text{implies that} \quad \left| \sum_{k=m+1}^{n} f_k(b) \right| < \epsilon. \]

We know that if a < x < b, then we can find an N that forces

\[ \left| \sum_{k=m+1}^{n} f_k(x) \right| \]

to be as small as we want, regardless of which x we have chosen. We also know that
$f_{m+1}(x) + f_{m+2}(x) + \cdots + f_n(x)$ is continuous over [a, b]. We can make

\[ \left| \sum_{k=m+1}^{n} f_k(b) - \sum_{k=m+1}^{n} f_k(x) \right| \]

as small as we want by choosing an appropriate x. Note that x will depend on our choice of
m and n. We have to be careful. We cannot choose x until after we have specified m and n.
The key inequality is

\[ \left| \sum_{k=m+1}^{n} f_k(b) \right| \le \left| \sum_{k=m+1}^{n} f_k(b) - \sum_{k=m+1}^{n} f_k(x) \right| + \left| \sum_{k=m+1}^{n} f_k(x) \right|. \tag{5.61} \]

We first choose an N so that

\[ N \le m < n \quad \text{implies that} \quad \left| \sum_{k=m+1}^{n} f_k(x) \right| < \epsilon/2, \]

regardless of the choice of x ∈ (a, b). We now look at such a pair, m, n, and choose an x
close enough to b that

\[ \left| \sum_{k=m+1}^{n} f_k(b) - \sum_{k=m+1}^{n} f_k(x) \right| < \epsilon/2. \]

Combining these two bounds with our inequality (5.61) gives us the desired result. The
same argument works at x = a.
Q.E.D.

Uniform Convergence
We want to prove that when a power series converges at x = R, then it converges uniformly
over [0, R]. We need something stronger than the Weierstrass M-test. If the power series
converges absolutely at x = R, then the M-test can be used. But there are many examples
like our expansion of $1/\sqrt{1 + x}$ for which the convergence at x = 1 is not absolute. We
need to prove that even in this case, we still have uniform convergence over [0, R].
The key to proving this is the work that Abel published in 1826 on the binomial expansion.
In particular, we shall use Theorem 4.16 (Abel’s lemma) stated in section 4.5: if

\[ b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0 \]

and if

\[ \left| \sum_{k=1}^{n} c_k \right| < M \quad \text{for all } n, \]

then

\[ \left| \sum_{k=1}^{n} c_k b_k \right| \le M b_1. \tag{5.62} \]

We shall see exactly how it is used after stating the theorem.

Theorem 5.15 (Uniform Convergence of Power Series, II). Let $a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots$
be a power series with a finite radius of convergence R > 0. If this series
converges at x = R, then it converges uniformly over [0, R].

Proof: Again we use the Cauchy criterion. Given an error bound $\epsilon$, we must find a response
N so that

\[ N \le m < n \quad \text{implies that} \quad \left| \sum_{k=m+1}^{n} a_k x^k \right| < \epsilon, \]

regardless of the choice of x, 0 ≤ x ≤ R. We use equation (5.62) with

\[ b_k = \left( \frac{x}{R} \right)^{m+k} \]

and

\[ c_k = a_{m+k} R^{m+k}. \]

Since 0 ≤ x ≤ R, we have

\[ \left( \frac{x}{R} \right)^{m+1} \ge \left( \frac{x}{R} \right)^{m+2} \ge \left( \frac{x}{R} \right)^{m+3} \ge \left( \frac{x}{R} \right)^{m+4} \ge \cdots \ge 0. \]

We fix an integer m and let $M_m$ be the least upper bound of the absolute values of the
partial sums that begin with the (m + 1)st term and are evaluated at R. In other words, for
all n > m:

\[ \left| a_{m+1} R^{m+1} + a_{m+2} R^{m+2} + \cdots + a_n R^n \right| \le M_m. \]

By the Cauchy criterion, the convergence of $a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots$ at x = R is
equivalent to the statement that we can make $M_m$ as small as we want by taking m
sufficiently large.
Equation (5.62) implies that

\[ \left| \sum_{k=m+1}^{n} a_k x^k \right| = \left| \sum_{k=m+1}^{n} a_k R^k \cdot \left( \frac{x}{R} \right)^k \right| \le M_m \cdot \left( \frac{x}{R} \right)^{m+1} \le M_m. \]

Our response is any N for which m ≥ N guarantees that $M_m$ is less than $\epsilon$.
Q.E.D.

Fourier Series
We know that we do not always have uniform convergence for Fourier series. We know of
Fourier series that converge but are not continuous. Nevertheless, we can hope to find some
Fourier series that do converge uniformly. To find conditions under which a Fourier series
converges uniformly, we return to Dirichlet’s test (Corollary 4.17) in section 4.4 and replace
$a_1, a_2, a_3, \ldots$ by functions of x: $a_1(x), a_2(x), a_3(x), \ldots$. The condition that $S_n = \sum_{k=1}^{n} a_k$
is bounded is replaced by the requirement that $S_n(x) = \sum_{k=1}^{n} a_k(x)$ is uniformly bounded.
That is to say, for all x in some interval I, there is a bound M independent of x for
which

\[ \bigl| S_n(x) \bigr| = \left| \sum_{k=1}^{n} a_k(x) \right| \le M. \]

Theorem 5.16 (Dirichlet’s Test for Uniform Convergence). We consider a series of
the form

\[ a_1(x)\,b_1(x) + a_2(x)\,b_2(x) + a_3(x)\,b_3(x) + \cdots \]

where the $a_k$ and $b_k$ are functions defined for all x in the interval I, where for each
x ∈ I the values of $b_k(x)$ are positive, decreasing, and approaching 0,

\[ b_1(x) \ge b_2(x) \ge b_3(x) \ge \cdots \ge 0, \qquad \lim_{k\to\infty} b_k(x) = 0, \]

and for which there exists a sequence $(B_k)_{k=1}^{\infty}$ approaching 0 such that $B_k \ge b_k(x)$ for
all x ∈ I. Let $S_n(x)$ be the nth partial sum of the $a_k(x)$’s:

\[ S_n(x) = \sum_{k=1}^{n} a_k(x). \]

If these partial sums are uniformly bounded over I, that is to say, if there is some
number M for which $|S_n(x)| \le M$ for all values of n and all x ∈ I, then the series
converges uniformly over I.

The proof follows that of Dirichlet’s test (Corollary 4.17) and is left as exercise 5.4.9.

Example
As an example, consider the Fourier sine series that we met in section 4.1,

\[ F(x) = \sum_{k=2}^{\infty} \frac{\sin kx}{\ln k}. \tag{5.63} \]

We looked at this at x = 0.01. Now we want to consider its behavior at an arbitrary value
of x.
Our $b_k$’s are

\[ b_k = \frac{1}{\ln k} \qquad (k \ge 2). \]

These are positive, decreasing, and approaching zero. Our $a_k$’s are

\[ a_k(x) = \sin kx. \]

In exercise 4.4.7 on page 168, you were asked to prove that

\[ \sin x + \sin 2x + \sin 3x + \cdots + \sin nx = \frac{\sin x}{2} \cdot \frac{1 - \cos nx}{1 - \cos x} + \frac{\sin nx}{2} \tag{5.64} \]

when x is not a multiple of π. It is zero when x is a multiple of π. Since $|1 - \cos nx| \le 2$
and $|\sin nx| \le 1$, we have a bound of

\[ \left| \sum_{k=1}^{n} \sin kx \right| \le \left| \frac{\sin x}{1 - \cos x} \right| + \frac{1}{2} \]

when x is not a multiple of π. This bound is independent of n. Together with Dirichlet’s test
(Corollary 4.17), it implies that our series converges for any value of x. We cannot find a bound that is
independent of x. As x approaches any even multiple of π, our bound grows without limit. We
do have uniform convergence on any interval that stays a positive distance away from any
even multiple of π . It appears—although we have not actually proven it—that this series
is not uniformly convergent on any interval that contains or comes arbitrarily close to an
even multiple of π. For example, we would not expect it to be uniformly convergent on
(0, π/2). Figure 5.13 shows the graphs of some of the partial sums of this series. It appears
that this Fourier series will be discontinuous at x = 0. An actual proof that this series is not
uniformly convergent on (0, π/2) is left as exercises 5.4.10 and 5.4.11.
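
The identity (5.64) and the resulting bound can be checked numerically. The Python sketch below compares the direct sum with the closed form and evaluates the bound $|\sin x/(1 - \cos x)| + 1/2$; the function names and the sample values of x and n are assumptions chosen for illustration.

```python
import math


def sine_sum(n, x):
    """Direct evaluation of sin x + sin 2x + ... + sin nx."""
    return sum(math.sin(k * x) for k in range(1, n + 1))


def closed_form(n, x):
    """Right-hand side of (5.64); x must not be an even multiple of pi."""
    return (math.sin(x) / 2) * (1 - math.cos(n * x)) / (1 - math.cos(x)) + math.sin(n * x) / 2


def bound(x):
    """Bound on |sin x + ... + sin nx| that is independent of n but not of x."""
    return abs(math.sin(x) / (1 - math.cos(x))) + 0.5


if __name__ == "__main__":
    for x in (3.0, 1.0, 0.1, 0.01, 0.001):
        print(x, sine_sum(200, x), closed_form(200, x), bound(x))
```

As x moves toward 0 the bound grows roughly like 2/x, which is why the argument gives uniform convergence only on intervals that stay a positive distance away from the even multiples of π.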

Exercises

5.4.1. Determine whether or not each of the following series is uniformly convergent on
the prescribed set S and justify your conclusion. If it is not uniformly convergent, prove
that it is still convergent.

a. $\sum_{n=1}^{\infty} n^2 x^2 e^{-n^2 |x|}$, $S = \mathbb{R}$
b. $\sum_{n=1}^{\infty} \frac{n^2}{\sqrt{n!}} \left( x^n + x^{-n} \right)$, S = [1/2, 2]

  
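c. $\sum_{n=1}^{\infty} 2^n \sin\left( \frac{1}{3^n x} \right)$, S = (0, ∞)

d. $\sum_{n=2}^{\infty} \ln\left( 1 + \frac{x^2}{n \ln^2 n} \right)$, S = (−a, a), where a is a positive constant.

e. $\sum_{n=1}^{\infty} \frac{\ln(1 + nx)}{n x^n}$, S = [2, ∞)

f. $\sum_{n=1}^{\infty} \left( \frac{\pi}{2} - \arctan\bigl( n^2 (1 + x^2) \bigr) \right)$, $S = \mathbb{R}$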

5.4.2. Let $a_0 + a_1 x + a_2 x^2 + \cdots$ be a power series with finite radius of convergence R.
Prove that if the series of derivatives, $a_1 + 2a_2 x + 3a_3 x^2 + \cdots$, converges at x = R, then
so does the original series.

5.4.3. Give an example of a power series with radius of convergence R that converges at
x = R but for which the series of derivatives does not converge at x = R.

5.4.4. Let $F(x) = f_1(x) + f_2(x) + f_3(x) + \cdots$ be a series that is uniformly convergent
over (a, b) and for which each $f_k(x)$ is continuous on [a, b]. Prove that

\[ F(b) = \lim_{x \to b^-} F(x). \]
FIGURE 5.13. Partial sums of graphs of $\sum_{k=2}^{\infty} (\sin kx)/(\ln k)$ to 50, 100, and 150 terms.

5.4.5. Explain why it is that if a series converges on the closed interval [a, b] and converges
uniformly on the open interval (a, b), then it must converge uniformly on the closed interval
[a, b].

5.4.6. Give an example of a series that is uniformly convergent over (a, b) and that is
not uniformly convergent over [a, b]. Theorem 5.14 implies that such a series cannot be a
power series.

5.4.7. Give an example of a series that converges at every point of (a, b), each summand is
continuous at every point in [a, b], but the series does not converge at every point of [a, b].

5.4.8. If a power series has radius of convergence R and converges at x = ±R, then
we have shown that it converges uniformly over [−R, 0] and over [0, R]. Prove that it
converges uniformly over [−R, R].

5.4.9. Prove Dirichlet’s test for uniform convergence, Theorem 5.16.

5.4.10. Prove that

\[ \sum_{k=2}^{\infty} \frac{\sin kx}{\ln k} \]

is discontinuous at x = 0 by proving that

\[ \lim_{x \to 0^+} \sum_{k=2}^{\infty} \frac{\sin kx}{\ln k} = \infty. \]

5.4.11. Using the result from the previous exercise, prove that

\[ \sum_{k=2}^{\infty} \frac{\sin kx}{\ln k} \]

is not uniformly convergent over (0, π/2).

5.4.12. Use the Dirichlet test for uniform convergence to show that each of the following
series converges uniformly on the indicated set S:

a. $\sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n}$, S = [0, 1]

b. $\sum_{n=1}^{\infty} \frac{\sin(nx)}{n}$, S = [δ, 2π − δ], for fixed δ, 0 < δ < π

c. $\sum_{n=1}^{\infty} \frac{\sin(n^2 x) \sin(nx)}{n + x^2}$, $S = \mathbb{R}$

d. $\sum_{n=1}^{\infty} \frac{\sin(nx) \arctan(nx)}{n}$, S = [δ, 2π − δ], for fixed δ, 0 < δ < π

e. $\sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n^x}$, S = [a, ∞), for some constant a > 0

f. $\sum_{n=1}^{\infty} (-1)^{n+1} \frac{e^{-nx}}{\sqrt{n + x^2}}$, S = [0, ∞)

5.4.13. Let $F(x) = \sum_{k=1}^{\infty} f_k(x)$ be a series that is uniformly convergent over any closed
interval [c, d] ⊆ (a, b) where every $f_k(x)$ is continuous on [a, b]. Furthermore, assume
that $\sum_{k=1}^{\infty} f_k(b)$ converges. Does this imply that

\[ \lim_{x \to b^-} F(x) = F(b)? \]

6
Return to Fourier Series

In the spring of 1808, Siméon Denis Poisson wrote up the committee’s report on Fourier’s
Theory of the Propagation of Heat in Solid Bodies. The conclusion was that it contained
nothing that was new or interesting. Behind this opinion lay Lagrange’s opposition to the
admission of Fourier’s trigonometric series and his conviction that they must not converge.
In the following years, Fourier attempted to meet Lagrange’s objections and to convince him
that his series did in fact converge. In the meantime, he conducted experiments, comparing
the predictions of his mathematical models with observed phenomena.
The problem of modeling the flow of heat was of concern to many scientists of the time.
In 1811, the Institut de France announced a competition for the best explanation of heat
diffusion. Fourier reworked his earlier manuscript and submitted it. Despite continuing ob-
jections from Lagrange, he was awarded the prize. Lagrange could not deny him the award,
but he could postpone publication. Even after Lagrange died in 1813, Fourier’s manuscript
continued to languish in the Institut. Fourier began to prepare a book to disseminate his
ideas.
After the second fall of Napoléon, Fourier came to Paris as the director of the Bureau of
Statistics for the department of the Seine. He was back at the center of intellectual life. His
book, Théorie analytique de la chaleur (Analytic theory of heat), appeared in 1822. That
same year he was elected perpetual secretary of the Académie des Sciences, the highest of
scientific honors. He used that position in succeeding years to encourage and promote the
careers of emerging mathematicians. Gustav Dirichlet, Sophie Germain, Joseph Liouville,
Claude Navier, and Charles Sturm were among those who received his assistance and
would remember him fondly.
The problem of the convergence of Fourier series was given its first published treatment
in 1820 in a paper by Poisson. His work suffers from the defect that in the course of
proving the convergence of Fourier series he needed—in a subtle way—to assume that
they converged. Fourier tried to supply a proof in his book. He did see the fundamental
difficulty and so was able to show the way to an eventual proof, but he himself did not
succeed. In 1826, Cauchy took up this problem and published what he considered to be a
solution. There were flaws in his work.
In January of 1829, at the age of 23 and from his new professorship in Berlin, Gustav
Lejeune Dirichlet submitted the paper “Sur la convergence des séries trigonométriques
qui servent à représenter une fonction arbitraire entre des limites données.” It begins
with a critique of Cauchy’s paper, pointing out Cauchy’s mistaken assumption that if vn
approaches wn as n approaches infinity, and if w1 + w2 + w3 + · · · converges, then so
must v1 + v2 + v3 + · · · . This was a critical assumption on Cauchy’s part. Without it, his
argument collapses. Dirichlet pointed out that if we define

\[ w_n = \frac{(-1)^n}{\sqrt{n}}, \qquad v_n = \frac{(-1)^n}{\sqrt{n}} + \frac{1}{n}, \]

then $v_n$ approaches $w_n$ and the series $\sum_{n=1}^{\infty} w_n$ converges, but the series $\sum_{n=1}^{\infty} v_n$ diverges.
After pointing out Cauchy’s error, Dirichlet goes on to give the first substantially correct
proof for the validity of Fourier series. It is this proof that we shall investigate.
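
Dirichlet’s counterexample is easy to explore numerically. The Python sketch below accumulates the partial sums of both series; the function name and the sample sizes are assumptions chosen for illustration. The two partial sums differ by exactly the harmonic sum $1 + 1/2 + \cdots + 1/n$, which grows without bound.

```python
import math


def partial_sums(n):
    """Return (W_n, V_n), the n-th partial sums of w_k = (-1)^k / sqrt(k) and v_k = w_k + 1/k."""
    w_total = 0.0
    v_total = 0.0
    for k in range(1, n + 1):
        w = (-1) ** k / math.sqrt(k)
        w_total += w
        v_total += w + 1.0 / k
    return w_total, v_total


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, partial_sums(n))
```

The first component settles down slowly, as expected of an alternating series with terms of size $1/\sqrt{n}$, while the second keeps growing like ln n.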
Dirichlet’s great interest in mathematics was always number theory. This is where he did
most of his work. In 1829, Fourier’s health was fading. He would die the following spring.
For Dirichlet, this paper was more than an answer to an abstract question in mathematics;
it was a tribute to a mentor and friend, a validation of the new and disturbing series that
Joseph Fourier had introduced to the scientific community in 1807.

6.1 Dirichlet’s Theorem


Until now, we have not done full justice to Fourier’s series of trigonometric functions.
In Chapter 1, we considered the expansion of an even function, a function for which
f (x) = f (−x). For such a function, we look for a cosine expansion

f (x) = a0 + a1 cos x + a2 cos 2x + a3 cos 3x + · · · ,

where f is defined over the interval (−π, π ). There is an analogous sine series for any odd
function, g(x) = −g(−x):

g(x) = b1 sin x + b2 sin 2x + b3 sin 3x + · · · .

An arbitrary function can be expressed uniquely as the sum of an even function and
an odd function (see exercises 6.1.4 and 6.1.5 at the end of this section). For an arbitrary
function defined over the interval (−π, π), we try to represent it as the sum of a cosine
series and a sine series:

\[ F(x) = a_0 + \sum_{k=1}^{\infty} \bigl[ a_k \cos(kx) + b_k \sin(kx) \bigr]. \tag{6.1} \]

Fourier had considered such general series. The heuristic argument for finding the co-
efficients in an arbitrary Fourier series rests on the observation that for integer values
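of k and m,

\[ \int_{-\pi}^{\pi} \cos(kx) \cos(mx)\,dx = \begin{cases} 0 & \text{if } k \ne m, \\ 2\pi & \text{if } k = m = 0, \\ \pi & \text{if } k = m \ne 0, \end{cases} \tag{6.2} \]

\[ \int_{-\pi}^{\pi} \sin(kx) \sin(mx)\,dx = \begin{cases} 0 & \text{if } k \ne m, \\ \pi & \text{if } k = m \ne 0, \end{cases} \tag{6.3} \]

\[ \int_{-\pi}^{\pi} \sin(kx) \cos(mx)\,dx = 0. \tag{6.4} \]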

If we assume that our function F has such a Fourier series expansion and that it is legal to
interchange the summation and the integral, then the coefficients can be determined from
the following formulæ:

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} F(x)\,dx, \tag{6.5} \]

\[ a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x) \cos(kx)\,dx, \qquad (k \ge 1), \tag{6.6} \]

\[ b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x) \sin(kx)\,dx. \tag{6.7} \]

Fourier contended that if we define the coefficients $a_k$ and $b_k$ by equations (6.5–6.7), then
F(x) will equal the series in equation (6.1) when x lies between −π and π.
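
Formulas (6.5–6.7) can be tried out numerically. The Python sketch below approximates the integrals with a midpoint rule and applies them to the sample function F(x) = x, whose sine coefficients are known to be $b_k = 2(-1)^{k+1}/k$. The midpoint rule, the step count, the sample function, and the function names are all assumptions made for illustration; the text has not yet fixed a definition of the integral.

```python
import math


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def fourier_coefficients(F, k_max):
    """Approximate a_0, [a_1..a_kmax], [b_1..b_kmax] from equations (6.5)-(6.7)."""
    a0 = integrate(F, -math.pi, math.pi) / (2 * math.pi)
    a, b = [], []
    for k in range(1, k_max + 1):
        a.append(integrate(lambda x: F(x) * math.cos(k * x), -math.pi, math.pi) / math.pi)
        b.append(integrate(lambda x: F(x) * math.sin(k * x), -math.pi, math.pi) / math.pi)
    return a0, a, b


if __name__ == "__main__":
    a0, a, b = fourier_coefficients(lambda x: x, 4)
    print(a0, a, b)
```

Computing the coefficients is the easy part. Whether the resulting series converges, and whether it converges to F(x), is the question this chapter addresses.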

The Nature of the Problem


The first problem Fourier encountered was that of defining what he meant by

π
F (x) cos(kx) dx.
−π

In 1807, integration was defined and understood as the inverse process of differentiation,
what some of today’s textbooks call “antidifferentiation.” The connection between integra-
tion and problems of areas and volumes was well understood, but that did not change the
fact that one defined

\[ \int F(x)\cos(kx)\,dx \]

as a function whose derivative was F (x) cos(kx).


This was a conceptual problem for many of those encountering Fourier series for the
first time. It is not always true that F (x) cos(kx) can be expressed as the derivative of a
known function. Fourier was responsible for changing the definition of an integral from an
antiderivative to an area. It was his idea to put limits on the integration sign and to talk of a
definite integral that was to be defined in terms of the area between F (x) cos(kx) and the
x axis.

Dirichlet was the first to realize that not every function could be integrated. He mentions
the function that takes on one value at every rational number and a different value at every
irrational number, for example

\[ f(x) = \begin{cases} 1, & \text{if } x \text{ is rational}, \\ 0, & \text{if } x \text{ is irrational}. \end{cases} \]

He thus highlighted the first assumption that we need to make about our function, that
F (x) is integrable over [−π, π ]. Cauchy in the 1820s, Riemann in the 1850s, and Lebesgue
in the 1900s were each to expand and clarify the meaning of integration. The Cauchy and
Riemann integrals will be explained in the next two sections. A brief introduction to the
Lebesgue integral can be found in the Epilogue, Chapter 7.
Poisson, Fourier, and Cauchy had concentrated their attention on proving that

\[ a_0 + \sum_{k=1}^{\infty} [a_k \cos(kx) + b_k \sin(kx)] \]

always converges when ak and bk are defined by equations (6.5–6.7). When Dirichlet
tackled the problem of Fourier series in 1829, he saw that the difficulties were greater than
anyone else had imagined. It was not just a question of convergence. In many cases the
convergence would not be absolute. This means that a rearrangement of the summands
could lead to a different value. Even if the series was known to converge, there was a
legitimate question of whether it converged to F (x).
Specifically, the problem is as follows. We define coefficients ak and bk according to
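equations (6.5–6.7). We then form a partial sum:

\[
\begin{aligned}
F_n(x) &= a_0 + \sum_{k=1}^{n} [a_k \cos(kx) + b_k \sin(kx)] \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} F(t)\,dt
 + \sum_{k=1}^{n} \left[ \left( \frac{1}{\pi} \int_{-\pi}^{\pi} F(t)\cos(kt)\,dt \right) \cos(kx)
 + \left( \frac{1}{\pi} \int_{-\pi}^{\pi} F(t)\sin(kt)\,dt \right) \sin(kx) \right].
\end{aligned} \tag{6.8}
\]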

We must prove convergence to F (x):

Given a positive error bound ε and a value for x in (−π, π), we must be able to
find a response N so that

\[ n \ge N \quad \text{implies that} \quad |F(x) - F_n(x)| < \epsilon. \]

The proof that we give here is modeled on Dirichlet’s original approach, but we shall incor-
porate some simplifications that were suggested by Ossian Bonnet in 1849 and Bernhard
Riemann in 1854.

Simplifying Fn (x)
Since Fn (x) involves finite sums, we are allowed to interchange the integrals and the
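summations. We can rewrite the partial sum as

\[
\begin{aligned}
F_n(x) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} F(t)\,dt
 + \frac{1}{\pi} \int_{-\pi}^{\pi} \left[ \sum_{k=1}^{n} \bigl( \cos(kt)\cos(kx) + \sin(kt)\sin(kx) \bigr) \right] F(t)\,dt \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} \left[ \frac{1}{2} + \sum_{k=1}^{n} \cos k(t - x) \right] F(t)\,dt.
\end{aligned} \tag{6.9}
\]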

In the last line, we used the trigonometric identity

cos(kt) cos(kx) + sin(kt) sin(kx) = cos k(t − x). (6.10)

We now use another trigonometric identity (see exercise 6.1.12),

\[ \frac{1}{2} + \cos u + \cos 2u + \cdots + \cos nu = \frac{\sin[(2n+1)u/2]}{2\sin[u/2]}, \tag{6.11} \]

to simplify F_n(x) further:

\[ F_n(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{\sin[(2n+1)(t-x)/2]}{2\sin[(t-x)/2]}\, F(t)\,dt. \tag{6.12} \]
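
Identity (6.11), which carries the step from (6.9) to (6.12), is easy to spot-check numerically before proving it. This is only an illustrative sketch; the function names and the sample values of n and u are arbitrary choices, with u kept away from multiples of 2π so that the denominator is nonzero.

```python
import math

def cosine_sum(n, u):
    """Left side of (6.11): 1/2 + cos u + cos 2u + ... + cos nu."""
    return 0.5 + sum(math.cos(k * u) for k in range(1, n + 1))

def closed_form(n, u):
    """Right side of (6.11): sin((2n+1)u/2) / (2 sin(u/2))."""
    return math.sin((2 * n + 1) * u / 2) / (2 * math.sin(u / 2))

# Largest disagreement over a few sample points (should be rounding error only).
max_gap = max(abs(cosine_sum(n, u) - closed_form(n, u))
              for n in (1, 4, 12)
              for u in (0.3, 1.0, 2.5, -1.7))
print(max_gap)
```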

We want to get our variable x out of the sine functions and back into the argument of F ,
and so we want to make a change of variable inside the integral. In order to simplify later
calculations, it will be helpful if we first shift the entire interval of integration by distance
x. In other words, we want to rewrite our last equation as

\[ F_n(x) = \frac{1}{\pi} \int_{-\pi+x}^{\pi+x} \frac{\sin[(2n+1)(t-x)/2]}{2\sin[(t-x)/2]}\, F(t)\,dt. \tag{6.13} \]

This is legal as long as the integrand is periodic with period 2π . If it is periodic, then it does
not matter which interval of length 2π we choose for our integration; the integral from −π
to π will be the same as the integral from −π + x to π + x. There is no problem with the
sine functions. Their ratio has period 2π . The only possible problem lies with the function
F . But we have only specified the values of F for t between −π and π . We are free to
define F as we wish when t is outside this interval. In particular, we can choose to define
F to be periodic with period 2π :

F (t + 2π ) = F (t). (6.14)

This then is the second assumption about the function for which we seek a Fourier series
representation: that it is periodic with period 2π . It is important to remember that this is
not really a restriction since we began by only specifying the values of the function between
−π and π .

Splitting the Integral


We now split our integral into two pieces and use a different substitution on each. In the
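first piece, we replace t with x − 2u. In the second piece, t becomes x + 2u.

\[
\begin{aligned}
F_n(x) &= \frac{1}{\pi} \int_{-\pi+x}^{x} \frac{\sin[(2n+1)(t-x)/2]}{2\sin[(t-x)/2]}\, F(t)\,dt
 + \frac{1}{\pi} \int_{x}^{\pi+x} \frac{\sin[(2n+1)(t-x)/2]}{2\sin[(t-x)/2]}\, F(t)\,dt \\
&= \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x - 2u)\,du
 + \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du.
\end{aligned} \tag{6.15}
\]
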
This is essentially as far as Fourier went in his analysis, although he did give arguments
why the sum of these integrals should approach F (x). Before continuing with Dirichlet’s
paper, it is important to pause and look at what we have found. For convenience, we shall
define F_n^−(x) and F_n^+(x) to be these two integrals,

\[ F_n^-(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x - 2u)\,du, \tag{6.16} \]

\[ F_n^+(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du. \tag{6.17} \]

We shall concentrate on Fn+ . Similar results apply to Fn− .

Qualitative Analysis of Fn+


The first thing that should strike you is that Fn+ (x) depends not just on F (x) but on the
value of this function over the entire interval from x to x + π . In fact, the value of Fn+ (x)
is actually independent of the value of F (x). If we leave this function the same at every
point except x and change its value at x, then we do not change the value of the integral.
This is very discouraging news if we want to prove that

\[ F(x) = \lim_{n\to\infty} \bigl[ F_n^-(x) + F_n^+(x) \bigr] \]

since neither of these integrals depends on the value of F at x. It shows that not every
function can have a Fourier series expansion. The value of F at x is going to have to be
determined by its values at points to the left and right of x.
To see what this dependence is, we take a closer look at what we are integrating. The
graphs of y = sin[(2n + 1)u]/ sin u for n = 4, 8, and 12 are given in Figure 6.1. We can
easily show that for each n the curve has a spike of height 2n + 1 at the y axis. As n gets
larger, the spike gets narrower. The graph hits the u axis for the first time at u = π/(2n + 1).
We then get oscillations that damp down to a fairly constant amplitude as we move toward
π/2. As n gets larger, these oscillations become tighter (see Figure 6.2), increasing in
frequency.

FIGURE 6.1. Graphs of y = sin[(2n + 1)u]/ sin u for n = 4 (solid), 8 (dots), and 12 (dashes).

Because of its importance to the analysis of Fourier series, this function,

\[ K_n(u) = \frac{\sin[(2n+1)u]}{\sin u}, \]

has a name. It is called the Dirichlet kernel.
We take the Dirichlet kernel, multiply it by F (x + 2u), and then integrate from u = 0 to
u = π/2. Almost all of the area occurs inside the first spike. The value of the integral will
be dominated by the values of F (x + 2u) for 0 < u < π/(2n + 1). If F is continuous over
this interval and n is large, then F (x + 2u) will stay fairly constant, and this initial part of
the integral will be approximately the value of

\[ F\!\left( x + \frac{\pi}{2n+1} \right) \]

times the area under the spike. As we shall see, the area under the spike is approximately
π/2.
FIGURE 6.2. Graphs of y = sin[(2n + 1)u]/ sin u for n = 25 (solid), 75 (dots), and 250 (dashes) near u = 1.2.

After the spike, we are integrating F(x + 2u) multiplied by a function that has tight
oscillations. If F(x + 2u) stays fairly constant over one complete oscillation, then the area
above the u axis will approximate the area below the u axis for a net contribution that is
close to zero. For a large value of n, we can expect the contribution from

\[ \int_{\pi/(2n+1)}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \]

to be close to zero. While all of this needs to be made much more precise, you should be
willing to believe that

\[
\begin{aligned}
F_n^+(x) &= \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \\
&\approx \frac{1}{\pi} \int_{0}^{\pi/(2n+1)} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \\
&\approx \frac{1}{\pi} \cdot \frac{\pi}{2}\, F\!\left( x + \frac{\pi}{2n+1} \right)
 = \frac{1}{2}\, F\!\left( x + \frac{\pi}{2n+1} \right).
\end{aligned} \tag{6.18}
\]

As n gets larger, this approaches F (x)/2 provided F is continuous at x from the right.
Similarly, Fn− (x) approaches F (x)/2 if and only if F is continuous at x from the left. This
suggests that F must be a continuous function at every value of x.

How to Avoid Continuity


Some of the most interesting and useful functions for which we want to find a Fourier
series expansion are not continuous. One example is the series that we met in Chapter 1
that alternates between +1 and −1. Dirichlet was the first to see what it would mean to
avoid continuity.
As n gets larger, F_n^+(x) approaches the limit from the right of F(x)/2 (see page 92):

\[ \lim_{n\to\infty} F_n^+(x) = \frac{1}{2} \lim_{t\to x^+} F(t). \tag{6.19} \]
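
The limit in (6.19) can be watched numerically. The sketch below is an illustration under stated assumptions rather than part of the argument: it takes F(t) = e^t on (0, π), which is an arbitrary continuous choice, fixes x = 0 so that F(x + 0)/2 = 1/2, and approximates the integral (6.17) with a midpoint rule (`integrate` is a hypothetical helper).

```python
import math

def integrate(f, a, b, steps=40000):
    """Midpoint-rule approximation; the midpoints never hit u = 0."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def F(t):
    # Arbitrary illustrative function, only evaluated for 0 < t < pi here.
    return math.exp(t)

def F_n_plus(x, n):
    """Numerical value of the integral in equation (6.17)."""
    def integrand(u):
        return math.sin((2 * n + 1) * u) / math.sin(u) * F(x + 2 * u)
    return integrate(integrand, 0.0, math.pi / 2) / math.pi

x = 0.0
gaps = [abs(F_n_plus(x, n) - F(x) / 2) for n in (10, 50, 200)]
print(gaps)
```

The gaps shrink as n grows, consistent with F_n^+(0) approaching F(0 + 0)/2.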

Dirichlet invented a suggestive notation for this limit from the right:

\[ F(x + 0) = \lim_{t\to x^+} F(t). \tag{6.20} \]


Just as the “+” in an infinite summation is not really addition, so the “+” in F (x + 0) is
not really addition. He similarly defined

\[ F(x - 0) = \lim_{t\to x^-} F(t). \tag{6.21} \]


We see that the best we can hope to prove is that

\[ \lim_{n\to\infty} \bigl[ F_n^-(x) + F_n^+(x) \bigr] = \frac{F(x - 0) + F(x + 0)}{2}. \tag{6.22} \]

If F (x) is to have a Fourier series expansion, it does not have to be continuous, but it must
be true that at every x ∈ [−π, π] we have

\[ F(x) = \frac{F(x - 0) + F(x + 0)}{2}. \tag{6.23} \]

This is our third assumption. It says that wherever F has a discontinuity, its value must
be the average of the limit from the left and the limit from the right.

FIGURE 6.3. Graph of F(x) = 2x + 1 (−π < x < 0), (x − 2)/3 (0 < x < π).

For example, we can find a Fourier series expansion for the function

\[ F(x) = \begin{cases} 2x + 1, & -\pi < x < 0, \\ (x - 2)/3, & 0 < x < \pi, \end{cases} \]

(see Figure 6.3). At x = 0, the limit from the left is 1, the limit from the right is −2/3. The
Fourier series for this function will take the value

\[ F(0) = \frac{1 - 2/3}{2} = \frac{1}{6}. \]

We also know that F(x) is periodic of period 2π so that F(−π) = F(π). To find the value
at these endpoints, we find the limits from the right and the left:

\[ F(-\pi + 0) = -2\pi + 1, \qquad F(\pi - 0) = (\pi - 2)/3, \]

\[ F(-\pi) = F(\pi) = \frac{-2\pi + 1 + (\pi - 2)/3}{2} = \frac{1 - 5\pi}{6}. \]
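
The two predicted values, 1/6 at x = 0 and (1 − 5π)/6 at x = π, can be compared with partial sums of the Fourier series. The following is a rough sketch only: the coefficients are approximated with a midpoint rule split at the jump (the helper names are hypothetical), and the partial sums converge slowly, so agreement to about two decimal places is all that should be expected at n = 60.

```python
import math

def integrate(f, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def left(x):
    return 2 * x + 1        # branch of F on (-pi, 0)

def right(x):
    return (x - 2) / 3      # branch of F on (0, pi)

def integral_against(weight):
    """Integrate F(x) * weight(x) over (-pi, pi), splitting at the jump at 0."""
    return (integrate(lambda x: left(x) * weight(x), -math.pi, 0.0)
            + integrate(lambda x: right(x) * weight(x), 0.0, math.pi))

def partial_sum(x, n):
    """F_n(x) with coefficients taken from (6.5)-(6.7)."""
    total = integral_against(lambda t: 1.0) / (2 * math.pi)
    for k in range(1, n + 1):
        a_k = integral_against(lambda t: math.cos(k * t)) / math.pi
        b_k = integral_against(lambda t: math.sin(k * t)) / math.pi
        total += a_k * math.cos(k * x) + b_k * math.sin(k * x)
    return total

at_zero = partial_sum(0.0, 60)
at_pi = partial_sum(math.pi, 60)
print(at_zero, 1 / 6)
print(at_pi, (1 - 5 * math.pi) / 6)
```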

Dirichlet’s Theorem
We have seen that proving that a function is equal to its Fourier expansion is equivalent
to proving that F (x − 0)/2 = limn→∞ Fn− (x), and F (x + 0)/2 = limn→∞ Fn+ (x) for all
x ∈ (−π, π ). We have also seen that there are two pieces to the proof that F (x + 0)/2 =
limn→∞ Fn+ (x).
First, we must show that we can force

\[ \frac{1}{\pi} \int_{0}^{a} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \]

to be as close as we want to F(x + 0)/2 by taking an a close to 0 and an n that is
sufficiently large. To do this, we shall have to have an interval of the form (x, x + 2a)
where F is continuous. Even though we are allowing discontinuities, they must be sep-
arated by intervals on which F is continuous. This leads us to our fourth assumption,
that F is piecewise continuous, which means that there are at most a finite number of
values on [−π, π ] at which F is not continuous. As we shall see later, piecewise con-
tinuity implies integrability. Our first assumption has been subsumed under this stronger
assumption.
Second, we must show that we can force

\[ \int_{a}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \]

to be as close as we want to zero by taking an n that is sufficiently large. We need to use the
frequent oscillations of sin[(2n + 1)u] to get our cancellations. This means that we want to
be able to control the oscillations of F (x + 2u). The fifth and last assumption is that F is
bounded and piecewise monotonic (see page 87).
Dirichlet believed that this last assumption was not necessary, but he could not see how
to prove his theorem without it. In fact, this last assumption can be weakened considerably.
We shall content ourselves with the theorem as Dirichlet proved it.

Theorem 6.1 (Dirichlet’s Theorem). Let F be a bounded, piecewise continuous and
piecewise monotonic function on [−π, π]. Furthermore, assume that F is periodic
with period 2π and that

\[ F(x) = \frac{F(x + 0) + F(x - 0)}{2} \]

for every value of x. We define coefficients

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} F(x)\,dx, \]

\[ a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x)\cos(kx)\,dx \quad (k \ge 1), \]

\[ b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x)\sin(kx)\,dx. \]

Then, at every value of x,

\[ F(x) = a_0 + \sum_{k=1}^{\infty} \bigl[ a_k \cos(kx) + b_k \sin(kx) \bigr]. \tag{6.24} \]

In fact, a piecewise monotonic function on a closed and bounded interval must be


piecewise continuous. Since proving this would take us beyond the scope of this book, we
leave both assumptions in the statement of Dirichlet’s theorem.

Riemann’s Lemma
We shall first show that

\[ \int_{a}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du, \qquad 0 < a < \pi/2, \]

can be made arbitrarily close to zero by taking n sufficiently large. To do this, we shall
concentrate on a single interval (a, b), 0 < a < b ≤ π/2, where F (x + 2u)/ sin u is con-
tinuous and bounded. Since our whole interval from a to π/2 is made up of a finite number
of such pieces, it is enough to show that the integral over each such piece can be forced to
be arbitrarily close to zero.
Working with the integral over [a, b], we do not change the value of the integral if we
change the value of our function just at x = a or b. It is convenient when working on this
particular piece to redefine F , if necessary, so that F (x + 2a) = F (x + 2a + 0) (the limit
from the right) and F (x + 2b) = F (x + 2b − 0) (the limit from the left) so that we can
consider F (x + 2u)/ sin u to be continuous over the closed interval [a, b]. Since we are now
working over a closed interval, the restriction that F be bounded is implied by the continuity
(see Theorem 3.6). For simplicity of notation, we write g(u) for F (x + 2u)/ sin u. The only
restriction on g is that it be continuous over [a, b] where we mean continuous from the
right at a, continuous from the left at b, and continuous in the usual sense at all points
between a and b. We also write M in place of 2n + 1. For this lemma, it is not necessary
for this multiplier to be an odd integer.

Lemma 6.2 (Riemann’s Lemma). If g(u) is a continuous function over the interval
[a, b], 0 < a < b ≤ π/2, then

\[ \lim_{M\to\infty} \int_{a}^{b} \sin(Mu)\, g(u)\,du = 0. \tag{6.25} \]

Before we begin this proof, we need another result. We are going to need to know that
given any ε > 0, we can force g(u) and g(v) to be within ε of each other just by keeping
u sufficiently close to v. This sounds like the definition of continuity, but it is not quite the
same. Continuity is defined by being able to find a response to ε at a specific value of u.
We are going to need a response that works for all u in [a, b]. This is more than continuity.
This is uniform continuity.

Definition: uniform continuity


We say that f is uniformly continuous over the interval I if given any positive error
bound ε, we can always reply with a tolerance δ such that for any points a and x in I,
if x is within δ of a, then f(x) is within ε of f(a):

\[ |x - a| < \delta \quad \text{implies that} \quad |f(x) - f(a)| < \epsilon. \]

Uniform continuity does not have to be defined over an interval. It can equally well be
defined over any set. This should be compared with the definition of continuity given on
page 81. With uniform continuity, the value of δ does not depend on a.
The hypothesis of Riemann’s lemma only called for continuity. Fortunately, the next
lemma tells us that since we are working over a closed and bounded interval, this is enough.

Lemma 6.3 (Continuity on [a, b] =⇒ Unif. Continuity). If f is continuous over the


closed and bounded interval [a, b], then it is uniformly continuous over this interval.

Proof: We shall prove the contra-positive: if f is not uniformly continuous, then there is
at least one point in [a, b] at which f is not continuous.
To say that f is not uniformly continuous over [a, b] means that there is some ε > 0 for
which there is no uniform response, no single δ that works at every point x in [a, b]. This,
in turn, means that given any δ > 0, we can always find an x ∈ [a, b] and another point
y ∈ [a, b] such that 0 < |x − y| < δ but |f(x) − f(y)| ≥ ε.
We choose an ε > 0 for which there is no uniform response and, for each n ∈ N, choose
x_n, y_n in [a, b] such that |x_n − y_n| < 1/n and |f(x_n) − f(y_n)| ≥ ε. Let x = lim_{n→∞} x_n
and y = lim_{n→∞} y_n. These exist because both sequences are bounded. They must be equal
because

\[ |x - y| \le |x - x_n| + |x_n - y_n| + |y_n - y|, \]

and each of the pieces on the right can be made as small as desired by taking n sufficiently
large.
Since x = y, the function f cannot be continuous at this common upper limit. We can
find pairs (x_n, y_n) as close to x = y as we wish, but the values of f(x_n) and f(y_n) will
always stay at least ε apart.
Q.E.D.

Proof: (Riemann’s Lemma) We must show that for any specified error bound ε, there is
a response M such that N ≥ M implies that

\[ \left| \int_{a}^{b} \sin(Nu)\, g(u)\,du \right| < \epsilon. \]

The key is to partition our interval [a, b] into m equal subintervals:

\[ a = u_0 < u_1 < u_2 < \cdots < u_m = b, \qquad u_k - u_{k-1} = \frac{b - a}{m}, \]

where we choose m so that if |u − v| ≤ (b − a)/m, then |g(u) − g(v)| < ε/(2(b − a)). Since
g is uniformly continuous over [a, b], we can always find such an m.
The proof proceeds by approximating our function g by the constant g(uk−1 ) on the kth
interval. As M increases, the integral of

\[ \sin(Mu)\, g(u_{k-1}) \]

can be made arbitrarily small. Uniform continuity enables us to control the size of the error
introduced when we substitute g(uk−1 ) for g(u).
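We break our integral up into a sum of integrals over these subintervals:

\[
\begin{aligned}
\left| \int_{a}^{b} \sin(Mu)\, g(u)\,du \right|
&= \left| \sum_{k=1}^{m} \int_{u_{k-1}}^{u_k} \sin(Mu)\, g(u)\,du \right| \\
&= \left| \sum_{k=1}^{m} \int_{u_{k-1}}^{u_k} \sin(Mu)\, [g(u_{k-1}) + g(u) - g(u_{k-1})]\,du \right| \\
&\le \left| \sum_{k=1}^{m} \int_{u_{k-1}}^{u_k} \sin(Mu)\, g(u_{k-1})\,du \right|
 + \left| \sum_{k=1}^{m} \int_{u_{k-1}}^{u_k} \sin(Mu)\, [g(u) - g(u_{k-1})]\,du \right| \\
&\le \sum_{k=1}^{m} \left| \int_{u_{k-1}}^{u_k} \sin(Mu)\, g(u_{k-1})\,du \right|
 + \sum_{k=1}^{m} \int_{u_{k-1}}^{u_k} \bigl| \sin(Mu)\, [g(u) - g(u_{k-1})] \bigr|\,du.
\end{aligned} \tag{6.26}
\]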

Since g is continuous on this closed interval, it must be bounded. Let K be an upper


bound for |g|:
|g(u)| ≤ K for all u ∈ [a, b].
In our first integral, g(uk−1 ) is independent of u, and so we can pull it outside the integral
and then replace |g(uk−1 )| by the upper bound K. In the second integral, we use the fact
that |sin(Mu)| ≤ 1 and

\[ |g(u) - g(u_{k-1})| \le \frac{\epsilon}{2(b - a)}. \]
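Using these results, we simplify our inequality,

\[
\begin{aligned}
\left| \int_{a}^{b} \sin(Mu)\, g(u)\,du \right|
&\le \sum_{k=1}^{m} \left| \int_{u_{k-1}}^{u_k} \sin(Mu)\, g(u_{k-1})\,du \right|
 + \sum_{k=1}^{m} \int_{u_{k-1}}^{u_k} \bigl| \sin(Mu)\, [g(u) - g(u_{k-1})] \bigr|\,du \\
&\le K \sum_{k=1}^{m} \left| \int_{u_{k-1}}^{u_k} \sin(Mu)\,du \right|
 + \sum_{k=1}^{m} \int_{u_{k-1}}^{u_k} \frac{\epsilon}{2(b - a)}\,du \\
&= K \sum_{k=1}^{m} \frac{|-\cos(Mu_k) + \cos(Mu_{k-1})|}{M} + \frac{\epsilon}{2}.
\end{aligned} \tag{6.27}
\]
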
Since |− cos(Muk ) + cos(Muk−1 )| ≤ 2, the first part of this bound can be bounded by
2mK/M. Our upper bound can be simplified to

\[ \left| \int_{a}^{b} \sin(Mu)\, g(u)\,du \right| \le \frac{2mK}{M} + \frac{\epsilon}{2}. \tag{6.28} \]

We have to be careful. The value of m has been forced by our choice of ε, and the values of a,
b, and K are outside our control. But once we have chosen ε (thus m), we are still free to
choose M as large as we wish. We find an M so that

\[ \frac{2mK}{M} < \frac{\epsilon}{2}. \]

If N ≥ M, then the absolute value of the integral is strictly less than ε.
Q.E.D.
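
Riemann's lemma can be observed numerically. The sketch below is an illustration, not a proof: it assumes the arbitrary choices g(u) = e^u and [a, b] = [0.5, 1.5], and it approximates each integral with a midpoint rule (`integrate` is a hypothetical helper) fine enough to resolve the oscillations for the multipliers shown.

```python
import math

def integrate(f, a, b, steps=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def g(u):
    # Any continuous function on [a, b] would do; this one is an arbitrary choice.
    return math.exp(u)

a, b = 0.5, 1.5
values = {M: integrate(lambda u: math.sin(M * u) * g(u), a, b)
          for M in (1, 10, 100, 1000)}
for M, value in values.items():
    print(M, value)
```

The printed integrals shrink toward zero as M increases, as equation (6.25) predicts.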

The Integral of the Dirichlet Kernel

Lemma 6.4 (Integral of Dirichlet Kernel). For any positive integer n,

\[ \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\,du = \frac{\pi}{2}. \]

Proof: From equation (6.11), we know that

\[ \frac{\sin[(2n+1)u]}{\sin u} = 1 + 2\cos(2u) + 2\cos(4u) + \cdots + 2\cos(2nu). \]

Substituting this into our integral and integrating each summand, we get exactly π/2.
Q.E.D.
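
As a sanity check on Lemma 6.4 and on the earlier description of the kernel, the following sketch evaluates the Dirichlet kernel numerically. It is a minimal illustration under stated assumptions: the function names are hypothetical, the value at u = 0 is filled in with the limit 2n + 1, and the integral is approximated by a midpoint rule.

```python
import math

def dirichlet_kernel(n, u):
    """K_n(u) = sin((2n+1)u) / sin(u), using the limiting value 2n+1 at u = 0."""
    if u == 0.0:
        return float(2 * n + 1)
    return math.sin((2 * n + 1) * u) / math.sin(u)

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

areas = {}
first_zero_values = {}
for n in (1, 4, 12):
    areas[n] = integrate(lambda u: dirichlet_kernel(n, u), 0.0, math.pi / 2)
    # The kernel first reaches the u axis at u = pi/(2n+1).
    first_zero_values[n] = dirichlet_kernel(n, math.pi / (2 * n + 1))
    print(n, areas[n], first_zero_values[n])
```

Each computed area should be close to π/2, independent of n.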

Bonnet’s Mean Value Theorem


The final lemma that we need is Bonnet’s form of the mean value theorem, a version that
he discovered and proved in 1849 specifically to simplify the proof of Dirichlet’s theorem.
As he pointed out, it also has many other applications. We shall postpone the proof of this
lemma until the next section. Here, for the first time, we shall need to be very careful about
exactly what we mean by an integral.

Lemma 6.5 (Bonnet’s Mean Value Theorem). Let f be integrable and let g be a
nonnegative, increasing function on [α, β]. There is at least one value ζ strictly between
α and β for which

\[ \int_{\alpha}^{\beta} f(t)\, g(t)\,dt = g(\beta) \int_{\zeta}^{\beta} f(t)\,dt. \tag{6.29} \]

As an example, let f(t) = sin t and g(t) = t^2 on the interval [0, 2π]. This lemma
promises us a number ζ for which

\[ \int_{0}^{2\pi} t^2 \sin t\,dt = 4\pi^2 \int_{\zeta}^{2\pi} \sin t\,dt. \]
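
For this example the promised ζ can be located numerically. The sketch below is illustrative only: it approximates both integrals with a midpoint rule (`integrate` is a hypothetical helper) and runs a bisection on [0, π], where the difference between the two sides changes sign. Evaluating the integrals by hand gives −4π² on the left and 4π²(cos ζ − 1) on the right, so cos ζ = 0; the search should therefore settle near ζ = π/2, and ζ = 3π/2 works equally well.

```python
import math

def integrate(f, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Left side of the example: integral of t^2 sin t over [0, 2*pi].
lhs = integrate(lambda t: t * t * math.sin(t), 0.0, 2 * math.pi)

def rhs(zeta):
    """Right side: g(2*pi) times the integral of sin t over [zeta, 2*pi]."""
    return 4 * math.pi ** 2 * integrate(math.sin, zeta, 2 * math.pi)

# Bisection: rhs(zeta) - lhs is positive at 0 and negative at pi.
lo, hi = 0.0, math.pi
for _ in range(40):
    mid = (lo + hi) / 2
    if rhs(mid) - lhs > 0:
        lo = mid
    else:
        hi = mid
zeta = (lo + hi) / 2
print(zeta, math.pi / 2)
```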

Conclusion to the Proof of Dirichlet’s Theorem


As we have seen, we need to show that we can force

\[ \left| \frac{F(x + 0)}{2} - \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \right| \]

and

\[ \left| \frac{F(x - 0)}{2} - \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x - 2u)\,du \right| \]

to each be arbitrarily small by taking n sufficiently large. Any argument that works for one
of these differences will work for the other. Given a positive error bound ε, our problem is
to find a response N for which n ≥ N implies that

\[ \left| \frac{F(x + 0)}{2} - \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \right| < \epsilon. \tag{6.30} \]

Lemma 6.4 implies that

\[ \frac{F(x + 0)}{2} = \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 0)\,du. \tag{6.31} \]

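Making this substitution, we can rewrite the left side of equation (6.30) as

\[
\begin{aligned}
&\left| \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 0)\,du
 - \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, F(x + 2u)\,du \right| \\
&\qquad = \left| \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right| \\
&\qquad \le \left| \frac{1}{\pi} \int_{0}^{a} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right|
 + \left| \frac{1}{\pi} \int_{a}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right|,
\end{aligned} \tag{6.32}
\]
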
where a is some point between 0 and π/2 whose exact position we shall determine later.
Lemma 6.2 tells us that once we have chosen a value for a, we can make the second term
as small as we want by taking a sufficiently large value for n.
The idea at this point is to choose our a small enough so that

\[ |F(x + 0) - F(x + 2u)| \]

is very small when 0 < u < a. We can do this because F(x + 0) is the limit from the right
of F(x + 2u). There is one potential problem. We want to make

\[ \left| \int_{0}^{a} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right| \]

very small. How do we know that the choice of a does not depend on the choice of n? This
is a real danger. If it does, then our argument collapses: in the second integral the choice
of n depends on a and in the first integral the choice of a depends on n. We might find
ourselves in exactly the kind of trap that Cauchy fell into when he proved that every series
of continuous functions is continuous.

Bonnet’s mean value theorem, Lemma 6.5, comes to our rescue. We insist that a be close
enough to 0 that F(x + 2u) is continuous and monotonic on (0, a]. We define

\[ g(u) = \begin{cases} |F(x + 0) - F(x + 2u)|, & 0 < u \le a, \\ 0, & u = 0. \end{cases} \]

Since F(x + 0) − F(x + 2u) is monotonic on (0, a] and approaches 0 as u approaches 0
from the right, either it is ≥ 0 for all u ∈ [0, a] or it is ≤ 0 for all u ∈ [0, a]. In either case,
g(u) will be nonnegative and increasing on [0, a]. Since F(x + 0) − F(x + 2u) does not
change sign on this interval,

\[ \left| \int_{0}^{a} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right|
 = \left| \int_{0}^{a} \frac{\sin[(2n+1)u]}{\sin u}\, g(u)\,du \right|. \tag{6.33} \]

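We can now apply Lemma 6.5:

\[
\begin{aligned}
\left| \int_{0}^{a} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right|
&= \left| g(a) \int_{\zeta}^{a} \frac{\sin[(2n+1)u]}{\sin u}\,du \right| \\
&= |F(x + 0) - F(x + 2a)| \left| \int_{\zeta}^{a} \frac{\sin[(2n+1)u]}{\sin u}\,du \right|.
\end{aligned} \tag{6.34}
\]
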
Lemmas 6.2 and 6.4 imply that the integral

\[ \int_{\zeta}^{a} \frac{\sin[(2n+1)u]}{\sin u}\,du \]

is bounded as n increases, regardless of the value of ζ. It is not difficult to see that it is
bounded by π (exercises 6.1.15 and 6.1.16). We need to choose an a that is close enough
to zero so that

\[ |F(x + 0) - F(x + 2a)| < \epsilon/2. \tag{6.35} \]

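It follows that

\[
\begin{aligned}
\left| \frac{1}{\pi} \int_{0}^{a} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right|
&= \frac{1}{\pi}\, |F(x + 0) - F(x + 2a)| \left| \int_{\zeta}^{a} \frac{\sin[(2n+1)u]}{\sin u}\,du \right| \\
&< \frac{1}{\pi} \cdot \frac{\epsilon}{2} \cdot \pi = \frac{\epsilon}{2}.
\end{aligned} \tag{6.36}
\]
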
Thanks to Bonnet’s mean value theorem, we have been able to find a value of a that
makes the first piece of our integral less than ε/2 regardless of the choice of n. We are now
free to choose an N that depends on a. We respond with any N for which n ≥ N implies that

\[ \left| \frac{1}{\pi} \int_{a}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, [F(x + 0) - F(x + 2u)]\,du \right| < \frac{\epsilon}{2}. \tag{6.37} \]

Q.E.D.

Exercises

The symbol
M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.
6.1.1. Prove that

\[ \sum_{k=1}^{\infty} \frac{(-1)^k}{k} \]

converges but

\[ \sum_{k=1}^{\infty} \left( \frac{(-1)^k}{\sqrt{k}} + \frac{1}{k} \right) \]

diverges.

6.1.2. Let \( \sum_{k=1}^{\infty} w_k \) be any convergent series. Prove the divergence of

\[ \sum_{k=1}^{\infty} \left( w_k + \frac{1}{k} \right). \]

6.1.3. Find two other examples of infinite series, \( \sum_{k=1}^{\infty} a_k \) and \( \sum_{k=1}^{\infty} b_k \), for which the first
converges, the second diverges, but

\[ \lim_{k\to\infty} (a_k - b_k) = 0. \]

6.1.4. Let F be an arbitrary function defined for all real values of x. Let

\[ f(x) = \frac{F(x) + F(-x)}{2}, \tag{6.38} \]

\[ g(x) = \frac{F(x) - F(-x)}{2}. \tag{6.39} \]

Prove that f is an even function and that g is an odd function and that
F(x) = f(x) + g(x).

6.1.5. Prove that if F can be written as the sum of an even function and an odd function,
F = f + g, then f and g must satisfy equations (6.38) and (6.39).
6.1.6. We have seen that if \( \sum_{k=1}^{\infty} a_k \) and \( \sum_{k=1}^{\infty} b_k \) each converge, then the series \( \sum_{k=1}^{\infty} (a_k + b_k) \)
must converge. It does not necessarily work the other way. For example,

\[ \sum_{k=1}^{\infty} \left( \frac{1}{k} - \frac{1}{k+1} \right) = \sum_{k=1}^{\infty} \frac{1}{k(k+1)} \]

converges, but neither

\[ \sum_{k=1}^{\infty} \frac{1}{k} \quad \text{nor} \quad \sum_{k=1}^{\infty} \frac{-1}{k+1} \]

converges. Discuss whether or not it is possible to have a Fourier series,

\[ a_0 + \sum_{k=1}^{\infty} [a_k \cos(kx) + b_k \sin(kx)], \]

converge for all x without either

\[ a_0 + \sum_{k=1}^{\infty} a_k \cos(kx) \quad \text{or} \quad \sum_{k=1}^{\infty} b_k \sin(kx) \]

converging.

6.1.7. It is often convenient to work with the Fourier series of a function F with period 2
that is defined on [−1, 1]. Show that, in this case, the Fourier series is given by

\[ F(x) = a_0 + \sum_{k=1}^{\infty} [a_k \cos(k\pi x) + b_k \sin(k\pi x)], \]

where

\[ a_0 = \frac{1}{2} \int_{-1}^{1} F(x)\,dx, \]

\[ a_k = \int_{-1}^{1} F(x)\cos(k\pi x)\,dx \quad (k \ge 1), \]

\[ b_k = \int_{-1}^{1} F(x)\sin(k\pi x)\,dx. \]

6.1.8. M&M Find the Fourier series expansions for each of the following functions that
are defined on (−1, 1) and have period 2. Find the value of this Fourier series at x = 1.
a. f(x) = x^2
b. f(x) = cos(3πx)
c. f(x) = sin x
d. f(x) = e^x

6.1.9. Find the Fourier series expansion for the function with period 2 that is equal to x^2
on the interval (1, 3). What is the value of this Fourier series at x = 1?

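6.1.10. Assume that

\[ F(x) = a_0 + \sum_{k=1}^{\infty} [a_k \cos(kx) + b_k \sin(kx)] \]

converges uniformly. Use equations (6.2) and (6.3) to prove that

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} F(x)\,dx, \]

\[ a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x)\cos(kx)\,dx \quad (k \ge 1), \]

\[ b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x)\sin(kx)\,dx. \]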

6.1.11. M&M We know that

\[ \cos x - \frac{1}{3}\cos 3x + \frac{1}{5}\cos 5x - \frac{1}{7}\cos 7x + \cdots \]

converges to π/4 when −π/2 < x < π/2, but that it does not converge absolutely. Choose
at least four different values of x between 0 and π/2. For each value of x, apply the Riemann
algorithm as described on page 177 to find the first twenty terms of the rearrangement of
this series that converges to 1. Does the same rearrangement work for every value of x?

6.1.12. Prove equation (6.11):

\[ \frac{1}{2} + \cos u + \cos 2u + \cdots + \cos nu = \frac{\sin[(2n+1)u/2]}{2\sin[u/2]}. \]

6.1.13. M&M Approximate the values of

\[ \frac{1}{\pi} \int_{0}^{\pi/2} \frac{\sin[(2n+1)u]}{\sin u}\, \sqrt{9 + 2u}\,du \]

for various values of n. Describe what happens as n increases. What value do you expect
it to approach?

6.1.14. Find a value of ζ for which

\[ \int_{0}^{2\pi} t \sin t\,dt = 2\pi \int_{\zeta}^{2\pi} \sin t\,dt. \]

6.1.15. Prove that

\[ \int_{0}^{\pi/(2n+1)} \frac{\sin[(2n+1)u]}{\sin u}\,du < \pi. \]

6.1.16. Justify the statement that if 0 ≤ ζ < a ≤ π/2, then

\[ \left| \int_{\zeta}^{a} \frac{\sin[(2n+1)u]}{\sin u}\,du \right| \le \int_{0}^{\pi/(2n+1)} \frac{\sin[(2n+1)u]}{\sin u}\,du. \]


6.1.17. M&M Find the coefficients of the Fourier series expansion of

\[ F(x) = \begin{cases} 2x + 1, & -\pi < x < 0, \\ (x - 2)/3, & 0 < x < \pi. \end{cases} \]

Evaluate partial sums for x = 0 and x = π. Do they approach the expected values?

6.1.18. Show that Bonnet’s mean value theorem is equivalent to the statement that if f is
integrable and g is a nonnegative, decreasing function on [α, β], then there is at least one
value ζ strictly between α and β for which

\[ \int_{\alpha}^{\beta} f(t)\, g(t)\,dt = g(\alpha) \int_{\alpha}^{\zeta} f(t)\,dt. \tag{6.40} \]

6.2 The Cauchy Integral


If we have waited this long before defining integration, it is because we have not needed a
careful definition. For more than a hundred years, it was enough to define integration as the
inverse process to differentiation. As we saw in the last section, this is no longer sufficient
when we start using Fourier series. We need a broader and clearer definition. Fourier’s
solution, to define the definite integral in terms of area, raises the question: what do we
mean by “area”? As the nineteenth century progressed, it became increasingly evident that
the problem of defining areas was equivalent to the problem of defining integrals. If we
wanted a meaningful definition of either, then we needed to look elsewhere.

FIGURE 6.4. \( \int_a^b f(x)\,dx \approx \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) \).

It was Cauchy who first proposed the modern solution to this problem. He defined the
integral as the limit of approximating sums. Ever since the invention of the calculus, those
who used it knew that integrals were limits of these sums. We have seen that Archimedes
calculated areas by using approximating sums. But when pushed to define integration, they
chose to define it as the inverse process of differentiation. The fact that this inverse process
could yield the limits of these approximating sums was the key theorem of calculus that
made it such a powerful tool for calculation. It never occurred to them to define the integral
as the limit of these sums.
The reason that no one used this definition before Cauchy is that it is ungainly. For the
functions that were studied before the 19th century, it is much easier to define integration as
anti-differentiation. Cauchy was on the first wave of the mathematical realization that the
existing concept of function, something that could be expressed by an algebraic formula
involving a small family of common functions, was far too restrictive. Cauchy needed a
definition of integration that would enable him to establish that any continuous function is
integrable.
Following Cauchy, we shall assume that we are working with a continuous function f
on a closed and bounded interval [a, b]. We choose a positive integer n and an arbitrary
partition of [a, b] into n subintervals:
a = x0 < x1 < x2 < · · · < xn = b.
These subintervals do not have to be of equal length. We form a sum that approximates the
value of the definite integral of f from a to b:

\[ \int_{a}^{b} f(x)\,dx \approx \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}). \tag{6.41} \]

In terms of area, this is an approximation by rectangles (see Figure 6.4). The j th rectangle
sits over the interval [xj −1 , xj ] and has height f (xj −1 ), the value of the function at the
left-hand edge of the interval. Cauchy now defines the value of the definite integral to be
the limit of all such sums as the lengths of the subintervals approach zero.
It is significant that he does not merely take the limit as n approaches infinity. As is
shown in the exercises, increasing only the number of subintervals is not enough to give
us convergence to the desired value. The lengths of these subintervals must all shrink. For
precision, we state Cauchy’s definition of the definite integral in the language of the ε–δ
game.
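A small numerical sketch may make this point concrete. It is an illustration added here, not part of Cauchy's argument: the function f(x) = x on [1, 4] and the helper names `left_sum`, `uniform_partition`, and `lopsided_partition` are assumptions chosen for the example. The lopsided partition always keeps the subinterval [1, 2.5] whole, so its sums settle near 1.5 + 4.875 = 6.375 rather than 15/2, however large n becomes.

```python
def left_sum(f, points):
    """Approximating sum that evaluates f at the left-hand edge of each subinterval."""
    return sum(f(lo) * (hi - lo) for lo, hi in zip(points, points[1:]))


def uniform_partition(n):
    """n equal subintervals of [1, 4]; every length shrinks as n grows."""
    return [1.0 + 3.0 * k / n for k in range(n + 1)]


def lopsided_partition(n):
    """n subintervals of [1, 4] (n >= 2): [1, 2.5] stays whole, [2.5, 4] is cut into n - 1 equal pieces."""
    return [1.0] + [2.5 + 1.5 * k / (n - 1) for k in range(n)]


def identity(x):
    return x


# Compare the two families of partitions as the number of subintervals grows.
for n in (10, 100, 1000):
    print(n, left_sum(identity, uniform_partition(n)), left_sum(identity, lopsided_partition(n)))
```

Only the uniform sums approach 15/2, because only there do all of the subinterval lengths shrink.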

Definition: integration (Cauchy)


A function f is said to be integrable over the interval [a, b] and its integral has the
value V provided that the following condition is satisfied. Given any specified error
bound ε, there must be a response δ such that for any partition
\[ a = x_0 < x_1 < x_2 < \cdots < x_n = b \]
where all of the subintervals have length less than δ,
\[ |x_j - x_{j-1}| < \delta \quad \text{for all } j, \]
the corresponding approximating sum will lie within ε of V,
\[ \left| \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) - V \right| < \epsilon. \]
The value of the integral is denoted by $V = \int_a^b f(x)\,dx$.

An Example
To see how cumbersome this definition is, we shall use it to justify the simple integral
evaluation
\[ \int_1^4 x\,dx = \frac{16 - 1}{2} = \frac{15}{2}. \tag{6.42} \]
Given any positive error bound ε, we must show how to respond with a tolerance δ so that
for any partition with subintervals of length less than δ, the approximating sum will lie
within ε of 15/2.
The approximating sum is
\[ \sum_{j=1}^{n} x_{j-1}\,(x_j - x_{j-1}). \]
We have approximated the area under y = x in Figure 6.5 by the sum of the areas of the
rectangles with height $x_{j-1}$ and width $x_j - x_{j-1}$. The correct area for each trapezoid is
$(x_j^2 - x_{j-1}^2)/2$. Our approximation is too low by precisely
\[ \frac{x_j^2 - x_{j-1}^2}{2} - x_{j-1}\,(x_j - x_{j-1}) = \frac{x_j^2 - 2x_j x_{j-1} + x_{j-1}^2}{2} = \frac{(x_j - x_{j-1})^2}{2}. \]
FIGURE 6.5. The area under y = x approximated by rectangles, with partition points $x_0 = 1 < x_1 < \cdots < x_5 = 4$.

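If we replace each summand in the approximating sum by the correct value minus the error,
we see that our approximating sum is
\begin{align*}
\sum_{j=1}^{n} x_{j-1}\,(x_j - x_{j-1})
  &= \sum_{j=1}^{n} \left( \frac{x_j^2 - x_{j-1}^2}{2} - \frac{(x_j - x_{j-1})^2}{2} \right) \\
  &= \sum_{j=1}^{n} \frac{x_j^2 - x_{j-1}^2}{2} - \sum_{j=1}^{n} \frac{(x_j - x_{j-1})^2}{2} \\
  &= \frac{x_n^2 - x_0^2}{2} - \sum_{j=1}^{n} \frac{(x_j - x_{j-1})^2}{2} \\
  &= \frac{15}{2} - \sum_{j=1}^{n} \frac{(x_j - x_{j-1})^2}{2}. \tag{6.43}
\end{align*}
If each subinterval has length $x_j - x_{j-1} < \delta$, then the total error is less than
\[ \sum_{j=1}^{n} \frac{(x_j - x_{j-1})^2}{2} < \frac{\delta}{2} \sum_{j=1}^{n} (x_j - x_{j-1}) = \frac{3\delta}{2}. \tag{6.44} \]
We can guarantee that this error is less than ε if we choose δ = 2ε/3.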
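The response δ = 2ε/3 can be checked numerically. The following is a minimal sketch, not a proof: the helper names `left_sum`, `random_partition`, and `check`, and the way the random partition is generated, are assumptions made for this illustration. It builds an irregular partition of [1, 4] whose subintervals are all shorter than δ and compares the shortfall 15/2 minus the approximating sum with the error term in (6.43).

```python
import random


def left_sum(f, points):
    """Approximating sum that evaluates f at the left-hand edge of each subinterval."""
    return sum(f(lo) * (hi - lo) for lo, hi in zip(points, points[1:]))


def random_partition(a, b, delta, rng):
    """Irregular partition of [a, b] in which every subinterval is shorter than delta."""
    points = [a]
    while b - points[-1] >= delta:
        # Each step is between 0.1*delta and 0.9*delta, so it never reaches b.
        points.append(points[-1] + rng.uniform(0.1, 0.9) * delta)
    points.append(b)  # the last piece has length below delta
    return points


def check(eps, seed=0):
    """Return (15/2 - approximating sum, sum of (x_j - x_{j-1})^2 / 2) for f(x) = x on [1, 4]."""
    delta = 2.0 * eps / 3.0
    points = random_partition(1.0, 4.0, delta, random.Random(seed))
    shortfall = 7.5 - left_sum(lambda x: x, points)
    predicted = sum((hi - lo) ** 2 for lo, hi in zip(points, points[1:])) / 2.0
    return shortfall, predicted
```

Calling `check(0.01)` returns two numbers that agree up to rounding, both positive and below 0.01, as equations (6.43) and (6.44) predict.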

The Cauchy Criterion


If the target value V is not known, there is a corresponding Cauchy criterion.

Definition: Cauchy criterion for integration

A function f is said to be integrable over the interval [a, b] provided that the following
condition is satisfied. Given any specified error bound ε, there must be a response δ
such that for any pair of partitions of the interval [a, b]:
\[ a = x_0 < x_1 < x_2 < \cdots < x_n = b, \]
\[ a = x'_0 < x'_1 < x'_2 < \cdots < x'_m = b, \]
where all of the subintervals have length less than δ,
\[ |x_j - x_{j-1}| < \delta, \quad |x'_j - x'_{j-1}| < \delta \quad \text{for all } j, \]
the corresponding approximating sums will lie within ε of each other,
\[ \left| \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{j=1}^{m} f(x'_{j-1})\,(x'_j - x'_{j-1}) \right| < \epsilon. \]

This definition is slippery because the two partitions may have few or no interior points in
common. Cauchy used this definition to prove the integrability of any continuous function.
He recognized that to compare these sums, we must have common points in the partitions.
His solution is the one we use today. We look for a common refinement, a partition that
combines the break points of both of the original partitions. If we define a partition by its
set of break points,
\[ P_1 = \{a, x_1, x_2, \ldots, x_{n-1}, b\}, \]
\[ P_2 = \{a, x'_1, x'_2, \ldots, x'_{m-1}, b\}, \]
then a refinement of $P_1$ is any partition of [a, b] that contains $P_1$. The smallest common
refinement of $P_1$ and $P_2$ is the union of these sets,
\[ P = P_1 \cup P_2. \]
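When partitions are stored as lists of break points, the smallest common refinement is simply a sorted union. The sketch below is illustrative only; the function names `common_refinement` and `is_refinement` are assumptions, and exact comparison of floating-point break points is assumed to be acceptable for the example.

```python
def common_refinement(p1, p2):
    """Smallest common refinement: the sorted union of the two sets of break points."""
    return sorted(set(p1) | set(p2))


def is_refinement(fine, coarse):
    """A refinement of a partition is any partition that contains all of its break points."""
    return set(coarse) <= set(fine)


p1 = [0.0, 0.5, 1.0]
p2 = [0.0, 0.25, 0.75, 1.0]
p = common_refinement(p1, p2)
print(p, is_refinement(p, p1), is_refinement(p, p2))
```

The two original partitions share no interior points, yet both sit inside the union, which is what makes the sums comparable.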

To prove the integrability of a function, we need only find a response δ so that if we start
with a partition whose subintervals have length less than δ and refine this partition, then
we change the value of the approximating sum by less than ε/2. Let $P = \{t_0, t_1, t_2, \ldots, t_r\}$
be a common refinement of $P_1$ and $P_2$. If
\[ \left| \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{j=1}^{r} f(t_{j-1})\,(t_j - t_{j-1}) \right| < \epsilon/2 \]
and
\[ \left| \sum_{j=1}^{r} f(t_{j-1})\,(t_j - t_{j-1}) - \sum_{j=1}^{m} f(x'_{j-1})\,(x'_j - x'_{j-1}) \right| < \epsilon/2, \]
then, by the triangle inequality,
\[ \left| \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{j=1}^{m} f(x'_{j-1})\,(x'_j - x'_{j-1}) \right| < \epsilon. \]

Continuity Implies Integrability


Cauchy’s definition of integrability may be cumbersome, but it accomplished the task that
no previous definition had been able to do. It made it possible to prove that any continuous
function is integrable.

Theorem 6.6 (Continuous ⇒ Integrable). If f is a continuous function on the
closed, bounded interval [a, b], then f is integrable over [a, b].

Proof: We have outlined what needs to be done. If we are given an error bound ε, we must
be able to construct a response δ with the property that if
\[ P_1 = \{a, x_1, x_2, \ldots, x_{n-1}, b\} \]
is any partition of [a, b] with interval lengths less than δ and if $P_2$ is any refinement of $P_1$
($P_1 \subseteq P_2$), then the difference between the corresponding approximating sums is less than
ε/2. We take each interval $[x_{j-1}, x_j]$ whose endpoints are consecutive points of $P_1$ and
denote its partition in $P_2$ by
\[ P_{1,j} = \{x_{j-1} = x_{j,0}, x_{j,1}, x_{j,2}, \ldots, x_{j,d_j} = x_j\}. \]
The partition $P_2$ is the union of these partitions of the subintervals,
\[ P_2 = \bigcup_{j=1}^{n} P_{1,j}. \]
We must show that
\[ \left| \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{j=1}^{n} \sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1}) \right| < \epsilon/2. \]

Let us consider just the jth subinterval of our original partition. This subinterval contributes
\[ f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1}) \]
to the difference of the sums. Let $M_j$ be the maximum value of f over the interval $[x_{j-1}, x_j]$
and let $m_j$ be the minimum value over the same interval. Since every $x_{j,k-1}$ is contained in
this interval, we have that
\[ m_j \le f(x_{j,k-1}) \le M_j. \]
We see that
\begin{align*}
\sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1}) &\le \sum_{k=1}^{d_j} M_j\,(x_{j,k} - x_{j,k-1}) \\
  &= M_j\,(x_j - x_{j-1}) \tag{6.45}
\end{align*}
and
\begin{align*}
\sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1}) &\ge \sum_{k=1}^{d_j} m_j\,(x_{j,k} - x_{j,k-1}) \\
  &= m_j\,(x_j - x_{j-1}). \tag{6.46}
\end{align*}
We now invoke the intermediate value theorem. Since
\[ \frac{\sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1})}{x_j - x_{j-1}} \]
is a constant sitting somewhere between the minimal and maximal values of f over
$[x_{j-1}, x_j]$, it must actually equal $f(c_j)$ at some point $c_j$ in this interval. We have proven
that, for some $c_j \in [x_{j-1}, x_j]$,
\[ \sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1}) = f(c_j)\,(x_j - x_{j-1}). \tag{6.47} \]
We use this to simplify the contribution from the jth subinterval:
\[ \left| f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1}) \right| = |f(x_{j-1}) - f(c_j)|\,(x_j - x_{j-1}). \tag{6.48} \]
Both $x_{j-1}$ and $c_j$ lie in the same subinterval of the original partition. Using the continuity
of f, we choose our δ so that if $|x_{j-1} - c_j| < \delta$, then
\[ |f(x_{j-1}) - f(c_j)| < \frac{\epsilon}{2(b - a)}. \tag{6.49} \]
This is our response. It only remains to verify that the difference of the two sums is within
the allowed error:
\begin{align*}
\left| \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{j=1}^{n} \sum_{k=1}^{d_j} f(x_{j,k-1})\,(x_{j,k} - x_{j,k-1}) \right|
  &= \left| \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) - \sum_{j=1}^{n} f(c_j)\,(x_j - x_{j-1}) \right| \\
  &\le \sum_{j=1}^{n} |f(x_{j-1}) - f(c_j)|\,(x_j - x_{j-1}) \\
  &< \frac{\epsilon}{2(b - a)} \sum_{j=1}^{n} (x_j - x_{j-1}) = \frac{\epsilon}{2}. \tag{6.50}
\end{align*}

Q.E.D.
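The bound in (6.50) can be observed numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it uses f(x) = cos x on [0, 2], for which |cos x − cos c| ≤ |x − c|, so δ = ε/(2(b − a)) is a valid response; the names `uniform_partition`, `refine`, and `refinement_gap` are hypothetical helpers, and the refinement simply inserts midpoints.

```python
import math


def left_sum(f, points):
    """Approximating sum that evaluates f at the left-hand edge of each subinterval."""
    return sum(f(lo) * (hi - lo) for lo, hi in zip(points, points[1:]))


def uniform_partition(a, b, n):
    return [a + (b - a) * k / n for k in range(n + 1)]


def refine(points):
    """Insert the midpoint of every subinterval, which gives a refinement of the partition."""
    fine = []
    for lo, hi in zip(points, points[1:]):
        fine.extend([lo, (lo + hi) / 2.0])
    fine.append(points[-1])
    return fine


def refinement_gap(eps, a=0.0, b=2.0):
    """Difference between the sum over a partition with lengths below delta and over its refinement."""
    delta = eps / (2.0 * (b - a))       # valid for cos because |cos x - cos c| <= |x - c|
    n = math.ceil((b - a) / delta) + 1  # makes every length (b - a)/n smaller than delta
    coarse = uniform_partition(a, b, n)
    return abs(left_sum(math.cos, coarse) - left_sum(math.cos, refine(coarse)))
```

For instance, `refinement_gap(0.1)` stays below 0.05, in line with the ε/2 bound. The single δ works everywhere on [0, 2], which is exactly the uniformity discussed next.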

The proof we have just seen is a carefully stated version of Cauchy’s proof. There
are no mistakes in it, but it does reflect an oversight by Cauchy. The problem comes in
the italicized portion of the last paragraph, the place where we choose our δ. Notice that
we need uniform continuity in exactly the same way that we needed uniform continuity
to prove Riemann’s lemma. We have uniform continuity because we are working with
a continuous function on a closed and bounded interval, but Cauchy never explicitly
recognized this need. This should be reminiscent of the earlier flaw in Cauchy’s reasoning
with regard to the continuity of infinite series. There he missed the need for uniform
convergence.

A Mean Value Theorem Implying Continuity


There is a version of the mean value theorem that deals with integrals rather than derivatives.
We observe that if f is bounded below by A and above by B as x ranges over the interval
[a, b], then we have for any partition of this interval,
\begin{gather*}
A \sum_{j=1}^{n} (x_j - x_{j-1}) \le \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) \le B \sum_{j=1}^{n} (x_j - x_{j-1}), \\
A(b - a) \le \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) \le B(b - a), \\
A \le \frac{\sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1})}{b - a} \le B.
\end{gather*}
If these bounds hold for every approximating sum, then they also must hold for the integral:
\[ A \le \frac{\int_a^b f(x)\,dx}{b - a} \le B. \tag{6.51} \]

If f is continuous over the interval [a, b], then the intermediate value theorem tells us that
it must actually equal this ratio at some point strictly between a and b.

Theorem 6.7 (Integral Form of the Mean Value Theorem). If f is continuous on
the interval [a, b], then there is at least one point c ∈ (a, b) for which
\[ \int_a^b f(x)\,dx = f(c)\,(b - a). \tag{6.52} \]
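As an illustration of Theorem 6.7, the sketch below locates such a point c numerically. It is only a rough sketch under stated assumptions: the integral is replaced by a left-endpoint approximating sum, the search looks for the first sign change of f(x) minus the average value on a uniform grid and then bisects, and the names `left_sum_uniform` and `mean_value_point` are hypothetical. A coarse grid could miss a sign change, in which case the function returns None.

```python
import math


def left_sum_uniform(f, a, b, n):
    """Left-endpoint approximating sum with n equal subintervals."""
    h = (b - a) / n
    return sum(f(a + k * h) for k in range(n)) * h


def mean_value_point(f, a, b, n=100_000, grid=1_000):
    """Approximate a point c with f(c) (b - a) equal to the approximating sum for the integral."""
    average = left_sum_uniform(f, a, b, n) / (b - a)

    def g(x):
        return f(x) - average

    xs = [a + (b - a) * k / grid for k in range(grid + 1)]
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) * g(hi) > 0:
            continue  # no sign change on this grid cell
        for _ in range(60):  # bisection keeps a sign change inside [lo, hi]
            mid = (lo + hi) / 2.0
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2.0
    return None


c = mean_value_point(lambda x: x * x, 0.0, 3.0)
print(c, math.sqrt(3.0))
```

For f(x) = x² on [0, 3] the integral is 9, the average value is 3, and the point found is close to √3.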

Theorem 6.7 says something important about the continuity of the integral regardless of
whether f is continuous or not.

Corollary 6.8 (Continuity of Integral). Let f be a bounded integrable function on
[a, b] and define F for x in [a, b] by
\[ F(x) = \int_a^x f(t)\,dt. \]
Then F is continuous at every point between a and b.

Proof: We choose a point c between a and b. We must show that for any error bound ε
supplied by our opponent, we always have a response δ for which
\[ |t - c| < \delta \quad \text{implies that} \quad \left| \int_a^t f(x)\,dx - \int_a^c f(x)\,dx \right| < \epsilon. \]
We choose a bound M so that |f(x)| ≤ M for all x in [a, b]. It follows that the absolute
value of the difference of the integrals is bounded by
\[ \left| \int_a^t f(x)\,dx - \int_a^c f(x)\,dx \right| = \left| \int_c^t f(x)\,dx \right| \le M\,|t - c|. \tag{6.53} \]
We respond with δ = ε/M.

Q.E.D.

Proof of Bonnet’s Mean Value Theorem


Cauchy’s definition of the integral enables us to prove results such as Lemma 6.5 on
page 231, Bonnet’s mean value theorem. The result that Bonnet actually proved was the
following lemma.

Lemma 6.9 (Bonnet’s Lemma). Let f be integrable and let g be a nonnegative,
increasing function on [α, β]. If the integral of f from x to β has least upper bound B
and greatest lower bound A as x ranges over the values in [α, β],
\[ A \le \int_x^{\beta} f(t)\,dt \le B, \]
then
\[ A\,g(\beta) \le \int_{\alpha}^{\beta} f(t)\,g(t)\,dt \le B\,g(\beta). \tag{6.54} \]

To see that this implies Lemma 6.5, we observe that, as a function of x,
\[ g(\beta) \int_x^{\beta} f(t)\,dt \]
is continuous and so achieves the values of its greatest lower and least upper bounds: A g(β)
and B g(β). The intermediate value theorem implies that there is some ζ between α and β
for which
\[ g(\beta) \int_{\zeta}^{\beta} f(t)\,dt = \int_{\alpha}^{\beta} f(t)\,g(t)\,dt. \]

Proof: Bonnet refers to Lemma 6.9 as a particular case of Abel’s lemma which we have
seen as Theorem 4.16 on page 161. We use the definition of the integral to work with the
approximating sum and do exactly the same manipulation on this sum that Abel performed
in obtaining his lemma. The proof is complicated by the fact that we have to keep track of the
tolerances involved, but Bonnet is correct that the basic idea is contained in Abel’s lemma.
Let us forget tolerances for a moment and concentrate on the summations. We choose a
partition of [α, β], $P = \{\alpha = x_0, x_1, \ldots, x_n = \beta\}$, and look at the sum that approximates
$\int_{\alpha}^{\beta} f(t)\,g(t)\,dt$ by evaluating the function at the right-hand edge of each interval:
\[ S_P = \sum_{j=1}^{n} f(x_j)\,g(x_j)\,(x_j - x_{j-1}). \tag{6.55} \]
For k = 1, 2, . . . , n, we define
\[ S_{P,k} = \sum_{j=k}^{n} f(x_j)\,(x_j - x_{j-1}). \tag{6.56} \]
The partial sum $S_{P,k}$ will be a good approximation to $\int_{x_{k-1}}^{\beta} f(t)\,dt$, which we know lies
between A and B.
We see that
\[ f(x_j)\,(x_j - x_{j-1}) = S_{P,j} - S_{P,j+1}. \tag{6.57} \]
If we define $S_{P,n+1} = 0$, then $S_P$ can be rewritten as
\begin{align*}
S_P &= \sum_{j=1}^{n} g(x_j)\,(S_{P,j} - S_{P,j+1}) \\
  &= g(x_1)\,(S_{P,1} - S_{P,2}) + g(x_2)\,(S_{P,2} - S_{P,3}) + \cdots + g(x_n)\,(S_{P,n} - S_{P,n+1}) \\
  &= g(x_1)\,S_{P,1} + [g(x_2) - g(x_1)]\,S_{P,2} + \cdots + [g(x_n) - g(x_{n-1})]\,S_{P,n}. \tag{6.58}
\end{align*}
Since g is nonnegative and increasing, each coefficient of $S_{P,k}$ is greater than or equal to 0.
We let $A_P$ and $B_P$ be, respectively, lower and upper bounds on the set of values of $S_{P,k}$,
\[ A_P \le S_{P,k} \le B_P \quad \text{for } 1 \le k \le n. \]
We can now bound $S_P$:
\begin{align*}
S_P &\ge A_P\,\bigl(g(x_1) + g(x_2) - g(x_1) + \cdots + g(x_n) - g(x_{n-1})\bigr) \\
  &= A_P\,g(x_n) = A_P\,g(\beta), \tag{6.59} \\
S_P &\le B_P\,\bigl(g(x_1) + g(x_2) - g(x_1) + \cdots + g(x_n) - g(x_{n-1})\bigr) \\
  &= B_P\,g(x_n) = B_P\,g(\beta). \tag{6.60}
\end{align*}
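The rearrangement (6.58) and the bounds (6.59) and (6.60) are purely algebraic, so they can be checked on any concrete partition. The sketch below is an illustration only; the choices f(x) = cos x, g(x) = x² on [0, 2], the uniform partition, and the function name `abel_rearrangement` are assumptions made for the example.

```python
import math


def abel_rearrangement(f, g, points):
    """Return S_P from (6.55), S_P via the rearrangement (6.58), and the min and max of S_{P,k}."""
    n = len(points) - 1
    # terms[j - 1] = f(x_j) (x_j - x_{j-1}) for j = 1, ..., n (right-hand edges)
    terms = [f(points[j]) * (points[j] - points[j - 1]) for j in range(1, n + 1)]
    # tails[k - 1] = S_{P,k}, the sum of the terms with j = k, ..., n
    tails = [0.0] * n
    running = 0.0
    for k in range(n, 0, -1):
        running += terms[k - 1]
        tails[k - 1] = running
    direct = sum(g(points[j]) * terms[j - 1] for j in range(1, n + 1))
    rearranged = g(points[1]) * tails[0] + sum(
        (g(points[k]) - g(points[k - 1])) * tails[k - 1] for k in range(2, n + 1)
    )
    return direct, rearranged, min(tails), max(tails)


def square(x):
    return x * x


partition = [2.0 * k / 50 for k in range(51)]
direct, rearranged, a_p, b_p = abel_rearrangement(math.cos, square, partition)
print(direct, rearranged, a_p * square(2.0), b_p * square(2.0))
```

The two values of S_P agree up to rounding, and S_P lies between A_P g(β) and B_P g(β) with A_P and B_P taken as the smallest and largest of the partial sums.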

The idea at this point is to argue that as the partition becomes finer, $S_P$ approaches
$\int_{\alpha}^{\beta} f(t)\,g(t)\,dt$ and the upper and lower bounds, $B_P$ and $A_P$, approach B and A, respectively.
We use the definition of the integral to make this part of the argument more precise.
We choose an error bound ε and find a response δ so that for any partition of [α, β] with
subintervals of length less than δ, we have that
\[ \sum_{j=1}^{n} f(x_j)\,g(x_j)\,(x_j - x_{j-1}) \]
is within ε of $\int_{\alpha}^{\beta} f(t)\,g(t)\,dt$ and at the same time
\[ \sum_{j=1}^{n} f(x_j)\,(x_j - x_{j-1}) \]
is within ε of $\int_{\alpha}^{\beta} f(t)\,dt$. There is a $\delta_1$ that works in the first case and a $\delta_2$ that works in the
second. We choose whichever is smaller.
Once we have chosen our δ, any partition with subintervals of length less than δ will
yield an approximating sum within the allowable error ε. For any k, 1 ≤ k ≤ n,
\[ \sum_{j=k}^{n} f(x_j)\,(x_j - x_{j-1}) \]
must also be within ε of $\int_{x_{k-1}}^{\beta} f(t)\,dt$. This is because we are allowed to choose as fine a
partition as we might wish over $[\alpha, x_{k-1}]$ so that the missing summands add to an amount
arbitrarily close to $\int_{\alpha}^{x_{k-1}} f(t)\,dt$. It follows that $A_P > A - \epsilon$ and $B_P < B + \epsilon$, and therefore
\[ (A - \epsilon)\,g(\beta) < A_P\,g(\beta) \le S_P \le B_P\,g(\beta) < (B + \epsilon)\,g(\beta). \tag{6.61} \]
Combining this with the inequality
\[ S_P - \epsilon < \int_{\alpha}^{\beta} f(t)\,g(t)\,dt < S_P + \epsilon \tag{6.62} \]
yields
\[ A\,g(\beta) - \epsilon\,[1 + g(\beta)] < \int_{\alpha}^{\beta} f(t)\,g(t)\,dt < B\,g(\beta) + \epsilon\,[1 + g(\beta)]. \tag{6.63} \]
Since this is true for every positive ε, we have proved the desired inequalities:
\[ A\,g(\beta) \le \int_{\alpha}^{\beta} f(t)\,g(t)\,dt \le B\,g(\beta). \]

Q.E.D.

Exercises

The symbol (M&M) indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

6.2.1. (M&M) If we use n subintervals of equal length, then $\int_0^1 (x^3 - 2x^2 + x)\,dx$ is
approximated by the sum
\[ \sum_{j=0}^{n-1} \left( \frac{j^3}{n^3} - 2\,\frac{j^2}{n^2} + \frac{j}{n} \right) \frac{1}{n}. \]
Evaluate these approximations for 1 ≤ n ≤ 20. How large must n be before you are within
0.001 of the correct value?

6.2.2. (M&M) Experiment with the value of
\[ \sum_{j=1}^{n} \left( x_{j-1}^3 - 2x_{j-1}^2 + x_{j-1} \right) (x_j - x_{j-1}) \]
for different partitions of [0, 1] into intervals of length at most 0.25. Find a partition for
which the difference between this sum and $\int_0^1 (x^3 - 2x^2 + x)\,dx$ is as large as possible.

6.2.3. (M&M) Define sin(1/0) = 0. To see whether
\[ \int_0^1 \sin(1/x)\,dx \]
exists, we can look at the approximating sums and see if they seem to converge. Does it
appear that
\[ \frac{1}{n} \sum_{j=1}^{n-1} \sin(n/j) \]
converges as n increases? If so, what does the value appear to be?

6.2.4. Determine whether or not sin(1/x) is integrable over [0, 1] and prove your assertion.

6.2.5. What is the value of
\[ \int_0^1 \cos^2(100\pi x)\,dx? \]
Consider the approximating sum
\[ \frac{1}{n} \sum_{j=0}^{n-1} \cos^2(100\pi j/n). \]
Describe what happens as n increases. Does increasing the value of n always bring you
closer to the actual value of the integral?

6.2.6. Prove that if f is differentiable and the derivative is bounded on the interval I , then
f is uniformly continuous on I .

6.2.7. Give an example of a function f and an interval I such that f is differentiable at
every point of I, the derivative of f over I is not bounded, but f is uniformly continuous
over I.

6.2.8. Is sin(1/x) uniformly continuous on (0, 1)? Justify your answer.

6.2.9. Is sin(1/x) uniformly continuous on (1, ∞)? Justify your answer.


6.2.10. Prove that if every approximating sum for the integral $\int_a^b f(x)\,dx$ is bounded below
by m and above by M,
\[ m \le \sum_{j=1}^{n} f(x_{j-1})\,(x_j - x_{j-1}) \le M, \]
then the integral must also be bounded below by m and above by M.


6.2.11. Use the integral form of the mean value theorem to prove that if f is continuous
over [a, b], then
\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x) \tag{6.64} \]
for every x ∈ [a, b].

6.2.12. Discuss whether or not equation (6.64) is valid at a point where f is not continuous.

6.3 The Riemann Integral


A more useful definition of integration was given by Bernhard Riemann in “Über die
Darstellbarkeit einer Function durch eine trigonometrische Reihe.” As mentioned in
section 5.1, this was written after the summer of 1852 when Riemann had discussed questions
of Fourier series with Dirichlet. Its purpose was nothing less than to find necessary and
sufficient conditions for a function to have a representation as a trigonometric series. Riemann
never published it, probably because it raised many new questions that he was hoping to
answer. It appeared in 1867, after his death.
Riemann begins with a summary of the history of the subject, describing the contributions
from d’Alembert, Euler, Bernoulli, and Lagrange and the questions that arose concerning
the validity of a trigonometric expansion for arbitrary functions. He discusses Fourier’s
contributions and Dirichlet’s proof, emphasizing Dirichlet’s recognition of the distinction
between absolute and conditional convergence. This is where the Riemann rearrangement
theorem is stated, not as a theorem but as an observation. He points out the difficulty with
Fourier series: that in general the convergence will not be absolute.
This is followed by a list of the assumptions that Dirichlet needed to impose on a function
in order to prove that it did have representation as a trigonometric series:
I. it must be integrable,
II. at each point of discontinuity, its value must be the average of the limit from the left
and the limit from the right,
III. it must be piecewise continuous, bounded, and piecewise monotonic.
The second condition is essential. We have seen that the Fourier series cannot equal the
original function at any point where this is not true. The third assumption is not as clearly
necessary. Most of Riemann’s work involved probing how far the third assumption could
be weakened.

The Riemann Integral


The first task is to clarify the meaning of the integral. Cauchy’s definition was adequate
for proving that any bounded continuous function is integrable. It is also sufficient for
a demonstration that any bounded piecewise continuous function is integrable. Riemann
wished to consider even more general functions, functions with infinitely many discontinuities
within any finite interval. His definition is very similar to Cauchy’s. Like Cauchy,
he uses approximating sums:
\[ \int_a^b f(x)\,dx \approx \sum_{j=1}^{n} f(x^*_{j-1})\,(x_j - x_{j-1}). \]
Unlike Cauchy, who evaluated the function f at the left-hand endpoint of each interval,
Riemann allows approximating sums in which $x^*_{j-1}$ can be any point in the interval
$[x_{j-1}, x_j]$. Because of this extra freedom, it appears more difficult to guarantee convergence
of these sums. In fact, for bounded functions Riemann’s definition is equivalent to
Cauchy’s. Cauchy wanted to be able to prove that any continuous function is integrable.
Riemann was interested in seeing how discontinuous a function could be and still remain
integrable. As he realized, to be tied to the left-hand endpoints obscures what is happening
in general. Riemann’s definition, in the language of the ε–δ game, is the following.

Definition: integration (Riemann)


A function f is said to be Riemann integrable over the interval [a, b] and its integral
has the value V provided that the following condition is satisfied. Given any specified
error bound ε, there must be a response δ such that for any partition
\[ a = x_0 < x_1 < x_2 < \cdots < x_n = b \]
where each of the subintervals has length less than δ,
\[ |x_j - x_{j-1}| < \delta \quad \text{for all } j, \]
and for any set of values $x^*_0 \in [x_0, x_1]$, $x^*_1 \in [x_1, x_2]$, . . . , $x^*_{n-1} \in [x_{n-1}, x_n]$, the
corresponding approximating sum will lie within ε of the value V,
\[ \left| \sum_{j=1}^{n} f(x^*_{j-1})\,(x_j - x_{j-1}) - V \right| < \epsilon. \]
The value of the integral is denoted by
\[ V = \int_a^b f(x)\,dx. \]

What Riemann gains in allowing $x^*_{j-1}$ to take on any value in $[x_{j-1}, x_j]$ is greater
flexibility. In particular, it enables him to establish necessary and sufficient conditions for
the existence of the integral.
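The extra freedom in Riemann's sums is easy to see in code: the evaluation points are passed in separately from the partition. The sketch below is illustrative; the names `riemann_sum` and `random_tags`, and the choice f(x) = x² on [0, 1] with 1000 equal subintervals, are assumptions made for the example.

```python
import random


def riemann_sum(f, points, tags):
    """Approximating sum with one evaluation point (tag) chosen in each subinterval."""
    return sum(f(t) * (hi - lo) for t, lo, hi in zip(tags, points, points[1:]))


def random_tags(points, rng):
    """Pick an arbitrary point of each subinterval [x_{j-1}, x_j]."""
    return [rng.uniform(lo, hi) for lo, hi in zip(points, points[1:])]


def square(x):
    return x * x


points = [k / 1000 for k in range(1001)]
left = riemann_sum(square, points, points[:-1])   # Cauchy's choice: left-hand endpoints
right = riemann_sum(square, points, points[1:])   # right-hand endpoints
mixed = riemann_sum(square, points, random_tags(points, random.Random(1)))
print(left, mixed, right)
```

Because x² is increasing on [0, 1], every choice of tags gives a sum between the left-endpoint and right-endpoint values, and all three are close to 1/3.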

Necessary and Sufficient Conditions


If we want to prove that the integral exists without knowing its value, then we are again
thrown back on the Cauchy criterion. Given an error bound ε, we must show that we
have a response δ with the property that any two approximating sums with subintervals of
length less than δ must differ from each other by less than ε. As Cauchy did in proving the
integrability of continuous functions, it is enough if each sum can be brought within ε/2 of
any approximating sum that uses the common refinement.
Let $P_1 = \{a, x_1, x_2, \ldots, x_{n-1}, b\}$ be a partition of [a, b] and let $P_2$ be a refinement of
$P_1$. As before, we take each interval $[x_{j-1}, x_j]$ whose endpoints are consecutive points of
$P_1$ and denote its partition in $P_2$ by
\[ P_{1,j} = \{x_{j-1} = x_{j,0}, x_{j,1}, x_{j,2}, \ldots, x_{j,d_j} = x_j\}. \]
The partition $P_2$ is the union of these partitions of the subintervals,
\[ P_2 = \bigcup_{j=1}^{n} P_{1,j}. \]
We need to show that we can force
\[ \left| \sum_{j=1}^{n} f(x^*_{j-1})\,(x_j - x_{j-1}) - \sum_{j=1}^{n} \sum_{k=1}^{d_j} f(x^{**}_{j,k-1})\,(x_{j,k} - x_{j,k-1}) \right| < \epsilon/2 \]

by establishing a bound on $x_j - x_{j-1}$. Beyond the fact that $x^*_{j-1} \in [x_{j-1}, x_j]$ and
$x^{**}_{j,k-1} \in [x_{j,k-1}, x_{j,k}] \subseteq [x_{j-1}, x_j]$, there is no necessary relationship between $x^*_{j-1}$ and $x^{**}_{j,k-1}$.
We assume that f is bounded on [a, b], and we let $M_j$ be the least upper bound of the
values of f(x) for $x \in [x_{j-1}, x_j]$, $m_j$ be the greatest lower bound. We define the variation
of f on $[x_{j-1}, x_j]$ to be
\[ D_j = M_j - m_j. \]
It follows that
\begin{align*}
\left| f(x^*_{j-1})\,(x_j - x_{j-1}) - \sum_{k=1}^{d_j} f(x^{**}_{j,k-1})\,(x_{j,k} - x_{j,k-1}) \right|
  &= \left| \sum_{k=1}^{d_j} f(x^*_{j-1})\,(x_{j,k} - x_{j,k-1}) - \sum_{k=1}^{d_j} f(x^{**}_{j,k-1})\,(x_{j,k} - x_{j,k-1}) \right| \\
  &\le \sum_{k=1}^{d_j} \left| f(x^*_{j-1}) - f(x^{**}_{j,k-1}) \right| (x_{j,k} - x_{j,k-1}) \\
  &\le \sum_{k=1}^{d_j} D_j\,(x_{j,k} - x_{j,k-1}) \\
  &= D_j\,(x_j - x_{j-1}). \tag{6.65}
\end{align*}
This is an upper bound that we can approach as closely as we please by taking the refinement
to be the original partition, $P_2 = P_1$, choosing $x^*_{j-1}$ so that $f(x^*_{j-1})$ is close to $M_j$, and
choosing $x^{**}_{j,0}$ so that $f(x^{**}_{j,0})$ is close to $m_j$.
This means that we have integrability if and only if there is a response δ such that for
any partition with subintervals of length less than δ,
\[ D_1 (x_1 - x_0) + D_2 (x_2 - x_1) + \cdots + D_n (x_n - x_{n-1}) < \epsilon/2, \tag{6.66} \]
where $D_j$ is the variation of f on the interval $[x_{j-1}, x_j]$. From here, Riemann derived the
following theorem.
Theorem 6.10 (Conditions for Riemann Integrability). Let f be a bounded function
on [a, b]. This function is integrable over [a, b] if and only if for any pair of positive
numbers (ν, σ), there is a δ such that for any partition of [a, b] with subintervals of
length less than δ, the subintervals on which the variation is ≥ σ have a combined
length that is < ν.

For example, if $f(x) = x^2$ on [0, 1] and we are asked to respond to the challenge
ν = 0.3, σ = 0.23, we must find a δ such that any partition of [0, 1] into subintervals
of length less than δ results in subintervals of combined length less than 0.3 on which
the variation is greater than 0.23. The response δ = 0.2 will not work. For the partition
{0, 0.2, 0.4, 0.6, 0.8, 1}, there are two subintervals on which the variation exceeds 0.23. On
[0.6, 0.8] the variation is $0.8^2 - 0.6^2 = 0.28$, and on [0.8, 1] the variation is $1^2 - 0.8^2 = 0.36$.
The combined length of these intervals is 0.4, and this is larger than ν = 0.3. There
is a response, however, and it is left as an exercise to find one.
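The bookkeeping in this example is mechanical enough to sketch in code. The function name `bad_length` is an assumption made for this illustration, and the computation of the variation as the difference of the endpoint values relies on x² being increasing on [0, 1]; it would not be correct for a general function.

```python
def bad_length(points, sigma):
    """Combined length of the subintervals on which f(x) = x**2 varies by more than sigma."""
    total = 0.0
    for left, right in zip(points, points[1:]):
        variation = right ** 2 - left ** 2  # x**2 is increasing on [0, 1]
        if variation > sigma:
            total += right - left
    return total


partition = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(bad_length(partition, 0.23))
```

For this partition the result is 0.4 up to rounding, which exceeds ν = 0.3 and confirms that this partition defeats the proposed response.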
This theorem implies that Dirichlet’s function
\[ f(x) = \begin{cases} 1, & \text{if $x$ is rational,} \\ 0, & \text{if $x$ is irrational} \end{cases} \]
is not Riemann integrable over [0, 2]. Every subinterval contains both rational and irrational
numbers. If we are challenged with ν = 1/2, σ = 1/3, then every subinterval has variation
equal to 1 no matter how short it might be. The sum of the lengths of the subintervals with
variation larger than 1/3 is 2, which is larger than 1/2.
On the other hand, $\int_0^1 \sin(1/x)\,dx$ does exist. Once we are given ν and σ, we choose
some point α between 0 and ν. The function sin(1/x) is uniformly continuous on the
interval [α, 1], and so we can choose a δ so that on each subinterval of [α, 1] the variation
is less than σ . It follows that all of the subintervals with variation larger than σ lie inside
[0, α + δ), and so the sum of their lengths is less than α + δ. If we also restrict δ so that
α + δ is less than ν, then the sum of the lengths of the subintervals with variation larger
than σ will be less than ν.

Proof: We assume that f is integrable and so $D_1 (x_1 - x_0) + D_2 (x_2 - x_1) + \cdots + D_n (x_n - x_{n-1})$
can be made arbitrarily small by taking a partition with sufficiently short subintervals.
Given ν and σ and a partition of [a, b], let s be the sum of the lengths of the subintervals
on which the variation is larger than σ. We have that
\[ \sigma s < D_1 (x_1 - x_0) + D_2 (x_2 - x_1) + \cdots + D_n (x_n - x_{n-1}). \tag{6.67} \]
We choose our δ so that the right side of this inequality is less than σν. This implies that
σs < σν, and so s is less than ν.
In the other direction, we assume that there is such a δ for any choice of ν and σ. Let
D be the variation of f over the entire interval [a, b] so that each $D_j$ is less than or equal
to D. Those subintervals with variation greater than σ contribute at most Dν, while the
subintervals with variation less than σ contribute at most σ(b − a). It follows that
\[ D_1 (x_1 - x_0) + D_2 (x_2 - x_1) + \cdots + D_n (x_n - x_{n-1}) < D\nu + (b - a)\sigma. \]
If we choose ν = ε/(4D) and σ = ε/(4(b − a)), then
\[ D_1 (x_1 - x_0) + D_2 (x_2 - x_1) + \cdots + D_n (x_n - x_{n-1}) < \epsilon/2. \]

Q.E.D.

Improper Integrals

Definition: improper integral An integral is improper if either the function that is
being integrated or the interval over which the function is integrated is unbounded.
Strictly speaking, the Riemann integral does not exist in either case. However, there
may be a value that can be assigned to such an integral by taking a limit of integrals
that are Riemann integrable.

Riemann’s definition only applies to bounded functions on closed, bounded intervals.
Cauchy had shown how to integrate unbounded functions. Riemann’s treatment is exactly
the same. If f(x) is unbounded as x approaches c for some c ∈ [a, b], then he defines
\[ \int_a^b f(x)\,dx = \lim_{\epsilon_1 \to 0^-} \left( \int_a^{c+\epsilon_1} f(x)\,dx \right) + \lim_{\epsilon_2 \to 0^+} \left( \int_{c+\epsilon_2}^{b} f(x)\,dx \right). \tag{6.68} \]
Both limits must exist independently.

For example,
\begin{align*}
\int_{-1}^{1} \frac{dx}{x} &= \lim_{\epsilon_1 \to 0^-} \left( \int_{-1}^{\epsilon_1} \frac{dx}{x} \right) + \lim_{\epsilon_2 \to 0^+} \left( \int_{\epsilon_2}^{1} \frac{dx}{x} \right) \\
  &= \lim_{\epsilon_1 \to 0^-} \left( \ln |\epsilon_1| \right) - \lim_{\epsilon_2 \to 0^+} \left( \ln \epsilon_2 \right),
\end{align*}
which does not exist. On the other hand,
\[ \int_0^1 \frac{dx}{\sqrt{x}} = \lim_{\epsilon \to 0^+} 2\sqrt{x}\,\Big|_{\epsilon}^{1} = \lim_{\epsilon \to 0^+} \left( 2 - 2\sqrt{\epsilon} \right) = 2. \]

See page 138 for a discussion of integration over an unbounded domain.
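The requirement that both limits exist independently can be illustrated with the two examples above. The sketch below uses the closed-form antiderivatives rather than numerical integration; the function names `truncated_inv_sqrt` and `truncated_inv_x` are assumptions made for this illustration.

```python
import math


def truncated_inv_sqrt(eps):
    """Value of the integral of 1/sqrt(x) over [eps, 1], namely 2 - 2 sqrt(eps)."""
    return 2.0 - 2.0 * math.sqrt(eps)


def truncated_inv_x(eps1, eps2):
    """Integral of 1/x over [-1, eps1] plus the integral over [eps2, 1], with eps1 < 0 < eps2."""
    return math.log(abs(eps1)) - math.log(eps2)


for eps in (1e-2, 1e-4, 1e-8):
    # Same cutoffs on both sides, then a right-hand cutoff that shrinks faster.
    print(eps, truncated_inv_sqrt(eps), truncated_inv_x(-eps, eps), truncated_inv_x(-eps, eps * eps))
```

The first column of values settles toward 2. For 1/x, cutting off symmetrically always gives 0, but letting the right-hand cutoff shrink faster gives values that grow without bound, so no value can be assigned independently of how the two cutoffs approach 0.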

Integrability with Infinitely Many Discontinuities


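One of the surprising results that Riemann produces is an example of an integrable function
with infinitely many discontinuities between 0 and 1. He defines the function
\[ ((x)) = \begin{cases}
x - \lfloor x \rfloor, & \lfloor x \rfloor \le x < \lfloor x \rfloor + 1/2, \\
0, & x = \lfloor x \rfloor + 1/2, \\
x - \lfloor x \rfloor - 1, & \lfloor x \rfloor + 1/2 < x < \lfloor x \rfloor + 1,
\end{cases} \tag{6.69} \]
(see Figure 6.6). He then defines
\[ f(x) = \sum_{n=1}^{\infty} \frac{((nx))}{n^2}. \tag{6.70} \]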
FIGURE 6.6. Graph of y = ((x)).


Since |((nx))| < 1/2, this series converges for all x. It has a discontinuity whenever nx is
half of an odd integer, and that will happen for every x that is a rational number with an
even denominator (see Figure 6.7).
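Specifically, if x = a/2b where a is odd and a and b are relatively prime, and if n is an
odd multiple of b, then
\[ ((na/2b + 0)) - ((na/2b)) = -1/2 \quad \text{and} \quad ((na/2b - 0)) - ((na/2b)) = 1/2. \]
We want to be able to assert that
\begin{align*}
f\left( \frac{a}{2b} + 0 \right) - f\left( \frac{a}{2b} \right)
  &= \sum_{\substack{m=1 \\ m \text{ odd}}}^{\infty} \frac{-1/2}{(mb)^2} \\
  &= \frac{-1}{2b^2} \left( 1 + \frac{1}{9} + \frac{1}{25} + \cdots \right) \\
  &= \frac{-\pi^2}{16 b^2}, \tag{6.71} \\
f\left( \frac{a}{2b} - 0 \right) - f\left( \frac{a}{2b} \right)
  &= \sum_{\substack{m=1 \\ m \text{ odd}}}^{\infty} \frac{1/2}{(mb)^2} \\
  &= \frac{1}{2b^2} \left( 1 + \frac{1}{9} + \frac{1}{25} + \cdots \right) \\
  &= \frac{\pi^2}{16 b^2}. \tag{6.72}
\end{align*}
The first line of these equalities assumes that we can interchange limits, that
\begin{align*}
f(x + 0) - f(x) &= \lim_{\nu \to 0^+} \sum_{n=1}^{\infty} \frac{((nx + n\nu)) - ((nx))}{n^2} \\
  &= \sum_{n=1}^{\infty} \lim_{\nu \to 0^+} \left( \frac{((nx + n\nu)) - ((nx))}{n^2} \right). \tag{6.73}
\end{align*}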
FIGURE 6.7. Graph of $y = \sum_{n=1}^{\infty} ((nx))/n^2$.

The justification of this interchange rests on the uniform convergence of our series over the
set of all x and is left as exercise 6.3.16.
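The jumps in (6.71) and (6.72) can be observed numerically at x = 1/2, which corresponds to a = 1 and b = 1. The sketch below is an illustration only: it truncates the series at 2000 terms and replaces the one-sided limits by evaluation at 1/2 ± 10⁻⁹, both of which are assumptions of the example, as are the names `saw` and `f_partial`.

```python
import math


def saw(x):
    """Riemann's ((x)): x minus the nearest integer, with the value 0 at half-integers."""
    r = x - math.floor(x)
    if r < 0.5:
        return r
    if r == 0.5:
        return 0.0
    return r - 1.0


def f_partial(x, terms=2000):
    """Partial sum of the series in (6.70)."""
    return sum(saw(n * x) / n ** 2 for n in range(1, terms + 1))


h = 1e-9
right_jump = f_partial(0.5 + h) - f_partial(0.5)
left_jump = f_partial(0.5 - h) - f_partial(0.5)
print(right_jump, left_jump, math.pi ** 2 / 16)
```

The two differences come out close to −π²/16 and π²/16, so the total variation at x = 1/2 is close to π²/8.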
Our function f has a discontinuity at every rational number with an even denominator,
but it is integrable. Given ν and σ , there are only finitely many rational numbers between
0 and 1 at which the variation is larger than σ . If the variation is larger than σ at x = a/2b,
then b must satisfy
\[ \frac{\pi^2}{8b^2} > \sigma, \]
which means that b is a positive integer less than $\pi/\sqrt{8\sigma}$. If there are N such rational
numbers, then we choose our response δ so that N δ is less than ν and so that the variation
is less than σ on every other subinterval.

Fourier Series
Now that he has settled the problem of integrability, Riemann moves to the main theme
of his paper. He points out that previous work had focused on the question: when is
a function representable by a trigonometric series? Characteristic of his insightfulness,
he realizes that the question needs to be reversed. “We must proceed from the inverse
question: if a function is representable by a trigonometric series, what consequences does
this have for its behavior, for the variation of its value with the continuous variation of the
argument?”
This shift of focus enabled him to find necessary and sufficient conditions for a function
to be representable as a trigonometric series. If f is a convergent trigonometric series,
then there exists a function F for which f is the second derivative of F . Furthermore, for
arbitrary constants a, b, and c and any function λ that is continuous on [b, c] and zero at b
and at c, whose derivative is continuous on [b, c] and zero at b and at c, and whose second
derivative is piecewise monotonic on [b, c], we must have that

\[ \lim_{\mu \to \infty} \mu^2 \int_b^c F(x) \cos \mu(x - a)\,\lambda(x)\,dx = 0. \tag{6.74} \]

An example of a function that fails these conditions and thus does not have a Fourier series
representation is
\[ f(x) = \frac{d}{dx} \left[ x^{\nu} \cos(1/x) \right] = \nu x^{\nu - 1} \cos(1/x) + x^{\nu - 2} \sin(1/x), \]
where ν is any constant between 0 and 1/2.
Riemann then proves that these conditions are not only necessary, they are sufficient. If
there exists a function F satisfying equation (6.74) with the conditions described above,
then f has a representation as a trigonometric series.
Riemann opened new worlds of possibilities: integrable functions whose Fourier series
do not converge, convergent trigonometric series whose sum is not integrable, trigonometric
series that converge only at rational values of x or are unbounded in any open interval,
functions that are continuous at every point but that lack a derivative at any point.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

6.3.1. While the Riemann and Cauchy definitions of integration are equivalent for bounded
functions, they are not entirely equivalent for unbounded functions. Show that

$$f(x) = \begin{cases} 1/\sqrt{|x|}, & -1 \le x < 0, \\ 0, & x = 0, \end{cases}$$

is integrable over [−1, 0] in the Cauchy sense but not in the Riemann sense.

6.3.2. Prove that if f is continuous on the closed and bounded interval [a, b], then f is
Riemann integrable over [a, b].

6.3.3. Prove that if f is Riemann integrable over [a, b], then it also satisfies Cauchy’s
definition of integrability.

6.3.4. When looking for a response to a (ν, σ ) challenge—to find a δ so that for any partition
of [a, b] with subintervals of length less than δ, ν is larger than the sum of the lengths of
the subintervals with variation larger than σ —it is important to realize that shifting the
partition can affect the sum of the lengths of the intervals on which the variation exceeds
σ. For the example given on page 251, $f(x) = x^2$ on [0, 1], ν = 0.3, σ = 0.23, we saw
that if the partition is

0 < 0.2 < 0.4 < 0.6 < 0.8 < 1,


then the sum of the lengths is 0.4. Find the sum of the lengths of the subintervals on which
the variation exceeds 0.23 for each of the following partitions in which the subintervals
still have length ≤ 0.2:

0 < 0.1 < 0.3 < 0.5 < 0.7 < 0.9 < 1,
0 < 0.15 < 0.35 < 0.55 < 0.73 < 0.88 < 1,
0 < 0.13 < 0.33 < 0.53 < 0.72 < 0.87 < 1.

6.3.5. Continuing the previous exercise, find a partition of [0, 1] into subintervals of length
less than or equal to 0.2 that maximizes the sum of the lengths of the subintervals on which
the variation equals or exceeds 0.23.

6.3.6. Continuing the previous two exercises, find a response δ to the challenge ν =
0.3, σ = 0.23. Prove that you have satisfied the challenge for any partition of [0, 1] into
subintervals of length less than δ.

6.3.7. Prove or disprove that the function f defined by

$$f(x) = \begin{cases} 1/q, & \text{if } x = p/q \text{ is rational}, \\ 0, & \text{if } x \text{ is irrational}, \end{cases}$$

is Riemann integrable over [0, 1]. We define f(0) = f(1) = 1.

6.3.8. Prove or disprove that the function g defined by

$$g(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor$$

is Riemann integrable over [0, 1].

6.3.9. Let h be defined by

$$h(x) = \begin{cases} 1, & x = 1/n,\ n \in \mathbb{N}, \\ 0, & \text{otherwise}. \end{cases}$$

Prove that h is Riemann integrable over [0, 1] and that $\int_0^1 h(x)\,dx = 0$.

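6.3.10. Using Riemann integrals of suitably chosen functions, find the following limits.

a. $\displaystyle \lim_{n\to\infty} \frac{1}{n}\left(e^{1/n} + e^{2/n} + e^{3/n} + \cdots + e^{n/n}\right)$

b. $\displaystyle \lim_{n\to\infty} \frac{1}{n^3}\left(1^2 + 2^2 + 3^2 + \cdots + n^2\right)$

c. $\displaystyle \lim_{n\to\infty} \frac{1}{n^{k+1}}\left(1^k + 2^k + 3^k + \cdots + n^k\right), \quad k \ge 0$

d. $\displaystyle \lim_{n\to\infty} \left(\frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{3n}\right)$

e. $\displaystyle \lim_{n\to\infty} n^2\left(\frac{1}{n^3+1^3} + \frac{1}{n^3+2^3} + \cdots + \frac{1}{n^3+n^3}\right)$

f. $\displaystyle \lim_{n\to\infty} \frac{1}{n}\sqrt[n]{(n+1)(n+2)\cdots(n+n)}$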

6.3.11. Prove that if f is Riemann integrable on [0, 1] and |q| < 1, then

$$\lim_{q\to 1^-} (1-q)\sum_{n=1}^{\infty} q^n f(q^n) = \int_0^1 f(x)\,dx. \tag{6.75}$$

6.3.12. Find a bound (in terms of α > 0) on the size of

$$\frac{d}{dx}\sin(1/x), \quad x \in [\alpha, 1].$$

For the function sin(1/x) on the interval [0, 1], find a response δ to the challenge ν = 0.3,
σ = 0.1.

6.3.13. Use the fact that

$$1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots = \frac{\pi^2}{6}$$

(appendix A.3) to prove that

$$1 + \frac{1}{9} + \frac{1}{25} + \frac{1}{49} + \cdots = \frac{\pi^2}{8}.$$

6.3.14. For the function

$$f(x) = \sum_{n=1}^{\infty} \frac{((nx))}{n^2}$$

defined on page 252, find a response δ to the challenge ν = 0.2, σ = 0.1.



6.3.15. M&M Graph the partial sums

$$f_N(x) = \sum_{n=1}^{N} \frac{((nx))}{n^2}$$

over [0, 1] for N = 10, 100, and 1000.



6.3.16. Prove that $f(x) = \sum_{n=1}^{\infty} ((nx))/n^2$ converges uniformly. Prove that the interchange
of limits in equation (6.73) is allowable.

6.3.17. To find the Fourier expansion of ((nx)) over [−1, 1], we observe that this function
is odd and so $a_k = 0$ for all k. Using exercise 6.1.7 from section 6.1, show that

$$b_k = \begin{cases} 0, & \text{if } 2n \text{ does not divide } k, \\ -(-1)^{k/2n}\,(2n/k\pi), & \text{if } 2n \text{ does divide } k. \end{cases} \tag{6.76}$$

6.3.18. Use the results from exercises 6.3.16 and 6.3.17 to prove that

$$\sum_{n=1}^{\infty} \frac{((nx))}{n^2} = \sum_{k=1}^{\infty} \frac{-\psi(k)}{k^2\pi}\,\sin(2k\pi x), \tag{6.77}$$

where

$$\psi(k) = \sum_{d \mid k} (-1)^d\, d, \tag{6.78}$$

the sum being over all positive integers d that divide evenly into k.

6.3.19. M&M Investigate the function ψ defined in equation (6.78). Calculate its values
up to at least k = 100. How fast does its absolute value grow? When is it positive? What
else can you say about ψ?

6.3.20. Show that if

$$g(x) = \sum_{n=1}^{\infty} \frac{((nx))}{n}, \tag{6.79}$$

then

$$g(1/4) = \frac{1}{4}\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right) = \frac{\pi}{16}.$$

Find an approximate value for g(1/5). Prove that this series converges when x = 1/5.

6.3.21. Prove that the series g of equation (6.79) converges at every rational value of x.
Discuss what you think happens at irrational values of x.

6.4 Continuity without Differentiability


Few mathematical feats have been as surprising as the exhibition of a function that is
continuous at every value and differentiable at none. It illustrates that confusion between
continuity and differentiability is indeed confusion. While differentiability implies conti-
nuity, continuity guarantees nothing about differentiability.
Until well into the 1800s, there was a basic belief that all functions have derivatives,
except possibly at a few isolated points such as one finds with the absolute value function,
|x|, at x = 0. In 1806, Ampère tried to prove the general existence of derivatives. His proof
is difficult to evaluate because it is not clear what implicit assumptions he was making
about what constitutes a function. In 1839 with the publication of J. L. Raabe’s calculus
text, Die Differential- und Integralrechnung, the “theorem” that any continuous function
is differentiable—with the possibility of at most finitely many exceptional points—started
making its way into the standard textbooks.
Bolzano, Weierstrass, and Riemann knew this was wrong. By 1861 Riemann had
introduced into his lectures the function

$$\sum_{n=1}^{\infty} \frac{\sin(n^2 x)}{n^2},$$

claiming that it is continuous at every x but not differentiable for infinitely many values of
x. The convergence of this series is uniform (by the Weierstrass M-test with $M_n = 1/n^2$),
and so it is continuous at every x. Nondifferentiability is harder to prove. It was not until
1916 that G. H. Hardy showed that in any finite interval, no matter how short, there will be
infinitely many values of x for which the derivative does not exist. It was demonstrated in
1970 that there are also infinitely many values at which the derivative does exist. Riemann’s
example—while remarkable—does not go as far as nondifferentiability for all x.
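
The uniform convergence claimed for Riemann's series can be checked numerically. The following is a minimal sketch, not taken from the text: the helper names `riemann_partial` and `tail_bound` are hypothetical, and the bound used is the standard estimate that the sum of 1/n² over n > N is less than 1/N, which does not depend on x.

```python
import math


def riemann_partial(x, N):
    """Partial sum of sin(n^2 x) / n^2 for n = 1, ..., N."""
    return sum(math.sin(n * n * x) / (n * n) for n in range(1, N + 1))


def tail_bound(N):
    """Bound on the neglected tail: sum_{n > N} 1/n^2 < 1/N, independent of x."""
    return 1.0 / N


# The gap between two partial sums stays below the same bound at every x tested.
for x in (0.0, 0.3, 1.0, math.pi):
    gap = abs(riemann_partial(x, 2000) - riemann_partial(x, 1000))
    print(x, gap, tail_bound(1000))
```

Because the bound 1/N is the same for every x, the same N works everywhere, which is exactly what the M-test asserts.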
The faith in the existence of derivatives is illustrated by the reaction to Hermann Han-
kel’s 1870 paper “Untersuchungen über die unendlich oft oszillierenden und unstetigen
Functionen” in which, among other things, he described a general method for creating con-
tinuous functions with infinitely many points of nondifferentiability. J. Hoüel applauded
this result and expressed hope that it would change the current attitude in which “there is no
mathematician today who would believe in the existence of continuous functions without
derivatives.”1 Phillipe Gilbert pounced upon errors and omissions in Hankel’s work and
displayed them “so as to leave no doubt . . . about the inanity of the conclusions.”2
But the tide had turned. Hankel responded with the observation that Riemann’s example
of an integrable function with infinitely many discontinuities implies that its integral,

$$F(x) = \int_0^x \left(\sum_{n=1}^{\infty} \frac{((nt))}{n^2}\right) dt,$$

is necessarily continuous at every x but cannot be differentiable at any of the infinitely
many points where the integrand is not continuous. The real surprise came in 1872 when
Karl Weierstrass showed the Berlin Academy the trigonometric series mentioned at the
end of Chapter 1:

$$f(x) = \sum_{n=0}^{\infty} b^n \cos(a^n \pi x), \tag{6.80}$$


where a is an odd integer, b lies strictly between 0 and 1, and ab is strictly larger than
1 + 3π/2. It is continuous at every value of x and differentiable at none. A flood of examples
followed.

Proving Nondifferentiability
The continuity of Weierstrass’s example is easy. We have uniform convergence from the
M-test with $M_n = b^n$. To see how to prove that a function is not differentiable, we must
first recall what it means to say that it is differentiable. If f is differentiable at $x_0$, then
there is a number, denoted by $f'(x_0)$, for which

$$\left|\frac{f(x_1) - f(x_0)}{x_1 - x_0} - f'(x_0)\right| = |E(x_1, x_0)|$$

can be made as small as desired by taking $x_1$ sufficiently close to $x_0$. What is significant
here is that we must be able to force E to be small not by how we choose $x_1$ but by how

1 As quoted in Medvedev, Scenes from the History of Real Functions, p. 222.


2 As quoted in Hawkins, Lebesgue’s Theory of Integration, p. 45.
we bound it. There must be a response δ so that for all possible values of x1 within δ of x0 ,
E(x1 , x0 ) is smaller than the allowed error.
To prove that f is not differentiable at $x_0$, we must show that no matter how we select a
value for $f'(x_0)$ and how we select our response δ, there is at least one $x_1$ within δ of $x_0$ for
which $E(x_1, x_0)$ is larger than the allowed error. One way of accomplishing this is to show
that for any δ, there is always an $x_1$ within δ of $x_0$ for which

$$\left|\frac{f(x_1) - f(x_0)}{x_1 - x_0}\right|$$

is larger than any prespecified bound. If this ratio is unbounded inside every interval of the
form $(x_0 - \delta, x_0 + \delta)$, then it cannot stay close to any single value of $f'(x_0)$.
We begin with a Fourier series of the form

$$f(x) = \sum_{n=0}^{\infty} b^n \cos(a^n \pi x),$$

where 0 < b < 1 so that it converges uniformly, and look for conditions on a and b that
will imply that

$$\frac{\sum_{n=0}^{\infty} b^n \cos(a^n \pi x_1) - \sum_{n=0}^{\infty} b^n \cos(a^n \pi x_0)}{x_1 - x_0}$$

is unbounded as $x_1$ ranges over any interval of the form $(x_0 - \delta, x_0 + \delta)$. By Theorems 5.4
and 5.5, we can combine the summations and simplify this ratio to

$$\sum_{n=0}^{\infty} b^n\, \frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0}.$$

We make two critical observations. First, given a, $x_0$, and a positive integer m, there will
always be an integer N satisfying

$$1 \le |N - a^m x_0| \le 3/2. \tag{6.81}$$

Second, if we choose $x_1$ so that $a^m x_1 = N$, then $\cos(a^m \pi x_1)$ and $\cos(a^m \pi x_0)$ will have
opposite signs. It follows that

$$\begin{aligned}
\cos(a^m \pi x_1) - \cos(a^m \pi x_0) &= \cos(\pi N) - \cos[\pi N + \pi(a^m x_0 - N)] \\
&= (-1)^N \left\{1 + \cos[\pi(a^m x_0 - N - 1)]\right\},
\end{aligned} \tag{6.82}$$

where

$$1 + \cos[\pi(a^m x_0 - N - 1)] \ge 1. \tag{6.83}$$

If a is an odd integer and n is larger than m, then

$$\begin{aligned}
\cos(a^n \pi x_1) - \cos(a^n \pi x_0) &= \cos(a^{n-m} N \pi) - \cos(a^n \pi x_0) \\
&= (-1)^N \left\{1 + \cos[a^{n-m} \pi (a^m x_0 - N - 1)]\right\},
\end{aligned} \tag{6.84}$$

and

$$1 + \cos[a^{n-m} \pi (a^m x_0 - N - 1)] \ge 0. \tag{6.85}$$

Equations (6.82–6.85) imply that all of the summands in

$$\sum_{n=0}^{\infty} b^n \left[\cos(a^n \pi x_1) - \cos(a^n \pi x_0)\right]$$

from the mth term on have the same sign and that the mth summand has absolute value
greater than or equal to $b^m$:

$$\left|\sum_{n=m}^{\infty} b^n\, \frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0}\right| \ge \frac{b^m}{|x_1 - x_0|}. \tag{6.86}$$

If we replace N by $a^m x_1$ in equation (6.81), we see that

$$|x_1 - x_0| \le \frac{3}{2a^m}. \tag{6.87}$$

As long as we choose m large enough so that $3/(2a^m) < \delta$, there is such an $x_1$ inside our
interval $(x_0 - \delta, x_0 + \delta)$. This upper bound on $|x_1 - x_0|$ combined with equation (6.86)
tells us that

$$\left|\sum_{n=m}^{\infty} b^n\, \frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0}\right| \ge \frac{2}{3}(ab)^m. \tag{6.88}$$

As long as ab is larger than 1, we can find an x1 for which the tail of the series is as large
as we wish.
We are not quite done. We must verify that the first m summands of our series do not
cancel out the value of the tail. We need an upper bound on

$$\left|\sum_{n=0}^{m-1} b^n\, \frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0}\right|.$$

The mean value theorem tells us that there is an $x_2$ between $x_0$ and $x_1$ for which

$$\frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0} = -a^n \pi \sin(a^n \pi x_2). \tag{6.89}$$

Since the absolute value of the sine is bounded by 1, we see that

$$\left|\frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0}\right| \le a^n \pi, \tag{6.90}$$

and therefore

$$\begin{aligned}
\left|\sum_{n=0}^{m-1} b^n\, \frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0}\right|
&\le \sum_{n=0}^{m-1} b^n \left|\frac{\cos(a^n \pi x_1) - \cos(a^n \pi x_0)}{x_1 - x_0}\right| \\
&\le \sum_{n=0}^{m-1} (ab)^n \pi = \pi\, \frac{(ab)^m - 1}{ab - 1} \\
&< \pi\, \frac{(ab)^m}{ab - 1}.
\end{aligned} \tag{6.91}$$

If we choose ab so that

$$\frac{\pi}{ab - 1} < \frac{2}{3},$$

which is the same as saying that

$$ab > 1 + 3\pi/2, \tag{6.92}$$

then the absolute value of the sum of the first m terms will be a strictly smaller multiple of
$(ab)^m$ than is the sum of the tail:

$$\left|\frac{1}{x_1 - x_0} \sum_{n=0}^{\infty} b^n \left[\cos(a^n \pi x_1) - \cos(a^n \pi x_0)\right]\right| > (ab)^m \left(\frac{2}{3} - \frac{\pi}{ab - 1}\right). \tag{6.93}$$

Since m is not bounded and ab is larger than 1, an x1 exists for which this average rate of
change is larger than any predetermined error.

Q.E.D.

As an example, let us take b = 6/7 and a = 7 so that ab = 6 > 1 + 3π/2 ≈ 5.7. Given
δ, we can choose any m for which $7^m > 3/(2\delta)$. We have demonstrated that there will be an
$x_1$ within δ of $x_0$ for which

$$\left|\sum_{n=0}^{\infty} \left(\frac{6}{7}\right)^n \frac{\cos(7^n \pi x_1) - \cos(7^n \pi x_0)}{x_1 - x_0}\right| > 6^m \left(\frac{2}{3} - \frac{\pi}{5}\right) > 0.038 \times 6^m. \tag{6.94}$$
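
The construction in the proof can be followed step by step on a computer. The sketch below is an illustration under stated assumptions, not part of the original argument: the function names `cos_pi` and `witness` are hypothetical, $x_0$ is taken to be rational so that $7^n x_0$ can be reduced modulo 2 exactly before the cosine is evaluated (floating point alone loses all accuracy once $7^n$ is large), and the series is truncated after 400 terms, where the neglected tail is far below rounding error.

```python
from fractions import Fraction
import math


def cos_pi(r):
    """Return cos(pi * r) for an exact rational r, reducing r modulo 2 first."""
    return math.cos(math.pi * float(r % 2))


def witness(x0, delta, a=7, b=Fraction(6, 7), terms=400):
    """Follow the proof: pick m, then N, then x1; return (m, x1, difference quotient)."""
    x0 = Fraction(x0)
    # Smallest m with 3 / (2 a^m) < delta, so that x1 lies within delta of x0.
    m = 1
    while Fraction(3, 2 * a**m) >= delta:
        m += 1
    # An integer N with 1 <= |N - a^m x0| <= 3/2, as in (6.81).
    y = a**m * x0
    base = math.floor(y)
    N = next(k for k in range(base - 2, base + 4)
             if 1 <= abs(k - y) <= Fraction(3, 2))
    x1 = Fraction(N, a**m)
    # Truncated series for f(x1) - f(x0), each cosine evaluated after exact reduction.
    diff = sum(float(b)**n * (cos_pi(a**n * x1) - cos_pi(a**n * x0))
               for n in range(terms))
    return m, x1, diff / float(x1 - x0)


m, x1, quotient = witness(Fraction(1, 3), 0.01)
print(m, x1, abs(quotient), 0.038 * 6**m)
```

Shrinking `delta` forces a larger m, and the guaranteed lower bound $0.038 \times 6^m$ then grows without limit, which is the unboundedness the proof requires.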

Even Weierstrass Could Be Wrong


Even after Weierstrass announced his example of a function that is continuous everywhere
and differentiable nowhere, the question remained of whether a “nice” continuous function
would have to be differentiable. The simplest additional condition would be to insist on
monotonicity or piecewise monotonicity. Several people searched for a proof that a con-
tinuous monotonic function would have to be differentiable at all but finitely many points.
Weierstrass responded with a continuous monotonic function that is not differentiable at
any rational number. He then sought such a function that is not differentiable at any number.
Though he never found one, he believed that they must exist.
Weierstrass was wrong. They do not exist. Henri Lebesgue would prove in 1903 that a
continuous and monotonic function must be differentiable at most points. What he proved
was that the set of points where the function is not differentiable must have measure zero,
where measure zero is a technical restriction on the size of a set. It will be defined in the
next chapter. The rational numbers have measure zero. Even if we include all algebraic
numbers, numbers like √2 that are the roots of polynomials with rational coefficients, we
still have a set of measure zero. It may seem like there are a lot of them, but in a sense that
can be made precise, most real numbers are neither rational nor algebraic.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

6.4.1. Prove that if f(a − 0) ≠ f(a + 0), then

$$F(x) = \int_0^x f(t)\,dt$$

cannot be differentiable at x = a.

6.4.2. M&M The exercises beginning here and continuing through exercise 6.4.10
develop and verify a standard example of an everywhere continuous, nowhere differentiable
function. We begin with the function that assigns to the variable x the distance between x and
the nearest integer. For example: f (2.15) = 0.15, f (1.78) = 0.22, f (1/2) = 1/2, f (3) =
0. Graph this function for −2 ≤ x ≤ 2. The function you get should look like the teeth of
a saw.

6.4.3. M&M Graph the function

$$f_n(x) = \frac{1}{4^n}\, f(4^n x)$$

for n = 2, 3, 4 over the interval $-4^{1-n} \le x \le 4^{1-n}$.

6.4.4. M&M Define a new function F by

$$F(x) = \sum_{n=0}^{\infty} f_n(x). \tag{6.95}$$

This is the function that will be shown to be continuous but never differentiable. Let

$$S_N(x) = \sum_{n=0}^{N} f_n(x)$$

be the partial sums. Graph $S_2(x)$, $S_3(x)$, $S_4(x)$ for −2 ≤ x ≤ 2.

6.4.5. Prove that the series expansion for F given in equation (6.95) converges uniformly
for all x. This implies that F must be a continuous function.
6.4.6. The first step in showing that F is never differentiable is to consider the real number
line divided into intervals that are split at the half integers:

−1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5

Let m be any positive integer. If $4^m x$ and $4^m x + 1/4$ are in different intervals, then
$4^m x$ and $4^m x - 1/4$ must be in the same interval. Define $\sigma_m = 1$ or $-1$ so that $4^m x$ and
$4^m x + \sigma_m/4$ are in the same interval (if $4^m x$ is a half integer, then take $\sigma_m = 1$). Show that

$$f_m(x + \sigma_m/4^{m+1}) - f_m(x) = \pm\frac{\sigma_m}{4^{m+1}}. \tag{6.96}$$

6.4.7. The reason why equation (6.96) works is that x and $x + \sigma_m/4^{m+1}$ lie on the same
edge of the same tooth in the graph of the function $f_m$. Prove that if n ≤ m, then x and
$x + \sigma_m/4^{m+1}$ lie on the same edge of the same tooth in the graph of the function $f_n$, and
then show how this implies that

$$f_n(x + \sigma_m/4^{m+1}) - f_n(x) = \pm\frac{\sigma_m}{4^{m+1}}, \quad n \le m. \tag{6.97}$$

6.4.8. Prove that if n > m, then $f_n(x + \sigma_m/4^{m+1}) = f_n(x)$, and so

$$f_n(x + \sigma_m/4^{m+1}) - f_n(x) = 0, \quad n > m. \tag{6.98}$$

6.4.9. Show that

$$\frac{F(x + \sigma_m/4^{m+1}) - F(x)}{\sigma_m/4^{m+1}} = \sum_{n=0}^{m} \pm 1 = \alpha_m, \tag{6.99}$$

where $\alpha_m$ is even if and only if m is odd.

6.4.10. If F is differentiable at x, then we can find a number, denoted by $F'(x)$, such
that

$$\alpha_m = \frac{F(x + \sigma_m/4^{m+1}) - F(x)}{\sigma_m/4^{m+1}} = F'(x) - E(x + \sigma_m/4^{m+1}, x), \tag{6.100}$$

where E(x + h, x) can be made arbitrarily small by taking h sufficiently small. Explain
why no such $F'(x)$ can exist.

6.4.11. M&M Graph the partial sums

$$f_M(x) = \sum_{n=0}^{M} \left(\frac{6}{7}\right)^n \cos(7^n \pi x)$$

for x in $[-1/7^{M-1}, 1/7^{M-1}]$ with M = 1, 2, and 3.

6.4.12. Let

$$f(x) = \sum_{n=0}^{\infty} \left(\frac{6}{7}\right)^n \cos(7^n \pi x).$$

What is f (0)? How many terms of this series would you have to take in order to be
certain that you are within 0.01 of the correct value of f (x)? The fact that this series is
uniformly convergent implies that there is an answer to this question that is independent
of x.

6.4.13. Find a value of $x_1$ that lies within 0.001 of 0.5 and for which

$$\left|\sum_{n=0}^{\infty} \left(\frac{6}{7}\right)^n \frac{\cos(7^n \pi x_1) - \cos(7^n \pi \times 0.5)}{x_1 - 0.5}\right| > 1{,}000{,}000. \tag{6.101}$$

7
Epilogue

For over a decade, Weierstrass was a voice crying in the wilderness, proclaiming the
importance of uniform convergence and the need for ε–δ proofs. Few understood what he
was saying. But he trained students who spread the seeds of his message. The publication
of Riemann’s treatise on trigonometric series became the catalyst for the acceptance of
Weierstrass’s position into the mathematical canon. By the 1870s, the mathematical world
was abuzz with the questions that emerged from Riemann’s work.
One of these questions was the uniqueness of the trigonometric representation: can
two different trigonometric series be equal at every value of x? Fourier’s proof of the
uniqueness of his series rests on the interchange of integration and summation, of the
ability to integrate an infinite sum by integrating each summand. Weierstrass, Riemann,
and Dirichlet recognized the potential hazards of this approach. Weierstrass knew that
it was legitimate when the series converged uniformly, but some of the most interesting
Fourier series do not converge uniformly.
In 1870, Heinrich Heine introduced the notion of uniform convergence in general.
A trigonometric series converges uniformly in general if there are a finite number of
break points in the interval of the basic period so that on any closed interval that does
not include a break point, the series converges uniformly. Fourier’s cosine series for the
function that is 1 on (−1, 1) is not uniformly convergent, but it is uniformly convergent
in general. Heine proved that if the series converges uniformly in general, then there is no
other trigonometric series that represents that same function. The representation is unique.
Mathematicians became aware of the subtleties of term-by-term integration and uniform
convergence.
They also began to evince unease with Riemann’s definition of the integral. In that same
year, 1870, Hermann Hankel pointed out that if a function is integrable in Riemann’s sense,
then inside any open interval, no matter how small, there is at least one point where the
function is continuous. In modern terminology, the set of points at which the function is
continuous is dense. But there is no reason why a trigonometric series, even if it converges,
need be continuous at any point. The search was on for a more general definition of
integration.
The road to this redefinition of integration leads through set theory and point-set topology
and involves such names as Cantor, Baire, and Borel. The new integral was announced
in 1901 by Henri Lebesgue and became the subject of his doctoral thesis, published a
year later. Whenever a function can be integrated in the Riemann sense, its integral exists
and is the same whether one uses the Riemann or Lebesgue definition. The advantage of
Lebesgue’s definition is that it completely divorces integrability from continuity and so
extends integrability to many more functions.

The Lebesgue Integral


The Lebesgue definition of the integral begins with the notion of the measure of a set.
For an interval, whether open or closed, its measure is simply its length. The measure of a
single point is 0. For other sets, the measure gets more interesting. It is not always defined,
but when the measure of a set S exists, then it can be found by looking at all coverings of
S. A covering of S is a countable collection of open intervals whose union contains every
point in S. For each possible covering, we calculate the sum of the lengths of the intervals
in the covering. When S is measurable, its measure m(S) is the greatest lower bound (or
infimum) of these calculations, taken over all possible coverings:

$$m(S) = \inf_{S \subseteq \bigcup I_n} \left(\sum m(I_n)\right).$$

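To find the measure of the set of rational numbers between 0 and 1, we observe that we
can order them in one-to-one correspondence with the integers:

$$\left(0,\ 1,\ \frac{1}{2},\ \frac{1}{3},\ \frac{2}{3},\ \frac{1}{4},\ \frac{3}{4},\ \frac{1}{5},\ \frac{2}{5},\ \frac{3}{5},\ \frac{4}{5},\ \frac{1}{6},\ \frac{5}{6},\ \frac{1}{7},\ \frac{2}{7},\ \ldots\right).$$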

Given any ε > 0, we can create a covering out of an open interval of length ε/2 containing
0, ε/4 containing 1, ε/8 containing 1/2, ε/16 containing 1/3, and so on. The sum of these
lengths is ε. The greatest lower bound of the sum of the lengths over all coverings of this
set is 0.
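
The covering argument can be made concrete. The sketch below is illustrative only: the names `rationals_01` and `cover` are hypothetical, each interval is centred on its rational (the text only requires that it contain the point), and exact fractions are used so that the total length can be compared with ε without rounding.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals_01():
    """Yield the rationals in [0, 1] in the order 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1


def cover(eps, count):
    """Open intervals of lengths eps/2, eps/4, ... around the first `count` rationals."""
    intervals = []
    for k, r in enumerate(islice(rationals_01(), count), start=1):
        half = eps / 2**(k + 1)          # half of the length eps / 2^k
        intervals.append((r - half, r + half))
    return intervals


eps = Fraction(1, 100)
intervals = cover(eps, 15)
total = sum(hi - lo for lo, hi in intervals)
print(total, total < eps)
```

However many rationals are covered, the total length stays below ε, so the infimum over all coverings can only be 0.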
The set of irrational numbers between 0 and 1 has measure 1. While it is possible to
construct sets that are not measurable, they are very strange creatures indeed.
The characteristic function of a set S, $\chi_S$, is defined to be 1 at any point in S and 0 at
any point not in S:

$$\chi_S(x) = \begin{cases} 1, & x \in S, \\ 0, & x \notin S. \end{cases}$$

The Lebesgue integral starts with characteristic functions. The integral over the interval
[a, b] of $\chi_S$ is the measure of S ∩ [a, b]:

$$\int_a^b \chi_S(x)\,dx = m\left(S \cap [a, b]\right).$$
If a function is a finite linear combination of characteristic functions, its integral is the
appropriate linear combination of the integrals:

$$\int_a^b \left(\sum_{i=1}^{n} a_i\, \chi_{S_i}(x)\right) dx = \sum_{i=1}^{n} a_i\, m\left(S_i \cap [a, b]\right).$$

The hard part comes in showing that any reasonably nice function is the limit of linear
combinations of characteristic functions and that the limit of the corresponding integrals is
well defined.
Not everyone was happy with the direction analysis had taken in the last decades of the
nineteenth century. In 1889, Henri Poincaré wrote

So it is that we see the emergence of a multitude of bizarre functions that seem to


do their best to resemble as little as possible those honest functions that serve a
useful purpose. No longer continuous, or maybe continuous but not differentiable,
etc. More than this, from a logical point of view, it is these strange functions that
are the most common. The functions that one encounters without having searched
for them and which follow simple laws now appear to be no more than a very
special case. Only a small corner remains for them.
In earlier times, when we invented a new function it was for the purpose of
some practical goal. Today, we invent them expressly to show the flaws in our
forefathers’ reasoning, and we draw from them nothing more than that.

More succinctly, Hermite wrote to Stieltjes in 1893, “I turn away with fright and horror
from this lamentable plague of functions that do not have derivatives.”

Why?
Had analysis gone too far? Had it totally divorced itself from reality to wallow in a self-
generated sea of miscreations and meaningless subtleties? Some good mathematicians may
have been alarmed by the direction analysis was taking, but both of the quotes just given
were taken out of context. Poincaré was not disparaging what analysis had become, but
how it was taught. Hermite was not complaining of artificially created functions that lack
derivatives, but of trigonometric series that he had encountered in his explorations of the
Bernoulli polynomials. They were proving intractable precisely because they were not
differentiable.
With characteristic foresight, Riemann put his finger on the importance of these studies:

In fact, [the problem of the representability of a function by trigonometric series]


was completely solved for all cases which present themselves in nature alone,
because however great may be our ignorance about how the forces and states of
matter vary in space and time in the infinitely small, we can certainly assume that
the functions to which Dirichlet’s research did not extend do not occur in nature.
Nevertheless, those cases that were unresolved by Dirichlet seem worthy of atten-
tion for two reasons.
The first is that, as Dirichlet himself remarked at the end of his paper, this subject
stands in the closest relationship to the principles of the infinitesimal calculus
and can serve to bring these principles to greater clarity and certainty. In this
connection its treatment has an immediate interest.
But the second reason is that the applicability of Fourier series is not restricted to
physical researches; it is now also being applied successfully to one area of pure
mathematics, number theory. And just those functions whose representability by
a trigonometric series Dirichlet did not explore seem here to be important.
Riemann was intimately acquainted with the connection between infinite series and
number theory. Legendre had conjectured in 1788 that if a and q are relatively prime
integers, then there are infinitely many primes p for which p − a is a multiple of q. As
an example, there should be infinitely many primes p for which p − 6 is a multiple of 35.
Dirichlet proved Legendre’s conjecture by showing that if we sum the reciprocals of these
primes,

$$\sum \frac{1}{p}, \quad p \text{ is prime and } q \text{ divides } p - a,$$

this series always diverges, irrespective of the choice of a and q (provided they have no
common factor). Not only is there no limit to the number of primes of this form, they are
in some sense quite common.
The methods that Dirichlet introduced to prove this result are extremely powerful and
far-reaching. Riemann himself modified and extended them to suggest how it might be
possible to prove that the number of primes less than or equal to x is asymptotically x/ ln x.
The route he mapped out is tortuous, involving strange series and very subtle questions
of convergence. It was not completely negotiated until the independent proofs of Jacques
Hadamard and Charles de la Vallée Poussin in 1896. They needed all of the machinery of
analysis that was available to them.
Analysis has continued to be a key component of modern number theory. It is more than
a toy for the investigation of primes and Bernoulli numbers. It has emerged as an important
tool for the study of a wide range of discrete systems with interesting structure. It sits at the
heart of the modern methods used by Andrew Wiles and Richard Taylor to attack Fermat’s
last theorem. It plays a critical role in the theoretical constructs of modern physics. As
Riemann foresaw, “just those functions whose representability by a trigonometric series
Dirichlet did not explore seem here to be important.”

Appendix A
Explorations of the Infinite

The four sections of Appendix A lead to a proof of Stirling’s remarkable formula for the
value of n!:

$$n! = n^n e^{-n} \sqrt{2\pi n}\; e^{E(n)}, \tag{A.1}$$

where $\lim_{n\to\infty} E(n) = 0$, and this error term can be represented by the asymptotic series

$$E(n) \sim \frac{B_2}{1 \cdot 2 \cdot n} + \frac{B_4}{3 \cdot 4 \cdot n^3} + \frac{B_6}{5 \cdot 6 \cdot n^5} + \cdots, \tag{A.2}$$

where B1 , B2 , B3 , . . . are rational numbers known as the Bernoulli numbers. Note that we
do not write equation (A.2) as an equality since this series does not converge to E(n) (see
section A.4).
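
A quick numerical check shows how well the first terms of (A.2) track the error term. This is a minimal sketch under stated assumptions: the function names are hypothetical, ln(n!) is obtained from the standard library's `math.lgamma(n + 1)`, and the Bernoulli values $B_2 = 1/6$ and $B_4 = -1/30$ are supplied by hand rather than derived.

```python
import math


def stirling_error(n):
    """E(n) = ln(n!) - ln(n^n e^(-n) sqrt(2 pi n)), from equation (A.1)."""
    return math.lgamma(n + 1) - (n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n))


def two_term_estimate(n):
    """First two terms of (A.2), using B_2 = 1/6 and B_4 = -1/30."""
    return (1 / 6) / (1 * 2 * n) + (-1 / 30) / (3 * 4 * n**3)


for n in (5, 10, 50):
    print(n, stirling_error(n), two_term_estimate(n))
```

Even for small n the two values agree to several decimal places, although the full series does not converge.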
In this first section, we follow John Wallis as he discovers an infinite product that is equal
to π . While his formula is a terrible way to approximate π , it establishes the connection
between n! and π , explaining that curious appearance of π in equation (A.1). In section 2,
we show how Jacob Bernoulli was led to discover his mysterious and pervasive sequence
of numbers by his search for a simple formula for the sum of kth powers. We continue
the applications of the Bernoulli numbers in section 3 where we follow Leonhard Euler’s
development of formulæ for the sums of reciprocals of even powers of the positive integers.
It all comes together in section 4 when we shall prove this identity discovered by Abraham
deMoivre and James Stirling.

A.1 Wallis on π
When Newton said, “If I have seen a little farther than others it is because I have stood on
the shoulders of giants,” one of those giants was John Wallis (1616–1703). Wallis taught
at Cambridge before becoming Savilian Professor of Geometry at Oxford. His Arithmetica
Infinitorum, published in 1655, derives the rule (found also by Fermat) for the integral of a
fractional power of x:

$$\int_0^1 x^{m/n}\,dx = \frac{1}{1 + m/n} = \frac{n}{m+n}. \tag{A.3}$$

We begin our development of Wallis’s formula for π with the observation that π is the
area of any circle with radius 1. If we locate the center of our circle at the origin, then the
quarter circle in the first quadrant has area

$$\frac{\pi}{4} = \int_0^1 (1 - x^2)^{1/2}\,dx. \tag{A.4}$$

This looks very much like the kind of integral Wallis had been studying. Any means of
calculating this integral will yield a means of calculating π. Realizing that he could not
attack it head on, Wallis looked for similar integrals that he could handle. His genius is
revealed in his decision to look at

$$\int_0^1 (1 - x^{1/p})^q\,dx.$$

When q is a small positive integer, we can expand the integrand:

$$\begin{aligned}
\int_0^1 (1 - x^{1/p})^0\,dx &= \int_0^1 dx = 1, \\
\int_0^1 (1 - x^{1/p})^1\,dx &= \int_0^1 (1 - x^{1/p})\,dx
  = 1 - \frac{p}{p+1} = \frac{1}{p+1}, \\
\int_0^1 (1 - x^{1/p})^2\,dx &= \int_0^1 (1 - 2x^{1/p} + x^{2/p})\,dx
  = 1 - \frac{2p}{p+1} + \frac{p}{p+2} = \frac{2}{(p+1)(p+2)}, \\
\int_0^1 (1 - x^{1/p})^3\,dx &= \int_0^1 (1 - 3x^{1/p} + 3x^{2/p} - x^{3/p})\,dx
  = 1 - \frac{3p}{p+1} + \frac{3p}{p+2} - \frac{p}{p+3} = \frac{6}{(p+1)(p+2)(p+3)}.
\end{aligned}$$
A pattern is emerging, and it requires little insight to guess that

$$\begin{aligned}
\int_0^1 (1 - x^{1/p})^4\,dx &= \frac{4!}{(p+1)(p+2)(p+3)(p+4)}, \\
\int_0^1 (1 - x^{1/p})^5\,dx &= \frac{5!}{(p+1)(p+2)(p+3)(p+4)(p+5)}, \\
&\ \ \vdots
\end{aligned}$$

where 4! = 4 · 3 · 2 · 1 = 24, 5! = 5 · 4 · 3 · 2 · 1 = 120, and so on.


These numbers should look familiar. When p and q are both integers, we get reciprocals
of binomial coefficients,

$$\frac{q!}{(p+1)(p+2)\cdots(p+q)} = \frac{1}{\binom{p+q}{q}},$$

where

$$\binom{p+q}{q} = \binom{p+q}{p} = \frac{(p+q)!}{p!\,q!}.$$
This suggested to Wallis that he wanted to work with the reciprocals of his integrals:
2
1
(p + 1)(p + 2) · · · (p + q)
f (p, q) = 1 (1 − x 1/p )q dx = . (A.5)
0 q!

We want to find the value of f(1/2, 1/2) = 4/π. Our first observation is that we can
evaluate f(p, q) for any p so long as q is a nonnegative integer. We use induction on q to
prove equation (A.5). As shown above, when q = 0 we have $f(p, q) = 1/\int_0^1 dx = 1$. In
exercise A.1.2, you are asked to prove that

\[ \int_0^1 \left(1 - x^{1/p}\right)^q dx = \frac{q}{p+q} \int_0^1 \left(1 - x^{1/p}\right)^{q-1} dx. \tag{A.6} \]

With this recursion and the induction hypothesis that

\[ f(p, q-1) = \frac{(p+1)(p+2)\cdots(p+q-1)}{(q-1)!}, \]

it follows that

\[ f(p,q) = \frac{p+q}{q} \cdot \frac{(p+1)(p+2)\cdots(p+q-1)}{(q-1)!} = \frac{(p+1)(p+2)\cdots(p+q)}{q!}. \]

We also can use this recursion to find f(1/2, 3/2) in terms of f(1/2, 1/2):

\[ f(1/2, 3/2) = \frac{1/2 + 3/2}{3/2}\, f(1/2, 1/2) = \frac{4}{3}\, f(1/2, 1/2). \]

We can now prove by induction (see exercise A.1.3) that

\[ f\left(\frac12, q - \frac12\right) = \frac{(2q)(2q-2)\cdots 4}{(2q-1)(2q-3)\cdots 3}\, f\left(\frac12, \frac12\right). \tag{A.7} \]
Table A.1. Values of f(p, q) in terms of □ = 4/π.

p \ q  -1/2     0  1/2       1    3/2        2     5/2        3       7/2         4
-1/2   ∞        1  (1/2)□    1/2  (1/3)□     3/8   (4/15)□    5/16    (8/35)□     35/128
0      1        1  1         1    1          1     1          1       1           1
1/2    (1/2)□   1  □         3/2  (4/3)□     15/8  (8/5)□     35/16   (64/35)□    315/128
1      1/2      1  3/2       2    5/2        3     7/2        4       9/2         5
3/2    (1/3)□   1  (4/3)□    5/2  (8/3)□     35/8  (64/15)□   105/16  (128/21)□   1155/128
2      3/8      1  15/8      3    35/8       6     63/8       10      99/8        15
5/2    (4/15)□  1  (8/5)□    7/2  (64/15)□   63/8  (128/15)□  231/16  (512/35)□   3003/128
3      5/16     1  35/16     4    105/16     10    231/16     20      429/16      35
7/2    (8/35)□  1  (64/35)□  9/2  (128/21)□  99/8  (512/35)□  429/16  (1024/35)□  6435/128
4      35/128   1  315/128   5    1155/128   15    3003/128   35      6435/128    70

What if p is a nonnegative integer or half-integer? The binomial coefficient is symmetric
in p and q,

\[ \binom{p+q}{q} = \binom{p+q}{p} = \frac{(p+q)!}{p!\,q!}. \]

It is only a little tricky to verify that for any values of p and q, we also have that f(p, q) =
f(q, p) (see exercise A.1.4). This can be used to prove that when p and q are positive
integers (see exercise A.1.5),

\[ f\left(p - \frac12, q - \frac12\right) = \frac{4 \cdot 6 \cdot 8 \cdots (2p + 2q - 2)}{3 \cdot 5 \cdots (2p-1) \cdot 3 \cdot 5 \cdots (2q-1)}\, f\left(\frac12, \frac12\right). \tag{A.8} \]

Wallis could now construct a table of values for f(p, q), allowing □ to stand for
f(1/2, 1/2) = 4/π (see Table A.1).
We see that any row in which p is a positive integer is increasing from left to right, and
it is reasonable to expect that the row p = 1/2 is also increasing from left to right (see
exercise A.1.6 for the proof). Recalling that □ = 4/π, this implies a string of inequalities:

\[ 1 < \frac{4}{\pi} < \frac{3}{2} < \frac{16}{3\pi} < \frac{15}{8} < \frac{32}{5\pi} < \frac{35}{16} < \frac{256}{35\pi} < \frac{315}{128}. \]
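These, in turn, yield inequalities for π/2:

\begin{align*}
\frac{4}{3} &< \frac{\pi}{2} < 2, \\
\frac{64}{45} &< \frac{\pi}{2} < \frac{16}{9}, \\
\frac{256}{175} &< \frac{\pi}{2} < \frac{128}{75}, \\
\frac{16384}{11025} &< \frac{\pi}{2} < \frac{2048}{1225}.
\end{align*}

It is easier to see what is happening with these inequalities if we look at our string of
inequalities in terms of the ratios that led us to find them in the first place:

\[ 1 < \frac{4}{\pi} < \frac{3}{2} < \frac{4}{3} \cdot \frac{4}{\pi} < \frac{3}{2} \cdot \frac{5}{4} < \frac{4}{3} \cdot \frac{6}{5} \cdot \frac{4}{\pi} < \frac{3}{2} \cdot \frac{5}{4} \cdot \frac{7}{6} < \cdots. \]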
In general, we have that

\[ \frac{3 \cdot 5 \cdot 7 \cdots (2n-1)}{2 \cdot 4 \cdot 6 \cdots (2n-2)} < \frac{4 \cdot 6 \cdot 8 \cdots (2n)}{3 \cdot 5 \cdot 7 \cdots (2n-1)} \cdot \frac{4}{\pi} < \frac{3 \cdot 5 \cdot 7 \cdots (2n+1)}{2 \cdot 4 \cdot 6 \cdots (2n)}. \tag{A.9} \]

This yields a general inequality for π/2 that we can make as precise as we want by taking
n sufficiently large:

\[ \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n)^2}{1 \cdot 3^2 \cdot 5^2 \cdots (2n-1)^2 \cdot (2n+1)} < \frac{\pi}{2} < \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n-2)^2 \cdot (2n)}{1 \cdot 3^2 \cdot 5^2 \cdots (2n-1)^2}. \tag{A.10} \]

As n gets larger, these bounds on π/2 approach each other. Their ratio is 2n/(2n + 1),
which approaches 1. Wallis therefore concluded that

\[ \frac{\pi}{2} = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdots. \tag{A.11} \]

Note that this product alternately grows and shrinks as we take more terms.
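The bounds in (A.10) are easy to tabulate exactly. The following is a minimal sketch, assuming Python's `fractions` module; the helper name `wallis_bounds` is hypothetical. It builds the lower bound as a product of the paired factors 4k²/(4k² − 1) and obtains the upper bound from the ratio 2n/(2n + 1) noted above.

```python
from fractions import Fraction
import math


def wallis_bounds(n):
    """Return the (lower, upper) bounds on pi/2 from inequality (A.10)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    lower = Fraction(1)
    for k in range(1, n + 1):
        # k-th pair of factors: (2k/(2k-1)) * (2k/(2k+1)) = 4k^2/(4k^2 - 1)
        lower *= Fraction(4 * k * k, 4 * k * k - 1)
    # The ratio lower/upper is 2n/(2n + 1).
    upper = lower * Fraction(2 * n + 1, 2 * n)
    return lower, upper


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 100, 1000):
        lo, hi = wallis_bounds(n)
        print(n, float(lo), float(hi), math.pi / 2)
```

For n = 1, 2, 3, 4 the function returns the four pairs of fractions displayed earlier in this section; for larger n the printed values show how slowly the two bounds close in on π/2.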

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

A.1.1. M&M Consider Wallis’s infinite product for π/2 given in equation (A.11).
a. Show that the product of the kth pair of fractions is $4k^2/(4k^2 - 1)$ and therefore

\[ \pi = 2 \prod_{k=1}^{\infty} \frac{4k^2}{4k^2 - 1}. \]

b. How many terms of this product are needed in order to approximate π to 3-digit
accuracy?
c. We can improve our accuracy if we average the upper and lower bounds. Prove that this
average is

\[ \frac{2^2 \cdot 4^2 \cdots (2n-2)^2 \cdot (2n) \cdot (2n + 1/2)}{1 \cdot 3^2 \cdots (2n-1)^2 \cdot (2n+1)}. \]

How large a value of n do we need in order to approximate π to 3-digit accuracy?

A.1.2. Prove equation (A.6).

A.1.3. Use equation (A.6) to prove equation (A.7) by induction on q.

A.1.4. Prove that

\[ \int_0^1 \left(1 - x^{1/p}\right)^q dx = \int_0^1 \left(1 - x^{1/q}\right)^p dx, \tag{A.12} \]

and therefore f(p, q) = f(q, p).

A.1.5. Using the results from exercises A.1.3 and A.1.4, prove equation (A.8) when p and
q are positive integers.

A.1.6. Prove that if p is positive and $q_1 > q_2$, then

\[ \left(1 - x^{1/p}\right)^{q_1} < \left(1 - x^{1/p}\right)^{q_2}, \]

for all x between 0 and 1, and therefore $f(p, q_1) > f(p, q_2)$. Prove that if p is negative
and $q_1 > q_2$, then $f(p, q_1) < f(p, q_2)$.

A.1.7. M&M We have seen that as long as p or q is an integer, we have that
$f(p, q) = \binom{p+q}{q}$. This suggests a way of defining binomial coefficients when neither p nor q is
an integer. We would expect the value of $\binom{1}{1/2}$ to be f(1/2, 1/2) = 4/π. Using your favorite
computer algebra system, see what value it assigns to $\binom{1}{1/2}$. How should we define $\binom{a}{b}$ for
arbitrary real numbers a and b?

A.1.8. When p and q are integers, we have the relationship of Pascal’s triangle,

f (p, q) = f (p, q − 1) + f (p − 1, q). (A.13)

Does this continue to hold true when p and q are not both integers? Either give an example
where it does not work or prove that it is always true.

A.1.9. Show that if we use the row p = −1/2 to approximate π/2:

\[ 1 > \frac{4}{2\pi} > \frac{1}{2} > \frac{4}{3\pi} > \frac{3}{8} > \frac{16}{15\pi} > \frac{5}{16} > \frac{32}{35\pi} > \frac{35}{128} > \cdots, \]

then we get the same bounds for π/2.
A.1.10. What bounds do we get for π/2 if we use the row p = 3/2?

A.1.11. What bounds do we get for π/2 if we use the diagonal:

\[ 1 < \frac{4}{\pi} < 2 < \frac{32}{3\pi} < 6 < \frac{512}{15\pi} < 20 < \frac{4096}{35\pi} < 70 < \cdots? \]

A.1.12. As far as you can, extend the table of values for f (p, q) into negative values of p
and q.

A.1.13. M&M Use the method of this section to find an infinite product that approaches
the value of

\[ f(2/3, 1/3) = \frac{1}{\int_0^1 \left(1 - x^{3/2}\right)^{1/3} dx}. \]

Compare this to the value of $\binom{1}{1/3}$ given by your favorite computer algebra system.

A.2 Bernoulli’s Numbers


Johann and Jacob Bernoulli were Swiss mathematicians from Basel, two brothers who
played a critical role in the early development of calculus. The elder, Jacob Bernoulli, died
in 1705. Eight years later, his final masterpiece was published, Ars Conjectandi. It laid the
foundations for the study of probability and included an elegant solution to an old problem:
to find formulas for the sums of powers of consecutive integers. He bragged that with it he
had calculated the sum of the tenth powers of the integers up to one thousand,
\[ 1^{10} + 2^{10} + 3^{10} + \cdots + 1000^{10}, \]
in “half a quarter of an hour.”
The formula for the sum of the first k − 1 integers is

\[ 1 + 2 + 3 + 4 + \cdots + (k-1) = \frac{k^2}{2} - \frac{k}{2}. \tag{A.14} \]

The proof is simple:

\[
\begin{array}{rcccccccccl}
 & 1 & + & 2 & + & \cdots & + & (k-2) & + & (k-1) & \\
+ & (k-1) & + & (k-2) & + & \cdots & + & 2 & + & 1 & \\
= & k & + & k & + & \cdots & + & k & + & k & = (k-1)k.
\end{array}
\]

No one knows who first discovered this formula. Its origins are lost in the mists of time.
Even the formula for the sum of squares is ancient,

\[ 1^2 + 2^2 + 3^2 + 4^2 + \cdots + (k-1)^2 = \frac{k^3}{3} - \frac{k^2}{2} + \frac{k}{6}. \tag{A.15} \]

It was known to and used by Archimedes, but was probably even older. It took quite a bit
longer to find the formula for the sum of cubes. The earliest reference that we have to this
formula connects it to the work of Aryabhata of Patna (India) around the year 500 C.E. The
formula for sums of fourth powers was discovered by ibn Al-Haytham of Baghdad about
1000 C.E. In the early 14th century, Levi ben Gerson of France found a general formula
for arbitrary kth powers, though the central idea, which draws on patterns in the binomial
coefficients, can be found in other contemporary and earlier sources: Al-Bahir fi’l Hisab
(Shining Treatise on Calculation) written by al-Samaw’al in 1144 in what is now Iraq,
Siyuan Yujian (Jade Mirror of the Four Unknowns) written by Zhu Shijie in 1303 in China,
and Ganita Kaumudi written by Narayana Pandita in 1356 in India.

Web Resource: To see a derivation and proof of this historic formula for the
sum of kth powers based on properties of binomial coefficients, go to Binomial
coefficients and sums of kth powers.

Jacob Bernoulli may have been aware of the binomial coefficient formula, but that did
not stop him from finding his own. He had a brilliant insight. The new integral calculus
gave efficient tools for calculating limits of summations that today we call Riemann sums.
Perhaps it could be turned to the task of finding formulas for other types of sums.

The Bernoulli Polynomials


Jacob Bernoulli looked for polynomials, $B_1(x), B_2(x), \ldots$, for which

\begin{align*}
1 + 2 + \cdots + (k-1) &= \int_0^k B_1(x)\,dx, \\
1^2 + 2^2 + \cdots + (k-1)^2 &= \int_0^k B_2(x)\,dx, \\
1^3 + 2^3 + \cdots + (k-1)^3 &= \int_0^k B_3(x)\,dx, \\
&\ \,\vdots \\
1^n + 2^n + \cdots + (k-1)^n &= \int_0^k B_n(x)\,dx.
\end{align*}

Such a polynomial must satisfy the equation

\[ \int_k^{k+1} B_n(x)\,dx = k^n. \tag{A.16} \]

In fact, for each positive integer n, there is a unique monic polynomial of degree n (monic
means that the coefficient of $x^n$ is 1) that
satisfies this equation for all values of k, not only when k is a positive integer. It is easiest
to see why this is so by means of an example. We shall set

\[ B_3(x) = x^3 + a_2 x^2 + a_1 x + a_0 \]

and show that there exist unique values for $a_2$, $a_1$, and $a_0$.

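Substituting this polynomial for $B_3(x)$ in equation (A.16), we see that

\begin{align}
k^3 &= \int_k^{k+1} \left(x^3 + a_2 x^2 + a_1 x + a_0\right) dx \notag \\
&= \frac{1}{4}(k+1)^4 + \frac{a_2}{3}(k+1)^3 + \frac{a_1}{2}(k+1)^2 + a_0(k+1) - \frac{1}{4}k^4 - \frac{a_2}{3}k^3 - \frac{a_1}{2}k^2 - a_0 k \notag \\
&= k^3 + \left(\frac{3}{2} + a_2\right)k^2 + (1 + a_2 + a_1)k + \left(\frac{1}{4} + \frac{a_2}{3} + \frac{a_1}{2} + a_0\right). \tag{A.17}
\end{align}

The coefficients of the different powers of k must be the same on both sides:

\begin{align}
0 &= \frac{3}{2} + a_2, \tag{A.18} \\
0 &= 1 + a_2 + a_1, \tag{A.19} \\
0 &= \frac{1}{4} + \frac{a_2}{3} + \frac{a_1}{2} + a_0. \tag{A.20}
\end{align}

These three equations have a unique solution: $a_2 = -3/2$, $a_1 = 1/2$, $a_0 = 0$, and so

\[ B_3(x) = x^3 - \frac{3x^2}{2} + \frac{x}{2}. \tag{A.21} \]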

Integrating this polynomial from 0 to k, we obtain the formula for the sum of cubes:

\begin{align*}
1^3 + 2^3 + \cdots + (k-1)^3 &= \int_0^k \left(x^3 - \frac{3x^2}{2} + \frac{x}{2}\right) dx \\
&= \frac{k^4}{4} - \frac{k^3}{2} + \frac{k^2}{4}.
\end{align*}

The first two Bernoulli polynomials are

\begin{align}
B_1(x) &= x - \frac{1}{2}, \tag{A.22} \\
B_2(x) &= x^2 - x + \frac{1}{6}. \tag{A.23}
\end{align}

We now make an observation that will enable us to construct $B_{n+1}(x)$ from $B_n(x)$. If we
differentiate both sides of equation (A.16) with respect to k, we get:

\[ B_n(k+1) - B_n(k) = n k^{n-1}. \tag{A.24} \]

This implies that

\begin{align}
n\left[0^{n-1} + 1^{n-1} + 2^{n-1} + \cdots + (k-1)^{n-1}\right]
&= [B_n(1) - B_n(0)] + [B_n(2) - B_n(1)] + [B_n(3) - B_n(2)] \notag \\
&\quad + \cdots + [B_n(k) - B_n(k-1)] \notag \\
&= B_n(k) - B_n(0), \tag{A.25}
\end{align}

and therefore

\[ \frac{B_n(k) - B_n(0)}{n} = 1^{n-1} + 2^{n-1} + \cdots + (k-1)^{n-1} = \int_0^k B_{n-1}(x)\,dx. \tag{A.26} \]

Our recursive formula is

\[ B_n(x) = n \int_0^x B_{n-1}(t)\,dt + B_n(0). \tag{A.27} \]

Given that $B_4(0) = -1/30$, we can find $B_4(x)$ by integrating $B_3(x)$, multiplying by 4, and
then adding −1/30:

\[ B_4(x) = 4\left(\frac{x^4}{4} - \frac{x^3}{2} + \frac{x^2}{4}\right) - \frac{1}{30} = x^4 - 2x^3 + x^2 - \frac{1}{30}. \]

If we know the constant term in each polynomial: $B_1(0) = -1/2$, $B_2(0) = 1/6$, $B_3(0) = 0, \ldots$,
then we can successively construct as many Bernoulli polynomials as we wish.
These constants are called the Bernoulli numbers:

\[ B_1 = \frac{-1}{2}, \quad B_2 = \frac{1}{6}, \quad B_3 = 0, \quad B_4 = \frac{-1}{30}, \quad \ldots \]
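The construction in (A.27) can be carried out mechanically. Below is a minimal sketch, assuming a polynomial is stored as a list of `Fraction` coefficients with the constant term first; the function name `next_bernoulli_polynomial` and this list representation are illustrative choices, not part of the text.

```python
from fractions import Fraction


def next_bernoulli_polynomial(prev, n, constant):
    """Apply (A.27): B_n(x) = n * integral_0^x B_{n-1}(t) dt + B_n(0).

    prev     -- coefficients of B_{n-1}, constant term first
    n        -- index of the polynomial being built
    constant -- the Bernoulli number B_n = B_n(0)
    """
    # Integrating c * t^i from 0 to x gives c * x^(i+1) / (i + 1).
    integrated = [Fraction(c) / (i + 1) for i, c in enumerate(prev)]
    return [Fraction(constant)] + [n * c for c in integrated]


if __name__ == "__main__":
    # Start from B_1(x) = x - 1/2 and use the constants B_2, B_3, B_4.
    poly = [Fraction(-1, 2), Fraction(1)]
    constants = {2: Fraction(1, 6), 3: Fraction(0), 4: Fraction(-1, 30)}
    for n in (2, 3, 4):
        poly = next_bernoulli_polynomial(poly, n, constants[n])
        print(n, poly)
```

Each printed list can be read against (A.23), (A.21), and the expression for $B_4(x)$ above, keeping in mind that the coefficients are listed from the constant term upward.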

A Formula for $B_n(x)$
We can do even better. Recalling that $B_1(x) = x + B_1$ and repeatedly using equation (A.27),
we see that

\begin{align*}
B_2(x) &= 2\int_0^x (t + B_1)\,dt + B_2 \\
&= x^2 + 2B_1 x + B_2, \\
B_3(x) &= 3\int_0^x \left(t^2 + 2B_1 t + B_2\right) dt + B_3 \\
&= x^3 + 3B_1 x^2 + 3B_2 x + B_3, \\
B_4(x) &= 4\int_0^x \left(t^3 + 3B_1 t^2 + 3B_2 t + B_3\right) dt + B_4 \\
&= x^4 + 4B_1 x^3 + 6B_2 x^2 + 4B_3 x + B_4, \\
&\ \,\vdots
\end{align*}

A pattern is developing. Our coefficients are precisely the coefficients of the binomial
expansion. Pascal’s triangle has struck again. Once we see it, it is easy to verify by
induction (see exercise A.2.3) that

\begin{align}
B_n(x) &= x^n + nB_1 x^{n-1} + \frac{n(n-1)}{2!} B_2 x^{n-2} + \frac{n(n-1)(n-2)}{3!} B_3 x^{n-3} \notag \\
&\quad + \cdots + nB_{n-1} x + B_n. \tag{A.28}
\end{align}

The only problem left is to find an efficient means of determining the Bernoulli numbers.
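A Recursive Formula for $B_n$

We turn to equation (A.24), set k = 0, and assume that n is larger than one:

\[ B_n(1) - B_n(0) = n \cdot 0^{n-1} = 0. \tag{A.29} \]

We use equation (A.28) to evaluate $B_n(1)$:

\begin{align}
0 &= B_n(1) - B_n \notag \\
&= \left(1 + nB_1 + \frac{n(n-1)}{2!} B_2 + \cdots + \frac{n(n-1)}{2!} B_{n-2} + nB_{n-1} + B_n\right) - B_n, \tag{A.30} \\
B_{n-1} &= \frac{-1}{n}\left(1 + nB_1 + \frac{n(n-1)}{2!} B_2 + \cdots + \frac{n(n-1)}{2!} B_{n-2}\right). \tag{A.31}
\end{align}

It follows that

\begin{align*}
B_5 &= -\frac{1}{6}\left(1 + 6 \cdot \frac{-1}{2} + 15 \cdot \frac{1}{6} + 20 \cdot 0 + 15 \cdot \frac{-1}{30}\right) = 0, \\
B_6 &= -\frac{1}{7}\left(1 + 7 \cdot \frac{-1}{2} + 21 \cdot \frac{1}{6} + 35 \cdot 0 + 35 \cdot \frac{-1}{30} + 21 \cdot 0\right) = \frac{1}{42}.
\end{align*}

Continuing, we obtain

\[ B_7 = 0, \quad B_8 = \frac{-1}{30}, \quad B_9 = 0, \quad B_{10} = \frac{5}{66}, \quad B_{11} = 0, \quad B_{12} = \frac{-691}{2730}. \]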
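Recursion (A.31) translates directly into a short program. The sketch below is one possible rendering, assuming the convention $B_0 = 1$ for the leading term 1 in (A.31); the function name `bernoulli_numbers` is hypothetical, and exact rational arithmetic is used so that the values can be compared with the fractions listed above.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """Return [B_0, B_1, ..., B_m] using recursion (A.31), with B_0 = 1.

    This follows the sign convention of the text, in which B_1 = -1/2.
    """
    B = [Fraction(1)]
    for n in range(2, m + 2):
        # (A.31): B_{n-1} = -(1/n) * sum_{j=0}^{n-2} C(n, j) * B_j
        total = sum(comb(n, j) * B[j] for j in range(n - 1))
        B.append(-total / n)
    return B


if __name__ == "__main__":
    for index, value in enumerate(bernoulli_numbers(12)):
        print(f"B_{index} = {value}")
```

The binomial coefficient `comb(n, j)` supplies the factors n, n(n − 1)/2!, and so on that appear in (A.31).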

Bernoulli’s Calculation
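Equipped with equation (A.26) and the knowledge of $B_1, B_2, \ldots, B_{10}$, we can find the
formula for the sum of the first k − 1 tenth powers:

\begin{align}
\sum_{i=1}^{k-1} i^{10} &= \frac{1}{11}\left[B_{11}(k) - B_{11}\right] \notag \\
&= \frac{1}{11}\left(k^{11} + 11B_1 k^{10} + 55B_2 k^9 + 165B_3 k^8 + 330B_4 k^7 \right. \notag \\
&\qquad\quad \left. +\, 462B_5 k^6 + 462B_6 k^5 + 330B_7 k^4 + 165B_8 k^3 + 55B_9 k^2 + 11B_{10} k\right) \notag \\
&= \frac{1}{11}k^{11} - \frac{1}{2}k^{10} + \frac{5}{6}k^9 - k^7 + k^5 - \frac{1}{2}k^3 + \frac{5}{66}k. \tag{A.32}
\end{align}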
Since it is much easier to take powers of $1000 = 10^3$ than of 1001, let us add $1000^{10} = 10^{30}$
to the sum of the tenth powers of the integers up to 999:

\begin{align*}
&1^{10} + 2^{10} + 3^{10} + \cdots + 1000^{10} \\
&\quad = 10^{30} + \frac{1}{11}10^{33} - \frac{1}{2}10^{30} + \frac{5}{6}10^{27} - 10^{21} + 10^{15} - \frac{1}{2}10^{9} + \frac{5}{66}10^{3}.
\end{align*}

This is a simple problem in arithmetic:

   1 00000 00000 00000 00000 00000 00000.00
+ 90 90909 09090 90909 09090 90909 09090.90
−    50000 00000 00000 00000 00000 00000.00
+       83 33333 33333 33333 33333 33333.33
−             10 00000 00000 00000 00000.00
+                    1 00000 00000 00000.00
−                             5000 00000.00
+                                     75.75
  ─────────────────────────────────────────
  91 40992 42414 24243 42424 19242 42500
Seven and a half minutes is plenty of time.
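Bernoulli's boast is easy to check today. The following sketch, with hypothetical function names, evaluates the sum through (A.26) and (A.28) and compares it with a brute-force sum; it recomputes the Bernoulli numbers from (A.31) so that it is self-contained.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] from recursion (A.31), with B_0 = 1, B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(2, m + 2):
        B.append(-sum(comb(n, j) * B[j] for j in range(n - 1)) / n)
    return B


def power_sum(p, k):
    """Return 1^p + 2^p + ... + (k-1)^p via (A.26) and (A.28)."""
    n = p + 1
    B = bernoulli_numbers(n)
    # B_n(k) - B_n: every term of (A.28) except the constant term B_n.
    difference = sum(comb(n, j) * B[j] * k ** (n - j) for j in range(n))
    return difference / n


if __name__ == "__main__":
    via_formula = power_sum(10, 1000) + 1000 ** 10
    brute_force = sum(i ** 10 for i in range(1, 1001))
    print(via_formula)
    print(via_formula == brute_force)
```

Adding `1000 ** 10` at the end mirrors the step in the text, where the formula is evaluated at k = 1000 and the final term is added separately.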

Fermat’s Last Theorem


The Bernoulli numbers will make appearances in each of the next two sections. Once they
were discovered, mathematicians kept finding them again, and again, and again. One of the
more surprising places that they turn up is in connection with Fermat’s last theorem.
After studying Pythagorean triples, triples of positive integers satisfying

\[ x^2 + y^2 = z^2, \]

Pierre de Fermat pondered the question of whether such triples could exist when the
exponent was larger than 2. He came to the conclusion that no such triples exist, but never
gave a proof. It should be noted that if there is no solution to

\[ x^n + y^n = z^n, \]

then there can be no solution to

\[ x^{mn} + y^{mn} = z^{mn}, \]

because if x = a, y = b, z = c were a solution to the second equation, then $x = a^m$,
$y = b^m$, $z = c^m$ would be a solution to the first. If we want to prove Fermat’s statement,
then it is enough to prove that there are no solutions when n = 4 and no solutions when n
is an odd prime.
The case n = 4 can be handled by methods described by Fermat. Euler essentially proved
the case n = 3 in 1753. His proof was flawed, but his approach was correct. Fermat’s
“theorem” for n = 5 came in pieces. Sophie Germain (1776–1831), one of the first women
to publish mathematics, showed that if a solution exists, then either x, y, or z must be
divisible by 5. Gustav Lejeune Dirichlet made his mark on the mathematical scene when,
in 1825 at the age of 20, he proved that the variable divisible by 5 cannot be even. In the
same year, Adrien Marie Legendre, then in his 70’s, picked up Dirichlet’s analysis and
carried it forward to show the general impossibility of the case n = 5. Gabriel Lamé settled
n = 7 in 1839.
In 1847, Ernst Kummer proved that there is no solution in positive integers to

\[ x^p + y^p = z^p \]

whenever p is a regular prime. The original definition of a regular prime is well outside
the domain of this book, but Kummer found a simple and equivalent definition:

An odd prime p is regular if and only if it does not divide the numerator of any
of the Bernoulli numbers: $B_2, B_4, B_6, \ldots, B_{p-3}$.
The prime 11 is regular. Up to p − 3 = 8, the numerators are all 1. The prime 13 is regular.
So is 17. And 19. And 23. Unfortunately, not all primes are regular. The prime 37 is not,
nor is 59 or 67. Methods using Bernoulli numbers have succeeded in proving Fermat’s last
theorem for all primes below 4,000,000. The proof by Andrew Wiles and Richard Taylor
uses a very different approach.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

A.2.1. M&M Find the polynomials $B_5(x)$, $B_6(x)$, $B_7(x)$, and $B_8(x)$.

A.2.2. M&M Graph the eight polynomials $B_1(x)$ through $B_8(x)$. Describe the symmetries
that you see. Prove your guess about the symmetry of $B_n(x)$ for arbitrary n.

A.2.3. Prove equation (A.28) by induction on n.

A.2.4. Prove that

\[ B_n(1 - x) = (-1)^n B_n(x) \tag{A.33} \]

provided n ≥ 1.

A.2.5. Prove that

\[ B_{2n+1} = 0 \tag{A.34} \]

provided that n ≥ 1.

A.2.6. Show how to use Bernoulli polynomials and hand calculations to find the sum

\[ 1^8 + 2^8 + 3^8 + \cdots + 1000^8. \]

A.2.7. Show how to use Bernoulli polynomials and hand calculations to find the sum

\[ 1^{10} + 2^{10} + 3^{10} + \cdots + 1{,}000{,}000^{10}. \]

A.2.8. M&M Explore the factorizations of the numerators and of the denominators of
the Bernoulli numbers. What conjectures can you make?

A.2.9. M&M Find all primes less than 100 that are not regular. Show that 691 is not a
regular prime.

A.3 Sums of Negative Powers


Jacob Bernoulli and his brother Johann were also interested in the problem of summing
negative powers of the integers. The first such case is the harmonic series,

\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}, \]

which by then was well understood. The next case involves the sums of the reciprocals of
the squares:

\[ 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots. \]

We observe that this seems to approach a finite limit. The sum up to $1/100^2$ is 1.63498. Up
to $1/1000^2$ it is 1.64393. In fact, the Bernoullis knew that it must converge because

\[ \frac{1}{n^2} < \frac{1}{n(n-1)} = \frac{1}{n-1} - \frac{1}{n}, \]

and so

\begin{align*}
\sum_{n=1}^{N} \frac{1}{n^2} &< 1 + \sum_{n=2}^{N} \left(\frac{1}{n-1} - \frac{1}{n}\right) \\
&= 1 + \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{N-2} - \frac{1}{N-1}\right) + \left(\frac{1}{N-1} - \frac{1}{N}\right) \\
&= 2 - \frac{1}{N}.
\end{align*}

The sum of the reciprocals of the squares must converge and it must converge to something
less than 2. What is the actual value of its limit?
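The partial sums quoted above, and the telescoping bound 2 − 1/N, can be checked with a few lines of code. This is only an illustrative sketch; the function name `partial_sum_of_reciprocal_squares` is an assumption.

```python
def partial_sum_of_reciprocal_squares(N):
    """Return 1 + 1/2^2 + ... + 1/N^2 as a float."""
    # Summing the smallest terms first reduces floating-point rounding error.
    return sum(1.0 / (n * n) for n in range(N, 0, -1))


if __name__ == "__main__":
    for N in (10, 100, 1000, 10000):
        s = partial_sum_of_reciprocal_squares(N)
        # The telescoping argument guarantees s < 2 - 1/N for N >= 2.
        print(N, s, 2 - 1 / N)
```

The printed columns show the partial sums creeping upward while staying well below the bound 2 − 1/N, which is far from sharp.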
It was around 1734 that Euler discovered that the value of this infinite sum is, in fact,
$\pi^2/6$. His proof stretched even the credulity of his contemporaries, but it is worth giving to
show the spirit of mathematical discovery. The fact that $\pi^2/6 = 1.64493\ldots$ is very close
to the expected value was convincing evidence that it must be correct.
Consider the power series expansion of sin(x)/x:

\[ \frac{\sin x}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots. \tag{A.35} \]

We know that this function has roots at ±π, ±2π, ±3π, . . . , and so we should be able to
factor it:

\begin{align}
\frac{\sin x}{x} &= \left(1 - \frac{x}{\pi}\right)\left(1 + \frac{x}{\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 + \frac{x}{2\pi}\right) \cdots \notag \\
&= \left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{2^2\pi^2}\right)\left(1 - \frac{x^2}{3^2\pi^2}\right) \cdots. \tag{A.36}
\end{align}

We compare the coefficient of $x^2$ in equations (A.35) and (A.36) and see that

\[ \frac{-1}{6} = -\left(\frac{1}{\pi^2} + \frac{1}{2^2\pi^2} + \frac{1}{3^2\pi^2} + \cdots\right), \]

or equivalently,

\[ \frac{\pi^2}{6} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots. \tag{A.37} \]

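Comparing the coefficients of $x^4$ and doing a little bit of work, we can also find the
formula for the sum of the reciprocals of the fourth powers:

\begin{align}
\frac{1}{120} &= \frac{1}{1^2 \cdot 2^2 \pi^4} + \frac{1}{1^2 \cdot 3^2 \pi^4} + \frac{1}{1^2 \cdot 4^2 \pi^4} + \cdots + \frac{1}{2^2 \cdot 3^2 \pi^4} + \frac{1}{2^2 \cdot 4^2 \pi^4} + \cdots \notag \\
&= \sum_{1 \le j < k < \infty} \frac{1}{j^2 k^2 \pi^4}, \notag \\
\frac{\pi^4}{120} &= \sum_{1 \le j < k < \infty} \frac{1}{j^2 k^2}. \tag{A.38}
\end{align}

If we square both sides of equation (A.37) and separate the pieces of the resulting product,
we see that

\begin{align*}
\frac{\pi^4}{36} &= \left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right)\left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right) \\
&= \sum_{1 \le j < k < \infty} \frac{1}{j^2 k^2} + \sum_{1 \le j = k < \infty} \frac{1}{j^2 k^2} + \sum_{\infty > j > k \ge 1} \frac{1}{j^2 k^2} \\
&= 2 \sum_{1 \le j < k < \infty} \frac{1}{j^2 k^2} + \sum_{k=1}^{\infty} \frac{1}{k^4}.
\end{align*}

Using the result from equation (A.38), we obtain our formula:

\begin{align}
\frac{\pi^4}{36} &= 2 \cdot \frac{\pi^4}{120} + \sum_{k=1}^{\infty} \frac{1}{k^4}, \notag \\
\frac{\pi^4}{90} &= 1 + \frac{1}{2^4} + \frac{1}{3^4} + \frac{1}{4^4} + \cdots. \tag{A.39}
\end{align}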
One can, and Euler did, continue this to find formulas for the sums of the reciprocals
of the other even powers,

\[ \sum_{k=1}^{\infty} \frac{1}{k^6}, \quad \sum_{k=1}^{\infty} \frac{1}{k^8}, \quad \sum_{k=1}^{\infty} \frac{1}{k^{10}}, \quad \ldots. \]

In 1740, Euler discovered a formula that covered all of these cases.

A Generating Function
A problem that is not unreasonable at first glance is that of finding a power series expansion
for $1/(1 - e^x)$. It looks as if it should be quite straightforward. Expand as a geometric
series, use the power series for the exponential function, then rearrange (note that we need
to define 0! = 1):

\begin{align*}
\frac{1}{1 - e^x} &= 1 + e^x + e^{2x} + e^{3x} + \cdots \\
&= 1 + \sum_{k=1}^{\infty} e^{kx} \\
&= 1 + \sum_{k=1}^{\infty} \left(1 + kx + \frac{(kx)^2}{2!} + \frac{(kx)^3}{3!} + \cdots\right) \\
&= 1 + \sum_{k=1}^{\infty} \sum_{n=0}^{\infty} \frac{(kx)^n}{n!} \\
&= 1 + \sum_{n=0}^{\infty} \frac{x^n}{n!} \sum_{k=1}^{\infty} k^n \quad \ldots \text{ whoa!}
\end{align*}

Something is wrong. We are getting infinite coefficients. We need to back up.


The constant term in the power series expansion should be the value of the function at
x = 0. There is our problem. If we set x = 0 in our original function, we get a zero in the
denominator. We can get rid of the zero in the denominator if we multiply the numerator
by x. The function we should try to expand is

\[ \frac{x}{1 - e^x}. \]

We check what happens as x approaches 0 and see that we get −1. So far so good. It would
be nice if the constant were +1 instead, so we change the sign of the denominator. We are
looking for the coefficients in the power series expansion:

\[ \frac{x}{e^x - 1} = 1 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots. \tag{A.40} \]

The fact that we have multiplied by −x is not going to make our original argument work,
but this power series should exist. There is little choice but to compute the coefficients, the
$a_n$, by brute force. We could do it by using Taylor’s formula, but those derivatives quickly
become very messy. Instead, we shall multiply both sides of equation (A.40) by $e^x - 1$,
expanded as a power series, and then equate the coefficients of comparable powers of x to
solve for the $a_n$:

\begin{align}
x &= (e^x - 1)\left(1 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots\right) \notag \\
&= \left(x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots\right)\left(1 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots\right) \notag \\
&= x + \left(\frac{1}{2!} + a_1\right)x^2 + \left(\frac{1}{3!} + \frac{a_1}{2!} + a_2\right)x^3 + \left(\frac{1}{4!} + \frac{a_1}{3!} + \frac{a_2}{2!} + a_3\right)x^4 + \cdots. \tag{A.41}
\end{align}

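We obtain an infinite sequence of equations which we can solve for $a_n$:

\begin{align*}
0 &= \frac{1}{2} + a_1 &&\Longrightarrow\quad a_1 = \frac{-1}{2}, \\
0 &= \frac{1}{6} + \frac{a_1}{2} + a_2 &&\Longrightarrow\quad a_2 = \frac{1}{12}, \\
0 &= \frac{1}{24} + \frac{a_1}{6} + \frac{a_2}{2} + a_3 &&\Longrightarrow\quad a_3 = 0.
\end{align*}

We continue in this manner:

\[ a_4 = \frac{-1}{720}, \quad a_5 = 0, \quad a_6 = \frac{1}{30240}, \quad a_7 = 0, \quad a_8 = \frac{-1}{1209600}, \quad \ldots. \]

If you do not yet see what is happening, try multiplying each $a_n$ by n!:

\begin{align*}
1 \cdot a_1 &= \frac{-1}{2}, & 2! \cdot a_2 &= \frac{1}{6}, & 3! \cdot a_3 &= 0, & 4! \cdot a_4 &= \frac{-1}{30}, \\
5! \cdot a_5 &= 0, & 6! \cdot a_6 &= \frac{1}{42}, & 7! \cdot a_7 &= 0, & 8! \cdot a_8 &= \frac{-1}{30}.
\end{align*}

The Bernoulli numbers!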
Once we see them, it is not hard to prove that they are really there. Equation (A.41)
implies that

\[ 0 = \frac{1}{n!} + \frac{a_1}{(n-1)!} + \frac{a_2}{(n-2)!} + \frac{a_3}{(n-3)!} + \cdots + \frac{a_{n-2}}{2!} + \frac{a_{n-1}}{1!}. \tag{A.42} \]

We multiply both sides by n!:

\begin{align}
0 &= 1 + n\,a_1 + \frac{n(n-1)}{2!}(2! \cdot a_2) + \frac{n(n-1)(n-2)}{3!}(3! \cdot a_3) \notag \\
&\quad + \cdots + \frac{n(n-1)}{2!}\left[(n-2)! \cdot a_{n-2}\right] + n\left[(n-1)! \cdot a_{n-1}\right]. \tag{A.43}
\end{align}

This is precisely the recursion that we saw in equation (A.30) for the Bernoulli numbers.
We have proven that

\[ \frac{x}{e^x - 1} = 1 + \sum_{n=1}^{\infty} B_n \frac{x^n}{n!}. \tag{A.44} \]
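The brute-force computation of the coefficients can be delegated to a machine. The sketch below, with the hypothetical name `series_coefficients`, solves (A.42) for each $a_{n-1}$ in turn (taking $a_0 = 1$ for the constant term) and then multiplies by n! to expose the Bernoulli numbers.

```python
from fractions import Fraction
from math import factorial


def series_coefficients(m):
    """Return [a_0, a_1, ..., a_m], the coefficients of x/(e^x - 1).

    Uses (A.42) solved for its last term:
        a_{n-1} = -sum_{j=0}^{n-2} a_j / (n - j)!,  with a_0 = 1.
    """
    a = [Fraction(1)]
    for n in range(2, m + 2):
        a.append(-sum(a[j] / factorial(n - j) for j in range(n - 1)))
    return a


if __name__ == "__main__":
    for n, a_n in enumerate(series_coefficients(8)):
        # The second column, n! * a_n, reproduces the Bernoulli numbers.
        print(n, a_n, factorial(n) * a_n)
```

Working with `Fraction` keeps every coefficient exact, so the products n!·a_n come out as the familiar fractions rather than as decimal approximations.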
Euler’s Analysis
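Once he had realized equation (A.44), Euler was off and running. One of the things that
it shows is that $x/(e^x - 1)$ is almost an even function: $B_n$ is zero whenever n is odd and
larger than 1. If we add x/2 to both sides, we knock out the single odd power of x and
obtain an even function:

\[ 1 + \sum_{m=1}^{\infty} B_{2m} \frac{x^{2m}}{(2m)!} = \frac{x}{e^x - 1} + \frac{x}{2} = \frac{2x + xe^x - x}{2(e^x - 1)} = \frac{x(e^x + 1)}{2(e^x - 1)}. \tag{A.45} \]

We replace x by 2t:

\begin{align}
1 + \sum_{m=1}^{\infty} B_{2m} \frac{(2t)^{2m}}{(2m)!} &= \frac{t(e^{2t} + 1)}{e^{2t} - 1} \notag \\
&= t\, \frac{e^t + e^{-t}}{e^t - e^{-t}} \notag \\
&= t \coth t, \tag{A.46}
\end{align}

where coth t is the hyperbolic cotangent of t. Euler knew that

\begin{align}
\sin z &= \frac{e^{iz} - e^{-iz}}{2i} = -i \sinh iz, \tag{A.47} \\
\cos z &= \frac{e^{iz} + e^{-iz}}{2} = \cosh iz, \tag{A.48}
\end{align}

and so he saw that

\begin{align}
z \cot z &= z\, \frac{\cosh iz}{-i \sinh iz} \notag \\
&= iz \coth iz \notag \\
&= 1 + \sum_{m=1}^{\infty} B_{2m} \frac{(2iz)^{2m}}{(2m)!} \notag \\
&= 1 + \sum_{m=1}^{\infty} (-1)^m B_{2m} \frac{(2z)^{2m}}{(2m)!}. \tag{A.49}
\end{align}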

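Euler knew of another expansion for z cot z. Recognizing that the denominator of cot z =
cos z/ sin z is zero whenever z = kπ, k any integer, he had found an infinite partial fraction
decomposition of the cotangent (see exercise A.3.16):

\begin{align}
\cot z &= \cdots + \frac{1}{z + 2\pi} + \frac{1}{z + \pi} + \frac{1}{z} + \frac{1}{z - \pi} + \frac{1}{z - 2\pi} + \frac{1}{z - 3\pi} + \cdots \notag \\
&= \frac{1}{z} + \sum_{k=1}^{\infty} \left(\frac{1}{z + k\pi} + \frac{1}{z - k\pi}\right) \notag \\
&= \frac{1}{z} + 2 \sum_{k=1}^{\infty} \frac{z}{z^2 - k^2\pi^2} \notag \\
&= \frac{1}{z} - 2 \sum_{k=1}^{\infty} \frac{z}{k^2\pi^2 - z^2}. \tag{A.50}
\end{align}

If we multiply by z, we get an alternate expression for z cot z:

\begin{align}
z \cot z &= 1 - 2 \sum_{k=1}^{\infty} \frac{z^2}{k^2\pi^2 - z^2} \notag \\
&= 1 - 2 \sum_{k=1}^{\infty} \frac{z^2/k^2\pi^2}{1 - z^2/k^2\pi^2} \notag \\
&= 1 - 2 \sum_{k=1}^{\infty} \left(\frac{z^2}{k^2\pi^2} + \frac{z^4}{k^4\pi^4} + \frac{z^6}{k^6\pi^6} + \cdots\right) \notag \\
&= 1 - 2 \sum_{k=1}^{\infty} \sum_{m=1}^{\infty} \frac{z^{2m}}{k^{2m}\pi^{2m}} \notag \\
&= 1 - 2 \sum_{m=1}^{\infty} \frac{z^{2m}}{\pi^{2m}} \left(1 + \frac{1}{2^{2m}} + \frac{1}{3^{2m}} + \frac{1}{4^{2m}} + \cdots\right). \tag{A.51}
\end{align}

Comparing the coefficients of $z^{2m}$ in equations (A.49) and (A.51), we see that

\[ (-1)^m \frac{B_{2m}\, 2^{2m}}{(2m)!} = \frac{-2}{\pi^{2m}} \left(1 + \frac{1}{2^{2m}} + \frac{1}{3^{2m}} + \frac{1}{4^{2m}} + \cdots\right), \tag{A.52} \]

or equivalently

\[ 1 + \frac{1}{2^{2m}} + \frac{1}{3^{2m}} + \frac{1}{4^{2m}} + \cdots = (-1)^{m+1} \frac{(2\pi)^{2m}}{2 \cdot (2m)!} B_{2m}. \tag{A.53} \]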

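Euler had them all, provided the exponent was even:

\begin{align}
\sum_{n=1}^{\infty} \frac{1}{n^2} &= \frac{(2\pi)^2}{4} \cdot \frac{1}{6} = \frac{\pi^2}{6}, \tag{A.54} \\
\sum_{n=1}^{\infty} \frac{1}{n^4} &= \frac{(2\pi)^4}{2 \cdot 24} \cdot \frac{1}{30} = \frac{\pi^4}{90}, \tag{A.55} \\
\sum_{n=1}^{\infty} \frac{1}{n^6} &= \frac{(2\pi)^6}{2 \cdot 720} \cdot \frac{1}{42} = \frac{\pi^6}{945}, \tag{A.56} \\
&\ \,\vdots \notag
\end{align}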
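Formula (A.53) can be compared numerically with partial sums. The following sketch uses hypothetical function names and recomputes the Bernoulli numbers from (A.31), so that it stands on its own; floating-point arithmetic limits the agreement to roughly a dozen digits.

```python
import math
from fractions import Fraction


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] from recursion (A.31), with B_0 = 1, B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(2, m + 2):
        B.append(-sum(math.comb(n, j) * B[j] for j in range(n - 1)) / n)
    return B


def zeta_even(m):
    """Evaluate the right-hand side of (A.53): the sum of 1/n^(2m) over n >= 1."""
    B = bernoulli_numbers(2 * m)
    sign = (-1) ** (m + 1)
    return sign * (2 * math.pi) ** (2 * m) / (2 * math.factorial(2 * m)) * float(B[2 * m])


def partial_sum(exponent, N):
    """Return the partial sum 1 + 1/2^exponent + ... + 1/N^exponent."""
    return sum(1.0 / n ** exponent for n in range(N, 0, -1))


if __name__ == "__main__":
    for m in (1, 2, 3, 4, 5):
        print(2 * m, zeta_even(m), partial_sum(2 * m, 1000))
```

For the higher even powers the partial sum to N = 1000 already agrees with the closed form to many digits, while for the squares the gap is still close to 1/1000.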

The function $\sum n^{-s}$, which Euler had shown how to evaluate when s is a positive even
integer, would come to play a very important role in number theory. Today it is called the
zeta function:

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \quad s > 1. \]

It can be defined for all complex values of s except s = 1. When Riemann laid out his
prescription for a proof that the number of primes less than or equal to x is asymptotically
x/ ln x, he conjectured that all of the nonreal roots of ζ(s) lie on the line of s’s with real
part 1/2. This is known as the Riemann hypothesis. It says a great deal about the error that
is introduced when x/ ln x is used to approximate the prime counting function. It is still
unproven.

If the Exponent is Odd?


If the exponent is odd, it appears that there is no simple formula. The most that can be said,
and this was only proved in 1978 by Roger Apéry, is that

\[ \sum_{n=1}^{\infty} \frac{1}{n^3} \]

is definitely not a rational number.

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

A.3.1. M&M Calculate

\[ \sum_{n=1}^{100} \frac{1}{n^2} \quad \text{and} \quad \sum_{n=1}^{1000} \frac{1}{n^2}. \]

The first differs from $\pi^2/6$ by about 1/100, the second from $\pi^2/6$ by about 1/1000. This
suggests that

\[ \frac{1}{N} + \sum_{n=1}^{N} \frac{1}{n^2} \]

should be a pretty good approximation to $\pi^2/6$. Test this hypothesis for different values of
N, including at least N = 5000 and N = 10000.

A.3.2. M&M Calculate

\[ \sum_{n=1}^{N} \frac{1}{n^4} \quad \text{and} \quad \sum_{n=1}^{N} \frac{1}{n^6} \]

for N = 100, 500, and 1000. Compare your results with the predicted values of $\pi^4/90$ and
$\pi^6/945$, respectively. Are you willing to believe Euler’s result? In each case, what is the
approximate size of the error?

A.3.3. Prove that if k is larger than 1, then

\[ \int_{N+1}^{\infty} \frac{dx}{x^k} < \sum_{n=N+1}^{\infty} \frac{1}{n^k} < \int_{N}^{\infty} \frac{dx}{x^k}. \tag{A.57} \]

Use these bounds to prove that $\sum_{n=1}^{N} 1/n^k$ differs from $\sum_{n=1}^{\infty} 1/n^k$ by an amount that lies
between

\[ \frac{1}{(k-1)(N+1)^{k-1}} \quad \text{and} \quad \frac{1}{(k-1)N^{k-1}}. \]

A.3.4. Set x = π/2 in equation (A.36) and see what identity you get. Does it look familiar?
It should.

A.3.5. Set x = π/3 in equation (A.36) and see what identity you get. What happens if you
set x = π/4?

A.3.6. Comparing the coefficients of $x^6$ in equations (A.35) and (A.36) tells us that

\[ \frac{\pi^6}{7!} = \sum_{1 \le j < k < l < \infty} \frac{1}{j^2 k^2 l^2}. \tag{A.58} \]

Use this fact together with equations (A.37) and (A.38) to prove that

\[ \sum_{k=1}^{\infty} \frac{1}{k^6} = \frac{\pi^6}{945}. \]

A.3.7. Consider the aborted derivation on page 286. Remember that any equality involving
infinite series must, in general, carry a restriction on those x’s for which it is valid. What
are the restrictions that need to go with each equality? Where precisely does the argument
go wrong?

A.3.8. M&M Graph the polynomials

\[ y = 1 + \sum_{n=1}^{N} B_n \frac{x^n}{n!} \]

for N = 4, 6, 8, 10, and 12. Compare these to the graph of $x/(e^x - 1)$. Describe what you
see. Where does it appear that this series converges?

A.3.9. We observe that $\sum_{n=1}^{\infty} n^{-2m} = 1 + 2^{-2m} + \cdots$ is always larger than 1. Use this fact
and equation (A.53) to prove that

\[ |B_{2m}| > \frac{2 \cdot (2m)!}{(2\pi)^{2m}}. \tag{A.59} \]

Evaluate this lower bound for $B_{20}$, $B_{40}$, and $B_{100}$. Do these numbers stay small or do they
get large? Express the lower bound in scientific notation with six digits of accuracy.

A.3.10. Show that $\lim_{n\to\infty} \zeta(n) = 1$. Use this fact and the formula for $B_n$ implied by
equation (A.53) to find the interval of convergence of the series $1 + \sum_{n=1}^{\infty} B_n x^n/n!$. Explain
your analysis of convergence at the endpoints.

A.3.11. M&M Taylor’s theorem tells us that $B_n$ must be the nth derivative of $x/(e^x - 1)$
evaluated at x = 0. Verify that this is correct when n = 1, 2, 3, and 4 by finding the
derivatives.

A.3.12. Use the power series expansions of $e^x$, cos x, and sin x to prove that

\[ e^{ix} = \cos x + i \sin x. \tag{A.60} \]

A.3.13. Use equation (A.60) to prove equations (A.47) and (A.48).



A.3.14. M&M Graph the polynomials

\[ y = 1 + \sum_{m=1}^{N} (-1)^m B_{2m} \frac{(2z)^{2m}}{(2m)!} \]

for N = 2, 4, and 6. Compare these to the graph of z cot z. Describe what you see. Estimate
the radius of convergence for this series.

A.3.15. Determine the interval of convergence for the series in exercise A.3.14. Show the
work that supports your answer.

A.3.16. We assume that cot z has a partial fraction decomposition. This means that there
are constants, $a_k$, such that

\[ \cot z = \sum_{-\infty < k < \infty} \frac{a_k}{z - k\pi}. \]

To find the values of the $a_k$, we multiply both sides by sin z,

\[ \cos z = \sum_{-\infty < k < \infty} a_k \frac{\sin z}{z - k\pi}, \]

and then take the limit as z approaches mπ. Show that

\[ \frac{\sin z}{z - k\pi} \]

approaches 0 if m ≠ k and that it approaches $\cos m\pi = (-1)^m$ if m = k. Finish the proof
that $a_m = 1$.

A.3.17. M&M Graph the functions

\[ R_N(z) = \frac{1}{z} + 2 \sum_{k=1}^{N} \frac{z}{z^2 - k^2\pi^2} \]

for N = 3, 6, 9, and 12. Compare these to the graph of cot z. Describe what you see. Where
does it appear that this series converges? Plot the differences $\cot(z) - R_N(z)$ for various
values of N and find a reasonable approximation, in terms of N and z, to this error function.
Test the validity of your approximation for N = 1000.
A.3.18. What are the exact values of

\[ \sum_{n=1}^{\infty} \frac{1}{n^8} \quad \text{and} \quad \sum_{n=1}^{\infty} \frac{1}{n^{10}}, \]

expressed as a power of π times a rational number?

A.3.19. Prove that $B_{2m}$ and $B_{2m+2}$ always have opposite sign.

A.3.20. M&M Apéry proved that $\sum n^{-3}$ is not a rational number. We still do not know
if it can be written as $\pi^3$ times a rational number. Calculate

\[ \sum_{n=1}^{N} \frac{1}{n^3} \]

for large values of N (at least 1000) and estimate the size of your error (see exercise A.3.3).

A.4 The Size of n!


An accurate approximation to n! was discovered in 1730 in a collaboration between Abra-
ham de Moivre (1667–1754) and James Stirling (1692–1770). de Moivre was a French
Protestant. He and his parents had fled to London after the revocation of the Edict of Nantes
in 1685. Despite his brilliance, he was always a foreigner and never obtained an academic
appointment. He struggled throughout his life to support himself on the meager income
earned as a tutor. Stirling was a Jacobite and in 1716, a year after the Jacobite rebellion,
was expelled from Oxford for refusing to swear an oath of allegiance to the king. Because
of his politics, he too was denied an academic position.
Even though it was a joint effort, the formula that we will find is called Stirling’s formula.
This is primarily de Moivre’s own fault. When he published his result he gave Stirling credit
for finding the constant, but his language was sufficiently imprecise that the attribution of
the constant to Stirling could easily be misread as crediting him with the entire identity. In
any event, Stirling’s name does deserve to be attached to this identity because it was the
fruit of both their efforts.
Our first task is to turn $n!$ into a summation so we can use an integral approximation.
This is easily accomplished by taking the natural logarithm:

\[ \ln(n!) = \sum_{k=1}^{n} \ln k. \]

We can bound this above and below by integrals:

\[ \int_1^n \ln x \, dx < \sum_{k=1}^{n} \ln k < \int_0^{n-1} \ln(x+1) \, dx + \ln n, \]

where the value of $\sum_{k=1}^{n} \ln k$ is represented by the area under the staircase in Figure A.1.

FIGURE A.1. Graphs of $\ln(x + 1)$ and $\ln x$ bounding the step function $\ln \lceil x \rceil$.

Evaluating these integrals, we see that

\[ n \ln n - n + 1 < \ln(n!) < n \ln n - n + 1 + \ln n, \]

\[ e \left(\frac{n}{e}\right)^n < n! < n e \left(\frac{n}{e}\right)^n. \]
This gives us a pretty good idea of how fast n! grows, but because the summands are
increasing, our upper and lower bounds get further apart as n increases, unlike the situation
when we estimated the rate of the growth of the harmonic series.
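
A quick numerical check can make the widening gap concrete. The following Python sketch is an illustration only (the helper name `factorial_bounds` is ours, not the text's); it evaluates the two bounds $e(n/e)^n$ and $ne(n/e)^n$ and prints them alongside $n!$. The ratio of the upper bound to the lower bound is exactly $n$, which is why the bounds drift apart.

```python
import math

def factorial_bounds(n):
    """Return (lower, upper) = (e*(n/e)^n, n*e*(n/e)^n); for n >= 2, lower < n! < upper."""
    core = (n / math.e) ** n
    return math.e * core, n * math.e * core

for n in (5, 10, 20):
    lower, upper = factorial_bounds(n)
    exact = math.factorial(n)
    # The last column is upper/lower, which equals n up to rounding.
    print(n, lower, exact, upper, upper / lower)
```

The floating-point evaluation is adequate for moderate $n$; for large $n$ one would compare logarithms instead.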

A Trick for Approximating Summations


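To get a better approximation, we use a trick that is part of the repertoire of number theory
where there are many summations that need to be approximated. We rewrite $\ln k$ as the
integral of $1/x$ from $x = 1$ to $x = k$ and then interchange the integral and the summation:

\[
\begin{aligned}
\sum_{k=1}^{n} \ln k &= \sum_{k=1}^{n} \int_1^k \frac{dx}{x} \\
&= \int_1^n \frac{\sum_{k=\lfloor x \rfloor + 1}^{n} 1}{x} \, dx \\
&= \int_1^n \frac{n - \lfloor x \rfloor}{x} \, dx.
\end{aligned}
\tag{A.61}
\]

We now split this integral into two pieces:

\[
\begin{aligned}
\int_1^n \frac{n - \lfloor x \rfloor}{x} \, dx
&= \int_1^n \frac{n - x + 1/2}{x} \, dx + \int_1^n \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx \\
&= n \ln n - n + 1 + \frac{1}{2} \ln n + \int_1^n \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx.
\end{aligned}
\tag{A.62}
\]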

FIGURE A.2. Graph of $(x - \lfloor x \rfloor - 1/2)/x$.

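The integrand in the last line has a graph that oscillates about and approaches the x-axis
(see Figure A.2). The limit of this integral as $n$ approaches infinity exists because

\[
\begin{aligned}
\left| \int_n^{\infty} \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx \right|
&< \left| \int_n^{n+1/2} \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx \right| \\
&= \left| \int_0^{1/2} \frac{x - 1/2}{n + x} \, dx \right| \\
&= -\frac{1}{2} + \left( n + \frac{1}{2} \right) \left[ \ln\left( n + \frac{1}{2} \right) - \ln n \right],
\end{aligned}
\tag{A.63}
\]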
which approaches 0 as n goes to infinity (see exercise A.4.1).
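
To see how quickly the closed form at the end of (A.63) shrinks, one can simply evaluate it. The sketch below is illustrative (the name `tail_bound` is ours); it also prints $1/(8n)$, which is the leading term obtained by expanding $\ln(1 + 1/(2n))$ in a power series, as a comparison value.

```python
import math

def tail_bound(n):
    """Closed form from (A.63): -1/2 + (n + 1/2) * (ln(n + 1/2) - ln n)."""
    # ln(n + 1/2) - ln n = ln(1 + 1/(2n)); log1p keeps precision for large n.
    return -0.5 + (n + 0.5) * math.log1p(1.0 / (2.0 * n))

for n in (1, 10, 100, 1000):
    # Second column: the bound; third column: the comparison value 1/(8n).
    print(n, tail_bound(n), 1.0 / (8.0 * n))
```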
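We have proven that

\[ \ln(n!) = n \ln n - n + \frac{1}{2} \ln n + 1 + \int_1^{\infty} \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx + E(n), \tag{A.64} \]

where

\[ E(n) = - \int_n^{\infty} \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx \]

approaches 0 as $n$ approaches infinity. Equivalently,

\[ n! = C \left( \frac{n}{e} \right)^n \sqrt{n} \, e^{E(n)} \tag{A.65} \]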

where

\[ \ln C = 1 + \int_1^{\infty} \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx. \]

What is the value of C?

Evaluating C
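Wallis’s formula comes to our aid in a very slick evaluation. Stirling’s formula implies that

\[
\frac{(2n)!}{n! \cdot n!}
= \frac{C (2n/e)^{2n} \sqrt{2n} \, e^{E(2n)}}{C^2 (n/e)^{2n} \, n \, e^{2E(n)}}
= \frac{2^{2n}}{C} \sqrt{\frac{2}{n}} \, e^{E(2n) - 2E(n)}.
\tag{A.66}
\]

We solve for $C$ and then do a little rearranging:

\[
\begin{aligned}
C &= \frac{2^{2n} (n!)^2}{(2n)!} \sqrt{\frac{2}{n}} \, e^{E(2n) - 2E(n)} \\
&= \frac{(2 \cdot 4 \cdot 6 \cdots 2n)^2}{1 \cdot 2 \cdot 3 \cdots 2n} \, \frac{2}{\sqrt{2n}} \, e^{E(2n) - 2E(n)} \\
&= \frac{2 \cdot 4 \cdot 6 \cdots 2n}{1 \cdot 3 \cdot 5 \cdots (2n - 1)} \, \frac{2}{\sqrt{2n}} \, e^{E(2n) - 2E(n)} \\
&= 2 \sqrt{ \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n - 2)^2 \cdot 2n}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2n - 1)^2} } \; e^{E(2n) - 2E(n)}.
\end{aligned}
\tag{A.67}
\]

Looking back at Wallis’s work, we see from equation (A.10) that

\[ \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n - 2)^2 \cdot 2n}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2n - 1)^2} \]

approaches $\pi/2$ as $n$ gets large. That means that the right side of equation (A.67)
approaches $2\sqrt{\pi/2} = \sqrt{2\pi}$ as $n$ approaches infinity. Since the left side is independent of
$n$, the constant $C$ must actually equal $\sqrt{2\pi}$. We have proven Stirling’s formula:

\[ n! = n^n e^{-n} \sqrt{2\pi n} \, e^{E(n)}, \tag{A.68} \]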

where E(n) is an error that approaches 0 as n gets large.
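
The value $C = \sqrt{2\pi}$ can be checked numerically from (A.65): the quantity $n!/\big((n/e)^n\sqrt{n}\big)$ equals $C e^{E(n)}$ and so should settle toward $\sqrt{2\pi} \approx 2.5066$. The sketch below is only an illustration (the function name is ours); it works with logarithms through the standard-library `math.lgamma`, using $\ln(n!) = \mathrm{lgamma}(n+1)$, to avoid overflow.

```python
import math

def stirling_constant_estimate(n):
    """Return n! / ((n/e)^n * sqrt(n)), computed through logarithms."""
    log_value = math.lgamma(n + 1) - (n * math.log(n) - n + 0.5 * math.log(n))
    return math.exp(log_value)

target = math.sqrt(2.0 * math.pi)
for n in (10, 100, 10_000, 1_000_000):
    # The estimates decrease toward sqrt(2*pi) as n grows.
    print(n, stirling_constant_estimate(n), target)
```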

The Asymptotic Series for E (n)


Not long after de Moivre and Stirling published their formula for n! in 1730, both Leonhard
Euler and Colin Maclaurin realized that something far more general was going on, a formula
for approximating arbitrary series that today is called the Euler–Maclaurin formula. Euler
wrote to Stirling in 1736 describing this general formula. Stirling wrote back in 1738
saying that Colin Maclaurin had also discovered this result. Euler’s proof was published
in 1738, Maclaurin’s in 1742. Because it takes very little extra work, we shall develop the
asymptotic series for E(n) in the more general context of the Euler–Maclaurin formula.

We want to find a formula for $\sum_{k=1}^{n} f(k)$ where $f$ is an analytic function for $x > 0$. We
set

\[ S(n) = \sum_{k=1}^{n} f(k) \]

and assume that $S$ also can be defined for all $x > 0$ so that it is analytic. By Taylor’s
formula, we have that

\[ S(n + x) = S(n) + S'(n) x + \frac{S''(n)}{2!} x^2 + \frac{S'''(n)}{3!} x^3 + \cdots. \]

We set $x = -1$ and observe that

\[ f(n) = S(n) - S(n - 1) = S'(n) - \frac{S''(n)}{2!} + \frac{S'''(n)}{3!} - \frac{S^{(4)}(n)}{4!} + \cdots. \]

We want to invert this and write $S'(n)$ in terms of $f$ and its derivatives at $n$. In principle,
this is doable because

\[
\begin{aligned}
f'(n) &= S''(n) - \frac{S'''(n)}{2!} + \frac{S^{(4)}(n)}{3!} - \frac{S^{(5)}(n)}{4!} + \cdots \\
f''(n) &= S'''(n) - \frac{S^{(4)}(n)}{2!} + \frac{S^{(5)}(n)}{3!} - \frac{S^{(6)}(n)}{4!} + \cdots \\
f'''(n) &= S^{(4)}(n) - \frac{S^{(5)}(n)}{2!} + \frac{S^{(6)}(n)}{3!} - \frac{S^{(7)}(n)}{4!} + \cdots \\
&\;\;\vdots
\end{aligned}
\]

In other words, we want to find the constants $a_1, a_2, a_3, \ldots$ such that

\[ S'(n) = f(n) + a_1 f'(n) + a_2 f''(n) + a_3 f'''(n) + \cdots. \tag{A.69} \]

We substitute the expansions of the derivatives of $f$ in terms of the derivatives of $S$ into
equation (A.69). This tells us that

\[
\begin{aligned}
S'(n) &= \sum_{j=1}^{\infty} (-1)^{j-1} \frac{S^{(j)}(n)}{j!} + \sum_{k=1}^{\infty} a_k \sum_{j=k+1}^{\infty} (-1)^{j-k-1} \frac{S^{(j)}(n)}{(j - k)!} \\
&= S'(n) + \sum_{j=2}^{\infty} (-1)^{j-1} S^{(j)}(n) \left( \frac{1}{j!} + \sum_{k=1}^{j-1} (-1)^k \frac{a_k}{(j - k)!} \right).
\end{aligned}
\tag{A.70}
\]

This will be true if and only if

\[ \frac{1}{j!} - \frac{a_1}{(j - 1)!} + \frac{a_2}{(j - 2)!} - \frac{a_3}{(j - 3)!} + \cdots + (-1)^{j-1} \frac{a_{j-1}}{1!} = 0, \qquad j \ge 2. \]

This equation should look familiar. Except for the sign changes, it is exactly the equality
that we saw in equation (A.42) on page 287, an equality uniquely satisfied by the Bernoulli
numbers divided by the factorials. In our case,

\[ a_k = (-1)^k \frac{B_k}{k!}. \]
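
The first few constants can be generated mechanically. The sketch below is an illustration under one stated assumption: it takes the defining relation in the form $\frac{1}{j!} + \sum_{k=1}^{j-1} (-1)^k \frac{a_k}{(j-k)!} = 0$ for $j \ge 2$ (the coefficient of $S^{(j)}(n)$ obtained by substituting the expansions of $f', f'', \ldots$ into (A.69)) and solves it for $a_{j-1}$ in exact rational arithmetic. The function name is ours.

```python
from fractions import Fraction
from math import factorial

def euler_maclaurin_coefficients(count):
    """Return [a_1, ..., a_count] solving, for each j >= 2,
    1/j! + sum_{k=1}^{j-1} (-1)^k a_k / (j-k)! = 0."""
    a = [Fraction(1)]  # a[0] is a placeholder so that a[k] holds a_k
    for j in range(2, count + 2):
        partial = Fraction(1, factorial(j))
        for k in range(1, j - 1):
            partial += (-1) ** k * a[k] / factorial(j - k)
        # Remaining term is (-1)^(j-1) * a_{j-1} / 1!, so it must equal -partial.
        a.append(-partial * (-1) ** (j - 1))
    return a[1:]

print(euler_maclaurin_coefficients(6))
```

The output begins 1/2, 1/12, 0, -1/720, which agrees with $(-1)^k B_k/k!$ for $B_1 = -1/2$, $B_2 = 1/6$, $B_3 = 0$, $B_4 = -1/30$.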

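Since we know that $B_{2m+1} = 0$ for $m \ge 1$, we have shown that

\[
\begin{aligned}
S'(x) &= f(x) + \sum_{k=1}^{\infty} (-1)^k \frac{B_k}{k!} f^{(k)}(x) \\
&= f(x) + \frac{1}{2} f'(x) + \sum_{m=1}^{\infty} \frac{B_{2m}}{(2m)!} f^{(2m)}(x).
\end{aligned}
\tag{A.71}
\]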

We do need to keep in mind that these have all been formal manipulations, what Cauchy
referred to as “explanations drawn from algebraic technique.” This derivation should be
viewed as suggestive. In no sense is it a proof. In particular, there is no guarantee that this
series converges.
Nevertheless, even when the series does not converge, it does provide useful approxi-
mations. If we now integrate each side of equation (A.71) from x = 1 to x = n and then
add S(1) = f (1), we get the Euler–Maclaurin formula.

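Theorem A.1 (Euler–Maclaurin Formula). Let $f$ be an analytic function for $x > 0$.
Then, provided the series converge, we have that

\[
\begin{aligned}
\sum_{k=1}^{n} f(k) = {} & \int_1^n f(x) \, dx + \frac{1}{2} f(n) + \sum_{m=1}^{\infty} \frac{B_{2m}}{(2m)!} f^{(2m-1)}(n) \\
& + \frac{1}{2} f(1) - \sum_{m=1}^{\infty} \frac{B_{2m}}{(2m)!} f^{(2m-1)}(1).
\end{aligned}
\tag{A.72}
\]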

When we set $f(x) = \ln x$ and use the fact that we know that the constant term is $\ln(2\pi)/2$,
this becomes Stirling’s Formula:

\[ \ln(n!) = \sum_{k=1}^{n} \ln k = n \ln n - n + \frac{1}{2} \ln n + \frac{1}{2} \ln(2\pi) + E(n), \tag{A.73} \]

where $E(n)$ can be approximated by the asymptotic series,

\[ E(n) \sim \sum_{m=1}^{\infty} \frac{B_{2m}}{(2m)(2m - 1) n^{2m-1}}. \tag{A.74} \]

Difficulties

Does the fact that the constant term is $\ln(\sqrt{2\pi})$ mean that

\[ 1 - \left( \frac{B_2}{1 \cdot 2} + \frac{B_4}{3 \cdot 4} + \frac{B_6}{5 \cdot 6} + \cdots \right) = \ln \sqrt{2\pi} = .9189385\ldots\,? \]

Hardly. If we try summing this series, we find that it does not approach anything. The first
few Bernoulli numbers are small, but as we saw in the last section, they start to grow. They
grow faster than $2(2m)!/(2\pi)^{2m}$. Table A.2 lists the partial sums of

\[ 1 - \sum_{m=1}^{N} \frac{B_{2m}}{2m(2m - 1)}. \]

Table A.2. Partial sums of de Moivre’s series.

   N     $1 - \sum_{m=1}^{N} B_{2m}/2m(2m - 1)$

   1            0.9166667
   2            0.9194444
   3            0.9186508
   4            0.9192460
   5            0.9184043
   6            0.9203218
   7            0.9139116
   8            0.9434622
   9            0.763818
  10            2.15625
  11          −11.2466
  12          145.602
  13        −2047.5
  14        34061.3
  15      −657411.0

The first few values look good (up to $N = 4$ they seem to be approaching $\ln \sqrt{2\pi}$), but
then they begin to move away and very quickly the series is lurching out of control.
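
The entries of Table A.2 can be reproduced with exact rational arithmetic. The following sketch is illustrative (both function names are ours); it generates the Bernoulli numbers from the standard recurrence $\sum_{k=0}^{m} \binom{m+1}{k} B_k = 0$ with $B_0 = 1$, and then accumulates $1 - \sum_{m=1}^{N} B_{2m}/(2m(2m-1))$. It requires Python 3.8 or later for `math.comb`.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(count):
    """Return [B_0, ..., B_count] as exact fractions (convention B_1 = -1/2)."""
    b = [Fraction(1)]
    for m in range(1, count + 1):
        # Solve sum_{k=0}^{m} C(m+1, k) * B_k = 0 for B_m.
        s = sum(comb(m + 1, k) * b[k] for k in range(m))
        b.append(-s / (m + 1))
    return b

def de_moivre_partial_sums(max_n):
    """Return the partial sums 1 - sum_{m=1}^{N} B_{2m}/(2m(2m-1)) for N = 1..max_n."""
    b = bernoulli_numbers(2 * max_n)
    total = Fraction(1)
    sums = []
    for m in range(1, max_n + 1):
        total -= b[2 * m] / (2 * m * (2 * m - 1))
        sums.append(total)
    return sums

for n, value in enumerate(de_moivre_partial_sums(15), start=1):
    print(n, float(value))
```

Exact fractions are used so that the large cancellations near the end of the table are not affected by rounding.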
What about the error function:

\[ E(n) \sim \frac{B_2}{1 \cdot 2 \, n} + \frac{B_4}{3 \cdot 4 \, n^3} + \frac{B_6}{5 \cdot 6 \, n^5} + \cdots ; \]

does it converge? In exercises A.4.2 and A.4.3, the reader is urged to experiment with this
series. What you should see is that no matter how large $n$ is, eventually this series will start
to oscillate with increasing swings. But that does not mean that it is useless. If you take the
first few terms, say the first two, then

\[ n^n e^{-n} \sqrt{2\pi n} \; e^{1/(12n) - 1/(360 n^3)} \]

is a better approximation to $n!$ than just

\[ n^n e^{-n} \sqrt{2\pi n}. \]

Something very curious is happening. As we take more terms, the approximation keeps
getting better up to some point, and then it starts to get worse as the series moves into its
uncontrolled swings. This is what we mean by an asymptotic series. Even though it does
not converge, it does give an approximation to the quantity in question. How many terms
of the asymptotic series should you take? That depends on n. As n gets larger, you can go
farther. Infinite series do strange things.
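
This behavior is easy to observe numerically. The sketch below is an illustration only (function names are ours): for a given $n$ it computes $E(n)$ from (A.73), using the standard-library `math.lgamma` for $\ln(n!)$, and then records the absolute error left after $M$ terms of the series (A.74). For $n = 2$ the error first shrinks and then grows again once the Bernoulli numbers take over.

```python
import math
from fractions import Fraction
from math import comb

def bernoulli_numbers(count):
    """Return [B_0, ..., B_count] as exact fractions (convention B_1 = -1/2)."""
    b = [Fraction(1)]
    for m in range(1, count + 1):
        s = sum(comb(m + 1, k) * b[k] for k in range(m))
        b.append(-s / (m + 1))
    return b

def stirling_series_errors(n, max_terms):
    """errors[M] = |E(n) - (first M terms of (A.74))| for M = 0..max_terms."""
    b = bernoulli_numbers(2 * max_terms)
    # E(n) = ln(n!) - (n ln n - n + (1/2) ln n + (1/2) ln(2*pi)), from (A.73).
    exact_e = math.lgamma(n + 1) - (
        n * math.log(n) - n + 0.5 * math.log(n) + 0.5 * math.log(2.0 * math.pi)
    )
    partial = 0.0
    errors = [abs(exact_e)]
    for m in range(1, max_terms + 1):
        partial += float(b[2 * m]) / (2 * m * (2 * m - 1) * n ** (2 * m - 1))
        errors.append(abs(exact_e - partial))
    return errors

errors = stirling_series_errors(2, 15)
best = min(range(len(errors)), key=errors.__getitem__)
print("best number of terms:", best, "error:", errors[best])
print("error with 15 terms:", errors[15])
```

Repeating the experiment with larger $n$ shows the best number of terms moving outward, in line with the remark that as $n$ gets larger you can go farther.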

Exercises

The symbol M&M indicates that Maple and Mathematica codes for this problem are
available in the Web Resources at www.macalester.edu/aratra.

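A.4.1. Show that

\[
\begin{aligned}
& -\frac{1}{2} + \left( n + \frac{1}{2} \right) \left[ \ln\left( n + \frac{1}{2} \right) - \ln n \right] \\
& \qquad = -\frac{1}{2} + \frac{1}{2} \ln\left( 1 + \frac{1}{2n} \right) + \frac{1}{2} \ln\left[ \left( 1 + \frac{1}{2n} \right)^{2n} \right].
\end{aligned}
\]

Use this identity to prove (see equation (A.63)) that

\[ \lim_{n \to \infty} \int_n^{\infty} \frac{x - \lfloor x \rfloor - 1/2}{x} \, dx = 0. \]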

A.4.2. M&M Evaluate

\[ n^n e^{-n} \sqrt{2\pi n} \; e^{1/(12n) - 1/(360 n^3)} \]

for $n = 5, 10, 20, 50,$ and $100$ and compare it to $n!$.



A.4.3. M&M To see how many terms of the asymptotic series we should take, find the
summand in the asymptotic series that is closest to zero and stop at that term. For each of
the values n = 5, 10, 20, 50, and 100, find which summand is the smallest in absolute value.
Estimate the function of n that describes how many terms of the asymptotic series should
be taken for any given n. How accurately does this approximate n! when the number of
terms is chosen optimally?

A.4.4. Use the approximation

\[ |B_{2m}| \approx \frac{2 (2m)!}{(2\pi)^{2m}} \]

to check your estimate from exercise A.4.3.

A.4.5. M&M Using the Euler–Maclaurin formula with $f(x) = 1/x$ gives us an approximation
for the harmonic series. Show that the constant term of the Euler–Maclaurin
formula is

\[ \frac{1}{2} + \sum_{m=1}^{\infty} \frac{B_{2m}}{2m}. \]

Determine how useful this is in approximating the value of Euler’s $\gamma$.




A.4.6. M&M Use the Euler–Maclaurin formula to show that

\[ \sum_{k=1}^{n} \frac{1}{k} = \ln n + \frac{1}{2n} + \gamma - H(n) \]

where $H(n)$ can be approximated by the asymptotic series

\[ H(n) \sim \sum_{m=1}^{\infty} \frac{B_{2m}}{2m} \, n^{-2m}. \]

For each of the values n = 5, 10, 20, 50, and 100, find which summand is the smallest in
absolute value. Estimate the function of n that describes how many terms of the asymptotic
series should be taken for any given n. How accurately does this approximate the harmonic
series when the number of terms is chosen optimally?

Appendix B
Bibliography

Birkhoff, Garrett, A Source Book in Classical Analysis, Harvard University Press, Cambridge, MA,
1973.
Bonnet, Ossian, “Remarques sur quelques intégrales définies,” Journal de Mathématiques Pures et
Appliquées, vol. 14, August 1849, pages 249–256.
Borwein, J. M., P. B. Borwein, and D. H. Bailey, “Ramanujan, Modular Equations, and Approxima-
tions to Pi or How to Compute One Billion Digits of Pi,” The American Mathematical Monthly,
vol. 96, no. 3, March 1989, pages 201–219.
Cauchy, Augustin-Louis, Cours d’Analyse de l’École Royale Polytechnique, series 2, vol. 3 in Œuvres
complètes d’Augustin Cauchy, Gauthier-Villars, Paris, 1897.
Cauchy, Augustin-Louis, Leçons sur le calcul différentiel, series 2, vol. 4 in Œuvres complètes
d’Augustin Cauchy, Gauthier-Villars, Paris, 1899.
Cauchy, Augustin-Louis, Résumé des Leçons données à l’École Royale Polytechnique sur le calcul
infinitésimal, series 2, vol. 4 in Œuvres complètes d’Augustin Cauchy, Gauthier-Villars, Paris,
1899.
Dijksterhuis, E. J., Archimedes, translated by C. Dikshoorn, Princeton University Press, Princeton,
1987.
Dirichlet, G. Lejeune, Werke, reprinted by Chelsea, New York, 1969.
Dunham, William, Journey through Genius: the great theorems of mathematics, John Wiley & Sons,
New York, 1990.
Edwards, C. H., Jr., The Historical Development of the Calculus, Springer–Verlag, New York, 1979.
Euler, Leonhard, Introduction to Analysis of the Infinite, books I & II, translated by John D. Blanton,
Springer–Verlag, New York, 1988.
Gauss, Carl Friedrich, Werke, vol. 3, Königlichen Gesellschaft der Wissenschaften, 1876.
Grabiner, Judith V., The Origins of Cauchy’s Rigorous Calculus, MIT Press, Cambridge, MA, 1981.
Grattan-Guinness, Ivor, Convolutions in French Mathematics, 1800–1840, vols. I, II, III, Birkhäuser
Verlag, Basel, 1990.
Grattan-Guinness, Ivor, The Development of the Foundations of Mathematical Analysis from Euler
to Riemann, MIT Press, Cambridge, MA, 1970.


Grattan-Guinness, Ivor, Joseph Fourier, 1768–1830, MIT Press, Cambridge, MA, 1972.
Hawkins, Thomas, Lebesgue’s theory of integration: its origins and development, 2nd edition,
Chelsea, New York, 1975.
Hermite, Charles and Thomas Jan Stieltjes, Correspondance d’Hermite et de Stieltjes, B. Baillaud
and H. Bourget, eds., Gauthier-Villars, Paris, 1903–1905.
Kaczor, W. J., and M. T. Nowak, Problems in Mathematical Analysis, vols. I, II, III, Student
Mathematical Library vols. 4, 12, 21, American Mathematical Society, Providence, RI, 2000–
2003.
Kline, Morris, Mathematical Thought from Ancient to Modern Times, Oxford, 1972.
Lacroix, S. F., An Elementary Treatise on the Differential and Integral Calculus, translated by
Babbage, Peacock, and Herschel with appendix and notes, J. Deighton and Sons, Cambridge,
1816.
Lacroix, S. F., Traité Élémentaire de Calcul Différentiel et de Calcul Intégral, 4th edition, Bachelier,
Paris, 1828.
Medvedev, Fyodor A., Scenes from the History of Real Functions, translated by Roger Cooke,
Birkhäuser Verlag, Basel, 1991.
Olsen, L., “A new proof of Darboux’s theorem,” The American Mathematical Monthly, vol. 111 (2004),
pages 713–715.
Poincaré, Henri, “La Logique et l’Intuition dans la Science Mathématique et dans l’Enseignement,”
L’Enseignement mathématique, vol. 1 (1889), pages 157–162.
Preston, Richard, “The Mountains of Pi,” The New Yorker, March 2, 1992, pages 36–67.
Riemann, Bernhard, Gesammelte Mathematische Werke, reprinted with comments by Raghavan
Narasimhan, Springer–Verlag, New York, 1990.
Rudin, Walter, Principles of Mathematical Analysis, 3rd edition, McGraw-Hill, New York, 1976.
Serret, J.-A., Calcul Différentiel et Intégral, 4th edition, Gauthier-Villars, Paris, 1894.
Struik, D. J., A Source Book in Mathematics 1200–1800, Princeton University Press, Princeton, 1986.
Truesdell, C., “The Rational Mechanics of flexible or elastic bodies 1638–1788,” Leonardi Euleri
Opera Omnia, series 2, volume 11, section 2, Orell Füssli Turici, Switzerland, 1960.
Van Vleck, Edward B., “The influence of Fourier’s series upon the development of mathematics,”
Science, N.S. vol. 39, 1914, pages 113–124.
Weierstrass, Karl Theodor Wilhelm, Mathematische Werke von Karl Weierstrass, 7 volumes, Mayer
& Müller, Berlin, 1894–1927.
Whittaker, E. T., and G. N. Watson, A Course of Modern Analysis, 4th ed., Cambridge University
Press, Cambridge, 1978.

Appendix C
Hints to Selected Exercises


Exercises which can also be found in Kaczor and Nowak are listed at the start of each
section following the symbol KN. The significance of 3.1.2 = II:2.1.1 is that exercise
3.1.2 in this book can be found in Kaczor and Nowak, volume II, problem 2.1.1.
2.1.6 Use the fact that $1 + x + x^2 + \cdots + x^{k-1} = (1 - x^k)/(1 - x)$.
2.1.8 If you stop at the kth term, how far away are the partial sums that have more
terms?

2.2.1 Use the fact that $1 + x + x^2 + \cdots + x^{k-1} = (1 - x^k)/(1 - x)$.


2.2.4 Take the first $3k + 3$ terms and rewrite this finite summation as
$(1 + 2^{-3} + 2^{-6} + \cdots + 2^{-3k}) + (2^{-1} + 2^{-4} + 2^{-7} + \cdots + 2^{-(3k+1)}) - (2^{-2} + 2^{-5} + 2^{-8} + \cdots + 2^{-(3k+2)})$.
2.2.6 Use the work from exercise 2.2.5.
2.2.8 Find an expression in terms of r and s for a partial sum of a rearranged series that
uses the first r positive summands and the first s negative summands. Show that
you can get as close as desired to the target value provided only that r and s are
sufficiently close, regardless of their respective sizes.

2.3.4 Take pairs of terms and assume that regrouping of the summands is allowed.
2.3.5 Take the tangent of each side and use the formula
\[ \tan(x + y) = \frac{\tan x + \tan y}{1 - \tan x \tan y}. \]

2.3.8 Explain what happens when you take a = −1 in equation (2.20).


2.4.10 Begin by separating the summands according to the total number of digits in the
denominator:
\[
\begin{aligned}
& \left( \frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{8} \right) \\
& + \left( \frac{1}{10} + \frac{1}{11} + \cdots + \frac{1}{18} + \frac{1}{20} + \cdots + \frac{1}{88} \right) \\
& + \left( \frac{1}{100} + \frac{1}{101} + \cdots + \frac{1}{888} \right) \\
& + \cdots + \left( \frac{1}{10^k} + \frac{1}{10^k + 1} + \cdots + \frac{1}{8(10^{k+1} - 1)/9} \right) \\
& + \cdots.
\end{aligned}
\]
a. There are 8 summands in the first pair of parentheses. Show that there are
$72 = 8 \cdot 9$ in the second, $648 = 8 \cdot 9^2$ in the third, $5832 = 8 \cdot 9^3$ in the fourth,
and that in general there are $8 \cdot 9^k$ in the $(k + 1)$st. Hint: what digits are you
allowed to place in the first position? in the second? in the third?
b. Each summand in a given pair of parentheses is less than or equal to the first
term. Show that the sum of the terms in the $(k + 1)$st parentheses is strictly less
than $8 \cdot 9^k / 10^k$, and thus our series is bounded by
\[ \frac{8}{1} + \frac{8 \cdot 9}{10} + \frac{8 \cdot 9^2}{10^2} + \frac{8 \cdot 9^3}{10^3} + \cdots. \]
c. Evaluate the geometric series given above.
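2.4.14 Show that
\[ 1 + \frac{1}{3} + \cdots + \frac{1}{2n - 1} = \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2n} \right) - \frac{1}{2} \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right). \]
2.4.16 Show that
\[ \sum_{m=n}^{\infty} \frac{1}{m^2} < \frac{1}{n^2} + \int_n^{\infty} \frac{dx}{x^2}. \]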
2.4.18 Work with the fraction of the road that you have covered. The first step takes you
1/2000th of the way, the next step 1/4000th, the third 1/6000th.

2.5.3 Integration by parts.


2.5.15 Is c the same for all values of n?

2.6.5 How can you use the fact that $e^{-1/x^2}$ has all of its derivatives equal to 0 at $x = 0$?


KN 3.1.2 = II:2.1.1, 3.1.3 = II:2.1.2, 3.1.4 = II:2.1.3, 3.1.5 = II:2.1.4, 3.1.6 = II:2.1.5,
3.1.15 = II:2.1.8, 3.1.16 = II:2.1.10b, 3.1.17 = II:2.1.9b, 3.1.18 = II:2.1.12,
3.1.19 = II:2.1.13, 3.1.20 = II:2.1.13.
3.1.2 For those functions with $|x|$, consider $x > 0$ and $x < 0$ separately. Use the definition
of the derivative at $x = 0$. For functions with $\lfloor x \rfloor$, consider $x \notin \mathbb{Z}$ separately. Use
the definition of the derivative at $x \in \mathbb{Z}$.
3.1.3 $\log_x a = (\ln a)/(\ln x)$.

3.1.4 Consider the transition points: Is the function continuous there? If it is, rely on the
definition of the derivative.
3.1.15 xf (a) − af (x) = (x − a)f (a) − a(f (x) − f (a)). The same trick will work in
part (b).
3.1.16 Rewrite the fraction as
\[ \frac{f(x) e^x - f(0) e^0}{x - 0} \div \frac{f(x) \cos x - f(0) \cos 0}{x - 0}. \]
3.1.19 (b) Consider $f(x) = x^2 \sin(1/x)$, $x \ne 0$.
3.1.20 Rewrite
\[ \frac{f(x_n) - f(z_n)}{x_n - z_n} = \frac{f(x_n) - f(a)}{x_n - a} \cdot \frac{x_n - a}{x_n - z_n} + \frac{f(z_n) - f(a)}{z_n - a} \cdot \frac{a - z_n}{x_n - z_n}. \]
Show that this must lie between $\dfrac{f(x_n) - f(a)}{x_n - a}$ and $\dfrac{f(z_n) - f(a)}{z_n - a}$. Why doesn’t
this approach work when $x_n$ and $z_n$ lie on the same side of $a$?

3.2.9 Show that there is a $k$ between 0 and $\Delta x$ for which
\[ \frac{f(x_0 + 2\Delta x) - 2 f(x_0 + \Delta x) + f(x_0)}{\Delta x^2} = \frac{f'(x_0 + 2k) - f'(x_0 + k)}{k}. \]
Define $g(h) = f'(x_0 + k + h)$, so that
\[ \frac{f(x_0 + 2\Delta x) - 2 f(x_0 + \Delta x) + f(x_0)}{\Delta x^2} = \frac{g(k) - g(0)}{k}. \]
Use the generalized mean value theorem a second time.

KN 3.3.4 = II:1.2.1, 3.3.5 = II:1.2.2, 3.3.6 = II:1.2.3, 3.3.7 = II:1.2.4, 3.3.8 = II:1.3.3,
3.3.9 = II:1.3.4, 3.3.10 = II:1.3.7, 3.3.11 = II:1.3.10, 3.3.12 = II:1.3.11, 3.3.13 =
II:1.3.12, 3.3.14 = II:1.2.6, 3.3.15 = II:1.2.7, 3.3.34 = II:2.1.23.
3.3.3 What fractions in $(\sqrt{2} - 1, \sqrt{2} + 1)$ have denominators $\le 5$?
3.3.4 Where is sin x = 0?
3.3.6 For rational numbers, f (p/q) = p/(q + 1). What is the difference between p/q
and f (p/q)?
3.3.8 Apply the intermediate value theorem to the function g defined by g(x) = f (x) − x.
3.3.11 Consider g(x) = f (x + 1) − f (x), 0 ≤ x ≤ 1.
3.3.12 f (2) − f (0) = (f (2) − f (1)) + (f (1) − f (0)).
3.3.13 Start by explaining why f (i + 1) − f (i) cannot be strictly positive for all integer
values of i ∈ [0, n − 1].
3.3.14 Consider separately the cases $x^2 \in \mathbb{N}$, $x^2 \notin \mathbb{N}$.
3.3.15 Consider separately the cases $x \in \mathbb{N}$, $x \notin \mathbb{N}$.
3.3.17

| sin(x + h) − sin x| = |(sin x)(cos h − 1) + (cos x)(sin h)|


≤ | sin x| · | cos h − 1| + | cos x| · | sin h|
≤ | cos h − 1| + | sin h|.
Graph | cos h − 1| + | sin h| and find an interval containing h = 0 where this func-
tion is less than 0.1.
3.3.18
\[
\begin{aligned}
|(x + h)^2 - x^2| &= |2xh + h^2| \\
&= |h| \cdot |2x + h| \\
&\le |h| \cdot |2 + h|.
\end{aligned}
\]

3.3.22 Use the power series for $\ln(1 + x)$ to show that $\ln(1 + x) < x$ for $x > 0$, and
therefore if $a > b > 0$ then
\[ \ln a - \ln b = \ln\left( 1 + \frac{a - b}{b} \right) < \frac{a - b}{b}. \]
3.3.27 Since f is continuous on any interval that does not contain 0, you only need to
prove that if c1 ≤ 0 ≤ c2 and if A is between f (c1 ) and f (c2 ), then there is some c,
c1 < c < c2 , for which f (c) = A.
3.3.28 When does a small change in x result in a change in f (x) that cannot be made
arbitrarily small?
3.3.33
\[ \frac{f(x)}{g(x)} - \frac{f(c)}{g(c)} = \frac{f(x)}{g(x)} \cdot \frac{g(c) - g(x)}{g(c)} + \frac{f(x) - f(c)}{g(c)}. \]


KN 3.4.6 = I:1.1.7–12, 3.4.23 = II:2.1.24, 3.4.24 = II:2.1.25, 3.4.25 = II:2.1.26,
3.4.26 = II:2.1.27, 3.4.27 = II:2.1.28, 3.4.28 = II:2.1.29, 3.4.29 = II:1.2.17, 3.4.30
= II:2.2.1.
3.4.1 The function cannot be continuous.
3.4.2 The domain cannot be a closed, bounded interval.
3.4.4 Prove the contrapositive. Explain why if A and B have opposite signs, then
|A − B| ≥ |A|.
3.4.5 What exactly is the technical statement that corresponds to this condition? For
every pair $(\epsilon, \delta)$, what must exist? What happens for very large values of $\epsilon$? Does
this technical statement of existence make sense as the definition of a vertical
asymptote?
3.4.6 (h) If you hold n constant, what value of m, 1 ≤ m ≤ 2n − 1 maximizes this
expression? (j) How close can this expression get to 1? (n) Find the minimum
value of x/y + 4y/x in the first quadrant. (o) Set m = kn and find the values of k
that maximize, minimize the resulting expression. (r) Find the maximum value of
xy/(1 + x + y) in the first quadrant.
3.4.11 Given the sequences $x_1 \le x_2 \le \cdots \le x_k \le \cdots < \cdots \le y_k \le \cdots \le y_2 \le y_1$, let $c$
be the least upper bound of $\{x_1, x_2, x_3, \ldots\}$. Prove that $c \in [x_k, y_k]$ for every $k$.
3.4.12 Let $S$ be the set of all $x$ for which $a \le x < x_2$ and $g(x) \ge g(x_2)$. If $S$ is not empty,
then it is bounded and so has a least upper bound, call it $B \le x_2$. Note that $B$ may
or may not be in $S$.
a. Use the continuity of $g$ to prove that $g(B) \ge g(x_2)$.
b. Use the fact that we can make $|g'(x_2) - (g(x_2) - g(x))/(x_2 - x)|$ as small as we
wish by taking $x$ sufficiently close to $x_2$ to prove that $B < x_2$.
c. Use the fact that $B < x_2$, $g(B) \ge g(x_2)$, and $g'(x) \ge 0$ to prove that there are
elements of $S$ that are strictly larger than $B$. This implies that $B$ is not an upper
bound and so $S$ must be the empty set.
3.4.13 Assume that we can find a pair $(x_1, x_2)$, $a \le x_1 < x_2 \le b$, for which $f(x_1) > f(x_2)$.
It follows that there is a positive number $\alpha$ such that
\[ \frac{f(x_2) - f(x_1)}{x_2 - x_1} < -\alpha < 0. \]
3.4.16 If $|\sin(1/c) - c^{-1} \cos(1/c)| > 1$, then $c$ cannot be in the range of $g$.
3.4.19 Let $c = e^{(-8n+1)\pi/4}$, $n \in \mathbb{N}$, and try to find an $x$ for which $\sin(\ln c) + \cos(\ln c) = \sin(\ln x)$.
What are other values of $c \in (0, 1)$ that do not correspond to any value of $x$?
3.4.20 Recall Theorem 3.4.
3.4.22 Start by proving that between any two real roots of $P$ there must be at least
one real root of $P'$. If a polynomial $P$ has a root of order $n > 1$ at $x = a$, then
$P(x) = (x - a)^n Q(x)$ where $Q$ is a polynomial, $Q(a) \ne 0$. The derivative
$P'(x) = n(x - a)^{n-1} Q(x) + (x - a)^n Q'(x)$ has a root of order $n - 1$ at $x = a$.
3.4.24 If $f$ has a local maximum at $x = c$, then $f'_-(c) \ge 0$ (why?). Let $d = \sup\{x \mid f(x) > f(c)/2\}$.
Show that $f'_-(d) \le 0$. Complete the proof.
3.4.25 Use the idea that helped us prove the mean value theorem.
3.4.26 Use the result of exercise 3.4.25.
3.4.27 Use the result of exercise 3.4.26.
3.4.28 Let c = inf {x ∈ (a, b) | f (x) = 0}. Why is this set non-empty?
3.4.30 Consider $f(x) e^{\alpha x}$.


KN 3.5.2 = II:2.3.6, 3.5.3 = II:2.3.7, 3.5.4 = II:2.2.11, 3.5.17 = II:2.3.8, 3.5.18 =
II:2.3.34.
3.5.1 Prove the contrapositive.
3.5.2 Consider negative as well as positive values of x.
3.5.4 Consider derivatives.
3.5.11 Rewrite the limit as
\[ \lim_{x \to 0} \frac{e^{-1/x^2}}{x} = \lim_{x \to 0} \frac{x^{-1}}{e^{1/x^2}}. \]

3.5.17 Differentiate each side of

\[ [f(x) - f(0)] \, g'(\theta(x)) = [g(x) - g(0)] \, f'(\theta(x)) \]

with respect to $x$, collect the terms that involve $\theta'(x)$ on one side, divide both sides
by $x$, and then take the limit of each side as $x \to 0^+$.
3.5.18 Rewrite $f(x)^{g(x)}$ as $e^{g(x) \ln(f(x))}$.


KN 4.1.16 = I:3.4.10a.
4.1.1 In this case we know that the partial sum to $n$ terms differs from the value of the
series by exactly $(1/2)^n / (1 - 1/2) = 1/2^{n-1}$.
4.1.3 For an alternating series with summands whose absolute values are decreasing
toward zero, the partial sum approximation differs from the target value by at most
the absolute value of the next term.
4.1.7 A function f is even if and only if f (−x) = f (x).
4.1.14 Are the hypotheses of the Alternating Series Test satisfied?
4.1.15 Combine consecutive summands with the same sign.
4.1.16 Write out enough terms that you get a feel for n . Combine consecutive summands
with the same sign.


KN 4.2.4 = I:2.2.50, 4.2.5 = I:3.2.1, 4.2.6 = I:3.4.1, 4.2.7 = I:3.4.13, 4.2.29 = I:3.2.17.
4.2.1 How do you know that for all n sufficiently large, |an | < 1?
4.2.2 Use the definition of convergence. To what value does this series converge? Show
that given any $\epsilon > 0$, there is some $N$ so that all of the partial sums past the $N$th
differ from this value by less than $\epsilon$.
4.2.4 (a) The arctangent function is bounded. (d) To test for absolute convergence, combine
consecutive pairs of terms. (f) Show that $n/(n + 1)^2 > 1/(n + 3)$.
4.2.5 (a) $\sqrt{n^2 + 1} - \sqrt[3]{n^3 + 1} = n(1 + 1/n^2)^{1/2} - n(1 + 1/n^3)^{1/3}$. Use a Taylor polynomial
approximation. (b) Show that $\lim_{n \to \infty} (n/(n + 1))^{n+1} = \lim_{n \to \infty} (1 + 1/n)^{-n-1} = 1/e$.
(c) Use a Taylor polynomial approximation. (f) Use the root test.
4.2.6 (b) When does the rational function of $a$ have absolute value less than 1? (c) Use
the root test.
4.2.8 Show that if $n \ge N$, then $|a_n| \le |a_N| \, \alpha^{n-N}$ and so
\[ \sqrt[n]{|a_n|} \le \alpha \sqrt[n]{|a_N| / \alpha^N}. \]

4.2.22 Prove and then use the fact that for $k \ge 2$:
\[ \frac{1}{k \ln(k \ln 2)} > \frac{1}{2k \ln k}. \]

4.2.24 Show that $n^{1 + (\ln \ln n + \ln \ln \ln n)/\ln n} = n (\ln n)(\ln \ln n)$.

4.2.28 Use Stirling’s formula in place of the factorials.
4.2.29 (a) $2^{n/2} > 2n$ for $n > 8$.


KN 4.3.1 = I:3.3.2, 4.3.2 = I:3.3.3, 4.3.3 = I:3.3.6, 4.3.4 = I:3.3.7, 4.3.18 = I:2.4.11,
4.3.20 = I:2.4.15, 4.3.21 = I:2.4.19, 4.3.22 = I:2.4.20, 4.3.23 = I:2.4.26, 4.3.24 =
II:1.2.18, 4.3.25 = II:1.2.19, 4.3.26 = II:1.2.20.
4.3.1 (a) Using the limit ratio test, we have absolute convergence if
\[ 1 > \lim_{n \to \infty} \frac{(n + 1)^3 |x|^{n+1}}{n^3 |x|^n} = |x|. \]
Check for convergence at $x = +1$ and at $x = -1$. (d) Using the lim sup root test,
we have absolute convergence when
\[ 1 > \limsup_{n \to \infty} \left| (2 + (-1)^n) \, x \right| = 3 |x|. \]
Check for convergence at $x = 1/3$ and at $x = -1/3$. (f) Rewrite this summation
so that the power of $x$ is the index of summation: $\sum_{n=1}^{\infty} 2^n x^{n^2} = \sum_{m=1}^{\infty} a_m x^m$
where $a_m = 2^{\sqrt{m}}$ if $m$ is a perfect square, $a_m = 0$ if $m$ is not a perfect square.
Now use the lim sup root test. (h) Use the lim sup root test and remember that
$\lim_{n \to \infty} (1 + 1/n)^n = e$.
4.3.2 (b) Use the lim sup root test. This converges absolutely if
\[ 1 > \lim_{n \to \infty} \left( \frac{n}{n + 1} \right)^{1/n} \left| \frac{2x + 1}{x} \right| = \left| \frac{2x + 1}{x} \right|. \]
Check what happens when $|(2x + 1)/x| = 1$, i.e. when $2x + 1 = x$ and when $2x + 1 = -x$.
4.3.3 (a) This implies that $\lim_{n \to \infty} |a_n x^n|^{1/n} = |x| \lim_{n \to \infty} L^{1/n} n^{-\alpha/n}$.
4.3.4 (a) Since the radius of convergence is $R$, we know that $\limsup_{n \to \infty} \sqrt[n]{|a_n|} = 1/R$. It
follows that $\limsup_{n \to \infty} \sqrt[n]{|2^n a_n|} = 2/R$. (c) Use Stirling’s formula.
4.3.6 Do the summands approach 0 when |x| equals the radius of convergence?
4.3.7 Use Stirling’s formula.
4.3.8 Use Stirling’s formula.
4.3.11 Use the ratio test.
4.3.12 This is a hypergeometric series.
4.3.13 Show that
\[ 1 \cdot 3 \cdot 5 \cdots (2n - 1) = \frac{(2n)!}{2 \cdot 4 \cdot 6 \cdots 2n} = \frac{(2n)!}{2^n \cdot n!}. \]
If we ignore $F(n)$ in equation (4.15), how close is this approximation when
$n = 10$? $= 20$? $= 100$?
4.3.14 Either use the result of exercise 4.3.13 together with Stirling’s formula, or use the
fact that $\lim_{k \to \infty} (1 + 1/k)^k = e$.
4.3.18 (a) Let α = p/q where gcd(p, q) = 1. The answer is in terms of q.
4.3.20 First show that it is enough to prove the last two inequalities. Use the equivalent
definition on the lim sup found in exercise 4.3.19.
4.3.21 Use the result from exercise 4.3.20.
4.3.23 Show that it is enough to prove the last inequality. Let
\[ A = \limsup_{n \to \infty} \frac{a_{n+1}}{a_n}. \]
Choose an $\epsilon > 0$ and a response $N$ such that for all $n \ge N$, $a_{n+1}/a_n < A + \epsilon$. Show
that for $n \ge N$, $a_n < a_N (A + \epsilon)^{n-N}$. Take the limit as $n$ approaches infinity of the
$n$th root of this upper bound.

4.4.2 Use equation (4.27).


4.4.4 At $x = 1/2$,
\[ \left| \sum_{k=1}^{n} (-1)^{k-1} \cos[(2k - 1)\pi/4] \right| = \left| \frac{1 - (-1)^n \cos(\pi n/2)}{2 \cos(\pi/4)} \right| \le \sqrt{2}. \]
By equation (4.23), $|T_n - T_m| \le 2\sqrt{2}/(2m + 1)$.

4.4.12 (c) Use Dirichlet’s test with $b_k = c_k R^k$ and $a_k = (x/R)^k = e^{ik\theta}$.

5.1.1 The regrouped series has initial term 2/3 and ratio 1/9.
5.1.10 Choose any ten terms from the original series to be the first ten terms of the
rearranged series. Can the remaining terms be arranged so that the resulting series
converges to the target value? Does it matter in what order we put the first ten
terms?

5.2.1 Show that the function represented by this series is not continuous.
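5.2.4 For all $x \in [-\pi, \pi]$, this is an alternating series and therefore the sum of the first
$N$ terms differs from the value of the series by an amount whose absolute value is
less than $|x|^{2N+1}/(2N + 1)!$.
5.2.7
\[
\begin{aligned}
& -1 + \frac{1}{4} - \frac{1}{9} + \frac{1}{16} - \frac{1}{25} + \frac{1}{36} - \cdots \\
& \qquad = -\left( 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots \right) + 2 \left( \frac{1}{4} + \frac{1}{16} + \frac{1}{36} + \cdots \right).
\end{aligned}
\]
5.2.8 What is the power series expansion of $\ln(1 - x)$?
5.2.9 Use the partial sums

\[ S_n(x) = \sum_{k=1}^{n} \frac{x^k}{k^2} \]

and the fact that

\[ |\mathrm{Li}_2(1) - \mathrm{Li}_2(x)| \le |\mathrm{Li}_2(1) - S_n(1)| + |S_n(1) - S_n(x)| + |S_n(x) - \mathrm{Li}_2(x)|. \]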

KN 5.3.2 = II:3.2.29.

5.3.1 Consider functions for which fk (x) = 0 for all k and all x.
5.3.2 Show that it converges at x = 0. Explain why it is enough to show that for any N
and any x,
\[ \sum_{n=N+1}^{\infty} \frac{1}{n^2 + x^2} \le \sum_{n=N+1}^{\infty} \frac{1}{n^2}, \]
and then explain why this is true.


5.3.4 Use equation (4.28) from page 166.
5.3.6 Show that
\[ \frac{x^2}{(1 + kx^2)(1 + (k-1)x^2)} = \frac{1}{1 + (k-1)x^2} - \frac{1}{1 + kx^2}. \]
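The identity in hint 5.3.6 can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the telescoped partial sum assumes the summation starts at k = 1, which the hint itself does not state.

```python
from fractions import Fraction

def summand(k, x):
    """Left side of the identity: x^2 / ((1 + k x^2)(1 + (k-1) x^2))."""
    return x**2 / ((1 + k * x**2) * (1 + (k - 1) * x**2))

def difference(k, x):
    """Right side of the identity: 1/(1 + (k-1) x^2) - 1/(1 + k x^2)."""
    return 1 / (1 + (k - 1) * x**2) - 1 / (1 + k * x**2)

def partial_sum(n, x):
    """Sum of the summands for k = 1..n (assumed starting index)."""
    return sum(summand(k, x) for k in range(1, n + 1))

x = Fraction(3, 7)  # an arbitrary rational test point
print(all(summand(k, x) == difference(k, x) for k in range(1, 50)))
# Adding the differences telescopes to 1 - 1/(1 + n x^2).
print(partial_sum(20, x) == 1 - 1 / (1 + 20 * x**2))
```

Because the arithmetic is exact, equality here is a genuine check of the algebra at the chosen point rather than a floating-point coincidence.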
5.3.7 Show that $|G(x) - G_n(x)| = |\sin x|/(1 + nx^2)$. Given $\epsilon > 0$, find $x_0 > 0$ so that
$|x| \le x_0$ implies that $|\sin x|/(1 + nx^2) \le |\sin x| \le |\sin x_0| < \epsilon$. For this value
of $x_0$, find an N so that $n \ge N$ and $|x| \ge x_0$ implies that $|\sin x|/(1 + nx^2) \le
1/(1 + nx^2) \le 1/(n x_0^2) < \epsilon$. Explain why this proves that the convergence is
uniform.


KN 5.4.1 = II:3.2.2, 5.4.12 = II:3.2.14.

5.4.1 (a) For each n, find the supremum of $\{n^2 x^2 e^{-n^2 |x|} \mid x \in \mathbb{R}\}$. (c) Show that for
any n, there is an x > 0 for which the nth summand is equal to $2^n$ and all of
the summands beyond the nth are ≥ 0. Explain why you cannot have uniform
convergence if this is true. (f) Show that $\arctan x + \arctan x^{-1} = \pi/2$, and therefore
the nth summand is equal to $\arctan(1/(n^2(1 + x^2)))$. Explain why this is less than
or equal to $1/(n^2(1 + x^2))$.
5.4.2 Use the fact that $a_1 + 2a_2 x + 3a_3 x^2 + \cdots$ converges uniformly and absolutely on
(0, R).
5.4.5 Find the values of N that are responses to $\epsilon$ at x = a, over the open interval (a, b),
and at x = b.
5.4.6 Consider summands that are not continuous at x = a or x = b.
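5.4.10 Show that for any n ≥ 2:
\begin{align*}
\left| \sum_{k=2}^{2n} \frac{\sin k\pi/n}{\ln k} \right|
 &\ge \sum_{k=2}^{2n} \frac{\sin k\pi/n}{\ln k} \\
 &= \frac{-\sin \pi/n}{\ln(n+1)} + \sum_{k=2}^{n} \sin(k\pi/n) \left( \frac{1}{\ln k} - \frac{1}{\ln(k+n)} \right) \\
 &\ge \frac{-\sin \pi/n}{\ln(n+1)} + \sum_{k=2}^{n} \sin(k\pi/n) \, \frac{\ln 2}{(\ln n)(\ln 2n)} \\
 &= \frac{f(n) \ln 2}{(\ln n)(\ln 2n)} - \sin(\pi/n) \left( \frac{1}{\ln(n+1)} + \frac{\ln 2}{(\ln n)(\ln 2n)} \right),
\end{align*}
where
\[ f(n) = \sin(\pi/n) + \sin(2\pi/n) + \sin(3\pi/n) + \cdots + \sin(n\pi/n). \]
Use equation (5.64) to show that
\[ f(n) = \frac{\sin(\pi/n)}{1 - \cos(\pi/n)}. \]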
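A quick numerical sanity check of hint 5.4.10 can make the target of the argument more concrete. The sketch below compares the direct sum f(n) with the closed form, and compares the block of terms from k = 2 to 2n with the lower bound in the hint. The function names are hypothetical, and this is a floating-point illustration, not a proof.

```python
import math

def f_direct(n):
    """f(n) = sin(pi/n) + sin(2 pi/n) + ... + sin(n pi/n), summed term by term."""
    return sum(math.sin(k * math.pi / n) for k in range(1, n + 1))

def f_closed(n):
    """Closed form from the hint: sin(pi/n) / (1 - cos(pi/n))."""
    return math.sin(math.pi / n) / (1 - math.cos(math.pi / n))

def block_sum(n):
    """Sum of sin(k pi/n) / ln k for k = 2..2n."""
    return sum(math.sin(k * math.pi / n) / math.log(k) for k in range(2, 2 * n + 1))

def lower_bound(n):
    """Last line of the chain of inequalities in the hint."""
    c = math.log(2) / (math.log(n) * math.log(2 * n))
    return f_closed(n) * c - math.sin(math.pi / n) * (1 / math.log(n + 1) + c)

for n in (10, 100, 1000):
    print(n, f_direct(n), f_closed(n), block_sum(n), lower_bound(n))
```

Watching how the last column behaves as n grows suggests why these blocks of terms are the ones to examine when testing for uniform convergence.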
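5.4.12 (c) Show that $2 \sin(n^2 x) \sin(nx) = \cos(n(n-1)x) - \cos(n(n+1)x)$.
(d) Rewrite the summation as
\[ \sum_{n=1}^{\infty} \frac{\sin(nx)}{n} \left( \arctan(nx) - \frac{\pi}{2} \right) + \frac{\pi}{2} \sum_{n=1}^{\infty} \frac{\sin(nx)}{n}. \]
In the first sum, let $b_n(x) = \pi/2 - \arctan(nx)$. (e) Rewrite the summation as
\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^{x-a/2}} \, n^{-a/2}. \]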

5.4.13 Consider $\sum_{k=1}^{\infty} (x^k - x^{k-1})$ on [0, 1].

6.1.4 Show that f (x) = f (−x), g(x) = −g(−x).


6.1.5 If f is even and g is odd, then F (−x) = f (x) − g(x).


6.1.6 If the Fourier series converges at x = 0, then $\sum_{k=1}^{\infty} a_k$ converges, and therefore the
partial sums of $\sum_{k=1}^{\infty} a_k$ are bounded.
6.1.9 Find an algebraic expression for this function on (−1, 1).
6.1.10 Uniform convergence means that you are allowed to interchange integration and
infinite summation.
6.1.18 Change variables using t = α + β − u and let h(u) = g(α + β − u). Show that h
is nonnegative and increasing on [α, β].

6.2.2 Where is the graph of $y = x^3 - 2x^2 + x$ increasing? Where is it decreasing? Where
is the slope steepest?
6.2.4 Fix $\epsilon > 0$. Put a bound on the error contributed by using an approximating sum over
the interval $[0, \epsilon]$. Use the fact that sin(1/x) is continuous on the interval $[\epsilon, 1]$.
6.2.6 Use the mean value theorem.
6.2.7 We need a bounded, differentiable function whose derivative is not bounded.
6.2.11 Use the definition of differentiability. You must show that
\[ \lim_{x \to x_0} \left[ \frac{\int_{x_0}^{x} f(t)\, dt}{x - x_0} - f(x_0) \right] = 0. \]

6.2.12 Consider Theorem 3.14.




KN 6.3.8 = III:1.1.7, 6.3.9 = III:1.1.6, 6.3.10 = III:1.1.14.
6.3.2 Show that given any σ > 0, there is a response δ so that for any partition with
subintervals of length < δ, the variation is less than σ .
6.3.7 Fix a variation σ . Can we limit the sum of the lengths of the intervals on which the
variation exceeds σ ?
6.3.8 Where is this function discontinuous? How large is the variation at the points of
discontinuity?
6.3.10 (d) $\frac{1}{n+k} = \frac{1}{2n} \cdot \frac{2}{1 + k/n}$. (f) First show that the function of n is equal
to $e^{\sum_{k=1}^{n} (1/n) \ln(1 + k/n)}$.
6.3.11 The summation is an approximation using a partition with infinitely many intervals
of the form $[q^{n+1}, q^n]$. Show that for any $\epsilon > 0$, we can find a Riemann sum with
intervals of length less than 1 − q that differs from our infinite summation by less
than $\epsilon$.
6.3.13
\[ \sum_{n=1}^{\infty} \frac{1}{(2n-1)^2} = \sum_{n=1}^{\infty} \frac{1}{n^2} - \sum_{n=1}^{\infty} \frac{1}{(2n)^2}. \]

6.3.20 Note that at points of discontinuity, the function decreases. Otherwise, it is an
increasing function. Show that if we approximate f(x) with $\sum_{n=1}^{100} ((nx))/n^2$, then
we are within 1/200 of the correct value. Now explain why it follows that if
0 ≤ x < y ≤ 1, then
\[ f(y) - f(x) < \sum_{n=1}^{100} \frac{ny - nx}{n^2} + \frac{1}{100} < (y - x)(\ln(101) + \gamma) + \frac{1}{100}. \]

6.3.16 Write $f(x) = f_N(x) + R_N(x)$ where
\[ R_N(x) = \sum_{n=N+1}^{\infty} \frac{((nx))}{n^2}. \]
Given an $\epsilon > 0$, the task is to show how to find a response δ such that for 0 < ν < δ,
$|f_N(x + \nu) + R_N(x + \nu) - f_N(x + 0) - R_N(x + 0)| < \epsilon$.
6.3.19 Let σ(k) be the sum of the divisors of k, and set $k = 2^a k_1$ where $k_1$ is odd. Show
that $\psi(k) = (2^{a+1} - 3)\,\sigma(k_1)$. It is known that
\[ \limsup_{k \to \infty} \frac{\sigma(k)}{k \ln \ln k} = e^{\gamma}. \]
This is Gronwall’s Theorem, published in 1913.
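6.3.20 Show that
\begin{align*}
g(1/5) &= \frac{1}{5} \sum_{n=0}^{\infty} \frac{1}{5n+1} + \frac{2}{5} \sum_{n=0}^{\infty} \frac{1}{5n+2} - \frac{2}{5} \sum_{n=0}^{\infty} \frac{1}{5n+3} - \frac{1}{5} \sum_{n=0}^{\infty} \frac{1}{5n+4} \\
       &= \sum_{n=0}^{\infty} \frac{125n^2 + 125n + 26}{5(5n+1)(5n+2)(5n+3)(5n+4)}.
\end{align*}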
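The step from four separate sums to a single sum in this hint only makes sense when the four terms with the same n are combined first, since each of the four sums diverges on its own. The sketch below checks that term-by-term combination with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction

def four_terms(n):
    """The nth terms of the four sums, combined with coefficients 1/5, 2/5, -2/5, -1/5."""
    return (Fraction(1, 5) * Fraction(1, 5 * n + 1)
            + Fraction(2, 5) * Fraction(1, 5 * n + 2)
            - Fraction(2, 5) * Fraction(1, 5 * n + 3)
            - Fraction(1, 5) * Fraction(1, 5 * n + 4))

def combined(n):
    """The nth term of the single convergent sum in the hint."""
    numerator = 125 * n * n + 125 * n + 26
    denominator = 5 * (5 * n + 1) * (5 * n + 2) * (5 * n + 3) * (5 * n + 4)
    return Fraction(numerator, denominator)

print(all(four_terms(n) == combined(n) for n in range(200)))
# A floating-point partial sum gives a rough numerical value for g(1/5).
print(float(sum(combined(n) for n in range(2000))))
```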

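6.3.21 Let x = p/q, gcd(p, q) = 1. Let $m = \lfloor (q-1)/2 \rfloor$. Show that
\begin{align*}
g(x) &= \sum_{k=1}^{m} \frac{((kx))}{k} + \sum_{k=-m}^{m} ((kx)) \sum_{n=1}^{\infty} \frac{1}{qn + k} \\
     &= \sum_{k=1}^{m} ((kx)) \left( \frac{1}{k} - 2k \sum_{n=1}^{\infty} \frac{1}{q^2 n^2 - k^2} \right).
\end{align*}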

6.4.1 Consider Theorem 3.14.


6.4.10 Use the fact that αm is an integer that is odd when m is even and even when m is
odd.

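A.1.2 Start with
\begin{align*}
\int_0^1 \left( 1 - x^{1/p} \right)^{q} dx
 &= \int_0^1 \left( 1 - x^{1/p} \right)^{q-1} dx - \int_0^1 \left( 1 - x^{1/p} \right)^{q-1} x^{1/p}\, dx \\
 &= \int_0^1 \left( 1 - x^{1/p} \right)^{q-1} dx + p \int_0^1 \left( 1 - x^{1/p} \right)^{q-1} \left( \frac{-x^{(1-p)/p}}{p} \right) x \, dx,
\end{align*}
and then use integration by parts on the second integral.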
A.1.4 Use the substitution $u = (1 - x^{1/p})^q$.
A.1.8 Using equations (A.6) and (A.12), we see that
p+q p+q
f (p, q) + f (p, q − 1) = f (p − 1, q).
q p

A.1.12 The values are undefined when p or q is a negative integer, but they are defined for
other negative values of p and q.
A.1.13 Show that f (2/3, k) = (5 · 8 · 11 · · · (3k + 2))/(3 · 6 · 9 · · · (3k)). Show that
f (2/3, 1/3 + k) = f (2/3, 1/3) (6 · 9 · 12 · · · (3k + 3))/(4 · 7 · 10 · · · (3k + 1)).

A.2.4 Use the fact that equation (A.16) defines $B_n(x)$. Show that
\[ \int_k^{k+1} B_n(1 - x)\, dx = (-1)^n \int_k^{k+1} B_n(x)\, dx. \]
A.2.5 Use equation (A.33) from the previous exercise and equation (A.29).

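A.3.6 Use the fact that
\[ \left( \sum_{j=1}^{\infty} \frac{1}{j^2} \right) \left( \sum_{k=1}^{\infty} \frac{1}{k^4} \right) = \sum_{j \ne k} \frac{1}{j^2 k^4} + \sum_{k=1}^{\infty} \frac{1}{k^6}. \]
Find the coefficients of the summations on the right side:
\[ \left( \sum_{j=1}^{\infty} \frac{1}{j^2} \right)^{3} = \;? \times \sum_{1 \le j < k < l < \infty} \frac{1}{j^2 k^2 l^2} \;+\; ? \times \sum_{j \ne k} \frac{1}{j^2 k^4} \;+\; ? \times \sum_{k=1}^{\infty} \frac{1}{k^6}. \]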

A.3.9 Rather than trying to evaluate 100!, find
\[ A = \ln\left( \frac{2 \cdot (2m)!}{(2\pi)^{2m}} \right) = \sum_{n=1}^{2m} \ln n - (2m - 1) \ln 2 - (2m) \ln \pi. \]
Use the observation that
\[ \frac{2 \cdot (2m)!}{(2\pi)^{2m}} = e^{A} = 10^{A/\ln 10}. \]
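The computation suggested in hint A.3.9 can be sketched in a few lines. The code below is a minimal illustration under the hint's own formulas: it sums logarithms instead of forming the factorial, then splits $10^{A/\ln 10}$ into a mantissa and a power of ten. The function names are hypothetical.

```python
import math

def log_ratio(m):
    """A = ln(2 (2m)! / (2 pi)^(2m)), computed as a sum of logarithms."""
    return (sum(math.log(n) for n in range(1, 2 * m + 1))
            - (2 * m - 1) * math.log(2)
            - (2 * m) * math.log(math.pi))

def mantissa_exponent(m):
    """Write e^A = 10^(A / ln 10) as mantissa * 10^exponent with 1 <= mantissa < 10."""
    log10_value = log_ratio(m) / math.log(10)
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent

# 2m = 100 is the case that would otherwise require evaluating 100!.
print(mantissa_exponent(50))
```

Working with the logarithm keeps every intermediate number of modest size, which is the point of the hint.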

A.3.10 Show that $\zeta(n) < 1 + \int_1^{\infty} x^{-n}\, dx = 1 + 1/(n - 1)$.
A.3.19 Use equation (A.53).

Index

Abel, Niels Henrik, xi, 160–161, 163, 169, 174, Berkeley, George, 50–52
182, 203, 209, 244 Bernoulli numbers, 118, 128, 271, 280–283,
Abel’s lemma (theorem 4.16), 161, 209, 244 287, 297
absolute convergence, 125, 175, 220, 248 Bernoulli polynomials, 269, 278–280
absolute convergence theorem (corollary 4.4), Bernoulli’s identity, 40
126 Bernoulli, Daniel, 4, 52–54, 248
absolute uniform convergence (corollary 5.10), Bernoulli, Jacob, 118, 271, 277–282,
204 284
addition of series (theorem 5.4), 178 Bernoulli, Johann, 40, 52, 109, 277, 284
al-Samaw’al, 278 binomial series, 25, 129
algebraic numbers, 263 convergence, 122, 146, 153, 206
alternating harmonic series, 13, 16 d’Alembert’s investigation of convergence,
alternating series, 126 41–43
alternating series test (corollary 4.5), 126 Bolzano, Bernhard, 57, 78, 81, 84, 90, 258
Ampère, André Marie, 161, 258 Bolzano–Weierstrass theorem, 84
analytic function, 54 Bonnet, Ossian, 72, 102, 220, 244
Apéry, Roger, 290 Bonnet’s lemma (lemma 6.9), 244
Archimedean principle, 12 Bonnet’s mean value theorem (lemma 6.5), 231,
Archimedean understanding, 12, 18 233, 244
Archimedes of Syracuse, 9–11, 19, 22, 237, 277 Borel, Émile, 268
arctangent
series expansion, 23 Cantor, Georg, 84, 268
Aryabhata, 277 Cauchy, Augustin Louis, ix–xi, 11, 12, 19–20,
associative, 12 55, 57, 71–76, 81, 84, 96, 102, 105, 123,
asymptotic series, 299 135, 137, 152, 160, 181–185, 218, 220,
237–242, 248–249, 252, 298
Babbage, Charles, 54 Cauchy criterion
Baire, René Louis, 268 for integrability, 240, 249


Cauchy criterion (theorem 4.2), 123, 129, 208, binomial series, 146
210 Cauchy criterion, 123, 129, 208
Cauchy integral, 238 comparison test, 130
Cauchy sequence, 123 condensation test, 135
Cauchy series, 123 conditional, 126
Cauchy’s condensation test (theorem 4.11), 135 d’Alembert’s definition, 128
Cauchy’s remainder theorem (theorem 3.11), Dirichlet’s test, 164
107 Gauss’s test, 152
characteristic function, 268 improper integral, 138
Charles X, 57 in norm, 145
Collins, John, 39 infinite series, 117
commutative, 12 integral test, 137
comparison test (theorem 4.6), 130 limit ratio test, 131
completeness, 125 limit root test, 132
completeness (theorem 4.3), 125 of binomial series, 153
condensation test, 135 of exponential series, 146
conditional convergence, 126, 177 of Fourier series, 158
conditions for Riemann integrability p-test, 137
(theorem 6.10), 251 pointwise, 145
continued fractions, 83 radius of, 147
continuity, 81 ratio test, 130
and boundedness, 95 root test, 132
and differentiability, 91, 258 uniform, 185, 188, 197, 203, 208, 254, 259
and integrability, 241, 268 converse, 122
Lacroix’s definition, 78 cosine
of composition, 90 series expansion, 44
of power series, 205 covering, 268
of product, 89 $C^p$ function, 54
of reciprocal, 89
of sum, 88 d’Alembert, Jean Le Rond, 41–43, 52, 53, 122,
on an interval, 81 128, 129, 248
piecewise, 227 Darboux’s theorem (theorem 3.14), 112
uniform, 228 Darboux, Jean Gaston, 111
continuity and uniform convergence implies decreasing function, 87
convergence at endpoints (theorem 5.14), Dedekind, Julius Wilhelm Richard, 84, 175
208 definite integral, 219
continuity of infinite series (theorem 5.6), 187 de Moivre, Abraham, 40, 271, 293–296
continuity of integral (corollary 6.8), 243 derivative
continuous implies bounded (theorem 3.6), 95 and continuity, 91, 258
continuous implies bounds achieved Lagrange’s definition, 54, 55
(theorem 3.8), 98 of infinite series, 63–65, 195
continuous implies integrable (theorem 6.6), 241 of power series, 205
continuous on [a, b] implies uniform continuity one-sided, 92
(lemma 6.3), 229 Diderot, Denis, 41, 52
contrapositive, 122 differentiable implies continuous (theorem 3.5),
convergence 91
absolute, 125, 149, 175, 220, 248 differentiation
alternating series, 126 of series, 6

Dijksterhuis, E. J., 11 floor, 30


dilogarithm, 187 Fourier, Jean Baptiste Joseph, 1–7, 22, 43, 53,
Dirichlet, Peter Gustav Lejeune, x–xi, 7, 82, 88, 54, 160, 161, 171, 197, 199, 217–220, 222,
160–161, 163, 174, 175, 182, 203, 248, 267
217–227, 248, 267, 269, 270, 282 Fourier series, 5–7, 63, 145, 166, 171, 182, 191,
Dirichlet kernel, 223 197, 199, 218, 248, 269
Dirichlet’s test (corollary 4.17), 164, 166 convergence, 158, 217
Dirichlet’s test for uniform convergence Dirichlet’s test, 164
(theorem 5.16), 211 Dirichlet’s theorem, 227
Dirichlet’s theorem (theorem 6.1), 227 uniform convergence, 210
discontinuity, 84 uniqueness, 267
distributive law for series (theorem 5.5), 179 frequency, 53
divergence, 18, 52–53, 149
comparison test, 130 γ , see Euler’s constant
condensation test, 135 Gauss, Carl Friedrich, x, 57, 149, 151, 152, 174
d’Alembert’s definition, 128 Gauss’s test (theorem 4.15), 152, 187
Gauss’s test, 152 generalized mean value theorem (theorem 3.2),
integral test, 137 75
limit ratio test, 131 geometric series, 17
limit root test, 132 Germain, Sophie, 217, 282
of binomial series, 153 Gilbert, Phillipe, 259
p-test, 137 Grattan-Guinness, Ivor, 18
ratio test, 130 greatest lower bound, 97
root test, 132 Gregory, James, 23, 35, 39, 40
divergence theorem (theorem 4.1), 122 Gudermann, Christof, 203
divergence to infinity, 29
dominated uniform convergence (theorem 5.9), Hachette, Jean Nicholas Pierre, 160
203 Hadamard, Jacques, 270
Hankel, Hermann, 259, 267
Eisenstein, Ferdinand Gotthold Max, 174 Hardy, Godfrey Harold, 259
envelope, 185 harmonic series, 121, 284
ε–δ, 61 harmonics, 53
Euclid, 73 harmonics, 53
Eudoxus of Cnidus, 11 Hawkins, Thomas, 259n
Euler, Leonhard, ix, 4, 17–18, 31, 38, 39, 43, Heine, Heinrich Eduard, 84, 267
52–54, 138, 150, 151, 172, 248, 271, 282, Hermite, Charles, 269
284–289, 296 Herschel, John, 54
Euler’s constant, 31 Hoüel, Guillaume Jules, 259
Euler–Maclaurin formula (theorem A.1), 139, Holmboe, Bernt Michael, 160
296, 298 hypergeometric series, 150
existence of radius of convergence Gauss’s test for convergence, 152
(theorem 4.14), 149
exponential function ibn Al-Haytham, 278
series expansion, 44 improper integral, 252
unbounded domain, 138
Fermat, Pierre de, 99, 272, 282 value, 138
Fermat’s last theorem, 160, 270, 282–283 increasing function, 87
Fermat’s theorem on extrema (theorem 3.9), 100 infimum, 97

infinite limit, 109 least upper bound, 97, 210


infinite series, 9, 171 Lebesgue, Henri Léon, 1, 220, 263, 268
addition of two series, 178 Lebesgue integral, 268–269
alternating, 126 Legendre, Adrien Marie, 160, 270, 282
differentiation, 63–65, 195 Leibniz, Gottfried, 23, 24, 40, 50, 99, 172
divergent, 18, 52–53 Levi ben Gerson, 278
integration, 197 L’Hospital, Guillaume François Antoine de, 109
multiplication by constant, 179 L’Hospital’s rule
of continuous functions, 182, 187 0/0 (theorem 3.12), 109
rearranging, 13, 175, 177, 220 ∞/∞ (theorem 3.13), 110
regrouping, 13, 173 L’Huillier, Simon Antoine Jean, 52
infinite summation, see infinite series lim inf, 148
integral lim sup, 148
as an area, 219 limit
as inverse of derivative, 219 at infinity, 109
Cauchy, 238 d’Alembert’s definition, 52
with Cauchy criterion, 240 from the left, 92
definite, 219 from the right, 92
improper, 138 infinite, 109
Lebesgue, 268, 269 interchanging, 171
of infinite series, 197 lower, 148
of power series, 205 one-sided, 92
Riemann, 249, 267 upper, 148
necessary and sufficient conditions for limit ratio test (corollary 4.8), 131
existence, 251 limit root test (corollary 4.10), 132
with Cauchy criterion, 249 Liouville, Joseph, 217
integral form of the mean value theorem logarithm
(theorem 6.7), 243 series expansion, 28
integral of Dirichlet kernel (lemma 6.4), 231 lower limit, 148
integral test (theorem 4.13), 137
integration Machin, John, 23
of series, 5 Maclaurin, Colin, 138, 296
intermediate value property, 73, 78, 79, 85 Madhava, 23
intermediate value theorem (theorem 3.3), 85 mean value theorem
inverse, 122 Bonnet’s, 231, 233, 244
Bonnet’s proof, 72, 74, 102
Jacobi, Carl Gustav Jacob, 174 Cauchy’s first proof, 72–73
Cauchy’s second proof, 75, 77
Kepler, Johann, 99 generalized, 75
Kummer, Ernst Eduard, 283 integral form, 243
mean value theorem (theorem 3.1), 58, 72–77
Lacroix, Sylvestre François, 4, 78, 161 measure, 263, 268
Lagrange, Joseph Louis, ix, 4, 6, 43, 53, 54, 57, Medvedev, Fyodor A., 259n
166, 217, 248 Meray, Charles, 84
Lagrange remainder, 43–47 Mercator, Nicolaus, 28
Lagrange’s remainder theorem (theorem 2.1), 44 method of exhaustion, 11
Lamé, Gabriel, 283 modified converse to intermediate value theorem
Laplace, Pierre Simon, 4, 57, 161 (theorem 3.4), 87

Monge, Gaspard, 4 rational function, 38


monotonic function, 87 rearranging convergent series (theorem 5.2),
175
Napoléon I, 217 rearranging infinite series, 13, 175, 177, 220
Narayana Pandita, 278 refinement, 239
Navier, Claude Louis Marie Henry, 217 regrouping infinite series (theorem 5.1), 13,
nested interval principle, 32 173
Newton, Isaac, 23–26, 28, 40, 50, 99, 271 regular prime, 283
Nilakantha, 23 Riemann, Georg Friedrich Bernhard, xi, 4, 58,
174–175, 220, 228, 248–255, 258, 267,
Olsen, Lars, 111 269, 270, 289
one-sided derivative, 92 Riemann hypothesis, 290
one-sided limit, 92 Riemann integral, 249, 267
Oresme, Nicole, 17 necessary and sufficient conditions for
existence, 251
p-test (corollary 4.12), 137 Riemann rearrangement theorem (theorem 5.3),
Peacock, George, 54 177
π Riemann’s lemma (lemma 6.2), 228, 242
calculations of, 22–26 Rolle, Michel, 100
Wallis’s formula for, 24 Rolle’s theorem (theorem 3.10), 100
piecewise continuous, 227 root test (theorem 4.9), 132
piecewise monotonic, 87, 227, 262 Russell, Bertrand Arthur William, 1
Poincaré, Henri, ix, 269
pointwise convergence, 145 Saigey, Jacques Fréderic, 160
Poisson, Siméon Denis, 4, 161, 217, Seidel, Phillip, 182
220 series, see infinite series
power series, 39, 145, 204–205 Serret, Joseph Alfred, 72, 102
binomial, see binomial series sine
continuity, 205 series expansion, 44
differentiation, 205 Steiner, Jakob, 174
expansion of $e^{-1/x^2}$, 68 Stieltjes, Thomas Jan, 269
hypergeometric, see hypergeometric series Stirling, James, 271, 293–296
integration, 205 Stirling’s formula, 45, 118, 133, 146, 271,
uniform convergence, 204, 209 293–298
primes Stirling’s series, 118
counting function, 270, 289 Stokes, George, 182
in arithmetic progression, 270 Sturm, Charles François, 217
Pythagorean triples, 282 supremum, 97
Swineshead, Richard, 17
Q.E.D., 73 Sylvester, James Joseph, 41n
quadrature of the parabola, 9
Taylor series, 40
Raabe, Joseph Ludwig, 152, 258 Cauchy remainder, 107–109
radius of convergence, 147 Lagrange remainder, 43–47, 71, 105–109
existence of, 149 Taylor, Brook, 40
for complex-valued power series, 169 Taylor, Richard, 270, 283
Ramanujan, S., 26 term-by-term differentiation (theorem 5.7), 6,
ratio test (theorem 4.7), 130, 151 195

term-by-term integration (theorem 5.8), 5, 197 Vallée Poussin, Charles de la,


trigonometric series, 145, 218, 254 270
variation, 250
uniform continuity, 228 variation on dominated uniform convergence
uniform convergence, 185, 188, 197, 203, 208, (corollary 5.11), 204
254, 259, 267 vibrating string problem, 4, 53–54
Cauchy criterion, 202
Dirichlet’s test, 211 Wallis, John, 23, 271–275
in general, 267 Wallis’s formula, 24, 275, 296
of power series, 209 Weierstrass, Karl Theodor Wilhelm, xi, 4, 58,
Weierstrass M-test, 203, 204 91, 182, 203, 258–263, 267
uniform convergence of power series, I Weierstrass M-test (corollary 5.12), 203, 204,
(corollary 5.13), 204 259
uniform convergence of power series, II Whitehead, Alfred North, 1
(theorem 5.15), 209 Wiles, Andrew, 270, 283
uniformly bounded, 210
upper bound implies least upper bound ζ , see zeta function
(theorem 3.7), 98 zeta function, 36, 289
upper limit, 148 Zhu Shijie, 278
CORRECTIONS TO A RADICAL APPROACH TO REAL ANALYSIS , 2nd
EDITION

page 11: paragraph 4, line 1: “Archimedes” should be “Archimedes’ ”

page 14, Exercise 2.1.1, in part b, the vertices should be at $(a, 1 - a^2)$, $(a + \delta, 1 - (a + \delta)^2)$,
$(a + 2\delta, 1 - (a + 2\delta)^2)$

page 15: Exercise 2.1.2, last line, coordinates of first point should be $(k 2^{-n}, 1 - k^2 2^{-2n})$.

page 15: Exercise 2.1.5, line (2.5), the second term should be 1/2k rather than 1/(2k n)

page 16: Exercise 2.1.10.a, last line, “are all with” should read “are all within”

page 17: Exercise 2.1.10 (d) Insert the following sentence immediately before the sentence
that begins “Explore the decimal values . . .”: “Given positive integers r and s, consider
the rearrangement of the harmonic series that takes the first r positive terms, then the
first s negative terms, then continues to alternate r positive terms with s negative terms.”

page 25: Equation (2.21), in the second line, the last term should be $t^6$, not $t^4$

page 27: Exercise 2.3.11. This technique only works for x ≥ 4. For x = 2 or 3, find the
square root of 1/2 = 1 − 1/2 or 3/4 = 1 − 1/4, respectively, then multiply your answer by
2.

page 28: Exercise 2.3.12. $(1 + x)^2$ should be $(1 + x)^a$.

page 36: Exercise 2.4.9, “greek letter” should be “Greek letter”

page 40: line 3, “Jean Bernoulli” should be “Johann Bernoulli”

page 44: Theorem 2.1, Opening phrase of the theorem should be: “Given a function f
for which the nth derivative, $f^{(n)}$, is continuous on an open interval that contains (a, x),
. . .”

page 45: Just above the headline Lagrange and the Binomial Series, insert “Of
course, we do not need Stirling’s Formula to prove that this limit is 0. For example, choose
any integer N > 2|x|. Then show that the ratio of consecutive terms is less than 1/2 for all
n > N, which implies the sequence (for n > N) is bounded above by a constant times
$(1/2)^n$ and hence approaches 0.”

page 47: line 1: “|x| < 1” should be “0 < x < 1”,


Date: February 2, 2020.

line 3: “If |x| is larger than 1” should be “If x is larger than 1”.

page 50: Exercise 2.5.20, change “graph x−7 times each” to just “graph each”

page 52: left side of last displayed equation should be $1 - x^2 + x^3 - x^5 + x^6 - x^8 + \cdots$

page 54: The definition of $C^p$ and analytic functions ignores a very real distinction
between $C^\infty$ functions and analytic functions. A function f that is an analytic function at
$x_0$ must be $C^\infty$ on an open interval containing $x_0$, but more than that, there must be an
open interval containing $x_0$ in which the power series at $x_0$ converges to f. The example
given on page 55 is precisely an example of a function that is $C^\infty$ for all x but is not
analytic at x = 0.

page 56: Exercises 2.6.4 and 2.6.5. Delete the adjective “analytic.”

page 58: Figure 3.1, the label at the right endpoint of the interval should be x

page 59: line 2 should end with “. . . what do we mean by the”

page 73: In the definition of the intermediate value property, immediately below, and
in Figure 3.6: It is confusing to use $x_1$ and $x_2$ since $x_1$ and $x_2$ occur earlier in the page as
points where Cauchy cuts the interval. Substitute α for $x_1$ and β for $x_2$.

page 81: fifth line from bottom, “of [a, b]” should be “on [a, b]”

Page 84: line 16, first term in the sequence should be 2/π rather than 1/π.

Pages 86–7: starting at the second line above equation (3.43), the fact that $f(x_k)$ and
$f(y_k)$ can be forced as close together as we wish by taking k sufficiently large relies on
uniform continuity, which has not yet been established for continuous functions on closed
bounded intervals. To avoid the need to use uniform continuity, replace the text starting
at this point and continuing to the end of the proof with the following:

and $f(x_{k+1})$, $f(y_{k+1})$ lie on opposite sides of A.

Our sequences $x_1 \le x_2 \le \cdots$ and $y_1 \ge y_2 \ge \cdots$ satisfy the conditions of the nested
interval principle and so there is a number c that lies in all of these intervals. Again, by
the Archimedean definition of limit, we see that
\[ \lim_{k \to \infty} x_k = \lim_{k \to \infty} y_k = c. \]
Since f is continuous at x = c, we know that
\[ \lim_{k \to \infty} f(x_k) = \lim_{k \to \infty} f(y_k) = f(c). \]
Since A lies between $f(x_k)$ and $f(y_k)$ and each of these sequences has the common limit of
f(c), A must equal f(c).
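The nested-interval argument in this replacement text can be mirrored numerically. The sketch below assumes the intervals $[x_k, y_k]$ are produced by repeated halving, which is one common construction and may differ in detail from the book's; the function name and the tolerance are hypothetical choices made for illustration.

```python
def bisect_level(f, a, b, A, tol=1e-12):
    """Shrink [x, y] inside [a, b] while keeping f(x) and f(y) on opposite sides of A."""
    x, y = a, b
    if f(x) == A:
        return x
    if f(y) == A:
        return y
    if (f(x) - A) * (f(y) - A) > 0:
        raise ValueError("f(a) and f(b) must lie on opposite sides of A")
    while y - x > tol:
        mid = (x + y) / 2
        if f(mid) == A:
            return mid
        if (f(x) - A) * (f(mid) - A) < 0:
            y = mid   # f(x) and f(mid) straddle A: keep the left half
        else:
            x = mid   # otherwise f(mid) and f(y) straddle A: keep the right half
    return (x + y) / 2

# Example: find c in [0, 2] with c^3 = 2.
print(bisect_level(lambda t: t**3, 0.0, 2.0, 2.0))
```

The left endpoints never decrease and the right endpoints never increase, which is the monotonicity the nested interval principle requires.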

page 103, Exercise 3.4.6.f. By “decimal fraction” I mean a number in decimal form

page 103, Exercise 3.4.12, “increasing” should be “strictly increasing”

page 106: equation (3.52) third term of the series expansion for F(x) should be $f'(a)(x - a)$
rather than $f(a)(x - a)$.

page 107: Theorem 3.11. The hypothesis should be that there is a neighborhood of
x = a in which all derivatives of f exist rather than just that all derivatives of f exist at
x = a.

page 109: In “Definition: infinite limit and limit at infinity,” line 4 should read
sufficiently close to a (but not equal to a). That is to say, there is a δ > 0
so that 0 < |x − a| < δ implies that

page 112, exercise 3.5.3, second displayed inequality, condition should read “if 0 < α < 1”

page 113, Exercise 3.5.8, last expression in displayed equation should be
\[ \lim_{x \to 0} \frac{2x \sin(1/x) - \cos(1/x)}{1}, \]

page 122: first paragraph following Theorem 4.1: ... I want to emphasize what it [omit
is] does

page 127, 3rd line before exercises, change “The summands alternate” to “The signs of
the summands alternate”

page 131: Corollary 4.8 (The Limit Ratio Test). Add: If the limit does not exist, then
this test is inconclusive.

page 132: Corollary 4.10 (The Limit Root Test). Add: If the limit does not exist, then
this test is inconclusive.

page 158: Exercise 4.3.27, “those values of k” should be “those values of m”

page 161: last line before Abel’s Lemma, the month of Abel’s death should be April,
not January.

page 172: equation (5.5), in the limit after the equal sign, $\lim_{x \to 0}$ should be $\lim_{y \to 0}$

page 177, line 3 following ”Rearrangment with Conditional Convergence,” (5.13) should
be (5.11)

page 193, Figure 5.7. There is no graph of y = x. Delete the phrase “, with graph of
y = x included”

page 195, Theorem 5.7, part 1 should read $F = f_1 + f_2 + f_3 + \cdots$ converges uniformly
over any bounded subinterval of I
page 201: Exercise 5.3.3 should read: Prove that if $\sum g_k$ converges uniformly over the
bounded interval I and if $f_k(x) = (x - a) g_k(x)$, then $\sum f_k$ converges uniformly over I.

page 202: Exercise 5.3.11. The reference should be to exercise 5.3.10, not 6.3.7.
page 212: Exercise 5.4.1.b., summand should be $\frac{n^2}{\sqrt{n!}} (x^n + x^{-n})$, missing factorial in the
denominator.

page 229: The proof of Lemma 6.3 contains an error. The fact that x is the upper limit of
the $x_n$ does not guarantee that $|x - x_n|$ can be made arbitrarily small for sufficiently large n.
To see a corrected proof, go to www.macalester.edu/aratra/corrections/lemma6-3.pdf.

page 231: first line, “We have to careful.” should be “We have to be careful.”

pages 231 and 244: Lemmas 6.5 and 6.9, if we want ζ to lie strictly between α and β,
then we do need g to be non-constant on the open interval (α, β).

page 232: First displayed equation after (6.32) should be |F (x + 0) − f (x + 2a)|, missing
closing parenthesis.

page 233: Definition of g in first displayed equation, top line should be $|F(x + 0) - F(x + 2u)|$, 0 < u ≤ a,

page 256, Last line of exercise 6.3.9, the functions f should be h.

page 240: second to last displayed inequality: In the first summation, the upper limit of
summation should be r rather than n.

page 259: Line 6. The proof in 1970 was by Joseph Gerver: The differentiability of the
Riemann function at certain rational multiples of π. American Journal of Mathematics,
92:1970, pp. 33–55.

page 274: Equation (A.8), the numerator of the fraction to the right of the equality
should be 4 · 6 · 8 · · · (2p + 2q − 2)

page 279: first display after (A.21): upper limit of integration should be k [not 1]

page 310: hint for 4.2.5, “(f)” should be “(d)”


ARATRA | A Radical Approach to Real Analysis

David M. Bressoud
Resources for 
A Radical Approach to Real Analysis (2nd edition)

Links to the available resources. 


Note that the Mathematica notebooks may download as text files that may need to
be pasted into a Mathematica notebook in order to be read. The Maple code is in
.docx files from which it can be copied.

Chapter 1: Crises in Mathematics: Fourier's Series


The Derivation of Fourier's Solution
Laplace's Equation
Finding the Coefficients
Approximating Fourier's Solution
Mathematica code
Maple code
The General Solution
The Orthogonality Relation
Fourier Series as Complex Power Series
Maple code for exercises
Mathematica code for exercises
Chapter 2: Infinite Summations
The quadrature of a parabolic segment
The Archimedean Principle
Explorations of the alternating harmonic series
Mathematica code
Maple code
Assigning values to divergent series
More Pi
Mathematica code
Maple code
Newton's formula
Explorations of the Harmonic Series

https://www.davidbressoud.org/aratra[2023/03/01 13:24:32]

Euler's Solution to the Vibrating Drumhead


Explorations of d'Alembert's series
Mathematica code
Maple code
Explorations of Lagrange's Remainder
Mathematica code
Maple code
Maple code for exercises
Mathematica code for exercises
Chapter 3: Differentiability and Continuity
Newton-Raphson Method
How to find and write a proof
Continued Fractions
The Marquis de l'Hospital
Maple code for exercises
Mathematica code for exercises
Chapter 4: The Convergence of Infinite Series
Stirling's Formula
Mathematica code
Maple code
Exponential function
Mathematica code
Maple code
Convergence in norm
Gauss's test
Maple code for exercises
Mathematica code for exercises
Chapter 5: Understanding Infinite Series
The Dilogarithm
Maple code for exercises
Mathematica code for exercises
Chapter 6: Return to Fourier Series
Maple code for exercises
Mathematica code for exercises
Appendix A: Explorations of the Infinite
Binomial coefficients and sums of nth powers
Maple code for exercises
Mathematica code for exercises
Corrections
Acknowledgements
Derivation of Fourier’s Solution

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

December 13, 2005

Fourier began his study of the heat flow problem by demonstrating that a stationary solution
satisfies the differential equation now known as Laplace’s equation:
\[ \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial w^2} = 0. \tag{1} \]
Pierre Simon Laplace (1749–1827) and others had come across this equation in various contexts.
In modern terminology, it is simply the observation that when the flow of heat (∇z) has reached a
state of equilibrium, it is incompressible (∇·∇z = 0).

Web Resource: Go to Laplace’s Equation to learn more about this fundamental partial differential equation.

To solve his partial differential equation (1), Fourier introduced a technique that is standard today.
He searched for special solutions of the form
z(x, w) = g(x)h(w). (2)
When z is of this form, equation (1) reduces to
\[ g''(x) h(w) + g(x) h''(w) = 0, \tag{3} \]
or, assuming the second derivatives are not zero,
\[ \frac{h(w)}{h''(w)} + \frac{g(x)}{g''(x)} = 0, \]
\[ \frac{g(x)}{g''(x)} = \frac{-h(w)}{h''(w)}. \tag{4} \]
The left side of equation (4) is independent of w while the right side is independent of x. This
implies that both sides are independent of both w and x, and so each of these ratios is constant,
\[ \frac{g(x)}{g''(x)} = C = \frac{-h(w)}{h''(w)}. \]


Since $g(x) = C g''(x)$, the sign of g(x) is either always the same as the sign of $g''(x)$, or it is always
the opposite. If we want z(x, w) to be continuous, then we need to have g(−1) = g(1) = 0, and so
$g(x)/g''(x)$ must be negative:

\[ \frac{h(w)}{h''(w)} = A > 0, \qquad \frac{g(x)}{g''(x)} = -A < 0, \]

for some positive constant A. Fourier set $A = 1/t^2$ and solved for $g(x) = c_1 \cos tx + c_2 \sin tx$ and
$h(w) = c_3 e^{-tw} + c_4 e^{tw}$. The coefficient of sin tx must be zero because g is an even function of x.
He then argued that $c_4$ must be zero because the temperature will approach 0 as we move away
from the source of heat at w = 0. He had found a solution:

\[ z(x, w) = a e^{-tw} \cos tx, \]

where a and t are unknown constants. If we want this solution to be zero at x = ±1, then t must
be an odd multiple of π/2.

The general solution is a sum of such functions:

\[ z(x, w) = a_1 e^{-\pi w/2} \cos(\pi x/2) + a_2 e^{-3\pi w/2} \cos(3\pi x/2) + a_3 e^{-5\pi w/2} \cos(5\pi x/2) + \cdots + a_n e^{-(2n-1)\pi w/2} \cos\bigl((2n-1)\pi x/2\bigr). \]


Appendix to A Radical Approach to Real Analysis 2nd edition. © 2006 David M. Bressoud
Laplace’s Equation

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

December 16, 2005

Heat can be thought of as a fluid or a gas. In the absence of external forces, it moves from regions
of high density to regions of low density in a manner that is very similar to a gas. In particular,
it moves along curves perpendicular to the isoclines or curves of constant temperature. If z(x, w)
denotes the temperature at point (x, w), then the vector representing the flow of heat at (x, w) will
be the gradient of z, which is
\[ \nabla z = \frac{\partial z}{\partial x}\,\vec{\imath} + \frac{\partial z}{\partial w}\,\vec{\jmath}. \]

The divergence of a vector function is a measure of how much more of whatever is flowing leaves
a given region than enters it. Given a region R and a flow described by the vector function
$\vec{F} = f\,\vec{\imath} + g\,\vec{\jmath}$, the divergence at a point is measured by calculating the net rate at which the flow
leaves the region R,
\[ \oint_{\partial R} \vec{F} \cdot \vec{n} \, ds, \]
dividing by the area of R, and then taking the limit as the region R shrinks to the single point in
question. This value is denoted by $\operatorname{div} \vec{F} = \nabla \cdot \vec{F}$ and can be calculated directly as
\[ \nabla \cdot \vec{F} = \frac{\partial f}{\partial x} + \frac{\partial g}{\partial w}. \]

As long as we are not on the boundary of our thin plate, heat is neither being created nor destroyed.
Since the temperature has reached steady state (it is independent of time), the divergence must be
0. This is the same as saying that ∇ · ∇z = 0 or, equivalently, that

\[ \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial w^2} = 0. \]
This is Laplace’s equation, valid for any incompressible fluid with potential function z.

How Fourier found the coefficients for equation (1.7)

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2013 David M. Bressoud

September 6, 2013

While Fourier described the cosine expansion of many different even functions, all of the relevant
techniques and difficulties can be found in his first example: the expansion of f (x) = 1. Fourier
used the observation that
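\[ \int_{-1}^{1} \cos\left(\frac{(2m-1)\pi x}{2}\right) \cos\left(\frac{(2n-1)\pi x}{2}\right) dx = \begin{cases} 0 & \text{if } m \ne n, \\ 1 & \text{if } m = n. \end{cases} \tag{1} \]
We follow Fourier and assume that our even function f can be expressed as a cosine series:
\begin{align*}
f(x) &= a_1 \cos\left(\frac{\pi x}{2}\right) + a_2 \cos\left(\frac{3\pi x}{2}\right) + a_3 \cos\left(\frac{5\pi x}{2}\right) + \cdots \\
&= \sum_{m=1}^{\infty} a_m \cos\left(\frac{(2m-1)\pi x}{2}\right). \tag{2}
\end{align*}
Fourier now argues that $a_n$ can be calculated by evaluating the following integral:
\begin{align*}
\int_{-1}^{1} f(x) \cos\left(\frac{(2n-1)\pi x}{2}\right) dx
&= \int_{-1}^{1} \left[ \sum_{m=1}^{\infty} a_m \cos\left(\frac{(2m-1)\pi x}{2}\right) \right] \cos\left(\frac{(2n-1)\pi x}{2}\right) dx \\
&= \sum_{m=1}^{\infty} a_m \int_{-1}^{1} \cos\left(\frac{(2m-1)\pi x}{2}\right) \cos\left(\frac{(2n-1)\pi x}{2}\right) dx \tag{3} \\
&= a_1 \cdot 0 + a_2 \cdot 0 + a_3 \cdot 0 + \cdots + a_{n-1} \cdot 0 + a_n \cdot 1 + a_{n+1} \cdot 0 + \cdots \\
&= a_n. \tag{4}
\end{align*}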

For our particular case, f (x) = 1, the coefficients are


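\begin{align*}
a_n &= \int_{-1}^{1} 1 \cdot \cos\left(\frac{(2n-1)\pi x}{2}\right) dx \\
&= \frac{2}{(2n-1)\pi} \sin\left(\frac{(2n-1)\pi x}{2}\right) \Bigg|_{-1}^{1} \\
&= \frac{4}{(2n-1)\pi} (-1)^{n-1}. \tag{5}
\end{align*}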
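
As a sanity check on equation (5), the integral can be approximated numerically and compared with the closed form. The sketch below is an illustration only; the midpoint rule, the step count, and the function names are choices made here, not anything taken from Fourier.

```python
import math

def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (j + 0.5) * width) for j in range(steps))

def coefficient_numeric(n):
    """a_n for f(x) = 1: integral of cos((2n-1) pi x / 2) over [-1, 1]."""
    return midpoint_integral(
        lambda x: math.cos((2 * n - 1) * math.pi * x / 2), -1.0, 1.0
    )

def coefficient_closed_form(n):
    """Closed form from equation (5): 4 (-1)^(n-1) / ((2n-1) pi)."""
    return 4 * (-1) ** (n - 1) / ((2 * n - 1) * math.pi)

comparison = [
    (n, coefficient_numeric(n), coefficient_closed_form(n)) for n in range(1, 6)
]
for n, numeric, exact in comparison:
    print(f"n = {n}: numeric = {numeric:+.8f}, closed form = {exact:+.8f}")
```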


It follows that
 
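\[ f(x) = 1 = \frac{4}{\pi} \left( \cos\frac{\pi x}{2} - \frac{1}{3} \cos\frac{3\pi x}{2} + \frac{1}{5} \cos\frac{5\pi x}{2} - \frac{1}{7} \cos\frac{7\pi x}{2} + \cdots \right). \tag{6} \]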

We recall that our original problem was to find the distribution of heat, z(w, x), when we hold the
side at w = 0 at the constant temperature 1 and the sides at x = −1 and x = 1 at the constant
temperature 0. The solution is given by

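\[ z(w, x) = \frac{4}{\pi} \left( e^{-\pi w/2} \cos\frac{\pi x}{2} - \frac{1}{3} e^{-3\pi w/2} \cos\frac{3\pi x}{2} + \frac{1}{5} e^{-5\pi w/2} \cos\frac{5\pi x}{2} - \frac{1}{7} e^{-7\pi w/2} \cos\frac{7\pi x}{2} + \cdots \right). \tag{7} \]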

Equation (6) has an interesting corollary. If we set x = 0 and multiply both sides by π, then we
see that
\[ \pi = 4 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \right). \tag{8} \]

Approximating Fourier's Solution
This notebook provides explorations of the general solution to the heat-flow problem. Click on
the formula given below and then press the ENTER key. When you are asked if you want to
evaluate all the initialization cells, answer YES. When Mathematica warns that there might be a
problem, proceed and answer EVALUATE.

> z := (x,w) -> Sum(a[i]*exp((1/2*(2*i-1)*(-Pi)*w))*(cos(1/2*(2*i-1)*Pi*x)), i = 1 .. nops(a));

The variable a is a list of the coefficients. For example, we might have a = [1,-1/2,1/3,-1/4,1/5].
This definition of z uses the length of the list to determine the number of summands. The
following input enters these values and returns the resulting function z.

> a := [1, -1/2, 1/3, -1/4, 1/5]: z(x,w);

The first command produces a 3-dimensional plot of z(x,w), the second produces the
cross-section parallel to the w-axis at x=0, and the third produces the cross-section above the
x-axis.

> plot3d(z(x, w), x= -1.. 1, w= 0.. 2);

> plot(z(0, w), w= 0.. 2);

> plot(z(x, 0), x= -1.. 1);

Fourier's approximation to the constant function 1


The first line enters the function FS(n,x), the first n terms of Fourier's approximation to the
constant function 1. The second line graphs this truncated Fourier series. It is set up to use 6
terms. This value can be changed. The third line introduces the solution to the heat equation, a
function of both x and w. The fourth line graphs the resulting surface that shows the solution of
the heat equation. Again, this has been set up to use the first 6 terms, but this parameter can be
varied.

> FS :=  (n, x) -> 4/Pi*sum((-1)^(i-1)/(2*i-1)*cos(1/2*(2*i-1)*Pi*x),i = 1 .. n) ;

> plot(FS(6,x), x=-1..1);

> FSS := (n,x,w) -> 4/Pi*sum((-1)^(i-1)/(2*i-1)*exp(1/2*(-2*i+1)*Pi*w)*cos(1/2*(2*i-1)*Pi*x),i = 1 .. n);
> plot3d(FSS(6,x,w), x=-1..1, w=0..2);
The General Solution

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

January 20, 2009

How did Fourier discover that in order to expand f (x) = 1, the coefficient of cos((2n − 1)πx/2)
should be $(-1)^{n-1} \cdot 4/((2n-1)\pi)$? He gave several different derivations, but they all amounted to
what has become the standard procedure for finding the coefficients in a Fourier series. To keep
life simple, we will restrict our attention to even functions, f (x) = f (−x), because these can be
expressed in terms of cosines. In chapter 6, we will look at the case of Fourier series for more
general functions.

We begin with the assumption that our function actually can be written as a cosine series, though it
may require infinitely many terms:
\[ f(x) = a_1 \cos\frac{\pi x}{2} + a_2 \cos\frac{3\pi x}{2} + a_3 \cos\frac{5\pi x}{2} + \cdots = \sum_{m=1}^{\infty} a_m \cos\left(\frac{(2m-1)\pi x}{2}\right), \tag{1} \]

where the coefficients $a_1, a_2, a_3, \ldots$ exist; we just do not know what they are.

There is a nice trick for finding these coefficients. We observe that


\[ \int_{-1}^{1} \cos\left(\frac{(2m-1)\pi x}{2}\right) \cdot \cos\left(\frac{(2n-1)\pi x}{2}\right) dx = \begin{cases} 0, & m \ne n, \\ 1, & m = n. \end{cases} \tag{2} \]

Web Resource: Go to The Orthogonality Relation to see why this is true.


Fourier now uses equation (2) to peel off the coefficients one at a time:
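\begin{align*}
\int_{-1}^{1} f(x) \cos\left(\frac{(2n-1)\pi x}{2}\right) dx
&= \int_{-1}^{1} \left[ \sum_{m=1}^{\infty} a_m \cos\left(\frac{(2m-1)\pi x}{2}\right) \right] \cos\left(\frac{(2n-1)\pi x}{2}\right) dx \\
&= \sum_{m=1}^{\infty} a_m \int_{-1}^{1} \cos\left(\frac{(2m-1)\pi x}{2}\right) \cdot \cos\left(\frac{(2n-1)\pi x}{2}\right) dx \\
&= a_1 \cdot 0 + a_2 \cdot 0 + \cdots + a_{n-1} \cdot 0 + a_n \cdot 1 + a_{n+1} \cdot 0 + \cdots \\
&= a_n. \tag{3}
\end{align*}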

It is now possible to calculate the coefficients for the solution when f (x) = 1. The coefficients are
found by substituting 1 for f (x) in equation (3):
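\begin{align*}
a_n &= \int_{-1}^{1} 1 \cdot \cos\left(\frac{(2n-1)\pi x}{2}\right) dx \\
&= \frac{2}{(2n-1)\pi} \sin\left(\frac{(2n-1)\pi x}{2}\right) \Bigg|_{x=-1}^{x=1} \\
&= \frac{4}{(2n-1)\pi} (-1)^{n-1}. \tag{4}
\end{align*}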

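When −1 < x < 1, we have

\begin{align*}
1 &= \frac{4}{\pi} \left[ \cos\left(\frac{\pi x}{2}\right) - \frac{1}{3} \cos\left(\frac{3\pi x}{2}\right) + \frac{1}{5} \cos\left(\frac{5\pi x}{2}\right) - \frac{1}{7} \cos\left(\frac{7\pi x}{2}\right) + \cdots \right] \\
&= \frac{4}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{2n-1} \cos\left(\frac{(2n-1)\pi x}{2}\right). \tag{5}
\end{align*}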

There is one particularly nice consequence of equation (5). If we set x = 0, then all the cosines
take on the value 1. This implies that
\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \cdots. \tag{6} \]
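
Equation (6) converges, but slowly. The following Python sketch (an illustration with names chosen here, not part of the original appendix) prints four times the partial sums next to their distance from π. The error after N terms is roughly 1/N, so each additional correct digit costs about ten times as many terms.

```python
import math

def leibniz_partial_sum(terms):
    """Sum of the first `terms` terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(terms))

for terms in (10, 100, 1000, 10000):
    approx = 4 * leibniz_partial_sum(terms)
    print(f"{terms:6d} terms: 4*S = {approx:.10f}, error = {abs(approx - math.pi):.2e}")
```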

Web Resource: Go to Approximating Fourier’s Solution to approximate this solution and to explore
the Fourier cosine series for other functions.

The Orthogonality Relation

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

January 2, 2022

The term orthogonal means “at right angles.” The concept comes from geometry, but it is not
too much of a stretch to see how it comes to be applied to functions.

An easy way to determine whether or not two vectors are orthogonal is to take their inner product
(also called the dot product). The inner product has two equivalent definitions. On the one hand,
it is the product of the lengths of the vectors multiplied by the cosine of the angle between them,

\[ \vec{v} \cdot \vec{w} = \|\vec{v}\| \, \|\vec{w}\| \cos\theta. \]

On the other hand, if we know the decomposition of these vectors into the unit basis vectors, then
the dot product is the sum of the products of the corresponding coefficients,
   
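\[ \left( v_1 \vec{\imath} + v_2 \vec{\jmath} + v_3 \vec{k} \right) \cdot \left( w_1 \vec{\imath} + w_2 \vec{\jmath} + w_3 \vec{k} \right) = v_1 w_1 + v_2 w_2 + v_3 w_3. \]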

The first definition gives meaning to the inner product. The second makes it easy to calculate.
From the first definition, we can use the inner product to decide whether or not two vectors are
orthogonal: They are orthogonal if and only if their inner product is 0. We can also use it to find
the norm or length of any vector. Since the angle between any vector and itself is 0, we have that

\[ \|\vec{v}\| = \sqrt{\vec{v} \cdot \vec{v}}. \]

Functions are like vectors in that the sum of two functions is another function, and any constant
multiple of a function is a function. If we can define a natural inner product on functions, then we
can use it to find the norm of a function (analogous to the length of a vector), and we can use it
to define orthogonality of functions.

A natural inner product for functions is given by the integral of their ordinary product. This is the
limit of a sum of products, and so really is analogous to the inner product of vectors. We denote
this inner product by
\[ \langle f, g \rangle = \int_{-1}^{1} f(x) g(x) \, dx. \]


It follows that the norm or “length” of a function is


\[ \|f\| = \sqrt{\langle f, f \rangle} = \left( \int_{-1}^{1} f(x)^2 \, dx \right)^{1/2}. \]

Two functions are orthogonal if and only if


\[ \langle f, g \rangle = \int_{-1}^{1} f(x) g(x) \, dx = 0. \]

The functions 
cos(πx/2), cos(3πx/2), cos(5πx/2), cos(7πx/2), . . .
are pairwise orthogonal—any two distinct functions from this list are orthogonal—and their norms
are all 1:
    
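\begin{align*}
\left\langle \cos\left(\frac{(2m-1)\pi x}{2}\right), \cos\left(\frac{(2n-1)\pi x}{2}\right) \right\rangle
&= \int_{-1}^{1} \cos\left(\frac{(2m-1)\pi x}{2}\right) \cos\left(\frac{(2n-1)\pi x}{2}\right) dx \\
&= \begin{cases} 0, & \text{if } m \ne n, \\ 1, & \text{if } m = n. \end{cases} \tag{1}
\end{align*}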

To prove equation (1), we start with the trigonometric identity,


\[ \cos(A) \cos(B) = \frac{1}{2} \bigl( \cos(A + B) + \cos(A - B) \bigr). \]
When we use this to simplify our integral, we get that
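\[ \int_{-1}^{1} \cos\left(\frac{(2m-1)\pi x}{2}\right) \cos\left(\frac{(2n-1)\pi x}{2}\right) dx = \frac{1}{2} \int_{-1}^{1} \Bigl[ \cos\bigl((m+n-1)\pi x\bigr) + \cos\bigl((m-n)\pi x\bigr) \Bigr] dx. \]

If m = n, this is
\[ \frac{1}{2} \int_{-1}^{1} \Bigl[ \cos\bigl((2n-1)\pi x\bigr) + 1 \Bigr] dx = \frac{1}{2} \left[ \frac{\sin\bigl((2n-1)\pi x\bigr)}{(2n-1)\pi} + x \right]_{-1}^{1} = 1. \]

If m ≠ n, this is
\[ \frac{1}{2} \left[ \frac{\sin\bigl((m+n-1)\pi x\bigr)}{(m+n-1)\pi} + \frac{\sin\bigl((m-n)\pi x\bigr)}{(m-n)\pi} \right]_{-1}^{1} = 0. \]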
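
The orthogonality relation can also be observed numerically. The sketch below is illustrative only: the midpoint rule and the names `inner_product`, `basis`, and `gram` are assumptions introduced here. It approximates the inner product on [−1, 1] and tabulates it for the first four cosines. The resulting table should be close to the 4 × 4 identity matrix.

```python
import math

def inner_product(f, g, steps=20000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [-1, 1]."""
    width = 2.0 / steps
    total = 0.0
    for j in range(steps):
        x = -1.0 + (j + 0.5) * width
        total += f(x) * g(x)
    return width * total

def basis(m):
    """The m-th function in the list: cos((2m - 1) pi x / 2)."""
    return lambda x: math.cos((2 * m - 1) * math.pi * x / 2)

# gram[m-1][n-1] approximates <basis(m), basis(n)>.
gram = [[inner_product(basis(m), basis(n)) for n in range(1, 5)] for m in range(1, 5)]

for row in gram:
    print("  ".join(f"{value:+.6f}" for value in row))
```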

Fourier Series as Complex Power Series

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

June 26, 2007

As we shall prove in chapter 5, power series are nice. You can integrate a function represented by
its power series by integrating term-by-term. The resulting series will converge to an integral of the
original function at every point in the interval of convergence of the original function. You can also
differentiate a power series by differentiating term-by-term, but here you have to be a little more
careful. The infinite sum of derivatives will have the same radius of convergence as the original
summation, but the interval of convergence could be strictly smaller. For example,
\[ f(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \cdots \tag{1} \]
has [−1, 1) as its interval of convergence, converging at x = −1 but not at x = 1. We can differentiate
term-by-term,
\[ f'(x) = 1 + x + x^2 + x^3 + \cdots, \tag{2} \]
which has interval of convergence (−1, 1), converging at neither x = −1 nor at x = 1.

In chapter 5, we shall prove that when we integrate or differentiate a power series, we do not
change the radius of convergence, but we can lose points of convergence on the boundary when we
differentiate. When integrating, we can gain points of convergence on the boundary.

The key to understanding Fourier series is that they are really power series in the complex plane,
using the relationship
\[ e^{ix} = \cos x + i \sin x, \qquad \left( e^{ix} \right)^n = e^{inx} = \cos nx + i \sin nx. \tag{3} \]

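The series
\[ f(x) = \frac{4}{\pi} \left( \cos(\pi x/2) - \frac{1}{3} \cos(3\pi x/2) + \frac{1}{5} \cos(5\pi x/2) - \frac{1}{7} \cos(7\pi x/2) + \cdots \right) \tag{4} \]
is simply the real part of
\[ \frac{4}{\pi} \left( z - \frac{z^3}{3} + \frac{z^5}{5} - \frac{z^7}{7} + \cdots \right), \tag{5} \]
where $z = e^{i\pi x/2}$.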

The series in (5) has radius of convergence equal to 1.


Given any power series in z with radius of convergence R, the series will converge when |z| < R
and diverge when |z| > R. In other words, it converges at all points inside the circle of radius R
(which is why we call it the radius of convergence). Behavior on the circle that forms the boundary
between the region of convergence and the region of divergence is more complicated. The series
might converge at all points on the circle, or it might diverge at all points on the circle, or it could
converge at some points and diverge at others.

The Fourier series is a power series in z evaluated at $z = e^{i\pi x/2}$. All of these points lie on the
circle centered at the origin with radius 1. For the power series in (5), these are all points on the
boundary between the region of convergence and the region of divergence. This is why these series
are so problematic. It explains why Fourier series can converge everywhere, yet the term-by-term
derivative fails to converge anywhere except where all summands are 0.
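
The identification of the cosine series (4) with the real part of the power series (5) can be checked term by term. The Python sketch below is an illustration under assumptions made here (the function names, the sample value x = 0.3, and the truncation at 50 terms are arbitrary). It also evaluates the power series at a point strictly inside the unit circle, where z − z³/3 + z⁵/5 − ⋯ is the standard series for arctan z and convergence is fast.

```python
import cmath
import math

def cosine_partial_sum(n, x):
    """First n terms of the cosine series (4)."""
    return 4 / math.pi * sum(
        (-1) ** (k - 1) / (2 * k - 1) * math.cos((2 * k - 1) * math.pi * x / 2)
        for k in range(1, n + 1)
    )

def power_series_partial_sum(n, z):
    """First n terms of the power series (5), for real or complex z."""
    return 4 / math.pi * sum(
        (-1) ** (k - 1) * z ** (2 * k - 1) / (2 * k - 1) for k in range(1, n + 1)
    )

x = 0.3
z_on_circle = cmath.exp(1j * math.pi * x / 2)  # |z| = 1: on the boundary

print("cosine series :", cosine_partial_sum(50, x))
print("Re(power ser.):", power_series_partial_sum(50, z_on_circle).real)

# Inside the circle the same series converges quickly to (4/pi) * arctan(z).
print("at z = 0.5    :", power_series_partial_sum(60, 0.5), 4 / math.pi * math.atan(0.5))
```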

Chapter 1
Maple code for exercises in section 1.2
1.

The command jf(n,x) produces the first n terms of the Fourier cosine expansion of the constant
function 1.

> jf := (n, x) -> evalf(4/Pi*sum((-1)^(k-1)*cos((2*k-1)/2*Pi*x)/(2*k-1),k = 1 .. n));

> for n from 1 to 4 do plot(jf(n,x),x = -1 .. 3) end do;

2.  

The vector v lists all of the values at which we want to evaluate the function jf(n,x)
defined in exercise 1. The plots[listplot] commands generate lists of the values of the
partial sums of this series evaluated at x = 0.99 and x = 0.999, respectively.

> seq([jf(100,v)],v=[0,0.5,0.9,0.99,1.1,2]);

> plots[listplot]([seq([n, jf(n,.99)],n = [seq(100+100*i,i = 0 .. 19)])]);

> plots[listplot]([seq([n, jf(n,.999)],n = [seq(100+100*i,i = 0 .. 19)])]);

3.

This command evaluates the sum of the first 10*2^n terms in the given series.

> [seq(evalf(Sum(1/(2*k-1),k = 1 .. 10*2^n)),n = 0 .. 10)];

4.

We denote the sum of the first n terms of this series as

> z := (n, x, w) -> evalf(4/Pi*sum((-1)^(k-1)*exp(-(2*k-1)*Pi*w/2)*cos((2*k-1)*Pi*x/2)/(2*k-1),k = 1 .. n));

and then do a 3-dimensional plot using

> for n from 1 to 4 do plot3d(z(n,x,w),x = -1 .. 1,w = 0 .. .6) end do;


5.

The following command will compute the first n terms of the series

> s := n -> evalf(Sum((-1)^floor(1/2*k-1/2)/(6*floor(1/2*k)+(-1)^(k-1)),k = 1 .. n));

> [seq(s(n),n = 1 .. 20)];

7.

The following command will compute the first n terms of the series

> g := (n, x) -> evalf(2*sum((-1)^i*sin(1/2*(-1+2*i)*Pi*x),i = 1 .. n));

> for n from 10 by 10 to 50 do plot(g(n, x), x = -1 .. 3, y = -10 .. 10, adaptive = false, numpoints = 1000) end do;

> [seq(g(n,0),n = 1 .. 20)];

> [seq(g(n,.2),n = 1 .. 20)];

> [seq(g(n,.3),n = 1 .. 20)];

> [seq(g(n,.5),n = 1 .. 20)];


The Quadrature of the Parabolic Segment

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

March 21, 2006

The problem of quadrature is one of finding the square or rectangle whose area is the same as that
of a given region. In other words, it amounts to finding the area of the given region. Archimedes
considered an arbitrary region bounded by an arc of a parabola, ADBEC, and a straight line, AC.

The point B is on the arc of the parabola where the tangent line is parallel to line AC, D is the
point where the tangent line is parallel to AB, and E is the point where the tangent line is parallel
to BC. Archimedes proves that the areas of triangles ADB and BEC are each 1/8 of the area of
triangle ABC. This argument can then be repeated to show that each of the next four triangles
has 1/8 of the area of ADB or BEC, or 1/64 of the area of ABC. At the nth step, we add
$2^n$ triangles, each of which has area equal to $1/8^n$ of triangle ABC.
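
Adding up the triangles step by step shows where the construction is heading. The sketch below is an illustration only (the function name is chosen here). It uses exact rational arithmetic, with the area of triangle ABC as the unit, so step n contributes 2^n/8^n = (1/4)^n. The totals approach 4/3 of triangle ABC, which is Archimedes' classical value for the parabolic segment, and the gap after k steps is exactly (1/3)(1/4)^k.

```python
from fractions import Fraction

def segment_area_after(steps):
    """Total area, in units of triangle ABC, after `steps` rounds of added triangles.

    Round 0 is triangle ABC itself; round n adds 2**n triangles of area 1/8**n.
    """
    return sum(Fraction(2 ** n, 8 ** n) for n in range(steps + 1))

for steps in range(6):
    total = segment_area_after(steps)
    print(steps, total, float(Fraction(4, 3) - total))
```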

You can find how Archimedes proved that triangle ADB has 1/8 the area of triangle ABC on pages
54–57 of Archimedes: What did he do besides cry Eureka? by Sherman Stein (published by MAA, 1999).
The Archimedean Principle

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2009 David M. Bressoud

January 21, 2009

The Archimedean principle states that any two positive distances are commensurable, which
means that we can find a finite multiple of the smaller distance that will exceed the larger. This
specifically rules out the possibility of infinitesimal distances that are so small that no matter how
many of them we take—as long as it is a finite number—we can never get enough to equal or exceed
any finite length.

Since we have not yet defined real numbers, it is safest to restate the Archimedean understanding
of an infinite series in terms of rational numbers, numbers that can be expressed as a ratio of two
integers for which the denominator is not zero.

The value of an infinite series, if it exists, is that number T such that, given any rational
numbers L and M with L < T < M, all of the finite sums from some point on will
be strictly contained in the interval between L and M.

We note that positive rational numbers are commensurable (clear denominators and use the fact
that positive integers are commensurable) and that between any two rational numbers there always
lies a third (the average of two rational numbers is rational).

If we could have numbers that are infinitesimal with respect to the rational numbers, then the
Archimedean understanding of infinite series would not work. If the number T is the target value
and infinitesimals exist, then there is a positive infinitesimal ι so that for any rational number
M > T , ι is strictly less than M − T . It follows that if L and M are rational numbers for which
L < T < M , then it is also true that L < T + ι < M . The Archimedean characterization of the
target value would not identify it uniquely.

On the other hand, if we accept the Archimedean principle, we can prove that T is uniquely
defined. Let us assume that there is another number, U , that also satisfies the definition of a target
value. We can assume that T < U . Since there is some n for which n(U − T ) > 1, we know that
0 < 1/n < U −T , and therefore the smallest multiple of 1/n that exceeds T must be strictly smaller
than U . This says that there is a rational number of the form m/n that lies strictly between T and
U . Thus, if L and M are rational numbers with L < T < U < M , then the open interval (L, m/n)
contains all terms of the sequence beyond a certain point, but so does (m/n, M ), a contradiction.
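
The construction in this proof is explicit enough to run. The Python sketch below follows it step by step, under the assumption (made here for the sake of exact arithmetic) that T and U are supplied as rational numbers; the function name is hypothetical. It picks n with n(U − T) > 1 and then takes m/n to be the smallest multiple of 1/n that exceeds T.

```python
from fractions import Fraction
import math

def rational_between(T, U):
    """Return (m, n) with T < m/n < U, following the construction in the proof."""
    if not T < U:
        raise ValueError("requires T < U")
    n = math.floor(1 / (U - T)) + 1  # guarantees n * (U - T) > 1
    m = math.floor(n * T) + 1        # smallest integer m with m/n > T
    return m, n

T, U = Fraction(1, 3), Fraction(2, 5)
m, n = rational_between(T, U)
print(f"{T} < {m}/{n} < {U}")
```

Because m ≤ nT + 1 < nT + n(U − T) = nU, the fraction m/n is strictly smaller than U, which is the step the proof relies on.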

Explorations of the Alternating Harmonic Series
This notebook provides explorations of the rearrangements of the alternating harmonic series.
Click on the formula given below and then press the ENTER key. When you are asked if you
want to evaluate all the initialization cells, answer YES. When Mathematica warns that there
might be a problem, proceed and answer EVALUATE.

The command AHS(r,s,m) calculates the sum of the first (r+s)m terms from the rearranged
alternating harmonic series in which we rearrange the series so that the first r positive terms are
followed by the first s negative terms, then the next r positive terms followed by the next s
negative terms, and so on. Check the value of the original alternating harmonic series and the
series with r=1, s=2, both taken out to a total of six million terms.

> AHS := (r, s, m) -> evalf(Sum(1/(2*j-1),j = 1 .. r*m))-evalf(Sum(1/2/j,j = 1 .. s*m)) ;

> AHS(2,3,100000);

> AHS(1,2,2000000);

The command AHSList(r,s,m) finds the sum of the first (r+s)m terms of the alternating
harmonic series as well as 20 partial sums. It lists the values of these partial sums, and
plots[listplot] plots their values.

> AHSList := (r,s,m) -> [seq([(r+s)*floor(k*m/20)+r*(k-2*floor(1/2*k)), AHS(r,s,floor(k*m/20))],k = 1 .. 20)];

> AHSList(2,3,1000);

> plots[listplot](AHSList(2,3,1000));

Challenge problem

We know that $\lim_{m \to \infty}$ AHS(1, 1, m) = log(2) and $\lim_{m \to \infty}$ AHS(1, 2, m) =
log(2)/2. There is a general formula for $\lim_{m \to \infty}$ AHS(r, s, m). Try to guess it. You may
want to use the Inverse Symbolic Calculator
at http://oldweb.cecm.sfu.ca/projects/ISC/ISCmain.html. The ISC takes a decimal and returns all
exact quantities (such as log 2) whose decimal digits agree with those that have been entered.
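
For readers without Maple, the same experiment can be sketched in Python. This is an assumed translation of AHS, not the notebook's own code: it sums the first r·m positive terms 1/(2j − 1), subtracts the first s·m terms 1/(2j), and uses `math.fsum` to limit rounding error. It only reproduces the two limits stated above and leaves the general formula for the reader.

```python
import math

def ahs(r, s, m):
    """Sum of the first (r+s)*m terms of the rearranged alternating harmonic series."""
    positive = math.fsum(1 / (2 * j - 1) for j in range(1, r * m + 1))
    negative = math.fsum(1 / (2 * j) for j in range(1, s * m + 1))
    return positive - negative

for m in (10, 1000, 100000):
    print(m, ahs(1, 1, m), ahs(1, 2, m))
print("log 2 =", math.log(2), " (log 2)/2 =", math.log(2) / 2)
```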
Assigning Values to Divergent Series

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

March 29, 2006

Cauchy tried to banish the practice of assigning values to series that do not converge. As Daniel
Bernoulli showed (see section 2.6), there are traps that are easy to fall into when we attempt
to assign values to divergent series. But the fact is, scientists need these values. The work of
d’Alembert (see section 2.5) demonstrates the usefulness of such values. A classic example of a
divergent series that is nevertheless extremely useful is the common series expansion of the error
term in Stirling’s formula for n! (see “The size of n!” in appendix A.4).

A divergent series cannot give us an arbitrarily close approximation to a given value, but it might
be able to give us an approximation that is good enough for our purposes. This happens in many
areas of science. In the nineteenth century, it occurred particularly frequently in astronomy where
the values that needed to be calculated could be found to sufficient accuracy by using series that
did diverge, but whose divergence was not evident until many terms had been taken. Such series
are called asymptotic series.
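
The behavior described here can be seen in the Stirling series mentioned above. The Python sketch below is an illustration under assumptions made here: it uses the standard correction terms B_{2k} / (2k(2k − 1) n^{2k−1}) for log n!, with the Bernoulli numbers B_2 through B_16 typed in by hand, and compares against `math.lgamma`. For n = 1 the error shrinks for the first few terms and then grows without bound, while for n = 5 three correction terms already give many correct digits.

```python
from fractions import Fraction
import math

# Bernoulli numbers B_2, B_4, ..., B_16 (standard values, entered by hand).
BERNOULLI = [
    Fraction(1, 6), Fraction(-1, 30), Fraction(1, 42), Fraction(-1, 30),
    Fraction(5, 66), Fraction(-691, 2730), Fraction(7, 6), Fraction(-3617, 510),
]

def stirling_log_factorial(n, terms):
    """Stirling approximation to log(n!) using `terms` correction terms."""
    base = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    correction = sum(
        float(b) / (2 * k * (2 * k - 1) * n ** (2 * k - 1))
        for k, b in enumerate(BERNOULLI[:terms], start=1)
    )
    return base + correction

def errors(n):
    """Absolute errors for 0, 1, ..., len(BERNOULLI) correction terms."""
    exact = math.lgamma(n + 1)  # log(n!)
    return [
        abs(stirling_log_factorial(n, terms) - exact)
        for terms in range(len(BERNOULLI) + 1)
    ]

for n in (1, 5):
    print(f"n = {n}:", ["%.2e" % e for e in errors(n)])
```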

Much work has been done on divergent series. Ernesto Cesàro and Otto Hölder are among the great
mathematicians of the late nineteenth century who worked on them. S. Ramanujan was particularly
adept at their manipulation. In 1949, G. H. Hardy published his classic book, Divergent Series,
on methods of handling and assigning values to such series as well as on demonstrations of their
usefulness. But they are not for the novice. It is very easy to fall into error if you are not extremely
careful about the assumptions that lie beneath the work you are doing.

More Pi
Machin's Approximation
The series partial(s) = 4*sum((-1)^k/(2*k+1),k=0..10^s) converges very slowly. To see how
slowly, evaluate the sum for different values of the upper limit and compare to the true value pi =
3.14159265358979323846...  Note that the upper limit of the summation is 10^s, so partial(3)
calculates the first 1000 terms, partial(4) calculates the first 10,000, and partial(5) the first
100,000.

> partial := s -> evalf(4*sum((-1)^k/(2*k+1),k=0..10^s));

> partial(3);

The Binomial Series Approximation


The approximation to pi given by equation (2.20) is encoded as BS(n). The last command plots
10 values of this function from n=10 to n = 100.

> BS := n -> evalf( 10/3 - sum( (2*k-2)!/2^(2*k-1) / k! / (k-1)! / (2*k+1),k=2..n) );

> BS(10);

> BSList:= n -> [seq([floor(n*j/10),BS(floor(n*j/10))],j=1..10)];

> plots[listplot](BSList(100));

Newton's Approximation
The approximation to pi given by Newton's formula is encoded as NS(n). The last command
returns 10 values of the difference between this function and pi from n=10 to n = 100.

> NS := n -> evalf( 253/80 - sum( 6*(2*k - 3)!*(2*k + 7)/(16^k * k! *(k - 2)!*(2*k + 3)),k=2..n) );

> NS(10);

> NSList:= n -> [seq([floor(n*j/10),evalf(NS(floor(n*j/10))-Pi,100)],j=1..10)];

> NSList(100);
Ramanujan's Approximation
Ramanujan's approximation is to 1/pi, but it is easy to take reciprocals and so obtain an
approximation to pi. The function RS(n) calculates the sum up to n of Ramanujan's series. The
last command returns 10 values of the difference between 1/RS(n) and pi  from n=10 to n = 100.

> RS := n -> evalf(sqrt(8)*1103/9801,100) + evalf(sqrt(8)/9801*sum((4*k)!*(1103 + 26390*k)/(k!)^4/396^(4*k),k=1..n),100) ;

> RSList:= n -> [seq([floor(n*j/10),evalf(1/RS(floor(n*j/10))-Pi,100)],j=1..10)];

> RSList(10);
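
A rough Python counterpart of RS(n), using the standard `decimal` module, shows how quickly Ramanujan's series closes in on pi. This is a sketch under assumptions made here: the working precision of 60 digits, the 50-decimal reference value of pi, and the function names are all choices for illustration. Each additional term contributes roughly eight more correct digits.

```python
from decimal import Decimal, getcontext
from math import factorial

getcontext().prec = 60  # working precision in significant digits

PI_REFERENCE = Decimal("3.14159265358979323846264338327950288419716939937510")

def ramanujan_inverse_pi(last_k):
    """Partial sum (k = 0..last_k) of Ramanujan's series for 1/pi."""
    total = Decimal(0)
    for k in range(last_k + 1):
        numerator = Decimal(factorial(4 * k) * (1103 + 26390 * k))
        denominator = Decimal(factorial(k) ** 4 * 396 ** (4 * k))
        total += numerator / denominator
    return Decimal(8).sqrt() / Decimal(9801) * total

def pi_error(last_k):
    """Distance between 1/(partial sum) and the reference value of pi."""
    return abs(1 / ramanujan_inverse_pi(last_k) - PI_REFERENCE)

for last_k in range(4):
    print(last_k, f"{pi_error(last_k):.3E}")
```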
Newton’s Formula

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

July 5, 2007

We know how Newton discovered his binomial theorem because he described the process in a letter
to Leibniz.

Following in Wallis’s footsteps, Newton recognized that the key to calculating π was finding a way
of evaluating $\pi/4 = \int_0^1 \left(1 - x^2\right)^{1/2} dx$. If that exponent were an integer instead of 1/2, life would
be easy. Like Wallis, Newton begins by comparing what he has to what he can evaluate. He looks
at the expansions of $(1 + x)^m$ for integer values of m.
\begin{align*}
(1 + x)^0 &= 1 + 0 \cdot x + 0 \cdot x^2 + 0 \cdot x^3 + 0 \cdot x^4 + 0 \cdot x^5 + \cdots, \\
(1 + x)^1 &= 1 + 1 \cdot x + 0 \cdot x^2 + 0 \cdot x^3 + 0 \cdot x^4 + 0 \cdot x^5 + \cdots, \\
(1 + x)^2 &= 1 + 2 \cdot x + 1 \cdot x^2 + 0 \cdot x^3 + 0 \cdot x^4 + 0 \cdot x^5 + \cdots, \\
(1 + x)^3 &= 1 + 3 \cdot x + 3 \cdot x^2 + 1 \cdot x^3 + 0 \cdot x^4 + 0 \cdot x^5 + \cdots, \\
(1 + x)^4 &= 1 + 4 \cdot x + 6 \cdot x^2 + 4 \cdot x^3 + 1 \cdot x^4 + 0 \cdot x^5 + \cdots, \\
&\;\;\vdots
\end{align*}

He considered a table of the coefficients and tried to guess what coefficients would correspond to an
exponent of m = 1/2 in
\[ (1 + x)^m = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + \cdots. \]

m      x^0   x^1   x^2   x^3   x^4   x^5
0       1     0     0     0     0     0
1/2
1       1     1     0     0     0     0
3/2
2       1     2     1     0     0     0
5/2
3       1     3     3     1     0     0
7/2
4       1     4     6     4     1     0
9/2
5       1     5    10    10     5     1


It is easy to guess that the values in the first column are all 1, and the values in the second column
must equal m. What about the third column?

Newton would have been very familiar with the sequence 1, 3, 6, 10, . . . , the triangular numbers.
The jth triangular number is the sum of the integers from 1 to j. It equals j(j +1)/2. The exponent
m corresponds to the (m − 1)st triangular number, so the formula to use in the third column is
m(m − 1)/2.

If the values in the first column are constant, the values in the second column increase linearly, and
the values in the third column are given by a quadratic formula, then it makes sense to look for a
cubic polynomial for the fourth column, a quartic polynomial for the fifth, and so on. Armed with
this assumption, it is not difficult to determine what these polynomials must be.

We know that the cubic polynomial in m that fits the coefficients of x3 must have roots at m = 0,
1, and 2. This cubic polynomial must be cm(m − 1)(m − 2) for some still to be determined constant
c. We can find c by using the fact that this polynomial is 1 when m = 3:

1 = c · 3(3 − 1)(3 − 2) = 6c.

This polynomial is m(m − 1)(m − 2)/6.

A similar argument shows us that the polynomial in the next column should be m(m − 1)(m − 2)(m − 3)/4!
and the polynomial in the column after that should be m(m − 1)(m − 2)(m − 3)(m − 4)/5!.

In general, the column that corresponds to $x^k$ will have zeros at m = 0, 1, 2, . . . , k − 1, and a 1 at
m = k. The corresponding polynomial is
\[ \frac{m(m-1)(m-2) \cdots (m-k+1)}{k!}. \]
Since this is defined for any value of m, we can fill in the table:

m      x^0   x^1    x^2     x^3      x^4       x^5
0       1     0      0       0        0         0
1/2     1    1/2   −1/8    1/16    −5/128     7/256
1       1     1      0       0        0         0
3/2     1    3/2    3/8   −1/16     3/128    −3/256
2       1     2      1       0        0         0
5/2     1    5/2   15/8    5/16    −5/128     3/256
3       1     3      3       1        0         0
7/2     1    7/2   35/8   35/16    35/128    −7/256
4       1     4      6       4        1         0
9/2     1    9/2   63/8  105/16   315/128    63/256
5       1     5     10      10        5         1
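
The rows of this table can be regenerated directly from the polynomial m(m − 1)⋯(m − k + 1)/k!. The Python sketch below is an illustration with names chosen here. It uses exact fractions so that the half-integer rows come out in the same form as in the table.

```python
from fractions import Fraction

def binomial_coefficient(m, k):
    """m (m-1) ... (m-k+1) / k!, defined for any rational m and integer k >= 0."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(m) - j  # next factor of the falling product
        result /= j + 1            # next factor of k!
    return result

def table_row(m, columns=6):
    """Coefficients of x^0 .. x^(columns-1) in the expansion of (1 + x)^m."""
    return [binomial_coefficient(m, k) for k in range(columns)]

for twice_m in range(0, 11):
    m = Fraction(twice_m, 2)
    print(str(m).ljust(4), "  ".join(str(c) for c in table_row(m)))
```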

All of this has been inspired guesswork. Newton did not supply a proof at this time, but he
recognized that this enabled him to calculate π with great accuracy, and therefore he was certain
that it must be correct. He had discovered the general binomial theorem:

\[ (1 + x)^m = 1 + mx + \frac{m(m-1)}{2!} x^2 + \frac{m(m-1)(m-2)}{3!} x^3 + \cdots + \frac{m(m-1) \cdots (m-k+1)}{k!} x^k + \cdots. \tag{1} \]

Explorations of the Harmonic Series

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

May 11, 2006

We consider two classic problems that have more in common than might appear at first glance.

1 Stacking Bricks

We stack bricks so that they are held in place simply by gravity. The stack will remain stable even
if one brick extends out further than the brick below it, so long as the center of mass of the top
brick rests on a solid brick. We can even put a third brick on top of that so that it extends a bit
further (see figure 1). How far out can the top brick extend? Is it possible to build a stable stack
of bricks, one on each level, so that the top brick is completely to the right of the bottom brick?

To answer this question, we assume that the bricks are identical, each brick has length 1, we ignore
the width of the brick, and we also do not care how thick it is. As a practical matter, just to keep
the stack from getting very high, we want thin bricks, but they have to be thick enough that they
will not bend. We put the bottom brick so that its left-hand edge is at 0. We are interested in the
numerical value of the right-hand edge of the top brick. Let R(n) be the distance from the vertical
line at 0 to the right edge of the top brick in a stack of n bricks: R(1) = 1.

Figure 1: A leaning, but stable, stack of bricks (left edge of the bottom brick at 0, right edge of the top brick at R(n)).

Figure 2: Stacking four bricks (positions marked at 0, 1, and 23/12).

If we have two bricks, they will just balance if the center of the top brick is directly over the right
edge of the bottom brick: R(2) = 3/2.

How far to the right can we place the third brick? The top two bricks must balance by themselves,
so the top brick extends 1/2 unit further than the middle brick. The combined center of mass of
the top two bricks must lie over the right-hand edge of the bottom brick. This center of mass is at
R(3) − 3/4. Therefore,
\[ R(3) = 1 + \frac{3}{4} = \frac{7}{4}. \]

Can we find an n so that R(n) ≥ 2? The evidence is still inconclusive. The fourth brick begins to
illuminate what is happening (see figure 2). Again, the top three bricks must be stable, so the top
brick is 1/2 a unit to the right of the second brick from the top. The second brick from the top is
1/4 of a unit to the right of the third brick from the top. How far to the right can we move this
third brick? The center of mass of the top three bricks must lie over the right-hand edge of the
bottom brick.

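The center of mass of these three bricks is at
\[ \frac{1}{3} \left[ \left( R(4) - \frac{1}{2} \right) + \bigl( R(4) - 1 \bigr) + \left( R(4) - \frac{5}{4} \right) \right] = R(4) - \frac{11}{12}. \]

That means that
\[ R(4) = 1 + \frac{11}{12} = \frac{23}{12}. \]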

We see that what we really need to know is the distance from the center of mass of a stack of n
bricks to the right-hand edge of the stack. If we call this C(n), then

R(n + 1) = 1 + C(n).


As we have seen,
\[ C(1) = \frac{1}{2}, \qquad C(2) = \frac{3}{4}, \qquad C(3) = \frac{11}{12}. \]
When we put our three bricks on top of a fourth, that moves the center of mass to the left by
(1/4) × (1/2) because the fourth brick is 1/4 of the total mass and its center of mass is 1/2 unit to
the left of the previous center of mass,
\[ C(4) = C(3) + \frac{1}{8} = \frac{11}{12} + \frac{1}{8} = \frac{25}{24}. \]
With five bricks, R(5) = 49/24, and this fifth brick lies completely to the right of the bottom brick.

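We can now pick up the pattern. If we have a stack of n − 1 bricks and place them on top of an
nth brick, this moves the center of mass 1/2n units to the left, so
\begin{align*}
C(n) &= C(n-1) + \frac{1}{2n} \\
&= C(n-2) + \frac{1}{2(n-1)} + \frac{1}{2n} \\
&= \cdots \\
&= C(1) + \frac{1}{4} + \frac{1}{6} + \cdots + \frac{1}{2n} \\
&= \frac{1}{2} \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right), \\
R(n) &= 1 + \frac{1}{2} \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n-1} \right).
\end{align*}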

Questions

1. What is the smallest n so that R(n) ≥ 3? ≥ 10? ≥ 100? (Ignore all practical considerations.)

2. We have stacked these bricks so that each is as far to the right as possible. The slightest
breeze would cause this stack to topple. Redo this problem so that instead of placing the
center of mass of the n − 1 bricks above the right-hand edge of the bottom brick, at each
iteration we put it 1/4 of a unit to the left of the right-hand edge of the bottom brick, to gain
greater stability. How many bricks are now needed so that the top brick is completely to the
right of the bottom brick?

2 Traversing the Desert

We are faced with the problem of crossing 1000 miles of desert using a truck that gets 5 miles to
the gallon and that can only carry 80 gallons, enough gasoline to go 400 miles before it needs to
refill. We assume that it can carry its gasoline in containers that can be deposited at any point
along the route for later use. Thus, for example, we could travel 100 miles into the desert, drop
off 40 gallons, and have just enough to get back to the starting point and refill. We can now make
a second trip. When we get to the drop-off point, we have 60 gallons left. We refill from our
deposit and go another 100 miles into the desert. There we drop off 40 gallons, get back to the first
depository, and refill, getting just enough gas to get back to the starting point. We now have no
gasoline at the 100 mile mark, but we have 40 gallons sitting 200 miles into the desert. If we now
started out from home, we could refill when we got to the 200 mile mark and go another 400 miles
into the desert. We do not want to do that because we would run out of gas 400 miles short of our
destination. But can we get across the desert, and, if so, how many trips would it take?

There are many solutions to this problem. We can extend the method used in the first paragraph
to make 40 gallon deposits at various 100 mile marks. Show by induction that using the procedure
described above, it takes $2^{n-1}$ trips to get 40 gallons out to the 100n-mile mark. If we deposited
40 gallons at the 400-mile mark and 80 gallons at the 600-mile mark, then we could make a trip
completely across the desert. It would take 72 trips to set up these deposits. We would cross the
desert on our 73rd trip.

We can do better than that. If we deposit 40 gallons at the 200-, 400-, and 600-mile marks, then
on the last trip we top up every 200 miles and can get across the desert. This only requires 42 trips
to set up the deposits, and we get across the desert with 43 trips.

What is the fewest number of trips that will get us across the desert?

To understand the solution to this problem, it is useful to change how we visualize it. Instead of
making n trips, we begin with n trucks. Each truck can share its gasoline with the subsequent
trucks down the line. All but one of them will have to return to the starting point (so that they
can refill to make the next trip). One truck, which represents our last trip across the desert, does
not return. It needs to have filled up from the last truck to turn back at the 600-mile mark.

The n trucks all start out together. The first truck to turn back will share its gasoline with the
other trucks. It needs to be able to share all of its gasoline, leaving just enough gas for it and all
but the last truck to get back from its turn-around point. Let $x_1$ be the mile at which it turns
around. Each of the n trucks has consumed $x_1/400$ of its gasoline. All but the last truck will need
to return from this drop-off point, which means that they will need $x_1/400$ of their capacity for the
return trip. We use the gasoline in the first truck most efficiently if
$$n\,\frac{x_1}{400} + (n-1)\,\frac{x_1}{400} = 1, \qquad x_1 = \frac{400}{2n-1}.$$

The remaining n − 1 trucks are filled when they leave mile $x_1 = 400/(2n-1)$, and there is enough
gasoline deposited at this position for n − 2 of them to get back to the starting point. The second
truck should go a further 400/(2n − 3) miles, turning around at mile
$$x_2 = 400\left(\frac{1}{2n-1} + \frac{1}{2n-3}\right).$$
Continuing in this way, we see that the (n − 1)st truck turns around at mile
$$x_{n-1} = 400\left(\frac{1}{2n-1} + \frac{1}{2n-3} + \cdots + \frac{1}{3}\right).$$

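When it leaves mile $x_{n-1}$, the last truck is full. It will make it across the desert provided that
$x_{n-1} \ge 600$. We now have an inequality we can solve:
$$\begin{aligned}
400\left(\frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n-1}\right) &\ge 600, \\
\frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n-1} &\ge \frac{3}{2}, \\
1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n-1} &\ge \frac{5}{2}, \\
\sum_{k=1}^{2n} \frac{1}{k} - \frac{1}{2}\sum_{k=1}^{n} \frac{1}{k} &\ge 2.5. \qquad (1)
\end{aligned}$$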
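The turn-around points can be tabulated exactly. The Python sketch below is one way to do it; the names last_turnaround and fewest_trucks, and the tank_range parameter (400 miles for the truck described here), are our own labels rather than anything defined in the text. It assumes the convoy scheme above, in which a full final truck leaves mile $x_{n-1}$ and must cover the remaining distance on one tank.

```python
from fractions import Fraction


def last_turnaround(n: int, tank_range: int = 400) -> Fraction:
    """Mile x_{n-1} at which the last support truck turns back when n trucks start."""
    # x_{n-1} = tank_range * (1/3 + 1/5 + ... + 1/(2n - 1)); the sum is empty for n = 1.
    odd_reciprocals = (Fraction(1, 2 * k - 1) for k in range(2, n + 1))
    return tank_range * sum(odd_reciprocals, Fraction(0))


def fewest_trucks(distance: int, tank_range: int = 400) -> int:
    """Smallest n for which a full final truck leaving x_{n-1} can reach `distance`."""
    n = 1
    while distance - last_turnaround(n, tank_range) > tank_range:
        n += 1
    return n


for n in (2, 3, 4):
    print(n, float(last_turnaround(n)))
```

Calling fewest_trucks(1000) searches for the smallest n that satisfies the final inequality; the exact fractions avoid any doubt about roundoff near the threshold.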

Questions

1. Find the smallest value of n that satisfies inequality (1).

2. If the truck can carry enough gasoline to travel r miles and if the desert is d miles wide, find
the fewest trips needed to cross the desert, expressed as a function of r and d.

Euler’s Solution to the Vibrating Drumhead

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

March 31, 2006

One example of the utility of infinite series can be found in Leonhard Euler’s analysis of 1759 of
the vibrations of a circular drumhead. Euler was led to the differential equation
$$\frac{d^2u}{dr^2} + \frac{1}{r}\frac{du}{dr} + \left(\alpha^2 - \frac{\beta^2}{r^2}\right)u = 0, \qquad (1)$$
where u (the vertical displacement) is a function of r (the distance from the center of the drum)
and α and β are constants depending on the properties of the drumhead. There is no closed form
for the solution of this differential equation, but if we assume that the solution can be expressed as
a power series,
$$u = r^{\lambda} + a_1 r^{\lambda+1} + a_2 r^{\lambda+2} + a_3 r^{\lambda+3} + \cdots,$$
then we can solve for λ and the $a_i$. The derivatives of our power series are
$$\begin{aligned}
\frac{du}{dr} &= \lambda r^{\lambda-1} + (\lambda+1)a_1 r^{\lambda} + (\lambda+2)a_2 r^{\lambda+1} + \cdots, \qquad (2) \\
\frac{d^2u}{dr^2} &= (\lambda-1)\lambda r^{\lambda-2} + \lambda(\lambda+1)a_1 r^{\lambda-1} + (\lambda+1)(\lambda+2)a_2 r^{\lambda} + \cdots. \qquad (3)
\end{aligned}$$
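Substituting these series into equation (1), we see that:
$$\begin{aligned}
&\left[(\lambda-1)\lambda + \lambda - \beta^2\right] r^{\lambda-2} \qquad (4) \\
&\quad + \left[\lambda(\lambda+1)a_1 + (\lambda+1)a_1 - \beta^2 a_1\right] r^{\lambda-1} \\
&\quad + \left[(\lambda+1)(\lambda+2)a_2 + (\lambda+2)a_2 - \beta^2 a_2 + \alpha^2\right] r^{\lambda} \\
&\quad + \cdots + \left[(\lambda+j-1)(\lambda+j)a_j + (\lambda+j)a_j - \beta^2 a_j + \alpha^2 a_{j-2}\right] r^{\lambda+j-2} \\
&\quad + \cdots = 0. \qquad (5)
\end{aligned}$$
Each of these coefficients must be zero, and so
$$\begin{aligned}
\lambda &= \beta, \qquad (6) \\
a_1 &= 0, \qquad (7) \\
a_2 &= \frac{-\alpha^2}{2(2\beta+2)}, \qquad (8) \\
a_j &= \frac{-\alpha^2}{j(2\beta+j)}\,a_{j-2}, \quad j > 2. \qquad (9)
\end{aligned}$$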

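It follows that

$$u(r) = r^{\beta}\left[1 - \frac{1}{(\beta+1)}\left(\frac{\alpha r}{2}\right)^2 + \frac{1}{2!\,(\beta+1)(\beta+2)}\left(\frac{\alpha r}{2}\right)^4 - \frac{1}{3!\,(\beta+1)(\beta+2)(\beta+3)}\left(\frac{\alpha r}{2}\right)^6 + \cdots\right]. \qquad (10)$$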

There is no better representation for the solution of this differential equation.
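As a sanity check on the passage from the recurrence (9) to the series (10), we can generate the coefficients both ways and compare them. This Python sketch uses exact fractions; the function names and the sample values of α and β are our own illustrative choices, and it assumes 2β + j is never zero.

```python
from fractions import Fraction
from math import factorial


def series_coefficients(alpha, beta, terms):
    """Return [a_0, ..., a_{terms-1}] using a_j = -alpha^2 a_{j-2} / (j (2 beta + j))."""
    a = [Fraction(1), Fraction(0)]  # a_0 = 1 (leading term r^beta) and a_1 = 0
    for j in range(2, terms):
        a.append(-Fraction(alpha) ** 2 * a[j - 2] / (j * (2 * Fraction(beta) + j)))
    return a[:terms]


def closed_form(alpha, beta, m):
    """Coefficient of r^(beta + 2m) as read off from equation (10)."""
    product = Fraction(1)
    for i in range(1, m + 1):
        product *= Fraction(beta) + i
    return (-1) ** m * (Fraction(alpha) / 2) ** (2 * m) / (factorial(m) * product)


coefficients = series_coefficients(3, Fraction(1, 2), 9)
for m in range(5):
    print(m, coefficients[2 * m], closed_form(3, Fraction(1, 2), m))
```

The two columns printed for each m should agree, and every odd-indexed coefficient is zero because $a_1 = 0$.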

Explorations of d'Alembert's Series
The command SquareRoot(x,n) computes the first n terms of the binomial series expansion of sqrt(1+x)
and compares it to the actual value (10-digit accuracy).

> SquareRoot := (x,n) -> [1+evalf(sum((-1)^(k-1)*(2*k-2)!/2^(2*k-1)/k!/(k-1)!*x^k,k=1..n),10),evalf(sqrt(1+x),10)];

> SquareRoot(200/199,100);

The next command computes this function at ten values of n: m/10, 2m/10, ..., m. If we choose
a value of m that is not a multiple of 10, it computes these at the floors. This is useful so that the
number of terms is not always even or always odd.

> SquareRootList := (x,m) -> [seq([floor(j*m/10),SquareRoot(x,floor(j*m/10))[1]],j=1..10)];

> SquareRootList(200/199,1007);

The last command plots these values.

> plots[listplot](SquareRootList(200/199,1007));
Explorations of Lagrange's Remainder
The command LagrangeRem(f,n,a,x) returns the difference between f(x) and the (n-1) degree
Taylor polynomial of f expanded about a.

> LagrangeRem := (f,n,a,x) -> evalf(eval(f(t) - convert(taylor(f(t),t=a,n),polynom),t=x),20);

> LagrangeRem(sin,10,0,x);

> LagrangeRem(sin,10,0,0.5);

> plot(LagrangeRem(sin,10,0,x),x=-2..2);

The problem with investigating Lagrange's remainder theorem is that the Lagrange remainder is
so extremely small near x=a. Rather than comparing this remainder with the nth derivative times
(x-a)^n divided by n!, it makes more sense to divide the remainder by (x-a)^n, multiply it by n!,
and then compare the resulting function to the nth derivative of f.

> ModifiedLagrangeRem := (f,n,a,x) -> LagrangeRem(f,n,a,x)*n!/(x-a)^n;

We now compare the plot of the constant function y = ModifiedLagrangeRem with the nth
derivative of f over the interval [a,x]. Lagrange's Remainder theorem is simply the statement that
the graphs of these two functions intersect.

> ModifiedLagrangeRem(cosh,10,0,0.5);

> plot([ModifiedLagrangeRem(cosh,10,0,0.5),eval(diff(cosh(x),x$10),x=t)],t=0..0.5);
Chapter 2
Maple code for exercises in section 2.1
7.

The following Maple command allows you to explore the values of the alternating harmonic
series when you rearrange summands. It takes three arguments: n, r, and s, which must be
positive integers. It returns a decimal approximation to the partial sum of the first n(r+s) terms of
the rearranged series that alternates the next r positive summands followed by the next s negative
summands.

> Rearrange := (n, r, s) -> sum(1/(2*k-1),k = 1 .. n*r)-sum(1/2/k,k = 1 .. n*s);

To get the desired accuracy, you need to set the number of digits to at least 20. For example, in
part (b) where you are asked to evaluate the partial sums of the rearranged series that alternate
two positive terms followed by a single negative term, the following command evaluates the
partial sum of the first 300 terms with 20-digit accuracy.

> evalf(Rearrange(100,2,1),20);

In Maple, the exponential function is evaluated using the function exp(..).

> evalf(exp(2*Rearrange(1000000,2,1)),20);

8.

The following Maple command allows you to explore the values of the alternating sum of
reciprocals of perfect squares when you rearrange summands. It takes three arguments: n, r, and s,
which must be positive integers. It returns a decimal approximation to the partial sum of the first
n(r+s) terms of the rearranged series that alternates the next r positive summands followed by the
next s negative summands.

> Rearrange2 := (n, r, s) -> sum(1/(2*k-1)^2,k = 1 .. n*r)-sum(1/4/k^2,k = 1 .. n*s);

Maple code for exercises in section 2.2


8.

The following Maple command allows you to explore the values of the alternating geometric
series of reciprocals of powers of 2 when you rearrange summands. It takes three arguments: n,
r, and s, which must be positive integers. It returns a decimal approximation to the partial sum of
the first n(r+s) terms of the rearranged series that alternates the next r positive summands
followed by the next s negative summands.

> RearrangeGS := (n, r, s) -> sum(1/(2^(2*k-2)),k = 1 .. n*r)-sum(1/(2^(2*k-1)),k = 1 .. n*s);

9.

Try the command

> int(x^n/(1-x),x);

To find out what this answer means, you can query Maple:

> ?LerchPhi

Maple code for exercises in section 2.3


7.

The sum of the first n terms of the power series for the arctangent function is given by

> ArcTanPS := (n, x) -> sum((-1)^k*x^(2*k+1)/(2*k+1),k = 0 .. n-1);

12.

Partial sums of the binomial series for (1+x)^a can be calculated with the following Maple
command

> Binom := (a, n, x) -> 1+sum(binomial(a,m)*x^m,m = 1 .. n);

> map(Binom,[-2, -.4, 1/3, 3, 5.2],100,.5);

13.

> f:=x->Binom(.5,100,x);map(f,[-2, -1, .9, .99, 1, 1.01, 1.1]);


14.

The following command plots each of the polynomial approximations against the graph of
sqrt(1+x) and then superimposes all of the graphs.
> plots[display](plot([sqrt(1+x), Binom(1/2,2,x)],x = -1 .. 2,view = 0 .. 2.5),
plot([sqrt(1+x), Binom(1/2,5,x)],x = -1 .. 2,view = 0 .. 2.5),
plot([sqrt(1+x), Binom(1/2,8,x)],x = -1 .. 2,view = 0 .. 2.5),
plot([sqrt(1+x), Binom(1/2,11,x)],x = -1 .. 2,view = 0 .. 2.5));

Maple code for exercises in section 2.4


4.

Partial sums of the power series for ln(1+x) can be calculated with the following Maple
command

> LogSeries :=  (n, x) -> sum((-1.)^(k-1)*x^k/k,k = 1 .. n);

> g:= x -> LogSeries(100,x);map(g,[-.9, .9, .99, .999, 1, 1.001, 1.01, 1.1]);

5.

Partial sums of Gregory's series for ln(1+x) can be calculated with the following Maple
command

> Gregory :=  (n, x) -> 2*sum((x/(x+2))^(2*k-1)/(2*k-1),k = 1 .. n);

> evalf(Gregory(10,4),10);

> evalf(Gregory(10,-4/5),10);

> evalf(ln(5),10);

6.

> h:=x -> Gregory(100,x);evalf(map(h,[-.9, .9, 1, 1.1, 5, 20, 100]),20);


7.

The sum of the first thousand terms of this series can be calculated with the following Maple
command

> evalf(sum((-1)^(1/2*(k-1)*(k-2))/k,k = 1 .. 1000),20);

9.
The sum of the first thousand terms of the series for gamma can be calculated with the following
Maple command

> evalf(sum((-1)^k*Zeta(k)/k,k = 2 .. 1000),10);

> evalf(gamma,10);

13.

The sum of the first thousand terms of this series can be calculated with the following Maple
command

> evalf(sum(1/sqrt(k),k = 1 .. 1000),10);

17.

You can get one candidate for your answer by evaluating

> evalf(floor(exp(100-gamma)),50);

Maple code for exercises in section 2.5


1.

The functions used in this exercise are tan(x), arctan(x), sec(x), ln(x), arcsec(x), and tanh(x).
The command eval(diff(f(x),x$n),x=a) evaluates the nth derivative of f at x = a. The next
command creates a table of the values of the first through 10th derivatives of tan(x) evaluated
at x = 0.

> d := (n,f,a) -> eval(diff(f(x),x$n),x=a);

> map(d,[1,2,3,4,5,6,7,8,9,10],tan,0);

2.

The polynomial approximation to tan(x) using the first five non-zero terms is

> P5Tan := x -> x+2*x^3/3!+16*x^5/5!+272*x^7/7!+7936*x^9/9!;

The following command plots both the tangent function and the polynomial approximation.

> plot([tan(x), P5Tan(x)],x = -1/2*Pi .. 1/2*Pi, y=-10..10);

7.
Partial sums of the binomial series for (1+x)^a can be calculated with the following Maple
command

> Binom := (a, n, x) -> 1+sum(binomial(a,m)*x^m,m = 1 .. n);

The list of values of the partial sums can be generated with the following seq command

> seq([100*n, Binom(-.5,100*n,200/199)],n = 1 .. 10);

You can plot these values with the command

> plots[listplot]([seq([100*n, Binom(-.5,100*n,200/199)],n = 1 .. 10)],style = POINT,symbol = CIRCLE);

8.

> seq([100*n, Binom(-.5,100*n,1)],n = 1 .. 10);

> seq([100*n, Binom(-.5,100*n,-1)],n = 1 .. 10);

12.

Enter the definition of the following function

> Binom := (a, n, x) -> 1+sum(binomial(a,m)*x^m,m = 1 .. n);

The two values that you need to compare are

> [Binom(.5,299,200/199), evalf(sqrt(399/199))];

Their difference is

> Binom(.5,299,200/199)-evalf(sqrt(399/199));

The 300th derivative of sqrt(1+x) is

> diff(sqrt(1+x),x$300);

The error term given by Lagrange's remainder theorem is

> ErrorBound := c ->abs( eval( diff( sqrt(1+x), x$300 ), x=c ) )*(200/199)^300 / 300!;

> plot(ErrorBound(c),c = 0 .. 200/199, y=0..0.00005);


What is the value of c on the interval [0 , 200/199] that maximizes the error and what is the
resulting bound on the error?

16.

Enter the definition of the following function

> Binom := (a, n, x) -> 1+sum(binomial(a,m)*x^m,m = 1 .. n);

Define a new error bound function for x = 1 that depends on the choice of a. Note that the
value of c that maximizes these errors is c = 0.

> ErrorBound := (a,n) ->  abs( eval( diff( (1+x)^a, x$n ), x=0 ) )/ n!;

> map(ErrorBound,[-1, -.8, -.6, -.6, 0, .2, .4, .5],100);

20.

The following command plots the three functions. You should be able to determine which is
which by their values at x = 0.

> plot([eval(diff(ln(1+y),y$7)/7!,y = 0), eval(diff(ln(1+y),y$7)/7!,y = x),
ln(1+x)-x+1/2*x^2-1/3*x^3+1/4*x^4-1/5*x^5+1/6*x^6],x = 0 .. 1,color=[red,green,blue]);

Newton-Raphson Method

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

June 20, 2006

A method for finding the roots of an “arbitrary” function that uses the derivative was first circulated
by Isaac Newton in 1669. John Wallis published Newton’s method in 1685, and in 1690 Joseph
Raphson (1648–1715) published an improved version, essentially the form in which we use it today.
The idea is as follows. We are given a function:

$$f(x) = x^3 - 4x^2 - x + 2.$$

Even a rough graph shows that this function has three roots: near −1, 1, and 4 (see figure 1).
Those are not the exact values of the roots. The value of f(−1) is −2. We shall have to move a
little to the right of −1. To see how far we should move, we note that the slope of our function
at x = −1 is $f'(-1) = 10$. Our function does not have a constant slope between (−1, −2) and the
root we are seeking, but it looks as if the slope does not change by too much. If we increase x
by 1/5, the y value should increase by about 2. Our next guess for the location of the root is at
x = −1 + 1/5 = −0.8 (see figure 2).

Again, we are not quite there: f(−0.8) = −0.272. Our slope has changed slightly. At x = −0.8
it is only $f'(-0.8) = 7.32$. To increase the y value by 0.272 (Δy = 0.272), we need to increase
x by approximately 0.272/7.32 = 0.037158 [$\Delta x \approx \Delta y/f'(x)$]. The next guess is pretty good:
f(−0.8 + 0.037158) = −0.0088, and so the root is close to x = −0.762842.

What we have found is an iterative procedure that will get us progressively closer to our root. If
$x_k$ was our last guess and neither $f(x_k)$ nor $f'(x_k)$ is zero, then the next approximation will be
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \qquad (1)$$
This kind of iteration is easily programmed. Starting with x1 = −1, the successive iterations (with
ten-digit accuracy) are

x2 = −0.8,
x3 = −0.7628415301,
x4 = −0.7615586962,
x5 = −0.7615571818,
x6 = −0.7615571818.
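One way to program this iteration is sketched below in Python. The function name newton_raphson and its argument order are our own choices; the derivative is supplied by hand. Floating-point output may differ from the listed values in the last digit or two.

```python
def newton_raphson(f, fprime, x, steps):
    """Return [x_1, ..., x_{steps+1}] from x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    iterates = [x]
    for _ in range(steps):
        x = x - f(x) / fprime(x)
        iterates.append(x)
    return iterates


def f(x):
    return x**3 - 4 * x**2 - x + 2


def fprime(x):
    return 3 * x**2 - 8 * x - 1


# Start at x_1 = -1 and print x_1 through x_6 to ten decimal places.
for k, xk in enumerate(newton_raphson(f, fprime, -1.0, 5), start=1):
    print(f"x{k} = {xk:.10f}")
```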

Figure 1: $f(x) = x^3 - 4x^2 - x + 2$.

Figure 2: A close-up of $f(x) = x^3 - 4x^2 - x + 2$.

Figure 3: $f(x) = x/\sqrt{|x|}$.

This is as close as we are going to get to the root using a ten-digit decimal approximation. This is
the Newton–Raphson method.

What is wrong with Newton–Raphson

Most of the time, Newton–Raphson converges very quickly to the root. To find the other two roots,
we just start closer to them. But it does not always work. Consider the function defined by
$$f(x) = \frac{x}{\sqrt{|x|}}, \quad x \ne 0,$$

and define f(0) = 0 so that it is continuous (see figure 3). The derivative of this function is
$$f'(x) = \frac{1}{2\sqrt{|x|}}, \quad x \ne 0.$$
If we choose any starting point off the actual root, $x_1 = a \ne 0$, then
$$x_2 = a - \frac{a/\sqrt{|a|}}{1/\left(2\sqrt{|a|}\right)} = a - 2a = -a.$$
It follows that $x_3 = -x_2 = a$, $x_4 = -a$, $x_5 = a$, and we keep bouncing back and forth forever.
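The bouncing is easy to watch numerically. In this Python sketch the starting value 0.7 is an arbitrary nonzero choice of ours, and newton_step is a helper name introduced only for the illustration.

```python
import math


def f(x):
    return 0.0 if x == 0 else x / math.sqrt(abs(x))


def fprime(x):
    return 1 / (2 * math.sqrt(abs(x)))  # valid only for x != 0


def newton_step(x):
    """One Newton-Raphson step; for this f it sends x to -x, up to roundoff."""
    return x - f(x) / fprime(x)


x = 0.7  # arbitrary nonzero starting point
for k in range(1, 7):
    print(f"x{k} = {x:+.6f}")
    x = newton_step(x)
```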

This function is rather unusual. A more common occurrence is that Newton–Raphson works for
some choices of starting point but not for others, and when it does work it does not necessarily
take you to the closest root. We consider the function defined by (see figure 4)

$$f(x) = \sin x - \frac{x}{2}, \qquad f'(x) = \cos x - \frac{1}{2}.$$

Figure 4: $f(x) = \sin x - x/2$.

This has three roots. One of them is at x = 0. The others are near x = ±2.

For this function, the Newton–Raphson method uses the iteration

$$x_{k+1} = x_k - \frac{\sin x_k - x_k/2}{\cos x_k - 0.5}.$$
If we start with x1 = 2, we quickly approach the rightmost root:

x1 = 2,
x2 = 1.900995594,
x3 = 1.898679953,
x4 = 1.895500113,
x5 = 1.895494267,
x6 = 1.895494267.

To within ten digits of accuracy, we have found the positive root of sin x − x/2.

What happens if our initial guess is not so accurate? What happens if we start with x1 = 1? The
reader is encouraged to perform these calculations. They are very sensitive to internal roundoff
and may vary from one machine to the next.

x1 = 1,
x2 = −7.47274064,
x3 = 14.47852099,
x4 = 6.935115429,
x5 = 16.63568455,
x6 = 8.34393498,
x7 = 4.9546244809,
x8 = −8.300926412,
x9 = −4.816000398,
x10 = 3.764084222,
x11 = 1.88580759,
x12 = 1.895549291,
x13 = 1.895494269.

We have again found the rightmost root, but it took considerably longer and we did a lot of bouncing
around en route. In fact, starting just a little closer, at x = 1.1, my calculator takes me as low as
x7 = −205.8890057 and as high as x16 = 323.2663795 before settling down to the root at x = 0.

What if we had started between these two points? If x1 = 1.01, my calculator takes me to the
negative root, −1.895494267. If x1 = 1.02, I move very far away from the origin, eventually
reaching a negative number that exceeds the internal storage capacity. Let us call that destination
−∞. The table given below lists the terminal values of the Newton–Raphson method with x1 =
1.00, 1.01, 1.02, . . . , 1.10 as found on my calculator. Try this on your own machine. Do not expect
your answers to be the same.

x1      terminal value
1.00    1.895494267
1.01    −1.895494267
1.02    −∞
1.03    −1.895494267
1.04    −1.895494267
1.05    +∞
1.06    −∞
1.07    1.895494267
1.08    0
1.09    −1.895494267
1.10    0

This is a good example of chaos: extreme sensitivity to initial conditions and machine round-off
error.
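To try this experiment on your own machine, something like the following Python sketch will do. The function name terminal_value, the cap of 200 iterations, and the treatment of a zero slope are our own assumptions; as the text warns, the values it reports need not match the table.

```python
import math


def terminal_value(x, steps=200):
    """Iterate Newton-Raphson on f(x) = sin x - x/2 and report where it ends up."""
    for _ in range(steps):
        slope = math.cos(x) - 0.5
        if slope == 0.0:
            return math.nan  # horizontal tangent: the next iterate is undefined
        x = x - (math.sin(x) - x / 2) / slope
        if not math.isfinite(x):
            return x  # the iterates escaped to +/- infinity
    return x


for i in range(11):
    x1 = 1 + i / 100
    print(f"{x1:.2f}  {terminal_value(x1):.9f}")
```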


This does not mean that the Newton–Raphson method is no good. Even today it is one of the most
useful and powerful tools available for finding roots. But as we have seen, it can have problems.
We need further analysis of how and why it works if we want to determine when we can use it
safely and when we must proceed with caution.

What is Happening

Let r be the actual though unknown value of a root that we are trying to approach, and let $x_k$ be
our latest guess. We assume that it is “pretty close.” We can calculate $f(x_k)$ and $f'(x_k)$. While
we do not know the value of r, the fact that it is a root implies that f(r) = 0. Since $x_k$ is “pretty
close” to r, $f'(x_k)$ will be “pretty close” to the slope of the line from $(x_k, f(x_k))$ to (r, 0):
$$f'(x_k) \approx \frac{f(r) - f(x_k)}{r - x_k} = \frac{-f(x_k)}{r - x_k}. \qquad (2)$$
We can solve this to get something that is “pretty close” to r:
$$r \approx x_k - \frac{f(x_k)}{f'(x_k)}. \qquad (3)$$

As our last example showed, “pretty close” can sometimes be not close at all. What do we mean
by “pretty close”? Equation (2) shows that we are using an approximation to the derivative.
Equation (3) uses this approximation in the denominator, and there lies the crux of our problem.
While 0.01 and 0.0001 are close to each other, 1/0.01 = 100 and 1/0.0001 = 10000 are not close.
It is not enough to say “pretty close.” We need to know the size of the error in equation (2).
Lagrange’s remainder for the Taylor series can help us.

We use the equality

$$f(x) = f(a) + f'(a)\,(x - a) + \frac{f''(c)}{2!}\,(x - a)^2, \qquad (4)$$
where c is some unknown constant between a and x. If we solve for $f'(a)$ we see that this equation
is equivalent to
$$f'(a) = \frac{f(x) - f(a)}{x - a} - \frac{f''(c)}{2!}\,(x - a). \qquad (5)$$
The error is precisely $-f''(c)\,(x - a)/2$. Although we do not know the value of c, it may be possible
to find bounds on $f''(c)$ when c is between a and x, and thus find bounds on the error.

We replace a with $x_k$ and x with r, and then solve for r, keeping the $r - x_k$ term in the error:
$$\begin{aligned}
f'(x_k) &= \frac{-f(x_k)}{r - x_k} - \frac{f''(c)}{2!}\,(r - x_k), \\
r - x_k &= \frac{-f(x_k)}{f'(x_k)} - \frac{f''(c)}{2f'(x_k)}\,(r - x_k)^2, \\
r &= x_k - \frac{f(x_k)}{f'(x_k)} - \frac{f''(c)}{2f'(x_k)}\,(r - x_k)^2 \\
&= x_{k+1} - \frac{f''(c)}{2f'(x_k)}\,(r - x_k)^2. \qquad (6)
\end{aligned}$$
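Equation (6) says that the new error $r - x_{k+1}$ is the old error squared times $-f''(c)/(2f'(x_k))$. We can watch this happen for the cubic $f(x) = x^3 - 4x^2 - x + 2$ used earlier. In this Python sketch the helper names are ours, and the "true" root is simply the result of running the iteration many more times, which is an approximation rather than an exact value.

```python
def f(x):
    return x**3 - 4 * x**2 - x + 2


def fprime(x):
    return 3 * x**2 - 8 * x - 1


def newton_step(x):
    return x - f(x) / fprime(x)


def error_ratios(x1, root, steps):
    """Return |root - x_{k+1}| / |root - x_k|^2 for k = 1, ..., steps."""
    ratios = []
    x = x1
    for _ in range(steps):
        x_next = newton_step(x)
        ratios.append(abs(root - x_next) / abs(root - x) ** 2)
        x = x_next
    return ratios


# Approximate the root near -0.76 by iterating far past convergence.
root = -1.0
for _ in range(50):
    root = newton_step(root)

print(error_ratios(-1.0, root, 3))
```

Each ratio should lie between the smallest and largest values of $|f''(c)/(2f'(x_k))|$ for c between $x_k$ and the root. Only the first few ratios are meaningful, since later errors fall below machine precision.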


Are we getting closer?

We now have to decide what we want to mean when we say we are getting “pretty close” to r.
A reasonable criterion that is easy to use is to ask that $x_{k+1}$ be strictly closer to r than $x_k$ was.
Equation (6) gives us a relationship between $|r - x_{k+1}|$ and $|r - x_k|$:
$$|r - x_{k+1}| = \left|\frac{f''(c)}{2f'(x_k)}\right|\,|r - x_k|^2. \qquad (7)$$
We shall get closer to r provided
$$\left|\frac{f''(c)}{2f'(x_k)}\right|\,|r - x_k| < 1,$$

or, equivalently,
$$|f''(c)| < 2\left|\frac{f'(x_k)}{r - x_k}\right|. \qquad (8)$$

For the example we have been using, f(x) = sin x − x/2, we want

$$|\sin c| < 2\left|\frac{\cos x_k - 0.5}{r - x_k}\right|. \qquad (9)$$
Since we do not know how c depends on $x_k$ and r, we do not know the exact values of $x_k$ for which
Newton–Raphson gets progressively closer to r, but this inequality is satisfied if the right side is
larger than 1. If we graph the right side of inequality (9) near (but not at) the root r = 1.89. . .
(figure 5), we see that it is larger than 1 for 1.35 ≤ x ≤ 4.07. If we start with any point in this
range, we are guaranteed to converge to the positive root.
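In place of a graph, the right side of inequality (9) can be sampled at a few points. This Python sketch takes the ten-digit value of the positive root from the iterations above; the constant ROOT, the function name right_side, and the sample points are our own illustrative choices.

```python
import math

ROOT = 1.895494267  # positive root of sin x - x/2, to ten digits


def right_side(x, root=ROOT):
    """Right side of inequality (9): 2 |cos x - 0.5| / |root - x|."""
    return 2 * abs(math.cos(x) - 0.5) / abs(root - x)


for x in (1.0, 1.35, 1.5, 2.5, 4.07, 4.5):
    print(f"x = {x:.2f}: right side = {right_side(x):.4f}")
```

Wherever the printed value exceeds 1, inequality (9) holds no matter what c is, because |sin c| never exceeds 1.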

Exercises


The symbol
M&M indicates that Maple and Mathematica codes for this problem are available
in the Web Resources at www.macalester.edu/aratra, Chapter 3.

1. Lagrange’s form of the remainder tells us that if we use $1 + 2/1 + 2^2/2! + 2^3/3! + \cdots + 2^{k-1}/(k-1)!$
to approximate $e^2$, then the difference between the true value and this approximation is

$$E(k) = e^2 - \left(1 + \frac{2}{1} + \frac{4}{2!} + \cdots + \frac{2^{k-1}}{(k-1)!}\right) = \frac{2^k e^c}{k!}$$

for some c between 0 and 2. Stirling’s formula tells us that $k! > k^k e^{-k}\sqrt{2\pi k}$, and therefore the
error is less than
$$\frac{2^k e^2}{k!} < \left(\frac{2e}{k}\right)^k \frac{e^2}{\sqrt{2\pi k}}.$$
Using this bound, find a small value of k that guarantees an error of less than 0.001, an error of
less than 0.000001.

2. M&M To ten-digit accuracy, find the other two roots of $x^3 - 4x^2 - x + 2$.

Figure 5: $2|\cos x - 0.5|/|1.89 - x|$.


3. M&M For the function f(x) = sin x − x/2, run ten iterations of the Newton–Raphson method
at every value of x from x = 0.75 to x = 1.25 in steps of 0.01. The outcome will suggest which
initial values between 0.75 and 1.25 wind up at which roots. In regions of transition, you may want
to decrease the step size and increase the number of iterations.

4. M&M The Newton–Raphson method is equally valid for complex-valued functions. Compare
the results of the previous exercise with the results obtained if x ranges from 0.75 + 0.01i to
1.25 + 0.01i.
5. Prove that if $f(x) = x/\sqrt{|x|}$, $x \ne 0$, then $f'(x) = 1/\left(2\sqrt{|x|}\right)$, $x \ne 0$.

6. The example $f(x) = x/\sqrt{|x|}$ works the way it does because this function satisfies the differential
equation
$$\frac{f(x)}{f'(x)} = 2x.$$
Describe all functions that satisfy this differential equation.

7. Describe all functions that satisfy the differential equation
$$\frac{f(x)}{f'(x)} = 3x.$$
Choose one of them and implement the Newton–Raphson method on it. What happens? Discuss
why it happens.

8. Describe all complex-valued functions that satisfy the differential equation
$$\frac{f(x)}{f'(x)} = (1 - i)x.$$
Choose one of them and implement the Newton–Raphson method on it. What happens? Discuss
why it happens.

9. M&M Use the Newton–Raphson method to find all the roots (to within ten-digit accuracy)
of f(x) = cos(3x) − x.

10. There is one positive root for f (x) = cos(3x) − x. Using the methods of this section, find an
interval around this root such that if Newton–Raphson is initiated at any point in this interval,
each iteration will take you closer to the positive root.

11. For the function f (x) = sin x − x/2, find an interval containing the origin and as large as
reasonably possible so that if we start the Newton–Raphson method with any x in this interval,
the iterations will always converge to 0.

12. Prove that if we have an interval I around a root r and a positive number α < 1 such that
$$\left|\frac{f''(c)}{2f'(x)}\right|\,|r - x| \le \alpha$$

for all x ∈ I and c between x and r, then the Newton–Raphson method must converge to r.

13. Is it possible that from some starting point each iteration of Newton–Raphson takes us closer
to the root r and yet from this starting point we can never get arbitrarily close to the value of r?
Explain your answer.

How to find and write a proof

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

March 31, 2006

This document lays out the steps of finding a proof of an if-then statement. We will use as an
example exercise 14 of section 3.1: “Prove that if f is continuous at a and $\lim_{x\to a} f'(x)$ exists, then
so does $f'(a)$, and they must be equal.” In section 2, we give some further ideas to help when
stymied in step 4.

1 Five steps

Step 1: Identify the hypothesis and the conclusion. Usually the hypothesis is signaled by the word
“if” and the conclusion by “then”, but there are other ways of expressing a theorem. For example,
we might have written, “Prove that $f'(a)$ exists whenever f is continuous at a and $\lim_{x\to a} f'(x)$
exists, and show that when this happens they must be equal.”

In either case, the hypothesis and conclusion are the same:

Hypothesis: f is continuous at a AND $\lim_{x\to a} f'(x)$ exists

Conclusion: $f'(a)$ exists AND $f'(a) = \lim_{x\to a} f'(x)$

Note that there are two statements in the conclusion, each of which will need to be proven.

Step 2: Focus first on the conclusion. What does it actually say? What statements can you think
of that would immediately imply this conclusion? How are the terms in this conclusion defined?

Start with the statement “$f'(a)$ exists.” The Cauchy explanation of this statement is that there is
a number (which we shall call $f'(a)$) with the property that if we define
$$E(x, a) = f'(a) - \frac{f(x) - f(a)}{x - a},$$

then we can force E(x, a) to be as small as we wish simply by taking x sufficiently close to a. If
we can show that, given any $\epsilon > 0$, there is such a number for which we can find a response $\delta > 0$
so that $|E(x, a)| < \epsilon$ for all x satisfying 0 < |x − a| < δ, then we will have the first conclusion.


The second part of the conclusion says that $f'(a)$ is equal to $\lim_{x\to a} f'(x)$. This tells us how to find
our candidate for $f'(a)$. We need to show that given an $\epsilon > 0$, there is a response $\delta > 0$ so that

$$\left|\lim_{x\to a} f'(x) - \frac{f(x) - f(a)}{x - a}\right| < \epsilon \qquad (1)$$

for all x satisfying 0 < |x − a| < δ. If we can prove this, then we have demonstrated the conclusion.
This is where we want to head.

Step 3: Now look at the hypothesis. What does it say that might help get to our reformulation of
the conclusion? How are the terms of the hypothesis defined?

For our example, the hypothesis states that $\lim_{x\to a} f'(x)$ exists. This means that there is a target
value T and we can force $f'(x)$ to be as close as we wish to T by taking x sufficiently close to a.
The hypothesis that f is continuous at a is clearly important. You should be able to think of an
example of a function that is not continuous at a and for which $\lim_{x\to a} f'(x)$ exists, but $f'(a)$ does
not exist. But it is not yet clear how we will use it.

Step 4: Now comes the hard part. We begin a process of comparing what we know from the
hypothesis with what we want to show in order to arrive at the conclusion. Is there some result
that follows from the hypothesis that will get us closer to the conclusion? Is there another statement
that implies the conclusion that looks a little more like what we know from the hypothesis? We
work from both ends trying to bring them closer. There is no guaranteed route to success. You
keep trying ideas until you find something that works. Think of it as building a bridge, working
out across a gorge from each side until they link up.

There is one quick simplification we can make now that we have assigned T as the target value of
the limit of $f'(x)$. If we can show that for any $\epsilon > 0$ there is a response $\delta > 0$ so that

$$\left|T - \frac{f(x) - f(a)}{x - a}\right| < \epsilon \qquad (2)$$

for all x satisfying 0 < |x − a| < δ, then we have finished the proof.

In addition to trying to bring the hypothesis and conclusion as close together as possible, we also
scour the results we know to see if anything might be relevant. In this case, we do not yet know
many theorems. There is one that might help: the mean value theorem (theorem 3.1). It would
enable us to replace $\bigl(f(x) - f(a)\bigr)/(x - a)$ with $f'(c)$. Before we try to use it, check that the
hypotheses of this theorem are satisfied:

1. Is $f$ differentiable at all points strictly between $x$ and $a$? We do not know that it is differentiable
at $a$, but we do not need to know that. We know that $\lim_{x\to a} f'(x)$ exists, and so $f'$
must exist for all values sufficiently close to $a$. The first hypothesis is satisfied.

2. Is $f$ continuous at every point on the closed interval from $a$ to $x$? Differentiability implies
continuity. The only problem that we might have is continuity at $a$. Here is where we use
the other part of our hypothesis. We were told that $f$ is continuous at $a$.

Appendix to A Radical Approach to Real Analysis 2nd edition. © 2006 David M. Bressoud

We can use the mean value theorem to find a simpler form of equation (2). We know that there is
a $c$ strictly between $a$ and $x$ for which $\bigl(f(x) - f(a)\bigr)/(x - a) = f'(c)$. If $x$ is within $\delta$ of $a$, then $c$
will also be within $\delta$ of $a$. Given $\epsilon > 0$, we need to show that there is a response $\delta > 0$ so that if $c$
is within $\delta$ of $a$, then

\[ \left| T - f'(c) \right| < \epsilon. \tag{3} \]

But that is just the definition of this target value. We have found a proof!

Step 5: This is the step that too many students skip, but you have not given a proof if you stop
after completing Step 4. The hard work is done. You have connected the two ends, the hypotheses
and the conclusions. Now you need to write up the proof. You lay out for your readers a logical
progression that takes them directly from the hypotheses to the conclusions in as painless and clear
a manner as possible. You want to take the readers across your bridge in a seamless trip. Here
is one example of how to rewrite the proof that has just been discovered. Notice that it includes
making explicit those $\delta$s that imply that our error function is bounded by $\epsilon$.

Let $T = \lim_{x\to a} f'(x)$. We need to show that given any $\epsilon > 0$, there is a response $\delta$
so that $0 < |x - a| < \delta$ implies that

\[ \left| T - \frac{f(x) - f(a)}{x - a} \right| < \epsilon. \]

By the definition of the limit, there is a $\delta$ so that $0 < |x - a| < \delta$ implies that $|T - f'(x)| < \epsilon$.
By the mean value theorem, there is a $c$ strictly between $a$ and $x$ for which

\[ \frac{f(x) - f(a)}{x - a} = f'(c). \]

Since $0 < |c - a| < |x - a| < \delta$, we have that

\[ \left| T - \frac{f(x) - f(a)}{x - a} \right| = \left| T - f'(c) \right| < \epsilon. \]

Some of the details have been left out. When they need to be included is a matter of judgment.
Details often obscure the essence of the proof, but if you think that your readers would stop and
puzzle over certain points, then you need to put in those details. For example, if your readers are
likely to wonder why you can use the mean value theorem, then you should include your analyses
of the hypotheses of that theorem. Also notice that I have chosen to start the proof with the
observation made in Step 2. Again, this is for the benefit of the readers, to help them see where I
am going with this proof. Sometimes it helps to begin the proof by stating what needs to be done
to reach the conclusion. Sometimes this is not necessary. It might even be confusing.

2 Help with Step 4

In our example, we were able to construct our bridge by working out from the hypotheses and
conclusions until we found a link. That is often very hard. There are two other variations that can
be helpful.


Using the contra-positive

Instead of trying to prove that if A, then B, try to prove the contra-positive, the logically
equivalent statement that if B is false then A is false. You need to be particularly careful when
taking the negation of a statement that contains a connective: “and” or “or”. The negation of
“A and B” is “not A or not B”; the negation of “A or B” is “not A and not B”. The contra-positive of our example is:

If $f'(a)$ does not exist or it exists but does not equal $\lim_{x\to a} f'(x)$, then $\lim_{x\to a} f'(x)$
does not exist or $f$ is not continuous at $a$.

Taking the contra-positive does not make it any easier to tackle this particular theorem, but it can
be helpful. You are now trying to build your bridge in a different location.
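
The two logical facts used here, that an implication is equivalent to its contra-positive and that negation exchanges “and” with “or”, can be checked by brute force over all truth values. The Python sketch below is only an illustration; the helper names `implies` and `all_equivalences_hold` are assumptions made for this example and do not come from the text.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth value of the statement "if a, then b"."""
    return (not a) or b


def all_equivalences_hold() -> bool:
    """Check the contra-positive and both negation rules for every truth assignment."""
    for a, b in product([False, True], repeat=2):
        # "if A then B" versus "if not B then not A"
        contrapositive_ok = implies(a, b) == implies(not b, not a)
        # not (A and B) versus (not A) or (not B)
        not_and_ok = (not (a and b)) == ((not a) or (not b))
        # not (A or B) versus (not A) and (not B)
        not_or_ok = (not (a or b)) == ((not a) and (not b))
        if not (contrapositive_ok and not_and_ok and not_or_ok):
            return False
    return True


print(all_equivalences_hold())
```

Because there are only four truth assignments, the loop is an exhaustive check rather than a sample.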

Search for a contradiction

One of the most powerful tools for bridging the gap is to assume that the hypothesis is true and
the conclusion is false. If you can show that this leads to an impossible situation, then whenever
the hypothesis is true, the conclusion must be true, and so you have proven your theorem. For our
example, we would assume that

1. $f$ is continuous at $a$,

2. $\lim_{x\to a} f'(x)$ exists, and

3. $f'(a)$ does not exist or it exists but does not equal $\lim_{x\to a} f'(x)$.

You now explore what these assumptions tell you about the function, looking for some contradiction.

The classic example of the use of proof by contradiction is the standard proof that if $x^2 = 2$, then $x$
is irrational. The statements with which we get to work are

1. $x^2 = 2$,

2. $x$ is rational.

We look for consequences of these statements that produce a contradiction. Since $x$ is rational,
we can write it as $x = m/n$ where $m$ and $n$ are relatively prime integers. Since $x^2 = 2$, we can
substitute $m/n$ for $x$ and clear denominators,

\[ \left( \frac{m}{n} \right)^2 = 2 \implies m^2 = 2n^2. \]


The right side of this equality is an even integer, so $m^2$ is even and that means that $m$ is even. We
can find an integer $t$ such that $m = 2t$. We substitute $2t$ for $m$ in our last equation,

\[ (2t)^2 = 2n^2 \implies 2t^2 = n^2. \]

Now we see that $n$ must be even. We have our contradiction because $m$ and $n$ are both even but
they are also relatively prime.

Continued Fractions

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

June 20, 2006

To get a sequence that approaches $\pi$, we begin by splitting off the greatest integer less than or equal
to $\pi$,

\[ \lfloor \pi \rfloor = a_1 = 3, \qquad \pi = 3 + r_1, \quad r_1 = 0.14159\ldots. \]

The remainder, $r_1$, lies between 0 and 1, so its reciprocal is larger than 1:

\[ \frac{1}{r_1} = 7.0625133\ldots. \]

Again, we split off the greatest integer, $a_2 = 7$, and consider the new remainder $r_2 = 0.0625133\ldots$.
We keep doing this, generating a sequence of positive integers:

\[ a_1 = 3,\ a_2 = 7,\ a_3 = 15,\ a_4 = 1,\ a_5 = 292,\ a_6 = 1,\ a_7 = 1,\ a_8 = 1,\ a_9 = 2,\ \ldots. \]

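If we stop after the $k$th integer, we get a rational approximation to $\pi$,

\begin{align*}
\frac{p_1}{q_1} &= 3 = \frac{3}{1}, \\
\frac{p_2}{q_2} &= 3 + \frac{1}{7} = \frac{22}{7}, \\
\frac{p_3}{q_3} &= 3 + \cfrac{1}{7 + \cfrac{1}{15}} = \frac{333}{106}, \\
\frac{p_4}{q_4} &= 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1}}} = \frac{355}{113}, \\
&\ \ \vdots
\end{align*}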

This is called a continued fraction. There is a special notation that makes it easier to write long
continued fractions:

\[ \frac{p_9}{q_9} = 3 + \frac{1}{7+}\,\frac{1}{15+}\,\frac{1}{1+}\,\frac{1}{292+}\,\frac{1}{1+}\,\frac{1}{1+}\,\frac{1}{1+}\,\frac{1}{2} = \frac{833719}{265381}. \]


These give the best possible rational approximations for a given limit on the denominator. Specifically,
we shall see that if $1 \le b < q_k$, then no fraction with denominator $b$ can be closer to $\pi$ than
$p_k/q_k$.

There is nothing special about $\pi$ in all of this. We could have started with any other irrational
number such as $\sqrt{2}$ or $e$. In fact, while the sequence for $\pi$ has no discernible pattern, there are
very simple patterns for the integers in the sequences for $\sqrt{2}$ and $e$. As long as we start with an
irrational number, the sequence will never end. The sequence terminates if and only if we start with
a rational number. In what follows, we shall prove everything for an arbitrary irrational number
that we call $\alpha$.

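We begin by defining the sequence:

\begin{align*}
a_1 &= \lfloor \alpha \rfloor, \\
r_1 &= \alpha - a_1, \\
a_{k+1} &= \left\lfloor \frac{1}{r_k} \right\rfloor, \\
r_{k+1} &= \frac{1}{r_k} - a_{k+1}.
\end{align*}

We also define the sequence of rational approximations,

\[ \frac{p_k}{q_k} = a_1 + \frac{1}{a_2+}\,\frac{1}{a_3+}\cdots\frac{1}{a_k}. \]

Notice that if we replace the last $a_k$ by $a_k + r_k$ (an irrational number), we get an expression that
exactly equals our original number,

\[ \alpha = a_1 + \frac{1}{a_2+}\,\frac{1}{a_3+}\cdots\frac{1}{a_k + r_k}. \]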
Proposition 1. If we define $p_0 = 1$, $p_1 = a_1$, $q_0 = 0$, $q_1 = 1$, then for all $k \ge 1$, we can define

\[ p_{k+1} = p_{k-1} + a_{k+1} p_k, \tag{1} \]

\[ q_{k+1} = q_{k-1} + a_{k+1} q_k. \tag{2} \]

Furthermore, we have that

\[ p_{k+1} q_k - p_k q_{k+1} = (-1)^{k+1}, \tag{3} \]

and, therefore, $\gcd(p_k, q_k) = 1$ for all $k \ge 0$.

Proof. We prove these equations by induction. When $k = 1$, we have that

\[ p_2 = a_1 a_2 + 1 = p_0 + a_2 p_1, \quad q_2 = a_2 = q_0 + a_2 q_1, \quad p_2 q_1 - p_1 q_2 = a_1 a_2 + 1 - a_1 a_2 = 1. \]

We can turn $p_k/q_k$ into $p_{k+1}/q_{k+1}$ by taking the continued fraction for $p_k/q_k$ and replacing $a_k$ with
$a_k + 1/a_{k+1}$. By our induction hypothesis,

\[ p_k = p_{k-2} + a_k p_{k-1}, \qquad q_k = q_{k-2} + a_k q_{k-1}. \]


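Therefore,

\begin{align*}
\frac{p_{k+1}}{q_{k+1}} &= \frac{p_{k-2} + (a_k + 1/a_{k+1})\, p_{k-1}}{q_{k-2} + (a_k + 1/a_{k+1})\, q_{k-1}} \\
&= \frac{p_{k-2} + a_k p_{k-1} + (1/a_{k+1})\, p_{k-1}}{q_{k-2} + a_k q_{k-1} + (1/a_{k+1})\, q_{k-1}} \\
&= \frac{p_k + p_{k-1}/a_{k+1}}{q_k + q_{k-1}/a_{k+1}} \\
&= \frac{p_{k-1} + a_{k+1} p_k}{q_{k-1} + a_{k+1} q_k}.
\end{align*}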

By our induction hypothesis, $p_k q_{k-1} - p_{k-1} q_k = (-1)^k$. Using equations (1) and (2), we have that

\begin{align*}
p_{k+1} q_k - p_k q_{k+1} &= (p_{k-1} + a_{k+1} p_k)\, q_k - p_k\, (q_{k-1} + a_{k+1} q_k) \\
&= p_{k-1} q_k + a_{k+1} p_k q_k - p_k q_{k-1} - a_{k+1} p_k q_k \\
&= p_{k-1} q_k - p_k q_{k-1} = -(-1)^k.
\end{align*}

By equation (3), any common divisor of pk and qk is a divisor of 1.
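
The definitions and recurrences above can be tried out numerically. The following Python sketch is an illustration under stated assumptions: it expands the exact rational value of the floating-point constant `math.pi` (not $\pi$ itself, so only the first several integers agree with those of $\pi$), and the function names `partial_quotients`, `convergents`, and `cross_differences` are invented for this example.

```python
from fractions import Fraction
from math import floor, pi


def partial_quotients(alpha, count):
    """Return a_1, ..., a_count by repeatedly splitting off the greatest integer."""
    terms = []
    x = Fraction(alpha)  # exact rational value of the float, so no rounding drift
    for _ in range(count):
        a = floor(x)
        terms.append(a)
        r = x - a
        if r == 0:       # rational input: the sequence terminates
            break
        x = 1 / r
    return terms


def convergents(terms):
    """Build (p_k, q_k) from equations (1) and (2), starting at p_0 = 1, q_0 = 0."""
    p_prev, q_prev = 1, 0      # p_0, q_0
    p, q = terms[0], 1         # p_1, q_1
    result = [(p, q)]
    for a in terms[1:]:
        p_prev, p = p, p_prev + a * p
        q_prev, q = q, q_prev + a * q
        result.append((p, q))
    return result


def cross_differences(pairs):
    """Return p_{k+1} q_k - p_k q_{k+1} for k = 1, 2, ..., as in equation (3)."""
    return [pairs[i + 1][0] * pairs[i][1] - pairs[i][0] * pairs[i + 1][1]
            for i in range(len(pairs) - 1)]


terms = partial_quotients(pi, 9)
pairs = convergents(terms)
print(terms)
print(pairs)
print(cross_differences(pairs))
```

Working with `Fraction` keeps every reciprocal exact, which is why the expansion of the stored floating-point value is reproducible; a version using plain floats would accumulate rounding error after a handful of steps.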

Because we always round down to the next integer, $p_k/q_k$ is less than $\alpha$ when $k$ is odd, greater than
$\alpha$ when $k$ is even. The difference between two successive approximations is

\[ \frac{p_{k+1}}{q_{k+1}} - \frac{p_k}{q_k} = \frac{p_{k+1} q_k - p_k q_{k+1}}{q_k q_{k+1}} = \frac{(-1)^{k+1}}{q_k q_{k+1}}. \tag{4} \]

This tells us that we can write $p_k/q_k$ as the sum of an alternating series,

\[ \frac{p_k}{q_k} = a_1 + \sum_{j=1}^{k-1} \frac{(-1)^{j+1}}{q_j q_{j+1}}. \]

Since the values of qk increase, the summands are decreasing and approach 0. This is an alternating
series that converges to α. The partial sums alternate larger and smaller than α.

If $a/b$ lies between $p_k/q_k$ and $p_{k+1}/q_{k+1}$, then

\[ \frac{1}{q_k q_{k+1}} > \left| \frac{a}{b} - \frac{p_k}{q_k} \right| = \frac{|a q_k - b p_k|}{b q_k} \ge \frac{1}{b q_k}, \]

and therefore $b > q_{k+1}$.
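
The best-approximation claim for $\pi$ can be spot-checked by brute force for the first few convergents. The sketch below is a numerical illustration only, not a proof; it uses the floating-point constant `math.pi`, and the names `CONVERGENTS`, `closest_with_denominator`, and `is_best_below` are assumptions made for this example.

```python
from math import pi

# The first four convergents of pi listed earlier in this appendix.
CONVERGENTS = [(3, 1), (22, 7), (333, 106), (355, 113)]


def closest_with_denominator(b, target=pi):
    """Distance from target to the nearest fraction with denominator b."""
    a = round(b * target)  # the numerator that brings a/b closest to target
    return abs(a / b - target)


def is_best_below(p, q, target=pi):
    """True if no fraction with denominator 1 <= b < q is as close to target as p/q."""
    bound = abs(p / q - target)
    return all(closest_with_denominator(b, target) > bound for b in range(1, q))


for p, q in CONVERGENTS:
    print(p, q, is_best_below(p, q))
```

As a contrast, a fraction that is not a convergent, such as 314/100, fails the same check because 22/7 has a smaller denominator and is closer to $\pi$.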

The Marquis de l’Hospital

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

September 30, 2006

Guillaume François Antoine de L’Hospital, Marquis de Sainte-Mesme and Comte d’Entremont,
was born in 1661 and was foremost among the French nobility intrigued by the developments
of calculus. In 1691, Johann Bernoulli visited Paris and explained the new calculus in a series
of public lectures that continued into 1692. L’Hospital hired him as a private tutor over a
period of four months.

[Figure 1: Marquis de L’Hospital]

In March of 1694, L’Hospital wrote to Bernoulli, then back in Basel, offering him an annual
pension of 300 livres in exchange for help with mathematical questions and a promise to send
to L’Hospital mathematical results which L’Hospital could then publish under his own name.
What today we call L’Hospital’s rule was sent by Bernoulli to L’Hospital later that year. In 1696,
L’Hospital published the very first book on calculus, Analyse des infiniment petits, pour
l’intelligence des lignes courbes, which Fred Rickey has translated as Analysis of the
Little-Bitty-Guys for the Study of Curved Lines. Here is the first recorded mention of what today
we call L’Hospital’s rule. Bernoulli’s lectures from 1691–92 would be published in 1922, revealing
that much of L’Hospital’s book was first discovered by Bernoulli. In fact, after L’Hospital’s death
in 1704, with Bernoulli now freed from his contract, he laid claim to L’Hospital’s rule as his own result.

Not even the historians of mathematics are agreed on how to spell his name. On at least one of his
letters, he spelled his name Lhospital (without the apostrophe and with a lower case h), but people
did not spell their names consistently back then (think of Shakespeare). On his calculus book, it
is spelled l’Hospital (lower case l). The official French national bibliographic entry is L’Hospital,
which is what most historians choose. Today, the French word for hospital is spelled “hôpital”.
The mark over the o is called a circumflex. It is used to denote a missing s. While there is no evidence
that L’Hospital ever spelled his name with a circumflex or without the s, many mathematicians
prefer this spelling because the s, even if written, would not have been pronounced.

Chapter 3
Maple code for exercise in section 3.3
22.

> f := (n, x) -> (ln(x+2)-x^(2*n)*sin(x))/(1+x^(2*n)) ;

> plot(f(2,x),x=0..1/2*Pi);

> plot(f(5,x),x=0..1/2*Pi);

> plot(f(10,x),x=0..1/2*Pi);

> plot(f(20,x),x=0..1/2*Pi);

The command fsolve(f(x) = 0, x = a); finds a root of f near x=a.

> fsolve(f(2,x)=0,x=1);

> fsolve(f(5,x)=0,x=1);

> fsolve(f(10,x)=0,x=1);

> fsolve(f(20,x)=0,x=1);

Maple code for exercises in section 3.4


15.
How are these solutions related to the value of x?

> fsolve(sin(1) = sin(1/c) - 1/c * cos(1/c));

> fsolve(sin(1/3) = sin(1/c) - 1/c * cos(1/c));

> fsolve(sin(0.01) = sin(1/c) - 1/c * cos(1/c));

21.
The numpoints command gives somewhat greater detail in the plot. What parts of this graph
might warrant studying in closer detail?

> plot(x/(1+x*sin(1/x)),x = 0 .. 1,numpoints = 500);

Enter the following command to define a new function, F, that is the derivative of the given
function.

> F := y -> eval(diff(x/(1+x*sin(1/x)),x),x = y);

You can now draw the graph of F.

> plot(F(y),y = 0 .. 1);

Maple code for exercises in Newton-Raphson Method


2.
If you define f(x), specify an initial value for x, and then define the command nr as given below,
then each time you enter nr, Maple will perform another iteration of the Newton–Raphson
method. To change the initial value of x, you need only define the new value.

> f := x -> x^3 - 4*x^2 - x + 2;

> x:= 4; for n from 1 to 10 do x := evalf(x - f(x)/D(f)(x),12) end do;
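
For readers without Maple, the same iteration can be sketched in Python. This is an illustrative translation under stated assumptions, not part of the original worksheet: the derivative is written out by hand where Maple uses `D(f)`, and the names `newton_raphson`, `f`, and `df` are chosen for this example.

```python
def newton_raphson(f, df, x0, iterations=10):
    """Return the list of iterates x0, x1, ..., produced by x <- x - f(x)/f'(x)."""
    x = x0
    history = [x]
    for _ in range(iterations):
        x = x - f(x) / df(x)
        history.append(x)
    return history


def f(x):
    # The cubic from the Maple listing above.
    return x**3 - 4 * x**2 - x + 2


def df(x):
    # Derivative of f, computed by hand (assumption: replaces Maple's D(f)).
    return 3 * x**2 - 8 * x - 1


iterates = newton_raphson(f, df, 4.0)
for n, x in enumerate(iterates):
    print(n, x, f(x))
```

Printing `f(x)` beside each iterate shows how quickly the residual shrinks; changing the starting value only requires passing a different `x0`.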

3.
The following command will iterate Newton–Raphson for each of the starting points.

> f := x -> sin(x) - x/2;

> x:=0.84; for n from 1 to 10 do x := evalf(x - f(x)/D(f)(x),12) end do;

4.
The letter I designates the square root of −1.

> f := x -> sin(x) - x/2;

> x:=0.84+0.1*I; for n from 1 to 10 do x := evalf(x - f(x)/D(f)(x),12) end do;


9.

Find the range of values of x at which there could be a root. Use a plot command to find
approximate values of these roots. Then use Newton–Raphson to find roots to 10-digit accuracy.

> f := x -> cos(3*x)-x;

> x:= 4; for n from 1 to 10 do x := evalf(x - f(x)/D(f)(x),12) end do;


Stirling's Formula
> Compare n! to (n/e)^n * sqrt(2*Pi*n)

> [seq([100*n,evalf((100*n)!,10),evalf((100*n/exp(1))^(100*n)*sqrt(2*Pi*100*n),10)],n
=1..10)];

> Compare ln(n!) to n*ln(n) -n+(1/2) * ln(2*Pi*n)

> S := n -> n*ln(n) -n+(1/2) * ln(2*Pi*n);

> [seq([100*n,evalf(ln((100*n)!),10),evalf(S(100*n),10)],n=1..10)];
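
The comparison of ln(n!) with Stirling's formula can also be sketched in Python. This is an illustration, not part of the Maple worksheet: it relies on the standard-library identity `math.lgamma(n + 1)` = ln(n!), works in ordinary double precision rather than Maple's adjustable precision, and the function names are chosen for this example.

```python
import math


def stirling_log(n):
    """Stirling's approximation to ln(n!): n ln n - n + (1/2) ln(2 pi n)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)


def log_factorial(n):
    """ln(n!) via the log-gamma function, since Gamma(n + 1) = n!."""
    return math.lgamma(n + 1)


for n in range(100, 1001, 100):
    exact = log_factorial(n)
    approx = stirling_log(n)
    # The first summand of the error series is 1/(12 n).
    print(n, exact, approx, exact - approx, 1 / (12 * n))
```

The last two columns let you compare the observed error with the leading summand of the asymptotic series used later in this worksheet.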

> er := (n,k) -> sum(evalf(bernoulli(2*j)/((2*j-1)*(2*j)*n^(2*j-1)),50),j=1..k);

> [seq([10*k,evalf(er(10,10*k),50)],k=1..10)];

> The actual error is given by

> evalf(ln(10!)-S(10),50);

> This next program finds the smallest summand in the asymptotic series for the error function.

> summand := j -> abs(bernoulli(2*j)/((2*j-1)*(2*j)*10^(2*j-1)));

> k:=1; while summand(k) > summand(k+1) do k:=k+1 end do; print(k);

> evalf(summand(32));

> This will be the greatest possible accuracy in the error function. We set the accuracy of our
calculations to 50 digits to ensure that machine rounding does not affect accuracy of our
results. We first compute the approximation of ln(10!) given by Stirling's formula.

> evalf(S(10),50);

> We next compute the error term given by the asymptotic series with 32 summands.

> sum(evalf(bernoulli(2*j)/((2*j-1)*(2*j)*10^(2*j-1)),50),j=1..32);

> We now compare the exponential of Stirling's formula, the exponential of Stirling's formula
with the error term added in, and the actual value of 10!.
> round(evalf(exp(S(10)),50));

> round(evalf(exp(S(10)+sum(evalf(bernoulli(2*j)/((2*j-1)*(2*j)*10^(2*j-1)),50),j=1..32
)),50));

> 10!;
Exponential Function

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

June 21, 2006

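The associated Maple and Mathematica notebooks will let you explore the convergence of $(1 + x/n)^n$.
Using the fact that the natural logarithm is a continuous function and therefore the natural logarithm
of this limit is the same as the limit of the natural logarithm, we have that

\begin{align*}
\ln\left( \lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n \right) &= \lim_{n\to\infty} \ln\left(1 + \frac{x}{n}\right)^n \\
&= \lim_{n\to\infty} n \ln\left(1 + \frac{x}{n}\right) \\
&= \lim_{n\to\infty} \frac{\ln(1 + x/n)}{n^{-1}} \\
&= \lim_{n\to\infty} \frac{-x n^{-2}/(1 + x/n)}{-n^{-2}} \\
&= \lim_{n\to\infty} \frac{x}{1 + x/n} \\
&= x.
\end{align*}

Therefore,

\[ \lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = e^x. \]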

Another way to approach this identity is to use the binomial expansion:

\begin{align*}
\left(1 + \frac{x}{n}\right)^n &= \sum_{k=0}^{n} \binom{n}{k} \frac{x^k}{n^k} \\
&= \sum_{k=0}^{n} \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!\, n^k}\, x^k \\
&= \sum_{k=0}^{n} \frac{1(1 - 1/n)(1 - 2/n)\cdots(1 - (k-1)/n)}{k!}\, x^k.
\end{align*}

As $n$ approaches infinity, the number of summands becomes infinite and the $k$th summand becomes
$x^k/k!$:

\[ \lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = \sum_{k=0}^{\infty} \frac{x^k}{k!} = e^x. \]

Exponential Function
> f := (x,n) -> (1+x/n)^n;

> [seq([100*n,evalf(f(2,100*n))],n=1..20)];

> Compare these to the true value.

> evalf(exp(2));
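
A Python counterpart to the Maple commands above is sketched here for readers who want to watch the convergence without Maple. It is an illustration only; the function name `compound` is an assumption made for this example, and the computation uses ordinary double precision.

```python
import math


def compound(x, n):
    """The nth term of the sequence (1 + x/n)^n."""
    return (1 + x / n) ** n


true_value = math.exp(2)
for n in range(100, 2001, 100):
    approx = compound(2, n)
    # The gap to e^2 shrinks roughly in proportion to 1/n.
    print(n, approx, true_value - approx)
```

The slow, roughly 1/n, decay of the gap is one reason the power series for the exponential is preferred for actual computation.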
Convergence in Norm

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

June 21, 2006

In A Radical Approach to Real Analysis, we only considered pointwise convergence of a sequence of
functions, but in A Radical Approach to Lebesgue’s Theory of Integration, other types of convergence
will be considered including convergence in norm. We define the distance between two functions,
$f$ and $g$, over an interval, say $[0, 1]$, by

\[ \|f - g\| = \int_0^1 \left| f(x) - g(x) \right| \, dx. \]

The distance between two functions is 0 if and only if the integral of the absolute value of the difference
between these functions is 0. Given two continuous functions, this can happen only if they are
equal everywhere (see exercise 1).

We say that the sequence $(f_n)_{n=1}^{\infty}$ converges in norm to $f$ if for each $\epsilon > 0$ there is a response $N$ so that
$n \ge N$ implies that

\[ \|f_n - f\| < \epsilon. \]
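
The distance between two functions can be estimated numerically, which helps when exploring examples. The Python sketch below is an assumption-laden illustration: it approximates the integral with a midpoint rule, the name `l1_distance` is invented here, and the sample functions $x$, $x^2$, and $x^n$ are chosen for this example rather than taken from the exercises.

```python
def l1_distance(f, g, a=0.0, b=1.0, steps=10_000):
    """Midpoint-rule estimate of the integral of |f(x) - g(x)| over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h  # midpoint of the ith subinterval
        total += abs(f(x) - g(x))
    return total * h


def zero(x):
    return 0.0


# Distance between x and x^2 on [0, 1]; the exact integral is 1/6.
print(l1_distance(lambda x: x, lambda x: x * x))

# The functions x^n have distance 1/(n + 1) from the zero function,
# so they approach it in norm even though x^n equals 1 at x = 1 for every n.
for n in (1, 10, 50):
    print(n, l1_distance(lambda x, n=n: x**n, zero))
```

The second loop shows that convergence in norm to a function does not require pointwise convergence to that function at every point.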

Exercises

1. Prove that if $f$ and $g$ are continuous on $[0, 1]$, then $\|f - g\| = 0$ if and only if $f(x) = g(x)$ for all
$x \in [0, 1]$.
2. Define the sequence of functions

\[ f_n(x) = \left(1 + \frac{x}{n}\right)^n, \quad 0 \le x \le 1,\ n \ge 1. \]

Show that $f_n$ converges to $e^x$ both pointwise and in norm over $[0, 1]$.
3. Define the sequence of functions

\[ g_n(x) = n x e^{-n x^2}, \quad 0 \le x \le 1,\ n \ge 1. \]

Show that $g_n$ converges pointwise to the constant function $g(x) = 0$. Show that it does not converge
in norm to any function.


4. Consider the functions $f_{n,k}$ defined for $0 \le x \le 1$, $1 \le k \le n$, by

\[ f_{n,k}(x) = \begin{cases} 1, & \text{if } (k-1)/n \le x \le k/n, \\ 0, & \text{otherwise.} \end{cases} \]

We create a sequence,

\[ f_{1,1},\ f_{2,1},\ f_{2,2},\ f_{3,1},\ f_{3,2},\ f_{3,3},\ f_{4,1},\ f_{4,2},\ \ldots. \]

Show that this sequence does not converge pointwise at any point in $[0, 1]$, but it does converge in
norm to the constant function $f(x) = 0$.

Gauss’s Test

Appendix to A Radical Approach to Real Analysis 2nd edition


© 2006 David M. Bressoud

June 21, 2006

Theorem 1 (Gauss’s Test). Let

\[ f(x) = x^{\beta} (a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots), \quad a_i \ne 0, \]

be a hypergeometric power series. We assume that the polynomials in the numerator and denominator
of $a_{n+1}/a_n$ have the same degree:

\[ \frac{a_{n+1}}{a_n} = \frac{C_t n^t + C_{t-1} n^{t-1} + \cdots + C_0}{c_t n^t + c_{t-1} n^{t-1} + \cdots + c_0}, \]

where neither $C_t$ nor $c_t$ is zero. The radius of convergence is $|c_t/C_t|$. If $|x| = |c_t/C_t|$, then we can
write the absolute value of the ratio of successive terms as

\[ \left| \frac{a_{n+1}}{a_n}\, x \right| = \left| \frac{n^t + B_{t-1} n^{t-1} + \cdots + B_0}{n^t + b_{t-1} n^{t-1} + \cdots + b_0} \right|, \tag{1} \]

where $B_j = C_j/C_t$ and $b_j = c_j/c_t$. The test is as follows:

1. If $B_{t-1} > b_{t-1}$, then the absolute values of the summands grow without limit and the series
cannot converge.
2. If $B_{t-1} = b_{t-1}$, then the absolute values of the summands approach a finite nonzero limit and
the series cannot converge.
3. If $B_{t-1} < b_{t-1}$, then the absolute values of the summands approach zero. If the series is
alternating, then it converges.
4. If $B_{t-1} \ge b_{t-1} - 1$, then the series is not absolutely convergent.
5. If $B_{t-1} < b_{t-1} - 1$, then the series is absolutely convergent.
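
The five cases can be collected into a small decision procedure. The Python sketch below is an illustration of the statement of the test only; the function name `gauss_test`, its argument names, and the returned dictionary are assumptions made for this example. The argument `alternating` records whether the summands eventually alternate in sign; otherwise they are assumed to eventually share one sign.

```python
def gauss_test(B, b, alternating):
    """Classify a hypergeometric series on its circle of convergence.

    B and b stand for B_{t-1} and b_{t-1}, the second coefficients of the
    monic numerator and denominator polynomials in equation (1).
    """
    if B > b:
        summands = "grow without limit"
    elif B == b:
        summands = "approach a finite nonzero limit"
    else:
        summands = "approach zero"

    absolutely_convergent = B < b - 1
    # Without absolute convergence, the series converges only when the
    # summands approach zero and alternate in sign.
    convergent = absolutely_convergent or (B < b and alternating)
    return {
        "summands": summands,
        "absolutely_convergent": absolutely_convergent,
        "convergent": convergent,
    }


# Summands 1/n: the ratio is n/(n + 1), so t = 1, B_0 = 0, b_0 = 1.
print(gauss_test(0, 1, alternating=False))
print(gauss_test(0, 1, alternating=True))

# Summands 1/n^2: the ratio is n^2/(n^2 + 2n + 1), so t = 2, B_1 = 0, b_1 = 2.
print(gauss_test(0, 2, alternating=False))
```

The three calls reproduce familiar facts: the harmonic series diverges, the alternating harmonic series converges but not absolutely, and the series of reciprocal squares converges absolutely.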

Proof of Gauss’s Test: Part I

We shall follow Gauss’s proof, taking the liberty of rephrasing it and occasionally elaborating on
what he is doing. The reader is encouraged to look up Gauss’s original statements, portions of
which have been translated into English in Garrett Birkhoff’s A Source Book in Classical Analysis.
The complete paper (in Latin) is in the third volume of Werke, Carl Friedrich Gauss’s collected
works. Following Gauss, we define $M_1 + M_2 + M_3 + \cdots$ to be our hypergeometric series,

\[ \left| \frac{M_{n+1}}{M_n} \right| = \left| \frac{n^t + B_{t-1} n^{t-1} + \cdots + B_0}{n^t + b_{t-1} n^{t-1} + \cdots + b_0} \right|. \tag{2} \]

We define $P(n) = n^t + B_{t-1} n^{t-1} + \cdots + B_0$ and $p(n) = n^t + b_{t-1} n^{t-1} + \cdots + b_0$.

For large values of $n$, it is the coefficient of the highest power of $n$ (the leading coefficient) that
determines the sign of the polynomial. As Gauss points out, once you are past the rightmost root
of the polynomial, then the polynomial takes on only positive values if and only if the leading
coefficient is positive. It follows that once $n$ is larger than the largest root of either $P(n)$ or $p(n)$,
then

\[ \left| \frac{P(n)}{p(n)} \right| = \frac{P(n)}{p(n)}. \tag{3} \]

Let $k$ be the largest subscript for which $B_k \ne b_k$. If $B_k - b_k$ is positive, then $P(n) - p(n)$ will
eventually be positive:

\[ \frac{P(n)}{p(n)} = 1 + \frac{P(n) - p(n)}{p(n)} > 1, \tag{4} \]

for all $n$ larger than the rightmost root of $p(n)[P(n) - p(n)]$. This implies that

\[ \frac{|M_{n+1}|}{|M_n|} = \frac{P(n)}{p(n)} > 1 \]

and so $|M_{n+1}| > |M_n|$. The $|M_n|$ are strictly increasing once we are past this rightmost root.

Similarly, if $B_k - b_k$ is negative, then $P(n) - p(n)$ will eventually be negative and

\[ \frac{P(n)}{p(n)} = 1 + \frac{P(n) - p(n)}{p(n)} < 1, \tag{5} \]

for all $n$ sufficiently large. The $|M_n|$ are strictly decreasing once $n$ has passed the rightmost root.

Proof of Gauss’s Test: Part II

Gauss’s test says more than just that the absolute values of the summands are increasing or decreasing.
It says that when $B_{t-1} > b_{t-1}$, they increase without limit. When $B_{t-1} < b_{t-1}$, they
approach zero. In this part, we assume that $B_{t-1} \ne b_{t-1}$.

If $B_{t-1} > b_{t-1}$, then we can find an integer $h$ such that

\[ h(B_{t-1} - b_{t-1}) > 1. \tag{6} \]

We define a new series $N_1 + N_2 + N_3 + \cdots$ by

\[ N_n = \frac{M_n^h}{n}. \tag{7} \]


Our new series is also hypergeometric:

\begin{align*}
\left| \frac{N_{n+1}}{N_n} \right| &= \left| \frac{M_{n+1}}{M_n} \right|^h \frac{n}{n+1} \\
&= \left( \frac{P(n)}{p(n)} \right)^h \frac{n}{n+1} \\
&= \left( \frac{n^t + B_{t-1} n^{t-1} + \cdots + B_0}{n^t + b_{t-1} n^{t-1} + \cdots + b_0} \right)^h \frac{n}{n+1} \\
&= \frac{n^{th+1} + h B_{t-1} n^{th} + \cdots}{n^{th+1} + (h b_{t-1} + 1) n^{th} + \cdots}. \tag{8}
\end{align*}

Since we chose $h$ so that $h B_{t-1} > h b_{t-1} + 1$, we know from part I that the summands of our new
series are also increasing in absolute value for $n$ sufficiently large. We now observe that

\[ |M_n| = \sqrt[h]{n |N_n|}. \tag{9} \]

Since $h$ is constant and $|N_n|$ is increasing, $|M_n|$ grows without limit as $n$ increases.

Similarly, if $B_{t-1} < b_{t-1}$, then we can find an integer $h$ such that

\[ h(b_{t-1} - B_{t-1}) > 1. \tag{10} \]

We define a new series $N_1 + N_2 + N_3 + \cdots$ by

\[ N_n = n M_n^h. \tag{11} \]

This series is also hypergeometric:

\begin{align*}
\left| \frac{N_{n+1}}{N_n} \right| &= \left( \frac{P(n)}{p(n)} \right)^h \frac{n+1}{n} \\
&= \frac{n^{th+1} + (h B_{t-1} + 1) n^{th} + \cdots}{n^{th+1} + h b_{t-1} n^{th} + \cdots}. \tag{12}
\end{align*}

Since we chose $h$ so that $h B_{t-1} + 1 < h b_{t-1}$, we know from part I that the summands of our new
series are decreasing in absolute value. We observe that

\[ |M_n| = \sqrt[h]{|N_n|/n}. \tag{13} \]

Since $h$ is constant and $|N_n|$ is decreasing, $|M_n|$ approaches 0 as $n$ increases.

Proof of Gauss’s Test: Part III

We next tackle the case where $B_{t-1} = b_{t-1}$. We first assume that $B_k > b_k$ at the largest subscript
$k$ for which $B_k \ne b_k$. From part I, this tells us that for $n$ sufficiently large, the absolute values of
the summands will be strictly increasing.


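We find an integer $h$ with the property that

\[ B_{t-2} - b_{t-2} - h < 0. \tag{14} \]

We then define a new series $N_1 + N_2 + N_3 + \cdots$ by

\[ N_n = M_n \left( \frac{n}{n-1} \right)^h, \tag{15} \]

and note that

\[ |N_n| > |M_n|. \tag{16} \]

Again, we have created a new hypergeometric series:

\begin{align*}
\left| \frac{N_{n+1}}{N_n} \right| &= \left| \frac{M_{n+1}}{M_n} \right| \left( \frac{(n+1)(n-1)}{n^2} \right)^h \\
&= \frac{P(n)}{p(n)} \left( \frac{n^2 - 1}{n^2} \right)^h \\
&= \frac{n^t + B_{t-1} n^{t-1} + \cdots + B_0}{n^t + b_{t-1} n^{t-1} + \cdots + b_0} \cdot \frac{n^{2h} - h n^{2h-2} + \cdots}{n^{2h}} \\
&= \frac{n^{t+2h} + B_{t-1} n^{t+2h-1} + (B_{t-2} - h) n^{t+2h-2} + \cdots}{n^{t+2h} + b_{t-1} n^{t+2h-1} + b_{t-2} n^{t+2h-2} + \cdots}. \tag{17}
\end{align*}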

Since $B_{t-1} = b_{t-1}$ and we have chosen $h$ so that $B_{t-2} - h < b_{t-2}$, the values of $|N_n|$ are decreasing
once $n$ is sufficiently large. We have that

\[ |M_n| < |M_{n+1}| < |M_{n+2}| < \cdots < |N_{n+2}| < |N_{n+1}| < |N_n|, \]

and the distance between $|M_n|$ and $|N_n|$ approaches zero. The nested interval principle promises
us that both series are approaching a common limit.

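Similarly, if at the largest subscript $k$ for which $B_k \ne b_k$ we have $B_k < b_k$, then we find an integer
$h$ for which

\[ b_{t-2} - B_{t-2} - h < 0, \tag{18} \]

and define a new series by

\[ N_n = M_n \left( \frac{n-1}{n} \right)^h. \tag{19} \]

We note that

\[ |N_n| < |M_n|. \tag{20} \]

For this series, we have that

\begin{align*}
\left| \frac{N_{n+1}}{N_n} \right| &= \frac{P(n)}{p(n)} \left( \frac{n^2}{n^2 - 1} \right)^h \\
&= \frac{n^{t+2h} + B_{t-1} n^{t+2h-1} + B_{t-2} n^{t+2h-2} + \cdots}{n^{t+2h} + b_{t-1} n^{t+2h-1} + (b_{t-2} - h) n^{t+2h-2} + \cdots}. \tag{21}
\end{align*}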


Since $B_{t-1} = b_{t-1}$ and we have chosen $h$ so that $b_{t-2} - h < B_{t-2}$, the $|N_n|$ are increasing for
sufficiently large values of $n$. We see that

\[ |N_n| < |N_{n+1}| < |N_{n+2}| < \cdots < |M_{n+2}| < |M_{n+1}| < |M_n|, \]

and the distance between $|M_n|$ and $|N_n|$ approaches zero. Both series are approaching a common
limit.

The remaining situation is where $P(n) = p(n)$. Here we have that $|M_{n+1}| = |M_n|$. All summands
have the same absolute value.

Proof of Gauss’s Test: Part IV

As we know, the fact that the summands approach zero is not enough to guarantee convergence of
the series. If the series alternates in sign (or alternates in sign for all summands after some finite
subscript), then we have convergence. Because the ratio of consecutive terms, $M_{n+1}/M_n$, is a ratio
of polynomials in n, this ratio will eventually be either positive or negative and stay there. If it is
negative, then our summands alternate in sign. If it is positive, then the summands have the same
sign and our series converges if and only if it converges absolutely. We shall now determine when
this series converges absolutely.

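We first consider the case where $B_{t-1} > b_{t-1} - 1$. We observe that

\begin{align*}
\frac{n+1}{n} \left| \frac{M_{n+1}}{M_n} \right| &= \frac{(n+1)(n^t + B_{t-1} n^{t-1} + \cdots + B_0)}{n(n^t + b_{t-1} n^{t-1} + \cdots + b_0)} \\
&= \frac{n^{t+1} + (B_{t-1} + 1) n^t + \cdots}{n^{t+1} + b_{t-1} n^t + \cdots}. \tag{22}
\end{align*}

Since $B_{t-1} + 1 > b_{t-1}$, this last fraction will eventually be larger than 1. Let $m$ be an integer large
enough so that if $n$ is greater than or equal to $m$ then

\[ |M_{n+1}| > \frac{n}{n+1} |M_n|. \tag{23} \]

This implies that

\begin{align*}
|M_{m+k}| &> \frac{m+k-1}{m+k} |M_{m+k-1}| \\
&> \frac{m+k-1}{m+k} \cdot \frac{m+k-2}{m+k-1} |M_{m+k-2}| \\
&> \frac{m+k-1}{m+k} \cdot \frac{m+k-2}{m+k-1} \cdot \frac{m+k-3}{m+k-2} |M_{m+k-3}| \\
&\ \ \vdots \\
&> \frac{m+k-1}{m+k} \cdot \frac{m+k-2}{m+k-1} \cdots \frac{m}{m+1} |M_m| \\
&= \frac{m}{m+k} |M_m|. \tag{24}
\end{align*}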


It now follows that

\begin{align*}
|M_1| + |M_2| + |M_3| + \cdots + |M_m| + |M_{m+1}| + |M_{m+2}| + \cdots
&\ge |M_m| + |M_{m+1}| + |M_{m+2}| + \cdots \\
&> |M_m| + \frac{m}{m+1} |M_m| + \frac{m}{m+2} |M_m| + \frac{m}{m+3} |M_m| + \cdots \\
&= m |M_m| \left( \frac{1}{m} + \frac{1}{m+1} + \frac{1}{m+2} + \frac{1}{m+3} + \cdots \right). \tag{25}
\end{align*}

This last series is the harmonic series (with a finite number of summands taken off the front end).
It diverges. The comparison test tells us that $|M_1| + |M_2| + |M_3| + \cdots$ must also diverge.

If $B_{t-1} = b_{t-1} - 1$, then we find a positive integer $h$ such that

\[ B_{t-1} + B_{t-2} - b_{t-2} + h > 0. \tag{26} \]

We observe that

\begin{align*}
\frac{n+1-h}{n-h} \left| \frac{M_{n+1}}{M_n} \right| &= \frac{(n+1-h)(n^t + B_{t-1} n^{t-1} + \cdots + B_0)}{(n-h)(n^t + b_{t-1} n^{t-1} + \cdots + b_0)} \\
&= \frac{n^{t+1} + (B_{t-1} + 1 - h) n^t + (B_{t-2} + B_{t-1} - h B_{t-1}) n^{t-1} + \cdots}{n^{t+1} + (b_{t-1} - h) n^t + (b_{t-2} - h b_{t-1}) n^{t-1} + \cdots}. \tag{27}
\end{align*}

Since $1 = b_{t-1} - B_{t-1}$, we can rewrite inequality (26) as

\[ B_{t-2} + B_{t-1} - h B_{t-1} - (b_{t-2} - h b_{t-1}) > 0. \tag{28} \]

In the last fraction of equation (27), the coefficients of $n^t$ are the same ($B_{t-1} + 1 - h = b_{t-1} - h$)
and $B_{t-2} + B_{t-1} - h B_{t-1} > b_{t-2} - h b_{t-1}$. We choose $m$ larger than $h$ and large enough so that if
$n \ge m$, then the fraction in equation (27) will be larger than 1. It follows that for $n \ge m$:

\[ |M_{n+1}| > \frac{n-h}{n+1-h} |M_n|. \tag{29} \]

This implies that

\begin{align*}
|M_{m+k}| &> \frac{m+k-1-h}{m+k-h} |M_{m+k-1}| \\
&> \frac{m+k-1-h}{m+k-h} \cdot \frac{m+k-2-h}{m+k-1-h} |M_{m+k-2}| \\
&\ \ \vdots \\
&> \frac{m+k-1-h}{m+k-h} \cdot \frac{m+k-2-h}{m+k-1-h} \cdots \frac{m-h}{m+1-h} |M_m| \\
&= \frac{m-h}{m+k-h} |M_m|. \tag{30}
\end{align*}


It now follows that

\begin{align*}
|M_1| + |M_2| + |M_3| + \cdots + |M_m| + |M_{m+1}| + |M_{m+2}| + \cdots
&\ge |M_m| + |M_{m+1}| + |M_{m+2}| + \cdots \\
&> |M_m| + \frac{m-h}{m+1-h} |M_m| + \frac{m-h}{m+2-h} |M_m| + \frac{m-h}{m+3-h} |M_m| + \cdots \\
&= (m-h) |M_m| \left( \frac{1}{m-h} + \frac{1}{m+1-h} + \frac{1}{m+2-h} + \frac{1}{m+3-h} + \cdots \right). \tag{31}
\end{align*}

Again, we can compare our series to the divergent harmonic series. The series $|M_1| + |M_2| + |M_3| + \cdots$
must diverge.

Proof of Gauss’s Test: Part V

Finally, we consider the case where $B_{t-1} < b_{t-1} - 1$. We find a small positive number $h$ such that
$B_{t-1} + h$ is still less than $b_{t-1} - 1$,

\[ B_{t-1} + h < b_{t-1} - 1. \tag{32} \]

We observe that

\begin{align*}
\frac{n}{n-1-h} \left| \frac{M_{n+1}}{M_n} \right| &= \frac{n(n^t + B_{t-1} n^{t-1} + \cdots + B_0)}{(n-1-h)(n^t + b_{t-1} n^{t-1} + \cdots + b_0)} \\
&= \frac{n^{t+1} + B_{t-1} n^t + \cdots}{n^{t+1} + (b_{t-1} - 1 - h) n^t + \cdots}. \tag{33}
\end{align*}

We know that $B_{t-1}$ is strictly less than $b_{t-1} - 1 - h$. Eventually, this fraction will be less than and
stay less than 1. We choose an integer $m$ larger than $h + 1$ and large enough so that if $n \ge m$, then

\[ |M_{n+1}| < \frac{n-1-h}{n} |M_n|. \tag{34} \]

This implies that

\begin{align*}
|M_m| + |M_{m+1}| + |M_{m+2}| + \cdots
&< |M_m| + \frac{m-1-h}{m} |M_m| + \frac{(m-h)(m-1-h)}{(m+1)m} |M_m| \\
&\qquad + \frac{(m+1-h)(m-h)(m-1-h)}{(m+2)(m+1)m} |M_m| + \cdots \\
&= |M_m| \biggl( 1 + \frac{m-1-h}{m} + \frac{(m-h)(m-1-h)}{(m+1)m} \\
&\qquad + \frac{(m+1-h)(m-h)(m-1-h)}{(m+2)(m+1)m} + \cdots \biggr). \tag{35}
\end{align*}


We now observe that

\begin{align*}
1 &= \frac{m-1}{h} - \frac{m-1-h}{h}, \\
1 + \frac{m-1-h}{m} &= \frac{m-1}{h} - \frac{(m-1-h)(m-h)}{hm}, \\
1 + \frac{m-1-h}{m} + \frac{(m-h)(m-1-h)}{(m+1)m} &= \frac{m-1}{h} - \frac{(m-1-h)(m-h)(m-h+1)}{hm(m+1)}, \\
&\ \ \vdots \\
1 + \frac{m-1-h}{m} + \frac{(m-h)(m-1-h)}{(m+1)m} + \cdots \qquad & \\
{} + \frac{(m+k-2-h)(m+k-3-h) \cdots (m-1-h)}{(m+k-1)(m+k-2) \cdots m} &= \frac{m-1}{h} - \frac{(m-1-h)(m-h) \cdots (m+k-1-h)}{hm(m+1) \cdots (m+k-1)}. \tag{36}
\end{align*}

All of these partial sums are bounded above by $(m-1)/h$. In fact, they converge to $(m-1)/h$.
Since $m$ is larger than $h + 1$,

\[ 1 + \frac{m-1-h}{m} + \frac{(m-h)(m-1-h)}{(m+1)m} + \cdots \]

is absolutely convergent. By comparison, the series

\[ |M_m| + |M_{m+1}| + |M_{m+2}| + \cdots \]

must also converge. Since $m$ is a fixed subscript, this series differs from the original series
$|M_1| + |M_2| + |M_3| + \cdots$ by a known finite amount: $|M_1| + |M_2| + \cdots + |M_{m-1}|$. The original series
converges absolutely.

Conclusion

This proof is an ample demonstration of the thoroughness, care, and rigor of Gauss’s approach to
mathematics. In fact, I have occasionally been less careful than he is in his original manuscript. If
an inequality only becomes true once n passes a certain bound, then Gauss always makes explicit
what this bound is and how it enters the calculations. There can be no question that Gauss fully
understood the meaning of convergence and how it could be verified.

Not all power series are hypergeometric. We shall not always have clear, sharp tests for convergence.
But most of the power series encountered in the real world will be hypergeometric. When the root
and ratio tests return inconclusive answers, Gauss’s test is the next place to turn.


And if the series is not hypergeometric . . .

If the series is not hypergeometric, then Gauss’s test can still be applied, but it may return an
inconclusive answer.
Theorem 2 (Gauss’s Test for Arbitrary Series). Let $a_1 + a_2 + a_3 + \cdots$ be a series for which

\[ \left| \frac{a_{n+1}}{a_n} \right| = 1 + \frac{\mu + E(n)}{n}, \]

where $\mu$ is some constant and $E(n)$ is an error function that can be forced to be arbitrarily close to
zero by taking $n$ sufficiently large (given a positive error bound $\epsilon$, there exists a subscript $N$ such
that $n \ge N$ implies that $|E(n)| < \epsilon$).

1. If µ > 0, then the absolute values of the summands grow without limit and the series cannot
converge.
2. If µ = 0 and |nE(n)| is bounded for all n, then the absolute values of the summands approach
a finite nonzero limit and the series cannot converge.
3. If µ < 0, then the absolute values of the summands approach zero. If the series is alternating,
then it converges.
4. If µ > −1, then the series is not absolutely convergent.
5. If µ = −1 and nE(n) has a lower bound (there is a number B and a subscript N such that
n ≥ N implies that nE(n) ≥ B), then the series is not absolutely convergent.
6. If µ < −1, then the series is absolutely convergent.

This generalization of Gauss’s test enables us to handle series such as
\[
1 + x + \frac{x^2}{\sqrt{2}} + \frac{x^3}{\sqrt{3}} + \frac{x^4}{\sqrt{4}} + \cdots .
\]
The radius of convergence is
\[
\lim_{n\to\infty} \frac{\sqrt{n+1}}{\sqrt{n}} = 1.
\]
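When x = ±1, we have that
\begin{align*}
\left|\frac{a_{n+1}}{a_n}\right| &= \sqrt{\frac{n}{n+1}} \\
&= \left(1 + \frac{1}{n}\right)^{-1/2} \\
&= 1 + \frac{-1/2}{n} + \frac{(-1/2)(-3/2)}{2!\,n^2} + \cdots .
\end{align*}
In this case, we have µ = −1/2 and
\[
E(n) = \frac{(-1/2)(-3/2)}{2!\,n} + \frac{(-1/2)(-3/2)(-5/2)}{3!\,n^2} + \cdots .
\]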


Our series converges at x = −1 because it alternates in sign, but it is not absolutely convergent
and so does not converge at x = 1.
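
The constant µ in Theorem 2 can be estimated numerically, because n(|a_{n+1}/a_n| − 1) = µ + E(n) and E(n) tends to zero. The following Python sketch is only an illustration (the name estimate_mu is hypothetical, and Python is not the language used elsewhere in these appendices). It applies this idea to the series above at x = ±1, where the summands have absolute value 1/√n.

```python
import math

def estimate_mu(abs_term, n):
    """Return n*(|a_{n+1}/a_n| - 1), which equals mu + E(n) in Theorem 2."""
    return n * (abs_term(n + 1) / abs_term(n) - 1.0)

def inv_sqrt(n):
    """Absolute value of the nth summand of 1 + x + x^2/sqrt(2) + ... at x = +1 or -1."""
    return 1.0 / math.sqrt(n)

# The printed values should drift toward mu = -1/2 as n grows.
for n in (10, 100, 1000, 10000):
    print(n, estimate_mu(inv_sqrt, n))
```

A numerical estimate of this kind only suggests the value of µ; the conclusions of the theorem still rest on the exact expansion given above.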

An example of a series for which Gauss’s test is inconclusive is
\[
x + \frac{x^2}{2\ln 2} + \frac{x^3}{3\ln 3} + \frac{x^4}{4\ln 4} + \cdots .
\]
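The radius of convergence is again 1. To simplify the algebra, we shall look at the ratio |a_n/a_{n−1}|
rather than |a_{n+1}/a_n|:
\begin{align*}
\left|\frac{a_n}{a_{n-1}}\right| &= \frac{(n-1)\ln(n-1)}{n\ln n} \\
&= \frac{n-1}{n}\cdot\frac{\ln\left(n\cdot\frac{n-1}{n}\right)}{\ln n} \\
&= \left(1-\frac{1}{n}\right)\frac{\ln n+\ln(1-n^{-1})}{\ln n} \\
&= \left(1-\frac{1}{n}\right)\left(1+\frac{\ln(1-n^{-1})}{\ln n}\right) \\
&= 1-\frac{1}{n}+\frac{\ln(1-n^{-1})}{\ln n}-\frac{\ln(1-n^{-1})}{n\ln n} \\
&= 1-\frac{1}{n}-\left(\frac{1}{n\ln n}+\frac{1}{2n^2\ln n}+\frac{1}{3n^3\ln n}+\cdots\right) \\
&\qquad +\left(\frac{1}{n^2\ln n}+\frac{1}{2n^3\ln n}+\frac{1}{3n^4\ln n}+\cdots\right) \\
&= 1+\frac{-1+E(n)}{n},
\end{align*}
where
\[
E(n) = -\frac{1}{\ln n}+\frac{1}{2n\ln n}+\frac{1}{6n^2\ln n}+\cdots .
\]
For this series, µ = −1 but
\[
nE(n) = -\frac{n}{\ln n}+\frac{1}{2\ln n}+\frac{1}{6n\ln n}+\cdots
\]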
which does not have a lower bound. Gauss’s test shows that this series converges at x = −1 where
the series alternates. The test is inconclusive about what happens at x = 1.

Chapter 4
Maple code for exercises in section 4.1
4.

The first command defines Sum1(n) as the sum of the first n terms of the series. The second
command generates a table of the values of the partial sums as n increases from 10 to 400 in
multiples of 10.

> Sum1 := n -> evalf(1 + sum(k!/100^k, k = 1..n - 1), 10);

> seq([10*n, Sum1(10*n)], n=1..40);

5.

First think about how you can use Stirling's formula to identify where this smallest summand is
likely to occur. The following command finds all values of k!/100^k for a <= k <= b.

> TestSummand := (a,b) -> seq([k, evalf(k!/100^k, 20)], k=a..b);

> TestSummand(10,20);

8.

After finding your candidate, you can test it with the command

> TestBernoulliSummand := (a,b) -> seq([k, evalf(abs(bernoulli(2*k)/((2*k - 1)*(2*k)*10^(2*k - 1))), 20)], k=a..b);

> TestBernoulliSummand(10,20);

13.

The first command defines Sum2(n) as the sum of the first n terms of the series. The second
command generates a table of the values of the partial sums as n increases from 1000 to 2000 in
multiples of 100.

> Sum2 := n ->  evalf(sum(sin(k/100)/ln(k), k=2..n + 1), 10);

> seq([100*n, Sum2(100*n)], n=10..20);


14.

> Sum3 := n -> evalf(sum((-1)^k/ln(k),k = 2 .. n+1), 10);

> seq([100*n, Sum3(100*n)], n=10..20);

> Sum4 := n ->  evalf(sum((-1)^k*ln(k)^2/k,k = 2 .. n+1), 10);

> seq([100*n, Sum4(100*n)], n=10..20);

> Sum5 := n -> evalf(sum((-1)^k*sin(1/k), k = 2 .. n+1), 10);

> seq([100*n, Sum5(100*n)], n=10..20);

> Sum6 := n -> evalf(sum((-1)^k*ln(k)^ln(k)/k^2, k = 2 .. n+1), 10);

> seq([100*n, Sum6(100*n)], n=10..20);

15.

> Sum7 := n -> evalf(sum((-1)^(k + 1 - 3*floor((k + 1)/3))/k, k=1..n), 10);

> seq([100*n, Sum7(100*n)], n=10..20);

> Sum8 := n ->  evalf(sum((-1)^floor(1/2*k-1/2)/k, k = 1 .. n), 10);

> seq([100*n, Sum8(100*n)], n=10..20);


> Sum9 := n -> evalf(sum((-1)^floor(k^(1/2))/k, k = 1 .. n), 10);

> seq([100*n, Sum9(100*n)], n=10..20);

Maple code for exercises in section 4.2


15.

> Sum10 := n -> evalf(sum((k/(2*k-1))^k,k = 1 .. n), 20);

> seq([20*n, Sum10(20*n)], n=1..10);

16.

> Sum11 := n -> evalf(sum((k/(2*k-1))^k*2^k,k = 1 .. n), 20);

> Sum12 := n -> evalf(sum((k/(2*k-1))^k*(-2)^k,k = 1 .. n), 20);

> seq([20*n, Sum11(20*n), Sum12(20*n)], n=1..10);

17.

> Sum13 := n -> evalf(sum(k^k/k!,k = 1 .. n), 10);

> seq([20*n, Sum13(20*n)], n=1..10);

18.

> Sum14 := n -> evalf(sum(k^k*exp(-k)/k!,k = 1 .. n), 20);

> Sum15 := n -> evalf(sum(k^k*(-exp(1))^(-k)/k!,k = 1 .. n), 20);

> seq([20*n, Sum14(20*n), Sum15(20*n)], n=1..15);

19.
> Sum16 := n -> evalf(sum(2^k/k^(1/2), k = 1 .. n), 10);

> seq([20*n, Sum16(20*n)], n=1..10);

20.

> Sum17 := n -> evalf(sum(1/k/ln(k),k = 2 .. n), 20);

> seq([1000*n, Sum17(1000*n)], n=1..10);

21.

> Sum18 := n -> evalf(sum(1/k/ln(k)^(3/2),k = 2 .. n), 20);

> seq([1000*n, Sum18(1000*n)], n=1..10);

22.

sum(1/k/ln(k)/ln(ln(k)),k = 10 .. n)

> Sum19 := n -> evalf(sum(1/k/ln(k)/ln(ln(k)),k = 10 .. n), 20);

> seq([1000*n, Sum19(1000*n)], n=1..10);

Maple code for exercises in section 4.3


6.

sum((k/(2*k-1))^k*x^k,k = 1 .. n)

> FSum1 := (n,x) -> sum((k/(2*k-1))^k*x^k,k = 1 .. n);

> plots[display](plot(FSum1(3,x),x = -2.1 .. 2.1),plot(FSum1(6,x),x = -2.1 ..


2.1),plot(FSum1(9,x),x = -2.1 .. 2.1),plot(FSum1(12,x),x = -2.1 .. 2.1));

8.

> FSum2 := (n,x) -> sum(k^k/k!*x^k, k=1..n);


> plots[display](plot(FSum2(3,x),x = -.7 .. .7),plot(FSum2(6,x),x = -.7 ..
.7),plot(FSum2(9,x),x = -.7 .. .7),plot(FSum2(12,x),x = -.7 .. .7));

10.

> FSum3 := (n,x) -> sum(2^k/sqrt(k)*x^k, k=1..n);

> plots[display](plot(FSum3(3,x),x = -.7 .. .7),plot(FSum3(6,x),x = -.7 ..


.7),plot(FSum3(9,x),x = -.7 .. .7),plot(FSum3(12,x),x = -.7 .. .7));

11.

These summands can be simplified by realizing that product(2*i,i=1..n)/product(2*i+1,i=1..n) =
4^n*(n!)^2/(2*n + 1)!

> FSum4 := (n,x) -> sum(4^k*k!^2*x^k/(2*k+1)!,k = 1 .. n);

> plots[display](plot(FSum4(3,x),x = -.7 .. .7),plot(FSum4(6,x),x = -.7 ..


.7),plot(FSum4(9,x),x = -.7 .. .7),plot(FSum4(12,x),x = -.7 .. .7));

14.

> FSum5 := (n,x) -> sum(2^k*k!*k^k*x^k/(2*k)!,k = 1 .. n);

> plots[display](plot(FSum5(3,x),x = -2 .. 2),plot(FSum5(6,x),x = -2 ..


2),plot(FSum5(9,x),x = -2 .. 2),plot(FSum5(12,x),x = -2 .. 2));

15.

> FSum6 := (n,x) -> sum(x^k/k^2,k = 1.. n);

> plots[display](plot(FSum6(3,x),x = -1.1 .. 1.1),plot(FSum6(6,x),x = -1.1 ..


1.1),plot(FSum6(9,x),x = -1.1 .. 1.1),plot(FSum6(12,x),x = -1.1 .. 1.1));

> FSum7 := (n,x) -> 1+sum((2*k)!*x^k/k!^2,k = 1 .. n);

> plots[display](plot(FSum7(3,x),x = -.4 .. .4),plot(FSum7(6,x),x = -.4 ..


.4),plot(FSum7(9,x),x = -.4 .. .4),plot(FSum7(12,x),x = -.4 .. .4));

> FSum8 := (n,x) -> 1+sum(k!^3*x^k/(3*k)!,k = 1 .. n);


> plots[display](plot(FSum8(3,x),x = -30 .. 30),plot(FSum8(6,x),x = -30 ..
30),plot(FSum8(9,x),x = -30 .. 30),plot(FSum8(12,x),x = -30 .. 30));

> FSum9 := (n,x) -> 1+sum((2*k+1)!*x^k/(2^k)/k!^2,k = 1 .. n);

> plots[display](plot(FSum9(3,x),x = -.7 .. .7),plot(FSum9(6,x),x = -.7 ..


.7),plot(FSum9(9,x),x = -.7 .. .7),plot(FSum9(12,x),x = -.7 .. .7));

> FSum10 := (n,x) -> sum(product(i^2-1,i = 2 .. k)*x^k/product(i^2,i = 2 .. k),k = 2 ..


n);

> plots[display](plot(FSum10(3,x),x = -1.2 .. 1.2),plot(FSum10(6,x),x = -1.2 ..


1.2),plot(FSum10(9,x),x = -1.2 .. 1.2),plot(FSum10(12,x),x = -1.2 .. 1.2));

> FSum11 := (n,x) -> 1+sum((2*k)!^2*x^k/(4^k)/k!^4,k = 1 .. n);

> plots[display](plot(FSum11(3,x),x = -.3 .. .3),plot(FSum11(6,x),x = -.3 ..


.3),plot(FSum11(9,x),x = -.3 .. .3),plot(FSum11(12,x),x = -.3 .. .3));

> FSum12 := (n,x) -> 1+sum((3*k)!*x^k/k!/(2*k)!,k = 1 .. n);

> plots[display](plot(FSum12(3,x),x = -.2 .. .2),plot(FSum12(6,x),x = -.2 ..


.2),plot(FSum12(9,x),x = -.2 .. .2),plot(FSum12(12,x),x = -.2 .. .2));

27.

> HyperSum1 := (n,m) -> evalf(sum((product(3*i-1,i = 1 .. k)/product(3*i,i = 1 ..


k))^m,k = 1 .. n), 10);

> seq([100*n, HyperSum1(100*n, 1)], n=1..10);

> seq([100*n, HyperSum1(100*n, 2)], n=1..10);

> seq([100*n, HyperSum1(100*n, 3)], n=1..10);

> seq([100*n, HyperSum1(100*n, 4)], n=1..10);

28.

> HyperSum2 := (n,m) -> evalf(sum((product(3*i-2,i = 1 .. k)/product(3*i,i = 1 ..


k))^m,k = 1 .. n), 10);
> seq([100*n, HyperSum2(100*n, 1)], n=1..10);

> seq([100*n, HyperSum2(100*n, 2)], n=1..10);

> seq([100*n, HyperSum2(100*n, 3)], n=1..10);

> seq([100*n, HyperSum2(100*n, 4)], n=1..10);

Maple code for exercises in section 4.4


1.

> FSum13 := (n,x) -> evalf(sum((-1)^(k-1)*cos(1/2*(2*k-1)*Pi*x),k = 1 .. n),10);

The first command plots the partial sums; the second lists them.

> plots[listplot]([seq([n, FSum13(n,1/2)],n = 1 .. 20)],style = POINT,symbol =


CIRCLE);

> seq([n, FSum13(n, 1/2)], n=1..20);

> plots[listplot]([seq([n, FSum13(n,2/3)],n = 1 .. 20)],style = POINT,symbol =


CIRCLE);

> seq([n, FSum13(n, 2/3)], n=1..20);

> plots[listplot]([seq([n, FSum13(n,3/5)],n = 1 .. 20)],style = POINT,symbol =


CIRCLE);

> seq([n, FSum13(n, 3/5)], n=1..20);

> plots[listplot]([seq([n, FSum13(n,5/18)],n = 1 .. 20)],style = POINT,symbol =


CIRCLE);

> seq([n, FSum13(n, 5/18)], n=1..20);

3.

> FSum14 := (n,x) -> evalf(sum((-1)^(k-1)*cos(1/2*(2*k-1)*Pi*x)/(2*k-1),k = 1 .. n),


10);

The first command plots the partial sums for 1 <= n <= 200; the second lists the last 20 values.
> plots[listplot]([seq([n, FSum14(n,1/2)],n = 1 .. 200)],style = POINT,symbol =
CIRCLE);

> seq([n, FSum14(n, 1/2)], n=181..202);

> plots[listplot]([seq([n, FSum14(n,2/3)],n = 1 .. 200)],style = POINT,symbol =


CIRCLE);

> seq([n, FSum14(n, 2/3)], n=181..202);

> plots[listplot]([seq([n, FSum14(n,9/10)],n = 1 .. 200)],style = POINT,symbol =


CIRCLE);

> seq([n, FSum14(n, 9/10)], n=181..202);

> plots[listplot]([seq([n, FSum14(n,99/100)],n = 1 .. 200)],style = POINT,symbol =


CIRCLE);

> seq([n, FSum14(n, 99/100)], n=181..202);

6.

> FSum15 := (n,x) -> sum((-1)^(k-1)*sin(1/2*(2*k-1)*Pi*x)/(2*k-1),k = 1 .. n);

> plots[display](plot(FSum15(3,x),x = -2 .. 2),plot(FSum15(6,x),x = -2 ..


2),plot(FSum15(9,x),x = -2 .. 2),plot(FSum15(12,x),x = -2 .. 2));

7.

> FSum16 := (n,x) -> sin(x)*(1-cos(n*x))/(2-2*cos(x))+1/2*sin(n*x);

> plots[display](plot(FSum16(10,x),x = -Pi .. Pi,view = 0 .. 20),plot(FSum16(20,x),x


= -Pi .. Pi,view = 0 .. 20),plot(FSum16(100,x),x = -Pi .. Pi,view = 0 ..
20),plot(FSum16(1000,x),x = -Pi .. Pi,view = 0 .. 20));

11.

> FSum17 := (n,x) -> sum((-1)^(k-1)*sin(1/2*k*Pi*x)/k,k = 1 .. n);

> plots[display](plot(FSum17(3,x),x = -2 .. 2),plot(FSum17(6,x),x = -2 ..


2),plot(FSum17(9,x),x = -2 .. 2),plot(FSum17(12,x),x = -2 .. 2));
The Dilogarithm

Appendix to A Radical Approach to Real Analysis 2nd edition


2006
c David M. Bressoud

June 21, 2006

The dilogarithm is defined for −1 ≤ x ≤ 1 by
\[
\mathrm{Li}_2(x) = \sum_{k=1}^{\infty} \frac{x^k}{k^2}. \tag{1}
\]

In general, the polylogarithm is defined by
\[
\mathrm{Li}_n(x) = \sum_{k=1}^{\infty} \frac{x^k}{k^n}, \tag{2}
\]
in the interval of convergence. The function Li1(x) is related to the natural logarithm by
\[
\mathrm{Li}_1(x) = \sum_{k=1}^{\infty} \frac{x^k}{k} = -\ln(1-x). \tag{3}
\]

From equation (3), it is easy to see that
\[
\mathrm{Li}_2(x) = \int_0^x \frac{-\ln(1-t)}{t}\,dt. \tag{4}
\]

Equation (4) has the advantage that it is well-defined for all x ≤ 1.
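
As a numerical cross-check, the series in equation (1) and the integral in equation (4) can be compared at a point inside the interval of convergence. The Python sketch below is illustrative only: the function names li2_series and li2_integral, the number of terms, and the choice of a midpoint rule are assumptions made here, not part of the original appendix.

```python
import math

def li2_series(x, terms=200):
    """Partial sum of equation (1): x^k / k^2 for k = 1..terms."""
    return sum(x**k / k**2 for k in range(1, terms + 1))

def li2_integral(x, panels=20000):
    """Midpoint-rule approximation of the integral in equation (4)."""
    h = x / panels
    total = 0.0
    for i in range(panels):
        t = (i + 0.5) * h                 # midpoints never hit t = 0
        total += -math.log1p(-t) / t      # log1p(-t) = ln(1 - t)
    return total * h

x = 0.5
print(li2_series(x), li2_integral(x))
```

At x = 1/2 both values can also be compared with the known closed form π²/12 − (ln 2)²/2.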

There is a wealth of information on the dilogarithm on the Wolfram MathWorld site,


http://mathworld.wolfram.com/Dilogarithm.html.

Chapter 5
Maple code for exercises in section 5.1
2.

sum((-1)^floor(2/3*k+4/3)/(2^(k-1)),k = 1 .. n)

> Sum01 := n -> evalf(sum((-1)^floor(2/3*k+4/3)/(2^(k-1)),k = 1 .. n),10) ;

> seq([10*n, Sum01(10*n)], n=1..10);

4.

sum((24*k-11)/(8*k-7)/(8*k-3)/(4*k-1),k = 1 .. n)

> Sum02 := n -> evalf(sum((24*k-11)/(8*k-7)/(8*k-3)/(4*k-1),k = 1 .. n),10) ;

> seq([100*n, Sum02(100*n)], n=1..10);

> Sum02(10000);

5.

The first command factors the sum of three reciprocals:

> factor(1/(4*k-1)-1/(8*k+1)-1/(8*k+5));

6/5-sum((11+40*k)/(4*k-1)/(8*k+1)/(8*k+5),k = 1 .. n)

> Sum03 := n -> 6/5-evalf(sum((11+40*k)/(4*k-1)/(8*k+1)/(8*k+5),k = 1 .. n),10) ;

> seq([100*n, Sum03(100*n)], n=1..10);

> Sum03(10000);

7.

The following program uses the algorithm described in section 5.1. i counts the number of the
term that is about to be added, sum keeps track of the current partial sum, odd keeps track of the
number of the next odd summand that can be added, even keeps track of the number of the next
even summand that can be added, and l is the list of the summands that have been added so far.
> riemann1 := proc(target,n)    

> local odd,even,sum,i,l;

> sum:=0; i:=1; odd:=1; even:=2; l:=[];

> while i <= n do

> while sum < target do

> sum := evalf(sum+1/odd,10);

> l :=[op(l),1/odd];

> odd := odd+2; i := i+1;

> end do;

> while sum >= target do

> sum := evalf(sum - 1/even,10);

> l := [op(l),-1/even];

> even := even+2; i := i+1;

> end do;

> end do;

> print(l);

> print(sum);

> end proc;

> riemann1(1.5,200);

> riemann1(.5,200);
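
For readers without Maple, the same rearrangement loop can be sketched in Python. This is an illustrative translation of the procedure above, not the author's code; the name riemann_rearrangement is hypothetical, and it records denominators (negative for subtracted terms) rather than the fractions themselves.

```python
def riemann_rearrangement(target, n):
    """Rearrange 1 - 1/2 + 1/3 - ... so that its partial sums approach target."""
    total, used = 0.0, []
    odd, even = 1, 2                  # next unused odd and even denominators
    while len(used) < n:
        while total < target:         # add positive terms until target is reached
            total += 1.0 / odd
            used.append(odd)
            odd += 2
        while total >= target:        # subtract terms until we fall below target
            total -= 1.0 / even
            used.append(-even)
            even += 2
    return total, used

total, used = riemann_rearrangement(1.5, 200)
print(total)
print(used[:12])
```

As in the Maple procedure, the outer loop is only checked between passes, so slightly more than n summands may be used.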
8.

See the explanation of this code in the previous program.

> riemann2 := proc(target,n)    

> local odd,even,sum,i,l;

> sum:=0; i:=1; odd:=1; even:=3; l:=[];

> while i <= n do

> while sum < target do

> sum := evalf(sum+1/odd,10);

> l :=[op(l),1/odd];

> odd := odd+4; i := i+1;

> end do;

> while sum >= target do

> sum := evalf(sum - 1/even,10);

> l := [op(l),-1/even];

> even := even+4; i := i+1;

> end do;

> end do;

> print(l);

> print(sum);

> end proc;

> riemann2(1.5,200);
> riemann2(.5,200);

Maple code for exercises in section 5.2


1.

x^2*sum((1-x^2)^(k-1),k = 1 .. n)

> FSum01 :=  (n, x) -> x^2*sum((1-x^2)^(k-1),k = 1 .. n) ;

> plots[display](plot(FSum01(3,x),x = -1 .. 1,view = 0 .. 1),plot(FSum01(6,x),x = -1 ..


1,view = 0 .. 1),plot(FSum01(9,x),x = -1 .. 1,view = 0 .. 1),plot(FSum01(12,x),x = -1
.. 1,view = 0 .. 1));

2.

sum(x^2/(1+k*x^2)/(1+(k-1)*x^2),k = 1 .. n)

> FSum02 :=  (n, x) -> evalf(sum(x^2/(1+k*x^2)/(1+(k-1)*x^2),k = 1 .. n),20) ;

> seq([10*n, FSum02(10*n,1/10)], n=1..10);

> seq([10*n, FSum02(10*n,1/100)], n=1..10);

> seq([10*n, FSum02(10*n,1/1000)], n=1..10);

3.

sum((x+x^3*(k-k^2))/(1+k^2*x^2)/(1+(k-1)^2*x^2),k = 1 .. n)

> FSum03 :=  (n, x) -> evalf(sum((x+x^3*(k-k^2))/(1+k^2*x^2)/(1+(k-1)^2*x^2),k = 1


.. n),20) ;

> seq([10*n, FSum03(10*n,1/10)], n=1..10);

> seq([10*n, FSum03(10*n,1/100)], n=1..10);

> seq([10*n, FSum03(10*n,1/1000)], n=1..10);

Maple code for exercises in section 5.3


4.

2*sum((-1)^k*sin(1/2*(2*k-1)*Pi*x),k = 1 .. n)

> FSum04 :=  (n, x) -> 2*sum((-1)^k*sin(1/2*(2*k-1)*Pi*x),k = 1 .. n) ;

> plots[display](plot(FSum04(5,x),x = -4 .. 4),plot(FSum04(13,x),x = -4 ..


4),plot(FSum04(27,x),x = -4 .. 4));

5.

sum(x^2*sin(x)/(1+k*x^2)/(1+(k-1)*x^2),k = 1 .. n)

> FSum05 :=  (n, x) -> evalf(sum(x^2*sin(x)/(1+k*x^2)/(1+(k-1)*x^2),k = 1 .. n),10) ;

> seq([10*n, FSum05(10*n,Pi/6)], n=1..10);

> seq([10*n, FSum05(10*n,Pi/4)], n=1..10);

> seq([10*n, FSum05(10*n,Pi/2)], n=1..10);

6.

Be certain to  enter  the definition of FSum05(n,x) in exercise 5.

> plots[display](plot(FSum05(3,x),x = -Pi .. Pi),plot(FSum05(6,x),x = -Pi ..


Pi),plot(FSum05(9,x),x = -Pi .. Pi),plot(FSum05(12,x),x = -Pi .. Pi));

10.

sum(k*x*exp(-k*x^2)-(k-1)*x*exp((-k+1)*x^2),k = 1 .. n)

In Maple, e^x is written exp(x).

> FSum06 := (n, x) -> sum(k*x*exp(-k*x^2)-(k-1)*x*exp((-k+1)*x^2),k = 1 .. n) ;

> plots[display](plot(FSum06(5,x),x = -1 .. 1),plot(FSum06(10,x),x = -1 ..


1),plot(FSum06(20,x),x = -1 .. 1));
Chapter 6
Maple code for exercises in section 6.1
8.

> a := k -> piecewise(k=0,1/2*int(x^2,x = -1 .. 1), int(x^2*cos(k*Pi*x),x = -1 .. 1));

> b := k -> int(x^2*sin(k*Pi*x),x = -1 .. 1) ;

11.

> riemann := proc(x)

> local pos::list, neg::list, negi, posi, i, l::list, sum, summands::list, j, posl::list,
negl::list; negi := 1; posi := 1; posl := []; negl := []; i := 1; sum := 0; pos := []; neg
:= []; summands := []; l := [];

> summands := [seq((-1)^(j-1)*cos((2*j-1)*x)/(2*j-1),j=1..100)];

> for j from 1 to 100 do if op(j,summands) >= 0 then pos :=


[op(pos),op(j,summands)]; posl := [op(posl),j]; else neg :=
[op(neg),op(j,summands)]; negl := [op(negl),j] end if; end do;

> while i <= 20 do while sum < 1 do sum := sum + op(posi,pos); l :=


[op(l),op(posi,posl)]; posi := posi + 1; i := i + 1; end do; while sum >= 1 do sum
:= sum + op(negi,neg); l := [op(l),op(negi,negl)]; negi := negi + 1; i := i + 1; end
do; end do; print(l); print(sum);
end proc;

> riemann(.5);

The list shows the order in which the summands have been rearranged. Thus, with x = 0.5, we
take the first summand, then the fourth, then the second, then the seventh, and so on.

13.

The following command will calculate a numerical approximation to the value of the integral:

> F := n -> evalf(evalf(int(sin((2*n+1)*u)*sqrt(9+2*u)/sin(u),u = 0 .. 1/2*Pi))/Pi) ;


> plots[listplot]([seq([5*n, F(5*n)],n = 1 .. 30)],style = POINT,symbol = POINT,view
= 1.5 .. 1.51);

17.

> a := k -> piecewise(k=0, 1/2*(int(2*x+1,x = -Pi .. 0)/Pi+int(1/3*x-2/3,x = 0 .. Pi)/Pi), int((2*x+1)*cos(k*x),x = -Pi .. 0)/Pi+int(1/3*(x-2)*cos(k*x),x = 0 .. Pi)/Pi);

> b := k -> int((2*x+1)*sin(k*x),x = -Pi .. 0)/Pi+int(1/3*(x-2)*sin(k*x),x = 0 .. Pi)/Pi ;

> [a(k), b(k)];

Simplify Maple's answer by using your knowledge that k is an integer and therefore sin(k*Pi) = 0
and cos(k*Pi) = (-1)^k.

Maple code for exercises in section 6.2


1.

> S1 := n -> sum(j^3/n^4-2*j^2/n^3+j/n^2,j = 0 .. n-1) ;

> evalf(seq([n, S1(n)], n=1..20),10);

2.

In the argument of ApproxS, enter the list of points in the partition in increasing order, starting
with 0 and ending with 1.

> ApproxS := proc (P::list) local j ;


sum((op(j,P)^3-2* op(j,P)^2+ op(j,P))*( op(j+1,P)- op(j,P)),j = 1 .. nops(P)-1) end
proc;

> ApproxS([0, .25, .5, .75, 1]);

> ApproxS([0, .1, .35, .6, .85, 1]);

3.

> S2 := n -> sum(sin(n/j)/n,j = 1 .. n-1) ;

> evalf(seq([5*n, S2(5*n)], n=1..20),10);

5.
> int(cos(100*Pi*x)^2,x = 0 .. 1);

> S3 := n -> sum(cos(100*Pi*j/n)^2/n,j = 0 .. n-1) ;

> evalf(seq([n, S3(n)], n=1..30),10);

Maple code for exercises in section 6.3


15.

The first command defines the numerator function ((x)).

> num := x -> piecewise(x < floor(x)+1/2,x-floor(x),x = floor(x)+1/2,0,floor(x)+1/2 <


x,x-floor(x)-1);

This next command simply looks at the plot of this function.

> plot(num(x),x = -4 .. 4);

The command points(n) generates a table of the approximations to f(x) at the values x = j/1000, j
= 1, 2, ..., 1000, using a summation with n summands (and thus an error that is bounded by 1/(2n)).

> points := n -> [seq([1/1000*j, sum(num(1/1000*k*j)/k^2,k = 1 .. n)],j = 1 .. 1000)] ;

> plots[listplot](points(10),style = POINT,symbol = POINT);

> plots[listplot](points(100),style = POINT,symbol = POINT);

> plots[listplot](points(1000),style = POINT,symbol = POINT);

19.

> fun := (k,d)-> piecewise( k mod d = 0, (-1)^d*d , 0 ) ;

> psi := k -> sum(fun(k,d),d = 1 .. k) ;

> seq([k, psi(k)], k=1..100);

Maple code for exercises in section 6.4


2.
> distance := x -> piecewise(x <= floor(x)+1/2,x-floor(x),floor(x)+1/2 <
x,floor(x)+1-x) ;

> plot(distance(x),x = -2 .. 2);

3.

> plotF := n -> plot(distance(4^n*x)/(4^n),x = -4^(1-n) .. 4^(1-n)) ;

Notice what happens to the scale on both the x- and y-axes.

> plotF(2);

> plotF(3);

> plotF(4);

4.

> S4 := (n, x) -> sum(distance(4^k*x)/(4^k),k = 0 .. n) ;

> plot(S4(2,x),x = -2 .. 2);

> plot(S4(3,x),x = -2 .. 2);

> plot(S4(4,x),x = -2 .. 2);

Why don't these look any different?

11.

> S5 := (n, x) -> sum((6/7)^k*cos(7^k*Pi*x),k = 0 .. n) ;

Notice what happens to the scale on both the x- and y-axes.

> plot(S5(1,x),x = -1 .. 1);

> plot(S5(2,x),x = -1/7 .. 1/7);

> plot(S5(3,x),x = -1/49 .. 1/49);


Binomial Coefficients and Sums of nth Powers

Appendix to A Radical Approach to Real Analysis 2nd edition


2006
c David M. Bressoud

June 22, 2006

1 Introduction

There is a remarkable property of Pascal’s triangle that was independently discovered between the
12th and 14th centuries in India, China, and Europe. If we start along the right-hand edge and
come down along any southwest heading diagonal as far as we wish, the sum of the entries we have
crossed is equal to the next entry to the southeast.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1

In this example, we observe that
\[
\binom{3}{3} + \binom{4}{3} + \binom{5}{3} + \binom{6}{3} = \binom{7}{4}.
\]

In general, we have the formula
\[
\binom{n}{n} + \binom{n+1}{n} + \binom{n+2}{n} + \cdots + \binom{m-1}{n} = \binom{m}{n+1}. \tag{1}
\]


If you think about this for a little while, you will see that it is a consequence of the fact that each entry
is equal to the sum of the two entries that lie above it (see exercise 1).
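
Equation (1) is easy to test numerically before proving it. The short Python sketch below is an illustration only (the helper name diagonal_sum is hypothetical); it uses math.comb from the standard library to compare both sides for small n and m.

```python
from math import comb

def diagonal_sum(n, m):
    """Left side of equation (1): C(n,n) + C(n+1,n) + ... + C(m-1,n)."""
    return sum(comb(j, n) for j in range(n, m))

# The example from the text: C(3,3) + C(4,3) + C(5,3) + C(6,3) versus C(7,4).
print(diagonal_sum(3, 7), comb(7, 4))

# Compare both sides for a range of small cases.
agrees = all(diagonal_sum(n, m) == comb(m, n + 1)
             for n in range(0, 8) for m in range(n + 1, 20))
print(agrees)
```

A check of this kind is not a proof; the induction requested in exercise 1 is still needed.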

If we fix a nonnegative integer n, then the binomial coefficient
\[
\binom{x}{n} = \frac{x(x-1)(x-2)\cdots(x-n+1)}{n!}
\]
is a polynomial of degree n in x that we denote by Pn(x):
\begin{align*}
P_0(x) &= 1, \\
P_1(x) &= x, \\
P_2(x) &= \frac{1}{2}x^2 - \frac{1}{2}x, \\
P_3(x) &= \frac{1}{6}x^3 - \frac{1}{2}x^2 + \frac{1}{3}x, \\
P_4(x) &= \frac{1}{24}x^4 - \frac{1}{4}x^3 + \frac{11}{24}x^2 - \frac{1}{4}x, \\
P_5(x) &= \frac{1}{120}x^5 - \frac{1}{12}x^4 + \frac{7}{24}x^3 - \frac{5}{12}x^2 + \frac{1}{5}x, \\
&\;\;\vdots
\end{align*}
Note that if x is a positive integer less than n, then Pn (x) = 0. Equation (1) translates into a
remarkable insight into this sequence of polynomials:
Pn (n) + Pn (n + 1) + · · · + Pn (k − 1) = Pn (1) + Pn (2) + · · · + Pn (k − 1) = Pn+1 (k). (2)

We can use this to find sums of arbitrary powers because any polynomial of degree n, including x^n,
can be expressed in terms of P1(x), P2(x), . . . , Pn(x). For example:
\[
x^4 = 24P_4(x) + 36P_3(x) + 14P_2(x) + P_1(x).
\]
It follows that
\begin{align*}
1^4 + 2^4 + 3^4 + \cdots + (k-1)^4 &= \sum_{j=1}^{k-1} \bigl(24P_4(j) + 36P_3(j) + 14P_2(j) + P_1(j)\bigr) \\
&= 24\sum_{j=1}^{k-1} P_4(j) + 36\sum_{j=1}^{k-1} P_3(j) + 14\sum_{j=1}^{k-1} P_2(j) + \sum_{j=1}^{k-1} P_1(j) \\
&= 24P_5(k) + 36P_4(k) + 14P_3(k) + P_2(k) \\
&= \frac{24}{120}k^5 + \left(\frac{36}{24} - \frac{24}{12}\right)k^4 + \left(\frac{14}{6} - \frac{36}{4} + \frac{24\cdot 7}{24}\right)k^3 \\
&\qquad + \left(\frac{1}{2} - \frac{14}{2} + \frac{36\cdot 11}{24} - \frac{24\cdot 5}{12}\right)k^2 + \left(\frac{-1}{2} + \frac{14}{3} - \frac{36}{4} + \frac{24}{5}\right)k \\
&= \frac{1}{5}k^5 - \frac{1}{2}k^4 + \frac{1}{3}k^3 - \frac{1}{30}k.
\end{align*}

It should be clear that if we know how to expand x^n in terms of our polynomials P1(x), P2(x), . . . , Pn(x),
then we can use it to find the formula for 1^n + 2^n + · · · + (k − 1)^n.
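
The three expressions for the sum of fourth powers can be compared with exact arithmetic. The following Python sketch is only an illustration (all three function names are hypothetical); it uses math.comb for Pn(k) and fractions.Fraction so that the closed form is evaluated without rounding.

```python
from fractions import Fraction
from math import comb

def sum_fourth_powers(k):
    """1^4 + 2^4 + ... + (k-1)^4, added term by term."""
    return sum(j**4 for j in range(1, k))

def via_binomials(k):
    """24 P5(k) + 36 P4(k) + 14 P3(k) + P2(k), where Pn(k) = C(k, n)."""
    return 24 * comb(k, 5) + 36 * comb(k, 4) + 14 * comb(k, 3) + comb(k, 2)

def closed_form(k):
    """k^5/5 - k^4/2 + k^3/3 - k/30, evaluated exactly."""
    k = Fraction(k)
    return k**5 / 5 - k**4 / 2 + k**3 / 3 - k / 30

for k in (2, 5, 10):
    print(k, sum_fourth_powers(k), via_binomials(k), closed_form(k))
```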


Figure 1: Eight painted houses using exactly four colors.

2 Finding the Coefficients

We shall use a combinatorial or counting argument to find these coefficients. If m and n are positive
integers, then m^n counts the number of ways of painting n houses where for each house we have a
choice of m colors (see Figure 1). The fact that we have m colors available does not mean that we
use all m. We might use only k of our colors. If we do decide to use k colors, then there are \binom{m}{k}
choices of which k colors to use. Once we have decided which colors to use, we want to know
how many ways we can paint our n houses using exactly k colors. We call this number HP (n, k),
the house-painting number.

We have shown that
\[
m^n = \binom{m}{1} HP(n,1) + \binom{m}{2} HP(n,2) + \binom{m}{3} HP(n,3) + \cdots + \binom{m}{n} HP(n,n). \tag{3}
\]
These house-painting numbers are precisely the coefficients that we want. Note that HP (4, 1) = 1.
If we are using one color, there is only one way to paint four houses. It is also easy to see that
HP (4, 4) = 24. If we have to use all four colors on four houses, then each house gets a different
color, and there are 4! = 24 ways of assigning the colors. Check for yourself that HP (4, 2) = 14
and HP (4, 3) = 36.

Notice that we have only established equation (3) when m is a positive integer, but both sides are
polynomials in m, so if they agree for all positive integers, then they must agree for all possible
values. This equation can be rewritten as

\[
x^n = HP(n,1)P_1(x) + HP(n,2)P_2(x) + HP(n,3)P_3(x) + \cdots + HP(n,n)P_n(x). \tag{4}
\]

In general, we see that HP (n, 1) = 1 and HP (n, n) = n! (see exercise 3). We can start listing these
numbers in a triangular arrangement like Pascal’s triangle for binomial coefficients:


1
1 2
1 6 6
1 14 36 24
1 30 150 240 120

This table has a recursion similar to that in Pascal’s triangle, but a little more complicated. If we
have n houses and must use exactly k colors, we first paint house #1. There are k choices. We
now paint the n − 1 remaining houses. We have two options for the remaining houses. We can
decide that the color used on house #1 is one we do not want to use again. That leaves us with
HP (n − 1, k − 1) ways of coloring the remaining houses. Or we can decide that we do want to
use that color again, leaving us HP (n − 1, k) ways of painting the remaining houses. This gives us
the recursive formula,

\[
HP(n,k) = k\bigl(HP(n-1,k-1) + HP(n-1,k)\bigr). \tag{5}
\]

We add the number above and to the left to the number above, and multiply that sum by the
column number.
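
Recursion (5) translates directly into a short program. The Python sketch below is an illustration under the stated recursion (the function name house_painting_row is hypothetical); it rebuilds the triangle row by row and then checks one instance of equation (3).

```python
from math import comb

def house_painting_row(n):
    """Row n of the house-painting triangle: [HP(n,1), ..., HP(n,n)]."""
    row = [1]                                  # HP(1,1) = 1
    for size in range(2, n + 1):
        prev = [0] + row + [0]                 # HP(size-1, 0) = HP(size-1, size) = 0
        # Equation (5): HP(size, k) = k * (HP(size-1, k-1) + HP(size-1, k))
        row = [k * (prev[k - 1] + prev[k]) for k in range(1, size + 1)]
    return row

for n in range(1, 6):
    print(house_painting_row(n))

# One instance of equation (3): m^n as a sum of C(m,k) * HP(n,k), with m = 7, n = 5.
m, n = 7, 5
rhs = sum(comb(m, k) * hp for k, hp in enumerate(house_painting_row(n), start=1))
print(m**n, rhs)
```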

3 Stirling Numbers

The house-painting numbers are related to a better known collection of numbers known as the
Stirling numbers of the second kind, S(n, k), by

HP (n, k) = k! S(n, k).

Note that all of the numbers in column k are divisible by k!, and it is not hard to see why this must
be. If we have k colors for n houses, we can permute the colors and get a different coloring. The
Stirling number S(n, k) counts the number of ways of sorting n objects into exactly k non-empty
sets. To connect this to the house-painting number, each set of houses gets the same color, and
there are k! ways of deciding which color to assign to each set.

The triangle for the Stirling numbers is

1
1 1
1 3 1
1 7 6 1
1 15 25 10 1

The recursion is

\begin{align*}
k!\,S(n,k) &= k\bigl[(k-1)!\,S(n-1,k-1) + k!\,S(n-1,k)\bigr], \\
S(n,k) &= S(n-1,k-1) + k\,S(n-1,k). \tag{6}
\end{align*}
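
Recursion (6) can be checked against the triangle above and against the relation HP(n, k) = k! S(n, k). The Python sketch below is illustrative; the function name stirling2 and the boundary conventions S(n, n) = 1 and S(n, 0) = 0 for n > 0 are assumptions made here to start the recursion.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind, computed from recursion (6)."""
    if n == k:
        return 1                      # one way to put n objects into n singleton sets
    if k == 0 or k > n:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

for n in range(1, 6):
    print([stirling2(n, k) for k in range(1, n + 1)])

# Multiplying column k by k! should recover the house-painting numbers.
print([factorial(k) * stirling2(4, k) for k in range(1, 5)])
```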


Exercises

1. Prove equation (1) by induction on m.

2. Verify that HP (4, 2) = 14 and HP (4, 3) = 36.

3. Explain why HP (n, 1) = 1 and HP (n, n) = n! for all positive integers n.

4. Use equation (5) to find the values of HP (6, k), 1 ≤ k ≤ 6.

5. Use equation (6) to find the values of S(6, k), 1 ≤ k ≤ 6.

6. Find the formula for the sum of the 5th powers of the integers from 1 to k − 1.

Appendix A
Maple code for exercises in section A.1
1.
> wallis := n -> evalf( product( 4*k^2/(4*k^2-1),k=1..n));

Show that the average value of the upper and lower bounds on Pi can be computed as

> wallisaverage := n -> 2*evalf( product( 4*k^2/(4*k^2-1),k=1..n)*(1+1/(4*n)) );
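
The two Maple commands above can be mirrored in Python to see how quickly each expression approaches Pi. This is only an illustrative sketch (the names wallis and wallis_average follow the Maple definitions); it does not replace the argument the exercise asks for.

```python
import math

def wallis(n):
    """Partial Wallis product: 4k^2 / (4k^2 - 1) for k = 1..n."""
    p = 1.0
    for k in range(1, n + 1):
        p *= 4 * k * k / (4 * k * k - 1)
    return p

def wallis_average(n):
    """Same expression as the Maple command wallisaverage."""
    return 2 * wallis(n) * (1 + 1 / (4 * n))

for n in (10, 100, 1000):
    print(n, 2 * wallis(n), wallis_average(n), math.pi)
```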

7.
> evalf( binomial(1,1/2),20 );

> evalf( 4/Pi, 20);

13.
> evalf( binomial(1,1/3),20);

> evalf(1/int((1-x^(3/2))^(1/3), x=0..1),20);

Maple code for exercises in section A.2


1.
The nth Bernoulli number is stored in Maple as bernoulli(n). The following command
generates a list of the first forty Bernoulli numbers:

> [seq(bernoulli(n),n=1..40)];

We can now use the recursive formula for Bernoulli polynomials:

> B := (n,x) -> if n=1 then x-1/2 else n*int(B(n-1,t),t=0..x)+bernoulli(n) end if;

> [seq(B(n,x),n=1..10)];
2.
> for n from 1 to 8 do plot(B(n,x),x=-1..2) end do;

8&9
The command NB(n) finds the numerator of the 2nth Bernoulli number. Thus, the numerator
of B_20 is NB(10). The command DB(n) finds the denominator.

> NB := n -> numer(bernoulli(2*n)); DB := n -> denom(bernoulli(2*n));

The following commands list the factorizations of the numerators and of the denominators of
the Bernoulli numbers. The first column gives the value of n (it is the Bernoulli number for
2n), the second column lists the prime divisors, and the third column gives the powers of each
of those prime divisors.

> [seq([n,ifactor(NB(n))],n=1..20)];

> [seq([n,ifactor(DB(n))],n=1..20)];

Maple code for exercises in section A.3


1.

The following command calculates the difference between the sum of the reciprocals of the
first n perfect squares and Pi^2/6.

> df := n -> evalf( sum(1/k^2,k=1..n) - Pi^2/6,10);

> [seq( [100*n, df(100*n)],n=1..10)];

The next command adds 1/n to the sum of the reciprocals of the first n squares.

> dfplus := n -> evalf( 1/n + sum(1/k^2,k=1..n) - Pi^2/6,10);

> [seq( [100*n, dfplus(100*n)],n=1..10)];
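
A Python version of the two commands shows the effect of the 1/n correction. The sketch is illustrative only; the names df and dfplus are taken from the Maple code, and ordinary floating-point arithmetic is assumed to be accurate enough for these values of n.

```python
import math

def df(n):
    """Sum of 1/k^2 for k = 1..n, minus pi^2/6."""
    return sum(1.0 / k**2 for k in range(1, n + 1)) - math.pi**2 / 6

def dfplus(n):
    """The same difference after adding 1/n to the partial sum."""
    return 1.0 / n + df(n)

for n in (100, 200, 500, 1000):
    print(n, df(n), dfplus(n))
```

The first difference is close to −1/n, while the corrected difference is much smaller, close to 1/(2n^2).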

2.

The following commands calculate the difference between the sum of the reciprocals of the
first n fourth and sixth powers and the exact values of the infinite series.
> df4 := n -> evalf( sum(1/k^4,k=1..n) - Pi^4/90,10);

> [seq( [100*n, df4(100*n)],n=1..10)];

> df6 := n -> evalf( sum(1/k^6,k=1..n) - Pi^6/945,10);

> [seq( [100*n, df6(100*n)],n=1..10)];

8.

> P := (n,x) -> 1 + sum( bernoulli(k)*x^k/k!,k=1..n);

The following command superimposes the polynomials that approximate x/(exp(x)-1).

> plot([P(4,x),P(6,x),P(8,x),P(10,x),P(12,x),x/(exp(x)-1)],x=-8..8);

11.

The command given below will find these derivatives, but notice that the denominators are 0.
Show that in the limit as x approaches 0, these all approach the correct values.

> for n from 1 to 3 do simplify(diff(x/(exp(x)-1),x$n)) end do;

14.

> Q := (n,x) -> 1 + sum((-1)^k * bernoulli(2*k)*(2*x)^(2*k)/(2*k)!, k=1..n);

> plot([Q(2,x),Q(4,x),Q(6,x),Q(8,x),x*cot(x)],x=-4..4);

17.

> R := (n,z) -> 1/z + 2*sum( z/(z^2-k^2*Pi^2), k=1..n);

> plot([R(3,x),R(6,x),R(9,x),R(12,x),cot(x)],x=-4..4,y=-1..1);

The following command plots the differences between the cotangent and each of these series.
Use these graphs to give a reasonable approximation of the error function cot(x)-R(n,x).

> for n from 1 to 4 do plot(cot(x) - R(3*n,x),x=-4..4) end do;

20.
If we evaluate the first 1000 terms of this series, then we obtain a value that is between

> evalf(1/(2*1001^2), 10);

and

> evalf(1/(2*1000^2), 10);

below the true value of zeta(3), and therefore zeta(3) lies in the interval

> [evalf(sum(1/k^3,k=1..1000)+1/(2*1001^2),12),evalf(sum(1/k^3,k=1..1000)+1/(2*1000^2),12)];
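
The same interval can be computed in Python as a cross-check. The sketch below is an illustration (the function name zeta3_interval is hypothetical); it relies on the integral bounds 1/(2(n+1)^2) and 1/(2n^2) for the tail of the series after n terms.

```python
def zeta3_interval(n):
    """Lower and upper bounds for zeta(3) from the first n terms of the series."""
    partial = sum(1.0 / k**3 for k in range(1, n + 1))
    lower = partial + 1.0 / (2 * (n + 1)**2)   # tail exceeds the integral from n+1
    upper = partial + 1.0 / (2 * n**2)         # tail is below the integral from n
    return lower, upper

lower, upper = zeta3_interval(1000)
print(lower, upper, upper - lower)
```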

Maple code for exercises in section A.4


2.

The command StirlingTriple produces three numbers: n, n! (to 10-digit accuracy), and
Stirling's formula for n with the first two terms of the asymptotic series (to 10-digit accuracy).

> StirlingTriple := x -> [x,evalf(x!,10),evalf( (x/exp(1))^x*sqrt(2*Pi*x)*exp(1/(12*x) - 1/(360*x^3)), 10)];

> [seq(StirlingTriple(x),x in [5, 10, 20, 50, 100])];
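
For comparison, here is a Python sketch of the same table. It is illustrative only: the function name stirling is hypothetical, and it assumes the standard first two terms of the asymptotic series, 1/(12n) − 1/(360n^3), which come from the Bernoulli numbers B_2 = 1/6 and B_4 = −1/30.

```python
import math

def stirling(n):
    """Stirling's formula with the first two terms of the asymptotic series."""
    correction = 1 / (12 * n) - 1 / (360 * n**3)
    return (n / math.e)**n * math.sqrt(2 * math.pi * n) * math.exp(correction)

for n in (5, 10, 20, 50, 100):
    exact = math.factorial(n)
    print(n, exact, stirling(n), stirling(n) / exact - 1)
```

The last column is the relative error, which shrinks rapidly as n grows.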

3.

The following command finds the absolute value of the mth summand in the asymptotic
series, evaluated at n.

> Summand := (m,n) -> abs(bernoulli(2*m)/((2*m)*(2*m-1)*n^(2*m-1)));

The next command lists the absolute values of the summands, taking 20 summands between 1
and 10n.

> ListSummands := n -> [seq([floor(m*n/2),


evalf(Summand(floor(m*n/2),n),10)],m=1..20)];

> ListSummands(10);

To refine your search, the following command lists all the summands in the range a to b.
> RefineSearch := (n,a,b) -> [seq([m,evalf(Summand(m,n),10)],m=a..b)];

> RefineSearch(10,25,40);

This command allows you to compare the value of the approximation to n! with the best
asymptotic estimate (to m terms):

> Cf := (n,m) -> [n!,evalf( (n/exp(1))^n


*sqrt(2*Pi*n)*exp(sum(bernoulli(2*k)/(2*k*(2*k - 1)*n^(2*k - 1)),
k=1..m))),ceil(log10(n!))];

> Cf(10,3);

5.

The first command calculates 1/2 + the sum of the first n terms. The second produces a table
of these values as n goes from 1 to 10. The third command gives the actual value of Euler's
gamma to 10-digit accuracy for purposes of comparison.

> GammaApprox := n -> evalf( 1/2 + sum(bernoulli(2*k)/(2*k), k=1..n), 10);

> [seq([n,GammaApprox(n)],n=1..10)];

> evalf(gamma,10);

6.

The following command finds the absolute value of the mth summand in the asymptotic
series, evaluated at n.

> HarmonicSummand := (m,n) -> abs( bernoulli(2*m)/(2*m*n^(2*m)) ) ;

The next command lists the absolute values of the summands, taking 20 summands between
1 and 10n.

> ListHarmonicSummands := n -> [seq( [floor(m*n/2), evalf( HarmonicSummand(floor(m*n/2),n), 10)],m=1..20)];

> ListHarmonicSummands(10);

To refine your search, the following command lists all the summands in the range a to b.

> RefineHarmonicSearch := (n,a,b) -> [seq([m,evalf( HarmonicSummand(m,n),10)],m=a..b)];

> RefineHarmonicSearch(10,25,40);

> This command allows you to compare the value of the approximation to the harmonic series
with the best asymptotic estimate (to m terms):

> HarmonicCf := (n,m) -> [evalf(sum((1/k),k=1..n),n), evalf(ln(n) + gamma + 1/(2*n) - sum(bernoulli(2*k)/(2*k*n^(2*k)), k=1..m), n)];

> HarmonicCf(10,3);
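
The harmonic-series comparison can be sketched in Python as well. As before, this is an illustrative cross-check under stated assumptions: harmonic_exact and harmonic_estimate are hypothetical names, the value of Euler's gamma is hard-coded, and the estimate is ln(n) + gamma + 1/(2n) minus the first m terms B_{2k}/(2k n^(2k)).

```python
import math
from fractions import Fraction
from math import comb

EULER_GAMMA = 0.5772156649015329  # hard-coded reference value of Euler's gamma

def bernoulli_numbers(limit):
    """Exact Bernoulli numbers B_0..B_limit from the standard recurrence."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def harmonic_exact(n):
    """The nth harmonic number 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def harmonic_estimate(n, m):
    """ln(n) + gamma + 1/(2n) - sum_{k=1}^{m} B_{2k} / (2k n^(2k))."""
    B = bernoulli_numbers(2 * m)
    tail = sum(B[2 * k] / (2 * k * Fraction(n) ** (2 * k)) for k in range(1, m + 1))
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - float(tail)

exact = float(harmonic_exact(10))
for m in range(0, 6):
    estimate = harmonic_estimate(10, m)
    print(m, f"{estimate:.12f}", f"error {abs(estimate - exact):.2e}")
```

The final floating-point sum limits the attainable accuracy to roughly 15 or 16 significant digits, so this sketch is only useful for small m.
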
The following people have contributed corrections to the 2nd edition of A Radical
Approach to Real Analysis. Their careful reading of this text is greatly appreciated.

Donald G. M. Anderson
Drew Ash
John Baltutis
Jacob Bond
Donald Brewer
Paul Campbell
Dan Flath
Tim Fortune
Joseph Gerver
Larry Gray
Kevin Hartshorne
Larry Holmquist
Ryan Mullen
Jennifer Parker
David Pearson
Luke Pinkston
Fred Rickey
Hans R. Schneebeli
Tom Sciascia
Stan Selzer
Jon Stadler
Steve Strogatz
Naveen Thakur
Erin Thorngate
Enrique Treviño
Richard Vitray
Hao Zou
AMS / MAA TEXTBOOKS

In this second edition of the MAA classic, exploration continues to be an
essential component. More than 60 new exercises have been added, and
the chapters on Infinite Summations, Differentiability and Continuity, and
Convergence of Infinite Series have been reorganized to make it easier to
identify the key ideas.
A Radical Approach to Real Analysis is an introduction to real analysis,
rooted in and informed by the historical issues that shaped its development.
It can be used as a textbook, as a resource for the instructor who
prefers to teach a traditional course, or as a resource for the student who
has been through a traditional course yet still does not understand what
real analysis is about and why it was created.
The book begins with Fourier’s introduction of trigonometric series
and the problems they created for the mathematicians of the early 19th
century. It follows Cauchy’s attempts to establish a firm foundation for
calculus and considers his failures as well as his successes. It culminates
with Dirichlet’s proof of the validity of the Fourier series expansion and
explores some of the counterintuitive results Riemann and Weierstrass
were led to as a result of Dirichlet’s proof.

