(Industrial and Applied Mathematics) Martin Brokate, Pammy Manchanda, Abul Hasan Siddiqi - Calculus For Scientists and Engineers (2019, Springer)


Industrial and Applied Mathematics

Martin Brokate
Pammy Manchanda
Abul Hasan Siddiqi

Calculus for
Scientists
and
Engineers
Industrial and Applied Mathematics

Editor-in-Chief
Abul Hasan Siddiqi, Sharda University, Greater Noida, India

Editorial Board
Zafer Aslan, Istanbul Aydin University, Istanbul, Turkey
Martin Brokate, Technical University, Munich, Germany
N.K. Gupta, Indian Institute of Technology Delhi, New Delhi, India
Akhtar A. Khan, Rochester Institute of Technology, Rochester, USA
René Pierre Lozi, University of Nice Sophia-Antipolis, Nice, France
Pammy Manchanda, Guru Nanak Dev University, Amritsar, India
Zuhair Nashed, University of Central Florida, Orlando, USA
Govindan Rangarajan, Indian Institute of Science, Bengaluru, India
Katepalli R. Sreenivasan, NYU Tandon School of Engineering, Brooklyn, USA
The Industrial and Applied Mathematics series publishes high-quality research-level
monographs, lecture notes and contributed volumes focusing on areas where
mathematics is used in a fundamental way, such as industrial mathematics,
bio-mathematics, financial mathematics, applied statistics, operations research and
computer science.

More information about this series at http://www.springer.com/series/13577


Martin Brokate · Pammy Manchanda · Abul Hasan Siddiqi

Calculus for Scientists and Engineers

Martin Brokate
Department of Mathematics
Technical University of Munich
Munich, Bayern, Germany

Pammy Manchanda
Department of Mathematics
Guru Nanak Dev University
Amritsar, Punjab, India

Abul Hasan Siddiqi
Department of Mathematics
Sharda University
Greater Noida, Uttar Pradesh, India

ISSN 2364-6837
ISSN 2364-6845 (electronic)
Industrial and Applied Mathematics
ISBN 978-981-13-8463-9
ISBN 978-981-13-8464-6 (eBook)
https://doi.org/10.1007/978-981-13-8464-6
© Springer Nature Singapore Pte Ltd. 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Foreword

In my past position as Director of the Abdus Salam International Centre of
Theoretical Physics in Trieste, Italy, I was deeply involved in strengthening
advanced research in developing countries. Even though the Centre’s mandate was
postdoctoral research, it was clear that only by enhancing the quality of education at
all levels can one enable advanced research at a sustainable level of excellence. The
Centre had thus ventured, often and deliberately, into undergraduate education. One
lacuna we had observed was the lack of good, affordable, and motivating textbooks
in science and mathematics.
I was thus pleased when the authors of this book—all of whom are well-known
researchers and teachers I know personally—approached me in the summer of 2006
with a proposal for writing a quality book on calculus for use in undergraduate
education. Their goals were to make the book useful for instruction in any country
but priced such that students in developing countries could afford it. With this
understanding, I facilitated several visits of the authors to the Centre, during which
they actively collaborated on the book. I am pleased that the book is now being
published in its new edition by Springer Nature, and thank the authors for persisting
with the collaboration despite geographical separation.
I am satisfied that the coverage and presentation in the book are at a high level
and meet one of our principal requirements. The authors have endeavored to present
the basic concepts clearly and point to their applications in diverse fields. I hope
that many brilliant young students will benefit from this book, which is the result of
a continuing collaboration among the authors, on which I congratulate them
warmly.

K. R. Sreenivasan
Courant Institute of Mathematical Sciences
New York University
New York, USA

Preface

This book is meant to be used as a first course in calculus for students of science
and engineering. It will also be useful for students of other disciplines who are
interested in learning calculus.
We endeavored to explain the basic concepts of calculus hand in hand with their
relevance to real-world problems. We have placed special emphasis on applications
without compromising rigorous analysis. Plenty of solved examples have been
given to clarify techniques related to a particular theme. In appendices, we have
discussed concepts and themes we regard as prerequisites, like the number system,
trigonometric functions, and analytic geometry. Moreover, proofs of some of the
theorems have been included there in order to not interrupt the flow of the argument
in the main body of the text. Some references to other books on calculus that have
motivated our presentation have been given in the bibliography.
The text is application oriented. Many interesting, relevant, and up-to-date
applications have been drawn from the fields of business, economics, social and
behavioral sciences, life sciences, physical sciences, and other fields of general
interest. Applications are found in the main body of the text as well as in the
exercise sets. In fact, one goal of the text is to include at least one real-life
application in each section wherever possible.
The book comprises 12 chapters. Chapter 1 is devoted to an introduction of
functions of one independent variable. Chapter 2 provides the concepts of limit and
continuity along with their physical and geometrical interpretations. Chapter 3 deals
with derivatives and the techniques of differentiation. Chapter 4 discusses the
optimization of a function, that is, finding minima and maxima of a function over an
interval. Moreover, applications of optimization to various real-world problems,
including a fairly large number of solved examples from business and finance, are
also studied in this chapter. Chapter 5 considers sequences and series, in particular
Maclaurin and Taylor series. Chapters 6 and 7 are devoted to the process of
integration and its applications in business and industry, engineering problems, and
probability theory. We show with many examples that the utility of integrals has
expanded far beyond their original purpose, the computation of the area below a curve.


Chapter 8 introduces functions of several variables. Concepts of level curves
(level sets) or contours, graphs of functions of two variables, and equipotential and
isothermal surfaces are developed. Physical situations represented by functions of
more than one variable are discussed. This chapter also deals with the extension
of the concepts of limit, continuity, differentiability, optimization, and integration to
functions of several variables. Physical situations, where such extensions are
required, are discussed in detail. Often we have restricted our presentation to
functions of two variables only, for the sake of clarity and easier understanding and
because most results which hold true for two variables can be readily extended to
functions of more than two variables.
Chapter 9 is devoted to the calculus of vector-valued functions (vector fields),
that is, functions that are defined on a domain of dimension 1, 2, or 3 and take
values in the plane or the space. Continuity, differentiability, and integration for
vector-valued functions are introduced, and the theorems of Green, Gauss, and
Stokes are discussed. Applications of vector calculus and of these theorems to
problems of science and engineering are presented.
Chapter 10 deals with Fourier methods and their applications to real-world
problems for readers who want to pursue this topic.
Chapter 11 is devoted to the introduction of ordinary and partial differential
equations. Modeling of real-world problems with these equations is explained.
Chapter 12 shows how MATLAB can be used as an aid for teaching and
learning concepts of calculus, in particular those we have discussed in this book.
Teachers may use MATLAB as a tool for vivid and precise demonstrations, while
students may use MATLAB as a tool for exploring by themselves various concepts
of calculus. Indeed, MATLAB is being used nowadays practically in every branch
of science and engineering.
* Chapter 12 was written mainly by Dr. A. K. Verma, Assistant Professor, Sharda
Group of Institutions, Agra, and Dr. Jean-Marc Ginoux from France. Dr. Verma
also drew all figures in this book using MATLAB.
The International Centre for Theoretical Physics (ICTP), Trieste, Italy (a joint
venture between UNESCO and the Italian government), has played a pivotal role in
the creation of this book. Established in 1964 and renamed the Abdus
Salam ICTP in 1997, ICTP possesses a combination of features that is rather unique
worldwide. It is a meeting point of scientists from developed and developing countries
and, in particular, continues to provide yeoman’s service in training bright scientists
from developing countries. The authors of this book have been frequent visitors to
this Centre since 1986 and have availed themselves of the Centre’s hospitality to enhance
their academic capabilities and cooperation. During one of their visits in 2006, while
walking along the Adriatic Sea, they began to discuss the utility of writing a book
on calculus for undergraduates and agreed to write such a book on calculus with
special emphasis on the clarity of concepts and their applications in diverse fields.
Our objective is to show that, throughout all of its contents, the mathematics of
calculus is not just an abstract subject, but has relevance to many different fields of
human knowledge. The next morning, the Director of ICTP, Prof. K. R. Sreenivasan,
was approached with the request to support the writing of such a book. He was very
prompt in approving the idea and assured us of all kinds of facilities at
ICTP. We take this opportunity to thank him for the financial and infrastructural
support without which this book could not have been completed. In particular, we
have benefited greatly from the excellent library at ICTP.
We take this opportunity to thank Dr. Meenakshi, UGC Research Fellow, gold
medalist, and now Lecturer at Dev Samaj College for Women, Ferozepur, who has
gone through this book carefully and has given several valuable suggestions. Also,
Sharda University deserves a special mention, as a major part of the technical work
was carried out at this place.

Munich, Germany Martin Brokate
Amritsar, India Pammy Manchanda
Greater Noida, India Abul Hasan Siddiqi

Contents

1 Functions and Models
1.1 Function, Domain, and Range
1.2 Various Types of Functions
1.3 Important Examples of Functions
1.4 Functions as Models
1.5 Algebra of Functions
1.6 Proofs, Mathematical Induction
1.7 Geometric Transformation of Functions
1.8 Exercises
2 Limit and Continuity
2.1 Idea and Definition of the Limit
2.2 Evaluating Limits
2.3 Continuous Functions
2.4 Improper Limits
2.5 Exercises
3 Derivatives
3.1 Definition of the Derivative
3.2 Derivative of Elementary Functions
3.3 Some Differentiation Formulas
3.4 Derivatives of Higher Order
3.5 A Basic Differential Equation
3.6 Differentials, Newton–Raphson Approximation
3.7 Indeterminate Forms and l’Hôpital’s Rule
3.8 Sensitivity Analysis
3.9 Exercises
4 Optimization
4.1 Extremum Values of Functions
4.2 Monotonicity
4.3 Further Properties of Extremum Values
4.4 Convexity and Concavity
4.5 Applications of Optimization
4.6 Exercises
5 Sequences and Series
5.1 Sequences and Their Limits
5.2 Infinite Series
5.3 Alternating Series, Absolute and Conditional Convergence
5.4 Power Series
5.5 Exercises
6 Integration
6.1 Introduction
6.2 Integral and Area
6.3 Antiderivatives and Rules of Integration
6.4 Integration by Substitution
6.5 Integration by Parts
6.6 The Fundamental Theorem of Calculus
6.7 Trigonometric Integrals
6.8 Partial Fractions and Integration
6.9 Improper Integrals
6.10 Additional Tables of Integrals
6.11 Exercises
7 Applications of Integration
7.1 Areas Under Curves
7.2 Determination of Length, Area, and Volume
7.3 Definite Integral as Average
7.4 Applications to Business and Industry
7.4.1 Present and Future Values
7.4.2 Annuity
7.4.3 Applications in Business
7.5 Applications to Mechanics and Engineering
7.6 Integrals and Probability
7.7 Exercises
8 Functions of Several Variables
8.1 Introduction
8.2 Situations Modeled by Functions of More Than One Variable
8.3 Continuity of Functions of Several Variables
8.4 Partial Derivatives with Applications
8.5 Optimization of Functions of Two Variables
8.5.1 Unconstrained Optimization
8.5.2 Constrained Optimization
8.6 Taylor Expansion in Two Variables
8.7 Integration of Functions of Several Variables
8.8 Applications of Double Integrals
8.8.1 Population of a City
8.8.2 Average Value of a Function of Two Variables
8.8.3 Joint Probability Density Functions
8.9 Exercises
9 Vector Calculus
9.1 Introduction
9.2 Vectors
9.3 Differential Calculus of Vector Fields
9.3.1 Curves
9.3.2 Vector Fields in Several Dimensions
9.3.3 Surfaces
9.4 Integration in Vector Fields
9.4.1 Line Integrals
9.4.2 Surface Integrals
9.5 Fundamental Theorems of Vector Calculus
9.5.1 The Theorem of Green and Ostrogradski
9.5.2 The Divergence Theorem of Gauss
9.5.3 The Theorem of Stokes
9.6 Applications of Vector Calculus to Engineering Problems
9.6.1 Elements of Vector Calculus and the Physical World
9.6.2 Applications of Line Integrals
9.6.3 An Example of Planar Fluid Flow-Hurricane
9.7 Exercises
10 Fourier Methods with Applications
10.1 Introduction
10.2 Orthonormal Systems and Fourier Series
10.2.1 Orthonormal Systems
10.2.2 Fourier Series
10.2.3 Further Properties of Fourier Series
10.3 The Fourier Transform
10.3.1 Basic Properties of the Fourier Transform
10.3.2 Convolution
10.3.3 The Discrete Fourier Transform
10.4 Application of Fourier Methods to Signal Analysis
10.5 Exercises
11 Differential Equations
11.1 Introduction and Basic Notions
11.2 Separation of Variables
11.3 First-Order Linear Equations
11.4 Solution by Substitution
11.4.1 Homogeneous Equations
11.4.2 Bernoulli Equations
11.4.3 Reduction of Order
11.4.4 Homogeneous Linear Equations with Constant Coefficients
11.5 Modeling with Differential Equations
11.5.1 Growth and Decay
11.5.2 Population Growth
11.5.3 Pollution of Lakes
11.5.4 The Quantity of a Drug in the Body
11.5.5 Spread of Diseases, Technologies and Rumor
11.5.6 Application of Newton’s Law of Cooling
11.5.7 Application of Newton’s Cooling Law for Determining Time of Death
11.6 Introduction to Partial Differential Equations
11.7 Applications of Fourier Methods to Partial Differential Equations
11.7.1 Fourier Methods for the Wave Equation
11.7.2 Fourier Methods for the Heat Equation
11.7.3 Fourier Methods for the Laplace Equation
11.8 Exercises
12 Calculus with MATLAB
12.1 Introduction
12.2 Important Elements of MATLAB
12.2.1 Advantages of MATLAB
12.2.2 How to Run MATLAB?
12.2.3 MATLAB Functions
12.3 Visualization of Scalar- and Vector-Valued Function
12.3.1 Plotting Scalar Functions with MATLAB
12.3.2 Plots for Vector-Valued Functions in 2D and 3D
12.4 Certain Topics of Calculus with MATLAB
12.4.1 Differentiation and Integration
12.4.2 Finding Limits of Functions
12.4.3 Sequences and Series
12.4.4 Solving Ordinary Differential Equations (ODEs)
12.4.5 Animated Phase Portraits of Nonlinear and Chaotic Dynamical Systems*
12.4.6 Finding Minima and Maxima
12.4.7 Fourier Analysis
12.5 Exercises
Appendix A: Real Numbers and Inequalities
Appendix B: Analytic Geometry
Appendix C: Trigonometry
Appendix D
Solutions of Selected Exercises
References
Index
About the Authors

Martin Brokate is Professor Emeritus of Applied Mathematics at the Technical
University, Munich, Germany. He received his PhD in Mathematics at Freie
Universität, Berlin, Germany, in 1980, and was appointed to the Chair of Numerical
Analysis and Control Theory in 1999. He was the spokesman of Special Research
Area 438 “Mathematical Modeling, Simulation and Verification in
Material-Oriented Processes and Intelligent Systems” from 2001 to 2004. He was
the Dean of the Department of Mathematics in 2003–2006. His interests lie in
applied analysis and control theory, with a focus on the mathematical analysis of
rate-independent evolutions and hysteresis operators.

Pammy Manchanda is Senior Professor in the Department of Mathematics at the
Guru Nanak Dev University, Amritsar, India and Secretary of the Indian Society of
Industrial and Applied Mathematics (ISIAM). She has published more than 50
research papers in several international journals of repute, edited 4 proceedings for
international conferences of the ISIAM and co-authored 3 books. She has visited
the International Centre for Theoretical Physics (ICTP) (a UNESCO institution) at
Trieste, Italy, many times to carry out her research activities, attended and delivered
talks and chaired sessions at several international conferences and workshops across
the globe, including the International Council for Industrial and Applied
Mathematics (ICIAM) during 1999–2015 and the International Congress of
Mathematicians (ICM). She is the managing editor of the Indian Journal of
Industrial and Applied Mathematics and a member of the editorial board of the
Springer book series Industrial and Applied Mathematics.

Abul Hasan Siddiqi is a distinguished scientist and Adjunct Professor at the
School of Basic Sciences and Research, and Coordinator at the Centre for
Advanced Research in Applied Mathematics and Physics (CARAMP) at Sharda
University, Greater Noida, India. He was a visiting consultant at ICTP; Sultan
Qaboos University, Muscat, Oman; MIMOS, Kuala Lumpur, Malaysia; and a
professor at several reputed universities including Aligarh Muslim University,
Aligarh, India; and King Fahd University of Petroleum and Minerals, Dhahran,
Saudi Arabia. He has a long association with ICTP (as a regular associate, guest
of the director, and senior associate). He was awarded the German Academic
Exchange Fellowship thrice to carry out mathematical research in Germany. He has
published more than 100 research papers jointly with his research collaborators, 13
books and edited proceedings of 17 international conferences, as well as supervised
29 PhD scholars. He is the founder secretary and the current President of the
ISIAM, which celebrated its silver jubilee in January 2016. He is the editor-in-chief
of the Indian Journal of Industrial and Applied Mathematics (published by ISIAM)
and of the Springer book series Industrial and Applied Mathematics.
Chapter 1
Functions and Models

The concept of a function is of vital importance for the proper understanding of
many phenomena occurring in different areas of human knowledge. The fundamental
processes of calculus known as differentiation and integration are processes applied
to functions. We discuss here the notion of a function, various forms of functions, and
important classes of functions. We also discuss how functions express phenomena
from other sciences, in particular, physics.

1.1 Function, Domain, and Range

Definition 1.1 (Relation) Let S be a set. A relation R on S is a set of ordered pairs
(x, y), where x, y ∈ S. Moreover, if T is another set, a relation R on S and T is a
set of ordered pairs (x, y), where x ∈ S and y ∈ T.

The term “ordered pair” is used to emphasize that when we write (x, y), x is the first
element of the pair, and y the second. This is to be distinguished from the notion of
the set {x, y}, which consists of the elements x and y, but no order is implied. Thus,
{x, y} and {y, x} denote the same set, but (x, y) and (y, x) are different ordered pairs.
Having this explained once and for all, from now on we will just speak of a “pair”
when we mean an ordered pair.
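
The distinction between an ordered pair and a set can be tried out directly. The following is a minimal sketch in Python, assuming that tuples stand in for ordered pairs and frozensets for unordered sets; the values 1 and 2 are illustrative and chosen so that x ≠ y.

```python
# Ordered pairs are modelled as tuples, unordered sets as frozensets.
# The concrete values are illustrative; they only need to satisfy x != y.
x, y = 1, 2

pair_xy = (x, y)
pair_yx = (y, x)
set_xy = frozenset({x, y})
set_yx = frozenset({y, x})

pairs_differ = pair_xy != pair_yx   # order matters for ordered pairs
sets_equal = set_xy == set_yx       # order is irrelevant for sets
print(pairs_differ, sets_equal)
```

If x and y were equal, the two pairs would of course coincide as well.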
Example 1.1 Let

S = {x | x is a natural number with x < 6}

and let T be the set of natural numbers, which we denote by ℕ. We define a relation R as

R = {(x, y) | y = 3x, x ∈ S}. (1.1)

© Springer Nature Singapore Pte Ltd. 2019
M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_1

Fig. 1.1 Graph of the relation (1.1)

Since S = {1, 2, 3, 4, 5}, we can write R alternatively as the set

R = {(1, 3), (2, 6), (3, 9), (4, 12), (5, 15)} .
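
The enumeration of R can be reproduced mechanically. The sketch below, in Python, assumes that the natural numbers start at 1 (as the listing S = {1, 2, 3, 4, 5} suggests) and represents the relation (1.1) as a set of tuples.

```python
# S: the natural numbers x with x < 6 (natural numbers taken to start at 1)
S = set(range(1, 6))

# Relation (1.1): all pairs (x, y) with y = 3x and x in S
R = {(x, 3 * x) for x in S}

print(sorted(R))
```

Sorting is only for display; as a set, R carries no order among its pairs.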

If the sets S and T consist of numbers, we can represent a relation graphically in the
Cartesian plane, using the horizontal axis for S and the vertical axis for T. For the
example above, this is given in Fig. 1.1. Such a pictorial representation of a relation
is called a graph. Thus, a graph exhibits a pattern of points distributed over the
plane, each of which represents a single pair (x, y). Most of the figures in this book
are graphs of one form or another. The actual patterns formed by their points can
have rather different shapes, and there may appear random collections of points or
continuously filled areas, lines, or curves. All this depends on how the relationship
between x and y is defined.
Example 1.2
1. The graph of y = x is given in Fig. 1.2.
2. The graph of y = x² is given in Fig. 1.3.
In Example 1.2, we have used the short notations “y = x” and “y = x²”, respectively, for
the relations

{(x, y) | x ∈ ℝ, y = x} and {(x, y) | x ∈ ℝ, y = x²}.

Here, ℝ denotes the set of all real numbers.

Definition 1.2 (Function) A function f from a set S into a set T is a rule that assigns
to each element x ∈ S a unique element y ∈ T . The set S of elements, for which f
is defined, is called the domain of f and usually denoted by D( f ), or simply D.

Fig. 1.2 Graph of the function y = f (x) = x

Fig. 1.3 Graph of the function $y = f(x) = x^2$

If f is a function from S into T, then the element y ∈ T, which is assigned to a
specific element x ∈ D by the function f, is denoted by f(x) (read “f of x”) and
is called the value of f at x, while x is called the argument of f. The set of values
of f,

R(f) = {y | y ∈ T, y = f(x) for some x ∈ D(f)}

is called the range of f or the image of D(f) under f.



Fig. 1.4 Area of a circle as a function of its radius

Usually, a typical element of D(f) is denoted by a specific letter, here x, and a typical
element of R(f) is denoted by a different letter, here y. In that case, x is called the
independent variable, and y is called the dependent variable.
From Definition 1.2, we see that a function is a relation on S and T , where for
each x ∈ S = D( f ) there is exactly one corresponding value y ∈ T . In other words,
for each x ∈ D( f ) there is exactly one element y such that y = f (x). Thus, the
relation defined by f is
{(x, y)| y = f (x)} . (1.2)

The study of calculus is based on functions, whose domain D is a set of real numbers,
often an interval or a union of intervals, and whose range is a set of real numbers.
In this context, a function f from D to R, where D and R are subsets of the set of
real numbers, is called a real-valued function of a real variable. When we restrict
ourselves to a single variable, as we do here, f is also called a function of one
variable, or of a single variable. Later in Chap. 8, we will also encounter functions
of two or more variables.
If x ∈ D( f ) is a real number, then the ordered pair (x, f (x)) with x ∈ D( f ) can
be identified with a point in the x y-plane. The relation (1.2) formed by the set of all
such points gives rise to a pictorial representation of f . Usually, both the picture and
its formal description as the set (1.2) are called the graph of the function f . See, for
example, Figs. 1.4 and 1.5.
There are four ways to represent a function: Verbally, numerically, visually (graph-
ically), and algebraically.
Verbal representation: A function f is represented verbally if f is described in
words. Examples are as follows:


Fig. 1.5 Graph of the function $f(x) = \sqrt{9 - x^2}$

1. A(r) is the area of a circle of radius r.
2. T(t) is the temperature at time t.
3. v(t) is the (instantaneous) velocity at time t.
4. h(t) is the height at time t of a ball which has been thrown upward.
Tabular or numerical representation: A numerical representation of a function f
is a table of its arguments and values. If done by rows, the upper row contains the
arguments (the elements of the domain), and the lower row the values (the elements
of the range). Example:
x      2   3   5   8   11
f(x)   11  9   12  −7  9

In this case,

D( f ) = {2, 3, 5, 8, 11} , R( f ) = {11, 9, 12, −7} .

As a counterexample, the table of values

x −2 1 3 5 3
f (x) 11 9 9 −7 −6

does not represent a function since there are two different values (9 and −6) associated
to the number 3.
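Whether a table of arguments and values represents a function can be checked mechanically. The sketch below is only an illustration; the helper name is_function and the list-of-pairs encoding of the two tables above are assumptions made here.

```python
def is_function(pairs):
    """Return True if no argument x is paired with two different values."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False          # same argument, two different values
        seen[x] = y
    return True

# The two tables from the text, written as lists of (x, f(x)) pairs.
table_ok = [(2, 11), (3, 9), (5, 12), (8, -7), (11, 9)]
table_bad = [(-2, 11), (1, 9), (3, 9), (5, -7), (3, -6)]

domain = {x for x, _ in table_ok}
value_set = {y for _, y in table_ok}   # the range R(f); 9 occurs twice but counts once

print(is_function(table_ok), is_function(table_bad))
```

Note that repeated values (here 9) are allowed for a function; only repeated arguments with different values are not.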
Graphical representation: A function f is represented graphically if there is a
graph G in the coordinate plane such that the point (x, y) is on G if and only if
y = f (x), see Fig. 1.6. The domain of f consists of those points on the x-axis such
that the vertical lines through these points meet the graph G. (Since f is a function,
each vertical line may meet G at most once.) The range of f consists of those points

Fig. 1.6 Graph, domain, and range of a function of x

on the y-axis such that the horizontal line through these points meets the graph G.
(It may happen that a horizontal line meets G more than once.)
Algebraic representation: A function f is said to be represented algebraically if,
for each x in the domain of f , f (x) is equal to some algebraic expression involving
x. Examples are
1. $f(x) = x^2$,
2. $g(x) = x + 5$,
3. $h(x) = \sqrt{x}$.
The last example needs clarification, since the square root of a positive number is not
unique. In this book, when we use the symbol $\sqrt{x}$, we always refer to the positive
square root of x. Then h indeed defines a function.
A function may be thought of as an input–output machine, see Fig. 1.7. Given a
particular input, say x, the function uses it as an argument and produces the value
f (x) as output.
input t → function f → output f(t)
input 3 → function g → output g(3)
input 5 → f(t) = 2t − 6 → f(5) = 4

A function which describes the transmission of information is often called a signal.

Remark 1.1 There are many curves which we can draw in the plane, but which are
not graphs of a function. In general, a curve is a graph of a function if and only
if no vertical line intersects the curve more than once. In other words, if a vertical

Fig. 1.7 Function as a machine or a signal

Fig. 1.8 Not a graph of a function of x

line intersects the curve at all, it does so only once. For example, the curve given in
Fig. 1.8 is not a graph of a function of x as a vertical line intersects it more than once.

As a further example, let f be the absolute value function.
1. Verbal representation: f(x) is the absolute value of x.
2. Tabular representation:

x      −2  −1  0  1  2
f(x)   2   1   0  1  2

3. Graphical representation: See Fig. 1.9.
4. Algebraic representation:

$$ f(x) = |x| = \begin{cases} x, & x \ge 0, \\ -x, & x < 0. \end{cases} $$

Remark 1.2 Let us emphasize an important point concerning the notation of func-
tions. In the majority of cases, people use the letter x to denote an argument of a
function of a single variable. On the other hand, in particular, when the function

Fig. 1.9 Absolute value function

arises in the context of a real-world problem, other letters that reflect that context
are chosen for the argument, like t for time. Now the point we want to make is
that the definition $f(x) = x^2$ denotes exactly the same function as the definition
$f(t) = t^2$. Indeed, according to Definition 1.2, a function f is specified completely
by its domain and range (which are sets) and the rule by which elements of the range
are associated with elements of the domain. Which letters or symbols we use to specify
those elements is immaterial when we consider the function as a mathematical
object. This observation may seem trivial to some. But it lies at the root of the power
of mathematics. Once you have a mathematical theory for the function $f(x) = x^2$, it
applies no matter whether x stands for time, distance, price, signal intensity, or other
quantities.

1.2 Various Types of Functions

Functions come in a large variety. Therefore, it makes sense not only to specify individual
functions, like $f(x) = 3x^2 - 4$, but to consider certain classes of functions.
The functions belonging to such a class are identified by certain features common to
all functions in that class.

Definition 1.3 (Constant function) If f (x) = c for all x in the domain of f , where
c is a fixed number, then f is called a constant function.

The graph of a constant function is a horizontal line. For example, the function
defined by f (x) = 4, x ∈ [−4, 16], is a constant function. See Fig. 1.10.

Definition 1.4 (Linear function) A function of the form y = f(x) = mx + b, where
m and b are given real numbers, is called a linear function.

Fig. 1.10 Constant function

The graphs of linear functions are straight lines with slope m. Examples of linear
functions are $f(x) = 9x + 2$ and $f(x) = \tfrac{1}{2}x - 3$.

Definition 1.5 (Quadratic function) A function of the form $y = f(x) = ax^2 + bx + c$,
where a, b, and c are given real numbers and a ≠ 0, is called a quadratic
function.

The functions defined by $f(x) = 5x^2 + 9x + 4$ and $f(x) = -x^2 + 4x + 1$ are
examples of quadratic functions.

Definition 1.6 (Polynomial function) A function of the form

$$ y = f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_2 x^2 + a_1 x + a_0 , $$

where $a_n, a_{n-1}, \ldots, a_2, a_1, a_0$ are given real numbers with $a_n \neq 0$, and n ≥ 0 is a
nonnegative integer, is called a polynomial function or simply a polynomial of
degree n.

A linear function f(x) = ax + b is a polynomial of degree 1 if a ≠ 0; otherwise
it reverts to the constant b, a polynomial of degree 0. A quadratic function
$f(x) = ax^2 + bx + c$, a ≠ 0, is a polynomial of degree 2. As a further example, the function
$f(x) = 4x^8 - 3x - 2$ is a polynomial of degree 8.

Definition 1.7 (Rational function) A function of the form

$$ f(x) = \frac{P(x)}{Q(x)} , $$

where $P(x) = a_n x^n + \cdots + a_1 x + a_0$ and $Q(x) = b_m x^m + \cdots + b_1 x + b_0$ are
polynomials of degree n and m, respectively, is called a rational function.

For example, the function

$$ f(x) = \frac{3x^3 - 4x^2 + 2}{5x^2 + x - 3} $$

is a rational function.

Definition 1.8 (Power function) A function of the form $y = f(x) = x^r$, where r is
a given real number, is called a power function.

For power functions, one has to distinguish different cases. If r is a positive integer,
then $f(x) = x^r$ is a polynomial of degree r, also called the monomial of degree
r. Examples are $f(x) = x^2$ and $f(x) = x^7$. If r is a negative integer, then −r is a
positive integer and $x^r$ is defined as

$$ x^r = \frac{1}{x^{-r}} , \quad \text{if } x \neq 0. $$

For example,

$$ x^{-2} = \frac{1}{x^2} . $$

If r = 0, then $x^0 = 1$ by convention (because it should hold that $x^0 = x^{r-r} = x^r x^{-r}$).
If r is a rational number, then $x^r$ may involve roots, for example

$$ x^{1/2} = \sqrt{x} , \quad x^{1/6} = \sqrt[6]{x} , \quad x^{2/5} = \sqrt[5]{x^2} , \quad x^{-1/2} = \frac{1}{\sqrt{x}} . $$

In this case, $x^r$ is defined only if x is nonnegative (and x ≠ 0 if r is negative). The case
when r is irrational is more complicated. We discuss it in the next section under the
subheading “exponential function.”
The classes of functions considered above have been specified by algebraic for-
mulas. We now consider some classes which are specified by general properties.

Definition 1.9 (Even and odd functions) A function f having the property f (−x) =
f (x) for all x ∈ D( f ) is called an even function. An odd function is a function for
which f (−x) = − f (x) for all x ∈ D( f ). In both cases, we assume that −x ∈ D( f )
whenever x ∈ D( f ).

For example, $f(x) = x^4$ is an even function and $f(x) = x^3$ is an odd function. The
trigonometric functions cos and sec are even functions, whereas sin, tan, cot, and csc
are odd functions. The absolute value function f(x) = |x| is an even function.
The graph of an odd function is symmetric with respect to the origin (Fig. 1.11),
and the graph of an even function is symmetric w.r.t. the y-axis (Fig. 1.12). The zero
function is the only function which is both even and odd, because the latter implies
that − f (x) = f (−x) = f (x), and hence f (x) = 0 for all x.
There are many functions which are neither even nor odd, for example

Fig. 1.11 The odd function y = sin x

Fig. 1.12 The even function y = cos x

$$ f(x) = x + x^2 . $$

In fact, the sum of an even and an odd function is neither even nor odd, unless one
of them is the zero function.
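The defining identities f(−x) = f(x) and f(−x) = −f(x) can be spot-checked numerically. The sketch below tests them only at finitely many sample points, so it can refute evenness or oddness but cannot prove it; the helper names looks_even and looks_odd and the tolerance are assumptions made for this illustration.

```python
import math

def looks_even(f, xs, tol=1e-12):
    """True if f(-x) = f(x) holds (up to tol) at all sample points xs."""
    return all(abs(f(-x) - f(x)) <= tol for x in xs)

def looks_odd(f, xs, tol=1e-12):
    """True if f(-x) = -f(x) holds (up to tol) at all sample points xs."""
    return all(abs(f(-x) + f(x)) <= tol for x in xs)

xs = [0.1 * k for k in range(1, 31)]     # sample points 0.1, 0.2, ..., 3.0

def mixed(x):
    return x + x**2                      # sum of an odd and an even function

print(looks_even(math.cos, xs), looks_odd(math.sin, xs))
print(looks_even(mixed, xs), looks_odd(mixed, xs))
```

For the function x + x², both checks fail already at a single sample point, in line with the statement above.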

Theorem 1.1 Let f and g be two real-valued functions defined on the same domain.
(a) If f and g are even functions, then f + g and f · g are even functions.

(b) If f and g are odd functions, then f + g is an odd function, and f · g is an
even function.
(c) If f is an even and g is an odd function, then f · g is an odd function.
For example,

$$ \tan x = \frac{1}{\cos x} \cdot \sin x $$

is odd, by part (c) of the theorem above.
Definition 1.10 (Periodic function) A function f is said to be periodic with period
p, if for each x in the domain of f , namely, x ∈ D( f ), the point x + p also belongs to
D( f ) and f (x + p) = f (x). Such a number p is called a period of f . The smallest
positive period of f is called the fundamental period of f .
The trigonometric functions sin x and cos x are periodic functions with the funda-
mental period 2π, as

sin (x + 2π ) = sin x , cos (x + 2π ) = cos x ,

and since there is no smaller positive period than 2π. On the other hand, all numbers
of the form 2kπ, k being any integer, are periods of sin x and cos x.
Definition 1.11 (Increasing and decreasing functions) A function f is said to be
increasing on an interval I , if f (x2 ) > f (x1 ) for every pair of points x1 , x2 ∈ I such
that x2 > x1 . Similarly, f is said to be decreasing on I if f (x2 ) < f (x1 ) for every
pair of points x1 , x2 ∈ I such that x2 > x1 .
In addition, if f (x2 ) ≥ f (x1 ) whenever x2 > x1 , f is called nondecreasing. If
f (x2 ) ≤ f (x1 ) whenever x2 > x1 then f is called nonincreasing.
It can be checked that the sum of two increasing functions is an increasing function.
This statement remains true if we replace “increasing” by “decreasing”, “nonincreas-
ing”, or “nondecreasing”.
There is a close relationship between increasing (decreasing, nonincreasing, non-
decreasing) functions and the sign of their first derivative (see Sect. 4.3).
Example 1.3 1. The function $f(x) = x^2$ is decreasing on (−∞, 0] and increasing
on [0, ∞). On (−∞, ∞) it is neither decreasing nor increasing.
2. The function

$$ f(x) = \begin{cases} 2, & x < 0, \\ x, & x \ge 0 \end{cases} $$

is constant on (−∞, 0) and increasing on [0, ∞). (See Fig. 1.13.) It is neither
decreasing nor increasing on (−∞, ∞).
3. The function $f(x) = x^3$ is increasing on (−∞, ∞). (See Fig. 1.14.)
4. The function defined by

$$ f(x) = \begin{cases} 1, & x \text{ is rational}, \\ 0, & x \text{ is irrational}, \end{cases} $$

Fig. 1.13 A function which is neither decreasing nor increasing

Fig. 1.14 An increasing function

is called the Dirichlet function (see Fig. 1.15). There is no interval on which
the function either increases or decreases. On every interval, however small, the
function jumps back and forth between 0 and 1 an infinite number of times. Thus,
there is no good way to draw the graph of this function, and Fig. 1.15 provides
just a very rough approximation.

The next class of functions we consider is the class of invertible functions. It consists
of all those functions for which we can find an inverse function according to the
following definition.

Fig. 1.15 A very rough approximation of the graph of the Dirichlet function

Definition 1.12 (Inverse function)
(a) A function f is said to be one-to-one or injective if any two different numbers
$x_1, x_2$ in its domain yield different function values, that is, $x_1 \neq x_2$ implies
$f(x_1) \neq f(x_2)$. An equivalent formulation of this condition would be that
$f(x_1) = f(x_2)$ implies $x_1 = x_2$.
(b) Let f be one-to-one. The inverse of f, denoted by $f^{-1}$, is the function whose
domain $D(f^{-1})$ equals the range of f and is defined by

$$ f^{-1}(y) = x , \tag{1.3} $$

for all y in the range of f, where x is the unique element in D(f) with f(x) = y.

Note that the inverse $f^{-1}$ indeed is a function according to Definition 1.2, since
when f is one-to-one, for any given y ∈ R(f), there is exactly one x ∈ D(f) with
y = f(x). Inserting y = f(x) into (1.3), we see that

$$ f^{-1}(f(x)) = x , \quad \text{for all } x \in D(f), \tag{1.4} $$

furnishes an equivalent definition of the inverse function.
If we follow the convention that x denotes an argument of f and y denotes a value
y = f(x) of f, it is natural to use the letter y to denote the argument of $f^{-1}$. So, for
example, the inverse function $f^{-1}$ of $f(x) = x^3$ is given by $f^{-1}(y) = y^{1/3} = \sqrt[3]{y}$.
Indeed, (1.4) holds since

$$ f^{-1}(f(x)) = f^{-1}(x^3) = \sqrt[3]{x^3} = x . $$

As another example, the inverse $f^{-1}$ of $f(x) = 3x - 5$ is given by
$f^{-1}(y) = \tfrac{1}{3} y + \tfrac{5}{3}$.
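The two inverse pairs just discussed can be checked against (1.4) at a few sample points. This is a minimal numerical sketch, not a proof; the function names are chosen here for illustration. One Python-specific assumption is worth flagging: for negative y the expression y ** (1/3) yields a complex number, so the real cube root is computed from |y| and the sign of y.

```python
import math

def f(x):
    return 3 * x - 5

def f_inv(y):
    return y / 3 + 5 / 3

def cube(x):
    return x ** 3

def cube_inv(y):
    # Real cube root, valid for negative y as well.
    return math.copysign(abs(y) ** (1 / 3), y)

samples = [-2.0, 0.0, 1.5, 7.0]

# Largest deviation of f_inv(f(x)) from x over the sample points.
err_linear = max(abs(f_inv(f(x)) - x) for x in samples)
err_cube = max(abs(cube_inv(cube(x)) - x) for x in samples)

print(err_linear, err_cube)
```

Both deviations are of the order of floating-point rounding, which is consistent with (1.4).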
Theorem 1.2 Let the function f , defined on an interval I , be either increasing or
decreasing. Then f has an inverse.

Proof Increasing and decreasing functions are one-to-one; therefore, they have an
inverse.

Remark 1.3 1. If f has an inverse $f^{-1}$, then

$$ f(f^{-1}(y)) = y , \quad \text{for all } y \text{ in the range of } f. $$

2. If f is a one-to-one function, then there is one and only one function g with
domain equal to the range of f such that

$$ g(f(x)) = x , \quad f(g(y)) = y , $$

for all x ∈ D(f) and all y ∈ R(f). Namely, $g = f^{-1}$.
3. There is no special meaning attached to the letter y as an argument of $f^{-1}$. We
could equally well use the letter x, so that, for example, the inverse of $f(x) = x^3$
is given by $f^{-1}(x) = x^{1/3}$.
4. If f is a one-to-one function, every horizontal line in the coordinate plane meets
the graph of f at most once.
5. When we reflect the graph of f in the coordinate plane across the straight line
y = x, we obtain the graph of $f^{-1}$.

Definition 1.13 (Bounded function) A function f is said to be bounded on an interval
I if there exists a real number M > 0 such that |f(x)| ≤ M for all x ∈ I. Any
such number is called a bound for f on I.

For example, $f(x) = \sqrt{9 - x^2}$ is bounded on its domain, the interval I = [−3, 3],
because 0 ≤ f(x) ≤ 3 for all x ∈ I, and M = 3 (or any larger number) is a bound
for f. On the other hand, the function $g(x) = \pi x^2$ is unbounded on its domain of
definition (−∞, ∞), but it is bounded if we restrict it to the interval I = [0, 1],
where it has the bound π.

1.3 Important Examples of Functions

Trigonometric functions. The trigonometric functions sin x, cos x, tan x, cot x, sec x
and csc x are discussed in Appendix C.
Absolute value function. This function is denoted as |x| and is defined by

$$ f(x) = |x| = \begin{cases} x, & x \ge 0, \\ -x, & x < 0. \end{cases} $$

Its domain is (−∞, ∞) or R, its range is [0, ∞). If it appears as part of another
function, one has to take care what becomes of the case distinction. For example,
g(x) = |2x − 3| can be expressed as

Fig. 1.16 Signum or sign function


$$ g(x) = \begin{cases} 2x - 3, & x \ge \tfrac{3}{2}, \\ -(2x - 3), & x < \tfrac{3}{2}. \end{cases} $$

Signum function. This function, also called sign function, is denoted by sgn(x) and
defined by

$$ \operatorname{sgn}(x) = \begin{cases} \dfrac{x}{|x|}, & x \neq 0, \\ 0, & x = 0. \end{cases} $$

The signum function has the values 1 if x > 0, −1 if x < 0, and 0 if x = 0. Its domain
is R or (−∞, ∞), and its range is the set {−1, 0, 1} consisting of three elements only.
(See Fig. 1.16.)
Greatest Integer function. This function is denoted by [x] and defined by

f (x) = [x] = the largest integer n such that n ≤ x.

See Fig. 1.17. It is also called the integer part function, and the number [x] is called
the integer part of x. The domain of this function is R or (−∞, ∞), and its range
is the set of all integers. It is constant on each interval of the form [n, n + 1) where
n is an integer. At each integer point x = n the value of the function [x] changes
from n − 1 to n; the function is said to have a jump or a step of unit magnitude
at those points. For example, we have [2.1] = 2, [2] = 2, [1.9] = 1, [−2.1] = −3,
[−2] = −2.
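The values listed above agree with the floor function of Python's standard library. The short sketch below is an illustration only; it also shows that truncation toward zero, as done by int(), is a different operation for negative non-integers.

```python
import math

values = [2.1, 2, 1.9, -2.1, -2]

# Greatest integer n with n <= x, i.e. the integer part [x] in the sense of the text.
integer_parts = [math.floor(v) for v in values]

# Truncation toward zero is NOT the greatest integer function for negative inputs.
truncated = int(-2.1)

print(integer_parts, truncated)
```

The difference matters exactly for negative arguments that are not integers: [−2.1] = −3, whereas truncation gives −2.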
Step function. Let a closed interval [a, b] be divided into subintervals $[x_0, x_1]$,
$[x_1, x_2]$, …, $[x_{n-1}, x_n]$ by the points $a = x_0 < x_1 < x_2 < \ldots < x_{n-1} < x_n = b$. A
step function s(x) on [a, b] is a function that is constant on the open intervals $(x_{j-1}, x_j)$,
j = 1, 2, 3, . . . , n. The values $s(x_j)$ at the partition points need not be related to

Fig. 1.17 Integer part function

Fig. 1.18 A step function

the values s(x) on either of the adjoining subintervals. An example of a step function is
shown in Fig. 1.18. The greatest integer function (if restricted to an interval [a, b])
also furnishes an example of a step function.
Factorial function. The factorial function or factorial of n is denoted by n! and
defined by

$$ f(n) = n! = 1 \cdot 2 \cdot 3 \cdots n . $$

Its domain is the set of nonnegative integers including 0, and we have 0! = 1 by
convention (Fig. 1.19). Its range is a subset of the positive integers. Its graph consists
of isolated points (0, 1), (1, 1), (2, 2), (3, 6), (4, 24), ….
Haar function. The function f defined by

$$ f(x) = \begin{cases} 1, & 0 < x \le \tfrac{1}{2}, \\ -1, & \tfrac{1}{2} < x \le 1, \end{cases} $$

Fig. 1.19 Factorial function

Fig. 1.20 Haar function

is called the Haar function, see Fig. 1.20. The Haar function appears in approxima-
tion theory, in particular, in the construction of wavelets.
Heaviside function. (See Fig. 1.21.) The Heaviside function H(x) is defined as

$$ H(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases} $$

Shannon sampling function. The function f defined by

$$ f(x) = \frac{\sin \omega_0 x}{\pi x} , $$

Fig. 1.21 Heaviside function

Fig. 1.22 Shannon sampling function

where $\omega_0$ is a positive number, is called the Shannon sampling function. (See
Fig. 1.22.)
Exponential function. A function of the form

$$ y = f(x) = a^x , $$

where a > 0 is a real constant, is called an exponential function. This name is
chosen because the independent variable x appears in the exponent. Previously, we
have defined $a^x$ when x is a rational number, using roots like, for example, $x^{2/3} = \sqrt[3]{x^2}$.
But how can we define, for example, the number $3^{\sqrt{2}}$? One way to do this is to observe
that the function $f(x) = 3^x$ is increasing if we consider only rational numbers x as

Fig. 1.23 Exponential function

arguments. Therefore, we require that $3^{\sqrt{2}} > 3^x$ if $\sqrt{2} > x$ and x is rational, and
$3^{\sqrt{2}} < 3^x$ if $\sqrt{2} < x$ and x is rational. Indeed one can prove that this procedure
defines a unique real number $a^x$ for any real number x in the case a > 1. A similar
argument works for the case 0 < a < 1, and finally one sets $1^x = 1$ for all real
numbers x. This approach is perfectly feasible.
Nowadays, mathematicians worldwide usually adopt a different approach which
is directly related to important formulas. It is based on the famous Euler number
e, an irrational number given by e = 2.71828 . . . . We will define e as the limit of a
sequence in Chap. 2. Later, we will define the function $f(x) = e^x$ through an infinite
series in Chap. 5. Because of its fundamental importance, both in mathematics itself
and in applications of mathematics, the function $f(x) = e^x$ is usually called the
exponential function. Its domain is R = (−∞, ∞) and its range is (0, ∞). Its graph
is given in Fig. 1.23. The basic formula for the exponential function is

$$ e^{x+y} = e^x \cdot e^y . \tag{1.5} $$

Naturally, this should be true, since otherwise the notation $e^x$ would not make much
sense, and indeed one can prove it from the series representation of $e^x$ in Chap. 5.
Setting x = 0 in (1.5) we see that $e^0 = 1$, and setting y = −x in (1.5) we obtain that

$$ 1 = e^0 = e^{x-x} = e^x \cdot e^{-x} , \qquad e^{-x} = \frac{1}{e^x} . \tag{1.6} $$

The number e itself is obtained from the exponential function as $e = e^1$. The exponential
function will appear throughout this book in many examples where mathematics
is applied.
The function $f(x) = e^{-x^2}$ is called a Gaussian function.
The other exponential functions $f(x) = a^x$, a ≠ e, are related to the exponential
function $f(x) = e^x$ through the logarithmic functions, to be explained in the next
paragraph.

Fig. 1.24 Natural logarithm

Logarithmic function. If $x = e^y$, which is possible only if x > 0, then y is called
the natural logarithm of x, and it is written as y = ln x. In other words, y = ln x
means the same as $x = e^y$. We have ln e = 1 since $e = e^1$, and ln 1 = 0 since $1 = e^0$.
The exponential function is increasing and hence has an inverse function defined on
its range (0, ∞) due to Theorem 1.2. This inverse function is furnished by the natural
logarithm, since $\ln e^y = y$ for all y ∈ R, and $x = e^{\ln x}$ for all x > 0. Therefore, the
graph of the natural logarithm (Fig. 1.24) is obtained from the graph of the exponential
function (Fig. 1.23) by reflection across the line y = x. The graph of the natural
logarithm crosses the x-axis at (1, 0), and the graph of the exponential function
crosses the y-axis at (0, 1); this corresponds to the formulas ln 1 = 0 and $e^0 = 1$
from above.
With the aid of the natural logarithm, a convenient definition of the exponential
functions $f(x) = a^x$ for arbitrary real numbers a > 0 is given by

$$ a^x = e^{x \ln a} . \tag{1.7} $$

The rules for powers with natural numbers extend to real numbers a, x, y where
a > 0,

$$ a^{x+y} = a^x \cdot a^y , \qquad (a^x)^y = a^{xy} . $$

This can be seen by combining (1.7) with the basic formula (1.5) for the exponential
function.
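Definition (1.7) translates directly into code. The following sketch, with a helper name power chosen here for illustration, computes $a^x$ as $e^{x \ln a}$ and compares it with Python's built-in power operator; it is a numerical check under floating-point rounding, not a derivation.

```python
import math

def power(a, x):
    """Compute a**x for a > 0 via formula (1.7): a^x = e^(x ln a)."""
    if a <= 0:
        raise ValueError("the base a must be positive")
    return math.exp(x * math.log(a))

sqrt2 = math.sqrt(2)
via_formula = power(3, sqrt2)      # 3 to the power sqrt(2), via (1.7)
built_in = 3 ** sqrt2              # the same number via the built-in operator

# Rule a^(x+y) = a^x * a^y, checked for a = 2, x = 0.5, y = 1.5.
lhs = power(2, 0.5 + 1.5)
rhs = power(2, 0.5) * power(2, 1.5)

print(via_formula, built_in, lhs, rhs)
```

The restriction a > 0 in the helper mirrors the text: ln a is only defined for positive a.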
Due to (1.7), the functions $f(x) = a^x$ are increasing if a > 1 and decreasing if
0 < a < 1; hence, they can be inverted if a > 0 and a ≠ 1. If $x = a^y$, which is possible
only for x > 0, then y is called the logarithm of x to the base a and it is written as
$y = \log_a x$. In other words, $y = \log_a x$ means the same as $x = a^y$. We give some
examples:

$y = \log_2 1$ means $2^y = 1$, that is, y = 0.
$y = \log_2 4$ means $2^y = 4$, that is, y = 2.
$y = \log_a a$ means $a^y = a$, that is, y = 1.
$y = \log_3 \tfrac{1}{3}$ means $3^y = \tfrac{1}{3}$, that is, y = −1.
Besides e, the most common bases are a = 10 (due to the everyday use of the decimal
system) and a = 2 (due to the binary system used in computers). The logarithm to
the base 10 of x, that is log10 x, is called the common logarithm and often written
as log x.
Let x, y, r be arbitrary real numbers, where x, y > 0. Then the following formulas
hold.

$$ \log_a (xy) = \log_a (x) + \log_a (y) , $$
$$ \log_a \left( \frac{x}{y} \right) = \log_a (x) - \log_a (y) , $$
$$ \log_a (x^r) = r \cdot \log_a (x) , $$
$$ \log_a x = \frac{\log_b x}{\log_b a} . $$

In the last formula, b denotes any real number with b > 0, b ≠ 1.
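The change-of-base formula with b = e gives a simple way to compute logarithms to any admissible base from the natural logarithm. The sketch below uses a helper name log_base introduced here for illustration and spot-checks the examples and rules stated above at a few values.

```python
import math

def log_base(a, x):
    """Logarithm of x > 0 to the base a (a > 0, a != 1), via log_a x = ln x / ln a."""
    if a <= 0 or a == 1 or x <= 0:
        raise ValueError("need a > 0, a != 1 and x > 0")
    return math.log(x) / math.log(a)

# Examples from the text.
examples = [log_base(2, 1), log_base(2, 4), log_base(7, 7), log_base(3, 1 / 3)]

# Product rule and power rule, checked for a = 5, x = 2.5, y = 8, r = -1.7.
a, x, y, r = 5.0, 2.5, 8.0, -1.7
product_ok = math.isclose(log_base(a, x * y), log_base(a, x) + log_base(a, y))
power_ok = math.isclose(log_base(a, x ** r), r * log_base(a, x))

print(examples, product_ok, power_ok)
```

The sample values a = 5, x = 2.5, y = 8, r = −1.7 are arbitrary choices; any values satisfying the stated conditions could be used.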


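Hyperbolic functions. The hyperbolic functions are defined as follows.

$$ \sinh x = \frac{e^x - e^{-x}}{2} \quad \text{(hyperbolic sine)} $$
$$ \cosh x = \frac{e^x + e^{-x}}{2} \quad \text{(hyperbolic cosine)} $$
$$ \tanh x = \frac{e^x - e^{-x}}{e^x + e^{-x}} \quad \text{(hyperbolic tan)} $$
$$ \coth x = \frac{e^x + e^{-x}}{e^x - e^{-x}} \quad \text{(hyperbolic cot)} $$
$$ \operatorname{sech} x = \frac{2}{e^x + e^{-x}} \quad \text{(hyperbolic sec)} $$
$$ \operatorname{cosech} x = \frac{2}{e^x - e^{-x}} \quad \text{(hyperbolic cosec).} $$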

The terms “tanh”, “sech”, and “cosech” are pronounced as “tanch”, “seech”, and
“coseech”, respectively.
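The definitions above can be compared with the hyperbolic functions of Python's math module. This is a small sketch for illustration; the function names with the suffix _def are introduced here to avoid clashing with the library names, and the parity check at the end uses a fact not stated in the text, namely that sinh is odd and cosh is even, which follows directly from the defining formulas.

```python
import math

def sinh_def(x):
    return (math.exp(x) - math.exp(-x)) / 2

def cosh_def(x):
    return (math.exp(x) + math.exp(-x)) / 2

def tanh_def(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

samples = [-2.0, -0.5, 0.0, 1.0, 3.0]

def close(u, v):
    return math.isclose(u, v, rel_tol=1e-12, abs_tol=1e-12)

agree = all(
    close(sinh_def(x), math.sinh(x))
    and close(cosh_def(x), math.cosh(x))
    and close(tanh_def(x), math.tanh(x))
    for x in samples
)

# Parity: sinh(-x) = -sinh(x) and cosh(-x) = cosh(x).
parity = all(
    close(sinh_def(-x), -sinh_def(x)) and close(cosh_def(-x), cosh_def(x))
    for x in samples
)

print(agree, parity)
```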

1.4 Functions as Models

For a systematic study of the world around us, mathematics is required. As early
as 1623, the Italian scientist and mathematician Galileo Galilei (1564–1642) wrote
about “the all-encompassing book which is constantly open before our eyes” (he
means the universe) that “it is written in mathematical language.” He adds that “its
characters are triangles, circles, and other geometrical figures” and that without math-
ematical language, “one wanders around pointlessly in a dark labyrinth.” This is even
more relevant today, when we have come a long way from the origins of mathemat-
ics in numbers and geometrical figures. Indeed, Galilei’s statement addresses what
in the modern scientific view is termed as mathematical modeling. A situation in
physics or another discipline is formulated in terms of mathematical concepts such
as equations, functions, derivatives, and integrals. Such a formulation is then called
a mathematical model.
In this section, we use functions as mathematical models for different situations.
We address different situations from the real world, present mathematical models
for them in terms of functions (algebraic, numerical, or graphical), and give some
examples of results.
One of the most important steps in creating a mathematical model of a real-world
situation is to decide which factors to consider and which to ignore. The more
factors one takes into account, the more complicated the expressions and equations
of the model tend to become, so an appropriate balance is needed between keeping
a model mathematically simple and considering enough factors to make the model
realistic and useful. Moreover, the developers of the model often have a purpose,
namely, some questions which the model should help to answer. A good mathematical
model, first of all, has to produce results that are consistent with the real world.
Depending on the questions it should help to answer, it may be good for one purpose
but not for another. If a mathematical model does not meet these requirements, it
must be modified or even changed completely.
In this section, we consider models that involve only two variables, say x and
y. We assume that the data for the phenomenon being modeled consists of a col-
lection of ordered pairs of measurements (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) that relate
corresponding values of the variables x and y. We will discuss the examples from the
deterministic viewpoint only, that is, when a given value of x uniquely determines the
value of y. Let us remark that in many other situations, a probabilistic (or stochastic)
viewpoint has to be taken, that is, the value of y does not only depend on the value
of x, but moreover probabilities are involved in some way or other.
Example 1.4 Express the area of a circular region as a function of its radius.
Solution: We know that the area of a circular region of radius r is given by the
number $\pi r^2$. Thus, its area A is represented by the function $f(r) = \pi r^2$. In order
not to introduce additional symbols, one often denotes the value of the area and the
function by the same symbol A, so $A(r) = \pi r^2$. In this case, r is the independent
variable, and the area A is the dependent variable. The domain of this function is
(0, ∞) and its range, too, is (0, ∞).

Example 1.5 The area of a square region in the plane is a function of the length of
its side. If α is the length of a side of the given square, then its area A is a function
of α, namely, $A(\alpha) = \alpha^2$.

Example 1.6 The area of an equilateral triangle is a function of its side length.

Example 1.7 Let us consider a rectangular box such that the length plus the girth
(the perimeter of a cross section) equals 108 cm. Express the volume V of the box as
a function of the edge length of a cross section and find the domain of this function.

Solution: Let h denote the length of the box. All cross sections (including those at
the end) are squares of identical edge length, let us denote this edge length by α.
The girth equals the perimeter of such a square and is therefore equal to 4α. By
assumption, we must have 4α + h = 108 or h = 108 − 4α. The volume of the box
is given by $V = \alpha^2 h$. Inserting the formula for h we obtain

$$ V(\alpha) = \alpha^2 (108 - 4\alpha) . $$

Since neither the edge length of the square nor the length of the box can be negative,
we have α ≥ 0 and h = 108 − 4α ≥ 0. The latter inequality holds if and only if
α ≤ 27. Therefore, the total restrictions on α are 0 ≤ α ≤ 27. Thus, the domain of
the function V is the interval [0, 27].
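The volume function of Example 1.7 and its domain can be written down as follows. This is a minimal sketch; the function name box_volume and the choice to raise an error outside the domain [0, 27] are assumptions made for illustration.

```python
def box_volume(alpha):
    """Volume (in cubic cm) of the box with square cross section of edge length alpha."""
    if not 0 <= alpha <= 27:
        raise ValueError("alpha must lie in the domain [0, 27]")
    length = 108 - 4 * alpha          # h = 108 - 4*alpha from the girth condition
    return alpha ** 2 * length

v_left = box_volume(0)      # degenerate box: no cross section
v_right = box_volume(27)    # degenerate box: length h = 0
v_mid = box_volume(10)      # 10^2 * (108 - 40)

print(v_left, v_mid, v_right)
```

At both endpoints of the domain the box degenerates and the volume is zero, which is why the endpoints can be included in the domain.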

Example 1.8 A soft drink manufacturer wants to fabricate cylindrical cans. Each
can should have a volume of 200 cubic cm. Express the total surface area of each
can as a function of its radius and find the domain of this function.

Solution: The total surface area S of the can equals $2\pi r^2 + 2\pi r h$, and thus is a
function of its radius r and height h. We can eliminate h through the volume constraint
$V = \pi r^2 h = 200$, which gives $h = \frac{200}{\pi r^2}$ as a function of r. Inserting this into the
formula for the surface area yields

$$ S(r) = 2\pi r^2 + 2\pi r \cdot \frac{200}{\pi r^2} = 2\pi r^2 + \frac{400}{r} . $$

Since r can take any positive value, D(S) = (0, ∞).
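The elimination of h in Example 1.8 can be double-checked numerically: for any radius, the simplified formula for S(r) should agree with the original expression $2\pi r^2 + 2\pi r h$ once h is chosen to meet the volume constraint. The function names below are hypothetical and serve only this illustration.

```python
import math

VOLUME = 200.0   # required volume in cubic cm

def height(r):
    """Height of the can that gives the required volume for radius r > 0."""
    return VOLUME / (math.pi * r ** 2)

def surface(r):
    """Total surface area S(r) = 2*pi*r^2 + 400/r for radius r > 0."""
    return 2 * math.pi * r ** 2 + 400.0 / r

radii = [1.0, 2.0, 5.0]

# Compare the simplified formula with top + bottom + lateral area.
consistent = all(
    math.isclose(surface(r), 2 * math.pi * r ** 2 + 2 * math.pi * r * height(r))
    for r in radii
)
volume_ok = all(math.isclose(math.pi * r ** 2 * height(r), VOLUME) for r in radii)

print(consistent, volume_ok)
```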

Example 1.9 The volume V of a ball is a function of its radius r,

$$ V(r) = \frac{4}{3} \pi r^3 . $$
Example 1.10 Suppose a computer shop sells x laptops each day at 500 Euro a
piece, and suppose that the cost of manufacturing and selling the laptops is 400 Euro
per laptop plus fixed daily operating costs (heat, rent, insurance, etc.) of 200 Euro.
Express the daily profit as a function of x. How big is the profit if 20 laptops are sold
daily?

Solution: The daily revenue equals R(x) = 500x, the daily cost equals C(x) =
400x + 200, the daily profit equals P(x) = R(x) − C(x), all measured in Euro.
Therefore, we obtain

P(x) = 500x − (400x + 200) = 100x − 200 .

If 20 laptops are sold then

P(20) = 100 · 20 − 200 = 1800 .

In this case, the daily profit equals 1800 Euro.


Proportionality. We say that the quantity y is directly proportional to the quantity x if
there is a constant k such that y = kx. This k is called the constant of proportionality.
We also say that the quantity y is inversely proportional to the quantity x if y is
proportional to the reciprocal of x, that is, there is a constant μ such that y = μ/x.
The constant μ is called the constant of inverse proportionality.

Example 1.11 (a) A motorist has to drive from town A to town B, taking a duration
t of time at a constant speed v. Show that the speed he has to use is inversely
proportional to the time t. Compute the constant μ if the towns are 100 km apart.
(b) Assume that the heart mass of an animal is proportional to its body mass (which
is true approximately). Then
(i) Write a formula for the heart mass, h, as a function of the body mass, b.
(ii) An animal with a body mass of 80 kilograms has a heart mass of 0.48
kilograms. Find the constant of proportionality.
(iii) Estimate the heart mass of a cow with a body mass of 600 kg.

Solution:
(a) Let s denote the distance between A and B, then s = vt or v = s/t. If s is
measured in km, t in hours and v in km/h, then

\[ v = \frac{s}{t} = \frac{100}{t} , \qquad \mu = s = 100 . \]
(b) (i) By the definition of direct proportionality, h = kb. In this case, the propor-
tionality constant is dimensionless, since both h and b have the dimension
of a mass (to be taken as kilogram).
(ii) Here h = 0.48 kg and b = 80 kg, so

\[ 0.48 = k \cdot 80 , \qquad k = \frac{0.48}{80} = 0.006 . \]
(iii) Since h = kb, we have h = 0.006 · 600 = 3.6 kg. Thus, the heart mass of
the cow is 3.6 kg.
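The computation in part (b) can be retraced in Python as follows. The helper name `proportionality_constant` is our own and is used only for illustration.

```python
def proportionality_constant(y: float, x: float) -> float:
    """Return k in the direct proportionality y = k * x."""
    if x == 0:
        raise ValueError("x must be nonzero")
    return y / x


# Data from part (ii): body mass 80 kg, heart mass 0.48 kg.
k = proportionality_constant(0.48, 80.0)
# Estimate from part (iii): a cow with a body mass of 600 kg.
cow_heart_mass = k * 600.0
print(k, cow_heart_mass)
```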

Models for exponential growth and decay. We consider a function which is pro-
portional to an exponential function,

\[ P(t) = P_0 a^t . \]

It is a model for the growth or decay of a quantity P over time t, the initial amount
being P(0) = P0 . The number a > 0 is called the growth factor. Indeed, since P(t +
1) = a P(t), when the time t increases by one unit, the new value P(t + 1) is obtained
from the old value P(t) by multiplying with a. If one writes a = 1 + k, then the
number k P(t) gives the absolute change of P from t to t + 1, thus k is also termed
the relative growth rate. If it is positive, then the quantity P indeed is growing, while
if k < 0, it is decaying. We may also express the growth rate in percent, then, for
example, a growth rate of 5% corresponds to k = 0.05.
This model is closely related to a basic situation in finance. Suppose P0 is the
initial amount of money deposited in an account which pays an annual interest rate
of r percent, and P(t) is the balance in the account after t years, then

\[ P(t) = P_0 (1 + k)^t , \qquad k = 0.01 r , \tag{1.8} \]

if the interest is compounded annually. If the interest is compounded continuously,
the model

\[ P(t) = P_0 e^{kt} \tag{1.9} \]

applies. Here, e = 2.71828 . . . is the Euler number as mentioned above. The meaning
of this model will be explained in detail in Chap. 3.
Example 1.12 (a) If Rs. 10000 is deposited in a bank account paying 10% annual
interest compounded continuously, how much will be the amount 10 years later?
(b) Suppose that a bank advertises an annual rate of 8% interest. If you deposit Rs.
5000, how much will be in the account 3 years later if the interest is compounded
(i) annually, (ii) continuously?
Solution:
(a) We use the formula $P(t) = P_0 e^{kt}$. The annual interest rate is 10%, so k = 0.1,
t = 10, and $P_0 = 10000$, the initial deposit. Therefore, $P(10) = 10000 e^{0.1 \cdot 10} =
10000 e \approx 27182.82$.
(b) (i) For annual compounding after t = 3 years,

\[ P(3) = P_0 (1 + k)^3 = 5000 (1 + 0.08)^3 = 6298.56 . \]

(ii) For continuous compounding after t = 3 years,

\[ P(3) = P_0 e^{3k} = 5000 e^{0.08 \cdot 3} \approx 6356.25 . \]

We see that the amount in the account 3 years later is larger if the interest is
compounded continuously (6356.25) than if the interest is compounded annually
(6298.56). This is to be expected since with annual compounding, the capital
increases only at the end of the year, while with continuous compounding, the
continuous increase of the capital during the year yields additional interest.
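The comparison of the two compounding models (1.8) and (1.9) can be reproduced with a short Python sketch. The function names `balance_annual` and `balance_continuous` are hypothetical and chosen for illustration only.

```python
import math


def balance_annual(p0: float, k: float, t: float) -> float:
    """Balance after t years with annual compounding, model (1.8)."""
    return p0 * (1 + k) ** t


def balance_continuous(p0: float, k: float, t: float) -> float:
    """Balance after t years with continuous compounding, model (1.9)."""
    return p0 * math.exp(k * t)


# Data of Example 1.12 (b): deposit 5000, rate 8 percent, 3 years.
print(round(balance_annual(5000, 0.08, 3), 2))
print(round(balance_continuous(5000, 0.08, 3), 2))
```

For a positive rate and t > 0, continuous compounding always yields the larger balance in this model, since $e^k > 1 + k$ for k > 0.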
Example 1.13 A woman wants to invest for the education of her children in a certifi-
cate of deposit (CD). She wants it to be worth 12,000 in 10 years. How much should
she invest if the CD pays 9% interest rate (a) compounded annually, (b) compounded
continuously?
Solution:
(a) If the CD pays 9% interest over a period of 10 years, then r = 0.09
and t = 10. We seek the initial amount $P_0$ such that the balance after 10 years,
compounded annually, is P = 12000. We have

\[ P = P_0 (1 + r)^t , \quad \text{that is,} \quad 12000 = P_0 (1 + 0.09)^{10} , \]

which gives

\[ P_0 = \frac{12000}{(1.09)^{10}} \approx \frac{12000}{2.36736} \approx 5068.93 . \]

This amount should be invested to have the given balance after 10 years.
(b) In this case, we use the formula

\[ 12000 = P_0 e^{(0.09)(10)} , \quad \text{or} \quad P_0 = \frac{12000}{e^{(0.09)(10)}} = \frac{12000}{e^{0.9}} \approx \frac{12000}{2.45960} \approx 4878.84 . \]

Thus, the initial deposit should be Euro 4878.84.
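Solving the two compounding formulas for the initial deposit, as done in this example, can be sketched in Python. The names `deposit_annual` and `deposit_continuous` are our own assumptions.

```python
import math


def deposit_annual(target: float, r: float, t: float) -> float:
    """Initial deposit P0 with target = P0 * (1 + r)**t."""
    return target / (1 + r) ** t


def deposit_continuous(target: float, r: float, t: float) -> float:
    """Initial deposit P0 with target = P0 * exp(r * t)."""
    return target / math.exp(r * t)


# Data of Example 1.13: target 12000, rate 9 percent, 10 years.
print(round(deposit_annual(12000, 0.09, 10), 2))
print(round(deposit_continuous(12000, 0.09, 10), 2))
```

The deposit needed under continuous compounding is the smaller one, because the capital grows faster in that model.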
Example 1.14 The population of a city equals 60,000 at the beginning of the year
2007 and is growing continuously at a yearly rate of 5%.
(a) Determine the population of the city at the beginning of the year 2017.
(b) Calculate the time after which the size of the population will have doubled since
2007.
Solution: We use the formula $P(t) = P_0 e^{kt}$.
(a) Here k = 0.05 and t = 10, hence kt = 0.5. As $P_0 = 60,000$, we obtain $P(10) =
60,000 \, e^{0.5} \approx 98,923.28$.
(b) The time t to double satisfies $P(t) = 2P_0$. Since on the other hand $P(t) =
P_0 e^{0.05t}$, we must have $2 = e^{0.05t}$. Taking the logarithm gives ln 2 = 0.05t, hence
$t = 20 \ln 2 \approx 13.8629$ years.
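The doubling time depends only on the rate k and not on the initial population. A minimal Python sketch, with the hypothetical helper name `doubling_time`:

```python
import math


def doubling_time(k: float) -> float:
    """Time t with exp(k * t) = 2 for a continuous growth rate k > 0."""
    if k <= 0:
        raise ValueError("a doubling time exists only for k > 0")
    return math.log(2) / k


population_2017 = 60000 * math.exp(0.05 * 10)
print(round(population_2017, 2), round(doubling_time(0.05), 4))
```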
Example 1.15 Using the model $P(t) = P_0 a^t$, predict the size of the world population
in 2010 with the help of the following table:

Year    Population (million)    Ratio
1986    4936                    5023/4936 ≈ 1.0176
1987    5023                    5111/5023 ≈ 1.0175
1988    5111                    5201/5111 ≈ 1.0176
1989    5201                    5329/5201 ≈ 1.0246
1990    5329                    5422/5329 ≈ 1.0175
1991    5422

Solution: The third column shows that the population in any year is about 1.018 times
the population in the previous year, so let us take a = 1.018 for the value of the growth
factor. If t years have passed after 1986, the world population would then be given by
approximately $P(t) = 4936 \cdot 1.018^t$ million people. For 2010 we have t = 24, so our
estimate yields $P(24) \approx 7573.9$, that is, approximately 7.6 billion people as the size
of the world population in 2010.
An example of exponential decay is furnished by
the model $y(t) = A e^{-0.00012t}$, which describes how the radioactive element carbon-14
decays over time. Here, A is the original amount of carbon-14, t is the time in years,
and y(t) is the amount of carbon-14 present after t years. Carbon-14 decay is used
to date the remains of dead organisms such as shells, seeds, and wooden artifacts.
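Both models of this passage, the population estimate and the carbon-14 decay, can be evaluated with a few lines of Python. This is only a sketch; the function names are our own, and the half-life is derived here from the decay constant 0.00012 of the model, not quoted from the text.

```python
import math


def world_population(t: float, p0: float = 4936.0, a: float = 1.018) -> float:
    """Estimated population in millions, t years after 1986."""
    return p0 * a**t


def carbon14_fraction(t: float, rate: float = 0.00012) -> float:
    """Fraction y(t)/A of carbon-14 remaining after t years."""
    return math.exp(-rate * t)


# Time after which half of the carbon-14 has decayed in this model.
half_life = math.log(2) / 0.00012

print(round(world_population(24), 1))
print(round(half_life, 1), carbon14_fraction(half_life))
```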
Models involving the logarithmic function. As the first example, consider the
intensity of earthquakes. It is often characterized as a number on the logarithmic
Richter scale. The formula for its magnitude R is given by
\[ R = \log_{10} \frac{a}{T} + B , \]

where a is the amplitude of the ground motion in microns at the receiving station, T
is the period of the seismic wave in seconds, and B is an empirical factor that allows for
the weakening of the seismic wave with increasing distance from the epicenter of
the quake. For an earthquake 10,000 km from the receiving station, B = 6.8. If the
recorded vertical ground motion is a = 10 microns and the period is T = 1 second,
the earthquake’s magnitude is

\[ R = \log_{10}(10/1) + 6.8 = 1 + 6.8 = 7.8 . \]

An earthquake of this magnitude does great damage near its epicenter.
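The magnitude formula is easy to evaluate numerically. The following Python sketch uses the hypothetical function name `richter_magnitude` and reproduces the value computed above.

```python
import math


def richter_magnitude(a: float, period: float, b: float) -> float:
    """Magnitude R = log10(a / T) + B with amplitude a and period T."""
    if a <= 0 or period <= 0:
        raise ValueError("amplitude and period must be positive")
    return math.log10(a / period) + b


print(richter_magnitude(10.0, 1.0, 6.8))
# A tenfold increase of the amplitude raises the magnitude by one unit.
print(richter_magnitude(100.0, 1.0, 6.8))
```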


The logarithmic function is also used in measuring the intensity of sound. The
so-called sound intensity level is defined by


\[ L(I) = 10 \log_{10} \frac{I}{I_0} . \]

Here, I is the sound intensity and $I_0$ is a reference value, usually taken as $10^{-12}$
Watts per square meter. The sound intensity arriving at a fixed point is directly
proportional to the sound power emitted by the source. The sound intensity level
L(I) is a dimensionless number with decibel (abbreviated “dB”) as its unit.

Example 1.16 How many decibels does the sound intensity level increase if we
double the power output of an amplifier?
Solution: We have


\[ L(2I) = 10 \log_{10} \frac{2I}{I_0} = 10 \left( \log_{10} \frac{I}{I_0} + \log_{10} 2 \right) = L(I) + 10 \log_{10} 2 \approx L(I) + 3.01 . \]

Thus, doubling the power output increases the sound intensity level by approximately
3 dB.
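A numerical check of this result is given by the following Python sketch. The name `sound_level` and the sample intensity $10^{-6}$ Watts per square meter are assumptions made for illustration.

```python
import math

I0 = 1e-12  # reference intensity in Watts per square meter


def sound_level(intensity: float, reference: float = I0) -> float:
    """Sound intensity level L(I) = 10 * log10(I / I0) in decibels."""
    return 10 * math.log10(intensity / reference)


# Doubling an (arbitrarily chosen) intensity of 1e-6 W/m^2.
increase = sound_level(2e-6) - sound_level(1e-6)
print(sound_level(1e-6), increase)
```

The increase does not depend on the chosen starting intensity, since only the ratio of the two intensities enters.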
Let us summarize the examples given in this section by saying that the exponential
function and its inverse, the logarithm function, play an important role in the math-
ematical modeling of the real world. Their mathematical properties will be studied
in detail in the following chapters.

1.5 Algebra of Functions

Let f and g be two functions with domains D(f) and D(g), respectively. They
can be added, subtracted, and multiplied on the intersection D(f) ∩ D(g) of their
domains, that is,

\[ (f + g)(x) = f(x) + g(x) \quad \text{(addition)}, \]
\[ (f - g)(x) = f(x) - g(x) \quad \text{(subtraction)}, \]
\[ (fg)(x) = f(x) g(x) \quad \text{(multiplication)}, \]

in particular, if α is any real number,

\[ (\alpha f)(x) = \alpha f(x) \quad \text{(scalar multiplication)}. \]

At points x where $g(x) \neq 0$, f can be divided by g,

\[ \left( \frac{f}{g} \right)(x) = \frac{f(x)}{g(x)} . \]

These statements may seem rather innocuous. Their mathematical meaning is as
follows: From two given functions f and g, we build a new function, which (in the
case of addition) we call f + g. Its value ( f + g)(x) at a point x is defined to be the
number f (x) + g(x), which we obtain by adding the values f (x) and g(x) of the
original functions.
The composition of two functions f and g is again a function. It is denoted by
f ◦ g and defined as
( f ◦ g)(x) = f (g(x)) .

In other words, we obtain its value ( f ◦ g)(x) by inserting the value g(x) as an
argument into f . The domain of f ◦ g is given by

D( f ◦ g) = {x : x ∈ D(g) and g(x) ∈ D( f )} .

As an example, consider f(x) = ln x and $g(x) = 4 - x^2$. We have $D(g) = \mathbb{R}$ and
D(f) = (0, ∞), hence $D(f \circ g) = \{x : 4 - x^2 > 0\} = \{x : |x| < 2\}$. The function
f ◦ g is given by $(f \circ g)(x) = f(g(x)) = \ln(4 - x^2)$.
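Composition and its domain condition can be illustrated in Python. This is a minimal sketch; the helper names `compose` and `in_domain` are hypothetical, and the domain test simply encodes the condition $4 - x^2 > 0$.

```python
import math


def g(x: float) -> float:
    return 4 - x**2


def f(x: float) -> float:
    # ln is defined only for positive arguments.
    if x <= 0:
        raise ValueError("ln x requires x > 0")
    return math.log(x)


def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def in_domain(x: float) -> bool:
    """True if x lies in D(f o g), that is, if g(x) lies in D(f)."""
    return g(x) > 0


f_after_g = compose(f, g)
print(f_after_g(0.0), in_domain(1.9), in_domain(2.0))
```

Evaluating `f_after_g` at a point outside the domain, for example at x = 2, raises an error, which mirrors the fact that the composition is not defined there.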

1.6 Proofs, Mathematical Induction

Mathematical results are based on proofs. The notion of a proof and its rules have
a long history, and they were formulated already by Greek philosophers and mathe-
maticians, for example, by Aristotle. In mathematics, we work in a deductive system,
where truth is argued on the basis of assumptions, definitions, and previously proved
results. No one can rightfully claim that a mathematical result is true without
clearly stating the basis, either implicitly or explicitly, on which the claim is made,
and presenting a proof.
In the previous sections, we already have stated definitions and theorems, and
drawn conclusions in order to prove a theorem or to get other results out of a theorem.
The ingredients of a theorem are its assumptions, the statements it claims to be true,
and the proof which shows that the statements indeed follow from the assumptions
in a strictly logical manner.
A simple way of stating a theorem is to say “if A holds then B holds,” or equiv-
alently “A implies B.” This means that A is the assumption, and B is the statement
which follows from the assumption. An equivalent formulation would be “A holds
only if B holds.”
Sometimes, a theorem tells us “if A holds then B holds” and conversely “if B holds
then A holds.” In this case, one usually merges those two statements and simply says
“A holds if and only if B holds” or, more briefly, “A holds iff B holds.” An example
would be the theorem “the number n is odd if and only if the number 3n is odd”,
which is true.
It may be noted that there may be more than one set of assumptions under which
a conclusion of a theorem holds. For example, if a and b are both positive, then ab
is positive, and if a and b are both negative, then ab is positive.
Let us present three common schemes of a proof. For the first two, we consider
the example theorem “if f is an even function, then f + 1 is an even function.”
Direct proof. Let f be an even function, let x be an arbitrary element of D( f ).
Then ( f + 1)(−x) = f (−x) + 1 = f (x) + 1 = ( f + 1)(x); therefore, f + 1 is an
even function.
Indirect proof (contraposition). Let the function f + 1 be not even. Then there
is an x ∈ D(f + 1) = D(f) such that $(f + 1)(-x) \neq (f + 1)(x)$. This implies
that $f(-x) + 1 \neq f(x) + 1$, hence, $f(-x) \neq f(x)$. Therefore, f is not an even
function.
The third scheme is a bit more involved, and it is the proof by contradiction.
Here, in order to prove “if A holds then B holds,” we assume that A holds, but B
does not hold. We proceed step by step and arrive at a contradiction. Since this is
not allowed in mathematics, the statement “A holds, but B does not hold” must be
wrong, so we conclude that if A holds, then B has to hold, too. Let us illustrate
this scheme by a famous example, the proof that $\sqrt{2}$ is irrational, which is due to
the Greek mathematician Euclid. Let us assume that $\sqrt{2}$ is rational. Then, there
are natural numbers p, q with $\sqrt{2} = p/q$. We cancel common factors and arrive at
$\sqrt{2} = r/s$, where r and s do not possess common factors. Squaring and rearranging,
we get $2s^2 = r^2$. Therefore, 2 is a factor of $r^2$ and hence of r, let r = 2t. Furthermore,
$2s^2 = 4t^2$, so $s^2 = 2t^2$, and 2 is a factor of s, too. But this means that r and s have the
common factor 2, a contradiction. (Note that in this example, B corresponds to the
statement “$\sqrt{2}$ is irrational,” while the role of A is played by the rules of computation
for rational numbers.)
Mathematical theorems often involve a statement about variables which may take
infinitely many different values. For example, the statement “ f is an even function”
means that f (−x) = f (x) holds for all (usually infinitely many) x ∈ D( f ). Another
statement of this type is the formula


\[ \sum_{k=1}^{n} k = \frac{n(n+1)}{2} , \tag{1.10} \]

which is true for all natural numbers n. A way to prove such a statement is offered
by the principle of mathematical induction. In abstract form, it says
Let S be a set of positive integers. If (i) 1 ∈ S and (ii) n ∈ S implies that n + 1 ∈ S, then all
positive integers are in S.

In other words, if a statement involving n is true for n = 1, and furthermore it is true


for n + 1 whenever it is true for n, then it is true for all values of n.
Let us illustrate the principle of mathematical induction with the formula (1.10).
Since

\[ 1 = \frac{1(1+1)}{2} , \]

it is true for n = 1. Now assume that it is true for n. We then have

\[ \sum_{k=1}^{n+1} k = (n+1) + \sum_{k=1}^{n} k = n + 1 + \frac{n(n+1)}{2} = \frac{(n+2)(n+1)}{2} , \]

so it is true for n + 1. The principle of mathematical induction now implies that
(1.10) is true for all natural numbers n.
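A computer can test formula (1.10) for finitely many values of n, but such a test is not a proof; only the induction argument covers all natural numbers. With this caveat, a quick Python check (function names are our own) may look as follows.

```python
def sum_first(n: int) -> int:
    """Left-hand side of (1.10): 1 + 2 + ... + n, computed term by term."""
    return sum(range(1, n + 1))


def closed_form(n: int) -> int:
    """Right-hand side of (1.10): n(n + 1)/2."""
    return n * (n + 1) // 2


# Compare both sides for n = 1, ..., 1000.
all_equal = all(sum_first(n) == closed_form(n) for n in range(1, 1001))
print(all_equal)
```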

1.7 Geometric Transformation of Functions

Translations of Functions
Suppose that y = f (x) is a function and c > 0.

Translate Graph Horizontally


When you subtract a positive number c from x, you are translating horizontally the
graph of the function c units to the right, that is, the graph of y = f (x − c) is obtained
by translating the graph of y = f (x), c units to the right. The graph of y = f (x + c)
is obtained by translating the graph of y = f (x), c units to the left.

Translate Graph Vertically


The graph of y = f (x) + c is obtained by translating the graph of y = f (x), c units
upward, that is, when you add a positive number c to a function you are translating
vertically the graph of the function c units upward.
The graph of y = f (x) − c is obtained by translating the graph of y = f (x), c
units downward (Fig. 1.25).

Fig. 1.25 Vertical translation and stretching of functions



Compression and Stretching of Functions


Suppose that y = f (x) is a function and c > 1 (Fig. 1.25, 5th graph).

Horizontal Stretching or Compression


The graph of y = f(cx) is obtained by compressing horizontally the graph of
y = f(x) by a factor of c. The graph of $y = f(x/c)$ is obtained by stretching
horizontally the graph of y = f(x) by a factor of c.

Vertical Stretching or Compression

The graph of y = c f(x) is obtained by stretching vertically the graph of y = f(x) by
a factor of c. The graph of $y = \frac{1}{c} f(x)$ is obtained by compressing vertically
the graph of y = f(x) by a factor of c.

Reflection
Suppose that y = f (x) is a function, then the graph of y = f (−x) is obtained by
reflecting the graph of y = f (x) across the y-axis. The graph of y = − f (x) is
obtained by reflecting the graph of y = f (x) across the x-axis.
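The transformations of this section all turn a given function into a new one, which can be expressed directly in Python. The helper names below are hypothetical, and the sample function `square` is chosen only for illustration.

```python
def translate_right(f, c):
    """Graph of f shifted c units to the right: x -> f(x - c)."""
    return lambda x: f(x - c)


def translate_up(f, c):
    """Graph of f shifted c units upward: x -> f(x) + c."""
    return lambda x: f(x) + c


def compress_horizontally(f, c):
    """Graph of f compressed horizontally by the factor c: x -> f(c x)."""
    return lambda x: f(c * x)


def stretch_vertically(f, c):
    """Graph of f stretched vertically by the factor c: x -> c f(x)."""
    return lambda x: c * f(x)


def reflect_across_y_axis(f):
    return lambda x: f(-x)


def reflect_across_x_axis(f):
    return lambda x: -f(x)


def square(x):
    return x**2


# The vertex of the parabola moves from x = 0 to x = 3.
shifted = translate_right(square, 3)
print(shifted(3), shifted(4))
```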

1.8 Exercises

1.8.1 Let f (x) = x1 . Find f ( 35 ), f (− 27 ).


1.8.2 Let f (x) = |x| − x. Find f (2), f (−2), f (50), f (−40).
1.8.3 Find out for which numbers x the function $f(x) = \frac{1}{x^2 - 3}$ is defined. What is
the value of this function for x = 5?
1.8.4 Find the domain and range of the function

(a) f (x) = √x + 5
(b) f (x) = x − 3
(c) f (x) = √9−x1
2

(d) f (x) = |x − 2|
(e) f (x) = x 2 − x − 6.
1.8.5 Indian postal service regulation requires that the length plus the girth (the
perimeter of a cross section) of a package for mailing cannot exceed 208
mm. A rectangular box with square end is designed to meet the regulation
exactly (see Fig. 1.26). Write down the volume V of the box as a function of
the edge length of the square end and give the domain of this function.
1.8.6 A soft drink manufacturer wants to fabricate cylindrical cans for its product
(see Fig. 1.27). The can is to have a volume of 12 fluid ounces, which is
approximately 22 cubic mm. Express the total surface area S of the can as a
function of its radius and give domain of this function.
the values of a x for√a = 2 and x = 4; a√= 5 and x = −1; a = 2 and
1.8.7 Find √
x = 2; a = e and x = 2; a = e and x = π.

1.8.8 Examine whether the following functions are odd, even, or neither.
(a) f (x) = x −5 , (b) f (x) = x 4 + 3x 2 − 1, (c) f (x) = x 2x−1 , (d) f (t) = |t 3 |,

(e) h(t) = t 4 + 3.
1.8.9 Let f (x) = x 2 + x + 1. Find f (x − 4), f (x + 4), f ( 21 x), f (2x − 4).
1.8.10 Let f (x) = x + 6 and g(x) = x 2 − 4, find
(a) f (g(x)), (b) f ( f (2)), (c) g(g(3)), (d) f ( f (x)), (e)g( f (x)).
1.8.11 Let f(x) = x + 1 and $g(x) = \frac{1}{x+1}$, find
(a) $g(f(\frac{1}{2}))$, (b) f(f(x)), (c) g(g(x)), (d) $f(g(\frac{1}{3}))$.

1.8.12 (a) Let ( f ◦ g)(x) = x, where g(x) = x1 , find f (x). (◦ denotes composi-
tion.)
(b) Let f (x) = x−1 x
, g(x) = x−1 x
, find ( f ◦ g)(x).
(c) Let ( f ◦ g)(x) = x, where f (x) = 1 + x1 , find g(x).
1.8.13 Draw the graph of the following function:

\[ f(x) = \frac{|x|}{x} , \quad x \neq 0 , \qquad f(0) = 0 . \]

1.8.14 Draw the graph of the function $f(x) = a e^{b(x-c)} + d$ for different values
of a, b, c, and d.

1.8.15 (a) Introduce the concept of random variable with the help of physical exam-
ples.
(b) Give examples of a function of random variable (stochastic function).
1.8.16 The average rate of change of a function y = f(t) between time $t = t_0$ and
$t = t_1$ is defined as $\frac{\Delta y}{\Delta t} = \frac{f(t_1) - f(t_0)}{t_1 - t_0}$. As we know, a linear function has the
form

\[ y = f(t) = mt + b , \]

where $m = \frac{f(t_1) - f(t_0)}{t_1 - t_0}$ is the slope or rate of change of y with respect to t,
and b is the vertical intercept or value of the function when t = 0. Suppose
the solid waste generated each year in the cities of a country is a linear func-
tion of time, namely, y = mt + b. The amount of the solid waste was 82.3
million tons in 1960 and 139.1 in 1980. Find the linear function modeling
this situation and use this model to predict the amount of the solid waste in
2020.

1.8.17 (a) The data in the following table lie on a line. Find a function y in terms
of x (Table 1.1).
(b) Which of the following tables of values could represent linear equations
(Tables 1.2, 1.3, 1.4 and 1.5).
1.8.18 A company wants to understand the relationship between the amount a spent
on advertising and total sale S. The data collected is given in the following
table: If an amount of 3500 Euro is spent on advertising, predict the sale.

1.8.19 The relationship between Fahrenheit temperature and Celsius temperature


is linear, that is, F = mC + b. Find the Celsius equivalent of 90 ◦ F and the
Fahrenheit equivalent of −5 ◦ C.
1.8.20 An open box is to be made from a rectangular piece of cardboard of dimen-
sions 0 cm × 30 cm by cutting out identical squares of area x 2 from each
corner turning up the sides. Express the volume V of the box as a function
of x.
1.8.21 An aquarium, open on top, of height 1.5 m is to have a volume of 6 m3 . Let
x denote the length of the base and let y denote the width.
(a) Express y as a function of x.
(b) Express the total number S of square meters of glass needed as a function
of x.
1.8.22 Let the shape of a spacecraft be a frustum of a right circular cone, a solid formed
by truncating a cone by a plane parallel to its base, see Fig. 1.28.
The radii a and b of the lower and upper part, respectively, are given.
(a) Express y as a function of the height h.
(b) Express the volume of the frustum as a function of h.
(c) Given a = 6 m, b = 3 m and V = 600 m3 , find h.
1.8.23 Explain the concept of periodic functions with the help of physical phenom-
ena.
1.8.24 For any periodic function, the amplitude is defined as the half of the difference
between its maximum and minimum values. The period is the time for the
function to execute one complete cycle. Sketch the graph of y = 3 sin 2t and
use the graph to determine the amplitude and period.
1.8.25 The constants A and B in the equation y = A sin Bt are called parame-
ters. The amplitude is determined by the parameter A, while the period
is determined by the parameter B. Functions y = A sin Bt + C and y =
A cos Bt + C are periodic with period $\frac{2\pi}{|B|}$ and amplitude |A|, where C is the
vertical shift.
Assume that on February 15, 1995, high tide in the Indian Ocean was at midnight.
The height of the water in the Bombay harbor is a periodic function, since
it oscillates between high and low tide. The height is approximated by the
function π
y = 4 − 9 cos t +5
6
where t is time in hours since midnight on February 15, 1995.

(a) Sketch a graph of this function on February 15, 1995 (from t = 0 to t = 24).
(b) What was the water level at high tide?
(c) When was low tide, and what was the water level at that time?
(d) What is the period of this function, and what does it represent in terms of
tides?
(e) What is the amplitude of this function and what does it represent in terms of
tides?

Fig. 1.26 A rectangular box with square end

Fig. 1.27 A cylindrical can

Table 1.1 Tabular representation of a function


x 10 20 30 40
y 200 180 160 140

Table 1.2 Does this table represent a linear function?


t 40 60 80 100
f (t) 2.4 2.2 2 1.8

Table 1.3 Does this table represent a linear function?


x 0 2 4 6
g(x) 10 16 26 40

Table 1.4 Does this table represent a linear function?


t 5 10 15 20
h(t) 100 90 80 70

Table 1.5 Does this table represent a linear function?


a (advertisement) 6 8 10 12
S (sales) 200 240 280 320
Fig. 1.28 A frustum of a circular cone

Chapter 2
Limit and Continuity

In this chapter, we introduce the concept of the limit of a function. This concept
bridges the gap between the areas of algebra, geometry, and calculus. The limit
is the most important concept of calculus; without limits, calculus simply does not
exist. Practically every notion of calculus is a limit in one sense or another. Physical
concepts and real-world situations can be expressed as limits of certain functions.
For example, a circle is the limit of a polygon, the length of a curve is the limit of the
lengths of polygonal paths, the area of a region bounded by a curve is the limit of
the sum of areas of approximating rectangles, and instantaneous velocity (velocity
at a particular time) is the limit of average velocities. We also discuss the concept of
continuous and discontinuous functions in terms of limit.

2.1 Idea and Definition of the Limit

Originally, calculus was developed in order to solve problems of the following type.

Problem 2.1 A car moves on a straight line in such a way that its position s at time
t is given by the equation s = f (t), where f is a given function. Find the velocity
of the car at any time t.

Problem 2.2 Let f be a given function, find the area bounded by the graph y =
f (x), the x-axis and the vertical lines x = a and x = b.

Problem 2.1 is solved by the process of differentiation (Chap. 3), and Problem 2.2
by the process of integration (Chap. 6).
Both processes are based on the concept of a limit. Its mathematical definition
gives a refined and formally precise meaning to an intuitive notion that occurs fre-
quently in our everyday lives. Before giving this definition, let us first illustrate the
concept with two examples. The first example involves the notion of instantaneous
velocity.

© Springer Nature Singapore Pte Ltd. 2019
M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_2

Fig. 2.1 (a) A particle moving along a straight line. (b) The position of the particle
as a function of time

Example 2.1 (Instantaneous velocity as a limit) Consider an object (car, rocket,


proton) moving along a straight line. Suppose we know the position s (relative to a
fixed origin) of the object at each moment of time t, that is, s = f (t) is given. Our
experience with moving objects provides us with an intuitive idea of velocity, but what
precisely does it mean to speak of the velocity of an object at time t (instantaneous
velocity at t), and how can we calculate this quantity? Let the motion of the particle
be given in Fig. 2.1a with the corresponding graph of f shown in Fig. 2.1b. Let us fix
a time t0 , the object being at position s0 = f (t0 ). For any other time t, the average
velocity during the time interval between t0 and t is given by

\[ v_{av}(t_0, t) = \frac{s - s_0}{t - t_0} = \frac{f(t) - f(t_0)}{t - t_0} , \tag{2.1} \]

which is the ratio of the total distance traveled and the total time elapsed. For the
situation shown in Fig. 2.1a, b, the average velocity may be either positive, negative,
or zero, depending on the choice of t0 and t.
Usually, more important than the average velocity is the instantaneous veloc-
ity, displayed, for example, on the speedometer in a moving car. Intuitively, the
instantaneous velocity v(t0 ) should be obtained from the average velocity when t
is close to t0 . We expect that the closer we choose t near t0 , the closer the average
velocity vav (t0 , t) will be to the instantaneous velocity v(t0 ), or in other words, “as t
approaches t0 , vav (t0 , t) will approach v(t0 )”. With the definition of and the calculus
rules for the limit, to be described below, mathematics provides a precise notion and
ways to compute the instantaneous velocity.
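The behavior of the average velocity (2.1) as t approaches $t_0$ can be observed numerically. In the following Python sketch, the position function $f(t) = t^2$ and the time $t_0 = 1$ are assumptions chosen for illustration; they are not taken from the text.

```python
def average_velocity(f, t0: float, t: float) -> float:
    """Average velocity (f(t) - f(t0)) / (t - t0) as in (2.1)."""
    if t == t0:
        raise ValueError("t must differ from t0")
    return (f(t) - f(t0)) / (t - t0)


def position(t: float) -> float:
    # Assumed sample motion s = t^2.
    return t**2


# Average velocities over shorter and shorter intervals starting at t0 = 1.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, average_velocity(position, 1.0, 1.0 + h))
```

For this sample motion the average velocity over the interval from 1 to 1 + h equals 2 + h, so the printed values settle near 2 as h becomes small.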
Fig. 2.2 Path of the object (projectile)

While obviously important, the notion of instantaneous velocity is already a somewhat
complicated example of a limit, namely the derivative of a function. This will
be discussed in the next chapter. Instead, we now present a second (simpler) example
which will lead to the general definition of the limit in a straightforward manner.

Example 2.2 Suppose an object is moving through the air toward a solid wall, and
we want to find out at which height L it hits the wall. See Fig. 2.2. Let the wall be
represented by the vertical line x = c (we look at it from the side), and suppose the
path of the object in the air is described by a function y = f (x), where x < c (the
object comes from the left). As x comes closer and closer to c, we expect the height
f (x) of the object to approach a number L, the height at which the object hits the
wall. This expectation is natural if we imagine the object to move “continuously”
through the air, without being “teleported” suddenly from one point to another or
undergoing erratic fluctuations. We use the notation

\[ \lim_{x \to c^-} f(x) = L . \tag{2.2} \]

This is read as “the limit of f (x), as x approaches c from the left, equals L” or “the
left-sided limit of f (x) at x = c equals L”, and in this example, it expresses the fact
that the object hits the wall at the height L. See Fig. 2.3. The minus sign within the
expression “$x \to c^-$” indicates that we consider values of x which are close to c, but
always strictly less than c.
In the same manner, we may think of an object approaching the wall x = c from
the right along a path y = f (x), where now x is taken to be larger than c, see Fig. 2.4.
If the number M equals the height at which the object hits the wall, we write

\[ \lim_{x \to c^+} f(x) = M \tag{2.3} \]

to denote the right-sided limit of f(x) at x = c. The only difference to (2.2), besides
using a different letter for the height, is the expression “$x \to c^+$”, where the plus
sign indicates that we consider values of x close to c, but always strictly larger
than c.

Fig. 2.3 Left-sided limit

Fig. 2.4 Right-sided limit

We now take a close look at the above description of the limit process: “As x
approaches c from the left, f (x) approaches L.” We try to rephrase this as follows:
We can enforce f (x) to deviate from L by an amount less than a given ε > 0, if we restrict
x to be taken from the open interval (c − δ, c) with a sufficiently small δ > 0.

Since we want f (x) to come “arbitrarily close” to L, we have to require that the
quoted property holds no matter how small ε is prescribed.

Definition 2.1 (Left-sided Limit) We say that

\[ \lim_{x \to c^-} f(x) = L , \tag{2.4} \]

if for every ε > 0 there is a δ > 0 such that: For all x with c − δ < x < c, we have
|f(x) − L| < ε. On the other hand, if there is no number L for which this is true,
we say that $\lim_{x \to c^-} f(x)$ does not exist.

This definition looks very clumsy and complicated. It usually takes some time to
comprehend, and no harm is done if one reads on, coming back to it at some later
time. On the other hand, it provides a firm foundation for the treatment of limits no
matter how they occur. Indeed, it emerged more than 100 years ago as the final result
of a long scientific struggle, when it was realized how easily one can arrive at wrong
conclusions if one does not have a precise definition of the limit. This is even more
important today, when mathematics lies at the heart of the description of complicated
processes, and more and more trust is placed in machines and computers to work
correctly.
The definition of the right-sided limit

\[ \lim_{x \to c^+} f(x) = M \tag{2.5} \]

is exactly the same as Definition 2.1, except that the phrase “c − δ < x < c” is
replaced by “c < x < c + δ”, and L is replaced by M. Instead of “left-sided limit”
and “right-sided limit”, one may also use the notions “left hand limit” and “right hand
limit”. The notion one-sided limit refers to either the left-sided or the right-sided
limit.
If both the left-sided and the right-sided limits exist and are moreover equal, that
is, if
\[ \lim_{x \to c^-} f(x) = L = \lim_{x \to c^+} f(x) , \tag{2.6} \]

then we say that f has a two-sided limit or simply a limit at x = c and write

\[ \lim_{x \to c} f(x) = L . \tag{2.7} \]

Equivalently, we may write the definition of the limit in a way similar to Definition
2.1.

Definition 2.2 (Limit) We say that

\[ \lim_{x \to c} f(x) = L , \tag{2.8} \]

if for every ε > 0 there is a δ > 0 such that: For all x with c − δ < x < c + δ and
$x \neq c$, we have |f(x) − L| < ε. On the other hand, if there is no number L for which
this is true, we say that $\lim_{x \to c} f(x)$ does not exist.
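The interplay of ε and δ in this definition can be explored numerically. The Python sketch below samples points x with 0 < |x − c| < δ and tests whether |f(x) − L| < ε holds at each of them. Such sampling can expose a δ that is too large, but it cannot prove that a given δ works. The function f(x) = 2x + 1 with c = 1 and L = 3, as well as the name `check_delta`, are assumptions made for illustration.

```python
def check_delta(f, c, limit, eps, delta, samples=1000):
    """Test |f(x) - limit| < eps at sample points with 0 < |x - c| < delta."""
    for i in range(1, samples + 1):
        offset = delta * i / (samples + 1)  # 0 < offset < delta
        for x in (c - offset, c + offset):
            if abs(f(x) - limit) >= eps:
                return False
    return True


def f(x):
    return 2 * x + 1


# For this f we have |f(x) - 3| = 2|x - 1|, so delta = eps / 2 suffices.
print(check_delta(f, 1.0, 3.0, eps=0.1, delta=0.05))
# A larger delta fails at some sample points.
print(check_delta(f, 1.0, 3.0, eps=0.1, delta=0.1))
```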

Example 2.3 1. Let us consider the Heaviside function given by

\[ f(x) = \begin{cases} 1, & x \geq 0 , \\ 0, & x < 0 . \end{cases} \tag{2.9} \]

Here, f(x) approaches 0 (in fact, is equal to 0) as x approaches 0 from the left,
that is

\[ \lim_{x \to 0^-} f(x) = 0 , \]

and f(x) is equal to 1 as x approaches 0 from the right, that is

\[ \lim_{x \to 0^+} f(x) = 1 . \]

Note that lim x→0 f (x) does not exist, since the left-sided limit and the right-sided
limit are different from each other.
2. The function f(x) = [x], denoting the integer part of x, possesses both one-sided
limits at every real number x. They are equal if x is not an integer,
but they differ whenever x is an integer, for example,

$$\lim_{x \to 1^+} [x] = 1, \qquad \lim_{x \to 1^-} [x] = 0.$$
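
The one-sided limits of the integer part function can be illustrated by evaluating it at points approaching 1 from either side. This is a small sketch, assuming that `math.floor` plays the role of [x]; the helper `one_sided_values` is an illustrative name.

```python
import math


def one_sided_values(f, c, side, steps=6):
    """Values of f at c + side * 10**(-k) for k = 1, ..., steps.

    side = +1 approaches c from the right, side = -1 from the left.
    """
    return [f(c + side * 10.0 ** (-k)) for k in range(1, steps + 1)]


print(one_sided_values(math.floor, 1.0, +1))  # values from the right of 1
print(one_sided_values(math.floor, 1.0, -1))  # values from the left of 1
```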

Remark 2.1 1. A one-sided or two-sided limit may or may not exist, but if it exists,
it is uniquely determined. In other words, there can be at most one number L
(or M) which has the property required in the definitions above.
2. For the question whether f has a one-sided or two-sided limit at a point c and
what its value is, it does not matter whether f is defined at the point c itself or
what the value f (c) is. For the Heaviside function in (2.9), we have f (0) = 1
which happens to be equal to the right-sided, but unequal to the left-sided limit.
3. One also considers limits where either x → ∞ or f (x) → ∞ or both. These are
called improper limits and will be treated in Sect. 2.4.

Remark 2.2 The following four statements are equivalent:


1. lim x→c f (x) = L
2. lim h→0 f (c + h) = L
3. lim x→c ( f (x) − L) = 0
4. lim x→c | f (x) − L| = 0

2.2 Evaluating Limits

We summarize important properties of limits in the form of the following rules. With
their aid, the knowledge of some elementary limits can be used to compute the limits
of more and more complicated functions. Their proofs are based upon the exact
definition of the limit as presented in the previous section. Interested readers may
find them through solving selected exercises, given at the end of this chapter.
The simplest limit is that of a constant function, say f (x) = A for all x. We have

$$\lim_{x \to c} f(x) = \lim_{x \to c} A = A,$$

no matter at which point c we take the limit. Thus, the limit of a constant function
with value A is equal to A itself. As an example, let f (x) = 100, then lim x→3 f (x) =
lim x→3 100 = 100.
Next, we consider the function f(x) = x. For all numbers c, we have

$$\lim_{x \to c} f(x) = \lim_{x \to c} x = c.$$

Now, we look at elementary algebraic operations.

Theorem 2.1 Let f and g be functions such that lim x→c f(x) = L and lim x→c g(x) = M
at some given point c. Then

$$\lim_{x \to c} [f(x) + g(x)] = \lim_{x \to c} f(x) + \lim_{x \to c} g(x) = L + M, \tag{2.10}$$
$$\lim_{x \to c} \alpha f(x) = \alpha \lim_{x \to c} f(x) = \alpha L, \quad \text{for any constant } \alpha, \tag{2.11}$$
$$\lim_{x \to c} [f(x) g(x)] = \Big( \lim_{x \to c} f(x) \Big) \cdot \Big( \lim_{x \to c} g(x) \Big) = LM, \tag{2.12}$$
$$\lim_{x \to c} \frac{f(x)}{g(x)} = \frac{L}{M}, \quad \text{provided } M \ne 0. \tag{2.13}$$

Rules (2.10)–(2.13) mean that we can interchange the limit with the elementary
algebraic operations. This can be done repeatedly, as in the following examples.

Example 2.4 1. Let f(x) = 3x + 5. Then

$$\lim_{x \to 2} f(x) = \lim_{x \to 2} (3x + 5) = \lim_{x \to 2} 3x + \lim_{x \to 2} 5 = 3 \cdot \lim_{x \to 2} x + 5 = 3 \cdot 2 + 5 = 11.$$

2. Let $P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ be a polynomial. Then lim x→c P(x) = P(c), since

$$\lim_{x \to c} P(x) = \lim_{x \to c} \big( a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \big) = a_0 + a_1 c + a_2 c^2 + \cdots + a_n c^n = P(c).$$

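3. Formula (2.13) for the quotient does not work when the limit M of the denominator is zero.
But sometimes this can be avoided through cancellation. For example, let

$$f(x) = \frac{x^2 - 16}{x - 4},$$

and let c = 4. The denominator tends to 0 as x tends to 4, but for x ≠ 4 we may cancel the factor x − 4. Then

$$\lim_{x \to 4} \frac{x^2 - 16}{x - 4} = \lim_{x \to 4} \frac{(x - 4)(x + 4)}{x - 4} = \lim_{x \to 4} (x + 4) = 8.$$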

4. Consider a rational function

$$R(x) = \frac{P(x)}{Q(x)}, \quad \text{where } P \text{ and } Q \text{ are polynomials.}$$

Then

$$\lim_{x \to c} R(x) = \frac{P(c)}{Q(c)}, \quad \text{provided } Q(c) \ne 0.$$
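
The substitution rule for polynomials and rational functions, and the behaviour of the quotient (x² − 16)/(x − 4) near the zero x = 4 of its denominator, can be illustrated numerically. The sketch below is an addition to the text; the helper names `poly` and `rational` and the coefficient-list representation are assumptions made for illustration.

```python
def poly(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n by Horner's scheme; coeffs = [a0, ..., an]."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def rational(p, q, x):
    """Evaluate P(x)/Q(x); only meaningful where Q(x) != 0."""
    return poly(p, x) / poly(q, x)


p = [-16.0, 0.0, 1.0]  # x**2 - 16
q = [-4.0, 1.0]        # x - 4

# Where Q(c) != 0, the limit is obtained by substitution:
print(rational(p, q, 3.0))

# At c = 4 the quotient is undefined, but nearby values settle towards 4 + 4:
for h in (1e-1, 1e-3, 1e-5):
    print(h, rational(p, q, 4 - h), rational(p, q, 4 + h))
```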

Roots can also be interchanged with limits.


Theorem 2.2 For every natural number n,
$$\lim_{x \to c} \sqrt[n]{x} = \sqrt[n]{c} \tag{2.14}$$

holds for all real numbers c if n is odd, and for all real numbers c ≥ 0 if n is even.
(In the latter case, if c = 0, we have to replace the limit by the right-sided limit.)

Theorem 2.3 If lim x→c f (x) = L, then lim x→c | f (x)| = |L|. For L = 0, the con-
verse holds: If lim x→c | f (x)| = 0, then lim x→c f (x) = 0.

Example 2.5 1. For the absolute value, we have lim x→0 |x| = 0.
2. For the sign function, we have

$$\operatorname{sgn}(x) = \frac{x}{|x|}, \quad x \ne 0,$$

so

$$\lim_{x \to 0^-} \operatorname{sgn}(x) = -1, \qquad \lim_{x \to 0^+} \operatorname{sgn}(x) = 1,$$

and therefore lim x→0 sgn(x) does not exist. Since on the other hand |sgn(x)| = 1
for x ≠ 0, this shows that in Theorem 2.3, the converse statement may not hold
if L ≠ 0.

Theorem 2.4 (Composition) Let functions f and g be given with suitable domains
such that the composition g ◦ f is defined. If

$$\lim_{x \to c} f(x) = L, \qquad \lim_{y \to L} g(y) = M,$$

then

$$\lim_{x \to c} g(f(x)) = M.$$

Theorem 2.5 (Sandwich Theorem) Let the functions f, g, h have the property that
g(x) ≤ f (x) ≤ h(x) in some interval I which contains the point c. If

$$\lim_{x \to c} g(x) = L = \lim_{x \to c} h(x),$$

then also

$$\lim_{x \to c} f(x) = L.$$

The rules above (Theorems 2.1–2.5) are formulated for the limit (that is, the two-sided
limit), but they are equally valid for the one-sided limits. One just has to replace
the phrase “x → c” by “x → c−” or “x → c+”, respectively.

Example 2.6 1. We want to determine lim x→0 sin x. We know that 0 ≤ | sin x| ≤
|x|. Since lim x→0 |x| = 0, see Example 2.5, we conclude from the Sandwich
Theorem that lim x→0 | sin x| = 0 and hence, lim x→0 sin x = 0 by Theorem 2.3.
2. We want to determine lim x→0 cos x. We know that $\sin^2 x + \cos^2 x = 1$. For x
near 0, the cosine is positive, and therefore we have $\cos x = \sqrt{1 - \sin^2 x}$. As
x tends to 0, sin x tends to 0, thus $\sin^2 x$ tends to 0. From Theorems 2.2 and 2.4,
we see that cos x tends to 1. So lim x→0 cos x = 1.
3. We want to determine $\lim_{x \to 0} \frac{\sin x}{x}$. We cannot use the rule (2.13) for the quotient,
because both denominator and numerator tend to 0 as x tends to 0. On the other
hand, from Theorem C.1 in Appendix C.2 we have, for x close to 0,

$$\cos x < \frac{\sin x}{x} < 1.$$

Since lim x→0 cos x = 1 as shown just above, we obtain from the Sandwich Theorem
that

$$\lim_{x \to 0} \frac{\sin x}{x} = 1.$$
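
The sandwich inequality cos x < (sin x)/x < 1 can be checked at a few sample points. This is a numerical illustration only, not a proof; the helper name `sandwich_holds` is an assumption of this sketch, and the sample points are kept away from extremely small |x|, where floating-point rounding would make the ratio indistinguishable from 1.

```python
import math


def sandwich_holds(x):
    """Check cos x < sin(x)/x < 1 at a single nonzero point x."""
    ratio = math.sin(x) / x
    return math.cos(x) < ratio < 1.0


for x in (0.5, 0.1, 0.01, -0.01, -0.5):
    print(x, math.cos(x), math.sin(x) / x, sandwich_holds(x))
```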

Example 2.7 This example is more complicated. We investigate whether

$$\lim_{x \to 0} \sin \frac{\pi}{x} \tag{2.15}$$

exists. The function f(x) = sin(π/x) is defined for all x ≠ 0; that it is not defined for
x = 0 does not matter for the investigation of the limit. The values of f at successive
points x = 1, x = 1/2, x = 1/3, …, coming closer and closer to 0, are

$$\sin \frac{\pi}{1} = \sin \pi = 0, \qquad \sin \frac{\pi}{1/2} = \sin 2\pi = 0, \qquad \sin \frac{\pi}{1/3} = \sin 3\pi = 0,$$

and so on. One might be misled to conclude that the limit (2.15) exists and is equal
to 0. But if we compute the values of f at the successive points x = 2, x = 2/5, x = 2/9,
x = 2/13, …, we see that those points, too, come closer and closer to 0. However,

$$\sin \frac{\pi}{2} = 1, \qquad \sin \frac{\pi}{2/5} = \sin \frac{5\pi}{2} = 1, \qquad \sin \frac{\pi}{2/9} = \sin \frac{9\pi}{2} = 1,$$

and so on. The graph of f is given in Fig. 2.5. It shows that the values of f oscillate
between 1 and −1 faster and faster as x approaches 0, indeed an infinite number of
times on the open interval (0, 1). The values of f do not approach a fixed number
L, and the limit does not exist.

Fig. 2.5 Graph of y = sin(π/x)
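
The two sequences used in Example 2.7 can be evaluated directly. In this sketch the function names `zeros_sequence` and `ones_sequence` are illustrative; because of floating-point rounding, the computed values are only approximately 0 and 1.

```python
import math


def f(x):
    """f(x) = sin(pi / x), defined for x != 0."""
    return math.sin(math.pi / x)


def zeros_sequence(n_terms):
    """Points x = 1/n, n = 1, 2, ..., where f vanishes (up to rounding)."""
    return [1.0 / n for n in range(1, n_terms + 1)]


def ones_sequence(n_terms):
    """Points x = 2/(4k + 1), k = 0, 1, ..., where f equals 1 (up to rounding)."""
    return [2.0 / (4 * k + 1) for k in range(n_terms)]


for x0, x1 in zip(zeros_sequence(5), ones_sequence(5)):
    print(f"x = {x0:.4f}: f = {f(x0):+.3f}    x = {x1:.4f}: f = {f(x1):+.3f}")
```

Both sequences tend to 0, yet the function values stay near 0 along the first and near 1 along the second, which is incompatible with the existence of a single limit.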

2.3 Continuous Functions

In daily life, when we speak of a continuous process, we mean that it goes some-
what smoothly, without any disruption or abrupt changes. In mathematics, the word
“continuous” has much the same meaning. Intuitively, we imagine a function f to be
continuous, if we can draw its graph in the x-y-plane in a single continuous move-
ment. Thus, when x is close to a point c in the domain of f , the function values f (x)
should be close to f (c). In terms of limits this means that f has a limit at c, which
is equal to the value f (c) of the function at this point.
A formal statement of the concept of continuity is the following.
Definition 2.3 (Continuous Function, Open Interval) Let f be a function defined
on an open interval I = (a, b). We say that f is continuous at a point c ∈ I , if

$$\lim_{x \to c} f(x) = f(c). \tag{2.16}$$

We say that f is continuous on I = (a, b), if it is continuous at every point c ∈ I.
If moreover f is defined at the left endpoint a, we say that f is right-continuous at
a, if

$$\lim_{x \to a^+} f(x) = f(a). \tag{2.17}$$

Correspondingly, if f is defined at the right endpoint b, we say that f is left-continuous
at b, if

$$\lim_{x \to b^-} f(x) = f(b). \tag{2.18}$$

Fig. 2.6 Finite discontinuities

Definition 2.4 (Continuous Function, Closed Interval) Let f be a function defined
on a closed interval [a, b]. We say that f is continuous on [a, b], if it is continuous
on (a, b), and moreover right-continuous at a and left-continuous at b.
If f is not continuous at a point c, we say that f is discontinuous at c, or that c is a
point of discontinuity of f. We distinguish the following situations.
If f has a limit as x approaches c (that is, lim x→c f(x) exists), but this limit is
not equal to f(c), then f is said to have a removable discontinuity at c. In fact,
in this case, the discontinuity at c can be removed by redefining f(c) as f(c) =
lim x→c f(x). For example, the function

$$f(x) = \begin{cases} 1, & x \ne 0, \\ 2, & x = 0, \end{cases}$$

is discontinuous at c = 0, but redefining it as f(0) = 1 makes it continuous.

If lim x→c− f(x) and lim x→c+ f(x) exist but are not equal, then f is said to have
a jump discontinuity at c. An example is given by the Heaviside function

$$f(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

which has a jump discontinuity at c = 0. (For other examples, see Fig. 2.6.)
However, the situation can be even worse, since it may well happen that one or
both of the one-sided (left or right) limits do not exist. For example, consider the
Dirichlet function,

$$f(x) = \begin{cases} 1, & x \text{ is a rational number,} \\ 0, & x \text{ is an irrational number.} \end{cases}$$

This function is discontinuous everywhere. Neither the left-sided nor the right-sided
limit of f exists at any point, since for any number c there are both rational and
irrational numbers arbitrarily close to c, both from the right and from the left.
Properties of continuous functions. By the definitions above, continuity is for-
mulated in terms of limits. Therefore, the properties of limits as discussed in the
preceding section give rise to corresponding properties of continuous functions.
Theorem 2.6 Let f and g be continuous functions defined on an interval I , where
I = (a, b) or I = [a, b]. Then
1. f + g is continuous on I ,
2. f − g is continuous on I ,
3. α f is continuous on I for each number α,
4. f · g is continuous on I ,
5. f/g is continuous at all points c ∈ I where g(c) ≠ 0.
Let f and g be as above, except that the domain of g is now assumed to contain f(I). Then
6. The composition g ◦ f is continuous on I .
Many of the functions which have been discussed in Chap. 1 are continuous on their
domain of definition. Among them are polynomials, rational functions, trigonometric
functions, exponential functions, and logarithmic functions. We will not give explicit
proofs here. Their continuity will arise as a byproduct of other results in later chapters.

Theorem 2.7 (Intermediate Value Theorem) Let f be a function which is continuous
on a closed interval [a, b], and let m be any number between f(a) and f(b) (either f(a) < m <
f(b) or f(a) > m > f(b)). Then there is at least one number c in the interval (a, b)
such that f(c) = m.
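
The Intermediate Value Theorem only asserts that such a point c exists. One common way to approximate it numerically is bisection, which is not discussed in the text above and is shown here as a hedged sketch; the function name `level_point` and the tolerance are assumptions made for illustration.

```python
def level_point(f, a, b, m, tol=1e-10):
    """Approximate some c in [a, b] with f(c) = m by repeated halving.

    Assumes f is continuous on [a, b] and m lies between f(a) and f(b).
    """
    fa, fb = f(a) - m, f(b) - m
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0:
        raise ValueError("m must lie between f(a) and f(b)")
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = f(mid) - m
        if fm == 0.0:
            return mid
        if fa * fm < 0:
            b = mid            # f - m changes sign on [a, mid]
        else:
            a, fa = mid, fm    # f - m changes sign on [mid, b]
    return 0.5 * (a + b)


def cubic(x):
    return x ** 3 - x


c = level_point(cubic, 1.0, 2.0, 3.0)  # cubic(1) = 0 < 3 < 6 = cubic(2)
print(c, cubic(c))
```

Each step keeps an interval on which f − m changes sign, so the theorem guarantees that a solution remains inside it.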
In Chap. 4, we will discuss maxima and minima extensively, mainly based on the
concept and the properties of derivatives from Chap. 3. But let us mention here an
important property of continuous functions relevant for optimization.
Theorem 2.8 (Extreme value theorem) Let f be a function which is continuous on
a closed and bounded interval [a, b]. Then, f attains a maximal value M and a
minimal value m in this interval (Fig. 2.7).
Note that if the domain of f is unbounded or not closed, then f may be unbounded,
as for example f (x) = tan x on the domain (−π/2, π/2).
Theorem 2.9 (Continuity of inverse functions) Let f be a function which is in-
creasing on (a, b). If moreover f is continuous on (a, b), then its inverse f −1 is also
continuous on the respective domain.
The assertion of Theorem 2.9 also holds if we replace “increasing” by “decreasing”,
or the open interval (a, b) by the closed interval [a, b].
Let us finish this section by mentioning the so-called ε-δ-definition of continuity.
It is equivalent to the one given above.
Fig. 2.7 Geometric illustration of Theorem 2.8

Definition 2.5 A function f is said to be continuous at a point c of its domain, if for
each ε > 0, there exists a δ > 0, such that if |x − c| < δ then |f(x) − f(c)| < ε.
This is just another way of stating precisely that when x is close to c, then f (x) is
close to f (c).

2.4 Improper Limits

There are two types of improper limits. For the first type, let us consider the function

$$f(x) = \frac{1}{x}.$$

When x tends to zero from the right, the values f(x) increase without bound, and
we say that

$$\lim_{x \to 0^+} \frac{1}{x} = \infty.$$

In general, we say that

$$\lim_{x \to c^+} f(x) = \infty, \tag{2.19}$$

if for every M > 0, there is a δ > 0 such that, for all x with c < x < c + δ, we
have f (x) > M. (In order that this makes sense, we have to assume that the interval
(c, c + δ) is contained in the domain of f .) In an analogous manner one can consider
the cases when f (x) tends to −∞, or when x comes from the left, to obtain the
improper limits

$$\lim_{x \to c^+} f(x) = -\infty, \qquad \lim_{x \to c^-} f(x) = \infty, \qquad \lim_{x \to c^-} f(x) = -\infty. \tag{2.20}$$

For example,

$$\lim_{x \to 0^-} \frac{1}{x} = -\infty.$$

This type of improper limit typically arises at poles of rational functions, but also for
other elementary functions. For example,

$$\lim_{x \to (\pi/2)^-} \tan x = \infty, \qquad \lim_{x \to 0^+} \ln x = -\infty.$$

The second type of improper limit occurs when we consider x to tend to ∞ or −∞.
For example, the values of f (x) = 1/x tend to zero when x increases without bound.
In general, we say that
$$\lim_{x \to \infty} f(x) = L, \tag{2.21}$$

if for every ε > 0 there is an M > 0 such that, for all x with x > M, we have
|f(x) − L| < ε. (We have to assume that the interval (M, ∞) is contained in the
domain of f.) Again, we may also consider the case when x tends to −∞. For
example,

$$\lim_{x \to \infty} \arctan x = \frac{\pi}{2}, \qquad \lim_{x \to -\infty} \arctan x = -\frac{\pi}{2}.$$
As with standard limits, one must be careful to check whether such an improper limit
really exists or not. For example, the function f (x) = sin x is bounded on the whole
real line, but it has no limit when x tends to infinity, in fact, it “forever” oscillates
between −1 and 1.
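
The contrast between arctan x, which has a limit as x tends to infinity, and sin x, which does not, can be seen numerically. The sketch below uses illustrative helper names (`gap_to_limit`, `sin_samples`); it samples sin x at the points 2πk and 2πk + π/2, which are chosen here for convenience and are not taken from the text.

```python
import math


def gap_to_limit(x):
    """Distance between arctan(x) and the value pi/2 it approaches as x grows."""
    return math.pi / 2 - math.atan(x)


def sin_samples(k):
    """sin x at x = 2*pi*k and at x = 2*pi*k + pi/2 (about 0 and 1, up to rounding)."""
    base = 2 * math.pi * k
    return math.sin(base), math.sin(base + math.pi / 2)


for x in (1e1, 1e3, 1e6):
    print(x, gap_to_limit(x))  # shrinks roughly like 1/x

for k in (1, 10, 100):
    print(k, sin_samples(k))   # keeps returning values near 0 and near 1
```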
The two types of improper limits considered above may also be combined. For
example,
lim 3x = ∞ .
x→∞

2.5 Exercises

2.5.1 Let ⎧

⎨1 − x , x < 0 ,
2

f (x) = 13 , x = 0,


1− x , x > 0.

Examine whether lim x→0 f (x) exists or not.


2.5.2 Find the following limits:
a. $\lim_{x \to 4} \frac{x^2 - x - 12}{x^2 - 4x}$
b. $\lim_{x \to -4} \frac{x^3 + 64}{x + 4}$
c. $\lim_{x \to \infty} \frac{3x^2 + 7x - 6}{4x^2 - 3x + 6}$
d. $\lim_{x \to 0} \frac{\sin x}{x}$
e. lim (x + cos x)
x→0  
sin x
f. lim e + x
x→0 x
2.5.3 Let

$$f(x) = \begin{cases} 8, & x \text{ is rational,} \\ 3, & x \text{ is irrational.} \end{cases}$$

Show that lim x→c f (x) does not exist, no matter how we choose c ∈ R.
2.5.4 Find $\lim_{x \to 0} \frac{1 - \cos x}{x}$.

π
2.5.5 Let f (x) = cos x − x 2 on [0, 1]. Show that there exists c ∈ (0, 1) such
2
that f (c) = 0.
2.5.6 Show that the absolute value function is continuous.
Chapter 3
Derivatives

The concept of the derivative was hidden in the problem of finding the line tangent
to a given curve at a given point, a problem tackled by the Greek mathematicians
more than two thousand years ago. This concept took definite shape during the years
1665–1666, when the famous English scientist Isaac Newton, the founder of modern
physics, developed the process now known as differentiation. At that time, New-
ton did not publish his work, and the same concept was rediscovered independently
by Gottfried Wilhelm Leibniz (1646–1716), a German scholar having expertise in
philosophy, law, mathematics, and science. The discovery of the derivative as a fun-
damental ingredient of calculus completely changed the nature of scientific studies,
particularly mathematics. Coupled with Newton’s formulation of the laws of motion
and gravitation, the calculus of Newton and Leibniz and their subsequent refinements
and extensions revolutionized the modern world.
In this chapter, we introduce the concept of the derivative, discuss the method of
computing derivatives of elementary functions, outline the results devoted to basic
properties of differentiation, present a basic differential equation, and conclude with
certain applications.

3.1 Definition of the Derivative

Average and instantaneous velocity. In Example 2.1, we have considered an object
moving along a straight line from a position $s_0 = f(t_0)$ at time $t_0$ to a position
$s = f(t)$ at time $t \ne t_0$. We have seen that

$$v_{av}(t_0, t) = \frac{s - s_0}{t - t_0} = \frac{f(t) - f(t_0)}{t - t_0} \tag{3.1}$$

gives the average velocity during a time interval from $t_0$ to $t$, and we have indicated
that the instantaneous velocity $v(t_0)$ should arise as a limit when t tends to $t_0$ in (3.1). In
Fig. 3.1, we examine this situation from a geometrical point of view. The points $(t_0, s_0)$
and $(t, s)$ on the graph of s = f(t) are indicated. The quotient $(s - s_0)/(t - t_0)$ from
Eq. (3.1) is precisely the slope of the straight line (the secant line) passing through
these two points. In other words, given the position s of the particle as a function of
time t, the average velocity during any time interval is the slope of the secant line
joining the corresponding pair of points on the graph of s = f(t). Now we interpret
geometrically what happens if t tends to $t_0$ in (3.1). In Fig. 3.2, we have drawn the
secant lines with slopes

$$\frac{s_2 - s_0}{t_2 - t_0}, \quad \frac{s_4 - s_0}{t_4 - t_0}, \quad \frac{s_6 - s_0}{t_6 - t_0}, \quad \cdots$$

for $t_2, t_4, t_6, \ldots$ greater than but successively closer to $t_0$, and the secant lines with
slopes

$$\frac{s_0 - s_1}{t_0 - t_1}, \quad \frac{s_0 - s_3}{t_0 - t_3}, \quad \frac{s_0 - s_5}{t_0 - t_5}, \quad \cdots$$

for $t_1, t_3, t_5, \ldots$ less than but successively closer to $t_0$.

Fig. 3.1 Slope of a secant line

Fig. 3.2 Secant lines approach the tangent line as $t \to t_0$

© Springer Nature Singapore Pte Ltd. 2019. M. Brokate et al., Calculus for Scientists and Engineers, Industrial and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_3



Fig. 3.3 Secant line

It seems intuitively reasonable that the slopes of these secant lines approach the slope of the line that is
tangent to the graph of s = f(t) at $(t_0, s_0)$. In fact, we define the slope of the tangent
line to the curve s = f(t) as the limit

$$m = \lim_{t \to t_0} \frac{f(t) - f(t_0)}{t - t_0}.$$

General definition of the derivative. Let us consider a function y = f(x) defined
on an interval I. Any change in x, say from $x_1$ to $x_2$, induces a corresponding change
in y, namely, from $f(x_1)$ to $f(x_2)$. We define the average rate of change of f from
$x_1$ to $x_2$ as the ratio of the change in y to the change in x,

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1}. \tag{3.2}$$

This expression is also called a difference quotient. Sometimes, one writes symbolically

$$\frac{\Delta y}{\Delta x}, \quad \text{where } \Delta x = x_2 - x_1, \ \Delta y = f(x_2) - f(x_1).$$

From (3.2), we see that the average rate of change of a function from $x_1$ to $x_2$ equals the
slope of the line which joins the two points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ of its graph.
Such a line is called a secant line, see Fig. 3.3. The angle θ between the secant and
the x-axis is given in terms of the slope as $\tan \theta = (f(x_2) - f(x_1))/(x_2 - x_1)$.

Definition 3.1 (Derivative of a function) Let y = f(x) be a function defined on an
interval I, and let c be an interior point of I. The derivative of f at c, denoted by
$f'(c)$ and read “f prime of c”, is defined as the number

$$f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c}, \tag{3.3}$$

provided this limit exists.

Remark 3.1 1. It is clear that the derivative is a unique number: For each c for
which the limit exists, the right-hand side of (3.3) specifies a unique number
$f'(c)$. As a consequence, formula (3.3) defines a function $f'$, called the derivative
of f, whose value at c is $f'(c)$. The domain of $f'$ consists of those points of
D(f) for which the limit in (3.3) exists. If $f'(c)$ is defined at a point c, then f
is said to be differentiable at that point. If f is differentiable at each point of an
open interval (a, b), then f is said to be differentiable on that interval.
2. According to (3.3), the derivative arises as a limit from difference quotients.
Thus, we may interpret the derivative of f as the limit of its average rate of
change when the corresponding interval shrinks to a point.
3. We define the tangent line to the graph of a function f at c ∈ D(f) to be the
line which passes through the point (c, f(c)) and has the slope

$$f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c},$$

provided the limit exists.
4. Let y = f(x) be differentiable. Sometimes $y'$ is used to denote the value of its
derivative at x, that is, $y' = f'(x)$. No matter which notation one uses, one has to
distinguish between the derivative viewed as a function, denoted here as $f'$, and
specific values of the derivative, denoted here as $f'(x)$ or $y'$. It is most important
to understand this distinction.
5. Sometimes, in discussing differentiation, it is helpful to emphasize the independent
variable. Thus, if x is the independent variable, we may say “derivative with
respect to x” instead of merely “derivative”.
6. Above, we have interpreted the derivative as a velocity or as the slope of a tangent
line. More generally, if y = f(x), then its derivative $f'$ is the rate of change of
f with respect to the independent variable x, and $f'(x_0)$ is the rate of change of
f at the point $x = x_0$.
7. Very often, Eq. (3.3) is written as

$$f'(c) = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h}. \tag{3.4}$$

Substitute h = x − c in (3.3), then h → 0 as x → c and x = h + c. We thus
obtain (3.4) from (3.3).
8. If f is defined on a closed interval [a, b], we may consider the right-sided and
left-sided derivatives at the end points, defined by

$$f'(a+) = \lim_{x \to a^+} \frac{f(x) - f(a)}{x - a}, \qquad f'(b-) = \lim_{x \to b^-} \frac{f(x) - f(b)}{x - b}. \tag{3.5}$$

Example 3.1 A family travels by a car on a Saturday morning, starting at 5 a.m. and
arriving at their destination at 9 a.m. When they began the trip, the car’s odometer read
28,700 km, and when they arrived it read 29,000 km. Compute the average velocity
of the car during the journey.

Solution: As we know, the average velocity equals the distance traveled, divided by
time elapsed. In the present case, the distance traveled equals 29,000 − 28,700 =
300 km, and the time elapsed equals 9 − 5 = 4 h.
The average velocity therefore equals 300/4 = 75 km/h.

Example 3.2 Show that the rate of change of the area of a circle with respect to its
radius is equal to its circumference.

Solution: The area A of a circle is related to its radius by $A(r) = \pi r^2$. We obtain the
rate of change of A with respect to r as the derivative $A'(r)$ and compute

$$A'(r) = \lim_{h \to 0} \frac{A(r + h) - A(r)}{h} = \lim_{h \to 0} \frac{\pi (r + h)^2 - \pi r^2}{h} = \lim_{h \to 0} \frac{\pi r^2 + 2\pi r h + \pi h^2 - \pi r^2}{h} = \lim_{h \to 0} \frac{\pi h (2r + h)}{h} = \lim_{h \to 0} \pi (2r + h) = 2\pi r,$$

which equals the circumference of the circle.

Example 3.3 In a metabolic experiment, the mass m of glucose decreases
according to the formula $m(t) = 9 - 0.06 t^2$, where m is measured in grams and the
elapsed time t in hours. Find the reaction rate at the time when 1 h has elapsed.

Solution: The reaction rate at t = 1 is $m'(1)$. Thus

$$m'(1) = \lim_{t \to 1} \frac{m(t) - m(1)}{t - 1} = \lim_{t \to 1} \frac{(9 - 0.06 t^2) - (9 - 0.06)}{t - 1} = \lim_{t \to 1} \frac{-0.06 \cdot (t^2 - 1)}{t - 1} = \lim_{t \to 1} \frac{-0.06 \cdot (t - 1)(t + 1)}{t - 1} = -0.12.$$

Thus, the reaction rate at t = 1 is −0.12 grams per hour, that is, at the time when 1 hour has elapsed,
the rate of decrease equals 0.12 grams per hour.
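
The limits computed in Examples 3.2 and 3.3 can be approximated by evaluating the difference quotient from (3.4) for small h. This is a minimal sketch; the function names `difference_quotient`, `glucose_mass`, and `circle_area` are illustrative, and the step sizes are arbitrary choices.

```python
import math


def difference_quotient(f, c, h):
    """(f(c + h) - f(c)) / h, the quotient whose limit as h -> 0 defines f'(c)."""
    return (f(c + h) - f(c)) / h


def glucose_mass(t):
    """m(t) = 9 - 0.06 t^2 from Example 3.3 (grams, t in hours)."""
    return 9.0 - 0.06 * t ** 2


def circle_area(r):
    """A(r) = pi r^2 from Example 3.2."""
    return math.pi * r ** 2


for h in (1.0, 0.1, 0.001, -0.001):
    print(h, difference_quotient(glucose_mass, 1.0, h))  # tends to -0.12

# For the circle, the quotient at r = 2 is close to the circumference 2*pi*r:
print(difference_quotient(circle_area, 2.0, 1e-6), 2 * math.pi * 2.0)
```

For the glucose example the quotient equals −0.06(2 + h) exactly, so the printed values differ from −0.12 by about 0.06|h|.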

3.2 Derivative of Elementary Functions

Constant Function
Let f(x) = c in some open interval I, c a given number. Then

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} 0 = 0.$$

Thus, the derivative of a constant function is everywhere zero. The graph of f equals
the horizontal line y = c, the slope of the tangent at each point of the graph is zero,
and hence all tangents coincide with the graph of f itself (Fig. 3.4).

Fig. 3.4 Constant function
Identity Function
Let f(x) = x in some open interval I. Its graph on the interval I = (−5, 5) is given
in Fig. 3.5. For every x ∈ I,

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h) - x}{h} = \lim_{h \to 0} 1 = 1.$$

Hence, the derivative of the identity function is everywhere equal to 1, and the tangent
at a point (x, x) of the graph of f equals the straight line y = x. Thus, the tangent
line again is simply the original line itself.

Fig. 3.5 Identity function

Power Function
Let $f(x) = x^n$ in some open interval I, for a given positive integer n. We will show
that

$$f'(x) = n x^{n-1}. \tag{3.6}$$

For n = 1, we have seen that $f'(x) = 1$ for every x. For n = 2, we have at any point
c ∈ I

$$\lim_{x \to c} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c} \frac{x^2 - c^2}{x - c} = \lim_{x \to c} (x + c) = c + c = 2c.$$

For n = 3,

$$\lim_{x \to c} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c} \frac{x^3 - c^3}{x - c} = \lim_{x \to c} \frac{(x - c)(x^2 + cx + c^2)}{x - c} = 3c^2.$$

Replacing the letter c by x, we see that (3.6) is satisfied for n = 1, 2, 3. We now
check that (3.6) holds for arbitrary positive integers n. Indeed,

$$\lim_{x \to c} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c} \frac{x^n - c^n}{x - c} = \lim_{x \to c} \big( x^{n-1} + c x^{n-2} + \cdots + c^{n-1} \big) = \underbrace{c^{n-1} + c^{n-1} + \cdots + c^{n-1}}_{n \text{ times}} = n c^{n-1}.$$

In words, the derivative of x raised to the power n equals n times x raised to the
power n − 1.
Another method to prove the same result starts from the alternative definition of
the derivative as

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^n - x^n}{h}.$$

Applying the Binomial Theorem to $(x + h)^n$, we get

$$(x + h)^n = x^n + n x^{n-1} h + \frac{n(n-1)}{2!} x^{n-2} h^2 + \cdots + n x h^{n-1} + h^n.$$

Thus

$$f'(x) = \lim_{h \to 0} \Big( n x^{n-1} + \frac{n(n-1)}{2!} x^{n-2} h + \cdots + n x h^{n-2} + h^{n-1} \Big) = n x^{n-1}.$$

Formula (3.6) can also be proved for negative integers n; in fact, it holds for any real
number n.

Trigonometric Functions
We find here derivatives of sin x and cos x. The derivatives of sec x, csc x, tan x, and
cot x will be given in Sect. 3.3.
Let f(x) = sin x, then

$$f'(x) = \lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h}.$$

Using the identity sin(x + h) = sin x cos h + cos x sin h, we get

$$f'(x) = \lim_{h \to 0} \frac{\sin x \cdot (\cos h - 1) + \cos x \sin h}{h} = -\lim_{h \to 0} \frac{\sin x \cdot (1 - \cos h)}{h} + \lim_{h \to 0} \cos x \, \frac{\sin h}{h}.$$

We know that (see Example 2.6 (3))

$$\lim_{h \to 0} \frac{\sin h}{h} = 1,$$

so

$$\lim_{h \to 0} \cos x \, \frac{\sin h}{h} = \cos x \cdot \lim_{h \to 0} \frac{\sin h}{h} = \cos x.$$

Moreover (see Exercise 2.5.4),

$$\lim_{h \to 0} \sin x \, \frac{1 - \cos h}{h} = \sin x \cdot \lim_{h \to 0} \frac{1 - \cos h}{h} = 0.$$

Putting these formulas together, we see that $f'(x) = 0 + \cos x = \cos x$.

We now consider f(x) = cos x. Then

$$f'(x) = \lim_{h \to 0} \frac{\cos(x + h) - \cos x}{h}.$$

Using the identity cos(x + h) = cos x cos h − sin x sin h, we get

$$f'(x) = \lim_{h \to 0} \frac{\cos x \cdot (\cos h - 1) - \sin x \sin h}{h} = \cos x \cdot \lim_{h \to 0} \frac{\cos h - 1}{h} - \sin x \cdot \lim_{h \to 0} \frac{\sin h}{h} = -\sin x,$$

where we have used the formulas

$$\lim_{h \to 0} \frac{\sin h}{h} = 1, \qquad \lim_{h \to 0} \frac{\cos h - 1}{h} = 0.$$

Absolute Value Function
We know that the absolute value function

$$f(x) = |x| = \begin{cases} x, & x > 0, \\ 0, & x = 0, \\ -x, & x < 0, \end{cases}$$

is continuous at all points x, in particular at x = 0 (see Exercise 2.5.6). On the other
hand, f is not differentiable at x = 0. Indeed,

$$\frac{f(0 + h) - f(0)}{h} = \frac{|h|}{h} = \begin{cases} 1, & h > 0, \\ -1, & h < 0. \end{cases}$$

Therefore,

$$\lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = 1, \qquad \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = -1,$$

so the right- and left-sided limits exist, but are different. Thus,

$$\lim_{h \to 0} \frac{f(0 + h) - f(0)}{h}$$

does not exist, so f is not differentiable at x = 0.
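
The two one-sided difference quotients of |x| at 0 can be tabulated directly. In this sketch the helper name `one_sided_quotients` and the step sizes are illustrative choices.

```python
def one_sided_quotients(f, c, steps):
    """Difference quotients of f at c for h = +step (right) and h = -step (left)."""
    right = [(f(c + h) - f(c)) / h for h in steps]
    left = [(f(c - h) - f(c)) / (-h) for h in steps]
    return right, left


steps = [10.0 ** (-k) for k in range(1, 8)]
right, left = one_sided_quotients(abs, 0.0, steps)
print(right)  # quotients for h > 0
print(left)   # quotients for h < 0
```

The quotients from the right and from the left stay at two different constant values, so no common limit exists as h tends to 0.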


This example shows that it may happen that a function is continuous, but not
differentiable. On the other hand, we have the following result.

Theorem 3.1 Let f be a function which is differentiable on an interval I. Then f
is continuous on I.

Proof Let x be an arbitrary point of I. For h ≠ 0 and x + h ∈ I,

$$f(x + h) - f(x) = \frac{f(x + h) - f(x)}{h} \cdot h.$$

We have

$$\lim_{h \to 0} h = 0, \qquad \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f'(x),$$

since f is differentiable at x. From the product rule for limits, we conclude that

$$\lim_{h \to 0} [f(x + h) - f(x)] = \Big( \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \Big) \cdot \Big( \lim_{h \to 0} h \Big) = f'(x) \cdot 0 = 0.$$

It follows that lim h→0 f(x + h) = f(x), thus f is continuous at x.

3.3 Some Differentiation Formulas

A further remark on the notation of derivatives. In the previous sections, we have
taken care to specify derivatives in terms of a general function f, as in the statement
Let f(x) = sin x, then $f'(x) = \cos x$.

Having understood the concept of the derivative, one often writes more briefly

$$\frac{d}{dx} \sin x = \cos x, \quad \text{or} \quad \sin'(x) = \cos x,$$

or even $(\sin x)' = \cos x$.


Constant multiples. The derivative of a constant times a function equals the constant
times the derivative of the function. That is, if c is a constant and f is a differentiable
function, then

$$(c f)'(x) = c f'(x). \tag{3.7}$$

Examples:
1. If $f(x) = x^5$, then $(6f)'(x) = 6 f'(x) = 6 \cdot 5x^4 = 30x^4$. Or, more briefly,
$(6x^5)' = 6(x^5)' = 6 \cdot 5x^4 = 30x^4$.
2. $(-2\sin)'(x) = -2 \sin'(x) = -2\cos x$, or, with the same meaning, $(-2 \sin x)' =
-2(\sin x)' = -2\cos x$.
Derivative of a sum. The derivative of the sum of two differentiable functions equals
the sum of their derivatives. That is,

$$(f + g)'(x) = f'(x) + g'(x). \tag{3.8}$$

Verification:

$$(f + g)'(x) = \lim_{h \to 0} \frac{[f(x + h) + g(x + h)] - [f(x) + g(x)]}{h} = \lim_{h \to 0} \Big( \frac{f(x + h) - f(x)}{h} + \frac{g(x + h) - g(x)}{h} \Big) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = f'(x) + g'(x).$$

Example: $(\cos + 4\sin)'(x) = \cos'(x) + 4\sin'(x) = -\sin x + 4\cos x$.
Derivative of a difference. The derivative of the difference of two differentiable
functions equals the difference of their derivatives. That is,

$$(f - g)'(x) = f'(x) - g'(x). \tag{3.9}$$

Verification: We use the rules for sums and constant multiples,

$$(f - g)'(x) = (f + (-g))'(x) = f'(x) + (-g)'(x) = f'(x) - g'(x).$$

Derivative of a product. The derivative of the product of two differentiable functions
equals the first function times the derivative of the second plus the second function
times the derivative of the first. That is,

$$(f \cdot g)'(x) = f(x) g'(x) + f'(x) g(x). \tag{3.10}$$

We note that, in contrast to the case of sum and difference, the derivative of a product
does not equal the product of the derivatives.
Verification: We analyze the difference quotient

$$\frac{(f \cdot g)(x + h) - (f \cdot g)(x)}{h} = \frac{f(x + h) g(x + h) - f(x) g(x)}{h} = \frac{f(x + h) g(x + h) - f(x + h) g(x) + f(x + h) g(x) - f(x) g(x)}{h} = f(x + h) \, \frac{g(x + h) - g(x)}{h} + g(x) \, \frac{f(x + h) - f(x)}{h}.$$

Here we have added and subtracted f(x + h)g(x) in the numerator and then
regrouped the terms so as to display the difference quotients for f and g separately.
Now, since f is differentiable at x, we know that f is continuous at x by Theorem
3.1 and thus lim h→0 f(x + h) = f(x). From the rules for limits and the definition
of $f'(x)$ and $g'(x)$, we then obtain

$$(f \cdot g)'(x) = \lim_{h \to 0} \frac{(f \cdot g)(x + h) - (f \cdot g)(x)}{h} = \Big( \lim_{h \to 0} f(x + h) \Big) \cdot \Big( \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \Big) + g(x) \cdot \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f(x) g'(x) + g(x) f'(x).$$
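
The product rule (3.10) can be sanity-checked numerically for a concrete pair of functions. The sketch below picks f = sin and g(x) = x³ as an arbitrary example and approximates the derivative of the product by a symmetric (central) difference quotient; this is a standard numerical device that is not used in the text above, and the helper name `central_difference` is an assumption.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h), approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def g(x):
    return x ** 3


def dg(x):
    return 3 * x ** 2


x0 = 0.7
numeric = central_difference(lambda t: math.sin(t) * g(t), x0)
product_rule = math.sin(x0) * dg(x0) + math.cos(x0) * g(x0)  # f g' + f' g
naive = math.cos(x0) * dg(x0)                                # f' g', not the derivative

print(numeric, product_rule, naive)
```

The numerical value agrees with the product-rule expression and differs visibly from the product of the two derivatives.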

The reciprocal rule. If g is differentiable at x and g(x) ≠ 0, then 1/g is differentiable
at x and

$$\Big( \frac{1}{g} \Big)'(x) = -\frac{g'(x)}{(g(x))^2}. \tag{3.11}$$

Verification: Since g is differentiable at x, it is also continuous at x by Theorem 3.1.
Since g(x) ≠ 0, we know that the function 1/g is continuous at x and moreover

$$\lim_{h \to 0} \frac{1}{g(x + h)} = \frac{1}{g(x)}.$$

We now obtain

$$\Big( \frac{1}{g} \Big)'(x) = \lim_{h \to 0} \frac{1}{h} \Big( \frac{1}{g(x + h)} - \frac{1}{g(x)} \Big) = \lim_{h \to 0} \frac{1}{h} \cdot \frac{g(x) - g(x + h)}{g(x + h) g(x)} = -\Big( \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \Big) \cdot \Big( \lim_{h \to 0} \frac{1}{g(x + h) g(x)} \Big) = -\frac{g'(x)}{(g(x))^2}.$$

The quotient rule. The derivative of a quotient equals the denominator times the
derivative of the numerator minus the numerator times the derivative of the denominator,
all divided by the square of the denominator.
Let f and g be differentiable at x and g(x) ≠ 0, then the quotient f/g is differentiable
at x and

$$\Big( \frac{f}{g} \Big)'(x) = \frac{g(x) f'(x) - f(x) g'(x)}{(g(x))^2}. \tag{3.12}$$

To verify (3.12), one writes $f/g = f \cdot (1/g)$ and applies the product rule (3.10) together with the reciprocal rule (3.11).
The chain rule. Let y be a differentiable function of u, and u be a differentiable
function of x. Then the rate of change of y with respect to x equals the rate of change
of y with respect to u times the rate of change of u with respect to x.
Let y = f (u) be a differentiable function of u, and u a differentiable function
of x, namely, u = g(x). Then y = f (u) = f (g(x)) = ( f ◦ g)(x), that is, in order
to express y as a function of x, we have to use the composite function f ◦ g. The
derivative of the composite function is given by the so-called chain rule

\[ (f \circ g)'(x) = f'(g(x)) \cdot g'(x) . \tag{3.13} \]

For example, we want to find the derivative of $y = (\sin x)^2$. We introduce the
intermediate variable $u$ and write $y = f(u) = u^2$, $u = g(x) = \sin x$; then
$(f \circ g)(x) = f(g(x)) = (\sin x)^2$. In order to obtain its derivative by (3.13), we have to
compute $f'(u) = 2u$ and insert $u = g(x) = \sin x$, so $f'(g(x)) = 2 \sin x$. Since moreover
$g'(x) = \cos x$, we arrive at

\[ (f \circ g)'(x) = 2 \sin x \cos x \]

as the derivative of $(\sin x)^2$.


The chain rule is sometimes written symbolically as

\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} , \quad\text{or}\quad y'(x) = y'(u)\,u'(x) . \]
Derivative of an inverse function. Let the function $f$ have an interval $I$ as its
domain $D(f)$. Suppose that $f$ is differentiable and that either $f'(x) > 0$ for all
interior points of $I$, or that $f'(x) < 0$ for all such points. Then $f$ is strictly monotone
(increasing in the first case, decreasing in the second) and continuous, its range $R(f)$
is again an interval, and it has an inverse function $f^{-1}$ with domain $R(f)$ and
range $D(f) = I$. Additionally, $f^{-1}$ is differentiable in the interior of the interval
$R(f)$. Its derivative at a point $y = f(x)$ is given by

\[ (f^{-1})'(y) = \frac{1}{f'(x)} = \frac{1}{f'(f^{-1}(y))} . \tag{3.14} \]

As an example, let us consider $f(x) = x^2$ on the interval $I = [1, 3]$. We have
$f'(x) = 2x > 0$ on $I$. The inverse of $f$ is $f^{-1}(y) = \sqrt{y}$. To find its derivative at a
point $y = x^2$, we have to compute $1/f'(x) = 1/(2x)$ and insert $x = f^{-1}(y) = \sqrt{y}$, so

\[ (f^{-1})'(y) = (\sqrt{y})' = \frac{1}{2\sqrt{y}} . \]
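Formula (3.14) for this example can be compared with a direct numerical derivative of the square root. The sketch below assumes the test point y = 4 (that is, x = 2 in I) and a central difference with step h; both are illustrative choices.

```python
import math

def f_prime(x):
    """Derivative of f(x) = x**2."""
    return 2.0 * x

def f_inverse(y):
    """Inverse of f on I = [1, 3]."""
    return math.sqrt(y)

def inverse_derivative(y):
    """Formula (3.14): 1 / f'(f^{-1}(y))."""
    return 1.0 / f_prime(f_inverse(y))

def central_difference(func, y, h=1e-6):
    """Symmetric difference quotient, used only as a numerical cross-check."""
    return (func(y + h) - func(y - h)) / (2.0 * h)

y = 4.0  # corresponds to x = 2
print(inverse_derivative(y), central_difference(f_inverse, y))
```

Both numbers should be close to $1/(2\sqrt{4}) = 0.25$.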

To verify (3.14), we start from the equation

\[ y = f(f^{-1}(y)) \]

and differentiate both sides with respect to $y$. Assuming that $f^{-1}$ is differentiable
(which is true under the assumptions stated above, but we do not prove it here), we
get

\[ 1 = f'(f^{-1}(y)) \cdot (f^{-1})'(y) \tag{3.15} \]

and divide both sides by $f'(f^{-1}(y))$ to obtain (3.14).


We note that the derivatives of $f$ and $f^{-1}$ are reciprocal to each other, if we take
proper care to evaluate them at corresponding points $y = f(x)$ of the independent
variables ($x$ for $f$, $y$ for $f^{-1}$). Symbolically, Eqs. (3.14) and (3.15) may also be
written as

\[ \frac{dx}{dy} \cdot \frac{dy}{dx} = 1 , \quad\text{or}\quad \frac{dx}{dy} = \frac{1}{\dfrac{dy}{dx}} , \tag{3.16} \]

but let us remark again that nowadays standard mathematical notation is given by
the formulas (3.14) and (3.15).
Derivatives of other trigonometric functions. We determine the derivative of the
trigonometric functions sec x, tan x, cot x, and csc x.
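(i) Let $f(x) = \sec x = \frac{1}{\cos x}$. By the reciprocal rule (3.11),

\[ \frac{d}{dx}\,\frac{1}{\cos x} = -\frac{(-\sin x)}{(\cos x)^2} = \frac{\tan x}{\cos x} = \tan x \sec x . \]

(ii) Let $f(x) = \tan x = \frac{\sin x}{\cos x}$. By the quotient rule (3.12),

\[ \frac{d}{dx}\,\frac{\sin x}{\cos x} = \frac{\cos x \cos x + \sin x \sin x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x . \]

(iii) Let $f(x) = \cot x = \frac{\cos x}{\sin x}$. Again, by the quotient rule (3.12),

\[ \frac{d}{dx}\,\frac{\cos x}{\sin x} = \frac{-\sin^2 x - \cos^2 x}{\sin^2 x} = -\frac{1}{\sin^2 x} = -\csc^2 x . \]

(iv) Let $f(x) = \csc x = \frac{1}{\sin x}$. By the reciprocal rule (3.11),

\[ \frac{d}{dx}(\csc x) = \frac{d}{dx}\,\frac{1}{\sin x} = -\frac{\cos x}{\sin^2 x} = -\frac{\cot x}{\sin x} = -\cot x \csc x . \]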

Exponential and logarithmic functions. We determine the derivative of the
exponential function and the logarithm.
(i) We consider first $f(x) = e^x$. One can prove, as a consequence of the definition
of the exponential function to be given in Chap. 5, that

\[ \frac{e^h - e^0}{h} = \frac{\left(1 + h + \frac{h^2}{2!} + \cdots\right) - 1}{h} = 1 + \frac{h}{2!} + \cdots . \]

This implies that $\lim_{h\to 0} \frac{e^h - 1}{h} = 1$, therefore

\[ f'(x) = \lim_{h\to 0} \frac{e^{x+h} - e^x}{h} = e^x \lim_{h\to 0} \frac{e^h - 1}{h} = e^x , \]

so

\[ \frac{d}{dx}\, e^x = e^x . \tag{3.17} \]

(ii) We consider the general exponential function $f(x) = a^x$ with $a > 0$. From the
formula $a^x = e^{x \ln a}$ and the chain rule (3.13) we compute in view of (3.17)

\[ \frac{d}{dx}\, a^x = \frac{d}{dx}\, e^{x \ln a} = \ln a \cdot e^{x \ln a} = \ln a \cdot a^x . \tag{3.18} \]

(iii) The logarithm function $f(x) = \log_a(x)$ and the general exponential function
are inverse to each other,

\[ x = a^{\log_a x} . \]

We compute the derivative of both sides, using the chain rule and (3.18),

\[ 1 = \frac{d}{dx}\left(a^{\log_a x}\right) = \ln a \cdot a^{\log_a x} \cdot \frac{d}{dx} \log_a x = \ln a \cdot x \cdot \frac{d}{dx} \log_a x , \]

so

\[ \frac{d}{dx} \log_a x = \frac{1}{x \ln a} . \tag{3.19} \]

In particular, taking $a = e$ we obtain, since $\ln e = 1$,

\[ \frac{d}{dx} \ln x = \frac{1}{x} . \tag{3.20} \]
Implicit differentiation. Until now, we have discussed derivatives of functions which
are given explicitly in terms of the independent variable, for example, functions such
as $y = x^2 + 1$, $y = \sin x$, and $y = x^2 \sin x$. Now we discuss derivatives of functional
relationships between two variables in which neither of the variables is given in terms
of the other. For example, consider the equation

\[ y^2 + y - x^2 = 1 . \tag{3.21} \]

While it is possible to find y explicitly in the form y = f (x) by the solution formula
for quadratic equations, one may want to avoid this because it is somewhat cum-
bersome; in other situations, it may be impossible to write an explicit formula for y
in terms of x. Nevertheless, we can obtain an equation for the derivative of y with
respect to x by the process called implicit differentiation. This process is based on
differentiating both sides of the equation satisfied by x and y. Let us assume that
(3.21) defines a function y = f (x), so we must have

\[ f(x)^2 + f(x) - x^2 = 1 . \]

Differentiating both sides with respect to $x$, we obtain in view of the chain rule

\[ 2 f(x) f'(x) + f'(x) - 2x = 0 . \]

Solving for $f'(x)$, we get

\[ f'(x) = \frac{2x}{1 + 2 f(x)} . \]

Usually one writes this in shorter form, with $y$ and $y'$ instead of $f(x)$ and $f'(x)$.
Starting from (3.21), the computation then looks like

\[ 2yy' + y' - 2x = 0 , \qquad y' = \frac{2x}{1 + 2y} . \]

Note that we have to pay a price: the right-hand side involves not only $x$ but also $y$.
So, in order to determine the derivative $y'$ at, for example, $x = 1$, we have to find the
corresponding value of $y$ first. In this example, there are two possible values, namely,
$y = 1$ and $y = -2$. (This is related to the fact that (3.21), being a quadratic equation
in $y$, is satisfied by two different functions of the form $y = f(x)$.) The corresponding
values of the derivative are

\[ y'(1) = \frac{2}{1 + 2} = \frac{2}{3} , \quad\text{respectively}\quad y'(1) = \frac{2}{1 - 4} = -\frac{2}{3} . \]
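The two slopes at x = 1 can be reproduced numerically. The sketch below solves (3.21) for both branches with the quadratic formula, evaluates $y' = 2x/(1+2y)$ on each branch, and compares with a central difference along the same branch; the function names and the step size are illustrative assumptions.

```python
import math

def branches(x):
    """Both solutions y of y**2 + y - x**2 = 1, via the quadratic formula."""
    root = math.sqrt(1.0 + 4.0 * (1.0 + x * x))
    return (-1.0 + root) / 2.0, (-1.0 - root) / 2.0

def implicit_slope(x, y):
    """y' = 2x / (1 + 2y), obtained by implicit differentiation of (3.21)."""
    return 2.0 * x / (1.0 + 2.0 * y)

x = 1.0
h = 1e-6
for index, y in enumerate(branches(x)):
    # Central difference along the same explicit branch, for comparison.
    numeric = (branches(x + h)[index] - branches(x - h)[index]) / (2.0 * h)
    print(y, implicit_slope(x, y), numeric)
```

Each printed line shows the branch value y, the slope from the implicit formula, and the numerical slope of the explicit branch.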
Example 3.4 Find $y' = \frac{dy}{dx}$, if $x$ and $y$ are related by the following equations:
(a) $x^2 + y^2 = 9$,
(b) $2x^2 y - y^3 + 5 = x + 6y$,
(c) $\sin(x + y) = xy$,
(d) $y^3 + 2y - \cos \pi x - 6 = 0$.
Solution: (a) Differentiating both sides of the given equation, we get

\[ 2x + 2yy' = 0 , \quad\text{or}\quad y' = -\frac{x}{y} . \]

(b) Differentiating both sides of the given equation, we get

\[ 4xy + 2x^2 y' - 3y^2 y' = 1 + 6y' , \qquad (2x^2 - 3y^2 - 6)\,y' = 1 - 4xy , \]
\[ y' = \frac{1 - 4xy}{2x^2 - 3y^2 - 6} . \]

(c) By differentiating both sides of $\sin(x + y) = xy$, we get

\[ \cos(x + y)(1 + y') = y + xy' , \]
\[ (\cos(x + y) - x)\,y' = y - \cos(x + y) , \]
\[ y' = \frac{y - \cos(x + y)}{\cos(x + y) - x} . \]

(d) By differentiating both sides of the given equation, we get

\[ 3y^2 y' + 2y' + \pi \sin \pi x = 0 , \]
\[ y' = -\frac{\pi \sin \pi x}{3y^2 + 2} . \]

Example 3.5 A spherical balloon is expanding. If the radius is increasing at the rate
of 4 cm/min, at what rate does the volume increase when the radius is 10 cm?
Solution: The volume $V$ and the radius $r$ are related by $V = \frac{4}{3}\pi r^3$. At an arbitrary
time $t$, we have $V(t) = \frac{4}{3}\pi r(t)^3$, and consequently

\[ V' = \frac{4}{3}\pi \cdot 3r^2 r' = 4\pi r^2 r' . \]

Here $r = 10$, and the rate of increase $r' = \frac{dr}{dt}$ of the radius with time equals 4.
Therefore, $V' = \frac{dV}{dt} = 4\pi \cdot 10^2 \cdot 4 = 1600\pi$ cm$^3$/min. Note that it was not necessary
to compute the time $t$ at which $r(t) = 10$ and $r'(t) = 4$ holds.
Example 3.6 Consider a cylinder with variable radius r and height h. Suppose that
the radius is changing, but the volume is kept constant. How is h changing with
respect to r ?
Solution: The volume $V$ of the cylinder satisfies $V = \pi r^2 h$, thus $h = \frac{V}{\pi r^2}$. We obtain
the rate of change of $h$ with respect to $r$ as

\[ \frac{dh}{dr} = -\frac{2V}{\pi}\, r^{-3} = -\frac{2(\pi r^2 h)}{\pi}\, r^{-3} = -\frac{2h}{r} . \]
Example 3.7 How does the atmospheric pressure p vary (change) with respect to
the height h?

Solution: We are required to find the derivative of $p$ as a function of $h$. If $h$ is measured
in meters and $p$ in kPa (kilopascal), then approximately

\[ p(h) = p_0 \exp\left(-\frac{h}{7990}\right) , \]

where $p_0$ is the atmospheric pressure at ground (= sea) level, $p_0 = 100$ approximately.
The rate of change of pressure satisfies $p'(h) = -(7990)^{-1} p(h)$. Near the
ground, we therefore have

\[ p'(h) \approx p'(0) \approx -0.01 \ \text{kPa/m} . \]

We see that for heights between 0 and 100 m, for example, the decrease in pressure
is rather small.

Example 3.8 A projectile is dropped from a height of 98 m. After how many seconds
does it hit the ground? What is the speed at the moment of impact?

Solution: According to Galileo's formula, $y(t) = -\frac{1}{2} g t^2 + v_0 t + y_0$, where $g$ is the
gravitational acceleration, $v_0$ is the initial (upward) velocity of the projectile at time
$t_0 = 0$, and $y_0$ is the height from which the projectile is dropped. In the present case,
$g = 9.8$ m/s$^2$ approximately, $y_0 = 98$ m and $v_0 = 0$, so

\[ y(t) = -4.9 t^2 + 98 . \tag{3.22} \]

At the time $t$ of impact there holds $y(t) = 0$, which gives

\[ t = \pm\left(\frac{98}{4.9}\right)^{1/2} = \pm\sqrt{20} = \pm 2\sqrt{5} . \]

Since the impact occurs at a time $t > 0$, $t = 2\sqrt{5}$ is the correct solution. The velocity
at this time equals $y'(2\sqrt{5})$. From (3.22), we obtain $y'(t) = v(t) = -(4.9) \cdot 2t = -9.8t$.
Inserting $t = 2\sqrt{5}$ gives us the speed at impact as

\[ |v(2\sqrt{5})| = |y'(2\sqrt{5})| \approx 43.83 \ \text{m/s} . \]
Derivatives in economics. In business and economics, we want to know how changes
in variables such as production, supply or price, affect changes in variables such as
cost, revenue, or profit. If $f$ is a function that describes the relationship between
pairs of these variables, the term “marginal” is used when one wants to refer to the
derivative of $f$.
For example, let $C = C(x)$ denote the cost of producing $x$ units of a certain
commodity. Then $C'$ is called the marginal cost. The marginal cost $C'(x)$ at the
production level $x$ is approximately equal to the cost of producing the $(x + 1)$st
unit. If $R = R(x)$ is the revenue received for selling $x$ units of the commodity, then
$R'$ is called the marginal revenue. The marginal revenue $R'(x)$ at a sales level $x$ is
approximately equal to the revenue obtained by selling one additional unit. For the
cost and revenue functions $C = C(x)$ and $R = R(x)$, respectively, associated with
producing and selling $x$ units of a commodity,

P(x) = R(x) − C(x)

is called the profit function.
Values of $x$ (if any) at which $C(x) = R(x)$, that is, values at which cost equals
revenue, are called break-even points. The derivative $P'$ of $P$ is called the marginal
profit.

Example 3.9 (a) A manufacturer of computer components determines that the total
cost $C$ of producing $x$ components per week is

\[ C(x) = 2000 + 100x - \frac{x^2}{10} \ \text{dollars} . \]

Find the marginal cost at the production level of 40 units. What is the exact cost of
producing the 41st component?
(b) A manufacturer of watches determines that the cost and revenue functions
involved in producing and selling $x$ watches are

\[ C(x) = 1200 + 13x , \qquad R(x) = 75x - \frac{x^2}{2} , \]

respectively. Find the profit function and determine the break-even points. Find the
marginal profit and determine the production/sales level at which the marginal profit
is zero.

Solution: (a) The marginal cost is $C'(x) = 100 - \frac{x}{5}$. Thus, the marginal cost at the
production level of 40 components equals $C'(40) = 100 - \frac{40}{5} = 92$ dollars. The
exact cost of producing the 41st component is given by

\[
\begin{aligned}
C(41) - C(40) &= \left(2000 + 100 \cdot 41 - \frac{41^2}{10}\right) - \left(2000 + 100 \cdot 40 - \frac{40^2}{10}\right) \\
&= 2000 + 4100 - 168.1 - 2000 - 4000 + 160 \\
&= 91.9 \ \text{dollars.}
\end{aligned}
\]

(b) The profit function $P$ is given by

\[
\begin{aligned}
P(x) = R(x) - C(x) &= 75x - \frac{x^2}{2} - (1200 + 13x) \\
&= 62x - \frac{x^2}{2} - 1200 .
\end{aligned}
\]

In order to find the break-even points, set $P(x) = 0$. Multiplying by $-2$, we have

\[ 0 = x^2 - 124x + 2400 = (x - 24)(x - 100) . \]

The break-even points are $x = 24$ and $x = 100$. The marginal profit is given by the
derivative $P'(x) = 62 - x$, and $P'(x) = 0$ when $x = 62$.
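The numbers in Example 3.9 are easy to recompute. The following minimal sketch encodes the cost and profit functions of the example; the function names are illustrative.

```python
def cost(x):
    """Cost function of part (a)."""
    return 2000 + 100 * x - x**2 / 10

def marginal_cost(x):
    """Derivative of the cost function of part (a)."""
    return 100 - x / 5

def profit(x):
    """Profit function of part (b): R(x) - C(x)."""
    return 62 * x - x**2 / 2 - 1200

def marginal_profit(x):
    """Derivative of the profit function of part (b)."""
    return 62 - x

# Marginal cost at 40 units versus the exact cost of the 41st unit.
print(marginal_cost(40), cost(41) - cost(40))

# Profit vanishes at the break-even points; marginal profit vanishes at 62.
print(profit(24), profit(100), marginal_profit(62))
```

The first line illustrates that the marginal cost approximates, but does not equal, the exact cost of one additional unit.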

Example 3.10 Gravel is being poured by a conveyor onto a conical pile at a constant
rate of 40π cubic meters per minute. The frictional forces within the pile are such that
the height of the pile is always two-thirds of its radius. How fast is the radius of the
pile changing at the moment when it equals 5 m?

Solution: The formula for the volume $V$ of a right circular cone of radius $r$ and height
$h$ is

\[ V = \frac{1}{3}\pi r^2 h . \]

In the present case, it is prescribed that $h = \frac{2}{3} r$, and hence

\[ V(r) = \frac{2}{9}\pi r^3 , \qquad V'(r) = \frac{2}{3}\pi r^2 . \]

Since gravel is being poured onto the pile, both volume and radius are functions of
time $t$. The chain rule implies that

\[ \frac{d}{dt} V(r(t)) = V'(r(t))\,r'(t) = \frac{2}{3}\pi r(t)^2\, r'(t) , \]

or in abbreviated form

\[ \frac{dV}{dt} = \frac{2}{3}\pi r^2 \cdot \frac{dr}{dt} . \tag{3.23} \]

It is prescribed that $\frac{dV}{dt} = 40\pi$. Our goal is to find $\frac{dr}{dt}$ at the time $t$ when $r(t) = 5$.
Inserting this information into (3.23), one obtains

\[ 40\pi = \frac{50}{3}\pi \cdot \frac{dr}{dt} , \quad\text{therefore}\quad \frac{dr}{dt} = \frac{12}{5} . \]

We conclude that the radius is increasing at the rate of 2.4 m/min at the moment when
the radius equals 5 m (Fig. 3.6).

Fig. 3.6 A conical pile of sand
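The related-rates computation of Example 3.10 can be cross-checked numerically: if the radius grows at the computed rate, the volume should grow at the prescribed rate. The sketch below assumes a small time step dt for the check; names and step size are illustrative.

```python
import math

def cone_volume(r):
    """V(r) = (2/9) * pi * r**3 for a cone with h = (2/3) r."""
    return 2.0 / 9.0 * math.pi * r**3

def radius_rate(r, dV_dt):
    """Solve (3.23), dV/dt = (2/3) * pi * r**2 * dr/dt, for dr/dt."""
    return dV_dt / (2.0 / 3.0 * math.pi * r**2)

r = 5.0
dV_dt = 40.0 * math.pi
dr_dt = radius_rate(r, dV_dt)

# Cross-check: over a short time dt the volume should grow by about dV_dt * dt.
dt = 1e-6
volume_gain = cone_volume(r + dr_dt * dt) - cone_volume(r)
print(dr_dt, volume_gain / dt, dV_dt)
```

The second and third printed values should nearly coincide.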
Example 3.11 Sand is poured on a conical pile at a rate of 10 cubic meters per minute.
The diameter of the base of the pile is always 50% greater than its height. Determine
how fast the height of the pile is rising when the pile is 5 m high.
Solution: As in the previous example, $V = \frac{1}{3}\pi r^2 h$. This time, it is given that $2r = \frac{3}{2} h$,
so $r = \frac{3}{4} h$. Thus

\[ V(h) = \frac{1}{3}\pi \left(\frac{3}{4} h\right)^2 h = \frac{3}{16}\pi h^3 , \qquad V'(h) = \frac{9}{16}\pi h^2 . \]

Since $h$ is a function of $t$, the chain rule $\frac{d}{dt} V(h(t)) = V'(h(t))\,h'(t)$ thus yields

\[ \frac{dV}{dt} = \frac{9}{16}\pi h^2 \cdot \frac{dh}{dt} . \]

We substitute the values of $\frac{dV}{dt}$ and $h$ given in the problem and obtain

\[ 10 = \frac{9}{16}\pi \cdot 5^2 \cdot \frac{dh}{dt} , \quad\text{therefore}\quad \frac{dh}{dt} = \frac{32}{45\pi} . \]

At the moment when $h = 5$ m, the height rises at a rate of $32/(45\pi)$ m/min.
Example 3.12 A searchlight is mounted 1000 m offshore and rotates at a constant
angular speed of four revolutions per minute. Determine how fast the spot of light is
moving along the shoreline when it reaches a point 1500 m from the light.
Solution: We choose the origin at the searchlight, and the vertical line $x = 1000$ for
the shoreline, thus the $x$-axis is perpendicular to the shoreline. Let $y$ represent the
current position of the spot of light on the shoreline, and let $\theta$ be the angle between the
positive $x$-axis and the direction of the searchlight. We know that $\frac{d\theta}{dt} = 8\pi$ rad/min.
We need to find $\frac{dy}{dt}$.

Fig. 3.7 Moving spot of light

From Fig. 3.7, we see that $y = 1000 \tan\theta$, so $\frac{dy}{d\theta} = 1000 \sec^2\theta$. By the chain rule

\[ \frac{dy}{dt} = \frac{dy}{d\theta}\,\frac{d\theta}{dt} = 1000 \sec^2\theta \cdot 8\pi = 8000\pi \sec^2\theta . \tag{3.24} \]

At the moment of interest, the distance $l$ from the searchlight to the spot of light
equals 1500, so

\[ \sec\theta = \frac{1}{\cos\theta} = \frac{1500}{1000} = \frac{3}{2} . \]

Substituting this value into (3.24), we obtain

\[ \frac{dy}{dt} = 8000\pi \left(\frac{3}{2}\right)^2 = 18000\pi . \]

The spot of light is moving at a speed of $18000\pi$ m/min at the moment in question.

Example 3.13 Find the derivatives of the following functions:

(a) $f(x) = x^{2/3}$
(b) $f(x) = x^{-3}$
(c) $f(x) = (2x^3 - x)(x^4 + 3x)$
(d) $f(x) = \frac{8}{x^2} - \frac{6}{x}$
(e) $f(x) = \frac{6x^2 - 1}{x^4 + 5x + 1}$
(f) $f(x) = \frac{1}{9x^2 + 8x + 10}$
(g) $f(x) = (x^2 - 1)^{100}$
(h) $f(x) = 2x^3 (x^2 - 3)^4$
(i) $f(x) = x(x^2 + 1)^3$
(j) $f(x) = \sin x \cdot \cos x$
(k) $f(x) = e^{-x^2}$
Solution:
(a) For $f(x) = x^{2/3}$, we have

\[ f'(x) = \frac{2}{3} x^{\frac{2}{3} - 1} = \frac{2}{3} x^{-\frac{1}{3}} = \frac{2}{3}\,\frac{1}{\sqrt[3]{x}} , \qquad x > 0 . \]

(The graph of $x^{2/3}$ suggests that the tangent line at $x = 0$ is the $y$-axis, which
has an infinite slope.)
(b) For $f(x) = x^{-3}$, we have by the power rule

\[ f'(x) = -3x^{-3-1} = -3x^{-4} . \]

(c) For $f(x) = (2x^3 - x)(x^4 + 3x)$, if we use the product rule, we get

\[
\begin{aligned}
f'(x) &= \frac{d}{dx}[2x^3 - x] \cdot (x^4 + 3x) + (2x^3 - x) \cdot \frac{d}{dx}[x^4 + 3x] \\
&= (6x^2 - 1)(x^4 + 3x) + (2x^3 - x)(4x^3 + 3) \\
&= (6x^6 - x^4 + 18x^3 - 3x) + (8x^6 - 4x^4 + 6x^3 - 3x) \\
&= 14x^6 - 5x^4 + 24x^3 - 6x .
\end{aligned}
\]

In this case, it is easier first to multiply out and then to differentiate with the
power rule,

\[ f(x) = 2x^7 - x^5 + 6x^4 - 3x^2 , \qquad f'(x) = 14x^6 - 5x^4 + 24x^3 - 6x . \]

(d) For $f(x) = \frac{8}{x^2} - \frac{6}{x} = 8x^{-2} - 6x^{-1}$, we obtain from the power rule

\[ f'(x) = -16x^{-3} + 6x^{-2} = -\frac{16}{x^3} + \frac{6}{x^2} . \]

(e) For $f(x) = \frac{6x^2 - 1}{x^4 + 5x + 1}$, we use the quotient rule,

\[
\begin{aligned}
f'(x) &= \frac{(x^4 + 5x + 1)(12x) - (6x^2 - 1)(4x^3 + 5)}{(x^4 + 5x + 1)^2} \\
&= \frac{-12x^5 + 4x^3 + 30x^2 + 12x + 5}{(x^4 + 5x + 1)^2} .
\end{aligned}
\]

(f) For $f(x) = \frac{1}{9x^2 + 8x + 10}$, we use the reciprocal rule,

\[ f'(x) = \frac{-(18x + 8)}{(9x^2 + 8x + 10)^2} . \]

(g) For $f(x) = (x^2 - 1)^{100}$, we use the chain rule,

\[
\begin{aligned}
f'(x) &= 100(x^2 - 1)^{99}\,\frac{d}{dx}(x^2 - 1) = 100(x^2 - 1)^{99} \cdot 2x \\
&= 200x(x^2 - 1)^{99} .
\end{aligned}
\]

(h) For $f(x) = 2x^3 (x^2 - 3)^4$, we use first the product rule and then the chain rule,

\[
\begin{aligned}
\frac{d}{dx}[2x^3 (x^2 - 3)^4] &= 2x^3 \cdot \frac{d}{dx}[(x^2 - 3)^4] + (x^2 - 3)^4 \cdot \frac{d}{dx}(2x^3) && \text{(product rule)} \\
&= 2x^3 \cdot 4(x^2 - 3)^3 \cdot 2x + (x^2 - 3)^4 \cdot 6x^2 && \text{(chain rule)} \\
&= 16x^4 (x^2 - 3)^3 + 6x^2 (x^2 - 3)^4 \\
&= 2x^2 (x^2 - 3)^3 (11x^2 - 9) .
\end{aligned}
\]

(i) For $f(x) = x(x^2 + 1)^3$, we proceed in the same way,

\[
\begin{aligned}
\frac{d}{dx}[x(x^2 + 1)^3] &= x \cdot \frac{d}{dx}[(x^2 + 1)^3] + (x^2 + 1)^3 \cdot \frac{d}{dx}(x) \\
&= x \cdot 3(x^2 + 1)^2 \cdot 2x + (x^2 + 1)^3 \cdot 1 \\
&= 6x^2 (x^2 + 1)^2 + (x^2 + 1)^3 \\
&= (x^2 + 1)^2 [6x^2 + x^2 + 1] \\
&= (x^2 + 1)^2 (7x^2 + 1) .
\end{aligned}
\]

(j) For $f(x) = \sin x \cdot \cos x$, we use the product rule,

\[ f'(x) = \sin'(x) \cdot \cos(x) + \sin(x) \cdot \cos'(x) = \cos^2 x - \sin^2 x . \]

(k) For $f(x) = e^{-x^2}$, we use the chain rule,

\[ f'(x) = e^{-x^2} \cdot \frac{d}{dx}[-x^2] = -2x e^{-x^2} . \]

3.4 Derivatives of Higher Order

When we differentiate a function $y = f(x)$, its derivative $f'$ is again a function.
The derivative of $f'$ is called the second derivative of $f$ and is denoted by $f''$. The
derivative of the function $f''$ is called the third derivative of $f$ and denoted by $f'''$,
or alternatively $f^{(3)}$. The fourth derivative is denoted by $f^{(4)}$. In general, if $n$ is a
positive integer, the $n$th derivative of $f$ is denoted by $f^{(n)}$.
Example 3.14 Find the fourth derivative of the following functions:
(a) $f(x) = x^4 - 3x^{-1} + 10$,
(b) $f(x) = \sin(2x)$.

Solution:
(a) To find the fourth derivative of $f(x) = x^4 - 3x^{-1} + 10$, we have to compute
sequentially the first, second, and third derivatives,

\[ f'(x) = 4x^3 + \frac{3}{x^2} , \qquad f''(x) = 12x^2 - 6x^{-3} , \]
\[ f'''(x) = 24x + 18x^{-4} , \qquad f^{(4)}(x) = 24 - 72x^{-5} . \]

(b) The same procedure applies to $f(x) = \sin(2x)$,

\[ f'(x) = 2\cos(2x) , \qquad f''(x) = -4\sin(2x) , \]
\[ f'''(x) = -8\cos(2x) , \qquad f^{(4)}(x) = 16\sin(2x) . \]

As in the case of the first derivative, various different notations are in use to denote
higher derivatives. Instead of $f, f', f'', f''', \dots$ we may write

\[ f , \ \frac{df}{dx} , \ \frac{d^2 f}{dx^2} , \ \frac{d^3 f}{dx^3} , \ \dots \]

or, if we use $y$ to denote a function $y = y(x)$, we may write

\[ y , \ y' , \ y'' , \ y''' , \ y^{(4)} , \dots , \quad \frac{dy}{dx} , \ \frac{d^2 y}{dx^2} , \ \frac{d^3 y}{dx^3} , \ \dots \]

So, for example, $y = x^{-1}$ has the derivatives

\[ y' = -x^{-2} , \quad y'' = 2x^{-3} , \quad y''' = -6x^{-4} , \quad \frac{d^4 y}{dx^4} = y^{(4)}(x) = 24x^{-5} . \]

3.5 A Basic Differential Equation

A differential equation is an equation for one (or several) unknown functions, in
which derivatives of those functions appear. Differential equations as mathematical
models for phenomena in the real world have been developed for several hundred
years, and are nowadays practically ubiquitous. In this section, we present a basic
differential equation which leads to exponential growth or decay, and we discuss
various situations where it arises.
The basic model. In different areas of science and technology, there arise many sit-
uations in which the following assertion is valid, in an exact or approximate manner:

A quantity Q varies at a rate proportional to itself. (3.25)

Since the mathematical expression for the rate of change of $Q$ is the derivative of $Q$,
the corresponding mathematical model takes the form of the differential equation

\[ \frac{dQ}{dt} = kQ . \tag{3.26} \]
Here k is a real number, the constant of proportionality, and Q is a function of an
independent variable which we denote by t, because in many applications it will
stand for time. For a positive quantity $Q$, if $k > 0$, then (3.26) signifies that the time
rate of change of $Q$ is positive, thus the amount $Q$ of the quantity is increasing with
time. If $k < 0$, then (3.26) implies that the time rate of change of $Q$ is negative, thus
the amount $Q$ of the quantity is decreasing with time.
We will study differential equations later in more detail (see Chap. 11), but we
can check immediately that the function

\[ Q(t) = Q_0 e^{kt} , \tag{3.27} \]

$Q_0$ being an arbitrary constant, solves (3.26), as we may simply differentiate both
sides of (3.27) and obtain

\[ Q'(t) = Q_0 \cdot k e^{kt} = k\,Q(t) . \]
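That (3.27) satisfies (3.26) can also be observed numerically: the difference quotient of $Q$ should match $kQ$ at every time. In the sketch below, the values $Q_0 = 3$ and $k = -0.5$, the sample times and the step size are arbitrary illustrative assumptions.

```python
import math

def Q(t, Q0=3.0, k=-0.5):
    """Candidate solution (3.27); Q0 and k are arbitrary illustrative values."""
    return Q0 * math.exp(k * t)

def residual(t, k=-0.5, h=1e-6):
    """Central-difference estimate of dQ/dt minus k*Q; should be near zero."""
    dQ_dt = (Q(t + h, k=k) - Q(t - h, k=k)) / (2.0 * h)
    return dQ_dt - k * Q(t, k=k)

for t in (0.0, 1.0, 2.5):
    print(t, residual(t))
```

The printed residuals should be zero up to rounding and discretization error.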

Population Growth Model


In his 1798 “Essay on the Principle of Population”, the English economist Thomas
Malthus introduced (3.25) into the study of the growth of human populations. He
assumed that the rate, at which a population grows at a certain time, is proportional
to the total population at that time. Let N (t) denote the population size (=number of
individuals) of a country at time t. The mathematical model above (with Q replaced
by N ) then becomes
\[ \frac{dN}{dt} = kN . \tag{3.28} \]

If the population size equals $N_0$ at the time $t_0$, we must have $N(t_0) = N_0$. The latter
condition fixes the arbitrary constant in the solution, and we arrive at

\[ N(t) = N_0 e^{k(t - t_0)} . \tag{3.29} \]

The model thus predicts the population size at any time t, if we know its size at some
time t0 . It is a rather simple model which does not take into account other important
factors like crowding, immigration, or emigration. Many more refined models have
been studied in the life and social sciences, as well as by mathematicians. The basic
model (3.28) is often used to model the growth of small populations over a short
interval of time.

Example 3.15 Suppose the US population grows continuously at the rate of 4% per
year. If the population is currently 275 million, what will it be in 1 year? In 10 years?
Solution: Let us fix the current time as $t_0 = 0$. With $N$ measured in millions and $t$ in
years, the constants in (3.28), (3.29) become $k = 0.04$ and $N_0 = 275$, so the solution
is

\[ N(t) = 275 e^{0.04t} . \]

After 1 year, the population size will be equal to $N(1) = 275 e^{0.04} \approx 286.223$ million;
after 10 years, its size will be $N(10) = 275 e^{0.4} \approx 410.25$ million.

Example 3.16 Assume that the population of the United States grows at a rate pro-
portional to the current population. According to census, the population size was
75.99 million in the year 1900 and 134 million in the year 1940.
(a) Find the annual growth rate during this period.
(b) Using the growth rate found in part (a), determine the population size predicted
by the model for the year 2000.

Solution:
(a) With the constants $t_0 = 1900$ and $N_0 = 75.99$, the solution (3.29) becomes

\[ N(t) = 75.99\, e^{k(t - 1900)} . \tag{3.30} \]

Inserting the data for $t = 1940$, we get

\[ 134 = N(1940) = 75.99\, e^{40k} , \qquad k = \frac{1}{40} \ln \frac{134}{75.99} \approx 0.01418 . \]

Hence, the average growth rate was about 1.42% per year during the period from
1900 to 1940.
(b) For this part, $t = 2000$ and $k = 0.01418$. We insert these values into (3.30) and
obtain $N(2000) = 75.99\, e^{100 \cdot 0.01418} \approx 313.76$ million.
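The two steps of Example 3.16, fitting $k$ from two census values and then extrapolating with (3.29), can be sketched as follows. The function names are illustrative; because the code does not round $k$, the last digits of the prediction may differ slightly from the value in the text.

```python
import math

def growth_rate(t0, N0, t1, N1):
    """Solve N1 = N0 * exp(k * (t1 - t0)) for k."""
    return math.log(N1 / N0) / (t1 - t0)

def population(t, t0, N0, k):
    """Formula (3.29): N(t) = N0 * exp(k * (t - t0))."""
    return N0 * math.exp(k * (t - t0))

k = growth_rate(1900, 75.99, 1940, 134.0)
print(round(k, 5))                          # growth rate per year
print(population(2000, 1900, 75.99, k))     # model prediction in millions
```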
Bacterial Growth Model
Example 3.17 Assume that a colony of bacteria increases at a rate proportional to
the current number. If the number of the bacteria doubles in 5 h, how long will it take
for their number to triple?
Solution: We fix $t_0 = 0$ and obtain the formula $N(t) = N_0 e^{kt}$ for the number of
bacteria as before. Since we know that this number doubles in 5 h, we get

\[ 2 = \frac{N(5)}{N_0} = e^{5k} , \quad\text{therefore}\quad k = \frac{1}{5} \ln 2 . \]

The time $t$ to triple has to satisfy

\[ 3 = \frac{N(t)}{N_0} = e^{kt} , \]

and we obtain $t$ from

\[ \ln 3 = kt , \qquad t = \frac{\ln 3}{k} = 5\,\frac{\ln 3}{\ln 2} = 5 \cdot \frac{1.0986}{0.6931} \approx 7.925 . \]

Thus, after 7.925 h the number of bacteria has tripled.
Financial Growth Model
Consider an amount $y_0$ of capital, deposited in a bank account. If the bank pays
an annual interest rate of $r$ percent at the end of the year (annual compounding),
the capital will then amount to $y_0(1 + k)$, where $k = 0.01r$. After $t$ years, the total
amount will be

\[ y = y_0 (1 + k)^t . \]

Compounding monthly for $t$ years yields

\[ y(t) = y_0 \left(1 + \frac{k}{12}\right)^{12t} . \]

Compounding daily for $t$ years yields

\[ y = y_0 \left(1 + \frac{k}{365}\right)^{365t} . \]

If we compound $kn$ times per year at equal time intervals, after $t$ years, we will have
an amount of

\[ y(t) = y_0 \left(1 + \frac{k}{kn}\right)^{knt} = y_0 \left[\left(1 + \frac{1}{n}\right)^{n}\right]^{kt} . \]

We say that interest is compounded continuously, if we let $n$ tend to infinity in the
latter formula. Then

\[ y(t) = y_0 e^{kt} , \quad\text{since}\quad e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^{n} . \]

We see that continuous compounding results in capital growth according to our basic
model

\[ \frac{dy}{dt} = ky . \]
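The passage from discrete to continuous compounding can be observed numerically: as the number of compounding periods per year grows, the balance approaches $y_0 e^{kt}$. The starting capital, rate and horizon in this sketch are illustrative assumptions, not values from the text.

```python
import math

def compounded(y0, k, t, periods_per_year):
    """Balance after t years with rate k (as a fraction) paid in equal periods."""
    return y0 * (1.0 + k / periods_per_year) ** (periods_per_year * t)

def continuous(y0, k, t):
    """Balance under continuous compounding, y0 * exp(k * t)."""
    return y0 * math.exp(k * t)

y0, k, t = 1000.0, 0.05, 10.0   # illustrative values
for m in (1, 12, 365, 100000):
    print(m, compounded(y0, k, t, m))
print("continuous", continuous(y0, k, t))
```

The balances increase with the compounding frequency and stay below the continuous limit.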
Example 3.18 How long will it take for money in the bank to double at 10% annual
interest, if compounded continuously? Compounded annually?

Solution: We determine the time $t$ from the equation

\[ 2 = \frac{y(t)}{y_0} = e^{kt} , \qquad t = \frac{1}{k} \ln 2 . \]

Since $k = 0.1$, the result is that the capital doubles after $t = 10 \ln 2 \approx 6.93$ years.
If we use the interest rate $r$ (in percent) instead of $k$, the formula above becomes

\[ t = \frac{100}{r} \ln 2 \approx \frac{69.3}{r} . \]

This corresponds to the banker’s “Rule of 70” which says that the doubling time can
be estimated by dividing 70 by the interest rate.
For interest compounded annually, $y(t) = y_0 (1 + k)^t = y_0 (1.1)^t$. At doubling,
$(1.1)^t = 2$, $t \ln 1.1 = \ln 2$, $t = \ln 2 / \ln 1.1 \approx 7.3$. Since annual compounding occurs
only when $t$ is an integer, the balance will not double before the eighth interest
payment.
Radioactive Decay
For a radioactive substance, the rate of decay at a given time t is proportional to
the amount present at that time. That is, if A represents the amount of radioactive
substance at time t, again our basic model

\[ \frac{dA}{dt} = kA \]
applies. Here, the constant k is negative and depends on the radioactive substance.
The half-life of the radioactive substance is defined as the time when half of the
substance has decayed.
In carbon dating, we use the fact that all living organisms contain two kinds of
carbon, carbon-12 (a stable carbon) and carbon-14 (a radioactive carbon). As a result,
when an organism dies, the amount of carbon-12 which is present within the organism
remains unchanged, while the amount of carbon-14 begins to decrease. This change
in the amount of carbon-14 relative to the amount of carbon-12 makes it possible to
calculate the time at which the organism lived.

Example 3.19 In the skull of an animal found in an archaeological dig, it was deter-
mined that about 20% of the original amount of carbon-14 was still present. The
half-life of carbon-14 is 5600 years. Find the approximate age of the animal.

Solution: Let $A(t)$ be the amount of carbon-14 present in the skull at time $t$. Then
$A$ satisfies the differential equation $dA/dt = kA$, whose solution is $A(t) = A_0 e^{kt}$,
where $A_0$ is the amount of carbon-14 present at time $t = 0$, the time of death of the
animal. To determine the constant $k$, we use the fact that when $t = 5600$, half of the
original amount $A_0$ will remain. Thus,

\[ \frac{1}{2} = \frac{A(5600)}{A_0} = e^{5600k} , \qquad k = \frac{1}{5600} \ln \frac{1}{2} \approx -0.000124 . \]

Thus, at time $t$ an amount of $A(t) = A_0 e^{-0.000124\,t}$ of carbon-14 is still present. If this
comprises 20% of the original amount $A_0$, we have

\[ \ln 0.2 = -0.000124\,t , \quad\text{therefore}\quad t = \frac{1.6094}{0.000124} \approx 12{,}979 . \]

Thus, the animal lived approximately 13,000 years ago.
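The carbon-dating computation can be sketched in a few lines. The function names are illustrative; since the code keeps the unrounded decay constant, the computed age differs slightly from the figure obtained above with the rounded value $k = -0.000124$, while remaining close to 13,000 years.

```python
import math

def decay_constant(half_life):
    """k in A(t) = A0 * exp(k * t), determined from A(half_life) = A0 / 2."""
    return math.log(0.5) / half_life

def age_from_fraction(fraction_remaining, half_life):
    """Solve fraction_remaining = exp(k * t) for t."""
    return math.log(fraction_remaining) / decay_constant(half_life)

k = decay_constant(5600.0)
print(k)                                   # decay constant per year
print(age_from_fraction(0.20, 5600.0))     # estimated age in years
```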

3.6 Differentials, Newton–Raphson Approximation

Differentials. Let $f$ be a differentiable function whose graph and the tangent line at
the point $(x, f(x))$ are shown in Fig. 3.8. We see that $f(x + h) - f(x)$, the change
in $f$ from $x$ to $x + h$, can be approximated by the product $f'(x)h$ for small $h$, $h \neq 0$:

\[ f(x + h) - f(x) \approx f'(x)\,h . \tag{3.31} \]

Definition 3.2 Let $h \neq 0$. The difference $f(x + h) - f(x)$ is called the increment
of $f$ from $x$ to $x + h$ and is denoted by $\Delta f$,

\[ \Delta f = f(x + h) - f(x) . \]

The product $f'(x)h$ is called the differential of $f$ at $x$ with increment $h$ and is
denoted by $df$,

\[ df = f'(x)\,h . \]

As defined above, both Δ f and d f are functions which depend on x and h. In actual
computation, one usually just writes Δf and d f as above, instead of Δf (x, h) or
d f (x, h).

Fig. 3.8 Increment and differential

Figure 3.8 shows that, for small $h$, the values of $\Delta f$ and $df$ are approximately equal,

\[ \Delta f \approx df . \tag{3.32} \]

But the main point here is that, when $h$ tends to 0, not only the difference $\Delta f - df$
tends to 0 but also the ratio

\[ \frac{\Delta f - df}{h} \tag{3.33} \]

tends to 0, so $\Delta f - df$ tends to 0 faster than $h$. Indeed,

\[ \frac{\Delta f}{h} = \frac{f(x + h) - f(x)}{h} , \qquad \frac{df}{h} = \frac{f'(x)\,h}{h} = f'(x) \]

are the difference quotient and the derivative of $f$, respectively, and their difference
tends to 0, since $f$ is differentiable.
We now consider a simple case. The area of a square of sidelength $x > 0$ is given
by the function

\[ f(x) = x^2 . \]

If the length of each side is increased from $x$ to $x + h$, the area increases from $f(x)$
to $f(x + h)$. The change in area equals the increment $\Delta f$,

\[ \Delta f = f(x + h) - f(x) = (x + h)^2 - x^2 = (x^2 + 2xh + h^2) - x^2 = 2xh + h^2 . \]

See Fig. 3.9. As an estimate for this change, we can use the differential

\[ df = f'(x)\,h = 2xh . \]

The error of this estimate, the difference between the actual change $\Delta f$ and the
estimated change $df$, is the difference

\[ \Delta f - df = h^2 . \]

Fig. 3.9 Error of the estimated change $df$

The error of the estimate is small compared to $h$ in the sense that

\[ \frac{\Delta f - df}{h} = \frac{h^2}{h} = h \]

tends to 0 as $h$ tends to 0.
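The behaviour of the error $\Delta f - df$ for the square can be tabulated for shrinking $h$. In this minimal sketch the side length $x = 3$ and the list of increments are illustrative assumptions.

```python
def increment(f, x, h):
    """Delta f = f(x + h) - f(x)."""
    return f(x + h) - f(x)

def differential(f_prime, x, h):
    """df = f'(x) * h."""
    return f_prime(x) * h

def area(x):
    return x * x

def area_prime(x):
    return 2.0 * x

x = 3.0   # illustrative side length
for h in (1.0, 0.1, 0.01, 0.001):
    error = increment(area, x, h) - differential(area_prime, x, h)
    print(h, error, error / h)
```

Up to rounding, the error column equals $h^2$ and the ratio column equals $h$, so the ratio tends to 0 together with $h$.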

Example 3.20 Use a differential to estimate the change in $f(x) = x^{2/5}$ if
(a) $x$ is increased from 34 to 36,
(b) $x$ is decreased from 1 to $\frac{7}{10}$.

Solution: Since $f'(x) = \frac{2}{5} x^{-3/5} = 2/(5x^{3/5})$, we have

\[ df = f'(x)\,h = \frac{2h}{5x^{3/5}} . \]

(a) We set $x = 34$ and $h = 2$. The differential then becomes

\[ df = \frac{2}{5 \cdot (34)^{3/5}} \cdot 2 = \frac{4}{41.48} \approx 0.096 . \]

A change in $x$ from 34 to 36 increases the value of $f$ by approximately 0.096. We
have

\[ \Delta f = f(36) - f(34) = (36)^{2/5} - (34)^{2/5} \approx 4.193 - 4.098 = 0.095 . \]

(b) Put $x = 1$ and $h = -\frac{3}{10}$. In this case, the differential is

\[ df = \frac{2}{5 \cdot (1)^{3/5}} \cdot \left(-\frac{3}{10}\right) = -\frac{6}{50} = -0.12 . \]

A change in $x$ from 1 to $\frac{7}{10}$ decreases the value of $f$ by approximately 0.12. We also
have

\[ \Delta f = f(0.7) - f(1) = (0.7)^{2/5} - (1)^{2/5} \approx 0.867 - 1 = -0.133 . \]

Example 3.21 Use a differential to estimate
(a) $\sqrt{105}$,
(b) $\cos 40^\circ$,
starting from known values for nearby arguments.

Solution:
(a) We know that $\sqrt{100} = 10$. We want to obtain an estimate for the increase of

\[ f(x) = \sqrt{x} \]

when $x$ increases from 100 to 105. Here,

\[ f'(x) = \frac{1}{2\sqrt{x}} , \qquad df = f'(x)\,h = \frac{h}{2\sqrt{x}} . \]

With $x = 100$ and $h = 5$, $df$ becomes

\[ \frac{5}{2\sqrt{100}} = \frac{1}{4} = 0.25 . \]

A change in $x$ from 100 to 105 increases the value of the square root by
approximately 0.25. Hence,

\[ \sqrt{105} \approx \sqrt{100} + 0.25 = 10 + 0.25 = 10.25 . \]

Since $(10.25)^2 = 105.0625$, the estimate is not far off.
(b) Let $f(x) = \cos x$. We know that $\cos 45^\circ = \cos \frac{\pi}{4} = \sqrt{2}/2$. The angle
$40^\circ = 45^\circ - 5^\circ$ can be written in radians as

\[ \frac{\pi}{4} - 5 \cdot \frac{\pi}{180} = \frac{\pi}{4} - \frac{\pi}{36} \ \text{rad.} \]

We use a differential to estimate the change in $\cos x$, when $x$ decreases from $\pi/4$
to $(\pi/4) - (\pi/36)$. We have

\[ f'(x) = -\sin x , \qquad df = f'(x)\,h = -h \sin x . \]

With $x = \pi/4$ and $h = -\pi/36$, $df$ becomes

\[ df = -\left(-\frac{\pi}{36}\right) \sin \frac{\pi}{4} = \frac{\pi}{36} \cdot \frac{\sqrt{2}}{2} = \frac{\pi\sqrt{2}}{72} \approx 0.0617 . \]

Thus, a decrease in $x$ from $\pi/4$ to $(\pi/4) - (\pi/36)$ increases the value of the
cosine function by approximately 0.0617. Therefore,

\[ \cos 40^\circ \approx \cos 45^\circ + 0.0617 \approx 0.7071 + 0.0617 = 0.7688 . \]
Example 3.22 A metal sphere with a radius of 5 cm is to be covered with a 0.02 cm
coating of silver. Approximately how much silver will be required?
3.6 Differentials, Newton–Raphson Approximation 87

Fig. 3.10 Newton–Raphson


method

Solution: We use a differential to estimate the increase in the volume of the sphere when the radius is increased from 5 to 5.02 cm. The formula for the volume of a sphere of radius r is
\[ V(r) = \frac{4}{3}\pi r^3 . \]
The differential $dV = V'(r)h$ then becomes
\[ dV = 4\pi r^2 h . \]
For r = 5 and h = 0.02, we get
\[ dV = 4\pi(5)^2 \cdot 0.02 = 2\pi \cong 6.283 . \]
Thus, it will take approximately 6.283 cm³ of silver to coat the sphere.
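As a rough check of Example 3.22, the sketch below compares the differential dV with the exact change in volume. The function name `sphere_volume` is an illustrative choice, not taken from the text.

```python
import math

def sphere_volume(r):
    """Volume of a sphere of radius r."""
    return 4.0 / 3.0 * math.pi * r**3

r, h = 5.0, 0.02                                    # radius and coating thickness in cm
dV = 4.0 * math.pi * r**2 * h                       # differential V'(r) * h
delta_V = sphere_volume(r + h) - sphere_volume(r)   # exact change in volume

print(f"dV      = {dV:.4f} cm^3")
print(f"delta V = {delta_V:.4f} cm^3")
```

The exact change is slightly larger than dV, because the differential ignores the terms of order h² and h³.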
Newton–Raphson Method
Let f be a function whose graph is shown in Fig. 3.10. Since the graph of f crosses
the x-axis at x = c, the number c is a solution (root) of the equation f (x) = 0. In
the setup of Fig. 3.10, we can approximate c as follows: start at a point x1 (see the
figure). The tangent line at (x1 , f (x1 )) intersects the x-axis at a point x2 , which is
closer to c than x1 . The tangent line at (x2 , f (x2 )) intersects the x-axis at a point x3 ,
which is closer to c than x2 . Continuing in this manner, we will obtain better and
better approximations x4 , x5 , . . . , xn to the root c.
We now develop an algebraic connection between $x_n$ and $x_{n+1}$. The tangent line at $(x_n, f(x_n))$ has the equation
\[ y - f(x_n) = f'(x_n)(x - x_n) . \]
The value $x_{n+1}$, where it intersects the x-axis, can be found by setting y = 0,
\[ 0 - f(x_n) = f'(x_n)(x_{n+1} - x_n) . \]
Solving this equation for $x_{n+1}$, we have
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} . \]

Fig. 3.11 Failure when the tangent becomes horizontal

The method described above of locating a root of an equation f(x) = 0 is called the Newton–Raphson method, or simply the Newton method. It works if the following conditions are satisfied:
(i) f is differentiable in some interval that includes the root c.
(ii) $f'(x) \neq 0$ in some interval including c. (See Fig. 3.11 for what happens if $f'(x_n) = 0$.)
(iii) The initial approximation $x_1$ is close enough to c.
Indeed, the method may fail if $x_1$ is not chosen properly. For instance, it can happen that the first approximation $x_1$ produces a second approximation $x_2$ which, in turn, gives the same $x_1$ as the third approximation, and so on: the approximations simply alternate between $x_1$ and $x_2$. See Fig. 3.12. Another type of difficulty can arise if $f'(x_1)$ is smaller than $f'(c)$. In this case, the second approximation $x_2$ could be worse than $x_1$, the third approximation $x_3$ could be worse than $x_2$, and so forth (see Fig. 3.13).
Let us consider a situation where the Newton–Raphson method is guaranteed to work no matter how far $x_1$ is away from the root c. Suppose that $x_1 > c$, f is twice differentiable and that $f(x) f''(x) > 0$ on the open interval I joining c and $x_1$. If $f''(x) > 0$ on I, then the graph of f is curved upward¹ on I, and we have the situation pictured in Fig. 3.14. On the other hand, if $f''(x) < 0$ on I, then the graph of f is curved downward on I, and we have the situation pictured in Fig. 3.15. In either case, the sequence of approximations $x_1, x_2, x_3, \ldots$ will converge to the root c.

¹ The curvature of a graph and the role of the second derivative are treated in Chap. 4.

Fig. 3.12 Approximations alternate
Fig. 3.13 Approximations get worse
Fig. 3.14 Approximations converge to a root
Fig. 3.15 Approximations converge to a root

Example 3.23 The number $\sqrt{5}$ is a root of the equation $x^2 - 5 = 0$. We will estimate $\sqrt{5}$ by applying the Newton–Raphson method to the function $f(x) = x^2 - 5$ starting at $x_1 = 4$. (As you can check, $f(x) f''(x) > 0$ on $(\sqrt{5}, 4)$, and therefore we can be sure that the method applies.) Since $f'(x) = 2x$, the Newton–Raphson formula gives
\[ x_{n+1} = x_n - \frac{x_n^2 - 5}{2x_n} = \frac{x_n^2 + 5}{2x_n} . \]
Successive calculations with this formula (using a calculator) are given in the following table:

n    x_n       x_{n+1} = (x_n^2 + 5)/(2 x_n)
1    4         2.625
2    2.625     2.2649
3    2.2649    2.2363

The approximation satisfies $(2.2363)^2 \cong 5.0010$; the exact solution is $\sqrt{5} = 2.23606\ldots$ Thus, the method has generated a very accurate estimate in only three steps.
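The iteration of Example 3.23 can be reproduced with a few lines of code. This is a minimal sketch under the assumption that the derivative is supplied explicitly; the function name `newton_raphson` and its parameters are illustrative, not taken from the text.

```python
def newton_raphson(f, fprime, x1, steps):
    """Return the list [x1, x2, ...] of Newton-Raphson approximations."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        slope = fprime(x)
        if slope == 0:
            # Horizontal tangent: it never meets the x-axis (compare Fig. 3.11).
            raise ZeroDivisionError("f'(x_n) = 0, the next approximation is undefined")
        xs.append(x - f(x) / slope)
    return xs

# f(x) = x^2 - 5 with starting value x1 = 4, three steps
iterates = newton_raphson(lambda x: x * x - 5, lambda x: 2 * x, 4.0, 3)
for n, x in enumerate(iterates, start=1):
    print(n, x)
```

Because the code does not round intermediate values to four decimals, its last digits can differ slightly from a hand calculation with rounded entries.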

3.7 Indeterminate Forms and l’Hôpital’s Rule

In Chap. 2, we have already discussed methods of finding limits. In this section, we present an additional technique which is known as l’Hôpital’s rule, in honor of G. l’Hôpital, a French mathematician who lived during 1661–1704. This rule is used widely, both in theoretical work and practical calculations.
Let us illustrate the problem addressed by l’Hôpital’s rule with a very simple example. We want to find
\[ \lim_{x\to 0} \frac{2x}{x} . \tag{3.34} \]
Since we know that 2x/x = 2 for $x \neq 0$ and obviously $\lim_{x\to 0} 2 = 2$, we see that the limit in (3.34) equals 2. On the other hand, we might compute the limits of the numerator and the denominator separately,
\[ \lim_{x\to 0} 2x = 0 , \qquad \lim_{x\to 0} x = 0 , \]
but then we are stuck since there is no way from the quotient 0/0 to arrive at the correct value 2 of the limit.
More generally, we want to compute the limit
\[ \lim_{x\to c} \frac{f(x)}{g(x)} \tag{3.35} \]
in situations where the formula
\[ \lim_{x\to c} \frac{f(x)}{g(x)} = \frac{\lim_{x\to c} f(x)}{\lim_{x\to c} g(x)} \]
does not work, because the limits of numerator and denominator both have values 0 or ±∞. We say that f/g has indeterminate form 0/0, respectively ∞/∞, at x = c, if
\[ \lim_{x\to c} f(x) = 0 , \qquad \lim_{x\to c} g(x) = 0 , \tag{3.36} \]
respectively
\[ \lim_{x\to c} f(x) = \pm\infty , \qquad \lim_{x\to c} g(x) = \pm\infty . \tag{3.37} \]
The following theorem tells us how to evaluate the limit in such cases.

Theorem 3.2 (l’Hôpital’s Rule) Let f and g be differentiable on an open interval (a, b) containing c, except possibly at c itself. If f/g has the indeterminate form 0/0 or ∞/∞ at x = c and $g'(x) \neq 0$ for $x \neq c$, then
\[ \lim_{x\to c} \frac{f(x)}{g(x)} = \lim_{x\to c} \frac{f'(x)}{g'(x)} , \tag{3.38} \]
provided that $\lim_{x\to c} \frac{f'(x)}{g'(x)}$ exists or $\lim_{x\to c} \frac{f'(x)}{g'(x)} = \infty$.

Typically, the condition that $\lim_{x\to c} \frac{f'(x)}{g'(x)}$ has to exist is not checked separately, but is verified as a by-product of the computation when one uses l’Hôpital’s Rule.

Example 3.24 Evaluate the following limits using l’Hôpital’s rule:
1. $\lim_{x\to 3} \frac{x^2 - 9}{x - 3}$,
2. $\lim_{x\to 0} \frac{\sin 3x}{x}$,
3. $\lim_{x\to 0} \frac{e^x - 1}{x^2}$,
4. $\lim_{x\to 0} \frac{1 - \cos x}{x}$,
5. $\lim_{x\to 0} \frac{\sin x}{x}$ (this limit we have already computed in Chap. 2 by more elementary arguments),
6. $\lim_{x\to 0} \frac{\ln x}{\cot x}$.

Solution:
1. Here $f(x) = x^2 - 9$, $g(x) = x - 3$, and $\lim_{x\to 3} f(x) = 0$, $\lim_{x\to 3} g(x) = 0$. Therefore, f/g has indeterminate form 0/0 at x = 3. By l’Hôpital’s rule, we have
\[ \lim_{x\to 3} \frac{x^2 - 9}{x - 3} = \lim_{x\to 3} \frac{2x}{1} = 6 . \]
Note that in this example it is also possible to compute
\[ \frac{x^2 - 9}{x - 3} = x + 3 , \qquad \lim_{x\to 3} (x + 3) = 6 . \]
2. We have the form 0/0. By l’Hôpital’s rule,
\[ \lim_{x\to 0} \frac{\sin 3x}{x} = \lim_{x\to 0} \frac{(\sin 3x)'}{1} = \lim_{x\to 0} 3\cos 3x = 3 . \]
3. We have the form 0/0. By l’Hôpital’s rule, for x → 0+,
\[ \lim_{x\to 0^+} \frac{e^x - 1}{x^2} = \lim_{x\to 0^+} \frac{e^x}{2x} = \infty . \]
(For x → 0− the quotient $e^x/(2x)$ tends to −∞, so the value ∞ refers to the limit from the right.)
4. We have the form 0/0. By l’Hôpital’s rule,
\[ \lim_{x\to 0} \frac{1 - \cos x}{x} = \lim_{x\to 0} \frac{\sin x}{1} = 0 . \]
5. We have the form 0/0. By l’Hôpital’s rule,
\[ \lim_{x\to 0} \frac{\sin x}{x} = \lim_{x\to 0} \frac{\cos x}{1} = 1 . \]
6. Since ln x is defined only for x > 0, the limit is taken for x → 0+. We have the form ∞/∞. By l’Hôpital’s rule,
\[ \lim_{x\to 0^+} \frac{\ln x}{\cot x} = \lim_{x\to 0^+} \frac{1/x}{-\csc^2 x} = -\lim_{x\to 0^+} \frac{\sin^2 x}{x} , \]
if the latter limit exists. Now we may either use the product formula to obtain
\[ -\lim_{x\to 0^+} \frac{\sin^2 x}{x} = -\lim_{x\to 0^+} \frac{\sin x}{x} \cdot \lim_{x\to 0^+} \sin x = -(1 \cdot 0) = 0 , \]
or we may apply the rule a second time to get
\[ -\lim_{x\to 0^+} \frac{\sin^2 x}{x} = -\lim_{x\to 0^+} \frac{2 \sin x \cos x}{1} = 0 . \]

Fig. 3.16 An electrical circuit

The rule can also be applied at c = ±∞. Consider, for example,
\[ \lim_{x\to\infty} \frac{x^2}{\ln x} . \]
Applying l’Hôpital’s rule, we get
\[ \lim_{x\to\infty} \frac{x^2}{\ln x} = \lim_{x\to\infty} \frac{2x}{1/x} = \lim_{x\to\infty} 2x^2 = \infty . \]
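A limit obtained with l’Hôpital’s rule can be made plausible by evaluating the quotient at points approaching c. The sketch below does this for three of the 0/0 limits of Example 3.24; the helper `quotient_table` is a name chosen here for illustration. Such a table suggests a value but proves nothing, and for very small increments rounding errors eventually dominate.

```python
import math

def quotient_table(f, g, c, side=+1, exponents=(1, 2, 3, 4, 5)):
    """Evaluate f(x)/g(x) at x = c + side * 10**(-k) for each k in exponents."""
    values = []
    for k in exponents:
        x = c + side * 10.0 ** (-k)
        values.append(f(x) / g(x))
    return values

ex1 = quotient_table(lambda x: x * x - 9, lambda x: x - 3, 3.0)    # limit 6
ex2 = quotient_table(lambda x: math.sin(3 * x), lambda x: x, 0.0)  # limit 3
ex4 = quotient_table(lambda x: 1 - math.cos(x), lambda x: x, 0.0)  # limit 0

for name, values in (("(x^2-9)/(x-3)", ex1), ("sin(3x)/x", ex2), ("(1-cos x)/x", ex4)):
    print(name, [round(v, 6) for v in values])
```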

Example 3.25 Figure 3.16 represents an electrical circuit consisting of an electromotive force that produces a voltage V, a resistor with resistance R, and an inductor with inductance L. The current I at time t is given by
\[ I = \frac{V}{R}\left(1 - e^{-Rt/L}\right) . \]
When the voltage is first applied at time t = 0, the inductor opposes the rate of
increase of the current and I is small initially. As t increases to ∞, I approaches the
value V /R, the value given by Ohm’s law in the case L = 0.
1. If L is the only independent variable, find lim L→0+ I .
2. If R is the only independent variable, find lim R→0+ I .

Solution:
1. Let V, R, and t be constants and let L be variable. In this case, I is not indeterminate at L = 0. From the rules discussed in Chap. 2, we get
\[ \lim_{L\to 0^+} I = \lim_{L\to 0^+} \frac{V}{R}\left(1 - e^{-Rt/L}\right) = \frac{V}{R}\left(1 - \lim_{L\to 0^+} e^{-Rt/L}\right) = \frac{V}{R}(1 - 0) = \frac{V}{R} . \]
Thus, if L ≈ 0, then the current can be approximated by the value I = V/R given by Ohm’s law, except for small times t.
2. If V, L, and t are constants and R is a variable, then I has the indeterminate form 0/0 at R = 0. By l’Hôpital’s rule, we get
\[ \lim_{R\to 0^+} I = V \lim_{R\to 0^+} \frac{1 - e^{-Rt/L}}{R} = V \cdot \lim_{R\to 0^+} \frac{0 - e^{-Rt/L}(-t/L)}{1} = V\left[0 - 1 \cdot (-t/L)\right] = \frac{V}{L}\, t . \]
This shows that as R → 0+, the current is (approximately) directly proportional to time t. For t = 1, I = V/L. For t = 4, I = 4V/L.
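Both limits of Example 3.25 can be observed numerically. In the sketch below the values V = 12, L = 0.5, t = 2 and R = 3 are arbitrary illustrative choices, not data from the text. The function uses `math.expm1` so that 1 − e^(−Rt/L) is computed accurately when R is very small.

```python
import math

def current(V, R, L, t):
    """I = (V/R) * (1 - exp(-R t / L)); expm1 avoids cancellation for small R."""
    return -V / R * math.expm1(-R * t / L)

V, L, t = 12.0, 0.5, 2.0   # illustrative values only

# R -> 0+: the current approaches V t / L
for R in (1.0, 0.1, 0.01, 0.001):
    print(f"R = {R:<6} I = {current(V, R, L, t):.6f}   V t / L = {V * t / L:.6f}")

# L -> 0+ with R = 3 fixed: the current approaches V / R
for L_small in (1.0, 0.1, 0.01):
    print(f"L = {L_small:<6} I = {current(V, 3.0, L_small, t):.6f}   V / R = {V / 3.0:.6f}")
```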
Example 3.26 The logistic model for population growth predicts the population size at time t by the formula $P(t) = \frac{K}{1 + ce^{-rt}}$, where r and c are positive constants and $c = \frac{K - y(0)}{y(0)}$. K is called the carrying capacity and is interpreted as the maximum number of individuals that the environment can sustain. Find $\lim_{t\to\infty} P(t)$ for K fixed, and $\lim_{K\to\infty} P(t)$ for t fixed, and interpret these limits.
Solution: We compute
\[ \lim_{t\to\infty} P(t) = \lim_{t\to\infty} \frac{K}{1 + ce^{-rt}} = \frac{K}{1 + c \lim_{t\to\infty} e^{-rt}} = \frac{K}{1 + c \cdot 0} = K . \]
This shows that the population will essentially attain its carrying capacity after a sufficiently long period of time. The expression
\[ \lim_{K\to\infty} P(t) = \lim_{K\to\infty} \frac{K}{1 + \frac{K - y(0)}{y(0)} e^{-rt}} = \lim_{K\to\infty} \frac{K y(0) e^{rt}}{y(0) e^{rt} + K - y(0)} \]
has the indeterminate form ∞/∞. Applying l’Hôpital’s rule (differentiating with respect to K), we get
\[ \lim_{K\to\infty} P(t) = \lim_{K\to\infty} \frac{y(0) e^{rt}}{1} = y(0) e^{rt} . \]
This means that the population will grow exponentially if the carrying capacity is infinite, and the logistic model reverts to the model of Malthus.
Other cases of indeterminate forms. Those can be converted to the forms 0/0 or ∞/∞, and then l’Hôpital’s rule can be applied.
The case 0 · ∞ means that we want to compute
\[ \lim_{x\to c} f(x)g(x) , \]
where
\[ \lim_{x\to c} f(x) = 0 , \qquad \lim_{x\to c} g(x) = \pm\infty . \]
Setting
\[ \lim_{x\to c} f(x)g(x) = \lim_{x\to c} \frac{f(x)}{1/g(x)} , \quad\text{or}\quad \lim_{x\to c} f(x)g(x) = \lim_{x\to c} \frac{g(x)}{1/f(x)} , \]
we are back in the case 0/0 or ∞/∞.
Next, we want to compute
\[ \lim_{x\to c} f(x)^{g(x)} \tag{3.39} \]
for the indeterminate forms $0^0$, $1^\infty$, and $\infty^0$. In all those cases, we begin by setting
\[ y(x) = f(x)^{g(x)} \]
and taking the natural logarithm of both sides to obtain
\[ \ln y(x) = g(x) \ln f(x) . \tag{3.40} \]
We see that
the case $0^0$ means f(x) → 0, g(x) → 0, ln f(x) → −∞,
the case $1^\infty$ means f(x) → 1, g(x) → ∞, ln f(x) → 0,
the case $\infty^0$ means f(x) → ∞, g(x) → 0, ln f(x) → ∞.

In each case the product g(x) ln f(x) has the form 0 · ∞. Therefore, we can apply our previous versions of l’Hôpital’s rule to compute the limit in (3.40),
\[ L := \lim_{x\to c} \ln y(x) = \lim_{x\to c} g(x) \ln f(x) . \]
After that, we come back to the original limit (3.39), since
\[ \lim_{x\to c} f(x)^{g(x)} = \lim_{x\to c} y(x) = \lim_{x\to c} e^{\ln y(x)} = e^L . \]

Example 3.27 Evaluate the following limits, provided they exist:
1. $\lim_{x\to 0^+} x \ln x$,
2. $\lim_{x\to 0} (e^x - 1)^x$,
3. $\lim_{x\to\infty} x^{1/x}$,
4. $\lim_{x\to\infty} \left( \frac{x^2}{x-1} - \frac{x^2}{x+1} \right)$, and
5. $\lim_{x\to 1} (1 - x)^{\ln x}$.

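Solution:
1. Here we have the form 0 · ∞. We transform it to the form ∞/∞ and compute the limit,
\[ \lim_{x\to 0^+} x \ln x = \lim_{x\to 0^+} \frac{\ln x}{1/x} = \lim_{x\to 0^+} \frac{1/x}{-1/x^2} = \lim_{x\to 0^+} (-x) = 0 . \]
2. The expression $(e^x - 1)^x$ has the form $0^0$ for x → 0+. We set $y = (e^x - 1)^x$. Then
\[ \ln y = x \ln(e^x - 1) \]
has the form 0 · ∞. We transform to the form ∞/∞ and compute the limit,
\[ \lim_{x\to 0^+} x \ln(e^x - 1) = \lim_{x\to 0^+} \frac{\ln(e^x - 1)}{1/x} = \lim_{x\to 0^+} \frac{e^x/(e^x - 1)}{-1/x^2} = \lim_{x\to 0^+} \left( -\frac{x^2 e^x}{e^x - 1} \right) . \]
The latter expression has the form 0/0. We apply l’Hôpital’s rule a second time and arrive at
\[ \lim_{x\to 0^+} x \ln(e^x - 1) = \lim_{x\to 0^+} \left( -\frac{x^2 e^x}{e^x - 1} \right) = \lim_{x\to 0^+} \left( -\frac{2x e^x + x^2 e^x}{e^x} \right) = -\frac{0}{1} = 0 . \]
Finally, we arrive at
\[ \lim_{x\to 0^+} (e^x - 1)^x = e^0 = 1 . \]
3. For $\lim_{x\to\infty} x^{1/x}$ we have the form $\infty^0$. We set $y = x^{1/x}$. Then
\[ \ln y = \frac{1}{x} \ln x = \frac{\ln x}{x} \]
has the form ∞/∞. We apply l’Hôpital’s rule:
\[ \lim_{x\to\infty} \frac{\ln x}{x} = \lim_{x\to\infty} \frac{1/x}{1} = 0 . \]
Therefore, $\lim_{x\to\infty} x^{1/x} = e^0 = 1$.
4. In $\lim_{x\to\infty} \left( \frac{x^2}{x-1} - \frac{x^2}{x+1} \right)$, each term is of the form ∞/∞, and the difference has the form ∞ − ∞. Applying l’Hôpital’s rule to each term separately only yields $\lim_{x\to\infty} 2x - \lim_{x\to\infty} 2x$, which is again of the form ∞ − ∞. Instead, we first combine the fractions,
\[ \frac{x^2}{x-1} - \frac{x^2}{x+1} = \frac{x^2(x+1) - x^2(x-1)}{(x-1)(x+1)} = \frac{2x^2}{x^2 - 1} , \]
which has the form ∞/∞. By l’Hôpital’s rule,
\[ \lim_{x\to\infty} \left( \frac{x^2}{x-1} - \frac{x^2}{x+1} \right) = \lim_{x\to\infty} \frac{2x^2}{x^2 - 1} = \lim_{x\to\infty} \frac{4x}{2x} = 2 . \]
5. The expression $(1 - x)^{\ln x}$ is defined for x < 1, so we consider x → 1−. Here we have the form $0^0$. We write $y = (1 - x)^{\ln x}$. Then
\[ \ln y = \ln x \, \ln(1 - x) , \]
which has the form 0 · ∞. We transform it to the form ∞/∞:
\[ \lim_{x\to 1^-} \ln x \, \ln(1 - x) = \lim_{x\to 1^-} \frac{\ln(1 - x)}{1/\ln x} = \lim_{x\to 1^-} \frac{-\frac{1}{1-x}}{-\frac{1}{x (\ln x)^2}} = \lim_{x\to 1^-} \frac{x (\ln x)^2}{1 - x} . \]
This has the form 0/0, and a second application of the rule gives
\[ \lim_{x\to 1^-} \frac{x (\ln x)^2}{1 - x} = \lim_{x\to 1^-} \frac{(\ln x)^2 + 2 \ln x}{-1} = 0 . \]
This implies $\lim_{x\to 1^-} \ln y = 0$. Therefore, $\lim_{x\to 1^-} (1 - x)^{\ln x} = e^0 = 1$.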
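The five limits of Example 3.27 can be spot-checked by evaluating each expression at one point close to the limit point. This is only a plausibility check under assumptions made here: the sample points are arbitrary, one-sided limits are used where the expression is defined on one side only, and the reference value 2 for item 4 comes from combining the two fractions into 2x²/(x² − 1).

```python
import math

# (description, function, sample point near the limit point, expected limit)
samples = [
    ("x ln x, x -> 0+", lambda x: x * math.log(x), 1e-8, 0.0),
    ("(e^x - 1)^x, x -> 0+", lambda x: math.expm1(x) ** x, 1e-8, 1.0),
    ("x^(1/x), x -> infinity", lambda x: x ** (1.0 / x), 1e8, 1.0),
    ("x^2/(x-1) - x^2/(x+1), x -> infinity",
     lambda x: x * x / (x - 1) - x * x / (x + 1), 1e4, 2.0),
    ("(1 - x)^(ln x), x -> 1-", lambda x: (1.0 - x) ** math.log(x), 1.0 - 1e-8, 1.0),
]

# Distance between the sampled value and the expected limit
deviations = [abs(func(x) - limit) for _, func, x, limit in samples]

for (label, _, _, limit), dev in zip(samples, deviations):
    print(f"{label}: expected {limit}, deviation {dev:.2e}")
```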

3.8 Sensitivity Analysis

We discuss in this section sensitivity to change. When a small change in x causes a large change in the value of a function f(x), we say that the function is highly sensitive to changes in x. The derivative f'(x) is the measure of this change.
As an example, we discuss a situation in genetics. Let us consider a population of
peas, consisting of peas with smooth skin and with wrinkled skin. Let p (a number
between 0 and 1) be the proportion of the gene for the smooth skin and 1 − p the pro-
portion of the gene for wrinkled skin. According to Mendel’s theory of hybridization
(the Austrian monk Gregor Johann Mendel (1822–1884) provided the first scientific
explanation of hybridization), the proportion of smooth-skinned peas in the next
generation will be
\[ y = 2p(1 - p) + p^2 = 2p - p^2 . \tag{3.41} \]

Fig. 3.17 (a) The graph of $y = 2p - p^2$, (b) the graph of $dy/dp$

The graph of y versus p in Fig. 3.17a suggests that the value of y is more sensitive to a change in p when p is small than when p is large. Indeed, this fact is also visible through the graph of the derivative y'(p) = 2 − 2p, see Fig. 3.17b. It is clear that y'(p) is close to 2 when p is near 0, and close to 0 when p is near 1.
The implication for genetics is that introducing a few more dominant genes into
a highly recessive population (the proportion of wrinkled skin peas is large, in the
example above) will have a more intense effect on later generations than a similar
increase in a highly dominant population (the proportion of smooth-skinned peas is
already large).
The number f'(x) tells us how sensitive the output of f is with respect to a change in the input at any value of x. The larger the value of f' at x, the greater the effect of a given increment Δx of x.
In Sect. 3.6, we have already studied the effect of increments h = Δx and seen that the differential df = f'(x)h approximates the true change Δf = f(x + h) − f(x) rather well when the increment is small. Indeed, the approximation error Δf − df can be written as
\[ \Delta f - df = f(x + h) - f(x) - f'(x)h = \left( \frac{f(x + h) - f(x)}{h} - f'(x) \right) h = \varepsilon \cdot h , \]
where $\varepsilon = \frac{f(x+h) - f(x)}{h} - f'(x)$ is small if h is small. Consequently, the true change in f can be written as
\[ \Delta f = f'(x)\Delta x + \varepsilon \Delta x , \]
and we see that the derivative f'(x) is a good measure of sensitivity as described above.

Example 3.28 Suppose we want to calculate the depth s of a well from the equation $s = 16t^2$ by timing how long it takes a heavy stone dropped from above to splash into the water below. How sensitive will calculations be to an error of 0.1 s in measuring the time? (We measure s in ft. and t in seconds.)

Solution: Since $s = f(t) = 16t^2$, our estimate for the sensitivity becomes
\[ f'(t) = 32t . \]
The effect of a measurement error Δt = 0.1 is approximately described by the differential df = 32tΔt and thus depends on t. If t = 2, the error in depth caused by the measurement error equals df = 64 · 0.1 = 6.4 ft. Four seconds later, at t = 6, the error in depth caused by the same Δt is df = 192 · 0.1 = 19.2 ft.
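The sketch below compares the differential df = 32tΔt of Example 3.28 with the true change in computed depth. The function names are illustrative. Since f is quadratic, the gap between the two is exactly 16(Δt)², which is the term εΔx of this section.

```python
def depth(t):
    """Depth in feet computed from the fall time t in seconds."""
    return 16.0 * t**2

def depth_error_estimate(t, dt):
    """Differential df = f'(t) * dt = 32 t dt."""
    return 32.0 * t * dt

dt = 0.1
report = []
for t in (2.0, 6.0):
    df = depth_error_estimate(t, dt)
    delta_f = depth(t + dt) - depth(t)   # true change caused by the timing error
    report.append((t, df, delta_f))
    print(f"t = {t}: df = {df:.2f} ft, true change = {delta_f:.2f} ft")
```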

3.9 Exercises

3.9.1 The distance s (in meters) covered by a particle in time t (in seconds) is given by $s = f(t) = 4t^2 + 3t$. Find the velocity at t = 0 and t = 3.
3.9.2 A circle of radius r has area $A = \pi r^2$ and circumference $c = 2\pi r$. If the radius changes from r to r + dr, find the
(i) change in area,
(ii) change in circumference.
3.9.3 Using the product rule for derivatives, show that $\frac{d}{dx}(x^n) = nx^{n-1}$ for any integer n. Is this result true for any real number n?
3.9.4 Find the derivatives of the functions
(i) f (x) = (2x 5 − x)(x 3 + 1),
(ii) f (x) = 10x −4 + 3x −2 ,
f (x) = −x −1
3
(iii) 2x 2 +1
,
(iv) f (x) = (x + 1)(x − 1)(x + 5),
2

(v) f (x) = (1 + x 2 )x 3 e x ln x,
(vi) f (x) = ln(1 + 3x 2 ),
2
(vii) y = ex .
3.9.5 Find y  if
(a) y = x ln x − x,
(b) y = √x12 +4 ,
(c) y = e2x (e2x − e−2x ).
3.9.6 Find the first three derivatives of the following functions:

(a) f (t) = t 100 + t 40 + t 2 ,


(b) f (t) = (3t + 5)2 ,
(c) f (t) = t 5 ,
f (t) = (t(t 3 +1)(t +1)
2 4
(d) +1)(t 5 +1)
.

3.9.7 If $f = g^2$, prove that $f'(1) = 2g(1)g'(1)$.
3.9.8 If $y^5 + xy + x^2 = 3$, find $y'$.
3.9.9 Using the chain rule, find $y'$ for $y = (3x + 5)^2$ and $y = (-5x^2 + x - 1)^2$.
3.9.10 Find f  (x) for f (x) given below:
(a) f (x) = (x 5 −x+1)
1
3,

(b) f (x) = sin x,


3

(c) f (x) = √1−xx


2
.

3.9.11 Find ddyx for y given below:


(a) y = x 3 sin2 5x (b)y = secsin x
(3x+1)
(c) y = cos3 (sin 2x).
3.9.12 If $y = 4x^3$ and the maximum percentage error in x is ±15%, approximate the maximum percentage error in y.
3.9.13 A spherical balloon is being inflated with gas. Use differentials to approximate the increase in surface area of the balloon if the diameter changes from 2 to 2.02 m.
3.9.14 Examine whether the function $y^3 = 2x^2 + c$, c constant, satisfies the equation
\[ y' = \frac{4x}{3y^2} . \]
3.9.15 Show that any solution y = y(x) of the equation xy = c satisfies the equation $y + xy' = 0$.
3.9.16 Find derivatives of the following functions: 
 x 
(a) f (x) = ln 1+x 2 (b) f (x) = ln(ln x) (c) f (x) = 1 + ln2 x
(d) f (x) = cos(ln x).
3.9.17 Find derivatives of the following functions:
−x
(a) f (x) = e1/x , (b) f (x) = eex −e
x

+e−x
, (c) f (x) = sin (e x ),
(d) f (x) = ln (cos e ).
x

3.9.18 Find ddyx by implicit differentiation


(a) y + ln x y − 2 = 0 (b) y = ln (x tan y).
3.9.19 Find f  (x) for f (x) given below:
(a) f (x) = sin−1 ( 51 x), (b) f (x) = (tan x)−1 ,
(c) f (x) = sin−1 x + cos−1 x.
3.9.20 Show using l’Hôpital’s rule that $\lim_{x\to 0} (1 + x)^{1/x} = e$.
3.9.21 Find
(a) $\lim_{x\to 0^+} x^{(\ln a)/(1+\ln x)}$,
(b) $\lim_{x\to +\infty} x^{(\ln a)/(1+\ln x)}$,
(c) $\lim_{x\to 0} (x + 1)^{(\ln a)/x}$.
3.9.22 Examine whether $\lim_{x\to 0^+} \frac{x \sin(1/x)}{\sin x}$ exists or not.
3.9.23 (a) Show that the function $y = e^{ax} \sin bx$ satisfies
\[ y'' - 2ay' + (a^2 + b^2)y = 0 . \]
(b) Show that $y = \tan^{-1} x$ satisfies $y'' = -2 \sin y \cos^3 y$.
3.9.24 Show that the rate of change of $y = 3^{2x} 5^{7x}$ is proportional to y.
Chapter 4
Optimization

Very often a real-world problem is equivalent to finding an element in the domain


of a function at which the value of the function is larger or smaller than at all other
elements of the domain. The techniques for finding such an element, that is, finding
an element at which the maximal or minimal values of the function are attained,
make up the field called optimization.
The goal of this chapter is to understand how maxima and minima of a function are
related to its derivative. On practical aspects, we see how to analyze the relationship
between average and marginal costs.

4.1 Extremum Values of Functions

Definition 4.1 (Global maximum and minimum) Let f be a real-valued function


with domain D. If a point c ∈ D satisfies

f (c) ≤ f (x) , for all x ∈ D, (4.1)

then c is called a global minimizer of f on D, and f (c) is called the global minimal
value of f on D. On the other hand, if a point c ∈ D satisfies

f (c) ≥ f (x) , for all x ∈ D, (4.2)

then c is called a global maximizer of f on D, and f (c) is called the global maximal
value of f on D.

The notion “extremum” is generic for maximum and minimum, that is, a global
extremum value is either a global maximal or a global minimal value.

© Springer Nature Singapore Pte Ltd. 2019. M. Brokate et al., Calculus for Scientists and Engineers, Industrial and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_4

Alternatively, a minimizer is often termed a “minimum point” or simply “minimum”.¹ Also, “absolute minimum” is synonymous with “global minimum”.
Example 4.1 1. On D = R, the functions f (x) = x 2 and f (x) = |x| both have
c = 0 as a global minimizer with minimal value f (c) = 0, but no global maxi-
mizer, since f (x) → ∞ for x → ±∞. Moreover, there is no other global min-
imizer.
2. For the function f (x) = cos x, all points c = 2kπ with k ∈ Z are global maxi-
mizers with maximal value 1, and all points c = (2k + 1)π with k ∈ Z are global
minimizers with minimal value −1.

Definition 4.2 (Local Extrema) Let f be a function with domain D. If there is an


open interval I around some point c ∈ D such that

f (c) ≤ f (x) , for all x ∈ D ∩ I, (4.3)

then c is called a local minimizer of f on D, and f (c) is called the local minimal
value of f on D. Reversing the inequality in (4.3) yields the definition of local
maximizer and local maximal value.

The remarks above concerning the terminology for global extrema also apply to local
extrema. Local extrema are also called relative extrema.
By definition, every global minimizer (maximizer) is also a local minimizer (max-
imizer). Hence, the collection of all local extrema will also include every global
extremum (if existing). On the other hand, a local minimizer does not have to be a
global minimizer.
¹ Note that the plural of “minimum”, being a Latin word originally, is “minima”; the same holds for maximum, extremum, etc.

The following observation provides a crucial link between extrema and derivatives. Let c be a local minimum point of a function f. Assume that I = [c, d] is an interval contained in the domain D of f, such that f(c) ≤ f(x) holds for all x ∈ I. Then
\[ \frac{f(x) - f(c)}{x - c} \geq 0 \]
holds for c < x ≤ d. By the properties of limits,
\[ \lim_{x\to c^+} \frac{f(x) - f(c)}{x - c} \geq 0 , \tag{4.4} \]
if this limit exists. But this limit is nothing else than the right-sided derivative f'(c+) of f at c, see Remark 3.1, so
\[ f'(c+) \geq 0 \tag{4.5} \]
holds in this case. Analogously, if I = [b, c] extends to the left of c, we obtain that
\[ f'(c-) \leq 0 \tag{4.6} \]
holds for the left-sided derivative. If, on the other hand, c is a local maximum of f, then c is a local minimum of −f, and we obtain (4.5) and (4.6) with inequalities reversed. Now, if f is differentiable at c, we must have f'(c) = f'(c+) = f'(c−). Putting together (4.5) and (4.6), we have arrived at the single most important fact of optimization theory.

Theorem 4.1 If f has a local extremum at an interior point c of its domain, and if f'(c) exists, then
\[ f'(c) = 0 , \tag{4.7} \]
that is, the tangent to the graph of f at the point (c, f(c)) is horizontal.

The theorem tells us that the only points where a function f with domain D can possibly have extrema are
• interior points of D at which f' = 0 (example: $f(x) = x^2$),
• interior points of D at which f' is undefined (example: f(x) = |x|),
• end points of D.
Note that the domain D plays a crucial role. The function f (x) = x has no extremum
on its natural domain D = R. But if we restrict D to the nonnegative numbers [0, ∞),
for example, in an application where negative values make no sense, then f (x) = x
has the global minimizer c = 0, an endpoint of [0, ∞).

Definition 4.3 (Critical Point) An interior point x of the domain D of a function f is called critical or stationary, if f'(x) = 0.

As we have seen above, if f is differentiable on an open interval I, then every local extremum in I must be a critical point. However, a point x can be critical without being an extremum. For example, the function $f(x) = x^3$ has no extremum, but x = 0 is a critical point of f.
At an extremum x of f, the slope of f typically changes its sign. The function $f(x) = x^2$, for example, has negative slope f'(x) = 2x < 0 for x < 0 and positive slope for x > 0. On the other hand, the slope $f'(x) = 3x^2$ of $f(x) = x^3$ is positive for $x \neq 0$, so it does not change sign at the critical point x = 0.
There arises the question whether we can guarantee that a given function has a
maximizer or a minimizer.
Theorem 4.2 (Existence of Extrema) Let f be a continuous function defined on a
closed interval I = [a, b], where a < b are given numbers. Then f has a maximizer
and a minimizer on I .
To appreciate this result, let us show that it does not hold if we remove any of its
assumptions.

Example 4.2 On the domain D = R,
• the function $f(x) = x^2$ has the minimizer c = 0, but no maximizer,
• the function $f(x) = x^3$ has neither a minimizer nor a maximizer,
• the function f(x) = arctan x has neither a minimizer nor a maximizer, although all of its values remain in the bounded interval [−π/2, π/2].
On the domain D = [−2, 2], the discontinuous function
\[ f(x) = \begin{cases} x , & |x| < 1 , \\ 0 , & |x| \geq 1 , \end{cases} \]
has neither a minimizer nor a maximizer.


Again we note that the choice of the domain D is crucial. While the functions
f (x) = x 2 , f (x) = x 3 , f (x) = arctan x of the preceding example fail to have max-
imizers on D = R, they have maximizers on every closed interval D = [a, b], by
Theorem 4.2.

4.2 Monotonicity

In this section, we examine properties of a function and its graph in terms of its derivatives. First, let us recall from Chap. 1 that a function f with domain D is said to be
• increasing (respectively, nondecreasing), if $f(x_1) < f(x_2)$ (respectively, $f(x_1) \leq f(x_2)$) whenever $x_1 < x_2$,
• decreasing (respectively, nonincreasing), if $f(x_1) > f(x_2)$ (respectively, $f(x_1) \geq f(x_2)$) whenever $x_1 < x_2$
holds for arbitrary points $x_1, x_2 \in D$. Looking at the graphs of functions like the ones in Fig. 4.1, one realizes that increasing functions have tangents with positive slope, while decreasing functions have tangents with negative slope. (We already know that constant functions have zero slope.) The converse, too, is true.
Theorem 4.3 (Monotonicity Criterion) Let f be a function which is continuous on some interval [a, b] and differentiable on its interior (a, b).
1. If f'(x) > 0 for every value of x in (a, b), then f is increasing on [a, b].
2. If f'(x) < 0 for every value of x in (a, b), then f is decreasing on [a, b].
3. If f'(x) = 0 for every value of x in (a, b), then f is constant on [a, b].
This result is very important, because now we can check without doubt whether a
function is increasing or decreasing, by computing its derivative (in fact, it suffices
to compute the sign of the derivative.)
The proof of Theorem 4.3 is based on the following result, which is interesting by
itself.
Fig. 4.1 (a) An increasing function, (b) a decreasing function, (c) a constant function

Theorem 4.4 (Mean Value Theorem) Let f be continuous on [a, b] and differentiable on (a, b). Then there is at least one point c in (a, b) where
\[ f'(c) = \frac{f(b) - f(a)}{b - a} . \tag{4.8} \]
In geometrical terms, the mean value theorem says that for every secant through two points of the graph of f, there is a tangent at an intermediate point, having the same slope as the secant.
The proof of Theorem 4.4 is in turn based upon another theorem.
Theorem 4.5 (Rolle’s Theorem) Let f be continuous on [a, b] and differentiable on (a, b). If f(a) = f(b), then there is at least one point c in (a, b) where f'(c) = 0.

We present here the proof of Theorem 4.3, while the proofs of Theorems 4.4 and 4.5
can be found in Appendix D.
Proof of Theorem 4.3: We only prove Part 1; the other parts are proved in the same manner. Let $x_1, x_2$ be in [a, b] such that $x_1 < x_2$. By the mean value theorem applied on $[x_1, x_2]$, there is a point c in $(x_1, x_2)$ such that
\[ f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \]
or equivalently,
\[ f(x_2) - f(x_1) = f'(c)(x_2 - x_1) . \tag{4.9} \]
By assumption, f'(c) > 0, as $c \in (x_1, x_2) \subset (a, b)$. Since $x_2 - x_1 > 0$, Eq. (4.9) implies that $f(x_2) - f(x_1) > 0$, so $f(x_1) < f(x_2)$. Since $x_1$ and $x_2$ have been chosen arbitrarily, f is increasing.
Remark 4.1 Let us point out explicitly that Eq. (4.8) of the Mean Value Theorem 4.4 implies that a function whose derivative is identically zero must be constant. Now, let f and g be two functions. If f'(x) = g'(x) for all x in some interval I, then (f − g)' = 0 on I, hence there is a constant C such that f(x) = g(x) + C for all x in I. That is, two functions with identical derivatives differ only by some constant value. This result will be important in the study of integration in Chap. 6.

Example 4.3 Find the intervals on which the following functions are increasing and the intervals on which they are decreasing:
1. $f(x) = x^2 - 4x + 9$,
2. $f(x) = x^3$.

Solution:
1. We have f'(x) = 2x − 4 = 2(x − 2). It is clear that f'(x) > 0 if x > 2 and f'(x) < 0 if x < 2, therefore f' > 0 on the interval (2, ∞) and f' < 0 on the interval (−∞, 2). Since f is differentiable (hence continuous) everywhere, it follows from Theorem 4.3 that f is increasing on [2, ∞) and decreasing on (−∞, 2].
2. In the same manner, one checks that f is increasing on (−∞, 0] and [0, ∞). Therefore, it is increasing on all of R.
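The sign pattern of f' used in Example 4.3 can be tabulated at a few sample points. The helper below, `derivative_signs`, is a hypothetical name used only for this sketch. Sampling shows the pattern but does not replace the argument via Theorem 4.3, which needs the sign on whole intervals.

```python
def derivative_signs(fprime, points):
    """Return (x, sign) pairs, where sign is '+', '-' or '0' according to f'(x)."""
    signs = []
    for x in points:
        value = fprime(x)
        signs.append((x, "+" if value > 0 else "-" if value < 0 else "0"))
    return signs

# f(x) = x^2 - 4x + 9 has f'(x) = 2x - 4
signs_quadratic = derivative_signs(lambda x: 2 * x - 4, [-1.0, 0.0, 2.0, 3.0, 5.0])

# f(x) = x^3 has f'(x) = 3x^2, which vanishes only at x = 0
signs_cubic = derivative_signs(lambda x: 3 * x * x, [-2.0, -1.0, 0.0, 1.0, 2.0])

print(signs_quadratic)
print(signs_cubic)
```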

4.3 Further Properties of Extremum Values

If we find a critical point, at first we do not know whether it is an extremum or not.


However, if we know for example that f is increasing to the left of c and decreasing
to the right of c, then c must be a local maximum. From the monotonicity criterion
(Theorem 4.3) we, therefore, obtain the following test.
Theorem 4.6 (First Derivative Test) Let f be differentiable on an open interval which includes a critical point c.
1. If f'(x) > 0 on some open interval extending to the left from c,² and f'(x) < 0 on some open interval extending to the right from c, then f has a local maximum at c.
2. If f'(x) < 0 on some open interval extending to the left from c and f'(x) > 0 on some open interval extending to the right from c, then f has a local minimum at c.
3. If f'(x) has the same sign (either f'(x) > 0 or f'(x) < 0) on some open intervals extending to the left and the right from c, then f does not have a local extremum at c.
The previous test requires us to check the sign of f' in some interval around the critical point c, where f'(c) = 0. Since f'' is the derivative of f', we can interpret the value f''(c) as the slope of the tangent to the function f' at the point c. Now, if we know that for example f''(c) > 0, we can conclude from Theorem 3.1 that f'(x) > f'(c) = 0 if x > c is sufficiently close to c. This yields the following test.
Theorem 4.7 (Second Derivative Test) Suppose that f is differentiable on an open interval which includes the point c, and that f''(c) exists.
1. If f'(c) = 0 and f''(c) > 0, then f has a local minimum at c.
2. If f'(c) = 0 and f''(c) < 0, then f has a local maximum at c.
If f'(c) = 0 and f''(c) = 0, then the test is inconclusive, that is, f may have a local maximum, a local minimum, or neither at c.
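Example 4.4 Find the extremum values of
\[ f(x) = \frac{1}{\sqrt{9 - x^2}} . \]
Solution: We have
\[ f'(x) = -\frac{1}{2} \cdot \frac{-2x}{(9 - x^2)^{3/2}} = \frac{x}{(9 - x^2)^{3/2}} , \]
so f'(x) = 0 when x = 0, and f has a stationary point at x = 0. Moreover,
\[ f''(x) = \frac{(9 - x^2)^{3/2} - x \cdot \frac{3}{2}(9 - x^2)^{1/2}(-2x)}{(9 - x^2)^3} = \frac{(9 - x^2)^{3/2} + 3x^2 (9 - x^2)^{1/2}}{(9 - x^2)^3} , \]
so f''(0) = 27/729 = 1/27 > 0. Hence f has a local minimum at x = 0, with minimal value f(0) = 1/3.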
Example 4.5 Find the local extrema of the following functions:
1. f (x) = 2x 2 − x 4 ,
2. f (x) = x 2 e x ,
3. f (x) = cos2 x, 0 < x < 2π .

2 That is, an interval of the form (c − δ, c) for some δ > 0.


110 4 Optimization

Solution:
1. $f(x) = 2x^2 - x^4$,
$f'(x) = 4x - 4x^3 = 4x(1 - x^2)$,
$f'(x) = 0$ gives x = 0, 1, −1,
$f''(x) = 4 - 12x^2$,
$f''(0) = 4 > 0$, $f''(1) = 4 - 12 = -8 < 0$, $f''(-1) = -8 < 0$,
f has a local minimum at x = 0 and local maxima at x = 1 and x = −1.
2. $f(x) = x^2 e^x$,
$f'(x) = 2x e^x + x^2 e^x = x e^x (2 + x)$,
$f'(x) = 0$ gives x = 0 or x = −2 (the factor $e^x$ is never zero; it only tends to 0 as x → −∞),
$f''(x) = 2x e^x + 2e^x + 2x e^x + x^2 e^x = e^x (x^2 + 4x + 2)$,
$f''(0) = 2 > 0$; $f''(-2) = e^{-2}(4 - 8 + 2) = -2e^{-2} < 0$.
Therefore, f has a local minimum at x = 0 and a local maximum at x = −2.
3. $f(x) = \cos^2 x$, 0 < x < 2π,
$f'(x) = -2 \sin x \cos x = -\sin 2x$,
$f'(x) = 0$ gives sin 2x = 0, that is, 2x = π, 2π, or 3π, so x = π/2, π, or 3π/2,
$f''(x) = -2 \cos 2x$,
$f''(\pi/2) = -2\cos \pi = 2 > 0$, $f''(\pi) = -2\cos 2\pi = -2 < 0$, $f''(3\pi/2) = -2\cos 3\pi = 2 > 0$.
Therefore, f has local minima at x = π/2 and x = 3π/2 and a local maximum at x = π.
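The second derivative test of Theorem 4.7 can also be tried out numerically. The following Python lines are only a sketch: the helper name `second_derivative_test` and the tolerance `tol` are assumptions introduced here for illustration, and the derivatives are entered by hand for the function $f(x) = 2x^2 - x^4$ of Example 4.5.

```python
def second_derivative_test(d1, d2, c, tol=1e-9):
    """Classify the point c from the values f'(c) and f''(c) (Theorem 4.7)."""
    if abs(d1(c)) > tol:
        return "not a critical point"
    if d2(c) > tol:
        return "local minimum"
    if d2(c) < -tol:
        return "local maximum"
    return "inconclusive"

# f(x) = 2x^2 - x^4, with derivatives computed by hand
d1 = lambda x: 4 * x - 4 * x**3   # f'(x)
d2 = lambda x: 4 - 12 * x**2      # f''(x)

results = {c: second_derivative_test(d1, d2, c) for c in (-1.0, 0.0, 1.0)}
for c, kind in results.items():
    print(c, kind)
```

For a function such as $f(x) = x^4$, where both derivatives vanish at 0, the same helper reports the test as inconclusive.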
Example 4.6 Find values of a, b, c, and d, so that the function

$$ f(x) = ax^3 + bx^2 + cx + d $$

satisfies f(0) = 0, f(1) = 1, has a local minimum at x = 0 and a local maximum
at x = 1.
Solution: $f(x) = ax^3 + bx^2 + cx + d$,
$f'(x) = 3ax^2 + 2bx + c$; $f''(x) = 6ax + 2b$.
Since f has a local minimum at (0, 0) and a local maximum at (1, 1),
$f'(0) = 0$ and $f'(1) = 0$. This gives c = 0 and 3a + 2b = 0.
Since f(0) = 0 and f(1) = 1, we have d = 0 and 1 = a + b.
Solve 3a + 2b = 0 and a + b = 1 to get a = −2 and b = 3.
Therefore a = −2, b = 3, c = d = 0. Indeed, $f''(0) = 2b = 6 > 0$ and
$f''(1) = 6a + 2b = -6 < 0$, in agreement with Theorem 4.7.

4.4 Convexity and Concavity

We have seen that the sign of the derivative of f reveals where the graph of f is increasing
or decreasing. However, it does not reveal the direction of its curvature.
In Fig. 4.2 the graph is increasing, but on the left side it has an upward curvature
(holds water) and on the right side it has a downward curvature (spills water). On
intervals where the graph of a given function f has an upward curvature, we say that
f is convex, and on intervals where the graph has a downward curvature, we say that
f is concave (see footnote 3). See also Fig. 4.3 below. A formal definition can be made as follows.

Fig. 4.2 A function which is convex on the left part and concave on the right part

Definition 4.4 A function f which is differentiable on an open interval I = (a, b)
is said to be convex on I if $f'$ is increasing on I, and it is said to be concave on I if
$f'$ is decreasing on I.
The convexity or concavity of a function f can be characterized in terms of the
second derivative $f''$ of f as follows.
Theorem 4.8 Let f be twice differentiable on an open interval I = (a, b).
1. If $f''(x) > 0$ on I, then f is convex on this interval.
2. If $f''(x) < 0$ on I, then f is concave on this interval.
The proof of Theorem 4.8 follows from the observation that $f'$ is increasing where
its derivative $f''$ is positive and decreasing where $f''$ is negative.

Example 4.7 Find open intervals on which the following functions are convex or
concave:
1. $f(x) = x^2$,
2. $f(x) = 3 + \sin x$,
3. $f(x) = x^2 - 4x + 6$.

Solution:
1. By Theorem 4.8, f is convex on (−∞, ∞) because its second derivative satisfies
$f''(x) = 2$, which is positive everywhere.
2. f is concave on (0, π), since there $f''(x) = -\sin x$ is negative. It is convex on
(π, 2π) as $f''(x) = -\sin x$ is positive on this interval.
3. $f''(x) = 2$, that is $f''(x) > 0$ on (−∞, ∞), hence f is convex on (−∞, ∞).

3 Sometimes, the terms “concave up” and “concave down” are used instead of “convex” and
“concave”. We prefer to use the latter because it conforms to the standard use in mathematics.

Fig. 4.3 Inflection point

The points where a function f changes its direction of curvature are called inflection
points of f .
Definition 4.5 (Inflection Points) Let the function f be differentiable on an open
interval (a, b). A point c ∈ (a, b) is called an inflection point if f is either convex
on some interval extending to the left of c and concave on some interval extending
to the right of c (see Fig. 4.3), or concave on some interval extending to the left of c
and convex on some interval extending to the right of c.
In Example 4.7, we have seen that the function $f(x) = 3 + \sin x$ has an inflection
point at c = π.
From Theorem 4.8, we see that a point c is an inflection point of f if $f''(c) = 0$
and $f''$ has different signs to the left and to the right of c. However, not every point c
with $f''(c) = 0$ is an inflection point. For example, the function $f(x) = x^4$ satisfies
$f''(0) = 0$, but it is convex on the whole line (−∞, ∞).
Summary: properties of graphs of functions. Let us summarize at this point various
concepts of calculus which are related to pictorial representations of functions
(Fig. 4.4).
1. The domain and the range of f (see Chap. 1).
2. Continuity of f (see Chap. 2).
3. The points where the graph of f meets the horizontal and the vertical axis. For
the x-axis, they are the solutions of the equation f (x) = 0. The y-axis is met at
the point (0, f (0)), if 0 belongs to the domain of f .
4. Symmetry: If f is an even function, its graph is symmetric with respect to the
y-axis. If f is an odd function, its graph is symmetric with respect to the origin.
5. Local extrema as discussed in Sect. 4.2.
6. Inflection points as presented in Definition 4.5.
7. Convexity and Concavity as discussed in Sect. 4.4.

Fig. 4.4 Behavior of functions in terms of derivatives

8. Horizontal and vertical asymptotes are connected to improper limits. If
$\lim_{x \to \infty} f(x) = L$ or $\lim_{x \to -\infty} f(x) = L$, then the line y = L is a horizontal
asymptote. If $\lim_{x \to c^+} f(x)$ or $\lim_{x \to c^-} f(x)$ equals either ∞ or −∞, then the
line x = c is a vertical asymptote.
9. Intervals of increase and decrease as discussed in Sect. 4.2.

4.5 Applications of Optimization

If one wants to formulate and solve an optimization problem, one may go through
the following steps.
• Understand the problem. Read the problem carefully and identify unknown quantities,
known quantities, and the quantity to be sought.
• Formulate a mathematical model. Draw pictures and label the important parts.
Introduce a variable to represent the quantity to be maximized or minimized.
Using that variable, find a function f whose extreme values give the information
sought.

• Find the domain of the function f. Determine what values of the variable make
sense in the problem. Draw a graph of the function if feasible.
• Identify the critical points and end points. Find points where the derivative is zero
or fails to exist. Use the first and second derivatives to identify and classify critical
points.
• Compute the solution. If you are unsure of the result, support or confirm your
solution with a different method.
• Interpret the solution. Translate the mathematical result into the problem setting
and determine whether the result makes sense.
Applications to Business and Economics
We have already introduced in the examples of Sect. 3.3 the cost C(x) of producing
x units of a commodity, the revenue R(x) received by selling them, and the profit
P(x) = R(x) − C(x). Moreover, we have defined the marginal cost, marginal revenue,
and marginal profit as their rates of change $C'(x)$, $R'(x)$, and $P'(x)$, respectively. The term
“marginal cost” is used because it is approximately equal to the cost of producing
one more unit. Indeed, by definition the marginal cost equals

$$ C'(x) = \lim_{h \to 0} \frac{C(x + h) - C(x)}{h} . \qquad (4.10) $$

On the other hand, the cost C(x + 1) − C(x) of producing one more unit is obtained by
setting h = 1 in the difference quotient above. Example 3.9 furnishes a situation
where those two quantities are in fact close to each other. Usually, this occurs when
x is large. The same considerations apply to marginal revenue and marginal profit.
A common goal in business is to maximize profit. This is a question of optimization.
Theorem 4.9 Let x > 0 be a production level which maximizes the profit P(x) on
[0, ∞). Then at this level, the marginal cost equals marginal revenue, that is

$$ C'(x) = R'(x) . \qquad (4.11) $$

(We have assumed that C and R, and hence also P, are differentiable functions of x.)

Proof Since x is a maximizer of P, we must have $P'(x) = 0$ by Theorem 4.1. Therefore

$$ 0 = P'(x) = (R - C)'(x) = R'(x) - C'(x) , $$

which proves the claim.

Note that in applying the differential calculus, we assume the argument x to take
arbitrary real numbers as values, whereas the amount of a commodity like computers or
tables is represented by an integer. This is a typical situation in mathematical
modeling, where often a continuum is used instead of a discrete set, in order to use
the tools and concepts of the differential and integral calculus.

Let us consider a cost function C(x) given by

$$ C(x) = a + bx + dx^2 + kx^3 . \qquad (4.12) $$

Here, the constant a represents a fixed overhead charge for items like rent, heat, and
light that is independent of the number of units produced. If, except for this fixed
overhead, the production costs are strictly proportional to the number x of units
produced, then b is just the cost per additional unit, in this case (d = k = 0) equal
to the marginal cost. The quadratic and cubic terms model situations where for a large
number of units the marginal cost is actually increasing, thus affecting production
costs significantly.
Another relevant concept is the average cost

$$ A(x) = \frac{C(x)}{x} . $$

The average cost describes the actual cost per unit, assuming that exactly x units
are produced. It may be visualized as the slope of the line from the origin to the point
(x, C(x)) of the graph of the cost curve.

Example 4.8 Show that the critical points of the average cost occur when marginal
cost equals average cost.

Solution: We have

$$ A(x) = \frac{C(x)}{x} , \qquad A'(x) = \frac{C'(x)\,x - C(x)}{x^2} . $$

At critical points of the average cost we have $A'(x) = 0$, which implies that
$C'(x)\,x - C(x) = 0$. We conclude that

$$ C'(x) = \frac{C(x)}{x} = A(x) , $$

which is what we wanted to show.

Example 4.9 Let R(x) = 9x and $C(x) = x^3 - 6x^2 + 15x$, where x represents one
thousand units of some commodity. Find the production level at which the profit is
maximal. Find also the maximal value of the profit.

Solution: For arbitrary x, the value of the profit is given by

$$ P(x) = R(x) - C(x) = 9x - (x^3 - 6x^2 + 15x) = -x^3 + 6x^2 - 6x . \qquad (4.13) $$

By Theorem 4.9 the profit is maximal for the level x at which $C'(x) = R'(x)$. Here
we have R(x) = 9x so $R'(x) = 9$, and $C(x) = x^3 - 6x^2 + 15x$ so $C'(x) = 3x^2 - 12x + 15$.
Equating those expressions we get $9 = 3x^2 - 12x + 15$ or equivalently
$x^2 - 4x + 2 = 0$. Solving this quadratic equation with the standard formula yields
the two solutions $x = 2 \pm \sqrt{2}$. One checks that $P(2 - \sqrt{2}) < P(2 + \sqrt{2})$. Thus the
maximal profit is attained at $x = 2 + \sqrt{2} \approx 3.414$, and its value is

$$ P(2 + \sqrt{2}) = -(3.414)^3 + 6 \cdot (3.414)^2 - 6 \cdot 3.414 \approx 9.657 . $$
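The arithmetic of Example 4.9 can be reproduced with a few lines of Python. This is only an illustrative sketch; the names `profit`, `candidates`, and `best` are chosen here and do not come from the text.

```python
import math

def profit(x):
    """P(x) = R(x) - C(x) with R(x) = 9x and C(x) = x^3 - 6x^2 + 15x."""
    return 9 * x - (x**3 - 6 * x**2 + 15 * x)

# Roots of x^2 - 4x + 2 = 0, i.e. the levels where C'(x) = R'(x)
disc = math.sqrt(4**2 - 4 * 1 * 2)
candidates = [(4 - disc) / 2, (4 + disc) / 2]

# Compare the profit at the two critical points
best = max(candidates, key=profit)
print(f"x = {best:.3f}, P(x) = {profit(best):.3f}")
```

Comparing the two critical points directly mirrors the step "one checks that $P(2 - \sqrt{2}) < P(2 + \sqrt{2})$" in the solution.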


Remark 4.2 Normally, in the example above, we would have to check that
$P''(2 + \sqrt{2}) < 0$ in order to ensure that $x = 2 + \sqrt{2}$ is indeed a maximizer, according to
Theorem 4.7. But in this example (and also the following ones) the profit function
has the properties

$$ P(0) \le 0 , \qquad \lim_{x \to \infty} P(x) = -\infty , \qquad P(x) > 0 \ \text{for some } x > 0 . \qquad (4.14) $$

In this case, Theorem 4.2 implies the existence of a maximizer with positive profit,
and therefore any critical point which maximizes P among the critical points has to
be a maximum of P on all of [0, ∞).
Example 4.10 A company producing car components estimates that the cost (in
Indian rupees) of producing x units of a certain component is given by

$$ C(x) = 0.0001x^2 + 0.05x + 200 . $$

1. Find the total as well as the average and marginal cost of producing 500 units and
1000 units.
2. Compare the marginal cost of producing 1000 units with the cost of producing
the 1001st unit.
Solution:
1. The average and marginal costs are

$$ A(x) = \frac{C(x)}{x} = 0.0001x + 0.05 + \frac{200}{x} , \qquad C'(x) = 0.0002x + 0.05 . $$

The cost for producing 500 units is C(500) = 250 Indian rupees. The cost for
producing 1000 units is C(1000) = 350 rupees. The average cost equals 0.50 rupees
per unit for 500 units and 0.35 rupees per unit for 1000 units. The marginal costs
for 500 units and 1000 units are 0.15 and 0.25 rupees per unit, respectively.
2. For 1001 units, we have

$$ C(1001) = 200 + 0.05 \cdot 1001 + 0.0001 \cdot 1001^2 \approx 350.25 . $$

The cost of producing the 1001st unit becomes

$$ C(1001) - C(1000) \approx 350.25 - 350 = 0.25 = C'(1000) , $$

thus it is very close to the marginal cost at the level x = 1000.
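The comparison between the marginal cost and the cost of one more unit can be checked numerically. The Python lines below are a minimal sketch using the cost function of Example 4.10; the function names are illustrative assumptions.

```python
def cost(x):
    """Total cost C(x) in rupees for x units."""
    return 0.0001 * x**2 + 0.05 * x + 200

def marginal_cost(x):
    """C'(x), computed by hand from the formula for C."""
    return 0.0002 * x + 0.05

def average_cost(x):
    """A(x) = C(x) / x."""
    return cost(x) / x

# Cost of the 1001st unit versus the marginal cost at x = 1000
next_unit = cost(1001) - cost(1000)
print(f"C(1001) - C(1000) = {next_unit:.4f}")
print(f"C'(1000)          = {marginal_cost(1000):.4f}")
```

The two printed numbers differ only in the fourth decimal place, which illustrates why the marginal cost is read as the cost of one additional unit.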



Example 4.11 A liquid form of medicine manufactured by a pharmaceutical company
is sold in bulk at a price of 200 Indian rupees per unit (one bottle). The total
production costs (in Indian rupees) for x units are

$$ C(x) = 500,000 + 80x + 0.003x^2 . $$

Moreover, the production capacity of the firm is limited to 30,000 units during some
specified time interval. How many units of medicine must be manufactured and sold
to maximize the profit during that time interval?

Solution: Since the total revenue for selling x units is R(x) = 200x, the profit P(x)
on x units will be

$$ P(x) = R(x) - C(x) = 200x - (500,000 + 80x + 0.003x^2) . \qquad (4.15) $$

Since the production capacity is limited to 30,000 units, x must lie in the interval
I = [0, 30,000]. By (4.15)

$$ P'(x) = 200 - (80 + 0.006x) = 120 - 0.006x . $$

Setting $P'(x) = 0$ gives 120 − 0.006x = 0 or x = 20,000 as the only critical point
of P in I. The maximizer of P on I, which exists by Theorem 4.2, must be either
this critical point or one of the end points of I. Substituting these values into (4.15)
we obtain P(0) < 0 (which we may ignore) and

$$ P(20,000) = 700,000 , \qquad P(30,000) = 400,000 . $$

This tells us that the maximal profit P = 700,000 occurs when x = 20,000 units are
manufactured and sold during the specified time.
Example 4.12 Find the quantity which maximizes the profit if the total revenue and
total cost in Indian rupees are given by $R(x) = 5x - 0.003x^2$ and C(x) = 300 + 1.1x,
respectively. Moreover, production is restricted to at most 1000 units. Find the
production levels at which profit is maximal and at which the profit is minimal.
Solution: We again determine the critical points of

$$ P(x) = (5x - 0.003x^2) - (300 + 1.1x) = -300 + 3.9x - 0.003x^2 \qquad (4.16) $$

from

$$ 0 = P'(x) = 3.9 - 0.006x . $$

This equation has the unique solution x = 3.9/0.006 = 650. The possible production
levels lie in the interval I = [0, 1000]. By Theorem 4.2, the profit function P has a
maximizer and a minimizer on I. They are to be found among the critical points and
the end points. Substituting their values into (4.16) yields

$$ P(0) = -300 , \qquad P(650) = 967.50 , \qquad P(1000) = 600 . $$

Therefore, maximal profit occurs at a production level of 650 units, and minimal
profit (in this case a loss) at x = 0 when there is no production at all.

Fig. 4.5 Revenue of a travel agency as a function of price
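The procedure used in Examples 4.11 and 4.12 (compare the function values at the end points and at the interior critical points) can be written down as a short routine. The following Python sketch applies it to Example 4.12; the helper `extrema_on_interval` is a hypothetical name introduced here, and the critical point is supplied by hand.

```python
def extrema_on_interval(f, critical_points, a, b):
    """Compare f at the end points a, b and at the critical points inside (a, b)."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = {x: f(x) for x in candidates}
    x_max = max(values, key=values.get)
    x_min = min(values, key=values.get)
    return (x_max, values[x_max]), (x_min, values[x_min])

# Example 4.12: P(x) = -300 + 3.9x - 0.003x^2 on [0, 1000]
profit = lambda x: -300 + 3.9 * x - 0.003 * x**2
critical = [3.9 / 0.006]          # zero of P'(x) = 3.9 - 0.006x

(best_x, best_p), (worst_x, worst_p) = extrema_on_interval(profit, critical, 0, 1000)
print(f"maximal profit {best_p:.2f} at x = {best_x:.0f}")
print(f"minimal profit {worst_p:.2f} at x = {worst_x:.0f}")
```

The routine relies on Theorem 4.2 for the existence of the extrema; it only selects among the finitely many candidates.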

Example 4.13 A travel agency knows that at a price of 80 Indian rupees for a half-day
trip, they attract 300 customers. For every decrease of 5 rupees in price, they
attract approximately 30 additional customers. What price should the agency charge
in order to maximize its revenue?

Solution: This situation is different from the one in the examples above. We first set
up a so-called demand function D(x) which tells us the total number of customers
at price level x. In order to understand its behavior, we make the following table:

Price x     No. of customers D(x)
80          300
75          330
70          360
65          390

These points all lie on the straight line with slope

$$ \frac{300 - 330}{80 - 75} = -\frac{30}{5} = -6 . $$

This line satisfies the equation

$$ D(x) = -6x + 780 , $$



Fig. 4.6 Optimizing the pipe location

where we have determined the constant 780 from the condition D(80) = 300. The
revenue of the agency is the product of the price (per customer) and the number of
customers (Fig. 4.5),

$$ R(x) = x \cdot D(x) = x(-6x + 780) = -6x^2 + 780x . $$

To find the maximal revenue, we differentiate the revenue function and find its critical
points,

$$ 0 = R'(x) = -12x + 780 , \quad \text{so} \quad x = \frac{780}{12} = 65 . $$

Since $R''(x) = -12 < 0$, this critical point is a maximum.
The agency achieves the maximal revenue when it sets the price at 65 rupees.
Example 4.14 (Piping oil from a drilling-rig to a refinery) A drilling-rig positioned
12 km offshore is to be connected by a pipe to a refinery, located on the shoreline at
20 km distance from the coastal point closest to the rig. An underwater pipe costs
50,000, a land-based pipe 30,000 Indian rupees per km. What combination of the
two will give the least expensive connection?
Solution: Figure 4.6 describes the geometrical situation, the horizontal line being the
shoreline. Let x and y denote the length of underwater pipe and the length of land-based
pipe, respectively. The Pythagorean theorem gives $x^2 = 12^2 + (20 - y)^2$, so

$$ x = \sqrt{144 + (20 - y)^2} . \qquad (4.17) $$

The cost c of the whole pipeline is

$$ c = 50,000x + 30,000y . $$

Substituting x from (4.17) yields

$$ c(y) = 50,000 \sqrt{144 + (20 - y)^2} + 30,000y . $$

We now find the minimal value of c(y) on the interval 0 ≤ y ≤ 20. The first derivative
of c with respect to y is

$$ c'(y) = 50,000 \cdot \frac{1}{2} \cdot \frac{2(20 - y)(-1)}{\sqrt{144 + (20 - y)^2}} + 30,000
        = -50,000 \, \frac{20 - y}{\sqrt{144 + (20 - y)^2}} + 30,000 . $$

Setting $c'$ equal to zero gives

$$ 50,000(20 - y) = 30,000 \sqrt{144 + (20 - y)^2} . $$

In order to solve this equation for y, we successively transform it as follows:

$$ \begin{aligned}
\frac{5}{3}(20 - y) &= \sqrt{144 + (20 - y)^2} , \\
\frac{25}{9}(20 - y)^2 &= 144 + (20 - y)^2 , \\
\frac{16}{9}(20 - y)^2 &= 144 , \\
20 - y &= \pm \frac{3}{4} \cdot 12 = \pm 9 , \\
y &= 11 \ \text{or} \ y = 29 .
\end{aligned} $$

Only y = 11 lies in the interval [0, 20]. The values of c at this critical point and at the
end points are

$$ c(11) = 1,080,000 , \qquad c(0) \approx 1,166,190 , \qquad c(20) = 1,200,000 . $$

The least expensive connection costs 1,080,000 rupees and is achieved by running
the line underwater to the point on the shore at 11 km distance from the refinery.
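The three cost values of Example 4.14 are easy to recompute. The following Python lines are a sketch under the assumptions of the example (12 km offshore, 20 km along the shore, prices per km as stated); the function name `pipeline_cost` is chosen here for illustration.

```python
import math

def pipeline_cost(y):
    """Cost in rupees when y km of the pipe run on land (0 <= y <= 20)."""
    underwater = math.sqrt(12**2 + (20 - y) ** 2)   # length x from (4.17)
    return 50_000 * underwater + 30_000 * y

# End points of [0, 20] and the critical point y = 11
costs = {y: pipeline_cost(y) for y in (0, 11, 20)}
cheapest = min(costs, key=costs.get)

for y, c in costs.items():
    print(f"c({y}) = {c:,.0f}")
print("cheapest land length:", cheapest, "km")
```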
Example 4.15 (The Stock Market) The graph in Fig. 4.7 shows a hypothetical version
of the Sensex Average. The Sensex Average is a stock market index, capturing the
overall increase of the stock market along with local dips and rises.
One way to invest in the stock market is to buy shares of an index fund, which in
turn buys a number of different stocks with the goal of tracking the index. The goal of
an index fund director would certainly be to buy low (for example, at local minima)
and sell high (for example, at local maxima). We may also attach a meaning to the
curvature of this graph. On a convex part of the curve, the growth rate increases with
time, while on the concave part it decreases. The inflection points mark the times
where the growth rate changes this behavior, so one may view it as a signal of a trend
reversal.

Fig. 4.7 Hypothetical version of Sensex

Other Applications

Example 4.16 (Fabricating a Box) A box open on top is made by cutting small
congruent squares from the corners of a 12 by 12 cm sheet of tin and bending up the
sides. How large should the squares be which we cut from the corners, in order to
maximize the volume of the box?

Solution: Let the corner squares have x cm side length. By construction, the volume
of the box becomes a function of x, namely (Fig. 4.8)

$$ V(x) = x(12 - 2x)^2 = 144x - 48x^2 + 4x^3 . \qquad (4.18) $$

Since the sides of the sheet of tin are 12 cm long, x must lie in the interval [0, 6].
Obviously, V(x) > 0 for 0 < x < 6, and V(0) = 0 = V(6), so the end points x = 0
and x = 6 are minima. By Theorem 4.2 there exists a maximum, which in this case
has to be in the interior. Indeed, the graph of V (see Fig. 4.9) suggests a maximum
near x = 2. We compute the first derivative of V and obtain

$$ V'(x) = 144 - 96x + 12x^2 = 12(12 - 8x + x^2) = 12(2 - x)(6 - x) . $$

It has only one zero in the interior of [0, 6], namely x = 2, which, therefore, must be
the maximum. Since V(2) = 128, the maximal volume of the box is 128 cm³, and the
squares cut out should have a side length of 2 cm.
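As a plausibility check for Example 4.16, one may evaluate V on a fine grid and look for the largest value. This Python sketch is a brute-force illustration, not a substitute for the calculus argument; the grid spacing of 0.001 cm is an arbitrary choice made here.

```python
def volume(x):
    """Volume of the open box when squares of side x are cut from a 12 by 12 sheet."""
    return x * (12 - 2 * x) ** 2

# Evaluate V at x = 0, 0.001, 0.002, ..., 6 and pick the best grid point
grid = [i / 1000 for i in range(6001)]
grid_best = max(grid, key=volume)

print("best grid point:", grid_best)
print("volume there:   ", volume(grid_best))
```

The grid search agrees with the critical point x = 2 found from $V'(x) = 0$.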

Example 4.17 (Designing an Efficient Oil Can) You have to design a 1 L oil can
shaped like a right circular cylinder (see Fig. 4.10). How should we choose the radius
r and height h in order to use the least amount of material?

Fig. 4.8 An open box made by cutting the corners from a square sheet

Fig. 4.9 Maximizing box volume

Solution: Because the volume should be equal to 1 L, we must have

$$ \pi r^2 h = 1000 , $$

if r and h are measured in centimeters. The surface area of the can is given by

$$ A = 2\pi r^2 + 2\pi r h . \qquad (4.19) $$



Fig. 4.10 Oil can to be designed

We assume that the amount of material we have to use is proportional to the surface
area. We thus have to minimize A subject to the volume constraint $\pi r^2 h = 1000$. In
order to remove the constraint from the computation, we express h in terms of r,

$$ h = \frac{1000}{\pi r^2} , $$

and substitute that expression into the surface area formula. Using (4.19) we obtain

$$ A(r) = 2\pi r^2 + 2\pi r \cdot \frac{1000}{\pi r^2} = 2\pi r^2 + \frac{2000}{r} . $$

Our aim is to find the value of r > 0 that minimizes the value of A. Figure 4.11
suggests that such a value exists.
For small r (a tall thin container, like a piece of pipe), the term 2000/r dominates
and A is large. For large r (a short wide container, like a pizza pan), the term $2\pi r^2$
dominates and A again is large. To compute the minimum, we set the derivative of
A equal to zero,

$$ 0 = A'(r) = 4\pi r - \frac{2000}{r^2} . $$

Rearranging the terms, we obtain

$$ 4\pi r^3 = 2000 , \qquad r = \sqrt[3]{\frac{500}{\pi}} \approx 5.42 . $$

Fig. 4.11 Minimizing surface area at constant volume

In order to check the type of this critical point, we compute the second derivative,

$$ A''(r) = 4\pi + \frac{4000}{r^3} . $$

It is positive throughout the domain (0, ∞) of A. Therefore, A is convex and the
critical point must be a global minimum. The corresponding value of h is

$$ h = \frac{1000}{\pi r^2} = 2 \sqrt[3]{\frac{500}{\pi}} = 2r . $$

The 1 L can that uses the least amount of material has height equal to its diameter,
with r ≈ 5.42 cm and h ≈ 10.84 cm.
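The dimensions of the can in Example 4.17 can be recomputed as follows. The Python lines are a minimal sketch; the variable names `r_opt` and `h_opt` are illustrative, and the comparison with slightly smaller and larger radii is only a spot check of the minimum.

```python
import math

def surface_area(r):
    """A(r) = 2*pi*r^2 + 2000/r for a can of volume 1000 cm^3."""
    return 2 * math.pi * r**2 + 2000 / r

r_opt = (500 / math.pi) ** (1 / 3)        # solution of 4*pi*r^3 = 2000
h_opt = 1000 / (math.pi * r_opt**2)       # height from the volume constraint

print(f"r = {r_opt:.2f} cm, h = {h_opt:.2f} cm")
print("nearby radii give a larger area:",
      surface_area(0.9 * r_opt) > surface_area(r_opt) < surface_area(1.1 * r_opt))
```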
Example 4.18 (Inscribing Rectangles) A rectangle is to be inscribed into a semicircle
of radius 2. What is the largest area the rectangle can have, and what are its
dimensions?
Solution: We place the circle and the rectangle in the plane as seen in Fig. 4.12. The
lower right corner of the rectangle has coordinates (x, 0), the upper right corner has
coordinates $(x, \sqrt{4 - x^2})$. In terms of x, the rectangle has length 2x, height $\sqrt{4 - x^2}$
and area

$$ A(x) = 2x \sqrt{4 - x^2} . \qquad (4.20) $$

Notice that the values of x are restricted to the interval 0 ≤ x ≤ 2, since the rectangle
has to lie inside the semicircle. Our goal is to find the global maximum of the function
A from (4.20) on the domain [0, 2], whose existence is guaranteed by Theorem 4.2.

Fig. 4.12 Rectangle and semicircle

We have A(x) > 0 for 0 < x < 2 and A(0) = A(2) = 0. The maximum, therefore,
must be attained at a zero of the derivative

$$ A'(x) = -\frac{2x^2}{\sqrt{4 - x^2}} + 2\sqrt{4 - x^2} $$

with 0 < x < 2. Rearranging the equation

$$ \frac{-2x^2}{\sqrt{4 - x^2}} + 2\sqrt{4 - x^2} = 0 $$

we obtain

$$ \begin{aligned}
-2x^2 + 2(4 - x^2) &= 0 , \\
8 - 4x^2 &= 0 , \\
x^2 &= 2 , \\
x &= \pm\sqrt{2} .
\end{aligned} $$

Of the two zeros $x = \sqrt{2}$ and $x = -\sqrt{2}$, only $x = \sqrt{2}$ satisfies the restriction
0 < x < 2, and

$$ A(\sqrt{2}) = 2\sqrt{2} \, \sqrt{4 - 2} = 4 . $$

The area has a maximal value of 4, the corresponding height is $\sqrt{4 - x^2} = \sqrt{2}$ and
the length is $2x = 2\sqrt{2}$.
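The result of Example 4.18 can be cross-checked by evaluating the area function on a grid. This Python sketch is only an illustration; the grid of 10,001 points on [0, 2] is an arbitrary choice made here.

```python
import math

def area(x):
    """Area of the inscribed rectangle with lower right corner (x, 0), 0 <= x <= 2."""
    return 2 * x * math.sqrt(4 - x**2)

x_opt = math.sqrt(2)                                  # zero of A'(x) in (0, 2)
grid = [2 * i / 10_000 for i in range(10_001)]        # points 0, 0.0002, ..., 2
grid_best = max(grid, key=area)

print(f"A(sqrt(2)) = {area(x_opt):.6f}")
print(f"best grid point = {grid_best:.4f}")
```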

4.6 Exercises

4.6.1 Find the local extrema of f on the given interval
(i) $f(x) = \sec \frac{1}{2}x$; $[-\frac{\pi}{2}, \frac{\pi}{2}]$,
(ii) $f(x) = \tan x - 2 \sec x$; $[-\frac{\pi}{4}, \frac{\pi}{4}]$,
(iii) $f(x) = \sin x - \cos x$; [0, π],
(iv) f(x) = |6 − 4x|; [−3, 3].
4.6.2 Find the global (absolute) extrema of f on (−∞, ∞).
(i) $f(x) = 3 - 4x - 2x^2$,
(ii) $f(x) = x^3 - 3x - 2$.
4.6.3 Show that among all rectangles with perimeter p, the square has the maximum
area.
4.6.4 Show that among all rectangles with area A, the square has the minimum
perimeter.
4.6.5 Find the point on the graph of $y = x^2 + 1$ that is closest to the point (3, 1).
4.6.6 Find the point on the graph of $y = x^3$ that is closest to the point (4, 0).
4.6.7 A pipeline for transporting oil will connect two points A and B that are 3 km
apart and on opposite banks of a straight river 1 km wide (see Fig. 4.13). Part
of the pipeline will run under water from A to a point C on the opposite bank,
and then above ground from C to B. If the cost per km of running the pipeline
underwater is four times the cost per km of running it above ground, find the
location of C that will minimize the cost. (The slope of the river bed
should be disregarded.)
4.6.8 Let $f(x) = x^2 + px + q$. Find the values of p and q such that f(1) = 3 is an
extreme value of f on [0, 2]. Is this value a maximum or minimum?
4.6.9 Show that

$$ f(x) = \frac{64}{\sin x} + \frac{27}{\cos x} $$

has a minimum value, but no maximum value on the interval (0, π/2).
4.6.10 Find the extrema of f on the given interval
(a) $f(x) = -2x^3 - 6x^2 + 5$; [−3, 1],
(b) $f(x) = x^4 - 5x^2 + 4$; [0, 2].
4.6.11 A window has the shape of a rectangle surmounted by a semicircle. If the
perimeter of the window is 15 m, find the dimensions that will allow the
maximum amount of light to enter.
4.6.12 A farmer has 500 m of fencing to enclose a rectangular field. A barn will use
part of one side of the field. Prove that the area of the field is maximal when
the rectangle is a square.
4.6.13 Find the dimensions of the rectangle of maximum area that can be inscribed
in a semicircle of radius a, if two vertices lie on the diameter.

Fig. 4.13 Optimal location of a pipeline

4.6.14 A rectangle has its two lower corners on the x-axis and its two upper corners
on the curve $y = 16 - x^2$. For all such rectangles, what are the dimensions
of the one with largest area?
4.6.15 Find the dimensions of the rectangle with maximum area that can be inscribed
in a circle of radius 10.
4.6.16 Let the number of bacteria in a culture at time t be given by
$N = 5000(25 + t e^{-t/20})$.
(a) Find the largest and smallest number of bacteria in the culture during the
time interval 0 ≤ t ≤ 100.
(b) At what time during the time interval in part (a) is the number of bacteria
decreasing most rapidly?
4.6.17 Find the dimensions of the right circular cylinder of largest volume that can
be inscribed in a sphere of radius R.
Chapter 5
Sequences and Series

The concepts of sequence and series provide another basic tool of calculus. One may
use them, among other things, to approximate functions by comparatively simple
formulas. In 1668, Mercator published the formula for the logarithmic series. Since
that time, series have been used for countless purposes in science and engineering.
We give here a brief introduction into the definition and convergence properties of
sequences and series, in particular, power series and Taylor series. Later in Chap. 10,
we will treat Fourier series, a special type of series involving trigonometric functions.

5.1 Sequences and Their Limits

Everyone is familiar with the sequence 1, 2, 3, 4, … of positive integers, or with
the sequence 2, 4, 6, 8, … of positive even integers. In the latter case, 2 is the first
element of the sequence, 4 the second, and so on. If we denote the n-th element of
this sequence by $a_n$, there is a simple formula for it, namely, $a_n = 2n$. In this manner,
we may obtain other sequences of numbers as well, for example

$$ a_n = \sqrt{n} , \qquad a_n = \frac{1}{n} , \qquad a_n = \frac{1}{n^2} , \qquad (5.1) $$
$$ a_n = (-1)^n , \qquad a_n = \frac{n - 1}{n} , \qquad a_n = (-1)^{n+1} \frac{1}{n} . \qquad (5.2) $$

In fact, a sequence of numbers is a special type of a function, namely, a function
whose domain is the set of integers (usually the positive integers, with or without
0), and whose range is a subset of the real numbers. In other words, we can define
a sequence as a function $f : \mathbb{N} \to \mathbb{R}$. Usually, however, sequences are denoted by
$a_1, a_2, \dots$ or briefly $\{a_n\}$, or with a different letter like $\{b_n\}$ or $\{s_n\}$. The elements
$a_n$ of the sequence $\{a_n\}$ are also called the terms of the sequence. Conceptually, one
© Springer Nature Singapore Pte Ltd. 2019 129
M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_5

Fig. 5.1 Sequence and limit

has to distinguish between the sequence $a_1, a_2, a_3, \dots$ and the set of its elements
$\{a_n : n \in \mathbb{N}\} = \{a_1, a_2, \dots\}$ (see footnote 1). For example, for the sequence defined by
$a_n = (-1)^n$, we have $\{a_n : n \in \mathbb{N}\} = \{-1, 1\}$.
Since sequences are special cases of functions, many of the notions developed
in Chap. 1 for functions also apply to sequences. For example, $\{a_n\}$ is an increasing
sequence whenever $a_n < a_{n+1}$ holds for all positive integers n (which is the same
as saying that $a_n < a_m$ whenever n < m). Analogously, the notions of a decreasing
sequence, a constant sequence, and a bounded sequence are introduced. The algebraic
operations on sequences are defined elementwise. For example, the product of two
sequences $\{a_n\}$ and $\{b_n\}$ is defined as the sequence whose n-th element is $a_n b_n$,
compare Sect. 1.5 on the algebra of functions.
As for functions, the notion of a limit of a sequence is most important.

Definition 5.1 (Convergence, Divergence, and Limit) The sequence $\{a_n\}$ converges
to the number s if to every positive number ε there corresponds an integer N such
that

$$ |a_n - s| < \varepsilon \quad \text{whenever } n > N . $$

In this case, we call $\{a_n\}$ a convergent sequence. If no such s exists, we call $\{a_n\}$
a divergent sequence. If $\{a_n\}$ converges to s, we write $\lim_{n \to \infty} a_n = s$, or simply
$a_n \to s$, and call s the limit of the sequence.
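To see how N depends on ε in Definition 5.1, consider the sequence $a_n = 1/n$ with limit s = 0. For n > N we have $1/n < 1/N$, so any integer $N \ge 1/\varepsilon$ works. The Python sketch below uses this choice; the helper name `threshold` is an assumption made here, and the loop only spot-checks finitely many terms, which illustrates the definition but does not prove convergence.

```python
import math

def a(n):
    """The sequence a_n = 1/n."""
    return 1 / n

def threshold(eps):
    """An integer N with |1/n - 0| < eps for every n > N (here N >= 1/eps)."""
    return math.ceil(1 / eps)

checks = {}
for eps in (0.5, 0.25, 0.01):
    N = threshold(eps)
    # Spot-check the next 1000 terms after a_N
    checks[eps] = all(abs(a(n) - 0) < eps for n in range(N + 1, N + 1001))
    print(f"eps = {eps}: N = {N}, terms within eps: {checks[eps]}")
```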

Figure 5.1 illustrates two ways of describing limits geometrically. One may plot the
points $a_n$ on the real line in order to see how they approach the limit point s. Or
one may plot the points $(n, a_n)$. We have $a_n \to s$ if the line y = s is a horizontal
asymptote of those points. In this figure, all $a_n$'s after $a_N$ lie within the distance ε of s.

1 This corresponds to the distinction between a function f and its range R( f ).



Example 5.1 1. We have $\lim_{n \to \infty} \frac{1}{n} = 0$, that is, the limit of the sequence $\{a_n\}$
where $a_n = \frac{1}{n}$ is zero.
2. Consider the constant sequence with $a_n = k$ for all n, where k is a given number.
Then $\lim_{n \to \infty} a_n = k$.
3. Let $a_n = 1 + \frac{1}{n}$, then $\lim_{n \to \infty} a_n = 1$.
Several properties of limits of functions, as developed in Chap. 2, hold for limits of
sequences in an analogous manner. We present two such results.
Theorem 5.1 Let $\{a_n\}$ and $\{b_n\}$ be sequences with $a_n \to s$ and $b_n \to t$. Then the
sequences $\{a_n + b_n\}$, $\{a_n - b_n\}$, $\{a_n b_n\}$, and $\{a_n / b_n\}$ converge to s + t, s − t, st, and
s/t, respectively, the latter if t ≠ 0.
Theorem 5.2 (Sandwich Theorem for sequences) Let $\{a_n\}$, $\{b_n\}$, $\{c_n\}$ be sequences
with $a_n \le b_n \le c_n$ for all n. If $a_n \to s$ and $c_n \to s$, then also $b_n \to s$.
Continuity of functions is related to limits of sequences.
Theorem 5.3 Let $\{s_n\}$ be a sequence of real numbers with $s_n \to s$. If f is a function
that is continuous at s and defined at all $s_n$, then $f(s_n) \to f(s)$.
In fact, a converse of Theorem 5.3 also holds: If $f(s_n) \to f(s)$ for all sequences $\{s_n\}$
which converge to a given point s, then f is continuous at s.
We may also express the limit of a sequence as an improper limit of a function
(see footnote 2), see also Fig. 5.1.
Theorem 5.4 Let f be a function with f(x) defined for all x ≥ N with N given,
let $\{a_n\}$ be a sequence of real numbers such that $a_n = f(n)$ for n ≥ N. Then
$\lim_{x \to \infty} f(x) = s$ implies $\lim_{n \to \infty} a_n = s$.
We may exploit Theorem 5.4 in order to compute limits of sequences.
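Example 5.2 Find the limit of the sequences
1. $\left\{ \dfrac{\ln n}{n} \right\}$,
2. $\left\{ n^{1/n} \right\}$,
3. $\left\{ \left( \dfrac{n+1}{n-1} \right)^n \right\}$.
Solution:
1. The function $f(x) = \dfrac{\ln x}{x}$ is defined for all x ≥ 1 and agrees with the given
sequence at positive integers. By Theorem 5.4, $\lim_{n \to \infty} \dfrac{\ln n}{n} = \lim_{x \to \infty} \dfrac{\ln x}{x}$ holds if
the limit on the right-hand side exists. By l'Hôpital's rule,

$$ \lim_{x \to \infty} \frac{\ln x}{x} = \lim_{x \to \infty} \frac{1/x}{1} = \frac{0}{1} = 0 . $$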

2 See Sect. 2.4.



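Therefore, $\lim_{n \to \infty} \dfrac{\ln n}{n} = 0$.
2. Since

$$ n^{1/n} = e^{\frac{\ln n}{n}} , $$

we obtain from the previous example and Theorem 5.3 that $n^{1/n} \to e^0 = 1$,
because the exponential function is continuous.
3. Let $a_n = \left( \dfrac{n+1}{n-1} \right)^n$. The limit is of the indeterminate form $1^\infty$. In order to apply
l'Hôpital's rule, we change it to the form ∞ · 0 by taking the natural logarithm
of $a_n$,

$$ \ln a_n = \ln \left( \frac{n+1}{n-1} \right)^n = n \ln \left( \frac{n+1}{n-1} \right) , $$

and set

$$ f(x) = x \ln \left( \frac{x+1}{x-1} \right) . $$

Then

$$ \begin{aligned}
\lim_{x \to \infty} f(x) &= \lim_{x \to \infty} x \ln \left( \frac{x+1}{x-1} \right)
   = \lim_{x \to \infty} \frac{\ln \left( \frac{x+1}{x-1} \right)}{1/x} \\
 &= \lim_{x \to \infty} \frac{-2/(x^2 - 1)}{-1/x^2} \quad \text{by l'Hôpital's rule} \\
 &= \lim_{x \to \infty} \frac{2x^2}{x^2 - 1} = \lim_{x \to \infty} \frac{2}{1 - (1/x^2)} = 2 .
\end{aligned} $$

By Theorem 5.4, $\lim_{n \to \infty} \ln a_n = \lim_{x \to \infty} f(x) = 2$. Since the exponential function is
continuous, by Theorem 5.3 we have $\lim_{n \to \infty} a_n = \lim_{n \to \infty} e^{\ln a_n} = e^2$. Therefore,

$$ \lim_{n \to \infty} \left( \frac{n+1}{n-1} \right)^n = e^2 . $$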
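A quick numerical experiment supports the value $e^2$ of the last limit. The Python lines below are only a sketch; evaluating finitely many terms in floating point arithmetic suggests the limit but does not replace the argument via l'Hôpital's rule.

```python
import math

def a(n):
    """The term a_n = ((n + 1) / (n - 1))^n, defined for n >= 2."""
    return ((n + 1) / (n - 1)) ** n

limit = math.e ** 2
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: a_n = {a(n):.8f}, difference to e^2 = {a(n) - limit:.2e}")
```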

In the example $a_n = 1/n$, the distance of $a_n$ to the limit 0 decreases monotonically
with increasing n. The example $a_n = \ln n / n$ illustrates another possible behavior of a
convergent sequence. Namely, we have $a_1 = 0 < a_2 = 0.347\ldots < a_3 = 0.366\ldots$,
and $a_3 > a_4 > \dots$, so the sequence at first moves away from its limit 0 before it
comes closer and closer to it. Moreover, a convergent sequence may oscillate around
its limit, like, for example

$$ a_n = \begin{cases} (-1)^n \frac{1}{n} , & n \text{ odd}, \\ 0 , & n \text{ even}. \end{cases} $$

Its limit is 0, but it moves away from zero in both directions an infinite number of
times (but by smaller and smaller amounts).

Note also that if we add, delete, or alter a finite number of terms in a given
sequence, its convergence behavior does not change—it will still converge to the
same limit as before, or it will still be divergent.
In the examples considered so far, sequences {an } were defined by specifying the
n-th element (or term) an with a formula where we only have to insert n, like an = 2n.
Alternatively, we may specify an with a formula which involves elements am with
m < n, for example, an = an−1 + 1. In that case, if we know a1 (say, a1 = 1), we
can compute the elements an successively, since a2 = a1 + 1 = 2, a3 = a2 + 1 = 3,
and so on.
Definition 5.2 (Recursively Defined Sequences) A recursive definition of a
sequence consists of
1. the value(s) of the initial term or terms,
2. a rule, called a recursive formula, for calculating any later term from terms that
precede it.
Note that for a given sequence we may have a direct definition as well as a recursive
definition. For example, the sequence 1, 2, 3,… can be obtained by prescribing
an = n, or from the recursive definition a1 = 1, an = an−1 + 1. Note also that the
recursive definition could also be written as a1 = 1, an+1 = an + 1.
Example 5.3 1. Let a1 = 1 and an = nan−1 . This is a recursive definition of the
sequence whose terms are a1 = 1, a2 = 2a1 = 2, a3 = 3a2 = 3 · 2 = 6, a4 =
4 · a3 = 4 · 6 = 24, a5 = 5 · a4 = 5 · 24 = 120, and so on. A direct definition
of this sequence is an = n!, the factorial of n.
2. Let a1 = 1, a2 = 1, and an+1 = an + an−1 . The sequence generated from this
recursive definition is known as the Fibonacci sequence, and the terms an of
this sequence are called the Fibonacci numbers. Note that two initial terms are
needed, since the recursive formula uses the two preceding terms. The next terms
after a2 are a3 = a2 + a1 = 2, a4 = a3 + a2 = 2 + 1 = 3, a5 = a4 + a3 = 3 +
2 = 5, and a6 = a5 + a4 = 5 + 3 = 8. There is also a direct definition of this
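sequence, by the formula of Moivre–Binet
\[ a_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{n} - \left( \frac{1-\sqrt{5}}{2} \right)^{n} \right]. \]
3. Let $a_1 = -2$, $a_{n+1} = \dfrac{n a_n}{n+1}$. This recursive formula defines a sequence whose
terms are
\[ a_1 = -2, \quad a_2 = \frac{a_1}{2} = -1, \quad a_3 = \frac{2a_2}{3} = -\frac{2}{3}, \quad a_4 = \frac{3a_3}{4} = -\frac{1}{2}, \quad \ldots \]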
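The recursive definitions of Example 5.3 translate directly into loops. The sketch below is illustrative only; the function names are hypothetical. It generates the first terms of each sequence and compares the Fibonacci numbers with the Moivre–Binet formula, evaluated in floating point and rounded.

```python
import math
from fractions import Fraction

def factorial_rec(n_max: int) -> list[int]:
    """a_1 = 1, a_n = n * a_(n-1); returns [a_1, ..., a_n_max]."""
    terms = [1]
    for n in range(2, n_max + 1):
        terms.append(n * terms[-1])
    return terms

def fibonacci_rec(n_max: int) -> list[int]:
    """a_1 = a_2 = 1, a_(n+1) = a_n + a_(n-1); returns [a_1, ..., a_n_max]."""
    terms = [1, 1]
    for _ in range(3, n_max + 1):
        terms.append(terms[-1] + terms[-2])
    return terms[:n_max]

def binet(n: int) -> float:
    """Direct (non-recursive) formula of Moivre-Binet."""
    s5 = math.sqrt(5.0)
    return (((1 + s5) / 2) ** n - ((1 - s5) / 2) ** n) / s5

def third_sequence(n_max: int) -> list[Fraction]:
    """a_1 = -2, a_(n+1) = n * a_n / (n + 1), kept as exact fractions."""
    terms = [Fraction(-2)]
    for n in range(1, n_max):
        terms.append(n * terms[-1] / (n + 1))
    return terms

print(factorial_rec(5))
print(fibonacci_rec(8), [round(binet(n)) for n in range(1, 9)])
print(third_sequence(4))
```

Exact fractions are used for the third sequence so that the printed terms can be compared with the values given in the example without rounding.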

5.2 Infinite Series

General definition. Given a sequence $\{s_n\}$ of numbers, an expression of the form
\[ s_1 + s_2 + s_3 + \cdots + s_n + \cdots \tag{5.3} \]
is called an infinite series or simply series. The number $s_n$ is called the n-th term of
the series. The finite sums $S_n$,
\[ S_1 = s_1, \quad S_2 = s_1 + s_2, \quad S_3 = s_1 + s_2 + s_3, \quad \ldots, \quad S_n = \sum_{k=1}^{n} s_k, \]
are called partial sums of the series. If the sequence $\{S_n\}$ formed by the partial sums
has a limit $S$, we say that the series (5.3) converges, we call $S$ the sum of the series
and write
\[ s_1 + s_2 + s_3 + \cdots + s_n + \cdots = \sum_{k=1}^{\infty} s_k = S. \tag{5.4} \]
If the sequence $\{S_n\}$ diverges, we say that the series diverges.


Since it is convenient, it is the standard custom to write the series, too, in sigma
notation. Therefore, expressions like
\[ \sum_{n=1}^{\infty} s_n, \quad \sum_{k=1}^{\infty} a_k, \quad \text{or} \quad \sum a_n \]
may refer either to the series itself (a collection of terms) or to its sum (a number).
However, usually the meaning is clear from the context.
Geometric Series. A series of the form
\[ a + ar + ar^2 + \cdots + ar^{n-1} + \cdots = \sum_{n=1}^{\infty} ar^{n-1}, \tag{5.5} \]
where $a$ and $r$ are fixed real numbers with $a \ne 0$, is called a geometric series. The
ratio $r$ can be positive or negative; for example, $r$ is positive in the geometric series
\[ 1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots + \frac{1}{2^{n-1}} + \cdots \]
while it is negative in
\[ 1 - \frac{1}{3} + \frac{1}{3^2} + \cdots + \left( -\frac{1}{3} \right)^{n-1} + \cdots \]
If $|r| \ne 1$, we can determine the convergence or divergence of the geometric series
in the following way. We start with the partial sums
\[ S_n = a + ar + ar^2 + \cdots + ar^{n-1}, \qquad r S_n = ar + ar^2 + \cdots + ar^{n}. \]
Subtracting $r S_n$ from $S_n$, we obtain
\[ S_n (1 - r) = a (1 - r^{n}), \]
and dividing both sides by $1 - r$, we arrive at
\[ S_n = \frac{a (1 - r^{n})}{1 - r} \qquad (\text{recall } r \ne 1). \]

Three cases arise:
• If $|r| < 1$, then $r^{n} \to 0$ as $n \to \infty$ and so $\lim_{n\to\infty} S_n = \dfrac{a}{1-r}$.
• If $|r| > 1$, then $|r|^{n} \to \infty$ as $n \to \infty$ and the series diverges.
• If $r = 1$, we obtain
\[ S_n = a + a \cdot 1 + a \cdot 1^2 + \cdots + a \cdot 1^{n-1} = na, \]
so the series diverges, as $\lim_{n\to\infty} S_n = \pm\infty$ depending on the sign of $a$. If $r = -1$,
the series diverges because the $S_n$ alternate between the values $a$ and 0.

In short, the geometric series $\sum_{n=1}^{\infty} ar^{n-1}$ converges to the sum $\dfrac{a}{1-r}$ if $|r| < 1$ and
diverges if $|r| \ge 1$.
Example 5.4 Let a ball be dropped from a meters above a flat surface. Each time
the ball hits the surface after falling from a height h, it rebounds to the height r h,
where r is positive but less than 1. Find the total distance the ball travels up and down
(Fig. 5.2).
Solution: The total distance is
\[ S = a + 2ar + 2ar^2 + 2ar^3 + \cdots = a + \frac{2ar}{1-r} = a \, \frac{1+r}{1-r}. \]

Fig. 5.2 A bouncing ball

If $a$ equals 6 m and $r = 2/3$, for example, the distance is
\[ S = 6 \cdot \frac{1 + 2/3}{1 - 2/3} = 6 \cdot \frac{5/3}{1/3} = 30 \text{ m}. \]
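The closed form for the geometric series and the bouncing-ball distance can be cross-checked by summing terms directly. This is a small sketch with hypothetical function names; it adds up finitely many terms and compares the result with $a/(1-r)$ and $a(1+r)/(1-r)$.

```python
def geometric_partial_sum(a: float, r: float, n: int) -> float:
    """S_n = a + a*r + ... + a*r**(n-1), summed term by term."""
    total, term = 0.0, a
    for _ in range(n):
        total += term
        term *= r
    return total

def bounce_distance(a: float, r: float, bounces: int) -> float:
    """Distance travelled after the first drop and `bounces` full rebounds."""
    total, height = a, a
    for _ in range(bounces):
        height *= r           # rebound height after the next impact
        total += 2 * height   # the ball goes up and comes down again
    return total

# Values of the worked example: drop height 6 m, rebound ratio 2/3.
a, r = 6.0, 2.0 / 3.0
closed_form = a * (1 + r) / (1 - r)
print(bounce_distance(a, r, 200), closed_form)
print(geometric_partial_sum(5.0, 0.5, 60), 5.0 / (1 - 0.5))
```

With $|r| < 1$ the partial sums settle quickly, so a few hundred terms already agree with the closed form to machine precision.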

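Example 5.5 Examine whether the following series converge or diverge:
1. $\sum_{n=1}^{\infty} 5 \left( \frac{1}{2} \right)^{n-1}$
2. $-1 + \frac{1}{2} - \frac{1}{4} + \frac{1}{8} - \cdots - \left( -\frac{1}{2} \right)^{n-1} + \cdots$
3. $\sum_{k=0}^{\infty} \left( \frac{3}{7} \right)^{k} = \sum_{k=1}^{\infty} \left( \frac{3}{7} \right)^{k-1}$
4. $\frac{\pi}{2} + \frac{\pi^2}{4} + \frac{\pi^3}{8} + \cdots$

Solution:
1. The series converges as $0 < r = \frac{1}{2} < 1$; its sum equals
\[ S = \frac{5}{1 - (1/2)} = 10. \]
2. We have $a = -1$ and $r = -\frac{1}{2}$, so $|r| < 1$. The series converges to
\[ S = \frac{-1}{1 - (-1/2)} = -\frac{2}{3}. \]
3. We have $a = 1$ and $r = \frac{3}{7}$; the series converges to
\[ S = \frac{1}{1 - 3/7} = \frac{7}{4}. \]
4. The series diverges as $r = \frac{\pi}{2} \approx \frac{22}{14} > 1$.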

General properties of series.

1. If $\sum_{n=1}^{\infty} s_n$ converges, then $s_n \to 0$ as $n \to \infty$.
2. $\sum_{n=1}^{\infty} s_n$ diverges if $\lim_{n\to\infty} s_n$ fails to exist or is different from zero.
3. Addition, deletion, or alteration of a finite number of terms of an infinite series
does not change its nature (convergence or divergence). However, it may change
its sum.
4. The series $s_1 + s_2 + s_3 + s_4 + \cdots + s_n + \cdots$, which we usually write as $\sum_{n=1}^{\infty} s_n$,
can also be written as $\sum_{n=1+m}^{\infty} s_{n-m}$ or $\sum_{n=1-m}^{\infty} s_{n+m}$, with an arbitrary positive
integer $m$. For example, the expressions
\[ \sum_{n=0}^{\infty} \frac{1}{2^{n}}, \quad \sum_{n=5}^{\infty} \frac{1}{2^{n-5}} \quad \text{and} \quad \sum_{n=-3}^{\infty} \frac{1}{2^{n+3}} \]
denote the same series, namely, $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$.
We already know that limits of sequences can be interchanged with addition, subtraction,
and scalar multiplication. The same is true for series.

Theorem 5.5 Consider two convergent series, $\sum_n s_n = S$ and $\sum_n t_n = T$, and let $k$ be
a given real number. Then the series $\sum_n (s_n + t_n)$, $\sum_n (s_n - t_n)$, and $\sum_n k s_n$ also
converge, and
\[ \sum_n (s_n + t_n) = \sum_n s_n + \sum_n t_n = S + T, \tag{5.6} \]
\[ \sum_n (s_n - t_n) = \sum_n s_n - \sum_n t_n = S - T, \tag{5.7} \]
\[ \sum_n k s_n = k \sum_n s_n = k S. \tag{5.8} \]

Convergence tests for series with nonnegative terms. It is often useful to know
whether a given series converges or diverges, even if one does not (or not yet) know
its sum. For this purpose, a variety of criteria have been developed.

Theorem 5.6 (Comparison principle) Let $\sum_n s_n$ and $\sum_n t_n$ be series with $0 \le t_n \le s_n$
for all $n \ge N$, where $N$ is a given positive integer. Then the following assertions
hold:
1. If $\sum_n s_n$ converges, then $\sum_n t_n$ converges.
2. If $\sum_n t_n$ diverges, then $\sum_n s_n$ diverges.

Theorem 5.7 (Limit comparison test) Let $\sum_n s_n$ and $\sum_n t_n$ be series with $s_n > 0$
and $t_n > 0$ for all $n \ge N$, where $N$ is a given positive integer. Then the following
assertions hold:
1. If $\lim_{n\to\infty} \dfrac{s_n}{t_n} = \lambda$ for some $0 < \lambda < \infty$, then the series $\sum_n s_n$ and $\sum_n t_n$ either both
converge or both diverge.
2. If $\lim_{n\to\infty} \dfrac{s_n}{t_n} = 0$ and $\sum_n t_n$ converges, then $\sum_n s_n$ converges.
3. If $\lim_{n\to\infty} \dfrac{s_n}{t_n} = \infty$ and $\sum_n t_n$ diverges, then $\sum_n s_n$ diverges.

Theorem 5.8 (Ratio test) Let $\sum_n s_n$ be a series with $s_n > 0$ for all $n$ and suppose
that
\[ \lim_{n\to\infty} \frac{s_{n+1}}{s_n} = \rho. \tag{5.9} \]
Then the following assertions hold:
1. The series converges if $\rho < 1$.
2. The series diverges if $\rho > 1$ or $\rho$ is infinite.
3. The test is inconclusive if $\rho = 1$.
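Let us apply the ratio test to the series $\sum_{n=0}^{\infty} \dfrac{x^{n}}{n!}$, where $x$ is a given real number.
We have
\[ s_n = \frac{x^{n}}{n!}, \qquad \frac{s_{n+1}}{s_n} = \frac{x^{n+1}}{(n+1)!} \cdot \frac{n!}{x^{n}} = \frac{x}{n+1}, \qquad \text{so} \quad \lim_{n\to\infty} \frac{s_{n+1}}{s_n} = 0. \]
Therefore, the series converges for all real numbers $x$. (For $x > 0$ the ratio test applies
directly. For $x < 0$ we apply it to $|s_n| = |x|^{n}/n!$, which yields absolute convergence and
hence convergence, see Theorem 5.11. For $x = 0$ only the first term is nonzero.) In this
manner, we have shown that the definition
\[ e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!} \tag{5.10} \]
makes sense. Indeed, (5.10) constitutes the standard definition of the exponential
function on which the definitions of general exponentials and logarithms are based,
see Chap. 1.
Let us add the remark that the ratio test can also be applied if the limit in (5.9)
does not exist. In fact, if we can find an integer $N$ and a real number $\rho$ such that
\[ \frac{s_{n+1}}{s_n} \le \rho < 1 \]
holds for all $n \ge N$, then the series converges.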
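The ratio $s_{n+1}/s_n = x/(n+1)$ used above also gives a convenient way to sum the exponential series numerically, since each term follows from the previous one by a single multiplication. The following sketch assumes a simple stopping rule (terms below a relative tolerance); the function name and the tolerance are illustrative choices, not part of the text.

```python
import math

def exp_series(x: float, tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """Sum x**n / n! for n = 0, 1, 2, ... using term_(n+1) = term_n * x / (n+1)."""
    total, term, n = 0.0, 1.0, 0
    while abs(term) > tol * max(1.0, abs(total)) and n < max_terms:
        total += term
        n += 1
        term *= x / n   # ratio of consecutive terms, which tends to 0
    return total

for x in (1.0, -3.0, 10.0):
    print(x, exp_series(x), math.exp(x))
```

Because the ratio tends to 0 for every fixed x, the loop terminates after finitely many terms regardless of the sign or size of x.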



Theorem 5.9 (Root test) Let $\sum_n s_n$ be a series with $s_n \ge 0$ for all $n \ge N$, where $N$
is a given positive integer, and suppose that
\[ \lim_{n\to\infty} \sqrt[n]{s_n} = \rho. \]
Then the following assertions hold:
1. The series converges if $\rho < 1$.
2. The series diverges if $\rho > 1$ or $\rho$ is infinite.
3. The test is inconclusive if $\rho = 1$.

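An important series is the harmonic series
\[ \sum_{n=1}^{\infty} \frac{1}{n}. \tag{5.11} \]
It is divergent, as can be seen from the comparison principle (Theorem 5.6(2)), if we
choose for $\sum_n t_n$ the series
\[ 1 + \frac{1}{2} + \left( \frac{1}{4} + \frac{1}{4} \right) + \underbrace{\left( \frac{1}{8} + \cdots + \frac{1}{8} \right)}_{4 \text{ terms}} + \underbrace{\left( \frac{1}{16} + \cdots + \frac{1}{16} \right)}_{8 \text{ terms}} + \cdots. \tag{5.12} \]
Indeed, all bracketed sums have the value 1/2, so the series (5.12) diverges.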
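The grouping argument implies that the partial sum of the harmonic series up to $n = 2^k$ is at least $1 + k/2$. The sketch below (function name hypothetical) prints both quantities to illustrate how slowly, but without bound, the partial sums grow.

```python
def harmonic_partial_sum(n: int) -> float:
    """1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

# Compare S_(2**k) with the lower bound 1 + k/2 from the grouped series (5.12).
for k in range(0, 16, 3):
    n = 2 ** k
    print(n, harmonic_partial_sum(n), 1 + k / 2)
```

The lower bound is far from sharp, but it suffices for divergence because it exceeds any given number once k is large enough.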
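Example 5.6 Determine whether the following series converge or diverge:
1. $\sum_{n=1}^{\infty} \dfrac{1}{n 2^{n}}$,
2. $\sum_{n=1}^{\infty} \dfrac{2n+1}{n^2 + 2n + 1}$,
3. $\sum_{n=0}^{\infty} \dfrac{2^{n} + 5}{3^{n}}$,
4. $\sum_{n=1}^{\infty} \dfrac{2^{n}}{n^2}$.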

Solution:
1. We see that this series converges when we compare it with the series $\sum \dfrac{1}{2^{n}}$ and
apply Theorem 5.6(1) or alternatively Theorem 5.7(2). For the former note that
$0 \le 1/(n 2^{n}) \le 1/2^{n}$ and that $\sum \dfrac{1}{2^{n}}$ is a convergent geometric series, since
$r = 1/2 < 1$.
2. The series diverges by Theorem 5.7(1) when we involve the divergent series
$\sum \dfrac{1}{n}$. Let $t_n = \dfrac{1}{n}$ and $s_n = \dfrac{2n+1}{n^2 + 2n + 1}$. We have
\[ \frac{s_n}{t_n} = \frac{\frac{2n+1}{n^2+2n+1}}{\frac{1}{n}} = \frac{n^2 \left( 2 + \frac{1}{n} \right)}{n^2 \left( 1 + \frac{2}{n} + \frac{1}{n^2} \right)} = \frac{2 + \frac{1}{n}}{1 + \frac{2}{n} + \frac{1}{n^2}}, \]
and therefore
\[ \lim_{n\to\infty} \frac{s_n}{t_n} = 2, \]
since the three sequences $\frac{1}{n}$, $\frac{2}{n}$, $\frac{1}{n^2}$ converge to zero.

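3. We apply the ratio test. We have
\[ \frac{s_{n+1}}{s_n} = \frac{\frac{2^{n+1}+5}{3^{n+1}}}{\frac{2^{n}+5}{3^{n}}} = \frac{2^{n+1} + 5}{3 (2^{n} + 5)} = \frac{2 + \frac{5}{2^{n}}}{3 \left( 1 + \frac{5}{2^{n}} \right)}. \]
Since $5/2^{n} \to 0$, we obtain
\[ \lim_{n\to\infty} \frac{s_{n+1}}{s_n} = \frac{2}{3} < 1. \]
Hence, the series is convergent.
4. It follows from the root test that the series is divergent. Indeed,
\[ (s_n)^{1/n} = \left( \frac{2^{n}}{n^2} \right)^{1/n} = \frac{2}{(n^2)^{1/n}} = \frac{2}{(n^{1/n})^2}. \]
Since we know already that $n^{1/n} \to 1$ (see Example 5.2(2)), we obtain
\[ \lim_{n\to\infty} (s_n)^{1/n} = 2 > 1. \]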

Example 5.7 Let us consider the series
\[ \sum_{n=1}^{\infty} \frac{1}{n^{p}} = \frac{1}{1^{p}} + \frac{1}{2^{p}} + \frac{1}{3^{p}} + \cdots + \frac{1}{n^{p}} + \cdots, \]
where $p$ is a nonnegative real number. This series is convergent if $p > 1$ and divergent
if $0 \le p \le 1$, as we will see later in Chap. 6. For $p = 1$, the harmonic series arises,
whose divergence we have already investigated above. When we consider the sum
as a function of $p$, we obtain the famous Riemann zeta function
\[ \zeta(p) = \sum_{n=1}^{\infty} \frac{1}{n^{p}}. \tag{5.13} \]
Here, $\zeta$ is the Greek letter “zeta”. In 1734, the mathematician Euler found a way
to compute $\zeta(p)$ for even integers $p$, for example,
\[ \zeta(2) = \frac{\pi^2}{6}, \quad \zeta(4) = \frac{\pi^4}{90}, \quad \text{and} \quad \zeta(6) = \frac{\pi^6}{945}. \]
No simple formulas are known for $\zeta(p)$ when $p$ is an odd integer.
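Euler's values can be compared with partial sums of the series. The sketch below uses a hypothetical helper `zeta_partial`; it only sums finitely many terms, so agreement is limited by the size of the neglected tail (roughly $1/N$ for $p = 2$ after $N$ terms).

```python
import math

def zeta_partial(p: float, n_terms: int) -> float:
    """Partial sum 1/1**p + 1/2**p + ... + 1/n_terms**p."""
    return sum(1.0 / n ** p for n in range(1, n_terms + 1))

print(zeta_partial(2, 10_000), math.pi ** 2 / 6)
print(zeta_partial(4, 1_000), math.pi ** 4 / 90)
print(zeta_partial(1, 100_000))  # harmonic case p = 1: keeps growing with n_terms
```

The slow convergence for $p = 2$ shows why a closed form such as $\pi^2/6$ is valuable in practice.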

5.3 Alternating Series, Absolute and Conditional Convergence

A series in which the terms are alternately positive and negative is called an alternating
series. For example,
\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \tag{5.14} \]
\[ \sum_{n=1}^{\infty} (-1)^{n+1} = 1 - 1 + 1 - 1 + 1 - 1 + \cdots \tag{5.15} \]
\[ \sum_{n=1}^{\infty} (-1)^{n+1} n = 1 - 2 + 3 - 4 + 5 - 6 + \cdots \tag{5.16} \]
are alternating series. The latter two are obviously divergent (just look at the partial
sums). The first one converges, as a consequence of the following criterion.

Theorem 5.10 (Leibniz test for alternating series) The series
$\sum_{n=1}^{\infty} (-1)^{n+1} s_n = s_1 - s_2 + s_3 - s_4 + \cdots$ converges if all three of the following
conditions are satisfied:
1. The terms $s_n$ are nonnegative for all $n \ge N$, where $N$ is a fixed integer.
2. $s_n \ge s_{n+1}$ for all $n \ge N$.
3. $s_n \to 0$.
Moreover, in this case the sum $S$ satisfies $|S - S_n| \le s_{n+1}$ for all $n \ge N$, where $S_n$
is the n-th partial sum.
Indeed, setting sn = 1/n in the theorem we see that the series (5.14) converges.
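The error estimate of the Leibniz test can be observed for the series (5.14). The sketch below relies on the known fact, not derived in this section, that the sum of (5.14) equals $\ln 2$; the function name is hypothetical.

```python
import math

def alternating_harmonic_partial(n: int) -> float:
    """S_n = 1 - 1/2 + 1/3 - ... + (-1)**(n+1) / n."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

# The distance to the sum ln 2 should not exceed the first omitted term 1/(n+1).
for n in (1, 10, 100, 1000):
    error = abs(math.log(2) - alternating_harmonic_partial(n))
    print(n, error, 1 / (n + 1))
```

The observed error is about half of the bound, which reflects that consecutive partial sums bracket the limit from above and below.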

Definition 5.3 A series $\sum_n s_n$ converges absolutely (is absolutely convergent) if the
corresponding series of absolute values, $\sum_n |s_n|$, converges. A series that converges
but does not converge absolutely is called conditionally convergent.

Since the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges (as we have seen in the previous section),
the series (5.14) furnishes an example of a conditionally convergent series.
Theorem 5.11 Every absolutely convergent series is convergent.
Example 5.8 1. We show that the series $\sum_{n=1}^{\infty} (-1)^{n+1} \dfrac{1}{n^2}$ is absolutely convergent,
and hence convergent by Theorem 5.11. Indeed, we have
\[ s_n = (-1)^{n+1} \frac{1}{n^2}, \qquad |s_n| = \frac{1}{n^2}, \]
and the series $\sum_{n=1}^{\infty} \dfrac{1}{n^2}$ is convergent according to Example 5.7.
2. We show that the series $\sum_{n=1}^{\infty} \dfrac{\sin nx}{n^2}$ is absolutely convergent. Indeed, we have
\[ s_n = \frac{\sin nx}{n^2}, \qquad |s_n| = \frac{|\sin nx|}{n^2} \le \frac{1}{n^2}. \]
Again, the series $\sum_{n=1}^{\infty} \dfrac{1}{n^2}$ is convergent according to Example 5.7. Now, the comparison
principle (Theorem 5.6) implies that $\sum_{n=1}^{\infty} |s_n|$ is convergent.

Theorem 5.12 (Rearrangement of absolutely convergent series) If $\sum_{n=1}^{\infty} s_n$ converges
absolutely and $\{t_n\}$ is any rearrangement of the terms $s_n$, then $\sum_{n=1}^{\infty} t_n$ converges
absolutely, and
\[ \sum_{n=1}^{\infty} t_n = \sum_{n=1}^{\infty} s_n. \]

5.4 Power Series

Power series are a special type of infinite series. We will see that in an interval where
a power series is convergent, the sum of the power series is a continuous function
with derivatives of all orders. We will also examine the opposite question, that is,
whether a function f = f (x), which has derivatives of all orders on an interval I ,
can be expressed in the form of a power series.
Definition 5.4 (Power series) An infinite series of the form
\[ \sum_{n=0}^{\infty} c_n x^{n} = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^{n} + \cdots \]
is called a power series centered at $x = 0$.
An infinite series of the form
\[ \sum_{n=0}^{\infty} c_n (x-a)^{n} = c_0 + c_1 (x-a) + c_2 (x-a)^2 + \cdots + c_n (x-a)^{n} + \cdots \]
is called a power series centered at $x = a$. Its n-th term is $c_n (x-a)^{n}$, the number $a$
is the center.
Examples of power series. Our first example is the geometric series
\[ \sum_{n=0}^{\infty} x^{n} = 1 + x + x^2 + x^3 + \cdots \tag{5.17} \]
We have seen in Sect. 5.2 that it converges for $|x| < 1$ and has the sum
\[ \sum_{n=0}^{\infty} x^{n} = \frac{1}{1-x}. \tag{5.18} \]
We now look at this formula the other way round and see that the function $f(x) =
1/(1-x)$ is expressed in form of the power series (5.17), as long as $|x| < 1$.
Other examples of functions expressed by power series are the exponential, sine
and cosine functions
\[ e^{x} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \tag{5.19} \]
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \tag{5.20} \]
\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots \tag{5.21} \]
That these formulas are indeed correct will result from the following exposition up
to Example 5.9. Moreover, the binomial series
\[ (1+x)^{p} = 1 + px + \frac{p(p-1)}{2!} x^2 + \frac{p(p-1)(p-2)}{3!} x^3 + \cdots \tag{5.22} \]
provides another example of a power series centered at 0.
Radius of convergence. We already know that the exponential series (5.19) converges
for all real numbers $x$, while the geometric series (5.18) converges for $|x| < 1$
and diverges for $|x| > 1$. Such a behavior is not coincidental. It turns out that for
every power series $\sum_{n=0}^{\infty} c_n (x-a)^{n}$ there is a nonnegative real number $R$ (possibly
$R = \infty$) such that the power series converges if $|x-a| < R$ and diverges if
$|x-a| > R$. The case $R = \infty$ means that the power series converges for all $x$, the
case $R = 0$ means that the series converges only for $x = a$ (which is trivial, since
then its sum equals $c_0$ in any case). If $R > 0$ is finite, the series may or may not
converge at either of the points $x = a - R$ or $x = a + R$.
Definition 5.5 The number R as defined above is called the radius of conver-
gence, and the interval (a − R, a + R) is called the interval of convergence of
the power series.
Often, but not always, the radius of convergence can be determined by a variant (due
to Euler) of the ratio test. Namely,
\[ \text{if} \quad \rho = \lim_{n\to\infty} \left| \frac{c_{n+1}}{c_n} \right| \quad \text{exists, then} \quad R = \frac{1}{\rho}, \tag{5.23} \]
where $R = \infty$ if $\rho = 0$. Using this criterion, one can check that the power series for the
sine and cosine functions given above have convergence radius $R = \infty$, while the binomial
series (5.22) has $R = 1$.
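The criterion (5.23) can be explored numerically by evaluating the coefficient ratio for large n. The sketch below is illustrative only: the helper names are hypothetical, and the ratio $|c_{n+1}/c_n| = |p - n|/(n + 1)$ for the binomial series follows from the coefficients in (5.22).

```python
def binomial_coeff_ratio(p: float, n: int) -> float:
    """|c_(n+1) / c_n| for the binomial series (5.22), equal to |p - n| / (n + 1)."""
    return abs(p - n) / (n + 1)

def exponential_coeff_ratio(n: int) -> float:
    """|c_(n+1) / c_n| for c_n = 1/n!, equal to 1 / (n + 1)."""
    return 1.0 / (n + 1)

# Estimate R = 1/rho by evaluating the ratio at increasingly large n.
for n in (10, 1_000, 1_000_000):
    print(n, 1.0 / binomial_coeff_ratio(-0.5, n), 1.0 / exponential_coeff_ratio(n))
```

For the binomial series the estimate settles near 1, while for the exponential series it grows without bound, in line with $R = \infty$.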

Let us remark that the domain of definition of a function should not be confused
with the radius of convergence of an associated power series. For example, the
function $f(x) = 1/(1-x)$ is defined everywhere except at its pole $x = 1$, but its
power series (5.17) centered at 0 has $R = 1$, so it does not represent $f$ for $|x| > 1$,
that is, beyond the distance of the pole as seen from the center $x = 0$.
Theorem 5.13 (Derivative of power series) Any function expressed as a power
series,
\[ f(x) = \sum_{n=0}^{\infty} c_n (x-a)^{n}, \tag{5.24} \]
is differentiable (and hence, continuous) within its interval of convergence
$(a-R, a+R)$. Its derivative is obtained term by term,
\[ f'(x) = \sum_{n=1}^{\infty} n c_n (x-a)^{n-1}, \tag{5.25} \]
and the interval of convergence of (5.25) is again $(a-R, a+R)$.


As an example, for the exponential function we get from (5.19)
\[ e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}, \qquad \frac{d}{dx} e^{x} = \sum_{n=1}^{\infty} n \frac{x^{n-1}}{n!} = \sum_{n=1}^{\infty} \frac{x^{n-1}}{(n-1)!} = e^{x}, \]
an alternative way to compute its derivative. In an analogous manner, one may check
that $\sin' = \cos$ and $\cos' = -\sin$.
Since by Theorem 5.13, the derivative of a power series has the same interval of
convergence as the original series, we may repeat this process to obtain the power
series for the second derivative,
\[ f''(x) = \sum_{n=2}^{\infty} n(n-1) c_n (x-a)^{n-2}, \tag{5.26} \]
and so on for derivatives of arbitrary order. Inserting $x = a$ into the formulas (5.24)–(5.26),
we get
\[ f(a) = c_0, \qquad f'(a) = c_1, \qquad f''(a) = 2 c_2. \]

Theorem 5.14 Any function expressed as a power series,
\[ f(x) = \sum_{n=0}^{\infty} c_n (x-a)^{n}, \tag{5.27} \]
possesses derivatives $f^{(k)}$ of all orders $k$ valid within its interval of convergence.
Moreover, we have
\[ f^{(k)}(a) = k! \, c_k. \tag{5.28} \]

Taylor and Maclaurin Series. We have just seen that, within its interval of convergence,
a power series defines a function which has derivatives of all orders. Now
we reverse this procedure. Let $f$ be a function which has derivatives of all orders
on an interval containing $a$ as an interior point. Keeping in mind formula (5.28), we
associate with it the power series
\[ \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x-a)^{k} = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!} (x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!} (x-a)^{n} + \cdots \tag{5.29} \]

Definition 5.6 The power series (5.29) is called the Taylor series for $f$ at $a$. In the
special case $a = 0$, it takes the form
\[ \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^{k} = f(0) + f'(0) x + \frac{f''(0)}{2!} x^2 + \cdots + \frac{f^{(n)}(0)}{n!} x^{n} + \cdots \tag{5.30} \]
and is also called the Maclaurin series for $f$.


For an arbitrary point $x$ in the domain of $f$ with given $a \in D(f)$ there are three
mutually exclusive possibilities:
1. The Taylor series of $f$ in $a$ converges at $x$, and we have
\[ f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x-a)^{k}. \tag{5.31} \]
2. The Taylor series of $f$ in $a$ converges at $x$, but (5.31) does not hold.
3. The Taylor series of $f$ in $a$ diverges at $x$.
If the first case applies to all $x$ in some interval $I$ containing $a$, we say that the Taylor
series for $f$ at $a$ converges to $f$ on $I$. This situation arises when $f$ can be expressed
by a power series with interval of convergence $I$. (Then formula (5.31) says that the
Taylor series for $f$ is given by this power series.) In general, we may characterize
the first case with the aid of the n-th partial sum of the Taylor series,
\[
\begin{aligned}
P_n(x) &= \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x-a)^{k} \\
&= f(a) + f'(a)(x-a) + \frac{f''(a)}{2!} (x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!} (x-a)^{n}.
\end{aligned}
\tag{5.32}
\]
Viewed as a function of $x$, $P_n$ is a polynomial of order $n$; it is called the Taylor
polynomial of order $n$ for $f$ at $a$. We interpret $P_n(x)$ as an approximation for $f(x)$
and define the remainder
\[ R_n(x) = f(x) - P_n(x). \tag{5.33} \]
Obviously, (5.31) holds at a given $x$ if and only if $\lim_{n\to\infty} R_n(x) = 0$.


Theorem 5.15 Let $f$ be differentiable up to order $n+1$ in an open interval $I$ containing
a point $a$. Then for each $x$ in $I$, there exists a number $c$ between $x$ and $a$, that
is, $x < c < a$ or $a < c < x$, such that
\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x-a)^{k} + R_n(x) = P_n(x) + R_n(x) \tag{5.34} \]
holds with
\[ R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!} (x-a)^{n+1}. \tag{5.35} \]

We present a particular situation where we can ensure that the remainder term con-
verges to 0.
Theorem 5.16 Let $f$ be differentiable to all orders $n$ in an open interval $I$ containing
a point $a$. If there are positive constants $M$ and $r$ such that
\[ |f^{(n+1)}(t)| \le M r^{n+1} \quad \text{for all } t \in I, \tag{5.36} \]
then $\lim_{n\to\infty} R_n(x) = 0$ for all $x \in I$, and consequently the Taylor series of $f$ in $a$
converges to $f$ on $I$.

Proof By Theorem 5.15 and (5.36), we have
\[ |R_n(x)| \le M r^{n+1} \frac{|x-a|^{n+1}}{(n+1)!} \quad \text{for all } x \in I. \]
The assertion follows since $\lim_{n\to\infty} t^{n}/n! = 0$ for all numbers $t$.

Example 5.9 Find the Maclaurin series for $\sin x$, $\cos x$, and $e^{x}$ and show that they
converge to those functions.

Solution: For the sine function, we have $f(x) = \sin x$, $f'(x) = \cos x$, $f''(x) = -\sin x$,
$f'''(x) = -\cos x$, and so on. We see that $0 = f(0) = f''(0) = f^{(4)}(0) = \cdots$, and that
$f'(0) = 1$, $f'''(0) = -1$, $f^{(5)}(0) = 1$, and so on. Therefore, the Maclaurin
series of $f(x) = \sin x$ becomes
\[ f(0) + f'(0) x + \frac{f''(0)}{2!} x^2 + \cdots = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \]
In an analogous way, one obtains the Maclaurin series for $\cos x$ as
\[ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots. \]
For the exponential function $f(x) = e^{x}$, we have $f^{(n)}(x) = f(x) = e^{x}$ for all $n$, and
hence $f^{(n)}(0) = e^{0} = 1$ for all $n$, and the Maclaurin series becomes
\[
\begin{aligned}
f(0) + f'(0) x + \frac{f''(0)}{2!} x^2 + \cdots + \frac{f^{(n)}(0)}{n!} x^{n} + \cdots
&= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^{n}}{n!} + \cdots \\
&= \sum_{k=0}^{\infty} \frac{x^{k}}{k!}.
\end{aligned}
\]
In all three cases, one may check that condition (5.36) holds on every bounded
interval $I$, with $r = 1$ and a suitable $M$. Therefore, the Maclaurin series converges
and yields a power series expansion of the sine, cosine, and the exponential function,
respectively. We thus have proved that formulas (5.19)–(5.21) are correct.
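For the sine function, condition (5.36) holds with $M = r = 1$, so Theorem 5.15 gives $|R_n(x)| \le |x|^{n+1}/(n+1)!$. The sketch below (function names hypothetical) evaluates the Maclaurin polynomial $P_n$ and compares the actual error with this bound.

```python
import math

def sin_taylor(x: float, n: int) -> float:
    """Maclaurin polynomial P_n(x) of sin: all terms of degree <= n."""
    total = 0.0
    for k in range(1, n + 1, 2):                  # only odd degrees contribute
        sign = -1.0 if (k // 2) % 2 else 1.0      # +, -, +, - for k = 1, 3, 5, 7
        total += sign * x ** k / math.factorial(k)
    return total

def remainder_bound(x: float, n: int) -> float:
    """Bound |x|**(n+1) / (n+1)! obtained from (5.35) with |f^(n+1)| <= 1."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)

for n in (1, 3, 5, 9):
    x = 2.0
    print(n, abs(math.sin(x) - sin_taylor(x, n)), remainder_bound(x, n))
```

The bound decreases factorially in n for any fixed x, which is the mechanism behind the convergence statement of Theorem 5.16.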
Taylor series can be used to compute approximations to unknown quantities. We
present a classical example from mechanics. It involves the notion and calculus of
integrals treated in Chap. 6, so the student may want to come back to this example
after having studied that chapter.

Example 5.10 A simple pendulum consists of a mass attached to the end of a weightless
rod of length $L$, the other end of which is fixed. Its position is characterized by
the angle it makes with the vertical axis. Suppose it is held initially at an angle $\alpha$ and
then released from rest. It can be shown as a consequence of the laws of mechanics
that, in the absence of friction, the time $T$ it takes for the pendulum to perform one
complete swing back and forth (called the period of the pendulum) is given by
\[ T = \sqrt{\frac{8L}{g}} \int_0^{\alpha} \frac{1}{\sqrt{\cos\theta - \cos\alpha}} \, d\theta. \]
Putting $k = \sin(\alpha/2)$, $\cos\theta = 1 - 2\sin^2(\theta/2)$, and $\cos\alpha = 1 - 2\sin^2(\alpha/2)$ and making
the substitution $\sin\varphi = \dfrac{\sin(\theta/2)}{\sin(\alpha/2)} = \dfrac{1}{k} \sin(\theta/2)$, we obtain the integral
\[ T = 4 \sqrt{\frac{L}{g}} \int_0^{\pi/2} \frac{1}{\sqrt{1 - k^2 \sin^2\varphi}} \, d\varphi. \tag{5.37} \]
Such an integral is called a complete elliptic integral of the first kind.

We are interested in the function $T = T(k)$, that is, the dependence of the period on
the initial angular displacement. We compute the Maclaurin series for the integrand
by expanding $(1 - k^2 \sin^2\varphi)^{-1/2}$ as a binomial series, according to (5.22) with
$x = -k^2 \sin^2\varphi$ and $p = -1/2$. Then
\[ T(k) = 4 \sqrt{\frac{L}{g}} \int_0^{\pi/2} \left( 1 + \frac{1}{2} k^2 \sin^2\varphi + \frac{1 \cdot 3}{2^2 \, 2!} k^4 \sin^4\varphi + \frac{1 \cdot 3 \cdot 5}{2^3 \, 3!} k^6 \sin^6\varphi + \cdots \right) d\varphi. \tag{5.38} \]
If we integrate term by term, we get a power series w.r.t. the variable $k$. One can show
that this power series is indeed the Maclaurin series for $T = T(k)$ which moreover
converges to the period $T$. For small initial displacements, $k$ is small, and it turns
out that few terms of the series suffice in order to yield a good approximation of
$T$. If we consider only the first (constant) term, we obtain the so-called first-order
approximation of $T$ which gives
\[ T = 2\pi \sqrt{\frac{L}{g}}. \]
If we moreover take into account the next term in the Maclaurin series, we obtain
the second-order approximation
\[ T = 2\pi \sqrt{\frac{L}{g}} \left( 1 + \frac{k^2}{4} \right). \]
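The quality of the two approximations can be judged by evaluating the integral (5.37) numerically. The sketch below is a rough illustration under stated assumptions: it uses a plain midpoint rule (a choice made here, not a method from the text), and the function names and sample values $L = 1$ m, $g = 9.81$ m/s$^2$ are hypothetical.

```python
import math

def period_numeric(L: float, g: float, alpha: float, steps: int = 20_000) -> float:
    """Approximate (5.37) with the midpoint rule on [0, pi/2]."""
    k = math.sin(alpha / 2)
    h = (math.pi / 2) / steps
    integral = sum(
        h / math.sqrt(1 - (k * math.sin((i + 0.5) * h)) ** 2) for i in range(steps)
    )
    return 4 * math.sqrt(L / g) * integral

def period_first_order(L: float, g: float) -> float:
    """T = 2*pi*sqrt(L/g), independent of the initial angle."""
    return 2 * math.pi * math.sqrt(L / g)

def period_second_order(L: float, g: float, alpha: float) -> float:
    """T = 2*pi*sqrt(L/g) * (1 + k**2/4) with k = sin(alpha/2)."""
    k = math.sin(alpha / 2)
    return period_first_order(L, g) * (1 + k ** 2 / 4)

L, g = 1.0, 9.81            # sample values, assumed for illustration
for alpha in (0.1, 0.3, 1.0):
    print(alpha, period_first_order(L, g), period_second_order(L, g, alpha),
          period_numeric(L, g, alpha))
```

For small angles the three values nearly coincide, while for larger angles the first-order value visibly underestimates the period.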

5.5 Exercises

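5.5.1 Determine the following limits of sequences:
(a) $\lim_{n\to\infty} (3n)^{1/n}$,
(b) $\lim_{n\to\infty} (-1/2)^{n}$,
(c) $\lim_{n\to\infty} \left( \dfrac{n-2}{n} \right)^{2}$,
(d) $\lim_{n\to\infty} \dfrac{100^{n}}{n!}$,
(e) $\lim_{n\to\infty} \dfrac{\ln n^2}{n}$.

5.5.2 Show that $\sum_{k=1}^{\infty} \left( \dfrac{1}{k} - \dfrac{1}{k+2} \right) = \dfrac{3}{2}$.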

5.5.3 In Exercises (a)–(d), check that the series is a geometric series and find its sum
if it converges. In Exercises (e) and (f), find a formula for the nth partial sum
of each series and use it to find the sum of the series if the latter converges.
(a) $2 + \dfrac{2}{3} + \dfrac{2}{9} + \dfrac{2}{27} + \cdots + \dfrac{2}{3^{n-1}} + \cdots$
(b) $\dfrac{9}{100} + \dfrac{9}{100^2} + \dfrac{9}{100^3} + \cdots + \dfrac{9}{100^{n}} + \cdots$
(c) $1 - \dfrac{1}{2} + \dfrac{1}{4} - \dfrac{1}{8} + \cdots + (-1)^{n-1} \dfrac{1}{2^{n-1}} + \cdots$
(d) $1 - 2 + 4 - 8 + \cdots + (-1)^{n-1} 2^{n-1} + \cdots$
(e) $\dfrac{1}{2 \cdot 3} + \dfrac{1}{3 \cdot 4} + \dfrac{1}{4 \cdot 5} + \cdots + \dfrac{1}{(n+1)(n+2)} + \cdots$
(f) $\dfrac{5}{1 \cdot 2} + \dfrac{5}{2 \cdot 3} + \dfrac{5}{3 \cdot 4} + \cdots + \dfrac{5}{n(n+1)} + \cdots$
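5.5.4 Write out the first few terms of the following series to show how the series
starts. Then find the sum of the series.
(a) $\sum_{n=0}^{\infty} \dfrac{(-1)^{n}}{4^{n}}$,
(b) $\sum_{n=1}^{\infty} \dfrac{7}{4^{n}}$,
(c) $\sum_{n=1}^{\infty} \left( \dfrac{5}{2^{n}} + \dfrac{1}{3^{n}} \right)$,
(d) $\sum_{n=0}^{\infty} \left( \dfrac{5}{2^{n}} - \dfrac{1}{3^{n}} \right)$,
(e) $\sum_{n=0}^{\infty} \left( \dfrac{1}{2^{n}} + \dfrac{(-1)^{n}}{5^{n}} \right)$,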
(f) $\sum_{n=0}^{\infty} \left( \dfrac{5^{n+1}}{2^{n}} \right)$.
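5.5.5 Which of the following series converge, and which diverge? Give reasons for
your answer. If a series converges, find its sum.
(a) $\sum_{n=0}^{\infty} \left( \dfrac{1}{\sqrt{2}} \right)^{n}$,
(b) $\sum_{n=0}^{\infty} (\sqrt{2})^{n}$,
(c) $\sum_{n=1}^{\infty} (-1)^{n+1} \dfrac{3}{2^{n}}$,
(d) $\sum_{n=0}^{\infty} \dfrac{\cos n\pi}{5^{n}}$,
(e) $\sum_{n=0}^{\infty} e^{-2n}$,
(f) $\sum_{n=1}^{\infty} \ln \dfrac{1}{n}$.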
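5.5.6 Discuss the convergence of the series
(a) $\sum \dfrac{1}{n^{p}}$, $0 < p \le 1$,
(b) $\sum \dfrac{1}{n^{p}}$, $p > 1$.
5.5.7 Test for convergence of the series
(a) $\sum_{n=0}^{\infty} \dfrac{1}{\sqrt{n} + \sqrt{n+1}}$,
(b) $\sum_{n=0}^{\infty} \dfrac{1}{x^{n} + x^{-n}}$.
5.5.8 Test for convergence of the series
(a) $\dfrac{1}{2\sqrt{1}} + \dfrac{x^2}{3\sqrt{2}} + \dfrac{x^4}{4\sqrt{3}} + \cdots$
(b) $1 + \dfrac{2}{5} x + \dfrac{6}{9} x^2 + \dfrac{14}{17} x^3 + \cdots + \dfrac{(2^{n} - 2) x^{n-1}}{2^{n} + 1} + \cdots$, $x > 0$
5.5.9 Test for convergence of the series $\sum \dfrac{4 \cdot 7 \cdots (3n+1)}{1 \cdot 2 \cdots n} x^{n}$.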
5.5.10 Test for convergence of the series
(i) $\sum_{n=1}^{\infty} \dfrac{n^2}{2^{n}}$,
(ii) $\sum_{n=1}^{\infty} \left( \dfrac{1}{1+n} \right)^{n}$.
5.5.11 Test for convergence of the series $\sum s_n$, where
\[ s_n = \begin{cases} \dfrac{n}{2^{n}}, & n \text{ is odd}, \\ \dfrac{1}{2^{n}}, & n \text{ is even}. \end{cases} \]
5.5.12 Find the Taylor series for $\dfrac{1}{1+x}$ at $a = 2$.
5.5.13 Compute the value of $e$ with an error of less than $10^{-6}$, using the series
expansion of the exponential function.

Fig. 5.3 Planning a highway

5.5.14 In planning a highway across a desert, a surveyor must make compensations
for the curvature of the earth when measuring differences in elevation, see
Fig. 5.3.
a. If $s$ is the length of highway and $R$ is the radius of the earth, show that the
correction $C$ is given by $C = R[\sec(s/R) - 1]$.
b. Use the Maclaurin series for $\sec x$ to show that $C$ is approximately
$\dfrac{s^2}{2R} + \dfrac{5 s^4}{24 R^3}$.
Chapter 6
Integration

6.1 Introduction

In Chap. 1, we have seen how mathematical functions model real processes. The
differential calculus deals with finding the rate of change of a function (thus, of a
process). We have studied this subject in Chaps. 3 and 4. But this part reveals only half
of the story. It is good to know, at any given instant, the rate of change of a function,
but it is of additional interest and use if we could describe how such instantaneous
changes accumulate over an interval to produce the original (given) function. In other
words, we are interested in the process of studying how a change in behavior will
yield behavior itself. The process we are looking for was discovered by Newton and
Leibniz, it is called integration. Its precursors in human history date back to the
early civilizations in China, India, Egypt, Mesopotamia, and Greece, related to the
determination of length, area, and volume. In the meantime, however, the notion of
an integral has by far surpassed these origins and nowadays constitutes a fundamental
and general concept which permeates almost all areas where mathematics is applied.
The main goal of this chapter is to present those basic results of the integral
calculus which are essential for its use in engineering, science, and economics to
solve real-world problems. We introduce the definite integral as the limit of a sum of
certain quantities, motivated by the classical problem of area computation. This is the
theme of Sect. 6.2. In Sect. 6.3, we present the indefinite integral as the process inverse
to differentiation, that is, given a function f , we want to find a function F (called the
antiderivative) whose derivative F′ is equal to f . Sections 6.4 and 6.5 are devoted
to the method of substitution and of partial integration, powerful tools to transform
and compute integrals. The Fundamental Theorem of Calculus, which exhibits that
differentiation and integration are inverse to each other, is presented in Sect. 6.6,
together with the calculus for definite integrals. In Sects. 6.7–6.10, derivatives and
integrals are discussed for various classes of elementary functions.

© Springer Nature Singapore Pte Ltd. 2019 153


M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_6

6.2 Integral and Area

Let $f(x)$ be a nonnegative function defined on a closed interval $[a, b]$. We want to
find the area of the region “below the graph of $f$”, that is, of the region enclosed by the
graph of $f$, the vertical lines $x = a$ and $x = b$, and the x-axis, shown as the shaded
region in Fig. 6.1. If $f$ is a constant function, say $f(x) = c$, the region in question
becomes a rectangle whose area equals $c(b-a)$. Next, consider a subdivision (also
called a partition) $a = x_0 < x_1 < x_2 < \cdots < x_n = b$ of the interval $[a, b]$. This
partition divides $[a, b]$ into $n$ subintervals $[x_{k-1}, x_k]$, $k = 1, 2, \ldots, n$. Let $s$ be a step
function for this partition, that is, a function which has a constant value $s(x) = c_k$ on
each open subinterval $(x_{k-1}, x_k)$, see Sect. 1.3. If $c_k \ge 0$ holds for all those values,
the area of the region below the graph of $s$ equals the sum $\sum_{k=1}^{n} c_k (x_k - x_{k-1})$ of the
areas of the rectangles below $s$ bounded by the verticals $x = x_{k-1}$ and $x = x_k$ from
left and right.
Let now $f$ be an arbitrary nonnegative function. We may approximate the area of
the region below its graph by the sum
\[ \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}) \tag{6.1} \]
of the areas of the rectangles $\{(x, y) : x_{k-1} \le x \le x_k, \ 0 \le y \le f(\xi_k)\}$. Here, the
points $\xi_k$ on the x-axis are arbitrarily chosen between $x_{k-1}$ and $x_k$. We expect that the
smaller the subintervals become (taking $n$ larger and larger), the more accurate will
be the estimate of the shaded area in Fig. 6.1. The exact value $A$ of the area arises
“in the limit of smaller and smaller partitions”,
\[ A = \lim \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}). \tag{6.2} \]

Fig. 6.1 Area and partition

When this limit $A$ exists, it is called the Riemann integral or definite integral of $f$
over $[a, b]$ and is denoted by
\[ \int_a^b f(x) \, dx, \tag{6.3} \]
and the function $f$ is called Riemann integrable or simply integrable. The approximate
sums in (6.2) are called Riemannian sums. Thus, the Riemann integral is
defined as the limit of Riemannian sums. For nonnegative functions $f$, it gives the
area below $y = f(x)$ for $a \le x \le b$. The expression (6.3) is read as “the integral
from a to b of f of x dee x” or sometimes as “the integral from a to b of f of x with
respect to x”. The number $a$ is called the lower limit of integration, the number $b$ the
upper limit, and $x$ is called the integration variable.
Several remarks are in order.
1. The limit procedure in (6.2) is more complicated than the ones previously encoun-
tered (limits of functions and sequences), because it involves partitions. A precise
definition is given in Appendix D.
2. The letter used for the integration variable is completely arbitrary. In place of
$\int_a^b f(x)\,dx$, we may write $\int_a^b f(t)\,dt$ or $\int_a^b f(u)\,du$ or $\int_a^b f(y)\,dy$.
3. The value of the integral only depends on the function f , the lower limit a and
the upper limit b.
So far, we have only considered nonnegative functions f, because we wanted to
emphasize the close connection between the notion of an integral and the computation
of areas. In fact, the Riemann sums (6.1) and the Riemann integral (6.3), if
the limit (6.2) exists, are defined for arbitrary functions f : [a, b] → R, no matter
whether the values of f are positive, negative, or both. As a consequence, the value
of the integral may become negative. How it is still related to areas will be
discussed in Chap. 7.
Remark 6.1 The calculus of integrals was developed independently by Newton and
Leibniz, as part of the infinitesimal calculus. They understood the relations and
analogies between sums and integrals and between differences and derivatives. The
symbol $\int$ was invented by Leibniz to represent the integral. It is believed to be
a stretched out “S”, from the Latin word “summa” for sum. This summarizes the
whole construction: Sum approaches integral, $\Sigma$ approaches $\int$, and rectangular area
approaches curved area. The notation

\[ \int_a^b f(x)\,dx \]

has to be understood from this historical context: originally, the symbol “dx” had
been interpreted as the length of the base of an “infinitesimally small” rectangle with height
f(x), and the integral was seen as an infinite sum of such rectangles. Nowadays (that
is, since the end of the nineteenth century), this interpretation has been replaced in
mathematics by the notion of the limit as it is used throughout this book.

Fig. 6.2 Area of a trapezoid

Example 6.1 Express the area under the graph of the function y = f (x) = x on
[a, b], 0 < a < b, in the form of a definite integral and evaluate its value.

Solution: The shaded region in Fig. 6.2 is a trapezoid with height b − a and bases a
and b. Its area A equals the value of the corresponding definite integral, so we have

\[ \int_a^b x\,dx = A = \frac{a+b}{2}\,(b-a) = \frac{b^2}{2} - \frac{a^2}{2} , \]

as we know from elementary geometry. (Soon, we will be able to evaluate the integral
without recourse to geometrical considerations.)
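
The limit in (6.2) can also be explored numerically. The following Python sketch is an illustration added here, not part of the original text; it assumes a uniform partition, and the helper name `riemann_sum` and the sample interval [1, 3] are chosen only for this example. It evaluates the sums (6.1) for f(x) = x and compares them with the trapezoid area from Example 6.1.

```python
def riemann_sum(f, a, b, n, rule="mid"):
    """Riemann sum (6.1) on a uniform partition of [a, b] with n subintervals.

    rule selects the point xi_k in [x_{k-1}, x_k]: "left", "right" or "mid".
    """
    h = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        x_left = a + (k - 1) * h
        x_right = a + k * h
        if rule == "left":
            xi = x_left
        elif rule == "right":
            xi = x_right
        else:
            xi = 0.5 * (x_left + x_right)
        total += f(xi) * (x_right - x_left)
    return total


a, b = 1.0, 3.0
exact = b**2 / 2 - a**2 / 2  # value obtained geometrically in Example 6.1
for n in (10, 100, 1000):
    approx = riemann_sum(lambda x: x, a, b, n, rule="left")
    print(n, approx, exact - approx)
```

For this particular f, the left-endpoint sums underestimate the area by (b − a)²/(2n), so the printed gap shrinks as n grows, while the midpoint choice already reproduces the trapezoid area up to rounding.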

6.3 Antiderivatives and Rules of Integration

Definition 6.1 A function F is called an antiderivative or a primitive of the function
f on an interval I if $F'(x) = f(x)$ for all x ∈ I.

Example 6.2 1. Show that $\frac{2}{3}x^{3/2}$ is an antiderivative of $\sqrt{x}$.
2. Let $F(x) = \frac{1}{3}x^3 - 5x^2 + x - 1$. Show that F is an antiderivative of $f(x) = x^2 - 10x + 1$.
3. Let $F(x) = \frac{1}{n+1}x^{n+1}$ and $f(x) = x^n$, where $n \ne -1$. Show that F is an antiderivative of f.

Solution:
1. Let $F(x) = \frac{2}{3}x^{3/2}$, then $F'(x) = \frac{2}{3}\cdot\frac{3}{2}\,x^{1/2} = x^{1/2}$. Hence by Definition 6.1,
$F(x) = \frac{2}{3}x^{3/2}$ is an antiderivative of $f(x) = x^{1/2} = \sqrt{x}$.

2. In view of Definition 6.1, we must show that $F'(x) = f(x)$. We have $F'(x) =
x^2 - 10x + 1$, which is equal to f(x).
3. We have to show that $F'(x) = f(x)$. We have

\[ F'(x) = \frac{1}{n+1}\cdot(n+1)\,x^n = x^n = f(x). \]
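
A claim of the form “F is an antiderivative of f” can be spot-checked numerically by comparing a difference quotient of F with f. The sketch below is an added illustration under stated assumptions: the helper names `derivative` and `max_gap`, the step size h, the sample points and the exponent n = 5 are all choices made here, and a small gap at a few points supports the claim but does not prove it.

```python
import math


def derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


def max_gap(F, f, points):
    """Largest deviation |F'(x) - f(x)| over the sample points."""
    return max(abs(derivative(F, x) - f(x)) for x in points)


n = 5  # sample exponent for part 3; any n != -1 could be used
cases = [
    # (candidate antiderivative F, function f, sample points)
    (lambda x: 2 / 3 * x**1.5, math.sqrt, (0.5, 1.0, 4.0)),
    (lambda x: x**3 / 3 - 5 * x**2 + x - 1,
     lambda x: x**2 - 10 * x + 1, (-2.0, 0.0, 3.0)),
    (lambda x: x ** (n + 1) / (n + 1), lambda x: x**n, (-1.0, 0.5, 2.0)),
]

for F, f, points in cases:
    print(max_gap(F, f, points))
```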

Theorem 6.1 Let F be an antiderivative of f on an interval I. Then any antiderivative
G of f has the form G = F + C for some constant C, that is,

G(x) = F(x) + C , for all x ∈ I.

Proof Since F and G both are antiderivatives of f, we have $F'(x) = f(x)$ and
$G'(x) = f(x)$ for all x ∈ I. This implies that the derivatives of F and G are
identically equal. By Remark 4.1 we find that F and G differ by a constant C. Hence
G = F + C.

Definition 6.2 (Indefinite Integral) For any given function f defined on an interval
I, the indefinite integral

\[ \int f(x)\,dx \]

is defined as the family of all antiderivatives of f, that is, of all functions of the form

F(x) + C ,

where $F' = f$ on I, and C is an arbitrary constant.

To express the statement above, one traditionally writes the single formula

\[ \int f(x)\,dx = F(x) + C . \tag{6.4} \]

For example, the solution of Example 6.2(2) can now be written as

\[ \int (x^2 - 10x + 1)\,dx = \frac{1}{3}x^3 - 5x^2 + x + C . \]

Note, however, that the letter “x” is used in two different meanings in (6.4). On the
right-hand side, it denotes the argument of the function F + C, while on the left-hand
side it denotes the integration variable.
In addition, let us point out that a definite integral is a number, while an indefinite
integral is a collection of functions (differing only by a constant).
Constant multiples, sums, and differences. Let F and G be antiderivatives of f
and g respectively, and let c be a constant. Since the formulas

\[ (cF)'(x) = cF'(x) = c f(x) , \qquad (F + G)'(x) = F'(x) + G'(x) = f(x) + g(x) \]

hold by the rules of differentiation, cF is an antiderivative of cf, and F + G is an
antiderivative of f + g. Hence, the indefinite integral satisfies

\[ \int c f(x)\,dx = c \int f(x)\,dx \tag{6.5} \]

(in particular, for c = −1 we obtain $\int -f(x)\,dx = -\int f(x)\,dx$), and

\[ \int [f(x) \pm g(x)]\,dx = \int f(x)\,dx \pm \int g(x)\,dx . \tag{6.6} \]

These rules can be combined into the property of linearity, namely,

\[ \int [\alpha f(x) + \beta g(x)]\,dx = \alpha \int f(x)\,dx + \beta \int g(x)\,dx \tag{6.7} \]

holds for functions f, g and constants α, β.


Example 6.3 Evaluate the following integrals:

1. $\int \pi^2\,dx$
2. $\int (x^2 - 10x + 1)\,dx$

Solution:
1. Since $\int dx = \int 1\,dx = x + C$, by (6.5) we have

\[ \int \pi^2\,dx = \pi^2 \int dx = \pi^2 x + C . \]

2. This is just Example 6.2(2), seen in a different light. By the property of linearity
(6.7) we have

\[ \int (x^2 - 10x + 1)\,dx = \int x^2\,dx - 10 \int x\,dx + \int 1\,dx = \frac{1}{3}x^3 - 5x^2 + x + C . \]

Tables of Integrals. Indefinite integrals are compiled in tables. Table 6.1 contains a
small selection.

Table 6.1 Integral formulas

S.No.   Derivative                                                    Indefinite integral
        $\frac{d}{dx}(F(x)) = F'(x)$                                  $\int \frac{d}{dx}(F(x))\,dx = F(x) + C$
1       $\frac{d}{dx}(x) = 1$                                         $\int 1\,dx = \int dx = x + C$
2       $\frac{d}{dx}\left(\frac{x^{n+1}}{n+1}\right) = x^n$, $n \ne -1$, n rational    $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$, $n \ne -1$, n rational
3       $\frac{d}{dx}(\sin x) = \cos x$                               $\int \cos x\,dx = \sin x + C$
4       $\frac{d}{dx}(-\cos x) = \sin x$                              $\int \sin x\,dx = -\cos x + C$
5       $\frac{d}{dx}(\tan x) = \sec^2 x$                             $\int \sec^2 x\,dx = \tan x + C$
6       $\frac{d}{dx}(-\cot x) = \csc^2 x$                            $\int \csc^2 x\,dx = -\cot x + C$
7       $\frac{d}{dx}(\sec x) = \sec x \tan x$                        $\int \sec x \tan x\,dx = \sec x + C$
8       $\frac{d}{dx}(-\csc x) = \csc x \cot x$                       $\int \csc x \cot x\,dx = -\csc x + C$
9       $\frac{d}{dx}(e^x) = e^x$                                     $\int e^x\,dx = e^x + C$
10      $\frac{d}{dx}\left(\frac{b^x}{\ln b}\right) = b^x$            $\int b^x\,dx = \frac{b^x}{\ln b} + C$
11      $\frac{d}{dx}(\ln |x|) = \frac{1}{x}$                         $\int \frac{1}{x}\,dx = \ln |x| + C$
12      $\frac{d}{dx}(\ln |\sec x + \tan x|) = \sec x$                $\int \sec x\,dx = \ln |\sec x + \tan x| + C$

Example 6.4 Evaluate the following integrals using Table 6.1:

1. $\int x^6\,dx$
2. $\int \frac{1}{\sqrt{x}}\,dx$
3. $\int \frac{1}{x^3}\,dx$
4. $\int \frac{\tan x}{\sec x}\,dx$

Solution:
1. Put n = 6 in formula 2 of Table 6.1. This gives

\[ \int x^6\,dx = \frac{x^7}{7} + C . \]

2. By Table 6.1(2) we have with n = −1/2

\[ \int \frac{1}{\sqrt{x}}\,dx = 2x^{1/2} + C . \]

3. By putting n = −3 in formula 2 of Table 6.1, we get

\[ \int \frac{1}{x^3}\,dx = -\frac{1}{2x^2} + C . \]

4. Using formula 4 of Table 6.1, we get

\[ \int \frac{\tan x}{\sec x}\,dx = \int \cos x\,\frac{\sin x}{\cos x}\,dx = \int \sin x\,dx = -\cos x + C . \]

6.4 Integration by Substitution

The rules of differentiation as discussed in Sect. 3.3 give rise to corresponding rules
of integration. We have already seen this in the previous section for the rules related
to linearity. In this section, we introduce the so-called method of substitution which
is related to the chain rule for differentiation. In conjunction with the other rules of
integration, the method of substitution is a powerful tool.
In order to understand this method, assume that we want to compute the indefinite
integral $\int h(x)\,dx$, where the function h has the form

\[ h(x) = f(g(x))\,g'(x) \tag{6.8} \]

for some other functions f and g. Let us assume that we can find an antiderivative
F of f, that is, $F' = f$. Then it follows from the chain rule that

\[ \frac{d}{dx} F(g(x)) = (F \circ g)'(x) = F'(g(x))\,g'(x) = f(g(x))\,g'(x) = h(x) . \]

This means that the function $H = F \circ g$, H(x) = F(g(x)), is an antiderivative of h.
In this manner, we can find the indefinite integral of h, provided we can decompose
h as in (6.8) and we can compute the indefinite integral of f.
Let us consider for example

\[ \int 8 \sin 8x\,dx . \]

Here h(x) = 8 sin 8x. We try g(x) = 8x. Then $g'(x) = 8$, and (6.8) holds with
f(u) = sin u. We know that F(u) = − cos u + C is the indefinite integral of f.
Hence F(g(x)) = − cos 8x + C is the indefinite integral of h(x) = 8 sin 8x.
One may carry out this method according to the following scheme.
Step 1. Replace a suitable expression as part of $\int h(x)\,dx$ by a new variable u (so
u = g(x) for some function g).
Step 2. Compute $g'(x)$ and replace $g'(x)\,dx$ by du.
Step 3. Check whether the resulting integral now has the form $\int f(u)\,du$ for some
function f.
Step 4. Evaluate the integral $\int f(u)\,du$.
Step 5. Replace u by g(x) to obtain the indefinite integral $\int h(x)\,dx$ as a function
of x.
The procedure just described is commonly written in the abbreviated form of the
substitution rule

\[ \int f(g(x))\,g'(x)\,dx = \int f(u)\,du . \tag{6.9} \]

However, one must be careful in the interpretation of this formula, since it does not
explicitly mention the necessary backsubstitution (Step 5 above).
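Example 6.5 Evaluate the following integrals:

1. $\int \sin(x + 10)\,dx$,
2. $\int \cos 15x\,dx$,
3. $\int \sin^2 x \cos x\,dx$,
4. $\int 3\sqrt{3x + 1}\,dx$,
5. $\int \frac{x}{6x^2 + 1}\,dx$,
6. $\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx$,
7. $\int \frac{e^x}{1 + e^x}\,dx$,
8. $\int \frac{(\ln x)^2}{3x}\,dx$.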
Solution:
1. Let u = x + 10 = g(x), then $g'(x) = 1$ and du = dx. Replace the expression
x + 10 by u in the given integral, then

\[ \int \sin(x + 10)\,dx = \int \sin u\,du = -\cos u + C = -\cos(x + 10) + C . \]

2. This example illustrates the handling of constants. Let u = 15x = g(x), then
$g'(x) = 15$ and du = 15 dx. We have

\[ \int \cos 15x\,dx = \frac{1}{15} \int 15 \cos 15x\,dx = \frac{1}{15} \int \cos u\,du = \frac{1}{15}(\sin u + C) . \]

Since C is an arbitrary constant, we may replace C/15 by C in the last expression
and obtain

\[ \int \cos 15x\,dx = \frac{1}{15} \sin 15x + C. \]

We may abbreviate this computation somewhat. We transform “du = 15 dx” into
“$dx = \frac{1}{15}\,du$” and simply write

\[ \int \cos 15x\,dx = \frac{1}{15} \int \cos u\,du = \frac{1}{15} \sin 15x + C . \]

3. Let u = sin x, then du = cos x dx and

\[ \int \sin^2 x \cos x\,dx = \int u^2\,du = \frac{1}{3}u^3 + C = \frac{1}{3}\sin^3 x + C . \]

4. Let u = 3x + 1, then du = 3 dx or $dx = \frac{1}{3}\,du$. With this substitution, we compute

\[ \int 3\sqrt{3x + 1}\,dx = \int u^{1/2}\,du = \frac{2}{3}u^{3/2} + C = \frac{2}{3}(3x + 1)^{3/2} + C . \]

5. Let $u = 6x^2 + 1$, then du = 12x dx or $\frac{1}{12}\,du = x\,dx$. With this substitution, we
compute

\[ \int \frac{x}{6x^2 + 1}\,dx = \frac{1}{12} \int \frac{du}{u} = \frac{1}{12} \ln |u| + C = \frac{1}{12} \ln |6x^2 + 1| + C . \]

6. Let $u = \sqrt{x}$, then $du = \frac{1}{2\sqrt{x}}\,dx$ and $2\,du = \frac{1}{\sqrt{x}}\,dx$. Then

\[ \int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx = \int 2e^u\,du = 2 \int e^u\,du = 2e^u + C = 2e^{\sqrt{x}} + C . \]

7. Let $u = 1 + e^x$, then $du = e^x\,dx$ and the given integral takes the form

\[ \int \frac{e^x}{1 + e^x}\,dx = \int \frac{1}{u}\,du = \ln |u| + C = \ln |1 + e^x| + C . \]

8. Let u = ln x, then $du = \frac{1}{x}\,dx$. By this substitution, the integral takes the form

\[ \int \frac{(\ln x)^2}{3x}\,dx = \frac{1}{3} \int u^2\,du = \frac{1}{9}u^3 + C = \frac{1}{9}(\ln x)^3 + C . \]
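
Since substitution results are easy to get wrong by a constant factor, it can help to differentiate each answer numerically and compare it with the integrand. The following sketch is an added illustration; the helper `derivative`, the step size and the sample points are assumptions made here, and the check takes C = 0 in each antiderivative of Example 6.5.

```python
import math


def derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# (antiderivative found in Example 6.5 with C = 0, original integrand)
pairs = [
    (lambda x: -math.cos(x + 10), lambda x: math.sin(x + 10)),
    (lambda x: math.sin(15 * x) / 15, lambda x: math.cos(15 * x)),
    (lambda x: math.sin(x) ** 3 / 3, lambda x: math.sin(x) ** 2 * math.cos(x)),
    (lambda x: 2 / 3 * (3 * x + 1) ** 1.5, lambda x: 3 * math.sqrt(3 * x + 1)),
    (lambda x: math.log(6 * x**2 + 1) / 12, lambda x: x / (6 * x**2 + 1)),
    (lambda x: 2 * math.exp(math.sqrt(x)),
     lambda x: math.exp(math.sqrt(x)) / math.sqrt(x)),
    (lambda x: math.log(1 + math.exp(x)), lambda x: math.exp(x) / (1 + math.exp(x))),
    (lambda x: math.log(x) ** 3 / 9, lambda x: math.log(x) ** 2 / (3 * x)),
]

sample_points = (0.7, 1.3, 2.0)  # positive, so that sqrt(x) and ln x are defined
gaps = [max(abs(derivative(F, x) - f(x)) for x in sample_points) for F, f in pairs]
for number, gap in enumerate(gaps, start=1):
    print(number, gap)
```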

Note 6.1 The substitution technique yields

\[ \int \frac{f'(x)}{f(x)}\,dx = \ln |f(x)| + C \tag{6.10} \]

as well as

\[ \int f'(x)\,[f(x)]^n\,dx = \frac{[f(x)]^{n+1}}{n+1} + C \tag{6.11} \]

for arbitrary differentiable functions f, where $f(x) \ne 0$ is required in (6.10) and $n \ne -1$ in (6.11).

So far, we have used the substitution rule

\[ \int f(g(x))\,g'(x)\,dx = \int f(u)\,du \]

in order to compute the integral on the left-hand side, knowing the integral on the
right-hand side. We may also use the rule the other way round, that is, we want to
find an antiderivative of f through evaluating the integral on the left-hand side. This
approach is rather flexible since we may in principle choose any function u = g(x)
to work with, as long as we can evaluate the left-hand side.

Example 6.6 Evaluate $\int \frac{1}{1 + e^u}\,du$.
Solution: In order to eliminate the exponential, we set u = g(x) = ln x. Then
$du = g'(x)\,dx = \frac{1}{x}\,dx$, and

\[ \begin{aligned} \int \frac{1}{1 + e^u}\,du &= \int \frac{1}{1 + x} \cdot \frac{1}{x}\,dx = \int \left( \frac{1}{x} - \frac{1}{1 + x} \right) dx \\ &= \ln x - \ln(1 + x) + C = \ln e^u - \ln(1 + e^u) + C \\ &= u - \ln(1 + e^u) + C . \end{aligned} \]

Note that we have used the backsubstitution $x = g^{-1}(u) = e^u$. Indeed, this variant
of the substitution method requires that the function g is invertible on the appropriate
domain (here, x > 0) and range (here, all of R).
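
The result of Example 6.6 can be checked in the same spirit. The sketch below is an added illustration with assumed helper names and sample points; it differentiates u − ln(1 + eᵘ) by a central difference and compares the outcome with the integrand 1/(1 + eᵘ).

```python
import math


def derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


F = lambda u: u - math.log(1 + math.exp(u))  # result of Example 6.6 with C = 0
f = lambda u: 1 / (1 + math.exp(u))          # original integrand

gaps = [abs(derivative(F, u) - f(u)) for u in (-3.0, 0.0, 2.5)]
print(max(gaps))
```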

6.5 Integration by Parts

The method of integration by parts is essentially the antiderivative version of the
formula for differentiating a product of two functions which says that

\[ \frac{d}{dx}\,[f(x)g(x)] = f(x)g'(x) + g(x)f'(x) \tag{6.12} \]

holds whenever f and g are differentiable functions. Forming the indefinite integral
on both sides, we obtain

\[ \int \frac{d}{dx}\,[f(x)g(x)]\,dx = \int [f(x)g'(x) + g(x)f'(x)]\,dx . \]

Due to the linearity of the indefinite integral, we get

\[ \int f(x)g'(x)\,dx = \int \frac{d}{dx}\,[f(x)g(x)]\,dx - \int g(x)f'(x)\,dx . \]

Since the function $x \mapsto f(x)g(x)$ is an antiderivative of the function
$x \mapsto \frac{d}{dx}[f(x)g(x)]$, we obtain the formula for integration by parts

\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx . \tag{6.13} \]

It is not necessary to write a constant here, since both indefinite integrals in (6.13)
are only determined up to a constant.¹
The integration by parts formula is very useful as it converts a given indefinite
integral into another one that may be easier to evaluate. For the purposes of actual
computation, we may abbreviate it in the form

\[ \int u\,dv = uv - \int v\,du , \tag{6.14} \]

which may also be easier to memorize. Here, u and v stand for f(x) and g(x),
respectively, while du stands for $f'(x)\,dx$ and dv for $g'(x)\,dx$.
Example 6.7 Evaluate the following integrals applying the method of integration by
parts:

1. $\int x e^x\,dx$,
2. $\int x^2 e^{-x}\,dx$,
3. $\int \frac{x e^x}{(x + 1)^2}\,dx$,
4. $\int \ln x\,dx$,
5. $\int x \ln x\,dx$,
6. $\int x^2 \cos x\,dx$,
7. $\int e^x \cos x\,dx$,
8. $\int \sec^3 x\,dx$,
9. $\int \sin^n x\,dx$, where n ≥ 2 is a natural number.

¹ Note that in (6.13), the letter ‘x’ is used in two different meanings; as an actual argument in the
expression ‘f(x)g(x)’ and as an integration variable in the two integrals.

Solution:

1. We want to write the given integral in the form $\int u\,dv$ and apply formula (6.14).
To this purpose, we set u = f(x) = x and $dv = g'(x)\,dx = e^x\,dx$. Then du = dx.
Next, we choose $v = g(x) = e^x$ as an antiderivative of $x \mapsto e^x$. The formula

\[ \int u\,dv = uv - \int v\,du \]

now becomes

\[ \int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C . \]

The rightmost expression no longer contains an indefinite integral, therefore we
have to introduce the constant C. (The reader may check that if we had chosen
$v = e^x + 1$, for example, instead of $v = e^x$, we would have arrived at the same
final result.)
2. Let $u = x^2$, $dv = e^{-x}\,dx$, then du = 2x dx, and we may choose $v = -e^{-x}$ as
an antiderivative of $x \mapsto e^{-x}$. (The latter result is obtained by the substitution
method. Let −x = t, then −dx = dt or dx = −dt, so

\[ \int e^{-x}\,dx = -\int e^t\,dt = -e^t = -e^{-x} , \]

where we have chosen the integration constant to be zero.) Thus

\[ \int x^2 e^{-x}\,dx = \int u\,dv = uv - \int v\,du = -x^2 e^{-x} + 2 \int e^{-x} x\,dx . \]

We still have an indefinite integral on the right-hand side, but with the factor x
instead of $x^2$. To evaluate that integral, we apply integration by parts once again.
Put u = x, $dv = e^{-x}\,dx$, then du = dx. Again, we choose $v = -e^{-x}$. Then

\[ \int x e^{-x}\,dx = uv - \int v\,du = -x e^{-x} + \int e^{-x}\,dx = -x e^{-x} - e^{-x} + C_1 . \]

As final result, we get

\[ \int x^2 e^{-x}\,dx = -x^2 e^{-x} + 2(-x e^{-x} - e^{-x} + C_1) = -(x^2 + 2x + 2)\,e^{-x} + C , \]

where we have written C instead of $2C_1$ for the arbitrary constant.


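3. Let $u = x e^x$ and $dv = \frac{1}{(x + 1)^2}\,dx$. We choose $v = -\frac{1}{x + 1}$ as antiderivative
and compute $du = (x e^x + e^x)\,dx = e^x (x + 1)\,dx$. Now

\[ \begin{aligned} \int \frac{x e^x}{(x + 1)^2}\,dx &= \int u\,dv = uv - \int v\,du \\ &= x e^x \left( \frac{-1}{x + 1} \right) - \int \left( \frac{-1}{x + 1} \right) e^x (x + 1)\,dx = -\frac{x e^x}{x + 1} + e^x + C \\ &= \frac{e^x}{x + 1} + C . \end{aligned} \]

4. $\int \ln x\,dx = \int u\,dv$, where we choose u = ln x, dv = dx. We have $du = \frac{1}{x}\,dx$,
and we take v = x as antiderivative. Then

\[ \int \ln x\,dx = \int u\,dv = uv - \int v\,du = x \ln x - \int x\,\frac{1}{x}\,dx = x \ln x - x + C . \]

5. Let u = ln x and dv = x dx, then $du = \frac{dx}{x}$ and $v = \frac{1}{2}x^2$. Thus,

\[ \begin{aligned} \int x \ln x\,dx &= \int u\,dv = uv - \int v\,du = \frac{1}{2}x^2 \ln x - \int \frac{1}{2}x^2 \cdot \frac{1}{x}\,dx \\ &= \frac{1}{2}x^2 \ln x - \frac{1}{2} \int x\,dx = \frac{1}{2}x^2 \ln x - \frac{1}{4}x^2 + C \\ &= \frac{1}{2}x^2 \left( \ln x - \frac{1}{2} \right) + C . \end{aligned} \]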
6. $\int x^2 \cos x\,dx = \int u\,dv$, where we set $u = x^2$ and dv = cos x dx. We get du =
2x dx and take v = sin x as antiderivative of cos x. Then

\[ \int x^2 \cos x\,dx = \int u\,dv = uv - \int v\,du = x^2 \sin x - \int 2x \sin x\,dx = x^2 \sin x - 2 \int x \sin x\,dx . \tag{6.15} \]

To find the indefinite integral $\int x \sin x\,dx$, we again apply integration by parts.
Let u = x, dv = sin x dx, then du = dx, and with v = − cos x we get

\[ \int x \sin x\,dx = uv - \int v\,du = -x \cos x - \int (-\cos x)\,dx = -x \cos x + \sin x + C_1 . \]

Inserting this formula into (6.15), we finally obtain

\[ \int x^2 \cos x\,dx = x^2 \sin x - 2(-x \cos x + \sin x + C_1) = x^2 \sin x + 2x \cos x - 2 \sin x + C , \]

where we have replaced $-2C_1$ by C.


7. Let $u = e^x$, dv = cos x dx, then $du = e^x\,dx$ and we take v = sin x. Integration
by parts yields

\[ \int e^x \cos x\,dx = \int u\,dv = uv - \int v\,du = e^x \sin x - \int e^x \sin x\,dx . \tag{6.16} \]

To evaluate $\int e^x \sin x\,dx$ we again use integration by parts. Let $u = e^x$, dv =
sin x dx, then $du = e^x\,dx$ and v = − cos x. Then

\[ \int e^x \sin x\,dx = \int u\,dv = uv - \int v\,du = -e^x \cos x - \int (-e^x \cos x)\,dx = -e^x \cos x + \int e^x \cos x\,dx . \]

Using this formula in (6.16), we get

\[ \int e^x \cos x\,dx = e^x \sin x + e^x \cos x - \int e^x \cos x\,dx . \]

Consequently,

\[ 2 \int e^x \cos x\,dx = e^x (\sin x + \cos x) + C_1 \]

and finally

\[ \int e^x \cos x\,dx = \frac{1}{2}\,e^x (\sin x + \cos x) + C . \]
8. $\int \sec^3 x\,dx = \int u\,dv$, where we set u = sec x, $dv = \sec^2 x\,dx$. We get du =
sec x tan x dx, and use v = tan x as an antiderivative. Thus

\[ \int \sec^3 x\,dx = \int u\,dv = uv - \int v\,du = \sec x \tan x - \int \sec x \tan^2 x\,dx . \]

We use the trigonometric identity $1 + \tan^2 x = \sec^2 x$, or $\tan^2 x = \sec^2 x - 1$, to
obtain

\[ \int \sec^3 x\,dx = \sec x \tan x - \int (\sec^3 x - \sec x)\,dx . \]

Rearranging the integrals yields

\[ 2 \int \sec^3 x\,dx = \sec x \tan x + \int \sec x\,dx . \]

We divide by 2 and look up the indefinite integral of sec x to obtain

\[ \int \sec^3 x\,dx = \frac{1}{2} \sec x \tan x + \frac{1}{2} \ln |\sec x + \tan x| + C . \]
9. $\int \sin^n x\,dx = \int u\,dv$, where we set $u = \sin^{n-1} x$ and dv = sin x dx. We obtain
$du = (n - 1) \sin^{n-2} x \cos x\,dx$ and take v = − cos x. Thus

\[ \begin{aligned} \int \sin^n x\,dx &= \int \sin^{n-1} x \sin x\,dx = \int u\,dv = uv - \int v\,du \\ &= -\sin^{n-1} x \cos x + (n - 1) \int \sin^{n-2} x \cos^2 x\,dx . \end{aligned} \]

Since $\cos^2 x = 1 - \sin^2 x$, we may write

\[ \int \sin^n x\,dx = -\cos x \sin^{n-1} x + (n - 1) \int \sin^{n-2} x\,dx - (n - 1) \int \sin^n x\,dx . \]

Moving the rightmost integral to the left-hand side, we get

\[ n \int \sin^n x\,dx = -\cos x \sin^{n-1} x + (n - 1) \int \sin^{n-2} x\,dx , \]

and division by n finally yields

\[ \int \sin^n x\,dx = -\frac{1}{n} \cos x \sin^{n-1} x + \frac{n-1}{n} \int \sin^{n-2} x\,dx . \tag{6.17} \]

Remark 6.2 Formula (6.17) is called a reduction formula or a recursion formula
for $\int \sin^n x\,dx$. For example, using it with n = 4 we get

\[ \int \sin^4 x\,dx = -\frac{1}{4} \cos x \sin^3 x + \frac{3}{4} \int \sin^2 x\,dx . \]

Applying (6.17) for n = 2 we get

\[ \int \sin^2 x\,dx = -\frac{1}{2} \cos x \sin x + \frac{1}{2}x + C . \]

Consequently,

\[ \int \sin^4 x\,dx = -\frac{1}{4} \cos x \sin^3 x - \frac{3}{8} \cos x \sin x + \frac{3}{8}x + D , \]

where $D = \frac{3}{4}C$. In this manner, we successively reduce even powers n to the constant
function. Analogously, we may reduce odd powers n to the power 1, that is, to
$\int \sin x\,dx = -\cos x + C$.
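
The recursion described in Remark 6.2 translates directly into a recursive function. The following Python sketch is an added illustration; the function name `sin_power_antiderivative` is hypothetical, the integration constant is taken to be zero at every step, and the comparison uses the closed form for n = 4 derived above.

```python
import math


def sin_power_antiderivative(n):
    """Return x -> one antiderivative of sin^n x, built with formula (6.17).

    The recursion stops at n = 0 (antiderivative x) or n = 1 (antiderivative -cos x).
    """
    if n == 0:
        return lambda x: x
    if n == 1:
        return lambda x: -math.cos(x)
    lower = sin_power_antiderivative(n - 2)  # antiderivative of sin^(n-2) x
    return lambda x: (-math.cos(x) * math.sin(x) ** (n - 1) / n
                      + (n - 1) / n * lower(x))


def closed_form_n4(x):
    """Closed form for n = 4 from Remark 6.2 with D = 0."""
    return (-0.25 * math.cos(x) * math.sin(x) ** 3
            - 0.375 * math.cos(x) * math.sin(x) + 0.375 * x)


S4 = sin_power_antiderivative(4)
for x in (0.3, 1.0, 2.5):
    print(x, S4(x), closed_form_n4(x))
```

Each level of the recursion lowers the power by 2, so even powers end at the constant function and odd powers at sin x, exactly as stated in the remark.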

6.6 The Fundamental Theorem of Calculus

As its name indicates, this theorem is the cornerstone of calculus. It establishes the
fact that the processes of integration and of differentiation are inverse to each other.
Since the theorem deals with general functions, not just functions given by explicit
formulas as in the preceding sections, we need to know that the integrals to be written
down make sense. For this purpose, we present a preliminary result, whose proof
can be found in Appendix D.
Theorem 6.2 If f is a continuous function defined on a closed interval [a, b], then
f is integrable on [a, b], that is,

\[ \int_a^b f(x)\,dx \]

exists.
There are two parts of the Fundamental Theorem of Calculus. Here, we will state
them and interpret them. Their proofs are given in Appendix D.5.
Theorem 6.3 (The Fundamental Theorem of Calculus, Part 1) If f is a continuous
function defined on [a, b], then the function

\[ F(x) = \int_a^x f(t)\,dt \tag{6.18} \]

has a derivative at every point x in [a, b] and

\[ F'(x) = \frac{d}{dx} \int_a^x f(t)\,dt = f(x) . \tag{6.19} \]

Remark 6.3 1. Theorem 6.3 says that if we integrate a function f as in (6.18), and
differentiate the resulting function F, we get back the function f from which we
started. Conversely, if we differentiate some function G to obtain $f = G'$, and
integrate f according to (6.18), the resulting F is an antiderivative of f, hence
equal to G except for a constant difference.
2. Moreover, it says that every continuous function f has an antiderivative,
namely F.
3. In addition, it implies that the differential equation $\frac{dF}{dx} = f$ has a solution for
every continuous f.

Theorem 6.4 (The Fundamental Theorem of Calculus, Part 2) If f is a continuous
function defined on [a, b], and if F is any antiderivative of f on [a, b], then

\[ \int_a^b f(x)\,dx = F(b) - F(a) . \tag{6.20} \]

Remark 6.4 1. Theorem 6.4 says that the definite integral of any continuous function
f can be calculated without taking limits, without computing Riemann sums,
as long as an antiderivative of f can be found (which often does not present any
difficulty).
2. It relates the computation of area below the graph of a (nonnegative) function f
to the concept of the tangent (namely, $f(x) = F'(x)$ is the slope of the tangent
to the graph of F at (x, F(x))).
3. It can be viewed as the truly fundamental part of the theorem.
4. Often, F(b) − F(a) is written as $F(x)\big|_a^b$, or $[F(x)]_a^b$ or $F(x)\big|_{x=a}^{x=b}$.

From Theorem 6.4 we see that we can evaluate the definite integral of a function
f on the interval [a, b], and thus (if f is nonnegative) determine the area below its
graph as follows.
Step 1: Find an antiderivative F of f. (Any antiderivative F of f works.)
Step 2: Evaluate F(b) and F(a) and calculate

\[ \int_a^b f(x)\,dx = F(b) - F(a) . \]

Example 6.8 Using the Fundamental Theorem of Calculus, compute the area
1. below $y = x^2$ for the interval 0 ≤ x ≤ 1,
2. below y = cos x for the interval [0, π/2].

Solution:
1. Let $f(x) = x^2$. The general form of its antiderivative F (that is, its indefinite
integral) is $F(x) = \frac{1}{3}x^3 + C$. Since we may choose any antiderivative, we set
C = 0. (The constant C would cancel out in F(b) − F(a) anyway.) Therefore,
the sought-for area is given by

\[ \int_0^1 x^2\,dx = \frac{1}{3}x^3 \Big|_0^1 = \frac{1}{3} \cdot 1^3 - \frac{1}{3} \cdot 0^3 = \frac{1}{3} . \]

2. Let f(x) = cos x. The function F(x) = sin x yields an antiderivative of f. Therefore
the area below y = cos x on the interval [0, π/2] is

\[ \int_0^{\pi/2} \cos x\,dx = \sin x \Big|_0^{\pi/2} = \sin \frac{\pi}{2} - \sin 0 = 1 . \]
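
The two steps of the procedure above can be compared with a direct Riemann sum. The sketch below is an added illustration; the helper names `integral_via_antiderivative` and `midpoint_sum` and the choice of 1000 subintervals are assumptions made here, not part of the text.

```python
import math


def integral_via_antiderivative(F, a, b):
    """Formula (6.20): the definite integral equals F(b) - F(a)."""
    return F(b) - F(a)


def midpoint_sum(f, a, b, n=1000):
    """Riemann sum with midpoints on a uniform partition of [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


# Example 6.8(1): area below y = x^2 on [0, 1]
area_1 = integral_via_antiderivative(lambda x: x**3 / 3, 0.0, 1.0)
# Example 6.8(2): area below y = cos x on [0, pi/2]
area_2 = integral_via_antiderivative(math.sin, 0.0, math.pi / 2)

print(area_1, midpoint_sum(lambda x: x**2, 0.0, 1.0))
print(area_2, midpoint_sum(math.cos, 0.0, math.pi / 2))
```

The antiderivative route needs two function evaluations, whereas the Riemann sum needs many and is still only approximate, which illustrates Remark 6.4(1).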

Properties of the definite integral. Throughout this subsection, let f and g be
integrable functions on the interval [a, b]. We have the rules for constant multiples,
sums and differences,

\[ \int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx , \quad \text{for any constant } c, \tag{6.21} \]

\[ \int_a^b (f(x) \pm g(x))\,dx = \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx . \tag{6.22} \]

They can be combined into the property of linearity,

\[ \int_a^b (\alpha f(x) + \beta g(x))\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx , \tag{6.23} \]

where α, β are arbitrary constants. In addition, the property of monotonicity holds:
If f is nonnegative on [a, b], that is, f(x) ≥ 0 for all x ∈ [a, b], then

\[ \int_a^b f(x)\,dx \ge 0 , \tag{6.24} \]

and if f ≥ g on [a, b], that is, f(x) ≥ g(x) for all x ∈ [a, b], then

\[ \int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx . \tag{6.25} \]

Moreover, we may split a definite integral on [a, b] into two integrals on [a, c] and
[c, b] with a < c < b,

\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx . \tag{6.26} \]

The formulas (6.21)–(6.26) can be proved as a consequence of the formal definition
of the definite integral.
So far, in accordance with its definition, we have assumed that a < b. However,
for computations it can be convenient if we drop this restriction and define

\[ \int_c^c f(x)\,dx = 0 , \tag{6.27} \]

whenever c belongs to the domain of f, as well as

\[ \int_b^a f(x)\,dx = -\int_a^b f(x)\,dx . \tag{6.28} \]

It turns out that with these extended definitions, (6.26) holds no matter how a, b, and
c are related, as long as f is integrable on the respective intervals.
We now present the substitution rule for definite integrals.
Theorem 6.5 Let g be a continuously differentiable function on the interval [a, b],
and let f be a continuous function on the range R(g) of g. Then

\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du . \tag{6.29} \]

Proof Let F be an antiderivative of f, then

\[ \int_a^b f(g(x))\,g'(x)\,dx = F(g(b)) - F(g(a)) = \int_{g(a)}^{g(b)} f(u)\,du . \]

Both equalities follow from the Fundamental Theorem, the first because $F \circ g$ is an
antiderivative of the left integrand, the second because F is an antiderivative of f.
Setting g(x) = x + c in Theorem 6.5, c being a fixed number, we obtain the formula

\[ \int_a^b f(x + c)\,dx = \int_{a+c}^{b+c} f(x)\,dx . \tag{6.30} \]

(Note that we have replaced the letter ‘u’ for the integration variable in the second
integral by ‘x’.) Setting g(x) = cx, $c \ne 0$ being a fixed number, we get

\[ \int_a^b f(cx)\,dx = \frac{1}{c} \int_{ca}^{cb} f(x)\,dx . \tag{6.31} \]

In the particular case c = −1, (6.31) becomes

\[ \int_a^b f(-x)\,dx = -\int_{-a}^{-b} f(x)\,dx . \tag{6.32} \]

From this formula, we obtain

\[ \int_{-a}^{0} f(x)\,dx = \int_0^a f(x)\,dx , \quad \text{if } f \text{ is an even function,} \tag{6.33} \]

\[ \int_{-a}^{0} f(x)\,dx = -\int_0^a f(x)\,dx , \quad \text{if } f \text{ is an odd function.} \tag{6.34} \]

These formulas imply that

\[ \int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx , \quad \text{if } f \text{ is an even function,} \tag{6.35} \]

\[ \int_{-a}^{a} f(x)\,dx = 0 , \quad \text{if } f \text{ is an odd function.} \tag{6.36} \]
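
Formulas (6.35) and (6.36) are easy to observe numerically. The sketch below is an added illustration; the two sample functions, the value a = 2 and the helper `midpoint_sum` are assumptions chosen here. The partitions are chosen with the same subinterval length on [−a, a] and on [0, a] so that the two sums can be compared directly.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum with midpoints on a uniform partition of [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


a = 2.0
even_f = lambda x: x**2 * math.cos(x)  # even: f(-x) = f(x)
odd_f = lambda x: x**3 + math.sin(x)   # odd: f(-x) = -f(x)

even_full = midpoint_sum(even_f, -a, a, 2000)   # approximates the integral over [-a, a]
even_half = midpoint_sum(even_f, 0.0, a, 1000)  # approximates the integral over [0, a]
odd_full = midpoint_sum(odd_f, -a, a, 2000)

print(even_full, 2 * even_half)  # (6.35): the two values should agree
print(odd_full)                  # (6.36): should be close to zero
```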

The next theorem states the integration by parts formula for definite integrals.
Theorem 6.6 Let f and g be continuously differentiable functions on [a, b]. Then

\[ \int_a^b f(x)g'(x)\,dx = f(x)g(x) \Big|_a^b - \int_a^b f'(x)g(x)\,dx . \tag{6.37} \]

The Fundamental Theorem yields this result, too, if we apply it to the function $f \cdot g$
which is an antiderivative of the function $f g' + f' g$.
We close this section with the following theorem.
Theorem 6.7 (Mean Value Theorem for Integrals) Let f be continuous on the interval
[a, b], then there is at least one number c in (a, b) such that

\[ \int_a^b f(x)\,dx = f(c)(b - a) , \quad \text{or} \quad f(c) = \frac{1}{b - a} \int_a^b f(x)\,dx . \tag{6.38} \]
The proof of this theorem will be given in Appendix D.
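
Theorem 6.7 only asserts that a suitable c exists. For a concrete function, one can locate such a point numerically. The sketch below is an added illustration under assumptions stated in the code: the function name `mean_value_point` is hypothetical, the mean value is computed from a known antiderivative F via (6.20), and the bisection search requires a sign change of f − mean at the endpoints (which holds, for instance, when f is strictly monotone).

```python
def mean_value_point(f, F, a, b, tol=1e-12):
    """Find c in (a, b) with f(c) = (F(b) - F(a)) / (b - a) by bisection.

    Assumption: f(a) - mean and f(b) - mean have opposite signs.
    """
    mean = (F(b) - F(a)) / (b - a)
    lo, hi = a, b
    g_lo = f(lo) - mean
    if g_lo * (f(hi) - mean) > 0:
        raise ValueError("bisection needs a sign change at the endpoints")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - mean
        if g_lo * g_mid <= 0:
            hi = mid                 # a root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid    # a root lies in [mid, hi]
    return 0.5 * (lo + hi)


# f(x) = x^2 on [0, 1]: the mean value is 1/3, attained where c^2 = 1/3
c = mean_value_point(lambda x: x**2, lambda x: x**3 / 3, 0.0, 1.0)
print(c)
```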

6.7 Trigonometric Integrals

In this section we discuss the integration of functions which arise as products of
sines, cosines, secants, and tangents, as well as integrals of hyperbolic trigonometric
functions and inverse trigonometric functions. The method used is mainly integration
by parts in combination with trigonometric identities. Moreover, integrals of other
functions can often be converted to trigonometric integrals by a suitable substitution.
Let us consider an example. We may evaluate

\[ \int x^2 \sqrt{1 - x^2}\,dx \]

using the substitution x = sin θ in the way described at the end of Sect. 6.4. For
−π/2 ≤ θ ≤ π/2 we have cos θ ≥ 0, so that $\sqrt{1 - \sin^2\theta} = \cos\theta$. This
gives us

\[ \int \sin^2\theta\,\sqrt{1 - \sin^2\theta}\,\cos\theta\,d\theta = \int \sin^2\theta \cos^2\theta\,d\theta . \]

Let us evaluate the latter integral. We have

\[ \int \sin^2\theta \cos^2\theta\,d\theta = \int \sin^2\theta\,(1 - \sin^2\theta)\,d\theta = \int \sin^2\theta\,d\theta - \int \sin^4\theta\,d\theta . \tag{6.39} \]

By Example 6.7(9),

\[ \int \sin^2\theta\,d\theta = -\frac{1}{2} \cos\theta \sin\theta + \frac{1}{2}\theta + C_1 \]

and

\[ \int \sin^4\theta\,d\theta = -\frac{1}{4} \cos\theta \sin^3\theta - \frac{3}{8} \cos\theta \sin\theta + \frac{3}{8}\theta + C_2 . \]

It follows that

\[ \int \sin^2\theta \cos^2\theta\,d\theta = -\frac{1}{2} \cos\theta \sin\theta + \frac{1}{2}\theta + \frac{1}{4} \cos\theta \sin^3\theta + \frac{3}{8} \cos\theta \sin\theta - \frac{3}{8}\theta + C . \]

We substitute back sin θ = x, $\cos\theta = \sqrt{1 - x^2}$ and $\theta = \sin^{-1}(x) = \arcsin x$ to
obtain the final result

\[ \int x^2 \sqrt{1 - x^2}\,dx = -\frac{1}{2}x\sqrt{1 - x^2} + \frac{1}{2}\sin^{-1} x + \frac{1}{4}x^3\sqrt{1 - x^2} + \frac{3}{8}x\sqrt{1 - x^2} - \frac{3}{8}\sin^{-1} x + C . \]
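
A result with this many terms is worth checking. The following sketch is an added illustration with assumed helper names and sample points; it differentiates the final expression (with C = 0) by a central difference and compares it with the integrand $x^2\sqrt{1 - x^2}$ at points inside (−1, 1).

```python
import math


def derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


def F(x):
    """Final result of the worked example with C = 0, term by term."""
    r = math.sqrt(1 - x * x)
    return (-0.5 * x * r + 0.5 * math.asin(x)
            + 0.25 * x**3 * r + 0.375 * x * r - 0.375 * math.asin(x))


def f(x):
    """Original integrand x^2 * sqrt(1 - x^2)."""
    return x * x * math.sqrt(1 - x * x)


gaps = [abs(derivative(F, x) - f(x)) for x in (-0.8, -0.3, 0.0, 0.5, 0.9)]
print(max(gaps))
```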
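Formulas for Integration of Products of Trigonometric Functions

In formulas 1–3 we assume $m \ne \pm n$.

1. $\int \sin mx \sin nx\,dx = -\frac{\sin (m + n)x}{2(m + n)} + \frac{\sin (m - n)x}{2(m - n)} + C$.
2. $\int \cos mx \cos nx\,dx = \frac{\sin (m + n)x}{2(m + n)} + \frac{\sin (m - n)x}{2(m - n)} + C$.
3. $\int \sin mx \cos nx\,dx = -\frac{\cos (m + n)x}{2(m + n)} - \frac{\cos (m - n)x}{2(m - n)} + C$.
4. \[ \begin{aligned} \int \sin^m x \cos^n x\,dx &= -\frac{\sin^{m-1} x \cos^{n+1} x}{m + n} + \frac{m - 1}{m + n} \int \sin^{m-2} x \cos^n x\,dx \\ &= \frac{\sin^{m+1} x \cos^{n-1} x}{m + n} + \frac{n - 1}{m + n} \int \sin^m x \cos^{n-2} x\,dx . \end{aligned} \]
5. $\int \cos^n x\,dx = \frac{1}{n} \cos^{n-1} x \sin x + \frac{n - 1}{n} \int \cos^{n-2} x\,dx$.
6. $\int \tan^n x\,dx = \frac{1}{n - 1} \tan^{n-1} x - \int \tan^{n-2} x\,dx$.
7. $\int \sec^n x\,dx = \frac{\tan x \sec^{n-2} x}{n - 1} + \frac{n - 2}{n - 1} \int \sec^{n-2} x\,dx$.
8. $\int \cot^n x\,dx = -\frac{1}{n - 1} \cot^{n-1} x - \int \cot^{n-2} x\,dx$.
9. $\int \csc^n x\,dx = -\frac{\cot x \csc^{n-2} x}{n - 1} + \frac{n - 2}{n - 1} \int \csc^{n-2} x\,dx$.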

Integration and Differentiation of Hyperbolic Trigonometric Functions. We have


introduced hyperbolic trigonometric functions in Sect. 1.3. Here, we discuss the dif-
ferentiation and integration of such functions. Before doing this, we want to mention
some practical situations where those functions arise. Hyperbolic cosine functions
can be used to describe the shape of a uniform cable or chain, whose ends are fixed
at the same height. Telephone and power lines may be strung between poles in this
manner. As another example, let us consider a falling object. If we assume air resis-
tance to be proportional to the square of its velocity, then the vertical distance y
covered by the object in t seconds is given by y = a ln(cosh bt), where a and b are
constants.
Let us recall the two identities

\[ \sinh x = \frac{e^x - e^{-x}}{2} , \qquad \cosh x = \frac{e^x + e^{-x}}{2} . \]

From those and the other defining formulas for hyperbolic trigonometric functions
as presented in Sect. 1.3, one may obtain the following identities:
(i) $\cosh x + \sinh x = e^x$,
(ii) $\cosh x - \sinh x = e^{-x}$,
(iii) $\cosh^2 x - \sinh^2 x = 1$,
(iv) $1 - \tanh^2 x = \operatorname{sech}^2 x$,
(v) $\coth^2 x - 1 = \operatorname{csch}^2 x$,
(vi) $\cosh(-x) = \cosh x$,
(vii) $\sinh(-x) = -\sinh x$,
(viii) $\sinh(x + y) = \sinh x \cosh y + \cosh x \sinh y$,
(ix) $\cosh(x + y) = \cosh x \cosh y + \sinh x \sinh y$,
(x) $\sinh(x - y) = \sinh x \cosh y - \cosh x \sinh y$,
(xi) $\cosh(x - y) = \cosh x \cosh y - \sinh x \sinh y$,
(xii) $\sinh 2x = 2 \sinh x \cosh x$,
(xiii) $\cosh 2x = \cosh^2 x + \sinh^2 x$,
(xiv) $\cosh 2x = 2 \sinh^2 x + 1$,
(xv) $\cosh 2x = 2 \cosh^2 x - 1$.

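Table 6.2 provides a complete list of the derivative formulas and corresponding
integration formulas for the hyperbolic functions.

Table 6.2 Integrals of hyperbolic functions

Derivative                                                              Indefinite integral
$\frac{d}{dx}\sinh x = \cosh x$                                         $\int \sinh x\,dx = \cosh x + C$
$\frac{d}{dx}\cosh x = \sinh x$                                         $\int \cosh x\,dx = \sinh x + C$
$\frac{d}{dx}\tanh x = \operatorname{sech}^2 x$                         $\int \operatorname{sech}^2 x\,dx = \tanh x + C$
$\frac{d}{dx}\coth x = -\operatorname{csch}^2 x$                        $\int \operatorname{csch}^2 x\,dx = -\coth x + C$
$\frac{d}{dx}\operatorname{sech} x = -\operatorname{sech} x \tanh x$    $\int \operatorname{sech} x \tanh x\,dx = -\operatorname{sech} x + C$
$\frac{d}{dx}\operatorname{csch} x = -\operatorname{csch} x \coth x$    $\int \operatorname{csch} x \coth x\,dx = -\operatorname{csch} x + C$

We check here the first line in Table 6.2. We have

\[ \frac{d}{dx} \sinh x = \frac{d}{dx} \left( \frac{e^x - e^{-x}}{2} \right) = \frac{1}{2} \left( \frac{d}{dx} e^x - \frac{d}{dx} e^{-x} \right) = \frac{e^x + e^{-x}}{2} = \cosh x . \]

Therefore, we also have $\int \cosh x\,dx = \sinh x + C$ by definition of the indefinite
integral. Tables 6.3 and 6.4 provide a list of derivative and integration formulas for inverse
hyperbolic functions.

Table 6.3 Derivatives of inverse hyperbolic functions

$\frac{d}{dx}\sinh^{-1} x = \frac{1}{\sqrt{1 + x^2}}$                           $\frac{d}{dx}\coth^{-1} x = \frac{1}{1 - x^2}$, |x| > 1
$\frac{d}{dx}\cosh^{-1} x = \frac{1}{\sqrt{x^2 - 1}}$, x > 1                    $\frac{d}{dx}(\operatorname{sech}^{-1} x) = -\frac{1}{x\sqrt{1 - x^2}}$, 0 < x < 1
$\frac{d}{dx}\tanh^{-1} x = \frac{1}{1 - x^2}$, |x| < 1                         $\frac{d}{dx}(\operatorname{csch}^{-1} x) = -\frac{1}{|x|\sqrt{1 + x^2}}$, $x \ne 0$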

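Table 6.4 Functions whose antiderivatives are inverse hyperbolic functions

If a > 0, then

\[ \int \frac{dx}{\sqrt{a^2 + x^2}} = \sinh^{-1}\left(\frac{x}{a}\right) + C \quad \text{or} \quad \ln\left(x + \sqrt{x^2 + a^2}\right) + C \]

\[ \int \frac{dx}{\sqrt{x^2 - a^2}} = \cosh^{-1}\left(\frac{x}{a}\right) + C \quad \text{or} \quad \ln\left(x + \sqrt{x^2 - a^2}\right) + C \]

\[ \int \frac{dx}{a^2 - x^2} = \begin{cases} \frac{1}{a}\tanh^{-1}\left(\frac{x}{a}\right) + C , & |x| < a \\ \frac{1}{a}\coth^{-1}\left(\frac{x}{a}\right) + C , & |x| > a \end{cases} \quad \text{or} \quad \frac{1}{2a} \ln\left|\frac{a + x}{a - x}\right| + C , \ |x| \ne a \]

\[ \int \frac{dx}{x\sqrt{a^2 - x^2}} = -\frac{1}{a}\operatorname{sech}^{-1}\left|\frac{x}{a}\right| + C \quad \text{or} \quad -\frac{1}{a} \ln\left(\frac{a + \sqrt{a^2 - x^2}}{|x|}\right) + C , \ 0 < |x| < a \]

\[ \int \frac{dx}{x\sqrt{a^2 + x^2}} = -\frac{1}{a}\operatorname{csch}^{-1}\left|\frac{x}{a}\right| + C \quad \text{or} \quad -\frac{1}{a} \ln\left(\frac{a + \sqrt{a^2 + x^2}}{|x|}\right) + C , \ 0 < |x| < a \]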

Differentiation and Integration of Inverse Trigonometric functions. The inverse
trigonometric functions (cyclometric functions) are introduced in Appendix C.3. We
discuss here the differentiation and integration of these functions.
The following identities hold for the inverse trigonometric functions:

(i) $\cos(\arcsin x) = \sqrt{1 - x^2}$.
(ii) $\sin(\arccos x) = \sqrt{1 - x^2}$.
(iii) $\tan(\arcsin x) = \frac{x}{\sqrt{1 - x^2}}$.
(iv) $\sec(\arctan x) = \sqrt{1 + x^2}$.
(v) $\sin(\operatorname{arcsec} x) = \frac{\sqrt{x^2 - 1}}{x}$, if x ≥ 1.
Let us check identity (i). We have

\[ (\cos(\arcsin x))^2 + (\sin(\arcsin x))^2 = 1 . \]

Since sin(arcsin x) = x, we get $(\cos(\arcsin x))^2 = 1 - x^2$. Because arcsin x lies in
[−π/2, π/2], where the cosine is nonnegative, we may take the positive square root, and therefore (i) holds.
Similarly, the other identities are obtained.
Example 6.9 Find the derivatives of $\arcsin x$, $\arccos x$, and $\arctan x$.
Solution: We differentiate the identity $x = \sin(\arcsin x)$ and obtain from the chain rule that

$$1 = \cos(\arcsin x)\cdot(\arcsin)'(x) ,$$

hence, using (i) above,

$$\frac{d}{dx}\arcsin x = \frac{1}{\cos(\arcsin x)} = \frac{1}{\sqrt{1-x^2}} . \tag{6.40}$$

Similarly, from the identity $x = \cos(\arccos x)$ and (ii) above we obtain

$$\frac{d}{dx}\arccos x = -\frac{1}{\sqrt{1-x^2}} . \tag{6.41}$$

To find the derivative of $\arctan x$, we differentiate the identity $x = \tan(\arctan x)$, which yields, together with (iv) above,

$$1 = \sec^2(\arctan x)\cdot(\arctan)'(x) = (1+x^2)\cdot(\arctan)'(x) ,$$

so

$$\frac{d}{dx}\arctan x = \frac{1}{1+x^2} .$$

We can now derive the identity

$$\arcsin x + \arccos x = \frac{\pi}{2} . \tag{6.42}$$

Indeed, we have

$$\frac{d}{dx}(\arcsin x + \arccos x) = 0$$

by (6.40) and (6.41), and therefore the left-hand side of (6.42) must be constant. Since $\arcsin 0 = 0$ and $\arccos 0 = \pi/2$, (6.42) is proved.
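The derivative formulas of Example 6.9 and the identity (6.42) can be checked numerically. The following is only a sanity-check sketch, not part of the text's argument: the helper name `central_difference`, the step size and the sample points are illustrative assumptions.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Sample points inside (-1, 1), where arcsin and arccos are differentiable.
for x in (-0.5, 0.0, 0.3, 0.8):
    d_asin = central_difference(math.asin, x)
    d_acos = central_difference(math.acos, x)
    d_atan = central_difference(math.atan, x)
    assert abs(d_asin - 1 / math.sqrt(1 - x * x)) < 1e-6   # formula (6.40)
    assert abs(d_acos + 1 / math.sqrt(1 - x * x)) < 1e-6   # formula (6.41)
    assert abs(d_atan - 1 / (1 + x * x)) < 1e-6            # derivative of arctan
    assert abs(math.asin(x) + math.acos(x) - math.pi / 2) < 1e-12  # identity (6.42)
```

A difference quotient only approximates the derivative, so the comparison uses a tolerance rather than exact equality.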

Example 6.10 Evaluate the indefinite integral $\int \arcsin x\,dx$.
Solution: We use integration by parts. In the formula $\int u\,dv = uv - \int v\,du$ we set $u = \arcsin x$, $dv = dx$. This yields $du = \dfrac{dx}{\sqrt{1-x^2}}$, and we take $v = x$ as antiderivative of the constant 1. It follows that

$$\int \arcsin x\,dx = x\arcsin x - \int \frac{x}{\sqrt{1-x^2}}\,dx .$$

We now substitute $w = 1 - x^2$. Then $dw = -2x\,dx$ and

$$\int \frac{x}{\sqrt{1-x^2}}\,dx = -\frac{1}{2}\int \frac{dw}{\sqrt{w}} = -\sqrt{w} + C ,$$

so finally

$$\int \arcsin x\,dx = x\arcsin x + \sqrt{1-x^2} + C .$$
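As a quick plausibility check of Example 6.10, one can compare a numerical value of $\int_0^{1/2} \arcsin x\,dx$ with the difference of the antiderivative at the endpoints. This is a sketch under stated assumptions: the composite Simpson rule, the helper names `simpson` and `F`, and the interval $[0, 1/2]$ are chosen here for illustration only.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    """Antiderivative found in Example 6.10 (integration constant omitted)."""
    return x * math.asin(x) + math.sqrt(1 - x * x)

a, b = 0.0, 0.5
numeric = simpson(math.asin, a, b)   # numerical value of the definite integral
exact = F(b) - F(a)                  # value from the antiderivative
print(numeric, exact)
```

The two printed numbers should agree to many digits; a visible discrepancy would point to a sign error in the integration by parts.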


Example 6.11 Evaluate the indefinite integral $\displaystyle\int \frac{dx}{\sqrt{25x^2-1}}$, where $x > \frac{1}{5}$.

Solution: Let $u = 5x$. Thus $du = 5\,dx$ and

$$\int \frac{dx}{\sqrt{25x^2-1}} = \frac{1}{5}\int \frac{5}{\sqrt{25x^2-1}}\,dx = \frac{1}{5}\int \frac{du}{\sqrt{u^2-1}} = \frac{1}{5}\cosh^{-1} u + C = \frac{1}{5}\cosh^{-1} 5x + C .$$

6.8 Partial Fractions and Integration

Here, we discuss the integration of rational functions

$$f(x) = \frac{P(x)}{Q(x)} ,$$

which we have introduced in Definition 1.7 as the quotient of two polynomials P and Q. We present the method of partial fraction expansion by means of several examples.

Example 6.12 Find $\displaystyle\int \frac{-4x+16}{(x-2)(x+2)x}\,dx$.
Solution: The idea is to decompose the rational function as a sum of simpler fractions,

$$\frac{P(x)}{Q(x)} = \frac{-4x+16}{(x-2)(x+2)x} = \frac{A}{x-2} + \frac{B}{x+2} + \frac{C}{x} , \tag{6.43}$$

where the unknown coefficients A, B, and C have to be determined. To this purpose, we compute

$$\frac{A}{x-2} + \frac{B}{x+2} + \frac{C}{x} = \frac{Ax^2 + 2Ax + Bx^2 - 2Bx + Cx^2 - 4C}{(x-2)(x+2)x} = \frac{(A+B+C)x^2 + (2A-2B)x - 4C}{(x-2)(x+2)x} .$$

In order that (6.43) holds, we must have

$$P(x) = -4x + 16 = (A+B+C)x^2 + (2A-2B)x - 4C .$$

Comparing coefficients we get the system

$$A + B + C = 0 , \quad 2A - 2B = -4 , \quad -4C = 16 ,$$

of 3 linear equations for the 3 unknowns A, B, C. The third equation gives $C = -4$, and we are left with

$$A + B = 4 , \quad A - B = -2 ,$$

which we add to obtain $2A = 2$, $A = 1$, and therefore $B = 3$. We may use these numbers in (6.43) to obtain finally

$$\int \frac{-4x+16}{(x-2)(x+2)x}\,dx = \int \frac{1}{x-2}\,dx + \int \frac{3}{x+2}\,dx + \int \frac{-4}{x}\,dx = \ln|x-2| + 3\ln|x+2| - 4\ln|x| + C .$$
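The coefficients in Example 6.12 can be cross-checked by machine. The sketch below deliberately uses a different route than the comparison of coefficients in the text: for a denominator with distinct linear factors $(x - r_k)$, each coefficient equals the numerator at $r_k$ divided by the product of the differences $r_k - r_j$ (the so-called cover-up rule). The function names and the coefficient-list convention are assumptions made here for illustration, and the sketch does not cover repeated factors as in Example 6.14.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Evaluate a polynomial with coefficients [c0, c1, ...] (lowest degree first)."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def partial_fraction_coefficients(numerator, roots):
    """Return A_k with P(x) / prod(x - r_k) = sum A_k / (x - r_k).

    Assumes the roots are distinct and deg P is less than the number of roots.
    """
    coefficients = []
    for k, r in enumerate(roots):
        denominator = Fraction(1)
        for j, s in enumerate(roots):
            if j != k:
                denominator *= (r - s)
        coefficients.append(evaluate(numerator, Fraction(r)) / denominator)
    return coefficients

# Example 6.12: P(x) = -4x + 16, Q(x) = (x - 2)(x + 2)x, roots 2, -2, 0.
A, B, C = partial_fraction_coefficients([16, -4], [2, -2, 0])
print(A, B, C)   # equal to 1, 3, -4 as found in the text
```

Exact rational arithmetic via `Fraction` avoids rounding, so the result can be compared with the hand computation by equality.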

Example 6.13 Evaluate $\displaystyle\int \frac{3x^2+8x-4}{(x-2)(x+2)x}\,dx$.
Solution: As in the previous example, we determine A, B, C such that

$$\frac{3x^2+8x-4}{(x-2)(x+2)x} = \frac{A}{x-2} + \frac{B}{x+2} + \frac{C}{x} = \frac{A(x+2)x + B(x-2)x + C(x-2)(x+2)}{(x-2)(x+2)x}$$

holds. Comparing the coefficients of $x^2$, $x$ and the constant term we find that $A = 3$, $B = -1$, $C = 1$. Thus,

$$\int \frac{3x^2+8x-4}{(x-2)(x+2)x}\,dx = \int \frac{3}{x-2}\,dx + \int \frac{-1}{x+2}\,dx + \int \frac{dx}{x} = 3\ln|x-2| - \ln|x+2| + \ln|x| + C .$$

Example 6.14 Evaluate $\displaystyle\int \frac{2x+5}{(x+2)^2(x+3)^2}\,dx$.
Solution: Because of the double factors in the denominator, we use a slightly different partial fraction expansion, namely

$$\frac{2x+5}{(x+2)^2(x+3)^2} = \frac{A}{x+2} + \frac{B}{(x+2)^2} + \frac{C}{x+3} + \frac{D}{(x+3)^2} \tag{6.44}$$

with the 4 unknown coefficients A, B, C, D. They must satisfy

$$2x + 5 = A(x+2)(x+3)^2 + B(x+3)^2 + C(x+3)(x+2)^2 + D(x+2)^2 .$$

We have to compare the coefficients of the 4 terms $x^3$, $x^2$, $x$, and the constant. When we do this, we obtain $A = C = 0$, $B = 1$, $D = -1$. Therefore,

$$\int \frac{2x+5}{(x+2)^2(x+3)^2}\,dx = \int \frac{1}{(x+2)^2}\,dx - \int \frac{1}{(x+3)^2}\,dx = -\frac{1}{x+2} + \frac{1}{x+3} + C .$$

Let us remark that often we have to carry out the factorization of the denominator Q as a preliminary step. If in Example 6.12, we had been asked for the indefinite integral

$$\int \frac{-4x+16}{x^3-4x}\,dx ,$$

we first should have factorized $Q(x)$ as $x^3 - 4x = (x-2)(x+2)x$ and then continued as before. In general, it may also happen that $Q(x)$ not only contains linear factors, but also quadratic factors which cannot be decomposed further, like, for example, $x^2 + 1$. This case arises when the associated quadratic equation (here $x^2 + 1 = 0$) has no real solutions. We do not discuss this case here.

6.9 Improper Integrals

In many applications, we are required to evaluate integrals which have unbounded intervals of integration (that is, the lower limit of integration is $-\infty$, or the upper limit is $+\infty$, or both), or integrals whose integrand tends to infinity at an interior or boundary point of the integration interval. Such integrals are called improper integrals. To motivate their definition, assume we want to find the area of the region R below the curve $y = f(x) = \frac{1}{x^2}$, extending from the vertical line $x = 1$ to the right without bound. For any number $b > 1$, we may partition R into two subregions by the vertical line $x = b$. As b gets larger and larger, the area of the part where $x \ge b$ becomes smaller and smaller. If it tends to zero (which we will see below for this example), it makes sense to define the area $\int_1^\infty \frac{1}{x^2}\,dx$ of the region R as the limit (for $b \to \infty$) of the area $\int_1^b \frac{1}{x^2}\,dx$ to the left of the vertical line $x = b$.
1 x

Definition 6.3 (Infinite limit of integration) Let f be a function which is integrable on every bounded interval $[a, b]$ with $b > a$ (this holds, for example, when f is continuous on $[a, \infty)$). Then the improper integral of f over $[a, \infty)$ is defined by

$$\int_a^\infty f(x)\,dx = \lim_{b\to\infty} \int_a^b f(x)\,dx \tag{6.45}$$

whenever the limit exists and is a finite number. In this case, the improper integral (6.45) is said to be convergent; otherwise, it is said to be divergent.

Note that the improper integral is a special case of the improper limit of a function as discussed in Sect. 2.4. Namely, if we set

$$F(b) = \int_a^b f(x)\,dx ,$$

then the improper integral (6.45) equals the improper limit of $F(b)$ as $b \to \infty$.
Improper integrals of f over $(-\infty, b]$ are defined analogously.
The improper integral of f over the whole real line $(-\infty, \infty)$ is defined as

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx , \quad c \in \mathbb{R} , \tag{6.46}$$

if both integrals on the right-hand side are convergent. In this case, $\int_{-\infty}^{\infty} f(x)\,dx$ is said to be convergent. (It does not matter which value of c we choose in (6.46), since the sum of the two integrals on the right-hand side will always be the same.)

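Example 6.15 Evaluate the following improper integrals:

1. $\displaystyle\int_2^\infty \frac{1}{x^2}\,dx$,
2. $\displaystyle\int_0^\infty e^{-3x}\,dx$,
3. $\displaystyle\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx$,
4. $\displaystyle\int_{-\infty}^\infty x e^{-x^2}\,dx$,
5. $\displaystyle\int_{-\infty}^\infty \frac{(\arctan x)^2}{1+x^2}\,dx$.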
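Solution: In all instances, the computations will show that the improper integrals exist, so the passage to the limit is justified.

1. $$\int_2^\infty \frac{1}{x^2}\,dx = \lim_{b\to\infty}\int_2^b \frac{1}{x^2}\,dx = \lim_{b\to\infty}\left[-\frac{1}{x}\right]_2^b = -\lim_{b\to\infty}\frac{1}{b} + \lim_{b\to\infty}\frac{1}{2} = 0 + \frac{1}{2} = \frac{1}{2} .$$

2. $$\int_0^\infty e^{-3x}\,dx = \lim_{b\to\infty}\int_0^b e^{-3x}\,dx = \lim_{b\to\infty}\left[-\frac{e^{-3x}}{3}\right]_0^b = -\frac{1}{3}\left[\lim_{b\to\infty} e^{-3b} - 1\right] = \frac{1}{3} .$$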

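3. Choosing $c = 0$ in (6.46), we partition the integral as

$$\int_{-\infty}^{\infty} \frac{1}{1+x^2}\,dx = \int_{-\infty}^{0} \frac{1}{1+x^2}\,dx + \int_{0}^{\infty} \frac{1}{1+x^2}\,dx .$$

We compute the first integral on the right-hand side,

$$\int_{-\infty}^{0} \frac{1}{1+x^2}\,dx = \lim_{a\to-\infty}\int_a^0 \frac{1}{1+x^2}\,dx = \lim_{a\to-\infty}\left[\tan^{-1} x\right]_a^0 = 0 - \lim_{a\to-\infty}\tan^{-1} a = \frac{\pi}{2} .$$

Since the function $f(x) = 1/(1+x^2)$ is even, we have

$$\int_0^\infty \frac{1}{1+x^2}\,dx = \int_{-\infty}^0 \frac{1}{1+x^2}\,dx = \frac{\pi}{2} ,$$

and therefore $\displaystyle\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx = \pi$.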
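4. Choosing $c = 0$ in (6.46), we have

$$\int_{-\infty}^{\infty} x e^{-x^2}\,dx = \int_{-\infty}^{0} x e^{-x^2}\,dx + \int_{0}^{\infty} x e^{-x^2}\,dx . \tag{6.47}$$

We evaluate the first integral on the right-hand side,

$$\begin{aligned}
\int_{-\infty}^{0} x e^{-x^2}\,dx &= \lim_{a\to-\infty}\int_{a}^{0} x e^{-x^2}\,dx \\
&= \lim_{a\to-\infty}\frac{1}{2}\int_{a^2}^{0} e^{-t}\,dt \quad (\text{using the substitution } x^2 = t) \\
&= \lim_{a\to-\infty}\left[-\frac{1}{2}e^{-t}\right]_{a^2}^{0} = \lim_{a\to-\infty}\left(-\frac{1}{2} + \frac{1}{2}e^{-a^2}\right) \\
&= -\frac{1}{2} .
\end{aligned}$$

Since the function $f(x) = x e^{-x^2}$ is odd, we have

$$\int_0^\infty x e^{-x^2}\,dx = -\int_{-\infty}^0 x e^{-x^2}\,dx = \frac{1}{2} ,$$

so

$$\int_{-\infty}^\infty x e^{-x^2}\,dx = -\frac{1}{2} + \frac{1}{2} = 0 .$$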

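5. As above, we partition with $c = 0$,

$$\int_{-\infty}^{\infty} \frac{(\arctan x)^2}{1+x^2}\,dx = \int_{-\infty}^{0} \frac{(\arctan x)^2}{1+x^2}\,dx + \int_{0}^{\infty} \frac{(\arctan x)^2}{1+x^2}\,dx .$$

The antiderivative we need is given by

$$\int \frac{(\arctan x)^2}{1+x^2}\,dx = \frac{1}{3}(\arctan x)^3 + C .$$

Now $\arctan b \to \frac{1}{2}\pi$ as $b \to \infty$, and $\arctan a \to -\frac{1}{2}\pi$ as $a \to -\infty$. Therefore,

$$\int_0^\infty \frac{(\arctan x)^2}{1+x^2}\,dx = \lim_{b\to\infty}\left[\frac{1}{3}(\arctan x)^3\right]_0^b = \frac{1}{3}\left(\frac{1}{2}\pi\right)^3 = \frac{1}{24}\pi^3 .$$

Since the integrand is even,

$$\int_{-\infty}^0 \frac{(\arctan x)^2}{1+x^2}\,dx = \int_0^\infty \frac{(\arctan x)^2}{1+x^2}\,dx = \frac{1}{24}\pi^3 ,$$

hence

$$\int_{-\infty}^\infty \frac{(\arctan x)^2}{1+x^2}\,dx = \frac{1}{24}\pi^3 + \frac{1}{24}\pi^3 = \frac{1}{12}\pi^3 .$$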
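Definition 6.3 describes an improper integral as the limit of integrals over bounded intervals. The sketch below illustrates this for parts 1, 3 and 5 of Example 6.15 by integrating numerically up to a large but finite bound. The Simpson rule, the helper names, the cutoff `b = 1000` and the step size are all assumptions made for this illustration; the truncated values only approximate the limits.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def truncated(f, a, b):
    """Integral of f over the bounded interval [a, b], step size about 0.02."""
    n = 2 * max(1, int((b - a) * 25))
    return simpson(f, a, b, n)

b = 1000.0  # finite stand-in for the infinite limit of integration
I1 = truncated(lambda x: 1 / x**2, 2.0, b)                        # tends to 1/2
I3 = truncated(lambda x: 1 / (1 + x * x), -b, b)                  # tends to pi
I5 = truncated(lambda x: math.atan(x)**2 / (1 + x * x), -b, b)    # tends to pi^3/12
print(I1, I3, I5)
```

Increasing `b` moves the three values closer to $1/2$, $\pi$ and $\pi^3/12$, at the cost of more function evaluations.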

Remark 6.5 The improper integral

$$\int_1^\infty \frac{1}{x}\,dx \tag{6.48}$$

is a divergent integral. Indeed, we have

$$\int_1^b \frac{1}{x}\,dx = \Big[\ln x\Big]_1^b = \ln b - \ln 1 = \ln b .$$

We know that $\ln b$ tends to $\infty$ as $b \to \infty$. Hence, the integral (6.48) is divergent.

We now discuss the case where the integrand tends to infinity at an end point of the integration interval. Let f be a continuous function on $(a, b]$ with $f(x) \to \infty$ (or $-\infty$) for $x \to a$. The improper integral of f on $[a, b]$ is defined as the right-hand limit

$$\int_a^b f(x)\,dx = \lim_{\varepsilon\to 0+}\int_{a+\varepsilon}^b f(x)\,dx , \tag{6.49}$$

whenever this limit exists as a finite number, and the improper integral is then said to be convergent. As an example, consider

$$\int_0^1 \frac{1}{\sqrt{x}}\,dx . \tag{6.50}$$

We have

$$\int_\varepsilon^1 \frac{1}{\sqrt{x}}\,dx = \Big[2\sqrt{x}\Big]_\varepsilon^1 = 2 - 2\sqrt{\varepsilon} ,$$

so we get

$$\int_0^1 \frac{1}{\sqrt{x}}\,dx = \lim_{\varepsilon\to 0+}\int_\varepsilon^1 \frac{1}{\sqrt{x}}\,dx = 2 .$$

Analogously, when $f(x) \to \infty$ (or $-\infty$) for $x \to b$ on $[a, b]$, we define

$$\int_a^b f(x)\,dx = \lim_{\varepsilon\to 0+}\int_a^{b-\varepsilon} f(x)\,dx .$$

The integral test. We have discussed the notion of convergence for infinite series
in Chap. 5. The so-called integral test relates the convergence behavior of a series to
that of an improper integral.

Theorem 6.8 (Integral Test) Let f be a continuous nonincreasing function defined for all $x \ge 1$. For $n = 1, 2, 3, \dots$, we set $s_n = f(n)$. Then the series $\sum_{n=1}^\infty s_n$ and the improper integral $\int_1^\infty f(x)\,dx$ either both converge or both diverge.

The proof of this theorem will be given in Appendix D.


While this test works both ways, it is mainly used to deduce the convergence of a series from the convergence of the corresponding improper integral.

Example 6.16 Show that the improper integral $\displaystyle\int_a^\infty \frac{1}{x^p}\,dx$, where $a > 0$, diverges if $p \le 1$ and converges if $p > 1$. As a consequence, the series $\displaystyle\sum_{n=1}^\infty \frac{1}{n^p}$ converges for $p > 1$ and diverges for $p \le 1$. In particular, the harmonic series $\displaystyle\sum_{n=1}^\infty \frac{1}{n}$ diverges.

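Solution: Suppose $p \neq 1$. Then

$$\int_a^b \frac{1}{x^p}\,dx = \left[-\frac{1}{p-1}\cdot\frac{1}{x^{p-1}}\right]_a^b = \frac{1}{p-1}\left(\frac{1}{a^{p-1}} - \frac{1}{b^{p-1}}\right) .$$

We have

$$\lim_{b\to\infty}\frac{1}{b^{p-1}} = \begin{cases} 0 , & \text{if } p - 1 > 0 , \\ +\infty , & \text{if } p - 1 < 0 . \end{cases}$$

Hence, if $p > 1$ then

$$\int_a^\infty \frac{1}{x^p}\,dx = \lim_{b\to\infty}\int_a^b \frac{1}{x^p}\,dx = \frac{1}{p-1}\cdot\frac{1}{a^{p-1}} .$$

If $p < 1$, then

$$\int_a^\infty \frac{1}{x^p}\,dx = \lim_{b\to\infty}\int_a^b \frac{1}{x^p}\,dx = \lim_{b\to\infty}\frac{1}{1-p}\left(b^{1-p} - a^{1-p}\right) = +\infty .$$

In the case $p = 1$ we have (see Remark 6.5 above)

$$\int_a^\infty \frac{1}{x}\,dx = \lim_{b\to\infty}(\ln b - \ln a) = +\infty .$$

Therefore, the improper integral converges for $p > 1$ and diverges for $p \le 1$.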
Example 6.17 Use the integral test to determine whether the infinite series $\displaystyle\sum_{n=1}^\infty \frac{n}{e^{n^2}}$ is convergent.

Solution: The nth term of the series is $s_n = \dfrac{n}{e^{n^2}}$, so we take $f(x) = \dfrac{x}{e^{x^2}} = x e^{-x^2}$. This function is continuous and positive for all positive values of x. Moreover,

$$f'(x) = e^{-x^2} + x e^{-x^2}(-2x) = e^{-x^2}(1 - 2x^2) .$$

Since $f'(x)$ is negative for $x > 1$, f is decreasing on $[1, \infty)$. The assumptions of the integral test (Theorem 6.8) are satisfied. To apply the test, we compute

$$\int_1^b x e^{-x^2}\,dx = \left[-\frac{1}{2}e^{-x^2}\right]_1^b = -\frac{1}{2}e^{-b^2} + \frac{1}{2}e^{-1} .$$

Therefore, the improper integral (the limit as $b \to \infty$) converges,

$$\int_1^\infty x e^{-x^2}\,dx = \frac{1}{2}e^{-1} .$$

We thus conclude from the integral test that the given infinite series is convergent.
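For a positive nonincreasing function, the comparison behind the integral test also gives the standard two-sided estimate $\int_1^\infty f(x)\,dx \le \sum_{n=1}^\infty f(n) \le f(1) + \int_1^\infty f(x)\,dx$. This estimate is not stated in the text above; it is quoted here as a well-known consequence of the same comparison. The sketch below checks it for Example 6.17, with the number of summed terms chosen arbitrarily.

```python
import math

def f(x):
    """Function from Example 6.17, f(x) = x * exp(-x^2)."""
    return x * math.exp(-x * x)

# Partial sum of the series; the terms decay so fast that 49 terms are ample.
partial_sum = sum(f(n) for n in range(1, 50))

integral = 0.5 * math.exp(-1)        # value of the improper integral from Example 6.17
lower, upper = integral, f(1) + integral

print(lower, partial_sum, upper)     # the partial sum lies between the two bounds
```

The bounds confirm convergence and give a rough idea of the size of the sum without computing it exactly.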

6.10 Additional Tables of Integrals

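Forms involving $a + bx$

1. $\displaystyle\int \frac{x}{a+bx}\,dx = \frac{1}{b^2}\left[a + bx - a\ln|a+bx|\right] + C$.
2. $\displaystyle\int \frac{x^2}{a+bx}\,dx = \frac{1}{2b^3}\left[(a+bx)^2 - 4a(a+bx) + 2a^2\ln|a+bx|\right] + C$.
3. $\displaystyle\int \frac{x}{(a+bx)^2}\,dx = \frac{1}{b^2}\left[\frac{a}{a+bx} + \ln|a+bx|\right] + C$.
4. $\displaystyle\int \frac{x^2}{(a+bx)^2}\,dx = \frac{1}{b^3}\left[a + bx - \frac{a^2}{a+bx} - 2a\ln|a+bx|\right] + C$.
5. $\displaystyle\int \frac{x}{\sqrt{a+bx}}\,dx = \frac{2}{3b^2}(bx - 2a)\sqrt{a+bx} + C$.
6. $\displaystyle\int \frac{dx}{x\sqrt{a+bx}} = \frac{1}{\sqrt{a}}\ln\left|\frac{\sqrt{a+bx} - \sqrt{a}}{\sqrt{a+bx} + \sqrt{a}}\right| + C$, $a > 0$.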

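Forms involving $a^2 + x^2$

7. $\displaystyle\int \frac{dx}{\sqrt{a^2+x^2}} = \ln\left|x + \sqrt{a^2+x^2}\right| + C$.
8. $\displaystyle\int \frac{dx}{x\sqrt{a^2+x^2}} = -\frac{1}{a}\ln\left|\frac{\sqrt{a^2+x^2} + a}{x}\right| + C$.
9. $\displaystyle\int \frac{dx}{x^2\sqrt{a^2+x^2}} = -\frac{\sqrt{a^2+x^2}}{a^2 x} + C$.
10. $\displaystyle\int \frac{dx}{(a^2+x^2)^{3/2}} = \frac{x}{a^2\sqrt{a^2+x^2}} + C$.
11. $\displaystyle\int \sqrt{a^2+x^2}\,dx = \frac{x}{2}\sqrt{a^2+x^2} + \frac{a^2}{2}\ln\left|x + \sqrt{a^2+x^2}\right| + C$.
12. $\displaystyle\int x^2\sqrt{a^2+x^2}\,dx = \frac{x}{8}(a^2 + 2x^2)\sqrt{a^2+x^2} - \frac{a^4}{8}\ln\left|x + \sqrt{a^2+x^2}\right| + C$.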

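Forms involving $x^2 - a^2$

13. $\displaystyle\int \frac{dx}{\sqrt{x^2-a^2}} = \ln\left|x + \sqrt{x^2-a^2}\right| + C$.
14. $\displaystyle\int \frac{dx}{x^2\sqrt{x^2-a^2}} = \frac{\sqrt{x^2-a^2}}{a^2 x} + C$.
15. $\displaystyle\int \frac{dx}{(x^2-a^2)^{3/2}} = -\frac{x}{a^2\sqrt{x^2-a^2}} + C$.
16. $\displaystyle\int \sqrt{x^2-a^2}\,dx = \frac{x}{2}\sqrt{x^2-a^2} - \frac{a^2}{2}\ln\left|x + \sqrt{x^2-a^2}\right| + C$.
17. $\displaystyle\int x^2\sqrt{x^2-a^2}\,dx = \frac{x}{8}(2x^2 - a^2)\sqrt{x^2-a^2} - \frac{a^4}{8}\ln\left|x + \sqrt{x^2-a^2}\right| + C$.
18. $\displaystyle\int \frac{\sqrt{x^2-a^2}}{x^2}\,dx = -\frac{\sqrt{x^2-a^2}}{x} + \ln\left|x + \sqrt{x^2-a^2}\right| + C$.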

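Forms involving $a^2 - x^2$

19. $\displaystyle\int \frac{dx}{x\sqrt{a^2-x^2}} = -\frac{1}{a}\ln\left|\frac{a + \sqrt{a^2-x^2}}{x}\right| + C$.
20. $\displaystyle\int \frac{dx}{x^2\sqrt{a^2-x^2}} = -\frac{\sqrt{a^2-x^2}}{a^2 x} + C$.
21. $\displaystyle\int \frac{dx}{(a^2-x^2)^{3/2}} = \frac{x}{a^2\sqrt{a^2-x^2}} + C$.
22. $\displaystyle\int \frac{\sqrt{a^2-x^2}}{x}\,dx = \sqrt{a^2-x^2} - a\ln\left|\frac{a + \sqrt{a^2-x^2}}{x}\right| + C$.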
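Forms involving $e^{ax}$ and $\ln x$

23. $\displaystyle\int x e^{ax}\,dx = \frac{1}{a^2}(ax - 1)e^{ax} + C$.
24. $\displaystyle\int x^n e^{ax}\,dx = \frac{1}{a}x^n e^{ax} - \frac{n}{a}\int x^{n-1}e^{ax}\,dx$.
25. $\displaystyle\int \frac{dx}{1 + be^{ax}} = x - \frac{1}{a}\ln(1 + be^{ax}) + C$.
26. $\displaystyle\int \ln x\,dx = x\ln x - x + C$.
27. $\displaystyle\int (\ln x)^n\,dx = x(\ln x)^n - n\int (\ln x)^{n-1}\,dx$.
28. $\displaystyle\int x^n\ln x\,dx = \frac{x^{n+1}}{(n+1)^2}\left[(n+1)\ln x - 1\right] + C$, $n \neq -1$.
29. $\displaystyle\int \frac{dx}{x\ln x} = \ln|\ln x| + C$.
30. $\displaystyle\int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2 + b^2} + C$.
31. $\displaystyle\int e^{ax}\cos bx\,dx = \frac{e^{ax}(a\cos bx + b\sin bx)}{a^2 + b^2} + C$.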
We present some examples.

Example 6.18 Use the table of integrals to evaluate $\displaystyle\int \frac{x}{(1+2x)^2}\,dx$.
Solution: Since $(1+2x)^2$ is of the form $(a+bx)^2$ with $a = 1$ and $b = 2$, we use Formula 3 to obtain

$$\int \frac{x}{(1+2x)^2}\,dx = \frac{1}{4}\left[\frac{1}{1+2x} + \ln|1+2x|\right] + C .$$

Example 6.19 Use the table of integrals to evaluate $\displaystyle\int_3^4 \frac{dx}{x\sqrt{50-2x^2}}$.
Solution: We first evaluate the indefinite integral

$$I = \int \frac{dx}{x\sqrt{50-2x^2}} .$$

We have $\sqrt{50-2x^2} = \sqrt{2}\sqrt{25-x^2}$, so that

$$I = \frac{1}{\sqrt{2}}\int \frac{dx}{x\sqrt{25-x^2}} .$$

Using Formula 19 with $a = 5$, we get

$$I = \frac{1}{\sqrt{2}}\cdot\left(-\frac{1}{5}\right)\ln\left|\frac{5+\sqrt{25-x^2}}{x}\right| + C .$$

For $3 \le x \le 4$, the argument of the logarithm is positive, so we can compute

$$\begin{aligned}
\int_3^4 \frac{dx}{x\sqrt{50-2x^2}} &= \left[-\frac{\sqrt{2}}{10}\ln\frac{5+\sqrt{25-x^2}}{x}\right]_3^4 \\
&= -\frac{\sqrt{2}}{10}\ln\frac{5+\sqrt{25-16}}{4} + \frac{\sqrt{2}}{10}\ln\frac{5+\sqrt{25-9}}{3} \\
&= -\frac{\sqrt{2}}{10}\ln 2 + \frac{\sqrt{2}}{10}\ln 3 = \frac{\sqrt{2}}{10}(-\ln 2 + \ln 3) \\
&= \frac{\sqrt{2}}{10}\ln\frac{3}{2} .
\end{aligned}$$

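Example 6.20 Use the table of integrals to evaluate $\displaystyle\int \frac{dx}{1+e^{-x}}$.
Solution: Using Formula 25 with $a = -1$, $b = 1$, we get

$$\int \frac{dx}{1+e^{-x}} = x + \ln(1+e^{-x}) + C .$$

Example 6.21 Use the table of integrals to evaluate $\displaystyle\int_0^2 \frac{dx}{(5-x^2)^{3/2}}$.
Solution: We first evaluate the indefinite integral

$$I = \int \frac{dx}{(5-x^2)^{3/2}} .$$

Using Formula 21 with $a = \sqrt{5}$ (so that $a^2 = 5$), we get

$$I = \frac{x}{5\sqrt{5-x^2}} + C .$$

Now

$$\int_0^2 \frac{dx}{(5-x^2)^{3/2}} = \frac{1}{5}\left[\frac{x}{\sqrt{5-x^2}}\right]_0^2 = \frac{1}{5}\left(\frac{2}{\sqrt{5-4}} - 0\right) = \frac{2}{5} .$$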
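Values read off from a table of integrals are easy to mistype, so a numerical cross-check can be worthwhile. The sketch below compares the results of Examples 6.19 and 6.21 with a direct numerical integration. The Simpson rule and the names `simpson`, `ex_619` and `ex_621` are assumptions made for this illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Example 6.19: integral of 1 / (x * sqrt(50 - 2x^2)) over [3, 4].
ex_619 = simpson(lambda x: 1 / (x * math.sqrt(50 - 2 * x * x)), 3.0, 4.0)
table_619 = math.sqrt(2) / 10 * math.log(1.5)

# Example 6.21: integral of (5 - x^2)^(-3/2) over [0, 2].
ex_621 = simpson(lambda x: (5 - x * x) ** -1.5, 0.0, 2.0)
table_621 = 2 / 5

print(ex_619, table_619)
print(ex_621, table_621)
```

Each printed pair should agree to many digits. Using $a = 5$ instead of $a = \sqrt{5}$ in Formula 21 would give $2/25$ rather than $2/5$, which the second comparison would expose.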

6.11 Exercises

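Evaluate the following integrals:

6.11.1 $\displaystyle\int (4x+5)\,dx$,
6.11.2 $\displaystyle\int (9t^2 - 5t + 9)\,dt$,
6.11.3 $\displaystyle\int \left(3\sqrt{u} + \frac{2}{\sqrt{u}}\right)du$,
6.11.4 $\displaystyle\int (5z^{-7} + 7z^{-3} - z)\,dz$,
6.11.5 $\displaystyle\int (x^{2/3} - 4x^{-1/5} + 4)\,dx$,
6.11.6 $\displaystyle\int u(1+u^2)\,du$,
6.11.7 $\displaystyle\int (2x^{-1} - \sqrt{2}\,e^{x})\,dx$,
6.11.8 $\displaystyle\int (x^{2/3} - \sin x)\,dx$,
6.11.9 $\displaystyle\int \frac{\sec x}{\cos x}\,dx$,
6.11.10 $\displaystyle\int \frac{\sec u\,\sin u}{\cos u}\,du$,
6.11.11 $\displaystyle\int (1 + \sin^2\theta\,\csc\theta)\,d\theta$,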

6.11.12 $\displaystyle\int \frac{\sin 2\theta}{\cos\theta}\,d\theta$,
6.11.13 $\displaystyle\int \frac{(1+\cot^2 x)\cot x}{\csc x}\,dx$,
6.11.14 Show that $\sin^2 x$, $-\cos^2 x$ and $-\frac{1}{2}\cos 2x$ are antiderivatives of $2\sin x\cos x$.
Evaluate the following integrals using the given substitution and express the answer in terms of x:

6.11.15 $\displaystyle\int 2x(x^2+1)^9\,dx$; $u = x^2 + 1$,
6.11.16 $\displaystyle\int \frac{x}{(x^2+6)}\,dx$; $u = x^2 + 6$,
6.11.17 $\displaystyle\int \frac{\sin 3x}{(1+\cos 3x)}\,dx$; $u = 1 + \cos 3x$,
6.11.18 $\displaystyle\int e^{2x}\sqrt{1+e^{2x}}\,dx$; $u = 1 + e^{2x}$.

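Evaluate the following integrals:

6.11.19 $\displaystyle\int \frac{\sin 2x}{\sqrt{1-\cos 2x}}\,dx$,
6.11.20 $\displaystyle\int \sin x\,(1+\cos x)^2\,dx$,
6.11.21 $\displaystyle\int \sin^5 3\theta\,\cos 3\theta\,d\theta$.
6.11.22 Evaluate the following integrals:
(a) $\displaystyle\int_{-6}^{12} 12\,dx$, (b) $\displaystyle\int_{-1}^{4} (4-6x)\,dx$, (c) $\displaystyle\int_{3}^{3} 900\,dx$, (d) $\displaystyle\int_{-4}^{4} 2\,dx$,
(e) $\displaystyle\int_{2}^{8} (2x^2+5x+2)\,dx$, (f) $\displaystyle\int_{0}^{2} \sqrt{4-x^2}\,dx$, (g) $\displaystyle\int_{-\pi/3}^{\pi/3} \sin x\,dx$.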
6.11.23 Express as a single integral
 1  6
(a) f (x) d x + f (x) d x
6 6 −32
(b) f (x) d x − f (x) d x
−2h  −2h
(c) f (x) d x − f (x) d x.
d g

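6.11.24 Evaluate the following integrals:

(a) $\displaystyle\int_{-3}^{7} f(x)\,dx$, where $f(x) = |x-4|$.
(b) $\displaystyle\int_{-2}^{8} g(x)\,dx$, where $g(x) = \begin{cases} 1 & \text{if } -2 \le x \le 3 \\ x & \text{if } 3 \le x \le 8 \end{cases}$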
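6.11.25 Use the integration by parts formula to evaluate the following integrals:
(a) $\displaystyle\int \ln x\,dx$, (b) $\displaystyle\int (\cos x)^2\,dx$, (c) $\displaystyle\int \tan^{-1} x\,dx$,
(d) $\displaystyle\int x^3 e^x\,dx$, (e) $\displaystyle\int_0^1 x e^{-3x}\,dx$, (f) $\displaystyle\int_0^4 \ln(x^2+1)\,dx$.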
6.11.26 Evaluate
 ∞ the following improper
 ∞ integrals provided
 ∞they are convergent:
x 2
(a) e−x d x, (b) d x, (c) d x,
−1 1 + x x −1
2 2
0 5
 0  ∞
ex 1
(d) d x, (e) d x.
−∞ 3 − 2e
x x4
1

6.11.27 Evaluate $\displaystyle\int \frac{6x^2+13x+6}{(x+2)(x+1)^2}\,dx$, using partial fractions.
6.11.28 Evaluate
 the following integrals:

(a) cos5 θ dθ, (b) sin3 x cos3 x d x,
 π/6  π/3
(c) sec3 θ tan θ dθ, (d) sin4 3x cos3 3x d x,
0  3 0
cos θ x3
(e)  dθ, (f) d x.
0 (3 + x )
2 5/2
2 − sin2 θ
Chapter 7
Applications of Integration

In the previous chapter, we have introduced the notion of the integral as the limit of
certain sums known as Riemannian sums, and we have seen that it yields the area
below the graph of a given nonnegative function f (x) defined over a closed interval
[a, b]. In Sect. 7.1, we apply integration to find the area for several variants of this
basic situation, involving curves below the x-axis, partly above and partly below the
x-axis, two curves, or a curve and the y-axis. Sect. 7.2 is devoted to the determination
of the length of plane curves, the area of surfaces of revolution as well as the volume
of solids of revolution. The interpretation of the integral as an average is discussed
in Sect. 7.3. Applications of integration to problems from finance and business are
presented in Sect. 7.4. The modeling of basic concepts of mechanics like mechanical
work and forces in terms of integrals is discussed in Sect. 7.5. In Sect. 7.6, we show
that integrals arise in elementary probability theory as well. A fairly large number
of exercises are given in Sect. 7.7.

7.1 Areas Under Curves

Curves Above the x-Axis. Let a curve $y = f(x)$ lie above the x-axis. We have already explained in the beginning of the previous chapter that the area below the curve between $x = a$ and $x = b$ is given by the definite integral

$$\int_a^b f(x)\,dx$$

as the result of a limiting process involving the areas $f(x)\,\delta x$ of small approximating rectangles (see Fig. 7.1).

© Springer Nature Singapore Pte Ltd. 2019
M. Brokate et al., Calculus for Scientists and Engineers, Industrial and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_7
Fig. 7.1 A small approximating rectangle

Fig. 7.2 Approximating rectangle for a curve below the x-axis

Example 7.1 Find the area under the curve $y = f(x) = x^2 - 4x + 5$ between $x = -1$ and $x = 2$.

Solution: The area is obtained from the Fundamental Theorem of Calculus as

$$\int_{-1}^{2} f(x)\,dx = \int_{-1}^{2} (x^2 - 4x + 5)\,dx = \left[\frac{1}{3}x^3 - 2x^2 + 5x\right]_{x=-1}^{x=2} = \left(\frac{8}{3} - 8 + 10\right) - \left(-\frac{1}{3} - 2 - 5\right) = 12 .$$

Curves Below the x-Axis. If a curve $y = f(x)$ lies below the x-axis, the function values $y = f(x)$ are negative (Fig. 7.2). Thus, the numbers $f(x)\,\delta x$ are negative, and the area of a small approximating rectangle is given by $-f(x)\,\delta x$. In the limit, the number

$$-\int_a^b f(x)\,dx$$

gives the area between the curve and the x-axis from $x = a$ to $x = b$.

Fig. 7.3 Area between curve and x-axis

Example 7.2 Find the area between the curve $y = f(x) = 1 - x^2$ and the x-axis from $x = 3$ to $x = 5$.

Solution: Between $x = 3$ and $x = 5$, the curve lies below the x-axis (Fig. 7.3). We compute

$$\int_3^5 f(x)\,dx = \int_3^5 (1 - x^2)\,dx = \left[x - \frac{x^3}{3}\right]_{x=3}^{x=5} = \left(5 - \frac{125}{3}\right) - \left(3 - \frac{27}{3}\right) = -\frac{92}{3} .$$

Therefore, the required area is $\frac{92}{3}$.
Curves Partly Above and Partly Below the x-Axis. If a curve lies partly below and
partly above the x-axis, we first determine the points where the curve intersects the
x-axis. Then the areas above and below the x-axis are calculated separately.
Example 7.3 Find the total area between the curve $y = f(x) = x^2 - 4x + 3$ and the x-axis between $x = 0$ and $x = 4$.
Solution: The curve, shown in Fig. 7.4, intersects the x-axis when

$$0 = f(x) = x^2 - 4x + 3 = (x-1)(x-3) .$$

We therefore have two intersection points, namely, $x = 1$ and $x = 3$. We want to determine the area of the shaded region, which lies partly above and partly below the x-axis. To this purpose, we find the areas $A_1$, $A_2$, and $A_3$ separately. We compute

$$\int_0^1 f(x)\,dx = \int_0^1 (x^2 - 4x + 3)\,dx = \left[\frac{x^3}{3} - 4\frac{x^2}{2} + 3x\right]_{x=0}^{x=1} = \frac{1}{3} - 2 + 3 = \frac{4}{3} ,$$

therefore $A_1 = \frac{4}{3}$.

Fig. 7.4 Curve partly above and partly below the x-axis

Next,

$$\int_1^3 f(x)\,dx = \int_1^3 (x^2 - 4x + 3)\,dx = \left[\frac{x^3}{3} - 4\frac{x^2}{2} + 3x\right]_{x=1}^{x=3} = (9 - 18 + 9) - \left(\frac{1}{3} - 2 + 3\right) = -\frac{4}{3} .$$

Between $x = 1$ and $x = 3$, the curve lies below the x-axis, therefore $A_2 = -\int_1^3 f(x)\,dx = \frac{4}{3}$. Finally,

$$\int_3^4 f(x)\,dx = \int_3^4 (x^2 - 4x + 3)\,dx = \left[\frac{x^3}{3} - 4\frac{x^2}{2} + 3x\right]_{x=3}^{x=4} = \left(\frac{64}{3} - 32 + 12\right) - (9 - 18 + 9) = \frac{4}{3} ,$$

and $A_3 = \frac{4}{3}$.
The total area between the curve and the x-axis becomes

$$A_1 + A_2 + A_3 = \frac{4}{3} + \frac{4}{3} + \frac{4}{3} = 4 .$$
Note that the definite integral from $x = 0$ to $x = 4$ equals

$$\int_0^4 f(x)\,dx = \int_0^4 (x^2 - 4x + 3)\,dx = \left[\frac{x^3}{3} - 2x^2 + 3x\right]_{x=0}^{x=4} = \frac{64}{3} - 32 + 12 = \frac{4}{3} = 1\tfrac{1}{3} .$$

We may interpret this as the sum of the separate areas, where areas below the x-axis are counted as negative, that is,

$$A_1 - A_2 + A_3 = \frac{4}{3} - \frac{4}{3} + \frac{4}{3} = 1\tfrac{1}{3} .$$

If we wish to find the actual area, we must calculate the areas separately as shown above.

Fig. 7.5 Approximating rectangle between two curves
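The difference between the signed integral and the total area in Example 7.3 can be reproduced numerically. In this sketch the intersection points $x = 1$ and $x = 3$ are taken from the example rather than computed, and the Simpson rule with the helper name `simpson` is an assumption made for illustration.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f(x):
    """Curve from Example 7.3."""
    return x * x - 4 * x + 3

# Signed integral over the whole interval: areas below the x-axis count negatively.
signed = simpson(f, 0.0, 4.0)

# Total area: integrate separately between the intersection points and add absolute values.
pieces = [(0.0, 1.0), (1.0, 3.0), (3.0, 4.0)]
total_area = sum(abs(simpson(f, lo, hi)) for lo, hi in pieces)

print(signed, total_area)   # close to 4/3 and 4
```

Simpson's rule is exact for polynomials of degree at most three, so here the only deviations come from floating-point rounding.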
Area Between Two Curves. Consider two curves $y = f(x)$ and $y = g(x)$ as shown in Fig. 7.5. If we divide the area between the two curves, from $x = a$ to $x = b$, into vertical strips, a typical approximating rectangle has height $f(x) - g(x)$ and area $(f(x) - g(x))\,\delta x$. The limiting process performed in the construction of the definite integral then yields the area between the two curves as

$$\int_a^b (f(x) - g(x))\,dx .$$

Note that as long as the curve $y = f(x)$ lies above the curve $y = g(x)$, the position of the x-axis in relation to the curves does not matter.
Example 7.4 Find the total area between the curves $y = f(x) = x^3 + x^2 - 5x$ and $y = g(x) = x^2 - x$, from $x = -2$ to $x = 2$.
Solution: The two curves intersect when $x^3 + x^2 - 5x = f(x) = g(x) = x^2 - x$ (see Fig. 7.6). Solving this equation for x yields

$$0 = x^3 - 4x = x(x^2 - 4) = x(x+2)(x-2) .$$

The intersection points are $x = -2$, $x = 0$ and $x = 2$. The required area equals the area of the shaded region, which consists of two parts whose areas we find separately. From $x = -2$ to $x = 0$, $y = f(x)$ lies above $y = g(x)$, so the area of this part is given by

$$\int_{-2}^{0} (f(x) - g(x))\,dx = \int_{-2}^{0} [(x^3 + x^2 - 5x) - (x^2 - x)]\,dx = \int_{-2}^{0} (x^3 - 4x)\,dx = \left[\frac{x^4}{4} - 2x^2\right]_{x=-2}^{x=0} = 0 - (4 - 8) = 4 .$$

Fig. 7.6 Area between two curves

From $x = 0$ to $x = 2$, $y = g(x)$ lies above $y = f(x)$, so the area of this part equals

$$\int_{0}^{2} (g(x) - f(x))\,dx = \int_{0}^{2} [(x^2 - x) - (x^3 + x^2 - 5x)]\,dx = \int_{0}^{2} (4x - x^3)\,dx = \left[2x^2 - \frac{x^4}{4}\right]_{x=0}^{x=2} = 4 .$$

Therefore, the total area between the two curves equals $4 + 4 = 8$.


Area Between a Curve and the y-Axis. Consider the region bounded by a curve $x = g(y)$, the y-axis and the horizontal lines $y = c$ and $y = d$ (see Fig. 7.7). We divide the region into strips parallel to the x-axis. The area of a typical approximating rectangle equals $g(y)\,\delta y$. The limit process yields the total area between the curve and the y-axis as

$$\int_c^d g(y)\,dy .$$

Example 7.5 Find the area bounded by the curve given by $y^2 = 1 + 2x$ and the y-axis between $y = 1$ and $y = 3$.
Solution: We rewrite the equation of the curve as

$$x = g(y) = \frac{1}{2}(y^2 - 1) .$$

Fig. 7.7 Approximating rectangle for x = g(y)

Fig. 7.8 Area between curve and y-axis

The required area is the area of the shaded region in Fig. 7.8, and computed as

$$\int_1^3 g(y)\,dy = \int_1^3 \frac{1}{2}(y^2 - 1)\,dy = \left[\frac{y^3}{6} - \frac{y}{2}\right]_{y=1}^{y=3} = \left(\frac{9}{2} - \frac{3}{2}\right) - \left(\frac{1}{6} - \frac{1}{2}\right) = \frac{10}{3} = 3\tfrac{1}{3} .$$
We illustrate a different method by the following example.
Example 7.6 Find the area between the curve $y = f(x) = (x+1)^2$ and the y-axis, from $y = 4$ to $y = 16$.
Solution: As shown in Fig. 7.9, we may obtain the required area (called C) if we subtract the area of the region B and of the small rectangle A from the area of the large rectangle OMNP. We calculate

$$\begin{aligned}
C &= 48 - 4 - \int_1^3 (x+1)^2\,dx = 44 - \int_1^3 (x^2 + 2x + 1)\,dx \\
&= 44 - \left[\frac{x^3}{3} + x^2 + x\right]_{x=1}^{x=3} = 44 - \left[(9 + 9 + 3) - \left(\frac{1}{3} + 1 + 1\right)\right] \\
&= 44 - \frac{56}{3} = 25\tfrac{1}{3} .
\end{aligned}$$

Fig. 7.9 Two ways to compute the area between a curve and the y-axis

Alternatively, we rewrite the equation as $x = g(y) = y^{1/2} - 1$ and apply the method described previously. This yields

$$C = \int_4^{16} g(y)\,dy = \int_4^{16} (y^{1/2} - 1)\,dy = \left[\frac{2}{3}y^{3/2} - y\right]_{y=4}^{y=16} = \left(\frac{128}{3} - 16\right) - \left(\frac{16}{3} - 4\right) = 25\tfrac{1}{3} .$$

Area in Polar Coordinates. We want to compute the area of the region

$$G = \{(r, \theta) : \alpha \le \theta \le \beta ,\ 0 \le r \le f(\theta)\} ,$$

whose points are given in polar coordinates $(r, \theta)$. The region G is bounded by the curve $r = f(\theta)$ and the two lines which pass through the origin with angles $\alpha$ and $\beta$, respectively.
Theorem 7.1 Let $r = f(\theta)$ be a continuous function defined in some interval $\alpha \le \theta \le \beta$, where $f(\theta) \ge 0$ and $\beta \le \alpha + 2\pi$. The area A of the region G is equal to

$$A = \int_\alpha^\beta \frac{1}{2} f(\theta)^2\,d\theta .$$

Proof Consider a partition

$$\alpha = \theta_0 < \theta_1 < \dots < \theta_n = \beta$$

of the interval $[\alpha, \beta]$. Let $f(s_i)$ be a maximum and $f(t_i)$ be a minimum of f in the interval $[\theta_i, \theta_{i+1}]$ (see Fig. 7.10). Let $A_i$ be the area of the region between the curve and the lines $\theta = \theta_i$ and $\theta = \theta_{i+1}$. This region is enclosed between the two sectors bounded by those lines and the curves $r = f(t_i)$ and $r = f(s_i)$, respectively. The area of a sector having angle $\theta_{i+1} - \theta_i$ and radius R is equal to $\frac{\theta_{i+1} - \theta_i}{2\pi}$ times the total area of the circle of radius R, namely, $\pi R^2$. Hence, we have

$$\frac{\theta_{i+1} - \theta_i}{2\pi}\,\pi f^2(t_i) \le A_i \le \frac{\theta_{i+1} - \theta_i}{2\pi}\,\pi f^2(s_i) .$$

Fig. 7.10 Area in polar coordinates

Let $g(\theta) = \frac{1}{2} f^2(\theta)$. Then the sum of the areas $A_i$ of the small pieces satisfies the inequalities

$$\sum_{i=0}^{n-1} g(t_i)(\theta_{i+1} - \theta_i) \le \sum_{i=0}^{n-1} A_i \le \sum_{i=0}^{n-1} g(s_i)(\theta_{i+1} - \theta_i) .$$

The required area $A = \sum_{i=0}^{n-1} A_i$ is thus enclosed by the sums on the left and the right, which are Riemannian sums for the function g. Since g is continuous, it is integrable, and therefore the Riemannian sums converge to the integral of g according to the limit process which defines the integral. Therefore, we obtain

$$A = \int_\alpha^\beta g(\theta)\,d\theta = \int_\alpha^\beta \frac{1}{2} f^2(\theta)\,d\theta .$$

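Example 7.7 Find the area of the region bounded by the curve $r = f(\theta) = 2 + \cos\theta$ and the two rays $\theta = 0$ and $\theta = \pi/2$.

Solution: The required area is

$$\begin{aligned}
\int_0^{\pi/2} \frac{1}{2} f(\theta)^2\,d\theta &= \frac{1}{2}\int_0^{\pi/2} (2 + \cos\theta)^2\,d\theta = \frac{1}{2}\int_0^{\pi/2} (4 + 4\cos\theta + \cos^2\theta)\,d\theta \\
&= \frac{1}{2}\int_0^{\pi/2} \left(4 + 4\cos\theta + \frac{1 + \cos 2\theta}{2}\right) d\theta \\
&= \frac{1}{2}\left[4\theta + 4\sin\theta + \frac{1}{2}\theta + \frac{\sin 2\theta}{4}\right]_{\theta=0}^{\theta=\pi/2} = \frac{1}{2}\left(4\cdot\frac{\pi}{2} + 4 + \frac{\pi}{4}\right) \\
&= \frac{9}{8}\pi + 2 .
\end{aligned}$$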

Example 7.8 Find the area bounded by one loop of the curve $r^2 = 2a^2\cos 2\theta$, $a > 0$.

Solution: The given curve is $r = f(\theta) = \sqrt{2}\,a\sqrt{\cos 2\theta}$. For $-\frac{\pi}{4} \le \theta \le \frac{\pi}{4}$, we have $\cos 2\theta \ge 0$, and the curve describes a loop which begins and ends at the origin. We compute the area of this loop as

$$A = \int_{-\pi/4}^{\pi/4} \frac{1}{2} f(\theta)^2\,d\theta = \int_{-\pi/4}^{\pi/4} \frac{1}{2}\,2a^2\cos 2\theta\,d\theta = a^2 .$$
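The area formula of Theorem 7.1 lends itself to a direct numerical check. The sketch below integrates $\frac{1}{2} r^2$ for the two polar examples. It takes the squared radius as input (an illustrative design choice that avoids a square root at the endpoints of the loop in Example 7.8), and the value `a = 1.5` as well as the helper names are assumptions made here.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def polar_area(r_squared, alpha, beta):
    """Area of the region 0 <= r <= f(theta), alpha <= theta <= beta, given f(theta)^2."""
    return simpson(lambda t: 0.5 * r_squared(t), alpha, beta)

# Example 7.7: r = 2 + cos(theta), for theta between 0 and pi/2.
area_77 = polar_area(lambda t: (2 + math.cos(t)) ** 2, 0.0, math.pi / 2)

# Example 7.8: r^2 = 2 a^2 cos(2 theta), one loop, with a hypothetical value of a.
a = 1.5
area_78 = polar_area(lambda t: 2 * a * a * math.cos(2 * t), -math.pi / 4, math.pi / 4)

print(area_77, 9 * math.pi / 8 + 2)
print(area_78, a * a)
```

Integrating Example 7.7 over a full turn, from $0$ to $2\pi$, would instead give the area enclosed by the whole curve.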

7.2 Determination of Length, Area, and Volume

In this section, we determine the length of curves, the area of surfaces of revolution, and the volume of solids of revolution.
Length of Curves. Let $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ be two points in the plane. The length of the line segment joining $P_1$ and $P_2$ equals the distance between $P_1$ and $P_2$, given by the theorem of Pythagoras as $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$. Let us now consider a curve described by the graph of a function f over an interval $[a, b]$ (see Fig. 7.11). When f is differentiable and $f'$ is continuous, its length, denoted by $L_a^b(f)$, is given by

$$L_a^b(f) = \int_a^b \sqrt{1 + f'(x)^2}\,dx . \tag{7.1}$$

We will now argue that the mathematical definition (7.1) coincides with the intuitive notion of length. To this purpose, consider a partition of the interval $[a, b]$

$$a = x_0 < x_1 < \dots < x_n = b .$$

For each $x_i$ the point $(x_i, f(x_i))$ lies on the curve $y = f(x)$. Draw line segments between two successive points. The length of the line segment between $(x_i, f(x_i))$ and $(x_{i+1}, f(x_{i+1}))$ is

$$\sqrt{(x_{i+1} - x_i)^2 + (f(x_{i+1}) - f(x_i))^2} .$$

Fig. 7.11 Curve approximated by line segments

By the mean value theorem, we have

$$f(x_{i+1}) - f(x_i) = (x_{i+1} - x_i)\,f'(c_i)$$

for some number $c_i$ between $x_i$ and $x_{i+1}$. The length of our line segment therefore becomes

$$\sqrt{(x_{i+1} - x_i)^2 + (x_{i+1} - x_i)^2 f'(c_i)^2} = (x_{i+1} - x_i)\sqrt{1 + f'(c_i)^2} .$$

The sum of the lengths of these line segments is

$$\sum_{i=0}^{n-1} \sqrt{1 + f'(c_i)^2}\,(x_{i+1} - x_i) . \tag{7.2}$$

Let $h(x) = \sqrt{1 + f'(x)^2}$. The sum (7.2) can now be written as

$$\sum_{i=0}^{n-1} h(c_i)(x_{i+1} - x_i) .$$

This is a Riemannian sum for the function h. Since h is continuous, the Riemannian sums converge to the integral

$$\int_a^b h(x)\,dx = \int_a^b \sqrt{1 + f'(x)^2}\,dx$$

according to its construction, when the partition is made finer and finer. On the other hand, our intuitive notion of length says that the sums (7.2) of the lengths of the approximating line segments should become closer and closer to the length of the curve as the partition becomes finer and finer. Therefore, formula (7.1) indeed is the correct way to define the length of a curve.
Let us remark that, according to the different possible notations of the derivative, formula (7.1) can also be written as

$$L_a^b(f) = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx .$$

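Example 7.9 Find the length of the curve $y = f(x) = e^x$ between $x = 1$ and $x = 2$.
Solution: The required length is

$$L_1^2(f) = \int_1^2 \sqrt{1 + f'(x)^2}\,dx = \int_1^2 \sqrt{1 + e^{2x}}\,dx .$$

To compute the integral, we make the substitution $1 + e^{2x} = u^2$, so $2e^{2x}\,dx = 2u\,du$. For $x = 1$ we get $u = \sqrt{1 + e^2}$, and for $x = 2$ we get $u = \sqrt{1 + e^4}$. Since $e^{2x} = u^2 - 1$, we obtain

$$\begin{aligned}
L_1^2(f) &= \int_{\sqrt{1+e^2}}^{\sqrt{1+e^4}} \frac{u^2}{u^2 - 1}\,du = \int_{\sqrt{1+e^2}}^{\sqrt{1+e^4}} \frac{u^2 - 1 + 1}{u^2 - 1}\,du \\
&= \int_{\sqrt{1+e^2}}^{\sqrt{1+e^4}} 1\,du + \int_{\sqrt{1+e^2}}^{\sqrt{1+e^4}} \frac{1}{u^2 - 1}\,du = \left[u + \frac{1}{2}\ln\frac{u - 1}{u + 1}\right]_{u=\sqrt{1+e^2}}^{u=\sqrt{1+e^4}} \\
&= \sqrt{1 + e^4} + \frac{1}{2}\ln\frac{\sqrt{1 + e^4} - 1}{\sqrt{1 + e^4} + 1} - \sqrt{1 + e^2} - \frac{1}{2}\ln\frac{\sqrt{1 + e^2} - 1}{\sqrt{1 + e^2} + 1} .
\end{aligned}$$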
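The closed form in Example 7.9 is long enough that a numerical comparison with formula (7.1) is a useful safeguard. The following sketch assumes a Simpson rule for the integral, and the helper names `arc_length` and `G` are chosen here for illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def arc_length(df, a, b):
    """Length of y = f(x) on [a, b] via formula (7.1), given the derivative df."""
    return simpson(lambda x: math.sqrt(1 + df(x) ** 2), a, b)

# f(x) = e^x has derivative e^x, so math.exp serves as df.
numeric = arc_length(math.exp, 1.0, 2.0)

def G(u):
    """Antiderivative of u^2 / (u^2 - 1) for u > 1, as used in Example 7.9."""
    return u + 0.5 * math.log((u - 1) / (u + 1))

closed_form = G(math.sqrt(1 + math.exp(4))) - G(math.sqrt(1 + math.exp(2)))
print(numeric, closed_form)
```

Both values should coincide to many digits; replacing $u + 1$ by $u - 1$ in the denominator of the logarithm would make the closed form collapse to $\sqrt{1+e^4} - \sqrt{1+e^2}$, which no longer matches.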

Curves in Parametric Form. Let a curve be given in parametric form $x = f(t)$, $y = g(t)$ with $a \le t \le b$, and assume that f and g are differentiable with continuous derivatives $f'$ and $g'$. Its length is defined as

$$L_a^b(f, g) = \int_a^b \sqrt{f'(t)^2 + g'(t)^2}\,dt . \tag{7.3}$$

In order to show that this formula is a reasonable definition of length, we argue in a similar manner as in the previous subsection. We consider a partition of the interval $[a, b]$

$$a = t_0 < t_1 < \dots < t_n = b .$$

The distance between two successive points $(f(t_i), g(t_i))$ and $(f(t_{i+1}), g(t_{i+1}))$ is

$$\sqrt{(f(t_{i+1}) - f(t_i))^2 + (g(t_{i+1}) - g(t_i))^2} .$$

By the mean value theorem for f and g there exist numbers $c_i$ and $d_i$ between $t_i$ and $t_{i+1}$ such that

$$f(t_{i+1}) - f(t_i) = f'(c_i)(t_{i+1} - t_i) , \qquad g(t_{i+1}) - g(t_i) = g'(d_i)(t_{i+1} - t_i) .$$

Substituting these values, the sum of the lengths of the line segments becomes

$$\sum_{i=0}^{n-1} \sqrt{f'(c_i)^2 + g'(d_i)^2}\,(t_{i+1} - t_i) . \tag{7.4}$$

Let $h(t) = \sqrt{f'(t)^2 + g'(t)^2}$. The sum (7.4) is close to

$$\sum_{i=0}^{n-1} h(c_i)(t_{i+1} - t_i) , \tag{7.5}$$

which is a Riemannian sum for h. The sums (7.4) and (7.5) are not necessarily equal (since $c_i$ will, in general, be different from $d_i$), but their difference can be shown to converge to zero as the partition becomes finer and finer. Since the sums (7.5) converge to the integral (7.3) in the limit, it is indeed reasonable to define the length of the curve in parametric form by (7.3).
An alternative way would be to write
$$L_a^b = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt . \tag{7.6}$$

Note that the usual (nonparametric) form arises as a special case of the parametric
form, if we set f (t) = t in (7.3). (The letters t and g in (7.3) then correspond to x
and f in (7.1).)
When we replace the fixed upper limit $b$ in (7.3) by a variable $t$ (and replace $t$ by $\tau$ in the integral to avoid confusion), we obtain
$$s(t) = \int_a^t \sqrt{f'(\tau)^2 + g'(\tau)^2}\,d\tau . \tag{7.7}$$
The function $s$ is called the arc length of the curve. From the Fundamental Theorem of Calculus, we immediately obtain
$$s'(t) = \sqrt{f'(t)^2 + g'(t)^2} .$$

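Example 7.10 Find the length of the curve $x = f(\theta) = \cos^3\theta$, $y = g(\theta) = \sin^3\theta$ for $0 \le \theta \le \frac{\pi}{2}$.

Solution: We have $f'(\theta) = 3\cos^2\theta \cdot (-\sin\theta)$ and $g'(\theta) = 3\sin^2\theta\cos\theta$. From (7.3) we get, using that $\sin\theta\cos\theta \ge 0$ on $[0, \pi/2]$,
$$\begin{aligned}
L_0^{\pi/2} &= \int_0^{\pi/2} \sqrt{9\cos^4\theta\sin^2\theta + 9\sin^4\theta\cos^2\theta}\,d\theta \\
&= 3\int_0^{\pi/2} \sqrt{\sin^2\theta\cos^2\theta\,(\cos^2\theta + \sin^2\theta)}\,d\theta \\
&= 3\int_0^{\pi/2} \sin\theta\cos\theta\,d\theta = \frac{3}{2}\int_0^{\pi/2} \sin 2\theta\,d\theta \\
&= \frac{3}{2}\left[-\frac{\cos 2\theta}{2}\right]_{\theta=0}^{\theta=\pi/2} = -\frac{3}{4}(\cos\pi - \cos 0) = \frac{3}{2} .
\end{aligned}$$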
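Formula (7.3) is easy to evaluate numerically when the derivatives are known. The Python sketch below does this for the curve of Example 7.10; it is an illustration under our own assumptions (the names `simpson` and `parametric_length` are hypothetical helpers, and Simpson's rule is just one possible quadrature).

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def parametric_length(df, dg, a, b):
    """Length of (f(t), g(t)), a <= t <= b, from the derivatives df, dg, cf. (7.3)."""
    return simpson(lambda t: math.hypot(df(t), dg(t)), a, b)

# Derivatives of f(t) = cos^3 t and g(t) = sin^3 t
df = lambda t: -3 * math.cos(t) ** 2 * math.sin(t)
dg = lambda t: 3 * math.sin(t) ** 2 * math.cos(t)

length = parametric_length(df, dg, 0.0, math.pi / 2)
print(length)  # approximately 1.5
```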

Length of Curves in Polar Coordinates. Let $r = f(\theta)$ be the equation of a curve in polar coordinates, defined in the interval $a \le \theta \le b$. Its length is given by
$$L_a^b = \int_a^b \sqrt{f(\theta)^2 + f'(\theta)^2}\,d\theta . \tag{7.8}$$

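Proof From the formulas $x = r\cos\theta$, $y = r\sin\theta$ we obtain the parametric form $(x(\theta), y(\theta))$ of the curve in the usual (Cartesian) coordinates as
$$x(\theta) = f(\theta)\cos\theta , \qquad y(\theta) = f(\theta)\sin\theta .$$
Using formula (7.6), its length is equal to
$$L_a^b = \int_a^b \sqrt{x'(\theta)^2 + y'(\theta)^2}\,d\theta . \tag{7.9}$$
Since
$$x'(\theta) = f'(\theta)\cos\theta - f(\theta)\sin\theta , \qquad y'(\theta) = f'(\theta)\sin\theta + f(\theta)\cos\theta ,$$
we obtain
$$\begin{aligned}
x'(\theta)^2 + y'(\theta)^2 &= f(\theta)^2\sin^2\theta + f'(\theta)^2\cos^2\theta - 2f(\theta)f'(\theta)\sin\theta\cos\theta \\
&\quad + f(\theta)^2\cos^2\theta + f'(\theta)^2\sin^2\theta + 2f(\theta)f'(\theta)\sin\theta\cos\theta \\
&= f(\theta)^2 + f'(\theta)^2 .
\end{aligned}$$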

Inserting this result into (7.9), we obtain (7.8). An alternative way to write this formula would be
$$L_a^b = \int_a^b \sqrt{r^2 + \left(\frac{dr}{d\theta}\right)^2}\,d\theta .$$
From the computation above, we obtain the formula for the arc length in polar coordinates as
$$s(\theta) = L_a^\theta = \int_a^\theta \sqrt{x'(\eta)^2 + y'(\eta)^2}\,d\eta = \int_a^\theta \sqrt{f(\eta)^2 + f'(\eta)^2}\,d\eta ,$$
so
$$s'(\theta) = \sqrt{f(\theta)^2 + f'(\theta)^2} . \tag{7.10}$$

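Example 7.11 Find the length of the curve $r = f(\theta) = 1 - \cos\theta$ between $\theta = 0$ and $\theta = \frac{\pi}{4}$.
Solution: We have $f(\theta) = 1 - \cos\theta$, $f'(\theta) = \sin\theta$. We compute
$$\begin{aligned}
L_0^{\pi/4} &= \int_0^{\pi/4} \sqrt{f(\theta)^2 + f'(\theta)^2}\,d\theta = \int_0^{\pi/4} \sqrt{1 - 2\cos\theta + \cos^2\theta + \sin^2\theta}\,d\theta \\
&= \int_0^{\pi/4} \sqrt{2(1 - \cos\theta)}\,d\theta = \int_0^{\pi/4} \sqrt{4\sin^2\frac{\theta}{2}}\,d\theta \\
&= 2\int_0^{\pi/4} \sin\frac{\theta}{2}\,d\theta = \left[-4\cos\frac{\theta}{2}\right]_{\theta=0}^{\theta=\pi/4} = 4\left(1 - \cos\frac{\pi}{8}\right) .
\end{aligned}$$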
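The polar formula (7.8) can be checked in the same way. The following Python sketch evaluates it for the curve of Example 7.11 and compares with the closed form $4(1 - \cos(\pi/8))$; the names `simpson` and `polar_length` are our own hypothetical helpers, not notation from the text.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def polar_length(f, df, a, b):
    """Length of r = f(theta), a <= theta <= b, via sqrt(f^2 + f'^2), cf. (7.8)."""
    return simpson(lambda th: math.hypot(f(th), df(th)), a, b)

# r = 1 - cos(theta), with derivative sin(theta)
length = polar_length(lambda th: 1 - math.cos(th), math.sin, 0.0, math.pi / 4)
exact = 4 * (1 - math.cos(math.pi / 8))

print(length, exact)  # both approximately 0.3045
```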

Area of Surfaces of Revolution. Let $y = f(x)$ be a function which satisfies $f(x) \ge 0$ and has a continuous derivative on an interval $[a, b]$. Let $S$ denote the area of the surface of revolution of the graph of $f$ around the x-axis as shown in Fig. 7.12. Then
$$S = 2\pi\int_a^b f(x)\sqrt{1 + f'(x)^2}\,dx . \tag{7.11}$$

We explain how formula (7.11) arises.


We approximate the curve by line segments. Consider a partition $a = x_0 < x_1 < \cdots < x_n = b$ of $[a, b]$. The length $L_i$ of the line segment joining two successive points $(x_i, f(x_i))$ and $(x_{i+1}, f(x_{i+1}))$ is given by
$$L_i = \sqrt{(x_{i+1} - x_i)^2 + (f(x_{i+1}) - f(x_i))^2} .$$

Fig. 7.12 A surface of revolution

Fig. 7.13 Small approximating surface of revolution

We know that the length of a circle (its circumference) of radius y equals 2π y.


Therefore, if we revolve the line segment about the x-axis, we expect that the area of
this surface of revolution will lie between 2π f (ti )L i and 2π f (si )L i , where f (ti ) and
f (si ) are the minimum and maximum of f , respectively, on the interval [xi , xi+1 ]
(see Fig. 7.13, here we have ti = xi and si = xi+1 ).
By the mean value theorem (Theorem 4.4), we have
$$f(x_{i+1}) - f(x_i) = f'(c_i)(x_{i+1} - x_i)$$
for some number $c_i$ between $x_i$ and $x_{i+1}$. From this we obtain
$$L_i = \sqrt{(x_{i+1} - x_i)^2 + f'(c_i)^2(x_{i+1} - x_i)^2} = \sqrt{1 + f'(c_i)^2}\,(x_{i+1} - x_i) .$$
Therefore,
$$2\pi f(c_i)\sqrt{1 + f'(c_i)^2}\,(x_{i+1} - x_i)$$
is an approximation of the area of the surface of revolution of the curve over the small interval $[x_i, x_{i+1}]$. An approximation over the whole interval $[a, b]$ is given by the sum
$$\sum_{i=0}^{n-1} 2\pi f(c_i)\sqrt{1 + f'(c_i)^2}\,(x_{i+1} - x_i) .$$
This is a Riemann sum for the function $h(x) = 2\pi f(x)\sqrt{1 + f'(x)^2}$. It is therefore reasonable to define the area of the surface of revolution of the curve $y = f(x)$ between $x = a$ and $x = b$ by the integral
$$S = \int_a^b 2\pi f(x)\sqrt{1 + f'(x)^2}\,dx .$$

For curves in parametric form $x = f(t)$, $y = g(t)$, $a \le t \le b$, the length $L_i$ between $(f(t_i), g(t_i))$ and $(f(t_{i+1}), g(t_{i+1}))$ is given by (again we use the mean value theorem)
$$L_i = \sqrt{(f(t_{i+1}) - f(t_i))^2 + (g(t_{i+1}) - g(t_i))^2} = \sqrt{f'(c_i)^2 + g'(d_i)^2}\,(t_{i+1} - t_i) ,$$
where $c_i$, $d_i$ are some numbers between $t_i$ and $t_{i+1}$. Therefore, the approximation for the area of the surface of revolution of the curve in the small interval $[t_i, t_{i+1}]$ is given by
$$2\pi g(t_i)\sqrt{f'(c_i)^2 + g'(d_i)^2}\,(t_{i+1} - t_i) .$$

The arguments to obtain the length of a parametric curve can be used in an analogous manner to show that the area of the surface of revolution over the whole interval $[a, b]$ is given by
$$S = 2\pi\int_a^b g(t)\sqrt{f'(t)^2 + g'(t)^2}\,dt . \tag{7.12}$$
If we set $x = t$ and $y = g(t) = g(x)$, formula (7.11) becomes a special case of (7.12). Another form of (7.12) arises when we use the arc length $s(t)$ of the parametrized curve as defined in the previous subsection. Since $s'(t) = \sqrt{f'(t)^2 + g'(t)^2}$, (7.12) simply becomes
$$S = 2\pi\int_a^b g(t)s'(t)\,dt . \tag{7.13}$$

Note. Let the curve be given in polar coordinate form $r = f(\theta)$ with $a \le \theta \le b$. We write it in parametric form $x = f(\theta)\cos\theta$, $y = f(\theta)\sin\theta$ and obtain from (7.13), taking into account (7.10),
$$S = 2\pi\int_a^b f(\theta)\sin\theta\,\sqrt{f(\theta)^2 + f'(\theta)^2}\,d\theta = 2\pi\int_a^b f(\theta)\sin\theta\,s'(\theta)\,d\theta ,$$
or, in an alternative form,
$$S = 2\pi\int_a^b r\sin\theta\,\frac{ds}{d\theta}\,d\theta , \qquad \text{where } \frac{ds}{d\theta} = \sqrt{r^2 + \left(\frac{dr}{d\theta}\right)^2} .$$

Example 7.12 Find the area $S$ of the surface of revolution which arises from rotating the curve $y = f(x) = x^3$ between $x = 0$ and $x = 1$ around the x-axis.
Solution: The given curve is $y = f(x) = x^3$ with $f'(x) = 3x^2$. We obtain
$$\begin{aligned}
S &= 2\pi\int_0^1 x^3\sqrt{1 + 9x^4}\,dx = \frac{2\pi}{36}\int_0^1 36x^3\sqrt{1 + 9x^4}\,dx \\
&= \frac{\pi}{18}\left[\frac{2}{3}(1 + 9x^4)^{3/2}\right]_{x=0}^{x=1} = \frac{\pi}{27}\left[10\sqrt{10} - 1\right] .
\end{aligned}$$

Example 7.13 Find the area of a sphere of radius $a > 0$.
Solution: The sphere can be viewed as the surface of revolution of a half-circle of radius $a$. The equation of a half-circle in parametric form is
$$x = f(\theta) = a\cos\theta , \qquad y = g(\theta) = a\sin\theta , \qquad 0 \le \theta \le \pi .$$
Since $f'(\theta) = -a\sin\theta$, $g'(\theta) = a\cos\theta$, the required area is computed as
$$\begin{aligned}
S &= 2\pi\int_0^\pi g(\theta)\sqrt{f'(\theta)^2 + g'(\theta)^2}\,d\theta = 2\pi\int_0^\pi a\sin\theta\,\sqrt{a^2\sin^2\theta + a^2\cos^2\theta}\,d\theta \\
&= 2\pi a^2\int_0^\pi \sin\theta\,d\theta = 2\pi a^2\left[-\cos\theta\right]_0^\pi = 4\pi a^2 .
\end{aligned}$$
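The surface area formula (7.12) can be evaluated numerically for both of the preceding examples. The Python sketch below is an illustration under our own assumptions: `simpson` and `surface_area` are hypothetical helper names, and the radius value 2.0 is chosen arbitrarily for the sphere.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def surface_area(g, df, dg, a, b):
    """2*pi * integral of g(t) * sqrt(f'(t)^2 + g'(t)^2) over [a, b], cf. (7.12)."""
    return 2 * math.pi * simpson(lambda t: g(t) * math.hypot(df(t), dg(t)), a, b)

# Example 7.12 as a parametric curve: x = t, y = t^3 on [0, 1]
cubic = surface_area(lambda t: t ** 3, lambda t: 1.0, lambda t: 3 * t ** 2, 0.0, 1.0)
cubic_exact = math.pi / 27 * (10 * math.sqrt(10) - 1)

# Example 7.13: half-circle of radius `radius` (illustrative value)
radius = 2.0
sphere = surface_area(lambda t: radius * math.sin(t),
                      lambda t: -radius * math.sin(t),
                      lambda t: radius * math.cos(t),
                      0.0, math.pi)
sphere_exact = 4 * math.pi * radius ** 2

print(cubic, cubic_exact)    # both approximately 3.563
print(sphere, sphere_exact)  # both approximately 50.27
```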

Volume of Solids of Revolution. Let $y = f(x)$ be a continuous function of $x$ on $[a, b]$ with $f(x) \ge 0$ for all $x$ in $[a, b]$. If we revolve the curve $y = f(x)$ around the x-axis, we obtain a solid (or body), see Fig. 7.14, whose volume $V$ is given by
$$V = \pi\int_a^b f(x)^2\,dx . \tag{7.14}$$

Fig. 7.14 A solid of revolution

To show that this is the correct formula, we again consider a partition $a = x_0 < x_1 < \cdots < x_n = b$ of $[a, b]$. Let $c_i$ and $d_i$ be points of the interval $[x_i, x_{i+1}]$ at which $f$ attains its minimum and maximum, respectively, on that interval. Recall that the volume of a cylinder of radius $r$ and height $h$ equals $\pi r^2 h$. The volume of the solid of revolution formed by revolving the line segment, joining $(x_i, f(x_i))$ and $(x_{i+1}, f(x_{i+1}))$, around the x-axis will lie between the volume of the small cylinder of radius $f(c_i)$ and height $x_{i+1} - x_i$ and the volume of the big cylinder of radius $f(d_i)$ and height $x_{i+1} - x_i$. Taking the sum over all the line segments, we get
$$\sum_{i=0}^{n-1} \pi f(c_i)^2(x_{i+1} - x_i) \le V \le \sum_{i=0}^{n-1} \pi f(d_i)^2(x_{i+1} - x_i) .$$
The sums on the left and right are Riemann sums for the function $h(x) = \pi f(x)^2$. It is therefore reasonable to define the volume to be
$$V = \pi\int_a^b f(x)^2\,dx .$$

Example 7.14 Find the volume of the solid of revolution obtained by rotating the region bounded by the curves $y = f_1(x) = x^2$ and $y = f_2(x) = 5x$ around the x-axis.
Solution: The two curves intersect at the points $(0, 0)$ and $(5, 25)$, and $f_2$ lies above $f_1$, see Fig. 7.15. The required volume $V$ is therefore equal to the difference of the volumes obtained by rotating $y = 5x$ and $y = x^2$ between $x = 0$ and $x = 5$. Consequently,
$$\begin{aligned}
V &= \pi\int_0^5 (5x)^2\,dx - \pi\int_0^5 (x^2)^2\,dx = \pi\int_0^5 (25x^2 - x^4)\,dx \\
&= \pi\left[25\frac{x^3}{3} - \frac{x^5}{5}\right]_{x=0}^{x=5} = \pi\left(25\cdot\frac{5^3}{3} - \frac{5^5}{5}\right) = \pi 5^4\left(\frac{5}{3} - 1\right) = \frac{2}{3}\pi 5^4 .
\end{aligned}$$

Fig. 7.15 A region which generates a solid of revolution
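The volume in Example 7.14 can be confirmed numerically by integrating the difference of the two squared radii. The Python sketch below is only an illustration; the names `simpson` and `volume_between`, and the idea of passing an outer and an inner curve, are our own assumptions layered on top of (7.14).

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def volume_between(outer, inner, a, b):
    """Volume obtained by rotating the region between inner <= outer about the x-axis."""
    return math.pi * simpson(lambda x: outer(x) ** 2 - inner(x) ** 2, a, b)

# Region between y = x^2 and y = 5x for 0 <= x <= 5
volume = volume_between(lambda x: 5 * x, lambda x: x ** 2, 0.0, 5.0)
exact = 2 / 3 * math.pi * 5 ** 4

print(volume, exact)  # both approximately 1309.0
```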

7.3 Definite Integral as Average

We are familiar with the average (or arithmetic mean) of n numbers, which we obtain
by dividing the sum of those numbers by n. Here, we discuss the average value of a
continuously varying function.
Let f (t) denote the temperature at time t, measured in hours since midnight. We
want to calculate the average temperature over a 24-h period. One way to start would
be to average the temperatures at n equally spaced times t1 , t2 , t3 , . . . , tn during the
day. This gives the estimate
$$\text{Average temperature} \approx \frac{f(t_1) + f(t_2) + \cdots + f(t_n)}{n} . \tag{7.15}$$
Since the difference between two successive times equals $\Delta t = 24/n$ or $n = 24/\Delta t$, we may rewrite (7.15) as
$$\text{Average temperature} \approx \frac{f(t_1)\Delta t + f(t_2)\Delta t + \cdots + f(t_n)\Delta t}{24} = \frac{1}{24}\sum_{i=1}^{n} f(t_i)\Delta t . \tag{7.16}$$

Fig. 7.16 A sample of values of a function on an interval [a, b]

The larger we make $n$, the better we expect the estimate to be. Since the right-hand side of (7.16) is a Riemann sum for the function $f$ and therefore converges to the integral of $f$ for $n \to \infty$, it is natural to consider
$$\frac{1}{24}\int_0^{24} f(t)\,dt = \lim_{n\to\infty} \frac{1}{24}\sum_{i=1}^{n} f(t_i^n)\Delta t , \qquad t_i^n = i\,\frac{24}{n} ,$$
as the average of the function $f$ over the interval $[0, 24]$.


In view of the discussion above we define the average value (or mean value) of a function $f$ on the interval $[a, b]$ as
$$\mathrm{Av}(f) = \frac{1}{b - a}\int_a^b f(t)\,dt . \tag{7.17}$$
It is clear that $(b - a)\cdot\mathrm{Av}(f) = \int_a^b f(t)\,dt$.
The example above is an instance of a rather general situation. Let $f = f(x)$ be a continuous function on $[a, b]$. We partition $[a, b]$ into $n$ subintervals of equal length $\Delta x = (b - a)/n$ and evaluate (or sample) $f$ at a point $c_i$ in each subinterval (see Fig. 7.16). The average of the $n$ sampled values is
$$\frac{f(c_1) + f(c_2) + \cdots + f(c_n)}{n} = \frac{1}{n}\sum_{i=1}^{n} f(c_i) = \frac{\Delta x}{b - a}\sum_{i=1}^{n} f(c_i) = \frac{1}{b - a}\sum_{i=1}^{n} f(c_i)\Delta x .$$
As $n \to \infty$, this expression converges to
$$\mathrm{Av}(f) = \frac{1}{b - a}\int_a^b f(x)\,dx .$$


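Example 7.15 Find the average value (mean value) of $f(x) = \sqrt{9 - x^2}$ on $[-3, 3]$.
Solution: The average value on the interval $[-3, 3]$ is defined as
$$\mathrm{Av}(f) = \frac{1}{3 - (-3)}\int_{-3}^{3} \sqrt{9 - x^2}\,dx = \frac{1}{6}\int_{-3}^{3} \sqrt{9 - x^2}\,dx .$$
Let $x = 3\cos\theta$, then $dx = -3\sin\theta\,d\theta$, and
$$\begin{aligned}
\mathrm{Av}(f) &= -\frac{1}{6}\int_{\pi}^{0} 3\sqrt{1 - \cos^2\theta}\;3\sin\theta\,d\theta = \frac{9}{6}\int_0^{\pi} \sin^2\theta\,d\theta \\
&= \frac{3}{2}\int_{\theta=0}^{\theta=\pi} \frac{1 - \cos 2\theta}{2}\,d\theta = \frac{3}{4}\left[\theta - \frac{1}{2}\sin 2\theta\right]_{\theta=0}^{\theta=\pi} = \frac{3}{4}\pi .
\end{aligned}$$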

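Example 7.16 Find the average or mean value of
1. $f(x) = c$
2. $f(x) = x$
on the interval $[-4, 4]$.
Solution:
1. We have
$$\mathrm{Av}(f) = \frac{1}{8}\int_{-4}^{4} c\,dx = c\cdot\frac{8}{8} = c .$$
2. We have
$$\mathrm{Av}(f) = \frac{1}{8}\int_{-4}^{4} x\,dx = \frac{1}{8}\left[\frac{x^2}{2}\right]_{-4}^{4} = \frac{1}{16}\left[16 - 16\right] = 0 .$$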
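The definition of the average value as a limit of averaged samples can be tried out directly. The Python sketch below samples $f$ at the midpoints of $n$ equal subintervals, which is one admissible choice of the sample points; the function name `average_value` and the constant value 7.0 used for $c$ are our own illustrative assumptions.

```python
import math

def average_value(func, a, b, n=100_000):
    """Average of n samples of func taken at the midpoints of equal subintervals of [a, b]."""
    dx = (b - a) / n
    samples = (func(a + (i + 0.5) * dx) for i in range(n))
    return sum(samples) / n

# Example 7.15: f(x) = sqrt(9 - x^2) on [-3, 3], expected value 3*pi/4
avg_semicircle = average_value(lambda x: math.sqrt(9 - x * x), -3.0, 3.0)

# Example 7.16: a constant function (c = 7.0 chosen for illustration) and f(x) = x on [-4, 4]
avg_const = average_value(lambda x: 7.0, -4.0, 4.0)
avg_identity = average_value(lambda x: x, -4.0, 4.0)

print(avg_semicircle, 3 * math.pi / 4)  # both approximately 2.356
print(avg_const, avg_identity)          # 7.0 and a value very close to 0
```

Convergence is slower for the first function because its derivative is unbounded at the endpoints, which is why a fairly large number of samples is used here.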

Note 7.1 The average value is used in economics to study the daily inventory $I(t)$ and the average daily inventory
$$\mathrm{Av}(I) = \frac{1}{T}\int_0^T I(t)\,dt$$
over the time period $[0, T]$.

Example 7.17 Let the population of a country be modeled by the function
$$P(t) = 67.38\cdot 1.026^t ,$$
where $P$ is measured in millions of people and $t$ in years since 2000. Use this function to predict the average population of the country between the years 2020 and 2040.

Solution: We want to find the average value of the function $P$ between $t = 20$ and $t = 40$. We obtain
$$\begin{aligned}
\mathrm{Av}(P) &= \frac{1}{40 - 20}\int_{20}^{40} P(t)\,dt = \frac{1}{20}\int_{20}^{40} 67.38\,(1.026)^t\,dt \\
&= 3.369\int_{20}^{40} (1.026)^t\,dt = 3.369\cdot\left[\frac{(1.026)^t}{\ln 1.026}\right]_{20}^{40} \approx 147
\end{aligned}$$
by Table 6.1(10). Thus, the average population of the country between 2020 and 2040 is predicted to be 147 million people.
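The value in Example 7.17 can be reproduced with a few lines of Python. The sketch below evaluates the closed form and, as a cross-check, a numerical quadrature of (7.17); the helper `simpson` and the variable names are our own assumptions.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def population(t):
    """Population model P(t) = 67.38 * 1.026^t in millions, t in years since 2000."""
    return 67.38 * 1.026 ** t

t_start, t_end = 20.0, 40.0

# Closed form: antiderivative of 1.026^t is 1.026^t / ln(1.026)
closed = 67.38 / (t_end - t_start) * (1.026 ** t_end - 1.026 ** t_start) / math.log(1.026)

# Numerical evaluation of the average value (7.17)
numeric = simpson(population, t_start, t_end) / (t_end - t_start)

print(closed, numeric)  # both approximately 147.1
```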

7.4 Applications to Business and Industry

7.4.1 Present and Future Values

A lot of business deals with payments in the future. For example, when buying a
car on credit, payments are made over a period of time. If we are going to make
or accept payments in the future under such an agreement, we should know how to
compare the values of such payments made at different times. Being paid Rs 10,000
in the future is, under usual circumstances, clearly worse than being paid Rs 10,000
today, due to several reasons. For example, if we get money today, we can invest it in
profitable shares, bank, and business. Therefore, even without considering inflation,
in order to get the same value we should expect to be paid more when the payment
is made in the future instead of now, in order to compensate for this loss of potential
earnings. The important question is: How much more? To simplify matters we do
not take inflation into consideration, but we consider only what we would lose by
not earning interest. Suppose we deposit Rs 10,000 in an account which earns 7% interest compounded annually, so in a year's time we will have Rs 10,700. We say that Rs 10,700 is the future value of Rs 10,000, and that Rs 10,000 is the present value of Rs 10,700. In general we say the following.
The future value Rs B, of a present payment of Rs P, is the amount to which the
Rs P would have grown if deposited today in an interest-bearing bank account.
The present value Rs P, of a future payment of Rs B, is the amount which would
have to be deposited in a bank account today to produce Rs B in the account at the
relevant time in the future.
It is clear that due to the interest earned, the future value is larger than the present value. The relation between the present value, denoted by $PV$, and the future value, denoted by $FV$, is as follows:
$$FV = PV\cdot(1 + r)^t , \qquad PV = \frac{FV}{(1 + r)^t} . \tag{7.18}$$
For continuous compounding
$$FV = PV\cdot e^{rt} , \qquad PV = FV\cdot e^{-rt} . \tag{7.19}$$

In both cases, it is assumed that the interest is compounded over a period of t years
at an annual rate r , for example, r = 0.07 for an annual interest rate of 7%.
When we consider payments made to or by an individual, we normally think of
discrete payments, that is, payments made at specific moments in time. However,
when we analyze the overall money flow of a company or a bank, it makes sense to
model it as a continuous stream of payments and earnings. (A similar modeling step—
although the difference in scale is much larger—is performed when one analyzes the
flow of a river, without looking at the behavior of the individual water molecules.)
Above we have considered the relation between the present and the future value
of a single payment. We now want to calculate those quantities for a continuous
stream of money, say an income stream, described by a rate of S(t) Indian rupees
per year, which varies continuously with time t during the time interval [0, T ], that
is, from now until T years in the future. In order to use what we know about single
deposits, we approximate the continuous income stream by a succession of many
small deposits $D_i$ made at times $t_i$ of an equidistant partition
$$0 = t_0 < t_1 < \cdots < t_n = T , \qquad \Delta t = t_{i+1} - t_i = \frac{T}{n} ,$$
of the time interval $[0, T]$. If $\Delta t$ is small, then the rate $S$ (assumed to be a continuous function of $t$) does not vary much within one subinterval $[t_i, t_{i+1}]$, so the amount deposited by the continuous stream during that subinterval is approximately equal to $S(t_i)$ times its length $\Delta t$. Consequently, we set $D_i = S(t_i)\Delta t$. Assuming continuous compounding with a constant interest rate $r$, the present value of the deposit $D_i$ at time $t_i$ becomes
$$S(t_i)e^{-rt_i}\Delta t .$$
Summing over all subintervals gives
$$\sum_{i=0}^{n-1} S(t_i)e^{-rt_i}(t_{i+1} - t_i)$$
as the total present value of all successive small deposits. This is a Riemann sum for the function $f(t) = S(t)e^{-rt}$. Since $S$ is assumed to be continuous, so is $f$, and for $n \to \infty$ the sum converges to the integral of $f$. It is therefore natural to define the present value of the continuous stream of rate $S = S(t)$ during the time interval $[0, T]$ as
$$PV = \int_0^T S(t)e^{-rt}\,dt . \tag{7.20}$$

Consequently, the future value of the same continuous stream, evaluated at the final time after $T$ years, becomes
$$FV = PV\cdot e^{rT} = e^{rT}\int_0^T S(t)e^{-rt}\,dt = \int_0^T S(t)e^{r(T-t)}\,dt . \tag{7.21}$$

Example 7.18 Find the present and future values of a constant income stream of 10,000 Indian rupees per year over a period of 20 years, assuming an interest rate of 6% compounded continuously.
Solution: We have $S(t) = 10{,}000$ and $r = 0.06$. By (7.20) and (7.21),
$$PV = \int_0^{20} 10{,}000\,e^{-0.06t}\,dt \approx 116467.6 \text{ IR} ,$$
$$FV = 116467.6\cdot e^{0.06(20)} \approx 386686.2 \text{ IR} .$$
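For a constant income stream, the integrals (7.20) and (7.21) have simple closed forms, which the following Python sketch evaluates for the data of Example 7.18. The function names are hypothetical and chosen only for this illustration.

```python
import math

def present_value_constant_stream(rate, r, T):
    """PV of a constant stream S(t) = rate over [0, T]: (rate / r) * (1 - e^(-rT)), from (7.20)."""
    return rate / r * (1 - math.exp(-r * T))

def future_value(pv, r, T):
    """FV after T years under continuous compounding: PV * e^(rT), cf. (7.21)."""
    return pv * math.exp(r * T)

pv = present_value_constant_stream(10_000, 0.06, 20)
fv = future_value(pv, 0.06, 20)

print(round(pv, 1), round(fv, 1))  # approximately 116467.6 and 386686.2
```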

Example 7.19 Suppose we want to have an amount of 50,000 IR at the date 8 years in the future in a bank account earning 2% interest compounded continuously.
1. If we make one lump sum deposit now, how much should we deposit?
2. If we deposit money continuously throughout the period of 8 years, at what rate should we deposit it?
Solution:
1. If we deposit a lump sum of $P$ Indian rupees now, then $P$ should be equal to the present value $PV$ of 50,000 Indian rupees. Using the second equation from (7.19), we have
$$PV = 50{,}000\cdot e^{-0.02\cdot 8} = 50{,}000\cdot e^{-0.16} \approx 42607.19 .$$
We therefore should now deposit $P = 42607.19$ IR into the account so that we obtain 50,000 IR after 8 years.
2. Suppose we deposit money at the constant rate of $S$ Indian rupees per annum. By (7.20) we have, since $S$ is constant,
$$PV = \int_0^8 S(t)e^{-0.02t}\,dt = S\int_0^8 e^{-0.02t}\,dt \approx 7.3928\,S .$$
But the present value of the continuous deposit must be the same as the present value of the lump sum deposit, that is 42607.19. So
$$42607.19 \approx 7.3928\,S , \qquad S \approx 5763.33 .$$
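Both parts of Example 7.19 reduce to evaluating (7.19) and (7.20) for constant data. The Python sketch below carries out the arithmetic without intermediate rounding; the variable names are our own and serve only as an illustration.

```python
import math

target = 50_000.0   # amount wanted after T years (IR)
r = 0.02            # interest rate, compounded continuously
T = 8.0             # time horizon in years

# Part 1: lump sum now, PV = FV * e^(-rT), second equation of (7.19)
lump_sum = target * math.exp(-r * T)

# Part 2: constant deposit rate S with S * integral_0^T e^(-rt) dt = lump_sum
discount_integral = (1 - math.exp(-r * T)) / r
deposit_rate = lump_sum / discount_integral

print(round(lump_sum, 2), round(discount_integral, 4), round(deposit_rate, 2))
```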

7.4.2 Annuity

An annuity is a sequence of payments made at regular time intervals. The time period
during which these payments are made is called the term of annuity. Although the
payments need not be equal in size, in many situations they are indeed equal. We
assume in our discussion here that they are equal. Let

P = size of each payment in the annuity,


r = interest rate compounded continuously,
T = term of annuity (in years), and
m = number of payments per year.

The payments into the annuity amount to $mP$ rupees per year, which we model as an income stream with constant rate $S(t) = mP$. According to (7.20), its present value becomes
$$\int_0^T S(t)e^{-rt}\,dt = \int_0^T mPe^{-rt}\,dt = \left[\frac{-mPe^{-rt}}{r}\right]_{t=0}^{t=T} = \frac{mP}{r}\left(1 - e^{-rT}\right) .$$
The present value of an annuity is therefore defined as
$$PV = \frac{mP}{r}\left(1 - e^{-rT}\right) . \tag{7.22}$$
The amount $A$ of an annuity is defined as the corresponding future value at the end of the term and thus represents the sum of payments plus the interest earned. According to (7.21), it becomes
$$A = mP\int_0^T e^{r(T-t)}\,dt = \frac{mP}{r}\left(e^{rT} - 1\right) . \tag{7.23}$$

Example 7.20 A proprietor of a hardware store wants to establish a fund now, in order to withdraw 1000 IR per month for the next 10 years. The fund earns interest at the rate of 9% per year compounded continuously. Calculate how much money he needs to establish the fund.
Solution: The money he needs equals the present value of the annuity for the given values $P = 1000$, $r = 0.09$, $T = 10$ and $m = 12$. According to (7.22), we obtain
$$PV = \frac{12\cdot 1000}{0.09}\left(1 - e^{-0.09\cdot 10}\right) \approx 79{,}124.05 \text{ IR} .$$

Example 7.21 On April 1, 1995, a person deposited 4000 IR into an individual retirement account paying interest at the rate of 10 percent per year compounded continuously. Assuming that he deposits 4000 IR annually into the account, how much will he have in his retirement account at the beginning of the year 2003?
Solution: We apply (7.23) where $P = 4000$, $r = 0.1$, $T = 8$ and $m = 1$ to get
$$A = 1\cdot 4000\cdot\int_0^8 e^{0.1(8-t)}\,dt = \frac{4000}{0.1}\left(e^{0.8} - 1\right) \approx 49{,}021.64 \text{ IR} .$$
The person has approximately 49,021.64 IR in his account at the beginning of 2003.
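Formulas (7.22) and (7.23) are straightforward to evaluate by machine. The following Python sketch does so for the data of Examples 7.20 and 7.21; the function names `annuity_present_value` and `annuity_amount` are hypothetical and introduced only for this illustration.

```python
import math

def annuity_present_value(P, r, T, m):
    """Present value (m*P/r) * (1 - e^(-rT)) of an annuity, cf. (7.22)."""
    return m * P / r * (1 - math.exp(-r * T))

def annuity_amount(P, r, T, m):
    """Amount (future value) (m*P/r) * (e^(rT) - 1) of an annuity, cf. (7.23)."""
    return m * P / r * (math.exp(r * T) - 1)

# Example 7.20: P = 1000, r = 0.09, T = 10, m = 12
fund = annuity_present_value(1000, 0.09, 10, 12)

# Example 7.21: P = 4000, r = 0.1, T = 8, m = 1
retirement = annuity_amount(4000, 0.1, 8, 1)

print(round(fund, 2), round(retirement, 2))
```

A useful consistency check is that the amount equals the present value multiplied by $e^{rT}$, in agreement with (7.19).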

7.4.3 Applications in Business

The determination of the variable cost of producing a consecutive number of units is important to manufacturers. Let us recall from Sect. 3.3 that $C = C(x)$ denotes the cost of producing $x$ units of a certain commodity, and that its derivative $C'$ is called the marginal cost. Assume that we want to produce $a - 1$ units anyway and ask how much more it would cost to additionally produce units $a$ through $b$. By the Fundamental Theorem of Calculus, this cost (called variable cost) is given by
$$VC = C(b) - C(a - 1) = \int_{a-1}^{b} C'(x)\,dx . \tag{7.24}$$
If we know the marginal cost $C'(x)$, which we also denote by $MC(x)$, we can compute the variable cost by evaluating the integral.
Example 7.22 Let $MC(x) = 2x^2 - 3x + 2$ be the marginal cost for a certain commodity. Find the variable cost of producing 12 through 16 units.
Solution: We insert the marginal cost function $MC = C'$ and the given values $a = 12$ and $b = 16$ into (7.24) and obtain
$$VC = \int_{11}^{16} (2x^2 - 3x + 2)\,dx = \left[2\frac{x^3}{3} - \frac{3x^2}{2} + 2x\right]_{11}^{16} \approx 1650.83 .$$

Depletion. Natural resources such as oil, gas, and coal are limited in quantity, and their total depletion depends on the rate at which each resource is being consumed. Let $A = A(t)$ denote the annual rate of depletion, let $A(0) = A_0$ at time $t = 0$ for some given value $A_0$ and suppose that $A(t)$ increases at a rate $k$ each year (written as a decimal, for example $k = 0.08$ for 8%). If compounded continuously, we have
$$A(t) = A_0e^{kt} ,$$
and the total amount $S$ of depletion after a time of $T$ years becomes
$$S = \int_0^T A(t)\,dt = \int_0^T A_0e^{kt}\,dt . \tag{7.25}$$

Example 7.23 Suppose the world use of oil in 1976 was 21 billion barrels and the annual percentage of increase of consumption equalled 8% in this and the following years.
1. How many barrels of oil did the world use from 1976 to 1996?
2. In 1976, there were 550 billion barrels of proven reserves. How long did it take to use all of them?
Solution:
1. We apply Eq. (7.25) with the given values $T = 20$, $k = 0.08$ and $A_0 = 21$ billion barrels in 1976, and obtain the amount $S$ of oil used between 1976 and 1996 as
$$S = \int_0^{20} 21e^{0.08t}\,dt = \left[\frac{21}{0.08}e^{0.08t}\right]_0^{20} = \frac{21}{0.08}\left(e^{1.6} - 1\right) \approx 1037.67 \text{ billion barrels.}$$
2. Here we need to know how long it took to use 550 billion barrels of oil. We put $S = 550$, $A_0 = 21$ and $k = 0.08$ in Eq. (7.25) and get
$$550 = \int_0^T 21e^{0.08t}\,dt = \left[\frac{21}{0.08}e^{0.08t}\right]_0^T = \frac{21}{0.08}\left(e^{0.08T} - 1\right) .$$
We find the value of $T$ from this equation,
$$e^{0.08T} = \frac{550\cdot 0.08}{21} + 1 \approx 3.095 , \qquad \ln 3.095 = 0.08\,T , \qquad T \approx 14.12 .$$
Thus, 14.12 years from 1976 or by 1990, the world oil reserves would have been depleted according to this model if no new reserves were discovered.
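Both questions of Example 7.23 follow from the closed form of (7.25), $S = (A_0/k)(e^{kT} - 1)$, and its inverse with respect to $T$. The Python sketch below evaluates them; the function names are hypothetical and chosen for this illustration only.

```python
import math

def total_depletion(A0, k, T):
    """Total consumption over [0, T] for the rate A0 * e^(kt): (A0/k) * (e^(kT) - 1), cf. (7.25)."""
    return A0 / k * (math.exp(k * T) - 1)

def time_to_deplete(A0, k, reserves):
    """Solve (A0/k) * (e^(kT) - 1) = reserves for T."""
    return math.log(reserves * k / A0 + 1) / k

used_1976_to_1996 = total_depletion(21, 0.08, 20)   # billion barrels
years_until_empty = time_to_deplete(21, 0.08, 550)  # years after 1976

print(round(used_1976_to_1996, 2), round(years_until_empty, 2))
```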
Rate of Sales. When the rate of sales of a product is a known function $f = f(t)$ of time $t$, the number of sales $S(t)$ of this product up to time $t$ satisfies $S'(t) = f(t)$, and hence the total sales over the period $[0, T]$ is given by
$$S(T) = \int_0^T f(t)\,dt . \tag{7.26}$$

Example 7.24 Suppose the rate of sales of a new model of Honda car is given by
$$f(t) = 100 - 90e^{-t} ,$$
where $t$ is the number of days the product is on the market. Find the total sales during the first 4 days.
Solution: From (7.26) we get
$$S(4) = \int_0^4 f(t)\,dt = \int_0^4 (100 - 90e^{-t})\,dt = \left[100t + 90e^{-t}\right]_0^4 = 310 + 90e^{-4} \approx 310 + 90\cdot 0.0183 \approx 311.65 \text{ units.}$$
That is, about 311 cars will be sold during the first 4 days.


Example 7.25 A furniture manufacturing company has a current sales rate of
1,000,000 Indian rupees per month, and the profit to the company averages 10%
of the sales. The company’s past experience with a certain advertising strategy is
that sales will increase 2% per month over the duration of the advertising campaign
(12 months). The monthly rate of sales f (t) during this advertising campaign obeys
a growth curve of the type $f(t) = A_0e^{rt}$, where $A_0$ is the current sales rate and $r$ is a constant determining its increase. The company now needs to decide whether to
embark on a similar campaign that will cost 130,000 Indian rupees. The decision
will be affirmative, provided the increase in sales due to the campaign yields more
than 13,000 Indian rupees as a profit (which would be the standard 10% return on
investments of the company). Should the company take an affirmative decision or
not?
Solution: We calculate what happens when the company decides to start the advertising campaign. In this case, the monthly rate of sales during the campaign will be $f(t) = 10^6e^{0.02t}$, where $t$ is measured in months. According to (7.26), the total sales after 12 months (the length of the campaign) will be
$$S(12) = \int_0^{12} 10^6\cdot e^{0.02t}\,dt = \left[\frac{10^6}{0.02}e^{0.02t}\right]_0^{12} = 5\cdot 10^7\cdot\left(e^{0.24} - 1\right) \approx 13.56\cdot 10^6 \text{ IR} .$$
Without the campaign, the total sales will be $12\cdot 10^6$ IR. The profit to the company amounts to 10% of the sales, so that the profit due to the increase in sales by the campaign is
$$0.1\cdot(13.56\cdot 10^6 - 12\cdot 10^6) = 156{,}000 \text{ IR} .$$
This 156,000 IR profit is achieved through the expenditure of 130,000 IR. Thus, the advertisement would yield an additional profit of
$$156{,}000 - 130{,}000 = 26{,}000 \text{ IR} .$$
Since this is more than 13,000 IR, the standard profit obtainable for the company from the money spent on advertisement, the decision should be affirmative.
Consumer Surplus. Here we introduce the notion of “consumer surplus” and show
that it is represented by an integral. Assume that a company sells a certain commodity,
and that the price p(x) it gets is a function of the number x of units it sells. This
function (or its inverse, the number of units the company can sell in dependence upon
the price it offers) is called the demand function. Usually, p is a decreasing function
of x as one needs lower prices for larger sales.
Let $X$ be the amount of the commodity currently available and $P = p(X)$ its current selling price. Assume for the moment that we replace the function $p$ by a step function $s$, according to a partition $0 = x_0 < x_1 < x_2 < \cdots < x_n = X$ with subintervals of equal length $h = x_i - x_{i-1}$, having values $s(x) = p(x_i)$ for $x_{i-1} < x \le x_i$. The portion $(x_{i-1}, x_i]$ of the commodity is sold at price $P$, whereas it could have been sold at price $p(x_i)$ if only $x_i$ instead of $X$ units were on sale, because the function $s$ tells us that there are customers prepared to pay the amount $s(x) = p(x_i)$. As a consequence, the amount saved by the customers "belonging" to that subinterval equals $(p(x_i) - P)(x_i - x_{i-1})$. The total savings of all customers are
$$\sum_{i=1}^{n} (p(x_i) - P)(x_i - x_{i-1}) .$$
This is a Riemann sum for the function $f(x) = p(x) - P$. Taking the limit as $n \to \infty$, this sum approaches the integral
$$\int_0^X (p(x) - P)\,dx . \tag{7.27}$$

The integral in (7.27) is called the consumer surplus for the commodity. It represents the total amount of money saved by the customers in purchasing the commodity at price $P$ corresponding to a demand level of $X$. Thus, the consumer surplus is equal to the area between the demand curve $p = p(x)$ and the line $p = P$, the shaded area in Fig. 7.17.

Fig. 7.17 The demand curve

7.5 Applications to Mechanics and Engineering

In this section, we discuss how integrals arise in the modeling of mechanical work.
The concept of mechanical work is of vital importance in engineering problems.
Concrete examples are the work done when pumping water for a dam to function
properly, or the work done when lifting an object (say, a bucket of water or a bag of
sand). In elementary physics, it is taught that the work $W$ done by a constant force $F$ when moving an object over a distance $d$ along a line equals $W = Fd$, that is, the work done by a force equals force times distance. If the force acts along the x-axis and depends continuously upon the position $x$, and an object moves from $a$ to $b$ under its influence, then the work done by $F$ in moving the object from $a$ to $b$ is
$$W = \int_a^b F(x)\,dx . \tag{7.28}$$

Note 7.2 In the English system, work is measured in foot-pounds (ft-lbs). In the
metric system, the unit of work is a Newton meter (Nm), or a Joule (J).

Example 7.26 At each point of the x-axis, marked off in feet, a force of $5x^2 - x + 2$ pounds pulls an object. Determine the work done in moving the object from $x = 1$ to $x = 4$.

Solution: By (7.28) we have
$$\begin{aligned}
W &= \int_1^4 (5x^2 - x + 2)\,dx = \left[\frac{5}{3}x^3 - \frac{1}{2}x^2 + 2x\right]_1^4 \\
&= \frac{5}{3}(64 - 1) - \frac{1}{2}(16 - 1) + 2(4 - 1) = 103.5 \text{ ft-lbs.}
\end{aligned}$$

Fig. 7.18 Lifting a leaky bucket

Let us now consider what happens when we lift an object vertically upward. Assume that the object has mass $m = m(x)$ which depends on its height $x$ above the ground, as for example a leaky bucket which gradually loses its contents. In lifting the object, we have to overcome the gravitational force of the earth which, according to Newton's law, is given by
$$F(x) = m(x)g .$$
Here $g = 9.81\ \mathrm{m/s^2}$ denotes the acceleration due to earth's gravity near its surface. By (7.28), the work needed to lift the object from height $a$ to height $b$ equals
$$W = \int_a^b F(x)\,dx = g\int_a^b m(x)\,dx . \tag{7.29}$$

Example 7.27 A leaky bucket of mass 5 kg is lifted vertically from the ground into the air by pulling in 20 m of rope at a constant speed (Fig. 7.18). The rope has a mass of 0.08 kg/m. The bucket starts with 8 L (= 8 kg) of water and leaks at a constant rate. It finishes draining just as it reaches the top. Compute the work done in lifting for the bucket, the water and the rope separately as well as the total work done.
Solution: We first compute the masses as functions of height $x$. The mass of the bucket is constant, while the masses of water and rope decrease linearly with $x$. For the water, the fraction of water still present when the bucket is $x$ meters off the ground equals $(20 - x)/20$, so the mass of the water equals 8 times this fraction. In the same manner, we obtain the mass of the rope. Thus, we get the mass
$$\begin{aligned}
&\text{for the bucket} & m_B(x) &= 5 \text{ kg,} \\
&\text{for the water} & m_W(x) &= 8\,\frac{20 - x}{20} \text{ kg,} \\
&\text{for the rope} & m_R(x) &= 0.08\cdot(20 - x) \text{ kg.}
\end{aligned}$$
The work done in lifting becomes, see (7.29),
$$\begin{aligned}
W_B &= g\int_0^{20} 5\,dx = 100g = 981 \text{ Nm,} \\
W_W &= g\int_0^{20} 8\,\frac{20 - x}{20}\,dx = g\int_0^{20} \left(8 - \frac{2}{5}x\right)dx = g\left[8x - \frac{x^2}{5}\right]_0^{20} = (160 - 80)g = 80g = 784.8 \text{ Nm,} \\
W_R &= g\int_0^{20} 0.08\cdot(20 - x)\,dx = g\int_0^{20} (1.6 - 0.08x)\,dx = g\left[1.6x - 0.04x^2\right]_0^{20} = (32 - 16)g = 16g = 156.96 \text{ Nm.}
\end{aligned}$$
The total work done is
$$W = W_B + W_W + W_R = 196g = 1922.76 \text{ Nm.}$$
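The three contributions in Example 7.27 can be verified by evaluating (7.29) numerically for each mass function. The Python sketch below is an illustration under our own assumptions; `simpson` and `lifting_work` are hypothetical helper names.

```python
G = 9.81  # acceleration due to gravity in m/s^2, as used in the text

def simpson(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def lifting_work(mass, a, b):
    """Work g * integral of m(x) dx needed to lift a mass m(x) from height a to b, cf. (7.29)."""
    return G * simpson(mass, a, b)

work_bucket = lifting_work(lambda x: 5.0, 0.0, 20.0)                # constant 5 kg
work_water = lifting_work(lambda x: 8.0 * (20 - x) / 20, 0.0, 20.0)  # leaking water
work_rope = lifting_work(lambda x: 0.08 * (20 - x), 0.0, 20.0)       # rope still hanging
work_total = work_bucket + work_water + work_rope

print(work_bucket, work_water, work_rope, work_total)
# approximately 981, 784.8, 156.96 and 1922.76 Nm
```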

Example 7.28 A bag of sand of mass 100 kg is lifted by a cable from the ground to the top of a 50 m high building. Sand leaks out of the bag at the rate of 0.5 kg for each meter the bag is raised. How much work is required to lift the bag of sand to the top of the building if
1. the masses of the cable and the bag are negligible,
2. the cable has a mass of 1.5 kg per m and the mass of the bag is negligible.
Solution:
1. At the point $x$ m above the ground, the mass of the sand bag is
$$m_1(x) = 100 - 0.5x .$$
The work done in lifting the bag to the top of the building is
$$W_1 = g\int_0^{50} (100 - 0.5x)\,dx = g\left[100x - \frac{0.5x^2}{2}\right]_0^{50} = (5000 - 625)g = 4375g = 42918.75 \text{ Nm.}$$
2. The mass of the part of the cable still to be pulled at height $x$ is $1.5(50 - x)$ kg. Thus, the work required to lift the cable to the top of the building is
$$W_2 = g\int_0^{50} 1.5(50 - x)\,dx = 1.5g\left[50x - \frac{x^2}{2}\right]_0^{50} = 1.5g(2500 - 1250) = 1875g = 18393.75 \text{ Nm.}$$
Therefore, the total work done in lifting is
$$W = W_1 + W_2 = (4375 + 1875)g = 6250g = 61312.5 \text{ Nm.}$$

Let us now consider an elastic spring. Hooke’s law states that the force F required to stretch or compress a spring by a length x from its natural length (at F = 0) is proportional to x, that is,

\[ F = kx . \tag{7.30} \]

Here, k is a constant which depends on the specific spring; it is called the spring constant and is measured in units of force per length, for example, N/m.
Example 7.29 Find the work required to compress a spring from its natural length
of 1 m to a length of 0.8 m if the spring constant equals k = 16 N/m.
Solution: Consider the uncompressed spring along the x-axis with its movable end at the origin and its fixed end at x = 1 m (see Fig. 7.19). By Hooke’s law, the force required to compress the spring from 0 to x is F = 16x N. The work done by F over the interval from x = 0 to x = 0.2 m is

\[ W = \int_0^{0.2} 16x \, dx = \Big[ 8x^2 \Big]_0^{0.2} = 0.32\ \text{Nm}. \]

Fig. 7.19 Compressed and uncompressed spring

Fig. 7.20 Unstretched and stretched spring


Example 7.30 A spring has a natural length of 1 m. A force of 24 N stretches the spring to a length of 1.6 m.
1. Find the spring constant k.
2. How much work is required to stretch the spring 2 m beyond its natural length?
3. How far will a 40 N force stretch the spring?

Solution:
1. Since a force of 24 N stretches the spring by 0.6 m, using (7.30) we get

\[ 24 = k \cdot 0.6 , \qquad k = \frac{24\ \text{N}}{0.6\ \text{m}} = 40\ \text{N/m} . \]
2. Consider the unstretched spring hanging along the x-axis with its free end at
x = 0 (see Fig. 7.20). The force required to pull the spring x m beyond its natural
length is just the force required to pull the free end of the spring x m downward
from its original position.

By Hooke’s law this force is F(x) = 40x N. The work done by pulling the spring from x = 0 m to x = 2 m is

\[ W = \int_0^2 40x \, dx = \Big[ 20x^2 \Big]_0^2 = 80\ \text{Nm}. \]

3. We have F = 40x. To find the elongation resulting from a force of 40 N, we substitute F = 40 into this equation and obtain

\[ x = \frac{40}{40} = 1\ \text{m}. \]
A force of 40 N will stretch the spring by 1 m.
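The spring computations of Examples 7.29 and 7.30 follow one pattern: solve F = kx for k, then integrate kx. A small sketch of that pattern is given below; the function names `spring_constant` and `spring_work` are illustrative choices, not notation from the text.

```python
def spring_constant(force_n, extension_m):
    """Hooke's law F = k x solved for k (result in N/m)."""
    return force_n / extension_m


def spring_work(k, x_from, x_to):
    """Work in Nm: integral of k*x from x_from to x_to, i.e. k/2 * (x_to^2 - x_from^2)."""
    return 0.5 * k * (x_to ** 2 - x_from ** 2)


k = spring_constant(24.0, 0.6)                # Example 7.30, part 1
print(round(k, 6))
print(round(spring_work(16.0, 0.0, 0.2), 6))  # Example 7.29
print(round(spring_work(k, 0.0, 2.0), 6))     # Example 7.30, part 2
print(round(40.0 / k, 6))                     # Example 7.30, part 3: x = F / k
```

Using the closed form k/2 · (x_to² − x_from²) instead of numerical integration keeps the sketch exact for the linear force law assumed by Hooke's law.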

7.6 Integrals and Probability

Probability theory began to emerge as a science in Europe during the sixteenth and
seventeenth centuries, as marked by the book “Liber de ludo aleae” on games of dice
by Gerolamo Cardano (published posthumously in 1663) and by a famous exchange
of letters between the mathematicians Blaise Pascal and Pierre de Fermat in 1654. Since
then, probability theory and its descendants have evolved into an important
branch of mathematics with wide applications in practically every sphere of human
endeavor in which an element of uncertainty is involved. Here, we present some
examples of how integrals are involved in the computation of probabilities in some
elementary situations.
A dependent variable whose values also depend on some random outcome is
called a random variable. A random variable x that can assume any value in some
given interval is called a continuous random variable. The life span of a light bulb,
the length of a telephone call, the length of an infant at birth, the daily amount of
rainfall in Delhi, and the life span of certain plant species are examples of continuous
random variables.
Definition 7.1 (Probability Density Function) A function f defined on some interval I = [A, B] (the values A = −∞ and B = ∞ are also possible) is called a probability density function (or simply a density function) if the following conditions are satisfied:
1. f is nonnegative, that is, f(x) ≥ 0 for all x.
2. f is integrable over I (in the cases A = −∞ or B = ∞ we mean the improper integral).
3. The total area under the graph of f from A to B is equal to 1 (see Fig. 7.21), that is, $\int_A^B f(x)\,dx = 1$.
If the probability that an observed value of a given random variable x lies in some subinterval [a, b] of I satisfies

\[ P(a \le x \le b) = \int_a^b f(x)\,dx \tag{7.31} \]

(see Fig. 7.22), we say that f is the density function belonging to this random variable.

Fig. 7.21 Total area corresponds to probability equal to one

Fig. 7.22 Probability of the value of x lying between a and b. P(a < x < b) is the probability that an outcome of an experiment will lie between a and b.

Remark 7.1 1. According to property (3) of Definition 7.1, the probability that
the continuous random variable takes on a value lying in its overall range I =
[A, B] equals 1. This corresponds to the fact that no other values are possible.
(The probability of an event which always occurs is 1, by a basic convention of
probability theory.)
2. According to formula (7.31), the probability that the random variable x assumes a value in an interval a ≤ x ≤ b is given by the area of the region between the graph of f and the x-axis from x = a to x = b.
3. Since $P(c - \varepsilon \le x \le c) = \int_{c-\varepsilon}^{c} f(x)\,dx$ tends to zero as ε tends to zero (the area below a single point of the graph of f is zero), the probability that x exactly attains an arbitrarily given value c is zero. The random variable x therefore must have the property that

\[ P(a \le x < b) = P(a < x \le b) = P(a < x < b) = P(a \le x \le b) . \]

Consequently, a continuous random variable x which assigns a nonzero probability to certain discrete values cannot be modeled in the form (7.31).

Example 7.31 Show that the following functions are probability density functions on the intervals indicated:
1. $f(x) = \frac{3}{56}(5x - x^2)$, I = [0, 4].
2. $f(x) = \frac{2}{27}\,x(x - 1)$, I = [1, 4].
3. $f(x) = \frac{1}{3} e^{-\frac{1}{3}x}$, I = [0, ∞).
Solution: In all cases, it is clear that f is nonnegative on the respective interval I .
1. The function is continuous and hence integrable on I. We have

\[ \int_0^4 \frac{3}{56}(5x - x^2)\,dx = \frac{3}{56}\Big[\frac{5x^2}{2} - \frac{x^3}{3}\Big]_0^4 = \frac{3}{56}\Big(\frac{80}{2} - \frac{64}{3}\Big) = 1, \]

hence f satisfies requirements (1)–(3) of Definition 7.1 and therefore is a density function.
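2. The function is continuous and hence integrable on I. We have

\[ \int_1^4 f(x)\,dx = \int_1^4 \frac{2}{27}(x^2 - x)\,dx = \frac{2}{27}\Big[\frac{x^3}{3} - \frac{x^2}{2}\Big]_1^4 = \frac{2}{27}\Big[\Big(\frac{64}{3} - \frac{16}{2}\Big) - \Big(\frac{1}{3} - \frac{1}{2}\Big)\Big] = \frac{2}{27}\cdot\frac{27}{2} = 1. \]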

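3. The function is continuous and hence integrable on any finite interval [0, b]. We obtain the improper integral as the limit

\[ \int_0^\infty \frac{1}{3} e^{-\frac{1}{3}x}\,dx = \lim_{b\to\infty} \int_0^b \frac{1}{3} e^{-\frac{1}{3}x}\,dx = \lim_{b\to\infty} \Big[-e^{-\frac{1}{3}x}\Big]_0^b = \lim_{b\to\infty} \big(-e^{-\frac{1}{3}b} + 1\big) = 1. \]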
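The three conditions of Definition 7.1 can also be checked numerically for the functions of Example 7.31. The sketch below is a sampling-based plausibility check, not a proof: `simpson` and `looks_like_density` are hypothetical helper names, nonnegativity is only tested on a grid, and the improper integral in part 3 is truncated at the (assumed) upper limit b = 60, where the remaining tail e^{-20} is negligible.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def looks_like_density(f, a, b, tol=1e-6, samples=1000):
    """Check f >= 0 on a grid and that the integral over [a, b] is close to 1."""
    nonnegative = all(f(a + (b - a) * i / samples) >= 0 for i in range(samples + 1))
    return nonnegative and abs(simpson(f, a, b) - 1.0) < tol


f1 = lambda x: 3 / 56 * (5 * x - x ** 2)       # on [0, 4]
f2 = lambda x: 2 / 27 * x * (x - 1)            # on [1, 4]
f3 = lambda x: math.exp(-x / 3) / 3            # on [0, inf), truncated at 60

print(looks_like_density(f1, 0, 4))
print(looks_like_density(f2, 1, 4))
print(looks_like_density(f3, 0, 60))
```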

Example 7.32 The Philips company manufactures a 200 watt light bulb. Laboratory
tests showed that the life span of these light bulbs has a distribution described by the
probability density function

\[ f(x) = 0.001\,e^{-0.001x} , \]

where x is measured in hours. Determine the probability that a light bulb will have
a life span of
1. 500 h or less,
2. more than 500 h, and
3. more than 1000 h, but less than 1500 h.
Solution: Let x denote the life span of the light bulbs. x is a continuous random variable whose value is different for each actual bulb.

1. The probability that a certain specific light bulb will have a life span of 500 h or less is given by

\[ P(0 \le x \le 500) = \int_0^{500} 0.001\,e^{-0.001x}\,dx = \Big[-e^{-0.001x}\Big]_0^{500} = -e^{-0.5} + 1 \approx 0.3935 . \]

2. The probability that a light bulb will have a life span of more than 500 h is given by

\[ P(x > 500) = \int_{500}^{\infty} 0.001\,e^{-0.001x}\,dx = \lim_{b\to\infty} \int_{500}^{b} 0.001\,e^{-0.001x}\,dx = \lim_{b\to\infty} \Big[-e^{-0.001x}\Big]_{500}^{b} = \lim_{b\to\infty} \big(-e^{-0.001b} + e^{-0.5}\big) = e^{-0.5} \approx 0.6065 . \]

Using part 1, the result can also be obtained as

\[ P(x > 500) = 1 - P(x \le 500) = 1 - (1 - e^{-0.5}) = e^{-0.5} \approx 0.6065 . \]

3. The probability that a light bulb will have a life span of more than 1000 h, but less than 1500 h is given by

\[ P(1000 < x < 1500) = \int_{1000}^{1500} 0.001\,e^{-0.001x}\,dx = \Big[-e^{-0.001x}\Big]_{1000}^{1500} = -e^{-1.5} + e^{-1} \approx -0.2231 + 0.3679 = 0.1448 . \]
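All three probabilities of Example 7.32 come from the antiderivative −e^{−kx} of the exponential density. A short sketch of this, with the hypothetical helper names `exp_cdf` and `exp_prob` and with k = 0.001 taken from the example, might look as follows.

```python
import math


def exp_cdf(k, x):
    """P(X <= x) for the exponential density k*exp(-k*x) on [0, inf)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-k * x)


def exp_prob(k, a, b=math.inf):
    """P(a <= X <= b); b defaults to infinity for 'more than a'."""
    return exp_cdf(k, b) - exp_cdf(k, a)


k = 0.001  # parameter of the life-span density in Example 7.32
print(round(exp_prob(k, 0, 500), 4))      # 500 h or less
print(round(exp_prob(k, 500), 4))         # more than 500 h
print(round(exp_prob(k, 1000, 1500), 4))  # between 1000 h and 1500 h
```

The last value may differ from the hand computation in the fourth decimal place, because the text adds two intermediate values that were already rounded.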

Example 7.33 1. Determine the value of the constant k such that the function
$f(x) = kx^2$ becomes a probability density function on the interval [0, 5].
2. If x is a continuous random variable with the probability density function given
by (1), compute the probability that x will assume a value between x = 1 and
x = 2.

Solution:
1. We have

\[ \int_0^5 kx^2\,dx = k \int_0^5 x^2\,dx = \frac{k}{3}\Big[x^3\Big]_0^5 = \frac{125}{3}\,k. \]

For $f(x) = kx^2$ to be a probability density function, we must have $\frac{125}{3}k = 1$, therefore $k = \frac{3}{125}$.
2. The required probability is given by

\[ P(1 \le x < 2) = \int_1^2 f(x)\,dx = \frac{3}{125} \int_1^2 x^2\,dx = \frac{1}{125}\Big[x^3\Big]_1^2 = \frac{1}{125}(8 - 1) = \frac{7}{125}. \]

Fig. 7.23 Uniform density function

Fig. 7.24 Exponential density function

Definition 7.2 (Uniform and exponential density function)
1. The probability density function f defined by

\[ f(x) = \frac{1}{b - a} \]

is called the uniform density function on I = [a, b]. In this case, we say that the random variable x is uniformly distributed on [a, b] (Fig. 7.23).
2. A probability density function f defined by

\[ f(x) = k e^{-kx} , \]

where k is a positive constant, is called an exponential density function on I = [0, ∞). In this case, the random variable x is said to be exponentially distributed on [0, ∞). Note that the area below the graph of $f(x) = ke^{-kx}$ is equal to 1 (Fig. 7.24).

Example 7.34 Trains stop at a certain terminal regularly every 30 min. What is the
probability that a passenger, who arrives at the terminal at a random time, will have
to wait more than 10 min before he catches a train?

Solution: Let t denote the length of time the passenger has to wait for the next train. This is a continuous random variable with values in the interval I = [0, 30]. In the absence of other information, we assume that t is uniformly distributed on [0, 30]. The corresponding uniform density function is the constant function

\[ f(t) = \frac{1}{30} . \]

The probability that a passenger will have to wait more than 10 min is

\[ P(t \ge 10) = \int_{10}^{30} \frac{1}{30}\,dt = \frac{1}{30}(30 - 10) = \frac{2}{3} . \]

Example 7.35 Assume that airplanes departing from an airport follow a pattern described by an exponential density function; that is, when a plane has just left, the next plane will depart after t minutes, where t is exponentially distributed. Find the probability that an airplane leaves within 6 min, if the constant of the exponential distribution has the value k = 0.25.
Solution: The random variable t is exponentially distributed according to

\[ f(t) = 0.25\,e^{-0.25t} . \]

The required probability is

\[ P(t \le 6) = \int_0^6 0.25\,e^{-0.25t}\,dt = \Big[-e^{-0.25t}\Big]_0^6 = -e^{-1.5} + 1 \approx 1 - 0.223 = 0.777 . \]

Therefore, the probability that the next plane departs within 6 min is equal to 0.777.
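Examples 7.34 and 7.35 apply the two densities of Definition 7.2. As a rough illustration, both probabilities can be computed from closed forms; the function names `uniform_prob` and `exponential_prob_within` below are assumptions made for this sketch.

```python
import math


def uniform_prob(lo, hi, a, b):
    """P(a <= t <= b) for t uniformly distributed on [lo, hi]."""
    left, right = max(a, lo), min(b, hi)
    return max(0.0, right - left) / (hi - lo)


def exponential_prob_within(k, t):
    """P(0 <= T <= t) for the exponential density k*exp(-k*t)."""
    return 1.0 - math.exp(-k * t)


# Example 7.34: waiting more than 10 min when trains stop every 30 min
print(round(uniform_prob(0, 30, 10, 30), 4))
# Example 7.35: next departure within 6 min with k = 0.25
print(round(exponential_prob_within(0.25, 6), 3))
```

For the uniform density the probability is simply the length of the overlap of [a, b] with [lo, hi] divided by hi − lo, which is why no integration routine is needed.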
Example 7.36 A machine produces a continuous stream of items of a certain commodity. An inspector tests the items and records when a defective item appears. Assume it is known that, once a defective item has appeared, the number x of items produced until the next defective item appears is exponentially distributed with parameter k = 1/200. Find the probability that, after a defective item appears, the next 200 items are not defective.
Solution: The probability density function is given by

\[ f(x) = k e^{-kx} , \qquad k = \frac{1}{200} . \]

The probability that the next defective item appears within the next 200 items is

\[ \int_0^{200} \frac{1}{200}\,e^{-x/200}\,dx = \Big[-e^{-x/200}\Big]_0^{200} = -e^{-1} + 1 \approx 0.632 . \]

Hence, the required probability that the next defective item will not occur within the next 200 items is equal to 1 − 0.632 = 0.368.

7.7 Exercises

7.7.1 Find the area of the region between the graphs of the following two functions on the interval from 0 to π.
(a) $y = 1 + \cos\frac{1}{3}x$ and $y = \sin 4x$.
(b) $y = 4 + \cos 2x$ and $y = 3\sin\frac{x}{2}$.
7.7.2 Find the area of the region between the graphs of f and g if x belongs to the
given interval.
(a) $f(x) = x^3 - 4x + 2$, g(x) = 2, on [−1, 3].
(b) f (x) = sin x, g(x) = cos x, on [0, 2π].
7.7.3 Sketch the region D bounded by the graphs of the functions of the following
equations, and find the volume of the solid generated if D is revolved about
the indicated axis.
(a) $y = \frac{1}{x}$, x = 1, x = 3, y = 0, x-axis.
(b) $y = 2x$, $y = 4x^2$, y-axis.
7.7.4 Show that the circumference C of the ellipse with the equation $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ is given by

\[ C = 4a \int_0^{\pi/2} \sqrt{1 - e^2 \sin^2\theta}\;d\theta \]

where $e = \sqrt{1 - b^2/a^2}$ is the eccentricity (assume that b ≤ a).
If the planet Mercury travels in an elliptical orbit with e = 0.206 and a =
1.387, find the maximum and minimum distances between Mercury and the
Sun.
7.7.5 A force of 25 N is required to compress a spring of natural length 0.80 m to
a length of 0.75 m. Find the work done in compressing the spring from its
natural length to a length of 0.70 m.
7.7.6 A construction worker pulls a 50 kg motor from ground level to the top of a 60 m high building using a rope that weighs $\frac{1}{4}$ kg per m. Find the work done.
7.7.7 The force (in Newtons) with which two electrons repel each other is inversely
proportional to the square of the distance (in meters) between them.
(a) If one electron is held fixed at the point (5, 0), find the work done in
moving a second electron along the axis from the origin to the point
(3, 0).
(b) If two electrons are held fixed at the points (5, 0) and (−5, 0), respec-
tively, find the work done in moving the third electron from the origin to
(3, 0).

7.7.8 A motorboat uses gasoline at the rate of $t\sqrt{9 - t^2}$ gal/hr. If the motor is started at t = 0, how much gasoline has been used after 2 h?
7.7.9 Suppose the flow rate of blood at time t through a certain cross section of a blood vessel is given by

\[ F(t) = \frac{F_1}{(1 + \alpha t)^2}\ \frac{\text{cm}^3}{\text{s}} , \]

where $F_1$ and α are constants. Find the average flow rate $\bar{F}$ during the time interval [0, T].
7.7.10 The cost (in Euro) of producing q units of a product is given by

\[ c = 4000 + 10q + 0.1q^2 . \]

Find the average cost for the range from 100 to 500 units.
7.7.11 Suppose that colored dye is injected into the blood stream at a constant rate R. At time t, let $C(t) = \frac{R}{F(t)}$ be the concentration of dye at a location different from the point of injection, where F(t) is given as in Exercise 7.7.9. Show that the average concentration on a time interval [0, T] is

\[ \bar{C} = \frac{R\,(1 + \alpha T + \frac{1}{3}\alpha^2 T^2)}{F_1} . \]

7.7.12 Let the average lifetime of a DVD player be 4 years. A reasonable model for breakdown time is given by an exponential random variable. Let $p(x) = \frac{1}{4} e^{-x/4}$ be its density, for 0 ≤ x < ∞ measured in years.
(a) Find the probability that the DVD player will eventually break.
(b) Find the probability that the DVD player will break down within 12 years.

Hint: Use the fact that the probability of breakdown in the interval a ≤
x ≤ b equals the integral of p from a to b.
7.7.13 Suppose Fig. 7.25 gives the density function for the waiting time at a doctor’s clinic.
(a) What is the longest time one has to wait?
(b) Approximately what fraction of patients wait between 1 and 2 h?
(c) Approximately what fraction of patients wait less than an hour?
Fig. 7.25 Distribution of waiting time at a doctor’s clinic

Fig. 7.26 Catching fish

7.7.14 Suppose we want to analyze the fishing industry in a town. Each day, boats bring back at least 2 tons (quintals) of fish, but never more than 8 tons (quintals).
(a) Applying the density function describing the catch in Fig. 7.26, find the
cumulative distribution and graph the corresponding cumulative distri-
bution function.
(b) What is the probability that the catch is between 5 and 7 tons?
7.7.15 After measuring the duration of many telephone calls, a telephone company found their data was well approximated by the density function $p(x) = 0.4\,e^{-0.4x}$, where x is the duration of a call in minutes.
(a) What percentage of calls last between 1 and 2 min?
(b) What percentage of calls last 1 min or less?
(c) What percentage of calls last 2 min or less?
(d) What percentage of calls last 3 min or more?

7.7.16 A cumulative distribution function P of a probability density function p is defined by

\[ P(t) = \int_{-\infty}^{t} p(x)\,dx . \]

The values P(t) give the fraction of the population having values of x below t. Find the cumulative distribution function of the probability density function of Exercise 7.7.19.
7.7.17 At a bus stop, the time X (in minutes) that a randomly arriving person must wait for a bus is uniformly distributed with density function $f(x) = \frac{1}{10}$, where 0 ≤ x ≤ 10 and f(x) = 0 otherwise. What is the probability that such a person must wait at most seven minutes? What is the average time that a person must wait?
Hint: The average or mean waiting time equals $\int_{-\infty}^{\infty} x f(x)\,dx$.
7.7.18 The length of life X (in years) of an electronic component has an exponential distribution with $k = \frac{1}{6}$. What is the probability that such a component will fail within 4 years of use? Find the probability that it will last more than 6 years.
7.7.19 Assume that in a specific hospital the length of time X (in hours) between
successive arrivals at the emergency room is exponentially distributed with
k = 4. What is the probability that it will take more than 2 h before the next
arrival?
7.7.20 If an automobile starts from rest, what constant acceleration will enable it to
travel 500 m in 10 s?
7.7.21 If a car is traveling at a speed of 60 km/hr, what constant (negative) acceler-
ation will enable it to stop in 9 s?
7.7.22 Let a province of a country have a natural gas reserve of 100 billion m³. Let A(t) denote the total amount of natural gas consumed after t years; then A′(t) is its rate of consumption. If the rate of consumption is predicted to be 5 + 0.01t billion m³/year, in approximately how many years will the province’s natural gas reserve be depleted?
7.7.23 Let I be an alternating current of the form $I(t) = I_M \sin \omega t$ where t is the time, $I_M$ is the current amplitude, and ω/2π is the frequency. Assume that the current flows through a resistor of R ohms. The rate P at which heat is being produced in the resistor is given by $P = I^2 R$. Compute the average rate of production of heat over one complete cycle from t = 0 to t = 2π/ω.
Chapter 8
Functions of Several Variables

8.1 Introduction

In Sect. 1.1, we have introduced functions of one independent variable, expressed
by y = f(x). Functions of one variable represent various phenomena. A number of
illustrative examples have been given in Sect. 1.4. Let us recall that a real-valued
function f of one variable assigns to each point x of its domain D(f) on the line $\mathbb{R}$ a unique point f(x) on the line $\mathbb{R}$.
A real-valued function f of two variables differs from a real-valued function of one variable in the fact that its domain D(f) lies in the plane $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$; as in the case of one variable, it assigns to each point in its domain a unique point on the line $\mathbb{R}$. Analogously, the domain D(f) of a real-valued function f of three variables lies in the space $\mathbb{R}^3 = \mathbb{R} \times \mathbb{R} \times \mathbb{R}$; again f assigns to each point in its domain a unique point in $\mathbb{R}$. In the same manner, we can introduce functions of n variables whose domain lies in n-dimensional space $\mathbb{R}^n = \mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}$ (with n factors) and whose values are real numbers.
We will confine ourselves mainly to functions of two variables for the sake of
clarity and visualization. Many relevant concepts and results for 2 or 3 variables
carry over to the general case of n variables, where n is an arbitrary (finite) natural
number. In fact, the differential and integral calculus nowadays is well developed
also in infinite-dimensional spaces. This, however, is beyond the scope of this book.
Very often one encounters situations which cannot be modeled by functions of one
variable only, but require the use of functions of more than one variable. The cost of
producing a certain item may depend on the simultaneous combination of variables
such as labor and material. In economic theory, supply and demand of a commodity
may depend not only on its own price but also on the prices of related commodities
and some other factors such as income level or time of year. In physiology, we come
across functions of several variables when we study the relationship between body
surface area and the weight and height of the person. Other situations which demand
the study of functions of more than one variable are as follows: The amount of food grown depends on the amount of water and fertilizer used; the rate of a chemical reaction depends on the temperature and pressure of the environment in which it takes place; the quantity of groceries purchased by a person depends on the prices of different items and the income of the person concerned; the rate of fallout from a volcanic eruption depends on the distance from the volcano and the time since the eruption.

© Springer Nature Singapore Pte Ltd. 2019
M. Brokate et al., Calculus for Scientists and Engineers, Industrial and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_8

Examples of functions of several variables representing various phenomena will
be discussed in Sect. 8.2. In the present section, we introduce the concepts of graph,
level curves, and contours.
Definition 8.1 Suppose that D is a set of ordered pairs of real numbers (x, y). A
real-valued function f of two variables on D is a rule that assigns a unique real
number
z = f (x, y)

to each ordered pair (x, y) in D. The set D is the domain of f , and the set of values
(or z-values) taken on by f is its range. As before, we also write D( f ) for the domain
of f . The independent variables x and y are the components of the function’s input
variable (x, y), and the dependent variable z is the function’s output variable.
Note.
1. The notation “z = f (x, y)” is a convenient and short way to convey three pieces
of information. These are the name of the function (here “ f ”) and the letters used
to denote the independent variables (here “(x, y)”) and the dependent variable
(here “z”), respectively.
2. In the same manner as in Definition 8.1, we can define functions of three
independent variables w = f (x, y, z), functions of four independent variables
u = f (x, y, z, w), and so on.
3. Definition 8.1 is a special case of the general definition of a function (Defini-
tion 1.2).

Definition 8.2 The set of points (x, y) in the plane where a function f of two
variables has a fixed value f (x, y) = c, c being any constant, is called a level set or
contour of f . Since in many cases the level sets are curves, they are also called level
curves of f . A graph showing selected contours of a function is called a contour
diagram or a contour map. The set of all points (x, y, f(x, y)) in the space $\mathbb{R}^3$,
where (x, y) ranges over all points in the domain of f , is called the graph of f . The
graph of f , described by the equation z = f (x, y), is also called a surface.

For functions f of three variables, the level sets described by f (x, y, z) = c, where
c is any constant, are called level surfaces, since in many cases they form a surface
in space.

Example 8.1 1. Let $z = f(x, y) = \sqrt{y - x^2}$. The domain of this function is the set of points (x, y) such that $y \ge x^2$. For $y < x^2$ the z-values are not real. The range of f is [0, ∞).
2. For $z = f(x, y) = \frac{1}{xy}$, the domain is the set of all (x, y) such that $xy \ne 0$, and the range is (−∞, 0) ∪ (0, ∞).
3. Let z = f(x, y) = cos(xy). Its domain is the whole plane $\mathbb{R}^2$, and its range is [−1, 1].
4. Let f(x, y) = 2x + 4y. The domain of this function is again the plane $\mathbb{R}^2$, and its range is $\mathbb{R}$.
5. Let $f(x, y, z) = x^2 + y^2 + z^2$. The domain of f is $\mathbb{R}^3$, and the range of f is $\mathbb{R}_+$, the set of nonnegative numbers.

Fig. 8.1 A contour diagram

When f is a potential function as used in physics, that is, it gives the value of the
potential energy at each point of the space $\mathbb{R}^3$, the level surfaces f(x, y, z) = c are
called equipotential surfaces. When f represents a temperature distribution, the
level surfaces f (x, y, z) = c are called isothermal surfaces.

Example 8.2 Draw a contour diagram for the function R(x, y) = 350x + 200y.
Include the contours for R = 2000, 4000, 8000, 12000, and 16000.

Solution: The contour for R = 2000 is given by

350x + 200y = 2000 .

This is the equation of a straight line which meets the x-axis at $x = \frac{2000}{350} \approx 5.71$ and the y-axis at $y = \frac{2000}{200} = 10$. The contour for R = 4000 is given by

350x + 200y = 4000.

This is the equation of a line parallel to the one above which meets the x- and y-axes at $x = \frac{4000}{350} \approx 11.43$ and $y = \frac{4000}{200} = 20$, respectively. The contours for R = 8000, R = 12000, and R = 16000 are again parallel lines drawn similarly (Fig. 8.1).
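The axis intercepts of all five contour lines follow the same rule: setting y = 0 gives x = c/350 and setting x = 0 gives y = c/200. A brief sketch of this computation is shown below; the function name `contour_intercepts` and its parameter names are hypothetical.

```python
def contour_intercepts(level, coeff_x=350.0, coeff_y=200.0):
    """Axis intercepts of the contour coeff_x*x + coeff_y*y = level."""
    return level / coeff_x, level / coeff_y


for level in (2000, 4000, 8000, 12000, 16000):
    x0, y0 = contour_intercepts(level)
    print(level, round(x0, 2), round(y0, 2))
```

All of these lines have the same slope −350/200, which is why the contours of a linear function are parallel and, for equally spaced levels, equally spaced.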

Fig. 8.2 a Level curves and b Graph of $f(x, y) = x^2 + y^2$

Example 8.3 Sketch a contour map for the function $f(x, y) = x^2 + y^2$.

Solution: The level curves are defined by the equation $x^2 + y^2 = c$ for nonnegative numbers c. Taking c = 0, 1, 4, 9, 16 and 25 for example, we get for
c = 0: $x^2 + y^2 = 0$
c = 1: $x^2 + y^2 = 1$
c = 4: $x^2 + y^2 = 4 = 2^2$
c = 9: $x^2 + y^2 = 9 = 3^2$
c = 16: $x^2 + y^2 = 16 = 4^2$
c = 25: $x^2 + y^2 = 25 = 5^2$
The level curves are concentric circles with center at the origin and radius given by r = 0, 1, 2, 3, 4, and 5, respectively (see Fig. 8.2a). A sketch of the graph of $f(x, y) = x^2 + y^2$ is shown in Fig. 8.2b.
Computer Generated Graphs. As we know it is quite difficult to sketch an accurate
graph of a function of two variables manually. However, powerful help is at hand;
three-dimensional graphic programs for the computer (for example, MATLAB and
MATHEMATICA) make it possible to visualize even quite complicated surfaces.
These programs allow the user to view a surface from different perspectives. They
show level curves and sections in various planes. Examples of computer generated
graphs are shown below for the functions:
1. $f(x, y) = xy$, see Fig. 8.3.
2. $f(x, y) = \frac{-3y}{x^2 + y^2 + 1}$, see Fig. 8.4.
3. $f(x, y) = e^{-x^2} + e^{-4y^2}$, see Fig. 8.5.
4. $f(x, y) = \sin\sqrt{x^2 + y^2}$, see Fig. 8.6.
5. $f(x, y) = 2x^2 + 4y^2$, see Fig. 8.7.
6. $f(x, y) = xy\,e^{-(x^2 + y^2)/2}$, see Fig. 8.8.
7. $f(x, y) = \sin x \sin y$, see Fig. 8.9.

Fig. 8.3 Graph and level curves for $f(x, y) = xy$

Fig. 8.4 Graph and level curves for $f(x, y) = (-3y)/(x^2 + y^2 + 1)$

Fig. 8.5 Graph and level curves for $f(x, y) = e^{-x^2} + e^{-4y^2}$

Fig. 8.6 Graph and level curves for $f(x, y) = \sin\sqrt{x^2 + y^2}$

Fig. 8.7 Graph and level curves for $f(x, y) = 2x^2 + 4y^2$

Fig. 8.8 Graph and level curves for $f(x, y) = xy\,e^{-(x^2 + y^2)/2}$

Fig. 8.9 Graph and level curves for $f(x, y) = \sin x \sin y$

8.2 Situations Modeled by Functions of More Than One Variable

Example 8.4 A computer manufacturing company determines that profits are 600
IR, 500 IR, and 400 IR, respectively, for a single unit of the type A, type B, and type
C laptops it plans to produce. Let x, y, and z denote the number of type A, type B,
and type C laptops to be made, then the profit of the company is modeled by the
function of three variables

P(x, y, z) = 600x + 500y + 400z .

Example 8.5 An electronics company in India manufactures a TV set that may be bought fully assembled or as a kit. The demand equations that relate the unit prices p and q to the weekly demanded quantities x and y of the assembled and the kit versions of the TV set are given by the functions of two variables

\[ p(x, y) = 300 - \frac{1}{4}x - \frac{1}{8}y , \qquad q(x, y) = 240 - \frac{1}{8}x - \frac{3}{8}y . \]

The weekly total revenue function F is a function of two variables:

\[
\begin{aligned}
F(x, y) &= x\,p(x, y) + y\,q(x, y) \\
&= x\Big(300 - \frac{1}{4}x - \frac{1}{8}y\Big) + y\Big(240 - \frac{1}{8}x - \frac{3}{8}y\Big) \\
&= -\frac{1}{4}x^2 - \frac{3}{8}y^2 - \frac{1}{4}xy + 300x + 240y .
\end{aligned}
\]

To find the domain of the function F(x, y), let us observe that the quantities x, y, p, and q must be nonnegative. This requirement leads to the following system of inequalities:

\[
\begin{aligned}
300 - \frac{1}{4}x - \frac{1}{8}y &\ge 0 , \\
240 - \frac{1}{8}x - \frac{3}{8}y &\ge 0 , \\
x &\ge 0 , \\
y &\ge 0 .
\end{aligned}
\]

The domain of this function is sketched in Fig. 8.10.

Fig. 8.10 Domain of the function F
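The revenue function of Example 8.5 and its domain can be written down directly as functions of two variables. The following sketch uses illustrative names (`price_assembled`, `price_kit`, `revenue`, `in_domain`) that do not appear in the text; the sample point (100, 80) is an arbitrary choice.

```python
def price_assembled(x, y):
    """Unit price p(x, y) of the assembled TV set."""
    return 300 - x / 4 - y / 8


def price_kit(x, y):
    """Unit price q(x, y) of the kit version."""
    return 240 - x / 8 - 3 * y / 8


def revenue(x, y):
    """Weekly total revenue F(x, y) = x*p(x, y) + y*q(x, y)."""
    return x * price_assembled(x, y) + y * price_kit(x, y)


def in_domain(x, y):
    """True if quantities and both prices are nonnegative."""
    return x >= 0 and y >= 0 and price_assembled(x, y) >= 0 and price_kit(x, y) >= 0


x, y = 100, 80  # arbitrary sample quantities
expanded = -x ** 2 / 4 - 3 * y ** 2 / 8 - x * y / 4 + 300 * x + 240 * y
print(in_domain(x, y), revenue(x, y), expanded)
```

Comparing `revenue(x, y)` with the expanded polynomial at a sample point is a quick sanity check that the algebraic simplification of F was carried out correctly.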

Example 8.6 A car rental company charges 400 IR a day and 15 IR per kilometer
for its cars. Write a formula for the cost C of renting a car as a function of the number
d of days and the number m of kilometers driven. Find C(5, 300) and interpret it.

Solution: The total cost in Indian rupees of renting a car is 400 times the number
d of days, plus 15 times the number m of kilometers, so that it equals the number
400d + 15m. This gives the value of the function for given values of d and m, so

C(d, m) = 400d + 15m,

which is a function of two variables. We have

\[ C(5, 300) = 400 \cdot 5 + 15 \cdot 300 = 2000 + 4500 = 6500 . \]

This tells us that if we rent a car for 5 days and drive it 300 km, it will cost 6500 IR.
Example 8.7 Let A denote the area of a rectangle having sides of length x and y. Then
A is a function of x and y, namely, A(x, y) = xy.

8.3 Continuity of Functions of Several Variables

In Chap. 2, we have discussed the notions of limit and continuity for functions of
one variable. In this section, we extend those notions first to functions of two, and
afterward to functions of three and more variables.
We start with the definition of a limit of a function f of two variables. We want to give a precise meaning to the statement

As (x, y) approaches a point (a, b) in the plane, f(x, y) approaches the value L.

We rephrase this statement as follows:

We can force f(x, y) to deviate from L by less than a given ε > 0 if we restrict (x, y) to be close enough to (a, b).

In the plane, the closeness of the point (x, y) to the point (a, b) is measured by the distance (see Appendix B)

\[ d((x, y), (a, b)) = \sqrt{(x - a)^2 + (y - b)^2} . \tag{8.1} \]

For a given δ > 0, the set of points whose distance from (a, b) is smaller than δ,

\[ B_\delta(a, b) = \{ (x, y) : d((x, y), (a, b)) < \delta \} , \tag{8.2} \]

is called the δ-neighborhood of the point (a, b). Thus we arrive at

We can force f(x, y) to deviate from L by less than a given ε > 0 if we restrict (x, y) to a δ-neighborhood of (a, b) with a sufficiently small δ > 0.

The formal definition of a limit is based upon these considerations.


Definition 8.3 (Limit of a function of two variables) Let f be a function defined in some neighborhood of (a, b) except possibly at the point (a, b) itself. We say that f(x, y) tends to the limit L as (x, y) approaches (a, b) and write

\[ \lim_{(x,y)\to(a,b)} f(x, y) = L , \tag{8.3} \]

if for every ε > 0 there is a number δ > 0 such that

\[ |f(x, y) - L| < \varepsilon \quad \text{whenever} \quad 0 < d((x, y), (a, b)) < \delta , \tag{8.4} \]

where $d((x, y), (a, b)) = \sqrt{(x - a)^2 + (y - b)^2}$.

Remark 8.1 1. Definition 8.3 enforces that f (x, y) tends to the same value L no
matter from which direction (x, y) approaches (a, b), since only the distance to
(a, b) plays a role.
2. As in the case of a single variable, statement (8.3) is equivalent to the statement

\[ \lim_{(x,y)\to(a,b)} |f(x, y) - L| = 0 . \tag{8.5} \]

3. The precise meaning of the requirement “f is defined in some neighborhood of (a, b)” is that there exists an η > 0 such that f is defined in the η-neighborhood of (a, b). In this case, the point (a, b) is called an interior point of the domain of f.

Sets D with the property that all elements of D are interior points of D are called
open.
Definition 8.4 A subset D of the plane is called open if for every point (a, b) in D there exists an η > 0 such that $B_\eta(a, b) \subset D$, that is, every point whose distance to (a, b) is smaller than η belongs to D.
The definition of continuity is based on the definition of the limit in the same manner
as in the case of a function of one variable. Namely, we require that the limit at a
point has to be equal to the function value at that point.

Definition 8.5 In the situation of Definition 8.3, the function f is said to be continuous at the point (a, b), if

\[ \lim_{(x,y)\to(a,b)} f(x, y) = f(a, b) . \tag{8.6} \]

The function f is said to be continuous in some subset D of the plane, if it is continuous at every point (a, b) in D.

Example 8.8 1. The constant function defined by f (x, y) = c for some number c is
continuous at every point (a, b) of the plane. Indeed, condition (8.4) is satisfied
for L = f (a, b) = c no matter how we choose δ, because | f (x, y) − c| = 0
holds for all points (x, y).
2. The function f(x, y) = x is continuous at every point (a, b) of the plane. To check this, we observe first that

\[ |x - a| \le \sqrt{(x - a)^2 + (y - b)^2} = d((x, y), (a, b)) \]

holds for every point (x, y). Condition (8.4) is therefore satisfied for L = f(a, b) if, for any given ε > 0, we choose δ = ε. Analogous considerations show that f(x, y) = y, too, yields a continuous function.
8.3 Continuity of Functions of Several Variables 249

3. For the function 


1 , for (x, y) = (0, 0) ,
f (x, y) =
0 , for all other (x, y) ,

we have
lim f (x, y) = 0 = 1 = f (0, 0).
(x,y)→(0,0)

Hence, f is not continuous at (a, b) = (0, 0). At all other points (a, b)  = (0, 0)
it is continuous.

Fortunately, in order to check whether a function is continuous one usually does
not have to verify condition (8.4) explicitly. As in the case of functions of a single
variable, limits can be interchanged with the algebraic operations as well as the
composition of functions, so the theorems regarding the sum, product, quotient, and
composition of Sect. 2.2 hold for functions of two variables as well. Therefore, on
the basis of parts 1 and 2 of Example 8.8, we see that functions like

\[ f(x, y) = 3y^2 \sin(xy) - \frac{e^x}{3 + y^2} + |xy| \]

are continuous.
The Sandwich theorem can be stated as follows.
Theorem 8.1 (Sandwich Theorem, Functions of Two Variables) Let f, g, h be
functions of two variables such that g(x, y) ≤ f(x, y) ≤ h(x, y) for all (x, y) ≠
(a, b) in some neighborhood of (a, b) and let

\[ \lim_{(x,y)\to(a,b)} g(x, y) = \lim_{(x,y)\to(a,b)} h(x, y) = L. \]

Then we have

\[ \lim_{(x,y)\to(a,b)} f(x, y) = L \tag{8.7} \]

as well.
When f is defined by a quotient whose numerator and denominator both tend to
zero, one has to take a closer look.
Example 8.9 1. Show that the function defined by $f(x, y) = \dfrac{x^2 (x + y)}{x^2 + y^2}$ for
(x, y) ≠ (0, 0) is continuous at (0, 0) if we define f(0, 0) = 0.
2. Show that the function defined by $g(x, y) = \dfrac{x^2 - y^2 + 2x^3}{x^2 + y^2}$ for (x, y) ≠ (0, 0)
is not continuous at (0, 0) no matter how we define g(0, 0).
Solution:

1. Since $x^2 \le x^2 + y^2$, the estimate

\[ 0 \le |f(x, y)| \le \frac{x^2}{x^2 + y^2}\,|x + y| \le |x| + |y| \]

holds whenever x and y are not both zero. Since |x| + |y| tends to 0 as (x, y)
tends to (0, 0), the Sandwich theorem implies that

\[ \lim_{(x,y)\to(0,0)} |f(x, y)| = 0. \]

Comparing (8.3) and (8.5) with L = 0, we see that f(x, y) tends to 0 as (x, y)
tends to (0, 0), so f is continuous at this point provided f(0, 0) = 0.
2. Suppose g(0, 0) = L. If g is continuous at (0, 0), then g(x, y) must approach the
value L as (x, y) approaches (0, 0), and this must be true in particular when (x, y)
approaches along an arbitrary line through (0, 0). But along the x-axis we have
g(x, 0) = 1 + 2x for x ≠ 0, which tends to 1 as x → 0. On the other hand, along
the y-axis we have g(0, y) = −1 for y ≠ 0. The former result requires L = 1
and the latter L = −1. Since these requirements are incompatible, it follows that g
cannot be continuous at (0, 0).
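
The two behaviours in Example 8.9 can also be observed numerically. The following Python sketch is only an illustration, not a proof: the function names `f` and `g` mirror the example, while the helper `max_abs_on_circle`, the sampling radii, and the number of sample points are assumptions of this sketch.

```python
import math

def f(x, y):
    """f from Example 8.9(1); only valid for (x, y) != (0, 0)."""
    return x**2 * (x + y) / (x**2 + y**2)

def g(x, y):
    """g from Example 8.9(2); only valid for (x, y) != (0, 0)."""
    return (x**2 - y**2 + 2 * x**3) / (x**2 + y**2)

def max_abs_on_circle(func, radius, samples=360):
    """Largest |func| over equally spaced sample points on a circle around (0, 0)."""
    angles = (2 * math.pi * k / samples for k in range(samples))
    return max(abs(func(radius * math.cos(a), radius * math.sin(a))) for a in angles)

for radius in (1e-1, 1e-3, 1e-5):
    # |f| <= |x| + |y| <= 2 * radius, so these maxima shrink with the radius
    print(f"radius {radius:g}: max |f| = {max_abs_on_circle(f, radius):.3e}")

for t in (1e-1, 1e-3, 1e-5):
    # g tends to 1 along the x-axis but equals -1 along the y-axis
    print(f"t = {t:g}: g(t, 0) = {g(t, 0.0):+.6f}, g(0, t) = {g(0.0, t):+.6f}")
```

The sampled maxima of |f| shrink with the radius, in line with the Sandwich estimate, whereas the two columns for g stay near 1 and −1.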

Remark 8.2 When we have a function f of two variables (x, y), we can consider it
as a function of one of its variables, keeping the other variable fixed. For example, if
we fix y = b, we obtain a function g(x) = f (x, b) of x only, or if we fix x = a, we
obtain a function h(y) = f (a, y) of y only. Sometimes, these functions g and h are
also called partial functions of f . We may check from the definitions that if f is
continuous at (a, b), then g is continuous at a and h is continuous at b. We summarize
this in saying that a continuous function of two variables is continuous in each
of its variables. However, the converse is false as it is possible for a function of two
variables to be continuous in each variable separately and yet fail to be continuous
as a whole. Let us consider the function f defined by

\[ f(x, y) = \begin{cases} \dfrac{2xy}{x^2 + y^2}, & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \]

Since f(x, 0) = 0 for all x and f(0, y) = 0 for all y, we have
$\lim_{x \to 0} f(x, 0) = 0 = f(0, 0)$ and $\lim_{y \to 0} f(0, y) = 0 = f(0, 0)$.
Thus f is continuous in x as well as in y at
the point (0, 0). However, as a function of two variables f is not continuous at (0, 0).
To see this, we approach (0, 0) along the line x = y which consists of all points of
the form (t, t). At such points we have, for t ≠ 0,

\[ f(t, t) = \frac{2t^2}{t^2 + t^2} = 1. \]

Hence, condition (8.4) cannot be satisfied for L = f(0, 0) = 0, so f is not continuous
at (0, 0).
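
To make the direction dependence in Remark 8.2 concrete, the sketch below evaluates f along several lines y = mx through the origin. On such a line f is constant and equal to 2m/(1 + m²) for t ≠ 0, which follows by substituting y = mt, x = t. The helper name `along_line` and the chosen slopes are assumptions of this sketch.

```python
def f(x, y):
    """The function from Remark 8.2, with f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return 2 * x * y / (x**2 + y**2)

def along_line(slope, t):
    """Value of f at the point (t, slope * t) on the line y = slope * x."""
    return f(t, slope * t)

for slope in (0.0, 0.5, 1.0, -1.0):
    values = [along_line(slope, t) for t in (1e-1, 1e-4, 1e-8)]
    constant = 2 * slope / (1 + slope**2)  # value of f on this line for t != 0
    print(f"slope {slope:+.1f}: values {values}, 2m/(1+m^2) = {constant:+.3f}")
```

Each line produces its own constant value, so no single number L can serve as the limit at (0, 0).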

For the remainder of this section, we discuss the case of functions of three or more
variables. We can extend Definitions 8.3 and 8.5 if we extend the notion of distance
accordingly. In the general case of n variables, consider points $x = (x_1, x_2, \dots, x_n)$
and $a = (a_1, a_2, \dots, a_n)$ in $\mathbb{R}^n$ and define their distance as

\[ d(x, a) = \left( \sum_{i=1}^{n} (x_i - a_i)^2 \right)^{1/2}. \tag{8.8} \]

We say that f(x) tends to a number L and write

\[ \lim_{x \to a} f(x) = L, \tag{8.9} \]

if for every ε > 0 there exists a δ > 0 such that

\[ |f(x) - L| < \varepsilon \quad \text{whenever} \quad 0 < d(x, a) < \delta, \]

where d(x, a) is given by (8.8). For now, (8.9) is just a short (and very convenient)
form of the statement

\[ \lim_{(x_1, x_2, x_3, \dots, x_n) \to (a_1, a_2, a_3, \dots, a_n)} f(x_1, x_2, x_3, \dots, x_n) = L. \]

When we study vector calculus in Chap. 9, we will consider x, a, and so
on as vectors, and notations like (8.9) will be a natural way to write things down.
The theorems regarding the sum, product, quotient, and composition of continuous
functions from Sect. 2.2 also extend to functions of three and more variables, so, for
example, the function

\[ f(x, y, z) = x^2 - \cos(x y e^z) \]

is continuous.

8.4 Partial Derivatives with Applications

In Chap. 3, we studied the derivative of a function of a single variable. Here, we
study the concept of a partial derivative of a function of two or more variables, which
means that we form the derivative with respect to one variable while keeping the other
variable (resp. all other variables) fixed. We present and explain its formal definition, give
a geometrical interpretation as well as many examples, including several situations
in applications whose modeling and analysis involve partial derivatives. Moreover,
several forms of the chain rule are discussed. Toward the end, we show how to compute
derivatives in parameter-dependent integrals.
Definition 8.6 (Partial Derivative) Let f be a function of two variables (x, y). The
partial derivative of f with respect to x at a point $(x_0, y_0)$ is denoted by
$\frac{\partial f}{\partial x}(x_0, y_0)$ or $f_x(x_0, y_0)$, and defined as

\[ \frac{\partial f}{\partial x}(x_0, y_0) = f_x(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}, \tag{8.10} \]

provided that f is defined in some neighborhood of $(x_0, y_0)$ and that the limit on the
right-hand side of (8.10) exists.
Analogously, we define

\[ \frac{\partial f}{\partial y}(x_0, y_0) = f_y(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}. \]

Remark 8.3 Let us consider f as a function of x only, that is, we fix $y = y_0$ and vary
only x. For the resulting partial function $g(x) = f(x, y_0)$ we have, by the definitions,

\[ g'(x_0) = \lim_{h \to 0} \frac{g(x_0 + h) - g(x_0)}{h} = \frac{\partial f}{\partial x}(x_0, y_0). \]

In this manner, the partial derivative can be interpreted as an “ordinary” derivative
of a function of a single variable. Thus, the partial derivative ∂f/∂x gives the rate of
change of f with respect to x when y is held fixed at the value $y_0$, or in other words,
the rate of change of f in the direction of the unit vector i = (1, 0) at $(x_0, y_0)$.
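
The limit in Definition 8.6 can be watched at work by evaluating the difference quotient for shrinking h. The sketch below does this for the polynomial of Example 8.10, part 1; the helper names `quotient_x` and `quotient_y`, the evaluation point (1, 2), and the step sizes are assumptions of this sketch.

```python
def f(x, y):
    """Sample polynomial (the function of Example 8.10, part 1)."""
    return x**2 + 3 * x * y + x + y - 2

def quotient_x(func, x0, y0, h):
    """Difference quotient from (8.10): y is frozen at y0, only x varies."""
    return (func(x0 + h, y0) - func(x0, y0)) / h

def quotient_y(func, x0, y0, h):
    """Analogous quotient with x frozen at x0 and only y varying."""
    return (func(x0, y0 + h) - func(x0, y0)) / h

x0, y0 = 1.0, 2.0
for h in (1e-1, 1e-3, 1e-5):
    print(f"h = {h:g}: x-quotient {quotient_x(f, x0, y0, h):.6f}, "
          f"y-quotient {quotient_y(f, x0, y0, h):.6f}")
# Exact values: f_x(1, 2) = 2*1 + 3*2 + 1 = 9 and f_y(1, 2) = 3*1 + 1 = 4.
```

For this f the x-quotient equals 9 + h up to rounding, so the approach to the limit is visible directly.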
Remark 8.4 1. Various notations are used for partial derivatives. As in $f_x(x_0, y_0)$,
they must convey three pieces of information: the name of the function (“f”),
the direction of the partial derivative (“x”), and the point at which the partial
derivative is taken (“$(x_0, y_0)$”). Besides the expressions given in Definition 8.6,
the following ones are also in use:

\[ \frac{\partial}{\partial x} f(x_0, y_0), \qquad \partial_x f(x_0, y_0), \qquad \left. \frac{\partial f}{\partial x} \right|_{(x_0, y_0)}. \]

Their mathematical meaning is identical. Besides being read as “partial derivative
of f with respect to x at $(x_0, y_0)$,” they may also be read as “f sub x at $(x_0, y_0)$” or
as “dee f by dee x at $(x_0, y_0)$.” In order to familiarize the reader with this situation,
in the remainder of this book we will not stick to one notation in particular, but
will use several of them.
2. In Definition 8.6, we have denoted by $(x_0, y_0)$ the point at which the partial
derivative is taken, in order to distinguish it clearly from the names x, y of the variables.
However, as in the case of functions of a single variable, one usually denotes the
point as (x, y) and writes

\[ \frac{\partial f}{\partial x}(x, y), \quad \text{and so on.} \]

3. As in the case of ordinary derivatives, we may regard partial derivatives as
functions of the point at which they are taken. Since a function is denoted just by its
name (like “f” or “sin”), they are denoted accordingly as

\[ \frac{\partial f}{\partial x}, \quad f_x, \quad \text{or} \quad \frac{\partial}{\partial x} f, \quad \partial_x f. \]

Any point (x, y) then becomes the argument, and $f_x(x, y)$ the value of this function.
4. Alternatively, we may emphasize just the variables. Consider z = f(x, y). Instead
of “f” we may just use the letter “z” and write

\[ \frac{\partial z}{\partial x}, \quad z_x, \quad \text{and so on} \]

with or without the arguments (x, y). Again, the mathematical meaning of those
expressions is the same as above.
5. The interpretation of partial derivatives as functions naturally leads us to the
notion of second partial derivatives as well as partial derivatives of order higher
than two. If $f_x$ possesses a partial derivative with respect to x, it is denoted by

\[ f_{xx}, \quad \frac{\partial^2 f}{\partial x^2}, \quad \text{or} \quad \partial_{xx} f. \]

If $f_x$ possesses a partial derivative with respect to y, it is written as

\[ f_{xy}, \quad \frac{\partial^2 f}{\partial y\,\partial x}, \quad \text{or} \quad \partial_y \partial_x f. \]

Analogously, $f_{yx}$ and $f_{yy}$ are defined. Third-order partial derivatives are

\[ f_{xxx} = \frac{\partial^3 f}{\partial x^3}, \quad f_{xxy} = \frac{\partial^3 f}{\partial y\,\partial x^2}, \quad \text{and so on.} \]

6. One can prove that $f_{yx} = f_{xy}$, provided the functions $f$, $f_x$, $f_y$, $f_{xy}$, and $f_{yx}$ are
continuous. Analogous results hold for higher partial derivatives. Thus, partial
derivatives can be interchanged under those conditions.
Next, we discuss the geometric interpretation of partial derivatives. Since the partial
derivative $f_x$ for $y = y_0$ is equal to the derivative $g'$ of the partial function
$g(x) = f(x, y_0)$ (see Remark 8.3), the value $f_x(x_0, y_0)$ gives the slope of the curve
$z = g(x) = f(x, y_0)$ at $x = x_0$, that is, the slope of the tangent to that curve at $x = x_0$.
Let us now consider the full three-dimensional picture of the surface z = f(x, y).
Let $C_1$ be its intersection with the plane $y = y_0$ and $C_2$ be its intersection with the
plane $x = x_0$. Then $f_x(x_0, y_0)$ can be viewed as the slope of the tangent to the curve
$C_1$ at the point $(x_0, y_0)$, and $f_y(x_0, y_0)$ can be viewed as the slope of the tangent to
the curve $C_2$ at the point $(x_0, y_0)$. The number $f_x(x_0, y_0)$ is called the slope of the
surface in the x-direction at $(x_0, y_0)$, and $f_y(x_0, y_0)$ is called the slope of the surface
in the y-direction at $(x_0, y_0)$. Moreover, we can interpret $f_x(x, y_0)$, for varying x,
as the rate of change of z with respect to x along the curve $C_1$, and $f_x(x_0, y_0)$ as
its rate of change with respect to x at the point $(x_0, y_0)$. Analogously, $f_y(x_0, y)$, for
varying y, gives the rate of change of z with respect to y along the curve $C_2$,
and $f_y(x_0, y_0)$ the rate of change with respect to y at the point $(x_0, y_0)$.
We now give examples of how to calculate partial derivatives. Note that, when
forming the partial derivative with respect to one variable, the other variables are
treated as constants (see Remark 8.3).
Example 8.10 Compute the partial derivatives $f_x$, $f_y$, $f_{yx}$, $f_{xy}$, $f_{xx}$, and $f_{yy}$ of the
following functions:
1. $f(x, y) = x^2 + 3xy + x + y - 2$.
2. $f(x, y) = y \cos(xy)$.
3. $f(x, y) = \dfrac{2y}{y + \cos x}$.
4. $f(x, y) = x^2 - xy^2 + y^3$.
5. $f(x, y) = \ln(x^2 + y^2)$.
6. $f(x, y) = (x^2 - xy + y^2)^5$.
7. $f(x, y) = e^{xy} + \ln(x + y)$.
Solution:
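1. We have

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= 2x + 3y + 1, \qquad \frac{\partial f}{\partial y}(x, y) = 3x + 1, \\
\frac{\partial^2 f}{\partial y\,\partial x}(x, y) &= 3 = \frac{\partial^2 f}{\partial x\,\partial y}(x, y), \qquad \frac{\partial^2 f}{\partial x^2}(x, y) = 2, \qquad \frac{\partial^2 f}{\partial y^2}(x, y) = 0.
\end{aligned} \]

2. We have

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= -y^2 \sin(xy), \qquad \frac{\partial f}{\partial y}(x, y) = \cos(xy) - xy \sin(xy), \\
\frac{\partial^2 f}{\partial y\,\partial x}(x, y) &= -xy^2 \cos(xy) - 2y \sin(xy), \\
\frac{\partial^2 f}{\partial x\,\partial y}(x, y) &= -y \sin(xy) - y \sin(xy) - xy^2 \cos(xy).
\end{aligned} \]

We see that indeed $\dfrac{\partial^2 f}{\partial x\,\partial y} = \dfrac{\partial^2 f}{\partial y\,\partial x}$. Furthermore,

\[ \frac{\partial^2 f}{\partial x^2}(x, y) = -y^3 \cos(xy), \qquad \frac{\partial^2 f}{\partial y^2}(x, y) = -x \sin(xy) - x \sin(xy) - x^2 y \cos(xy). \]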

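3. For f(x, y) = 2y/(y + cos x) we get

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= \frac{-2y(-\sin x)}{(y + \cos x)^2} = \frac{2y \sin x}{(y + \cos x)^2}, \\
\frac{\partial f}{\partial y}(x, y) &= \frac{(y + \cos x) \cdot 2 - 2y}{(y + \cos x)^2} = \frac{2 \cos x}{(y + \cos x)^2}, \\
\frac{\partial^2 f}{\partial y\,\partial x}(x, y) &= \frac{(y + \cos x)^2 \cdot 2 \sin x - 4y \sin x\,(y + \cos x)}{(y + \cos x)^4} \\
&= \frac{(y + \cos x) \cdot (2 \sin x) \cdot (y + \cos x - 2y)}{(y + \cos x)^4} = \frac{2 \sin x\,(\cos x - y)}{(y + \cos x)^3}, \\
\frac{\partial^2 f}{\partial x\,\partial y}(x, y) &= \frac{(y + \cos x)^2 (-2 \sin x) - 4 \cos x\,(y + \cos x)(-\sin x)}{(y + \cos x)^4} \\
&= \frac{-2 \sin x\,(y + \cos x - 2 \cos x)}{(y + \cos x)^3} = \frac{2 \sin x\,(\cos x - y)}{(y + \cos x)^3}
\end{aligned} \]

(again we see that $\dfrac{\partial^2 f}{\partial x\,\partial y} = \dfrac{\partial^2 f}{\partial y\,\partial x}$),

\[ \begin{aligned}
\frac{\partial^2 f}{\partial x^2}(x, y) &= \frac{(y + \cos x)^2 \cdot 2y \cos x - 4y \sin x\,(y + \cos x)(-\sin x)}{(y + \cos x)^4} \\
&= \frac{2(y + \cos x)\,[(y + \cos x)\, y \cos x + 2y \sin^2 x]}{(y + \cos x)^4} \\
&= \frac{2y\,(y \cos x + \cos^2 x + 2 \sin^2 x)}{(y + \cos x)^3} = \frac{2y\,(y \cos x + 1 + \sin^2 x)}{(y + \cos x)^3}, \\
\frac{\partial^2 f}{\partial y^2}(x, y) &= -\frac{4 \cos x}{(y + \cos x)^3}.
\end{aligned} \]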

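4. For $f(x, y) = x^2 - xy^2 + y^3$ we get

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= 2x - y^2, \qquad \frac{\partial f}{\partial y}(x, y) = -2xy + 3y^2, \\
\frac{\partial^2 f}{\partial y\,\partial x}(x, y) &= -2y = \frac{\partial^2 f}{\partial x\,\partial y}(x, y), \\
\frac{\partial^2 f}{\partial x^2}(x, y) &= 2, \qquad \frac{\partial^2 f}{\partial y^2}(x, y) = -2x + 6y.
\end{aligned} \]

5. For $f(x, y) = \ln(x^2 + y^2)$ we get

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= \frac{2x}{x^2 + y^2}, \qquad \frac{\partial f}{\partial y}(x, y) = \frac{2y}{x^2 + y^2}, \\
\frac{\partial^2 f}{\partial y\,\partial x}(x, y) &= \frac{-2x\,(2y)}{(x^2 + y^2)^2} = \frac{-4xy}{(x^2 + y^2)^2} = \frac{\partial^2 f}{\partial x\,\partial y}(x, y), \\
\frac{\partial^2 f}{\partial x^2}(x, y) &= \frac{(x^2 + y^2) \cdot 2 - 2x \cdot 2x}{(x^2 + y^2)^2} = \frac{2(y^2 - x^2)}{(x^2 + y^2)^2}, \\
\frac{\partial^2 f}{\partial y^2}(x, y) &= \frac{(x^2 + y^2) \cdot 2 - 2y \cdot 2y}{(x^2 + y^2)^2} = \frac{2(x^2 - y^2)}{(x^2 + y^2)^2}.
\end{aligned} \]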

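6. For $f(x, y) = (x^2 - xy + y^2)^5$ we get

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= 5(x^2 - xy + y^2)^4 (2x - y), \\
\frac{\partial f}{\partial y}(x, y) &= 5(x^2 - xy + y^2)^4 (-x + 2y), \\
\frac{\partial^2 f}{\partial y\,\partial x}(x, y) &= 20(x^2 - xy + y^2)^3 (-x + 2y)(2x - y) + 5(x^2 - xy + y^2)^4 (-1), \\
\frac{\partial^2 f}{\partial x\,\partial y}(x, y) &= 20(x^2 - xy + y^2)^3 (2x - y)(-x + 2y) + 5(x^2 - xy + y^2)^4 (-1),
\end{aligned} \]

so $\dfrac{\partial^2 f}{\partial x\,\partial y} = \dfrac{\partial^2 f}{\partial y\,\partial x}$, and

\[ \begin{aligned}
\frac{\partial^2 f}{\partial x^2}(x, y) &= 20(x^2 - xy + y^2)^3 (2x - y)^2 + 5(x^2 - xy + y^2)^4 \cdot 2, \\
\frac{\partial^2 f}{\partial y^2}(x, y) &= 20(x^2 - xy + y^2)^3 (-x + 2y)^2 + 5(x^2 - xy + y^2)^4 \cdot 2.
\end{aligned} \]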

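7. For $f(x, y) = e^{xy} + \ln(x + y)$ we get

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= y e^{xy} + \frac{1}{x + y}, \qquad \frac{\partial f}{\partial y}(x, y) = x e^{xy} + \frac{1}{x + y}, \\
\frac{\partial^2 f}{\partial x\,\partial y}(x, y) &= e^{xy} + xy e^{xy} - \frac{1}{(x + y)^2} = \frac{\partial^2 f}{\partial y\,\partial x}(x, y), \\
\frac{\partial^2 f}{\partial x^2}(x, y) &= y^2 e^{xy} - \frac{1}{(x + y)^2}, \qquad \frac{\partial^2 f}{\partial y^2}(x, y) = x^2 e^{xy} - \frac{1}{(x + y)^2}.
\end{aligned} \]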
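
Hand computations like those in Example 8.10 are easy to get wrong, so a numerical cross-check can be useful. The sketch below differentiates the first partial derivatives of parts 2 and 3 by central differences and compares the results with closed forms. The names `central`, `fx_2`, `fx_3`, and so on, the test point (0.7, 1.3), and the step size are assumptions of this sketch. The closed form used for $f_{xx}$ in part 3 is the quotient-rule result $2y(y\cos x + \cos^2 x + 2\sin^2 x)/(y + \cos x)^3$.

```python
import math

def central(func, t, h=1e-5):
    """Central difference approximation of the derivative of a one-variable function."""
    return (func(t + h) - func(t - h)) / (2 * h)

# Part 2: f(x, y) = y cos(xy)
def fx_2(x, y):
    return -y**2 * math.sin(x * y)

def fy_2(x, y):
    return math.cos(x * y) - x * y * math.sin(x * y)

def fxy_2(x, y):
    return -2 * y * math.sin(x * y) - x * y**2 * math.cos(x * y)

# Part 3: f(x, y) = 2y / (y + cos x)
def fx_3(x, y):
    return 2 * y * math.sin(x) / (y + math.cos(x))**2

def fy_3(x, y):
    return 2 * math.cos(x) / (y + math.cos(x))**2

def fxy_3(x, y):
    return 2 * math.sin(x) * (math.cos(x) - y) / (y + math.cos(x))**3

def fxx_3(x, y):
    c, s = math.cos(x), math.sin(x)
    return 2 * y * (y * c + c**2 + 2 * s**2) / (y + c)**3

x, y = 0.7, 1.3  # arbitrary test point with y + cos x != 0
checks = {
    "part 2, d/dy of f_x": (central(lambda v: fx_2(x, v), y), fxy_2(x, y)),
    "part 2, d/dx of f_y": (central(lambda u: fy_2(u, y), x), fxy_2(x, y)),
    "part 3, d/dy of f_x": (central(lambda v: fx_3(x, v), y), fxy_3(x, y)),
    "part 3, d/dx of f_y": (central(lambda u: fy_3(u, y), x), fxy_3(x, y)),
    "part 3, d/dx of f_x": (central(lambda u: fx_3(u, y), x), fxx_3(x, y)),
}
for name, (numeric, closed) in checks.items():
    print(f"{name}: numeric {numeric:.8f}, closed form {closed:.8f}")
```

Agreement at one point does not prove a formula, but a mismatch reliably signals an algebra slip.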

Example 8.11 1. Compute $f_x(1, 1)$ and $f_y(-1, 1)$ for $f(x, y) = e^{xy} + \ln(x^2 + y^2)$.
2. Compute $f_x(1, 1, 1)$ and $f_z(-1, 1, -1)$ for $f(x, y, z) = x \sin(y + 3z)$.
3. Compute $f_{xx}(-1, -1)$ and $f_{xy}(-1, -1)$ for $f(x, y) = 3x^2 - 5x \cos(\pi y)$.

Solution:
1. We get

\[ f_x(x, y) = y e^{xy} + \frac{2x}{x^2 + y^2}, \qquad f_y(x, y) = x e^{xy} + \frac{2y}{x^2 + y^2}. \]

Therefore,

\[ f_x(1, 1) = e + \frac{2}{2} = e + 1, \qquad f_y(-1, 1) = -e^{-1} + \frac{2}{2} = 1 - \frac{1}{e}. \]

2. We get $f_x(x, y, z) = \sin(y + 3z)$, $f_z(x, y, z) = 3x \cos(y + 3z)$, and therefore

\[ f_x(1, 1, 1) = \sin 4, \qquad f_z(-1, 1, -1) = -3 \cos(1 - 3) = -3 \cos(-2) = -3 \cos 2. \]

3. For $f(x, y) = 3x^2 - 5x \cos(\pi y)$ we get

\[ f_x(x, y) = 6x - 5 \cos(\pi y), \qquad f_{xx}(x, y) = 6, \qquad f_{xy}(x, y) = 5\pi \sin(\pi y), \]

and therefore $f_{xx}(-1, -1) = 6$, $f_{xy}(-1, -1) = 5\pi \sin(-\pi) = 0$.

Remark 8.5 It may happen that the partial derivatives f x and f y exist at some point
without f being continuous at that point. For example, let
\[ f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2}, & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \]

One may check that f is not continuous at (0, 0), but f x (0, 0) and f y (0, 0) both exist.
The reason for this phenomenon to appear is that the partial derivatives of f are only
concerned with the behavior of f along horizontal and vertical directions, whereas
continuity of f at some point refers to the behavior of f in a whole neighborhood
of that point.
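
The claim in Remark 8.5 can be checked directly from Definition 8.6: on both coordinate axes f vanishes, so the difference quotients at the origin are exactly zero, while f equals 1/2 everywhere on the line y = x away from the origin. The helper names below are assumptions of this sketch.

```python
def f(x, y):
    """The function from Remark 8.5, with f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x**2 + y**2)

def quotient_x_at_origin(h):
    """Difference quotient (f(h, 0) - f(0, 0)) / h for f_x(0, 0)."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h

def quotient_y_at_origin(h):
    """Difference quotient (f(0, h) - f(0, 0)) / h for f_y(0, 0)."""
    return (f(0.0, h) - f(0.0, 0.0)) / h

for h in (1e-1, 1e-4, 1e-8):
    print(f"h = {h:g}: x-quotient {quotient_x_at_origin(h)}, "
          f"y-quotient {quotient_y_at_origin(h)}, f(h, h) = {f(h, h)}")
```

Both quotients are 0 for every h, so $f_x(0, 0) = f_y(0, 0) = 0$, yet f(h, h) stays at 1/2 and does not approach f(0, 0) = 0.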

On the other hand, the following result holds.


Definition 8.7 (Continuous Differentiability) Let z = f(x, y) be defined in some
neighborhood $B = B_\eta(x_0, y_0)$ of a point $(x_0, y_0)$. We say that f is continuously
differentiable in B if $f_x$ and $f_y$ exist for all points in B and are continuous in B.
More generally, let D be an open subset of the plane. We say that f is continuously
differentiable in D if $f_x$ and $f_y$ exist for all points in D and are continuous in D.

Theorem 8.2 Let z = f(x, y) be continuously differentiable in some subset D of
the plane. Then f is continuous in D.

Note that all functions considered in Example 8.9 are continuously differentiable in
their domain.

Applications of partial derivatives. We present several examples.


Example 8.12 The volume of the frustum of a cone (see Fig. 8.11) is given by the
function

\[ V(R, r, h) = \frac{1}{3} \pi h (R^2 + Rr + r^2). \]

Find the rate of change of the volume with respect to each of these variables if the
other variables are held constant. Determine the values of these rates of change
when R = 8, r = 4, and h = 6.

Fig. 8.11 Frustum of a cone

Solution: The partial derivatives of V are

\[ \begin{aligned}
V_R(R, r, h) &= \frac{1}{3} \pi h (2R + r), \\
V_r(R, r, h) &= \frac{1}{3} \pi h (R + 2r), \\
V_h(R, r, h) &= \frac{1}{3} \pi (R^2 + Rr + r^2).
\end{aligned} \]

For R = 8, r = 4 and h = 6, the rate of change of V
with respect to R is $V_R(8, 4, 6) = \frac{1}{3} \pi \cdot 6 \cdot (16 + 4) = 40\pi$,
with respect to r is $V_r(8, 4, 6) = 32\pi$,
with respect to h is $V_h(8, 4, 6) = \frac{1}{3} \pi (64 + 32 + 16) = \frac{112}{3} \pi$.
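
The three rates in Example 8.12 can be cross-checked by perturbing one argument of V at a time. The generic helper `partial` below (a central difference in one coordinate) and the step size are assumptions of this sketch, not part of the example.

```python
import math

def volume(R, r, h):
    """Volume of the frustum of a cone, as in Example 8.12."""
    return math.pi * h * (R**2 + R * r + r**2) / 3

def partial(func, point, index, step=1e-6):
    """Central difference in the coordinate `index`; the other coordinates stay fixed."""
    up, down = list(point), list(point)
    up[index] += step
    down[index] -= step
    return (func(*up) - func(*down)) / (2 * step)

point = (8.0, 4.0, 6.0)  # (R, r, h)
exact = {"R": 40 * math.pi, "r": 32 * math.pi, "h": 112 * math.pi / 3}
for index, name in enumerate(("R", "r", "h")):
    numeric = partial(volume, point, index)
    print(f"dV/d{name}: numeric {numeric:.6f}, exact {exact[name]:.6f}")
```

Because V is at most quadratic in each variable, the central difference is exact here up to rounding.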

Example 8.13 The parallel connection of three resistors (see Fig. 8.12) acts like a
single resistor with resistance R whose value is given in terms of the resistances $R_1$,
$R_2$, and $R_3$ by the formula

\[ \frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2} + \frac{1}{R_3}. \]

Find the rate of change of R with respect to $R_1$, when $R_1 = 30$, $R_2 = 45$, and $R_3 = 90$ Ω.

Fig. 8.12 Resistors connected in parallel

Solution: The situation is modeled by the functions

\[ R(R_1) = \frac{1}{g(R_1)}, \qquad g(R_1) = f(R_1, R_2, R_3) = \frac{1}{R_1} + \frac{1}{R_2} + \frac{1}{R_3}. \]

The rate of change of R with respect to $R_1$ is given by

\[ R'(R_1) = -\frac{g'(R_1)}{(g(R_1))^2}, \qquad g'(R_1) = \frac{\partial f}{\partial R_1}(R_1, R_2, R_3) = -\frac{1}{R_1^2}, \]

therefore

\[ R'(R_1) = \frac{(R(R_1))^2}{R_1^2}. \]

For $R_1 = 30$, $R_2 = 45$, $R_3 = 90$ Ω we have

\[ \frac{1}{R} = f(30, 45, 90) = \frac{1}{30} + \frac{1}{45} + \frac{1}{90} = \frac{6}{90} = \frac{1}{15}, \]

so R = 15 Ω and

\[ R'(R_1) = \frac{15^2}{30^2} = \left( \frac{1}{2} \right)^2 = \frac{1}{4}. \]

Another way to model this situation would be to set

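\[ R(R_1, R_2, R_3) = \frac{1}{f(R_1, R_2, R_3)}, \qquad f(R_1, R_2, R_3) = \frac{1}{R_1} + \frac{1}{R_2} + \frac{1}{R_3}, \]

and to apply the chain rule in the form

\[ \frac{\partial R}{\partial R_1}(R_1, R_2, R_3) = -\frac{1}{f(R_1, R_2, R_3)^2}\, \frac{\partial f}{\partial R_1}(R_1, R_2, R_3), \]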

which leads to the same computation as above.
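
As a numerical cross-check of Example 8.13, the sketch below evaluates the equivalent resistance, the closed-form rate $(R/R_1)^2$, and a central-difference estimate of the same rate. The function names and the step size are assumptions of this sketch.

```python
def parallel_resistance(R1, R2, R3):
    """Equivalent resistance of three resistors connected in parallel."""
    return 1.0 / (1.0 / R1 + 1.0 / R2 + 1.0 / R3)

def rate_wrt_R1(R1, R2, R3):
    """Closed form (R / R1)^2 for the partial derivative of R with respect to R1."""
    R = parallel_resistance(R1, R2, R3)
    return (R / R1) ** 2

step = 1e-6
numeric_rate = (parallel_resistance(30 + step, 45, 90)
                - parallel_resistance(30 - step, 45, 90)) / (2 * step)

print("R            =", parallel_resistance(30, 45, 90))
print("closed form  =", rate_wrt_R1(30, 45, 90))
print("finite diff. =", numeric_rate)
```

All three printed values should agree with the hand computation: R = 15 Ω and a rate of 1/4.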


Example 8.14 Suppose that the weight w in pounds is a function w = f(c, t) of the
number c of calories consumed daily and the number t of minutes you exercise daily.
Using these units for w, c and t, interpret the statements

\[ \frac{\partial w}{\partial c}(2000, 15) = 0.02, \qquad \frac{\partial w}{\partial t}(2000, 15) = -0.025. \]

Solution: The units of ∂w/∂c are pounds per calorie. Therefore, the statement
$\frac{\partial w}{\partial c}(2000, 15) = 0.02$ means that if you are consuming 2000 calories daily and
exercising 15 min daily, you will weigh approximately 0.02 pounds more if you consume
one extra calorie daily. If this rate does not change much when c increases, that is,
if w viewed as a function of c is approximately linear, then the increase in weight
would be about 2 pounds for each extra 100 calories per day. The units of ∂w/∂t are pounds
per minute. The statement $\frac{\partial w}{\partial t}(2000, 15) = -0.025$ means that, based on this calorie
consumption and number of minutes of exercise, you will weigh 0.025 pounds less
for each extra minute you exercise daily, or 1 pound less for each extra 40 min per
day. So if you eat 100 calories extra each day and exercise for 80 min more each day,
your weight would remain roughly the same. (Again, this argument presumes that
the relationships are approximately linear for the considered values of c and t.)
Example 8.15 Assume that the concentration C of bacteria in the blood (in millions
of bacteria per milliliter) after the injection of an antibiotic is modeled by a function
$C = f(x, t) = t e^{-xt}$ of the dose x (in grams) injected and the time t (in hours) since
the injection. Evaluate the following quantities and explain what each one means in
practical terms.
(a) $f_x(1, 2)$, (b) $f_t(1, 2)$.
Solution: (a) We have $f_x(x, t) = -t^2 e^{-xt}$, so $f_x(1, 2) = -4e^{-2} \approx -0.54$. The graph
of f(x, 2) as a function of x gives the concentration of bacteria two hours after the
injection, as a function of the dose. The partial derivative $f_x(1, 2)$ gives the rate of
change of bacteria concentration with respect to the dose, at a dose of 1 g.
It is the slope of the graph of f(x, 2) (see Fig. 8.13) at the point x = 1; it is negative
because a larger dose reduces the bacteria population. The value $f_x(1, 2) \approx -0.54$
means a rate of decrease in bacteria concentration of 0.54 million/ml per gram of
additional antibiotic injected, near the nominal dose of 1 g.
(b) We have

\[ f_t(x, t) = e^{-xt} - xt e^{-xt}, \qquad f_t(1, 2) = e^{-2} - 2e^{-2} \approx -0.14. \]

The graph of f(1, t) as a function of t (Fig. 8.14) gives the concentration of bacteria at
time t for the dose 1 g of antibiotic. The derivative $f_t(1, 2)$ is the slope of the graph
at the point t = 2. It is negative because 2 h after the injection, the concentration
of bacteria is decreasing with time, due to the action of the antibiotic. The partial
derivative $f_t(1, 2)$ gives the rate at which the bacteria concentration is changing with
respect to time, namely, a rate of decrease in bacteria concentration of 0.14 million/ml
per hour, near the nominal time of 2 h.
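
The two rates in Example 8.15 are easy to reproduce. The sketch below evaluates the closed-form partial derivatives and compares them with central differences of the model; the function names and the step size are assumptions of this sketch.

```python
import math

def concentration(x, t):
    """Model C = t * exp(-x t): dose x in grams, time t in hours."""
    return t * math.exp(-x * t)

def f_x(x, t):
    """Partial derivative with respect to the dose x."""
    return -t**2 * math.exp(-x * t)

def f_t(x, t):
    """Partial derivative with respect to the time t."""
    return (1 - x * t) * math.exp(-x * t)

step = 1e-6
numeric_x = (concentration(1 + step, 2) - concentration(1 - step, 2)) / (2 * step)
numeric_t = (concentration(1, 2 + step) - concentration(1, 2 - step)) / (2 * step)

print(f"f_x(1, 2) = {f_x(1, 2):.4f} (finite difference {numeric_x:.4f})")
print(f"f_t(1, 2) = {f_t(1, 2):.4f} (finite difference {numeric_t:.4f})")
```

Rounded to two decimals, the values are −0.54 and −0.14, matching the example.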
Fig. 8.13 Bacteria concentration after 2 h as a function of the quantity of antibiotic injected

Fig. 8.14 Bacteria concentration as a function of time if 1 unit of antibiotic is injected

The Cobb–Douglas Production Model. In 1928, Cobb and Douglas used
a simple formula proposed by Wicksell to model the production of the entire US
economy in the first quarter of the twentieth century. Let P be the total yearly production
between 1899 and 1922, K the total investment over the same period, and L the total
labor force. They found that P was well approximated by the function

\[ P = 1.01\, L^{0.75} K^{0.25}. \]

It modeled the US economy quite accurately for the period on which it was based as
well as for some time afterward. This model has found widespread use in the more
general form of a so-called Cobb–Douglas production function

\[ P = f(N, V) = c N^a V^b, \]

where P is the total quantity produced, c, a, and b are positive constants with 0 < a <
1 and 0 < b < 1 (often one assumes that b = 1 − a), N is the number of workers, and
V is the cost of capital equipment or investment. The partial derivative $f_N$ is called
the marginal productivity of labor. It measures the rate of change of production
with respect to a change in the expenditure for labor when capital expenditure is kept
constant. The partial derivative $f_V$, called the marginal productivity of capital,
measures the rate of change of production with respect to a change in the amount
expended on capital, keeping labor expenditure constant.
Example 8.16 Consider a factory manufacturing blades. Let N be the number of
workers, V the value of the equipment (in units of Rs. 50,000), and P the production
measured in thousands of blades per day. Let the production function of this factory
be given by

\[ P = f(N, V) = 2 N^{0.6} V^{0.4}. \]

(a) If the factory has a labor force of 100 workers and 200 units of equipment, what
is the production output of the factory?
(b) Find $f_N(100, 200)$ and $f_V(100, 200)$. Interpret your answers in terms of production.
Solution:
(a) We have N = 100 and V = 200, so $P = 2 \cdot 100^{0.6} \cdot 200^{0.4} \approx 263.9$ thousand
blades per day.
(b) We get

\[ f_N(N, V) = 2 \cdot 0.6\, N^{-0.4} V^{0.4}, \qquad f_N(100, 200) = 1.2 \cdot 100^{-0.4} \cdot 200^{0.4} \approx 1.583 \]

thousand blades per worker. This means that if we have 200 units of equipment and
increase the number of workers by 1 from 100 to 101, the production output will go up
by approximately 1.58 units or 1580 blades per day. Furthermore, we get

\[ f_V(N, V) = 2 \cdot 0.4\, N^{0.6} V^{-0.6}, \qquad f_V(100, 200) = 0.8 \cdot 100^{0.6} \cdot 200^{-0.6} \approx 0.53 \]

thousand blades per unit of equipment. This means that if we have 100 workers
and increase the value of equipment by 1 unit (Rs. 50,000) from 200 units to
201 units, the production will go up by about 0.53 units, or 530 blades per day.
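
The interpretation of marginal productivity as "the effect of one more worker" or "one more unit of equipment" is a linear approximation. The sketch below compares the partial derivatives from Example 8.16 with the actual change in output; the function names are assumptions of this sketch.

```python
def production(N, V):
    """Production function P = 2 N^0.6 V^0.4 (thousands of blades per day)."""
    return 2 * N**0.6 * V**0.4

def marginal_labor(N, V):
    """Partial derivative of P with respect to N."""
    return 1.2 * N**-0.4 * V**0.4

def marginal_capital(N, V):
    """Partial derivative of P with respect to V."""
    return 0.8 * N**0.6 * V**-0.6

N, V = 100, 200
print(f"P(100, 200)        = {production(N, V):.1f}")
print(f"f_N(100, 200)      = {marginal_labor(N, V):.3f}")
print(f"one extra worker   = {production(N + 1, V) - production(N, V):.3f}")
print(f"f_V(100, 200)      = {marginal_capital(N, V):.3f}")
print(f"one extra unit     = {production(N, V + 1) - production(N, V):.3f}")
```

The actual increments are slightly smaller than the marginal productivities because P is concave in each variable, but they agree to about two decimal places.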

Example 8.17 The production in a certain country (in the early years following World
War II) is described by the function

\[ P = f(x, y) = 30 x^{2/3} y^{1/3}, \]

where x denotes the number of units of labor used and y the number of units of
capital used.
1. Compute $f_x$ and $f_y$.
2. What are the marginal productivity of labor and the marginal productivity of capital
when the amounts expended on labor and capital are 125 units and 27 units,
respectively?
3. Assuming that one unit of labor is somehow interchangeable with one unit
of capital, should the government have encouraged capital investment rather than
increasing expenditure on labor to increase the country’s productivity?

Solution:

1. We have

\[ \begin{aligned}
f_x(x, y) &= 30 \cdot \frac{2}{3}\, x^{-1/3} y^{1/3} = 20 \left( \frac{y}{x} \right)^{1/3}, \\
f_y(x, y) &= 30 \cdot \frac{1}{3}\, x^{2/3} y^{-2/3} = 10 \left( \frac{x}{y} \right)^{2/3}.
\end{aligned} \]

2. The required marginal productivity of labor is given by

\[ f_x(125, 27) = 20 \left( \frac{27}{125} \right)^{1/3} = 20 \cdot \frac{3}{5} = 12, \]

that is, 12 units per unit increase in labor expenditure, keeping capital investment
constant at 27. The required marginal productivity of capital is given by

\[ f_y(125, 27) = 10 \left( \frac{125}{27} \right)^{2/3} = 10 \cdot \frac{25}{9} = \frac{250}{9} = 27\tfrac{7}{9}, \]

that is, $27\tfrac{7}{9}$ units per unit increase in capital expenditure, keeping labor constant
at 125 units.
3. From (2) we see that a unit increase in capital expenditure would have resulted
in a much faster increase in productivity than a unit increase in labor expenditure
would have. Therefore, the government should have encouraged increased
spending on capital rather than on labor during the early years of reconstruction.

The Chain Rule. Let us recall the chain rule for a function of a single variable.
For w = f(x) and x = g(t), we obtain the derivative of the composite function
w(t) = (f ∘ g)(t) = f(g(t)) of the functions f and g as

\[ w'(t) = (f \circ g)'(t) = f'(g(t))\, g'(t). \tag{8.11} \]

Symbolically, this may also be written as

\[ \frac{dw}{dt} = \frac{df}{dx}\, \frac{dx}{dt}. \tag{8.12} \]

Let now w be a function of two variables w = f(x, y), and let x = g(t) and y = h(t)
be functions of yet another variable t. We consider the composite function w defined
by

\[ w(t) = f(g(t), h(t)). \]

Theorem 8.3 (Chain Rule for Functions of Two Variables) Let w = f(x, y), x =
g(t) and y = h(t) be continuously differentiable functions. Then w is a continuously
differentiable function of t and

\[ w'(t) = \frac{\partial f}{\partial x}(g(t), h(t))\, g'(t) + \frac{\partial f}{\partial y}(g(t), h(t))\, h'(t). \tag{8.13} \]

Symbolically, this formula may be written as

\[ \frac{dw}{dt} = \frac{\partial f}{\partial x}\, \frac{dx}{dt} + \frac{\partial f}{\partial y}\, \frac{dy}{dt} \tag{8.14} \]

or

\[ \frac{dw}{dt} = \frac{\partial w}{\partial x}\, \frac{dx}{dt} + \frac{\partial w}{\partial y}\, \frac{dy}{dt}. \tag{8.15} \]

Remark 8.6 In the context of the theorem above, w is the dependent variable and
t the independent variable. The variables x and y are called intermediate variables;
they play the role of dependent variables (with respect to t) as well as of independent
variables (with respect to w). The tree diagram (Fig. 8.15) provides a convenient
way to remember the chain rule (8.15). Start at w and go down both routes to t,
multiplying derivatives along the way. Then add the products.

Fig. 8.15 Illustration of the chain rule

Note.
1. Formulas (8.14) and (8.15) do not specify at which arguments the various
derivatives are evaluated. A more detailed form of (8.15) is

\[ \frac{dw}{dt}(t_0) = \frac{\partial f}{\partial x}(x_0, y_0)\, \frac{dx}{dt}(t_0) + \frac{\partial f}{\partial y}(x_0, y_0)\, \frac{dy}{dt}(t_0). \]

Here, $x_0$ is the value of the function x = g(t) at $t = t_0$, and $y_0$ is the value of the
function y = h(t) at $t = t_0$.
2. The chain rule for a function of three independent variables can be stated as
follows: If w = f(x, y, z) is differentiable and x, y, and z are differentiable
functions of t, then w is a differentiable function of t and

\[ \frac{dw}{dt} = \frac{\partial w}{\partial x}\, \frac{dx}{dt} + \frac{\partial w}{\partial y}\, \frac{dy}{dt} + \frac{\partial w}{\partial z}\, \frac{dz}{dt}. \tag{8.16} \]

3. The chain rule for two independent variables and three intermediate variables
reads as follows. Suppose that w = f(x, y, z), x = g(r, s), y = h(r, s), and z =
k(r, s). If all four functions are differentiable, then w has partial derivatives with
respect to r and s, given by the formulas

\[ \begin{aligned}
\frac{\partial w}{\partial r} &= \frac{\partial w}{\partial x}\, \frac{\partial x}{\partial r} + \frac{\partial w}{\partial y}\, \frac{\partial y}{\partial r} + \frac{\partial w}{\partial z}\, \frac{\partial z}{\partial r}, \\
\frac{\partial w}{\partial s} &= \frac{\partial w}{\partial x}\, \frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\, \frac{\partial y}{\partial s} + \frac{\partial w}{\partial z}\, \frac{\partial z}{\partial s}.
\end{aligned} \tag{8.17} \]

Example 8.18 Let $w = f(x, y) = e^{x(x-y)}$, $x = g(t) = 2t \cos t$, $y = h(t) = 2t \sin t$.
Evaluate $w' = \dfrac{dw}{dt}$ at t = π.
Solution: We have

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= e^{x(x-y)} (2x - y), & \frac{dx}{dt}(t) &= 2 \cos t - 2t \sin t, \\
\frac{\partial f}{\partial y}(x, y) &= -x e^{x(x-y)}, & \frac{dy}{dt}(t) &= 2 \sin t + 2t \cos t.
\end{aligned} \]

For t = π we have x = g(π) = −2π, y = h(π) = 0. The chain rule yields

\[ \begin{aligned}
w'(\pi) = \frac{dw}{dt}(\pi) &= \frac{\partial f}{\partial x}(-2\pi, 0)\, \frac{dx}{dt}(\pi) + \frac{\partial f}{\partial y}(-2\pi, 0)\, \frac{dy}{dt}(\pi) \\
&= (-4\pi)\, e^{4\pi^2} (-2) - (-2\pi)\, e^{4\pi^2} (-2\pi) \\
&= 4\pi (2 - \pi)\, e^{4\pi^2}.
\end{aligned} \]
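
The result of Example 8.18 can be checked in two independent ways: by coding the right-hand side of (8.13) and by differentiating the composite function w(t) = f(g(t), h(t)) numerically. Since $e^{4\pi^2}$ is of order $10^{17}$, the comparison below uses relative errors. The names `w_prime_chain_rule` and `relative_error` and the step size are assumptions of this sketch.

```python
import math

def g(t):
    return 2 * t * math.cos(t)

def h(t):
    return 2 * t * math.sin(t)

def f(x, y):
    return math.exp(x * (x - y))

def w(t):
    """Composite function w(t) = f(g(t), h(t))."""
    return f(g(t), h(t))

def w_prime_chain_rule(t):
    """Right-hand side of (8.13), using the partial derivatives from the solution."""
    x, y = g(t), h(t)
    f_x = math.exp(x * (x - y)) * (2 * x - y)
    f_y = -x * math.exp(x * (x - y))
    dx_dt = 2 * math.cos(t) - 2 * t * math.sin(t)
    dy_dt = 2 * math.sin(t) + 2 * t * math.cos(t)
    return f_x * dx_dt + f_y * dy_dt

def relative_error(value, reference):
    return abs(value - reference) / abs(reference)

t0, step = math.pi, 1e-6
closed_form = 4 * math.pi * (2 - math.pi) * math.exp(4 * math.pi**2)
numeric = (w(t0 + step) - w(t0 - step)) / (2 * step)

print("chain rule vs closed form:", relative_error(w_prime_chain_rule(t0), closed_form))
print("finite diff. vs closed form:", relative_error(numeric, closed_form))
```

Both relative errors should be tiny, which supports the value $4\pi(2 - \pi)e^{4\pi^2}$.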

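Example 8.19 Let w = f(x, y, z) = x + yz, x = g(t) = cos t, y = h(t) = sin t,
z = k(t) = t. Find $w' = \dfrac{dw}{dt}$ at t = 0.
Solution: We have

\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y, z) &= 1, & \frac{\partial f}{\partial y}(x, y, z) &= z, & \frac{\partial f}{\partial z}(x, y, z) &= y, \\
\frac{dx}{dt}(t) &= -\sin t, & \frac{dy}{dt}(t) &= \cos t, & \frac{dz}{dt}(t) &= 1.
\end{aligned} \]

For t = 0 we have x = cos(0) = 1, y = sin(0) = 0, z = 0. Thus, the partial
derivatives of f have to be evaluated at (1, 0, 0). Using (8.16), that is,

\[ \frac{dw}{dt} = \frac{\partial w}{\partial x}\, \frac{dx}{dt} + \frac{\partial w}{\partial y}\, \frac{dy}{dt} + \frac{\partial w}{\partial z}\, \frac{dz}{dt}, \]

we obtain

\[ w'(0) = \frac{dw}{dt}(0) = 1 \cdot 0 + 0 \cdot 1 + 0 \cdot 1 = 0. \]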

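Example 8.20 Let $w = f(x, y, z) = x + 2y + z^2$, $x = g(r, s) = \dfrac{r}{s}$, $y = h(r, s) = r^2 + \ln s$,
$z = k(r, s) = 2r$. Find $\dfrac{\partial w}{\partial r}$ and $\dfrac{\partial w}{\partial s}$ as functions of r and s.
Solution: We use (8.17) and get, in abbreviated notation,

\[ \frac{\partial w}{\partial r} = \frac{\partial w}{\partial x}\, \frac{\partial x}{\partial r} + \frac{\partial w}{\partial y}\, \frac{\partial y}{\partial r} + \frac{\partial w}{\partial z}\, \frac{\partial z}{\partial r} = 1 \cdot \frac{1}{s} + 2 \cdot 2r + 2z \cdot 2. \]

Since z = 2r we obtain

\[ \frac{\partial w}{\partial r}(r, s) = \frac{1}{s} + 4r + 4r \cdot 2 = \frac{1}{s} + 12r. \]

In the same manner,

\[ \frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\, \frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\, \frac{\partial y}{\partial s} + \frac{\partial w}{\partial z}\, \frac{\partial z}{\partial s} = 1 \cdot \left( -\frac{r}{s^2} \right) + 2 \cdot \frac{1}{s} + 2z \cdot 0, \]

and therefore

\[ \frac{\partial w}{\partial s}(r, s) = \frac{2}{s} - \frac{r}{s^2}. \]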
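
To check Example 8.20, one can substitute the intermediate variables into f and differentiate the resulting function of (r, s) numerically. The sample point (r, s) = (1.5, 2), the step size, and the function names below are assumptions of this sketch.

```python
import math

def w(r, s):
    """w as a function of (r, s) after substituting x = r/s, y = r^2 + ln s, z = 2r."""
    x = r / s
    y = r**2 + math.log(s)
    z = 2 * r
    return x + 2 * y + z**2

def dw_dr(r, s):
    """Chain-rule result for the partial derivative with respect to r."""
    return 1 / s + 12 * r

def dw_ds(r, s):
    """Chain-rule result for the partial derivative with respect to s."""
    return 2 / s - r / s**2

r0, s0, step = 1.5, 2.0, 1e-6
numeric_r = (w(r0 + step, s0) - w(r0 - step, s0)) / (2 * step)
numeric_s = (w(r0, s0 + step) - w(r0, s0 - step)) / (2 * step)

print(f"dw/dr: chain rule {dw_dr(r0, s0):.6f}, finite difference {numeric_r:.6f}")
print(f"dw/ds: chain rule {dw_ds(r0, s0):.6f}, finite difference {numeric_s:.6f}")
```

At the sample point the chain-rule formulas give 18.5 and 0.625, and the finite differences should agree to several decimals.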
Another application of the chain rule is concerned with homogeneous functions. A
function f of n variables is called homogeneous of degree k, if

\[ f(t x_1, t x_2, \dots, t x_n) = t^k f(x_1, x_2, \dots, x_n) \tag{8.18} \]

holds for all values of $t, x_1, \dots, x_n$.

Theorem 8.4 (Euler) Let f be a function of n variables which is homogeneous of
degree k. Then

\[ x_1 \frac{\partial f}{\partial x_1} + x_2 \frac{\partial f}{\partial x_2} + \dots + x_n \frac{\partial f}{\partial x_n} = k f \tag{8.19} \]

holds for all values of $x_1, \dots, x_n$. (All functions in (8.19) are evaluated at $(x_1, \dots, x_n)$.)

Proof: Let $x_1, \dots, x_n$ be arbitrary real numbers which will be kept fixed in the
following. We introduce intermediate variables $X_1 = g_1(t) = t x_1, \dots, X_n = g_n(t) = t x_n$,
and define

\[ w(t) = f(g_1(t), \dots, g_n(t)) = f(t x_1, \dots, t x_n). \]

Since f is homogeneous of degree k, we have

\[ w(t) = t^k f(x_1, \dots, x_n). \]

Differentiating both sides as functions of t yields

\[ w'(t) = k t^{k-1} f(x_1, \dots, x_n). \tag{8.20} \]

We evaluate $w'(t)$ with the chain rule and obtain

\[ \begin{aligned}
w'(t) &= \frac{\partial f}{\partial x_1}(g_1(t), \dots, g_n(t))\, g_1'(t) + \dots + \frac{\partial f}{\partial x_n}(g_1(t), \dots, g_n(t))\, g_n'(t) \\
&= \frac{\partial f}{\partial x_1}(t x_1, \dots, t x_n)\, x_1 + \dots + \frac{\partial f}{\partial x_n}(t x_1, \dots, t x_n)\, x_n.
\end{aligned} \tag{8.21} \]

Putting together (8.20) and (8.21) and setting t = 1 yields the assertion.
Example 8.21 Verify Euler’s theorem for

\[ f(x, y, z) = 2x^3 + yz^2 - xyz. \]

Solution: The given function f is homogeneous of degree 3. By direct calculation

\[ \begin{aligned}
x f_x + y f_y + z f_z &= x(6x^2 - yz) + y(z^2 - xz) + z(2yz - xy) \\
&= 6x^3 - 3xyz + 3yz^2 = 3f.
\end{aligned} \]

Hence (8.19) is verified.
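
Both the scaling property (8.18) and Euler's identity (8.19) for the function of Example 8.21 can be spot-checked numerically. The sample point, the scaling factor, and the helper names `gradient` and `euler_lhs` are assumptions of this sketch.

```python
def f(x, y, z):
    return 2 * x**3 + y * z**2 - x * y * z

def gradient(x, y, z):
    """Partial derivatives (f_x, f_y, f_z) as computed in Example 8.21."""
    return (6 * x**2 - y * z, z**2 - x * z, 2 * y * z - x * y)

def euler_lhs(x, y, z):
    """Left-hand side x f_x + y f_y + z f_z of Euler's identity."""
    fx, fy, fz = gradient(x, y, z)
    return x * fx + y * fy + z * fz

point = (1.2, -0.7, 2.0)  # arbitrary sample point
t = 1.7                   # arbitrary scaling factor
scaled = tuple(t * c for c in point)

print("f(t p)    =", f(*scaled))
print("t^3 f(p)  =", t**3 * f(*point))
print("Euler LHS =", euler_lhs(*point))
print("3 f(p)    =", 3 * f(*point))
```

The first two lines agree (homogeneity of degree 3) and so do the last two (Euler's identity), up to rounding.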


Example 8.22 For $w = f(x, y)$, the two-dimensional Laplace equation

\[
\frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} = 0 \tag{8.22}
\]

describes, for example, steady-state temperature distributions in the plane. Its three-dimensional version for $w = f(x, y, z)$ is given by

\[
\frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} + \frac{\partial^2 w}{\partial z^2} = 0.
\]

(See Fig. 8.16.) Show that the following functions are solutions of the two-dimensional Laplace equation:
1. $w(x, y) = e^{-2y} \cos 2x$,
2. $w(x, y) = \ln \sqrt{x^2 + y^2}$.
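Solution:
1. For $w(x, y) = e^{-2y} \cos 2x$ we get

\[
\begin{aligned}
\frac{\partial w}{\partial x}(x, y) &= -2e^{-2y} \sin 2x, & \frac{\partial w}{\partial y}(x, y) &= -2e^{-2y} \cos 2x, \\
\frac{\partial^2 w}{\partial x^2}(x, y) &= -4e^{-2y} \cos 2x, & \frac{\partial^2 w}{\partial y^2}(x, y) &= 4e^{-2y} \cos 2x.
\end{aligned}
\]

We see that (8.22) is satisfied.

2. We get

\[
\begin{aligned}
\frac{\partial w}{\partial x}(x, y) &= \frac{1}{\sqrt{x^2 + y^2}} \cdot \frac{2x}{2\sqrt{x^2 + y^2}} = \frac{x}{x^2 + y^2}, \\
\frac{\partial^2 w}{\partial x^2}(x, y) &= \frac{x^2 + y^2 - x \cdot 2x}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2}, \\
\frac{\partial w}{\partial y}(x, y) &= \frac{1}{\sqrt{x^2 + y^2}} \cdot \frac{2y}{2\sqrt{x^2 + y^2}} = \frac{y}{x^2 + y^2}, \\
\frac{\partial^2 w}{\partial y^2}(x, y) &= \frac{x^2 + y^2 - y \cdot 2y}{(x^2 + y^2)^2} = \frac{x^2 - y^2}{(x^2 + y^2)^2}.
\end{aligned}
\]

We see that (8.22) is satisfied.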
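
As a plausibility check of Example 8.22, one may approximate the left-hand side of (8.22) by finite differences. This is only a sketch under stated assumptions: the five-point difference formula, the step size `h`, the sample points, and the comparison function `w3` are illustrative choices that do not appear in the text.

```python
import math

def laplacian(w, x, y, h=1e-3):
    # Five-point finite-difference approximation of w_xx + w_yy.
    return (w(x + h, y) + w(x - h, y) + w(x, y + h) + w(x, y - h)
            - 4 * w(x, y)) / h**2

def w1(x, y):
    # First function of Example 8.22.
    return math.exp(-2 * y) * math.cos(2 * x)

def w2(x, y):
    # Second function of Example 8.22, defined away from the origin.
    return math.log(math.sqrt(x**2 + y**2))

def w3(x, y):
    # Comparison function that does not solve (8.22); its Laplacian equals 4.
    return x**2 + y**2

print(laplacian(w1, 0.3, 0.2), laplacian(w2, 1.0, 0.5), laplacian(w3, 1.0, 0.5))
```

The first two values should be close to zero, while the third should be close to 4, which shows that the check is able to distinguish solutions from non-solutions.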



Fig. 8.16 Steady-state temperature distributions in planes and solids satisfy Laplace equations. The plane (a) may be treated as a thin slice of the solid (b) perpendicular to the z-axis.

Example 8.23 For functions $w = f(t, x)$, the one-dimensional wave equation

\[
\frac{\partial^2 w}{\partial t^2} = c^2 \frac{\partial^2 w}{\partial x^2} \tag{8.23}
\]

is a basic model for wave propagation along a line (forward and backward). For example, $w$ stands for air pressure (when describing sound waves) or for displacement (when describing elastic waves, as in a vibrating string). The independent variables are the time $t$ and the distance $x$. The constant $c$ corresponds to the speed of the wave, and it varies with the medium; for example, the speed of sound is greater in water than in air.
Show that the following functions are solutions of the wave equation:
1. $w = \sin(x + ct)$,
2. $w = \cos(2x + 2ct)$,
3. $w = f(u)$, where $f$ is a twice differentiable function of $u$ and $u(t, x) = a(x + ct)$ with some constant $a$,
4. $w = 5\cos(3x + 3ct) + e^{x + ct}$.

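Solution:
1. For $w = \sin(x + ct)$ we get

\[
\begin{aligned}
\frac{\partial w}{\partial x}(t, x) &= \cos(x + ct), & \frac{\partial^2 w}{\partial x^2}(t, x) &= -\sin(x + ct), \\
\frac{\partial w}{\partial t}(t, x) &= c\cos(x + ct), & \frac{\partial^2 w}{\partial t^2}(t, x) &= -c^2 \sin(x + ct).
\end{aligned}
\]

We see that (8.23) is satisfied.
2. We get

\[
\begin{aligned}
\frac{\partial w}{\partial x}(t, x) &= -2\sin(2x + 2ct), & \frac{\partial^2 w}{\partial x^2}(t, x) &= -4\cos(2x + 2ct), \\
\frac{\partial w}{\partial t}(t, x) &= -2c\sin(2x + 2ct), & \frac{\partial^2 w}{\partial t^2}(t, x) &= -4c^2 \cos(2x + 2ct).
\end{aligned}
\]

We see that (8.23) is satisfied.
3. We have $w(t, x) = f(u(t, x)) = f(ax + act)$. Therefore

\[
\begin{aligned}
\frac{\partial w}{\partial t}(t, x) &= f'(ax + act) \cdot ac, & \frac{\partial^2 w}{\partial t^2}(t, x) &= f''(ax + act) \cdot a^2 c^2, \\
\frac{\partial w}{\partial x}(t, x) &= f'(ax + act) \cdot a, & \frac{\partial^2 w}{\partial x^2}(t, x) &= f''(ax + act) \cdot a^2.
\end{aligned}
\]

We see that (8.23) is satisfied.
4. For $w(t, x) = 5\cos(3x + 3ct) + e^{x + ct}$ we get

\[
\begin{aligned}
\frac{\partial w}{\partial t}(t, x) &= -15c\sin(3x + 3ct) + c\,e^{x + ct}, \\
\frac{\partial^2 w}{\partial t^2}(t, x) &= -45c^2 \cos(3x + 3ct) + c^2 e^{x + ct}, \\
\frac{\partial w}{\partial x}(t, x) &= -15\sin(3x + 3ct) + e^{x + ct}, \\
\frac{\partial^2 w}{\partial x^2}(t, x) &= -45\cos(3x + 3ct) + e^{x + ct}.
\end{aligned}
\]

We see that (8.23) is satisfied.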
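
Part 3 of Example 8.23 states that every twice differentiable profile $f$ composed with $u = a(x + ct)$ solves the wave equation. The sketch below probes this numerically for one particular profile. The choice $f = \tanh$, the constants `A` and `C`, and the helper `second_diff` are assumptions introduced here for illustration only.

```python
import math

C = 1.5   # wave speed c, arbitrary illustrative value
A = 2.0   # the constant a in u = a(x + ct), arbitrary illustrative value

def w(t, x):
    # w = f(u) with u = a(x + ct); f = tanh is a hypothetical profile.
    return math.tanh(A * (x + C * t))

def second_diff(g, s, h=1e-3):
    # Central second difference approximating g''(s).
    return (g(s + h) - 2 * g(s) + g(s - h)) / h**2

def wave_residual(t, x):
    # Approximation of w_tt - c^2 w_xx, which vanishes for solutions of (8.23).
    w_tt = second_diff(lambda s: w(s, x), t)
    w_xx = second_diff(lambda s: w(t, s), x)
    return w_tt - C**2 * w_xx

w_tt_sample = second_diff(lambda s: w(s, -0.1), 0.2)
residual = wave_residual(0.2, -0.1)
print(w_tt_sample, residual)
```

The second time derivative itself is far from zero at the sample point, while the residual should be small compared with it.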

Example 8.24 The heat equation (or diffusion equation)

\[
\frac{\partial u}{\partial t} = \kappa \frac{\partial^2 u}{\partial x^2}, \tag{8.24}
\]

where $\kappa = c^2 > 0$ is a constant, describes nonstationary (that is, time-dependent) diffusion processes, in particular, heat conduction. In the latter case, $u = u(t, x)$ stands for the temperature, and the number $\kappa$ represents the heat conduction coefficient.
Show that the following functions are solutions of the heat equation.
1. $u = e^{-t} \sin(x/c)$
2. $u = e^{-t} \cos(x/c)$

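Solution:
1. We have, in short notation,

\[
\frac{\partial u}{\partial t} = -e^{-t} \sin\frac{x}{c}, \qquad \frac{\partial u}{\partial x} = \frac{1}{c} e^{-t} \cos\frac{x}{c}, \qquad \frac{\partial^2 u}{\partial x^2} = -\frac{1}{c^2} e^{-t} \sin\frac{x}{c}.
\]

Since $\kappa = c^2$, the right-hand side of (8.24) equals $-e^{-t} \sin(x/c)$, which is the left-hand side. Hence, $u(t, x) = e^{-t} \sin(x/c)$ is a solution of the heat equation.

2. We have

\[
\frac{\partial u}{\partial t} = -e^{-t} \cos\frac{x}{c}, \qquad \frac{\partial u}{\partial x} = -\frac{1}{c} e^{-t} \sin\frac{x}{c}, \qquad \frac{\partial^2 u}{\partial x^2} = -\frac{1}{c^2} e^{-t} \cos\frac{x}{c}.
\]

Hence, by the same argument, $u(t, x) = e^{-t} \cos(x/c)$ is a solution of the heat equation.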


Parameter-dependent integrals. Let $f = f(x, y)$ be a real-valued function defined on a rectangular domain

\[
Q = \{(x, y) : a \le x \le b,\ c \le y \le d\}. \tag{8.25}
\]

We consider, for a fixed $x \in [a, b]$, the integral

\[
\int_c^d f(x, y)\, dy.
\]

Such an integral is called a parameter-dependent integral, since integration is performed with respect to $y$, while $x$ plays the role of a parameter. Next, we may consider this integral as a function of the parameter,

\[
F(x) = \int_c^d f(x, y)\, dy. \tag{8.26}
\]

We can compute the derivative of $F$ as

\[
F'(x) = \frac{d}{dx} \int_c^d f(x, y)\, dy = \int_c^d \frac{\partial f}{\partial x}(x, y)\, dy, \tag{8.27}
\]

provided that it is correct to interchange the integral with the derivative. This is
asserted in the following theorem. Note that the present situation is different from
the one encountered in the fundamental theorem of calculus, since differentiation
and integration are performed here with respect to different variables.
Theorem 8.5 Let $f$ be a continuous real-valued function with domain $Q$ as in (8.25). Then (8.26) defines a function $F : [a, b] \to \mathbb{R}$ which is continuous. If moreover $f$ has a continuous partial derivative with respect to $x$ on $Q$, then $F$ is differentiable on $(a, b)$ and

\[
F'(x) = \int_c^d \frac{\partial f}{\partial x}(x, y)\, dy. \tag{8.28}
\]

We will not prove this theorem.


Example 8.25 Verify (8.28) for $f(x, y) = \sin(x + y)$ and $Q = [0, 1] \times [0, \pi]$.
Solution: We have

\[
F(x) = \int_0^{\pi} \sin(x + y)\, dy = \Big[ -\cos(x + y) \Big]_{y=0}^{y=\pi} = -\cos(x + \pi) + \cos x = 2\cos x,
\]

so $F'(x) = -2\sin x$. On the other hand,

\[
\int_0^{\pi} \frac{\partial f}{\partial x}(x, y)\, dy = \int_0^{\pi} \cos(x + y)\, dy = \Big[ \sin(x + y) \Big]_{y=0}^{y=\pi} = \sin(x + \pi) - \sin x = -2\sin x.
\]
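
The interchange of differentiation and integration in Example 8.25 can be reproduced numerically. The following sketch is not part of the original text; it assumes a composite Simpson rule (the helper `simpson`) for the integrals and a central difference quotient for $F'$, with the step sizes and the sample point `x0` chosen arbitrarily.

```python
import math

def simpson(g, a, b, n=200):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def F(x):
    # F(x) = integral of sin(x + y) over y in [0, pi], see (8.26).
    return simpson(lambda y: math.sin(x + y), 0.0, math.pi)

def F_prime_numeric(x, h=1e-5):
    # Difference quotient of F, independent of formula (8.28).
    return (F(x + h) - F(x - h)) / (2 * h)

def F_prime_formula(x):
    # Right-hand side of (8.28): integrate the partial derivative cos(x + y).
    return simpson(lambda y: math.cos(x + y), 0.0, math.pi)

x0 = 0.4
print(F_prime_numeric(x0), F_prime_formula(x0), -2 * math.sin(x0))
```

All three printed values should agree up to the discretization error.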

Remark 8.7 In order for the assertions of Theorem 8.5 to hold, it is not necessary
that f resp. ∂x f are continuous on Q. Actually, it suffices that they are bounded by
integrable functions g resp. h,

| f (x, y)| ≤ g(y) , |∂x f (x, y)| ≤ h(y) ,

for all x ∈ [a, b].

We finally consider the situation where not only the integrand but also the lower and upper limits depend on the parameter,

\[
F(x) = \int_{c(x)}^{d(x)} f(x, y)\, dy. \tag{8.29}
\]

The derivative of $F$ is given by the formula of Leibniz

\[
F'(x) = \int_{c(x)}^{d(x)} \frac{\partial f}{\partial x}(x, y)\, dy + f(x, d(x))\, d'(x) - f(x, c(x))\, c'(x). \tag{8.30}
\]

In order to derive this formula, one sets

\[
G(x, p, q) = \int_p^q f(x, y)\, dy
\]

and applies the chain rule as well as the fundamental theorem of calculus to

\[
F(x) = G(x, c(x), d(x)).
\]

Example 8.26 Compute the derivative of

\[
F(x) = \int_{x^2}^{3-x} \sin(xy)\, dy.
\]

Solution: We have $c(x) = x^2$ and $d(x) = 3 - x$. From (8.30) we obtain that

\[
F'(x) = \int_{x^2}^{3-x} y\cos(xy)\, dy + \sin(x(3 - x)) \cdot (-1) - \sin(x^3) \cdot 2x.
\]
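
The result of Example 8.26 can be compared with a difference quotient of $F$. The sketch below is an illustration under assumptions: the Simpson helper `simpson`, the sample point `x0`, and the step size are choices made here and are not taken from the text.

```python
import math

def simpson(g, a, b, n=400):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def F(x):
    # F(x) = integral of sin(x y) over y from x^2 to 3 - x.
    return simpson(lambda y: math.sin(x * y), x**2, 3 - x)

def leibniz(x):
    # Formula (8.30) with c(x) = x^2, d(x) = 3 - x, c'(x) = 2x, d'(x) = -1.
    integral = simpson(lambda y: y * math.cos(x * y), x**2, 3 - x)
    return integral + math.sin(x * (3 - x)) * (-1) - math.sin(x**3) * 2 * x

x0 = 0.7
h = 1e-5
numeric = (F(x0 + h) - F(x0 - h)) / (2 * h)
print(numeric, leibniz(x0))
```

Both values should coincide to several digits; a missing boundary term in (8.30) would show up as a clearly visible discrepancy.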

8.5 Optimization of Functions of Two Variables

In Chap. 4, we have discussed optimization of functions of one variable. In this
section we extend those results to the case of functions of two or more variables.
Again, optimization means that we want to choose the most favorable value – here,
the maximum or the minimum of a function of several variables. As in the case
of a single variable, the differential calculus helps us to find those maximum and
minimum values as well as the location of their occurrence.
In optimization, one usually distinguishes between “unconstrained” and “con-
strained” optimization. In contrast to the former, the latter means that we explicitly
take into account certain restrictions imposed when finding the maximum or mini-
mum.

8.5.1 Unconstrained Optimization

Definition 8.8 Let $f$ be a function of two variables with domain $D \subset \mathbb{R}^2$, that is, its domain is a subset of the plane $\mathbb{R}^2$.
1. A point $(x_0, y_0)$ in $D$ is called a global minimizer or global minimum of $f$ if $f(x_0, y_0) \le f(x, y)$ holds for all points $(x, y)$ in $D$.
2. A point $(x_0, y_0)$ in $D$ is called a global maximizer or global maximum of $f$ if $f(x_0, y_0) \ge f(x, y)$ holds for all points $(x, y)$ in $D$.
3. A point $(x_0, y_0)$ in $D$ is called a local minimizer or local minimum of $f$ if

\[
f(x_0, y_0) \le f(x, y) \tag{8.31}
\]

holds for all $(x, y)$ in $D$ sufficiently close to $(x_0, y_0)$. This means that there exists an $\varepsilon > 0$ such that (8.31) holds for all $(x, y)$ in $D$ with $\sqrt{(x - x_0)^2 + (y - y_0)^2} < \varepsilon$.
4. A point $(x_0, y_0)$ in $D$ is called a local maximizer or local maximum of $f$ if $f(x_0, y_0) \ge f(x, y)$ holds for all $(x, y)$ in $D$ sufficiently close to $(x_0, y_0)$ (which means the same as in 3.)
5. A point $(x_0, y_0)$ in $D$ is called a (local resp. global) extremum of $f$ if it is either a (local resp. global) maximum or a (local resp. global) minimum of $f$.
In all these cases, the corresponding function values $f(x_0, y_0)$ are called global (respectively local) minimal or minimum (respectively maximal or maximum) values of $f$.

Let us recall from Remark 8.1 that (x, y) is called an interior point of the domain of
a function f , if f is defined in a whole neighborhood of this point.
Definition 8.9 Let $(x, y)$ be an interior point of the domain of a function $f$ of two variables at which the partial derivatives $f_x(x, y)$ and $f_y(x, y)$ both exist. If

\[
f_x(x, y) = 0 \quad \text{and} \quad f_y(x, y) = 0, \tag{8.32}
\]

then $(x, y)$ is called a critical point or a stationary point of $f$. If $(x, y)$ is a critical point, but neither a local maximum nor a local minimum point, it is called a saddle point.
First and second derivative test for a local extremum. Here, we look for extrema
which lie in the interior of the domain D of the function f to be extremized. This
means that near the extremum we are not constrained by any restriction. Extrema on
the boundary of D are treated in Sect. 8.5.2.
Theorem 8.6 Let the function f of two variables have a local maximum or minimum
at a point (x0 , y0 ) which is an interior point of the domain D of f .
1. (First Derivative Test for a Local Extremum)
Assume that both f x and f y exist at (x0 , y0 ). Then

f x (x0 , y0 ) = 0 and f y (x0 , y0 ) = 0, (8.33)

that is, $(x_0, y_0)$ is a critical point of $f$.
2. (Second Derivative Test for a Local Extremum)
Let $(x_0, y_0)$ be a critical point of $f$ found from part (1), and assume that moreover the second partial derivatives $f_{xx}$, $f_{xy}$, $f_{yx}$, and $f_{yy}$ exist and are continuous. Let

\[
\Delta = f_{xx}(x_0, y_0)\, f_{yy}(x_0, y_0) - (f_{xy}(x_0, y_0))^2. \tag{8.34}
\]

Then the following holds.
a. If $\Delta > 0$ and $f_{xx}(x_0, y_0) > 0$, then $f$ has a local minimum at $(x_0, y_0)$.
b. If $\Delta > 0$ and $f_{xx}(x_0, y_0) < 0$, then $f$ has a local maximum at $(x_0, y_0)$.
c. If $\Delta < 0$, then $f$ has a saddle point at $(x_0, y_0)$.
d. If $\Delta = 0$, then the test is inconclusive.

Fig. 8.17 A saddle point

Remark 8.8 1. In Fig. 8.17 we see the function $z = f(x, y) = y^2 - x^2$ with its saddle point at the origin $(0, 0, 0)$.
2. If the partial derivatives of $f$ exist at all interior points of its domain, then the only points where $f(x, y)$ can assume extremum values are critical points and boundary points.

Example 8.27 Find the local extrema of
1. $f(x, y) = x^2 + y^2$,
2. $f(x, y) = x^2 - xy + y^2 + 3x$,
3. $f(x, y) = x^2 + xy + y^2 - 6x + 6$.
Solution:
1. The domain $D$ of $f(x, y) = x^2 + y^2$ is the whole plane, so all points $(x, y)$ are interior points of $D$. We have

\[
f_x(x, y) = 2x, \qquad f_y(x, y) = 2y.
\]

To find the critical points of $f$ we set $f_x = 0$ and $f_y = 0$, which gives $x = 0$ and $y = 0$. Therefore, $(0, 0)$ is the only critical point of $f$. To check whether $(0, 0)$ gives a maximum, a minimum, or a saddle point, we compute the second derivatives. They turn out to be the constant functions

\[
f_{xx} = 2, \qquad f_{xy} = 0, \qquad f_{yy} = 2.
\]

So $f_{xx}(0, 0) = 2$, $f_{xy}(0, 0) = 0$ and $f_{yy}(0, 0) = 2$, and we get

\[
\Delta = f_{xx}(0, 0)\, f_{yy}(0, 0) - (f_{xy}(0, 0))^2 = 2 \cdot 2 - 0 = 4 > 0.
\]

Since $\Delta > 0$ and $f_{xx}(0, 0) = 2 > 0$, we see that $f$ has a local minimum at the point $(0, 0)$ with the minimal value $f(0, 0) = 0$. Since $f(x, y) > 0$ for all other points $(x, y)$, in this case the local minimum is also a global minimum of $f$, see Fig. 8.18.
2. The domain $D$ of $f(x, y) = x^2 - xy + y^2 + 3x$ is the whole plane, so all points $(x, y)$ are interior points of $D$. We have

\[
f_x(x, y) = 2x - y + 3, \qquad f_y(x, y) = -x + 2y.
\]

The conditions $f_x = 0$ and $f_y = 0$ yield the system

\[
\begin{aligned}
2x - y + 3 &= 0, \\
-x + 2y &= 0
\end{aligned}
\]

of two equations for the two unknowns $x$ and $y$. Solving these gives $x = -2$ and $y = -1$, so $f$ has exactly one critical point, namely, $(-2, -1)$. To check whether $(-2, -1)$ is a maximum, a minimum, or a saddle point, we calculate the second partial derivatives. They turn out to be the constant functions

\[
f_{xx} = 2, \qquad f_{yy} = 2, \qquad f_{xy} = -1.
\]

At the critical point $f_{xx}(-2, -1) = 2 > 0$ and

\[
\Delta = f_{xx}(-2, -1)\, f_{yy}(-2, -1) - (f_{xy}(-2, -1))^2 = 2 \cdot 2 - 1 = 3 > 0.
\]

Therefore, $f$ has a local minimum at $(-2, -1)$ with the minimal value $f(-2, -1) = -3$. Since $f$ has partial derivatives everywhere and has no other critical point, there can be no other local minimum or maximum.
3. For $f(x, y) = x^2 + xy + y^2 - 6x + 6$ we compute the first partial derivatives

\[
f_x(x, y) = 2x + y - 6, \qquad f_y(x, y) = x + 2y.
\]

Critical points must satisfy $2x + y - 6 = 0$ and $x + 2y = 0$. Solving these equations simultaneously, we get $x = 4$ and $y = -2$, so that $(4, -2)$ is the unique critical point. To determine whether this critical point gives a minimum, a maximum or a saddle point we find $f_{xx}$, $f_{yy}$, $f_{xy}$ at $(4, -2)$. Again, these partial derivatives are constant functions, namely, $f_{xx} = 2$, $f_{yy} = 2$ and $f_{xy} = 1$ so that in particular

\[
f_{xx}(4, -2) = 2, \qquad f_{yy}(4, -2) = 2, \qquad f_{xy}(4, -2) = 1.
\]

Since

\[
\Delta = f_{xx}(4, -2)\, f_{yy}(4, -2) - (f_{xy}(4, -2))^2 = 2 \cdot 2 - 1 = 3 > 0
\]

and $f_{xx}(4, -2) = 2 > 0$, the function $f$ has a local minimum at $(4, -2)$ with minimum value $f(4, -2) = -6$.

Fig. 8.18 A global minimum
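
The second derivative test of Theorem 8.6 is mechanical enough to be written as a small routine. The following is a sketch only: the function names `classify` and `critical_point` are hypothetical, and the linear solver uses Cramer's rule, which suffices here because the first partial derivatives in Example 8.27 are linear in $x$ and $y$.

```python
def classify(fxx, fyy, fxy):
    # Second derivative test of Theorem 8.6 at a critical point.
    delta = fxx * fyy - fxy**2
    if delta > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if delta < 0:
        return "saddle point"
    return "inconclusive"

def critical_point(a11, a12, b1, a21, a22, b2):
    # Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule.
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

# Example 8.27, part 2: 2x - y = -3 and -x + 2y = 0.
p2 = critical_point(2, -1, -3, -1, 2, 0)
# Example 8.27, part 3: 2x + y = 6 and x + 2y = 0.
p3 = critical_point(2, 1, 6, 1, 2, 0)

print(p2, classify(2, 2, -1))
print(p3, classify(2, 2, 1))
```

For functions whose first derivatives are not linear, the critical points would have to be found by other means; only `classify` carries over unchanged.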

Example 8.28 Find and analyze the critical points of
1. $f(x, y) = -x^2 + 2x - y^2 - 4y - 5$,
2. $f(x, y) = xy$.
Solution:
1. We have

\[
f_x(x, y) = -2x + 2, \qquad f_y(x, y) = -2y - 4.
\]

To find the critical points we set $f_x = 0$ and $f_y = 0$, that is $-2x + 2 = 0$ and $-2y - 4 = 0$. Solving these equations gives $x = 1$, $y = -2$. Hence, $f$ has exactly one critical point, namely, $(1, -2)$. To determine the type of the critical point we compute the second partial derivatives and obtain the constant functions $f_{xx} = -2$, $f_{yy} = -2$, $f_{xy} = 0$. Since

\[
\Delta = f_{xx}(1, -2)\, f_{yy}(1, -2) - (f_{xy}(1, -2))^2 = (-2) \cdot (-2) - 0 = 4 > 0
\]

and $f_{xx}(1, -2) = -2 < 0$, the function $f$ has a local maximum at the point $(1, -2)$ with a maximum value $f(1, -2) = 0$.
2. We have $f_x(x, y) = y$ and $f_y(x, y) = x$. The conditions $f_x = 0$ and $f_y = 0$ give $x = 0$ and $y = 0$, so $(0, 0)$ is the only critical point. To determine the type of this critical point, we compute

\[
f_{xx}(0, 0) = 0, \qquad f_{yy}(0, 0) = 0, \qquad f_{xy}(0, 0) = 1,
\]

so

\[
\Delta = 0 \cdot 0 - 1^2 = -1 < 0.
\]

Therefore, the function $f$ has a saddle point at $(0, 0)$, and $f$ has no local extrema.

Fig. 8.19 A solid whose volume is maximized

Example 8.29 Find the rectangular three-dimensional solid of maximal volume (see Fig. 8.19), when the sum of the lengths of all edges is equal to a given constant.

Solution: Let $x$, $y$, and $z$ be the lengths of the edges in the corresponding directions. The sum of the lengths of those 12 edges equals $4x + 4y + 4z$ (see Fig. 8.19). Let $4x + 4y + 4z = 4k$ with a fixed constant $k > 0$, so $x + y + z = k$. The volume of the solid is $V = xyz$. Since $z = k - x - y$, we can write $V$ as a function of $x$ and $y$ only,

\[
V(x, y) = xy(k - x - y) = kxy - x^2 y - xy^2.
\]

Since edges must have nonnegative lengths, the domain $D$ of $V$ is described by the inequalities $0 \le x$, $0 \le y$ and $0 \le z = k - x - y$ which form a triangle in the plane, see Fig. 8.20. Along the sides of the triangle we have $V = 0$, so the maximum, if it exists, must be an interior point of $D$. We therefore compute the critical points of $V$ in the interior of $D$. They satisfy

\[
\begin{aligned}
0 &= V_x(x, y) = ky - 2xy - y^2, \\
0 &= V_y(x, y) = kx - x^2 - 2xy.
\end{aligned}
\]

Fig. 8.20 Restrictions on edge lengths

Since $x > 0$ and $y > 0$ for interior points, we may divide by $y$ and $x$, respectively, and obtain

\[
k - 2x - y = 0, \qquad k - x - 2y = 0.
\]

Solving these equations simultaneously, we get

\[
x = \frac{1}{3}k, \qquad y = \frac{1}{3}k
\]

as the unique critical point of $V$ which moreover lies in the interior of $D$ because $k - \frac{1}{3}k - \frac{1}{3}k = \frac{1}{3}k > 0$. Now we determine whether the point $(\frac{1}{3}k, \frac{1}{3}k)$ is a local maximum, a local minimum, or a saddle point. We compute the second partial derivatives as

\[
V_{xx}(x, y) = -2y, \qquad V_{yy}(x, y) = -2x, \qquad V_{xy}(x, y) = k - 2x - 2y.
\]

At the point $(\frac{1}{3}k, \frac{1}{3}k)$, we get

\[
V_{xx} V_{yy} - V_{xy}^2 = \left(-\frac{2}{3}k\right) \cdot \left(-\frac{2}{3}k\right) - \left(-\frac{1}{3}k\right)^2 = \frac{4}{9}k^2 - \frac{1}{9}k^2 = \frac{1}{3}k^2 > 0.
\]

Since moreover $V_{xx} = -\frac{2}{3}k < 0$ at this point, $V$ has in $D$ a unique local maximum at $x = \frac{1}{3}k$ and $y = \frac{1}{3}k$. One can prove by other means that $V$ must have a global maximum on $D$. Since the global maximum is also a local maximum, it must be the point we just computed. We also see that $z = k - x - y = \frac{1}{3}k$ for this point, so the solid with maximal volume is in fact a cube with side length $\frac{1}{3}k$.
Fig. 8.21 Locating a site for a television relay station

Example 8.30 A television relay station will serve towns A, B, and C whose relative locations are shown in Fig. 8.21. Determine a site for the location of the station such that the sum of the squares of the distances from each town to the site is minimized.
Solution: Let the required site be located at the point $P = (x, y)$. The square of the distance of $P$ from town A is $(x - 30)^2 + (y - 20)^2$. For B and C, the squares of the distances are $(x + 20)^2 + (y - 10)^2$ and $(x - 10)^2 + (y + 10)^2$, respectively. The sum of the squares of the distances from $P$ to the three towns is given by

\[
f(x, y) = (x - 30)^2 + (y - 20)^2 + (x + 20)^2 + (y - 10)^2 + (x - 10)^2 + (y + 10)^2.
\]

In order to find the minimum, we determine the critical points of $f$. We must have $f_x = 0$ and $f_y = 0$, so

\[
\begin{aligned}
0 &= f_x(x, y) = 2(x - 30) + 2(x + 20) + 2(x - 10) = 6x - 40, \\
0 &= f_y(x, y) = 2(y - 20) + 2(y - 10) + 2(y + 10) = 6y - 40.
\end{aligned}
\]

Solving these equations we get $x = \frac{20}{3}$ and $y = \frac{20}{3}$, so that $(\frac{20}{3}, \frac{20}{3})$ is the only critical point of $f$. Now

\[
f_{xx}\left(\tfrac{20}{3}, \tfrac{20}{3}\right) = 6, \qquad f_{xy}\left(\tfrac{20}{3}, \tfrac{20}{3}\right) = 0, \qquad f_{yy}\left(\tfrac{20}{3}, \tfrac{20}{3}\right) = 6.
\]

We get

\[
\Delta = f_{xx}\left(\tfrac{20}{3}, \tfrac{20}{3}\right) \cdot f_{yy}\left(\tfrac{20}{3}, \tfrac{20}{3}\right) - f_{xy}^2\left(\tfrac{20}{3}, \tfrac{20}{3}\right) = 36 > 0,
\]

and $f_{xx}(\frac{20}{3}, \frac{20}{3}) > 0$. Therefore, $f$ has a local minimum at $(\frac{20}{3}, \frac{20}{3})$. One can prove by other means that $f$ must have a global minimum. Thus, the point just computed yields the unique global minimum. Hence, the required site has coordinates

\[
x = \frac{20}{3} \quad \text{and} \quad y = \frac{20}{3}.
\]

Example 8.31 A manufacturing company produces two products which are sold in two separate markets. The quantities $q_1$ and $q_2$ demanded by the consumers and the prices $p_1$ and $p_2$ (in dollars) of each item are related by the equations

\[
p_1 = 600 - 0.3q_1, \qquad p_2 = 500 - 0.2q_2.
\]

(Increasing price corresponds to decreasing demand.) The company's total production cost is given by

\[
C(q_1, q_2) = 16 + 1.2q_1 + 1.5q_2 + 0.2q_1 q_2.
\]

If the company wants to maximize its total profits, how much of each product should it produce? What is the maximum profit?
Solution: The revenue from each product sold in the corresponding market equals $p_1 q_1$ and $p_2 q_2$, respectively. The total revenue $R$ equals their sum $p_1 q_1 + p_2 q_2$. Written as a function of $q_1$ and $q_2$ we get

\[
R(q_1, q_2) = (600 - 0.3q_1)q_1 + (500 - 0.2q_2)q_2 = 600q_1 - 0.3q_1^2 + 500q_2 - 0.2q_2^2.
\]

The total profit becomes

\[
\begin{aligned}
P(q_1, q_2) &= R(q_1, q_2) - C(q_1, q_2) \\
&= 600q_1 - 0.3q_1^2 + 500q_2 - 0.2q_2^2 - (16 + 1.2q_1 + 1.5q_2 + 0.2q_1 q_2) \\
&= -16 + 598.8q_1 - 0.3q_1^2 + 498.5q_2 - 0.2q_2^2 - 0.2q_1 q_2.
\end{aligned}
\]

The critical points of $P$ are obtained by equating to zero both partial derivatives of $P$, that is,

\[
\begin{aligned}
0 &= \frac{\partial P}{\partial q_1}(q_1, q_2) = 598.8 - 0.6q_1 - 0.2q_2, \\
0 &= \frac{\partial P}{\partial q_2}(q_1, q_2) = 498.5 - 0.4q_2 - 0.2q_1.
\end{aligned}
\]

Solving these two equations simultaneously, we get

\[
q_1 = 699.1 \approx 699 \quad \text{and} \quad q_2 = 896.7 \approx 897,
\]

and therefore $p_1 \approx 390.30$, $p_2 \approx 320.60$. We use the second derivative test to check whether this critical point $(699.1, 896.7)$ gives a local maximum. The second-order partial derivatives are constant functions,

\[
\frac{\partial^2 P}{\partial q_1^2} = -0.6, \qquad \frac{\partial^2 P}{\partial q_2^2} = -0.4, \qquad \frac{\partial^2 P}{\partial q_1 \partial q_2} = -0.2.
\]

Since

\[
\Delta = \frac{\partial^2 P}{\partial q_1^2} \frac{\partial^2 P}{\partial q_2^2} - \left(\frac{\partial^2 P}{\partial q_1 \partial q_2}\right)^2 = (-0.6)(-0.4) - (-0.2)^2 = 0.2 > 0,
\]

and

\[
\frac{\partial^2 P}{\partial q_1^2} = -0.6 < 0,
\]

the profit function has a unique local maximum at approximately $(699, 897)$. One can prove by other means that a global maximum must exist, which is therefore equal to this point. As the final result, the company should produce 699 units of the first product priced at \$390.30 per unit and 897 units of the second product priced at \$320.60 per unit. The maximal profit is

\[
P(699, 897) = \$432{,}797.
\]
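
The numbers in Example 8.31 are easy to reproduce. The sketch below, which is not part of the original text, solves the two linear equations for the critical point by Cramer's rule and evaluates the profit; the function name `profit` and the variable names are illustrative assumptions.

```python
def profit(q1, q2):
    # P = R - C with the demand and cost functions of Example 8.31.
    revenue = (600 - 0.3 * q1) * q1 + (500 - 0.2 * q2) * q2
    cost = 16 + 1.2 * q1 + 1.5 * q2 + 0.2 * q1 * q2
    return revenue - cost

# Critical point: 0.6 q1 + 0.2 q2 = 598.8 and 0.2 q1 + 0.4 q2 = 498.5.
det = 0.6 * 0.4 - 0.2 * 0.2
q1 = (598.8 * 0.4 - 0.2 * 498.5) / det
q2 = (0.6 * 498.5 - 0.2 * 598.8) / det

print(round(q1, 1), round(q2, 1))
print(round(profit(699, 897)), profit(q1, q2))
```

The profit at the exact critical point is marginally larger than at the rounded production levels, as it must be for a maximum.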

Least Squares Fit, Regression. Let us assume that two scalar quantities $x$ and $y$ are related by the linear model $y = g(x) = ax + b$, and that we want to determine the parameters $a$ and $b$ from a set of known data $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ of corresponding values of $x$ and $y$. (Once we know $a$ and $b$, we may predict the values of $y$ for other values of $x$ from the equation $y = g(x)$.) In principle, two pairs are sufficient ($n = 2$), since we then may determine $a$ and $b$ from the linear system

\[
ax_1 + b = y_1, \qquad ax_2 + b = y_2,
\]

whenever $x_1 \ne x_2$, and $y = ax + b$ for those values of $a$ and $b$ yields the unique straight line through the points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane. Since the data or the model may not be accurate, one usually wants to use more data ($n > 2$) in order to get the most useful values for $a$ and $b$. In that case, however, we have more equations $ax_i + b = y_i$ than the two unknowns $a$ and $b$, and the problem arises to find the "best" line with respect to the data, see Fig. 8.22. The usual solution is to determine $a$ and $b$ such that the sum of the squares of the deviations $y_i - g(x_i)$ becomes as small as possible. In other words, one minimizes the function of two variables

\[
f(a, b) = \sum_{i=1}^{n} [(ax_i + b) - y_i]^2.
\]

The resulting values of $a$ and $b$ are called the least squares fit to the given data set of pairs $(x_i, y_i)$, see Fig. 8.23. In order to determine those values we compute

\[
\begin{aligned}
\frac{\partial f}{\partial a}(a, b) &= 2\sum_{i=1}^{n} (ax_i + b - y_i)x_i = 2\left( a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i y_i \right), \\
\frac{\partial f}{\partial b}(a, b) &= 2\sum_{i=1}^{n} (ax_i + b - y_i) = 2\left( a\sum_{i=1}^{n} x_i + nb - \sum_{i=1}^{n} y_i \right).
\end{aligned} \tag{8.35}
\]

At the minimum both partial derivatives must be zero, and we arrive at the system

\[
\begin{aligned}
a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i, \\
a\sum_{i=1}^{n} x_i + nb &= \sum_{i=1}^{n} y_i,
\end{aligned} \tag{8.36}
\]

of two equations, linear with respect to $a$ and $b$, from which we determine $a$ and $b$.

Fig. 8.22 Find the line that best "fits" the data

Fig. 8.23 Least squares fit: The sum of squares of the vertical deviations is minimal

Example 8.32 Find the line $y = ax + b$ that gives the least squares fit to the points $(0, 2)$, $(1, 3)$, $(2, 3)$, see Fig. 8.24.

Solution: We have $n = 3$, and the $(x_i, y_i)$ are given successively by $(0, 2)$, $(1, 3)$ and $(2, 3)$. According to (8.36), the necessary conditions $\partial f/\partial a = 0 = \partial f/\partial b$ become

\[
\begin{aligned}
a(0 + 1 + 4) + b(0 + 1 + 2) &= 2 \cdot 0 + 3 \cdot 1 + 3 \cdot 2, \\
a(0 + 1 + 2) + 3b &= 2 + 3 + 3,
\end{aligned}
\]

that is,

\[
\begin{aligned}
5a + 3b &= 9, \\
3a + 3b &= 8.
\end{aligned}
\]

Its unique solution is $a = \frac{1}{2}$ and $b = \frac{13}{6}$, so the least squares solution yields the straight line

\[
y = \frac{1}{2}x + \frac{13}{6}.
\]

Fig. 8.24 Least squares fit, Example 8.32

With the second derivative test, one may check that the computed values for $a$ and $b$ indeed yield a local minimum. Namely, we get $f_{aa} = 10 > 0$, $f_{bb} = 6$ and $f_{ab} = f_{ba} = 6$, so $\Delta = f_{aa} f_{bb} - (f_{ab})^2 = 60 - 36 = 24 > 0$. One may ask whether the system (8.36) has a unique solution for $a$ and $b$. The answer is "yes" unless we have $x_i = x_j$ for all $1 \le i, j \le n$. We will not give the proof here.
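
The system (8.36) can be solved in closed form for any data set. The following is a minimal sketch, assuming exact rational arithmetic via Python's `fractions` module; the function name `least_squares_line` is a hypothetical choice and the solution formula is Cramer's rule applied to (8.36).

```python
from fractions import Fraction

def least_squares_line(points):
    # Solve the normal equations (8.36) for the slope a and the intercept b.
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    det = n * sxx - sx**2   # vanishes exactly when all x_i coincide
    if det == 0:
        raise ValueError("all x values coincide; the line is not unique")
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

a, b = least_squares_line([(0, 2), (1, 3), (2, 3)])
print(a, b)
```

For the data of Example 8.32 this reproduces the slope and intercept found above. The determinant check mirrors the uniqueness condition mentioned in the text.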
Relation to statistics. Statistical analysis provides a reason why the least squares method appears in many cases as the natural approach when one wants to fit data. Let us briefly mention this connection; for further explanation and exposition we refer to courses in probability and statistics. One interprets the measured values $y_1, \dots, y_n$ as solutions of

\[
y_i = ax_i + b + z_i,
\]

where $z_i$ are random perturbations, assumed to be independent and normally distributed with mean zero. One then can prove that the least squares solution computed above yields the so-called maximum likelihood estimate of $a$ and $b$. In this context, the resulting line $y = ax + b$ is called the regression line, and its slope $a$ is called the regression coefficient. Moreover, the mean values

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,
\]

are related by

\[
a\bar{x} + b = \bar{y}.
\]

This follows if we divide the second line in (8.36) by $n$.

8.5.2 Constrained Optimization

In most cases, practical optimization problems include constraints due to external circumstances and conditions. For example, a city administration desiring to build a modern public transport system has limited resources. A nation trying to maintain its balance of trade must not spend more on imports than it earns on exports.
We discuss here how to find an optimum value under such constraints. We restrict ourselves to the basic situation when we want to find the extrema of a function $z = f(x, y)$ of two variables subject to a single constraint of the form $g(x, y) = 0$. (Such constrained extrema are also called relative extrema.)
The Method of Lagrange Multipliers. To find the constrained extrema of the func-
tion z = f (x, y) subject to the constraint g(x, y) = 0, one carries out the following
steps.
1. One forms an auxiliary function of three variables,

\[
F(x, y, \lambda) = f(x, y) + \lambda g(x, y). \tag{8.37}
\]

The function $F$ is called the Lagrange function (or, in short, the Lagrangian), and the variable $\lambda$ is called the Lagrange multiplier.
2. Determine the critical points of $F$, that is, solve the system of three equations

\[
F_x = 0, \qquad F_y = 0, \qquad F_\lambda = 0, \tag{8.38}
\]

with respect to the three unknowns $x$, $y$, and $\lambda$.
3. For each critical point $(x, y, \lambda)$ of $F$ found in step 2, the point $(x, y)$ is a candidate for a constrained local maximum (or minimum) of $f$.
4. Apply the second derivative test in the following form: If

\[
\Delta = F_{xx} F_{yy} - F_{xy}^2
\]

satisfies $\Delta > 0$ at a critical point $(x, y, \lambda)$, then $(x, y)$ is a local minimum if $F_{xx}(x, y, \lambda) > 0$, and a local maximum if $F_{xx}(x, y, \lambda) < 0$. The test is inconclusive in the cases $\Delta = 0$ and (in contrast to the situation without constraints) $\Delta < 0$.

Example 8.33 Using the method of Lagrange multipliers, find the local extrema of
1. $f(x, y) = 2x^2 + y^2$ subject to the constraint $x + y = 1$.
2. $f(x, y) = xy$ subject to the constraint $x + y = 16$.

Solution:
1. We write the constraint equation $x + y = 1$ in the form $0 = g(x, y) = x + y - 1$ and form the Lagrange function

\[
F(x, y, \lambda) = f(x, y) + \lambda g(x, y) = 2x^2 + y^2 + \lambda(x + y - 1).
\]

To find the critical points of the function $F$, we solve the system of equations

\[
\begin{aligned}
0 &= F_x = 4x + \lambda, \\
0 &= F_y = 2y + \lambda, \\
0 &= F_\lambda = x + y - 1.
\end{aligned}
\]

Solving the first two equations for $x$ and $y$ in terms of $\lambda$, we obtain

\[
x = -\frac{1}{4}\lambda, \qquad y = -\frac{1}{2}\lambda.
\]

Substituting $x$ and $y$ in the third equation, we get

\[
-\frac{1}{4}\lambda - \frac{1}{2}\lambda - 1 = 0, \quad \text{so} \quad \lambda = -\frac{4}{3}.
\]

Therefore, $x = \frac{1}{3}$, $y = \frac{2}{3}$, and thus the point $(\frac{1}{3}, \frac{2}{3}, -\frac{4}{3})$ is the only critical point of $F$. Concerning the second derivative test, we see that the second derivatives of $F$ are the constant functions

\[
F_{xx} = 4, \qquad F_{yy} = 2, \qquad F_{xy} = 0,
\]

so $\Delta = F_{xx} F_{yy} - (F_{xy})^2 = 4 \cdot 2 - 0 = 8 > 0$ and $F_{xx} = 4 > 0$. Therefore, the point $(\frac{1}{3}, \frac{2}{3})$ is a local minimum of $f$ relative to the constraint $x + y = 1$.
2. We set $g(x, y) = x + y - 16$ and obtain the Lagrange function $F$ as

\[
F(x, y, \lambda) = f(x, y) + \lambda g(x, y) = xy + \lambda(x + y - 16).
\]

The critical points are obtained by solving the system of equations

\[
\begin{aligned}
0 &= F_x = y + \lambda, \\
0 &= F_y = x + \lambda, \\
0 &= F_\lambda = x + y - 16.
\end{aligned}
\]

From the first and second equation, we get $y = -\lambda$ and $x = -\lambda$. Substituting these in the third equation, we get $-2\lambda - 16 = 0$, so $\lambda = -8$ and $x = y = 8$. Therefore, the unique critical point of $F$ is $(8, 8, -8)$. Again, the second derivatives of $F$ are constant functions, namely, $F_{xx} = F_{yy} = 0$ and $F_{xy} = F_{yx} = 1$, so $\Delta = F_{xx} F_{yy} - (F_{xy})^2 = -1 < 0$. Thus, the second derivative test is inconclusive. In fact, the point $(8, 8)$ is a local maximum of $f$ relative to the constraint $x + y = 16$.
Note that in the two examples above we could have eliminated the constraint $g(x, y) = 0$ by solving for $y$, thus reducing the problem to an unconstrained optimization problem, as we have done previously. In the first problem, $g(x, y) = 0$ gives $y = 1 - x$ and we are left with finding the extrema of $h(x) = f(x, 1 - x) = 2x^2 + (1 - x)^2$, which indeed would be much simpler than the computation above. However, such a direct elimination may be cumbersome or infeasible in other examples, in particular, when more variables and more constraints are involved, a situation which we do not discuss here.
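
The candidates found in Example 8.33 can be tested directly by moving along the constraint line. The sketch below is an illustration under assumptions: it checks the three equations of (8.38) in exact rational arithmetic and then compares $f$ at the candidate with its values at a finite set of neighboring points on the constraint; the set `steps` and all variable names are choices made here. Such a sampling check supports, but does not prove, the extremal property.

```python
from fractions import Fraction

def f1(x, y):
    return 2 * x**2 + y**2

def f2(x, y):
    return x * y

# Displacements along the constraint: (x + s, y - s) keeps x + y fixed.
steps = [Fraction(k, 100) for k in range(-50, 51) if k != 0]

# Part 1: constraint x + y = 1, candidate (1/3, 2/3) with lambda = -4/3.
x1, y1, lam1 = Fraction(1, 3), Fraction(2, 3), Fraction(-4, 3)
stationary_1 = (4 * x1 + lam1 == 0) and (2 * y1 + lam1 == 0) and (x1 + y1 == 1)
is_min_1 = all(f1(x1 + s, y1 - s) > f1(x1, y1) for s in steps)

# Part 2: constraint x + y = 16, candidate (8, 8) with lambda = -8.
x2, y2, lam2 = 8, 8, -8
stationary_2 = (y2 + lam2 == 0) and (x2 + lam2 == 0) and (x2 + y2 == 16)
is_max_2 = all(f2(x2 + s, y2 - s) < f2(x2, y2) for s in steps)

print(stationary_1, is_min_1, stationary_2, is_max_2)
```

In part 2 the comparison confirms the local maximum on the constraint even though the second derivative test in step 4 is inconclusive there.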
Example 8.34 Suppose that $x$ units of labor and $y$ units of capital are required to produce

\[
f(x, y) = 100x^{3/4} y^{1/4}
\]

units of a certain product. Assume that each unit of labor costs \$200, each unit of capital costs \$300, and a total amount of \$60,000 is available for production. Determine how the funds should be allocated to labor and capital in order to maximize production.
Solution: The total cost of $x$ units of labor at \$200 per unit and $y$ units of capital at \$300 per unit equals $200x + 300y$ dollars. At our disposal, we have $200x + 300y = 60000$ dollars. We thus write

\[
g(x, y) = 200x + 300y - 60000
\]

as the constraint function. To maximize $f(x, y) = 100x^{3/4} y^{1/4}$ subject to the constraint $g(x, y) = 0$ we form the Lagrange function

\[
F(x, y, \lambda) = f(x, y) + \lambda g(x, y) = 100x^{3/4} y^{1/4} + \lambda(200x + 300y - 60000).
\]

To find the critical points of $F$, we solve the system of equations

\[
\begin{aligned}
0 &= F_x = 75x^{-1/4} y^{1/4} + 200\lambda, \\
0 &= F_y = 25x^{3/4} y^{-3/4} + 300\lambda, \\
0 &= F_\lambda = 200x + 300y - 60000.
\end{aligned} \tag{8.39}
\]

Solving the first equation for $\lambda$, we get

\[
\lambda = -\frac{75x^{-1/4} y^{1/4}}{200} = -\frac{3}{8}\left(\frac{y}{x}\right)^{1/4}.
\]

Substituting in the second equation gives

\[
25\left(\frac{x}{y}\right)^{3/4} + 300 \cdot \left(-\frac{3}{8}\left(\frac{y}{x}\right)^{1/4}\right) = 0.
\]

Multiplying the above equation by $(\frac{x}{y})^{1/4}$ gives

\[
25 \cdot \frac{x}{y} - \frac{900}{8} = 0, \quad \text{so} \quad x = \frac{900}{8} \cdot \frac{1}{25} \cdot y = \frac{9}{2}y.
\]

Substituting this value of $x$ in the third equation of the system (8.39), we have

\[
200 \cdot \frac{9}{2}y + 300y - 60000 = 0,
\]

so $y = 50$ and hence $x = 225$. We leave the verification of the second derivative test to the reader. Thus, maximum production is obtained when 225 units of labor and 50 units of capital are used.

8.6 Taylor Expansion in Two Variables

For a function $f$ of a single variable, we have discussed in Sect. 5.4 its Taylor expansion of order $n$ at $x = a$

\[
f(a + h) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} h^k + R_n(h), \qquad R_n(h) = \frac{f^{(n+1)}(a + th)}{(n + 1)!} h^{n+1},
\]

the remainder term $R_n$ being evaluated at some point $a + th$ with $t \in [0, 1]$, which lies between $a$ and $a + h$. For functions $f$ of more than one variable, an analogous formula holds which involves the partial derivatives of $f$. We present here the Taylor expansion in the case of two independent variables.

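Theorem 8.7 Let a function $f$ of two variables be defined in an open rectangle $D$ centered at $(a, b)$, suppose that $f$ and its partial derivatives up to order $(n + 1)$ are continuous in $D$. Then for all points $(x, y) = (a + h, b + k)$ in $D$,

\[
\begin{aligned}
f(a + h, b + k) ={}& f(a, b) + \left( h\frac{\partial f}{\partial x} + k\frac{\partial f}{\partial y} \right)\bigg|_{(a,b)} \\
&+ \frac{1}{2!}\left( h^2 \frac{\partial^2 f}{\partial x^2} + 2hk\frac{\partial^2 f}{\partial x \partial y} + k^2 \frac{\partial^2 f}{\partial y^2} \right)\bigg|_{(a,b)} \\
&+ \frac{1}{3!}\left( h^3 \frac{\partial^3 f}{\partial x^3} + 3h^2 k\frac{\partial^3 f}{\partial x^2 \partial y} + 3hk^2 \frac{\partial^3 f}{\partial x \partial y^2} + k^3 \frac{\partial^3 f}{\partial y^3} \right)\bigg|_{(a,b)} \\
&+ \cdots + \frac{1}{n!}\left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{n} f\,\bigg|_{(a,b)} + R_n(a + th, b + tk),
\end{aligned} \tag{8.40}
\]

where the terms up to order $n$ are evaluated at the point $(a, b)$ as indicated, and the remainder

\[
R_n(x, y) = \frac{1}{(n + 1)!}\left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{n+1} f\,\bigg|_{(x,y)}
\]

is evaluated at some point $(x, y) = (a + th, b + tk)$, $t \in [0, 1]$, on the line segment joining $(a, b)$ and $(a + h, b + k)$.
In the last line of formula (8.40), and analogously in the formula for $R_n$, we have used the abbreviation

\[
\left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{n} f
\]

with $n$ to be understood as an exponent. The reader should compare with the term for $n = 3$ in formula (8.40) to interpret its meaning.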
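In the case $(a, b) = (0, 0)$ we have $(x, y) = (h, k)$ in the theorem above, and Taylor's formula for the expansion of $f$ at the origin becomes

\[
\begin{aligned}
f(x, y) ={}& f(0, 0) + \left( x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} \right)\bigg|_{(0,0)} \\
&+ \frac{1}{2!}\left( x^2 \frac{\partial^2 f}{\partial x^2} + 2xy\frac{\partial^2 f}{\partial x \partial y} + y^2 \frac{\partial^2 f}{\partial y^2} \right)\bigg|_{(0,0)} \\
&+ \frac{1}{3!}\left( x^3 \frac{\partial^3 f}{\partial x^3} + 3x^2 y\frac{\partial^3 f}{\partial x^2 \partial y} + 3xy^2 \frac{\partial^3 f}{\partial x \partial y^2} + y^3 \frac{\partial^3 f}{\partial y^3} \right)\bigg|_{(0,0)} + \cdots \\
&+ \frac{1}{n!}\left( x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} \right)^{n} f\,\bigg|_{(0,0)} + \frac{1}{(n + 1)!}\left( x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} \right)^{n+1} f\,\bigg|_{(tx,ty)}.
\end{aligned} \tag{8.41}
\]

The terms containing the first $n$ derivatives are evaluated at $(0, 0)$. The remainder term is evaluated at some point $(tx, ty)$ on the line segment joining the origin and $(x, y)$.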
Example 8.35 Write down the expansion at the origin of the following functions of
two variables with derivative terms up to order 3.

1. $f(x, y) = \sin x \sin y$.
2. $f(x, y) = e^{x} \sin y$.
3. $f(x, y) = e^{xy}$.
Solution:
1. The partial derivatives up to order 3 are given by (we list them without writing
the argument (x, y) on the left side)

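\[
\begin{aligned}
\frac{\partial f}{\partial x} &= \cos x \sin y, &
\frac{\partial f}{\partial y} &= \sin x \cos y, &
\frac{\partial^{2} f}{\partial x^{2}} &= -\sin x \sin y, \\
\frac{\partial^{2} f}{\partial x\,\partial y} &= \cos x \cos y, &
\frac{\partial^{2} f}{\partial y^{2}} &= -\sin x \sin y, &
\frac{\partial^{3} f}{\partial x^{3}} &= -\cos x \sin y, \\
\frac{\partial^{3} f}{\partial x^{2}\,\partial y} &= -\sin x \cos y, &
\frac{\partial^{3} f}{\partial y^{2}\,\partial x} &= -\cos x \sin y, &
\frac{\partial^{3} f}{\partial y^{3}} &= -\sin x \cos y .
\end{aligned}
\]

Since f(0, 0) = 0 and all those derivatives except ∂²f/∂x∂y are equal to 0 when
evaluated at (0, 0), the Taylor expansion of f at (0, 0) up to order 3 becomes

\[
f(x, y) = \frac{1}{2!}\, 2xy\, \frac{\partial^{2} f}{\partial x\,\partial y}(0, 0) + R_3(x, y) = xy + R_3(x, y) .
\]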

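2. For $f(x, y) = e^{x} \sin y$, we have f(0, 0) = 0 and

\[
\begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= e^{x} \sin y, & \frac{\partial f}{\partial x}(0, 0) &= 0, \\
\frac{\partial f}{\partial y}(x, y) &= e^{x} \cos y, & \frac{\partial f}{\partial y}(0, 0) &= 1, \\
\frac{\partial^{2} f}{\partial x^{2}}(x, y) &= e^{x} \sin y, & \frac{\partial^{2} f}{\partial x^{2}}(0, 0) &= 0, \\
\frac{\partial^{2} f}{\partial x\,\partial y}(x, y) &= e^{x} \cos y, & \frac{\partial^{2} f}{\partial x\,\partial y}(0, 0) &= 1, \\
\frac{\partial^{2} f}{\partial y^{2}}(x, y) &= -e^{x} \sin y, & \frac{\partial^{2} f}{\partial y^{2}}(0, 0) &= 0, \\
\frac{\partial^{3} f}{\partial x^{3}}(x, y) &= e^{x} \sin y, & \frac{\partial^{3} f}{\partial x^{3}}(0, 0) &= 0, \\
\frac{\partial^{3} f}{\partial x^{2}\,\partial y}(x, y) &= e^{x} \cos y, & \frac{\partial^{3} f}{\partial x^{2}\,\partial y}(0, 0) &= 1, \\
\frac{\partial^{3} f}{\partial x\,\partial y^{2}}(x, y) &= -e^{x} \sin y, & \frac{\partial^{3} f}{\partial x\,\partial y^{2}}(0, 0) &= 0, \\
\frac{\partial^{3} f}{\partial y^{3}}(x, y) &= -e^{x} \cos y, & \frac{\partial^{3} f}{\partial y^{3}}(0, 0) &= -1 .
\end{aligned}
\]

Inserting these values into (8.41) we get

\[
\begin{aligned}
f(x, y) &= 0 + y + xy + \frac{1}{3!}\left( 3x^{2}y - y^{3} \right) + R_3(x, y) \\
&= y + xy + \frac{1}{2}\, x^{2} y - \frac{1}{6}\, y^{3} + R_3(x, y) .
\end{aligned}
\]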
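3. For $f(x, y) = e^{xy}$, we have f(0, 0) = 1. The partial derivatives up to order 3 are
given by (we again omit the argument (x, y) on the left side)

\[
\begin{aligned}
\frac{\partial f}{\partial x} &= y e^{xy}, &
\frac{\partial f}{\partial y} &= x e^{xy}, &
\frac{\partial^{2} f}{\partial x^{2}} &= y^{2} e^{xy}, \\
\frac{\partial^{2} f}{\partial x\,\partial y} &= e^{xy} + xy\, e^{xy}, &
\frac{\partial^{2} f}{\partial y^{2}} &= x^{2} e^{xy}, &
\frac{\partial^{3} f}{\partial x^{3}} &= y^{3} e^{xy}, \\
\frac{\partial^{3} f}{\partial x^{2}\,\partial y} &= 2y\, e^{xy} + x y^{2} e^{xy}, &
\frac{\partial^{3} f}{\partial y^{2}\,\partial x} &= 2x\, e^{xy} + y x^{2} e^{xy}, &
\frac{\partial^{3} f}{\partial y^{3}} &= x^{3} e^{xy} .
\end{aligned}
\]

Evaluated at (0, 0), all those partial derivatives except ∂²f/∂x∂y are equal to
zero, while ∂²f/∂x∂y (0, 0) = 1. The second-order term of (8.41) carries the factor 2xy, so we obtain

\[
\begin{aligned}
f(x, y) &= f(0, 0) + \frac{1}{2!}\, 2xy\, \frac{\partial^{2} f}{\partial x\,\partial y}(0, 0) + R_3(x, y) \\
&= 1 + xy + R_3(x, y) .
\end{aligned}
\]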
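
The three expansions of Example 8.35 can be checked numerically. The sketch below is not part of the original text; it assumes the third-order polynomials xy, y + xy + x²y/2 − y³/6 and 1 + xy (for e^{xy} the second-order term (1/2!)·2xy·f_xy(0, 0) equals xy), and it compares the remainder at the sample points (s, 2s) and (s/2, s). The function names and the choice of sample points are illustrative. If the polynomial is correct up to order 3, the remainder is of fourth order, so halving s should reduce it by a factor of roughly 2⁴ = 16.

```python
import math

# Third-order Taylor polynomials at the origin for the functions of Example 8.35.
def p_sin_sin(x, y):
    return x * y

def p_exp_sin(x, y):
    return y + x * y + 0.5 * x**2 * y - y**3 / 6.0

def p_exp_xy(x, y):
    return 1.0 + x * y

CASES = [
    (lambda x, y: math.sin(x) * math.sin(y), p_sin_sin),
    (lambda x, y: math.exp(x) * math.sin(y), p_exp_sin),
    (lambda x, y: math.exp(x * y), p_exp_xy),
]

def remainder(f, p, s):
    """Absolute error |f - p| at the sample point (s, 2s)."""
    return abs(f(s, 2 * s) - p(s, 2 * s))

def error_ratios(s=0.1):
    """Remainder at scale s divided by remainder at scale s/2, for each case."""
    return [remainder(f, p, s) / remainder(f, p, s / 2) for f, p in CASES]

if __name__ == "__main__":
    for ratio in error_ratios():
        print(round(ratio, 2))  # each value should be close to 16
```

The point (s, 2s) is used instead of (s, s) because the fourth-order part of e^x sin y, namely xy(x² − y²)/6, vanishes on the diagonal, which would hide the expected factor.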

8.7 Integration of Functions of Several Variables

In Chap. 6, we have discussed integrals of functions of one variable, and some appli-
cations have been presented in Chap. 7. In this section, we extend the notion of
integration to functions of several variables. We consider in detail the case of two
variables. Subsequently, we present the case of more than two variables which is
analogous.
Definition of integrals in two dimensions. Consider a function f which is defined
in a rectangular region Q = [a, b] × [c, d] of the xy-plane. Let Q be partitioned into
rectangular subregions as in Fig. 8.25, let us denote by $A_P$ the area of such a small
rectangle P, by $f_P$ the value of f at an arbitrarily chosen point of P, and by Δ the
partition of Q consisting of all such P. If the sums $\sum_{P \in \Delta} f_P A_P$ approach a finite
limit when the partition Δ is made finer and finer, this limit is called the integral (or
double integral) of f over Q, and it is written as

\[
\int_Q f \, dA , \quad \text{or} \quad \int_Q f(x, y) \, dA . \tag{8.42}
\]

Fig. 8.25 Partition of a rectangle

The function f is then called integrable over Q. A formally precise definition of
this limit will be given in Appendix D.10. It always exists when f is continuous, but
it also exists for many discontinuous functions.
We now consider an arbitrary bounded region D of the plane. Let Q = [a, b] ×
[c, d] be an arbitrary rectangle which encloses D (that is, D ⊂ Q), and let us denote
by $1_D$ the function whose value is 1 for points in D and 0 at all other points. The
integral of f over D is defined as

\[
\int_D f(x, y) \, dA = \int_Q f(x, y)\, 1_D(x, y) \, dA , \tag{8.43}
\]

provided the integral on the right-hand side exists, that is, the product $f\,1_D$ is
integrable over Q. (Note that the function $1_D$ is discontinuous if D ≠ Q, so that $f\,1_D$
usually is discontinuous, too.) In particular, setting f = 1 we obtain the area of D,

\[
\text{area of } D = \int_Q 1_D(x, y) \, dA . \tag{8.44}
\]

Computation of two-dimensional integrals. The main tool is provided by Fubini’s
theorem, which reduces the computation of such an integral to two successive
computations of one-dimensional integrals.
Theorem 8.8 (Fubini) Let f be integrable over the rectangle Q = [a, b] × [c, d].
Then

\[
\int_Q f(x, y) \, dA = \int_c^d \left( \int_a^b f(x, y) \, dx \right) dy = \int_a^b \left( \int_c^d f(x, y) \, dy \right) dx . \tag{8.45}
\]
The integrals in the middle and on the right-hand side are called repeated (or iterated)
integrals.
To evaluate a repeated integral like $\int_c^d \left( \int_a^b f(x, y)\, dx \right) dy$, one first computes the
inner integral $\int_a^b f(x, y)\, dx$. The result will be an expression which usually contains
y. In the second step, this expression is integrated with respect to y. After having
understood this procedure, one may also omit the brackets on the right-hand side of
(8.45) and simply write $\int_c^d \int_a^b f(x, y)\, dx\, dy$ for the repeated integral.

Example 8.36 1. Evaluate $\int_Q f(x, y)\, dA$, where f(x, y) = x + 2y and Q is the
rectangle defined by 1 ≤ x ≤ 4 and 1 ≤ y ≤ 2.
2. Let Q be the rectangle given by 0 ≤ x ≤ 1 and −1 ≤ y ≤ 2. Express the double
integral of $x^2 y^2 (x^2 - y^3)$ over Q as a repeated integral in two different ways and
evaluate each.
Solution:
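1. By Fubini’s theorem, the double integral becomes a repeated integral,

\[
\int_Q (x + 2y) \, dA = \int_1^2 \left( \int_1^4 (x + 2y) \, dx \right) dy .
\]

We first evaluate the inner integral,

\[
\int_1^4 (x + 2y) \, dx = \left[ \frac{x^2}{2} + 2yx \right]_1^4 = \frac{15}{2} + 6y .
\]

We insert this result into the outer integral and obtain

\[
\int_Q (x + 2y) \, dA = \int_1^2 \left( \frac{15}{2} + 6y \right) dy = \left[ \frac{15}{2}\, y + 3y^2 \right]_1^2
= (15 + 12) - \left( \frac{15}{2} + 3 \right) = \frac{33}{2} .
\]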
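2. In the first variant, we put the y-integration inside and the x-integration outside.
This gives

\[
\begin{aligned}
\int_Q x^2 y^2 (x^2 - y^3) \, dA &= \int_0^1 \left( \int_{-1}^{2} x^2 y^2 (x^2 - y^3) \, dy \right) dx \\
&= \int_0^1 \left( \int_{-1}^{2} \left( x^4 y^2 - x^2 y^5 \right) dy \right) dx
 = \int_0^1 \left[ \frac{x^4 y^3}{3} - \frac{1}{6}\, x^2 y^6 \right]_{y=-1}^{y=2} dx \\
&= \int_0^1 \left( 3x^4 - \frac{21}{2}\, x^2 \right) dx = \left[ \frac{3}{5}\, x^5 - \frac{7}{2}\, x^3 \right]_0^1 = \frac{3}{5} - \frac{7}{2} = -\frac{29}{10} .
\end{aligned}
\]

In the second variant, we do it the other way round,

\[
\begin{aligned}
\int_Q x^2 y^2 (x^2 - y^3) \, dA &= \int_{-1}^{2} \left( \int_0^1 x^2 y^2 (x^2 - y^3) \, dx \right) dy \\
&= \int_{-1}^{2} \left( \int_0^1 \left( x^4 y^2 - x^2 y^5 \right) dx \right) dy
 = \int_{-1}^{2} \left[ \frac{x^5 y^2}{5} - \frac{1}{3}\, x^3 y^5 \right]_{x=0}^{x=1} dy \\
&= \int_{-1}^{2} \left( \frac{y^2}{5} - \frac{1}{3}\, y^5 \right) dy = \left[ \frac{y^3}{15} - \frac{y^6}{18} \right]_{-1}^{2}
 = \frac{8}{15} - \frac{64}{18} + \frac{1}{15} + \frac{1}{18} = -\frac{29}{10} .
\end{aligned}
\]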
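
The limit of sums used to define the double integral can be imitated on a computer. The following is a minimal sketch, not taken from the text: it assumes a uniform partition of the rectangle and chooses the midpoint of each subrectangle as the sample point; the function name and the grid sizes are illustrative. Applied to the two integrals of Example 8.36, it should reproduce 33/2 and −29/10 up to a small discretization error.

```python
def midpoint_double_integral(f, a, b, c, d, nx=200, ny=200):
    """Approximate the integral of f over [a, b] x [c, d] by a sum of f_P * A_P.

    The rectangle is split into nx * ny equal subrectangles P, f_P is the
    value of f at the midpoint of P, and A_P = hx * hy is the area of P.
    """
    hx = (b - a) / nx
    hy = (d - c) / ny
    total = 0.0
    for i in range(nx):
        x = a + (i + 0.5) * hx
        for j in range(ny):
            y = c + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

# Example 8.36, part 1: f(x, y) = x + 2y on 1 <= x <= 4, 1 <= y <= 2.
part1 = midpoint_double_integral(lambda x, y: x + 2 * y, 1, 4, 1, 2)

# Example 8.36, part 2: x^2 y^2 (x^2 - y^3) on 0 <= x <= 1, -1 <= y <= 2.
part2 = midpoint_double_integral(
    lambda x, y: x**2 * y**2 * (x**2 - y**3), 0, 1, -1, 2
)

if __name__ == "__main__":
    print(part1, part2)
```

Since the sum runs over all subrectangles at once, it does not depend on the order of integration, in line with Fubini's theorem.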

Integrals over non-rectangular domains D also often can be reduced to a repeated
integral. Suppose, for example, that each line “x = constant” which crosses the
boundary of D does so in just two points $y_1(x)$ and $y_2(x)$, where $y_1(x) < y_2(x)$.
Then

\[
\int_D f(x, y) \, dA = \int_a^b \int_{y_1(x)}^{y_2(x)} f(x, y) \, dy \, dx , \tag{8.46}
\]

where a and b are the smallest and largest values of x in D. This follows from
Fubini’s theorem, applied to the function $f\,1_D$ according to (8.43), since we have for
the inner integral

\[
\int_c^d f(x, y)\, 1_D(x, y) \, dy = \int_{y_1(x)}^{y_2(x)} f(x, y) \, dy .
\]

Example 8.37 Evaluate $\int_D f(x, y)\, dA$, where $f(x, y) = x e^{y}$ and D is the plane
region bounded by the graphs of $y = x^2$ and y = x.
Solution: The region D is shown by the shaded portion in Fig. 8.26. The points of
intersection of the two curves are obtained by solving the equation x 2 = x, giving
x = 0 and x = 1. Thus we obtain D = {(x, y) : 0 ≤ x ≤ 1 , x 2 ≤ y ≤ x}. Applying
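... we get

\[
\begin{aligned}
\int_D x e^{y} \, dA &= \int_0^1 \left( \int_{x^2}^{x} x e^{y} \, dy \right) dx = \int_0^1 x \left( \int_{x^2}^{x} e^{y} \, dy \right) dx \\
&= \int_0^1 x \left[ e^{y} \right]_{y=x^2}^{y=x} dx = \int_0^1 \left( x e^{x} - x e^{x^2} \right) dx \\
&= \int_0^1 x e^{x} \, dx - \int_0^1 x e^{x^2} \, dx .
\end{aligned}
\]

The first of these two integrals is computed with integration by parts,

\[
\int_0^1 x e^{x} \, dx = \left[ x e^{x} \right]_0^1 - \int_0^1 e^{x} \, dx = e^{1} - \left[ e^{x} \right]_0^1 = 1 ,
\]

the second with the substitution $t = x^2$,

\[
- \int_0^1 x e^{x^2} \, dx = - \frac{1}{2} \int_0^1 e^{t} \, dt = - \frac{1}{2}\, e + \frac{1}{2} .
\]

Thus

\[
\int_D x e^{y} \, dA = 1 - \frac{1}{2}\, e + \frac{1}{2} = \frac{1}{2}\,(3 - e) .
\]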
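
The reduction to a repeated integral with variable limits can be mirrored numerically. The sketch below is an illustration under stated assumptions, not the book's method: it uses the midpoint rule for both the outer and the inner integral, and the names `iterated_integral`, `y1` and `y2` are chosen here for convenience. For Example 8.37 the result should be close to (3 − e)/2 ≈ 0.1409.

```python
import math

def iterated_integral(f, a, b, y1, y2, n=400):
    """Midpoint approximation of int_a^b ( int_{y1(x)}^{y2(x)} f(x, y) dy ) dx."""
    hx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        lo, hi = y1(x), y2(x)          # limits of the inner integral depend on x
        hy = (hi - lo) / n
        inner = sum(f(x, lo + (j + 0.5) * hy) for j in range(n)) * hy
        total += inner * hx
    return total

# Example 8.37: f(x, y) = x e^y over D = {0 <= x <= 1, x^2 <= y <= x}.
approx = iterated_integral(
    lambda x, y: x * math.exp(y), 0.0, 1.0, lambda x: x * x, lambda x: x
)
exact = (3 - math.e) / 2

if __name__ == "__main__":
    print(approx, exact)
```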

The properties of linearity and monotonicity, formulated in (6.21)–(6.25) for
one-dimensional integrals, are also valid for two-dimensional integrals, so

\[
\begin{aligned}
\int_D (\alpha f + \beta g) \, dA &= \alpha \int_D f \, dA + \beta \int_D g \, dA , \\
\int_D f \, dA &\ge \int_D g \, dA , \quad \text{if } f \ge g ,
\end{aligned}
\tag{8.47}
\]

hold for integrable functions f, g and scalars α, β.


From the very beginning, we have seen that the integral of a nonnegative function
y = f (x) defined on a one-dimensional interval gives the (two-dimensional) area
below the graph of f . In an analogous manner, we can obtain the (three-dimensional)
volume V of the solid bounded from above by the surface z = f (x, y) (assuming f
to be nonnegative) and from below by some two-dimensional region D by

\[
V = \int_D f(x, y) \, dA .
\]

Integrals and polar coordinates. To evaluate the integral of a function f of a
single variable, one may use the substitution rule, Theorem 6.5. For the substitution
x = g(r), it reads as

\[
\int_{g(a)}^{g(b)} f(x) \, dx = \int_a^b f(g(r))\, g'(r) \, dr . \tag{8.48}
\]

A corresponding rule can be used for functions of two variables. We explain this
procedure for the case where Cartesian coordinates (x, y) are substituted by polar
coordinates (r, θ), that is, x = r cos θ and y = r sin θ. It connects the integral over a
plane region G expressed in polar coordinates to the integral over the corresponding
region
D = {(r cos θ, r sin θ ) : (r, θ) ∈ G} (8.49)

in Cartesian coordinates.
Theorem 8.9 Let G and D be as above, let f be integrable over D. Then

\[
\int_D f(x, y) \, dA = \int_G f(r \cos\theta, r \sin\theta)\, r \, dA . \tag{8.50}
\]

Fig. 8.26 Illustration of Theorem 8.9

The plane regions D and G in (8.50) correspond to the intervals [g(a), g(b)] and
[a, b] in (8.48), and the factor r in (8.50) corresponds to the factor g'(r) in (8.48).
We will not prove this theorem.

Example 8.38 Evaluate the integral $\int_D f(x, y)\, dA$, where D is the unit circle and
$f(x, y) = x^2$.

Solution: Expressed in polar coordinates, the unit circle D in the plane takes on the
form of the rectangular region G = {(r, θ) : 0 ≤ r ≤ 1 , 0 ≤ θ ≤ 2π}. We compute,
using Theorem 8.9 and Fubini’s theorem,

\[
\begin{aligned}
\int_D f(x, y) \, dA &= \int_D x^2 \, dA = \int_G (r \cos\theta)^2\, r \, dA \\
&= \int_0^1 \int_0^{2\pi} r^3 \cos^2\theta \, d\theta \, dr = \int_0^1 r^3 \, dr \cdot \int_0^{2\pi} \cos^2\theta \, d\theta \\
&= \left[ \frac{1}{4}\, r^4 \right]_{r=0}^{r=1} \cdot \pi = \frac{\pi}{4} .
\end{aligned}
\]

As an alternative solution, one might proceed along the lines of Example 8.37. This
leads to

\[
\int_D f(x, y) \, dA = \int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} x^2 \, dy \, dx = \int_{-1}^{1} x^2 \cdot 2\sqrt{1 - x^2} \, dx = \ldots ,
\]

but this procedure is more involved.
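
The role of the extra factor r in (8.50) can be made visible with a small numerical experiment. This is a sketch under assumptions that are not in the text: a midpoint rule on the rectangle G = [0, r_max] × [0, 2π] in the (r, θ)-plane, with the illustrative function name `polar_integral`. For f(x, y) = x² on the unit disk the value should be close to π/4, and for f = 1 it should be close to the area π.

```python
import math

def polar_integral(f, r_max, n_r=200, n_theta=200):
    """Approximate the integral of f(x, y) over the disk of radius r_max.

    The sum is taken over a uniform grid in (r, theta); each term carries
    the factor r that appears in the transformation formula (8.50).
    """
    hr = r_max / n_r
    ht = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for j in range(n_theta):
            theta = (j + 0.5) * ht
            total += f(r * math.cos(theta), r * math.sin(theta)) * r
    return total * hr * ht

# Example 8.38: f(x, y) = x^2 over the unit disk.
approx = polar_integral(lambda x, y: x * x, 1.0)

if __name__ == "__main__":
    print(approx, math.pi / 4)
```

Dropping the factor r in the loop would give the integral of cos²θ·r² over G instead, which is π/3 rather than π/4.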


Integrals in three dimensions. To define such integrals, one carries out a procedure
analogous to that described above for two dimensions. One first defines the integral

\[
\int_Q f(x, y, z) \, dV \tag{8.51}
\]

for functions f defined on rectangular solids $Q = [a_1, b_1] \times [a_2, b_2] \times [a_3, b_3]$ by a
limit process involving small such solids. In a second step, one defines the integral
over an arbitrary bounded domain D by

\[
\int_D f(x, y, z) \, dV = \int_Q f(x, y, z)\, 1_D(x, y, z) \, dV , \tag{8.52}
\]

where Q is any rectangular solid enclosing D, and $1_D$ is the function equal to 1 on
D and to 0 elsewhere. We refer to Appendix D.10 for a more detailed exposition.
The integrals in (8.51) and (8.52) are called three-dimensional or triple integrals.
The triple integral can be expressed as a repeated integral just as a double integral.
For $Q = [a_1, b_1] \times [a_2, b_2] \times [a_3, b_3]$ we get

\[
\int_Q f(x, y, z) \, dV = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{a_3}^{b_3} f(x, y, z) \, dz \, dy \, dx .
\]

The sequence of the one-dimensional integrations on the right-hand side can be
interchanged arbitrarily.
Integrals and spherical coordinates. In three dimensions, spherical coordinates
are often convenient (like polar coordinates in two dimensions). We denote them by
(r, ϕ, θ); r has the meaning of a radius, while ϕ and θ stand for angles. Transformation
from Cartesian to spherical coordinates is done via the formulas

x = r sin ϕ cos θ
y = r sin ϕ sin θ (8.53)
z = r cos ϕ .

If r is kept fixed, and if ϕ varies in [0, π ] and θ in [0, 2π], we get all points of the sphere
of radius r centered at the origin. Indeed, one may check that x 2 + y 2 + z 2 = r 2 .
The angle ϕ = 0 corresponds to the “north pole” (0, 0, r ), the angle ϕ = π to the
“south pole” (0, 0, −r ), and for ϕ = π/2 we get the “equator”, that is, the circle
(r cos θ, r sin θ, 0) in the x y-plane. The transformation formula analogous to (8.50)
connects the integral over a spatial region G expressed in spherical coordinates to
the integral over the corresponding region

D = {(r sin ϕ cos θ, r sin ϕ sin θ, r cos ϕ) : (r, ϕ, θ) ∈ G} (8.54)

in Cartesian coordinates according to the following theorem.


Theorem 8.10 Let G and D be as above, let f be integrable over D. Then

\[
\int_D f(x, y, z) \, dV = \int_G f(r \sin\varphi \cos\theta, r \sin\varphi \sin\theta, r \cos\varphi)\, r^2 \sin\varphi \, dV . \tag{8.55}
\]

Again, we will not prove this theorem.

Example 8.39 Evaluate the integral $\int_D f(x, y, z)\, dV$, where D is the unit ball
and $f(x, y, z) = x^2 + y^2 + z^2$.

Solution: Expressed in spherical coordinates, the unit ball D in three-dimensional
space takes on the form of the rectangular region

G = {(r, ϕ, θ) : 0 ≤ r ≤ 1 , 0 ≤ ϕ ≤ π , 0 ≤ θ ≤ 2π} .

We compute, using Theorem 8.10 and Fubini’s theorem,

\[
\begin{aligned}
\int_D f(x, y, z) \, dV &= \int_D (x^2 + y^2 + z^2) \, dV = \int_G r^2 \cdot r^2 \sin\varphi \, dV \\
&= \int_0^1 \int_0^{\pi} \int_0^{2\pi} r^4 \sin\varphi \, d\theta \, d\varphi \, dr \\
&= \int_0^1 r^4 \, dr \cdot \int_0^{\pi} \sin\varphi \, d\varphi \cdot \int_0^{2\pi} 1 \, d\theta \\
&= \left[ \frac{1}{5}\, r^5 \right]_{r=0}^{r=1} \cdot 2 \cdot 2\pi = \frac{4}{5}\, \pi .
\end{aligned}
\]
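
The same kind of numerical check works in three dimensions. The following sketch assumes a midpoint rule on the box G in the (r, ϕ, θ)-space together with the transformation (8.53) and the factor r² sin ϕ from (8.55); the function name and the grid size n are illustrative choices. For Example 8.39 the result should be close to 4π/5 ≈ 2.513.

```python
import math

def spherical_integral(f, r_max, n=50):
    """Approximate the integral of f(x, y, z) over the ball of radius r_max.

    Midpoint rule on 0 <= r <= r_max, 0 <= phi <= pi, 0 <= theta <= 2*pi,
    with the volume factor r^2 * sin(phi) of spherical coordinates.
    """
    hr = r_max / n
    hp = math.pi / n
    ht = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            phi = (j + 0.5) * hp
            sin_phi, cos_phi = math.sin(phi), math.cos(phi)
            for k in range(n):
                theta = (k + 0.5) * ht
                x = r * sin_phi * math.cos(theta)
                y = r * sin_phi * math.sin(theta)
                z = r * cos_phi
                total += f(x, y, z) * r * r * sin_phi
    return total * hr * hp * ht

# Example 8.39: f(x, y, z) = x^2 + y^2 + z^2 over the unit ball.
approx = spherical_integral(lambda x, y, z: x * x + y * y + z * z, 1.0)
exact = 4 * math.pi / 5

if __name__ == "__main__":
    print(approx, exact)
```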

8.8 Applications of Double Integrals

8.8.1 Population of a City

Let the rectangular region D of Fig. 8.27 represent a certain district of a city, and
let f(x, y) be the population density function (the number of people per unit area)
defined at all points (x, y) ∈ D. Then the double integral $\int_D f(x, y)\, dA$ gives the
actual number of people living in the district under consideration.
Example 8.40 The population density of a certain city equals

\[
f(x, y) = 20000\, e^{-0.2|x| - 0.1|y|}
\]

people per square kilometer, where x and y are measured in kilometers and the origin
(0, 0) gives the location of the city hall. Determine the total population inside the
rectangular area described by

D = {(x, y) : −10 ≤ x ≤ 10 , −5 ≤ y ≤ 5} .

Fig. 8.27 A rectangular region D representing a district of a big city

Solution: By symmetry, it suffices to compute the population in the first quadrant of
Fig. 8.27, where |x| = x and |y| = y. Therefore the population in the district equals

\[
\begin{aligned}
\int_D f(x, y) \, dA &= 4 \int_0^{10} \int_0^{5} 2 \cdot 10^4\, e^{-0.2x} e^{-0.1y} \, dy \, dx \\
&= 4 \int_0^{10} \left[ -2 \cdot 10^5\, e^{-0.2x} e^{-0.1y} \right]_{y=0}^{y=5} dx = 8 \cdot 10^5\, (1 - e^{-0.5}) \int_0^{10} e^{-0.2x} \, dx \\
&= 4 \cdot 10^6\, (1 - e^{-0.5})(1 - e^{-2})
\end{aligned}
\]

or approximately 1,360,876 people.
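
The closed form can be evaluated directly. The short sketch below is an illustration, with hypothetical parameter names (`peak`, `kx`, `ky`, `x_half`, `y_half`) standing for the constants of Example 8.40; it relies on the fact that the density is a product of a function of x and a function of y, so that the double integral over one quadrant factors into two one-dimensional integrals.

```python
import math

def population(x_half=10.0, y_half=5.0, peak=20000.0, kx=0.2, ky=0.1):
    """Population in the rectangle |x| <= x_half, |y| <= y_half.

    The density is peak * exp(-kx*|x| - ky*|y|); by symmetry the total is
    four times the integral over the first quadrant.
    """
    ix = (1.0 - math.exp(-kx * x_half)) / kx   # integral of exp(-kx*x) over [0, x_half]
    iy = (1.0 - math.exp(-ky * y_half)) / ky   # integral of exp(-ky*y) over [0, y_half]
    return 4.0 * peak * ix * iy

if __name__ == "__main__":
    print(round(population()))
```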

8.8.2 Average Value of a Function of Two Variables

In Sect. 7.3, we have shown that the average value or mean value of a function of
one variable over an interval can be represented by its integral divided by the length
of the interval. An analogous result holds for functions of two variables, that is, if
such a function f is integrated over a plane region D, then its average value over D
is given by the quotient

\[
\frac{\int_D f(x, y) \, dA}{\text{area of } D} = \frac{\int_D f(x, y) \, dA}{\int_D 1 \, dA} , \tag{8.56}
\]

see (8.44). In particular, if D = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d} is a rectangle, then
the average value of f over D is given by

\[
\frac{1}{(b - a)(d - c)} \int_c^d \int_a^b f(x, y) \, dx \, dy = \frac{1}{(b - a)(d - c)} \int_a^b \int_c^d f(x, y) \, dy \, dx .
\]

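Example 8.41 Find the average value of the function f(x, y) = xy over the plane
region D bounded by the x-axis, the graph of $y = e^{x}$, and the lines x = 0 and x = 2.
Solution: The integral of f over D is given by

\[
\begin{aligned}
\int_D f(x, y) \, dA &= \int_0^2 \int_0^{e^{x}} xy \, dy \, dx = \int_0^2 \left[ \frac{1}{2}\, y^2 x \right]_{y=0}^{y=e^{x}} dx \\
&= \int_0^2 \frac{1}{2}\, x e^{2x} \, dx = \left[ \frac{1}{4}\, x e^{2x} \right]_0^2 - \int_0^2 \frac{1}{4}\, e^{2x} \, dx \\
&= \frac{1}{2}\, e^4 - \left[ \frac{1}{8}\, e^{2x} \right]_0^2 = \frac{1}{2}\, e^4 - \frac{1}{8}\, e^4 + \frac{1}{8} = \frac{1}{8}\,(3e^4 + 1) .
\end{aligned}
\]

The area of the region D is

\[
\int_D 1 \, dA = \int_0^2 \int_0^{e^{x}} 1 \, dy \, dx = \int_0^2 \left[ y \right]_{y=0}^{y=e^{x}} dx = \int_0^2 e^{x} \, dx = e^2 - 1 .
\]

By (8.56), the average value of f over D equals the quotient

\[
\frac{\int_D f(x, y) \, dA}{\int_D 1 \, dA} = \frac{\frac{1}{8}\,(3e^4 + 1)}{e^2 - 1} = \frac{1}{8} \cdot \frac{3e^4 + 1}{e^2 - 1} .
\]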
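
Formula (8.56) translates directly into a two-step computation: integrate f over the region, integrate the constant 1 over the same region, and divide. The sketch below assumes a region of the form 0 ≤ y ≤ y_top(x), a ≤ x ≤ b, and a midpoint rule; `region_integral`, `average_value` and `y_top` are names introduced here for illustration. For the region under y = e^x with 0 ≤ x ≤ 2 and f(x, y) = xy, the result should be close to (3e⁴ + 1)/(8(e² − 1)) ≈ 3.224.

```python
import math

def region_integral(f, a, b, y_top, n=300):
    """Midpoint approximation of int_a^b ( int_0^{y_top(x)} f(x, y) dy ) dx."""
    hx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        hy = y_top(x) / n
        total += sum(f(x, (j + 0.5) * hy) for j in range(n)) * hy * hx
    return total

def average_value(f, a, b, y_top, n=300):
    """Quotient (8.56): integral of f over the region divided by its area."""
    area = region_integral(lambda x, y: 1.0, a, b, y_top, n)
    return region_integral(f, a, b, y_top, n) / area

approx = average_value(lambda x, y: x * y, 0.0, 2.0, math.exp)
exact = (3 * math.exp(4) + 1) / (8 * (math.exp(2) - 1))

if __name__ == "__main__":
    print(approx, exact)
```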

8.8.3 Joint Probability Density Functions

In Sect. 7.6, we have established a relationship between the integral of a function f
of a single variable and the probability of an event to occur. Let us recall that f is
the density function belonging to some random variable x if the probability that an
observed value of x lies in an interval [a, b] is given by

\[
\int_a^b f(x) \, dx .
\]

Here, we consider two real-valued random variables x and y. Let x take values in
some interval $I_x$ and y take values in some interval $I_y$. Let f be a nonnegative function
of two variables with domain $D(f) = \{(x, y) : x \in I_x,\ y \in I_y\}$, that is, f(x, y) ≥ 0
for all (x, y) ∈ D(f). The function f is called the joint probability density function of x and
y, if

\[
\int_{D(f)} f(x, y) \, dA = 1 , \tag{8.57}
\]

and if for every planar region D ⊂ D(f), the probability that the pair (x, y) of the
observed values of the random variables lies in D is given by

\[
P((x, y) \in D) = \int_D f(x, y) \, dA . \tag{8.58}
\]

Example 8.42 A new car manufactured by some company carries a 50,000 km
warranty on its engine and its transmission. Preproduction tests indicate that the life
spans of the engine and the transmission are described by random variables x and y
with joint probability density function

\[
f(x, y) = 0.00004\, e^{-0.005x - 0.008y} , \quad x, y \ge 0 ,
\]

and the unit of measurement for x and y is 1000 km.

1. What is the probability that a new car chosen at random will have an engine
breakdown before the 50,000 km warranty expires?
2. What is the probability that a new car chosen at random will have a breakdown
of both its engine and its transmission before the 50,000 km warranty expires?
Solution: 1. The required probability is given by

P((x, y) ∈ D) , where D = {(x, y) : 0 ≤ x ≤ 50 , 0 ≤ y} .

We compute

\[
\begin{aligned}
P((x, y) \in D) &= \int_D f \, dA = \int_0^{\infty} \int_0^{50} f(x, y) \, dx \, dy \\
&= \int_0^{\infty} \int_0^{50} 0.00004\, e^{-0.005x - 0.008y} \, dx \, dy \\
&= 4 \cdot 10^{-5} \int_0^{\infty} \int_0^{50} e^{-0.005x} e^{-0.008y} \, dx \, dy \\
&= 4 \cdot 10^{-5} \int_0^{50} e^{-0.005x} \, dx \cdot \int_0^{\infty} e^{-0.008y} \, dy .
\end{aligned}
\]

These integrals give

\[
\int_0^{50} e^{-0.005x} \, dx = 200\,(1 - e^{-0.25}) , \qquad \int_0^{\infty} e^{-0.008y} \, dy = \frac{1}{0.008} ,
\]

so

\[
P((x, y) \in D) = 4 \cdot 10^{-5} \cdot 200\,(1 - e^{-0.25}) \cdot \frac{1}{0.008} = 1 - e^{-0.25} \approx 0.2212 .
\]
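2. The required probability is given by

P((x, y) ∈ D) , where D = {(x, y) : 0 ≤ x ≤ 50 , 0 ≤ y ≤ 50} .

We compute

\[
\begin{aligned}
P((x, y) \in D) &= \int_D f \, dA = \int_0^{50} \int_0^{50} f(x, y) \, dx \, dy \\
&= \int_0^{50} \int_0^{50} 0.00004\, e^{-0.005x - 0.008y} \, dx \, dy \\
&= 4 \cdot 10^{-5} \int_0^{50} \int_0^{50} e^{-0.005x} e^{-0.008y} \, dx \, dy \\
&= 4 \cdot 10^{-5} \int_0^{50} e^{-0.005x} \, dx \cdot \int_0^{50} e^{-0.008y} \, dy .
\end{aligned}
\]

These integrals give

\[
\int_0^{50} e^{-0.005x} \, dx = 200\,(1 - e^{-0.25}) , \qquad \int_0^{50} e^{-0.008y} \, dy = \frac{1}{0.008}\,(1 - e^{-0.4}) ,
\]

so as before

\[
P((x, y) \in D) = (1 - e^{-0.25})(1 - e^{-0.4}) \approx 0.0729 .
\]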
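
Both probabilities of Example 8.42 follow the same pattern, so they can be computed with one small function. The sketch below is illustrative only: the names `A`, `B`, `C` and `prob_rectangle` are introduced here, and the code assumes the density C·e^{−Ax−By} of the example, for which the double integral over a rectangle [0, x_max] × [0, y_max] factors into two one-dimensional integrals. An unbounded side is passed as `math.inf`.

```python
import math

A = 0.005    # decay rate for the engine life span x (per 1000 km)
B = 0.008    # decay rate for the transmission life span y (per 1000 km)
C = 0.00004  # normalizing constant of the density; note that C = A * B

def prob_rectangle(x_max, y_max):
    """P(0 <= x <= x_max, 0 <= y <= y_max) for the density C * exp(-A*x - B*y)."""
    ix = (1.0 - math.exp(-A * x_max)) / A   # integral of exp(-A*x) over [0, x_max]
    iy = (1.0 - math.exp(-B * y_max)) / B   # integral of exp(-B*y) over [0, y_max]
    return C * ix * iy

if __name__ == "__main__":
    print(prob_rectangle(math.inf, math.inf))  # total probability, condition (8.57)
    print(prob_rectangle(50, math.inf))        # part 1: engine fails within warranty
    print(prob_rectangle(50, 50))              # part 2: both fail within warranty
```

The first call checks the normalization (8.57); the other two correspond to the two parts of the example.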

8.9 Exercises

8.9.1 Give a formula for the function m = f (b, t) where m is the amount of money
in a bank account t years after an initial investment of b Indian rupees, if
interest occurs at a rate of 5% per year compounded
(a) annually, (b) continuously.
8.9.2 Suppose the concentration C (in mg per liter) of a drug in the blood is a
function of two variables x, the amount (in mg) of the drug given in the
injection, and t, the time (in hours) since the injection was administered. Let
C be given by

$C = f(x, t) = t e^{-t(9-x)}$, for 0 ≤ x ≤ 8 and t ≥ 0.

Explain the meaning of the cross sections
(a) f(8, t), (b) f(x, 1).
8.9.3 Let f(x) = x sin x. Evaluate
(a) $f(x - y)$, (b) $f\!\left(\frac{x}{y}\right)$, (c) $f(xy)$.
8.9.4 Let h(x, y, z) = x y z + 4. Evaluate
2 3

(a) $h(a + b, a - b, b)$, (b) $h(0, 0, 0)$, (c) $h(t, t^2, -t)$, (d) $h(-6, 4, 2)$.
8.9.5 Describe in words the domain of the following functions.
(a) $f(x, y) = x e^{-\sqrt{y+2}}$, (b) $f(x, y, z) = e^{xyz}$,
(c) $f(x, y, z) = \dfrac{xyz}{x + y + z}$.
8.9.6 Sketch the graph  of the functions given below.
(a) f (x, y) = x 2 + y 2 (b) f (x, y) = 4 − x 2 − y 2
(c) f (x, y) = x 2 + y 2 − 1.
8.9.7 (a) Sketch the level curve z = k for the specified values of k.
(i) z = x 2 + y 2 , k = 0, 1, 2, 3
(ii) z = x 2 − y 2 , k = −2, −1, 0, 1, 2.
(b) Sketch the level surface f (x, y, z) = c.
(i) f (x, y, z) = 4x 2 + y 2 + 4z 2 , c = 16
(ii) f (x, y, z) = 4x − 2y + z, c = 1.
8.9.8 Let T (x, y) be the temperature at a point (x, y) on a flat metal plate situated
in the x y-plane. The level curves of T are called isothermal curves. Assume
that T (x, y) is inversely proportional to the distance of (x, y) from the origin.
(a) Sketch the isothermal curves on which T = 1, T = 2, and T = 3.
(b) If the temperature at the point (4, 3) is 40 ◦ C, find an equation for the
isothermal curve belonging to a temperature of 20 ◦ C.
8.9.9 If V (x, y) is a potential (for example, the voltage) at a point (x, y) in the
x y-plane, the level curves of V are called equipotential curves. The potential
remains constant along such curves. For

\[
V(x, y) = \frac{8}{\sqrt{16 + x^2 + y^2}}
\]

sketch the equipotential curves at which V = 2, V = 1, and V = 0.5.


8.9.10 According to the ideal gas law, the pressure P, volume V , and temperature
T of a confined gas are related by the formula P V = nkT , where n is the
number of particles in the gas and k is the Boltzmann constant. Express P
as a function of V and T , and describe the level curves associated with this
function. What is the physical significance of these level curves?
8.9.11 The power P generated by a wind rotor is proportional to the product of the
area A swept out by the blades and the third power of the wind velocity v.
(a) Express P as a function of A and v.
(b) Describe the level curves of P and explain their physical meaning.
(c) Consider a rotor whose blades sweep out a circular area with a diameter
of 10 ft and which produces a power of 3000 watts at a wind velocity of
20 m/sec. Find an equation of the level curve P = 4000.

8.9.12 Assume that the atmospheric pressure near ground level in a certain region is
given by
$p(x, y) = ax^2 + by^2 + c$ ,

where a, b, and c are positive constants.


(a) Describe the isobars in this region for pressures greater than c.
(b) How are the low and the high pressures distributed in this region ?
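8.9.13 Find
(a) $\displaystyle \lim_{(x,y)\to(-1,2)} \frac{x y^3}{x + y}$,
(b) $\displaystyle \lim_{(x,y)\to(0,0)} \frac{x - y}{x^2 + y^2}$.

8.9.14 Suppose

\[
f(x, y) =
\begin{cases}
\dfrac{\sin (x^2 + y^2)}{x^2 + y^2} , & (x, y) \ne (0, 0) , \\[2ex]
1 , & (x, y) = (0, 0) .
\end{cases}
\]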

Show that f is continuous at (0, 0).


8.9.15 Describe the set of all points in the x y-plane at which√f is continuous.

(a) f (x, y) = ln (x + y − 1), (b) f (x, y) = xe 1−y .
2

2 ∂ z
2
∂ z
2
8.9.16 (a) z = y 2 e x + x 21y 3 . Find and .
∂ x∂ y ∂ y∂ x
x 2
∂ w
3
(b) Let w = 2 , find .
y +z 2 ∂z∂ y 2
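8.9.17 (a) Show that $f(x, y) = \arctan \dfrac{y}{x}$ satisfies the equation

\[
\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0 .
\]

(b) Show that $f(x, t) = (x - at)^4 + \cos (x + at)$ satisfies

\[
\frac{\partial^2 f}{\partial t^2} = a^2\, \frac{\partial^2 f}{\partial x^2} .
\]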

8.9.18 Show that the functions u = u(x, y) and v = v(x, y) given below satisfy the
following relations, known as the Cauchy–Riemann equations:

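\[
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = - \frac{\partial v}{\partial x} .
\]

(a) $u = x^2 - y^2$, $v = 2xy$,
(b) $u = e^{x} \cos y$, $v = e^{x} \sin y$,
(c) $u = \ln (x^2 + y^2)$, $v = 2 \tan^{-1}\!\left(\frac{y}{x}\right)$.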
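8.9.19 Let z = f(u) and u = g(x, y). Show that
(a) $\displaystyle \frac{\partial^2 z}{\partial x^2} = \frac{dz}{du}\, \frac{\partial^2 u}{\partial x^2} + \frac{d^2 z}{du^2} \left( \frac{\partial u}{\partial x} \right)^2$,
(b) $\displaystyle \frac{\partial^2 z}{\partial y^2} = \frac{dz}{du}\, \frac{\partial^2 u}{\partial y^2} + \frac{d^2 z}{du^2} \left( \frac{\partial u}{\partial y} \right)^2$,
(c) $\displaystyle \frac{\partial^2 z}{\partial y\,\partial x} = \frac{dz}{du}\, \frac{\partial^2 u}{\partial y\,\partial x} + \frac{d^2 z}{du^2}\, \frac{\partial u}{\partial x}\, \frac{\partial u}{\partial y}$.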
8.9.20 Show that among all parallelograms with perimeter l, a square with sides of
length l/4 has maximum area.
8.9.21 Determine the dimensions of a rectangular box, open at the top, having vol-
ume V , and requiring the least amount of material for its construction.
8.9.22 A company plans to manufacture closed rectangular boxes that have a volume
of 8 cubic feet. Find the dimensions that will minimize the cost if the material
for the top and bottom costs twice as much as the material for the sides.
8.9.23 A window has the shape of a rectangle surmounted by an isosceles triangle
as shown in Fig. 8.28. If the perimeter of the window is 4 m, what values of
x, y, and θ will maximize the total area?
8.9.24 Table 8.1 lists the relationship between semester averages and
scores on the final examination for ten students in a mathematics class.
Fit the data to a line and use the line to estimate the final examination grade
of a student with an average of 70.

Table 8.1 Relationship between semester averages and examination scores


Semester average 40 55 62 68 72 76 80 86 90 94
Final examination 30 45 65 72 60 82 76 92 88 98

8.9.25 In studying the stress–strain diagram of an elastic material, an engineer finds
that part of the curve appears to be linear. Experimental values are listed in
the following Table 8.2.
Fit these data to a straight line and estimate the strain when the stress is 2.5%.

Table 8.2 Data relating stress and strain of an elastic material

Stress (MPa)   4    4.4  4.8  5.2  5.6  6
Strain (%)     0.2  0.6  0.8  1.2  1.4  1.8

Fig. 8.28 Maximizing the area of a window

8.9.26 Apply the method of Lagrange multipliers to find maximum and minimum
values of
(a) f (x, y) = x y, where 4x 2 + 8y 2 = 16.
(b) f (x, y) = x − 3y − 1, where x 2 + 3y 2 = 16.
8.9.27 Find the point on the line 2x − 4y = 3 that is closest to the origin.
8.9.28 (a) Find the Taylor series of the function $e^{x} \ln y$ around the point (0, 1) up
to the terms of order 4.
(b) Expand $e^{x} \cos y$ in a Taylor series at $(1, \frac{\pi}{2})$. (Compute the terms up to
order 3.)
8.9.29 Evaluate the following integrals:
(a) $\displaystyle \int_0^3 \int_{-2}^{-1} (4xy^3 + y) \, dx \, dy$
(b) $\displaystyle \int_0^{\pi/6} \int_0^{\pi/2} (x \cos y - y \cos x) \, dy \, dx$.

8.9.30 Let an artificial lake be created bordering one side of a straight dam. The
shape of the lake surface is that of a region in the x y-plane bounded by the
graph of 2y = 16 − x 2 and x + 2y = 4. Find the area A of the surface of the
lake.
8.9.31 Find the area of the region D that lies outside the circle r = 100 and inside
the circle r = 200 sin θ.
8.9.32 Find the area of the region R bounded by one loop of the lemniscate r 2 =
a 2 sin 2θ .
8.9.33 Use polar coordinates to evaluate the following integrals.
(a) $\displaystyle \int_0^a \int_0^{\sqrt{a^2 - x^2}} e^{-(x^2 + y^2)} \, dy \, dx$.
(b) $\displaystyle \int_{-a}^{a} \int_0^{\sqrt{a^2 - x^2}} (x^2 + y^2)^{3/2} \, dy \, dx$.
Chapter 9
Vector Calculus

9.1 Introduction

In the previous chapters, we have studied properties of functions defined on R (the
line), R^2 (the plane), or R^3 (the space) with values in R, which are called
real-valued or scalar functions, or scalar fields. Here, we would like to study the calculus
of functions taking values in R^2 or R^3, instead of R. Those functions are called
vector-valued functions or vector fields. In Sect. 9.2 the concept of a vector will be
introduced along with its basic algebraic properties. Vector fields and their continuity
and differentiation properties are discussed in Sect. 9.3 along with the notions of
gradient, divergence, and curl. Moreover, we explain how curves and surfaces are
described by such functions. Integrals of vector fields are introduced in Sect. 9.4, first
the line integral for scalar and vector fields, and then the surface integral for scalar
fields. In Sect. 9.5 we present three fundamental theorems of vector calculus, namely,
the Green–Ostrogradski theorem, the Gauss divergence theorem, and the theorem of
Stokes. Section 9.6 is devoted to certain applications of the vector calculus to science
and engineering, with an emphasis on problems from various parts of mechanics.
The concept of a vector can be traced to the development of affine and ana-
lytic geometry in the seventeenth century and, later, the invention of complex num-
bers and quaternions. Vector calculus itself was developed mainly during the sec-
ond half of the nineteenth century; contributions stem from, among others, Sir
William Rowan Hamilton (1805–1865), William Kingdon Clifford (1845–1879),
James Clerk Maxwell (1831–1879), Hermann Günter Grassmann (1809–1877), and
Josiah Willard Gibbs (1839–1903). However, the fundamental results mentioned
above were already discovered by George Green (1793–1841), Carl Friedrich Gauss
(1777–1855), Gabriel Stokes (1819–1903), and Mikhail Ostrogradski (1801–1862).
Nowadays, vector calculus serves as a basic mathematical tool in all areas of sci-
ence and engineering, where mechanical, electromagnetic and thermodynamic forces
determine the behavior of solids, fluids, electric conductors or semiconductors, and
magnetic materials.

© Springer Nature Singapore Pte Ltd. 2019
M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_9

9.2 Vectors

We know that a number x is used to represent a point on a line, and a pair (x, y) of
numbers x and y is used to represent a point in the plane, see Fig. 9.1a and b.
Moreover, in the previous chapter, we have already represented points in space,
that is, three-dimensional space, by a triple (x, y, z) of numbers, see Fig. 9.2. The
number of components in such tuples equals the dimension of the corresponding
space (1 for the real line R, 2 for the plane R2 , and 3 for the space R3 ). Irrespective
of the dimension, those tuples are called vectors. Instead of (x, y) or (x, y, z),
we also denote them by (x1 , x2 ) or (x1 , x2 , x3 ), respectively. This notation is more
convenient when we want to go to higher dimensions and consider, for example, a
vector (x1 , x2 , x3 , x4 ) as an element of the 4-space R4 , or even—when we do not
want to specify the dimension as a fixed number—a vector (x1 , x2 , x3 , . . . , xn ) as an
element of n-space Rn , where n stands for an arbitrary natural number.
Since the main goal of this chapter is vector calculus in 3-space, when we speak
of vectors we will mean vectors in 3-space; otherwise, it will be made explicit.

Fig. 9.1 a Point on a line, b point in a plane

Fig. 9.2 A vector (or point) in three-dimensional space

The basic operations with vectors are addition, defined componentwise by

(x1 , x2 , x3 ) + (y1 , y2 , y3 ) = (x1 + y1 , x2 + y2 , x3 + y3 ) ,

and multiplication by scalars (real numbers) α,

α(x1 , x2 , x3 ) = (αx1 , αx2 , αx3 ) .

It is also very convenient to denote a vector by a single letter. We will use x (in
boldface type) to denote the vector (x1 , x2 , x3 ).
In arbitrary dimension, addition and scalar multiplication are defined analogously
by

(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )


α(x1 , x2 , · · · , xn ) = (αx1 , αx2 , . . . , αxn ) ,

where x1 , x2 , . . . , xn , y1 , y2 , . . . , yn and α are real numbers.


Let us note at this point that two vectors x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ) are
equal, that is x = y, if all their components are equal, that is, x1 = y1 , x2 = y2 , and
x3 = y3 .

Definition 9.1 Two vectors x and y are said to be parallel if x = αy (or y = αx)
for some real number α. If α > 0 then x and y are said to have the same direction;
if α < 0 then x and y are said to have opposite direction.

As an immediate consequence (take α = 0), we note that the vector 0 = (0, 0, 0)


is parallel to every vector. Moreover, if x and y are vectors which are parallel to a
vector z, then any linear combination αx + βy of x and y, where α and β are real
numbers, is also parallel to z.
Addition and scalar multiplication have the property that

x + y = y + x , (x + y) + z = x + (y + z) ,
α(x + y) =αx + αy , (α + β)x = αx + βx , (αβ)x = α(βx) ,

hold for all vectors x, y, and z and all scalars α and β. This follows from the corre-
sponding properties of ordinary addition and multiplication.
We know from elementary geometry that the length (or magnitude) of the vector
$\mathbf{x} = (x_1, x_2, x_3)$ is given by the expression

\[
\sqrt{x_1^2 + x_2^2 + x_3^2} .
\]

Definition 9.2 The Euclidean norm (or simply norm) $\|\mathbf{x}\|$ of a vector $\mathbf{x}$ is defined
as

\[
\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + x_3^2} .
\]

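The norm has the following properties, which are analogous to the properties of the
absolute value of a real number.

\[ \|\mathbf{x}\| \ge 0 , \quad \text{and } \|\mathbf{x}\| = 0 \text{ if and only if } \mathbf{x} = \mathbf{0} , \tag{9.1} \]
\[ \|\alpha\mathbf{x}\| = |\alpha|\,\|\mathbf{x}\| \quad \text{for scalars } \alpha , \tag{9.2} \]
\[ \|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\| . \tag{9.3} \]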

Properties (9.1) and (9.2) can be checked immediately from the definition. It will be
seen below that property (9.3) is a consequence of the Schwarz inequality. Setting
α = −1 in property (9.2), we obtain that

\[ \|-\mathbf{x}\| = \|\mathbf{x}\| \tag{9.4} \]

holds for every vector x.


The inequality $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$ is called the triangle inequality. Consider
the triangle formed by the points A = 0, B = x, and C = x + y. The triangle inequality
states that the length of the side connecting A and C cannot be larger than the
sum of the lengths of the other two sides.

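Example 9.1 Given x = (1, −2, 2) and y = (−1, 2, −2), compute

(i) $\|\mathbf{x}\|$, (ii) $\|\mathbf{y}\|$, (iii) $\|\mathbf{x} + \mathbf{y}\|$, (iv) $\|\mathbf{x} - \mathbf{y}\|$, and (v) $\|4\mathbf{x}\|$.

Solution:
(i) $\|\mathbf{x}\| = \sqrt{1^2 + (-2)^2 + 2^2} = \sqrt{9} = 3$.
(ii) $\|\mathbf{y}\| = \sqrt{(-1)^2 + 2^2 + (-2)^2} = \sqrt{9} = 3$.
(iii) $\mathbf{x} + \mathbf{y} = (1 - 1, -2 + 2, 2 - 2) = (0, 0, 0)$, so $\|\mathbf{x} + \mathbf{y}\| = 0$.
(iv) $\mathbf{x} - \mathbf{y} = (2, -4, 4)$, so $\|\mathbf{x} - \mathbf{y}\| = \sqrt{2^2 + (-4)^2 + 4^2} = \sqrt{36} = 6$.
(v) $4\mathbf{x} = (4, -8, 8)$, so $\|4\mathbf{x}\| = \sqrt{4^2 + (-8)^2 + 8^2} = \sqrt{16 + 64 + 64} = \sqrt{144} = 12$.
Alternatively, $\|4\mathbf{x}\| = 4\|\mathbf{x}\| = 4 \cdot 3 = 12$, by (i).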
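The computations of Example 9.1 can be reproduced with a few lines of code. The following is a minimal sketch in Python using only the standard library; the helper names `add`, `scale` and `norm` are illustrative choices, not notation from the text, and vectors are represented as plain tuples.

```python
import math

def add(x, y):
    """Componentwise sum of two vectors given as tuples."""
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    """Multiplication of the vector x by the scalar alpha."""
    return tuple(alpha * a for a in x)

def norm(x):
    """Euclidean norm: square root of the sum of squared components."""
    return math.sqrt(sum(a * a for a in x))

x = (1, -2, 2)
y = (-1, 2, -2)

norm_x = norm(x)                          # part (i)
norm_y = norm(y)                          # part (ii)
norm_sum = norm(add(x, y))                # part (iii)
norm_diff = norm(add(x, scale(-1, y)))    # part (iv)
norm_4x = norm(scale(4, x))               # part (v)
```

Here `norm_4x` agrees with `4 * norm_x`, which is property (9.2) with α = 4.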


A vector x with norm $\|\mathbf{x}\| = 1$ is called a unit vector. If x is a nonzero vector, then

\[ \mathbf{e} = \frac{\mathbf{x}}{\|\mathbf{x}\|} \]

is a unit vector. The unit vectors

i = (1, 0, 0) , j = (0, 1, 0) , k = (0, 0, 1)

(they obviously satisfy $\|\mathbf{i}\| = \|\mathbf{j}\| = \|\mathbf{k}\| = 1$) are also called unit coordinate vectors
or standard unit vectors, see Fig. 9.3. Every vector can be expressed as a linear
combination of the unit coordinate vectors: For x = (x1 , x2 , x3 ), we have

x = x1 i + x2 j + x3 k .

Fig. 9.3 Standard unit vectors

Indeed, from the rules of addition and scalar multiplication we see that

x = (x1 , 0, 0) + (0, x2 , 0) + (0, 0, x3 ) = x1 (1, 0, 0) + x2 (0, 1, 0) + x3 (0, 0, 1)


= x1 i + x2 j + x3 k .

Example 9.2 Let x = (3, −1, 1) and y = (2, 3, −1).

(i) Express 2x − y in terms of i, j, and k.
(ii) Calculate $\|3\mathbf{x} - \mathbf{y}\|$.

Solution:
(i) We have 2x = (6, −2, 2) and

2x − y = (6, −2, 2) − (2, 3, −1) = (6 − 2, −2 − 3, 2 + 1) = (4, −5, 3)
= 4i − 5j + 3k .

(ii) We have 3x − y = (9, −3, 3) − (2, 3, −1) = (7, −6, 4) and

\[ \|3\mathbf{x} - \mathbf{y}\| = \sqrt{7^2 + (-6)^2 + 4^2} = \sqrt{49 + 36 + 16} = \sqrt{101} . \]

Let us consider a vector x = x1 i + x2 j + x3 k. If x3 = 0, we have x = x1 i + x2 j, so
x lies in the plane spanned by the first two unit coordinate vectors. We may identify
this plane with the standard xy-plane spanned by the unit coordinate vectors (1, 0)
and (0, 1), which we continue to denote by i and j. Every vector x = (x1 , x2 ) in the
plane can be expressed as a linear combination

x = x1 (1, 0) + x2 (0, 1) = x1 i + x2 j

of the unit coordinate vectors.

Example 9.3 (a) Simplify the linear combinations (2i − j + k) + (i − 3j + 5k) and
2(i − j) + 6(2i + j − 2k).
(b) Calculate the norm of the vector (i − j) + 2(j − i) + (k − j).
(c) Find the unit vectors in the directions of x = (3, −4, 0) and y = i − 2j + 2k.
(d) Prove that $\bigl|\,\|\mathbf{x}\| - \|\mathbf{y}\|\,\bigr| \le \|\mathbf{x} - \mathbf{y}\|$ holds for all vectors x and y. (This inequality
is called the reverse triangle inequality.)

Solution:
(a) We have

(2i − j + k) + (i − 3j + 5k) = 3i − 4j + 6k ,
2(i − j) + 6(2i + j − 2k) = 2i − 2j + 12i + 6j − 12k = 14i + 4j − 12k .

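(b) We have

\[ \|(\mathbf{i} - \mathbf{j}) + 2(\mathbf{j} - \mathbf{i}) + (\mathbf{k} - \mathbf{j})\| = \|-\mathbf{i} + \mathbf{k}\| = \|(-1, 0, 0) + (0, 0, 1)\| = \|(-1, 0, 1)\| = \sqrt{(-1)^2 + 0^2 + 1^2} = \sqrt{2} . \]

(c) Let us denote by e the corresponding unit vector. For x and for y, respectively, we get

\[ \mathbf{e} = \frac{\mathbf{x}}{\|\mathbf{x}\|} = \frac{(3, -4, 0)}{\sqrt{9 + (-4)^2 + 0}} = \frac{(3, -4, 0)}{\sqrt{25}} = \left( \frac{3}{5}, -\frac{4}{5}, 0 \right) = \frac{3}{5}\mathbf{i} - \frac{4}{5}\mathbf{j} , \]

\[ \mathbf{e} = \frac{\mathbf{y}}{\|\mathbf{y}\|} = \frac{(1, -2, 2)}{\sqrt{1 + (-2)^2 + 2^2}} = \left( \frac{1}{3}, -\frac{2}{3}, \frac{2}{3} \right) = \frac{1}{3}\mathbf{i} - \frac{2}{3}\mathbf{j} + \frac{2}{3}\mathbf{k} . \]

(d) From the triangle inequality, we get $\|\mathbf{x}\| = \|\mathbf{x} - \mathbf{y} + \mathbf{y}\| \le \|\mathbf{x} - \mathbf{y}\| + \|\mathbf{y}\|$, so
$\|\mathbf{x}\| - \|\mathbf{y}\| \le \|\mathbf{x} - \mathbf{y}\|$. Interchanging the roles of x and y we get $\|\mathbf{y}\| - \|\mathbf{x}\| \le
\|\mathbf{y} - \mathbf{x}\| = \|-(\mathbf{y} - \mathbf{x})\| = \|\mathbf{x} - \mathbf{y}\|$ by (9.4). Putting those two inequalities
together we obtain $\bigl|\,\|\mathbf{x}\| - \|\mathbf{y}\|\,\bigr| \le \|\mathbf{x} - \mathbf{y}\|$, the desired result.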
The Scalar Product

Definition 9.3 (Scalar Product) Let x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ) be two
vectors. The scalar product (or dot product) of x and y is denoted by x · y and
defined by

x · y = x1 y1 + x2 y2 + x3 y3 .

Thus, the scalar product yields a scalar (a real number).

Example 9.4 Find the scalar product of the following vectors:



(a) x = (2, −1, 3) and y = (−3, 1, 4),


(b) y = (−3, 1, 4) and z = (1, 3, 0),
(c) x = 2i − 4j + k and y = i − j + 3k.
Solution:
(a) x · y = 2 · (−3) + (−1) · 1 + 3 · 4 = −6 − 1 + 12 = 5.
(b) y · z = −3 + 3 + 0 = 0.
(c) x · y = 2 · 1 + (−4) · (−1) + 1 · 3 = 9.
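As an illustration, the three scalar products of Example 9.4 can be checked mechanically. This is a small sketch in Python; the function name `dot` is an assumed helper, not something defined in the text.

```python
def dot(x, y):
    """Scalar product: sum of the products of corresponding components."""
    return sum(a * b for a, b in zip(x, y))

# Vectors from Example 9.4, written as component tuples.
part_a = dot((2, -1, 3), (-3, 1, 4))
part_b = dot((-3, 1, 4), (1, 3, 0))
part_c = dot((2, -4, 1), (1, -1, 3))   # 2i - 4j + k and i - j + 3k
```

Since the result in part (b) is zero, the two vectors of that part have a vanishing scalar product.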
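The scalar product and the norm are related by

\[ \mathbf{x} \cdot \mathbf{x} = \|\mathbf{x}\|^2 = x_1^2 + x_2^2 + x_3^2 , \qquad \|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}} , \tag{9.5} \]

which holds for every vector x = (x1 , x2 , x3 ).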


The scalar product is commutative, that is,

x·y=y·x

holds for all vectors x and y. Moreover, it is distributive in the sense that

x · (y + z) = x · y + x · z (9.6)
(x + y) · z = x · z + y · z (9.7)
(αx) · y = α(x · y) = x · (αy) (9.8)

holds for all vectors x, y, and z and all scalars α. In particular, setting α = 0 we have
for every vector x
x ·0 = 0 = 0·x.

All these rules follow immediately from the definition of the scalar product.
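Geometric interpretation of the scalar product. We look at the triangle in Fig. 9.4
with vertices A, B, and C which has a right angle at B. Setting x = B − A, y =
C − B, the theorem of Pythagoras tells us that

\[ \|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 . \tag{9.9} \]

Using (9.5) and the properties of the scalar product, we compute

\[ \|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{x} \cdot \mathbf{x} + \mathbf{x} \cdot \mathbf{y} + \mathbf{y} \cdot \mathbf{x} + \mathbf{y} \cdot \mathbf{y} = \|\mathbf{x}\|^2 + 2\,\mathbf{x} \cdot \mathbf{y} + \|\mathbf{y}\|^2 . \]

Comparing with (9.9), we see that we must have x · y = 0.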


Definition 9.4 Two vectors x and y are called orthogonal or perpendicular if
x · y = 0, and we write x⊥y in this case.
Fig. 9.4 Triangle with a right angle at B

Example 9.5 Examine whether the vectors x = (2, 1, 1) and y = (1, 1, −3) are
orthogonal.

Solution: We have x · y = 2 · 1 + 1 · 1 + 1 · (−3) = 2 + 1 − 3 = 0. This implies
x⊥y.

Fig. 9.5 The projection $p_{\mathbf{y}}(\mathbf{x}) = \alpha\mathbf{y}$

Let us now look at the triangle with vertices A, B, and C in Fig. 9.5, setting
x = B − A and y = C − B. We want to determine the point D on the side BC
such that the line AD becomes perpendicular to BC. We have D − B = αy for some
scalar α, and A − D = x − αy. In order to determine α, we exploit the orthogonality
relation

\[ 0 = \alpha\mathbf{y} \cdot (\mathbf{x} - \alpha\mathbf{y}) = \alpha\,\mathbf{y} \cdot \mathbf{x} - \alpha^2\,\mathbf{y} \cdot \mathbf{y} . \]

Solving for α, we obtain

\[ \alpha = \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{y} \cdot \mathbf{y}} . \tag{9.10} \]

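Definition 9.5 Let x, y be vectors with y ≠ 0. The projection of x on y, denoted
by $p_{\mathbf{y}}(\mathbf{x})$, is defined by

\[ p_{\mathbf{y}}(\mathbf{x}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{y} \cdot \mathbf{y}}\,\mathbf{y} . \tag{9.11} \]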
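The length of the projection is given by

\[ \|p_{\mathbf{y}}(\mathbf{x})\| = \frac{|\mathbf{x} \cdot \mathbf{y}|}{\|\mathbf{y}\|} , \tag{9.12} \]

since

\[ \|p_{\mathbf{y}}(\mathbf{x})\| = \left\| \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{y} \cdot \mathbf{y}}\,\mathbf{y} \right\| = \left| \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{y} \cdot \mathbf{y}} \right| \|\mathbf{y}\| = \frac{|\mathbf{x} \cdot \mathbf{y}|}{\|\mathbf{y}\|^2}\,\|\mathbf{y}\| = \frac{|\mathbf{x} \cdot \mathbf{y}|}{\|\mathbf{y}\|} . \]

In the special case where y is a unit vector, $\|\mathbf{y}\| = 1$, (9.11) becomes

\[ p_{\mathbf{y}}(\mathbf{x}) = (\mathbf{x} \cdot \mathbf{y})\,\mathbf{y} . \tag{9.13} \]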

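In this manner, setting

\[ \mathbf{x}_{I} = p_{\mathbf{y}}(\mathbf{x}) , \qquad \mathbf{x}_{II} = \mathbf{x} - p_{\mathbf{y}}(\mathbf{x}) , \tag{9.14} \]

we can decompose an arbitrary vector x into a sum $\mathbf{x} = \mathbf{x}_{I} + \mathbf{x}_{II}$ of a vector $\mathbf{x}_{I}$
parallel to y and a vector $\mathbf{x}_{II}$ perpendicular to y, provided y is nonzero. Indeed, (9.14)
is the only way to get a decomposition with those properties, so the decomposition is
unique. This appears natural when we recall the geometric construction made above;
we will not write down a formal proof here. One may check directly, however, that
the vectors $\mathbf{x}_{I}$ and $\mathbf{x}_{II}$ are orthogonal. We have

\[ \mathbf{x}_{I} \cdot \mathbf{x}_{II} = p_{\mathbf{y}}(\mathbf{x}) \cdot (\mathbf{x} - p_{\mathbf{y}}(\mathbf{x})) = \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{y} \cdot \mathbf{y}}\,\mathbf{y} \cdot \left( \mathbf{x} - \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{y} \cdot \mathbf{y}}\,\mathbf{y} \right) = \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{y} \cdot \mathbf{y}}\,(\mathbf{y} \cdot \mathbf{x}) - \frac{(\mathbf{x} \cdot \mathbf{y})^2}{(\mathbf{y} \cdot \mathbf{y})^2}\,(\mathbf{y} \cdot \mathbf{y}) = 0 . \]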
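The decomposition (9.14) is easy to try out numerically. The sketch below, in Python, uses exact rational arithmetic from the standard `fractions` module so that the orthogonality check is not disturbed by rounding; the names `dot`, `project`, `x_par` and `x_perp` are hypothetical helpers, and the vectors are those of Example 9.2, chosen only for illustration.

```python
from fractions import Fraction

def dot(x, y):
    """Scalar product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(x, y))

def project(x, y):
    """Projection of x on y, following (9.11); y must be nonzero."""
    yy = dot(y, y)
    if yy == 0:
        raise ValueError("projection on the zero vector is undefined")
    alpha = Fraction(dot(x, y), yy)   # the scalar from (9.10)
    return tuple(alpha * b for b in y)

x = (3, -1, 1)
y = (2, 3, -1)

x_par = project(x, y)                              # part parallel to y
x_perp = tuple(a - p for a, p in zip(x, x_par))    # part perpendicular to y
```

For these vectors α = 2/14 = 1/7, so `x_par` equals (2/7, 3/7, −1/7), and `dot(x_par, x_perp)` evaluates to exactly 0.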

We now relate the projection vector to the angle θ in Fig. 9.5. There θ is an acute
angle (less than 90°), and we have by (9.12)

\[ \cos\theta = \frac{\|p_{\mathbf{y}}(\mathbf{x})\|}{\|\mathbf{x}\|} = \frac{|\mathbf{x} \cdot \mathbf{y}|}{\|\mathbf{y}\|\,\|\mathbf{x}\|} . \]

If the angle θ is larger than 90°, then both the cosine and the scalar product x · y
become negative. (The reader is urged to draw a picture similar to Fig. 9.5 for this
situation.) The general formula relating the angle θ between x and y to the scalar
product is

\[ \cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|} . \tag{9.15} \]

Example 9.6 Find the angle between the vectors x = (2, 3, 2) and y = (1, 2, −1).

Solution: We determine the angle from formula (9.15) just above. We have x · y =
2 · 1 + 3 · 2 + 2 · (−1) = 6 and

\[ \|\mathbf{x}\| = \sqrt{4 + 9 + 4} = \sqrt{17} , \qquad \|\mathbf{y}\| = \sqrt{1 + 2^2 + (-1)^2} = \sqrt{6} . \]

Thus

\[ \cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|} = \frac{6}{\sqrt{17}\,\sqrt{6}} = \frac{1}{17}\sqrt{102} \approx \frac{10.1}{17} \approx 0.594 , \]

and θ ≈ 0.935 rad, which is about 54°.

Since the cosine ranges between −1 and 1,
we immediately get from (9.15) the Schwarz inequality (or Cauchy–Schwarz or
Cauchy–Bunyakovski–Schwarz inequality)

\[ |\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\| , \tag{9.16} \]

which is valid for any two vectors x and y. (Let us remark that one can prove this
inequality directly from the properties of the scalar product, without recourse to the
geometric construction used above.) Using the Schwarz inequality, we may now
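prove the triangle inequality. Indeed, for any two vectors x and y we have

\[
\begin{aligned}
\|\mathbf{x} + \mathbf{y}\|^2 &= (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{x} \cdot \mathbf{x} + \mathbf{y} \cdot \mathbf{x} + \mathbf{x} \cdot \mathbf{y} + \mathbf{y} \cdot \mathbf{y} \\
&\le \|\mathbf{x}\|^2 + 2|\mathbf{x} \cdot \mathbf{y}| + \|\mathbf{y}\|^2 \\
&\le \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\,\|\mathbf{y}\| + \|\mathbf{y}\|^2 \quad \text{by Schwarz's inequality} \\
&= (\|\mathbf{x}\| + \|\mathbf{y}\|)^2 .
\end{aligned}
\]

Taking the square root of both sides, we get

\[ \|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\| . \]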
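To make formula (9.15) and the two inequalities concrete, here is a short Python sketch that recomputes the angle of Example 9.6 and checks the Schwarz and triangle inequalities for the same pair of vectors. The helper names are assumptions made for this illustration; a check on one pair of vectors is of course not a proof.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = (2, 3, 2)
y = (1, 2, -1)

# Angle between x and y from (9.15).
cos_theta = dot(x, y) / (norm(x) * norm(y))
theta = math.acos(cos_theta)          # in radians
theta_degrees = math.degrees(theta)

# Schwarz inequality (9.16) and triangle inequality (9.3) for this pair.
schwarz_holds = abs(dot(x, y)) <= norm(x) * norm(y)
x_plus_y = tuple(a + b for a, b in zip(x, y))
triangle_holds = norm(x_plus_y) <= norm(x) + norm(y)
```

The values of `cos_theta` and `theta` agree with the rounded figures 0.594 and 0.935 rad quoted in the example.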

The Cross Product

Definition 9.6 (Cross Product) Let x = x1 i + x2 j + x3 k and y = y1 i + y2 j + y3 k.


The cross product (or vector product) of x and y is denoted by x × y and defined
as
x × y = (x2 y3 − x3 y2 )i + (x3 y1 − x1 y3 )j + (x1 y2 − x2 y1 )k . (9.17)

Thus, the cross product of two vectors yields a vector.

The formula (9.17) for x × y is related to determinants as follows:

\[
\mathbf{x} \times \mathbf{y} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix}
= \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix} \mathbf{i}
- \begin{vmatrix} x_1 & x_3 \\ y_1 & y_3 \end{vmatrix} \mathbf{j}
+ \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} \mathbf{k} .
\]

Actually, the expression between the two equality signs is not a true determinant, since
the first row consists of vectors, not of numbers, but it is a convenient way to memorize
formula (9.17). The standard rule for computing the three 2 × 2 determinants on the
right-hand side gives (9.17). Another way to memorize (9.17) is to observe the cyclic
behavior of the indices.

Fig. 9.6 Geometric interpretation of the cross product
Geometric interpretation of the cross product. The vector x × y is geometrically
related to the vectors x and y as follows. Whenever the vectors x and y are not parallel,
they form the sides of a parallelogram, see Fig. 9.6. The side AB is formed by the
vector x and has length $\|\mathbf{x}\|$, AD is formed by y with length $\|\mathbf{y}\|$. As we know, the
area of a parallelogram with base b and height h is given by bh; here $b = \|\mathbf{x}\|$ and
$h = \|\mathbf{y}\| \sin\theta$. It turns out (see the discussion of Lagrange’s identity below) that the
magnitude of x × y equals the area of that parallelogram, so

\[ \|\mathbf{x} \times \mathbf{y}\| = \|\mathbf{x}\|\,\|\mathbf{y}\| \sin\theta . \tag{9.18} \]

Moreover, x × y is orthogonal to both x and y. Finally, from the two remaining


possibilities the direction of x × y is chosen according to the so-called right-hand
rule: Point the index finger in the direction of x and the middle finger in the direction
of y. The thumb then points in the direction of x × y.
Below and in the exercises, we will show that these geometric properties follow
from Definition 9.6, if the coordinate system has a right-hand orientation (index finger
points to i, middle finger points to j, thumb points to k.)
Algebraic properties of the cross product. In contrast to the scalar product, the
cross product is anticommutative, that is,

y × x = −x × y (9.19)

holds for all vectors x and y. As a consequence, for all vectors x we have

x ×x = 0. (9.20)

On the other hand, the cross product shares some properties of the scalar product.
The distributive laws

x × (y + z) = x × y + x × z (9.21)
(x + y) × z = x × z + y × z (9.22)
(αx) × y = α(x × y) = x × (αy) (9.23)

hold for all vectors x, y, and z and all scalars α. In particular, setting α = 0 we have
for every vector x
x ×0 = 0 = 0×x.

The properties (9.19)–(9.23) can be verified directly from the definition of the cross
product.

Example 9.7 (a) Calculate x × y where x = (1, −2, 3) and y = (2, 1, −1).
(b) Show that i × j = k, j × k = i and k × i = j.
(c) Compute i × (i × j) and (i × i) × j.
(d) Compute x × y where x = i − j and y = i + k.

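Solution: (a) We have

\[
\mathbf{x} \times \mathbf{y} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & -2 & 3 \\ 2 & 1 & -1 \end{vmatrix}
= \begin{vmatrix} -2 & 3 \\ 1 & -1 \end{vmatrix} \mathbf{i}
- \begin{vmatrix} 1 & 3 \\ 2 & -1 \end{vmatrix} \mathbf{j}
+ \begin{vmatrix} 1 & -2 \\ 2 & 1 \end{vmatrix} \mathbf{k}
= -\mathbf{i} + 7\mathbf{j} + 5\mathbf{k} .
\]

(b) We have i = (1, 0, 0) and j = (0, 1, 0). We compute

\[
\mathbf{i} \times \mathbf{j} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix}
= \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix} \mathbf{i}
- \begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix} \mathbf{j}
+ \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} \mathbf{k}
= 0\mathbf{i} - 0\mathbf{j} + 1\mathbf{k} = \mathbf{k} ,
\]

so we have shown that i × j = k. Similarly, we can prove the other two identities.
(c) We obtain

i × (i × j) = i × k = −j , (i × i) × j = 0 × j = 0 .

(d) We calculate as before

\[
\mathbf{x} \times \mathbf{y} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & -1 & 0 \\ 1 & 0 & 1 \end{vmatrix}
= \begin{vmatrix} -1 & 0 \\ 0 & 1 \end{vmatrix} \mathbf{i}
- \begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} \mathbf{j}
+ \begin{vmatrix} 1 & -1 \\ 1 & 0 \end{vmatrix} \mathbf{k}
= -\mathbf{i} - \mathbf{j} + \mathbf{k} .
\]

Alternatively we may use part (b) above together with properties (9.19)–(9.23) to
compute

x × y = (i − j) × (i + k) = i × i + i × k − j × i − j × k = −j + k − i .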
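The results of Example 9.7 can be cross-checked by coding formula (9.17) directly. The following Python sketch does this; the function name `cross` and the tuple representation of i, j, k are assumptions made for illustration.

```python
def cross(x, y):
    """Cross product of two 3-vectors, written out as in (9.17)."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - x3 * y2,
            x3 * y1 - x1 * y3,
            x1 * y2 - x2 * y1)

i = (1, 0, 0)
j = (0, 1, 0)
k = (0, 0, 1)

part_a = cross((1, -2, 3), (2, 1, -1))      # expected -i + 7j + 5k
part_c_left = cross(i, cross(i, j))         # i x (i x j)
part_c_right = cross(cross(i, i), j)        # (i x i) x j
part_d = cross((1, -1, 0), (1, 0, 1))       # (i - j) x (i + k)
```

The two results in part (c) differ, which is the numerical face of the statement that the order of bracketing matters for the cross product.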

We note that, as we see from part (c) above, it may occur that

(x × y) × z ≠ x × (y × z) .

Thus, the cross product is not associative.


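We come back to formula (9.18), $\|\mathbf{x} \times \mathbf{y}\| = \|\mathbf{x}\|\,\|\mathbf{y}\| \sin\theta$. From formula (9.15),
we see that

\[ \|\mathbf{x}\|^2 \|\mathbf{y}\|^2 - (\mathbf{x} \cdot \mathbf{y})^2 = \|\mathbf{x}\|^2 \|\mathbf{y}\|^2 (1 - \cos^2\theta) = \|\mathbf{x}\|^2 \|\mathbf{y}\|^2 \sin^2\theta , \]

so (9.18) is equivalent to Lagrange’s identity

\[ \|\mathbf{x} \times \mathbf{y}\|^2 = \|\mathbf{x}\|^2 \|\mathbf{y}\|^2 - (\mathbf{x} \cdot \mathbf{y})^2 . \tag{9.24} \]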

Lagrange’s identity, in turn, is a special case (z = x, w = y) of the more general


formula
(x × y) · (z × w) = (x · z)(y · w) − (y · z)(x · w) . (9.25)

Therefore, formula (9.18) for the magnitude of the cross product follows from (9.25).
A derivation of (9.25) will be done in Exercise 9.7.2.
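Before working through the derivation, one can at least test (9.24) and (9.25) on sample vectors. The Python sketch below does so with integer vectors, so that both sides can be compared exactly; the function names and the particular test vectors are arbitrary choices for this illustration, and passing such a test does not replace the proof.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cross(x, y):
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)

def general_identity_gap(x, y, z, w):
    """Left-hand side minus right-hand side of (9.25)."""
    lhs = dot(cross(x, y), cross(z, w))
    rhs = dot(x, z) * dot(y, w) - dot(y, z) * dot(x, w)
    return lhs - rhs

def lagrange_gap(x, y):
    """Left-hand side minus right-hand side of (9.24), i.e. (9.25) with z = x, w = y."""
    return general_identity_gap(x, y, x, y)

x, y = (1, -2, 3), (2, 1, -1)
z, w = (3, -1, 1), (2, 3, -1)

gap_general = general_identity_gap(x, y, z, w)
gap_lagrange = lagrange_gap(x, y)
```

Both gaps come out as 0 for these vectors, as the identities predict.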
The scalar triple product. If x, y, and z are vectors, the expression (x × y) · z
is called the scalar triple product of x, y, and z. Its absolute value |(x × y) · z|
represents the volume of the parallelepiped with edges x, y, and z.
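As a small illustration of the scalar triple product as a volume, the following Python sketch computes (x × y) · z for three sample edge vectors and compares it with a 3 × 3 determinant expanded along its first row. The names `triple` and `det3` and the edge vectors are assumptions made for this example only.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cross(x, y):
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)

def triple(x, y, z):
    """Scalar triple product (x cross y) dot z."""
    return dot(cross(x, y), z)

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c (first-row expansion)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# A slanted box with base area 2 in the xy-plane and height 3.
x, y, z = (1, 0, 0), (1, 2, 0), (1, 1, 3)

volume = abs(triple(x, y, z))
```

For these edges the volume is 6, and `triple(x, y, z)` coincides with `det3(x, y, z)`.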

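Example 9.8 Show that the scalar triple product satisfies

\[
(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z} =
\begin{vmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{vmatrix} .
\]

Solution: We have

\[
\begin{aligned}
(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}
&= \left( \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix} \mathbf{i}
- \begin{vmatrix} x_1 & x_3 \\ y_1 & y_3 \end{vmatrix} \mathbf{j}
+ \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} \mathbf{k} \right) \cdot (z_1 \mathbf{i} + z_2 \mathbf{j} + z_3 \mathbf{k}) \\
&= z_1 \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix}
- z_2 \begin{vmatrix} x_1 & x_3 \\ y_1 & y_3 \end{vmatrix}
+ z_3 \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} .
\end{aligned}
\]

The latter expression is the expansion (along the third row) of the 3 × 3 determinant

\[
\begin{vmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{vmatrix} .
\]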

Scalar and vector products often appear in physics and engineering. Work is the scalar
product of force and displacement. Torque is the cross product of the position
(displacement) vector and the force, and angular momentum is the cross product of the
position vector and the linear momentum. Maxwell’s equations, which provide the
foundation of electromagnetic theory, involve both scalar and cross products of
electrical and magnetic variables.

9.3 Differential Calculus of Vector Fields

Let us begin with a specific situation. Let f1 , f2 , f3 be real-valued functions defined
on some interval I . Then for each t ∈ I we can form the vector

F(t) = f1 (t)i + f2 (t)j + f3 (t)k . (9.26)

All those points (that is, the points in the range of F) form a curve in 3-space; if we
think of t as a time variable, we may imagine an object moving along this curve,
passing the point F(t) at time t. The functions f 1 , f 2 , f 3 are called the component
functions, or simply the components of F.
In general, let us consider a function F defined on some set S. Whenever the
values of F are vectors, the function F is called a vector field (or a vector-valued
function, or a vector function). Its domain S may be an interval as above, or it may
be a subset of the plane R2 or the space R3 . In the latter case, the vector field has the
form
F(x, y, z) = f 1 (x, y, z)i + f 2 (x, y, z)j + f 3 (x, y, z)k . (9.27)

As an example, F may be the gravity field, which to every point P = (x, y, z) in
space associates the vector F(x, y, z) representing the gravity force at P. The same
situation occurs with the electric field and the magnetic field. Moreover, the flow of
a fluid gives rise to a velocity field which associates with each point P = (x, y, z)
the velocity vector of the fluid at this point. The form (9.27) describes the situation
when the velocity at a point P does not change with time (this is called stationary
flow); for nonstationary flow the vector field F and its components fi depend on time,
too,

F(x, y, z, t) = f1 (x, y, z, t)i + f2 (x, y, z, t)j + f3 (x, y, z, t)k ,

so in this case the domain S of F is a subset of 4-space. Many more examples of
vector fields will appear in the following.
In the two-dimensional situation, (9.27) simplifies to

F(x, y) = f 1 (x, y)i + f 2 (x, y)j .

In this case, the values of F are vectors in R2 instead of R3 .


As in Sect. 9.2 above, for the argument vector (x, y, z) we may also write
(x1 , x2 , x3 ), or x in compact notation. The value of F at x is then simply denoted by
F(x).

9.3.1 Curves

In this subsection, we consider vector fields which are defined on an interval I of


the real line, as in (9.26). Such vector fields are commonly called curves, since one
imagines the points F(t) to form a curve in space as t varies in I . Note that in Sect. 7.2
we have already encountered curves in 2-space of the form F(t) = f (t)i + g(t)j,
which we have called “curves in parametric form”.

Example 9.9
(a) The function F defined by $\mathbf{F}(t) = \sin t\,\mathbf{i} + \cos t\,\mathbf{j} + e^{t}\,\mathbf{k}$ is a vector field defined
on the whole line R.
(b) Let F(t) = x0 i + y0 j + z0 k, where x0 , y0 , z0 are constants. Since the right-hand
side does not depend on t, the range of F consists of a single point, and F is called a
constant vector field.
(c) Let f1 (t) = 2 cos t, f2 (t) = 2 sin t, f3 (t) = t. Write down the associated vector
field having f1 , f2 and f3 as components.

Solution: (c) The vector field is $\mathbf{F}(t) = 2\cos t\,\mathbf{i} + 2\sin t\,\mathbf{j} + t\,\mathbf{k}$, or $\mathbf{F}(t) = (2\cos t,
2\sin t, t)$ in vector notation.

Definition 9.7 Let F be a vector field defined on an interval I , let t0 ∈ I be given.
We say that the vector L is the limit of F as t tends to t0 , and we write

\[ \lim_{t \to t_0} \mathbf{F}(t) = \mathbf{L} , \]

if $\lim_{t \to t_0} \|\mathbf{F}(t) - \mathbf{L}\| = 0$. We say that F is continuous at t0 if

\[ \lim_{t \to t_0} \mathbf{F}(t) = \mathbf{F}(t_0) . \]

We say that F is continuous on I if it is continuous at every point t0 ∈ I .

The limit process can be carried out by components.

Theorem 9.1 Let F(t) = f1 (t)i + f2 (t)j + f3 (t)k and L = l1 i + l2 j + l3 k. Then
$\lim_{t \to t_0} \mathbf{F}(t) = \mathbf{L}$ if and only if

\[ \lim_{t \to t_0} f_1(t) = l_1 , \quad \lim_{t \to t_0} f_2(t) = l_2 , \quad \lim_{t \to t_0} f_3(t) = l_3 . \tag{9.28} \]

F is continuous at t0 if and only if all fi are continuous at t0 . F is continuous on I if
and only if all fi are continuous on I .

Verification: Since

\[ \|\mathbf{F}(t) - \mathbf{L}\| = \sqrt{\sum_{i=1}^{3} (f_i(t) - l_i)^2} \]

and therefore $0 \le |f_i(t) - l_i| \le \|\mathbf{F}(t) - \mathbf{L}\|$ for all i, the properties of limits and the
Sandwich theorem imply the first assertion. The other assertions are then a
consequence of Definition 9.7.
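Remark 9.1 If $\lim_{t \to t_0} \mathbf{F}(t) = \mathbf{L}$ then $\lim_{t \to t_0} \|\mathbf{F}(t)\| = \|\mathbf{L}\|$. As for scalar functions, the
converse of this result is false, namely $\lim_{t \to t_0} \|\mathbf{F}(t)\| = \|\mathbf{L}\|$ does not imply
$\lim_{t \to t_0} \mathbf{F}(t) = \mathbf{L}$.

Verification: By the reverse triangle inequality (see Example 9.3(d))

\[ 0 \le \bigl|\,\|\mathbf{F}(t)\| - \|\mathbf{L}\|\,\bigr| \le \|\mathbf{F}(t) - \mathbf{L}\| . \]

It follows from the Sandwich theorem that if $\lim_{t \to t_0} \|\mathbf{F}(t) - \mathbf{L}\| = 0$, then
$\lim_{t \to t_0} (\|\mathbf{F}(t)\| - \|\mathbf{L}\|) = 0$. For the converse choose F(t) = x and L = −x, where x is any
fixed nonzero vector. We have $\lim_{t \to t_0} \|\mathbf{F}(t)\| = \|\mathbf{x}\| = \|\mathbf{L}\|$, but $\lim_{t \to t_0} \mathbf{F}(t) = \mathbf{x} \ne -\mathbf{x} = \mathbf{L}$.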
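The usual properties of the limit (see Chap. 2) can be extended to limits of vector
fields. Let F(t) → L, G(t) → M and α(t) → A as t → t0 . Then

\[ \lim_{t \to t_0} (\mathbf{F}(t) + \mathbf{G}(t)) = \lim_{t \to t_0} \mathbf{F}(t) + \lim_{t \to t_0} \mathbf{G}(t) = \mathbf{L} + \mathbf{M} , \tag{9.29} \]
\[ \lim_{t \to t_0} (\alpha(t)\mathbf{F}(t)) = \Bigl( \lim_{t \to t_0} \alpha(t) \Bigr) \Bigl( \lim_{t \to t_0} \mathbf{F}(t) \Bigr) = A\mathbf{L} , \tag{9.30} \]

in particular, if β is a scalar and x is a vector,

\[ \lim_{t \to t_0} (\beta\mathbf{F}(t)) = \beta \lim_{t \to t_0} \mathbf{F}(t) = \beta\mathbf{L} , \qquad \lim_{t \to t_0} (\alpha(t)\mathbf{x}) = \Bigl( \lim_{t \to t_0} \alpha(t) \Bigr) \mathbf{x} = A\mathbf{x} , \tag{9.31} \]

and moreover

\[ \lim_{t \to t_0} (\mathbf{F}(t) \cdot \mathbf{G}(t)) = \Bigl( \lim_{t \to t_0} \mathbf{F}(t) \Bigr) \cdot \Bigl( \lim_{t \to t_0} \mathbf{G}(t) \Bigr) = \mathbf{L} \cdot \mathbf{M} , \tag{9.32} \]
\[ \lim_{t \to t_0} (\mathbf{F}(t) \times \mathbf{G}(t)) = \mathbf{L} \times \mathbf{M} . \tag{9.33} \]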

Definition 9.8 (Derivative of a vector field) We define the derivative of F at t0 by

\[ \mathbf{F}'(t_0) = \lim_{h \to 0} \frac{\mathbf{F}(t_0 + h) - \mathbf{F}(t_0)}{h} , \]

if this limit exists. In that case, F is called differentiable at t0 . Alternatively, $\mathbf{F}'(t_0)$
is also denoted as $\dfrac{d\mathbf{F}}{dt}(t_0)$.
Geometrically, $\mathbf{F}'(t_0)$ points along the tangent of the curve at the point F(t0 ), see
Fig. 9.7.

Fig. 9.7 Tangent vector

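Differentiation can be carried out component by component, that is, if F(t) =
f1 (t)i + f2 (t)j + f3 (t)k is differentiable at t, then

\[ \mathbf{F}'(t) = f_1'(t)\,\mathbf{i} + f_2'(t)\,\mathbf{j} + f_3'(t)\,\mathbf{k} . \tag{9.34} \]

Indeed, applying rules (9.29) and (9.31), we see that

\[
\begin{aligned}
\mathbf{F}'(t) &= \lim_{h \to 0} \frac{\mathbf{F}(t + h) - \mathbf{F}(t)}{h} \\
&= \lim_{h \to 0} \left( \frac{f_1(t + h) - f_1(t)}{h}\,\mathbf{i} + \frac{f_2(t + h) - f_2(t)}{h}\,\mathbf{j} + \frac{f_3(t + h) - f_3(t)}{h}\,\mathbf{k} \right) \\
&= \lim_{h \to 0} \frac{f_1(t + h) - f_1(t)}{h}\,\mathbf{i} + \lim_{h \to 0} \frac{f_2(t + h) - f_2(t)}{h}\,\mathbf{j} + \lim_{h \to 0} \frac{f_3(t + h) - f_3(t)}{h}\,\mathbf{k} \\
&= f_1'(t)\,\mathbf{i} + f_2'(t)\,\mathbf{j} + f_3'(t)\,\mathbf{k} .
\end{aligned}
\]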
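The componentwise rule can be compared with the difference quotient from the definition of the derivative. The following Python sketch does this for the curve of Example 9.9(c), F(t) = (2 cos t, 2 sin t, t); the function names, the evaluation point t = 1 and the step size h = 1e-6 are assumptions made for this illustration, and the agreement is only approximate because h is finite.

```python
import math

def F(t):
    """The curve from Example 9.9(c), as a tuple of components."""
    return (2 * math.cos(t), 2 * math.sin(t), t)

def F_prime(t):
    """Componentwise derivative of F, following (9.34)."""
    return (-2 * math.sin(t), 2 * math.cos(t), 1.0)

def difference_quotient(curve, t, h):
    """(curve(t + h) - curve(t)) / h, computed component by component."""
    ahead, here = curve(t + h), curve(t)
    return tuple((a - b) / h for a, b in zip(ahead, here))

approx = difference_quotient(F, 1.0, 1e-6)
exact = F_prime(1.0)
max_error = max(abs(a - e) for a, e in zip(approx, exact))
```

With this step size the largest componentwise deviation `max_error` is of the order of h itself.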

Differentiation properties of vector fields. We have already seen in Sect. 1.5 that
we may define, for example, the sum f + g of two functions by ( f + g)(x) =
f (x) + g(x). In this manner, operations performed on function values give rise to
operations on the functions themselves. The same applies to vector fields. Let F and
G be two vector fields defined on an interval I , and let α, β be scalars. We define

(F + G)(t) = F(t) + G(t) , (αF)(t) = αF(t) , (9.35)
thus (αF + βG)(t) = αF(t) + βG(t) , (9.36)
(F · G)(t) = F(t) · G(t) , (9.37)
(F × G)(t) = F(t) × G(t) . (9.38)

Moreover, when v is a scalar function defined on I , we set

(vF)(t) = v(t)F(t) . (9.39)

We also consider the composition

(F ◦ u)(t) = F(u(t))

for a scalar function u whose range is contained in I .


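The following rules hold for differentiation.

\[ (\mathbf{F} + \mathbf{G})'(t) = \mathbf{F}'(t) + \mathbf{G}'(t) , \tag{9.40} \]
\[ (\alpha\mathbf{F})'(t) = \alpha\mathbf{F}'(t) \quad \text{for constants } \alpha , \tag{9.41} \]
\[ (v\mathbf{F})'(t) = v(t)\mathbf{F}'(t) + v'(t)\mathbf{F}(t) , \tag{9.42} \]
\[ (\mathbf{F} \cdot \mathbf{G})'(t) = \mathbf{F}(t) \cdot \mathbf{G}'(t) + \mathbf{F}'(t) \cdot \mathbf{G}(t) , \tag{9.43} \]
\[ (\mathbf{F} \times \mathbf{G})'(t) = \mathbf{F}(t) \times \mathbf{G}'(t) + \mathbf{F}'(t) \times \mathbf{G}(t) , \tag{9.44} \]

and the chain rule for vector functions

\[ (\mathbf{F} \circ u)'(t) = \mathbf{F}'(u(t))\,u'(t) = u'(t)\,\mathbf{F}'(u(t)) . \tag{9.45} \]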

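The formulas (9.40)–(9.45) are a consequence of the definitions, the componentwise
formula (9.34) and the corresponding differentiation rules for scalar functions (see
Chap. 3). They can also be written in Leibniz’ notation, for example,

\[ \frac{d}{dt}(\mathbf{F} + \mathbf{G}) = \frac{d\mathbf{F}}{dt} + \frac{d\mathbf{G}}{dt} , \]
\[ \frac{d}{dt}(\mathbf{F} \cdot \mathbf{G}) = \left( \mathbf{F} \cdot \frac{d\mathbf{G}}{dt} \right) + \left( \frac{d\mathbf{F}}{dt} \cdot \mathbf{G} \right) , \]
\[ \frac{d}{dt}(\mathbf{F} \times \mathbf{G}) = \left( \mathbf{F} \times \frac{d\mathbf{G}}{dt} \right) + \left( \frac{d\mathbf{F}}{dt} \times \mathbf{G} \right) . \]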

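Example 9.10 Let $\mathbf{F}(t) = 2t^2\,\mathbf{i} - 3\,\mathbf{j}$, $\mathbf{G}(t) = \mathbf{i} + t\,\mathbf{j} + t^2\,\mathbf{k}$, $u(t) = \tfrac{1}{3}t^3$. Verify (9.43)–
(9.45) for these functions.
Solution: We first compute the derivatives

\[ \mathbf{F}'(t) = 4t\,\mathbf{i} , \qquad \mathbf{G}'(t) = \mathbf{j} + 2t\,\mathbf{k} , \qquad u'(t) = t^2 . \]

We verify (9.43). We have

\[ (\mathbf{F} \cdot \mathbf{G})(t) = (2t^2\,\mathbf{i} - 3\,\mathbf{j}) \cdot (\mathbf{i} + t\,\mathbf{j} + t^2\,\mathbf{k}) = 2t^2 - 3t , \]

therefore, the left-hand side becomes $(\mathbf{F} \cdot \mathbf{G})'(t) = 4t - 3$. For the right-hand side,
we get

\[ \mathbf{F}(t) \cdot \mathbf{G}'(t) + \mathbf{F}'(t) \cdot \mathbf{G}(t) = (2t^2\,\mathbf{i} - 3\,\mathbf{j}) \cdot (\mathbf{j} + 2t\,\mathbf{k}) + 4t\,\mathbf{i} \cdot (\mathbf{i} + t\,\mathbf{j} + t^2\,\mathbf{k}) = -3 + 4t , \]

so indeed both sides are equal.
We now consider (9.44). We have

\[ (\mathbf{F} \times \mathbf{G})(t) = (2t^2\,\mathbf{i} - 3\,\mathbf{j}) \times (\mathbf{i} + t\,\mathbf{j} + t^2\,\mathbf{k}) = -3t^2\,\mathbf{i} - 2t^4\,\mathbf{j} + (2t^3 + 3)\,\mathbf{k} , \]

so the left-hand side equals

\[ (\mathbf{F} \times \mathbf{G})'(t) = -6t\,\mathbf{i} - 8t^3\,\mathbf{j} + 6t^2\,\mathbf{k} . \]

We compute the right-hand side

\[
\begin{aligned}
\mathbf{F}(t) \times \mathbf{G}'(t) + \mathbf{F}'(t) \times \mathbf{G}(t) &= (2t^2\,\mathbf{i} - 3\,\mathbf{j}) \times (\mathbf{j} + 2t\,\mathbf{k}) + 4t\,\mathbf{i} \times (\mathbf{i} + t\,\mathbf{j} + t^2\,\mathbf{k}) \\
&= (2t^2\,\mathbf{k} - 4t^3\,\mathbf{j} - 6t\,\mathbf{i}) + (4t^2\,\mathbf{k} - 4t^3\,\mathbf{j}) = -6t\,\mathbf{i} - 8t^3\,\mathbf{j} + 6t^2\,\mathbf{k} ,
\end{aligned}
\]

therefore, both sides are equal.
Finally, we investigate (9.45). We have

\[ (\mathbf{F} \circ u)(t) = 2\left( \tfrac{1}{3}t^3 \right)^2 \mathbf{i} - 3\,\mathbf{j} , \qquad (\mathbf{F} \circ u)'(t) = 2 \cdot \tfrac{1}{9} \cdot 6t^5\,\mathbf{i} = \tfrac{4}{3}t^5\,\mathbf{i} \]

on the left-hand side, and

\[ \mathbf{F}'(u(t))\,u'(t) = [4u(t)\,\mathbf{i}] \cdot u'(t) = 4 \cdot \tfrac{1}{3}t^3\,\mathbf{i} \cdot t^2 = \tfrac{4}{3}t^5\,\mathbf{i} \]

on the right-hand side.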
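The verification of (9.43) and (9.44) in Example 9.10 can be repeated mechanically at sample values of t. The Python sketch below evaluates both sides with integer arithmetic, so the comparison is exact; all function names are illustrative, and the derivative formulas coded here are the ones obtained in the solution above.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cross(x, y):
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

# F, G and their derivatives from Example 9.10, as component tuples.
def F(t): return (2 * t**2, -3, 0)
def G(t): return (1, t, t**2)
def F_prime(t): return (4 * t, 0, 0)
def G_prime(t): return (0, 1, 2 * t)

def dot_rule_rhs(t):
    """Right-hand side of (9.43)."""
    return dot(F(t), G_prime(t)) + dot(F_prime(t), G(t))

def cross_rule_rhs(t):
    """Right-hand side of (9.44)."""
    return add(cross(F(t), G_prime(t)), cross(F_prime(t), G(t)))

samples = range(-3, 4)
dot_rule_ok = all(dot_rule_rhs(t) == 4 * t - 3 for t in samples)
cross_rule_ok = all(cross_rule_rhs(t) == (-6 * t, -8 * t**3, 6 * t**2)
                    for t in samples)
```

Both flags come out `True` on the sampled integers, in line with the symbolic computation.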



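Example 9.11 (a) Let $\mathbf{F}(t) = t\,\mathbf{i} + \sqrt{t + 1}\,\mathbf{j} - e^{t}\,\mathbf{k}$. Find $\mathbf{F}'(t)$, $\mathbf{F}'(0)$, $\mathbf{F}''(t)$, $\mathbf{F}''(0)$, and
$\mathbf{F}(t) \cdot \mathbf{F}'(t)$.
(b) Find $\mathbf{F}''(t)$ if $\mathbf{F}(t) = t \sin t\,\mathbf{i} + e^{-t}\,\mathbf{j} + t\,\mathbf{k}$.

Solution: (a) We have $f_1(t) = t$, $f_2(t) = \sqrt{t + 1}$, and $f_3(t) = -e^{t}$. We get

\[ \mathbf{F}'(t) = f_1'(t)\,\mathbf{i} + f_2'(t)\,\mathbf{j} + f_3'(t)\,\mathbf{k} = \mathbf{i} + \frac{1}{2\sqrt{t + 1}}\,\mathbf{j} - e^{t}\,\mathbf{k} , \]
\[ \mathbf{F}'(0) = \mathbf{i} + \frac{1}{2}\,\mathbf{j} - \mathbf{k} , \quad \text{as } e^{0} = 1 ,\ \sqrt{t + 1} = 1 \text{ for } t = 0 , \]
\[ \mathbf{F}''(t) = 0\,\mathbf{i} - \frac{1}{4}\,\frac{1}{(t + 1)^{3/2}}\,\mathbf{j} - e^{t}\,\mathbf{k} , \]
\[ \mathbf{F}''(0) = -\frac{1}{4}\,\mathbf{j} - e^{0}\,\mathbf{k} = -\frac{1}{4}\,\mathbf{j} - \mathbf{k} , \]
\[ \mathbf{F}(t) \cdot \mathbf{F}'(t) = \left( t\,\mathbf{i} + \sqrt{t + 1}\,\mathbf{j} - e^{t}\,\mathbf{k} \right) \cdot \left( \mathbf{i} + \frac{1}{2\sqrt{t + 1}}\,\mathbf{j} - e^{t}\,\mathbf{k} \right) = t + \frac{1}{2} + e^{2t} . \]

(b) We have $f_1(t) = t \sin t$, $f_2(t) = e^{-t}$, and $f_3(t) = t$. We get

\[ \mathbf{F}'(t) = f_1'(t)\,\mathbf{i} + f_2'(t)\,\mathbf{j} + f_3'(t)\,\mathbf{k} = (t \cos t + \sin t)\,\mathbf{i} - e^{-t}\,\mathbf{j} + \mathbf{k} , \]
\[ \mathbf{F}''(t) = (-t \sin t + \cos t + \cos t)\,\mathbf{i} + e^{-t}\,\mathbf{j} + 0\,\mathbf{k} = (2 \cos t - t \sin t)\,\mathbf{i} + e^{-t}\,\mathbf{j} . \]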

9.3.2 Vector Fields in Several Dimensions

In this section, we introduce the concepts of gradient, divergence, and curl. These
are based on the notion of partial derivatives, which we have introduced in Sect. 8.4.
Since any kind of derivatives of a function f at a certain point x are defined as limits
involving function values at nearby points, it is most convenient if the domain of f
“always includes nearby points”. This is made precise in the following definition.

Definition 9.9 (Open Set) A subset D of Rn is called open if for every x ∈ D there
exists a ball B with center x which is contained in D.

Thus, if x ∈ D and if the ball B in question has radius r , then every point z whose
distance from x is smaller than r must also belong to D. For example, the interior
D = {(x, y, z) : 0 < x, y, z < 1} of a cube is open, while if we include its boundary,
the corresponding set R = {(x, y, z) : 0 ≤ x, y, z ≤ 1} is not open.
Let us note that the definition of an open set in the plane (Definition 8.4) is a
special case of Definition 9.9.
The gradient. Let f be a scalar function of three variables. The gradient of f at a
point (x, y, z) is defined as the vector

\[ \nabla f(x, y, z) = \frac{\partial f}{\partial x}(x, y, z)\,\mathbf{i} + \frac{\partial f}{\partial y}(x, y, z)\,\mathbf{j} + \frac{\partial f}{\partial z}(x, y, z)\,\mathbf{k} . \tag{9.46} \]

If f and its partial derivatives are defined on some open subset D of R3 , the gradient
thus becomes a vector field ∇ f : D → R3 whose component functions are just the
partial derivatives ∂ f /∂ x, ∂ f /∂ y and ∂ f /∂z. According to the notation of vectors,
we may also write

\[ \nabla f(x, y, z) = \left( \frac{\partial f}{\partial x}(x, y, z), \frac{\partial f}{\partial y}(x, y, z), \frac{\partial f}{\partial z}(x, y, z) \right) . \]
Example 9.12 For $f(x, y, z) = x^2 y \sin z$ we have

\[ \frac{\partial f}{\partial x}(x, y, z) = 2xy \sin z , \quad \frac{\partial f}{\partial y}(x, y, z) = x^2 \sin z , \quad \frac{\partial f}{\partial z}(x, y, z) = x^2 y \cos z , \]

and thus

\[ \nabla f(x, y, z) = 2xy \sin z\,\mathbf{i} + x^2 \sin z\,\mathbf{j} + x^2 y \cos z\,\mathbf{k} , \]

or

\[ \nabla f(x, y, z) = (2xy \sin z,\ x^2 \sin z,\ x^2 y \cos z) . \]

For a function f of two variables (x, y),

\[ \nabla f(x, y) = \frac{\partial f}{\partial x}(x, y)\,\mathbf{i} + \frac{\partial f}{\partial y}(x, y)\,\mathbf{j} . \]

In n dimensions, we define

\[ \nabla f(\mathbf{x}) = \sum_{i=1}^{n} \partial_i f(\mathbf{x})\,\mathbf{e}_i , \tag{9.47} \]

where x = (x1 , . . . , xn ), ∂i f = ∂ f /∂ xi and ei is the ith unit vector.


The gradient obeys the rules

∇( f + g)(x) = ∇ f (x) + ∇g(x) , (9.48)


∇(α f )(x) = α∇ f (x) , if α is a scalar, (9.49)
∇( f g)(x) = f (x)∇g(x) + g(x)∇ f (x) . (9.50)

Those rules can be verified componentwise by the corresponding rules for partial
derivatives.
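A gradient computed by hand can be sanity-checked against a numerical approximation. The Python sketch below approximates each partial derivative of the function from Example 9.12 by a central difference and compares the result with the analytic gradient; the helper `numerical_gradient`, the step size and the evaluation point are assumptions chosen for this illustration.

```python
import math

def f(x, y, z):
    """The function from Example 9.12."""
    return x**2 * y * math.sin(z)

def grad_f(x, y, z):
    """Analytic gradient of f, as computed in Example 9.12."""
    return (2 * x * y * math.sin(z), x**2 * math.sin(z), x**2 * y * math.cos(z))

def numerical_gradient(func, point, h=1e-6):
    """Central-difference approximation of each partial derivative."""
    grad = []
    for i in range(len(point)):
        ahead = list(point)
        behind = list(point)
        ahead[i] += h
        behind[i] -= h
        grad.append((func(*ahead) - func(*behind)) / (2 * h))
    return tuple(grad)

p = (1.0, 2.0, 0.5)
exact = grad_f(*p)
approx = numerical_gradient(f, p)
max_error = max(abs(a - e) for a, e in zip(approx, exact))
```

At this point the two gradients agree to many digits, which is what one expects from a central difference with a small step.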
Linearization. In the case of a function f of a single variable we have, as a
consequence of the definition of the derivative,

\[ f(x + h) = f(x) + f'(x)h + r(h) \tag{9.51} \]

for some remainder term r (h) with r (h)/ h → 0 as h → 0. For functions of several
variables, the gradient plays the corresponding role.

Definition 9.10 Let f be a real-valued function of n variables defined in an open
subset D of Rn . We say that f is differentiable at a point x ∈ D, if all partial
derivatives of f exist at x and

\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{|f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) - \nabla f(\mathbf{x}) \cdot \mathbf{h}|}{\|\mathbf{h}\|} = 0 . \tag{9.52} \]
As a consequence, for a differentiable function f we obtain the analogue of (9.51)
in several dimensions as

\[ f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot \mathbf{h} + r(\mathbf{h}) , \tag{9.53} \]

where r (h) is a remainder term with $r(\mathbf{h})/\|\mathbf{h}\| \to 0$ as h → 0. For this reason, the
function g defined by

\[ g(\mathbf{z}) = f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot (\mathbf{z} - \mathbf{x}) \]

is called the linearization of f at x. Note that the difference

\[ f(\mathbf{x} + \mathbf{h}) - g(\mathbf{x} + \mathbf{h}) = r(\mathbf{h}) \]

converges to 0 faster than $\|\mathbf{h}\|$. The notation

\[ r(\mathbf{h}) = o(\|\mathbf{h}\|) \]

(read as “r (h) is small o of $\|\mathbf{h}\|$”) is a common way of stating that “$r(\mathbf{h})/\|\mathbf{h}\| \to 0$
as h → 0”.
It would be very inconvenient if, whenever we want to apply linearization, we
had to check the validity of (9.52) explicitly. Usually, this is not necessary because
of the following theorem.

Theorem 9.2 Let f be a real-valued function of n variables, which is continuously


differentiable in an open subset D of Rn . Then f is differentiable at every point
x ∈ D.

This means that we only have to check whether all partial derivatives of f are
continuous in D, which is often obvious.
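The statement that the remainder of the linearization is small compared with the length of h can be observed numerically. The Python sketch below uses the hypothetical test function f(x, y) = x²y (not taken from the text), its gradient (2xy, x²), and shrinking increments h = s(1, 1) at the point (1, 2); it records the ratio |r(h)|/‖h‖ from (9.52).

```python
import math

def f(x, y):
    """A simple polynomial test function, chosen for illustration."""
    return x**2 * y

def grad_f(x, y):
    """Gradient of f, computed by hand: (2xy, x^2)."""
    return (2 * x * y, x**2)

def remainder_ratio(point, direction, s):
    """|f(x + h) - f(x) - grad f(x) . h| / ||h|| for h = s * direction."""
    h = (s * direction[0], s * direction[1])
    g = grad_f(*point)
    linear = f(*point) + g[0] * h[0] + g[1] * h[1]
    remainder = f(point[0] + h[0], point[1] + h[1]) - linear
    return abs(remainder) / math.hypot(h[0], h[1])

ratios = [remainder_ratio((1.0, 2.0), (1.0, 1.0), s) for s in (1e-1, 1e-2, 1e-3)]
```

For this function the remainder is 4s² + s³, so the ratios shrink roughly in proportion to s, consistent with r(h) being small o of ‖h‖.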
The directional derivative. We present its definition in the general case of n
dimensions.

Definition 9.11 (Directional Derivative) Let f be a real-valued function of n
variables. For each vector u ∈ Rn , the limit

\[ f_{\mathbf{u}}(\mathbf{x}) = \lim_{t \to 0} \frac{f(\mathbf{x} + t\mathbf{u}) - f(\mathbf{x})}{t} , \]

if it exists, is called the directional derivative of f at x in the direction u. Instead
of $f_{\mathbf{u}}(\mathbf{x})$, we also write $\partial_{\mathbf{u}} f(\mathbf{x})$.

Remark 9.2 1. When u = ei is the ith canonical unit vector, the directional derivative
$f_{\mathbf{u}}$ becomes the partial derivative ∂i f . In particular, for a function f of three
variables (x, y, z), we obtain

\[ \frac{\partial f}{\partial x}(x, y, z) = f_{\mathbf{i}}(x, y, z) = \partial_x f(x, y, z) , \qquad \mathbf{i} = (1, 0, 0) , \]

similarly $\partial f/\partial y = f_{\mathbf{j}} = \partial_y f$ and $\partial f/\partial z = f_{\mathbf{k}} = \partial_z f$.
2. As we know already, the partial derivatives ∂ f /∂ x, ∂ f /∂ y, and ∂ f /∂z give the
rates of change of f in the directions i, j and k, respectively. Analogously, if
u is a unit vector, the directional derivative $f_{\mathbf{u}}(\mathbf{x})$ gives the rate of change of
f in the direction u.
3. The geometrical interpretation of the directional derivative is essentially the same
as that of the partial derivative, presented in Sect. 8.4, except that we now look
at tangents to the graph of f in arbitrary directions u, not only in the direction
of the coordinate axes.

The following theorem gives the connection between the gradient of f at x and its
directional derivative at x.

Theorem 9.3 If f is differentiable at x, then f has a directional derivative at x in
every direction u, and

\[ f_{\mathbf{u}}(\mathbf{x}) = \partial_{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u} . \tag{9.54} \]

In particular, all partial derivatives ∂i f exist.

Proof Let u ∈ Rn be given, assume u ≠ 0 (otherwise, the assertion is trivially
satisfied). In order to verify the definition of the directional derivative, we consider the
identity, valid for all t ≠ 0,

\[
\left| \frac{f(\mathbf{x} + t\mathbf{u}) - f(\mathbf{x})}{t} - \nabla f(\mathbf{x}) \cdot \mathbf{u} \right|
= \frac{|f(\mathbf{x} + t\mathbf{u}) - f(\mathbf{x}) - t\nabla f(\mathbf{x}) \cdot \mathbf{u}|}{|t|}
= \frac{|f(\mathbf{x} + t\mathbf{u}) - f(\mathbf{x}) - t\nabla f(\mathbf{x}) \cdot \mathbf{u}|}{\|t\mathbf{u}\|} \cdot \|\mathbf{u}\| .
\]

Since f is differentiable at x, the fraction on the right-hand side converges to 0 as t → 0, see
Definition 9.10 (with h = tu). Therefore, the left-hand side, too, converges to 0, which yields the
assertion.

Theorem 9.3 yields an important geometric property of the gradient. According to
(9.15), the angle θ between ∇ f (x) and any vector u satisfies

\[ \nabla f(\mathbf{x}) \cdot \mathbf{u} = \|\nabla f(\mathbf{x})\|\,\|\mathbf{u}\| \cos\theta . \]

If moreover u is a unit vector and ∇ f (x) is nonzero, we get from (9.54) that

\[ f_{\mathbf{u}}(\mathbf{x}) = \|\nabla f(\mathbf{x})\| \cos\theta . \]

Since −1 ≤ cos θ ≤ 1, we have

\[ -\|\nabla f(\mathbf{x})\| \le f_{\mathbf{u}}(\mathbf{x}) \le \|\nabla f(\mathbf{x})\| . \]

In particular, if u points in the direction of ∇ f (x), then θ = 0 and cos θ = 1, so

\[ f_{\mathbf{u}}(\mathbf{x}) = \|\nabla f(\mathbf{x})\| , \tag{9.55} \]

whereas, if u points in the direction of −∇ f (x), then θ = π and cos θ = −1, so

\[ f_{\mathbf{u}}(\mathbf{x}) = -\|\nabla f(\mathbf{x})\| . \tag{9.56} \]

Since the directional derivative gives the rate of change of the function in that direc-
tion, it follows that the function f increases most rapidly in the direction of the
gradient and decreases most rapidly in the opposite direction.
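The bounds on the directional derivative can be tried out on a concrete function. The Python sketch below takes f(x, y, z) = xyz at the point (1, 1, 1), where the gradient is (1, 1, 1), and evaluates ∇f · u for a few unit directions; the helper names are illustrative, and directions are normalized inside the helper so that the values are rates of change per unit length.

```python
import math

def grad_f(x, y, z):
    """Gradient of f(x, y, z) = x*y*z, computed by hand."""
    return (y * z, x * z, x * y)

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def directional_derivative(gradient, direction):
    """grad f . u, where u is the given direction scaled to unit length."""
    length = norm(direction)
    if length == 0:
        raise ValueError("direction must be nonzero")
    return sum(g * d for g, d in zip(gradient, direction)) / length

g = grad_f(1.0, 1.0, 1.0)

max_rate = directional_derivative(g, g)                          # along the gradient
min_rate = directional_derivative(g, tuple(-a for a in g))       # against the gradient
rate_along_i = directional_derivative(g, (1.0, 0.0, 0.0))        # the partial derivative in x
```

Here `max_rate` and `min_rate` are approximately ±√3, and the rate along i lies between them.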

Example 9.13 Let f (x, y, z) = xyz. Compute ∇ f (1, 1, 1) and determine the
maximum and minimum rate of change of f at (1, 1, 1).

Solution: We have

\[ \frac{\partial f}{\partial x}(x, y, z) = yz , \quad \frac{\partial f}{\partial y}(x, y, z) = xz , \quad \frac{\partial f}{\partial z}(x, y, z) = xy , \]

therefore

\[ \nabla f(x, y, z) = yz\,\mathbf{i} + xz\,\mathbf{j} + xy\,\mathbf{k} = (yz, xz, xy) , \]

and hence ∇ f (1, 1, 1) = i + j + k = (1, 1, 1). According to (9.55) and (9.56), the
maximum and minimum rates of change are $\|\nabla f(1, 1, 1)\| = \sqrt{3}$ and
$-\|\nabla f(1, 1, 1)\| = -\sqrt{3}$, respectively.

Example 9.14 Calculate the directional derivative of ϕ(x, y, z) = 8x y 2 − x z at an


arbitrary point (x, y, z) in the direction of the vector
 
1 1 1 1 1 1
u = √ i+ √ j+ √ k = √ ,√ ,√ .
3 3 3 3 3 3

Solution: According to Theorem 9.3, we have

∂u ϕ(x, y, z) = ϕu (x, y, z) = ∇ϕ(x, y, z) · u .

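We compute

\[
\frac{\partial \phi}{\partial x}(x, y, z) = 8y^2 - z , \quad
\frac{\partial \phi}{\partial y}(x, y, z) = 16xy , \quad
\frac{\partial \phi}{\partial z}(x, y, z) = -x ,
\]

hence $\nabla \phi(x, y, z) = (8y^2 - z, 16xy, -x)$ and

\[
\partial_{\mathbf{u}} \phi(x, y, z) = (8y^2 - z, 16xy, -x) \cdot \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right)
= \frac{8y^2 - z + 16xy - x}{\sqrt{3}} .
\]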

Example 9.15 Let the temperature at a point (x, y) on a metallic plate in the xy-plane
be given by

\[
T(x, y) = \frac{xy}{1 + x^2 + y^2}
\]

degrees Celsius.
(a) Find the rate of change of temperature at (1, 1) in the direction of u = 2i − j.
(b) An insect at (1, 1) wants to walk in the direction in which the temperature drops
most rapidly. Find a unit vector in that direction.
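Solution: (a) We have

\[
\frac{\partial T}{\partial x}(x, y) = \frac{y(1 - x^2 + y^2)}{(1 + x^2 + y^2)^2} , \quad
\frac{\partial T}{\partial y}(x, y) = \frac{x(1 + x^2 - y^2)}{(1 + x^2 + y^2)^2} ,
\]

so at (x, y) = (1, 1) we have

\[
\nabla T(1, 1) = \frac{\mathbf{i}}{9} + \frac{\mathbf{j}}{9} = \left( \frac{1}{9}, \frac{1}{9} \right) .
\]

The directional derivative gives the correct rate of change only after we normalize
the vector u to unit length, so we set $\mathbf{e} = \mathbf{u}/\|\mathbf{u}\| = (2\mathbf{i} - \mathbf{j})/\sqrt{5}$ and compute the rate
of change as

\[
\partial_{\mathbf{e}} T(1, 1) = \nabla T(1, 1) \cdot \mathbf{e}
= \left( \frac{\mathbf{i}}{9} + \frac{\mathbf{j}}{9} \right) \cdot \frac{2\mathbf{i} - \mathbf{j}}{\sqrt{5}}
= \frac{1}{9\sqrt{5}} .
\]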

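(b) The temperature drops most rapidly in the direction of −∇T(1, 1). Using part
(a), we compute a unit vector e in that direction as

\[
-\nabla T(1, 1) = -\frac{1}{9}\,\mathbf{i} - \frac{1}{9}\,\mathbf{j} , \quad
\| -\nabla T(1, 1) \| = \frac{\sqrt{2}}{9} ,
\]

so

\[
\mathbf{e} = \frac{-\nabla T(1, 1)}{\| -\nabla T(1, 1) \|} = -\frac{1}{\sqrt{2}}\,\mathbf{i} - \frac{1}{\sqrt{2}}\,\mathbf{j} .
\]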
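Both parts of Example 9.15 can be cross-checked without computing the partial derivatives by hand. The sketch below is an illustration, not the authors' method: it approximates the gradient by central differences (the step size `h` and the helper names are assumptions made here) and then forms the directional derivative and the direction of steepest descent.

```python
import math

def T(x, y):
    """Temperature field of Example 9.15."""
    return x * y / (1.0 + x * x + y * y)

def grad(f, x, y, h=1e-6):
    """Central-difference approximation of the gradient of f at (x, y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return (fx, fy)

def unit(v):
    """Normalize a plane vector to unit length."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

g = grad(T, 1.0, 1.0)                    # approximately (1/9, 1/9)
e = unit((2.0, -1.0))                    # normalized direction u = 2i - j
rate = g[0] * e[0] + g[1] * e[1]         # part (a)
steepest_descent = unit((-g[0], -g[1]))  # part (b)

print(g, rate, steepest_descent)
```

Normalizing u before taking the scalar product is essential; without it the result would be scaled by ‖u‖ = √5.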

Example 9.16 Let the temperature at each point of a metal plate be given by the
function

\[
T(x, y) = e^x \cos y + e^y \cos x .
\]

(a) In what direction does the temperature increase most rapidly at the point (0, 0)? What
is the rate of increase?
(b) In what direction does the temperature decrease most rapidly at (0, 0)?
Solution: We first compute

\[
\nabla T(x, y) = \frac{\partial T}{\partial x}(x, y)\,\mathbf{i} + \frac{\partial T}{\partial y}(x, y)\,\mathbf{j}
= (e^x \cos y - e^y \sin x)\,\mathbf{i} + (e^y \cos x - e^x \sin y)\,\mathbf{j} .
\]

(a) At (0, 0) the temperature increases most rapidly in the direction of the gradient
∇T(0, 0) = i + j. The rate of increase is obtained as $\| \nabla T(0, 0) \| = \| \mathbf{i} + \mathbf{j} \| = \sqrt{2}$.
(b) The temperature decreases most rapidly in the direction of −∇T(0, 0) = −i − j.

We repeat that the gradient vector ∇f(x) tells us the direction of the steepest climb
of f at the point x, and its length, ‖∇f(x)‖, gives the steepness.
Polar coordinates and the gradient. Let a function “ψ = ψ(r, θ)” be defined on the
plane with arguments given in polar coordinates. We want to determine its gradient
in Cartesian coordinates (x, y). One way to do this is to first transform ψ explicitly
to Cartesian coordinates, and then compute the gradient as above. Usually, however,
it is easier to compute the partial derivatives w.r.t. Cartesian coordinates directly
from the partial derivatives w.r.t. polar coordinates, as we now explain. In order to
clearly understand what is going on, we use the symbol ψ̃ to denote the function
in Cartesian coordinates, “ψ̃ = ψ̃(x, y)”. We emphasize that ψ and ψ̃ are different
functions from the standpoint of the general definition of a function as presented in
Chap. 1, since the rule by which we associate the function value to (x, y) will be
different from the rule applied to (r, θ). The fact that ψ and ψ̃ only differ through a
change of coordinates is expressed by the equation

ψ(r, θ) = ψ̃(r cos θ, r sin θ ) . (9.57)

Our task is to determine ∇ ψ̃. From (9.57) and from the chain rule (see Sect. 8.4), we
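obtain the relation between the partial derivatives of ψ and ψ̃ as

\[
\frac{\partial \psi}{\partial r}(r, \theta) = \cos\theta \cdot \frac{\partial \tilde{\psi}}{\partial x}(r\cos\theta, r\sin\theta) + \sin\theta \cdot \frac{\partial \tilde{\psi}}{\partial y}(r\cos\theta, r\sin\theta) , \tag{9.58}
\]

\[
\frac{\partial \psi}{\partial \theta}(r, \theta) = -r\sin\theta \cdot \frac{\partial \tilde{\psi}}{\partial x}(r\cos\theta, r\sin\theta) + r\cos\theta \cdot \frac{\partial \tilde{\psi}}{\partial y}(r\cos\theta, r\sin\theta) . \tag{9.59}
\]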

For any given point (r, θ), this is a system of two linear equations for the two
unknowns ∂ ψ̃/∂ x and ∂ ψ̃/∂ y at the corresponding point (x, y) = (r cos θ, r sin θ ).
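Solving this, we obtain

\[
\frac{\partial \tilde{\psi}}{\partial x}(x, y) = \cos\theta\, \frac{\partial \psi}{\partial r}(r, \theta) - \frac{1}{r} \sin\theta\, \frac{\partial \psi}{\partial \theta}(r, \theta) , \tag{9.60}
\]

\[
\frac{\partial \tilde{\psi}}{\partial y}(x, y) = \sin\theta\, \frac{\partial \psi}{\partial r}(r, \theta) + \frac{1}{r} \cos\theta\, \frac{\partial \psi}{\partial \theta}(r, \theta) . \tag{9.61}
\]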

(We obtain (9.60) if we multiply (9.58) by cos θ, (9.59) by −(1/r ) sin θ and add the
resulting equations, and similarly for (9.61).) This is the result we were looking for.
In shortened symbolic form (and no longer distinguishing between ψ and ψ̃) it reads
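\[
\nabla \psi = \left( \frac{\partial \psi}{\partial x}, \frac{\partial \psi}{\partial y} \right)
= \left( \cos\theta\, \frac{\partial \psi}{\partial r} - \frac{1}{r} \sin\theta\, \frac{\partial \psi}{\partial \theta} ,\;
\sin\theta\, \frac{\partial \psi}{\partial r} + \frac{1}{r} \cos\theta\, \frac{\partial \psi}{\partial \theta} \right) . \tag{9.62}
\]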

Example 9.17 Let ψ(r, θ) = 1/r . Compute ∇ψ at (x, y) = (r cos θ, r sin θ ).

Solution: Since ψ does not depend on θ, we have ∂ψ/∂θ = 0, while ∂ψ/∂r = −1/r². Thus (9.62) becomes

\[
\nabla \psi(x, y) = \left( -\frac{1}{r^2} \cos\theta , -\frac{1}{r^2} \sin\theta \right) = -\frac{1}{r^3}\,(x, y) .
\]

Sometimes, a notation yet shorter than (9.62) is in use. Let us define vectors er , eθ
by
er = cos θ i + sin θ j , eθ = − sin θ i + cos θ j . (9.63)

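Then (9.62) becomes

\[
\nabla \psi = \frac{\partial \psi}{\partial r}\,\mathbf{e}_r + \frac{1}{r} \frac{\partial \psi}{\partial \theta}\,\mathbf{e}_\theta . \tag{9.64}
\]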
Here, again, the right-hand side is evaluated at (r, θ) and the left-hand side is evaluated
at (x, y) = (r cos θ, r sin θ ).
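Formulas (9.62) and (9.64) can be tested against a direct Cartesian computation. The following sketch is an illustration under stated assumptions: the function names and the sample point (r, θ) = (2, 0.7) are chosen here, and the Cartesian gradient is approximated by central differences. It uses ψ = 1/r from Example 9.17.

```python
import math

def polar_gradient(dpsi_dr, dpsi_dtheta, r, theta):
    """Cartesian components of grad(psi) from the polar partials, following (9.62)."""
    gx = math.cos(theta) * dpsi_dr - math.sin(theta) * dpsi_dtheta / r
    gy = math.sin(theta) * dpsi_dr + math.cos(theta) * dpsi_dtheta / r
    return (gx, gy)

def psi_cartesian(x, y):
    """psi = 1/r rewritten in Cartesian coordinates (the function called psi-tilde in the text)."""
    return 1.0 / math.hypot(x, y)

def numeric_gradient(f, x, y, h=1e-6):
    """Central-difference gradient in Cartesian coordinates."""
    return ((f(x + h, y) - f(x - h, y)) / (2.0 * h),
            (f(x, y + h) - f(x, y - h)) / (2.0 * h))

r, theta = 2.0, 0.7                      # arbitrary sample point
x, y = r * math.cos(theta), r * math.sin(theta)

from_polar = polar_gradient(-1.0 / r**2, 0.0, r, theta)   # d(psi)/dr = -1/r^2, d(psi)/dtheta = 0
from_cartesian = numeric_gradient(psi_cartesian, x, y)
closed_form = (-x / r**3, -y / r**3)                      # result of Example 9.17

print(from_polar, from_cartesian, closed_form)
```

All three tuples agree up to the discretization error of the finite differences.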
Divergence and rotation. Here, we present the definition and some properties of
the differential expressions termed “divergence” and “curl” (or “rotation”).

Definition 9.12 Let F(x, y, z) = f 1 (x, y, z)i + f 2 (x, y, z)j + f 3 (x, y, z)k be a vec-
tor field with components f 1 , f 2 , f 3 defined on some open subset D of R3 which we
assume to possess partial derivatives.

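(a) The divergence of F, denoted by div F or ∇ · F (read as nabla dot F), is defined
as

\[
\operatorname{div} \mathbf{F} = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} . \tag{9.65}
\]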

It is a scalar field with domain D.


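(b) The curl of F, denoted by curl F or ∇ × F (read as nabla cross F), is defined as

\[
\operatorname{curl} \mathbf{F} = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z} \right) \mathbf{i}
+ \left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \right) \mathbf{j}
+ \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) \mathbf{k} . \tag{9.66}
\]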

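It is a vector field with domain D. In analogy to the definition of a 3 × 3-determinant,
the curl can be expressed symbolically as

\[
\operatorname{curl} \mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\
f_1 & f_2 & f_3
\end{vmatrix} ,
\]

a traditional way to memorize the definition of the curl. Alternatively, the notion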
“rotation” is used instead of “curl”, and rot F is used instead of curl F.
In contrast to the gradient, the meaning of divergence and curl cannot be explained
from this definition in a direct and intuitive manner. The notion of divergence is best
understood in context with the divergence theorem of Gauss (Theorem 9.9) and will
be explained in Sect. 9.5.2. Likewise, the theorem of Stokes (Theorem 9.12) helps
to clarify the meaning of the curl, see Sect. 9.5.3. Moreover, in Example 9.50 we
will see that for a rotating body, angular velocity is proportional to the curl of the
tangential velocity.
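Example 9.18 Compute ∇ · F (the divergence) and ∇ × F (the curl) of the vector
field

\[
\mathbf{F}(x, y, z) = 2xy\,\mathbf{i} + x e^y\,\mathbf{j} + 2z\,\mathbf{k} .
\]

Solution: The components of F are $f_1(x, y, z) = 2xy$, $f_2(x, y, z) = x e^y$, $f_3(x, y, z) = 2z$. Therefore

\[
(\operatorname{div} \mathbf{F})(x, y, z) = \left( \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} \right)(x, y, z) = 2y + x e^y + 2 .
\]

To evaluate

\[
\operatorname{curl} \mathbf{F} = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z} \right) \mathbf{i}
+ \left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \right) \mathbf{j}
+ \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) \mathbf{k} ,
\]

we compute

\[
\left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z} \right)(x, y, z) = 0 - 0 = 0 , \quad
\left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \right)(x, y, z) = 0 - 0 = 0 ,
\]

\[
\left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right)(x, y, z) = e^y - 2x ,
\]

and obtain

\[
\operatorname{curl} \mathbf{F}(x, y, z) = 0\,\mathbf{i} + 0\,\mathbf{j} + (e^y - 2x)\,\mathbf{k} = (e^y - 2x)\,\mathbf{k} .
\]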
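Definitions (9.65) and (9.66) translate directly into a numerical procedure. The sketch below is an illustration only: it replaces every partial derivative by a central difference (the helper `partial`, the step size and the sample point are assumptions made here) and compares the outcome with the closed-form result of Example 9.18.

```python
import math

def F(x, y, z):
    """Vector field of Example 9.18: F = 2xy i + x e^y j + 2z k."""
    return (2.0 * x * y, x * math.exp(y), 2.0 * z)

def partial(field, comp, var, p, h=1e-5):
    """Central difference of component `comp` of `field` w.r.t. coordinate `var` at point p."""
    hi, lo = list(p), list(p)
    hi[var] += h
    lo[var] -= h
    return (field(*hi)[comp] - field(*lo)[comp]) / (2.0 * h)

def divergence(field, p):
    """Numerical version of (9.65)."""
    return sum(partial(field, i, i, p) for i in range(3))

def curl(field, p):
    """Numerical version of (9.66); indices 0, 1, 2 stand for x, y, z."""
    d = lambda comp, var: partial(field, comp, var, p)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

p = (0.5, 1.0, -2.0)                     # arbitrary sample point
div_numeric = divergence(F, p)
curl_numeric = curl(F, p)
div_exact = 2.0 * p[1] + p[0] * math.exp(p[1]) + 2.0
curl_exact = (0.0, 0.0, math.exp(p[1]) - 2.0 * p[0])

print(div_numeric, div_exact)
print(curl_numeric, curl_exact)
```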

Example 9.19 (i) Let ϕ be a scalar field which possesses continuous first and second
partial derivatives. Show that

∇ × (∇ϕ) = 0 or curl(∇ϕ) = 0 ,

that is, the curl of the gradient of ϕ is identically equal to the zero vector field.
(ii) Let F = f 1 i + f 2 j + f 3 k be a vector field which has continuous first and second
partial derivatives. Show that

div curl F = 0 or ∇ · (∇ × F) = 0 .

This means that the divergence of the curl of F is identically equal to the zero scalar
field.

Solution: (i) In order to compute the curl of ∇ϕ, we apply (9.66) to the vector field
F = ∇ϕ, that is, f 1 = ∂ϕ/∂ x, f 2 = ∂ϕ/∂ y, f 3 = ∂ϕ/∂z. Inserting those expressions
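into (9.66), we obtain

\[
\nabla \times (\nabla \phi) = \left( \frac{\partial^2 \phi}{\partial y \partial z} - \frac{\partial^2 \phi}{\partial z \partial y} \right) \mathbf{i}
+ \left( \frac{\partial^2 \phi}{\partial z \partial x} - \frac{\partial^2 \phi}{\partial x \partial z} \right) \mathbf{j}
+ \left( \frac{\partial^2 \phi}{\partial x \partial y} - \frac{\partial^2 \phi}{\partial y \partial x} \right) \mathbf{k} .
\]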

We know from Remark 8.4 that we can interchange the sequence of partial derivatives
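if the second partial derivatives are continuous. Therefore

\[
\frac{\partial^2 \phi}{\partial y \partial z} = \frac{\partial^2 \phi}{\partial z \partial y} , \quad
\frac{\partial^2 \phi}{\partial z \partial x} = \frac{\partial^2 \phi}{\partial x \partial z} , \quad
\frac{\partial^2 \phi}{\partial x \partial y} = \frac{\partial^2 \phi}{\partial y \partial x} ,
\]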

so all components of the vector field ∇ × (∇ϕ) are zero, and thus ∇ × (∇ϕ) is
identically equal to the zero vector field.
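(ii) By the definition,

\[
\nabla \times \mathbf{F} = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z} \right) \mathbf{i}
+ \left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \right) \mathbf{j}
+ \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) \mathbf{k} .
\]

Forming the divergence on both sides, we get

\[
\begin{aligned}
\operatorname{div}(\nabla \times \mathbf{F})
&= \frac{\partial}{\partial x} \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z} \right)
 + \frac{\partial}{\partial y} \left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \right)
 + \frac{\partial}{\partial z} \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) \\
&= \frac{\partial^2 f_3}{\partial x \partial y} - \frac{\partial^2 f_2}{\partial x \partial z}
 + \frac{\partial^2 f_1}{\partial y \partial z} - \frac{\partial^2 f_3}{\partial y \partial x}
 + \frac{\partial^2 f_2}{\partial z \partial x} - \frac{\partial^2 f_1}{\partial z \partial y} \\
&= \left( \frac{\partial^2 f_3}{\partial x \partial y} - \frac{\partial^2 f_3}{\partial y \partial x} \right)
 + \left( \frac{\partial^2 f_2}{\partial z \partial x} - \frac{\partial^2 f_2}{\partial x \partial z} \right)
 + \left( \frac{\partial^2 f_1}{\partial y \partial z} - \frac{\partial^2 f_1}{\partial z \partial y} \right) \\
&= 0 + 0 + 0 = 0 ,
\end{aligned}
\]

because, for the same reason as in (i),

\[
\frac{\partial^2 f_3}{\partial x \partial y} = \frac{\partial^2 f_3}{\partial y \partial x} , \quad
\frac{\partial^2 f_2}{\partial z \partial x} = \frac{\partial^2 f_2}{\partial x \partial z} , \quad
\frac{\partial^2 f_1}{\partial y \partial z} = \frac{\partial^2 f_1}{\partial z \partial y} .
\]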

Example 9.20 Consider the vector field $\mathbf{r}(x, y, z) = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ (the identity mapping
on $\mathbb{R}^3$). We let $r = \|\mathbf{r}\|$, that is, we also consider the scalar field r given by
$r(x, y, z) = \|\mathbf{r}(x, y, z)\|$.
(a) Let n be any integer (positive or negative). Prove that $\nabla(r^n) = n r^{n-2}\,\mathbf{r}$.
(b) Let ϕ be a real-valued function of one variable. Prove that $\operatorname{curl}(\phi(r)\,\mathbf{r}) = \mathbf{0}$.

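Solution: (a) We have $r(x, y, z) = \|\mathbf{r}(x, y, z)\| = (x^2 + y^2 + z^2)^{1/2}$, so
$r^n(x, y, z) = (x^2 + y^2 + z^2)^{n/2}$. This gives

\[
\frac{\partial (r^n)}{\partial x}(x, y, z) = \frac{n}{2}\,(x^2 + y^2 + z^2)^{\frac{n}{2} - 1} \cdot 2x = n x\, r^{n-2}(x, y, z) ,
\]

and similarly

\[
\frac{\partial (r^n)}{\partial y} = n y\, r^{n-2} , \quad \frac{\partial (r^n)}{\partial z} = n z\, r^{n-2} .
\]

Thus, in abbreviated notation,

\[
\nabla r^n = \frac{\partial r^n}{\partial x}\,\mathbf{i} + \frac{\partial r^n}{\partial y}\,\mathbf{j} + \frac{\partial r^n}{\partial z}\,\mathbf{k}
= n x r^{n-2}\,\mathbf{i} + n y r^{n-2}\,\mathbf{j} + n z r^{n-2}\,\mathbf{k}
= n r^{n-2} (x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) = n r^{n-2}\,\mathbf{r} .
\]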

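(b) Recall the formula for the curl of a vector field F,

\[
\operatorname{curl} \mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\
f_1 & f_2 & f_3
\end{vmatrix}
= \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z} \right) \mathbf{i}
+ \left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \right) \mathbf{j}
+ \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) \mathbf{k} .
\]

In the present case (again we use abbreviated notation) $\mathbf{F} = \phi(r)\,\mathbf{r} = \phi(r)x\,\mathbf{i} + \phi(r)y\,\mathbf{j} + \phi(r)z\,\mathbf{k}$, that is,

\[
f_1 = \phi(r)\,x , \quad f_2 = \phi(r)\,y , \quad f_3 = \phi(r)\,z ,
\]

and $r(x, y, z) = (x^2 + y^2 + z^2)^{1/2}$. Hence, using the chain rule,

\[
\begin{aligned}
\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}
&= z \phi'(r) \frac{\partial r}{\partial y} - y \phi'(r) \frac{\partial r}{\partial z} \\
&= \phi'(r) \left[ zy\,(x^2 + y^2 + z^2)^{-1/2} - yz\,(x^2 + y^2 + z^2)^{-1/2} \right] = 0 .
\end{aligned}
\]

Similarly, we obtain

\[
\frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} = 0 , \quad
\frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} = 0 .
\]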

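This proves that $\operatorname{curl}(\phi(r)\,\mathbf{r}) = \mathbf{0}$.

Finally, we note that divergence and curl satisfy the property of linearity, that is,

\[
\operatorname{div}(\mathbf{F} + \mathbf{G}) = \operatorname{div} \mathbf{F} + \operatorname{div} \mathbf{G} , \quad
\operatorname{div}(\alpha \mathbf{F}) = \alpha \operatorname{div} \mathbf{F} ,
\]

\[
\operatorname{curl}(\mathbf{F} + \mathbf{G}) = \operatorname{curl} \mathbf{F} + \operatorname{curl} \mathbf{G} , \quad
\operatorname{curl}(\alpha \mathbf{F}) = \alpha \operatorname{curl} \mathbf{F} ,
\]

hold for all vector fields F and G and all scalars α.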

9.3.3 Surfaces

Many surfaces Σ in the space $\mathbb{R}^3$ can be described parametrically by vector fields
defined on a domain D in the plane $\mathbb{R}^2$ as

\[
\Sigma = \mathbf{r}(D) = \{ \mathbf{r}(u, v) : (u, v) \in D \} . \tag{9.67}
\]

Example 9.21
(a) Let x, y be vectors in R3 . Describe the parallelogram with corners 0, x, y, and
x + y as a parametric surface Σ.
(b) Consider a vertical cylinder of height H whose bottom is formed by a circle of
radius R and midpoint 0 in the x y-plane. Describe its side as a parametric surface
Σ.

Solution: (a) We set D = [0, 1] × [0, 1] and r(u, v) = ux + vy. Then Σ = r(D)
yields the parallelogram.
(b) We set D = [0, 2π ] × [0, H ] and define

r(u, v) = (R cos u, R sin u, v) = R cos ui + R sin uj + vk .

Then Σ = r(D) yields the side of the cylinder.

Often, a surface in xyz-space is given
as a graph of a function S defined on a subset D of the xy-plane. In that case, we
may identify u = x and v = y and write its parametric representation as

r(x, y) = (x, y, S(x, y)) = xi + yj + S(x, y)k , (x, y) ∈ D . (9.68)

We have already seen many examples of this type in Sect. 8.1.


Let us consider a surface Σ given in the parametric form (9.67), and let $P_0 = \mathbf{r}(u_0, v_0)$
be a point on Σ. Varying u with $v_0$ fixed, we obtain a curve $\mathbf{c}(u) = \mathbf{r}(u, v_0)$ through
the point $P_0$ which lies completely within the surface Σ, since all of its points are
also points of Σ. According to Sect. 9.3.1, the vector

\[
\mathbf{c}'(u_0) = \lim_{h \to 0} \frac{\mathbf{c}(u_0 + h) - \mathbf{c}(u_0)}{h}
\]

(if nonzero) is tangent to this curve at the point $P_0$. Instead of $\mathbf{c}'(u_0)$ we write $\partial_u \mathbf{r}(u_0, v_0)$;
this is natural since we then have

\[
\partial_u \mathbf{r}(u_0, v_0) = \lim_{h \to 0} \frac{\mathbf{r}(u_0 + h, v_0) - \mathbf{r}(u_0, v_0)}{h} . \tag{9.69}
\]

Analogously, if we fix $u_0$ and vary v, we obtain another curve through $P_0$ within Σ
with

\[
\partial_v \mathbf{r}(u_0, v_0) = \lim_{h \to 0} \frac{\mathbf{r}(u_0, v_0 + h) - \mathbf{r}(u_0, v_0)}{h} \tag{9.70}
\]

as a tangent vector. The plane through P0 spanned by the two vectors ∂u r(u 0 , v0 ) and
∂v r(u 0 , v0 ) is called the tangent plane for Σ at P0 , or the plane tangent to Σ at P0 .
In order that all of this works, we assume that the parametrization r is regular in the
sense of the following definition.
Definition 9.13 A parametrization r : D → Σ is called regular if it is continuously
differentiable on the interior of D, and if the vectors $\partial_u \mathbf{r}(u, v)$ and $\partial_v \mathbf{r}(u, v)$ are not
parallel at any interior point (u, v) of D. A surface Σ is called smooth if it has a
regular parametrization.
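Example 9.22 Consider the vertical cylinder with height H and a circular base of
radius R, see Example 9.21(b) above. Find the plane tangent to the side surface at
the point $P_0 = ((1/\sqrt{2})R, (1/\sqrt{2})R, 1)$.
Solution: The parametrization $\mathbf{r}(u, v) = R\cos u\,\mathbf{i} + R\sin u\,\mathbf{j} + v\,\mathbf{k}$ yields $P_0 = \mathbf{r}(u_0, v_0)$
with $u_0 = \pi/4$ and $v_0 = 1$, as well as

\[
\partial_u \mathbf{r}(u, v) = -R\sin u\,\mathbf{i} + R\cos u\,\mathbf{j} , \quad \partial_v \mathbf{r}(u, v) = \mathbf{k} .
\]

The tangent plane for Σ at $P_0$ is, therefore, spanned by the vectors

\[
\partial_u \mathbf{r}\left( \frac{\pi}{4}, 1 \right) = -\frac{R}{\sqrt{2}}\,\mathbf{i} + \frac{R}{\sqrt{2}}\,\mathbf{j} , \quad
\partial_v \mathbf{r}\left( \frac{\pi}{4}, 1 \right) = \mathbf{k} .
\]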

Let P0 be a point on a surface Σ. Any vector which is perpendicular to the tangent
plane for Σ at P0 is called a normal vector (or simply a normal) for Σ at P0 , or a
vector normal to Σ at P0 . Note that any scalar multiple of a normal vector is again
a normal vector. A normal vector of length 1 is called a unit normal. Unit normals
are commonly denoted by n.
Since the vector product x × y of two vectors is perpendicular to both vectors x
and y, we see that the vector

N(P0 ) = ∂u r(u 0 , v0 ) × ∂v r(u 0 , v0 )

is a normal vector for Σ at P0 = r(u 0 , v0 ).
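As an illustration of this construction, the following sketch computes the normal vector $\partial_u \mathbf{r} \times \partial_v \mathbf{r}$ for the side of the cylinder from Example 9.21(b). The radius `R = 2`, the parameter values and the finite-difference step are assumptions made here for the purpose of the example.

```python
import math

R = 2.0  # hypothetical radius of the cylinder

def r(u, v):
    """Parametrization of the side of the cylinder, as in Example 9.21(b)."""
    return (R * math.cos(u), R * math.sin(u), v)

def partials(u, v, h=1e-6):
    """Central-difference approximations of the tangent vectors d_u r and d_v r."""
    ru = tuple((a - b) / (2.0 * h) for a, b in zip(r(u + h, v), r(u - h, v)))
    rv = tuple((a - b) / (2.0 * h) for a, b in zip(r(u, v + h), r(u, v - h)))
    return ru, rv

def cross(a, b):
    """Vector product a x b in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u0, v0 = math.pi / 4.0, 1.0
ru, rv = partials(u0, v0)
N = cross(ru, rv)   # normal vector; for the cylinder it points radially outward

print(N, dot(N, ru), dot(N, rv))
```

For this surface the normal is horizontal and radial, $\mathbf{N} = R\cos u_0\,\mathbf{i} + R\sin u_0\,\mathbf{j}$, as one expects geometrically.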


Let us return to the special situation where Σ is given as the graph of a (continu-
ously differentiable) function S,

r(x, y) = (x, y, S(x, y)) = xi + yj + S(x, y)k , (x, y) ∈ D . (9.71)

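In this case, the vectors

\[
\begin{aligned}
\partial_x \mathbf{r}(x_0, y_0) &= \mathbf{i} + \partial_x S(x_0, y_0)\,\mathbf{k} , \\
\partial_y \mathbf{r}(x_0, y_0) &= \mathbf{j} + \partial_y S(x_0, y_0)\,\mathbf{k} ,
\end{aligned} \tag{9.72}
\]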

span the tangent plane at P0 = (x0 , y0 , S(x0 , y0 )). Note that the parametrization
(9.71) is regular, since the vectors in (9.72) are not parallel. The vector

N(P0 ) = ∂x r(x0 , y0 ) × ∂ y r(x0 , y0 ) = −∂x S(x0 , y0 )i − ∂ y S(x0 , y0 )j + k (9.73)

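is normal to Σ at $P_0$. A corresponding unit normal is given by

\[
\mathbf{n} = \frac{1}{\nu}\, \partial_x \mathbf{r}(x_0, y_0) \times \partial_y \mathbf{r}(x_0, y_0)
= \frac{1}{\nu} \left( -\partial_x S(x_0, y_0)\,\mathbf{i} - \partial_y S(x_0, y_0)\,\mathbf{j} + \mathbf{k} \right) , \tag{9.74}
\]

where

\[
\nu = \sqrt{1 + (\partial_x S(x_0, y_0))^2 + (\partial_y S(x_0, y_0))^2} .
\]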

As we know from geometry, any plane in the space R3 can be described by an equation
involving a vector perpendicular to it. In our case, this means that the tangent plane
to Σ at P0 is equal to the set of all points (x, y, z) which satisfy the equation

− ∂x S(x0 , y0 ) · (x − x0 ) − ∂ y S(x0 , y0 ) · (y − y0 ) + (z − z 0 ) = 0 , (9.75)

or alternatively

z − z 0 = ∂x S(x0 , y0 ) · (x − x0 ) + ∂ y S(x0 , y0 ) · (y − y0 ) . (9.76)
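Formulas (9.74) and (9.76) can be tried out on a concrete graph. The sketch below is a minimal illustration; the paraboloid $S(x, y) = x^2 + y^2$ and the point (1, 2) are hypothetical choices made here and do not come from the text.

```python
import math

def S(x, y):
    """Hypothetical surface z = S(x, y); a paraboloid chosen only for illustration."""
    return x * x + y * y

def S_x(x, y):
    return 2.0 * x

def S_y(x, y):
    return 2.0 * y

def tangent_plane(x0, y0):
    """Return the function (x, y) -> z describing the tangent plane (9.76) at (x0, y0)."""
    z0, a, b = S(x0, y0), S_x(x0, y0), S_y(x0, y0)
    return lambda x, y: z0 + a * (x - x0) + b * (y - y0)

def unit_normal(x0, y0):
    """Unit normal (9.74): (-S_x, -S_y, 1) divided by nu."""
    a, b = S_x(x0, y0), S_y(x0, y0)
    nu = math.sqrt(1.0 + a * a + b * b)
    return (-a / nu, -b / nu, 1.0 / nu)

plane = tangent_plane(1.0, 2.0)
n = unit_normal(1.0, 2.0)
gap = S(1.01, 2.01) - plane(1.01, 2.01)   # distance between surface and plane near the point

print(plane(1.0, 2.0), n, gap)
```

The gap between surface and tangent plane is of second order in the distance from (1, 2), which is the sense in which the plane is tangent.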

As an alternative to the parametric description (9.67), we can describe a surface as a
level set in the form

\[
\Sigma = \{ (x, y, z) : f(x, y, z) = c \} , \tag{9.77}
\]

where f is a scalar field and c a constant. For example,

\[
\Sigma = \{ (x, y, z) : x^2 + y^2 + z^2 = c \} , \quad c > 0 ,
\]

yields the sphere centered at the origin of radius $\sqrt{c}$. Let now q(t) = x(t)i + y(t)j +
z(t)k be any curve which lies in Σ, that is,

f (q(t)) = f (x(t), y(t), z(t)) = c . (9.78)

We differentiate both sides with respect to t and obtain from the chain rule that

\[
\partial_x f(\mathbf{q}(t))\,x'(t) + \partial_y f(\mathbf{q}(t))\,y'(t) + \partial_z f(\mathbf{q}(t))\,z'(t) = \nabla f(\mathbf{q}(t)) \cdot \mathbf{q}'(t) = 0 . \tag{9.79}
\]

Since the vector $\mathbf{q}'(t)$ is tangent to the surface Σ at the point q(t), (9.79) means that
the gradient of f at some point of the surface is perpendicular to any tangent vector
in that point, that is, $\nabla f(x_0, y_0, z_0)$ is normal to Σ at $P_0 = (x_0, y_0, z_0)$ whenever $P_0$
lies in Σ. This is consistent with the fact that ∇f, if nonzero, points in the direction
of the steepest increase (and −∇f in the direction of steepest decrease) of f, since
the tangent vectors point in directions where f, to first order, is constant.

9.4 Integration in Vector Fields

In this section, we discuss integrals of vector fields over curves and surfaces. Since
they describe important aspects of processes taking place in space and time, they
constitute another basic tool for solving problems of science and engineering, see
also Sect. 9.6. There are two subsections of this section dealing, respectively, with
line integrals and surface integrals.

9.4.1 Line Integrals

Suppose a curve C in R3 is given by the parametric equations

x = x(t) , y = y(t) , z = z(t) , for a ≤ t ≤ b. (9.80)

The resulting vector function r : [a, b] → R3 with

r(t) = (x(t), y(t), z(t)) = x(t)i + y(t)j + z(t)k , t ∈ [a, b] . (9.81)

is called a parametrization of the curve C.


We will think of C not only as a geometric locus of points (x(t), y(t), z(t)) but
also as having a direction or an orientation induced by its parametrization. The point
r(a) = (x(a), y(a), z(a)) is called the initial point, while r(b) = (x(b), y(b), z(b))
is called the terminal (or end) point. A curve is called closed if r(a) = r(b), that
is, if initial and terminal points coincide. In Fig. 9.8a and b we display curves as the
range of r, that is, as the set of all points r(t) = (x(t), y(t), z(t)) as t varies over the
parameter interval.
A curve C is called continuous if its parametrization r is a continuous function (or
equivalently, if its components in (9.80) are continuous functions) on the parameter
interval [a, b]. C is called differentiable if r is differentiable, and C is called smooth
if the derivative $\mathbf{r}'$ is a continuous function and the vectors $\mathbf{r}'(t)$ are nonzero for
all t. A continuous curve C is called piecewise smooth if the parameter interval
[a, b] admits a partition $a = t_0 < \dots < t_n = b$ such that C is smooth on every interval
$[t_{j-1}, t_j]$ of this partition.
Line Integrals of Scalar Fields. Let us first consider a straight rod of length L
which we think of as a one-dimensional object with mass density ρ. If L is measured
in centimeters and ρ in grams per centimeter, its total mass will be ρ L grams.
Now let us consider a curved rod as a curve C in space, whose points are given by
the parametrization r(t) = (x(t), y(t), z(t)) for t ∈ [a, b], and assume that its mass
density varies along the rod as a function of (x, y, z). Then the total mass of the rod
is given by the integral

\[
\int_a^b \rho(\mathbf{r}(t))\, \| \mathbf{r}'(t) \| \, dt . \tag{9.82}
\]

Fig. 9.8 (a) Curve between initial and terminal point, (b) closed curve

Indeed, a limit passage like the one explained in Sect. 7.2 (based on an approximation
of the rod consisting of straight pieces) shows that (9.82) is the correct formula to
compute the total mass of the rod. This motivates the following definition.

Definition 9.14 (Line Integral of a Scalar Field) Let C be a smooth curve given by
(9.81), let f be a continuous scalar field whose domain contains the curve C. Then
the line integral of f over C is denoted by $\int_C f \, ds$ and defined by

\[
\int_C f \, ds = \int_a^b f(\mathbf{r}(t))\, \| \mathbf{r}'(t) \| \, dt . \tag{9.83}
\]

Example 9.23 Evaluate the line integral

\[
\int_C (x + y) \, ds ,
\]

where C is given by x(t) = y(t) = t, $z(t) = t^2$ for 0 ≤ t ≤ 2.

Solution: Since $\mathbf{r}(t) = (x(t), y(t), z(t)) = (t, t, t^2)$, we get

\[
\mathbf{r}'(t) = (x'(t), y'(t), z'(t)) = (1, 1, 2t) ,
\]

\[
\| \mathbf{r}'(t) \| = \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2} = \sqrt{1 + 1 + (2t)^2} = \sqrt{2 + 4t^2} .
\]

For f(x, y, z) = x + y we get $f(x(t), y(t), z(t)) = f(t, t, t^2) = t + t = 2t$, so

\[
\int_C (x + y) \, ds = \int_0^2 2t \sqrt{2 + 4t^2} \, dt = \left[ \frac{1}{6} (2 + 4t^2)^{3/2} \right]_0^2 = \frac{26\sqrt{2}}{3} .
\]
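Formula (9.83) reduces a line integral to an ordinary integral, so any quadrature rule can be used to check a result such as the one in Example 9.23. The following sketch is an illustration only; it assumes a composite midpoint rule with `n` subintervals, and the function name `line_integral_scalar` is chosen here.

```python
import math

def line_integral_scalar(f, r, r_prime, a, b, n=2000):
    """Approximate int_C f ds = int_a^b f(r(t)) ||r'(t)|| dt with the composite midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        speed = math.sqrt(sum(c * c for c in r_prime(t)))   # ||r'(t)||
        total += f(*r(t)) * speed
    return total * h

# Data of Example 9.23
r = lambda t: (t, t, t * t)
r_prime = lambda t: (1.0, 1.0, 2.0 * t)
f = lambda x, y, z: x + y

approx = line_integral_scalar(f, r, r_prime, 0.0, 2.0)
exact = 26.0 * math.sqrt(2.0) / 3.0

# With f = 1 the same routine returns the length of the curve.
length = line_integral_scalar(lambda x, y, z: 1.0, r, r_prime, 0.0, 2.0)

print(approx, exact, length)
```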

In the special case f = 1, the line integral (9.83) equals the length of the curve C.
Therefore, the function

\[
s(t) = \int_a^t \sqrt{x'(\tau)^2 + y'(\tau)^2 + z'(\tau)^2} \, d\tau = \int_a^t \| \mathbf{r}'(\tau) \| \, d\tau \tag{9.84}
\]

yields the length of C from the initial point r(a) up to the point r(t); it is called
the arc length of the curve C; compare (7.7) for the corresponding situation in
two dimensions. From (9.84) we see that $s'(t) = \| \mathbf{r}'(t) \|$. This explains the notation
$\int_C f \, ds$ in (9.83), where “ds” stands for “$s'(t)\,dt$” or “$\| \mathbf{r}'(t) \| \, dt$”. Accordingly, “ds”
is termed the line element.
Line Integrals of Vector Fields. We consider a vector field F with components f 1 ,
f 2 , and f 3 ,

F(x, y, z) = f 1 (x, y, z)i + f 2 (x, y, z)j + f 3 (x, y, z)k , (9.85)

whose domain contains the curve C, that is, the points r(t) belong to the domain of
F for all t ∈ [a, b].

Definition 9.15 (Line Integral of a Vector Field) Let C be a smooth curve given by
(9.81), let F be a continuous vector field of the form (9.85) whose domain contains
the curve C. Then the line integral of F over C is denoted by $\int_C \mathbf{F} \cdot d\mathbf{r}$ and defined
by

\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt . \tag{9.86}
\]

When C is closed, one also writes

\[
\oint_C \mathbf{F} \cdot d\mathbf{r}
\]

instead of $\int_C \mathbf{F} \cdot d\mathbf{r}$. In this case, the value of the line integral is also called the
circulation of F around C.

We give some explanations concerning the integral on the right-hand side of (9.86).
The integrand is a real-valued function of the single variable t, obtained as the scalar
product of the vector F(r(t)) (the value of the vector field at the curve point r(t))
and the vector $\mathbf{r}'(t)$ (a vector tangent to the curve at that point). If we expand the
scalar product into components according to its definition, we obtain, since
$\mathbf{r}'(t) = (x'(t), y'(t), z'(t))$,

\[
\begin{aligned}
\int_C \mathbf{F} \cdot d\mathbf{r}
&= \int_a^b \left[ f_1(\mathbf{r}(t))\,x'(t) + f_2(\mathbf{r}(t))\,y'(t) + f_3(\mathbf{r}(t))\,z'(t) \right] dt \\
&= \int_a^b \left[ f_1(x(t), y(t), z(t))\,x'(t) + f_2(x(t), y(t), z(t))\,y'(t) + f_3(x(t), y(t), z(t))\,z'(t) \right] dt .
\end{aligned} \tag{9.87}
\]

One may ask why the line integral $\int_C \mathbf{F} \cdot d\mathbf{r}$ is defined in this way. In fact, it expresses
certain quantities which are important in applications, for example, the mechanical
work done by a force field F, moving a point mass along the curve C from the initial
to the terminal point. This will be elaborated in more detail in Sect. 9.6.2.

Example 9.24 Let the curve C be given by r(t) = (cos t)i − (sin t)j + tk on the
interval 0 ≤ t ≤ π. Evaluate the integral $\int_C \mathbf{F} \cdot d\mathbf{r}$, where F(x, y, z) = yi − xj +
2k.

Solution: We use formula (9.86), or (9.87). In order to form F(r(t)), we have to
insert cos t for x, − sin t for y and t for z. This gives F(r(t)) = (− sin t, − cos t, 2).
Moreover, $\mathbf{r}'(t) = (-\sin t, -\cos t, 1)$. Computing the scalar product of those two
vectors and integrating, we get

\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_0^\pi \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = \int_0^\pi \left( \sin^2 t + \cos^2 t + 2 \right) dt = \pi + 2\pi = 3\pi .
\]
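Definition (9.86) can be evaluated numerically in the same way as the scalar case. The sketch below is an illustration under stated assumptions: it uses a composite midpoint rule, the function name `line_integral_vector` is chosen here, and the derivative of the parametrization is supplied by hand.

```python
import math

def line_integral_vector(F, r, r_prime, a, b, n=2000):
    """Approximate int_C F . dr = int_a^b F(r(t)) . r'(t) dt with the composite midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        field_value = F(*r(t))        # F evaluated at the curve point r(t)
        tangent = r_prime(t)          # tangent vector r'(t)
        total += sum(p * q for p, q in zip(field_value, tangent))
    return total * h

# Data of Example 9.24
curve = lambda t: (math.cos(t), -math.sin(t), t)
curve_prime = lambda t: (-math.sin(t), -math.cos(t), 1.0)
F = lambda x, y, z: (y, -x, 2.0)

value = line_integral_vector(F, curve, curve_prime, 0.0, math.pi)
print(value, 3.0 * math.pi)
```

Here the integrand is the constant 3, so the quadrature is exact up to rounding.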

The line integral $\int_C \mathbf{F} \cdot d\mathbf{r}$ is also frequently denoted as

\[
\int_C f_1(x, y, z) \, dx + f_2(x, y, z) \, dy + f_3(x, y, z) \, dz , \tag{9.88}
\]

or shorter as

\[
\int_C f_1 \, dx + f_2 \, dy + f_3 \, dz .
\]

The definition of those expressions is, of course, the same as above in (9.86) or (9.87).
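If we formally replace “x” by “x(t)” and “dx” by “$x'(t)\,dt$” (and analogously for y
and z) in (9.88), we arrive at (9.87).

Example 9.25 Evaluate $\int_C xy \, dx - y\cos x \, dy$, where C is given by $x(t) = t^2$ and
y(t) = t on the interval −2 ≤ t ≤ 5.
Solution: We have $x'(t) = 2t$ and $y'(t) = 1$, therefore

\[
\begin{aligned}
\int_C xy \, dx - y\cos x \, dy
&= \int_{-2}^{5} \left( x y x' - y (\cos x) y' \right)(t) \, dt \\
&= \int_{-2}^{5} \left( t^2 \cdot t \cdot 2t - t\cos t^2 \right) dt
 = 2 \left[ \frac{t^5}{5} \right]_{-2}^{5} - \frac{1}{2} \left[ \sin t^2 \right]_{-2}^{5} \\
&= 2 \left( 625 + \frac{32}{5} \right) - \frac{1}{2} (\sin 25 - \sin 4) \\
&= \frac{6314}{5} - \frac{1}{2} (\sin 25 - \sin 4) .
\end{aligned}
\]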
Example 9.26 Find the circulation of the field F(x, y) = (−y + x)i + xj around the
circle $x^2 + y^2 = 1$.
Solution: The parametric equation of the given circle is

r(t) = cos t i + sin t j , 0 ≤ t ≤ 2π .

We have

\[
\mathbf{F}(\mathbf{r}(t)) = (-\sin t + \cos t)\,\mathbf{i} + \cos t\,\mathbf{j} , \quad \mathbf{r}'(t) = -\sin t\,\mathbf{i} + \cos t\,\mathbf{j} ,
\]

\[
\mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = (\sin^2 t - \sin t\cos t) + \cos^2 t = 1 - \sin t\cos t .
\]

Therefore, the circulation of F around the circle (denoted as C) becomes

\[
\oint_C \mathbf{F} \cdot d\mathbf{r} = \int_0^{2\pi} \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = \int_0^{2\pi} (1 - \sin t\cos t) \, dt = \left[ t - \frac{\sin^2 t}{2} \right]_0^{2\pi} = 2\pi .
\]
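The circulation can also be approximated without using the derivative of the parametrization, by replacing the circle with an inscribed polygon and summing F · Δr over its edges. This is a sketch of that idea, not the method of the text; the function name `circulation`, the number of edges, and the clockwise parametrization used for comparison are assumptions made here.

```python
import math

def circulation(F, curve, n=2000):
    """Sum F(midpoint of chord) . (chord vector) over a closed polygon inscribed in the curve.

    `curve` is assumed to be parametrized over [0, 2*pi] with curve(0) == curve(2*pi).
    """
    total = 0.0
    prev = curve(0.0)
    for k in range(1, n + 1):
        cur = curve(2.0 * math.pi * k / n)
        mid = tuple(0.5 * (p + q) for p, q in zip(prev, cur))
        total += sum(f * (q - p) for f, p, q in zip(F(*mid), prev, cur))
        prev = cur
    return total

F = lambda x, y: (-y + x, x)                              # field of Example 9.26
counterclockwise = lambda t: (math.cos(t), math.sin(t))   # orientation used in the example
clockwise = lambda t: (math.cos(t), -math.sin(t))         # same circle, opposite orientation

ccw = circulation(F, counterclockwise)
cw = circulation(F, clockwise)
print(ccw, cw, 2.0 * math.pi)
```

Traversing the circle in the opposite direction changes the sign of the result, which reflects the fact that the line integral depends on the orientation induced by the parametrization.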

The result of the following example will be used later in the proof of the theorem of
Green and Ostrogradski.
Example 9.27 Let C be the graph of a function y = h(x), where h : [a, b] → R
is differentiable, let F be a vector field whose second component is zero, that is,
F(x, y) = (f(x, y), 0) for some continuous real-valued function f. Show that

\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b f(x, h(x)) \, dx . \tag{9.89}
\]

Solution: We parametrize the curve C as r(t) = (t, h(t)). Then $\mathbf{r}'(t) = (1, h'(t))$ and

\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \left[ f(\mathbf{r}(t)) \cdot 1 + 0 \cdot h'(t) \right] dt = \int_a^b f(t, h(t)) \, dt ,
\]

so (9.89) is proved. (Recall that it does not play any role whether the integration
variable is denoted by “t” or by “x”.)

The following properties hold for the line
integral. Let F and G be continuous vector fields defined on some region in space
containing a curve C, which is given as the range of some vector function r as above.
Then

\[
\int_C (\mathbf{F} + \mathbf{G}) \cdot d\mathbf{r} = \int_C \mathbf{F} \cdot d\mathbf{r} + \int_C \mathbf{G} \cdot d\mathbf{r} , \tag{9.90}
\]

\[
\int_C \alpha \mathbf{F} \cdot d\mathbf{r} = \alpha \int_C \mathbf{F} \cdot d\mathbf{r} , \quad \alpha \text{ being any scalar.} \tag{9.91}
\]

This means that the line integral is linear with respect to the vector fields, that is,
the line integral of the sum of two vector fields (respectively, the scalar multiple of a
vector field) is equal to the sum of the line integrals (respectively, the scalar multiple
of the line integral). These formulas are a direct consequence of the definition of the
line integral, using the corresponding properties of the ordinary integral.
For piecewise smooth curves, the line integral is evaluated on each piece sepa-
rately, as stated in the following definition.

Definition 9.16 (Line Integral, Piecewise Smooth Curves) Let C be a piecewise
smooth curve given by (9.81), let F be a continuous vector field of the form (9.85)
whose domain contains the curve C. Then the line integral of F over C is defined by

\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \sum_{i=1}^{n} \int_{t_{i-1}}^{t_i} \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt , \tag{9.92}
\]

where $a = t_0 < \dots < t_n = b$ is a partition of [a, b] such that C is smooth on every
interval $[t_{i-1}, t_i]$.

Path Independence, Potential Functions, and Conservative Fields

Definition 9.17 Let F be a vector field defined on an open region D in $\mathbb{R}^3$. If F = ∇ψ
for some scalar function ψ defined on D, then ψ is called a potential function (or
simply potential) for F in D, and the vector field F is called conservative.

It will be explained in Remark 9.10 why the term “conservative” is used here.

Theorem 9.4 Let F be a continuous vector field defined on an open region D in $\mathbb{R}^3$
which possesses a potential function ψ in D. Let C be a smooth curve in D with
initial point A and terminal point B. Then

\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \psi(B) - \psi(A) . \tag{9.93}
\]

The proof will be given in Appendix D.9.


Let us take a look at formula (9.93). Given the potential ψ, the value of the right-
hand side depends only on the points A and B. This means in particular that the
value of the line integral on the left-hand side does not change if we replace C by
a different curve C̃, as long as C̃ has the same initial and terminal point as C. This
property is called path independence. Theorem 9.4 thus states that for conservative
vector fields, line integrals are path independent.
We present another related definition.

Definition 9.18 A continuous vector field F defined on an open region D in $\mathbb{R}^3$ is
called circulation free on D if we have

\[
\oint_C \mathbf{F} \cdot d\mathbf{r} = 0 \tag{9.94}
\]

for every closed curve C in D.

When the curve C is closed, its initial and terminal points coincide, and the right-
hand side of (9.93) becomes zero. Therefore, a conservative field is circulation free
by Theorem 9.4. Actually, the converse is true, too, so the following theorem holds.

Theorem 9.5 (Closed-Loop Property of Conservative Fields) Let F be a continuous
vector field defined on an open region D in $\mathbb{R}^3$. Assume that D is connected, that is,
any two points in D can be connected by a smooth curve. The following statements
are equivalent:
1. F is conservative on D.
2. F is circulation free in D.

The proof will be given in Appendix D.9.


Another way of stating Theorem 9.5 would be to say that a vector field F possesses
a potential ψ in D if and only if the line integral over any closed curve in D is zero.

Example 9.28 Let ψ(x, y, z) = xyz, let C be a smooth curve with initial point
(−1, 6, 18) and terminal point (2, 12, −8). Find $\int_C \mathbf{F} \cdot d\mathbf{r}$, where F = ∇ψ.

Solution: Setting A = (−1, 6, 18) and B = (2, 12, −8) in Theorem 9.4, we get

\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \psi(B) - \psi(A) = 2 \cdot 12 \cdot (-8) - (-1) \cdot 6 \cdot 18 = -192 + 108 = -84 .
\]
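Path independence can be observed numerically for the data of Example 9.28. The sketch below is an illustration only: the two paths `straight` and `detour` are hypothetical curves from A to B chosen here, and the line integral is approximated by summing F · Δr over short chords.

```python
import math

def psi(x, y, z):
    """Potential of Example 9.28."""
    return x * y * z

def F(x, y, z):
    """F = grad(psi) = (yz, xz, xy)."""
    return (y * z, x * z, x * y)

A = (-1.0, 6.0, 18.0)
B = (2.0, 12.0, -8.0)

def straight(t):
    """Straight segment from A (t = 0) to B (t = 1)."""
    return tuple(a + t * (b - a) for a, b in zip(A, B))

def detour(t):
    """A bent path with the same endpoints; the bump vanishes at t = 0 and t = 1."""
    bump = 5.0 * math.sin(math.pi * t)
    x, y, z = straight(t)
    return (x + bump, y - bump, z + 2.0 * bump)

def line_integral(field, path, n=20000):
    """Sum field(midpoint of chord) . (chord vector) over n chords of the path on [0, 1]."""
    total = 0.0
    prev = path(0.0)
    for k in range(1, n + 1):
        cur = path(k / n)
        mid = tuple(0.5 * (p + q) for p, q in zip(prev, cur))
        total += sum(f * (q - p) for f, p, q in zip(field(*mid), prev, cur))
        prev = cur
    return total

along_straight = line_integral(F, straight)
along_detour = line_integral(F, detour)
potential_difference = psi(*B) - psi(*A)

print(along_straight, along_detour, potential_difference)
```

Both approximations agree with ψ(B) − ψ(A) up to the discretization error, as Theorem 9.4 predicts.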

9.4.2 Surface Integrals

From Sect. 9.3.3, we know how a surface Σ in space is described by a parametrization
r : D → Σ defined on some domain D of the plane. Our first task is to compute
the area of the surface. As an introduction to the general formula, we consider the
parallelogram with corners 0, x, y, and x + y, where x and y are two nonparallel
vectors in space. We know that its area equals ‖x × y‖. We interpret this expression
in terms of the parametrization

r(u, v) = ux + vy , Σ = r(D) , D = [0, 1] × [0, 1] .

Indeed, since
∂u r(u, v) = x , ∂v r(u, v) = y ,

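are constant as functions of u and v, we have

\[
\| \mathbf{x} \times \mathbf{y} \| = \int_0^1 \int_0^1 \| \mathbf{x} \times \mathbf{y} \| \, du \, dv = \iint_D \| \partial_u \mathbf{r}(u, v) \times \partial_v \mathbf{r}(u, v) \| \, dA ,
\]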

so we have expressed the area in terms of a two-dimensional integral. If we just want
to determine the area of a parallelogram, there is no need for such a complicated
formula, but for a general surface, this is just the way to go.

Definition 9.19 (Area of a Surface) Let Σ be a smooth surface given as Σ = r(D)
with a regular parametrization r. The area of Σ is defined as

\[
A(\Sigma) = \iint_D \| \partial_u \mathbf{r}(u, v) \times \partial_v \mathbf{r}(u, v) \| \, dA . \tag{9.95}
\]

Since r is regular, the integral on the right-hand side is well defined as the limit
of the corresponding Riemann sums, according to Sect. 8.7. Those sums can be
interpreted as the area of small parallelograms which approximate corresponding
portions of Σ. (We do not carry out the details, which in fact would be rather
cumbersome.) Thus, (9.95) is a natural definition of the area of an arbitrary curved
surface.
The most important special case arises when Σ is the graph of a function S.

Theorem 9.6 Let Σ be a smooth surface given as z = S(x, y), where S is a continuously
differentiable function defined on some domain D of the plane. Then we
have

\[
A(\Sigma) = \iint_D \sqrt{1 + \partial_x S(x, y)^2 + \partial_y S(x, y)^2} \, dA . \tag{9.96}
\]

Verification: From Sect. 9.3.3, we know that with r(x, y) = xi + yj + S(x, y)k we
obtain
∂x r(x, y) × ∂ y r(x, y) = −∂x S(x, y)i − ∂ y S(x, y)j + k .

Taking the length of this vector, we see that (9.96) is indeed a special case of (9.95).
Formulas (9.95) and (9.96) are analogous to the formula for the length of a curve
as the integral over the length of tangent vectors. Similarly, the definition of the
surface integral of a scalar function f defined on Σ is analogous to that of a line
integral as given in Definition 9.14.
Definition 9.20 (Surface Integral) Let Σ be a smooth surface, given as Σ = r(D)
with a regular parametrization r. Let f be a continuous scalar function defined on Σ.
Then the surface integral of f over the surface Σ is denoted by $\iint_\Sigma f(x, y, z) \, d\sigma$
or simply $\iint_\Sigma f \, d\sigma$, and defined by

\[
\iint_\Sigma f \, d\sigma = \iint_\Sigma f(x, y, z) \, d\sigma
= \iint_D f(\mathbf{r}(u, v))\, \| \partial_u \mathbf{r}(u, v) \times \partial_v \mathbf{r}(u, v) \| \, dA . \tag{9.97}
\]

For the case where Σ is the graph of a function S, we obtain the following formula
for the surface integral from (9.97) in the same manner as we obtained the area in
Theorem 9.6 from Definition 9.19.
Theorem 9.7 Let Σ be a smooth surface given as z = S(x, y), where S is a contin-
uously differentiable function defined on some domain D of the (x, y)-plane. Let f
be a scalar continuous function defined on Σ. Then we have
  2  2
 
∂S ∂S
f dσ = f (x, y, S(x, y)) 1 + (x, y) + (x, y) d A .
Σ D ∂x ∂y
(9.98)

 9.3 (i) In the special case f = 1 we obtain the area of the surface, A(Σ) =
Remark
Σ 1 dσ , as one sees from the definitions.
(ii) For a given surface Σ, it is possible to choose different parametrizations r of
it. One can prove that this does not change the value of the area, respectively,
of the surface integral.
(iii) Let a surface Σ consist of smooth components Σ1 , Σ2 , . . . , Σn which are mutu-
ally disjoint or intersect each other only in a set of zero area, for example, along
a curve. Such a surface is called piecewise smooth. The surface integral over
Σ is then defined as
\[
\int_\Sigma f\, d\sigma = \int_{\Sigma_1} f\, d\sigma + \cdots + \int_{\Sigma_n} f\, d\sigma . \tag{9.99}
\]

For example, the boundary Σ of a cube is not smooth (the normal vector jumps
across its edges), but it is piecewise smooth, since it consists of six square pieces
which are smooth.
(iv) The formal expression “$d\sigma = \|\partial_u \mathbf{r}(u, v) \times \partial_v \mathbf{r}(u, v)\|\, dA$”, respectively
“$d\sigma = \sqrt{1 + \partial_x S(x, y)^2 + \partial_y S(x, y)^2}\, dA$”, is often called the surface element.

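Example 9.29 Evaluate the surface integral $\int_\Sigma f\, d\sigma$, where $f(x, y, z) = y^2$ and
Σ is the part of the plane z = x given by 0 ≤ x ≤ 2, 0 ≤ y ≤ 4.

Solution: We have S(x, y) = x, defined on D = [0, 2] × [0, 4], so
\[
\sqrt{1 + \left(\frac{\partial S}{\partial x}(x, y)\right)^2 + \left(\frac{\partial S}{\partial y}(x, y)\right)^2} = \sqrt{1 + 1^2 + 0^2} = \sqrt{2} ,
\]
and therefore
\[
\int_\Sigma f\, d\sigma = \int_D y^2 \sqrt{2}\; dA = \int_0^2 \int_0^4 y^2 \sqrt{2}\; dy\, dx
= \sqrt{2} \int_0^2 dx \cdot \int_0^4 y^2\, dy = \frac{128}{3} \sqrt{2} .
\]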
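The value found in Example 9.29 can be cross-checked numerically. The following is a minimal sketch, not part of the original text: it approximates formula (9.98) on a rectangle with the midpoint rule. The function name `surface_integral_over_graph` and the grid size `n` are illustrative assumptions.

```python
import math

def surface_integral_over_graph(f, S, S_x, S_y, x_range, y_range, n=200):
    """Approximate (9.98) on a rectangle D with the midpoint rule."""
    (x0, x1), (y0, y1) = x_range, y_range
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            # surface element factor sqrt(1 + S_x^2 + S_y^2)
            factor = math.sqrt(1.0 + S_x(x, y) ** 2 + S_y(x, y) ** 2)
            total += f(x, y, S(x, y)) * factor
    return total * hx * hy

# Data of Example 9.29: f = y^2 on the plane z = x over [0, 2] x [0, 4].
approx = surface_integral_over_graph(
    f=lambda x, y, z: y ** 2,
    S=lambda x, y: x,
    S_x=lambda x, y: 1.0,
    S_y=lambda x, y: 0.0,
    x_range=(0.0, 2.0),
    y_range=(0.0, 4.0),
)
exact = 128.0 * math.sqrt(2.0) / 3.0  # closed-form value from the example
print(approx, exact)
```

The two printed numbers should agree to several digits; refining the grid reduces the remaining difference.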

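Example 9.30 Evaluate the surface integral $\int_\Sigma f\, d\sigma$, where $f(x, y, z) = z^2$, and
Σ is the portion of the boundary surface of the vertical cone $z = \sqrt{x^2 + y^2}$ which
lies between the planes z = 1 and z = 2.

Solution: We present two different approaches. For the first one, we use a parametrization
adapted to the cone, namely,
\[
\mathbf{r}(u, v) = u \cos v\, \mathbf{i} + u \sin v\, \mathbf{j} + u\, \mathbf{k} ,
\]
so Σ = r(D) with D = {(u, v) : 1 ≤ u ≤ 2 , 0 ≤ v ≤ 2π}. In order to use Definition 9.20
we compute
\[
\partial_u \mathbf{r}(u, v) = \cos v\, \mathbf{i} + \sin v\, \mathbf{j} + \mathbf{k} , \qquad
\partial_v \mathbf{r}(u, v) = -u \sin v\, \mathbf{i} + u \cos v\, \mathbf{j} .
\]
We get
\[
(\partial_u \mathbf{r} \times \partial_v \mathbf{r})(u, v)
= (0 - u \cos v)\, \mathbf{i} - (0 + u \sin v)\, \mathbf{j} + (u \cos^2 v + u \sin^2 v)\, \mathbf{k}
= -u \cos v\, \mathbf{i} - u \sin v\, \mathbf{j} + u\, \mathbf{k}
\]
and moreover
\[
\|(\partial_u \mathbf{r} \times \partial_v \mathbf{r})(u, v)\| = \sqrt{u^2 \cos^2 v + u^2 \sin^2 v + u^2} = \sqrt{2u^2} = u \sqrt{2} .
\]
Inserting this into (9.97), we get, since $f(\mathbf{r}(u, v)) = u^2$,
\[
\int_\Sigma f\, d\sigma = \int_D u^2 \cdot u \sqrt{2}\; dA = \sqrt{2} \int_0^{2\pi} \int_1^2 u^3\, du\, dv = \frac{15}{2} \pi \sqrt{2} .
\]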

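For the second approach, we use Theorem 9.7 with $S(x, y) = \sqrt{x^2 + y^2}$ and the
annular domain $D = \{(x, y) : 1 \le \sqrt{x^2 + y^2} \le 2\}$. We have
$f(x, y, S(x, y)) = S(x, y)^2 = x^2 + y^2$, so
\[
\int_\Sigma f\, d\sigma = \int_D (x^2 + y^2) \sqrt{1 + \left(\frac{\partial S}{\partial x}(x, y)\right)^2 + \left(\frac{\partial S}{\partial y}(x, y)\right)^2}\; dA .
\]
The partial derivatives of S are
\[
\frac{\partial S}{\partial x}(x, y) = \frac{x}{\sqrt{x^2 + y^2}} , \qquad
\frac{\partial S}{\partial y}(x, y) = \frac{y}{\sqrt{x^2 + y^2}} ,
\]
so
\[
\sqrt{1 + \left(\frac{\partial S}{\partial x}(x, y)\right)^2 + \left(\frac{\partial S}{\partial y}(x, y)\right)^2}
= \sqrt{1 + \frac{x^2}{x^2 + y^2} + \frac{y^2}{x^2 + y^2}} = \sqrt{2} .
\]
This yields
\[
\int_\Sigma f\, d\sigma = \int_D (x^2 + y^2) \sqrt{2}\; dA .
\]
Since D is an annular region, it is best to evaluate this two-dimensional integral
in polar coordinates. According to Theorem 8.9, we get with
G = {(r, θ) : 1 ≤ r ≤ 2 , 0 ≤ θ ≤ 2π} that
\[
\int_\Sigma f\, d\sigma = \sqrt{2} \int_D (x^2 + y^2)\, dA = \sqrt{2} \int_G (r^2 \cos^2\theta + r^2 \sin^2\theta)\, r\; dA
= \sqrt{2} \int_0^{2\pi} \int_1^2 r^3\, dr\, d\theta = \frac{15}{2} \pi \sqrt{2} ,
\]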

which is the same as above.
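Definition 9.20 also lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions, not the book's method: it approximates (9.97) for a general parametrization by estimating the partial derivatives of r with central differences and summing over a midpoint grid. The helper names `cross`, `surface_integral`, and `cone`, and the step size `h`, are hypothetical choices.

```python
import math

def cross(a, b):
    """Vector product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_integral(f, r, u_range, v_range, n=200, h=1e-6):
    """Midpoint-rule approximation of (9.97) for a parametrization r(u, v)."""
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            # central differences for the partial derivatives of r
            r_u = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
            r_v = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
            normal = cross(r_u, r_v)
            total += f(*r(u, v)) * math.sqrt(sum(c * c for c in normal))
    return total * du * dv

def cone(u, v):
    """Parametrization of the cone used in Example 9.30."""
    return (u * math.cos(v), u * math.sin(v), u)

approx = surface_integral(lambda x, y, z: z ** 2, cone,
                          (1.0, 2.0), (0.0, 2.0 * math.pi))
exact = 15.0 * math.pi * math.sqrt(2.0) / 2.0
print(approx, exact)
```

Because only the length of the cross product enters, the result does not depend on the sign of the normal vector.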



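Example 9.31 Evaluate the surface integral $\int_\Sigma f\, d\sigma$, where $f(x, y, z) = x^2$ and
Σ is the upper half of the sphere $x^2 + y^2 + z^2 = a^2$.

Solution: First we apply Theorem 9.7. We have $S(x, y) = \sqrt{a^2 - x^2 - y^2}$ and
\[
\frac{\partial S}{\partial x}(x, y) = -\frac{x}{\sqrt{a^2 - x^2 - y^2}} , \qquad
\frac{\partial S}{\partial y}(x, y) = -\frac{y}{\sqrt{a^2 - x^2 - y^2}} .
\]
Thus
\[
\sqrt{1 + \left(\frac{\partial S}{\partial x}(x, y)\right)^2 + \left(\frac{\partial S}{\partial y}(x, y)\right)^2} = \frac{a}{\sqrt{a^2 - x^2 - y^2}} .
\]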

The integration domain becomes the circular disk $D = \{(x, y) : 0 \le x^2 + y^2 \le a^2\}$,
and therefore,
\[
\int_\Sigma f\, d\sigma = \int_D x^2\, \frac{a}{\sqrt{a^2 - x^2 - y^2}}\; dA .
\]
Again, this two-dimensional integral is best evaluated in polar coordinates, and
Theorem 8.9 yields
\[
\int_\Sigma f\, d\sigma = \int_0^{2\pi} \int_0^a r^2 \cos^2\theta \cdot \frac{a}{\sqrt{a^2 - r^2}}\; r\, dr\, d\theta
= a \int_0^{2\pi} \cos^2\theta\, d\theta \cdot \int_0^a \frac{r^3}{\sqrt{a^2 - r^2}}\, dr .
\]
The rightmost integral can be evaluated with the substitution r = a sin t,
dr = a cos t dt, and (we omit the details)
\[
\int_0^a \frac{r^3}{\sqrt{a^2 - r^2}}\, dr = a^3 \int_0^{\pi/2} \sin^3 t\, dt = \frac{2}{3} a^3 .
\]
We finally obtain
\[
\int_\Sigma f\, d\sigma = a \int_0^{2\pi} \cos^2\theta\, d\theta \cdot \frac{2}{3} a^3 = \frac{2}{3} \pi a^4 .
\]
An alternative solution procedure arises from the use of spherical coordinates,
\[
\mathbf{r}(u, v) = a \sin u \cos v\, \mathbf{i} + a \sin u \sin v\, \mathbf{j} + a \cos u\, \mathbf{k} .
\]
Here Σ = r(D) with D = {(u, v) : 0 ≤ u ≤ π/2 , 0 ≤ v ≤ 2π}. A straightforward
but somewhat lengthy computation yields
\[
\|(\partial_u \mathbf{r} \times \partial_v \mathbf{r})(u, v)\| = a^2 \sin u .
\]
Since $f(x, y, z) = x^2$, we have $f(\mathbf{r}(u, v)) = a^2 \sin^2 u \cos^2 v$, and from (9.97) we
get
\[
\int_\Sigma f\, d\sigma = \int_D a^2 \sin^2 u \cos^2 v \cdot a^2 \sin u\; dA
= a^4 \int_0^{\pi/2} \sin^3 u\, du \cdot \int_0^{2\pi} \cos^2 v\, dv = \frac{2}{3} \pi a^4
\]

as before.
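The step whose details are omitted in Example 9.31, namely that the substituted integral equals (2/3)a³, can be checked numerically. The following sketch uses a composite Simpson rule; the function name `simpson` and the sample radius `a = 1.5` are assumptions made only for illustration.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

a = 1.5  # sample radius, chosen only for illustration

# a^3 * integral of sin^3 t over [0, pi/2], the form obtained after r = a sin t
radial = a ** 3 * simpson(lambda t: math.sin(t) ** 3, 0.0, math.pi / 2)
radial_exact = 2.0 * a ** 3 / 3.0

# integral of cos^2 over one full period
angular = simpson(lambda th: math.cos(th) ** 2, 0.0, 2.0 * math.pi)

value = a * angular * radial           # product form used in the example
exact = 2.0 * math.pi * a ** 4 / 3.0   # closed-form result (2/3) pi a^4
print(radial, radial_exact, value, exact)
```

Working with the substituted form avoids the integrable singularity of the original integrand at r = a, which a plain quadrature rule handles poorly.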

9.5 Fundamental Theorems of Vector Calculus

We discuss here three very important results of vector calculus. The first result bears
the names Green and Ostrogradski, in honor of the British scientist George Green
and of the Ukrainian mathematician Mikhail Ostrogradski. This theorem establishes
a relationship between a double integral over a domain in the plane and a line integral
over the boundary of that domain. The second result which we present here is known
as the divergence theorem of Gauss in honor of the German mathematician and
scientist Carl Friedrich Gauss. This theorem relates a volume (triple) integral over
a region in space to a surface integral over the boundary surface of that region. The
third theorem is known as Stokes’ theorem, named for Sir George Gabriel Stokes, an
Irish mathematician who worked at Cambridge University. This theorem connects a
surface integral to a line integral over the boundary curve of the surface.
Thus, all those theorems relate integrals over objects of different (adjacent) dimen-
sions. In some sense, they can be viewed as extensions of the fundamental theorem of
calculus; recall that the latter relates an integral over an interval (a one-dimensional
object) to function values on the boundary (a zero-dimensional object).
There are numerous applications of these theorems in science and engineering; some
of them will be described in the next section.

9.5.1 The Theorem of Green and Ostrogradski

Let C be a piecewise smooth curve in the plane R2 parametrized as r(t) = (x(t), y(t))
for a ≤ t ≤ b. Let C be closed, that is, r(b) = r(a) holds for the initial and terminal
points. The curve C is called positively oriented if r(t) traverses C counterclockwise
(anticlockwise) as t varies from a to b. If r(t) traverses C clockwise, then C is said
to be negatively oriented.
Example 9.32 The parametrization r(t) = (cos t, sin t) traverses the unit circle C
anticlockwise as t varies from 0 to 2π, and hence C is positively oriented by r. On the
other hand, the parametrization r(t) = (− cos t, sin t) defines a negatively oriented
curve C (whose graph is again the unit circle), since it traverses C clockwise.
A non-closed curve C in the plane is called simple if it does not intersect itself, that
is, if it has a parametrization r : [a, b] → R2 which is one-to-one, so r(t1 ) = r(t2 )
can hold only if t1 = t2 (recall Definition 1.12). If we imagine the graph of a curve
as a train track, this means that the track does not cross itself. A closed curve is
called simple if it does not intersect itself except for the initial and final point, that
is, r(t1) = r(t2) holds for t1 < t2 only if t1 = a and t2 = b.
Recall that by $\int_C \mathbf{F} \cdot d\mathbf{r}$ we denote the line integral of a vector field F = (f, g)
over a closed curve C, and also write it as
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_C f(x, y)\, dx + g(x, y)\, dy .
\]

Theorem 9.8 (Green–Ostrogradski) Let D be a bounded domain in the plane whose
boundary C is a closed, simple, and positively oriented curve. Let F = (f, g) be a
vector field whose components are continuously differentiable. Then
\[
\int_C f(x, y)\, dx + g(x, y)\, dy = \int_D \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right) dA . \tag{9.100}
\]

The proof of this theorem will be given in Appendix D.6.

Remark 9.4
(i) The theorem relates a line integral, which is one-dimensional, to an integral over
a 2-dimensional region.
(ii) The theorem is used for theoretical purposes as well as for the computation of
specific integrals. In particular, it may happen that one of the integrals is much
easier to calculate directly than the other one.

Example 9.33 (a) A particle moves counterclockwise once around the triangle D
with vertices (0, 0), (4, 0), and (1, 6), under the influence of the force
$\mathbf{F}(x, y) = xy\, \mathbf{i} + x\, \mathbf{j}$. Calculate the work done by this force, if the units of
length and force are meters and newtons, respectively.
(b) Evaluate $\int_C \mathbf{F} \cdot d\mathbf{r}$, where $\mathbf{F}(x, y) = (x^2 - y)\, \mathbf{i} + (\cos 2y - e^{3y} + 4x)\, \mathbf{j}$ and C
is (the boundary of) any square with sides of length 5. Assume C is oriented
counterclockwise.

Solution: (a) Let us denote by C the curve formed by the three sides of the triangle. The
total work done by the force F equals the line integral $\int_C \mathbf{F} \cdot d\mathbf{r}$. We have F = (f, g)
with f(x, y) = xy, g(x, y) = x, so ∂f/∂y = x and ∂g/∂x = 1. From Theorem 9.8
we know that
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_D \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)(x, y)\, dA = \int_D (1 - x)\, dA .
\]
With the methods of Sect. 8.7, we compute the two-dimensional integral over the
triangular region D as a double integral,
\[
\int_D (1 - x)\, dA = \int_0^1 \int_0^{6x} (1 - x)\, dy\, dx + \int_1^4 \int_0^{8 - 2x} (1 - x)\, dy\, dx
= \int_0^1 6x(1 - x)\, dx + \int_1^4 (8 - 2x)(1 - x)\, dx = -8 .
\]
Therefore, the work done equals −8 Nm.


(b) We have F = (f, g) with $f(x, y) = x^2 - y$, $g(x, y) = \cos 2y - e^{3y} + 4x$, so
∂f/∂y = −1 and ∂g/∂x = 4. The two-dimensional region D is a square of side
length 5. We calculate
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_D \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)(x, y)\, dA = \int_D \big(4 - (-1)\big)\, dA = 5 \int_D dA .
\]
Since $\int_D dA$ equals the area of the square, which is equal to 25, we obtain
$\int_C \mathbf{F} \cdot d\mathbf{r} = 125$.
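Both results of Example 9.33 were obtained from the double integral. As a cross-check, the line integrals can be approximated directly. The sketch below is illustrative only: the function name `polygon_line_integral` and the particular placement of the square with corner at the origin are assumptions, since the example leaves the position of the square open.

```python
import math

def polygon_line_integral(F, vertices, n=2000):
    """Midpoint-rule line integral of F = (f, g) along the closed polygon
    that joins the given vertices in order and returns to the first one."""
    total = 0.0
    m = len(vertices)
    for k in range(m):
        (x0, y0), (x1, y1) = vertices[k], vertices[(k + 1) % m]
        dx, dy = x1 - x0, y1 - y0
        for i in range(n):
            t = (i + 0.5) / n
            f, g = F(x0 + t * dx, y0 + t * dy)
            total += (f * dx + g * dy) / n
    return total

# (a) triangle with vertices (0,0), (4,0), (1,6), traversed counterclockwise
work = polygon_line_integral(lambda x, y: (x * y, x), [(0, 0), (4, 0), (1, 6)])

# (b) one square of side length 5 (its position is an arbitrary choice)
field_b = lambda x, y: (x ** 2 - y, math.cos(2 * y) - math.exp(3 * y) + 4 * x)
circulation_b = polygon_line_integral(field_b, [(0, 0), (5, 0), (5, 5), (0, 5)])

print(work, circulation_b)  # compare with -8 and 125 from the example
```

In part (b) the large exponential contributions along the two vertical sides cancel, which is exactly why the double integral is the more convenient route there.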

Example 9.34 (a) Use the Green–Ostrogradski theorem to evaluate the line integral
$\int_C y^2\, dx + x^2\, dy$, where C is the boundary of the square with vertices (0, 0), (1, 0),
(0, 1), and (1, 1), oriented counterclockwise. Check the answer by evaluating
the line integral directly.
(b) Do the same for $\int_C xy\, dx + (y + x)\, dy$, where C is the unit circle $x^2 + y^2 = 1$,
oriented counterclockwise.
(c) Verify the validity of the Green–Ostrogradski theorem for the vector field
$\mathbf{F} = 2y\, \mathbf{i} - x\, \mathbf{j}$ and the curve C taken as the circle of radius 4 with center (1, 3).

Solution: (a) Here $f(x, y) = y^2$, $g(x, y) = x^2$, so ∂f/∂y = 2y and ∂g/∂x = 2x,
moreover D = [0, 1] × [0, 1]. Using (9.100) we get
\[
\int_C f(x, y)\, dx + g(x, y)\, dy = \int_D \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)(x, y)\, dA = \int_D (2x - 2y)\, dA
= \int_0^1 \int_0^1 (2x - 2y)\, dy\, dx = 1 - 1 = 0 . \tag{9.101}
\]
For the line integral, we have to evaluate $\int \big(y(t)^2 x'(t) + x(t)^2 y'(t)\big)\, dt$ separately
along all four sides of the square, traversed by a suitable parametrization r(t) =
(x(t), y(t)). Along the side x = 0 we have x(t) = 0 = x'(t), and along the side
y = 0 we have y(t) = 0 = y'(t), so the corresponding line integrals are zero. The
side x = 1 is parametrized by r(t) = (x(t), y(t)) = (1, t) for 0 ≤ t ≤ 1, so the line
integral becomes
\[
\int_0^1 \big(t^2 \cdot 0 + 1^2 \cdot 1\big)\, dt = 1 .
\]
Analogously, the line integral along y = 1 yields the value −1, so the overall integral
gives 0 + 0 + 1 − 1 = 0, which is the same as the result in (9.101).
(b) We have f(x, y) = xy, g(x, y) = y + x, so ∂f/∂y = x and ∂g/∂x = 1, moreover
$D = \{(x, y) : 0 \le x^2 + y^2 \le 1\}$ is the unit disk. Thus with F = (f, g) we obtain
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_D \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)(x, y)\, dA = \int_D (1 - x)\, dA .
\]
Using polar coordinates we get, according to Theorem 8.9,
\[
\int_D (1 - x)\, dA = \int_0^{2\pi} \int_0^1 (1 - r \cos\theta)\, r\, dr\, d\theta = \int_0^{2\pi} \left(\frac{1}{2} - \frac{1}{3} \cos\theta\right) d\theta = \pi .
\]
In order to compute $\int_C \mathbf{F} \cdot d\mathbf{r}$ directly, we parametrize C by r(t) = (x(t), y(t)) with
x(t) = cos t, y(t) = sin t, where 0 ≤ t ≤ 2π. We obtain
\[
\begin{aligned}
\int_C xy\, dx + (y + x)\, dy &= \int_0^{2\pi} \big[x y x' + (y + x) y'\big](t)\, dt \\
&= \int_0^{2\pi} \big[\cos t \sin t\, (-\sin t) + (\sin t + \cos t) \cos t\big]\, dt \\
&= -\int_0^{2\pi} \cos t \sin^2 t\, dt + \int_0^{2\pi} \sin t \cos t\, dt + \int_0^{2\pi} \cos^2 t\, dt \\
&= 0 + 0 + \int_0^{2\pi} \cos^2 t\, dt = \pi ,
\end{aligned}
\]
recall that $\int_0^{2\pi} \cos^2 t\, dt = \pi$.
(c) We have F = (f, g) with f(x, y) = 2y, g(x, y) = −x, so ∂f/∂y = 2, ∂g/∂x =
−1, and D is the disk of radius 4 with center (1, 3). We get (A(D) denotes the area
of D)
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_D \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)(x, y)\, dA = \int_D -3\, dA
= -3 \cdot A(D) = -3 \cdot 4^2 \pi = -48\pi .
\]
To evaluate $\int_C \mathbf{F} \cdot d\mathbf{r}$ directly, we parametrize C by r(t) = (x(t), y(t)) with x(t) =
1 + 4 cos t, y(t) = 3 + 4 sin t, where 0 ≤ t ≤ 2π. We obtain
\[
\begin{aligned}
\int_C 2y\, dx - x\, dy &= \int_0^{2\pi} \big[2 y x' - x y'\big](t)\, dt \\
&= \int_0^{2\pi} \big(2(3 + 4 \sin t) \cdot (-4 \sin t) - (1 + 4 \cos t)\, 4 \cos t\big)\, dt \\
&= \int_0^{2\pi} \big(-24 \sin t - 32 \sin^2 t - 4 \cos t - 16 \cos^2 t\big)\, dt \\
&= 0 - 32\pi - 0 - 16\pi = -48\pi ,
\end{aligned}
\]
recall that $\int_0^{2\pi} \sin^2 t\, dt = \int_0^{2\pi} \cos^2 t\, dt = \pi$.
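Parts (b) and (c) of Example 9.34 compare both sides of (9.100) on a disk. A minimal numerical sketch of that comparison follows; the function names `circle_line_integral` and `disk_integral` and the grid sizes are illustrative assumptions, and the disk integral is evaluated in polar coordinates around the center of the disk.

```python
import math

def circle_line_integral(F, center, radius, n=2000):
    """Line integral of F = (f, g) along a counterclockwise circle."""
    cx, cy = center
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        f, g = F(cx + radius * math.cos(t), cy + radius * math.sin(t))
        # r'(t) = (-radius sin t, radius cos t)
        total += (-f * radius * math.sin(t) + g * radius * math.cos(t)) * h
    return total

def disk_integral(q, center, radius, n=200):
    """Integral of q over a disk, midpoint rule in polar coordinates."""
    cx, cy = center
    hr, ht = radius / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            total += q(cx + r * math.cos(t), cy + r * math.sin(t)) * r
    return total * hr * ht

# (b) F = (xy, y + x) on the unit circle, dg/dx - df/dy = 1 - x
lhs_b = circle_line_integral(lambda x, y: (x * y, y + x), (0.0, 0.0), 1.0)
rhs_b = disk_integral(lambda x, y: 1.0 - x, (0.0, 0.0), 1.0)
expected_b = math.pi

# (c) F = (2y, -x) on the circle of radius 4 around (1, 3), dg/dx - df/dy = -3
lhs_c = circle_line_integral(lambda x, y: (2.0 * y, -x), (1.0, 3.0), 4.0)
rhs_c = disk_integral(lambda x, y: -3.0, (1.0, 3.0), 4.0)
expected_c = -48.0 * math.pi

print(lhs_b, rhs_b, lhs_c, rhs_c)
```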

9.5.2 The Divergence Theorem of Gauss

We consider the following situation in the space $\mathbb{R}^3$. Let D be an open set which is
bounded by a surface Σ such that D lies “completely inside” Σ. For example, the
open unit ball $\{\mathbf{x} : \mathbf{x} \in \mathbb{R}^3, \|\mathbf{x}\| < 1\}$ is bounded by and lies completely inside the unit
sphere $\{\mathbf{x} : \mathbf{x} \in \mathbb{R}^3, \|\mathbf{x}\| = 1\}$; the same is true for the (full) cube whose boundary
surface consists of its 6 sides. In the case of the ball the boundary surface (the sphere)
is smooth, in the case of the cube the boundary surface is piecewise smooth.
At every point x of a smooth surface, normal vectors can be defined (see
Sect. 9.3.3). In the present situation, we can distinguish between outer normals
pointing away from D and inner normals pointing into D. Thus, while we have two
unit normals (normals whose length equals 1) at each point x ∈ Σ, there is exactly
one outer unit normal which we denote by n(x). In this way, we obtain a vector field
n, which is defined on the boundary surface Σ. As a consequence of our definition
of a smooth surface (Definition 9.13), the field n is continuous.
When the boundary surface consists of smooth pieces (as in the case of the cube),
the outer unit normal field n is continuous within each piece, but has jumps across
their connecting boundaries (the edges and corners, in the case of the cube) where it
is not defined.

Theorem 9.9 (Divergence Theorem of Gauss) Let D be a domain in $\mathbb{R}^3$ with
boundary Σ and outer unit normal field n as described above. Suppose that F is a vector
field with values in $\mathbb{R}^3$ whose components are continuous and have continuous first
partial derivatives in D and up to the boundary Σ. Then
\[
\int_\Sigma \mathbf{F} \cdot \mathbf{n}\, d\sigma = \int_D \operatorname{div} \mathbf{F}\, dV . \tag{9.102}
\]

Remark 9.5 (i) A proof will be given in Appendix D.7.


(ii) The theorem relates an integral over a two-dimensional surface to an integral over
a three-dimensional volume. The integral on the left side is a surface integral (see
Definition 9.20), the integrand being the scalar function $\mathbf{F} \cdot \mathbf{n} = f_1 n_1 + f_2 n_2 + f_3 n_3$,
where $f_j$ and $n_j$ are the component functions of the vector fields F and n. The integral
on the right side is a volume integral, which we have treated in Sect. 8.7, of the scalar
function $\operatorname{div} \mathbf{F} = \partial f_1/\partial x + \partial f_2/\partial y + \partial f_3/\partial z$.
(iii) Its interpretation and some applications will be discussed in Sect. 9.6. Indeed,
the divergence theorem is a fundamental tool in the analysis of partial differential
equations and of the phenomena described by them. Some identities useful for that
purpose will be presented below, following the examples.
(iv) Moreover, Eq. (9.102) is sometimes helpful when one wants to compute surface
or volume integrals, because the evaluation of one side may be more convenient than
the evaluation of the other side.

Example 9.35 For each of the following data calculate the right-hand side or the
left-hand side of Eq. (9.102), whichever is more convenient.
(i) $\mathbf{F}(x, y, z) = x\, \mathbf{i} + y\, \mathbf{j} - z\, \mathbf{k}$, Σ is the sphere of radius 4 centered at (1, 1, 1).
(ii) $\mathbf{F}(x, y, z) = x^3\, \mathbf{i} + y^3\, \mathbf{j} + z^3\, \mathbf{k}$, Σ is the sphere of radius 1 with center at the origin.
(iii) $\mathbf{F}(x, y, z) = x^2\, \mathbf{i} + y^2\, \mathbf{j} + z^2\, \mathbf{k}$, Σ is the boundary of the rectangular box bounded by
the coordinate planes x = 0, y = 0, z = 0 and the planes x = 6, y = 2 and z = 7.

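Solution: In all cases, D denotes the region enclosed by Σ. Recall that
\[
\operatorname{div} \mathbf{F} = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} , \quad \text{where } \mathbf{F} = f_1\, \mathbf{i} + f_2\, \mathbf{j} + f_3\, \mathbf{k} .
\]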

(i) We have $f_1(x, y, z) = x$, $f_2(x, y, z) = y$, $f_3(x, y, z) = -z$, so
\[
\operatorname{div} \mathbf{F}(x, y, z) = 1 + 1 - 1 = 1 .
\]
We evaluate the right-hand side of (9.102) as (here, vol(D) denotes the volume of
D, a ball of radius 4)
\[
\int_D \operatorname{div} \mathbf{F}\, dV = \int_D dV = \operatorname{vol}(D) = \frac{4}{3} \pi \cdot 4^3 = \frac{256}{3} \pi .
\]

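(ii) Here $f_1(x, y, z) = x^3$, $f_2(x, y, z) = y^3$, $f_3(x, y, z) = z^3$. So
\[
\frac{\partial f_1}{\partial x}(x, y, z) = 3x^2 , \quad \frac{\partial f_2}{\partial y}(x, y, z) = 3y^2 , \quad \frac{\partial f_3}{\partial z}(x, y, z) = 3z^2 ,
\qquad \operatorname{div} \mathbf{F}(x, y, z) = 3x^2 + 3y^2 + 3z^2 .
\]
The right-hand side of (9.102) becomes
\[
\int_D \operatorname{div} \mathbf{F}\, dV = 3 \int_D (x^2 + y^2 + z^2)\, dV .
\]
Using spherical coordinates, we have already evaluated this integral in Example 8.39,
so
\[
\int_D \operatorname{div} \mathbf{F}\, dV = 3 \cdot \frac{4}{5} \pi = \frac{12}{5} \pi .
\]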

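(iii) Here $\mathbf{F}(x, y, z) = x^2\, \mathbf{i} + y^2\, \mathbf{j} + z^2\, \mathbf{k}$, so
\[
\operatorname{div} \mathbf{F}(x, y, z) = 2x + 2y + 2z .
\]
Again we evaluate the right-hand side of (9.102), this time with Fubini’s theorem
applied to the rectangular box,
\[
\begin{aligned}
\int_D \operatorname{div} \mathbf{F}\, dV &= 2 \int_0^7 \int_0^2 \int_0^6 (x + y + z)\, dx\, dy\, dz \\
&= 2 \int_0^7 \int_0^2 \left[\frac{1}{2} x^2 + xy + xz\right]_{x=0}^{x=6} dy\, dz = \int_0^7 \int_0^2 (36 + 12y + 12z)\, dy\, dz \\
&= \int_0^7 \Big[36y + 6y^2 + 12yz\Big]_{y=0}^{y=2} dz = \int_0^7 (72 + 24 + 24z)\, dz \\
&= \Big[96z + 12z^2\Big]_{z=0}^{z=7} = 672 + 12 \cdot 49 = 1260 .
\end{aligned}
\]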
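Part (iii) evaluates only the right-hand side of (9.102). The sketch below, which is an illustration and not part of the original solution, also approximates the left-hand side by summing the outward flux through the six faces of the box, so that both sides can be compared. The function names `box_flux` and `box_volume_integral` are hypothetical.

```python
def box_flux(F, a, b, c, n=50):
    """Outward flux of F through the boundary of [0,a] x [0,b] x [0,c],
    midpoint rule on each of the six faces."""
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        for j in range(n):
            t = (j + 0.5) / n
            # faces x = a and x = 0, outer unit normals +i and -i
            total += (F(a, s * b, t * c)[0] - F(0.0, s * b, t * c)[0]) * (b * c) / n ** 2
            # faces y = b and y = 0, outer unit normals +j and -j
            total += (F(s * a, b, t * c)[1] - F(s * a, 0.0, t * c)[1]) * (a * c) / n ** 2
            # faces z = c and z = 0, outer unit normals +k and -k
            total += (F(s * a, t * b, c)[2] - F(s * a, t * b, 0.0)[2]) * (a * b) / n ** 2
    return total

def box_volume_integral(q, a, b, c, n=50):
    """Integral of the scalar function q over the box, midpoint rule."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * a / n
        for j in range(n):
            y = (j + 0.5) * b / n
            for k in range(n):
                z = (k + 0.5) * c / n
                total += q(x, y, z)
    return total * a * b * c / n ** 3

F = lambda x, y, z: (x ** 2, y ** 2, z ** 2)
div_F = lambda x, y, z: 2 * x + 2 * y + 2 * z

flux = box_flux(F, 6.0, 2.0, 7.0)
volume_side = box_volume_integral(div_F, 6.0, 2.0, 7.0)
print(flux, volume_side)  # both should be close to 1260
```

On each face only the component of F normal to that face contributes, which is why the flux reduces to 36·14 + 4·42 + 49·12.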

Fig. 9.9 The cone D

Example 9.36 Verify the statement of Gauss’ divergence theorem for the vector field
F(x, y, z) = xi + yj + zk and the cone whose interior is given by

D = {(x, y, z) : 0 < z < 1 , 0 < x 2 + y 2 < z 2 } .

Solution: The boundary surface Σ of D consists of two parts: see Fig. 9.9. Its lateral
part Σ1 is described by z = S(x, y) with $S(x, y) = \sqrt{x^2 + y^2}$ for $x^2 + y^2 \le 1$, its
upper part Σ2 is the flat circular disk described by $x^2 + y^2 \le 1$ lying in the plane
z = 1. The surface integral on the left-hand side of (9.102) thus decomposes into
two parts,
\[
\int_\Sigma \mathbf{F} \cdot \mathbf{n}\, d\sigma = \int_{\Sigma_1} \mathbf{F} \cdot \mathbf{n}_1\, d\sigma + \int_{\Sigma_2} \mathbf{F} \cdot \mathbf{n}_2\, d\sigma ,
\]
where n1 and n2 are the vector fields of outer unit normals to Σ1 and Σ2, respectively.
According to (9.73), the vector
\[
\mathbf{N}_1(x, y, z) = -\frac{\partial S}{\partial x}(x, y)\, \mathbf{i} - \frac{\partial S}{\partial y}(x, y)\, \mathbf{j} + \mathbf{k} = -\frac{x}{z}\, \mathbf{i} - \frac{y}{z}\, \mathbf{j} + \mathbf{k}
\]
is a normal vector to Σ1 at (x, y, z), where $z = \sqrt{x^2 + y^2}$. Since N1 points upward,
that is, into D, the corresponding outer unit normal n1 is then given by
\[
\mathbf{n}_1(x, y, z) = \frac{1}{\sqrt{2}} \left(\frac{x}{z}\, \mathbf{i} + \frac{y}{z}\, \mathbf{j} - \mathbf{k}\right) .
\]

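Moreover, n2 = k as Σ2 is a flat horizontal surface. For the given vector field
$\mathbf{F}(x, y, z) = x\, \mathbf{i} + y\, \mathbf{j} + z\, \mathbf{k}$, we now compute
\[
\begin{aligned}
\int_\Sigma \mathbf{F} \cdot \mathbf{n}\, d\sigma &= \int_{\Sigma_1} \mathbf{F} \cdot \mathbf{n}_1\, d\sigma + \int_{\Sigma_2} \mathbf{F} \cdot \mathbf{n}_2\, d\sigma \\
&= \int_{\Sigma_1} (x\, \mathbf{i} + y\, \mathbf{j} + z\, \mathbf{k}) \cdot \frac{1}{\sqrt{2}} \left(\frac{x}{z}\, \mathbf{i} + \frac{y}{z}\, \mathbf{j} - \mathbf{k}\right) d\sigma + \int_{\Sigma_2} (x\, \mathbf{i} + y\, \mathbf{j} + z\, \mathbf{k}) \cdot \mathbf{k}\, d\sigma \\
&= \frac{1}{\sqrt{2}} \int_{\Sigma_1} \left(\frac{x^2}{z} + \frac{y^2}{z} - z\right) d\sigma + \int_{\Sigma_2} z\, d\sigma = 0 + \int_{\Sigma_2} 1\, d\sigma \\
&= \pi ,
\end{aligned}
\]
as $z^2 = x^2 + y^2$ on Σ1, z = 1 on Σ2 and the area of the disk Σ2 equals π. The
right-hand side of (9.102) becomes
\[
\int_D \operatorname{div} \mathbf{F}\, dV = \int_D (1 + 1 + 1)\, dV = 3 \int_D 1\, dV = 3 \operatorname{vol}(D) .
\]
Since the volume of the cone of height 1 and radius 1 is equal to π/3, the right-hand
side, too, is equal to π.

We now present Green’s identities. Here Δ denotes the Laplace operator,
\[
\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} .
\]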

Theorem 9.10 (Green’s first identity) Let D, Σ, and n be as in Theorem 9.9. Let f
and g be continuous scalar fields whose first and second partial derivatives are
continuous in D and up to the boundary Σ. Then
\[
\int_\Sigma f\, \nabla g \cdot \mathbf{n}\, d\sigma = \int_D (f \Delta g + \nabla f \cdot \nabla g)\, dV . \tag{9.103}
\]

One can derive Green’s first identity from the divergence theorem by choosing F =
f ∇g in (9.102).

Theorem 9.11 (Green’s second identity) Let D, Σ, n, f, and g be as in Theorem 9.10.
Then
\[
\int_\Sigma (f\, \nabla g - g\, \nabla f) \cdot \mathbf{n}\, d\sigma = \int_D (f \Delta g - g \Delta f)\, dV . \tag{9.104}
\]
Green’s second identity is obtained from Green’s first identity by interchanging the
roles of f and g, and then subtracting the resulting formula from the original one.
As a particular case of Green’s identities, setting f = 1 we get
\[
\int_\Sigma \nabla g \cdot \mathbf{n}\, d\sigma = \int_D \Delta g\, dV . \tag{9.105}
\]

9.5.3 The Theorem of Stokes

Let $\mathbf{F}(x, y, z) = f_1(x, y, z)\, \mathbf{i} + f_2(x, y, z)\, \mathbf{j} + f_3(x, y, z)\, \mathbf{k}$ be a vector field whose
component functions are continuously differentiable, let Σ be a smooth surface
bounded by a piecewise smooth closed curve C, let n be a vector field of unit normals
to Σ whose orientation fits the orientation of C (this will be described below). Stokes’
theorem relates the line integral of F over C to a surface integral involving curl F
over Σ by the formula
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n}\, d\sigma . \tag{9.106}
\]
Line integrals of vector fields have been studied in Sect. 9.4.1, surface integrals in
Sect. 9.4.2. The integrand of the surface integral on the right-hand side is a scalar field,
obtained as the scalar product of the vector fields curl F and n. From Definition 9.12,
we recall that the notation as a vector product or as a determinant
\[
\operatorname{curl} \mathbf{F} = \nabla \times \mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\
f_1 & f_2 & f_3
\end{vmatrix} \tag{9.107}
\]
is a convenient way to memorize the componentwise definition of the rotation curl F
of the vector field F.
We assume that the surface Σ is given as z = S(x, y) with a function S defined
on the corresponding domain D in the xy-plane, and we assume that S and its first
and second partial derivatives are continuous. We have seen in (9.74) that
\[
\mathbf{n} = \frac{1}{\nu} \big(-\partial_x S\, \mathbf{i} - \partial_y S\, \mathbf{j} + \mathbf{k}\big) , \qquad \nu = \sqrt{1 + (\partial_x S)^2 + (\partial_y S)^2} , \tag{9.108}
\]
defines a unit normal at the point (x, y, S(x, y)) of Σ, if the right-hand side is
evaluated at (x, y) ∈ D. We fix a suitable orientation of the boundary curve C of
Σ as follows. Let Γ be the boundary of D, positively oriented by a parametrization
$\mathbf{q} : [a, b] \to \mathbb{R}^2$. Then
\[
\mathbf{r}(t) = q_1(t)\, \mathbf{i} + q_2(t)\, \mathbf{j} + S(q_1(t), q_2(t))\, \mathbf{k}
\]
defines a parametrization $\mathbf{r} : [a, b] \to \mathbb{R}^3$ of C which orients C in the needed way.



Theorem 9.12 (Stokes) Under the assumptions above, we have
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n}\, d\sigma . \tag{9.109}
\]

Remark 9.6 (i) A proof will be given in Appendix D.8.


(ii) The theorem also holds for more general surfaces, that is, surfaces which cannot
be described as the graph of a function S.
(iii) The theorem relates a one-dimensional integral (the line integral) to a two-
dimensional integral (the surface integral).
Example 9.37 Verify the statement of Stokes’ theorem by evaluating both sides of
(9.109) for the following data.
(a) $\mathbf{F}(x, y, z) = (x - y)\, \mathbf{i} + (y - z)\, \mathbf{j} + (z - x)\, \mathbf{k}$, the surface Σ is the portion of the
plane x + y + z = 1 which lies in the first octant.
(b) $\mathbf{F}(x, y, z) = (y - z)\, \mathbf{i} + (z + x)\, \mathbf{j} + (y - x)\, \mathbf{k}$, the surface Σ is the portion of the
paraboloid $z = 9 - x^2 - y^2$ which lies above the xy-plane.
Solution: (a) The surface Σ forms a planar triangular region in space whose vertices
are the unit vectors i, j, and k, the boundary curve C is the triangle connecting those
points, thus it is piecewise smooth. The corresponding domain D in the xy-plane
is the triangular region enclosed by x + y = 1, x = 0, and y = 0. Thus, if we go
through the corners of C in the sequence i → j → k → i, the orientation corresponds
to a positive orientation of Γ, the boundary of D. We parametrize the first piece
C1 : i → j with r(t) = (1 − t)i + tj, t ∈ [0, 1]. For
$\mathbf{F}(x, y, z) = (x - y)\, \mathbf{i} + (y - z)\, \mathbf{j} + (z - x)\, \mathbf{k}$, the line integral becomes
\[
\begin{aligned}
\int_{C_1} \mathbf{F} \cdot d\mathbf{r} &= \int_0^1 \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt \\
&= \int_0^1 \big((1 - 2t)\, \mathbf{i} + t\, \mathbf{j} + (t - 1)\, \mathbf{k}\big) \cdot (-\mathbf{i} + \mathbf{j})\, dt \\
&= \int_0^1 (3t - 1)\, dt = \frac{1}{2} .
\end{aligned}
\]
The other two pieces of C are parametrized by r(t) = (1 − t)j + tk and r(t) =
(1 − t)k + ti, t ∈ [0, 1], respectively. An analogous computation as above shows
that
\[
\int_{C_2} \mathbf{F} \cdot d\mathbf{r} = \int_{C_3} \mathbf{F} \cdot d\mathbf{r} = \frac{1}{2} ,
\]
so
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = \frac{3}{2} .
\]

The parametrization S of Σ is given by S(x, y) = 1 − x − y, so $\partial_x S = \partial_y S = -1$
are constant. The unit normal according to (9.108) is
\[
\mathbf{n} = \frac{1}{\sqrt{3}} (\mathbf{i} + \mathbf{j} + \mathbf{k}) .
\]
Moreover, curl F = i + j + k. Thus, both n and curl F are constant on Σ. Since
$\sqrt{1 + (\partial_x S)^2 + (\partial_y S)^2} = \sqrt{3}$, the surface integral becomes
\[
\int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n}\, d\sigma = \int_\Sigma \sqrt{3}\, d\sigma = \int_D \sqrt{3} \cdot \sqrt{3}\; dA = \frac{3}{2} ,
\]
because the area of the triangular region D equals 1/2.


(b) The boundary curve C is the circle lying in the xy-plane with radius 3 and center
at the origin, it can be parametrized as r(t) = 3 cos t i + 3 sin t j for 0 ≤ t ≤ 2π. We
have
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_0^{2\pi} \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt = \int_0^{2\pi} 9(-\sin^2 t + \cos^2 t)\, dt = 0 .
\]
Since the line integral equals 0, it does not matter which orientation we choose.
Moreover, we compute that curl F = 0, so
\[
\int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n}\, d\sigma = \int_\Sigma 0\, d\sigma = 0 .
\]

Hence, we have verified Stokes’ theorem for both sets of data.
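Part (a) of Example 9.37 can be reproduced numerically. The sketch below is an illustration under stated assumptions: `curl` estimates the rotation with central differences, and `closed_polygon_integral` sums the line integral over the three edges with the midpoint rule. Both helper names, the step size `h`, and the sample point at the centroid are hypothetical choices. Since curl F · n is constant on Σ, the surface side is simply that constant times the area of Σ.

```python
import math

def curl(F, p, h=1e-5):
    """Central-difference approximation of curl F at the point p = (x, y, z)."""
    def partial(i, j):
        # derivative of component i of F with respect to variable j
        hi, lo = list(p), list(p)
        hi[j] += h
        lo[j] -= h
        return (F(*hi)[i] - F(*lo)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))

def closed_polygon_integral(F, vertices, n=1000):
    """Midpoint-rule line integral of F along the closed polygon in space."""
    total = 0.0
    m = len(vertices)
    for k in range(m):
        p, q = vertices[k], vertices[(k + 1) % m]
        step = [qc - pc for pc, qc in zip(p, q)]
        for i in range(n):
            t = (i + 0.5) / n
            point = [pc + t * sc for pc, sc in zip(p, step)]
            total += sum(fc * sc for fc, sc in zip(F(*point), step)) / n
    return total

F = lambda x, y, z: (x - y, y - z, z - x)

# left-hand side of (9.109): traverse i -> j -> k -> i
line_side = closed_polygon_integral(F, [(1, 0, 0), (0, 1, 0), (0, 0, 1)])

# right-hand side: curl F and n are constant on the triangle
c = curl(F, (1 / 3, 1 / 3, 1 / 3))
n_unit = tuple(1 / math.sqrt(3) for _ in range(3))
area = math.sqrt(3) / 2  # area of the triangle with vertices i, j, k
surface_side = sum(ci * ni for ci, ni in zip(c, n_unit)) * area

print(line_side, surface_side)  # both should be close to 3/2
```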

Remark 9.7 A vector field F defined on some open set G of space $\mathbb{R}^3$ is called
irrotational in G if curl F = 0 at all points of G. This terminology is motivated by
Stokes’ theorem, since then the circulation $\int_C \mathbf{F} \cdot d\mathbf{r}$ equals zero for all closed curves
which arise as a boundary of a surface Σ within the domain G. If all closed curves
in G arise as boundaries, we then can conclude from Theorem 9.5 that

curl F = 0 in G implies that F is conservative on G.

This is the case, for example, when G is the whole space $\mathbb{R}^3$ or when G is a ball.
On the other hand, if curl F(x, y, z) is not zero at some point (x, y, z), then for
small disks around this point whose normal is parallel to curl F(x, y, z), we have
$\int_C \mathbf{F} \cdot d\mathbf{r} \ne 0$ and hence F cannot be conservative.
However, note that for some types of regions G there may exist closed curves
C which are not boundaries of such a surface Σ and have nonzero circulation (and
hence, F is not circulation free and not conservative on G according to Definition 9.18
and Theorem 9.5, even though we might have curl F = 0 on G). Consider, for example,
the vector field
\[
\mathbf{F}(x, y, z) = -\frac{y}{x^2 + y^2}\, \mathbf{i} + \frac{x}{x^2 + y^2}\, \mathbf{j} .
\]
Its domain G is the whole space $\mathbb{R}^3$ except the z-axis, it satisfies curl F = 0 on G,
but $\int_C \mathbf{F} \cdot d\mathbf{r} = 2\pi \ne 0$ for the circle C defined by $x^2 + y^2 = 1$. Indeed, if Σ is a
surface with boundary C, it must intersect the z-axis, so it cannot be contained in G.
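The behavior of the field in Remark 9.7 can be illustrated numerically. The following sketch is an illustration only; the names `vortex`, `circulation`, and `curl_z`, as well as the second circle centered at (3, 0), are assumptions chosen for the demonstration. It shows a circulation of about 2π around the z-axis, a circulation of about zero around a circle that does not enclose the axis, and a vanishing k-component of the curl at a sample point.

```python
import math

def vortex(x, y):
    """Planar components of the field of Remark 9.7, undefined on the z-axis."""
    d = x * x + y * y
    return (-y / d, x / d)

def circulation(F, center, radius, n=2000):
    """Line integral of F along a counterclockwise circle in the plane."""
    cx, cy = center
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        f, g = F(cx + radius * math.cos(t), cy + radius * math.sin(t))
        total += (-f * radius * math.sin(t) + g * radius * math.cos(t)) * h
    return total

def curl_z(F, x, y, h=1e-5):
    """k-component dg/dx - df/dy of the curl, by central differences."""
    dg_dx = (F(x + h, y)[1] - F(x - h, y)[1]) / (2 * h)
    df_dy = (F(x, y + h)[0] - F(x, y - h)[0]) / (2 * h)
    return dg_dx - df_dy

around_axis = circulation(vortex, (0.0, 0.0), 1.0)     # circle enclosing the z-axis
away_from_axis = circulation(vortex, (3.0, 0.0), 1.0)  # circle not enclosing it
two_pi = 2.0 * math.pi
sample_curl = curl_z(vortex, 1.0, 2.0)

print(around_axis, away_from_axis, sample_curl)
```

The second circle bounds a disk that lies entirely in G, so Stokes’ theorem applies to it and forces the circulation to vanish.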
 
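Example 9.38 For the following data, evaluate $\int_C \mathbf{F} \cdot d\mathbf{r}$ or $\int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n}\, d\sigma$,
whichever is easier.
(a) $\mathbf{F} = yx^2\, \mathbf{i} - xy^2\, \mathbf{j} + z^2\, \mathbf{k}$, the surface Σ is the hemisphere $x^2 + y^2 + z^2 = 4$,
z ≥ 0.
(b) $\mathbf{F} = z\, \mathbf{i} + x\, \mathbf{j} + y^2\, \mathbf{k}$, the surface Σ is the cone $z = \sqrt{x^2 + y^2}$ for 0 ≤ z ≤ 4.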

Solution: (a) The boundary curve C of Σ can be described by r(t) = 2 cos t i +
2 sin t j for 0 ≤ t ≤ 2π, so along C we have
\[
\mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = -16 \cos^2 t \sin^2 t - 16 \cos^2 t \sin^2 t = -8 \sin^2(2t) = -4(1 - \cos 4t) ,
\]
and therefore
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_0^{2\pi} -4(1 - \cos 4t)\, dt = -8\pi .
\]

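(b) The parametric equation of C is r(t) = 4 sin t i + 4 cos t j + 4k for 0 ≤ t ≤ 2π,
so
\[
\mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = 16 \cos t - 16 \sin^2 t ,
\]
\[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_0^{2\pi} \big(16 \cos t - 16 \sin^2 t\big)\, dt = -16\pi .
\]
This parametrization traverses C clockwise when viewed from above, which corresponds
to the downward-pointing unit normal on Σ; for the opposite orientation, the integral
changes sign and equals 16π.
Note that the parametrization $z = S(x, y) = \sqrt{x^2 + y^2}$ is not differentiable at 0; we
remark that Stokes’ theorem remains valid in this particular case.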
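The two line integrals of Example 9.38 can be checked with a short numerical sketch. It is illustrative only; the function name `param_line_integral` is a hypothetical helper. For part (b) the sketch evaluates both the parametrization used in the solution, which runs clockwise when seen from above, and the counterclockwise one, to show that the orientation only changes the sign.

```python
import math

def param_line_integral(F, r, r_prime, n=2000):
    """Midpoint-rule approximation of the line integral of F along r(t),
    0 <= t <= 2 pi, given the derivative r'(t)."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += sum(fc * vc for fc, vc in zip(F(*r(t)), r_prime(t))) * h
    return total

F_a = lambda x, y, z: (y * x * x, -x * y * y, z * z)
F_b = lambda x, y, z: (z, x, y * y)

# (a) circle of radius 2 in the plane z = 0
value_a = param_line_integral(
    F_a,
    lambda t: (2 * math.cos(t), 2 * math.sin(t), 0.0),
    lambda t: (-2 * math.sin(t), 2 * math.cos(t), 0.0))

# (b) parametrization from the solution: clockwise seen from above
value_b = param_line_integral(
    F_b,
    lambda t: (4 * math.sin(t), 4 * math.cos(t), 4.0),
    lambda t: (4 * math.cos(t), -4 * math.sin(t), 0.0))

# (b) same circle traversed counterclockwise
value_b_ccw = param_line_integral(
    F_b,
    lambda t: (4 * math.cos(t), 4 * math.sin(t), 4.0),
    lambda t: (-4 * math.sin(t), 4 * math.cos(t), 0.0))

pi = math.pi
print(value_a / pi, value_b / pi, value_b_ccw / pi)
```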

9.6 Applications of Vector Calculus to Engineering Problems

A major part of science and engineering deals with the analysis of forces such as the
force of water on a dam or of air on a wing, stresses within buildings and bridges,
electric and magnetic forces in the power industry or in computer hardware, and so
on. These forces vary with position, time and other circumstances. Vector calculus
provides basic tools for analyzing and understanding these situations.
In the first subsection, we exhibit relationships between elements of vector calcu-
lus and physical concepts like velocity, acceleration, momentum, angular momentum,
and temperature. The second subsection presents applications of line integrals. The
third subsection contains some basic examples of fluid flow, with an application to
hurricane modeling.

9.6.1 Elements of Vector Calculus and the Physical World

We present here some physical phenomena modeled by vector fields.


Example 9.39 (Gravitation) According to Newton’s law of gravitation, two objects
with masses m and M attract each other with a force F of magnitude
\[
\|\mathbf{F}\| = \frac{G m M}{r^2} , \tag{9.110}
\]
where r is the distance between the two objects (treated as point masses), and G
is the gravitational constant ($G \approx 6.673 \times 10^{-11}$ m³/(kg s²)). Assume that the object with
mass M is located at the origin in 3-space and $\mathbf{r}$ is the position vector of the object
of mass m, then $r = \|\mathbf{r}\|$, and the force $\mathbf{F}(\mathbf{r})$ exerted by the object of mass M on
the object of mass m points in the direction of the unit vector $-\mathbf{r}/\|\mathbf{r}\|$. Thus, from
(9.110)
\[
\mathbf{F}(\mathbf{r}) = -\frac{G m M}{\|\mathbf{r}\|^2} \frac{\mathbf{r}}{\|\mathbf{r}\|} = -\frac{G m M}{\|\mathbf{r}\|^3}\, \mathbf{r} , \tag{9.111}
\]
or, in Cartesian coordinates,
\[
\mathbf{F}(x, y, z) = -\frac{G m M}{(x^2 + y^2 + z^2)^{3/2}} (x\, \mathbf{i} + y\, \mathbf{j} + z\, \mathbf{k}) .
\]
This defines a vector field whose domain D equals the whole space $\mathbb{R}^3$ except the
origin. It describes the gravitational force of a point mass M located at the origin, as
a function of the position of the point mass m.
Example 9.40 (Electric Force Field) Coulomb’s law states that the electrostatic force
exerted by one charged particle on another is directly proportional to the product of
the charges and inversely proportional to the square of the distance between them.
Let two particles of charge Q and q be located at the origin of $\mathbb{R}^3$ and at the position
given by a vector $\mathbf{r}$, respectively. Then the force $\mathbf{F}(\mathbf{r})$ that the particle of charge Q
exerts on the particle of charge q equals
\[
\mathbf{F}(\mathbf{r}) = \frac{q Q}{4 \pi \varepsilon_0 \|\mathbf{r}\|^3}\, \mathbf{r} , \tag{9.112}
\]
where $\varepsilon_0$ is a positive constant (called the permittivity constant or the dielectric
constant). Note that the force is repulsive (directed outward) if Q and q have the
same sign, and attractive otherwise. Formula (9.112) defines the vector field F of the
electrostatic force generated by the point charge Q at the origin, as a function of
the position of the point charge q. If we divide by q we obtain the electrostatic force
per unit charge, which is called the electric field
\[
\mathbf{E} = \frac{\mathbf{F}}{q} . \tag{9.113}
\]

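Because of their form (9.110) and (9.112), both Newton’s law and Coulomb’s law
are instances of what is termed an inverse square law.
Recall that, for a scalar field ψ = ψ(x, y, z), the gradient is defined as
\[
\nabla \psi = \frac{\partial \psi}{\partial x}\, \mathbf{i} + \frac{\partial \psi}{\partial y}\, \mathbf{j} + \frac{\partial \psi}{\partial z}\, \mathbf{k} ,
\]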

and that ψ is called a potential of the vector field F if F = ∇ψ. In this case, F is
called a gradient field.

Example 9.41 (Gravitational and Electric Potential) The gravitational force field
(9.111) possesses the potential (called gravitational potential)
\[
\psi(\mathbf{r}) = \frac{G m M}{\|\mathbf{r}\|} . \tag{9.114}
\]
Indeed, we have computed in Example 9.20(a), setting n = −1 there, that F = ∇ψ.
Analogously, the electric field (9.113) possesses the potential −ψ (called electric
potential, the minus sign is conventional) with
\[
\psi(\mathbf{r}) = \frac{Q}{4 \pi \varepsilon_0 \|\mathbf{r}\|} . \tag{9.115}
\]
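The relation F = ∇ψ between (9.111) and (9.114) can be checked at a sample point with finite differences. The sketch below is illustrative: the product GmM is set to 1 in arbitrary units, and the function names `psi`, `force`, and `gradient`, the step size `h`, and the sample point are assumptions made for the demonstration.

```python
import math

GmM = 1.0  # product G*m*M in arbitrary units (illustrative assumption)

def psi(x, y, z):
    """Gravitational potential (9.114)."""
    return GmM / math.sqrt(x * x + y * y + z * z)

def force(x, y, z):
    """Gravitational force field (9.111) in Cartesian coordinates."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GmM * x / r3, -GmM * y / r3, -GmM * z / r3)

def gradient(phi, p, h=1e-5):
    """Central-difference approximation of the gradient of phi at p."""
    grad = []
    for j in range(3):
        hi, lo = list(p), list(p)
        hi[j] += h
        lo[j] -= h
        grad.append((phi(*hi) - phi(*lo)) / (2 * h))
    return tuple(grad)

p = (1.0, -2.0, 0.5)  # sample point away from the origin
max_error = max(abs(g - f) for g, f in zip(gradient(psi, p), force(*p)))
print(max_error)  # should be very small
```

The same check applies to the electric field (9.113) after replacing GmM by Q/(4πε₀) and flipping the sign of the potential.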

We have seen in Sect. 9.3 that at each point in a gradient field F where the gradient
is nonzero, the latter points in the direction in which the rate of increase of the cor-
responding potential ψ is maximal, and that moreover the gradient is perpendicular
to the tangent plane of the level surface ψ(x, y, z) = c through that point.
Vector fields F with domain and range in the plane R2 can be represented graph-
ically by drawing the vectors F(r) at some points r of the plane. We give some
examples.

Example 9.42 (i) Setting r(x, y) = xi + yj as above, we consider the vector field
F(r) = r as well as
\[
\mathbf{F}(\mathbf{r}) = \frac{\mathbf{r}}{\|\mathbf{r}\|} = \frac{x}{\sqrt{x^2 + y^2}}\, \mathbf{i} + \frac{y}{\sqrt{x^2 + y^2}}\, \mathbf{j} , \qquad (x, y) \ne (0, 0) .
\]
These vector fields are shown in Fig. 9.10a and b, respectively.
(ii) The vector field defined as F(r) = F(x, y) = −yi + xj is called the spin field
or rotation field or turning field. F(r) is perpendicular to r, since
\[
\mathbf{F}(\mathbf{r}) \cdot \mathbf{r} = -yx + xy = 0 .
\]
Moreover, we have $\|\mathbf{F}(\mathbf{r})\| = \sqrt{x^2 + y^2} = \|\mathbf{r}\|$. The vector fields $\mathbf{F}$ and $\mathbf{F}/\|\mathbf{F}\|$ are,
respectively, shown in Fig. 9.11a and b.

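Fig. 9.10 The vector fields (a) $\mathbf{F}(\mathbf{r}) = \mathbf{r}$ and (b) $\mathbf{F}(\mathbf{r}) = \mathbf{r}/\|\mathbf{r}\|$

Fig. 9.11 The spin fields (a) $\mathbf{F}$ and (b) $\mathbf{F}/\|\mathbf{F}\|$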

Example 9.43 (Velocity and Acceleration) Let a particle be moving along a path
having position r(t) = x(t)i + y(t)j + z(t)k as t varies from a to b. Its velocity
field v also has the domain [a, b] and is defined as
\[
\mathbf{v}(t) = \mathbf{r}'(t) = x'(t)\, \mathbf{i} + y'(t)\, \mathbf{j} + z'(t)\, \mathbf{k} . \tag{9.116}
\]
The speed of the particle is defined as the magnitude of the velocity; it is therefore
equal to
\[
\|\mathbf{v}(t)\| = \big(x'(t)^2 + y'(t)^2 + z'(t)^2\big)^{1/2} = \|\mathbf{r}'(t)\| . \tag{9.117}
\]
Let s(t) denote the total distance traveled up to time t, that is, $s(t) = \int_a^t \|\mathbf{r}'(\tau)\|\, d\tau$,
see (9.84). Since $s'(t) = \|\mathbf{r}'(t)\|$, we see from (9.117) that the speed of the particle
is equal to $s'(t)$.
The acceleration a of the particle is defined as the rate of change of the velocity
with respect to time,
\[
\mathbf{a}(t) = \mathbf{v}'(t) = \mathbf{r}''(t) = x''(t)\, \mathbf{i} + y''(t)\, \mathbf{j} + z''(t)\, \mathbf{k} .
\]


Example 9.44 (Momentum) The momentum $\mathbf p$ of an object is defined as the mass of
the object times its velocity,

$$\mathbf p(t) = m\mathbf v(t) = m\mathbf r'(t).$$

By Newton’s law, its time derivative

$$\mathbf p'(t) = m\mathbf r''(t) = m\mathbf a(t)$$

equals the total (or net) force acting upon the object. We see, therefore, that $\mathbf p'(t) = \mathbf 0$
if the net force is zero at time t.

Remark 9.8 As a consequence, the momentum p of an object stays constant during
time intervals where no force is applied to it. This statement is called the law
of conservation of momentum. Indeed conservation laws (which assert that cer-
tain quantities stay constant under certain conditions) are a basic ingredient of the
sciences, since in particular when analyzing situations with several (or many) chang-
ing quantities it can be very helpful to identify quantities which are not changing.
Instead of saying “quantity X stays constant” one also says “quantity X is invariant”
or “quantity X remains invariant”.

Example 9.45 (Angular Momentum) If an object of mass m has velocity $\mathbf v(t)$ at
position $\mathbf r(t)$ at time t, its angular momentum $\mathbf L(t)$ w.r.t. the origin is defined by the
equation

$$\mathbf L(t) = \mathbf r(t) \times \mathbf p(t) = \mathbf r(t) \times m\mathbf v(t) = \mathbf r(t) \times m\mathbf r'(t).$$

Note that the angular momentum $\mathbf L(t)$ is perpendicular to $\mathbf r(t)$ and $\mathbf v(t)$. According
to (9.18), its magnitude is given by

$$\|\mathbf L(t)\| = \|\mathbf r(t)\|\,\|m\mathbf v(t)\| \sin\theta(t),$$

where θ(t) is the angle between $\mathbf r(t)$ and $\mathbf v(t)$.

Example 9.46 A particle in the plane moves along the circle with radius 2 centered
at the origin in such a way that its x- and y-coordinates are given by x(t) = 2 cos t,
y(t) = 2 sin t.
(a) Find the velocity, the speed, and the acceleration of the particle at an arbitrary
time t.
(b) Sketch the path of the particle and show the position and velocity vectors at
t = π/4.

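Solution: (a) The position is described by the vector function

$$\mathbf r(t) = 2\cos t\,\mathbf i + 2\sin t\,\mathbf j.$$

Its velocity and speed at time t are, therefore,

$$\mathbf v(t) = \mathbf r'(t) = -2\sin t\,\mathbf i + 2\cos t\,\mathbf j,$$

$$\|\mathbf v(t)\| = \sqrt{(-2\sin t)^2 + (2\cos t)^2} = \sqrt{4(\sin^2 t + \cos^2 t)} = 2.$$

Its acceleration at time t is $\mathbf a(t) = \mathbf r''(t) = -2\cos t\,\mathbf i - 2\sin t\,\mathbf j$. At time t = π/4, we
have

$$\mathbf r\Big(\frac{\pi}{4}\Big) = 2\cos\frac{\pi}{4}\,\mathbf i + 2\sin\frac{\pi}{4}\,\mathbf j = \sqrt2\,\mathbf i + \sqrt2\,\mathbf j,$$

$$\mathbf v\Big(\frac{\pi}{4}\Big) = \mathbf r'\Big(\frac{\pi}{4}\Big) = -2\sin\frac{\pi}{4}\,\mathbf i + 2\cos\frac{\pi}{4}\,\mathbf j = -\sqrt2\,\mathbf i + \sqrt2\,\mathbf j.$$
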
Example 9.47 A particle of charge q moving in a magnetic field $\mathbf B$ is subject to the
so-called Lorentz force

$$\mathbf F = \frac{q}{c}\,\mathbf v \times \mathbf B, \tag{9.118}$$

where c is the speed of light and $\mathbf v$ is the velocity of the particle. Assume that the
magnetic field is constant and vertically oriented, $\mathbf B(t) = B_0\mathbf k$ with $B_0 \neq 0$. Find the
path

$$\mathbf r(t) = x(t)\mathbf i + y(t)\mathbf j + z(t)\mathbf k$$

of the particle, given its initial position $\mathbf r(0) = \mathbf r_0$ and velocity $\mathbf v(0) = \mathbf v_0$, as well as
its mass m.

Solution: By Newton’s law and (9.118), we have

$$m\mathbf v'(t) = m\mathbf r''(t) = \mathbf F(t) = \frac{q}{c}\,\mathbf v(t) \times \mathbf B(t).$$

Since $\mathbf B(t) = B_0\mathbf k$, we get

$$\mathbf v'(t) = \lambda\,\mathbf v(t) \times \mathbf k, \quad\text{where } \lambda = \frac{qB_0}{mc}. \tag{9.119}$$

Written in components $\mathbf v(t) = v_1(t)\mathbf i + v_2(t)\mathbf j + v_3(t)\mathbf k$, (9.119) becomes

$$v_1'(t)\mathbf i + v_2'(t)\mathbf j + v_3'(t)\mathbf k = \lambda[v_2(t)\mathbf i - v_1(t)\mathbf j].$$

This implies that

$$v_1'(t) = \lambda v_2(t), \quad v_2'(t) = -\lambda v_1(t), \quad v_3'(t) = 0. \tag{9.120}$$

Since $v_3'(t) = 0$ for all t, $v_3$ is a constant function, say $v_3(t) = C$. From the first two
equations of (9.120), we get

$$v_1''(t) = \lambda v_2'(t) = -\lambda^2 v_1(t),$$

or

$$v_1''(t) + \lambda^2 v_1(t) = 0. \tag{9.121}$$

A solution of (9.121) is given by

$$v_1(t) = A\sin(\lambda t + \varphi),$$

because

$$v_1'(t) = \lambda A\cos(\lambda t + \varphi), \quad v_1''(t) = -\lambda^2 A\sin(\lambda t + \varphi).$$

Since $v_1'(t) = \lambda v_2(t)$, we get

$$v_2(t) = \frac{v_1'(t)}{\lambda} = A\cos(\lambda t + \varphi).$$

Therefore

$$\mathbf r'(t) = \mathbf v(t) = A\sin(\lambda t + \varphi)\,\mathbf i + A\cos(\lambda t + \varphi)\,\mathbf j + C\mathbf k.$$

A final integration with respect to t gives us

$$\mathbf r(t) = \Big[-\frac{A}{\lambda}\cos(\lambda t + \varphi) + K_1\Big]\mathbf i + \Big[\frac{A}{\lambda}\sin(\lambda t + \varphi) + K_2\Big]\mathbf j + [Ct + K_3]\mathbf k, \tag{9.122}$$

where $K_1$, $K_2$, $K_3$ are constants of integration. All six constants A, ϕ, C, $K_1$, $K_2$,
and $K_3$ can be evaluated from the six initial conditions given by the vector equations
$\mathbf r(0) = \mathbf r_0$ and $\mathbf v(0) = \mathbf v_0$, but we omit this computation here. Instead, we observe that
the path of the particle is a circular helix with axis parallel to $\mathbf B$, that is, parallel to $\mathbf k$.
One can see this from Eq. (9.122): The z-component of $\mathbf r$ varies linearly with t, while
the x- and y-components represent uniform circular motion with angular velocity λ
and radius $|A/\lambda|$ around the center $(K_1, K_2)$. Thus, charged particles spiral around
the magnetic field lines. Qualitatively, this behavior still occurs even if the magnetic
field lines are curved, as in the case of the Earth’s magnetic field. Charged particles
trapped by the Earth’s magnetic field spiral around the magnetic field lines that run
from pole to pole.
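
The helix claim can be checked numerically. The following is a minimal sketch, not part of the original solution: it integrates (9.119) with a classical Runge-Kutta scheme and compares the result with the center and radius read off from (9.122) at t = 0. The function names, the step count, and the sample initial data are assumptions made for illustration.

```python
import math

def simulate(r0, v0, lam, t_end, steps=2000):
    """RK4 integration of r' = v, v' = lam * (v x k), where k = (0, 0, 1)."""
    def accel(v):
        # v x k = (v2, -v1, 0) for v = (v1, v2, v3)
        return (lam * v[1], -lam * v[0], 0.0)

    def axpy(u, s, w):
        # componentwise u + s * w
        return tuple(ui + s * wi for ui, wi in zip(u, w))

    h = t_end / steps
    r, v = tuple(r0), tuple(v0)
    for _ in range(steps):
        v1 = v
        a1 = accel(v1)
        v2 = axpy(v, 0.5 * h, a1)
        a2 = accel(v2)
        v3 = axpy(v, 0.5 * h, a2)
        a3 = accel(v3)
        v4 = axpy(v, h, a3)
        a4 = accel(v4)
        r = tuple(ri + h / 6 * (p + 2 * q + 2 * s + u)
                  for ri, p, q, s, u in zip(r, v1, v2, v3, v4))
        v = tuple(vi + h / 6 * (p + 2 * q + 2 * s + u)
                  for vi, p, q, s, u in zip(v, a1, a2, a3, a4))
    return r, v

def helix_center_and_radius(r0, v0, lam):
    """Center (K1, K2) and radius |A/lam| obtained from (9.122) at t = 0."""
    center = (r0[0] + v0[1] / lam, r0[1] - v0[0] / lam)
    return center, math.hypot(v0[0], v0[1]) / abs(lam)

# Hypothetical sample data: lam = 2, start at (1, 0, 0) with velocity (0, 2, 0.5).
r_end, v_end = simulate((1.0, 0.0, 0.0), (0.0, 2.0, 0.5), lam=2.0, t_end=3.0)
center, radius = helix_center_and_radius((1.0, 0.0, 0.0), (0.0, 2.0, 0.5), 2.0)
distance = math.hypot(r_end[0] - center[0], r_end[1] - center[1])
```

For these data the horizontal distance of the computed position from the center should agree with the radius, and the z-coordinate should equal Ct, which is what the closed form (9.122) predicts.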

Example 9.48 A heat-seeking particle is located at the point (2, 3) on a flat metal
sheet whose temperature at a point (x, y) is

$$T(x, y) = 10 - 8x^2 - 2y^2.$$

Find an equation for the trajectory of the particle if it moves continuously in the
direction of maximum temperature increase.

Solution: Let the trajectory (a curve) be parametrized by $\mathbf r(t) = (x(t), y(t))$ with
$\mathbf r(0) = (2, 3)$. Since the direction of maximum temperature increase at any point
(x, y) is given by ∇T(x, y), the velocity vector $\mathbf v(t)$ of the particle at time t points
in the direction of the gradient at its current position $\mathbf r(t)$. Thus, there is a scalar μ
that may depend on t such that

$$\mathbf v(t) = \mu(t)\nabla T(x(t), y(t)).$$

Since $\mathbf v(t) = \mathbf r'(t)$ and $\nabla T(x, y) = (-16x, -4y)$, we get

$$x'(t)\mathbf i + y'(t)\mathbf j = \mu(t)\big(-16x(t)\mathbf i - 4y(t)\mathbf j\big),$$

or, equating components,

$$x'(t) = -16\mu(t)x(t), \quad y'(t) = -4\mu(t)y(t). \tag{9.123}$$

Let M = M(t) be the antiderivative of μ with M(0) = 0. We may check that

$$x(t) = e^{-16M(t)}x_0, \quad y(t) = e^{-4M(t)}y_0$$

are solutions of (9.123) with initial values $x(0) = x_0$ and $y(0) = y_0$. Using the initial
values $x_0 = 2$ and $y_0 = 3$, we obtain $e^{-4M(t)} = (x(t)/2)^{1/4}$, so the trajectory
(x(t), y(t)) satisfies, for all values of t, the equation

$$y = \frac{3}{\sqrt[4]{2}}\,\sqrt[4]{x}. \tag{9.124}$$

The graph of the trajectory and contour plot of the temperature function are shown
in Fig. 9.12.
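
As a plausibility check of (9.124), one may follow the gradient numerically. The sketch below assumes the simplest choice μ(t) = 1, so that the velocity equals ∇T, and uses a Runge-Kutta step; the function names and step size are illustrative assumptions, not taken from the text.

```python
def grad_T(x, y):
    """Gradient of T(x, y) = 10 - 8x^2 - 2y^2."""
    return (-16.0 * x, -4.0 * y)

def follow_gradient(x0, y0, t_end, steps=1000):
    """RK4 integration of (x', y') = grad T(x, y), i.e. (9.123) with mu = 1."""
    h = t_end / steps
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(steps):
        k1 = grad_T(x, y)
        k2 = grad_T(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = grad_T(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = grad_T(x + h * k3[0], y + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        points.append((x, y))
    return points

# Start at (2, 3) as in the example and measure the deviation from (9.124).
points = follow_gradient(2.0, 3.0, t_end=0.2)
max_deviation = max(abs(y - 3.0 * (x / 2.0) ** 0.25) for x, y in points)
```

A different positive μ only changes how fast the curve is traversed, not the curve itself, which is why the particular choice μ = 1 is harmless here.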

Remark 9.9 The preceding example exhibits dynamics controlled by the gradient
of some scalar field. This is called gradient flow. There are numerous applications
where such a situation arises.

Example 9.49 Assume that a certain distribution of electric charges in the plane
produces the electric potential $\psi(x, y) = e^{-2x}\cos(2y)$.
(i) Find the electric field vector $\mathbf E = -\nabla\psi$ at (π/4, 0).
(ii) Find the direction in which the potential decreases most rapidly at this point.

Solution: (i) We have

$$\nabla\psi(x, y) = -2e^{-2x}\cos 2y\,\mathbf i - 2e^{-2x}\sin 2y\,\mathbf j,$$

and, since cos 0 = 1 and sin 0 = 0,

$$\mathbf E(\pi/4, 0) = -\nabla\psi(\pi/4, 0) = 2e^{-\pi/2}\,\mathbf i.$$

(ii) At (π/4, 0), ψ decreases most rapidly in the direction of −∇ψ(π/4, 0), which
was computed in (i).

Fig. 9.12 Trajectory of the particle and level curves of the temperature

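Example 9.50 Suppose a rigid object rotates with constant angular speed around an
axis through the origin with direction $\mathbf a \in \mathbb R^3$. Let $\mathbf r = x\mathbf i + y\mathbf j + z\mathbf k$ be any point of
space. We decompose $\mathbf r = \mathbf r_\perp + \alpha\mathbf a$ into a radial component $\mathbf r_\perp$ and a component $\alpha\mathbf a$
parallel to $\mathbf a$, where α is a scalar. Since any material point of the object which passes
through a fixed space point $\mathbf r$ will have the same velocity vector, we can associate with
this motion a velocity field $\mathbf v = \mathbf v(x, y, z)$. We see that $\mathbf v$ is perpendicular to $\mathbf r_\perp$ as well
as to the rotation axis $\mathbf a$ and that the speed $\|\mathbf v\|$ is proportional to $\|\mathbf r_\perp\| = \|\mathbf r\|\sin\theta$,
where θ is the angle between $\mathbf r$ and $\mathbf a$.
Therefore, there exists a unique vector $\boldsymbol\omega$ which is parallel to $\mathbf a$ such that

$$\mathbf v = \boldsymbol\omega \times \mathbf r_\perp = \boldsymbol\omega \times \mathbf r.$$

This vector $\boldsymbol\omega$ is called the angular velocity of the object; its length $\|\boldsymbol\omega\|$ equals the
angular speed. Let $\boldsymbol\omega = A\mathbf i + B\mathbf j + C\mathbf k$. Then

$$\mathbf v(x, y, z) = (A\mathbf i + B\mathbf j + C\mathbf k) \times (x\mathbf i + y\mathbf j + z\mathbf k) = (Bz - Cy)\mathbf i + (Cx - Az)\mathbf j + (Ay - Bx)\mathbf k.$$

We compute

$$\operatorname{curl}(\mathbf v) = \begin{vmatrix} \mathbf i & \mathbf j & \mathbf k \\ \partial_x & \partial_y & \partial_z \\ Bz - Cy & Cx - Az & Ay - Bx \end{vmatrix} = 2A\mathbf i + 2B\mathbf j + 2C\mathbf k = 2\boldsymbol\omega. \tag{9.125}$$

Thus the angular velocity of a uniformly rotating body equals one-half the curl of
the “linear” velocity, as the latter is called in this context to emphasize its different
character.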
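
The identity curl(v) = 2ω can also be confirmed numerically. The sketch below approximates the partial derivatives by central differences; the function names, the step size h, and the sample vector ω are assumptions chosen for illustration.

```python
def rotation_velocity(omega, r):
    """v = omega x r for omega = (A, B, C) and r = (x, y, z)."""
    A, B, C = omega
    x, y, z = r
    return (B * z - C * y, C * x - A * z, A * y - B * x)

def curl(field, r, h=1e-5):
    """Central-difference approximation of curl(field) at the point r."""
    def partial(i, j):
        # derivative of component i of the field with respect to coordinate j
        plus, minus = list(r), list(r)
        plus[j] += h
        minus[j] -= h
        return (field(plus)[i] - field(minus)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))

# Hypothetical sample angular velocity and evaluation point.
omega = (1.0, -2.0, 0.5)
curl_v = curl(lambda r: rotation_velocity(omega, r), (0.3, 0.7, -1.2))
```

Because the velocity field is linear in x, y, z, the central differences are exact up to rounding, so `curl_v` should agree with `(2.0, -4.0, 1.0)` to high accuracy at every point.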

9.6.2 Applications of Line Integrals

Line integrals have been introduced in Sect. 9.4.1. Here, we present some applications
in the area of mechanics.
Line Integrals of Scalar Fields. Let us consider a bent wire as a (one-dimensional)
smooth curve $\mathbf r(t) = x(t)\mathbf i + y(t)\mathbf j + z(t)\mathbf k$ in space, where $\mathbf r : [a, b] \to \mathbb R^3$. We
assume that the distribution of its mass is described by a continuous density function
ρ (in units of mass per unit length) defined on the set $C = \mathbf r([a, b])$ of the
curve points. We have already mentioned in Sect. 9.4.1 that its total mass M can be
expressed as the line integral

$$M = \int_C \rho\,ds = \int_a^b \rho(\mathbf r(t))\,\|\mathbf r'(t)\|\,dt.$$

Its center of mass is another quantity of interest in mechanics. We define the coordinates
of the so-called first moment of ρ as the line integrals

$$M_x = \int_C x\rho\,ds, \quad M_y = \int_C y\rho\,ds, \quad M_z = \int_C z\rho\,ds,$$

that is,

$$M_x = \int_a^b x(t)\,\rho(\mathbf r(t))\,\|\mathbf r'(t)\|\,dt, \quad M_y = \dots, \quad M_z = \dots.$$

The coordinates of the center of mass are then given by

$$\bar x = \frac{M_x}{M}, \quad \bar y = \frac{M_y}{M}, \quad \bar z = \frac{M_z}{M}.$$
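
To illustrate the formulas for M and the center of mass, here is a small numerical sketch. The wire (a uniform semicircle of radius 1), the helper names, and the use of Simpson's rule are assumptions made for this illustration and do not come from the text.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def wire_mass_and_center(r, dr, rho, a, b):
    """Return (M, (xbar, ybar, zbar)) for the wire r(t), a <= t <= b."""
    def weight(t):
        # rho(r(t)) * ||r'(t)||, the integrand of the mass integral
        return rho(*r(t)) * math.sqrt(sum(c * c for c in dr(t)))
    mass = simpson(weight, a, b)
    moments = [simpson(lambda t, i=i: r(t)[i] * weight(t), a, b) for i in range(3)]
    return mass, tuple(m / mass for m in moments)

# Hypothetical example: uniform semicircular wire of radius 1 in the xy-plane.
mass, center = wire_mass_and_center(
    r=lambda t: (math.cos(t), math.sin(t), 0.0),
    dr=lambda t: (-math.sin(t), math.cos(t), 0.0),
    rho=lambda x, y, z: 1.0,
    a=0.0,
    b=math.pi,
)
```

For this wire the exact values are M = π and center of mass (0, 2/π, 0), which the numerical result should reproduce closely.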
Let us now assume that the wire rotates around an axis. Its moment of inertia with
respect to this axis is given by the line integral

$$I = \int_C d^2\rho\,ds;$$

here d(x, y, z) denotes the distance of the point (x, y, z) from the axis. In particular,
we obtain the moment of inertia of the wire with respect to the x-axis as

$$I_x = \int_C (y^2 + z^2)\rho\,ds.$$

Line Integrals of Vector Fields. Let C be the straight line connecting two points $\mathbf r_0$
and $\mathbf r_1$ in space. We know from elementary mechanics that if a mass is moved from
$\mathbf r_0$ to $\mathbf r_1$ under the influence of a constant force $\mathbf F$, the work W done by the force is
given by

$$W = \mathbf F \cdot (\mathbf r_1 - \mathbf r_0) = \|\mathbf F\|\,\|\mathbf r_1 - \mathbf r_0\|\cos\alpha,$$

where α is the angle between the vectors $\mathbf F$ and $\mathbf r_1 - \mathbf r_0$. Let us now parametrize C
as $\mathbf r(t) = \mathbf r_0 + t(\mathbf r_1 - \mathbf r_0)$ with $\mathbf r : [0, 1] \to \mathbb R^3$. Since $\mathbf r'(t) = \mathbf r_1 - \mathbf r_0$, we have

$$\mathbf F \cdot (\mathbf r_1 - \mathbf r_0) = \int_0^1 \mathbf F \cdot \mathbf r'(t)\,dt.$$

The latter integral is nothing else than the line integral of Definition 9.15, so we can
express the work W as the line integral

$$W = \int_C \mathbf F \cdot d\mathbf r. \tag{9.126}$$

This line integral also yields the correct value of the total work done by an arbitrary
force field $\mathbf F(x, y, z) = f_1(x, y, z)\mathbf i + f_2(x, y, z)\mathbf j + f_3(x, y, z)\mathbf k$ on a mass
which has moved along an arbitrary curve C from its initial point $\mathbf r(a)$ to its final
point $\mathbf r(b)$, if C is given by $\mathbf r(t) = x(t)\mathbf i + y(t)\mathbf j + z(t)\mathbf k$ with $\mathbf r : [a, b] \to \mathbb R^3$. This can
be seen if we approximate C by a curve consisting of pieces of straight lines and
pass to the limit. The procedure is analogous to the one used to compute the length
of a curve in Sect. 7.2; we will not carry out the details.

Example 9.51 Let $\mathbf F(x, y, z) = \mathbf i - y\mathbf j + xyz\,\mathbf k$ be a force field. Calculate the work
done when moving a particle from (0, 0, 0) to (1, −1, 1) along the curve x = t,
$y = -t^2$, z = t, 0 ≤ t ≤ 1.

Solution: The work done is equal to the line integral $\int_C \mathbf F \cdot d\mathbf r$. We compute

$$\begin{aligned}
\int_C \mathbf F \cdot d\mathbf r &= \int_C dx - y\,dy + xyz\,dz = \int_0^1 [1 \cdot x' - yy' + xyz\,z'](t)\,dt \\
&= \int_0^1 \big(1 + t^2(-2t) - t^4\big)\,dt = \Big[t - \frac12 t^4 - \frac15 t^5\Big]_0^1 \\
&= 1 - \frac12 - \frac15 = \frac{3}{10}.
\end{aligned}$$

Example 9.52 Find the work done by $\mathbf F(x, y, z) = x^2\mathbf i - 2yz\,\mathbf j + z\mathbf k$ in moving an
object along the line segment from (1, 1, 1) to (4, 4, 4).

Solution: First parametrize the line segment as x = y = z = 1 + 3t with 0 ≤ t ≤ 1,
so $x'(t) = y'(t) = z'(t) = 3$. The work done by $\mathbf F$ moving the object along C is given
by

$$\begin{aligned}
W = \int_C \mathbf F \cdot d\mathbf r &= \int_0^1 [x^2x' - 2yz\,y' + zz'](t)\,dt \\
&= \int_0^1 \big[(1 + 3t)^2 - 2(1 + 3t)^2 + (1 + 3t)\big] \cdot 3\,dt \\
&= \Big[\frac{(1 + 3t)^2}{2} - \frac{(1 + 3t)^3}{3}\Big]_0^1 = -\frac{27}{2}.
\end{aligned}$$
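
The two work integrals of Examples 9.51 and 9.52 can be cross-checked numerically. The sketch below evaluates the integral of F(r(t)) · r'(t) over the parameter interval with Simpson's rule; the function name `work` and its signature are assumptions made for this illustration.

```python
def work(F, r, dr, a, b, n=1000):
    """Simpson approximation of the line integral of F along r(t), a <= t <= b.

    F maps (x, y, z) to a 3-tuple, r and dr map t to 3-tuples; n must be even.
    """
    def integrand(t):
        # F(r(t)) . r'(t)
        return sum(f * c for f, c in zip(F(*r(t)), dr(t)))
    h = (b - a) / n
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3

# Example 9.51: F = i - y j + xyz k along x = t, y = -t^2, z = t.
w_951 = work(lambda x, y, z: (1.0, -y, x * y * z),
             lambda t: (t, -t * t, t),
             lambda t: (1.0, -2.0 * t, 1.0), 0.0, 1.0)

# Example 9.52: F = x^2 i - 2yz j + z k along x = y = z = 1 + 3t.
w_952 = work(lambda x, y, z: (x * x, -2.0 * y * z, z),
             lambda t: (1 + 3 * t, 1 + 3 * t, 1 + 3 * t),
             lambda t: (3.0, 3.0, 3.0), 0.0, 1.0)
```

The values `w_951` and `w_952` should agree with 3/10 and −27/2, respectively, up to a small discretization error.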

In Definition 9.17, we have called a vector field F conservative if it possesses a
potential function, that is, if F = ∇ψ for some scalar field ψ. The following remark
explains this terminology.

Remark 9.10 (Conservation of Mechanical Energy) Let an object of mass m move
in space according to Newton’s law $\mathbf F = m\mathbf a = m\mathbf r''$, see Example 9.43. Assume that
the force field $\mathbf F$ has a potential function ψ, so $\mathbf F = \nabla\psi$. In mechanics, one considers
the potential energy −ψ and the kinetic energy $(1/2)m\|\mathbf v\|^2$, where $\mathbf v = \mathbf r'$. The
total mechanical energy E(t) of the object at time t is given as the sum of its kinetic
and its potential energy at that time, so

$$E(t) = \frac12 m\,\mathbf r'(t) \cdot \mathbf r'(t) - \psi(\mathbf r(t)). \tag{9.127}$$

Let us compute its time derivative $E'(t)$. Using the product rule and the chain rule
we get

$$\begin{aligned}
E'(t) &= \frac12 m\,\mathbf r''(t) \cdot \mathbf r'(t) + \frac12 m\,\mathbf r'(t) \cdot \mathbf r''(t) - \nabla\psi(\mathbf r(t)) \cdot \mathbf r'(t) \\
&= \big(m\mathbf r''(t) - \nabla\psi(\mathbf r(t))\big) \cdot \mathbf r'(t).
\end{aligned}$$

Since $m\mathbf r''(t) = \mathbf F(\mathbf r(t)) = \nabla\psi(\mathbf r(t))$, we conclude that $E'(t) = 0$. This means that
the total energy remains constant (or, is conserved) along the trajectory of the object.

Remark 9.11 As we have seen in Example 9.41, the gravitational field possesses a
potential and hence it is conservative.

9.6.3 An Example of Planar Fluid Flow-Hurricane

We begin with two basic situations, sink flow and vortex flow, in a situation made
as simple as possible. Consider an incompressible fluid (that is, its density ρ is
constant) whose flow takes place in the plane and is stationary, that is, its velocity

field v = v(x, y) does not depend on time. It is also assumed that the fluid is inviscid,
that is, its internal friction (the viscosity) is zero.
Sink flow. We imagine that there is a hole at the origin (the sink) where the fluid
leaves the plane and that
(i) the fluid flows toward the origin, that is, the velocity vector at every point (x, y)
is directed toward the origin,
(ii) the flow is radially symmetric, that is, the speed of the fluid is the same at all
points of every given circle centered at the origin.
Conditions (i) and (ii) above are modeled by a velocity field of the form

$$\mathbf v(x, y) = \beta(r)(x\mathbf i + y\mathbf j), \quad r = \sqrt{x^2 + y^2}, \tag{9.128}$$

with a function β = β(r) < 0 yet to be fixed. One can show that one must have
div v = 0 in order to satisfy the principle of mass conservation. Using the chain rule,
we compute

$$\frac{\partial v_1}{\partial x} = \beta(r) + x\beta'(r)\frac{\partial r}{\partial x} = \beta(r) + \beta'(r)\frac{x^2}{\sqrt{x^2 + y^2}}.$$

In the same manner, we obtain

$$\frac{\partial v_2}{\partial y} = \beta(r) + \beta'(r)\frac{y^2}{\sqrt{x^2 + y^2}}.$$

Since in two dimensions

$$\operatorname{div}\mathbf v = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y},$$

we get

$$\operatorname{div}\mathbf v = 2\beta(r) + \beta'(r)\frac{x^2 + y^2}{\sqrt{x^2 + y^2}} = 2\beta(r) + r\beta'(r).$$

The condition div v = 0 leads to

$$\beta(r) = \frac{c}{r^2}$$

for some constant c which must be negative by the above. Setting c = −q/2π (q is
called the sink strength), (9.128) becomes

$$\mathbf v(x, y) = -\frac{q}{2\pi(x^2 + y^2)}(x\mathbf i + y\mathbf j). \tag{9.129}$$

Since it follows that $\|\mathbf v(x, y)\| = q/(2\pi r)$, the sink flow has the further property that
(iii) the speed of the fluid at any point P = (x, y) is inversely proportional to the
distance of P from the origin; in particular, it tends to +∞ as that distance tends
to zero.
Vortex flow. Here the fluid flows along concentric circles around the origin in the
counterclockwise direction, that is,
(i) the velocity vector v(x, y) at a point (x, y) is tangent to the circle centered at
the origin which passes through (x, y),
(ii) v(x, y) points in the counterclockwise direction.
Moreover, the speed $\|\mathbf v\|$
(iii) is constant along those circles, and
(iv) is inversely proportional to the radius r of the circle,
and hence it tends to +∞ as r tends to 0. The vector field

$$\mathbf v(x, y) = \frac{k}{2\pi(x^2 + y^2)}(-y\mathbf i + x\mathbf j) \tag{9.130}$$

possesses those four properties. (The constant k > 0 is called the vortex strength.)
Indeed, by (9.130) we have $\|\mathbf v(x, y)\| = k/(2\pi\sqrt{x^2 + y^2}) = k/(2\pi r)$. Moreover,
$\mathbf v(x, y) \cdot (x\mathbf i + y\mathbf j) = 0$, so $\mathbf v(x, y)$ is perpendicular to the radius vector of the circle,
and by drawing a picture one sees that the direction is counterclockwise.
Streamlines and stream functions. The paths followed by the fluid particles in a fluid
flow are called the streamlines of the flow. If the streamlines can be represented as the
level curves of some function ψ = ψ(x, y), then ψ is called the stream function of
the flow. In this case, every particle path must satisfy ψ(x(t), y(t)) = c for a suitable
constant c. By the chain rule,

$$\frac{\partial\psi}{\partial x}(x(t), y(t))\,\dot x(t) + \frac{\partial\psi}{\partial y}(x(t), y(t))\,\dot y(t) = 0$$
∂x ∂y

must hold along particle paths. Therefore, ψ and the velocity field v of the flow are
related by
∇ψ(x, y) · v(x, y) = 0 (9.131)

at all points (x, y) in the domain of the flow; the velocity vectors are tangent, while
the vectors ∇ψ(x, y) are perpendicular to the streamlines.
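Combined sink and vortex flow. Here we have the velocity field

$$\mathbf v(x, y) = \frac{1}{2\pi(x^2 + y^2)}\big[(-qx - ky)\mathbf i + (-qy + kx)\mathbf j\big]. \tag{9.132}$$

The particles in this flow rotate while moving toward the sink, so we expect that they
spiral inward. In order to find the streamlines, we will compute its stream function
ψ, using (9.131). It is convenient to do this in polar coordinates, using the vectors
introduced in (9.63) as

$$\mathbf e_r = \cos\theta\,\mathbf i + \sin\theta\,\mathbf j, \quad \mathbf e_\theta = -\sin\theta\,\mathbf i + \cos\theta\,\mathbf j.$$

Setting x = r cos θ, y = r sin θ, the velocity field in polar coordinates becomes

$$\begin{aligned}
\mathbf v(r, \theta) &= \frac{1}{2\pi r^2}\big[(-qr\cos\theta - kr\sin\theta)\mathbf i + (-qr\sin\theta + kr\cos\theta)\mathbf j\big] \\
&= \frac{1}{2\pi r}(-q\mathbf e_r + k\mathbf e_\theta),
\end{aligned}$$

and ∇ψ transforms according to (9.64) into

$$\nabla\psi = \frac{\partial\psi}{\partial r}\mathbf e_r + \frac1r\frac{\partial\psi}{\partial\theta}\mathbf e_\theta.$$

Condition (9.131) for the stream function now becomes in polar coordinates

$$\begin{aligned}
0 &= \Big[\frac{\partial\psi}{\partial r}(r, \theta)\,\mathbf e_r + \frac1r\frac{\partial\psi}{\partial\theta}(r, \theta)\,\mathbf e_\theta\Big] \cdot \mathbf v(r, \theta) \\
&= \frac{1}{2\pi r}\Big[-q\frac{\partial\psi}{\partial r}(r, \theta) + \frac{k}{r}\frac{\partial\psi}{\partial\theta}(r, \theta)\Big].
\end{aligned} \tag{9.133}$$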

We see that (9.133) is satisfied if we choose ψ such that

$$\frac{\partial\psi}{\partial r}(r, \theta) = \frac{k}{r}, \quad \frac{\partial\psi}{\partial\theta}(r, \theta) = q. \tag{9.134}$$

This is indeed possible; we set

$$\psi(r, \theta) = k\ln r + q\theta. \tag{9.135}$$

We want to compute r as a function of θ for the streamlines ψ = c. From
k ln r + qθ = c we get

$$\ln r = \frac1k(c - q\theta), \quad r = e^{(c - q\theta)/k} = e^{c/k} \cdot e^{-q\theta/k}.$$

Since c is an arbitrary constant, we may replace $e^{c/k}$ by c and finally obtain that

$$r = ce^{-q\theta/k}, \quad c > 0, \tag{9.136}$$

holds along the streamlines. Thus, the spirals are determined by the value of q/k.
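
The relation (9.131) and the spiral form (9.136) can be checked numerically. In the sketch below, the gradient of ψ is approximated by central differences and θ is taken as `atan2(y, x)`, so the check is restricted to points away from the negative x-axis where this branch of θ jumps; the function names and sample parameters are assumptions for illustration.

```python
import math

def velocity(x, y, q, k):
    """Velocity field (9.132) of the combined sink and vortex flow."""
    s = 2 * math.pi * (x * x + y * y)
    return ((-q * x - k * y) / s, (-q * y + k * x) / s)

def stream_function(x, y, q, k):
    """psi = k ln r + q theta from (9.135), with theta = atan2(y, x)."""
    return k * math.log(math.hypot(x, y)) + q * math.atan2(y, x)

def grad_dot_velocity(x, y, q, k, h=1e-6):
    """Central-difference estimate of grad(psi) . v at the point (x, y)."""
    px = (stream_function(x + h, y, q, k) - stream_function(x - h, y, q, k)) / (2 * h)
    py = (stream_function(x, y + h, q, k) - stream_function(x, y - h, q, k)) / (2 * h)
    vx, vy = velocity(x, y, q, k)
    return px * vx + py * vy

def spiral_point(theta, c, q, k):
    """Point on the streamline r = c * exp(-q * theta / k) from (9.136)."""
    r = c * math.exp(-q * theta / k)
    return (r * math.cos(theta), r * math.sin(theta))

# Hypothetical parameters q = 1, k = 3; psi should equal k ln c on the spiral.
residual = grad_dot_velocity(1.0, 0.5, q=1.0, k=3.0)
psi_values = [stream_function(*spiral_point(th, 2.0, 1.0, 3.0), 1.0, 3.0)
              for th in (-1.0, 0.5, 2.0)]
```

The residual should be close to zero, and all entries of `psi_values` should coincide with k ln c = 3 ln 2, confirming that ψ is constant along the spiral.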
Modeling of a Hurricane. Let us assume that the preceding flow model can be used
for a hurricane and that only a single measurement of velocity is available.

Example 9.53 Find the strength of the parameters k and q of the flow model (9.132)
for a hurricane from the report that at 20 km distance from the eye the wind veloc-
ity has a component of 15 km/h toward the eye and a counterclockwise tangential
component of 45 km/h. Estimate the size of the hurricane by finding a radius beyond
which the wind speed is less than 5 km/h.

Solution: The velocity component toward the eye is equal to the speed of the sink
part of the flow. We have already seen that the latter is given by q/(2πr), so at r = 20
we have

$$15 = \frac{q}{2\pi \cdot 20}, \quad\text{so } q = 600\pi$$

with unit km²/h. The tangential velocity component is equal to the speed of the vortex
part of the flow, and we have already seen that the latter is given by k/(2πr), so at
r = 20 we have

$$45 = \frac{k}{2\pi \cdot 20}, \quad\text{so } k = 1800\pi,$$

again with unit km²/h. To estimate the size, we determine r from the condition that
$\|\mathbf v\|$ = 5 km/h. Since tangential and inward velocity are perpendicular, we have

$$5 = \|\mathbf v\| = \frac1r\sqrt{\Big(\frac{q}{2\pi}\Big)^2 + \Big(\frac{k}{2\pi}\Big)^2},$$

therefore

$$r = \frac15\sqrt{300^2 + 900^2} = \frac{100}{5}\sqrt{9 + 81} = 60\sqrt{10} \approx 189.7 \text{ km}.$$
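
The arithmetic of this example is easy to reproduce in a few lines. The following sketch uses hypothetical helper names; it fits q and k from one measurement and then solves the speed condition for r.

```python
import math

def fit_hurricane(r, inward_speed, tangential_speed):
    """Sink strength q and vortex strength k from one measurement at distance r.

    Uses inward speed = q / (2 pi r) and tangential speed = k / (2 pi r).
    """
    q = 2 * math.pi * r * inward_speed
    k = 2 * math.pi * r * tangential_speed
    return q, k

def radius_for_speed(q, k, speed):
    """Radius at which the wind speed sqrt(q^2 + k^2) / (2 pi r) equals `speed`."""
    return math.hypot(q, k) / (2 * math.pi * speed)

# Data of the example: r = 20 km, 15 km/h inward, 45 km/h tangential.
q, k = fit_hurricane(20.0, 15.0, 45.0)
r_5 = radius_for_speed(q, k, 5.0)
```

Here `q` and `k` come out as 600π and 1800π, and `r_5` equals 60√10, roughly 189.7 km; beyond this radius the modeled wind speed stays below 5 km/h because the speed decreases like 1/r.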
5 5

9.7 Exercises

9.7.1 Let x, y, z be vectors of three-dimensional space, let λ be a scalar. Show that

(a) (x + y) + z = x + (y + z),
(b) λ(x + y) = λx + λy,
(c) x · (y + z) = x · y + x · z,
(d) λ(x · y) = (λx) · y = x · (λy),
(e) x · (x × y) = 0, that is, x × y is perpendicular to x.
(f) y · (x × y) = 0, that is, x × y is perpendicular to y.
(g) x × (y + z) = x × y + x × z,
(h) x × (y × z) = (x · z)y − (x · y)z,
(i) x · (y × z) = y · (z × x) = z · (x × y),
(j) x × x = 0.
9.7.2 The goal of this exercise is to derive the identity

(x × y) · (z × w) = (x · z)(y · w) − (y · z)(x · w) , (*)

for arbitrary vectors x, y, z, and w of three-dimensional space.


(a) Show that (*) holds for x = y = k and arbitrary vectors z and w.
(b) Show that (*) holds for x = k, y = αi + βj and arbitrary vectors z and
w, where α and β are arbitrary scalars.
(c) Show that (*) holds for x = k and arbitrary vectors y, z and w.
(d) Show that (*) holds for arbitrary vectors x, y, z and w.
9.7.3 The line in R2 that passes through the point r0 = (x0 , y0 ) and is parallel to
the nonzero vector v = (a, b) = ai + bj has parametric equations

x(t) = x 0 + at , y(t) = y0 + bt ,

or, in vector form r(t) = (x(t), y(t)),

r(t) = r0 + tv .

In R3 , the vector form is the same, but now r(t) = (x(t), y(t), z(t)), r0 =
(x0 , y0 , z 0 ) and v = (a, b, c) and the component equations are

x(t) = x 0 + at , y(t) = y0 + bt , z(t) = z 0 + ct .

(a) Find the parametric equations of the line


(i) passing through (4, 2) and parallel to v = (−1, 5),
(ii) passing through (1, 2, −3) and parallel to v = (4, 5, −7).
(b) Find parametric equations for the line whose vector equation is given as
(i) r(t) = 2i − 3j + ti − 4tj,
(ii) r(t) = −i + 0j + 2k − ti + 3tj.

9.7.4 The equation of the plane passing through a point r0 = (x0 , y0 , z 0 ) in R3 and
perpendicular to a vector N = (a, b, c), called a normal for the plane, is given
by
a(x − x 0 ) + b(y − y0 ) + c(z − z 0 ) = 0 ,

that is, the plane consists of the points r = (x, y, z) satisfying this equation.
Its vector form is
N · (r − r0 ) = 0 .

Find an equation of the plane that passes through the point r0 = (2, 6, 1)
having the vector N = (1, 4, 2) as a normal.
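9.7.5 Let $\mathbf F$ and $\mathbf G$ be two vector fields defined on an interval with values in $\mathbb R^3$.
Prove that
(a) $\dfrac{d}{dt}[\mathbf F \cdot \mathbf G] = \mathbf F \cdot \dfrac{d\mathbf G}{dt} + \dfrac{d\mathbf F}{dt} \cdot \mathbf G$.
(b) $\dfrac{d}{dt}[\mathbf F \times \mathbf G] = \mathbf F \times \dfrac{d\mathbf G}{dt} + \dfrac{d\mathbf F}{dt} \times \mathbf G$.
(c) Let $\mathbf r = \mathbf r(t)$ be a vector-valued function with values in $\mathbb R^3$ such that
$\|\mathbf r(t)\| = c$ for some constant c. Show that $\mathbf r(t) \cdot \mathbf r'(t) = \mathbf r(t) \cdot \dfrac{d\mathbf r}{dt}(t) = 0$,
that is, $\mathbf r(t)$ is orthogonal to $\dfrac{d\mathbf r}{dt}(t) = \mathbf r'(t)$, for all t.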
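9.7.6 (a) Verify explicitly that div(curl F) = 0, where
(i) $\mathbf F = \sinh(x - z)\mathbf i + 2y\mathbf j + (z - y^2)\mathbf k$,
(ii) $\mathbf F = x^2\mathbf i + y^2\mathbf j + z^2\mathbf k$.
(b) Verify explicitly that curl(∇ϕ) = 0, where
(i) $\varphi(x, y, z) = -2x^3yz^2$,
(ii) $\varphi(x, y, z) = e^{x+y+z}$.
9.7.7 (a) Let $\mathbf F$ and $\mathbf G$ be two vector fields with domain in $\mathbb R^3$ and values in $\mathbb R^3$.
Prove that

$$\operatorname{div}(\mathbf F \times \mathbf G) = \mathbf G \cdot (\nabla \times \mathbf F) - \mathbf F \cdot (\nabla \times \mathbf G).$$

(b) Let ϕ = ϕ(x, y, z) and ψ = ψ(x, y, z) be scalar fields. Prove that

$$\operatorname{div}(\nabla\varphi \times \nabla\psi) = 0.$$

9.7.8 Let $\mathbf C$ be a constant vector and $\mathbf r(x, y, z) = x\mathbf i + y\mathbf j + z\mathbf k$. Prove that
(a) $\nabla(\mathbf C \cdot \mathbf r) = \mathbf C$,
(b) $\operatorname{div}(\mathbf r - \mathbf C) = 3$,
(c) $\operatorname{curl}(\mathbf r - \mathbf C) = \mathbf 0$.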

9.7.9 Evaluate $\int_C \mathbf F \cdot d\mathbf r$, where $\mathbf F(x, y, z) = \mathbf i - x\mathbf j + \mathbf k$ and C is parametrized by
$\mathbf r(t) = \cos t\,\mathbf i - \sin t\,\mathbf j + t\mathbf k$ for 0 ≤ t ≤ π.
9.7.10 Evaluate the surface integral $\iint_\Sigma f\,d\sigma$, where
(a) $f(x, y, z) = y^2$, Σ is the part of the cone $z = \sqrt{x^2 + y^2}$ lying in the first
octant and between the planes z = 2 and z = 4.
(b) $f(x, y, z) = xyz$, Σ is the part of the surface $z = 1 + y^2$ for 0 ≤ x ≤ 1,
0 ≤ y ≤ 1.
9.7.11 A particle moves once counterclockwise around the circle of radius 12 centered
at the origin under the influence of the force
$\mathbf F(x, y, z) = (e^x - y + x\cosh x)\mathbf i + (y^{3/2} + x)\mathbf j$. Calculate the work done.

9.7.12 Apply the Green–Ostrogradski theorem to evaluate the line integrals $\oint_C \mathbf F \cdot d\mathbf r$
for the following data. The curves are oriented counterclockwise.
(a) $\mathbf F = x^2y\,\mathbf i - xy^2\,\mathbf j$, C is the boundary of the region defined by $x^2 + y^2 \le 4$,
x ≥ 0 and y ≥ 0.
(b) $\mathbf F = (e^{\sin x} - y)\mathbf i + (\sinh(y^3) - 4x)\mathbf j$, C is the circle of radius 4 with center
(−8, 0).
(c) $\mathbf F = (x^2 + y^2)\mathbf i + (x^2 - y^2)\mathbf j$, C is the ellipse $4x^2 + y^2 = 10$.

9.7.13 Let D be a region bounded by the surface Σ. Evaluate $\iint_\Sigma \mathbf F \cdot \mathbf n\,d\sigma$ or
$\iiint_D \operatorname{div}(\mathbf F)\,dV$ for the following data, whichever is more convenient.
(a) $\mathbf F = 4x\mathbf i - 6y\mathbf j + \mathbf k$, Σ is the surface of the solid cylinder $x^2 + y^2 \le 4$,
0 ≤ z ≤ 6 (the surface includes both caps of the cylinder).
(b) $\mathbf F = 2yz\,\mathbf i - 4xz\,\mathbf j + xy\,\mathbf k$, Σ is the sphere of radius 6 with center (−1, 3, 1).

9.7.14 Find the value of $\iint_\Sigma \mathbf F \cdot \mathbf n\,d\sigma$ for the data $\mathbf F = 3xy\,\mathbf i + z^2\mathbf k$, Σ the sphere of
radius 1 centered at the origin.
9.7.15 Let Σ be a smooth surface enclosing some region D, let $\mathbf C$ be a constant
vector. Show that $\iint_\Sigma \mathbf C \cdot \mathbf n\,d\sigma = 0$.
9.7.16 Evaluate $\iint_\Sigma (\operatorname{curl}\mathbf F) \cdot \mathbf n\,d\sigma$, where $\mathbf F = xy\,\mathbf i + yz\,\mathbf j + xy\,\mathbf k$ and Σ is the part
of the plane 2x + 4y + z = 8 in the first octant.
9.7.17 Calculate the circulation of F = (x − y)i + x 2 yj + x zak counterclockwise
around the circle x 2 + y 2 = 1, where a is a positive constant.
9.7.18 Examine whether the following vector fields are conservative or not.
(a) F = cosh(x + y)(i + j − k),
(b) F = 2xi − 2yj + 2zk,
(c) F = i − 2j + k.
9.7.19 Let Σ be the portion of the paraboloid $z = 1 - x^2 - y^2$ for which z ≥ 0, and
let C be the circle $x^2 + y^2 = 1$ that forms the boundary of Σ. Verify Stokes’
theorem for the vector field

$$\mathbf F = (x^2y - z^2)\mathbf i + (y^3 - x)\mathbf j + (2x + 3z - 1)\mathbf k$$

by evaluating the line integral as well as the surface integral.


Chapter 10
Fourier Methods with Applications

10.1 Introduction

Fourier methods or Fourier analysis is a branch of mathematics that was developed
formally some 150 years after Newton’s and Leibniz’ calculus and heavily depended
on integral and differential calculus. Jean Baptiste Joseph Fourier was born in 1768
in Auxerre, a town between Paris and Dijon. He became fascinated by mathematics
at the age of 13 years. After the French revolution, Fourier taught in Paris, then
accompanied Napoleon to Egypt and served as permanent secretary of the Institute
of Egypt. He wrote a book on Egypt and in certain quarters he is famous as an
Egyptologist rather than for his contributions to mathematics and physics. In the
world of science, he is famous for, among other things, the ideas and thoughts he set
forth in a memoir in 1807 and published in 1822 in his book in French entitled “The
Analytic Theory of Heat”.
Fourier analysis shows that we can represent periodic functions, even very jagged
and irregular-looking ones, in form of a finite or infinite sum of sine and cosine func-
tions called Fourier series. Nonperiodic functions can be treated with the Fourier
transform. Fourier himself showed how these mathematical tools can be used to
study natural phenomena such as heat diffusion, making it possible to solve equations
that had until then remained intractable. Under the action of the Fourier transform,
derivatives are transformed into multiplications, thus turning differential equations
into equations containing algebraic expressions. In this way, many important differ-
ential equations are transformed into equations which are much easier to study and
to solve.
If a phenomenon is described by a function of time or of space, Fourier analysis
tells us, loosely speaking, how much of each frequency this phenomenon contains.
In many cases, this frequency information is not simply a mathematical trick to make
calculations easier, but corresponds to relevant properties of the phenomenon under
study.
During the second half of the previous century, Fourier analysis has been refined in
various directions in order to make it easier and more efficient to transmit, compress,
and analyze information or to separate information from surrounding noise. One of
those techniques, the wavelet transform, which was invented during the 1980s, will be
presented at the end of this chapter.

© Springer Nature Singapore Pte Ltd. 2019
M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_10

Fourier analysis and its descendants have been applied to a broad range of areas of
science and engineering, let us mention telecommunications and signal processing,
physics, imaging in the biomedical sciences (EEG, ECG, CAT, MRI, NMR) and
elsewhere, meteorology, oceanography, seismology, economics, and finance. A few
of them will be touched in this chapter. Interested readers may find details of some
of these applications in the references Prestini [28] and Cartwright [7]. There are
also many interactions within mathematics itself, in areas as diverse as statistics and
number theory.
For several reasons, we have chosen not to present proofs (detailed or sketched)
for the majority of results of this chapter, and we refer the reader to the literature on
the subject.
Finally, we want to acknowledge the contribution of Eng. A.K. Verma to this
chapter, whose program we have used in order to give pictures of the partial sums of
several of the Fourier series discussed below.

10.2 Orthonormal Systems and Fourier Series

10.2.1 Orthonormal Systems

In Chap. 9, we have introduced the concept of orthogonal (or perpendicular) vectors.
Namely, two vectors $\mathbf u$ and $\mathbf v$ are called orthogonal if their scalar product (dot product)
$\mathbf u \cdot \mathbf v$ equals zero. Recall also that $\mathbf u \cdot \mathbf u = \|\mathbf u\|^2 > 0$ if $\mathbf u \neq \mathbf 0$. We now extend this
concept in two respects. We define orthogonality for a system of more than two
vectors, and we define orthogonality for functions instead of vectors.
During all of this chapter, the functions considered will be piecewise continu-
ous. Such functions are formed by putting together continuous functions defined on
separate intervals, for example, the sign function or the greatest integer function as
considered in Sect. 1.3, or the “sawtooth” function defined by

$$f(x) = x - k, \quad\text{if } k - \frac12 \le x < k + \frac12. \tag{10.1}$$

The formal definition runs as follows. A function f defined on an interval I = [a, b] is
called piecewise continuous, if we can decompose I into finitely many subintervals
$I_k = [x_{k-1}, x_k]$ such that f is continuous in the interior of $I_k$ and the one-sided limits
$\lim_{x \to x_{k-1}+} f(x)$ and $\lim_{x \to x_k-} f(x)$ exist for all such subintervals. If the domain of
f is unbounded, we require this property to hold for every bounded interval in the
domain. As a consequence, we can integrate piecewise continuous functions, and

$$\int_a^b f(x)\,dx = \sum_k \int_{x_{k-1}}^{x_k} f(x)\,dx.$$

Definition 10.1 (Orthogonal functions) Let $f_1$, $f_2$ be piecewise continuous real-valued
functions defined on an interval [a, b]. Their scalar product (or inner product)
is denoted by $\langle f_1, f_2\rangle$ and defined by

$$\langle f_1, f_2\rangle = \int_a^b f_1(x)f_2(x)\,dx. \tag{10.2}$$

The functions $f_1$ and $f_2$ are said to be orthogonal on [a, b] if

$$\langle f_1, f_2\rangle = \int_a^b f_1(x)f_2(x)\,dx = 0. \tag{10.3}$$

Example 10.1 (a) The functions f 1 (x) = e x and f 2 (x) = sin x are orthogonal on
the interval [π/4, 5π/4].
(b) The functions f 1 (x) = x and f 2 (x) = cos 2x are orthogonal on the interval
[−π/2, π/2].
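
Both claims of Example 10.1 can be checked numerically by approximating the scalar product (10.2). The sketch below uses Simpson's rule; the function name `inner` and the number of subintervals are assumptions made for this illustration.

```python
import math

def inner(f, g, a, b, n=2000):
    """Simpson approximation of the scalar product <f, g> on [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) * g(a) + f(b) * g(b)
    for i in range(1, n):
        x = a + i * h
        total += (4 if i % 2 else 2) * f(x) * g(x)
    return total * h / 3

# (a) e^x and sin x on [pi/4, 5 pi/4]
ip_a = inner(math.exp, math.sin, math.pi / 4, 5 * math.pi / 4)
# (b) x and cos 2x on [-pi/2, pi/2]
ip_b = inner(lambda x: x, lambda x: math.cos(2 * x), -math.pi / 2, math.pi / 2)
```

Both `ip_a` and `ip_b` should vanish up to the discretization error. For (a) this reflects that an antiderivative of $e^x \sin x$ is $e^x(\sin x - \cos x)/2$, which is zero at both endpoints; for (b) the integrand is an odd function on a symmetric interval.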
In analogy to the case of vectors, for functions f : [a, b] → R, we define

$$\|f\| = \sqrt{\langle f, f\rangle} = \sqrt{\int_a^b f(x)^2\,dx} \tag{10.4}$$

and call it the norm of f. (We might also call $\|f\|$ the “length” of f, although its
geometric meaning is no longer obvious.) We then have

$$\|f\|^2 = \langle f, f\rangle = \int_a^b f(x)^2\,dx.$$

As in the case of vectors, we have $\|0\| = 0$ for the zero function. On the other hand,
a piecewise continuous function f satisfies $\|f\| = 0$ only if f is the zero function;
in other words, if f is nonzero, we must have $\|f\| > 0$.
If two nonzero functions f 1 and f 2 are orthogonal, we say that the set { f 1 , f 2 }
formed by those two functions is an orthogonal system. If we have more than two
functions, we require that the functions are mutually (or pairwise) orthogonal in the
sense of the following definition.
Definition 10.2 (Orthogonal system, orthonormal system) A system (or set) consisting
of finitely or infinitely many nonzero functions $f_1, f_2, \dots$ is said to be orthogonal
on the interval [a, b] if

$$\langle f_m, f_n\rangle = \int_a^b f_m(x)f_n(x)\,dx = 0, \quad\text{whenever } m \neq n.$$

If moreover

$$\langle f_n, f_n\rangle = 1 = \|f_n\|, \quad\text{for all } n,$$

the system is called orthonormal on [a, b].

Remark 10.1 An orthogonal system $\{f_1, f_2, \dots\}$ can be made into an orthonormal
system by replacing each function $f_n$ with its scalar multiple

$$\frac{f_n}{\|f_n\|}.$$

Example 10.2 (a) The set {1, cos x, cos 2x, . . . , cos nx} is an orthogonal system on
[−π, π].
(b) The set $\Big\{\dfrac{1}{\sqrt{2\pi}}, \dfrac{\cos x}{\sqrt\pi}, \dots, \dfrac{\cos nx}{\sqrt\pi}\Big\}$ is an orthonormal system on [−π, π].
(c) Examine whether the following systems are orthonormal or not on the intervals
indicated:
(i) {sin x, sin 3x, sin 5x, . . .}, I = [0, π/2],
(ii) {sin nx : n = 1, 2, 3, . . .}, I = [0, π],
(iii) $\Big\{1, \cos\dfrac{n\pi x}{l} : n = 1, 2, 3, \dots\Big\}$, I = [0, l],
(iv) $\Big\{1, \cos\dfrac{n\pi x}{l}, \sin\dfrac{m\pi x}{l} : n, m = 1, 2, 3, \dots\Big\}$, I = [−l, l].
Solution: We discuss here part (c).
(i) We set $f_n(x) = \sin(2n + 1)x$ for $n = 0, 1, 2, \ldots$. For $m \neq n$, we obtain

\[
\begin{aligned}
\langle f_n, f_m \rangle &= \int_0^{\pi/2} \sin(2n + 1)x \, \sin(2m + 1)x \,dx \\
&= \frac{1}{2} \int_0^{\pi/2} \big[\cos 2(n - m)x - \cos 2(n + m + 1)x\big]\,dx
\end{aligned}
\]

(using the trigonometric identity $2 \sin Ax \sin Bx = \cos(A - B)x - \cos(A + B)x$)

\[
= \frac{1}{2}\left[\frac{\sin 2(n - m)x}{2(n - m)}\right]_0^{\pi/2} - \frac{1}{2}\left[\frac{\sin 2(n + m + 1)x}{2(n + m + 1)}\right]_0^{\pi/2} = 0 - 0 = 0 .
\]

For $n = m$, we get

\[
\int_0^{\pi/2} \sin^2 (2n + 1)x \,dx = \int_0^{\pi/2} \left(\frac{1}{2} - \frac{1}{2}\cos 2(2n + 1)x\right) dx
\]

(using the identity $2 \sin^2 Ax = 1 - \cos 2Ax$)

\[
\begin{aligned}
&= \frac{1}{2}\int_0^{\pi/2} dx - \frac{1}{2}\int_0^{\pi/2} \cos 2(2n + 1)x \,dx \\
&= \left[\frac{x}{2}\right]_0^{\pi/2} - \frac{1}{2}\left[\frac{\sin 2(2n + 1)x}{2(2n + 1)}\right]_0^{\pi/2} = \frac{\pi}{4} .
\end{aligned}
\]

Hence, the given system is orthogonal but not orthonormal. However, if we multiply each function by $2/\sqrt{\pi}$, the new system obtained in this way is orthonormal, that is,

\[ \frac{2 \sin x}{\sqrt{\pi}}, \ \frac{2 \sin 3x}{\sqrt{\pi}}, \ \frac{2 \sin 5x}{\sqrt{\pi}}, \ \ldots, \ \frac{2 \sin(2n + 1)x}{\sqrt{\pi}}, \ \ldots \]

is an orthonormal system on $[0, \pi/2]$.


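(ii) Set $f_n(x) = \sin nx$. For $m \neq n$, we have

\[
\langle f_n, f_m \rangle = \int_0^{\pi} \sin nx \, \sin mx \,dx = \frac{1}{2}\int_0^{\pi} \big[\cos(n - m)x - \cos(n + m)x\big]\,dx
\]

(by the trigonometric identity $2 \sin Ax \sin Bx = \cos(A - B)x - \cos(A + B)x$)

\[
\begin{aligned}
&= \frac{1}{2}\int_0^{\pi} \cos(n - m)x \,dx - \frac{1}{2}\int_0^{\pi} \cos(n + m)x \,dx \\
&= \frac{1}{2}\left[\frac{\sin(n - m)x}{n - m}\right]_0^{\pi} - \frac{1}{2}\left[\frac{\sin(n + m)x}{n + m}\right]_0^{\pi} = 0 .
\end{aligned}
\]

For $m = n$, we have

\[
\langle f_n, f_n \rangle = \int_0^{\pi} \sin^2 nx \,dx = \int_0^{\pi} \left(\frac{1}{2} - \frac{1}{2}\cos 2nx\right) dx = \frac{1}{2}\pi - \frac{1}{4n}\Big[\sin 2nx\Big]_0^{\pi} = \frac{\pi}{2} ,
\]

where again we have used that $2 \sin^2 Ax = 1 - \cos 2Ax$. Hence, $\{\sin nx\}$ is an orthogonal, but not an orthonormal system on $[0, \pi]$. But the system $\left\{ \dfrac{\sqrt{2}\, \sin nx}{\sqrt{\pi}} \right\}$ is an orthonormal system on $[0, \pi]$.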
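(iii) For $m \neq n$, we have

\[
\int_0^l \cos\frac{n\pi x}{l} \cos\frac{m\pi x}{l}\,dx = \frac{1}{2}\int_0^l \left[\cos\frac{(n - m)\pi x}{l} + \cos\frac{(n + m)\pi x}{l}\right] dx
\]

(by the trigonometric identity $2 \cos Ax \cos Bx = \cos(A - B)x + \cos(A + B)x$)

\[
\begin{aligned}
&= \frac{1}{2}\int_0^l \cos\frac{(n - m)\pi x}{l}\,dx + \frac{1}{2}\int_0^l \cos\frac{(n + m)\pi x}{l}\,dx \\
&= \left[\frac{l}{2(n - m)\pi}\sin\frac{(n - m)\pi x}{l}\right]_0^l + \left[\frac{l}{2(n + m)\pi}\sin\frac{(n + m)\pi x}{l}\right]_0^l = 0 + 0 = 0 .
\end{aligned}
\]

For $m = n$, we have

\[
\int_0^l \cos^2\frac{n\pi x}{l}\,dx = \int_0^l \left(\frac{1}{2} + \frac{1}{2}\cos\frac{2n\pi x}{l}\right) dx = \left[\frac{x}{2}\right]_0^l + \left[\frac{l}{4n\pi}\sin\frac{2n\pi x}{l}\right]_0^l = \frac{l}{2} .
\]

Moreover,

\[
\int_0^l 1 \cdot \cos\frac{n\pi x}{l}\,dx = \left[\frac{l}{n\pi}\sin\frac{n\pi x}{l}\right]_0^l = 0 , \qquad \int_0^l 1^2\,dx = l .
\]

The given system is orthogonal but not orthonormal. However, the system

\[ \left\{ \frac{1}{\sqrt{l}}, \ \frac{\sqrt{2}}{\sqrt{l}}\cos\frac{n\pi x}{l} \right\} \]

is orthonormal.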
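(iv) We know already from part (iii) that, for $m \neq n$,

\[
\int_{-l}^{l} \cos\frac{n\pi x}{l} \cos\frac{m\pi x}{l}\,dx = 2\int_0^l \cos\frac{n\pi x}{l} \cos\frac{m\pi x}{l}\,dx = 0 .
\]

Concerning the sine, for $m \neq n$, we get

\[
\int_{-l}^{l} \sin\frac{n\pi x}{l} \sin\frac{m\pi x}{l}\,dx = 2\int_0^l \sin\frac{n\pi x}{l} \sin\frac{m\pi x}{l}\,dx
\]

(by the identity $2 \sin Ax \sin Bx = \cos(A - B)x - \cos(A + B)x$)

\[
\begin{aligned}
&= \int_0^l \left[\cos\frac{(n - m)\pi x}{l} - \cos\frac{(n + m)\pi x}{l}\right] dx \\
&= \left[\frac{l}{(n - m)\pi}\sin\frac{(n - m)\pi x}{l}\right]_0^l - \left[\frac{l}{(n + m)\pi}\sin\frac{(n + m)\pi x}{l}\right]_0^l = 0 - 0 = 0 .
\end{aligned}
\]

Using the same identities as in the previous computations one derives that

\[
\int_{-l}^{l} \sin^2\frac{n\pi x}{l}\,dx = l , \qquad \int_{-l}^{l} \cos^2\frac{n\pi x}{l}\,dx = l ,
\]
\[
\int_{-l}^{l} \sin\frac{n\pi x}{l} \cos\frac{m\pi x}{l}\,dx = 0 , \quad \text{for all } n, m
\]

(the integrand is an odd function); moreover

\[
\int_{-l}^{l} 1 \cdot \cos\frac{n\pi x}{l}\,dx = 0 , \qquad \int_{-l}^{l} 1 \cdot \sin\frac{n\pi x}{l}\,dx = 0 , \qquad \int_{-l}^{l} 1^2\,dx = 2l .
\]

The given system is orthogonal but not orthonormal. However, since the constant function 1 has norm $\sqrt{2l}$ and the cosines and sines have norm $\sqrt{l}$, the system

\[ \left\{ \frac{1}{\sqrt{2l}}, \ \frac{1}{\sqrt{l}}\cos\frac{n\pi x}{l}, \ \frac{1}{\sqrt{l}}\sin\frac{n\pi x}{l} \right\} \]

is orthonormal.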
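As a sanity check on part (iv), one can compute the matrix of inner products of the normalized system numerically and compare it with the identity matrix. This is only a sketch: the half-length `l = 2.0`, the cutoff `n_max = 3`, and the helper names are arbitrary choices made for illustration.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def normalized_system(l, n_max):
    """The functions 1/sqrt(2l), cos(n pi x/l)/sqrt(l), sin(n pi x/l)/sqrt(l), n = 1..n_max."""
    funcs = [lambda x: 1 / math.sqrt(2 * l)]
    for n in range(1, n_max + 1):
        funcs.append(lambda x, n=n: math.cos(n * math.pi * x / l) / math.sqrt(l))
        funcs.append(lambda x, n=n: math.sin(n * math.pi * x / l) / math.sqrt(l))
    return funcs

l = 2.0  # illustrative half-length of the interval [-l, l]
funcs = normalized_system(l, n_max=3)

# Gram matrix of inner products; it should be (numerically) the identity matrix.
gram = [[simpson(lambda x: f(x) * g(x), -l, l) for g in funcs] for f in funcs]
max_dev = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(len(funcs)) for j in range(len(funcs)))
print(max_dev)
```

A deviation close to zero indicates that the off-diagonal inner products vanish and that every function, including the constant, has norm 1.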
Remark 10.2 The systems named after Rademacher, Haar, and Walsh are other well-
known orthonormal systems.
A function f is said to be a linear combination of the functions f 1 , f 2 , . . . , f n if

f = α1 f 1 + α2 f 2 + · · · + αn f n

holds for suitably chosen scalars α1 , α2 , . . . , αn .


Definition 10.3 (Linear Dependence and Independence) A system $\{f_1, f_2, \ldots, f_n\}$ of functions is said to be linearly independent if $\alpha_1 f_1 + \alpha_2 f_2 + \alpha_3 f_3 + \cdots + \alpha_n f_n = 0$ implies that $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_n = 0$. The system is called linearly dependent if it is not linearly independent. In other words, at least one element of the system is a linear combination of the remaining $n - 1$ elements. An infinite system $\{f_1, f_2, \ldots\}$ of functions is said to be linearly independent if every finite subset taken from it forms a linearly independent system in the sense above, and it is said to be linearly dependent if this is not the case.

Remark 10.3 It turns out that every orthogonal (and, hence, every orthonormal)
system is linearly independent. However, the converse is not true, so a system may
be linearly independent without being orthogonal.
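The first claim of Remark 10.3 follows in one line from Definition 10.2. The following short derivation is a sketch added for illustration; it uses only the inner product (10.3) and the fact that nonzero piecewise continuous functions have positive norm.

```latex
% Let {f_1, ..., f_n} be orthogonal and suppose that
%   \alpha_1 f_1 + \cdots + \alpha_n f_n = 0 .
% Taking the inner product with a fixed f_m and using
% \langle f_k, f_m \rangle = 0 for k \neq m gives
\[
0 = \Big\langle \sum_{k=1}^{n} \alpha_k f_k ,\, f_m \Big\rangle
  = \sum_{k=1}^{n} \alpha_k \langle f_k, f_m \rangle
  = \alpha_m \|f_m\|^2 .
\]
% Since f_m is nonzero, \|f_m\| > 0, hence \alpha_m = 0 for every m.
```

For the converse, the pair $\{1, x\}$ on $[0, 1]$ is linearly independent, yet $\int_0^1 1 \cdot x \,dx = 1/2 \neq 0$, so it is not orthogonal.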

10.2.2 Fourier Series

A series of the form

\[ \frac{1}{2}a_0 + (a_1 \cos x + b_1 \sin x) + (a_2 \cos 2x + b_2 \sin 2x) + \cdots \]

that is,

\[ \frac{1}{2}a_0 + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) \tag{10.5} \]

is called an (infinite) trigonometric series.


Let $f$ be a piecewise continuous function defined on the interval $[-\pi, \pi]$. The numbers

\[ a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx , \tag{10.6} \]
\[ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx , \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \sin nx \,dx , \tag{10.7} \]

where $n = 1, 2, 3, \ldots$, are called the Fourier coefficients (more precisely, the Fourier cosine resp. sine coefficients) of $f$ on the interval $[-\pi, \pi]$, and the series (10.5) with the coefficients from (10.6) and (10.7) is called the Fourier series of $f$ on this interval. Note that $a_0/2$ is just the average of the function $f$ over $[-\pi, \pi]$.
The Fourier coefficients arise when we want to represent a function $f : [-\pi, \pi] \to \mathbb{R}$ as a trigonometric series,

\[ f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) , \tag{10.8} \]

for the following reason. For an arbitrary trigonometric series, we define the partial sums

\[ s_N(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N} (a_n \cos nx + b_n \sin nx) . \tag{10.9} \]

(We remind the reader that we have studied series and their partial sums in Chap. 5.) Now let us compute, for $N \geq n \geq 1$, using the orthogonality of the system $\{1, \cos nx, \sin nx : n = 1, 2, \ldots\}$ on $[-\pi, \pi]$,

\[
\begin{aligned}
\int_{-\pi}^{\pi} s_N(x) \cos nx \,dx &= \frac{1}{2}a_0 \int_{-\pi}^{\pi} \cos nx \,dx + \sum_{k=1}^{N} a_k \int_{-\pi}^{\pi} \cos nx \cos kx \,dx \\
&\quad + \sum_{k=1}^{N} b_k \int_{-\pi}^{\pi} \cos nx \sin kx \,dx \\
&= 0 + a_n \pi + 0 = \pi a_n .
\end{aligned}
\tag{10.10}
\]

If $f$ can be represented as in (10.8), that is, if

\[ f(x) = \lim_{N \to \infty} s_N(x) , \]

and if moreover

\[ \int_{-\pi}^{\pi} f(x) \cos nx \,dx = \lim_{N \to \infty} \int_{-\pi}^{\pi} s_N(x) \cos nx \,dx , \]

then we see from (10.10) that

\[ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx . \]

Similarly, we get the corresponding formula for $b_n$ by using $\sin nx$ instead of $\cos nx$ in (10.10).
Instead of the interval [−π, π] we may consider an interval [−l, l], where l > 0
is an arbitrary number.

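Definition 10.4 (Fourier series) Let $f$ be a piecewise continuous function defined on the interval $[-l, l]$, where $l > 0$. Then the Fourier series of $f$ on $[-l, l]$ is given by

\[ \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{l} + b_n \sin\frac{n\pi x}{l} \right) , \quad x \in [-l, l] , \]

where

\[ a_0 = \frac{1}{l}\int_{-l}^{l} f(x)\,dx , \tag{10.11} \]
\[ a_n = \frac{1}{l}\int_{-l}^{l} f(x) \cos\frac{n\pi x}{l}\,dx , \qquad b_n = \frac{1}{l}\int_{-l}^{l} f(x) \sin\frac{n\pi x}{l}\,dx . \tag{10.12} \]

If the Fourier series at $x$ converges to $f(x)$, we write as usual

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{l} + b_n \sin\frac{n\pi x}{l} \right) . \tag{10.13} \]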

The following theorem gives conditions under which (10.13) holds, that is, when the
Fourier series of f on [−l, l] converges to f .
Theorem 10.1 (Dirichlet Convergence Theorem) Let $f$ and $f'$ be piecewise continuous on the interval $[-l, l]$, and let $x \in (-l, l)$.
(i) If $f$ is continuous at $x$, the Fourier series of $f$ at $x$ converges to $f(x)$.
(ii) If $f$ is discontinuous at $x$, the Fourier series of $f$ at $x$ converges to the mean value

\[ \frac{f(x+) + f(x-)}{2} , \]

where $f(x+)$ and $f(x-)$ denote the right- and left-hand limits, respectively, of $f$ at $x$.
Note that in fact (i) can be viewed as a special case of (ii), since $f(x) = f(x+) = f(x-)$ whenever $f$ is continuous at $x$.
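Example 10.3 Expand the following functions in Fourier series:

(a) $f(x) = \begin{cases} 0 , & -\pi < x < 0 , \\ \pi - x , & 0 \leq x < \pi , \end{cases}$ on $[-\pi, \pi]$,

(b) $f(x) = \begin{cases} 0 , & -\pi < x < 0 , \\ 1 , & 0 \leq x < \pi , \end{cases}$ on $[-\pi, \pi]$,

(c) $f(x) = \begin{cases} 0 , & -1 < x < 0 , \\ x , & 0 \leq x < 1 , \end{cases}$ on $[-1, 1]$,

(d) $f(x) = \begin{cases} 0 , & -\pi < x < 0 , \\ \sin x , & 0 \leq x < \pi , \end{cases}$ on $[-\pi, \pi]$,

(e) $f(x) = \begin{cases} -\dfrac{\pi}{4} , & -\pi < x < 0 , \\ 0 , & x = 0 , \\ \dfrac{\pi}{4} , & 0 < x < \pi , \end{cases}$ on $[-\pi, \pi]$,

(f) $f(x) = \begin{cases} -1 , & -4 < x < 0 , \\ 1 , & 0 \leq x < 4 , \end{cases}$ on $[-4, 4]$,

(g) $f(x) = e^x$, on $[-\pi, \pi]$.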
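Solution: (a) Here we have $l = \pi$. We get

\[
\begin{aligned}
a_0 &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\left( \int_{-\pi}^{0} f(x)\,dx + \int_{0}^{\pi} f(x)\,dx \right) \\
&= \frac{1}{\pi}\left( \int_{-\pi}^{0} 0\,dx + \int_{0}^{\pi} (\pi - x)\,dx \right) = \frac{1}{\pi}\left[ \pi x - \frac{x^2}{2} \right]_0^{\pi} = \frac{\pi}{2} ,
\end{aligned}
\]
\[
\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx = \frac{1}{\pi}\left( \int_{-\pi}^{0} 0\,dx + \int_{0}^{\pi} (\pi - x) \cos nx \,dx \right) \\
&= \frac{1}{\pi}\left( \left[ (\pi - x)\frac{\sin nx}{n} \right]_0^{\pi} + \frac{1}{n}\int_0^{\pi} \sin nx \,dx \right) \\
&= \frac{1}{n\pi}\left[ -\frac{\cos nx}{n} \right]_0^{\pi} = \frac{-\cos n\pi + 1}{n^2 \pi} = \frac{1 - (-1)^n}{n^2 \pi} .
\end{aligned}
\]

Similarly, we can calculate that

\[
\begin{aligned}
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \sin nx \,dx = \frac{1}{\pi}\left( \int_{-\pi}^{0} f(x) \sin nx \,dx + \int_{0}^{\pi} f(x) \sin nx \,dx \right) \\
&= \frac{1}{\pi}\int_0^{\pi} (\pi - x) \sin nx \,dx = \frac{1}{n} .
\end{aligned}
\]

Since the given function is continuous except at $x = 0$, the Fourier series converges to $f(x)$ for every $x \neq 0$ in $(-\pi, \pi)$; at $x = 0$ it converges to the mean value $\pi/2$ by Theorem 10.1. See the graphs of the function and of a partial sum of its Fourier series in Figs. 10.1 and 10.2, respectively.

Fig. 10.1 The function from Example 10.3(a)

Fig. 10.2 Fourier approximation of the function in Fig. 10.1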
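The closed-form coefficients of part (a) can be compared against a direct numerical evaluation of (10.11) and (10.12). The sketch below integrates piece by piece so that the jump at $x = 0$ does not spoil the quadrature; the function `coefficients` and the `pieces` representation are illustrative conventions, not notation from the text.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def coefficients(pieces, l, n):
    """Fourier coefficients (a_n, b_n) on [-l, l] following (10.11)-(10.12).

    pieces is a list of (left, right, g) meaning f = g on (left, right)."""
    a_n = sum(simpson(lambda x: g(x) * math.cos(n * math.pi * x / l), lo, hi)
              for lo, hi, g in pieces) / l
    b_n = sum(simpson(lambda x: g(x) * math.sin(n * math.pi * x / l), lo, hi)
              for lo, hi, g in pieces) / l
    return a_n, b_n

# Example 10.3(a): f = 0 on (-pi, 0) and f = pi - x on [0, pi)
pieces = [(-math.pi, 0.0, lambda x: 0.0),
          (0.0, math.pi, lambda x: math.pi - x)]

a0, _ = coefficients(pieces, math.pi, 0)
rows = []
for n in range(1, 6):
    a_n, b_n = coefficients(pieces, math.pi, n)
    a_exact = (1 - (-1) ** n) / (n * n * math.pi)
    b_exact = 1 / n
    rows.append((n, a_n, a_exact, b_n, b_exact))
    print(n, a_n, a_exact, b_n, b_exact)
```

The same helper can be reused for the other parts of the example by changing `pieces` and `l`.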

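(b) We have

\[
a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\left( \int_{-\pi}^{0} f(x)\,dx + \int_{0}^{\pi} f(x)\,dx \right) = 0 + \frac{1}{\pi}\int_0^{\pi} 1\,dx = 1 ,
\]
\[
\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx = \frac{1}{\pi}\left( \int_{-\pi}^{0} f(x) \cos nx \,dx + \int_{0}^{\pi} f(x) \cos nx \,dx \right) \\
&= 0 + \frac{1}{\pi}\int_0^{\pi} 1 \cdot \cos nx \,dx = \frac{1}{\pi}\left[ \frac{\sin nx}{n} \right]_0^{\pi} = 0 ,
\end{aligned}
\]
\[
\begin{aligned}
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \sin nx \,dx = \frac{1}{\pi}\left( \int_{-\pi}^{0} f(x) \sin nx \,dx + \int_{0}^{\pi} f(x) \sin nx \,dx \right) \\
&= 0 + \frac{1}{\pi}\int_0^{\pi} \sin nx \,dx = \frac{1}{\pi}\left[ -\frac{\cos nx}{n} \right]_0^{\pi} = \frac{1}{\pi}\left( -\frac{\cos n\pi}{n} + \frac{1}{n} \right) \\
&= -\frac{(-1)^n}{n\pi} + \frac{1}{n\pi} , \quad \text{as } \cos n\pi = (-1)^n , \\
&= \frac{1}{n\pi}\big(1 - (-1)^n\big) .
\end{aligned}
\]

Since the given function is continuous except at $x = 0$, the Fourier series converges to the given function for $x \neq 0$, and we have

\[ f(x) = \frac{1}{2} + \frac{1}{\pi}\sum_{n=1}^{\infty} \frac{1 - (-1)^n}{n} \sin nx . \]

For the graphs of $f$ and of a partial sum of its Fourier series see Figs. 10.3 and 10.4, respectively.

Fig. 10.3 The function from Example 10.3(b)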

Fig. 10.4 Fourier approximation of the function in Fig. 10.3

(c) Here $l = 1$. The given function is continuous also at $x = 0$, and hence its Fourier series converges to $f(x)$ for every $x \in (-1, 1)$. We get

\[
a_0 = \int_{-1}^{1} f(x)\,dx = \int_{-1}^{0} 0\,dx + \int_{0}^{1} x\,dx = 0 + \left[ \frac{1}{2}x^2 \right]_0^1 = \frac{1}{2} ,
\]
\[
\begin{aligned}
a_n &= \int_{-1}^{1} f(x) \cos n\pi x \,dx = \int_{-1}^{0} 0 \cdot \cos n\pi x \,dx + \int_{0}^{1} x \cos n\pi x \,dx \\
&= 0 + \left[ \frac{x \sin n\pi x}{n\pi} \right]_0^1 - \int_0^1 \frac{\sin n\pi x}{n\pi}\,dx \\
&= 0 + \left[ \frac{\cos n\pi x}{n^2\pi^2} \right]_0^1 = \frac{\cos n\pi}{n^2\pi^2} - \frac{1}{n^2\pi^2} = \frac{1}{n^2\pi^2}\big[(-1)^n - 1\big] .
\end{aligned}
\]

Similarly,

\[
b_n = \int_{-1}^{1} f(x) \sin n\pi x \,dx = 0 + \int_0^1 x \sin n\pi x \,dx = \frac{(-1)^{n+1}}{n\pi} ,
\]

through integration by parts. See Figs. 10.5 and 10.6 for the graphs of the function and of a partial sum of its Fourier series.

Fig. 10.5 The function from Example 10.3(c)

Fig. 10.6 Fourier approximation of the function in Fig. 10.5

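(d) We have

\[
a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x)\,dx + \frac{1}{\pi}\int_{0}^{\pi} f(x)\,dx = 0 + \frac{1}{\pi}\int_0^{\pi} \sin x \,dx = \frac{1}{\pi}\big[ -\cos x \big]_0^{\pi} = \frac{2}{\pi} ,
\]
\[
\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx = \frac{1}{\pi}\int_0^{\pi} \sin x \cos nx \,dx \\
&= \frac{1}{2\pi}\int_0^{\pi} \big[\sin(n + 1)x + \sin(1 - n)x\big]\,dx
\end{aligned}
\]

(by the identity $2 \sin Ax \cos Bx = \sin(A + B)x + \sin(A - B)x$)

\[
\begin{aligned}
&= \frac{1}{2\pi}\left[ -\frac{\cos(n + 1)x}{n + 1} \right]_0^{\pi} + \frac{1}{2\pi}\left[ -\frac{\cos(1 - n)x}{1 - n} \right]_0^{\pi} \\
&= \frac{1}{2\pi}\left( -\frac{\cos(n + 1)\pi}{n + 1} + \frac{1}{n + 1} \right) + \frac{1}{2\pi}\left( -\frac{\cos(1 - n)\pi}{1 - n} + \frac{1}{1 - n} \right) \\
&= \frac{1 + (-1)^n}{\pi(1 - n^2)} , \quad \text{for } n = 2, 3, 4, \ldots ,
\end{aligned}
\]
\[
a_1 = \frac{1}{2\pi}\int_0^{\pi} \sin 2x \,dx = 0 ,
\]
\[
\begin{aligned}
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \sin nx \,dx = \frac{1}{\pi}\int_0^{\pi} \sin x \sin nx \,dx \\
&= \frac{1}{2\pi}\int_0^{\pi} \big[\cos(1 - n)x - \cos(1 + n)x\big]\,dx = 0 , \quad \text{for } n = 2, 3, 4, \ldots ,
\end{aligned}
\]
\[
b_1 = \frac{1}{2\pi}\int_0^{\pi} (1 - \cos 2x)\,dx = \frac{1}{2} .
\]

Thus,

\[ f(x) = \frac{1}{\pi} + \frac{1}{2}\sin x + \sum_{n=2}^{\infty} \frac{1 + (-1)^n}{\pi(1 - n^2)} \cos nx \]

is the Fourier series for $f$.

See the graph of the function in Fig. 10.7 and the graph of a partial sum of its Fourier series in Fig. 10.8.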
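Fig. 10.7 The function from Example 10.3(d)

Fig. 10.8 Fourier approximation of the function in Fig. 10.7

(e) We have

\[
\begin{aligned}
a_0 &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x)\,dx + \frac{1}{\pi}\int_{0}^{\pi} f(x)\,dx \\
&= \frac{1}{\pi}\int_{-\pi}^{0} \left(-\frac{\pi}{4}\right) dx + \frac{1}{\pi}\int_0^{\pi} \frac{\pi}{4}\,dx = -\frac{\pi}{4} + \frac{\pi}{4} = 0 ,
\end{aligned}
\]
\[
\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x) \cos nx \,dx + \frac{1}{\pi}\int_{0}^{\pi} f(x) \cos nx \,dx \\
&= -\frac{1}{\pi}\frac{\pi}{4}\int_{-\pi}^{0} \cos nx \,dx + \frac{1}{\pi}\frac{\pi}{4}\int_0^{\pi} \cos nx \,dx \\
&= -\frac{1}{4}\left[ \frac{\sin nx}{n} \right]_{-\pi}^{0} + \frac{1}{4}\left[ \frac{\sin nx}{n} \right]_0^{\pi} = 0 ,
\end{aligned}
\]
\[
\begin{aligned}
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \sin nx \,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x) \sin nx \,dx + \frac{1}{\pi}\int_{0}^{\pi} f(x) \sin nx \,dx \\
&= \frac{1}{\pi}\int_{-\pi}^{0} \left(-\frac{\pi}{4}\right) \sin nx \,dx + \frac{1}{\pi}\int_0^{\pi} \frac{\pi}{4} \sin nx \,dx \\
&= \frac{1}{4}\left[ \frac{\cos nx}{n} \right]_{-\pi}^{0} - \frac{1}{4}\left[ \frac{\cos nx}{n} \right]_0^{\pi} \\
&= \frac{1}{4n} - \frac{\cos n\pi}{4n} - \frac{\cos n\pi}{4n} + \frac{1}{4n} = \frac{2}{4n} - \frac{2(-1)^n}{4n} ,
\end{aligned}
\]

so $b_n = 1/n$ for odd $n$ and $b_n = 0$ for even $n$. Thus, the Fourier series for $f$ is

\[ f(x) = \sum_{n=1}^{\infty} \frac{1}{2n - 1} \sin(2n - 1)x . \]

The graph of this function and of a partial sum of its Fourier series are given in Figs. 10.9 and 10.10, respectively.

Fig. 10.9 The function from Example 10.3(e)

Fig. 10.10 Fourier approximation of the function in Fig. 10.9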

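(f) We have

\[
a_0 = \frac{1}{4}\int_{-4}^{4} f(x)\,dx = \frac{1}{4}\int_{-4}^{0} f(x)\,dx + \frac{1}{4}\int_{0}^{4} f(x)\,dx = \frac{1}{4}\big[-x\big]_{-4}^{0} + \frac{1}{4}\big[x\big]_0^4 = -1 + 1 = 0 ,
\]
\[
\begin{aligned}
a_n &= -\frac{1}{4}\int_{-4}^{0} \cos\frac{n\pi x}{4}\,dx + \frac{1}{4}\int_0^4 \cos\frac{n\pi x}{4}\,dx \\
&= -\frac{1}{4}\left[ \frac{4}{n\pi}\sin\frac{n\pi x}{4} \right]_{-4}^{0} + \frac{1}{4}\left[ \frac{4}{n\pi}\sin\frac{n\pi x}{4} \right]_0^4 = 0 ,
\end{aligned}
\]
\[
\begin{aligned}
b_n &= \frac{1}{4}\int_{-4}^{0} f(x) \sin\frac{n\pi x}{4}\,dx + \frac{1}{4}\int_0^4 f(x) \sin\frac{n\pi x}{4}\,dx \\
&= -\frac{1}{4}\int_{-4}^{0} \sin\frac{n\pi x}{4}\,dx + \frac{1}{4}\int_0^4 \sin\frac{n\pi x}{4}\,dx \\
&= \frac{1}{4}\left[ \frac{4}{n\pi}\cos\frac{n\pi x}{4} \right]_{-4}^{0} - \frac{1}{4}\left[ \frac{4}{n\pi}\cos\frac{n\pi x}{4} \right]_0^4 \\
&= \frac{1}{n\pi}\big[1 - (-1)^n\big] - \frac{1}{n\pi}\big[(-1)^n - 1\big] = \frac{2}{n\pi} - \frac{2(-1)^n}{n\pi} ,
\end{aligned}
\]

so $b_n = 4/(n\pi)$ for odd $n$ and $b_n = 0$ for even $n$. Therefore, the Fourier series of $f$ is

\[ f(x) = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{2n - 1} \sin\frac{(2n - 1)\pi x}{4} . \]

The graph of this function and of a partial sum of its Fourier series are given in Figs. 10.11 and 10.12.

Fig. 10.11 The function from Example 10.3(f)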

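Fig. 10.12 Fourier approximation of the function in Fig. 10.11

(g) We have

\[
a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} e^x\,dx = \frac{1}{\pi}\big(e^{\pi} - e^{-\pi}\big) ,
\]
\[
a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} e^x \cos nx \,dx = \frac{(-1)^n \big(e^{\pi} - e^{-\pi}\big)}{\pi(1 + n^2)} ,
\]
\[
b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \sin nx \,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} e^x \sin nx \,dx = \frac{n(-1)^n \big(e^{-\pi} - e^{\pi}\big)}{\pi(1 + n^2)} ,
\]

where the factor $n$ in $b_n$ comes from the antiderivative $e^x(\sin nx - n \cos nx)/(1 + n^2)$ of $e^x \sin nx$. The Fourier series of $f$ is

\[
f(x) = \frac{e^{\pi} - e^{-\pi}}{2\pi} + \sum_{n=1}^{\infty} \left( \frac{(-1)^n \big(e^{\pi} - e^{-\pi}\big)}{\pi(1 + n^2)} \cos nx + \frac{n(-1)^n \big(e^{-\pi} - e^{\pi}\big)}{\pi(1 + n^2)} \sin nx \right) .
\]

See Figs. 10.13 and 10.14 for the graph of the function and of a partial sum of its Fourier series.

Fig. 10.13 The function from Example 10.3(g)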
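The coefficients of $e^x$ on $[-\pi, \pi]$ are easy to get wrong by a factor, so a numerical cross-check is worthwhile. The sketch below compares a Simpson-rule evaluation of (10.7) with the closed forms obtained from the antiderivatives $e^x(\cos nx + n \sin nx)/(1 + n^2)$ and $e^x(\sin nx - n \cos nx)/(1 + n^2)$; the helper names are illustrative.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

PI = math.pi
SINH_TERM = math.exp(PI) - math.exp(-PI)  # e^pi - e^(-pi)

def a_exact(n):
    return (-1) ** n * SINH_TERM / (PI * (1 + n * n))

def b_exact(n):
    # Note the factor n, which the cosine coefficient does not have.
    return -n * (-1) ** n * SINH_TERM / (PI * (1 + n * n))

def a_num(n):
    return simpson(lambda x: math.exp(x) * math.cos(n * x), -PI, PI) / PI

def b_num(n):
    return simpson(lambda x: math.exp(x) * math.sin(n * x), -PI, PI) / PI

max_err = max(max(abs(a_num(n) - a_exact(n)), abs(b_num(n) - b_exact(n)))
              for n in range(1, 5))
print(max_err)
```

A small `max_err` supports the closed forms; dropping the factor `n` from `b_exact` would produce a visible mismatch already for $n = 2$.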

Fig. 10.14 Fourier approximation of the function in Fig. 10.13

10.2.3 Further Properties of Fourier Series

Periodic Extension. Let us recall the notion of a periodic function given in Chap. 1. A real function $f$ of a single variable is called periodic with period $p > 0$ if $f(x + p) = f(x)$ for all $x$. For example, $8\pi$ is a period of the sine function as $\sin(x + 8\pi) = \sin x$ for all $x$. The smallest positive value of $p$ for which $f(x + p) = f(x)$ holds for all $x$ is called the fundamental period. For example, $p = 2\pi$ is the fundamental period of the sine function as $2\pi$ is the smallest positive value of $p$ which satisfies $\sin(x + p) = \sin x$ for all $x$. Let us point out that often "period" is defined to be the fundamental period.
Let $f$ be an arbitrary function defined on $(-l, l)$. Its Fourier series (if convergent)

\[ \frac{1}{2}a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{l} + b_n \sin\frac{n\pi x}{l} \right) \]

is a periodic function of $x$ of period $p = 2l$ and thus (if convergent to $f$) not only represents $f$ on $(-l, l)$ but also gives the periodic extension of $f$ on the real line.
Approximation by Partial Sums. Let

\[ s_N(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N} \big( a_n \cos nx + b_n \sin nx \big) \]

denote the $N$th partial sum of the Fourier series of a function $f$ defined on $[-\pi, \pi]$. One may ask how well $s_N$ approximates $f$. We present without proof the following two results. If $f$ is continuous on $[-\pi, \pi]$ with piecewise continuous derivative $f'$ and $f(-\pi) = f(\pi)$, then

\[ |f(x) - s_N(x)| \leq \frac{1}{\sqrt{N}} \cdot \frac{1}{\sqrt{\pi}} \left( \int_{-\pi}^{\pi} |f'(t)|^2\,dt \right)^{1/2} , \quad \text{for all } x. \]

If $f$ is continuous, piecewise differentiable, and satisfies $f(-\pi) = f(\pi)$, then

\[ |f(x) - s_N(x)| \leq \sum_{n=N}^{\infty} \big( |a_n| + |b_n| \big) , \quad \text{for all } x. \]

The Gibbs Phenomenon. If we examine the graphs of the partial sums of the Fourier series in Example 10.3, we observe that all of them are overshooting the true values of the function $f$ near its point of discontinuity. In fact, this phenomenon always occurs when we approximate a discontinuous function with Fourier series. It is known as the Gibbs phenomenon, in honor of Josiah Willard Gibbs, a mathematical physicist working at Yale, who analyzed it just prior to 1900, after it had been discovered by Henry Wilbraham already in 1848. One can show that the overshooting amounts to approximately 9% of the size of the jump, that is, of the difference $|f(x+) - f(x-)|$ of the one-sided limits at the discontinuity point $x$. The main point is that the amount of overshooting of the partial sums $s_N$ does not decrease when $N$ tends to infinity.
As an example, let us consider the Fourier series of the step function

\[
f(x) = \begin{cases} -1 , & \text{if } x \in [-\pi, 0), \\ 0 , & \text{if } x = 0, \\ 1 , & \text{if } x \in (0, \pi], \end{cases}
\]

extended to a $2\pi$-periodic function on $\mathbb{R}$. Proceeding along the lines of the solution of Example 10.3(e), we obtain that

\[ f(x) = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{2n - 1} \sin(2n - 1)x . \]

The graph of $s_{15}$ is given in Fig. 10.15, and the graph of $s_{100}$ is given in Fig. 10.16. We observe the overshooting of the partial sum around the points where $f$ is discontinuous.
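The size of the overshoot can be estimated numerically from the sine series of the step function. The following sketch samples a partial sum on a fine grid in $(0, \pi/2]$ and reports the overshoot relative to the jump size 2; the parameter `terms` counts the nonzero terms of the series, which is an indexing convention chosen here and may differ from the subscripts used in the figures.

```python
import math

def step_partial_sum(x, terms):
    """Sum of the first `terms` nonzero terms of (4/pi) * sum sin((2n-1)x)/(2n-1)."""
    return 4 / math.pi * sum(math.sin((2 * n - 1) * x) / (2 * n - 1)
                             for n in range(1, terms + 1))

def overshoot_fraction(terms, samples=4000):
    """(peak value - 1) / (jump size 2), with the peak searched on a grid in (0, pi/2]."""
    peak = max(step_partial_sum(k * (math.pi / 2) / samples, terms)
               for k in range(1, samples + 1))
    return (peak - 1.0) / 2.0

fractions = {terms: overshoot_fraction(terms) for terms in (25, 100)}
for terms, frac in fractions.items():
    print(terms, round(100 * frac, 2), "% of the jump")
```

Both values come out near 9% of the jump, and increasing `terms` moves the peak closer to the discontinuity without reducing its height.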
Fig. 10.15 Fourier approximation for 15 harmonics

Fig. 10.16 Fourier approximation for 100 harmonics

Complex Form of Fourier Series. For this paragraph, we assume the reader to have some basic familiarity with complex numbers. The sine and cosine are related to the complex exponential function by

\[ e^{i\theta} = \cos\theta + i \sin\theta , \qquad e^{-i\theta} = \cos\theta - i \sin\theta , \]
\[ \cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2} , \qquad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i} , \]

where $\theta$ is any real number. For a real-valued function $f$ with domain $(-\pi, \pi)$, we define the complex Fourier coefficients by

\[ c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx , \tag{10.14} \]

where $n$ is any integer $0, \pm 1, \pm 2, \ldots$. Since

\[
\begin{aligned}
\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx &= \int_{-\pi}^{\pi} f(x)(\cos nx - i \sin nx)\,dx \\
&= \int_{-\pi}^{\pi} f(x) \cos nx \,dx - i \int_{-\pi}^{\pi} f(x) \sin nx \,dx \\
&= \pi a_n - i\pi b_n , \quad n > 0 ,
\end{aligned}
\]

and, analogously,

\[ \int_{-\pi}^{\pi} f(x) e^{inx}\,dx = \pi a_n + i\pi b_n , \quad n > 0 , \]

we see that the complex Fourier coefficients are related to the Fourier sine and cosine coefficients by

\[
\begin{aligned}
c_n &= \frac{1}{2}(a_n - i b_n) , & c_{-n} &= \frac{1}{2}(a_n + i b_n) , \\
a_n &= c_n + c_{-n} , & b_n &= i(c_n - c_{-n}) ,
\end{aligned}
\tag{10.15}
\]

for any $n > 0$. Moreover,

\[ c_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{a_0}{2} . \tag{10.16} \]

From (10.15), we obtain

\[
\begin{aligned}
c_n e^{inx} + c_{-n} e^{-inx} &= c_n(\cos nx + i \sin nx) + c_{-n}(\cos nx - i \sin nx) \\
&= a_n \cos nx + b_n \sin nx .
\end{aligned}
\]

Therefore, the partial sums $s_N$ of the Fourier series can also be represented as

\[ s_N(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N} (a_n \cos nx + b_n \sin nx) = \sum_{n=-N}^{N} c_n e^{inx} , \]

and the Fourier series of $f$ can be written in complex form as

\[ \sum_{n=-\infty}^{\infty} c_n e^{inx} . \]

Over an interval $(-l, l)$, $l > 0$, the complex form of the Fourier series is defined as

\[ \sum_{n=-\infty}^{\infty} c_n e^{\frac{in\pi x}{l}} , \quad \text{where } c_n = \frac{1}{2l}\int_{-l}^{l} f(x) e^{-\frac{in\pi x}{l}}\,dx , \quad n = 0, \pm 1, \pm 2, \pm 3, \ldots \]

It can be verified that the set of functions $\left\{ \frac{1}{\sqrt{2\pi}} e^{inx} \right\}_{n \in \mathbb{Z}}$ is an orthonormal set on $[-\pi, \pi]$ with respect to the complex inner product $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$.
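The relations (10.15) and (10.16) between the complex and the real coefficients can be illustrated numerically. The sketch below uses the arbitrary test function $f(x) = x^2 + x$ (an assumption made only for this illustration) and evaluates (10.14) and (10.7) with the same quadrature rule.

```python
import cmath
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule; also works for complex-valued integrands."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def f(x):
    return x * x + x  # arbitrary real-valued test function

def c(n):
    """Complex Fourier coefficient (10.14)."""
    return simpson(lambda x: f(x) * cmath.exp(-1j * n * x), -math.pi, math.pi) / (2 * math.pi)

def a(n):
    return simpson(lambda x: f(x) * math.cos(n * x), -math.pi, math.pi) / math.pi

def b(n):
    return simpson(lambda x: f(x) * math.sin(n * x), -math.pi, math.pi) / math.pi

gaps = []
for n in range(1, 5):
    gaps.append(abs(c(n) - (a(n) - 1j * b(n)) / 2))   # c_n  = (a_n - i b_n)/2
    gaps.append(abs(c(-n) - (a(n) + 1j * b(n)) / 2))  # c_-n = (a_n + i b_n)/2
    gaps.append(abs(c(-n) - c(n).conjugate()))        # real f: c_-n is the conjugate of c_n
gaps.append(abs(c(0) - a(0) / 2))                     # c_0 = a_0 / 2
print(max(gaps))
```

All gaps vanish up to rounding, which reflects that the identities hold exactly for the discretized integrals as well.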
Sine and Cosine Fourier Series. We consider Fourier series of even and odd functions. Recall that if $g$ is an even function on $(-\pi, \pi)$, then

\[ g(-x) = g(x) \ \text{ for } x \in [0, \pi), \qquad \int_{-\pi}^{0} g(x)\,dx = \int_0^{\pi} g(x)\,dx , \]

and if $h$ is an odd function on $(-\pi, \pi)$, then

\[ h(-x) = -h(x) \ \text{ for } x \in [0, \pi), \qquad \int_{-\pi}^{0} h(x)\,dx = -\int_0^{\pi} h(x)\,dx . \]

Thus, if $f$ is even on $(-\pi, \pi)$, then its Fourier coefficients satisfy

\[
a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x)\,dx + \frac{1}{\pi}\int_0^{\pi} f(x)\,dx = \frac{2}{\pi}\int_0^{\pi} f(x)\,dx ,
\]
\[
a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x) \cos nx \,dx + \frac{1}{\pi}\int_0^{\pi} f(x) \cos nx \,dx = \frac{2}{\pi}\int_0^{\pi} f(x) \cos nx \,dx ,
\]

since the function $g(x) = f(x) \cos nx$ is even, and

\[
b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \sin nx \,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x) \sin nx \,dx + \frac{1}{\pi}\int_0^{\pi} f(x) \sin nx \,dx = 0 ,
\]

since the function $h(x) = f(x) \sin nx$ is odd. Consequently, the Fourier series of an even function $f$ is a cosine series,

\[ \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos nx , \]

where the coefficients $a_n$ are given above.

Similarly, we find that the Fourier series of an odd function $f$ contains only sine terms, and hence it is a sine series

\[ \sum_{n=1}^{\infty} b_n \sin nx , \quad \text{where } b_n = \frac{2}{\pi}\int_0^{\pi} f(x) \sin nx \,dx . \]

Indeed, in this case $g(x) = f(x) \cos nx$ is an odd function, while $h(x) = f(x) \sin nx$ is an even function, so in particular

\[
a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = -\frac{1}{\pi}\int_0^{\pi} f(x)\,dx + \frac{1}{\pi}\int_0^{\pi} f(x)\,dx = 0 ,
\]
\[
\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) \cos nx \,dx = \frac{1}{\pi}\int_{-\pi}^{0} f(x) \cos nx \,dx + \frac{1}{\pi}\int_0^{\pi} f(x) \cos nx \,dx \\
&= -\frac{1}{\pi}\int_0^{\pi} f(x) \cos nx \,dx + \frac{1}{\pi}\int_0^{\pi} f(x) \cos nx \,dx = 0 .
\end{aligned}
\]

As a consequence, if we know that $f$ is odd or even on $(-\pi, \pi)$, we need to compute only the $b_k$'s if $f$ is an odd function and only the $a_k$'s if $f$ is an even function.
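Example 10.4 Find the Fourier series of the function $f(x) = x^2$, $x \in (-\pi, \pi)$.
Solution: Since $f(-x) = (-x)^2 = x^2 = f(x)$, $f$ is even and so $b_n = 0$ for $n = 1, 2, 3, \ldots$ We compute

\[
a_0 = \frac{2}{\pi}\int_0^{\pi} f(x)\,dx = \frac{2}{\pi}\int_0^{\pi} x^2\,dx = \frac{2}{\pi} \cdot \frac{1}{3}\big[x^3\big]_0^{\pi} = \frac{2}{3}\pi^2 ,
\]

and

\[
\begin{aligned}
a_n &= \frac{2}{\pi}\int_0^{\pi} f(x) \cos nx \,dx = \frac{2}{\pi}\int_0^{\pi} x^2 \cos nx \,dx \\
&= \frac{2}{\pi}\left[ \frac{x^2}{n}\sin nx \right]_0^{\pi} - \frac{4}{n\pi}\int_0^{\pi} x \sin nx \,dx \\
&= 0 + \frac{4}{n^2\pi}\big[ x \cos nx \big]_0^{\pi} - \frac{4}{n^2\pi}\int_0^{\pi} \cos nx \,dx \\
&= \frac{4}{n^2}(-1)^n - \frac{4}{n^3\pi}\big[ \sin nx \big]_0^{\pi} = \frac{4}{n^2}(-1)^n .
\end{aligned}
\]

Thus, the Fourier series of $f$ is given as the cosine series

\[ \frac{1}{3}\pi^2 + 4\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2} \cos nx . \]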
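To see the cosine series of Example 10.4 at work, one can evaluate its partial sums and compare them with $x^2$ at a few points of $[-\pi, \pi]$. This is a sketch with arbitrarily chosen sample points; the bound $4/N$ used in the comment follows from $\sum_{n > N} 1/n^2 < 1/N$.

```python
import math

def cosine_series_x2(x, N):
    """N-th partial sum of pi^2/3 + 4 * sum (-1)^n cos(nx) / n^2."""
    return math.pi ** 2 / 3 + 4 * sum((-1) ** n * math.cos(n * x) / n ** 2
                                      for n in range(1, N + 1))

sample_points = (-3.0, -1.0, 0.0, 0.5, 2.0, math.pi)

# Largest deviation from x^2 over the sample points; it is bounded by 4/N.
errors = {N: max(abs(cosine_series_x2(x, N) - x * x) for x in sample_points)
          for N in (10, 100, 1000)}
for N, err in errors.items():
    print(N, err)
```

The error decays roughly like $1/N$ at $x = \pi$, which is the slowest point because all terms of the tail have the same sign there.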

Phase Angle Form and Frequency Spectrum. Let $f$ be a periodic function defined on the real line which has the fundamental period $l$, that is, $f(x + l) = f(x)$ for all $x$, and $l$ is the smallest positive number satisfying this condition. We define $\omega = 2\pi/l$ as the frequency corresponding to $l$. Let

\[ \frac{1}{2}a_0 + \sum_{n=1}^{\infty} \big( a_n \cos(n\omega x) + b_n \sin(n\omega x) \big) \tag{10.17} \]

be the Fourier series of $f$ on $[-l/2, l/2]$. The series

\[ \frac{1}{2}a_0 + \sum_{n=1}^{\infty} d_n \cos(n\omega x + \delta_n) \tag{10.18} \]

is called the phase angle form of the Fourier series. Indeed, if two pairs $(a, b)$ and $(d, \delta)$ of numbers are related by

\[
\begin{aligned}
a &= d \cos\delta , & b &= -d \sin\delta , \\
d &= \sqrt{a^2 + b^2} , & \delta &= \arctan\left( -\frac{b}{a} \right) ,
\end{aligned}
\tag{10.19}
\]

then

\[ a \cos(n\omega x) + b \sin(n\omega x) = d \cos(n\omega x + \delta) \]

holds for all $x$, as a consequence of the trigonometric identities, so the series (10.17) and (10.18) correspond term by term. The phase angle form of the Fourier series is also called its harmonic form. It represents a periodic function as a superposition of cosine waves. The term $\cos(n\omega x + \delta_n)$ is the $n$th harmonic, $d_n$ is the $n$th harmonic amplitude, and $\delta_n$ is the $n$th phase angle of $f$. Note that the harmonic amplitude satisfies

\[ d_n = 2|c_n| , \tag{10.20} \]

where $c_n = (a_n - i b_n)/2$ is the $n$th complex Fourier coefficient of $f$ as introduced earlier.
The amplitude spectrum or frequency spectrum of the periodic function $f$ is a plot of $|c_n| = d_n/2$ on the vertical axis versus $n$ along the horizontal axis. The phase spectrum of $f$ is a plot of the points $(n, \delta_n)$ for $n = 0, 1, 2, \ldots$, where $\delta_n = \arctan(-b_n/a_n)$ is the $n$th phase angle of $f$.
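The conversion (10.19) from a coefficient pair $(a, b)$ to amplitude and phase can be sketched in a few lines. One caveat, stated here as a remark on the sketch rather than on the text: the plain formula $\arctan(-b/a)$ determines $\delta$ only up to a multiple of $\pi$ and is undefined for $a = 0$, so the code below uses the two-argument arctangent `math.atan2(-b, a)`, which selects the angle satisfying both $a = d\cos\delta$ and $b = -d\sin\delta$. The function name `to_phase_angle` is illustrative.

```python
import math

def to_phase_angle(a, b):
    """Return (d, delta) with a = d*cos(delta) and b = -d*sin(delta)."""
    d = math.hypot(a, b)
    delta = math.atan2(-b, a)  # quadrant-aware version of arctan(-b/a)
    return d, delta

pairs = [(1.0, 1.0), (-2.0, 0.5), (0.0, 3.0), (-1.0, -1.0)]
gaps = []
for a, b in pairs:
    d, delta = to_phase_angle(a, b)
    # Compare a*cos(t) + b*sin(t) with d*cos(t + delta) on a grid of angles t.
    gap = max(abs(a * math.cos(t) + b * math.sin(t) - d * math.cos(t + delta))
              for t in (0.1 * k for k in range(63)))
    gaps.append(gap)
    print((a, b), (d, delta), gap)
```

With $t = n\omega x$ this is exactly the term-by-term correspondence between (10.17) and (10.18).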

Example 10.5 Find the complex Fourier series of $f$ on the given intervals. Furthermore, find the frequency spectrum of the function.
(a) $f(x) = \begin{cases} 0 , & -\pi < x < 0 , \\ x , & 0 \leq x < \pi , \end{cases}$ on $[-\pi, \pi]$.

(b) $f(x) = \begin{cases} -1 , & -2 < x < 0 , \\ 1 , & 0 \leq x < 2 , \end{cases}$ on $[-2, 2]$.

(c) $f(x) = \begin{cases} \cos x , & 0 < x < \pi/2 , \\ 0 , & \pi/2 \leq x < \pi , \end{cases}$ on $[0, \pi]$.
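Solution: (a) We have

\[
\begin{aligned}
c_n &= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx = \frac{1}{2\pi}\left( \int_{-\pi}^{0} f(x) e^{-inx}\,dx + \int_0^{\pi} f(x) e^{-inx}\,dx \right) \\
&= \frac{1}{2\pi}\left( 0 + \int_0^{\pi} x e^{-inx}\,dx \right) = \frac{1}{2\pi}\left[ -x e^{-inx} \cdot \frac{1}{in} \right]_0^{\pi} - \frac{1}{2\pi}\int_0^{\pi} \left( -e^{-inx}\frac{1}{in} \right) dx \\
&= \frac{i}{2n} e^{-in\pi} - \frac{1}{2\pi} \cdot \frac{1}{in} \cdot \frac{1}{in}\Big[ e^{-inx} \Big]_0^{\pi} \\
&= \frac{i}{2n} e^{-in\pi} + \frac{1}{2\pi n^2} e^{-in\pi} - \frac{1}{2\pi n^2} \\
&= \frac{1 + in\pi}{2\pi n^2} e^{-in\pi} - \frac{1}{2\pi n^2} , \quad \text{for } n \neq 0,
\end{aligned}
\]
\[
c_0 = \frac{1}{2\pi}\int_0^{\pi} x\,dx = \frac{\pi}{4} .
\]

The complex Fourier representation of $f$ becomes

\[ f(x) = \frac{\pi}{4} + \frac{1}{2\pi}\sum_{n=-\infty,\ n \neq 0}^{\infty} \frac{1}{n^2}\big[ (1 + in\pi) e^{-in\pi} - 1 \big] e^{inx} . \]

Here we have used the formulas

\[ e^{in\pi} = (-1)^n = e^{-in\pi} , \qquad e^{-2\pi i n} = 1 , \qquad e^{-in\pi/2} = (-i)^n . \]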

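(b) We have

\[
\begin{aligned}
c_n &= \frac{1}{4}\int_{-2}^{2} f(x) e^{-in\pi x/2}\,dx = \frac{1}{4}\left( \int_{-2}^{0} (-1) e^{-in\pi x/2}\,dx + \int_0^2 e^{-in\pi x/2}\,dx \right) \\
&= \frac{i}{2n\pi}\big[ -1 + e^{in\pi} + e^{-in\pi} - 1 \big] = \frac{i}{2n\pi}\big[ -2 + (-1)^n + (-1)^n \big] \\
&= \frac{1 - (-1)^n}{n\pi i} , \quad \text{for } n \neq 0,
\end{aligned}
\]
\[
c_0 = \frac{1}{4}\int_{-2}^{2} f(x)\,dx = 0 .
\]

Thus

\[ f(x) = \sum_{n=-\infty,\ n \neq 0}^{\infty} \frac{1 - (-1)^n}{n\pi i} e^{in\pi x/2} . \]

The fundamental period is equal to 4, so $\omega = 2\pi/4 = \pi/2$, $c_0 = 0$, and $|c_n| = (1 - (-1)^n)/(|n|\pi)$ (Fig. 10.17).

Fig. 10.17 Frequency spectrum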

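(c) Using $\cos x = (e^{ix} + e^{-ix})/2$ we get

\[
\begin{aligned}
c_n &= \frac{1}{\pi}\int_0^{\pi} f(x) e^{-2inx}\,dx = \frac{1}{\pi}\int_0^{\pi/2} \cos x \, e^{-2inx}\,dx \\
&= \frac{1}{\pi}\int_0^{\pi/2} \frac{1}{2}\big( e^{ix} + e^{-ix} \big) e^{-2inx}\,dx = \frac{1}{2\pi}\int_0^{\pi/2} \big( e^{(1 - 2n)ix} + e^{-(1 + 2n)ix} \big)\,dx \\
&= \frac{1}{2\pi}\left[ \frac{1}{i(1 - 2n)} e^{(1 - 2n)ix} - \frac{1}{i(1 + 2n)} e^{-(1 + 2n)ix} \right]_0^{\pi/2} = \frac{e^{-in\pi} + 2ni}{\pi(1 - 4n^2)} .
\end{aligned}
\]

As a check, for $n = 0$ this gives $c_0 = 1/\pi$, which agrees with $\frac{1}{\pi}\int_0^{\pi/2} \cos x \,dx$.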

10.3 The Fourier Transform

The Fourier transform is a mathematical procedure that breaks up a function into the frequencies that compose it, as a prism breaks up light into colors. It transforms a function $f$ into a new function, $\hat{f}$ or $\mathcal{F}[f]$ (read as "f hat" or "script f"), which is called the Fourier transform of $f$. Depending on the context, the argument of $f$ is a time variable or a spatial variable. The argument of $\hat{f}$ usually has the meaning of a frequency.
A function and its Fourier transform are two faces of the same information. The
function exhibits the time (or spatial) information and hides the information about
frequencies. The Fourier transform displays information about frequencies and hides
the time (or spatial) information. Nevertheless, the function and its Fourier transform
both contain all the information about the function. One can compute the transform
from the original function as well as reconstruct the function from its transform, that
is, one can invert the transform.
In the previous section, we have studied the decomposition of a function into its
Fourier series, which is a periodic function. This works well for functions defined
on a bounded interval, as we can always think of them as periodically extended
to the whole real line. In contrast to that, the Fourier transform acts on arbitrary
(nonperiodic) functions.

10.3.1 Basic Properties of the Fourier Transform

When dealing with the Fourier transform, one constantly encounters integrals over the whole real line $(-\infty, \infty)$, that is, improper integrals as introduced in Sect. 6.9. In order to simplify the exposition during this section, we call a function $f$ defined on $\mathbb{R} = (-\infty, \infty)$ integrable on $\mathbb{R}$, respectively, square integrable on $\mathbb{R}$, if the improper integral

\[ \int_{-\infty}^{\infty} |f(x)|\,dx , \quad \text{resp.} \quad \int_{-\infty}^{\infty} |f(x)|^2\,dx \]

converges.
For the definition of the Fourier transform, several variants are in common use, see Remark 10.11 below. We choose the following one.
Definition 10.5 (Fourier Transform) Let the function $f$ be defined on $\mathbb{R}$ and assume that it is integrable on $\mathbb{R}$. The function $\hat{f}$, defined on $\mathbb{R}$ by

\[ \hat{f}(\xi) = \int_{-\infty}^{\infty} f(t) e^{-i\xi t}\,dt , \tag{10.21} \]

is called the Fourier transform of $f$.

In signal analysis, $t$ is understood as the time variable and $\xi$ is understood as the frequency variable, see Sect. 10.4 below.
Remark 10.4 1. According to (10.21), integrals of complex-valued functions are involved in the definition of the Fourier transform. They are defined as

\[
\begin{aligned}
\int_{-\infty}^{\infty} f(t) e^{-i\xi t}\,dt &= \int_{-\infty}^{\infty} f(t)\big( \cos(\xi t) - i \sin(\xi t) \big)\,dt \\
&= \int_{-\infty}^{\infty} f(t) \cos(\xi t)\,dt - i \int_{-\infty}^{\infty} f(t) \sin(\xi t)\,dt ,
\end{aligned}
\]

that is, the real and imaginary parts of the integrand are evaluated separately and yield the real and imaginary parts of the integral, which is a complex number (in this case, the number $\hat{f}(\xi)$).
2. The integrand in (10.21) satisfies

\[ |f(t) e^{-i\xi t}| = |f(t)| \cdot |e^{-i\xi t}| = |f(t)| , \]

since $|e^{ix}| = 1$ for all real numbers $x$. Therefore, and since $f$ is assumed to be integrable on $\mathbb{R}$, the improper integral in (10.21) is defined. One then infers from the properties of parameter-dependent integrals (Theorem 8.5 and the remark following it) that the Fourier transform $\hat{f}$ is a continuous function. In addition, one can prove that $\hat{f}(\xi)$ tends to zero as $\xi$ tends to $\pm\infty$. (The latter result is called the Riemann–Lebesgue lemma.)
3. As it stands, the requirement that $f$ has to be integrable on the whole line is rather restrictive. For example, Definition 10.5 does not cover the case when $f$ is a constant function. Indeed, the Fourier transform of the constant 1 is defined, but it is no longer a function defined on $\mathbb{R}$, but a more general mathematical object (although it is called the Dirac function). This, however, is outside the scope of this book.

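Example 10.6 Find the Fourier transform of the following functions:

(a) $f(t) = e^{-|t|}$
(b) $f(t) = \begin{cases} 0 , & t < 0 \\ e^{-t} , & t \geq 0 \end{cases}$
(c) Let $a$ and $k$ be positive numbers, let

\[ f(t) = \begin{cases} k , & -a \leq t < a \\ 0 , & \text{otherwise} \end{cases} \]

Solution: (a) For $f(t) = e^{-|t|}$, we get

\[
\begin{aligned}
\hat{f}(\xi) &= \int_{-\infty}^{\infty} e^{-|t| - it\xi}\,dt = \int_{-\infty}^{0} e^{-|t| - it\xi}\,dt + \int_0^{\infty} e^{-|t| - it\xi}\,dt \\
&= \int_{-\infty}^{0} e^{t} e^{-i\xi t}\,dt + \int_0^{\infty} e^{-t} e^{-i\xi t}\,dt \\
&= \left[ \frac{1}{1 - i\xi} e^{(1 - i\xi)t} \right]_{t=-\infty}^{t=0} + \left[ \frac{-1}{1 + i\xi} e^{-(1 + i\xi)t} \right]_{t=0}^{t=\infty} \\
&= \frac{1}{1 - i\xi} + \frac{1}{1 + i\xi} = \frac{2}{1 - i^2\xi^2} = \frac{2}{1 + \xi^2} .
\end{aligned}
\]

(b) Let

\[ H(t) = \begin{cases} 1 , & t \geq 0 \\ 0 , & t < 0 \end{cases} \]

be the Heaviside function (see Chap. 1). Then $f(t) = H(t) e^{-t}$ and

\[
\begin{aligned}
\hat{f}(\xi) &= \int_{-\infty}^{\infty} f(t) e^{-i\xi t}\,dt = \int_{-\infty}^{\infty} H(t) e^{-t} e^{-i\xi t}\,dt = \int_0^{\infty} e^{-t} e^{-i\xi t}\,dt \\
&= \int_0^{\infty} e^{-(1 + i\xi)t}\,dt = \left[ -\frac{1}{1 + i\xi} e^{-(1 + i\xi)t} \right]_{t=0}^{t=\infty} = \frac{1}{1 + i\xi} .
\end{aligned}
\]

(c) We obtain

\[
\begin{aligned}
\hat{f}(\xi) &= \int_{-\infty}^{\infty} f(t) e^{-i\xi t}\,dt = \int_{-a}^{a} k e^{-i\xi t}\,dt = \left[ \frac{-k}{i\xi} e^{-i\xi t} \right]_{t=-a}^{t=a} \\
&= -\frac{k}{i\xi}\big( e^{-i\xi a} - e^{i\xi a} \big) = \frac{2k}{\xi}\sin(a\xi) ,
\end{aligned}
\]

since

\[ \sin(a\xi) = \frac{e^{i\xi a} - e^{-i\xi a}}{2i} . \]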
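The transforms of parts (a) and (c) can be checked by approximating the integral (10.21) numerically. The following is a sketch under two assumptions: the integral over $\mathbb{R}$ is truncated to a finite window (here $[-40, 40]$ for $e^{-|t|}$, where the neglected tail is of order $e^{-40}$), and the values $k = 3$, $a = 2$ for the box function are arbitrary illustrative choices.

```python
import cmath
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule; also works for complex-valued integrands."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def fourier_transform(f, xi, lo, hi):
    """Approximation of (10.21) with the integral truncated to [lo, hi]."""
    return simpson(lambda t: f(t) * cmath.exp(-1j * xi * t), lo, hi)

def ft_exp_abs(xi, T=40.0):
    """Transform of exp(-|t|); the window is split at the kink t = 0."""
    f = lambda t: math.exp(-abs(t))
    return fourier_transform(f, xi, -T, 0.0) + fourier_transform(f, xi, 0.0, T)

def ft_box(xi, k=3.0, a=2.0):
    """Transform of the box function of height k on [-a, a)."""
    return fourier_transform(lambda t: k, xi, -a, a)

xis = (0.5, 1.0, 2.5)
err_exp = max(abs(ft_exp_abs(xi) - 2 / (1 + xi * xi)) for xi in xis)
err_box = max(abs(ft_box(xi) - 2 * 3.0 / xi * math.sin(2.0 * xi)) for xi in xis)
print(err_exp, err_box)
```

Both errors are at the level of the quadrature accuracy, and the imaginary parts of the computed values vanish, as expected for even functions $f$.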

Remark 10.5 (i) For a given function $f$, the Fourier transform (if defined) yields a new function $\hat{f}$. We may thus view the Fourier transform as a mapping whose domain and range are certain sets of functions. Such a mapping is commonly called an operator. Let us denote it by $\mathcal{F}$, so

\[ \mathcal{F}(f) = \hat{f} . \tag{10.22} \]

From Definition 10.5, we see that $\mathcal{F}$ is linear, that is,

\[ \mathcal{F}(\alpha f + \beta g) = \alpha \mathcal{F}(f) + \beta \mathcal{F}(g) \]

holds for functions $f$, $g$ and scalars $\alpha$, $\beta$.

(ii) Instead of $\mathcal{F}(f) = \hat{f}$, one often writes

\[ \mathcal{F}[f(t)] = \hat{f}(\xi) . \]

Although this is, in a strict sense, mathematically not correct (it confuses the functions $f$ and $\hat{f}$ with their function values $f(t)$ and $\hat{f}(\xi)$), it leads to a concise way of writing formulas. In this notation, the result of Example 10.6(a) becomes

\[ \mathcal{F}[e^{-|t|}] = \frac{2}{1 + \xi^2} . \]

In the next two theorems, we state some important properties of the Fourier transform.

Theorem 10.2 Let $f$ be a function which is integrable on $\mathbb{R}$.

(a) Time shift. Let $t_0$ be a real number, let $g(t) = f(t - t_0)$. Then

\[ \hat{g}(\xi) = e^{-i\xi t_0} \hat{f}(\xi) , \quad \xi \in \mathbb{R} . \tag{10.23} \]

This means that the Fourier transform of the translated function $g$ equals the Fourier transform of $f$ multiplied by the factor $e^{-i\xi t_0}$. In shorter notation, without introducing the function $g$ explicitly, (10.23) becomes

\[ \mathcal{F}[f(t - t_0)] = e^{-i\xi t_0} \hat{f}(\xi) . \]

(b) Frequency shift. Let $\xi_0$ be a real number, let $g(t) = e^{i\xi_0 t} f(t)$. Then

\[ \hat{g}(\xi) = \hat{f}(\xi - \xi_0) , \quad \xi \in \mathbb{R} , \tag{10.24} \]

or, in shorter notation,

\[ \mathcal{F}[e^{i\xi_0 t} f(t)] = \hat{f}(\xi - \xi_0) . \]

(c) Scaling or Dilation. Let $a$ be a real number with $a \neq 0$, let $g(t) = f(at)$. Then

\[ \hat{g}(\xi) = \frac{1}{|a|} \hat{f}\left( \frac{\xi}{a} \right) , \quad \xi \in \mathbb{R} . \tag{10.25} \]

This states that the Fourier transform of the scaled function is obtained by replacing $\xi$ by $\xi/a$ in the Fourier transform of the original function and dividing by the magnitude of the scaling factor.

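The formulas in the theorem above are obtained from properties of the integral. For example, (10.23) results from the computation

\[
\begin{aligned}
\hat{g}(\xi) &= \int_{-\infty}^{\infty} f(t - t_0) e^{-i\xi t}\,dt = e^{-i\xi t_0} \int_{-\infty}^{\infty} f(t - t_0) e^{-i\xi(t - t_0)}\,dt \\
&= e^{-i\xi t_0} \int_{-\infty}^{\infty} f(s) e^{-i\xi s}\,ds = e^{-i\xi t_0} \hat{f}(\xi) ,
\end{aligned}
\]

and (10.25) from the computation

\[
\hat{g}(\xi) = \int_{-\infty}^{\infty} f(at) e^{-i\xi t}\,dt = \frac{1}{|a|}\int_{-\infty}^{\infty} f(u) e^{-iu\xi/a}\,du = \frac{1}{|a|} \hat{f}\left( \frac{\xi}{a} \right) ,
\]

where we have substituted $u = at$; for $a < 0$ the limits of integration are interchanged, which accounts for the absolute value.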
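The frequency shift formula (10.24) is not derived in the text; it follows in the same way directly from Definition 10.5. The following short computation is included as a sketch of that missing step and uses only the notation of Theorem 10.2(b).

```latex
% With g(t) = e^{i \xi_0 t} f(t), combine the two exponentials:
\[
\hat{g}(\xi)
  = \int_{-\infty}^{\infty} e^{i\xi_0 t} f(t)\, e^{-i\xi t}\,dt
  = \int_{-\infty}^{\infty} f(t)\, e^{-i(\xi - \xi_0) t}\,dt
  = \hat{f}(\xi - \xi_0) .
\]
```

Note that $g$ is integrable whenever $f$ is, since $|e^{i\xi_0 t} f(t)| = |f(t)|$.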

Remark 10.6 The properties given in Theorem 10.2 can also be written in operator form. For example, the time shift can be expressed by the translation operator $T_{t_0}$ which maps a function $f$ to its translate $T_{t_0} f$ defined by $(T_{t_0} f)(t) = f(t - t_0)$. Equation (10.23) then takes the form

\[ \widehat{T_{t_0} f}(\xi) = e^{-i\xi t_0} \hat{f}(\xi) . \]




Theorem 10.3 (a) Suppose that $f$ is continuous, $f'$ is piecewise continuous and both $f$ and $f'$ are integrable on $\mathbb{R}$. Then

\[ \widehat{f'}(\xi) = \mathcal{F}[f'](\xi) = i\xi \hat{f}(\xi) . \tag{10.26} \]

(b) Suppose that $f$ satisfies $\int_{-\infty}^{\infty} |f(t)|\,dt < \infty$ and $\int_{-\infty}^{\infty} |t f(t)|\,dt < \infty$. Then (see Remark 10.5 for the notation)

\[ \mathcal{F}[t f(t)] = i \frac{d}{d\xi} \hat{f}(\xi) . \tag{10.27} \]

Remark 10.7 If we apply Theorem 10.3 to derivatives of $f$, we obtain the formulas

$$ \widehat{f^{(n)}}(\xi) = \mathcal{F}[f^{(n)}](\xi) = (i\xi)^n\,\hat f(\xi) , \tag{10.28} $$

$$ \mathcal{F}[t^n f(t)] = i^n\,\frac{d^n}{d\xi^n}\hat f(\xi) , \tag{10.29} $$

provided $f$ and its derivatives up to order $n$ satisfy the corresponding assumptions.
In particular, for $n = 2$ we have

$$ \mathcal{F}[t^2 f(t)] = -\frac{d^2}{d\xi^2}\hat f(\xi) . \tag{10.30} $$

Theorem 10.4 (Plancherel’s Identity) If $f$ is integrable as well as square integrable
on $\mathbb{R}$, and if the same holds for $g$, then

$$ \int_{-\infty}^{\infty} \hat f(\xi)\,\overline{\hat g(\xi)}\,d\xi = 2\pi \int_{-\infty}^{\infty} f(t)\,\overline{g(t)}\,dt . \tag{10.31} $$

(Here $\overline{c}$ denotes the complex conjugate of the complex number $c$.)

Setting g = f in the preceding theorem, we obtain the following.


Theorem 10.5 (Parseval’s Identity) If $f$ and $f^2$ are integrable on $\mathbb{R}$, then

$$ \int_{-\infty}^{\infty} |\hat f(\xi)|^2\,d\xi = 2\pi \int_{-\infty}^{\infty} |f(t)|^2\,dt . \tag{10.32} $$

If we interpret $f$ as a signal, the norm

$$ \| f \| = \left( \int_{-\infty}^{\infty} |f(t)|^2\,dt \right)^{1/2} $$

represents the energy of the signal.
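As a quick consistency check of (10.32), one may take the transform pair $f(t) = e^{-|t|}$, $\hat f(\xi) = 2/(1+\xi^2)$ from Example 10.3.1 (a). The worked computation below is an added illustration; it assumes the standard integral $\int_{-\infty}^{\infty} (1+\xi^2)^{-2}\,d\xi = \pi/2$, which is not derived in this section.

```latex
\begin{align*}
\int_{-\infty}^{\infty} |f(t)|^2\,dt
  &= \int_{-\infty}^{\infty} e^{-2|t|}\,dt
   = 2\int_{0}^{\infty} e^{-2t}\,dt = 1 , \\
\int_{-\infty}^{\infty} |\hat f(\xi)|^2\,d\xi
  &= \int_{-\infty}^{\infty} \frac{4}{(1+\xi^2)^2}\,d\xi
   = 4\cdot\frac{\pi}{2} = 2\pi .
\end{align*}
```

Both sides of (10.32) therefore equal $2\pi$, and the energy of this signal is $\|f\| = 1$.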


The Inverse Fourier Transform. In the beginning of this section, we have defined the
Fourier transform fˆ of a function f . It turns out that we can reverse this procedure—if
we know fˆ, we can obtain f according to the following result.
Theorem 10.6 Suppose that $f$ is continuous and that $f$ and $\hat f$ are integrable on $\mathbb{R}$.
Then

$$ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat f(\xi)\, e^{i\xi t}\,d\xi \tag{10.33} $$

holds for all $t \in \mathbb{R}$.


In abstract terms, the right-hand side of (10.33) defines the inverse $\mathcal{F}^{-1}$ of the Fourier
transform $\mathcal{F}$. It is called the inverse Fourier transform,

$$ (\mathcal{F}^{-1}[g])(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} g(\xi)\, e^{i\xi t}\,d\xi . \tag{10.34} $$

Indeed, we see that $\mathcal{F}^{-1}[\mathcal{F}[f]] = f$.


Remark 10.8 The following interpretation of Theorem 10.6 is fundamental for many
applications of the Fourier transform. Consider $t$ as a time variable. For fixed $\xi$, the
values $e^{i\xi t}$ traverse the unit circle at constant speed. Since $\xi = 2\pi$ corresponds
to the completion of one cycle in one unit of time, the number $\xi/2\pi$ gives the number
of cycles per unit time, which is called the frequency. Its unit is Hertz if $t$ is measured
in seconds. The number $\xi$ is called the angular frequency; it gives the number of
radians traversed per unit time. Seen in this light, formula (10.33) is a decomposition
of the original function $f$ into a weighted sum of oscillations in the form of an integral.

The weight of the angular frequency ξ is given by the value fˆ(ξ ) of the Fourier
transform of f .

Remark 10.9 If $f$ is twice differentiable and if $f$, $f'$, and $f''$ are integrable on $\mathbb{R}$,
then $\hat f$ is integrable on $\mathbb{R}$, so in this case we can apply Theorem 10.6.

For piecewise continuous functions, one has the following result.


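Theorem 10.7 Let $f$ and $f'$ be piecewise continuous and assume that $f$ is integrable
on $\mathbb{R}$. Then

$$ \lim_{R\to\infty} \frac{1}{2\pi} \int_{-R}^{R} \hat f(\xi)\, e^{i\xi t}\,d\xi = \frac{1}{2}\bigl( f(t+) + f(t-) \bigr) \tag{10.35} $$

holds for all $t \in \mathbb{R}$.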

Remark 10.10 For a function $h$, the limit

$$ \lim_{R\to\infty} \int_{-R}^{R} h(x)\,dx , $$

if it exists, is called the principal value (or Cauchy principal value) of $\int_{-\infty}^{\infty} h(x)\,dx$.
Thus, under the assumptions of Theorem 10.7 we also obtain the formula

$$ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat f(\xi)\, e^{i\xi t}\,d\xi $$

at points $t$ where $f$ is continuous, provided we interpret the integral as its principal
value.

Remark 10.11 If one wants the frequency variable $\xi$ to denote ordinary frequency
instead of angular frequency, one defines the Fourier transform by

$$ \hat f(\xi) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i t\xi}\,dt . $$

The inverse formula then becomes

$$ f(t) = \int_{-\infty}^{\infty} \hat f(\xi)\, e^{2\pi i t\xi}\,d\xi . $$

In this case, the frequency $\xi$ is measured in Hertz (cycles per second), if $t$ is measured
in seconds. If one keeps the angular frequency, but wants a more symmetric relation
between the transform and its inverse, one uses

$$ \hat f(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-it\xi}\,dt , \qquad
   f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat f(\xi)\, e^{it\xi}\,d\xi . $$

Less common is an interchange of the sign in the exponent,

$$ \hat f(\xi) = \int_{-\infty}^{\infty} f(t)\, e^{it\xi}\,dt , \qquad
   f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat f(\xi)\, e^{-it\xi}\,d\xi . $$

It is also possible to mix these variants. Therefore, when dealing with the Fourier
transform, one has to make sure which convention is used.

Localization and the Uncertainty Principle. In this subsection, we explain the fact
that a function f and its Fourier transform fˆ cannot both be concentrated on a small
interval.
First, consider the dilation $g(t) = f(at)$. For $a > 1$, $g$ represents a compression
of $f$ around $t = 0$ by the factor $a$. On the other hand, Theorem 10.2(c) says that

$$ \hat g(\xi) = \frac{1}{|a|}\,\hat f\!\left(\frac{\xi}{a}\right) , $$

that is, we have to stretch out $\hat f$ by a factor $a$ in order to obtain $\hat g$. If $0 < a < 1$, $g$ is a
stretched version of $f$ while $\hat g$ is a compressed version of $\hat f$.
Second, assume that fˆ(ξ ) = 0 outside some interval [−l, l]. We say in this case
that $f$ has bandwidth $l$. From the Fourier inversion formula, which becomes

$$ f(t) = \frac{1}{2\pi} \int_{-l}^{l} \hat f(\xi)\, e^{i\xi t}\,d\xi , $$

one concludes, with the aid of a result of complex function theory which we cannot
present here, that f can be zero only at isolated points, so it spreads out to infinity and
in particular cannot have the property that f (t) = 0 outside some interval [−M, M].
Third, one can quantify this phenomenon. Consider the expression

$$ \Delta f = \int_{-\infty}^{\infty} t^2 |f(t)|^2\,dt \cdot \left( \int_{-\infty}^{\infty} |f(t)|^2\,dt \right)^{-1} . $$

If the large values of f arise only at small values of t and f decays rapidly as t
gets large, the numerator will be small in comparison with the denominator, so Δf
somehow measures the concentration (or localization) of f around t = 0. It can be
proved that

$$ (\Delta f) \cdot (\Delta \hat f) \ge \frac{1}{4} , $$

that is, if Δf is small then Δ fˆ has to be large and vice versa. This is called the
uncertainty principle.
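A worked example may make the bound concrete. The computation below is an added illustration for the Gaussian $f(t) = e^{-t^2/2}$; it assumes the standard transform pair $\hat f(\xi) = \sqrt{2\pi}\,e^{-\xi^2/2}$ (for the convention used in this section) and the integrals $\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi}$ and $\int_{-\infty}^{\infty} t^2 e^{-t^2}\,dt = \sqrt{\pi}/2$, none of which are derived here.

```latex
\begin{align*}
\Delta f &= \frac{\int_{-\infty}^{\infty} t^2 e^{-t^2}\,dt}{\int_{-\infty}^{\infty} e^{-t^2}\,dt}
          = \frac{\sqrt{\pi}/2}{\sqrt{\pi}} = \frac{1}{2} , \\
\Delta \hat f &= \frac{2\pi \int_{-\infty}^{\infty} \xi^2 e^{-\xi^2}\,d\xi}{2\pi \int_{-\infty}^{\infty} e^{-\xi^2}\,d\xi}
          = \frac{1}{2} , \\
(\Delta f)\cdot(\Delta \hat f) &= \frac{1}{4} .
\end{align*}
```

Under these assumptions the Gaussian attains the lower bound, so the constant $1/4$ in the inequality cannot be improved.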

10.3.2 Convolution

The convolution of two sequences $a = \{a_n\}$ and $b = \{b_n\}$, where $n$ ranges over all
integers, is defined as

$$ (a * b)_k = \sum_{j=-\infty}^{\infty} a_j b_{k-j} , $$

provided the infinite series converges. If $a_j$ and $b_j$ are nonzero only for $j \ge 0$, the
convolution becomes the finite sum

$$ (a * b)_k = \sum_{j=0}^{k} a_j b_{k-j} . $$

For example, for $k = 2$ and $k = 3$, we have

$$ (a * b)_2 = \sum_{j=0}^{2} a_j b_{2-j} = a_0 b_2 + a_1 b_1 + a_2 b_0 , $$

$$ (a * b)_3 = \sum_{j=0}^{3} a_j b_{3-j} = a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 . $$
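The finite sum above translates directly into a short program. The following sketch is an added illustration (the function name `convolve` and the sample sequences are made up); it treats the two lists as sequences indexed from 0 that vanish beyond their last entry.

```python
def convolve(a, b):
    """Convolution of two finite sequences indexed from 0.

    (a * b)_k = sum_{j=0}^{k} a_j * b_{k-j}; entries beyond the end of a list
    are treated as zero, so the result has len(a) + len(b) - 1 entries.
    """
    if not a or not b:
        return []
    result = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for m, bm in enumerate(b):
            result[j + m] += aj * bm   # a_j * b_m contributes to index k = j + m
    return result

a = [1, 2, 3]
b = [4, 5, 6]
c = convolve(a, b)
print(c)   # [4, 13, 28, 27, 18]
```

Here `c[2]` equals $a_0 b_2 + a_1 b_1 + a_2 b_0 = 6 + 10 + 12 = 28$, in agreement with the formula for $k = 2$.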

For functions, convolution involves the integral instead of the sum.


Definition 10.6 Let $f$ and $g$ be two functions defined on the real line $\mathbb{R}$, assume
that $f$ and $g$ are integrable on $\mathbb{R}$. The convolution of $f$ and $g$ is denoted by $f * g$
and defined as

$$ (f * g)(x) = \int_{-\infty}^{\infty} f(x - y)\, g(y)\,dy \tag{10.36} $$

(read as f star g, or f convolved with g).


One can prove that, under the stated assumptions, the improper integral in (10.36)
indeed converges (we will not do it here), so that f ∗ g is integrable on R, too.
Remark 10.12 (a) The convolution has the following properties:
(i) For all functions f, g, h as in Definition 10.6,

( f + g) ∗ h = ( f ∗ h) + (g ∗ h) ,

that is, [( f + g) ∗ h](x) = ( f ∗ h)(x) + (g ∗ h)(x) for all x ∈ R.


(ii) (λ f ) ∗ g = λ( f ∗ g) for functions f, g and scalars λ.
(iii) f ∗ (g + h) = ( f ∗ g) + ( f ∗ h).
(iv) f ∗ (λg) = λ( f ∗ g).

(v) f ∗ (g ∗ h) = ( f ∗ g) ∗ h.
(vi) f ∗ g = g ∗ f .
These properties tell us that the convolution f ∗ g is linear with respect to f and
g separately, and that it is commutative (property (vi)).
(c) The convolution can also be interpreted as a moving weighted average of the
function f , where the weighting is determined by the function g. In view of (a)
(vi), f ∗ g can also be interpreted as a moving weighted average of g, where the
weighting is determined by f .
(d) If the function f (x) has large oscillations, sharp peaks, or discontinuities, then
averaging about each point x will tend to decrease the oscillations, lower the
peaks, and smooth out the discontinuities. In view of all this, convolution acts
as a smoothing operator. Let us mention two particular results in this direction.
(i) If $\sup_{t\in\mathbb{R}} |f(t)|$ and $\int_{-\infty}^{\infty} |g(t)|\,dt$ are finite, then the function $f * g$ is
continuous on $\mathbb{R}$.
(ii) If $\int_{-\infty}^{\infty} |f(t)|^2\,dt$ and $\int_{-\infty}^{\infty} |g(t)|^2\,dt$ are finite, then the function $f * g$ is
continuous on $\mathbb{R}$.
(e) Convolutions arise as a basic tool to describe input–output systems. Such a sys-
tem transforms a time-dependent input function u = u(t) into a time-dependent
output function $w = w(t)$ according to

$$ w(t) = (f * u)(t) = \int_{-\infty}^{\infty} f(t - s)\, u(s)\,ds . \tag{10.37} $$

In signal analysis, such an input–output system is called a filter. For example,


an electrical circuit with an input and an output line can be described in this
way, and indeed much of mathematical systems theory has been developed in
this context. A filter may serve various purposes such as letting through certain
frequencies while blocking other ones, or removing noise or blurring in pictures.
We may write the system (10.37) in operator form as

w = S [u] .

The system is linear,

S [αu + βv] = αS [u] + βS [v] ,

and it is time-invariant, that is, if ũ(t) = u(t − h) is a translate of u, then w̃ =


S [ũ] satisfies w̃(t) = w(t − h), that is, w̃ is the corresponding translate of w.
This means that the behavior of the system does not change when time passes.
We state some important properties of the convolution.

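Theorem 10.8 Suppose that $f$ and $g$ are integrable on $\mathbb{R}$. Then

$$ \int_{-\infty}^{\infty} |(f * g)(t)|\,dt \le \int_{-\infty}^{\infty} |f(t)|\,dt \cdot \int_{-\infty}^{\infty} |g(t)|\,dt . \tag{10.38} $$

For convolution in the time domain, we have

$$ \widehat{f * g}(\xi) = \hat f(\xi)\,\hat g(\xi) . \tag{10.39} $$

For convolution in the frequency domain, we have

$$ \widehat{f g}(\xi) = \frac{1}{2\pi}\,(\hat f * \hat g)(\xi) . \tag{10.40} $$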

Thus, under the action of the Fourier transform or its inverse, multiplication becomes
convolution and vice versa. This is a major reason why convolution plays a prominent
role in the calculus of Fourier transforms.

10.3.3 The Discrete Fourier Transform

In digital signal processing, signals are represented by sequences {xn }, also written
as {x(n)}, where n ranges over all integers. In other words, we consider functions x
whose domain are the integers, instead of functions defined on the real numbers R.
More specifically, let x be a periodic sequence with period N , that is, x(n +
N ) = x(n) for all n. Any such sequence is completely specified by the values x(0),
x(1), . . . , x(N − 1). The (N -point) discrete Fourier transform (DFT) of x, denoted
by $\hat x$, is the $N$-periodic sequence defined by

$$ \hat x(n) = \sum_{j=0}^{N-1} x(j)\, e^{-2\pi i jn/N} , \quad 0 \le n \le N - 1 , \tag{10.41} $$

and extended by periodicity to all integer values of n.


The following theorem yields the inverse of the discrete Fourier transform.
Theorem 10.9 Let $x$ be an $N$-periodic sequence $x(n)$ with DFT $\hat x$. Then

$$ x(j) = \frac{1}{N} \sum_{n=0}^{N-1} \hat x(n)\, e^{2\pi i nj/N} , \quad 0 \le j \le N - 1 . \tag{10.42} $$
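Formulas (10.41) and (10.42) can be tried out directly. The following is a minimal sketch, added for illustration, that evaluates both sums literally (so it needs on the order of $N^2$ operations); the function names `dft` and `idft` and the sample sequence are assumptions, not notation from the text.

```python
import cmath

def dft(x):
    """N-point DFT as in (10.41): x_hat(n) = sum_j x(j) * exp(-2*pi*i*j*n/N)."""
    N = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * n / N) for j in range(N))
            for n in range(N)]

def idft(x_hat):
    """Inverse DFT as in (10.42): x(j) = (1/N) * sum_n x_hat(n) * exp(2*pi*i*n*j/N)."""
    N = len(x_hat)
    return [sum(x_hat[n] * cmath.exp(2j * cmath.pi * n * j / N) for n in range(N)) / N
            for j in range(N)]

x = [1.0, 2.0, 0.0, -1.0]          # one period of a 4-periodic sequence (made-up values)
x_hat = dft(x)
x_back = idft(x_hat)               # should reproduce x up to rounding
max_error = max(abs(u - v) for u, v in zip(x, x_back))
print(x_hat)
print(max_error)
```

For this sequence, $\hat x(0) = 2$ is simply the sum of the entries, and the round trip `idft(dft(x))` returns the original values up to floating-point rounding.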

We define the convolution of N -periodic sequences.



Definition 10.7 Let $x$ and $y$ be $N$-periodic sequences. The circular convolution of
$x$ and $y$ is defined by

$$ (x * y)(n) = \sum_{k=0}^{N-1} x(k)\, y(n - k) . \tag{10.43} $$

One may check immediately from (10.43) that x ∗ y is also N -periodic.


Theorem 10.10 Let $x$ and $y$ be $N$-periodic sequences with DFTs $\hat x$ and $\hat y$. Then

$$ \widehat{x * y}(n) = \hat x(n)\,\hat y(n) , \tag{10.44} $$

where $\widehat{x * y}$ denotes the DFT of $x * y$.
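The statement of Theorem 10.10 can be verified numerically for a small example. The sketch below is an added illustration with made-up data; `dft` evaluates (10.41) literally and `circular_convolution` evaluates (10.43) with indices reduced modulo $N$, which is how periodicity is realized on one stored period.

```python
import cmath

def dft(x):
    """N-point DFT as in (10.41), evaluated directly."""
    N = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * n / N) for j in range(N))
            for n in range(N)]

def circular_convolution(x, y):
    """(x * y)(n) = sum_{k=0}^{N-1} x(k) * y(n - k), indices taken modulo N."""
    N = len(x)
    if len(y) != N:
        raise ValueError("both sequences must have the same period")
    return [sum(x[k] * y[(n - k) % N] for k in range(N)) for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]          # made-up 4-periodic sequences
y = [0.5, 0.0, 3.0, 1.0]

lhs = dft(circular_convolution(x, y))            # DFT of the convolution
rhs = [u * v for u, v in zip(dft(x), dft(y))]    # product of the DFTs
max_diff = max(abs(u - v) for u, v in zip(lhs, rhs))
print(max_diff)                                   # close to zero
```

The same helper can be used to confirm that `circular_convolution(x, y)` and `circular_convolution(y, x)` agree, as one would expect from the commutativity of the product on the right-hand side of (10.44).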
If one computes the discrete Fourier transform of an $N$-periodic sequence directly
from its definition, one needs $N$ multiplications (and $N$ additions) for each element
$\hat x(n)$, thus $N^2$ multiplications for all elements $\hat x(0), \dots, \hat x(N - 1)$. However, due to
specific properties of the factors $e^{-2\pi i jn/N}$, it is possible to compute the DFT with only
$cN \log_2 N$ multiplications and additions, where $c$ is a small constant. An algorithm
for this purpose was discovered by James Cooley and John Tukey, published in 1965,
and has since been known as the fast Fourier transform (FFT). Its history goes back to Carl
Friedrich Gauss. The algorithm is based on the recursive factorization of a particular
matrix. When $N$ is large, the speedup from $N^2$ to $cN \log_2 N$ is enormous. Indeed,
the FFT is widely used in computations involving the Fourier transform in all
areas of science and technology.
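To indicate how the saving arises, here is a minimal sketch of one common variant, a recursive radix-2 scheme that splits the sum (10.41) into its even-indexed and odd-indexed parts. It is an added illustration that assumes $N$ is a power of two; it is not claimed to be the exact formulation of Cooley and Tukey, and the names `fft` and `dft` are chosen here for convenience.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; assumes len(x) is a power of two."""
    N = len(x)
    if N <= 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])            # DFT of the even-indexed entries (length N/2)
    odd = fft(x[1::2])             # DFT of the odd-indexed entries (length N/2)
    result = [0j] * N
    for n in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * n / N) * odd[n]
        result[n] = even[n] + twiddle
        result[n + N // 2] = even[n] - twiddle
    return result

def dft(x):
    """Direct evaluation of (10.41), used only as a reference."""
    N = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * n / N) for j in range(N))
            for n in range(N)]

x = [float(k % 5) - 1.5 for k in range(16)]     # made-up data of length 16
max_diff = max(abs(u - v) for u, v in zip(fft(x), dft(x)))
print(max_diff)                                  # close to zero
```

Each level of the recursion does work proportional to $N$, and there are $\log_2 N$ levels, which is where the operation count $cN \log_2 N$ comes from in this variant.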

10.4 Application of Fourier Methods to Signal Analysis

Signals are ubiquitous. A train’s whistle or the blinking of a car’s beam can be viewed
as quantities varying in time which contain information; they are examples of time-
dependent signals. Early in human history, signals of smoke by day and of fire by
night were used to transmit information. In recent times, telegraph, telephone,
radio, television, and radar have been, and in part still are, used as signal transmitters. A
radio signal consists of a sine or cosine wave with radio frequency, called the carrier
wave, which has been modulated with the information to be transmitted.
Signals can be divided into two categories, analog signals (functions defined on
a continuum of numbers, for example, an interval in R) and digital signals, which
are defined on a discrete set like the integers. In 1949, Claude E. Shannon of Bell
Telephone Laboratories published a mathematical result now known as the Shannon
sampling theorem. This result provided the foundation for digital signal processing. It
tells us that if the range of frequencies of a signal measured in cycles per second does
not exceed n, then the time-continuous signal can be reconstructed with complete
accuracy by measuring its amplitude 2n times a second.

The study of signals is relevant not only in telecommunication but also in teleme-
try, astronomy, oceanography, optics, crystallography, geophysics, bioengineering,
bioinformatics, and medicine, to mention a few.
The Shannon Sampling Theorem. Let us assume that a signal $f$ is band-limited,
that is,

$$ \hat f(\xi) = 0 \quad \text{for all } |\xi| > l \tag{10.45} $$

holds for some $l > 0$, and the smallest such number $l$ is called the bandwidth of the
signal. This means that the total frequency content of the signal $f$ lies in the band (or
interval) $[-l, l]$. Moreover, let us assume that $f$ is integrable and square integrable
on $\mathbb{R}$, that is, the signal $f$ has finite energy. Let it be recovered from its Fourier
transform as

$$ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat f(\xi)\, e^{i\xi t}\,d\xi . $$

Because of (10.45),

$$ f(t) = \frac{1}{2\pi} \int_{-l}^{l} \hat f(\xi)\, e^{i\xi t}\,d\xi . \tag{10.46} $$

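We expand $\hat f$ on $[-l, l]$ in a complex Fourier series, so

$$ \hat f(\xi) = \sum_{n=-\infty}^{\infty} c_n\, e^{n\pi i \xi/l} , \tag{10.47} $$

where

$$ c_n = \frac{1}{2l} \int_{-l}^{l} \hat f(\xi)\, e^{-n\pi i \xi/l}\,d\xi . \tag{10.48} $$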

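We insert (10.47) in (10.46) and obtain

$$
\begin{aligned}
f(t) &= \frac{1}{2\pi} \int_{-l}^{l} \hat f(\xi)\, e^{i\xi t}\,d\xi
      = \frac{1}{2\pi} \int_{-l}^{l} \sum_{n=-\infty}^{\infty} c_n\, e^{n\pi i \xi/l}\, e^{i\xi t}\,d\xi \\
     &= \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} \int_{-l}^{l} c_n\, e^{n\pi i \xi/l}\, e^{i\xi t}\,d\xi .
\end{aligned}
\tag{10.49}
$$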

(Interchanging the integral with the sum is possible by a general property of Fourier
series of functions of finite energy; note that fˆ has finite energy by Parseval’s formula
(10.32).) Next, we compare (10.46) for $t = -n\pi/l$ with (10.48) and see that

$$ c_n = \frac{\pi}{l}\, f\!\left(-\frac{n\pi}{l}\right) . \tag{10.50} $$

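We insert this value into (10.49) and compute

$$
\begin{aligned}
f(t) &= \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} \frac{\pi}{l}\, f\!\left(-\frac{n\pi}{l}\right) \int_{-l}^{l} e^{n\pi i \xi/l}\, e^{i\xi t}\,d\xi \\
     &= \frac{1}{2l} \sum_{n=-\infty}^{\infty} f\!\left(\frac{n\pi}{l}\right) \int_{-l}^{l} e^{-n\pi i \xi/l}\, e^{i\xi t}\,d\xi
\end{aligned}
$$

(we have replaced $n$ by $-n$, as $n$ ranges over all integers)

$$
\begin{aligned}
     &= \frac{1}{2l} \sum_{n=-\infty}^{\infty} f\!\left(\frac{n\pi}{l}\right) \int_{-l}^{l} e^{i\xi (t - n\pi/l)}\,d\xi \\
     &= \frac{1}{2l} \sum_{n=-\infty}^{\infty} f\!\left(\frac{n\pi}{l}\right) \left[ \frac{1}{i(t - n\pi/l)}\, e^{i\xi (t - n\pi/l)} \right]_{\xi=-l}^{\xi=l} \\
     &= \sum_{n=-\infty}^{\infty} f\!\left(\frac{n\pi}{l}\right) \frac{1}{lt - n\pi}\, \frac{1}{2i} \left[ e^{i(lt - n\pi)} - e^{-i(lt - n\pi)} \right] ,
\end{aligned}
$$

so finally we arrive at the Whittaker–Shannon interpolation formula

$$ f(t) = \sum_{n=-\infty}^{\infty} f\!\left(\frac{n\pi}{l}\right) \frac{\sin(lt - n\pi)}{lt - n\pi} . \tag{10.51} $$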

This is the main content of Shannon’s sampling theorem which states that a func-
tion of bandwidth l can be completely recovered by (10.51) from its values at the
points nπ/l, where n = 0, ±1, ±2, . . .. This result forms the basis for the conversion
between analog and digital signals. If we convert an analog signal f of bandwidth l
into a digital signal by evaluating it at times t = 0, ±π/l, ±2π/l, we can convert it
back to an analog signal without loss of information, at least in principle—note that
an exact evaluation of (10.51) involves an infinite sum of values of f at arbitrarily
large positive and negative times. To develop suitable approximations, both from the
theoretical and the practical standpoint, is one of the subjects of the area of signal
analysis and signal processing.
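A truncated version of (10.51) is easy to experiment with. The sketch below is an added illustration, not a method from the text: it keeps only the samples with $|n| \le n_{\max}$ and tests the reconstruction on $f(t) = (\sin t / t)^2$, whose Fourier transform is a triangle supported on $[-2, 2]$, so that $l = 2$ (a standard fact assumed here). The function name `whittaker_shannon`, the truncation index `n_max`, and the evaluation point are illustrative choices.

```python
import math

def whittaker_shannon(samples, l, t):
    """Truncated version of (10.51).

    samples maps n to f(n*pi/l); indices missing from the dictionary are
    simply dropped from the sum, which introduces a truncation error.
    """
    total = 0.0
    for n, value in samples.items():
        u = l * t - n * math.pi
        kernel = 1.0 if u == 0.0 else math.sin(u) / u   # sin(u)/u with its limit at u = 0
        total += value * kernel
    return total

def f(t):
    """(sin t / t)^2, a band-limited signal with bandwidth 2."""
    return 1.0 if t == 0.0 else (math.sin(t) / t) ** 2

l = 2.0                                    # bandwidth of f
n_max = 500                                # truncation index (illustrative)
samples = {n: f(n * math.pi / l) for n in range(-n_max, n_max + 1)}

t = 0.3                                    # a point between two sampling times
approx = whittaker_shannon(samples, l, t)
error = abs(approx - f(t))
print(approx, f(t), error)
```

Because the samples of this particular signal decay like $1/n^2$, the truncated sum converges quickly; for slowly decaying signals many more terms would be needed, which reflects the practical difficulty mentioned above.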

10.5 Exercises

10.5.1 Discuss the relationship between linear independence and orthonormality.
Can you convert an orthogonal system into an orthonormal system?
10.5.2 (a) Show that $f(x) = e^x$ and $g(x) = \sin x$ are orthogonal on the interval
$\left[ \frac{\pi}{4}, \frac{5\pi}{4} \right]$.

(b) Show that {cos x, cos 3x, cos 5x, . . .} is an orthogonal set on [0, π/2].
10.5.3 Expand the following functions into Fourier series on the given interval:
(a) $f(x) = x + \pi$, $-\pi < x < \pi$.
(b) $f(x) = e^{-8x}$ for $-4 \le x \le 4$.
(c) $f(x) = \begin{cases} 0, & -\pi < x < 0 \\ x^2, & 0 \le x < \pi \end{cases}$.
10.5.4 Show that
(a) $\dfrac{\pi}{4} = 1 - \dfrac{1}{3} + \dfrac{1}{5} - \dfrac{1}{7} + \cdots$ (using 10.5.3 (a)),
(b) $\dfrac{\pi^2}{6} = 1 + \dfrac{1}{2^2} + \dfrac{1}{3^2} + \cdots$ (using 10.5.3 (c)).
6 2 3

10.5.5 Compare the graph of the function $f(x) = x^2$ with the 3rd, 6th, 10th, and 13th
partial sums of its Fourier series on the interval [−2, 2].
10.5.6 Write the complex Fourier series of the following functions:
(a) $f(x) = \cos x$, $0 \le x < 1$, and $f$ has period 1.
(b) $f$ has period 4, and $f(x) = \begin{cases} 0, & 0 \le x < 1 \\ 1, & 1 \le x < 4 \end{cases}$.
10.5.7 Expand the following functions in an appropriate cosine and sine Fourier
series:
(a) $f(x) = x^3$, $-\pi < x < \pi$.
(b) $f(x) = \begin{cases} x - 1, & -\pi < x < 0 , \\ x + 1, & 0 \le x < \pi . \end{cases}$
10.5.8 Let f (x) = 4 sin x, 0 < x < π , f (x + π ) = f (x). Sketch this function and
its Fourier series. Find the frequency spectrum of f .
10.5.9 Let $f$ be integrable on $[-l, l]$.
(a) Prove that

$$ \frac{1}{2} a_0^2 + \sum_{n=1}^{\infty} (a_n^2 + b_n^2) \le \frac{1}{l} \int_{-l}^{l} |f(x)|^2\,dx , $$

where $a_0$, $a_n$, and $b_n$ are the Fourier coefficients of $f$.
(b) Prove Parseval’s identity

$$ \frac{1}{2} a_0^2 + \sum_{n=1}^{\infty} (a_n^2 + b_n^2) = \frac{1}{l} \int_{-l}^{l} |f(x)|^2\,dx $$

if $\int_{-l}^{l} |f(x)|^2\,dx$ is finite and $a_0$, $a_n$, and $b_n$ are its Fourier coefficients.
(c) Show that an → 0 and bn → 0 as n → ∞.

10.5.10 Find the Fourier transform of the following functions:
(a) $f(t) = t e^{-|t|}$ for all real $t$.
(b) $f(t) = \begin{cases} \sin(\pi t), & -5 \le t \le 5 \\ 0, & |t| > 5 \end{cases}$
(c) $f(t) = \begin{cases} 1, & 0 \le t \le k \\ -1, & -k \le t < 0 \\ 0, & |t| > k \end{cases}$, where $k$ is a positive constant.
10.5.11 Prove Theorem 10.2 (b): Let $\xi_0$ be a real number, let $g(t) = e^{i\xi_0 t} f(t)$, then
$\hat g(\xi) = \hat f(\xi - \xi_0)$.
10.5.12 Show that
(a) for $g(t) = f(-t)$ we have $\hat g(\xi) = \hat f(-\xi)$;
(b) for $g(t) = \hat f(t)$ we have $\hat g(\xi) = 2\pi f(-\xi)$.
10.5.13 Prove that ( f ∗ g)(x) = (g ∗ f )(x) for all x.
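10.5.14 Let $f(x) = e^{-ax} \chi_{(0,\infty)}(x)$ and $g(x) = e^{-bx} \chi_{(0,\infty)}(x)$, where

$$ \chi_{(0,\infty)}(x) = \begin{cases} 1, & x \in (0, \infty) \\ 0, & x \notin (0, \infty) \end{cases} $$

Calculate $(f * g)(x)$.
10.5.15 Find $\hat f$ and $\hat g$ if

$$ f(t) = \chi_{[-1/2,1/2]}(t) = \begin{cases} 1, & -1/2 \le t \le 1/2 \\ 0, & t < -1/2 \text{ or } t > 1/2 \end{cases} , $$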

10.5.16 Let f and g be square integrable. Show that the convolution f ∗ g is a


continuous function on R.
10.5.17 Let x = x(n) and y = y(n) be N -periodic signals. Prove that (x ∗ y)(n) =
(y ∗ x)(n).
Chapter 11
Differential Equations

A differential equation is an equation relating a function with its derivatives. In
these equations, the functions often represent physical quantities, the derivatives
represent their rates of change, and the equation defines their relationship.
Differential equations have been, and still are, a major branch of pure and
applied mathematics since their invention in the mid-seventeenth century. They
began with the German mathematician Leibniz and the Swiss brother
mathematicians Jacob and Johann Bernoulli and some others from 1680 on, not long
after Newton’s fluxional equations in the 1670s. Applications were made to
geometry, mechanics, and optimization. Most of the eighteenth century was devoted
to the consolidation of the Leibnizian tradition, extending it to several independent
variables, which led to partial differential equations. New scholars known as experts
in mathematics, physics, astronomy, and philosophy, namely Euler, Daniel Bernoulli
(the son of Johann Bernoulli), Lagrange, and Laplace, appeared on the scene to solve
challenging problems related to theory and applications. Several applications were
made to mechanics, particularly to astronomy and continuum mechanics.
In the nineteenth century, the general theory was enriched by a better understand-
ing of the nature of solutions, e.g., through existence and uniqueness theorems. This
went hand in hand with the clarification of the foundations of analysis through the
notions of limit and continuity as we use them today. In the twentieth century, the
general theory was developed in many directions, influenced by functional analysis
and other mathematical disciplines.
Besides the persons mentioned above, among the mathematicians and scientists
who have contributed significantly in this area are Fourier, Legendre, Bessel,
d’Alembert, Cauchy, Riemann, Monge, Poisson, Dirichlet, Gauss, Navier, Stokes,
Maxwell, Helmholtz, Korteweg, de Vries, Poincaré, Dieudonné, Sobolev, Kruskal,
Lax, Lions, Stampacchia, Nirenberg, Brezis, Browder, Hörmander, Mosco, and
Zeidler. Among Indian mathematicians who have notably contributed to this field are
Narasimhan (recipient of the King Faisal award and FRS), Adimurti, Kesavan, Vaninathan,
and Gowda.
© Springer Nature Singapore Pte Ltd. 2019 427
M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_11

In this chapter, we present basic methods for computing solutions of differential
equations through explicit formulas, as well as many examples of how differential
equations are used for the modeling of real phenomena.

11.1 Introduction and Basic Notions

How do differential equations arise? As an example, let us study a simple mathematical
model for the development (or evolution) of a fish population in a large pond.
Let $P(t)$ be the size (in millions) of the population at time $t$ (in years). Its rate of
change $P'(t)$ at time $t$ equals the rate of change due to breeding minus the rate of
change due to harvesting. Let us assume that the former is equal to 20% of the current
population, and that the latter is equal to 10 million fish per year. The function $P$
then satisfies

$$ P'(t) = 0.2\,P(t) - 10 $$

for each time $t$. The underlying equation is commonly written as

$$ P' = 0.2\,P - 10 . \tag{11.1} $$

This is the differential equation that models how the fish population changes. The
unknown quantity in the equation is P as a function of the time t. This equation can
be used to predict the fish population in the future.
A rough estimate can be made as follows. Suppose that at the initial time $t = 0$, the
fish population equals 80 million. At this time, $P'(0) = 0.2 \cdot 80 - 10 = 16 - 10 = 6$,
a rate of change of six million fish per year. On this basis, we may estimate the
population size after 1 year as $80 + 6 = 86$ million fish. This number, however, will
be an underestimate since the population size, and consequently its rate of change,
will increase during the year.
An analysis of the differential equation, which will be done later, yields that the
family of functions (and no other function)

$$ P(t) = 50 + c\,e^{0.2t} , \tag{11.2} $$

$c$ being an arbitrary constant, solves (11.1). Actually, given this family, one may
check directly that for every value of $c$ we obtain a solution of (11.1), as

$$ P'(t) - (0.2\,P(t) - 10) = 0.2\,c\,e^{0.2t} - 0.2 \cdot (50 + c\,e^{0.2t}) + 10 = 0 . $$

The family (11.2) is called the general solution of (11.1). If, as above, we also
require the initial condition $P(0) = 80$, we have to set $c = 30$. The corresponding
solution $P(t) = 50 + 30e^{0.2t}$ is called a particular solution of (11.1). (It yields the
value $P(1) \approx 86.64$.) The combined pair (the differential equation and the initial
condition) is called an initial value problem.
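The rough estimate of 86 million is exactly one step of what is usually called the explicit Euler method: advance the population by its current rate of change times the step length. The sketch below is an added illustration (the function name `euler` and the step counts are assumptions); it repeats this idea with smaller steps and compares the result with the particular solution $P(t) = 50 + 30e^{0.2t}$.

```python
import math

def euler(rate, p0, t_end, steps):
    """Explicit Euler method for P' = rate(P), P(0) = p0, on [0, t_end]."""
    h = t_end / steps
    p = p0
    for _ in range(steps):
        p += h * rate(p)           # follow the current rate of change for time h
    return p

rate = lambda p: 0.2 * p - 10.0                     # right-hand side of (11.1)
exact = lambda t: 50.0 + 30.0 * math.exp(0.2 * t)   # particular solution with P(0) = 80

one_step = euler(rate, 80.0, 1.0, 1)       # the rough estimate from the text
fine = euler(rate, 80.0, 1.0, 1000)        # same idea with 1000 small steps
print(one_step, fine, exact(1.0))
```

With a single step the method returns 86, the underestimate discussed above; with many small steps the result approaches the exact value $P(1) \approx 86.64$ from below.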

Example 11.1 (a) Check that $P(t) = ce^{2t}$ is a solution of the differential equation
$P' = 2P$ for arbitrary values of the constant $c$.
(b) Find the particular solution which satisfies the initial condition $P(0) = 200$.
Solution: (a) Differentiating $P(t) = ce^{2t}$, we obtain $P'(t) = 2ce^{2t} = 2P(t)$, so
$P(t) = ce^{2t}$ satisfies $P' = 2P$.
(b) For $t = 0$ and $P(0) = 200$, we must have $200 = ce^0$, so $c = 200$. Hence
$P(t) = 200e^{2t}$ is a particular solution which solves the given initial value problem.
It is a natural question posed by many beginners in the field why one should
study differential equations. The answer is that many laws or models governing nat-
ural phenomena involve the rates at which things change. This leads to mathematical
equations in which, besides the unknown function itself, its derivatives appear, that is,
we have to solve differential equations. This occurs in problems from fluid mechan-
ics, population growth, circuit design, heat transfer, seismic waves, option trading.
The solutions to differential equations are functions. If they can be expressed symbol-
ically, they are given by mathematical formulas; if they are represented graphically,
they look like curves (for ordinary differential equations) or surfaces (for partial
differential equations).
Since differential equations model many real-world situations, the question of
whether a solution exists and is unique, and how it depends upon changes in the
equation or its parameters, respectively, can have great practical importance. If we
know how the velocity of a satellite is changing, can we know its position and
velocity for the future? If we know the initial population of a city, and we know
how the population is changing, can we predict the population in the future? Yes we
can; if we know the initial value of some quantity and we know how it is changing,
we should be able to find the future value of the quantity. In terms of differential
equations, an initial value problem (a differential equation with an initial condition)
representing a real-life situation which satisfies some natural conditions has a unique
solution.
Order and General Form of a Differential Equation
The order of a differential equation is the order of the highest derivative in the
equation. For example,

$$ y' + 20y = e^x , \qquad y'' + 10y = \sin x , $$

are differential equations of order 1 and 2, respectively. In symbols we can express
the general form of an $n$th-order ordinary differential equation in one dependent
variable as

$$ F(x, y, y', y'', \dots, y^{(n)}) = 0 . \tag{11.3} $$

Here, $F$ is a real-valued function of $n + 2$ variables, and $y', \dots, y^{(n)}$ stand for the
derivatives of the unknown function $y$; we will also use the alternative (traditional)
notation

$$ \frac{dy}{dx} , \ \frac{d^2 y}{dx^2} , \ \dots , \ \frac{d^n y}{dx^n} . $$

A solution of (11.3) is a real-valued function $y$ defined on some open interval $J$
which satisfies

$$ F(x, y(x), y'(x), \dots, y^{(n)}(x)) = 0 $$

for all $x \in J$.
A differential equation of the form

$$ a_n(x)\,y^{(n)} + \cdots + a_1(x)\,y' + a_0(x)\,y = g(x) \tag{11.4} $$

is called linear. Here, $a_0, a_1, a_2, \dots, a_n$ and $g$ are functions of $x$ on some open interval
$J$. If $a_n$ is not the zero function, the equation is of order $n$. If $g = 0$ (the zero function),
it is called homogeneous, otherwise it is called inhomogeneous or nonhomogeneous.
When analyzing linear equations, we will mainly be concerned with equations

$$ a_1(x)\frac{dy}{dx} + a_0(x)\,y = g(x) , \qquad a_2(x)\frac{d^2 y}{dx^2} + a_1(x)\frac{dy}{dx} + a_0(x)\,y = g(x) $$

of first and second order, respectively.
Initial and Boundary Value Problems
As we have seen above, the first-order differential equation $y' = y$ has the functions
$y(x) = ce^x$ as solutions, where $c$ is an arbitrary constant. In order to fix a particular
solution, one has to specify an additional condition. The second-order equation

$$ y'' + y = 0 $$

is solved by the functions

$$ y(x) = c_1 \cos x + c_2 \sin x , $$

where $c_1$ and $c_2$ are constants; indeed, differentiating these functions yields

$$ y'(x) = -c_1 \sin x + c_2 \cos x , \qquad y''(x) = -c_1 \cos x - c_2 \sin x . $$

In this case, two additional conditions are needed to single out a particular solution.
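As a small worked illustration of how two conditions fix the two constants, suppose one prescribes $y(0) = 1$ and $y'(0) = 2$ for the equation $y'' + y = 0$ (these particular values are chosen here only as an example and do not come from the text).

```latex
\begin{align*}
y(0)  &= c_1 \cos 0 + c_2 \sin 0 = c_1 = 1 , \\
y'(0) &= -c_1 \sin 0 + c_2 \cos 0 = c_2 = 2 , \\
y(x)  &= \cos x + 2 \sin x .
\end{align*}
```

Each condition determines exactly one of the constants, so no freedom remains once both are imposed.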
This is not coincidental. It turns out that, typically, an nth-order ordinary dif-
ferential equation requires us to specify n additional conditions to obtain a unique
solution. The most common ones are initial conditions and boundary conditions.
Definition 11.1 (Initial Value Problem) Here, the additional conditions relate to a
single x-value. They are called initial conditions or initial values, and the differential
equation together with the initial conditions is called an initial value problem. Its
order is defined to be the same as the order of the differential equation.
Example 11.2 (i) $y' + y = 3$ with $y(0) = 2$ is an initial value problem of first order;
$y(0) = 2$ is the initial condition.

(ii) $y'' + 2y = 0$ with $y(1) = 2$ and $y'(1) = -3$ is an initial value problem of second
order. The initial conditions are $y(1) = 2$ and $y'(1) = -3$. In this case, the values of
the function $y$ and its derivative are specified at $x = 1$.

Definition 11.2 (Boundary Value Problem) Here, the additional conditions relate to
two or more x values. They are called boundary conditions or boundary values, and
the differential equation together with the boundary conditions is called a boundary
value problem. Its order is defined to be the same as the order of the differential
equation.

Example 11.3 (i) $y'' - y' + y = x^3$ with $y(0) = 4$ and $y'(1) = -2$ is a boundary
value problem of second order. The boundary conditions are specified at two points,
namely $x = 0$ and $x = 1$.
(ii) $y'' - y' + y = x^3$ with $y(0) = 4$ and $y(1) = -2$ is also a boundary value problem
of second order. In this case, both boundary conditions refer to values of the function
$y$ itself, and not to values of the derivative $y'$.
(iii) $y' - y = 1$ with $y(0) = y(1)$ is a boundary value problem of first order. Such a
boundary condition is called a periodic boundary condition.

The following questions are relevant, from the theoretical standpoint as well as when
one uses boundary and initial value problems as a tool to compute solutions to real-
world problems.
Problem 1. When does a solution exist? That is, does an initial value problem or a
boundary value problem necessarily have a solution?
Problem 2. Is the solution unique? That is, is there only one solution of a given initial
value or boundary value problem?
The following theorem states that under the specified conditions, a first-order initial
value problem has a unique solution.
Theorem 11.1 Let $f$ and $\partial f/\partial y$ be functions which are defined and continuous¹ in
some rectangle $R$ of the $xy$-plane, and let $(x_0, y_0)$ be a point in the interior of that
rectangle. Then on some interval centered at $x_0$ there exists a unique solution of the
initial value problem

$$ y' = f(x, y) , \qquad y(x_0) = y_0 . $$

Example 11.4 (i) $y(x) = 4e^x$ is the unique solution of the initial value problem

$$ y' = y , \qquad y(0) = 4 $$

on the real line $(-\infty, \infty)$. It is the particular solution of the differential equation
$y' = y$ which passes through the point $(0, 4)$.
Verification: We have $y'(x) = 4e^x = y(x)$ and $y(0) = 4e^0 = 4$. Thus, the given
function solves the initial value problem. The right-hand side $f(x, y) = y$ satisfies

1 See Definition 8.5.



the assumptions of the theorem, so we conclude from the theorem that no other
solution exists.²
(ii) Find a solution of the initial value problem $y' = y$, $y(1) = 3$. That is, find a
solution of the differential equation $y' = y$ which passes through the point $(1, 3)$. Is
it unique?
Solution: As in part (i) one checks that $y(x) = ce^x$ solves the given equation $y' = y$.
Imposing the given initial condition we get $3 = y(1) = ce^1$, so $c = 3e^{-1}$. Therefore,
$y(x) = 3e^{-1} e^x = 3e^{x-1}$ is a solution of the initial value problem. As in part (i), the
theorem is applicable, so the solution is unique.

A corresponding theorem holds for the initial value problem of $n$th order

$$ y^{(n)} = f(x, y, \dots, y^{(n-1)}) \tag{11.5} $$

$$ y(x_0) = y_0 , \quad y'(x_0) = y_0^{(1)} , \quad \dots , \quad y^{(n-1)}(x_0) = y_0^{(n-1)} \tag{11.6} $$

where the real-valued function $f$ depends on $n + 1$ variables and $y_0, y_0^{(1)}, \dots, y_0^{(n-1)}$
are given numbers. In this case, one assumes that besides $f$, all partial derivatives
$\partial f/\partial y$, $\partial f/\partial y'$, … are defined and continuous in some rectangular box in $\mathbb{R}^{n+1}$
containing $(x_0, y_0, \dots, y_0^{(n-1)})$ in its interior.

Remark 11.1 Theorem 11.1 and its generalization to nth-order equations are special
cases of the famous Picard-Lindelöf theorem.

The superposition principle. We consider the homogeneous linear equation of order $n$,

$$a_n(x)y^{(n)} + \cdots + a_1(x)y' + a_0(x)y = 0.$$

If the functions y1 and y2 are solutions, then also the function

y(x) = c1 y1 (x) + c2 y2 (x)

is a solution, for arbitrary values of the constants c1 and c2 . (One can verify this
directly by inserting y into the differential equation.) This property of linear equations
is called the superposition principle.
General and particular solution. It can be shown that the homogeneous linear
equation of order n (note that an is set to 1)

$$y^{(n)} + \cdots + a_1(x)y' + a_0(x)y = 0 \qquad (11.7)$$

2 Assume that there exists a second solution $z$ with $z(x) \neq y(x)$ for some $x > x_0$. Let $x_M \geq x_0$ be
the largest number such that $z(\xi) = y(\xi)$ for all $\xi \in [x_0, x_M]$. Then the functions $y$ and $z$ yield two
different solutions of $y' = f(x, y)$ through the point $(x_M, y(x_M))$ in some interval centered at $x_M$,
contradicting the theorem. Therefore, such a second solution cannot exist. In the same manner, one
shows that the solution is unique for values $x < x_0$.

has $n$ linearly independent³ solutions $y_1, \dots, y_n$. Such a set $\{y_1, \dots, y_n\}$ of solutions
is called a fundamental set for (11.7). By the superposition principle,

$$y(x) = c_1 y_1(x) + \cdots + c_n y_n(x)$$

is a solution, for arbitrary constants $c_1, \dots, c_n$. It is called the general solution of
(11.7). Any solution obtained by fixing the constants is called a particular solution
of (11.7). For example, the constants can be determined from the initial conditions
(11.6).
Historical note. Some people say that the study of differential equations began
in 1675 when the German mathematician cum philosopher Gottfried Wilhelm von
Leibniz (1646–1716) wrote the equation

$$\int x \, dx = \frac{1}{2}x^2.$$

The search for general methods of finding solutions of differential equations started
when the English physicist and mathematician Isaac Newton (1643–1727) classified
first-order differential equations into three classes, namely

$$(1)\ \frac{dy}{dx} = f(x), \qquad (2)\ \frac{dy}{dx} = f(x, y), \qquad (3)\ x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = u.$$

The first two cases contain only ordinary derivatives of one or more dependent
variables, with respect to a single independent variable; they are ordinary differential
equations. The third case involves partial derivatives of one variable which depends
on several independent variables; it represents a partial differential equation.
The simplest method of solving differential equations, namely the separation of
variables method, was developed by the joint effort of Leibniz and Jacob Bernoulli
(1655–1705) around 1691. Johann Bernoulli (1667–1748), the younger brother of
Jacob, also contributed to the development of the separation of variables method.
Johann Bernoulli also introduced a linear homogeneous differential equation around
1692.
A differential equation of the type

$$\frac{dy}{dx} + P(x)\,y = f(x)\,y^n$$

was introduced by Johann Bernoulli in December 1695 and solved by Leibniz in
1696. Nowadays this equation is known as the Bernoulli equation.
Jacopo Riccati (1676–1754) introduced the equation, now bearing his name,

$$y' = P(x) + Q(x)y + R(x)y^2.$$

3 See Definition 10.3.



Another Bernoulli, Daniel, had already solved special cases of this equation. The
Swiss mathematician cum physicist Leonhard Euler (1707–1783) made significant
progress when he posed and solved the problem of reducing a particular class of
second-order differential equations to first-order differential equations. He
interacted with Johann Bernoulli in 1739 on the study of homogeneous linear
differential equations with constant coefficients. He also studied non-homogeneous linear
equations and applied the method of successive order reduction to solve equations
of higher order.
Alexis-Claude Clairaut (1713–1765) studied differential equations of the form

$$y = xy' + f(y')$$

in 1734, nowadays called Clairaut equations. Besides a one-parameter family of
solutions as usual, this equation also possesses other solutions which are termed
singular; thus Clairaut was one of the first to study initial value problems with nonunique
solutions.
The Italian mathematician and astronomer Joseph-Louis Lagrange (1736–1813)
worked on the problem of finding an integrating factor for the general linear equation.
He also invented the method of variation of parameters.
The French mathematician and philosopher Jean-Baptiste le Rond d’Alembert
(1717–1783) further developed the theory of ordinary linear differential equations.
He provided fundamental contributions to mathematical continuum mechanics; in
this context, he introduced partial differential equations into the modeling of vibrating
strings.
One might say that all known methods of computing explicit formulas for solutions
of first-order differential equations had been obtained during the time period referred
to above.

11.2 Separation of Variables

Definition 11.3 A first-order differential equation of the form

$$\frac{dy}{dx} = g(x)h(y), \qquad (11.8)$$

where $g$ is a function of $x$ only and $h$ is a function of $y$ only, is called separable or said
to have separable variables.

The simplest case arises when $h = 1$, that is, the right hand side of (11.8) does not
depend on $y$,

$$\frac{dy}{dx} = g(x).$$

This means that we want to find a function $y$ whose derivative equals the function $g$.
So, we obtain $y$ as

$$y = G + c, \quad \text{that is,} \quad y(x) = G(x) + c, \qquad (11.9)$$

where $G$ is an antiderivative⁴ of $g$, and $c$ is a constant.

Example 11.5 Solve the following differential equation:

$$\frac{dy}{dx} = \cos 7x.$$

Solution: Here, $g(x) = \cos 7x$. An antiderivative of $g$ is $G(x) = (1/7)\sin 7x$, so the
solution is

$$y(x) = \frac{\sin 7x}{7} + c.$$

Remark 11.2 In Chap. 6, we have discussed extensively how to find antiderivatives
of a given function.

We now turn to the general case

$$\frac{dy}{dx} = g(x)h(y).$$

The separation of variables method works as follows. In the first step, one separates
the “x” and the “y” part⁵

$$\frac{1}{h(y)}\,dy = g(x)\,dx. \qquad (11.10)$$

In the second step, one integrates

$$\int \frac{1}{h(y)}\,dy = \int g(x)\,dx + c;$$

this means, one finds an antiderivative $H$ of the function $p$ defined by $p(y) = 1/h(y)$,
an antiderivative $G$ of the function $g$, and writes

$$H(y) = G(x) + c.$$

From this equation, in the third step one determines $y$ as a function of $x$. In the fourth
step, one checks whether $y$ is a solution (by inserting it and its derivative into the
differential equation), and determines its domain.

4 See Definition 6.1.
5 This is a purely formal computation; while the expression “dy/dx” has a clear mathematical
meaning (it stands for the derivative of the function $y$ with respect to $x$), it is not helpful in the
present context to try to attach a meaning to the expressions “dx” and “dy”.

Example 11.6 Solve the differential equation

$$y' = \frac{y}{x}. \qquad (11.11)$$

Solution: Here $g(x) = 1/x$, $h(y) = y$ and $p(y) = 1/y$. Thus,

$$H(y) = \ln y, \qquad G(x) = \ln x.$$

Hence, the second step of the method yields (we denote the integration constant by
$b$)

$$\ln y = \ln x + b. \qquad (11.12)$$

Taking the exponential on both sides, we obtain

$$y = e^{\ln y} = e^{\ln x + b} = e^{\ln x}e^{b} = xe^{b}.$$

Thus, the solution has the form

$$y(x) = cx,$$

where $c$ is a constant. We have $y'(x) = c = cx/x$, so this function $y$ indeed solves
(11.11). Its domain is the whole real line $(-\infty, \infty)$, and $c$ can be an arbitrary real
number. Note that, in our computation, Eq. (11.12) tacitly assumes that x and y (and
consequently c) are positive, but this no longer matters once we have arrived at the
solution.

Example 11.7 Solve the initial value problem

$$\frac{dy}{dx} = -\frac{x}{y}, \qquad y(4) = 3.$$

Solution: We have

$$g(x) = x, \qquad h(y) = -\frac{1}{y}, \qquad p(y) = -y,$$

so

$$H(y) = -\frac{1}{2}y^2, \qquad G(x) = \frac{1}{2}x^2.$$

Thus, the second step of the separation of variables method yields

$$-\frac{1}{2}y^2 = \frac{1}{2}x^2 + c$$

or equivalently

$$x^2 + y^2 = -2c.$$

From the initial condition $y(4) = 3$ we get $-2c = 4^2 + 3^2 = 25$, so we arrive at the
implicit equation

$$x^2 + y^2 = 25.$$

This equation has two solutions $y(x) = \pm\sqrt{25 - x^2}$, but only the positive one
satisfies the initial condition. Its domain is the interval $(-5, 5)$; for $x = \pm 5$ the derivative
of $y$ becomes infinite.
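As a cross-check of Example 11.7, one can integrate the initial value problem numerically and compare with the closed form. The sketch below uses the classical fourth-order Runge-Kutta scheme, which is not part of the text; the function name `rk4`, the step count and the end point $x = 0$ are illustrative assumptions.

```python
import math

def rk4(f, x0, y0, x_end, steps):
    # Classical fourth-order Runge-Kutta integration of y' = f(x, y)
    # from x0 to x_end (x_end may lie to the left of x0).
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def rhs(x, y):
    # Right hand side of Example 11.7: dy/dx = -x / y
    return -x / y

y_numeric = rk4(rhs, 4.0, 3.0, 0.0, 1000)   # start at (4, 3), integrate to x = 0
y_exact = math.sqrt(25.0 - 0.0 ** 2)        # positive branch of x^2 + y^2 = 25

print(y_numeric, y_exact)
```

Both values should agree closely, which supports the choice of the positive branch.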

11.3 First-Order Linear Equations

In this section, we discuss the linear equation of first order

$$a_1(x)\frac{dy}{dx} + a_0(x)y = g(x).$$

If $a_1(x) \neq 0$ for all $x$ considered, we can write this differential equation in the form

$$\frac{dy}{dx} + b(x)y = f(x), \qquad (11.13)$$

where $b(x) = a_0(x)/a_1(x)$ and $f(x) = g(x)/a_1(x)$. Equation (11.13) is called the
standard form of a linear differential equation of the first order.
We first consider the special case $f = 0$. Let $B$ be an antiderivative of $b$, so $B' = b$.
Then

$$y(x) = ce^{-B(x)}, \qquad (11.14)$$

$c$ being a constant, is a solution of (11.13), because

$$y'(x) + b(x)y(x) = ce^{-B(x)} \cdot (-B'(x)) + b(x)y(x) = y(x) \cdot (-b(x)) + b(x)y(x) = 0.$$

(One can find the solution (11.14) with the separation of variables method.)

Example 11.8 Find the solution of the differential equation $y' = 9y$.

Solution: We have $b(x) = -9$, $B(x) = -9x$, so by (11.14) the solution becomes $y(x) = ce^{9x}$
with an arbitrary constant $c$. It is defined on $(-\infty, \infty)$.
We now consider (11.13) for an arbitrary function $f$. Again, let $B$ be an antiderivative
of $b$. The function $I$ defined by

$$I(x) = e^{B(x)} \qquad (11.15)$$

is called the integrating factor. It satisfies

$$I'(x) = e^{B(x)}B'(x) = b(x)I(x).$$

Let $Q$ be an antiderivative of the function $f \cdot I$, so $Q'(x) = f(x)I(x)$. Then, for a solution
$y$ of (11.13), that is, $y' = -by + f$, the product rule gives (we omit the argument $x$)

$$\frac{d}{dx}(y \cdot I - Q) = y' \cdot I + I' \cdot y - Q' = (-by + f)I + bI \cdot y - f \cdot I = 0.$$

Therefore,

$$y(x)I(x) = Q(x) + c, \qquad (11.16)$$

$c$ being a constant, yields a solution $y$ of (11.13).


We thus obtain the following solution procedure for the linear equation of
first order.
Step 1: Put the equation in the standard form (11.13) if it is not given in this form.
Step 2: Identify $b(x)$, find an antiderivative $B$ of $b$ and compute the integrating factor
$I(x) = e^{B(x)}$.
Step 3: Find an antiderivative $Q$ of $f \cdot I$.
Step 4: The solution $y$ satisfies the equation

$$y(x) \cdot I(x) = Q(x) + c. \qquad (11.17)$$

Example 11.9 Find the solution of the following differential equations:

$$\text{(a)}\ x\frac{dy}{dx} + 2y = 3, \qquad \text{(b)}\ x\frac{dy}{dx} + (3x + 1)y = e^{-3x}.$$

Solution: (a) Dividing by $x$ gives the standard form, with $b(x) = 2/x$ and $f(x) = 3/x$. We compute

$$B(x) = 2\ln x, \quad I(x) = e^{B(x)} = x^2, \quad f(x)I(x) = \frac{3}{x} \cdot x^2 = 3x, \quad Q(x) = \frac{3}{2}x^2.$$

The solution $y$ therefore satisfies

$$y(x) \cdot x^2 = \frac{3}{2}x^2 + c.$$

Division by $x^2$ yields

$$y(x) = \frac{3}{2} + \frac{c}{x^2}.$$

It is valid for $x \in \mathbb{R}$, $x \neq 0$.
(b) The standard form becomes

$$\frac{dy}{dx} + \left(3 + \frac{1}{x}\right)y = \frac{e^{-3x}}{x},$$

so

$$b(x) = 3 + \frac{1}{x}, \qquad f(x) = \frac{e^{-3x}}{x}.$$

We have

$$B(x) = 3x + \ln x, \quad I(x) = e^{B(x)} = xe^{3x}, \quad f(x)I(x) = 1, \quad Q(x) = x.$$

The solution $y$ therefore satisfies

$$y(x) \cdot xe^{3x} = x + c,$$

so

$$y(x) = e^{-3x}\left(1 + \frac{c}{x}\right).$$

It is valid for $x \in \mathbb{R}$, $x \neq 0$.
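The four-step procedure of this section can be mimicked numerically. The following Python sketch is an illustration under our own assumptions: the antiderivatives $B$ and $Q$ are replaced by definite integrals starting at a point `x0` (computed with a composite Simpson rule), so that $I(x_0) = 1$, $Q(x_0) = 0$ and the constant in (11.17) becomes $c = y(x_0)$. The names `simpson` and `solve_linear` are hypothetical helpers, not notation from the text.

```python
import math

def simpson(func, a, b, n=200):
    # Composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def solve_linear(b, f, x0, y0, x):
    # Steps 2 to 4 for y' + b(x) y = f(x) with y(x0) = y0
    B = lambda s: simpson(b, x0, s)              # antiderivative of b with B(x0) = 0
    I = lambda s: math.exp(B(s))                 # integrating factor, I(x0) = 1
    Q = simpson(lambda s: f(s) * I(s), x0, x)    # antiderivative of f*I with Q(x0) = 0
    return (Q + y0) / I(x)                       # from y*I = Q + c with c = y0

# Example 11.9 (a) with the sample initial condition y(1) = 2, i.e. c = 1/2
ya = solve_linear(lambda x: 2 / x, lambda x: 3 / x, 1.0, 2.0, 2.0)
# Example 11.9 (b) with the sample initial condition y(1) = e^(-3), i.e. c = 0
yb = solve_linear(lambda x: 3 + 1 / x, lambda x: math.exp(-3 * x) / x,
                  1.0, math.exp(-3.0), 2.0)

print(ya, 1.5 + 0.5 / 2.0 ** 2)
print(yb, math.exp(-6.0))
```

The printed pairs compare the numerical values at $x = 2$ with the closed-form solutions of Example 11.9 for the chosen constants.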

11.4 Solution by Substitution

Some differential equations of first order can be transformed into a separable
differential equation or into a linear differential equation of standard form (Eq. (11.13))
by an appropriate substitution. We discuss here two classes of such differential equations:
homogeneous equations and Bernoulli equations.

11.4.1 Homogeneous Equations

A function $f$ of two variables is called homogeneous of degree $m$, where $m$ is an
integer, if $f(tx, ty) = t^m f(x, y)$ for all real numbers $t$.
A first-order differential equation

$$M(x, y)\,dx + N(x, y)\,dy = 0 \qquad (11.18)$$

is called homogeneous if both coefficient functions $M$ and $N$ are homogeneous of
the same degree.
The differential equation (11.18) is written in a form different from what we used
to write so far, namely (11.3). The form (11.18) also has a long tradition; it is natural
if one thinks of the solution as a curve (x(s), y(s)) in the plane, parametrised by
some variable s. When we look for solutions for which y is a function of x, as we

do in the present chapter, then (11.18) is defined to be the same as the first-order
equation

$$M(x, y) + N(x, y)y' = 0.$$

To solve a differential equation by substitution means that we replace the dependent
variable $y$ by another dependent variable $u$. This transforms the differential equation
for $y$ into one for $u$, which is hopefully easier to solve. For homogeneous differential
equations, the substitution

$$y = ux, \quad \text{that is,} \quad u(x) = \frac{y(x)}{x},$$

works. (Alternatively, one may also use the substitution $x = vy$ for a new dependent
variable $v$.) We then obtain a separable equation of first order for $u$ (or $v$, respectively).
We illustrate this solution method with two examples. The computations are facil-
itated by using a symbolic calculus; one can (and should) always check at the end
whether the function obtained actually solves the differential equation.
Example 11.10 Solve the following homogeneous equations:

(a) $(x - y)\,dx + x\,dy = 0$

(b) $(y^2 + yx)\,dx + x^2\,dy = 0$.

Solution: (a) We use $y = ux$ to replace $y$ by $u$. We formally compute $dy = u\,dx + x\,du$,
so the given equation becomes

$$(x - ux)\,dx + x(u\,dx + x\,du) = 0.$$

This is further processed as

$$x\,dx + x^2\,du = 0, \qquad \frac{dx}{x} + du = 0.$$

We have obtained a separable differential equation, already written in the separated
form as used in (11.10). Solving this yields

$$\ln|x| + u = c, \qquad x\ln|x| + y = cx,$$

so we arrive at the solution

$$y(x) = cx - x\ln|x|.$$

(b) Again, we use $y = ux$ and $dy = u\,dx + x\,du$. The given equation takes the form

$$(u^2x^2 + ux^2)\,dx + x^2(u\,dx + x\,du) = 0.$$

Appropriate divisions give

$$(u^2 + 2u)\,dx + x\,du = 0, \qquad \frac{dx}{x} + \frac{du}{u(u + 2)} = 0.$$

Passing to antiderivatives in this separable equation yields

$$\ln|x| + \frac{1}{2}\ln|u| - \frac{1}{2}\ln|u + 2| = c.$$

Multiplying by 2 and taking the exponential gives

$$x^2\left|\frac{u}{u + 2}\right| = e^{2c}.$$

Replacing $u$ by $y/x$ and $e^{2c}$ by $b$, we arrive at

$$x^2|y| = b|y + 2x|. \qquad (11.19)$$

This is an implicit equation for the solution of the given equation. When $x$ and $y$
have the same sign, the absolute values can be omitted. One may then check by
differentiation that functions $y$ which satisfy (11.19) also satisfy the given equation.
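Before applying the substitution $y = ux$, it is worth confirming that the coefficient functions really are homogeneous of the same degree. The Python sketch below spot-checks the defining identity $f(tx, ty) = t^m f(x, y)$ for the coefficients of Example 11.10. The helper name `is_homogeneous`, the sample points and the scale factors are arbitrary illustrative choices, and a test on finitely many points can refute homogeneity but cannot prove it.

```python
def is_homogeneous(func, degree, samples, scales, tol=1e-9):
    # Spot-check f(t*x, t*y) == t**degree * f(x, y) on the given samples
    for (x, y) in samples:
        for t in scales:
            expected = t ** degree * func(x, y)
            if abs(func(t * x, t * y) - expected) > tol * max(1.0, abs(expected)):
                return False
    return True

# Coefficient functions of Example 11.10
M_a = lambda x, y: x - y            # part (a), expected degree 1
N_a = lambda x, y: x
M_b = lambda x, y: y ** 2 + y * x   # part (b), expected degree 2
N_b = lambda x, y: x ** 2

samples = [(1.0, 2.0), (-0.5, 3.0), (2.0, -1.5)]
scales = [-2.0, 0.5, 3.0]

print(is_homogeneous(M_a, 1, samples, scales), is_homogeneous(N_a, 1, samples, scales))
print(is_homogeneous(M_b, 2, samples, scales), is_homogeneous(N_b, 2, samples, scales))
print(is_homogeneous(M_b, 1, samples, scales))  # wrong degree is rejected
```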

11.4.2 Bernoulli Equations

A differential equation of the form

$$\frac{dy}{dx} + b(x)y = f(x)y^n \qquad (11.20)$$

is called a Bernoulli equation. If $n \neq 0$ and $n \neq 1$, it can be reduced to a linear
equation of first order by the substitution $v = y^{1-n}$. This linear equation can be
solved by the method described above in Sect. 11.3.

Example 11.11 Solve the following differential equations:

$$\text{(a)}\ \frac{dy}{dx} + \frac{1}{x}y = 3y^3, \qquad \text{(b)}\ \frac{dy}{dx} - y = e^{x}y^2.$$

Solution: (a) We have $n = 3$. Let $v = y^{1-n} = y^{-2}$. The chain rule yields

$$\frac{dv}{dx} = -2y^{-3} \cdot \frac{dy}{dx}, \qquad \frac{dy}{dx} = -\frac{1}{2}y^3 \cdot \frac{dv}{dx}.$$

Substituting these values into the given differential equation, we get

$$-\frac{1}{2}y^3 \cdot \frac{dv}{dx} + \frac{1}{x}y = 3y^3.$$

Division by $-(1/2)y^3$ gives

$$\frac{dv}{dx} - \frac{2}{x}v = -6. \qquad (11.21)$$

This equation is of the standard form (11.13), and therefore the solution procedure
of Sect. 11.3 is applicable. We have $b(x) = -2/x$ and $f(x) = -6$, and obtain

$$B(x) = -2\ln x, \quad I(x) = e^{B(x)} = x^{-2}, \quad f(x) \cdot I(x) = -6x^{-2}, \quad Q(x) = 6x^{-1}.$$

According to Sect. 11.3, the solution $v$ of (11.21) satisfies

$$v(x) \cdot x^{-2} = 6x^{-1} + c, \qquad v(x) = 6x + cx^2.$$

Since $v = y^{-2}$, we get

$$y(x) = \pm\frac{1}{\sqrt{6x + cx^2}}.$$

(b) We have $n = 2$. Let $w = y^{-1}$, then the equation

$$\frac{dy}{dx} - y = e^{x}y^2$$

becomes

$$\frac{dw}{dx} + w = -e^{x}.$$

We have $b(x) = 1$, $f(x) = -e^{x}$, and obtain

$$B(x) = x, \quad I(x) = e^{x}, \quad f(x) \cdot I(x) = -e^{2x}, \quad Q(x) = -\frac{1}{2}e^{2x}.$$

According to Sect. 11.3, the solution $w$ satisfies

$$w(x) \cdot e^{x} = -\frac{1}{2}e^{2x} + c.$$

Since $w = y^{-1}$, we get

$$y(x) = \frac{1}{-\frac{1}{2}e^{x} + ce^{-x}}.$$
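The two solutions of Example 11.11 can be checked by inserting them into the original Bernoulli equations. The Python sketch below does this numerically with a central difference; the constants `c_a = 1` and `c_b = 2`, the sample points and the helper name `derivative` are illustrative assumptions chosen so that the formulas are defined at all sample points.

```python
import math

def derivative(func, x, h=1e-6):
    # Central difference approximation of func'(x) (illustrative helper)
    return (func(x + h) - func(x - h)) / (2.0 * h)

c_a = 1.0   # sample constant for part (a)
def y_a(x):
    # Positive branch of y = ±1 / sqrt(6x + c x^2)
    return 1.0 / math.sqrt(6.0 * x + c_a * x * x)

c_b = 2.0   # sample constant for part (b)
def y_b(x):
    # y = 1 / (-(1/2) e^x + c e^(-x))
    return 1.0 / (-0.5 * math.exp(x) + c_b * math.exp(-x))

# Residual of (a): y' + y/x - 3 y^3
res_a = max(abs(derivative(y_a, x) + y_a(x) / x - 3.0 * y_a(x) ** 3)
            for x in (0.5, 1.0, 2.0))
# Residual of (b): y' - y - e^x y^2
res_b = max(abs(derivative(y_b, x) - y_b(x) - math.exp(x) * y_b(x) ** 2)
            for x in (-1.0, 0.0, 0.3))

print(res_a, res_b)
```

Both residuals should be at the level of the finite-difference error, that is, very close to zero.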

11.4.3 Reduction of Order

Let

$$a_2(x)y'' + a_1(x)y' + a_0(x)y = 0 \qquad (11.22)$$

be a linear second order homogeneous differential equation. Assume that we already
know one solution $y_1$. We want to find a second solution $y_2$ of the form

$$y_2(x) = u(x)y_1(x). \qquad (11.23)$$

It turns out that the derivative $w = u'$ of $u$ satisfies a linear first-order equation.
The latter can be solved by the procedure of Sect. 11.3. Then, we obtain $u$ as an
antiderivative of $w$ and $y_2$ by inserting $u$ into (11.23).

Remark 11.3 This procedure can be generalized to higher order linear differential
equations.

Example 11.12 For the second order equation $y'' - y = 0$, we know that $y_1(x) = e^{x}$
is a solution on the interval $(-\infty, \infty)$. Use reduction of order to find a second solution
$y_2$.

Solution: Let $y_2(x) = u(x)y_1(x)$. Differentiating this product function we get

$$y_2'(x) = u(x)y_1'(x) + u'(x)y_1(x),$$

$$y_2''(x) = u(x)y_1''(x) + 2u'(x)y_1'(x) + u''(x)y_1(x).$$

In order that $y_2$ becomes a solution of $y'' - y = 0$, we must have (we omit writing
the argument “x”)

$$0 = y_2'' - y_2 = uy_1'' + 2u'y_1' + u''y_1 - uy_1 = u(y_1'' - y_1) + 2u'y_1' + u''y_1.$$

Since $y_1$ solves the given equation, we have $y_1'' - y_1 = 0$ and therefore

$$0 = 2u'y_1' + u''y_1 = (2u' + u'')e^{x},$$

that is, since $e^{x} \neq 0$,

$$u'' + 2u' = 0.$$

By substituting $u' = w$ in this equation we obtain

$$w' + 2w = 0.$$

This is a linear first-order differential equation. Here, it is not necessary to go back
to the procedure of Sect. 11.3 since we already know that

$$w(x) = ce^{-2x}$$

is the general solution. The function $u(x) = e^{-2x}$ solves $u' = w$ for $c = -2$. Thus,
we have arrived at a second solution

$$y_2(x) = u(x)y_1(x) = e^{-2x}e^{x} = e^{-x}$$

of the equation $y'' - y = 0$. Since $y_2$ is not a constant multiple of $y_1$ and both $y_1, y_2$
are nonzero, these solutions are linearly independent.

11.4.4 Homogeneous Linear Equations with Constant Coefficients

We consider in this subsection equations of the type

$$a_n y^{(n)} + \cdots + a_1 y' + a_0 y = 0, \qquad (11.24)$$

where the coefficients $a_n, \dots, a_1, a_0$ are real constants and $a_n \neq 0$. It turns out that
all solutions of (11.24) are exponential functions or are constructed from exponential
functions.
We try to find a solution of the form $y(x) = e^{\lambda x}$. After substituting $y' = \lambda e^{\lambda x}$,
$y'' = \lambda^2 e^{\lambda x}$ and so on, Eq. (11.24) gives us

$$a_n\lambda^n e^{\lambda x} + a_{n-1}\lambda^{n-1}e^{\lambda x} + \cdots + a_1\lambda e^{\lambda x} + a_0 e^{\lambda x} = 0.$$

Since $e^{\lambda x} \neq 0$ for all $x$, we can divide by $e^{\lambda x}$ and obtain

$$a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = 0. \qquad (11.25)$$

Equation (11.25) is called the auxiliary equation. Its left side is a polynomial of
degree $n$. If $\lambda$ is a root of this polynomial, the corresponding exponential function
$y(x) = e^{\lambda x}$ is a solution of (11.24).
Let us consider the special case $n = 2$ of (11.24), written in the form

$$ay'' + by' + cy = 0 \qquad (11.26)$$

with constants $a$, $b$, $c$. The auxiliary equation becomes

$$a\lambda^2 + b\lambda + c = 0. \qquad (11.27)$$

The roots of (11.27) are

$$\lambda_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad \lambda_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$

We know that (i) $\lambda_1$ and $\lambda_2$ are real and distinct if $b^2 - 4ac > 0$, (ii) $\lambda_1$ and $\lambda_2$ are
real and equal if $b^2 - 4ac = 0$, (iii) $\lambda_1$ and $\lambda_2$ are conjugate complex numbers if
$b^2 - 4ac < 0$.
Case (i) Two distinct real roots
Let $\lambda_1$ and $\lambda_2$ be two distinct real roots of (11.27). We find two solutions

$$y_1(x) = e^{\lambda_1 x}, \qquad y_2(x) = e^{\lambda_2 x}.$$

We can check that $y_1$ and $y_2$ are linearly independent on every interval and thus form
a fundamental set of solutions. Thus,

$$y(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} \qquad (11.28)$$

is the general solution of (11.26) in this case.

Case (ii) A double real root
If $\lambda_1 = \lambda_2$, we obtain only one exponential solution $y_1(x) = e^{\lambda_1 x}$. A second solution
is

$$y_2(x) = xe^{\lambda_1 x}.$$

This can be obtained by the method of order reduction, or verified directly by
differentiation. The general solution in this case becomes

$$y(x) = c_1 e^{\lambda_1 x} + c_2 xe^{\lambda_1 x}. \qquad (11.29)$$

Case (iii) Conjugate complex roots
In this case, $\lambda_1 = \alpha + i\beta$ and $\lambda_2 = \alpha - i\beta$, where $\alpha$ and $\beta$ are real and $\beta > 0$. Then

$$y_1(x) = e^{(\alpha + i\beta)x}, \qquad y_2(x) = e^{(\alpha - i\beta)x}$$

are two linearly independent solutions; they are complex-valued, as the exponentials
$e^{\pm i\beta x}$ are complex numbers. The general solution of (11.26) becomes

$$y(x) = c_1 e^{(\alpha + i\beta)x} + c_2 e^{(\alpha - i\beta)x}. \qquad (11.30)$$

We will obtain real solutions through a suitable choice of the constants $c_1$ and $c_2$.
From Euler’s formula

$$e^{i\theta} = \cos\theta + i\sin\theta,$$

where $\theta$ is any real number, we obtain

$$e^{i\beta x} = \cos\beta x + i\sin\beta x,$$

$$\cos\beta x = \frac{e^{i\beta x} + e^{-i\beta x}}{2}, \qquad \sin\beta x = \frac{e^{i\beta x} - e^{-i\beta x}}{2i}.$$

The choices $c_1 = c_2 = 1$ and $c_1 = -i$, $c_2 = i$ respectively give the two solutions

$$z_1(x) = e^{(\alpha + i\beta)x} + e^{(\alpha - i\beta)x},$$

$$z_2(x) = -i\left(e^{(\alpha + i\beta)x} - e^{(\alpha - i\beta)x}\right).$$

Therefore

$$z_1(x) = e^{\alpha x}(e^{i\beta x} + e^{-i\beta x}) = 2e^{\alpha x}\cos\beta x,$$

$$z_2(x) = -ie^{\alpha x}(e^{i\beta x} - e^{-i\beta x}) = 2e^{\alpha x}\sin\beta x.$$

These functions are real-valued and thus the sought-for real solutions of (11.26).
Moreover, these solutions form a fundamental set on $(-\infty, \infty)$. Consequently the
general solution is

$$y(x) = c_1 e^{\alpha x}\cos\beta x + c_2 e^{\alpha x}\sin\beta x = e^{\alpha x}(c_1\cos\beta x + c_2\sin\beta x). \qquad (11.31)$$

This solution represents an oscillation whose frequency is determined by $\beta$ and whose
amplitude is constant in the case $\alpha = 0$ (which corresponds to purely imaginary
roots), increasing in $x$ in the case $\alpha > 0$, and decreasing in $x$ in the case
$\alpha < 0$.⁶ Equation (11.26) models the simple harmonic oscillator (case $\alpha = 0$) and
the damped harmonic oscillator (case $\alpha < 0$), respectively.
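The passage from the complex exponentials to real solutions rests on the identities for $\cos\beta x$ and $\sin\beta x$ derived from Euler's formula. The following Python sketch checks numerically that $(y_1 + y_2)/2 = e^{\alpha x}\cos\beta x$ and $(y_1 - y_2)/(2i) = e^{\alpha x}\sin\beta x$; the values of `alpha` and `beta` and the sample points are hypothetical choices made only for illustration.

```python
import cmath
import math

alpha, beta = -0.5, 2.0   # sample values, not taken from the text

def y1(x):
    # Complex solution e^((alpha + i beta) x)
    return cmath.exp(complex(alpha, beta) * x)

def y2(x):
    # Complex solution e^((alpha - i beta) x)
    return cmath.exp(complex(alpha, -beta) * x)

max_dev = 0.0
for x in (0.0, 0.7, 2.5):
    cos_part = (y1(x) + y2(x)) / 2      # should equal e^(alpha x) cos(beta x)
    sin_part = (y1(x) - y2(x)) / 2j     # should equal e^(alpha x) sin(beta x)
    max_dev = max(max_dev,
                  abs(cos_part - math.exp(alpha * x) * math.cos(beta * x)),
                  abs(sin_part - math.exp(alpha * x) * math.sin(beta * x)))

print("largest deviation:", max_dev)
```

The deviation includes the imaginary parts, so a value near zero confirms that both combinations are real up to rounding.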
Example 11.13 Solve the following differential equations:
(i) $2y'' - 5y' - 3y = 0$.
(ii) $y'' + 5y' - 6y = 0$.
(iii) $y'' + 8y' + 16y = 0$.
(iv) $y'' + 4y' + 7y = 0$.

Solution: (i) The auxiliary equation is $2\lambda^2 - 5\lambda - 3 = 0$, which can be written as

$$(2\lambda + 1)(\lambda - 3) = 0.$$

Its two roots are $\lambda_1 = -1/2$, $\lambda_2 = 3$. The solution has the form (11.28), that is,

$$y = c_1 e^{-\frac{1}{2}x} + c_2 e^{3x}.$$

(ii) The auxiliary equation is $\lambda^2 + 5\lambda - 6 = 0$. This can be written in the form

6 See also Chap. 10 on Fourier methods.



$$(\lambda - 1)(\lambda + 6) = 0.$$

Its roots are $\lambda_1 = 1$, $\lambda_2 = -6$. Again, the solution has the form (11.28), that is,

$$y = c_1 e^{x} + c_2 e^{-6x}.$$

(iii) The auxiliary equation is $\lambda^2 + 8\lambda + 16 = 0$. Its roots are $\lambda_1 = \lambda_2 = -4$. The
solution is of the form (11.29), that is,

$$y = c_1 e^{\lambda_1 x} + c_2 xe^{\lambda_1 x} = c_1 e^{-4x} + c_2 xe^{-4x}.$$

(iv) The auxiliary equation is $\lambda^2 + 4\lambda + 7 = 0$. Its roots $\lambda_1$ and $\lambda_2$ are given by
$\lambda_1 = -2 + i\sqrt{3}$, $\lambda_2 = -2 - i\sqrt{3}$. The solution is of the form (11.31), that is,

$$y = e^{-2x}(c_1\cos\sqrt{3}x + c_2\sin\sqrt{3}x).$$
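The case distinction by the sign of $b^2 - 4ac$ can be written as a small routine. The Python sketch below is an illustration (the function name `classify` and the returned labels are our own): it returns the case together with the roots, or with $\alpha$ and $\beta$ in the complex case, and is applied to the four equations of Example 11.13.

```python
import math

def classify(a, b, c):
    """Classify the roots of a*lam**2 + b*lam + c = 0 (a != 0)."""
    disc = b * b - 4 * a * c
    if disc > 0:
        r = math.sqrt(disc)
        # Case (i): general solution c1*exp(lam1*x) + c2*exp(lam2*x)
        return ("distinct real", ((-b + r) / (2 * a), (-b - r) / (2 * a)))
    if disc == 0:
        # Case (ii): general solution (c1 + c2*x)*exp(lam*x)
        return ("double real", (-b / (2 * a),))
    # Case (iii): roots alpha ± i*beta with beta > 0,
    # general solution exp(alpha*x)*(c1*cos(beta*x) + c2*sin(beta*x))
    return ("complex conjugate", (-b / (2 * a), math.sqrt(-disc) / (2 * abs(a))))

for coeffs in [(2, -5, -3), (1, 5, -6), (1, 8, 16), (1, 4, 7)]:
    print(coeffs, classify(*coeffs))
```

Note that the routine lists the larger real root first, so for (i) it reports the roots in the order $3, -1/2$; the general solution is of course unaffected by the ordering. The exact comparison `disc == 0` is adequate for integer coefficients as used here, but would need a tolerance for floating-point input.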

Example 11.14 Solve the initial value problem

$$y'' + 3y' + 2y = 0, \qquad y(0) = 1, \quad y'(0) = 2.$$

Solution: The auxiliary equation is

$$\lambda^2 + 3\lambda + 2 = 0.$$

Its roots $\lambda_1$ and $\lambda_2$ are

$$\lambda_1 = -1, \qquad \lambda_2 = -2.$$

Therefore, the solution is of the form (11.28), that is,

$$y = c_1 e^{-x} + c_2 e^{-2x}.$$

To find $c_1$ and $c_2$ we use the initial conditions

$$y(0) = 1, \qquad y'(0) = 2.$$

We have

$$1 = y(0) = c_1 e^{-0} + c_2 e^{-0},$$

so

$$c_1 + c_2 = 1.$$

Since $y'(x) = -c_1 e^{-x} - 2c_2 e^{-2x}$, the second initial condition yields

$$2 = y'(0) = -c_1 e^{-0} - 2c_2 e^{-0} = -c_1 - 2c_2,$$

so

$$c_1 + 2c_2 = -2.$$

Solving the linear system $c_1 + c_2 = 1$, $c_1 + 2c_2 = -2$ we obtain $c_2 = -3$ and $c_1 = 4$.
Therefore, the solution is

$$y(x) = 4e^{-x} - 3e^{-2x}.$$
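For two distinct real roots, the initial conditions always lead to a $2 \times 2$ linear system of the kind solved in Example 11.14: $c_1 + c_2 = y(0)$ and $\lambda_1 c_1 + \lambda_2 c_2 = y'(0)$. The Python sketch below solves this system by elimination; the function name `constants_from_initial_data` is a hypothetical helper, and the call uses the roots and initial data of Example 11.14.

```python
import math

def constants_from_initial_data(lam1, lam2, y0, dy0):
    # Solve c1 + c2 = y0 and lam1*c1 + lam2*c2 = dy0 for distinct lam1, lam2
    c2 = (dy0 - lam1 * y0) / (lam2 - lam1)
    c1 = y0 - c2
    return c1, c2

c1, c2 = constants_from_initial_data(-1.0, -2.0, 1.0, 2.0)

def y(x):
    return c1 * math.exp(-x) + c2 * math.exp(-2.0 * x)

def dy(x):
    return -c1 * math.exp(-x) - 2.0 * c2 * math.exp(-2.0 * x)

print(c1, c2)          # constants of the particular solution
print(y(0.0), dy(0.0)) # should reproduce the initial data 1 and 2
```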

Remark 11.4 In general, to solve an $n$th order differential equation (11.24) we must
solve the $n$th degree polynomial equation

$$a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_2\lambda^2 + a_1\lambda + a_0 = 0.$$

If all roots $\lambda_1, \lambda_2, \dots, \lambda_n$ of this equation are real and distinct, the functions $y_1(x) =
e^{\lambda_1 x}, \dots, y_n(x) = e^{\lambda_n x}$ form a fundamental set, and the general solution of (11.24)
is

$$y(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} + \cdots + c_n e^{\lambda_n x}.$$

If $\lambda$ is a multiple root, then functions $xe^{\lambda x}, x^2 e^{\lambda x}, \dots$ appear in the fundamental set.
If there are pairs of conjugate complex roots, then corresponding pairs as in (11.31)
appear in the fundamental set.

11.5 Modeling with Differential Equations

In this section we discuss the solution of different real world problems represented
by differential equations of first order. These problems are
(11.5.1) Growth and decay
(11.5.2) Population growth (population dynamics)
(11.5.3) Pollution of lakes
(11.5.4) Quantity of a drug in the body
(11.5.5) Spread of disease, technologies, and rumor
(11.5.6) Newton’s law of heating and cooling
(11.5.7) Timing of death (police investigation in criminal cases)

11.5.1 Growth and Decay

Growth and decay of a nonnegative quantity $y$ can be modeled by considering $y$ as
a function of time $t$ and formulating a differential equation

$$\frac{dy}{dt} = f(t, y). \qquad (11.32)$$

This means that the rate of change of $y$ at the time $t$ equals the value $f(t, y(t))$. The
quantity grows when $f$ is positive, and decays when $f$ is negative.
If the growth rate $f(t, y)$ is equal to a constant $k$, then

$$y(t) = kt + c$$

with an arbitrary constant $c$ solves (11.32). This describes linear growth (if $k > 0$) and
linear decay (if $k < 0$). If $k < 0$, this model only makes sense until the time $t$ when $y(t)$ becomes
zero.
If $f(t, y) = ky$ with $k$ being constant, that is, the growth rate is linear in $y$, then

$$y(t) = ce^{kt}$$

solves (11.32). This describes exponential growth and decay, respectively, with a
growth factor $k$.
A third variant is $f(t, y) = kt$; then the growth rate is linear with respect to time,
but independent of $y$. The solution in this case becomes

$$y(t) = \frac{k}{2}t^2 + c,$$

that is, we have quadratic growth.
In all variants, $c$ equals $y(0)$, the amount of the quantity at time zero.

Example 11.15 Find the general solution to the following differential equations:

$$\text{(a)}\ \frac{dy}{dt} = 2t, \qquad \text{(b)}\ \frac{dy}{dt} = 2y.$$

Solution: (a) $dy/dt = 2t$, so $y(t) = t^2 + c$.
(b) $dy/dt = 2y$, so $y(t) = ce^{2t}$. This is an example of exponential growth.

11.5.2 Population Growth

One of the earliest attempts to model human population growth by means of
mathematics was made by the English economist Thomas Malthus in 1798.
He assumed that the rate at which the population of a country grows at a certain
time is proportional to the total population of the country at that time. This is nothing
else than the model with linear growth rate from the previous subsection; in this
context, it is called the Malthusian model. If $N(t)$ and $k$ denote the total population
at time $t$ and the proportionality constant, respectively, we have

$$\frac{dN}{dt} \propto N, \qquad \frac{dN}{dt} = kN. \qquad (11.33)$$

As we know, its general solution is given by

$$N(t) = ce^{kt}.$$

When augmented by an initial condition, the solution of (11.33) will provide the
population size at any future time t. This simple model, which does not take into
account many factors (immigration and emigration, for example) that influence the
growth or decline of human populations, nevertheless turned out to be fairly accurate
in predicting the population of the United States during the years 1790–1860. Since
it yields exponential growth, which is unbounded when k is positive, it cannot be
realistic for large times; nevertheless, (11.33) is still used to model the growth of
small populations over short time intervals.
In 1837, the Belgian mathematician Verhulst improved the Malthusian model while
looking at fish populations in the Adriatic Sea. He reasoned that the rate of change of
population $N(t)$ with respect to $t$ should be influenced by growth factors such as the
population itself, and also factors tending to limit the population, such as limitations
of food and space. In his model, he assumed that the growth factors are incorporated
into a term $aN(t)$, and limiting factors into a term $-bN(t)^2$, with $a$ and $b$ being
positive constants whose values depend on the particular population. From this, he
obtained the logistic model of population growth,

$$\frac{dN}{dt} = aN - bN^2. \qquad (11.34)$$

If we assume the initial population at time $t = 0$ to be $N(0) = N_0$, we arrive at the
initial value problem

$$\frac{dN}{dt} = aN - bN^2, \qquad N(0) = N_0.$$

We will see below that its solution is given by

$$N(t) = \frac{aN_0 e^{at}}{a - bN_0 + bN_0 e^{at}}. \qquad (11.35)$$

Example 11.16 The population of a community is known to increase at a rate
proportional to the number of people present at time $t$. If the population has doubled in
6 years, how long will it take to triple?

Solution: Let $N(t)$ denote the population at time $t$. Let $N(0)$ denote the initial
population at $t = 0$. The model is

$$\frac{dN}{dt} = kN,$$

where the constant $k$ is unknown. Its solution is $N(t) = Ae^{kt}$, where $A = N(0)$.
By the given data

$$Ae^{6k} = N(6) = 2N(0) = 2A,$$

that is,

$$e^{6k} = 2, \qquad k = \frac{1}{6}\ln 2.$$

We want to determine $t$ such that $N(t) = 3A = 3N(0)$. We compute

$$N(0)e^{kt} = 3N(0), \qquad e^{kt} = 3, \qquad \ln 3 = kt = \frac{1}{6}(\ln 2)\,t,$$

$$t = 6\,\frac{\ln 3}{\ln 2} \approx 9.5.$$

The population triples after approximately 9.5 years.

Example 11.17 Let the population of a country be decreasing at a rate proportional
to its population. If the population has decreased to 25% in 10 years, how long did
it take to decrease to 40%?

Solution: This phenomenon can again be modeled by $dN/dt = kN$. Its solution is

$$N(t) = N(0)e^{kt},$$

where $N(0)$ is the initial population. We compute

$$N(0)e^{10k} = N(10) = \frac{1}{4}N(0), \qquad k = \frac{1}{10}\ln\frac{1}{4}.$$

We want to determine $t$ such that $N(t) = (2/5)N(0)$. We compute

$$N(0)e^{kt} = N(t) = \frac{2}{5}N(0), \qquad e^{kt} = \frac{2}{5}, \qquad kt = \ln\frac{2}{5},$$

$$t = 10\,\frac{\ln(2/5)}{\ln(1/4)} \approx 6.6.$$

The population has been reduced to 40% after approximately 6.6 years.
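Examples 11.16 and 11.17 follow the same pattern: one known ratio $N(T)/N(0)$ fixes $k$, and a target ratio then fixes $t$. The Python sketch below packages this computation; the function name `time_to_reach` and its parameter names are our own, introduced only for illustration.

```python
import math

def time_to_reach(t_known, ratio_known, ratio_target):
    # Exponential model N(t) = N(0) * exp(k t):
    # ratio_known = exp(k * t_known) determines k,
    # then exp(k t) = ratio_target determines t.
    k = math.log(ratio_known) / t_known
    return math.log(ratio_target) / k

t_triple = time_to_reach(6.0, 2.0, 3.0)      # Example 11.16: doubled in 6 years, when tripled?
t_forty = time_to_reach(10.0, 0.25, 0.40)    # Example 11.17: 25% after 10 years, when 40%?

print(round(t_triple, 1), round(t_forty, 1))
```

The same function covers growth (ratios above 1) and decay (ratios below 1), since only the sign of $k$ changes.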

Example 11.18 Let $N(t)$ be the population at time $t$ and let $N_0$ denote the initial
population, i.e., $N(0) = N_0$. Find the solution to the differential equation

$$\frac{dN}{dt} = aN - bN^2$$

with the initial condition $N(0) = N_0$.

Solution: This is a separable differential equation. We use separation of variables,

$$\frac{dN}{aN - bN^2} = dt. \qquad (11.36)$$

In order to find an antiderivative, we decompose the left side into partial fractions.
We set

$$\frac{1}{aN - bN^2} = \frac{1}{N(a - bN)} = \frac{A}{N} + \frac{B}{a - bN}.$$

The constants $A$ and $B$ must satisfy $1 = A(a - bN) + BN$ for arbitrary $N$. This
gives $Aa = 1$ and $0 = -bA + B$, so

$$A = \frac{1}{a}, \qquad B = \frac{b}{a}.$$

Equation (11.36) becomes

$$\left(\frac{1}{aN} + \frac{b}{a(a - bN)}\right)dN = dt.$$

Integrating both sides we get

$$\frac{1}{a}\ln|N| - \frac{1}{a}\ln|a - bN| = t + c.$$

In order to solve this for $N$, we compute

$$\ln\left|\frac{N}{a - bN}\right| = at + ac, \qquad \frac{N}{a - bN} = c_1 e^{at},$$

so

$$N(t) = \frac{ac_1}{bc_1 + e^{-at}}.$$

Using the condition $N(0) = N_0$ gives

$$c_1 = \frac{N_0}{a - bN_0}.$$

After substitutions and simplifications the solution becomes

$$N(t) = \frac{aN_0 e^{at}}{a - bN_0 + bN_0 e^{at}}.$$
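The closed-form logistic solution can be compared with a direct numerical integration of $dN/dt = aN - bN^2$. The Python sketch below uses a fourth-order Runge-Kutta scheme, which is not discussed in the text; the parameter values `a = 0.5`, `b = 0.005`, `N0 = 10` and the function names are hypothetical and serve only as an illustration.

```python
import math

def logistic_exact(t, a, b, n0):
    # Closed-form solution N(t) = a N0 e^(at) / (a - b N0 + b N0 e^(at))
    e = math.exp(a * t)
    return a * n0 * e / (a - b * n0 + b * n0 * e)

def logistic_rk4(t_end, a, b, n0, steps=1000):
    # Runge-Kutta integration of dN/dt = a N - b N^2 from t = 0 to t_end
    f = lambda n: a * n - b * n * n
    h = t_end / steps
    n = n0
    for _ in range(steps):
        k1 = f(n)
        k2 = f(n + h * k1 / 2)
        k3 = f(n + h * k2 / 2)
        k4 = f(n + h * k3)
        n += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return n

a, b, n0 = 0.5, 0.005, 10.0   # sample parameters; a / b = 100 is the limiting population
exact_10 = logistic_exact(10.0, a, b, n0)
numeric_10 = logistic_rk4(10.0, a, b, n0)

print(exact_10, numeric_10)
print(logistic_exact(40.0, a, b, n0))   # close to a / b for large t
```

For large $t$ the formula tends to $a/b$, in contrast to the unbounded growth of the Malthusian model.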

11.5.3 Pollution of Lakes

Let Q(t) be the volume of some pollutant at time t in a lake of total volume V . Suppose
that clean water is flowing into the lake at a constant rate r and that water flows out
at the same rate; consequently, the total volume V remains constant. Assume that
the pollutant is evenly spread throughout the lake, and that the clean water coming
into the lake immediately mixes with the rest of the water.
How does Q vary with time? First, notice that since the pollutant is being taken
out of the lake but not added, Q decreases with time, and the water leaving the lake
becomes less polluted, so the rate at which the pollutant leaves also decreases with
time.
To understand how $Q$ changes with time, we write a differential equation for $Q$.
The rate at which the pollutant leaves at time $t$ is equal to $-Q'(t)$. (The outflow is
positive, and $Q$ is decreasing.) The total rate of the outflow equals $r$, and the pollution
is represented in the outflow with the fraction $Q(t)/V$. Thus, the rate at which the
pollutant leaves is equal to

$$\frac{Q(t)}{V} \cdot r.$$

So the differential equation is

$$\frac{dQ}{dt} = -\frac{r}{V}Q. \qquad (11.37)$$

Its solution is

$$Q(t) = Q_0 e^{-rt/V}, \qquad (11.38)$$

where $Q_0$ is the volume of the pollutant at time zero. This tells us that $Q$ is decreasing.
By (11.37), $Q'$ is increasing. Consequently, $Q$ is convex.⁷ In addition, the pollutants
will never be completely removed from the lake though the quantity remaining will
become arbitrarily small. Indeed, $\lim_{t\to\infty} Q(t) = 0$, so the $t$-axis is a horizontal
asymptote for the function $Q$.

Example 11.19 Assume that the lake has a volume of 46 units, and the rate of inflow
and outflow equals 17.5 units per year. How long will it take for 90% of the pollutant
to be removed from the lake? For 99% to be removed?

Solution: By the given data, $r/V = 17.5/46 \approx 0.38$, so at time $t$ we have

$$Q(t) = Q_0 e^{-0.38t}.$$

When 90% of the pollution has been removed, 10% remains, so $Q(t) = 0.1Q_0$.
Substituting gives

$$0.1Q_0 = Q_0 e^{-0.38t}.$$

7 See Definition 4.4.


454 11 Differential Equations

Canceling $Q_0$ and solving for t gives

$$t = -\frac{\ln(0.1)}{0.38} \approx 6.$$

Thus, it takes approximately 6 years to remove 90% of the pollution. Similarly, when
99% of the pollution has been removed, $Q(t) = 0.01Q_0$, so we solve

$$0.01Q_0 = Q_0 e^{-0.38t},$$

giving
$$t = -\frac{\ln(0.01)}{0.38} \approx 12.$$
It takes approximately 12 years to remove 99% of the pollution.
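
The computation in Example 11.19 can be reproduced numerically. The following is a minimal Python sketch, not part of the original text; the function name `removal_time` is an illustrative choice. It uses the unrounded ratio r/V = 17.5/46, so the results differ slightly from the rounded values 6 and 12.

```python
import math

def removal_time(fraction_removed, r, V):
    """Time t at which Q(t) = Q0 * exp(-r t / V) has lost the given fraction of Q0."""
    remaining = 1.0 - fraction_removed
    return -math.log(remaining) * V / r

r, V = 17.5, 46.0                  # data of Example 11.19
t90 = removal_time(0.90, r, V)     # time for 90% removal, in years
t99 = removal_time(0.99, r, V)     # time for 99% removal, in years
print(round(t90, 2), round(t99, 2))
```

Since ln(0.01) = 2 ln(0.1), removing 99% takes exactly twice as long as removing 90%, independently of r and V.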

11.5.4 The Quantity of a Drug in the Body

The exponential decay model which we have used above to describe pollutants leav-
ing a lake can also be applied to other contaminants flowing in or out of a fluid
system, provided we have complete mixing. Another example is the quantity of a
drug in a patient’s body. After stopping the administration of a drug, the rate at which
the drug leaves the body can be assumed proportional to the quantity of the drug left
in the body. If we let A represent the quantity of drug in the body, then

$$\frac{dA}{dt} = -kA.$$

The minus sign indicates that the quantity of the drug in the body is decreasing.
The solution to this differential equation is $A(t) = A_0 e^{-kt}$; the quantity decreases
exponentially. The constant k > 0 depends on the drug, and $A_0$ is the amount of drug in the
body at time zero. Sometimes physicians convey information about the relative decay
rate in the form of the half-life, which is the time it takes for A to decrease by a factor
of 1/2.
Example 11.20 Valproic acid is a drug used to control epilepsy; its half-life in the
human body is about 15 h.
(a) Use the half-life to find the constant k in the differential equation $dA/dt = -kA$.
(b) At what time will 10% of the original dose remain?

Solution: (a) Since the half-life is 15 h, we know that the quantity remaining equals
$A(t) = 0.5A_0$ when t = 15. We substitute this into the solution of the differential
equation, $A(t) = A_0 e^{-kt}$, and obtain

$$0.5A_0 = A(15) = A_0 e^{-15k}.$$

Dividing by $A_0$ and taking the logarithm yields

$$\ln 0.5 = -15k,$$

so
$$k = -\frac{\ln 0.5}{15} \approx 0.0462.$$
(b) To find the time when 10% of the original dose remains in the body, we
substitute $0.10A_0$ for the quantity A(t) remaining, and solve for the time t. We
compute

$$0.10A_0 = A_0 e^{-0.0462t}, \quad 0.10 = e^{-0.0462t}, \quad \ln 0.10 = -0.0462t,$$

so
$$t = -\frac{\ln 0.10}{0.0462} \approx 49.84.$$
There will be 10% of the drug still in the body at t = 49.84, or after about 50 h.
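
The two steps of Example 11.20 can be sketched in Python as follows. This is an illustration only; the function names `decay_constant` and `time_to_fraction` are hypothetical helpers, not taken from the text. Because the sketch keeps k unrounded, it returns t ≈ 49.83 h rather than the 49.84 h obtained above with the rounded value k = 0.0462.

```python
import math

def decay_constant(half_life):
    """k in A(t) = A0 * exp(-k t), determined from A(half_life) = A0 / 2."""
    return math.log(2.0) / half_life

def time_to_fraction(fraction, k):
    """Time at which A(t) / A0 equals the given fraction."""
    return -math.log(fraction) / k

k = decay_constant(15.0)            # half-life of 15 h (valproic acid)
t10 = time_to_fraction(0.10, k)     # time until 10% of the dose remains
print(round(k, 4), round(t10, 2))
```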

11.5.5 Spread of Diseases, Technologies and Rumor

Example 11.21 Suppose some students in a school of 1000 students are carrying
a flu virus. Find a differential equation governing the number of students N(t) who
have contracted the flu if the rate at which the disease spreads is proportional to the
number of interactions between the students with the flu and the students who have
not yet been exposed to it.

Solution: Let N be the number of students with the flu. Then the number of students
not infected is 1000 − N . We assume that the number of interactions between these
two groups is proportional to the product of the sizes of these groups. The model
then becomes
$$\frac{dN}{dt} = kN(1000 - N)$$
with some proportionality constant k > 0.

Example 11.22 A technological innovation is introduced into a community with a
fixed population of M people, say at time t = 0. Find a differential equation
representing the number of people N(t) who have adopted the innovation at time t under
the condition that the rate at which the innovation spreads through the community is
jointly proportional to the number of people who have adopted it and the number of
people who have not adopted it.

Solution: At time t, the number of people who have not adopted the innovation is
equal to M − N(t). If we assume that at time t = 0 a single person has adopted the
innovation, the model becomes

$$\frac{dN}{dt} = kN(M - N), \quad N(0) = 1,$$
with some proportionality constant k > 0.

Example 11.23 (a) Along the lines of the previous two examples, describe a model
for the spread of a rumor and solve it.
(b) Currently the rumor is known to 300 students in a residential university of 45000
students. It will be known to 900 students after one week. Find the number of students
who will know the rumor after 4 weeks.

Solution: (a) We assume that the rate of spreading is proportional to the number of
interactions between students who know the rumor and students who do not, and
that this number is again proportional to the product of the sizes of the respective
groups. Let N(t) be the number of students who know the rumor at time t, and let M
be the total number of students. The desired model then is
$$\frac{dN}{dt} = \mu N(M - N),$$
where μ > 0 is some proportionality constant. It can be solved by the separation of
variables method,
$$\frac{dN}{N(M - N)} = \mu\,dt,$$
$$\int \frac{dN}{N(M - N)} = \mu \int dt + c,$$
$$\frac{1}{M}\ln\left|\frac{N}{M - N}\right| = \mu t + c,$$
$$\ln\frac{N}{M - N} = M\mu t + Mc,$$
as N > 0 and M − N > 0. Taking the exponential on both sides gives

$$\frac{N}{M - N} = e^{M\mu t + Mc} = Ae^{M\mu t}, \quad A = e^{Mc}.$$

Solving this equation for N yields

$$N(t) = \frac{MAe^{M\mu t}}{Ae^{M\mu t} + 1} = \frac{M}{1 + \frac{1}{A}e^{-\mu M t}}.$$

(b) We have M = 45000 and N(0) = 300. Setting b = 1/A, we compute b from

$$300 = N(0) = \frac{M}{1 + b} = \frac{45000}{1 + b}, \quad b = 149.$$

We then get

$$900 = N(1) = \frac{45000}{1 + 149e^{-M\mu}}, \quad 1 + 149e^{-M\mu} = 50, \quad e^{-M\mu} = \frac{49}{149}.$$

As $e^{-4M\mu} = (e^{-M\mu})^4$, we finally obtain

$$N(4) = \frac{45000}{1 + 149\left(\frac{49}{149}\right)^4} \approx 16400.$$

There are approximately 16400 students who know the rumor after 4 weeks.
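
The fitting procedure of Example 11.23(b) can be checked with a short Python sketch. It is only an illustration; the names `logistic` and `rate` (which stands for the product Mμ) are assumptions introduced here.

```python
import math

def logistic(t, M, b, rate):
    """N(t) = M / (1 + b * exp(-rate * t)), where rate stands for M * mu."""
    return M / (1.0 + b * math.exp(-rate * t))

M, N0, N1 = 45000.0, 300.0, 900.0
b = M / N0 - 1.0                        # from N(0) = M / (1 + b)
rate = -math.log((M / N1 - 1.0) / b)    # from N(1) = 900, i.e. exp(-rate) = 49/149
N4 = logistic(4.0, M, b, rate)          # number of students after 4 weeks
print(b, round(N4))
```

The sketch first recovers b = 149 and exp(−Mμ) = 49/149 from the two data points and then evaluates the logistic curve at t = 4.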

11.5.6 Application of Newton’s Law of Cooling

Example 11.24 A thermometer is removed from a room whose temperature is 80 °F
and is taken outside, where the air temperature is 10 °F. After 2 min, the thermometer
shows the reading 40 °F. What is the reading of the thermometer at t = 3 min? How
long will it take for the thermometer to reach 20 °F?

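Solution: By Newton's law of cooling, this phenomenon is modeled by

$$\frac{dT}{dt} = \lambda(T - 10).$$

Its solution is $T(t) = 10 + ce^{\lambda t}$.
As T(0) = 80, we have $80 = 10 + ce^0$, so c = 70. As T(2) = 40, we have
$40 = 10 + 70e^{2\lambda}$, so
$$2\lambda = \ln\frac{30}{70}, \quad \lambda = \frac{1}{2}\ln\frac{3}{7}.$$
Thus
$$T(3) = 10 + 70e^{\frac{3}{2}\ln\frac{3}{7}} \approx 29.6,$$
that is, the thermometer reads about 29.6 °F at t = 3 min. Solving
$$20 = T(t) = 10 + 70e^{\frac{t}{2}\ln\frac{3}{7}},$$
we get
$$10 = 70e^{\frac{t}{2}\ln\frac{3}{7}}, \quad \frac{t}{2}\ln\frac{3}{7} = \ln\frac{10}{70}, \quad t = \frac{2\ln\frac{1}{7}}{\ln\frac{3}{7}} \approx 4.6,$$
so the thermometer reaches 20 °F after about 4.6 min.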

Example 11.25 A 4 kg roast, initially at 60 °F, is placed in a 375 °F oven at 6 pm. At
7:15 pm, the temperature of the roast is 125 °F. Find the time when the roast will be
at 150 °F.

Solution: We assume that at any instant the temperature T(t) of the roast is uniform
throughout. We have T(t) < 375 at any time t. By Newton's law of cooling (here,
in fact, heating),
$$\frac{dT}{dt} = \lambda(375 - T)$$
with some constant λ > 0. The general solution of this equation is

$$T(t) = 375 + ce^{-\lambda t}.$$

We let t = 0 correspond to 6 pm. From the initial condition T(0) = 60 we
obtain c = −315. Measuring t in minutes, 7:15 pm corresponds to t = 75. We thus
have
$$125 = T(75) = 375 - 315e^{-75\lambda}.$$

We compute
$$315e^{-75\lambda} = 250, \quad \frac{315}{250} = e^{75\lambda}, \quad \lambda = \frac{1}{75}\ln\frac{315}{250}.$$
Solving
$$150 = T(t) = 375 - 315e^{-\lambda t}$$

for t yields
$$315 = 225e^{\lambda t}, \quad t = \frac{1}{\lambda}\ln\frac{315}{225} \approx 109,$$
so the roast reaches 150 °F about 109 min after 6 pm, that is, at about 7:49 pm.
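
Examples 11.24 and 11.25 follow the same pattern: fit the rate constant to one measurement, then evaluate or invert the solution. A minimal Python sketch of this pattern is given below. The helper names are hypothetical, and both examples are written in the common form T(t) = T_env + (T0 − T_env) e^{kt}, so that k corresponds to λ in Example 11.24 and to −λ in Example 11.25.

```python
import math

def fit_rate(T_env, T0, t1, T1):
    """Rate k in T(t) = T_env + (T0 - T_env) * exp(k t), fitted to T(t1) = T1."""
    return math.log((T1 - T_env) / (T0 - T_env)) / t1

def temperature(t, T_env, T0, k):
    """Temperature at time t for ambient temperature T_env and initial value T0."""
    return T_env + (T0 - T_env) * math.exp(k * t)

def time_to_reach(T_target, T_env, T0, k):
    """Time at which the temperature equals T_target."""
    return math.log((T_target - T_env) / (T0 - T_env)) / k

# Example 11.24: thermometer taken from 80 F to 10 F, reading 40 F after 2 min
k_th = fit_rate(10.0, 80.0, 2.0, 40.0)
T3 = temperature(3.0, 10.0, 80.0, k_th)
t20 = time_to_reach(20.0, 10.0, 80.0, k_th)

# Example 11.25: roast at 60 F in a 375 F oven, 125 F after 75 min
k_roast = fit_rate(375.0, 60.0, 75.0, 125.0)
t150 = time_to_reach(150.0, 375.0, 60.0, k_roast)
print(round(T3, 1), round(t20, 1), round(t150))
```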

11.5.7 Application of Newton's Cooling Law for Determining Time of Death

The time of death of a murdered person can be estimated with the help of modeling
through differential equations. Suppose a police officer discovers the body of a dead
person, presumably murdered, and the problem is to estimate the time of death. The body is
located in a room that is kept at a constant 70 °F. We assume that the victim's
temperature was a normal 98.6 °F at the time of death. After death, the body radiates
heat into the cooler room, so its temperature decreases. Forensic experts
estimate the time of death from the body's current temperature by calculating how
long the body must have been losing heat to reach that temperature. According to Newton's
law of cooling, the body radiates heat energy into the room at a rate proportional
to the difference in temperature between the body and the room. If T(t) is the body
temperature at time t, then for some constant of proportionality k,

$$T'(t) = k(T(t) - 70).$$

We have already solved the corresponding linear differential equation

$$T' = k(T - 70);$$

its general solution is

$$T(t) = 70 + ce^{kt}.$$

The constants k and c can be determined provided the following information is
available: the time of arrival of the police officer, the temperature of the body just
after the arrival, and the temperature of the body after a certain interval of time. Assume
that the officer arrived at 10:40 pm and the body temperature was 94.4 °F. This
means that if the officer considers 10:40 pm as t = 0, then T(0) = 94.4 = 70 + c,
so c = 24.4, giving $T(t) = 70 + 24.4e^{kt}$. Assume that the officer makes another
measurement of the temperature after 90 min, that is, at 12:10 am, and the temperature
is then 89 °F. This means that

$$89 = T(90) = 70 + 24.4e^{90k}.$$

We determine k through the computation

$$e^{90k} = \frac{19}{24.4}, \quad 90k = \ln\left(\frac{19}{24.4}\right), \quad k = \frac{1}{90}\ln\left(\frac{19}{24.4}\right).$$

The officer now has the temperature function

$$T(t) = 70 + 24.4e^{\frac{t}{90}\ln\left(\frac{19}{24.4}\right)}.$$

To find the last time at which the body was at 98.6 °F (presumably the time of
death), one has to solve the equation

$$T(t) = 98.6 = 70 + 24.4e^{\frac{t}{90}\ln\left(\frac{19}{24.4}\right)}$$

for t. This is done by computing

$$\frac{28.6}{24.4} = e^{\frac{t}{90}\ln\left(\frac{19}{24.4}\right)}, \quad \ln\left(\frac{28.6}{24.4}\right) = \frac{t}{90}\ln\left(\frac{19}{24.4}\right).$$

Therefore, the time of death, according to this mathematical model, was

$$t = \frac{90\ln(28.6/24.4)}{\ln(19/24.4)},$$

which is approximately −57 min. Thus, the death occurred approximately 57 min
before the first measurement at 10:40 pm, that is at 9:43 pm approximately.
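
The estimate above can be reproduced with a few lines of Python. This is a sketch under the assumptions of the model (constant room temperature, normal body temperature at death); the function name `time_of_death` and its parameter names are illustrative.

```python
import math

def time_of_death(T_room, T_normal, T_first, T_second, dt):
    """Time (relative to the first measurement) at which T(t) = T_normal,
    for T(t) = T_room + c * exp(k t) fitted to two measurements dt apart."""
    c = T_first - T_room                           # from T(0) = T_first
    k = math.log((T_second - T_room) / c) / dt     # from T(dt) = T_second
    return math.log((T_normal - T_room) / c) / k

# Data of the text: 70 F room, 94.4 F at 10:40 pm, 89 F ninety minutes later
t_death = time_of_death(70.0, 98.6, 94.4, 89.0, 90.0)
print(round(t_death, 1))   # negative: minutes before the first measurement
```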

11.6 Introduction to Partial Differential Equations

In the previous sections, we have studied some basic ordinary differential equa-
tions and used them for modeling various real-world problems. However, since the
world around us is three-dimensional, many relevant quantities depend on the three
space coordinates, and possibly on time. Consequently, differential equations for the
corresponding mathematical functions naturally involve partial derivatives of these
functions. Indeed, the basic laws of continuum mechanics and thermodynamics for
the conservation of mass, momentum and energy become, when formulated in math-
ematical terms, partial differential equations. The same applies to the basic laws of
continuum electrodynamics, as well as to the equations which describe processes on
very small scales (molecular or atomistic level) and on very large scales (e.g., stellar
evolution).
Seen from a different angle, partial differential equations play a role in modeling
a very broad range of phenomena, including some where at first one does not expect
them to do so. Let us just present a (somewhat arbitrary) list of such phenomena:
(i) Diffusion of one material within another, e.g., smoke particles in air.
(ii) Chemical reactions, such as the Belousov–Zhabotinsky reaction, which exhibits
fascinating pattern structures.
(iii) Dispersion of populations; individuals move both randomly and to avoid
overcrowding.
(iv) Pursuit and evasion in predator–prey systems.
(v) Pattern formation in animal coats, e.g., the formation of zebra stripes.
(vi) Dispersion of pollutants in a running stream.
(vii) Determination of the appropriate price of an option in a capital market.
When describing phenomena taking place in three-dimensional space, it is often
possible to let the unknown functions depend on one space coordinate only. This
happens if the actual dependence on the other coordinates is very slight (or nonexis-
tent). Moreover, in order to understand the behavior of a partial differential equation
in three space dimensions (3D) it is often helpful if at first one analyses its behavior
in one space dimension (1D).
We present some examples of partial differential equations in 1D. The unknown
function u is real-valued and depends on x and t.
(a) Heat equation or diffusion equation
The equation
$$\frac{\partial u}{\partial t} = k\,\frac{\partial^2 u}{\partial x^2}$$
describes the evolution of the temperature u as a function of space x and time t.
The constant k denotes the thermal diffusivity. It was introduced by Fourier in his
celebrated memoir “Théorie analytique de la chaleur” which appeared in 1822.
(b) Wave equation
The equation
$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}$$
describes the evolution of some quantity u which exhibits wave propagation. The
constant c denotes the wave speed. This equation can be applied to model elastic waves (e.g.,
vibrations of an elastic rod) as well as acoustic and electromagnetic wave propagation.
It was introduced and analyzed by d’Alembert in 1752 as a model for a vibrating
string.
(c) Linear transport equation
The equation
$$\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} = 0$$
describes the transport of a spatially distributed quantity u with constant speed c. In
this case, u is to be understood as a density function with respect to x, that is, the
integral
$$\int_a^b u(x, t)\,dx$$

represents the total amount of the quantity present within the interval [a, b] at time
t. For example, u could be the density of cars per unit kilometer on a road (modeled
as a subset of the real line) which move at a constant speed in one direction.
(d) Scalar conservation law
The equation
$$\frac{\partial u}{\partial t} + a(u)\,\frac{\partial u}{\partial x} = 0$$
describes nonlinear transport (the linear case is recovered when a is constant). It is
used to model various types of flow. In the special case a(u) = u it is called Burgers'
equation; it arises in particle and fluid flow with zero viscosity. That equation was
introduced by Bateman in 1915 and later studied by Burgers in 1948.
(e) Telegraph equation
The equation
$$\frac{\partial^2 u}{\partial t^2} + A\,\frac{\partial u}{\partial t} + Bu = \frac{\partial^2 u}{\partial x^2},$$
where A and B are constants, arises in the study of propagation of electrical signals
in a cable transmission line. Both the current I and voltage V satisfy equations of
this type. The telegraph equation was introduced by Oliver Heaviside in 1880. It also
arises in the propagation of pressure waves in the study of pulsating blood flow in
arteries.
(f) Korteweg–de Vries (KdV) equation
The equation
$$\frac{\partial u}{\partial t} + cu\,\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0$$
models shallow water waves. It was originally discovered by Boussinesq in 1877 and
rediscovered by Korteweg and de Vries in 1895.
Let us now pass to examples of partial differential equations in 3D (or 2D). Within
these equations, there often appears the Laplace operator Δ defined by

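$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}, \quad \text{respectively} \quad \Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}.$$

Since $\Delta u = \operatorname{div}(\nabla u)$, it is also written as $\Delta u = (\nabla \cdot \nabla)u = \nabla^2 u$.
(g) Laplace equation
The Laplace equation$^8$
$$-\Delta u = 0$$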

is probably the most studied partial differential equation since it appeared in Laplace’s
work in 1785. Its inhomogeneous counterpart

−Δu = f

is called the Poisson equation. In the latter, f is a function which depends on the
space variables and represents some additional influence arising from the specific
situation in which the equation is applied. For example, these equations are satisfied
by the electrostatic potential in absence (presence, respectively) of charges, by the
gravitational potential in the absence (presence, respectively) of mass, by the equilib-
rium displacement of a membrane in absence (presence, respectively) of distributed
forces, by the steady-state temperature in the absence (presence, respectively) of thermal
sources or sinks, and by the velocity potential for an inviscid, incompressible,
irrotational homogeneous fluid in the absence (presence, respectively) of sources and
sinks.

8 The minus sign is a convention which is widely adopted in the mathematical theory of the Laplace equation.

The 1D examples (a)–(c) actually arise from their 3D formulation. To obtain the
latter, one just has to replace $\partial^2 u/\partial x^2$ by $\Delta u$ and, for the transport equation,
$c\,\partial u/\partial x$ by $c \cdot \nabla u$ (the scalar constant is replaced by a constant vector).
(h) Helmholtz equation
The equation
$$(\Delta + k^2)u = 0$$

has been found useful in diffraction theory. It was introduced by Helmholtz in 1860.
It is obtained from the wave equation when one looks for solutions of the form
u(x, t) = v(x)w(t).
(i) Eikonal equation
The eikonal equation
$$\|\nabla u\| = 1$$

models problems of geometric optics. Here, $\|\nabla u(x)\|$ denotes the length of the
gradient vector $\nabla u(x)$ at the space point x.
(j) Klein–Gordon equation
The equation
$$\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} - \Delta u + \mu^2 u = 0$$
arises in quantum field theory, where μ = mc/h, m is the mass, c the speed of light
and h the Planck constant. It is named after the physicists Oskar Klein and Walter
Gordon who proposed it in 1926.
(k) Schrödinger equation
The equation
$$ih\,\frac{\partial u}{\partial t} + \frac{h^2}{2m}\,\Delta u - Vu = 0$$
is a fundamental equation of quantum mechanics. Here, u is the wave function of a
particle with mass m and potential energy V, h is the Planck constant and i denotes
the imaginary unit. The Austrian physicist Erwin Schrödinger, who developed this
equation in 1926, received the Nobel Prize in 1933 for this work.
(l) Navier–Stokes equation
The equation
$$\frac{\partial u}{\partial t} + (u \cdot \nabla)u + \frac{1}{\rho}\nabla p = \nu\,\Delta u$$
is satisfied by the velocity vector $u = (u_1, u_2, u_3)$ of an incompressible fluid, that
is, a fluid whose density ρ is constant. The variable p stands for the pressure of the
fluid and the constant ν denotes the kinematic viscosity. In the period from 1822 to
1845, several scientists including Navier and Stokes contributed to the formulation
of this equation. In the special case ν = 0, it is called the Euler equation, published
by Euler in 1757.
The study of partial differential equations (PDEs) started in the eighteenth century
in the work of Euler, d'Alembert and Laplace, motivated by continuum mechanics.
Up to the present time, their relevance in all kinds of situations in science and
technology and (more recently) in economics and even the social sciences has provided a rather
strong incentive to develop their mathematical understanding, besides their being
intrinsically interesting for mathematicians. A lot of mathematical research in other areas
of analysis (e.g., functional analysis) was inspired by the need for tools for solving
problems with partial differential equations.

11.7 Applications of Fourier Methods to Partial Differential Equations

Fourier methods can be used to solve boundary value problems for linear partial
differential equations. We discuss this briefly for three equations which we have
already encountered in Chap. 8, namely the wave equation in one dimension,

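$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}, \quad \text{where } c \text{ is a constant,}$$
the heat equation in one dimension

$$\frac{\partial u}{\partial t} = k\,\frac{\partial^2 u}{\partial x^2}, \quad \text{where } k > 0 \text{ is a constant,}$$
and the Laplace equation in two dimensions

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$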

For problems on a bounded interval one uses Fourier series, while for problems on
the real line one uses the Fourier transform.

11.7.1 Fourier Methods for the Wave Equation

A Boundary Value Problem. We consider the wave equation

$$\frac{\partial^2 u}{\partial t^2}(x, t) = c^2\,\frac{\partial^2 u}{\partial x^2}(x, t), \quad 0 < x < l, \ t > 0, \tag{11.39}$$

on the bounded space interval (0, l) for positive times t > 0. Let us mention as one
particular application that (11.39) is an approximate model for transverse vibrations
of an elastic string of length l. There one assumes that at rest the string occupies the
horizontal interval [0, l], that is, any value x ∈ [0, l] corresponds to a material point
of the string. The variable u denotes the transverse displacement with respect to the
state of rest u = 0, that is, u(x, t) gives the vertical position of the string particle x
at time t. We do not derive (11.39) from the laws of continuum mechanics (which is
in fact not so easy, since in reality a string is a three-dimensional object), but only
remark that the wave speed c is given by $\sqrt{q\sigma/\rho}$, where q is the cross-sectional area,
σ the tensile stress acting on a cross section and ρ the mass density of the string.
The boundary conditions

$$u(0, t) = u(l, t) = 0, \quad t \ge 0, \tag{11.40}$$

describe the situation where the string is fixed at both ends x = 0 and x = l. Let us
choose t = 0 as the initial time and impose the initial conditions

$$u(x, 0) = f(x), \quad 0 \le x \le l, \tag{11.41}$$

$$\frac{\partial u}{\partial t}(x, 0) = 0, \quad 0 \le x \le l. \tag{11.42}$$
This means that the string is brought to the position described by the given function
f, where it is held at velocity 0, and then suddenly released at time t = 0. In order
that (11.40) and (11.41) are compatible, we require that f(0) = f(l) = 0.
In order to find the solution of the boundary value problem (11.39)–(11.42), one
employs as a first step the method of separation of variables and as a second step
the Fourier series representation. "Separation of variables" here means that we are
looking for a solution of the form

$$u(x, t) = U(x)T(t), \quad 0 \le x \le l, \ 0 \le t. \tag{11.43}$$

We substitute this into the wave equation and obtain

$$U(x)T''(t) = c^2\,U''(x)T(t),$$

where $T' = dT/dt$ and $U' = dU/dx$. Then

$$\frac{U''(x)}{U(x)} = \frac{1}{c^2}\,\frac{T''(t)}{T(t)}. \tag{11.44}$$

The left side of this equation depends only on x and the right side only on t. Since
it has to be satisfied for all values of t and x in the intervals considered, we can
fix any value for t for which $T(t) \ne 0$, and see that the left side must be equal
to $T''(t)/(c^2 T(t))$, no matter which value we choose for x. Thus, considered as a
function of x, the left-hand side is constant. Let us denote this constant by −λ (the
negative sign is customary and convenient, but we would arrive at the same final
result if we just used λ). Therefore,

$$\frac{U''(x)}{U(x)} = -\lambda = \frac{1}{c^2}\,\frac{T''(t)}{T(t)} \tag{11.45}$$

for all x and t. In this manner, the wave equation has been separated into the two
differential equations

$$U''(x) + \lambda U(x) = 0, \tag{11.46}$$

$$T''(t) + \lambda c^2 T(t) = 0. \tag{11.47}$$

Let us now determine U. The boundary condition (11.40) becomes

$$U(0) = U(l) = 0. \tag{11.48}$$

For λ ≤ 0, the only solution of (11.46) which satisfies (11.48) is the trivial solution
U = 0 (we will not prove this here), leading to the trivial solution u = 0 of the wave
equation, which is indeed the solution of the boundary value problem in the trivial
case f = 0, but not if f ≠ 0. For λ > 0, one checks that

$$U(x) = a\cos(\sqrt{\lambda}\,x) + b\sin(\sqrt{\lambda}\,x) \tag{11.49}$$

is a solution of (11.46), in fact the general solution, where a and b are
constants. Since U(0) = a, the boundary condition U(0) = 0 yields a = 0. For b = 0,
we again get the trivial solution U = 0. For b ≠ 0, the boundary condition U(l) = 0
implies $\sin(\sqrt{\lambda}\,l) = 0$, which means that $\sqrt{\lambda}\,l$ must be a positive integer multiple
of π. Therefore, the possible values for λ are

$$\lambda_n = \frac{n^2\pi^2}{l^2}, \quad n = 1, 2, 3, \ldots \tag{11.50}$$
Let us remark that the $\lambda_n$ are called eigenvalues, and the corresponding
functions $U_n(x) = \sin(n\pi x/l)$ are called eigenfunctions of the boundary value problem
(11.46) and (11.48).
Equation (11.47) for T is solved analogously. Its general solution is
$$T(t) = a\cos(\sqrt{\lambda}\,ct) + b\sin(\sqrt{\lambda}\,ct). \tag{11.51}$$

The initial condition (11.42) for ∂u/∂t gives $0 = U(x)T'(0)$, so $T'(0) = 0$ for the
nontrivial case, and therefore b = 0. From (11.50) we get the solutions
$$T_n(t) = \cos\left(\frac{n\pi ct}{l}\right). \tag{11.52}$$
Putting together the results above, we have found that the functions
$$u_n(x, t) = \sin\left(\frac{n\pi x}{l}\right)\cos\left(\frac{n\pi ct}{l}\right), \quad n = 1, 2, 3, \ldots \tag{11.53}$$
are special solutions of the wave equation (11.39) which satisfy the boundary conditions
(11.40) and the initial condition (11.42). Moreover, we have
$$u_n(x, 0) = \sin\left(\frac{n\pi x}{l}\right).$$
From those functions $u_n$, we construct a solution u which also satisfies the initial
condition u(x, 0) = f(x), using Fourier series. We consider the odd extension of f
on [−l, l] defined by f(x) = −f(−x) for x < 0. Its Fourier coefficients on [−l, l]
are (see Sect. 10.2.3)
$$b_n = \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi x}{l}\,dx, \quad a_n = 0. \tag{11.54}$$

If f is smooth enough, it is equal to its Fourier series,

$$f(x) = \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{l} = \sum_{n=1}^{\infty} b_n u_n(x, 0).$$

As the final step, one can show that the series

$$u(x, t) = \sum_{n=1}^{\infty} b_n u_n(x, t)$$

(where $u_n$ and $b_n$ are defined in (11.53) and (11.54)) converges and defines a function
u which solves the given boundary value problem.
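
The construction above can be tried out numerically. The following Python sketch approximates the coefficients (11.54) by the midpoint rule and evaluates a partial sum of the series. The string length, the wave speed, the initial displacement f, and all function names are assumptions chosen for illustration; for f(x) = sin(πx/l) only the first mode is present, so the result can be compared with $u_1$ from (11.53).

```python
import math

def sine_coefficients(f, l, n_max, samples=2000):
    """b_n = (2/l) * integral_0^l f(x) sin(n pi x / l) dx, by the midpoint rule."""
    h = l / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    return [
        (2.0 / l) * h * sum(f(x) * math.sin(n * math.pi * x / l) for x in xs)
        for n in range(1, n_max + 1)
    ]

def wave_solution(x, t, b, l, c):
    """Partial sum of u(x, t) = sum_n b_n sin(n pi x / l) cos(n pi c t / l)."""
    return sum(
        bn * math.sin(n * math.pi * x / l) * math.cos(n * math.pi * c * t / l)
        for n, bn in enumerate(b, start=1)
    )

l, c = 1.0, 2.0                              # assumed string length and wave speed
f = lambda x: math.sin(math.pi * x / l)      # assumed initial displacement, f(0) = f(l) = 0
b = sine_coefficients(f, l, n_max=5)
u = wave_solution(0.3, 0.1, b, l, c)
print([round(bn, 6) for bn in b], round(u, 6))
```

For a less regular f, more terms and more quadrature points would be needed; the sketch makes no attempt to estimate the truncation error.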
An Initial Value Problem on an Unbounded Domain. We apply the Fourier trans-
form to solve the wave equation on the real line with given initial conditions at time
t = 0,

$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}, \quad -\infty < x < \infty, \ t > 0, \tag{11.55}$$

$$u(x, 0) = f(x), \quad \frac{\partial u}{\partial t}(x, 0) = 0, \quad -\infty < x < \infty. \tag{11.56}$$
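Because x varies along the entire real line, we can try to compute the unknown
function u with the aid of the Fourier transform with respect to the x variable, which
we denote by $\mathcal{F}_x$, that is,
$$\hat{u}(\xi, t) = (\mathcal{F}_x u)(\xi, t) = \int_{-\infty}^{\infty} u(x, t)e^{-i\xi x}\,dx.$$

Here, t plays the role of a parameter. We want to apply $\mathcal{F}_x$ to both sides of (11.55).
For the left side, we obtain
$$\begin{aligned}
\mathcal{F}_x\left(\frac{\partial^2 u}{\partial t^2}\right)(\xi) &= \int_{-\infty}^{\infty} \frac{\partial^2 u}{\partial t^2}(x, t)e^{-i\xi x}\,dx = \frac{\partial^2}{\partial t^2}\int_{-\infty}^{\infty} u(x, t)e^{-i\xi x}\,dx \\
&= \frac{\partial^2}{\partial t^2}\hat{u}(\xi, t).
\end{aligned} \tag{11.57}$$
Here we have interchanged the differentiation w.r.t. t with the integration w.r.t. x,
see Theorem 8.5.
For the right side of (11.55) we use formula (10.28) and obtain
$$\mathcal{F}_x\left(\frac{\partial^2 u}{\partial x^2}\right)(\xi) = -\xi^2\,\hat{u}(\xi, t). \tag{11.58}$$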

Thus the wave equation takes the form

$$\frac{\partial^2}{\partial t^2}\hat{u}(\xi, t) = -c^2\xi^2\,\hat{u}(\xi, t)$$
or
$$\frac{\partial^2}{\partial t^2}\hat{u}(\xi, t) + c^2\xi^2\,\hat{u}(\xi, t) = 0.$$
Since the space derivative has disappeared, we can treat this equation as an ordinary
differential equation w.r.t. t for the unknown function $\hat{u}(\xi, t)$, where ξ appears as a
parameter. Its general solution has the form

$$\hat{u}(\xi, t) = a_\xi\cos(\xi ct) + b_\xi\sin(\xi ct)$$

for some constants $a_\xi$ and $b_\xi$ depending on ξ. Applying $\mathcal{F}_x$ to the initial conditions
(11.56) we get
$$a_\xi = \hat{u}(\xi, 0) = \hat{f}(\xi),$$
$$b_\xi = \frac{1}{c\xi}\,\frac{\partial}{\partial t}\hat{u}(\xi, 0) = \frac{1}{c\xi}\,\widehat{\frac{\partial u}{\partial t}}(\xi, 0) = 0.$$

Thus $\hat{u}(\xi, t) = \hat{f}(\xi)\cos(\xi ct)$. We apply the inverse Fourier transform and finally
obtain
$$u(x, t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\xi)\cos(\xi ct)e^{i\xi x}\,d\xi.$$
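
As a remark not contained in the text above: writing cos(ξct) as the average of $e^{i\xi ct}$ and $e^{-i\xi ct}$ and applying the inversion formula to each term turns the last integral into d'Alembert's formula u(x, t) = (f(x + ct) + f(x − ct))/2. The Python sketch below checks by finite differences that this expression satisfies the wave equation. The Gaussian profile f, the wave speed, and the helper names are assumptions made for the illustration.

```python
import math

def dalembert(f, x, t, c):
    """u(x, t) = (f(x + c t) + f(x - c t)) / 2, valid for zero initial velocity."""
    return 0.5 * (f(x + c * t) + f(x - c * t))

def wave_residual(f, x, t, c, h=1e-3):
    """Central-difference estimate of u_tt - c^2 u_xx at the point (x, t)."""
    u = lambda xx, tt: dalembert(f, xx, tt, c)
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - c**2 * u_xx

f = lambda x: math.exp(-x * x)    # assumed initial profile
c = 1.5                           # assumed wave speed
res = wave_residual(f, 0.4, 0.7, c)
print(abs(res) < 1e-3)
```

The residual is not exactly zero because the second derivatives are only approximated; it shrinks as the step h is reduced, until rounding errors dominate.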

11.7.2 Fourier Methods for the Heat Equation

A Boundary Value Problem. Let us consider the boundary value problem for the
heat equation

$$\begin{aligned}
\frac{\partial u}{\partial t} &= k\,\frac{\partial^2 u}{\partial x^2}, && 0 < x < l, \ t > 0, \\
u(0, t) &= u(l, t) = 0, && t \ge 0, \\
u(x, 0) &= f(x), && 0 \le x \le l.
\end{aligned} \tag{11.59}$$

We proceed in the same manner as in Sect. 11.7.1, using separation of variables as
well as Fourier series. By substituting u(x, t) = U(x)T(t) into the heat equation we
get
$$U(x)T'(t) = k\,U''(x)T(t).$$

After division we see that

$$\frac{T'(t)}{kT(t)} = -\lambda = \frac{U''(x)}{U(x)}$$

holds for some constant λ, since the left side does not depend on x and the right side
does not depend on t. Due to the given boundary conditions, U has to satisfy

$$U'' + \lambda U = 0, \quad U(0) = U(l) = 0.$$

We determine the values of λ for which this boundary value problem has nontrivial
solutions U (the eigenfunctions). After some computation we arrive at the solution

$$u(x, t) = \sum_{n=1}^{\infty} b_n \sin\left(\frac{n\pi x}{l}\right)e^{-n^2\pi^2 kt/l^2},$$

where
$$b_n = \frac{2}{l}\int_0^l f(\xi)\sin\left(\frac{n\pi\xi}{l}\right)d\xi,$$

of the boundary value problem (11.59) for the heat equation.
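
To illustrate how the series behaves, the following Python sketch evaluates a partial sum for a given list of coefficients. The rod length, the diffusivity, and the coefficient list (corresponding to the assumed initial profile f(x) = sin(πx/l) + 0.5 sin(3πx/l)) are illustrative choices, not data from the text.

```python
import math

def heat_series(x, t, b, l, k):
    """Partial sum of u(x, t) = sum_n b_n sin(n pi x / l) exp(-n^2 pi^2 k t / l^2)."""
    return sum(
        bn * math.sin(n * math.pi * x / l) * math.exp(-((n * math.pi / l) ** 2) * k * t)
        for n, bn in enumerate(b, start=1)
    )

l, k = 1.0, 0.1                  # assumed rod length and diffusivity
b = [1.0, 0.0, 0.5]              # assumed coefficients b_1, b_2, b_3
u0 = heat_series(0.25, 0.0, b, l, k)              # equals f(0.25)
u1 = heat_series(0.25, 1.0, b, l, k)              # temperature one time unit later
decay_mode1 = math.exp(-math.pi**2 * k)           # damping factor of mode n = 1 over one time unit
decay_mode3 = math.exp(-9.0 * math.pi**2 * k)     # damping factor of mode n = 3 over one time unit
print(round(u0, 4), round(u1, 4), decay_mode3 < decay_mode1)
```

The exponent grows like n², so high modes are damped much faster than low ones; after a short time the solution is dominated by the first nonvanishing mode.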


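An Initial Value Problem. We consider the initial value problem for the heat equation
on the whole real line

$$\begin{aligned}
\frac{\partial u}{\partial t} &= k\,\frac{\partial^2 u}{\partial x^2}, && -\infty < x < \infty, \ t > 0, \\
u(x, 0) &= f(x), && -\infty < x < \infty.
\end{aligned} \tag{11.60}$$

We proceed similarly as in the case of the wave equation. We take the Fourier
transform w.r.t. x of both sides of the heat equation,
$$\mathcal{F}_x\left(\frac{\partial u}{\partial t}\right)(\xi) = k\,\mathcal{F}_x\left(\frac{\partial^2 u}{\partial x^2}\right)(\xi).$$

We evaluate the left side,

$$\begin{aligned}
\mathcal{F}_x\left(\frac{\partial u}{\partial t}\right)(\xi) &= \int_{-\infty}^{\infty} \frac{\partial u}{\partial t}(x, t)e^{-i\xi x}\,dx = \frac{\partial}{\partial t}\int_{-\infty}^{\infty} u(x, t)e^{-i\xi x}\,dx \\
&= \frac{\partial}{\partial t}\hat{u}(\xi, t),
\end{aligned} \tag{11.61}$$
and the right side,
$$\mathcal{F}_x\left(\frac{\partial^2 u}{\partial x^2}\right)(\xi) = -\xi^2\,\hat{u}(\xi, t). \tag{11.62}$$

Thus, the heat equation takes the form

$$\frac{\partial}{\partial t}\hat{u}(\xi, t) + k\xi^2\,\hat{u}(\xi, t) = 0,$$
where ξ appears as a parameter only. We already know that this differential equation
has the general solution
$$\hat{u}(\xi, t) = a_\xi e^{-\xi^2 kt}.$$

Using the transform of the initial condition we get

$$a_\xi = \hat{u}(\xi, 0) = \hat{f}(\xi).$$

Taking the inverse Fourier transform according to (10.34) we arrive at

$$u(x, t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\xi)e^{-\xi^2 kt}e^{i\xi x}\,d\xi.$$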
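
The final formula can be tested on an example where everything is known in closed form. The sketch below assumes the initial profile $f(x) = e^{-x^2}$, whose transform under the convention used here is $\hat{f}(\xi) = \sqrt{\pi}\,e^{-\xi^2/4}$, and for which the formula evaluates to $u(x, t) = e^{-x^2/(1+4kt)}/\sqrt{1+4kt}$. This example, the value of k, and the function names are illustrative assumptions, not taken from the text. The integral is approximated by the midpoint rule on a truncated interval.

```python
import math

def heat_via_fourier(x, t, k, xi_max=40.0, steps=8000):
    """Midpoint-rule value of (1/2pi) * int fhat(xi) exp(-xi^2 k t) exp(i xi x) dxi
    for fhat(xi) = sqrt(pi) * exp(-xi^2 / 4). The imaginary part of the integrand
    is odd in xi and integrates to zero, so only the cosine part is summed."""
    h = 2.0 * xi_max / steps
    total = 0.0
    for j in range(steps):
        xi = -xi_max + (j + 0.5) * h
        fhat = math.sqrt(math.pi) * math.exp(-xi * xi / 4.0)
        total += fhat * math.exp(-xi * xi * k * t) * math.cos(xi * x)
    return total * h / (2.0 * math.pi)

def heat_closed_form(x, t, k):
    """Closed-form solution for the initial profile f(x) = exp(-x^2)."""
    s = 1.0 + 4.0 * k * t
    return math.exp(-x * x / s) / math.sqrt(s)

k = 0.5                                   # assumed diffusivity
u_num = heat_via_fourier(0.8, 0.3, k)
u_exact = heat_closed_form(0.8, 0.3, k)
print(round(u_num, 6), round(u_exact, 6))
```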

11.7.3 Fourier Methods for the Laplace Equation

On a rectangular two-dimensional domain, we can apply Fourier series to solve a
boundary value problem for the Laplace equation. Consider as an example

$$\begin{aligned}
\Delta u(x, y) = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} &= 0, && 0 < x < l, \ 0 < y < p, \\
u(x, 0) &= 0, && 0 \le x \le l, \\
u(0, y) = u(l, y) &= 0, && 0 \le y \le p, \\
u(x, p) &= (l - x)\sin x, && 0 \le x \le l,
\end{aligned}$$

where l and p are positive numbers. Using separation of variables and Fourier series,
one can obtain the solution in the form of the infinite series

$$u(x, y) = \sum_{n=1}^{\infty} \frac{4l^2 n\pi\,[1 - (-1)^n\cos l]}{\sinh(n\pi p/l)\,(l^2 - n^2\pi^2)^2}\,\sin\left(\frac{n\pi x}{l}\right)\sinh\left(\frac{n\pi y}{l}\right).$$
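
A quick numerical check of such a series is to evaluate a partial sum on the boundary and compare it with the prescribed data. The Python sketch below does this for the assumed values l = 2 and p = 1 (with l not a multiple of π, so that no denominator vanishes). The y-dependence of each term is taken as sinh(nπy/l)/sinh(nπp/l), which is the form required by separation of variables for the conditions u(x, 0) = 0 and u(x, p) = (l − x) sin x; the function name and the number of terms are illustrative.

```python
import math

def laplace_series(x, y, l, p, terms=200):
    """Partial sum of the series solution on the rectangle (0, l) x (0, p)."""
    total = 0.0
    for n in range(1, terms + 1):
        a = n * math.pi / l
        coeff = (4.0 * l**2 * n * math.pi * (1.0 - (-1.0) ** n * math.cos(l))
                 / (l**2 - (n * math.pi) ** 2) ** 2)
        # sinh(a y) / sinh(a p), rewritten with decaying exponentials to avoid overflow
        ratio = (math.exp(a * (y - p)) * (1.0 - math.exp(-2.0 * a * y))
                 / (1.0 - math.exp(-2.0 * a * p)))
        total += coeff * math.sin(a * x) * ratio
    return total

l, p = 2.0, 1.0                            # assumed side lengths
top = laplace_series(0.7, p, l, p)         # should approximate (l - 0.7) * sin(0.7)
bottom = laplace_series(0.7, 0.0, l, p)    # should vanish
print(round(top, 4), round((l - 0.7) * math.sin(0.7), 4), bottom)
```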

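For the boundary value problem in the upper half plane

$$\Delta u(x, y) = 0, \quad -\infty < x < \infty, \ y > 0,$$

$$u(x, 0) = f(x), \quad -\infty < x < \infty,$$

one can obtain with the use of the Fourier transform the solution
$$\begin{aligned}
u(x, y) &= \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-|\xi|y}e^{-i\xi(\eta - x)}\,d\xi\right)f(\eta)\,d\eta \\
&= \frac{y}{\pi}\int_{-\infty}^{\infty} \frac{f(\eta)}{y^2 + (\eta - x)^2}\,d\eta.
\end{aligned}$$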

11.8 Exercises

11.11.1 Find the general solution of the following differential equations:

(a) $\frac{dy}{dt} = 0.03y$, (b) $\frac{dy}{dt} = ky$, $k > 0$, (c) $\frac{dy}{dt} = \sin 5t$.
11.11.2 A bank account earns interest continuously at a rate of 5% of the current
balance per year. Assume that the initial deposit is Rs 10,000 and
that no further deposits or withdrawals are made.
(a) Write the differential equation modeling the balance in the account.
(b) Solve the differential equation and draw the graph of the solution.
11.11.3 Solve the following differential equations:

(a) $x^2\,dy + y^2\,dx = 0$, (b) $dx - x^2\,dy = 0$, (c) $\frac{dy}{dx} + 5xy = 0$.
11.11.4 Solve the following differential equations:

(a) $x^3\frac{dy}{dx} + 3x^2 y = \cos x$, (b) $x\frac{dy}{dx} + y = \frac{1}{x^2}$.
11.11.5 Solve the following initial value problem:

$$\frac{dy}{dx} + 5y = 20, \quad y(0) = 2.$$
11.11.6 Solve the following initial value problem:

$$x\frac{dy}{dx} - 2y = \frac{y^4}{x^3}, \quad y(1) = \frac{1}{2}.$$
11.11.7 Solve the initial value problem

$$\frac{dy}{dx} = xy^2, \quad y(1) = -\frac{2}{3}.$$
11.11.8 In a city of India the rate at which the population grows at any time is
proportional to the size of the population. If the population was 125,000 in
1970 and 140,000 in 1990, what is the expected population in 2020?
11.11.9 A representative of a pharmaceutical company recommends that a new
drug of his company be given every T hours in doses of quantity y0 for
an extended period of time. Find the saturation level of the drug in the
patient’s body.
11.11.10 A simple model for the shape of a tsunami or a tidal wave is given by

$$\frac{dH}{dx} = H\sqrt{4 - 2H},$$

where H(x) ≥ 0 is the height of the wave expressed as a function of its
position relative to a point off shore.
(a) Find all constant solutions of the differential equation by inspection.
(b) Solve the tsunami model by separating variables.
(c) Draw the graph of the solution that satisfies the condition H(0) = 2.
11.11.11 A lady was found murdered in her home. Police arrived at 11 am. The
temperature of the body at that time was 31 °C, and one hour later 30 °C.
The temperature of the room where the body was discovered was 22 °C.
Estimate the time at which the murder was committed.
11.11.12 Examine whether the functions $f_1(x) = e^x$ and $f_2(x) = \sin x$ are linearly
independent.
11.11.13 Solve the following differential equations:

(a) $4y'' - 10y' + 25y = 0$, (b) $y'' - 16y' + 64y = 0$, (c) $y'' + 2y' + 2y = 0$.

11.11.14 Solve the initial value problem

$$4y'' - 4y' - 3y = 0, \quad y(0) = 1, \quad y'(0) = 5.$$

11.11.15 Show that $u(x, y) = \ln(x^2 + y^2)$ is a solution of the Laplace equation in
two dimensions, that is,

$$\Delta u(x, y) = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$

11.11.16 Show that u(x, t) = sin x cos t is a solution of the wave equation

$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2}.$$

11.11.17 Show that $u(x, t) = t^{-1/2}e^{-x^2/t}$ is a solution of the heat equation

$$\frac{\partial u}{\partial t} = \frac{1}{4}\,\frac{\partial^2 u}{\partial x^2}.$$
11.11.18 Find the general solution of

$$\frac{\partial^2 u}{\partial x^2}(x, y) = 0.$$
11.11.19 Solve the following boundary value problem:

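$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}, \quad 0 < x < l, \; t > 0,$$

$$u(0, t) = 0, \quad u(l, t) = 0, \quad t > 0,$$

$$u(x, 0) = 0, \quad \frac{\partial u}{\partial t}(x, 0) = h(x), \quad 0 < x < l,$$

where h is a given function with h(0) = h(l) = 0 and the constant c is the
wave speed.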
11.11.20 Solve the following boundary value problem:

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad 0 < x < 1,$$
subject to
$$u(0, t) = 0, \quad u(1, t) = 0, \quad t \ge 0,$$
$$u(x, 0) = x(1 - x), \quad 0 < x < 1.$$

11.11.21 Solve the following boundary value problem:


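$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad -\pi < x < \pi,$$
subject to

$$u(x, 0) = x, \quad -\pi < x < \pi,$$

$$u(-\pi, t) = u(\pi, t), \quad \frac{\partial u}{\partial x}(-\pi, t) = \frac{\partial u}{\partial x}(\pi, t), \quad t \ge 0.$$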
11.11.22 Solve the following boundary value problem:

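$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad 0 < x < a, \; 0 < y < b,$$

$$u(x, 0) = 0, \quad 0 \le x \le a,$$

$$u(0, y) = u(a, y) = 0, \quad 0 \le y \le b,$$

$$u(x, b) = x, \quad 0 < x < a.$$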


Chapter 12
Calculus with MATLAB

12.1 Introduction

MATLAB, developed by MathWorks, is a high-level language and interactive
environment for technical computing, used by engineers and scientists in industry,
government, and education. It enables us to perform computationally intensive tasks
faster than with traditional programming languages such as C, C++, and FORTRAN. The
main objective of this chapter is to introduce MATLAB and to highlight its role in a
better understanding of the concepts and methods of calculus. We focus our attention on
the visualization of scalar- and vector-valued functions and on the interpretation and
illustration of concepts such as limits, differentiation, integration, sequences, and
series, including Taylor and Fourier series, ordinary differential equations (ODEs),
and optimization.
MATLAB stands for MATrix LABoratory and is developed by MathWorks in the
United States. It is a high-performance language for technical computing that
integrates computation, visualization, and programming in an easy-to-use environment
where problems and solutions are expressed in familiar mathematical notation.
Today, MATLAB has evolved into a tool that finds a role in almost every field of
science and engineering. This chapter deals with some fundamentals of MATLAB.
MATLAB is very versatile software with applications in almost all areas
of engineering and technology. It was originally developed to solve mathematical
problems; later it was extended to other areas of
engineering and technology such as signal processing, image processing, control
systems, fuzzy logic, neural networks, and many more. To see other areas, one can
consult the MATLAB help and the toolboxes.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory and
advanced courses of mathematics, engineering, and sciences. In industry, MATLAB
is the tool of choice for high productivity, research, development, and analysis.

© Springer Nature Singapore Pte Ltd. 2019 475


M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6_12

12.2 Important Elements of MATLAB

12.2.1 Advantages of MATLAB

MATLAB has many advantages compared with conventional computer languages
or programs for solving technical problems, which can be summarized as follows:
1. Easy to use: MATLAB is an interpreted language and is very easy to use. There
are a number of built-in functions that are optimized for particular problems, so
certain mathematical expressions can be evaluated just by passing
values to these functions. With their help, large mathematical
problems can be solved in very few lines of code, whereas the same
problems written in other languages may require
several hundred lines.
2. Platform Independence: MATLAB is supported on various operating
systems such as Windows XP/Vista, UNIX, and Mac OS X. A program written
on one platform can run on another platform.
3. Powerful Graphics: This is one of the important features of MATLAB, which
makes it well suited to technical data analysis and interpretation. MATLAB supports
a wide variety of ways to plot data so that they can be interpreted well, including colored
2D and 3D plots, animation, and videos.
4. Graphical User Interface: MATLAB includes tools that allow a programmer to
interactively develop a graphical user interface (GUI) for a program. With this
capability, the programmer can design sophisticated data analysis programs that can
be operated by relatively inexperienced users.
5. MATLAB Help: The MATLAB help system is powerful and user-friendly. It
gives very good help on almost every topic and command. Users can get help
both offline and online (through the Internet). MATLAB has hundreds of examples
(demos) on various problems that show users how to write good programs.
6. MATLAB Compiler: MATLAB's flexibility and platform independence are
achieved by compiling MATLAB programs into device-independent p-code
and then interpreting the p-code instructions at run time. This, however, slows down the
execution of programs. A separate MATLAB compiler is also available, which
can compile a MATLAB program directly into an executable (exe) file that runs faster.

12.2.2 How to Run MATLAB?

Let us first start with the MATLAB environment. We can open the MATLAB window by
double-clicking MATLAB's icon on the desktop; the window shown in Fig. 12.1 then appears.
The MATLAB main window is divided into three parts: Workspace/Current Directory,
Command History, and Command Window.
The workspace shows the variables currently in use together with their properties,
such as size, type, maximum value, and minimum value. Pressing the Current
Directory tab shows the contents of the current directory. The Command
Window is the place where we type MATLAB commands to be executed.
MATLAB file formats: MATLAB can read and write various types of files. However,
there are mainly five types of files for storing data or programs that one
encounters frequently. They are
(a) M-Files: These are standard ASCII text files with .m extension. They are written
in MATLAB editor. Our main program files are written in M-file formats. Later,
we will show one example of M-file.
(b) MAT Files: These are binary files with .mat extension. These files are created
when we save variables of MATLAB workspace.
(c) Fig Files: These are binary files with .fig extension to store graphics (i.e., curves
obtained by plotting the data).
(d) P Files: These are compiled M-files with .p extension. Mainly, these files are used
for distribution purpose of MATLAB programs with hidden MATLAB code.
(e) MEX Files: These are MATLAB callable C and FORTRAN programs with .mex
extension. These files are used in interfacing MATLAB with C or FORTRAN.

Fig. 12.1 Main window



Starting MATLAB:
Variables, Operators, and Matrices. Let us start with simple MATLAB commands
related to matrices. How do we enter a matrix?
Simply type the following at the command prompt in the Command Window:
>> a = [1 2 3; 5 7 4; 9 8 6]
and press Enter. We will see the following in the Command Window:

a =
1 2 3
5 7 4
9 8 6

It shows that our 3 × 3 matrix is stored in the variable a. This variable can be seen in
the workspace window. Note that a semicolon is used to separate the rows.
Now let us do some basic operations on this matrix. Simply type a' at the command
prompt (>> a') and press Enter. We get

ans =
1 5 9
2 7 8
3 4 6

Obviously, it is the transpose of matrix a. Here ans is the default variable provided
by MATLAB if we do not specify our own variable. Similarly, we can do other
operations. Students are advised to try the following commands at the command
prompt:
1. >> inv(a) [Gives the inverse of matrix a]
2. >> det(a) [Determinant of matrix a]
3. >> eig(a) [Eigenvalues of a; [V, D] = eig(a) also returns the eigenvectors]
4. >> diag(a) [Gives the elements of the main diagonal]
5. >> rank(a) [Finds the rank of matrix a]
Now let us do some operations on two matrices, such as addition, subtraction,
multiplication, etc.
Matrix a is already stored in memory; now enter another matrix b as given below:
>> b = [2 4 5; 1 6 8; 3 4 6];
Here, we have put a semicolon at the end of the command. The semicolon
suppresses the output in the Command Window, which is useful
when we generate very large matrices.
Students are advised to try the following operations on the two matrices a and b:
1. >> c = a + b [Sum of matrices a and b; the result is stored in c]
2. >> c = a - b [Subtraction of matrix b from a]
3. >> c = a * b [Matrix product of a and b]
4. >> c = a .* b [Element-by-element multiplication of a and b]
5. >> c = a ./ b [Element-by-element division of a by b]
There is another operator called left division ('\'), which is used in solving systems
of linear algebraic equations (see the MATLAB help for details).
Besides these arithmetic operators, MATLAB also has relational and logical operators;
please see the MATLAB help for these.
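The difference between the matrix operators and their element-by-element counterparts is easiest to see on a small case. The following is a minimal sketch; the 2 × 2 matrices p and q are chosen here only for illustration and do not come from the text.

```matlab
% Illustrative matrices (an assumption, not the matrices a and b of the text).
p = [1 2; 3 4];
q = [5 6; 7 8];

matrixProduct  = p * q;    % rows of p times columns of q
elementProduct = p .* q;   % entry-by-entry product
elementRatio   = p ./ q;   % entry-by-entry quotient

disp(matrixProduct)        % [19 22; 43 50]
disp(elementProduct)       % [ 5 12; 21 32]
disp(elementRatio)
```

The operator without the dot follows the rules of linear algebra, whereas the dotted operators act on corresponding entries and therefore require operands of matching size.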
Writing script files (M-files): In the previous section, we have seen some basic
operations performed by typing commands at the command prompt in the Command
Window. This is easy when we have to perform a single, simple operation
on a variable. But when we have to perform several complicated and
interdependent operations, using the command prompt is no longer practical. MATLAB
provides an editor in which we can write programs called script files, which
can be run; the results are displayed in the Command Window, or in a figure
window if graphs are requested in the program. Here we give one very simple
example of writing an M-file to solve a system of linear algebraic equations. The
equations are given below:

x + 2y − z = 10, 4x + 6y + z = 20, x − 8y + 3z = 8.

These three equations can be written in matrix form as A x = B:

$$\begin{bmatrix} 1 & 2 & -1 \\ 4 & 6 & 1 \\ 1 & -8 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 10 \\ 20 \\ 8 \end{bmatrix}$$

and the solution can be obtained as x = A^{-1} B (in MATLAB syntax, x =
inv(A) * B).
The solution can be found by writing an M-file in the editor and executing it. To open
the editor window, go to the File menu of the MATLAB window, select New > M-file,
and write the following code:
% Solution of Linear Algebraic Equations.
A = [1 2 -1; 4 6 1; 1 -8 3];
B = [10; 20; 8];
xyz = inv(A) * B;
xyz
After executing it, we get the following result:

xyz =
8.6667
−1.6667
−4.6667

Fig. 12.2 Editor window

The first line, which starts with %, is a comment line and is not executed.
The editor window is shown in Fig. 12.2.
Now, how do we run the program from the editor window? It is easy: just press the Run
button. MATLAB will ask us to save the program; save it first. If we are not in
the current directory, MATLAB will ask us to change the directory; allow the
directory change, and the program is run and the result is shown in the Command Window.
A second, cruder method is to copy the whole program, paste it at the command prompt in
the Command Window, and press Enter; the program is then run without being saved.
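The linear system solved above with inv(A) * B can also be solved with the left-division operator mentioned earlier, which avoids forming the inverse explicitly. The sketch below reuses A and B from the text; the variable name residual is introduced here only for the check.

```matlab
% Same system as in the text: A * xyz = B.
A = [1 2 -1; 4 6 1; 1 -8 3];
B = [10; 20; 8];

xyz = A \ B;                   % left division solves A * xyz = B directly
residual = norm(A * xyz - B);  % should be close to zero

disp(xyz)
disp(residual)
```

For a single right-hand side, left division is generally the preferred idiom, since it solves the system by factorization rather than by computing inv(A) first.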

12.2.3 MATLAB Functions

MATLAB has a rich library of built-in functions to perform various tasks in many
fields of science and engineering. Here, we illustrate the use of some very basic
MATLAB functions. One of them we have already used, namely inv(a). Here, inv() is
a function to which the parameter a is passed and which returns the result.
The reader is encouraged to explore the MATLAB help for various functions for
different applications (Fig. 12.3).
Besides MATLAB's built-in functions, users can write their own functions. A user-defined
function is written in a separate M-file and saved under the name of that function
(i.e., functionname.m).

Fig. 12.3 MATLAB’s Current Directory

Here we give one example of how a user-defined function can be created. We
have written a function that solves the quadratic equation ax^2 + bx + c = 0:

function [x1, x2] = eqsolve2(a, b, c)
discr = b^2 - 4*a*c; % finding the discriminant
x1 = (-b + sqrt(discr))/(2*a);
x2 = (-b - sqrt(discr))/(2*a);

This function file is first saved with the name "eqsolve2.m" in some directory. The
function can then be used to calculate the two roots simply by passing the values of
"a", "b", and "c" at the command prompt as given below:

>> [x1, x2] = eqsolve2(2, 5, 1)


x1 =
−0.2192
x2 =
−2.2808

And another example is given below:

>> [x1, x2] = eqsolve2(2, 4, 5)


x1 =
−1.0000 + 1.2247i
x2 =
−1.0000 − 1.2247i

Such an M-file is called a function file. Functions can also be called from a main file;
main M-files are called script files. To run the above function from the command prompt,
make sure that the directory in which the function file is saved is the one shown
in MATLAB's current directory field. Likewise, when a main script
file uses several function files, all of them must be in the same directory.
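The roots returned by eqsolve2 can be cross-checked against MATLAB's built-in polynomial root finder. The following sketch uses roots and polyval for the first example, 2x^2 + 5x + 1 = 0; the variable names coeffs, r, and check are chosen here for illustration.

```matlab
% Coefficients of 2x^2 + 5x + 1, as in the first call to eqsolve2.
coeffs = [2 5 1];

r = roots(coeffs);            % built-in polynomial root finder
r = sort(r, 'descend');       % larger root first, to match x1 and x2
check = polyval(coeffs, r);   % polynomial evaluated at the roots

disp(r)                       % compare with x1 = -0.2192, x2 = -2.2808
disp(check)                   % both entries should be close to zero
```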

12.3 Visualization of Scalar- and Vector-Valued Function

Let us discuss an important feature of MATLAB: its powerful graphics.
MATLAB can present our output and help us interpret
data graphically with various types of curves and plots.

12.3.1 Plotting Scalar Functions with MATLAB

Two-dimensional (2D) plots. To plot a function, we have to create two arrays (vectors),
one containing the abscissas and the other the corresponding function values. Let us plot
f(x) = sin(x). Type the following commands at the command prompt:

>> x = -2*pi : pi/100 : 2*pi; % range of x from -2π to +2π in steps of π/100.
>> fx = sin(x); % sin() computes the sine of all x.
>> plot(x, fx) % plots fx versus x.
>> grid % creates a grid in the plot.

After entering all these commands and pressing the Enter key, we get the curve
shown in Fig. 12.4.
Let us now plot f(t) = e^{-t/10} sin t. Type the following commands at the command
prompt:
>> t = 0 : 0.01 : 50;
>> ft = exp(-t/10) .* sin(t);
>> plot(t, ft)
>> grid
In the above two examples, we have used the plot() function for plotting. Now, we will
see how we can combine two or more plots in one window. For simplicity,
let us combine the plot of the function f_1(t) = e^{-t/10} sin t (Fig. 12.5) with another function
f_2(t) = e^{-t/10} in the same figure window (Fig. 12.6). Type the following commands:
>> t = 0 : 0.01 : 50;
>> ft1 = exp(-t/10) .* sin(t);
>> ft2 = exp(-t/10);
>> plot(t, ft1, t, ft2) % This is how we can combine two plots.
>> grid
Similarly, we can add several graphs to the same window with the command plot
(x, fx, y, fy, z, fz, ...). Please see the MATLAB help for more details.
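When several curves share one window, line styles, axis labels, and a legend make the figure easier to read. The following is a minimal sketch for the two functions used above; the style strings, labels, and title are illustrative choices, not taken from the text.

```matlab
t  = 0 : 0.01 : 50;
f1 = exp(-t/10) .* sin(t);   % damped oscillation
f2 = exp(-t/10);             % its envelope

plot(t, f1, 'b-', t, f2, 'r--')   % solid blue curve, dashed red curve
xlabel('t')
ylabel('f(t)')
legend('e^{-t/10} sin t', 'e^{-t/10}')
title('Damped oscillation and its envelope')
grid on
```

The third argument of each (x, y, style) triple selects the color and line type; the MATLAB help on plot lists the available style codes.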
We can plot several other types of 2D plots in MATLAB with different attributes
such as colors, styles, etc. Some more examples are listed in Table 12.1;
their output graphs are shown in Fig. 12.7.

Fig. 12.4 Plot of a curve

Fig. 12.5 Plot of f (t) = e−t/10 sin t



Fig. 12.6 Combining two curves in a single plot

Fig. 12.7 Various types of MATLAB plots



Table 12.1 Some MATLAB functions and their code

Function      MATLAB code
stairs()      >> x = 0 : 0.1 : 10;
              >> y = exp(-x) .* sin(x);
              >> stairs(x, y)
area()        >> x = 0 : 0.1 : 10;
              >> y = exp(-x) .* sin(x);
              >> area(x, y)
stem()        >> x = -4*pi : pi/5 : 4*pi;
              >> y = sin(x) ./ x;
              >> y((length(y) - 1)/2 + 1) = 1;
              >> stem(x, y)
bar()         >> x = -4*pi : pi/5 : 4*pi;
              >> y = sin(x) ./ x;
              >> y((length(y) - 1)/2 + 1) = 1;
              >> bar(x, y)
semilogx()    >> x = 0 : 0.1 : 10;
              >> y = x .* exp(-x);
              >> semilogx(x, y)
              >> grid
semilogy()    >> x = 0 : 0.1 : 10;
              >> y = x .* exp(-x);
              >> semilogy(x, y)
              >> grid
loglog()      >> x = 0 : 0.1 : 10;
              >> y = x .* exp(-x);
              >> loglog(x, y)
              >> grid
polar()       >> theta = 0 : pi/100 : 2*pi;
              >> r = sqrt(abs(sin(4*theta)));
              >> polar(theta, r)

There are several other types of plots as well; see the MATLAB help
(Table 12.1).
Three-dimensional (3D) plots. Let us have a look at 3D plots. Three-dimensional
graphs can be plotted for functions of two variables such as z = f(x, y). Here, we will
plot x(t) = e^{-t/10} sin(t) versus y(t) = e^{-t/10} cos(t) along the "t" axis (see Fig. 12.8).
Type the following commands:

Fig. 12.8 3D plot of a spiraling curve

>> t = 0 : 0.01 : 30;
>> x = exp(-0.1*t) .* sin(t);
>> y = exp(-0.1*t) .* cos(t);
>> plot3(x, y, t) % 3D plot command.
>> grid
This gives the spiraling curve of Fig. 12.8.
MATLAB has several specialized 3D plots. Just try this one:
>> [x, y] = meshgrid(-8 : 0.5 : 8);
>> r = sqrt(x.^2 + y.^2) + eps;
>> z = sin(r) ./ r;
>> mesh(x, y, z)
This gives the following result (Fig. 12.9):
One can try the following plot commands on the same data and see the results:
>> surf(x, y, z)
>> contour(z)
>> surfc(x, y, z)
>> surfl(x, y, z)
>> meshz(x, y, z)
>> waterfall(z)
Please see the help on each command in the MATLAB help.

Fig. 12.9 3D plot of a surface

12.3.2 Plots for Vector-Valued Functions in 2D and 3D


In various applications, we need to visualize vector-valued
functions (vector fields). MATLAB has several functions to visualize vector fields
in 2D and 3D, such as quiver, quiver3, stream2, stream3, streamline,
streamslice, streamtube, streamribbon, streamparticles, coneplot, divergence, curl,
etc. Here we plot the gradient field of the function z = x e^{-(x^2 + y^2)}.
First, we take a quiver plot. Just write the following code at the command prompt:
>> [x, y] = meshgrid(-2 : 0.1 : 2);
>> z = x .* exp(-x.^2 - y.^2);
>> [dx, dy] = gradient(z);
>> quiver(x, y, dx, dy)
We get the following plot (Fig. 12.10):
Now we use the streamline function to plot the vector field u i + v j, where u = f(x, y) and
v = g(x, y) are given as

u = x + y − x(x^2 + y^2) and v = −x + y − y(x^2 + y^2).

This can be achieved by writing the following code at the command prompt:
>> [x, y] = meshgrid(-2 : 0.1 : 2);
>> u = x + y - x .* (x.^2 + y.^2);
>> v = -x + y - y .* (x.^2 + y.^2);
>> x0 = [-2 -2 -2 -2 -0.5 -0.5 0.5 0.5 2 2 2 2 -0.01 -0.01 0.01 0.01];
>> y0 = [-2 -0.5 0.5 2 -2 2 -2 2 -2 -0.5 0.5 2 -0.01 0.01 -0.01 0.01];
>> streamline(x, y, u, v, x0, y0)
>> axis square

Fig. 12.10 Visualization of a 2D vector field

Fig. 12.11 Streamlines

We get this result (Fig. 12.11)


For other types of plots, kindly go through the MATLAB help.
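A quiver plot of a gradient field is often easier to interpret when it is drawn on top of the level curves of the underlying function. The following sketch does this for z = x e^{-(x^2 + y^2)}; the coarser grid spacing of 0.2 and the number of contour levels are illustrative choices made here to keep the arrows legible.

```matlab
[x, y] = meshgrid(-2 : 0.2 : 2);
z = x .* exp(-x.^2 - y.^2);
[dx, dy] = gradient(z, 0.2, 0.2);  % grid spacing 0.2 in both directions

contour(x, y, z, 10)   % ten level curves of z
hold on
quiver(x, y, dx, dy)   % gradient arrows, perpendicular to the level curves
hold off
axis equal
```

Passing the grid spacing to gradient scales the arrows as approximate partial derivatives rather than as differences per grid step.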

12.4 Certain Topics of Calculus with MATLAB

There are several areas of mathematics where MATLAB has been of utmost value,
but here we confine ourselves to the interaction of MATLAB and calculus. More
precisely, in this section, we demonstrate the relevance of MATLAB to concepts of calculus
such as differentiation and integration, limits of functions, sequences and series,
ordinary differential equations (ODEs), optimization (finding minima and maxima),
and Fourier analysis.

12.4.1 Differentiation and Integration

Differentiation and integration can be done in two ways, either numerically or symbolically.
For symbolic differentiation and integration, we need the
Symbolic Math Toolbox, which incorporates symbolic computation into the numeric
environment of MATLAB.
Symbolic Integration. First of all, we have to define symbolic variables with the syms
command as given below:
>> syms x t
Then define a function of x or t and apply the int() function as given below:
>> fx = 1/(1 + x^2);
>> y = int(fx) % computes the indefinite integral of 1/(1 + x^2).
We get
y =
atan(x) % which is tan^{-1}(x)
We can also visualize these results with the ezplot() function; just add the following code to
the above integration code:
>> ezplot(fx)
>> hold on
>> ezplot(y)
We get Fig. 12.12
This was an example of an indefinite integral. We can compute definite integrals too; see
the code given below:
>> ft = t * log(1 + t);
>> y = int(ft, 0, 1) % ft is integrated from 0 to 1.
y =
1/4
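The value 1/4 can be confirmed by hand with integration by parts. The following is a verification sketch added for the reader, not part of the MATLAB session:

```latex
\begin{aligned}
\int_0^1 t \ln(1+t)\,dt
  &= \left[\frac{t^2}{2}\ln(1+t)\right]_0^1 - \frac{1}{2}\int_0^1 \frac{t^2}{1+t}\,dt \\
  &= \frac{\ln 2}{2} - \frac{1}{2}\int_0^1 \left(t - 1 + \frac{1}{1+t}\right)dt \\
  &= \frac{\ln 2}{2} - \frac{1}{2}\left(\frac{1}{2} - 1 + \ln 2\right)
   = \frac{1}{4}.
\end{aligned}
```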
Symbolic Differentiation. Similarly, for the expression t^2, we find the derivative
as
>> ft = t^2;
>> y = diff(ft)
y =
2*t

Fig. 12.12 Symbolic integration

Fig. 12.13 Symbolic differentiation

We can visualize the result with ezplot() as well (see Fig. 12.13).
We can obtain the second derivative with the diff(ft, 2) command:
>> y1 = diff(ft, 2)
y1 =
2
Similarly, we can obtain the nth derivative with diff(fx, n).
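Differentiation can also be carried out numerically on sampled data, without the Symbolic Math Toolbox. The following is a minimal sketch using the built-in gradient function on samples of t^2; the step size h = 0.01 and the interval [0, 2] are assumptions made here for illustration.

```matlab
h  = 0.01;                 % step size (an illustrative choice)
t  = 0 : h : 2;
ft = t.^2;                 % samples of f(t) = t^2

dft = gradient(ft, h);     % central differences in the interior points

% Compare with the exact derivative 2*t away from the two end points,
% where gradient falls back to one-sided differences.
maxErr = max(abs(dft(2:end-1) - 2*t(2:end-1)));
disp(maxErr)
```

For a quadratic, central differences reproduce the derivative up to rounding error; for general functions the error decreases with the square of the step size.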


Numerical Integration. In MATLAB, we can also integrate numerically. For this,
MATLAB has several functions such as quad, quadl, quadgk, quadv, dblquad,
and triplequad. Here we introduce only the quad function, which integrates
numerically using an adaptive Simpson quadrature rule.
The general syntax is >> y = quad('fx', a, b), which evaluates

$$y = \int_a^b f(x)\,dx.$$

Let us evaluate

$$\int_0^2 e^{-x^2}\,dx$$

using the quad function. First, we have to write
a function file that describes the integrand; then, at the command prompt, we can
use that file to evaluate the integral for any combination of limits. The code for the function
file, written as an M-file in the editor, is given below.
% function file describing the function f(x) to be used in quad()
function y = myfunction(x)
y = exp(-x.^2);
(The above function file has to be saved with the name myfunction.m.)
To evaluate the integral, write the following at the command prompt:
>> y = quad('myfunction', 0, 2)
y =
0.8821
There is another simple way, in which no M-file is required and everything is typed at the
command prompt. The code is given below:
>> fx = inline('exp(-x.^2)');
>> y = quad(fx, 0, 2)
y =
0.8821
Double integrals can also be computed, using the dblquad function. The general
syntax is >> y = dblquad(fxy, xmin, xmax, ymin, ymax), which evaluates

$$y = \int_{y_{\min}}^{y_{\max}} \int_{x_{\min}}^{x_{\max}} f(x, y)\,dx\,dy.$$

Let us evaluate

$$\int_{0}^{\pi} \int_{\pi}^{2\pi} (y \sin x - x \sin y)\,dx\,dy$$

using the dblquad function. The code is given below:
>> fxy = inline('y.*sin(x) - x.*sin(y)');
>> q = dblquad(fxy, pi, 2*pi, 0, pi)
q =
-39.4784
For details about other functions, please see MATLAB's help.
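In more recent MATLAB releases, the same two integrals can be computed with the functions integral and integral2 together with anonymous function handles, so that neither an M-file nor inline is needed. The sketch below repeats both examples; the variable names f, g, I1, and I2 are chosen here for illustration.

```matlab
% Single integral of exp(-x^2) from 0 to 2.
f  = @(x) exp(-x.^2);
I1 = integral(f, 0, 2);

% Double integral of y*sin(x) - x*sin(y) over pi <= x <= 2*pi, 0 <= y <= pi.
g  = @(x, y) y.*sin(x) - x.*sin(y);
I2 = integral2(g, pi, 2*pi, 0, pi);

fprintf('%.4f  %.4f\n', I1, I2)   % compare with 0.8821 and -39.4784
```

The exact value of the double integral is -4π^2, which agrees with the number reported by dblquad above.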

12.4.2 Finding Limits of Functions

The fundamental idea in calculus is to make calculations on functions as a variable
"gets close to" or approaches a certain value. Recall that the derivative
is defined by the limit

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h},$$

provided this limit exists. In MATLAB, the general
syntax is >> limit(fx, x, limit value). We can find a limit as given below:
>> syms x
>> limit(sin(x)/x, x, 0) % here fx = sin(x)/x and x → 0
ans =
1
We can also find one-sided limits, from the left or from the right. The general syntax
is
>> limit(fx, x, limit value, 'right') or >> limit(fx, x, limit value, 'left'). Let us
find the right and left limits of the function f(x) = x/|x| as x approaches 0. The code
is given below:
>> limit(x/abs(x), x, 0, 'right')
ans =
1
and
>> limit(x/abs(x), x, 0, 'left')
ans =
-1
Now, we will see one application of limits: drawing the tangent line at a point on a
given curve. Let the given curve be y = x^2 + 1; we want to plot the tangent line at the
point (2, 5). If P(x_0, y_0) is a point on the graph of a function f, then the tangent line
to the graph of f at P, also called the tangent line to the graph of f at x_0, is the line
through P with slope

$$m_{\tan} = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h},$$

provided this limit exists. If the limit does not exist, then by agreement the graph has
no tangent line at P.
The program code is given below:
% Program to draw the tangent line at a point on a curve.
syms x h
x0 = 2; % the tangent is to be drawn at the point (2, 5)
y0 = 5;
y = x^2 + 1; % function declaration
y1 = subs(y, x, h + x0); % f(x0 + h)
y2 = subs(y, x, x0); % f(x0)
y3 = (y1 - y2)/h; % difference quotient
m = limit(y3, h, 0); % slope of the tangent line
tline = m*(x - x0) + y0; % tangent line; named tline so as not to shadow the built-in line
ezplot(y)
hold on
ezplot(tline)
plot(x0, y0, 'ro', 'MarkerFaceColor', 'g')
v = [-8 8 -10 40];
axis(v)
grid
We get the following plot (Fig. 12.14).
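For this particular curve the limit is easy to evaluate by hand, which gives a check on the program. With f(x) = x^2 + 1 and x_0 = 2:

```latex
\begin{aligned}
m_{\tan} &= \lim_{h \to 0} \frac{\bigl((2+h)^2 + 1\bigr) - 5}{h}
          = \lim_{h \to 0} \frac{4h + h^2}{h}
          = \lim_{h \to 0} (4 + h) = 4, \\
y &= 4(x - 2) + 5 = 4x - 3 .
\end{aligned}
```

The tangent line drawn by the program should therefore be y = 4x - 3.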

Fig. 12.14 Tangent line at a point on a curve

12.4.3 Sequences and Series

We can compute the sums of finite and infinite series, if they exist, using the symsum() function.
To find the sum

$$\sum_{k=l}^{m} f(k),$$

the general syntax is >> symsum(fk, l, m).
Let us take the example

$$\sum_{k=1}^{\infty} \frac{1}{k^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots.$$

The code is given below:
>> syms x k
>> s1 = symsum(1/k^2, 1, inf)
s1 = 1/6*pi^2
If the series is given as 1 + x + x^2 + x^3 + x^4 + x^5 + ..., then the code is as follows:
>> syms x k
>> s2 = symsum(x^k, k, 0, inf)
s2 =
-1/(x - 1)
This closed form is valid for |x| < 1.
In symsum, the default variable is k.
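Both symbolic results can be checked numerically with partial sums. The sketch below uses cumsum and sum; the number of terms (100000 and 51) and the sample value x = 0.5 are illustrative choices made here.

```matlab
% Partial sums of 1/k^2 compared with pi^2/6.
k = 1 : 100000;
partial = cumsum(1 ./ k.^2);
gap = pi^2/6 - partial(end);    % remaining tail, roughly 1/N for N terms
disp(gap)

% Geometric series at x = 0.5 compared with -1/(x - 1).
x = 0.5;
geo = sum(x .^ (0:50));
disp([geo, -1/(x - 1)])         % both values are close to 2
```

The slow 1/N decay of the tail shows why the symbolic closed form is more useful here than brute-force summation.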
Now we will see how a Taylor series can be obtained. The Taylor
series of f about a is

$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^n.$$

The first n terms of the expansion about a can be
found by the following command:
>> taylor(fx, n, a),
where fx: function of x,
n: gives the polynomial of order n − 1, and
a: the point around which the polynomial is calculated.
Let us graphically compare f(x) = sin(x), −4 < x < 4, with its Taylor polynomial
of fourth order computed around a = 0 (see Fig. 12.15). The code is given below:
>> syms x;
>> fx = sin(x);
>> fxx = taylor(fx, 5, 0) % we can write taylor(fx, 5) also
fxx =
x - 1/6*x^3
For plotting the functions, write the following code:
>> ezplot(fx)
>> hold on
>> ezplot(fxx)

Fig. 12.15 Taylor approximation of the sine function
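The quality of the Taylor approximation can also be quantified numerically. The sketch below evaluates the error of the polynomial x − x^3/6 on −4 ≤ x ≤ 4; the sampling with linspace and the names p, err, and errNear are illustrative choices made here.

```matlab
x = linspace(-4, 4, 401);
p = x - x.^3/6;                 % Taylor polynomial of sin(x) about 0
err = abs(sin(x) - p);          % pointwise approximation error

errNear = abs(sin(0.5) - (0.5 - 0.5^3/6));   % error close to the origin

plot(x, err)
xlabel('x')
ylabel('|sin(x) - p(x)|')
grid on
```

The error is small near the expansion point and grows rapidly toward the ends of the interval, which is what the comparison plot of the two curves shows qualitatively.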

12.4.4 Solving Ordinary Differential Equations (ODEs)

Differential equations are a very important part of calculus because of their role in applied areas
of mathematics, for instance in the modeling of physical systems. In MATLAB,
we can solve differential equations in two ways, either symbolically or numerically.
First, we start with the symbolic solution of ODEs.
Symbolic Solution. The function dsolve computes symbolic solutions to ordinary
differential equations. The equations are specified by symbolic expressions containing
the letter D to denote differentiation. The symbols D2, D3, ..., DN correspond to the
second, third, ..., Nth derivative, respectively. Thus, D2y is the equivalent of d^2y/dt^2.
We can specify initial conditions as well. For example, let us solve

$$\frac{dx}{dt} = x + t$$

with the initial condition x(0) = 0. Write the following code:
>> x = dsolve('Dx = x + t', 'x(0) = 0')
x =
-1 - t + exp(t)
With the ezplot function, we can plot this solution also (Fig. 12.16).

Fig. 12.16 Solution of a differential equation
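The solution returned by dsolve can be verified by direct substitution. With x(t) = e^t − 1 − t:

```latex
\begin{aligned}
\frac{dx}{dt} &= e^{t} - 1, \\
x + t &= (e^{t} - 1 - t) + t = e^{t} - 1, \\
x(0) &= e^{0} - 1 - 0 = 0 .
\end{aligned}
```

Both sides of the differential equation agree and the initial condition holds, so the symbolic result is confirmed.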
We can solve higher order differential equations also. Let the differential equation
to be solved be

$$\frac{d^2 y}{dt^2} = \cos 2t - y,$$

with initial conditions y(0) = 1 and dy/dt(0) = 0.
To solve it, write the following code:
>> y = dsolve('D2y = cos(2*t) - y', 'y(0) = 1', 'Dy(0) = 0')
y =
4/3*cos(t) - 1/3*cos(2*t)
Numerical Solution. MATLAB has several built-in functions to solve various types
of ODEs numerically; they are listed in Table 12.2.

Table 12.2 ODE solvers in MATLAB

Solver     Solves these kinds of problems                      Method
ode45      Nonstiff differential equations                     Runge–Kutta
ode23      Nonstiff differential equations                     Runge–Kutta
ode113     Nonstiff differential equations                     Adams
ode15s     Stiff differential equations and DAEs               NDFs (BDFs)
ode23s     Stiff differential equations                        Rosenbrock
ode23t     Moderately stiff differential equations and DAEs    Trapezoidal rule
ode23tb    Stiff differential equations                        TR-BDF2
ode15i     Fully implicit differential equations               BDFs

Here, we take one simple example using ode45(). The general syntax is
>> [time, solution] = ode45(myfunction, tspan, xo), where myfunction is
the function describing the ODE, tspan is the time interval on which we want the solution,
and xo is the initial condition. Let us take the same ODE that we solved symbolically above. This
ODE can be solved in two ways, either by writing a function file describing the ODE or by
using an inline function. The code is given below:
>> myode = inline('x + t', 't', 'x');
>> [t, x] = ode45(myode, [0 2], 0);
>> plot(t, x) (see Fig. 12.17).

Fig. 12.17 Numerical solution of an ODE

We can also code this in an alternate form by writing a function file. The code is given
below:
% function file saved as myode.m
function dxdt = myode(t, x)
dxdt = x + t;
At the command prompt, we then write the following:
>> [t, x] = ode45('myode', [0 2], 0);
>> plot(t, x)
We can solve higher order ODEs also. For this, we have to represent them as a system of
first-order ODEs. For example, let us solve

$$\frac{d^2 y}{dt^2} - (1 - y^2)\frac{dy}{dt} + y = 0$$

with initial conditions y(0) = 2 and dy/dt(0) = 0.
Before writing the code, we have to express the second-order ODE as a system of first-order
ODEs. The ODE can be written as

$$\ddot{y} = (1 - y^2)\,\dot{y} - y. \tag{1}$$

Let

$$y = x_1 \quad \text{and} \quad x_2 = \dot{x}_1. \tag{2}$$

Then the equation in y can be written as

$$\dot{x}_2 = (1 - x_1^2)\,x_2 - x_1. \tag{3}$$

Equations (2) and (3) can be written as

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ (1 - x_1^2)\,x_2 - x_1 \end{bmatrix}, \tag{4}$$

which is of the form ẋ = f(x).
Now we write a function file that describes equation (4) (saved as myode.m, replacing
the earlier file of that name). The code is given below:
function xdot = myode(t, x)
xdot = [x(2); (1 - x(1)^2)*x(2) - x(1)];
Then write the following commands at the command prompt:
>> [t, x] = ode45('myode', [0 20], [2; 0]);
>> plot(t, x)
The result is shown in Fig. 12.18.

Fig. 12.18 Solution of an ODE of higher order
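The same second-order equation can be solved without a separate function file by passing an anonymous function handle to ode45. The following is a minimal sketch under that assumption; the name vdp and the legend labels are chosen here for illustration.

```matlab
% Right-hand side of the first-order system: x(1) = y, x(2) = dy/dt.
vdp = @(t, x) [x(2); (1 - x(1)^2)*x(2) - x(1)];

[t, x] = ode45(vdp, [0 20], [2; 0]);   % y(0) = 2, dy/dt(0) = 0

plot(t, x(:,1), '-', t, x(:,2), '--')
legend('y', 'dy/dt')
xlabel('t')
grid on
```

Each column of x holds one component of the solution, so x(:,1) is y(t) and x(:,2) is its derivative.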

12.4.5 Animated Phase Portraits of Nonlinear and Chaotic Dynamical Systems*

Introduction. The aim of this section is to present programs that highlight
the slow–fast evolution of the solutions of nonlinear and chaotic dynamical systems
such as the Van der Pol, Chua, and Lorenz models. These programs provide animated
phase portraits in dimensions two and three, i.e., a "step-by-step integration"; they are
useful tools for understanding the dynamics of such systems.
Van der Pol model. The oscillator of Van der Pol [7] is a second-order system with
nonlinear friction, which can be written as

$$\frac{d^2 y}{dt^2} - \alpha(1 - y^2)\frac{dy}{dt} + y = 0.$$

The particular form of the friction, which can be realized by an electric circuit,
causes a decrease of the amplitude of the large oscillations and an increase of the
small ones. This equation constitutes the "paradigm of relaxation oscillations". According
to d'Alembert's transformation [1], any single nth-order differential equation may be
transformed into a system of n simultaneous first-order equations and conversely.
Let us write

$$\alpha(1 - y^2)\,\dot{y} = \alpha\,\frac{d}{dt}\left(y - \frac{y^3}{3}\right)$$

and let us set x_1 = y and y = −α ẋ_2. Thus, we have

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} \alpha\left(x_1 + x_2 - \dfrac{x_1^3}{3}\right) \\ -\dfrac{1}{\alpha}\,x_1 \end{bmatrix}.$$

When α becomes very large, x_1 becomes a "fast" variable and x_2 a "slow" variable.
In order to analyze the limit α → ∞, we introduce a small parameter ε = 1/α^2 and a
"slow time" t' = t/α = √ε t. Thus, the system can be written as

$$\begin{bmatrix} \varepsilon\,\dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 - \dfrac{x_1^3}{3} \\ -x_1 \end{bmatrix}, \tag{1}$$

where the dot now denotes differentiation with respect to the slow time and
ε is a small positive real parameter, ε = 0.05. System (1), which has been extensively
studied for nearly one century, is called a slow–fast dynamical system or a
singularly perturbed dynamical system. Although it has been established that system
(1) cannot be integrated by quadratures (in closed form), it is well known that it admits
a solution of limit-cycle type. The program presented here makes it possible to emphasize the
slow–fast evolution of the solution on this limit cycle.
First, copy the files named “vanderpol” and “vanderpolpp” into the “current folder” of MATLAB. Then, open the M-file called “vanderpolpp” (see Fig. 12.19) and press the green button (red circle in Fig. 12.19) to produce an animated 2D plot. In Fig. 12.20, the solution is represented by a green point (green circle in Fig. 12.20) which evolves on the limit cycle, i.e., slowly on the nearly vertical parts and fast on the nearly horizontal parts.

Fig. 12.19 Program for 2D animated phase portrait

Table 12.3 Description of the main functions

Property     Description
OutputFcn    A function for the solver to call after every successful integration step
odephas2     2D phase plane
Refine       Increase the number of output points by a factor of Refine

All the main functions are described in Table 12.3. The program provides the animated phase portrait corresponding to the solution of system (1) and the time series (Fig. 12.20).
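The files “vanderpol” and “vanderpolpp” are only shown as a figure, so the following is a minimal sketch of how such a program might look, not the book's own code. It assumes system (1) with ε = 0.05 and uses the standard solver options OutputFcn and Refine from Table 12.3; the initial point, the time span, and the tolerance are illustrative choices.

```matlab
% Hypothetical stand-in for "vanderpol"/"vanderpolpp" (not the book's files).
epsilon = 0.05;                                   % small parameter of system (1)
% Right-hand side of system (1): x(1) is the fast, x(2) the slow variable.
vdp = @(t, x) [(x(1) + x(2) - x(1)^3/3)/epsilon; -x(1)];
% odephas2 redraws the (x1, x2) phase plane after each successful step;
% Refine inserts interpolated output points, which smooths the animation.
opts = odeset('OutputFcn', @odephas2, 'Refine', 4, 'RelTol', 1e-6);
[t, x] = ode45(vdp, [0 20], [0.5; 0], opts);      % assumed start point and time span
figure; plot(t, x(:,1)); xlabel('t'); ylabel('x_1');   % time series of x1
```

For much smaller values of ε the problem becomes stiff, and a stiff solver such as ode15s would probably be a better choice than ode45.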
Fig. 12.20 Animated phase portrait in 2D

Chua’s model. L.O. Chua’s circuit [2] is a relaxation oscillator with a cubic nonlinear characteristic. It is derived from a circuit comprising a harmonic oscillator, whose operation is based on a field-effect transistor, coupled to a relaxation oscillator composed of a tunnel diode. The model of the circuit includes a capacitance which prevents abrupt voltage drops and makes it possible to describe the fast motion of this oscillator by the following equations, which also constitute a slow–fast dynamical system or a singularly perturbed dynamical system.
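$$\begin{pmatrix} \varepsilon\,\dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{pmatrix} = \begin{pmatrix} x_3 - \dfrac{44}{3}x_1^3 - \dfrac{41}{2}x_1^2 - \mu x_1 \\[2mm] -x_3 \\ -0.7x_1 + x_2 + 0.24x_3 \end{pmatrix}, \tag{2}$$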

with ε and μ real parameters, here ε = 0.05 and μ = 2. System (2), which cannot be integrated by quadratures (in closed form), exhibits a solution evolving on a “chaotic attractor” in the shape of a “double scroll”. The program presented here makes it possible to highlight the slow–fast evolution of the solutions on the “chaotic attractor”.
First, copy the files named “chua” and “chua1” into the “current folder” of MATLAB. Then, open the M-file called “chua1” and press the green button to produce an animated 3D plot. The solution, represented by a green point, evolves on the attractor with alternating slow and fast motion. The function “odephas2” is simply replaced by “odephas3”.
Table 12.4 Description of the main functions

Name          Description
Vanderpol     Dynamical system (1) with ε = 1/20
Vanderpolpp   Animated phase portrait 2D
Chua          Dynamical system (2) with ε = 1/20, μ = 2
Chua1         Animated phase portrait 3D
Lorenz        Dynamical system (3) with σ = 10, β = 8/3, r = 28
Lorenz1       Animated phase portrait 3D

Lorenz model. The original purpose of the model established by Edward Lorenz [4] was to analyze the unpredictable behavior of weather. After having derived nonlinear partial differential equations starting from the thermal equation and the Navier–Stokes equations, Lorenz truncated them to retain only three modes. The most widespread form of the Lorenz model is as follows:
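$$\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{pmatrix} = \begin{pmatrix} \sigma (x_2 - x_1) \\ -x_1 x_3 + r x_1 - x_2 \\ x_1 x_2 - \beta x_3 \end{pmatrix} \tag{3}$$

with σ, r, and β real parameters: σ = 10, β = 8/3, and r = 28.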


Although this system is not singularly perturbed, since it does not contain any small multiplicative parameter, it is a slow–fast dynamical system. Its solution evolves on a “chaotic attractor” in the shape of a “butterfly”. The program presented here makes it possible to highlight the slow–fast evolution of the solution on the “chaotic attractor” (Table 12.4).
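As with the Van der Pol example, the files “lorenz” and “lorenz1” are not reproduced in the text. The following sketch shows how the 3D animation might be set up, assuming system (3) with the parameter values given above; the initial condition and time span are assumptions made for illustration only.

```matlab
% Hypothetical stand-in for "lorenz"/"lorenz1" (not the book's files).
sigma = 10; beta = 8/3; r = 28;                   % parameters of system (3)
lorenz = @(t, x) [sigma*(x(2) - x(1));
                  -x(1)*x(3) + r*x(1) - x(2);
                  x(1)*x(2) - beta*x(3)];
% odephas3 is the 3D counterpart of odephas2.
opts = odeset('OutputFcn', @odephas3, 'Refine', 4);
[t, x] = ode45(lorenz, [0 30], [1; 1; 1], opts);  % assumed start point and time span
```

The same pattern applies to Chua's system (2): only the right-hand side and the initial condition change.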

References
1. D’Alembert J (1748) Suite des recherches sur le calcul intégral, quatrième partie: Méthodes pour intégrer quelques équations différentielles. Hist Acad Berlin IV:275–291
2. Chua LO, Komuro M, Matsumoto T (1986) The double scroll family. IEEE Trans Circuits Syst 33(11):1072–1097
3. Ginoux JM (2009) Differential geometry applied to dynamical systems, vol 66. World scientific series on nonlinear science, Series A. World Scientific, Singapore
4. Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20:130–141
5. Pratap R (2009) Getting started with MATLAB: A quick introduction for scientists and engineers. Oxford University Press, USA
6. Van Der Pol B (1926) On relaxation-oscillations. Philos Mag 7(2):978–992

12.4.6 Finding Minima and Maxima

Using MATLAB, we can find minima and maxima of a nonlinear function, either within a specified range with fminbnd() or without constraints with fminsearch(). The general syntax of fminbnd is given below:
>> x = fminbnd(fx, x1, x2)
where fx is a function of x, and x1 and x2 define the range in which the minimum is to be found.
Let us take the example function

$$f(x) = \frac{1}{10 + 5\cos\left(\frac{x}{2} + 2\right)}$$

to find its minimum; the example code is given below:
>> fx = @(x) 1./(10 + 5*cos((x/2) + 2)); % function handle for f
>> x = fminbnd(fx, -6, -2)
x =
-4.0000
>> y = fx(x)
y =
0.0667
>> ezplot(fx)
>> hold on
>> plot(x, y, 'ro')
We get the result (Fig. 12.21).

Fig. 12.21 Minimum of a given function



Fig. 12.22 Maximum of the same function

We can also find the maximum of this function. To do so, we find the minimum of the negated function −f(x); the point where −f(x) attains its minimum is the point where f(x) attains its maximum. The code is given below:
>> fx = @(x) 1./(10 + 5*cos((x/2) + 2));
>> fxinv = @(x) -fx(x); % negated function
>> xinv = fminbnd(fxinv, 0, 4)
xinv =
2.2832
>> yinv = fxinv(xinv)
yinv =
-0.2000
>> ezplot(fx)
>> hold on
>> plot(xinv, -yinv, 'ro')
We get the result (Fig. 12.22).
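The function fminsearch() is mentioned above but not demonstrated. A minimal sketch of how it might be used on the same example follows; the starting point x0 = 2 is an assumption, and the exact maximizer 2π − 4 follows from the condition cos(x/2 + 2) = −1.

```matlab
% Sketch: maximum of f via fminsearch (unconstrained) and fminbnd (bounded).
fx   = @(x) 1./(10 + 5*cos(x/2 + 2));
fneg = @(x) -fx(x);                 % minimizing -f maximizes f
xs   = fminsearch(fneg, 2);         % unconstrained search from the assumed start x0 = 2
xb   = fminbnd(fneg, 0, 4);         % bounded search on [0, 4]
x_exact = 2*pi - 4;                 % solves cos(x/2 + 2) = -1
fprintf('fminsearch: %.4f, fminbnd: %.4f, exact: %.4f\n', xs, xb, x_exact);
```

Since f is periodic, fminsearch returns only the local minimizer of −f nearest to the starting point; a different x0 may lead to a different maximum of f.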

12.4.7 Fourier Analysis


We have described some of the basic properties of Fourier series in Chapter 10. MATLAB helps us to understand these properties much better by plotting some functions along with several partial sums of their Fourier series. This illustrates that a partial sum of the Fourier series of a function is in general not equal to the function itself. With MATLAB, one can experiment and observe how the shape of the partial sums Sn(x) changes as n increases.
Here we present a versatile program written in MATLAB to compute the Fourier series of a given function. It can be adapted to other piecewise-defined functions by small modifications in the main program body. The program code below finds the Fourier series of the square function, which is defined as

$$f(x) = \begin{cases} -1, & -\pi \le x < 0, \\ \phantom{-}1, & 0 < x \le \pi. \end{cases}$$

% Fourier analysis of the square function for the first 15 harmonics.
t0 = -pi; % initial time
t0_T = pi; % final time
mp = 0; % midpoint (location of the jump)
T = t0_T - t0; % time period
syms t % symbolic variable declaration
ft = sym(-1); % -1 part of function
ftt = sym(1); % 1 part of function
w0 = 2*pi/T; % fundamental angular frequency
n = 1:15; % indices of the harmonics
% computation of the trigonometric Fourier series coefficients
a0 = 1/T*(int(ft, t, t0, mp) + int(ftt, t, mp, t0_T));
an = 2/T*(int(ft*cos(n*w0*t), t, t0, mp) + int(ftt*cos(n*w0*t), t, mp, t0_T));
bn = 2/T*(int(ft*sin(n*w0*t), t, t0, mp) + int(ftt*sin(n*w0*t), t, mp, t0_T));
ann = an.*cos(n*w0*t); % cosine terms
bnn = bn.*sin(n*w0*t); % sine terms
avg = double(a0); % converting the symbolic mean value to a number
tv = -pi:pi/100:pi; % numeric time grid for plotting
suma = 0; sumb = 0;
for j = 1:15 % taking 15 harmonics
sumb = sumb + bnn(j);
suma = suma + ann(j);
end
bnsum = double(subs(sumb, t, tv)); % evaluate the sine part on the grid
ansum = double(subs(suma, t, tv)); % evaluate the cosine part on the grid
plot(tv, avg + bnsum + ansum) % plot of the truncated Fourier series
hold on
% plotting the actual function
t1 = -pi:pi/1000:mp;
plot(t1, -ones(size(t1)), 'r')
t2 = mp:pi/1000:pi;
plot(t2, ones(size(t2)), 'r')
grid on
% formatting the plot
xlabel('Time')
ylabel('Amplitude')
title('Fourier approximation plot for 15 harmonics for square function')
legend('Fourier Approximation', 'Actual Function')
The result is shown below (Fig. 12.23)
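The coefficients of the square function can also be cross-checked without the Symbolic Math Toolbox. The sketch below is an illustrative alternative, not part of the book's program: it computes the sine coefficients numerically with integral() and compares them with the closed form b_n = 2(1 − (−1)^n)/(nπ), which follows from integrating sin(nx) over [0, π]. The variable names are assumptions.

```matlab
% Sketch: numerical sine coefficients of the square function on [-pi, pi].
f = @(x) sign(x);                               % -1 on [-pi,0), +1 on (0,pi]
n = 1:15;
bn = zeros(size(n));
for k = n
    % b_k = (2/T) * integral of f(x) sin(k*w0*x), with T = 2*pi and w0 = 1;
    % the waypoint tells the quadrature where the jump is.
    bn(k) = (1/pi)*integral(@(x) f(x).*sin(k*x), -pi, pi, 'Waypoints', 0);
end
bn_exact = (2./(n*pi)).*(1 - (-1).^n);          % 4/(n*pi) for odd n, 0 for even n
max_err = max(abs(bn - bn_exact));
fprintf('largest deviation from the closed form: %.2e\n', max_err);
```

The cosine coefficients vanish because the square function is odd, which is why only the sine terms contribute to the partial sums.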

Fig. 12.23 Fourier approximation of the square function

Here is another program, which plots harmonics of the ramp function defined as

$$f(x) = \begin{cases} 0, & -1 \le x \le 0, \\ x, & 0 \le x \le 1. \end{cases}$$

One can see the modifications required in this program as compared with the previous one.
% Fourier analysis of the ramp function for the first 5 harmonics.
t0 = -1; % initial time
t0_T = 1; % final time
mp = 0; % midpoint
T = t0_T - t0; % time period
syms t % symbolic variable declaration
ft = sym(0); % zero part of function
ftt = t; % t part of function
w0 = 2*pi/T; % fundamental angular frequency
n = 1:5; % indices of the harmonics
% computation of the trigonometric Fourier series coefficients
a0 = 1/T*(int(ft, t, t0, mp) + int(ftt, t, mp, t0_T));
an = 2/T*(int(ft*cos(n*w0*t), t, t0, mp) + int(ftt*cos(n*w0*t), t, mp, t0_T));
bn = 2/T*(int(ft*sin(n*w0*t), t, t0, mp) + int(ftt*sin(n*w0*t), t, mp, t0_T));
ann = an.*cos(n*w0*t); % cosine terms
bnn = bn.*sin(n*w0*t); % sine terms
avg = double(a0); % converting the symbolic mean value to a number
tv = -1.5:0.010:1.5; % numeric time grid for plotting
suma = 0; sumb = 0;
for j = 1:5 % first 5 harmonics
sumb = sumb + bnn(j);
suma = suma + ann(j);
end
bnsum = double(subs(sumb, t, tv)); % evaluate the sine part on the grid
ansum = double(subs(suma, t, tv)); % evaluate the cosine part on the grid
plot(tv, avg + bnsum + ansum) % plot of the truncated Fourier series
hold on
% plotting the actual function
t1 = -1:0.010:0;
plot(t1, zeros(size(t1)), 'r')
t2 = 0:0.010:1;
plot(t2, t2, 'r')
grid on
% formatting the plot
xlabel('Time')
ylabel('Amplitude')
title('Fourier approximation plot for 5 harmonics for ramp function')
legend('Fourier Approximation', 'Actual Function')
The result is shown in Fig. 12.24.

Fig. 12.24 Fourier approximation of the ramp function



12.5 Exercises

12.5.1 Write a MATLAB program to obtain 2D plots of the following functions. The type of plot is specified with each function.
(a) Simple line plot for $f(x) = \begin{cases} e^x, & x < 0, \\ (x-1)^2, & x \ge 0. \end{cases}$
(b) Stem plot of $f(x) = x \sin x + e^{-x/5} \cos x$, $0 \le x \le 10$.
(c) Quiver plot of $f(x, y) = x\,i + (x^2 + y^2)\,j$ over the rectangle [−2, 2] × [−2, 2].
(d) Polar plot of $r = 2e^{-\theta/10}$, $0 \le \theta \le 10\pi$.
12.5.2 Write a MATLAB program to obtain 3D plots of the following functions. The type of plot is specified with each function.
(a) Simple line plot for $x = t \sin t$, $y = t \cos t$, $z = t$ for $0 \le t \le 10\pi$.
(b) Quiver plot for $f(x, y, z) = y\,i + x\,j + (x^2 + z)\,k$ over the box [−2, 2] × [−2, 2] × [−1, 1].
(c) Surf plot for the figures 8.1.3, 8.1.6 of Chapter 8 of this book.
(d) Mesh plot for the function $z = \dfrac{xy(x^2 - y^2)}{x^2 + y^2}$, $-3 \le x \le 3$, $-3 \le y \le 3$.
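12.5.3 Write a MATLAB program to compute the first 200 partial sums of the series:
(a) $\displaystyle\sum_{n=1}^{\infty} \frac{\ln^2(n)}{n^{1.5}}$, (b) $\displaystyle\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n \ln(n+1)}$, (c) $\displaystyle\sum_{n=1}^{\infty} \frac{1}{n^3}$.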
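12.5.4 Find the limits of the following functions:
(a) $\displaystyle\lim_{x\to 0} \frac{1}{5 + 4\cos x}$, (b) $\displaystyle\lim_{x\to\infty} \frac{2x}{\sqrt{x^2 - 1}}$, (c) $\displaystyle\lim_{x\to 0} \frac{e^{\cos^{-1} x}}{\sqrt{1 - x^2}}$, (d) $\displaystyle\lim_{x\to\infty} \frac{x}{\sqrt{9 + 4x^2}}$.
12.5.5 Find the symbolic and the numerical value of the integral $\displaystyle\int_0^5 e^x \sin x \, dx$ and compare the results.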
12.5.6 Find the symbolic derivatives of the following functions and plot them:
(a) $(4x^2 - 1)(7x^3 + x)$, (b) $\dfrac{x^2 - 1}{x^4 + 1}$, (c) $\dfrac{\sin x}{1 + \cos x}$, (d) $e^x \sin x$.
12.5.7 Write a MATLAB program to solve following differential equations numer-
ically and plot the output variables:
(a) ÿ + sin y = 0 with initial conditions y(0) = 1, ẏ(0) = 0.
(b) ẋ = 3x − 4 cos t with initial condition x(0) = 1.
12.5.8 Write a program in MATLAB to compute the Fourier series of the following function:

$$f(x) = \begin{cases} x, & -\pi < x < 0, \\ x, & 0 < x < \pi, \end{cases}$$

and plot the partial sums of the first 5, 10, and 30 harmonics, showing the result from −3π to +π.
Appendix A
Real Numbers and Inequalities

A.1 The Number System

The simplest numbers are the natural numbers 1, 2, 3, 4, 5, . . . . The numbers . . . , −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, . . . are called integer numbers or simply integers. We denote the set of all natural numbers by N and the set of all integers by Z. It is clear that N ⊂ Z, that is, the set of natural numbers is a subset of the set of integers.
The numbers of the form p/q, where p and q are integers with q ≠ 0, are called rational numbers. For example, 3/4, −2/5, 11/2 are rational numbers. We denote the set of all rational numbers by Q. We have Z ⊂ Q, since any integer p can be written as the ratio p/1.
Numbers which cannot be written as the ratio of two integers are called irrational. For example, √2, 1 + √2, √5 and π are irrational numbers. (In Sect. 1.6 we present Euclid’s proof that √2 is irrational.) Together, rational and irrational numbers form what is called the real number system. Thus, a real number is either rational or irrational. The set of all real numbers is denoted by R, and we have that Q ⊂ R.
We assume that the reader is familiar with the algebraic operations (addition, subtraction, multiplication, and division) for numbers. Recall that division by zero is not allowed.
Since the square x² of a real number x is nonnegative, there cannot be any real number which satisfies the equation x² = −1, or x² + 1 = 0. Nevertheless, it has been found very useful in mathematics to introduce the number i = √−1, so that i² = −1. The number i is called the imaginary unit. Numbers of the form a + bi, where a and b are real numbers, are called complex numbers; the set of all complex numbers is denoted by C. For example, 2 + 5i and 3 − √6 i are complex numbers. A complex number a + bi is called imaginary (or pure imaginary) if a = 0, so 9i is an imaginary number. Since a = a + 0i, every real number a can be interpreted as a complex number, as for example the number 2/3, thus R ⊂ C. The algebraic operations are defined for complex numbers in the natural way, for example

$$\begin{aligned}
(3 + 5i) + (4 - \sqrt{2}\,i) &= 3 + 4 + 5i - \sqrt{2}\,i = 7 + (5 - \sqrt{2})\,i, \\
(3 + 5i) - (4 - \sqrt{2}\,i) &= 3 - 4 + 5i - (-\sqrt{2}\,i) = -1 + (5 + \sqrt{2})\,i, \\
(3 + 5i) \cdot (4 - \sqrt{2}\,i) &= 3 \cdot 4 + 5i \cdot 4 - 3\sqrt{2}\,i - 5i \cdot \sqrt{2}\,i \\
&= (12 + 5\sqrt{2}) + (20 - 3\sqrt{2})\,i.
\end{aligned}$$

The division of two complex numbers is a bit more complicated; in general one has

$$\frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{(c + di)(c - di)} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\,i.$$
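
As an illustration of the division formula, here is a worked example with numbers chosen for convenience (a = 1, b = 2, c = 3, d = −4); these values are not taken from the text.

```latex
\[
\frac{1 + 2i}{3 - 4i}
  = \frac{(1 + 2i)(3 + 4i)}{(3 - 4i)(3 + 4i)}
  = \frac{3 + 4i + 6i + 8i^2}{9 + 16}
  = \frac{-5 + 10i}{25}
  = -\frac{1}{5} + \frac{2}{5}\,i .
\]
% Check against the general formula:
% ac + bd = 3 - 8 = -5,  bc - ad = 6 + 4 = 10,  c^2 + d^2 = 25.
```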

Rational and irrational numbers can be distinguished by their decimal representations. Decimals like 12.345 = 12.3450000 . . ., where only zeroes appear from some point onwards, are called terminating decimals. They correspond to those rational numbers p/q whose denominator q has only 2’s and 5’s as prime factors. Every other rational number has a periodic decimal representation, that is, a representation where a certain string of digits is repeated from some point onwards, for example

31/60 = 0.51666666 . . . ,
13/17 = 0.7647058823529411 76470 . . .

Irrational numbers have nonperiodic decimal representations. For example, the decimal representations

π = 3.141592653 . . . ,
√2 = 1.414213562 . . . ,

do not exhibit any periodic repetition. Moreover, let us remark that if we truncate a nonterminating decimal representation (periodic or nonperiodic) of a real number x at some point, the resulting terminating decimal will only be an approximation to x.
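
The periodicity of the decimals of a rational number can be observed by carrying out the long division and watching for a repeated remainder. The following MATLAB sketch does this for 13/17; the variable names and the bookkeeping array pos are illustrative assumptions, not notation from the text.

```matlab
% Sketch: decimal digits and period length of p/q by long division.
p = 13; q = 17;
r = mod(p, q);                 % remainder carrying the fractional part
pos = zeros(1, q);             % pos(r+1): index of the digit produced from remainder r
dec = [];                      % decimal digits found so far
while r ~= 0 && pos(r+1) == 0
    pos(r+1) = numel(dec) + 1; % remember where this remainder first occurred
    r = 10*r;
    dec(end+1) = floor(r/q);   % next decimal digit
    r = mod(r, q);
end
if r == 0
    period = 0;                % terminating decimal
else
    period = numel(dec) - pos(r+1) + 1;   % length of the repeating block
end
fprintf('digits: %s, period: %d\n', sprintf('%d', dec), period);
```

Since there are only q possible remainders, the loop must stop after at most q steps, which is the reason every rational number has a terminating or periodic decimal representation.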

A.2 Intervals, Absolute Value and Inequalities

We say that a real number x is less than another real number y, and write x < y, if
x − y is less than zero, that is, x − y is negative or y − x is positive. Writing “x ≤ y”
(read “x is less than or equal to y”) means that x − y is less than or equal to 0, that
is, x − y is negative or it is equal to 0.

The set of all real numbers x such that a ≤ x ≤ b is called the closed interval
from a to b and denoted as [a, b]. The set of all real numbers x such that a < x < b
is called the open interval from a to b and denoted as (a, b).

Definition A.1 The absolute value or magnitude of a real number x is denoted by |x| and is defined as

$$|x| = \begin{cases} x, & x > 0, \\ 0, & x = 0, \\ -x, & x < 0. \end{cases}$$

Properties of the absolute value. For any real numbers x and y, the number |x − y|
represents the distance of x and y on the real line. (Thus |x| equals the distance of x
from 0.) Moreover, the following properties hold.

(i) √(x²) = |x|.
(ii) | − x| = |x|. (A number and its negative have the same absolute value.)
(iii) |x y| = |x||y|. (The absolute value of a product is the product of the absolute
values.)
(iv) |x + y| ≤ |x| + |y|. (This is called the triangle inequality.)
(v) ||x| − |y|| ≤ |x − y|. (This is called the inverse triangle inequality.)

A.3 The Binomial Theorem

Let x and y be real or complex numbers, and let n be any nonnegative integer. Then

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k,$$

where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad n! = n(n-1)(n-2) \cdots 2 \cdot 1.$$
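
The binomial theorem is easy to check numerically. The sketch below compares both sides for one real and one complex number with n = 6; the sample values are arbitrary assumptions, and nchoosek is MATLAB's built-in binomial coefficient.

```matlab
% Sketch: numerical check of the binomial theorem for sample values.
x = 1.5; y = -0.5 + 2i; n = 6;            % arbitrary test values
k = 0:n;
c = arrayfun(@(kk) nchoosek(n, kk), k);   % binomial coefficients, k = 0..n
rhs = sum(c .* x.^(n-k) .* y.^k);         % sum of the binomial expansion
lhs = (x + y)^n;                          % direct power
err = abs(lhs - rhs);                     % should be at rounding level
```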
Appendix B
Analytic Geometry

Rectangular coordinates. The real line consists of all real numbers; each point
on the real line is associated with a real number. Let us now consider the plane. A
rectangular coordinate system (also called Cartesian coordinate system) consists
of two perpendicular coordinate lines, called coordinate axes. The intersection of the
two axes is called the origin of the coordinate system and denoted by O. Usually, the
coordinate axes are chosen along the horizontal and vertical direction; the horizontal
axis is called the x-axis, the vertical axis is called the y-axis. In this case, the plane
and the axes together are referred to as the xy-plane (Fig. B.1). Any point P in the
plane is represented by a pair (x1 , y1 ) of real numbers, x1 is called the x-coordinate of
P and y1 is called the y-coordinate of P. Points on the x-axis have the form (x1 , 0),
while points on the y-axis have the form (0, y1 ). The origin O has the coordinates
(0, 0). The vertical line through a point (x1 , 0) and the horizontal line through (0, y1 )
are represented, respectively, by the equations

x = x1 and y = y1 .

Fig. B.1 Rectangular coordinate system


Lines, distances, circles. Let P1 and P2 be two points in the xy-plane, having coordinates (x1, y1) and (x2, y2) respectively. Consider the straight line passing through P1 and P2. Whenever x1 ≠ x2, the number

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$

is called the slope of this line; the line itself is described by the equation

$$y = m(x - x_1) + y_1.$$

The distance between P1 and P2 is defined by

$$d(P_1, P_2) = d((x_1, y_1), (x_2, y_2)) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$

Indeed, this number yields the length of the line segment which connects P1 and P2, by the theorem of Pythagoras. As a particular case, the distance of P1 = (x1, 0) and P2 = (x2, 0) equals

$$d(P_1, P_2) = \sqrt{(x_2 - x_1)^2} = |x_2 - x_1|.$$

The equation of a circle with center (x0, y0) and radius r is given by

$$(x - x_0)^2 + (y - y_0)^2 = r^2.$$

If (x0, y0) = (0, 0) then this equation becomes x² + y² = r²; it describes the circle centered at the origin with radius r.
Polar coordinates. Instead of Cartesian coordinates, it is often more convenient to
use polar coordinates which we usually denote by (r, θ). The Cartesian coordinates
(x, y) and the polar coordinates (r, θ) of a given point P in the plane are related by

x = r cos θ , y = r sin θ . (B.1)

The number r equals the length of the line segment joining the origin O and the point P, and θ equals the angle between the line OP and the x-axis (Fig. B.2). Note that θ is usually measured in radians, that is, an angle of 90° has θ = π/2, the full angle of 360° has θ = 2π, and so on. Formula (B.1) expresses the Cartesian coordinates of P in terms of the polar coordinates of P. Conversely, we obtain the polar coordinates of P from its Cartesian coordinates by the formula

$$r = \sqrt{x^2 + y^2}, \qquad \theta = \tan^{-1}\frac{y}{x}. \tag{B.2}$$
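
A practical remark on (B.2): the principal value of tan⁻¹ lies in (−π/2, π/2), so the formula yields the angle of P directly only when x > 0. In MATLAB the quadrant is usually resolved with atan2 (or cart2pol). The sketch below, with an arbitrarily chosen point in the second quadrant, illustrates the difference.

```matlab
% Sketch: polar angle of a point in the second quadrant.
x = -1; y = 1;                  % arbitrary sample point
r = hypot(x, y);                % r = sqrt(x^2 + y^2)
theta_naive = atan(y/x);        % principal value of tan^(-1): lands in the fourth quadrant
theta = atan2(y, x);            % quadrant-aware angle in (-pi, pi]
[xb, yb] = pol2cart(theta, r);  % back to Cartesian coordinates via (B.1)
fprintf('atan: %.4f, atan2: %.4f, back: (%.4f, %.4f)\n', theta_naive, theta, xb, yb);
```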

Fig. B.2 Polar coordinates


Appendix C
Trigonometry

C.1 Trigonometric Functions

The trigonometric functions are directly related to the geometry of the circle. Con-
sider the point P with Cartesian coordinates (x, y) and polar coordinates (r, θ) as in
Fig. C.1. The standard trigonometric functions are defined as follows.
sin θ = y/r (read as “sine of θ”),
cos θ = x/r (read as “cosine of θ”),

and analogously

tan θ = y/x, cot θ = x/y, sec θ = r/x, csc θ = r/y,

called the tangent, the cotangent, the secant and the cosecant, respectively. From
elementary geometry, we see that the definitions above do not depend on the chosen
value of r , as long as r > 0, so it suffices to consider the case r = 1 (the unit circle).
Recall that θ is usually measured in radians (θ = π/2 corresponds to an angle of
90◦ and so on.)

C.2 Trigonometric Identities

A trigonometric identity is an equation involving trigonometric functions that is true for all angles for which both sides of the equation are defined. In the following, we state a few useful trigonometric identities.

Fig. C.1 Trigonometric functions and the circle

$\cos^2\theta + \sin^2\theta = 1$. (C.1)
$1 + \tan^2\theta = \sec^2\theta$. (C.2)
$1 + \cot^2\theta = \csc^2\theta$. (C.3)
$\sin(\theta + 2\pi) = \sin(\theta - 2\pi) = \sin\theta$. (C.4)
$\cos(\theta + 2\pi) = \cos(\theta - 2\pi) = \cos\theta$. (C.5)

In general,

$\sin(\theta \pm 2n\pi) = \sin\theta$ and $\cos(\theta \pm 2n\pi) = \cos\theta$, for n = 0, 1, 2, 3, . . . . (C.6)
$\tan(\theta + \pi) = \tan\theta$, $\tan(\theta - \pi) = \tan\theta$. (C.7)

Next we give formulas which involve sums or multiples of angles.

$\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$. (C.8)
$\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$. (C.9)
$\sin 2\theta = 2\sin\theta\cos\theta$. (C.10)
$\cos 2\theta = \cos^2\theta - \sin^2\theta$. (C.11)
$\tan 2\theta = \dfrac{2\tan\theta}{1 - \tan^2\theta}$. (C.12)
$\cos 2\theta = 2\cos^2\theta - 1$. (C.13)
$\cos 2\theta = 1 - 2\sin^2\theta$. (C.14)
$\tan(\alpha + \beta) = \dfrac{\tan\alpha + \tan\beta}{1 - \tan\alpha\tan\beta}$. (C.15)
$\tan(\alpha - \beta) = \dfrac{\tan\alpha - \tan\beta}{1 + \tan\alpha\tan\beta}$. (C.16)
$\sin\alpha + \sin\beta = 2\sin\dfrac{\alpha+\beta}{2}\cos\dfrac{\alpha-\beta}{2}$. (C.17)
$\sin\alpha - \sin\beta = 2\cos\dfrac{\alpha+\beta}{2}\sin\dfrac{\alpha-\beta}{2}$. (C.18)
$\cos\alpha + \cos\beta = 2\cos\dfrac{\alpha+\beta}{2}\cos\dfrac{\alpha-\beta}{2}$. (C.19)
$\cos\alpha - \cos\beta = -2\sin\dfrac{\alpha+\beta}{2}\sin\dfrac{\alpha-\beta}{2}$. (C.20)

Theorem C.1 We have

$$\cos x < \frac{\sin x}{x} < 1, \quad \text{for any } x \text{ satisfying } 0 < |x| \le \frac{\pi}{2}. \tag{C.21}$$

A geometric proof. Assume first that x > 0. The proof is based on the relations between certain areas in Fig. C.2.
For a circle of radius 1, the figure shows a sector of angle x with corners O, A and P, and additional auxiliary points B and Q, connected to P resp. A by vertical lines of length sin x resp. tan x. The distance between O and B equals cos x. We then have

area of triangle OAP = (1/2) · 1 · sin x,
area of sector OAP = (1/2) x,
area of triangle OAQ = (1/2) · 1 · tan x.

We see that

$$\frac{1}{2}\sin x < \frac{1}{2}x < \frac{1}{2}\tan x = \frac{1}{2}\,\frac{\sin x}{\cos x},$$

Fig. C.2 Geometric proof of Theorem C.1

since the triangle OAP is contained in the sector OAP, which in turn is contained in the triangle OAQ. Multiplying by 2 and dividing by sin x results in

$$1 < \frac{x}{\sin x} < \frac{1}{\cos x}.$$

Taking reciprocals yields the assertion (C.21) for x > 0. For x < 0 it is enough to observe that

$$\cos(-x) = \cos x, \qquad \frac{\sin(-x)}{-x} = \frac{-\sin x}{-x} = \frac{\sin x}{x}.$$
An alternative proof, for small values of x. This proof uses the power series representation of the sine and the cosine, as discussed in Chap. 5. We have

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots,$$

therefore

$$\frac{\sin x}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots. \tag{C.22}$$

On the other hand,

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots. \tag{C.23}$$

If we consider just the first two terms of the series, we get for x > 0

$$1 - \frac{x^2}{2!} < 1 - \frac{x^2}{3!} < 1. \tag{C.24}$$

If x is small enough, inequality (C.24) continues to hold even after we add the remaining terms of the series (C.22) and (C.23), because those remaining terms have exponents of x greater than 2, and hence the remaining sums have the form x² r(x) for some function r with lim x→0 r(x) = 0.

C.3 Inverse Trigonometric Functions

The inverse trigonometric functions or cyclometric functions are obtained as inverse functions of the trigonometric functions. However, in order that this works out correctly in the sense of Definition 1.12 (the general definition of an inverse function), one has to restrict the domain of the original function to an interval where the latter is increasing resp. decreasing. For example, the sine function is increasing on the interval [−π/2, π/2], thus by Theorem 1.2 it has an inverse defined on its range [−1, 1] with values in [−π/2, π/2]. This inverse function is called arcsine; more precisely, it is called the principal branch of the arcsine. Indeed, on the interval [π/2, 3π/2] the sine is decreasing (again with range [−1, 1]), so we could also define an inverse on [−1, 1] with range [π/2, 3π/2], and similarly on other intervals.
In the following table, we list the principal branches of the inverses of the 6 standard trigonometric functions. Their names are obtained by putting the syllable “arc” in front of the name of the original function.

Name           Standard notation   Domain              Range in radians                    Range in degrees
arcsine        y = arcsin(x)       x ∈ [−1, 1]         −π/2 ≤ y ≤ π/2                      −90° ≤ y ≤ 90°
arccosine      y = arccos(x)       x ∈ [−1, 1]         0 ≤ y ≤ π                           0° ≤ y ≤ 180°
arctangent     y = arctan(x)       x ∈ R               −π/2 < y < π/2                      −90° < y < 90°
arccotangent   y = arccot(x)       x ∈ R               0 < y < π                           0° < y < 180°
arcsecant      y = arcsec(x)       x ≥ 1 or x ≤ −1     0 ≤ y < π/2 or π/2 < y ≤ π          0° ≤ y < 90° or 90° < y ≤ 180°
arccosecant    y = arccsc(x)       x ≤ −1 or x ≥ 1     −π/2 ≤ y < 0 or 0 < y ≤ π/2         −90° ≤ y < 0° or 0° < y ≤ 90°

The “inverse function” notations sin⁻¹, cos⁻¹ etc. for arcsin, arccos etc. are natural, but one must be aware of the following notational conflict: Since we commonly write sin² x instead of (sin x)², an unspecified use of “sin⁻¹(x)” might mean “arcsin(x)” as well as “1/(sin x)”.

C.4 Inverse Hyperbolic Functions

The inverse hyperbolic functions are the inverses of the hyperbolic functions sinh,
cosh, tanh, coth, sech and csch. Similarly as in the case of trigonometric functions,
one has to consider suitable domains and ranges in order to define inverses in the
sense of Definition 1.12. (In contrast to the trigonometric functions, this is not always
necessary; sinh is invertible on its entire domain R.) They are called area hyperbolic
functions, and for the mathematical notation the prefix “ar” precedes the name of
the original function, for example, “arsinh” denotes the inverse of sinh.
Table C.1 lists the principal branches of the inverse hyperbolic functions.
The inverse hyperbolic functions are expressible in terms of natural logarithms.
The formulas in the following Table C.2 hold for all x in the domains of the inverse
hyperbolic functions.

Table C.1 Principal branches of the inverse hyperbolic functions

Function         Domain                        Range
y = arsinh(x)    x ∈ (−∞, ∞)                   y ∈ (−∞, ∞)
y = arcosh(x)    x ∈ [1, ∞)                    y ∈ [0, ∞)
y = artanh(x)    x ∈ (−1, 1)                   y ∈ (−∞, ∞)
y = arcoth(x)    x ∈ (−∞, −1) ∪ (1, ∞)         y ∈ (−∞, 0) ∪ (0, ∞)
y = arsech(x)    x ∈ (0, 1]                    y ∈ [0, ∞)
y = arcsch(x)    x ∈ (−∞, 0) ∪ (0, ∞)          y ∈ (−∞, 0) ∪ (0, ∞)

Table C.2 Relations between the area hyperbolic functions and the logarithm

$\operatorname{arsinh}(x) = \ln\left(x + \sqrt{x^2 + 1}\right)$        $\operatorname{arcosh}(x) = \ln\left(x + \sqrt{x^2 - 1}\right)$
$\operatorname{artanh}(x) = \dfrac{1}{2}\ln\left(\dfrac{1 + x}{1 - x}\right)$        $\operatorname{arcoth}(x) = \dfrac{1}{2}\ln\left(\dfrac{x + 1}{x - 1}\right)$
$\operatorname{arsech}(x) = \ln\left(\dfrac{1 + \sqrt{1 - x^2}}{x}\right)$        $\operatorname{arcsch}(x) = \ln\left(\dfrac{1}{x} + \dfrac{\sqrt{1 + x^2}}{|x|}\right)$
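
The logarithmic formulas of Table C.2 can be checked numerically against MATLAB's built-in inverse hyperbolic functions asinh, acosh and atanh. The sketch below does this on a few sample points chosen inside the respective domains; the grids are arbitrary assumptions.

```matlab
% Sketch: compare built-in inverse hyperbolic functions with Table C.2.
x  = linspace(-3, 3, 13);                       % domain of arsinh: all reals
e1 = max(abs(asinh(x) - log(x + sqrt(x.^2 + 1))));
xc = linspace(1, 5, 9);                         % domain of arcosh: [1, inf)
e2 = max(abs(acosh(xc) - log(xc + sqrt(xc.^2 - 1))));
xt = linspace(-0.9, 0.9, 7);                    % domain of artanh: (-1, 1)
e3 = max(abs(atanh(xt) - 0.5*log((1 + xt)./(1 - xt))));
fprintf('max deviations: %.1e %.1e %.1e\n', e1, e2, e3);
```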
Appendix D

D.1 The Sandwich Theorem

We present the proof of the sandwich theorem, Theorem 2.5. First, we treat the case
of the right-hand limit, that is, we assume that lim x→c+ g(x) = lim x→c+ h(x) = L.
Then for any ε > 0 there exists a δ > 0 such that the inequality c < x < c + δ implies

L − ε < g(x) < L + ε and L − ε < h(x) < L + ε. (D.1)

Since g(x) ≤ f (x) ≤ h(x) by assumption, we get

L − ε < g(x) ≤ f (x) ≤ h(x) < L + ε ,

so −ε < f (x) − L < ε. Therefore, the inequality c < x < c + δ implies that
| f (x) − L| < ε. This concludes the proof for the right-hand limit. For the left-hand
limit, suppose that lim x→c− g(x) = lim x→c− h(x) = L. Then for any ε > 0 there
exists a δ > 0 such that the inequality c − δ < x < c implies (D.1). We proceed
as before and obtain that c − δ < x < c implies that | f (x) − L| < ε. Finally, let
lim x→c g(x) = lim x→c h(x) = L. By what we just have proved, lim x→c+ f (x) =
L = lim x→c− f (x). Hence, lim x→c f (x) exists and equals L.

D.2 Rolle’s Theorem

Proof of Theorem 4.5. We have to find a c ∈ (a, b) such that f′(c) = 0. We distinguish two cases. If f is constant in [a, b], then f′(x) = 0 for all x in (a, b), so for c we can choose an arbitrary point in (a, b). For the second case, assume that f is not constant in [a, b]. Set r = f(a) = f(b). There must be either a point x in (a, b) where f(x) > r or a point x in (a, b) where f(x) < r. Assume that the first situation occurs (the proof for the second situation is analogous). Since f is continuous on [a, b], it follows from the Extreme-Value Theorem 2.8 that f has a maximum value at some point c in [a, b]. The point c cannot be an endpoint, since f(a) = f(b) = r < f(x). By hypothesis f is differentiable everywhere on (a, b), thus f′(c) exists. By Theorem 4.1 we get f′(c) = 0.

D.3 The Mean Value Theorem

Proof of Theorem 4.4. The two-point form of the equation of the secant line y = s(x) joining (a, f(a)) and (b, f(b)) is

$$y - f(a) = \frac{f(b) - f(a)}{b - a}\,(x - a)$$

or equivalently

$$y = s(x) = \frac{f(b) - f(a)}{b - a}\,(x - a) + f(a). \tag{D.2}$$

The difference v between the values of the function f and of the secant line s equals

$$v(x) = f(x) - s(x) = f(x) - \left[\frac{f(b) - f(a)}{b - a}\,(x - a) + f(a)\right].$$

Since f is continuous on [a, b] and differentiable on (a, b), so is v. Because f(a) = s(a) and f(b) = s(b) we have

$$v(a) = v(b) = 0,$$

so that the function v satisfies the hypothesis of Rolle’s Theorem on the interval [a, b]. Thus there is a point c in (a, b) such that v′(c) = 0. Using (D.2) we compute

$$v'(x) = f'(x) - s'(x) = f'(x) - \frac{f(b) - f(a)}{b - a},$$

so in particular

$$v'(c) = f'(c) - \frac{f(b) - f(a)}{b - a}.$$

Thus, at the point c in (a, b), where v′(c) = 0, we have

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$


D.4 Taylor’s Theorem

For the convenience of the reader, we first restate the theorem.

Theorem 5.15. Let f be differentiable up to order n + 1 in an open interval I containing a point a. Then for each x in I, there exists a number c between x and a, that is, x < c < a or a < c < x, such that

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x - a)^k + R_n(x) = P_n(x) + R_n(x) \tag{D.3}$$

holds with

$$R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}\,(x - a)^{n+1}. \tag{D.4}$$

Proof of Theorem 5.15. We fix x ∈ I, x ≠ a, and consider the auxiliary function

$$g(t) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(t)}{k!}\,(x - t)^k. \tag{D.5}$$

We note that g(x) = f(x) − f(x) = 0 and that

$$g(a) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x - a)^k = f(x) - P_n(x). \tag{D.6}$$

From (D.5) we see that g is differentiable between a and x due to our assumptions on f. Using the product rule we compute its derivative (which is taken with respect to t, not to x!) as

$$\begin{aligned}
g'(t) &= -\sum_{k=0}^{n} \frac{f^{(k+1)}(t)}{k!}\,(x - t)^k + \sum_{k=1}^{n} \frac{f^{(k)}(t)}{k!}\,k\,(x - t)^{k-1} \\
&= -\frac{f^{(n+1)}(t)}{n!}\,(x - t)^n,
\end{aligned} \tag{D.7}$$

because all other terms cancel out. We define a second auxiliary function

$$h(t) = g(t) - g(a)\,\frac{(x - t)^{n+1}}{(x - a)^{n+1}}. \tag{D.8}$$

We see that

$$h(a) = 0, \qquad h(x) = g(x) = f(x) - f(x) = 0.$$

Since h is differentiable between a and x, Rolle’s theorem asserts that there exists a c between a and x such that

$$h'(c) = 0.$$

From (D.8) and (D.7) we obtain

$$\begin{aligned}
0 = h'(c) &= g'(c) + g(a) \cdot (n+1)\,\frac{(x - c)^n}{(x - a)^{n+1}} \\
&= -\frac{f^{(n+1)}(c)}{n!}\,(x - c)^n + g(a) \cdot (n+1)\,\frac{(x - c)^n}{(x - a)^{n+1}}.
\end{aligned} \tag{D.9}$$

Since x ≠ c, we can divide by (x − c)ⁿ. Rearranging then gives

$$g(a) = \frac{f^{(n+1)}(c)}{(n+1)!}\,(x - a)^{n+1}. \tag{D.10}$$

But we have already seen in (D.6) that g(a) = f(x) − Pn(x), so (D.10) yields

$$f(x) - P_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}\,(x - a)^{n+1},$$

which was to be proved.
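
To see how the remainder formula (D.4) is used, consider the following worked example. The choice f(x) = eˣ, a = 0, n = 2 and x = 1 is an illustration and does not come from the text.

```latex
% Taylor's theorem for f(x) = e^x at a = 0 with n = 2, evaluated at x = 1.
\[
P_2(1) = 1 + 1 + \frac{1}{2} = \frac{5}{2}, \qquad
R_2(1) = \frac{f'''(c)}{3!}\,(1 - 0)^3 = \frac{e^{c}}{6}
\quad \text{for some } c \in (0, 1).
\]
% Since 1 < e^c < e < 3 for 0 < c < 1, the error of the approximation is bracketed:
\[
\frac{1}{6} < e - \frac{5}{2} < \frac{e}{6} < \frac{1}{2}.
\]
```

The theorem does not say where c lies in (0, 1), but bounding the derivative on the whole interval is enough to control the error.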

D.5 The Fundamental Theorem of Calculus

In this section, we present the precise definition of the integral and the proofs of the
fundamental theorem of calculus and of related results.
Definition of the integral. As explained in the beginning of Chap. 6, the integral
 b
f (x) d x
a

of the function f (which we assume to be bounded) over the interval [a, b] is defined
through an approximation procedure which involves Riemannian sums


n
sΔ = f (ξk )(xk − xk−1 ) . (D.11)
k=1

Here, Δ is a partition of the interval [a, b] of the form a = x0 < x1 < · · · < xn ,
and each point ξk lies somewhere in the subinterval [xk−1 , xk ]. Thus, the value of
sΔ depends on the choice of Δ as well as on the choice of the points ξk . One can
estimate the influence of the latter choice through the notion of oscillation of a
bounded function. If I is a subset of the domain of f (here, of the interval [a, b]), the

oscillation of f on I is defined as the maximum possible difference of two function


values on I , that is,
\[ \operatorname{osc}_I(f) = \max_{x, z \in I} |f(x) - f(z)| . \tag{D.12} \]

We remark that it may happen that this maximum does not exist (that is, there are no
points x, z ∈ I where a maximum value is attained). One then replaces the maximum
by the so-called supremum, which in this case is equal to the smallest number η
such that | f (x) − f (z)| ≤ η for all x, z ∈ I .
Let us now consider two different Riemannian sums for the same partition Δ,


\[ s_\Delta = \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}) , \qquad \tilde{s}_\Delta = \sum_{k=1}^{n} f(\tilde{\xi}_k)(x_k - x_{k-1}) . \tag{D.13} \]

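We estimate their difference as

\[ \begin{aligned} |s_\Delta - \tilde{s}_\Delta| &= \Big| \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}) - \sum_{k=1}^{n} f(\tilde{\xi}_k)(x_k - x_{k-1}) \Big| \\ &\le \sum_{k=1}^{n} |f(\xi_k) - f(\tilde{\xi}_k)| (x_k - x_{k-1}) \\ &\le \sum_{k=1}^{n} O_k(f)(x_k - x_{k-1}) =: V_\Delta(f) , \end{aligned} \tag{D.14} \]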

where Ok ( f ) denotes the oscillation of f on the subinterval I = [xk−1 , xk ]. The


number VΔ ( f ) defined in (D.14) is called the oscillation sum of f for the partition Δ.

Definition D.1 (Integrable Function) A bounded function f : [a, b] → R is said to


be integrable on [a, b], if for every ε > 0 there exists a partition Δ of [a, b] such
that VΔ ( f ) ≤ ε.

The latter condition means that, in view of (D.14), we can enforce the difference
between different Riemannian sums for the same partition Δ to become as small as
we want, if we choose Δ fine enough.
Let now f : [a, b] → R be integrable. In order to define its integral, one goes
through the following three steps.
1. One proves that
|sΔ − sΔ̃ | < VΔ ( f ) ,

whenever the partition Δ̃ is a refinement of the partition Δ (that is, Δ̃ is obtained


from Δ by adding partition points), for all Riemannian sums sΔ and sΔ̃ for those
partitions.
2. One proves that
|sΔ − sΔ̃ | < VΔ ( f ) + VΔ̃ ( f ) ,

for arbitrary partitions Δ and Δ̃ and all Riemannian sums for those partitions.
3. One proves that there exists a unique number I such that

|I − sΔ | ≤ VΔ ( f ) (D.15)

holds for all partitions and all corresponding Riemannian sums.


We then define the integral of f on [a, b] as

\[ \int_a^b f(x) \, dx = I , \]

where I is the number obtained in step 3 above.


It was stated in Theorem 6.2 that every continuous function on a closed interval
[a, b] is integrable. Its proof uses the notion of uniform continuity. A function f :
[a, b] → R is called uniformly continuous on [a, b], if for every ε > 0 there exists
δ > 0 such that | f (x) − f (z)| < ε holds for all x, z ∈ [a, b] with |x − z| < δ. This
is similar to, but not the same as the definition of continuity. It is a theorem (not
treated in this book) that every continuous function defined on a closed and bounded
interval [a, b] is uniformly continuous on that interval.
Proof of Theorem 6.2. Let f : [a, b] → R be continuous. According to Definition
D.1, for an arbitrarily given ε > 0 we have to find a partition Δ such that VΔ ( f ) ≤ ε.
Choose δ > 0 small enough such that | f (x) − f (z)| < ε/(b − a) whenever |x −
z| < δ. (This is possible since, by what we said just above, f is also uniformly
continuous.) Next, choose a natural number n large enough such that (b − a)/n < δ.
Take as partition Δ the equidistant partition of [a, b] with xk − xk−1 = (b − a)/n.
Then the oscillation Ok ( f ) of f on [xk−1 , xk ] satisfies Ok ( f ) ≤ ε/(b − a) for every
k, hence
\[ V_\Delta(f) = \sum_{k=1}^{n} O_k(f)(x_k - x_{k-1}) \le n \cdot \frac{\varepsilon}{b - a} \cdot \frac{b - a}{n} = \varepsilon . \]

Therefore, f is integrable.
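
The role of the oscillation sum can be made concrete with a short computation. The sketch below assumes the example f = sin on [0, π] with equidistant partitions; the function names `oscillation_sum` and `riemann_sum` are hypothetical, and the oscillation on each subinterval is only estimated by sampling, so this is an illustration rather than a proof.

```python
import math

def oscillation_sum(f, a, b, n, samples=200):
    """Estimate V_Delta(f) in (D.14) for the equidistant partition with n subintervals.

    The oscillation O_k(f) on each subinterval is approximated by the spread
    of sampled function values (an assumption of this sketch).
    """
    width = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * width
        values = [f(left + j * width / samples) for j in range(samples + 1)]
        total += (max(values) - min(values)) * width
    return total

def riemann_sum(f, a, b, n, tag):
    """Riemannian sum (D.11) with xi_k = x_{k-1} + tag * (x_k - x_{k-1}), 0 <= tag <= 1."""
    width = (b - a) / n
    return sum(f(a + (k + tag) * width) * width for k in range(n))

for n in (10, 100, 1000):
    v = oscillation_sum(math.sin, 0.0, math.pi, n)
    s = riemann_sum(math.sin, 0.0, math.pi, n, tag=0.3)
    print(n, v, abs(s - 2.0))  # the exact integral of sin over [0, pi] is 2
```

The oscillation sum decreases as the partition is refined, and the distance of each Riemannian sum from the integral stays below it, in line with (D.15).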
Let us remark that there are different ways of defining the integral. We have
chosen a method which can be generalized conveniently to the case of double and
triple integrals, see Sect. D.10 below.
The fundamental theorem of calculus. We present the proofs of both parts of this
theorem, as well as that of the mean value theorem for integrals, Theorem 6.7. We
begin with the latter.
Proof of Theorem 6.7. Since f is continuous, it attains its maximum and minimum
on [a, b] by Theorem 2.8. Let

\[ M = \max_{x \in [a,b]} f(x) , \qquad m = \min_{x \in [a,b]} f(x) . \]

Since m ≤ f (x) ≤ M for all x ∈ [a, b], due to (6.25) we have

\[ m(b - a) = \int_a^b m \, dx \le \int_a^b f(x) \, dx \le \int_a^b M \, dx = M(b - a) . \]

We divide by b − a and obtain

\[ m \le \frac{1}{b - a} \int_a^b f(x) \, dx \le M . \]

By the intermediate value Theorem 2.7, f attains every value between m and M.
Therefore there exists c ∈ [a, b] such that

\[ f(c) = \frac{1}{b - a} \int_a^b f(x) \, dx . \]

The theorem is proved.


Proof of Part 1, Theorem 6.3. To establish this theorem, we must show that if x is
an arbitrary point in [a, b], then F′(x) = f (x), that is,

\[ \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = f(x) . \]

In order to prove this, fix x ∈ [a, b] and let h be any number such that x + h ∈ [a, b].
Using the definition of F together with property (6.26) yields

\[ \begin{aligned} F(x + h) - F(x) &= \int_a^{x+h} f(t) \, dt - \int_a^x f(t) \, dt \\ &= \int_a^x f(t) \, dt + \int_x^{x+h} f(t) \, dt - \int_a^x f(t) \, dt \\ &= \int_x^{x+h} f(t) \, dt . \end{aligned} \]

Consequently, if h ≠ 0, then

\[ \frac{F(x + h) - F(x)}{h} = \frac{1}{h} \int_x^{x+h} f(t) \, dt . \]

In the case h > 0, by the mean value Theorem 6.7, just proved above, we can find a
number z = z(h) in the open interval (x, x + h) such that

\[ \int_x^{x+h} f(t) \, dt = f(z(h)) \cdot (x + h - x) = f(z(h)) \cdot h \]

and, therefore,
\[ \frac{F(x + h) - F(x)}{h} = f(z(h)) . \tag{D.16} \]

Since x < z < x + h, we have z(h) → x as h → 0+. It then follows from the continuity
of f that

\[ \lim_{h \to 0^+} f(z(h)) = f(x) , \]

and we conclude from (D.16) that

\[ \lim_{h \to 0^+} \frac{F(x + h) - F(x)}{h} = f(x) . \]

If h < 0, we may prove in a similar way that

\[ \lim_{h \to 0^-} \frac{F(x + h) - F(x)}{h} = f(x) . \]

The two preceding one-sided limits imply that

\[ F'(x) = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = f(x) . \]

This completes the proof.
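
The limit just established can be observed numerically. The sketch below assumes the example f = cos and x = 1, and approximates the integrals by a midpoint rule; the names `midpoint_integral` and `difference_quotient` are hypothetical helpers, not notation from the text.

```python
import math

def midpoint_integral(f, a, b, n=1000):
    """Midpoint-rule approximation of the integral of f from a to b (b < a is allowed)."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width

def difference_quotient(f, x, h):
    """(F(x+h) - F(x)) / h, written as (1/h) times the integral of f from x to x+h."""
    return midpoint_integral(f, x, x + h) / h

x = 1.0  # illustrative point
for h in (0.1, 0.01, 0.001, -0.001):
    print(h, difference_quotient(math.cos, x, h), math.cos(x))
```

For both positive and negative h the quotient approaches f(x) = cos(1) as |h| shrinks, matching the two one-sided limits in the proof.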


Proof of Part 2, Theorem 6.4. Let F be any antiderivative of f and let

\[ G(x) = \int_a^x f(t) \, dt . \tag{D.17} \]

From Theorem 6.3 we know that G is an antiderivative of f, and from Theorem 6.1
we know that there is a constant C such that

G(x) = F(x) + C

for every x in [a, b]. Together with (D.17) this implies that

\[ \int_a^x f(t) \, dt = F(x) + C \]

for every x in [a, b]. If we let x = a and use the fact that $\int_a^a f(t) \, dt = 0$, we obtain
0 = F(a) + C, or C = −F(a). If we let x = b, we arrive at

\[ \int_a^b f(t) \, dt = F(b) + C = F(b) - F(a) . \]

The theorem is proved. (Recall that it is irrelevant whether the integration variable
is denoted by x or by t.)
The integral test. Finally we prove Theorem 6.8, which says that for continuous
positive nonincreasing functions f the series $\sum_{n=1}^{\infty} f(n)$ converges if and only if the
improper integral $\int_1^{\infty} f(x) \, dx$ converges.
Proof of Theorem 6.8. First, assume that $\int_1^{\infty} f(x) \, dx$ converges. We define a function
ϕ : [1, ∞) → R by setting ϕ(x) = f (n) if n − 1 ≤ x < n. Since f is nonincreasing,
we have ϕ(x) ≤ f (x) for all x ≥ 1. For every natural number N ≥ 2 we
therefore obtain, using (6.25),

\[ \begin{aligned} 0 \le \sum_{n=2}^{N} f(n) &= \sum_{n=2}^{N} \int_{n-1}^{n} \varphi(x) \, dx = \int_1^N \varphi(x) \, dx \le \int_1^N f(x) \, dx \\ &\le \int_1^{\infty} f(x) \, dx < \infty . \end{aligned} \]

Thus, the partial sums of the series $\sum_{n=1}^{\infty} f(n)$ are bounded by $f(1) + \int_1^{\infty} f(x) \, dx$, and
hence the series converges. To prove the converse, assume that $\sum_{n=1}^{\infty} f(n)$ converges.
We define a function ψ : [1, ∞) → R by setting ψ(x) = f (n − 1) if n − 1 ≤ x < n.
Since f is nonincreasing, we have ψ(x) ≥ f (x) for all x ≥ 1. For every natural
number N ≥ 2 we therefore obtain, using (6.25),

\[ \begin{aligned} 0 \le \int_1^N f(x) \, dx &\le \int_1^N \psi(x) \, dx = \sum_{n=2}^{N} \int_{n-1}^{n} \psi(x) \, dx = \sum_{n=2}^{N} f(n - 1) \\ &\le \sum_{n=1}^{\infty} f(n) < \infty . \end{aligned} \tag{D.18} \]

The function F : [1, ∞) → R defined by

\[ F(t) = \int_1^t f(x) \, dx \]

is nondecreasing and, due to (D.18), bounded by the finite number $\sum_{n=1}^{\infty} f(n)$.
Therefore, the improper limit limt→∞ F(t) exists which means that the improper
integral converges.
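
The two chains of inequalities in this proof can be checked on a concrete function. The sketch below assumes f(x) = 1/x², whose integral from 1 to N is 1 − 1/N in closed form; the function name `integral_test_bounds` is a hypothetical helper introduced only for this illustration.

```python
def f(x):
    """Illustrative continuous, positive, nonincreasing function on [1, infinity)."""
    return 1.0 / x ** 2

def integral_test_bounds(N):
    """Return (sum_{n=2}^{N} f(n), integral of f from 1 to N, sum_{n=2}^{N} f(n-1))."""
    lower = sum(f(n) for n in range(2, N + 1))      # integral of the step function phi
    middle = 1.0 - 1.0 / N                          # exact integral of 1/x^2 over [1, N]
    upper = sum(f(n - 1) for n in range(2, N + 1))  # integral of the step function psi
    return lower, middle, upper

for N in (10, 100, 1000):
    print(N, integral_test_bounds(N))
```

For each N the integral should lie between the two sums, which is exactly the pair of estimates used for the two directions of the proof.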

Fig. D.1 Special form of the region D (lower boundary C1: y = h(x); upper boundary C2: y = k(x))

D.6 The Green–Ostrogradski Theorem

For convenience of the reader, we restate the theorem.


Theorem 9.8. Let D be a bounded domain in the plane whose boundary C is a
closed, simple and positively oriented curve. Let F = ( f, g) be a vector field whose
components are continuously differentiable. Then

\[ \oint_C f(x, y) \, dx + g(x, y) \, dy = \iint_D \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dA . \tag{D.19} \]

Proof of Theorem 9.8. In this proof we restrict ourselves to the case where the
region D has the form indicated in Fig. D.1. First, we consider the case g = 0. The
boundary C of D consists of a lower part C1 which is the graph of some function
y = h(x), and of an upper part C2 which is the graph of some function y = k(x);
again we refer to Fig. D.1. The line integral becomes

\[ \oint_C \mathbf{F} \cdot d\mathbf{r} = \int_{C_1} \mathbf{F} \cdot d\mathbf{r} + \int_{C_2} \mathbf{F} \cdot d\mathbf{r} , \]

where, due to the counterclockwise orientation, C1 is traversed from left to right, and
C2 from right to left. We already have computed (see formula (9.89)) that

\[ \int_{C_1} \mathbf{F} \cdot d\mathbf{r} = \int_a^b f(x, h(x)) \, dx , \]

and analogously we obtain (note the reversal of the integration limits)

\[ \int_{C_2} \mathbf{F} \cdot d\mathbf{r} = \int_b^a f(x, k(x)) \, dx . \]

Taken together the formulas above yield

\[ \oint_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \big[ f(x, h(x)) - f(x, k(x)) \big] \, dx . \tag{D.20} \]

Now we consider the double integral. The region D has the form

D = {(x, y) : a ≤ x ≤ b , h(x) ≤ y ≤ k(x)} .

According to (8.46), we compute the double integral on the right-hand side of (D.19)
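for g = 0, using the fundamental theorem of calculus,

\[ \begin{aligned} -\int_D \frac{\partial f}{\partial y}(x, y) \, dA &= -\int_a^b \int_{h(x)}^{k(x)} \frac{\partial f}{\partial y}(x, y) \, dy \, dx \\ &= -\int_a^b \big[ f(x, k(x)) - f(x, h(x)) \big] \, dx . \end{aligned} \]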

From (D.20) we see that in the case g = 0 indeed

\[ \oint_C \mathbf{F} \cdot d\mathbf{r} = \oint_C f(x, y) \, dx = -\int_D \frac{\partial f}{\partial y}(x, y) \, dA . \tag{D.21} \]

An analogous proof for the case f = 0, taking F̃ = (0, g), shows that

\[ \oint_C \tilde{\mathbf{F}} \cdot d\mathbf{r} = \oint_C g(x, y) \, dy = \int_D \frac{\partial g}{\partial x}(x, y) \, dA . \tag{D.22} \]

(In that proof, we decompose the boundary C of D into a left part and a right
part, described by some functions x = h̃(y) and x = k̃(y), respectively.) For the
general case, we decompose ( f, g) = ( f, 0) + (0, g), and add (D.21) and (D.22).
The theorem is proved for domains D of the form in Fig. D.1.
For more general domains D, the two-dimensional variant of the divergence the-
orem of Gauss can be used conveniently to prove Theorem 9.8.
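
Formulas (D.20) and (D.21) can be compared numerically on a region of the form in Fig. D.1. The sketch below assumes the illustrative data f(x, y) = xy, g = 0, h(x) = x², k(x) = x on [0, 1]; these choices and the helper name `midpoint` are assumptions made here, and the integrals are approximated by a midpoint rule.

```python
def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

def f(x, y):        # first component of F; the second component g is zero
    return x * y

def df_dy(x, y):    # partial derivative of f with respect to y
    return x

def h(x):           # lower boundary curve C1
    return x * x

def k(x):           # upper boundary curve C2
    return x

a, b = 0.0, 1.0

# Line integral over C, written as a single integral as in (D.20)
line_integral = midpoint(lambda x: f(x, h(x)) - f(x, k(x)), a, b)

# Right-hand side of (D.21): minus the double integral of df/dy over D
double_integral = -midpoint(
    lambda x: midpoint(lambda y: df_dy(x, y), h(x), k(x), n=200), a, b, n=400
)

print(line_integral, double_integral)  # both should be close to -1/12
```

For this example both sides can also be computed by hand and equal −1/12, so the two printed numbers should agree up to the quadrature error.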

D.7 The Divergence Theorem of Gauss

The Gauss divergence theorem, Theorem 9.9, asserts that

\[ \int_\Sigma \mathbf{F} \cdot \mathbf{n} \, d\sigma = \int_D \operatorname{div} \mathbf{F} \, dV . \tag{D.23} \]

Here, D is a domain in R3 with boundary Σ and outer unit normal field n. We present
a proof of Theorem 9.9 for the special case where D and Σ can be represented, with
respect to the coordinate directions x, y and z, in the following way. With respect to

the z-coordinate, we assume that D has the form

D = {(x, y, z) : k(x, y) < z < h(x, y) , (x, y) ∈ Dz } , (D.24)

where Dz = pz (D) with pz (x, y, z) = (x, y, 0) being the projection of D onto the
x y-plane, and k and h are certain functions. Moreover, we assume that the boundary
surface Σ of D consists of an upper part Σ+ described as z = h(x, y), and a lower
part Σ− described as z = k(x, y), and possibly a part Σ0 which is vertical to the
x y-plane (that is, the normals to Σ0 are perpendicular to the z-direction).
Before proceeding further, let us illustrate this situation with two examples. If D
is a ball bounded by a sphere Σ, the surfaces Σ+ and Σ− are the upper and lower
half sphere, respectively, while Σ0 is empty. If D is a rectangular box parallel to
the coordinate axes, Σ+ and Σ− are the rectangles forming the top and the bottom,
respectively, while Σ0 consists of the four vertical sides.
We assume moreover that D admits analogous representations with respect to the
x- and the y-coordinate.
Proof of Theorem 9.9. Recall that for F = f1 i + f2 j + f3 k, its divergence is given
by

\[ \operatorname{div} \mathbf{F} = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} . \]

Due to the form of D described above, we can decompose a volume integral over
D into an outer integral over the two-dimensional projection Dz and an integral
over intervals in the z direction. We compute (using Fubini’s theorem, and then the
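fundamental theorem of calculus)

\[ \begin{aligned} \int_D \frac{\partial f_3}{\partial z} \, dV &= \int_{D_z} \int_{k(x,y)}^{h(x,y)} \frac{\partial f_3}{\partial z}(x, y, z) \, dz \, dx \, dy \\ &= \int_{D_z} \big[ f_3(x, y, h(x, y)) - f_3(x, y, k(x, y)) \big] \, dx \, dy \\ &= \int_{D_z} f_3(x, y, h(x, y)) \, dx \, dy - \int_{D_z} f_3(x, y, k(x, y)) \, dx \, dy . \end{aligned} \tag{D.25} \]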

The surface integral of F · n = f1 n1 + f2 n2 + f3 n3 decomposes into

\[ \int_\Sigma \mathbf{F} \cdot \mathbf{n} \, d\sigma = \int_\Sigma f_1 n_1 \, d\sigma + \int_\Sigma f_2 n_2 \, d\sigma + \int_\Sigma f_3 n_3 \, d\sigma . \tag{D.26} \]

For the third term on the right-hand side, we consider the partition of Σ as described
above,

\[ \int_\Sigma f_3 n_3 \, d\sigma = \int_{\Sigma_+} f_3 n_3 \, d\sigma + \int_{\Sigma_-} f_3 n_3 \, d\sigma + \int_{\Sigma_0} f_3 n_3 \, d\sigma . \tag{D.27} \]

On Σ0 , the unit outer normal n is perpendicular to the z-direction, so n 3 = 0 on


Σ0 and the corresponding integral vanishes. We consider Σ+ . Since Σ+ is given as
z = h(x, y) and D lies below Σ+ , the unit outer normal n points upward. At the point
(x, y, z) with z = h(x, y) it is given by

\[ \mathbf{n}(x, y, z) = \frac{1}{\nu} \big( -\partial_x h \, \mathbf{i} - \partial_y h \, \mathbf{j} + \mathbf{k} \big) , \qquad \nu = \sqrt{1 + (\partial_x h)^2 + (\partial_y h)^2} , \tag{D.28} \]
where the partial derivatives ∂x h and ∂ y h are evaluated at (x, y). We thus obtain

\[ n_3 = \frac{1}{\nu} \tag{D.29} \]
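for its third component. We transform the surface integral over Σ+ into a
two-dimensional integral over the region Dz , according to Definition 9.20,

\[ \begin{aligned} \int_{\Sigma_+} f_3 n_3 \, d\sigma = \int_{\Sigma_+} \frac{f_3}{\nu} \, d\sigma &= \int_{D_z} \frac{f_3}{\nu} \cdot \sqrt{1 + (\partial_x h)^2 + (\partial_y h)^2} \, dA \\ &= \int_{D_z} f_3(x, y, h(x, y)) \, dx \, dy . \end{aligned} \tag{D.30} \]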

The surface integral over Σ− is treated analogously; there, however, the outer normal
vector points downward, so we have n3 = −1/ν instead of (D.29). Consequently,

\[ \int_{\Sigma_-} f_3 n_3 \, d\sigma = -\int_{D_z} f_3(x, y, k(x, y)) \, dx \, dy . \tag{D.31} \]

Putting together (D.25), (D.27), (D.30) and (D.31) we arrive at

\[ \int_\Sigma f_3 n_3 \, d\sigma = \int_D \frac{\partial f_3}{\partial z} \, dV . \tag{D.32} \]

Working with the representations of D with respect to the x- and the y-coordinate,
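one obtains in an analogous manner that

\[ \int_\Sigma f_1 n_1 \, d\sigma = \int_D \frac{\partial f_1}{\partial x} \, dV , \qquad \int_\Sigma f_2 n_2 \, d\sigma = \int_D \frac{\partial f_2}{\partial y} \, dV . \tag{D.33} \]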

Adding the three equations in (D.32) and (D.33) yields (D.23). This completes the
proof for the special form of D as considered.
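
The rectangular box mentioned above as an example is a convenient test case for (D.23). The sketch below assumes the unit cube [0, 1]³ and the illustrative field F = (x², y², z²); both the field and the function names `volume_integral` and `flux` are choices made here. On each face the outer normal is a coordinate direction, so only one component of F contributes there.

```python
def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]

def F(x, y, z):
    """Illustrative vector field (f1, f2, f3)."""
    return (x * x, y * y, z * z)

def div_F(x, y, z):
    """Divergence of F, computed by hand: 2x + 2y + 2z."""
    return 2 * x + 2 * y + 2 * z

def volume_integral(n=20):
    """Midpoint approximation of the integral of div F over the unit cube."""
    pts = midpoints(n)
    cell = 1.0 / n ** 3
    return sum(div_F(x, y, z) for x in pts for y in pts for z in pts) * cell

def flux(n=20):
    """Midpoint approximation of the outward flux of F through the six faces."""
    pts = midpoints(n)
    cell = 1.0 / n ** 2
    total = 0.0
    for u in pts:
        for v in pts:
            total += F(1.0, u, v)[0] - F(0.0, u, v)[0]  # faces x = 1 and x = 0
            total += F(u, 1.0, v)[1] - F(u, 0.0, v)[1]  # faces y = 1 and y = 0
            total += F(u, v, 1.0)[2] - F(u, v, 0.0)[2]  # faces z = 1 (Sigma+) and z = 0 (Sigma-)
    return total * cell

print(volume_integral(), flux())  # both should be close to 3
```

The minus signs on the faces x = 0, y = 0, z = 0 reflect the outer normal pointing in the negative coordinate direction, as in the treatment of Σ− in (D.31).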
To prove the divergence theorem for domains D of general form, one employs so-
called partitions of unity of the vector field F, which reduces the general situation to
a situation where similar computations can be done as in the proof presented above. In
addition, let us remark that, while we have treated the situation in three-dimensional
space, more or less the same proof works for the case of an n-dimensional region D

bounded by an (n − 1)-dimensional surface Σ, where n is an arbitrary number greater
than or equal to 2. Both these developments are, however, outside the scope of this book.

D.8 Stokes’ Theorem

This section is devoted to the proof of the theorem of Stokes, Theorem 9.12. It states
that under suitable assumptions, introduced prior to this theorem in Sect. 9.5.3,

\[ \int_C \mathbf{F} \cdot d\mathbf{r} = \int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n} \, d\sigma , \tag{D.34} \]

where F is a vector field, Σ is a surface with unit normal field n and boundary curve
C, suitably oriented.
Proof of Theorem 9.12. The strategy of the proof is to transform the situation to the
x y-plane, in order to apply the Green–Ostrogradski theorem. Let us first consider
the surface integral on the right-hand side of (D.34). The surface Σ is described as
z = S(x, y), that is, as the graph of a function S defined on D, the projection of Σ
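onto the x y-plane. The unit normal at a point (x, y, S(x, y)) of Σ is given by

\[ \mathbf{n} = \frac{1}{\nu} \big( -\partial_x S \, \mathbf{i} - \partial_y S \, \mathbf{j} + \mathbf{k} \big) , \qquad \nu = \sqrt{1 + (\partial_x S)^2 + (\partial_y S)^2} , \tag{D.35} \]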
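where the right-hand side is evaluated at (x, y). Since

\[ \operatorname{curl} \mathbf{F} = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_x & \partial_y & \partial_z \\ f_1 & f_2 & f_3 \end{vmatrix} \tag{D.36} \]
\[ = (\partial_y f_3 - \partial_z f_2) \mathbf{i} + (\partial_z f_1 - \partial_x f_3) \mathbf{j} + (\partial_x f_2 - \partial_y f_1) \mathbf{k} , \tag{D.37} \]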

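the integrand of the surface integral becomes the scalar function

\[ (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n} = \frac{1}{\nu} \Big[ (\partial_y f_3 - \partial_z f_2) \cdot (-\partial_x S) + (\partial_z f_1 - \partial_x f_3) \cdot (-\partial_y S) + (\partial_x f_2 - \partial_y f_1) \Big] , \tag{D.38} \]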

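to be evaluated at points of Σ. Using Definition 9.20 we can transform the surface
integral into a double integral over the domain D,

\[ \begin{aligned} \int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n} \, d\sigma &= \int_D (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n} \cdot \nu \, dA \\ &= \int_D \Big[ (\partial_y f_3 - \partial_z f_2) \cdot (-\partial_x S) + (\partial_z f_1 - \partial_x f_3) \cdot (-\partial_y S) \\ &\qquad\quad + (\partial_x f_2 - \partial_y f_1) \Big] \, dA . \end{aligned} \tag{D.39} \]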

Let us now consider the line integral on the left side of (D.34). If C is parametrized
by r : [a, b] → R3 , then by the definition of the line integral we have

\[ \int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt . \tag{D.40} \]

Since Σ is the graph of S defined on D, C is the graph of S restricted to the boundary


Γ of D. Let Γ be positively oriented by the parametrization q : [a, b] → R2 , set

r(t) = q1 (t)i + q2 (t)j + S(q1 (t), q2 (t))k . (D.41)

Using the chain rule we obtain

\[ \mathbf{r}'(t) = q_1'(t) \mathbf{i} + q_2'(t) \mathbf{j} + \big( \partial_x S \cdot q_1'(t) + \partial_y S \cdot q_2'(t) \big) \mathbf{k} , \tag{D.42} \]

where ∂x S and ∂ y S are evaluated at q(t) = (q1 (t), q2 (t)). Let us now define the plane
vector field F̃ by

\[ \begin{aligned} \tilde{F}_1(x, y) &= f_1(x, y, S(x, y)) + f_3(x, y, S(x, y)) \, \partial_x S(x, y) , \\ \tilde{F}_2(x, y) &= f_2(x, y, S(x, y)) + f_3(x, y, S(x, y)) \, \partial_y S(x, y) . \end{aligned} \tag{D.43} \]

Setting x = q1 (t) and y = q2 (t) we see in view of (D.41) and (D.42) that

\[ \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = \tilde{\mathbf{F}}(\mathbf{q}(t)) \cdot \mathbf{q}'(t) \tag{D.44} \]

for all t ∈ [a, b]. (This is the reason for defining F̃ by (D.43).)
The theorem of Green and Ostrogradski, Theorem 9.8, asserts that

\[ \int_\Gamma \tilde{\mathbf{F}} \cdot d\mathbf{r} = \int_D \big( \partial_x \tilde{F}_2 - \partial_y \tilde{F}_1 \big) \, dA . \tag{D.45} \]

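We compute the partial derivatives from (D.43) with the aid of the chain rule,

\[ \begin{aligned} \partial_y \tilde{F}_1 &= \partial_y f_1 + \partial_z f_1 \cdot \partial_y S + (\partial_y f_3 + \partial_z f_3 \cdot \partial_y S) \, \partial_x S + f_3 \cdot \partial_y \partial_x S , \\ \partial_x \tilde{F}_2 &= \partial_x f_2 + \partial_z f_2 \cdot \partial_x S + (\partial_x f_3 + \partial_z f_3 \cdot \partial_x S) \, \partial_y S + f_3 \cdot \partial_x \partial_y S . \end{aligned} \tag{D.46} \]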

Since the function S was assumed to have continuous second partial derivatives, we
can interchange their order, so we have ∂x ∂ y S = ∂ y ∂x S. Thus we obtain from (D.46)
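that

\[ \partial_x \tilde{F}_2 - \partial_y \tilde{F}_1 = (\partial_z f_2 - \partial_y f_3) \cdot \partial_x S + (\partial_x f_3 - \partial_z f_1) \cdot \partial_y S + (\partial_x f_2 - \partial_y f_1) . \tag{D.47} \]
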
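We compare this expression with the corresponding one in (D.39) and find that

\[ \int_D \big( \partial_x \tilde{F}_2 - \partial_y \tilde{F}_1 \big) \, dA = \int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n} \, d\sigma . \tag{D.48} \]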

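We now put together the previous calculations and finally conclude that

\[ \begin{aligned} \int_C \mathbf{F} \cdot d\mathbf{r} &= \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = \int_a^b \tilde{\mathbf{F}}(\mathbf{q}(t)) \cdot \mathbf{q}'(t) \, dt \\ &= \int_\Gamma \tilde{\mathbf{F}} \cdot d\mathbf{r} = \int_D \big( \partial_x \tilde{F}_2 - \partial_y \tilde{F}_1 \big) \, dA \\ &= \int_\Sigma (\operatorname{curl} \mathbf{F}) \cdot \mathbf{n} \, d\sigma . \end{aligned} \]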

The proof is complete.
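
Both sides of (D.34) can be evaluated numerically along the lines of the proof. The sketch below assumes the illustrative surface z = S(x, y) = x² over the unit disk and the field F = (z, x, y), whose curl is (1, 1, 1); these choices and all function names are assumptions made here. The line integral uses the parametrization (D.41) with q(t) = (cos t, sin t), and the surface integral uses the integrand of (D.39) in polar coordinates.

```python
import math

def S(x, y):
    return x * x

def grad_S(x, y):
    """(dS/dx, dS/dy) for S(x, y) = x^2."""
    return (2 * x, 0.0)

def F(x, y, z):
    return (z, x, y)

def curl_F(x, y, z):
    """Curl of F = (z, x, y), computed by hand."""
    return (1.0, 1.0, 1.0)

def line_integral(n=2000):
    """Integral of F(r(t)) . r'(t) over [0, 2*pi], with r(t) as in (D.41)."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = i * dt
        q1, q2 = math.cos(t), math.sin(t)
        dq1, dq2 = -math.sin(t), math.cos(t)
        sx, sy = grad_S(q1, q2)
        r_prime = (dq1, dq2, sx * dq1 + sy * dq2)  # chain rule, (D.42)
        f1, f2, f3 = F(q1, q2, S(q1, q2))
        total += (f1 * r_prime[0] + f2 * r_prime[1] + f3 * r_prime[2]) * dt
    return total

def surface_integral(nr=200, nt=200):
    """Double integral (D.39) over the unit disk, midpoint rule in polar coordinates."""
    dr, dt = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            x, y = r * math.cos(t), r * math.sin(t)
            sx, sy = grad_S(x, y)
            c1, c2, c3 = curl_F(x, y, S(x, y))
            total += (c1 * (-sx) + c2 * (-sy) + c3) * r * dr * dt
    return total

print(line_integral(), surface_integral(), math.pi)
```

For this example both integrals equal π, so the three printed values should agree closely.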

D.9 Conservative fields

In this section, we prove the two Theorems 9.4 and 9.5 concerning conservative
vector fields.
Proof of Theorem 9.4. Let r : [a, b] → R3 be a parametrization of C, so A = r(a)
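and B = r(b). We set g(t) = ψ(r(t)). From the chain rule we get

\[ g'(t) = \nabla \psi(\mathbf{r}(t)) \cdot \mathbf{r}'(t) . \]

We compute

\[ \begin{aligned} \int_C \mathbf{F} \cdot d\mathbf{r} &= \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = \int_a^b \nabla \psi(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt \\ &= \int_a^b g'(t) \, dt = g(b) - g(a) = \psi(\mathbf{r}(b)) - \psi(\mathbf{r}(a)) \\ &= \psi(B) - \psi(A) . \end{aligned} \]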

In the middle line of this computation, we have used the fundamental theorem of
calculus. Indeed, one may view Theorem 9.4 as a generalization of the fundamental
theorem of calculus to line integrals.
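
The statement of Theorem 9.4 can be tested on two different curves with the same end points. The sketch below assumes the illustrative potential ψ(x, y, z) = xy + z², the field F = ∇ψ, and two hand-picked paths from A = (0, 0, 0) to B = (1, 1, 1); the names and the midpoint quadrature are choices made for this example.

```python
import math

def psi(x, y, z):
    return x * y + z * z

def grad_psi(x, y, z):
    """Gradient of psi, computed by hand; this is the field F."""
    return (y, x, 2 * z)

def line_integral(r, r_prime, a, b, n=20000):
    """Midpoint approximation of the integral of F(r(t)) . r'(t) over [a, b]."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        field = grad_psi(*r(t))
        velocity = r_prime(t)
        total += sum(fc * vc for fc, vc in zip(field, velocity)) * dt
    return total

# Two curves from A = (0, 0, 0) to B = (1, 1, 1), each with its derivative
straight = (lambda t: (t, t, t), lambda t: (1.0, 1.0, 1.0))
curved = (
    lambda t: (t, t ** 2, math.sin(math.pi * t / 2)),
    lambda t: (1.0, 2 * t, math.pi / 2 * math.cos(math.pi * t / 2)),
)

expected = psi(1.0, 1.0, 1.0) - psi(0.0, 0.0, 0.0)
print(line_integral(*straight, 0.0, 1.0), line_integral(*curved, 0.0, 1.0), expected)
```

Both line integrals should agree with ψ(B) − ψ(A) = 2 up to the quadrature error, independently of the path.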
Proof of Theorem 9.5. Since we already know that every conservative vector field
is circulation free, it remains to prove that every vector field, which is circulation
free, is conservative. Let F be circulation free. We first show that F has the property
of path independence. If C1 and C2 are two curves with initial point A and end point
B, let C denote the curve which first connects A to B via C1 and then B to A via C2
in the opposite direction (the latter curve we denote by −C2 ). Since F is circulation
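free,

\[ 0 = \int_C \mathbf{F} \cdot d\mathbf{r} = \int_{C_1} \mathbf{F} \cdot d\mathbf{r} + \int_{-C_2} \mathbf{F} \cdot d\mathbf{r} = \int_{C_1} \mathbf{F} \cdot d\mathbf{r} - \int_{C_2} \mathbf{F} \cdot d\mathbf{r} , \]

thus

\[ \int_{C_1} \mathbf{F} \cdot d\mathbf{r} = \int_{C_2} \mathbf{F} \cdot d\mathbf{r} . \]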

Therefore, the line integral is path independent. We now fix a point P in D and define
a function ψ by

\[ \psi(\mathbf{x}) = \int_{C_{\mathbf{x}}} \mathbf{F} \cdot d\mathbf{r} , \qquad \mathbf{x} \in D , \tag{D.49} \]

where C x is a curve which connects P to x. (Since the line integral is path independent,
it does not matter which curve we choose.) We claim that ψ is a potential for F in
D. To this end, fix x ∈ D and choose h > 0 so small that the line segment L from
x to x + he1 lies in D, where e1 denotes the unit vector in the x-direction. Since we
obtain a curve from P to x + he1 by first traversing C x and then L, we see from the
definition of ψ that

\[ \psi(\mathbf{x} + h\mathbf{e}_1) = \psi(\mathbf{x}) + \int_L \mathbf{F} \cdot d\mathbf{r} . \tag{D.50} \]

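We parametrize L by r(t) = x + te1 with 0 ≤ t ≤ h, then r′(t) = e1 and

\[ \int_L \mathbf{F} \cdot d\mathbf{r} = \int_0^h \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{e}_1 \, dt = \int_0^h f_1(\mathbf{r}(t)) \, dt , \]

therefore

\[ \frac{\psi(\mathbf{x} + h\mathbf{e}_1) - \psi(\mathbf{x})}{h} = \frac{1}{h} \int_0^h f_1(\mathbf{r}(t)) \, dt . \]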

Since the limit of the right-hand side exists as h → 0 and is equal to f 1 (r(0)) = f 1 (x),
we obtain

\[ \frac{\partial \psi}{\partial x}(\mathbf{x}) = f_1(\mathbf{x}) , \qquad \mathbf{x} \in D . \]
An analogous argument works for the other coordinate directions, so we finally
conclude that ∇ψ = F. The proof is complete.

D.10 Definition of Multiple Integrals

In this section, we present the definition of double and triple integrals,

\[ \int_Q f(x, y) \, dA , \quad \text{respectively} \quad \int_Q f(x, y, z) \, dV , \]

over rectangular regions Q. The exposition closely parallels that given in Appendix
D.5 for the case of a single variable. We, therefore, shorten it here somewhat and
refer the reader to Sect. D.5 for some more details.
The double integral. Let the function f be defined in a rectangular region Q =
[a, b] × [c, d] of the x y-plane. We partition Q into rectangular subregions as follows.
Choose a partition of [a, b] of the form a = x0 < x1 < · · · < xn = b, and a partition
of [c, d] of the form c = y0 < y1 < · · · < ym = d. This gives us a partition Δ of Q
into nm rectangles

\[ Q_{ij} = \{ (x, y) : x_{i-1} \le x \le x_i ,\ y_{j-1} \le y \le y_j \} , \qquad 1 \le i \le n ,\ 1 \le j \le m . \]

A Riemannian sum for the partition Δ is defined as

\[ s_\Delta = \sum_{i=1}^{n} \sum_{j=1}^{m} f(\xi_i, \eta_j)(x_i - x_{i-1})(y_j - y_{j-1}) . \tag{D.51} \]

Here, each point (ξi , η j ) lies somewhere in the rectangle Q i j . We define the oscillation
of f on the rectangles Qij as

\[ O_{ij}(f) = \operatorname{osc}_{Q_{ij}}(f) = \max_{x, z \in Q_{ij}} |f(x) - f(z)| . \tag{D.52} \]

Again, we have to replace the maximum by the supremum if the former does not
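exist. Next, we consider two different Riemannian sums for the same partition Δ,

\[ \begin{aligned} s_\Delta &= \sum_{i=1}^{n} \sum_{j=1}^{m} f(\xi_i, \eta_j)(x_i - x_{i-1})(y_j - y_{j-1}) , \\ \tilde{s}_\Delta &= \sum_{i=1}^{n} \sum_{j=1}^{m} f(\tilde{\xi}_i, \tilde{\eta}_j)(x_i - x_{i-1})(y_j - y_{j-1}) . \end{aligned} \tag{D.53} \]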

Their difference can be estimated as

\[ |s_\Delta - \tilde{s}_\Delta| \le \sum_{i=1}^{n} \sum_{j=1}^{m} O_{ij}(f)(x_i - x_{i-1})(y_j - y_{j-1}) =: V_\Delta(f) . \tag{D.54} \]

The number VΔ ( f ) defined in (D.54) is called the oscillation sum of f for the
partition Δ.
Definition D.2 (Integrable Function) A bounded function f : Q → R is said to
be integrable on Q, if for every ε > 0 there exists a partition Δ of Q such that
VΔ ( f ) ≤ ε.
The latter condition means that, in view of (D.54), we can enforce the difference
between different Riemannian sums for the same partition Δ to become as small as
we want, if we choose Δ fine enough.
Let now f : Q → R be integrable. In order to define its integral, one goes through
the same three steps as we did in Sect. D.5.
1. One proves that
|sΔ − sΔ̃ | < VΔ ( f ) ,

whenever the partition Δ̃ is a refinement of the partition Δ (that is, Δ̃ is obtained


from Δ by adding partition points), for all Riemannian sums sΔ and sΔ̃ for those
partitions.
2. One proves that
|sΔ − sΔ̃ | < VΔ ( f ) + VΔ̃ ( f ) ,

for arbitrary partitions Δ and Δ̃ and all Riemannian sums for those partitions.
3. One proves that there exists a unique number I such that

|I − sΔ | ≤ VΔ ( f ) (D.55)

holds for all partitions and all corresponding Riemannian sums.


We then define the integral of f on Q as

\[ \int_Q f(x, y) \, dA = I , \]

where I is the number obtained in step 3 above.



Theorem .1 Let f : Q → R be continuous. Then f is integrable on Q.

The proof of this theorem is the same as that given in Sect. D.5 for a function of a
single variable, except that one has to replace the intervals by rectangles.
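
A Riemannian sum of the form (D.51) is easy to compute for an equidistant partition. The sketch below assumes the example f(x, y) = xy on Q = [0, 1] × [0, 2], whose integral is 1; the function name `double_riemann_sum` and the `tag` parameter (relative position of the point (ξi, ηj) in each rectangle) are hypothetical. Since this f is nondecreasing in both variables on Q, the lower-left and upper-right corners give the smallest and largest value on each rectangle, so the gap between the two corresponding sums equals the oscillation sum in (D.54).

```python
def double_riemann_sum(f, a, b, c, d, n, m, tag=(0.5, 0.5)):
    """Riemannian sum (D.51) on an equidistant n-by-m partition of [a, b] x [c, d].

    tag = (u, v) with 0 <= u, v <= 1 places xi_i and eta_j inside each subinterval.
    """
    dx, dy = (b - a) / n, (d - c) / m
    total = 0.0
    for i in range(n):
        xi = a + (i + tag[0]) * dx
        for j in range(m):
            eta = c + (j + tag[1]) * dy
            total += f(xi, eta) * dx * dy
    return total

def f(x, y):
    return x * y

for n in (10, 40, 160):
    lower = double_riemann_sum(f, 0.0, 1.0, 0.0, 2.0, n, n, tag=(0.0, 0.0))
    upper = double_riemann_sum(f, 0.0, 1.0, 0.0, 2.0, n, n, tag=(1.0, 1.0))
    print(n, lower, upper, upper - lower)
```

The gap between the two sums shrinks as the partition is refined, while the value 1 of the integral stays between them.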
The triple integral. Let the function f be defined in a rectangular region Q =
[a1 , b1 ] × [a2 , b2 ] × [a3 , b3 ] of three-dimensional space. We partition each of those
intervals,

\[ a_1 = x_0 < \cdots < x_n = b_1 , \quad a_2 = y_0 < \cdots < y_m = b_2 , \quad a_3 = z_0 < \cdots < z_l = b_3 . \]

The partition Δ now consists of rectangular regions

\[ Q_{ijk} = [x_{i-1}, x_i] \times [y_{j-1}, y_j] \times [z_{k-1}, z_k] . \]

The Riemannian sums have the form

\[ s_\Delta = \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{l} f(\xi_i, \eta_j, \zeta_k)(x_i - x_{i-1})(y_j - y_{j-1})(z_k - z_{k-1}) , \]

where (ξi , ηj , ζk ) lies somewhere in Qijk . From this point onwards, the definition of

\[ \int_Q f(x, y, z) \, dV \]

proceeds in a completely analogous manner as in the case of the double integral.


Solutions of Selected Exercises

Solutions to the Exercises of Chap. 1


1.8.1 f (x) = 1/x, f (3/5) = 5/3, f (−2/7) = −7/2.
1.8.2 f (x) = |x| − x, f (2) = 0, f (−2) = | − 2| − (−2) = 2 + 2 = 4, f (50) =
0, f (−40) = 80.
1.8.3 f (x) = 1/(x² − 3) is defined if x² − 3 ≠ 0 or equivalently x² ≠ 3, that is, x ≠ ±√3.
The domain of f is the set of all real numbers different from −√3 and √3, that is,
it is the union of three intervals, (−∞, −√3) ∪ (−√3, √3) ∪ (√3, ∞).
For x = 5 we get f (5) = 1/(5² − 3) = 1/(25 − 3) = 1/22.
1.8.4 (a) f (x) = √x + 5. Since √x is a real number iff x ≥ 0, the domain of f is
[0, ∞) and its range is [5, ∞).
(b) f (x) = √(x − 3). Since √(x − 3) is a real number iff x − 3 ≥ 0, we must
have x ≥ 3. The domain of f is [3, ∞) and its range is [0, ∞).
(c) f (x) = 1/√(9 − x²). The number √(9 − x²) is a real number iff 9 − x² ≥ 0,
that is, x² ≤ 9, or −3 ≤ x ≤ 3. But at x = ±3, √(9 − x²) = 0 and
its reciprocal is not defined. Therefore, the domain of f is (−3, 3) and its
range is [1/3, ∞).
(d) f (x) = |x − 2|. The domain of f is (−∞, ∞), and its range is [0, ∞).
(e) f (x) = x 2 − 6. The domain of f is (−∞, ∞), its range is [−6, ∞).
1.8.5 Let h denote the length of the box and x the edge length of the cross section,
which is a square. The girth (the perimeter of this square) has length 4x. We
must have 4x + h = 108, therefore h = 108 − 4x. The volume V of the box
equals x 2 h. As a function of x only, it becomes

V (x) = x 2 (108 − 4x) = 108x 2 − 4x 3 .

© Springer Nature Singapore Pte Ltd. 2019. M. Brokate et al., Calculus for Scientists and Engineers, Industrial and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6

Since edge length of the square and the length of the box cannot be negative,
we must have x ≥ 0 and h = 108 − 4x ≥ 0, that is, x ≤ 27. Therefore, 0 ≤
x ≤ 27, and the domain of the function V is [0, 27].
1.8.6 Let the radius of the can be r and the height be h. Its total surface area S is
given by
\[ S = 2\pi r^2 + 2\pi r h . \]

Its volume V has to satisfy V = πr²h = 22, therefore h = 22/(πr²). We thus can
write S as a function of r only,

\[ S(r) = 2\pi r^2 + 2\pi r \cdot \frac{22}{\pi r^2} = 2\pi r^2 + \frac{44}{r} . \]

Since r can take any positive value, the domain of S is (0, ∞).
1.8.7
For a = 2 and x = 4 we obtain a^x = 2^4 = 16,
for a = 5 and x = −1 we obtain a^x = 5^(−1) = 1/5,
for a = 2 and x = √2 we obtain a^x = 2^(√2),
for a = e and x = √2 we obtain a^x = e^(√2),
for a = e and x = √π we obtain a^x = e^(√π).

1.8.8 (a) f (x) = x −5 is an odd function.


(b) f (x) = x 4 + 3x 2 − 1 is an even function.
(c) f (x) = x/(x² − 1) is an odd function.
(d) f (t) = √
|t | is an even function.
3

(e) h(t) = t 4 + 3 is an even function.


1.8.9 Given f (x) = x 2 + x + 1, we get

f (x − 4) = (x − 4)2 + (x − 4) + 1 = x 2 + 16 − 8x + x − 4 + 1 = x 2 − 7x + 13 .
f (x + 4) = (x + 4)² + (x + 4) + 1 = x² + 16 + 8x + x + 4 + 1 = x² + 9x + 21 .
f (x/2) = (x/2)² + x/2 + 1 = x²/4 + x/2 + 1 .
f (2x − 4) = (2x − 4)2 + (2x − 4) + 1 = 4x 2 − 16x + 16 + 2x − 4 + 1
= 4x 2 − 14x + 13 .
1.8.10 For f (x) = x + 6, g(x) = x 2 − 4 we obtain
(a) f (g(x)) = f (x 2 − 4) = x 2 − 4 + 6 = x 2 + 2.
(b) f ( f (2)) = f (2 + 6) = f (8) = 8 + 6 = 14.
(c) g(g(3)) = g(9 − 4) = g(5) = 52 − 4 = 21.

(d) f ( f (x)) = f (x + 6) = x + 12.


(e) g( f (x)) = g(x + 6) = (x + 6)2 − 4 = x 2 + 12x + 32.
1.8.11 For f (x) = x + 1, g(x) = 1/(x + 1) we obtain
(a) g( f (1/2)) = g(3/2) = 1/(5/2) = 2/5.
(b) f ( f (x)) = f (x + 1) = x + 2.
(c) g(g(x)) = g(1/(x + 1)) = 1/(1/(x + 1) + 1) = (x + 1)/(x + 2).
(d) f (g(1/3)) = f (1/(1/3 + 1)) = f (3/4) = 3/4 + 1 = 7/4.
1.8.12 (a) ( f ◦ g)(x) = x; g(x) = 1/x. We compute

f (g(x)) = x , therefore f (1/x) = x , therefore f (x) = 1/x .

(b) f (x) = x/(x − 1), g(x) = x/(x − 1). We compute

\[ (f \circ g)(x) = f(g(x)) = f\!\left( \frac{x}{x - 1} \right) = \frac{\frac{x}{x-1}}{\frac{x}{x-1} - 1} = \frac{x}{x - 1} \cdot \frac{x - 1}{1} = x . \]

(c) Inserting f (x) = x/(x − 1) into ( f ◦ g)(x) = x, we get

\[ \frac{g(x)}{g(x) - 1} = x . \]

In order to solve for g(x) we transform as g(x) = x[g(x) − 1] and moreover

\[ g(x)(x - 1) = x , \qquad g(x) = \frac{x}{x - 1} . \]

1.8.16 Let W (t) be the amount of solid waste at time t (in years). Then W (1960) =
82.3 and W (1980) = 139.1. The slope of the linear function W becomes

\[ m = \frac{\Delta W}{\Delta t} = \frac{139.1 - 82.3}{1980 - 1960} = \frac{56.8}{20} = 2.84 \ \text{millions of tons/year.} \]
Δt 1980 − 1960 20

Let the linear function be W (t) = b + mt. Then

82.3 = b + 2.84 · 1960 , therefore b = −5484.1 .



The equation of the function is W (t) = −5484.1 + 2.84t. At t = 2020, we


get W (t) = −5484.1 + 2.84 · 2020 = 252.7 million tons.
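
The arithmetic of this solution can be reproduced in a few lines. The sketch below is only a check of the numbers above; the helper name `line_through` is hypothetical.

```python
def line_through(p, q):
    """Slope m and intercept b of the line W(t) = b + m*t through two points (t, W)."""
    (t0, w0), (t1, w1) = p, q
    m = (w1 - w0) / (t1 - t0)
    b = w0 - m * t0
    return m, b

m, b = line_through((1960, 82.3), (1980, 139.1))

def predict(t):
    """Linear model W(t) = b + m*t, in millions of tons."""
    return b + m * t

print(m, b, predict(2020))
```

Up to rounding, this reproduces the slope 2.84, the intercept −5484.1 and the extrapolated value 252.7 for t = 2020.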
1.8.17 (a) Let the equation of the line be y = f (x) = mx + b. To find the slope m
we use any two points from the table, say the first two, and obtain

\[ m = \frac{\Delta y}{\Delta x} = \frac{180 - 200}{20 - 10} = \frac{-20}{10} = -2 . \]

To find b use any point, say x = 20, y = 180,

180 = −2 · 20 + b , b = 220 .

Hence f (x) = 220 − 2x.


(b) Table 1.8.2(a)

t      40    60    80    100
f(t)   2.4   2.2   2     1.8

Since f (t) decreases by 0.2 for every increase of 20 in t, the table could
represent a linear function with slope −0.2/20 = −0.01.

Table 1.8.2(b)

x      0     2     4     6
g(x)   10    16    26    40

Between x = 0 and x = 2 the value of g(x) increases by 6 as x goes up
by 2. Between x = 2 and x = 4, the value of g(x) increases by 10 as x
goes up by 2. Since the slope of g is not constant, g cannot be a linear
function (the table cannot represent a linear equation).

(c) Table 1.8.2(c)

t      5     10    15    20
h(t)   100   90    80    70

Since h(t) decreases by 10 for every increase of 5 in t, the table can
represent a linear equation, arising from a linear function h with slope
−10/5 = −2.
1.8.18

a (advertisement)   6     8     10    12
S (sales)           200   240   280   320

Since the increase in sales S is 40 (in 1000 Euro) for every increase of 2 (in
1000 Euro) of advertisement a, S could be a linear function

S(a) = ma + b . (*)

To find the slope m, choose any two points, say (6, 200) and (8, 240), and
compute

\[ m = \frac{\Delta S}{\Delta a} = \frac{240 - 200}{8 - 6} = \frac{40}{2} = 20 . \]

To find b, we use the point (6, 200) in (*),

200 = 20 · 6 + b , so b = 200 − 120 = 80 .

The function S is therefore

S(a) = 20a + 80 .

If 3500 Euro is spent on advertising, i.e. a = 3.5, we obtain the sales

S(3.5) = 20 · 3.5 + 80 = 70 + 80 = 150

in thousands of Euro, or 150,000 Euro.


1.8.19 We have the relationship
F = mC + b .

The freezing point of water is F = 32◦ or C = 0◦ , whereas the boiling point


is F = 212◦ or C = 100◦ . This gives the equations

32 = m · 0 + b and 212 = m · 100 + b ,

so b = 32 and m = (212 − 32)/100 = 9/5. We thus can transform F into C and
vice versa by the formulas

\[ F = \frac{9}{5} C + 32 , \qquad C = \frac{5}{9} (F - 32) . \]

The Celsius equivalent of 90 °F is C = (5/9)(90 − 32) ≈ 32.2°. The Fahrenheit
equivalent of −5 °C is F = (9/5)(−5) + 32 = 23°.
5
1.8.20 The height of the box is x, its length is 30 − 2x, and its width is 20 − 2x.
Since the volume V is the product of length, width and height, we obtain

V (x) = (30 − 2x)(20 − 2x)x = 4x 3 − 100x 2 + 600x .

1.8.21 The requirement for the volume gives 6 = 1.5yx.


(a) y = 6/(1.5x) = 4/x.

(b)

\[ S = xy + 2 \cdot 1.5x + 2 \cdot 1.5y = x \cdot \frac{4}{x} + 2 \cdot 1.5x + 2 \cdot 1.5 \cdot \frac{4}{x} = 4 + 3x + \frac{12}{x} . \]
x
1.8.22 (a) Since the corresponding triangles (see Fig. 1.28) are similar,

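\[ \frac{y}{b} = \frac{y + h}{a} , \quad \text{or} \quad ya = by + bh . \]

Solving for y we get the function

\[ y(h) = \frac{bh}{a - b} . \]

(b) The volume V becomes

\[ \begin{aligned} V &= \frac{1}{3} \pi a^2 (y + h) - \frac{1}{3} \pi b^2 y = \frac{\pi}{3} \big[ (a^2 - b^2) y + a^2 h \big] \\ &= \frac{\pi}{3} \Big[ (a^2 - b^2) \frac{bh}{a - b} + a^2 h \Big] = \frac{\pi}{3} \big[ (a + b) b h + a^2 h \big] \\ &= \frac{\pi}{3} h (a^2 + ab + b^2) . \end{aligned} \]

(c) We have 600 = (π/3) h (6² + 6 · 3 + 3²), therefore

\[ h = \frac{1800}{63\pi} = \frac{200}{7\pi} \approx \frac{200 \cdot 7}{7 \cdot 22} = \frac{100}{11} \approx 9.1 \ \text{ft} . \]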
1.8.24 Figure E.1 gives the graph of the function

y = 3 sin 2t .

The waves have a maximum of 3 and a minimum of −3. So the amplitude is


3. The graph completes one full cycle between t = 0 and t = π. Therefore
the period is π.
1.8.25 (a) Figure E.2 shows the graph of the function
\[ y = 4.9 \cos\left(\frac{\pi}{6}\, t\right) + 5 . \]
(b) At $t = 12$, $y = 4.9 \cos(2\pi) + 5 = 9.9$ ft. The water level at high tide was 9.9 ft.

Fig. E.1 Graph of the function y = 3 sin(2t)

Fig. E.2 Graph of the function y = 5 + 4.9 cos(πt/6)

(c) Low tide occurs at $t = 6$ (the minimum value of $\cos(\frac{\pi}{6} t)$ equals −1) and at $t = 18$ (at 6 p.m.). The water level at this time is 0.1 ft.
(d) The period is 12 h and represents the interval between successive high
tides or successive low tides. Here we have assumed the period to be 12 h.
If this were correct, the high tide would always be at noon or midnight, but
actually it progresses through the day. The interval between successive
high tides actually averages about 12 h 24 min, and one should take this
into account in a more precise mathematical model.
(e) The maximum height is 9.9 ft and the minimum is 0.1 ft, so the amplitude is
\[ \frac{9.9 - 0.1}{2} = 4.9 . \]
This is half the difference between the depths at high and low tide.
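
The values quoted in parts (b), (c) and (e) can be reproduced by evaluating the model directly. This is a small Python sketch; the function name `water_level` is an illustrative assumption.

```python
import math

def water_level(t):
    """Tide model from the exercise: y = 4.9 cos(pi t / 6) + 5, t in hours."""
    return 4.9 * math.cos(math.pi * t / 6) + 5

high = water_level(12)          # high tide
low = water_level(6)            # low tide
amplitude = (high - low) / 2    # half the difference between high and low
print(high, low, amplitude)
```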

Solutions to the Exercises of Chap. 2


2.5.1 ⎧

⎨1 − x , x < 0 ,
2

f (x) = 13 , x = 0,


1− x , x > 0.

Considering the first and the last line in the case distinction, we get lim x→0+
f (x) = 1, lim x→0− f (x) = 1. Therefore lim x→0 f (x) exists and is equal to 1.
Let us remark that these results do not depend on the value of f (0), or even
on whether or not f is defined at x = 0.
2.5.2 (a) We have

\[ \lim_{x\to 4} \frac{x^2 - x - 12}{x^2 - 4x} = \lim_{x\to 4} \frac{(x - 4)(x + 3)}{x(x - 4)} = \lim_{x\to 4} \frac{x + 3}{x} = \frac{7}{4} . \]

(b)
\[
\begin{aligned}
\lim_{x\to -4} \frac{x^3 + 64}{x + 4} &= \lim_{x\to -4} \frac{(x + 4)(x^2 + 16 - 4x)}{x + 4} \\
&= \lim_{x\to -4} \left(x^2 + 16 - 4x\right) = (-4)^2 + 16 + 16 = 48 .
\end{aligned}
\]

(c) The quotient rule does not apply immediately since both numerator and
denominator increase without bound as x → ∞. As a first step, we divide
both numerator and denominator by x 2 ,

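\[ \frac{3x^2 + 7x - 6}{4x^2 - 3x + 6} = \frac{3 + \frac{7}{x} - \frac{6}{x^2}}{4 - \frac{3}{x} + \frac{6}{x^2}} . \]

Since on the right-hand side we can pass to the limit as $x \to \infty$, we get
\[
\begin{aligned}
\lim_{x\to\infty} \frac{3x^2 + 7x - 6}{4x^2 - 3x + 6}
&= \frac{\lim_{x\to\infty} \left(3 + \frac{7}{x} - \frac{6}{x^2}\right)}{\lim_{x\to\infty} \left(4 - \frac{3}{x} + \frac{6}{x^2}\right)} \\
&= \frac{\lim_{x\to\infty} 3 + 7 \lim_{x\to\infty} \frac{1}{x} - 6 \lim_{x\to\infty} \frac{1}{x^2}}{\lim_{x\to\infty} 4 - 3 \lim_{x\to\infty} \frac{1}{x} + 6 \lim_{x\to\infty} \frac{1}{x^2}} \\
&= \frac{3 + 7 \cdot 0 - 6 \cdot 0}{4 - 3 \cdot 0 + 6 \cdot 0} = \frac{3}{4} .
\end{aligned}
\]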

(In general, the limit as $x \to \infty$ or as $x \to -\infty$ of any rational function for which the degree of the numerator is less than or equal to the degree of the denominator can be found as in this example.)
(d) We know that
\[ \cos x < \frac{\sin x}{x} < 1 \quad \text{(see Appendix C2)}, \]
and that $\lim_{x\to 0} \cos x = \cos 0 = 1$ and $\lim_{x\to 0} 1 = 1$. By the Sandwich theorem,
\[ \lim_{x\to 0} \frac{\sin x}{x} = 1 . \]

(e)
\[ \lim_{x\to 0} (x + \cos x) = \lim_{x\to 0} x + \lim_{x\to 0} \cos x = 0 + 1 = 1 . \]

(f)
\[ \lim_{x\to 0} \left(e^x + \frac{\sin x}{x}\right) = \lim_{x\to 0} e^x + \lim_{x\to 0} \frac{\sin x}{x} = 1 + 1 = 2 . \]

2.5.3
\[ f(x) = \begin{cases} 8 , & x \text{ rational}, \\ 3 , & x \text{ irrational}. \end{cases} \]

lim x→c f (x) does not exist. Intuitively, as x approaches c, x passes through
both rational and irrational numbers, and f (x) therefore jumps back and forth
between 8 and 3. In view of the formal definition of the limit we see that no
matter how small we choose an interval I = (c − δ, c + δ) around c, there will
always be rational and irrational numbers in it with function values 8 and 3,
respectively. Thus, for ε = 1 there can be no number L such that | f (x) − L| <
ε = 1 for all x in the interval I .
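2.5.4 We have
\[
\begin{aligned}
\frac{1 - \cos x}{x} &= \frac{(1 - \cos x)(1 + \cos x)}{x(1 + \cos x)} = \frac{1 - \cos^2 x}{x(1 + \cos x)} \\
&= \frac{\sin^2 x}{x(1 + \cos x)} = \frac{\sin x}{x} \cdot \frac{\sin x}{1 + \cos x} ,
\end{aligned}
\]
and therefore
\[ \lim_{x\to 0} \frac{1 - \cos x}{x} = \left(\lim_{x\to 0} \frac{\sin x}{x}\right) \cdot \left(\lim_{x\to 0} \frac{\sin x}{1 + \cos x}\right) = 1 \cdot 0 = 0 . \]
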

2.5.5 We consider $f(x) = \cos\left(\frac{\pi}{2} x\right) - x^2$ on $[0, 1]$. Since $f$ is the difference of two continuous functions, it is continuous. We have
\[ f(0) = \cos 0 - 0^2 = 1 > 0 , \qquad f(1) = \cos \frac{\pi}{2} - 1^2 = -1 < 0 . \]
Therefore, by the intermediate value theorem, there exists at least one number $c \in (0, 1)$ such that $f(c) = 0$.
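
The intermediate value theorem only guarantees that such a $c$ exists. One way to locate it numerically is bisection, which repeatedly halves an interval on which $f$ changes sign. The Python sketch below is an illustration under that approach; the helper `bisect_root` and its tolerance are assumptions, not part of the text.

```python
import math

def f(x):
    """Function from exercise 2.5.5."""
    return math.cos(math.pi * x / 2) - x ** 2

def bisect_root(func, lo, hi, tol=1e-10):
    """Find a root of func in [lo, hi], assuming func(lo) and func(hi) differ in sign."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval on which the sign change persists.
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

c = bisect_root(f, 0.0, 1.0)
print(c, f(c))
```

Each step preserves the sign change, so the final midpoint lies within the tolerance of a zero of $f$.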
Solutions to the Exercises of Chap. 3
3.9.1 We have $s(t) = 4t^2 + 3t$. The velocity $v$ at time $t$ is given by $v(t) = s'(t) = 8t + 3$. The velocities at time $t = 0$ and $t = 3$ are $v(0) = 3$ and $v(3) = 24 + 3 = 27$, respectively.
3.9.2 (i) Change in area: $A(r + h) - A(r) = \pi(r + h)^2 - \pi r^2 = 2\pi r h + \pi h^2$.
(ii) Change in circumference: $C(r + h) - C(r) = 2\pi(r + h) - 2\pi r = 2\pi h$.


3.9.3 Hint: Use the method of induction (see Chap. 1).


3.9.4 (i) For $f(x) = (2x^5 - x)(x^3 + 1)$ we get
\[ f'(x) = (2x^5 - x) \cdot 3x^2 + (10x^4 - 1)(x^3 + 1) = 16x^7 + 10x^4 - 4x^3 - 1 . \]

(ii) For $f(x) = 10x^{-4} + 3x^{-2}$ we get $f'(x) = -40x^{-5} - 6x^{-3}$.


(iii) For $f(x) = \dfrac{-3x^3 - 1}{2x^2 + 1}$ we get
\[
\begin{aligned}
f'(x) &= \frac{(2x^2 + 1)(-9x^2) - (-3x^3 - 1) \cdot 4x}{(2x^2 + 1)^2} \\
&= \frac{-18x^4 - 9x^2 + 12x^4 + 4x}{(2x^2 + 1)^2} = \frac{-6x^4 - 9x^2 + 4x}{(2x^2 + 1)^2} \\
&= \frac{-x(6x^3 + 9x - 4)}{(2x^2 + 1)^2} .
\end{aligned}
\]

(iv) For $f(x) = (x^2 + 1)(x - 1)(x + 5)$ we get
\[
\begin{aligned}
f'(x) &= (x^2 + 1)(x - 1 + x + 5) + (x - 1)(x + 5) \cdot 2x \\
&= (x^2 + 1)(2x + 4) + (x^2 + 4x - 5) \cdot 2x \\
&= 2x^3 + 4x^2 + 2x + 4 + 2x^3 + 8x^2 - 10x \\
&= 4x^3 + 12x^2 - 8x + 4 = 4(x^3 + 3x^2 - 2x + 1) .
\end{aligned}
\]

(v) For $f(x) = (1 + x^2)\,x^3 e^x \ln x = (x^3 + x^5)\,e^x \ln x$ we get
\[ f'(x) = (3x^2 + 5x^4)\,e^x \ln x + (x^3 + x^5)\left(\frac{e^x}{x} + e^x \ln x\right) . \]

(vi) For $f(x) = \ln(1 + 3x^2)$ we get $f'(x) = \dfrac{6x}{1 + 3x^2}$.
(vii) For $f(x) = e^{x^2}$ we get $f'(x) = e^{x^2} \cdot 2x$.

3.9.5 (a) For $y = x \ln x - x$ we get
\[ y' = \frac{x}{x} + \ln x - 1 = \ln x , \qquad y'' = \frac{1}{x} , \qquad y''' = -\frac{1}{x^2} . \]
(b) For $y = \dfrac{1}{\sqrt{x^2 + 4}} = (x^2 + 4)^{-1/2}$ we get
\[
\begin{aligned}
y' &= -\frac{1}{2} \cdot 2x\,(x^2 + 4)^{-3/2} = -x(x^2 + 4)^{-3/2} , \\
y'' &= -(x^2 + 4)^{-3/2} + \frac{3}{2}\,x(x^2 + 4)^{-5/2} \cdot 2x = -(x^2 + 4)^{-3/2} + 3x^2 (x^2 + 4)^{-5/2} , \\
y''' &= \frac{3}{2}\,(x^2 + 4)^{-5/2} \cdot 2x - 3x^2 \cdot \frac{5}{2}\,(x^2 + 4)^{-7/2} \cdot 2x + 6x(x^2 + 4)^{-5/2} .
\end{aligned}
\]

(c) For $y = e^{2x}(e^{2x} - e^{-2x}) = e^{4x} - 1$, we get
\[ y' = 4e^{4x} , \qquad y'' = 16e^{4x} , \qquad y''' = 64e^{4x} . \]

3.9.6 (a) For $f(t) = t^{100} + t^{40} + t^2$ we get
\[ f'(t) = 100t^{99} + 40t^{39} + 2t , \qquad f''(t) = 9900t^{98} + 40 \cdot 39\,t^{38} + 2 , \]
\[ f'''(t) = 9900 \cdot 98\,t^{97} + 40 \cdot 39 \cdot 38\,t^{37} . \]

(b) For $f(t) = (3t + 5)^2$ we get
\[ f'(t) = 2(3t + 5) \cdot 3 , \qquad f''(t) = 6 \cdot 3 = 18 , \qquad f'''(t) = 0 . \]

(c) For $f(t) = t^5$ we get
\[ f'(t) = 5t^4 , \qquad f''(t) = 20t^3 , \qquad f'''(t) = 60t^2 . \]

3.9.7 We have $f(t) = g(t)^2$ for all $t$, so $f'(t) = 2g(t)g'(t)$ for all $t$ and therefore $f'(1) = 2g(1)g'(1)$.
3.9.8 We have $y^5 + xy + x^2 = 3$. Differentiating with respect to $x$, considering $y$ as a function of $x$, we get
\[
\begin{aligned}
5y^4 y' + x y' + y + 2x &= 0 , \\
(5y^4 + x)\,y' &= -2x - y , \\
y' &= \frac{-2x - y}{5y^4 + x} .
\end{aligned}
\]

3.9.9 For $y = (3x + 5)^2$ we get $y' = 2(3x + 5) \cdot 3 = 6(3x + 5)$.
For $y = (-5x^2 + x - 1)^2$ we get $y' = 2(-5x^2 + x - 1)(-10x + 1)$.
3.9.10 (a) For $f(x) = \dfrac{1}{(x^5 - x + 1)^3} = (x^5 - x + 1)^{-3}$ we get
\[
\begin{aligned}
f'(x) &= -3(x^5 - x + 1)^{-4}\,\frac{d}{dx}(x^5 - x + 1) \\
&= -3(x^5 - x + 1)^{-4}(5x^4 - 1) = -\frac{3(5x^4 - 1)}{(x^5 - x + 1)^4} .
\end{aligned}
\]

(b) For $f(x) = \sin^3 x$ we get
\[ f'(x) = 3 \sin^2 x \cdot \frac{d}{dx}(\sin x) = 3 \sin^2 x \cdot \cos x . \]
(c) For $f(x) = \dfrac{x}{\sqrt{1 - x^2}}$ we get
\[ f'(x) = \frac{\sqrt{1 - x^2} \cdot 1 - x \cdot \frac{1}{2} \cdot \frac{-2x}{\sqrt{1 - x^2}}}{1 - x^2} = \frac{1}{(1 - x^2)^{3/2}} . \]

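3.9.11 (a) For $y = x^3 \sin^2(5x)$ we get
\[
\begin{aligned}
\frac{dy}{dx} &= x^3 \cdot 2 \sin 5x \cdot \frac{d}{dx}(\sin 5x) + 3x^2 \sin^2(5x) \\
&= 10x^3 \sin 5x \cdot \cos 5x + 3x^2 \sin^2(5x) .
\end{aligned}
\]

(b) For $y = \dfrac{\sin x}{\sec(3x + 1)}$ we get
\[
\begin{aligned}
\frac{dy}{dx} &= \frac{\sec(3x + 1) \cdot \cos x - \sin x \cdot \sec(3x + 1) \cdot \tan(3x + 1) \cdot 3}{[\sec(3x + 1)]^2} \\
&= \frac{\cos x - 3 \sin x \cdot \tan(3x + 1)}{\sec(3x + 1)} .
\end{aligned}
\]

(c) For $y = \cos^3(\sin 2x)$ we get
\[
\begin{aligned}
\frac{dy}{dx} &= 3 \cos^2(\sin 2x) \cdot \frac{d}{dx}\left[\cos(\sin 2x)\right] \\
&= 3 \cos^2(\sin 2x) \cdot \left(-\sin(\sin 2x)\right) \cdot \frac{d}{dx}(\sin 2x) \\
&= -6 \cos^2(\sin 2x) \cdot \sin(\sin 2x) \cdot \cos 2x .
\end{aligned}
\]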

3.9.13 For the surface area we have $S(r) = 4\pi r^2$. The differential at $r$ for a change of an amount $h$ is given by $dS = S'(r)\,h = 8\pi r h$. Since $r = 1$ and $h = 0.01$, we get
\[ \Delta S \approx dS = 8\pi \cdot 1 \cdot 0.01 = 0.08\pi \approx 0.251 \text{ m}^2 . \]
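
To see how good the differential approximation is, one can compare it with the exact change in surface area. The following Python sketch does this for the values used above; the variable names are illustrative.

```python
import math

def surface_area(r):
    """Surface area of a sphere of radius r."""
    return 4 * math.pi * r ** 2

r, h = 1.0, 0.01
exact_change = surface_area(r + h) - surface_area(r)   # Delta S
differential = 8 * math.pi * r * h                     # dS = S'(r) h
print(exact_change, differential, exact_change - differential)
```

The gap between the two numbers equals $4\pi h^2$, the quadratic term dropped by the linear approximation.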

3.9.14 If we differentiate the equation $y^3 = 2x^2 + c$ with respect to $x$, we get $3y^2 \frac{dy}{dx} = 4x$. Therefore, the given differential equation is satisfied.
3.9.15 Differentiating $xy = c$ with respect to $x$, we get $x\,y'(x) + y = 0$, so the given differential equation is satisfied.
3.9.16 (a) $f'(x) = \dfrac{1}{\frac{x}{1 + x^2}} \left[\dfrac{(1 + x^2) - x \cdot 2x}{(1 + x^2)^2}\right] = \dfrac{1 - x^2}{x(1 + x^2)}$.
(b) $f'(x) = \dfrac{1}{x \ln x}$.
(c) $f'(x) = \dfrac{1}{2\sqrt{1 + \ln^2 x}} \cdot \dfrac{2 \ln x}{x} = \dfrac{\ln x}{x \sqrt{1 + \ln^2 x}}$.
(d) $f'(x) = -\dfrac{1}{x} \sin(\ln x)$.
3.9.17 (a) $f'(x) = e^{1/x}\,\dfrac{d}{dx}\left(\dfrac{1}{x}\right) = -\dfrac{1}{x^2}\,e^{1/x}$.
(b)
\[
\begin{aligned}
f'(x) &= \frac{(e^x + e^{-x})(e^x + e^{-x}) - (e^x - e^{-x})(e^x - e^{-x})}{(e^x + e^{-x})^2} \\
&= \frac{(e^{2x} + 2 + e^{-2x}) - (e^{2x} - 2 + e^{-2x})}{(e^x + e^{-x})^2} = \frac{4}{(e^x + e^{-x})^2} .
\end{aligned}
\]

(c) $f'(x) = e^x \cos(e^x)$.

(d) $f'(x) = \dfrac{1}{\cos(e^x)}\,(-\sin e^x) \cdot e^x = -e^x \tan(e^x)$.
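3.9.18 (a) Differentiating $y + \ln(xy) - 2 = 0$, we get
\[ y' + \frac{1}{xy}\left(x y' + y\right) = 0 , \qquad \left(1 + \frac{1}{y}\right) y' + \frac{1}{x} = 0 , \]
therefore
\[ y' = -\frac{y}{x(1 + y)} . \]

(b)
\[
\begin{aligned}
y &= \ln(x \tan y) , \\
y' &= \frac{1}{x \tan y}\left(x (\sec^2 y)\,y' + \tan y\right) , \\
y' \left(1 - \frac{\sec^2 y}{\tan y}\right) &= \frac{1}{x} , \\
y' &= \frac{\tan y}{x(\tan y - \sec^2 y)} .
\end{aligned}
\]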

3.9.19 (a) $f(x) = \sin^{-1}\left(\frac{1}{5} x\right)$,
\[ f'(x) = \frac{1}{\sqrt{1 - \frac{x^2}{25}}} \cdot \frac{1}{5} = \frac{1}{\sqrt{25 - x^2}} . \]

(b) $f(x) = (\tan x)^{-1} = \dfrac{1}{\tan x} = \cot x$,
\[ f'(x) = -\csc^2 x . \]

(c) $f(x) = \sin^{-1} x + \cos^{-1} x$,
\[ f'(x) = \frac{1}{\sqrt{1 - x^2}} - \frac{1}{\sqrt{1 - x^2}} = 0 . \]
3.9.20 Let us define
\[ y(x) = (1 + x)^{1/x} . \]
Taking the natural logarithm on both sides, we get
\[ \ln y(x) = \ln\left((1 + x)^{1/x}\right) = \frac{1}{x} \ln(1 + x) , \qquad \lim_{x\to 0} \ln y(x) = \lim_{x\to 0} \frac{\ln(1 + x)}{x} . \]
This is an indeterminate form of type 0/0. By l'Hôpital's rule
\[ \lim_{x\to 0} \ln y(x) = \lim_{x\to 0} \frac{\frac{1}{1 + x}}{1} = 1 , \]
therefore $\ln y(x) \to 1$ as $x \to 0$. By the continuity of the exponential function
\[ y(x) = e^{\ln y(x)} \to e^1 , \quad \text{as } x \to 0 , \]
that is, $y(x) \to e$ as $x \to 0$. Thus $\lim_{x\to 0} (1 + x)^{1/x} = e$.
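
The limit can also be observed numerically by evaluating $(1 + x)^{1/x}$ for shrinking values of $x$. This Python sketch is only an illustration of the convergence, not a proof; the sample points are arbitrary choices.

```python
import math

def y(x):
    """The function (1 + x)^(1/x) from exercise 3.9.20."""
    return (1 + x) ** (1 / x)

# Evaluate at a few points approaching 0 from the right.
for x in (0.1, 0.01, 0.001, 1e-6):
    print(x, y(x), abs(y(x) - math.e))
```

The printed differences shrink roughly in proportion to $x$, consistent with the limit $e$.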
3.9.21 (a) Let $y(x) = x^{(\ln a)/(1 + \ln x)}$. After taking the logarithm on both sides, we pass to the limit as $x \to 0^+$,
\[
\begin{aligned}
\lim_{x\to 0^+} \ln y &= \lim_{x\to 0^+} \frac{\ln a}{1 + \ln x}\,\ln x = \ln a \cdot \lim_{x\to 0^+} \frac{\ln x}{1 + \ln x} \\
&= \ln a \cdot \lim_{x\to 0^+} \frac{1/x}{1/x} \quad \text{(by l'Hôpital's rule)} \\
&= \ln a .
\end{aligned}
\]
Therefore, $\lim_{x\to 0^+} y(x) = e^{\ln a} = a$.
(b) Let $y(x) = x^{(\ln a)/(1 + \ln x)}$. After taking the logarithm on both sides,
\[
\begin{aligned}
\lim_{x\to +\infty} \ln y(x) &= \lim_{x\to +\infty} \frac{\ln a}{1 + \ln x}\,\ln x = \ln a \cdot \lim_{x\to +\infty} \frac{\ln x}{1 + \ln x} \\
&= \ln a \cdot \lim_{x\to +\infty} \frac{1/x}{1/x} \quad \text{(by l'Hôpital's rule)} \\
&= \ln a .
\end{aligned}
\]
Therefore $\lim_{x\to +\infty} y(x) = e^{\ln a} = a$.

(c) Let $y(x) = (x + 1)^{(\ln a)/x}$. Taking the logarithm on both sides and passing to the limit, we get
\[
\begin{aligned}
\ln y(x) &= \frac{\ln a}{x} \ln(x + 1) , \\
\lim_{x\to 0} \ln y(x) &= \lim_{x\to 0} \ln a \cdot \frac{\ln(x + 1)}{x} = \ln a \cdot \lim_{x\to 0} \frac{1}{x + 1} = \ln a .
\end{aligned}
\]
Therefore, $\lim_{x\to 0} y(x) = e^{\ln a} = a$.
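3.9.22 We consider the identity
\[ \left(\frac{\sin x}{x}\right) \cdot \left(\frac{x \sin\frac{1}{x}}{\sin x}\right) = \sin\frac{1}{x} . \]
We know that $\lim_{x\to 0^+} \frac{\sin x}{x} = 1$. If $\lim_{x\to 0^+} \frac{x \sin(1/x)}{\sin x}$ existed, then $\lim_{x\to 0^+} \sin\left(\frac{1}{x}\right)$ would exist as well. But we know that the latter does not exist because $\sin\left(\frac{1}{x}\right)$ oscillates between −1 and 1 as $x \to 0^+$. So $\lim_{x\to 0^+} \frac{x \sin(1/x)}{\sin x}$ does not exist.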
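3.9.23 (a) Let $y(x) = e^{ax} \sin bx$. We compute
\[
\begin{aligned}
y' &= a e^{ax} \sin bx + b e^{ax} \cos bx , \\
y'' &= a^2 e^{ax} \sin bx + ab\,e^{ax} \cos bx + ab\,e^{ax} \cos bx - b^2 e^{ax} \sin bx \\
&= (a^2 - b^2)\,e^{ax} \sin bx + 2ab\,e^{ax} \cos bx ,
\end{aligned}
\]
and furthermore
\[
\begin{aligned}
y'' - 2a y' + (a^2 + b^2)\,y
&= (a^2 - b^2)\,e^{ax} \sin bx + 2ab\,e^{ax} \cos bx \\
&\quad - 2a\left(a e^{ax} \sin bx + b e^{ax} \cos bx\right) \\
&\quad + (a^2 + b^2)\,e^{ax} \sin bx \\
&= 0 .
\end{aligned}
\]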

(b) Let y = tan−1 x. We get

1 −2x
y = , y = .
1 + x2 (1 + x 2 )2

The following computations yield expressions for cos y and sin y. From the
definition of y, we get

x = tan y = sec2 y − 1 , sec2 y = 1 + x 2 ,
1 1
cos2 y = , cos y = √ ,
1+x 2
1 + x2
 
1 x2
sin y = 1 − cos y = 1 −
2 = .
1+x 2 1 + x2

From these formulas, we conclude that

2x x 1
−2y + 2 sin y cos3 y = − + 2√
(1 + x 2 )2 1+x 2 (1 + x 2 )3/2
= 0.

3.9.24 Let $y(x) = 3^{2x}\,5^{7x}$. Then
\[ \ln y(x) = 2x \ln 3 + 7x \ln 5 . \]
Differentiating with respect to $x$, we get
\[ \frac{1}{y}\,y' = 2 \ln 3 + 7 \ln 5 , \qquad y' = (2 \ln 3 + 7 \ln 5)\,y , \]
so $y'$ is proportional to $y$.
Solutions to Exercises of Chap. 4
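4.6.1 (i) $f(x) = \sec\left(\frac{x}{2}\right)$,
\[ f'(x) = \frac{1}{2} \sec\left(\frac{x}{2}\right) \tan\left(\frac{x}{2}\right) = \frac{1}{2}\,\frac{\sin(x/2)}{\cos^2(x/2)} . \]
$f'(x) = 0$ gives $\sin(x/2) = 0$, which is true for $x = 0$ (in $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$).
\[ f''(x) = \frac{1}{2}\left[\frac{1}{2} \sec^3\left(\frac{x}{2}\right) + \frac{1}{2} \tan^2\left(\frac{x}{2}\right) \sec\left(\frac{x}{2}\right)\right] , \qquad f''(0) = \frac{1}{4} > 0 . \]
Therefore $f$ has a local minimum at $x = 0$.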
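(ii) $f(x) = \tan x - 2 \sec x$, $f'(x) = \sec^2 x - 2 \sec x \cdot \tan x$.
$f'(x) = 0$ gives $\sec x \cdot (\sec x - 2 \tan x) = 0$, i.e. $\sec x = 0$ or $\sec x - 2 \tan x = 0$. Since $\sec x$ never vanishes, this gives
\[ \sec x - 2 \tan x = 0 , \qquad \frac{1}{\cos x} - 2\,\frac{\sin x}{\cos x} = 0 , \]
so $\sin x = \frac{1}{2}$, thus $x = \frac{\pi}{6}$ in $\left(-\frac{\pi}{4}, \frac{\pi}{4}\right)$.
Therefore $x = \pi/6$ is a critical point for $f$.
\[ f''(x) = 2 \sec^2 x \cdot \tan x - 2 \tan^2 x \cdot \sec x - 2 \sec^3 x , \]
\[ f''\left(\frac{\pi}{6}\right) = 2 \cdot \frac{4}{3} \cdot \frac{1}{\sqrt{3}} - 2 \cdot \frac{1}{3} \cdot \frac{2}{\sqrt{3}} - 2 \cdot \frac{8}{3\sqrt{3}} = -\frac{4}{\sqrt{3}} < 0 . \]
Therefore, $x = \pi/6$ is a local maximum for $f$.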
(iii) $f(x) = \sin x - \cos x$,
$f'(x) = \cos x + \sin x$.
$f'(x) = 0$ holds for $x = \frac{3\pi}{4}$ in $[0, \pi]$. Moreover,
\[ f(0) = -1 , \qquad f\left(\frac{3\pi}{4}\right) = \sqrt{2} , \qquad f(\pi) = 1 , \]
so the maximum value equals $\sqrt{2}$ at $x = \frac{3\pi}{4}$, and the minimum value equals −1 at $x = 0$. (We have $f''\left(\frac{3\pi}{4}\right) < 0$.)
(iv)
\[ f(x) = |6 - 4x| = \begin{cases} 6 - 4x , & x < 3/2 , \\ -6 + 4x , & x > 3/2 , \end{cases} \qquad f'(x) = \begin{cases} -4 , & x < 3/2 , \\ 4 , & x > 3/2 . \end{cases} \]
$f'(x)$ does not exist when $x = 3/2$. Since otherwise $f'(x) \neq 0$, the point 3/2 is the only candidate for an extremum in $(-3, 3)$. We have
\[ f(-3) = 18 , \qquad f(3/2) = 0 , \qquad f(3) = 6 , \]
so the maximum value is 18 at $x = -3$ and the minimum value is 0 at $x = 3/2$.
4.6.2 (i) For $f(x) = 3 - 4x - 2x^2$ we get $f'(x) = -4 - 4x$; the condition $f'(x) = 0$ gives $x = -1$, so $f$ has a unique stationary point at $x = -1$. Since $f''(x) = -4$, we get $f''(-1) = -4 < 0$, so $f$ has a local maximum at $x = -1$. It has no local minimum. Since moreover $f(x)$ tends to $-\infty$ for $x \to -\infty$ and for $x \to +\infty$, $f$ must have a global maximum. Since every global maximum is also a local maximum, it has to be at $x = -1$.
(ii) $f(x) = x^3 - 3x - 2$ is a polynomial of odd degree. Therefore, $\lim_{x\to+\infty} f(x)$ and $\lim_{x\to-\infty} f(x)$ have opposite signs (one is $+\infty$ and the other is $-\infty$), so there can be no global extremum.
4.6.3 Let the length of the sides of the rectangle be $x$ and $y$, respectively. Its area $A$ equals $A = xy$. We have $p = 2x + 2y$ for its perimeter, so we may express $y$ as $y = \frac{p}{2} - x$ and obtain
\[ A(x) = \frac{px}{2} - x^2 , \quad \text{where } x \in \left(0, \frac{p}{2}\right) . \]
Now
\[ A'(x) = \frac{p}{2} - 2x , \]
so $A'(x) = 0$ if and only if $x = \frac{p}{4}$. In this case, $y = \frac{p}{4}$. Moreover, $A'' = -2 < 0$ everywhere. Therefore, the area becomes maximal when $x = \frac{p}{4}$ and $y = \frac{p}{4}$, which means that the rectangle is a square.
4.6.4 Let the length of the sides of the rectangle be $x$ and $y$, respectively. Its perimeter equals $p = 2x + 2y$. We have $A = xy$ for its area, so we may express $y$ as $y = A/x$ and consequently
\[ p(x) = 2x + 2\,\frac{A}{x} , \qquad p'(x) = 2 - \frac{2A}{x^2} . \]
We have $p'(x) = 0$ if and only if $x^2 - A = 0$, that is, $x = \sqrt{A}$, since $x$ cannot be negative. Then $y = \sqrt{A}$. Moreover, we have $p''(x) = 4A/x^3$, which is positive for $x = \sqrt{A}$. Thus $p$ has a minimum at $x = \sqrt{A}$. Since now $x = y = \sqrt{A}$, the rectangle of minimal perimeter is a square.

4.6.7 Let $D$ be the point opposite to $A$ such that $\angle ADB = 90^\circ$. Then the distance between $D$ and $B$ equals $\sqrt{3^2 - 1^2} = \sqrt{8}$ km. Let $x$ denote the distance between the point $C$ and the point $D$, so $0 \le x \le \sqrt{8}$. Let $k$ be the cost per km of the pipe above the ground. Then the cost per km of the pipe under the water equals $4k$. The pipe length above ground is $\sqrt{8} - x$, its cost is $k(\sqrt{8} - x)$. The pipe length under water is $\sqrt{1 + x^2}$, its cost is $4k\sqrt{1 + x^2}$. For the total cost $P$ we obtain
\[ P(x) = k(\sqrt{8} - x) + 4k\sqrt{1 + x^2} , \qquad P'(x) = -k + \frac{4kx}{\sqrt{1 + x^2}} . \]
We have $P'(x) = 0$ if and only if $x = \sqrt{1/15} \approx 0.26$ km. This is a minimum since
\[ P''(x) = \frac{4k}{(1 + x^2)^{3/2}} > 0 \]
for all $x$, in particular for $x$ as computed above. The distance from $C$ to $B$ becomes $\sqrt{8} - \sqrt{1/15} \approx 2.57$ km. Note that the solution does not depend on the actual amount of the cost $k$, as long as the proportion between the cost above ground and under water remains fixed.
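
The critical point can be cross-checked numerically by minimizing the cost over a fine grid. The Python sketch below assumes $k = 1$ (the minimizer does not depend on $k$) and uses a brute-force grid search, which is an illustrative choice rather than the method of the text.

```python
import math

def cost(x, k=1.0):
    """Total pipe cost P(x) = k(sqrt(8) - x) + 4k sqrt(1 + x^2)."""
    return k * (math.sqrt(8) - x) + 4 * k * math.sqrt(1 + x * x)

# Closed-form critical point from P'(x) = 0.
x_star = 1 / math.sqrt(15)

# Brute-force search on [0, sqrt(8)] as an independent check.
n = 100_000
grid = [i * math.sqrt(8) / n for i in range(n + 1)]
x_grid = min(grid, key=cost)

print(x_star, x_grid, math.sqrt(8) - x_star)
```

Since $P'' > 0$ everywhere, the cost is convex and the grid minimizer lands next to the closed-form value.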
4.6.8 We have $f(x) = x^2 + px + q$, so $f'(x) = 2x + p$. In order that 1 is an extremum of $f$ we must have $f'(1) = 0$, that is, $2 + p = 0$. This gives $p = -2$, so
\[ f(x) = x^2 - 2x + q . \]
Since moreover $f(1) = 3$ is assumed, we get $3 = 1 - 2 + q$ or $q = 4$. Therefore,
\[ f(x) = x^2 - 2x + 4 . \]
Since $f''(x) = 2$ for all $x$, we have $f''(1) > 0$, and therefore $f$ has a local minimum at $x = 1$.
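4.6.9 We have
\[
\begin{aligned}
f(x) &= \frac{64}{\sin x} + \frac{27}{\cos x} = 64 \csc x + 27 \sec x , \\
f'(x) &= -64 \csc x \cdot \cot x + 27 \sec x \cdot \tan x .
\end{aligned}
\]
We get $f'(x) = 0$ when $-64 \csc x \cdot \cot x + 27 \sec x \cdot \tan x = 0$, and thus compute
\[ \frac{\csc x \cdot \cot x}{\sec x \cdot \tan x} = \frac{27}{64} , \qquad \frac{\cos x \cdot \cos x \cdot \cos x}{\sin x \cdot \sin x \cdot \sin x} = \frac{27}{64} , \]
\[ (\cot x)^3 = \left(\frac{3}{4}\right)^3 , \qquad \tan x = \frac{4}{3} . \]
This gives
\[ \cos x = \frac{3}{5} , \qquad x = \cos^{-1}\left(\frac{3}{5}\right) . \]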

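To obtain the second derivative, we start from
\[ f'(x) = -64\,\frac{\cos x}{\sin^2 x} + 27\,\frac{\sin x}{\cos^2 x} \]
and compute
\[
\begin{aligned}
f''(x) &= -\frac{64}{\sin^4 x} \cdot \left(-\sin^3 x - 2 \cos^2 x \cdot \sin x\right) + \frac{27}{\cos^4 x} \cdot \left(\cos^3 x + 2 \sin^2 x \cdot \cos x\right) \\
&= \frac{64}{\sin x} + 128\,\frac{\cos^2 x}{\sin^3 x} + \frac{27}{\cos x} + 54\,\frac{\sin^2 x}{\cos^3 x} .
\end{aligned}
\]
At $x = \cos^{-1}\left(\frac{3}{5}\right)$ we have $\cos x = \frac{3}{5}$ and $\sin x = \frac{4}{5}$. For this particular $x$ we, therefore, get
\[
\begin{aligned}
f''(x) &= \frac{64}{4/5} + 128 \cdot \frac{9}{25} \cdot \frac{125}{64} + \frac{27}{3/5} + 54 \cdot \frac{16}{25} \cdot \frac{125}{27} \\
&= 16 \cdot 5 + 90 + 45 + 160 > 0 .
\end{aligned}
\]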

Therefore, $f$ has a local minimum at $x = \cos^{-1}\left(\frac{3}{5}\right)$, which lies in the interval $\left(0, \frac{\pi}{2}\right)$. From the formula
\[ f'(x) = 27\,\frac{\cos^3 x \cdot \left(\tan^3 x - \frac{64}{27}\right)}{\sin^2 x \cdot \cos^2 x} = 27\,\frac{\cos x \cdot \left(\tan^3 x - \frac{64}{27}\right)}{\sin^2 x} \]
we see that besides the point $x$ for which $\tan x = \frac{4}{3}$ there is no other critical point in the interval $\left(0, \frac{\pi}{2}\right)$. Since moreover
\[ \lim_{x\to 0^+} f(x) = +\infty = \lim_{x\to(\pi/2)^-} f(x) , \]
there are no local maxima, and the point $x$ as computed above must be a global minimum.
4.6.10 (a) For $f(x) = -2x^3 - 6x^2 + 5$ we get
\[ f'(x) = -6x^2 - 12x = -6x(x + 2) . \]
We have $f'(x) = 0$ for $x = 0$ and $x = -2$, both of which lie within $[-3, 1]$. We compute
\[ f''(x) = -12x - 12 , \qquad f''(0) = -12 < 0 , \qquad f''(-2) = 12 > 0 . \]
Therefore, $f$ has a local maximum at $x = 0$ with maximum value $f(0) = 5$, and a local minimum at $x = -2$ with minimum value $f(-2) = -3$.
(b) For $f(x) = x^4 - 5x^2 + 4$ we get
\[ f'(x) = 4x^3 - 10x = 2x(2x^2 - 5) , \]
so $f'(x) = 0$ holds for $x = 0$ and $x = \pm\sqrt{\frac{5}{2}}$; however, $-\sqrt{\frac{5}{2}} \notin [0, 2]$. We compute
\[ f''(x) = 12x^2 - 10 , \qquad f''(0) = -10 < 0 , \qquad f''\left(\sqrt{\tfrac{5}{2}}\right) = 20 > 0 . \]
Therefore, $f$ has a local maximum at $x = 0$ (which happens to lie at the boundary of the interval) with maximum value $f(0) = 4$, and a local minimum at $x = \sqrt{\frac{5}{2}}$ with minimum value $f\left(\sqrt{\frac{5}{2}}\right) = -\frac{9}{4}$.
4.6.11 Let $y$ be the height of the rectangle and $x$ the radius of the semicircle. The circular arc has length $\pi x$, and the perimeter of the window is $\pi x + 2y + 2x$, which has to be equal to 15, so $2y(x) = 15 - (2 + \pi)x$. The area $A$ thus becomes
\[ A(x) = 2x\,y(x) + \frac{1}{2}\pi x^2 = 15x - \left(2 + \frac{\pi}{2}\right) x^2 . \]
To maximize the area, we compute $x$ from
\[ 0 = A'(x) = 15 - (4 + \pi)x , \quad \text{which gives } x = \frac{15}{4 + \pi} . \]
Since $A''(x) = -(4 + \pi) < 0$ for all $x$, this is indeed a maximum. The corresponding value for $y$ becomes
\[ y = \frac{1}{2}\left[15 - \frac{(2 + \pi)\,15}{4 + \pi}\right] = \frac{15}{4 + \pi} . \]

4.6.12 Let $x$ denote the width of the field, $y$ its length, and $k$ the length of the barn, so $0 < k < y$. The amount of fence needed is $2x + y + (y - k)$, which has to be equal to 500, so
\[ y = \frac{500 + k}{2} - x . \]
For the area $A$ of the field we have
\[ A(x) = x \cdot y(x) = x\,\frac{500 + k}{2} - x^2 . \]
In order to maximize $A$, we need
\[ 0 = A'(x) = \frac{500 + k}{2} - 2x , \quad \text{thus } x = \frac{500 + k}{4} , \]
with the corresponding value $y = \frac{500 + k}{4} = x$. Since $A''(x) = -2 < 0$ for all $x$, we have a maximum. Since $x = y$, the rectangle is, in fact, a square. Note that the result does not depend on the size of the barn.
4.6.13 Let the semicircle be given as $x^2 + y^2 = a^2$ with $y \ge 0$. We consider rectangles with corners $(x, 0)$, $(-x, 0)$, $(x, y)$ and $(-x, y)$ with $0 \le x \le a$ and $y = \sqrt{a^2 - x^2}$. Their area $A$ satisfies $A(x) = 2x\sqrt{a^2 - x^2}$, so
\[ A'(x) = 2\sqrt{a^2 - x^2} - \frac{1}{2}\,\frac{2x \cdot 2x}{\sqrt{a^2 - x^2}} = \frac{2(a^2 - 2x^2)}{\sqrt{a^2 - x^2}} . \]
For maximum area, $A'(x) = 0$, so $x = \frac{1}{2}\sqrt{2}\,a$. Then
\[ y = \sqrt{a^2 - x^2} = \sqrt{a^2 - \frac{a^2}{2}} = \frac{1}{2}\sqrt{2}\,a . \]
Therefore, the length of the rectangle equals $\sqrt{2}\,a$, its width equals $\frac{1}{2}\sqrt{2}\,a$.

Fig. E.3 Maximizing the area of a rectangle

4.6.14 The situation is depicted in Fig. E.3. We have $A = 2xy$ for the area, and $y = 16 - x^2$. We must have $0 \le x \le 4$. We thus get
\[ A(x) = 2x(16 - x^2) = 32x - 2x^3 , \qquad A'(x) = 32 - 6x^2 . \]
For maximum area, $A'(x) = 0$, so $x = \frac{4}{\sqrt{3}}$. We have $A''(x) = -12x$, thus $A''\left(\frac{4}{\sqrt{3}}\right) < 0$. Therefore, $x = \frac{4}{\sqrt{3}}$ maximizes the area, and we have $y = \frac{32}{3}$. The dimensions of the rectangle with the largest area are $\frac{8}{\sqrt{3}}$ and $\frac{32}{3}$.
4.6.15 Let the length of the rectangle be $x$ and its width be $y$, let the circle be given by $\left(\frac{x}{2}\right)^2 + \left(\frac{y}{2}\right)^2 = 10^2$, that is, $x^2 + y^2 = 400$ (Fig. E.4). So $y(x) = \sqrt{400 - x^2}$ and the area $A$ satisfies
\[ A(x) = x\sqrt{400 - x^2} , \qquad A'(x) = \frac{2(200 - x^2)}{\sqrt{400 - x^2}} , \]
where $0 \le x \le 20$. We get $A'(x) = 0$ when $x = \sqrt{200} = 10\sqrt{2}$. For $x = 10\sqrt{2}$ we have $A(x) = 200$, whereas for $x = 0$ and $x = 20$ we get $A(x) = 0$. So the area is maximal when $x = 10\sqrt{2}$ and $y = \sqrt{400 - 200} = 10\sqrt{2}$. (One may check that $A''(x) < 0$ for $x = 10\sqrt{2}$.)

Fig. E.4 Rectangle inscribed in a circle

4.6.16 (a) We have
\[
\begin{aligned}
N(t) &= 5000\left(25 + t e^{-t/20}\right) , \\
N'(t) &= 5000\left(e^{-t/20} - \frac{1}{20}\,t e^{-t/20}\right) = 250(20 - t)\,e^{-t/20} .
\end{aligned}
\]
Therefore, $N'(t) = 0$ at $t = 20$. We have
\[ N(0) = 125000 , \qquad N(20) \approx 161788 , \qquad N(100) \approx 128369 . \]
The global maximum is $N = 161788$ at $t = 20$. (Indeed $N''(t) < 0$ for $t = 20$.) The global minimum is $N = 125000$ at $t = 0$.
(b) A minimum of $N'$ occurs when $0 = N''(t) = 12.5(t - 40)\,e^{-t/20}$, that is, when $t = 40$. (One may check that $N'''(t) > 0$ at this point.)
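4.6.17 Let $r$ and $h$ be the radius and height of the cylinder. The volume of the inscribed cylinder is $V = \pi r^2 h$. We must have $r^2 + \left(\frac{h}{2}\right)^2 = R^2$, thus $r^2 = R^2 - \frac{h^2}{4}$. Moreover, $0 \le h \le 2R$. So
\[
\begin{aligned}
V(h) &= \pi\left(R^2 - \frac{h^2}{4}\right) h = \pi\left(R^2 h - \frac{h^3}{4}\right) , \\
V'(h) &= \pi\left(R^2 - \frac{3}{4}\,h^2\right) .
\end{aligned}
\]
We get $V'(h) = 0$ when $h = \frac{2R}{\sqrt{3}}$. For $h = 0$ and $h = 2R$ we have $V(h) = 0$, for $h = \frac{2R}{\sqrt{3}}$ we have $V = \frac{4\pi}{3\sqrt{3}}\,R^3$. One checks that $V''(h) < 0$ at the latter value of $h$. So the volume is largest when $h = \frac{2R}{\sqrt{3}}$ and $r = \sqrt{\frac{2}{3}}\,R$.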
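
As a numerical sanity check of the optimal height, the following Python sketch evaluates $V(h)$ on a grid for a sphere of radius $R = 1$. Both the choice $R = 1$ and the grid search are assumptions made only for this illustration.

```python
import math

R = 1.0  # sphere radius, chosen arbitrarily for the check

def volume(h):
    """Volume of the inscribed cylinder of height h: pi (R^2 - h^2/4) h."""
    return math.pi * (R ** 2 - h ** 2 / 4) * h

# Closed-form optimum from V'(h) = 0.
h_star = 2 * R / math.sqrt(3)
v_star = 4 * math.pi / (3 * math.sqrt(3)) * R ** 3

# Grid search over 0 <= h <= 2R as an independent check.
n = 100_000
grid = [i * 2 * R / n for i in range(n + 1)]
h_grid = max(grid, key=volume)

print(h_star, h_grid, v_star, volume(h_grid))
```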

Solutions to the Exercises of Chap. 5


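5.5.3 (a) The given series can be written as
\[ 2\left(1 + \frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \cdots + \frac{1}{3^{n-1}} + \cdots\right) = 2 \cdot \frac{1}{1 - \frac{1}{3}} = 3 , \]
as $1 + \frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \cdots + \frac{1}{3^{n-1}} + \cdots$ is a geometric series with common ratio $r = \frac{1}{3}$ and first term $a = 1$.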
(b) It can be written as a geometric series with common ratio $r = \frac{1}{100}$ and first term $a = \frac{9}{100}$. Therefore the sum of the series is $\dfrac{\frac{9}{100}}{1 - \frac{1}{100}} = \dfrac{9}{99} = \dfrac{1}{11}$.
(c) It is a geometric series with $a = 1$ and $r = -\frac{1}{2}$ and so its sum is $\dfrac{1}{1 + \frac{1}{2}} = \dfrac{2}{3}$.
(d) It is a geometric series with $a = 1$ and $r = -2$. Since $|r| = 2 > 1$, the series diverges.
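(e) We have
\[
\begin{aligned}
&\frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} + \frac{1}{4 \cdot 5} + \cdots + \frac{1}{(n + 1)(n + 2)} \\
&\quad = \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \left(\frac{1}{4} - \frac{1}{5}\right) + \cdots + \left(\frac{1}{n + 1} - \frac{1}{n + 2}\right) \\
&\quad = \frac{1}{2} - \frac{1}{n + 2} \to \frac{1}{2} \quad \text{as } n \to \infty .
\end{aligned}
\]

(f) The partial sums of the series are
\[
\begin{aligned}
&5\left(\frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} + \cdots + \frac{1}{n(n + 1)}\right) \\
&\quad = 5\left[\left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n + 1}\right)\right] \\
&\quad = 5\left(1 - \frac{1}{n + 1}\right) \to 5 \quad \text{as } n \to \infty .
\end{aligned}
\]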


5.5.4 (a) It is a geometric series with $a = 1$, $r = -\frac{1}{4}$. Hence, it is convergent and its sum equals $\dfrac{1}{1 - \left(-\frac{1}{4}\right)} = \dfrac{4}{5}$.
(b) It is a convergent series whose sum equals $7 \cdot \dfrac{\frac{1}{4}}{1 - \frac{1}{4}} = \dfrac{7}{3}$.

(c) It is the sum of two convergent series, and

∞    ∞ ∞
5 1 5 1 5 · 21 1
1 11
+ = + = + 3
=5+ = .
n=0
2 n 3n
n=0
2 n
n=0
3n
1 − 1
2
1 − 1
3
2 2

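(d) It is a difference of two convergent series, and
\[ \sum_{n=0}^{\infty} \frac{5}{2^n} - \sum_{n=0}^{\infty} \frac{1}{3^n} = 5 \cdot \frac{1}{1 - \frac{1}{2}} - \frac{1}{1 - \frac{1}{3}} = 10 - \frac{3}{2} = \frac{17}{2} . \]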

2n+1  2n+2 5
(f) Here a = 2 and r = = . Since |r | > 1 the series diverges.
5n 5n+1 2
5.5.5 (a) It is a geometric series with $a = \frac{1}{\sqrt{2}}$ and $r = \frac{1}{\sqrt{2}}$ and hence it is convergent. Its sum is $\dfrac{\frac{1}{\sqrt{2}}}{1 - \frac{1}{\sqrt{2}}} = \dfrac{1}{\sqrt{2} - 1}$.
(b) Since the sequence $s_n = (\sqrt{2})^n$ does not converge to 0, the series is divergent. Alternatively, since it is a geometric series with $r = \sqrt{2} > 1$, it is divergent.
(c) It is a geometric series with $a = \frac{3}{2}$ and $r = -\frac{1}{2}$, hence it is convergent and its sum is $\dfrac{\frac{3}{2}}{1 + \frac{1}{2}} = 1$.
5.5.6 See Sect. 6.9.
5.5.7 (a) Let $s_n = \dfrac{1}{\sqrt{n + 1} + \sqrt{n}}$. We have
\[ s_n = \frac{\sqrt{n + 1} - \sqrt{n}}{(\sqrt{n + 1} + \sqrt{n}) \cdot (\sqrt{n + 1} - \sqrt{n})} = \sqrt{n + 1} - \sqrt{n} . \]
By the Mean Value Theorem, applied to $f(x) = \sqrt{x}$ on the interval $[n, n + 1]$, we have, since $f'(x) \ge f'(n + 1)$ for $x \le n + 1$,
\[ s_n = \sqrt{n + 1} - \sqrt{n} \ge \frac{1}{2\sqrt{n + 1}} . \]
Let $t_n = \dfrac{1}{n + 1}$. We get
\[ \frac{s_n}{t_n} = \frac{\sqrt{n + 1} - \sqrt{n}}{t_n} \ge \frac{\sqrt{n + 1}}{2} , \qquad \lim_{n\to\infty} \frac{s_n}{t_n} = +\infty . \]
Since we know that $\sum_n t_n$ diverges, the limit comparison test implies that $\sum_n s_n$ diverges, too.
(b) We have $s_n = \dfrac{1}{x^n + x^{-n}}$. Consider first the case $x < 1$. Set $t_n = x^n$, then
\[ \frac{s_n}{t_n} = \frac{1}{x^n + x^{-n}} \cdot \frac{1}{x^n} = \frac{1}{x^{2n} + 1} , \qquad \lim_{n\to\infty} \frac{s_n}{t_n} = 1 . \]
Since $\sum t_n$ is convergent, the limit comparison test implies that $\sum s_n$ is also convergent.
In the case $x > 1$, we consider $u_n = x^{-n}$ and get
\[ \frac{s_n}{u_n} = \frac{1}{x^n + x^{-n}} \cdot x^n = \frac{1}{1 + x^{-2n}} , \qquad \lim_{n\to\infty} \frac{s_n}{u_n} = 1 . \]
But $\sum u_n$ is convergent, so $\sum s_n$ is also convergent.
When $x = 1$, $\sum s_n = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots$, which is divergent. Hence, $\sum s_n$ converges for $x < 1$ and $x > 1$ but diverges for $x = 1$.
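5.5.8 (a) We have $s_n = \dfrac{x^{2n-2}}{(n + 1)\sqrt{n}}$ and $s_{n+1} = \dfrac{x^{2n}}{(n + 2)\sqrt{n + 1}}$. The ratio becomes
\[ \frac{s_{n+1}}{s_n} = \frac{x^{2n}}{(n + 2)\sqrt{n + 1}} \cdot \frac{(n + 1)\sqrt{n}}{x^{2n-2}} = \frac{n + 1}{n + 2} \left(\frac{n}{n + 1}\right)^{1/2} x^2 = \frac{1 + 1/n}{1 + 2/n} \cdot \frac{1}{\sqrt{1 + 1/n}}\; x^2 . \]
Thus,
\[ \lim_{n\to\infty} \frac{s_{n+1}}{s_n} = x^2 . \]
By the ratio test, $\sum s_n$ converges if $x^2 < 1$ and diverges if $x^2 > 1$. If $x^2 = 1$, then
\[ s_n = \frac{1}{(n + 1)\sqrt{n}} = \frac{1}{n^{3/2}} \cdot \frac{1}{1 + 1/n} . \]
Taking $t_n = \frac{1}{n^{3/2}}$, we get $\lim_{n\to\infty} \frac{s_n}{t_n} = \lim_{n\to\infty} \frac{1}{1 + 1/n} = 1$. Since $\sum t_n = \sum \frac{1}{n^{3/2}}$ is a convergent series, the limit comparison test implies that $\sum s_n$ is also convergent. Hence the given series converges if $x^2 \le 1$ and diverges if $x^2 > 1$.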
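
(b) Here we have
\[ \frac{s_{n+1}}{s_n} = \frac{2^{n+1} - 2}{2^{n+1} + 1}\,x^n \cdot \frac{2^n + 1}{2^n - 2} \cdot \frac{1}{x^{n-1}} = \frac{2 - \frac{2}{2^n}}{2 + \frac{1}{2^n}} \cdot \frac{1 + \frac{1}{2^n}}{1 - \frac{2}{2^n}} \cdot x , \]
therefore $\lim_{n\to\infty} \frac{s_{n+1}}{s_n} = \frac{2 - 0}{2 + 0} \cdot \frac{1 + 0}{1 - 0} \cdot x = x$. Thus by the ratio test, $\sum s_n$ converges for $x < 1$ and diverges for $x > 1$. But the ratio test is not applicable for $x = 1$. In that case,
\[ \lim_{n\to\infty} s_n = \lim_{n\to\infty} \frac{2^n - 2}{2^n + 1} = \lim_{n\to\infty} \frac{1 - \frac{2}{2^n}}{1 + \frac{1}{2^n}} = 1 \neq 0 . \]
Therefore, $\sum s_n$ diverges for $x = 1$. Hence the given series converges for $x < 1$ and diverges for $x \ge 1$.
5.5.10 (a) We have $\left(\dfrac{n^2}{2^n}\right)^{1/n} = \dfrac{(n^2)^{1/n}}{(2^n)^{1/n}} = \dfrac{n^{2/n}}{2}$ and therefore
\[ \lim_{n\to\infty} \left(\frac{n^2}{2^n}\right)^{1/n} = \lim_{n\to\infty} \frac{n^{2/n}}{2} = \frac{1}{2} < 1 . \]
Thus, the root test implies that $\sum \frac{n^2}{2^n}$ converges.
(b) We have
\[ \left[\left(\frac{1}{1 + n}\right)^n\right]^{1/n} = \frac{1}{1 + n} , \qquad \lim_{n\to\infty} \frac{1}{1 + n} = 0 < 1 . \]
Therefore the root test implies that $\sum \left(\frac{1}{1 + n}\right)^n$ converges.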
5.5.11 We have
\[ (s_n)^{1/n} = \begin{cases} \dfrac{n^{1/n}}{2} , & n \text{ is odd}, \\[1ex] \dfrac{1}{2} , & n \text{ is even}. \end{cases} \]
Therefore
\[ \frac{1}{2} \le (s_n)^{1/n} \le \frac{n^{1/n}}{2} \]
holds for all $n$. Since $n^{1/n} \to 1$ as $n \to \infty$, we have $\lim_{n\to\infty} s_n^{1/n} = \frac{1}{2} < 1$ (by the Sandwich theorem). Therefore, the series converges by the root test.
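5.5.12 For $f(x) = \dfrac{1}{1 + x}$ we get
\[ f'(x) = -\frac{1}{(1 + x)^2} , \quad f''(x) = \frac{2}{(1 + x)^3} , \quad f'''(x) = -\frac{2 \cdot 3}{(1 + x)^4} , \ \ldots , \quad f^{(n)}(x) = (-1)^n\,\frac{n!}{(1 + x)^{n+1}} , \]
\[ f(2) = \frac{1}{1 + 2} = \frac{1}{3} , \quad f'(2) = -\frac{1}{3^2} , \quad f''(2) = \frac{2}{3^3} , \quad f'''(2) = \frac{-6}{3^4} , \ \ldots , \quad f^{(n)}(2) = \frac{(-1)^n\,n!}{3^{n+1}} . \]
The required Taylor series is
\[
\begin{aligned}
\frac{1}{1 + x} &= f(2) + f'(2)(x - 2) + \frac{f''(2)}{2!}(x - 2)^2 + \frac{f'''(2)}{3!}(x - 2)^3 + \cdots \\
&= \frac{1}{3} - \frac{1}{3^2}(x - 2) + \frac{1}{3^3}(x - 2)^2 - \frac{1}{3^4}(x - 2)^3 + \cdots + \frac{(-1)^n}{3^{n+1}}(x - 2)^n + \cdots .
\end{aligned}
\]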

5.5.13 The series expansion gives
\[ e = 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} + R_n(1) \]
with
\[ R_n(1) = e^c\,\frac{1}{(n + 1)!} \]
for some $c$ between 0 and 1. For the purpose of this example, we assume that we know that $e < 3$. Hence we are certain that
\[ \frac{1}{(n + 1)!} < R_n(1) < \frac{3}{(n + 1)!} , \]
because $1 < e^c < 3$ for $0 < c < 1$.
Since $9! = 362880$, we have $1/9! > 10^{-6}$, whereas $3/10! < 10^{-6}$. Thus, we should take $(n + 1)$ to be at least 10 or $n$ to be at least 9. With an error of less than $10^{-6}$,
\[ e \approx 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{9!} \approx 2.718282 . \]
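
The error bound can be checked directly by summing the series up to $n = 9$ and comparing with the value of $e$ from the standard library. The helper name `partial_sum` in this Python sketch is an illustrative choice.

```python
import math

def partial_sum(n):
    """Sum of 1/k! for k = 0, ..., n."""
    return sum(1 / math.factorial(k) for k in range(n + 1))

approx = partial_sum(9)
error = math.e - approx                  # remainder R_9(1)
lower = 1 / math.factorial(10)           # lower bound 1/(n+1)!
upper = 3 / math.factorial(10)           # upper bound 3/(n+1)!
print(approx, error, lower, upper)
```

The computed remainder lies between the two bounds and is below $10^{-6}$, as the estimate predicts.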
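5.5.14 (a) The central angle $\theta$ of a circle of radius $R$ subtended by an arc of length $s$ is $\theta = \frac{s}{R}$. From the figure in the text,
\[ \cos\theta = \frac{R}{R + C} \quad\text{or}\quad \sec\theta = \frac{R + C}{R} \quad\text{or}\quad R \sec\theta = R + C \quad\text{or}\quad C = R(\sec\theta - 1) , \]
that is, $C = R\left(\sec\frac{s}{R} - 1\right)$.

(b) Let $f(x) = \sec x$. Then
\[
\begin{aligned}
f'(x) &= \sec x \tan x , \\
f''(x) &= \sec^3 x + \sec x \tan^2 x , \\
f'''(x) &= 5 \sec^3 x \tan x + \sec x \tan^3 x , \\
f^{iv}(x) &= 18 \sec^3 x \tan^2 x + 5 \sec^5 x + \sec x \tan^4 x , \\
f(0) &= 1 , \quad f'(0) = 0 , \quad f''(0) = 1 , \quad f'''(0) = 0 , \quad f^{iv}(0) = 5 .
\end{aligned}
\]
Using the Maclaurin series,
\[ f(x) = \sec x \approx 1 + \frac{1}{2!}\,x^2 + \frac{5}{4!}\,x^4 , \]
\[ C = R\left(\sec\frac{s}{R} - 1\right) \approx R\left(1 + \frac{s^2}{2R^2} + \frac{5s^4}{24R^4} - 1\right) = \frac{s^2}{2R} + \frac{5s^4}{24R^3} . \]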
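
To get a feeling for the accuracy of this approximation, the Python sketch below compares it with the exact expression. The sample values $R = 3960$ and $s = 100$ are hypothetical inputs chosen only for the comparison; they are not data from the exercise.

```python
import math

def correction_exact(R, s):
    """Exact value C = R (sec(s/R) - 1)."""
    return R * (1 / math.cos(s / R) - 1)

def correction_approx(R, s):
    """Fourth-order Maclaurin approximation s^2/(2R) + 5 s^4/(24 R^3)."""
    return s ** 2 / (2 * R) + 5 * s ** 4 / (24 * R ** 3)

R, s = 3960.0, 100.0   # hypothetical sample values
exact = correction_exact(R, s)
approx = correction_approx(R, s)
print(exact, approx, abs(exact - approx))
```

For small ratios $s/R$ the two values agree closely, since the first neglected term is of order $(s/R)^6$.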
Solutions to the Exercises of Chap. 6

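6.11.1 $\displaystyle\int (4x + 5)\,dx = 4\,\frac{x^2}{2} + 5x + C = 2x^2 + 5x + C$.
6.11.2 $\displaystyle\int (9t^2 - 5t + 9)\,dt = 9\,\frac{t^3}{3} - 5\,\frac{t^2}{2} + 9t + C = 3t^3 - \frac{5}{2}\,t^2 + 9t + C$.
6.11.3 $\displaystyle\int \left(3\sqrt{u} + \frac{2}{\sqrt{u}}\right) du = 3 \cdot \frac{u^{3/2}}{3/2} + 2\,\frac{u^{-\frac{1}{2} + 1}}{-\frac{1}{2} + 1} + C = 2u^{3/2} + 4\sqrt{u} + C$.
6.11.4 $\displaystyle\int (5z^{-7} + 7z^{-3} - z)\,dz = 5\,\frac{z^{-6}}{-6} + 7\,\frac{z^{-2}}{-2} - \frac{z^2}{2} + C = -\frac{5}{6z^6} - \frac{7}{2z^2} - \frac{z^2}{2} + C$.
6.11.5
\[
\begin{aligned}
\int (x^{2/3} - 4x^{-1/5} + 4)\,dx &= \frac{x^{\frac{2}{3} + 1}}{\frac{2}{3} + 1} - 4\,\frac{x^{-\frac{1}{5} + 1}}{-\frac{1}{5} + 1} + 4x + C \\
&= \frac{3}{5}\,x^{5/3} - 5x^{4/5} + 4x + C .
\end{aligned}
\]
6.11.6 $\displaystyle\int u(1 + u^2)\,du = \int (u + u^3)\,du = \frac{u^2}{2} + \frac{u^4}{4} + C$.
6.11.7 $\displaystyle\int (2x^{-1} - \sqrt{2}\,e^x)\,dx = 2 \ln x - \sqrt{2}\,e^x + C$.
6.11.8 $\displaystyle\int (x^{2/3} - \sin x)\,dx = \frac{x^{\frac{2}{3} + 1}}{\frac{2}{3} + 1} + \cos x + C = \frac{3}{5}\,x^{5/3} + \cos x + C$.
6.11.9 $\displaystyle\int \frac{\sec x}{\cos x}\,dx = \int \sec^2 x\,dx = \tan x + C$.
6.11.10 $\displaystyle\int \frac{\sec u \sin u}{\cos u}\,du = \int \sec u \tan u\,du = \sec u + C$.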
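6.11.11 $\displaystyle\int (1 + \sin^2\theta \csc\theta)\,d\theta = \int (1 + \sin\theta)\,d\theta = \theta - \cos\theta + C$.
6.11.12 $\displaystyle\int \frac{\sin 2\theta}{\cos\theta}\,d\theta = \int \frac{2 \sin\theta \cos\theta}{\cos\theta}\,d\theta = \int 2 \sin\theta\,d\theta = -2 \cos\theta + C$.
6.11.13
\[
\begin{aligned}
\int \frac{(1 + \cot^2 x) \cot x}{\csc x}\,dx &= \int \frac{\csc^2 x \cot x}{\csc x}\,dx = \int \csc x \cot x\,dx \\
&= -\csc x + C .
\end{aligned}
\]

6.11.14 $\dfrac{d}{dx}(\sin^2 x) = 2 \sin x \cos x$; $\dfrac{d}{dx}(-\cos^2 x) = -2 \cos x\,(-\sin x) = 2 \sin x \cos x$;
$\dfrac{d}{dx}\left(-\dfrac{1}{2} \cos 2x\right) = -\dfrac{1}{2}(-2 \sin 2x) = \sin 2x = 2 \sin x \cos x$.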
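6.11.15 $u = x^2 + 1$, $du = 2x\,dx$,
\[ \int 2x(x^2 + 1)^9\,dx = \int u^9\,du = \frac{u^{10}}{10} + C = \frac{(x^2 + 1)^{10}}{10} + C . \]

6.11.16 $u = x^2 + 6$; $du = 2x\,dx$; $x\,dx = \frac{1}{2}\,du$,
\[ \int \frac{x}{x^2 + 6}\,dx = \frac{1}{2} \int \frac{1}{u}\,du = \frac{1}{2} \ln |u| + C = \frac{1}{2} \ln(x^2 + 6) + C . \]

6.11.17 $u = 1 + \cos 3x$; $du = -3 \sin 3x\,dx$,
\[ \int \frac{\sin 3x}{1 + \cos 3x}\,dx = -\frac{1}{3} \int \frac{1}{u}\,du = -\frac{1}{3} \ln |u| + C = -\frac{1}{3} \ln |1 + \cos 3x| + C . \]

6.11.18 $u = 1 + e^{2x}$; $du = 2e^{2x}\,dx$,
\[ \int e^{2x} \sqrt{1 + e^{2x}}\,dx = \frac{1}{2} \int \sqrt{u}\,du = \frac{1}{2}\,\frac{u^{3/2}}{3/2} + C = \frac{1}{3}\,(1 + e^{2x})^{3/2} + C . \]

6.11.19 $u = 1 - \cos 2x$, $\frac{1}{2}\,du = \sin 2x\,dx$,
\[ \int \frac{\sin 2x}{\sqrt{1 - \cos 2x}}\,dx = \frac{1}{2} \int u^{-1/2}\,du = u^{1/2} + C = \sqrt{1 - \cos 2x} + C . \]

6.11.20 $u = 1 + \cos x$; $-du = \sin x\,dx$,
\[
\begin{aligned}
\int \sin x\,(1 + \cos x)^2\,dx &= -\int u^2\,du = -\frac{1}{3}\,u^3 + C \\
&= -\frac{1}{3}\,(1 + \cos x)^3 + C = -\frac{1}{3}\,(1 + \cos^3 x + 3 \cos^2 x + 3 \cos x) + C \\
&= -\cos x - \cos^2 x - \frac{1}{3} \cos^3 x + D , \quad \text{where } D = C - \frac{1}{3} .
\end{aligned}
\]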
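6.11.21 $u = \sin 3\theta$; $du = 3 \cos 3\theta\,d\theta$,
\[ \int \sin^5 3\theta \cos 3\theta\,d\theta = \frac{1}{3} \int u^5\,du = \frac{1}{3}\,\frac{u^6}{6} + C = \frac{1}{18} \sin^6 3\theta + C . \]
6.11.22 (a) $\displaystyle\int_{-2}^{6} 12\,dx = \big[12x\big]_{-2}^{6} = 12 \cdot 6 - 12 \cdot (-2) = 72 + 24 = 96$.
(b)
\[
\begin{aligned}
\int_{-1}^{4} (4 - 6x)\,dx &= \left[4x - 6\,\frac{x^2}{2}\right]_{-1}^{4} = \left(4 \cdot 4 - 6 \cdot \frac{4^2}{2}\right) - \left(4 \cdot (-1) - 6 \cdot \frac{1}{2}\right) \\
&= (16 - 48) - (-4 - 3) = -32 + 7 = -25 .
\end{aligned}
\]
(c) $\displaystyle\int_{3}^{3} 900\,dx = 0$.
(d) $\displaystyle\int_{-4}^{4} 2\,dx = \big[2x\big]_{-4}^{4} = 2 \cdot 4 - 2 \cdot (-4) = 8 + 8 = 16$.
(e)
\[
\begin{aligned}
\int_{2}^{8} (2x^2 + 5x + 2)\,dx &= \left[2\,\frac{x^3}{3} + 5\,\frac{x^2}{2} + 2x\right]_{2}^{8} \\
&= \left(\frac{2}{3} \cdot 8^3 + 5 \cdot \frac{8^2}{2} + 16\right) - \left(2 \cdot \frac{2^3}{3} + 5 \cdot \frac{2^2}{2} + 4\right) \\
&= \left(\frac{1024}{3} + 160 + 16\right) - \left(\frac{16}{3} + 10 + 4\right) \\
&= \left(\frac{1024}{3} - \frac{16}{3}\right) + 162 \\
&= \frac{1008}{3} + 162 = 336 + 162 = 498 .
\end{aligned}
\]
(f)
\[
\begin{aligned}
\int_{0}^{2} \sqrt{4 - x^2}\,dx &= \left[\frac{x\sqrt{4 - x^2}}{2} + \frac{4}{2} \sin^{-1}\frac{x}{2}\right]_{0}^{2} \quad \text{(substituting } x = 2 \sin\theta\text{)} \\
&= 0 + 2 \sin^{-1} 1 - 2 \sin^{-1} 0 = 2 \cdot \frac{\pi}{2} = \pi .
\end{aligned}
\]
(g)
\[
\begin{aligned}
\int_{-\pi/3}^{\pi/3} \sin x\,dx &= \big[-\cos x\big]_{-\pi/3}^{\pi/3} = -\left(\cos\frac{\pi}{3} - \cos\left(-\frac{\pi}{3}\right)\right) \\
&= -\left(\cos\frac{\pi}{3} - \cos\frac{\pi}{3}\right) = 0 .
\end{aligned}
\]
Alternatively, since the sine is an odd function ($\sin(-x) = -\sin x$), we conclude directly that
\[ \int_{-\pi/3}^{\pi/3} \sin x\,dx = 0 . \]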

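6.11.23 (a)
\[
\begin{aligned}
\int_{6}^{1} f(x)\,dx + \int_{-3}^{6} f(x)\,dx &= -\int_{1}^{6} f(x)\,dx + \int_{-3}^{1} f(x)\,dx + \int_{1}^{6} f(x)\,dx \\
&= \int_{-3}^{1} f(x)\,dx .
\end{aligned}
\]

(b)
\[
\begin{aligned}
\int_{-2}^{6} f(x)\,dx - \int_{-2}^{2} f(x)\,dx &= \left[\int_{-2}^{2} f(x)\,dx + \int_{2}^{6} f(x)\,dx\right] - \int_{-2}^{2} f(x)\,dx \\
&= \int_{2}^{-2} f(x)\,dx + \int_{-2}^{6} f(x)\,dx = \int_{2}^{6} f(x)\,dx .
\end{aligned}
\]

(c) $\displaystyle\int_{d}^{h} f(x)\,dx - \int_{g}^{h} f(x)\,dx = \int_{d}^{h} f(x)\,dx + \int_{h}^{g} f(x)\,dx = \int_{d}^{g} f(x)\,dx$.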
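6.11.24 (a)
\[
\begin{aligned}
\int_{-3}^{7} f(x)\,dx &= \int_{-3}^{4} -(x - 4)\,dx + \int_{4}^{7} (x - 4)\,dx \\
&= \left[4x - \frac{x^2}{2}\right]_{-3}^{4} + \left[\frac{1}{2}\,x^2 - 4x\right]_{4}^{7} \\
&= (16 - 8) - \left(-12 - \frac{9}{2}\right) + \left(\frac{49}{2} - 28\right) - (8 - 16) \\
&= 8 - \left(-\frac{33}{2}\right) + \left(-\frac{7}{2}\right) + 8 = 16 + 13 = 29 .
\end{aligned}
\]

(b)
\[
\begin{aligned}
\int_{-2}^{8} g(x)\,dx &= \int_{-2}^{3} 1\,dx + \int_{3}^{8} x\,dx = \big[x\big]_{-2}^{3} + \left[\frac{x^2}{2}\right]_{3}^{8} \\
&= (3 - (-2)) + \left(\frac{64}{2} - \frac{9}{2}\right) = 5 + \frac{55}{2} = \frac{65}{2} .
\end{aligned}
\]
6.11.25 (a) $\displaystyle\int \ln x\,dx = \int 1 \cdot \ln x\,dx = x \cdot \ln x - \int x \cdot \frac{1}{x}\,dx = x \ln x - x + C$.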
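(b)
\[
\begin{aligned}
\int (\cos x)^2\,dx &= \int (\cos x)(\cos x)\,dx = \sin x \cos x - \int \sin x\,(-\sin x)\,dx \\
&= \sin x \cos x + \int \sin^2 x\,dx \\
&= \sin x \cos x + \int \frac{1 - \cos 2x}{2}\,dx \\
&= \sin x \cos x + \frac{x}{2} - \frac{\sin 2x}{4} + C \\
&= \frac{\sin 2x}{2} + \frac{x}{2} - \frac{\sin 2x}{4} + C = \frac{1}{4} \sin 2x + \frac{x}{2} + C .
\end{aligned}
\]
(c)
\[
\begin{aligned}
\int \tan^{-1} x\,dx &= \int 1 \cdot \tan^{-1} x\,dx = x \tan^{-1} x - \int x\,\frac{1}{1 + x^2}\,dx \\
&= x \tan^{-1} x - \frac{1}{2} \int \frac{2x}{1 + x^2}\,dx \\
&= x \tan^{-1} x - \frac{1}{2} \ln(1 + x^2) + C \\
&= x \tan^{-1} x - \ln\sqrt{1 + x^2} + C .
\end{aligned}
\]

(d)
\[
\begin{aligned}
\int x^3 e^x\,dx &= e^x \cdot x^3 - \int 3x^2 e^x\,dx \\
&= x^3 e^x - 3\left[x^2 e^x - \int 2x e^x\,dx\right] \\
&= x^3 e^x - 3x^2 e^x + 6\left[x e^x - \int e^x\,dx\right] \\
&= x^3 e^x - 3x^2 e^x + 6x e^x - 6e^x + C .
\end{aligned}
\]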

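(e)
\[ \begin{aligned}
\int_{0}^{1} x e^{-3x} \, dx &= \left[ x \frac{e^{-3x}}{-3} \right]_{0}^{1} + \frac{1}{3} \int_{0}^{1} e^{-3x} \, dx \\
&= -\frac{1}{3} e^{-3} + \frac{1}{3} \left[ \frac{e^{-3x}}{-3} \right]_{0}^{1} = -\frac{e^{-3}}{3} - \frac{1}{9} (e^{-3} - 1) \\
&= -\frac{4}{9} e^{-3} + \frac{1}{9} .
\end{aligned} \]
(f)
\[ \begin{aligned}
\int_{0}^{4} \ln (x^2 + 1) \, dx &= \int_{0}^{4} 1 \cdot \ln (x^2 + 1) \, dx \\
&= \big[ x \ln (x^2 + 1) \big]_{0}^{4} - \int_{0}^{4} \frac{x}{x^2 + 1} \cdot 2x \, dx \\
&= 4 \ln 17 - 2 \int_{0}^{4} \frac{x^2 + 1 - 1}{x^2 + 1} \, dx \\
&= 4 \ln 17 - 2 \left( \int_{0}^{4} 1 \, dx - \int_{0}^{4} \frac{1}{x^2 + 1} \, dx \right) \\
&= 4 \ln 17 - 2 \left( \big[ x \big]_{0}^{4} - \big[ \tan^{-1} x \big]_{0}^{4} \right) \\
&= 4 \ln 17 - 8 + 2 \tan^{-1} 4 .
\end{aligned} \]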

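6.11.26 (a)
\[ \int_{0}^{\infty} e^{-x} \, dx = \lim_{b \to \infty} \int_{0}^{b} e^{-x} \, dx = \lim_{b \to \infty} \big[ -e^{-x} \big]_{0}^{b} = \lim_{b \to \infty} (-e^{-b} + 1) = 1 . \]
(b)
\[ \int_{-1}^{\infty} \frac{x}{1 + x^2} \, dx = \lim_{b \to \infty} \left[ \frac{1}{2} \ln (1 + x^2) \right]_{-1}^{b} = \lim_{b \to \infty} \frac{1}{2} \left[ \ln (1 + b^2) - \ln 2 \right] = +\infty , \]
so this integral is divergent.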



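(c)
\[ \begin{aligned}
\int_{5}^{\infty} \frac{2}{x^2 - 1} \, dx &= \lim_{b \to \infty} \int_{5}^{b} \left( \frac{1}{x - 1} - \frac{1}{x + 1} \right) dx = \lim_{b \to \infty} \left[ \ln \frac{x - 1}{x + 1} \right]_{5}^{b} \\
&= \lim_{b \to \infty} \left( \ln \frac{b - 1}{b + 1} - \ln \frac{4}{6} \right) = -\ln \frac{2}{3} = \ln \frac{3}{2} .
\end{aligned} \]
(d)
\[ \begin{aligned}
\int_{-\infty}^{0} \frac{e^x}{3 - 2e^x} \, dx &= \lim_{b \to -\infty} \frac{1}{2} \int_{b}^{0} \frac{2e^x}{3 - 2e^x} \, dx = \lim_{b \to -\infty} \left[ -\frac{1}{2} \ln |3 - 2e^x| \right]_{b}^{0} \\
&= \lim_{b \to -\infty} \frac{1}{2} \ln (3 - 2e^b) = \frac{1}{2} \ln 3 .
\end{aligned} \]
(e)
\[ \int_{1}^{\infty} \frac{1}{x^4} \, dx = \lim_{b \to \infty} \left[ \frac{x^{-3}}{-3} \right]_{1}^{b} = \lim_{b \to \infty} -\frac{1}{3} \left( \frac{1}{b^3} - 1 \right) = \frac{1}{3} . \]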

6.11.27 Let
\[ \frac{6x^2 + 13x + 6}{(x + 2)(x + 1)^2} = \frac{A}{x + 2} + \frac{B}{x + 1} + \frac{C}{(x + 1)^2} . \]
This implies
\[ 6x^2 + 13x + 6 = A(x + 1)^2 + B(x + 2)(x + 1) + C(x + 2) . \]
Putting x = −1 gives C = −1, putting x = −2 gives A = 4, putting x = 0 gives 6 = A + 2B + 2C which implies B = 2. Therefore,
\[ \begin{aligned}
\int \frac{6x^2 + 13x + 6}{(x + 2)(x + 1)^2} \, dx &= 4 \int \frac{1}{x + 2} \, dx + 2 \int \frac{1}{x + 1} \, dx - \int \frac{1}{(x + 1)^2} \, dx \\
&= 4 \ln |x + 2| + 2 \ln |x + 1| + \frac{1}{x + 1} + D.
\end{aligned} \]
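
The coefficients A = 4, B = 2, C = −1 found in 6.11.27 can be verified exactly at a few sample points. This is a small sketch using rational arithmetic; the function names `lhs` and `rhs` and the sample points are illustrative choices, not taken from the text:

```python
from fractions import Fraction

def lhs(x):
    """Original rational function (6x^2 + 13x + 6) / ((x + 2)(x + 1)^2)."""
    return (6 * x**2 + 13 * x + 6) / ((x + 2) * (x + 1) ** 2)

def rhs(x, A=4, B=2, C=-1):
    """Partial fraction decomposition with the coefficients from the solution."""
    return A / (x + 2) + B / (x + 1) + C / (x + 1) ** 2

# Exact comparison at rational sample points away from the poles x = -1, -2.
samples = [Fraction(n, 3) for n in range(1, 10)]
print(all(lhs(x) == rhs(x) for x in samples))
```

Since both sides are rational functions whose numerators have degree at most two once the common denominator is cleared, agreement at more than three points already implies that the decomposition is an identity.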

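6.11.28 (a)
\[ \begin{aligned}
\int \cos^5 \theta \, d\theta &= \int (1 - \sin^2 \theta)^2 \cos \theta \, d\theta \\
&= \int (1 - 2 \sin^2 \theta + \sin^4 \theta) \cos \theta \, d\theta \\
&= \int \cos \theta \, d\theta - 2 \int \sin^2 \theta \cos \theta \, d\theta + \int \sin^4 \theta \cos \theta \, d\theta \\
&= \sin \theta - \frac{2}{3} \sin^3 \theta + \frac{1}{5} \sin^5 \theta + C .
\end{aligned} \]
(b)
\[ \begin{aligned}
\int \sin^3 x \cos^3 x \, dx &= \int \sin^3 x (1 - \sin^2 x) \cos x \, dx \\
&= \int \sin^3 x \cos x \, dx - \int \sin^5 x \cos x \, dx \\
&= \frac{1}{4} \sin^4 x - \frac{1}{6} \sin^6 x + C .
\end{aligned} \]
(c)
\[ \int_{0}^{\pi/6} \sec^3 \theta \tan \theta \, d\theta = \int_{0}^{\pi/6} \sec^2 \theta \cdot \sec \theta \tan \theta \, d\theta = \left[ \frac{1}{3} \sec^3 \theta \right]_{0}^{\pi/6} = \frac{1}{3} \left( \left( \frac{2}{\sqrt{3}} \right)^3 - 1 \right) . \]
(d)
\[ \begin{aligned}
\int_{0}^{\pi/3} (\sin^4 3x)(\cos^3 3x) \, dx &= \int_{0}^{\pi/3} (\sin^4 3x)(1 - \sin^2 3x) \cos 3x \, dx \\
&= \frac{1}{3} \int_{0}^{\pi/3} (\sin^4 3x) \cdot 3 \cos 3x \, dx - \frac{1}{3} \int_{0}^{\pi/3} 3 (\sin^6 3x) \cos 3x \, dx \\
&= \left[ \frac{1}{15} \sin^5 3x - \frac{1}{21} \sin^7 3x \right]_{0}^{\pi/3} = 0.
\end{aligned} \]
(e) Let u = sin θ, then du = cos θ dθ,
\[ \int \frac{\cos \theta}{\sqrt{2 - \sin^2 \theta}} \, d\theta = \int \frac{1}{\sqrt{2 - u^2}} \, du = \sin^{-1} \frac{u}{\sqrt{2}} + C = \sin^{-1} \left( \frac{\sin \theta}{\sqrt{2}} \right) + C . \]
(f) Let x = √3 tan θ, then dx = √3 sec² θ dθ and 3 + x² = 3 sec² θ,
\[ \begin{aligned}
\int_{0}^{3} \frac{x^3}{(3 + x^2)^{5/2}} \, dx &= \int_{0}^{\pi/3} \frac{3^{3/2} \tan^3 \theta}{3^{5/2} \sec^5 \theta} \sqrt{3} \sec^2 \theta \, d\theta \\
&= \frac{1}{\sqrt{3}} \int_{0}^{\pi/3} \sin^3 \theta \, d\theta = \frac{1}{\sqrt{3}} \int_{0}^{\pi/3} (1 - \cos^2 \theta) \sin \theta \, d\theta \\
&= \frac{1}{\sqrt{3}} \left[ -\cos \theta + \frac{1}{3} \cos^3 \theta \right]_{0}^{\pi/3} = \frac{1}{\sqrt{3}} \left[ \left( -\frac{1}{2} + \frac{1}{24} \right) - \left( -1 + \frac{1}{3} \right) \right] \\
&= \frac{5 \sqrt{3}}{72} .
\end{aligned} \]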

Solutions to the Exercises of Chap. 7


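7.7.1 (a) The region can be described as (see Fig. E.5)
\[ \left\{ (x, y) : 0 \le x \le \pi, \ \sin 4x \le y \le 1 + \cos \frac{x}{3} \right\} . \]
The area of this region becomes
\[ \begin{aligned}
A &= \int_{0}^{\pi} \left( 1 + \cos \frac{x}{3} - \sin 4x \right) dx = \left[ x + 3 \sin \frac{x}{3} + \frac{1}{4} \cos 4x \right]_{0}^{\pi} \\
&= \pi + \frac{3 \sqrt{3}}{2} + \frac{1}{4} - \frac{1}{4} = \pi + \frac{3}{2} \sqrt{3} \approx 5.74 .
\end{aligned} \]
(b) The region can be described as
\[ \left\{ (x, y) : 0 \le x \le \pi, \ 3 \sin \frac{x}{2} \le y \le 4 + \cos 2x \right\} . \]
Its area is
\[ A = \int_{0}^{\pi} \left( (4 + \cos 2x) - 3 \sin \frac{x}{2} \right) dx = \left[ 4x + \frac{1}{2} \sin 2x + 6 \cos \frac{x}{2} \right]_{0}^{\pi} = 4\pi - 6 \approx 6.57 . \]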

Fig. E.5 Region whose area has to be computed



Fig. E.6 Area between two curves

7.7.2 (a) On [−1, 3], f(x) = g(x) at x = 0 and x = 2, and the graphs of f and g intersect at (0, 2) and (2, 2), see Fig. E.6. We have f(x) ≥ g(x) on [−1, 0] and on [2, 3], while f(x) ≤ g(x) on [0, 2]. We obtain the area as
\[ \begin{aligned}
A &= \int_{-1}^{0} \left[ (x^3 - 4x + 2) - 2 \right] dx + \int_{0}^{2} \left[ 2 - (x^3 - 4x + 2) \right] dx + \int_{2}^{3} \left[ (x^3 - 4x + 2) - 2 \right] dx \\
&= \frac{7}{4} + 4 + \frac{25}{4} = 12 .
\end{aligned} \]
(b) On [0, 2π], f(x) = g(x) at x = π/4 and x = 5π/4, and the graphs of f and g intersect at the corresponding points, see Fig. E.7. We have f(x) ≥ g(x) on [π/4, 5π/4] and f(x) ≤ g(x) on [0, π/4] ∪ [5π/4, 2π]. The required area becomes
\[ \begin{aligned}
A &= \int_{0}^{\pi/4} (\cos x - \sin x) \, dx + \int_{\pi/4}^{5\pi/4} (\sin x - \cos x) \, dx + \int_{5\pi/4}^{2\pi} (\cos x - \sin x) \, dx \\
&= (\sqrt{2} - 1) + 2 \sqrt{2} + (1 + \sqrt{2}) = 4 \sqrt{2} .
\end{aligned} \]

Fig. E.7 Area between two curves

Fig. E.8 Region defining a solid of revolution

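7.7.3 (a) Let f(x) = 1/x. The region D is sketched in Fig. E.8. The volume of the solid of revolution generated by D becomes
\[ V = \pi \int_{1}^{3} \left( \frac{1}{x} \right)^2 dx = \pi \left[ -\frac{1}{x} \right]_{1}^{3} = \pi \left( -\frac{1}{3} - (-1) \right) = \frac{2\pi}{3} . \]
(b) The graphs of x = (1/2) y and x = (1/2) √y intersect for y = 0 and y = 1, and (1/2) √y ≥ (1/2) y on [0, 1], see Fig. E.9. The volume of the corresponding solid of revolution becomes
\[ V = \pi \int_{0}^{1} \left[ \left( \frac{1}{2} \sqrt{y} \right)^2 - \left( \frac{1}{2} y \right)^2 \right] dy = \frac{\pi}{4} \left[ \frac{1}{2} y^2 - \frac{1}{3} y^3 \right]_{0}^{1} = \frac{\pi}{4} \left( \frac{1}{2} - \frac{1}{3} \right) = \frac{\pi}{24} . \]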

Fig. E.9 Graphs of the functions in Exercise 7.7.3(b)

7.7.4 The upper half of the ellipse given by x²/a² + y²/b² = 1 is described by the function
\[ y = f(x) = \frac{b}{a} \sqrt{a^2 - x^2} , \quad \text{and} \quad f'(x) = -\frac{bx}{a} (a^2 - x^2)^{-1/2} . \]
The circumference C equals 4 times the length of the graph of f for x ≥ 0,
\[ C = 4 \int_{0}^{a} \sqrt{1 + f'(x)^2} \, dx = 4 \int_{0}^{a} \sqrt{1 + \frac{b^2 x^2}{a^2 (a^2 - x^2)}} \, dx . \]
To evaluate the integral, substitute x = a sin θ. Then dx = a cos θ dθ, and the integration limits for θ are 0 and π/2. We compute
\[ \begin{aligned}
C &= 4 \int_{0}^{\pi/2} \sqrt{1 + \frac{b^2 a^2 \sin^2 \theta}{a^2 \cdot a^2 \cos^2 \theta}} \ a \cos \theta \, d\theta \\
&= 4a \int_{0}^{\pi/2} \sqrt{\cos^2 \theta + \frac{b^2}{a^2} \sin^2 \theta} \, d\theta \\
&= 4a \int_{0}^{\pi/2} \sqrt{\cos^2 \theta + \sin^2 \theta - e^2 \sin^2 \theta} \, d\theta \quad \text{(note that } b^2 = a^2 (1 - e^2) \text{, where } e \text{ is the eccentricity)} \\
&= 4a \int_{0}^{\pi/2} \sqrt{1 - e^2 \sin^2 \theta} \, d\theta .
\end{aligned} \]
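
The final integral in 7.7.4 has no elementary antiderivative, so in practice it is evaluated numerically. The following is a minimal sketch, assuming a ≥ b > 0 and a plain midpoint rule; the function name `ellipse_circumference` and the number of subintervals are illustrative assumptions:

```python
import math

def ellipse_circumference(a, b, n=2000):
    """Approximate C = 4a * integral_0^{pi/2} sqrt(1 - e^2 sin^2(theta)) d(theta)
    with the midpoint rule, where e^2 = 1 - b^2/a^2 (requires a >= b > 0)."""
    e2 = 1.0 - (b / a) ** 2
    h = (math.pi / 2) / n
    s = sum(math.sqrt(1.0 - e2 * math.sin((i + 0.5) * h) ** 2) for i in range(n))
    return 4.0 * a * s * h

print(ellipse_circumference(1.0, 1.0))  # circle of radius 1: should be close to 2*pi
print(ellipse_circumference(2.0, 1.0))
```

For a = b the eccentricity is zero, the integrand is identically 1, and the formula reduces to the circumference 2πa of a circle, which gives a quick consistency check.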

7.7.5 By Hooke’s law, F(x) = kx for some constant k where F is the force and x
is the change in length relative to the natural length, so

0.08 − 0.75 1
25 = k = k , k = 400 .
0.08 16
The work done becomes
 0.80−0.70  0.1
W = 400x d x = 400x d x = 200 · (0.1)2 = 2 Nm.
0 0

7.7.6 Let the rope be placed along the y-axis, and let y = 0 be the initial position of the bottom of the rope. If the rope is pulled up by y m, then (60 − y) m of the rope is still suspended, with a weight of (1/4)(60 − y) kg. Therefore, the work done in lifting the rope equals
\[ g \int_{0}^{60} \frac{1}{4} (60 - y) \, dy \ \text{Nm.} \]
The work done in lifting the motor through 60 m is 60 · 50g Nm = 29430 Nm. The total work becomes
\[ W = 29430 + \frac{g}{4} \int_{0}^{60} (60 - y) \, dy = 29430 + 9.81 \cdot \left[ 15y - \frac{1}{8} y^2 \right]_{0}^{60} = 33844.5 \ \text{Nm.} \]
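
The arithmetic of 7.7.6 can be reproduced in a few lines. This is a sketch under the data used in the solution (g = 9.81 m/s², a 60 m rope of 1/4 kg per metre, a 50 kg motor); the constant and function names are illustrative:

```python
G = 9.81             # gravitational acceleration in m/s^2, as used in the solution
ROPE_DENSITY = 0.25  # rope mass per metre in kg/m
MOTOR_MASS = 50.0    # motor mass in kg
HEIGHT = 60.0        # lifting height in m

def rope_work(n=1000):
    """Midpoint rule for g * integral_0^60 (1/4)(60 - y) dy."""
    h = HEIGHT / n
    return G * h * sum(ROPE_DENSITY * (HEIGHT - (i + 0.5) * h) for i in range(n))

def total_work():
    """Work for the motor (constant force) plus work for the rope."""
    return MOTOR_MASS * G * HEIGHT + rope_work()

print(total_work())
```

Because the integrand is linear in y, the midpoint rule is exact here up to rounding, so the result matches the closed-form value 29430 + 4414.5 Nm.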

7.7.7 Let F = k/d², where d is the distance between the two electrons and k is the proportionality constant.
(a) The distance between the two electrons is 5 − x m, where x is the x-coordinate of the second electron, so F(x) = k/(5 − x)² N and the work done becomes
\[ W = \int_{0}^{3} \frac{k}{(5 - x)^2} \, dx = k \left[ \frac{1}{5 - x} \right]_{0}^{3} = \frac{3k}{10} \ \text{Nm.} \]
(b) The distance between the third (moving) electron and the electron at (5, 0) is 5 − x m. Its distance from the electron at (−5, 0) is x − (−5) = x + 5 m. The repelling forces are k/(5 − x)² and k/(5 + x)², respectively. Since the latter acts in the direction of the movement while the former acts against it, the net force is the difference of these two forces, that is,
\[ F(x) = \frac{k}{(5 - x)^2} - \frac{k}{(5 + x)^2} . \]
The total work becomes
\[ W = \int_{0}^{3} \left( \frac{k}{(5 - x)^2} - \frac{k}{(5 + x)^2} \right) dx = k \left[ \frac{1}{5 - x} + \frac{1}{5 + x} \right]_{0}^{3} = \frac{9k}{40} \ \text{Nm.} \]

7.7.8 The amount of gasoline used after 2 h is
\[ \int_{0}^{2} t \sqrt{9 - t^2} \, dt = \left[ -\frac{1}{2} \frac{(9 - t^2)^{3/2}}{3/2} \right]_{0}^{2} = 9 - \frac{5}{3} \sqrt{5} \approx 5.27 \ \text{gal.} \]

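7.7.9 The average flow rate \bar{F} is computed as
\[ \begin{aligned}
\bar{F} &= \frac{1}{T - 0} \int_{0}^{T} F(t) \, dt = \frac{1}{T} \int_{0}^{T} \frac{F_1}{(1 + \alpha t)^2} \, dt \\
&= \frac{F_1}{\alpha T} \int_{0}^{T} (1 + \alpha t)^{-2} \alpha \, dt = \frac{F_1}{\alpha T} \left[ \frac{(1 + \alpha t)^{-1}}{-1} \right]_{0}^{T} \\
&= \frac{F_1}{\alpha T} \left( -\frac{1}{1 + \alpha T} + 1 \right) = \frac{F_1}{\alpha T} \cdot \frac{-1 + 1 + \alpha T}{1 + \alpha T} = \frac{F_1}{\alpha T} \cdot \frac{\alpha T}{1 + \alpha T} \\
&= \frac{F_1}{1 + \alpha T} .
\end{aligned} \]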

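7.7.10 The average cost \bar{C} becomes
\[ \begin{aligned}
\bar{C} &= \frac{1}{400} \int_{100}^{500} (4000 + 10q + 0.1 q^2) \, dq = \frac{1}{400} \left[ 4000q + \frac{10 q^2}{2} + \frac{0.1 q^3}{3} \right]_{100}^{500} \\
&= \frac{1}{400} \left[ 4000 \cdot 400 + 5 (250000 - 10000) + \frac{0.1}{3} \left( 500^3 - 100^3 \right) \right] \\
&= \frac{1}{400} \left[ 1600000 + 5 \cdot 240000 + \frac{0.1}{3} \cdot 124000000 \right] \\
&= \frac{1}{4} \left[ 16000 + 12000 + \frac{124000}{3} \right] = \frac{1}{4} \cdot \frac{208000}{3} = \frac{52000}{3} \approx 17333.33 \ \text{(Euro).}
\end{aligned} \]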

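7.7.11 The average concentration \bar{C} on [0, T] is
\[ \begin{aligned}
\bar{C} &= \frac{1}{T} \int_{0}^{T} \frac{R}{F_1} (1 + \alpha t)^2 \, dt = \frac{R}{T F_1} \left[ \frac{(1 + \alpha t)^3}{3\alpha} \right]_{0}^{T} \\
&= \frac{R}{3 F_1 \alpha T} \left[ 1 + \alpha^3 T^3 + 3 \alpha^2 T^2 + 3 \alpha T - 1 \right] \\
&= \frac{R}{3 F_1 \alpha T} \left[ \alpha^3 T^3 + 3 \alpha^2 T^2 + 3 \alpha T \right] = \frac{R}{F_1} \left[ 1 + \alpha T + \frac{1}{3} \alpha^2 T^2 \right] .
\end{aligned} \]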

7.7.12 (a) The probability of eventual breakdown is
\[ P(0 \le X < \infty) = \int_{0}^{\infty} p(x) \, dx = \frac{1}{4} \int_{0}^{\infty} e^{-x/4} \, dx = -\frac{4}{4} \big[ e^{-x/4} \big]_{0}^{\infty} = -(0 - 1) = 1 . \]
(b) The probability of breakdown within the first 12 years is
\[ P(0 \le X \le 12) = \int_{0}^{12} p(x) \, dx = \frac{1}{4} \int_{0}^{12} e^{-x/4} \, dx = \frac{1}{4} \left[ \frac{e^{-x/4}}{-\frac{1}{4}} \right]_{0}^{12} = -(e^{-12/4} - e^{0}) = 1 - e^{-3} \approx 0.9502 . \]

7.7.13 (a) The density function (see Fig. 7.25) is zero for all t > 3, so no one waits
more than 3 h. The longest time anyone has to wait is 3 h.
(b) The fraction of patients who wait between 1 and 2 h is equal to the area
under the density function between t = 1 and t = 2. We estimate this
area by counting squares: about 7.5 squares in this region, each of area
0.5 · 0.1 = 0.05. The approximate area thus becomes 7.5 · 0.05 = 0.375.
Therefore 37.5% of patients wait between 1 and 2 h.
(c) The fraction of patients waiting less than 1 h is equal to the area under
the density function for t < 1. There are about 12 squares in this area.
Thus, the approximate area is 12 · 0.05 = 0.60. Therefore about 60%
of patients see the doctor in less than one hour.
7.7.14 The value P(t) of the cumulative distribution function P equals the fraction
of days on which the catch is less than t tons of fish. Since the catch is never
less than 2 tons, we have

P(t) = 0 , for t ≤ 2.

Since the catch is always less than 8 tons, we have

P(t) = 1 , for t ≥ 8.

For t in the range 2 < t < 8, the value of P(t) is given by the integral
\[ P(t) = \int_{-\infty}^{t} p(x) \, dx = \int_{2}^{t} p(x) \, dx , \]
which is equal to the area under the graph of p between x = 2 and x = t. We count the grid squares in the given figure; each square has area 0.04. For example,
\[ P(3) = \int_{2}^{3} p(x) \, dx \approx \text{area of 2.5 squares} = 2.5 \cdot 0.04 = 0.10 . \]

The following table contains the values of P(t):

t (tons of fish)    P(t) (fraction of fishing days)
2                   0
3                   0.10
4                   0.24
5                   0.42
6                   0.64
7                   0.85
8                   1

Fig. E.10 Cumulative


distribution function

Fig. E.11 Density function

(b) The probability that the catch is between 5 and 7 tons can be found using either the density function p or the cumulative distribution function P (Fig. E.10). Using the density function, this probability is represented by the shaded area in Fig. E.11 which is about 10.75 squares, so the required probability is
\[ \int_{5}^{7} p(x) \, dx \approx \text{area of 10.75 squares} = 10.75 \cdot 0.04 = 0.43 . \]
From the cumulative distribution function, this probability can be found as the difference
\[ P(7) - P(5) = 0.85 - 0.42 = 0.43 . \]



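7.7.15 (a) The fraction of calls lasting between 1 and 2 min is given as
\[ P(1 \le X \le 2) = \int_{1}^{2} 0.4 \, e^{-0.4x} \, dx = \frac{0.4}{-0.4} \big[ e^{-0.4x} \big]_{1}^{2} = e^{-0.4} - e^{-0.8} = 0.221 . \]
(b) \[ P(X \le 1) = \int_{0}^{1} 0.4 \, e^{-0.4x} \, dx = \big[ -e^{-0.4x} \big]_{0}^{1} = 1 - e^{-0.4} = 0.3297. \]
(c)
\[ P(0 < X \le 2) = \int_{0}^{2} 0.4 \, e^{-0.4x} \, dx = \big[ -e^{-0.4x} \big]_{0}^{2} = 1 - e^{-0.8} = 0.5507 . \]
(d) \[ P(X \ge 3) = \int_{3}^{\infty} 0.4 \, e^{-0.4x} \, dx = \big[ -e^{-0.4x} \big]_{3}^{\infty} = e^{-1.2} = 0.3012. \]
For percentages, multiply by 100.
7.7.16 Since the density vanishes for x < 0, we get for t ≥ 0
\[ P(t) = \int_{-\infty}^{t} p(x) \, dx = \int_{0}^{t} 0.4 \, e^{-0.4x} \, dx = 0.4 \int_{0}^{t} e^{-0.4x} \, dx = \big[ -e^{-0.4x} \big]_{0}^{t} = 1 - e^{-0.4t} . \]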
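
The four probabilities of 7.7.15 follow directly from the cumulative distribution function derived in 7.7.16. A short sketch, in which the function name `exp_cdf` and the variable names are illustrative:

```python
import math

def exp_cdf(t, k=0.4):
    """Cumulative distribution P(t) = 1 - exp(-k t) for t >= 0, and 0 for t < 0."""
    return 1.0 - math.exp(-k * t) if t >= 0 else 0.0

p_a = exp_cdf(2) - exp_cdf(1)   # P(1 <= X <= 2)
p_b = exp_cdf(1)                # P(X <= 1)
p_c = exp_cdf(2)                # P(0 < X <= 2)
p_d = 1.0 - exp_cdf(3)          # P(X >= 3)

for name, value in [("(a)", p_a), ("(b)", p_b), ("(c)", p_c), ("(d)", p_d)]:
    print(name, round(value, 4))
```

Expressing each probability as a difference of CDF values avoids repeating the integration for every interval.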

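7.7.17 The probability of having to wait no more than 7 min is
\[ P(0 \le X \le 7) = \int_{0}^{7} f(x) \, dx = \frac{1}{10} \int_{0}^{7} 1 \, dx = \left[ \frac{x}{10} \right]_{0}^{7} = \frac{7}{10} . \]
The average waiting time is
\[ \int_{-\infty}^{\infty} x f(x) \, dx = \frac{1}{10} \int_{0}^{10} x \, dx = \frac{1}{10} \left[ \frac{x^2}{2} \right]_{0}^{10} = 5 \ \text{min.} \]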

7.7.18 Let us recall that the exponential density function f is given by
\[ f(x) = \begin{cases} k e^{-kx} , & \text{if } x \ge 0, \\ 0 , & \text{if } x < 0. \end{cases} \]
Here k = 1/6. We obtain the required probabilities as
\[ P(0 \le X \le 4) = \frac{1}{6} \int_{0}^{4} e^{-x/6} \, dx = -\frac{6}{6} \big[ e^{-x/6} \big]_{0}^{4} = 1 - e^{-2/3} = 0.4866 , \]
\[ P(X \ge 6) = \frac{1}{6} \int_{6}^{\infty} e^{-x/6} \, dx = \big[ -e^{-x/6} \big]_{6}^{\infty} = e^{-1} = 0.3679 . \]
7.7.19 We have f(x) = 4e^{−4x} and
\[ P(X \ge 2) = \int_{2}^{\infty} f(x) \, dx = 4 \int_{2}^{\infty} e^{-4x} \, dx = \big[ -e^{-4x} \big]_{2}^{\infty} = e^{-8} = 0.000335 . \]

7.7.20 Let v'(t) = a be the constant acceleration. Integrating, we get v(t) = at + C. Since v(0) = 0 we have C = 0, thus v(t) = at. The distance traveled satisfies s'(t) = v(t). Integrating, we get
\[ s(t) = \frac{a t^2}{2} + D. \]
We have D = 0 since s(0) = 0, so
\[ s(t) = \frac{1}{2} a t^2 , \quad s(10) = \frac{1}{2} a \cdot 100 = 50a . \]
The condition s(10) = 500 gives a = 10 m/s².
7.7.21 As in the previous exercise, v(t) = at + C. Since v(0) = 60 km/h = 50/3 m/s, we get
\[ v(t) = at + \frac{50}{3} , \quad 0 = v(9) = 9a + \frac{50}{3} , \quad a = -\frac{50}{27} \ \frac{\text{m}}{\text{s}^2} . \]
The minus sign means deceleration (braking).
7.7.22 Since A'(t) = 5 + 0.01t, we obtain the amount A(t) of gas consumed until time t by integration,
\[ A(t) = 5t + 0.01 \frac{t^2}{2} + C . \]
Since A(0) = 0 we have C = 0. We determine t from
\[ 100 \cdot 10^9 = 5t + \frac{1}{200} t^2 , \]
which is equivalent to t² + 10³ t − 2 · 10¹³ = 0. Its solutions are
\[ t = \frac{-10^3 \pm \sqrt{10^6 + 8 \cdot 10^{13}}}{2} = \frac{-10^3 \pm 10^3 \sqrt{1 + 8 \cdot 10^7}}{2} , \]
so the positive solution becomes t = 500(−1 + √(1 + 8 · 10⁷)) years.
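
To get a feeling for the size of the root in 7.7.22, the quadratic can be solved numerically. A minimal sketch; the function name `years_until` and its parameter names are illustrative and not from the text:

```python
import math

def years_until(total=100e9, rate0=5.0, growth=0.01):
    """Positive root t of rate0*t + growth*t^2/2 = total (quadratic formula)."""
    a, b, c = growth / 2.0, rate0, -total
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

t = years_until()
closed_form = 500.0 * (-1.0 + math.sqrt(1.0 + 8.0e7))
print(t, closed_form)
```

Both expressions evaluate to roughly 4.47 million, in the time unit of the exercise.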
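7.7.23 The average rate \bar{P} of heat production is
\[ \begin{aligned}
\bar{P} &= \frac{1}{\frac{2\pi}{\omega} - 0} \int_{0}^{2\pi/\omega} (I_M^2 \sin^2 \omega t) R \, dt \\
&= \frac{\omega R I_M^2}{2\pi} \int_{0}^{2\pi/\omega} \frac{1}{2} (1 - \cos 2\omega t) \, dt \\
&= \frac{\omega R I_M^2}{4\pi} \left[ t - \frac{\sin 2\omega t}{2\omega} \right]_{0}^{2\pi/\omega} = \frac{\omega R I_M^2}{4\pi} \left( \frac{2\pi}{\omega} - 0 \right) \\
&= \frac{1}{2} R I_M^2 .
\end{aligned} \]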
Solutions to the Exercises of Chap. 8
8.9.1 (a) If interest is compounded annually, m increases by a factor of 1.05 every year, so
\[ m = f(b, t) = b \cdot 1.05^{t} . \]
(b) If interest is compounded continuously, m increases according to the function e^{kt} with k = 0.05, so
\[ m = f(b, t) = b e^{0.05 t} . \]

8.9.2 (a) Keeping x fixed at 8 means that one considers an injection of 8 mg of the
drug. The expression f (8, t) describes the concentration of the drug in
the blood resulting from an 8 mg injection as a function of time t. Figure
E.12 shows the graph of f (8, t) = te−t . We notice that the concentration
in the blood attains a maximum at 1 h after injection and it eventually
approaches zero.

Fig. E.12 The function


f (4, t) shows the
concentration in the blood
resulting from a 4 mg
injection

Fig. E.13 The function


f (x, 1) shows the
concentration in the blood 1
h after the injection

(b) Keeping t fixed at 1 means that we are focusing on the concentration


which is present 1 h after the injection. The expression f (x, 1) gives
the concentration of the drug in the blood 1 h after the injection as
a function of the amount x injected. Figure E.13 shows the graph of
f (x, 1) = e−(9−x) = e x−9 . This function is an increasing function of x.
8.9.3 For f(x) = x sin x we have
(a) f(x − y) = (x − y) sin(x − y).
(b) f(x/y) = (x/y) sin(x/y).
(c) f(xy) = xy sin(xy).
8.9.4 For h(x, y, z) = x y² z³ + 4 we have
(a) h(a + b, a − b, b) = (a + b)(a − b)² b³ + 4.
(b) h(0, 0, 0) = 4.
(c) h(t, t², −t) = −t · t⁴ · t³ + 4 = −t⁸ + 4.
(d) h(−6, 4, 2) = (−6) · 16 · 8 + 4 = −764.

8.9.5 (i) The domain of f(x, y) = x e^{−√(y+2)} is the set of all points (x, y) above or on the line y = −2.
(ii) The domain of f(x, y, z) = e^{xyz} is the set of all points in 3-space.
(iii) The domain of f(x, y, z) = xyz/(x + y + z) is the set of all points (x, y, z) which do not lie on the plane x + y + z = 0.
8.9.6 (i) See Fig. E.14.
(ii) See Fig. E.15.
(iii) See Fig. E.16.
8.9.7 (a) (i) See Fig. E.17.
(ii) See Fig. E.18.
(b) (i) See Fig. E.19.
(ii) See Fig. E.20.

Fig. E.14 Solution of


Exercise 8.9.6(a)

Fig. E.15 Solution of


Exercise 8.9.6(b)

Fig. E.16 Solution of


Exercise 8.9.6(c)

Fig. E.17 Solution of


Exercise 8.9.7(a)(i)

Fig. E.18 Solution of


Exercise 8.9.7(a)(ii)

Fig. E.19 Solution of


Exercise 8.9.7(b)(i)

Fig. E.20 Solution of


Exercise 8.9.7(b)(ii)

8.9.8 We have T(x, y) = c/√(x² + y²), where c is a constant.
(a) The isothermal curves T(x, y) = k are given by x² + y² = c²/k². These are circles with center (0, 0) and radius c/k. To sketch them for k = 1, 2, 3 requires knowledge of the value of the proportionality constant c.
(b) If T = 40 °C at (x, y) = (4, 3), we can compute c from
\[ 40 = \frac{c}{\sqrt{4^2 + 3^2}} , \]
so c = 200. For T = 20 °C the equation is
\[ x^2 + y^2 = \left( \frac{200}{20} \right)^2 = 100. \]
The isothermal curve for a temperature of 20 °C is x² + y² = 100.


8
8.9.9 We obtain V (x, y) = . The equipotential curves are circles.
16 + x 2 + y 2
8.9.10 PV = nkT implies P(V, T) = nkT/V. The level curves are the curves where nkT/V = c holds for some constant c. In this case, V = (nk/c) T gives lines in the VT-plane passing through the origin. Physical significance: If the state of the system moves along one of those level curves, the pressure remains constant, while volume and temperature change.
8.9.11 (a) We have P(A, v) = kAv³, where k > 0 is a constant.
(b) For level curves P(A, v) = kAv³ = c we get A = c/(kv³). These level curves show the combinations of area and wind velocity that result in a fixed power P = c.
(c) We have v = 20 and A = 5²π, since the radius equals 5. The given data yield 3000 = k · 5²π · 20³, so k = 3/(200π). The level curve at P = 4000 is described by
\[ 4000 = \frac{3}{200\pi} A v^3 , \quad A v^3 = \frac{8}{3} \pi \times 10^5 . \]

8.9.12 (a) The isobars are curves of constant pressure, so they are given by ax² + by² + c = k for some constant k ≥ c. They are ellipses, alternatively written as
\[ \frac{x^2}{\frac{k - c}{a}} + \frac{y^2}{\frac{k - c}{b}} = 1 . \]

(b) A region of low pressure occurs when its atmospheric pressure p is less
than that in the surrounding area. Here, the minimum value p(0, 0) = c
of the atmospheric pressure occurs at the origin. As we move away from
origin, the atmospheric pressure increases. Hence there is an area of low
pressure near the origin.
8.9.13 (a) The function f(x, y) = xy³/(x + y) is continuous at (−1, 2). Therefore, its limit at this point exists and equals f(−1, 2), so
\[ \lim_{(x,y) \to (-1,2)} \frac{x y^3}{x + y} = \frac{(-1) \cdot 2^3}{(-1) + 2} = -8. \]
(b) Along y = 0, we have (x − y)/(x² + y²) = 1/x. The limit of this expression as x → 0 does not exist because |1/x| → ∞ as x → 0. So
\[ \lim_{(x,y) \to (0,0)} \frac{x - y}{x^2 + y^2} \]
does not exist.
8.9.14 We have
\[ f(x, y) = \begin{cases} \dfrac{\sin (x^2 + y^2)}{x^2 + y^2} , & (x, y) \ne (0, 0) , \\ 1 , & (x, y) = (0, 0) . \end{cases} \]
Take z = g(x, y) = x² + y². Then
\[ \lim_{z \to 0^+} \frac{\sin z}{z} = 1 = f(0, 0). \]
Since moreover g is continuous, f is continuous at (0, 0).
8.9.15 (a) We have f(x, y) = ln(x + y − 1). The logarithm is continuous for arguments in (0, ∞), and x + y − 1 > 0 holds if and only if x + y > 1. Therefore, f is continuous on the set {(x, y) : x + y > 1}.
(b) We have f(x, y) = √x · e^{√(1 − y²)}. The square root is defined and continuous for nonnegative arguments. In order that f is defined in a neighborhood of (x, y), we must have x > 0 and moreover 1 − y² > 0, which is equivalent to |y| < 1. Therefore, f is continuous on the set {(x, y) : x > 0 and |y| < 1}.
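8.9.16 (a) For z(x, y) = y² e^{x²} + 1/(x² y³) we get
\[ \begin{aligned}
\frac{\partial z}{\partial x}(x, y) &= 2x y^2 e^{x^2} - 2 x^{-3} y^{-3} , \\
\frac{\partial^2 z}{\partial y \partial x}(x, y) &= \frac{\partial}{\partial y} \left( \frac{\partial z}{\partial x} \right)(x, y) = 4x y e^{x^2} + 6 x^{-3} y^{-4} , \\
\frac{\partial z}{\partial y}(x, y) &= 2y e^{x^2} - 3 x^{-2} y^{-4} , \\
\frac{\partial^2 z}{\partial x \partial y}(x, y) &= \frac{\partial}{\partial x} \left( \frac{\partial z}{\partial y} \right)(x, y) = 4x y e^{x^2} + 6 x^{-3} y^{-4} .
\end{aligned} \]
(b) For w(x, y, z) = x²/(y² + z²) we get
\[ \begin{aligned}
\frac{\partial w}{\partial y} &= -2 x^2 y (y^2 + z^2)^{-2} , \\
\frac{\partial^2 w}{\partial y^2} &= -2 x^2 (y^2 + z^2)^{-2} + 8 x^2 y^2 (y^2 + z^2)^{-3} , \\
\frac{\partial^3 w}{\partial z \partial y^2} &= 8 x^2 z (y^2 + z^2)^{-3} - 48 x^2 y^2 z (y^2 + z^2)^{-4} \\
&= 8 x^2 z (y^2 + z^2)^{-4} \left[ (y^2 + z^2) - 6 y^2 \right] \\
&= \frac{8 x^2 z (z^2 - 5 y^2)}{(y^2 + z^2)^4} .
\end{aligned} \]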
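8.9.17 (a) For f(x, y) = arctan(y/x) we get
\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= -\frac{y}{x^2} \cdot \frac{1}{1 + \left( \frac{y}{x} \right)^2} = -\frac{y}{x^2 + y^2} , & \frac{\partial^2 f}{\partial x^2}(x, y) &= \frac{2xy}{(x^2 + y^2)^2} , \\
\frac{\partial f}{\partial y}(x, y) &= \frac{1}{x} \cdot \frac{1}{1 + \left( \frac{y}{x} \right)^2} = \frac{x}{x^2 + y^2} , & \frac{\partial^2 f}{\partial y^2}(x, y) &= -\frac{2xy}{(x^2 + y^2)^2} .
\end{aligned} \]
Hence
\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0 . \]
(b) For f(x, t) = (x − at)⁴ + cos(x + at) we get
\[ \begin{aligned}
\frac{\partial f}{\partial x}(x, t) &= 4 (x - at)^3 - \sin (x + at) , \\
\frac{\partial^2 f}{\partial x^2}(x, t) &= 12 (x - at)^2 - \cos (x + at) , \\
\frac{\partial f}{\partial t}(x, t) &= -4a (x - at)^3 - a \sin (x + at) , \\
\frac{\partial^2 f}{\partial t^2}(x, t) &= 12 a^2 (x - at)^2 - a^2 \cos (x + at) = a^2 \frac{\partial^2 f}{\partial x^2}(x, t) .
\end{aligned} \]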

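8.9.18 (a) For u = x² − y² and v = 2xy we get
\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = 2x , \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} = -2y . \]
(b) For u = e^x cos y and v = e^x sin y we get
\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = e^x \cos y , \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} = -e^x \sin y . \]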
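(c) For u = ln(x² + y²) and v = 2 tan⁻¹(y/x) we get
\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = \frac{2x}{x^2 + y^2} , \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} = \frac{2y}{x^2 + y^2} . \]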

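8.9.19 For z = f(u) and u = g(x, y) we have
\[ \frac{\partial z}{\partial x} = \frac{dz}{du} \frac{\partial u}{\partial x} , \quad \frac{\partial z}{\partial y} = \frac{dz}{du} \frac{\partial u}{\partial y} . \]
Therefore
\[ \begin{aligned}
\text{(i)} \quad \frac{\partial^2 z}{\partial x^2} &= \frac{\partial}{\partial x} \left( \frac{dz}{du} \frac{\partial u}{\partial x} \right) = \frac{d^2 z}{du^2} \left( \frac{\partial u}{\partial x} \right)^2 + \frac{dz}{du} \frac{\partial^2 u}{\partial x^2} , \\
\text{(ii)} \quad \frac{\partial^2 z}{\partial y^2} &= \frac{\partial}{\partial y} \left( \frac{dz}{du} \frac{\partial u}{\partial y} \right) = \frac{d^2 z}{du^2} \left( \frac{\partial u}{\partial y} \right)^2 + \frac{dz}{du} \frac{\partial^2 u}{\partial y^2} , \\
\text{(iii)} \quad \frac{\partial^2 z}{\partial y \partial x} &= \frac{\partial}{\partial y} \left( \frac{dz}{du} \frac{\partial u}{\partial x} \right) = \frac{d^2 z}{du^2} \frac{\partial u}{\partial y} \frac{\partial u}{\partial x} + \frac{dz}{du} \frac{\partial^2 u}{\partial y \partial x} .
\end{aligned} \]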
8.9.20 Let a, b be the length and width of the parallelogram and α be the angle between the two sides, see Fig. E.21. We must have a > 0, b > 0 and 0 < α < π. Since 2a + 2b = l for l given, b = (l − 2a)/2. The area A of the parallelogram is equal to ab sin α, so
\[ A(a, \alpha) = \frac{a}{2} (l - 2a) \sin \alpha . \]

Fig. E.21 Maximizing the area of a parallelogram

We have
\[ \frac{\partial A}{\partial \alpha}(a, \alpha) = \frac{a}{2} (l - 2a) \cos \alpha , \quad \frac{\partial A}{\partial a}(a, \alpha) = \frac{1}{2} (l - 4a) \sin \alpha . \]
At the maximum, the partial derivatives are zero. ∂A/∂a = 0 gives a = l/4, since sin α ≠ 0. ∂A/∂α = 0 gives cos α = 0, that is, α = π/2. The second derivatives become
\[ A_{aa} = -2 \sin \alpha , \quad A_{\alpha\alpha} = -\frac{a}{2} (l - 2a) \sin \alpha , \quad A_{a\alpha} = \frac{1}{2} (l - 4a) \cos \alpha . \]
For a = l/4 and α = π/2 we obtain
\[ A_{aa} A_{\alpha\alpha} - A_{a\alpha}^2 = 2 \cdot \frac{l}{8} \cdot \frac{l}{2} = \frac{l^2}{8} > 0 , \quad A_{aa} = -2 < 0 . \]
Therefore the area is maximal for a = l/4 and α = π/2, that is, for a square with side length l/4.
8.9.21 Let x > 0, y > 0 and z > 0 be the length, width and height respectively of the box. Assume that the amount of material needed is proportional to the area S to be covered, which is given by S = xy + 2xz + 2yz. Since the volume V = xyz is a given fixed number, we can eliminate z = V/(xy), so
\[ S(x, y) = xy + 2 \frac{V}{y} + 2 \frac{V}{x} . \]
Setting S_x = y − 2V/x² = 0, S_y = x − 2V/y² = 0 and solving for x, y, we obtain x = (2V)^{1/3}, y = (2V)^{1/3}. Therefore, the critical point is ((2V)^{1/3}, (2V)^{1/3}). The second partial derivatives are
\[ S_{xx} = \frac{4V}{x^3} , \quad S_{yy} = \frac{4V}{y^3} , \quad S_{xy} = 1 . \]
For x = (2V)^{1/3} and y = (2V)^{1/3}, we get S_{xx} S_{yy} − S_{xy}² = 2 · 2 − 1 = 3 > 0 and S_{xx} = 2 > 0. So at the point ((2V)^{1/3}, (2V)^{1/3}) there is a relative minimum. The corresponding length and width is x = y = (2V)^{1/3}, the height is z = (2V)^{1/3}/2.
8.9.22 Let x, y and z be the length, width and height of the box, respectively. We have to minimize C = 2 · 2xy + 2xz + 2yz, given 8 = xyz. Taking z = 8/(xy), the cost becomes
\[ C(x, y) = 4xy + \frac{16}{y} + \frac{16}{x} . \]
Setting C_x = 4y − 16/x² = 0 and C_y = 4x − 16/y² = 0, we get yx² = 4 = xy² and hence x = y = 4^{1/3} ft (since x > 0 and y > 0). Moreover, z = 8/(xy) = 2 · 4^{1/3} ft. The second derivatives are
\[ C_{xx} = \frac{32}{x^3} , \quad C_{yy} = \frac{32}{y^3} , \quad C_{xy} = 4 . \]
For x = y = 4^{1/3} we get C_{xx} C_{yy} − C_{xy}² = 8 · 8 − 16 > 0 and C_{xx} = 8 > 0. Hence the cost is minimum for x = y = 4^{1/3} ft and z = 2 · 4^{1/3} ft.
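
The critical point of 8.9.22 can be sanity-checked numerically by evaluating the gradient there and comparing the cost with nearby points. A small sketch, with illustrative function names `cost` and `grad`:

```python
def cost(x, y):
    """Cost C(x, y) = 4xy + 16/y + 16/x after eliminating z = 8/(xy)."""
    return 4.0 * x * y + 16.0 / y + 16.0 / x

def grad(x, y):
    """Partial derivatives (C_x, C_y) of the cost function."""
    return 4.0 * y - 16.0 / x**2, 4.0 * x - 16.0 / y**2

x0 = y0 = 4.0 ** (1.0 / 3.0)      # claimed minimizer
z0 = 8.0 / (x0 * y0)              # corresponding height

print(grad(x0, y0))               # both components should be close to zero
print(cost(x0, y0), [cost(x0 + dx, y0 + dy)
                     for dx, dy in [(0.05, 0), (-0.05, 0), (0, 0.05), (0, -0.05)]])
```

The gradient vanishes at (4^{1/3}, 4^{1/3}) up to rounding, and each perturbed point gives a larger cost, which is consistent with the second derivative test in the solution.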
8.9.23 The perimeter of the window is P = x + 2y + x sec θ, its area is
\[ A = xy + \frac{1}{2} x \cdot \frac{1}{2} x \tan \theta = xy + \frac{1}{4} x^2 \tan \theta . \]
Since P = 4, we can solve for y to obtain y = (1/2)(4 − x − x sec θ). Thus
\[ A(x, \theta) = 2x - \frac{1}{4} x^2 (2 + 2 \sec \theta - \tan \theta) . \]
In order to maximize A, we first set
\[ 0 = A_\theta = -\frac{1}{4} x^2 (2 \sec \theta \tan \theta - \sec^2 \theta) , \]
which gives 2 tan θ = sec θ, since x² ≠ 0 and sec θ ≠ 0, so 2 sin θ = 1 and θ = π/6. Next, we set
\[ 0 = A_x = 2 - \frac{1}{2} x (2 + 2 \sec \theta - \tan \theta) = 2 - \frac{1}{2} x \left( 2 + \frac{4}{\sqrt{3}} - \frac{1}{\sqrt{3}} \right) , \]
which gives 2 − (1/2) x (2 + √3) = 0, hence
\[ x = \frac{4}{2 + \sqrt{3}} = 8 - 4 \sqrt{3} , \quad y = 2 - \frac{2}{3} \sqrt{3} . \]

8.9.24 We will fit the data to a line y = ax + b using the method of least squares. We get
\[ \sum_{k=1}^{10} x_k^2 = 54785 , \quad \sum_{k=1}^{10} x_k = 723 , \quad \sum_{k=1}^{10} x_k y_k = 54277 , \quad \sum_{k=1}^{10} y_k = 708 . \]
We have to solve
\[ \begin{aligned}
54785 a + 723 b &= 54277 \\
723 a + 10 b &= 708
\end{aligned} \]
This gives a = 30886/25121 ≈ 1.23 and b = −454491/25121 ≈ −18.09. Thus y ≈ 1.23x − 18.09. For x = 70 we obtain y ≈ 68.
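
The 2 × 2 normal equations of 8.9.24 can be solved with Cramer's rule. The sketch below uses only the four sums quoted in the solution; the function name `fit_line` and its argument names are illustrative:

```python
def fit_line(sum_x2, sum_x, sum_xy, sum_y, n):
    """Solve the least squares normal equations
         sum_x2 * a + sum_x * b = sum_xy
         sum_x  * a + n     * b = sum_y
    for slope a and intercept b by Cramer's rule."""
    det = sum_x2 * n - sum_x * sum_x
    a = (sum_xy * n - sum_x * sum_y) / det
    b = (sum_x2 * sum_y - sum_x * sum_xy) / det
    return a, b

a, b = fit_line(54785, 723, 54277, 708, 10)
print(a, b, a * 70 + b)
```

The determinant is 54785 · 10 − 723² = 25121, which is the common denominator appearing in the fractions for a and b.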
8.9.26 (a) We have F(x, y, λ) = xy + λ(4x² + 8y² − 16). Setting
\[ 0 = F_x = y + 8x\lambda , \quad 0 = F_y = x + 16y\lambda , \]
we get y/(8x) = x/(16y), so x² = 2y², and thus 4 · 2y² + 8y² = 16. This gives y² = 1 and therefore y = ±1. We thus have four critical points, namely (±√2, −1) and (±√2, 1). Since
\[ f(-\sqrt{2}, -1) = f(\sqrt{2}, 1) = \sqrt{2} , \quad f(-\sqrt{2}, 1) = f(\sqrt{2}, -1) = -\sqrt{2} , \]
and since there exists a global minimum and a global maximum, the maximum value equals √2 and is attained at (−√2, −1) and (√2, 1), while the minimum value equals −√2 and is attained at (−√2, 1) and (√2, −1).
(b) We have F(x, y, λ) = x − 3y − 1 + λ(x² + 3y² − 16). Setting
\[ 0 = F_x = 1 + 2\lambda x , \quad 0 = F_y = -3 + 6\lambda y , \]
we get 1/(2x) = −1/(2y), so y = −x, hence x² + 3x² = 16 and thus x = ±2. We obtain the critical points (−2, 2) and (2, −2). Since a maximum and a minimum exist and f(−2, 2) = −9, f(2, −2) = 7, we have a maximum at (2, −2) with value 7, and a minimum at (−2, 2) with value −9.
8.9.27 We have to minimize the squared distance f(x, y) = x² + y² of (x, y) from the origin, subject to the constraint 2x − 4y = 3. We define F(x, y, λ) = x² + y² + λ(2x − 4y − 3) and set
\[ 0 = F_x = 2x + 2\lambda , \quad 0 = F_y = 2y - 4\lambda . \]
The equations 2x = −2λ, 2y = 4λ give y = −2x, so 2x + 8x = 3 and therefore x = 3/10. Therefore, the point we are looking for is (3/10, −3/5).
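8.9.29 (a)
\[ \begin{aligned}
\int_{0}^{3} \int_{-2}^{-1} (4x y^3 + y) \, dx \, dy &= \int_{0}^{3} \big[ 2 x^2 y^3 + xy \big]_{x=-2}^{x=-1} \, dy \\
&= \int_{0}^{3} (-6 y^3 + y) \, dy = \left[ -\frac{6}{4} y^4 + \frac{y^2}{2} \right]_{0}^{3} = -117 .
\end{aligned} \]
(b)
\[ \begin{aligned}
\int_{0}^{\pi/6} \int_{0}^{\pi/2} (x \cos y - y \cos x) \, dy \, dx &= \int_{0}^{\pi/6} \left[ x \sin y - \frac{y^2}{2} \cos x \right]_{y=0}^{y=\pi/2} dx \\
&= \int_{0}^{\pi/6} \left( x - \frac{\pi^2}{8} \cos x \right) dx = \left[ \frac{x^2}{2} - \frac{\pi^2}{8} \sin x \right]_{0}^{\pi/6} = \frac{\pi^2}{72} - \frac{\pi^2}{8} \cdot \frac{1}{2} \\
&= -\frac{7}{144} \pi^2 \approx -0.48 .
\end{aligned} \]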
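
Both iterated integrals of 8.9.29 can be cross-checked with a simple two-dimensional midpoint rule. A minimal sketch; the helper name `double_midpoint` and the grid size are illustrative assumptions:

```python
import math

def double_midpoint(f, ax, bx, ay, by, n=200):
    """Midpoint rule for the integral of f(x, y) over [ax, bx] x [ay, by]
    using an n-by-n grid of cells."""
    hx = (bx - ax) / n
    hy = (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

val_a = double_midpoint(lambda x, y: 4 * x * y**3 + y, -2, -1, 0, 3)
val_b = double_midpoint(lambda x, y: x * math.cos(y) - y * math.cos(x),
                        0, math.pi / 6, 0, math.pi / 2)
print(val_a, val_b)
```

The midpoint rule has an error of order h², so with 200 cells per direction the numerical values agree with −117 and −7π²/144 to a few decimal places only.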

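8.9.30 A rough sketch is given in Fig. E.22. The area A becomes
\[ \begin{aligned}
A &= \int_{-3}^{4} \int_{2 - \frac{x}{2}}^{8 - \frac{x^2}{2}} dy \, dx = \int_{-3}^{4} \left( 8 - \frac{x^2}{2} - 2 + \frac{x}{2} \right) dx \\
&= \int_{-3}^{4} \left( 6 - \frac{x^2}{2} + \frac{x}{2} \right) dx = \left[ 6x - \frac{x^3}{6} + \frac{x^2}{4} \right]_{-3}^{4} = \frac{343}{12} .
\end{aligned} \]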

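Fig. E.22 Lake bordered by a straight dam

Fig. E.23 Area outside a circle and inside another circle

8.9.31 The region D is shown as the shaded portion in Fig. E.23. Solving r = 100 and r = 200 sin θ, the points of intersection are obtained as sin θ = 1/2. So θ varies from π/6 to 5π/6. Due to symmetry, it suffices to let θ vary from π/6 to π/2 and multiply by 2, so the area becomes
\[ \begin{aligned}
A &= \iint_D dA = 2 \int_{\pi/6}^{\pi/2} \int_{100}^{200 \sin \theta} r \, dr \, d\theta \\
&= 2 \int_{\pi/6}^{\pi/2} \left[ \frac{r^2}{2} \right]_{r=100}^{r=200 \sin \theta} d\theta = 10^4 \int_{\pi/6}^{\pi/2} (4 \sin^2 \theta - 1) \, d\theta \\
&= 10^4 \int_{\pi/6}^{\pi/2} \left( 4 \, \frac{1 - \cos 2\theta}{2} - 1 \right) d\theta = 10^4 \int_{\pi/6}^{\pi/2} (1 - 2 \cos 2\theta) \, d\theta \\
&= 10^4 \big[ \theta - \sin 2\theta \big]_{\pi/6}^{\pi/2} = 10^4 \left( \frac{\pi}{2} - \frac{\pi}{6} + \frac{\sqrt{3}}{2} \right) \\
&= 10^4 \left( \frac{\pi}{3} + \frac{\sqrt{3}}{2} \right) .
\end{aligned} \]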

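8.9.32 For the lemniscate r² = a² sin 2θ, the area A of the region R becomes (Fig. E.24)
\[ \begin{aligned}
A &= \iint_R dA = \int_{0}^{\pi/2} \int_{0}^{a \sqrt{\sin 2\theta}} r \, dr \, d\theta = \int_{0}^{\pi/2} \frac{1}{2} a^2 \sin 2\theta \, d\theta \\
&= \left[ -\frac{1}{4} a^2 \cos 2\theta \right]_{0}^{\pi/2} = -\frac{1}{4} a^2 (-1 - 1) = \frac{1}{2} a^2 .
\end{aligned} \]

Fig. E.24 One loop of a lemniscate

8.9.33 (a) Transforming into polar coordinates, we get
\[ \int_{0}^{a} \int_{0}^{\sqrt{a^2 - x^2}} e^{-(x^2 + y^2)} \, dy \, dx = \int_{0}^{\pi/2} \int_{0}^{a} e^{-r^2} r \, dr \, d\theta =: I . \]
To evaluate \(\int_0^a e^{-r^2} r \, dr\), we substitute t = r², dt = 2r dr. We then obtain
\[ \int_{0}^{a} e^{-r^2} r \, dr = \frac{1}{2} \int_{0}^{a^2} e^{-t} \, dt = -\frac{1}{2} \big[ e^{-t} \big]_{0}^{a^2} = -\frac{1}{2} (e^{-a^2} - 1) = \frac{1 - e^{-a^2}}{2} , \]
\[ I = \frac{1}{2} \int_{0}^{\pi/2} (1 - e^{-a^2}) \, d\theta = \frac{\pi}{4} (1 - e^{-a^2}) . \]
(b) The region of integration is bounded by y = 0 and the semicircle y = √(a² − x²). We obtain
\[ \begin{aligned}
\int_{-a}^{a} \int_{0}^{\sqrt{a^2 - x^2}} (x^2 + y^2)^{3/2} \, dy \, dx &= \int_{0}^{\pi} \int_{0}^{a} r^3 \, r \, dr \, d\theta \\
&= \int_{0}^{\pi} \left[ \frac{r^5}{5} \right]_{r=0}^{r=a} d\theta = \frac{1}{5} a^5 \int_{0}^{\pi} d\theta = \frac{1}{5} \pi a^5 .
\end{aligned} \]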

Solutions to the Exercises of Chap. 9


9.7.1 (a) Let x = (x1 , x2 , x3 ), y = (y1 , y2 , y3 ) and z = (z 1 , z 2 , z 3 )
Then

(x + y) + z = [(x1 , x2 , x3 ) + (y1 , y2 , y3 )] + (z 1 , z 2 , z 3 )
= (x1 + y1 , x2 + y2 , x3 + y3 ) + (z 1 , z 2 , z 3 )
= [(x1 + y1 ) + z 1 , (x2 + y2 ) + z 2 , (x3 + y3 ) + z 3 ]
= [x1 + (y1 + z 1 ), x2 + (y2 + z 2 ), x3 + (y3 + z 3 )]
= (x1 , x2 , x3 ) + (y1 + z 1 , y2 + z 2 , y3 + z 3 )
= x + (y + z) .

(d) Let x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ). Then

λ(x · y) = λ(x1 y1 + x2 y2 + x3 y3 ) = (λx1 )y1 + (λx2 )y2 + (λx3 )y3


= (λx) · y .

Analogously λ(x · y) = x · (λy).


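(e) Let x = (x₁, x₂, x₃), y = (y₁, y₂, y₃). Then
\[ x \cdot (x \times y) = \begin{vmatrix} x_1 & x_2 & x_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} = 0 , \]
since the determinant is zero when two rows are equal.
(f) For the same reason
\[ y \cdot (x \times y) = \begin{vmatrix} y_1 & y_2 & y_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} = 0 . \]
(g) Let x = x₁i + x₂j + x₃k, y = y₁i + y₂j + y₃k, z = z₁i + z₂j + z₃k. Then
\[ \begin{aligned}
x \times (y + z) &= \begin{vmatrix} i & j & k \\ x_1 & x_2 & x_3 \\ y_1 + z_1 & y_2 + z_2 & y_3 + z_3 \end{vmatrix} \\
&= \begin{vmatrix} i & j & k \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} + \begin{vmatrix} i & j & k \\ x_1 & x_2 & x_3 \\ z_1 & z_2 & z_3 \end{vmatrix} \\
&= (x \times y) + (x \times z) .
\end{aligned} \]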

(h) x × (y × z) is parallel to the plane spanned by y and z, and orthogonal to x. Therefore x × (y × z) = αy + βz for some scalars α and β, and
\[ 0 = x \cdot \big( x \times (y \times z) \big) = \alpha (x \cdot y) + \beta (x \cdot z) . \]
Thus α = λ(x · z) and β = −λ(x · y) for some scalar λ. This gives
\[ x \times (y \times z) = \lambda \big[ (x \cdot z) y - (x \cdot y) z \big] . \qquad (*) \]
Assume now that z = a₃k, y = b₂j + b₃k and x = c₁i + c₂j + c₃k. Substituting these for x, y, z in (*) we have
\[ x \times (y \times z) = x \times (b_2 a_3 i) = -c_2 b_2 a_3 k + b_2 a_3 c_3 j . \]
Moreover,
\[ \begin{aligned}
\lambda \big[ (x \cdot z) y - (x \cdot y) z \big] &= \lambda \big[ (c_3 a_3)(b_2 j + b_3 k) - (c_2 b_2 + b_3 c_3)(a_3 k) \big] \\
&= \lambda \big[ c_3 a_3 b_2 j - c_2 b_2 a_3 k \big] ,
\end{aligned} \]
so λ = 1 and x × (y × z) = (x · z)y − (x · y)z. The case of general y and z can be treated either by rotating the standard unit vectors into a suitable orthogonal triple of unit vectors, or by doing an analogous computation for other choices of y and z, and employing linearity.
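(i) Let x = x₁i + x₂j + x₃k, y = y₁i + y₂j + y₃k, z = z₁i + z₂j + z₃k. Then
\[ x \cdot (y \times z) = \begin{vmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{vmatrix} , \quad y \cdot (z \times x) = \begin{vmatrix} y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \\ x_1 & x_2 & x_3 \end{vmatrix} , \quad z \cdot (x \times y) = \begin{vmatrix} z_1 & z_2 & z_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} . \]
Each determinant can be obtained from the others by two row interchanges. Therefore, by the properties of determinants, they are equal, so
\[ x \cdot (y \times z) = y \cdot (z \times x) = z \cdot (x \times y) . \]
(j)
\[ x \times x = \begin{vmatrix} i & j & k \\ x_1 & x_2 & x_3 \\ x_1 & x_2 & x_3 \end{vmatrix} = 0 . \]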

9.7.2 We want to show that
\[ (x \times y) \cdot (z \times w) = (x \cdot z)(y \cdot w) - (y \cdot z)(x \cdot w) \qquad (*) \]
holds for arbitrary vectors.
(a) If x = y, both sides of (*) are zero.
(b) For x = k and y = αi + βj, we compute
\[ \begin{aligned}
x \times y &= -\beta i + \alpha j , \\
z \times w &= (z_2 w_3 - z_3 w_2) i + (z_3 w_1 - z_1 w_3) j + (z_1 w_2 - z_2 w_1) k , \\
(x \times y) \cdot (z \times w) &= -\beta (z_2 w_3 - z_3 w_2) + \alpha (z_3 w_1 - z_1 w_3) , \\
x \cdot z &= z_3 , \quad y \cdot w = \alpha w_1 + \beta w_2 , \\
x \cdot w &= w_3 , \quad y \cdot z = \alpha z_1 + \beta z_2 , \\
(x \cdot z)(y \cdot w) - (y \cdot z)(x \cdot w) &= z_3 (\alpha w_1 + \beta w_2) - (\alpha z_1 + \beta z_2) w_3 \\
&= \alpha (z_3 w_1 - z_1 w_3) - \beta (z_2 w_3 - z_3 w_2) .
\end{aligned} \]
(c) Decompose an arbitrary y into y = (αi + βj) + γk, apply (a) and (b) and add the resulting equations.
(d) Perform steps (a)–(c) analogously for x = i and x = j, instead of x = k. Decompose an arbitrary x as x = αi + βj + γk, and apply the previous results.
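
The identity (*) of 9.7.2 can be spot-checked on concrete integer vectors, where all arithmetic is exact. A minimal sketch with hand-written `dot` and `cross` helpers; the sample vectors are arbitrary illustrative choices:

```python
def dot(u, v):
    """Scalar product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Vector product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def lagrange_lhs(x, y, z, w):
    return dot(cross(x, y), cross(z, w))

def lagrange_rhs(x, y, z, w):
    return dot(x, z) * dot(y, w) - dot(y, z) * dot(x, w)

x, y, z, w = (1, 2, 3), (4, 5, 6), (7, 8, 10), (-1, 0, 2)
print(lagrange_lhs(x, y, z, w), lagrange_rhs(x, y, z, w))
```

A numerical check of this kind does not replace the proof, but it catches sign errors quickly; the same helpers also confirm x · (x × y) = 0 from Exercise 9.7.1(e).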
9.7.3 (a) (i) We have x0 = 4, y0 = 2, a = −1 and b = 5, therefore the parametric
equations are x = 4 − t, y = 2 + 5t.
(ii) x = 1 + 4t, y = 2 + 5t, z = −3 − 7t.
(b) (i) x = 2 + t, y = −3 − 4t,
(ii) x = −1 − t, y = 3t, z = 2.
9.7.4 The equation is (x − 2) + 4(y − 6) + 2(z − 1) = 0, or x + 4y + 2z = 28.
9.7.5 (c) Taking F = G = r in (a) yields
\[ \frac{d}{dt} \big[ r(t) \cdot r(t) \big] = r(t) \cdot r'(t) + r'(t) \cdot r(t) , \quad \text{so} \quad \frac{d}{dt} \| r(t) \|^2 = 2 \, r(t) \cdot r'(t) . \]
Since ‖r(t)‖ is constant, the derivative of ‖r(t)‖² is zero, thus 2r(t) · r'(t) = 0. Hence, r(t) is orthogonal to r'(t).
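9.7.6 (a) (i) For F = sinh(x − z) i + 2y j + (z − y²) k we get
\[ \operatorname{curl} F = \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ \sinh (x - z) & 2y & z - y^2 \end{vmatrix} = -2y \, i - \cosh (x - z) \, j , \]
\[ \operatorname{div}(\operatorname{curl} F) = \frac{\partial}{\partial x} (-2y) + \frac{\partial}{\partial y} (-\cosh (x - z)) + \frac{\partial}{\partial z} (0) = 0 . \]
(ii) For F = x² i + y² j + z² k we obtain
\[ \operatorname{curl} F = \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ x^2 & y^2 & z^2 \end{vmatrix} = 0 i + 0 j + 0 k = 0 , \quad \operatorname{div}(\operatorname{curl} F) = 0 . \]
(b) (i) For ϕ(x, y, z) = −2x³yz² we obtain
\[ \frac{\partial \varphi}{\partial x} = -6 x^2 y z^2 , \quad \frac{\partial \varphi}{\partial y} = -2 x^3 z^2 , \quad \frac{\partial \varphi}{\partial z} = -4 x^3 y z , \]
therefore
\[ \begin{aligned}
\nabla \varphi &= -6 x^2 y z^2 \, i - 2 x^3 z^2 \, j - 4 x^3 y z \, k , \\
\operatorname{curl} \nabla \varphi &= \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ -6 x^2 y z^2 & -2 x^3 z^2 & -4 x^3 y z \end{vmatrix} \\
&= (-4 x^3 z + 4 x^3 z) i + (-12 x^2 y z + 12 x^2 y z) j + (-6 x^2 z^2 + 6 x^2 z^2) k = 0 .
\end{aligned} \]
(ii) For ϕ(x, y, z) = e^{x+y+z} we obtain
\[ \begin{aligned}
\nabla \varphi &= e^{x+y+z} (i + j + k) , \\
\operatorname{curl} \nabla \varphi &= \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ e^{x+y+z} & e^{x+y+z} & e^{x+y+z} \end{vmatrix} \\
&= (e^{x+y+z} - e^{x+y+z}) i + (e^{x+y+z} - e^{x+y+z}) j + (e^{x+y+z} - e^{x+y+z}) k = 0 .
\end{aligned} \]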

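9.7.7 (a) Let F = f₁i + f₂j + f₃k, G = g₁i + g₂j + g₃k. Then

\[
\mathbf{F}\times\mathbf{G} = [f_2g_3 - f_3g_2]\,\mathbf{i} + [f_3g_1 - f_1g_3]\,\mathbf{j} + [f_1g_2 - f_2g_1]\,\mathbf{k},
\]

and moreover

\[
\begin{aligned}
\nabla\cdot(\mathbf{F}\times\mathbf{G})
&= \frac{\partial}{\partial x}[f_2g_3 - f_3g_2] + \frac{\partial}{\partial y}[f_3g_1 - f_1g_3] + \frac{\partial}{\partial z}[f_1g_2 - f_2g_1]\\
&= g_1\Bigl(\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\Bigr)
 + g_2\Bigl(\frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x}\Bigr)
 + g_3\Bigl(\frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}\Bigr)\\
&\quad + f_1\Bigl(\frac{\partial g_2}{\partial z} - \frac{\partial g_3}{\partial y}\Bigr)
 + f_2\Bigl(\frac{\partial g_3}{\partial x} - \frac{\partial g_1}{\partial z}\Bigr)
 + f_3\Bigl(\frac{\partial g_1}{\partial y} - \frac{\partial g_2}{\partial x}\Bigr)\\
&= \mathbf{G}\cdot(\nabla\times\mathbf{F}) - \mathbf{F}\cdot(\nabla\times\mathbf{G}).
\end{aligned}
\]

(b) From (a) we know that ∇ · (F × G) = G · (∇ × F) − F · (∇ × G). Take F = ∇ϕ, G = ∇ψ. Using ∇ × ∇ϕ = 0, ∇ × ∇ψ = 0, we get

\[
\nabla\cdot(\nabla\phi\times\nabla\psi) = \nabla\psi\cdot\mathbf{0} - \nabla\phi\cdot\mathbf{0} = 0.
\]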

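9.7.8 Let C = c₁i + c₂j + c₃k.
(a) Since r(x, y, z) = xi + yj + zk, we get C · r = c₁x + c₂y + c₃z, so
∇(C · r) = c₁i + c₂j + c₃k = C.
(b) Since C is a constant function, div C = 0 and thus

div(r − C) = div r − div C = div r = 1 + 1 + 1 = 3 .

(c)
\[
\nabla\times(\mathbf{r} - \mathbf{C}) =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\
x - c_1 & y - c_2 & z - c_3
\end{vmatrix}
= \mathbf{0}.
\]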
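9.7.11 The work done is equal to $\oint_C \mathbf{F}\cdot d\mathbf{r}$. Let D be the disk enclosed by the circle C, and let A(D) denote its area. We compute, using the theorem of Green–Ostrogradski,

\[
\begin{aligned}
\oint_C \mathbf{F}\cdot d\mathbf{r}
&= \oint_C (e^x - y + x\cosh x)\,dx + (y^{3/2} + x)\,dy\\
&= \iint_D \Bigl[\frac{\partial}{\partial x}(y^{3/2} + x) - \frac{\partial}{\partial y}(e^x - y + x\cosh x)\Bigr]\,dA\\
&= \iint_D (1 + 1)\,dA = 2A(D) = 2\pi(12)^2 = 288\pi.
\end{aligned}
\]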

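9.7.12 Let D denote the region enclosed by the curve C.

(a)
\[
\begin{aligned}
\oint_C \mathbf{F}\cdot d\mathbf{r}
&= \iint_D \Bigl[\frac{\partial}{\partial x}(-xy^2) - \frac{\partial}{\partial y}(x^2y)\Bigr]\,dA = \iint_D (-y^2 - x^2)\,dA\\
&= \int_0^{\pi/2}\!\!\int_0^2 (-r^2)\,r\,dr\,d\theta \quad\text{(using polar coordinates)}\\
&= \frac{\pi}{2}\int_0^2 -r^3\,dr = -\frac{\pi}{2}\Bigl[\frac{r^4}{4}\Bigr]_0^2 = -2\pi.
\end{aligned}
\]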

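(b)
\[
\begin{aligned}
\oint_C \mathbf{F}\cdot d\mathbf{r}
&= \iint_D \Bigl[\frac{\partial}{\partial x}(\sinh y^3 - 4x) - \frac{\partial}{\partial y}(e^{\sin x} - y)\Bigr]\,dA\\
&= \iint_D (-4 + 1)\,dA = -3A(D) = -3\pi\cdot 4^2 = -48\pi.
\end{aligned}
\]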

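(c)
\[
\begin{aligned}
\oint_C \mathbf{F}\cdot d\mathbf{r}
&= \iint_D \Bigl[\frac{\partial}{\partial x}(x^2 - y^2) - \frac{\partial}{\partial y}(x^2 + y^2)\Bigr]\,dA\\
&= \iint_D (2x - 2y)\,dA = 2\iint_D (x - y)\,dA = 0,
\end{aligned}
\]

since the integrand f(x, y) = x − y satisfies f(−x, −y) = −f(x, y) and the region is symmetric with respect to the origin.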
9.7.13 (a) For F = 4xi − 6yj + k we get

\[
\operatorname{div}\mathbf{F} = \frac{\partial}{\partial x}(4x) - \frac{\partial}{\partial y}(6y) + \frac{\partial}{\partial z}(1) = 4 - 6 + 0 = -2.
\]

Let D be the solid cylinder defined by x² + y² ≤ 4 and 0 ≤ z ≤ 6. It has radius 2 and height 6, so its volume equals 6π · 2² = 24π. We get

\[
\iiint_D \operatorname{div}\mathbf{F}\,dV = \iiint_D (-2)\,dV = -48\pi.
\]

(b) We have F = 2yzi − 4xzj + xyk and therefore div F = 0, hence $\iiint_D \operatorname{div}\mathbf{F}\,dV = 0$ for the ball D bounded by the sphere Σ.
9.7.14 For F = 3xyi + z²k we have div F = 3y + 2z. By the divergence theorem,

\[
\iint_\Sigma \mathbf{F}\cdot\mathbf{n}\,d\sigma = \iiint_D \operatorname{div}\mathbf{F}\,dV = \iiint_D (3y + 2z)\,dV.
\]

The integral is zero because the integrand f(x, y, z) = 3y + 2z satisfies f(−x, −y, −z) = −f(x, y, z) and the region D is symmetric with respect to the origin. Therefore

\[
\iint_\Sigma \mathbf{F}\cdot\mathbf{n}\,d\sigma = 0.
\]

9.7.15 By Gauss’ divergence theorem

\[
\iint_\Sigma \mathbf{C}\cdot\mathbf{n}\,d\sigma = \iiint_D \operatorname{div}\mathbf{C}\,dV = \iiint_D 0\,dV = 0,
\]

since C is constant and therefore div C = 0.

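9.7.16 For F = xyi + yzj + xyk we have

\[
\operatorname{curl}\mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\
xy & yz & xy
\end{vmatrix}
= (x - y)\,\mathbf{i} - y\,\mathbf{j} - x\,\mathbf{k}.
\]

The surface Σ is parametrized by z = S(x, y) = 8 − 2x − 4y with domain D = {(x, y) : x ≥ 0, y ≥ 0, x + 2y ≤ 4}. The partial derivatives ∂ₓS = −2 and ∂_yS = −4 are constant, an outer normal is given by N = 2i + 4j + k, and the corresponding outer unit normal is

\[
\mathbf{n} = \frac{1}{\sqrt{21}}\,(2\,\mathbf{i} + 4\,\mathbf{j} + \mathbf{k}).
\]

Since $\sqrt{1 + (\partial_x S)^2 + (\partial_y S)^2} = \sqrt{21}$, the surface integral is computed as

\[
\begin{aligned}
\iint_\Sigma (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\,d\sigma
&= \iint_D (2x - 2y - 4y - x)\,dA
 = \int_0^2\!\!\int_0^{4-2y} (x - 6y)\,dx\,dy
 = \int_0^2 \Bigl[\frac{x^2}{2} - 6xy\Bigr]_{x=0}^{x=4-2y}\,dy\\
&= \int_0^2 \Bigl[\frac{(4-2y)^2}{2} - 6y(4-2y)\Bigr]\,dy
 = \int_0^2 2\,(4 + y^2 - 4y - 12y + 6y^2)\,dy\\
&= 2\int_0^2 (4 + 7y^2 - 16y)\,dy
 = 2\Bigl[4y + 7\,\frac{y^3}{3} - 16\,\frac{y^2}{2}\Bigr]_{y=0}^{y=2}\\
&= 2\Bigl(8 + \frac{56}{3} - 32\Bigr) = -\frac{32}{3}.
\end{aligned}
\]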
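The value −32/3 in 9.7.16 can be cross-checked numerically. The following is a minimal sketch, assuming a plain midpoint rule on the triangle D; the function names and the grid size are illustrative choices and not part of the book.

```python
# Midpoint-rule check of the double integral of (curl F) . N over
# D = {x >= 0, y >= 0, x + 2y <= 4} from Exercise 9.7.16.
# Function names and the grid size n are illustrative assumptions.

def flux_integrand(x, y):
    """(curl F) . N with curl F = (x - y, -y, -x) and N = (2, 4, 1)."""
    return 2.0 * (x - y) + 4.0 * (-y) + 1.0 * (-x)


def integrate_over_triangle(f, n=400):
    """Midpoint rule: outer variable y in [0, 2], inner x in [0, 4 - 2y]."""
    total = 0.0
    hy = 2.0 / n
    for j in range(n):
        y = (j + 0.5) * hy
        x_max = 4.0 - 2.0 * y
        hx = x_max / n
        for i in range(n):
            x = (i + 0.5) * hx
            total += f(x, y) * hx * hy
    return total


approx = integrate_over_triangle(flux_integrand)
exact = -32.0 / 3.0
print(approx, exact)
```

The factor √21 from the surface element cancels against the normalization of n, which is why the sketch integrates (curl F) · N directly over D.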
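9.7.17 By Stokes’ theorem, $\oint_C \mathbf{F}\cdot d\mathbf{r} = \iint_\Sigma (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\,d\sigma$, where n is a suitably oriented normal to the disk x² + y² ≤ 1. We have

\[
\operatorname{curl}\mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\
x - y & x^2y & xza
\end{vmatrix}
= -za\,\mathbf{j} + (2xy + 1)\,\mathbf{k}.
\]

Since the disk is horizontal, it is parametrized by z = S(x, y) = 0, and n = k. Therefore, curl F · n = 2xy + 1 and $\sqrt{1 + (\partial_x S)^2 + (\partial_y S)^2} = \sqrt{1} = 1$. We now compute

\[
\oint_C \mathbf{F}\cdot d\mathbf{r} = \iint_\Sigma (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\,d\sigma = \iint_D (2xy + 1)\,dA = \iint_D 1\,dA = \pi,
\]

since the integral of 2xy over the disk D vanishes by symmetry and the area of the disk D equals π.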

9.7.18 Since in all three cases the vector fields are defined on all of R³, by Remark 9.7 the field F is conservative if and only if curl F = 0 everywhere.
(a) For F = cosh(x + y)(i + j − k) we have

\[
\operatorname{curl}\mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\
\cosh(x+y) & \cosh(x+y) & -\cosh(x+y)
\end{vmatrix}
= -\sinh(x+y)\,\mathbf{i} + \sinh(x+y)\,\mathbf{j}.
\]

Since curl F is not everywhere zero, F is not conservative.

(b) For F = 2xi − 2yj + 2zk we have

\[
\operatorname{curl}\mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\
2x & -2y & 2z
\end{vmatrix}
= 0\,\mathbf{i} + 0\,\mathbf{j} + 0\,\mathbf{k} = \mathbf{0}.
\]

Hence F is conservative.
(c) The vector field F(x, y, z) = i − 2j + k is constant, hence all its partial derivatives are zero and therefore curl F is everywhere zero. Hence F is conservative.
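9.7.19 The given vector field is F = (x²y − z²)i + (y³ − x)j + (2x + 3z − 1)k. The parametric equations of the curve C are x = cos t, y = sin t, 0 ≤ t ≤ 2π, thus r(t) = cos t i + sin t j and r′(t) = − sin t i + cos t j. We compute

\[
\begin{aligned}
\mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)
&= \cos^2 t\,\sin t\,(-\sin t) + (\sin^3 t - \cos t)\cos t\\
&= -\cos^2 t\,\sin^2 t + \sin^3 t\,\cos t - \cos^2 t\\
&= -\frac{\sin^2 2t}{4} + \sin^3 t\,\cos t - \frac{1}{2}(1 + \cos 2t).
\end{aligned}
\]

The line integral becomes

\[
\begin{aligned}
\oint_C \mathbf{F}\cdot d\mathbf{r}
&= \int_0^{2\pi} \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt\\
&= \int_0^{2\pi} \Bigl[-\frac{1}{8}(1 - \cos 4t) + \sin^3 t\,\cos t - \frac{1}{2} - \frac{1}{2}\cos 2t\Bigr]\,dt\\
&= -\frac{1}{8}\cdot 2\pi - \frac{1}{2}\cdot 2\pi = -\frac{\pi}{4} - \pi = -\frac{5}{4}\pi.
\end{aligned}
\]

The surface Σ is described by z = S(x, y), where S(x, y) = 1 − x² − y² with domain D = {(x, y) : x² + y² ≤ 1}. Since ∂ₓS = −2x and ∂_yS = −2y, the vector field of unit normals for Σ with the correct orientation is

\[
\mathbf{n} = \frac{\mathbf{N}}{\|\mathbf{N}\|},\qquad \mathbf{N} = 2x\,\mathbf{i} + 2y\,\mathbf{j} + \mathbf{k}.
\]

Furthermore, we have

\[
\operatorname{curl}\mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\
x^2y - z^2 & y^3 - x & 2x + 3z - 1
\end{vmatrix}
= (-2z - 2)\,\mathbf{j} + (-1 - x^2)\,\mathbf{k}.
\]

Since $\sqrt{1 + (\partial_x S)^2 + (\partial_y S)^2} = \|\mathbf{N}\|$, the surface integral becomes

\[
\begin{aligned}
\iint_\Sigma (\operatorname{curl}\mathbf{F})\cdot\mathbf{n}\,d\sigma
&= \iint_\Sigma \frac{1}{\|\mathbf{N}\|}\bigl((-2z - 2)\,\mathbf{j} + (-1 - x^2)\,\mathbf{k}\bigr)\cdot(2x\,\mathbf{i} + 2y\,\mathbf{j} + \mathbf{k})\,d\sigma\\
&= \iint_D \bigl[2y(2x^2 + 2y^2 - 4) - 1 - x^2\bigr]\,dA\\
&= \int_0^{2\pi}\!\!\int_0^1 \bigl[2r\sin\theta\,(2r^2 - 4) - 1 - r^2\cos^2\theta\bigr]\,r\,dr\,d\theta\\
&= \cdots = -\frac{5}{4}\pi.
\end{aligned}
\]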
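Both sides of Stokes' theorem in 9.7.19 can be evaluated numerically as a plausibility check. The sketch below assumes midpoint rules in t and in polar coordinates (r, θ); the helper names and step counts are illustrative, not taken from the book.

```python
import math

# Numerical check of both integrals in Exercise 9.7.19.
# Helper names and step counts are illustrative assumptions.

def field(x, y, z):
    """F = (x^2 y - z^2, y^3 - x, 2x + 3z - 1)."""
    return (x * x * y - z * z, y ** 3 - x, 2.0 * x + 3.0 * z - 1.0)


def line_integral(n=20000):
    """Midpoint rule for the circulation along r(t) = (cos t, sin t, 0)."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        fx, fy, _ = field(math.cos(t), math.sin(t), 0.0)
        total += (fx * (-math.sin(t)) + fy * math.cos(t)) * h
    return total


def surface_integral(n=300):
    """Midpoint rule in polar coordinates for (curl F) . N over the unit disk."""
    hr = 1.0 / n
    ht = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            th = (j + 0.5) * ht
            g = 2.0 * r * math.sin(th) * (2.0 * r * r - 4.0) - 1.0 - (r * math.cos(th)) ** 2
            total += g * r * hr * ht
    return total


circulation = line_integral()
flux = surface_integral()
print(circulation, flux, -5.0 * math.pi / 4.0)
```

Both printed approximations should lie close to −5π/4 ≈ −3.927.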
Solutions to the Exercises of Chap. 10
10.5.1 A finite or infinite set of vectors x₁, x₂, . . . is said to be linearly independent if for all n and all scalars λ_i

\[
\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n = 0
\]

implies that λ_i = 0 for every i = 1, 2, . . . , n.

The set {x_i} is said to be orthonormal if ‖x_i‖ = 1 for all i and x_i ⊥ x_j whenever i ≠ j, that is, it consists of mutually orthogonal unit vectors.

An orthogonal set can be converted into an orthonormal set by dividing each vector by its magnitude.
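10.5.2 (a) We have

\[
\int_{\pi/4}^{5\pi/4} e^x \sin x\,dx = \Bigl[\frac{1}{2}e^x\sin x - \frac{1}{2}e^x\cos x\Bigr]_{\pi/4}^{5\pi/4} = 0.
\]

(b) For m ≠ n we have

\[
\begin{aligned}
\int_0^{\pi/2} \cos(2n+1)x\,\cos(2m+1)x\,dx
&= \frac{1}{2}\int_0^{\pi/2} \bigl[\cos 2(n-m)x + \cos 2(n+m+1)x\bigr]\,dx\\
&= \frac{1}{4(n-m)}\sin 2(n-m)x\,\Big|_0^{\pi/2} + \frac{1}{4(n+m+1)}\sin 2(n+m+1)x\,\Big|_0^{\pi/2}\\
&= 0.
\end{aligned}
\]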

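10.5.3 (a) The Fourier coefficients for f(x) = x + π are

\[
\begin{aligned}
a_0 &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} (x + \pi)\,dx = 2\pi,\\
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} (x + \pi)\cos nx\,dx = 0,\\
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} (x + \pi)\sin nx\,dx = \frac{2}{n}(-1)^{n+1}.
\end{aligned}
\]

Therefore $f(x) = \pi + \sum_{n=1}^{\infty} \frac{2}{n}(-1)^{n+1}\sin nx$.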
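(b) The Fourier coefficients for f(x) = e^{−8x} are

\[
\begin{aligned}
a_0 &= \frac{1}{2}\int_{-4}^{4} e^{-8x}\,dx = \frac{1}{16}\,(e^{32} - e^{-32}),\\
a_n &= \frac{1}{2}\int_{-4}^{4} e^{-8x}\cos\frac{n\pi x}{4}\,dx = \frac{64(-1)^n}{1024 + n^2\pi^2}\,(e^{32} - e^{-32}),\\
b_n &= \frac{1}{2}\int_{-4}^{4} e^{-8x}\sin\frac{n\pi x}{4}\,dx = \frac{2n\pi(-1)^n}{1024 + n^2\pi^2}\,(e^{32} - e^{-32}).
\end{aligned}
\]

The Fourier series is

\[
\frac{1}{32}\,(e^{32} - e^{-32}) + (e^{32} - e^{-32})\sum_{n=1}^{\infty}\Bigl[\frac{64(-1)^n}{1024 + n^2\pi^2}\cos\frac{n\pi x}{4} + \frac{2n\pi(-1)^n}{1024 + n^2\pi^2}\sin\frac{n\pi x}{4}\Bigr].
\]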

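(c) The Fourier coefficients for

\[
f(x) = \begin{cases} 0, & -\pi < x < 0,\\ x^2, & 0 \le x < \pi \end{cases}
\]

are

\[
a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_0^{\pi} x^2\,dx = \frac{1}{3}\pi^2,
\]
\[
\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \frac{1}{\pi}\int_0^{\pi} x^2\cos nx\,dx\\
&= \frac{1}{\pi}\,\frac{x^2}{n}\sin nx\,\Big|_0^{\pi} - \frac{2}{n\pi}\int_0^{\pi} x\sin nx\,dx\\
&= -\frac{2}{n\pi}\Bigl[-\frac{x}{n}\cos nx\,\Big|_0^{\pi} + \frac{1}{n}\int_0^{\pi}\cos nx\,dx\Bigr] = \frac{2(-1)^n}{n^2},
\end{aligned}
\]
\[
\begin{aligned}
b_n &= \frac{1}{\pi}\int_0^{\pi} x^2\sin nx\,dx = \frac{1}{\pi}\Bigl[-\frac{x^2}{n}\cos nx\Bigr]_0^{\pi} + \frac{2}{n\pi}\int_0^{\pi} x\cos nx\,dx\\
&= \frac{\pi}{n}(-1)^{n+1} + \frac{2}{n^3\pi}\bigl[(-1)^n - 1\bigr].
\end{aligned}
\]

As Fourier series we obtain

\[
f(x) = \frac{\pi^2}{6} + \sum_{n=1}^{\infty}\Bigl\{\frac{2(-1)^n}{n^2}\cos nx + \Bigl[\frac{\pi}{n}(-1)^{n+1} + \frac{2}{n^3\pi}\bigl((-1)^n - 1\bigr)\Bigr]\sin nx\Bigr\}.
\]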

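10.5.4 (a) The function f(x) = x + π, −π < x < π, is continuous at x = π/2, so

\[
\frac{3\pi}{2} = f\Bigl(\frac{\pi}{2}\Bigr) = \pi + \sum_{n=1}^{\infty}\frac{2}{n}(-1)^{n+1}\sin\frac{n\pi}{2}
= \pi + 2\Bigl(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\Bigr),
\]

so

\[
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots
\]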
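The convergence claimed in 10.5.4 (a) can be observed numerically. The following sketch evaluates partial sums of the Fourier series of f(x) = x + π at x = π/2 and of the resulting alternating series; the function names and the numbers of terms are illustrative assumptions.

```python
import math

# Partial sums for Exercise 10.5.4 (a); names and term counts are illustrative.

def fourier_partial_sum(x, n_terms):
    """pi + sum_{n=1}^{N} (2/n) (-1)^(n+1) sin(n x), the series of f(x) = x + pi."""
    total = math.pi
    for n in range(1, n_terms + 1):
        total += (2.0 / n) * (-1) ** (n + 1) * math.sin(n * x)
    return total


def leibniz_partial_sum(n_terms):
    """1 - 1/3 + 1/5 - 1/7 + ... truncated after n_terms terms."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


series_value = fourier_partial_sum(math.pi / 2, 2001)
leibniz_value = leibniz_partial_sum(100000)
print(series_value, 1.5 * math.pi)
print(4 * leibniz_value, math.pi)
```

The convergence is slow (the error after N terms of the alternating series is of order 1/N), which is typical for Fourier series of functions whose periodic extension has jumps.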

Fig. E.25 a Third partial sum of the Fourier series of f (x) = x 2 on [−2, 2], b Sixth partial sum
of the Fourier series of f (x) = x 2 on [−2, 2]

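(b) The function f defined in 10.5.3 (c) is discontinuous at x = π. The Fourier series converges to ½[f(π − 0) + f(−π + 0)] = π²/2 at x = π. That is, with b_n denoting the sine coefficients from 10.5.3 (c),

\[
\begin{aligned}
\frac{\pi^2}{2} &= \frac{\pi^2}{6} + \sum_{n=1}^{\infty}\Bigl[\frac{2(-1)^n}{n^2}\cos n\pi + b_n\sin n\pi\Bigr]\\
&= \frac{\pi^2}{6} + \sum_{n=1}^{\infty}\frac{2(-1)^n}{n^2}(-1)^n = \frac{\pi^2}{6} + \sum_{n=1}^{\infty}\frac{2}{n^2}\\
&= \frac{\pi^2}{6} + 2\Bigl(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\Bigr),
\end{aligned}
\]

therefore

\[
\frac{\pi^2}{6} = \frac{1}{2}\Bigl(\frac{\pi^2}{2} - \frac{\pi^2}{6}\Bigr) = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots.
\]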

10.5.5 See Fig. E.25.


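10.5.6 (a) The complex Fourier series of f is

\[
\sum_{n=-\infty}^{\infty}\frac{2n\pi i\,[\cos(1) - 1] + \sin 1}{1 - 4n^2\pi^2}\,e^{2n\pi i x},
\]

it converges to

\[
g(x) = \begin{cases} \dfrac{1 + \cos(1)}{2}, & x = 0 \text{ or } x = 1,\\[1ex] \cos x, & 0 < x < 1. \end{cases}
\]

(b) The complex Fourier series of f is

\[
\frac{3}{4} - \frac{1}{2\pi}\sum_{n=-\infty,\,n\neq 0}^{\infty}\frac{1}{n}\Bigl[\sin\frac{n\pi}{2} + i\Bigl(\cos\frac{n\pi}{2} - 1\Bigr)\Bigr]e^{n\pi i x/2},
\]

it converges to

\[
g(x) = \begin{cases} \tfrac{1}{2}, & x = 0 \text{ or } x = 1 \text{ or } x = 4,\\ 0, & 0 < x < 1,\\ 1, & 1 < x < 4. \end{cases}
\]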

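10.5.7 (a) The function f(x) = x³ is an odd function, so we expand it in a sine series on the interval −π < x < π. We get

\[
\begin{aligned}
b_n &= \frac{2}{\pi}\int_0^{\pi} x^3\sin nx\,dx = \frac{2}{\pi}\Bigl[-\frac{x^3}{n}\cos nx\,\Big|_0^{\pi} + \frac{3}{n}\int_0^{\pi} x^2\cos nx\,dx\Bigr]\\
&= \frac{2\pi^2}{n}(-1)^{n+1} - \frac{12}{n^2\pi}\int_0^{\pi} x\sin nx\,dx\\
&= \frac{2\pi^2}{n}(-1)^{n+1} - \frac{12}{n^2\pi}\Bigl[-\frac{x}{n}\cos nx\,\Big|_0^{\pi} + \frac{1}{n}\int_0^{\pi}\cos nx\,dx\Bigr]\\
&= \frac{2\pi^2}{n}(-1)^{n+1} + \frac{12}{n^3}(-1)^n.
\end{aligned}
\]

Thus

\[
f(x) = \sum_{n=1}^{\infty}\Bigl[\frac{2\pi^2}{n}(-1)^{n+1} + \frac{12}{n^3}(-1)^n\Bigr]\sin nx.
\]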
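The sine coefficients of x³ can be double-checked by numerical quadrature. The sketch below compares Simpson's rule for (2/π)∫₀^π x³ sin nx dx with the closed form (2π²/n)(−1)^{n+1} + (12/n³)(−1)^n obtained by carrying out the two integrations by parts; the helper names and the number of subintervals are illustrative assumptions.

```python
import math

# Numerical check of the sine coefficients of f(x) = x^3 on (-pi, pi).
# Helper names and the subinterval count are illustrative assumptions.

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def b_numeric(n):
    """(2/pi) * integral of x^3 sin(n x) over [0, pi], by quadrature."""
    return (2.0 / math.pi) * simpson(lambda x: x ** 3 * math.sin(n * x), 0.0, math.pi)


def b_closed(n):
    """Closed form (2 pi^2 / n)(-1)^(n+1) + (12 / n^3)(-1)^n."""
    return (2.0 * math.pi ** 2 / n) * (-1) ** (n + 1) + (12.0 / n ** 3) * (-1) ** n


for n in range(1, 6):
    print(n, b_numeric(n), b_closed(n))
```

For n = 1 both values are 2π² − 12 ≈ 7.739.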

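(b) The function

\[
f(x) = \begin{cases} x - 1, & -\pi < x < 0,\\ x + 1, & 0 \le x < \pi \end{cases}
\]

is an odd function. We expand f in a sine series,

\[
b_n = \frac{2}{\pi}\int_0^{\pi} (x + 1)\sin nx\,dx = \frac{2(\pi + 1)}{n\pi}(-1)^{n+1} + \frac{2}{n\pi}.
\]

Therefore

\[
f(x) = \sum_{n=1}^{\infty}\Bigl[\frac{2(\pi + 1)}{n\pi}(-1)^{n+1} + \frac{2}{n\pi}\Bigr]\sin nx.
\]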

10.5.8 Using $\sin x = \dfrac{e^{ix} - e^{-ix}}{2i}$, we have

\[
\begin{aligned}
c_n &= \frac{1}{\pi}\int_0^{\pi} f(x)\,e^{-2inx/\pi}\,dx
 = \frac{1}{\pi}\int_0^{\pi} (\sin x)\,e^{-2inx/\pi}\,dx\\
&= \frac{1}{2\pi i}\int_0^{\pi} (e^{ix} - e^{-ix})\,e^{-2inx/\pi}\,dx
 = \frac{1}{2\pi i}\int_0^{\pi} \Bigl[e^{(1 - \frac{2n}{\pi})ix} - e^{-(1 + \frac{2n}{\pi})ix}\Bigr]\,dx\\
&= \frac{1}{2\pi i}\Bigl[\frac{1}{i\,(1 - \frac{2n}{\pi})}\,e^{(1 - \frac{2n}{\pi})ix} + \frac{1}{i\,(1 + \frac{2n}{\pi})}\,e^{-(1 + \frac{2n}{\pi})ix}\Bigr]_0^{\pi}\\
&= \frac{\pi\,(1 + e^{-2in})}{\pi^2 - 4n^2}.
\end{aligned}
\]

The fundamental period is T = π so ω = 2π/π = 2 and the values of nω are 0, ±2, ±4, ±6, . . .. The values of |c_n| for n = 0, ±1, ±2, ±3, ±4, ±5 are shown in the table.

n      −5      −4      −3      −2      −1      0       1       2       3       4       5
|c_n|  0.0198  0.0759  0.2380  0.4265  0.5784  0.6366  0.5784  0.4265  0.2380  0.0759  0.0198

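10.5.9 (a) The Fourier series for f(x) is

\[
f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos\frac{n\pi x}{l} + \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{l},
\]

where

\[
a_n = \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx,\qquad
b_n = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx.
\]

The Nth partial sum of the series is

\[
S_N(x) = \frac{a_0}{2} + \sum_{n=1}^{N} a_n\cos\frac{n\pi x}{l} + \sum_{n=1}^{N} b_n\sin\frac{n\pi x}{l}.
\]

We compute

\[
\begin{aligned}
0 &\le \int_{-l}^{l} \bigl(f(x) - S_N(x)\bigr)^2\,dx
 = \int_{-l}^{l} f^2(x)\,dx - 2\int_{-l}^{l} f(x)S_N(x)\,dx + \int_{-l}^{l} S_N^2(x)\,dx\\
&= \int_{-l}^{l} f^2(x)\,dx - 2\int_{-l}^{l} f(x)\Bigl(\frac{a_0}{2} + \sum_{n=1}^{N} a_n\cos\frac{n\pi x}{l} + \sum_{n=1}^{N} b_n\sin\frac{n\pi x}{l}\Bigr)dx\\
&\quad + \int_{-l}^{l}\Bigl(\frac{a_0}{2} + \sum_{n=1}^{N} a_n\cos\frac{n\pi x}{l} + \sum_{n=1}^{N} b_n\sin\frac{n\pi x}{l}\Bigr)\Bigl(\frac{a_0}{2} + \sum_{m=1}^{N} a_m\cos\frac{m\pi x}{l} + \sum_{m=1}^{N} b_m\sin\frac{m\pi x}{l}\Bigr)dx\\
&= \int_{-l}^{l} f^2(x)\,dx - a_0\int_{-l}^{l} f(x)\,dx - 2\sum_{n=1}^{N} a_n\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx - 2\sum_{n=1}^{N} b_n\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx\\
&\quad + \frac{a_0^2}{4}\int_{-l}^{l} dx + 2\,\frac{a_0}{2}\sum_{n=1}^{N} a_n\int_{-l}^{l}\cos\frac{n\pi x}{l}\,dx + 2\,\frac{a_0}{2}\sum_{n=1}^{N} b_n\int_{-l}^{l}\sin\frac{n\pi x}{l}\,dx\\
&\quad + \sum_{n=1}^{N}\sum_{m=1}^{N} a_m b_n\int_{-l}^{l}\cos\frac{m\pi x}{l}\sin\frac{n\pi x}{l}\,dx + \sum_{n=1}^{N}\sum_{m=1}^{N} a_n b_m\int_{-l}^{l}\cos\frac{n\pi x}{l}\sin\frac{m\pi x}{l}\,dx\\
&\quad + \sum_{n=1}^{N}\sum_{m=1}^{N} a_m a_n\int_{-l}^{l}\cos\frac{m\pi x}{l}\cos\frac{n\pi x}{l}\,dx + \sum_{n=1}^{N}\sum_{m=1}^{N} b_m b_n\int_{-l}^{l}\sin\frac{m\pi x}{l}\sin\frac{n\pi x}{l}\,dx\\
&= \int_{-l}^{l} f^2(x)\,dx - a_0^2\,l + \frac{a_0^2}{4}\,2l + \sum_{n=1}^{N}\bigl(-2l\,a_n^2 - 2l\,b_n^2 + a_n^2\,l + b_n^2\,l\bigr),
\end{aligned}
\]

where the last equality uses the definitions of a_n and b_n together with the orthogonality relations

\[
\int_{-l}^{l}\cos\frac{n\pi x}{l}\,dx = \int_{-l}^{l}\sin\frac{n\pi x}{l}\,dx = \int_{-l}^{l}\cos\frac{m\pi x}{l}\sin\frac{n\pi x}{l}\,dx = 0
\]

and

\[
\int_{-l}^{l}\cos\frac{n\pi x}{l}\cos\frac{m\pi x}{l}\,dx = \int_{-l}^{l}\sin\frac{n\pi x}{l}\sin\frac{m\pi x}{l}\,dx = \begin{cases} 0, & n \neq m,\\ l, & n = m. \end{cases}
\]

We now conclude that

\[
0 \le \int_{-l}^{l} f^2(x)\,dx - a_0^2\,l + \frac{a_0^2}{2}\,l - 2l\sum_{n=1}^{N} a_n^2 + l\sum_{n=1}^{N} a_n^2 - 2l\sum_{n=1}^{N} b_n^2 + l\sum_{n=1}^{N} b_n^2,
\]

so

\[
\frac{1}{2}a_0^2 + \sum_{n=1}^{N}(a_n^2 + b_n^2) \le \frac{1}{l}\int_{-l}^{l} f^2(x)\,dx.
\]

Letting N → ∞ we finally arrive at

\[
\frac{1}{2}a_0^2 + \sum_{n=1}^{\infty}(a_n^2 + b_n^2) \le \frac{1}{l}\int_{-l}^{l} f^2(x)\,dx.
\]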

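(b) Under the given conditions the series

\[
\frac{a_0}{2} + \sum_{n=1}^{\infty}\Bigl(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\Bigr)
\]

converges uniformly to f(x). Now a uniformly convergent series of continuous functions can be integrated term by term, so, after multiplying by f(x),

\[
\int_{-l}^{l} [f(x)]^2\,dx = \frac{a_0}{2}\int_{-l}^{l} f(x)\,dx + \sum_{n=1}^{\infty}\Bigl[a_n\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx + b_n\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx\Bigr].
\]

Since

\[
a_n = \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx,\qquad
b_n = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx,
\]

we get

\[
\int_{-l}^{l} |f(x)|^2\,dx = l\,\Bigl[\frac{a_0^2}{2} + \sum_{n=1}^{\infty}(a_n^2 + b_n^2)\Bigr].
\]

(c) By (b), $\sum_{n=1}^{\infty}(a_n^2 + b_n^2)$ is convergent. Due to the properties of convergent series, a_n² + b_n² → 0 and therefore a_n → 0 and b_n → 0.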
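10.5.10 (a) The Fourier transform of f(t) = te^{−|t|} is

\[
\hat f(\xi) = \int_{-\infty}^{0} t\,e^{t}\,e^{-i\xi t}\,dt + \int_0^{\infty} t\,e^{-t}\,e^{-i\xi t}\,dt
= -\frac{1}{(1 - i\xi)^2} + \frac{1}{(1 + i\xi)^2} = -\frac{4i\xi}{(\xi^2 + 1)^2}.
\]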
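The sign of this transform is easy to get wrong, so a numerical check is useful. The sketch below assumes the convention f̂(ξ) = ∫ f(t) e^{−iξt} dt used in this solution and approximates the integral by a midpoint rule on a truncated interval; the truncation length, step count and function names are illustrative assumptions.

```python
import cmath
import math

# Numerical check of the Fourier transform of f(t) = t * exp(-|t|),
# with the convention fhat(xi) = integral of f(t) exp(-i xi t) dt.
# Truncation length, step count and names are illustrative assumptions.

def f(t):
    return t * math.exp(-abs(t))


def fourier_transform(xi, half_width=40.0, n=80000):
    """Midpoint rule on [-half_width, half_width]; the tails are negligible."""
    h = 2.0 * half_width / n
    total = 0j
    for k in range(n):
        t = -half_width + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * xi * t) * h
    return total


def closed_form(xi):
    """-4 i xi / (xi^2 + 1)^2."""
    return -4j * xi / (xi * xi + 1.0) ** 2


for xi in (0.5, 1.0, 2.0):
    print(xi, fourier_transform(xi), closed_form(xi))
```

At ξ = 1 the closed form gives −i, so the imaginary part of the transform is negative for positive ξ.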

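(b) The Fourier transform is

\[
\hat f(\xi) = \int_{-5}^{5} \sin(\pi t)\,e^{-i\xi t}\,dt = i\,\Bigl[\frac{\sin 5(\xi + \pi)}{\xi + \pi} - \frac{\sin 5(\xi - \pi)}{\xi - \pi}\Bigr].
\]

(c) The Fourier transform is

\[
\begin{aligned}
\hat f(\xi) &= -\int_{-k}^{0} e^{-i\xi t}\,dt + \int_0^{k} e^{-i\xi t}\,dt = \int_{k}^{0} e^{i\xi t}\,dt + \int_0^{k} e^{-i\xi t}\,dt\\
&= -2i\int_0^{k}\frac{e^{i\xi t} - e^{-i\xi t}}{2i}\,dt = -2i\int_0^{k}\sin(\xi t)\,dt\\
&= \frac{2i}{\xi}\,[\cos(\xi k) - 1].
\end{aligned}
\]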

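10.5.11 We have

\[
\hat g(\xi) = \int_{-\infty}^{\infty} e^{i\xi_0 t} f(t)\,e^{-i\xi t}\,dt = \int_{-\infty}^{\infty} f(t)\,e^{-i(\xi - \xi_0)t}\,dt = \hat f(\xi - \xi_0).
\]

10.5.12 (a) We get

\[
\hat g(\xi) = \int_{-\infty}^{\infty} e^{-i\xi t} f(-t)\,dt = \int_{-\infty}^{\infty} e^{-i\xi(-\tau)} f(\tau)\,d\tau = \int_{-\infty}^{\infty} e^{-i(-\xi)\tau} f(\tau)\,d\tau = \hat f(-\xi).
\]

(b) We start from

\[
g(\xi) = \hat f(\xi) = \int_{-\infty}^{\infty} f(t)\,e^{-i\xi t}\,dt.
\]

We interchange ξ and t and take the Fourier transform

\[
\hat g(\xi) = \int_{-\infty}^{\infty} e^{-i\xi t}\,\hat f(t)\,dt = \int_{-\infty}^{\infty} e^{it(-\xi)}\,\hat f(t)\,dt = 2\pi f(-\xi),
\]

by the formula for the inverse Fourier transform.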


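10.5.15 We find $\hat f$ for $f = \chi_{[-\frac12, \frac12]}$.

\[
\begin{aligned}
\hat f(\xi) &= \int_{-\infty}^{\infty} e^{-i\xi t} f(t)\,dt = \int_{-1/2}^{1/2} e^{-i\xi t}\,dt = \frac{e^{-i\xi t}}{-i\xi}\,\Big|_{t=-1/2}^{t=1/2}\\
&= -\frac{1}{i\xi}\bigl[e^{-\frac12\xi i} - e^{\frac12\xi i}\bigr] = \frac{2}{\xi}\cdot\frac{e^{\frac{\xi i}{2}} - e^{-\frac{\xi i}{2}}}{2i} = \frac{2}{\xi}\sin\frac{\xi}{2}.
\end{aligned}
\]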

10.5.16 We want to prove that f ∗ g is a continuous function on R if f and g are square integrable on R. Let x, y ∈ R. Then

\[
\begin{aligned}
|(f * g)(x) - (f * g)(y)| &= \Bigl|\int_{\mathbb{R}} f(t)g(x - t)\,dt - \int_{\mathbb{R}} f(t)g(y - t)\,dt\Bigr|\\
&\le \int_{\mathbb{R}} |f(t)|\,|g(x - t) - g(y - t)|\,dt \qquad\text{(E.1)}
\end{aligned}
\]

and, using the Cauchy–Schwarz inequality for functions,

\[
\begin{aligned}
&\le \Bigl(\int_{\mathbb{R}} |f(t)|^2\,dt\Bigr)^{1/2}\Bigl(\int_{\mathbb{R}} |g(x - t) - g(y - t)|^2\,dt\Bigr)^{1/2}\\
&= \Bigl(\int_{\mathbb{R}} |f(t)|^2\,dt\Bigr)^{1/2}\Bigl(\int_{\mathbb{R}} |g(t - (x - y)) - g(t)|^2\,dt\Bigr)^{1/2}. \qquad (*)
\end{aligned}
\]

By a result stated below,

\[
\lim_{x\to y}\Bigl(\int_{\mathbb{R}} |g(t - (x - y)) - g(t)|^2\,dt\Bigr)^{1/2} = 0.
\]

Therefore, taking the limit y → x in (*) we see that f ∗ g is continuous in x, and thus on R, since x is arbitrary.
The result on the continuity of the translation needed here is the following: Suppose that f is square integrable on R. Then

\[
\lim_{t\to 0}\int_{\mathbb{R}} |f(x) - f(x - t)|^2\,dx = 0.
\]

Solutions to Exercises of Chap. 11

11.11.1 (a) Use separation of variables.

\[
\frac{dy}{y} = 0.03\,dt,\qquad \ln y = 0.03t + \ln c,\qquad \frac{y}{c} = e^{0.03t},\qquad\text{so } y(t) = c\,e^{0.03t}.
\]

(b) y(t) = ce^{kt}, (c) y(t) = (1/5) cos 5t + c.


11.11.2 (a) Let A be the account balance in rupees as a function of t. The interest is being added continuously to the account at a rate of 5% of the balance at that moment, so the rate at which the balance is increasing equals 5% of the current balance. The following differential equation describes the process:

\[
\frac{dA}{dt} = 0.05A.
\]

Note that the initial deposit does not enter the differential equation, as it does not affect the process.
(b) The initial value of A at time 0 is A(0) = 10000. Thus the solution becomes A(t) = A(0)e^{0.05t} = 10000e^{0.05t}.
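11.11.3 (a)
\[
x^2\,dy + y^2\,dx = 0,\qquad \frac{dy}{y^2} = -\frac{dx}{x^2},\qquad -\frac{1}{y} = \frac{1}{x} + c.
\]

Solving for y yields

\[
y(x) = -\frac{x}{1 + cx}.
\]

(b)
\[
dx - x^2\,dy = 0,\qquad dy = \frac{dx}{x^2},\qquad y = -\frac{1}{x} + c.
\]

(c)
\[
\frac{dy}{dx} = -5xy,\qquad \frac{dy}{y} = -5x\,dx,\qquad \ln y = -\frac{5}{2}x^2 + \ln c,\qquad \frac{y}{c} = e^{-\frac{5}{2}x^2}.
\]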

11.11.4 (a) b(x) = 3x²/x³ = 3/x, f(x) = (cos x)/x³. We compute the integrating factor I as

\[
B(x) = 3\ln x,\qquad I(x) = e^{B(x)} = e^{3\ln x} = x^3,
\]
\[
f(x)I(x) = \frac{\cos x}{x^3}\cdot x^3 = \cos x,\qquad Q(x) = \sin x.
\]

The solution y satisfies

\[
y(x)\cdot x^3 = Q(x) + c = \sin x + c,\qquad y(x) = \frac{\sin x + c}{x^3}.
\]

(b) b(x) = 1/x, f(x) = 1/x³, B(x) = ln x, I(x) = e^{ln x} = x, f(x)I(x) = 1/x², Q(x) = −1/x. The solution y satisfies

\[
y(x)I(x) = y(x)\cdot x = Q(x) + c = -\frac{1}{x} + c,\qquad y(x) = -\frac{1}{x^2} + \frac{c}{x}.
\]

11.11.5 b(x) = 5, B(x) = 5x, I(x) = e^{5x}, f(x) = 20, f(x)I(x) = 20e^{5x}, Q(x) = 4e^{5x}. The solution y satisfies

\[
y(x)I(x) = y(x)\,e^{5x} = Q(x) + c = 4e^{5x} + c,\qquad y(x) = 4 + c\,e^{-5x},
\]

where c is to be determined from the initial condition y(0) = 2. This gives c = −2, so

\[
y(x) = 4 - 2e^{-5x}.
\]

11.11.6 From the given differential equation we get

\[
\frac{dy}{dx} - \frac{2}{x}\,y = \frac{3}{x^2}\,y^4.
\]

This is a Bernoulli equation with n = 4, b(x) = −2/x and f(x) = 3/x². Substituting v = y^{−3} we get

\[
\frac{dv}{dx} + \frac{6}{x}\,v = -\frac{9}{x^2}.
\]

This is a linear equation for v with b(x) = 6/x. We get B(x) = 6 ln x and the integrating factor I(x) = e^{6 ln x} = x⁶. The solution v satisfies

\[
x^6 v(x) = -\frac{9}{5}x^5 + c,\qquad v(x) = -\frac{9}{5}x^{-1} + c\,x^{-6}.
\]

As v = y^{−3}, the initial condition y(1) = 1/2 gives c = 49/5, so

\[
y(x)^{-3} = -\frac{9}{5}x^{-1} + \frac{49}{5}x^{-6}.
\]
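A quick way to test the result of 11.11.6 is to substitute it back into the differential equation. The sketch below does this with a central difference quotient; the sample points and the step size are illustrative assumptions, and the formula is only evaluated where the right-hand side of y⁻³ = −(9/5)x⁻¹ + (49/5)x⁻⁶ is positive.

```python
# Substitution check for Exercise 11.11.6: y' - (2/x) y = (3/x^2) y^4, y(1) = 1/2.
# Sample points and step size are illustrative assumptions.

def y(x):
    """Solution defined by y^(-3) = -(9/5)/x + (49/5)/x^6 (valid while this is positive)."""
    v = -(9.0 / 5.0) / x + (49.0 / 5.0) / x ** 6
    return v ** (-1.0 / 3.0)


def residual(x, h=1e-5):
    """Left-hand side minus right-hand side, with y' from a central difference."""
    dy = (y(x + h) - y(x - h)) / (2.0 * h)
    return dy - (2.0 / x) * y(x) - (3.0 / x ** 2) * y(x) ** 4


print(y(1.0))
for x in (0.8, 1.0, 1.2):
    print(x, residual(x))
```

The residuals should be zero up to discretization and rounding error, and y(1) should equal 1/2.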
11.11.7 We apply the separation of variables method and compute

\[
\frac{dy}{dx} = xy^2,\qquad \frac{dy}{y^2} = x\,dx,\qquad -\frac{1}{y} = \frac{x^2}{2} + \frac{c}{2},\qquad y(x) = -\frac{2}{x^2 + c}.
\]

Inserting the initial condition y(1) = −2/3 yields c = 2.


11.11.8 The model is dN/dt = kN. It has the solution N(t) = ce^{kt}. Choosing for t = 0 the year 1970, we get c = N(0) = 125000. We compute k from

\[
140000 = N(20) = 125000\,e^{20k},\qquad 20k = \ln\Bigl(\frac{140}{125}\Bigr),\qquad k = \frac{1}{20}\ln\Bigl(\frac{140}{125}\Bigr),
\]

so k ≈ 0.00567. This yields the estimate for the population in the year 2020

N(50) = 125000e^{50k} ≈ 166000.
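The arithmetic in 11.11.8 can be reproduced in a few lines. This is a minimal sketch using the numbers from the solution; the variable names are illustrative.

```python
import math

# Exponential growth model N(t) = N(0) * exp(k t) from Exercise 11.11.8.
# Variable names are illustrative; t = 0 corresponds to the year 1970.

n_1970 = 125000.0
n_1990 = 140000.0

k = math.log(n_1990 / n_1970) / 20.0   # growth rate per year
n_2020 = n_1970 * math.exp(50.0 * k)   # population estimate for t = 50

print(k, n_2020)
```

Rounded to the nearest thousand, the estimate for 2020 agrees with the value 166000 quoted above.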

11.11.9 Since the initial dose is y₀, the drug concentration at any time 0 < t < T is found from the equation y = y₀e^{−kt}, the solution of the equation dy/dt = −ky. At t = T the second dose of y₀ is taken, which increases the drug level to y(T) = y₀ + y₀e^{−kT} = y₀(1 + e^{−kT}). For t > T, the drug level immediately begins to decrease. To find its mathematical expression we solve the initial value problem:

\[
\frac{dy}{dt} = -ky,\qquad y(T) = y_0(1 + e^{-kT}).
\]

Solving this initial value problem we get

\[
y = y_0(1 + e^{-kT})\,e^{-k(t - T)}.
\]

This equation gives the drug level for T < t < 2T. The third dose of y₀ is to be taken at t = 2T, and the drug level just before this dose is taken is given by

\[
y = y_0(1 + e^{-kT})\,e^{-k(2T - T)} = y_0(1 + e^{-kT})\,e^{-kT}.
\]

The dosage y₀ taken at t = 2T raises the drug level to

\[
y(2T) = y_0 + y_0(1 + e^{-kT})\,e^{-kT} = y_0(1 + e^{-kT} + e^{-2kT}).
\]

Continuing in this way, we find that, after the (n + 1)th dose is taken, the drug level is

\[
y(nT) = y_0(1 + e^{-kT} + e^{-2kT} + \cdots + e^{-nkT}).
\]

We notice that the drug level after the (n + 1)th dose is the sum of the first (n + 1) terms of a geometric series, with the first term y₀ and the common ratio e^{−kT}. The sum can be written as

\[
y(nT) = \frac{y_0\,(1 - e^{-(n+1)kT})}{1 - e^{-kT}}.
\]

As n becomes large, the drug level at the discrete sequence 0, T, 2T, . . . of times approaches a saturation value y_s given by

\[
y_s = \lim_{n\to\infty} y(nT) = \frac{y_0}{1 - e^{-kT}}.
\]
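The dosing argument of 11.11.9 can be simulated directly: decay over one interval T, then add a dose. The sketch below compares the simulated level with the geometric-sum formula and the saturation value; the parameter values y0 = 100, k = 0.2 and T = 6 are hypothetical and chosen only for illustration.

```python
import math

# Simulation of repeated dosing from Exercise 11.11.9.
# The parameter values used below are hypothetical.

def level_after_doses(y0, k, T, n_doses):
    """Drug level immediately after the n_doses-th dose, by stepwise simulation."""
    level = 0.0
    for dose in range(n_doses):
        if dose > 0:
            level *= math.exp(-k * T)  # exponential decay between two doses
        level += y0                    # the next dose is taken
    return level


def geometric_formula(y0, k, T, n_doses):
    """Closed form y0 (1 - q^n_doses) / (1 - q) with q = exp(-k T)."""
    q = math.exp(-k * T)
    return y0 * (1.0 - q ** n_doses) / (1.0 - q)


def saturation_level(y0, k, T):
    """Limit y_s = y0 / (1 - exp(-k T))."""
    return y0 / (1.0 - math.exp(-k * T))


y0, k, T = 100.0, 0.2, 6.0
for n_doses in (1, 2, 3, 10):
    print(n_doses, level_after_doses(y0, k, T, n_doses), geometric_formula(y0, k, T, n_doses))
print(saturation_level(y0, k, T))
```

Here n_doses corresponds to n + 1 in the notation of the solution.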

11.11.10 (a) The constant solutions belong to those values of H for which dH/dx = 0. As $dH/dx = H\sqrt{4 - 2H}$, this happens when H = 0 or when $0 = \sqrt{4 - 2H}$, that is, H = 2.
(b) Using the separation of variables method, start from

\[
\frac{dH}{H\sqrt{4 - 2H}} = dx.
\]

Integrating both sides we get

\[
-\tanh^{-1}\Bigl(\frac{1}{2}\sqrt{4 - 2H}\Bigr) = x + c.
\]

Using the fact that the hyperbolic tangent is an odd function we have

\[
\frac{1}{2}\sqrt{4 - 2H} = \tanh(-x - c) = -\tanh(x + c),\qquad
\frac{1}{4}(4 - 2H) = \tanh^2(x + c).
\]

Solving for H we finally obtain, using the identity 1 − tanh² x = sech² x,

\[
H(x) = 2\,\mathrm{sech}^2(x + c).
\]

(c) Inserting H(0) = 2 we find that sech²(c) = 1 and c = 0.


11.11.11 Let t be the number of hours after the body was discovered, and T(t) be the temperature (in degrees Celsius) of the body at time t. We want to find the value of t for which T = 37 (normal body temperature). This value of t will, of course, be negative. By Newton’s law of cooling,

\[
\frac{dT}{dt} = k(T - a),
\]

where k is a constant and a (the ambient temperature) is 22. Thus,

\[
\frac{dT}{dt} = k(T - 22).
\]

Separating the variables and integrating we get

\[
\frac{dT}{T - 22} = k\,dt,\qquad \ln|T - 22| = kt + c.
\]

Because T − 22 > 0, we obtain ln(T − 22) = kt + c. The condition T(0) = 31 yields c = ln 9. The condition T(1) = 30 gives ln 8 = k + ln 9, so k = ln(8/9). Thus,

\[
\ln\frac{T - 22}{9} = t\,\ln\frac{8}{9}.
\]

Now we determine t from the condition T(t) = 37:

\[
\ln\frac{37 - 22}{9} = t\,\ln\frac{8}{9},\qquad t = \frac{\ln(15/9)}{\ln(8/9)} \approx -4.34.
\]

Accordingly, the murder occurred about 4.34 h before the body was discovered at 11:00 am. Since 4.34 h is approximately 4 h and 20 min, the time of the murder is estimated as 6:40 am.
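The estimate in 11.11.11 can be recomputed as follows. This is a small sketch using the temperatures and the discovery time 11:00 am from the solution; the variable names are illustrative.

```python
import math

# Newton's law of cooling, Exercise 11.11.11: T(t) - 22 = 9 * exp(k t).
# Variable names are illustrative; times are measured in hours from discovery.

ambient = 22.0
temp_at_discovery = 31.0     # T(0)
temp_one_hour_later = 30.0   # T(1)
normal_body_temp = 37.0

k = math.log((temp_one_hour_later - ambient) / (temp_at_discovery - ambient))
t_death = math.log((normal_body_temp - ambient) / (temp_at_discovery - ambient)) / k

minutes_before = round(-t_death * 60.0)
death_minutes = 11 * 60 - minutes_before   # minutes after midnight
hours, minutes = divmod(death_minutes, 60)

print(t_death, f"{hours}:{minutes:02d} am")
```

With these values t_death is about −4.34, which corresponds to 6:40 am.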
11.11.12 The functions e^x and sin x are linearly independent, as neither of them is a constant multiple of the other.
11.11.13 (a) The auxiliary equation is 4λ² − 10λ + 25 = 0. We have a = 4, b = −10, c = 25, so b² − 4ac = 0. The auxiliary equation has the double root

\[
\lambda_1 = \lambda_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} = -\frac{b}{2a} = \frac{5}{4}.
\]

The general solution of the differential equation is

\[
y(x) = c_1 e^{\frac{5}{4}x} + c_2\,x\,e^{\frac{5}{4}x}.
\]

(b) The auxiliary equation is λ² − 16λ + 64 = 0. We have a = 1, b = −16 and c = 64, so b² − 4ac = 0, and the auxiliary equation has the double root

\[
\lambda_1 = \lambda_2 = -\frac{b}{2a} = 8.
\]

The general solution is

\[
y(x) = c_1 e^{8x} + c_2\,x\,e^{8x}.
\]

(c) The auxiliary equation is λ² + 2λ + 2 = 0. We have a = 1, b = 2 and c = 2, so b² − 4ac = −4 < 0. The auxiliary equation has two conjugate complex roots

\[
\lambda_1 = \frac{-2 + \sqrt{4 - 8}}{2} = -1 + i,\qquad \lambda_2 = \frac{-2 - \sqrt{4 - 8}}{2} = -1 - i.
\]

We have α = −1 and β = 1. The general solution of the differential equation is

\[
y(x) = c_1 e^{-x}\cos x + c_2 e^{-x}\sin x = e^{-x}(c_1\cos x + c_2\sin x).
\]

11.11.14 The auxiliary equation is 4λ² − 4λ − 3 = 0. We have a = 4, b = −4 and c = −3, so b² − 4ac = 16 + 48 = 64 > 0. We have two distinct real roots

\[
\lambda_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} = \frac{4 + 8}{8} = \frac{3}{2},\qquad
\lambda_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a} = \frac{4 - 8}{8} = -\frac{1}{2}.
\]

The general solution is

\[
y(x) = c_1 e^{-\frac{1}{2}x} + c_2 e^{\frac{3}{2}x}
\]

with the derivative

\[
y'(x) = -\frac{1}{2}c_1 e^{-\frac{1}{2}x} + \frac{3}{2}c_2 e^{\frac{3}{2}x}.
\]

Inserting the initial condition gives

\[
1 = y(0) = c_1 + c_2,\qquad 5 = y'(0) = -\frac{1}{2}c_1 + \frac{3}{2}c_2.
\]

From the equations we get c₁ = −7/4 and c₂ = 11/4, so the solution is

\[
y(x) = -\frac{7}{4}e^{-\frac{1}{2}x} + \frac{11}{4}e^{\frac{3}{2}x}.
\]
11.11.15 We have

\[
\frac{\partial u}{\partial x} = \frac{1}{x^2 + y^2}\,2x,\qquad
\frac{\partial^2 u}{\partial x^2} = \frac{2(x^2 + y^2) - 2x\cdot 2x}{(x^2 + y^2)^2} = \frac{2y^2 - 2x^2}{(x^2 + y^2)^2},
\]

and in the same manner

\[
\frac{\partial^2 u}{\partial y^2} = \frac{2x^2 - 2y^2}{(x^2 + y^2)^2}.
\]

Thus Δu(x, y) = 0, that is, u(x, y) = ln(x² + y²) solves the Laplace equation.
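11.11.16 For u(x, t) = sin x cos t we get

\[
\frac{\partial u}{\partial x} = \cos x\cos t,\qquad \frac{\partial^2 u}{\partial x^2} = -\sin x\cos t,
\]
\[
\frac{\partial u}{\partial t} = -\sin x\sin t,\qquad \frac{\partial^2 u}{\partial t^2} = -\sin x\cos t.
\]

Thus u(x, t) = sin x cos t satisfies the wave equation.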


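11.11.17 For $u(x, t) = t^{-1/2}e^{-x^2/t}$ we get

\[
\frac{\partial u}{\partial t} = -\frac{1}{2}t^{-3/2}e^{-x^2/t} + t^{-5/2}x^2e^{-x^2/t},
\]
\[
\frac{\partial u}{\partial x} = -2x\,t^{-3/2}e^{-x^2/t},\qquad
\frac{\partial^2 u}{\partial x^2} = -2t^{-3/2}e^{-x^2/t} + 4x^2t^{-5/2}e^{-x^2/t},
\]

so

\[
\frac{\partial u}{\partial t} = \frac{1}{4}\frac{\partial^2 u}{\partial x^2}.
\]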

11.11.18 The equation (∂ 2 /∂ x 2 )u = 0 means that the function ∂u/∂ x does not
depend on x. Therefore,

∂u
(x, y) = A(y),
∂x
where A is an arbitrary function. Integrating both sides of this equation
with respect to x while keeping y fixed we obtain

u(x, y) = A(y)x + B(y),

where A and B are arbitrary functions of y.


11.11.19 We look for solutions in the form u(x, t) = U(x)T(t). Inserting u into the differential equation gives U(x)T″(t) = c²U″(x)T(t). Separation of variables yields that T″/T and U″/U are constant and satisfy

\[
U''(x) + \lambda U(x) = 0,\qquad T''(t) + \lambda c^2 T(t) = 0
\]

for some constant λ. The functions U_n(x) = sin(nπx/ℓ) solve the first equation and satisfy U_n(0) = U_n(ℓ) = 0. The functions T_n(t) = sin(nπct/ℓ) solve the second equation and satisfy T_n(0) = 0. So the functions u_n(x, t) = U_n(x)T_n(t) solve the given problem except possibly for the condition (∂u/∂t)(x, 0) = h(x). We have

\[
T_n'(t) = \frac{n\pi c}{\ell}\cos\frac{n\pi ct}{\ell}
\]

and thus

\[
\frac{\partial}{\partial t}u_n(x, 0) = U_n(x)\,T_n'(0) = \frac{n\pi c}{\ell}\sin\frac{n\pi x}{\ell}.
\]

Let the odd extension of h possess the Fourier expansion

\[
h(x) = \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{\ell},\qquad b_n = \frac{2}{\ell}\int_0^{\ell} h(x)\sin\frac{n\pi x}{\ell}\,dx.
\]

We set

\[
u(x, t) = \sum_{n=1}^{\infty} d_n u_n(x, t).
\]

In order to satisfy the condition (∂u/∂t)(x, 0) = h(x), we choose

\[
d_n = \Bigl(\frac{n\pi c}{\ell}\Bigr)^{-1} b_n = \frac{2}{n\pi c}\int_0^{\ell} h(x)\sin\frac{n\pi x}{\ell}\,dx.
\]

Then

\[
\frac{\partial}{\partial t}u(x, 0) = \sum_{n=1}^{\infty} d_n\frac{\partial}{\partial t}u_n(x, 0) = \sum_{n=1}^{\infty} d_n\,\frac{n\pi c}{\ell}\sin\frac{n\pi x}{\ell} = \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{\ell} = h(x).
\]

As u also solves all the other equations of the boundary value problem by the superposition principle, it solves the given problem.
11.11.20 Assume the solution to be of the form u(x, t) = X(x)T(t). Substituting u into the given heat equation we get X(x)T′(t) = T(t)X″(x), so T′(t)/T(t) = X″(x)/X(x). Thus, both sides define constant functions of t and x, respectively, so

\[
X''(x) + \lambda X(x) = 0,\qquad T'(t) + \lambda T(t) = 0
\]

holds for some constant λ. The first equation has the general solution $X(x) = a\cos(\sqrt{\lambda}\,x) + b\sin(\sqrt{\lambda}\,x)$. (In order that it can satisfy the boundary conditions X(0) = X(1) = 0 and that it is not identically zero, we must have λ > 0.) The boundary condition 0 = u(0, t) = X(0)T(t) gives a = 0. The boundary condition 0 = u(1, t) = X(1)T(t) gives λ = n²π² where n = 1, 2, . . . . The second equation is solved by T(t) = e^{−λt}. Therefore, the functions

\[
u_n(x, t) = e^{-n^2\pi^2 t}\sin(n\pi x)
\]

solve the given heat equation and the boundary conditions at x = 0 and x = 1. In order to satisfy the initial condition u(x, 0) = x(1 − x), we look for a solution u of the form

\[
u(x, t) = \sum_{n=1}^{\infty} c_n e^{-n^2\pi^2 t}\sin(n\pi x).
\]

Since

\[
u(x, 0) = \sum_{n=1}^{\infty} c_n\sin(n\pi x),
\]

it suffices to choose c_n as the Fourier coefficients of the odd extension of f(x) = x(1 − x). These are given by the numbers

\[
c_n = 2\int_0^1 x(1 - x)\sin(n\pi x)\,dx
\]

which may be explicitly computed by evaluating this integral.
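Carrying out the integration by parts twice gives c_n = 4(1 − (−1)^n)/(nπ)³, a closed form that is not stated in the solution above and is offered here as a worked supplement. The sketch below checks it against Simpson's rule and evaluates a partial sum of the series; the helper names and the number of terms are illustrative assumptions.

```python
import math

# Coefficients and partial sums for the heat problem of Exercise 11.11.20.
# The closed form for c_n is a supplementary computation; names are illustrative.

def initial_profile(x):
    return x * (1.0 - x)


def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def c_numeric(n):
    """c_n = 2 * integral of x(1 - x) sin(n pi x) over [0, 1], by quadrature."""
    return 2.0 * simpson(lambda x: initial_profile(x) * math.sin(n * math.pi * x), 0.0, 1.0)


def c_closed(n):
    """c_n = 4 (1 - (-1)^n) / (n pi)^3, which vanishes for even n."""
    return 4.0 * (1 - (-1) ** n) / (n * math.pi) ** 3


def u(x, t, n_terms=50):
    """Partial sum of the series solution."""
    return sum(
        c_closed(n) * math.exp(-(n * math.pi) ** 2 * t) * math.sin(n * math.pi * x)
        for n in range(1, n_terms + 1)
    )


for n in range(1, 7):
    print(n, c_numeric(n), c_closed(n))
print(u(0.5, 0.0), initial_profile(0.5))
```

At t = 0 the partial sum should reproduce the initial profile, for example u(0.5, 0) ≈ 0.25.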


11.11.22 We solve this problem by separation of variables. Let u(x, y) = X(x)Y(y) and substitute into the given Laplace equation to obtain X″(x)Y(y) + X(x)Y″(y) = 0. This gives X″(x)/X(x) = −Y″(y)/Y(y), so both sides must be equal to a constant function. Therefore, the functions X and Y must satisfy the differential equations

\[
X'' + \lambda X = 0,\qquad Y'' - \lambda Y = 0
\]

for some constant λ. The boundary conditions 0 = u(0, y) = X(0)Y(y) and 0 = u(a, y) = X(a)Y(y) require X(0) = X(a) = 0, since otherwise Y is identically zero, which would result in u being identically zero. The boundary value problem for X,

\[
X'' + \lambda X = 0,\qquad X(0) = X(a) = 0,
\]

has the nontrivial solutions

\[
\lambda_n = \frac{n^2\pi^2}{a^2},\qquad X_n(x) = \sin\frac{n\pi x}{a},\qquad n = 1, 2, \ldots
\]

For any n, the corresponding function Y_n thus has to satisfy

\[
Y'' - \frac{n^2\pi^2}{a^2}\,Y = 0.
\]

This equation has the general solution

\[
Y_n(y) = c_n e^{\frac{n\pi y}{a}} + d_n e^{-\frac{n\pi y}{a}}.
\]

The boundary condition 0 = u(x, 0) = X(x)Y(0) yields as above that we must have Y_n(0) = 0. Therefore, d_n = −c_n, and Y_n is of the form

\[
Y_n(y) = 2c_n\sinh\frac{n\pi y}{a},\qquad n = 1, 2, \ldots
\]

Therefore, the functions

\[
u_n(x, y) = X_n(x)Y_n(y) = \sin\frac{n\pi x}{a}\,\sinh\frac{n\pi y}{a},\qquad n = 1, 2, \ldots
\]

satisfy for n = 1, 2, . . . the Laplace equation in the rectangle and the homogeneous boundary conditions on the lower side and on the vertical sides. To find a solution which satisfies the condition on the upper side y = b, we use a linear superposition of the u_n in form of an infinite series,

\[
u(x, y) = \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{a}\,\sinh\frac{n\pi y}{a}. \qquad\text{(E.2)}
\]

We need to choose b_n such that

\[
x = u(x, b) = \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{a}\,\sinh\frac{n\pi b}{a}.
\]

The rightmost part coincides with the Fourier expansion of the odd extension of f(x) = x on [0, a]. We therefore have

\[
b_n\sinh\frac{n\pi b}{a} = \frac{2}{a}\int_0^{a} x\sin\frac{n\pi x}{a}\,dx
\]

and thus

\[
b_n = \frac{2}{a\,\sinh\frac{n\pi b}{a}}\int_0^{a} x\sin\frac{n\pi x}{a}\,dx.
\]

With this choice of b_n, (E.2) gives the solution of the given boundary value problem (a so-called Dirichlet problem) on the rectangle.
Solutions to Exercises of Chap. 12
12.5.1 (a) >> xminus = -3 : 0.01 : 0;
>> xplus = 0 : 0.01 : 3;
>> y1 = exp(xminus);
>> y2 = (xplus - 1).^2;
>> plot(xminus, y1, xplus, y2) % (Fig. E.26)
(b) >> x = 0 : 0.1 : 10;
>> fx = (x.*sin(x) + exp(-x/5).*cos(x));
>> stem(x, fx) % (Fig. E.27)
(c) >> [x, y] = meshgrid(-2 : 0.4 : 2);
>> u = x;
>> v = (x.^2 + y.^2);
>> quiver(x, y, u, v) % (Fig. E.28)
(d) >> theta = 0 : pi/100 : 10*pi;
>> r = 2*exp(-theta/10);
>> polar(theta, r) % (Fig. E.29)

Fig. E.26 Solution of Exercise 12.5.1(a)

Fig. E.27 Solution of Exercise 12.5.1(b)

Fig. E.28 Solution of Exercise 12.5.1(c)

Fig. E.29 Solution of Exercise 12.5.1(d)

Fig. E.30 Solution of Exercise 12.5.2(a)

12.5.2 (a) >> t = 0 : pi/50 : 10*pi;
>> plot3(t.*sin(t), t.*cos(t), t)
>> grid on
>> axis square % (Fig. E.30)
(b) >> [x, y, z] = meshgrid(-2 : 0.4 : 2, -2 : 0.4 : 2, -1 : 0.4 : 1);
>> u = y;
>> v = x;
>> w = x.^2 + z;
>> quiver3(x, y, z, u, v, w) % (Fig. E.31)
(c) Program for Fig. 8.3 of Chap. 8
>> [x, y] = meshgrid(-2 : 0.1 : 2);
>> c = x.*y;
>> surfc(x, y, c) % this can be omitted for color plot (Fig. E.32)
Program for Fig. 8.6 of Chap. 8
>> [x, y] = meshgrid(-6 : 0.2 : 6);
>> fxy = sin(sqrt((x.^2) + (y.^2)));
>> surfc(x, y, fxy) % this can be omitted for color plot (Fig. E.33)
(d) >> [x, y] = meshgrid(-3 : 0.1 : 3);
>> z = (x.*y.*(x.^2 - y.^2))./(x.^2 + y.^2);
>> mesh(x, y, z) % (Fig. E.34)

Fig. E.31 Solution of Exercise 12.5.2(b)
12.5.3 (a) >> syms k % n is represented by k here
>> sum1 = symsum(((log(k))^2)/k^1.5, 1, 200);
>> double(sum1)

ans =

7.8960
(b) >> syms k % n is represented by k here
>> sum2 = symsum(((-1)^(k + 1))/(k*log(k + 1)), 1, 200);
>> double(sum2)

ans =

1.1360
(c) >> syms k % n is represented by k here
>> s = symsum(1/k^3, 1, 200);
>> double(s)

ans =

1.2020

Fig. E.32 Solution of Exercise 12.5.2(c), first figure

Fig. E.33 Solution of Exercise 12.5.2(c), second figure

Fig. E.34 Solution of Exercise 12.5.2(d)

12.5.4 (a) >> syms x
>> limit(1/(5 + 4*cos(x)), x, 0)

ans =

1/9
(b) >> limit((2*x)/sqrt(x^2 - 1), x, inf)

ans =

2
(c) >> limit((exp(acos(x))./sqrt(1 - x.^2)), x, 0)

ans =

exp(1/2*pi)

>> double(ans)

ans =

4.8105
(d) >> limit(x/sqrt(9 + 4*x^2), x, inf)

ans =

1/2
12.5.5 Numerical Integration:
>> myfunc = inline('exp(x).*sin(x)');
>> x = quad(myfunc, 0, 5)
x =

-91.7081
Symbolic Integration:
>> syms x;
>> fx = exp(x)*sin(x);
>> x = int(fx, 0, 5);
>> x = double(x)
x =

-91.7081

Here, we can see that both methods give the same answer.
12.5.6 (a) >> syms x;
>> ya = diff((4*x^2 - 1)*(7*x^3 + x))

ya =

8*x*(7*x^3 + x) + (4*x^2 - 1)*(21*x^2 + 1)

>> ezplot(ya) % (Fig. E.35)

Fig. E.35 Solution of Exercise 12.5.6(a)

(b) >> syms x;
>> yb = diff((x^2 - 1)/(x^4 + 1))

yb =

2*x/(x^4 + 1) - 4*(x^2 - 1)/(x^4 + 1)^2*x^3 (Fig. E.36)

(c) >> syms x;
>> yc = diff(sin(x)/(1 + cos(x)))

yc =

cos(x)/(1 + cos(x)) + sin(x)^2/(1 + cos(x))^2 (Fig. E.37)
(d) >> syms x;
>> yd = diff(exp(x)*sin(x))

yd =

exp(x)*sin(x) + exp(x)*cos(x) (Fig. E.38)


12.5.7 (a) Function file:
% Save this file with name ‘myode.m’
function xdot = myode(t, x)
xdot = [x(2); − sin (x(1))];
On command prompt, write following:
>> [t, x] = ode45( myode , [0, 20], [1; 0]);
>> plot(t, x) (Fig. E.39)
(b) Function file:
% Save this file with name ‘myode.m’
Solutions of Selected Exercises 637

Fig. E.36 Solution of


Exercise 12.5.6(b)

Fig. E.37 Solution of


Exercise 12.5.6(c)

Fig. E.38 Solution of


Exercise 12.5.6(d)
638 Solutions of Selected Exercises

Fig. E.39 Solution of


Exercise 12.5.7(a)

Fig. E.40 Solution of


Exercise 12.5.7(b)

function xdot = myode(t, x)


xdot = 3* − 4* cos (t);
At the command prompt, enter the following:
>> [t, x] = ode45(@myode, [0, 20], [1; 0]);
>> plot(t, x) (Fig. E.40)
12.5.8 01. % Fourier Analysis of ramp function for Fig. 12.24
02. t0 = -pi; % initial time
03. t0_T = pi; % final time
04. mp = 0; % mid point
05. T = t0_T - t0; % time period
06. syms t; % sym variable declaration
07. ft = t; % function declaration
08. w0 = 2*pi/T; % frequency
09. n = 1:5; % indices of the harmonics
10. % computation of Trigonometric Fourier Series Coefficients
11. a0 = 1/T*(int(ft, -pi, pi));
12. an = 2/T*(int(ft*cos(n*w0*t), -pi, pi))
13. bn = 2/T*(int(ft*sin(n*w0*t), -pi, pi))
14. ann = an.*cos(n*w0*t);
15. bnn = bn.*sin(n*w0*t);
16. avg = double(a0); % converting sym variable to value
17. t = -3*pi : pi/100 : 3*pi; % plotting Fourier series for 3 periods
18. suma = 0; sumb = 0;
19. for j = 1 : 5
20. sumb = sumb + bnn(j);
21. suma = suma + ann(j);
22. end
23. bnsum = eval(sumb);
24. ansum = eval(suma);
25. plot(t, avg+bnsum+ansum) % plot of truncated harmonics function
26. hold on
27. % plotting actual function for 3 periods
28. tx1 = -3*pi : pi/100 : -pi;
29. tx2 = -pi : pi/100 : pi;
30. tx3 = pi : pi/100 : 3*pi;
31. plot(tx1, tx1 + 2*pi, 'r', tx2, tx2, 'r', tx3, tx3 - 2*pi, 'r')
32. % formatting plot
33. xlabel('Time')
34. ylabel('Amplitude')
35. title('Fourier approximation plot for 5 harmonics for ramp function')
36. legend('Fourier Approximation', 'Actual Function')
For the above program, we get the following result (Fig. E.41).

Fig. E.41 Fourier approximation of the ramp function for 5 harmonics

Similarly, we can plot the approximation for 10 and 30 harmonics just by changing
the upper limit in lines 09 and 19. For example, for 30 harmonics, we
have to replace 5 by 30. For 10 and 30 harmonics, see Figs. E.42 and E.43.

Fig. E.42 Fourier approximation of the ramp function for 10 harmonics

Fig. E.43 Fourier approximation of the ramp function for 30 harmonics
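The coefficients computed symbolically in lines 11 to 13 can be cross-checked by hand. As a sketch, for the ramp f(t) = t on (-pi, pi) with T = 2 pi and w0 = 1, the integrands of a_0 and a_n are odd, and b_n follows from one integration by parts:

```latex
\begin{align*}
a_0 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} t\,dt = 0, \qquad
a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} t\cos(nt)\,dt = 0,\\
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} t\sin(nt)\,dt
     = \frac{1}{\pi}\left[-\frac{t\cos(nt)}{n}+\frac{\sin(nt)}{n^{2}}\right]_{-\pi}^{\pi}
     = \frac{2(-1)^{n+1}}{n},\\
f(t) &\approx \sum_{n=1}^{N}\frac{2(-1)^{n+1}}{n}\,\sin(nt),
\qquad N = 5,\ 10,\ 30 .
\end{align*}
```

The displayed values of an and bn should therefore be zeros and 2, -1, 2/3, -1/2, 2/5 for n = 1, ..., 5.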
References

1. Adzievski K, Siddiqi AH (2014) Introduction to partial differential equations for scientists and
engineers using mathematica. CRC Press, Taylor and Francis Group, Boca Raton
2. Anton H (1999) Calculus: a new horizon, 6th edn. Wiley, New York
3. Blanchard P, Robert LP, Glem RH (2012) Ordinary differential equations. Richard Stratton,
New York
4. Boyce WE, Di Prima RC (2010) Elementary differential equations and boundary value prob-
lems, 9th edn. Wiley, New York
5. Brenan JR, Boyce WE (2007) Differential equations: an introduction to modern methods and
applications
6. Brezis A, Browder F (1998) Partial differential equations in the 20th century. Adv Math 135:76–
144
7. Cartwright M (1990) Fourier methods for mathematicians, scientists and engineers. Ellis Hor-
wood series in mathematics applications. Ellis Horwood, New York
8. Chapman SJ (2000) MATLAB programming for engineers, 2nd edn. Thomson [Brooks/Cole],
Pacific Grove
9. Connor KMO (2005) Calculus: labs for MATLAB. Jones and Bartlett Publishers, Burlington
10. Cooper JM (2001) A MATLAB companion for multivariable calculus. Academic, Elsevier,
New York
11. Davis JH (2004) Methods of applied mathematics with MATLAB overview. Birkhäuser, Boston
12. Delvin K (1997) An electronic companion to calculus. Cogito Learning Media, Inc., New York
13. Edwards CH Jr (1979) The historical development of the calculus. Springer, New York
14. Furati KM, Nashed MZ, Siddiqi AH (eds) (2006) Mathematical models and methods for real
world systems. Chapman & Hall/CRC, Taylor and Francis, Boca Raton
15. Gonzales RC, Wood RE (1993) Digital image processing. Addison-Wesley Publishing Com-
pany, Reading
16. Haeussler EF Jr, Paul RS (1999) Introductory mathematical analysis for business, economics
and the life and social sciences, 9th edn. Prentice Hall International, INC., Upper Saddle River
17. Hecht E (1996) Physics calculus. Brooks/Cole Publishing Company, An International Thomson
Publishing Company, Pacific Grove
18. Hughes Hallett D, Gleason AM (1999) Applied calculus produced by the consortium based at
Harvard under the auspices of the National Science Foundation, USA. Wiley, New York
19. Hubbard BB (1998) The world according to wavelets, 2nd edn. A K Peters, Natick
20. Jensen G (2000) Using Matlab in calculus. Prentice Hall, Upper Saddle River
21. Lang S (1980) A first course in calculus, 4th edn. Addison-Wesley Publishing Company,
Reading
© Springer Nature Singapore Pte Ltd. 2019
M. Brokate et al., Calculus for Scientists and Engineers, Industrial
and Applied Mathematics, https://doi.org/10.1007/978-981-13-8464-6
Index

Symbols C
F, 413 Cartesian coordinates, 513
N, 1, 509 Cauchy principal value, 417
Q, 509 Cauchy–Schwarz inequality, 316
R, 2 Chain rule, 264
Z, 509 Circulation, 343
|x|, 15, 16 Commutative, 313
e, 20 Complex number, 509
n, 17 Component function, 320
Composition, 29
limit of, 46
A Concave function, 111
Absolute value, 511 Conservation law, 461
Conservative, 345
Absolute value function, 15
Continuous, 48, 248
derivative, 63
piecewise continuous, 384
Angular momentum, 367
Contour diagram, 240
Angular velocity, 371
Contraposition, 30
Annuity, 218
Convex function, 111
Anticommutative, 317
Convolution, 419
Antiderivative, 156
Cosecant function, 517
Arc length, 342
Cosine, 517
Area hyperbolic function, 521
Cotangent function, 517
Area of a surface, 347
Critical point, 105, 274
Average cost, 115 Cross product, 316
Average value, 299 Curl, 333
Curve, 321
arc length, 205
B closed curve, 340
Bandwidth, 418 continuous, 340
Bernoulli equation, 441 differentiable, 340
Binomial theorem, 511 end point, 340
Boundary condition, periodic, 431 initial point, 340
Boundary conditions, 431 length, 202
Boundary value problem, 431, 465, 469 level curve, 240
Bound of a function, 15 oriented, 352
Burger’s equation, 461 parametric form, 204

parametrization, 340 Discontinuity


piecewise smooth, 340 jump, 49
simple, 352 point of, 49
smooth, 340 removable, 49
Discontinuous, 49
Distributive, 313
Divergence, 333
D Divergence theorem, 356, 533
Decreasing, 106 Domain of, 2
Decreasing function, 12 Dot product, 312
Demand function, 118 Double integral, 291, 540
Density function, 228, 300
exponential, 232
uniform, 232 E
Derivative, 57 Eigenfunction, 466
directional, 328 Eigenvalue, 466
Eikonal equation, 463
higher order, 77
Electric field, 364
implicit, 69
Euclidean norm, 309
in economics, 72
Euler equation, 464
left-sided, 58 Euler number, 20
of a product, 65 Exponential
of a quotient, 66 decay, 28
of a sum, 64 growth, 26
of an inverse function, 66 Exponential function, 19
partial, 251 derivative, 68
right-sided, 58 Extreme value theorem, 50
second, 77 Extremum, 103, 274
second partial, 253 first derivative test, 109, 274
Difference quotient, 57 second derivative test, 109, 275
Differentiable, 58, 340
continuously, 257
vector field, 322 F
Differential, 83 Factorial, 17
Differential equation Fourier coefficients, 390
auxiliary equation, 444 Fourier coefficients, complex, 405
Bernoulli equation, 441 Fourier series, 390, 391
constant coefficients, 444 amplitude spectrum, 408
general solution, 433 complex form, 403
convolution, 422
homogeneous, 439
frequency spectrum, 408
linear, 430
periodic extension, 402
linear homogeneous, 430
phase angle, 408
linear inhomogeneous, 430
phase spectrum, 408
linear, standard form, 437 Shannon, 423
order, 429 sine and cosine form, 406
particular solution, 433 Fourier transform, 411
reduction of order, 443 as an operator, 413
solution, 430 convolution, 421
Diffusion equation, 271, 461, 469 dilation, 414
Directional derivative, 328 discrete, 421
Dirichlet convergence theorem, 391 inverse, 416
Dirichlet function, 13, 49 shift, 414

Fubini’s theorem, 292 secant, 517


Function, 2 Shannon sampling, 18
absolute value, 15 signum, 16
algebra of, 29 sine, 517
argument of, 3 step, 16
average value, 213 stream function, 376
band-limited, 423 tangent, 517
bandwidth, 418 trigonometric, 517
bounded, 15 uniformly continuous, 528
component function, 320 value of, 3
composition, 29 Functions
concave, 111 of several variables, 245
constant, 8 Fundamental set, 433
continuous, 48, 248 Fundamental theorem of calculus, 169, 170,
convex, 111 528
cosecant, 517 Future value, 216
cosine, 517
cotangent, 517
differentiable, 327
G
differential of, 83
Gauss divergence theorem, 356, 533
Dirichlet, 13
Gaussian function, 20
discontinuous, 49
Gibbs phenomenon, 403
domain, 2
Gradient, 326
even, 10
in polar coordinates, 332
exponential, 19
Gradient field, 365
Gaussian, 20
Gradient flow, 370
graph of, 2, 4, 240
Graph of a function, 2, 240
Haar, 18
Gravitation, 364
Heaviside, 18
Green–Ostrogradski theorem, 353, 532
homogeneous, 267, 439
Green’s identity, 359
hyperbolic, 22
Growth
image of, 3
exponential, 449
increment of, 83
linear, 449
injective, 14
Growth factor, 449
integrable, 155, 292, 527, 541
Growth model, 78
inverse, 14
bacterial, 80
inverse trigonometric, 520
financial, 81
invertible, 13
population, 79
linear, 9
radioactive decay, 82
odd, 10
of two variables, 240
one-to-one, 14
orthogonal, 385 H
partial, 250 Haar function, 18
periodic, 12 Harmonic oscillator, 446
piecewise continuous, 384 Heat equation, 271, 461, 469
polynomial, 9 Heaviside function, 18, 49
power, 10 Helmholtz equation, 463
quadratic, 9 Hooke’s law, 226
range of, 3 l’Hôpital’s rule, 91
rational, 10 Hurricane, 378
real-valued, 4 Hyperbolic function, 22
representation, 4 Hyperbolic function, inverse, 521

I K
Image, 3 Klein–Gordon equation, 463
Imaginary, 509 Korteweg de Vries equation, 462
Increasing, 106
Increasing function, 12
Increment, 83 L
Induction, 31 Lagrange multiplier, 285
Inflection point, 112 Lagrange’s identity, 319
Initial conditions, 430 Laplace equation, 268, 462, 470
Initial value problem, 430, 467, 469 Laplace operator, 462
Injective, 14 Least squares, 282
Inner product, 385 Left-continuous, 48
Integer, 509 Leibniz formula, 272
Level set, 240
Integer part, 16
Level surface, 240
Integrable, 155, 292, 411
Limit, 247
square integrable, 411
improper, 44, 51
Integrable function, 527, 541 left-sided, 42
Integral of a function, 43
as average, 212 right-sided, 43
definite, 155 vector field, 321
definition, 528, 541 Linear combination, 389
improper, 181 Linear dependence, 389
indefinite, 157 Linear function, 9
in polar coordinates, 295 Linear independence, 389
in spherical coordinates, 297 Linearization, 328
in three variables, 296 Line element, 342
in two variables, 291, 540 Line integral
iterated, 292 of a scalar field, 341
parameter dependent, 271 of a vector field, 342
principal value, 417 Logarithm, 21
repeated, 292 base of, 21
Riemann integral, 155 common, 22
surface integral, 348 natural, 21
table, 158, 186 Logarithmic function, 21, 28
Integral test, 185, 531 derivative, 68
Logistic equation, 450
Integrating factor, 438
Lorentz force, 368
Integration
by parts, 164, 173
by substitution, 160
M
rational function, 179
MacLaurin series, 145
table, 176
Malthusian model, 449
Interior point, 248 Marginal cost, revenue, profit, 72
Intermediate value theorem, 50 Marginal productivity, 262
Interval, 511 Mathematical
Inverse function induction, 31
continuity, 50 model, 23
derivative, 66 MATLAB
Inverse of a function, 14 differential equations, 494, 495
Invertible function, 13 file formats, 477
Irrational number, 509 Fourier series, 503
Irrotational, 362 functions, 480

limits, 491 Partial derivative, 252


numerical integration, 491 second, 253
optimization, 502 Partial fraction, 179
symbolic differentiation, 489 Partial function, 250
symbolic integration, 489 Partial sum, 134
variables, operators, matrices, 478 Partition
visualization, 482 of an interval, 154
Maximal value Path independence, 346
global, 103 Period, 12
local, 104 of a pendulum, 147
Maximizer, see also extremum Perpendicular, 313
existence, 105 Poisson equation, 462
global, 103, 274 Polar coordinates, 514
local, 104, 274 area in, 200
Mean value theorem, 107, 173, 524 gradient, 332
Minimal value integral, 295
global, 103 Polynomial, 9
local, 104 degree of, 9
Minimizer, see also extremum Potential, 345
existence, 105 Power function, 10
global, 103, 274 Power series, 142
local, 104, 274 derivative, 144
Model, 23 interval of convergence, 143
Momentum, 367 radius of convergence, 143
Monomial, 10 Present value, 216
Monotonicity criterion, 106 Primitive, 156
Principal value, 417
Projection of vectors, 314
N Proof, 30
Natural number, 509 by contradiction, 31
Navier–Stokes equation, 463 direct, 30
Neighborhood, 247 indirect, 30
Newton–Raphson method, 88 Proportional, 25
Nondecreasing function, 12
Nonincreasing function, 12
Norm, 309, 385 Q
Quadratic function, 9
Quotient rule, 66
O
One-to-one, 14
Open set, 248, 326 R
Ordered pair, 1 Radians, 517
Orientation, 340 Range, 3
Orthogonal, 313 Rate of change, 58
Orthogonal system, 385 Rational function, 10
Orthonormal system, 386 Rational number, 509
Reciprocal rule, 65
Rectangular coordinates, 513
P Regression, 284
Pair, 1 Relation, 1
Parametrization Reverse triangle inequality, 312
of a curve, 340 Riemannian sum, 155, 526
regular, 338 Riemann zeta function, 140

Right-continuous, 48 Substitution rule


Right-hand rule, 317 for definite integrals, 172
Rolle’s theorem, 107, 523 for indefinite integrals, 161
Rotation, 333 Superposition principle, 432
Surface, 240
area of, 347
equipotential, 241
S isothermal, 241
Saddle point, 274 level surface, 240
Sandwich theorem, 46, 131, 249, 523 piecewise smooth, 348
Scalar product, 312, 385 smooth surface, 338
Scalar triple product, 319 Surface element, 348
Schrödinger equation, 463 Surface integral, 348
Schwarz inequality, 316 Surface of revolution, 207
Secant function, 517
Secant line, 57
Sensitivity, 97 T
Separable variables, 434 Tangent function, 517
Tangent line, 58
Separation of variables, 435, 465
Tangent plane, 338
Sequence, 129
Tangent vector, 322
convergence, 130
Taylor expansion, 288, see also Taylor series
divergence, 130
Taylor polynomial, 146
limit, 130 Taylor series, 145
recursive definition, 133 convergence, 145
Series, 134, see also power series divergence, 145
absolute convergence, 141 Taylor’s theorem, 525
alternating, 141 Telegraph equation, 462
comparison principle, 137 Transport equation, 461
conditional convergence, 141 Triangle inequality, 310
convergence, 134 Trigonometric function, 517
divergence, 134 derivative, 62, 67
Fourier, see Fourier series integration formulas, 173
geometric, 134 Trigonometric function, inverse, 520
harmonic, 139 Trigonometric identities, 517
ratio test, 138 Trigonometric series, 390
rearrangement, 142 Triple integral, 296, 542
root test, 138
sum, 134
trigonometric, 390 U
Sgn, 16 Uncertainty principle, 418
Shannon sampling function, 18
Shannon sampling theorem, 423
V
Signum function, 16 Variable
Sine, 517 dependent, 4
Sink flow, 375 independent, 4
Slope, 58, 253 Vector, 308
Spherical coordinates, 297 angle between vectors, 315
Stationary point, 105, 274 normal, 338
Step function, 16 parallel vectors, 309
Stokes theorem, 361, 536 projection, 314
Stream function, 376 unit vector, 310
Streamline, 376 Vector field, 320

conservative, 345, 538 average, 40


continuous, 321 instantaneous, 40
differentiable, 322 Vortex flow, 376
limit, 321
potential of, 345
Vector function, 320
Vector product, 316 W
Velocity Wave equation, 269, 461, 465, 467
