
Math 240: Calculus III

Edvard Fagerholm
[email protected]

June 20, 2012


Contents
1 About
    1.1 Content of these notes
    1.2 About notation

2 Vector Spaces
    2.1 Vectors
    2.2 Definition of a vector space
    2.3 Span of vectors
    2.4 Linear independence of vectors
    2.5 Dimension
    2.6 Generalizing our Definition of a Vector (optional)

3 Matrices
    3.1 Introduction
    3.2 Matrix Algebra
    3.3 The transpose of a matrix
    3.4 Some important types of matrices
    3.5 Systems of linear equations and elementary matrices
    3.6 Gaussian elimination
    3.7 Rank of a matrix
    3.8 Rank and systems of linear equations
    3.9 Determinants
    3.10 Properties of the determinant
    3.11 Some other formulas for the determinant
    3.12 Matrix inverse
    3.13 Eigenvalues and eigenvectors
    3.14 Diagonalization

4 Higher-Order ODEs
    4.1 Basic definitions
    4.2 Homogeneous equations
    4.3 Nonhomogeneous equations
    4.4 Homogeneous linear equations with constant coefficient
    4.5 Undetermined Coefficients
    4.6 Variation of parameters (optional)
    4.7 Cauchy-Euler equations
    4.8 Linear Models
        4.8.1 Free undamped motion
        4.8.2 Free damped motion
        4.8.3 Driven motion

5 Systems of linear ODEs
    5.1 Basic definitions
    5.2 Homogeneous linear systems
    5.3 Homogeneous linear systems – complex eigenvalues
    5.4 Solutions by diagonalization

6 Series solutions to ODEs
    6.1 Solutions around ordinary points
    6.2 Solutions around singular points

7 Vector Calculus
    7.1 Line Integrals
    7.2 Independence of Path
    7.3 Multiple integrals
    7.4 Green's theorem
    7.5 Change of variable formula for multiple integrals
    7.6 Surface integrals
    7.7 Divergence theorem
    7.8 Stokes theorem

1 About
1.1 Content of these notes
These notes will cover all the material that will be covered in class. What is
mostly missing are pictures and examples. The course text, Zill, Cullen,
Advanced Engineering Mathematics 3rd Ed, will be very useful for more
thorough and harder (read: longer) examples than the ones I'm willing to
spend my time typing up. You'll also find more problems there to complement
the homework that I will assign, since the more problems you do the better.
All the theory that's taught in class will be in these notes, and some more.
During lecture I might sometimes skip parts of some proofs that you'll find
in these notes or instead just present the general idea by doing e.g. a special
case of the general statement. This will usually happen when:

1. The full proof will not add anything useful to your understanding of
the topic.

2. The full proof is not a computational technique that will turn out to be
useful when solving practical problems. This is an applied class after all.

Embedded in each section you’ll find some examples. After almost every
definition there will be something I would call a trivial example that should
help you check that you are understanding what the definition is saying.
You should make sure you understand them before moving on, since not
understanding them is a sign that you’ve misunderstood something.
Each section also ends with a list of all the most basic computational
problems related to that topic. These are things you will be expected to
perform in your sleep, so make sure you understand them. Almost any
problem you will encounter in this class will reduce to solving a sequence of
these problems, so they will be the "bricks" of most applications that you'll
encounter.

1.2 About notation
The following basic notations will be used in the class:

    ∅ = the set with no elements
    N = {0, 1, 2, 3, . . .}
    Z = {0, 1, −1, 2, −2, . . .}
    Q = {x/y | x, y ∈ Z, y ≠ 0}
    R = the real numbers
    C = the complex numbers
    ∀ = this is read "for all"
    ∃ = this is read "there exists"
    ∈ = this is read "element of"
    ∉ = this is read "not element of"
    s.t. = short for "such that"

In other words, a shorthand for the sentence "there exists a real number that
is larger than zero" could be written as ∃x ∈ R s.t. x > 0. I will also assume
you know the following set theoretic notations. Given any sets A, B we may
define union, intersection, difference, equality and subset. These are defined
as follows:

    x ∈ A ∪ B  ⇔  x ∈ A or x ∈ B
    x ∈ A ∩ B  ⇔  x ∈ A and x ∈ B
    x ∈ A \ B  ⇔  x ∈ A and x ∉ B
    A = B      ⇔  x ∈ A if and only if x ∈ B
    A ⊂ B      ⇔  if x ∈ A, then x ∈ B.

Notice that A = B can also be written as A ⊂ B and B ⊂ A. Finally, we
sometimes use the following notation

    A ⊊ B,

which means that A ⊂ B, but A ≠ B. In other words, B contains an element
that is not in A, but all elements of A are in B.
We will also sometimes use the following notation to describe sets. Say
we want to define the set of all whole numbers that are even. We can write

this set as
{x ∈ Z | x is even}
More generally, assume that P is a predicate, which is informally something
that's either true or false depending on what you feed it, and let A be any set.
We write
{x ∈ A | P (x)}
to mean "the elements x of A s.t. P (x) is true". In our example A = Z and
P (x) = "x is even". If x is constructed of multiple parts, we might sometimes
write e.g.
{(x, y) ∈ R2 | y = 2x},
which would use as the predicate P (x, y) the statement ”y = 2x”. Usually,
these should not lead to too much confusion since most of the notation is
quite self-explanatory.
For functions we also use the familiar notation f : A → B. This means
that f is a function from A to B i.e. A is the domain of f and B the codomain
of f . In other words, given an x ∈ A, f assigns a unique f (x) ∈ B. In this
setting one also talks about the image of f which is defined as
Im f = {y ∈ B | ∃x ∈ A s.t. y = f (x)}
In other words, all the points in B onto which an element in A maps.

2 Vector Spaces
2.1 Vectors
In math 114 a sort of mixed notation was used for vectors. Sometimes we
wrote v = 2i + j + k, while we also used the notation v = ⟨2, 1, 1⟩. We
see that there's an immediate equivalence between points of R3 and vectors.
If O denotes the origin in R3 , then given a point P = (2, 1, 1) it defines the
vector

    OP = ⟨2, 1, 1⟩.
The moral of the story is that a point P = (x, y, z) represents a vector
and a point is essentially just a list of numbers. Those of you familiar with
computers might have heard the term vector processing, which simply means
that something operates on a list of numbers. In this spirit we will make the
following definition:

Definition 2.1.1 A vector is just an n-tuple (a1 , . . . , an ) of numbers.
Typically we will have (a1 , . . . , an ) ∈ Rn i.e. each ai is a real number.

Example 2.1.2 (1, 1) is a vector in R2 and (1, 0, 1, 2) is a vector in R4 . Note


that in the latter case there’s really no way to ”visualize” where this vector
points, so it has less of a geometric meaning, since we can’t visualize four
dimensions.

Note 2.1.3 In applications one often encounters vectors in, say, R100 . For
example, say we sample 100 numbers independently for some statistical ap-
plication. The result will be a vector (a1 , . . . , a100 ) ∈ R100 , where ai is the
number obtained in the ith sample. Here, the vector is chosen from a 100-
dimensional space. This does not make any sense if thinking about it visually.
However, it does make sense, if we think about the dimension as simply being
the degrees of freedom of the experiment. Each number is basically chosen
independently of the others.

Note 2.1.4 Essentially any phenomenon that depends freely on n variables
can be modeled by something called an n-dimensional vector space, which is
the object of study in linear algebra. This is why linear algebra tends to be
the most useful subfield of mathematics for practical applications. From a
mathematical perspective, it would make much more sense to teach it before
calculus, but the world doesn’t seem to work that way.

2.2 Definition of a vector space


Let (a1 , . . . , an ), (b1 , . . . , bn ) ∈ Rn and c ∈ R. In other words, two vectors
and a real number. Write x = (a1 , . . . , an ) and y = (b1 , . . . , bn ). We can add
these vectors and multiply them by real numbers as follows:

x + y = (a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , . . . , an + bn )


cx = c(a1 , . . . , an ) = (ca1 , . . . , can ).

In other words add component-by-component and multiply each component


with the constant. The real number c ∈ R will be called a scalar. This leads
us to the concept of a vector space.

Definition 2.2.1 A vector space is a subset V ⊂ Rn s.t. the following
conditions hold:
1. If x, y ∈ V , then x + y ∈ V .

2. If c ∈ R and x ∈ V , then cx ∈ V .
The former condition is called closed under vector addition while the latter
is called closed under multiplication by a scalar.

Example 2.2.2 Let n = 2, so we are looking at R2 . Let V = {(x, y) | y = 2x}.
In other words V consists of all the points on the line y = 2x. Let's
show that V is a vector space by checking that conditions (1) and (2) in the
definition are true. This is done as follows:
1. Let (a1 , a2 ), (b1 , b2 ) ∈ V be points on the line y = 2x. By definition
a2 = 2a1 and b2 = 2b1 . It follows that a2 + b2 = 2a1 + 2b1 = 2(a1 + b1 ).
In other words,

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 ) ∈ V.

2. Now let (a1 , a2 ) ∈ V and c ∈ R. Since a2 = 2a1 , we also have that


ca2 = 2ca1 , so c(a1 , a2 ) = (ca1 , ca2 ) ∈ V .

Example 2.2.3 Let V = {(x, y) | y = 2x + 1} i.e. the points on the line


y = 2x + 1. Now (1, 3) is on the line, but (−1)(1, 3) = (−1, −3) is not. Thus
V is not closed under multiplication by a scalar, so it’s not a vector space.

2.3 Span of vectors


A typical problem one needs to solve is the following. Assume we are given
vectors x1 , . . . , xk ∈ Rn . We want to find all the vectors x s.t.

x = c1 x1 + . . . + ck xk ,   ci ∈ R, i = 1, . . . , k.

Conversely, given x we want to determine if it’s of the previous form.

Definition 2.3.1 A vector x that can be written as c1 x1 +. . .+ck xk is called


a linear combination of the vectors x1 , . . . , xk .

Example 2.3.2 Assume we are given one vector x = (1, 0) ∈ R2 . Then we
may write (2, 0) = 2x, so (2, 0) is a linear combination of x. However, say
we look at the vector (1, 1). Clearly,

(1, 1) ≠ cx = c(1, 0) = (c, 0)

for any choice of c ∈ R. In other words (1, 1) is not a linear combination of


x.

Example 2.3.3 Assume we are given vectors x1 = (1, 0, 1), x2 = (2, 1, 0),
x3 = (0, 0, 1). We want to find all the linear combinations of x1 , x2 , x3 . We
want to find all x ∈ R3 s.t.

x = c1 x 1 + c2 x 2 + c3 x 3 .

Write x = (a1 , a2 , a3 ). To determine if x is a linear combination, we need to


solve for the unknowns c1 , c2 , c3 s.t. the combination gives us x. This gives

(a1 , a2 , a3 ) = c1 x 1 + c2 x 2 + c3 x 3
= c1 (1, 0, 1) + c2 (2, 1, 0) + c3 (0, 0, 1)
= (c1 , 0, c1 ) + (2c2 , c2 , 0) + (0, 0, c3 )
= (c1 + 2c2 , c2 , c1 + c3 ).

We just equate the components, which gives us the linear system of equations

    c1 + 2c2      = a1
          c2      = a2
    c1       + c3 = a3 .

In other words, we need to determine when this linear system of equations
in the unknowns c1 , c2 and c3 has a solution given a1 , a2 , a3 . We will be able
to answer this question later in the class when we study systems of linear
equations and Gaussian elimination.
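A quick numerical check of this example can already be done with numpy. The sketch below is not from the notes: it stacks x1 , x2 , x3 as the columns of a matrix and solves for the coefficients; the particular target vector (3, 1, 2) is an arbitrary choice for illustration.

    import numpy as np

    # Columns of M are x1 = (1, 0, 1), x2 = (2, 1, 0), x3 = (0, 0, 1) from Example 2.3.3.
    M = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0]])

    # A target vector x = (a1, a2, a3); the value here is an arbitrary choice.
    x = np.array([3.0, 1.0, 2.0])

    # Solving M c = x gives the coefficients c1, c2, c3.  Here M happens to be
    # invertible, so a solution exists and is unique for every x in R^3.
    c = np.linalg.solve(M, x)
    print(c)                      # [1. 1. 1.], i.e. x = x1 + x2 + x3
    assert np.allclose(M @ c, x)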

Definition 2.3.4 Let x1 , . . . , xk ∈ Rn be a collection of vectors. We define


the span of them to be the set

V = {x = (a1 , . . . , an ) ∈ Rn | x = c1 x1 + . . . + ck xk , ci ∈ R, i = 1, . . . , k}.

In other words, V is the set of all vectors that can be written as a linear
combination of the x1 , . . . , xk . We denote this by

V = span(x1 , . . . , xk ).

Definition 2.3.5 If V ⊂ Rn is a vector space and V = span(x1 , . . . , xk ) for
some vectors x1 , . . . , xk ∈ Rn , then x1 , . . . , xk is called a spanning set of
vectors of V .

Note 2.3.6 The previous definition goes the other way around. We are given
the vector space V and we say that x1 , . . . , xk is a spanning set if it happens
that V = span(x1 , . . . , xk ).

Theorem 2.3.7 V = span(x1 , . . . , xk ) ⊂ Rn is a vector space.


Proof. Let x, y ∈ V , so that

x = c1 x1 + . . . + ck xk , y = d1 x1 + . . . + dk xk

where ci , di ∈ R are scalars. Then

x + y = (c1 + d1 )x1 + . . . + (ck + dk )xk ∈ V,

since this is a linear combination with scalars ci + di . Similarly,

cx = cc1 x1 + . . . + cck xk ∈ V.

Theorem 2.3.8 For any vectors x1 , . . . , xk ∈ Rn , we have that

span(x1 , . . . , xk−1 ) ⊂ span(x1 , . . . , xk )

Proof. x = c1 x1 + . . . + ck−1 xk−1 = c1 x1 + . . . + ck−1 xk−1 + 0xk .


The previous result is also useful for creating larger vector spaces. Assume
we are given a vector space V = span(x1 , . . . , xk ), then by choosing x ∈
Rn \ V , i.e. a vector not in V , we can create a larger vector space V ⊊ W =
span(x1 , . . . , xk , x), since the space W contains x.

WARNING 2.3.9 The span of no vectors is by convention defined to be
the vector space containing just the zero vector 0 ∈ Rn . This comes up when
defining dimension later on.

Theorem 2.3.10 Given a list of vectors x1 , . . . , xn , the following operations will
not change their span:

1. Multiplying a vector in the list by a nonzero constant.

2. Exchanging two vectors in the list (thus renaming one to be the other
and vice versa).

3. Replacing xi by xi + axj for some j ≠ i, i.e. adding a scalar multiple
of one vector to another.

Proof. c1 x1 + . . . + cn xn = c1 x1 + . . . + (ci /a)(axi ) + . . . + cn xn proves (1).
The others follow similar logic.

2.4 Linear independence of vectors


We mentioned earlier in Note 2.1.4 that we are often interested in the degrees
of freedom of a problem. In linear algebra this is called dimension. It’s based
on the following concept. Given vectors x1 , . . . , xk ∈ Rn and some vector
x ∈ span(x1 , . . . , xk ), then by definition we know that

x = c1 x 1 + . . . + ck x k .

We can now ask the following question: are the ci 's unique, that is, are
there multiple choices of the ci s.t. we get x as the linear combination?
Concretely, can we find ci , di ∈ R, s.t. ci ≠ di for some i and still

    x = c1 x1 + . . . + ck xk = d1 x1 + . . . + dk xk ?

Example 2.4.1 Let x1 = (1, 0), x2 = (0, 1), x3 = (1, 1) be vectors in R2


and choose x = (2, 1). We see that

x = 2x1 + x2 + 0x3 = x1 + 0x2 + x3 ,

so there are multiple choices for the scalars.

Here’s the reason why we are interested in this. We are looking for a
definition of dimension, which is based on the following. Assume we are
given vectors x1 , . . . , xk ∈ Rn . Let’s remove a vector from this collection,
say, xk . By theorem 2.3.8 we know that
span(x1 , . . . , xk−1 ) ⊂ span(x1 , . . . , xk ),
but do we have equality or not?

Example 2.4.2 Let’s go back to the previous example, so x1 = (1, 0),


x2 = (0, 1), x3 = (1, 1). Then we know that span(x1 , x2 ) ⊂ span(x1 , x2 , x3 ).
However, since we can write x3 = x1 + x2 , we have that
x = c1 x1 + c2 x2 + c3 x3 = c1 x1 + c2 x2 + c3 (x1 + x2 ) = (c1 + c3 )x1 + (c2 + c3 )x2 .
This shows that if x ∈ span(x1 , x2 , x3 ), then x ∈ span(x1 , x2 ), so actually
span(x1 , x2 ) = span(x1 , x2 , x3 ).
On the other hand, since x2 = 0x1 + x2 , we clearly have x2 ∈ span(x1 , x2 ).
But (0, 1) = x2 ≠ c1 x1 = (c1 , 0) for any choice of c1 ∈ R, so x2 ∉ span(x1 ).
Thus, we have
span(x1 ) ⊊ span(x1 , x2 ) = span(x1 , x2 , x3 ).
Is there a general reason why this is the case? Yes, and we will answer
that next.

Definition 2.4.3 Given vectors x1 , . . . , xk ∈ Rn , we say that x1 , . . . , xk are


linearly independent if given any x ∈ span(x1 , . . . , xk ), the equation
x = c1 x 1 + . . . + ck x k
has a unique solution in terms of the ci .

Example 2.4.4 One vector x1 ∈ Rn is always linearly independent, assuming
x1 ≠ 0. This is because for distinct scalars c, d ∈ R we obviously have
cx1 ≠ dx1 , so that the equation

    x = c1 x1

has a unique solution for any x ∈ span(x1 ).

The alert reader might notice that this definition is quite problematic for
practical computations. Since span(x1 , . . . , xk ) is an infinite set (unless all
the vectors are the zero vector), to check for linear independence, we would have
to check the condition for infinitely many x ∈ span(x1 , . . . , xk ). This would
not be practical, since we would never be done, so fortunately we have the
following theorem, which implies we only need to check one element:

Theorem 2.4.5 The vectors x1 , . . . , xk ∈ Rn are linearly independent if and


only if the equation
0 = c1 x 1 + . . . + ck x k
has a unique solution ci = 0 for i = 1, . . . , k.
Proof. We first show that linear independence implies the condition. Assume
that x1 , . . . , xk ∈ Rn are linearly independent. Since 0x = 0 for any
vector x ∈ Rn , we get

    0 = 0x1 + . . . + 0xk .

But the definition of linear independence says precisely that any such representation
of an element of the span is unique. Since we have just found one, this must be the
unique one, so ci = 0 for i = 1, . . . , k is the unique solution.
Next assume that ci = 0 is the only solution for the equation in the
theorem. Pick x ∈ span(x1 , . . . , xk ) and assume that

x = c1 x1 + . . . + ck xk = d1 x1 + . . . + dk xk .

It follows that

0 = x − x = (c1 − d1 )x1 + . . . + (ck − dk )xk .

By our assumption, we must have ci − di = 0 i.e. ci = di , i = 1, . . . , k. It


follows that x = c1 x1 + . . . + ck xk has a unique solution.
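In practice the criterion of Theorem 2.4.5 is what one checks numerically. The sketch below is not from the notes; it uses numpy and the notion of rank, which is only defined later in Section 3.7: the vectors are independent exactly when the matrix having them as columns has rank equal to the number of vectors.

    import numpy as np

    # Vectors from Example 2.4.1.
    x1, x2, x3 = (1, 0), (0, 1), (1, 1)

    # Stack the vectors as columns: c1*x1 + c2*x2 + c3*x3 = 0 has only the
    # trivial solution exactly when this matrix has rank 3.
    A = np.column_stack([x1, x2, x3])
    print(np.linalg.matrix_rank(A))   # 2 < 3, so x1, x2, x3 are dependent

    B = np.column_stack([x1, x2])
    print(np.linalg.matrix_rank(B))   # 2 = 2, so x1, x2 are independent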

2.5 Dimension
Now that we know linear independence and how to test for it, we can begin
defining the concept of dimension. We start with the following theorem.

Theorem 2.5.1 Assume that x1 , . . . , xk ∈ Rn are not linearly independent.
Then we may remove some vector xi from the list without changing the span,
i.e.
span(x1 , . . . , xi−1 , xi+1 , . . . , xk ) = span(x1 , . . . , xk ).
Proof. By theorem 2.4.5, we may find some c1 , . . . , ck ∈ R, s.t. ci ≠ 0 for
some i and
0 = c1 x 1 + . . . + ck x k .
Since we can just rename our variables, we may just as well assume that
c1 6= 0 to simplify notation.
We already know that span(x2 , . . . , xk ) ⊂ span(x1 , . . . , xk ), so we need
to show the opposite inclusion. Pick x ∈ span(x1 , . . . , xk ). We need to show
that we may write
x = d2 x2 + . . . + dk xk .
It follows from 0 = c1 x1 + . . . + ck xk that −c1 x1 = c2 x2 + . . . + ck xk . By our
assumption, c1 ≠ 0, so we can divide both sides by −c1 giving

    x1 = −(c2 /c1 x2 + . . . + ck /c1 xk ).

Now x ∈ span(x1 , . . . , xk ), so for some αi ∈ R, i = 1, . . . , k, we get by


substituting for xi that

x = α1 x1 + . . . + αk xk
= −α1 (c2 /c1 x2 + . . . + ck /c1 xk ) + α2 x2 + . . . + αk xk
= (α2 − α1 c2 /c1 )x2 + . . . + (αk − α1 ck /c1 )xk .

It follows that x ∈ span(x2 , . . . , xk ), so the proof is complete.

Algorithm 2.5.2 (Find linearly independent subset) The previous theorem


provides us with the following algorithm. Assume we are given a list of
vectors x1 , . . . , xk ∈ Rn . Note that the previous proof shows that if 0 =
c1 x1 + . . . + ck xk with the ci not all zero, then we can remove any element xi
from the list for which ci ≠ 0 without changing the span.

1. If our list contains vectors that are 0 we may remove them without
affecting the span.

2. If x1 , . . . , xk are not linearly independent, then 0 = c1 x1 + . . . + ck xk
with some ci ≠ 0. Remove the vector xi from the list. This won't
affect the span.

3. If the list that is left is not linearly independent, we may repeat the
previous step.

4. Continue until left with a linearly independent list of vectors with the
same span.
At each stage the list of vectors decreases in length by 1, so the process has
to stop; hence the algorithm always ends with a linearly independent list. This
proves the following:

Theorem 2.5.3 Assume that V = span(x1 , . . . , xk ), then we may always


find some linearly independent subset A ⊂ {x1 , . . . , xk }, s.t.

span(x1 , . . . , xk ) = span(A)

Example 2.5.4 Let x1 = (1, 0), x2 = (0, 1), x3 = (1, 1). We showed in
example 2.4.2 that these vectors are not linearly independent. Furthermore,
we showed that

span(x1 ) ⊊ span(x1 , x2 ) = span(x1 , x2 , x3 ).

The subset A = {x1 , x2 } ⊂ {x1 , x2 , x3 } is linearly independent. Thus our


algorithm would have stopped after step (2).
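Here is a small Python sketch in the spirit of Algorithm 2.5.2 (the helper is hypothetical, not from the notes). Instead of removing dependent vectors it keeps a vector only when it enlarges the span of the vectors kept so far, which produces the same kind of linearly independent subset; like the earlier check, it leans on the rank computation of Section 3.7.

    import numpy as np

    def independent_subset(vectors):
        """Keep a vector only if it enlarges the span of those kept so far."""
        kept = []
        for v in vectors:
            candidate = kept + [list(v)]
            # The span grows exactly when the rank of the stacked rows grows.
            if np.linalg.matrix_rank(np.array(candidate)) > len(kept):
                kept.append(list(v))
        return kept

    # The vectors of Example 2.5.4: x3 = x1 + x2 gets dropped.
    print(independent_subset([(1, 0), (0, 1), (1, 1)]))   # [[1, 0], [0, 1]]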

Definition 2.5.5 Let V = span(x1 , . . . , xk ) and A a linearly independent


subset of {x1 , . . . , xk } s.t. V = span(A). We define the dimension of V to
be the number
dim V = #A.
(Here #A denotes the number of elements in the set A, i.e. the length of the
linearly independent list produced by the algorithm.) The vector space
V = {0} ⊂ Rn is spanned by an empty list of vectors, so by convention it has
dimension 0.

Definition 2.5.6 Let V = span(x1 , . . . , xk ) s.t. the x1 , . . . , xk are linearly


independent. Then the vectors x1 , . . . , xk are called a basis for V .

Note 2.5.7 The alert reader might notice a possible problem here. Suppose
that V = span(x1 , . . . , xk ) = span(y1 , . . . , yl ) for two different sets of vectors.
Then running the algorithm for our first list produces some
A ⊂ {x1 , . . . , xk }
while running the algorithm for the second list produces some subset
B ⊂ {y1 , . . . , yl }.
Our definition then says that #A = dim V = #B, so our definition of di-
mension only makes sense if A and B contain the same number of vectors.
Fortunately, we have the following theorem, which tells us that vector spaces
behave pretty much exactly the way we would like them to. In particular, A
and B will always contain the same number of vectors.

Theorem 2.5.8 Let V ⊂ Rn be a vector space. Then the following are true:
1. V has a basis, so there always exists a finite list of vectors x1 , . . . , xk
s.t. they are linearly independent and V = span(x1 , . . . , xk ).
2. Any two bases of V have the same number of elements, thus dim V is a
well-defined constant independent of the chosen basis. Mathematicians
would call such a thing an intrinsic property of V .
3. If W ⊂ V ⊂ Rn are vector spaces, then dim W ≤ dim V ≤ dim Rn = n.
4. If W ⊂ V ⊂ Rn and dim W = dim V , then W = V .
Proof. Take either math 312 or math 370.

Note 2.5.9 Gaussian elimination taught later in the class will provide us
with an effective method for computing the dimension of the span of some
vectors.

2.6 Generalizing our Definition of a Vector (optional)


Our current definition of a vector is not completely adequate for our class.
We will need a slightly more general definition for a vector when solving
systems of differential equations. If you trace through all the proofs made so
far, you see that we have used the following properties of vector addition:

1. x + y ∈ V for all x, y ∈ V (closure under addition)

2. x + y = y + x for all x, y ∈ V (commutativity)

3. x + (y + z) = (x + y) + z for all x, y, z ∈ V (associativity)

4. ∃0 ∈ V s.t. 0 + x = x for all x ∈ V (existence of zero vector)

5. Given x ∈ V there’s a vector y ∈ V s.t. x+y = 0 (existence of inverse)

Now assume V is any set with an addition satisfying all the above properties.
Let F denote either R or C. If for each x ∈ V and c ∈ F, we can define a
new vector cx ∈ V s.t. the following hold

1. For all c ∈ F and x ∈ V , cx ∈ V (closure under scalar multiplication)

2. For all c ∈ F and x, y ∈ V , c(x + y) = cx + cy (distributivity)

3. For all c1 , c2 ∈ F and x ∈ V , (c1 + c2 )x = c1 x + c2 x (distributivity)

4. For all c1 , c2 ∈ F and x ∈ V , (c1 c2 )x = c1 (c2 x)

5. 1x = x for all x ∈ V ,

then we call the product cx a scalar multiplication on V .

Definition 2.6.1 A set V with an addition as above, equipped with a scalar
multiplication by F, is called a vector space. When F = R we call V a real
vector space and when F = C we call V a complex vector space.
If the previous definition felt abstract, then it's precisely because it is. A
few examples will show that it's not that complicated.

Example 2.6.2 Let P(R) denote polynomials with real coefficients and set
V = P(R). We can clearly multiply a real polynomial by a real number, so
e.g. 2(1+x+x2 ) = 2+2x+2x2 and similarly a sum of two real polynomials is
a real polynomial. Thus we pick F = R. The reader can check that addition
of polynomials and multiplication by a scalar satisfies all the axioms given
above, where the zero vector is just the trivial polynomial 0.

Example 2.6.3 Let V = {f : [0, 1] → R} denote the collection of all func-
tions from [0, 1] to R. If f, g are two such functions, then we can add them
pointwise, so we can define a function f + g by (f + g)(x) = f (x) + g(x).
Similarly, we can multiply such a function with a real number, so if c ∈ R,
then cf is the function (cf )(x) = c · f (x) i.e. we multiply the value at each
point with c. This is precisely what it intuitively means to multiply a func-
tion by a number. Again, it’s a simple exercise to check that V together with
F = R satisfies all the properties of a vector space.
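Here is a tiny Python illustration of the pointwise operations of Example 2.6.3; the particular functions chosen below are arbitrary and not taken from the notes.

    # Pointwise addition and scalar multiplication of functions [0, 1] -> R.
    def add(f, g):
        return lambda x: f(x) + g(x)

    def scale(c, f):
        return lambda x: c * f(x)

    f = lambda x: x ** 2
    g = lambda x: 1 - x

    h = add(scale(2.0, f), g)   # the "vector" 2f + g
    print(h(0.5))               # 2*(0.25) + (1 - 0.5) = 1.0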
When solving differential equations, we will have to deal with vector
spaces V , where the elements i.e. the vectors x ∈ V are solutions to a dif-
ferential equation. Solving an equation will then be the problem of finding a
basis for this so called solution space. Here’s a simple example:

Example 2.6.4 Assume we are given the differential equation y = y′. From


previous courses we know that the solutions to this equation are the functions
cex . Now let V = {cex | c ∈ R}, then given x, y ∈ V , we have that x = c1 ex
and y = c2 ex , so that x + y = (c1 + c2 )ex ∈ V . This shows that V is closed
under addition. Scalar multiplication is then just multiplying the function
with a real number like in the previous example, so if cex ∈ V , then for any
real r ∈ R also rcex ∈ V . Again, one can check that the addition and scalar
multiplication satisfies the axioms of a vector space. Furthermore, ex is a
basis of V .

Example 2.6.5 A final standard example that we will encounter is vector


spaces where the vectors are lists of functions e.g. (x + 1, ex , x2 ). Again,
these can be added component-wise and similarly for scalar multiplication.

WARNING 2.6.6 When working with these more generalized examples,


theorem 2.5.8 stops being true. For example, there’s no finite list of vectors
spanning the vector space of all real polynomials P(R). Such vector spaces
will be called infinite dimensional. Those of you who want to think about
this can look at the optional homework problems.

3 Matrices
3.1 Introduction
Definition 3.1.1 A matrix is a rectangular array of numbers or functions,

    [ a1,1  a1,2  · · ·  a1,m ]
    [ a2,1  a2,2  · · ·  a2,m ]
    [  ..    ..    ..     ..  ]
    [ an,1  an,2  · · ·  an,m ]

By ai,j we simply mean the element in the ith row and jth column. Sometimes
I will also use the term (i, j)-element of a matrix which just means the element
ai,j . By the dimensions of a matrix we mean the number of rows and columns.
A matrix with n rows and m columns is called an n-by-m matrix. We will
often denote an n-by-m matrix by A = (aij ), where i = 1, . . . , n and j =
1, . . . , m. This shorter notation is usually needed when proving things about
matrices.

Definition 3.1.2 n-by-m matrices are denoted by Matn,m (R), Matn,m (C)
etc. where the set in parenthesis denotes what set the elements of the matrix
belong to.

Example 3.1.3 The following are matrices in Mat2,2 (R) and Mat2,3 (C) respectively,

    [ 1  √2 ]        [ 1  0  i ]
    [ 0   1 ]  ,     [ 2  1  0 ] ,

while the following is a matrix where the entries are functions:

    [ x^2 + 1   e^x + 1 ]
    [    x         1    ] .
A matrix with function elements can be thought of as a function into either
Matn,m (R) or Matn,m (C), i.e. for each value of the variable, we get a real or
complex matrix.

Note 3.1.4 A real matrix is of course also a complex matrix, since real num-
bers are also complex, but just with zero imaginary part. Thus Matn,m (R) ⊂
Matn,m (C).

Definition 3.1.5 Given an n-by-m matrix

    [ a1,1  a1,2  · · ·  a1,m ]
    [ a2,1  a2,2  · · ·  a2,m ]
    [  ..    ..    ..     ..  ]
    [ an,1  an,2  · · ·  an,m ]

we call x = (ai,1 , . . . , ai,m ) the ith row vector and

        [ a1,j ]
        [ a2,j ]
    y = [  ..  ] = (a1,j , . . . , an,j )
        [ an,j ]

the jth column vector. Notice that we also denote vectors with the bracket
notation, so we identify an n-vector with an n-by-1 matrix.
In the definition above, we see that vectors in the matrix setting can also
contain functions as elements. Thus you should think of a row or column
vector as just a list of elements in the corresponding row or column of the
matrix.

Example 3.1.6 Given the complex matrix from the previous example,

    [ 1  0  i ]
    [ 2  1  0 ] ,

the first column vector is (1, 2) while the second row vector is (2, 1, 0). The
first row vector of

    [ x^2 + 1   e^x + 1 ]
    [    x         1    ]

is (x^2 + 1, e^x + 1).

3.2 Matrix Algebra


Definition 3.2.1 The sum of two matrices is defined as follows:

    [ a1,1  · · ·  a1,m ]   [ b1,1  · · ·  b1,m ]   [ a1,1 + b1,1  · · ·  a1,m + b1,m ]
    [  ..    ..     ..  ] + [  ..    ..     ..  ] = [     ..        ..         ..     ]
    [ an,1  · · ·  an,m ]   [ bn,1  · · ·  bn,m ]   [ an,1 + bn,1  · · ·  an,m + bn,m ]
Note that matrices can only be added if they have the same dimensions.
More compactly, if A = (aij ) and B = (bij ) are of equal dimensions, then
A + B = (aij + bij ).

Definition 3.2.2 Given a number c and a matrix, we can define scalar
multiplication by

      [ a1,1  · · ·  a1,m ]   [ ca1,1  · · ·  ca1,m ]
    c [  ..    ..     ..  ] = [   ..     ..     ..  ]
      [ an,1  · · ·  an,m ]   [ can,1  · · ·  can,m ]

The number c is called a scalar and depending on the context we might


assume c is either a natural, whole, rational, real or complex number. Again,
we may write A = (aij ) and then cA = (caij ).

Example 3.2.3 Some concrete examples:

    [ 1  √2 ]   [ x^2 + 1   e^x + 1 ]   [ x^2 + 2   e^x + 1 + √2 ]
    [ 0   1 ] + [    x         1    ] = [    x           2       ]

      [ 1  0  i ]   [ 2  0  2i ]
    2 [ 2  1  0 ] = [ 4  2   0 ]

Theorem 3.2.4 Given matrices A, B, C of the same dimensions and scalars


c, d, scalar multiplication and matrix addition satisfy the following properties:
1. A + B = B + A (commutativity)

2. A + (B + C) = (A + B) + C (associativity)

3. (cd)A = c(dA)

4. c(A + B) = cA + cB, (c + d)A = cA + dA (distributivity)


Proof. I’ll prove the first, the rest are just as easy to check and will be
homework. Write A = (aij ) and B = (bij ) with dimensions again being the
same. Then
A + B = (aij + bij ) = (bij + aij ) = B + A.

Note 3.2.5 With regards to the generalized definition of a vector space,
the reader can check that n-by-m matrices with real coefficients form a real
vector space. More generally, let V be any vector space with scalars F (i.e.
either R or C) and W = Matn,m (V ) the set of n-by-m matrices where the
matrix elements are elements of V . Then W is also an F vector space with
component-wise addition and scalar products.

Definition 3.2.6 If we are given two such lists x = (a1 , . . . , an ), y =


(b1 , . . . , bn ) representing row or column vectors of a matrix, we can define
their dot product as
n
X
x·y = ai b i .
i=1

Example 3.2.7 Let x denote the first column vector and y the second row
vector of

    [ x^2 + 1   e^x + 1 ]
    [    x         1    ] .

Then
    x · y = (x^2 + 1, x) · (x, 1) = (x^2 + 1)x + x = x^3 + 2x.
We can also multiply matrices, but the product of matrices has a somewhat
unintuitive definition. Matrices were originally invented to handle
systems of linear equations, which we will also study, so therein lies the
motivation for the definition. The idea was to write a system of m linear
equations in n variables,

    a1,1 x1 + a1,2 x2 + . . . + a1,n xn = b1
        ...
    am,1 x1 + am,2 x2 + . . . + am,n xn = bm ,

in the following form

    [ a1,1  · · ·  a1,n ] [ x1 ]   [ b1 ]
    [  ..    ..     ..  ] [ .. ] = [ .. ] .
    [ am,1  · · ·  am,n ] [ xn ]   [ bm ]

Therefore, we want the "product" on the left to equal

    [ a1,1 x1 + a1,2 x2 + . . . + a1,n xn ]
    [                 ..                 ]
    [ am,1 x1 + am,2 x2 + . . . + am,n xn ] .

The following definition for the matrix product accomplishes just this:

Definition 3.2.8 The product of two matrices is defined as follows. Let A
be an n-by-p matrix and B a p-by-m matrix. Denote the row vectors of
A by x1 , . . . , xn and the column vectors of B by y1 , . . . , ym . Then

         [ x1 ]                       [ x1 · y1   · · ·   x1 · ym ]
    AB = [ .. ] [ y1  · · ·  ym ]  =  [    ..      ..        ..   ]
         [ xn ]                       [ xn · y1   · · ·   xn · ym ]

Thus the (i, j)-element of the matrix AB is the dot product of the ith row
vector of A and the jth column vector of B. The product AB is an n-by-m
matrix. Using the shorter notation with A = (aij ) and B = (bij ), we have
AB = (cij ), where

    cij = ai1 b1j + ai2 b2j + . . . + aip bpj .

WARNING 3.2.9 The dot product between row vectors of A and column
vectors of B has to make sense, i.e. they have to be lists of equal length. This
means that the matrix product AB is only defined if the number of
columns of A equals the number of rows of B. The sizes of the matrices
in the matrix product combine as follows:

    An×p Bp×m = Cn×m .

WARNING 3.2.10 If the product AB is defined, this does not imply that
BA is defined!

Example 3.2.11 Here are a few examples:

    [ 1  0  2 ] [ 1  1  1 ]   [ 1+2  1  1+2 ]   [ 3  1  3 ]
    [ 0  1  1 ] [ 0  0  0 ] = [  1   0   1  ] = [ 1  0  1 ]
                [ 1  0  1 ]

    [ 1  0 ] [ 1  1 ]   [  1    1  ]   [ 1  1 ]
    [ 2  1 ] [ 1  1 ] = [ 2+1  2+1 ] = [ 3  3 ]

    [ 1  1 ] [ 1  0 ]   [ 1+2  1 ]   [ 3  1 ]
    [ 1  1 ] [ 2  1 ] = [ 1+2  1 ] = [ 3  1 ]

    [ 2  1 ] [ x1 ]   [ 2x1 + x2 ]
    [ 1  1 ] [ x2 ] = [  x1 + x2 ]

The last product shows how matrix multiplication relates to systems of linear
equations.

WARNING 3.2.12 As can be seen from the previous example, matrix
multiplication is not commutative even when both AB and BA make sense. It's
actually rare for the product of two matrices to commute! Write down a few
2-by-2 matrices and compute their products in both orders; you'll see that
very few, if any of them, will commute.
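A quick numpy check of the definition and of the warning above; the two matrices are arbitrary choices, not taken from the notes.

    import numpy as np

    A = np.array([[1, 1],
                  [0, 1]])
    B = np.array([[1, 0],
                  [2, 1]])

    # The (i, j)-entry of AB is the dot product of row i of A with column j of B.
    print(A @ B)                          # [[3 1], [2 1]]
    print(B @ A)                          # [[1 1], [2 3]]
    print(np.array_equal(A @ B, B @ A))   # False: AB and BA differ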

Theorem 3.2.13 Given matrices A, B, C of such dimensions that the prod-


ucts are defined, the matrix product satisfies the following properties:

1. A(BC) = (AB)C (associativity)

2. (A + B)C = AC + BC, A(B + C) = AB + AC (distributivity)

Proof. These are annoying to check, but they are best done using the shorthand
notation for matrices and the sum formula for the elements in the product
matrix. The interested student can try to check these.

3.3 The transpose of a matrix
Definition 3.3.1 Given an n-by-m matrix

        [ a1,1  · · ·  a1,m ]
    A = [  ..    ..     ..  ]
        [ an,1  · · ·  an,m ]

we define its transpose to be the m-by-n matrix

         [ a1,1  · · ·  an,1 ]
    AT = [  ..    ..     ..  ]
         [ a1,m  · · ·  an,m ] .

In other words the ith row vector of AT is the ith column vector of A.
Using our shorter notation, define a′ji = aij ; then if A = (aij ), i = 1, . . . , n,
j = 1, . . . , m, we get AT = (a′ji ), where j = 1, . . . , m and i = 1, . . . , n.

Example 3.3.2

    [ 1  0  1 ]T    [ 1  2 ]
    [ 2  1  1 ]   = [ 0  1 ]
                    [ 1  1 ]

Theorem 3.3.3 The transpose satisfies the following properties:

1. (AT )T = A

2. (A + B)T = AT + B T

3. (AB)T = B T AT

4. (cA)T = cAT

Proof. Most are obvious if you try a few examples. I will assign a few as
homework.

Definition 3.3.4 A matrix is symmetric if AT = A.

Symmetric matrices will come up later in the class when we study eigen-
values. They also have an important role in the definition of quadratic forms.
These are very important in number theory and are also used in geometry to
classify all plane conics. Again, this is outside the scope of this course. The
interested reader can look up these terms on Wikipedia.

3.4 Some important types of matrices


In this section we will look at some very important matrices that arise in
applications. Often we are interested in e.g. being able to factorize a ma-
trix into products of the following type. These are usually called matrix
decompositions and they are very important in e.g. numerical analysis and
when implementing matrix computations on a computer. Unfortunately, we
don’t have time to go into this. The most important class of matrices is the
following:

Definition 3.4.1 A matrix with the same number of rows and columns is
called a square matrix.

Square matrices have the nice property that if we restrict ourselves to


n-by-n square matrices, then we can add them and multiply them and the
result will always be an n-by-n square matrix. They are closed under the
standard matrix operations.

Definition 3.4.2 Denote by In the n-by-n square matrix with 1's on the
diagonal and 0's everywhere else. This is called the n-by-n identity
matrix, or just the identity matrix if the dimensions are clear from the context,
in which case we will simply write I. More concretely, the matrix looks as follows:

    [ 1  · · ·  0 ]
    [ ..   ..  .. ]
    [ 0  · · ·  1 ]

Theorem 3.4.3 Let A be an n-by-n square matrix. Then

IA = A = AI,

so I is the identity element with respect to matrix multiplication of n-by-n
matrices. This also explains the name of the matrix.
Proof. Write I = (bij ), A = (aij ), where i, j = 1, . . . , n, so that bij = 1 if
i = j and bij = 0 if i ≠ j. Letting IA = (cij ) and AI = (c′ij ), we get

    cij = bi1 a1j + . . . + bin anj = bii aij = aij

and

    c′ij = ai1 b1j + . . . + ain bnj = aij bjj = aij .

Definition 3.4.4 A square matrix with nonzero elements only on the diag-
onal is called a diagonal matrix.

Example 3.4.5 The matrix on the left is a diagonal matrix, while the matrix
on the right is not.

    [ −1  0  0 ]       [ −1  2  0 ]
    [  0  1  0 ]  ,    [  0  1  0 ]
    [  0  0  3 ]       [  0  0  3 ]

Definition 3.4.6 A square matrix is called upper triangular if all elements


below the diagonal are 0’s. Similarly, a matrix is called lower triangular if
all elements above the diagonal are 0’s.

Example 3.4.7 The matrix on the left is lower triangular, while the matrix
on the right is upper triangular.

    [ −1  0  0 ]       [ −1  2  0 ]
    [  4  1  0 ]  ,    [  0  1  0 ]
    [  1  0  3 ]       [  0  0  3 ]

Note 3.4.8 Note that a diagonal matrix is both lower and upper triangular.
Moreover, a diagonal matrix is also symmetric.

3.5 Systems of linear equations and elementary matrices

We start by looking again at a linear system of equations:

    a1,1 x1 + a1,2 x2 + . . . + a1,n xn = b1
        ...
    am,1 x1 + am,2 x2 + . . . + am,n xn = bm .

We may also write it in the matrix form AX = B, where A, X and B are as
follows:

         [ a1,1  · · ·  a1,n ] [ x1 ]   [ b1 ]
    AX = [  ..    ..     ..  ] [ .. ] = [ .. ] = B.
         [ am,1  · · ·  am,n ] [ xn ]   [ bm ]

If we evaluate the product on the left, the matrix equation simply becomes

    [ a1,1 x1 + a1,2 x2 + . . . + a1,n xn ]   [ b1 ]
    [                 ..                 ] = [ .. ]
    [ am,1 x1 + am,2 x2 + . . . + am,n xn ]   [ bm ]

and if we equate components, this will give us precisely the original linear
system.

Definition 3.5.1 The matrix A in the matrix notation is called the
coefficient matrix of the system, B is often called the constant vector and X
the vector of unknowns.

Definition 3.5.2 A linear system is consistent if it has at least one solution.


If it has no solutions, then it’s called inconsistent.

Definition 3.5.3 A linear system is underdetermined if it has fewer equa-


tions than unknowns. It is overdetermined if there are more equations than
unknowns.

Example 3.5.4 The system on the left is consistent while the one on the
right is not:

    x + y = 0          x + y = 1
    x − y = 0 ,        x + y = 0 .

In matrix form these equations are:

    [ 1   1 ] [ x ]   [ 0 ]        [ 1  1 ] [ x ]   [ 1 ]
    [ 1  −1 ] [ y ] = [ 0 ] ,      [ 1  1 ] [ y ] = [ 0 ] .

Given a linear system, we can clearly do the following without changing


the solution to the system:

1. We may multiply an equation with a nonzero constant.

2. We may switch the order of two equations in the system.

3. We may add a constant multiple of one equation to another.

Definition 3.5.5 The previous three kinds of operations are called elemen-
tary operations on a system. Notice the resemblance with Theorem 2.3.10.
This is not a coincidence.

Definition 3.5.6 Denote by Eij the n-by-n matrix for which every element
is a zero except that the (i, j)-element is 1.

Note 3.5.7 If A = (aij ) is an n-by-n matrix, then A = Σi Σj aij Eij (with both
sums running from 1 to n), so this gives us a nice notation for modifying matrices.

Definition 3.5.8 The following matrices will be called elementary matrices:

1. In + (a − 1)Eii .

2. In + Eij + Eji − Eii − Ejj .

3. In + aEij , i ≠ j.

The first one is just the identity matrix with an element a in (i, i). The
second one is the identity matrix with the ith and jth row interchanged.
The third one is the identity matrix, but with the element 0 in (i, j) replaced
with a.

Theorem 3.5.9 Let A be an n-by-m matrix, with the following row vectors:

        [ x1 ]
    A = [ .. ]
        [ xn ] .

Multiplying with the elementary matrices on the left has the following effect.
Write E1 = In + (a − 1)Eii , E2 = In + Eij + Eji − Eii − Ejj and E3 = In + aEij
and assume i < j. Then E1 A, E2 A and E3 A are the matrices whose rows,
from top to bottom, are

    E1 A :  x1 , . . . , xi−1 , axi , xi+1 , . . . , xn
    E2 A :  x1 , . . . , xi−1 , xj , xi+1 , . . . , xj−1 , xi , xj+1 , . . . , xn
    E3 A :  x1 , . . . , xi−1 , xi + axj , xi+1 , . . . , xn

Thus multiplying by E1 multiplies row i by a, multiplying by E2 interchanges
rows i and j, and multiplying by E3 adds axj to row i.
Proof. These are all straightforward using the short notation. Won’t go into
details.

Example 3.5.10 Here's an example of an elementary matrix of each type:

    [ a  0  0 ]       [ 1  0  0 ]       [ 1  a  0 ]
    [ 0  1  0 ]  ,    [ 0  0  1 ]  ,    [ 0  1  0 ]
    [ 0  0  1 ]       [ 0  1  0 ]       [ 0  0  1 ]

Multiplying a 3-by-2 matrix with the middle one gives e.g.

    [ 1  0  0 ] [ 1  1 ]   [ 1  1 ]
    [ 0  0  1 ] [ 2  4 ] = [ 3  1 ] ,
    [ 0  1  0 ] [ 3  1 ]   [ 2  4 ]

so it interchanges the second and third row as desired. The reader should
check that multiplying with the other two has the result claimed in the
previous theorem.
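A short numpy sketch of Theorem 3.5.9 and Example 3.5.10; the 3-by-2 matrix and the choice a = 2 below are arbitrary illustrations, not taken from the notes.

    import numpy as np

    a = 2.0
    E1 = np.eye(3); E1[0, 0] = a      # I + (a-1)E_{1,1}: scales row 1 by a
    E2 = np.eye(3)[[0, 2, 1]]         # identity with rows 2 and 3 exchanged
    E3 = np.eye(3); E3[0, 1] = a      # I + a*E_{1,2}: adds a*(row 2) to row 1

    A = np.array([[1.0, 1.0],
                  [2.0, 4.0],
                  [3.0, 1.0]])

    print(E1 @ A)   # row 1 multiplied by a
    print(E2 @ A)   # rows 2 and 3 interchanged
    print(E3 @ A)   # a*(row 2) added to row 1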

Definition 3.5.11 The operations performed on a matrix by multiplying by


an elementary matrix on the left are called elementary row operations. Thus
these are:
1. Multiplying a row by a nonzero constant.
2. Exchanging two rows of a matrix.
3. Adding a constant multiple of one row to another.

Definition 3.5.12 Let A be a matrix and let B be a matrix obtained from A
by performing elementary row operations; then A and B are said to be
equivalent.

Definition 3.5.13 Let A be an n-by-m matrix and B an n-by-p matrix.
We write (A | B) to denote the matrix

              [ a1,1  · · ·  a1,m   b1,1  · · ·  b1,p ]
    (A | B) = [  ..    ..     ..     ..    ..     ..  ]
              [ an,1  · · ·  an,m   bn,1  · · ·  bn,p ]

in other words, (A | B) is the matrix formed by taking the column vectors
of A and then appending the column vectors of B. The matrix (A | B) is
called the augmented matrix of A by B.

Definition 3.5.14 Given a linear system of equations

    a1,1 x1 + a1,2 x2 + . . . + a1,n xn = b1
        ...
    am,1 x1 + am,2 x2 + . . . + am,n xn = bm

we call the matrix

    [ a1,1  · · ·  a1,n   b1 ]
    [  ..    ..     ..    .. ]
    [ am,1  · · ·  am,n   bm ]

the augmented matrix of the system. In other words, it's just the matrix
(A | B), when the system is written as AX = B.

Definition 3.5.15 A linear system is called homogeneous if bi = 0 for all i.


If this does not hold, then the system is nonhomogeneous.
Homogeneous systems are important for the following reason. Assume we
are given the system AX = B and assume that we have found two different
vectors X1 and X2 as a solution. Then
A(X2 − X1 ) = B − B = 0,
so X2 − X1 is a solution to the corresponding homogeneous system AX = 0.
Thus, X2 = X1 + (X2 − X1 ), so the other solution is X1 plus a solution to the
homogeneous equation. Conversely, if Xh is a solution to the homogeneous
equation AX = 0, then
A(X1 + Xh ) = AX1 + AXh = B + 0 = B,
so X1 + Xh is a solution to the system AX = B. This argument shows that
to solve AX = B one needs to find one particular vector, say Xp , solving
it. Then we can look for all the solutions of the corresponding homogeneous
system AX = 0, which is generally easier to solve, and the full set of solutions
to AX = B will then be the vectors
Xp + Xh ,
where Xh varies over all the solutions of AX = 0.
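The particular-plus-homogeneous structure can also be seen numerically. The sketch below is not from the notes; it uses numpy on a small made-up consistent underdetermined system: lstsq produces one particular solution Xp, the SVD gives a homogeneous solution Xh, and every Xp + c·Xh again solves AX = B.

    import numpy as np

    # A made-up consistent 2-by-3 system AX = B (two equations, three unknowns).
    A = np.array([[1.0, 2.0, 1.0],
                  [0.0, 1.0, 1.0]])
    B = np.array([3.0, 1.0])

    # One particular solution Xp of AX = B.
    Xp, *_ = np.linalg.lstsq(A, B, rcond=None)

    # A solution Xh of the homogeneous system AX = 0: the last right-singular
    # vector of A spans the null space here.
    Xh = np.linalg.svd(A)[2][-1]

    # Every Xp + c*Xh is again a solution of AX = B.
    for c in (0.0, 1.0, -2.5):
        assert np.allclose(A @ (Xp + c * Xh), B)
    print("all checks passed")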

3.6 Gaussian elimination


The tool for solving systems of linear equations will be the following. As-
sume that we have a system with an augmented matrix looking like e.g. the
following:

    [ 1  1  1  3 ]
    [ 0  1  2  1 ]
    [ 0  0  1  1 ]

The corresponding linear system to this is

    x + y +  z = 3
        y + 2z = 1
             z = 1

We see that solving the system is very simple, since we already know the
value for z. Thus, we can plug in z = 1 into the second equation, then solve
for y = 1 and finally plug in y and z into the first equation to get x = 1.
The matrix defining the system has a very special form, which is what makes
solving the system easy.

Definition 3.6.1 We will say that a matrix is in row-echelon form if it


satisfies the following:
1. The first nonzero entry in each row is a 1.
2. The first nonzero entry in a lower row always appears to the right of
the first nonzero entry of an upper row.
3. All zero rows are at the bottom of the matrix.

Example 3.6.2 The matrix in the first example in this section is in row-
echelon form. Another one would be

    [ 1  0  2  −1  0 ]
    [ 0  0  1  −5  7 ]
    [ 0  0  0   1  0 ]
    [ 0  0  0   0  0 ]

Definition 3.6.3 The first nonzero element in each row is called a pivot.

Algorithm 3.6.4 A linear system can be solved using the following proce-
dure:
1. By only applying elementary row operations to its augmented matrix,
transform the matrix into row-echelon form.
2. Now solve as in the first example of this section (if possible) by solving
for variables from the bottom up.

Definition 3.6.5 The procedure used to perform step (1) is called Gaussian
elimination. The procedure is best understood through a few examples, but
I’ll give an algorithmic description below.

Algorithm 3.6.6 (Gaussian elimination) Let A be an n-by-m matrix.

1. Look for the leftmost nonzero column of A.

2. Switch rows if needed, so that the topmost element in that column, i.e. the
pivot, is nonzero.

3. Multiply the row with the pivot by a scalar, so that the pivot is 1.

4. Make the elements below the pivot zero by subtracting a multiple
of the pivot row from each row below the pivot.

5. Continue the procedure on the "submatrix" that you get by removing
the pivot row and column.
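Below is a minimal Python sketch of Algorithm 3.6.6 (the helper is not from the notes, and it pays no attention to numerical stability); running it on the augmented matrix of Example 3.6.7 reproduces the row-echelon form obtained there.

    import numpy as np

    def row_echelon(M, tol=1e-12):
        """Reduce a copy of M to row-echelon form following Algorithm 3.6.6."""
        A = M.astype(float).copy()
        n, m = A.shape
        row = 0
        for col in range(m):
            # Steps 1-2: find a row at or below `row` with a nonzero entry here.
            pivots = [r for r in range(row, n) if abs(A[r, col]) > tol]
            if not pivots:
                continue
            A[[row, pivots[0]]] = A[[pivots[0], row]]    # swap rows if needed
            A[row] /= A[row, col]                        # step 3: make the pivot 1
            for r in range(row + 1, n):                  # step 4: clear entries below
                A[r] -= A[r, col] * A[row]
            row += 1                                     # step 5: move to the submatrix
            if row == n:
                break
        return A

    M = np.array([[1, 3, -2, -7],
                  [4, 1, 3, 5],
                  [2, -5, 7, 19]])
    print(row_echelon(M))   # rows (1, 3, -2, -7), (0, 1, -1, -3), (0, 0, 0, 0)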

Example 3.6.7 We want to transform the following matrix into row-echelon


form using only elementary row operations:

    [ 1   3  −2  −7 ]
    [ 4   1   3   5 ]
    [ 2  −5   7  19 ]

This works as follows. We see that for the matrix to be in row-echelon form,
the elements below the top-left element need to be zeros. Denote the ith row
by Ri . We can replace R2 by R2 − 4R1 and R3 by R3 − 2R1 . The process
gives the following chain:
    [ 1   3  −2  −7 ]
    [ 4   1   3   5 ]
    [ 2  −5   7  19 ]
        ⇒  (R2 ← R2 − 4R1,  R3 ← R3 − 2R1)
    [ 1    3   −2   −7 ]
    [ 0  −11   11   33 ]
    [ 0  −11   11   33 ]
        ⇒  (R2 ← −R2/11,  R3 ← −R3/11)
    [ 1  3  −2  −7 ]
    [ 0  1  −1  −3 ]
    [ 0  1  −1  −3 ]
        ⇒  (R3 ← R3 − R2)
    [ 1  3  −2  −7 ]
    [ 0  1  −1  −3 ]
    [ 0  0   0   0 ]

This matrix, now in row-echelon form, corresponds to the system

    x + 3y − 2z = −7
         y −  z = −3
              0 =  0 ,

which is then equivalent to the original one, thus has the same set of solutions.
Note that even though we started with three equations, the system essentially
reduced to a system of two. This will be explained in more detail in the next
section. The system is now underdetermined, so we can solve it as follows.
Since nothing pins down z, we may choose its value freely, say, z = 1. From
the second equation we then get y = −2 and finally from the first x = 1.

Example 3.6.8 Here's an example of what happens when a system has no
solution. Assume our system has the augmented matrix

    [  1  2  1  0 ]
    [ −1  0  2  1 ]
    [  0  2  3  2 ]

Applying Gaussian elimination on this matrix gives us the following chain:

    [  1  2  1  0 ]
    [ −1  0  2  1 ]
    [  0  2  3  2 ]
        ⇒  (R2 ← R2 + R1)
    [ 1  2  1  0 ]
    [ 0  2  3  1 ]
    [ 0  2  3  2 ]
        ⇒  (R3 ← R3 − R2)
    [ 1  2  1  0 ]
    [ 0  2  3  1 ]
    [ 0  0  0  1 ]
        ⇒  (R2 ← R2/2)
    [ 1  2  1    0   ]
    [ 0  1  3/2  1/2 ]
    [ 0  0  0    1   ]

Translating this back to a linear system, we see that our system is

    x + 2y +    z = 0
         y + 3z/2 = 1/2
               0z = 1 .

Obviously, there's no way to satisfy the last equation.

Example 3.6.9 Here's a more complicated example where we need to apply
all row operations, including row switching. This is also closer to what solving
a real-life system would amount to, since the elements in the matrix typically
end up being rational numbers, making the computation slightly annoying.

    [ 1  3  5  −1  2 ]
    [ 1  3  0   1  1 ]
    [ 2  2  8   4  0 ]
    [ 1  0  3   0  1 ]
        ⇒  (R2 ← R2 − R1,  R3 ← R3 − 2R1,  R4 ← R4 − R1)
    [ 1   3   5  −1   2 ]
    [ 0   0  −5   2  −1 ]
    [ 0  −4  −2   6  −4 ]
    [ 0  −3  −2   1  −1 ]
        ⇒  (R2 ↔ R3)
    [ 1   3   5  −1   2 ]
    [ 0  −4  −2   6  −4 ]
    [ 0   0  −5   2  −1 ]
    [ 0  −3  −2   1  −1 ]
        ⇒  (R2 ← −R2/4)
    [ 1   3   5    −1    2 ]
    [ 0   1   1/2  −3/2  1 ]
    [ 0   0  −5     2   −1 ]
    [ 0  −3  −2     1   −1 ]
        ⇒  (R4 ← R4 + 3R2)
    [ 1  3   5     −1    2 ]
    [ 0  1   1/2   −3/2  1 ]
    [ 0  0  −5      2   −1 ]
    [ 0  0  −1/2   −7/2  2 ]
        ⇒  (R3 ← −R3/5)
    [ 1  3   5     −1    2   ]
    [ 0  1   1/2   −3/2  1   ]
    [ 0  0   1     −2/5  1/5 ]
    [ 0  0  −1/2   −7/2  2   ]
        ⇒  (R4 ← R4 + R3/2)
    [ 1  3  5    −1      2     ]
    [ 0  1  1/2  −3/2    1     ]
    [ 0  0  1    −2/5    1/5   ]
    [ 0  0  0   −37/10  21/10  ]
        ⇒  (R4 ← −10R4/37)
    [ 1  3  5    −1     2     ]
    [ 0  1  1/2  −3/2   1     ]
    [ 0  0  1    −2/5   1/5   ]
    [ 0  0  0     1   −21/37  ]

3.7 Rank of a matrix


Definition 3.7.1 Let A be an n-by-m matrix and denote the row vectors of
A by x1 , . . . , xn . We define the rank of A to be the dimension of

    V = span(x1 , . . . , xn ),

i.e. rank A = dim V .

From chapter 2 we already know a method for computing this. We just


apply our algorithm that removes vectors from the list of row vectors with-
out changing the span. Once we are left with a linearly independent list of
vectors, then the size of that list will be the rank of A.
However, there is a much more efficient way of computing this. Assume
that we have applied Gaussian elimination on the matrix A transforming it
into row-echelon form. Gaussian elimination only performs elementary row-
operations on the matrix, but Theorem 2.3.10 says that these won’t affect

the span! So given a matrix in row-echelon form, is there an easy way to
figure out the dimension of the span of the row vectors?
Let's look at an example. The following matrix from the previous section
is in row-echelon form

    [ 1  0  2  −1  0 ]
    [ 0  0  1  −5  7 ]
    [ 0  0  0   1  0 ]
    [ 0  0  0   0  0 ]

and we’ll denote the row vectors by x1 , . . . , x4 . First notice that the last
vector is the zero vector, so it doesn’t add anything to the span, so we can
just throw x4 away. We are then left with three row vectors corresponding
to the nonzero ones. Now look at the equation

0 = c1 x 1 + c2 x 2 + c3 x 3

in R5 . Notice that the first component of x1 is 1, since the vector equals


(1, 0, 2, −1, 0), but the first component is zero for all other vectors. If we
look at the above equation only for the first component it will be

0 = c1 · 1 + c2 · 0 + c3 · 0.

This forces c1 = 0. The equation for the third component will then become

0 = 0 · 2 + c2 · 1 + c3 · 0,

so again c2 = 0 and finally c3 = 0. Therefore our three row vectors are


linearly independent and the rank of the matrix is 3! You should convince
yourself that for any matrix in row-echelon form, the previous argument will
work. This provides us with the following method for computing the rank of
a matrix:

Algorithm 3.7.2 (Compute rank of matrix) Let A be an n-by-m matrix.

1. Apply Gaussian elimination to reduce A to row-echelon form.

2. Count the number of nonzero rows in the row-echelon form.

3. The number of nonzero rows is the rank of the matrix.

This method has another nice application to material we did on vector
spaces. Assume that we are given some vectors x1 , . . . , xk ∈ Rn and we want
to figure out the dimension of V = span(x1 , . . . , xk ). This can be done using
the idea of the previous algorithm as follows:

Algorithm 3.7.3 (Find basis of V = span(x1 , . . . , xk )) We perform the


following steps:

1. Form a k-by-n matrix A which has the vectors x1 , . . . , xk as row vectors.

2. Compute the rank using the previous algorithm.

3. The rank will be the dimension of V and the nonzero rows of the row-
echelon form of A will form a basis of V .

Note 3.7.4 The previous algorithm provides us with a basis of V . It will


not however find a subset of the list x1 , . . . , xk which is a basis. To do that
you have to use Algorithm 2.5.2.
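As a sanity check, numpy's matrix_rank computes the same number as Algorithm 3.7.2 (by a different internal method). The sketch below, not from the notes, applies it to the row-echelon matrix of this section and then uses the idea of Algorithm 3.7.3 on the vectors of Example 2.5.4.

    import numpy as np

    # The row-echelon matrix above: three nonzero rows, so rank 3.
    A = np.array([[1, 0, 2, -1, 0],
                  [0, 0, 1, -5, 7],
                  [0, 0, 0, 1, 0],
                  [0, 0, 0, 0, 0]])
    print(np.linalg.matrix_rank(A))   # 3

    # Algorithm 3.7.3: dim span(x1, x2, x3) for x1 = (1,0), x2 = (0,1), x3 = (1,1),
    # obtained by stacking the vectors as the rows of a matrix.
    V = np.array([[1, 0],
                  [0, 1],
                  [1, 1]])
    print(np.linalg.matrix_rank(V))   # 2, so the span has dimension 2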

3.8 Rank and systems of linear equations


There’s still an important application of rank. The rank of the augmented
matrix of a linear system lets us analyze whether or not a linear system of
equations has any solutions, and, if it has, if there’s a unique solution or
infinitely many of them. If we are given a system AX = B the size of its
solution set will be completely determined by comparing the rank of A to
the rank of (A | B).
Assume that we start with a matrix A and we reduce it to row-echelon form.
Since the rank of A can never exceed the number of rows of A, we have two
cases:

1. The row-echelon form has no zero rows. We say it has full rank, since
the rank equals the number of rows of A.

2. The row-echelon form has one or more zero rows at the bottom, so the
rank of A is less than the number of rows of A.

Now let's think what will happen if we add one more column to A and compute
the row-echelon form of the new matrix. If you look at the examples of

Gaussian elimination from the previous section, you’ll notice that at each
stage of the algorithm we only care about one column at a time, since we try
to clear the elements to zero below our current pivot. Therefore, nothing will
change in the algorithm if we add a new column, except when we reach that
column in the algorithm. The end result is that adding one more column to
the right might turn a zero row of the old row-echelon form into a nonzero
row in the row-echelon form of the new matrix. Thus we get the following:

Proposition 3.8.1 Given a linear system AX = B we have the following


possibilities:
1. rank A = rank(A | B).

2. rank A < rank(A | B).

The significance is that in the first case the system has solutions while in
the second case it does not. The second case essentially means that the
row-echelon form of (A | B) will have a row of the following form

    [ 0  · · ·  0  1 ] .

But in the linear system corresponding to the matrix this will correspond
to an equation 0 = 1, which has no solution! Thus we have the following
theorem:

Theorem 3.8.2 A linear system AX = B has a solution if and only if


rank A = rank(A | B).

This now gives us a simple algorithm for determining if a system has a


solution:

Algorithm 3.8.3 (Checking if a linear system has a solution) Let the system
be AX = B and let (A | B) be the augmented matrix. We do the following:
1. Reduce the matrix (A | B) into row-echelon form.

2. Let M denote the reduced matrix and N the reduced matrix with the
last column removed.

3. If rank M > rank N , then the system has no solution.

Finally, if the system has solutions then we have two cases:

1. The system has a unique solution.

2. The system has more than one solution.

Now assume that the latter holds, then we may find two solutions to the
equation AX = B, so let X1 and X2 be solutions. As we saw earlier this
means that
A(X1 − X2 ) = 0,
but then we can multiply X1 − X2 by a scalar getting c(X1 − X2 ) and, again,

A(c(X1 − X2 )) = cA(X1 − X2 ) = 0,

so it follows that X1 + c(X1 − X2 ) is a solution to the system for all choices


of c. Since X1 − X2 is not the zero vector (the solutions being distinct) it
follows that the system has infinitely many solutions. Hence, we have the
theorem:

Theorem 3.8.4 A consistent linear system has either a unique solution or


infinitely many solutions.

There’s a very simple way to distinguish between these. Let A be the


n-by-m coefficient matrix of AX = B, so the system has n equations and m
unknowns. Then the following holds:

1. If rank A < m, then we have infinitely many solutions.

2. If rank A = m, then we have a unique solution.

To collect everything together we have the following algorithm for com-


pletely solving a linear system:

Algorithm 3.8.5 (Solving a linear system) Assume we are given a system


of n equations and m unknowns, so AX = B, where A is an n-by-m matrix.
We do the following:

1. Compute the row-echelon form of (A | B).

2. Let M denote the reduced matrix and N the reduced matrix with the
last column removed.

3. If rank M > rank N , then stop since the system has no solutions.

4. If rank M = rank N = m, then the system has a unique solution.

5. If rank M = rank N < m, then the system has infinitely many solu-
tions.

6. If we have solutions and r = rank M , then solve by back substitution


as in the example in the beginning of section 3.6. During the back
substitution we can freely choose the values of m − r of the variables. (See the sketch following this algorithm.)
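
Here is a minimal numerical sketch of the rank test in Algorithm 3.8.5 (my own illustration; the particular A and B are made up, and numpy's solve is only used in the unique-solution branch, where A is square and invertible):

import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [0.0, 1.0, 1.0]])
B = np.array([1.0, 2.0, 3.0])

m = A.shape[1]                                   # number of unknowns
rank_A  = np.linalg.matrix_rank(A)
rank_AB = np.linalg.matrix_rank(np.column_stack([A, B]))

if rank_A < rank_AB:
    print("no solutions")                        # a row (0 ... 0 | 1) appears
elif rank_A == m:
    print("unique solution:", np.linalg.solve(A, B))
else:
    print("infinitely many solutions,", m - rank_A, "free variable(s)")

For this example rank A = rank(A | B) = 2 < m = 3, so the script reports infinitely many solutions with one free variable.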

3.9 Determinants
In math 114 you have already met a form of the determinant when computing
the cross product of two vectors. The determinant is a tool that you can
think about as follows. You feed it a list of n vectors in Rn and out comes a
real number. In other words it’s a function that takes as input vectors and
outputs a number. The function is denoted by det and in the case n = 2 it
would give us a number

det((a1 , b1 ), (a2 , b2 )).

The problem of this section is trying to determine what we would like this
function to do and then how to define it.
The starting point is the following. Assume that we are given two vectors
x1 , x2 in R2 . Then from 114 you might remember that they form a paral-
lelogram with vertices 0, x1 , x2 , x1 + x2 . We would like the determinant to
measure the signed area, so

det(x1 , x2 ) = ±area of parallelogram.

For three vectors y1 , y2 , y3 ∈ R3 , we would like the determinant to measure


the signed volume of the parallelepiped they span. This concept of sign
has to do with something called the orientation of the measured geometric
object. We won’t go into that.

This idea of "volume" can be generalized to Rn and we want the previous
two examples to generalize to a definition of volume in higher dimensions. If
you don’t want to think about what something like that would philosophically
mean (e.g. what’s a cube in 4-dimensions and what’s the volume?), just
assume that all the following examples have n = 2 or n = 3.
If the determinant measures volume, then it should certainly satisfy the
following
det(cx1 , x2 , . . . , xn ) = c det(x1 , x2 , . . . , xn ), (1)
since if we stretch (or shrink) the parallelepiped in one direction by a factor of
c, then certainly the volume also changes by a factor of c. Now if you add a
vector x to x1 , then we should get

det(x1 + x, x2 , . . . , xn ) = det(x1 , x2 , . . . , xn ) + det(x, x2 , . . . , xn ) (2)

since the parallelepiped on the left can be cut into pieces and reassembled to
form the ones on the right (draw a picture of the n = 2 case). Also, if two of the
vectors are the same, then the volume should be zero, since the object is "flat":
in R3 the parallelepiped has no height, and similarly the parallelogram defined
by two copies of the same vector has no height. Thus we want that

det(x1 , . . . , xn ) = 0 (3)

if xi = xj for some i ≠ j.
To make notation easier, we can define the determinant on matrices. Since
the determinant takes as input n vectors x1 , . . . , xn ∈ Rn , we can just let A
be an n-by-n matrix where the row vectors are x1 , . . . , xn i.e.
 
A = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}

and then let det A = det(x1 , . . . , xn ). Now we are ready to actually define
the determinant function through a list of conditions we want it to satisfy:

Definition 3.9.1 The determinant is a function that takes as input an n-


by-n matrix and outputs a number. The function is assumed to have the
following properties:
1. The identity matrix has determinant 1, so det I = 1.

2. If we replace the ith row vector by ax + by we have the following
equality:

det(x1 , . . . , xi−1 , ax + by, xi+1 , . . . , xn ) =


a det(x1 , . . . , xi−1 , x, xi+1 , . . . , xn ) + b det(x1 , . . . , xi−1 , y, xi+1 , . . . , xn )

3. If two rows in A are equal, then det A = 0.

WARNING 3.9.2 The determinant is only defined for square matrices, so


whenever we speak of the determinant of a matrix it’s automatically assumed
that the matrix is a square matrix.

Note 3.9.3 These conditions are nothing but a compact way of expressing
what we had just done above. At least the middle condition deserves some
comment. It actually says a few things. First, if b = 0, then it simply says
that multiplying a row by the constant a multiplies the determinant by a,
so it’s just the equation (1) above. When a = 1 it’s simply the condition
(2) about adding another vector to one of the rows. The last condition is
precisely condition (3).

So we haven’t actually given a formula for computing the determinant and


we don’t even know if there is a function that satisfies our properties, since
the list of properties might be contradictory. For example you can’t find a
function s.t. f (1) = 0 and f (1) = 1, so just listing a bunch of properties and
saying we pick a function that has these properties is not very correct from a
mathematical point of view. Thus, our current definition of the determinant
is more like a wishlist. However, in this course we will just take for granted
that the determinant function exists and it has all the properties listed.

3.10 Properties of the determinant


In this section we will derive some properties of the determinant starting
from the three properties listed in the definition of the determinant. These
properties end up being extremely important for practical computations.

Theorem 3.10.1 Interchanging two rows of a matrix changes the sign of


the determinant.

Proof. I’ll use the vector notation for this one, since it’s easier. Assume that
A has row vectors x1 , . . . , xn and assume we switch the first and second row.
The argument for any other pair is the same, but the notation is just messier.
We expand using property (2) and use (3) to get rid of terms, so

0 = det(x1 + x2 , x1 + x2 , x3 , . . . , xn )
= det(x1 , x1 , x3 , . . . , xn ) +
det(x1 , x2 , x3 , . . . , xn ) +
det(x2 , x1 , x3 , . . . , xn ) +
det(x2 , x2 , x3 , . . . , xn )
= det(x1 , x2 , x3 , . . . , xn ) +
det(x2 , x1 , x3 , . . . , xn )

Since the sum of the last two terms is zero, they must have the same magni-
tude and opposite signs.

Theorem 3.10.2 If a matrix contains a row of all zeros, then the determi-
nant is zero.
Proof. I’ll assume the first row is zero to simplify notation. The proof for
any other row is the same. Let the row vectors be 0, x2 , . . . , xn . Then

det(0, x2 , . . . , xn ) = det(0 + 0, x2 , . . . , xn )
= det(0, x2 , . . . , xn ) + det(0, x2 , . . . , xn )

Now subtract det(0, x2 , . . . , xn ) from both sides.

Theorem 3.10.3 Adding a scalar multiple of one row to a different row will
not change the value of the determinant.
Proof. Assume we add a scalar multiple of the row xi , i ≠ 1, to the first
row. The general case is again similar. Let the row vectors be x1 , x2 , . . . , xn .
Then

det(x1 + axi , x2 , . . . , xn ) = det(x1 , x2 , . . . , xn ) + a det(xi , x2 , . . . , xn ).

But xi occurs twice in the list of vectors in the second term, so property (3)
of the determinant tells us that the term is zero.

The following theorem is very important and the observation used in the
proof will be used to actually compute determinants.

Theorem 3.10.4 If an n-by-n matrix has rank less than n, then the deter-
minant is zero.
Proof. Let A have rank(A) < n. Let B be the row-echelon form of A. Then
B will have a row of all zeros at the bottom, so by the previous theorem the
determinant is zero. The theorem then follows from observing the following:

1. If the determinant of a matrix is nonzero, then multiplying a row by


a nonzero constant won’t make the determinant zero (since, the deter-
minant just gets multiplied by the constant).

2. Interchanging two rows of a matrix will not make the determinant zero,
since it just switches the sign of the determinant.

3. Adding a constant multiple of one row to a different row does not


change the determinant.

Thus we see that applying elementary row operations on a matrix with


nonzero determinant won’t make the determinant zero. Since B has been
derived from A precisely in this way, det B = 0 only if det A = 0.
The next theorem is the basis for actually computing determinants, since
it gives an extremely simple formula to compute the determinant for certain
matrices.

Theorem 3.10.5 If a matrix is upper or lower triangular, then the deter-


minant equals the product of the diagonal elements.
Proof. Doing this in full generality is quite annoying, so I’ll show the idea in
the case of a 3-by-3 matrix for the upper triangular case. Thus, we have the
matrix:

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{pmatrix}
First notice that if any of the diagonal elements are zero, then reducing the
matrix to row-echelon form will produce a zero row, so the determinant is

zero. Thus the determinant equals the product of the diagonal elements in
this case.
Now assume that all diagonal elements are nonzero. Subtracting a_{23} a_{33}^{-1}
times the third row from the second row and a_{13} a_{33}^{-1} times the third row from
the first row, the matrix becomes

\begin{pmatrix} a_{11} & a_{12} & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}.

Now subtract a_{12} a_{22}^{-1} times the second row from the first, so the matrix
becomes

\begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}.
What we have left is just the identity matrix with the rows multiplied by
some scalars. In other words
   
\det \begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}
= a_{11} \det \begin{pmatrix} 1 & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}
= a_{11} a_{22} \det \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & a_{33} \end{pmatrix}
= a_{11} a_{22} a_{33} \det \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= a_{11} a_{22} a_{33},

since the identity matrix has determinant 1. The proof of the lower triangular
case is essentially the same.
The following is the most efficient way of computing the determinant for
large matrices.

Algorithm 3.10.6 (Determinant by Gaussian elimination) Let A be an n-


by-n matrix.

1. Reduce the matrix to row-echelon form keeping track of how many rows
have been switched and how many times a row has been multiplied by
a constant. (These are the operations that change the determinant)

2. The row-echelon matrix is upper triangular, so compute its determi-


nant by multiplying the elements on the diagonal. Let the computed
determinant of the row-echelon form be d.

3. Assume that rows have been multiplied by the scalars c1 , c2 , . . . , ck in


order during the Gaussian elimination and a row has been switched s
times with another.

4. det A = (−1)^s (c_1 \cdots c_k)^{-1} d.
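
Algorithm 3.10.6 is easy to mechanize. The sketch below (my own illustration) uses only row switches and row additions, so the determinant of the original matrix is just (−1)^s times the product of the diagonal of the echelon form; the result is compared against numpy's built-in determinant.

import numpy as np

def det_by_elimination(A):
    """Determinant via Gaussian elimination (Algorithm 3.10.6); rows are never scaled."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    swaps = 0
    for j in range(n):
        p = j + np.argmax(np.abs(U[j:, j]))       # pick a nonzero pivot at or below row j
        if np.isclose(U[p, j], 0.0):
            return 0.0                            # rank < n, so the determinant is zero
        if p != j:
            U[[j, p]] = U[[p, j]]                 # a row switch changes the sign
            swaps += 1
        for i in range(j + 1, n):
            U[i] -= (U[i, j] / U[j, j]) * U[j]    # adding a multiple of a row changes nothing
    return (-1) ** swaps * np.prod(np.diag(U))

A = [[2, 0, 1], [-2, 3, 4], [-5, 5, 6]]
print(det_by_elimination(A), np.linalg.det(A))    # both approximately 1.0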

The determinant also has the following two properties that are much
harder to prove:

Theorem 3.10.7 det(AB) = det(A) det(B).

Theorem 3.10.8 det(A) = det(AT ).

The latter theorem has lots of useful consequences. Essentially any claim
about columns becomes a claim about rows in the transpose, so we can
translate all our theorems to involve operations on columns instead of rows.
Using this and our previous results, we can summarize the following long list
of properties for the determinant:

1. Interchanging two rows or columns in a matrix A changes the sign of


the determinant.

2. For a triangular (upper or lower) matrix A, the determinant is the


product of the diagonal elements. In particular, det I = 1.

3. If a matrix A has a zero row or column, then det A = 0.

4. If one row or column of A is a linear combination of the others, then


det A = 0.

5. det A does not change if we add a scalar multiple of a row or column


to another row or column.

6. Multiplying a row or a column by a constant c multiplies the determi-
nant by c.

7. det A = det AT .

8. det(AB) = det(A) det(B).

9. For an n-by-n matrix A, det(cA) = c^n det A (we multiply all n rows by c).

10. If an n-by-n matrix A has rank A < n, then det A = 0.

3.11 Some other formulas for the determinant


In this chapter we list some other concrete ways of computing the determi-
nant. I won’t provide any proof or justification for these, since it takes quite
a bit of work to derive them from the results in the previous chapter.

Theorem 3.11.1 (Determinant of 2-by-2 matrix) For a 2-by-2 matrix the


determinant has the following simple formula:
 
\det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21}.

Proof. This follows from row-reducing the matrix.

Theorem 3.11.2 (Determinant of 3-by-3 matrix) The determinant of a 3-


by-3 matrix

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}

has the following simple visual rule (the rule of Sarrus): copy the first two
columns to the right of the matrix; the three products along the diagonals
running down to the right are taken with a plus sign, and the three products
along the diagonals running up to the right are taken with a minus sign.
In other words,

\det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}.
Proof. Again we can reduce symbolically to row-echelon form. The details
are quite messy.

Unfortunately no simple formula exists for n-by-n matrices when n > 3.


The only way to compute the determinant is either using the methods from
the previous section or by using the method of cofactor expansion described
next.

Definition 3.11.3 Let A be a matrix. The (i, j) minor of A is the matrix


that we get by removing the ith row and jth column of A.

Example 3.11.4 The matrix on the right is the (2, 1) minor of the one on
the left:

\begin{pmatrix} -3 & 2 & -1 \\ 0 & 5 & 3 \\ 1 & -1 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}

Definition 3.11.5 Let A = (aij ) be an n-by-n matrix. The cofactor Cij is


defined to be (−1)i+j det Mij , where Mij is the (i, j) minor of A.

Theorem 3.11.6 Let A = (aij ) be an n-by-n square matrix. Then the


determinant satisfies the following:
det A = ai1 Ci1 + ai2 Ci2 + . . . + ain Cin .
Thus we multiply each element in row i with the cofactor we get by remov-
ing the row and column of the element. Similarly, we can do the same for
columns, so
det A = a1i C1i + a2i C2i + . . . + ani Cni .

Definition 3.11.7 The method of computing the determinant in the previ-


ous theorem is called the cofactor expansion of the determinant.

Note 3.11.8 The previous formula lets you express the determinant of an
n-by-n matrix as a sum of determinants of (n − 1)-by-(n − 1) matrices.

Example 3.11.9 Expanding the determinant in the previous example along the third
column gives

\det \begin{pmatrix} -3 & 2 & -1 \\ 0 & 5 & 3 \\ 1 & -1 & 2 \end{pmatrix}
= (-1)(-1)^{1+3} \det \begin{pmatrix} 0 & 5 \\ 1 & -1 \end{pmatrix}
+ 3(-1)^{2+3} \det \begin{pmatrix} -3 & 2 \\ 1 & -1 \end{pmatrix}
+ 2(-1)^{3+3} \det \begin{pmatrix} -3 & 2 \\ 0 & 5 \end{pmatrix}.

Note 3.11.10 The cofactor expansion is rarely an efficient method for com-
puting determinants of matrices larger than 4-by-4. However, in some special
cases it might be useful. For example if there’s a row or column in the matrix
that has mostly zeros in it, then computing the expansion along that row or
column results in a very few terms in the sum in Theorem 3.11.6, since most
aij ’s will be zeros.

Example 3.11.11 The following matrix has mostly zeros in the last column,
so the expansion along that column gives
 
\det \begin{pmatrix} -3 & 2 & 0 \\ 0 & 5 & 0 \\ 1 & -1 & 2 \end{pmatrix}
= 0 \cdot (-1)^{1+3} \det \begin{pmatrix} 0 & 5 \\ 1 & -1 \end{pmatrix}
+ 0 \cdot (-1)^{2+3} \det \begin{pmatrix} -3 & 2 \\ 1 & -1 \end{pmatrix}
+ 2 \cdot (-1)^{3+3} \det \begin{pmatrix} -3 & 2 \\ 0 & 5 \end{pmatrix}
= 2 \det \begin{pmatrix} -3 & 2 \\ 0 & 5 \end{pmatrix}

Similarly, we could expand along the second row, which would have the same
effect.
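
As a small illustration of Theorem 3.11.6 (my own addition), cofactor expansion along the first row translates directly into a recursive function:

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row; A is a list of row lists."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]    # delete row 1 and column j+1
        total += A[0][j] * (-1) ** j * det_cofactor(minor)  # (-1)**(1 + (j+1)) = (-1)**j
    return total

print(det_cofactor([[-3, 2, -1], [0, 5, 3], [1, -1, 2]]))   # -28

The recursion makes the exponential cost visible: an n-by-n determinant calls n determinants of size n − 1, which is why Gaussian elimination is preferred beyond small matrices.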

3.12 Matrix inverse
Definition 3.12.1 Let A be an n-by-n matrix. An n-by-n matrix B is called
a left inverse if BA = I. Similarly, B is called a right inverse if AB = I. If
BA = I = AB, then B is called an inverse of A.
Matrix multiplication takes a lot of work on larger matrices, which should
be apparent by now. Therefore, checking that a matrix is an inverse takes
a lot of work, since we have to perform two matrix multiplications to do it.
Fortunately, the following result allows us to cut the work in half.

Theorem 3.12.2 Let B be a left inverse of A, then B is also a right inverse.


Thus to check if B is an inverse of A, one only needs to check that BA = I.
Proof. This is tricky without the theory of linear maps. The proof can be
found in most texts on linear algebra.

Corollary 3.12.3 If AB = I, then BA = I.


Proof. A is now a left inverse of B, so it’s also a right inverse by the previous
theorem.

Theorem 3.12.4 If a matrix A has an inverse, then it is unique. This lets


us talk about the inverse of A.
Proof. Assume that both B and C are inverses of A. Then
B = BI = B(AC) = (BA)C = IC = C.

Definition 3.12.5 A matrix A with an inverse is called an invertible matrix.


The inverse of A will be denoted by A−1 .

The following shows how the matrix inverse is extremely useful for solv-
ing systems of linear equations. Assume we are given an invertible n-by-n
matrix A. Now given a system of linear equations
AX = B

we can just multiply both sides by A−1 from the left. It follows that

X = A−1 B.

What’s important about this is that the matrix inverse gives the solution
right away for any choice of vector B, since the solution X is now a function
of B.

Example 3.12.6 You should check by multiplication that the following ma-
trices are inverses:
   
A = \begin{pmatrix} 2 & 0 & 1 \\ -2 & 3 & 4 \\ -5 & 5 & 6 \end{pmatrix}, \qquad
A^{-1} = \begin{pmatrix} -2 & 5 & -3 \\ -8 & 17 & -10 \\ 5 & -10 & 6 \end{pmatrix}

Thus if we have a linear system AX = B, then we get


      
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} -2 & 5 & -3 \\ -8 & 17 & -10 \\ 5 & -10 & 6 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}
= \begin{pmatrix} -2b_1 + 5b_2 - 3b_3 \\ -8b_1 + 17b_2 - 10b_3 \\ 5b_1 - 10b_2 + 6b_3 \end{pmatrix},

so once we know the matrix inverse of the coefficient matrix of a system, solving
it becomes a triviality. The moral of the story is that if you need to solve
many systems of equations with the same coefficient matrix, but varying
constant vectors, then the most economical approach is to try to compute
the matrix inverse of the matrix. If it exists, then you will get a formula like
the above, where you can just plug in your constants i.e. you don’t need to
do Gaussian elimination again for every choice of constant vector.

Having a nice tool like the matrix inverse is not of much use unless we ac-
tually know how to compute it. The rest of this section will be devoted to
developing an algorithm that lets you both compute it, if it exists, as well as
determine that it does not.
Assume we are given matrices A, B with the same number of rows and
let (A | B) again be the augmented matrix. Assume we want to perform an
elementary row operation on (A | B), say, we want to switch two rows in
the matrix. We know from previous sections that this can be performed by

multiplying with an elementary matrix E from the left. The reader should
convince himself/herself that the following formula then holds

E(A | B) = (EA | EB).

It simply says that it doesn’t matter if we switch rows in the whole matrix or
if we do it separately for the two parts. The end result will still be the same.
The same thing is true for any of the elementary row operations. Thus we
have the following theorem:

Theorem 3.12.7 Let A and B the two matrices with n rows and let E be
an elementary n-by-n matrix. Then

E(A | B) = (EA | EB).

Next assume that A is an n-by-n square matrix and I is the identity matrix.
Assume that we can transform A into the identity matrix by a sequence of
elementary row operations. Since each row operation corresponds to mul-
tiplying from the left by an elementary matrix E, there will be a sequence
of elementary matrices E1 , . . . , Ek corresponding to the row operations, in
other words
Ek Ek−1 · · · E2 E1 A = I. (4)
Now what happens if we perform the same row operations on the matrix
(A | I)? Then we will instead end up with

Ek Ek−1 · · · E2 E1 (A | I) = (I | Ek Ek−1 · · · E2 E1 ).

Thus the second half of the matrix will contain the matrix B = Ek Ek−1 · · · E2 E1 .
However, (4) tells us that BA = I, so we have in fact found a way to compute
the matrix inverse! It follows that we have the following theorem:

Theorem 3.12.8 Assume that an n-by-n matrix A can be converted into


the identity matrix through elementary row operations. Then A is invertible
and the previous method lets us compute the inverse.

Example 3.12.9 We compute the inverse of the matrix

\begin{pmatrix} 1 & -1 & 2 \\ 3 & 1 & 2 \\ 1 & 1 & -1 \end{pmatrix}
using the method just described. We get
   
\left(\begin{array}{ccc|ccc} 1 & -1 & 2 & 1 & 0 & 0 \\ 3 & 1 & 2 & 0 & 1 & 0 \\ 1 & 1 & -1 & 0 & 0 & 1 \end{array}\right)
\Rightarrow
\left(\begin{array}{ccc|ccc} 1 & -1 & 2 & 1 & 0 & 0 \\ 0 & 4 & -4 & -3 & 1 & 0 \\ 0 & 2 & -3 & -1 & 0 & 1 \end{array}\right)

\Rightarrow
\left(\begin{array}{ccc|ccc} 1 & -1 & 2 & 1 & 0 & 0 \\ 0 & 1 & -1 & -3/4 & 1/4 & 0 \\ 0 & 2 & -3 & -1 & 0 & 1 \end{array}\right)
\Rightarrow
\left(\begin{array}{ccc|ccc} 1 & -1 & 2 & 1 & 0 & 0 \\ 0 & 1 & -1 & -3/4 & 1/4 & 0 \\ 0 & 0 & 1 & -1/2 & 1/2 & -1 \end{array}\right)

The matrix on the left is now in row-echelon form and we can work backwards,
subtracting lower rows from upper rows, to transform the left matrix into the
identity matrix. This works as follows:

\Rightarrow
\left(\begin{array}{ccc|ccc} 1 & -1 & 0 & 2 & -1 & 2 \\ 0 & 1 & 0 & -5/4 & 3/4 & -1 \\ 0 & 0 & 1 & -1/2 & 1/2 & -1 \end{array}\right)
\Rightarrow
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 3/4 & -1/4 & 1 \\ 0 & 1 & 0 & -5/4 & 3/4 & -1 \\ 0 & 0 & 1 & -1/2 & 1/2 & -1 \end{array}\right)

It follows that

\begin{pmatrix} 1 & -1 & 2 \\ 3 & 1 & 2 \\ 1 & 1 & -1 \end{pmatrix}^{-1}
= \begin{pmatrix} 3/4 & -1/4 & 1 \\ -5/4 & 3/4 & -1 \\ -1/2 & 1/2 & -1 \end{pmatrix}.
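
Numerically, the same Gauss-Jordan idea can be sketched as follows (my own illustration; the pivoting is simplified and the code assumes A is invertible):

import numpy as np

def inverse_by_row_reduction(A):
    """Row-reduce (A | I) to (I | A^{-1}); assumes A is square and invertible."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])              # the augmented matrix (A | I)
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))    # choose a usable pivot row
        M[[j, p]] = M[[p, j]]
        M[j] /= M[j, j]                        # scale so the pivot is 1
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]         # clear the rest of column j
    return M[:, n:]                            # right half is now A^{-1}

A = [[1, -1, 2], [3, 1, 2], [1, 1, -1]]
print(inverse_by_row_reduction(A))             # matches the 3/4, -1/4, ... matrix above
print(np.linalg.inv(A))                        # numpy's built-in inverse, for comparison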

Now let's make the convention that a vector x = (a_1, . . . , a_n) is an n-by-1
matrix. Then multiplying with an m-by-n matrix A makes sense, i.e., we
may compute the product

A \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}
and the result will be an m-by-1 matrix, which is a vector in Rm .

Definition 3.12.10 An m-by-n matrix defines a function Rn → Rm which


maps a vector x ∈ Rn to Ax ∈ Rm . Such a function is called a linear map,
or linear transformation or a linear operator.

Using the concept of a linear map, we will give a complete answer to the
following question: when does a matrix have an inverse? Going back to ex-
ample 3.12.9 lets look at the working backwards step after we have reached
the row-echelon form. Notice that the method used to work backwards to the
identity matrix works for any n-by-n matrix in row-echelon form, assuming
that the last row is not a zero row. However, these matrices are precisely the
square matrices that have nonzero determinant, since the row-echelon form
has determinant one and the original determinant differs by a nonzero con-
stant corresponding to row switches and scalings of rows by nonzero scalars
during the Gaussian elimination.
Conversely, assume that the matrix A has an inverse A−1 . If the row-
echelon form of A has a zero row, then rank A < n, so that the system
AX = 0 has a nonzero solution, call it x. But then
x = Ix = A−1 Ax = A−1 0 = 0,
which would contradict the fact that x ≠ 0. It follows that if rank A < n,
then no inverse can exist, so if the inverse exists, then the row-echelon form
does not have a zero row at the bottom. We have thus proved the following
theorem:

Theorem 3.12.11 A square matrix A has an inverse if and only if det A ≠ 0.

Algorithm 3.12.12 (Finding inverse of matrix) Let A be a square matrix.


1. Compute det A.
2. If det A = 0, then no inverse exists.
3. If det A ≠ 0, use the method in example 3.12.9 to find the inverse.

Finally, there’s a useful formula for 2-by-2 matrices, which is very quick
to compute:
 
Theorem 3.12.13 Write A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. If det A ≠ 0, then we have the
formula:

A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

Proof. Compute the product of A and the claimed inverse and simplify.

Definition 3.12.14 An invertible square matrix will be called nonsingular


while a noninvertible square matrix will be called singular.

3.13 Eigenvalues and eigenvectors


This section introduces one of the most important computational tools in
mathematics with an extremely diverse set of applications in engineering and
sciences. The interested reader should check the discussion on applications
on the Wikipedia page explaining eigenvectors.
In the previous section we mentioned that given an n-by-n matrix A,
then it defines a function Rn → Rn , x 7→ Ax. We are often interested in
what this map does geometrically, since these maps determined by matrices
tend to have a very geometric description. The following examples serve as
illustrations.

Example 3.13.1 The matrix below rotates points in R2 counterclockwise


around the origin by the angle θ, which the reader can easily check by mul-
tiplying different (a, b) with the matrix,
 
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
This matrix and its generalizations have important applications in e.g. com-
puter graphics where it’s used to rotate objects before computing a projection
of the scene to a 2D screen. Note that the matrix rotating clockwise, i.e. in
the opposite direction by θ, is the inverse of this matrix.

Example 3.13.2 The following matrix reflects points around the x-axis.
 
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

Example 3.13.3 Write θ = arctan a and let Aθ be the rotation matrix above
corresponding to the angle θ. Denote by A the reflection matrix above. Then
the matrix
A_\theta A A_\theta^{-1}

reflects points around the line y = ax.

Example 3.13.4 Let Aθ be as in the previous example. Define


 
B = \begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix},

so that B scales the x-component of a vector by c. Then the matrix

A_\theta B A_\theta^{-1}

scales the component along the line y = ax by c.

It turns out that by defining matrices for different operations, we can use
them as building blocks to compute quite complicated maps, which would be
very hard to construct directly. The last two examples serve as illustrations.
While this is extremely useful in practice, we won’t have time to go into this.
I will write an optional short section on this which the interested student can
skim.
Notice that in the second example a vector along the x-axis stays fixed by
the matrix, in the third example a vector along the line y = ax stays fixed, and in
the fourth example a vector along the line gets scaled by c. The geometry in
such a map can usually be analyzed through the following concept.

Definition 3.13.5 Let A be an n-by-n matrix, so it defines a linear map


Rn → Rn . Then a nonzero vector x ∈ Rn will be called an eigenvector of A
if
Ax = λx
for some λ ∈ R. In other words, an eigenvector is a nonzero vector for which
multiplying by A has the effect of just scaling the vector. The eigenvalues of
A are the scalars for which a corresponding eigenvector exists that satisfies
the equation above.

Example 3.13.6 In the second example above (1, 0) is an eigenvector cor-


responding to the eigenvalue 1. In the third example the vector (1, a) is an
eigenvector corresponding to the eigenvalue 1 while in the fourth example
(1, a) is an eigenvector corresponding to the eigenvalue c.

Example 3.13.7 Let A = I be the identity matrix. For every vector x ∈ Rn ,
we then have that Ax = x. This shows that every nonzero vector is an
eigenvector corresponding to the eigenvalue 1. In particular, 1 is the only
eigenvalue.

Example 3.13.8 Let A be an n-by-n matrix. Then A has the eigenvalue 0


if and only if AX = 0 has multiple solutions, i.e. rank A < n. This follows,
since if x is a nonzero solution, then Ax = 0 = 0x, so 0 is an eigenvalue.
Conversely, if 0 is an eigenvalue, then there’s a corresponding eigenvector
satisfying the equation Ax = 0x = 0. Since eigenvectors are nonzero, this
gives a nonzero solution to AX = 0.

Theorem 3.13.9 λ ∈ R is an eigenvalue of A if and only if det(A − λI) = 0.


Proof. Assume first that λ ∈ R is an eigenvalue of A, so there exists a
corresponding eigenvector x. We have that

Ax = λx ⇔ Ax − λx = 0 ⇔ (A − λI)x = 0.

Since x ≠ 0 it follows that B = A − λI does not have an inverse, so det B = 0.


Conversely, if det(A − λI) = 0, then A − λI does not have full rank, so
the equation (A − λI)X = 0 has a nonzero solution. Thus there is a vector
x s.t. (A − λI)x = 0. The chain of equivalences above shows that λ is then
an eigenvalue corresponding to the eigenvector x.
This theorem shows us how to compute the eigenvalues of a matrix. As-
sume that we start with a matrix
 
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.

Then the expression A − λI corresponds to the matrix


 
A - \lambda I = \begin{pmatrix} a_{11} - \lambda & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} - \lambda \end{pmatrix},

i.e. we subtract λ from each diagonal element of A. Having fixed the matrix
A, denote
p(λ) = det(A − λI),
so p : R → R is a function with parameter λ and the eigenvalues of A are
precisely the zeros of p. Let’s try to figure out what this function p looks
like. We will start with an example.
 
Example 3.13.10 Write A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, so that

p(\lambda) = \det(A - \lambda I) = \det \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = (a - \lambda)(d - \lambda) - bc.
If we simplify the expression on the right, we see that p is a quadratic poly-
nomial in λ. The following is true in general.

Theorem 3.13.11 If A is an n-by-n matrix, then the function p(λ) =


det(A − λI) is a polynomial in the variable λ of degree n.
Proof. This is an easy induction proof.

Definition 3.13.12 The polynomial p(λ) = det(A−λI) of an n-by-n square


matrix is called the characteristic polynomial of A.
Finding the eigenvalues of a matrix is now relatively easy. We just com-
pute the characteristic polynomial and look for its roots. This can be done
exactly whenever n < 5, since we have explicit formulas for the roots of any
polynomial of degree less than 5. What we still need to figure out is how to
compute the eigenvectors. We do this next.

Note 3.13.13 Not all polynomials with real coefficients have real roots, so
some matrices have no eigenvalues. An example is given by example 3.13.1.

Let A be an n-by-n matrix and λ an eigenvalue. Then the eigenvectors


are precisely the vectors x ∈ Rn s.t. Ax = λx, but this corresponds to
precisely the vectors s.t.
(A − λI)x = 0.

Therefore, finding the eigenvectors correspond to finding all the solutions to
the linear system BX = 0, where B = A − λI. However, just finding a
formula for all the solutions is not that useful. We actually want a bit more.

Definition 3.13.14 Let A be an m-by-n matrix, so that A defines a function


Rn → Rm . Then the set of zeros of this function is called the null space of
A, i.e. the null space is the set of vectors x ∈ Rn s.t. Ax = 0. We write

Null(A) = {x ∈ Rn | Ax = 0}

to denote the null space.


 
Example 3.13.15 Let A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. Then we see that the system AX = 0
has precisely the solutions x = (a, −a).

Theorem 3.13.16 Let A be an m-by-n matrix, then the null space Null(A) ⊂
Rn is a vector space.
Proof. Let x, y ∈ Null(A). Then

A(x + y) = Ax + Ay = 0 + 0 = 0

and
A(cx) = cAx = c0 = 0,
so x + y ∈ Null(A) and cx ∈ Null(A).
So how does this fit into our quest for finding the eigenvectors corre-
sponding to an eigenvalue? Well, notice that the solutions to the equation
Ax = λx consists of precisely the set Null(A − λI), so what we actually want
to find is a basis for this vector space.

Definition 3.13.17 If λ is an eigenvalue for A, then Null(A − λI) is called


the eigenspace of the eigenvalue λ of A.

Finding the basis for Null(A − λI) sounds quite tricky. We need to find
"enough" linearly independent eigenvectors, so that they span Null(A − λI).
But how do we know when we have "enough"? That is, we need to know the dimension
of Null(A − λI). Fortunately, there is the following theorem, which we won’t
prove:

Theorem 3.13.18 Let A be an n-by-n matrix. Then dim Null(A) = n −


rank(A).

There’s also the following theorem, which gives an upper bound for the
dimension and can be quicker to use than the previous one:

Theorem 3.13.19 Let A be an n-by-n matrix and p(λ) = det(A − λI). If


λ1 is a root of p(λ), i.e. an eigenvalue, then we may factorize p(λ) as
p(λ) = (λ − λ1 )e q(λ),
where λ1 is not a root of q(λ). Thus e is the multiplicity of the root λ1 . Then
we have the following inequality:
dim Null(A − λ1 I) ≤ e.

Note 3.13.20 By definition there’s at least one eigenvector corresponding


to each eigenvalue, thus dim Null(A − λI) ≥ 1 for any eigenvalue λ.

What the previous theorem says is that if you notice that λ1 is e.g. a
double root of p(λ) and you find two linearly independent vectors in Null(A−
λ1 I), then these form a basis, since dim Null(A − λ1 I) ≤ 2.

Algorithm 3.13.21 (Finding eigenvalues and corresponding eigenvectors)


Let A be an n-by-n matrix.
1. Compute the characteristic polynomial p(λ) = det(A − λI) and find all
roots, these will be λ1 , . . . , λk and will be the eigenvalues.
2. For each eigenvalue reduce A−λi I to row-echelon form, which gives the
rank, hence the number of linearly independent eigenvectors needed to
form a basis of Null(A − λi I).

3. For each λi find all solutions to the system (A − λi I)X = 0 and by
choosing the free parameters appropriately find the number of linearly
independent eigenvectors needed as determined by the previous step.
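
The steps of the algorithm can be checked numerically (my own illustration, using the matrix from the example that follows); numpy computes the eigenvalues for us, and Theorem 3.13.18 gives the geometric multiplicities from ranks:

import numpy as np

A = np.array([[-2.0, 2.0,  0.0],
              [ 2.0, 0.0, -1.0],
              [ 1.0, 3.0, -2.0]])
n = A.shape[0]

# Step 1: eigenvalues (roots of the characteristic polynomial, found numerically).
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.round(eigenvalues, 6))                  # approximately 0, -1, -3 in some order

# Step 2: dim Null(A - lambda I) = n - rank(A - lambda I), up to numerical tolerance.
for lam in eigenvalues:
    r = np.linalg.matrix_rank(A - lam * np.eye(n))
    print(round(lam.real, 6), "geometric multiplicity:", n - r)

# Step 3: the columns of `eigenvectors` are eigenvectors; check A x = lambda x.
x = eigenvectors[:, 0]
print(np.allclose(A @ x, eigenvalues[0] * x))    # True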

Example 3.13.22 Find the eigenvalues and eigenvectors of the matrix


 
A = \begin{pmatrix} -2 & 2 & 0 \\ 2 & 0 & -1 \\ 1 & 3 & -2 \end{pmatrix}.

We start by computing the characteristic polynomial

p(\lambda) = \det(A - \lambda I)
= \det \begin{pmatrix} -2-\lambda & 2 & 0 \\ 2 & -\lambda & -1 \\ 1 & 3 & -2-\lambda \end{pmatrix}
= (-2-\lambda)(-\lambda)(-2-\lambda) - 2 - (-2-\lambda)(-1)\cdot 3 - 4(-2-\lambda)
= \lambda - (\lambda+2)^2 \lambda = -\lambda(\lambda+1)(\lambda+3).

This reveals that the eigenvalues are 0, −1, −3. Next, we look for the eigen-
vectors corresponding to the eigenvalue 0. By Theorem 3.13.19 and Note 3.13.20, we have
dim Null(A − 0I) = 1, so we only need to find one eigenvector, since it will
be a basis. This is true for each eigenvalue, so for each eigenvalue we look
for one eigenvector. Thus, we solve (A − 0I)X = 0 i.e.
 
\left(\begin{array}{ccc|c} -2 & 2 & 0 & 0 \\ 2 & 0 & -1 & 0 \\ 1 & 3 & -2 & 0 \end{array}\right)

and a standard Gaussian elimination gives e.g. the solution x = (1, 1, 2).
To find an eigenvector corresponding to the eigenvalue −1, we solve the
system (A + I)X = 0 i.e.
 
\left(\begin{array}{ccc|c} -1 & 2 & 0 & 0 \\ 2 & 1 & -1 & 0 \\ 1 & 3 & -1 & 0 \end{array}\right)

and a standard Gaussian elimination gives e.g. the solution y = (2, 1, 5).

Finally, to find an eigenvector corresponding to the eigenvalue −3, we
solve the system (A + 3I)X = 0 i.e.
 
\left(\begin{array}{ccc|c} 1 & 2 & 0 & 0 \\ 2 & 3 & -1 & 0 \\ 1 & 3 & 1 & 0 \end{array}\right)

and, again, a standard Gaussian elimination gives e.g. z = (2, −1, 1). To
check that these are indeed eigenvectors, we can compute
    
\begin{pmatrix} -2 & 2 & 0 \\ 2 & 0 & -1 \\ 1 & 3 & -2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad
\begin{pmatrix} -2 & 2 & 0 \\ 2 & 0 & -1 \\ 1 & 3 & -2 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix} = \begin{pmatrix} -2 \\ -1 \\ -5 \end{pmatrix}, \quad
\begin{pmatrix} -2 & 2 & 0 \\ 2 & 0 & -1 \\ 1 & 3 & -2 \end{pmatrix} \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -6 \\ 3 \\ -3 \end{pmatrix},

which is precisely what we wanted.

Example 3.13.23 Next, let's compute the eigenvalues and eigenvectors of


the matrix  
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
Here the characteristic polynomial is p(λ) = (1 − λ)2 , so there’s only one
eigenvalue 1 of multiplicity two, so dim Null(A − I) ≤ 2. By trial and error
we can find the eigenvector x = (1, 0), but is there another one? The matrix
A − I is just  
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
and it’s already in row-echelon form, so we see that the rank is 1. It follows
that dim Null(A − I) = 1, so we see that the inequality in Theorem 3.13.19
can also be strict.
I’ll end this section with the following famous theorem, which can some-
times be very useful:

Theorem 3.13.24 (Cayley-Hamilton theorem) Let p(λ) be the characteristic
polynomial of A, then p(A) = 0.
Proof. Most probably proven in math 370.

Example 3.13.25 Find the inverse of the matrix A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. We computed
earlier the characteristic polynomial, which is p(λ) = (1 − λ)^2 =
λ2 − 2λ + 1. It follows by the Cayley-Hamilton theorem that
p(A) = A2 − 2A + I = 0 ⇒ I = 2A − A2 = A(2I − A).
This shows that A−1 = 2I − A.
The previous example also works in general. Assume that we have some
n-by-n matrix A. Then we compute the characteristic polynomial, which is
of the form
p(λ) = an λn + an−1 λn−1 + . . . + a1 λ + a0 .
Plugging in A and using Cayley-Hamilton, we get

a_n A^n + a_{n-1} A^{n-1} + \ldots + a_1 A + a_0 I = 0.

If a_0 \neq 0, then we get

I = -a_0^{-1}(a_n A^n + a_{n-1} A^{n-1} + \ldots + a_1 A) = A\bigl(-a_0^{-1}(a_n A^{n-1} + a_{n-1} A^{n-2} + \ldots + a_1 I)\bigr),

which shows that

A^{-1} = -a_0^{-1}(a_n A^{n-1} + a_{n-1} A^{n-2} + \ldots + a_1 I).
Thus, this method always works if the constant term a_0 of the characteristic
polynomial is nonzero. One might then ask: when is a_0 ≠ 0, i.e. when will
this method work? It turns out that a_0 ≠ 0 precisely when det A ≠ 0, so if a
matrix is invertible, then this method always yields a formula for the inverse.
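
A quick numerical check of this formula (my own illustration): numpy's np.poly returns the coefficients of the monic characteristic polynomial of a square matrix, and the formula above then assembles the inverse. The test matrix is the one from Example 3.12.6.

import numpy as np

A = np.array([[ 2.0, 0.0, 1.0],
              [-2.0, 3.0, 4.0],
              [-5.0, 5.0, 6.0]])
n = A.shape[0]

c = np.poly(A)          # coefficients of the monic characteristic polynomial, highest power first
a0 = c[-1]              # constant term; it is nonzero exactly when det A != 0

# A^{-1} = -(1/a0) * (A^{n-1} + c[1] A^{n-2} + ... + c[n-1] I), since p(A) = 0.
Ainv = -sum(c[k] * np.linalg.matrix_power(A, n - 1 - k) for k in range(n)) / a0
print(np.round(Ainv))           # the inverse listed in Example 3.12.6
print(np.linalg.inv(A))         # for comparison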

3.14 Diagonalization
In this chapter we develop some tools that turn out to be extremely useful
when working with systems of ordinary differential equations. Assume that
an n-by-n matrix A can be written as
A = P DP −1 ,

where D is a diagonal matrix. Then we may compute
A^n = (PDP^{-1})^n = \underbrace{PDP^{-1}\, PDP^{-1} \cdots PDP^{-1}}_{n \text{ times}}.

We see that the products P P −1 cancel, so we get

A^n = P D^n P^{-1}.

This now lets us define the matrix exponential eA as follows. We know that
e^x has the Taylor series

e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots,
so we define
e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \ldots.
One can show that this series converges for every square matrix A, but computing
it directly from the definition is usually hopeless. However, if A = PDP^{-1}, then the series simplifies as

e^A = PP^{-1} + PDP^{-1} + \frac{PD^2P^{-1}}{2!} + \frac{PD^3P^{-1}}{3!} + \ldots = P\Bigl(I + D + \frac{D^2}{2!} + \frac{D^3}{3!} + \ldots\Bigr)P^{-1}.
The matrix D is diagonal and of the form
 
\begin{pmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}

and a simple computation shows that

I + D + \frac{D^2}{2!} + \frac{D^3}{3!} + \ldots = \begin{pmatrix} e^{a_{11}} & & 0 \\ & \ddots & \\ 0 & & e^{a_{nn}} \end{pmatrix}.

This implies that

e^A = P \begin{pmatrix} e^{a_{11}} & & 0 \\ & \ddots & \\ 0 & & e^{a_{nn}} \end{pmatrix} P^{-1}.

Furthermore, if we multiply A by a scalar t, then tA = P(tD)P^{-1}, so that

e^{tA} = P \begin{pmatrix} e^{t a_{11}} & & 0 \\ & \ddots & \\ 0 & & e^{t a_{nn}} \end{pmatrix} P^{-1}.
Now that we have this interesting matrix exponential, which we are able
to compute for matrices that can be written as A = P DP^{-1}, we still need
to understand when such matrices P and D exist and, when they do, how to
find them. It turns out that eigenvectors and eigenvalues are the key.

Definition 3.14.1 A matrix A that can be written as A = P DP −1 as


described is called diagonalizable.

Theorem 3.14.2 An n-by-n matrix A is diagonalizable if and only if A


has n linearly independent eigenvectors (these can correspond to different
eigenvalues).
Proof. Is very easy, but requires the concept of base change, which we won’t
discuss.
Figuring out when we can find such eigenvectors is not a simple task, but
there is a theorem that gives a more computable answer. First we need some
definitions.

Definition 3.14.3 Let A be an n-by-n matrix and p(λ) the characteristic


polynomial. Let λ1 be a root of p(λ), so that we may factorize

p(λ) = (λ − λ1 )e q(λ),

where q(λ1 ) 6= 0. The exponent e is called the algebraic multiplicity of the


eigenvalue λ1 . The algebraic multiplicity of λ1 is simply the multiplicity of
the root λ1 .

Definition 3.14.4 Let A be an n-by-n matrix and λ1 an eigenvalue of A.


Then we say that
dim Null(A − λ1 I)

is the geometric multiplicity of the eigenvalue λ1 . The geometric multiplicity
is simply the number of linearly independent eigenvectors corresponding to
the eigenvalue λ1 .

Note 3.14.5 Theorem 3.13.19 states precisely that the geometric multiplic-
ity of an eigenvalue is less than or equal to the algebraic multiplicity.

Theorem 3.14.6 An n-by-n matrix is diagonalizable if and only if the geo-


metric multiplicity of each eigenvalue equals the algebraic multiplicity and
the sum of the algebraic multiplicities equal n.
Proof. Again, way above the level of the class.

Note 3.14.7 The statement ”sum of the algebraic multiplicities equal n”


just means that the characteristic polynomial factors as

p(λ) = c(λ − λ1 )e1 · · · (λ − λk )ek

into linear factors, i.e. the polynomial has no non-real complex roots.

Definition 3.14.8 A real polynomial is said to factor completely if it has


no non-real complex roots.

Now we can actually explicitly explain how to diagonalize a matrix:

Algorithm 3.14.9 (Diagonalizing a matrix) Let A be an n-by-n matrix.

1. Check if p(λ) factors completely. If it doesn’t then A does not diago-


nalize.

2. If p(λ) = c(λ − λ1 )e1 · · · (λ − λk )ek , check if dim Null(A − λi I) = ei .


This can be done by computing rank(A − λi I) for each i and then
using Theorem 3.13.18.

3. If the geometric and algebraic multiplicities match in part (2), then the
matrix is diagonalizable else it isn’t.

4. If we have determined that A is diagonalizable, find a basis for each
Null(A − λi I). Once this is done for each i, we get a list of n linearly
independent eigenvectors x1 , . . . , xn , where each xi corresponds to an
eigenvalue λi .

5. Write P = [x1 · · · xn ], so the column vectors are the eigenvectors, and


set  
D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix},
then A = P DP −1 .
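
As a sketch (my own illustration, using the matrix that is diagonalized by hand in the next example), numpy's eig already returns the eigenvector matrix P and the eigenvalues that go on the diagonal of D; from these we can verify A = PDP^{-1} and evaluate e^{tA}:

import numpy as np

A = np.array([[ 2.0, 0.0, 0.0],
              [ 1.0, 2.0, 1.0],
              [-1.0, 0.0, 1.0]])

eigenvalues, P = np.linalg.eig(A)          # columns of P are eigenvectors
D = np.diag(eigenvalues)

# For a diagonalizable matrix P is invertible and A = P D P^{-1}.
# (If A were not diagonalizable, P would be singular and this check would fail.)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))     # True

# e^{tA} = P diag(e^{t lambda_i}) P^{-1}, as derived above.
t = 0.5
exp_tA = P @ np.diag(np.exp(t * eigenvalues)) @ np.linalg.inv(P)
print(np.round(exp_tA, 4))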

Example 3.14.10 We will diagonalize the matrix


 
A = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 1 \\ -1 & 0 & 1 \end{pmatrix}.

Computing the characteristic polynomial, we get

p(λ) = det(A − λI) = (2 − λ)2 (1 − λ).

We see that p(λ) factors completely, since all its roots are real. Solving the
required linear systems, we find the following bases for Null(A −
I) and Null(A − 2I):

λ = 1 : x1 = (0, −1, 1), λ = 2 : x2 = (0, 1, 0), x3 = (−1, 0, 1).

The algebraic and geometric multiplicities of the eigenvalue 1 are both one, and
for the eigenvalue 2 they are both two. It follows that the matrix is diagonalizable.
Now our list of eigenvectors is x1 , x2 , x3 and the corresponding list of
eigenvalues is 1, 2, 2. It follows that

A = \begin{pmatrix} 0 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} 0 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}^{-1}.

If we want an actual expression for the inverse matrix, then a simple compu-
tation shows that the characteristic polynomial of P is λ3 − 2λ2 + 2λ − 1. It

follows that

I = P(P^2 - 2P + 2I) \;\Rightarrow\; P^{-1} = P^2 - 2P + 2I = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 0 \end{pmatrix},

so that

A = \begin{pmatrix} 0 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 0 \end{pmatrix}.

Note 3.14.11 The matrices P and D are not unique. D depends on the
order in which we list our eigenvectors. P depends on both the order of the
chosen eigenvectors as well as on the choice of basis for Null(A − λi I) for
each i.

4 Higher-Order ODEs
4.1 Basic definitions
Definition 4.1.1 A linear ordinary differential equation is an equation of
the form

an (x)y (n) + an−1 (x)y (n−1) + . . . + a1 (x)y 0 + a0 (x)y = g(x).

To simplify notation we will introduce the concept of a differential opera-


tor. Denote by D the derivative operator, i.e. applying D to a differentiable
function f gives its derivative Df = f'. With this notation, we can write

L = a_n(x)D^n + a_{n-1}(x)D^{n-1} + \ldots + a_1(x)D + a_0(x),

so if you symbolically "multiply" a function y from the left by L, you get

Ly = (a_n(x)D^n + a_{n-1}(x)D^{n-1} + \ldots + a_1(x)D + a_0(x))y


= an (x)Dn y + an−1 Dn−1 y + . . . + a1 (x)Dy + a0 (x)y
= an (x)y (n) + an−1 (x)y (n−1) + . . . + a1 (x)y 0 + a0 (x)y

This lets us express the linear ordinary differential equation above as simply

Ly = g(x).

Definition 4.1.2 Any L like the above containing terms with a function
times a power of D will be called a linear differential operator or just a
differential operator. The highest power n of D appearing will be called the degree
(or order) of the differential operator.

Definition 4.1.3 An initial-value problem is a differential equation


Ly = g(x),
where L is an nth order differential operator, together with initial conditions
y (n−1) (x0 ) = yn−1 , . . . , y(x0 ) = y0 .

Theorem 4.1.4 Let L = a_n(x)D^n + a_{n-1}(x)D^{n-1} + \ldots + a_1(x)D + a_0(x), where the
functions a_i(x) and g(x) are continuous on some interval I and a_n(x) ≠ 0 on I. If x_0 ∈ I,
then the initial-value problem

Ly = g(x), \quad y^{(n-1)}(x_0) = y_{n-1}, \ldots, y(x_0) = y_0

has a unique solution y on the interval I.

Definition 4.1.5 A boundary-value problem is a differential equation


Ly = g(x),
where L is an nth order differential operator, together with boundary con-
ditions. A set of boundary conditions on an interval I = [a, b] is any list of
specified values for the functions y, y', . . . , y^{(n-1)} at the points a, b. We can
also choose not to fix the value of any of these functions at either a or b.

Example 4.1.6 The equation 3y''' + 5y'' − y' + 7y = 0, y(1) = 0, y'(1) = 0,
y''(1) = 0 has the solution y(x) = 0, which is valid on all of R. It follows from
the previous theorem that y = 0 is the unique solution on any interval containing x_0 = 1.

Example 4.1.7 Contrary to an initial-value problem, a boundary-value


problem can have multiple solutions. An example would be
y'' + 16y = 0, \quad y(0) = 0, \quad y(2\pi) = 0.
This boundary-value problem is satisfied by both y ≡ 0 and y = sin 4x.

4.2 Homogeneous equations
Definition 4.2.1 Let L be an nth degree differential operator. A homoge-
neous equation is an equation of the form

Ly = 0,

i.e. if L = an (x)Dn + an−1 (x)Dn−1 + . . . + a1 (x)D + a0 (x), then the equation


is simply

an (x)y (n) + an−1 (x)y (n−1) + . . . + a1 (x)y 0 + a0 (x)y = 0.

Thus, there is no term "only containing x".


The usefulness of the L notation is that multiplying functions
from the left by L works similarly to multiplying vectors from the left by a
matrix. Since Dn (αf + βg) = αDn f + βDn g by the standard properties of
the derivative it also follows that if y1 , y2 are n-times differentiable functions,
then
L(αy1 + βy2 ) = αLy1 + βLy2 .
This gives

Theorem 4.2.2 Assume that y1 , . . . , yk are solutions to the homogeneous


differential equation Ly = 0. Then

c1 y 1 + . . . + ck y k

is also a solution for any choice of the constants ci .


Proof. L(c1 y1 + . . . + ck yk ) = c1 Ly1 + . . . + ck Lyk = 0.

Definition 4.2.3 A set of functions y1 (x), . . . , yn (x) are called linearly in-
dependent on an interval I if

c1 y1 (x) + . . . + cn yn (x) = 0

for every x ∈ I implies that c1 = . . . = cn = 0. If y1 (x), . . . , yn (x) are not


linearly independent, then they are called linearly dependent.

Example 4.2.4 The functions x and x^2 are linearly independent on any
interval I ⊂ R containing more than one point. This is because if c_1 and c_2
are not both zero, then

c_1 x + c_2 x^2

is a nonzero polynomial of degree at most two, so it has at most two roots and
therefore cannot vanish at every x ∈ I.

Example 4.2.5 The functions y1 = x + 1, y2 = x2 − x, y3 = x2 + 1 are


linearly dependent, since
y1 + y2 − y3 = 0.

Definition 4.2.6 Let y1 , . . . , yn be (n − 1)-times differentiable. The deter-


minant

W(y_1, \ldots, y_n) = \det \begin{pmatrix} y_1 & y_2 & \cdots & y_n \\ y_1' & y_2' & \cdots & y_n' \\ \vdots & \vdots & & \vdots \\ y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)} \end{pmatrix}
is called the Wronskian of the functions.

Note 4.2.7 Note that the determinant is a function of x, so we may write


W (y1 , . . . , yn )(x).

Theorem 4.2.8 Let y1 , . . . , yn be solutions of an nth order homogeneous


linear differential equation on an interval I. Then y1 , . . . , yn are linearly
independent if
W (y1 , . . . , yn )(x0 ) ≠ 0
for some x0 ∈ I.
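
To illustrate Definition 4.2.6 and Theorem 4.2.8 (my own addition), the Wronskian can be computed symbolically by building the matrix of derivatives and taking its determinant:

from sympy import symbols, Matrix, sin, cos, simplify

x = symbols('x')

def wronskian(funcs, x):
    """Wronskian of the given functions of x: determinant of the matrix of derivatives."""
    n = len(funcs)
    W = Matrix(n, n, lambda i, j: funcs[j].diff(x, i))   # row i holds the i-th derivatives
    return simplify(W.det())

# cos(4x) and sin(4x) both solve y'' + 16y = 0; their Wronskian is the nonzero constant 4,
# so by Theorem 4.2.8 they are linearly independent and form a fundamental set of solutions.
print(wronskian([cos(4*x), sin(4*x)], x))    # 4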

Definition 4.2.9 A set of linearly independent solutions y1 , . . . , yn to an


nth order homogeneous linear differential equation on an interval I is called
a fundamental set of solutions on the interval.

Theorem 4.2.10 There exists a fundamental set of solutions for any nth
order homogeneous linear differential equation on an interval I.

Theorem 4.2.11 If y1 , . . . , yn is a fundamental set of solutions to an nth
order homogeneous linear differential equation on an interval I, then every
other solution is of the form

y = c1 y 1 + . . . + cn y n

for some constants ci ∈ R.

Note 4.2.12 If you read the section on the generalized definition of a vector
space, then one can make a few observations. The set of solutions to Ly = 0
is a vector space, since Lcy1 = cLy1 = 0 and L(y1 + y2 ) = Ly1 + Ly2 = 0,
so the set of solutions is closed under scalar multiplication and addition. All
the other properties listed in that section can also be checked.
What the above theorem then says in this language is that the funda-
mental set of solutions is a basis for the vector space of all solutions, so every
other solution is in the span. It also says that the dimension of the solution
set of an nth order linear differential equation is n. Using this one can apply
all the techniques and theorems of linear algebra, including eigenvectors and
eigenvalues, to the study of differential equations. This quickly leads to a
large field of mathematics called functional analysis.

4.3 Nonhomogeneous equations


Definition 4.3.1 Let L be an nth degree differential operator. A nonhomo-
geneous equation is an equation of the form

Ly = g(x),

i.e. if L = an (x)Dn + an−1 (x)Dn−1 + . . . + a1 (x)D + a0 (x), then the equation


is simply

an (x)y (n) + an−1 (x)y (n−1) + . . . + a1 (x)y 0 + a0 (x)y = g(x).

Theorem 4.3.2 Let y1 and y2 be solutions to the nonhomogeneous differ-


ential equation Ly = g(x). Then y1 = y2 + yh , where yh is a solution to the
corresponding homogeneous equation Ly = 0.
Proof. By assumption L(y1 − y2 ) = g(x) − g(x) = 0, so yh = y1 − y2 is a
solution of Ly = 0. But then y1 = y2 + (y1 − y2 ) = y2 + yh .

Corollary 4.3.3 Let yp be a particular solution to the nth order nonho-
mogeneous differential equation Ly = g(x) and y1 , . . . , yn a fundamental set
of solutions to the homogeneous equation Ly = 0. Then any solution of
Ly = g(x) is of the form
y p + c1 y 1 + . . . + cn y n .
Proof. The function yh in the theorem is a solution to Ly = 0, so it is of the
form c1 y1 + . . . + cn yn .

Note 4.3.4 The previous results tell us that solving a nonhomogeneous


equation follows the following idea:
1. Find one solution to Ly = g(x).
2. Solve the equation Ly = 0.
The first step is usually the hard part, since there are many good methods for
dealing with homogeneous equations as we shall see in the following sections.

Definition 4.3.5 The previous results show that any solution of Ly = g(x)
is of the form y = yp + yh . The function yh is called the complementary
function of the solution.

Theorem 4.3.6 (Superposition principle) Let L = an (x)Dn +an−1 (x)Dn−1 +


. . . + a1 (x)D + a0 (x) and let ypi be a solution to
Ly = gi (x), i = 1, . . . , k
Then yp = yp1 + . . . + ypk is a solution to
Ly = g1 (x) + . . . + gk (x).
Proof. Ly = L(yp1 + . . . + ypk ) = Lyp1 + . . . + Lypk = g1 (x) + . . . + gk (x).

Example 4.3.7 If we have e.g. the differential equation Ly = x + x2 ,


then the previous theorem tells us that instead of trying to find a particular
solution directly, we could try to find particular solutions to Ly = x and
Ly = x2 and then add them. In general, it’s usually easier to split g(x) into
simpler parts.

4.4 Homogeneous linear equations with constant coef-
ficient
Definition 4.4.1 Let L = a_n D^n + a_{n-1} D^{n-1} + \ldots + a_1 D + a_0, so that the
coefficient of each y^{(i)} is a constant. Then the differential equation

Ly = 0

is a homogeneous linear equation with constant coefficients.


To solve such an equation Ly = 0 we use a trial-and-error approach. We
guess that a solution is of the form y = ecx . Since y (n) = cn ecx , we get the
equation
an cn ecx + an−1 cn−1 ecx + . . . + a1 cecx + a0 ecx = 0.
We divide both sides by ecx , so that we are left with the equation

an cn + an−1 cn−1 + . . . + a1 c + a0 = 0.

It follows that y = ecx is a solution to Ly = 0 if c is a root of the above


equation.

Definition 4.4.2 Given a linear differential equation with constant coeffi-


cients Ly = 0, the equation

an cn + an−1 cn−1 + . . . + a1 c + a0 = 0

is called the auxiliary equation.

Theorem 4.4.3 (Fundamental theorem of algebra) Any degree n polyno-


mial with real coefficients has n roots if we allow complex roots and count
roots by multiplicity.

Note 4.4.4 The previous theorem says that given the polynomial p(c) =
an cn + an−1 cn−1 + . . . + a1 c + a0 , then it factors as

p(c) = an (c − c1 )e1 · · · (c − ck )ek ,

where each ci can be complex numbers and ei is the multiplicity of the root.

Theorem 4.4.5 (Conjugate pairs) If c = a + bi is a root of p(c) = an cn +
an−1 cn−1 + . . . + a1 c + a0 , then a − bi is also a root.

What we know up to this point is that if ci is a root of p(c) = an cn +


an−1 cn−1 + . . . + a1 c + a0 , then eci x is a solution to Ly = 0. The problem is
that we do not necessarily have enough distinct roots to have a fundamental
set of solutions.

Example 4.4.6 The equation y'' + 2y' + y = 0 has the auxiliary equation p(c) = (c + 1)^2,
so we only find one solution y = e^{-x} instead of the two which we need
in order to find the general solution.

Theorem 4.4.7 Let Ly = 0 be an nth degree linear differential equation


with constant coefficients and p(c) = an cn + an−1 cn−1 + . . . + a1 c + a0 the
auxiliary equation. If p has n distinct roots c1 , . . . , cn , then a fundamental
set of solutions is given by

e^{c_1 x}, e^{c_2 x}, \ldots, e^{c_n x}.

Theorem 4.4.8 Let Ly = 0 be an nth degree linear differential equation


with constant coefficients and p(c) = an cn + an−1 cn−1 + . . . + a1 c + a0 the
auxiliary equation. If c1 is a root of p of multiplicity e1 , then

e^{c_1 x}, x e^{c_1 x}, \ldots, x^{e_1 - 1} e^{c_1 x}

are all linearly independent solutions to Ly = 0.

Theorem 4.4.9 Let Ly = 0 be an nth degree linear differential equation


with constant coefficients and p(c) = an cn + an−1 cn−1 + . . . + a1 c + a0 the
auxiliary equation. If p factors as

p(c) = an (c − c1 )e1 · · · (c − ck )ek ,

then a fundamental set of solutions is

e^{c_1 x}, x e^{c_1 x}, \ldots, x^{e_1-1} e^{c_1 x},\; e^{c_2 x}, x e^{c_2 x}, \ldots, x^{e_2-1} e^{c_2 x},\; \ldots,\; e^{c_k x}, x e^{c_k x}, \ldots, x^{e_k-1} e^{c_k x}.

Example 4.4.10 A fundamental set of solutions to the previous example
y 00 + 2y 0 + y = 0 is given by e−x , xe−x .
We still have one problem we haven't addressed properly. Assume that
our equation p(c) = 0 has a complex root a + bi, then the corresponding
solution is
eax+ibx = eax eibx = eax (cos bx + i sin bx)
using Euler’s formula eibx = cos bx+i sin bx. This function is complex-valued,
but we are only interested in real-valued solutions to our differential equation.
Since complex roots occur in conjugate pairs, it follows that a − ib is also
a root, which simplifies to
eax−ibx = eax (cos bx − i sin bx).
Write y1 = e^{ax}(cos bx + i sin bx) and y2 = e^{ax}(cos bx − i sin bx). Then
y1 + y2 = 2eax cos bx, y1 − y2 = 2ieax sin bx.
Since for any constant C, C(y1 + y2 ) and C(y1 − y2 ) are solutions, we may
divide by 2 in the first expression and by 2i in the second. It follows that
eax cos bx, eax sin bx
are solutions to Ly = 0 and they turn out to be linearly independent. This
gives the following refinement to our previous theorem.

Definition 4.4.11 The complex conjugate of a complex number a + bi is
a − bi and is denoted by \overline{a + bi}. This means that if α = a + bi is a root of
some polynomial p(x) with real coefficients, then so is \overline{α}.

Theorem 4.4.12 Let Ly = 0 be an nth degree linear differential equation


with constant coefficients and p(c) = an cn + an−1 cn−1 + . . . + a1 c + a0 the
auxiliary equation. Assume p factors as follows
p(c) = a_n (c - c_1)^{e_1} \cdots (c - c_l)^{e_l} (c - c_{l+1})^{e_{l+1}} (c - \overline{c_{l+1}})^{e_{l+1}} \cdots (c - c_k)^{e_k} (c - \overline{c_k})^{e_k},

so the first l roots are real and then we have pairs of complex conjugate
roots c_{l+1}, \overline{c_{l+1}}, \ldots, c_k, \overline{c_k}, where c_i = a_i + b_i i for i = l + 1, \ldots, k. Then a
fundamental set of solutions is given by the following. We have solutions

e^{c_1 x}, x e^{c_1 x}, \ldots, x^{e_1 - 1} e^{c_1 x}, \ldots, e^{c_l x}, x e^{c_l x}, \ldots, x^{e_l - 1} e^{c_l x}

corresponding to the real roots and the solutions

e^{a_{l+1} x} \cos b_{l+1} x, \; x e^{a_{l+1} x} \cos b_{l+1} x, \; \ldots, \; x^{e_{l+1} - 1} e^{a_{l+1} x} \cos b_{l+1} x, \; \ldots,
e^{a_k x} \cos b_k x, \; x e^{a_k x} \cos b_k x, \; \ldots, \; x^{e_k - 1} e^{a_k x} \cos b_k x,

e^{a_{l+1} x} \sin b_{l+1} x, \; x e^{a_{l+1} x} \sin b_{l+1} x, \; \ldots, \; x^{e_{l+1} - 1} e^{a_{l+1} x} \sin b_{l+1} x, \; \ldots,
e^{a_k x} \sin b_k x, \; x e^{a_k x} \sin b_k x, \; \ldots, \; x^{e_k - 1} e^{a_k x} \sin b_k x

corresponding to the conjugate pairs of complex roots.

Note 4.4.13 The previous theorem just says that if α ∈ R is a root of


multiplicity e, then it adds the solutions

e^{\alpha x}, \ldots, x^{e-1} e^{\alpha x}

and if α = a + ib ∈ C and \overline{α} = a − ib ∈ C is a pair of complex conjugate
roots of multiplicity e, then it adds the solutions

e^{ax} \cos bx, \ldots, x^{e-1} e^{ax} \cos bx, \quad e^{ax} \sin bx, \ldots, x^{e-1} e^{ax} \sin bx.

Example 4.4.14 The equation y'''' + 2y'' + y = 0 has the auxiliary equation
(c^2 + 1)^2 = c^4 + 2c^2 + 1 = 0, which has roots ±i, each with multiplicity two. It
follows that a fundamental set of solutions is given by

cos x, x cos x, sin x, x sin x.

Thus the general solution is the function

y = C1 cos x + C2 x cos x + C3 sin x + C4 x sin x.

for some constants Ci .
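
Since the whole method boils down to finding the roots of the auxiliary polynomial, examples are easy to check numerically (my own illustration, reusing the equation from Example 4.4.14):

import numpy as np

# Auxiliary equation of y'''' + 2y'' + y = 0:  c^4 + 0*c^3 + 2*c^2 + 0*c + 1 = 0.
roots = np.roots([1, 0, 2, 0, 1])
print(np.round(roots, 6))    # approximately i, i, -i, -i: the conjugate pair +/-i, each twice

# A repeated pair a +/- bi with a = 0, b = 1 contributes cos x, x cos x, sin x, x sin x,
# which is exactly the fundamental set written down in Example 4.4.14.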

4.5 Undetermined Coefficients


Let L = a_n D^n + a_{n-1} D^{n-1} + \ldots + a_1 D + a_0 and Ly = g(x) an associated
nonhomogeneous linear differential equation with constant coefficients. In
this section we will describe a method that lets us solve these equations
assuming that g(x) is of the following form:

1. A constant, a polynomial, eax , sin ax, cos ax;

2. finite sums and products of these functions.

Example 4.5.1 g(x) = 1, g(x) = (x2 + 1)e2x , g(x) = xex sin x + 1.

Theorem 4.5.2 Let g(x) be of the form just described. Then there's a finite
number of functions f1 , . . . , fk s.t. any derivative g (n) (x) can be written in
the form:
g (n) (x) = c1 f1 + . . . + ck fk
for some constants ci .
Proof. Obvious once you realize what it’s saying.

Note 4.5.3 The assumption is that each fi has only one term in it, so no fi
is of the form say x + 1, but x2 ex is allowed, since there’s only one term in
the expression.

Example 4.5.4 Let g(x) = xex sin x + 1. Any derivative will only contain
terms that are sums of constants times the following

1, ex sin x, ex cos x, xex sin x, xex cos x.

This can be seen since differentiating any function in the list is a linear
combination of functions in the list, so differentiating does not produce any
new types of terms.

Algorithm 4.5.5 (Method of undetermined coefficients) Let Ly = g(x) be
of the required form and assume that f1, ..., fk are as in the theorem.

1. Solve the equation Ly = 0, giving a fundamental set of solutions y1, ..., yn.

2. If any fi is a term in one of the yj, then replace fi by x^m fi, where m is
the lowest integer such that x^m fi is not a term in any of the yj.

3. Set yp = A1 f1 + ... + Ak fk, substitute into Ly = g(x) and solve the
resulting equations for the Ai.

Note 4.5.6 If g(x) = g1 (x) + . . . + gk (x), it’s often easier to solve the equa-
tions Ly = gi (x), i = 1, . . . , k, separately and then add the particular solu-
tions using the superposition principle. E.g. if g(x) = x^2 + sin x, then solve
Ly = x^2 and Ly = sin x and add the solutions found to get a solution for
Ly = x^2 + sin x.

Example 4.5.7 Take the equation y'' + 4y' − 2y = 2x^2 − 3x + 6. Here
g(x) = 2x^2 − 3x + 6 and any derivative of g(x) is a linear combination of the
functions f1 = 1, f2 = x, f3 = x^2. The homogeneous equation y'' + 4y' − 2y = 0
has the auxiliary equation m^2 + 4m − 2 = 0, so the roots are m = −2 ± √6.
Thus,

y1 = e^{−(2+√6)x},  y2 = e^{−(2−√6)x}.

We see that neither y1 nor y2 contains terms that are among the fi. It follows
that the particular solution is of the form

yp = A f3 + B f2 + C f1 = Ax^2 + Bx + C.

Plugging this into the differential equation gives

yp'' + 4yp' − 2yp = 2A + 8Ax + 4B − 2Ax^2 − 2Bx − 2C = 2x^2 − 3x + 6.

It follows that −2A = 2, 8A − 2B = −3, 2A + 4B − 2C = 6, so A = −1,
B = −5/2, C = −9. Thus the general solution to the equation is

y = −x^2 − (5/2)x − 9 + c1 e^{−(2+√6)x} + c2 e^{−(2−√6)x}.

Example 4.5.8 Take the equation y'' − 5y' + 4y = 8e^x. Here g(x) = 8e^x and
all the derivatives are linear combinations of f1 = e^x. The auxiliary equation
of y'' − 5y' + 4y = 0 is m^2 − 5m + 4 = 0, so m = 1, 4. It follows that

y1 = e^x,  y2 = e^{4x}.

Since y1 is a constant times f1, we need to replace f1 by xe^x, so yp = Axe^x.
Plugging into the equation we get

yp'' − 5yp' + 4yp = −3Ae^x = 8e^x,

so A = −8/3. It follows that the general solution is

y = −(8/3)xe^x + c1 e^x + c2 e^{4x}.
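As an optional check, the undetermined-coefficients step above can be carried out with sympy (an assumption: sympy is installed). The sketch below plugs the ansatz Axe^x into the left-hand side and solves for A.

# Verify Example 4.5.8: ansatz A*x*e^x gives A = -8/3.
import sympy as sp

x, A = sp.symbols('x A')
yp = A*x*sp.exp(x)                                   # ansatz from step 2 of the algorithm
lhs = yp.diff(x, 2) - 5*yp.diff(x) + 4*yp            # simplifies to -3*A*e^x
print(sp.solve(sp.Eq(sp.simplify(lhs), 8*sp.exp(x)), A))   # [-8/3]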

4.6 Variation of parameters (optional)
The reason why a specific list of functions was chosen in the previous section
is that their derivatives only contain finitely many different types of terms.
If we chose for example g(x) = ln x, then successive differentiations require
higher and higher powers 1/x^n, so there's no finite list of terms
that can express every derivative. The method of this section generally requires a lot of
computation, as we will soon see, but it's the most general method for finding
particular solutions to nonhomogeneous equations that we will describe.
Let L = D^n + a_{n−1}(x)D^{n−1} + ... + a1(x)D + a0(x); thus we only assume that
the highest degree coefficient is 1. Assume that we can solve the homogeneous
equation Ly = 0, so that we find a fundamental set of solutions

y1, ..., yn.
Now write yp = u1(x)y1(x) + ... + un(x)yn(x), where the ui are unknown
functions. Substituting yp into Ly = g(x) gives the following system of
linear equations:

y1 u1' + y2 u2' + ... + yn un' = 0
y1' u1' + y2' u2' + ... + yn' un' = 0
...
y1^(n-2) u1' + y2^(n-2) u2' + ... + yn^(n-2) un' = 0
y1^(n-1) u1' + y2^(n-1) u2' + ... + yn^(n-1) un' = g(x).

We can rewrite this in matrix form as

[ y1         y2         ...  yn         ] [ u1' ]   [ 0    ]
[ y1'        y2'        ...  yn'        ] [ u2' ] = [ 0    ]
[ ...                         ...       ] [ ... ]   [ ...  ]
[ y1^(n-1)   y2^(n-1)   ...  yn^(n-1)   ] [ un' ]   [ g(x) ]
Using Cramer's rule (which we haven't talked about, so you can just take
this as given), this system has the following solutions:

u_k' = W_k / W,

where W is the Wronskian determinant

W = det [ y1         y2         ...  yn         ]
        [ y1'        y2'        ...  yn'        ]
        [ ...                         ...       ]
        [ y1^(n-1)   y2^(n-1)   ...  yn^(n-1)   ]

and W_k is the determinant we get by replacing the kth column of W with
the constant vector of the system. This means that

W_1 = det [ 0     y2         ...  yn         ]      W_2 = det [ y1         0     ...  yn         ]
          [ 0     y2'        ...  yn'        ]                [ y1'        0     ...  yn'        ]
          [ ...                    ...       ]                [ ...                     ...      ]
          [ g(x)  y2^(n-1)   ...  yn^(n-1)   ]                [ y1^(n-1)   g(x)  ...  yn^(n-1)   ]

and so on.

To finally find the ui (x) we integrate the functions solved from this system.
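For the second-order case the whole recipe fits into a few lines of sympy. The sketch below (an optional aside, assuming sympy is available) applies the Cramer's-rule formulas to y'' + y = sec x, whose homogeneous solutions are cos x and sin x.

# Variation of parameters for n = 2, applied to y'' + y = sec(x).
import sympy as sp

x = sp.symbols('x')
y1, y2, g = sp.cos(x), sp.sin(x), sp.sec(x)

W  = sp.Matrix([[y1, y2], [y1.diff(x), y2.diff(x)]]).det()   # Wronskian
W1 = sp.Matrix([[0, y2], [g, y2.diff(x)]]).det()             # 1st column replaced
W2 = sp.Matrix([[y1, 0], [y1.diff(x), g]]).det()             # 2nd column replaced

u1, u2 = sp.integrate(W1/W, x), sp.integrate(W2/W, x)
yp = sp.simplify(u1*y1 + u2*y2)
print(yp)                                            # cos(x)*log(cos(x)) + x*sin(x)
print(sp.simplify(yp.diff(x, 2) + yp - g))           # 0, so yp really is a particular solution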

4.7 Cauchy-Euler equations


Definition 4.7.1 Let L = an xn Dn + . . . + a1 xD + a0 . Then Ly = g(x) is
called a Cauchy-Euler equation, where D = d/dx.

Example 4.7.2 2x^2 y'' − xy' + 2y = 0 is a homogeneous Cauchy-Euler equation.
The Cauchy-Euler equation can be transformed into a linear differential
equation with constant coefficients using a substitution. If we substitute
x = e^z, then

y' = dy/dx = (dy/dz)(dz/dx) = (dy/dz) d(ln x)/dx = (dy/dz)(1/x) = e^{−z} dy/dz,

and multiplying by x = e^z this simplifies to

x y' = dy/dz.

If we write D = d/dz, then a longer computation with the chain rule yields

x^n y^(n) = D(D − 1) · · · (D − n + 1) y.

Combining everything, and renaming the unknown function y2, it follows
that the differential equation Ly = g(x) can be written as

(a_n D(D−1) · · · (D−n+1) + a_{n−1} D(D−1) · · · (D−n+2) + ... + a1 D + a0) y2 = g(e^z),

which can be written as

L y2 = g(e^z),

where

L = a_n D(D−1) · · · (D−n+1) + a_{n−1} D(D−1) · · · (D−n+2) + ... + a1 D + a0.

Note that this differential equation gives us a solution y2(z) = f(z), and the
answer to the original equation is then y = y2(ln x), since e^z = x.

Example 4.7.3 We solve x^2 y'' − 2xy' − 4y = 0. Using the formula above we
get

L = D(D − 1) − 2D − 4 = D^2 − 3D − 4,

so that our new equation in terms of the variable z (where e^z = x) is

y2'' − 3y2' − 4y2 = 0.

The auxiliary polynomial has roots −1 and 4, so that

y2 = C1 e^{−z} + C2 e^{4z}.

Substituting e^z = x, i.e. z = ln x, we find that the solution is

y = C1 e^{−ln x} + C2 e^{4 ln x} = C1 x^{−1} + C2 x^4.

Example 4.7.4 For a nonhomogeneous Cauchy-Euler equation we proceed
as follows. If we solve x^2 y'' − 2xy' − 4y = x, then the substitution e^z = x
gives, following what we did above,

y2'' − 3y2' − 4y2 = e^z.

A particular solution is of the form Ae^z, so that

Ae^z − 3Ae^z − 4Ae^z = e^z,

which gives A = −1/6. Thus the general solution is

y2 = −e^z/6 + C1 e^{−z} + C2 e^{4z},

so that

y = −x/6 + C1 x^{−1} + C2 x^4.
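As a quick sanity check (optional, and assuming sympy is available), sympy's ODE solver recognizes Euler equations directly and should reproduce the same answer up to the naming of the constants.

# Check Example 4.7.4 with sympy.
import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')
ode = sp.Eq(x**2*y(x).diff(x, 2) - 2*x*y(x).diff(x) - 4*y(x), x)
print(sp.dsolve(ode, y(x)))
# expected form: y(x) = C1/x + C2*x**4 - x/6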

Figure 1: On the left, the spring with no mass attached; in the middle, the
spring with a mass attached at equilibrium; on the right, the spring with the
mass stretched a distance x from equilibrium.

4.8 Linear Models


In physics, Hooke's law states that if we stretch a spring a distance x, then
the spring exerts a restoring force F = −kx pointing towards the equilibrium
position of the spring. The constant k is called the spring constant. We will
use what we have developed in the previous sections to analyze some basic
spring-mass systems, since these systems correspond to a second-order linear
differential equation with constant coefficients.

4.8.1 Free undamped motion


We start by analyzing the case where the system has no friction. Thus,
assume that a weight of mass m is hanging from the spring and it has been
displaced by x from the equilibrium point as in figure 1.
Hooke’s law states that the force acting on our mass is

F = −kx,

but since we can also measure force by measuring the acceleration of our
mass, we know that

F = ma = m d^2x/dt^2 = −kx.

It follows that the movement of our mass is governed by the differential
equation

m x'' + k x = 0.

This equation is usually expressed in the form

x'' + ω^2 x = 0,   ω^2 = k/m,

so the auxiliary equation is c^2 + ω^2 = 0, i.e. c = ±ωi, and the solution to the
equation is

x(t) = C1 cos ωt + C2 sin ωt.
The previous equation gives a full solution, but the formula can be
simplified into a form from which the amplitude, period and frequency are easier
to see. If we set A = √(C1^2 + C2^2) and let

tan φ = C1/C2,

then we may write

x(t) = A sin(ωt + φ),

from which it's apparent that the amplitude is A, the frequency is f = ω/2π and
the period is T = 2π/ω. The motion defined by the previous equation is
called simple harmonic motion or free undamped motion.

Example 4.8.1 A mass of 2 kg stretches a spring by 5 cm. If we release
the mass at time t = 0 from a point 10 cm below the equilibrium position
with an upward velocity of 0.5 m/s, determine the equation of free motion.
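One way to work this example is sketched below in plain Python. Two conventions are assumed (they are not stated in the problem): g = 9.8 m/s^2, and displacement below the equilibrium position is counted as positive.

# Example 4.8.1, worked numerically under the stated assumptions.
import math

m, stretch, g = 2.0, 0.05, 9.8
k = m*g/stretch              # at equilibrium mg = k*stretch, so k = 392 N/m
omega = math.sqrt(k/m)       # omega = 14 rad/s

x0, v0 = 0.10, -0.5          # 10 cm below equilibrium, moving 0.5 m/s upward
C1, C2 = x0, v0/omega        # since x(t) = C1 cos(omega t) + C2 sin(omega t)
print(omega, C1, C2)         # 14.0, 0.1, -1/28: x(t) = 0.1 cos 14t - (1/28) sin 14t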

4.8.2 Free damped motion


Since the lack of friction is quite unrealistic, we can fix that by adding a
damping term, caused for example by air resistance on the moving object.
Such a resistance is directly proportional to the velocity of the moving object,
so our differential equation is of the form

m x'' = −kx − βx'   ⇔   m x'' + βx' + kx = 0.

We define ω^2 = k/m as before and set 2λ = β/m. Dividing our equation by
m, we may then write

x'' + 2λx' + ω^2 x = 0.
The roots of the auxiliary polynomial are then given by

−λ ± √(λ^2 − ω^2),

so the solution of our system is covered by three cases, which are shown
in figure 2.

Case λ^2 − ω^2 > 0: In this case the system is overdamped,

x(t) = e^{−λt}(C1 e^{√(λ^2−ω^2) t} + C2 e^{−√(λ^2−ω^2) t}),

which does not represent an oscillatory motion. This happens when the
damping force is big compared to the spring constant.
Case λ^2 − ω^2 = 0: In this case the system is critically damped,

x(t) = e^{−λt}(C1 + C2 t),

and any slight decrease in the damping force results in oscillatory motion.

Case λ^2 − ω^2 < 0: In this case the system is underdamped; the roots are now
complex, so

x(t) = e^{−λt}(C1 cos √(ω^2−λ^2) t + C2 sin √(ω^2−λ^2) t)

and the movement is an oscillatory motion with decaying amplitude.


Just like in the free undamped case we may write

x(t) = A e^{−λt} sin(√(ω^2−λ^2) t + φ),   tan φ = C1/C2.

Here A is called the damped amplitude, 2π/√(ω^2−λ^2) the quasi period
and √(ω^2−λ^2)/2π the quasi frequency.


Figure 2: β = 0: no damping, β = 1: underdamped, β = 4: critically
damped, β = 8: overdamped. Here m = 2 = k, so ω = 1 and λ = β/4.
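A plot like figure 2 can be reproduced numerically. The sketch below (an optional aside, assuming scipy and matplotlib are available) integrates 2x'' + βx' + 2x = 0 for the same four values of β, starting from x(0) = 1, x'(0) = 0.

# Reproduce the damping regimes of figure 2 with m = k = 2.
import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt

def rhs(t, y, beta):
    x, v = y
    return [v, -(beta/2.0)*v - x]        # x'' = -(beta/m) x' - (k/m) x, m = k = 2

t = np.linspace(0, 18, 600)
for beta in (0, 1, 4, 8):                # no damping, under-, critically, overdamped
    sol = solve_ivp(rhs, (0, 18), [1.0, 0.0], t_eval=t, args=(beta,))
    plt.plot(sol.t, sol.y[0], label=f"beta = {beta}")
plt.legend(); plt.xlabel("t"); plt.ylabel("x(t)"); plt.show()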

4.8.3 Driven motion


Finally, we have the last case, in which some external force f(t) is applied to
the mass at each moment t. This yields the differential equation

m x'' = −kx − βx' + f(t)

and after rearranging and scaling f(t), this may be written as

x'' + 2λx' + ω^2 x = F(t),

where F (t) = f (t)/m. In this case the system can behave in any continuous
fashion, since by picking f (t) appropriately, we may essentially cause the
spring mass system to move in any way we want.
From the general theory in the previous sections, we know that any solu-
tion is of the form
x(t) = xp (t) + xh (t),
where xp(t) is a particular solution and xh(t) is a solution to the homogeneous
part, which represents a solution of the corresponding free damped case. The
solution xh(t) is called the transient solution and the function xp(t) is called
the steady-state term. The name for the latter comes from the fact that when
damping is present the transient solution dies out as t → ∞, so after a long
time the motion settles into the steady-state term. Worth noting is that
imposing initial conditions on the system only affects the transient solution.

Example 4.8.2 The interesting case is when the driving force is periodic, so
we'll analyze the case when there's no damping. In this case the differential
equation becomes

x'' + ω^2 x = F sin αt.

This equation has a particular solution

xp(t) = F/(ω^2 − α^2) sin αt

and the general solution

x(t) = C1 cos ωt + C2 sin ωt + F/(ω^2 − α^2) sin αt.

If we assume that the system starts from the equilibrium position, i.e. x(0) =
0 = x'(0), then

C1 = 0,   C2 = −αF/(ω(ω^2 − α^2)),

so that

x(t) = F/(ω(ω^2 − α^2)) (−α sin ωt + ω sin αt).

These formulas only make sense if α ≠ ω, in other words if the frequency of
the harmonic oscillator is different from the frequency of the driving force.
If the frequency of the driving force equals the frequency of the oscillator,
then one would expect the amplitude to start growing towards infinity. This
is analogous to a kid on a swing who keeps adding more and more speed
to the swing in sync with its frequency. To find the function modeling this,
one can compute the following limit by applying l'Hopital's rule, which gives

x(t) = lim_{α→ω} F (−α sin ωt + ω sin αt)/(ω(ω^2 − α^2)) = F/(2ω^2) sin ωt − F/(2ω) t cos ωt.

As expected, the second term shows that the amplitude goes to infinity.
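The limit above can be checked symbolically. A minimal sketch, assuming sympy is available (the printed result may be rearranged algebraically):

# The pure-resonance limit from Example 4.8.2.
import sympy as sp

t, F, omega, alpha = sp.symbols('t F omega alpha', positive=True)
x = F*(-alpha*sp.sin(omega*t) + omega*sp.sin(alpha*t))/(omega*(omega**2 - alpha**2))
print(sp.simplify(sp.limit(x, alpha, omega)))
# expected: F*sin(omega*t)/(2*omega**2) - F*t*cos(omega*t)/(2*omega)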

5 Systems of linear ODEs
5.1 Basic definitions
Definition 5.1.1 A first-order system of differential equations is a system of
the form

dx1/dt = g1(t, x1, ..., xn)
...
dxn/dt = gn(t, x1, ..., xn),

so in contrast to a standard differential equation, the derivative of each func-


tion is defined in terms of all the other functions.

Definition 5.1.2 A linear first-order system of differential equations is a
system of the form

dx1/dt = a11(t)x1 + ... + a1n(t)xn + f1(t)
...
dxn/dt = an1(t)x1 + ... + ann(t)xn + fn(t).

We can write this in matrix form by setting

A(t) = [ a11(t) ... a1n(t) ]     X = [ x1(t) ]     F(t) = [ f1(t) ]
       [  ...        ...   ]         [  ...  ]            [  ...  ]
       [ an1(t) ... ann(t) ]         [ xn(t) ]            [ fn(t) ]

so that the system can be written as

X' = AX + F,

where the derivative of a matrix is defined by differentiating each element of
the matrix.

Definition 5.1.3 A first-order linear system X' = AX + F is homogeneous
if F = 0; otherwise it's nonhomogeneous.

Example 5.1.4 The following nonhomogeneous system

x1' = x1 + 2x2 + 1
x2' = 5x1 − x2

can be written as

[ x1' ]   [ 1  2 ] [ x1 ]   [ 1 ]
[ x2' ] = [ 5 −1 ] [ x2 ] + [ 0 ]

Definition 5.1.5 A solution vector for X' = AX + F on an interval I is any
column vector

X = [ x1(t) ]
    [  ...  ]
    [ xn(t) ]

that satisfies the system.
Many results from differential equations will translate directly to the case
of systems of differential equations and translating them will be our next
step.

Definition 5.1.6 Let t0 denote a point on an interval I. An initial-value
problem for a linear system X' = AX + F is a problem of the form

X' = AX + F,   X(t0) = X0,

where X0 is some constant vector.

Example 5.1.7 Using the system in the previous example and letting t0 = 0,
we could define

[ x1' ]   [ 1  2 ] [ x1 ]   [ 1 ]        [ x1(0) ]   [  1 ]
[ x2' ] = [ 5 −1 ] [ x2 ] + [ 0 ],       [ x2(0) ] = [ −1 ]

Theorem 5.1.8 Let A(t) and F (t) be continuous on a common interval I


that contains the point t0 . Then the initial-value problem
X 0 = AX + F, X(t0 ) = X0 ,
has a unique solution.

Theorem 5.1.9 (superposition principle) Let X1 , . . . , Xk be solutions to the


homogeneous linear system
X 0 = AX
on the interval I. Then X = c1 X1 + . . . + ck Xk is also a solution.

Proof. Since we differentiate element by element, we have

(c1 X1 + ... + ck Xk)' = c1 X1' + ... + ck Xk'
                       = c1 A X1 + ... + ck A Xk
                       = A(c1 X1 + ... + ck Xk).

Definition 5.1.10 A set of solutions X1, ..., Xk is called linearly dependent
on an interval I if there exist constants c1, ..., ck, not all zero, s.t.

c1 X1 + c2 X2 + ... + ck Xk = 0

for every t ∈ I. If no such constants exist, then the solutions are called
linearly independent.

Theorem 5.1.11 Let

X1 = [ x11 ]    X2 = [ x12 ]    ...    Xn = [ x1n ]
     [ ... ]         [ ... ]                [ ... ]
     [ xn1 ]         [ xn2 ]                [ xnn ]

be n solution vectors to X' = AX on an interval I. Then X1, ..., Xn are
linearly independent if and only if for some t ∈ I,

W(X1, ..., Xn) = det [ x11(t) ... x1n(t) ]
                     [  ...        ...   ]  ≠ 0.
                     [ xn1(t) ... xnn(t) ]

Definition 5.1.12 Any set X1, ..., Xn of linearly independent solution
vectors to a system X' = AX will be called a fundamental set of solutions.

Example 5.1.13 The vectors X1 = (1, −1)^T e^{−2t} and X2 = (3, 5)^T e^{6t} are
solutions of

X' = [ 1  3 ] X
     [ 5  3 ]

and

W(X1, X2) = det [  e^{−2t}   3e^{6t} ] = 8e^{4t} ≠ 0,
                [ −e^{−2t}   5e^{6t} ]

so X1, X2 are linearly independent solutions and form a fundamental set of
solutions.

Theorem 5.1.14 There exists a fundamental set of solutions for X 0 = AX


on any interval on which the elements of A(t) are continuous.

Theorem 5.1.15 (general solution) Let X1 , . . . , Xn be a fundamental set


of solutions to X 0 = AX on an interval I. Then any other solution can be
written as
X = c1 X 1 + . . . + cn X n
for some choice of constants c1 , . . . , cn .

Theorem 5.1.16 (general solution to a nonhomogeneous system) Let Xp
be a particular solution to the nonhomogeneous system X' = AX + F on an
interval I. If X1, ..., Xn is a fundamental set of solutions to X' = AX on
the interval I, then every solution to X' = AX + F is of the form

X = Xp + c1 X1 + ... + cn Xn.

Proof. The proof is the same as in the differential equation case: if Xp1 and
Xp2 are solutions to X' = AX + F, then Xp1 − Xp2 is a solution to X' = AX,
and we proceed as before.
 
Example 5.1.17 Xp = (3t − 4, −5t + 6)^T is a particular solution to

X' = [ 1  3 ] X + [ 12t − 11 ]
     [ 5  3 ]     [    −3    ]

We know from the previous example that X1 = (1, −1)^T e^{−2t} and
X2 = (3, 5)^T e^{6t} are a fundamental set of solutions to the corresponding
homogeneous system. It follows that the general solution is

X = [ 3t − 4  ] + C1 e^{−2t} [  1 ] + C2 e^{6t} [ 3 ]
    [ −5t + 6 ]              [ −1 ]             [ 5 ]

5.2 Homogeneous linear systems


Solving linear systems X 0 = AX, where A is a constant matrix, essentially
reduces to computing eigenvalues and eigenvectors of the coefficient matrix
A. Each eigenvalue contributes a set of functions to the fundamental set and
we will start by listing the theorems that describe this in increasing order
of difficulty. Throughout this section we will assume that A is a constant
matrix.

Theorem 5.2.1 Let X' = AX be a first-order homogeneous linear system
and λ an eigenvalue of algebraic multiplicity k and geometric multiplicity g.
If g = k, so that we have linearly independent eigenvectors x1, ..., xk, then
the functions

e^{λt} x1, ..., e^{λt} xk

are linearly independent solutions.
 
Example 5.2.2 If we look at the system

X' = [ 2  3 ] X,
     [ 2  1 ]

then A has eigenvalues λ1 = −1 and λ2 = 4. These have algebraic multiplicity 1, so the
geometric multiplicity is also 1. By the theorem each eigenvalue contributes
one solution to the fundamental set, which corresponds to the eigenvector
for that eigenvalue. We get the eigenvectors

x1 = [  1 ]    x2 = [ 3 ]
     [ −1 ]         [ 2 ]

and this gives us the general solution

X = c1 e^{−t} [  1 ] + c2 e^{4t} [ 3 ]
              [ −1 ]             [ 2 ]
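These eigenvalues and eigenvectors are easy to cross-check numerically. A small sketch, assuming numpy is installed (numpy normalizes its eigenvectors, so the columns it prints are scalar multiples of (1, −1) and (3, 2)):

# Numerical check of Example 5.2.2.
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)    # [ 4. -1.]  (order may differ)
print(eigvecs)    # columns proportional to (3, 2) and (1, -1)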



Example 5.2.3 Let us look at the system

X' = [  1  −2   2 ]
     [ −2   1  −2 ] X.
     [  2  −2   1 ]

The matrix has eigenvalues λ1 = −1 and λ2 = 5, and the former has algebraic
multiplicity 2. Computing the corresponding null spaces for each eigenvalue,
we first find the following linearly independent eigenvectors for −1,

x1 = [ 1 ]    x2 = [ 0 ]
     [ 1 ]         [ 1 ]
     [ 0 ]         [ 1 ]

so −1 has geometric multiplicity 2. For the eigenvalue 5, we find the eigenvector

x3 = [  1 ]
     [ −1 ]
     [  1 ]

Using the theorem, we then get the general solution

X = c1 e^{−t} [ 1 ] + c2 e^{−t} [ 0 ] + c3 e^{5t} [  1 ]
              [ 1 ]             [ 1 ]             [ −1 ]
              [ 0 ]             [ 1 ]             [  1 ]
If the geometric multiplicity does not equal the algebraic multiplicity, then
the situation is more complicated. Assume that we are given the system

X' = AX

and that λ is an eigenvalue of geometric multiplicity g < k, where k is the
algebraic multiplicity. Then we can find the following linearly independent
eigenvectors of A corresponding to the eigenvalue λ:

x1, ..., xg.

We need to find k solutions to the system corresponding to this eigenvalue
in order to find a fundamental set. Since we can't find enough eigenvectors,
we have to look for something else.

I will only cover the cases here when g = 1 and k = 2, 3. The general
setup requires something called generalized eigenvectors, and while it gives a
nice formula, it requires slightly more linear algebra than we have time
to cover. The g = 1, k = 2 case is the following:

Theorem 5.2.4 Given the system X' = AX, assume that λ is an eigenvalue
of algebraic multiplicity 2 and geometric multiplicity 1. If x is an eigenvector
for λ, then the equation

(A − λI)y = x

has a nonzero solution y, and the solutions to X' = AX corresponding to the
eigenvalue λ are

e^{λt} x,   e^{λt}(xt + y).

Theorem 5.2.5 Given the system X' = AX, assume that λ is an eigenvalue
of algebraic multiplicity 3 and geometric multiplicity 1. If x is an eigenvector
for λ, then the equation

(A − λI)y = x

has a nonzero solution y, the equation

(A − λI)z = y

has a nonzero solution z, and the solutions to X' = AX corresponding to the
eigenvalue λ are

e^{λt} x,   e^{λt}(xt + y),   e^{λt}(x t^2/2 + ty + z).

 
Example 5.2.6 The system

X' = [ 2  1  6 ]
     [ 0  2  5 ] X
     [ 0  0  2 ]

has only one eigenvalue, 2, of algebraic multiplicity 3 and geometric multiplicity 1.
An eigenvector is x = (1, 0, 0)^T. The system (A − 2I)y = x has the nonzero
solution y = (0, 1, 0)^T, and the system (A − 2I)z = y has the nonzero solution
z = (0, −6/5, 1/5)^T. At this point the process stops, since we have found 3
vectors, matching the algebraic multiplicity. The general solution to the
equation is thus

X = c1 e^{2t}(1, 0, 0)^T + c2 e^{2t}[t(1, 0, 0)^T + (0, 1, 0)^T]
    + c3 e^{2t}[(t^2/2)(1, 0, 0)^T + t(0, 1, 0)^T + (0, −6/5, 1/5)^T].
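The chain x → y → z can be reproduced numerically. Since A − 2I is singular, the sketch below (assuming numpy) uses a least-squares solve, which on these exactly solvable systems returns the minimum-norm solution; the answers agree with the vectors above up to adding multiples of x.

# The generalized-eigenvector chain of Example 5.2.6.
import numpy as np

A = np.array([[2.0, 1.0, 6.0],
              [0.0, 2.0, 5.0],
              [0.0, 0.0, 2.0]])
N = A - 2*np.eye(3)                        # A - 2I
x = np.array([1.0, 0.0, 0.0])              # eigenvector
y, *_ = np.linalg.lstsq(N, x, rcond=None)  # solves (A - 2I) y = x
z, *_ = np.linalg.lstsq(N, y, rcond=None)  # solves (A - 2I) z = y
print(y)   # [0. 1. 0.]
print(z)   # approximately [0., -1.2, 0.2], i.e. (0, -6/5, 1/5)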

5.3 Homogeneous linear systems – complex eigenvalues


Since not all polynomials with real coefficients have real roots, it might
happen that the coefficient matrix has no real eigenvalues. Take for example the
system

X' = [ 0  −1 ] X.
     [ 1   0 ]

The method of the previous section does not work, since the characteristic
polynomial of the coefficient matrix is λ^2 + 1, which has no real roots. The
way to solve this problem is to extend our definition of an eigenvalue to
include complex roots.

Definition 5.3.1 A complex eigenvalue of a matrix A with real entries is
any complex number λ that is a root of the characteristic polynomial. Any
vector x with complex entries is a corresponding complex eigenvector if

Ax = λx.

Example 5.3.2 The matrix A = [ 0 −1; 1 0 ] has eigenvalues λ = ±i and

[ 0  −1 ] [ 1 ]   [ −i ]        [ 1 ]
[ 1   0 ] [ i ] = [  1 ] = −i · [ i ]

Thus if λ = −i and x = (1, i), then Ax = −ix.

WARNING 5.3.3 One major difference between real and complex eigenvectors
is that in the definition of linear independence we have to allow complex
scalars. For example, if we let x1 = (1, i) and x2 = (i, −1), then

c1 x1 + c2 x2 = 0

if we let c1 = i and c2 = −1. Thus, as complex vectors, these vectors are
linearly dependent, even though no nontrivial linear relation exists if we only
allow real values for c1 and c2.
It turns out that everything we know about real eigenvectors translates
directly to complex eigenvectors. The multiplicity of a complex root of the
characteristic polynomial is the algebraic multiplicity, and we can compute
ranks of matrices with complex entries just as we do for real matrices; we
just need to take into account that we have to do complex arithmetic during
the Gaussian elimination. When choosing linearly independent eigenvectors
we have to be careful to choose them so that they are linearly independent
as complex vectors.

Theorem 5.3.4 If λ is an eigenvalue of a real matrix A, then the conjugate \bar{λ}
is also an eigenvalue.

Proof. Complex roots of real polynomials occur in conjugate pairs, so if λ is
a root of the characteristic polynomial, then so is \bar{λ}.

Theorem 5.3.5 If λ is an eigenvalue of a real matrix A and x is an eigenvector,
then \bar{x} is an eigenvector for \bar{λ}.

Proof. Since Ax = λx, we can take the complex conjugate of both sides and,
using that A has real entries, get

A\bar{x} = \bar{A}\bar{x} = \overline{Ax} = \overline{λx} = \bar{λ}\bar{x}.

The previous result tells us that eigenvectors of λ and \bar{λ} occur in pairs.
This is what we need in order to express solutions to differential equations
with complex eigenvalues. Thus, let X' = AX be a linear system and λ an
eigenvalue of A with eigenvector x. Then, writing λ = a + bi, we know that

e^{λt} x = e^{at}(cos bt + i sin bt) x

and

e^{\bar{λ}t} \bar{x} = e^{at}(cos bt − i sin bt) \bar{x}

are solutions to the system. As with linear differential equations and complex
roots of the auxiliary polynomial, these are complex valued functions. Using
the superposition principle, we get the solutions

X1 = (1/2)(e^{λt} x + e^{\bar{λ}t} \bar{x}) = (1/2)(x + \bar{x}) e^{at} cos bt − (i/2)(−x + \bar{x}) e^{at} sin bt

and

X2 = (i/2)(−e^{λt} x + e^{\bar{λ}t} \bar{x}) = (i/2)(−x + \bar{x}) e^{at} cos bt + (1/2)(x + \bar{x}) e^{at} sin bt.

If we now define z = (1/2)(x + \bar{x}) = Re(x) and w = (i/2)(−x + \bar{x}) = Im(x), we get
the following theorem.

Theorem 5.3.6 Let λ = a + ib be an eigenvalue of A of algebraic and
geometric multiplicity 1 and X' = AX a linear system. Then the fundamental
set of solutions includes the following solutions:

X1 = e^{at}(z cos bt − w sin bt)

and

X2 = e^{at}(w cos bt + z sin bt).

Note 5.3.7 We will not cover the case when a complex eigenvalue has geo-
metric or algebraic multiplicity more than 1, since a nice uniform presentation
of that material requires more linear algebra than what we have covered.
 
Example 5.3.8 Let's solve the system

X' = [  2   8 ] X.
     [ −1  −2 ]

The characteristic polynomial is λ^2 + 4, so the eigenvalues are ±2i. Since
eigenvectors come in conjugate pairs, we only need to find the eigenvector for
the eigenvalue 2i. Thus, we need to solve

[ 2 − 2i      8    | 0 ]  ⇒  [ −1   −2 − 2i | 0 ]  ⇒  [ −1   −2 − 2i | 0 ]
[  −1     −2 − 2i  | 0 ]     [ 2 − 2i    8  | 0 ]     [  0       0   | 0 ]

The last system gives x = −(2 + 2i)y, so one eigenvector is x = (2 + 2i, −1) =
(2, −1) + i(2, 0). It follows that \bar{x} = (2 − 2i, −1) = (2, −1) + i(−2, 0) and
that

z = Re x = (2, −1),   w = Im x = (2, 0).

The general solution is then

X = c1 (z cos 2t − w sin 2t) + c2 (w cos 2t + z sin 2t)
  = c1 [ 2 cos 2t − 2 sin 2t ] + c2 [ 2 cos 2t + 2 sin 2t ]
       [       − cos 2t      ]      [       − sin 2t      ]

5.4 Solutions by diagonalization


By our definition of a system of linear differential equations, the following is
a linear system:

x1' = a1 x1
...
xn' = a_n xn.

At the same time, we see that the system is not a "real" system, since it's
essentially just a list of independent linear first-order differential equations,
which we know how to solve separately as simply xi = ci e^{ai t}. The system
above can be written in matrix form as

[ x1' ]   [ a1      0   ] [ x1 ]
[ ... ] = [     ...     ] [ ... ]
[ xn' ]   [ 0       a_n ] [ xn ]

Conversely, we see that if a system can be written as X' = AX, then whenever
A is a diagonal matrix, the same phenomenon occurs.

Definition 5.4.1 A system X' = AX, where A is a diagonal matrix, is called
uncoupled, i.e. xi' can be expressed solely in terms of xi.

Assume that we are given the system X' = AX and that we may
diagonalize A = PDP^{−1}. Write X = PY. Then the system becomes

X' = PY' = PDP^{−1}X = PDY,

and multiplying on the left by P^{−1} (P is invertible), we get Y' = DY. If D has the
elements λ1, ..., λn on the diagonal, then

Y = [ c1 e^{λ1 t} ]
    [     ...     ]
    [ cn e^{λn t} ]
and we get X by multiplying with P .

 
Example 5.4.2 The coefficient matrix of the system

X' = [ −2  −1  8 ]
     [  0  −3  8 ] X
     [  0  −4  9 ]

can be diagonalized as

A = [ 1  2  1 ] [ −2  0  0 ] [ 1  2  1 ]^{−1}
    [ 0  2  1 ] [  0  1  0 ] [ 0  2  1 ]
    [ 0  1  1 ] [  0  0  5 ] [ 0  1  1 ]

so that

Y = [ c1 e^{−2t} ]
    [ c2 e^{t}   ]
    [ c3 e^{5t}  ]

It follows that

X = PY = [ 1  2  1 ] [ c1 e^{−2t} ]   [ c1 e^{−2t} + 2c2 e^t + c3 e^{5t} ]
         [ 0  2  1 ] [ c2 e^{t}   ] = [ 2c2 e^t + c3 e^{5t}              ]
         [ 0  1  1 ] [ c3 e^{5t}  ]   [ c2 e^t + c3 e^{5t}               ]
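The diagonalization and the resulting solution can be checked symbolically. A sketch assuming sympy (the eigenvalue order returned by diagonalize may differ, but the verification works regardless, since Y is built from D itself):

# Solve Example 5.4.2 by diagonalization and verify X' = AX.
import sympy as sp

t, c1, c2, c3 = sp.symbols('t c1 c2 c3')
A = sp.Matrix([[-2, -1, 8],
               [ 0, -3, 8],
               [ 0, -4, 9]])
P, D = A.diagonalize()                  # columns of P are eigenvectors, D is diagonal
Y = sp.Matrix([c1*sp.exp(D[0, 0]*t), c2*sp.exp(D[1, 1]*t), c3*sp.exp(D[2, 2]*t)])
X = P*Y
print(D.diagonal())                     # the eigenvalues -2, 1, 5
print(sp.simplify(X.diff(t) - A*X))     # the zero vector, so X solves X' = AX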

6 Series solutions to ODEs


6.1 Solutions around ordinary points
Definition 6.1.1 A second-order homogeneous linear differential equation
is in standard form if it’s written as
y 00 + p(x)y 0 + q(x)y = 0.

Definition 6.1.2 A point x0 is an ordinary point if p(x) and q(x) are analytic
at x0 , meaning they can be written as power series around the point x0 .
Otherwise it’s called a singular point.

Example 6.1.3 xy'' + x^2 y' + y = 0 is not in standard form, but dividing by
x we can write it as

y'' + xy' + y/x = 0,

which is in standard form. Here 0 is a singular point and all other points are
ordinary points. To solve an equation like this, one has to analyze separately
solutions on intervals which contain the point 0, since the latter equation is
not defined at 0.

The method of series solutions works as follows. For simplicity, assume we
are looking at a solution around the point 0 and that p(x) and q(x) can both
be expressed as power series around 0. A power series can be differentiated
term by term, so if y = Σ_{n=0}^∞ a_n x^n, then

y' = Σ_{n=1}^∞ n a_n x^{n−1},   y'' = Σ_{n=2}^∞ n(n−1) a_n x^{n−2}.

If we plug this into the equation above we get

Σ_{n=2}^∞ n(n−1) a_n x^{n−2} + p(x) Σ_{n=1}^∞ n a_n x^{n−1} + q(x) Σ_{n=0}^∞ a_n x^n = 0.

If we plug in the power series representations for p(x) and q(x) on the left and
simplify, then the left-hand side becomes a power series all of whose coefficients
need to be zero, since the right side is zero. This gives us equations
for the a_n, which let us compute a power series for the solution.

Example 6.1.4 Let's analyze the equation y' = cy. If we plug in a power
series for y, we get

Σ_{n=1}^∞ n a_n x^{n−1} = Σ_{n=0}^∞ c a_n x^n.

On the left side the coefficient of x^n is (n + 1)a_{n+1} while on the right side it's
c a_n. Since both sides are equal we get

(n + 1)a_{n+1} = c a_n   ⇒   a_{n+1} = c a_n / (n + 1),

which implies that a_{n+1} = c^{n+1} a0 / (n + 1)!, or a_n = c^n a0 / n! for n > 0, and the formula also
holds for n = 0, since c^0 = 1 and 0! = 1. Thus,

y = Σ_{n=0}^∞ (c^n a0 / n!) x^n = a0 Σ_{n=0}^∞ (cx)^n / n! = a0 e^{cx},

which, as we know, is the general solution to y' = cy.
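The recursion can be checked directly in plain Python: starting from any a0 and iterating a_{n+1} = c a_n/(n+1) reproduces the Taylor coefficients of a0 e^{cx}.

# Check the recursion from Example 6.1.4 against c^n * a0 / n!.
import math

c, a0, N = 3.0, 2.0, 10
a = [a0]
for n in range(N):
    a.append(c*a[n]/(n + 1))          # (n+1) a_{n+1} = c a_n

for n in range(N + 1):
    print(n, a[n], a0*c**n/math.factorial(n))   # the two columns agree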

Theorem 6.1.5 If x = x0 is an ordinary point of y 00 +p(x)y 0 +q(x)y = 0, then


the power series method always gives us two linearly independent solutions
around x0 and the radius of convergence of both power series is at least the
distance to the closest singular point.


6.2 Solutions around singular points


Definition 6.2.1 A singular point x0 of y'' + p(x)y' + q(x)y = 0 is a regular
singular point if (x − x0)p(x) and (x − x0)^2 q(x) are analytic at x0. Otherwise
it's an irregular singular point. We will only cover the case of regular singular
points.

To cover the case of regular singular points, instead of substituting
y = Σ_{n=0}^∞ a_n (x − x0)^n, we will substitute y = (x − x0)^r Σ_{n=0}^∞ a_n (x − x0)^n. The
following example shows how the method works.

Example 0 is a regular singular point of 3x00 + y 0 − y = 0, so we plug


P∞6.2.2 n+r
in y = n=0 an x . It follows that

X ∞
X
0 n+r−1 00
y = (n + r)an x , y = (n + r)(n + r − 1)an xn+r−2 .
n=0 n=0

Plugging in and simplifying, we get



X ∞
X ∞
X
00 0 n+r−2 n+r−1
3xy + y − y = 3 (n + r)(n + r − 1)an x + (n + r)an x − an xn+r
n=0 n=0 n=0
= ...

!
X
−1
= xr r(3r − 2)c0 x + [(n + r + 1)(3n + 3r + 1)cn+1 − cn ]xn = 0.
n=0

The sum tells us that


cn
cn+1 = , n = 0, 1, 2, . . .
(n + r + 1)(3n + 3r + 1)
If we choose c0 = 0, then this gives us the solution y = 0 (since every
cn = 0), so this does not give any nontrivial solutions. Therefore, we assume
that c0 6= 0 and since we are looking for a nontrivial solution, we must have
r(3r − 2) = 0. It follows that r = 0 or r = 2/3. Fixing c0 and depending on
the value of r that we pick, we then find two linearly independent solutions.

Definition 6.2.3 The equation we got above for r, i.e. r(3r − 2) = 0, is called
the indicial equation and its roots are called the indicial roots.

7 Vector Calculus
7.1 Line Integrals
We start by covering the required theory of curves. The line integral is a
generalization of the one-dimensional Riemann integral that you have seen
before in calculus, but instead of integrating over a fixed axis, the partition
is done over a curve.

Definition 7.1.1 A curve is a continuous function γ : [a, b] → Rn . We call


γ(a) the initial point of the curve and γ(b) the endpoint.

Definition 7.1.2 A curve from x ∈ Rn to y ∈ Rn is any curve s.t. the initial


point is x and the endpoint is y.

Definition 7.1.3 A curve γ is simple if it does not intersect itself, in other


words, γ(s) = γ(t) only if s = t.

Definition 7.1.4 A curve γ is closed if γ(a) = γ(b), i.e., the initial point
and endpoint are the same.

Definition 7.1.5 Let γ : [a, b] → Rn be a curve. We may denote γ =


(γ1 , . . . , γn ), where the γi are the coordinate functions. We say that γ is
differentiable if each γi is. Furthermore, we say that γ is C k if each γi has a
kth derivative which is continuous.
When integrating over a curve, we should only be concerned about which
points in the plane the curve represents, while our integral should be
independent of how we choose to represent these points. For example, if
γ : [a, b] → R^n is a curve, then we may look at the set

{γ(t) | t ∈ [a, b]},

i.e. the set of points on the curve. For example, if γ : [0, 2π] → R^2 is the
curve

γ(t) = (cos t, sin t),

then the points

{γ(t) | t ∈ [0, 2π]}

make up the unit circle. However, we could also pick the curve

µ : [0, 1] → R2 , µ(t) = (cos 2πt, sin 2πt)

and its set of points also make up the unit circle. Thus if we think of the unit
circle itself as a closed simple curve, then there are multiple ways in which
we can represent it. In our theory of integration we require these curves to
be related in the following sense.

Definition 7.1.6 Let γ : [a, b] → R^n and µ : [c, d] → R^n be curves. We say
that γ and µ are equivalent if there exists a function φ : [c, d] → [a, b] s.t.
φ'(t) > 0 and µ = γ ◦ φ.

Example 7.1.7 Our two parametrizations of the unit circle are equivalent,
since

µ = γ ◦ φ,

where φ : [0, 1] → [0, 2π], φ(t) = 2πt, so that φ'(t) = 2π > 0 for all t ∈ [0, 1].
Let γ : [a, b] → U, U ⊂ R^2, be a curve and f : U → R some function.
Assume that we split the interval [a, b] into pieces

a = t0 < t1 < ... < tk = b.

Denote by ∆s_k the length along the curve from γ(t_{k−1}) to γ(t_k) and denote
by ∆x_k and ∆y_k the differences in the x and y coordinates, i.e.

(∆x_k, ∆y_k) = γ(t_k) − γ(t_{k−1}).

If we measure the area under the surface f along the curve γ, then this is
approximately given by

Σ_{i=1}^k f(γ(t_i)) ∆s_i

and the projections of the area to the x and y axes are approximately given
by

Σ_{i=1}^k f(γ(t_i)) ∆x_i,   Σ_{i=1}^k f(γ(t_i)) ∆y_i.

These are just typical Riemann sums, so by letting the size of the partition
of the interval [a, b] go to 0 we arrive at the quantities

∫_a^b f(γ(t)) |γ'(t)| dt,   ∫_a^b f(γ(t)) γ1'(t) dt,   ∫_a^b f(γ(t)) γ2'(t) dt.

Definition 7.1.8 Let f : U → R be a function with U ⊂ R^n. We define the
line integral of f over the curve γ : [a, b] → U to be

∫_γ f ds = ∫_a^b f(γ(t)) |γ'(t)| dt,

where we assume that γ is differentiable. Furthermore, if n = 2, we make
the definitions

∫_γ f dx = ∫_a^b f(γ(t)) γ1'(t) dt,   ∫_γ f dy = ∫_a^b f(γ(t)) γ2'(t) dt.

Theorem 7.1.9 Assume that γ : [a, b] → U and µ : [c, d] → U are equivalent.
Then

∫_γ f ds = ∫_µ f ds.

Proof. We have that µ = γ ◦ φ, where φ : [c, d] → [a, b]. By the chain rule,

µ'(t) = γ'(φ(t)) φ'(t),

and the change of variable formula for s = φ(t), so ds = φ'(t) dt, gives that

∫_c^d f(µ(t)) |µ'(t)| dt = ∫_c^d f(γ(φ(t))) |γ'(φ(t))| |φ'(t)| dt = ∫_a^b f(γ(s)) |γ'(s)| ds.

Note 7.1.10 The previous theorem makes it possible to use expressions such
as "integrating over the unit circle", since it doesn't really matter which curve
we choose to represent the points on the unit circle, as long as the curve is
differentiable and "does not change direction", meaning it does not suddenly
start moving backwards (this is what the φ'(t) > 0 condition means in the
definition of equivalence of curves). This guarantees that it's equivalent
to the standard parametrization given in the example above.

Definition 7.1.11 A vector field is a function F : U → R^n, where U ⊂ R^n.
In other words, to each point in space we attach a vector.

A vector field can be thought of as a current that flows through space. If
we move in space we have to fight the current to move forward. Therefore, if
d is the displacement vector from O to P and we move from O to P, then the
amount of work done is given by F · d. It follows that if we move along a path
γ, then the instantaneous work done at a time t is given by

F(γ(t)) · (γ'(t)/|γ'(t)|) |γ'(t)| = F(γ(t)) · γ'(t).

This holds, since we are moving in the direction γ'(t) at speed |γ'(t)| while
the vector field points in the direction F(γ(t)). The total work done when
moving along the path is then measured by

∫_a^b F(γ(t)) · γ'(t) dt

and we make the following definition.

Definition 7.1.12 Let F : U → R^n be a vector field. Then the line integral
of F along the path γ : [a, b] → U is defined as

∫_γ F · dr = ∫_a^b F(γ(t)) · γ'(t) dt.

A completely analogous computation to the proof of Theorem 7.1.9 shows
that equivalent curves again give the same integral.

7.2 Independence of Path


In the previous section we showed that it doesn’t matter how fast we move
along a fixed curve as long as we keep moving in the same direction. In this
section we will show that sometimes when we move in a vector field, the path
that we pick between two points is also completely irrelevant.

Definition 7.2.1 Let F : U → Rn , U ⊂ Rn be a vector field. A function


f : U → R is called a potential function of F if F = ∇f . In other words,
F = (∂f /∂x1 , . . . , ∂f /∂xn ).

Example 7.2.2 If f(x, y) = xy^2, then ∇f = (y^2, 2xy), so f is a potential
for the vector field F(x, y) = (y^2, 2xy).

Theorem 7.2.3 If F has a potential function, i.e. F = ∇f for some f : U → R,
and γ : [a, b] → U is a curve, then

∫_γ F · dr = f(γ(b)) − f(γ(a)).

In other words, the value of the integral only depends on the initial point
and the endpoint.

Proof. Since the chain rule gives d/dt f(γ(t)) = ∇f(γ(t)) · γ'(t), we get

∫_a^b F(γ(t)) · γ'(t) dt = ∫_a^b ∇f(γ(t)) · γ'(t) dt = ∫_a^b d/dt f(γ(t)) dt = f(γ(b)) − f(γ(a)),

where the last equality is just the fundamental theorem of calculus.

Definition 7.2.4 A vector field F is called conservative if any line integral
of F only depends on the initial point and the endpoint of the path of
integration.

Corollary 7.2.5 If γ is a closed curve and F has a potential function f,
then

∫_γ F · dr = 0.

Proof. f(γ(b)) − f(γ(a)) = 0, since γ(a) = γ(b).


Now assume that F = (F1, ..., Fn) = ∇f = (D1 f, ..., Dn f). We know
from earlier calculus classes that the order of differentiation does not matter
if the second partial derivatives D_i D_j f are continuous for all i, j. That is, we get

D_i D_j f = D_j D_i f

for all i, j. Thus, if F = ∇f, then we have

D_i F_j = D_i D_j f = D_j D_i f = D_j F_i,

so we get the following theorem.

Theorem 7.2.6 If F = (F1, ..., Fn) : U → R^n has a potential function f,
then

∂F_i/∂x_j = ∂F_j/∂x_i.

It turns out that a strong converse is also true: if the domain U of the
vector field is nice enough, then the theorem has a converse. First we need
to describe what "nice enough" means.

Definition 7.2.7 A domain U ⊂ R^n is path-connected if for any two points
x, y ∈ U, we may find a curve γ : [a, b] → U s.t. γ(a) = x and γ(b) = y.

Definition 7.2.8 A domain U ⊂ Rn is simply connected if it’s path-connected


and if any closed curve γ in U can be contracted to a point without ever leav-
ing U .

Theorem 7.2.9 Assume that F = (F1, ..., Fn) : U → R^n is a vector field
and U ⊂ R^n is simply connected. If

∂F_i/∂x_j = ∂F_j/∂x_i

for all i, j, then

1. There exists a function f : U → R s.t. F = ∇f.

2. F is conservative.
Proof. The proof of this theorem is very hard and requires methods of an
extremely fancy field of mathematics called algebraic topology. The interested
student can take math 601.

Example 7.2.10 Assume that F(x, y) = (2x sin y, x^2 cos y). This vector
field is defined on all of R^2, which is simply connected, and

D2 (2x sin y) = D1 (x^2 cos y).

The previous theorem then states that there's a function f : R^2 → R s.t.
F = ∇f. Since F1 = 2x sin y = D1 f, by integrating we get

f(x, y) = ∫ 2x sin y dx + g(y) = x^2 sin y + g(y)

and

f(x, y) = ∫ x^2 cos y dy + h(x) = x^2 sin y + h(x).

It follows that

x^2 sin y + g(y) = x^2 sin y + h(x)   ⇒   g(y) = h(x).

Since g and h are functions of different variables, they must both be constants.
By differentiating, we see that any constant works, so

f = x^2 sin y + C,

where C ∈ R can be any constant.

Example 7.2.11 Let γ : [0, 2π] → R^2 be the curve γ(t) = (cos t, sin t) and
let F be the vector field in the previous example. Then

∫_γ F · dr = ∫_0^{2π} (2 cos t sin(sin t), cos^2 t cos(sin t)) · (− sin t, cos t) dt.

Computing the dot product would lead to an extremely nasty integral.
However, since F has a potential function and γ is a closed curve, we just get

∫_γ F · dr = 0.
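The potential-function computation of Example 7.2.10 is easy to reproduce with a computer algebra system. A minimal sketch, assuming sympy is available:

# Mixed-partials test and a potential for F = (2x sin y, x^2 cos y).
import sympy as sp

x, y = sp.symbols('x y')
F1, F2 = 2*x*sp.sin(y), x**2*sp.cos(y)
print(sp.simplify(sp.diff(F1, y) - sp.diff(F2, x)))   # 0, so the test passes
f = sp.integrate(F1, x)                               # x**2*sin(y), a potential up to a constant
print(sp.simplify(sp.diff(f, x) - F1), sp.simplify(sp.diff(f, y) - F2))   # 0 0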

7.3 Multiple integrals


Definition 7.3.1 Let R ⊂ R^n be a cuboid with side lengths a1, ..., an. We
define the n-dimensional volume of R to be

vol(R) = a1 a2 · · · an.

When n = 2 this is just area and when n = 3 this is standard volume.

Definition 7.3.2 Let R ⊂ R^n be a cuboid. A cuboid R can be cut into
smaller cuboids, and a collection of these is called a partition. If P
is a partition, then the norm ||P|| is defined as the longest side length of
a cuboid in P.

Let f : R → R be some function and R ⊂ R^n some cuboid. If P is a partition
of R into smaller cuboids, then we may choose an element x_A ∈ A for each
A ∈ P. We will investigate the meaning of the sum

Σ_{A ∈ P} f(x_A) vol(A).

Definition 7.3.3 Given f : R → R for a cuboid R ⊂ R^n as before, we define

∫_R f = lim_{||P|| → 0} Σ_{A ∈ P} f(x_A) vol(A).

If the limit exists, we say that f is integrable and that the limit above is the
integral of f over R.

When n = 1 the integral above is just the standard Riemann integral and
it's usually denoted by

∫_R f dx.

When n = 2 the integral above is called a double integral and we usually
denote it by

∫∫_R f dA.

Furthermore, when n = 3 the integral is called a triple integral and we usually
denote it by

∫∫∫_R f dV.

Assume next that D ⊂ R^n is some closed and bounded region in R^n.
Then we may find a cuboid R containing D. If f : D → R is a function
defined on D, we may extend f to R by letting it be zero outside of D. In
other words, if we denote the extension by f~ : R → R, then

f~(x) = f(x) for x ∈ D,   f~(x) = 0 for x ∈ R \ D.

Definition 7.3.4 We define ∫_D f = ∫_R f~.

Theorem 7.3.5 (Fubini's theorem) Let R = [a1, b1] × · · · × [an, bn] be a
cuboid in R^n and f : R → R an integrable function. Assuming that f is
"nice enough", then

∫_R f = ∫_{a1}^{b1} ( ∫_{a2}^{b2} ( · · · ( ∫_{an}^{bn} f dx_n ) · · · ) dx_2 ) dx_1,

with the integral on the right called an iterated integral. Furthermore, the
order in which the iterated integrals are taken does not matter.

Proof. Omitted.

Theorem 7.3.6 (Properties of the integral) Let f, g : R → R be integrable.
Then

1. ∫_R cf = c ∫_R f for any constant c;

2. ∫_R (f + g) = ∫_R f + ∫_R g.

Example 7.3.7 Let D ⊂ R^4 be the set of points (x, y, z, w) where 0 ≤ x ≤ 1,
0 ≤ y ≤ x, 0 ≤ z ≤ y and 0 ≤ w ≤ z. Let f : D → R be the constant
function f(x, y, z, w) = 1. Since D is not a cuboid, we extend f to a function
f~ : R → R, where R = [0, 1]^4, which is 1 on D and zero on the rest of R.
By the definition of the integral,

∫_D f = ∫_R f~
      = ∫_0^1 ∫_0^1 ∫_0^1 ∫_0^1 f~(x, y, z, w) dw dz dy dx
      = ∫_0^1 ∫_0^x ∫_0^y ∫_0^z 1 dw dz dy dx
      = ∫_0^1 ∫_0^x ∫_0^y z dz dy dx
      = ∫_0^1 ∫_0^x (1/2) y^2 dy dx
      = ∫_0^1 (1/6) x^3 dx
      = 1/24.
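Iterated integrals like this one can be handed to sympy directly (the first limit tuple is the innermost integral). A minimal sketch, assuming sympy:

# The iterated integral of Example 7.3.7.
import sympy as sp

x, y, z, w = sp.symbols('x y z w')
val = sp.integrate(1, (w, 0, z), (z, 0, y), (y, 0, x), (x, 0, 1))
print(val)    # 1/24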

Example 7.3.8 Let R ⊂ R^n be the cuboid [a1, b1] × [a2, b2] × · · · × [an, bn].
If we integrate the constant function f = 1, we get

∫_R 1 = ∫_{a1}^{b1} ( ∫_{a2}^{b2} ( · · · ( ∫_{an}^{bn} dx_n ) · · · ) dx_2 ) dx_1
      = ( ∫_{a1}^{b1} dx_1 ) · · · ( ∫_{an}^{bn} dx_n )
      = (b1 − a1) · · · (bn − an)
      = vol(R).

Definition 7.3.9 Let D ⊂ R^n be closed and bounded. We extend our
definition of n-dimensional volume by defining

vol(D) = ∫_D 1.

7.4 Green’s theorem


Let C be a closed simple curve in R^2 oriented counterclockwise. This simply
means that the area enclosed by the curve stays on the left when walking
around the curve. An illustration is given by figure 3.

Figure 3: A simple closed curve oriented counterclockwise.

Green’s theorem is then the following remarkable statement:

Theorem 7.4.1 (Green's theorem) Let C be a piecewise differentiable closed
simple curve bounding a region R. If F : R^2 → R^2 is a continuous vector
field and ∂F1/∂y, ∂F2/∂x exist and are continuous on R, then

∫_C F · dr = ∫∫_R ( ∂F2/∂x − ∂F1/∂y ) dA.

We prove Green's theorem in two steps. The first step is to check that
it holds when C is just the boundary of a rectangle. This is given by the
following lemma.

Lemma 7.4.2 Let R be a rectangle, C its boundary and F as in the
statement of Green's theorem. Then Green's theorem holds for this C and R.
Proof. The picture of the situation is as in figure 4.

Figure 4: Green's theorem for a rectangle.

The four sides can be parametrized as follows:

λ1(t) = (t, c),   λ2(t) = (b, t),   λ3(t) = (t, d),   λ4(t) = (a, t),

and, traversing λ1 and λ2 forwards and λ3 and λ4 backwards so as to go
counterclockwise, the corresponding line integrals are

∫_{λ1} F · dr = ∫_a^b F(λ1(t)) · (1, 0) dt = ∫_a^b F1(t, c) dt,
∫_{λ2} F · dr = ∫_c^d F(λ2(t)) · (0, 1) dt = ∫_c^d F2(b, t) dt,
∫_{λ3} F · dr = ∫_b^a F(λ3(t)) · (1, 0) dt = −∫_a^b F1(t, d) dt,
∫_{λ4} F · dr = ∫_d^c F(λ4(t)) · (0, 1) dt = −∫_c^d F2(a, t) dt.

The line integral over the whole path is then given by

∫_C F · dr = ∫_{λ1} F · dr + ∫_{λ2} F · dr + ∫_{λ3} F · dr + ∫_{λ4} F · dr
           = ∫_a^b [F1(t, c) − F1(t, d)] dt + ∫_c^d [F2(b, t) − F2(a, t)] dt.

Next, we compute the double integral and show that we get the same
expression. Applying the fundamental theorem of calculus and Fubini to ∂F2/∂x
and ∂F1/∂y, we get

∫∫_R ( ∂F2/∂x − ∂F1/∂y ) dA = ∫_c^d ∫_a^b ∂F2/∂x dx dy − ∫_a^b ∫_c^d ∂F1/∂y dy dx
                            = ∫_c^d [F2(b, y) − F2(a, y)] dy − ∫_a^b [F1(x, d) − F1(x, c)] dx
                            = ∫_a^b [F1(t, c) − F1(t, d)] dt + ∫_c^d [F2(b, t) − F2(a, t)] dt,

which is exactly what we got for the line integral.


Proof of Green's theorem. Let D be any domain in the plane. Then we can
cut D up into a grid as in figure 5. If we compute the line integral around
every square in the grid inside D, then the integral over any side shared between
two squares will cancel out, since it's traversed once in each direction.
If Ci is the boundary of the rectangle Ri inside D and R = R1 ∪ ... ∪ Rk,
then

∫∫_R ( ∂F2/∂x − ∂F1/∂y ) dA = Σ_{i=1}^k ∫∫_{Ri} ( ∂F2/∂x − ∂F1/∂y ) dA = Σ_{i=1}^k ∫_{Ci} F · dr.

However, since the boundaries shared between different Ci cancel, it follows
that

∫∫_R ( ∂F2/∂x − ∂F1/∂y ) dA = Σ_{i=1}^k ∫_{Ci} F · dr = ∫_γ F · dr,

where γ is the path on the right in figure 5 and R is the area bounded by γ.
The proof of the theorem now follows by letting the grid become finer and
finer.

Example 7.4.3 Standard Green’s theorem

Example 7.4.4 If R ⊂ R^2 is some domain in the plane enclosed by a curve
γ : [a, b] → R^2, then we know that

vol(R) = ∫∫_R dA.

If we let F(x, y) = (0, x), F(x, y) = (−y, 0) or F(x, y) = (−y/2, x/2), then by
Green's theorem we can express the area in the following form:

∫∫_R dA = ∫∫_R ( ∂F2/∂x − ∂F1/∂y ) dA = ∫_γ F · dr.

As an example, let's use this to compute the area of an ellipse. An ellipse
with semi-axes a and b can be parametrized as γ(t) = (a cos t, b sin t),
0 ≤ t ≤ 2π, so that γ'(t) = (−a sin t, b cos t). If we denote the area inside
the ellipse by R, then letting F(x, y) = (0, x), we get

vol(R) = ∫_γ F · dr
       = ∫_0^{2π} (0, a cos t) · (−a sin t, b cos t) dt
       = ∫_0^{2π} ab cos^2 t dt
       = πab.
Example 7.4.5 Let R = R1 ∪ R2, where R1 and R2 are as in figure 6. We
can't apply Green's theorem directly on R, so we have to use a trick. Cut the
region in the middle into two pieces, and let γ1 denote the curve around the upper
piece and γ2 the curve around the lower one, where both are oriented counterclockwise.
Then the dashed pieces cancel out and Green's theorem gives the
equality

∫∫_R ( ∂F2/∂x − ∂F1/∂y ) dA = ∫∫_{R1} ( ∂F2/∂x − ∂F1/∂y ) dA + ∫∫_{R2} ( ∂F2/∂x − ∂F1/∂y ) dA
                            = ∫_{γ1} F · dr + ∫_{γ2} F · dr
                            = ∫_{C1} F · dr − ∫_{C2} F · dr − ∫_{C3} F · dr.

The integrals over the inner circles appear with minus signs, since they are
traversed clockwise.

Figure 6: Domain with holes.

7.5 Change of variable formula for multiple integrals


The change of variable formula is based on the following geometric theorem:

Theorem 7.5.1 Let M be an n-by-n matrix and A a cuboid in R^n with one
vertex at the origin. Then

vol(M(A)) = |det M| vol(A).

Proof. If you check our construction of the determinant, you’ll see that this
is precisely what we wanted the determinant to satisfy.

Definition 7.5.2 A function T is 1-to-1 if T (x) = T (y) implies x = y.

Definition 7.5.3 Let D ⊂ R^n be some closed and bounded region and
T : D → R^n a differentiable function. We define the Jacobian matrix or
Jacobian of T at the point (a1, ..., an) ∈ D to be the matrix

J_T(a1, ..., an) = T'(a1, ..., an) = [ ∂T1/∂x1 (a1, ..., an)  ...  ∂T1/∂xn (a1, ..., an) ]
                                    [         ...                          ...          ]
                                    [ ∂Tn/∂x1 (a1, ..., an)  ...  ∂Tn/∂xn (a1, ..., an) ]

Note 7.5.4 Since the determinant |J_T(a1, ..., an)| is just a real number, it
defines a function D → R given by

(a1, ..., an) ↦ |J_T(a1, ..., an)|.

Theorem 7.5.5 Let D ⊂ R^n be a closed and bounded domain. Assume
that

T : D → R^n

is continuously differentiable and 1-to-1. If this holds and f : T(D) → R is
a continuous function, then

∫_{T(D)} f = ∫_D (f ◦ T) |J_T|.

Proof. An actual proof is quite hard. The idea is discussed below.


The rest of this section is devoted to trying to intuitively explain why the
change of variable formula holds. We make the simplifying assumption that
D is actually a cuboid R, so the formula reads

∫_{T(R)} f = ∫_R (f ◦ T) |J_T|.

Let P be a partition of R into smaller cuboids. Then each smaller cuboid
A ∈ P gets mapped by T to some B, i.e. B = T(A). In this case the B
are not cuboids. However, if we think about the definition of the integral,
there doesn't seem to be any reason why we would have to cut the cuboid
into pieces that are cuboids; any "small" pieces should do. This intuition is
almost right except for some mathematical technicalities that we ignore. It
follows that

∫_{T(R)} f ≈ Σ_{B ∈ T(P)} f(y_B) vol(B).

If we write B = T(A) and let T(x_A) = y_B, then we get

∫_{T(R)} f ≈ Σ_{B ∈ T(P)} f(y_B) vol(B) = Σ_{A ∈ P} f(T(x_A)) vol(T(A)).

By definition, we also have that

∫_R (f ◦ T) |J_T| ≈ Σ_{A ∈ P} f(T(x_A)) |J_T(x_A)| vol(A).

Therefore, our intuitive proof makes sense if we can show that

vol(B) = vol(T(A)) ≈ |J_T(x_A)| vol(A).

Showing that the previous equality is true is actually a consequence of
Taylor's formula. Taylor's formula in 1 variable can be written as

f(x) ≈ f(x0) + f'(x0)(x − x0).

For functions T : R^n → R^n this has the generalization

T(x) ≈ T(x0) + T'(x0)(x − x0).

Note that the actual change in the function is approximately given by
multiplying by the Jacobian matrix T'(x0). If we make the simplifying assumption
that x_A = 0 and T(x_A) = 0 (we can do this by moving the origin), then it
follows that

T(A) ≈ T'(x_A) A.

This is equivalent to the theorem given in the beginning of this section with
M = T'(x_A).
The following are standard coordinate transformations that everyone should
be familiar with.

Definition 7.5.6 Any point in the (x, y)-plane can be written as x = r cos θ,
y = r sin θ, where r ≥ 0 and 0 ≤ θ ≤ 2π. Thus, we have a transformation

T(r, θ) = (r cos θ, r sin θ)

and its Jacobian satisfies

|J_T(r, θ)| = r.

This coordinate representation is called polar coordinates.

Definition 7.5.7 Polar coordinates can be extended to three dimensions


into cylindrical coordinates. The coordinate transformation is given by

T (r, θ, z) = (r cos θ, r sin θ, z),

where r ≥ 0, 0 ≤ θ ≤ 2π and z ∈ R. Again, the Jacobian satisfies

|JT (r, θ, z)| = r.

Definition 7.5.8 Any point in R^3 can be represented in spherical coordinates

x = ρ sin φ cos θ,   y = ρ sin φ sin θ,   z = ρ cos φ,

where ρ ≥ 0, 0 ≤ φ ≤ π and 0 ≤ θ ≤ 2π. This gives the transformation

T(ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ)

and its Jacobian satisfies

|J_T(ρ, φ, θ)| = ρ^2 sin φ.

Example 7.5.9 We compute the integral

∫∫∫_D (x^2 + y^2) dV,

where D ⊂ R^3 is the solid region between two spheres given by

a^2 ≤ x^2 + y^2 + z^2 ≤ b^2.

In spherical coordinates the region is given by

a ≤ ρ ≤ b,   0 ≤ φ ≤ π,   0 ≤ θ ≤ 2π,

which we'll denote by R. Let T be the transformation given above for
spherical coordinates. Using the change of variable formula, we get

∫∫∫_D (x^2 + y^2) dV = ∫∫∫_{T(R)} (x^2 + y^2) dV
                     = ∫∫∫_R ρ^2 sin^2 φ |J_T| dV
                     = ∫_0^{2π} ∫_0^π ∫_a^b ρ^4 sin^3 φ dρ dφ dθ,

which is an integral we can compute.

Example 7.5.10 Let D be the cylinder x^2 + y^2 ≤ 1, 0 ≤ z ≤ 2. In
cylindrical coordinates this is represented by the region R defined by

0 ≤ r ≤ 1,   0 ≤ θ ≤ 2π,   0 ≤ z ≤ 2.

It follows that

∫∫∫_D y^2 z dV = ∫_0^{2π} ∫_0^1 ∫_0^2 z r^2 sin^2 θ · r dz dr dθ,

which is again in a computable form.

7.6 Surface integrals


Definition 7.6.1 Let D ⊂ R^2 be some closed and bounded region. A surface
in R^3 is a continuously differentiable function r : D → R^3, usually written as

r(u, v) = (x(u, v), y(u, v), z(u, v)).

Usually one thinks of a surface as mapping a region of the (u, v)-plane into
a 2-dimensional surface in R^3. Now if (a, b) is a point on a horizontal grid
line, then at the point r(a, b) the tangent vector has direction

∂r/∂u (a, b)
Figure 7: A surface in R^3.

while if (a, b) lies on a vertical grid line, then it has direction

∂r/∂v (a, b).

From previous math classes, we know that two vectors x, y ∈ R^3 span a
parallelogram with area |x × y|. It follows that a small rectangle in the
(u, v)-plane with lower corner (a, b) and side lengths ∆u, ∆v is taken to a piece of the
surface with approximate area

|(r(a + ∆u, b) − r(a, b)) × (r(a, b + ∆v) − r(a, b))| ≈ | ∂r/∂u (a, b) × ∂r/∂v (a, b) | ∆u ∆v.

It follows that the surface area of a surface S = r(D) is given by

∫∫_D | ∂r/∂u × ∂r/∂v | dA.

Definition 7.6.2 Letting S = r(D) and f : S → R be a function, we define
the surface integral of f as

∫∫_S f dS = ∫∫_D f(r(u, v)) | ∂r/∂u × ∂r/∂v | dA.

Example 7.6.3 Typically the surface is of the form z = f(x, y); then we
may write

r(x, y) = (x, y, f(x, y))
Figure 8: Forming a torus by rotating the circle (y − b)^2 + z^2 = a^2 about the z-axis.

and

∂r/∂x (x, y) = (1, 0, f_x(x, y)),   ∂r/∂y (x, y) = (0, 1, f_y(x, y)),

so that

| ∂r/∂x × ∂r/∂y | = √( (∂f/∂x)^2 + (∂f/∂y)^2 + 1 ).

Example 7.6.4 Let's compute the surface area of the torus with inner radius
b − a and outer radius b + a. A torus can be parametrized as a surface by

r(u, v) = ((b + a cos u) cos v, (b + a cos u) sin v, a sin u),   0 ≤ u ≤ 2π, 0 ≤ v ≤ 2π,

which follows from rotating the circle in figure 8. Therefore, we let
D = [0, 2π] × [0, 2π] and we want to compute

∫∫_{r(D)} dS.

We have that

∂r/∂u = −a sin u cos v i − a sin u sin v j + a cos u k,
∂r/∂v = −(b + a cos u) sin v i + (b + a cos u) cos v j + 0 k.

Computing the cross product gives

∂r/∂u × ∂r/∂v = −a(b + a cos u) cos v cos u i − a(b + a cos u) sin v cos u j − a(b + a cos u) sin u k

and

| ∂r/∂u × ∂r/∂v | = a(b + a cos u).

It follows that

∫∫_{r(D)} dS = ∫_0^{2π} ∫_0^{2π} a(b + a cos u) du dv = 4π^2 ab.
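The cross product and the final area are easy to recheck symbolically. A sketch assuming sympy (it first verifies that |∂r/∂u × ∂r/∂v| really equals a(b + a cos u), then integrates that expression):

# Symbolic check of the torus surface area.
import sympy as sp

u, v, a, b = sp.symbols('u v a b', positive=True)
r = sp.Matrix([(b + a*sp.cos(u))*sp.cos(v), (b + a*sp.cos(u))*sp.sin(v), a*sp.sin(u)])
n = r.diff(u).cross(r.diff(v))                             # dr/du x dr/dv
print(sp.simplify(n.dot(n) - (a*(b + a*sp.cos(u)))**2))    # 0, so |n| = a(b + a cos u)
print(sp.integrate(a*(b + a*sp.cos(u)), (u, 0, 2*sp.pi), (v, 0, 2*sp.pi)))  # 4*pi**2*a*b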

Next we develop a surface integral for vector fields. Note that the vectors

∂r/∂u (a, b),   ∂r/∂v (a, b)

span the tangent plane to the surface at the point r(a, b), and a unit normal
of the surface at that point is then given by

n(a, b) = ( ∂r/∂u (a, b) × ∂r/∂v (a, b) ) / | ∂r/∂u (a, b) × ∂r/∂v (a, b) |.

Now a vector field can be thought of as a current in a stream. In different
applications we want to measure how much current flows through a certain
cross-sectional area. This is usually called the flux through a surface. At a
point r(u, v) on a surface the flux is proportional to

F(r(u, v)) · n(u, v).

The only problem is that in order for us to be able to define a surface integral
that makes sense, we need to be able to pick a unique normal vector at
each point. At each point on a surface there are two possible choices of
unit normal, i.e. the normals pointing in opposite directions. We make the
following definition.

Definition 7.6.5 If a surface S = r(D) has a continuous unit normal, then
the surface is said to be orientable. If not, then the surface is unorientable.

Definition 7.6.6 Let F : R → R^3, R ⊂ R^3, be a vector field, where S = r(D)
is an orientable surface contained in the interior of R. Then we define

∫∫_S F · dS = ∫∫_S F · n dS.

Example 7.6.7 Let F(x, y, z) = (0, z, z). Compute the flux through the
part of the plane z = 6 − 3x − 2y contained in the first octant, where the
orientation of the plane is chosen so that the normal points upwards.

The unit normal of the plane 3x + 2y + z = 6 is

n = (3, 2, 1)/|(3, 2, 1)| = (1/√14)(3, 2, 1).

Hence,

∫∫_S F · dS = ∫∫_S (F · n) dS = (1/√14) ∫∫_S 3z dS.

We may parametrize the plane by x and y, i.e. r(x, y) = (x, y, 6 − 3x − 2y).
The intersection with the (x, y)-plane is

3x + 2y = 6   ⇒   y = 3 − 3x/2,

so that

(1/√14) ∫∫_S 3z dS = (3/√14) ∫_0^2 ∫_0^{3−3x/2} (6 − 3x − 2y) √(3^2 + 2^2 + 1) dy dx
                   = 3 ∫_0^2 ∫_0^{3−3x/2} (6 − 3x − 2y) dy dx = 18.

Note 7.6.8 A general simplification worth noting in the formulas here (and which the course book omits) is that the unit normal is given as
\[
n = \frac{\dfrac{\partial r}{\partial u} \times \dfrac{\partial r}{\partial v}}{\left\| \dfrac{\partial r}{\partial u} \times \dfrac{\partial r}{\partial v} \right\|},
\]
so that if r : D → ℝ³ is the parametrization of S = r(D), then
\[
\iint_S F \cdot dS = \iint_S F \cdot n \, dS = \iint_D \left( F \cdot \frac{\dfrac{\partial r}{\partial u} \times \dfrac{\partial r}{\partial v}}{\left\| \dfrac{\partial r}{\partial u} \times \dfrac{\partial r}{\partial v} \right\|} \right) \left\| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right\| dA = \iint_D F \cdot \left( \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right) dA,
\]
which gives us the following formula, which is much easier to compute:
\[
\iint_S F \cdot dS = \iint_D F \cdot \left( \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right) dA.
\]
This is why the factor √14 cancels in the previous example. Thus, using the formula derived above, we could have directly computed
\[
\int_0^2 \int_0^{3 - 3x/2} (0, 6 - 3x - 2y, 6 - 3x - 2y) \cdot (3, 2, 1) \, dy \, dx = 3 \int_0^2 \int_0^{3 - 3x/2} (6 - 3x - 2y) \, dy \, dx = 18.
\]
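As a sanity check, the simplified formula makes Example 7.6.7 a short numerical computation. The sketch below (my own, assuming NumPy and SciPy) integrates F(r(x, y)) · (∂r/∂x × ∂r/∂y) = (0, z, z) · (3, 2, 1) over the triangle 0 ≤ x ≤ 2, 0 ≤ y ≤ 3 − 3x/2 and should return approximately 18.

```python
import numpy as np
from scipy.integrate import dblquad

def integrand(y, x):
    z = 6 - 3 * x - 2 * y
    F = np.array([0.0, z, z])            # F(x, y, z) = (0, z, z)
    normal = np.array([3.0, 2.0, 1.0])   # r_x x r_y for r(x, y) = (x, y, 6 - 3x - 2y)
    return float(F.dot(normal))

flux, _ = dblquad(integrand, 0, 2, lambda x: 0, lambda x: 3 - 1.5 * x)
print(flux)   # approximately 18
```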

7.7 Divergence theorem
Definition 7.7.1 A surface is closed if it has a well-defined interior.

Example 7.7.2 A sphere and a torus are examples of closed surfaces.


Assume that an imaginary sphere has been submerged in a liquid, meaning that the liquid can flow freely through the sphere. If the liquid is incompressible and moving, then the amount of liquid flowing into the sphere should equal the amount flowing out. Similarly, if the liquid inside the sphere is expanding, then the expansion should show up as a net amount of liquid flowing out. This is precisely what the divergence theorem says. We start by finding a measure for expansion.
Let F : D → R3 be a vector field and S a cube centered at the point
(x, y, z) ∈ D. Furthermore, assume that the side lengths of S are ∆x, ∆y
and ∆z. Let S1 and S2 denote the top and bottom surfaces of the cube,
so that their outward pointing normal vectors are k and −k. If we write
F = (F1 , F2 , F3 ), then
\[
\iint_{S_1} F \cdot dS = \int_{x - \Delta x/2}^{x + \Delta x/2} \int_{y - \Delta y/2}^{y + \Delta y/2} (F_1, F_2, F_3) \cdot \mathbf{k} \, dA = \int_{x - \Delta x/2}^{x + \Delta x/2} \int_{y - \Delta y/2}^{y + \Delta y/2} F_3 \, dA.
\]

If we assume that our cube S is small, then we can assume that F3 is


approximately constant on the face S1 . The center point of S1 is given by
(x, y, z + ∆z/2). If we denote the volume of S by ∆V = ∆x∆y∆z, we get
approximately
\[
\iint_{S_1} F \cdot dS \approx F_3(x, y, z + \Delta z/2)\,\Delta x\,\Delta y = \frac{F_3(x, y, z + \Delta z/2)}{\Delta z}\,\Delta V.
\]

If we look at the opposite side S2 , then the normal is reversed, and the center
point is (x, y, z − ∆z/2), so

\[
\iint_{S_2} F \cdot dS \approx \frac{-F_3(x, y, z - \Delta z/2)}{\Delta z}\,\Delta V.
\]
It follows that
\[
\iint_{S_1 + S_2} F \cdot dS \approx \frac{F_3(x, y, z + \Delta z/2) - F_3(x, y, z - \Delta z/2)}{\Delta z}\,\Delta V.
\]

If we now let the size of the cube go to 0, then we get
\[
\lim_{\Delta V \to 0} \frac{1}{\Delta V} \iint_{S_1 + S_2} F \cdot dS = \lim_{\Delta z \to 0} \frac{F_3(x, y, z + \Delta z/2) - F_3(x, y, z - \Delta z/2)}{\Delta z} = \frac{\partial F_3}{\partial z}.
\]
Repeating the same argument for the other two pairs of sides of S, we get
\[
\lim_{\Delta V \to 0} \frac{1}{\Delta V} \iint_S F \cdot dS = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.
\]
What is the surface integral on the left actually measuring? If we want to know how much F is expanding at the point (x, y, z), we can measure this by centering a small cube at (x, y, z), measuring the difference between how much flows out and how much flows in, and dividing this by the volume of S. This gives the expansion of the vector field within S per unit volume. Thus, by making the cube infinitely small, we measure the expansion of the vector field per unit volume at the point (x, y, z). This motivates the following definition.

Definition 7.7.3 Let F : D → ℝⁿ, D ⊂ ℝⁿ, be a vector field. The divergence of F is defined to be
\[
\operatorname{div} F = \frac{\partial F_1}{\partial x_1} + \cdots + \frac{\partial F_n}{\partial x_n}.
\]

Note 7.7.4 If we let \( \nabla = \left( \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right) \) be a symbolic vector of differential operators, then by symbolically expanding the dot product ∇ · F, we may write the divergence in the form
\[
\operatorname{div} F = \nabla \cdot F.
\]
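The cube argument above is easy to check numerically: for a small cube centered at a point p, the total outward flux divided by the cube's volume should be close to ∇ · F(p). The sketch below (my own, assuming NumPy; the test field F and the point p are arbitrary choices) approximates each face integral by the midpoint rule, exactly as in the derivation.

```python
import numpy as np

def F(x, y, z):
    return np.array([x * y, y * z, x * z])   # an arbitrary test field

def div_F(x, y, z):
    return y + z + x                          # its exact divergence

def flux_per_volume(p, h=1e-3):
    """Outward flux of F through a small cube centered at p, divided by its volume."""
    p = np.asarray(p, dtype=float)
    total = 0.0
    for axis in range(3):                     # one pair of opposite faces per axis
        for sign in (+1.0, -1.0):
            center = p.copy()
            center[axis] += sign * h / 2      # midpoint of this face
            # midpoint rule: (normal component at the face center) * (face area)
            total += sign * F(*center)[axis] * h * h
    return total / h**3

p = (0.3, -0.7, 1.2)
print(flux_per_volume(p))   # should be close to div F(p)
print(div_F(*p))
```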

Theorem 7.7.5 Let F : D → ℝ³, D ⊂ ℝ³, be a continuously differentiable vector field and S a closed surface, oriented by the outward-pointing unit normal, such that D contains S and its interior V. Then
\[
\iint_S F \cdot dS = \iiint_V \operatorname{div} F \, dV.
\]

Proof idea. We will again only give the idea of the proof. Subdivide the interior of S into many small cubes S_1, . . . , S_N with volumes ∆V_1, . . . , ∆V_N. Then if S_i is centered at (x_i, y_i, z_i), we get
\[
\frac{1}{\Delta V_i} \iint_{S_i} F \cdot dS \approx \operatorname{div} F(x_i, y_i, z_i)
\quad\Longleftrightarrow\quad
\iint_{S_i} F \cdot dS \approx \operatorname{div} F(x_i, y_i, z_i)\,\Delta V_i.
\]
Since the fluxes through the interior faces of adjacent cubes cancel in pairs (each shared face is counted with opposite normals), only the flux through S itself survives, and it follows that
\[
\iint_S F \cdot dS \approx \sum_{i=1}^N \iint_{S_i} F \cdot dS \approx \sum_{i=1}^N \operatorname{div} F(x_i, y_i, z_i)\,\Delta V_i \approx \iiint_V \operatorname{div} F \, dV,
\]
and the approximations become equalities by letting N, the number of cubes in the subdivision, go to infinity.
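To illustrate the theorem on a concrete example (my own sketch, assuming NumPy and SciPy; not from the course): for F(x, y, z) = (x, y, z) we have div F = 3, so the volume integral over the unit ball is 3 · (4π/3) = 4π. The code below computes the flux side from the spherical parametrization using the cross-product formula from Note 7.6.8 and should return approximately the same value.

```python
import numpy as np
from scipy.integrate import dblquad

def r(u, v):   # unit sphere, u in [0, pi], v in [0, 2*pi]
    return np.array([np.sin(u) * np.cos(v),
                     np.sin(u) * np.sin(v),
                     np.cos(u)])

def integrand(v, u, h=1e-6):
    ru = (r(u + h, v) - r(u - h, v)) / (2 * h)
    rv = (r(u, v + h) - r(u, v - h)) / (2 * h)
    normal = np.cross(ru, rv)          # outward for this parametrization
    return float(r(u, v).dot(normal))  # F(r(u, v)) = r(u, v) on the sphere

flux, _ = dblquad(integrand, 0, np.pi, lambda u: 0, lambda u: 2 * np.pi)
print(flux)          # approximately 4*pi
print(4 * np.pi)
```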

7.8 Stokes theorem
