240 Notes
Edvard Fagerholm
[email protected]
2 Vector Spaces
  2.1 Vectors
  2.2 Definition of a vector space
  2.3 Span of vectors
  2.4 Linear independence of vectors
  2.5 Dimension
  2.6 Generalizing our Definition of a Vector (optional)
3 Matrices
  3.1 Introduction
  3.2 Matrix Algebra
  3.3 The transpose of a matrix
  3.4 Some important types of matrices
  3.5 Systems of linear equations and elementary matrices
  3.6 Gaussian elimination
  3.7 Rank of a matrix
  3.8 Rank and systems of linear equations
  3.9 Determinants
  3.10 Properties of the determinant
  3.11 Some other formulas for the determinant
  3.12 Matrix inverse
  3.13 Eigenvalues and eigenvectors
  3.14 Diagonalization
4 Higher-Order ODEs
  4.1 Basic definitions
  4.2 Homogeneous equations
  4.3 Nonhomogeneous equations
  4.4 Homogeneous linear equations with constant coefficient
  4.5 Undetermined Coefficients
  4.6 Variation of parameters (optional)
  4.7 Cauchy-Euler equations
  4.8 Linear Models
    4.8.1 Free undamped motion
    4.8.2 Free damped motion
    4.8.3 Driven motion
1 About
1.1 Content of these notes
These notes will cover all the material that will be covered in class. What is
mostly missing is going to be pictures and examples. The course text, Zill,
Cullen, Advanced Engineering Mathematics 3rd Ed, will be very useful for
more thorough and harder (read: longer) examples than what I’m willing to
spend my time on typing up. You’ll also find more problems to complement
the homework that I will assign, since the more problems you do the better.
All the theory that’s taught in class will be in these notes and some more.
During lecture I might sometimes skip parts of some proofs that you’ll find
in these notes or instead just present the general idea by doing e.g. a special
case of the general case. This will usually happen when:
1. The full proof will not add anything useful to your understanding of
the topic.
2. The full proof is not a computational technique that will turn out useful
when solving practical problems. This is an applied class after all.
Embedded in each section you’ll find some examples. After almost every
definition there will be something I would call a trivial example that should
help you check that you are understanding what the definition is saying.
You should make sure you understand them before moving on, since not
understanding them is a sign that you’ve misunderstood something.
Each section also ends with a list of all the most basic computational
problems related to that topic. These are things you will be expected to
perform in your sleep, so make sure you understand them. Almost any
problem you will encounter in this class will reduce to solving a sequence of
these problems, so they will be the ”bricks” of most applications that you’ll
encounter.
1.2 About notation
The following basic notations will be used in the class: ∃ stands for "there
exists" and s.t. abbreviates "such that". In other words, a shorthand for the
sentence "there exists a real number that is larger than zero" could be written
as ∃x ∈ R s.t. x > 0. I will also assume
you know the following set theoretic notations. Given any sets A, B we may
define union, intersection, difference, equality and subset. These are defined
as follows:
  x ∈ A ∪ B ⇔ x ∈ A or x ∈ B
  x ∈ A ∩ B ⇔ x ∈ A and x ∈ B
  x ∈ A \ B ⇔ x ∈ A and x ∉ B
  A = B     ⇔ x ∈ A if and only if x ∈ B
  A ⊂ B     ⇔ if x ∈ A, then x ∈ B.

If additionally A ≠ B, we write A ⊊ B.
For example, the set of even integers could be written as

  {x ∈ Z | x is even}.
More generally, assume that P is a predicate, which is informally something
that’s either true or false depending on what you feed it and A any set. We
write
{x ∈ A | P (x)}
to mean ”the elements x of A s.t. P (x) is true”. In our example A = Z and
P (x) =”x is even”. If x is constructed of multiple parts, we might sometimes
write e.g.
{(x, y) ∈ R2 | y = 2x},
which would use as the predicate P (x, y) the statement ”y = 2x”. Usually,
these should not lead to too much confusion since most of the notation is
quite self-explanatory.
For functions we also use the familiar notation f : A → B. This means
that f is a function from A to B i.e. A is the domain of f and B the codomain
of f . In other words, given an x ∈ A, f assigns a unique f (x) ∈ B. In this
setting one also talks about the image of f which is defined as
Im f = {y ∈ B | ∃x ∈ A s.t. y = f (x)}
In other words, all the points in B onto which an element in A maps.
2 Vector Spaces
2.1 Vectors
In math 114 a sort of mixed notation was used for vectors. Sometimes we
wrote v = 2i + j + k, while we also used the notation v = ⟨2, 1, 1⟩. We
see that there's an immediate equivalence between points of R3 and vectors.
If O denotes the origin in R3, then given a point P = (2, 1, 1) it defines the
vector

  OP = ⟨2, 1, 1⟩.
The moral of the story is that a point P = (x, y, z) represents a vector
and a point is essentially just a list of numbers. Those of you familiar with
computers might have heard the term vector processing, which simply means
that something operates on a list of numbers. In this spirit we will make the
following definition:
Definition 2.1.1 A vector is just an n-tuple (a1, . . . , an) of numbers.
Typically we will have (a1, . . . , an) ∈ Rn i.e. each ai is a real number.
Note 2.1.3 In applications one often encounters vectors in, say, R100 . For
example, say we sample 100 numbers independently for some statistical ap-
plication. The result will be a vector (a1 , . . . , a100 ) ∈ R100 , where ai is the
number obtained in the ith sample. Here, the vector is chosen from a 100-
dimensional space. This does not make any sense if thinking about it visually.
However, it does make sense, if we think about the dimension as simply being
the degrees of freedom of the experiment. Each number is basically chosen
independently of the others.
Definition 2.2.1 A vector space is a subset V ⊂ Rn s.t. the following
conditions hold:
1. If x, y ∈ V , then x + y ∈ V .
2. If c ∈ R, then cx ∈ V .
The former condition is called closed under vector addition while the latter
is called closed under multiplication by a scalar.
2.3 Span of vectors
A linear combination of the vectors x1, . . . , xk ∈ Rn is any vector of the form

  x = c1x1 + . . . + ckxk,   ci ∈ R, i = 1, . . . , k.
Example 2.3.2 Assume we are given one vector x = (1, 0) ∈ R2 . Then we
may write (2, 0) = 2x, so (2, 0) is a linear combination of x. However, say
we look at the vector (1, 1). Clearly, (1, 1) ≠ cx = (c, 0) for any c ∈ R, so
(1, 1) is not a linear combination of x.
Example 2.3.3 Assume we are given vectors x1 = (1, 0, 1), x2 = (2, 1, 0),
x3 = (0, 0, 1). We want to find all the linear combinations of x1 , x2 , x3 . We
want to find all x ∈ R3 s.t.
x = c1 x 1 + c2 x 2 + c3 x 3 .
(a1 , a2 , a3 ) = c1 x 1 + c2 x 2 + c3 x 3
= c1 (1, 0, 1) + c2 (2, 1, 0) + c3 (0, 0, 1)
= (c1 , 0, c1 ) + (2c2 , c2 , 0) + (0, 0, c3 )
= (c1 + 2c2 , c2 , c1 + c3 ).
  V = {x = (a1, . . . , an) ∈ Rn | x = c1x1 + . . . + ckxk, ci ∈ R, i = 1, . . . , k}.

In other words, V is the set of all vectors that can be written as a linear
combination of the x1, . . . , xk. We denote this by
V = span(x1 , . . . , xk ).
Note 2.3.6 The previous definition goes the other way around. We are given
the vector space V and we say that x1 , . . . , xk is a spanning set if it happens
that V = span(x1 , . . . , xk ).
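As a computational aside (this is my own illustration, not something we will
rely on in class), here is a small Python/numpy sketch that tests whether a
vector lies in the span of some given vectors by comparing ranks; the helper
name is_in_span is made up for this example.

    import numpy as np

    def is_in_span(x, vectors):
        # x lies in span(v1, ..., vk) exactly when appending x as an extra
        # column does not increase the rank of the matrix whose columns
        # are the vi's.
        A = np.column_stack(vectors)           # columns are v1, ..., vk
        Ax = np.column_stack(vectors + [x])    # same matrix with x appended
        return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Ax)

    # The vectors of Example 2.3.3:
    x1 = np.array([1., 0., 1.])
    x2 = np.array([2., 1., 0.])
    x3 = np.array([0., 0., 1.])
    print(is_in_span(np.array([1., 2., 3.]), [x1, x2, x3]))   # True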
x = c1 x1 + . . . + ck xk , y = d1 x1 + . . . + dk xk
cx = cc1 x1 + . . . + cck xk ∈ V.
WARNING 2.3.9 The span of no vectors is by convention defined to be the
vector space containing just the zero vector 0 ∈ Rn. This comes up when
defining dimension later on.
2. Exchanging two vectors in the list (thus renaming one to be the other
and vice versa).
x = c1 x 1 + . . . + ck x k .
We can now ask the following question: are the ci's unique, that is, are
there multiple choices of the ci s.t. we get x as the linear combination?
Concretely, can we find ci, di ∈ R, s.t. ci ≠ di for some i and still

  x = c1x1 + . . . + ckxk = d1x1 + . . . + dkxk?
Here’s the reason why we are interested in this. We are looking for a
definition of dimension, which is based on the following. Assume we are
given vectors x1 , . . . , xk ∈ Rn . Let’s remove a vector from this collection,
say, xk . By theorem 2.3.8 we know that
span(x1 , . . . , xk−1 ) ⊂ span(x1 , . . . , xk ),
but do we have equality or not?
The alert reader might notice that this definition is quite problematic for
practical computations. Since span(x1 , . . . , xk ) is an infinite set (unless all
vectors are the point 0), to check for linear independence, we would have
to check the condition for infinitely many x ∈ span(x1 , . . . , xk ). This would
not be practical, since we would never be done, so fortunately we have the
following theorem, which implies we only need to check one element:
x = c1 x1 + . . . + ck xk = d1 x1 + . . . + dk xk .
It follows that
2.5 Dimension
Now that we know linear independence and how to test for it, we can begin
defining the concept of dimension. We start with the following theorem.
Theorem 2.5.1 Assume that x1 , . . . , xk ∈ Rn are not linearly independent.
Then we may remove some vector xi from the list without changing the span,
i.e.
span(x1 , . . . , xi−1 , xi+1 , . . . , xk ) = span(x1 , . . . , xk ).
Proof. By theorem 2.4.5, we may find some c1, . . . , ck ∈ R, s.t. ci ≠ 0 for
some i and
0 = c1 x 1 + . . . + ck x k .
Since we can just rename our variables, we may just as well assume that
c1 6= 0 to simplify notation.
We already know that span(x2 , . . . , xk ) ⊂ span(x1 , . . . , xk ), so we need
to show the opposite inclusion. Pick x ∈ span(x1 , . . . , xk ). We need to show
that we may write
x = d2 x2 + . . . + dk xk .
It follows from 0 = c1x1 + . . . + ckxk that −c1x1 = c2x2 + . . . + ckxk. By our
assumption c1 ≠ 0, so we can divide both sides by −c1, giving

  x1 = −(c2/c1)x2 − . . . − (ck/c1)xk.

Now, writing x = α1x1 + . . . + αkxk, we get

  x = α1x1 + . . . + αkxk
    = −α1(c2/c1 x2 + . . . + ck/c1 xk) + α2x2 + . . . + αkxk
    = (α2 − α1c2/c1)x2 + . . . + (αk − α1ck/c1)xk.
1. If our list contains vectors that are 0 we may remove them without
affecting the span.
2. If x1, . . . , xk are not linearly independent, then 0 = c1x1 + . . . + ckxk
with some ci ≠ 0. Remove the vector xi from the list. This won't then
affect the span.
3. If the list that is left is not linearly independent, we may repeat the
previous step.
4. Continue until left with a linearly independent list of vectors with the
same span.
At each stage the list of vectors decreases in length by 1, so the process has
to stop, and it always ends with a linearly independent list. This
proves the following:
span(x1 , . . . , xk ) = span(A)
Example 2.5.4 Let x1 = (1, 0), x2 = (0, 1), x3 = (1, 1). We showed in
example 2.4.2 that these vectors are not linearly independent. Furthermore,
we showed that x3 = x1 + x2, so by Theorem 2.5.1 we may remove x3 without
changing the span; hence the dimension of span(x1, x2, x3) is 2.
Note 2.5.7 The alert reader might notice some serious problems here. Suppose
V = span(x1, . . . , xk) = span(y1, . . . , yl) for two different sets of vectors.
Then running the algorithm for our first list produces some
A ⊂ {x1 , . . . , xk }
while running the algorithm for the second list produces some subset
B ⊂ {y1 , . . . , yl }.
Our definition then says that #A = dim V = #B, so our definition of di-
mension only makes sense if A and B contain the same number of vectors.
Fortunately, we have the following theorem, which tells us that vector spaces
behave pretty much exactly the way we would like them to. In particular, A
and B will always contain the same number of vectors.
Theorem 2.5.8 Let V ⊂ Rn be a vector space. Then the following are true:
1. V has a basis, so there always exists a finite list of vectors x1 , . . . , xk
s.t. they are linearly independent and V = span(x1 , . . . , xk ).
2. Any two bases of V have the same number of elements, thus dim V is a
well-defined constant independent of the chosen basis. Mathematicians
would call such a thing an intrinsic property of V .
3. If W ⊂ V ⊂ Rn are vector spaces, then dim W ≤ dim V ≤ dim Rn = n.
4. If W ⊂ V ⊂ Rn and dim W = dim V , then W = V .
Proof. Take either math 312 or math 370.
Note 2.5.9 Gaussian elimination, taught later in the class, will provide us
with an effective method for computing the dimension of the span of some
vectors.
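As a preview of that method, here is a short numpy sketch (again just my
own illustration; the helper name span_dimension is invented) that computes
dim span(x1, . . . , xk) as the rank of the matrix whose rows are the given vectors.

    import numpy as np

    def span_dimension(vectors):
        # dim span(x1, ..., xk) equals the rank of the matrix with rows x1, ..., xk
        return np.linalg.matrix_rank(np.vstack(vectors))

    # Example 2.5.4: x1 = (1, 0), x2 = (0, 1), x3 = (1, 1) span a 2-dimensional space.
    print(span_dimension([[1, 0], [0, 1], [1, 1]]))   # 2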
1. x + y ∈ V for all x, y ∈ V (closure under addition)
Now assume V is any set with an addition satisfying all the above properties.
Let F denote either R or C. If for each x ∈ V and c ∈ F, we can define a
new vector cx ∈ V s.t. the following hold
5. 1x = x for all x ∈ V ,
Example 2.6.2 Let P(R) denote polynomials with real coefficients and set
V = P(R). We can clearly multiply a real polynomial by a real number, so
e.g. 2(1+x+x2 ) = 2+2x+2x2 and similarly a sum of two real polynomials is
a real polynomial. Thus we pick F = R. The reader can check that addition
of polynomials and multiplication by a scalar satisfies all the axioms given
above, where the zero vector is just the trivial polynomial 0.
Example 2.6.3 Let V = {f : [0, 1] → R} denote the collection of all func-
tions from [0, 1] to R. If f, g are two such functions, then we can add them
pointwise, so we can define a function f + g by (f + g)(x) = f (x) + g(x).
Similarly, we can multiply such a function with a real number, so if c ∈ R,
then cf is the function (cf )(x) = c · f (x) i.e. we multiply the value at each
point with c. This is precisely what it intuitively means to multiply a func-
tion by a number. Again, it’s a simple exercise to check that V together with
F = R satisfies all the properties of a vector space.
When solving differential equations, we will have to deal with vector
spaces V , where the elements i.e. the vectors x ∈ V are solutions to a dif-
ferential equation. Solving an equation will then be the problem of finding a
basis for this so called solution space. Here’s a simple example:
3 Matrices
3.1 Introduction
Definition 3.1.1 A matrix is a rectangular array of numbers or functions,
  [ a1,1  a1,2  · · ·  a1,m ]
  [ a2,1  a2,2  · · ·  a2,m ]
  [  ⋮     ⋮            ⋮   ]
  [ an,1  an,2  · · ·  an,m ]
By ai,j we simply mean the element in the ith row and j th column. Sometimes
I will also use the term (i, j)-element of a matrix which just means the element
ai,j . By the dimensions of a matrix we mean the number of rows and columns.
A matrix with n rows and m columns is called an n-by-m matrix. We will
often denote an n-by-m matrix by A = (aij ), where i = 1, . . . , n and j =
1, . . . , m. This shorter notation is usually needed when proving things about
matrices.
Definition 3.1.2 n-by-m matrices are denoted by Matn,m (R), Matn,m (C)
etc. where the set in parenthesis denotes what set the elements of the matrix
belong to.
Example 3.1.3 The following are matrices in Mat2,2(R) and Mat2,3(C) respectively,

  [ 1  √2 ]     [ 1 0 i ]
  [ 0   1 ] ,   [ 2 1 0 ] ,

while the following is a matrix, where the entries are functions,

  [ x^2 + 1   e^x + 1 ]
  [    x         1    ] .
A matrix with function elements can be thought of as a function into either
Matn,m (R) or Matn,m (C), i.e. for each value of the variable, we get a real or
complex matrix.
Note 3.1.4 A real matrix is of course also a complex matrix, since real num-
bers are also complex, but just with zero imaginary part. Thus Matn,m (R) ⊂
Matn,m (C).
Definition 3.1.5 Given an n-by-m matrix

  [ a1,1  a1,2  · · ·  a1,m ]
  [ a2,1  a2,2  · · ·  a2,m ]
  [  ⋮     ⋮            ⋮   ]
  [ an,1  an,2  · · ·  an,m ]

we call (ai,1, ai,2, . . . , ai,m) its ith row vector and (a1,j, a2,j, . . . , an,j) its jth
column vector.
Example 3.1.6 Given the complex matrix from the previous example,

  [ 1 0 i ]
  [ 2 1 0 ] ,

the first column vector is (1, 2) while the second row vector is (2, 1, 0). The
first row vector of

  [ x^2 + 1   e^x + 1 ]
  [    x         1    ]

is (x^2 + 1, e^x + 1).
Note that matrices can only be added if they have the same dimensions.
More compactly, if A = (aij ) and B = (bij ) are of equal dimensions, then
A + B = (aij + bij ).
2. A + (B + C) = (A + B) + C (associativity)
3. (cd)A = c(dA)
Note 3.2.5 With regards to the generalized definition of a vector space,
the reader can check that n-by-m matrices with real coefficients form a real
vector space. More generally, let V be any vector space with scalars F (i.e.
either R or C) and W = Matn,m (V ) the set of n-by-m matrices where the
matrix elements are elements of V . Then W is also an F vector space with
component-wise addition and scalar products.
Example 3.2.7 Let x denote the first column vector and y the second row
vector of

  [ x^2 + 1   e^x + 1 ]
  [    x         1    ] .

Then

  x · y = (x^2 + 1, x) · (x, 1) = (x^2 + 1)x + x = x^3 + 2x.
We can also multiply matrices, but the product of matrices has a some-
what unintuitive definition. Matrices were originally invented to handle
systems of linear equations, which we will also study, so therein lies the
motivation for the definition. The idea was to write a system of m linear
equations in n variables,

  a1,1 x1 + a1,2 x2 + . . . + a1,n xn = b1
                  ⋮
  am,1 x1 + am,2 x2 + . . . + am,n xn = bm,
Therefore, we want the "product" on the left to equal

  a1,1 x1 + a1,2 x2 + . . . + a1,n xn
                  ⋮
  am,1 x1 + am,2 x2 + . . . + am,n xn .
The following definition for the matrix product accomplishes just this:
Thus the (i, j)-element of the matrix AB is the dot product of the ith row
vector of A and the jth column vector of B. If A is n-by-p and B is p-by-m,
the product AB is an n-by-m matrix. Using the shorter notation with A = (aij)
and B = (bij), then AB = (cij), where

  cij = Σ_{k=1}^{p} aik bkj.
WARNING 3.2.9 The dot product between row vectors of A and column
vectors of B has to make sense, i.e. they have to be lists of equal length. This
means that the matrix product AB is only defined if the number of
columns of A equals the number of rows of B. The sizes in the matrix
product work as follows: an n-by-p matrix times a p-by-m matrix gives an
n-by-m matrix.
WARNING 3.2.10 If the product AB is defined, this does not imply that
BA is defined!
Example 3.2.11 Here are a bunch of examples:

  [ 1 0 2 ] [ 1 1 1 ]   [ 1+2  1  1+2 ]   [ 3 1 3 ]
  [ 0 1 1 ] [ 0 0 0 ] = [  1   0   1  ] = [ 1 0 1 ]
            [ 1 0 1 ]

  [ 1 0 ] [ 1 1 ]   [  1    1  ]   [ 1 1 ]
  [ 2 1 ] [ 1 1 ] = [ 2+1  2+1 ] = [ 3 3 ]

  [ 1 1 ] [ 1 0 ]   [ 1+2  1 ]   [ 3 1 ]
  [ 1 1 ] [ 2 1 ] = [ 1+2  1 ] = [ 3 1 ]

  [ 2 1 ] [ x1 ]   [ 2x1 + x2 ]
  [ 1 2 ] [ x2 ] = [ x1 + 2x2 ]

The last product shows how matrix multiplication relates to systems of linear
equations.
Proof. These are annoying to check, but it is best done using the shorthand
notation for matrices and the sum formula for the elements in the product
matrix. The interested student can try to check these.
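If you want to check such products mechanically, numpy's @ operator does
matrix multiplication; a tiny sketch (my own illustration) reproducing two of
the computations from Example 3.2.11, which also shows that the order of
the factors matters:

    import numpy as np

    A = np.array([[1, 1], [1, 1]])
    B = np.array([[1, 0], [2, 1]])

    print(B @ A)   # [[1 1], [3 3]]  -- the second product in Example 3.2.11
    print(A @ B)   # [[3 1], [3 1]]  -- the third product; AB and BA differ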
3.3 The transpose of a matrix
Definition 3.3.1 Given an n-by-m matrix

      [ a1,1  · · ·  a1,m ]
  A = [  ⋮            ⋮   ] ,
      [ an,1  · · ·  an,m ]

its transpose A^T is the m-by-n matrix whose (j, i)-element is ai,j.
In other words, the ith row vector of A^T is the ith column vector of A.
Using our shorter notation, define a′ji = aij; then if A = (aij), i = 1, . . . , n,
j = 1, . . . , m, we get A^T = (a′ji), where j = 1, . . . , m and i = 1, . . . , n.
Example 3.3.2

  [ 1 0 1 ]^T   [ 1 2 ]
  [ 2 1 1 ]   = [ 0 1 ]
                [ 1 1 ]
1. (AT )T = A
2. (A + B)T = AT + B T
3. (AB)T = B T AT
4. (cA)T = cAT
Proof. Most are obvious if you try a few examples. I will assign a few as
homework.
Symmetric matrices will come up later in the class when we study eigen-
values. They also have an important role in the definition of quadratic forms.
These are very important in number theory and are also used in geometry to
classify all plane conics. Again, this is outside the scope of this course. The
interested reader can look up these terms on Wikipedia.
Definition 3.4.1 A matrix with the same number of rows and columns is
called a square matrix.
Definition 3.4.2 Denote by In the n-by-n square matrix with 1’s on the
diagonal and 0’s everywhere else. Then this is called the n-by-n identity
matrix or just the identity matrix if the dimensions are clear from the context.
If the dimensions of the matrix are obvious from the context, we will just write
I to denote the identity matrix. More concretely, the matrix looks as follows:

  [ 1 · · · 0 ]
  [ ⋮   ⋱   ⋮ ]
  [ 0 · · · 1 ]

For any n-by-n matrix A we have

  IA = A = AI,
so I is the identity element with respect to matrix multiplication of n-by-n
matrices. This also explains the name of the matrix.
Proof. Write I = (bij), A = (aij), where i, j = 1, . . . , n, so that

  bij = 1 if i = j,  and  bij = 0 if i ≠ j.

Letting IA = (cij) and AI = (c′ij), we get

  cij = Σ_{k=1}^{n} bik akj = bii aij = aij

and

  c′ij = Σ_{k=1}^{n} aik bkj = aij bjj = aij .
Definition 3.4.4 A square matrix whose off-diagonal elements are all zero is
called a diagonal matrix.
Example 3.4.5 The matrix on the left is a diagonal matrix, while the matrix
on the right is not.

  [ −1 0 0 ]     [ −1 2 0 ]
  [  0 1 0 ] ,   [  0 1 0 ]
  [  0 0 3 ]     [  0 0 3 ]

Example 3.4.7 The matrix on the left is lower triangular, while the matrix
on the right is upper triangular.

  [ −1 0 0 ]     [ −1 2 0 ]
  [  4 1 0 ] ,   [  0 1 0 ]
  [  1 0 3 ]     [  0 0 3 ]
Note 3.4.8 Note that a diagonal matrix is both lower and upper triangular.
Moreover, a diagonal matrix is also symmetric.
and if we equate components, this will give us precisely the original linear
system.
Definition 3.5.1 The matrix A in the matrix notation is called the coefficient
matrix of the system, B is often called the constant vector and X
the vector of unknowns.
Example 3.5.4 The system on the left is consistent while the one on the
right is not:
  x + y = 0        x + y = 1
  x − y = 0   ,    x + y = 0 .

In matrix form these equations are:

  [ 1  1 ] [ x ]   [ 0 ]      [ 1 1 ] [ x ]   [ 1 ]
  [ 1 −1 ] [ y ] = [ 0 ] ,    [ 1 1 ] [ y ] = [ 0 ] .
Definition 3.5.5 The previous three kinds of operations are called elemen-
tary operations on a system. Notice the resemblance with Theorem 2.3.10.
This is not a coincidence.
Definition 3.5.6 Denote by Eij the n-by-n matrix for which every element
is a zero except that the (i, j)-element is 1.
Note 3.5.7 If A = (aij) is an n-by-n matrix, then A = Σ_{i=1}^{n} Σ_{j=1}^{n} aij Eij,
so this gives us a nice notation for modifying matrices.
1. In + (a − 1)Eii.
2. In + Eij + Eji − Eii − Ejj, i ≠ j.
3. In + aEij, i ≠ j.
The first one is just the identity matrix with an element a in (i, i). The
second one is the identity matrix with the ith and jth row interchanged.
The third one is the identity matrix, but with the element 0 in (i, j) replaced
with a.
Theorem 3.5.9 Let A be an n-by-m matrix, with the following row vectors:
x1
A = ... .
xn
Multiplying with the elementary matrices on the left has the following effect.
Write E1 = In + (a − 1)Eii , E2 = In + Eij + Eji − Eii − Ejj and E3 = In + aEij
and assume i < j, then
  E1 A has rows x1, . . . , xi−1, a·xi, xi+1, . . . , xn   (the ith row is multiplied by a),

  E2 A has rows x1, . . . , xi−1, xj, xi+1, . . . , xj−1, xi, xj+1, . . . , xn   (rows i and j are interchanged),

  E3 A has rows x1, . . . , xi−1, xi + a·xj, xi+1, . . . , xn   (a times row j is added to row i).
Multiplying a 3-by-2 matrix with the middle one gives e.g.

  [ 1 0 0 ] [ 1 1 ]   [ 1 1 ]
  [ 0 0 1 ] [ 2 4 ] = [ 3 1 ] ,
  [ 0 1 0 ] [ 3 1 ]   [ 2 4 ]
so it interchanges the second and third row as desired. The reader should
check that multiplying with the other two has the result claimed in the
previous theorem.
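The same check is easy to do in numpy; here is a small sketch (my own
illustration) building two of the elementary matrices from the identity and
applying them to the 3-by-2 matrix above:

    import numpy as np

    A = np.array([[1, 1], [2, 4], [3, 1]])

    E2 = np.eye(3)
    E2[[1, 2]] = E2[[2, 1]]       # identity with rows 2 and 3 interchanged
    print(E2 @ A)                 # rows 2 and 3 of A are interchanged

    E3 = np.eye(3)
    E3[0, 2] = 5.0                # I + 5*E_{1,3}
    print(E3 @ A)                 # 5 times row 3 is added to row 1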
Given a linear system of m equations in n unknowns, we call the matrix

  [ a1,1  · · ·  a1,n  b1 ]
  [  ⋮            ⋮    ⋮  ]
  [ am,1  · · ·  am,n  bm ]

the augmented matrix of the system. In other words, it's just the matrix
(A | B), when the system is written as AX = B.
3.6 Gaussian elimination
Consider, for example, the augmented matrix

  [ 1 1 1  3 ]
  [ 0 1 2  1 ]
  [ 0 0 1  1 ] .
The corresponding linear system to this is
x+y+z =3
y + 2z = 1
z=1
We see that solving the system is very simple, since we already know the
value for z. Thus, we can plug in z = 1 into the second equation, then solve
for y = 1 and finally plug in y and z into the first equation to get x = 1.
The matrix defining the system has a very special form, which is what makes
solving the system easy.
Example 3.6.2 The matrix in the first example in this section is in row-
echelon form. Another one would be
1 0 2 −1 0
0 0 1 −5 7
0 0 0 1 0 .
0 0 0 0 0
Definition 3.6.3 The first nonzero element in each row is called a pivot.
Algorithm 3.6.4 A linear system can be solved using the following proce-
dure:
1. By only applying elementary row operations to its augmented matrix,
transform the matrix into row-echelon form.
2. Now solve as in the first example of this section (if possible) by solving
for variables from the bottom up.
Definition 3.6.5 The procedure used to perform step (1) is called Gaussian
elimination. The procedure is best understood through a few examples, but
I’ll give an algorithmic description below.
2. Switch rows if needed, so that the uppermost element in that row, i.e. the
pivot, is nonzero.
3. Multiply the row with the pivot by a scalar, so that the pivot is 1.
This works as follows. We see that for the matrix to be in row-echelon form,
the elements below the top-left element need to be zeros. Denote the ith row
by Ri. We can replace R2 by R2 − 4R1 and R3 by R3 − 2R1. The process
gives the following chain:

  [ 1  3 −2 −7 ]  R2←R2−4R1   [ 1   3  −2  −7 ]  R2←−R2/11   [ 1 3 −2 −7 ]
  [ 4  1  3  5 ]  R3←R3−2R1   [ 0 −11  11  33 ]  R3←−R3/11   [ 0 1 −1 −3 ]
  [ 2 −5  7 19 ]      ⇒       [ 0 −11  11  33 ]      ⇒       [ 0 1 −1 −3 ]

  R3←R3−R2   [ 1 3 −2 −7 ]
      ⇒      [ 0 1 −1 −3 ]
             [ 0 0  0  0 ]
This matrix in row-echelon form then corresponds to the system
x + 3y − 2z = −7
y − z = −3 ,
0=0
which is then equivalent to the original one, thus has the same set of solutions.
Note that even though we started with three equations, the system essentially
reduced to a system of two. This will be explained in more detail in the next
section. The system is now underdetermined, so we can solve it as follows.
Since the last equation puts no constraints on z, we may choose z freely, say,
z = 1. From the second equation we then get y = −2,
and finally from the first x = −7 − 3(−2) + 2(1) = 1.
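For the curious, here is a compact Python/numpy sketch (my own illustration,
not a required part of the course) of reduction to row-echelon form using
exactly the elementary row operations above: switch rows so the pivot is
nonzero, scale the pivot to 1, and clear the entries below it.

    import numpy as np

    def row_echelon(M):
        # Return a row-echelon form of M via elementary row operations.
        A = np.array(M, dtype=float)
        rows, cols = A.shape
        r = 0
        for c in range(cols):
            pivot_rows = [i for i in range(r, rows) if abs(A[i, c]) > 1e-12]
            if not pivot_rows:
                continue                                     # no pivot in this column
            A[[r, pivot_rows[0]]] = A[[pivot_rows[0], r]]    # switch rows
            A[r] = A[r] / A[r, c]                            # make the pivot 1
            for k in range(r + 1, rows):
                A[k] = A[k] - A[k, c] * A[r]                 # clear below the pivot
            r += 1
            if r == rows:
                break
        return A

    print(row_echelon([[1, 3, -2, -7], [4, 1, 3, 5], [2, -5, 7, 19]]))
    # [[1, 3, -2, -7], [0, 1, -1, -3], [0, 0, 0, 0]], as computed by hand above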
This is what solving
a real-life system would amount to, since the elements in the matrix typically
end up being rational numbers, making the computation slightly annoying.

  [ 1  3  5 −1  2 ]                [ 1  3  5 −1  2 ]           [ 1  3  5 −1  2 ]
  [ 1  3  0  1  1 ]  R2←R2−R1      [ 0  0 −5  2 −1 ]  R2↔R3    [ 0 −4 −2  6 −4 ]
  [ 2  2  8  4  0 ]  R3←R3−2R1     [ 0 −4 −2  6 −4 ]    ⇒      [ 0  0 −5  2 −1 ]
  [ 1  0  3  0  1 ]  R4←R4−R1 ⇒    [ 0 −3 −2  1 −1 ]           [ 0 −3 −2  1 −1 ]

  R2←−R2/4    [ 1  3   5   −1    2 ]   R4←R4+3R2   [ 1  3   5    −1    2 ]
     ⇒        [ 0  1  1/2  −3/2  1 ]       ⇒       [ 0  1  1/2  −3/2   1 ]
              [ 0  0  −5    2   −1 ]               [ 0  0  −5     2   −1 ]
              [ 0 −3  −2    1   −1 ]               [ 0  0 −1/2  −7/2   2 ]

  R3←−R3/5    [ 1  3   5    −1    2  ]   R4←R4+R3/2   [ 1  3   5    −1      2   ]
     ⇒        [ 0  1  1/2  −3/2   1  ]       ⇒        [ 0  1  1/2  −3/2     1   ]
              [ 0  0   1   −2/5  1/5 ]                [ 0  0   1   −2/5    1/5  ]
              [ 0  0 −1/2  −7/2   2  ]                [ 0  0   0  −37/10  21/10 ]

  R4←−10R4/37   [ 1  3   5   −1     2    ]
       ⇒        [ 0  1  1/2 −3/2    1    ]
                [ 0  0   1  −2/5   1/5   ]
                [ 0  0   0    1   −21/37 ]
3.7 Rank of a matrix
If x1, . . . , xn are the row vectors of a matrix, then the elementary row operations
do not change

  V = span(x1, . . . , xn),
the span! So given a matrix in row-echelon form, is there an easy way to
figure out the dimension of the span of the row vectors?
Let's look at an example. The following matrix from the previous section
is in row-echelon form
1 0 2 −1 0
0 0 1 −5 7
0 0 0 1 0
0 0 0 0 0
and we’ll denote the row vectors by x1 , . . . , x4 . First notice that the last
vector is the zero vector, so it doesn’t add anything to the span, so we can
just throw x4 away. We are then left with three row vectors corresponding
to the nonzero ones. Now look at the equation
  0 = c1x1 + c2x2 + c3x3.

The equation for the first component reads

  0 = c1 · 1 + c2 · 0 + c3 · 0.

This forces c1 = 0. The equation for the third component will then become

  0 = 0 · 2 + c2 · 1 + c3 · 0,

which forces c2 = 0. Looking at the fourth component then forces c3 = 0, so
the nonzero rows are linearly independent and the dimension of the span is 3.
This method has another nice application to material we did on vector
spaces. Assume that we are given some vectors x1 , . . . , xk ∈ Rn and we want
to figure out the dimension of V = span(x1 , . . . , xk ). This can be done using
the idea of the previous algorithm as follows:
3. The rank will be the dimension of V and the nonzero rows of the row-
echelon form of A will form a basis of V .
1. The row-echelon form has no zero rows. We say it has full rank, since
the rank equals the number of rows of A.
2. The row-echelon form has one or more zero rows at the bottom, so the
rank is less than the number of rows of A.
Now let's think about what will happen if we add one more column to A and compute
the row-echelon form of the new matrix. If you look at the examples of
Gaussian elimination from the previous section, you’ll notice that at each
stage of the algorithm we only care about one column at a time, since we try
to clear the elements to zero below our current pivot. Therefore, nothing will
change in the algorithm if we add a new column, except when we reach that
column in the algorithm. The end result is that adding one more column to
the right might turn a zero row of the old row-echelon form into a nonzero
row in the row-echelon form of the new matrix. Thus we get the following:
The significance is that in the first case the system has solutions while in
the second case it does not. The second case essentially means that the
row-echelon form of (A | B) will have a row of the following form
0 ··· 0 1 .
But in the linear system corresponding to the matrix this will correspond
to an equation 0 = 1, which has no solution! Thus we have the following
theorem:
Algorithm 3.8.3 (Checking if a linear system has a solution) Let the system
be AX = B and let (A | B) be the augmented matrix. We do the following:
1. Reduce the matrix (A | B) into row-echelon form.
2. Let M denote the reduced matrix and N the reduced matrix with the
last column removed.
3. If rank M > rank N , then the system has no solution.
Now assume that the latter holds, then we may find two solutions to the
equation AX = B, so let X1 and X2 be solutions. As we saw earlier this
means that
A(X1 − X2 ) = 0,
but then we can multiply X1 − X2 by a scalar getting c(X1 − X2 ) and, again,
A(c(X1 − X2 )) = cA(X1 − X2 ) = 0,
1. Compute the row-echelon form of (A | B).
2. Let M denote the reduced matrix and N the reduced matrix with the
last column removed.
3. If rank M > rank N , then stop since the system has no solutions.
4. If rank M = rank N = m, where m is the number of unknowns, then the
system has exactly one solution.
5. If rank M = rank N < m, then the system has infinitely many solu-
tions.
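In code, the whole check boils down to two rank computations; here is a
minimal numpy sketch (my own illustration; the function name classify_system
is made up):

    import numpy as np

    def classify_system(A, B):
        # Classify AX = B by comparing ranks, as in the algorithm above.
        A = np.asarray(A, dtype=float)
        aug = np.column_stack([A, B])
        rank_A = np.linalg.matrix_rank(A)
        rank_aug = np.linalg.matrix_rank(aug)
        if rank_aug > rank_A:
            return "no solutions"
        if rank_A == A.shape[1]:      # rank equals the number of unknowns
            return "exactly one solution"
        return "infinitely many solutions"

    # The two systems of Example 3.5.4:
    print(classify_system([[1, 1], [1, -1]], [0, 0]))   # exactly one solution
    print(classify_system([[1, 1], [1, 1]], [1, 0]))    # no solutions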
3.9 Determinants
In math 114 you have already met a form of the determinant when computing
the cross product of two vectors. The determinant is a tool that you can
think about as follows. You feed it a list of n vectors in Rn and out comes a
real number. In other words it’s a function that takes as input vectors and
outputs a number. The function is denoted by det and in the case n = 2 it
would give us a number
The problem of this section is trying to determine what we would like this
function to do and then how to define it.
The starting point is the following. Assume that we are given two vectors
x1, x2 in R2. Then from 114 you might remember that they form a paral-
lelogram with vertices 0, x1, x2, x1 + x2. We would like the determinant to
measure the signed area of this parallelogram.
This idea of "volume" can be generalized to Rn and we want the previous
two examples to generalize to the definition of volume in higher dimensions. If
you don’t want to think about what something like that would philosophically
mean (e.g. what’s a cube in 4-dimensions and what’s the volume?), just
assume that all the following examples have n = 2 or n = 3.
If the determinant measures volume, then it should certainly satisfy the
following
det(cx1 , x2 , . . . , xn ) = c det(x1 , x2 , . . . , xn ), (1)
so if I stretch (or shrink) the parallelepiped in one dimension by a factor of
c, then certainly the volume also changes by a factor of c. Now if you add a
vector x to x1, then we should get

  det(x1 + x, x2, . . . , xn) = det(x1, x2, . . . , xn) + det(x, x2, . . . , xn),    (2)

since the parallelepiped on the left can be cut into pieces and assembled to
form the ones on the right (draw picture of n = 2 case). Also if two vectors
are the same, then the volume should be zero, since the object is "flat". In R3
the parallelepiped would have no height; similarly, the parallelogram defined
by two copies of the same vector has no height. Thus we want that
det(x1 , . . . , xn ) = 0 (3)
if xi = xj for some i 6= j.
To make notation easier, we can define the determinant on matrices. Since
the determinant takes as input n vectors x1 , . . . , xn ∈ Rn , we can just let A
be an n-by-n matrix where the row vectors are x1 , . . . , xn i.e.
x1
A = ... .
xn
and then let det A = det(x1 , . . . , xn ). Now we are ready to actually define
the determinant function through a list of conditions we want it to satisfy:
1. det In = 1.
2. If we replace the ith row vector by ax + by we have the following
equality:

  det(x1, . . . , ax + by, . . . , xn) = a det(x1, . . . , x, . . . , xn) + b det(x1, . . . , y, . . . , xn).
Note 3.9.3 These conditions are nothing but a compact way of expressing
what we had just done above. At least the middle condition deserves some
comment. It actually says a few things. First, if b = 0, then it simply says
that multiplying a row by the constant a multiplies the determinant by a,
so it’s just the equation (1) above. When a = 1 it’s simply the condition
(2) about adding another vector to one of the rows. The last condition is
precisely condition (3).
Proof. I’ll use the vector notation for this one, since it’s easier. Assume that
A has row vectors x1 , . . . , xn and assume we switch the first and second row.
The argument for any other pair is the same, but the notation is just messier.
We expand using property (2) and use (3) to get rid of terms, so
0 = det(x1 + x2 , x1 + x2 , x3 , . . . , xn )
= det(x1 , x1 , x3 , . . . , xn ) +
det(x1 , x2 , x3 , . . . , xn ) +
det(x2 , x1 , x3 , . . . , xn ) +
det(x2 , x2 , x3 , . . . , xn )
= det(x1 , x2 , x3 , . . . , xn ) +
det(x2 , x1 , x3 , . . . , xn )
Since the sum of the last two terms is zero, they must have the same magni-
tude and opposite signs.
Theorem 3.10.2 If a matrix contains a row of all zeros, then the determi-
nant is zero.
Proof. I’ll assume the first row is zero to simplify notation. The proof for
any other row is the same. Let the row vectors be 0, x2 , . . . , xn . Then
  det(0, x2, . . . , xn) = det(0 + 0, x2, . . . , xn)
                         = det(0, x2, . . . , xn) + det(0, x2, . . . , xn),

so det(0, x2, . . . , xn) equals twice itself and must therefore be zero.
Theorem 3.10.3 Adding a scalar multiple of one row to a different row will
not change the value of the determinant.
Proof. Assume we add a scalar multiple of the row xi , i 6= 1, to the first
row. The general case is again similar. Let the row vectors be x1 , x2 , . . . , xn .
Then

  det(x1 + c·xi, x2, . . . , xn) = det(x1, x2, . . . , xn) + c det(xi, x2, . . . , xn).
But xi occurs twice in the list of vectors in the second term, so property (3)
of the determinant tells us that the term is zero.
The following theorem is very important and the observation used in the
proof will be used to actually compute determinants.
Theorem 3.10.4 If an n-by-n matrix has rank less than n, then the deter-
minant is zero.
Proof. Let A have rank(A) < n. Let B be the row-echelon form of A. Then
B will have a row of all zeros at the bottom, so by the previous theorem the
determinant is zero. The theorem then follows from observing the following:
2. Interchanging two rows of a matrix will not make the determinant zero,
since it just switches the sign of the determinant.
zero. Thus the determinant equals the product of the diagonal elements in
this case.
Now assume that all diagonal elements are nonzero. Subtracting a23 a33^{−1}
times the third row from the second row and a13 a33^{−1} times the third row
from the first row, the matrix becomes

  [ a11 a12  0  ]
  [  0  a22  0  ] .
  [  0   0  a33 ]
since the identity matrix has determinant 1. The proof of the lower triangular
case is essentially the same.
The following is the most efficient way of computing the determinant for
large matrices.
1. Reduce the matrix to row-echelon form keeping track of how many rows
have been switched and how many times a row has been multiplied by
a constant. (These are the operations that change the determinant)
The determinant also has the following two properties that are much
harder to prove:
The latter theorem has lots of useful consequences. Essentially any claim
about columns becomes a claim about rows in the transpose, so we can
translate all our theorems to involve operations on columns instead of rows.
Using this and our previous results, we can summarize the following long list
of properties for the determinant:
6. Multiplying a row or a column by a constant c multiplies the determi-
nant by c.
7. det A = det AT .
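As a quick numerical sanity check of these properties, here is a short numpy
sketch (my own illustration) showing that interchanging two rows flips the
sign of the determinant and that det A = det A^T:

    import numpy as np

    A = np.array([[-3., 2., -1.],
                  [ 0., 5.,  3.],
                  [ 1., -1., 2.]])

    swapped = A[[1, 0, 2]]      # the first two rows interchanged
    print(np.linalg.det(A), np.linalg.det(swapped))           # same size, opposite sign
    print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))   # True: det A = det A^T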
In other words,

      [ a11 a12 a13 ]
  det [ a21 a22 a23 ] = a11 a22 a33 + a12 a23 a31 + a13 a21 a32
      [ a31 a32 a33 ]   − a13 a22 a31 − a11 a23 a32 − a12 a21 a33 .
Proof. Again we can reduce symbolically to row-echelon form. The details
are quite messy.
Example 3.11.4 The matrix on the right is the (2, 1) minor of the one on
the left:

  [ −3  2 −1 ]
  [  0  5  3 ] ,    [  2 −1 ]
  [  1 −1  2 ]      [ −1  2 ]
Note 3.11.8 The previous formula lets you express the determinant of an
n-by-n matrix as a sum of determinants of (n − 1)-by-(n − 1) matrices.
Example 3.11.9 Expanding the determinant in the previous example along the
third column gives

      [ −3  2 −1 ]
  det [  0  5  3 ] = (−1) · (−1)^{1+3} det [ 0  5 ]
      [  1 −1  2 ]                         [ 1 −1 ]

                     + 3 · (−1)^{2+3} det [ −3  2 ]
                                          [  1 −1 ]

                     + 2 · (−1)^{3+3} det [ −3  2 ]
                                          [  0  5 ] .
Note 3.11.10 The cofactor expansion is rarely an efficient method for com-
puting determinants of matrices larger than 4-by-4. However, in some special
cases it might be useful. For example if there’s a row or column in the matrix
that has mostly zeros in it, then computing the expansion along that row or
column results in a very few terms in the sum in Theorem 3.11.6, since most
aij ’s will be zeros.
Example 3.11.11 The following matrix has mostly zeros in the last column,
so the expansion along that column gives
      [ −3  2  0 ]
  det [  0  5  0 ] = 0 · (−1)^{1+3} det [ 0  5 ]
      [  1 −1  2 ]                      [ 1 −1 ]

                     + 0 · (−1)^{2+3} det [ −3  2 ]
                                          [  1 −1 ]

                     + 2 · (−1)^{3+3} det [ −3  2 ]
                                          [  0  5 ]

                   = 2 det [ −3  2 ]
                           [  0  5 ] .
Similarly, we could expand along the second row, which would have the same
effect.
3.12 Matrix inverse
Definition 3.12.1 Let A be an n-by-n matrix. An n-by-n matrix B is called
a left inverse if BA = I. Similarly, B is called a right inverse if AB = I. If
BA = I = AB, then B is called an inverse of A.
Matrix multiplication takes a lot of work on larger matrices, which should
be apparent by now. Therefore, checking that a matrix is an inverse takes
a lot of work, since we have to perform two matrix multiplications to do it.
Fortunately, the following result allows us to cut the work in half.
The following shows how the matrix inverse is extremely useful for solv-
ing systems of linear equations. Assume we are given an invertible n-by-n
matrix A. Now given a system of linear equations
AX = B
we can just multiply both sides by A−1 from the left. It follows that
X = A−1 B.
What’s important about this is that the matrix inverse gives the solution
right away for any choice of vector B, since the solution X is now a function
of B.
Example 3.12.6 You should check by multiplication that the following ma-
trices are inverses:

      [  2 0 1 ]             [ −2   5   −3 ]
  A = [ −2 3 4 ] ,  A^{−1} = [ −8  17  −10 ]
      [ −5 5 6 ]             [  5 −10    6 ]
so once we know the matrix inverse of the coefficient matrix of a system, solving
it becomes a triviality. The moral of the story is that if you need to solve
many systems of equations with the same coefficient matrix, but varying
constant vectors, then the most economical approach is to try to compute
the matrix inverse of the matrix. If it exists, then you will get a formula like
the above, where you can just plug in your constants i.e. you don’t need to
do Gaussian elimination again for every choice of constant vector.
Having a nice tool like the matrix inverse is not of much use unless we ac-
tually know how to compute it. The rest of this section will be devoted to
developing an algorithm that lets you both compute it, if it exists, as well as
determine that it does not.
Assume we are given matrices A, B with the same number of rows and
let (A | B) again be the augmented matrix. Assume we want to perform an
elementary row operation on (A | B), say, we want to switch two rows in
the matrix. We know from previous sections that this can be performed by
multiplying with an elementary matrix E from the left. The reader should
convince himself/herself that the following formula then holds:

  E(A | B) = (EA | EB).
It simply says that it doesn’t matter if we switch rows in the whole matrix or
if we do it separately for the two parts. The end result will still be the same.
The same thing is true for any of the elementary row operations. Thus we
have the following theorem:
Theorem 3.12.7 Let A and B be two matrices with n rows and let E be
an elementary n-by-n matrix. Then E(A | B) = (EA | EB).
Next assume that A is an n-by-n square matrix and I is the identity matrix.
Assume that we can transform A into the identity matrix by a sequence of
elementary row operations. Since each row operation corresponds to mul-
tiplying from the left by an elementary matrix E, there will be a sequence
of elementary matrices E1, . . . , Ek corresponding to the row operations, in
other words
Ek Ek−1 · · · E2 E1 A = I. (4)
Now what happens if we perform the same row operations on the matrix
(A | I)? Then we will instead end up with
Ek Ek−1 · · · E2 E1 (A | I) = (I | Ek Ek−1 · · · E2 E1 ).
Thus the second half of the matrix will contain the matrix B = Ek Ek−1 · · · E2 E1.
However, (4) tells us that BA = I, so we have in fact found a way to compute
the matrix inverse! It follows that we have the following theorem:
Example 3.12.9 We compute the inverse of the matrix

  [ 1 −1  2 ]
  [ 3  1  2 ]
  [ 1  1 −1 ]

using the method just described. We get

  [ 1 −1  2 | 1 0 0 ]     [ 1 −1  2 |  1 0 0 ]
  [ 3  1  2 | 0 1 0 ]  ⇒  [ 0  4 −4 | −3 1 0 ]
  [ 1  1 −1 | 0 0 1 ]     [ 0  2 −3 | −1 0 1 ]

     [ 1 −1  2 |   1    0   0 ]     [ 1 −1  2 |   1    0    0 ]
  ⇒  [ 0  1 −1 | −3/4  1/4  0 ]  ⇒  [ 0  1 −1 | −3/4  1/4   0 ]
     [ 0  2 −3 |  −1    0   1 ]     [ 0  0  1 | −1/2  1/2  −1 ]
The matrix on the left is now in row-echelon form and we can work backwards
to subtract lower rows from upper rows to transform our left matrix into the
identity matrix. This works as follows:
     [ 1 −1 0 |   2   −1    2 ]     [ 1 0 0 |  3/4  −1/4   1 ]
  ⇒  [ 0  1 0 | −5/4  3/4  −1 ]  ⇒  [ 0 1 0 | −5/4   3/4  −1 ]
     [ 0  0 1 | −1/2  1/2  −1 ]     [ 0 0 1 | −1/2   1/2  −1 ]

It follows that

  [ 1 −1  2 ]−1   [  3/4  −1/4   1 ]
  [ 3  1  2 ]   = [ −5/4   3/4  −1 ] .
  [ 1  1 −1 ]     [ −1/2   1/2  −1 ]
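To double-check a hand computation like this, numpy's inverse routine is
handy; a minimal sketch (my own illustration):

    import numpy as np

    A = np.array([[1., -1.,  2.],
                  [3.,  1.,  2.],
                  [1.,  1., -1.]])

    print(np.linalg.inv(A))                               # matches the matrix above
    print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))   # True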
Using the concept of a linear map, we will give a complete answer to the
following question: when does a matrix have an inverse? Going back to ex-
ample 3.12.9, let's look at the working-backwards step after we have reached
the row-echelon form. Notice that the method used to work backwards to the
identity matrix works for any n-by-n matrix in row-echelon form, assuming
that the last row is not a zero row. However, these matrices are precisely the
square matrices that have nonzero determinant, since the row-echelon form
has determinant one and the original determinant differs by a nonzero con-
stant corresponding to row switches and scalings of rows by nonzero scalars
during the Gaussian elimination.
Conversely, assume that the matrix A has an inverse A−1 . If the row-
echelon form of A has a zero row, then rank A < n, so that the system
AX = 0 has a nonzero solution, call it x. But then
x = Ix = A−1 Ax = A−1 0 = 0,
which would contradict the fact that x 6= 0. It follows that if rank A < n,
then no inverse can exist, so if the inverse exists, then the row-echelon form
does not have a zero row at the bottom. We have thus proved the following
theorem:
Finally, there’s a useful formula for 2-by-2 matrices, which is very quick
to compute:
Theorem 3.12.13 Write

  A = [ a b ]
      [ c d ] .

If det A ≠ 0, then we have the formula:

  A^{−1} = (1/ det A) [  d −b ] = (1/(ad − bc)) [  d −b ] .
                      [ −c  a ]                 [ −c  a ]
Proof. Compute the product of A and the claimed inverse and simplify.
Example 3.13.2 The following matrix reflects points around the x-axis.
1 0
0 −1
Example 3.13.3 Write θ = arctan a and let Aθ be the rotation matrix above
corresponding to the angle θ. Denote by A the reflection matrix above. Then
the matrix
  Aθ A Aθ^{−1}
reflects points around the line y = ax.
It turns out that by defining matrices for different operations, we can use
them as building blocks to compute quite complicated maps, which would be
very hard to construct directly. The last two examples serve as illustrations.
While this is extremely useful in practice, we won’t have time to go into this.
I will write an optional short section on this which the interested student can
skim.
Notice that in the second example a vector along the x-axis stays fixed by
the matrix, in the third example a vector along the line y = ax stays fixed, and in
the last example a vector along the line gets scaled by c. The geometry in
such a map can usually be analyzed through the following concept.
Example 3.13.7 Let A = I be the identity matrix. For every vector x ∈ Rn ,
we then have that Ax = x. This shows that every nonzero vector is an
eigenvector corresponding to the eigenvalue 1. In particular, 1 is the only
eigenvalue.
Ax = λx ⇔ Ax − λx = 0 ⇔ (A − λI)x = 0.
i.e. we subtract λ from each diagonal element of A. Having fixed the matrix
A, denote
p(λ) = det(A − λI),
so p : R → R is a function with parameter λ and the eigenvalues of A are
precisely the zeros of p. Let’s try to figure out what this function p looks
like. We will start with an example.
Example 3.13.10 Write

  A = [ a b ]
      [ c d ] ,

so that

  p(λ) = det(A − λI) = det [ a − λ    b   ] = (a − λ)(d − λ) − bc.
                           [   c    d − λ ]
If we simplify the expression on the right, we see that p is a quadratic poly-
nomial in λ. The following is true in general.
Note 3.13.13 Not all polynomials with real coefficients have real roots, so
some matrices have no eigenvalues. An example is given by example 3.13.1.
Therefore, finding the eigenvectors correspond to finding all the solutions to
the linear system BX = 0, where B = A − λI. However, just finding a
formula for all the solutions, is not that useful. We actually want a bit more.
Null(A) = {x ∈ Rn | Ax = 0}
Theorem 3.13.16 Let A be an m-by-n matrix, then the null space Null(A) ⊂
Rn is a vector space.
Proof. Let x, y ∈ Null(A). Then
A(x + y) = Ax + Ay = 0 + 0 = 0
and
A(cx) = cAx = c0 = 0,
so x + y ∈ Null(A) and cx ∈ Null(A).
So how does this fit into our quest for finding the eigenvectors corre-
sponding to an eigenvalue? Well, notice that the solutions to the equation
Ax = λx consists of precisely the set Null(A − λI), so what we actually want
to find is a basis for this vector space.
Finding the basis for Null(A − λI) sounds quite tricky. We need to find
”enough” linearly independent eigenvectors, so that they span Null(A − λI).
But how do we know we have ”enough” i.e. we need to know the dimension
of Null(A − λI). Fortunately, there is the following theorem, which we won’t
prove:
There’s also the following theorem, which gives an upper bound for the
dimension and can be quicker to use than the previous one:
What the previous theorem says is that if you notice that λ1 is e.g. a
double root of p(λ) and you find two linearly independent vectors in Null(A−
λ1 I), then these form a basis, since dim Null(A − λ1 I) ≤ 2.
3. For each λi find all solutions to the system (A − λi I)X = 0 and by
choosing the free parameters appropriately find the number of linearly
independent eigenvectors needed as determined by the previous step.
This reveals that the eigenvalues are 0, −1, −3. Next, we look for the eigen-
vectors corresponding to the eigenvalue 0. By Theorem 3.13.19, we have
dim Null(A − 0I) = 1, so we only need to find one eigenvector, since it will
be a basis. This is true for each eigenvalue, so for each eigenvalue we look
for one eigenvector. Thus, we solve (A − 0I)X = 0 i.e.
−2 2 0 0
2 0 −1 0
1 3 −2 0
and a standard Gaussian elimination gives e.g. the solution x = (1, 1, 2).
To find an eigenvector corresponding to the eigenvalue −1, we solve the
system (A + I)X = 0 i.e.
−1 2 0 0
2 1 −1 0
1 3 −1 0
and a standard Gaussian elimination gives e.g. the solution y = (2, 1, 5).
Finally, to find an eigenvector corresponding to the eigenvalue −3, we
solve the system (A + 3I)X = 0 i.e.
1 2 0 0
2 3 −1 0
1 3 1 0
and, again, a standard Gaussian elimination gives e.g. z = (2, −1, 1). To
check that these are indeed eigenvectors, we can compute
  [ −2  2  0 ] [ 1 ]   [ 0 ]
  [  2  0 −1 ] [ 1 ] = [ 0 ]
  [  1  3 −2 ] [ 2 ]   [ 0 ]

  [ −2  2  0 ] [ 2 ]   [ −2 ]
  [  2  0 −1 ] [ 1 ] = [ −1 ]
  [  1  3 −2 ] [ 5 ]   [ −5 ]

  [ −2  2  0 ] [  2 ]   [ −6 ]
  [  2  0 −1 ] [ −1 ] = [  3 ]
  [  1  3 −2 ] [  1 ]   [ −3 ]
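For comparison, numpy finds the same eigenvalues, and its eigenvectors are
scalar multiples of the ones computed by hand; a small sketch (my own
illustration):

    import numpy as np

    A = np.array([[-2., 2.,  0.],
                  [ 2., 0., -1.],
                  [ 1., 3., -2.]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)       # approximately 0, -1, -3 (possibly in a different order)
    print(eigenvectors)      # each column is an eigenvector, scaled to unit length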
Theorem 3.13.24 (Cayley-Hamilton theorem) Let p(λ) be the characteristic
polynomial of A, then p(A) = 0.
Proof. Most probably proven in math 370.
Example 3.13.25 Find the inverse of the matrix

  A = [ 1 1 ]
      [ 0 1 ] .

We computed earlier the characteristic polynomial, which is p(λ) = (1 − λ)^2 =
λ^2 − 2λ + 1. It follows by the Cayley-Hamilton theorem that
p(A) = A2 − 2A + I = 0 ⇒ I = 2A − A2 = A(2I − A).
This shows that A−1 = 2I − A.
The previous example also works in general. Assume that we have some
n-by-n matrix A. Then we compute the characteristic polynomial, which is
of the form
  p(λ) = an λ^n + an−1 λ^{n−1} + . . . + a1 λ + a0.

Plugging in A, and using Cayley-Hamilton, we get

  an A^n + an−1 A^{n−1} + . . . + a1 A + a0 I = 0.

If a0 ≠ 0, then we get

  I = −a0^{−1}(an A^n + an−1 A^{n−1} + . . . + a1 A)
    = A(−a0^{−1}(an A^{n−1} + an−1 A^{n−2} + . . . + a1 I)),

which shows that

  A^{−1} = −a0^{−1}(an A^{n−1} + an−1 A^{n−2} + . . . + a1 I).
Thus, this method always works, if the constant term a0 of the characteristic
polynomial is nonzero. One might ask then, when is a0 6= 0, so when will
this method work? It turns out that a0 6= 0 precisely when det A 6= 0, so if a
matrix is invertible, then this method always yields a formula for the inverse.
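Here is a short numpy sketch of this trick (my own illustration): np.poly
returns the coefficients of the characteristic polynomial, highest power first,
and a Horner-style loop builds the matrix polynomial from the formula above.

    import numpy as np

    A = np.array([[1., 1.],
                  [0., 1.]])

    coeffs = np.poly(A)           # [1, -2, 1] for this A
    a0 = coeffs[-1]

    n = A.shape[0]
    B = np.zeros_like(A)
    for c in coeffs[:-1]:         # builds a_n A^{n-1} + ... + a_1 I
        B = B @ A + c * np.eye(n)
    A_inv = -B / a0

    print(A_inv)                  # [[1, -1], [0, 1]], i.e. 2I - A as in Example 3.13.25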
3.14 Diagonalization
In this chapter we develop some tools that turn out to be extremely useful
when working with systems of ordinary differential equations. Assume that
an n-by-n matrix A can be written as
A = P DP −1 ,
where D is a diagonal matrix. Then we may compute

  A^n = (P D P^{−1})^n = (P D P^{−1})(P D P^{−1}) · · · (P D P^{−1})    (n factors)
      = P D^n P^{−1},

since each interior P^{−1}P cancels.
This now lets us define the matrix exponential eA as follows. We know that
e^x satisfies the Taylor series

  e^x = 1 + x + x^2/2! + x^3/3! + . . . ,

so we define

  e^A = I + A + A^2/2! + A^3/3! + . . . .

One can show that this series converges for every square matrix A, but it is
hard to compute directly in general. However, if A = P D P^{−1}, then the
series simplifies as

  e^A = P P^{−1} + P D P^{−1} + P D^2 P^{−1}/2! + P D^3 P^{−1}/3! + . . .
      = P (I + D + D^2/2! + D^3/3! + . . .) P^{−1}.
The matrix D is diagonal and of the form

  [ a11        0  ]
  [      ⋱        ]
  [  0        ann ] ,

and for such a matrix the series I + D + D^2/2! + . . . is again diagonal with
diagonal entries e^{a11}, . . . , e^{ann}, so

  e^A = P [ e^{a11}        0    ] P^{−1}.
          [         ⋱           ]
          [    0        e^{ann} ]
Furthermore, if we multiply A by a scalar t, then tA = P (tD) P^{−1}, so that

  e^{tA} = P [ e^{t a11}        0      ] P^{−1}.
             [           ⋱             ]
             [     0         e^{t ann} ]
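A small numerical check of this formula (my own illustration; it uses scipy's
expm only as an independent reference) for a diagonalizable 2-by-2 matrix:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0., 1.],
                  [1., 0.]])            # diagonalizable, eigenvalues 1 and -1
    t = 0.5

    evals, P = np.linalg.eig(A)         # columns of P are eigenvectors, A = P D P^{-1}
    etA = P @ np.diag(np.exp(t * evals)) @ np.linalg.inv(P)

    print(np.allclose(etA, expm(t * A)))   # True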
Now that we have this interesting matrix exponential, which we are able
to compute for matrices that can be written as A = P DP −1 , we still don’t
have any idea when we can find matrices P and D that gives us this relation
and when they can be found, how they can be found. It turns out that
eigenvectors and eigenvalues are the key.
The exponent e in the factorization

  p(λ) = (λ − λ1)^e q(λ),   q(λ1) ≠ 0,

is called the algebraic multiplicity of λ1, while dim Null(A − λ1 I)
is the geometric multiplicity of the eigenvalue λ1 . The geometric multiplicity
is simply the number of linearly independent eigenvectors corresponding to
the eigenvalue λ1 .
Note 3.14.5 Theorem 3.13.19 states precisely that the geometric multiplic-
ity of an eigenvalue is less than or equal to the algebraic multiplicity.
3. If the geometric and algebraic multiplicities match in part (2), then the
matrix is diagonalizable else it isn’t.
4. If we have determined that A is diagonalizable, find a basis for each
Null(A − λi I). Once this is done for each i, we get a list of n linearly
independent eigenvectors x1, . . . , xn, where each xi corresponds to an
eigenvalue λi.
We see that p(λ) factorizes completely, since it has real roots. Solving the
required linear systems, we find the following bases of eigenvectors for Null(A −
I) and Null(A − 2I):
The algebraic and geometric multiplicity of the eigenvalue 1 is one, and for
the eigenvalue 2 both are two. It follows that the matrix is diagonaliz-
able. Now our list of eigenvectors is x1 , x2 , x3 and the corresponding list of
eigenvalues is 1, 2, 2. It follows that
      [  0 0 −1 ] [ 1 0 0 ] [  0 0 −1 ]^{−1}
  A = [ −1 1  0 ] [ 0 2 0 ] [ −1 1  0 ]     .
      [  1 0  1 ] [ 0 0 2 ] [  1 0  1 ]
If we want an actual expression for the inverse matrix, then a simple compu-
tation shows that the characteristic polynomial of P is λ^3 − 2λ^2 + 2λ − 1. It
follows that

  I = P (P^2 − 2P + 2I)  ⇒  P^{−1} = P^2 − 2P + 2I = [  1 0 1 ]
                                                     [  1 1 1 ]
                                                     [ −1 0 0 ] ,

so that

      [  0 0 −1 ] [ 1 0 0 ] [  1 0 1 ]
  A = [ −1 1  0 ] [ 0 2 0 ] [  1 1 1 ] .
      [  1 0  1 ] [ 0 0 2 ] [ −1 0 0 ]
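A quick numpy check of this factorization (my own illustration): rebuild A
from P and D, confirm its eigenvalues, and confirm the Cayley-Hamilton
expression for P^{−1}:

    import numpy as np

    P = np.array([[ 0., 0., -1.],
                  [-1., 1.,  0.],
                  [ 1., 0.,  1.]])
    D = np.diag([1., 2., 2.])

    A = P @ D @ np.linalg.inv(P)
    print(np.linalg.eigvals(A))                 # 1, 2, 2
    print(np.allclose(np.linalg.inv(P),
                      P @ P - 2 * P + 2 * np.eye(3)))   # True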
Note 3.14.11 The matrices P and D are not unique. D depends on the
order in which we list our eigenvectors. P depends on both the order of the
chosen eigenvectors as well as on the choice of basis for Null(A − λi I) for
each i.
4 Higher-Order ODEs
4.1 Basic definitions
Definition 4.1.1 A linear ordinary differential equation is an equation of
the form
This lets us express the linear ordinary differential equation above as simply
Ly = g(x).
Definition 4.1.2 Any L like the above containing terms with a function
times a power of D will be called a linear differential operator or just a
differential operator. The highest power of D appearing will be called the
degree of the differential operator.
Theorem 4.1.4 Let L = an(x)D^n + an−1(x)D^{n−1} + . . . + a1(x)D + a0(x), with the
functions ai(x) continuous and an(x) nonzero on an interval I. If x0 ∈ I, then the
initial-value problem

  Ly = g(x),   y^{(n−1)}(x0) = yn−1, . . . , y′(x0) = y1, y(x0) = y0
has a unique solution, y, on the interval I.
4.2 Homogeneous equations
Definition 4.2.1 Let L be an nth degree differential operator. A homoge-
neous equation is an equation of the form
Ly = 0,
c1 y 1 + . . . + ck y k
Definition 4.2.3 A set of functions y1 (x), . . . , yn (x) are called linearly in-
dependent on an interval I if
  c1y1(x) + . . . + cnyn(x) = 0

for all x in I implies c1 = . . . = cn = 0.
Example 4.2.4 The functions x and x^2 are linearly independent on any
interval I ⊂ R containing more than one point. This is because if c1 and c2
are not both zero, then

  c1 x + c2 x^2

is nonzero for at least some x in I.
Theorem 4.2.10 There exists a fundamental set of solutions for any nth
order homogeneous linear differential equation on an interval I.
Theorem 4.2.11 If y1 , . . . , yn is a set of fundamental solutions to an nth
order homogeneous linear differential equation on an interval I, then every
other solution is of the form
y = c1 y 1 + . . . + cn y n
Note 4.2.12 If you read the section on the generalized definition of a vector
space, then one can make a few observations. The set of solutions to Ly = 0
is a vector space, since Lcy1 = cLy1 = 0 and L(y1 + y2 ) = Ly1 + Ly2 = 0,
so the set of solutions is closed under scalar multiplication and addition. All
the other properties listed in that section can also be checked.
What the above theorem then says in this language is that the funda-
mental set of solutions is a basis for the vector space of all solutions, so every
other solution is in the span. It also says that the dimension of the solution
set of an nth order linear differential equation is n. Using this one can apply
all the techniques and theorems of linear algebra, including eigenvectors and
eigenvalues, to the study of differential equations. This quickly leads to a
large field of mathematics called functional analysis.
Ly = g(x),
Corollary 4.3.3 Let yp be a particular solution to the nth order nonho-
mogeneous differential equation Ly = g(x) and y1 , . . . , yn a fundamental set
of solutions to the homogeneous equation Ly = 0. Then any solution of
Ly = g(x) is of the form
y p + c1 y 1 + . . . + cn y n .
Proof. yh in the theorem is of the form c1 y1 + . . . + cn yn being a solution to
Ly = 0.
Definition 4.3.5 The previous results show that any solution of Ly = g(x)
is of the form y = yp + yh . The function yh is called the complementary
function of the solution.
4.4 Homogeneous linear equations with constant coef-
ficient
Definition 4.4.1 Let L = an D^n + an−1 D^{n−1} + . . . + a1 D + a0, so the
coefficient of each y^{(i)} is a constant. Then the differential equation

  Ly = 0

is called a homogeneous linear equation with constant coefficients, and

  an c^n + an−1 c^{n−1} + . . . + a1 c + a0 = 0

is called its auxiliary equation. Substituting y = e^{cx} into Ly = 0 shows that
e^{cx} is a solution exactly when c is a root of the auxiliary equation, so solving
Ly = 0 comes down to factoring the polynomial p(c) = an c^n + . . . + a1 c + a0.

Note 4.4.4 The fundamental theorem of algebra says that given the polynomial
p(c) = an c^n + an−1 c^{n−1} + . . . + a1 c + a0, it factors as

  p(c) = an (c − c1)^{e1} (c − c2)^{e2} · · · (c − ck)^{ek},

where each ci can be a complex number and ei is the multiplicity of the root ci.
Theorem 4.4.5 (Conjugate pairs) If c = a + bi is a root of p(c) = an cn +
an−1 cn−1 + . . . + a1 c + a0 , then a − bi is also a root.
Example 4.4.6 The equation y'' + 2y' + y = 0 has the auxiliary equation p(c) = (c+1)^2,
so we only find one solution y = e^{−x} instead of two, which is what we need
in order to find the general solution.
  e^{c1 x}, x e^{c1 x}, . . . , x^{e1−1} e^{c1 x}, e^{c2 x}, x e^{c2 x}, . . . , x^{e2−1} e^{c2 x}, . . . , e^{ck x}, x e^{ck x}, . . . , x^{ek−1} e^{ck x}.
Example 4.4.10 A fundamental set of solutions to the previous example
y'' + 2y' + y = 0 is given by e^{−x}, x e^{−x}.
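The roots of an auxiliary equation can also be found numerically with
numpy.roots; a small sketch (my own illustration) for this example and for
the equation used in Example 4.5.8 below:

    import numpy as np

    # Coefficients of the auxiliary polynomial, highest power first.
    print(np.roots([1, 2, 1]))    # [-1, -1]: double root, solutions e^{-x}, x e^{-x}
    print(np.roots([1, -5, 4]))   # roots 4 and 1: solutions e^{4x}, e^{x}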
We still have one problem we haven't addressed properly. Assume that
our equation p(c) = 0 has a complex root a + bi; then the corresponding
solution is
  e^{ax+ibx} = e^{ax} e^{ibx} = e^{ax}(cos bx + i sin bx)

using Euler's formula e^{ibx} = cos bx + i sin bx. This function is complex-valued,
but we are only interested in real-valued solutions to our differential equation.
Since complex roots occur in conjugate pairs, it follows that a − ib is also
a root, which simplifies to
eax−ibx = eax (cos bx − i sin bx).
Write y1 = e^{ax}(cos bx + i sin bx) and y2 = e^{ax}(cos bx − i sin bx). Then
y1 + y2 = 2eax cos bx, y1 − y2 = 2ieax sin bx.
Since for any constant C, C(y1 + y2 ) and C(y1 − y2 ) are solutions, we may
divide by 2 in the first expression and by 2i in the second. It follows that
eax cos bx, eax sin bx
are solutions to Ly = 0 and they turn out to be linearly independent. This
gives the following refinement to our previous theorem.
corresponding to the real roots, and the following corresponding to each conjugate pair of complex roots a ± ib of multiplicity e:
e^{ax} cos bx, x e^{ax} cos bx, . . . , x^{e−1} e^{ax} cos bx,
e^{ax} sin bx, x e^{ax} sin bx, . . . , x^{e−1} e^{ax} sin bx.
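If you want to see this on a computer, here is a small Python (sympy) check. The equation is my own illustration, not one from the notes; its auxiliary equation c^2 + 2c + 2 = 0 has the complex roots −1 ± i.

    import sympy as sp

    x = sp.symbols('x')
    y = sp.Function('y')
    # Auxiliary equation c^2 + 2c + 2 = 0 has roots -1 +/- i, so we expect
    # e^{-x} cos x and e^{-x} sin x as a fundamental set of solutions.
    sol = sp.dsolve(y(x).diff(x, 2) + 2*y(x).diff(x) + 2*y(x), y(x))
    print(sol)   # should give y(x) = (C1*sin(x) + C2*cos(x))*exp(-x)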
4.5 Undetermined Coefficients
1. A constant, a polynomial, e^{ax}, sin ax, cos ax;
Theorem 4.5.2 Let g(x) be of the form described above. Then there's a finite number of functions f1 , . . . , fk s.t. any derivative g^(n)(x) can be written in the form
g^(n)(x) = c1 f1 + . . . + ck fk
for some constants ci .
Proof. Obvious once you realize what it’s saying.
Note 4.5.3 The assumption is that each fi has only one term in it, so no fi
is of the form, say, x + 1, but x^2 e^x is allowed, since there's only one term in
the expression.
Example 4.5.4 Let g(x) = x e^x sin x + 1. Any derivative will only contain terms that are sums of constants times the following:
x e^x sin x, x e^x cos x, e^x sin x, e^x cos x, 1.
This can be seen by noting that the derivative of any function in the list is a linear combination of functions in the list, so differentiating does not produce any new types of terms.
Note 4.5.6 If g(x) = g1 (x) + . . . + gk (x), it’s often easier to solve the equa-
tions Ly = gi (x), i = 1, . . . , k, separately and then add the particular solu-
tions using the superposition principle. E.g. if g(x) = x^2 + sin x, then solve Ly = x^2 and Ly = sin x and add the solutions found to get a solution of Ly = x^2 + sin x.
Example 4.5.8 Take the equation y'' − 5y' + 4y = 8e^x . Here g(x) = 8e^x and all the derivatives are linear combinations of f1 = e^x . The auxiliary equation of y'' − 5y' + 4y = 0 is m^2 − 5m + 4 = 0, so m = 1, 4. It follows that
y1 = e^x ,   y2 = e^{4x}
form a fundamental set of solutions of the homogeneous equation.
Since y1 is a constant times f1 , we need to replace f1 by x e^x , so yp = A x e^x . Plugging into the equation we get
yp'' − 5 yp' + 4 yp = −3A e^x = 8 e^x ,
so A = −8/3. It follows that the general solution is
y = −(8/3) x e^x + c1 e^x + c2 e^{4x} .
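Here is a short sympy check of this example (assuming sympy is available); it verifies the particular solution and lets dsolve reproduce the general solution, up to how the constants are grouped.

    import sympy as sp

    x = sp.symbols('x')
    yp = -sp.Rational(8, 3) * x * sp.exp(x)       # the particular solution found above
    print(sp.simplify(yp.diff(x, 2) - 5*yp.diff(x) + 4*yp))   # 8*exp(x)

    y = sp.Function('y')
    ode = y(x).diff(x, 2) - 5*y(x).diff(x) + 4*y(x) - 8*sp.exp(x)
    print(sp.dsolve(ode, y(x)))   # something equivalent to C1*exp(4*x) + (C2 - 8*x/3)*exp(x)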
4.6 Variation of parameters (optional)
The reason why a specific list of functions was chosen in the previous example
is that their derivatives only contain finitely many different types of terms.
If we choose for example g(x) = ln x, then successive differentiations produce higher and higher powers 1/x^n , so there's no finite list of different terms
that can express every derivative. This method generally requires lots of
computation as we will soon see, but it’s the most general method for finding
particular solutions to nonhomogeneous systems that we will describe.
Let L = D^n + an−1(x) D^{n−1} + . . . + a1(x) D + a0(x); thus we only assume that the leading coefficient is 1. Assume that we can solve the homogeneous
system Ly = 0, so that we find a fundamental set of solutions
y1 , . . . , y n .
Now write yp = u1 (x)y1 (x) + . . . + un (x)yn (x), where the ui are unknown
functions. Substituting yp into Ly = g(x) gives the following system of
linear equations
y1 u1' + y2 u2' + . . . + yn un' = 0
y1' u1' + y2' u2' + . . . + yn' un' = 0
⋮
y1^(n−2) u1' + y2^(n−2) u2' + . . . + yn^(n−2) un' = 0
y1^(n−1) u1' + y2^(n−1) u2' + . . . + yn^(n−1) un' = g(x).
We can rewrite this system in matrix form as
\begin{pmatrix} y_1 & y_2 & \cdots & y_n \\ y_1' & y_2' & \cdots & y_n' \\ \vdots & \vdots & & \vdots \\ y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)} \end{pmatrix} \begin{pmatrix} u_1' \\ u_2' \\ \vdots \\ u_n' \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ g(x) \end{pmatrix}.
Using Cramer’s rule, which we haven’t talked about, so you can just take
this as given, this system has the following solutions:
uk' = Wk / W ,
where
W = \begin{vmatrix} y_1 & y_2 & \cdots & y_n \\ y_1' & y_2' & \cdots & y_n' \\ \vdots & \vdots & & \vdots \\ y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)} \end{vmatrix}
and Wk is the determinant we get by replacing the kth column of W with
the constant vector of the system. This means that
W_1 = \begin{vmatrix} 0 & y_2 & \cdots & y_n \\ 0 & y_2' & \cdots & y_n' \\ \vdots & \vdots & & \vdots \\ g(x) & y_2^{(n-1)} & \cdots & y_n^{(n-1)} \end{vmatrix} , \quad W_2 = \begin{vmatrix} y_1 & 0 & \cdots & y_n \\ y_1' & 0 & \cdots & y_n' \\ \vdots & \vdots & & \vdots \\ y_1^{(n-1)} & g(x) & \cdots & y_n^{(n-1)} \end{vmatrix} , \quad \text{and so on.}
To finally find the ui (x) we integrate the derivatives ui' solved from this system.
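To see the method in action on a computer, here is a small sympy sketch. The example y'' + y = sec x is my own and is not worked in the notes; the code builds W, W1 and W2 and integrates them.

    import sympy as sp

    x = sp.symbols('x')
    # Fundamental solutions of y'' + y = 0 and the right-hand side g(x) = sec(x).
    y1, y2, g = sp.cos(x), sp.sin(x), sp.sec(x)

    W  = sp.Matrix([[y1, y2], [y1.diff(x), y2.diff(x)]]).det()
    W1 = sp.Matrix([[0,  y2], [g,          y2.diff(x)]]).det()
    W2 = sp.Matrix([[y1, 0 ], [y1.diff(x), g         ]]).det()

    u1 = sp.integrate(sp.simplify(W1 / W), x)   # u1' = W1/W
    u2 = sp.integrate(sp.simplify(W2 / W), x)   # u2' = W2/W
    yp = sp.simplify(u1*y1 + u2*y2)
    print(yp)   # cos(x)*log(cos(x)) + x*sin(x), a particular solution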
4.7 Cauchy-Euler equations
Note that this differential equation gives us a solution y2 (z) = f (z), and the answer to the original equation is then y = y2 (ln x), since e^z = x. Here
y2 = C1 e^{−z} + C2 e^{4z} ,
so
y = C1 e^{−ln x} + C2 e^{4 ln x} = C1 x^{−1} + C2 x^4 .
4.8 Linear Models
4.8.1 Free undamped motion
Figure 1: On the left, a spring with no mass attached; in the middle, a spring with a mass attached at equilibrium; on the right, a spring with a mass stretched a distance x from equilibrium.
By Hooke's law the restoring force of the spring is proportional to the displacement x from equilibrium,
F = −kx,
but since we can also measure force by measuring the acceleration of our mass, we know that
F = ma = m d²x/dt² = −kx.
It follows that the movement of our mass is governed by the differential
equation
m x'' + k x = 0.
This equation is usually expressed in the form
x'' + ω² x = 0,    ω² = k/m,
so the auxiliary equation is c² + ω² = 0, i.e. c = ±ωi, and the solution to the
equation is
x(t) = C1 cos ωt + C2 sin ωt.
The previous equation gives a full solution, but the formula can be simplified into a form from which the amplitude, period and frequency are easier to see. If we set A = √(C1² + C2²) and let
tan φ = C1 / C2 ,
then we may write
x(t) = A sin(ωt + φ),
from which it's apparent that the amplitude is A, the frequency is f = ω/2π and the period is T = 2π/ω. The motion defined by the previous equation is called simple harmonic motion or free undamped motion.
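As a quick sanity check on this rewriting, here is a Python sketch of my own; the numbers C1 = 3, C2 = 4, ω = 2 are arbitrary and only for illustration.

    import math

    # Convert x(t) = C1*cos(w t) + C2*sin(w t) into A*sin(w t + phi),
    # using A = sqrt(C1^2 + C2^2) and tan(phi) = C1/C2 as in the text.
    C1, C2, w = 3.0, 4.0, 2.0
    A   = math.hypot(C1, C2)        # amplitude
    phi = math.atan2(C1, C2)        # quadrant-aware angle with tan(phi) = C1/C2
    T, f = 2*math.pi/w, w/(2*math.pi)

    for t in (0.0, 0.3, 1.7):       # the two forms agree at arbitrary times
        assert abs(C1*math.cos(w*t) + C2*math.sin(w*t) - A*math.sin(w*t + phi)) < 1e-12
    print(A, phi, T, f)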
4.8.2 Free damped motion
In practice the mass also experiences a resistive force, e.g. from air resistance or friction. Such a resistance is directly proportional to the velocity of the moving object, so our differential equation is of the form
m x'' + β x' + k x = 0,
which is usually written as
x'' + 2λ x' + ω² x = 0,    2λ = β/m,  ω² = k/m.
The roots of the auxiliary equation are c = −λ ± √(λ² − ω²),
so the solution of our system is then covered by three cases, which are shown
in figure 2.
Case λ² − ω² > 0: In this case the system is overdamped,
x(t) = e^{−λt} (C1 e^{√(λ²−ω²) t} + C2 e^{−√(λ²−ω²) t}),
which does not represent an oscillatory motion. This happens when the damping force is big compared to the spring constant.
Case λ² − ω² = 0: In this case the system is critically damped,
x(t) = e^{−λt} (C1 + C2 t),
and any slight decrease in the damping force results in oscillatory motion.
Case λ² − ω² < 0: In this case the system is underdamped; the roots are now complex, so
x(t) = e^{−λt} (C1 cos √(ω²−λ²) t + C2 sin √(ω²−λ²) t).
Figure 2: Solution curves x(t) of the damped system for damping coefficients β = 8, β = 4, β = 1 and β = 0.
4.8.3 Driven motion
If an external driving force f (t) acts on the mass, the equation of motion becomes
x'' + 2λ x' + ω² x = F(t),
where F (t) = f (t)/m. In this case the system can behave in any continuous
fashion, since by picking f (t) appropriately, we may essentially cause the
spring mass system to move in any way we want.
From the general theory in the previous sections, we know that any solu-
tion is of the form
x(t) = xp (t) + xh (t),
where xp (t) is a particular solution and xh (t) is a solution to the homogeneous
part, which represents a solution of the corresponding free damped case. The
solution xh (t) is called the transient solution and the function xp (t) is called
the steady-state term. The name for the latter comes from the fact that if
f (t) is the zero function, then the steady-state term can be chosen to be
the function x(t) = 0 i.e. the solution where the mass stays at equilibrium.
Worth noting is that imposing initial conditions on the system only affects the transient solution.
Example 4.8.2 The interesting case is when the driving force is periodic, so
we’ll analyze the case when there’s no damping. In this case the differential
equation becomes
x'' + ω² x = F sin αt.
This equation has the particular solution
xp(t) = (F / (ω² − α²)) sin αt
and the general solution
x(t) = C1 cos ωt + C2 sin ωt + (F / (ω² − α²)) sin αt.
If we assume that the system starts from the equilibrium position, i.e. x(0) = 0 = x'(0), then
C1 = 0,    C2 = − αF / (ω(ω² − α²)),
so that
x(t) = (F / (ω(ω² − α²))) (−α sin ωt + ω sin αt).
These formulas only make sense if α ≠ ω, in other words, if the frequency of the harmonic oscillator is different from the frequency of the driving force.
If the frequency of the driving force equals the frequency of the oscillator,
then one would expect the amplitude to start growing towards infinity. This
is analogous to a kid on a swing who keeps adding speed by pushing in sync with the swing's frequency. To find the function modeling this,
one can compute the following limit by applying l’Hopital’s rule, which gives
x(t) = lim_{α→ω} F (−α sin ωt + ω sin αt) / (ω(ω² − α²)) = (F/(2ω²)) sin ωt − (F/(2ω)) t cos ωt.
As expected, the second term shows that the amplitude goes to infinity.
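sympy can reproduce this limit (assuming sympy is installed); the check below compares it with the closed form above.

    import sympy as sp

    t, a, w, F = sp.symbols('t alpha omega F', positive=True)
    x = F*(-a*sp.sin(w*t) + w*sp.sin(a*t)) / (w*(w**2 - a**2))
    resonance = sp.limit(x, a, w)                       # driving frequency approaches omega
    expected = F/(2*w**2)*sp.sin(w*t) - F/(2*w)*t*sp.cos(w*t)
    print(sp.simplify(resonance - expected))            # 0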
5 Systems of linear ODEs
5.1 Basic definitions
Definition 5.1.1 A first-order system of differential equations is an equation
of the form
dx1/dt = g1 (t, x1 , . . . , xn )
⋮
dxn/dt = gn (t, x1 , . . . , xn ),
can be written as
\begin{pmatrix} x_1' \\ x_2' \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 5 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
Proof. Since we differentiate element by element, this follows from the linearity of differentiation exactly as in the scalar case.
The solutions X1 , . . . , Xk are called linearly dependent on I if there are constants c1 , . . . , ck , not all zero, such that
c1 X1 + c2 X2 + . . . + ck Xk = 0
for every t ∈ I. If such nonzero ci can't be found, then the solutions are called linearly independent.
and
W (X1 , X2 ) = \begin{vmatrix} e^{-2t} & 3e^{6t} \\ -e^{-2t} & 5e^{6t} \end{vmatrix} = 8 e^{4t} ≠ 0,
so X1 , X2 are linearly independent solutions and form a fundamental set of
solutions.
X = Xp + c1 X1 + . . . + cn Xn .
Proof. The proof is the same as for the differential equation case: if Xp1 and Xp2 are solutions of X' = AX + F , then Xp1 − Xp2 is a solution of X' = AX, and we proceed as in the differential equation case.
Example 5.1.17 Xp = \begin{pmatrix} 3t - 4 \\ -5t + 6 \end{pmatrix} is a particular solution to
X' = \begin{pmatrix} 1 & 3 \\ 5 & 3 \end{pmatrix} X + \begin{pmatrix} 12t - 11 \\ -3 \end{pmatrix}.
We know from the previous example that X1 = e^{-2t} \begin{pmatrix} 1 \\ -1 \end{pmatrix} and X2 = e^{6t} \begin{pmatrix} 3 \\ 5 \end{pmatrix} are a fundamental set of solutions to the corresponding homogeneous system. It follows that the general solution is
X = \begin{pmatrix} 3t - 4 \\ -5t + 6 \end{pmatrix} + C1 e^{-2t} \begin{pmatrix} 1 \\ -1 \end{pmatrix} + C2 e^{6t} \begin{pmatrix} 3 \\ 5 \end{pmatrix}.
Example 5.2.3 Let us look at the system X' = \begin{pmatrix} 1 & -2 & 2 \\ -2 & 1 & -2 \\ 2 & -2 & 1 \end{pmatrix} X. The
matrix has eigenvalues λ1 = −1 and λ2 = 5 and the former has algebraic
multiplicity 2. Computing the corresponding null spaces for each eigenvalue,
we first find the following linearly independent eigenvectors for −1,
x1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} ,   x2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} ,
so −1 has geometric multiplicity 2. For the eigenvalue 5, we find the eigen-
vector
x3 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} .
Using the theorem, we then get the general solution
X = c1 e^{-t} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + c2 e^{-t} \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + c3 e^{5t} \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} .
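A quick numerical cross-check of the eigenvalues and eigenvectors with numpy (assuming numpy is available):

    import numpy as np

    A = np.array([[1, -2, 2], [-2, 1, -2], [2, -2, 1]], dtype=float)
    vals, vecs = np.linalg.eig(A)
    print(np.round(vals, 6))          # -1 (twice) and 5, up to ordering
    for lam, v in zip(vals, vecs.T):  # each column of vecs is an eigenvector
        assert np.allclose(A @ v, lam * v)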
If the geometric multiplicity does not equal the algebraic multiplicity, then
the situation is more complicated. Assume that we are given the system
X' = AX
and that λ is an eigenvalue of geometric multiplicity g < k, where k is the
algebraic multiplicity. Then we can find the following linearly independent
eigenvectors of A corresponding to the eigenvalue λ,
x1 , . . . , x g .
We need to find k solutions to the system corresponding to this eigenvalue
in order to find a fundamental set. Since we can't find enough eigenvectors,
we have to look for something else.
I will only cover the cases here when g = 1 and k = 2, 3. The general
setup requires something called generalized eigenvectors and, while it gives a nice formula, it requires slightly more linear algebra than what we have time
to cover. The g = 1 and k = 2 case is the following:
In this case the equation (A − λI)y = x has a nonzero solution y, and the solutions to X' = AX corresponding to the eigenvalue λ are
e^{λt} x,   e^{λt} (xt + y).
Example 5.2.6 The system X' = \begin{pmatrix} 2 & 1 & 6 \\ 0 & 2 & 5 \\ 0 & 0 & 2 \end{pmatrix} X has only one eigenvalue, λ = 2, of algebraic multiplicity 3 and geometric multiplicity 1. An eigenvector is
x = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} .
At this point the process stops, since we have found 3 vectors corresponding to the algebraic multiplicity. The general solution to the equation is thus
X = c1 e^{2t} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c2 e^{2t} \left( t \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right) + c3 e^{2t} \left( \frac{t^2}{2} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ -6/5 \\ 1/5 \end{pmatrix} \right).
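Since the t²/2 factor in the last term is easy to get wrong, here is a sympy verification (assuming sympy) that the third solution really solves the system.

    import sympy as sp

    t = sp.symbols('t')
    A = sp.Matrix([[2, 1, 6], [0, 2, 5], [0, 0, 2]])
    x = sp.Matrix([1, 0, 0])
    y = sp.Matrix([0, 1, 0])
    z = sp.Matrix([0, sp.Rational(-6, 5), sp.Rational(1, 5)])
    X3 = sp.exp(2*t) * (t**2/2*x + t*y + z)
    print(sp.simplify(X3.diff(t) - A*X3))   # the zero vector, so X3' = A X3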
Ax = λx.
Example 5.3.2 The matrix A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} has eigenvalues λ = ±i and
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ i \end{pmatrix} = \begin{pmatrix} -i \\ 1 \end{pmatrix} = -i \begin{pmatrix} 1 \\ i \end{pmatrix}
if we let c1 = i and c2 = −1. Thus, as complex vectors, these vectors are linearly dependent, even though no nontrivial linear relation exists if we just
allow real values for c1 and c2 .
It turns out that everything we know about real eigenvectors translates directly to complex eigenvectors. The multiplicity of a complex root of the
characteristic polynomial is the algebraic multiplicity and we can compute
ranks of matrices with complex entries just as we do for real matrices. We
just need to take into account that we have to do complex arithmetic during
the Gaussian elimination. When choosing linearly independent eigenvectors
we have to be careful though to choose them so that they are linearly inde-
pendent as complex vectors.
and
e^{λ̄t} x̄ = e^{at} (cos bt − i sin bt) x̄
are solutions to the system. As with linear differential equations and complex roots of the auxiliary polynomial, these are complex-valued functions. Using
the superposition principle, we get the solutions
X1 = (1/2)(e^{λt} x + e^{λ̄t} x̄) = (1/2)(x + x̄) e^{at} cos bt − (i/2)(−x + x̄) e^{at} sin bt
and
X2 = (i/2)(−e^{λt} x + e^{λ̄t} x̄) = (i/2)(−x + x̄) e^{at} cos bt + (1/2)(x + x̄) e^{at} sin bt.
If we now define z = (1/2)(x + x̄) = Re(x) and w = (i/2)(−x + x̄) = Im(x), we get the following theorem.
X1 = e^{at} (z cos bt − w sin bt)
and
X2 = e^{at} (w cos bt + z sin bt) .
Note 5.3.7 We will not cover the case when a complex eigenvalue has geo-
metric or algebraic multiplicity more than 1, since a nice uniform presentation
of that material requires more linear algebra than what we have covered.
Example 5.3.8 Let's solve the system X' = \begin{pmatrix} 2 & 8 \\ -1 & -2 \end{pmatrix} X. The characteristic polynomial is λ² + 4, so the eigenvalues are ±2i. Since eigenvectors come in conjugate pairs, we only need to find an eigenvector for the eigenvalue 2i. Thus, we need to solve
\left(\begin{array}{cc|c} 2-2i & 8 & 0 \\ -1 & -2-2i & 0 \end{array}\right) \Rightarrow \left(\begin{array}{cc|c} -1 & -2-2i & 0 \\ 2-2i & 8 & 0 \end{array}\right) \Rightarrow \left(\begin{array}{cc|c} -1 & -2-2i & 0 \\ 0 & 0 & 0 \end{array}\right)
The general solution is then
X = c1 (z cos 2t − w sin 2t) + c2 (w cos 2t + z sin 2t)
  = c1 \begin{pmatrix} 2 cos 2t − 2 sin 2t \\ −cos 2t \end{pmatrix} + c2 \begin{pmatrix} 2 cos 2t + 2 sin 2t \\ −sin 2t \end{pmatrix} .
At the same time, we see that the system is not a ”real” system, since it's essentially just a list of independent linear first-order differential equations, which we know how to solve separately as simply xi = ci e^{ai t} . The equation
above can be written in matrix form as
\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
Conversely, if a system can be written as X' = AX with A a diagonal matrix, then the same phenomenon occurs.
Example 5.4.2 The coefficient matrix of the system X' = \begin{pmatrix} -2 & -1 & 8 \\ 0 & -3 & 8 \\ 0 & -4 & 9 \end{pmatrix} X can be diagonalized as
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix}^{-1} ,
so that
Y = \begin{pmatrix} c_1 e^{-2t} \\ c_2 e^{t} \\ c_3 e^{5t} \end{pmatrix} .
It follows that
X = P Y = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} c_1 e^{-2t} \\ c_2 e^{t} \\ c_3 e^{5t} \end{pmatrix} = \begin{pmatrix} c_1 e^{-2t} + 2 c_2 e^{t} + c_3 e^{5t} \\ 2 c_2 e^{t} + c_3 e^{5t} \\ c_2 e^{t} + c_3 e^{5t} \end{pmatrix} .
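The factorization itself is easy to double-check numerically (a numpy sketch, assuming numpy is available):

    import numpy as np

    A = np.array([[-2, -1, 8], [0, -3, 8], [0, -4, 9]], dtype=float)
    P = np.array([[1, 2, 1], [0, 2, 1], [0, 1, 1]], dtype=float)
    D = np.diag([-2.0, 1.0, 5.0])
    assert np.allclose(P @ D @ np.linalg.inv(P), A)   # A = P D P^{-1}
    print("diagonalization checks out")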
Definition 6.1.2 A point x0 is an ordinary point if p(x) and q(x) are analytic
at x0 , meaning they can be written as power series around the point x0 .
Otherwise it’s called a singular point.
The method of series solutions works as follows. For simplicity, assume we are looking at a solution around the point 0 and that p(x) and q(x) can both be expressed as power series around 0. A power series can be differentiated term by term, so if y = ∑_{n=0}^{∞} an x^n , then
y' = ∑_{n=1}^{∞} n an x^{n−1} ,    y'' = ∑_{n=2}^{∞} n(n − 1) an x^{n−2} .
If we plug in the power series representation for p(x) and q(x) on the left and
simplify, then the left-hand side becomes a power series which then needs to
have all coefficients zero, since the right side is zero. This gives us equations
for the an , which lets us compute a power series for the solution.
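As an illustration of how the coefficient equations work out in practice, take the hypothetical example y'' + y = 0 (my own choice, not one worked in the notes). Plugging in the series and collecting powers of x gives the recurrence (n + 2)(n + 1) a_{n+2} + a_n = 0, which a few lines of sympy can unroll:

    import sympy as sp

    x = sp.symbols('x')
    a0, a1, N = sp.Symbol('a0'), sp.Symbol('a1'), 8
    a = [a0, a1]
    for n in range(N):                        # a_{n+2} = -a_n / ((n+2)(n+1))
        a.append(-a[n] / ((n + 2) * (n + 1)))
    y = sum(a[n] * x**n for n in range(N + 2))
    print(sp.expand(y))
    # The a0-part is the Taylor polynomial of cos(x), the a1-part that of sin(x).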
Definition 6.2.3 The equation we got above for r, i.e. r(3r − 2) = 0, is called the indicial equation and its roots are called the indicial roots.
7 Vector Calculus
7.1 Line Integrals
We start by covering the required theory of curves. The line integral is a
generalization of the one-dimensional Riemann integral that you have seen
before in calculus, but instead of integrating over a fixed axis, the partition
is done over a curve.
Definition 7.1.4 A curve γ is closed if γ(a) = γ(b), i.e., the initial point
and endpoint are the same.
make up the unit circle. However, we could also pick the curve
μ : [0, 1] → R², μ(t) = (cos 2πt, sin 2πt),
and its set of points also makes up the unit circle. Thus if we think of the unit
circle itself as a closed simple curve, then there are multiple ways in which
we can represent it. In our theory of integration we require these curves to
be related in the following sense.
Example 7.1.7 Our two parametrizations of the unit circle are equivalent,
since
µ = γ ◦ φ,
where φ : [0, 1] → [0, 2π], φ(t) = 2πt, so that φ'(t) = 2π > 0 for all t ∈ [0, 1].
Let γ : [a, b] → U , U ⊂ R2 , be a curve and f : U → R some function.
Assume that we split the interval [a, b] into pieces
a = t0 < t1 < . . . < tk = b.
Denote by ∆si the length along the curve from γ(ti−1 ) to γ(ti ) and denote by ∆xi and ∆yi the differences in the x and y coordinates, i.e.
∆xi = γ1 (ti ) − γ1 (ti−1 ),    ∆yi = γ2 (ti ) − γ2 (ti−1 ).
If we measure the area under the surface f along the curve γ, then this is
approximately given by
∑_{i=1}^{k} f (γ(ti )) ∆si ,
and the projections of the area onto the x and y axes are approximately given by
∑_{i=1}^{k} f (γ(ti )) ∆xi ,    ∑_{i=1}^{k} f (γ(ti )) ∆yi .
These are just typical Riemann sums, so by letting the size of the partition of the interval [a, b] go to 0 we arrive at the quantities
∫_a^b f (γ(t)) |γ'(t)| dt,    ∫_a^b f (γ(t)) γ1'(t) dt,    ∫_a^b f (γ(t)) γ2'(t) dt.
Proof. We have that µ = γ ◦ φ, where φ : [c, d] → [a, b]. By the chain rule,
µ'(t) = γ'(φ(t)) φ'(t),
and the change of variable formula for s = φ(t), ds = φ'(t) dt, gives that
∫_c^d f (µ(t)) |µ'(t)| dt = ∫_c^d f (γ(φ(t))) |γ'(φ(t))| |φ'(t)| dt = ∫_a^b f (γ(s)) |γ'(s)| ds.
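To see this invariance numerically, here is a small Python/numpy sketch. The integrand f(x, y) = x² is my own test function, not from the notes; both parametrizations of the unit circle give the same value (π).

    import numpy as np

    f = lambda x, y: x**2

    def line_integral(gamma, dgamma, a, b, n=200_000):
        # midpoint-rule approximation of the scalar line integral
        t = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
        x, y = gamma(t)
        dx, dy = dgamma(t)
        return np.sum(f(x, y) * np.hypot(dx, dy)) * (b - a) / n

    I1 = line_integral(lambda t: (np.cos(t), np.sin(t)),
                       lambda t: (-np.sin(t), np.cos(t)), 0.0, 2*np.pi)
    I2 = line_integral(lambda t: (np.cos(2*np.pi*t), np.sin(2*np.pi*t)),
                       lambda t: (-2*np.pi*np.sin(2*np.pi*t), 2*np.pi*np.cos(2*np.pi*t)),
                       0.0, 1.0)
    print(I1, I2, np.pi)   # all three agree to several decimals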
Note 7.1.10 The previous theorem makes it possible to use expressions such
as ”integrating over the unit circle” since it doesn’t really matter which curve
we choose to represent the points on the unit circle, as long as the curve is
differentiable and ”does not change direction”, meaning it does not suddenly
start moving backwards (this is what the φ0 (t) > 0 condition means in the
definition of equivalence of curves). This would guarantee that it’s equivalent
to the standard parametrization given in the example above.
Definition 7.1.11 A vector field is a function F : U → R^n , where U ⊂ R^n .
In other words, to each point in space we attach a vector.
A vector field can be thought of as a current that flows through space. If
we move in space we have to fight the current to move forward. Therefore, if
d = \vec{OP} and we move from O to P , then the amount of work done is given
by F · d. It follows that if we move along a path γ, then the instantaneous
work done at time t is simply given by
F (γ(t)) · (γ'(t)/|γ'(t)|) |γ'(t)| = F (γ(t)) · γ'(t).
This holds, since we are moving in the direction γ'(t) at speed |γ'(t)| while the vector field points in the direction F (γ(t)). The total work done when
moving along the path is then measured by
∫_a^b F (γ(t)) · γ'(t) dt.
Example 7.2.2 If f (x, y) = xy², then ∇f = (y², 2xy), so that f is a potential for the vector field F (x, y) = (y², 2xy).
In other words, the value of the integral is only dependent on the initial point
and the endpoint.
Proof. Since the chain rule gives (d/dt) f (γ(t)) = ∇f (γ(t)) · γ'(t), we get
∫_a^b F (γ(t)) · γ'(t) dt = ∫_a^b ∇f (γ(t)) · γ'(t) dt = ∫_a^b (d/dt) f (γ(t)) dt = f (γ(b)) − f (γ(a)),
Recall that for nice enough functions the mixed partial derivatives commute,
Di Dj f = Dj Di f.
If F = ∇f , then Fj = Dj f , so
Di Fj = Di Dj f = Dj Di f = Dj Fi ,
which gives the following theorem.
Theorem 7.2.6 If F = (F1 , . . . , Fn ) : U → Rn has a potential function f ,
then
∂Fi/∂xj = ∂Fj/∂xi .
It turns out that a converse is also true: if the domain U of the vector field is nice enough, then the condition in the theorem guarantees that F has a potential. First we need to describe what ”nice enough” means.
Example 7.2.10 Assume that F (x, y) = (2x sin y, x2 cos y). This vector
field is defined on all of R2 , which is simply-connected, and
D2 (2x sin y) = D1 (x² cos y).
The previous theorem then states that there’s a function f : R2 → R s.t.
F = ∇f . Since F1 = 2x sin y = D1 f , by integrating, we get
f (x, y) = ∫ 2x sin y dx + g(y) = x² sin y + g(y).
and
f (x, y) = ∫ x² cos y dy + h(x) = x² sin y + h(x).
It follows that
f = x² sin y + C,
Example 7.2.11 Let γ : [0, 2π] → R2 be the curve γ(t) = (cos t, sin t) and
let F be the vector field in the previous problem. Then
∫_γ F · dr = ∫_0^{2π} (2 cos t sin(sin t), cos² t cos(sin t)) · (− sin t, cos t) dt.
Computing the dot product would lead to an extremely nasty integral. How-
ever, since F has a potential function and γ is a closed curve, we just get
∫_γ F · dr = 0.
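A sympy sketch of this argument (assuming sympy): the integrand F(γ(t)) · γ'(t) is exactly the derivative of f(γ(t)), so its integral over a full period of the closed curve is zero.

    import sympy as sp

    x, y, t = sp.symbols('x y t')
    f = x**2 * sp.sin(y)                        # the potential found above (constant dropped)
    F = (sp.diff(f, x), sp.diff(f, y))          # grad f = (2x sin y, x^2 cos y)
    gx, gy = sp.cos(t), sp.sin(t)               # the closed curve gamma(t), 0 <= t <= 2*pi
    integrand = (F[0].subs({x: gx, y: gy}) * sp.diff(gx, t)
                 + F[1].subs({x: gx, y: gy}) * sp.diff(gy, t))
    print(sp.simplify(integrand - sp.diff(f.subs({x: gx, y: gy}), t)))   # 0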
vol(R) = a1 a2 · · · an .
Let f : R → R be some function and R ⊂ Rn some cuboid. If P is a partition
of R into smaller cuboids, then we may choose an element xA ∈ A for each
A ∈ P . We will investigate the meaning of the sum
∑_{A∈P} f (xA ) vol(A).
If this sum approaches a limit as the size of the cuboids in the partition goes to zero, we say that f is integrable and that the limit is the integral of f over R.
When n = 1 the integral above is just the standard Riemann integral and it's usually denoted by
∫_R f dx.
When n = 2 the integral above is called a double integral and we usually denote it by
∬_R f dA.
Furthermore, when n = 3 the integral is called a triple integral and we usually denote it by
∭_R f dV.
Assume next that D ⊂ Rn is some closed and bounded region in Rn .
Then we may find a cuboid R containing D. If f : D → R is a function defined on D, we may extend f to R by letting it be zero outside of D. In other words, if we denote the extension by f̃ : R → R, then
f̃(x) = f (x) for x ∈ D,   and   f̃(x) = 0 for x ∈ R \ D.
Definition 7.3.4 We define ∫_D f = ∫_R f̃ .
Theorem 7.3.5 (Fubini's theorem) Let R = [a1 , b1 ] × · · · × [an , bn ] be a cuboid in R^n and f : R → R an integrable function. Assuming that f is ”nice enough”, then
∫_R f = ∫_{a1}^{b1} ∫_{a2}^{b2} · · · ∫_{an}^{bn} f dxn · · · dx2 dx1 ,
with the integral on the right called an iterated integral. Furthermore, the order in which the iterated integrals are taken does not matter.
Proof. Omitted.
1 on D. By the definition of the integral,
∫_D f = ∫_R f̃
      = ∫_0^1 ∫_0^1 ∫_0^1 ∫_0^1 f̃(x, y, z, w) dw dz dy dx
      = ∫_0^1 ∫_0^x ∫_0^y ∫_0^z 1 dw dz dy dx
      = ∫_0^1 ∫_0^x ∫_0^y z dz dy dx
      = ∫_0^1 ∫_0^x (1/2) y² dy dx
      = ∫_0^1 (1/6) x³ dx
      = 1/24.
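sympy evaluates the same iterated integral directly (assuming sympy is installed):

    import sympy as sp

    x, y, z, w = sp.symbols('x y z w')
    # Innermost variable first: w from 0 to z, then z to y, y to x, x to 1.
    print(sp.integrate(1, (w, 0, z), (z, 0, y), (y, 0, x), (x, 0, 1)))   # 1/24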
around the curve. An illustration is given by the following picture.
[Figure: the rectangle [a, b] × [c, d] in the (x, y)-plane with its boundary traversed counterclockwise.]
We prove Green’s theorem in two steps. The first step is to check that
it holds when C is just the boundary of a rectangle. This is given by the
following lemma.
λ1 (t) = (t, c), λ2 (t) = (b, t), λ3 (t) = (t, d), λ4 (t) = (a, t).
[Figure: the boundary of the rectangle split into the four segments λ1 (bottom), λ2 (right), λ3 (top) and λ4 (left).]
∫_{λ3} F · dr = ∫_b^a F (λ3 (t)) · (1, 0) dt = −∫_a^b F1 (t, d) dt,
∫_{λ4} F · dr = ∫_d^c F (λ4 (t)) · (0, 1) dt = −∫_c^d F2 (a, t) dt.
Next, we compute the double integral and show that we get the same expres-
sion. Applying the fundamental theorem of calculus and Fubini on ∂F1 /∂y
and ∂F2 /∂x, we get
∬_R (∂F2/∂x − ∂F1/∂y) dA = ∫_c^d ∫_a^b ∂F2/∂x dx dy − ∫_a^b ∫_c^d ∂F1/∂y dy dx
  = ∫_a^b F1 (x, c) − F1 (x, d) dx − ∫_c^d F2 (a, y) − F2 (b, y) dy
  = ∫_a^b F1 (t, c) − F1 (t, d) dt − ∫_c^d F2 (a, t) − F2 (b, t) dt
  = ∫_a^b F1 (t, c) − F1 (t, d) dt + ∫_c^d F2 (b, t) − F2 (a, t) dt.
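Here is a sympy verification of this lemma on a concrete rectangle. The field F = (−y, x) and the rectangle [0, 2] × [0, 1] are my own test data, not taken from the notes.

    import sympy as sp

    t, x, y = sp.symbols('t x y')
    a, b, c, d = 0, 2, 0, 1
    F1, F2 = -y, x

    # Double integral of dF2/dx - dF1/dy over the rectangle.
    lhs = sp.integrate(sp.diff(F2, x) - sp.diff(F1, y), (x, a, b), (y, c, d))

    # Line integral over the four boundary segments, traversed counterclockwise.
    rhs = (sp.integrate(F1.subs({x: t, y: c}), (t, a, b))     # bottom: lambda1(t) = (t, c)
           + sp.integrate(F2.subs({x: b, y: t}), (t, c, d))   # right:  lambda2(t) = (b, t)
           - sp.integrate(F1.subs({x: t, y: d}), (t, a, b))   # top, traversed right to left
           - sp.integrate(F2.subs({x: a, y: t}), (t, c, d)))  # left, traversed top to bottom
    print(lhs, rhs)   # both 4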
where γ is the path on the right in figure 5 and R is the area bounded by γ.
The proof of the theorem now follows by letting the grid become finer and
finer.
[Figure 5: on the left, the region covered by a grid of small rectangles, each oriented counterclockwise; on the right, the boundary path γ of the region.]
piece and γ2 around the lower one where both are oriented counterclockwise.
Then we see that the dashed pieces cancel out and Green’s theorem gives the
equality
∬_R (∂F2/∂x − ∂F1/∂y) dA = ∬_{R1} (∂F2/∂x − ∂F1/∂y) dA + ∬_{R2} (∂F2/∂x − ∂F1/∂y) dA
  = ∫_{γ1} F · dr + ∫_{γ2} F · dr
  = ∫_{C1} F · dr − ∫_{C2} F · dr − ∫_{C3} F · dr.
The integrals over the inner circles are negative, since they are traversed
clockwise.
[Figure: the region between the outer circle C1 and the two inner circles C2 and C3 , cut by dashed segments into an upper piece R1 and a lower piece R2 .]
Proof. If you check our construction of the determinant, you’ll see that this
is precisely what we wanted the determinant to satisfy.
Definition 7.5.2 A function T is 1-to-1 if T (x) = T (y) implies x = y.
Note 7.5.4 Since the determinant |JT (a1 , . . . , an )| is just a real number it
defines a function D → R given by
almost right except for some mathematical technicalities that we ignore. It follows that
∫_{T(R)} f ≈ ∑_{B∈T(P)} f (yB ) vol(B).
Note that the actual change under T is approximately given by multiplying by the Jacobian matrix T'(xA ), i.e. T(x) − T(xA ) ≈ T'(xA )(x − xA ). If we make the simplifying assumption that xA = 0 and T(xA ) = 0 (we can do this by moving the origin), then it follows that
T(A) ≈ T'(xA ) A.
This is equivalent to the theorem given in the beginning of this section with M = T'(xA ).
The following are standard coordinate transformations that everyone should
be familiar with.
Definition 7.5.6 Any point in the (x, y)-plane can be written as x = r cos θ, y = r sin θ, where r ≥ 0 and 0 ≤ θ ≤ 2π. Thus, we have a transformation
a ≤ √(x² + y² + z²) ≤ b.
In spherical coordinates the region is given by
a ≤ ρ ≤ b, 0 ≤ φ ≤ π, 0 ≤ θ ≤ 2π,
which we’ll denote by R. Let T be the transformation given above for spher-
ical coordinates. Using the change of variable formula, we get
∭_D x² + y² dV = ∭_{T(R)} x² + y² dV
  = ∭_R ρ² sin²φ |JT | dV
  = ∫_0^{2π} ∫_0^π ∫_a^b ρ⁴ sin³φ dρ dφ dθ,
0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π, 0 ≤ z ≤ 2.
It follows that
∭_D y² z dV = ∫_0^{2π} ∫_0^1 ∫_0^2 z r² sin²θ · r dz dr dθ,
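sympy evaluates this iterated integral directly (assuming sympy is installed):

    import sympy as sp

    r, th, z = sp.symbols('r theta z')
    val = sp.integrate(z * r**2 * sp.sin(th)**2 * r,
                       (z, 0, 2), (r, 0, 1), (th, 0, 2*sp.pi))
    print(val)   # pi/2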
Usually one thinks of surfaces as mapping a region of the (u, v)-plane into
a 2-dimensional surface in R3 . Now if (a, b) is a point on a horizontal grid
line, then at point r(a, b) the tangent vector has direction
∂r/∂u (a, b).
Figure 7: A surface in R³: the map r(u, v) = (x(u, v), y(u, v), z(u, v)) takes a region R in the (u, v)-plane onto the surface Σ.
Example 7.6.3 Typically the surface is of the form z = f (x, y); then we may write
r(x, y) = (x, y, f (x, y))
[Figure: the circle (y − b)² + z² = a² in the (y, z)-plane, which generates the torus of Example 7.6.4 when rotated about the z-axis.]
and
∂r/∂x (x, y) = (1, 0, fx (x, y)),    ∂r/∂y (x, y) = (0, 1, fy (x, y)),
so that
|∂r/∂x × ∂r/∂y| = √( (∂f/∂x)² + (∂f/∂y)² + 1 ).
Example 7.6.4 Let’s compute the surface area of the torus with inner radius
b − a and outer radius b + a. A torus can be parametrized as a surface as
r(u, v) = ((b+a cos u) cos v, (b+a cos u) sin v, a sin u), 0 ≤ u ≤ 2π, 0 ≤ v ≤ 2π,
We have that
∂r/∂u = −a sin u cos v i − a sin u sin v j + a cos u k,
∂r/∂v = −(b + a cos u) sin v i + (b + a cos u) cos v j + 0 k.
Computing the cross product gives
∂r/∂u × ∂r/∂v = −a(b + a cos u) cos v cos u i − a(b + a cos u) sin v cos u j − a(b + a cos u) sin u k
and
|∂r/∂u × ∂r/∂v| = a(b + a cos u).
It follows that
∬_{r(D)} dS = ∫_0^{2π} ∫_0^{2π} a(b + a cos u) du dv = 4π² ab.
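A sympy check of the cross product and the final area (assuming sympy; a and b are kept symbolic, with b > a > 0 understood):

    import sympy as sp

    u, v, a, b = sp.symbols('u v a b', positive=True)
    r = sp.Matrix([(b + a*sp.cos(u))*sp.cos(v),
                   (b + a*sp.cos(u))*sp.sin(v),
                   a*sp.sin(u)])
    n = r.diff(u).cross(r.diff(v))
    # |r_u x r_v|^2 equals (a*(b + a*cos u))^2 ...
    print(sp.simplify(n.dot(n) - (a*(b + a*sp.cos(u)))**2))   # 0
    # ... so, since b > a, the magnitude is a*(b + a*cos u) and the area is
    print(sp.integrate(a*(b + a*sp.cos(u)), (u, 0, 2*sp.pi), (v, 0, 2*sp.pi)))   # 4*pi**2*a*b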
Next we develop a surface integral for vector fields. Note that the vectors
∂r/∂u (a, b),    ∂r/∂v (a, b)
span the tangent plane to the surface at the point (a, b) and that a unit normal of the surface at the point (a, b) is then given by
n(a, b) = ( ∂r/∂u (a, b) × ∂r/∂v (a, b) ) / ‖ ∂r/∂u (a, b) × ∂r/∂v (a, b) ‖ .
The only problem is that in order for us to be able to define a surface integral
that makes sense, we need to be able to pick a unique normal vector at
each point. At each point on a surface there are two possible choices for a
unit normal i.e. the normals pointing in opposite directions. We make the
following definition.
Example 7.6.7 Let F (x, y, z) = (0, z, z). Compute the flux through the
part of the plane z = 6 − 3x − 2y contained in the first octant where the
orientation of the plane is chosen, so that the normal points upwards.
The normal of the plane 3x + 2y + z = 6 is
n = (3, 2, 1) / |(3, 2, 1)| = (1/√14) (3, 2, 1).
Hence,
∬_S F · dS = ∬_S (F · n) dS = (1/√14) ∬_S 3z dS.
We may parametrize the plane by x and y, i.e. r(x, y) = (x, y, 6 − 3x − 2y). The intersection with the (x, y)-plane is
3x + 2y = 6 ⇒ y = 3 − 3x/2,
so that
(1/√14) ∬_S 3z dS = (1/√14) ∫_0^2 ∫_0^{3−3x/2} 3(6 − 3x − 2y) √(3² + 2² + 1) dy dx = 3 ∫_0^2 ∫_0^{3−3x/2} (6 − 3x − 2y) dy dx = 18.
Note 7.6.8 A general simplification worth noting in the formulas here (and which the course book omits) is that the unit normal is given as
n = ( ∂r/∂u × ∂r/∂v ) / ‖ ∂r/∂u × ∂r/∂v ‖ ,
so that if r : D → R³ is the parametrization of S = r(D), then
∬_S F · dS = ∬_S F · n dS = ∬_D ( F · ( ∂r/∂u × ∂r/∂v ) / ‖ ∂r/∂u × ∂r/∂v ‖ ) ‖ ∂r/∂u × ∂r/∂v ‖ dA = ∬_D F · ( ∂r/∂u × ∂r/∂v ) dA,
which gives us the following formula, which is much easier to compute:
∬_S F · dS = ∬_D F · ( ∂r/∂u × ∂r/∂v ) dA.
This is why the √14 cancels in the previous example. Thus, using the formula derived above, we could have just directly computed
∫_0^2 ∫_0^{3−3x/2} (0, 6 − 3x − 2y, 6 − 3x − 2y) · (3, 2, 1) dy dx = 3 ∫_0^2 ∫_0^{3−3x/2} (6 − 3x − 2y) dy dx = 18.
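The same computation in sympy (assuming sympy is installed), parametrizing the plane by x and y as above:

    import sympy as sp

    x, y = sp.symbols('x y')
    r = sp.Matrix([x, y, 6 - 3*x - 2*y])
    n = r.diff(x).cross(r.diff(y))                      # (3, 2, 1), pointing upward
    F = sp.Matrix([0, 6 - 3*x - 2*y, 6 - 3*x - 2*y])    # F = (0, z, z) on the surface
    flux = sp.integrate(F.dot(n), (y, 0, 3 - sp.Rational(3, 2)*x), (x, 0, 2))
    print(flux)   # 18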
7.7 Divergence theorem
Definition 7.7.1 A surface is closed if it has a well-defined interior.
If we look at the opposite side S2 , then the normal is reversed and the center point is (x, y, z − ∆z/2), so
∬_{S2} F · dS ≈ −F3 (x, y, z − ∆z/2) ∆x ∆y.
It follows that
∬_{S1 + S2} F · dS ≈ ( ( F3 (x, y, z + ∆z/2) − F3 (x, y, z − ∆z/2) ) / ∆z ) ∆V.
If we now let the size of the cube go to 0, then we get
lim_{∆V→0} (1/∆V) ∬_{S1 + S2} F · dS = ∂F3/∂z.
Repeating the same argument for the other two pairs of sides of S we get
lim_{∆V→0} (1/∆V) ∬_S F · dS = ∂F1/∂x + ∂F2/∂y + ∂F3/∂z.
What is this surface integral on the left actually measuring? If we want to
know how much F (x, y, z) is expanding at the point (x, y, z), then we can measure this by letting a small cube S be centered at (x, y, z), measuring the difference between how much flows out compared to how much flows in, and dividing this by the volume of S. This gives us the expansion of the vector field within S per unit volume. Thus, by making the cube infinitely small, we are measuring the expansion per unit volume of the vector field at (x, y, z).
This motivates the following definition.
div F = ∂F1/∂x + ∂F2/∂y + ∂F3/∂z = ∇ · F.
Proof idea. We will again give the idea of the proof. Subdivide the interior of S into many small cubes S1 , . . . , SN , each with volume ∆V1 , . . . , ∆VN .
Then if Si is centered at (xi , yi , zi ), we get
(1/∆Vi) ∬_{Si} F · dS ≈ div F (xi , yi , zi )   ⇔   ∬_{Si} F · dS ≈ div F (xi , yi , zi ) ∆Vi .
It follows that
∬_S F · dS ≈ ∑_{i=1}^N ∬_{Si} F · dS ≈ ∑_{i=1}^N div F (xi , yi , zi ) ∆Vi ≈ ∭_V div F dV,
and the approximations become equalities by letting N , the number of cubes in the subdivision, go to infinity.
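As a final sanity check, here is a sympy sketch comparing the two sides of the divergence theorem. The test field F = (xy, yz, xz) and the unit cube are my own choices, not from the notes.

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    F = sp.Matrix([x*y, y*z, x*z])
    divF = sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)   # x + y + z
    vol_int = sp.integrate(divF, (x, 0, 1), (y, 0, 1), (z, 0, 1))

    # Outward flux through the six faces of the unit cube.
    flux = (sp.integrate(F[0].subs(x, 1) - F[0].subs(x, 0), (y, 0, 1), (z, 0, 1))
            + sp.integrate(F[1].subs(y, 1) - F[1].subs(y, 0), (x, 0, 1), (z, 0, 1))
            + sp.integrate(F[2].subs(z, 1) - F[2].subs(z, 0), (x, 0, 1), (y, 0, 1)))
    print(vol_int, flux)   # both 3/2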